[GitHub] [hudi] rtdt99 opened a new issue, #7249: [SUPPORT] How to run cleaner table service on DFS source of DeltaStreamer ?

2022-11-19 · GitBox
rtdt99 opened a new issue, #7249: URL: https://github.com/apache/hudi/issues/7249

[jira] [Updated] (HUDI-5245) Honor pruned partitions while looking up in col stats partition in MDT

2022-11-19 · sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-5245: -- Fix Version/s: 0.13.0 > Honor pruned partitions while looking up in col stats partition

[jira] [Updated] (HUDI-5245) Honor pruned partitions while looking up in col stats partition in MDT

2022-11-19 · sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-5245: -- Priority: Critical (was: Major) > Honor pruned partitions while looking up in col

[jira] [Updated] (HUDI-5245) Honor pruned partitions while looking up in col stats partition in MDT

2022-11-19 · sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-5245: -- Description: When looking up in col stats for data skipping, we are passing in only the

[jira] [Created] (HUDI-5245) Honor pruned partitions while looking up in col stats partition in MDT

2022-11-19 · sivabalan narayanan (Jira)
sivabalan narayanan created HUDI-5245: - Summary: Honor pruned partitions while looking up in col stats partition in MDT Key: HUDI-5245 URL: https://issues.apache.org/jira/browse/HUDI-5245
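HUDI-5245's title describes restricting the column-stats lookup in the metadata table (MDT) to the partitions that survive partition pruning. The sketch below only illustrates that filtering idea on an in-memory list; `ColumnStat` and `honorPrunedPartitions` are hypothetical names, not Hudi classes, and the actual change may instead pass the pruned partition list into the MDT read itself.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class ColStatsPruningSketch {

    // Hypothetical, simplified stand-in for one column-stats entry read from
    // the metadata table (MDT); this is NOT a Hudi class.
    public record ColumnStat(String partitionPath, String fileName, String column, long min, long max) {}

    // Keep only entries whose partition survived partition pruning, so data
    // skipping never evaluates files from partitions that were already pruned.
    public static List<ColumnStat> honorPrunedPartitions(List<ColumnStat> stats, Set<String> prunedPartitions) {
        return stats.stream()
                .filter(s -> prunedPartitions.contains(s.partitionPath()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<ColumnStat> stats = List.of(
                new ColumnStat("2022/11/18", "f1.parquet", "ts", 0L, 10L),
                new ColumnStat("2022/11/19", "f2.parquet", "ts", 5L, 20L));
        // Only the 2022/11/19 partition remains after pruning.
        List<ColumnStat> kept = honorPrunedPartitions(stats, Set.of("2022/11/19"));
        System.out.println(kept.size() + " " + kept.get(0).fileName()); // prints "1 f2.parquet"
    }
}
```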

[GitHub] [hudi] hudi-bot commented on pull request #7248: [HUDI-5244] Fix bugs in schema evolution client with lost operation field and not found schema

2022-11-19 · GitBox
hudi-bot commented on PR #7248: URL: https://github.com/apache/hudi/pull/7248#issuecomment-1321053519 ## CI report: * 035d1ca955024eefcad2989882f402940569f3a2 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #7248: [HUDI-5244] Fix bugs in schema evolution client with lost operation field and not found schema

2022-11-19 · GitBox
hudi-bot commented on PR #7248: URL: https://github.com/apache/hudi/pull/7248#issuecomment-1321052780 ## CI report: * 035d1ca955024eefcad2989882f402940569f3a2 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #7241: [HUDI-5241] Optimize HoodieDefaultTimeline API

2022-11-19 · GitBox
hudi-bot commented on PR #7241: URL: https://github.com/apache/hudi/pull/7241#issuecomment-1321042762 ## CI report: * 3045f14ac99e049be4b40d14906b8aef0f3ed34d UNKNOWN * 8b5df16abb67a2270fa42bc9f97aff0bada32c1b Azure:

[GitHub] [hudi] hudi-bot commented on pull request #7248: [HUDI-5244] Fix bugs in schema evolution client with lost operation field and not found schema

2022-11-19 · GitBox
hudi-bot commented on PR #7248: URL: https://github.com/apache/hudi/pull/7248#issuecomment-1321034276 ## CI report: * 035d1ca955024eefcad2989882f402940569f3a2 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #7248: [HUDI-5244] Fix bugs in schema evolution client with lost operation field and not found schema

2022-11-19 · GitBox
hudi-bot commented on PR #7248: URL: https://github.com/apache/hudi/pull/7248#issuecomment-1321033134 ## CI report: * 035d1ca955024eefcad2989882f402940569f3a2 UNKNOWN

[GitHub] [hudi] shangwen commented on pull request #7187: [HUDI-5199] Fixed HoodieParquetDataBlock failed to read data when schema evolutio…

2022-11-19 · GitBox
shangwen commented on PR #7187: URL: https://github.com/apache/hudi/pull/7187#issuecomment-1321025147 hi @xiarixiaoyao , it looks like Test Call rollback_to_savepoint Procedure test case fails, not related to my test case

[GitHub] [hudi] nsivabalan commented on a diff in pull request #6358: [HUDI-4588][HUDI-4472] Addressing schema handling issues in the write path

2022-11-19 · GitBox
nsivabalan commented on code in PR #6358: URL: https://github.com/apache/hudi/pull/6358#discussion_r1027179438 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/commit/HoodieMergeHelper.java: ## @@ -72,93 +70,116 @@ public static HoodieMergeHelper

[GitHub] [hudi] nsivabalan commented on a diff in pull request #6358: [HUDI-4588][HUDI-4472] Addressing schema handling issues in the write path

2022-11-19 · GitBox
nsivabalan commented on code in PR #6358: URL: https://github.com/apache/hudi/pull/6358#discussion_r1027178962 ## hudi-common/src/main/java/org/apache/hudi/avro/AvroSchemaUtils.java: ## @@ -76,6 +101,19 @@ public static Schema resolveUnionSchema(Schema schema, String

[GitHub] [hudi] nsivabalan commented on a diff in pull request #6358: [HUDI-4588][HUDI-4472] Addressing schema handling issues in the write path

2022-11-19 · GitBox
nsivabalan commented on code in PR #6358: URL: https://github.com/apache/hudi/pull/6358#discussion_r1027173356 ## hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieLogFileReader.java: ## @@ -243,6 +243,18 @@ private HoodieLogBlock readBlock() throws IOException {

[jira] [Updated] (HUDI-5244) Fix bugs in schema evolution client with lost operation field and not found schema

2022-11-19 · ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-5244: - Labels: pull-request-available (was: ) > Fix bugs in schema evolution client with lost operation

[GitHub] [hudi] trushev opened a new pull request, #7248: [HUDI-5244] Fix bugs in schema evolution client with lost operation field and not found schema

2022-11-19 · GitBox
trushev opened a new pull request, #7248: URL: https://github.com/apache/hudi/pull/7248 ### Change Logs This PR fixes 2 issues in schema evolution client api: 1. Lost operation field in avro schema 2. Not found schema for table ```

[jira] [Created] (HUDI-5244) Fix bugs in schema evolution client with lost operation field and not found schema

2022-11-19 · Alexander Trushev (Jira)
Alexander Trushev created HUDI-5244: --- Summary: Fix bugs in schema evolution client with lost operation field and not found schema Key: HUDI-5244 URL: https://issues.apache.org/jira/browse/HUDI-5244

[GitHub] [hudi] hudi-bot commented on pull request #7241: [HUDI-5241] Optimize HoodieDefaultTimeline API

2022-11-19 · GitBox
hudi-bot commented on PR #7241: URL: https://github.com/apache/hudi/pull/7241#issuecomment-1321019747 ## CI report: * 3045f14ac99e049be4b40d14906b8aef0f3ed34d UNKNOWN * 16d9382c24677fedab501a76afe08640133ebb64 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #7241: [HUDI-5241] Optimize HoodieDefaultTimeline API

2022-11-19 · GitBox
hudi-bot commented on PR #7241: URL: https://github.com/apache/hudi/pull/7241#issuecomment-1321018976 ## CI report: * 3045f14ac99e049be4b40d14906b8aef0f3ed34d UNKNOWN * 16d9382c24677fedab501a76afe08640133ebb64 Azure:

[hudi] branch master updated: [HUDI-5162] Allow user specified start offset for streaming query (#7138)

2022-11-19 · biyan
biyan pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new d976671c7b [HUDI-5162] Allow user specified start

[GitHub] [hudi] YannByron merged pull request #7138: [HUDI-5162] Allow user specified start offset for streaming query

2022-11-19 · GitBox
YannByron merged PR #7138: URL: https://github.com/apache/hudi/pull/7138

[GitHub] [hudi] nsivabalan commented on a diff in pull request #5581: [HUDI-53] Implementation of a native DFS based index based on the metadata table.

2022-11-19 · GitBox
nsivabalan commented on code in PR #5581: URL: https://github.com/apache/hudi/pull/5581#discussion_r1027105129 ## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/SparkRDDWriteClient.java: ## @@ -379,10 +380,12 @@ private void

[GitHub] [hudi] hemanth-gowda-12 opened a new issue, #7247: [SUPPORT] Duplicates on upserts when record partition path begins with "/".

2022-11-19 · GitBox
hemanth-gowda-12 opened a new issue, #7247: URL: https://github.com/apache/hudi/issues/7247 Upserts don't work on the Java client for both MOR and COW if partition path in `new HoodieKey(rKey, partitionPath)` starts with a "/". Steps to reproduce the behavior: 1.
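Until the behaviour reported in #7247 is fixed, one possible client-side workaround is to avoid the failing case entirely by stripping leading slashes before building the key. The helper below is hypothetical, not a Hudi API, and is sketched under the assumption (implied by the report) that partition paths without a leading "/" upsert correctly.

```java
public final class PartitionPathSketch {

    // Hypothetical client-side workaround (not part of Hudi): strip any leading
    // '/' so the value later passed to new HoodieKey(recordKey, partitionPath)
    // never starts with a slash.
    public static String normalizePartitionPath(String partitionPath) {
        int start = 0;
        while (start < partitionPath.length() && partitionPath.charAt(start) == '/') {
            start++;
        }
        return partitionPath.substring(start);
    }

    public static void main(String[] args) {
        System.out.println(normalizePartitionPath("/2022/11/19")); // prints 2022/11/19
        System.out.println(normalizePartitionPath("2022/11/19"));  // prints 2022/11/19
    }
}
```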

[GitHub] [hudi] kasured commented on issue #7246: [SUPPORT] Controlling the Archival process retention

2022-11-19 · GitBox
kasured commented on issue #7246: URL: https://github.com/apache/hudi/issues/7246#issuecomment-1320944879 That seems to be a duplicate of https://github.com/apache/hudi/issues/4275

[GitHub] [hudi] kasured opened a new issue, #7246: [SUPPORT] Controlling the Archival process retention

2022-11-19 · GitBox
kasured opened a new issue, #7246: URL: https://github.com/apache/hudi/issues/7246 **Describe the problem you faced** We started seeing OutOfMemory issues while finalizing writes to the Hudi COW table on the step of archival. Documentation is not quite clear how to tune the
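For context on #7246, cleaner and archival retention are controlled through writer properties. The sketch assumes the config keys `hoodie.cleaner.commits.retained`, `hoodie.keep.min.commits` and `hoodie.keep.max.commits`; the ordering check reflects our understanding of what Hudi's write config validates (retained commits < min < max), the values are illustrative rather than recommendations, and the helper names are hypothetical.

```java
import java.util.Properties;

public final class ArchivalRetentionSketch {

    // Builds writer properties for cleaner/archival retention. The keys are
    // assumed Hudi config names; the values passed in are illustrative only.
    public static Properties retentionProps(int cleanerCommitsRetained, int keepMinCommits, int keepMaxCommits) {
        Properties props = new Properties();
        props.setProperty("hoodie.cleaner.commits.retained", Integer.toString(cleanerCommitsRetained));
        props.setProperty("hoodie.keep.min.commits", Integer.toString(keepMinCommits));
        props.setProperty("hoodie.keep.max.commits", Integer.toString(keepMaxCommits));
        return props;
    }

    // Hypothetical check mirroring (as we understand it) the ordering Hudi
    // enforces: cleaner.commits.retained < keep.min.commits < keep.max.commits.
    public static boolean isValidOrdering(Properties props) {
        int retained = Integer.parseInt(props.getProperty("hoodie.cleaner.commits.retained"));
        int min = Integer.parseInt(props.getProperty("hoodie.keep.min.commits"));
        int max = Integer.parseInt(props.getProperty("hoodie.keep.max.commits"));
        return retained < min && min < max;
    }

    public static void main(String[] args) {
        Properties props = retentionProps(10, 20, 30);
        System.out.println(isValidOrdering(props)); // prints true
    }
}
```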

[jira] [Updated] (HUDI-5243) Return num_affected_rows from sql INSERT statement

2022-11-19 · kazdy (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kazdy updated HUDI-5243: Status: In Progress (was: Open) > Return num_affected_rows from sql INSERT statement >

[jira] [Updated] (HUDI-5243) Return num_affected_rows from sql INSERT statement

2022-11-19 · kazdy (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kazdy updated HUDI-5243: Description: Currently when running spark sql DML, in order to check how many rows were affected, users need to

[jira] [Created] (HUDI-5243) Return num_affected_rows from sql INSERT statement

2022-11-19 · kazdy (Jira)
kazdy created HUDI-5243: --- Summary: Return num_affected_rows from sql INSERT statement Key: HUDI-5243 URL: https://issues.apache.org/jira/browse/HUDI-5243 Project: Apache Hudi Issue Type: Improvement

[GitHub] [hudi] xushiyan commented on pull request #7206: [HUDI-5220] fix hive snapshot query add non hoodie paths file status

2022-11-19 · GitBox
xushiyan commented on PR #7206: URL: https://github.com/apache/hudi/pull/7206#issuecomment-1320926100 @onlywangyh thanks for making the patch. would you add a test case pls?

[GitHub] [hudi] xushiyan commented on issue #7209: [SUPPORT] Hudi deltastreamer fails due to Clean

2022-11-19 · GitBox
xushiyan commented on issue #7209: URL: https://github.com/apache/hudi/issues/7209#issuecomment-1320925426 @koldic do you mind zip the `.hoodie/` folder and share? so we can exam what is going on with the commit metadata

[GitHub] [hudi] xushiyan commented on issue #7219: [SUPPORT] Schema update to dataset causing pipeline failure with PostgresDebeziumSource.

2022-11-19 · GitBox
xushiyan commented on issue #7219: URL: https://github.com/apache/hudi/issues/7219#issuecomment-1320924580 > I found the gap. #7225 can you give this a try. @BalaMahesh let us know if it works

[GitHub] [hudi] xushiyan commented on issue #7221: [SUPPORT] Spark could not read Flink created table

2022-11-19 · GitBox
xushiyan commented on issue #7221: URL: https://github.com/apache/hudi/issues/7221#issuecomment-1320924188 > you can load data like this `val df = spark.read.format("hudi").load("hdfs://nameservice/user/hudi/db/table_name")` @punish-yh pls close if this works in your case

[GitHub] [hudi] hudi-bot commented on pull request #7241: [HUDI-5241] Optimize HoodieDefaultTimeline API

2022-11-19 · GitBox
hudi-bot commented on PR #7241: URL: https://github.com/apache/hudi/pull/7241#issuecomment-1320885572 ## CI report: * 3045f14ac99e049be4b40d14906b8aef0f3ed34d UNKNOWN * 16d9382c24677fedab501a76afe08640133ebb64 Azure:

[GitHub] [hudi] SteNicholas commented on pull request #7159: [HUDI-5173]Skip if there is only one file in clusteringGroup

2022-11-19 · GitBox
SteNicholas commented on PR #7159: URL: https://github.com/apache/hudi/pull/7159#issuecomment-1320876523 @zhuanshenbsj1, you should apply above patch and pay attention to the `ITTestHoodieFlinkClustering#testHoodieFlinkClustering` test case because the test case has only one file slice for
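PR #7159 (HUDI-5173) proposes skipping clustering groups that contain only one file. Below is a minimal sketch of that filter on an in-memory list, with a hypothetical `Group` record standing in for Hudi's clustering group. As SteNicholas points out, a test whose input has a single file slice, such as `ITTestHoodieFlinkClustering#testHoodieFlinkClustering`, would likely end up with no groups to cluster after such a filter.

```java
import java.util.List;
import java.util.stream.Collectors;

public final class ClusteringGroupSketch {

    // Hypothetical stand-in for a clustering group: only the IDs of the file
    // slices it would rewrite. This is NOT Hudi's clustering group class.
    public record Group(List<String> fileIds) {}

    // Drop groups that contain a single file slice, following the HUDI-5173
    // title: such a group has no other file to be merged with.
    public static List<Group> skipSingleFileGroups(List<Group> groups) {
        return groups.stream()
                .filter(g -> g.fileIds().size() > 1)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Group> groups = List.of(
                new Group(List.of("fg-1")),
                new Group(List.of("fg-2", "fg-3")));
        System.out.println(skipSingleFileGroups(groups).size()); // prints 1
    }
}
```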

[GitHub] [hudi] hudi-bot commented on pull request #7241: [HUDI-5241] Optimize HoodieDefaultTimeline API

2022-11-19 · GitBox
hudi-bot commented on PR #7241: URL: https://github.com/apache/hudi/pull/7241#issuecomment-1320868668 ## CI report: * 3045f14ac99e049be4b40d14906b8aef0f3ed34d UNKNOWN * 748b8e6d2835054019fdb81da93ce4a963e2a1d2 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #7241: [HUDI-5241] Optimize HoodieDefaultTimeline API

2022-11-19 · GitBox
hudi-bot commented on PR #7241: URL: https://github.com/apache/hudi/pull/7241#issuecomment-1320855373 ## CI report: * 3045f14ac99e049be4b40d14906b8aef0f3ed34d UNKNOWN * 748b8e6d2835054019fdb81da93ce4a963e2a1d2 Azure: