(hudi) branch master updated: [HUDI-7378] Fix Spark SQL DML with custom key generator (#10615)

2024-04-12 Thread yihua
This is an automated email from the ASF dual-hosted git repository. yihua pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 17ea14ab6d6 [HUDI-7378] Fix Spark SQL DML with

Re: [PR] [HUDI-7378] Fix Spark SQL DML with custom key generator [hudi]

2024-04-12 Thread via GitHub
yihua merged PR #10615: URL: https://github.com/apache/hudi/pull/10615 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

Re: [I] [SUPPORT]After compacting, there are a large number of logs with size 0, and they can never be cleared. [hudi]

2024-04-12 Thread via GitHub
MrAladdin commented on issue #11007: URL: https://github.com/apache/hudi/issues/11007#issuecomment-205291 > rollback the compaction I'm not sure which compact to roll back and how to locate it since it has been compacted multiple times already. If it's not addressed, will it be

Re: [I] [SUPPORT]There is a deltacommit that remains in the REQUESTED state [hudi]

2024-04-12 Thread via GitHub
MrAladdin commented on issue #11010: URL: https://github.com/apache/hudi/issues/11010#issuecomment-2052898550 > You can trigger revert with Hudi CLI. Please, how can I restart, can you give me a specific command example? Also, I would like to ask why serial

Re: [PR] [HUDI-7566] Add schema evolution to spark file readers [hudi]

2024-04-12 Thread via GitHub
yihua commented on code in PR #10956: URL: https://github.com/apache/hudi/pull/10956#discussion_r1563527600 ## hudi-spark-datasource/hudi-spark3-common/src/main/scala/org/apache/spark/sql/execution/datasources/Spark3ParquetSchemaEvolutionUtils.scala: ## @@ -0,0 +1,194 @@ +/* +

Re: [PR] [HUDI-7566] Add schema evolution to spark file readers [hudi]

2024-04-12 Thread via GitHub
yihua commented on code in PR #10956: URL: https://github.com/apache/hudi/pull/10956#discussion_r1563514428 ## hudi-spark-datasource/hudi-spark3-common/src/main/scala/org/apache/spark/sql/execution/datasources/Spark3ParquetSchemaEvolutionUtils.scala: ## @@ -0,0 +1,194 @@ +/* +

Re: [PR] [HUDI-7378] Fix Spark SQL DML with custom key generator [hudi]

2024-04-12 Thread via GitHub
hudi-bot commented on PR #10615: URL: https://github.com/apache/hudi/pull/10615#issuecomment-2052767220 ## CI report: * 805ba35b65afbb1daccbcf00291fd520a69c5584 Azure:

Re: [PR] [HUDI-7566] Add schema evolution to spark file readers [hudi]

2024-04-12 Thread via GitHub
yihua commented on code in PR #10956: URL: https://github.com/apache/hudi/pull/10956#discussion_r1563399841 ## hudi-spark-datasource/hudi-spark3.0.x/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/Spark30ParquetReader.scala: ## @@ -142,11 +149,20 @@ class

Re: [PR] [MINOR] Make ordering deterministic in small file selection [hudi]

2024-04-12 Thread via GitHub
hudi-bot commented on PR #11008: URL: https://github.com/apache/hudi/pull/11008#issuecomment-2052737016 ## CI report: * 08eee17c0e936c02e100b65aeba27f81a232452c Azure:

Re: [PR] [MINOR] Make ordering deterministic in small file selection [hudi]

2024-04-12 Thread via GitHub
the-other-tim-brown commented on code in PR #11008: URL: https://github.com/apache/hudi/pull/11008#discussion_r1563428947 ## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/deltacommit/SparkUpsertDeltaCommitPartitioner.java: ## @@ -89,10 +89,13 @@

Re: [PR] [MINOR] Make ordering deterministic in small file selection [hudi]

2024-04-12 Thread via GitHub
danny0405 commented on code in PR #11008: URL: https://github.com/apache/hudi/pull/11008#discussion_r1563427282 ## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/deltacommit/SparkUpsertDeltaCommitPartitioner.java: ## @@ -89,10 +89,13 @@ protected List

Re: [PR] [HUDI-7609] Support array field type whose element type can be nullable [hudi]

2024-04-12 Thread via GitHub
danny0405 commented on code in PR #11006: URL: https://github.com/apache/hudi/pull/11006#discussion_r1563425831 ## hudi-sync/hudi-sync-common/src/main/java/org/apache/hudi/sync/common/util/Parquet2SparkSchemaUtils.java: ## @@ -140,7 +141,7 @@ private static String

Re: [I] [SUPPORT]There is a deltacommit that remains in the REQUESTED state [hudi]

2024-04-12 Thread via GitHub
danny0405 commented on issue #11010: URL: https://github.com/apache/hudi/issues/11010#issuecomment-2052727390 You can trigger revert with Hudi CLI.

Re: [PR] [HUDI-7608] Fix Flink table creation configuration not taking effect when writing… [hudi]

2024-04-12 Thread via GitHub
danny0405 commented on code in PR #11005: URL: https://github.com/apache/hudi/pull/11005#discussion_r1563423256 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/HoodieOptionConfig.scala: ## @@ -43,6 +43,11 @@ object HoodieOptionConfig { */

Re: [I] [SUPPORT]After compacting, there are a large number of logs with size 0, and they can never be cleared. [hudi]

2024-04-12 Thread via GitHub
danny0405 commented on issue #11007: URL: https://github.com/apache/hudi/issues/11007#issuecomment-2052724244 You can roll back the compaction with the CLI, the cleaner would finally clean these logs, because before 1.0, the log cleaning is actually appending new log blocks to the corrupt

Re: [I] [SUPPORT] StreamWriteFunction support Exactly-Once in Flink ? [hudi]

2024-04-12 Thread via GitHub
danny0405 commented on issue #11004: URL: https://github.com/apache/hudi/issues/11004#issuecomment-2052723145 The checkpoint would trigger commit to hudi table.

[I] [SUPPORT]There is a deltacommit that remains in the REQUESTED state [hudi]

2024-04-12 Thread via GitHub
MrAladdin opened a new issue, #11010: URL: https://github.com/apache/hudi/issues/11010 **Describe the problem you faced** There is a deltacommit that remains in the REQUESTED state. Does it have an impact, will it cause data loss, and how to deal with it next? **Environment

Re: [PR] [HUDI-7566] Add schema evolution to spark file readers [hudi]

2024-04-12 Thread via GitHub
hudi-bot commented on PR #10956: URL: https://github.com/apache/hudi/pull/10956#issuecomment-2052706711 ## CI report: * c8f507bcac03c7183893400487a1885400c46853 Azure:

Re: [PR] [HUDI-7378] Fix Spark SQL DML with custom key generator [hudi]

2024-04-12 Thread via GitHub
hudi-bot commented on PR #10615: URL: https://github.com/apache/hudi/pull/10615#issuecomment-2052703726 ## CI report: * dfab8e1285bf0241eea2e71f9d85607c647446d7 Azure:

Re: [PR] [HUDI-7378] Fix Spark SQL DML with custom key generator [hudi]

2024-04-12 Thread via GitHub
hudi-bot commented on PR #10615: URL: https://github.com/apache/hudi/pull/10615#issuecomment-2052699814 ## CI report: * dfab8e1285bf0241eea2e71f9d85607c647446d7 Azure:

[jira] [Updated] (HUDI-7615) Mark a few write configs with the correct sinceVersion

2024-04-12 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7615: Component/s: configs > Mark a few write configs with the correct sinceVersion >

[jira] [Updated] (HUDI-7615) Mark a few write configs with the correct sinceVersion

2024-04-12 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7615: Description: The following write configs are not associated with the correct since version

[jira] [Updated] (HUDI-7615) Mark a few write configs with the correct sinceVersion

2024-04-12 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7615: Fix Version/s: 0.15.0 1.0.0 > Mark a few write configs with the correct sinceVersion >

[jira] [Created] (HUDI-7615) Mark a few write configs with the correct sinceVersion

2024-04-12 Thread Ethan Guo (Jira)
Ethan Guo created HUDI-7615: --- Summary: Mark a few write configs with the correct sinceVersion Key: HUDI-7615 URL: https://issues.apache.org/jira/browse/HUDI-7615 Project: Apache Hudi Issue Type:

[jira] [Assigned] (HUDI-7615) Mark a few write configs with the correct sinceVersion

2024-04-12 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo reassigned HUDI-7615: --- Assignee: tao pan (was: Ethan Guo) > Mark a few write configs with the correct sinceVersion >

[jira] [Assigned] (HUDI-7615) Mark a few write configs with the correct sinceVersion

2024-04-12 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo reassigned HUDI-7615: --- Assignee: Ethan Guo > Mark a few write configs with the correct sinceVersion >

Re: [PR] [HUDI-7378] Fix Spark SQL DML with custom key generator [hudi]

2024-04-12 Thread via GitHub
yihua commented on code in PR #10615: URL: https://github.com/apache/hudi/pull/10615#discussion_r1563323590 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/ProvidesHoodieConfig.scala: ## @@ -530,6 +539,40 @@ object ProvidesHoodieConfig {

Re: [PR] [MINOR] Make ordering deterministic in small file selection [hudi]

2024-04-12 Thread via GitHub
hudi-bot commented on PR #11008: URL: https://github.com/apache/hudi/pull/11008#issuecomment-2052662120 ## CI report: * baaff5d03b4199e0aa188492cfa8a5fe2908a47e Azure:

Re: [PR] [MINOR] Make ordering deterministic in small file selection [hudi]

2024-04-12 Thread via GitHub
hudi-bot commented on PR #11008: URL: https://github.com/apache/hudi/pull/11008#issuecomment-2052654450 ## CI report: * baaff5d03b4199e0aa188492cfa8a5fe2908a47e Azure:

Re: [PR] [HUDI-7566] Add schema evolution to spark file readers [hudi]

2024-04-12 Thread via GitHub
hudi-bot commented on PR #10956: URL: https://github.com/apache/hudi/pull/10956#issuecomment-2052654190 ## CI report: * a73f9559fc8626342b767085cf7a56f743a425fc Azure:

Re: [PR] [HUDI-7566] Add schema evolution to spark file readers [hudi]

2024-04-12 Thread via GitHub
hudi-bot commented on PR #10956: URL: https://github.com/apache/hudi/pull/10956#issuecomment-2052647727 ## CI report: * a73f9559fc8626342b767085cf7a56f743a425fc Azure:

Re: [PR] [HUDI-7378] Fix Spark SQL DML with custom key generator [hudi]

2024-04-12 Thread via GitHub
yihua commented on code in PR #10615: URL: https://github.com/apache/hudi/pull/10615#discussion_r1563245298 ## hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestSparkSqlWithCustomKeyGenerator.scala: ## @@ -0,0 +1,571 @@ +/* + * Licensed to the

Re: [PR] [HUDI-7269] Fallback to key based merge if positions are missing from log block [hudi]

2024-04-12 Thread via GitHub
hudi-bot commented on PR #10991: URL: https://github.com/apache/hudi/pull/10991#issuecomment-2052596559 ## CI report: * 2af03c004aef66248dae6283e9c2f1e63e062e75 Azure:

Re: [PR] [DO NOT MERGE][HUDI-7567] Add schema evolution to the filegroup reader [hudi]

2024-04-12 Thread via GitHub
hudi-bot commented on PR #10957: URL: https://github.com/apache/hudi/pull/10957#issuecomment-2052596475 ## CI report: * 966e8c85f2afb0ffaf00e12d02eb41b41c68e0bc Azure:

Re: [PR] [HUDI-7566] Add schema evolution to spark file readers [hudi]

2024-04-12 Thread via GitHub
hudi-bot commented on PR #10956: URL: https://github.com/apache/hudi/pull/10956#issuecomment-2052596422 ## CI report: * a73f9559fc8626342b767085cf7a56f743a425fc Azure:

[jira] [Assigned] (HUDI-7614) Run hudi-cli tests in Azure CI

2024-04-12 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo reassigned HUDI-7614: --- Assignee: Shawn Chang > Run hudi-cli tests in Azure CI > -- > >

[jira] [Updated] (HUDI-7614) Run hudi-cli tests in Azure CI

2024-04-12 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7614: Description: Right now Azure CI does not run tests in hudi-cli module.  Some tests in hudi-cli module fail

[jira] [Updated] (HUDI-7614) Run hudi-cli tests in Azure CI

2024-04-12 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7614: Epic Link: HUDI-4302 > Run hudi-cli tests in Azure CI > -- > >

[jira] [Updated] (HUDI-7614) Run hudi-cli tests in Azure CI

2024-04-12 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7614: Fix Version/s: 1.0.0 > Run hudi-cli tests in Azure CI > -- > >

[jira] [Created] (HUDI-7614) Run hudi-cli tests in Azure CI

2024-04-12 Thread Ethan Guo (Jira)
Ethan Guo created HUDI-7614: --- Summary: Run hudi-cli tests in Azure CI Key: HUDI-7614 URL: https://issues.apache.org/jira/browse/HUDI-7614 Project: Apache Hudi Issue Type: Improvement

Re: [PR] [HUDI-7378] Fix Spark SQL DML with custom key generator [hudi]

2024-04-12 Thread via GitHub
yihua commented on code in PR #10615: URL: https://github.com/apache/hudi/pull/10615#discussion_r1563198254 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/ProvidesHoodieConfig.scala: ## @@ -530,6 +539,40 @@ object ProvidesHoodieConfig {

Re: [PR] [HUDI-7378] Fix Spark SQL DML with custom key generator [hudi]

2024-04-12 Thread via GitHub
yihua commented on code in PR #10615: URL: https://github.com/apache/hudi/pull/10615#discussion_r1563196323 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/ProvidesHoodieConfig.scala: ## @@ -530,6 +539,40 @@ object ProvidesHoodieConfig {

[jira] [Comment Edited] (HUDI-7610) Delete records are inconsistent depending on MOR/COW, Avro/Spark record merger, new filegroup reader enabled/disabled

2024-04-12 Thread Jonathan Vexler (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17836740#comment-17836740 ] Jonathan Vexler edited comment on HUDI-7610 at 4/12/24 8:40 PM: retest

[jira] [Comment Edited] (HUDI-7610) Delete records are inconsistent depending on MOR/COW, Avro/Spark record merger, new filegroup reader enabled/disabled

2024-04-12 Thread Jonathan Vexler (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17836738#comment-17836738 ] Jonathan Vexler edited comment on HUDI-7610 at 4/12/24 8:40 PM: retest

[jira] [Commented] (HUDI-7610) Delete records are inconsistent depending on MOR/COW, Avro/Spark record merger, new filegroup reader enabled/disabled

2024-04-12 Thread Jonathan Vexler (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17836740#comment-17836740 ] Jonathan Vexler commented on HUDI-7610: --- retest delete where delete precombine is less than insert

Re: [PR] Setup spark and timeline services once where possible [hudi]

2024-04-12 Thread via GitHub
hudi-bot commented on PR #11009: URL: https://github.com/apache/hudi/pull/11009#issuecomment-2052497270 ## CI report: * f6f303c7a2c89f5926d1f8dfda1d39fdd0134cba Azure:

[jira] [Commented] (HUDI-7610) Delete records are inconsistent depending on MOR/COW, Avro/Spark record merger, new filegroup reader enabled/disabled

2024-04-12 Thread Jonathan Vexler (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17836738#comment-17836738 ] Jonathan Vexler commented on HUDI-7610: --- retest because default payload changed in the last few days 

Re: [PR] [MINOR] Hudi CLI 'version' command output empty string [hudi]

2024-04-12 Thread via GitHub
pt657407064 commented on code in PR #10973: URL: https://github.com/apache/hudi/pull/10973#discussion_r1563160252 ## hudi-cli/src/main/resources/application.yml: ## @@ -20,4 +20,7 @@ spring: shell: history: enabled: true - name: hoodie-cmd.log \ No newline

Re: [PR] [HUDI-7378] Fix Spark SQL DML with custom key generator [hudi]

2024-04-12 Thread via GitHub
yihua commented on code in PR #10615: URL: https://github.com/apache/hudi/pull/10615#discussion_r1563151868 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/ProvidesHoodieConfig.scala: ## @@ -528,6 +536,40 @@ object ProvidesHoodieConfig {

[jira] [Updated] (HUDI-7613) Check write/query with Flink and Hive on CustomKeyGenerator

2024-04-12 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7613: Description: https://github.com/apache/hudi/pull/10615/files#r1551075779 > Check write/query with Flink and

[jira] [Created] (HUDI-7613) Check write/query with Flink and Hive on CustomKeyGenerator

2024-04-12 Thread Ethan Guo (Jira)
Ethan Guo created HUDI-7613: --- Summary: Check write/query with Flink and Hive on CustomKeyGenerator Key: HUDI-7613 URL: https://issues.apache.org/jira/browse/HUDI-7613 Project: Apache Hudi Issue

[jira] [Updated] (HUDI-7613) Check write/query with Flink and Hive on CustomKeyGenerator

2024-04-12 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7613: Fix Version/s: 1.0.0 > Check write/query with Flink and Hive on CustomKeyGenerator >

[jira] [Commented] (HUDI-7610) Delete records are inconsistent depending on MOR/COW, Avro/Spark record merger, new filegroup reader enabled/disabled

2024-04-12 Thread Jonathan Vexler (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17836737#comment-17836737 ] Jonathan Vexler commented on HUDI-7610: --- use hoodie is deleted where delete precombine is less than

[jira] [Comment Edited] (HUDI-7610) Delete records are inconsistent depending on MOR/COW, Avro/Spark record merger, new filegroup reader enabled/disabled

2024-04-12 Thread Jonathan Vexler (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17836735#comment-17836735 ] Jonathan Vexler edited comment on HUDI-7610 at 4/12/24 7:58 PM: use hoodie

[jira] [Commented] (HUDI-7610) Delete records are inconsistent depending on MOR/COW, Avro/Spark record merger, new filegroup reader enabled/disabled

2024-04-12 Thread Jonathan Vexler (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17836735#comment-17836735 ] Jonathan Vexler commented on HUDI-7610: --- use hoodie is deleted: {code:java} @Test def

Re: [PR] [HUDI-7269] Fallback to key based merge if positions are missing from log block [hudi]

2024-04-12 Thread via GitHub
hudi-bot commented on PR #10991: URL: https://github.com/apache/hudi/pull/10991#issuecomment-2052437226 ## CI report: * 7dfe5ef7fa89cebfca107cd54ca9f417eff2ba3c Azure:

Re: [PR] [DO NOT MERGE][HUDI-7567] Add schema evolution to the filegroup reader [hudi]

2024-04-12 Thread via GitHub
hudi-bot commented on PR #10957: URL: https://github.com/apache/hudi/pull/10957#issuecomment-2052437106 ## CI report: * ee7a0e3a401dd0d2c0f2d1095256fb9f27c9802f Azure:

Re: [PR] [HUDI-7566] Add schema evolution to spark file readers [hudi]

2024-04-12 Thread via GitHub
hudi-bot commented on PR #10956: URL: https://github.com/apache/hudi/pull/10956#issuecomment-2052437070 ## CI report: * 37bc97b3e080cb3664405a446c0174655720d41c Azure:

Re: [PR] Setup spark and timeline services once where possible [hudi]

2024-04-12 Thread via GitHub
hudi-bot commented on PR #11009: URL: https://github.com/apache/hudi/pull/11009#issuecomment-2052428638 ## CI report: * 819ec8e0c3de67165bd6d54b35d5c708e28d98a0 Azure:

Re: [PR] [HUDI-7269] Fallback to key based merge if positions are missing from log block [hudi]

2024-04-12 Thread via GitHub
hudi-bot commented on PR #10991: URL: https://github.com/apache/hudi/pull/10991#issuecomment-2052428509 ## CI report: * 7dfe5ef7fa89cebfca107cd54ca9f417eff2ba3c Azure:

Re: [PR] [DO NOT MERGE][HUDI-7567] Add schema evolution to the filegroup reader [hudi]

2024-04-12 Thread via GitHub
hudi-bot commented on PR #10957: URL: https://github.com/apache/hudi/pull/10957#issuecomment-2052428384 ## CI report: * ee7a0e3a401dd0d2c0f2d1095256fb9f27c9802f Azure:

Re: [PR] [HUDI-7566] Add schema evolution to spark file readers [hudi]

2024-04-12 Thread via GitHub
hudi-bot commented on PR #10956: URL: https://github.com/apache/hudi/pull/10956#issuecomment-2052428308 ## CI report: * 37bc97b3e080cb3664405a446c0174655720d41c Azure:

Re: [PR] Setup spark and timeline services once where possible [hudi]

2024-04-12 Thread via GitHub
hudi-bot commented on PR #11009: URL: https://github.com/apache/hudi/pull/11009#issuecomment-2052420254 ## CI report: * 819ec8e0c3de67165bd6d54b35d5c708e28d98a0 Azure:

Re: [PR] [MINOR] Make ordering deterministic in small file selection [hudi]

2024-04-12 Thread via GitHub
hudi-bot commented on PR #11008: URL: https://github.com/apache/hudi/pull/11008#issuecomment-2052420166 ## CI report: * baaff5d03b4199e0aa188492cfa8a5fe2908a47e Azure:

[jira] [Created] (HUDI-7612) HoodieSparkRecordMerger does not handle deletes based on the preCombine/ordering field

2024-04-12 Thread Jonathan Vexler (Jira)
Jonathan Vexler created HUDI-7612: - Summary: HoodieSparkRecordMerger does not handle deletes based on the preCombine/ordering field Key: HUDI-7612 URL: https://issues.apache.org/jira/browse/HUDI-7612

[jira] [Created] (HUDI-7611) DELETE operation does not route preCombine/ordering field values to the delete records

2024-04-12 Thread Jonathan Vexler (Jira)
Jonathan Vexler created HUDI-7611: - Summary: DELETE operation does not route preCombine/ordering field values to the delete records Key: HUDI-7611 URL: https://issues.apache.org/jira/browse/HUDI-7611

[jira] [Updated] (HUDI-7611) DELETE operation does not route preCombine/ordering field values to the delete records

2024-04-12 Thread Jonathan Vexler (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Vexler updated HUDI-7611: -- Fix Version/s: 1.0.0 > DELETE operation does not route preCombine/ordering field values to the

[jira] [Closed] (HUDI-7565) Break-up schema evolution: port spark code to file readers

2024-04-12 Thread Jonathan Vexler (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Vexler closed HUDI-7565. - Resolution: Fixed > Break-up schema evolution: port spark code to file readers >

[jira] [Comment Edited] (HUDI-7610) Delete records are inconsistent depending on MOR/COW, Avro/Spark record merger, new filegroup reader enabled/disabled

2024-04-12 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17836729#comment-17836729 ] Ethan Guo edited comment on HUDI-7610 at 4/12/24 7:03 PM: -- Based on offline

[jira] [Commented] (HUDI-7610) Delete records are inconsistent depending on MOR/COW, Avro/Spark record merger, new filegroup reader enabled/disabled

2024-04-12 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17836729#comment-17836729 ] Ethan Guo commented on HUDI-7610: - Based on offline discussion, immediately we see two issues: (1) DELETE

Re: [PR] [MINOR] Make ordering deterministic in small file selection [hudi]

2024-04-12 Thread via GitHub
hudi-bot commented on PR #11008: URL: https://github.com/apache/hudi/pull/11008#issuecomment-2052326296 ## CI report: * baaff5d03b4199e0aa188492cfa8a5fe2908a47e Azure:

Re: [PR] [MINOR] Make ordering deterministic in small file selection [hudi]

2024-04-12 Thread via GitHub
the-other-tim-brown commented on PR #11008: URL: https://github.com/apache/hudi/pull/11008#issuecomment-2052311371 @hudi-bot run azure

Re: [PR] [HUDI-7378] Fix Spark SQL DML with custom key generator [hudi]

2024-04-12 Thread via GitHub
yihua commented on PR #10615: URL: https://github.com/apache/hudi/pull/10615#issuecomment-2052277245 > I like that this has the benefit of not breaking tables with their existing hoodie.table.recordkey.fields, but I am curious about any other approaches you thought about. From you test

Re: [I] [SUPPORT] Questions about LOG in Hudi source code [hudi]

2024-04-12 Thread via GitHub
Gatsby-Lee commented on issue #10903: URL: https://github.com/apache/hudi/issues/10903#issuecomment-2052276305 @danny0405 Thank you for your response. Can you share what needs to be added to the log4j2 config to print the Hudi log into the AWS EMR log?

Re: [PR] [HUDI-7599] add bootstrap mor legacy reader back to default source [hudi]

2024-04-12 Thread via GitHub
yihua merged PR #10990: URL: https://github.com/apache/hudi/pull/10990

(hudi) branch master updated: [HUDI-7599] add bootstrap mor legacy reader back to default source (#10990)

2024-04-12 Thread yihua
This is an automated email from the ASF dual-hosted git repository. yihua pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 56aded81287 [HUDI-7599] add bootstrap mor legacy

Re: [PR] [DO NOT MERGE][HUDI-7567] Add schema evolution to the filegroup reader [hudi]

2024-04-12 Thread via GitHub
hudi-bot commented on PR #10957: URL: https://github.com/apache/hudi/pull/10957#issuecomment-2052198778 ## CI report: * ee7a0e3a401dd0d2c0f2d1095256fb9f27c9802f Azure:

Re: [PR] [HUDI-7604] Make table name config work properly [hudi]

2024-04-12 Thread via GitHub
jonvex commented on code in PR #10998: URL: https://github.com/apache/hudi/pull/10998#discussion_r1562940754 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DataSourceOptions.scala: ## @@ -964,6 +964,11 @@ object DataSourceOptionsHelper { def

Re: [PR] [DO NOT MERGE][HUDI-7566] Add schema evolution to spark file readers [hudi]

2024-04-12 Thread via GitHub
hudi-bot commented on PR #10956: URL: https://github.com/apache/hudi/pull/10956#issuecomment-2052198701 ## CI report: * 37bc97b3e080cb3664405a446c0174655720d41c Azure:

[jira] [Updated] (HUDI-7610) Delete records are inconsistent depending on MOR/COW, Avro/Spark record merger, new filegroup reader enabled/disabled

2024-04-12 Thread Jonathan Vexler (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Vexler updated HUDI-7610: -- Description: Here is a test that can be run on master:   {code:java} @Test def

Re: [PR] [HUDI-7604] Make table name config work properly [hudi]

2024-04-12 Thread via GitHub
yihua commented on code in PR #10998: URL: https://github.com/apache/hudi/pull/10998#discussion_r1562929292 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DataSourceOptions.scala: ## @@ -964,6 +964,11 @@ object DataSourceOptionsHelper { def

(hudi) branch master updated: [HUDI-7565] Create spark file readers to read a single file instead of an entire partition (#10954)

2024-04-12 Thread yihua
This is an automated email from the ASF dual-hosted git repository. yihua pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new f715e8a02e8 [HUDI-7565] Create spark file readers

Re: [PR] [HUDI-7565] Create spark file readers to read a single file instead of an entire partition [hudi]

2024-04-12 Thread via GitHub
yihua merged PR #10954: URL: https://github.com/apache/hudi/pull/10954

Re: [PR] [HUDI-7565] Create spark file readers to read a single file instead of an entire partition [hudi]

2024-04-12 Thread via GitHub
yihua commented on code in PR #10954: URL: https://github.com/apache/hudi/pull/10954#discussion_r1562924017 ## hudi-spark-datasource/hudi-spark/src/test/java/org/apache/hudi/functional/TestSparkHoodieParquetReader.java: ## @@ -0,0 +1,48 @@ +/* + * Licensed to the Apache

[jira] [Updated] (HUDI-7610) Delete records are inconsistent depending on MOR/COW, Avro/Spark record merger, new filegroup reader enabled/disabled

2024-04-12 Thread Jonathan Vexler (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Vexler updated HUDI-7610: -- Description: Here is a test that can be run on master:   {code:java} @Test def

Re: [PR] Setup spark and timeline services once where possible [hudi]

2024-04-12 Thread via GitHub
hudi-bot commented on PR #11009: URL: https://github.com/apache/hudi/pull/11009#issuecomment-2052115383 ## CI report: * 819ec8e0c3de67165bd6d54b35d5c708e28d98a0 Azure:

Re: [PR] Setup spark and timeline services once where possible [hudi]

2024-04-12 Thread via GitHub
hudi-bot commented on PR #11009: URL: https://github.com/apache/hudi/pull/11009#issuecomment-2052104271 ## CI report: * 819ec8e0c3de67165bd6d54b35d5c708e28d98a0 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run

Re: [PR] [MINOR] Make ordering deterministic in small file selection [hudi]

2024-04-12 Thread via GitHub
hudi-bot commented on PR #11008: URL: https://github.com/apache/hudi/pull/11008#issuecomment-2052104198 ## CI report: * baaff5d03b4199e0aa188492cfa8a5fe2908a47e Azure:

[jira] [Updated] (HUDI-7610) Delete records are inconsistent depending on MOR/COW, Avro/Spark record merger, new filegroup reader enabled/disabled

2024-04-12 Thread Jonathan Vexler (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Vexler updated HUDI-7610: -- Description: Here is a test that can be run on master:   {code:java} @Test def

[jira] [Updated] (HUDI-7610) Delete records are inconsistent depending on MOR/COW, Avro/Spark record merger, new filegroup reader enabled/disabled

2024-04-12 Thread Jonathan Vexler (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Vexler updated HUDI-7610: -- Description: Here is a test that can be run on master:   {code:java} @Test def

[jira] [Created] (HUDI-7610) Delete records are inconsistent depending on MOR/COW, Avro/Spark record merger, new filegroup reader enabled/disabled

2024-04-12 Thread Jonathan Vexler (Jira)
Jonathan Vexler created HUDI-7610: - Summary: Delete records are inconsistent depending on MOR/COW, Avro/Spark record merger, new filegroup reader enabled/disabled Key: HUDI-7610 URL:

[PR] Setup spark and timeline services once where possible [hudi]

2024-04-12 Thread via GitHub
the-other-tim-brown opened a new pull request, #11009: URL: https://github.com/apache/hudi/pull/11009 ### Change Logs _Describe context and summary for this change. Highlight if any code was copied._ ### Impact _Describe any public API or user-facing feature change or

Re: [PR] [DO NOT MERGE][HUDI-7567] Add schema evolution to the filegroup reader [hudi]

2024-04-12 Thread via GitHub
hudi-bot commented on PR #10957: URL: https://github.com/apache/hudi/pull/10957#issuecomment-2052016581 ## CI report: * 15acc2e870fb880a56de561be9abb72f28fa588d Azure:

Re: [PR] [HUDI-7565] Create spark file readers to read a single file instead of an entire partition [hudi]

2024-04-12 Thread via GitHub
hudi-bot commented on PR #10954: URL: https://github.com/apache/hudi/pull/10954#issuecomment-2052016503 ## CI report: * 8f1ba6d46d8777f39c522d8bcac545ba3d4fd544 Azure:

Re: [PR] [DO NOT MERGE][HUDI-7567] Add schema evolution to the filegroup reader [hudi]

2024-04-12 Thread via GitHub
hudi-bot commented on PR #10957: URL: https://github.com/apache/hudi/pull/10957#issuecomment-2052005116 ## CI report: * 15acc2e870fb880a56de561be9abb72f28fa588d Azure:

Re: [PR] [DO NOT MERGE][HUDI-7566] Add schema evolution to spark file readers [hudi]

2024-04-12 Thread via GitHub
hudi-bot commented on PR #10956: URL: https://github.com/apache/hudi/pull/10956#issuecomment-2051927813 ## CI report: * 088f69ed54db32d1686caa4f457f6fc9aed0 Azure:

Re: [PR] [MINOR] Make ordering deterministic in small file selection [hudi]

2024-04-12 Thread via GitHub
hudi-bot commented on PR #11008: URL: https://github.com/apache/hudi/pull/11008#issuecomment-2051912011 ## CI report: * e7dde68f9c2bda3e1045d3bcda6c2472072395a0 Azure:

Re: [PR] [DO NOT MERGE][HUDI-7566] Add schema evolution to spark file readers [hudi]

2024-04-12 Thread via GitHub
hudi-bot commented on PR #10956: URL: https://github.com/apache/hudi/pull/10956#issuecomment-2051911694 ## CI report: * 088f69ed54db32d1686caa4f457f6fc9aed0 Azure:

Re: [PR] [MINOR] Make ordering deterministic in small file selection [hudi]

2024-04-12 Thread via GitHub
hudi-bot commented on PR #11008: URL: https://github.com/apache/hudi/pull/11008#issuecomment-2051898070 ## CI report: * e7dde68f9c2bda3e1045d3bcda6c2472072395a0 Azure:

Re: [PR] [HUDI-7565] Create spark file readers to read a single file instead of an entire partition [hudi]

2024-04-12 Thread via GitHub
hudi-bot commented on PR #10954: URL: https://github.com/apache/hudi/pull/10954#issuecomment-2051897716 ## CI report: * 8f1ba6d46d8777f39c522d8bcac545ba3d4fd544 Azure:
