Re: [PR] [HUDI-7597] Add logs of Kafka offsets when the checkpoint is out of bound [hudi]

2024-04-10 Thread via GitHub
hudi-bot commented on PR #10987: URL: https://github.com/apache/hudi/pull/10987#issuecomment-2046625980 ## CI report: * e608ca27d1d69ea9b6d6fe299ea0139f4fed04d5 Azure:

Re: [PR] [HUDI-6993] Support Flink 1.18 [hudi]

2024-04-10 Thread via GitHub
chattarajoy commented on PR #9949: URL: https://github.com/apache/hudi/pull/9949#issuecomment-2046664748 Hey, are there any plans on when this will be released or somewhere I can check the timeline for next release? -- This is an automated message from the Apache Git Service. To respond

Re: [PR] [HUDI-7597] Add logs of Kafka offsets when the checkpoint is out of bound [hudi]

2024-04-10 Thread via GitHub
hudi-bot commented on PR #10987: URL: https://github.com/apache/hudi/pull/10987#issuecomment-2046724112 ## CI report: * a860c65630df6a8f7cc23ae2dc2db5d9215db722 Azure:

[jira] [Comment Edited] (HUDI-7580) Inserting rows into partitioned table leads to data sanity issues

2024-04-10 Thread Vinaykumar Bhat (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835624#comment-17835624 ] Vinaykumar Bhat edited comment on HUDI-7580 at 4/10/24 7:29 AM: I think

Re: [PR] [HUDI-6441] Passing custom Headers with Hudi Callback URL [hudi]

2024-04-10 Thread via GitHub
wombatu-kun commented on PR #10970: URL: https://github.com/apache/hudi/pull/10970#issuecomment-2046784206 @hudi-bot run azure

Re: [PR] [MINOR] Fix BUG: HoodieLogFormatWriter: unable to close output stream for log… [hudi]

2024-04-10 Thread via GitHub
hudi-bot commented on PR #10989: URL: https://github.com/apache/hudi/pull/10989#issuecomment-2047064107 ## CI report: * 0f919fb1f4dfe327567fbd9763769f3e79142dfa Azure:

Re: [PR] [MINOR] Fix BUG: HoodieLogFormatWriter: unable to close output stream for log… [hudi]

2024-04-10 Thread via GitHub
hudi-bot commented on PR #10989: URL: https://github.com/apache/hudi/pull/10989#issuecomment-2046933362 ## CI report: * 0f919fb1f4dfe327567fbd9763769f3e79142dfa Azure:

Re: [PR] [HUDI-7391] HoodieMetadataMetrics should use Metrics instance for metrics registry [hudi]

2024-04-10 Thread via GitHub
nsivabalan commented on code in PR #10635: URL: https://github.com/apache/hudi/pull/10635#discussion_r1559285636 ## hudi-common/src/main/java/org/apache/hudi/metrics/Metrics.java: ## @@ -166,6 +169,17 @@ public void registerGauge(String metricName, final long value) { }

Re: [PR] [HUDI-7457] Remove runtime shutdown hook from HoodieLogFormatWriter [hudi]

2024-04-10 Thread via GitHub
danny0405 closed pull request #10789: [HUDI-7457] Remove runtime shutdown hook from HoodieLogFormatWriter URL: https://github.com/apache/hudi/pull/10789

Re: [PR] [HUDI-7457] Remove runtime shutdown hook from HoodieLogFormatWriter [hudi]

2024-04-10 Thread via GitHub
danny0405 commented on PR #10789: URL: https://github.com/apache/hudi/pull/10789#issuecomment-2046986553 Closing because there is another fix in #10989.

Re: [PR] [HUDI-6297] Fixed issue in consuming transactional topic [hudi]

2024-04-10 Thread via GitHub
danielfordfc commented on PR #9059: URL: https://github.com/apache/hudi/pull/9059#issuecomment-2047017172 Hey @nsivabalan , @bvaradar , @ad1happy2go - is this absolutely not going to happen anytime soon? It's preventing us from directly ingesting a large majority of our Kafka topics in

[jira] [Commented] (HUDI-7580) Inserting rows into partitioned table leads to data sanity issues

2024-04-10 Thread Vinaykumar Bhat (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835624#comment-17835624 ] Vinaykumar Bhat commented on HUDI-7580: --- I think the problem is that spark [rewrites the

[PR] [MINOR] Fix BUG: HoodieLogFormatWriter: unable to close output stream for log… [hudi]

2024-04-10 Thread via GitHub
silly-carbon opened a new pull request, #10989: URL: https://github.com/apache/hudi/pull/10989 Fix BUG: HoodieLogFormatWriter: unable to close output stream for log file HoodieLogFile{xxx} This happens sometimes when users try to shut down Hudi programs manually (and then YARN

Re: [I] [SUPPORT] org.apache.avro.SchemaParseException: Can't redefine decimal field [hudi]

2024-04-10 Thread via GitHub
ad1happy2go commented on issue #10983: URL: https://github.com/apache/hudi/issues/10983#issuecomment-2046872506 Thanks @junkri for raising this. We will look into this.

Re: [I] [SUPPORT]File Not Found Exception occurrs when Flink task read Hudi MOR table failure recover [hudi]

2024-04-10 Thread via GitHub
danny0405 commented on issue #10988: URL: https://github.com/apache/hudi/issues/10988#issuecomment-2046973455 Yeah, I think it may make sense to add this fix: just skip the missing files while recovering from the state by doing an existence check.

Re: [PR] [MINOR] Fix BUG: HoodieLogFormatWriter: unable to close output stream for log… [hudi]

2024-04-10 Thread via GitHub
silly-carbon commented on PR #10989: URL: https://github.com/apache/hudi/pull/10989#issuecomment-2046983505 > Nice catch, I was thinking this exception was caused by concurrent access of the shutdown hook before #10789, it's great if this is actually the culprit. > > Take a look at

[jira] [Comment Edited] (HUDI-7580) Inserting rows into partitioned table leads to data sanity issues

2024-04-10 Thread Vinaykumar Bhat (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835624#comment-17835624 ] Vinaykumar Bhat edited comment on HUDI-7580 at 4/10/24 7:27 AM: I think

(hudi) branch master updated: [HUDI-7597] Add logs of Kafka offsets when the checkpoint is out of bound (#10987)

2024-04-10 Thread codope
This is an automated email from the ASF dual-hosted git repository. codope pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new ee0bcc417be [HUDI-7597] Add logs of Kafka offsets

Re: [PR] [HUDI-7597] Add logs of Kafka offsets when the checkpoint is out of bound [hudi]

2024-04-10 Thread via GitHub
codope merged PR #10987: URL: https://github.com/apache/hudi/pull/10987

Re: [PR] [HUDI-6441] Passing custom Headers with Hudi Callback URL [hudi]

2024-04-10 Thread via GitHub
hudi-bot commented on PR #10970: URL: https://github.com/apache/hudi/pull/10970#issuecomment-2046898402 ## CI report: * dbceeac96d98d8b87a3a771650d59554480cca16 Azure:

Re: [PR] [HUDI-6441] Passing custom Headers with Hudi Callback URL [hudi]

2024-04-10 Thread via GitHub
hudi-bot commented on PR #10970: URL: https://github.com/apache/hudi/pull/10970#issuecomment-2046915753 ## CI report: * dbceeac96d98d8b87a3a771650d59554480cca16 Azure:

Re: [PR] [MINOR] Fix BUG: HoodieLogFormatWriter: unable to close output stream for log… [hudi]

2024-04-10 Thread via GitHub
danny0405 commented on PR #10989: URL: https://github.com/apache/hudi/pull/10989#issuecomment-2046961572 Nice catch, I was thinking this exception was caused by concurrent access of the shutdown hook before https://github.com/apache/hudi/pull/10789, it's great if this is actually the

Re: [PR] [MINOR] Fix BUG: HoodieLogFormatWriter: unable to close output stream for log… [hudi]

2024-04-10 Thread via GitHub
danny0405 commented on PR #10989: URL: https://github.com/apache/hudi/pull/10989#issuecomment-2047056747 > Yes, it was exactly this piece of JDK code that made me come up with this bug fix And the JDK would finally take care of cleaning up the shutdown hooks: ```java /*

Re: [PR] [MINOR] Fix BUG: HoodieLogFormatWriter: unable to close output stream for log… [hudi]

2024-04-10 Thread via GitHub
danny0405 merged PR #10989: URL: https://github.com/apache/hudi/pull/10989

(hudi) branch master updated: [MINOR] Fix BUG: HoodieLogFormatWriter: unable to close output stream for log file HoodieLogFile{xxx} (#10989)

2024-04-10 Thread danny0405
This is an automated email from the ASF dual-hosted git repository. danny0405 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 913c320e4a5 [MINOR] Fix BUG:

Re: [I] [SUPPORT] Duplicate data in base file of MOR table [hudi]

2024-04-10 Thread via GitHub
wqwl611 commented on issue #10882: URL: https://github.com/apache/hudi/issues/10882#issuecomment-2046845414 > @wqwl611 I tried the same configuration but unable to reproduce. As you also mentioned that you are also getting this in very few cases. So we need to verify this on your env. In

Re: [PR] [MINOR] Fix BUG: HoodieLogFormatWriter: unable to close output stream for log… [hudi]

2024-04-10 Thread via GitHub
hudi-bot commented on PR #10989: URL: https://github.com/apache/hudi/pull/10989#issuecomment-2046916008 ## CI report: * 0f919fb1f4dfe327567fbd9763769f3e79142dfa UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run

Re: [PR] [HUDI-7391] HoodieMetadataMetrics should use Metrics instance for metrics registry [hudi]

2024-04-10 Thread via GitHub
hudi-bot commented on PR #10635: URL: https://github.com/apache/hudi/pull/10635#issuecomment-2047552397 ## CI report: * a6b4e7f80ed04f25241504c833f9b85b4331f1fd Azure:

Re: [PR] [DO NOT MERGE][HUDI-7567] Add schema evolution to the filegroup reader [hudi]

2024-04-10 Thread via GitHub
hudi-bot commented on PR #10957: URL: https://github.com/apache/hudi/pull/10957#issuecomment-2047737267 ## CI report: * 31eb84b8fc7e0d8066633ff8f6bc92b14b8660e3 Azure:

[jira] [Updated] (HUDI-7597) Add logs of Kafka offsets when the checkpoint is out of bound

2024-04-10 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7597: Fix Version/s: 0.15.0 1.0.0 > Add logs of Kafka offsets when the checkpoint is out of

[jira] [Assigned] (HUDI-7597) Add logs of Kafka offsets when the checkpoint is out of bound

2024-04-10 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo reassigned HUDI-7597: --- Assignee: Ethan Guo > Add logs of Kafka offsets when the checkpoint is out of bound >

Re: [PR] [HUDI-6441] Passing custom Headers with Hudi Callback URL [hudi]

2024-04-10 Thread via GitHub
hudi-bot commented on PR #10970: URL: https://github.com/apache/hudi/pull/10970#issuecomment-2047595637 ## CI report: * dbceeac96d98d8b87a3a771650d59554480cca16 Azure:

Re: [PR] [DO NOT MERGE][HUDI-7567] Add schema evolution to the filegroup reader [hudi]

2024-04-10 Thread via GitHub
hudi-bot commented on PR #10957: URL: https://github.com/apache/hudi/pull/10957#issuecomment-2047863552 ## CI report: * 9c723d060870d975efca67f769816a98bb662c49 Azure:

Re: [PR] [HUDI-6441] Passing custom Headers with Hudi Callback URL [hudi]

2024-04-10 Thread via GitHub
hudi-bot commented on PR #10970: URL: https://github.com/apache/hudi/pull/10970#issuecomment-2047863708 ## CI report: * 8ab6f394dd71631fd7be5d8ef6fcceb7ac89e584 Azure:

Re: [I] [SUPPORT]File Not Found Exception occurrs when Flink task read Hudi MOR table failure recover [hudi]

2024-04-10 Thread via GitHub
Sparsamkeit commented on issue #10988: URL: https://github.com/apache/hudi/issues/10988#issuecomment-2047424467 > Yeah, I think it may make sense to add this fix: just skip the missing files while recovering from the state by doing an existence check. @danny0405 Will skipping cause

Re: [PR] [HUDI-7391] HoodieMetadataMetrics should use Metrics instance for metrics registry [hudi]

2024-04-10 Thread via GitHub
lokeshj1703 commented on code in PR #10635: URL: https://github.com/apache/hudi/pull/10635#discussion_r1559373670 ## hudi-common/src/main/java/org/apache/hudi/metrics/Metrics.java: ## @@ -166,6 +169,17 @@ public void registerGauge(String metricName, final long value) { }

Re: [PR] [HUDI-6441] Passing custom Headers with Hudi Callback URL [hudi]

2024-04-10 Thread via GitHub
hudi-bot commented on PR #10970: URL: https://github.com/apache/hudi/pull/10970#issuecomment-2047755640 ## CI report: * 8ab6f394dd71631fd7be5d8ef6fcceb7ac89e584 Azure:

Re: [PR] [DO NOT MERGE][HUDI-7567] Add schema evolution to the filegroup reader [hudi]

2024-04-10 Thread via GitHub
hudi-bot commented on PR #10957: URL: https://github.com/apache/hudi/pull/10957#issuecomment-2047755442 ## CI report: * 31eb84b8fc7e0d8066633ff8f6bc92b14b8660e3 Azure:

Re: [PR] [HUDI-6441] Passing custom Headers with Hudi Callback URL [hudi]

2024-04-10 Thread via GitHub
wombatu-kun commented on PR #10970: URL: https://github.com/apache/hudi/pull/10970#issuecomment-2047788583 @hudi-bot run azure

[jira] [Updated] (HUDI-7598) Remove duplicate methods in subclasses of HoodieSparkClientTestBase to enhance reusability

2024-04-10 Thread Sagar Sumit (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Sumit updated HUDI-7598: -- Summary: Remove duplicate methods in subclasses of HoodieSparkClientTestBase to enhance reusability

Re: [PR] [HUDI-6441] Passing custom Headers with Hudi Callback URL [hudi]

2024-04-10 Thread via GitHub
hudi-bot commented on PR #10970: URL: https://github.com/apache/hudi/pull/10970#issuecomment-2047572876 ## CI report: * dbceeac96d98d8b87a3a771650d59554480cca16 Azure:

Re: [PR] [HUDI-7391] HoodieMetadataMetrics should use Metrics instance for metrics registry [hudi]

2024-04-10 Thread via GitHub
hudi-bot commented on PR #10635: URL: https://github.com/apache/hudi/pull/10635#issuecomment-2047570358 ## CI report: * a6b4e7f80ed04f25241504c833f9b85b4331f1fd Azure:

Re: [I] [SUPPORT] Querying Hudi tables with Spark+Velox(C++), ObjectSizeCalculator.getObjectSize hangs causing about a 50-second delay in queries [hudi]

2024-04-10 Thread via GitHub
Zouxxyy commented on issue #10580: URL: https://github.com/apache/hudi/issues/10580#issuecomment-2047723018 Same problem, -Djol.skipHotspotSAAttach=true works! Thanks

Re: [PR] [HUDI-7144] Build storage partition stats index and use it for data skipping [hudi]

2024-04-10 Thread via GitHub
codope commented on code in PR #10352: URL: https://github.com/apache/hudi/pull/10352#discussion_r1559706816 ## hudi-common/src/main/java/org/apache/hudi/common/util/BaseFileUtils.java: ## @@ -64,6 +67,51 @@ public static BaseFileUtils getInstance(HoodieFileFormat fileFormat)

[jira] [Updated] (HUDI-7590) Add configs to choose HoodieStorage and reader/writer implementation

2024-04-10 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7590: Fix Version/s: 0.15.0 1.0.0 > Add configs to choose HoodieStorage and reader/writer

[jira] [Created] (HUDI-7598) duplicate methods in subclasses of HoodieSparkClientTestBase to enahnce reusability

2024-04-10 Thread Sagar Sumit (Jira)
Sagar Sumit created HUDI-7598: - Summary: duplicate methods in subclasses of HoodieSparkClientTestBase to enahnce reusability Key: HUDI-7598 URL: https://issues.apache.org/jira/browse/HUDI-7598 Project:

Re: [PR] [HUDI-7144] Build storage partition stats index and use it for data skipping [hudi]

2024-04-10 Thread via GitHub
codope commented on code in PR #10352: URL: https://github.com/apache/hudi/pull/10352#discussion_r1559743268 ## hudi-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataPayload.java: ## @@ -641,6 +642,36 @@ public static Stream createColumnStatsRecords(String

Re: [PR] [HUDI-7391] HoodieMetadataMetrics should use Metrics instance for metrics registry [hudi]

2024-04-10 Thread via GitHub
hudi-bot commented on PR #10635: URL: https://github.com/apache/hudi/pull/10635#issuecomment-2047713270 ## CI report: * f5aeb901fa61f9de26faceed3a15e0814bde4cfe Azure:

Re: [PR] [HUDI-6441] Passing custom Headers with Hudi Callback URL [hudi]

2024-04-10 Thread via GitHub
hudi-bot commented on PR #10970: URL: https://github.com/apache/hudi/pull/10970#issuecomment-2047880233 ## CI report: * 8ab6f394dd71631fd7be5d8ef6fcceb7ac89e584 Azure:

Re: [PR] [HUDI-7144] Build storage partition stats index and use it for data skipping [hudi]

2024-04-10 Thread via GitHub
codope commented on code in PR #10352: URL: https://github.com/apache/hudi/pull/10352#discussion_r1559774779 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieFileIndex.scala: ## @@ -346,6 +352,12 @@ case class HoodieFileIndex(spark: SparkSession,

Re: [PR] [DO NOT MERGE][HUDI-7567] Add schema evolution to the filegroup reader [hudi]

2024-04-10 Thread via GitHub
hudi-bot commented on PR #10957: URL: https://github.com/apache/hudi/pull/10957#issuecomment-2048139336 ## CI report: * 1e4657afd949bc1610adb947abd26a35bd89d884 Azure:

Re: [PR] [HUDI-7599] add bootstrap mor legacy reader back to default source [hudi]

2024-04-10 Thread via GitHub
hudi-bot commented on PR #10990: URL: https://github.com/apache/hudi/pull/10990#issuecomment-2048139565 ## CI report: * 8522c4683328ac7aadbc42bd9b69485d3bbdc720 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run

Re: [PR] [HUDI-7391] HoodieMetadataMetrics should use Metrics instance for metrics registry [hudi]

2024-04-10 Thread via GitHub
hudi-bot commented on PR #10635: URL: https://github.com/apache/hudi/pull/10635#issuecomment-2048138517 ## CI report: * f5aeb901fa61f9de26faceed3a15e0814bde4cfe Azure:

Re: [PR] [HUDI-7144] Build storage partition stats index and use it for data skipping [hudi]

2024-04-10 Thread via GitHub
hudi-bot commented on PR #10352: URL: https://github.com/apache/hudi/pull/10352#issuecomment-2048292729 ## CI report: * 53e141b4edb11cfc803af29bf90a8583aeb991f3 Azure:

Re: [PR] [MINOR] Support replacecommit rollback as part of rollbackFailedWrites [hudi]

2024-04-10 Thread via GitHub
hudi-bot commented on PR #10648: URL: https://github.com/apache/hudi/pull/10648#issuecomment-2048397322 ## CI report: * 314969b1a7d2e5e24f099829d2b5e5b0c5b99893 Azure:

Re: [PR] [HUDI-7144] Build storage partition stats index and use it for data skipping [hudi]

2024-04-10 Thread via GitHub
codope commented on code in PR #10352: URL: https://github.com/apache/hudi/pull/10352#discussion_r1559769906 ## hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java: ## @@ -1901,4 +1910,162 @@ private static Path filePath(String basePath, String

Re: [PR] [HUDI-7144] Build storage partition stats index and use it for data skipping [hudi]

2024-04-10 Thread via GitHub
codope commented on code in PR #10352: URL: https://github.com/apache/hudi/pull/10352#discussion_r1559776148 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/PartitionStatsIndexSupport.scala: ## @@ -0,0 +1,71 @@ +/* + * Licensed to the Apache Software

Re: [PR] [HUDI-7144] Build storage partition stats index and use it for data skipping [hudi]

2024-04-10 Thread via GitHub
hudi-bot commented on PR #10352: URL: https://github.com/apache/hudi/pull/10352#issuecomment-2048137978 ## CI report: * 2f0f7115ae2321ad0de852643ecda4fe2fd0ce42 Azure:

Re: [PR] [HUDI-7144] Build storage partition stats index and use it for data skipping [hudi]

2024-04-10 Thread via GitHub
codope commented on code in PR #10352: URL: https://github.com/apache/hudi/pull/10352#discussion_r1559767970 ## hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java: ## @@ -1901,4 +1910,162 @@ private static Path filePath(String basePath, String

Re: [PR] [DO NOT MERGE][HUDI-7567] Add schema evolution to the filegroup reader [hudi]

2024-04-10 Thread via GitHub
hudi-bot commented on PR #10957: URL: https://github.com/apache/hudi/pull/10957#issuecomment-2048033789 ## CI report: * 9c723d060870d975efca67f769816a98bb662c49 Azure:

Re: [PR] [HUDI-7144] Build storage partition stats index and use it for data skipping [hudi]

2024-04-10 Thread via GitHub
codope commented on code in PR #10352: URL: https://github.com/apache/hudi/pull/10352#discussion_r1559788412 ## hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestPartitionStatsIndex.scala: ## @@ -0,0 +1,188 @@ +/* + * Licensed to the Apache Software

Re: [PR] [HUDI-7599] add bootstrap mor legacy reader back to default source [hudi]

2024-04-10 Thread via GitHub
hudi-bot commented on PR #10990: URL: https://github.com/apache/hudi/pull/10990#issuecomment-2048204294 ## CI report: * 8522c4683328ac7aadbc42bd9b69485d3bbdc720 Azure:

Re: [PR] [MINOR] Support replacecommit rollback as part of rollbackFailedWrites [hudi]

2024-04-10 Thread via GitHub
hudi-bot commented on PR #10648: URL: https://github.com/apache/hudi/pull/10648#issuecomment-2048386067 ## CI report: * 314969b1a7d2e5e24f099829d2b5e5b0c5b99893 Azure:

Re: [PR] [HUDI-7599] add bootstrap mor legacy reader back to default source [hudi]

2024-04-10 Thread via GitHub
hudi-bot commented on PR #10990: URL: https://github.com/apache/hudi/pull/10990#issuecomment-2048386936 ## CI report: * 8522c4683328ac7aadbc42bd9b69485d3bbdc720 Azure:

Re: [PR] [HUDI-7144] Build storage partition stats index and use it for data skipping [hudi]

2024-04-10 Thread via GitHub
codope commented on code in PR #10352: URL: https://github.com/apache/hudi/pull/10352#discussion_r1559779602 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/PartitionStatsIndexSupport.scala: ## @@ -0,0 +1,71 @@ +/* + * Licensed to the Apache Software

Re: [PR] [DO NOT MERGE][HUDI-7567] Add schema evolution to the filegroup reader [hudi]

2024-04-10 Thread via GitHub
hudi-bot commented on PR #10957: URL: https://github.com/apache/hudi/pull/10957#issuecomment-2048021693 ## CI report: * 9c723d060870d975efca67f769816a98bb662c49 Azure:

[jira] [Updated] (HUDI-7599) Bootstrap MOR with legacy reader broken on master

2024-04-10 Thread Jonathan Vexler (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Vexler updated HUDI-7599: -- Status: Patch Available (was: In Progress) > Bootstrap MOR with legacy reader broken on master

[jira] [Updated] (HUDI-7599) Bootstrap MOR with legacy reader broken on master

2024-04-10 Thread Jonathan Vexler (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Vexler updated HUDI-7599: -- Status: In Progress (was: Open) > Bootstrap MOR with legacy reader broken on master >

[PR] [HUDI-7599] add bootstrap mor legacy reader back to default source [hudi]

2024-04-10 Thread via GitHub
jonvex opened a new pull request, #10990: URL: https://github.com/apache/hudi/pull/10990 ### Change Logs Was accidentally removed by https://github.com/apache/hudi/pull/10304 TestNewHoodieParquetFileFormat will pass now ### Impact legacy reader now has bootstrap

Re: [PR] [HUDI-7144] Build storage partition stats index and use it for data skipping [hudi]

2024-04-10 Thread via GitHub
hudi-bot commented on PR #10352: URL: https://github.com/apache/hudi/pull/10352#issuecomment-2048125079 ## CI report: * 2f0f7115ae2321ad0de852643ecda4fe2fd0ce42 Azure:

[jira] [Updated] (HUDI-7599) Bootstrap MOR with legacy reader broken on master

2024-04-10 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7599: - Labels: pull-request-available (was: ) > Bootstrap MOR with legacy reader broken on master >

Re: [PR] [HUDI-7144] Build storage partition stats index and use it for data skipping [hudi]

2024-04-10 Thread via GitHub
codope commented on code in PR #10352: URL: https://github.com/apache/hudi/pull/10352#discussion_r1559773320 ## hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestPartitionStatsIndex.scala: ## @@ -0,0 +1,188 @@ +/* + * Licensed to the Apache Software

Re: [PR] [HUDI-7144] Build storage partition stats index and use it for data skipping [hudi]

2024-04-10 Thread via GitHub
codope commented on code in PR #10352: URL: https://github.com/apache/hudi/pull/10352#discussion_r1559793410 ## hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java: ## @@ -1915,4 +1924,162 @@ private static Path filePath(String basePath, String

[jira] [Created] (HUDI-7599) Bootstrap MOR with legacy reader broken on master

2024-04-10 Thread Jonathan Vexler (Jira)
Jonathan Vexler created HUDI-7599: - Summary: Bootstrap MOR with legacy reader broken on master Key: HUDI-7599 URL: https://issues.apache.org/jira/browse/HUDI-7599 Project: Apache Hudi Issue

Re: [PR] [HUDI-7144] Build storage partition stats index and use it for data skipping [hudi]

2024-04-10 Thread via GitHub
codope commented on code in PR #10352: URL: https://github.com/apache/hudi/pull/10352#discussion_r1559770723 ## hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java: ## @@ -1901,4 +1910,162 @@ private static Path filePath(String basePath, String

Re: [PR] [HUDI-7144] Build storage partition stats index and use it for data skipping [hudi]

2024-04-10 Thread via GitHub
codope commented on code in PR #10352: URL: https://github.com/apache/hudi/pull/10352#discussion_r1559796223 ## hudi-common/src/main/java/org/apache/hudi/common/util/BaseFileUtils.java: ## @@ -64,6 +67,51 @@ public static BaseFileUtils getInstance(HoodieFileFormat fileFormat)

Re: [PR] [HUDI-7144] Build storage partition stats index and use it for data skipping [hudi]

2024-04-10 Thread via GitHub
codope commented on code in PR #10352: URL: https://github.com/apache/hudi/pull/10352#discussion_r1559794433 ## hudi-common/src/main/java/org/apache/hudi/common/util/BaseFileUtils.java: ## @@ -67,6 +70,50 @@ public static BaseFileUtils getInstance(HoodieFileFormat fileFormat)

Re: [PR] [HUDI-7144] Build storage partition stats index and use it for data skipping [hudi]

2024-04-10 Thread via GitHub
codope commented on code in PR #10352: URL: https://github.com/apache/hudi/pull/10352#discussion_r1559794771 ## hudi-common/src/main/java/org/apache/hudi/common/util/BaseFileUtils.java: ## @@ -67,6 +70,50 @@ public static BaseFileUtils getInstance(HoodieFileFormat fileFormat)

Re: [PR] [HUDI-7391] HoodieMetadataMetrics should use Metrics instance for metrics registry [hudi]

2024-04-10 Thread via GitHub
hudi-bot commented on PR #10635: URL: https://github.com/apache/hudi/pull/10635#issuecomment-2048125781 ## CI report: * f5aeb901fa61f9de26faceed3a15e0814bde4cfe Azure:

[jira] [Updated] (HUDI-7590) Add configs to choose HoodieStorage and reader/writer implementation

2024-04-10 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7590: Labels: hoodie-storage (was: ) > Add configs to choose HoodieStorage and reader/writer implementation >

[jira] [Updated] (HUDI-7269) Fallback to key-based merging if there is no positions in log header

2024-04-10 Thread Jonathan Vexler (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Vexler updated HUDI-7269: -- Attachment: image (7).png > Fallback to key-based merging if there is no positions in log

[jira] [Updated] (HUDI-7269) Fallback to key-based merging if there is no positions in log header

2024-04-10 Thread Jonathan Vexler (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Vexler updated HUDI-7269: -- Attachment: image (6).png > Fallback to key-based merging if there is no positions in log

Re: [PR] [DO NOT MERGE][HUDI-7567] Add schema evolution to the filegroup reader [hudi]

2024-04-10 Thread via GitHub
hudi-bot commented on PR #10957: URL: https://github.com/apache/hudi/pull/10957#issuecomment-2048554300 ## CI report: * 1e4657afd949bc1610adb947abd26a35bd89d884 Azure:

Re: [PR] [MINOR] Support replacecommit rollback as part of rollbackFailedWrites [hudi]

2024-04-10 Thread via GitHub
danny0405 commented on code in PR #10648: URL: https://github.com/apache/hudi/pull/10648#discussion_r1560153283 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieTableServiceClient.java: ## @@ -892,9 +893,25 @@ protected Map>

Re: [PR] [HUDI-7565] Create spark file readers to read a single file instead of an entire partition [hudi]

2024-04-10 Thread via GitHub
yihua commented on code in PR #10954: URL: https://github.com/apache/hudi/pull/10954#discussion_r1560196842 ## hudi-spark-datasource/hudi-spark2/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/Spark24HoodieParquetReader.scala: ## @@ -0,0 +1,222 @@ +/* + *

Re: [PR] [HUDI-7565] Create spark file readers to read a single file instead of an entire partition [hudi]

2024-04-10 Thread via GitHub
yihua commented on code in PR #10954: URL: https://github.com/apache/hudi/pull/10954#discussion_r1560197011 ## hudi-spark-datasource/hudi-spark3.1.x/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/Spark31HoodieParquetReader.scala: ## @@ -0,0 +1,243 @@ +/* + *

[jira] [Updated] (HUDI-7378) Fix Spark SQL DML with custom key generator

2024-04-10 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7378: Remaining Estimate: 0.05h Original Estimate: 0.05h > Fix Spark SQL DML with custom key generator >

[jira] [Updated] (HUDI-7378) Fix Spark SQL DML with custom key generator

2024-04-10 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7378: Story Points: 12 > Fix Spark SQL DML with custom key generator >

Re: [PR] [HUDI-7552] Remove the suffix for MDT table service instants [hudi]

2024-04-10 Thread via GitHub
yihua merged PR #10945: URL: https://github.com/apache/hudi/pull/10945

[jira] [Comment Edited] (HUDI-7269) Fallback to key-based merging if there is no positions in log header

2024-04-10 Thread Jonathan Vexler (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835888#comment-17835888 ] Jonathan Vexler edited comment on HUDI-7269 at 4/10/24 9:03 PM: Did an

Re: [PR] [HUDI-7565] Create spark file readers to read a single file instead of an entire partition [hudi]

2024-04-10 Thread via GitHub
yihua commented on code in PR #10954: URL: https://github.com/apache/hudi/pull/10954#discussion_r1560196572 ## hudi-spark-datasource/hudi-spark2/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/Spark24HoodieParquetReader.scala: ## @@ -0,0 +1,222 @@ +/* + *

Re: [PR] [DO NOT MERGE][HUDI-7567] Add schema evolution to the filegroup reader [hudi]

2024-04-10 Thread via GitHub
hudi-bot commented on PR #10957: URL: https://github.com/apache/hudi/pull/10957#issuecomment-2048648466 ## CI report: * 15acc2e870fb880a56de561be9abb72f28fa588d Azure:

[jira] [Updated] (HUDI-7597) Add logs of Kafka offsets when the checkpoint is out of bound

2024-04-10 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7597: Sprint: Sprint 2024-03-25 > Add logs of Kafka offsets when the checkpoint is out of bound >

[jira] [Updated] (HUDI-7585) Avoid reading log files for resolving schema for _hoodie_operation field

2024-04-10 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7585: Sprint: Sprint 2024-03-25 > Avoid reading log files for resolving schema for _hoodie_operation field >

[jira] [Updated] (HUDI-7378) Fix Spark SQL DML with custom key generator

2024-04-10 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7378: Fix Version/s: 0.15.0 > Fix Spark SQL DML with custom key generator >

[jira] [Updated] (HUDI-7580) Inserting rows into partitioned table leads to data sanity issues

2024-04-10 Thread Sagar Sumit (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Sumit updated HUDI-7580: -- Fix Version/s: 0.15.0 1.0.0 > Inserting rows into partitioned table leads to data

[jira] [Updated] (HUDI-7269) Fallback to key-based merging if there is no positions in log header

2024-04-10 Thread Jonathan Vexler (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Vexler updated HUDI-7269: -- Attachment: image (8).png > Fallback to key-based merging if there is no positions in log

Re: [PR] [MINOR] Support replacecommit rollback as part of rollbackFailedWrites [hudi]

2024-04-10 Thread via GitHub
hudi-bot commented on PR #10648: URL: https://github.com/apache/hudi/pull/10648#issuecomment-2048467722 ## CI report: * b5ef634c3490ffa6703d17f6056badc9faa672e9 Azure:

Re: [I] [SUPPORT]File Not Found Exception occurrs when Flink task read Hudi MOR table failure recover [hudi]

2024-04-10 Thread via GitHub
danny0405 commented on issue #10988: URL: https://github.com/apache/hudi/issues/10988#issuecomment-2048590244 Yes, of course, unless we re-generate the input splits based on the latest snapshot.

[jira] [Updated] (HUDI-7269) Fallback to key-based merging if there is no positions in log header

2024-04-10 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7269: - Labels: pull-request-available (was: ) > Fallback to key-based merging if there is no positions
