Re: [PR] [HUDI-7618] Add ability to ignore checkpoints in delta streamer [hudi]

2024-04-18 Thread via GitHub
hudi-bot commented on PR #11018: URL: https://github.com/apache/hudi/pull/11018#issuecomment-2063520104 ## CI report: * 34446bfde68247607172d6478f92b053642c9c94 Azure:

Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]

2024-04-18 Thread via GitHub
hudi-bot commented on PR #10886: URL: https://github.com/apache/hudi/pull/10886#issuecomment-2063823692 ## CI report: * 522a68cb3ea8dc725418eb9b811a03b5c86c694b Azure:

Re: [PR] [HUDI-7618] Add ability to ignore checkpoints in delta streamer [hudi]

2024-04-18 Thread via GitHub
hudi-bot commented on PR #11018: URL: https://github.com/apache/hudi/pull/11018#issuecomment-2063408847 ## CI report: * b2eda0f44dc17ccc3722be2eecbf001a2c57a955 Azure:

[I] [SUPPORT] HudiDeltaStreaming with Kafka - where is the Kafka Consumer Group? [hudi]

2024-04-18 Thread via GitHub
mattssll opened a new issue, #11051: URL: https://github.com/apache/hudi/issues/11051 **_Tips before filing an issue_** - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)? - Join the mailing list to engage in conversations and get faster support at

Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]

2024-04-18 Thread via GitHub
danny0405 commented on code in PR #10886: URL: https://github.com/apache/hudi/pull/10886#discussion_r1570413829 ## hudi-common/src/main/java/org/apache/hudi/common/model/HoodiePartitionMetadata.java: ## @@ -92,11 +92,12 @@ public int getPartitionDepth() { /** * Write

Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]

2024-04-18 Thread via GitHub
hudi-bot commented on PR #10886: URL: https://github.com/apache/hudi/pull/10886#issuecomment-2063621252 ## CI report: * aadcb616ac338ef60c5799414bef660a19135c06 Azure:

Re: [I] [SUPPORT] java.lang.NoClassDefFoundError: org/apache/hudi/com/fasterxml/jackson/module/scala/DefaultScalaModule$ when doing an Incremental CDC Query in 0.14.1 [hudi]

2024-04-18 Thread via GitHub
VitoMakarevich commented on issue #10590: URL: https://github.com/apache/hudi/issues/10590#issuecomment-2063509773 This is fixed in https://github.com/apache/hudi/pull/10877 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]

2024-04-18 Thread via GitHub
wecharyu commented on code in PR #10886: URL: https://github.com/apache/hudi/pull/10886#discussion_r1570475426 ## hudi-common/src/main/java/org/apache/hudi/common/model/HoodiePartitionMetadata.java: ## @@ -92,11 +92,12 @@ public int getPartitionDepth() { /** * Write

Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]

2024-04-18 Thread via GitHub
hudi-bot commented on PR #10886: URL: https://github.com/apache/hudi/pull/10886#issuecomment-2063608614 ## CI report: * aadcb616ac338ef60c5799414bef660a19135c06 Azure:

Re: [PR] [HUDI-7618] Add ability to ignore checkpoints in delta streamer [hudi]

2024-04-18 Thread via GitHub
hudi-bot commented on PR #11018: URL: https://github.com/apache/hudi/pull/11018#issuecomment-2063394311 ## CI report: * b2eda0f44dc17ccc3722be2eecbf001a2c57a955 Azure:

Re: [PR] [HUDI-7637] Make StoragePathInfo Comparable [hudi]

2024-04-18 Thread via GitHub
codope merged PR #11050: URL: https://github.com/apache/hudi/pull/11050

(hudi) branch master updated: [HUDI-7637] Make StoragePathInfo Comparable (#11050)

2024-04-18 Thread codope
This is an automated email from the ASF dual-hosted git repository. codope pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new b5b14f7d4fa [HUDI-7637] Make StoragePathInfo
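HUDI-7637 makes `StoragePathInfo` comparable. The sketch below is a rough illustration of the general pattern, not the code from PR #11050. It uses a hypothetical `PathInfo` class whose natural ordering is the path string, and it keeps `equals`/`hashCode` consistent with `compareTo`. The fields and the choice of ordering key are assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in for a file-status-like class; not Hudi's StoragePathInfo.
final class PathInfo implements Comparable<PathInfo> {
  private final String path;
  private final long length;

  PathInfo(String path, long length) {
    this.path = Objects.requireNonNull(path, "path");
    this.length = length;
  }

  String getPath() {
    return path;
  }

  long getLength() {
    return length;
  }

  // Natural ordering uses the path only, so sorted listings come out the same on every run.
  @Override
  public int compareTo(PathInfo other) {
    return path.compareTo(other.path);
  }

  // equals/hashCode use the same key as compareTo to stay consistent with it.
  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof PathInfo)) {
      return false;
    }
    return path.equals(((PathInfo) o).path);
  }

  @Override
  public int hashCode() {
    return path.hashCode();
  }
}

final class PathInfoDemo {
  // Returns the paths of the given entries in natural (path) order.
  static List<String> sortedPaths(List<PathInfo> infos) {
    List<PathInfo> copy = new ArrayList<>(infos);
    Collections.sort(copy);
    List<String> paths = new ArrayList<>();
    for (PathInfo info : copy) {
      paths.add(info.getPath());
    }
    return paths;
  }
}
```

Because this sketch compares on the path alone, two entries for the same path with different lengths compare as equal. Whether that is the right choice depends on how the real class is used.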

Re: [PR] [HUDI-7634] Rename HoodieStorage APIs [hudi]

2024-04-18 Thread via GitHub
yihua commented on code in PR #11047: URL: https://github.com/apache/hudi/pull/11047#discussion_r1570710795 ## hudi-hadoop-common/src/main/java/org/apache/hudi/storage/hadoop/HoodieHadoopStorage.java: ## @@ -94,7 +94,7 @@ public boolean createDirectory(StoragePath path) throws

[jira] [Created] (HUDI-7639) Refactor HoodieFileIndex so that different indexes can be used via optimizer rules

2024-04-18 Thread Sagar Sumit (Jira)
Sagar Sumit created HUDI-7639: Summary: Refactor HoodieFileIndex so that different indexes can be used via optimizer rules Key: HUDI-7639 URL: https://issues.apache.org/jira/browse/HUDI-7639 Project:

Re: [I] [Inquiry] Does HoodieIndexer can Do Indexing for RLI Async Fashion [hudi]

2024-04-18 Thread via GitHub
soumilshah1995 commented on issue #10815: URL: https://github.com/apache/hudi/issues/10815#issuecomment-2064849319 Thanks for heads up guys

Re: [PR] [HUDI-6497] Replace FileSystem, Path, and FileStatus usage in `hudi-common` module [hudi]

2024-04-18 Thread via GitHub
hudi-bot commented on PR #10591: URL: https://github.com/apache/hudi/pull/10591#issuecomment-2064394461 ## CI report: * 8207558e8c8714386cf2f71929d6fb08db10617b UNKNOWN * 7c517227bb1079621647852c99dd7836f9900025 UNKNOWN * e89e4e0bcb756832c22779a5ccf259c5e69c0e0d UNKNOWN *

[jira] [Updated] (HUDI-7634) Rename HoodieStorage APIs

2024-04-18 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7634: Sprint: Sprint 2024-03-25 > Rename HoodieStorage APIs > Key:

[jira] [Updated] (HUDI-7636) Make StoragePath Serializable

2024-04-18 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7636: Sprint: Sprint 2024-03-25 > Make StoragePath Serializable

[jira] [Updated] (HUDI-7637) Make StoragePathInfo Comparable

2024-04-18 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7637: Sprint: Sprint 2024-03-25 > Make StoragePathInfo Comparable

[jira] [Updated] (HUDI-7638) Add metrics to HoodieStorage implementation that is not hadoop-dependent

2024-04-18 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7638: Sprint: Sprint 2024-03-25 > Add metrics to HoodieStorage implementation that is not hadoop-dependent

Re: [PR] [HUDI-7567] Add schema evolution to the filegroup reader [hudi]

2024-04-18 Thread via GitHub
jonvex commented on code in PR #10957: URL: https://github.com/apache/hudi/pull/10957#discussion_r1571149642 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieHadoopFsRelationFactory.scala: ## @@ -64,7 +65,7 @@ abstract class

Re: [PR] [HUDI-7567] Add schema evolution to the filegroup reader [hudi]

2024-04-18 Thread via GitHub
jonvex commented on code in PR #10957: URL: https://github.com/apache/hudi/pull/10957#discussion_r1571150439 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/HoodieFileGroupReaderBasedParquetFileFormat.scala: ## @@

Re: [PR] [HUDI-7567] Add schema evolution to the filegroup reader [hudi]

2024-04-18 Thread via GitHub
jonvex commented on code in PR #10957: URL: https://github.com/apache/hudi/pull/10957#discussion_r1571161187 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/HoodieFileGroupReaderBasedParquetFileFormat.scala: ## @@

Re: [PR] [HUDI-7007] Add bloom_filters index support on read side [hudi]

2024-04-18 Thread via GitHub
hudi-bot commented on PR #11043: URL: https://github.com/apache/hudi/pull/11043#issuecomment-2064800265 ## CI report: * 98f4d4d4b61df443ca8c46078d921919f83e8595 Azure:

Re: [PR] [HUDI-7567] Add schema evolution to the filegroup reader [hudi]

2024-04-18 Thread via GitHub
jonvex commented on code in PR #10957: URL: https://github.com/apache/hudi/pull/10957#discussion_r157935 ## hudi-common/src/main/java/org/apache/hudi/common/engine/HoodieReaderContext.java: ## @@ -48,6 +49,12 @@ *and {@code RowData} in Flink. */ public

Re: [PR] [HUDI-7567] Add schema evolution to the filegroup reader [hudi]

2024-04-18 Thread via GitHub
jonvex commented on code in PR #10957: URL: https://github.com/apache/hudi/pull/10957#discussion_r157108 ## hudi-common/src/main/java/org/apache/hudi/avro/AvroSchemaUtils.java: ## @@ -231,7 +231,13 @@ private static Option findNestedField(Schema schema, String[] fiel

Re: [PR] [HUDI-7567] Add schema evolution to the filegroup reader [hudi]

2024-04-18 Thread via GitHub
hudi-bot commented on PR #10957: URL: https://github.com/apache/hudi/pull/10957#issuecomment-2064822612 ## CI report: * 72e09f67466fcfa61b0ec555fc2eecfa52fbb856 Azure:

[jira] [Closed] (HUDI-7635) Add default block size and openSeekable APIs to HoodieStorage

2024-04-18 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo closed HUDI-7635. Resolution: Fixed > Add default block size and openSeekable APIs to HoodieStorage

[jira] [Updated] (HUDI-7633) Use try with resources for AutoCloseable

2024-04-18 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7633: Sprint: Sprint 2024-03-25 > Use try with resources for AutoCloseable
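HUDI-7633 is about using try-with-resources for `AutoCloseable` objects. The minimal Java sketch below shows the language feature itself, using a hypothetical `TrackedResource` class that logs when it is opened and closed. Resources are closed automatically in reverse declaration order, even when the body throws, and they are closed before any `catch` clause of the same statement runs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical resource that records when it is opened and closed.
final class TrackedResource implements AutoCloseable {
  private final String name;
  private final List<String> log;

  TrackedResource(String name, List<String> log) {
    this.name = name;
    this.log = log;
    log.add("open " + name);
  }

  @Override
  public void close() {
    log.add("close " + name);
  }
}

final class TryWithResourcesDemo {
  // Runs a body that uses two resources, optionally failing, and returns the event log.
  static List<String> run(boolean fail) {
    List<String> log = new ArrayList<>();
    try (TrackedResource reader = new TrackedResource("reader", log);
         TrackedResource writer = new TrackedResource("writer", log)) {
      log.add("work");
      if (fail) {
        throw new IllegalStateException("simulated failure");
      }
    } catch (IllegalStateException e) {
      // Both resources have already been closed (writer first) when this runs.
      log.add("caught " + e.getMessage());
    }
    return log;
  }
}
```

In both cases the log shows `close writer` before `close reader`. There is no manual `finally { x.close(); }` that could be forgotten or could mask the original exception.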

[jira] [Updated] (HUDI-7635) Add default block size and openSeekable APIs to HoodieStorage

2024-04-18 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7635: Sprint: Sprint 2024-03-25 > Add default block size and openSeekable APIs to HoodieStorage

[jira] [Closed] (HUDI-7637) Make StoragePathInfo Comparable

2024-04-18 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo closed HUDI-7637. Resolution: Fixed > Make StoragePathInfo Comparable

[jira] [Updated] (HUDI-7639) Refactor HoodieFileIndex so that different indexes can be used via optimizer rules

2024-04-18 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7639: Sprint: Sprint 2024-03-25 > Refactor HoodieFileIndex so that different indexes can be used via optimizer

[jira] [Closed] (HUDI-7636) Make StoragePath Serializable

2024-04-18 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo closed HUDI-7636. Resolution: Fixed > Make StoragePath Serializable > Key:
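HUDI-7636 makes `StoragePath` serializable. A common reason for this kind of change is that path objects end up in closures that an engine such as Spark serializes and ships to executors. That motivation is an assumption here, because the ticket text is truncated. The sketch uses a hypothetical `SimplePath` class, not Hudi's `StoragePath`, and round-trips it through standard Java serialization.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical immutable path wrapper; not Hudi's StoragePath.
final class SimplePath implements Serializable {
  // Explicit version ID so compatible class changes don't break deserialization.
  private static final long serialVersionUID = 1L;

  private final String uri;

  SimplePath(String uri) {
    this.uri = uri;
  }

  String getUri() {
    return uri;
  }

  @Override
  public boolean equals(Object o) {
    return o instanceof SimplePath && uri.equals(((SimplePath) o).uri);
  }

  @Override
  public int hashCode() {
    return uri.hashCode();
  }
}

final class SerDe {
  // Serializes a value to bytes and reads it back, as a task scheduler might.
  static Object roundTrip(Serializable value) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(value);
      }
      try (ObjectInputStream in =
               new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return in.readObject();
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException("serialization round trip failed", e);
    }
  }
}
```

A class that does not implement `Serializable` would fail at `writeObject` with a `NotSerializableException`.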

Re: [PR] [HUDI-7007] Add bloom_filters index support on read side [hudi]

2024-04-18 Thread via GitHub
hudi-bot commented on PR #11043: URL: https://github.com/apache/hudi/pull/11043#issuecomment-2064969766 ## CI report: * 5f8a1f1f175c99c5f1fb36c46de04cee1eaab88e Azure:

Re: [I] [SUPPORT] Flink-Hudi - Upsert into the same Hudi table via two different Flink pipelines (stream and batch) [hudi]

2024-04-18 Thread via GitHub
ChiehFu commented on issue #10914: URL: https://github.com/apache/hudi/issues/10914#issuecomment-2064634809 @danny0405 Is it expected that the checkpoint size of bucket_assigner operator changes significantly from 500GB in the job 2 to less than 50GB in the job 3 mentioned above?

Re: [PR] [HUDI-6497] Replace FileSystem, Path, and FileStatus usage in `hudi-common` module [hudi]

2024-04-18 Thread via GitHub
hudi-bot commented on PR #10591: URL: https://github.com/apache/hudi/pull/10591#issuecomment-2064628258 ## CI report: * 8207558e8c8714386cf2f71929d6fb08db10617b UNKNOWN * 7c517227bb1079621647852c99dd7836f9900025 UNKNOWN * e89e4e0bcb756832c22779a5ccf259c5e69c0e0d UNKNOWN *

Re: [PR] [HUDI-6497] Replace FileSystem, Path, and FileStatus usage in `hudi-common` module [hudi]

2024-04-18 Thread via GitHub
yihua merged PR #10591: URL: https://github.com/apache/hudi/pull/10591

[jira] [Closed] (HUDI-6497) Replace FileSystem, Path, and FileStatus usage in hudi-common

2024-04-18 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo closed HUDI-6497. Resolution: Fixed > Replace FileSystem, Path, and FileStatus usage in hudi-common

Re: [PR] [HUDI-7532] Include only compaction instants for lastCompaction in getDeltaCommitsSinceLatestCompaction [hudi]

2024-04-18 Thread via GitHub
nsivabalan commented on code in PR #10915: URL: https://github.com/apache/hudi/pull/10915#discussion_r1571247566 ## hudi-common/src/main/java/org/apache/hudi/common/table/cdc/HoodieCDCExtractor.java: ## @@ -114,6 +114,24 @@ public Map> extractCDCFileSplits() {

Re: [PR] [HUDI-6497] Replace FileSystem, Path, and FileStatus usage in `hudi-common` module [hudi]

2024-04-18 Thread via GitHub
hudi-bot commented on PR #10591: URL: https://github.com/apache/hudi/pull/10591#issuecomment-2064421164 ## CI report: * 8207558e8c8714386cf2f71929d6fb08db10617b UNKNOWN * 7c517227bb1079621647852c99dd7836f9900025 UNKNOWN * e89e4e0bcb756832c22779a5ccf259c5e69c0e0d UNKNOWN *

[jira] [Updated] (HUDI-7363) Replace unnecessary FileSystem, Path, and FileStatus usage in other modules

2024-04-18 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7363: Labels: hoodie-storage (was: ) > Replace unnecessary FileSystem, Path, and FileStatus usage in other

[jira] [Updated] (HUDI-7363) Replace unnecessary FileSystem, Path, and FileStatus usage in other modules

2024-04-18 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7363: Epic Link: HUDI-6243 > Replace unnecessary FileSystem, Path, and FileStatus usage in other modules

Re: [PR] [HUDI-7567] Add schema evolution to the filegroup reader [hudi]

2024-04-18 Thread via GitHub
jonvex commented on code in PR #10957: URL: https://github.com/apache/hudi/pull/10957#discussion_r1571157403 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/HoodieFileGroupReaderBasedParquetFileFormat.scala: ## @@

Re: [PR] [HUDI-7567] Add schema evolution to the filegroup reader [hudi]

2024-04-18 Thread via GitHub
jonvex commented on code in PR #10957: URL: https://github.com/apache/hudi/pull/10957#discussion_r1571162253 ## hudi-spark-datasource/hudi-spark-common/src/test/scala/org/apache/spark/execution/datasources/parquet/TestHoodieFileGroupReaderBasedParquetFileFormat.scala: ## @@

Re: [PR] [HUDI-7567] Add schema evolution to the filegroup reader [hudi]

2024-04-18 Thread via GitHub
jonvex commented on code in PR #10957: URL: https://github.com/apache/hudi/pull/10957#discussion_r1571163565 ## hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/ddl/TestSpark3DDL.scala: ## @@ -138,6 +138,7 @@ class TestSpark3DDL extends

Re: [PR] [HUDI-7567] Add schema evolution to the filegroup reader [hudi]

2024-04-18 Thread via GitHub
jonvex commented on code in PR #10957: URL: https://github.com/apache/hudi/pull/10957#discussion_r1571170811 ## hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieBaseFileGroupRecordBuffer.java: ## @@ -242,7 +252,44 @@ protected Pair, Schema>

[jira] [Updated] (HUDI-7638) Add metrics to HoodieStorage implementation that is not hadoop-dependent

2024-04-18 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7638: Fix Version/s: 1.0.0 > Add metrics to HoodieStorage implementation that is not hadoop-dependent

[jira] [Updated] (HUDI-7638) Add metrics to HoodieStorage implementation that is not hadoop-dependent

2024-04-18 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7638: Labels: hoodie-storage (was: ) > Add metrics to HoodieStorage implementation that is not hadoop-dependent

[jira] [Updated] (HUDI-7638) Add metrics to HoodieStorage implementation that is not hadoop-dependent

2024-04-18 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7638: Fix Version/s: 0.15.0 > Add metrics to HoodieStorage implementation that is not hadoop-dependent

Re: [PR] [HUDI-6497] Replace FileSystem, Path, and FileStatus usage in `hudi-common` module [hudi]

2024-04-18 Thread via GitHub
yihua commented on PR #10591: URL: https://github.com/apache/hudi/pull/10591#issuecomment-2064406522 The PR is rebased on the latest master and ready to land once CI passes.

Re: [PR] [HUDI-6497] Replace FileSystem, Path, and FileStatus usage in `hudi-common` module [hudi]

2024-04-18 Thread via GitHub
yihua commented on code in PR #10591: URL: https://github.com/apache/hudi/pull/10591#discussion_r1571109412 ## hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieLogFormat.java: ## @@ -281,36 +281,30 @@ public Writer build() throws IOException {

Re: [PR] [HUDI-7567] Add schema evolution to the filegroup reader [hudi]

2024-04-18 Thread via GitHub
jonvex commented on code in PR #10957: URL: https://github.com/apache/hudi/pull/10957#discussion_r1571152424 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/HoodieFileGroupReaderBasedParquetFileFormat.scala: ## @@

Re: [PR] [HUDI-7634] Rename HoodieStorage APIs [hudi]

2024-04-18 Thread via GitHub
nsivabalan commented on PR #11047: URL: https://github.com/apache/hudi/pull/11047#issuecomment-2064791962 #getHoodieStorage -> #getStorage since this is in tests, I am ok. if it had been in source code, we should align method name w/ the class name. ok w/ the patch.

Re: [PR] [HUDI-7567] Add schema evolution to the filegroup reader [hudi]

2024-04-18 Thread via GitHub
hudi-bot commented on PR #10957: URL: https://github.com/apache/hudi/pull/10957#issuecomment-2064993735 ## CI report: * 24be89663d1b95cf7db83dd39378a675a54b98fc Azure:

[jira] [Created] (HUDI-7638) Add metrics to HoodieStorage implementation that is not hadoop-dependent

2024-04-18 Thread Ethan Guo (Jira)
Ethan Guo created HUDI-7638: Summary: Add metrics to HoodieStorage implementation that is not hadoop-dependent Key: HUDI-7638 URL: https://issues.apache.org/jira/browse/HUDI-7638 Project: Apache Hudi

[jira] [Assigned] (HUDI-7638) Add metrics to HoodieStorage implementation that is not hadoop-dependent

2024-04-18 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo reassigned HUDI-7638: Assignee: Ethan Guo > Add metrics to HoodieStorage implementation that is not hadoop-dependent

[jira] [Updated] (HUDI-7638) Add metrics to HoodieStorage implementation that is not hadoop-dependent

2024-04-18 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7638: Epic Link: HUDI-6243 Story Points: 6 > Add metrics to HoodieStorage implementation that is not

Re: [PR] [HUDI-7567] Add schema evolution to the filegroup reader [hudi]

2024-04-18 Thread via GitHub
jonvex commented on code in PR #10957: URL: https://github.com/apache/hudi/pull/10957#discussion_r1571167483 ## hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/ddl/TestSpark3DDL.scala: ## @@ -706,6 +709,8 @@ class TestSpark3DDL extends

Re: [PR] [HUDI-7007] Add bloom_filters index support on read side [hudi]

2024-04-18 Thread via GitHub
hudi-bot commented on PR #11043: URL: https://github.com/apache/hudi/pull/11043#issuecomment-2064823328 ## CI report: * 98f4d4d4b61df443ca8c46078d921919f83e8595 Azure:

Re: [PR] [HUDI-7567] Add schema evolution to the filegroup reader [hudi]

2024-04-18 Thread via GitHub
hudi-bot commented on PR #10957: URL: https://github.com/apache/hudi/pull/10957#issuecomment-2064968952 ## CI report: * 72e09f67466fcfa61b0ec555fc2eecfa52fbb856 Azure:

Re: [PR] [HUDI-7567] Add schema evolution to the filegroup reader [hudi]

2024-04-18 Thread via GitHub
hudi-bot commented on PR #10957: URL: https://github.com/apache/hudi/pull/10957#issuecomment-2065344992 ## CI report: * 24be89663d1b95cf7db83dd39378a675a54b98fc Azure:

[jira] [Assigned] (HUDI-7515) Fix partition metadata write failure

2024-04-18 Thread Danny Chen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen reassigned HUDI-7515: Assignee: Danny Chen > Fix partition metadata write failure

Re: [PR] [HUDI-7567] Add schema evolution to the filegroup reader [hudi]

2024-04-18 Thread via GitHub
hudi-bot commented on PR #10957: URL: https://github.com/apache/hudi/pull/10957#issuecomment-2065564649 ## CI report: * 89078f34a2dafff26d47d8a201a59d8bf8a540ba Azure:

[PR] [HUDI-7640] Uses UUID as temporary file suffix for HoodieWrapperFileSystem.createImmutableFileInPath [hudi]

2024-04-18 Thread via GitHub
danny0405 opened a new pull request, #11052: URL: https://github.com/apache/hudi/pull/11052 ### Change Logs Always use UUID as the temporary file suffix so that the method can be thread-safe. Also moves the method to `HadoopFSUtils` as a static utility method. ### Impact
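The change log above describes giving each call its own UUID temp-file suffix, so that concurrent callers never share a temp file. The sketch below shows that idea on the local filesystem with `java.nio.file`. It assumes a hypothetical `ImmutableFileWriter.createImmutableFile` helper and is not the `HadoopFSUtils` or `HoodieStorage` code from PR #11052. It also does not make the final existence check and rename a single atomic step.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

// Hypothetical helper illustrating the UUID temp-suffix idea; not Hudi's API.
final class ImmutableFileWriter {
  /**
   * Writes content to target via a uniquely named temp file in the same directory.
   * Returns true if this call created target, false if target already existed.
   */
  static boolean createImmutableFile(Path target, byte[] content) {
    if (Files.exists(target)) {
      return false;
    }
    // A per-call UUID suffix gives every caller its own temp file, so concurrent
    // callers never write into (or delete) each other's temp file.
    Path temp = target.resolveSibling(target.getFileName() + "." + UUID.randomUUID() + ".tmp");
    try {
      Files.write(temp, content);
      try {
        // Without REPLACE_EXISTING the move refuses to overwrite an existing target.
        // The existence check and rename are not guaranteed to be one atomic step.
        Files.move(temp, target);
        return true;
      } catch (FileAlreadyExistsException e) {
        return false;
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } finally {
      // Remove the temp file if it was not moved into place; best effort only.
      try {
        Files.deleteIfExists(temp);
      } catch (IOException ignored) {
        // Ignore cleanup failures in this sketch.
      }
    }
  }
}
```

A fixed suffix would let two concurrent callers open the same temp path, where one could truncate or delete the other's partially written file. Per-call names remove that shared state.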

[jira] [Updated] (HUDI-7640) Uses UUID as temporary file suffix for HoodieWrapperFileSystem.createImmutableFileInPath

2024-04-18 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7640: Labels: pull-request-available (was: ) > Uses UUID as temporary file suffix for

Re: [PR] [HUDI-7578] Avoid unnecessary rewriting when copy old data from old base to new base file to improve compaction performance [hudi]

2024-04-18 Thread via GitHub
yihua closed pull request #10980: [HUDI-7578] Avoid unnecessary rewriting when copy old data from old base to new base file to improve compaction performance URL: https://github.com/apache/hudi/pull/10980

Re: [PR] [HUDI-7640] Uses UUID as temporary file suffix for HoodieWrapperFileSystem.createImmutableFileInPath [hudi]

2024-04-18 Thread via GitHub
hudi-bot commented on PR #11052: URL: https://github.com/apache/hudi/pull/11052#issuecomment-2065570145 ## CI report: * 569e14e31d4b352dec8ef4e73c59574c70791056 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run

Re: [PR] [HUDI-7567] Add schema evolution to the filegroup reader [hudi]

2024-04-18 Thread via GitHub
hudi-bot commented on PR #10957: URL: https://github.com/apache/hudi/pull/10957#issuecomment-2065415574 ## CI report: * 96a371f7fca39943737731bd18b9e52af37955e8 Azure:

Re: [PR] [MINOR] Make ordering deterministic in small file selection [hudi]

2024-04-18 Thread via GitHub
hudi-bot commented on PR #11008: URL: https://github.com/apache/hudi/pull/11008#issuecomment-2065467861 ## CI report: * e5a2713d07581824214bcc7b9321e3d1cb371c02 Azure:

Re: [PR] [HUDI-7567] Add schema evolution to the filegroup reader [hudi]

2024-04-18 Thread via GitHub
hudi-bot commented on PR #10957: URL: https://github.com/apache/hudi/pull/10957#issuecomment-2065467751 ## CI report: * 96a371f7fca39943737731bd18b9e52af37955e8 Azure:

Re: [PR] [HUDI-7567] Add schema evolution to the filegroup reader [hudi]

2024-04-18 Thread via GitHub
hudi-bot commented on PR #10957: URL: https://github.com/apache/hudi/pull/10957#issuecomment-2065473691 ## CI report: * 96a371f7fca39943737731bd18b9e52af37955e8 Azure:

Re: [PR] [MINOR] Make ordering deterministic in small file selection [hudi]

2024-04-18 Thread via GitHub
hudi-bot commented on PR #11008: URL: https://github.com/apache/hudi/pull/11008#issuecomment-2065473833 ## CI report: * e5a2713d07581824214bcc7b9321e3d1cb371c02 Azure:

Re: [PR] [MINOR] Make ordering deterministic in small file selection [hudi]

2024-04-18 Thread via GitHub
hudi-bot commented on PR #11008: URL: https://github.com/apache/hudi/pull/11008#issuecomment-2065517817 ## CI report: * 82029e70eec8c77e1c64bf9f751200c6962777ec Azure:

Re: [I] [SUPPORT] Flink-Hudi - Upsert into the same Hudi table via two different Flink pipelines (stream and batch) [hudi]

2024-04-18 Thread via GitHub
danny0405 commented on issue #10914: URL: https://github.com/apache/hudi/issues/10914#issuecomment-2065542560 You may need to read this doc first: https://www.yuque.com/yuzhao-my9fz/kb/flqll8?

[jira] [Updated] (HUDI-7515) Fix partition metadata write failure

2024-04-18 Thread Danny Chen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen updated HUDI-7515: Status: Patch Available (was: In Progress) > Fix partition metadata write failure

[jira] [Updated] (HUDI-7515) Fix partition metadata write failure

2024-04-18 Thread Danny Chen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen updated HUDI-7515: Status: In Progress (was: Open) > Fix partition metadata write failure

[jira] [Updated] (HUDI-7515) Fix partition metadata write failure

2024-04-18 Thread Danny Chen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen updated HUDI-7515: Sprint: Sprint 2024-03-25 > Fix partition metadata write failure

[jira] [Created] (HUDI-7640) Uses UUID as temporary file suffix for HoodieWrapperFileSystem.createImmutableFileInPath

2024-04-18 Thread Danny Chen (Jira)
Danny Chen created HUDI-7640: Summary: Uses UUID as temporary file suffix for HoodieWrapperFileSystem.createImmutableFileInPath Key: HUDI-7640 URL: https://issues.apache.org/jira/browse/HUDI-7640

[jira] [Updated] (HUDI-7640) Uses UUID as temporary file suffix for HoodieWrapperFileSystem.createImmutableFileInPath

2024-04-18 Thread Danny Chen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen updated HUDI-7640: Sprint: Sprint 2024-03-25 > Uses UUID as temporary file suffix for

Re: [PR] [HUDI-7640] Uses UUID as temporary file suffix for HoodieWrapperFileSystem.createImmutableFileInPath [hudi]

2024-04-18 Thread via GitHub
hudi-bot commented on PR #11052: URL: https://github.com/apache/hudi/pull/11052#issuecomment-2065575462 ## CI report: * 569e14e31d4b352dec8ef4e73c59574c70791056 Azure:

[jira] [Updated] (HUDI-7588) Replace hadoop Configuration with StorageConfiguration in hudi-common module

2024-04-18 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7588: Status: In Progress (was: Open) > Replace hadoop Configuration with StorageConfiguration in hudi-common

[jira] [Updated] (HUDI-7515) Fix partition metadata write failure

2024-04-18 Thread Danny Chen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen updated HUDI-7515: Fix Version/s: 0.15.0, 1.0.0 > Fix partition metadata write failure

Re: [PR] [HUDI-7567] Add schema evolution to the filegroup reader [hudi]

2024-04-18 Thread via GitHub
hudi-bot commented on PR #10957: URL: https://github.com/apache/hudi/pull/10957#issuecomment-2065335907 ## CI report: * 24be89663d1b95cf7db83dd39378a675a54b98fc Azure:

Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]

2024-04-18 Thread via GitHub
danny0405 commented on code in PR #10886: URL: https://github.com/apache/hudi/pull/10886#discussion_r1571564604 ## hudi-common/src/main/java/org/apache/hudi/common/model/HoodiePartitionMetadata.java: ## @@ -92,37 +94,33 @@ public int getPartitionDepth() { /** * Write

[jira] [Updated] (HUDI-7640) Uses UUID as temporary file suffix for HoodieWrapperFileSystem.createImmutableFileInPath

2024-04-18 Thread Danny Chen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen updated HUDI-7640: Status: In Progress (was: Open) > Uses UUID as temporary file suffix for

Re: [I] [SUPPORT] Flink-Hudi - Upsert into the same Hudi table via two different Flink pipelines (stream and batch) [hudi]

2024-04-18 Thread via GitHub
ChiehFu commented on issue #10914: URL: https://github.com/apache/hudi/issues/10914#issuecomment-2065368620 In addition, I found some duplicates written by my bulk_insert batch job 1 and upsert stream job 2 (the one that had index bootstrap enabled). For bulk_insert batch job, it had

Re: [PR] [HUDI-7576] Improve efficiency of getRelativePartitionPath, reduce computation of partitionPath in AbstractTableFileSystemView [hudi]

2024-04-18 Thread via GitHub
hudi-bot commented on PR #11001: URL: https://github.com/apache/hudi/pull/11001#issuecomment-2065666160 ## CI report: * 22f01c9e071a9f92747f4af966c9f63056c7216d UNKNOWN * d2f4d099595879917fbefa3bc467e37be5ec4f24 Azure:

Re: [PR] [MINOR] Make ordering deterministic in small file selection [hudi]

2024-04-18 Thread via GitHub
hudi-bot commented on PR #11008: URL: https://github.com/apache/hudi/pull/11008#issuecomment-2065666227 ## CI report: * 82029e70eec8c77e1c64bf9f751200c6962777ec Azure:

Re: [PR] [HUDI-7640] Uses UUID as temporary file suffix for HoodieStorage.createImmutableFileInPath [hudi]

2024-04-18 Thread via GitHub
hudi-bot commented on PR #11052: URL: https://github.com/apache/hudi/pull/11052#issuecomment-2065666353 ## CI report: * 569e14e31d4b352dec8ef4e73c59574c70791056 Azure:

Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]

2024-04-18 Thread via GitHub
hudi-bot commented on PR #10886: URL: https://github.com/apache/hudi/pull/10886#issuecomment-2065675635 ## CI report: * 522a68cb3ea8dc725418eb9b811a03b5c86c694b Azure:

Re: [PR] [HUDI-7575] avoid repeated fetching of pending replace instants [hudi]

2024-04-18 Thread via GitHub
hudi-bot commented on PR #10976: URL: https://github.com/apache/hudi/pull/10976#issuecomment-2065675751 ## CI report: * db99bbcc7ede1bb1372a7996c25cfb54c1069a49 Azure:

Re: [PR] [HUDI-7576] Improve efficiency of getRelativePartitionPath, reduce computation of partitionPath in AbstractTableFileSystemView [hudi]

2024-04-18 Thread via GitHub
hudi-bot commented on PR #11001: URL: https://github.com/apache/hudi/pull/11001#issuecomment-2065675798 ## CI report: * 22f01c9e071a9f92747f4af966c9f63056c7216d UNKNOWN * d2f4d099595879917fbefa3bc467e37be5ec4f24 Azure:

[jira] [Updated] (HUDI-7498) Fix schema for HoodieTimestampAwareParquetInputFormat

2024-04-18 Thread Sagar Sumit (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Sumit updated HUDI-7498: Fix Version/s: 0.15.0 > Fix schema for HoodieTimestampAwareParquetInputFormat

Re: [PR] [HUDI-7640] Uses UUID as temporary file suffix for HoodieStorage.createImmutableFileInPath [hudi]

2024-04-18 Thread via GitHub
danny0405 commented on code in PR #11052: URL: https://github.com/apache/hudi/pull/11052#discussion_r1571748491 ## hudi-io/src/main/java/org/apache/hudi/storage/HoodieStorage.java: ## @@ -267,7 +270,7 @@ public final void createImmutableFileInPath(StoragePath path, if

[jira] [Created] (HUDI-7641) Add metrics to track what partitions are enabled in MDT

2024-04-18 Thread sivabalan narayanan (Jira)
sivabalan narayanan created HUDI-7641: Summary: Add metrics to track what partitions are enabled in MDT Key: HUDI-7641 URL: https://issues.apache.org/jira/browse/HUDI-7641 Project: Apache Hudi

Re: [PR] [HUDI-7429] Fixing average record size estimation for delta commits [hudi]

2024-04-18 Thread via GitHub
the-other-tim-brown commented on code in PR #10763: URL: https://github.com/apache/hudi/pull/10763#discussion_r1571784969 ## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/AverageRecordSizeUtils.java: ## @@ -0,0 +1,90 @@ +/* + * Licensed to the

[jira] [Closed] (HUDI-7625) Avoid unnecessary rewrite for metadata table

2024-04-18 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo closed HUDI-7625. Resolution: Fixed > Avoid unnecessary rewrite for metadata table

Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]

2024-04-18 Thread via GitHub
danny0405 commented on code in PR #10886: URL: https://github.com/apache/hudi/pull/10886#discussion_r1571623276 ## hudi-common/src/main/java/org/apache/hudi/common/model/HoodiePartitionMetadata.java: ## @@ -92,11 +92,12 @@ public int getPartitionDepth() { /** * Write

Re: [PR] [MINOR] Make ordering deterministic in small file selection [hudi]

2024-04-18 Thread via GitHub
the-other-tim-brown commented on code in PR #11008: URL: https://github.com/apache/hudi/pull/11008#discussion_r1571633643 ## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/deltacommit/SparkUpsertDeltaCommitPartitioner.java: ## @@ -89,10 +89,13 @@
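PR #11008 makes the ordering of small-file candidates deterministic. The actual criteria in `SparkUpsertDeltaCommitPartitioner` are not visible in this snippet. This sketch assumes candidates are sorted by size, with the file ID as a tiebreaker, and uses a hypothetical `SmallFileCandidate` class. The point is that ties no longer depend on the order in which files were listed.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical small-file candidate; not Hudi's SmallFile class.
final class SmallFileCandidate {
  final String fileId;
  final long sizeBytes;

  SmallFileCandidate(String fileId, long sizeBytes) {
    this.fileId = fileId;
    this.sizeBytes = sizeBytes;
  }
}

final class SmallFileSelector {
  // Size first, then fileId, so equal-sized files are always ordered the same way.
  static final Comparator<SmallFileCandidate> ORDER =
      Comparator.comparingLong((SmallFileCandidate c) -> c.sizeBytes)
          .thenComparing(c -> c.fileId);

  // Returns candidate file IDs in selection order, independent of input order.
  static List<String> selectOrder(List<SmallFileCandidate> candidates) {
    List<SmallFileCandidate> sorted = new ArrayList<>(candidates);
    sorted.sort(ORDER);
    List<String> ids = new ArrayList<>();
    for (SmallFileCandidate c : sorted) {
      ids.add(c.fileId);
    }
    return ids;
  }
}
```

Without the `thenComparing` tiebreaker, a stable sort keeps equal-sized files in listing order. That order can differ between runs or storage backends.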
