Re: [PR] [HUDI-7567] Add schema evolution to the filegroup reader [hudi]

2024-04-18 Thread via GitHub
jonvex commented on code in PR #10957: URL: https://github.com/apache/hudi/pull/10957#discussion_r1571150439 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/HoodieFileGroupReaderBasedParquetFileFormat.scala: ## @@

Re: [PR] [HUDI-7567] Add schema evolution to the filegroup reader [hudi]

2024-04-18 Thread via GitHub
jonvex commented on code in PR #10957: URL: https://github.com/apache/hudi/pull/10957#discussion_r1571149642 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieHadoopFsRelationFactory.scala: ## @@ -64,7 +65,7 @@ abstract class

Re: [PR] [HUDI-6497] Replace FileSystem, Path, and FileStatus usage in `hudi-common` module [hudi]

2024-04-18 Thread via GitHub
yihua merged PR #10591: URL: https://github.com/apache/hudi/pull/10591 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

Re: [I] [SUPPORT] Flink-Hudi - Upsert into the same Hudi table via two different Flink pipelines (stream and batch) [hudi]

2024-04-18 Thread via GitHub
ChiehFu commented on issue #10914: URL: https://github.com/apache/hudi/issues/10914#issuecomment-2064634809 @danny0405 Is it expected that the checkpoint size of bucket_assigner operator changes significantly from 500GB in the job 2 to less than 50GB in the job 3 mentioned above?

Re: [PR] [HUDI-6497] Replace FileSystem, Path, and FileStatus usage in `hudi-common` module [hudi]

2024-04-18 Thread via GitHub
hudi-bot commented on PR #10591: URL: https://github.com/apache/hudi/pull/10591#issuecomment-2064628258 ## CI report: * 8207558e8c8714386cf2f71929d6fb08db10617b UNKNOWN * 7c517227bb1079621647852c99dd7836f9900025 UNKNOWN * e89e4e0bcb756832c22779a5ccf259c5e69c0e0d UNKNOWN *

Re: [PR] [HUDI-7567] Add schema evolution to the filegroup reader [hudi]

2024-04-18 Thread via GitHub
jonvex commented on code in PR #10957: URL: https://github.com/apache/hudi/pull/10957#discussion_r157935 ## hudi-common/src/main/java/org/apache/hudi/common/engine/HoodieReaderContext.java: ## @@ -48,6 +49,12 @@ *and {@code RowData} in Flink. */ public

Re: [PR] [HUDI-7567] Add schema evolution to the filegroup reader [hudi]

2024-04-18 Thread via GitHub
jonvex commented on code in PR #10957: URL: https://github.com/apache/hudi/pull/10957#discussion_r157108 ## hudi-common/src/main/java/org/apache/hudi/avro/AvroSchemaUtils.java: ## @@ -231,7 +231,13 @@ private static Option findNestedField(Schema schema, String[] fiel

Re: [PR] [HUDI-6497] Replace FileSystem, Path, and FileStatus usage in `hudi-common` module [hudi]

2024-04-18 Thread via GitHub
yihua commented on code in PR #10591: URL: https://github.com/apache/hudi/pull/10591#discussion_r1571109412 ## hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieLogFormat.java: ## @@ -281,36 +281,30 @@ public Writer build() throws IOException {

[jira] [Created] (HUDI-7639) Refactor HoodieFileIndex so that different indexes can be used via optimizer rules

2024-04-18 Thread Sagar Sumit (Jira)
Sagar Sumit created HUDI-7639: - Summary: Refactor HoodieFileIndex so that different indexes can be used via optimizer rules Key: HUDI-7639 URL: https://issues.apache.org/jira/browse/HUDI-7639 Project:

Re: [PR] [HUDI-6497] Replace FileSystem, Path, and FileStatus usage in `hudi-common` module [hudi]

2024-04-18 Thread via GitHub
hudi-bot commented on PR #10591: URL: https://github.com/apache/hudi/pull/10591#issuecomment-2064421164 ## CI report: * 8207558e8c8714386cf2f71929d6fb08db10617b UNKNOWN * 7c517227bb1079621647852c99dd7836f9900025 UNKNOWN * e89e4e0bcb756832c22779a5ccf259c5e69c0e0d UNKNOWN *

[jira] [Updated] (HUDI-7363) Replace unnecessary FileSystem, Path, and FileStatus usage in other modules

2024-04-18 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7363: Epic Link: HUDI-6243 > Replace unnecessary FileSystem, Path, and FileStatus usage in other modules >

[jira] [Updated] (HUDI-7363) Replace unnecessary FileSystem, Path, and FileStatus usage in other modules

2024-04-18 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7363: Labels: hoodie-storage (was: ) > Replace unnecessary FileSystem, Path, and FileStatus usage in other

Re: [PR] [HUDI-6497] Replace FileSystem, Path, and FileStatus usage in `hudi-common` module [hudi]

2024-04-18 Thread via GitHub
yihua commented on PR #10591: URL: https://github.com/apache/hudi/pull/10591#issuecomment-2064406522 The PR is rebased on the latest master and ready to land once CI passes.

[jira] [Updated] (HUDI-7638) Add metrics to HoodieStorage implementation that is not hadoop-dependent

2024-04-18 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7638: Fix Version/s: 1.0.0 > Add metrics to HoodieStorage implementation that is not hadoop-dependent >

[jira] [Updated] (HUDI-7638) Add metrics to HoodieStorage implementation that is not hadoop-dependent

2024-04-18 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7638: Fix Version/s: 0.15.0 > Add metrics to HoodieStorage implementation that is not hadoop-dependent >

[jira] [Updated] (HUDI-7638) Add metrics to HoodieStorage implementation that is not hadoop-dependent

2024-04-18 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7638: Labels: hoodie-storage (was: ) > Add metrics to HoodieStorage implementation that is not hadoop-dependent

[jira] [Updated] (HUDI-7638) Add metrics to HoodieStorage implementation that is not hadoop-dependent

2024-04-18 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7638: Epic Link: HUDI-6243 Story Points: 6 > Add metrics to HoodieStorage implementation that is not

[jira] [Created] (HUDI-7638) Add metrics to HoodieStorage implementation that is not hadoop-dependent

2024-04-18 Thread Ethan Guo (Jira)
Ethan Guo created HUDI-7638: --- Summary: Add metrics to HoodieStorage implementation that is not hadoop-dependent Key: HUDI-7638 URL: https://issues.apache.org/jira/browse/HUDI-7638 Project: Apache Hudi

[jira] [Assigned] (HUDI-7638) Add metrics to HoodieStorage implementation that is not hadoop-dependent

2024-04-18 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo reassigned HUDI-7638: --- Assignee: Ethan Guo > Add metrics to HoodieStorage implementation that is not hadoop-dependent >

Re: [PR] [HUDI-6497] Replace FileSystem, Path, and FileStatus usage in `hudi-common` module [hudi]

2024-04-18 Thread via GitHub
hudi-bot commented on PR #10591: URL: https://github.com/apache/hudi/pull/10591#issuecomment-2064394461 ## CI report: * 8207558e8c8714386cf2f71929d6fb08db10617b UNKNOWN * 7c517227bb1079621647852c99dd7836f9900025 UNKNOWN * e89e4e0bcb756832c22779a5ccf259c5e69c0e0d UNKNOWN *

Re: [PR] [HUDI-7634] Rename HoodieStorage APIs [hudi]

2024-04-18 Thread via GitHub
yihua commented on code in PR #11047: URL: https://github.com/apache/hudi/pull/11047#discussion_r1570710795 ## hudi-hadoop-common/src/main/java/org/apache/hudi/storage/hadoop/HoodieHadoopStorage.java: ## @@ -94,7 +94,7 @@ public boolean createDirectory(StoragePath path) throws

Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]

2024-04-18 Thread via GitHub
hudi-bot commented on PR #10886: URL: https://github.com/apache/hudi/pull/10886#issuecomment-2063823692 ## CI report: * 522a68cb3ea8dc725418eb9b811a03b5c86c694b Azure:

(hudi) branch master updated: [HUDI-7637] Make StoragePathInfo Comparable (#11050)

2024-04-18 Thread codope
This is an automated email from the ASF dual-hosted git repository. codope pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new b5b14f7d4fa [HUDI-7637] Make StoragePathInfo

Re: [PR] [HUDI-7637] Make StoragePathInfo Comparable [hudi]

2024-04-18 Thread via GitHub
codope merged PR #11050: URL: https://github.com/apache/hudi/pull/11050

Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]

2024-04-18 Thread via GitHub
hudi-bot commented on PR #10886: URL: https://github.com/apache/hudi/pull/10886#issuecomment-2063621252 ## CI report: * aadcb616ac338ef60c5799414bef660a19135c06 Azure:

Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]

2024-04-18 Thread via GitHub
hudi-bot commented on PR #10886: URL: https://github.com/apache/hudi/pull/10886#issuecomment-2063608614 ## CI report: * aadcb616ac338ef60c5799414bef660a19135c06 Azure:

Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]

2024-04-18 Thread via GitHub
wecharyu commented on code in PR #10886: URL: https://github.com/apache/hudi/pull/10886#discussion_r1570475426 ## hudi-common/src/main/java/org/apache/hudi/common/model/HoodiePartitionMetadata.java: ## @@ -92,11 +92,12 @@ public int getPartitionDepth() { /** * Write

[I] [SUPPORT] HudiDeltaStreaming with Kafka - where is the Kafka Consumer Group? [hudi]

2024-04-18 Thread via GitHub
mattssll opened a new issue, #11051: URL: https://github.com/apache/hudi/issues/11051 **_Tips before filing an issue_** - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)? - Join the mailing list to engage in conversations and get faster support at

Re: [PR] [HUDI-7618] Add ability to ignore checkpoints in delta streamer [hudi]

2024-04-18 Thread via GitHub
hudi-bot commented on PR #11018: URL: https://github.com/apache/hudi/pull/11018#issuecomment-2063520104 ## CI report: * 34446bfde68247607172d6478f92b053642c9c94 Azure:

Re: [I] [SUPPORT] java.lang.NoClassDefFoundError: org/apache/hudi/com/fasterxml/jackson/module/scala/DefaultScalaModule$ when doing an Incremental CDC Query in 0.14.1 [hudi]

2024-04-18 Thread via GitHub
VitoMakarevich commented on issue #10590: URL: https://github.com/apache/hudi/issues/10590#issuecomment-2063509773 This is fixed in https://github.com/apache/hudi/pull/10877

Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]

2024-04-18 Thread via GitHub
danny0405 commented on code in PR #10886: URL: https://github.com/apache/hudi/pull/10886#discussion_r1570413829 ## hudi-common/src/main/java/org/apache/hudi/common/model/HoodiePartitionMetadata.java: ## @@ -92,11 +92,12 @@ public int getPartitionDepth() { /** * Write

Re: [PR] [HUDI-7618] Add ability to ignore checkpoints in delta streamer [hudi]

2024-04-18 Thread via GitHub
hudi-bot commented on PR #11018: URL: https://github.com/apache/hudi/pull/11018#issuecomment-2063408847 ## CI report: * b2eda0f44dc17ccc3722be2eecbf001a2c57a955 Azure:

Re: [PR] [HUDI-7618] Add ability to ignore checkpoints in delta streamer [hudi]

2024-04-18 Thread via GitHub
hudi-bot commented on PR #11018: URL: https://github.com/apache/hudi/pull/11018#issuecomment-2063394311 ## CI report: * b2eda0f44dc17ccc3722be2eecbf001a2c57a955 Azure:

Re: [PR] [HUDI-7618] Add ability to ignore checkpoints in delta streamer [hudi]

2024-04-18 Thread via GitHub
sampan-s-nayak commented on code in PR #11018: URL: https://github.com/apache/hudi/pull/11018#discussion_r1570287032 ## hudi-utilities/src/main/java/org/apache/hudi/utilities/streamer/HoodieStreamer.java: ## @@ -150,21 +159,27 @@ public HoodieStreamer(Config cfg,

Re: [PR] [HUDI-7618] Add ability to ignore checkpoints in delta streamer [hudi]

2024-04-18 Thread via GitHub
sampan-s-nayak commented on code in PR #11018: URL: https://github.com/apache/hudi/pull/11018#discussion_r1570282467 ## hudi-utilities/src/main/java/org/apache/hudi/utilities/streamer/HoodieStreamer.java: ## @@ -150,21 +159,27 @@ public HoodieStreamer(Config cfg,

Re: [PR] [HUDI-7618] Add ability to ignore checkpoints in delta streamer [hudi]

2024-04-18 Thread via GitHub
rmahindra123 commented on code in PR #11018: URL: https://github.com/apache/hudi/pull/11018#discussion_r1570264478 ## hudi-utilities/src/main/java/org/apache/hudi/utilities/streamer/HoodieStreamer.java: ## @@ -150,21 +159,27 @@ public HoodieStreamer(Config cfg, JavaSparkContext

Re: [PR] [HUDI-7618] Add ability to ignore checkpoints in delta streamer [hudi]

2024-04-18 Thread via GitHub
rmahindra123 commented on code in PR #11018: URL: https://github.com/apache/hudi/pull/11018#discussion_r1570247366 ## hudi-utilities/src/main/java/org/apache/hudi/utilities/streamer/HoodieStreamer.java: ## @@ -150,21 +159,27 @@ public HoodieStreamer(Config cfg, JavaSparkContext

Re: [I] [SUPPORT]Data Loss Issue with Hudi Table After 3 Days of Continuous Writes [hudi]

2024-04-18 Thread via GitHub
juice411 commented on issue #11016: URL: https://github.com/apache/hudi/issues/11016#issuecomment-2063254935 CREATE TABLE if not exists test_simulated_data.ods_table_v1( id int, count_field double, write_time timestamp(0), _part string, proc_time timestamp(3), WATERMARK FOR

Re: [I] [SUPPORT]Data Loss Issue with Hudi Table After 3 Days of Continuous Writes [hudi]

2024-04-18 Thread via GitHub
juice411 commented on issue #11016: URL: https://github.com/apache/hudi/issues/11016#issuecomment-2063253723 CREATE TABLE if not exists test_simulated_data.ods_table_v1( id int, count_field double, write_time timestamp(0), _part string, proc_time timestamp(3), WATERMARK FOR

Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]

2024-04-18 Thread via GitHub
Tartarus0zm commented on code in PR #10886: URL: https://github.com/apache/hudi/pull/10886#discussion_r1570153651 ## hudi-common/src/main/java/org/apache/hudi/common/model/HoodiePartitionMetadata.java: ## @@ -92,11 +92,12 @@ public int getPartitionDepth() { /** * Write
