Re: [PR] [HUDI-7578] Avoid unnecessary rewriting when copy old data from old base to new base file to improve compaction performance [hudi]

2024-04-08 Thread via GitHub
hudi-bot commented on PR #10980: URL: https://github.com/apache/hudi/pull/10980#issuecomment-2044184717 ## CI report: * c382de2b71540404831449de82e40d9488a38575 Azure:

Re: [I] Duplicate Row in Same Partition using Global Bloom Index [hudi]

2024-04-08 Thread via GitHub
Raghvendradubey commented on issue #9536: URL: https://github.com/apache/hudi/issues/9536#issuecomment-2044164961 Hi @ad1happy2go @nsivabalan After migrating to new Hudi version 0.14.0 I didn't face this issue again, thanks for your support. -- This is an automated message from the

Re: [I] [SUPPORT] spark stuctrued streaming failed to update MDT metadata [hudi]

2024-04-08 Thread via GitHub
Qiuzhuang commented on issue #10891: URL: https://github.com/apache/hudi/issues/10891#issuecomment-2044133901 > but wouldn't the in-process lock provider kick in? and should avoid multiple writers to MDT. I am assuming the setup is, spark streaming w/ async compaction or clustering. A single

Re: [PR] [HUDI-7578] Avoid unnecessary rewriting when copy old data from old base to new base file to improve compaction performance [hudi]

2024-04-08 Thread via GitHub
hudi-bot commented on PR #10980: URL: https://github.com/apache/hudi/pull/10980#issuecomment-2044130880 ## CI report: * 36b0e8f8e5e00096b9844f8db6cc51cbc114f42c Azure:

Re: [PR] [HUDI-7578] Avoid unnecessary rewriting when copy old data from old base to new base file to improve compaction performance [hudi]

2024-04-08 Thread via GitHub
hudi-bot commented on PR #10980: URL: https://github.com/apache/hudi/pull/10980#issuecomment-2044125667 ## CI report: * 36b0e8f8e5e00096b9844f8db6cc51cbc114f42c Azure:

Re: [PR] [HUDI-7391] HoodieMetadataMetrics should use Metrics instance for metrics registry [hudi]

2024-04-08 Thread via GitHub
nsivabalan commented on code in PR #10635: URL: https://github.com/apache/hudi/pull/10635#discussion_r1556835447 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataWriteUtils.java: ## @@ -200,6 +200,11 @@ public static HoodieWriteConfig

Re: [PR] [HUDI-7391] HoodieMetadataMetrics should use Metrics instance for metrics registry [hudi]

2024-04-08 Thread via GitHub
nsivabalan commented on code in PR #10635: URL: https://github.com/apache/hudi/pull/10635#discussion_r1556836048 ## hudi-common/src/main/java/org/apache/hudi/metrics/Metrics.java: ## @@ -176,4 +190,16 @@ public static boolean isInitialized(String basePath) { } return

Re: [PR] [HUDI-7395] Fix computation for metrics in HoodieMetadataMetrics [hudi]

2024-04-08 Thread via GitHub
nsivabalan commented on PR #10641: URL: https://github.com/apache/hudi/pull/10641#issuecomment-2044100016 hey @prashantwason : let's decouple the fixes. a. Fixing MDT to emit writer-side metrics (commit duration, compaction duration etc) b. Fixing MDT to emit reader-side metrics (col

Re: [PR] [HUDI-7578] Avoid unnecessary rewriting when copy old data from old base to new base file to improve compaction performance [hudi]

2024-04-08 Thread via GitHub
beyond1920 commented on code in PR #10980: URL: https://github.com/apache/hudi/pull/10980#discussion_r1556818304 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java: ## @@ -147,6 +149,13 @@ public HoodieMergeHandle(HoodieWriteConfig config,

Re: [PR] [HUDI-7578] Avoid unnecessary rewriting when copy old data from old base to new base file to improve compaction performance [hudi]

2024-04-08 Thread via GitHub
beyond1920 commented on code in PR #10980: URL: https://github.com/apache/hudi/pull/10980#discussion_r1556817370 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java: ## @@ -147,6 +149,13 @@ public HoodieMergeHandle(HoodieWriteConfig config,

Re: [I] [SUPPORT] spark stuctrued streaming failed to update MDT metadata [hudi]

2024-04-08 Thread via GitHub
xicm commented on issue #10891: URL: https://github.com/apache/hudi/issues/10891#issuecomment-2044061311 The root cause is that the deltacommit in MDT rolls back the compaction instant (compaction in MDT is a deltacommit). When a compaction starts, it will create an **inflight

[I] [SUPPORT]Exception when executing log compaction : Unsupported Operation Exception [hudi]

2024-04-08 Thread via GitHub
MrAladdin opened a new issue, #10982: URL: https://github.com/apache/hudi/issues/10982 **Describe the problem you faced** 1. Spark upsert into Hudi (MOR) 2. Exception when executing log compaction: Unsupported Operation Exception 3. org.apache.hudi.exception.HoodieRollbackException:

Re: [I] [Inquiry] Does HoodieIndexer can Do Indexing for RLI Async Fashion [hudi]

2024-04-08 Thread via GitHub
nsivabalan commented on issue #10815: URL: https://github.com/apache/hudi/issues/10815#issuecomment-2044048808 hey @ad1happy2go @codope : looks like there is some misunderstanding on how to use the async indexer. When enabling the async indexer to build, say, RLI, ingestion also should have

Re: [I] Duplicate Row in Same Partition using Global Bloom Index [hudi]

2024-04-08 Thread via GitHub
nsivabalan commented on issue #9536: URL: https://github.com/apache/hudi/issues/9536#issuecomment-2044042739 hey @Raghvendradubey : any follow ups on this. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

Re: [I] [SUPPORT]Data loss occurs when using bulkinsert [hudi]

2024-04-08 Thread via GitHub
nsivabalan commented on issue #9748: URL: https://github.com/apache/hudi/issues/9748#issuecomment-2044042481 hey @ad1happy2go : any follow up on this. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

Re: [I] [SUPPORT] After enable speculation execution of spark compaction job, some broken parquet files might be generated [hudi]

2024-04-08 Thread via GitHub
nsivabalan commented on issue #9615: URL: https://github.com/apache/hudi/issues/9615#issuecomment-2044040888 We are going to attempt to fix this issue using completion markers. Will post an update shortly on how we plan to tackle this. But in the meantime, curious to know how you

Re: [I] Enable Hudi Metadata Table and Multi-Modal Index bug [hudi]

2024-04-08 Thread via GitHub
nsivabalan commented on issue #9672: URL: https://github.com/apache/hudi/issues/9672#issuecomment-2044037688 hey @MorningGlow : any follow ups on this? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

Re: [I] too many s3 list when hoodie.metadata.enable=true [hudi]

2024-04-08 Thread via GitHub
nsivabalan commented on issue #9751: URL: https://github.com/apache/hudi/issues/9751#issuecomment-2044036786 hey @njalan @BruceKellan : any follow ups on this. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

Re: [I] [SUPPORT] Facing java.util.NoSuchElementException on EMR 6.12 (Hudi 0.13) with inline compaction and cleaning on MoR tables [hudi]

2024-04-08 Thread via GitHub
nsivabalan commented on issue #9861: URL: https://github.com/apache/hudi/issues/9861#issuecomment-2044035691 hey @ad1happy2go : any follow ups on this? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

Re: [I] [SUPPORT] Compaction error [hudi]

2024-04-08 Thread via GitHub
nsivabalan commented on issue #9885: URL: https://github.com/apache/hudi/issues/9885#issuecomment-2044033752 hey @ad1happy2go : reminder to follow up on this -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

Re: [I] [SUPPORT] AWS Athena query fail when compaction is scheduled for MOR table [hudi]

2024-04-08 Thread via GitHub
nsivabalan commented on issue #9907: URL: https://github.com/apache/hudi/issues/9907#issuecomment-2044029051 hey @codope @rahil-c : are the issues with Athena querying Hudi all fixed as of now, or do we still have any pending gaps? -- This is an automated message from the Apache

Re: [I] [SUPPORT] Data loss in MOR table after clustering partition [hudi]

2024-04-08 Thread via GitHub
nsivabalan commented on issue #9977: URL: https://github.com/apache/hudi/issues/9977#issuecomment-2044027211 hey @ad1happy2go : what's the follow-up on this? Do we need to make any fixes to Hudi, or doc enhancements etc? -- This is an automated message from the Apache Git Service. To

Re: [I] [SUPPORT] Query failure due to replacecommit being archived [hudi]

2024-04-08 Thread via GitHub
nsivabalan commented on issue #10107: URL: https://github.com/apache/hudi/issues/10107#issuecomment-2044026284 hey @haoxie-aws : the linked PRs should fix the issue reported. Are you facing the issue after 0.14.1 as well? -- This is an automated message from the Apache Git

Re: [I] [SUPPORT] Additional records in dataset after clustering [hudi]

2024-04-08 Thread via GitHub
nsivabalan commented on issue #10172: URL: https://github.com/apache/hudi/issues/10172#issuecomment-2044025853 hey @noahtaite : any follow ups on this. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

Re: [I] [SUPPORT] Compaction & Clustering are not working [hudi]

2024-04-08 Thread via GitHub
nsivabalan commented on issue #10183: URL: https://github.com/apache/hudi/issues/10183#issuecomment-2044025493 hey @ad1happy2go : can you follow up on this. @Cpandey43 : yes you are right. enabling async w/ batch writers like spark-ds does not mean much. -- This is an automated

Re: [I] [SUPPORT] INSERT_OVERWRITE_TABLE on subsequent runs fails with a metadata file not found error (v0.14.0) [hudi]

2024-04-08 Thread via GitHub
nsivabalan commented on issue #10445: URL: https://github.com/apache/hudi/issues/10445#issuecomment-2044023506 just to get past the issue, you can completely delete the table and rewrite. or use overwrite mode w/ spark. until we have a proper fix. -- This is an automated message from

Re: [PR] [HUDI-7575] avoid repeated fetching of pending replace instants [hudi]

2024-04-08 Thread via GitHub
danny0405 commented on code in PR #10976: URL: https://github.com/apache/hudi/pull/10976#discussion_r1556736368 ## hudi-common/src/main/java/org/apache/hudi/common/table/view/AbstractTableFileSystemView.java: ## @@ -140,6 +141,22 @@ protected void init(HoodieTableMetaClient

Re: [I] Upsert operation not working and job is running longer while using "Record level index" in Apache Hudi 0.14 in EMR 6.15 [hudi]

2024-04-08 Thread via GitHub
nsivabalan commented on issue #10587: URL: https://github.com/apache/hudi/issues/10587#issuecomment-2043999240 hey @ad1happy2go : do let me know if we find any data consistency issues w/ MDT or RLI. thanks. -- This is an automated message from the Apache Git Service. To respond to the

Re: [I] RLI Spark Hudi Error occurs when executing map [hudi]

2024-04-08 Thread via GitHub
nsivabalan commented on issue #10609: URL: https://github.com/apache/hudi/issues/10609#issuecomment-2043998416 and @ad1happy2go : if you encounter any bugs wrt MDT or RLI, do keep me posted. -- This is an automated message from the Apache Git Service. To respond to the message,

Re: [I] RLI Spark Hudi Error occurs when executing map [hudi]

2024-04-08 Thread via GitHub
nsivabalan commented on issue #10609: URL: https://github.com/apache/hudi/issues/10609#issuecomment-2043998156 hey @bksrepo : can you file a new issue? hey @ad1happy2go : if the original issue is resolved, can we close it out? -- This is an automated message from the Apache Git

Re: [I] File not found while using metadata table for insert_overwrite table [hudi]

2024-04-08 Thread via GitHub
nsivabalan commented on issue #10628: URL: https://github.com/apache/hudi/issues/10628#issuecomment-2043996684 hey @ad1happy2go : if this turns out to be MDT data consistency issue, do keep me posted. thanks. -- This is an automated message from the Apache Git Service. To respond to the

[jira] [Commented] (HUDI-7574) Auto-pilot for Flink Hudi sink tasks

2024-04-08 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835092#comment-17835092 ] Vinoth Chandar commented on HUDI-7574: -- We need to rethink these singleton tasks like cleaning etc. 

[jira] [Updated] (HUDI-7574) Auto-pilot for Flink Hudi sink tasks

2024-04-08 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-7574: - Status: In Progress (was: Open) > Auto-pilot for Flink Hudi sink tasks >

Re: [I] [SUPPORT] Duplicate data in base file of MOR table [hudi]

2024-04-08 Thread via GitHub
nsivabalan commented on issue #10882: URL: https://github.com/apache/hudi/issues/10882#issuecomment-2043992885 hey @ad1happy2go : if this is related to MDT, can you let me know. I am trying to take stock of all MDT data consistency related issues. -- This is an automated message from

[jira] [Updated] (HUDI-7577) Avoid MDT compaction instant time conflicts

2024-04-08 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-7577: - Status: In Progress (was: Open) > Avoid MDT compaction instant time conflicts >

[jira] [Updated] (HUDI-7572) Avoid to schedule empty compaction plan without log files

2024-04-08 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-7572: - Reviewers: Ethan Guo, Sagar Sumit > Avoid to schedule empty compaction plan without log files >

Re: [I] [SUPPORT] IllegalArgumentException at org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:33) [hudi]

2024-04-08 Thread via GitHub
nsivabalan commented on issue #10906: URL: https://github.com/apache/hudi/issues/10906#issuecomment-2043989098 CC @linliu-code -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [I] [SUPPORT] No way to clean `archived/` folder [hudi]

2024-04-08 Thread via GitHub
nsivabalan commented on issue #10930: URL: https://github.com/apache/hudi/issues/10930#issuecomment-2043988319 Maybe we should introduce an ArchivalClean table service to auto-clean files older than, say, 2 months. Not many users are going to inspect the archival timeline after 2+ months. and it

Re: [I] [Feature Inquiry] index for randomized upserts [hudi]

2024-04-08 Thread via GitHub
nsivabalan commented on issue #10961: URL: https://github.com/apache/hudi/issues/10961#issuecomment-2043987312 Just a note: in 0.14.1, RLI is a substitute for a global index, not for any index. For example, if you were using bloom, you can't replace it w/ RLI. Current RLI cannot support same

Re: [I] [SUPPORT] Rollback failed clustering 0.12.2 [hudi]

2024-04-08 Thread via GitHub
nsivabalan commented on issue #10964: URL: https://github.com/apache/hudi/issues/10964#issuecomment-2043986341 hey @suryaprasanna : Can you take this up and offer some suggestions. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

Re: [PR] [HUDI-7575] avoid repeated fetching of pending replace instants [hudi]

2024-04-08 Thread via GitHub
the-other-tim-brown commented on code in PR #10976: URL: https://github.com/apache/hudi/pull/10976#discussion_r1556695938 ## hudi-common/src/main/java/org/apache/hudi/common/table/view/AbstractTableFileSystemView.java: ## @@ -140,6 +141,22 @@ protected void

Re: [PR] [HUDI-7578] Avoid unnecessary rewriting when copy old data from old base to new base file to improve compaction performance [hudi]

2024-04-08 Thread via GitHub
danny0405 commented on code in PR #10980: URL: https://github.com/apache/hudi/pull/10980#discussion_r1556687381 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java: ## @@ -147,6 +149,13 @@ public HoodieMergeHandle(HoodieWriteConfig config,

Re: [PR] [HUDI-7578] Avoid unnecessary rewriting when copy old data from old base to new base file to improve compaction performance [hudi]

2024-04-08 Thread via GitHub
danny0405 commented on code in PR #10980: URL: https://github.com/apache/hudi/pull/10980#discussion_r1556687736 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java: ## @@ -147,6 +149,13 @@ public HoodieMergeHandle(HoodieWriteConfig config,

Re: [PR] [HUDI-7503] Compaction and LogCompaction executions should start a heartbeat on every attempt and block concurrent executions of same plan [hudi]

2024-04-08 Thread via GitHub
danny0405 commented on code in PR #10965: URL: https://github.com/apache/hudi/pull/10965#discussion_r1556682595 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java: ## @@ -1135,8 +1137,34 @@ protected void

Re: [PR] [HUDI-7503] Compaction and LogCompaction executions should start a heartbeat on every attempt and block concurrent executions of same plan [hudi]

2024-04-08 Thread via GitHub
danny0405 commented on code in PR #10965: URL: https://github.com/apache/hudi/pull/10965#discussion_r1556682151 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java: ## @@ -1135,8 +1138,36 @@ protected void

Re: [PR] [HUDI-7503] Compaction and LogCompaction executions should start a heartbeat on every attempt and block concurrent executions of same plan [hudi]

2024-04-08 Thread via GitHub
danny0405 commented on code in PR #10965: URL: https://github.com/apache/hudi/pull/10965#discussion_r1554475930 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java: ## @@ -1135,8 +1138,36 @@ protected void

Re: [PR] [HUDI-7575] avoid repeated fetching of pending replace instants [hudi]

2024-04-08 Thread via GitHub
danny0405 commented on code in PR #10976: URL: https://github.com/apache/hudi/pull/10976#discussion_r1556677633 ## hudi-common/src/main/java/org/apache/hudi/common/table/view/AbstractTableFileSystemView.java: ## @@ -140,6 +141,22 @@ protected void init(HoodieTableMetaClient

Re: [PR] [HUDI-7503] Compaction and LogCompaction executions should start a heartbeat on every attempt and block concurrent executions of same plan [hudi]

2024-04-08 Thread via GitHub
hudi-bot commented on PR #10965: URL: https://github.com/apache/hudi/pull/10965#issuecomment-2043939357 ## CI report: * e1a6e4a24083dd8871a2fc3fbb289e1a6192593a Azure:

Re: [PR] [HUDI-7395] Fix computation for metrics in HoodieMetadataMetrics [hudi]

2024-04-08 Thread via GitHub
prashantwason commented on code in PR #10641: URL: https://github.com/apache/hudi/pull/10641#discussion_r1556617105 ## hudi-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataMetrics.java: ## @@ -136,7 +144,7 @@ public void updateMetrics(String action, long

Re: [PR] [HUDI-7503] Compaction and LogCompaction executions should start a heartbeat on every attempt and block concurrent executions of same plan [hudi]

2024-04-08 Thread via GitHub
hudi-bot commented on PR #10965: URL: https://github.com/apache/hudi/pull/10965#issuecomment-2043855651 ## CI report: * c41af6435281865147967768419da5e4fb688f8b Azure:

Re: [PR] [HUDI-7503] Compaction and LogCompaction executions should start a heartbeat on every attempt and block concurrent executions of same plan [hudi]

2024-04-08 Thread via GitHub
hudi-bot commented on PR #10965: URL: https://github.com/apache/hudi/pull/10965#issuecomment-2043839847 ## CI report: * c41af6435281865147967768419da5e4fb688f8b Azure:

Re: [I] [SUPPORT] Hudi CLI bundle not working [hudi]

2024-04-08 Thread via GitHub
mansipp commented on issue #10566: URL: https://github.com/apache/hudi/issues/10566#issuecomment-2043833097 Getting a similar error while running `commit rollback`: ``` commit rollback --commit 20240408231846380 24/04/08 23:22:02 INFO InputStreamConsumer: Apr 08, 2024 11:22:02

(hudi) branch asf-site updated: [DOCS] Updates slack link across site (#10981)

2024-04-08 Thread bhavanisudha
This is an automated email from the ASF dual-hosted git repository. bhavanisudha pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/asf-site by this push: new a4ec3fc9016 [DOCS] Updates slack link

Re: [PR] [DOCS] Updates slack link across site [hudi]

2024-04-08 Thread via GitHub
bhasudha merged PR #10981: URL: https://github.com/apache/hudi/pull/10981 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

Re: [PR] [HUDI-7503] Compaction and LogCompaction executions should start a heartbeat on every attempt and block concurrent executions of same plan [hudi]

2024-04-08 Thread via GitHub
hudi-bot commented on PR #10965: URL: https://github.com/apache/hudi/pull/10965#issuecomment-2043740065 ## CI report: * c41af6435281865147967768419da5e4fb688f8b Azure:

Re: [PR] [DOCS] Updates slack link across site [hudi]

2024-04-08 Thread via GitHub
bhasudha commented on PR #10981: URL: https://github.com/apache/hudi/pull/10981#issuecomment-2043726886 Tested locally ![Screenshot 2024-04-08 at 3 06 27 PM](https://github.com/apache/hudi/assets/2179254/9070ea06-7658-4f85-a627-10339de6051c) ![Screenshot 2024-04-08 at 3 05 08

[PR] [DOCS] Updates slack link across site [hudi]

2024-04-08 Thread via GitHub
bhasudha opened a new pull request, #10981: URL: https://github.com/apache/hudi/pull/10981 ### Change Logs Update slack link due to expiry of old one. ### Impact Slack link update across website. ### Risk level (write none, low medium or high below) low.

[jira] [Commented] (HUDI-6787) Hive Integrate FileGroupReader with HoodieMergeOnReadSnapshotReader and RealtimeCompactedRecordReader for Hive

2024-04-08 Thread Jonathan Vexler (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835052#comment-17835052 ] Jonathan Vexler commented on HUDI-6787: --- {code:java} root@adhoc-2:/opt# spark-submit \ >   --class

Re: [I] [SUPPORT]insert_overwrite_table table slow [hudi]

2024-04-08 Thread via GitHub
wkhappy1 commented on issue #10979: URL: https://github.com/apache/hudi/issues/10979#issuecomment-2043650074 @ad1happy2go Yes, the table size is 27.1 G; that is the Hudi table in HDFS. I also see an RDD cached on disk with size 503.8 in the Spark UI. Can the cached RDD size be made smaller? It seems to

Re: [PR] [HUDI-7503] Compaction and LogCompaction executions should start a heartbeat on every attempt and block concurrent executions of same plan [hudi]

2024-04-08 Thread via GitHub
hudi-bot commented on PR #10965: URL: https://github.com/apache/hudi/pull/10965#issuecomment-2043606333 ## CI report: * c8e268903a19c7ecc5cd927fd8afa3332a1c3aea Azure:

Re: [PR] [HUDI-7503] Compaction and LogCompaction executions should start a heartbeat on every attempt and block concurrent executions of same plan [hudi]

2024-04-08 Thread via GitHub
kbuci commented on code in PR #10965: URL: https://github.com/apache/hudi/pull/10965#discussion_r1556395911 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java: ## @@ -1135,8 +1138,34 @@ protected void

Re: [PR] [HUDI-7503] Compaction and LogCompaction executions should start a heartbeat on every attempt and block concurrent executions of same plan [hudi]

2024-04-08 Thread via GitHub
hudi-bot commented on PR #10965: URL: https://github.com/apache/hudi/pull/10965#issuecomment-2043593538 ## CI report: * c8e268903a19c7ecc5cd927fd8afa3332a1c3aea Azure:

Re: [PR] [HUDI-7290] Don't assume ReplaceCommits are always Clustering [hudi]

2024-04-08 Thread via GitHub
bvaradar commented on code in PR #10479: URL: https://github.com/apache/hudi/pull/10479#discussion_r1556399110 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/marker/WriteMarkers.java: ## @@ -86,7 +86,7 @@ public Option create(String partitionPath, String

Re: [PR] [HUDI-7290] Don't assume ReplaceCommits are always Clustering [hudi]

2024-04-08 Thread via GitHub
bvaradar commented on code in PR #10479: URL: https://github.com/apache/hudi/pull/10479#discussion_r1556399888 ## hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieDefaultTimeline.java: ## @@ -516,13 +516,40 @@ public Option getLastClusteringInstant() {

Re: [PR] [HUDI-7290] Don't assume ReplaceCommits are always Clustering [hudi]

2024-04-08 Thread via GitHub
hudi-bot commented on PR #10479: URL: https://github.com/apache/hudi/pull/10479#issuecomment-2043517085 ## CI report: * b9b3ae4c3025515e61eca8a7df887eb9fe764b0f Azure:

Re: [PR] [HUDI-7290] Don't assume ReplaceCommits are always Clustering [hudi]

2024-04-08 Thread via GitHub
hudi-bot commented on PR #10479: URL: https://github.com/apache/hudi/pull/10479#issuecomment-2043429116 ## CI report: * 0a5e5faa01273113cb974e9aa31cfb54d62dff67 Azure:

Re: [PR] [HUDI-7290] Don't assume ReplaceCommits are always Clustering [hudi]

2024-04-08 Thread via GitHub
hudi-bot commented on PR #10479: URL: https://github.com/apache/hudi/pull/10479#issuecomment-2043418339 ## CI report: * 0a5e5faa01273113cb974e9aa31cfb54d62dff67 Azure:

Re: [I] [SUPPORT]insert_overwrite_table table slow [hudi]

2024-04-08 Thread via GitHub
ad1happy2go commented on issue #10979: URL: https://github.com/apache/hudi/issues/10979#issuecomment-2043307873 @wkhappy1 As you said the table size is 27.1 G, is it a Parquet table? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

Re: [PR] [HUDI-7290] Don't assume ReplaceCommits are always Clustering [hudi]

2024-04-08 Thread via GitHub
jonvex commented on PR #10479: URL: https://github.com/apache/hudi/pull/10479#issuecomment-2043297073 @bvaradar org.apache.hudi.common.table.view.TestHoodieTableFileSystemView#testHoodieTableFileSystemViewWithPendingClustering is failing because that test relies on this feature to be

Re: [PR] [HUDI-7503] Compaction and LogCompaction executions should start a heartbeat on every attempt and block concurrent executions of same plan [hudi]

2024-04-08 Thread via GitHub
kbuci commented on code in PR #10965: URL: https://github.com/apache/hudi/pull/10965#discussion_r1556157237 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java: ## @@ -1135,8 +1138,36 @@ protected void

Re: [PR] [HUDI-7290] Don't assume ReplaceCommits are always Clustering [hudi]

2024-04-08 Thread via GitHub
hudi-bot commented on PR #10479: URL: https://github.com/apache/hudi/pull/10479#issuecomment-2043184042 ## CI report: * 0a5e5faa01273113cb974e9aa31cfb54d62dff67 Azure:

[jira] [Assigned] (HUDI-6330) Update user document to introduce this feature

2024-04-08 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu reassigned HUDI-6330: Assignee: Jing Zhang > Update user document to introduce this feature >

[jira] [Commented] (HUDI-6330) Update user document to introduce this feature

2024-04-08 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17834990#comment-17834990 ] Raymond Xu commented on HUDI-6330: -- [~jingzhang] thanks and merged! > Update user document to introduce

[jira] [Closed] (HUDI-6330) Update user document to introduce this feature

2024-04-08 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu closed HUDI-6330. Resolution: Fixed > Update user document to introduce this feature >

(hudi) branch asf-site updated: [HUDI-6330][DOCS] Update user doc to show how to use consistent bucket index for Flink engine (#10977)

2024-04-08 Thread xushiyan
This is an automated email from the ASF dual-hosted git repository. xushiyan pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/asf-site by this push: new 72b01a53d3d [HUDI-6330][DOCS] Update user

Re: [PR] [HUDI-6330][DOCS] Update user doc to show how to use consistent bucket index for Flink engine [hudi]

2024-04-08 Thread via GitHub
xushiyan merged PR #10977: URL: https://github.com/apache/hudi/pull/10977 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

Re: [PR] [HUDI-7290] Don't assume ReplaceCommits are always Clustering [hudi]

2024-04-08 Thread via GitHub
hudi-bot commented on PR #10479: URL: https://github.com/apache/hudi/pull/10479#issuecomment-2043093848 ## CI report: * 52afba2aa7c6ec4e0f8ca0f50eaf4a0639c53432 Azure:

Re: [PR] [HUDI-7290] Don't assume ReplaceCommits are always Clustering [hudi]

2024-04-08 Thread via GitHub
hudi-bot commented on PR #10479: URL: https://github.com/apache/hudi/pull/10479#issuecomment-2043078181 ## CI report: * 52afba2aa7c6ec4e0f8ca0f50eaf4a0639c53432 Azure:

(hudi) branch asf-site updated: [DOCS] Update blogs (#10971)

2024-04-08 Thread xushiyan
This is an automated email from the ASF dual-hosted git repository. xushiyan pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/asf-site by this push: new 06eb97ca409 [DOCS] Update blogs (#10971)

Re: [PR] [DOCS] Update blogs [hudi]

2024-04-08 Thread via GitHub
xushiyan merged PR #10971: URL: https://github.com/apache/hudi/pull/10971 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

Re: [PR] [HUDI-7290] Don't assume ReplaceCommits are always Clustering [hudi]

2024-04-08 Thread via GitHub
hudi-bot commented on PR #10479: URL: https://github.com/apache/hudi/pull/10479#issuecomment-2043060978 ## CI report: * 52afba2aa7c6ec4e0f8ca0f50eaf4a0639c53432 Azure:

Re: [PR] [HUDI-7576] add partitionPath as an instance variable to HoodieBaseFile and HoodieLogFile [hudi]

2024-04-08 Thread via GitHub
the-other-tim-brown commented on PR #10975: URL: https://github.com/apache/hudi/pull/10975#issuecomment-2042932190 > > > Can you explain why? > > > > > > Because it represents an "File", the partition notion kind of belongs to table, which is firstly introduced by Hive to resolve

Re: [PR] [HUDI-7575] avoid repeated fetching of pending replace instants [hudi]

2024-04-08 Thread via GitHub
the-other-tim-brown commented on code in PR #10976: URL: https://github.com/apache/hudi/pull/10976#discussion_r1555936753 ## hudi-common/src/main/java/org/apache/hudi/common/table/view/AbstractTableFileSystemView.java: ## @@ -140,6 +141,22 @@ protected void

Re: [I] [SUPPORT]The number of tasks in each distinct stage of building workload profile is always 60 [hudi]

2024-04-08 Thread via GitHub
MrAladdin closed issue #10972: [SUPPORT]The number of tasks in each distinct stage of building workload profile is always 60 URL: https://github.com/apache/hudi/issues/10972 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

Re: [I] [SUPPORT]The number of tasks in each distinct stage of building workload profile is always 60 [hudi]

2024-04-08 Thread via GitHub
MrAladdin commented on issue #10972: URL: https://github.com/apache/hudi/issues/10972#issuecomment-2042898188 > @MrAladdin Can you provide the writer configurations you are using? sorry, Forget to close "hoodie.metadata.index.async" -- This is an

Re: [PR] [HUDI-7578] Avoid unnecessary rewriting when copy old data from old base to new base file to improve compaction performance [hudi]

2024-04-08 Thread via GitHub
hudi-bot commented on PR #10980: URL: https://github.com/apache/hudi/pull/10980#issuecomment-2042891233 ## CI report: * 36b0e8f8e5e00096b9844f8db6cc51cbc114f42c Azure:

Re: [I] Nested object support in Hudi Table using Flink [hudi]

2024-04-08 Thread via GitHub
waytoharish closed issue #10895: Nested object support in Hudi Table using Flink URL: https://github.com/apache/hudi/issues/10895 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [I] Nested object support in Hudi Table using Flink [hudi]

2024-04-08 Thread via GitHub
waytoharish commented on issue #10895: URL: https://github.com/apache/hudi/issues/10895#issuecomment-2042890195 Thanks @ad1happy2go @danny0405 its worked for me after using the GenericRowData. I am closing the issue -- This is an automated message from the Apache Git Service. To

Re: [PR] [HUDI-7576] add partitionPath as an instance variable to HoodieBaseFile and HoodieLogFile [hudi]

2024-04-08 Thread via GitHub
the-other-tim-brown commented on PR #10975: URL: https://github.com/apache/hudi/pull/10975#issuecomment-2042690168 > > Can you explain why? > > Because it represents an "File", the partition notion kind of belongs to table, which is firstly introduced by Hive to resolve the

Re: [I] [SUPPORT]insert_overwrite_table table slow [hudi]

2024-04-08 Thread via GitHub
wkhappy1 commented on issue #10979: URL: https://github.com/apache/hudi/issues/10979#issuecomment-2042672365 @ad1happy2go input data is a dataframe compute from other tables -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

Re: [PR] [HUDI-7578] Avoid unnecessary rewriting when copy old data from old base to new base file to improve compaction performance [hudi]

2024-04-08 Thread via GitHub
hudi-bot commented on PR #10980: URL: https://github.com/apache/hudi/pull/10980#issuecomment-2042643487 ## CI report: * 07e398007c1557d3e17adc3d8a36d8778ed3e976 Azure:

Re: [PR] [HUDI-7578] Avoid unnecessary rewriting when copy old data from old base to new base file to improve compaction performance [hudi]

2024-04-08 Thread via GitHub
hudi-bot commented on PR #10980: URL: https://github.com/apache/hudi/pull/10980#issuecomment-2042627787 ## CI report: * 07e398007c1557d3e17adc3d8a36d8778ed3e976 Azure:

Re: [PR] [HUDI-7578] Avoid unnecessary rewriting when copy old data from old base to new base file to improve compaction performance [hudi]

2024-04-08 Thread via GitHub
beyond1920 commented on code in PR #10980: URL: https://github.com/apache/hudi/pull/10980#discussion_r1555662848 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java: ## @@ -147,6 +149,13 @@ public HoodieMergeHandle(HoodieWriteConfig config,

Re: [PR] [HUDI-6330][DOCS] Update user doc to show how to use consistent bucket index for Flink engine [hudi]

2024-04-08 Thread via GitHub
beyond1920 commented on code in PR #10977: URL: https://github.com/apache/hudi/pull/10977#discussion_r1555701329 ## website/docs/sql_dml.md: ## @@ -390,3 +390,70 @@ and `clean.async.enabled` options are used to disable the compaction and cleanin This is done to ensure that

Re: [I] [SUPPORT]insert_overwrite_table table slow [hudi]

2024-04-08 Thread via GitHub
ad1happy2go commented on issue #10979: URL: https://github.com/apache/hudi/issues/10979#issuecomment-2042560266 @wkhappy1 What is the format of your input data? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

Re: [PR] [HUDI-6330][DOCS] Update user doc to show how to use consistent bucket index for Flink engine [hudi]

2024-04-08 Thread via GitHub
beyond1920 commented on code in PR #10977: URL: https://github.com/apache/hudi/pull/10977#discussion_r1555696442 ## website/docs/sql_dml.md: ## @@ -390,3 +390,70 @@ and `clean.async.enabled` options are used to disable the compaction and cleanin This is done to ensure that
