[jira] [Updated] (HUDI-7286) On the Flink side, the index. type parameter is case sensitive

2024-01-09 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7286: - Labels: pull-request-available (was: ) > On the Flink side, the index. type parameter is case

[PR] [HUDI-7286]flink get hudi index type ignore case sensitive. [hudi]

2024-01-09 Thread via GitHub
Akihito-Liang opened a new pull request, #10476: URL: https://github.com/apache/hudi/pull/10476 ### Change Logs The OptionsResolver#getIndexType function convert to uppercase after obtaining index.type parameter ### Impact If the index. type parameter in Flink options
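The change described above amounts to normalizing the letter case of the `index.type` value before resolving it. Below is a minimal Java sketch of that idea. It assumes a hypothetical `IndexType` enum, which is a small stand-in and not Hudi's actual definition, and a hypothetical `parseIndexType` helper rather than the real `OptionsResolver#getIndexType`:

```java
import java.util.Locale;

class IndexTypeParsingSketch {
    // Hypothetical subset of index types; not Hudi's actual enum definition.
    enum IndexType { BUCKET, FLINK_STATE, INMEMORY }

    /** Resolves an index.type value regardless of letter case, e.g. "bucket" or "Bucket". */
    static IndexType parseIndexType(String raw) {
        if (raw == null) {
            throw new IllegalArgumentException("index.type must not be null");
        }
        // Locale.ROOT avoids locale-sensitive case mapping (e.g. the Turkish dotless i).
        return IndexType.valueOf(raw.trim().toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(parseIndexType("bucket"));      // prints BUCKET
        System.out.println(parseIndexType("Flink_State")); // prints FLINK_STATE
    }
}
```

Using `Locale.ROOT` keeps the result the same on every JVM, whatever its default locale.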

Re: [PR] [HUDI-7282] avoid verification failure due to append writing of the c… [hudi]

2024-01-09 Thread via GitHub
hudi-bot commented on PR #10475: URL: https://github.com/apache/hudi/pull/10475#issuecomment-1884338151 ## CI report: * 5c30be8b54110cc1414f0cb0f3715604bc063998 Azure:

Re: [PR] [HUDI-7241] Avoid always broadcast HUDI relation if not using HoodieSparkSessionExtension [hudi]

2024-01-09 Thread via GitHub
hudi-bot commented on PR #10373: URL: https://github.com/apache/hudi/pull/10373#issuecomment-188433 ## CI report: * 21de94929f716f94ca3af46d376aa2ccfb32d791 UNKNOWN * ac0d69e0f48af950cc4568762f35b3080c1cf16c Azure:

Re: [I] multi-writer jobs wait forever to finish it off (Using OPTIMISTIC_CONCURRENCY_CONTROL) [hudi]

2024-01-09 Thread via GitHub
ad1happy2go commented on issue #10468: URL: https://github.com/apache/hudi/issues/10468#issuecomment-1884336042 @SamarthRaval OCC is not recommend to run lot of multi writer jobs together if they write to same file group. You may be seeing failures which will be resulting in rollbacks.
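The comment says that many concurrent OCC writers touching the same file group end in failures and rollbacks. A deliberately simplified model illustrates this. Writers conflict when the sets of file groups they modified overlap, and any writer that overlaps an already committed writer aborts. The class and method names are hypothetical, and this is not Hudi's conflict resolution code:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

class OccConflictSketch {
    /** Two concurrent writers conflict if they modified at least one common file group. */
    static boolean conflicts(Set<String> fileGroupsA, Set<String> fileGroupsB) {
        return !Collections.disjoint(fileGroupsA, fileGroupsB);
    }

    /**
     * Assumes all writers started concurrently and try to commit in list order.
     * A writer aborts (and would be rolled back) if it overlaps any writer that already committed.
     */
    static int countAborted(List<Set<String>> writersInCommitOrder) {
        List<Set<String>> committed = new ArrayList<>();
        int aborted = 0;
        for (Set<String> writer : writersInCommitOrder) {
            boolean clash = false;
            for (Set<String> done : committed) {
                if (conflicts(writer, done)) {
                    clash = true;
                    break;
                }
            }
            if (clash) {
                aborted++;
            } else {
                committed.add(writer);
            }
        }
        return aborted;
    }

    public static void main(String[] args) {
        Set<String> writer1 = Set.of("fg-1", "fg-2");
        Set<String> writer2 = Set.of("fg-2", "fg-3"); // shares fg-2 with writer1
        Set<String> writer3 = Set.of("fg-4");         // disjoint from both
        System.out.println(conflicts(writer1, writer2));                      // prints true
        System.out.println(countAborted(List.of(writer1, writer2, writer3))); // prints 1
    }
}
```

Suppose 60-100 jobs mostly target the same file groups. Under this model, most writers after the first would abort.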

Re: [PR] [HUDI-7241] Avoid always broadcast HUDI relation if not using HoodieSparkSessionExtension [hudi]

2024-01-09 Thread via GitHub
beyond1920 commented on PR #10373: URL: https://github.com/apache/hudi/pull/10373#issuecomment-1884331061 @bvaradar Thanks for suggestion. I updated the PR. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

Re: [PR] [HUDI-7282] avoid verification failure due to append writing of the c… [hudi]

2024-01-09 Thread via GitHub
hudi-bot commented on PR #10475: URL: https://github.com/apache/hudi/pull/10475#issuecomment-1884329188 ## CI report: * 5c30be8b54110cc1414f0cb0f3715604bc063998 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run

Re: [PR] [HUDI-6902] Use mvnw command for hadoo-mr test [hudi]

2024-01-09 Thread via GitHub
hudi-bot commented on PR #10474: URL: https://github.com/apache/hudi/pull/10474#issuecomment-1884329128 ## CI report: * a7cf9f6af282e9bed6c4eddfa55e712912976522 Azure:

Re: [PR] [HUDI-6902] Use mvnw command for hadoo-mr test [hudi]

2024-01-09 Thread via GitHub
hudi-bot commented on PR #10474: URL: https://github.com/apache/hudi/pull/10474#issuecomment-1884319365 ## CI report: * a7cf9f6af282e9bed6c4eddfa55e712912976522 Azure:

Re: [PR] [HUDI-7278] make bloom filter skippable for CPU saving [hudi]

2024-01-09 Thread via GitHub
hudi-bot commented on PR #10457: URL: https://github.com/apache/hudi/pull/10457#issuecomment-1884319262 ## CI report: * af71b6b0adf5722b58b941ad129f685f1242a808 Azure:

Re: [PR] [HUDI-6902] Use mvnw command for hadoo-mr test [hudi]

2024-01-09 Thread via GitHub
XuQianJin-Stars commented on PR #10474: URL: https://github.com/apache/hudi/pull/10474#issuecomment-1884315682 @hudi-bot run azure

[jira] [Updated] (HUDI-7282) Hudi COW APPEND mode can be verified through cluster that even if the index is bucket

2024-01-09 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7282: - Labels: pull-request-available (was: ) > Hudi COW APPEND mode can be verified through cluster

[PR] [HUDI-7282] avoid verification failure due to append writing of the c… [hudi]

2024-01-09 Thread via GitHub
Akihito-Liang opened a new pull request, #10475: URL: https://github.com/apache/hudi/pull/10475 ### Change Logs The ClusteringUtil#validateClusteringScheduling function skip the append mode check. ### Impact The change is In order to enable Flink to write the Hudi Cow

Re: [PR] [HUDI-6902] Set minimum memory for unit tests [hudi]

2024-01-09 Thread via GitHub
hudi-bot commented on PR #10469: URL: https://github.com/apache/hudi/pull/10469#issuecomment-1884264189 ## CI report: * d029150a51a74b6faf5ec28aacd5c20b4e52467b Azure:

Re: [I] Partitioning data into two keys is taking more time (10x) than partitioning into one key. [hudi]

2024-01-09 Thread via GitHub
maheshguptags commented on issue #10456: URL: https://github.com/apache/hudi/issues/10456#issuecomment-1884209599 Yes I am trying to test the different combination with bucket number.

(hudi) branch master updated: [HUDI-5973] Fixing refreshing of schemas in HoodieStreamer continuous mode (#10261)

2024-01-09 Thread vbalaji
This is an automated email from the ASF dual-hosted git repository. vbalaji pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new e8747036554 [HUDI-5973] Fixing refreshing of

Re: [PR] [HUDI-5973] Fixing refreshing of schemas in HoodieStreamer continuous mode [hudi]

2024-01-09 Thread via GitHub
bvaradar merged PR #10261: URL: https://github.com/apache/hudi/pull/10261

[jira] [Updated] (HUDI-7279) Make the sampling rate of object size estimation configurable

2024-01-09 Thread Sagar Sumit (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Sumit updated HUDI-7279: -- Fix Version/s: 1.0.0 > Make the sampling rate of object size estimation configurable >
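Estimating object size by sampling means measuring only every N-th record and reusing the sampled average for the rest. The sampling rate therefore trades estimate accuracy against CPU cost. Below is a minimal sketch of that general technique, assuming a hypothetical `SampledSizeEstimatorSketch` class. It is not the code of the BOUNDED_IN_MEMORY executor, and it does not reflect Hudi's actual default rate or configuration key:

```java
import java.util.function.ToLongFunction;

class SampledSizeEstimatorSketch<T> {
    private final int samplingRate;        // measure only every N-th record
    private final ToLongFunction<T> sizer; // stand-in for an expensive size calculation
    private long recordsSeen = 0;
    private long samples = 0;
    private double averageSize = 0.0;

    SampledSizeEstimatorSketch(int samplingRate, ToLongFunction<T> sizer) {
        if (samplingRate <= 0) {
            throw new IllegalArgumentException("samplingRate must be positive");
        }
        this.samplingRate = samplingRate;
        this.sizer = sizer;
    }

    void observe(T record) {
        if (recordsSeen % samplingRate == 0) {
            long size = sizer.applyAsLong(record);
            samples++;
            // Incremental mean: no need to keep every sampled size in memory.
            averageSize += (size - averageSize) / samples;
        }
        recordsSeen++;
    }

    double averageSize() { return averageSize; }

    long sampleCount() { return samples; }

    public static void main(String[] args) {
        SampledSizeEstimatorSketch<String> est = new SampledSizeEstimatorSketch<>(4, s -> s.length());
        for (int i = 1; i <= 10; i++) {
            est.observe("x".repeat(i)); // record sizes 1..10
        }
        // Records of size 1, 5 and 9 are sampled (every 4th record, starting with the first).
        System.out.println(est.sampleCount()); // prints 3
        System.out.println(est.averageSize()); // prints 5.0
    }
}
```

A lower rate gives a more accurate estimate, at the cost of calling the expensive sizer more often.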

Re: [PR] [HUDI-6760] Add SelfDescribingInputFormatInterface for hive FileInput… [hudi]

2024-01-09 Thread via GitHub
bvaradar commented on PR #9554: URL: https://github.com/apache/hudi/pull/9554#issuecomment-1884186097 Rerunning failed jobs

Re: [PR] [HUDI-6902] Use mvnw command for hadoo-mr test [hudi]

2024-01-09 Thread via GitHub
hudi-bot commented on PR #10474: URL: https://github.com/apache/hudi/pull/10474#issuecomment-1884173562 ## CI report: * a7cf9f6af282e9bed6c4eddfa55e712912976522 Azure:

Re: [PR] [HUDI-7241] Avoid always broadcast HUDI relation if not using HoodieSparkSessionExtension [hudi]

2024-01-09 Thread via GitHub
bvaradar commented on code in PR #10373: URL: https://github.com/apache/hudi/pull/10373#discussion_r1446861389 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieFileIndex.scala: ## @@ -111,6 +111,9 @@ case class HoodieFileIndex(spark: SparkSession,

Re: [PR] [HUDI-7278] make bloom filter skippable for CPU saving [hudi]

2024-01-09 Thread via GitHub
hudi-bot commented on PR #10457: URL: https://github.com/apache/hudi/pull/10457#issuecomment-1884141208 ## CI report: * 7c668bbb0b7cafeb9b6c4d302d6154c91beb366e Azure:

Re: [PR] [HUDI-7278] make bloom filter skippable for CPU saving [hudi]

2024-01-09 Thread via GitHub
hudi-bot commented on PR #10457: URL: https://github.com/apache/hudi/pull/10457#issuecomment-1884135931 ## CI report: * 7c668bbb0b7cafeb9b6c4d302d6154c91beb366e Azure:

Re: [I] [SUPPORT]Flink writes MOR table, both RO table and RT table read nothing by hive [hudi]

2024-01-09 Thread via GitHub
nicholasxu commented on issue #10465: URL: https://github.com/apache/hudi/issues/10465#issuecomment-1884133654 > @nicholasxu That probably may be the reason. As with hive simple select * doesn't trigger the TEZ job. you can try adding condition WHERE 1 = 1 which should trigger job. >

[jira] [Closed] (HUDI-7279) Make the sampling rate of object size estimation configurable

2024-01-09 Thread Kong Wei (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kong Wei closed HUDI-7279. -- Resolution: Fixed > Make the sampling rate of object size estimation configurable >

Re: [PR] [HUDI-6902] Set minimum memory for unit tests [hudi]

2024-01-09 Thread via GitHub
hudi-bot commented on PR #10469: URL: https://github.com/apache/hudi/pull/10469#issuecomment-1884131053 ## CI report: * 5407ebfef7b7a331d4cbb27e7e81746bce701062 Azure:

[jira] [Created] (HUDI-7286) On the Flink side, the index. type parameter is case sensitive

2024-01-09 Thread Junning Liang (Jira)
Junning Liang created HUDI-7286: --- Summary: On the Flink side, the index. type parameter is case sensitive Key: HUDI-7286 URL: https://issues.apache.org/jira/browse/HUDI-7286 Project: Apache Hudi

(hudi) branch master updated (b54381ff9e6 -> 2170d0cdb14)

2024-01-09 Thread vbalaji
This is an automated email from the ASF dual-hosted git repository. vbalaji pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git from b54381ff9e6 [MINOR] Disable flaky test (#10449) add 2170d0cdb14 [HUDI-7279] make sampling rate configurable for

Re: [PR] [HUDI-7279] make sampling rate configurable for BOUNDED_IN_MEMORY executor type [hudi]

2024-01-09 Thread via GitHub
bvaradar merged PR #10459: URL: https://github.com/apache/hudi/pull/10459

Re: [PR] [HUDI-6979][RFC-76] support event time based compaction strategy [hudi]

2024-01-09 Thread via GitHub
waitingF closed pull request #10266: [HUDI-6979][RFC-76] support event time based compaction strategy URL: https://github.com/apache/hudi/pull/10266

Re: [PR] [HUDI-6902] Set minimum memory for unit tests [hudi]

2024-01-09 Thread via GitHub
hudi-bot commented on PR #10469: URL: https://github.com/apache/hudi/pull/10469#issuecomment-1884094998 ## CI report: * 5407ebfef7b7a331d4cbb27e7e81746bce701062 Azure:

Re: [PR] [HUDI-6902] Set minimum memory for unit tests [hudi]

2024-01-09 Thread via GitHub
hudi-bot commented on PR #10469: URL: https://github.com/apache/hudi/pull/10469#issuecomment-1884088906 ## CI report: * 5407ebfef7b7a331d4cbb27e7e81746bce701062 Azure:

Re: [PR] [HUDI-7279] make sampling rate configurable for BOUNDED_IN_MEMORY executor type [hudi]

2024-01-09 Thread via GitHub
waitingF commented on PR #10459: URL: https://github.com/apache/hudi/pull/10459#issuecomment-1884079388 > Thanks for addressing the comments. There are a couple of [test failures](https://github.com/apache/hudi/actions/runs/7447573282/job/20260155012?pr=10459) in

Re: [PR] [MINOR] Use mvnw command for hadoo-mr test [hudi]

2024-01-09 Thread via GitHub
hudi-bot commented on PR #10474: URL: https://github.com/apache/hudi/pull/10474#issuecomment-1884058477 ## CI report: * a7cf9f6af282e9bed6c4eddfa55e712912976522 Azure:

Re: [PR] [MINOR] Use mvnw command for hadoo-mr test [hudi]

2024-01-09 Thread via GitHub
hudi-bot commented on PR #10474: URL: https://github.com/apache/hudi/pull/10474#issuecomment-1884053026 ## CI report: * a7cf9f6af282e9bed6c4eddfa55e712912976522 UNKNOWN

Re: [I] Partitioning data into two keys is taking more time (10x) than partitioning into one key. [hudi]

2024-01-09 Thread via GitHub
xicm commented on issue #10456: URL: https://github.com/apache/hudi/issues/10456#issuecomment-1884052740 > can you tell me how to check number of filegroup? cli or spark sql, show_commits, pay attention to `total_files_added` and `total_files_updated` > it is still taking 45-50

Re: [PR] [HUDI-6902] Create a dummy PR to trigger tests [hudi]

2024-01-09 Thread via GitHub
hudi-bot commented on PR #10464: URL: https://github.com/apache/hudi/pull/10464#issuecomment-1884047397 ## CI report: * 47057e95cff923a24dc201f0642bd3ac6168ca5c Azure:

Re: [PR] [HUDI-6902] Use mvnw command for hadoop-mr test [hudi]

2024-01-09 Thread via GitHub
linliu-code closed pull request #10473: [HUDI-6902] Use mvnw command for hadoop-mr test URL: https://github.com/apache/hudi/pull/10473

[PR] [HUDI-6902] Use mvnw command for hadoo-mr test [hudi]

2024-01-09 Thread via GitHub
linliu-code opened a new pull request, #10474: URL: https://github.com/apache/hudi/pull/10474 ### Change Logs The reason is to clean up any orphan resources. ### Impact Low. ### Risk level (write none, low medium or high below) None. ###

[PR] [HUDI-6902] Use mvnw command for hadoop-mr test [hudi]

2024-01-09 Thread via GitHub
linliu-code opened a new pull request, #10473: URL: https://github.com/apache/hudi/pull/10473 ### Change Logs The reason is that mvnw command can help cleanup resources used by the tests. ### Impact Fixing the flaky test. ### Risk level (write none, low

Re: [PR] [HUDI-6902] Clean up potential orphan processes [hudi]

2024-01-09 Thread via GitHub
linliu-code closed pull request #10472: [HUDI-6902] Clean up potential orphan processes URL: https://github.com/apache/hudi/pull/10472

[PR] [HUDI-6902] Clean up potential orphan processes [hudi]

2024-01-09 Thread via GitHub
linliu-code opened a new pull request, #10472: URL: https://github.com/apache/hudi/pull/10472 for hadoop test. ### Change Logs _Describe context and summary for this change. Highlight if any code was copied._ ### Impact _Describe any public API or user-facing

Re: [PR] [HUDI-6902] Clean up potential orphan processes [hudi]

2024-01-09 Thread via GitHub
linliu-code closed pull request #10471: [HUDI-6902] Clean up potential orphan processes URL: https://github.com/apache/hudi/pull/10471

[PR] [HUDI-6902] Clean up potential orphan processes [hudi]

2024-01-09 Thread via GitHub
linliu-code opened a new pull request, #10471: URL: https://github.com/apache/hudi/pull/10471 for hadoop test. ### Change Logs _Describe context and summary for this change. Highlight if any code was copied._ ### Impact _Describe any public API or user-facing

Re: [PR] [HUDI-6902] Clean up potential orphan processes [hudi]

2024-01-09 Thread via GitHub
linliu-code closed pull request #10470: [HUDI-6902] Clean up potential orphan processes URL: https://github.com/apache/hudi/pull/10470

Re: [PR] [HUDI-6902] Clean up potential orphan processes [hudi]

2024-01-09 Thread via GitHub
hudi-bot commented on PR #10470: URL: https://github.com/apache/hudi/pull/10470#issuecomment-1884015922 ## CI report: * 4e6e49ba6804cf5ac022ba1f21c61e433c132d6d UNKNOWN

[PR] [HUDI-6902] Clean up potential orphan processes [hudi]

2024-01-09 Thread via GitHub
linliu-code opened a new pull request, #10470: URL: https://github.com/apache/hudi/pull/10470 ### Change Logs for hadoop test. ### Impact Low. ### Risk level (write none, low medium or high below) None. ### Contributor's checklist - [

Re: [PR] [HUDI-1623] Solid completion time on timeline [hudi]

2024-01-09 Thread via GitHub
waywtdcc commented on PR #9617: URL: https://github.com/apache/hudi/pull/9617#issuecomment-1884005603 The meta file of version 1.0.0 is in avro format, and the old version is in json format. This should also be compatible.
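A reader could stay compatible with both formats by inspecting the first bytes of each meta file. An Avro object container file starts with the magic bytes `Obj` followed by `0x01`, while a JSON object starts with `{` after any leading whitespace. Below is a minimal sketch that assumes the 1.0.0 meta files are Avro container files and the older ones are plain JSON objects. The class name is hypothetical, and this is not how Hudi itself resolves the format:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class MetaFileFormatSketch {
    enum Format { AVRO, JSON, UNKNOWN }

    // Avro object container files begin with the bytes 'O', 'b', 'j', 1.
    private static final byte[] AVRO_MAGIC = {'O', 'b', 'j', 1};

    static Format detect(byte[] content) {
        if (content.length >= AVRO_MAGIC.length
                && Arrays.equals(Arrays.copyOf(content, AVRO_MAGIC.length), AVRO_MAGIC)) {
            return Format.AVRO;
        }
        // Otherwise skip leading whitespace and look for the start of a JSON object.
        for (byte b : content) {
            if (b == ' ' || b == '\t' || b == '\r' || b == '\n') {
                continue;
            }
            return b == '{' ? Format.JSON : Format.UNKNOWN;
        }
        return Format.UNKNOWN;
    }

    public static void main(String[] args) {
        byte[] json = "\n{\"example\": true}".getBytes(StandardCharsets.UTF_8);
        byte[] avro = {'O', 'b', 'j', 1, 0, 0};
        System.out.println(detect(json)); // prints JSON
        System.out.println(detect(avro)); // prints AVRO
    }
}
```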

Re: [PR] [HUDI-6902] Give more memory for tests [hudi]

2024-01-09 Thread via GitHub
hudi-bot commented on PR #10469: URL: https://github.com/apache/hudi/pull/10469#issuecomment-1883963420 ## CI report: * 5407ebfef7b7a331d4cbb27e7e81746bce701062 Azure:

Re: [PR] [HUDI-6902] Give more memory for tests [hudi]

2024-01-09 Thread via GitHub
hudi-bot commented on PR #10469: URL: https://github.com/apache/hudi/pull/10469#issuecomment-1883957222 ## CI report: * 5407ebfef7b7a331d4cbb27e7e81746bce701062 UNKNOWN

Re: [PR] [HUDI-7284] stream sync doesn't differentiate replace commits [hudi]

2024-01-09 Thread via GitHub
hudi-bot commented on PR #10467: URL: https://github.com/apache/hudi/pull/10467#issuecomment-1883957181 ## CI report: * d26cc3ca0ececfd34a32529be52299da02b39d75 Azure:

Re: [PR] [HUDI-7284] stream sync doesn't differentiate replace commits [hudi]

2024-01-09 Thread via GitHub
hudi-bot commented on PR #10467: URL: https://github.com/apache/hudi/pull/10467#issuecomment-1883920993 ## CI report: * ae88a3597422ceb690891a1340309ac20f22afd4 Azure:

Re: [PR] [HUDI-6497] WIP HoodieStorage abstraction [hudi]

2024-01-09 Thread via GitHub
hudi-bot commented on PR #10360: URL: https://github.com/apache/hudi/pull/10360#issuecomment-1883920369 ## CI report: * 0a958d6408a7d0107ae2dcfc2aae676fd1a6977d UNKNOWN * 9ca9af9d1632fe091a98cc1882e2863702b72e86 Azure:

[PR] [HUDI-6902] Give more memory for tests [hudi]

2024-01-09 Thread via GitHub
linliu-code opened a new pull request, #10469: URL: https://github.com/apache/hudi/pull/10469 ### Change Logs Changes: 1. 2g -> 4g 2. Set initial memory 128M. ### Impact Low. ### Risk level (write none, low medium or high below) None.

Re: [PR] [HUDI-6497] WIP HoodieStorage abstraction [hudi]

2024-01-09 Thread via GitHub
hudi-bot commented on PR #10360: URL: https://github.com/apache/hudi/pull/10360#issuecomment-1883912414 ## CI report: * 0a958d6408a7d0107ae2dcfc2aae676fd1a6977d UNKNOWN * 621bad3fc89b1a77bab7ca7fdb147afbdea8d30e Azure:

Re: [I] multi-writer jobs wait forever to finish it off (Using OPTIMISTIC_CONCURRENCY_CONTROL) [hudi]

2024-01-09 Thread via GitHub
SamarthRaval commented on issue #10468: URL: https://github.com/apache/hudi/issues/10468#issuecomment-1883856414 @ad1happy2go @xushiyan @danny0405 @codope @nsivabalan Can you guys please help me here.

Re: [PR] [HUDI-6902] Create a dummy PR to trigger tests [hudi]

2024-01-09 Thread via GitHub
hudi-bot commented on PR #10464: URL: https://github.com/apache/hudi/pull/10464#issuecomment-1883843643 ## CI report: * cce1c921771eef9d6dfb5b475692a1501588bf8b Azure:

Re: [PR] [HUDI-6902] Create a dummy PR to trigger tests [hudi]

2024-01-09 Thread via GitHub
hudi-bot commented on PR #10464: URL: https://github.com/apache/hudi/pull/10464#issuecomment-1883832875 ## CI report: * cce1c921771eef9d6dfb5b475692a1501588bf8b Azure:

Re: [PR] [HUDI-6497] WIP HoodieStorage abstraction [hudi]

2024-01-09 Thread via GitHub
hudi-bot commented on PR #10360: URL: https://github.com/apache/hudi/pull/10360#issuecomment-1883818196 ## CI report: * 0a958d6408a7d0107ae2dcfc2aae676fd1a6977d UNKNOWN * 621bad3fc89b1a77bab7ca7fdb147afbdea8d30e Azure:

Re: [PR] [HUDI-6902] Create a dummy PR to trigger tests [hudi]

2024-01-09 Thread via GitHub
hudi-bot commented on PR #10464: URL: https://github.com/apache/hudi/pull/10464#issuecomment-1883818671 ## CI report: * cce1c921771eef9d6dfb5b475692a1501588bf8b Azure:

Re: [PR] [HUDI-7284] stream sync doesn't differentiate replace commits [hudi]

2024-01-09 Thread via GitHub
hudi-bot commented on PR #10467: URL: https://github.com/apache/hudi/pull/10467#issuecomment-1883763558 ## CI report: * ae88a3597422ceb690891a1340309ac20f22afd4 Azure:

Re: [PR] [HUDI-6497] WIP HoodieStorage abstraction [hudi]

2024-01-09 Thread via GitHub
hudi-bot commented on PR #10360: URL: https://github.com/apache/hudi/pull/10360#issuecomment-1883763120 ## CI report: * 0a958d6408a7d0107ae2dcfc2aae676fd1a6977d UNKNOWN * 7b47ad7e74e126fdf3578e38fec59da4d3651a66 Azure:

[I] multi-writer jobs wait forever to finish it off (Using OPTIMISTIC_CONCURRENCY_CONTROL) [hudi]

2024-01-09 Thread via GitHub
SamarthRaval opened a new issue, #10468: URL: https://github.com/apache/hudi/issues/10468 I am running multi-writer jobs on one hudi table. I am submitting 60-100ish files in parallel to write on hudi table, with all the concurrency set up configured. It is possible that many jobs

Re: [PR] [HUDI-7284] stream sync doesn't differentiate replace commits [hudi]

2024-01-09 Thread via GitHub
hudi-bot commented on PR #10467: URL: https://github.com/apache/hudi/pull/10467#issuecomment-1883752714 ## CI report: * ae88a3597422ceb690891a1340309ac20f22afd4 Azure:

Re: [PR] [HUDI-6497] WIP HoodieStorage abstraction [hudi]

2024-01-09 Thread via GitHub
hudi-bot commented on PR #10360: URL: https://github.com/apache/hudi/pull/10360#issuecomment-1883751984 ## CI report: * 0a958d6408a7d0107ae2dcfc2aae676fd1a6977d UNKNOWN * 7b47ad7e74e126fdf3578e38fec59da4d3651a66 Azure:

Re: [PR] [HUDI-7284] stream sync doesn't differentiate replace commits [hudi]

2024-01-09 Thread via GitHub
jonvex commented on code in PR #10467: URL: https://github.com/apache/hudi/pull/10467#discussion_r1446548231 ## hudi-utilities/src/main/java/org/apache/hudi/utilities/streamer/StreamSync.java: ## @@ -457,7 +457,7 @@ public Pair, JavaRDD> syncOnce() throws IOException

Re: [PR] [HUDI-7284] stream sync doesn't differentiate replace commits [hudi]

2024-01-09 Thread via GitHub
yihua commented on code in PR #10467: URL: https://github.com/apache/hudi/pull/10467#discussion_r1446534664 ## hudi-utilities/src/main/java/org/apache/hudi/utilities/streamer/StreamSync.java: ## @@ -457,7 +457,7 @@ public Pair, JavaRDD> syncOnce() throws IOException

Re: [PR] [HUDI-7284] stream sync doesn't differentiate replace commits [hudi]

2024-01-09 Thread via GitHub
hudi-bot commented on PR #10467: URL: https://github.com/apache/hudi/pull/10467#issuecomment-1883674329 ## CI report: * ae88a3597422ceb690891a1340309ac20f22afd4 Azure:

Re: [PR] [HUDI-6902] Create a dummy PR to trigger tests [hudi]

2024-01-09 Thread via GitHub
hudi-bot commented on PR #10464: URL: https://github.com/apache/hudi/pull/10464#issuecomment-1883674256 ## CI report: * f622ac886541b288dbf53989ca32f80f4c6c5a89 Azure:

Re: [PR] [HUDI-7284] stream sync doesn't differentiate replace commits [hudi]

2024-01-09 Thread via GitHub
hudi-bot commented on PR #10467: URL: https://github.com/apache/hudi/pull/10467#issuecomment-1883662814 ## CI report: * ae88a3597422ceb690891a1340309ac20f22afd4 UNKNOWN

Re: [PR] [HUDI-6902] Create a dummy PR to trigger tests [hudi]

2024-01-09 Thread via GitHub
hudi-bot commented on PR #10464: URL: https://github.com/apache/hudi/pull/10464#issuecomment-1883662745 ## CI report: * f622ac886541b288dbf53989ca32f80f4c6c5a89 Azure:

Re: [PR] [HUDI-6497] WIP HoodieStorage abstraction [hudi]

2024-01-09 Thread via GitHub
hudi-bot commented on PR #10360: URL: https://github.com/apache/hudi/pull/10360#issuecomment-1883651244 ## CI report: * 0a958d6408a7d0107ae2dcfc2aae676fd1a6977d UNKNOWN * 7b47ad7e74e126fdf3578e38fec59da4d3651a66 Azure:

[jira] [Updated] (HUDI-7285) OOMs in spark tests

2024-01-09 Thread Lin Liu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lin Liu updated HUDI-7285: -- Description: # HUDI-7283 duplicates. > OOMs in spark tests > --- > > Key:

[jira] [Assigned] (HUDI-7285) OOMs in spark tests

2024-01-09 Thread Lin Liu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lin Liu reassigned HUDI-7285: - Assignee: Lin Liu > OOMs in spark tests > --- > > Key: HUDI-7285 >

[jira] [Updated] (HUDI-7284) Differentiate between replacecommits in stream sync

2024-01-09 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7284: - Labels: pull-request-available (was: ) > Differentiate between replacecommits in stream sync >

[jira] [Created] (HUDI-7284) Differentiate between replacecommits in stream sync

2024-01-09 Thread Jonathan Vexler (Jira)
Jonathan Vexler created HUDI-7284: - Summary: Differentiate between replacecommits in stream sync Key: HUDI-7284 URL: https://issues.apache.org/jira/browse/HUDI-7284 Project: Apache Hudi

[PR] [HUDI-7284] stream sync doesn't differentiate replace commits [hudi]

2024-01-09 Thread via GitHub
jonvex opened a new pull request, #10467: URL: https://github.com/apache/hudi/pull/10467 ### Change Logs Differentiate between replace commits that are cluster and those that are not ### Impact streamer won't get stuck ### Risk level (write none, low medium or
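In Hudi, clustering is not the only source of a `replacecommit`. Operations such as insert_overwrite and delete_partition also produce one, so the action name alone cannot identify a clustering instant. Below is a minimal sketch of the distinction the PR describes. It assumes hypothetical `Instant` and `Operation` types, which differ from Hudi's timeline classes, and a hypothetical scan for the latest non-clustering instant:

```java
import java.util.List;
import java.util.Optional;

class ReplaceCommitKindSketch {
    // Hypothetical model types; Hudi's own timeline classes differ.
    enum Operation { UPSERT, CLUSTER, INSERT_OVERWRITE, DELETE_PARTITION }

    static final class Instant {
        final String timestamp;
        final String action;       // e.g. "commit" or "replacecommit"
        final Operation operation; // write operation recorded for the instant

        Instant(String timestamp, String action, Operation operation) {
            this.timestamp = timestamp;
            this.action = action;
            this.operation = operation;
        }
    }

    /** A replacecommit counts as clustering only if its recorded operation says so. */
    static boolean isClustering(Instant instant) {
        return "replacecommit".equals(instant.action) && instant.operation == Operation.CLUSTER;
    }

    /** Latest non-clustering instant, scanning a timeline ordered oldest to newest. */
    static Optional<Instant> latestNonClustering(List<Instant> timeline) {
        for (int i = timeline.size() - 1; i >= 0; i--) {
            if (!isClustering(timeline.get(i))) {
                return Optional.of(timeline.get(i));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<Instant> timeline = List.of(
                new Instant("001", "commit", Operation.UPSERT),
                new Instant("002", "replacecommit", Operation.INSERT_OVERWRITE),
                new Instant("003", "replacecommit", Operation.CLUSTER));
        // Treating every replacecommit as clustering would fall back to 001;
        // checking the operation keeps the insert_overwrite instant 002.
        System.out.println(latestNonClustering(timeline).get().timestamp); // prints 002
    }
}
```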

[jira] [Created] (HUDI-7285) OOMs in spark tests

2024-01-09 Thread Lin Liu (Jira)
Lin Liu created HUDI-7285: - Summary: OOMs in spark tests Key: HUDI-7285 URL: https://issues.apache.org/jira/browse/HUDI-7285 Project: Apache Hudi Issue Type: Sub-task Reporter: Lin Liu

[jira] [Closed] (HUDI-6903) Resource is low on HDFS in docker

2024-01-09 Thread Lin Liu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lin Liu closed HUDI-6903. - Resolution: Cannot Reproduce > Resource is low on HDFS in docker > - > >

[jira] [Created] (HUDI-7283) OOM issue

2024-01-09 Thread Lin Liu (Jira)
Lin Liu created HUDI-7283: - Summary: OOM issue Key: HUDI-7283 URL: https://issues.apache.org/jira/browse/HUDI-7283 Project: Apache Hudi Issue Type: Sub-task Reporter: Lin Liu

Re: [PR] [HUDI-6497] WIP HoodieStorage abstraction [hudi]

2024-01-09 Thread via GitHub
hudi-bot commented on PR #10360: URL: https://github.com/apache/hudi/pull/10360#issuecomment-1883594754 ## CI report: * 0a958d6408a7d0107ae2dcfc2aae676fd1a6977d UNKNOWN * 4d1c6b4d5d83eaed954174ce26a83e23be62bb20 Azure:

Re: [PR] [HUDI-7144] Build storage partition stats index and use it for data skipping [hudi]

2024-01-09 Thread via GitHub
hudi-bot commented on PR #10352: URL: https://github.com/apache/hudi/pull/10352#issuecomment-1883490105 ## CI report: * 7d6558bdad3a7a2e168dae36bb8c9ead051b6690 Azure:

Re: [PR] [HUDI-7144] Build storage partition stats index and use it for data skipping [hudi]

2024-01-09 Thread via GitHub
hudi-bot commented on PR #10352: URL: https://github.com/apache/hudi/pull/10352#issuecomment-1883399409 ## CI report: * 0279dd4b1ab59776cbb5024810f5bb6a00fd2164 Azure:

Re: [PR] [HUDI-7144] Build storage partition stats index and use it for data skipping [hudi]

2024-01-09 Thread via GitHub
hudi-bot commented on PR #10352: URL: https://github.com/apache/hudi/pull/10352#issuecomment-1883384200 ## CI report: * 0279dd4b1ab59776cbb5024810f5bb6a00fd2164 Azure:

Re: [PR] [WIP] [HUDI-6787] Implement the HoodieFileGroupReader API for Hive [hudi]

2024-01-09 Thread via GitHub
jonvex commented on code in PR #10422: URL: https://github.com/apache/hudi/pull/10422#discussion_r1446299229 ## hudi-hadoop-mr/src/test/java/org/apache/hudi/hadoop/realtime/TestHoodieRealtimeRecordReader.java: ## @@ -116,6 +117,7 @@ public void setUp() {

Re: [PR] [WIP] [HUDI-6787] Implement the HoodieFileGroupReader API for Hive [hudi]

2024-01-09 Thread via GitHub
jonvex commented on code in PR #10422: URL: https://github.com/apache/hudi/pull/10422#discussion_r1446284069 ## packaging/bundle-validation/validate.sh: ## @@ -93,7 +93,7 @@ test_spark_hadoop_mr_bundles () { # save HiveQL query results

Re: [I] [SUPPORT]Flink writes MOR table, both RO table and RT table read nothing by hive [hudi]

2024-01-09 Thread via GitHub
ad1happy2go commented on issue #10465: URL: https://github.com/apache/hudi/issues/10465#issuecomment-1883325629 @nicholasxu That probably may be the reason. As with hive simple select * doesn't trigger the TEZ job. you can try adding condition WHERE 1 = 1 which should trigger job.

Re: [I] [SUPPORT]Flink writes MOR table, both RO table and RT table read nothing by hive [hudi]

2024-01-09 Thread via GitHub
nicholasxu commented on issue #10465: URL: https://github.com/apache/hudi/issues/10465#issuecomment-1883249450 > @nicholasxu Can you please also try to give all column names once instead of `select *` 1. select all column names

Re: [I] [SUPPORT]Flink writes MOR table, both RO table and RT table read nothing by hive [hudi]

2024-01-09 Thread via GitHub
ad1happy2go commented on issue #10465: URL: https://github.com/apache/hudi/issues/10465#issuecomment-1883099562 @nicholasxu Can you please also try to give column name once instead of `select *`

Re: [I] [SUPPORT]Flink writes MOR table, both RO table and RT table read nothing by hive [hudi]

2024-01-09 Thread via GitHub
nicholasxu commented on issue #10465: URL: https://github.com/apache/hudi/issues/10465#issuecomment-1883084647 > @nicholasxu It may be due to caching I guess. Can you restart hive and see if you can query the data using `select * from table` Thx, I restart all hive services, and set

Re: [I] [SUPPORT]Flink writes MOR table, both RO table and RT table read nothing by hive [hudi]

2024-01-09 Thread via GitHub
ad1happy2go commented on issue #10465: URL: https://github.com/apache/hudi/issues/10465#issuecomment-1883064528 @nicholasxu It may be due to caching I guess. Can you restart hive and see if you can query the data using `select * from table`

[I] If Sanitastiion Enabled In HudiStreamer It is taking too much time [hudi]

2024-01-09 Thread via GitHub
Amar1404 opened a new issue, #10466: URL: https://github.com/apache/hudi/issues/10466 **_Tips before filing an issue_** **Describe the problem you faced** I have enabled the SANITIZE_SCHEMA_FIELD_NAMES hudiDeltaStreamer is stuck after reading CSV. I think we can refactor
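`SANITIZE_SCHEMA_FIELD_NAMES` rewrites field names that are not valid Avro names by replacing the offending characters with a mask. A valid Avro name has `[A-Za-z_]` as its first character and only `[A-Za-z0-9_]` after that. The issue does not say where the time is spent. As a general illustration only, the sketch below compiles the pattern once and reuses it, which keeps the per-name cost low. The class, the method, and the `__` mask argument are assumptions, not Hudi's implementation:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FieldNameSanitizerSketch {
    // Avro names: first char [A-Za-z_], remaining chars [A-Za-z0-9_].
    // Compiled once and reused, instead of recompiling for every field or record.
    private static final Pattern INVALID_CHARS = Pattern.compile("[^A-Za-z0-9_]");

    static String sanitize(String fieldName, String mask) {
        // quoteReplacement keeps '$' or '\' in the mask from being treated as group references.
        String cleaned = INVALID_CHARS.matcher(fieldName).replaceAll(Matcher.quoteReplacement(mask));
        // A name may not be empty or start with a digit, so prefix the mask in that case.
        if (cleaned.isEmpty() || Character.isDigit(cleaned.charAt(0))) {
            cleaned = mask + cleaned;
        }
        return cleaned;
    }

    public static void main(String[] args) {
        System.out.println(sanitize("order-id", "__")); // prints order__id
        System.out.println(sanitize("1st col", "__"));  // prints __1st__col
    }
}
```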

Re: [PR] [HUDI-1881]: draft implementation for trigger based on data availability [hudi]

2024-01-09 Thread via GitHub
Sarfaraz-214 commented on PR #5071: URL: https://github.com/apache/hudi/pull/5071#issuecomment-1883033907 I cherry picked this commit into Hudi 0.14.1 and did some minor changes and it seems to working fine for me.

[I] [SUPPORT]Flink writes MOR table, both RO table and RT table read nothing by hive [hudi]

2024-01-09 Thread via GitHub
nicholasxu opened a new issue, #10465: URL: https://github.com/apache/hudi/issues/10465 **Describe the problem you faced** I use Flink write HUDI MOR table, and Flink read table normally, while RO table and RT table read nothing by hive **To Reproduce** Steps to

[jira] [Created] (HUDI-7282) Hudi COW APPEND mode can be verified through cluster that even if the index is bucket

2024-01-09 Thread Junning Liang (Jira)
Junning Liang created HUDI-7282: --- Summary: Hudi COW APPEND mode can be verified through cluster that even if the index is bucket Key: HUDI-7282 URL: https://issues.apache.org/jira/browse/HUDI-7282

Re: [I] [SUPPORT] Kafka connect sink to S3 authentification parameters [hudi]

2024-01-09 Thread via GitHub
akolyaga commented on issue #10428: URL: https://github.com/apache/hudi/issues/10428#issuecomment-1882937441 I don't use AWS, as I mentioned I try to connect to ceph cluster. This connection configuration works for my case: "hoodie.base.path": "s3a://x", "store.url":

Re: [I] Partitioning data into two keys is taking more time (10x) than partitioning into one key. [hudi]

2024-01-09 Thread via GitHub
maheshguptags commented on issue #10456: URL: https://github.com/apache/hudi/issues/10456#issuecomment-1882910466 @xicm I reduced the number of bucket( it makes sense to reduce the bucket size as we have second level partition) but it is still taking 45-50 min to execute which 5 times as
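With a bucket index, each partition is split into a fixed number of buckets. The number of file groups a write can touch therefore grows as partitions × buckets, and a second partition level multiplies the partitions. Below is a simplified sketch of that effect. It assumes a plain `hashCode`-modulo bucket assignment, which is not necessarily Hudi's exact hashing, and uses hypothetical partition values:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class BucketFileGroupSketch {
    /** Simplified bucket assignment: non-negative hash of the record key modulo the bucket count. */
    static int bucketId(String recordKey, int numBuckets) {
        // Masking the sign bit keeps the result in [0, numBuckets).
        return (recordKey.hashCode() & Integer.MAX_VALUE) % numBuckets;
    }

    /** Distinct (partition, bucket) pairs a batch touches; each pair is one file group here. */
    static int fileGroupsTouched(List<String[]> records, int numBuckets) {
        Set<String> groups = new HashSet<>();
        for (String[] r : records) { // r[0] = partition path, r[1] = record key
            groups.add(r[0] + "#" + bucketId(r[1], numBuckets));
        }
        return groups.size();
    }

    public static void main(String[] args) {
        int numBuckets = 8;
        List<String[]> onePartitionKey = new ArrayList<>();
        List<String[]> twoPartitionKeys = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            String key = "id-" + i;
            onePartitionKey.add(new String[] {"dt=2024-01-09", key});
            // Hypothetical second level: 20 sub-partitions under the same date.
            twoPartitionKeys.add(new String[] {"dt=2024-01-09/region=" + (i % 20), key});
        }
        // Upper bounds: 1 x 8 = 8 file groups versus 20 x 8 = 160 file groups.
        System.out.println(fileGroupsTouched(onePartitionKey, numBuckets));
        System.out.println(fileGroupsTouched(twoPartitionKeys, numBuckets));
    }
}
```

Under this model, reducing the bucket count shrinks each partition's share. However, the total still scales with the number of second-level partitions.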

Re: [PR] [HUDI-6902] Create a dummy PR to trigger tests [hudi]

2024-01-09 Thread via GitHub
hudi-bot commented on PR #10464: URL: https://github.com/apache/hudi/pull/10464#issuecomment-1882908653 ## CI report: * f622ac886541b288dbf53989ca32f80f4c6c5a89 Azure:

Re: [PR] [MINOR] Add parallel listing of existing partitions [hudi]

2024-01-09 Thread via GitHub
hudi-bot commented on PR #10460: URL: https://github.com/apache/hudi/pull/10460#issuecomment-1882897567 ## CI report: * 1022041ada5bc1b360c463e2b044e232fd8f9749 Azure:
