[GitHub] [hudi] nfarah86 opened a new pull request, #8577: rewrote clustering doc

2023-04-25 Thread via GitHub
nfarah86 opened a new pull request, #8577: URL: https://github.com/apache/hudi/pull/8577 Staging docs - waiting for Kyle's review ### Change Logs _Describe context and summary for this change. Highlight if any code was copied._ ### Impact _Describe any public API

[GitHub] [hudi] hudi-bot commented on pull request #8520: [HUDI-6115] Hardening expectation of corruptRecordColumn in ChainedTransformer.

2023-04-25 Thread via GitHub
hudi-bot commented on PR #8520: URL: https://github.com/apache/hudi/pull/8520#issuecomment-1522801441 ## CI report: * 5ca6c38df3f6511f072a1c65ac291249f0957311 Azure:

[GitHub] [hudi] ad1happy2go commented on issue #8576: [SUPPORT] Doubt about handling old data arrival in hudi

2023-04-25 Thread via GitHub
ad1happy2go commented on issue #8576: URL: https://github.com/apache/hudi/issues/8576#issuecomment-1522797431 @pravin1406 For that same reason there is a preCombine field option. This field is basically used for ordering the records. So if an old update arrives after 3 days, the preCombine field

[GitHub] [hudi] hudi-bot commented on pull request #8520: [HUDI-6115] Hardening expectation of corruptRecordColumn in ChainedTransformer.

2023-04-25 Thread via GitHub
hudi-bot commented on PR #8520: URL: https://github.com/apache/hudi/pull/8520#issuecomment-1522768626 ## CI report: * 5ca6c38df3f6511f072a1c65ac291249f0957311 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

2023-04-25 Thread via GitHub
hudi-bot commented on PR #8303: URL: https://github.com/apache/hudi/pull/8303#issuecomment-1522768292 ## CI report: * 732fbf0bf522d987baff2e6831fa85a0b5597c88 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #8503: [HUDI-6047] Clustering operation on consistent hashing index resulting in duplicate data

2023-04-25 Thread via GitHub
hudi-bot commented on PR #8503: URL: https://github.com/apache/hudi/pull/8503#issuecomment-1522763320 ## CI report: * 0738d975df341763e384b9ac9bcad14b006c9c47 UNKNOWN * 6302a97fd391f4823c08d0c5f945719aa5457757 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

2023-04-25 Thread via GitHub
hudi-bot commented on PR #8303: URL: https://github.com/apache/hudi/pull/8303#issuecomment-1522763024 ## CI report: * 9cda89b23cbf8514e1c2e0049eea4624f3b49f10 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #8550: [HUDI-6127]Flink Hudi Write support commit on an empty batch

2023-04-25 Thread via GitHub
hudi-bot commented on PR #8550: URL: https://github.com/apache/hudi/pull/8550#issuecomment-1522757858 ## CI report: * 563e10e0492a8194d789772de6bb9ced9f8c0721 UNKNOWN * 25a2ebf3646b2abf99bfba54d947066d3fc16c6b Azure:

[GitHub] [hudi] hudi-bot commented on pull request #8503: [HUDI-6047] Clustering operation on consistent hashing index resulting in duplicate data

2023-04-25 Thread via GitHub
hudi-bot commented on PR #8503: URL: https://github.com/apache/hudi/pull/8503#issuecomment-1522757718 ## CI report: * 0738d975df341763e384b9ac9bcad14b006c9c47 UNKNOWN * 6302a97fd391f4823c08d0c5f945719aa5457757 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

2023-04-25 Thread via GitHub
hudi-bot commented on PR #8303: URL: https://github.com/apache/hudi/pull/8303#issuecomment-1522757352 ## CI report: * 9cda89b23cbf8514e1c2e0049eea4624f3b49f10 Azure:

[GitHub] [hudi] danny0405 commented on pull request #8529: [HUDI-6120]filter base file when there is only one file slice fetched

2023-04-25 Thread via GitHub
danny0405 commented on PR #8529: URL: https://github.com/apache/hudi/pull/8529#issuecomment-1522754687 > > @codope Hi codope, I've added a unit test that covers the scenario for AbstractTableFileSystemView, do I still need to add some tests for IncrementalInputSplits? > > @codope Hi

[GitHub] [hudi] danny0405 commented on a diff in pull request #8546: [MINOR] Add log in flink compact/cluster commit sink for troubleshoot…

2023-04-25 Thread via GitHub
danny0405 commented on code in PR #8546: URL: https://github.com/apache/hudi/pull/8546#discussion_r1177310563 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/CompactionCommitSink.java: ## @@ -101,6 +101,12 @@ public void open(Configuration

[GitHub] [hudi] danny0405 commented on a diff in pull request #8568: [HUDI-6134] prevent two clean run concurrently in flink.

2023-04-25 Thread via GitHub
danny0405 commented on code in PR #8568: URL: https://github.com/apache/hudi/pull/8568#discussion_r1177309173 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/clustering/ClusteringCommitSink.java: ## @@ -179,7 +179,7 @@ private void doCommit(String

[GitHub] [hudi] voonhous commented on issue #8540: [SUPPORT] Getting error when writing into COW HUDI table if schema changed (datatype changed / column dropped)

2023-04-25 Thread via GitHub
voonhous commented on issue #8540: URL: https://github.com/apache/hudi/issues/8540#issuecomment-1522738859 My bad, I tried reproducing the test cases from the specifications you provided, but cannot seem to replicate them. Instead of me trying to figure out how one can reproduce

[GitHub] [hudi] danny0405 commented on pull request #8355: [HUDI-6016] HoodieCLIUtils supports creating HoodieClient with non-default database

2023-04-25 Thread via GitHub
danny0405 commented on PR #8355: URL: https://github.com/apache/hudi/pull/8355#issuecomment-1522735498 Can you rebase with the latest master and resolve the conflicts -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

[GitHub] [hudi] danny0405 commented on pull request #8512: [HUDI-6057] Support Flink 1.17

2023-04-25 Thread via GitHub
danny0405 commented on PR #8512: URL: https://github.com/apache/hudi/pull/8512#issuecomment-1522732827 @PrabhuJoseph You need to rebase with the latest master code to make the tests pass

[GitHub] [hudi] hudi-bot commented on pull request #8546: [MINOR] Add log in flink compact/cluster commit sink for troubleshoot…

2023-04-25 Thread via GitHub
hudi-bot commented on PR #8546: URL: https://github.com/apache/hudi/pull/8546#issuecomment-1522729150 ## CI report: * 2914fd9a3052f735733c8a212644918349943618 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

2023-04-25 Thread via GitHub
hudi-bot commented on PR #8303: URL: https://github.com/apache/hudi/pull/8303#issuecomment-1522728741 ## CI report: * 9cda89b23cbf8514e1c2e0049eea4624f3b49f10 Azure:

[GitHub] [hudi] danny0405 commented on pull request #8432: [HUDI-6072] Fix NPE when upsert merger and null map or array

2023-04-25 Thread via GitHub
danny0405 commented on PR #8432: URL: https://github.com/apache/hudi/pull/8432#issuecomment-1522728301 Thanks for the fix, I have reviewed and created a patch: [6072.patch.zip](https://github.com/apache/hudi/files/11328753/6072.patch.zip) You can rebase with the latest master, apply

[GitHub] [hudi] rohan-uptycs commented on pull request #8503: [HUDI-6047] Clustering operation on consistent hashing index resulting in duplicate data

2023-04-25 Thread via GitHub
rohan-uptycs commented on PR #8503: URL: https://github.com/apache/hudi/pull/8503#issuecomment-1522727119 > > @rohan-uptycs, could you add the test cases for this change? > > @SteNicholas, Sure will do > @rohan-uptycs, could you add the test cases for this change?

[GitHub] [hudi] hudi-bot commented on pull request #8550: [HUDI-6127]Flink Hudi Write support commit on an empty batch

2023-04-25 Thread via GitHub
hudi-bot commented on PR #8550: URL: https://github.com/apache/hudi/pull/8550#issuecomment-1522722842 ## CI report: * 563e10e0492a8194d789772de6bb9ced9f8c0721 UNKNOWN * 25a2ebf3646b2abf99bfba54d947066d3fc16c6b Azure:

[GitHub] [hudi] hudi-bot commented on pull request #8546: [MINOR] Add log in flink compact/cluster commit sink for troubleshoot…

2023-04-25 Thread via GitHub
hudi-bot commented on PR #8546: URL: https://github.com/apache/hudi/pull/8546#issuecomment-1522722762 ## CI report: * 2914fd9a3052f735733c8a212644918349943618 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #8568: [HUDI-6134] prevent two clean run concurrently in flink.

2023-04-25 Thread via GitHub
hudi-bot commented on PR #8568: URL: https://github.com/apache/hudi/pull/8568#issuecomment-1522717656 ## CI report: * 6926fef9d1a76bed03d6827b4a64355bc4d6de76 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #8566: [HUDI-5761] Added type configs for configs that take in classes

2023-04-25 Thread via GitHub
hudi-bot commented on PR #8566: URL: https://github.com/apache/hudi/pull/8566#issuecomment-1522717615 ## CI report: * 47d7e3525f277ba9191754b0cb776c34f40a9ddd Azure:

[GitHub] [hudi] hbgstc123 commented on a diff in pull request #8546: [MINOR] Add log in flink compact/cluster commit sink for troubleshoot…

2023-04-25 Thread via GitHub
hbgstc123 commented on code in PR #8546: URL: https://github.com/apache/hudi/pull/8546#discussion_r1177288415 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/clustering/ClusteringCommitSink.java: ## @@ -150,6 +159,13 @@ private void doCommit(String

[GitHub] [hudi] waitingF commented on a diff in pull request #8376: [HUDI-6019] support config minPartitions when reading from kafka

2023-04-25 Thread via GitHub
waitingF commented on code in PR #8376: URL: https://github.com/apache/hudi/pull/8376#discussion_r1177276655 ## hudi-utilities/src/test/java/org/apache/hudi/utilities/sources/helpers/TestCheckpointUtils.java: ## @@ -57,63 +58,191 @@ public void testStringToOffsets() { @Test

[GitHub] [hudi] hudi-bot commented on pull request #8568: [HUDI-6134] prevent two clean run concurrently in flink.

2023-04-25 Thread via GitHub
hudi-bot commented on PR #8568: URL: https://github.com/apache/hudi/pull/8568#issuecomment-1522683673 ## CI report: * 6926fef9d1a76bed03d6827b4a64355bc4d6de76 Azure:

[GitHub] [hudi] xccui commented on issue #8554: [SUPPORT] Some resources should be reset after failure recovery of Flink

2023-04-25 Thread via GitHub
xccui commented on issue #8554: URL: https://github.com/apache/hudi/issues/8554#issuecomment-1522683040 Sure. I'll send out a fix these days

[GitHub] [hudi] hudi-bot commented on pull request #8556: [HUDI-6131] Refactor getWritePathsOfInstants in Flink WriteProfiles

2023-04-25 Thread via GitHub
hudi-bot commented on PR #8556: URL: https://github.com/apache/hudi/pull/8556#issuecomment-1522683484 ## CI report: * f7fd7f30f8128fe0e1cfff8903e09b49ff0351c5 Azure:

[GitHub] [hudi] hbgstc123 commented on a diff in pull request #8568: [HUDI-6134] prevent two clean run concurrently in flink.

2023-04-25 Thread via GitHub
hbgstc123 commented on code in PR #8568: URL: https://github.com/apache/hudi/pull/8568#discussion_r1177274405 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/CleanFunction.java: ## @@ -64,7 +64,19 @@ public void open(Configuration parameters) throws

[GitHub] [hudi] codope commented on issue #7634: [SUPPORT] Consistent Hashing index type read inconsistency when performing an ALTER TABLE DROP PARTITION DDL

2023-04-25 Thread via GitHub
codope commented on issue #7634: URL: https://github.com/apache/hudi/issues/7634#issuecomment-1522676443 Closing it as the fix has landed.

[GitHub] [hudi] codope closed issue #7634: [SUPPORT] Consistent Hashing index type read inconsistency when performing an ALTER TABLE DROP PARTITION DDL

2023-04-25 Thread via GitHub
codope closed issue #7634: [SUPPORT] Consistent Hashing index type read inconsistency when performing an ALTER TABLE DROP PARTITION DDL URL: https://github.com/apache/hudi/issues/7634

[GitHub] [hudi] codope commented on pull request #7776: [HUDI-5642] Enable schema reconciliation by default

2023-04-25 Thread via GitHub
codope commented on PR #7776: URL: https://github.com/apache/hudi/pull/7776#issuecomment-1522676118 > please help me understand why this is critical priority? This was planned to be done in one of the previous releases, but there are some issues. I've lowered the priority for now, but we

[GitHub] [hudi] hudi-bot commented on pull request #8556: [HUDI-6131] Refactor getWritePathsOfInstants in Flink WriteProfiles

2023-04-25 Thread via GitHub
hudi-bot commented on PR #8556: URL: https://github.com/apache/hudi/pull/8556#issuecomment-1522673781 ## CI report: * f7fd7f30f8128fe0e1cfff8903e09b49ff0351c5 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #8505: [HUDI-6106] Spark offline compaction/Clustering Job will do clean like Flink job

2023-04-25 Thread via GitHub
hudi-bot commented on PR #8505: URL: https://github.com/apache/hudi/pull/8505#issuecomment-1522668875 ## CI report: * f7c73e83812258b53b979afbd6d465e9066b801f UNKNOWN * 269fad02a5346121e823a15c9804e2e63eb16c30 UNKNOWN * 442430f680316bdfefc27c4aca9f7cd94e95373c UNKNOWN *

[GitHub] [hudi] hudi-bot commented on pull request #7627: [HUDI-5517] HoodieTimeline support filter instants by state transition time

2023-04-25 Thread via GitHub
hudi-bot commented on PR #7627: URL: https://github.com/apache/hudi/pull/7627#issuecomment-1522667692 ## CI report: * 85b25f5cda4ccd8189a1607259e1732a910c3262 UNKNOWN * bfb9fbbed9a2423ba1781962cea8ccc277a84880 Azure:

[GitHub] [hudi] stream2000 commented on a diff in pull request #8550: [HUDI-6127]Flink Hudi Write support commit on an empty batch

2023-04-25 Thread via GitHub
stream2000 commented on code in PR #8550: URL: https://github.com/apache/hudi/pull/8550#discussion_r1177255523 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/configuration/FlinkOptions.java: ## @@ -614,6 +614,13 @@ private FlinkOptions() {

[GitHub] [hudi] danny0405 commented on a diff in pull request #8550: [HUDI-6127]Flink Hudi Write support commit on an empty batch

2023-04-25 Thread via GitHub
danny0405 commented on code in PR #8550: URL: https://github.com/apache/hudi/pull/8550#discussion_r1177252652 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/configuration/FlinkOptions.java: ## @@ -614,6 +614,13 @@ private FlinkOptions() {

[GitHub] [hudi] danny0405 commented on a diff in pull request #8568: [HUDI-6134] prevent two clean run concurrently in flink.

2023-04-25 Thread via GitHub
danny0405 commented on code in PR #8568: URL: https://github.com/apache/hudi/pull/8568#discussion_r1177250127 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/CleanFunction.java: ## @@ -64,7 +64,19 @@ public void open(Configuration parameters) throws

[GitHub] [hudi] danny0405 commented on a diff in pull request #8546: [MINOR] Add log in flink compact/cluster commit sink for troubleshoot…

2023-04-25 Thread via GitHub
danny0405 commented on code in PR #8546: URL: https://github.com/apache/hudi/pull/8546#discussion_r1177245362 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/clustering/ClusteringCommitSink.java: ## @@ -150,6 +159,13 @@ private void doCommit(String

[GitHub] [hudi] zhuanshenbsj1 commented on pull request #8505: [HUDI-6106] Spark offline compaction/Clustering Job will do clean like Flink job

2023-04-25 Thread via GitHub
zhuanshenbsj1 commented on PR #8505: URL: https://github.com/apache/hudi/pull/8505#issuecomment-1522646175 @hudi-bot run azure

[GitHub] [hudi] danny0405 commented on a diff in pull request #8556: [HUDI-6131] Refactor getWritePathsOfInstants in Flink WriteProfiles

2023-04-25 Thread via GitHub
danny0405 commented on code in PR #8556: URL: https://github.com/apache/hudi/pull/8556#discussion_r1177227588 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/partitioner/profile/WriteProfiles.java: ## @@ -83,22 +83,22 @@ public static void clean(String

[GitHub] [hudi] boneanxs commented on pull request #7627: [HUDI-5517] HoodieTimeline support filter instants by state transition time

2023-04-25 Thread via GitHub
boneanxs commented on PR #7627: URL: https://github.com/apache/hudi/pull/7627#issuecomment-1522638580 @hudi-bot run azure

[GitHub] [hudi] hudi-bot commented on pull request #8452: [HUDI-6077] Add more partition push down filters

2023-04-25 Thread via GitHub
hudi-bot commented on PR #8452: URL: https://github.com/apache/hudi/pull/8452#issuecomment-1522638101 ## CI report: * f751e8281b606c4ae2c43c3aefc5be2cd2cbea66 Azure:

[GitHub] [hudi] boneanxs commented on pull request #8452: [HUDI-6077] Add more partition push down filters

2023-04-25 Thread via GitHub
boneanxs commented on PR #8452: URL: https://github.com/apache/hudi/pull/8452#issuecomment-1522635062 @hudi-bot run azure

[GitHub] [hudi] easonwood commented on issue #8540: [SUPPORT] Getting error when writing into COW HUDI table if schema changed (datatype changed / column dropped)

2023-04-25 Thread via GitHub
easonwood commented on issue #8540: URL: https://github.com/apache/hudi/issues/8540#issuecomment-1522632098 @voonhous Actually, we created a DataFrame containing only the primaryKey for this table, and performed the DELETE operation by writing it to Hudi. The code is like this:

[GitHub] [hudi] hudi-bot commented on pull request #8543: [HUDI-6124] Optimize exception message in HoodieCatalogTable

2023-04-25 Thread via GitHub
hudi-bot commented on PR #8543: URL: https://github.com/apache/hudi/pull/8543#issuecomment-1522628873 ## CI report: * b660f610433f383d8d0cc947028e399b92fa416e Azure:

[GitHub] [hudi] hudi-bot commented on pull request #8548: [HUDI-6126] Fix test `testInsertDatasetWithTimelineTimezoneUTC`

2023-04-25 Thread via GitHub
hudi-bot commented on PR #8548: URL: https://github.com/apache/hudi/pull/8548#issuecomment-1522628913 ## CI report: * ecf4e47c740f033cfbc4735ea52ad3eed13493d1 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #8506: [HUDI-6104] Clean deleted partition with clean policy

2023-04-25 Thread via GitHub
hudi-bot commented on PR #8506: URL: https://github.com/apache/hudi/pull/8506#issuecomment-1522576567 ## CI report: * 7b76be8196e05d6a7bf5149eaab99f9e34858ef2 Azure:

[GitHub] [hudi] nsivabalan commented on pull request #8490: [HUDI-5968] Fix global index duplicate and handle custom payload when update partition

2023-04-25 Thread via GitHub
nsivabalan commented on PR #8490: URL: https://github.com/apache/hudi/pull/8490#issuecomment-1522575172 Do you think we should do the snapshot read only when updatePartitionPath is set to true, and avoid it when it's set to false? I am inclined towards leaving it uniform (and not have two code

[GitHub] [hudi] nsivabalan commented on a diff in pull request #8490: [HUDI-5968] Fix global index duplicate and handle custom payload when update partition

2023-04-25 Thread via GitHub
nsivabalan commented on code in PR #8490: URL: https://github.com/apache/hudi/pull/8490#discussion_r1177166424 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergedReadHandle.java: ## @@ -0,0 +1,173 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [hudi] nsivabalan commented on a diff in pull request #8390: [HUDI-5315] Use sample writes to estimate record size

2023-04-25 Thread via GitHub
nsivabalan commented on code in PR #8390: URL: https://github.com/apache/hudi/pull/8390#discussion_r1177162896 ## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/utils/SparkSampleWritesUtils.java: ## @@ -0,0 +1,143 @@ +/* + * Licensed to the Apache Software

[GitHub] [hudi] nsivabalan commented on a diff in pull request #8390: [HUDI-5315] Use sample writes to estimate record size

2023-04-25 Thread via GitHub
nsivabalan commented on code in PR #8390: URL: https://github.com/apache/hudi/pull/8390#discussion_r1177161536 ## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/utils/SparkSampleWritesUtils.java: ## @@ -0,0 +1,143 @@ +/* + * Licensed to the Apache Software

[GitHub] [hudi] nsivabalan commented on pull request #8490: [HUDI-5968] Fix global index duplicate and handle custom payload when update partition

2023-04-25 Thread via GitHub
nsivabalan commented on PR #8490: URL: https://github.com/apache/hudi/pull/8490#issuecomment-1522545434 So, https://github.com/apache/hudi/pull/8344 is not valid anymore?

[GitHub] [hudi] hudi-bot commented on pull request #8505: [HUDI-6106] Spark offline compaction/Clustering Job will do clean like Flink job

2023-04-25 Thread via GitHub
hudi-bot commented on PR #8505: URL: https://github.com/apache/hudi/pull/8505#issuecomment-1522538020 ## CI report: * f7c73e83812258b53b979afbd6d465e9066b801f UNKNOWN * 269fad02a5346121e823a15c9804e2e63eb16c30 UNKNOWN * 442430f680316bdfefc27c4aca9f7cd94e95373c UNKNOWN *

[GitHub] [hudi] hudi-bot commented on pull request #8505: [HUDI-6106] Spark offline compaction/Clustering Job will do clean like Flink job

2023-04-25 Thread via GitHub
hudi-bot commented on PR #8505: URL: https://github.com/apache/hudi/pull/8505#issuecomment-1522508500 ## CI report: * f7c73e83812258b53b979afbd6d465e9066b801f UNKNOWN * 269fad02a5346121e823a15c9804e2e63eb16c30 UNKNOWN * 442430f680316bdfefc27c4aca9f7cd94e95373c UNKNOWN *

[GitHub] [hudi] hudi-bot commented on pull request #8432: [HUDI-6072] Fix NPE when upsert merger and null map or array

2023-04-25 Thread via GitHub
hudi-bot commented on PR #8432: URL: https://github.com/apache/hudi/pull/8432#issuecomment-1522508319 ## CI report: * 3e9388ee9a6edaa6caab4f738b093f82744bc7dc Azure:

[GitHub] [hudi] hudi-bot commented on pull request #8575: [MINOR] Prevent nullptr exception if enum config class has extra fields

2023-04-25 Thread via GitHub
hudi-bot commented on PR #8575: URL: https://github.com/apache/hudi/pull/8575#issuecomment-1522496057 ## CI report: * ba47591a4088f7a17d922438f9537a9fdf657be7 Azure:

[GitHub] [hudi] rahil-c commented on pull request #8512: [HUDI-6057] Support Flink 1.17

2023-04-25 Thread via GitHub
rahil-c commented on PR #8512: URL: https://github.com/apache/hudi/pull/8512#issuecomment-1522459454 Thanks @PrabhuJoseph for making this contribution. Besides the CI testing, I was wondering if you ran anything manually?

[GitHub] [hudi] rahil-c commented on a diff in pull request #8512: [HUDI-6057] Support Flink 1.17

2023-04-25 Thread via GitHub
rahil-c commented on code in PR #8512: URL: https://github.com/apache/hudi/pull/8512#discussion_r1177089640 ## pom.xml: ## @@ -2373,9 +2374,23 @@ + + flink1.17 + Review Comment: https://github.com/apache/flink/blob/release-1.17/pom.xml#L144

[GitHub] [hudi] rahil-c commented on a diff in pull request #8512: [HUDI-6057] Support Flink 1.17

2023-04-25 Thread via GitHub
rahil-c commented on code in PR #8512: URL: https://github.com/apache/hudi/pull/8512#discussion_r1177089248 ## scripts/release/deploy_staging_jars.sh: ## @@ -75,6 +75,7 @@ declare -a ALL_VERSION_OPTS=( "-Dscala-2.12 -Dflink1.14 -Davro.version=1.10.0 -pl

[GitHub] [hudi] hudi-bot commented on pull request #8505: [HUDI-6106] Spark offline compaction/Clustering Job will do clean like Flink job

2023-04-25 Thread via GitHub
hudi-bot commented on PR #8505: URL: https://github.com/apache/hudi/pull/8505#issuecomment-1522447941 ## CI report: * f7c73e83812258b53b979afbd6d465e9066b801f UNKNOWN * 269fad02a5346121e823a15c9804e2e63eb16c30 UNKNOWN * 442430f680316bdfefc27c4aca9f7cd94e95373c UNKNOWN *

[GitHub] [hudi] soumilshah1995 commented on issue #8400: [SUPPORT] Hudi Offline Compaction in EMR Serverless 6.10 for YouTube Video

2023-04-25 Thread via GitHub
soumilshah1995 commented on issue #8400: URL: https://github.com/apache/hudi/issues/8400#issuecomment-1522438317 No, it is the same error we talked about on the call yesterday. I will paste the screenshot

[hudi] branch master updated: [MINOR] fix misleading configuration value (#8534)

2023-04-25 Thread vinoth
This is an automated email from the ASF dual-hosted git repository. vinoth pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new b690346a700 [MINOR] fix misleading configuration

[GitHub] [hudi] vinothchandar merged pull request #8534: [MINOR] Fix misleading permitted write operation value

2023-04-25 Thread via GitHub
vinothchandar merged PR #8534: URL: https://github.com/apache/hudi/pull/8534

[GitHub] [hudi] vinothchandar commented on pull request #7776: [HUDI-5642] Enable schema reconciliation by default

2023-04-25 Thread via GitHub
vinothchandar commented on PR #7776: URL: https://github.com/apache/hudi/pull/7776#issuecomment-1522397041 @codope please help me understand why this is critical priority?

[jira] [Updated] (HUDI-6138) HoodieAvroRecord - Fix Option get for empty values

2023-04-25 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-6138: - Fix Version/s: 0.14.0 > HoodieAvroRecord - Fix Option get for empty values >

[GitHub] [hudi] hudi-bot commented on pull request #8574: [HUDI-6139] Add support for Transformer schema validation in DeltaStreamer

2023-04-25 Thread via GitHub
hudi-bot commented on PR #8574: URL: https://github.com/apache/hudi/pull/8574#issuecomment-1522369685 ## CI report: * cf4e7358763e10aab951d16d7270f7592f7c62b0 Azure:

[GitHub] [hudi] pravin1406 opened a new issue, #8576: [SUPPORT] Doubt about handling old data arrival in hudi

2023-04-25 Thread via GitHub
pravin1406 opened a new issue, #8576: URL: https://github.com/apache/hudi/issues/8576 Hi, I'm still exploring Hudi for our CDC use case. I have resolved multiple issues with help from the Hudi community, though I have a very basic question. Please help out! I have a base Hudi table. I

[GitHub] [hudi] bvaradar commented on a diff in pull request #8376: [HUDI-6019] support config minPartitions when reading from kafka

2023-04-25 Thread via GitHub
bvaradar commented on code in PR #8376: URL: https://github.com/apache/hudi/pull/8376#discussion_r1176966023 ## hudi-utilities/src/test/java/org/apache/hudi/utilities/sources/helpers/TestCheckpointUtils.java: ## @@ -57,63 +58,191 @@ public void testStringToOffsets() { @Test

[GitHub] [hudi] hudi-bot commented on pull request #8550: [HUDI-6127]Flink Hudi Write support commit on an empty batch

2023-04-25 Thread via GitHub
hudi-bot commented on PR #8550: URL: https://github.com/apache/hudi/pull/8550#issuecomment-1522315403 ## CI report: * 563e10e0492a8194d789772de6bb9ced9f8c0721 UNKNOWN * 25a2ebf3646b2abf99bfba54d947066d3fc16c6b Azure:

[hudi] branch master updated: [HUDI-6090] Optimise payload size for list of FileGroupDTO (#8480)

2023-04-25 Thread sivabalan
This is an automated email from the ASF dual-hosted git repository. sivabalan pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 3e641b2530e [HUDI-6090] Optimise payload size

[GitHub] [hudi] nsivabalan merged pull request #8480: [HUDI-6090] Optimise payload size for list of FileGroupDTO

2023-04-25 Thread via GitHub
nsivabalan merged PR #8480: URL: https://github.com/apache/hudi/pull/8480

[hudi] branch asf-site updated: updated community blog and videos (#8524)

2023-04-25 Thread bhavanisudha
This is an automated email from the ASF dual-hosted git repository. bhavanisudha pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/asf-site by this push: new 6ebd7f9 updated community blog and

[GitHub] [hudi] bhasudha merged pull request #8524: [DOCS] updated community blog and videos

2023-04-25 Thread via GitHub
bhasudha merged PR #8524: URL: https://github.com/apache/hudi/pull/8524

[jira] [Updated] (HUDI-6138) HoodieAvroRecord - Fix Option get for empty values

2023-04-25 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-6138: - Labels: pull-request-available (was: ) > HoodieAvroRecord - Fix Option get for empty values >

[GitHub] [hudi] hudi-bot commented on pull request #8573: [HUDI-6138] Handled empty option for Hoodie Avro Record

2023-04-25 Thread via GitHub
hudi-bot commented on PR #8573: URL: https://github.com/apache/hudi/pull/8573#issuecomment-1522235320 ## CI report: * a7dd5031a22b75707d8e16918f6225596c6a7cd4 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #8566: [HUDI-5761] Added type configs for configs that take in classes

2023-04-25 Thread via GitHub
hudi-bot commented on PR #8566: URL: https://github.com/apache/hudi/pull/8566#issuecomment-1522166795 ## CI report: * 3d2c3bda0581d03bc1cdfb042b1b9611938c7432 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #8566: [HUDI-5761] Added type configs for configs that take in classes

2023-04-25 Thread via GitHub
hudi-bot commented on PR #8566: URL: https://github.com/apache/hudi/pull/8566#issuecomment-1522157508 ## CI report: * 3d2c3bda0581d03bc1cdfb042b1b9611938c7432 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #7627: [HUDI-5517] HoodieTimeline support filter instants by state transition time

2023-04-25 Thread via GitHub
hudi-bot commented on PR #7627: URL: https://github.com/apache/hudi/pull/7627#issuecomment-1522155394 ## CI report: * 85b25f5cda4ccd8189a1607259e1732a910c3262 UNKNOWN * bfb9fbbed9a2423ba1781962cea8ccc277a84880 Azure:

[GitHub] [hudi] codope commented on issue #8178: Duplicate data in MOR table Hudi

2023-04-25 Thread via GitHub
codope commented on issue #8178: URL: https://github.com/apache/hudi/issues/8178#issuecomment-1522128826 Also, previously our spark streaming writes were not idempotent. So, there could be duplicates. We have fixed that in https://github.com/apache/hudi/issues/8178

[GitHub] [hudi] codope commented on issue #8178: Duplicate data in MOR table Hudi

2023-04-25 Thread via GitHub
codope commented on issue #8178: URL: https://github.com/apache/hudi/issues/8178#issuecomment-1522119191 @koochiswathiTR Is the record key field `guid` a randomly generated id, like a uuid? There have been known [issues](https://github.com/apache/hudi/issues/7829) with non-deterministic id

[GitHub] [hudi] hbgstc123 commented on a diff in pull request #8546: [MINOR] Add log in flink compact/cluster commit sink for troubleshoot…

2023-04-25 Thread via GitHub
hbgstc123 commented on code in PR #8546: URL: https://github.com/apache/hudi/pull/8546#discussion_r1176776518 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/clustering/ClusteringCommitSink.java: ## @@ -150,6 +159,13 @@ private void doCommit(String

[GitHub] [hudi] hudi-bot commented on pull request #8556: [HUDI-6131] Refactor getWritePathsOfInstants in Flink WriteProfiles

2023-04-25 Thread via GitHub
hudi-bot commented on PR #8556: URL: https://github.com/apache/hudi/pull/8556#issuecomment-1522096497 ## CI report: * 7a47fbb96e38221107f338a8793f37a5135b7ffd Azure:

[GitHub] [hudi] hudi-bot commented on pull request #8543: [HUDI-6124] Optimize exception message in HoodieCatalogTable

2023-04-25 Thread via GitHub
hudi-bot commented on PR #8543: URL: https://github.com/apache/hudi/pull/8543#issuecomment-1522096344 ## CI report: * 4ee8b4c535b3d7fac3327b37e3ea88aa326fd9cc Azure:

[GitHub] [hudi] hudi-bot commented on pull request #8548: [HUDI-6126] Fix test `testInsertDatasetWithTimelineTimezoneUTC`

2023-04-25 Thread via GitHub
hudi-bot commented on PR #8548: URL: https://github.com/apache/hudi/pull/8548#issuecomment-1522096400 ## CI report: * fbfe0c394ec50c9c82ee2016d8d165381e671c23 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #8506: [HUDI-6104] Clean deleted partition with clean policy

2023-04-25 Thread via GitHub
hudi-bot commented on PR #8506: URL: https://github.com/apache/hudi/pull/8506#issuecomment-1522096115 ## CI report: * 944ecd724b9fa3c783cd0624b6235f0750167c9e Azure:

[GitHub] [hudi] hudi-bot commented on pull request #8432: [HUDI-6072] Fix NPE when upsert merger and null map or array

2023-04-25 Thread via GitHub
hudi-bot commented on PR #8432: URL: https://github.com/apache/hudi/pull/8432#issuecomment-1522095630 ## CI report: * a59cd66283dafe08691023f579f4cadb6308d63f Azure:

[GitHub] [hudi] codope commented on issue #7839: [BUG] the deleted data reappeared after clustering on the table

2023-04-25 Thread via GitHub
codope commented on issue #7839: URL: https://github.com/apache/hudi/issues/7839#issuecomment-1522091991 Downgrading the priority as the issue is not reproducible. Please provide more info as requested above. -- This is an automated message from the Apache Git Service. To respond to the

[GitHub] [hudi] ad1happy2go commented on issue #8153: [SUPPORT] Async Clustering failing for MoR in 0.13.0

2023-04-25 Thread via GitHub
ad1happy2go commented on issue #8153: URL: https://github.com/apache/hudi/issues/8153#issuecomment-1522091307 @haripriyarhp Can you let me know which jar you are using? Also, please paste the complete command. It looks like a version mismatch in the jar. -- This is an automated message from the

[GitHub] [hudi] hudi-bot commented on pull request #8556: [HUDI-6131] Refactor getWritePathsOfInstants in Flink WriteProfiles

2023-04-25 Thread via GitHub
hudi-bot commented on PR #8556: URL: https://github.com/apache/hudi/pull/8556#issuecomment-1522085813 ## CI report: * bb7d54dd589f4347d4c1fb6a1f0f6f0a5a4bd0ac Azure:

[GitHub] [hudi] hudi-bot commented on pull request #8548: [HUDI-6126] Fix test `testInsertDatasetWithTimelineTimezoneUTC`

2023-04-25 Thread via GitHub
hudi-bot commented on PR #8548: URL: https://github.com/apache/hudi/pull/8548#issuecomment-1522085685 ## CI report: * fbfe0c394ec50c9c82ee2016d8d165381e671c23 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #8543: [HUDI-6124] Optimize exception message in HoodieCatalogTable

2023-04-25 Thread via GitHub
hudi-bot commented on PR #8543: URL: https://github.com/apache/hudi/pull/8543#issuecomment-1522085601 ## CI report: * 4ee8b4c535b3d7fac3327b37e3ea88aa326fd9cc Azure:

[GitHub] [hudi] hudi-bot commented on pull request #8506: [HUDI-6104] Clean deleted partition with clean policy

2023-04-25 Thread via GitHub
hudi-bot commented on PR #8506: URL: https://github.com/apache/hudi/pull/8506#issuecomment-1522085367 ## CI report: * 7b820a698d87eaaa54c5146c3eabbc4c7512c394 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #8432: [HUDI-6072] Fix NPE when upsert merger and null map or array

2023-04-25 Thread via GitHub
hudi-bot commented on PR #8432: URL: https://github.com/apache/hudi/pull/8432#issuecomment-1522084930 ## CI report: * a59cd66283dafe08691023f579f4cadb6308d63f Azure:

[GitHub] [hudi] codope closed issue #7733: [SUPPORT] Duplicate rows found in Hudi non partitioned table.

2023-04-25 Thread via GitHub
codope closed issue #7733: [SUPPORT] Duplicate rows found in Hudi non partitioned table. URL: https://github.com/apache/hudi/issues/7733 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [hudi] codope commented on issue #7733: [SUPPORT] Duplicate rows found in Hudi non partitioned table.

2023-04-25 Thread via GitHub
codope commented on issue #7733: URL: https://github.com/apache/hudi/issues/7733#issuecomment-1522076493 Closing due to inactivity, but the issue is fixed in https://github.com/apache/hudi/pull/7944 This is due to the non-partitioned table having a null partition value. -- This is an

[GitHub] [hudi] hudi-bot commented on pull request #8575: [MINOR] Prevent nullptr exception if enum config class has extra fields

2023-04-25 Thread via GitHub
hudi-bot commented on PR #8575: URL: https://github.com/apache/hudi/pull/8575#issuecomment-1522074202 ## CI report: * ba47591a4088f7a17d922438f9537a9fdf657be7 Azure:

[jira] [Comment Edited] (HUDI-5761) Create "Type" configs for current configs that take in classes

2023-04-25 Thread Jonathan Vexler (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17715937#comment-17715937 ] Jonathan Vexler edited comment on HUDI-5761 at 4/25/23 4:18 PM: Additional
