[GitHub] [hudi] stream2000 commented on a diff in pull request #9558: [HUDI-6481] Support run multi tables services in a single spark job

2023-09-04 Thread via GitHub
stream2000 commented on code in PR #9558: URL: https://github.com/apache/hudi/pull/9558#discussion_r1315399483 ## hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieCleaner.java: ## @@ -64,6 +64,13 @@ public HoodieCleaner(Config cfg, JavaSparkContext jssc) {

[GitHub] [hudi] hudi-bot commented on pull request #9616: [HUDI-6819] Fix logic for throwing exception in getRecordIndexUpdates.

2023-09-04 Thread via GitHub
hudi-bot commented on PR #9616: URL: https://github.com/apache/hudi/pull/9616#issuecomment-1705964692 ## CI report: * 856b4de4345faa1524592b0dc4ff955410e09ae0 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #9594: [HUDI-6742] Remove the log file appending for multiple instants

2023-09-04 Thread via GitHub
hudi-bot commented on PR #9594: URL: https://github.com/apache/hudi/pull/9594#issuecomment-1705964593 ## CI report: * 028a58deeb60e5bf9be83508913518259e41d99c Azure:

[GitHub] [hudi] hudi-bot commented on pull request #9616: [HUDI-6819] Fix logic for throwing exception in getRecordIndexUpdates.

2023-09-04 Thread via GitHub
hudi-bot commented on PR #9616: URL: https://github.com/apache/hudi/pull/9616#issuecomment-1705957994 ## CI report: * 856b4de4345faa1524592b0dc4ff955410e09ae0 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the

[GitHub] [hudi] PankajKaushal commented on issue #9613: [SUPPORT] Failed to write with hudi 0.13.0

2023-09-04 Thread via GitHub
PankajKaushal commented on issue #9613: URL: https://github.com/apache/hudi/issues/9613#issuecomment-1705954047 hive_sync.support_timestamp=true hoodie.archive.async=true hoodie.archive.automatic=true hoodie.archivelog.folder=archived hoodie.bulkinsert.shuffle.parallelism=200

[GitHub] [hudi] amrishlal opened a new pull request, #9616: [HUDI-6819] Fix logic for throwing exception in getRecordIndexUpdates.

2023-09-04 Thread via GitHub
amrishlal opened a new pull request, #9616: URL: https://github.com/apache/hudi/pull/9616 ### Change Logs Fix logic for throwing exception in getRecordIndexUpdates. ### Impact None ### Risk level (write none, low medium or high below) Low

[jira] [Created] (HUDI-6819) Fix logic to throw exception in HoodieBackedTableMetadataWriter

2023-09-04 Thread Amrish Lal (Jira)
Amrish Lal created HUDI-6819: Summary: Fix logic to throw exception in HoodieBackedTableMetadataWriter Key: HUDI-6819 URL: https://issues.apache.org/jira/browse/HUDI-6819 Project: Apache Hudi

[GitHub] [hudi] hudi-bot commented on pull request #9553: [HUDI-1517][HUDI-6758][HUDI-6761] Adding support for per-logfile marker to track all log files added by a commit and to assist with rollbacks

2023-09-04 Thread via GitHub
hudi-bot commented on PR #9553: URL: https://github.com/apache/hudi/pull/9553#issuecomment-1705923600 ## CI report: * aeac327c3cad812fea5e2bc01c07c1314bbf1838 UNKNOWN * 9c68af34a1df527cef22a8636f829cc670399593 Azure:

[GitHub] [hudi] KnightChess commented on issue #9615: [SUPPORT] After enable speculation execution of spark compaction job, some broken parquet files might be generated

2023-09-04 Thread via GitHub
KnightChess commented on issue #9615: URL: https://github.com/apache/hudi/issues/9615#issuecomment-1705910526 > @KnightChess We solve the problem by the following update. https://user-images.githubusercontent.com/1525333/265580015-05483dc8-b6dd-45d0-a93c-6d6e341f8767.png > > It

[GitHub] [hudi] beyond1920 commented on issue #9615: [SUPPORT] After enable speculation execution of spark compaction job, some broken parquet files might be generated

2023-09-04 Thread via GitHub
beyond1920 commented on issue #9615: URL: https://github.com/apache/hudi/issues/9615#issuecomment-1705909216 @danny0405 Thanks a lot for reply. I would look into this [pr](https://github.com/apache/hudi/pull/9035) and response later. -- This is an automated message from the Apache Git

[GitHub] [hudi] beyond1920 commented on issue #9615: [SUPPORT] After enable speculation execution of spark compaction job, some broken parquet files might be generated

2023-09-04 Thread via GitHub
beyond1920 commented on issue #9615: URL: https://github.com/apache/hudi/issues/9615#issuecomment-1705907051 @KnightChess We solve the problem by the following update. https://github.com/apache/hudi/assets/1525333/05483dc8-b6dd-45d0-a93c-6d6e341f8767 It could work, but not

[GitHub] [hudi] raghunittala commented on issue #9596: [SUPPORT] Flink job failing with Avro ClassCastException

2023-09-04 Thread via GitHub
raghunittala commented on issue #9596: URL: https://github.com/apache/hudi/issues/9596#issuecomment-1705905754 No worries @danny0405 - I'll try to spend more time on this and update in case of success.

[GitHub] [hudi] beyond1920 commented on issue #9615: [SUPPORT] After enable speculation execution of spark compaction job, some broken parquet files might be generated

2023-09-04 Thread via GitHub
beyond1920 commented on issue #9615: URL: https://github.com/apache/hudi/issues/9615#issuecomment-1705905729 > I'm wondering why the marker based finalizing does not work here, because the slow attempt was killed later, when the instant was committed, this task should have finished,

[GitHub] [hudi] imrewang closed issue #9614: [SUPPORT]No data displayed in hive synchronization partition table

2023-09-04 Thread via GitHub
imrewang closed issue #9614: [SUPPORT]No data displayed in hive synchronization partition table URL: https://github.com/apache/hudi/issues/9614

[GitHub] [hudi] danny0405 commented on a diff in pull request #9558: [HUDI-6481] Support run multi tables services in a single spark job

2023-09-04 Thread via GitHub
danny0405 commented on code in PR #9558: URL: https://github.com/apache/hudi/pull/9558#discussion_r1315315332 ## hudi-utilities/src/main/java/org/apache/hudi/utilities/multitable/ArchiveTask.java: ## @@ -0,0 +1,90 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] [hudi] danny0405 commented on a diff in pull request #9558: [HUDI-6481] Support run multi tables services in a single spark job

2023-09-04 Thread via GitHub
danny0405 commented on code in PR #9558: URL: https://github.com/apache/hudi/pull/9558#discussion_r1315315003 ## hudi-utilities/src/main/java/org/apache/hudi/utilities/multitable/ArchiveTask.java: ## @@ -0,0 +1,90 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] [hudi] danny0405 commented on a diff in pull request #9558: [HUDI-6481] Support run multi tables services in a single spark job

2023-09-04 Thread via GitHub
danny0405 commented on code in PR #9558: URL: https://github.com/apache/hudi/pull/9558#discussion_r1315314496 ## hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieCleaner.java: ## @@ -64,6 +64,13 @@ public HoodieCleaner(Config cfg, JavaSparkContext jssc) {

[GitHub] [hudi] danny0405 commented on issue #9615: [SUPPORT] After enable speculation execution of spark compaction job, some broken parquet files might be generated

2023-09-04 Thread via GitHub
danny0405 commented on issue #9615: URL: https://github.com/apache/hudi/issues/9615#issuecomment-1705877649 Maybe related PR: https://github.com/apache/hudi/pull/9035

[GitHub] [hudi] danny0405 commented on issue #9615: [SUPPORT] After enable speculation execution of spark compaction job, some broken parquet files might be generated

2023-09-04 Thread via GitHub
danny0405 commented on issue #9615: URL: https://github.com/apache/hudi/issues/9615#issuecomment-1705877008 > we met this question too, the spark speculation task thread will not be killed still job finally commit instance The solutions: 1. Can we add a hook in the write handle to

[GitHub] [hudi] danny0405 commented on pull request #9611: [HUDI-6758] Fixing deducing spurious log blocks due to spark retries

2023-09-04 Thread via GitHub
danny0405 commented on PR #9611: URL: https://github.com/apache/hudi/pull/9611#issuecomment-1705873396 > working solution e2e and has been tested w/ spark retries as well. Okay, I'm just scared for any regression because the changes are very core, I know we already have 2e2 tests

[GitHub] [hudi] danny0405 commented on a diff in pull request #9611: [HUDI-6758] Fixing deducing spurious log blocks due to spark retries

2023-09-04 Thread via GitHub
danny0405 commented on code in PR #9611: URL: https://github.com/apache/hudi/pull/9611#discussion_r1315308129 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java: ## @@ -143,6 +144,7 @@ public HoodieAppendHandle(HoodieWriteConfig config,

[GitHub] [hudi] danny0405 commented on issue #9596: [SUPPORT] Flink job failing with Avro ClassCastException

2023-09-04 Thread via GitHub
danny0405 commented on issue #9596: URL: https://github.com/apache/hudi/issues/9596#issuecomment-1705866902 @raghunittala Sorry I'm not a Avro expert, kind of think it is a avro version conflict issue.

[GitHub] [hudi] KnightChess commented on issue #9615: [SUPPORT] After enable speculation execution of spark compaction job, some broken parquet files might be generated

2023-09-04 Thread via GitHub
KnightChess commented on issue #9615: URL: https://github.com/apache/hudi/issues/9615#issuecomment-1705866097 we met this question too, the spark speculation task thread will not be killed still job finally commit instance

[GitHub] [hudi] KnightChess commented on pull request #9035: [HUDI-6416] Completion markers for handling execution engine (spark) …

2023-09-04 Thread via GitHub
KnightChess commented on PR #9035: URL: https://github.com/apache/hudi/pull/9035#issuecomment-1705865010 @nbalajee is this can resolve the orphan file which product by spark speculation execution, which create maker file after commit submit

[jira] [Closed] (HUDI-6766) Fixing mysql debezium data loss

2023-09-04 Thread Danny Chen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen closed HUDI-6766. Resolution: Fixed Fixed via master branch: c77188009252bfaea370a9bfefa68a8c02eca976 > Fixing mysql

[hudi] branch master updated: [HUDI-6766] Fixing mysql debezium data loss (#9475)

2023-09-04 Thread danny0405
This is an automated email from the ASF dual-hosted git repository. danny0405 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new c7718800925 [HUDI-6766] Fixing mysql debezium

[GitHub] [hudi] danny0405 merged pull request #9475: [HUDI-6766] Fixing mysql debezium data loss

2023-09-04 Thread via GitHub
danny0405 merged PR #9475: URL: https://github.com/apache/hudi/pull/9475

[GitHub] [hudi] danny0405 commented on issue #9615: [SUPPORT] After enable speculation execution of spark compaction job, some broken parquet files might be generated

2023-09-04 Thread via GitHub
danny0405 commented on issue #9615: URL: https://github.com/apache/hudi/issues/9615#issuecomment-1705862155 I'm wondering why the marker based finalizing does not work here, because the slow attempt was killed later, when the instant was committed, this task should have finished,

[GitHub] [hudi] hudi-bot commented on pull request #9594: [HUDI-6742] Remove the log file appending for multiple instants

2023-09-04 Thread via GitHub
hudi-bot commented on PR #9594: URL: https://github.com/apache/hudi/pull/9594#issuecomment-1705859653 ## CI report: * 2690f091ae54e806131354707fc0c514fbcf9696 Azure:

[GitHub] [hudi] danny0405 commented on issue #9513: [SUPPORT]Index Bootstrap deleted some snapshot data that has been batch-inserted into Hudi ?

2023-09-04 Thread via GitHub
danny0405 commented on issue #9513: URL: https://github.com/apache/hudi/issues/9513#issuecomment-1705859616 Can you print out the debezium log to see what the input records look like?

[GitHub] [hudi] hudi-bot commented on pull request #9482: [HUDI-6728] Update BigQuery manifest sync to support schema evolution

2023-09-04 Thread via GitHub
hudi-bot commented on PR #9482: URL: https://github.com/apache/hudi/pull/9482#issuecomment-1705859439 ## CI report: * c446099c646cdec2d82d4b6037362b121dedbf6c Azure:

[hudi] branch master updated (31bc565b5d5 -> 7344a2dbd28)

2023-09-04 Thread danny0405
This is an automated email from the ASF dual-hosted git repository. danny0405 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git from 31bc565b5d5 [HUDI-6804] Fix hive read schema evolution MOR table (#9573) add 7344a2dbd28 [HUDI-6818] Create a

[jira] [Closed] (HUDI-6818) Create a database automatically when using the flink catalog dfs mode

2023-09-04 Thread Danny Chen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen closed HUDI-6818. Resolution: Fixed Fixed via master branch: 7344a2dbd2810c3ab5259a9a13f76a3f11e2840d > Create a database

[GitHub] [hudi] danny0405 merged pull request #9592: [HUDI-6818] Create a database automatically when using the flink catalog dfs mode

2023-09-04 Thread via GitHub
danny0405 merged PR #9592: URL: https://github.com/apache/hudi/pull/9592

[jira] [Created] (HUDI-6818) Create a database automatically when using the flink catalog dfs mode

2023-09-04 Thread Danny Chen (Jira)
Danny Chen created HUDI-6818: Summary: Create a database automatically when using the flink catalog dfs mode Key: HUDI-6818 URL: https://issues.apache.org/jira/browse/HUDI-6818 Project: Apache Hudi

[GitHub] [hudi] hudi-bot commented on pull request #9594: [HUDI-6742] Remove the log file appending for multiple instants

2023-09-04 Thread via GitHub
hudi-bot commented on PR #9594: URL: https://github.com/apache/hudi/pull/9594#issuecomment-1705854518 ## CI report: * 2690f091ae54e806131354707fc0c514fbcf9696 Azure:

[jira] [Created] (HUDI-6817) After enable speculation execution of spark compaction job, some broken parquet files might be generated

2023-09-04 Thread Jing Zhang (Jira)
Jing Zhang created HUDI-6817: Summary: After enable speculation execution of spark compaction job, some broken parquet files might be generated Key: HUDI-6817 URL: https://issues.apache.org/jira/browse/HUDI-6817

[GitHub] [hudi] beyond1920 commented on issue #9615: [SUPPORT] After enable speculation execution of spark compaction job, some broken parquet files might be generated

2023-09-04 Thread via GitHub
beyond1920 commented on issue #9615: URL: https://github.com/apache/hudi/issues/9615#issuecomment-1705848097 Creates a JIRA [HUDI-6817](https://issues.apache.org/jira/browse/HUDI-6817) to track this.

[GitHub] [hudi] hudi-bot commented on pull request #9553: [HUDI-1517][HUDI-6758][HUDI-6761] Adding support for per-logfile marker to track all log files added by a commit and to assist with rollbacks

2023-09-04 Thread via GitHub
hudi-bot commented on PR #9553: URL: https://github.com/apache/hudi/pull/9553#issuecomment-1705847708 ## CI report: * aeac327c3cad812fea5e2bc01c07c1314bbf1838 UNKNOWN * 31d9cf29d09360563e235291abfc717f83c99220 Azure:

[GitHub] [hudi] beyond1920 opened a new issue, #9615: [SUPPORT] After enable speculation execution of spark compaction job, some broken parquet files might be generated

2023-09-04 Thread via GitHub
beyond1920 opened a new issue, #9615: URL: https://github.com/apache/hudi/issues/9615 Dear community, After enable speculation execution of spark compaction job, some broken parquet might be generated. It would lead to subsequent jobs (no matter reader jobs, ingestion jobs or compaction

[jira] [Closed] (HUDI-6804) Fix hive read schema evolution table

2023-09-04 Thread Danny Chen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen closed HUDI-6804. Resolution: Fixed Fixed via master branch: 31bc565b5d55017dadafab7daee44f8cefb2528a > Fix hive read schema

[jira] [Updated] (HUDI-6804) Fix hive read schema evolution table

2023-09-04 Thread Danny Chen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen updated HUDI-6804: - Fix Version/s: 0.14.0 > Fix hive read schema evolution table > > >

[hudi] branch master updated: [HUDI-6804] Fix hive read schema evolution MOR table (#9573)

2023-09-04 Thread danny0405
This is an automated email from the ASF dual-hosted git repository. danny0405 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 31bc565b5d5 [HUDI-6804] Fix hive read schema

[GitHub] [hudi] danny0405 merged pull request #9573: [HUDI-6804] Fix hive read schema evolution MOR table

2023-09-04 Thread via GitHub
danny0405 merged PR #9573: URL: https://github.com/apache/hudi/pull/9573

[GitHub] [hudi] imrewang opened a new issue, #9614: [SUPPORT]No data displayed in hive synchronization partition table

2023-09-04 Thread via GitHub
imrewang opened a new issue, #9614: URL: https://github.com/apache/hudi/issues/9614 1. When I synchronize the **partition table** to the hive table, must I manually add the **external table** and **partition** in Hive before I can query the data **?** 2. Now I only add external

[jira] [Commented] (HUDI-6725) Support efficient completion time queries on the timeline

2023-09-04 Thread Danny Chen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17761914#comment-17761914 ] Danny Chen commented on HUDI-6725: -- Fixed via master branch: a3eea2fdccd40a6439d6e8d72bf3ae53f5967893 >

[jira] [Resolved] (HUDI-6725) Support efficient completion time queries on the timeline

2023-09-04 Thread Danny Chen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen resolved HUDI-6725. -- > Support efficient completion time queries on the timeline >

[hudi] branch master updated: [HUDI-6725] Support efficient completion time queries on the timeline (#9565)

2023-09-04 Thread danny0405
This is an automated email from the ASF dual-hosted git repository. danny0405 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new a3eea2fdccd [HUDI-6725] Support efficient

[GitHub] [hudi] danny0405 merged pull request #9565: [HUDI-6725] Support efficient completion time queries on the timeline

2023-09-04 Thread via GitHub
danny0405 merged PR #9565: URL: https://github.com/apache/hudi/pull/9565

[GitHub] [hudi] hudi-bot commented on pull request #9553: [HUDI-1517][HUDI-6758][HUDI-6761] Adding support for per-logfile marker to track all log files added by a commit and to assist with rollbacks

2023-09-04 Thread via GitHub
hudi-bot commented on PR #9553: URL: https://github.com/apache/hudi/pull/9553#issuecomment-1705819011 ## CI report: * aeac327c3cad812fea5e2bc01c07c1314bbf1838 UNKNOWN * 31d9cf29d09360563e235291abfc717f83c99220 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #9482: [HUDI-6728] Update BigQuery manifest sync to support schema evolution

2023-09-04 Thread via GitHub
hudi-bot commented on PR #9482: URL: https://github.com/apache/hudi/pull/9482#issuecomment-1705756953 ## CI report: * b84a6f31d753b486645d333fd645f7841de3e6e8 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #9482: [HUDI-6728] Update BigQuery manifest sync to support schema evolution

2023-09-04 Thread via GitHub
hudi-bot commented on PR #9482: URL: https://github.com/apache/hudi/pull/9482#issuecomment-1705737702 ## CI report: * b84a6f31d753b486645d333fd645f7841de3e6e8 Azure:

[GitHub] [hudi] bhasudha commented on a diff in pull request #9603: [DOCS] Minor doc fixes

2023-09-04 Thread via GitHub
bhasudha commented on code in PR #9603: URL: https://github.com/apache/hudi/pull/9603#discussion_r1315217100 ## website/docs/quick-start-guide.md: ## @@ -20,7 +20,8 @@ Hudi works with Spark-2.4.3+ & Spark 3.x versions. You can follow instructions [ | Hudi|

[GitHub] [hudi] bhasudha commented on a diff in pull request #9603: [DOCS] Minor doc fixes

2023-09-04 Thread via GitHub
bhasudha commented on code in PR #9603: URL: https://github.com/apache/hudi/pull/9603#discussion_r1315216513 ## website/docs/quick-start-guide.md: ## @@ -20,7 +20,8 @@ Hudi works with Spark-2.4.3+ & Spark 3.x versions. You can follow instructions [ | Hudi|

[hudi] branch asf-site updated: [DOCS] Update cleaning page (#9560)

2023-09-04 Thread bhavanisudha
This is an automated email from the ASF dual-hosted git repository. bhavanisudha pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/asf-site by this push: new 0fc1724b5ed [DOCS] Update cleaning page

[GitHub] [hudi] bhasudha merged pull request #9560: [DOCS] Update cleaning page

2023-09-04 Thread via GitHub
bhasudha merged PR #9560: URL: https://github.com/apache/hudi/pull/9560

[GitHub] [hudi] bhasudha commented on a diff in pull request #9560: [DOCS] Update cleaning page

2023-09-04 Thread via GitHub
bhasudha commented on code in PR #9560: URL: https://github.com/apache/hudi/pull/9560#discussion_r1315214826 ## website/docs/hoodie_cleaner.md: ## @@ -1,44 +1,82 @@ --- title: Cleaning toc: true +toc_min_heading_level: 2 +toc_max_heading_level: 4 --- +## Background

[GitHub] [hudi] Hans-Raintree commented on issue #8968: [SUPPORT] Upsert fails with CDC logging enabled when deleted record does not exist.

2023-09-04 Thread via GitHub
Hans-Raintree commented on issue #8968: URL: https://github.com/apache/hudi/issues/8968#issuecomment-1705697779 Hey @ad1happy2go, it works!!

[GitHub] [hudi] hudi-bot commented on pull request #9579: [HUDI-6776] Replace JSON with Avro bytes for commit metadata

2023-09-04 Thread via GitHub
hudi-bot commented on PR #9579: URL: https://github.com/apache/hudi/pull/9579#issuecomment-1705593855 ## CI report: * d513b2964747630e3656ec1fe86755d656149e6a Azure:

[GitHub] [hudi] hudi-bot commented on pull request #9579: [HUDI-6776] Replace JSON with Avro bytes for commit metadata

2023-09-04 Thread via GitHub
hudi-bot commented on PR #9579: URL: https://github.com/apache/hudi/pull/9579#issuecomment-1705556894 ## CI report: * 8a550a00ddfac89c9f75e32b29af161e7485dd09 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #9579: [HUDI-6776] Replace JSON with Avro bytes for commit metadata

2023-09-04 Thread via GitHub
hudi-bot commented on PR #9579: URL: https://github.com/apache/hudi/pull/9579#issuecomment-1705548436 ## CI report: * aa85eb7a559ddfe00fe751c29e680d725f984216 Azure:

[jira] [Created] (HUDI-6816) Remove JSON HoodieCommitMetadata altogether

2023-09-04 Thread Sagar Sumit (Jira)
Sagar Sumit created HUDI-6816: - Summary: Remove JSON HoodieCommitMetadata altogether Key: HUDI-6816 URL: https://issues.apache.org/jira/browse/HUDI-6816 Project: Apache Hudi Issue Type: Task

[jira] [Updated] (HUDI-6776) Unify commit metadata content in json for completed and avro for pending commits

2023-09-04 Thread Sagar Sumit (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Sumit updated HUDI-6776: -- Status: Patch Available (was: In Progress) > Unify commit metadata content in json for completed and

[GitHub] [hudi] hudi-bot commented on pull request #9579: [HUDI-6776] Replace JSON with Avro bytes for commit metadata

2023-09-04 Thread via GitHub
hudi-bot commented on PR #9579: URL: https://github.com/apache/hudi/pull/9579#issuecomment-1705511865 ## CI report: * aa85eb7a559ddfe00fe751c29e680d725f984216 Azure:

[GitHub] [hudi] imrewang commented on issue #9513: [SUPPORT]Index Bootstrap deleted some snapshot data that has been batch-inserted into Hudi ?

2023-09-04 Thread via GitHub
imrewang commented on issue #9513: URL: https://github.com/apache/hudi/issues/9513#issuecomment-1705485069 still not working

[GitHub] [hudi] ad1happy2go commented on issue #9613: [SUPPORT] Failed to write with hudi 0.13.0

2023-09-04 Thread via GitHub
ad1happy2go commented on issue #9613: URL: https://github.com/apache/hudi/issues/9613#issuecomment-1705343859 @PankajKaushal Can you post the table properties and writer configs. As you are seeing data loss, Can you try upgrading version to 0.13.1

[GitHub] [hudi] ad1happy2go commented on issue #9588: [SUPPORT] [spark_sql]deleting data in Hudi based on partitioning, the corresponding partition in the Hudi metadata will be removed. However, the p

2023-09-04 Thread via GitHub
ad1happy2go commented on issue #9588: URL: https://github.com/apache/hudi/issues/9588#issuecomment-1705320186 @lucienoz Show partitions get the folder names from file system, We dont delete the partition directory from file system for other usecases like historical queries. The reason for

[GitHub] [hudi] raghunittala commented on issue #9596: [SUPPORT] Flink job failing with Avro ClassCastException

2023-09-04 Thread via GitHub
raghunittala commented on issue #9596: URL: https://github.com/apache/hudi/issues/9596#issuecomment-1705266623 Hi @danny0405 - We're not deserializing the payload. We are consuming a protobuf format from Kafka and writing to Hudi table as it is. We do not have any transformations in our

[GitHub] [hudi] BBency closed issue #9094: Async Clustering failing with errors for MOR table

2023-09-04 Thread via GitHub
BBency closed issue #9094: Async Clustering failing with errors for MOR table URL: https://github.com/apache/hudi/issues/9094

[GitHub] [hudi] beyond1920 commented on a diff in pull request #9594: [HUDI-6742] Remove the log file appending for multiple instants

2023-09-04 Thread via GitHub
beyond1920 commented on code in PR #9594: URL: https://github.com/apache/hudi/pull/9594#discussion_r1313771543 ## hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieLogFormatWriter.java: ## @@ -87,40 +86,14 @@ public long getSizeThreshold() { * Lazily opens

[GitHub] [hudi] raghunadh-nittala-swi commented on issue #9596: [SUPPORT] Flink job failing with Avro ClassCastException

2023-09-04 Thread via GitHub
raghunadh-nittala-swi commented on issue #9596: URL: https://github.com/apache/hudi/issues/9596#issuecomment-1705138335 Hi @danny0405 - We're not deserializing the payload. We are consuming a protobuf format from Kafka and writing to Hudi table as it is. We do not have any transformations

[GitHub] [hudi] hudi-bot commented on pull request #9594: [HUDI-6742] Remove the log file appending for multiple instants

2023-09-04 Thread via GitHub
hudi-bot commented on PR #9594: URL: https://github.com/apache/hudi/pull/9594#issuecomment-1705081829 ## CI report: * 2690f091ae54e806131354707fc0c514fbcf9696 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #9221: [HUDI-6550] Add Hadoop conf to HiveConf for HiveSyncConfig

2023-09-04 Thread via GitHub
hudi-bot commented on PR #9221: URL: https://github.com/apache/hudi/pull/9221#issuecomment-1704985804 ## CI report: * 63a32a980dfa7535b780966411acedd87b02f43b Azure:

[GitHub] [hudi] FWLamb commented on pull request #3771: [HUDI-2402] Add Kerberos configuration options to Hive Sync

2023-09-04 Thread via GitHub
FWLamb commented on PR #3771: URL: https://github.com/apache/hudi/pull/3771#issuecomment-1704893487 So now HiveSync supports kerberos or not?

[GitHub] [hudi] hudi-bot commented on pull request #9594: [HUDI-6742] Remove the log file appending for multiple instants

2023-09-04 Thread via GitHub
hudi-bot commented on PR #9594: URL: https://github.com/apache/hudi/pull/9594#issuecomment-1704837742 ## CI report: * 9ab27e9648ecdbbfbb55e0c4424bc5cfc858c69a Azure:

[GitHub] [hudi] hudi-bot commented on pull request #9594: [HUDI-6742] Remove the log file appending for multiple instants

2023-09-04 Thread via GitHub
hudi-bot commented on PR #9594: URL: https://github.com/apache/hudi/pull/9594#issuecomment-1704824206 ## CI report: * 9ab27e9648ecdbbfbb55e0c4424bc5cfc858c69a Azure:

[GitHub] [hudi] hudi-bot commented on pull request #9221: [HUDI-6550] Add Hadoop conf to HiveConf for HiveSyncConfig

2023-09-04 Thread via GitHub
hudi-bot commented on PR #9221: URL: https://github.com/apache/hudi/pull/9221#issuecomment-1704809907 ## CI report: * d61cae59e1bd9024ee67a9c7917ffbb2fc44236f Azure:

[jira] [Updated] (HUDI-6805) Print detailed error messages in clustering

2023-09-04 Thread Akira Ajisaka (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira Ajisaka updated HUDI-6805: Description: If clustering failed, it's not printing the detailed error reason. For example, in

[jira] [Commented] (HUDI-6805) Print detailed error messages in clustering

2023-09-04 Thread Akira Ajisaka (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17761723#comment-17761723 ] Akira Ajisaka commented on HUDI-6805: - Thank you Danny! > Print detailed error messages in clustering

[GitHub] [hudi] aajisaka commented on pull request #9577: [HUDI-6805] Print detailed error message in clustering

2023-09-04 Thread via GitHub
aajisaka commented on PR #9577: URL: https://github.com/apache/hudi/pull/9577#issuecomment-1704777437 Thanks!

[GitHub] [hudi] hudi-bot commented on pull request #9221: [HUDI-6550] Add Hadoop conf to HiveConf for HiveSyncConfig

2023-09-04 Thread via GitHub
hudi-bot commented on PR #9221: URL: https://github.com/apache/hudi/pull/9221#issuecomment-1704749615 ## CI report: * d61cae59e1bd9024ee67a9c7917ffbb2fc44236f Azure:

[GitHub] [hudi] xushiyan commented on pull request #9221: [HUDI-6550] Add Hadoop conf to HiveConf for HiveSyncConfig

2023-09-04 Thread via GitHub
xushiyan commented on PR #9221: URL: https://github.com/apache/hudi/pull/9221#issuecomment-1704733085 > The issue above only show up when using new HiveConf(hadoopConf, HiveConf.class). When it's reverted it to HiveConf hiveConf = new HiveConf(); hiveConf.addResource(hadoopConf); it works

[GitHub] [hudi] nsivabalan commented on a diff in pull request #9611: [HUDI-6758] Fixing deducing spurious log blocks due to spark retries

2023-09-04 Thread via GitHub
nsivabalan commented on code in PR #9611: URL: https://github.com/apache/hudi/pull/9611#discussion_r1314502200 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java: ## @@ -461,11 +464,11 @@ protected void appendDataAndDeleteBlocks(Map

[GitHub] [hudi] danny0405 closed issue #9598: [SUPPORT]Fix bootstrap operator null point exception while lastInstantTime is null

2023-09-04 Thread via GitHub
danny0405 closed issue #9598: [SUPPORT]Fix bootstrap operator null point exception while lastInstantTime is null URL: https://github.com/apache/hudi/issues/9598

[GitHub] [hudi] danny0405 commented on a diff in pull request #9597: make SimpleAvroKeyGenerator support multi partition key

2023-09-04 Thread via GitHub
danny0405 commented on code in PR #9597: URL: https://github.com/apache/hudi/pull/9597#discussion_r1314488641 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/keygen/SimpleAvroKeyGenerator.java: ## @@ -41,7 +43,10 @@ public SimpleAvroKeyGenerator(TypedProperties

[jira] [Updated] (HUDI-6805) Print detailed error messages in clustering

2023-09-04 Thread Danny Chen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen updated HUDI-6805: - Fix Version/s: 1.0.0 > Print detailed error messages in clustering >

[jira] [Closed] (HUDI-6805) Print detailed error messages in clustering

2023-09-04 Thread Danny Chen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen closed HUDI-6805. Resolution: Fixed Fixed via master branch: f4e486ea3c369f441dcb39f51679720f85a3971c > Print detailed error

[hudi] branch master updated (09ed2cbffd5 -> f4e486ea3c3)

2023-09-04 Thread danny0405
This is an automated email from the ASF dual-hosted git repository. danny0405 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git from 09ed2cbffd5 [HUDI-6812]Fix bootstrap operator null point exception while lastInstantTime is null (#9599) add

[GitHub] [hudi] danny0405 merged pull request #9577: [HUDI-6805] Print detailed error message in clustering

2023-09-04 Thread via GitHub
danny0405 merged PR #9577: URL: https://github.com/apache/hudi/pull/9577