[GitHub] [hudi] zhuanshenbsj1 commented on pull request #7159: [HUDI-5173]Skip if there is only one file in clusteringGroup

2023-01-31 Thread via GitHub
zhuanshenbsj1 commented on PR #7159: URL: https://github.com/apache/hudi/pull/7159#issuecomment-1411595328 @hudi-bot run azure -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment

[jira] [Updated] (HUDI-5671) BucketIndexPartitioner partition algorithm skew

2023-01-31 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-5671: - Labels: pull-request-available (was: ) > BucketIndexPartitioner partition algorithm skew > --

[GitHub] [hudi] loukey-lj opened a new pull request, #7815: [HUDI-5671] BucketIndexPartitioner partition algorithm skew

2023-01-31 Thread via GitHub
loukey-lj opened a new pull request, #7815: URL: https://github.com/apache/hudi/pull/7815 ### Change Logs After the online job had run for 13 days, we found subtasks that were processing no data, as shown in the figure below. This job uses the update time as the partition, uses the b

[GitHub] [hudi] Archie-selfless commented on issue #7730: [SUPPORT] Hive Sync Tool parses timestamp field as bigint in Hive metastore

2023-01-31 Thread via GitHub
Archie-selfless commented on issue #7730: URL: https://github.com/apache/hudi/issues/7730#issuecomment-1411571048 @danny0405 @lokeshj1703 I had the same problem when using Flink to ingest data (one MySQL table) into Hudi while syncing metadata to Hive. Environment versions are as follows: - Hudi-0.12.1

[jira] [Updated] (HUDI-5673) Support multi writer for bucket index with guarded lock

2023-01-31 Thread Danny Chen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen updated HUDI-5673: - Summary: Support multi writer for bucket index with guarded lock (was: Support multi-writer for bucket in

[jira] [Created] (HUDI-5674) Support multi-writer for state-backend index

2023-01-31 Thread Danny Chen (Jira)
Danny Chen created HUDI-5674: Summary: Support multi-writer for state-backend index Key: HUDI-5674 URL: https://issues.apache.org/jira/browse/HUDI-5674 Project: Apache Hudi Issue Type: New Featur

[jira] [Created] (HUDI-5673) Support multi-writer for bucket index with guarded lock

2023-01-31 Thread Danny Chen (Jira)
Danny Chen created HUDI-5673: Summary: Support multi-writer for bucket index with guarded lock Key: HUDI-5673 URL: https://issues.apache.org/jira/browse/HUDI-5673 Project: Apache Hudi Issue Type:

[jira] [Created] (HUDI-5672) Flink multi writer support

2023-01-31 Thread Danny Chen (Jira)
Danny Chen created HUDI-5672: Summary: Flink multi writer support Key: HUDI-5672 URL: https://issues.apache.org/jira/browse/HUDI-5672 Project: Apache Hudi Issue Type: Epic Reporter: D

[jira] [Created] (HUDI-5671) BucketIndexPartitioner partition algorithm skew

2023-01-31 Thread loukey_j (Jira)
loukey_j created HUDI-5671: -- Summary: BucketIndexPartitioner partition algorithm skew Key: HUDI-5671 URL: https://issues.apache.org/jira/browse/HUDI-5671 Project: Apache Hudi Issue Type: Improvement

[GitHub] [hudi] scxwhite commented on pull request #7664: [HUDI-5551] support seconds unit on event_time

2023-01-31 Thread via GitHub
scxwhite commented on PR #7664: URL: https://github.com/apache/hudi/pull/7664#issuecomment-1411553653 @danny0405 @yihua The test has been added and completed; please help review it. Thank you very much.

[jira] [Updated] (HUDI-5660) Support bucket index for spark bulk_insert

2023-01-31 Thread xi chaomin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xi chaomin updated HUDI-5660: - Issue Type: Improvement (was: Bug) > Support bucket index for spark bulk_insert > ---

[jira] [Updated] (HUDI-5661) Add ConflictResolutionStrategy for bucket index

2023-01-31 Thread xi chaomin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xi chaomin updated HUDI-5661: - Issue Type: Bug (was: Improvement) > Add ConflictResolutionStrategy for bucket index > --

[jira] [Updated] (HUDI-5660) Support bucket index for spark bulk_insert

2023-01-31 Thread xi chaomin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xi chaomin updated HUDI-5660: - Issue Type: Bug (was: Improvement) > Support bucket index for spark bulk_insert > ---

[GitHub] [hudi] hudi-bot commented on pull request #7378: [HUDI-5329] spark reads hudi table error when flink creates the table without preCombine fields

2023-01-31 Thread via GitHub
hudi-bot commented on PR #7378: URL: https://github.com/apache/hudi/pull/7378#issuecomment-1411547289 ## CI report: * 576dad186d2bfbe4bc497d75d17c6ded88df35a5 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1480

[GitHub] [hudi] hudi-bot commented on pull request #7803: [HUDI-5661] Add ConflictResolutionStrategy for bucket index

2023-01-31 Thread via GitHub
hudi-bot commented on PR #7803: URL: https://github.com/apache/hudi/pull/7803#issuecomment-1411543085 ## CI report: * 410d0d504acaa6ff46ee85bb3dddb46cf5fb18fb UNKNOWN * 93948c73f4936b57263752cefc44c758f603bcf5 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2

[GitHub] [hudi] hudi-bot commented on pull request #7378: [HUDI-5329] spark reads hudi table error when flink creates the table without preCombine fields

2023-01-31 Thread via GitHub
hudi-bot commented on PR #7378: URL: https://github.com/apache/hudi/pull/7378#issuecomment-1411542578 ## CI report: * 576dad186d2bfbe4bc497d75d17c6ded88df35a5 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1480

[GitHub] [hudi] hudi-bot commented on pull request #7614: [HUDI-5509] check if dfs support atomic creation when using filesyste…

2023-01-31 Thread via GitHub
hudi-bot commented on PR #7614: URL: https://github.com/apache/hudi/pull/7614#issuecomment-1411538121 ## CI report: * 058ab2703bda207fc9f5861d5e4b865e83ee1b45 UNKNOWN * 3c010a86327c341b29aaea9ff6ca571855951bd3 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2

[GitHub] [hudi] hudi-bot commented on pull request #7362: [HUDI-5315] The record size is dynamically estimated when the table i…

2023-01-31 Thread via GitHub
hudi-bot commented on PR #7362: URL: https://github.com/apache/hudi/pull/7362#issuecomment-1411537759 ## CI report: * b3e842754a302dc1372b330a8c32298d49732107 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1483

[GitHub] [hudi] hudi-bot commented on pull request #6856: [HUDI-4968] Update misleading read.streaming.skip_compaction/skip_clustering config

2023-01-31 Thread via GitHub
hudi-bot commented on PR #6856: URL: https://github.com/apache/hudi/pull/6856#issuecomment-1411537136 ## CI report: * 5b1de9e25fa1e58d90d315b52683b3a506ccbdd4 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1483

[GitHub] [hudi] LinMingQiang commented on pull request #7719: [HUDI-5584] When the table to be synchronized already exists in hive,…

2023-01-31 Thread via GitHub
LinMingQiang commented on PR #7719: URL: https://github.com/apache/hudi/pull/7719#issuecomment-1411533859 > This modification adds an API method, maybe we can do it in another way. We can place storage descriptor modifications within the scope of the `org.apache.hudi.hive.HoodieHiveSyncClie

[GitHub] [hudi] hudi-bot commented on pull request #7812: [HUDI-5669]fix BucketIndexPartitioner data skew

2023-01-31 Thread via GitHub
hudi-bot commented on PR #7812: URL: https://github.com/apache/hudi/pull/7812#issuecomment-1411532688 ## CI report: * 27a7aee4207a4f7050b57ffc20562ed4aae2f1fd Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1483

[GitHub] [hudi] hudi-bot commented on pull request #7803: [HUDI-5661] Add ConflictResolutionStrategy for bucket index

2023-01-31 Thread via GitHub
hudi-bot commented on PR #7803: URL: https://github.com/apache/hudi/pull/7803#issuecomment-1411532623 ## CI report: * 410d0d504acaa6ff46ee85bb3dddb46cf5fb18fb UNKNOWN * 93948c73f4936b57263752cefc44c758f603bcf5 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2

[hudi] branch master updated: [HUDI-5317] Fix insert overwrite table for partitioned table (#7793)

2023-01-31 Thread leesf
This is an automated email from the ASF dual-hosted git repository. leesf pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 0a9a6d20471 [HUDI-5317] Fix insert overwrite table

[GitHub] [hudi] leesf merged pull request #7793: [HUDI-5317] Fix insert overwrite table for partitioned table

2023-01-31 Thread via GitHub
leesf merged PR #7793: URL: https://github.com/apache/hudi/pull/7793

[hudi] branch master updated: [HUDI-5540] Close write client after usage of DeleteMarker/RollbackToInstantTime/RunClean/RunCompactionProcedure (#7655)

2023-01-31 Thread leesf
leesf pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 255d40c5fb5 [HUDI-5540] Close write client after us

[GitHub] [hudi] leesf merged pull request #7655: [HUDI-5540] Close write client after usage of DeleteMarker/RollbackTo…

2023-01-31 Thread via GitHub
leesf merged PR #7655: URL: https://github.com/apache/hudi/pull/7655

[GitHub] [hudi] SteNicholas commented on a diff in pull request #6856: [HUDI-4968] Update misleading read.streaming.skip_compaction/skip_clustering config

2023-01-31 Thread via GitHub
SteNicholas commented on code in PR #6856: URL: https://github.com/apache/hudi/pull/6856#discussion_r1092782380 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/configuration/FlinkOptions.java: ## @@ -297,19 +299,22 @@ private FlinkOptions() { .key("read.

[GitHub] [hudi] SteNicholas commented on a diff in pull request #6856: [HUDI-4968] Update misleading read.streaming.skip_compaction/skip_clustering config

2023-01-31 Thread via GitHub
SteNicholas commented on code in PR #6856: URL: https://github.com/apache/hudi/pull/6856#discussion_r1092781898 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/configuration/FlinkOptions.java: ## @@ -297,19 +299,22 @@ private FlinkOptions() { .key("read.

[GitHub] [hudi] SteNicholas commented on a diff in pull request #6856: [HUDI-4968] Update misleading read.streaming.skip_compaction/skip_clustering config

2023-01-31 Thread via GitHub
SteNicholas commented on code in PR #6856: URL: https://github.com/apache/hudi/pull/6856#discussion_r1092781756 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/configuration/FlinkOptions.java: ## @@ -297,19 +299,22 @@ private FlinkOptions() { .key("read.

[GitHub] [hudi] txl2017 opened a new issue, #7814: [Suport]hudi Load all partition paths and it's files under the query table path when createRelation

2023-01-31 Thread via GitHub
txl2017 opened a new issue, #7814: URL: https://github.com/apache/hudi/issues/7814 Building the Analyzed Logical Plan fails with an error when querying a Hudi table with Spark SQL. java.lang.OutOfMemoryError: Java heap space ![1675222529197_DC51BC37-D299-4600-B33F-7E3E2B7369D2](https://user-images.githubusercon

[GitHub] [hudi] hudi-bot commented on pull request #7813: [MINOR] Fix CTAS and Insert Into to avoid combine-on-insert by default

2023-01-31 Thread via GitHub
hudi-bot commented on PR #7813: URL: https://github.com/apache/hudi/pull/7813#issuecomment-1411466324 ## CI report: * bd427884d0d57f86eeb0260a5bc0f606fb72cb19 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1483

[GitHub] [hudi] hudi-bot commented on pull request #7787: [HUDI-5646] Guard dropping columns by a config, do not allow by default

2023-01-31 Thread via GitHub
hudi-bot commented on PR #7787: URL: https://github.com/apache/hudi/pull/7787#issuecomment-1411465963 ## CI report: * fe056bb626afe4ae03eeb0fd2a9c2108e81daca2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1482

[GitHub] [hudi] stream2000 commented on pull request #7793: [HUDI-5317] Fix insert overwrite table for partitioned table

2023-01-31 Thread via GitHub
stream2000 commented on PR #7793: URL: https://github.com/apache/hudi/pull/7793#issuecomment-1411425231 LGTM

[GitHub] [hudi] hudi-bot commented on pull request #7813: [MINOR] Fix CTAS and Insert Into to avoid combine-on-insert by default

2023-01-31 Thread via GitHub
hudi-bot commented on PR #7813: URL: https://github.com/apache/hudi/pull/7813#issuecomment-1411422693 ## CI report: * bd427884d0d57f86eeb0260a5bc0f606fb72cb19 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the

[GitHub] [hudi] hudi-bot commented on pull request #7787: [HUDI-5646] Guard dropping columns by a config, do not allow by default

2023-01-31 Thread via GitHub
hudi-bot commented on PR #7787: URL: https://github.com/apache/hudi/pull/7787#issuecomment-1411422598 ## CI report: * fe056bb626afe4ae03eeb0fd2a9c2108e81daca2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1482

[GitHub] [hudi] slfan1989 commented on pull request #7809: [HUDI-5664] Improve SqlQueryPreCommitValidator#queries Parallelism.

2023-01-31 Thread via GitHub
slfan1989 commented on PR #7809: URL: https://github.com/apache/hudi/pull/7809#issuecomment-1411419305 @danny0405 @yihua @xushiyan Can you help review this PR? Thank you very much! This part of the code uses Java's Stream API, and the author hopes to use a thread pool to improve proc

[GitHub] [hudi] hudi-bot commented on pull request #7812: [HUDI-5669]fix BucketIndexPartitioner data skew

2023-01-31 Thread via GitHub
hudi-bot commented on PR #7812: URL: https://github.com/apache/hudi/pull/7812#issuecomment-1411419123 ## CI report: * 27a7aee4207a4f7050b57ffc20562ed4aae2f1fd Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1483

[GitHub] [hudi] hudi-bot commented on pull request #7803: [HUDI-5661] Add ConflictResolutionStrategy for bucket index

2023-01-31 Thread via GitHub
hudi-bot commented on PR #7803: URL: https://github.com/apache/hudi/pull/7803#issuecomment-1411419083 ## CI report: * 410d0d504acaa6ff46ee85bb3dddb46cf5fb18fb UNKNOWN * 53807f6493b7056be1afdd7e78353a354514f845 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2

[GitHub] [hudi] alexeykudinkin opened a new pull request, #7813: [MINOR] Fix CTAS and Insert Into to avoid combine-on-insert by default

2023-01-31 Thread via GitHub
alexeykudinkin opened a new pull request, #7813: URL: https://github.com/apache/hudi/pull/7813 ### Change Logs Currently, `InsertIntoHoodieTable` by default sets the `COMBINE_BEFORE_INSERT` config whenever a pre-combine field is specified. Instead, we should defer this to the default setting (w

[GitHub] [hudi] hudi-bot commented on pull request #7362: [HUDI-5315] The record size is dynamically estimated when the table i…

2023-01-31 Thread via GitHub
hudi-bot commented on PR #7362: URL: https://github.com/apache/hudi/pull/7362#issuecomment-1411418561 ## CI report: * 3115bb62ba02c1e5affe303aa47706c88387704d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1429

[GitHub] [hudi] hudi-bot commented on pull request #6856: [HUDI-4968] Update misleading read.streaming.skip_compaction/skip_clustering config

2023-01-31 Thread via GitHub
hudi-bot commented on PR #6856: URL: https://github.com/apache/hudi/pull/6856#issuecomment-1411418179 ## CI report: * c1f659db4e4512f0a3dea07764a5542473b14da7 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1209

[GitHub] [hudi] Zouxxyy commented on a diff in pull request #7793: [HUDI-5317] Fix insert overwrite table for partitioned table

2023-01-31 Thread via GitHub
Zouxxyy commented on code in PR #7793: URL: https://github.com/apache/hudi/pull/7793#discussion_r109272 ## hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/TestInsertTable.scala: ## @@ -433,122 +433,22 @@ class TestInsertTable extends HoodieSparkSqlT

[GitHub] [hudi] hudi-bot commented on pull request #7812: [HUDI-5669]fix BucketIndexPartitioner data skew

2023-01-31 Thread via GitHub
hudi-bot commented on PR #7812: URL: https://github.com/apache/hudi/pull/7812#issuecomment-1411415450 ## CI report: * 27a7aee4207a4f7050b57ffc20562ed4aae2f1fd UNKNOWN

[GitHub] [hudi] hudi-bot commented on pull request #7803: [HUDI-5661] Add ConflictResolutionStrategy for bucket index

2023-01-31 Thread via GitHub
hudi-bot commented on PR #7803: URL: https://github.com/apache/hudi/pull/7803#issuecomment-1411415406 ## CI report: * 410d0d504acaa6ff46ee85bb3dddb46cf5fb18fb UNKNOWN * 53807f6493b7056be1afdd7e78353a354514f845 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2

[GitHub] [hudi] hudi-bot commented on pull request #7362: [HUDI-5315] The record size is dynamically estimated when the table i…

2023-01-31 Thread via GitHub
hudi-bot commented on PR #7362: URL: https://github.com/apache/hudi/pull/7362#issuecomment-1411414979 ## CI report: * 3115bb62ba02c1e5affe303aa47706c88387704d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1429

[GitHub] [hudi] hudi-bot commented on pull request #6856: [HUDI-4968] Update misleading read.streaming.skip_compaction/skip_clustering config

2023-01-31 Thread via GitHub
hudi-bot commented on PR #6856: URL: https://github.com/apache/hudi/pull/6856#issuecomment-1411414538 ## CI report: * c1f659db4e4512f0a3dea07764a5542473b14da7 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1209

[jira] [Updated] (HUDI-5670) Server-based markers creation times out

2023-01-31 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-5670: -- Summary: Server-based markers creation times out (was: Server-based Marker creation times out)

[GitHub] [hudi] hudi-bot commented on pull request #7811: [HUDI-5518] Support canal-json for HoodieDeltaStreamer

2023-01-31 Thread via GitHub
hudi-bot commented on PR #7811: URL: https://github.com/apache/hudi/pull/7811#issuecomment-1411411400 ## CI report: * 96712c1c9710b5eeb5da7458e5d9395d25078ded Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1482

[GitHub] [hudi] hudi-bot commented on pull request #7810: [MINOR] Restoring existing behavior for `DeltaStreamer` Incremental Source

2023-01-31 Thread via GitHub
hudi-bot commented on PR #7810: URL: https://github.com/apache/hudi/pull/7810#issuecomment-1411411389 ## CI report: * adbb0c255fd7f6510c65161decfd732e72434770 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1482

[GitHub] [hudi] hudi-bot commented on pull request #7614: [HUDI-5509] check if dfs support atomic creation when using filesyste…

2023-01-31 Thread via GitHub
hudi-bot commented on PR #7614: URL: https://github.com/apache/hudi/pull/7614#issuecomment-1411411046 ## CI report: * 058ab2703bda207fc9f5861d5e4b865e83ee1b45 UNKNOWN * 3c010a86327c341b29aaea9ff6ca571855951bd3 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2

[GitHub] [hudi] voonhous commented on pull request #6856: [HUDI-4968] Update misleading read.streaming.skip_compaction/skip_clustering config

2023-01-31 Thread via GitHub
voonhous commented on PR #6856: URL: https://github.com/apache/hudi/pull/6856#issuecomment-1411410995 @SteNicholas Done, can you please help take a look? I updated the PR's description to match the changes too.

[GitHub] [hudi] danny0405 closed pull request #7807: [DO NOT MERGE] Test flacky #testLatestCheckpointCarryOverWithMultipleWriters

2023-01-31 Thread via GitHub
danny0405 closed pull request #7807: [DO NOT MERGE] Test flacky #testLatestCheckpointCarryOverWithMultipleWriters URL: https://github.com/apache/hudi/pull/7807

[GitHub] [hudi] stream2000 commented on a diff in pull request #7793: [HUDI-5317] Fix insert overwrite table for partitioned table

2023-01-31 Thread via GitHub
stream2000 commented on code in PR #7793: URL: https://github.com/apache/hudi/pull/7793#discussion_r1092719567 ## hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/TestInsertTable.scala: ## @@ -433,122 +433,22 @@ class TestInsertTable extends HoodieSparkS

[jira] [Updated] (HUDI-5670) Server-based Marker creation times out

2023-01-31 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-5670: -- Fix Version/s: 0.13.1 > Server-based Marker creation times out > ---

[jira] [Created] (HUDI-5670) Server-based Marker creation times out

2023-01-31 Thread Alexey Kudinkin (Jira)
Alexey Kudinkin created HUDI-5670: - Summary: Server-based Marker creation times out Key: HUDI-5670 URL: https://issues.apache.org/jira/browse/HUDI-5670 Project: Apache Hudi Issue Type: Bug

[GitHub] [hudi] sandyfog opened a new pull request, #7812: [HUDI-5669]fix BucketIndexPartitioner data skew

2023-01-31 Thread via GitHub
sandyfog opened a new pull request, #7812: URL: https://github.com/apache/hudi/pull/7812 ### Change Logs _Describe context and summary for this change. Highlight if any code was copied._ ### Impact _Describe any public API or user-facing feature change or any performance

[jira] [Updated] (HUDI-5669) BucketIndexPartitioner maybe cause task data skew

2023-01-31 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-5669: - Labels: pull-request-available (was: ) > BucketIndexPartitioner maybe cause task data skew > ---

[jira] [Updated] (HUDI-5668) Separate advanced configs from essential configs

2023-01-31 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-5668: Fix Version/s: 0.13.1 > Separate advanced configs from essential configs > -

[jira] [Updated] (HUDI-5668) Separate advanced configs from essential configs

2023-01-31 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-5668: Priority: Blocker (was: Major) > Separate advanced configs from essential configs > ---

[jira] [Created] (HUDI-5669) BucketIndexPartitioner maybe cause task data skew

2023-01-31 Thread sandy du (Jira)
sandy du created HUDI-5669: -- Summary: BucketIndexPartitioner maybe cause task data skew Key: HUDI-5669 URL: https://issues.apache.org/jira/browse/HUDI-5669 Project: Apache Hudi Issue Type: Bug

[jira] [Created] (HUDI-5668) Separate advanced configs from essential configs

2023-01-31 Thread Ethan Guo (Jira)
Ethan Guo created HUDI-5668: --- Summary: Separate advanced configs from essential configs Key: HUDI-5668 URL: https://issues.apache.org/jira/browse/HUDI-5668 Project: Apache Hudi Issue Type: Improvem

[GitHub] [hudi] fengjian428 commented on pull request #7614: [HUDI-5509] check if dfs support atomic creation when using filesyste…

2023-01-31 Thread via GitHub
fengjian428 commented on PR #7614: URL: https://github.com/apache/hudi/pull/7614#issuecomment-1411356130 @hudi-bot run azure

[GitHub] [hudi] hudi-bot commented on pull request #7802: [DNM] Disable default Avro schema validation

2023-01-31 Thread via GitHub
hudi-bot commented on PR #7802: URL: https://github.com/apache/hudi/pull/7802#issuecomment-1411338697 ## CI report: * 9c39a3a58b9407bdbccdb3765fee62ced4674462 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1482

[GitHub] [hudi] hudi-bot commented on pull request #7787: [HUDI-5646] Guard dropping columns by a config, do not allow by default

2023-01-31 Thread via GitHub
hudi-bot commented on PR #7787: URL: https://github.com/apache/hudi/pull/7787#issuecomment-1411338571 ## CI report: * fe056bb626afe4ae03eeb0fd2a9c2108e81daca2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1482

[jira] [Created] (HUDI-5667) Establish better index lookup and write perf w/ RLI compared to global bloom

2023-01-31 Thread sivabalan narayanan (Jira)
sivabalan narayanan created HUDI-5667: - Summary: Establish better index lookup and write perf w/ RLI compared to global bloom Key: HUDI-5667 URL: https://issues.apache.org/jira/browse/HUDI-5667 Pr

[jira] [Created] (HUDI-5666) Support custom compaction strategy to compact files partition in MDT aggressively

2023-01-31 Thread sivabalan narayanan (Jira)
sivabalan narayanan created HUDI-5666: - Summary: Support custom compaction strategy to compact files partition in MDT aggressively Key: HUDI-5666 URL: https://issues.apache.org/jira/browse/HUDI-5666

[GitHub] [hudi] danny0405 commented on a diff in pull request #7795: [HUDI-5651] sort the inputs by record keys for bulk insert tasks

2023-01-31 Thread via GitHub
danny0405 commented on code in PR #7795: URL: https://github.com/apache/hudi/pull/7795#discussion_r1092668275 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/utils/Pipelines.java: ## @@ -150,6 +150,22 @@ public static DataStreamSink bulkInsert(Configurati

[jira] [Created] (HUDI-5665) Re-use table configs for subsequent writes

2023-01-31 Thread sivabalan narayanan (Jira)
sivabalan narayanan created HUDI-5665: - Summary: Re-use table configs for subsequent writes Key: HUDI-5665 URL: https://issues.apache.org/jira/browse/HUDI-5665 Project: Apache Hudi Issue

[jira] [Resolved] (HUDI-5585) After flink creates and writes the table, the spark alter table reports an error

2023-01-31 Thread Danny Chen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen resolved HUDI-5585. -- > After flink creates and writes the table, the spark alter table reports an > error >

[jira] [Closed] (HUDI-5585) After flink creates and writes the table, the spark alter table reports an error

2023-01-31 Thread Danny Chen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen closed HUDI-5585. Fix Version/s: 0.13.1 Resolution: Fixed Fixed via master branch: 9469882d80f6abf0fa1b233430de7c2c0ab5

[hudi] branch master updated: [HUDI-5585][flink] Fix flink creates and writes the table, the spark alter table reports an error (#7706)

2023-01-31 Thread danny0405
danny0405 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 9469882d80f [HUDI-5585][flink] Fix flink create

[GitHub] [hudi] danny0405 merged pull request #7706: [HUDI-5585][flink]Fix flink creates and writes the table, the spark alter table reports an error

2023-01-31 Thread via GitHub
danny0405 merged PR #7706: URL: https://github.com/apache/hudi/pull/7706

[GitHub] [hudi] danny0405 commented on pull request #7706: [HUDI-5585][flink]Fix flink creates and writes the table, the spark alter table reports an error

2023-01-31 Thread via GitHub
danny0405 commented on PR #7706: URL: https://github.com/apache/hudi/pull/7706#issuecomment-1411326005 The two failed tests are unrelated to this patch: https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=14803&view=logs&j=3b6e910d-b98f-5de6-b9cb-1e5ff571

[GitHub] [hudi] hudi-bot commented on pull request #7810: [MINOR] Restoring existing behavior for `DeltaStreamer` Incremental Source

2023-01-31 Thread via GitHub
hudi-bot commented on PR #7810: URL: https://github.com/apache/hudi/pull/7810#issuecomment-1411284234 ## CI report: * adbb0c255fd7f6510c65161decfd732e72434770 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1482

[GitHub] [hudi] hudi-bot commented on pull request #7811: [HUDI-5518] Support canal-json for HoodieDeltaStreamer

2023-01-31 Thread via GitHub
hudi-bot commented on PR #7811: URL: https://github.com/apache/hudi/pull/7811#issuecomment-1411271063 ## CI report: * 96712c1c9710b5eeb5da7458e5d9395d25078ded UNKNOWN

[GitHub] [hudi] hudi-bot commented on pull request #7811: [HUDI-5518] Support canal-json for HoodieDeltaStreamer

2023-01-31 Thread via GitHub
hudi-bot commented on PR #7811: URL: https://github.com/apache/hudi/pull/7811#issuecomment-1411279887 ## CI report: * 96712c1c9710b5eeb5da7458e5d9395d25078ded Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1482

[GitHub] [hudi] hudi-bot commented on pull request #7359: [HUDI-3304] WIP - Allow selective partial update

2023-01-31 Thread via GitHub
hudi-bot commented on PR #7359: URL: https://github.com/apache/hudi/pull/7359#issuecomment-1411279294 ## CI report: * adce700376e9214504bdf08a43a6b345c920345c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1482

[GitHub] [hudi] hudi-bot commented on pull request #7810: [MINOR] Restoring existing behavior for `DeltaStreamer` Incremental Source

2023-01-31 Thread via GitHub
hudi-bot commented on PR #7810: URL: https://github.com/apache/hudi/pull/7810#issuecomment-1411270999 ## CI report: * adbb0c255fd7f6510c65161decfd732e72434770 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the

[jira] [Updated] (HUDI-5510) The latest written commit is not used when getInstantsToArchive

2023-01-31 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5510: - Sprint: 0.13.0 Final Sprint 3, Sprint 2023-01-31 (was: 0.13.0 Final Sprint 3) > The latest written commi

[jira] [Updated] (HUDI-5642) Enable schema reconciliation by default

2023-01-31 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5642: - Sprint: 0.13.0 Final Sprint 3, Sprint 2023-01-31 (was: 0.13.0 Final Sprint 3) > Enable schema reconcilia

[jira] [Updated] (HUDI-5585) After flink creates and writes the table, the spark alter table reports an error

2023-01-31 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5585: - Sprint: 0.13.0 Final Sprint 3, Sprint 2023-01-31 (was: 0.13.0 Final Sprint 3) > After flink creates and

[jira] [Updated] (HUDI-5321) Fix Bulk Insert ColumnSortPartitioners

2023-01-31 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5321: - Sprint: 2022/12/12, 0.13.0 Final Sprint, 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3, Sprint 2023-01-31

[jira] [Updated] (HUDI-1574) Trim existing unit tests to finish in much shorter amount of time

2023-01-31 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-1574: - Sprint: 2022/08/22, 2022/09/05, 2022/09/19, 2022/10/04, 2022/10/18, 2022/11/01, 2022/11/15, 2022/11/29, 20

[jira] [Updated] (HUDI-2681) Make hoodie record_key and preCombine_key optional

2023-01-31 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-2681: - Sprint: 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3, Sprint 2023-01-31 (was: 0.13.0 Final Sprint 2, 0.13

[jira] [Updated] (HUDI-4613) Avoid the use of regex expressions when call hoodieFileGroup#addLogFile function

2023-01-31 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-4613: - Sprint: 2022/09/05, 2022/12/12, 0.13.0 Final Sprint, 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3, Sprint

[jira] [Updated] (HUDI-5552) Too slow while using trino-hudi connector while querying partitioned tables.

2023-01-31 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5552: - Sprint: 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3, Sprint 2023-01-31 (was: 0.13.0 Final Sprint 2, 0.13

[jira] [Updated] (HUDI-3601) Support multi-arch builds in docker setup

2023-01-31 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3601: - Sprint: 2022/09/05, 2022/09/19, 2022/10/04, 2022/10/18, 2022/11/01, 2022/11/15, 2022/11/29, 2022/12/12, 0.

[jira] [Updated] (HUDI-5647) Automate savepoint and restore tests

2023-01-31 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5647: - Sprint: 0.13.0 Final Sprint 3, Sprint 2023-01-31 (was: 0.13.0 Final Sprint 3) > Automate savepoint and r

[jira] [Updated] (HUDI-5641) Streamline Advanced Schema Evolution flow

2023-01-31 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5641: - Sprint: 0.13.0 Final Sprint 3, Sprint 2023-01-31 (was: 0.13.0 Final Sprint 3) > Streamline Advanced Sche

[jira] [Updated] (HUDI-5656) Metadata Bootstrap flow resulting in NPE

2023-01-31 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5656: - Sprint: 0.13.0 Final Sprint 3, Sprint 2023-01-31 (was: 0.13.0 Final Sprint 3) > Metadata Bootstrap flow

[jira] [Updated] (HUDI-5649) Unify all the loggers to slf4j

2023-01-31 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5649: - Sprint: 0.13.0 Final Sprint 3, Sprint 2023-01-31 (was: 0.13.0 Final Sprint 3) > Unify all the loggers to

[jira] [Updated] (HUDI-5651) sort the inputs by record keys for bulk insert tasks

2023-01-31 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5651: - Sprint: 0.13.0 Final Sprint 3, Sprint 2023-01-31 (was: 0.13.0 Final Sprint 3) > sort the inputs by recor

[jira] [Updated] (HUDI-3967) Automatic savepoint in Hudi

2023-01-31 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3967: - Sprint: 2022/08/22, 2022/09/05, 2022/09/19, 2022/10/04, 2022/10/18, 2022/11/01, 2022/11/15, 2022/11/29, 20

[jira] [Updated] (HUDI-4937) Fix HoodieTable injecting HoodieBackedTableMetadata not reusing underlying MT readers

2023-01-31 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-4937: - Sprint: 2022/10/04, 2022/10/18, 2022/11/01, 2022/11/15, 2022/11/29, 2022/12/12, 0.13.0 Final Sprint, 0.13.

[jira] [Updated] (HUDI-5352) Jackson fails to serialize LocalDate when updating Delta Commit metadata

2023-01-31 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5352: - Sprint: 2022/12/12, 0.13.0 Final Sprint, 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3, Sprint 2023-01-31

[jira] [Updated] (HUDI-5575) Support any record key generation along w/ any partition path generation for row writer

2023-01-31 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5575: - Sprint: 0.13.0 Final Sprint 3, Sprint 2023-01-31 (was: 0.13.0 Final Sprint 3) > Support any record key g

[jira] [Updated] (HUDI-5442) Fix HiveHoodieTableFileIndex to use lazy listing

2023-01-31 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5442: - Sprint: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3, Sprint 2023-01-31 (was: 0.13.0

[jira] [Updated] (HUDI-3529) Improve dependency management and bundling

2023-01-31 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3529: - Sprint: 2022/08/22, 2022/09/05, 2022/09/19, 2022/10/04, 2022/10/18, 2022/11/01, 2022/11/15, 2022/11/29, 20
