[GitHub] [hudi] Limess commented on issue #6367: [SUPPORT] Failed Job - doing partition and writing data - in Hudi 0.11.0

2022-08-15 Thread GitBox
Limess commented on issue #6367: URL: https://github.com/apache/hudi/issues/6367#issuecomment-1215268725 > Can you disable metadata explicitly in your write configs (hoodie.metadata.enable=false), and try restarting your pipeline? This already worked for us; I haven't tried

[GitHub] [hudi] Zouxxyy commented on pull request #6267: [HUDI-4515] Fix savepoints will be cleaned in keeping latest versions policy

2022-08-15 Thread GitBox
Zouxxyy commented on PR #6267: URL: https://github.com/apache/hudi/pull/6267#issuecomment-1215266488 @nsivabalan Can you please help with a review ^-^ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [hudi] rbtrtr commented on issue #6397: [SUPPORT] spark history server - sql tab

2022-08-15 Thread GitBox
rbtrtr commented on issue #6397: URL: https://github.com/apache/hudi/issues/6397#issuecomment-1215239268 Yes, exactly, that's what we're looking for. Did we miss some configurations?

[jira] [Updated] (HUDI-4579) [DOCS] Add docs on manually upgrading and downgrading table through CLI

2022-08-15 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-4579: Status: In Progress (was: Open) > [DOCS] Add docs on manually upgrading and downgrading table through CLI

[jira] [Updated] (HUDI-4579) [DOCS] Add docs on manually upgrading and downgrading table through CLI

2022-08-15 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-4579: Status: Patch Available (was: In Progress) > [DOCS] Add docs on manually upgrading and downgrading table

[GitHub] [hudi] desaismi closed issue #6069: [SUPPORT] /hoodie/temp Folder and contents not getting deleted

2022-08-15 Thread GitBox
desaismi closed issue #6069: [SUPPORT] /hoodie/temp Folder and contents not getting deleted URL: https://github.com/apache/hudi/issues/6069

[jira] [Closed] (HUDI-4564) Docs writing for 0.12.0: spark 3.3 support

2022-08-15 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo closed HUDI-4564. --- Resolution: Fixed > Docs writing for 0.12.0: spark 3.3 support > -- >

[GitHub] [hudi] Zouxxyy commented on issue #6397: [SUPPORT] spark history server - sql tab

2022-08-15 Thread GitBox
Zouxxyy commented on issue #6397: URL: https://github.com/apache/hudi/issues/6397#issuecomment-1215205417 like this? my hudi is 0.10.1, spark is 3.2.1 https://user-images.githubusercontent.com/37108074/184668340-aaa4d423-e50a-4d20-9623-c0e78c486223.png

[GitHub] [hudi] hudi-bot commented on pull request #6401: [HUDI-4623] Write mor log by suffix for different flink jobs

2022-08-15 Thread GitBox
hudi-bot commented on PR #6401: URL: https://github.com/apache/hudi/pull/6401#issuecomment-1215193628 ## CI report: * 19f985687513ba5e5c450201fc05932ddfc189e8 UNKNOWN * 718e112b4ae9ce0c2a90c8445693990f45e72081 Azure:

[jira] [Updated] (HUDI-4585) Optimize query performance on Presto Hudi connector

2022-08-15 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-4585: Status: In Progress (was: Open) > Optimize query performance on Presto Hudi connector >

[jira] [Updated] (HUDI-4586) Address S3 timeouts in Bloom Index with metadata table

2022-08-15 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-4586: Status: In Progress (was: Open) > Address S3 timeouts in Bloom Index with metadata table >

[jira] [Closed] (HUDI-4556) Improve functional test coverage of column stats index

2022-08-15 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo closed HUDI-4556. --- Resolution: Fixed > Improve functional test coverage of column stats index >

[jira] [Closed] (HUDI-4576) Fix schema evolution docs

2022-08-15 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo closed HUDI-4576. --- Resolution: Fixed > Fix schema evolution docs > - > > Key: HUDI-4576

[GitHub] [hudi] hudi-bot commented on pull request #6401: [HUDI-4623] Write mor log by suffix for different flink jobs

2022-08-15 Thread GitBox
hudi-bot commented on PR #6401: URL: https://github.com/apache/hudi/pull/6401#issuecomment-1215181753 ## CI report: * 19f985687513ba5e5c450201fc05932ddfc189e8 UNKNOWN * 718e112b4ae9ce0c2a90c8445693990f45e72081 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #6383: [HUDI-4607]fixed timeline based marker thread safaty issue

2022-08-15 Thread GitBox
hudi-bot commented on PR #6383: URL: https://github.com/apache/hudi/pull/6383#issuecomment-1215167926 ## CI report: * a7c9b9dde772cf1d0cfdaf005cd9888711394365 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #6365: [HUDI-4601] read error from MOR table after compaction

2022-08-15 Thread GitBox
hudi-bot commented on PR #6365: URL: https://github.com/apache/hudi/pull/6365#issuecomment-1215167721 ## CI report: * 4ed8bd8bf4b2a4ab2948fe7dba9aae85ae1fccfb Azure:

[jira] [Commented] (HUDI-3973) Implement GENERATE manifest command for Snowflake integration

2022-08-15 Thread Joyan Sil (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17579767#comment-17579767 ] Joyan Sil commented on HUDI-3973: - [~xushiyan] I have not started working on this. This will be helpful

[jira] [Updated] (HUDI-4567) Finalize design approach and RFC docs

2022-08-15 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-4567: - Status: In Progress (was: Open) > Finalize design approach and RFC docs >

[jira] [Updated] (HUDI-4441) Disbale INFO level logs from tests

2022-08-15 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-4441: - Status: In Progress (was: Open) > Disbale INFO level logs from tests >

[jira] [Updated] (HUDI-3287) Remove unnecessary deps in hudi-kafka-connect

2022-08-15 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3287: - Reviewers: Raymond Xu, Sagar Sumit > Remove unnecessary deps in hudi-kafka-connect >

[jira] [Updated] (HUDI-4441) Disbale INFO level logs from tests

2022-08-15 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-4441: - Status: Patch Available (was: In Progress) > Disbale INFO level logs from tests >

[jira] [Updated] (HUDI-4503) Support table identifier with explicit catalog

2022-08-15 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-4503: - Status: In Progress (was: Open) > Support table identifier with explicit catalog >

[jira] [Updated] (HUDI-3973) Implement GENERATE manifest command for Snowflake integration

2022-08-15 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3973: - Sprint: 2022/08/22 > Implement GENERATE manifest command for Snowflake integration >

[jira] [Updated] (HUDI-3973) Implement GENERATE manifest command for Snowflake integration

2022-08-15 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3973: - Sprint: (was: 2022/08/08) > Implement GENERATE manifest command for Snowflake integration >

[jira] [Updated] (HUDI-3579) Add timeline commands in hudi-cli

2022-08-15 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3579: - Status: In Progress (was: Open) > Add timeline commands in hudi-cli > -

[jira] [Updated] (HUDI-4503) Support table identifier with explicit catalog

2022-08-15 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-4503: - Status: Patch Available (was: In Progress) > Support table identifier with explicit catalog >

[GitHub] [hudi] nsivabalan commented on issue #6367: [SUPPORT] Failed Job - doing partition and writing data - in Hudi 0.11.0

2022-08-15 Thread GitBox
nsivabalan commented on issue #6367: URL: https://github.com/apache/hudi/issues/6367#issuecomment-1215134782 For the col stats issue, we might need to find the root cause of how we ended up updating two records with the same key with the metadata table. It's probably a transient issue, and if you restart

[GitHub] [hudi] nsivabalan commented on issue #6379: [SUPPORT]What's the reading behavior for MOR table?

2022-08-15 Thread GitBox
nsivabalan commented on issue #6379: URL: https://github.com/apache/hudi/issues/6379#issuecomment-1215126959 Your first answer is pretty much correct. If there are multiple users querying concurrently, each will do the merge on their own in memory and will get to read the merged final

[GitHub] [hudi] nsivabalan commented on issue #6380: [SUPPORT] Will clustering update metadata table?

2022-08-15 Thread GitBox
nsivabalan commented on issue #6380: URL: https://github.com/apache/hudi/issues/6380#issuecomment-1215122282 Yes, the metadata table is updated. Clustering has been tested with the metadata table as well. Do you see any strange behavior? Which version of Hudi are you using?

[GitHub] [hudi] nsivabalan commented on a diff in pull request #6376: [HUDI-4579] Add docs on upgrading and downgrading table through CLI

2022-08-15 Thread GitBox
nsivabalan commented on code in PR #6376: URL: https://github.com/apache/hudi/pull/6376#discussion_r945837128 ## website/docs/cli.md: ## @@ -419,4 +487,66 @@ savepoints show savepoint rollback --savepoint 20220128160245447 --sparkMaster local[2] ``` +### Upgrade and

[GitHub] [hudi] nsivabalan commented on a diff in pull request #6374: [HUDI-4608] Fix upgrade command in Hudi CLI

2022-08-15 Thread GitBox
nsivabalan commented on code in PR #6374: URL: https://github.com/apache/hudi/pull/6374#discussion_r945832105 ## hudi-cli/src/test/java/org/apache/hudi/cli/commands/TestUpgradeDowngradeCommand.java: ## @@ -83,10 +88,32 @@ public void init() throws Exception {

[GitHub] [hudi] wuwenchi commented on pull request #6396: [HUDI-4621] all data fill in the same bucket because not check INDEX_KEY_FIELD

2022-08-15 Thread GitBox
wuwenchi commented on PR #6396: URL: https://github.com/apache/hudi/pull/6396#issuecomment-1215096652 @danny0405 @garyli1019 can you help review it? thanks!

[GitHub] [hudi] hudi-bot commented on pull request #6401: [HUDI-4623] Write mor log by suffix for different flink jobs

2022-08-15 Thread GitBox
hudi-bot commented on PR #6401: URL: https://github.com/apache/hudi/pull/6401#issuecomment-1215092635 ## CI report: * 19f985687513ba5e5c450201fc05932ddfc189e8 UNKNOWN * 718e112b4ae9ce0c2a90c8445693990f45e72081 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #6365: [HUDI-4601] read error from MOR table after compaction

2022-08-15 Thread GitBox
hudi-bot commented on PR #6365: URL: https://github.com/apache/hudi/pull/6365#issuecomment-1215092524 ## CI report: * 4ed8bd8bf4b2a4ab2948fe7dba9aae85ae1fccfb Azure:

[GitHub] [hudi] hudi-bot commented on pull request #6401: [HUDI-4623] Write mor log by suffix for different flink jobs

2022-08-15 Thread GitBox
hudi-bot commented on PR #6401: URL: https://github.com/apache/hudi/pull/6401#issuecomment-1215086759 ## CI report: * 19f985687513ba5e5c450201fc05932ddfc189e8 UNKNOWN * 718e112b4ae9ce0c2a90c8445693990f45e72081 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #6401: [HUDI-4623] Write mor log by suffix for different flink jobs

2022-08-15 Thread GitBox
hudi-bot commented on PR #6401: URL: https://github.com/apache/hudi/pull/6401#issuecomment-1215081273 ## CI report: * 19f985687513ba5e5c450201fc05932ddfc189e8 UNKNOWN * 718e112b4ae9ce0c2a90c8445693990f45e72081 UNKNOWN Bot commands @hudi-bot supports the

[GitHub] [hudi] hudi-bot commented on pull request #6401: [HUDI-4623] Write mor log by suffix for different flink jobs

2022-08-15 Thread GitBox
hudi-bot commented on PR #6401: URL: https://github.com/apache/hudi/pull/6401#issuecomment-1215075962 ## CI report: * 19f985687513ba5e5c450201fc05932ddfc189e8 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the

[jira] [Updated] (HUDI-4623) Write mor log by suffix for different flink jobs

2022-08-15 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-4623: - Labels: pull-request-available (was: ) > Write mor log by suffix for different flink jobs >

[GitHub] [hudi] XuQianJin-Stars opened a new pull request, #6401: [HUDI-4623] write mor log by suffix for different flink jobs

2022-08-15 Thread GitBox
XuQianJin-Stars opened a new pull request, #6401: URL: https://github.com/apache/hudi/pull/6401 ### Change Logs _Describe context and summary for this change. Highlight if any code was copied._ ### Impact _Describe any public API or user-facing feature change or any

[jira] [Created] (HUDI-4623) Write mor log by suffix for different flink jobs

2022-08-15 Thread Forward Xu (Jira)
Forward Xu created HUDI-4623: Summary: Write mor log by suffix for different flink jobs Key: HUDI-4623 URL: https://issues.apache.org/jira/browse/HUDI-4623 Project: Apache Hudi Issue Type: New

[GitHub] [hudi] hudi-bot commented on pull request #6383: [HUDI-4607]fixed timeline based marker thread safaty issue

2022-08-15 Thread GitBox
hudi-bot commented on PR #6383: URL: https://github.com/apache/hudi/pull/6383#issuecomment-1215012077 ## CI report: * d1f9a27a1e939fb07e614c90cfa6d0ec9d9940a6 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #6383: [HUDI-4607]fixed timeline based marker thread safaty issue

2022-08-15 Thread GitBox
hudi-bot commented on PR #6383: URL: https://github.com/apache/hudi/pull/6383#issuecomment-1215006570 ## CI report: * d1f9a27a1e939fb07e614c90cfa6d0ec9d9940a6 Azure:

[GitHub] [hudi] cocoapan closed issue #6400: [SUPPORT] MergeInto syntax merge_condition does not support Non-Equal condition

2022-08-15 Thread GitBox
cocoapan closed issue #6400: [SUPPORT] MergeInto syntax merge_condition does not support Non-Equal condition URL: https://github.com/apache/hudi/issues/6400

[GitHub] [hudi] cocoapan opened a new issue, #6400: [SUPPORT] MergeInto syntax merge_condition does not support Non-Equal condition

2022-08-15 Thread GitBox
cocoapan opened a new issue, #6400: URL: https://github.com/apache/hudi/issues/6400 Hi, recently I hit a Spark error message when using the MergeInto syntax: https://user-images.githubusercontent.com/99819932/184637559-5d9590a7-d6b1-473e-9166-2e8720da1e80.png SQL: 1.

[jira] [Updated] (HUDI-4560) [DOCS] Update default value for partition extractor and note about infer function

2022-08-15 Thread Sagar Sumit (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Sumit updated HUDI-4560: -- Status: Patch Available (was: In Progress) > [DOCS] Update default value for partition extractor and

[jira] [Updated] (HUDI-4560) [DOCS] Update default value for partition extractor and note about infer function

2022-08-15 Thread Sagar Sumit (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Sumit updated HUDI-4560: -- Status: In Progress (was: Open) > [DOCS] Update default value for partition extractor and note about

[jira] [Updated] (HUDI-4583) [DOCS] Optimal write configs for different workload patterns

2022-08-15 Thread Sagar Sumit (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Sumit updated HUDI-4583: -- Status: In Progress (was: Open) > [DOCS] Optimal write configs for different workload patterns >

[jira] [Updated] (HUDI-4583) [DOCS] Optimal write configs for different workload patterns

2022-08-15 Thread Sagar Sumit (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Sumit updated HUDI-4583: -- Status: Patch Available (was: In Progress) > [DOCS] Optimal write configs for different workload

[jira] [Updated] (HUDI-4583) [DOCS] Optimal write configs for different workload patterns

2022-08-15 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-4583: - Labels: docs performance pull-request-available (was: docs performance) > [DOCS] Optimal write

[GitHub] [hudi] codope opened a new pull request, #6399: [HUDI-4583][DOCS] Optimal write configs for bulk insert

2022-08-15 Thread GitBox
codope opened a new pull request, #6399: URL: https://github.com/apache/hudi/pull/6399 ### Change Logs Add optimal write configs for bulk insert under the performance guide. Also, point to the benchmarking blog for tpc-ds benchmark. ### Impact _Describe any public API

[GitHub] [hudi] rbtrtr opened a new issue, #6398: [SUPPORT] Metadata table thows hbase exceptions

2022-08-15 Thread GitBox
rbtrtr opened a new issue, #6398: URL: https://github.com/apache/hudi/issues/6398 **Description** We're running on a Cloudera CDP stack and want to upgrade to Hudi 0.11.1 and take advantage of the metadata table feature. We tried to run a simple Hudi write with generated data and got

[GitHub] [hudi] honeyaya commented on pull request #6347: [HUDI-4582] Support batch synchronization of partition to hive metastore to avoid timeout with --sync-mode="hms" and use-jdbc=false

2022-08-15 Thread GitBox
honeyaya commented on PR #6347: URL: https://github.com/apache/hudi/pull/6347#issuecomment-1214882067 @zhangyue19921010 hi, could you help review this pr, thanks

[GitHub] [hudi] robertrichter opened a new issue, #6397: [SUPPORT]

2022-08-15 Thread GitBox
robertrichter opened a new issue, #6397: URL: https://github.com/apache/hudi/issues/6397 **To Reproduce** Steps to reproduce the behavior: 1. Examine the SQL tab in the Spark history web UI after the Hudi write process has finished. **Expected behavior** To

[jira] [Updated] (HUDI-4613) Avoid the use of regex expressions when call hoodieFileGroup#addLogFile function

2022-08-15 Thread lei w (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lei w updated HUDI-4613: Summary: Avoid the use of regex expressions when call hoodieFileGroup#addLogFile function (was: Avoid the use of

[jira] [Created] (HUDI-4622) Support using event time for timeline

2022-08-15 Thread shuai.xu (Jira)
shuai.xu created HUDI-4622: -- Summary: Support using event time for timeline Key: HUDI-4622 URL: https://issues.apache.org/jira/browse/HUDI-4622 Project: Apache Hudi Issue Type: New Feature

[GitHub] [hudi] ThinkerLei commented on a diff in pull request #6384: [HUDI-4613] Avoid the use of regular expressions when call hoodieFileGroup#addLogFile function

2022-08-15 Thread GitBox
ThinkerLei commented on code in PR #6384: URL: https://github.com/apache/hudi/pull/6384#discussion_r945586837 ## hudi-common/src/main/java/org/apache/hudi/common/fs/FSUtils.java: ## @@ -64,19 +64,17 @@ import java.util.function.Function; import java.util.function.Predicate;

[GitHub] [hudi] danny0405 commented on a diff in pull request #6384: [HUDI-4613] Avoid the use of regular expressions when call hoodieFileGroup#addLogFile function

2022-08-15 Thread GitBox
danny0405 commented on code in PR #6384: URL: https://github.com/apache/hudi/pull/6384#discussion_r945585595 ## hudi-common/src/main/java/org/apache/hudi/common/fs/FSUtils.java: ## @@ -64,19 +64,17 @@ import java.util.function.Function; import java.util.function.Predicate;

[GitHub] [hudi] hudi-bot commented on pull request #6393: [HUDI-4619] Fix The retry mechanism of remotehoodietablefilesystemvie…

2022-08-15 Thread GitBox
hudi-bot commented on PR #6393: URL: https://github.com/apache/hudi/pull/6393#issuecomment-1214844217 ## CI report: * 09f49abeeca229df307426ba79bd77ed0392b79f UNKNOWN * 4c7dcf78cdee3e26dbf291dc49f8ac64a05c8c60 Azure:

[GitHub] [hudi] danny0405 commented on a diff in pull request #6256: [RFC-51][HUDI-3478] Update RFC: CDC support

2022-08-15 Thread GitBox
danny0405 commented on code in PR #6256: URL: https://github.com/apache/hudi/pull/6256#discussion_r945577916 ## rfc/rfc-51/rfc-51.md: ## @@ -64,69 +65,72 @@ We follow the debezium output format: four columns as shown below Note: the illustration here ignores all the Hudi

[GitHub] [hudi] danny0405 commented on a diff in pull request #6256: [RFC-51][HUDI-3478] Update RFC: CDC support

2022-08-15 Thread GitBox
danny0405 commented on code in PR #6256: URL: https://github.com/apache/hudi/pull/6256#discussion_r945577068 ## rfc/rfc-51/rfc-51.md: ## @@ -64,69 +65,72 @@ We follow the debezium output format: four columns as shown below Note: the illustration here ignores all the Hudi

[GitHub] [hudi] danny0405 commented on a diff in pull request #6256: [RFC-51][HUDI-3478] Update RFC: CDC support

2022-08-15 Thread GitBox
danny0405 commented on code in PR #6256: URL: https://github.com/apache/hudi/pull/6256#discussion_r945574953 ## rfc/rfc-51/rfc-51.md: ## @@ -64,69 +65,72 @@ We follow the debezium output format: four columns as shown below Note: the illustration here ignores all the Hudi

[GitHub] [hudi] Hexiaoqiao commented on a diff in pull request #6384: [HUDI-4613] Avoid the use of regular expressions when call hoodieFileGroup#addLogFile function

2022-08-15 Thread GitBox
Hexiaoqiao commented on code in PR #6384: URL: https://github.com/apache/hudi/pull/6384#discussion_r945567984 ## hudi-common/src/main/java/org/apache/hudi/common/model/HoodieLogFile.java: ## @@ -42,63 +45,98 @@ public class HoodieLogFile implements Serializable { private

[GitHub] [hudi] YannByron commented on a diff in pull request #6256: [RFC-51][HUDI-3478] Update RFC: CDC support

2022-08-15 Thread GitBox
YannByron commented on code in PR #6256: URL: https://github.com/apache/hudi/pull/6256#discussion_r945544647 ## rfc/rfc-51/rfc-51.md: ## @@ -148,20 +152,27 @@ hudi_cdc_table/ Under a partition directory, the `.log` file with `CDCBlock` above will keep the changing data we

[GitHub] [hudi] xiarixiaoyao commented on pull request #6046: [HUDI-4363] Support Clustering row writer to improve performance

2022-08-15 Thread GitBox
xiarixiaoyao commented on PR #6046: URL: https://github.com/apache/hudi/pull/6046#issuecomment-1214811767 @alexeykudinkin could you pls help review this pr, thanks very much

[GitHub] [hudi] prasannarajaperumal commented on a diff in pull request #6256: [RFC-51][HUDI-3478] Update RFC: CDC support

2022-08-15 Thread GitBox
prasannarajaperumal commented on code in PR #6256: URL: https://github.com/apache/hudi/pull/6256#discussion_r945438639 ## rfc/rfc-51/rfc-51.md: ## @@ -148,20 +152,27 @@ hudi_cdc_table/ Under a partition directory, the `.log` file with `CDCBlock` above will keep the changing

[GitHub] [hudi] xiarixiaoyao commented on a diff in pull request #6046: [HUDI-4363] Support Clustering row writer to improve performance

2022-08-15 Thread GitBox
xiarixiaoyao commented on code in PR #6046: URL: https://github.com/apache/hudi/pull/6046#discussion_r945558006 ## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/MultipleSparkJobExecutionStrategy.java: ## @@ -273,6 +398,62 @@ private

[GitHub] [hudi] xiarixiaoyao commented on a diff in pull request #6046: [HUDI-4363] Support Clustering row writer to improve performance

2022-08-15 Thread GitBox
xiarixiaoyao commented on code in PR #6046: URL: https://github.com/apache/hudi/pull/6046#discussion_r94810 ## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/MultipleSparkJobExecutionStrategy.java: ## @@ -273,6 +398,62 @@ private

[GitHub] [hudi] YannByron commented on a diff in pull request #6256: [RFC-51][HUDI-3478] Update RFC: CDC support

2022-08-15 Thread GitBox
YannByron commented on code in PR #6256: URL: https://github.com/apache/hudi/pull/6256#discussion_r945554115 ## rfc/rfc-51/rfc-51.md: ## @@ -64,69 +65,72 @@ We follow the debezium output format: four columns as shown below Note: the illustration here ignores all the Hudi

[GitHub] [hudi] flashJd commented on a diff in pull request #6385: [HUDI-4614] fix primary key extract of delete_record when complexKeyGen configured and ChangeLogDisabled

2022-08-15 Thread GitBox
flashJd commented on code in PR #6385: URL: https://github.com/apache/hudi/pull/6385#discussion_r945550779 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/keygen/KeyGenUtils.java: ## @@ -73,21 +73,20 @@ public static String

[GitHub] [hudi] flashJd commented on a diff in pull request #6385: [HUDI-4614] fix primary key extract of delete_record when complexKeyGen configured and ChangeLogDisabled

2022-08-15 Thread GitBox
flashJd commented on code in PR #6385: URL: https://github.com/apache/hudi/pull/6385#discussion_r945550399 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/keygen/KeyGenUtils.java: ## @@ -73,21 +73,20 @@ public static String

[GitHub] [hudi] YannByron commented on a diff in pull request #6256: [RFC-51][HUDI-3478] Update RFC: CDC support

2022-08-15 Thread GitBox
YannByron commented on code in PR #6256: URL: https://github.com/apache/hudi/pull/6256#discussion_r945547502 ## rfc/rfc-51/rfc-51.md: ## @@ -64,69 +65,72 @@ We follow the debezium output format: four columns as shown below Note: the illustration here ignores all the Hudi

[GitHub] [hudi] xiarixiaoyao commented on a diff in pull request #6046: [HUDI-4363] Support Clustering row writer to improve performance

2022-08-15 Thread GitBox
xiarixiaoyao commented on code in PR #6046: URL: https://github.com/apache/hudi/pull/6046#discussion_r945546375 ## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/MultipleSparkJobExecutionStrategy.java: ## @@ -131,6 +161,53 @@ public

[GitHub] [hudi] YannByron commented on a diff in pull request #6256: [RFC-51][HUDI-3478] Update RFC: CDC support

2022-08-15 Thread GitBox
YannByron commented on code in PR #6256: URL: https://github.com/apache/hudi/pull/6256#discussion_r945546131 ## rfc/rfc-51/rfc-51.md: ## @@ -64,69 +65,72 @@ We follow the debezium output format: four columns as shown below Note: the illustration here ignores all the Hudi

[GitHub] [hudi] xiarixiaoyao commented on a diff in pull request #6046: [HUDI-4363] Support Clustering row writer to improve performance

2022-08-15 Thread GitBox
xiarixiaoyao commented on code in PR #6046: URL: https://github.com/apache/hudi/pull/6046#discussion_r945544123 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java: ## @@ -211,6 +211,23 @@ public class HoodieWriteConfig extends

[GitHub] [hudi] hudi-bot commented on pull request #6393: [HUDI-4619] Fix The retry mechanism of remotehoodietablefilesystemvie…

2022-08-15 Thread GitBox
hudi-bot commented on PR #6393: URL: https://github.com/apache/hudi/pull/6393#issuecomment-1214761950 ## CI report: * c3621ebdc635c4c66d265f24d48ffec203a732c1 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #6393: [HUDI-4619] Fix The retry mechanism of remotehoodietablefilesystemvie…

2022-08-15 Thread GitBox
hudi-bot commented on PR #6393: URL: https://github.com/apache/hudi/pull/6393#issuecomment-1214758716 ## CI report: * c3621ebdc635c4c66d265f24d48ffec203a732c1 Azure:

[jira] [Updated] (HUDI-4621) all data fill in the same bucket because not check INDEX_KEY_FIELD

2022-08-15 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-4621: - Labels: pull-request-available (was: ) > all data fill in the same bucket because not check

[GitHub] [hudi] hudi-bot commented on pull request #6393: [HUDI-4619] Fix The retry mechanism of remotehoodietablefilesystemvie…

2022-08-15 Thread GitBox
hudi-bot commented on PR #6393: URL: https://github.com/apache/hudi/pull/6393#issuecomment-1214755011 ## CI report: * c3621ebdc635c4c66d265f24d48ffec203a732c1 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #6396: [HUDI-4621] all data fill in the same bucket because not check INDEX_KEY_FIELD

2022-08-15 Thread GitBox
hudi-bot commented on PR #6396: URL: https://github.com/apache/hudi/pull/6396#issuecomment-1214755047 ## CI report: * b8b7d966955c6b731ddc00b28300c18c2a630651 Azure:

[hudi] branch master updated: [MINOR] fix progress field calculate logic in HoodieLogRecordReader (#6291)

2022-08-15 Thread garyli
This is an automated email from the ASF dual-hosted git repository. garyli pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 997200f27f [MINOR] fix progress field calculate

[GitHub] [hudi] garyli1019 merged pull request #6291: [MINOR] fix `progress` field calculate logic in HoodieLogRecordReader

2022-08-15 Thread GitBox
garyli1019 merged PR #6291: URL: https://github.com/apache/hudi/pull/6291

[GitHub] [hudi] hudi-bot commented on pull request #6393: [HUDI-4619] Fix The retry mechanism of remotehoodietablefilesystemvie…

2022-08-15 Thread GitBox
hudi-bot commented on PR #6393: URL: https://github.com/apache/hudi/pull/6393#issuecomment-1214715776 ## CI report: * c3621ebdc635c4c66d265f24d48ffec203a732c1 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #6396: [4621] all data fill in the same bucket because not check INDEX_KEY_FIELD

2022-08-15 Thread GitBox
hudi-bot commented on PR #6396: URL: https://github.com/apache/hudi/pull/6396#issuecomment-1214708956 ## CI report: * 56b169af5016ed2b70ba631363ffa70b5e9b1174 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #6396: [4621] all data fill in the same bucket because not check INDEX_KEY_FIELD

2022-08-15 Thread GitBox
hudi-bot commented on PR #6396: URL: https://github.com/apache/hudi/pull/6396#issuecomment-1214704730 ## CI report: * 56b169af5016ed2b70ba631363ffa70b5e9b1174 Azure:

[GitHub] [hudi] danny0405 commented on a diff in pull request #6393: [HUDI-4619] Fix The retry mechanism of remotehoodietablefilesystemvie…

2022-08-15 Thread GitBox
danny0405 commented on code in PR #6393: URL: https://github.com/apache/hudi/pull/6393#discussion_r945460251 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/HoodieCompactor.java: ## @@ -305,7 +305,7 @@ HoodieCompactionPlan

[GitHub] [hudi] boneanxs commented on pull request #6046: [HUDI-4363] Support Clustering row writer to improve performance

2022-08-15 Thread GitBox
boneanxs commented on PR #6046: URL: https://github.com/apache/hudi/pull/6046#issuecomment-1214672107 The CI failure does not seem to be related to the PR. Thanks to @voonhous, who tested 2 cases, clustering individual parquet files of ~500MB up to 10GB groups. After enable

[jira] [Commented] (HUDI-3189) Fallback to full table scan with incremental query when files are cleaned up or achived for MOR table

2022-08-15 Thread Danny Chen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17579552#comment-17579552 ] Danny Chen commented on HUDI-3189: -- Added a follow-up fix in master:

[GitHub] [hudi] hudi-bot commented on pull request #6396: [4621] all data fill in the same bucket because not check INDEX_KEY_FIELD

2022-08-15 Thread GitBox
hudi-bot commented on PR #6396: URL: https://github.com/apache/hudi/pull/6396#issuecomment-1214668755 ## CI report: * 56b169af5016ed2b70ba631363ffa70b5e9b1174 Azure:

[GitHub] [hudi] danny0405 merged pull request #6141: [HUDI-3189] Fallback to full table scan with incremental query when files are cleaned up or achived for MOR table

2022-08-15 Thread GitBox
danny0405 merged PR #6141: URL: https://github.com/apache/hudi/pull/6141 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

[hudi] branch master updated: [HUDI-3189] Fallback to full table scan with incremental query when files are cleaned up or achived for MOR table (#6141)

2022-08-15 Thread danny0405
This is an automated email from the ASF dual-hosted git repository. danny0405 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 553e280c6d [HUDI-3189] Fallback to full table

[GitHub] [hudi] novisfff commented on a diff in pull request #6383: [HUDI-4607]fixed timeline based marker thread safaty issue

2022-08-15 Thread GitBox
novisfff commented on code in PR #6383: URL: https://github.com/apache/hudi/pull/6383#discussion_r945453868 ## hudi-timeline-service/src/main/java/org/apache/hudi/timeline/service/handlers/marker/MarkerCreationDispatchingRunnable.java: ## @@ -66,8 +66,9 @@ public void run() {

[GitHub] [hudi] hudi-bot commented on pull request #6393: [HUDI-4619] Fix The retry mechanism of remotehoodietablefilesystemvie…

2022-08-15 Thread GitBox
hudi-bot commented on PR #6393: URL: https://github.com/apache/hudi/pull/6393#issuecomment-1214660960 ## CI report: * acaf6edd64671b6815421a7d108bc68e92763c0d Azure:

[GitHub] [hudi] danny0405 commented on a diff in pull request #6383: [HUDI-4607]fixed timeline based marker thread safaty issue

2022-08-15 Thread GitBox
danny0405 commented on code in PR #6383: URL: https://github.com/apache/hudi/pull/6383#discussion_r945450541 ## hudi-timeline-service/src/main/java/org/apache/hudi/timeline/service/handlers/marker/MarkerCreationDispatchingRunnable.java: ## @@ -66,8 +66,9 @@ public void run() {

[GitHub] [hudi] novisfff commented on a diff in pull request #6383: [HUDI-4607]fixed timeline based marker thread safaty issue

2022-08-15 Thread GitBox
novisfff commented on code in PR #6383: URL: https://github.com/apache/hudi/pull/6383#discussion_r945449815 ## hudi-timeline-service/src/main/java/org/apache/hudi/timeline/service/handlers/marker/MarkerCreationDispatchingRunnable.java: ## @@ -66,8 +66,9 @@ public void run() {

[GitHub] [hudi] hudi-bot commented on pull request #6393: [HUDI-4619] Fix The retry mechanism of remotehoodietablefilesystemvie…

2022-08-15 Thread GitBox
hudi-bot commented on PR #6393: URL: https://github.com/apache/hudi/pull/6393#issuecomment-1214657928 ## CI report: * acaf6edd64671b6815421a7d108bc68e92763c0d Azure:

[GitHub] [hudi] novisfff commented on a diff in pull request #6383: [HUDI-4607]fixed timeline based marker thread safaty issue

2022-08-15 Thread GitBox
novisfff commented on code in PR #6383: URL: https://github.com/apache/hudi/pull/6383#discussion_r945447878 ## hudi-timeline-service/src/main/java/org/apache/hudi/timeline/service/handlers/marker/MarkerCreationDispatchingRunnable.java: ## @@ -66,8 +66,9 @@ public void run() {

[GitHub] [hudi] hudi-bot commented on pull request #6395: [HUDI-4620] No expected exception is thrown when create hudi table without primaryKey

2022-08-15 Thread GitBox
hudi-bot commented on PR #6395: URL: https://github.com/apache/hudi/pull/6395#issuecomment-1214654472 ## CI report: * 952bef85e137b87f1a24a08eb71d987e53a6460b Azure:

[GitHub] [hudi] boneanxs commented on pull request #6141: [HUDI-3189] Fallback to full table scan with incremental query when files are cleaned up or achived for MOR table

2022-08-15 Thread GitBox
boneanxs commented on PR #6141: URL: https://github.com/apache/hudi/pull/6141#issuecomment-1214652833 @danny0405 the CI pass~ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific
