[jira] [Commented] (HUDI-1904) Make SchemaProvider spark free and move it to hudi-client-common
[ https://issues.apache.org/jira/browse/HUDI-1904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373244#comment-17373244 ]

ASF GitHub Bot commented on HUDI-1904:
--

codecov-commenter edited a comment on pull request #2963:
URL: https://github.com/apache/hudi/pull/2963#issuecomment-843155329

# [Codecov](https://codecov.io/gh/apache/hudi/pull/2963?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) Report

> Merging [#2963](https://codecov.io/gh/apache/hudi/pull/2963?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) (8c67c9b) into [master](https://codecov.io/gh/apache/hudi/commit/6eca06d074520140d7bc67b48bd2b9a5b76f0a87?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) (6eca06d) will **decrease** coverage by `32.02%`.
> The diff coverage is `n/a`.

[![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2963/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2963?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)

```diff
@@             Coverage Diff              @@
##             master    #2963      +/-   ##
============================================
- Coverage     47.51%   15.48%   -32.03%
+ Complexity     5429      478     -4951
============================================
  Files           922      281      -641
  Lines         40968    11548    -29420
  Branches       4105      945     -3160
============================================
- Hits          19464     1788    -17676
+ Misses        19780     9602    -10178
+ Partials       1724      158     -1566
```

| Flag | Coverage Δ | |
|---|---|---|
| hudicli | `?` | |
| hudiclient | `0.00% <ø> (-34.59%)` | :arrow_down: |
| hudicommon | `?` | |
| hudiflink | `?` | |
| hudihadoopmr | `?` | |
| hudisparkdatasource | `?` | |
| hudisync | `5.38% <ø> (-48.67%)` | :arrow_down: |
| huditimelineservice | `?` | |
| hudiutilities | `58.04% <ø> (+0.01%)` | :arrow_up: |

Flags with carried forward
coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more. | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2963?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) | Coverage Δ | | |---|---|---| | [...rg/apache/hudi/schema/SchemaProviderInterface.java](https://codecov.io/gh/apache/hudi/pull/2963/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL3NjaGVtYS9TY2hlbWFQcm92aWRlckludGVyZmFjZS5qYXZh) | `0.00% <ø> (ø)` | | | [...g/apache/hudi/utilities/schema/SchemaProvider.java](https://codecov.io/gh/apache/hudi/pull/2963/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFQcm92aWRlci5qYXZh) | `66.66% <ø> (-4.77%)` | :arrow_down: | | [...main/java/org/apache/hudi/metrics/HoodieGauge.java](https://codecov.io/gh/apache/hudi/pull/2963/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL21ldHJpY3MvSG9vZGllR2F1Z2UuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: | | [.../org/apache/hudi/hive/NonPartitionedExtractor.java](https://codecov.io/gh/apache/hudi/pull/2963/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvTm9uUGFydGl0aW9uZWRFeHRyYWN0b3IuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | 
:arrow_down: | | [.../java/org/apache/hudi/metrics/MetricsReporter.java](https://codecov.io/gh/apache/hudi/pull/2963/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL21ldHJpY3MvTWV0cmljc1JlcG9ydGVyLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: | |
[GitHub] [hudi] codecov-commenter edited a comment on pull request #2963: [HUDI-1904] Introduce SchemaProviderInterface to make SchemaProvider unified
[jira] [Commented] (HUDI-1904) Make SchemaProvider spark free and move it to hudi-client-common
[ https://issues.apache.org/jira/browse/HUDI-1904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373238#comment-17373238 ]

ASF GitHub Bot commented on HUDI-1904:
--

codecov-commenter edited a comment on pull request #2963:
URL: https://github.com/apache/hudi/pull/2963#issuecomment-843155329

# [Codecov](https://codecov.io/gh/apache/hudi/pull/2963?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) Report

> Merging [#2963](https://codecov.io/gh/apache/hudi/pull/2963?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) (8c67c9b) into [master](https://codecov.io/gh/apache/hudi/commit/6eca06d074520140d7bc67b48bd2b9a5b76f0a87?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) (6eca06d) will **decrease** coverage by `44.61%`.
> The diff coverage is `n/a`.

[![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2963/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2963?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)

```diff
@@             Coverage Diff              @@
##             master    #2963      +/-   ##
============================================
- Coverage     47.51%    2.89%   -44.62%
+ Complexity     5429       82     -5347
============================================
  Files           922      281      -641
  Lines         40968    11548    -29420
  Branches       4105      945     -3160
============================================
- Hits          19464      334    -19130
+ Misses        19780    11188     -8592
+ Partials       1724       26     -1698
```

| Flag | Coverage Δ | |
|---|---|---|
| hudicli | `?` | |
| hudiclient | `0.00% <ø> (-34.59%)` | :arrow_down: |
| hudicommon | `?` | |
| hudiflink | `?` | |
| hudihadoopmr | `?` | |
| hudisparkdatasource | `?` | |
| hudisync | `5.38% <ø> (-48.67%)` | :arrow_down: |
| huditimelineservice | `?` | |
| hudiutilities | `9.31% <ø> (-48.71%)` | :arrow_down: |

Flags with carried forward coverage
won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more. | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2963?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) | Coverage Δ | | |---|---|---| | [...rg/apache/hudi/schema/SchemaProviderInterface.java](https://codecov.io/gh/apache/hudi/pull/2963/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL3NjaGVtYS9TY2hlbWFQcm92aWRlckludGVyZmFjZS5qYXZh) | `0.00% <ø> (ø)` | | | [...g/apache/hudi/utilities/schema/SchemaProvider.java](https://codecov.io/gh/apache/hudi/pull/2963/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFQcm92aWRlci5qYXZh) | `66.66% <ø> (-4.77%)` | :arrow_down: | | [...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/2963/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: | | [...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/2963/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: | | 
[...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/2963/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: | |
[GitHub] [hudi] codecov-commenter edited a comment on pull request #2963: [HUDI-1904] Introduce SchemaProviderInterface to make SchemaProvider unified
[jira] [Commented] (HUDI-1468) incremental read support with clustering
[ https://issues.apache.org/jira/browse/HUDI-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373234#comment-17373234 ] ASF GitHub Bot commented on HUDI-1468: -- hudi-bot edited a comment on pull request #3211: URL: https://github.com/apache/hudi/pull/3211#issuecomment-872639166 ## CI report: * c9c9a0d5343b65e690544dfcb85e71d915c455e1 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=637) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > incremental read support with clustering > > > Key: HUDI-1468 > URL: https://issues.apache.org/jira/browse/HUDI-1468 > Project: Apache Hudi > Issue Type: Sub-task > Components: Incremental Pull >Affects Versions: 0.9.0 >Reporter: satish >Assignee: liwei >Priority: Blocker > Labels: pull-request-available > Fix For: 0.9.0 > > > As part of clustering, metadata such as hoodie_commit_time changes for > records that are clustered. This is specific to > SparkBulkInsertBasedRunClusteringStrategy implementation. Figure out a way to > carry commit_time from original record to support incremental queries. > Also, incremental queries dont work with 'replacecommit' used by clustering > HUDI-1264. Change incremental query to work for replacecommits created by > Clustering. -- This message was sent by Atlassian Jira (v8.3.4#803005)
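The fix direction described in HUDI-1468 (carrying `_hoodie_commit_time` from the original record through clustering so incremental queries still see the original write instant) can be sketched as plain metadata copying. This is an illustrative sketch only; the class and field handling below are assumptions, not Hudi's actual clustering strategy API:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch (not Hudi's real API): when clustering rewrites a
// record, keep the record's original _hoodie_commit_time instead of
// stamping the clustering replacecommit instant.
public class ClusteringMetadataSketch {
  static final String COMMIT_TIME_FIELD = "_hoodie_commit_time";

  static Map<String, String> rewrite(Map<String, String> source, String clusteringInstant) {
    Map<String, String> out = new HashMap<>(source);
    // Preserve the source record's commit time; only fall back to the
    // clustering instant when the source carries no commit time at all.
    if (!out.containsKey(COMMIT_TIME_FIELD)) {
      out.put(COMMIT_TIME_FIELD, clusteringInstant);
    }
    return out;
  }

  public static void main(String[] args) {
    Map<String, String> rec = new HashMap<>();
    rec.put(COMMIT_TIME_FIELD, "20210101000000");
    rec.put("key", "uuid-1");
    // The rewritten record keeps the original instant, not "20210701120000".
    System.out.println(rewrite(rec, "20210701120000").get(COMMIT_TIME_FIELD)); // 20210101000000
  }
}
```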
[GitHub] [hudi] hudi-bot edited a comment on pull request #3211: [HUDI-1468] Support more flexible clustering strategies and preserve commit …
[jira] [Commented] (HUDI-2116) sync 10w partitions to hive by using HiveSyncTool lead to the oom of hive MetaStore
[ https://issues.apache.org/jira/browse/HUDI-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373231#comment-17373231 ]

ASF GitHub Bot commented on HUDI-2116:
--

hudi-bot edited a comment on pull request #3209:
URL: https://github.com/apache/hudi/pull/3209#issuecomment-872173561

## CI report:

* fbcd406b45e370446193c32e7d09db09d57a0996 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=636)

Bot commands
@hudi-bot supports the following commands:
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> sync 10w partitions to hive by using HiveSyncTool leads to the oom of hive MetaStore
>
>                 Key: HUDI-2116
>                 URL: https://issues.apache.org/jira/browse/HUDI-2116
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: Hive Integration
>    Affects Versions: 0.8.0
>         Environment: hive3.1.1
> hadoop 3.1.1
>            Reporter: tao meng
>            Assignee: tao meng
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.10.0
>
> When we try to sync 10w (100,000) partitions to hive by using HiveSyncTool, it leads to the oom of the hive MetaStore.
>
> Here is a stress test for HiveSyncTool.
> env:
> hive metastore -Xms16G -Xmx16G
> hive.metastore.client.socket.timeout=10800
>
> ||partitionNum||time consume||
> |100|37s|
> |1000|168s|
> |5000|1830s|
> |10000|timeout|
> |100000|hive metastore oom|
>
> HiveSyncTool syncs all partitions to the hive metastore at once. When the partition num is large, this puts a lot of pressure on the hive metastore. For a large partition num we should support batch sync.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
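The batch-sync idea from HUDI-2116 amounts to chunking the partition list so each metastore call carries a bounded number of partitions. The sketch below illustrates only the chunking; the class name, method name, and batch size are assumptions for illustration, not Hudi's actual HiveSyncTool API:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of batched partition sync: instead of pushing all
// partitions to the Hive metastore in one call, split them into fixed-size
// batches and issue one metastore call per batch.
public class PartitionBatcher {
  static <T> List<List<T>> chunk(List<T> items, int batchSize) {
    List<List<T>> batches = new ArrayList<>();
    for (int i = 0; i < items.size(); i += batchSize) {
      // subList is a view; each batch covers at most batchSize items.
      batches.add(items.subList(i, Math.min(i + batchSize, items.size())));
    }
    return batches;
  }

  public static void main(String[] args) {
    List<String> partitions = new ArrayList<>();
    for (int i = 0; i < 100000; i++) {
      partitions.add("dt=" + i);
    }
    // 100,000 partitions in batches of 1,000 -> 100 metastore calls.
    System.out.println(chunk(partitions, 1000).size()); // 100
  }
}
```

Each batch would then be passed to a single metastore add/alter-partitions call, keeping per-request memory on the metastore side bounded regardless of the table's total partition count.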
[GitHub] [hudi] hudi-bot edited a comment on pull request #3209: [HUDI-2116] Support batch synchronization of partition datas to hive metastore to avoid oom problem
[jira] [Commented] (HUDI-2045) Support Read Hoodie As DataSource Table For Flink And DeltaStreamer
[ https://issues.apache.org/jira/browse/HUDI-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373227#comment-17373227 ] ASF GitHub Bot commented on HUDI-2045: -- hudi-bot edited a comment on pull request #3120: URL: https://github.com/apache/hudi/pull/3120#issuecomment-864760893 ## CI report: * aaca30fffd1ea37f803f51ef3cf49c59ed79badc UNKNOWN * fcd06c8bccfc90b272b51d3511094e6617ec25bd UNKNOWN * 96947d0419df5f8bab10072eb64afecd29326e55 UNKNOWN * 02acd1127b72470f6d7adffb787179f0cddfa954 UNKNOWN * 504a6770be5d4cd3a78d61129be5b1aaadd515df UNKNOWN * 75aadbc834d6606527764468dd3dbcb1e802b171 UNKNOWN * f14ffb1f08820146e5d26616aa9b956ff99ec604 UNKNOWN * 06dff3c437b7b3f1aa227b700cf8c34669b067ed UNKNOWN * 97ba05a69199cff86cebbe25732097e3a68284f1 UNKNOWN * 3948fff7aacd6c97dcbe053a59a1208dae875607 UNKNOWN * 8ff6a0af2f53984c5864b04156a5b942400811c3 UNKNOWN * dbcd6ae0092c067c4bc364355ca7fa8129f46a39 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=635) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Support Read Hoodie As DataSource Table For Flink And DeltaStreamer > --- > > Key: HUDI-2045 > URL: https://issues.apache.org/jira/browse/HUDI-2045 > Project: Apache Hudi > Issue Type: Improvement > Components: Hive Integration >Reporter: pengzhiwei >Assignee: pengzhiwei >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > > Currently we only support reading hoodie table as datasource table for spark > since [https://github.com/apache/hudi/pull/2283] > In order to support this feature for flink and DeltaStreamer, we need to sync > the spark table properties needed by datasource table to the meta store in > HiveSyncTool. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] hudi-bot edited a comment on pull request #3120: [HUDI-2045] Support Read Hoodie As DataSource Table For Flink And Del…
[jira] [Commented] (HUDI-1447) DeltaStreamer kafka source supports consuming from specified timestamp
[ https://issues.apache.org/jira/browse/HUDI-1447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373226#comment-17373226 ] ASF GitHub Bot commented on HUDI-1447: -- hudi-bot edited a comment on pull request #2438: URL: https://github.com/apache/hudi/pull/2438#issuecomment-863310563 ## CI report: * a39570dfe0493bcd23edf911f6256e90d3b22907 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=638) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > DeltaStreamer kafka source supports consuming from specified timestamp > -- > > Key: HUDI-1447 > URL: https://issues.apache.org/jira/browse/HUDI-1447 > Project: Apache Hudi > Issue Type: New Feature > Components: DeltaStreamer >Reporter: wangxianghu#1 >Assignee: liujinhui >Priority: Major > Labels: pull-request-available, sev:high, user-support-issues > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] hudi-bot edited a comment on pull request #2438: [HUDI-1447] DeltaStreamer kafka source supports consuming from specified timestamp
[jira] [Commented] (HUDI-2057) CTAS Generate An External Table When Create Managed Table
[ https://issues.apache.org/jira/browse/HUDI-2057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373225#comment-17373225 ] ASF GitHub Bot commented on HUDI-2057: -- hudi-bot edited a comment on pull request #3146: URL: https://github.com/apache/hudi/pull/3146#issuecomment-867438049 ## CI report: * 189ca2500f54564e9c252dbab04198bae15494ef Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=634) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > CTAS Generate An External Table When Create Managed Table > - > > Key: HUDI-2057 > URL: https://issues.apache.org/jira/browse/HUDI-2057 > Project: Apache Hudi > Issue Type: Bug > Components: Spark Integration >Reporter: pengzhiwei >Assignee: pengzhiwei >Priority: Major > Labels: pull-request-available > > Currently CTAS would generate an external table when create a managed table > in the hive meta store. > {code:java} > create table h0 using hudi as select 1 as id, 'a1' as name;{code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] hudi-bot edited a comment on pull request #3146: [HUDI-2057] CTAS Generate An External Table When Create Managed Table
[jira] [Commented] (HUDI-1447) DeltaStreamer kafka source supports consuming from specified timestamp
[ https://issues.apache.org/jira/browse/HUDI-1447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373224#comment-17373224 ] ASF GitHub Bot commented on HUDI-1447: -- hudi-bot edited a comment on pull request #2438: URL: https://github.com/apache/hudi/pull/2438#issuecomment-863310563 ## CI report: * 1bbcdb44cbb0ab9ac84c95a48fea5d7f38a8f657 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=633) * a39570dfe0493bcd23edf911f6256e90d3b22907 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=638) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > DeltaStreamer kafka source supports consuming from specified timestamp > -- > > Key: HUDI-1447 > URL: https://issues.apache.org/jira/browse/HUDI-1447 > Project: Apache Hudi > Issue Type: New Feature > Components: DeltaStreamer >Reporter: wangxianghu#1 >Assignee: liujinhui >Priority: Major > Labels: pull-request-available, sev:high, user-support-issues > -- This message was sent by Atlassian Jira (v8.3.4#803005)
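The feature tracked above, starting DeltaStreamer's Kafka source from a caller-supplied timestamp, boils down to mapping a timestamp to the earliest offset whose record timestamp is at or after it; Kafka itself exposes this lookup as `KafkaConsumer#offsetsForTimes`. The sketch below is illustrative only (it uses no Hudi or Kafka APIs): it reproduces that semantics with a binary search over an in-memory array of per-offset timestamps, assumed non-decreasing as they are with broker-side LogAppendTime.

```java
// Illustrative sketch only: in a real source this lookup is done by
// KafkaConsumer#offsetsForTimes against the broker, not in memory.
public class TimestampOffsetLookup {

    /**
     * Return the smallest offset whose timestamp is >= target, or -1 if none.
     * timestampsByOffset[i] is the record timestamp at offset i, assumed
     * non-decreasing (true for LogAppendTime; CreateTime may not be monotonic).
     */
    public static long earliestOffsetAtOrAfter(long[] timestampsByOffset, long target) {
        int lo = 0;
        int hi = timestampsByOffset.length;
        while (lo < hi) {                       // classic lower-bound binary search
            int mid = (lo + hi) >>> 1;
            if (timestampsByOffset[mid] >= target) {
                hi = mid;
            } else {
                lo = mid + 1;
            }
        }
        return lo == timestampsByOffset.length ? -1 : lo;
    }

    public static void main(String[] args) {
        long[] ts = {100, 150, 150, 200, 300};
        System.out.println(earliestOffsetAtOrAfter(ts, 150)); // 1
        System.out.println(earliestOffsetAtOrAfter(ts, 400)); // -1 (nothing that late)
    }
}
```

In a real DeltaStreamer checkpoint the resolved offset would then seed the consumer's starting position per partition.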
[GitHub] [hudi] hudi-bot edited a comment on pull request #2438: [HUDI-1447] DeltaStreamer kafka source supports consuming from specified timestamp
hudi-bot edited a comment on pull request #2438: URL: https://github.com/apache/hudi/pull/2438#issuecomment-863310563 ## CI report: * 1bbcdb44cbb0ab9ac84c95a48fea5d7f38a8f657 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=633) * a39570dfe0493bcd23edf911f6256e90d3b22907 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=638) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (HUDI-1468) incremental read support with clustering
[ https://issues.apache.org/jira/browse/HUDI-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373214#comment-17373214 ] ASF GitHub Bot commented on HUDI-1468: -- hudi-bot edited a comment on pull request #3211: URL: https://github.com/apache/hudi/pull/3211#issuecomment-872639166 ## CI report: * ab7bacb26d44f383e7f61ec81531b34011f1383b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=629) * c9c9a0d5343b65e690544dfcb85e71d915c455e1 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=637) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > incremental read support with clustering > > > Key: HUDI-1468 > URL: https://issues.apache.org/jira/browse/HUDI-1468 > Project: Apache Hudi > Issue Type: Sub-task > Components: Incremental Pull >Affects Versions: 0.9.0 >Reporter: satish >Assignee: liwei >Priority: Blocker > Labels: pull-request-available > Fix For: 0.9.0 > > > As part of clustering, metadata such as hoodie_commit_time changes for > records that are clustered. This is specific to > SparkBulkInsertBasedRunClusteringStrategy implementation. Figure out a way to > carry commit_time from original record to support incremental queries. > Also, incremental queries dont work with 'replacecommit' used by clustering > HUDI-1264. Change incremental query to work for replacecommits created by > Clustering. -- This message was sent by Atlassian Jira (v8.3.4#803005)
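The ticket's core idea is to carry `_hoodie_commit_time` from the original record through clustering, so incremental queries still see the record's original arrival instant instead of the clustering instant. A minimal sketch of that idea; the class, field, and flag names here are hypothetical stand-ins, not Hudi's actual writer path.

```java
// Illustrative sketch of "preserve commit time through clustering".
// All names are hypothetical; Hudi's real record/writer classes differ.
public class PreserveCommitTimeSketch {

    /** Minimal stand-in for a Hudi record's commit metadata. */
    public static class Record {
        public final String key;
        public final String commitTime; // plays the role of _hoodie_commit_time

        public Record(String key, String commitTime) {
            this.key = key;
            this.commitTime = commitTime;
        }
    }

    /**
     * Rewrite a record into a new file group. When preserveCommitMetadata is
     * set, the rewritten record keeps the original commit time rather than
     * being stamped with the clustering instant, which is what keeps
     * incremental pulls consistent across a replacecommit.
     */
    public static Record cluster(Record original, String clusteringInstant, boolean preserveCommitMetadata) {
        String commitTime = preserveCommitMetadata ? original.commitTime : clusteringInstant;
        return new Record(original.key, commitTime);
    }

    public static void main(String[] args) {
        Record r = new Record("k1", "20210701120000");
        Record clustered = cluster(r, "20210702093000", true);
        System.out.println(clustered.commitTime); // original instant retained
    }
}
```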
[GitHub] [hudi] hudi-bot edited a comment on pull request #3211: [HUDI-1468] Support more flexible clustering strategies and preserve commit …
hudi-bot edited a comment on pull request #3211: URL: https://github.com/apache/hudi/pull/3211#issuecomment-872639166 ## CI report: * ab7bacb26d44f383e7f61ec81531b34011f1383b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=629) * c9c9a0d5343b65e690544dfcb85e71d915c455e1 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=637) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (HUDI-1468) incremental read support with clustering
[ https://issues.apache.org/jira/browse/HUDI-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373212#comment-17373212 ] ASF GitHub Bot commented on HUDI-1468: -- hudi-bot edited a comment on pull request #3211: URL: https://github.com/apache/hudi/pull/3211#issuecomment-872639166 ## CI report: * ab7bacb26d44f383e7f61ec81531b34011f1383b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=629) * c9c9a0d5343b65e690544dfcb85e71d915c455e1 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > incremental read support with clustering > > > Key: HUDI-1468 > URL: https://issues.apache.org/jira/browse/HUDI-1468 > Project: Apache Hudi > Issue Type: Sub-task > Components: Incremental Pull >Affects Versions: 0.9.0 >Reporter: satish >Assignee: liwei >Priority: Blocker > Labels: pull-request-available > Fix For: 0.9.0 > > > As part of clustering, metadata such as hoodie_commit_time changes for > records that are clustered. This is specific to > SparkBulkInsertBasedRunClusteringStrategy implementation. Figure out a way to > carry commit_time from original record to support incremental queries. > Also, incremental queries dont work with 'replacecommit' used by clustering > HUDI-1264. Change incremental query to work for replacecommits created by > Clustering. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] hudi-bot edited a comment on pull request #3211: [HUDI-1468] Support more flexible clustering strategies and preserve commit …
hudi-bot edited a comment on pull request #3211: URL: https://github.com/apache/hudi/pull/3211#issuecomment-872639166 ## CI report: * ab7bacb26d44f383e7f61ec81531b34011f1383b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=629) * c9c9a0d5343b65e690544dfcb85e71d915c455e1 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (HUDI-2116) sync 10w partitions to hive by using HiveSyncTool lead to the oom of hive MetaStore
[ https://issues.apache.org/jira/browse/HUDI-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373207#comment-17373207 ] ASF GitHub Bot commented on HUDI-2116: -- hudi-bot edited a comment on pull request #3209: URL: https://github.com/apache/hudi/pull/3209#issuecomment-872173561 ## CI report: * f4c7b374a7a338f0202c356baf08f24a9043e37a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=613) * fbcd406b45e370446193c32e7d09db09d57a0996 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=636) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > sync 10w partitions to hive by using HiveSyncTool lead to the oom of hive > MetaStore > > > Key: HUDI-2116 > URL: https://issues.apache.org/jira/browse/HUDI-2116 > Project: Apache Hudi > Issue Type: Bug > Components: Hive Integration >Affects Versions: 0.8.0 > Environment: hive3.1.1 > hadoop 3.1.1 >Reporter: tao meng >Assignee: tao meng >Priority: Major > Labels: pull-request-available > Fix For: 0.10.0 > > > when we try to sync 10w partitions to hive by using HiveSyncTool lead to the > oom of hive MetaStore. > > here is a stress test for HiveSyncTool > env: > hive metastore -Xms16G -Xmx16G > hive.metastore.client.socket.timeout=10800 > > ||partitionNum||time consume|| > |100|37s| > |1000|168s| > |5000|1830s| > |1|timeout| > |10|hive metastore oom| > HiveSyncTools sync all partitions to hive metastore at once. when the > partitions num is large, it puts a lot of pressure on hive metastore. for > large partition num we should support batch sync. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
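The fix proposed in the ticket is to push partitions to the metastore in batches instead of one huge call. A minimal, self-contained sketch of that batching idea; the per-batch metastore callback is a hypothetical stand-in, HiveSyncTool's real API is not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Illustrative sketch only: HiveSyncTool's actual batch-sync API may differ.
public class BatchPartitionSync {

    /** Split the full partition list into fixed-size batches. */
    public static List<List<String>> toBatches(List<String> partitions, int batchSize) {
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < partitions.size(); i += batchSize) {
            batches.add(partitions.subList(i, Math.min(i + batchSize, partitions.size())));
        }
        return batches;
    }

    /**
     * Issue one metastore call per batch instead of one call with every
     * partition, bounding the memory the metastore must hold per request.
     */
    public static void syncAll(List<String> partitions, int batchSize, Consumer<List<String>> metastoreCall) {
        for (List<String> batch : toBatches(partitions, batchSize)) {
            metastoreCall.accept(batch); // e.g. one ADD PARTITION request per batch
        }
    }

    public static void main(String[] args) {
        List<String> partitions = new ArrayList<>();
        for (int i = 1; i <= 10; i++) {
            partitions.add(String.format("dt=2021-07-%02d", i));
        }
        // 10 partitions in batches of 4 -> 3 metastore calls of sizes 4, 4, 2
        syncAll(partitions, 4, batch -> System.out.println("syncing " + batch.size() + " partitions"));
    }
}
```

With a bounded batch size the request cost grows linearly in batch count rather than producing a single request proportional to the whole table, which is what drove the metastore OOM in the stress test above.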
[GitHub] [hudi] hudi-bot edited a comment on pull request #3209: [HUDI-2116] Support batch synchronization of partition datas to hive metastore to avoid oom problem
hudi-bot edited a comment on pull request #3209: URL: https://github.com/apache/hudi/pull/3209#issuecomment-872173561 ## CI report: * f4c7b374a7a338f0202c356baf08f24a9043e37a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=613) * fbcd406b45e370446193c32e7d09db09d57a0996 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=636) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (HUDI-2116) sync 10w partitions to hive by using HiveSyncTool lead to the oom of hive MetaStore
[ https://issues.apache.org/jira/browse/HUDI-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373205#comment-17373205 ] ASF GitHub Bot commented on HUDI-2116: -- hudi-bot edited a comment on pull request #3209: URL: https://github.com/apache/hudi/pull/3209#issuecomment-872173561 ## CI report: * f4c7b374a7a338f0202c356baf08f24a9043e37a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=613) * fbcd406b45e370446193c32e7d09db09d57a0996 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > sync 10w partitions to hive by using HiveSyncTool lead to the oom of hive > MetaStore > > > Key: HUDI-2116 > URL: https://issues.apache.org/jira/browse/HUDI-2116 > Project: Apache Hudi > Issue Type: Bug > Components: Hive Integration >Affects Versions: 0.8.0 > Environment: hive3.1.1 > hadoop 3.1.1 >Reporter: tao meng >Assignee: tao meng >Priority: Major > Labels: pull-request-available > Fix For: 0.10.0 > > > when we try to sync 10w partitions to hive by using HiveSyncTool lead to the > oom of hive MetaStore. > > here is a stress test for HiveSyncTool > env: > hive metastore -Xms16G -Xmx16G > hive.metastore.client.socket.timeout=10800 > > ||partitionNum||time consume|| > |100|37s| > |1000|168s| > |5000|1830s| > |1|timeout| > |10|hive metastore oom| > HiveSyncTools sync all partitions to hive metastore at once. when the > partitions num is large, it puts a lot of pressure on hive metastore. for > large partition num we should support batch sync. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HUDI-2116) sync 10w partitions to hive by using HiveSyncTool lead to the oom of hive MetaStore
[ https://issues.apache.org/jira/browse/HUDI-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373206#comment-17373206 ] ASF GitHub Bot commented on HUDI-2116: -- xiarixiaoyao commented on pull request #3209: URL: https://github.com/apache/hudi/pull/3209#issuecomment-872694937 @yanghua thanks for your review. Already changed the PR title. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > sync 10w partitions to hive by using HiveSyncTool lead to the oom of hive > MetaStore > > > Key: HUDI-2116 > URL: https://issues.apache.org/jira/browse/HUDI-2116 > Project: Apache Hudi > Issue Type: Bug > Components: Hive Integration >Affects Versions: 0.8.0 > Environment: hive3.1.1 > hadoop 3.1.1 >Reporter: tao meng >Assignee: tao meng >Priority: Major > Labels: pull-request-available > Fix For: 0.10.0 > > > when we try to sync 10w partitions to hive by using HiveSyncTool lead to the > oom of hive MetaStore. > > here is a stress test for HiveSyncTool > env: > hive metastore -Xms16G -Xmx16G > hive.metastore.client.socket.timeout=10800 > > ||partitionNum||time consume|| > |100|37s| > |1000|168s| > |5000|1830s| > |1|timeout| > |10|hive metastore oom| > HiveSyncTools sync all partitions to hive metastore at once. when the > partitions num is large, it puts a lot of pressure on hive metastore. for > large partition num we should support batch sync. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] xiarixiaoyao commented on pull request #3209: [HUDI-2116] Support batch synchronization of partition datas to hive metastore to avoid oom problem
xiarixiaoyao commented on pull request #3209: URL: https://github.com/apache/hudi/pull/3209#issuecomment-872694937 @yanghua thanks for your review. Already changed the PR title. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot edited a comment on pull request #3209: [HUDI-2116] Support batch synchronization of partition datas to hive metastore to avoid oom problem
hudi-bot edited a comment on pull request #3209: URL: https://github.com/apache/hudi/pull/3209#issuecomment-872173561 ## CI report: * f4c7b374a7a338f0202c356baf08f24a9043e37a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=613) * fbcd406b45e370446193c32e7d09db09d57a0996 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (HUDI-1447) DeltaStreamer kafka source supports consuming from specified timestamp
[ https://issues.apache.org/jira/browse/HUDI-1447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373202#comment-17373202 ] ASF GitHub Bot commented on HUDI-1447: -- hudi-bot edited a comment on pull request #2438: URL: https://github.com/apache/hudi/pull/2438#issuecomment-863310563 ## CI report: * 1bbcdb44cbb0ab9ac84c95a48fea5d7f38a8f657 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=633) * a39570dfe0493bcd23edf911f6256e90d3b22907 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > DeltaStreamer kafka source supports consuming from specified timestamp > -- > > Key: HUDI-1447 > URL: https://issues.apache.org/jira/browse/HUDI-1447 > Project: Apache Hudi > Issue Type: New Feature > Components: DeltaStreamer >Reporter: wangxianghu#1 >Assignee: liujinhui >Priority: Major > Labels: pull-request-available, sev:high, user-support-issues > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] hudi-bot edited a comment on pull request #2438: [HUDI-1447] DeltaStreamer kafka source supports consuming from specified timestamp
hudi-bot edited a comment on pull request #2438: URL: https://github.com/apache/hudi/pull/2438#issuecomment-863310563 ## CI report: * 1bbcdb44cbb0ab9ac84c95a48fea5d7f38a8f657 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=633) * a39570dfe0493bcd23edf911f6256e90d3b22907 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (HUDI-2072) Add Precommit validator framework
[ https://issues.apache.org/jira/browse/HUDI-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373200#comment-17373200 ] ASF GitHub Bot commented on HUDI-2072: -- vinothchandar commented on a change in pull request #3153: URL: https://github.com/apache/hudi/pull/3153#discussion_r662716233 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java ## @@ -194,6 +194,37 @@ public static final String EXTERNAL_RECORD_AND_SCHEMA_TRANSFORMATION = AVRO_SCHEMA + ".externalTransformation"; public static final String DEFAULT_EXTERNAL_RECORD_AND_SCHEMA_TRANSFORMATION = "false"; + public static final String PRE_COMMIT_VALIDATORS = "hoodie.precommit.validators"; + private static final String DEFAULT_PRE_COMMIT_VALIDATORS = ""; + public static final String VALIDATOR_TABLE_VARIABLE = ""; + + /** + * Spark SQL queries to run on table before committing new data to validate state before and after commit. + * Multiple queries separated by ';' delimiter are supported. + * example: "select count(*) from \" + * Note \ is replaced by table state before and after commit. + */ + public static final String PRE_COMMIT_VALIDATORS_EQUALITY_SQL_QUERIES = "hoodie.precommit.validators.equality.sql.queries"; Review comment: please move all these configs to as ConfigProperty ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java ## @@ -194,6 +194,37 @@ public static final String EXTERNAL_RECORD_AND_SCHEMA_TRANSFORMATION = AVRO_SCHEMA + ".externalTransformation"; public static final String DEFAULT_EXTERNAL_RECORD_AND_SCHEMA_TRANSFORMATION = "false"; + public static final String PRE_COMMIT_VALIDATORS = "hoodie.precommit.validators"; Review comment: can we create a new Config class for this, instead of overloading the WriteConfig? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add Precommit validator framework > - > > Key: HUDI-2072 > URL: https://issues.apache.org/jira/browse/HUDI-2072 > Project: Apache Hudi > Issue Type: New Feature >Reporter: satish >Assignee: satish >Priority: Major > Labels: pull-request-available > > We want to run pre-commit validators before 'promoting' an inflight operation > to commit. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] vinothchandar commented on a change in pull request #3153: [HUDI-2072] Add pre-commit validator framework
vinothchandar commented on a change in pull request #3153: URL: https://github.com/apache/hudi/pull/3153#discussion_r662716233 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java ## @@ -194,6 +194,37 @@ public static final String EXTERNAL_RECORD_AND_SCHEMA_TRANSFORMATION = AVRO_SCHEMA + ".externalTransformation"; public static final String DEFAULT_EXTERNAL_RECORD_AND_SCHEMA_TRANSFORMATION = "false"; + public static final String PRE_COMMIT_VALIDATORS = "hoodie.precommit.validators"; + private static final String DEFAULT_PRE_COMMIT_VALIDATORS = ""; + public static final String VALIDATOR_TABLE_VARIABLE = ""; + + /** + * Spark SQL queries to run on table before committing new data to validate state before and after commit. + * Multiple queries separated by ';' delimiter are supported. + * example: "select count(*) from \" + * Note \ is replaced by table state before and after commit. + */ + public static final String PRE_COMMIT_VALIDATORS_EQUALITY_SQL_QUERIES = "hoodie.precommit.validators.equality.sql.queries"; Review comment: please move all these configs to as ConfigProperty ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java ## @@ -194,6 +194,37 @@ public static final String EXTERNAL_RECORD_AND_SCHEMA_TRANSFORMATION = AVRO_SCHEMA + ".externalTransformation"; public static final String DEFAULT_EXTERNAL_RECORD_AND_SCHEMA_TRANSFORMATION = "false"; + public static final String PRE_COMMIT_VALIDATORS = "hoodie.precommit.validators"; Review comment: can we create a new Config class for this, instead of overloading the WriteConfig? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
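The diff under review configures ';'-delimited Spark SQL validation queries via `hoodie.precommit.validators.equality.sql.queries`, each of which must return equal results on the table state before and after the commit. The sketch below shows only that splitting and equality check; the query runner is a stand-in `Function` rather than Spark SQL, and everything apart from the config key and the ';' delimiter quoted above is illustrative.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Sketch of the equality check behind hoodie.precommit.validators.equality.sql.queries.
// The Function<String, Long> runners stand in for Spark SQL on the before/after states.
public class SqlEqualityValidatorSketch {

    /** Split the configured value on the ';' delimiter, dropping blank entries. */
    public static List<String> splitQueries(String configured) {
        return Arrays.stream(configured.split(";"))
            .map(String::trim)
            .filter(q -> !q.isEmpty())
            .collect(Collectors.toList());
    }

    /** Fail the commit if any configured query returns a different result before vs. after. */
    public static void validate(String configured,
                                Function<String, Long> runOnBefore,
                                Function<String, Long> runOnAfter) {
        for (String query : splitQueries(configured)) {
            long before = runOnBefore.apply(query);
            long after = runOnAfter.apply(query);
            if (before != after) {
                // The real framework raises HoodieValidationException here.
                throw new IllegalStateException("Query changed result: " + query);
            }
        }
    }

    public static void main(String[] args) {
        String cfg = "select count(*) from t; select max(id) from t";
        validate(cfg, q -> 5L, q -> 5L); // equal results on both states, so no exception
        System.out.println("validation passed");
    }
}
```

A failed validation would abort promotion of the inflight commit, which is exactly the pre-commit gate the ticket describes.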
[jira] [Commented] (HUDI-2045) Support Read Hoodie As DataSource Table For Flink And DeltaStreamer
[ https://issues.apache.org/jira/browse/HUDI-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373198#comment-17373198 ] ASF GitHub Bot commented on HUDI-2045: -- hudi-bot edited a comment on pull request #3120: URL: https://github.com/apache/hudi/pull/3120#issuecomment-864760893 ## CI report: * aaca30fffd1ea37f803f51ef3cf49c59ed79badc UNKNOWN * fcd06c8bccfc90b272b51d3511094e6617ec25bd UNKNOWN * 96947d0419df5f8bab10072eb64afecd29326e55 UNKNOWN * 02acd1127b72470f6d7adffb787179f0cddfa954 UNKNOWN * 504a6770be5d4cd3a78d61129be5b1aaadd515df UNKNOWN * 75aadbc834d6606527764468dd3dbcb1e802b171 UNKNOWN * f14ffb1f08820146e5d26616aa9b956ff99ec604 UNKNOWN * 06dff3c437b7b3f1aa227b700cf8c34669b067ed UNKNOWN * 97ba05a69199cff86cebbe25732097e3a68284f1 UNKNOWN * 3948fff7aacd6c97dcbe053a59a1208dae875607 UNKNOWN * 8ff6a0af2f53984c5864b04156a5b942400811c3 UNKNOWN * adba2dccf6e41da3bde98e5ed622cfd4b39554e9 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=515) * dbcd6ae0092c067c4bc364355ca7fa8129f46a39 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=635) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Support Read Hoodie As DataSource Table For Flink And DeltaStreamer > --- > > Key: HUDI-2045 > URL: https://issues.apache.org/jira/browse/HUDI-2045 > Project: Apache Hudi > Issue Type: Improvement > Components: Hive Integration >Reporter: pengzhiwei >Assignee: pengzhiwei >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > > Currently we only support reading hoodie table as datasource table for spark > since [https://github.com/apache/hudi/pull/2283] > In order to support this feature for flink and DeltaStreamer, we need to sync > the spark table properties needed by datasource table to the meta store in > HiveSyncTool. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] hudi-bot edited a comment on pull request #3120: [HUDI-2045] Support Read Hoodie As DataSource Table For Flink And Del…
hudi-bot edited a comment on pull request #3120: URL: https://github.com/apache/hudi/pull/3120#issuecomment-864760893 ## CI report: * aaca30fffd1ea37f803f51ef3cf49c59ed79badc UNKNOWN * fcd06c8bccfc90b272b51d3511094e6617ec25bd UNKNOWN * 96947d0419df5f8bab10072eb64afecd29326e55 UNKNOWN * 02acd1127b72470f6d7adffb787179f0cddfa954 UNKNOWN * 504a6770be5d4cd3a78d61129be5b1aaadd515df UNKNOWN * 75aadbc834d6606527764468dd3dbcb1e802b171 UNKNOWN * f14ffb1f08820146e5d26616aa9b956ff99ec604 UNKNOWN * 06dff3c437b7b3f1aa227b700cf8c34669b067ed UNKNOWN * 97ba05a69199cff86cebbe25732097e3a68284f1 UNKNOWN * 3948fff7aacd6c97dcbe053a59a1208dae875607 UNKNOWN * 8ff6a0af2f53984c5864b04156a5b942400811c3 UNKNOWN * adba2dccf6e41da3bde98e5ed622cfd4b39554e9 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=515) * dbcd6ae0092c067c4bc364355ca7fa8129f46a39 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=635) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot edited a comment on pull request #3146: [HUDI-2057] CTAS Generate An External Table When Create Managed Table
hudi-bot edited a comment on pull request #3146: URL: https://github.com/apache/hudi/pull/3146#issuecomment-867438049 ## CI report: * a3889e81b221cbefe5ad98c1e62f90aa24742d80 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=624) * 189ca2500f54564e9c252dbab04198bae15494ef Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=634) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (HUDI-2057) CTAS Generate An External Table When Create Managed Table
[ https://issues.apache.org/jira/browse/HUDI-2057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373196#comment-17373196 ] ASF GitHub Bot commented on HUDI-2057: -- hudi-bot edited a comment on pull request #3146: URL: https://github.com/apache/hudi/pull/3146#issuecomment-867438049 ## CI report: * a3889e81b221cbefe5ad98c1e62f90aa24742d80 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=624) * 189ca2500f54564e9c252dbab04198bae15494ef Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=634) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > CTAS Generate An External Table When Create Managed Table > - > > Key: HUDI-2057 > URL: https://issues.apache.org/jira/browse/HUDI-2057 > Project: Apache Hudi > Issue Type: Bug > Components: Spark Integration >Reporter: pengzhiwei >Assignee: pengzhiwei >Priority: Major > Labels: pull-request-available > > Currently CTAS would generate an external table when create a managed table > in the hive meta store. > {code:java} > create table h0 using hudi as select 1 as id, 'a1' as name;{code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HUDI-2072) Add Precommit validator framework
[ https://issues.apache.org/jira/browse/HUDI-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373195#comment-17373195 ] ASF GitHub Bot commented on HUDI-2072: -- bvaradar commented on a change in pull request #3153: URL: https://github.com/apache/hudi/pull/3153#discussion_r662650025 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java ## @@ -194,6 +194,37 @@ public static final String EXTERNAL_RECORD_AND_SCHEMA_TRANSFORMATION = AVRO_SCHEMA + ".externalTransformation"; public static final String DEFAULT_EXTERNAL_RECORD_AND_SCHEMA_TRANSFORMATION = "false"; + public static final String PRE_COMMIT_VALIDATORS = "hoodie.precommit.validators"; + private static final String DEFAULT_PRE_COMMIT_VALIDATORS = ""; + public static final String VALIDATOR_TABLE_VARIABLE = ""; Review comment: It would make the validation queries more flexible if we can make both after and before table names individually configurable. Sometimes, your validation queries would involve joining both before and after tables. Keeping them configurable would allow for more flexibility. ## File path: hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/validator/SqlQueryPreCommitValidator.java ## @@ -0,0 +1,92 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+ * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hudi.client.validator; + +import org.apache.hudi.client.WriteStatus; +import org.apache.hudi.client.common.HoodieSparkEngineContext; +import org.apache.hudi.common.engine.HoodieEngineContext; +import org.apache.hudi.common.model.HoodieRecordPayload; +import org.apache.hudi.common.util.StringUtils; +import org.apache.hudi.config.HoodieWriteConfig; +import org.apache.hudi.exception.HoodieValidationException; +import org.apache.hudi.table.HoodieSparkTable; +import org.apache.log4j.LogManager; +import org.apache.log4j.Logger; +import org.apache.spark.api.java.JavaRDD; +import org.apache.spark.api.java.JavaSparkContext; +import org.apache.spark.sql.Dataset; +import org.apache.spark.sql.Row; +import org.apache.spark.sql.SQLContext; + +import java.util.Arrays; +import java.util.Set; +import java.util.concurrent.atomic.AtomicInteger; + +/** + * Validator framework to run sql queries and compare table state at different locations. + */ +public abstract class SqlQueryPreCommitValidator> extends SparkPreCommitValidator { + private static final Logger LOG = LogManager.getLogger(SqlQueryPreCommitValidator.class); + private static final AtomicInteger TABLE_COUNTER = new AtomicInteger(0); + + public SqlQueryPreCommitValidator(HoodieSparkTable table, HoodieEngineContext engineContext, HoodieWriteConfig config) { +super(table, engineContext, config); + } + + /** + * Takes input of RDD 1) before clustering and 2) after clustering. 
Perform required validation + * and throw error if validation fails + */ + @Override + public void validateRecordsBeforeAndAfter(Dataset before, Dataset after, final Set partitionsAffected) { +String hoodieTableName = "staged_table_" + TABLE_COUNTER.incrementAndGet(); +String hoodieTableBeforeClustering = hoodieTableName + "_before"; +String hoodieTableAfterClustering = hoodieTableName + "_after"; Review comment: Can you also take one pass at the code and rename variables. the validator needs to be agnostic to commit or clustering operations. Can you name them accordingly. ## File path: hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/utils/SparkValidatorUtils.java ## @@ -0,0 +1,169 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to
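The reviewer's suggestion above (make the "before" and "after" staged table names individually configurable, so one validation query can join both) can be sketched with a small self-contained example. The class name, placeholder tokens, and method are invented for illustration; they are not Hudi's actual pre-commit validator API or configuration keys:

```java
// Hypothetical sketch: a user-supplied validation query references the staged
// "before" and "after" tables through placeholder tokens, and the validator
// substitutes the generated table names before running the query. The token
// syntax and names here are illustrative assumptions, not Hudi's real config.
public class SqlValidatorSketch {
    static final String BEFORE_TABLE_VAR = "<BEFORE_TABLE>";
    static final String AFTER_TABLE_VAR = "<AFTER_TABLE>";

    // Substitute the staged table names into the query template.
    static String resolveQuery(String template, String beforeTable, String afterTable) {
        return template.replace(BEFORE_TABLE_VAR, beforeTable)
                       .replace(AFTER_TABLE_VAR, afterTable);
    }

    public static void main(String[] args) {
        // A query that joins both staged tables, as the review comment envisions.
        String template = "SELECT count(*) FROM " + AFTER_TABLE_VAR + " a "
            + "JOIN " + BEFORE_TABLE_VAR + " b ON a._hoodie_record_key = b._hoodie_record_key";
        System.out.println(resolveQuery(template, "staged_table_1_before", "staged_table_1_after"));
    }
}
```

Keeping both names as separate placeholders (rather than a single table variable) is what allows before/after join queries without hard-coding the generated names.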
[GitHub] [hudi] wangxianghu closed issue #3188: java.lang.NoSuchMethodError: org.apache.hudi.org.apache.jetty.server.session.SessionHandler.setHttpOnly(Z)V[SUPPORT]
wangxianghu closed issue #3188: URL: https://github.com/apache/hudi/issues/3188
[jira] [Commented] (HUDI-2057) CTAS Generate An External Table When Create Managed Table
[ https://issues.apache.org/jira/browse/HUDI-2057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373194#comment-17373194 ]

ASF GitHub Bot commented on HUDI-2057:
--------------------------------------

hudi-bot edited a comment on pull request #3146:
URL: https://github.com/apache/hudi/pull/3146#issuecomment-867438049

## CI report:

* a3889e81b221cbefe5ad98c1e62f90aa24742d80 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=624)
* 189ca2500f54564e9c252dbab04198bae15494ef UNKNOWN
[jira] [Commented] (HUDI-2057) CTAS Generate An External Table When Create Managed Table
[ https://issues.apache.org/jira/browse/HUDI-2057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373191#comment-17373191 ]

ASF GitHub Bot commented on HUDI-2057:
--------------------------------------

pengzhiwei2018 commented on a change in pull request #3146:
URL: https://github.com/apache/hudi/pull/3146#discussion_r662711996

## File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncConfig.java

## @@ -151,6 +155,7 @@ public String toString() {
   + ", help=" + help
   + ", supportTimestamp=" + supportTimestamp
   + ", decodePartition=" + decodePartition
+  + ", createManagedTable= " + createManagedTable

Review comment: done!

## File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/HiveSchemaUtil.java

## @@ -413,7 +413,12 @@ public static String generateCreateDDL(String tableName, MessageType storageSche }
   String partitionsStr = String.join(",", partitionFields);
-  StringBuilder sb = new StringBuilder("CREATE EXTERNAL TABLE IF NOT EXISTS ");
+  StringBuilder sb = new StringBuilder();
+  if (config.createManagedTable) {
+    sb.append("CREATE TABLE IF NOT EXISTS ");

Review comment: done!
[jira] [Commented] (HUDI-2057) CTAS Generate An External Table When Create Managed Table
[ https://issues.apache.org/jira/browse/HUDI-2057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373192#comment-17373192 ]

ASF GitHub Bot commented on HUDI-2057:
--------------------------------------

pengzhiwei2018 commented on a change in pull request #3146:
URL: https://github.com/apache/hudi/pull/3146#discussion_r662712075

## File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/HiveSchemaUtil.java

## @@ -413,7 +413,12 @@ public static String generateCreateDDL(String tableName, MessageType storageSche }
   String partitionsStr = String.join(",", partitionFields);
-  StringBuilder sb = new StringBuilder("CREATE EXTERNAL TABLE IF NOT EXISTS ");
+  StringBuilder sb = new StringBuilder();
+  if (config.createManagedTable) {
+    sb.append("CREATE TABLE IF NOT EXISTS ");
+  } else {
+    sb.append("CREATE EXTERNAL TABLE IF NOT EXISTS ");

Review comment: done!
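The fix under review boils down to choosing the DDL prefix from the managed-table flag. A self-contained sketch of that logic (class and method names are illustrative, not the real `HiveSchemaUtil` API):

```java
// Minimal sketch of the generateCreateDDL change: the DDL prefix depends on
// whether a managed table was requested. Names here are illustrative only.
public class CtasDdlSketch {
    static String createTablePrefix(boolean createManagedTable) {
        StringBuilder sb = new StringBuilder();
        if (createManagedTable) {
            sb.append("CREATE TABLE IF NOT EXISTS ");
        } else {
            sb.append("CREATE EXTERNAL TABLE IF NOT EXISTS ");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A managed-table CTAS such as `create table h0 using hudi as select ...`
        // should now sync as a managed (non-EXTERNAL) Hive table.
        System.out.println(createTablePrefix(true) + "h0 ...");
        System.out.println(createTablePrefix(false) + "h0 ...");
    }
}
```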
[jira] [Commented] (HUDI-2057) CTAS Generate An External Table When Create Managed Table
[ https://issues.apache.org/jira/browse/HUDI-2057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373189#comment-17373189 ]

ASF GitHub Bot commented on HUDI-2057:
--------------------------------------

pengzhiwei2018 commented on a change in pull request #3146:
URL: https://github.com/apache/hudi/pull/3146#discussion_r662711582

## File path: hudi-sync/hudi-hive-sync/src/test/java/org/apache/hudi/hive/TestHiveSyncTool.java

## @@ -33,8 +35,6 @@
   import org.apache.avro.Schema.Field;
   import org.apache.hadoop.fs.Path;
   import org.apache.hadoop.hive.metastore.api.Partition;
-  import org.apache.hadoop.hive.ql.Driver;
-  import org.apache.hadoop.hive.ql.session.SessionState;

Review comment: remove unused import
[jira] [Commented] (HUDI-1447) DeltaStreamer kafka source supports consuming from specified timestamp
[ https://issues.apache.org/jira/browse/HUDI-1447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373188#comment-17373188 ]

ASF GitHub Bot commented on HUDI-1447:
--------------------------------------

hudi-bot edited a comment on pull request #2438:
URL: https://github.com/apache/hudi/pull/2438#issuecomment-863310563

## CI report:

* 1bbcdb44cbb0ab9ac84c95a48fea5d7f38a8f657 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=633)

> DeltaStreamer kafka source supports consuming from specified timestamp
> -----------------------------------------------------------------------
>
>                 Key: HUDI-1447
>                 URL: https://issues.apache.org/jira/browse/HUDI-1447
>             Project: Apache Hudi
>          Issue Type: New Feature
>          Components: DeltaStreamer
>            Reporter: wangxianghu#1
>            Assignee: liujinhui
>            Priority: Major
>              Labels: pull-request-available, sev:high, user-support-issues
[jira] [Commented] (HUDI-2058) support incremental query for insert_overwrite_table/insert_overwrite operation on cow table
[ https://issues.apache.org/jira/browse/HUDI-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373187#comment-17373187 ]

ASF GitHub Bot commented on HUDI-2058:
--------------------------------------

vinothchandar commented on pull request #3139:
URL: https://github.com/apache/hudi/pull/3139#issuecomment-872684832

cc @codope this may also fix incremental + clustering, given they are all replace commits. Could you review this once please?

> support incremental query for insert_overwrite_table/insert_overwrite operation on cow table
> ---------------------------------------------------------------------------------------------
>
>                 Key: HUDI-2058
>                 URL: https://issues.apache.org/jira/browse/HUDI-2058
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: Incremental Pull
>    Affects Versions: 0.8.0
>         Environment: hadoop 3.1.1, spark 3.1.1, hive 3.1.1
>            Reporter: tao meng
>            Assignee: tao meng
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.9.0
>
> When an incremental query spans multiple commits before and after a replacecommit, the query result incorrectly contains data from the old files.
> Notice: mor table is ok, only cow table has this problem.
> When querying the incremental view of a cow table, the replacecommit is ignored, which leads to the wrong result.
>
> test step:
> step1: create dataFrame
> val df = spark.range(0, 10).toDF("keyid")
>   .withColumn("col3", expr("keyid"))
>   .withColumn("age", lit(1))
>   .withColumn("p", lit(2))
>
> step2: insert df to a empty hoodie table
> df.write.format("hudi").
>   option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY, DataSourceWriteOptions.COW_TABLE_TYPE_OPT_VAL).
>   option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, "col3").
>   option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "keyid").
> option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY, ""). > option(DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY, > "org.apache.hudi.keygen.NonpartitionedKeyGenerator"). > option(DataSourceWriteOptions.OPERATION_OPT_KEY, "insert"). > option("hoodie.insert.shuffle.parallelism", "4"). > option(HoodieWriteConfig.TABLE_NAME, "hoodie_test") > .mode(SaveMode.Overwrite).save(basePath) > > step3: do insert_overwrite > df.write.format("hudi"). > option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY, > DataSourceWriteOptions.COW_TABLE_TYPE_OPT_VAL). > option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, "col3"). > option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "keyid"). > option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY, ""). > option(DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY, > "org.apache.hudi.keygen.NonpartitionedKeyGenerator"). > option(DataSourceWriteOptions.OPERATION_OPT_KEY, "insert_overwrite_table"). > option("hoodie.insert.shuffle.parallelism", "4"). > option(HoodieWriteConfig.TABLE_NAME, "hoodie_test") > .mode(SaveMode.Append).save(basePath) > > step4: query incrematal table > spark.read.format("hudi").option(DataSourceReadOptions.QUERY_TYPE_OPT_KEY, > DataSourceReadOptions.QUERY_TYPE_INCREMENTAL_OPT_VAL) > .option(DataSourceReadOptions.BEGIN_INSTANTTIME_OPT_KEY, "") > .option(DataSourceReadOptions.END_INSTANTTIME_OPT_KEY, currentCommits(0)) > .load(basePath).select("keyid").orderBy("keyid").show(100, false) > > result: the result contains old data > +-+ > |keyid| > +-+ > |0 | > |0 | > |1 | > |1 | > |2 | > |2 | > |3 | > |3 | > |4 | > |4 | > |5 | > |5 | > |6 | > |6 | > |7 | > |7 | > |8 | > |8 | > |9 | > |9 | > +-+ > -- This message was sent by Atlassian Jira (v8.3.4#803005)
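The bug report above can be modeled in miniature: if the incremental file listing ignores the replacecommit, the files that insert_overwrite replaced remain visible, so old rows leak into the result (the duplicated keyids in the table above). The sketch below is an illustrative toy model only, not Hudi's actual timeline or file-system view API:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model of the incremental view: list the files written by each commit,
// optionally honoring the set of file ids that a replacecommit invalidated.
// All names and structures here are invented for illustration.
public class IncrementalViewSketch {
    static List<String> incrementalFiles(Map<String, List<String>> filesByCommit,
                                         Set<String> replacedFileIds,
                                         boolean honorReplaceCommits) {
        List<String> visible = new ArrayList<>();
        for (List<String> files : filesByCommit.values()) {
            for (String f : files) {
                // The fix: filter out files invalidated by the replacecommit.
                if (!honorReplaceCommits || !replacedFileIds.contains(f)) {
                    visible.add(f);
                }
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        Map<String, List<String>> commits = new LinkedHashMap<>();
        commits.put("c1.commit", Arrays.asList("file-1"));        // initial insert
        commits.put("c2.replacecommit", Arrays.asList("file-2")); // insert_overwrite_table
        Set<String> replaced = new HashSet<>(Arrays.asList("file-1"));

        // Buggy behavior (replacecommit ignored): both files are read, old rows leak.
        System.out.println(incrementalFiles(commits, replaced, false));
        // Fixed behavior: only the overwrite's file is visible.
        System.out.println(incrementalFiles(commits, replaced, true));
    }
}
```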
[jira] [Commented] (HUDI-2119) Syncing of rollbacks to metadata table does not work in all cases
[ https://issues.apache.org/jira/browse/HUDI-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373184#comment-17373184 ]

ASF GitHub Bot commented on HUDI-2119:
--------------------------------------

codecov-commenter commented on pull request #3210:
URL: https://github.com/apache/hudi/pull/3210#issuecomment-872684476

# [Codecov](https://codecov.io/gh/apache/hudi/pull/3210?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) Report
> Merging [#3210](https://codecov.io/gh/apache/hudi/pull/3210?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) (9d10483) into [master](https://codecov.io/gh/apache/hudi/commit/6eca06d074520140d7bc67b48bd2b9a5b76f0a87?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) (6eca06d) will **increase** coverage by `2.23%`.
> The diff coverage is `n/a`.

[![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/3210/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3210?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)

```diff
@@             Coverage Diff              @@
##             master    #3210      +/-  ##
+ Coverage     47.51%   49.74%    +2.23%
+ Complexity     5429      406     -5023
  Files           922       67      -855
  Lines         40968     2985    -37983
  Branches       4105      320     -3785
- Hits          19464     1485    -17979
+ Misses        19780     1365    -18415
+ Partials       1724      135     -1589
```

| Flag | Coverage Δ | |
|---|---|---|
| hudicli | `?` | |
| hudiclient | `?` | |
| hudicommon | `?` | |
| hudiflink | `?` | |
| hudihadoopmr | `?` | |
| hudisparkdatasource | `?` | |
| hudisync | `?` | |
| huditimelineservice | `?` | |
| hudiutilities | `49.74% <ø> (-8.28%)` | :arrow_down: |

Flags with carried forward coverage won't be shown.
[Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.

| [Impacted Files](https://codecov.io/gh/apache/hudi/pull/3210?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) | Coverage Δ | |
|---|---|---|
| [...ies/exception/HoodieSnapshotExporterException.java](https://codecov.io/gh/apache/hudi/pull/3210/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2V4Y2VwdGlvbi9Ib29kaWVTbmFwc2hvdEV4cG9ydGVyRXhjZXB0aW9uLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
| [.../apache/hudi/utilities/HoodieSnapshotExporter.java](https://codecov.io/gh/apache/hudi/pull/3210/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0hvb2RpZVNuYXBzaG90RXhwb3J0ZXIuamF2YQ==) | `5.17% <0.00%> (-83.63%)` | :arrow_down: |
| [...hudi/utilities/schema/JdbcbasedSchemaProvider.java](https://codecov.io/gh/apache/hudi/pull/3210/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9KZGJjYmFzZWRTY2hlbWFQcm92aWRlci5qYXZh) | `0.00% <0.00%> (-72.23%)` | :arrow_down: |
| [...org/apache/hudi/utilities/HDFSParquetImporter.java](https://codecov.io/gh/apache/hudi/pull/3210/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0hERlNQYXJxdWV0SW1wb3J0ZXIuamF2YQ==) | `0.00% <0.00%> (-71.82%)` | :arrow_down: |
| [...he/hudi/utilities/transform/AWSDmsTransformer.java](https://codecov.io/gh/apache/hudi/pull/3210/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3RyYW5zZm9ybS9BV1NEbXNUcmFuc2Zvcm1lci5qYXZh) | `0.00% <0.00%> (-66.67%)` | :arrow_down: |
[GitHub] [hudi] codecov-commenter commented on pull request #3210: [HUDI-2119] Ensure the rolled-back instance was previously synced to the Metadata Table when syncing a Rollback Instant.
codecov-commenter commented on pull request #3210: URL: https://github.com/apache/hudi/pull/3210#issuecomment-872684476

# [Codecov](https://codecov.io/gh/apache/hudi/pull/3210?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) Report
> Merging [#3210](https://codecov.io/gh/apache/hudi/pull/3210?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) (9d10483) into [master](https://codecov.io/gh/apache/hudi/commit/6eca06d074520140d7bc67b48bd2b9a5b76f0a87?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) (6eca06d) will **increase** coverage by `2.23%`.
> The diff coverage is `n/a`.

[![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/3210/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3210?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)

```diff
@@             Coverage Diff              @@
##             master    #3210      +/-   ##
============================================
+ Coverage     47.51%   49.74%    +2.23%
+ Complexity     5429      406     -5023
============================================
  Files           922       67      -855
  Lines         40968     2985    -37983
  Branches       4105      320     -3785
============================================
- Hits          19464     1485    -17979
+ Misses        19780     1365    -18415
+ Partials       1724      135     -1589
```

| Flag | Coverage Δ | |
|---|---|---|
| hudicli | `?` | |
| hudiclient | `?` | |
| hudicommon | `?` | |
| hudiflink | `?` | |
| hudihadoopmr | `?` | |
| hudisparkdatasource | `?` | |
| hudisync | `?` | |
| huditimelineservice | `?` | |
| hudiutilities | `49.74% <ø> (-8.28%)` | :arrow_down: |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
| [Impacted Files](https://codecov.io/gh/apache/hudi/pull/3210?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) | Coverage Δ | |
|---|---|---|
| [...ies/exception/HoodieSnapshotExporterException.java](https://codecov.io/gh/apache/hudi/pull/3210/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2V4Y2VwdGlvbi9Ib29kaWVTbmFwc2hvdEV4cG9ydGVyRXhjZXB0aW9uLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
| [.../apache/hudi/utilities/HoodieSnapshotExporter.java](https://codecov.io/gh/apache/hudi/pull/3210/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0hvb2RpZVNuYXBzaG90RXhwb3J0ZXIuamF2YQ==) | `5.17% <0.00%> (-83.63%)` | :arrow_down: |
| [...hudi/utilities/schema/JdbcbasedSchemaProvider.java](https://codecov.io/gh/apache/hudi/pull/3210/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9KZGJjYmFzZWRTY2hlbWFQcm92aWRlci5qYXZh) | `0.00% <0.00%> (-72.23%)` | :arrow_down: |
| [...org/apache/hudi/utilities/HDFSParquetImporter.java](https://codecov.io/gh/apache/hudi/pull/3210/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0hERlNQYXJxdWV0SW1wb3J0ZXIuamF2YQ==) | `0.00% <0.00%> (-71.82%)` | :arrow_down: |
| [...he/hudi/utilities/transform/AWSDmsTransformer.java](https://codecov.io/gh/apache/hudi/pull/3210/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3RyYW5zZm9ybS9BV1NEbXNUcmFuc2Zvcm1lci5qYXZh) | `0.00% <0.00%> (-66.67%)` | :arrow_down: |
| [...in/java/org/apache/hudi/utilities/UtilHelpers.java](https://codecov.io/gh/apache/hudi/pull/3210/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL1V0aWxIZWxwZXJzLmphdmE=) |
[jira] [Commented] (HUDI-1447) DeltaStreamer kafka source supports consuming from specified timestamp
[ https://issues.apache.org/jira/browse/HUDI-1447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373183#comment-17373183 ] ASF GitHub Bot commented on HUDI-1447: -- hudi-bot edited a comment on pull request #2438: URL: https://github.com/apache/hudi/pull/2438#issuecomment-863310563 ## CI report: * 5e8ab52b0e139333c4c003932c55ff6e88302206 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=565) * 1bbcdb44cbb0ab9ac84c95a48fea5d7f38a8f657 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=633) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > DeltaStreamer kafka source supports consuming from specified timestamp > -- > > Key: HUDI-1447 > URL: https://issues.apache.org/jira/browse/HUDI-1447 > Project: Apache Hudi > Issue Type: New Feature > Components: DeltaStreamer >Reporter: wangxianghu#1 >Assignee: liujinhui >Priority: Major > Labels: pull-request-available, sev:high, user-support-issues > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] hudi-bot edited a comment on pull request #2438: [HUDI-1447] DeltaStreamer kafka source supports consuming from specified timestamp
hudi-bot edited a comment on pull request #2438: URL: https://github.com/apache/hudi/pull/2438#issuecomment-863310563 ## CI report: * 5e8ab52b0e139333c4c003932c55ff6e88302206 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=565) * 1bbcdb44cbb0ab9ac84c95a48fea5d7f38a8f657 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=633) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (HUDI-1447) DeltaStreamer kafka source supports consuming from specified timestamp
[ https://issues.apache.org/jira/browse/HUDI-1447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373182#comment-17373182 ] ASF GitHub Bot commented on HUDI-1447: -- hudi-bot edited a comment on pull request #2438: URL: https://github.com/apache/hudi/pull/2438#issuecomment-863310563 ## CI report: * 5e8ab52b0e139333c4c003932c55ff6e88302206 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=565) * 1bbcdb44cbb0ab9ac84c95a48fea5d7f38a8f657 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > DeltaStreamer kafka source supports consuming from specified timestamp > -- > > Key: HUDI-1447 > URL: https://issues.apache.org/jira/browse/HUDI-1447 > Project: Apache Hudi > Issue Type: New Feature > Components: DeltaStreamer >Reporter: wangxianghu#1 >Assignee: liujinhui >Priority: Major > Labels: pull-request-available, sev:high, user-support-issues > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] hudi-bot edited a comment on pull request #2438: [HUDI-1447] DeltaStreamer kafka source supports consuming from specified timestamp
hudi-bot edited a comment on pull request #2438: URL: https://github.com/apache/hudi/pull/2438#issuecomment-863310563 ## CI report: * 5e8ab52b0e139333c4c003932c55ff6e88302206 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=565) * 1bbcdb44cbb0ab9ac84c95a48fea5d7f38a8f657 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (HUDI-1951) Hash Index for HUDI
[ https://issues.apache.org/jira/browse/HUDI-1951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373181#comment-17373181 ] ASF GitHub Bot commented on HUDI-1951: -- vinothchandar commented on pull request #3173: URL: https://github.com/apache/hudi/pull/3173#issuecomment-872683439 @minihippo can you please rebase this PR again? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Hash Index for HUDI > --- > > Key: HUDI-1951 > URL: https://issues.apache.org/jira/browse/HUDI-1951 > Project: Apache Hudi > Issue Type: New Feature >Reporter: XiaoyuGeng >Assignee: XiaoyuGeng >Priority: Major > Labels: pull-request-available > > https://cwiki.apache.org/confluence/display/HUDI/RFC+-+29%3A+Hash+Index -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] vinothchandar commented on pull request #3173: [HUDI-1951] Add bucket hash index, compatible with the hive bucket
vinothchandar commented on pull request #3173: URL: https://github.com/apache/hudi/pull/3173#issuecomment-872683439 @minihippo can you please rebase this PR again? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (HUDI-2090) when hudi metadata is enabled, use different user to query table, the query will failed
[ https://issues.apache.org/jira/browse/HUDI-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373179#comment-17373179 ] ASF GitHub Bot commented on HUDI-2090: -- vinothchandar commented on pull request #3183: URL: https://github.com/apache/hudi/pull/3183#issuecomment-872683185

@n3nash Looks like you shepherded the change with gary :). I don't see #795 actually make this path change. Instead of fixing perms (which may not actually work; depending on permissions, you can have write perms but still be unable to chmod), can we create a unique folder like `/tmp/hudi_fsview_map-` ? This should be totally backwards compatible to do. Thoughts?

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> when hudi metadata is enabled and a different user queries the table, the query will fail
> -----------------------------------------------------------------------------------------
>
>                 Key: HUDI-2090
>                 URL: https://issues.apache.org/jira/browse/HUDI-2090
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: Common Core
>    Affects Versions: 0.8.0
>            Reporter: tao meng
>            Assignee: tao meng
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.9.0
>
> When hudi metadata is enabled and a different user queries the table, the query will fail.
>
> The user permissions of the temporary directory generated by DiskBasedMap are incorrect: the directory is only accessible to the user who created it, and other users have no permission to access it, which leads to this problem.
> Test steps:
> step 1: create a hudi table with metadata enabled.
> step 2: create two users (omm, user2).
> step 3:
> 1) use omm to query the hudi table; DiskBasedMap will generate view_map with permissions drwx--.
> 2) then use user2 to query the hudi table; user2 has no right to access the view_map created by omm, and this exception is thrown:
> org.apache.hudi.exception.HoodieIOException: IOException when creating ExternalSpillableMap at /tmp/view_map
>
> -- This message was sent by Atlassian Jira (v8.3.4#803005)
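The fix direction suggested in the comment above, a unique spill folder per user and process instead of a shared `/tmp/view_map`, can be sketched as follows. This is an illustrative Python sketch, not Hudi's actual Java implementation; the `make_spill_dir` helper is hypothetical, and only the `hudi_fsview_map-` prefix comes from the comment.

```python
import getpass
import os
import tempfile

def make_spill_dir(prefix: str = "hudi_fsview_map-") -> str:
    """Create a private, per-process spill directory instead of a shared
    /tmp/view_map. tempfile.mkdtemp() creates the directory with mode 0700
    and a unique name, so a second user never trips over a drwx------
    directory left behind by the first."""
    base = os.path.join(tempfile.gettempdir(), prefix + getpass.getuser())
    os.makedirs(base, exist_ok=True)   # one parent per user
    return tempfile.mkdtemp(dir=base)  # unique leaf per process/query

# Two queries, even from the same user, get distinct private directories:
d1, d2 = make_spill_dir(), make_spill_dir()
```

Because each query gets a fresh directory, no chmod of a pre-existing directory is ever needed, which sidesteps the "write perms but no chmod" concern raised above.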
[GitHub] [hudi] vinothchandar commented on pull request #3183: [HUDI-2090] When hudi metadata is enabled, use different users to quer…
vinothchandar commented on pull request #3183: URL: https://github.com/apache/hudi/pull/3183#issuecomment-872683185 @n3nash Looks like you shepherded the change with gary :). I don't see #795 actually make this path change. Instead of fixing perms (which may not actually work; depending on permissions, you can have write perms but still be unable to chmod), can we create a unique folder like `/tmp/hudi_fsview_map-` ? This should be totally backwards compatible to do. Thoughts? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (HUDI-2114) Spark Query MOR Table Written By Flink Return Incorrect Timestamp Value
[ https://issues.apache.org/jira/browse/HUDI-2114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373173#comment-17373173 ] ASF GitHub Bot commented on HUDI-2114: -- hudi-bot edited a comment on pull request #3208: URL: https://github.com/apache/hudi/pull/3208#issuecomment-872163385

## CI report:

* 7db85c8b1a665ea3cc84d2e085518d100686e8a4 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=632)

Bot commands @hudi-bot supports the following commands:
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> Spark Query MOR Table Written By Flink Return Incorrect Timestamp Value
> -----------------------------------------------------------------------
>
>                 Key: HUDI-2114
>                 URL: https://issues.apache.org/jira/browse/HUDI-2114
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: Spark Integration
>            Reporter: pengzhiwei
>            Assignee: pengzhiwei
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.9.0, 0.10.0
>
> Write a MOR table with Flink like this:
> {code:java}
> create table h0 (
>   uuid varchar(20),
>   name varchar(10),
>   ts timestamp(3)
> ) with (
>   'connector' = 'hudi',
>   'path' = '/xx/xx/',
>   'table.type' = 'MERGE_ON_READ'
> );
> insert into h0 values('id1', 'jim', TIMESTAMP '2021-01-01 00:00:01'){code}
> Querying the table with Spark returns an incorrect *ts* value:
> {code:java}
> 'id', 'jim', 1970-01-20 03:22:34.849144{code}

-- This message was sent by Atlassian Jira (v8.3.4#803005)
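A 2021 timestamp collapsing into mid-January 1970 is the classic signature of an epoch-unit mismatch: a millisecond-precision epoch read back as if it were microseconds. The sketch below illustrates the symptom class only; it is plain Python and makes no claim about Hudi's actual Spark/Flink read path.

```python
from datetime import datetime, timezone

# TIMESTAMP '2021-01-01 00:00:01' (taken as UTC) as epoch milliseconds:
ts_millis = int(datetime(2021, 1, 1, 0, 0, 1, tzinfo=timezone.utc).timestamp() * 1000)

# Decoded with the correct unit (milliseconds): back to 2021.
ok = datetime.fromtimestamp(ts_millis / 1_000, tz=timezone.utc)

# Decoded with the wrong unit (milliseconds treated as microseconds):
# ~1.6e12 "microseconds" is only ~18.6 days past the epoch, so the value
# collapses into mid-January 1970, the same shape as the bad value above.
bad = datetime.fromtimestamp(ts_millis / 1_000_000, tz=timezone.utc)
```

The exact garbage value depends on the writer's timestamp and zone, but any millis-as-micros confusion lands in the first weeks of 1970.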
[GitHub] [hudi] hudi-bot edited a comment on pull request #3208: [HUDI-2114] Spark Query MOR Table Written By Flink Return Incorrect T…
hudi-bot edited a comment on pull request #3208: URL: https://github.com/apache/hudi/pull/3208#issuecomment-872163385 ## CI report: * 7db85c8b1a665ea3cc84d2e085518d100686e8a4 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=632) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (HUDI-2045) Support Read Hoodie As DataSource Table For Flink And DeltaStreamer
[ https://issues.apache.org/jira/browse/HUDI-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373169#comment-17373169 ] ASF GitHub Bot commented on HUDI-2045: -- hudi-bot edited a comment on pull request #3120: URL: https://github.com/apache/hudi/pull/3120#issuecomment-864760893 ## CI report: * aaca30fffd1ea37f803f51ef3cf49c59ed79badc UNKNOWN * fcd06c8bccfc90b272b51d3511094e6617ec25bd UNKNOWN * 96947d0419df5f8bab10072eb64afecd29326e55 UNKNOWN * 02acd1127b72470f6d7adffb787179f0cddfa954 UNKNOWN * 504a6770be5d4cd3a78d61129be5b1aaadd515df UNKNOWN * 75aadbc834d6606527764468dd3dbcb1e802b171 UNKNOWN * f14ffb1f08820146e5d26616aa9b956ff99ec604 UNKNOWN * 06dff3c437b7b3f1aa227b700cf8c34669b067ed UNKNOWN * 97ba05a69199cff86cebbe25732097e3a68284f1 UNKNOWN * 3948fff7aacd6c97dcbe053a59a1208dae875607 UNKNOWN * 8ff6a0af2f53984c5864b04156a5b942400811c3 UNKNOWN * adba2dccf6e41da3bde98e5ed622cfd4b39554e9 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=515) * dbcd6ae0092c067c4bc364355ca7fa8129f46a39 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Support Read Hoodie As DataSource Table For Flink And DeltaStreamer > --- > > Key: HUDI-2045 > URL: https://issues.apache.org/jira/browse/HUDI-2045 > Project: Apache Hudi > Issue Type: Improvement > Components: Hive Integration >Reporter: pengzhiwei >Assignee: pengzhiwei >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > > Currently we only support reading hoodie table as datasource table for spark > since [https://github.com/apache/hudi/pull/2283] > In order to support this feature for flink and DeltaStreamer, we need to sync > the spark table properties needed by datasource table to the meta store in > HiveSyncTool. -- This message was sent by Atlassian Jira (v8.3.4#803005)
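The sync step described above amounts to writing the Spark datasource marker properties into the Hive table's TBLPROPERTIES, so that Spark resolves the synced table as a datasource table rather than a plain Hive table. A rough sketch of that merge, not HiveSyncTool's actual code; the function name and the single-part schema handling are assumptions, while the `spark.sql.sources.*` keys are the ones Spark's catalog uses.

```python
import json

def with_datasource_props(tbl_props: dict, schema_json: str, provider: str = "hudi") -> dict:
    """Return a copy of the Hive table properties with the Spark
    datasource markers added."""
    merged = dict(tbl_props)  # don't mutate the caller's properties
    merged["spark.sql.sources.provider"] = provider
    # Spark stores the table schema JSON split into numbered parts:
    merged["spark.sql.sources.schema.numParts"] = "1"
    merged["spark.sql.sources.schema.part.0"] = schema_json
    return merged

schema = json.dumps({"type": "struct", "fields": []})
props = with_datasource_props({"EXTERNAL": "TRUE"}, schema)
```

A real implementation would also split a large schema across multiple `schema.part.N` entries to stay under the metastore's per-property size limit.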
[GitHub] [hudi] hudi-bot edited a comment on pull request #3120: [HUDI-2045] Support Read Hoodie As DataSource Table For Flink And Del…
hudi-bot edited a comment on pull request #3120: URL: https://github.com/apache/hudi/pull/3120#issuecomment-864760893 ## CI report: * aaca30fffd1ea37f803f51ef3cf49c59ed79badc UNKNOWN * fcd06c8bccfc90b272b51d3511094e6617ec25bd UNKNOWN * 96947d0419df5f8bab10072eb64afecd29326e55 UNKNOWN * 02acd1127b72470f6d7adffb787179f0cddfa954 UNKNOWN * 504a6770be5d4cd3a78d61129be5b1aaadd515df UNKNOWN * 75aadbc834d6606527764468dd3dbcb1e802b171 UNKNOWN * f14ffb1f08820146e5d26616aa9b956ff99ec604 UNKNOWN * 06dff3c437b7b3f1aa227b700cf8c34669b067ed UNKNOWN * 97ba05a69199cff86cebbe25732097e3a68284f1 UNKNOWN * 3948fff7aacd6c97dcbe053a59a1208dae875607 UNKNOWN * 8ff6a0af2f53984c5864b04156a5b942400811c3 UNKNOWN * adba2dccf6e41da3bde98e5ed622cfd4b39554e9 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=515) * dbcd6ae0092c067c4bc364355ca7fa8129f46a39 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (HUDI-2086) redo the logical of mor_incremental_view for hive
[ https://issues.apache.org/jira/browse/HUDI-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373166#comment-17373166 ] ASF GitHub Bot commented on HUDI-2086: -- hudi-bot edited a comment on pull request #3203: URL: https://github.com/apache/hudi/pull/3203#issuecomment-872092745

## CI report:

* 792ae1d9884eac5eb7004e4edb617ea6e86d79b7 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=631)

Bot commands @hudi-bot supports the following commands:
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> redo the logic of mor_incremental_view for hive
> -----------------------------------------------
>
>                 Key: HUDI-2086
>                 URL: https://issues.apache.org/jira/browse/HUDI-2086
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: Hive Integration
>         Environment: spark3.1.1
> hive3.1.1
> hadoop3.1.1
> os: suse
>            Reporter: tao meng
>            Assignee: tao meng
>            Priority: Major
>              Labels: pull-request-available
>
> There are currently some problems with mor_incremental_view for Hive. For example:
> 1) *hudi cannot read the latest incremental data stored in log files*
> Consider: create a MOR table with bulk_insert, then upsert into the table. Now we want to query the latest incremental data via hive/sparksql; however, the latest incremental data is stored in log files, so the query returns nothing.
> step1: prepare data
> val df = spark.sparkContext.parallelize(0 to 20, 2).map(x => testCase(x, x+"jack", Random.nextInt(2))).toDF()
>   .withColumn("col3", expr("keyid + 3000"))
>   .withColumn("p", lit(1))
> step2: do bulk_insert
> mergePartitionTable(df, 4, "default", "inc", tableType = DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL, op = "bulk_insert")
> step3: do upsert
> mergePartitionTable(df, 4, "default", "inc", tableType = DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL, op = "upsert")
> step4: check the latest commit time and do the query
> spark.sql("set hoodie.inc.consume.mode=INCREMENTAL")
> spark.sql("set hoodie.inc.consume.max.commits=1")
> spark.sql("set hoodie.inc.consume.start.timestamp=20210628103935")
> spark.sql("select keyid, col3 from inc_rt where `_hoodie_commit_time` > '20210628103935' order by keyid").show(100, false)
> +-----+----+
> |keyid|col3|
> +-----+----+
> +-----+----+
>
> 2) *if we do insert_overwrite/insert_overwrite_table on a hudi MOR table, the incremental query result is wrong when we query the data from before the insert_overwrite/insert_overwrite_table*
> step1: do bulk_insert
> mergePartitionTable(df, 4, "default", "overInc", tableType = DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL, op = "bulk_insert")
> now the commits are [20210628160614.deltacommit]
> step2: do insert_overwrite_table
> mergePartitionTable(df, 4, "default", "overInc", tableType = DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL, op = "insert_overwrite_table")
> now the commits are [20210628160614.deltacommit, 20210628160923.replacecommit]
> step3: query the data from before the insert_overwrite_table
> spark.sql("set hoodie.overInc.consume.mode=INCREMENTAL")
> spark.sql("set hoodie.overInc.consume.max.commits=1")
> spark.sql("set hoodie.overInc.consume.start.timestamp=0")
> spark.sql("select keyid, col3 from overInc_rt where `_hoodie_commit_time` > '0' order by keyid").show(100, false)
> +-----+----+
> |keyid|col3|
> +-----+----+
> +-----+----+
>
> 3) *hive/presto/flink cannot read file groups which have only log files*
> When we use hbase/inmemory as the index, a MOR table will produce log files instead of parquet files, but hive/presto cannot currently read those file groups since they contain only log files. *HUDI-2048* mentions this problem.
>
> However, when we use the Spark datasource to execute an incremental query, none of the problems above occur. Keeping the logic of mor_incremental_view for Hive the same as the Spark datasource logic is necessary. We redo the logic of mor_incremental_view for Hive to solve the problems above and keep its logic the same as the Spark datasource.
>
> -- This message was sent by Atlassian Jira (v8.3.4#803005)
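The first and third symptoms in the JIRA description share one root cause: the incremental listing keeps only base files instead of complete file slices, so a file group whose newest data lives only in log files contributes nothing. A toy sketch of the two listing strategies; the file names and dict layout are hypothetical, and Hudi's real implementation is Java.

```python
# A MOR file slice: optional base parquet plus delta log files, tagged with
# the commit time that produced it. After a log-only upsert, base is None.
slices = [
    {"base": "f1_20210628103000.parquet", "logs": [], "commit": "20210628103000"},
    {"base": None, "logs": [".f1_20210628103936.log.1"], "commit": "20210628103936"},
]

def incremental_listing(slices, start_ts):
    """Old behaviour: keep only slices past start_ts that have a base file,
    so log-only updates vanish. New behaviour: keep every slice past
    start_ts, matching the Spark datasource incremental relation."""
    base_only = [s for s in slices if s["commit"] > start_ts and s["base"]]
    all_slices = [s for s in slices if s["commit"] > start_ts]
    return base_only, all_slices

old, new = incremental_listing(slices, "20210628103935")
# old is empty (the empty result set in the repro above); new still
# surfaces the log-only slice.
```

The string comparison on commit times works because Hudi instant times are fixed-width numeric strings, so lexicographic and numeric order agree.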
[GitHub] [hudi] hudi-bot edited a comment on pull request #3203: [HUDI-2086] Redo the logical of mor_incremental_view for hive
hudi-bot edited a comment on pull request #3203: URL: https://github.com/apache/hudi/pull/3203#issuecomment-872092745 ## CI report: * 792ae1d9884eac5eb7004e4edb617ea6e86d79b7 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=631) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] freeshow commented on issue #3188: java.lang.NoSuchMethodError: org.apache.hudi.org.apache.jetty.server.session.SessionHandler.setHttpOnly(Z)V[SUPPORT]
freeshow commented on issue #3188: URL: https://github.com/apache/hudi/issues/3188#issuecomment-872674173 > @freeshow Try this : > > ``` > hoodie.embed.timeline.server=false > ``` Thanks, it works -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (HUDI-2086) redo the logical of mor_incremental_view for hive
[ https://issues.apache.org/jira/browse/HUDI-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373152#comment-17373152 ] ASF GitHub Bot commented on HUDI-2086: -- hudi-bot edited a comment on pull request #3203: URL: https://github.com/apache/hudi/pull/3203#issuecomment-872092745 ## CI report: * 5d49bd4e8c638ffb9ced102dc6771d6291c199e3 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=603) * 792ae1d9884eac5eb7004e4edb617ea6e86d79b7 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=631) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > redo the logical of mor_incremental_view for hive > - > > Key: HUDI-2086 > URL: https://issues.apache.org/jira/browse/HUDI-2086 > Project: Apache Hudi > Issue Type: Bug > Components: Hive Integration > Environment: spark3.1.1 > hive3.1.1 > hadoop3.1.1 > os: suse >Reporter: tao meng >Assignee: tao meng >Priority: Major > Labels: pull-request-available > > now ,There are some problems with mor_incremental_view for hive。 > For example, > 1):*hudi cannot read the lastest incremental datas which are stored by logs* > think that: create a mor table with bulk_insert, and then do upsert for this > table, > no we want to query the latest incremental data by hive/sparksql, however > the lastest incremental datas are stored by logs, when we do query nothings > will return > step1: prepare data > val df = spark.sparkContext.parallelize(0 to 20, 2).map(x => 
testCase(x, > x+"jack", Random.nextInt(2))).toDF() > .withColumn("col3", expr("keyid + 3000")) > .withColumn("p", lit(1)) > step2: do bulk_insert > mergePartitionTable(df, 4, "default", "inc", tableType = > DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL, op = "bulk_insert") > step3: do upsert > mergePartitionTable(df, 4, "default", "inc", tableType = > DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL, op = "upsert") > step4: check the lastest commit time and do query > spark.sql("set hoodie.inc.consume.mode=INCREMENTAL") > spark.sql("set hoodie.inc.consume.max.commits=1") > spark.sql("set hoodie.inc.consume.start.timestamp=20210628103935") > spark.sql("select keyid, col3 from inc_rt where `_hoodie_commit_time` > > '20210628103935' order by keyid").show(100, false) > +-++ > |keyid|col3| > +-++ > +-++ > > 2):*if we do insert_over_write/insert_over_write_table for hudi mor table, > the incr query result is wrong when we want to query the data before > insert_overwrite/insert_overwrite_table* > step1: do bulk_insert > mergePartitionTable(df, 4, "default", "overInc", tableType = > DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL, op = "bulk_insert") > now the commits is > [20210628160614.deltacommit ] > step2: do insert_overwrite_table > mergePartitionTable(df, 4, "default", "overInc", tableType = > DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL, op = "insert_overwrite_table") > now the commits is > [20210628160614.deltacommit, 20210628160923.replacecommit ] > step3: query the data before insert_overwrite_table > spark.sql("set hoodie.overInc.consume.mode=INCREMENTAL") > spark.sql("set hoodie.overInc.consume.max.commits=1") > spark.sql("set hoodie.overInc.consume.start.timestamp=0") > spark.sql("select keyid, col3 from overInc_rt where `_hoodie_commit_time` > > '0' order by keyid").show(100, false) > +-++ > |keyid|col3| > +-++ > +-++ > > 3) *hive/presto/flink cannot read file groups which has only logs* > when we use hbase/inmemory as index, mor table will produce log files 
instead > of parquet files, but Hive/Presto cannot currently read those file groups since they > contain only log files. > *HUDI-2048* mentions this problem. > > However, when we use the Spark data source to execute an incremental query, none of > the problems above occur. Keeping the logic of mor_incremental_view for Hive the > same as the Spark data source is necessary. > We redo the logic of mor_incremental_view for Hive to solve the above problems > and keep its logic the same as the Spark > data source. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HUDI-2114) Spark Query MOR Table Written By Flink Return Incorrect Timestamp Value
[ https://issues.apache.org/jira/browse/HUDI-2114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373153#comment-17373153 ] ASF GitHub Bot commented on HUDI-2114: -- hudi-bot edited a comment on pull request #3208: URL: https://github.com/apache/hudi/pull/3208#issuecomment-872163385 ## CI report: * 14b39be069c0155fb3292f17305ed51428c1399a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=622) * 7db85c8b1a665ea3cc84d2e085518d100686e8a4 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=632) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Spark Query MOR Table Written By Flink Return Incorrect Timestamp Value > --- > > Key: HUDI-2114 > URL: https://issues.apache.org/jira/browse/HUDI-2114 > Project: Apache Hudi > Issue Type: Bug > Components: Spark Integration >Reporter: pengzhiwei >Assignee: pengzhiwei >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0, 0.10.0 > > > Write a MOR table with Flink like this: > {code:java} > create table h0 ( > uuid varchar(20), > name varchar(10), > ts timestamp(3) > ) with ( >'connector' = 'hudi', >'path' = '/xx/xx/', > 'table.type' = 'MERGE_ON_READ' > ); > insert into h0 values('id1', 'jim', TIMESTAMP '2021-01-01 00:00:01'){code} > Querying the table with Spark returns an incorrect *ts* value: > {code:java} > 'id', 'jim', 1970-01-20 03:22:34.849144{code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
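A *ts* value that lands in late January 1970 is the classic signature of an epoch-unit mismatch, where a millisecond timestamp is decoded as if it were microseconds. The thread does not state the actual root cause, so the following Python sketch only illustrates the symptom, using the `2021-01-01 00:00:01` value from the report:

```python
from datetime import datetime, timezone

# 2021-01-01 00:00:01 UTC expressed in epoch milliseconds.
ts_millis = 1_609_459_201_000

# Correct decoding: interpret the value as milliseconds.
correct = datetime.fromtimestamp(ts_millis / 1_000, tz=timezone.utc)

# Mismatched decoding: interpret the same value as microseconds,
# which collapses a 2021 timestamp into mid-January 1970.
wrong = datetime.fromtimestamp(ts_millis / 1_000_000, tz=timezone.utc)

print(correct.isoformat())  # 2021-01-01T00:00:01+00:00
print(wrong.date())         # a date in January 1970
```

Whether the Flink writer, the Spark reader, or the Parquet/Avro schema is the one applying the wrong unit is exactly what the fix in PR #3208 would need to pin down.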
[jira] [Commented] (HUDI-2114) Spark Query MOR Table Written By Flink Return Incorrect Timestamp Value
[ https://issues.apache.org/jira/browse/HUDI-2114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373151#comment-17373151 ] ASF GitHub Bot commented on HUDI-2114: -- hudi-bot edited a comment on pull request #3208: URL: https://github.com/apache/hudi/pull/3208#issuecomment-872163385 ## CI report: * 14b39be069c0155fb3292f17305ed51428c1399a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=622) * 7db85c8b1a665ea3cc84d2e085518d100686e8a4 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (HUDI-2086) redo the logical of mor_incremental_view for hive
[ https://issues.apache.org/jira/browse/HUDI-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373150#comment-17373150 ] ASF GitHub Bot commented on HUDI-2086: -- hudi-bot edited a comment on pull request #3203: URL: https://github.com/apache/hudi/pull/3203#issuecomment-872092745 ## CI report: * 5d49bd4e8c638ffb9ced102dc6771d6291c199e3 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=603) * 792ae1d9884eac5eb7004e4edb617ea6e86d79b7 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (HUDI-1904) Make SchemaProvider spark free and move it to hudi-client-common
[ https://issues.apache.org/jira/browse/HUDI-1904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373149#comment-17373149 ] ASF GitHub Bot commented on HUDI-1904: -- hudi-bot edited a comment on pull request #2963: URL: https://github.com/apache/hudi/pull/2963#issuecomment-866559341 ## CI report: * 8c67c9bee4b43a6306199d1e89d1c785cd083d4c Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=630) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Make SchemaProvider spark free and move it to hudi-client-common > > > Key: HUDI-1904 > URL: https://issues.apache.org/jira/browse/HUDI-1904 > Project: Apache Hudi > Issue Type: Task >Reporter: Xianghu Wang >Assignee: Xianghu Wang >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > > Currently, we support the Spark, Flink, and Java clients to operate Hudi > tables. The "common" stuff like `SchemaProvider` should be extracted and > moved out of the `hudi-utilities` module to share with other engines -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HUDI-1468) incremental read support with clustering
[ https://issues.apache.org/jira/browse/HUDI-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373143#comment-17373143 ] ASF GitHub Bot commented on HUDI-1468: -- hudi-bot edited a comment on pull request #3211: URL: https://github.com/apache/hudi/pull/3211#issuecomment-872639166 ## CI report: * ab7bacb26d44f383e7f61ec81531b34011f1383b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=629) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > incremental read support with clustering > > > Key: HUDI-1468 > URL: https://issues.apache.org/jira/browse/HUDI-1468 > Project: Apache Hudi > Issue Type: Sub-task > Components: Incremental Pull >Affects Versions: 0.9.0 >Reporter: satish >Assignee: liwei >Priority: Blocker > Labels: pull-request-available > Fix For: 0.9.0 > > > As part of clustering, metadata such as hoodie_commit_time changes for > records that are clustered. This is specific to the > SparkBulkInsertBasedRunClusteringStrategy implementation. Figure out a way to > carry commit_time from the original record to support incremental queries. > Also, incremental queries don't work with 'replacecommit' used by clustering > HUDI-1264. Change incremental queries to work for replacecommits created by > Clustering. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] wangxianghu commented on issue #3188: java.lang.NoSuchMethodError: org.apache.hudi.org.apache.jetty.server.session.SessionHandler.setHttpOnly(Z)V[SUPPORT]
wangxianghu commented on issue #3188: URL: https://github.com/apache/hudi/issues/3188#issuecomment-872647851 @freeshow Try this : ``` hoodie.embed.timeline.server=false ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (HUDI-1904) Make SchemaProvider spark free and move it to hudi-client-common
[ https://issues.apache.org/jira/browse/HUDI-1904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373130#comment-17373130 ] ASF GitHub Bot commented on HUDI-1904: -- hudi-bot edited a comment on pull request #2963: URL: https://github.com/apache/hudi/pull/2963#issuecomment-866559341 ## CI report: * ce253fe711d93b46c4afa933550c82bf6500cac0 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=430) * 8c67c9bee4b43a6306199d1e89d1c785cd083d4c Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=630) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (HUDI-1904) Make SchemaProvider spark free and move it to hudi-client-common
[ https://issues.apache.org/jira/browse/HUDI-1904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373129#comment-17373129 ] ASF GitHub Bot commented on HUDI-1904: -- hudi-bot edited a comment on pull request #2963: URL: https://github.com/apache/hudi/pull/2963#issuecomment-866559341 ## CI report: * ce253fe711d93b46c4afa933550c82bf6500cac0 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=430) * 8c67c9bee4b43a6306199d1e89d1c785cd083d4c UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (HUDI-2116) sync 10w partitions to hive by using HiveSyncTool lead to the oom of hive MetaStore
[ https://issues.apache.org/jira/browse/HUDI-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373127#comment-17373127 ] ASF GitHub Bot commented on HUDI-2116: -- yanghua commented on pull request #3209: URL: https://github.com/apache/hudi/pull/3209#issuecomment-872645071 @xiarixiaoyao The title of this PR describes the problem, while a better title would describe what you want to do in the PR. WDYT? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > sync 10w partitions to hive by using HiveSyncTool lead to the oom of hive > MetaStore > > > Key: HUDI-2116 > URL: https://issues.apache.org/jira/browse/HUDI-2116 > Project: Apache Hudi > Issue Type: Bug > Components: Hive Integration >Affects Versions: 0.8.0 > Environment: hive3.1.1 > hadoop 3.1.1 >Reporter: tao meng >Assignee: tao meng >Priority: Major > Labels: pull-request-available > Fix For: 0.10.0 > > > When we try to sync 10w (100,000) partitions to Hive using HiveSyncTool, it leads to an > OOM in the Hive metastore. > > Here is a stress test for HiveSyncTool. > env: > hive metastore -Xms16G -Xmx16G > hive.metastore.client.socket.timeout=10800 > > ||partitionNum||time consumed|| > |100|37s| > |1000|168s| > |5000|1830s| > |1w|timeout| > |10w|hive metastore oom| > HiveSyncTool syncs all partitions to the Hive metastore at once. When the > partition count is large, it puts a lot of pressure on the Hive metastore; for > large partition counts we should support batched sync. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
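The fix direction proposed in HUDI-2116 — pushing partitions to the metastore in fixed-size batches instead of one huge call — can be sketched as follows. This is an illustrative sketch only: `sync_batch` is a hypothetical stand-in for the real metastore call, and the batch size of 1000 is an assumed knob; neither is taken from the actual HiveSyncTool code.

```python
def sync_partitions_in_batches(partitions, sync_batch, batch_size=1000):
    """Push partitions to the metastore in fixed-size batches so that no
    single request carries the full partition payload at once."""
    for i in range(0, len(partitions), batch_size):
        sync_batch(partitions[i:i + batch_size])

# Record each simulated metastore call to show how 2500 partitions
# are split into batches of at most 1000.
calls = []
sync_partitions_in_batches([f"p={d}" for d in range(2500)], calls.append)
print([len(c) for c in calls])  # [1000, 1000, 500]
```

Batching bounds the size of each Thrift request, which is why it relieves memory pressure on the metastore even though the total work is unchanged.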
[GitHub] [hudi] freeshow edited a comment on issue #3188: java.lang.NoSuchMethodError: org.apache.hudi.org.apache.jetty.server.session.SessionHandler.setHttpOnly(Z)V[SUPPORT]
freeshow edited a comment on issue #3188: URL: https://github.com/apache/hudi/issues/3188#issuecomment-872643783 I found Hadoop 3.0.0 provides Jetty 9.3, while Hudi depends on Jetty 9.4 (specifically, SessionHandler.setHttpOnly() doesn't exist in 9.3). I compiled with Hadoop 3.0.0. When I use Hadoop 2.7, the error does not appear! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (HUDI-1872) Move HoodieFlinkStreamer into hudi-utilities module
[ https://issues.apache.org/jira/browse/HUDI-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373126#comment-17373126 ] ASF GitHub Bot commented on HUDI-1872: -- wangxianghu commented on pull request #3162: URL: https://github.com/apache/hudi/pull/3162#issuecomment-872641634 > > Yes, I agree with you on creating a new module, but let's not put it under the original hudi-utilities-bundle. > > I prefer this: > > hudi-utilities-bundle > > ├── hudi-flink-utilities-bundle > > └── hudi-spark-utilities-bundle > > I am not suggesting to create modules under `hudi-utilities-bundle`. Also, in the bundle we are not adding any classes. Since we are moving some Flink classes to the `hudi-utilities` module, we will have to create modules under it, something like > > hudi-utilities > ├── hudi-flink-utilities > └── hudi-spark-utilities > > and then we can add these two as part of `hudi-utilities-bundle` here https://github.com/apache/hudi/blob/master/packaging/hudi-utilities-bundle/pom.xml#L70, like the way we have added the `hudi-hive-sync` module from the `hudi-sync` project. The utilities-bundle jars should be engine-specific: one jar per engine (with different dependencies). If you want to put these bundles under `hudi-utilities-bundle`, it could be: ``` hudi-utilities-bundle ├── hudi-flink-utilities-bundle └── hudi-spark-utilities-bundle ``` But these approaches do not preserve backward compatibility; users would have to use the new jars (hudi-xxx-utilities) to start their jobs. Another option: add a new bundle for `HoodieFlinkStreamer`, that is `hudi-flink-utilities`, and leave the others untouched. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Move HoodieFlinkStreamer into hudi-utilities module > --- > > Key: HUDI-1872 > URL: https://issues.apache.org/jira/browse/HUDI-1872 > Project: Apache Hudi > Issue Type: Task > Components: Flink Integration >Reporter: Danny Chen >Assignee: Vinay >Priority: Major > Labels: pull-request-available, sev:normal > Fix For: 0.9.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HUDI-2113) Fix integration testing failure caused by sql results out of order
[ https://issues.apache.org/jira/browse/HUDI-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373125#comment-17373125 ] ASF GitHub Bot commented on HUDI-2113: -- yanghua commented on pull request #3204: URL: https://github.com/apache/hudi/pull/3204#issuecomment-872640137 @n3nash Would you mind giving this a double check?

> Fix integration testing failure caused by sql results out of order
>
> Key: HUDI-2113
> URL: https://issues.apache.org/jira/browse/HUDI-2113
> Project: Apache Hudi
> Issue Type: Test
> Components: Testing
> Reporter: XiaoyuGeng
> Assignee: XiaoyuGeng
> Priority: Minor
> Labels: pull-request-available
[jira] [Commented] (HUDI-1468) incremental read support with clustering
[ https://issues.apache.org/jira/browse/HUDI-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373124#comment-17373124 ] ASF GitHub Bot commented on HUDI-1468: -- hudi-bot edited a comment on pull request #3211: URL: https://github.com/apache/hudi/pull/3211#issuecomment-872639166

## CI report:

* ab7bacb26d44f383e7f61ec81531b34011f1383b Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=629)

Bot commands: @hudi-bot supports the following commands:
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build

> incremental read support with clustering
>
> Key: HUDI-1468
> URL: https://issues.apache.org/jira/browse/HUDI-1468
> Project: Apache Hudi
> Issue Type: Sub-task
> Components: Incremental Pull
> Affects Versions: 0.9.0
> Reporter: satish
> Assignee: liwei
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 0.9.0
>
> As part of clustering, metadata such as hoodie_commit_time changes for records that are clustered. This is specific to the SparkBulkInsertBasedRunClusteringStrategy implementation. Figure out a way to carry commit_time from the original record to support incremental queries. Also, incremental queries don't work with the 'replacecommit' action used by clustering (HUDI-1264). Change incremental queries to work for replacecommits created by clustering.
[jira] [Commented] (HUDI-1468) incremental read support with clustering
[ https://issues.apache.org/jira/browse/HUDI-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373122#comment-17373122 ] ASF GitHub Bot commented on HUDI-1468: -- hudi-bot commented on pull request #3211: URL: https://github.com/apache/hudi/pull/3211#issuecomment-872639166

## CI report:

* ab7bacb26d44f383e7f61ec81531b34011f1383b UNKNOWN

Bot commands: @hudi-bot supports the following commands:
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build

> incremental read support with clustering
>
> Key: HUDI-1468
> URL: https://issues.apache.org/jira/browse/HUDI-1468
[jira] [Commented] (HUDI-1468) incremental read support with clustering
[ https://issues.apache.org/jira/browse/HUDI-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373123#comment-17373123 ] ASF GitHub Bot commented on HUDI-1468: -- satishkotha commented on pull request #3211: URL: https://github.com/apache/hudi/pull/3211#issuecomment-872639220 @n3nash @vinothchandar this includes all my changes for supporting encryption-style use cases using the clustering framework. I still need to port some tests, but please take a look and add any comments.

> incremental read support with clustering
>
> Key: HUDI-1468
> URL: https://issues.apache.org/jira/browse/HUDI-1468
[jira] [Commented] (HUDI-1468) incremental read support with clustering
[ https://issues.apache.org/jira/browse/HUDI-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373121#comment-17373121 ] ASF GitHub Bot commented on HUDI-1468: -- satishkotha opened a new pull request #3211: URL: https://github.com/apache/hudi/pull/3211

## What is the purpose of the pull request

Support custom clustering strategies and preserve commit time to support incremental read.

## Brief change log

* Introduce a new way of running clustering using SingleSparkJobExecutionStrategy for use cases that don't need sorting.
* Push more logic down into clustering strategies to avoid an RDD union.
* Make some performance improvements found after running at large scale; avoid collecting the RDD multiple times.
* Preserve the Hoodie commit time (optional, for backward compatibility) while rewriting the data.

## Verify this pull request

This change added tests and can be verified as follows:

## Committer checklist

- [ ] Has a corresponding JIRA in PR title & commit
- [ ] Commit message is descriptive of the change
- [ ] CI is green
- [ ] Necessary doc changes done or have another open PR
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

> incremental read support with clustering
>
> Key: HUDI-1468
> URL: https://issues.apache.org/jira/browse/HUDI-1468
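The HUDI-1468 discussion above is about carrying the original commit time forward when clustering rewrites records, so that rewritten records do not look "new" to incremental queries. A toy model of the idea in plain Python (dicts stand in for Hudi records; `_hoodie_commit_time` is Hudi's real metadata column, but `cluster_records` and the `preserve_commit_time` flag are illustrative names, not Hudi's API):

```python
# Toy model of a clustering rewrite. By default the rewrite stamps every
# record with the new clustering commit time, which makes incremental
# readers re-see already-consumed records; with preserve_commit_time=True
# the original _hoodie_commit_time is carried forward instead.
def cluster_records(records, clustering_commit_time, preserve_commit_time=False):
    rewritten = []
    for rec in records:
        new_rec = dict(rec)  # the rewritten (e.g. re-sorted) copy
        if not preserve_commit_time:
            new_rec["_hoodie_commit_time"] = clustering_commit_time
        rewritten.append(new_rec)
    return rewritten

before = [{"_hoodie_commit_time": "20210701093000", "key": "a", "value": 1}]

# Default behaviour: the record is re-stamped and appears new.
stamped = cluster_records(before, "20210702120000")

# Preserving behaviour: an incremental query "after 20210701093000"
# still skips the record, since its original commit time survives.
preserved = cluster_records(before, "20210702120000", preserve_commit_time=True)
```

This only models the metadata handling; the actual rewrite in Hudi happens inside Spark-based clustering execution strategies.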
[jira] [Updated] (HUDI-1468) incremental read support with clustering
[ https://issues.apache.org/jira/browse/HUDI-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-1468: - Labels: pull-request-available (was: )

> incremental read support with clustering
>
> Key: HUDI-1468
> URL: https://issues.apache.org/jira/browse/HUDI-1468