[jira] [Commented] (HUDI-1904) Make SchemaProvider spark free and move it to hudi-client-common

2021-07-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373244#comment-17373244
 ] 

ASF GitHub Bot commented on HUDI-1904:
--

codecov-commenter edited a comment on pull request #2963:
URL: https://github.com/apache/hudi/pull/2963#issuecomment-843155329


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/2963?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#2963](https://codecov.io/gh/apache/hudi/pull/2963?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (8c67c9b) into 
[master](https://codecov.io/gh/apache/hudi/commit/6eca06d074520140d7bc67b48bd2b9a5b76f0a87?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (6eca06d) will **decrease** coverage by `32.02%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2963/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2963?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #2963       +/-   ##
   =============================================
   - Coverage     47.51%    15.48%    -32.03%
   + Complexity     5429       478      -4951
   =============================================
     Files           922       281       -641
     Lines         40968     11548     -29420
     Branches       4105       945      -3160
   =============================================
   - Hits          19464      1788     -17676
   + Misses        19780      9602     -10178
   + Partials       1724       158      -1566
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `0.00% <ø> (-34.59%)` | :arrow_down: |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `5.38% <ø> (-48.67%)` | :arrow_down: |
   | huditimelineservice | `?` | |
   | hudiutilities | `58.04% <ø> (+0.01%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2963?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...rg/apache/hudi/schema/SchemaProviderInterface.java](https://codecov.io/gh/apache/hudi/pull/2963/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL3NjaGVtYS9TY2hlbWFQcm92aWRlckludGVyZmFjZS5qYXZh)
 | `0.00% <ø> (ø)` | |
   | 
[...g/apache/hudi/utilities/schema/SchemaProvider.java](https://codecov.io/gh/apache/hudi/pull/2963/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFQcm92aWRlci5qYXZh)
 | `66.66% <ø> (-4.77%)` | :arrow_down: |
   | 
[...main/java/org/apache/hudi/metrics/HoodieGauge.java](https://codecov.io/gh/apache/hudi/pull/2963/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL21ldHJpY3MvSG9vZGllR2F1Z2UuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[.../org/apache/hudi/hive/NonPartitionedExtractor.java](https://codecov.io/gh/apache/hudi/pull/2963/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvTm9uUGFydGl0aW9uZWRFeHRyYWN0b3IuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[.../java/org/apache/hudi/metrics/MetricsReporter.java](https://codecov.io/gh/apache/hudi/pull/2963/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL21ldHJpY3MvTWV0cmljc1JlcG9ydGVyLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 

[GitHub] [hudi] codecov-commenter edited a comment on pull request #2963: [HUDI-1904] Introduce SchemaProviderInterface to make SchemaProvider unified

2021-07-01 Thread GitBox


codecov-commenter edited a comment on pull request #2963:
URL: https://github.com/apache/hudi/pull/2963#issuecomment-843155329


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/2963?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#2963](https://codecov.io/gh/apache/hudi/pull/2963?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (8c67c9b) into 
[master](https://codecov.io/gh/apache/hudi/commit/6eca06d074520140d7bc67b48bd2b9a5b76f0a87?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (6eca06d) will **decrease** coverage by `32.02%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2963/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2963?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #2963       +/-   ##
   =============================================
   - Coverage     47.51%    15.48%    -32.03%
   + Complexity     5429       478      -4951
   =============================================
     Files           922       281       -641
     Lines         40968     11548     -29420
     Branches       4105       945      -3160
   =============================================
   - Hits          19464      1788     -17676
   + Misses        19780      9602     -10178
   + Partials       1724       158      -1566
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `0.00% <ø> (-34.59%)` | :arrow_down: |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `5.38% <ø> (-48.67%)` | :arrow_down: |
   | huditimelineservice | `?` | |
   | hudiutilities | `58.04% <ø> (+0.01%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2963?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...rg/apache/hudi/schema/SchemaProviderInterface.java](https://codecov.io/gh/apache/hudi/pull/2963/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL3NjaGVtYS9TY2hlbWFQcm92aWRlckludGVyZmFjZS5qYXZh)
 | `0.00% <ø> (ø)` | |
   | 
[...g/apache/hudi/utilities/schema/SchemaProvider.java](https://codecov.io/gh/apache/hudi/pull/2963/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFQcm92aWRlci5qYXZh)
 | `66.66% <ø> (-4.77%)` | :arrow_down: |
   | 
[...main/java/org/apache/hudi/metrics/HoodieGauge.java](https://codecov.io/gh/apache/hudi/pull/2963/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL21ldHJpY3MvSG9vZGllR2F1Z2UuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[.../org/apache/hudi/hive/NonPartitionedExtractor.java](https://codecov.io/gh/apache/hudi/pull/2963/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvTm9uUGFydGl0aW9uZWRFeHRyYWN0b3IuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[.../java/org/apache/hudi/metrics/MetricsReporter.java](https://codecov.io/gh/apache/hudi/pull/2963/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL21ldHJpY3MvTWV0cmljc1JlcG9ydGVyLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 

[jira] [Commented] (HUDI-1904) Make SchemaProvider spark free and move it to hudi-client-common

2021-07-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373238#comment-17373238
 ] 

ASF GitHub Bot commented on HUDI-1904:
--

codecov-commenter edited a comment on pull request #2963:
URL: https://github.com/apache/hudi/pull/2963#issuecomment-843155329


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/2963?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#2963](https://codecov.io/gh/apache/hudi/pull/2963?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (8c67c9b) into 
[master](https://codecov.io/gh/apache/hudi/commit/6eca06d074520140d7bc67b48bd2b9a5b76f0a87?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (6eca06d) will **decrease** coverage by `44.61%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2963/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2963?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #2963       +/-   ##
   ============================================
   - Coverage     47.51%     2.89%    -44.62%
   + Complexity     5429        82      -5347
   ============================================
     Files           922       281       -641
     Lines         40968     11548     -29420
     Branches       4105       945      -3160
   ============================================
   - Hits          19464       334     -19130
   + Misses        19780     11188      -8592
   + Partials       1724        26      -1698
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `0.00% <ø> (-34.59%)` | :arrow_down: |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `5.38% <ø> (-48.67%)` | :arrow_down: |
   | huditimelineservice | `?` | |
   | hudiutilities | `9.31% <ø> (-48.71%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2963?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...rg/apache/hudi/schema/SchemaProviderInterface.java](https://codecov.io/gh/apache/hudi/pull/2963/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL3NjaGVtYS9TY2hlbWFQcm92aWRlckludGVyZmFjZS5qYXZh)
 | `0.00% <ø> (ø)` | |
   | 
[...g/apache/hudi/utilities/schema/SchemaProvider.java](https://codecov.io/gh/apache/hudi/pull/2963/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFQcm92aWRlci5qYXZh)
 | `66.66% <ø> (-4.77%)` | :arrow_down: |
   | 
[...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/2963/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/2963/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/2963/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 

[GitHub] [hudi] codecov-commenter edited a comment on pull request #2963: [HUDI-1904] Introduce SchemaProviderInterface to make SchemaProvider unified

2021-07-01 Thread GitBox


codecov-commenter edited a comment on pull request #2963:
URL: https://github.com/apache/hudi/pull/2963#issuecomment-843155329


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/2963?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#2963](https://codecov.io/gh/apache/hudi/pull/2963?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (8c67c9b) into 
[master](https://codecov.io/gh/apache/hudi/commit/6eca06d074520140d7bc67b48bd2b9a5b76f0a87?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (6eca06d) will **decrease** coverage by `44.61%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2963/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2963?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #2963       +/-   ##
   ============================================
   - Coverage     47.51%     2.89%    -44.62%
   + Complexity     5429        82      -5347
   ============================================
     Files           922       281       -641
     Lines         40968     11548     -29420
     Branches       4105       945      -3160
   ============================================
   - Hits          19464       334     -19130
   + Misses        19780     11188      -8592
   + Partials       1724        26      -1698
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `0.00% <ø> (-34.59%)` | :arrow_down: |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `5.38% <ø> (-48.67%)` | :arrow_down: |
   | huditimelineservice | `?` | |
   | hudiutilities | `9.31% <ø> (-48.71%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2963?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...rg/apache/hudi/schema/SchemaProviderInterface.java](https://codecov.io/gh/apache/hudi/pull/2963/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL3NjaGVtYS9TY2hlbWFQcm92aWRlckludGVyZmFjZS5qYXZh)
 | `0.00% <ø> (ø)` | |
   | 
[...g/apache/hudi/utilities/schema/SchemaProvider.java](https://codecov.io/gh/apache/hudi/pull/2963/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFQcm92aWRlci5qYXZh)
 | `66.66% <ø> (-4.77%)` | :arrow_down: |
   | 
[...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/2963/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/2963/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/2963/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 

[jira] [Commented] (HUDI-1468) incremental read support with clustering

2021-07-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373234#comment-17373234
 ] 

ASF GitHub Bot commented on HUDI-1468:
--

hudi-bot edited a comment on pull request #3211:
URL: https://github.com/apache/hudi/pull/3211#issuecomment-872639166


   
   ## CI report:
   
   * c9c9a0d5343b65e690544dfcb85e71d915c455e1 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=637)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> incremental read support with clustering
> 
>
> Key: HUDI-1468
> URL: https://issues.apache.org/jira/browse/HUDI-1468
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Incremental Pull
>Affects Versions: 0.9.0
>Reporter: satish
>Assignee: liwei
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> As part of clustering, metadata such as hoodie_commit_time changes for 
> records that are clustered. This is specific to the 
> SparkBulkInsertBasedRunClusteringStrategy implementation. Figure out a way to 
> carry the commit_time from the original record to support incremental queries.
> Also, incremental queries don't work with the 'replacecommit' used by clustering 
> (HUDI-1264). Change incremental queries to work for replacecommits created by 
> clustering.
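
A minimal sketch of the idea above (assuming Avro GenericRecord payloads and a hypothetical rewrite helper; this is not the actual clustering code): carry the original record's `_hoodie_commit_time` onto the rewritten record instead of stamping it with the new replacecommit time, so incremental queries keep matching the original commit.

```java
import org.apache.avro.generic.GenericRecord;

public class PreserveCommitTimeSketch {
  // Hudi's commit-time metadata column; the helper below is hypothetical.
  private static final String HOODIE_COMMIT_TIME_FIELD = "_hoodie_commit_time";

  // During clustering, copy the commit time from the source record onto the
  // rewritten record so _hoodie_commit_time-based incremental pulls still see it.
  static GenericRecord carryCommitTime(GenericRecord original, GenericRecord rewritten) {
    rewritten.put(HOODIE_COMMIT_TIME_FIELD, original.get(HOODIE_COMMIT_TIME_FIELD));
    return rewritten;
  }
}
```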



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3211: [HUDI-1468] Support more flexible clustering strategies and preserve commit …

2021-07-01 Thread GitBox


hudi-bot edited a comment on pull request #3211:
URL: https://github.com/apache/hudi/pull/3211#issuecomment-872639166


   
   ## CI report:
   
   * c9c9a0d5343b65e690544dfcb85e71d915c455e1 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=637)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2116) sync 10w partitions to hive by using HiveSyncTool lead to the oom of hive MetaStore

2021-07-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373231#comment-17373231
 ] 

ASF GitHub Bot commented on HUDI-2116:
--

hudi-bot edited a comment on pull request #3209:
URL: https://github.com/apache/hudi/pull/3209#issuecomment-872173561


   
   ## CI report:
   
   * fbcd406b45e370446193c32e7d09db09d57a0996 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=636)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


>  sync 10w partitions to hive by using HiveSyncTool lead to the oom of hive 
> MetaStore
> 
>
> Key: HUDI-2116
> URL: https://issues.apache.org/jira/browse/HUDI-2116
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Hive Integration
>Affects Versions: 0.8.0
> Environment: hive3.1.1
> hadoop 3.1.1
>Reporter: tao meng
>Assignee: tao meng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> When we try to sync 10w (100,000) partitions to Hive using HiveSyncTool, the 
> Hive MetaStore runs out of memory.
>  
> Here is a stress test for HiveSyncTool.
> env: 
> hive metastore -Xms16G -Xmx16G
> hive.metastore.client.socket.timeout=10800
>  
> ||partitionNum||time consumed||
> |100|37s|
> |1000|168s|
> |5000|1830s|
> |10000|timeout|
> |100000|hive metastore oom|
> HiveSyncTool syncs all partitions to the Hive metastore at once. When the 
> partition count is large, this puts a lot of pressure on the metastore, so for 
> large partition counts we should support batch sync.
>  
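
A rough sketch of the batch-sync idea described above (not the actual HiveSyncTool change; the `addPartitionsToTable` callback and the batch size are placeholder assumptions): split the partition list into fixed-size chunks so the metastore never receives all 100,000 partitions in a single call.

```java
import java.util.List;
import java.util.function.BiConsumer;

public class BatchPartitionSyncSketch {
  // Placeholder batch size; a real implementation would read this from sync config.
  private static final int SYNC_BATCH_SIZE = 1000;

  // Register partitions in chunks instead of one huge metastore call.
  static void syncInBatches(String tableName,
                            List<String> partitionsToAdd,
                            BiConsumer<String, List<String>> addPartitionsToTable) {
    for (int start = 0; start < partitionsToAdd.size(); start += SYNC_BATCH_SIZE) {
      int end = Math.min(start + SYNC_BATCH_SIZE, partitionsToAdd.size());
      addPartitionsToTable.accept(tableName, partitionsToAdd.subList(start, end));
    }
  }
}
```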



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3209: [HUDI-2116] Support batch synchronization of partition datas to hive metastore to avoid oom problem

2021-07-01 Thread GitBox


hudi-bot edited a comment on pull request #3209:
URL: https://github.com/apache/hudi/pull/3209#issuecomment-872173561


   
   ## CI report:
   
   * fbcd406b45e370446193c32e7d09db09d57a0996 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=636)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2045) Support Read Hoodie As DataSource Table For Flink And DeltaStreamer

2021-07-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373227#comment-17373227
 ] 

ASF GitHub Bot commented on HUDI-2045:
--

hudi-bot edited a comment on pull request #3120:
URL: https://github.com/apache/hudi/pull/3120#issuecomment-864760893


   
   ## CI report:
   
   * aaca30fffd1ea37f803f51ef3cf49c59ed79badc UNKNOWN
   * fcd06c8bccfc90b272b51d3511094e6617ec25bd UNKNOWN
   * 96947d0419df5f8bab10072eb64afecd29326e55 UNKNOWN
   * 02acd1127b72470f6d7adffb787179f0cddfa954 UNKNOWN
   * 504a6770be5d4cd3a78d61129be5b1aaadd515df UNKNOWN
   * 75aadbc834d6606527764468dd3dbcb1e802b171 UNKNOWN
   * f14ffb1f08820146e5d26616aa9b956ff99ec604 UNKNOWN
   * 06dff3c437b7b3f1aa227b700cf8c34669b067ed UNKNOWN
   * 97ba05a69199cff86cebbe25732097e3a68284f1 UNKNOWN
   * 3948fff7aacd6c97dcbe053a59a1208dae875607 UNKNOWN
   * 8ff6a0af2f53984c5864b04156a5b942400811c3 UNKNOWN
   * dbcd6ae0092c067c4bc364355ca7fa8129f46a39 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=635)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Support Read Hoodie As DataSource Table For Flink And DeltaStreamer
> ---
>
> Key: HUDI-2045
> URL: https://issues.apache.org/jira/browse/HUDI-2045
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Hive Integration
>Reporter: pengzhiwei
>Assignee: pengzhiwei
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Currently we only support reading a hoodie table as a datasource table for Spark, 
> since [https://github.com/apache/hudi/pull/2283].
> In order to support this feature for Flink and the DeltaStreamer, we need to sync 
> the Spark table properties needed by a datasource table to the meta store in 
> HiveSyncTool.
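
For context, a hedged sketch of the table properties a Spark datasource table expects (the property keys follow Spark's datasource-table convention; how HiveSyncTool would merge them into the metastore table is not shown, and the helper below is illustrative only):

```java
import java.util.HashMap;
import java.util.Map;

public class SparkDataSourceTablePropsSketch {
  // Properties Spark looks for to treat a Hive table as a datasource table.
  static Map<String, String> sparkDataSourceProperties(String tableSchemaJson) {
    Map<String, String> props = new HashMap<>();
    props.put("spark.sql.sources.provider", "hudi");
    props.put("spark.sql.sources.schema.numParts", "1");
    props.put("spark.sql.sources.schema.part.0", tableSchemaJson);
    return props;
  }
}
```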



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3120: [HUDI-2045] Support Read Hoodie As DataSource Table For Flink And Del…

2021-07-01 Thread GitBox


hudi-bot edited a comment on pull request #3120:
URL: https://github.com/apache/hudi/pull/3120#issuecomment-864760893


   
   ## CI report:
   
   * aaca30fffd1ea37f803f51ef3cf49c59ed79badc UNKNOWN
   * fcd06c8bccfc90b272b51d3511094e6617ec25bd UNKNOWN
   * 96947d0419df5f8bab10072eb64afecd29326e55 UNKNOWN
   * 02acd1127b72470f6d7adffb787179f0cddfa954 UNKNOWN
   * 504a6770be5d4cd3a78d61129be5b1aaadd515df UNKNOWN
   * 75aadbc834d6606527764468dd3dbcb1e802b171 UNKNOWN
   * f14ffb1f08820146e5d26616aa9b956ff99ec604 UNKNOWN
   * 06dff3c437b7b3f1aa227b700cf8c34669b067ed UNKNOWN
   * 97ba05a69199cff86cebbe25732097e3a68284f1 UNKNOWN
   * 3948fff7aacd6c97dcbe053a59a1208dae875607 UNKNOWN
   * 8ff6a0af2f53984c5864b04156a5b942400811c3 UNKNOWN
   * dbcd6ae0092c067c4bc364355ca7fa8129f46a39 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=635)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-1447) DeltaStreamer kafka source supports consuming from specified timestamp

2021-07-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373226#comment-17373226
 ] 

ASF GitHub Bot commented on HUDI-1447:
--

hudi-bot edited a comment on pull request #2438:
URL: https://github.com/apache/hudi/pull/2438#issuecomment-863310563


   
   ## CI report:
   
   * a39570dfe0493bcd23edf911f6256e90d3b22907 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=638)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> DeltaStreamer kafka source supports consuming from specified timestamp
> --
>
> Key: HUDI-1447
> URL: https://issues.apache.org/jira/browse/HUDI-1447
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: DeltaStreamer
>Reporter: wangxianghu#1
>Assignee: liujinhui
>Priority: Major
>  Labels: pull-request-available, sev:high, user-support-issues
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #2438: [HUDI-1447] DeltaStreamer kafka source supports consuming from specified timestamp

2021-07-01 Thread GitBox


hudi-bot edited a comment on pull request #2438:
URL: https://github.com/apache/hudi/pull/2438#issuecomment-863310563


   
   ## CI report:
   
   * a39570dfe0493bcd23edf911f6256e90d3b22907 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=638)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2057) CTAS Generate An External Table When Create Managed Table

2021-07-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373225#comment-17373225
 ] 

ASF GitHub Bot commented on HUDI-2057:
--

hudi-bot edited a comment on pull request #3146:
URL: https://github.com/apache/hudi/pull/3146#issuecomment-867438049


   
   ## CI report:
   
   * 189ca2500f54564e9c252dbab04198bae15494ef Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=634)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> CTAS Generate An External Table When Create Managed Table
> -
>
> Key: HUDI-2057
> URL: https://issues.apache.org/jira/browse/HUDI-2057
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Reporter: pengzhiwei
>Assignee: pengzhiwei
>Priority: Major
>  Labels: pull-request-available
>
> Currently CTAS generates an external table when creating a managed table 
> in the Hive meta store.
> {code:java}
> create table h0 using hudi as select 1 as id, 'a1' as name;{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3146: [HUDI-2057] CTAS Generate An External Table When Create Managed Table

2021-07-01 Thread GitBox


hudi-bot edited a comment on pull request #3146:
URL: https://github.com/apache/hudi/pull/3146#issuecomment-867438049


   
   ## CI report:
   
   * 189ca2500f54564e9c252dbab04198bae15494ef Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=634)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-1447) DeltaStreamer kafka source supports consuming from specified timestamp

2021-07-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373224#comment-17373224
 ] 

ASF GitHub Bot commented on HUDI-1447:
--

hudi-bot edited a comment on pull request #2438:
URL: https://github.com/apache/hudi/pull/2438#issuecomment-863310563


   
   ## CI report:
   
   * 1bbcdb44cbb0ab9ac84c95a48fea5d7f38a8f657 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=633)
 
   * a39570dfe0493bcd23edf911f6256e90d3b22907 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=638)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> DeltaStreamer kafka source supports consuming from specified timestamp
> --
>
> Key: HUDI-1447
> URL: https://issues.apache.org/jira/browse/HUDI-1447
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: DeltaStreamer
>Reporter: wangxianghu#1
>Assignee: liujinhui
>Priority: Major
>  Labels: pull-request-available, sev:high, user-support-issues
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #2438: [HUDI-1447] DeltaStreamer kafka source supports consuming from specified timestamp

2021-07-01 Thread GitBox


hudi-bot edited a comment on pull request #2438:
URL: https://github.com/apache/hudi/pull/2438#issuecomment-863310563


   
   ## CI report:
   
   * 1bbcdb44cbb0ab9ac84c95a48fea5d7f38a8f657 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=633)
 
   * a39570dfe0493bcd23edf911f6256e90d3b22907 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=638)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-1468) incremental read support with clustering

2021-07-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373214#comment-17373214
 ] 

ASF GitHub Bot commented on HUDI-1468:
--

hudi-bot edited a comment on pull request #3211:
URL: https://github.com/apache/hudi/pull/3211#issuecomment-872639166


   
   ## CI report:
   
   * ab7bacb26d44f383e7f61ec81531b34011f1383b Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=629)
 
   * c9c9a0d5343b65e690544dfcb85e71d915c455e1 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=637)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> incremental read support with clustering
> 
>
> Key: HUDI-1468
> URL: https://issues.apache.org/jira/browse/HUDI-1468
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Incremental Pull
>Affects Versions: 0.9.0
>Reporter: satish
>Assignee: liwei
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> As part of clustering, metadata such as hoodie_commit_time changes for 
> records that are clustered. This is specific to the 
> SparkBulkInsertBasedRunClusteringStrategy implementation. Figure out a way to 
> carry the commit_time from the original record to support incremental queries.
> Also, incremental queries don't work with the 'replacecommit' used by clustering 
> (HUDI-1264). Change incremental queries to work for replacecommits created by 
> clustering.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3211: [HUDI-1468] Support more flexible clustering strategies and preserve commit …

2021-07-01 Thread GitBox


hudi-bot edited a comment on pull request #3211:
URL: https://github.com/apache/hudi/pull/3211#issuecomment-872639166


   
   ## CI report:
   
   * ab7bacb26d44f383e7f61ec81531b34011f1383b Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=629)
 
   * c9c9a0d5343b65e690544dfcb85e71d915c455e1 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=637)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-1468) incremental read support with clustering

2021-07-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373212#comment-17373212
 ] 

ASF GitHub Bot commented on HUDI-1468:
--

hudi-bot edited a comment on pull request #3211:
URL: https://github.com/apache/hudi/pull/3211#issuecomment-872639166


   
   ## CI report:
   
   * ab7bacb26d44f383e7f61ec81531b34011f1383b Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=629)
 
   * c9c9a0d5343b65e690544dfcb85e71d915c455e1 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> incremental read support with clustering
> 
>
> Key: HUDI-1468
> URL: https://issues.apache.org/jira/browse/HUDI-1468
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Incremental Pull
>Affects Versions: 0.9.0
>Reporter: satish
>Assignee: liwei
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> As part of clustering, metadata such as hoodie_commit_time changes for 
> records that are clustered. This is specific to the 
> SparkBulkInsertBasedRunClusteringStrategy implementation. Figure out a way to 
> carry the commit_time from the original record to support incremental queries.
> Also, incremental queries don't work with the 'replacecommit' used by clustering 
> (HUDI-1264). Change incremental queries to work for replacecommits created by 
> clustering.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3211: [HUDI-1468] Support more flexible clustering strategies and preserve commit …

2021-07-01 Thread GitBox


hudi-bot edited a comment on pull request #3211:
URL: https://github.com/apache/hudi/pull/3211#issuecomment-872639166


   
   ## CI report:
   
   * ab7bacb26d44f383e7f61ec81531b34011f1383b Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=629)
 
   * c9c9a0d5343b65e690544dfcb85e71d915c455e1 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2116) sync 10w partitions to hive by using HiveSyncTool lead to the oom of hive MetaStore

2021-07-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373207#comment-17373207
 ] 

ASF GitHub Bot commented on HUDI-2116:
--

hudi-bot edited a comment on pull request #3209:
URL: https://github.com/apache/hudi/pull/3209#issuecomment-872173561


   
   ## CI report:
   
   * f4c7b374a7a338f0202c356baf08f24a9043e37a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=613)
 
   * fbcd406b45e370446193c32e7d09db09d57a0996 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=636)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


>  sync 10w partitions to hive by using HiveSyncTool lead to the oom of hive 
> MetaStore
> 
>
> Key: HUDI-2116
> URL: https://issues.apache.org/jira/browse/HUDI-2116
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Hive Integration
>Affects Versions: 0.8.0
> Environment: hive3.1.1
> hadoop 3.1.1
>Reporter: tao meng
>Assignee: tao meng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> When we try to sync 10w (100,000) partitions to Hive using HiveSyncTool, the 
> Hive MetaStore runs out of memory.
>  
> Here is a stress test for HiveSyncTool.
> env: 
> hive metastore -Xms16G -Xmx16G
> hive.metastore.client.socket.timeout=10800
>  
> ||partitionNum||time consumed||
> |100|37s|
> |1000|168s|
> |5000|1830s|
> |10000|timeout|
> |100000|hive metastore oom|
> HiveSyncTool syncs all partitions to the Hive metastore at once. When the 
> partition count is large, this puts a lot of pressure on the metastore, so for 
> large partition counts we should support batch sync.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3209: [HUDI-2116] Support batch synchronization of partition datas to hive metastore to avoid oom problem

2021-07-01 Thread GitBox


hudi-bot edited a comment on pull request #3209:
URL: https://github.com/apache/hudi/pull/3209#issuecomment-872173561


   
   ## CI report:
   
   * f4c7b374a7a338f0202c356baf08f24a9043e37a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=613)
 
   * fbcd406b45e370446193c32e7d09db09d57a0996 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=636)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2116) sync 10w partitions to hive by using HiveSyncTool lead to the oom of hive MetaStore

2021-07-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373205#comment-17373205
 ] 

ASF GitHub Bot commented on HUDI-2116:
--

hudi-bot edited a comment on pull request #3209:
URL: https://github.com/apache/hudi/pull/3209#issuecomment-872173561


   
   ## CI report:
   
   * f4c7b374a7a338f0202c356baf08f24a9043e37a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=613)
 
   * fbcd406b45e370446193c32e7d09db09d57a0996 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


>  sync 10w partitions to hive by using HiveSyncTool lead to the oom of hive 
> MetaStore
> 
>
> Key: HUDI-2116
> URL: https://issues.apache.org/jira/browse/HUDI-2116
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Hive Integration
>Affects Versions: 0.8.0
> Environment: hive3.1.1
> hadoop 3.1.1
>Reporter: tao meng
>Assignee: tao meng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> When we try to sync 10w (100,000) partitions to Hive using HiveSyncTool, the 
> Hive MetaStore runs out of memory.
>  
> Here is a stress test for HiveSyncTool.
> env: 
> hive metastore -Xms16G -Xmx16G
> hive.metastore.client.socket.timeout=10800
>  
> ||partitionNum||time consumed||
> |100|37s|
> |1000|168s|
> |5000|1830s|
> |10000|timeout|
> |100000|hive metastore oom|
> HiveSyncTool syncs all partitions to the Hive metastore at once. When the 
> partition count is large, this puts a lot of pressure on the metastore, so for 
> large partition counts we should support batch sync.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-2116) sync 10w partitions to hive by using HiveSyncTool lead to the oom of hive MetaStore

2021-07-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373206#comment-17373206
 ] 

ASF GitHub Bot commented on HUDI-2116:
--

xiarixiaoyao commented on pull request #3209:
URL: https://github.com/apache/hudi/pull/3209#issuecomment-872694937


   @yanghua  thanks for your review. Already changed the PR title.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


>  sync 10w partitions to hive by using HiveSyncTool lead to the oom of hive 
> MetaStore
> 
>
> Key: HUDI-2116
> URL: https://issues.apache.org/jira/browse/HUDI-2116
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Hive Integration
>Affects Versions: 0.8.0
> Environment: hive3.1.1
> hadoop 3.1.1
>Reporter: tao meng
>Assignee: tao meng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> When we try to sync 10w (100,000) partitions to Hive using HiveSyncTool, the 
> Hive MetaStore runs out of memory.
>  
> Here is a stress test for HiveSyncTool.
> env: 
> hive metastore -Xms16G -Xmx16G
> hive.metastore.client.socket.timeout=10800
>  
> ||partitionNum||time consumed||
> |100|37s|
> |1000|168s|
> |5000|1830s|
> |10000|timeout|
> |100000|hive metastore oom|
> HiveSyncTool syncs all partitions to the Hive metastore at once. When the 
> partition count is large, this puts a lot of pressure on the metastore, so for 
> large partition counts we should support batch sync.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] xiarixiaoyao commented on pull request #3209: [HUDI-2116] Support batch synchronization of partition datas to hive metastore to avoid oom problem

2021-07-01 Thread GitBox


xiarixiaoyao commented on pull request #3209:
URL: https://github.com/apache/hudi/pull/3209#issuecomment-872694937


   @yanghua  thanks for your review. Already changed the PR title.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot edited a comment on pull request #3209: [HUDI-2116] Support batch synchronization of partition datas to hive metastore to avoid oom problem

2021-07-01 Thread GitBox


hudi-bot edited a comment on pull request #3209:
URL: https://github.com/apache/hudi/pull/3209#issuecomment-872173561


   
   ## CI report:
   
   * f4c7b374a7a338f0202c356baf08f24a9043e37a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=613)
 
   * fbcd406b45e370446193c32e7d09db09d57a0996 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-1447) DeltaStreamer kafka source supports consuming from specified timestamp

2021-07-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373202#comment-17373202
 ] 

ASF GitHub Bot commented on HUDI-1447:
--

hudi-bot edited a comment on pull request #2438:
URL: https://github.com/apache/hudi/pull/2438#issuecomment-863310563


   
   ## CI report:
   
   * 1bbcdb44cbb0ab9ac84c95a48fea5d7f38a8f657 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=633)
 
   * a39570dfe0493bcd23edf911f6256e90d3b22907 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> DeltaStreamer kafka source supports consuming from specified timestamp
> --
>
> Key: HUDI-1447
> URL: https://issues.apache.org/jira/browse/HUDI-1447
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: DeltaStreamer
>Reporter: wangxianghu#1
>Assignee: liujinhui
>Priority: Major
>  Labels: pull-request-available, sev:high, user-support-issues
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #2438: [HUDI-1447] DeltaStreamer kafka source supports consuming from specified timestamp

2021-07-01 Thread GitBox


hudi-bot edited a comment on pull request #2438:
URL: https://github.com/apache/hudi/pull/2438#issuecomment-863310563


   
   ## CI report:
   
   * 1bbcdb44cbb0ab9ac84c95a48fea5d7f38a8f657 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=633)
 
   * a39570dfe0493bcd23edf911f6256e90d3b22907 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2072) Add Precommit validator framework

2021-07-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373200#comment-17373200
 ] 

ASF GitHub Bot commented on HUDI-2072:
--

vinothchandar commented on a change in pull request #3153:
URL: https://github.com/apache/hudi/pull/3153#discussion_r662716233



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
##
@@ -194,6 +194,37 @@
   public static final String EXTERNAL_RECORD_AND_SCHEMA_TRANSFORMATION = 
AVRO_SCHEMA + ".externalTransformation";
   public static final String DEFAULT_EXTERNAL_RECORD_AND_SCHEMA_TRANSFORMATION 
= "false";
 
+  public static final String PRE_COMMIT_VALIDATORS = 
"hoodie.precommit.validators";
+  private static final String DEFAULT_PRE_COMMIT_VALIDATORS = "";
+  public static final String VALIDATOR_TABLE_VARIABLE = "<TABLE_NAME>";
+
+  /**
+   * Spark SQL queries to run on table before committing new data to validate 
state before and after commit.
+   * Multiple queries separated by ';' delimiter are supported.
+   * example: "select count(*) from <TABLE_NAME>"
+   * Note <TABLE_NAME> is replaced by table state before and after commit. 
+   */
+  public static final String PRE_COMMIT_VALIDATORS_EQUALITY_SQL_QUERIES = 
"hoodie.precommit.validators.equality.sql.queries";

Review comment:
   please move all these configs over to ConfigProperty
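   A minimal sketch of what moving these constants to `ConfigProperty` might look like, assuming Hudi's builder-style API (`key`/`defaultValue`/`withDocumentation`); the class name and documentation strings below are illustrative, not the actual patch:
   
   ```java
   // Hypothetical sketch only, not the actual change in this PR.
   // Assumes org.apache.hudi.common.config.ConfigProperty and its key()/defaultValue()/withDocumentation() builder.
   import org.apache.hudi.common.config.ConfigProperty;
   
   public class HoodiePreCommitValidatorConfigSketch {
   
     // Comma-separated list of validator class names to run before a commit is finalized.
     public static final ConfigProperty<String> PRE_COMMIT_VALIDATORS = ConfigProperty
         .key("hoodie.precommit.validators")
         .defaultValue("")
         .withDocumentation("Comma-separated list of pre-commit validator class names.");
   
     // ';'-separated Spark SQL queries whose results must match on the table state before and after the commit.
     public static final ConfigProperty<String> EQUALITY_SQL_QUERIES = ConfigProperty
         .key("hoodie.precommit.validators.equality.sql.queries")
         .defaultValue("")
         .withDocumentation("Spark SQL queries run against the table state before and after the commit; "
             + "results must be equal for the commit to proceed.");
   }
   ```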

##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
##
@@ -194,6 +194,37 @@
   public static final String EXTERNAL_RECORD_AND_SCHEMA_TRANSFORMATION = 
AVRO_SCHEMA + ".externalTransformation";
   public static final String DEFAULT_EXTERNAL_RECORD_AND_SCHEMA_TRANSFORMATION 
= "false";
 
+  public static final String PRE_COMMIT_VALIDATORS = 
"hoodie.precommit.validators";

Review comment:
   can we create a new Config class for this, instead of overloading the 
WriteConfig?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add Precommit validator framework
> -
>
> Key: HUDI-2072
> URL: https://issues.apache.org/jira/browse/HUDI-2072
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: satish
>Assignee: satish
>Priority: Major
>  Labels: pull-request-available
>
> We want to run pre-commit validators before 'promoting' a inflight operation 
> to commit.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] vinothchandar commented on a change in pull request #3153: [HUDI-2072] Add pre-commit validator framework

2021-07-01 Thread GitBox


vinothchandar commented on a change in pull request #3153:
URL: https://github.com/apache/hudi/pull/3153#discussion_r662716233



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
##
@@ -194,6 +194,37 @@
   public static final String EXTERNAL_RECORD_AND_SCHEMA_TRANSFORMATION = 
AVRO_SCHEMA + ".externalTransformation";
   public static final String DEFAULT_EXTERNAL_RECORD_AND_SCHEMA_TRANSFORMATION 
= "false";
 
+  public static final String PRE_COMMIT_VALIDATORS = 
"hoodie.precommit.validators";
+  private static final String DEFAULT_PRE_COMMIT_VALIDATORS = "";
+  public static final String VALIDATOR_TABLE_VARIABLE = "";
+
+  /**
+   * Spark SQL queries to run on table before committing new data to validate 
state before and after commit.
+   * Multiple queries separated by ';' delimiter are supported.
+   * example: "select count(*) from \"
+   * Note \ is replaced by table state before and after commit. 
+   */
+  public static final String PRE_COMMIT_VALIDATORS_EQUALITY_SQL_QUERIES = 
"hoodie.precommit.validators.equality.sql.queries";

Review comment:
   please move all these configs over to ConfigProperty

##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
##
@@ -194,6 +194,37 @@
   public static final String EXTERNAL_RECORD_AND_SCHEMA_TRANSFORMATION = 
AVRO_SCHEMA + ".externalTransformation";
   public static final String DEFAULT_EXTERNAL_RECORD_AND_SCHEMA_TRANSFORMATION 
= "false";
 
+  public static final String PRE_COMMIT_VALIDATORS = 
"hoodie.precommit.validators";

Review comment:
   can we create a new Config class for this, instead of overloading the 
WriteConfig?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2045) Support Read Hoodie As DataSource Table For Flink And DeltaStreamer

2021-07-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373198#comment-17373198
 ] 

ASF GitHub Bot commented on HUDI-2045:
--

hudi-bot edited a comment on pull request #3120:
URL: https://github.com/apache/hudi/pull/3120#issuecomment-864760893


   
   ## CI report:
   
   * aaca30fffd1ea37f803f51ef3cf49c59ed79badc UNKNOWN
   * fcd06c8bccfc90b272b51d3511094e6617ec25bd UNKNOWN
   * 96947d0419df5f8bab10072eb64afecd29326e55 UNKNOWN
   * 02acd1127b72470f6d7adffb787179f0cddfa954 UNKNOWN
   * 504a6770be5d4cd3a78d61129be5b1aaadd515df UNKNOWN
   * 75aadbc834d6606527764468dd3dbcb1e802b171 UNKNOWN
   * f14ffb1f08820146e5d26616aa9b956ff99ec604 UNKNOWN
   * 06dff3c437b7b3f1aa227b700cf8c34669b067ed UNKNOWN
   * 97ba05a69199cff86cebbe25732097e3a68284f1 UNKNOWN
   * 3948fff7aacd6c97dcbe053a59a1208dae875607 UNKNOWN
   * 8ff6a0af2f53984c5864b04156a5b942400811c3 UNKNOWN
   * adba2dccf6e41da3bde98e5ed622cfd4b39554e9 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=515)
 
   * dbcd6ae0092c067c4bc364355ca7fa8129f46a39 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=635)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Support Read Hoodie As DataSource Table For Flink And DeltaStreamer
> ---
>
> Key: HUDI-2045
> URL: https://issues.apache.org/jira/browse/HUDI-2045
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Hive Integration
>Reporter: pengzhiwei
>Assignee: pengzhiwei
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Currently we only support reading a hoodie table as a datasource table for spark,
> since [https://github.com/apache/hudi/pull/2283].
> In order to support this feature for flink and DeltaStreamer, we need to sync
> the spark table properties needed by the datasource table to the meta store in
> HiveSyncTool.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3120: [HUDI-2045] Support Read Hoodie As DataSource Table For Flink And Del…

2021-07-01 Thread GitBox


hudi-bot edited a comment on pull request #3120:
URL: https://github.com/apache/hudi/pull/3120#issuecomment-864760893


   
   ## CI report:
   
   * aaca30fffd1ea37f803f51ef3cf49c59ed79badc UNKNOWN
   * fcd06c8bccfc90b272b51d3511094e6617ec25bd UNKNOWN
   * 96947d0419df5f8bab10072eb64afecd29326e55 UNKNOWN
   * 02acd1127b72470f6d7adffb787179f0cddfa954 UNKNOWN
   * 504a6770be5d4cd3a78d61129be5b1aaadd515df UNKNOWN
   * 75aadbc834d6606527764468dd3dbcb1e802b171 UNKNOWN
   * f14ffb1f08820146e5d26616aa9b956ff99ec604 UNKNOWN
   * 06dff3c437b7b3f1aa227b700cf8c34669b067ed UNKNOWN
   * 97ba05a69199cff86cebbe25732097e3a68284f1 UNKNOWN
   * 3948fff7aacd6c97dcbe053a59a1208dae875607 UNKNOWN
   * 8ff6a0af2f53984c5864b04156a5b942400811c3 UNKNOWN
   * adba2dccf6e41da3bde98e5ed622cfd4b39554e9 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=515)
 
   * dbcd6ae0092c067c4bc364355ca7fa8129f46a39 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=635)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot edited a comment on pull request #3146: [HUDI-2057] CTAS Generate An External Table When Create Managed Table

2021-07-01 Thread GitBox


hudi-bot edited a comment on pull request #3146:
URL: https://github.com/apache/hudi/pull/3146#issuecomment-867438049


   
   ## CI report:
   
   * a3889e81b221cbefe5ad98c1e62f90aa24742d80 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=624)
 
   * 189ca2500f54564e9c252dbab04198bae15494ef Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=634)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2057) CTAS Generate An External Table When Create Managed Table

2021-07-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373196#comment-17373196
 ] 

ASF GitHub Bot commented on HUDI-2057:
--

hudi-bot edited a comment on pull request #3146:
URL: https://github.com/apache/hudi/pull/3146#issuecomment-867438049


   
   ## CI report:
   
   * a3889e81b221cbefe5ad98c1e62f90aa24742d80 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=624)
 
   * 189ca2500f54564e9c252dbab04198bae15494ef Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=634)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> CTAS Generate An External Table When Create Managed Table
> -
>
> Key: HUDI-2057
> URL: https://issues.apache.org/jira/browse/HUDI-2057
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Reporter: pengzhiwei
>Assignee: pengzhiwei
>Priority: Major
>  Labels: pull-request-available
>
> Currently CTAS would generate an external table when creating a managed table
> in the hive meta store.
> {code:java}
> create table h0 using hudi as select 1 as id, 'a1' as name;{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-2072) Add Precommit validator framework

2021-07-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373195#comment-17373195
 ] 

ASF GitHub Bot commented on HUDI-2072:
--

bvaradar commented on a change in pull request #3153:
URL: https://github.com/apache/hudi/pull/3153#discussion_r662650025



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
##
@@ -194,6 +194,37 @@
   public static final String EXTERNAL_RECORD_AND_SCHEMA_TRANSFORMATION = 
AVRO_SCHEMA + ".externalTransformation";
   public static final String DEFAULT_EXTERNAL_RECORD_AND_SCHEMA_TRANSFORMATION 
= "false";
 
+  public static final String PRE_COMMIT_VALIDATORS = 
"hoodie.precommit.validators";
+  private static final String DEFAULT_PRE_COMMIT_VALIDATORS = "";
+  public static final String VALIDATOR_TABLE_VARIABLE = "";

Review comment:
   It would make the validation queries more flexible if both the before and 
after table names were individually configurable. Sometimes a validation query 
needs to join the before and after tables, and keeping both names configurable 
supports that.
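   A rough sketch of that idea, using hypothetical config keys and placeholder tokens (all of the names below are invented for illustration, none come from the PR):
   
   ```java
   import java.util.Properties;
   
   public class BeforeAfterPlaceholderSketch {
     // Hypothetical sketch: the config keys and placeholder tokens below are invented for illustration.
     public static String resolveValidationQuery(Properties props) {
       String beforeToken = props.getProperty("hoodie.precommit.validators.table.variable.before", "<BEFORE_TABLE>");
       String afterToken = props.getProperty("hoodie.precommit.validators.table.variable.after", "<AFTER_TABLE>");
   
       // A validation query can then join both states, e.g. counting record keys that disappeared after the commit.
       String template = "select count(*) from " + beforeToken
           + " b left anti join " + afterToken + " a on b._hoodie_record_key = a._hoodie_record_key";
   
       // The validator would register the staged datasets as temp views and substitute the real names here.
       return template
           .replace(beforeToken, "staged_table_1_before")
           .replace(afterToken, "staged_table_1_after");
     }
   }
   ```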

##
File path: 
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/validator/SqlQueryPreCommitValidator.java
##
@@ -0,0 +1,92 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.client.validator;
+
+import org.apache.hudi.client.WriteStatus;
+import org.apache.hudi.client.common.HoodieSparkEngineContext;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieValidationException;
+import org.apache.hudi.table.HoodieSparkTable;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.api.java.JavaRDD;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.SQLContext;
+
+import java.util.Arrays;
+import java.util.Set;
+import java.util.concurrent.atomic.AtomicInteger;
+
+/**
+ * Validator framework to run sql queries and compare table state at different 
locations.
+ */
+public abstract class SqlQueryPreCommitValidator> extends 
SparkPreCommitValidator {
+  private static final Logger LOG = 
LogManager.getLogger(SqlQueryPreCommitValidator.class);
+  private static final AtomicInteger TABLE_COUNTER = new AtomicInteger(0);
+
+  public SqlQueryPreCommitValidator(HoodieSparkTable table, 
HoodieEngineContext engineContext, HoodieWriteConfig config) {
+super(table, engineContext, config);
+  }
+
+  /**
+   * Takes input of RDD 1) before clustering and 2) after clustering. Perform 
required validation 
+   * and throw error if validation fails
+   */
+  @Override
+  public void validateRecordsBeforeAndAfter(Dataset before, Dataset 
after, final Set partitionsAffected) {
+String hoodieTableName = "staged_table_" + TABLE_COUNTER.incrementAndGet();
+String hoodieTableBeforeClustering = hoodieTableName + "_before";
+String hoodieTableAfterClustering = hoodieTableName + "_after";

Review comment:
   Can you also take one pass at the code and rename the variables? The 
validator needs to be agnostic to commit or clustering operations, so please 
name them accordingly.

##
File path: 
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/utils/SparkValidatorUtils.java
##
@@ -0,0 +1,169 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to 

[GitHub] [hudi] wangxianghu closed issue #3188: java.lang.NoSuchMethodError: org.apache.hudi.org.apache.jetty.server.session.SessionHandler.setHttpOnly(Z)V[SUPPORT]

2021-07-01 Thread GitBox


wangxianghu closed issue #3188:
URL: https://github.com/apache/hudi/issues/3188


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] bvaradar commented on a change in pull request #3153: [HUDI-2072] Add pre-commit validator framework

2021-07-01 Thread GitBox


bvaradar commented on a change in pull request #3153:
URL: https://github.com/apache/hudi/pull/3153#discussion_r662650025



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
##
@@ -194,6 +194,37 @@
   public static final String EXTERNAL_RECORD_AND_SCHEMA_TRANSFORMATION = 
AVRO_SCHEMA + ".externalTransformation";
   public static final String DEFAULT_EXTERNAL_RECORD_AND_SCHEMA_TRANSFORMATION 
= "false";
 
+  public static final String PRE_COMMIT_VALIDATORS = 
"hoodie.precommit.validators";
+  private static final String DEFAULT_PRE_COMMIT_VALIDATORS = "";
+  public static final String VALIDATOR_TABLE_VARIABLE = "";

Review comment:
   It would make the validation queries more flexible if both the before and 
after table names were individually configurable. Sometimes a validation query 
needs to join the before and after tables, and keeping both names configurable 
supports that.

##
File path: 
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/validator/SqlQueryPreCommitValidator.java
##
@@ -0,0 +1,92 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.client.validator;
+
+import org.apache.hudi.client.WriteStatus;
+import org.apache.hudi.client.common.HoodieSparkEngineContext;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieValidationException;
+import org.apache.hudi.table.HoodieSparkTable;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.api.java.JavaRDD;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.SQLContext;
+
+import java.util.Arrays;
+import java.util.Set;
+import java.util.concurrent.atomic.AtomicInteger;
+
+/**
+ * Validator framework to run sql queries and compare table state at different 
locations.
+ */
+public abstract class SqlQueryPreCommitValidator> extends 
SparkPreCommitValidator {
+  private static final Logger LOG = 
LogManager.getLogger(SqlQueryPreCommitValidator.class);
+  private static final AtomicInteger TABLE_COUNTER = new AtomicInteger(0);
+
+  public SqlQueryPreCommitValidator(HoodieSparkTable table, 
HoodieEngineContext engineContext, HoodieWriteConfig config) {
+super(table, engineContext, config);
+  }
+
+  /**
+   * Takes input of RDD 1) before clustering and 2) after clustering. Perform 
required validation 
+   * and throw error if validation fails
+   */
+  @Override
+  public void validateRecordsBeforeAndAfter(Dataset before, Dataset 
after, final Set partitionsAffected) {
+String hoodieTableName = "staged_table_" + TABLE_COUNTER.incrementAndGet();
+String hoodieTableBeforeClustering = hoodieTableName + "_before";
+String hoodieTableAfterClustering = hoodieTableName + "_after";

Review comment:
   Can you also take one pass at the code and rename the variables? The 
validator needs to be agnostic to commit or clustering operations, so please 
name them accordingly.

##
File path: 
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/utils/SparkValidatorUtils.java
##
@@ -0,0 +1,169 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing 

[jira] [Commented] (HUDI-2057) CTAS Generate An External Table When Create Managed Table

2021-07-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373194#comment-17373194
 ] 

ASF GitHub Bot commented on HUDI-2057:
--

hudi-bot edited a comment on pull request #3146:
URL: https://github.com/apache/hudi/pull/3146#issuecomment-867438049


   
   ## CI report:
   
   * a3889e81b221cbefe5ad98c1e62f90aa24742d80 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=624)
 
   * 189ca2500f54564e9c252dbab04198bae15494ef UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> CTAS Generate An External Table When Create Managed Table
> -
>
> Key: HUDI-2057
> URL: https://issues.apache.org/jira/browse/HUDI-2057
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Reporter: pengzhiwei
>Assignee: pengzhiwei
>Priority: Major
>  Labels: pull-request-available
>
> Currently CTAS would generate an external table when creating a managed table
> in the hive meta store.
> {code:java}
> create table h0 using hudi as select 1 as id, 'a1' as name;{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3146: [HUDI-2057] CTAS Generate An External Table When Create Managed Table

2021-07-01 Thread GitBox


hudi-bot edited a comment on pull request #3146:
URL: https://github.com/apache/hudi/pull/3146#issuecomment-867438049


   
   ## CI report:
   
   * a3889e81b221cbefe5ad98c1e62f90aa24742d80 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=624)
 
   * 189ca2500f54564e9c252dbab04198bae15494ef UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2057) CTAS Generate An External Table When Create Managed Table

2021-07-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373191#comment-17373191
 ] 

ASF GitHub Bot commented on HUDI-2057:
--

pengzhiwei2018 commented on a change in pull request #3146:
URL: https://github.com/apache/hudi/pull/3146#discussion_r662711996



##
File path: 
hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncConfig.java
##
@@ -151,6 +155,7 @@ public String toString() {
   + ", help=" + help
   + ", supportTimestamp=" + supportTimestamp
   + ", decodePartition=" + decodePartition
+  + ", createManagedTable= " + createManagedTable

Review comment:
   done!

##
File path: 
hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/HiveSchemaUtil.java
##
@@ -413,7 +413,12 @@ public static String generateCreateDDL(String tableName, 
MessageType storageSche
 }
 
 String partitionsStr = String.join(",", partitionFields);
-StringBuilder sb = new StringBuilder("CREATE EXTERNAL TABLE  IF NOT EXISTS 
");
+StringBuilder sb = new StringBuilder();
+if (config.createManagedTable) {
+  sb.append("CREATE TABLE  IF NOT EXISTS ");

Review comment:
   done!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> CTAS Generate An External Table When Create Managed Table
> -
>
> Key: HUDI-2057
> URL: https://issues.apache.org/jira/browse/HUDI-2057
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Reporter: pengzhiwei
>Assignee: pengzhiwei
>Priority: Major
>  Labels: pull-request-available
>
> Currently CTAS would generate an external table when creating a managed table
> in the hive meta store.
> {code:java}
> create table h0 using hudi as select 1 as id, 'a1' as name;{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-2057) CTAS Generate An External Table When Create Managed Table

2021-07-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373192#comment-17373192
 ] 

ASF GitHub Bot commented on HUDI-2057:
--

pengzhiwei2018 commented on a change in pull request #3146:
URL: https://github.com/apache/hudi/pull/3146#discussion_r662712075



##
File path: 
hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/HiveSchemaUtil.java
##
@@ -413,7 +413,12 @@ public static String generateCreateDDL(String tableName, 
MessageType storageSche
 }
 
 String partitionsStr = String.join(",", partitionFields);
-StringBuilder sb = new StringBuilder("CREATE EXTERNAL TABLE  IF NOT EXISTS 
");
+StringBuilder sb = new StringBuilder();
+if (config.createManagedTable) {
+  sb.append("CREATE TABLE  IF NOT EXISTS ");
+} else {
+  sb.append("CREATE EXTERNAL TABLE  IF NOT EXISTS ");

Review comment:
   done!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> CTAS Generate An External Table When Create Managed Table
> -
>
> Key: HUDI-2057
> URL: https://issues.apache.org/jira/browse/HUDI-2057
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Reporter: pengzhiwei
>Assignee: pengzhiwei
>Priority: Major
>  Labels: pull-request-available
>
> Currently CTAS would generate an external table when creating a managed table
> in the hive meta store.
> {code:java}
> create table h0 using hudi as select 1 as id, 'a1' as name;{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] pengzhiwei2018 commented on a change in pull request #3146: [HUDI-2057] CTAS Generate An External Table When Create Managed Table

2021-07-01 Thread GitBox


pengzhiwei2018 commented on a change in pull request #3146:
URL: https://github.com/apache/hudi/pull/3146#discussion_r662712075



##
File path: 
hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/HiveSchemaUtil.java
##
@@ -413,7 +413,12 @@ public static String generateCreateDDL(String tableName, 
MessageType storageSche
 }
 
 String partitionsStr = String.join(",", partitionFields);
-StringBuilder sb = new StringBuilder("CREATE EXTERNAL TABLE  IF NOT EXISTS 
");
+StringBuilder sb = new StringBuilder();
+if (config.createManagedTable) {
+  sb.append("CREATE TABLE  IF NOT EXISTS ");
+} else {
+  sb.append("CREATE EXTERNAL TABLE  IF NOT EXISTS ");

Review comment:
   done!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] pengzhiwei2018 commented on a change in pull request #3146: [HUDI-2057] CTAS Generate An External Table When Create Managed Table

2021-07-01 Thread GitBox


pengzhiwei2018 commented on a change in pull request #3146:
URL: https://github.com/apache/hudi/pull/3146#discussion_r662711996



##
File path: 
hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncConfig.java
##
@@ -151,6 +155,7 @@ public String toString() {
   + ", help=" + help
   + ", supportTimestamp=" + supportTimestamp
   + ", decodePartition=" + decodePartition
+  + ", createManagedTable= " + createManagedTable

Review comment:
   done!

##
File path: 
hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/HiveSchemaUtil.java
##
@@ -413,7 +413,12 @@ public static String generateCreateDDL(String tableName, 
MessageType storageSche
 }
 
 String partitionsStr = String.join(",", partitionFields);
-StringBuilder sb = new StringBuilder("CREATE EXTERNAL TABLE  IF NOT EXISTS 
");
+StringBuilder sb = new StringBuilder();
+if (config.createManagedTable) {
+  sb.append("CREATE TABLE  IF NOT EXISTS ");

Review comment:
   done!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2057) CTAS Generate An External Table When Create Managed Table

2021-07-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373189#comment-17373189
 ] 

ASF GitHub Bot commented on HUDI-2057:
--

pengzhiwei2018 commented on a change in pull request #3146:
URL: https://github.com/apache/hudi/pull/3146#discussion_r662711582



##
File path: 
hudi-sync/hudi-hive-sync/src/test/java/org/apache/hudi/hive/TestHiveSyncTool.java
##
@@ -33,8 +35,6 @@
 import org.apache.avro.Schema.Field;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.hive.metastore.api.Partition;
-import org.apache.hadoop.hive.ql.Driver;
-import org.apache.hadoop.hive.ql.session.SessionState;

Review comment:
   remove unused import 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> CTAS Generate An External Table When Create Managed Table
> -
>
> Key: HUDI-2057
> URL: https://issues.apache.org/jira/browse/HUDI-2057
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Reporter: pengzhiwei
>Assignee: pengzhiwei
>Priority: Major
>  Labels: pull-request-available
>
> Currently CTAS would generate an external table when creating a managed table
> in the hive meta store.
> {code:java}
> create table h0 using hudi as select 1 as id, 'a1' as name;{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] pengzhiwei2018 commented on a change in pull request #3146: [HUDI-2057] CTAS Generate An External Table When Create Managed Table

2021-07-01 Thread GitBox


pengzhiwei2018 commented on a change in pull request #3146:
URL: https://github.com/apache/hudi/pull/3146#discussion_r662711582



##
File path: 
hudi-sync/hudi-hive-sync/src/test/java/org/apache/hudi/hive/TestHiveSyncTool.java
##
@@ -33,8 +35,6 @@
 import org.apache.avro.Schema.Field;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.hive.metastore.api.Partition;
-import org.apache.hadoop.hive.ql.Driver;
-import org.apache.hadoop.hive.ql.session.SessionState;

Review comment:
   remove unused import 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-1447) DeltaStreamer kafka source supports consuming from specified timestamp

2021-07-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373188#comment-17373188
 ] 

ASF GitHub Bot commented on HUDI-1447:
--

hudi-bot edited a comment on pull request #2438:
URL: https://github.com/apache/hudi/pull/2438#issuecomment-863310563


   
   ## CI report:
   
   * 1bbcdb44cbb0ab9ac84c95a48fea5d7f38a8f657 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=633)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> DeltaStreamer kafka source supports consuming from specified timestamp
> --
>
> Key: HUDI-1447
> URL: https://issues.apache.org/jira/browse/HUDI-1447
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: DeltaStreamer
>Reporter: wangxianghu#1
>Assignee: liujinhui
>Priority: Major
>  Labels: pull-request-available, sev:high, user-support-issues
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-2058) support incremental query for insert_overwrite_table/insert_overwrite operation on cow table

2021-07-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373187#comment-17373187
 ] 

ASF GitHub Bot commented on HUDI-2058:
--

vinothchandar commented on pull request #3139:
URL: https://github.com/apache/hudi/pull/3139#issuecomment-872684832


   cc @codope this may also fix incremental + clustering, given they are all 
replace commits. Could you review this once, please?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> support incremental query for insert_overwrite_table/insert_overwrite 
> operation on cow table
> 
>
> Key: HUDI-2058
> URL: https://issues.apache.org/jira/browse/HUDI-2058
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Incremental Pull
>Affects Versions: 0.8.0
> Environment: hadoop 3.1.1
> spark3.1.1
> hive 3.1.1
>Reporter: tao meng
>Assignee: tao meng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
>  When an incremental query spans multiple commits before and after a 
> replacecommit, the query result contains data from the old files. 
> Notice: the MOR table is ok, only the COW table has this problem.
>  
> When querying the incremental view of a COW table, the replacecommit is ignored, 
> which leads to the wrong result. 
>  
>  
> test step:
> step1:  create dataFrame
> val df = spark.range(0, 10).toDF("keyid")
>  .withColumn("col3", expr("keyid"))
>  .withColumn("age", lit(1))
>  .withColumn("p", lit(2))
>  
> step2:  insert df to a empty hoodie table
> df.write.format("hudi").
>  option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY, 
> DataSourceWriteOptions.COW_TABLE_TYPE_OPT_VAL).
>  option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, "col3").
>  option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "keyid").
>  option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY, "").
>  option(DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY, 
> "org.apache.hudi.keygen.NonpartitionedKeyGenerator").
>  option(DataSourceWriteOptions.OPERATION_OPT_KEY, "insert").
>  option("hoodie.insert.shuffle.parallelism", "4").
>  option(HoodieWriteConfig.TABLE_NAME, "hoodie_test")
>  .mode(SaveMode.Overwrite).save(basePath)
>  
> step3: do insert_overwrite
> df.write.format("hudi").
>  option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY, 
> DataSourceWriteOptions.COW_TABLE_TYPE_OPT_VAL).
>  option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, "col3").
>  option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "keyid").
>  option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY, "").
>  option(DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY, 
> "org.apache.hudi.keygen.NonpartitionedKeyGenerator").
>  option(DataSourceWriteOptions.OPERATION_OPT_KEY, "insert_overwrite_table").
>  option("hoodie.insert.shuffle.parallelism", "4").
>  option(HoodieWriteConfig.TABLE_NAME, "hoodie_test")
>  .mode(SaveMode.Append).save(basePath)
>  
> step4: query incrematal table 
> spark.read.format("hudi").option(DataSourceReadOptions.QUERY_TYPE_OPT_KEY, 
> DataSourceReadOptions.QUERY_TYPE_INCREMENTAL_OPT_VAL)
>  .option(DataSourceReadOptions.BEGIN_INSTANTTIME_OPT_KEY, "")
>  .option(DataSourceReadOptions.END_INSTANTTIME_OPT_KEY, currentCommits(0))
>  .load(basePath).select("keyid").orderBy("keyid").show(100, false)
>  
> result:   the result contains old data
> +-+
> |keyid|
> +-+
> |0 |
> |0 |
> |1 |
> |1 |
> |2 |
> |2 |
> |3 |
> |3 |
> |4 |
> |4 |
> |5 |
> |5 |
> |6 |
> |6 |
> |7 |
> |7 |
> |8 |
> |8 |
> |9 |
> |9 |
> +-+
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #2438: [HUDI-1447] DeltaStreamer kafka source supports consuming from specified timestamp

2021-07-01 Thread GitBox


hudi-bot edited a comment on pull request #2438:
URL: https://github.com/apache/hudi/pull/2438#issuecomment-863310563


   
   ## CI report:
   
   * 1bbcdb44cbb0ab9ac84c95a48fea5d7f38a8f657 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=633)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] vinothchandar commented on pull request #3139: [HUDI-2058]support incremental query for insert_overwrite_table/insert_overwrite operation on cow table

2021-07-01 Thread GitBox


vinothchandar commented on pull request #3139:
URL: https://github.com/apache/hudi/pull/3139#issuecomment-872684832


   cc @codope this may also fix incremental + clustering, given they are all 
replace commits. Could you review this once, please?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2119) Syncing of rollbacks to metadata table does not work in all cases

2021-07-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373184#comment-17373184
 ] 

ASF GitHub Bot commented on HUDI-2119:
--

codecov-commenter commented on pull request #3210:
URL: https://github.com/apache/hudi/pull/3210#issuecomment-872684476


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3210?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3210](https://codecov.io/gh/apache/hudi/pull/3210?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (9d10483) into 
[master](https://codecov.io/gh/apache/hudi/commit/6eca06d074520140d7bc67b48bd2b9a5b76f0a87?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (6eca06d) will **increase** coverage by `2.23%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3210/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3210?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
   @@ Coverage Diff  @@
   ## master#3210  +/-   ##
   
   + Coverage 47.51%   49.74%   +2.23% 
   + Complexity 5429  406-5023 
   
 Files   922   67 -855 
 Lines 40968 2985   -37983 
 Branches   4105  320-3785 
   
   - Hits  19464 1485   -17979 
   + Misses19780 1365   -18415 
   + Partials   1724  135-1589 
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `?` | |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `?` | |
   | huditimelineservice | `?` | |
   | hudiutilities | `49.74% <ø> (-8.28%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3210?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...ies/exception/HoodieSnapshotExporterException.java](https://codecov.io/gh/apache/hudi/pull/3210/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2V4Y2VwdGlvbi9Ib29kaWVTbmFwc2hvdEV4cG9ydGVyRXhjZXB0aW9uLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[.../apache/hudi/utilities/HoodieSnapshotExporter.java](https://codecov.io/gh/apache/hudi/pull/3210/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0hvb2RpZVNuYXBzaG90RXhwb3J0ZXIuamF2YQ==)
 | `5.17% <0.00%> (-83.63%)` | :arrow_down: |
   | 
[...hudi/utilities/schema/JdbcbasedSchemaProvider.java](https://codecov.io/gh/apache/hudi/pull/3210/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9KZGJjYmFzZWRTY2hlbWFQcm92aWRlci5qYXZh)
 | `0.00% <0.00%> (-72.23%)` | :arrow_down: |
   | 
[...org/apache/hudi/utilities/HDFSParquetImporter.java](https://codecov.io/gh/apache/hudi/pull/3210/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0hERlNQYXJxdWV0SW1wb3J0ZXIuamF2YQ==)
 | `0.00% <0.00%> (-71.82%)` | :arrow_down: |
   | 
[...he/hudi/utilities/transform/AWSDmsTransformer.java](https://codecov.io/gh/apache/hudi/pull/3210/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3RyYW5zZm9ybS9BV1NEbXNUcmFuc2Zvcm1lci5qYXZh)
 | `0.00% <0.00%> (-66.67%)` | :arrow_down: |
   | 

[GitHub] [hudi] codecov-commenter commented on pull request #3210: [HUDI-2119] Ensure the rolled-back instance was previously synced to the Metadata Table when syncing a Rollback Instant.

2021-07-01 Thread GitBox


codecov-commenter commented on pull request #3210:
URL: https://github.com/apache/hudi/pull/3210#issuecomment-872684476


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3210?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3210](https://codecov.io/gh/apache/hudi/pull/3210?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (9d10483) into 
[master](https://codecov.io/gh/apache/hudi/commit/6eca06d074520140d7bc67b48bd2b9a5b76f0a87?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (6eca06d) will **increase** coverage by `2.23%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3210/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3210?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
   @@ Coverage Diff  @@
   ## master#3210  +/-   ##
   
   + Coverage 47.51%   49.74%   +2.23% 
   + Complexity 5429  406-5023 
   
 Files   922   67 -855 
 Lines 40968 2985   -37983 
 Branches   4105  320-3785 
   
   - Hits  19464 1485   -17979 
   + Misses19780 1365   -18415 
   + Partials   1724  135-1589 
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `?` | |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `?` | |
   | huditimelineservice | `?` | |
   | hudiutilities | `49.74% <ø> (-8.28%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3210?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...ies/exception/HoodieSnapshotExporterException.java](https://codecov.io/gh/apache/hudi/pull/3210/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2V4Y2VwdGlvbi9Ib29kaWVTbmFwc2hvdEV4cG9ydGVyRXhjZXB0aW9uLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[.../apache/hudi/utilities/HoodieSnapshotExporter.java](https://codecov.io/gh/apache/hudi/pull/3210/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0hvb2RpZVNuYXBzaG90RXhwb3J0ZXIuamF2YQ==)
 | `5.17% <0.00%> (-83.63%)` | :arrow_down: |
   | 
[...hudi/utilities/schema/JdbcbasedSchemaProvider.java](https://codecov.io/gh/apache/hudi/pull/3210/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9KZGJjYmFzZWRTY2hlbWFQcm92aWRlci5qYXZh)
 | `0.00% <0.00%> (-72.23%)` | :arrow_down: |
   | 
[...org/apache/hudi/utilities/HDFSParquetImporter.java](https://codecov.io/gh/apache/hudi/pull/3210/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0hERlNQYXJxdWV0SW1wb3J0ZXIuamF2YQ==)
 | `0.00% <0.00%> (-71.82%)` | :arrow_down: |
   | 
[...he/hudi/utilities/transform/AWSDmsTransformer.java](https://codecov.io/gh/apache/hudi/pull/3210/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3RyYW5zZm9ybS9BV1NEbXNUcmFuc2Zvcm1lci5qYXZh)
 | `0.00% <0.00%> (-66.67%)` | :arrow_down: |
   | 
[...in/java/org/apache/hudi/utilities/UtilHelpers.java](https://codecov.io/gh/apache/hudi/pull/3210/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL1V0aWxIZWxwZXJzLmphdmE=)
 | 

[jira] [Commented] (HUDI-1447) DeltaStreamer kafka source supports consuming from specified timestamp

2021-07-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373183#comment-17373183
 ] 

ASF GitHub Bot commented on HUDI-1447:
--

hudi-bot edited a comment on pull request #2438:
URL: https://github.com/apache/hudi/pull/2438#issuecomment-863310563


   
   ## CI report:
   
   * 5e8ab52b0e139333c4c003932c55ff6e88302206 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=565)
 
   * 1bbcdb44cbb0ab9ac84c95a48fea5d7f38a8f657 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=633)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> DeltaStreamer kafka source supports consuming from specified timestamp
> --
>
> Key: HUDI-1447
> URL: https://issues.apache.org/jira/browse/HUDI-1447
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: DeltaStreamer
>Reporter: wangxianghu#1
>Assignee: liujinhui
>Priority: Major
>  Labels: pull-request-available, sev:high, user-support-issues
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #2438: [HUDI-1447] DeltaStreamer kafka source supports consuming from specified timestamp

2021-07-01 Thread GitBox


hudi-bot edited a comment on pull request #2438:
URL: https://github.com/apache/hudi/pull/2438#issuecomment-863310563


   
   ## CI report:
   
   * 5e8ab52b0e139333c4c003932c55ff6e88302206 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=565)
 
   * 1bbcdb44cbb0ab9ac84c95a48fea5d7f38a8f657 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=633)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-1447) DeltaStreamer kafka source supports consuming from specified timestamp

2021-07-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373182#comment-17373182
 ] 

ASF GitHub Bot commented on HUDI-1447:
--

hudi-bot edited a comment on pull request #2438:
URL: https://github.com/apache/hudi/pull/2438#issuecomment-863310563


   
   ## CI report:
   
   * 5e8ab52b0e139333c4c003932c55ff6e88302206 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=565)
 
   * 1bbcdb44cbb0ab9ac84c95a48fea5d7f38a8f657 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> DeltaStreamer kafka source supports consuming from specified timestamp
> --
>
> Key: HUDI-1447
> URL: https://issues.apache.org/jira/browse/HUDI-1447
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: DeltaStreamer
>Reporter: wangxianghu#1
>Assignee: liujinhui
>Priority: Major
>  Labels: pull-request-available, sev:high, user-support-issues
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #2438: [HUDI-1447] DeltaStreamer kafka source supports consuming from specified timestamp

2021-07-01 Thread GitBox


hudi-bot edited a comment on pull request #2438:
URL: https://github.com/apache/hudi/pull/2438#issuecomment-863310563


   
   ## CI report:
   
   * 5e8ab52b0e139333c4c003932c55ff6e88302206 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=565)
 
   * 1bbcdb44cbb0ab9ac84c95a48fea5d7f38a8f657 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-1951) Hash Index for HUDI

2021-07-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373181#comment-17373181
 ] 

ASF GitHub Bot commented on HUDI-1951:
--

vinothchandar commented on pull request #3173:
URL: https://github.com/apache/hudi/pull/3173#issuecomment-872683439


   @minihippo can you please rebase this PR again?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Hash Index for HUDI
> ---
>
> Key: HUDI-1951
> URL: https://issues.apache.org/jira/browse/HUDI-1951
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: XiaoyuGeng
>Assignee: XiaoyuGeng
>Priority: Major
>  Labels: pull-request-available
>
> https://cwiki.apache.org/confluence/display/HUDI/RFC+-+29%3A+Hash+Index



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] vinothchandar commented on pull request #3173: [HUDI-1951] Add bucket hash index, compatible with the hive bucket

2021-07-01 Thread GitBox


vinothchandar commented on pull request #3173:
URL: https://github.com/apache/hudi/pull/3173#issuecomment-872683439


   @minihippo can you please rebase this PR again?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2090) When hudi metadata is enabled, querying the table as a different user fails

2021-07-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373179#comment-17373179
 ] 

ASF GitHub Bot commented on HUDI-2090:
--

vinothchandar commented on pull request #3183:
URL: https://github.com/apache/hudi/pull/3183#issuecomment-872683185


   @n3nash Looks like you shepherded the change with gary :). I don't see #795 
actually making this path change. 
   
   Instead of fixing permissions (which may not work in practice, since you can 
have write permission but still not be allowed to chmod), can we create a unique 
folder like `/tmp/hudi_fsview_map-`? That should be fully backwards compatible. 
Thoughts?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> When hudi metadata is enabled, querying the table as a different user fails
> -
>
> Key: HUDI-2090
> URL: https://issues.apache.org/jira/browse/HUDI-2090
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Common Core
>Affects Versions: 0.8.0
>Reporter: tao meng
>Assignee: tao meng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> When hudi metadata is enabled and a different user queries the table, the 
> query fails.
>  
> The user permissions of the temporary directory generated by DiskBasedMap are 
> incorrect: the directory is only accessible to the user who performed the 
> current operation, so other users cannot access it, which leads to this problem.
> Test steps:
> step1: create a hudi table with metadata enabled.
> step2: create two users (omm, user2)
> step3:  
> 1) use omm to query the hudi table; 
> DiskBasedMap will generate view_map with permissions drwx--.
> 2) then use user2 to query the hudi table;
> now user2 has no right to access the view_map created by omm, so the following 
> exception is thrown:
>      org.apache.hudi.exception.HoodieIOException: IOException when creating 
> ExternalSpillableMap at /tmp/view_map
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
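
The unique-folder suggestion above can be sketched as follows; this is only an 
illustrative outline (not the actual Hudi change), and the class name, path 
prefix, and exception handling are assumptions:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class SpillableMapDirSketch {

  /**
   * Creates a unique spill directory per process instead of a shared /tmp/view_map,
   * so a second user querying the same table never collides with a directory that
   * the first user created with mode 700. The prefix follows the suggestion above;
   * the random suffix is appended by the JDK.
   */
  static Path createSpillDir() {
    try {
      // Uses java.io.tmpdir, which is typically /tmp on Linux.
      return Files.createTempDirectory("hudi_fsview_map-");
    } catch (IOException e) {
      throw new UncheckedIOException("Could not create spill directory", e);
    }
  }

  public static void main(String[] args) {
    System.out.println("Spill dir: " + createSpillDir());
  }
}
```
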


[GitHub] [hudi] vinothchandar commented on pull request #3183: [HUDI-2090] When hudi metadata is enabled, use different users to quer…

2021-07-01 Thread GitBox


vinothchandar commented on pull request #3183:
URL: https://github.com/apache/hudi/pull/3183#issuecomment-872683185


   @n3nash Looks like you shepherded the change with gary :). I don't see #795 
actually making this path change. 
   
   Instead of fixing permissions (which may not work in practice, since you can 
have write permission but still not be allowed to chmod), can we create a unique 
folder like `/tmp/hudi_fsview_map-`? That should be fully backwards compatible. 
Thoughts?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2114) Spark Query MOR Table Written By Flink Return Incorrect Timestamp Value

2021-07-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373173#comment-17373173
 ] 

ASF GitHub Bot commented on HUDI-2114:
--

hudi-bot edited a comment on pull request #3208:
URL: https://github.com/apache/hudi/pull/3208#issuecomment-872163385


   
   ## CI report:
   
   * 7db85c8b1a665ea3cc84d2e085518d100686e8a4 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=632)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Spark Query MOR Table Written By Flink Return Incorrect Timestamp Value
> ---
>
> Key: HUDI-2114
> URL: https://issues.apache.org/jira/browse/HUDI-2114
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Reporter: pengzhiwei
>Assignee: pengzhiwei
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0, 0.10.0
>
>
> Write a MOR table by flink like this:
> {code:java}
> create table h0 (
>  uuid varchar(20),
>  name varchar(10),
>  ts   timestamp(3)
> ) with (
>'connector' = 'hudi',
>'path' = '/xx/xx/',
> 'table.type' = 'MERGE_ON_READ'
> );
> insert into h0 values('id1', 'jim', TIMESTAMP '2021-01-01 00:00:01'){code}
> Querying the table with spark returns an incorrect *ts* value:
> {code:java}
> 'id', 'jim', 1970-01-20 03:22:34.849144{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
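
A timestamp that falls in January 1970, as reported above, is the typical 
signature of an epoch value being read back with a finer time unit than the one 
it was written in. The following is a minimal, hypothetical illustration of that 
effect (plain Java, not Hudi code, and not necessarily the exact root cause of 
this issue):

```java
import java.time.Instant;

public class EpochUnitMismatch {
  public static void main(String[] args) {
    // The value that was written: 2021-01-01 00:00:01 as epoch milliseconds.
    long millis = Instant.parse("2021-01-01T00:00:01Z").toEpochMilli(); // 1609459201000

    // Reading the same number back as if it were MICROseconds collapses it
    // to a date a couple of weeks after the Unix epoch.
    Instant misread = Instant.ofEpochSecond(millis / 1_000_000L, (millis % 1_000_000L) * 1_000L);

    System.out.println("written as millis : " + Instant.ofEpochMilli(millis)); // 2021-01-01T00:00:01Z
    System.out.println("misread as micros : " + misread);                      // 1970-01-19T15:04:19.201Z
  }
}
```
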


[GitHub] [hudi] hudi-bot edited a comment on pull request #3208: [HUDI-2114] Spark Query MOR Table Written By Flink Return Incorrect T…

2021-07-01 Thread GitBox


hudi-bot edited a comment on pull request #3208:
URL: https://github.com/apache/hudi/pull/3208#issuecomment-872163385


   
   ## CI report:
   
   * 7db85c8b1a665ea3cc84d2e085518d100686e8a4 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=632)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2045) Support Read Hoodie As DataSource Table For Flink And DeltaStreamer

2021-07-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373169#comment-17373169
 ] 

ASF GitHub Bot commented on HUDI-2045:
--

hudi-bot edited a comment on pull request #3120:
URL: https://github.com/apache/hudi/pull/3120#issuecomment-864760893


   
   ## CI report:
   
   * aaca30fffd1ea37f803f51ef3cf49c59ed79badc UNKNOWN
   * fcd06c8bccfc90b272b51d3511094e6617ec25bd UNKNOWN
   * 96947d0419df5f8bab10072eb64afecd29326e55 UNKNOWN
   * 02acd1127b72470f6d7adffb787179f0cddfa954 UNKNOWN
   * 504a6770be5d4cd3a78d61129be5b1aaadd515df UNKNOWN
   * 75aadbc834d6606527764468dd3dbcb1e802b171 UNKNOWN
   * f14ffb1f08820146e5d26616aa9b956ff99ec604 UNKNOWN
   * 06dff3c437b7b3f1aa227b700cf8c34669b067ed UNKNOWN
   * 97ba05a69199cff86cebbe25732097e3a68284f1 UNKNOWN
   * 3948fff7aacd6c97dcbe053a59a1208dae875607 UNKNOWN
   * 8ff6a0af2f53984c5864b04156a5b942400811c3 UNKNOWN
   * adba2dccf6e41da3bde98e5ed622cfd4b39554e9 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=515)
 
   * dbcd6ae0092c067c4bc364355ca7fa8129f46a39 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Support Read Hoodie As DataSource Table For Flink And DeltaStreamer
> ---
>
> Key: HUDI-2045
> URL: https://issues.apache.org/jira/browse/HUDI-2045
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Hive Integration
>Reporter: pengzhiwei
>Assignee: pengzhiwei
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Currently we only support reading a hoodie table as a datasource table for 
> spark, since [https://github.com/apache/hudi/pull/2283]
> In order to support this feature for flink and DeltaStreamer, we need to sync 
> the spark table properties needed by a datasource table to the metastore in 
> HiveSyncTool.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3120: [HUDI-2045] Support Read Hoodie As DataSource Table For Flink And Del…

2021-07-01 Thread GitBox


hudi-bot edited a comment on pull request #3120:
URL: https://github.com/apache/hudi/pull/3120#issuecomment-864760893


   
   ## CI report:
   
   * aaca30fffd1ea37f803f51ef3cf49c59ed79badc UNKNOWN
   * fcd06c8bccfc90b272b51d3511094e6617ec25bd UNKNOWN
   * 96947d0419df5f8bab10072eb64afecd29326e55 UNKNOWN
   * 02acd1127b72470f6d7adffb787179f0cddfa954 UNKNOWN
   * 504a6770be5d4cd3a78d61129be5b1aaadd515df UNKNOWN
   * 75aadbc834d6606527764468dd3dbcb1e802b171 UNKNOWN
   * f14ffb1f08820146e5d26616aa9b956ff99ec604 UNKNOWN
   * 06dff3c437b7b3f1aa227b700cf8c34669b067ed UNKNOWN
   * 97ba05a69199cff86cebbe25732097e3a68284f1 UNKNOWN
   * 3948fff7aacd6c97dcbe053a59a1208dae875607 UNKNOWN
   * 8ff6a0af2f53984c5864b04156a5b942400811c3 UNKNOWN
   * adba2dccf6e41da3bde98e5ed622cfd4b39554e9 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=515)
 
   * dbcd6ae0092c067c4bc364355ca7fa8129f46a39 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2086) redo the logic of mor_incremental_view for hive

2021-07-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373166#comment-17373166
 ] 

ASF GitHub Bot commented on HUDI-2086:
--

hudi-bot edited a comment on pull request #3203:
URL: https://github.com/apache/hudi/pull/3203#issuecomment-872092745


   
   ## CI report:
   
   * 792ae1d9884eac5eb7004e4edb617ea6e86d79b7 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=631)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> redo the logic of mor_incremental_view for hive
> -
>
> Key: HUDI-2086
> URL: https://issues.apache.org/jira/browse/HUDI-2086
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Hive Integration
> Environment: spark3.1.1
> hive3.1.1
> hadoop3.1.1
> os: suse
>Reporter: tao meng
>Assignee: tao meng
>Priority: Major
>  Labels: pull-request-available
>
> There are currently some problems with mor_incremental_view for hive.
> For example:
> 1) *hudi cannot read the latest incremental data that is stored in log files*
> Consider this: create a mor table with bulk_insert, and then do an upsert on 
> this table. 
> Now we want to query the latest incremental data with hive/sparksql; however, 
> the latest incremental data is stored in log files, so the query returns 
> nothing.
> step1: prepare data
> val df = spark.sparkContext.parallelize(0 to 20, 2).map(x => testCase(x, 
> x+"jack", Random.nextInt(2))).toDF()
>  .withColumn("col3", expr("keyid + 3000"))
>  .withColumn("p", lit(1))
> step2: do bulk_insert
> mergePartitionTable(df, 4, "default", "inc", tableType = 
> DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL, op = "bulk_insert")
> step3: do upsert
> mergePartitionTable(df, 4, "default", "inc", tableType = 
> DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL, op = "upsert")
> step4: check the latest commit time and run the query
> spark.sql("set hoodie.inc.consume.mode=INCREMENTAL")
> spark.sql("set hoodie.inc.consume.max.commits=1")
> spark.sql("set hoodie.inc.consume.start.timestamp=20210628103935")
> spark.sql("select keyid, col3 from inc_rt where `_hoodie_commit_time` > 
> '20210628103935' order by keyid").show(100, false)
> +-----+----+
> |keyid|col3|
> +-----+----+
> +-----+----+
>  
> 2) *if we do insert_overwrite/insert_overwrite_table on a hudi mor table, the 
> incremental query result is wrong when we want to query the data written 
> before the insert_overwrite/insert_overwrite_table*
> step1: do bulk_insert 
> mergePartitionTable(df, 4, "default", "overInc", tableType = 
> DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL, op = "bulk_insert")
> now the commits are
> [20210628160614.deltacommit ]
> step2: do insert_overwrite_table
> mergePartitionTable(df, 4, "default", "overInc", tableType = 
> DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL, op = "insert_overwrite_table")
> now the commits are
> [20210628160614.deltacommit, 20210628160923.replacecommit ]
> step3: query the data written before insert_overwrite_table
> spark.sql("set hoodie.overInc.consume.mode=INCREMENTAL")
> spark.sql("set hoodie.overInc.consume.max.commits=1")
> spark.sql("set hoodie.overInc.consume.start.timestamp=0")
> spark.sql("select keyid, col3 from overInc_rt where `_hoodie_commit_time` > 
> '0' order by keyid").show(100, false)
> +-----+----+
> |keyid|col3|
> +-----+----+
> +-----+----+
>  
> 3) *hive/presto/flink cannot read file groups that contain only log files*
> When we use the hbase/inmemory index, a mor table produces log files instead 
> of parquet files, but hive/presto currently cannot read those file groups 
> since they consist only of log files.
> *HUDI-2048* mentions this problem.
>  
> However, when we use the spark data source to execute an incremental query, 
> none of the problems above occur. It is therefore necessary to keep the logic 
> of mor_incremental_view for hive the same as the spark dataSource logic.
> We redo the logic of mor_incremental_view for hive to solve the problems above 
> and keep the logic of mor_incremental_view the same as that of the spark 
> dataSource
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3203: [HUDI-2086] Redo the logic of mor_incremental_view for hive

2021-07-01 Thread GitBox


hudi-bot edited a comment on pull request #3203:
URL: https://github.com/apache/hudi/pull/3203#issuecomment-872092745


   
   ## CI report:
   
   * 792ae1d9884eac5eb7004e4edb617ea6e86d79b7 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=631)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] freeshow commented on issue #3188: java.lang.NoSuchMethodError: org.apache.hudi.org.apache.jetty.server.session.SessionHandler.setHttpOnly(Z)V[SUPPORT]

2021-07-01 Thread GitBox


freeshow commented on issue #3188:
URL: https://github.com/apache/hudi/issues/3188#issuecomment-872674173


   > @freeshow Try this :
   > 
   > ```
   > hoodie.embed.timeline.server=false
   > ```
   
   Thanks, it works.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2086) redo the logic of mor_incremental_view for hive

2021-07-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373152#comment-17373152
 ] 

ASF GitHub Bot commented on HUDI-2086:
--

hudi-bot edited a comment on pull request #3203:
URL: https://github.com/apache/hudi/pull/3203#issuecomment-872092745


   
   ## CI report:
   
   * 5d49bd4e8c638ffb9ced102dc6771d6291c199e3 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=603)
 
   * 792ae1d9884eac5eb7004e4edb617ea6e86d79b7 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=631)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> redo the logic of mor_incremental_view for hive
> -
>
> Key: HUDI-2086
> URL: https://issues.apache.org/jira/browse/HUDI-2086
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Hive Integration
> Environment: spark3.1.1
> hive3.1.1
> hadoop3.1.1
> os: suse
>Reporter: tao meng
>Assignee: tao meng
>Priority: Major
>  Labels: pull-request-available
>
> There are currently some problems with mor_incremental_view for hive.
> For example:
> 1) *hudi cannot read the latest incremental data that is stored in log files*
> Consider this: create a mor table with bulk_insert, and then do an upsert on 
> this table. 
> Now we want to query the latest incremental data with hive/sparksql; however, 
> the latest incremental data is stored in log files, so the query returns 
> nothing.
> step1: prepare data
> val df = spark.sparkContext.parallelize(0 to 20, 2).map(x => testCase(x, 
> x+"jack", Random.nextInt(2))).toDF()
>  .withColumn("col3", expr("keyid + 3000"))
>  .withColumn("p", lit(1))
> step2: do bulk_insert
> mergePartitionTable(df, 4, "default", "inc", tableType = 
> DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL, op = "bulk_insert")
> step3: do upsert
> mergePartitionTable(df, 4, "default", "inc", tableType = 
> DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL, op = "upsert")
> step4: check the latest commit time and run the query
> spark.sql("set hoodie.inc.consume.mode=INCREMENTAL")
> spark.sql("set hoodie.inc.consume.max.commits=1")
> spark.sql("set hoodie.inc.consume.start.timestamp=20210628103935")
> spark.sql("select keyid, col3 from inc_rt where `_hoodie_commit_time` > 
> '20210628103935' order by keyid").show(100, false)
> +-----+----+
> |keyid|col3|
> +-----+----+
> +-----+----+
>  
> 2) *if we do insert_overwrite/insert_overwrite_table on a hudi mor table, the 
> incremental query result is wrong when we want to query the data written 
> before the insert_overwrite/insert_overwrite_table*
> step1: do bulk_insert 
> mergePartitionTable(df, 4, "default", "overInc", tableType = 
> DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL, op = "bulk_insert")
> now the commits are
> [20210628160614.deltacommit ]
> step2: do insert_overwrite_table
> mergePartitionTable(df, 4, "default", "overInc", tableType = 
> DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL, op = "insert_overwrite_table")
> now the commits are
> [20210628160614.deltacommit, 20210628160923.replacecommit ]
> step3: query the data written before insert_overwrite_table
> spark.sql("set hoodie.overInc.consume.mode=INCREMENTAL")
> spark.sql("set hoodie.overInc.consume.max.commits=1")
> spark.sql("set hoodie.overInc.consume.start.timestamp=0")
> spark.sql("select keyid, col3 from overInc_rt where `_hoodie_commit_time` > 
> '0' order by keyid").show(100, false)
> +-----+----+
> |keyid|col3|
> +-----+----+
> +-----+----+
>  
> 3) *hive/presto/flink cannot read file groups that contain only log files*
> When we use the hbase/inmemory index, a mor table produces log files instead 
> of parquet files, but hive/presto currently cannot read those file groups 
> since they consist only of log files.
> *HUDI-2048* mentions this problem.
>  
> However, when we use the spark data source to execute an incremental query, 
> none of the problems above occur. It is therefore necessary to keep the logic 
> of mor_incremental_view for hive the same as the spark dataSource logic.
> We redo the logic of mor_incremental_view for hive to solve the problems above 
> and keep the logic of mor_incremental_view the same as that of the spark 
> dataSource
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-2114) Spark Query MOR Table Written By Flink Return Incorrect Timestamp Value

2021-07-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373153#comment-17373153
 ] 

ASF GitHub Bot commented on HUDI-2114:
--

hudi-bot edited a comment on pull request #3208:
URL: https://github.com/apache/hudi/pull/3208#issuecomment-872163385


   
   ## CI report:
   
   * 14b39be069c0155fb3292f17305ed51428c1399a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=622)
 
   * 7db85c8b1a665ea3cc84d2e085518d100686e8a4 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=632)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Spark Query MOR Table Written By Flink Return Incorrect Timestamp Value
> ---
>
> Key: HUDI-2114
> URL: https://issues.apache.org/jira/browse/HUDI-2114
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Reporter: pengzhiwei
>Assignee: pengzhiwei
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0, 0.10.0
>
>
> Write a MOR table by flink like this:
> {code:java}
> create table h0 (
>  uuid varchar(20),
>  name varchar(10),
>  ts   timestamp(3)
> ) with (
>'connector' = 'hudi',
>'path' = '/xx/xx/',
> 'table.type' = 'MERGE_ON_READ'
> );
> insert into h0 values('id1', 'jim', TIMESTAMP '2021-01-01 00:00:01'){code}
> Querying the table with spark returns an incorrect *ts* value:
> {code:java}
> 'id', 'jim', 1970-01-20 03:22:34.849144{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3208: [HUDI-2114] Spark Query MOR Table Written By Flink Return Incorrect T…

2021-07-01 Thread GitBox


hudi-bot edited a comment on pull request #3208:
URL: https://github.com/apache/hudi/pull/3208#issuecomment-872163385


   
   ## CI report:
   
   * 14b39be069c0155fb3292f17305ed51428c1399a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=622)
 
   * 7db85c8b1a665ea3cc84d2e085518d100686e8a4 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=632)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot edited a comment on pull request #3203: [HUDI-2086] Redo the logic of mor_incremental_view for hive

2021-07-01 Thread GitBox


hudi-bot edited a comment on pull request #3203:
URL: https://github.com/apache/hudi/pull/3203#issuecomment-872092745


   
   ## CI report:
   
   * 5d49bd4e8c638ffb9ced102dc6771d6291c199e3 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=603)
 
   * 792ae1d9884eac5eb7004e4edb617ea6e86d79b7 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=631)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2114) Spark Query MOR Table Written By Flink Return Incorrect Timestamp Value

2021-07-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373151#comment-17373151
 ] 

ASF GitHub Bot commented on HUDI-2114:
--

hudi-bot edited a comment on pull request #3208:
URL: https://github.com/apache/hudi/pull/3208#issuecomment-872163385


   
   ## CI report:
   
   * 14b39be069c0155fb3292f17305ed51428c1399a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=622)
 
   * 7db85c8b1a665ea3cc84d2e085518d100686e8a4 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Spark Query MOR Table Written By Flink Return Incorrect Timestamp Value
> ---
>
> Key: HUDI-2114
> URL: https://issues.apache.org/jira/browse/HUDI-2114
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Reporter: pengzhiwei
>Assignee: pengzhiwei
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0, 0.10.0
>
>
> Write a MOR table by flink like this:
> {code:java}
> create table h0 (
>  uuid varchar(20),
>  name varchar(10),
>  ts   timestamp(3)
> ) with (
>'connector' = 'hudi',
>'path' = '/xx/xx/',
> 'table.type' = 'MERGE_ON_READ'
> );
> insert into h0 values('id1', 'jim', TIMESTAMP '2021-01-01 00:00:01'){code}
> Querying the table with spark returns an incorrect *ts* value:
> {code:java}
> 'id', 'jim', 1970-01-20 03:22:34.849144{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-2086) redo the logic of mor_incremental_view for hive

2021-07-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373150#comment-17373150
 ] 

ASF GitHub Bot commented on HUDI-2086:
--

hudi-bot edited a comment on pull request #3203:
URL: https://github.com/apache/hudi/pull/3203#issuecomment-872092745


   
   ## CI report:
   
   * 5d49bd4e8c638ffb9ced102dc6771d6291c199e3 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=603)
 
   * 792ae1d9884eac5eb7004e4edb617ea6e86d79b7 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> redo the logic of mor_incremental_view for hive
> -
>
> Key: HUDI-2086
> URL: https://issues.apache.org/jira/browse/HUDI-2086
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Hive Integration
> Environment: spark3.1.1
> hive3.1.1
> hadoop3.1.1
> os: suse
>Reporter: tao meng
>Assignee: tao meng
>Priority: Major
>  Labels: pull-request-available
>
> There are currently some problems with mor_incremental_view for hive.
> For example:
> 1) *hudi cannot read the latest incremental data that is stored in log files*
> Consider this: create a mor table with bulk_insert, and then do an upsert on 
> this table. 
> Now we want to query the latest incremental data with hive/sparksql; however, 
> the latest incremental data is stored in log files, so the query returns 
> nothing.
> step1: prepare data
> val df = spark.sparkContext.parallelize(0 to 20, 2).map(x => testCase(x, 
> x+"jack", Random.nextInt(2))).toDF()
>  .withColumn("col3", expr("keyid + 3000"))
>  .withColumn("p", lit(1))
> step2: do bulk_insert
> mergePartitionTable(df, 4, "default", "inc", tableType = 
> DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL, op = "bulk_insert")
> step3: do upsert
> mergePartitionTable(df, 4, "default", "inc", tableType = 
> DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL, op = "upsert")
> step4: check the latest commit time and run the query
> spark.sql("set hoodie.inc.consume.mode=INCREMENTAL")
> spark.sql("set hoodie.inc.consume.max.commits=1")
> spark.sql("set hoodie.inc.consume.start.timestamp=20210628103935")
> spark.sql("select keyid, col3 from inc_rt where `_hoodie_commit_time` > 
> '20210628103935' order by keyid").show(100, false)
> +-----+----+
> |keyid|col3|
> +-----+----+
> +-----+----+
>  
> 2) *if we do insert_overwrite/insert_overwrite_table on a hudi mor table, the 
> incremental query result is wrong when we want to query the data written 
> before the insert_overwrite/insert_overwrite_table*
> step1: do bulk_insert 
> mergePartitionTable(df, 4, "default", "overInc", tableType = 
> DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL, op = "bulk_insert")
> now the commits are
> [20210628160614.deltacommit ]
> step2: do insert_overwrite_table
> mergePartitionTable(df, 4, "default", "overInc", tableType = 
> DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL, op = "insert_overwrite_table")
> now the commits are
> [20210628160614.deltacommit, 20210628160923.replacecommit ]
> step3: query the data written before insert_overwrite_table
> spark.sql("set hoodie.overInc.consume.mode=INCREMENTAL")
> spark.sql("set hoodie.overInc.consume.max.commits=1")
> spark.sql("set hoodie.overInc.consume.start.timestamp=0")
> spark.sql("select keyid, col3 from overInc_rt where `_hoodie_commit_time` > 
> '0' order by keyid").show(100, false)
> +-----+----+
> |keyid|col3|
> +-----+----+
> +-----+----+
>  
> 3) *hive/presto/flink cannot read file groups that contain only log files*
> When we use the hbase/inmemory index, a mor table produces log files instead 
> of parquet files, but hive/presto currently cannot read those file groups 
> since they consist only of log files.
> *HUDI-2048* mentions this problem.
>  
> However, when we use the spark data source to execute an incremental query, 
> none of the problems above occur. It is therefore necessary to keep the logic 
> of mor_incremental_view for hive the same as the spark dataSource logic.
> We redo the logic of mor_incremental_view for hive to solve the problems above 
> and keep the logic of mor_incremental_view the same as that of the spark 
> dataSource
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3203: [HUDI-2086] Redo the logic of mor_incremental_view for hive

2021-07-01 Thread GitBox


hudi-bot edited a comment on pull request #3203:
URL: https://github.com/apache/hudi/pull/3203#issuecomment-872092745


   
   ## CI report:
   
   * 5d49bd4e8c638ffb9ced102dc6771d6291c199e3 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=603)
 
   * 792ae1d9884eac5eb7004e4edb617ea6e86d79b7 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot edited a comment on pull request #3208: [HUDI-2114] Spark Query MOR Table Written By Flink Return Incorrect T…

2021-07-01 Thread GitBox


hudi-bot edited a comment on pull request #3208:
URL: https://github.com/apache/hudi/pull/3208#issuecomment-872163385


   
   ## CI report:
   
   * 14b39be069c0155fb3292f17305ed51428c1399a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=622)
 
   * 7db85c8b1a665ea3cc84d2e085518d100686e8a4 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-1904) Make SchemaProvider spark free and move it to hudi-client-common

2021-07-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373149#comment-17373149
 ] 

ASF GitHub Bot commented on HUDI-1904:
--

hudi-bot edited a comment on pull request #2963:
URL: https://github.com/apache/hudi/pull/2963#issuecomment-866559341


   
   ## CI report:
   
   * 8c67c9bee4b43a6306199d1e89d1c785cd083d4c Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=630)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Make SchemaProvider spark free and move it to hudi-client-common
> 
>
> Key: HUDI-1904
> URL: https://issues.apache.org/jira/browse/HUDI-1904
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Xianghu Wang
>Assignee: Xianghu Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Currently, we support spark, flink and java clients to operate hudi 
> tables. The "common" stuff like `SchemaProvider` should be extracted and 
> moved out of the `hudi-utilities` module so it can be shared with other engines



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
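
A rough sketch of what an engine-agnostic schema provider could look like once 
the Spark dependency is removed; the interface and method names below are 
illustrative assumptions, not the code from the PR:

```java
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;

/** Illustrative only: depends solely on Avro, so Spark, Flink and the Java
 *  client could all share it from hudi-client-common. */
interface EngineAgnosticSchemaProvider {
  Schema getSourceSchema();

  /** By default, write with the same schema that was read. */
  default Schema getTargetSchema() {
    return getSourceSchema();
  }
}

public class SchemaProviderSketch {
  public static void main(String[] args) {
    // A hypothetical provider returning a hand-built Avro schema.
    EngineAgnosticSchemaProvider provider = () -> SchemaBuilder.record("trip")
        .fields()
        .requiredString("uuid")
        .requiredDouble("fare")
        .endRecord();
    System.out.println(provider.getTargetSchema().toString(true));
  }
}
```
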


[GitHub] [hudi] hudi-bot edited a comment on pull request #2963: [HUDI-1904] Introduce SchemaProviderInterface to make SchemaProvider unified

2021-07-01 Thread GitBox


hudi-bot edited a comment on pull request #2963:
URL: https://github.com/apache/hudi/pull/2963#issuecomment-866559341


   
   ## CI report:
   
   * 8c67c9bee4b43a6306199d1e89d1c785cd083d4c Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=630)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-1468) incremental read support with clustering

2021-07-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373143#comment-17373143
 ] 

ASF GitHub Bot commented on HUDI-1468:
--

hudi-bot edited a comment on pull request #3211:
URL: https://github.com/apache/hudi/pull/3211#issuecomment-872639166


   
   ## CI report:
   
   * ab7bacb26d44f383e7f61ec81531b34011f1383b Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=629)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> incremental read support with clustering
> 
>
> Key: HUDI-1468
> URL: https://issues.apache.org/jira/browse/HUDI-1468
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Incremental Pull
>Affects Versions: 0.9.0
>Reporter: satish
>Assignee: liwei
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> As part of clustering, metadata such as hoodie_commit_time changes for 
> records that are clustered. This is specific to the 
> SparkBulkInsertBasedRunClusteringStrategy implementation. Figure out a way to 
> carry the commit_time from the original record to support incremental queries.
> Also, incremental queries don't work with the 'replacecommit' used by 
> clustering (HUDI-1264). Change incremental queries to work for replacecommits 
> created by clustering.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3211: [HUDI-1468] Support more flexible clustering strategies and preserve commit …

2021-07-01 Thread GitBox


hudi-bot edited a comment on pull request #3211:
URL: https://github.com/apache/hudi/pull/3211#issuecomment-872639166


   
   ## CI report:
   
   * ab7bacb26d44f383e7f61ec81531b34011f1383b Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=629)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] wangxianghu commented on issue #3188: java.lang.NoSuchMethodError: org.apache.hudi.org.apache.jetty.server.session.SessionHandler.setHttpOnly(Z)V[SUPPORT]

2021-07-01 Thread GitBox


wangxianghu commented on issue #3188:
URL: https://github.com/apache/hudi/issues/3188#issuecomment-872647851


   @freeshow Try this :
   ```
   hoodie.embed.timeline.server=false
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-1904) Make SchemaProvider spark free and move it to hudi-client-common

2021-07-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373130#comment-17373130
 ] 

ASF GitHub Bot commented on HUDI-1904:
--

hudi-bot edited a comment on pull request #2963:
URL: https://github.com/apache/hudi/pull/2963#issuecomment-866559341


   
   ## CI report:
   
   * ce253fe711d93b46c4afa933550c82bf6500cac0 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=430)
 
   * 8c67c9bee4b43a6306199d1e89d1c785cd083d4c Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=630)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Make SchemaProvider spark free and move it to hudi-client-common
> 
>
> Key: HUDI-1904
> URL: https://issues.apache.org/jira/browse/HUDI-1904
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Xianghu Wang
>Assignee: Xianghu Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Currently, we support spark, flink and java clients to operate hudi 
> tables. The "common" stuff like `SchemaProvider` should be extracted and 
> moved out of the `hudi-utilities` module so it can be shared with other engines



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #2963: [HUDI-1904] Introduce SchemaProviderInterface to make SchemaProvider unified

2021-07-01 Thread GitBox


hudi-bot edited a comment on pull request #2963:
URL: https://github.com/apache/hudi/pull/2963#issuecomment-866559341


   
   ## CI report:
   
   * ce253fe711d93b46c4afa933550c82bf6500cac0 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=430)
 
   * 8c67c9bee4b43a6306199d1e89d1c785cd083d4c Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=630)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-1904) Make SchemaProvider spark free and move it to hudi-client-common

2021-07-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373129#comment-17373129
 ] 

ASF GitHub Bot commented on HUDI-1904:
--

hudi-bot edited a comment on pull request #2963:
URL: https://github.com/apache/hudi/pull/2963#issuecomment-866559341


   
   ## CI report:
   
   * ce253fe711d93b46c4afa933550c82bf6500cac0 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=430)
 
   * 8c67c9bee4b43a6306199d1e89d1c785cd083d4c UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Make SchemaProvider spark free and move it to hudi-client-common
> 
>
> Key: HUDI-1904
> URL: https://issues.apache.org/jira/browse/HUDI-1904
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Xianghu Wang
>Assignee: Xianghu Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Currently, we support spark, flink and java clients to operate hudi 
> tables. The "common" stuff like `SchemaProvider` should be extracted and 
> moved out of the `hudi-utilities` module so it can be shared with other engines



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #2963: [HUDI-1904] Introduce SchemaProviderInterface to make SchemaProvider unified

2021-07-01 Thread GitBox


hudi-bot edited a comment on pull request #2963:
URL: https://github.com/apache/hudi/pull/2963#issuecomment-866559341


   
   ## CI report:
   
   * ce253fe711d93b46c4afa933550c82bf6500cac0 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=430)
 
   * 8c67c9bee4b43a6306199d1e89d1c785cd083d4c UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2116) Syncing 100k partitions to hive with HiveSyncTool leads to OOM of the hive MetaStore

2021-07-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373127#comment-17373127
 ] 

ASF GitHub Bot commented on HUDI-2116:
--

yanghua commented on pull request #3209:
URL: https://github.com/apache/hudi/pull/3209#issuecomment-872645071


   @xiarixiaoyao The title of this PR describes the problem; a better title would 
describe what you want to do in the PR. WDYT?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


>  Syncing 100k partitions to hive with HiveSyncTool leads to OOM of the hive 
> MetaStore
> 
>
> Key: HUDI-2116
> URL: https://issues.apache.org/jira/browse/HUDI-2116
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Hive Integration
>Affects Versions: 0.8.0
> Environment: hive3.1.1
> hadoop 3.1.1
>Reporter: tao meng
>Assignee: tao meng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> When we try to sync 100k partitions to hive using HiveSyncTool, the hive 
> MetaStore runs out of memory.
>  
> Here is a stress test for HiveSyncTool
> env: 
> hive metastore -Xms16G -Xmx16G
> hive.metastore.client.socket.timeout=10800
>  
> ||partitionNum||time consumed||
> |100|37s|
> |1000|168s|
> |5000|1830s|
> |10000|timeout|
> |100000|hive metastore oom|
> HiveSyncTool syncs all partitions to the hive metastore at once. When the 
> number of partitions is large, this puts a lot of pressure on the metastore; 
> for large partition counts we should support batch sync.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
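
The batch-sync idea described above can be sketched roughly like this; it is an 
illustrative outline only, not HiveSyncTool's actual API, and the partition 
format, batch size, and metastore callback are placeholders:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class BatchedPartitionSync {

  /** Pushes partitions to the metastore in fixed-size chunks so that no single
   *  call carries 100k partitions at once. */
  static void syncInBatches(List<String> partitionPaths, int batchSize,
                            Consumer<List<String>> addPartitionsToMetastore) {
    for (int i = 0; i < partitionPaths.size(); i += batchSize) {
      List<String> batch =
          partitionPaths.subList(i, Math.min(i + batchSize, partitionPaths.size()));
      addPartitionsToMetastore.accept(new ArrayList<>(batch));
    }
  }

  public static void main(String[] args) {
    List<String> partitions = new ArrayList<>();
    for (int i = 0; i < 100_000; i++) {
      partitions.add("dt=2021-07-" + (i % 31));
    }
    // Placeholder for the real metastore call (e.g. an ALTER TABLE ... ADD PARTITION batch).
    syncInBatches(partitions, 1_000, batch ->
        System.out.println("syncing a batch of " + batch.size() + " partitions"));
  }
}
```
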


[GitHub] [hudi] yanghua commented on pull request #3209: [HUDI-2116] Sync 10w partitions to hive by using HiveSyncTool lead to …

2021-07-01 Thread GitBox


yanghua commented on pull request #3209:
URL: https://github.com/apache/hudi/pull/3209#issuecomment-872645071


   @xiarixiaoyao The title of this PR describes the problem; a better title would 
describe what you want to do in the PR. WDYT?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] freeshow edited a comment on issue #3188: java.lang.NoSuchMethodError: org.apache.hudi.org.apache.jetty.server.session.SessionHandler.setHttpOnly(Z)V[SUPPORT]

2021-07-01 Thread GitBox


freeshow edited a comment on issue #3188:
URL: https://github.com/apache/hudi/issues/3188#issuecomment-872643783


   I found that hadoop3.0.0 provides Jetty 9.3, while Hudi depends on Jetty 9.4 
(specifically, SessionHandler.setHttpOnly() doesn't exist in 9.3). I compiled 
with Hadoop3.0.0.
   When I use hadoop2.7, the error does not appear!
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
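
One way to see which jar is actually supplying a class at runtime, which is 
useful when diagnosing a NoSuchMethodError such as the one in this issue, is a 
small utility like the following; the default class name is copied from the 
error message and is only an example:

```java
import java.security.CodeSource;

public class WhichJar {
  // Prints the jar (or directory) a class was loaded from, which helps show
  // whether an older copy of a library is winning on the classpath.
  public static void main(String[] args) throws ClassNotFoundException {
    String className = args.length > 0 ? args[0]
        : "org.apache.hudi.org.apache.jetty.server.session.SessionHandler";
    Class<?> clazz = Class.forName(className);
    CodeSource source = clazz.getProtectionDomain().getCodeSource();
    System.out.println(className + " loaded from: "
        + (source != null ? source.getLocation() : "<bootstrap classpath>"));
  }
}
```
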




[GitHub] [hudi] freeshow commented on issue #3188: java.lang.NoSuchMethodError: org.apache.hudi.org.apache.jetty.server.session.SessionHandler.setHttpOnly(Z)V[SUPPORT]

2021-07-01 Thread GitBox


freeshow commented on issue #3188:
URL: https://github.com/apache/hudi/issues/3188#issuecomment-872643783


   I found that hadoop3.0.0 provides Jetty 9.3, while Hudi depends on Jetty 9.4 
(specifically, SessionHandler.setHttpOnly() doesn't exist in 9.3). I compiled 
with Hadoop3.0.0.
   When I use hadoop2.7, the error does not appear!
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-1872) Move HoodieFlinkStreamer into hudi-utilities module

2021-07-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373126#comment-17373126
 ] 

ASF GitHub Bot commented on HUDI-1872:
--

wangxianghu commented on pull request #3162:
URL: https://github.com/apache/hudi/pull/3162#issuecomment-872641634


   > > Yes, I agree with you on creating a new module, but let's not put it 
under the original hudi-utilities-bundle.
   > > I prefer this:
   > > hudi-utilities-bundle
   > > ├── hudi-flink-utilities-bundle
   > > └── hudi-spark-utilities-bundle
   > 
   > I am not suggesting creating modules under `hudi-utilities-bundle`; also, 
in the bundle we are not adding any classes. Since we are moving some Flink 
classes to the `hudi-utilities` module, we will have to create modules under it, 
something like
   > 
   > hudi-utilities
   > ├── hudi-flink-utilities
   > └── hudi-spark-utilities
   > 
   > and then we can add these two as part of `hudi-utilities-bundle` here 
https://github.com/apache/hudi/blob/master/packaging/hudi-utilities-bundle/pom.xml#L70
 like the way we have added the `hudi-hive-sync` module from the `hudi-sync` project.
   
   The utilities-bundle jars should be engine-specific: one jar per 
engine (with different dependencies).
   If you want to put these bundles under `hudi-utilities-bundle`, it could be:
   ```
hudi-utilities-bundle
├── hudi-flink-utilities-bundle
└── hudi-spark-utilities-bundle
   ```
   But neither layout preserves backward compatibility: users would have to 
switch to the new jars (hudi-xxx-utilities) to start their jobs.
   
   Another option:
   add a new bundle for `HoodieFlinkStreamer`, i.e. `hudi-flink-utilities`, and 
leave the others untouched.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Move HoodieFlinkStreamer into hudi-utilities module
> ---
>
> Key: HUDI-1872
> URL: https://issues.apache.org/jira/browse/HUDI-1872
> Project: Apache Hudi
>  Issue Type: Task
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Vinay
>Priority: Major
>  Labels: pull-request-available, sev:normal
> Fix For: 0.9.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] wangxianghu commented on pull request #3162: [HUDI-1872] Move HoodieFlinkStreamer to hudi-utilities module

2021-07-01 Thread GitBox


wangxianghu commented on pull request #3162:
URL: https://github.com/apache/hudi/pull/3162#issuecomment-872641634


   > > Yes, I agree with you on creating a new module, but let's not put it 
under the original hudi-utilities-bundle.
   > > I prefer this:
   > > hudi-utilities-bundle
   > > ├── hudi-flink-utilities-bundle
   > > └── hudi-spark-utilities-bundle
   > 
   > I am not suggesting creating modules under `hudi-utilities-bundle`; also, 
in the bundle we are not adding any classes. Since we are moving some Flink 
classes to the `hudi-utilities` module, we will have to create modules under it, 
something like
   > 
   > hudi-utilities
   > ├── hudi-flink-utilities
   > └── hudi-spark-utilities
   > 
   > and then we can add these two as part of `hudi-utilities-bundle` here 
https://github.com/apache/hudi/blob/master/packaging/hudi-utilities-bundle/pom.xml#L70
 like the way we have added the `hudi-hive-sync` module from the `hudi-sync` project.
   
   The utilities-bundle jars should be engine-specific: one jar per 
engine (with different dependencies).
   If you want to put these bundles under `hudi-utilities-bundle`, it could be:
   ```
hudi-utilities-bundle
├── hudi-flink-utilities-bundle
└── hudi-spark-utilities-bundle
   ```
   But neither layout preserves backward compatibility: users would have to 
switch to the new jars (hudi-xxx-utilities) to start their jobs.
   
   Another option:
   add a new bundle for `HoodieFlinkStreamer`, i.e. `hudi-flink-utilities`, and 
leave the others untouched.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2113) Fix integration testing failure caused by sql results out of order

2021-07-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373125#comment-17373125
 ] 

ASF GitHub Bot commented on HUDI-2113:
--

yanghua commented on pull request #3204:
URL: https://github.com/apache/hudi/pull/3204#issuecomment-872640137


   @n3nash Would you mind giving this a double check?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Fix integration testing failure caused by sql results out of order
> --
>
> Key: HUDI-2113
> URL: https://issues.apache.org/jira/browse/HUDI-2113
> Project: Apache Hudi
>  Issue Type: Test
>  Components: Testing
>Reporter: XiaoyuGeng
>Assignee: XiaoyuGeng
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)



[jira] [Commented] (HUDI-1468) incremental read support with clustering

2021-07-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373124#comment-17373124
 ] 

ASF GitHub Bot commented on HUDI-1468:
--

hudi-bot edited a comment on pull request #3211:
URL: https://github.com/apache/hudi/pull/3211#issuecomment-872639166


   
   ## CI report:
   
   * ab7bacb26d44f383e7f61ec81531b34011f1383b Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=629)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> incremental read support with clustering
> 
>
> Key: HUDI-1468
> URL: https://issues.apache.org/jira/browse/HUDI-1468
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Incremental Pull
>Affects Versions: 0.9.0
>Reporter: satish
>Assignee: liwei
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> As part of clustering, metadata such as hoodie_commit_time changes for 
> records that are clustered. This is specific to the 
> SparkBulkInsertBasedRunClusteringStrategy implementation. Figure out a way to 
> carry commit_time from the original record to support incremental queries.
> Also, incremental queries don't work with the 'replacecommit' action used by 
> clustering (HUDI-1264). Change incremental queries to work for replacecommits 
> created by clustering.
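
For context, below is a minimal sketch of the kind of incremental query this 
issue is about, using the standard Hudi Spark DataSource incremental read 
options; the table path and begin instant time are placeholders, not values 
from this issue:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

// Sketch: an incremental read that should keep working after clustering, i.e.
// records rewritten by clustering should still be filtered by their original
// commit time. Table path and begin instant time below are placeholders.
public class IncrementalReadExample {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("hudi-incremental-read-example")
        .master("local[*]")
        .getOrCreate();

    Dataset<Row> changes = spark.read()
        .format("hudi")
        .option("hoodie.datasource.query.type", "incremental")
        .option("hoodie.datasource.read.begin.instanttime", "20210630000000")
        .load("/tmp/hudi/example_table");

    changes.show();
    spark.stop();
  }
}
```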



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Commented] (HUDI-1468) incremental read support with clustering

2021-07-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373122#comment-17373122
 ] 

ASF GitHub Bot commented on HUDI-1468:
--

hudi-bot commented on pull request #3211:
URL: https://github.com/apache/hudi/pull/3211#issuecomment-872639166


   
   ## CI report:
   
   * ab7bacb26d44f383e7f61ec81531b34011f1383b UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> incremental read support with clustering
> 
>
> Key: HUDI-1468
> URL: https://issues.apache.org/jira/browse/HUDI-1468
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Incremental Pull
>Affects Versions: 0.9.0
>Reporter: satish
>Assignee: liwei
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> As part of clustering, metadata such as hoodie_commit_time changes for 
> records that are clustered. This is specific to the 
> SparkBulkInsertBasedRunClusteringStrategy implementation. Figure out a way to 
> carry commit_time from the original record to support incremental queries.
> Also, incremental queries don't work with the 'replacecommit' action used by 
> clustering (HUDI-1264). Change incremental queries to work for replacecommits 
> created by clustering.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-1468) incremental read support with clustering

2021-07-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373123#comment-17373123
 ] 

ASF GitHub Bot commented on HUDI-1468:
--

satishkotha commented on pull request #3211:
URL: https://github.com/apache/hudi/pull/3211#issuecomment-872639220


   @n3nash @vinothchandar this includes all my changes for supporting 
encryption-style use cases using the clustering framework. I still need to port 
some tests, but please take a look and add any comments.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> incremental read support with clustering
> 
>
> Key: HUDI-1468
> URL: https://issues.apache.org/jira/browse/HUDI-1468
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Incremental Pull
>Affects Versions: 0.9.0
>Reporter: satish
>Assignee: liwei
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> As part of clustering, metadata such as hoodie_commit_time changes for 
> records that are clustered. This is specific to the 
> SparkBulkInsertBasedRunClusteringStrategy implementation. Figure out a way to 
> carry commit_time from the original record to support incremental queries.
> Also, incremental queries don't work with the 'replacecommit' action used by 
> clustering (HUDI-1264). Change incremental queries to work for replacecommits 
> created by clustering.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Commented] (HUDI-1468) incremental read support with clustering

2021-07-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373121#comment-17373121
 ] 

ASF GitHub Bot commented on HUDI-1468:
--

satishkotha opened a new pull request #3211:
URL: https://github.com/apache/hudi/pull/3211


   
   ## What is the purpose of the pull request
   Support custom clustering strategies and preserve the commit time to support 
incremental reads.
   
   ## Brief change log
   
   * Introduce a new way of running clustering using 
SingleSparkJobExecutionStrategy for use cases that don't need sorting.
   * Push down more logic into the clustering strategies to avoid an RDD union.
   * Make some performance improvements identified while running at large scale; 
avoid collecting the RDD multiple times.
   * Preserve the Hoodie commit time (optional, for backward compatibility) while 
rewriting the data (a minimal sketch of this idea follows below).
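   
   A minimal illustrative sketch of the commit-time preservation idea only (not 
this PR's actual implementation; the class and method names here are 
hypothetical, and only the `_hoodie_commit_time` metadata column is the standard 
Hudi field):
   
   ```java
import org.apache.avro.generic.GenericRecord;

/**
 * Sketch only: when clustering rewrites a record, optionally carry the original
 * _hoodie_commit_time onto the rewritten record instead of stamping the new
 * replacecommit instant, so incremental queries still see the original commit.
 */
public class CommitTimePreservingRewrite {

  private static final String COMMIT_TIME_FIELD = "_hoodie_commit_time";

  public static GenericRecord rewrite(GenericRecord original,
                                      GenericRecord rewritten,
                                      boolean preserveCommitMetadata) {
    if (preserveCommitMetadata) {
      // Copy the pre-clustering commit time forward onto the rewritten record.
      rewritten.put(COMMIT_TIME_FIELD, original.get(COMMIT_TIME_FIELD));
    }
    return rewritten;
  }
}
   ```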
   
   
   ## Verify this pull request
   
   This change added tests and can be verified as follows:
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> incremental read support with clustering
> 
>
> Key: HUDI-1468
> URL: https://issues.apache.org/jira/browse/HUDI-1468
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Incremental Pull
>Affects Versions: 0.9.0
>Reporter: satish
>Assignee: liwei
>Priority: Blocker
> Fix For: 0.9.0
>
>
> As part of clustering, metadata such as hoodie_commit_time changes for 
> records that are clustered. This is specific to the 
> SparkBulkInsertBasedRunClusteringStrategy implementation. Figure out a way to 
> carry commit_time from the original record to support incremental queries.
> Also, incremental queries don't work with the 'replacecommit' action used by 
> clustering (HUDI-1264). Change incremental queries to work for replacecommits 
> created by clustering.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1468) incremental read support with clustering

2021-07-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-1468:
-
Labels: pull-request-available  (was: )

> incremental read support with clustering
> 
>
> Key: HUDI-1468
> URL: https://issues.apache.org/jira/browse/HUDI-1468
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Incremental Pull
>Affects Versions: 0.9.0
>Reporter: satish
>Assignee: liwei
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> As part of clustering, metadata such as hoodie_commit_time changes for 
> records that are clustered. This is specific to the 
> SparkBulkInsertBasedRunClusteringStrategy implementation. Figure out a way to 
> carry commit_time from the original record to support incremental queries.
> Also, incremental queries don't work with the 'replacecommit' action used by 
> clustering (HUDI-1264). Change incremental queries to work for replacecommits 
> created by clustering.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)



