[GitHub] [hudi] codecov-io commented on pull request #2708: [HUDI-1712] Rename & standardize config to match other configs
codecov-io commented on pull request #2708: URL: https://github.com/apache/hudi/pull/2708#issuecomment-805560787 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2708?src=pr&el=h1) Report > Merging [#2708](https://codecov.io/gh/apache/hudi/pull/2708?src=pr&el=desc) (1aaf8b1) into [master](https://codecov.io/gh/apache/hudi/commit/0e6909d3e241c794ed1b9318fcb9142a36cb0133?el=desc) (0e6909d) will **decrease** coverage by `37.03%`. > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2708/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2708?src=pr&el=tree) ```diff @@ Coverage Diff @@ ## master #2708 +/- ## - Coverage 46.43% 9.40% -37.04% + Complexity 3278 48 -3230 Files 476 54 -422 Lines 22583 1989 -20594 Branches 2408 236 -2172 - Hits 10487 187 -10300 + Misses 11196 1789 -9407 + Partials 900 13 -887 ``` | Flag | Coverage Δ | Complexity Δ | | |---|---|---|---| | hudicli | `?` | `?` | | | hudiclient | `?` | `?` | | | hudicommon | `?` | `?` | | | hudiflink | `?` | `?` | | | hudihadoopmr | `?` | `?` | | | hudisparkdatasource | `?` | `?` | | | hudisync | `?` | `?` | | | huditimelineservice | `?` | `?` | | | hudiutilities | `9.40% <ø> (ø)` | `0.00 <ø> (ø)` | | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more. | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2708?src=pr&el=tree) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [...ava/org/apache/hudi/exception/HoodieException.java](https://codecov.io/gh/apache/hudi/pull/2708/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZXhjZXB0aW9uL0hvb2RpZUV4Y2VwdGlvbi5qYXZh) | | | | | [.../org/apache/hudi/sink/InstantGenerateOperator.java](https://codecov.io/gh/apache/hudi/pull/2708/diff?src=pr&el=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL0luc3RhbnRHZW5lcmF0ZU9wZXJhdG9yLmphdmE=) | | | | | [...in/java/org/apache/hudi/common/model/BaseFile.java](https://codecov.io/gh/apache/hudi/pull/2708/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0Jhc2VGaWxlLmphdmE=) | | | | | [...in/java/org/apache/hudi/cli/HoodiePrintHelper.java](https://codecov.io/gh/apache/hudi/pull/2708/diff?src=pr&el=tree#diff-aHVkaS1jbGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpL0hvb2RpZVByaW50SGVscGVyLmphdmE=) | | | | | [...util/jvm/OpenJ9MemoryLayoutSpecification64bit.java](https://codecov.io/gh/apache/hudi/pull/2708/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvanZtL09wZW5KOU1lbW9yeUxheW91dFNwZWNpZmljYXRpb242NGJpdC5qYXZh) | | | | | [...mmon/table/log/block/HoodieDeleteBlockVersion.java](https://codecov.io/gh/apache/hudi/pull/2708/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL2xvZy9ibG9jay9Ib29kaWVEZWxldGVCbG9ja1ZlcnNpb24uamF2YQ==) | | | | | [.../org/apache/hudi/common/model/BaseAvroPayload.java](https://codecov.io/gh/apache/hudi/pull/2708/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0Jhc2VBdnJvUGF5bG9hZC5qYXZh) | | | | | [...e/hudi/common/table/log/HoodieLogFormatWriter.java](https://codecov.io/gh/apache/hudi/pull/2708/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL2xvZy9Ib29kaWVMb2dGb3JtYXRXcml0ZXIuamF2YQ==) | | | | | 
[...che/hudi/common/util/collection/ImmutablePair.java](https://codecov.io/gh/apache/hudi/pull/2708/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvY29sbGVjdGlvbi9JbW11dGFibGVQYWlyLmphdmE=) | | | | | [...rg/apache/hudi/metadata/HoodieMetadataPayload.java](https://codecov.io/gh/apache/hudi/pull/2708/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0YWRhdGEvSG9vZGllTWV0YWRhdGFQYXlsb2FkLmphdmE=) | | | | | ... and [411 more](https://codecov.io/gh/apache/hudi/pull/2708/diff?src=pr&el=tree-more) | | -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] codecov-io edited a comment on pull request #2645: [HUDI-1659] Basic Implementation Of Spark Sql Support
codecov-io edited a comment on pull request #2645: URL: https://github.com/apache/hudi/pull/2645#issuecomment-792430670 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2645?src=pr&el=h1) Report > Merging [#2645](https://codecov.io/gh/apache/hudi/pull/2645?src=pr&el=desc) (c4316ae) into [master](https://codecov.io/gh/apache/hudi/commit/900de34e45b4c1d19c01ea84adc38413f2bd52ff?el=desc) (900de34) will **increase** coverage by `0.05%`. > The diff coverage is `53.16%`. [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2645/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2645?src=pr&el=tree) ```diff @@ Coverage Diff @@ ## master #2645 +/- ## + Coverage 51.76% 51.82% +0.05% - Complexity 3601 3682 +81 Files 476 493 +17 Lines 22579 23800 +1221 Branches 2407 2672 +265 + Hits 11688 12334 +646 - Misses 9874 10284 +410 - Partials 1017 1182 +165 ``` | Flag | Coverage Δ | Complexity Δ | | |---|---|---|---| | hudicli | `37.01% <ø> (ø)` | `0.00 <ø> (ø)` | | | hudiclient | `100.00% <ø> (ø)` | `0.00 <ø> (ø)` | | | hudicommon | `50.81% <7.31%> (-0.12%)` | `0.00 <2.00> (ø)` | | | hudiflink | `54.13% <ø> (-0.15%)` | `0.00 <ø> (ø)` | | | hudihadoopmr | `33.44% <ø> (ø)` | `0.00 <ø> (ø)` | | | hudisparkdatasource | `63.95% <54.82%> (-6.99%)` | `0.00 <76.00> (ø)` | | | hudisync | `45.50% <0.00%> (-0.20%)` | `0.00 <1.00> (ø)` | | | huditimelineservice | `64.36% <ø> (ø)` | `0.00 <ø> (ø)` | | | hudiutilities | `69.73% <ø> (ø)` | `0.00 <ø> (ø)` | | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more. | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2645?src=pr&el=tree) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [...g/apache/hudi/common/model/HoodiePayloadProps.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0hvb2RpZVBheWxvYWRQcm9wcy5qYXZh) | `0.00% <ø> (ø)` | `0.00 <0.00> (ø)` | | | [...rg/apache/hudi/common/table/HoodieTableConfig.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL0hvb2RpZVRhYmxlQ29uZmlnLmphdmE=) | `43.75% <0.00%> (-1.71%)` | `17.00 <0.00> (ø)` | | | [...pache/hudi/common/table/HoodieTableMetaClient.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL0hvb2RpZVRhYmxlTWV0YUNsaWVudC5qYXZh) | `65.09% <0.00%> (-3.22%)` | `43.00 <0.00> (ø)` | | | [.../apache/hudi/common/table/TableSchemaResolver.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL1RhYmxlU2NoZW1hUmVzb2x2ZXIuamF2YQ==) | `0.00% <0.00%> (ø)` | `0.00 <0.00> (ø)` | | | [...he/hudi/exception/HoodieDuplicateKeyException.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZXhjZXB0aW9uL0hvb2RpZUR1cGxpY2F0ZUtleUV4Y2VwdGlvbi5qYXZh) | `0.00% <0.00%> (ø)` | `0.00 <0.00> (?)` | | | [...n/scala/org/apache/hudi/HoodieSparkSqlWriter.scala](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL0hvb2RpZVNwYXJrU3FsV3JpdGVyLnNjYWxh) | `57.79% <ø> (ø)` | `0.00 <0.00> (ø)` | | | 
[...la/org/apache/spark/sql/hive/HiveClientUtils.scala](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9zcGFyay9zcWwvaGl2ZS9IaXZlQ2xpZW50VXRpbHMuc2NhbGE=) | `0.00% <0.00%> (ø)` | `0.00 <0.00> (?)` | | | [...e/spark/sql/catalyst/plans/logical/mergeInto.scala](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3BhcmsyL3NyYy9tYWluL3NjYWxhL29yZy9hcGFjaGUvc3Bhcmsvc3FsL2NhdGFseXN0L3BsYW5zL2xvZ2ljYWwvbWVyZ2VJbnRvLnNjYWxh) | `0.00% <0.00%> (ø)` | `0.00 <0.00> (?)` | | | [...he/spark/sql/hudi/parser/HoodieSqlAstBuilder.scala](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3BhcmsyL3NyYy9tYWluL3NjYWxhL29yZy9hcGFjaGUvc3Bhcmsvc3FsL2h1ZGkvcGFyc2VyL0hvb2RpZVNxbEFzdEJ1aWxkZXIuc2NhbGE=) | `0.00% <0.00%> (ø)` | `0.00 <0.00> (?)` | |
[GitHub] [hudi] codecov-io commented on pull request #2711: [hotfix] Log the error message for creating table source first
codecov-io commented on pull request #2711: URL: https://github.com/apache/hudi/pull/2711#issuecomment-805632411 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2711?src=pr&el=h1) Report > Merging [#2711](https://codecov.io/gh/apache/hudi/pull/2711?src=pr&el=desc) (ae47544) into [master](https://codecov.io/gh/apache/hudi/commit/03668dbaf1a60428d7e0d68c6622605e0809150a?el=desc) (03668db) will **decrease** coverage by `0.01%`. > The diff coverage is `50.00%`. [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2711/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2711?src=pr&el=tree) ```diff @@ Coverage Diff @@ ## master #2711 +/- ## - Coverage 51.74% 51.73% -0.02% + Complexity 3602 3601 -1 Files 476 476 Lines 22592 22595 +3 Branches 2409 2409 - Hits 11690 11689 -1 - Misses 9885 9888 +3 - Partials 1017 1018 +1 ``` | Flag | Coverage Δ | Complexity Δ | | |---|---|---|---| | hudicli | `37.01% <ø> (ø)` | `0.00 <ø> (ø)` | | | hudiclient | `100.00% <ø> (ø)` | `0.00 <ø> (ø)` | | | hudicommon | `50.94% <ø> (ø)` | `0.00 <ø> (ø)` | | | hudiflink | `54.08% <50.00%> (-0.05%)` | `0.00 <0.00> (ø)` | | | hudihadoopmr | `33.44% <ø> (ø)` | `0.00 <ø> (ø)` | | | hudisparkdatasource | `70.87% <ø> (ø)` | `0.00 <ø> (ø)` | | | hudisync | `45.58% <ø> (ø)` | `0.00 <ø> (ø)` | | | huditimelineservice | `64.36% <ø> (ø)` | `0.00 <ø> (ø)` | | | hudiutilities | `69.73% <ø> (-0.06%)` | `0.00 <ø> (ø)` | | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more. | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2711?src=pr&el=tree) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [...java/org/apache/hudi/table/HoodieTableFactory.java](https://codecov.io/gh/apache/hudi/pull/2711/diff?src=pr&el=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS90YWJsZS9Ib29kaWVUYWJsZUZhY3RvcnkuamF2YQ==) | `72.72% <50.00%> (-5.33%)` | `11.00 <0.00> (ø)` | | | [...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/hudi/pull/2711/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=) | `71.37% <0.00%> (-0.35%)` | `55.00% <0.00%> (-1.00%)` | | -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] garyli1019 merged pull request #2708: [HUDI-1712] Rename & standardize config to match other configs
garyli1019 merged pull request #2708: URL: https://github.com/apache/hudi/pull/2708 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[hudi] branch master updated (03668db -> 01a1d79)
This is an automated email from the ASF dual-hosted git repository. garyli pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git. from 03668db [HUDI-1710] Read optimized query type for Flink batch reader (#2702) add 01a1d79 [HUDI-1712] Rename & standardize config to match other configs (#2708) No new revisions were added by this update. Summary of changes: .../hudi/common/config/LockConfiguration.java | 2 +- .../testsuite/job/TestHoodieTestSuiteJob.java | 18 ++-- .../functional/TestHoodieDeltaStreamer.java | 34 +++--- 3 files changed, 27 insertions(+), 27 deletions(-)
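For context on the change itself: HUDI-1712 standardizes the prefix used by the lock-related write configs so they line up with the rest of Hudi's `hoodie.*` options. The sketch below only illustrates the shared-prefix pattern the LockConfiguration change follows; the constant names and property keys are assumptions for illustration, not the exact keys renamed by #2708.

```java
// Illustrative sketch of a standardized config prefix; all names here are hypothetical.
public final class LockConfigurationSketch {

  // One shared prefix keeps every lock option grouped with the other writer configs.
  public static final String LOCK_PREFIX = "hoodie.write.lock.";

  // Derived keys reuse the prefix instead of hard-coding their own spelling,
  // which is what "standardize config to match other configs" amounts to.
  public static final String LOCK_PROVIDER_CLASS_PROP = LOCK_PREFIX + "provider";
  public static final String LOCK_ACQUIRE_WAIT_TIMEOUT_MS_PROP = LOCK_PREFIX + "wait_time_ms";

  private LockConfigurationSketch() {
  }
}
```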
[GitHub] [hudi] Sugamber commented on issue #2637: [SUPPORT] - Partial Update : update few columns of a table
Sugamber commented on issue #2637: URL: https://github.com/apache/hudi/issues/2637#issuecomment-805645355 There is an open pull request for partial update for CoW table. https://github.com/apache/hudi/pull/1929 It looks like my use case. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] Sugamber commented on pull request #2666: [HUDI-1160] Support update partial fields for CoW table
Sugamber commented on pull request #2666: URL: https://github.com/apache/hudi/pull/2666#issuecomment-805652102 @liujinhui1994 We also need the same feature in Hudi. Is there a working branch we can refer to? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] liujinhui1994 commented on pull request #2666: [HUDI-1160] Support update partial fields for CoW table
liujinhui1994 commented on pull request #2666: URL: https://github.com/apache/hudi/pull/2666#issuecomment-805655005 @Sugamber This branch should be available -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] Sugamber commented on pull request #2666: [HUDI-1160] Support update partial fields for CoW table
Sugamber commented on pull request #2666: URL: https://github.com/apache/hudi/pull/2666#issuecomment-805666558 Is there any timeline for this pull request? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] liujinhui1994 commented on pull request #2666: [HUDI-1160] Support update partial fields for CoW table
liujinhui1994 commented on pull request #2666: URL: https://github.com/apache/hudi/pull/2666#issuecomment-805667943 Maybe after 0.8 is released. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] codecov-io edited a comment on pull request #2651: [HUDI-1591] [RFC-26] Improve Hoodie Table Query Performance And Ease Of Use Fo…
codecov-io edited a comment on pull request #2651: URL: https://github.com/apache/hudi/pull/2651#issuecomment-794945140 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2651?src=pr&el=h1) Report > Merging [#2651](https://codecov.io/gh/apache/hudi/pull/2651?src=pr&el=desc) (fb7a9b1) into [master](https://codecov.io/gh/apache/hudi/commit/ce3e8ec87083ef4cd4f33de39b6697f66ff3f277?el=desc) (ce3e8ec) will **increase** coverage by `17.95%`. > The diff coverage is `50.00%`. [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2651/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2651?src=pr&el=tree) ```diff @@ Coverage Diff @@ ## master #2651 +/- ## = + Coverage 51.76% 69.72% +17.95% + Complexity 3602 372 -3230 = Files 476 54 -422 Lines 22579 1995 -20584 Branches 2408 236 -2172 = - Hits 11688 1391 -10297 + Misses 9874 474 -9400 + Partials 1017 130 -887 ``` | Flag | Coverage Δ | Complexity Δ | | |---|---|---|---| | hudicli | `?` | `?` | | | hudiclient | `?` | `?` | | | hudicommon | `?` | `?` | | | hudiflink | `?` | `?` | | | hudihadoopmr | `?` | `?` | | | hudisparkdatasource | `?` | `?` | | | hudisync | `?` | `?` | | | huditimelineservice | `?` | `?` | | | hudiutilities | `69.72% <50.00%> (-0.06%)` | `0.00 <0.00> (ø)` | | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more. | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2651?src=pr&el=tree) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/hudi/pull/2651/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=) | `71.28% <50.00%> (-0.45%)` | `56.00 <0.00> (ø)` | | | [...ava/org/apache/hudi/exception/HoodieException.java](https://codecov.io/gh/apache/hudi/pull/2651/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZXhjZXB0aW9uL0hvb2RpZUV4Y2VwdGlvbi5qYXZh) | | | | | [.../org/apache/hudi/sink/InstantGenerateOperator.java](https://codecov.io/gh/apache/hudi/pull/2651/diff?src=pr&el=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL0luc3RhbnRHZW5lcmF0ZU9wZXJhdG9yLmphdmE=) | | | | | [...in/java/org/apache/hudi/common/model/BaseFile.java](https://codecov.io/gh/apache/hudi/pull/2651/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0Jhc2VGaWxlLmphdmE=) | | | | | [...in/java/org/apache/hudi/cli/HoodiePrintHelper.java](https://codecov.io/gh/apache/hudi/pull/2651/diff?src=pr&el=tree#diff-aHVkaS1jbGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpL0hvb2RpZVByaW50SGVscGVyLmphdmE=) | | | | | [...util/jvm/OpenJ9MemoryLayoutSpecification64bit.java](https://codecov.io/gh/apache/hudi/pull/2651/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvanZtL09wZW5KOU1lbW9yeUxheW91dFNwZWNpZmljYXRpb242NGJpdC5qYXZh) | | | | | [...mmon/table/log/block/HoodieDeleteBlockVersion.java](https://codecov.io/gh/apache/hudi/pull/2651/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL2xvZy9ibG9jay9Ib29kaWVEZWxldGVCbG9ja1ZlcnNpb24uamF2YQ==) | | | | | 
[.../org/apache/hudi/common/model/BaseAvroPayload.java](https://codecov.io/gh/apache/hudi/pull/2651/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0Jhc2VBdnJvUGF5bG9hZC5qYXZh) | | | | | [...e/hudi/common/table/log/HoodieLogFormatWriter.java](https://codecov.io/gh/apache/hudi/pull/2651/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL2xvZy9Ib29kaWVMb2dGb3JtYXRXcml0ZXIuamF2YQ==) | | | | | [...che/hudi/common/util/collection/ImmutablePair.java](https://codecov.io/gh/apache/hudi/pull/2651/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvY29sbGVjdGlvbi9JbW11dGFibGVQYWlyLmphdmE=) | | | | | ... and [405 more](https://codecov.io/gh/apache/hudi/pull/2651/diff?src=pr&el=tree-more) | | -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] garyli1019 merged pull request #2711: [hotfix] Log the error message for creating table source first
garyli1019 merged pull request #2711: URL: https://github.com/apache/hudi/pull/2711 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[hudi] branch master updated: [hotfix] Log the error message for creating table source first (#2711)
This is an automated email from the ASF dual-hosted git repository. garyli pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 29b79c9 [hotfix] Log the error message for creating table source first (#2711) 29b79c9 is described below commit 29b79c99b02d66ef9b087b56223e74c0d1f99e94 Author: Danny Chan AuthorDate: Wed Mar 24 18:25:37 2021 +0800 [hotfix] Log the error message for creating table source first (#2711) --- .../org/apache/hudi/table/HoodieTableFactory.java | 27 +++--- 1 file changed, 19 insertions(+), 8 deletions(-) diff --git a/hudi-flink/src/main/java/org/apache/hudi/table/HoodieTableFactory.java b/hudi-flink/src/main/java/org/apache/hudi/table/HoodieTableFactory.java index a2dac36..7ce8880 100644 --- a/hudi-flink/src/main/java/org/apache/hudi/table/HoodieTableFactory.java +++ b/hudi-flink/src/main/java/org/apache/hudi/table/HoodieTableFactory.java @@ -19,6 +19,7 @@ package org.apache.hudi.table; import org.apache.hudi.configuration.FlinkOptions; +import org.apache.hudi.exception.HoodieException; import org.apache.hudi.keygen.ComplexAvroKeyGenerator; import org.apache.hudi.util.AvroSchemaConverter; @@ -57,14 +58,24 @@ public class HoodieTableFactory implements TableSourceFactory, TableSin Configuration conf = FlinkOptions.fromMap(context.getTable().getOptions()); TableSchema schema = TableSchemaUtils.getPhysicalSchema(context.getTable().getSchema()); setupConfOptions(conf, context.getObjectIdentifier().getObjectName(), context.getTable(), schema); -Path path = new Path(conf.getOptional(FlinkOptions.PATH).orElseThrow(() -> -new ValidationException("Option [path] should be not empty."))); -return new HoodieTableSource( -schema, -path, -context.getTable().getPartitionKeys(), -conf.getString(FlinkOptions.PARTITION_DEFAULT_NAME), -conf); +// enclosing the code within a try catch block so that we can log the error message. +// Flink 1.11 did a bad compatibility for the old table factory, it uses the old factory +// to create the source/sink and catches all the exceptions then tries the new factory. +// +// log the error message first so that there is a chance to show the real failure cause. +try { + Path path = new Path(conf.getOptional(FlinkOptions.PATH).orElseThrow(() -> + new ValidationException("Option [path] should not be empty."))); + return new HoodieTableSource( + schema, + path, + context.getTable().getPartitionKeys(), + conf.getString(FlinkOptions.PARTITION_DEFAULT_NAME), + conf); +} catch (Throwable throwable) { + LOG.error("Create table source error", throwable); + throw new HoodieException(throwable); +} } @Override
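To see where this factory code is exercised: HoodieTableFactory is picked up when a Flink SQL table is declared with the Hudi connector, and `createTableSource` runs when that table is queried. The snippet below is a minimal, illustrative sketch (table name, columns, and the local path are made up); it assumes the Flink 1.11-era `TableEnvironment#executeSql` API and the `connector`/`path`/`table.type` options of the hudi-flink module. If the `path` option were omitted, the ValidationException raised inside `createTableSource` is now logged before being rethrown, instead of being silently swallowed by Flink's old-factory fallback.

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class HudiFlinkSourceSketch {
  public static void main(String[] args) {
    TableEnvironment tableEnv = TableEnvironment.create(
        EnvironmentSettings.newInstance().inStreamingMode().build());

    // Declaring the table routes source/sink creation through HoodieTableFactory.
    tableEnv.executeSql(
        "CREATE TABLE hudi_trips ("
            + "  uuid VARCHAR(40),"
            + "  fare DOUBLE,"
            + "  ts TIMESTAMP(3)"
            + ") WITH ("
            + "  'connector' = 'hudi',"
            + "  'path' = 'file:///tmp/hudi_trips'," // dropping this option triggers the now-logged ValidationException
            + "  'table.type' = 'MERGE_ON_READ'"
            + ")");

    // Querying the table is what invokes createTableSource.
    tableEnv.executeSql("SELECT uuid, fare FROM hudi_trips").print();
  }
}
```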
[GitHub] [hudi] Sugamber edited a comment on issue #2637: [SUPPORT] - Partial Update : update few columns of a table
Sugamber edited a comment on issue #2637: URL: https://github.com/apache/hudi/issues/2637#issuecomment-805645355 There is an open pull request for partial update for CoW table. https://github.com/apache/hudi/pull/1929 It looks like my use case is similar to this. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] IloveZiHan opened a new pull request #2713: Review the execution plan of Structured Streaming writes to Hudi
IloveZiHan opened a new pull request #2713: URL: https://github.com/apache/hudi/pull/2713 ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contributing.html before opening a pull request.* ## What is the purpose of the pull request *(For example: This pull request adds quick-start document.)* ## Brief change log *(for example:)* - *Modify AnnotationLocation checkstyle rule in checkstyle.xml* ## Verify this pull request *(Please pick either of the following options)* This pull request is a trivial rework / code cleanup without any test coverage. *(or)* This pull request is already covered by existing tests, such as *(please describe tests)*. (or) This change added tests and can be verified as follows: *(example:)* - *Added integration tests for end-to-end.* - *Added HoodieClientWriteTest to verify the change.* - *Manually verified the change by running a job locally.* ## Committer checklist - [ ] Has a corresponding JIRA in PR title & commit - [ ] Commit message is descriptive of the change - [ ] CI is green - [ ] Necessary doc changes done or have another open PR - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] IloveZiHan commented on pull request #2713: Review the execution plan of Structured Streaming writes to Hudi
IloveZiHan commented on pull request #2713: URL: https://github.com/apache/hudi/pull/2713#issuecomment-805712787 ok -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] IloveZiHan closed pull request #2713: Review the execution plan of Structured Streaming writes to Hudi
IloveZiHan closed pull request #2713: URL: https://github.com/apache/hudi/pull/2713 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] Sugamber commented on issue #2637: [SUPPORT] - Partial Update : update few columns of a table
Sugamber commented on issue #2637: URL: https://github.com/apache/hudi/issues/2637#issuecomment-805713856 @nsivabalan Do we have any timeline for these pull requests? Pull request 1: https://github.com/apache/hudi/pull/1929/ Pull request 2: https://github.com/apache/hudi/pull/2666 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (HUDI-1714) Improve code coverage of TestHoodieTimelineArchiveLog
Jagmeet Bali created HUDI-1714: -- Summary: Improve code coverage of TestHoodieTimelineArchiveLog Key: HUDI-1714 URL: https://issues.apache.org/jira/browse/HUDI-1714 Project: Apache Hudi Issue Type: Test Reporter: Jagmeet Bali Add tests for the newly added code which supports the archival of clean and rollback commits specifically around the getCleanInstantsToArchive codepath within HoodieTimelineArchiveLog -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] nsivabalan commented on issue #2675: [SUPPORT] Unable to query MOR table after schema evolution
nsivabalan commented on issue #2675: URL: https://github.com/apache/hudi/issues/2675#issuecomment-805805044 1. Do you use the RowBasedSchemaProvider and hence can't explicitly provide a schema? If you were to use your own schema registry, you could simply provide an updated schema to Hudi while writing. 2. Got it; a contribution would be welcome, and I can help review the patch. In the meantime, I will try out schema evolution on my end with a local setup. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
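Since issue #2675 is about an MOR table becoming unreadable after schema evolution, one quick sanity check, independent of which SchemaProvider is used, is Avro's own compatibility checker: new fields should carry defaults so records written with the old schema stay readable under the new one. A minimal sketch using only the Avro API (the trip/fare/tip schema is made up for illustration):

```java
import org.apache.avro.Schema;
import org.apache.avro.SchemaCompatibility;

public class SchemaEvolutionCheck {
  public static void main(String[] args) {
    // Schema the existing base/log files were written with.
    Schema oldSchema = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"trip\",\"fields\":["
            + "{\"name\":\"uuid\",\"type\":\"string\"},"
            + "{\"name\":\"fare\",\"type\":\"double\"}]}");

    // Evolved schema: the added field is nullable with a default, so old records remain readable.
    Schema newSchema = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"trip\",\"fields\":["
            + "{\"name\":\"uuid\",\"type\":\"string\"},"
            + "{\"name\":\"fare\",\"type\":\"double\"},"
            + "{\"name\":\"tip\",\"type\":[\"null\",\"double\"],\"default\":null}]}");

    // COMPATIBLE means data written with oldSchema can be read back with newSchema.
    SchemaCompatibility.SchemaPairCompatibility result =
        SchemaCompatibility.checkReaderWriterCompatibility(newSchema, oldSchema);
    System.out.println(result.getType());
  }
}
```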
[GitHub] [hudi] jsbali commented on a change in pull request #2677: Added tests to TestHoodieTimelineArchiveLog for the archival of compl…
jsbali commented on a change in pull request #2677: URL: https://github.com/apache/hudi/pull/2677#discussion_r600461519 ## File path: hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/io/TestHoodieTimelineArchiveLog.java ## @@ -388,6 +391,31 @@ public void testArchiveCommitSavepointNoHole() throws IOException { "Archived commits should always be safe"); } + @Test + public void testArchiveRollbacks() throws IOException { +HoodieWriteConfig cfg = HoodieWriteConfig.newBuilder().withPath(basePath) + .withSchema(HoodieTestDataGenerator.TRIP_EXAMPLE_SCHEMA).withParallelism(2, 2).forTable("test-trip-table") + .withCompactionConfig(HoodieCompactionConfig.newBuilder().retainCommits(1).archiveCommitsWith(2, 3).build()) +.build(); + +createCommitAndRollbackFile("100", "101", false); +createCommitAndRollbackFile("102", "103", false); +createCommitAndRollbackFile("104", "105", false); +createCommitAndRollbackFile("106", "107", false); + +HoodieTable table = HoodieSparkTable.create(cfg, context); +HoodieTimelineArchiveLog archiveLog = new HoodieTimelineArchiveLog(cfg, table); + +assertTrue(archiveLog.archiveIfRequired(context)); +HoodieTimeline timeline = metaClient.getActiveTimeline().reload().getCommitsTimeline().filterCompletedInstants(); +assertEquals(2, timeline.countInstants(), +"first two commits must have been archived"); +assertFalse(metaClient.getActiveTimeline().containsInstant(new HoodieInstant(false, HoodieTimeline.ROLLBACK_ACTION, "101")), +"first rollback must have been archived"); +assertFalse(metaClient.getActiveTimeline().containsInstant(new HoodieInstant(false, HoodieTimeline.ROLLBACK_ACTION, "103")), Review comment: Done -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
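For readers skimming the new tests: the write config they build is what drives archival. Below is a condensed restatement of that configuration, mirroring the builder calls quoted above; the comment on min/max semantics is a reading of the assertions in these tests rather than a statement of the full archival algorithm.

```java
import org.apache.hudi.common.testutils.HoodieTestDataGenerator;
import org.apache.hudi.config.HoodieCompactionConfig;
import org.apache.hudi.config.HoodieWriteConfig;

public class ArchivalConfigSketch {
  public static HoodieWriteConfig archivalConfig(String basePath) {
    // Once the active timeline holds more than 3 instants, archival trims it back
    // to 2, moving older instants (including clean/rollback ones) to the archived
    // timeline; retainCommits(1) is the cleaner retention consulted beforehand.
    return HoodieWriteConfig.newBuilder()
        .withPath(basePath)
        .withSchema(HoodieTestDataGenerator.TRIP_EXAMPLE_SCHEMA)
        .withParallelism(2, 2)
        .forTable("test-trip-table")
        .withCompactionConfig(HoodieCompactionConfig.newBuilder()
            .retainCommits(1)
            .archiveCommitsWith(2, 3) // (minCommitsToKeep, maxCommitsToKeep)
            .build())
        .build();
  }
}
```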
[GitHub] [hudi] jsbali commented on a change in pull request #2677: Added tests to TestHoodieTimelineArchiveLog for the archival of compl…
jsbali commented on a change in pull request #2677: URL: https://github.com/apache/hudi/pull/2677#discussion_r600462395 ## File path: hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/io/TestHoodieTimelineArchiveLog.java ## @@ -491,6 +519,166 @@ public void testConvertCommitMetadata() { assertEquals(expectedCommitMetadata.getOperationType(), WriteOperationType.INSERT.toString()); } + @Test + public void testArchiveCompletedClean() throws IOException { +HoodieWriteConfig cfg = + HoodieWriteConfig.newBuilder().withPath(basePath).withSchema(HoodieTestDataGenerator.TRIP_EXAMPLE_SCHEMA) +.withParallelism(2, 2).forTable("test-trip-table") + .withCompactionConfig(HoodieCompactionConfig.newBuilder().retainCommits(1).archiveCommitsWith(2, 3).build()) +.build(); +metaClient = HoodieTableMetaClient.reload(metaClient); + +createCleanMetadata("10", false); +createCleanMetadata("11", false); +createCleanMetadata("12", false); +HoodieInstant notArchivedInstant1 = new HoodieInstant(State.COMPLETED, "clean", "12"); +createCleanMetadata("13", false); +HoodieInstant notArchivedInstant2 = new HoodieInstant(State.COMPLETED, "clean", "13"); + +HoodieTable table = HoodieSparkTable.create(cfg, context, metaClient); +HoodieTimelineArchiveLog archiveLog = new HoodieTimelineArchiveLog(cfg, table); + +archiveLog.archiveIfRequired(context); + +List notArchivedInstants = metaClient.getActiveTimeline().reload().getInstants().collect(Collectors.toList()); +//There will be 3 * 2 files but due to TimelineLayoutV1 this will show as 2. +assertEquals(2, notArchivedInstants.size(), "Not archived instants should be 2"); +assertEquals(notArchivedInstants, Arrays.asList(notArchivedInstant1, notArchivedInstant2), ""); + } + + @Test + public void testArchiveCompletedRollback() throws IOException { +HoodieWriteConfig cfg = + HoodieWriteConfig.newBuilder().withPath(basePath).withSchema(HoodieTestDataGenerator.TRIP_EXAMPLE_SCHEMA) +.withParallelism(2, 2).forTable("test-trip-table") + .withCompactionConfig(HoodieCompactionConfig.newBuilder().retainCommits(1).archiveCommitsWith(2, 3).build()) +.build(); +metaClient = HoodieTableMetaClient.reload(metaClient); + +createCommitAndRollbackFile("6", "10", false); +createCommitAndRollbackFile("8", "11", false); +createCommitAndRollbackFile("7", "12", false); +HoodieInstant notArchivedInstant1 = new HoodieInstant(State.COMPLETED, "rollback", "12"); + +createCommitAndRollbackFile("5", "13", false); +HoodieInstant notArchivedInstant2 = new HoodieInstant(State.COMPLETED, "rollback", "13"); + +HoodieTable table = HoodieSparkTable.create(cfg, context, metaClient); +HoodieTimelineArchiveLog archiveLog = new HoodieTimelineArchiveLog(cfg, table); + +archiveLog.archiveIfRequired(context); + +List notArchivedInstants = metaClient.getActiveTimeline().reload().getRollbackTimeline().getInstants().collect(Collectors.toList()); +//There will be 2 * 2 files but due to TimelineLayoutV1 this will show as 2. 
+assertEquals(2, notArchivedInstants.size(), "Not archived instants should be 2"); +assertEquals(notArchivedInstants, Arrays.asList(notArchivedInstant1, notArchivedInstant2), ""); + } + + @Test + public void testArchiveCompletedShouldRetainMinInstantsIfInstantsGreaterThanMaxtoKeep() throws IOException { +int minInstants = 2; +int maxInstants = 10; +HoodieWriteConfig cfg = + HoodieWriteConfig.newBuilder().withPath(basePath).withSchema(HoodieTestDataGenerator.TRIP_EXAMPLE_SCHEMA) +.withParallelism(2, 2).forTable("test-trip-table") + .withCompactionConfig(HoodieCompactionConfig.newBuilder().retainCommits(1).archiveCommitsWith(minInstants, maxInstants).build()) +.build(); +metaClient = HoodieTableMetaClient.reload(metaClient); +for (int i = 0; i < maxInstants + 2; i++) { + createCleanMetadata(i + "", false); +} + +HoodieTable table = HoodieSparkTable.create(cfg, context, metaClient); +HoodieTimelineArchiveLog archiveLog = new HoodieTimelineArchiveLog(cfg, table); + +archiveLog.archiveIfRequired(context); +assertEquals(minInstants, metaClient.getActiveTimeline().reload().getInstants().count()); + } + + @Test + public void testArchiveCompletedShouldNotArchiveIfInstantsLessThanMaxtoKeep() throws IOException { +int minInstants = 2; +int maxInstants = 10; +HoodieWriteConfig cfg = + HoodieWriteConfig.newBuilder().withPath(basePath).withSchema(HoodieTestDataGenerator.TRIP_EXAMPLE_SCHEMA) +.withParallelism(2, 2).forTable("test-trip-table") + .withCompactionConfig(HoodieCompactionConfig.newBuilder().retainCommits(1).archiveCom
[GitHub] [hudi] jsbali commented on a change in pull request #2677: Added tests to TestHoodieTimelineArchiveLog for the archival of compl…
jsbali commented on a change in pull request #2677: URL: https://github.com/apache/hudi/pull/2677#discussion_r600463160 ## File path: hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/io/TestHoodieTimelineArchiveLog.java ## @@ -491,6 +519,166 @@ public void testConvertCommitMetadata() { assertEquals(expectedCommitMetadata.getOperationType(), WriteOperationType.INSERT.toString()); } + @Test + public void testArchiveCompletedClean() throws IOException { +HoodieWriteConfig cfg = + HoodieWriteConfig.newBuilder().withPath(basePath).withSchema(HoodieTestDataGenerator.TRIP_EXAMPLE_SCHEMA) +.withParallelism(2, 2).forTable("test-trip-table") + .withCompactionConfig(HoodieCompactionConfig.newBuilder().retainCommits(1).archiveCommitsWith(2, 3).build()) +.build(); +metaClient = HoodieTableMetaClient.reload(metaClient); + +createCleanMetadata("10", false); +createCleanMetadata("11", false); +createCleanMetadata("12", false); +HoodieInstant notArchivedInstant1 = new HoodieInstant(State.COMPLETED, "clean", "12"); +createCleanMetadata("13", false); +HoodieInstant notArchivedInstant2 = new HoodieInstant(State.COMPLETED, "clean", "13"); + +HoodieTable table = HoodieSparkTable.create(cfg, context, metaClient); +HoodieTimelineArchiveLog archiveLog = new HoodieTimelineArchiveLog(cfg, table); + +archiveLog.archiveIfRequired(context); + +List notArchivedInstants = metaClient.getActiveTimeline().reload().getInstants().collect(Collectors.toList()); +//There will be 3 * 2 files but due to TimelineLayoutV1 this will show as 2. +assertEquals(2, notArchivedInstants.size(), "Not archived instants should be 2"); +assertEquals(notArchivedInstants, Arrays.asList(notArchivedInstant1, notArchivedInstant2), ""); + } + + @Test + public void testArchiveCompletedRollback() throws IOException { +HoodieWriteConfig cfg = + HoodieWriteConfig.newBuilder().withPath(basePath).withSchema(HoodieTestDataGenerator.TRIP_EXAMPLE_SCHEMA) +.withParallelism(2, 2).forTable("test-trip-table") + .withCompactionConfig(HoodieCompactionConfig.newBuilder().retainCommits(1).archiveCommitsWith(2, 3).build()) +.build(); +metaClient = HoodieTableMetaClient.reload(metaClient); + +createCommitAndRollbackFile("6", "10", false); +createCommitAndRollbackFile("8", "11", false); +createCommitAndRollbackFile("7", "12", false); +HoodieInstant notArchivedInstant1 = new HoodieInstant(State.COMPLETED, "rollback", "12"); + +createCommitAndRollbackFile("5", "13", false); +HoodieInstant notArchivedInstant2 = new HoodieInstant(State.COMPLETED, "rollback", "13"); + +HoodieTable table = HoodieSparkTable.create(cfg, context, metaClient); +HoodieTimelineArchiveLog archiveLog = new HoodieTimelineArchiveLog(cfg, table); + +archiveLog.archiveIfRequired(context); + +List notArchivedInstants = metaClient.getActiveTimeline().reload().getRollbackTimeline().getInstants().collect(Collectors.toList()); +//There will be 2 * 2 files but due to TimelineLayoutV1 this will show as 2. 
+assertEquals(2, notArchivedInstants.size(), "Not archived instants should be 2"); +assertEquals(notArchivedInstants, Arrays.asList(notArchivedInstant1, notArchivedInstant2), ""); + } + + @Test + public void testArchiveCompletedShouldRetainMinInstantsIfInstantsGreaterThanMaxtoKeep() throws IOException { +int minInstants = 2; +int maxInstants = 10; +HoodieWriteConfig cfg = + HoodieWriteConfig.newBuilder().withPath(basePath).withSchema(HoodieTestDataGenerator.TRIP_EXAMPLE_SCHEMA) +.withParallelism(2, 2).forTable("test-trip-table") + .withCompactionConfig(HoodieCompactionConfig.newBuilder().retainCommits(1).archiveCommitsWith(minInstants, maxInstants).build()) +.build(); +metaClient = HoodieTableMetaClient.reload(metaClient); +for (int i = 0; i < maxInstants + 2; i++) { + createCleanMetadata(i + "", false); +} + +HoodieTable table = HoodieSparkTable.create(cfg, context, metaClient); +HoodieTimelineArchiveLog archiveLog = new HoodieTimelineArchiveLog(cfg, table); + +archiveLog.archiveIfRequired(context); +assertEquals(minInstants, metaClient.getActiveTimeline().reload().getInstants().count()); + } + + @Test + public void testArchiveCompletedShouldNotArchiveIfInstantsLessThanMaxtoKeep() throws IOException { +int minInstants = 2; +int maxInstants = 10; +HoodieWriteConfig cfg = + HoodieWriteConfig.newBuilder().withPath(basePath).withSchema(HoodieTestDataGenerator.TRIP_EXAMPLE_SCHEMA) +.withParallelism(2, 2).forTable("test-trip-table") + .withCompactionConfig(HoodieCompactionConfig.newBuilder().retainCommits(1).archiveCom
[GitHub] [hudi] jsbali commented on a change in pull request #2677: Added tests to TestHoodieTimelineArchiveLog for the archival of compl…
jsbali commented on a change in pull request #2677: URL: https://github.com/apache/hudi/pull/2677#discussion_r600463420 ## File path: hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/io/TestHoodieTimelineArchiveLog.java ## @@ -491,6 +519,166 @@ public void testConvertCommitMetadata() { assertEquals(expectedCommitMetadata.getOperationType(), WriteOperationType.INSERT.toString()); } + @Test + public void testArchiveCompletedClean() throws IOException { +HoodieWriteConfig cfg = + HoodieWriteConfig.newBuilder().withPath(basePath).withSchema(HoodieTestDataGenerator.TRIP_EXAMPLE_SCHEMA) +.withParallelism(2, 2).forTable("test-trip-table") + .withCompactionConfig(HoodieCompactionConfig.newBuilder().retainCommits(1).archiveCommitsWith(2, 3).build()) +.build(); +metaClient = HoodieTableMetaClient.reload(metaClient); + +createCleanMetadata("10", false); +createCleanMetadata("11", false); +createCleanMetadata("12", false); +HoodieInstant notArchivedInstant1 = new HoodieInstant(State.COMPLETED, "clean", "12"); +createCleanMetadata("13", false); +HoodieInstant notArchivedInstant2 = new HoodieInstant(State.COMPLETED, "clean", "13"); + +HoodieTable table = HoodieSparkTable.create(cfg, context, metaClient); +HoodieTimelineArchiveLog archiveLog = new HoodieTimelineArchiveLog(cfg, table); + +archiveLog.archiveIfRequired(context); + +List notArchivedInstants = metaClient.getActiveTimeline().reload().getInstants().collect(Collectors.toList()); +//There will be 3 * 2 files but due to TimelineLayoutV1 this will show as 2. +assertEquals(2, notArchivedInstants.size(), "Not archived instants should be 2"); +assertEquals(notArchivedInstants, Arrays.asList(notArchivedInstant1, notArchivedInstant2), ""); + } + + @Test + public void testArchiveCompletedRollback() throws IOException { +HoodieWriteConfig cfg = + HoodieWriteConfig.newBuilder().withPath(basePath).withSchema(HoodieTestDataGenerator.TRIP_EXAMPLE_SCHEMA) +.withParallelism(2, 2).forTable("test-trip-table") + .withCompactionConfig(HoodieCompactionConfig.newBuilder().retainCommits(1).archiveCommitsWith(2, 3).build()) +.build(); +metaClient = HoodieTableMetaClient.reload(metaClient); + +createCommitAndRollbackFile("6", "10", false); +createCommitAndRollbackFile("8", "11", false); +createCommitAndRollbackFile("7", "12", false); +HoodieInstant notArchivedInstant1 = new HoodieInstant(State.COMPLETED, "rollback", "12"); + +createCommitAndRollbackFile("5", "13", false); +HoodieInstant notArchivedInstant2 = new HoodieInstant(State.COMPLETED, "rollback", "13"); + +HoodieTable table = HoodieSparkTable.create(cfg, context, metaClient); +HoodieTimelineArchiveLog archiveLog = new HoodieTimelineArchiveLog(cfg, table); + +archiveLog.archiveIfRequired(context); + +List notArchivedInstants = metaClient.getActiveTimeline().reload().getRollbackTimeline().getInstants().collect(Collectors.toList()); +//There will be 2 * 2 files but due to TimelineLayoutV1 this will show as 2. 
+assertEquals(2, notArchivedInstants.size(), "Not archived instants should be 2"); +assertEquals(notArchivedInstants, Arrays.asList(notArchivedInstant1, notArchivedInstant2), ""); + } + + @Test + public void testArchiveCompletedShouldRetainMinInstantsIfInstantsGreaterThanMaxtoKeep() throws IOException { +int minInstants = 2; +int maxInstants = 10; +HoodieWriteConfig cfg = + HoodieWriteConfig.newBuilder().withPath(basePath).withSchema(HoodieTestDataGenerator.TRIP_EXAMPLE_SCHEMA) +.withParallelism(2, 2).forTable("test-trip-table") + .withCompactionConfig(HoodieCompactionConfig.newBuilder().retainCommits(1).archiveCommitsWith(minInstants, maxInstants).build()) +.build(); +metaClient = HoodieTableMetaClient.reload(metaClient); +for (int i = 0; i < maxInstants + 2; i++) { + createCleanMetadata(i + "", false); +} + +HoodieTable table = HoodieSparkTable.create(cfg, context, metaClient); +HoodieTimelineArchiveLog archiveLog = new HoodieTimelineArchiveLog(cfg, table); + +archiveLog.archiveIfRequired(context); +assertEquals(minInstants, metaClient.getActiveTimeline().reload().getInstants().count()); + } + + @Test + public void testArchiveCompletedShouldNotArchiveIfInstantsLessThanMaxtoKeep() throws IOException { +int minInstants = 2; +int maxInstants = 10; +HoodieWriteConfig cfg = + HoodieWriteConfig.newBuilder().withPath(basePath).withSchema(HoodieTestDataGenerator.TRIP_EXAMPLE_SCHEMA) +.withParallelism(2, 2).forTable("test-trip-table") + .withCompactionConfig(HoodieCompactionConfig.newBuilder().retainCommits(1).archiveCom
[GitHub] [hudi] jsbali commented on pull request #2677: Added tests to TestHoodieTimelineArchiveLog for the archival of compl…
jsbali commented on pull request #2677: URL: https://github.com/apache/hudi/pull/2677#issuecomment-805810171 @vinothchandar added [JIRA](https://issues.apache.org/jira/browse/HUDI-1714) for it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] jsbali edited a comment on pull request #2677: Added tests to TestHoodieTimelineArchiveLog for the archival of compl…
jsbali edited a comment on pull request #2677: URL: https://github.com/apache/hudi/pull/2677#issuecomment-805810171 @vinothchandar added [HUDI-1714](https://issues.apache.org/jira/browse/HUDI-1714) for it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[hudi] branch master updated: Moving to 0.9.0-SNAPSHOT on master branch.
This is an automated email from the ASF dual-hosted git repository. garyli pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 6e803e0 Moving to 0.9.0-SNAPSHOT on master branch. 6e803e0 is described below commit 6e803e08b1328b32a5c3a6acd8168fdabc8a1e50 Author: garyli1019 AuthorDate: Wed Mar 24 21:37:14 2021 +0800 Moving to 0.9.0-SNAPSHOT on master branch. --- docker/hoodie/hadoop/base/pom.xml | 2 +- docker/hoodie/hadoop/datanode/pom.xml | 2 +- docker/hoodie/hadoop/historyserver/pom.xml | 2 +- docker/hoodie/hadoop/hive_base/pom.xml | 2 +- docker/hoodie/hadoop/namenode/pom.xml | 2 +- docker/hoodie/hadoop/pom.xml| 2 +- docker/hoodie/hadoop/prestobase/pom.xml | 2 +- docker/hoodie/hadoop/spark_base/pom.xml | 2 +- docker/hoodie/hadoop/sparkadhoc/pom.xml | 2 +- docker/hoodie/hadoop/sparkmaster/pom.xml| 2 +- docker/hoodie/hadoop/sparkworker/pom.xml| 2 +- hudi-cli/pom.xml| 2 +- hudi-client/hudi-client-common/pom.xml | 4 ++-- hudi-client/hudi-flink-client/pom.xml | 4 ++-- hudi-client/hudi-java-client/pom.xml| 4 ++-- hudi-client/hudi-spark-client/pom.xml | 4 ++-- hudi-client/pom.xml | 2 +- hudi-common/pom.xml | 2 +- hudi-examples/pom.xml | 2 +- hudi-flink/pom.xml | 2 +- hudi-hadoop-mr/pom.xml | 2 +- hudi-integ-test/pom.xml | 2 +- hudi-spark-datasource/hudi-spark-common/pom.xml | 4 ++-- hudi-spark-datasource/hudi-spark/pom.xml| 4 ++-- hudi-spark-datasource/hudi-spark2/pom.xml | 4 ++-- hudi-spark-datasource/hudi-spark3/pom.xml | 4 ++-- hudi-spark-datasource/pom.xml | 2 +- hudi-sync/hudi-dla-sync/pom.xml | 2 +- hudi-sync/hudi-hive-sync/pom.xml| 2 +- hudi-sync/hudi-sync-common/pom.xml | 2 +- hudi-sync/pom.xml | 2 +- hudi-timeline-service/pom.xml | 2 +- hudi-utilities/pom.xml | 2 +- packaging/hudi-flink-bundle/pom.xml | 2 +- packaging/hudi-hadoop-mr-bundle/pom.xml | 2 +- packaging/hudi-hive-sync-bundle/pom.xml | 2 +- packaging/hudi-integ-test-bundle/pom.xml| 2 +- packaging/hudi-presto-bundle/pom.xml| 2 +- packaging/hudi-spark-bundle/pom.xml | 2 +- packaging/hudi-timeline-server-bundle/pom.xml | 2 +- packaging/hudi-utilities-bundle/pom.xml | 2 +- pom.xml | 2 +- 42 files changed, 50 insertions(+), 50 deletions(-) diff --git a/docker/hoodie/hadoop/base/pom.xml b/docker/hoodie/hadoop/base/pom.xml index 42eb158..19a9bef 100644 --- a/docker/hoodie/hadoop/base/pom.xml +++ b/docker/hoodie/hadoop/base/pom.xml @@ -19,7 +19,7 @@ hudi-hadoop-docker org.apache.hudi -0.8.0-SNAPSHOT +0.9.0-SNAPSHOT 4.0.0 pom diff --git a/docker/hoodie/hadoop/datanode/pom.xml b/docker/hoodie/hadoop/datanode/pom.xml index 3ac8ec0..ca77f0d 100644 --- a/docker/hoodie/hadoop/datanode/pom.xml +++ b/docker/hoodie/hadoop/datanode/pom.xml @@ -19,7 +19,7 @@ hudi-hadoop-docker org.apache.hudi -0.8.0-SNAPSHOT +0.9.0-SNAPSHOT 4.0.0 pom diff --git a/docker/hoodie/hadoop/historyserver/pom.xml b/docker/hoodie/hadoop/historyserver/pom.xml index b0c5a77..c911d87 100644 --- a/docker/hoodie/hadoop/historyserver/pom.xml +++ b/docker/hoodie/hadoop/historyserver/pom.xml @@ -19,7 +19,7 @@ hudi-hadoop-docker org.apache.hudi -0.8.0-SNAPSHOT +0.9.0-SNAPSHOT 4.0.0 pom diff --git a/docker/hoodie/hadoop/hive_base/pom.xml b/docker/hoodie/hadoop/hive_base/pom.xml index 62ea4c1..3d95036 100644 --- a/docker/hoodie/hadoop/hive_base/pom.xml +++ b/docker/hoodie/hadoop/hive_base/pom.xml @@ -19,7 +19,7 @@ hudi-hadoop-docker org.apache.hudi -0.8.0-SNAPSHOT +0.9.0-SNAPSHOT 4.0.0 pom diff --git a/docker/hoodie/hadoop/namenode/pom.xml 
b/docker/hoodie/hadoop/namenode/pom.xml index dcd874c..3d3fd2f 100644 --- a/docker/hoodie/hadoop/namenode/pom.xml +++ b/docker/hoodie/hadoop/namenode/pom.xml @@ -19,7 +19,7 @@ hudi-hadoop-docker org.apache.hudi -0.8.0-SNAPSHOT +0.9.0-SNAPSHOT 4.0.0 pom diff --git a/docker/hoodie/hadoop/pom.xml b/docker/hoodie/hadoop/pom.xml index 612067d..20e8b5f 100644 --- a/docker/hoodie/hadoop/pom.xml +++ b/docker/hoodie/hadoop/pom.xml @@ -19,7 +19,7 @@ hudi org.apache.hudi -0.8.0-SNAPSHOT +0.9.0-SNAPSHOT ../../../pom.xml 4.0.0 diff --git a/docker/hoodie/hadoop/prestobase/pom.xml b/docker/hoodie/hadoop/prestobase/pom
[hudi] branch release-0.8.0 created (now 9bfd810)
This is an automated email from the ASF dual-hosted git repository. garyli pushed a change to branch release-0.8.0 in repository https://gitbox.apache.org/repos/asf/hudi.git. at 9bfd810 Create release branch for version 0.8.0. This branch includes the following new commits: new 9bfd810 Create release branch for version 0.8.0. The 1 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference.
[hudi] 01/01: Create release branch for version 0.8.0.
This is an automated email from the ASF dual-hosted git repository. garyli pushed a commit to branch release-0.8.0 in repository https://gitbox.apache.org/repos/asf/hudi.git commit 9bfd810e745202bd27980fa696d61b47922f46a9 Author: garyli1019 AuthorDate: Wed Mar 24 21:37:43 2021 +0800 Create release branch for version 0.8.0. --- docker/hoodie/hadoop/base/pom.xml | 2 +- docker/hoodie/hadoop/datanode/pom.xml | 2 +- docker/hoodie/hadoop/historyserver/pom.xml | 2 +- docker/hoodie/hadoop/hive_base/pom.xml | 2 +- docker/hoodie/hadoop/namenode/pom.xml | 2 +- docker/hoodie/hadoop/pom.xml| 2 +- docker/hoodie/hadoop/prestobase/pom.xml | 2 +- docker/hoodie/hadoop/spark_base/pom.xml | 2 +- docker/hoodie/hadoop/sparkadhoc/pom.xml | 2 +- docker/hoodie/hadoop/sparkmaster/pom.xml| 2 +- docker/hoodie/hadoop/sparkworker/pom.xml| 2 +- hudi-cli/pom.xml| 2 +- hudi-client/hudi-client-common/pom.xml | 4 ++-- hudi-client/hudi-flink-client/pom.xml | 4 ++-- hudi-client/hudi-java-client/pom.xml| 4 ++-- hudi-client/hudi-spark-client/pom.xml | 4 ++-- hudi-client/pom.xml | 2 +- hudi-common/pom.xml | 2 +- hudi-examples/pom.xml | 2 +- hudi-flink/pom.xml | 2 +- hudi-hadoop-mr/pom.xml | 2 +- hudi-integ-test/pom.xml | 2 +- hudi-spark-datasource/hudi-spark-common/pom.xml | 4 ++-- hudi-spark-datasource/hudi-spark/pom.xml| 4 ++-- hudi-spark-datasource/hudi-spark2/pom.xml | 4 ++-- hudi-spark-datasource/hudi-spark3/pom.xml | 4 ++-- hudi-spark-datasource/pom.xml | 2 +- hudi-sync/hudi-dla-sync/pom.xml | 2 +- hudi-sync/hudi-hive-sync/pom.xml| 2 +- hudi-sync/hudi-sync-common/pom.xml | 2 +- hudi-sync/pom.xml | 2 +- hudi-timeline-service/pom.xml | 2 +- hudi-utilities/pom.xml | 2 +- packaging/hudi-flink-bundle/pom.xml | 2 +- packaging/hudi-hadoop-mr-bundle/pom.xml | 2 +- packaging/hudi-hive-sync-bundle/pom.xml | 2 +- packaging/hudi-integ-test-bundle/pom.xml| 2 +- packaging/hudi-presto-bundle/pom.xml| 2 +- packaging/hudi-spark-bundle/pom.xml | 2 +- packaging/hudi-timeline-server-bundle/pom.xml | 2 +- packaging/hudi-utilities-bundle/pom.xml | 2 +- pom.xml | 2 +- 42 files changed, 50 insertions(+), 50 deletions(-) diff --git a/docker/hoodie/hadoop/base/pom.xml b/docker/hoodie/hadoop/base/pom.xml index 42eb158..3e2bc48 100644 --- a/docker/hoodie/hadoop/base/pom.xml +++ b/docker/hoodie/hadoop/base/pom.xml @@ -19,7 +19,7 @@ hudi-hadoop-docker org.apache.hudi -0.8.0-SNAPSHOT +0.8.0-rc1 4.0.0 pom diff --git a/docker/hoodie/hadoop/datanode/pom.xml b/docker/hoodie/hadoop/datanode/pom.xml index 3ac8ec0..561d1a9 100644 --- a/docker/hoodie/hadoop/datanode/pom.xml +++ b/docker/hoodie/hadoop/datanode/pom.xml @@ -19,7 +19,7 @@ hudi-hadoop-docker org.apache.hudi -0.8.0-SNAPSHOT +0.8.0-rc1 4.0.0 pom diff --git a/docker/hoodie/hadoop/historyserver/pom.xml b/docker/hoodie/hadoop/historyserver/pom.xml index b0c5a77..b06a238 100644 --- a/docker/hoodie/hadoop/historyserver/pom.xml +++ b/docker/hoodie/hadoop/historyserver/pom.xml @@ -19,7 +19,7 @@ hudi-hadoop-docker org.apache.hudi -0.8.0-SNAPSHOT +0.8.0-rc1 4.0.0 pom diff --git a/docker/hoodie/hadoop/hive_base/pom.xml b/docker/hoodie/hadoop/hive_base/pom.xml index 62ea4c1..c17c3da 100644 --- a/docker/hoodie/hadoop/hive_base/pom.xml +++ b/docker/hoodie/hadoop/hive_base/pom.xml @@ -19,7 +19,7 @@ hudi-hadoop-docker org.apache.hudi -0.8.0-SNAPSHOT +0.8.0-rc1 4.0.0 pom diff --git a/docker/hoodie/hadoop/namenode/pom.xml b/docker/hoodie/hadoop/namenode/pom.xml index dcd874c..ab7251c 100644 --- a/docker/hoodie/hadoop/namenode/pom.xml +++ b/docker/hoodie/hadoop/namenode/pom.xml @@ -19,7 +19,7 @@ hudi-hadoop-docker 
org.apache.hudi -0.8.0-SNAPSHOT +0.8.0-rc1 4.0.0 pom diff --git a/docker/hoodie/hadoop/pom.xml b/docker/hoodie/hadoop/pom.xml index 612067d..deff4ba 100644 --- a/docker/hoodie/hadoop/pom.xml +++ b/docker/hoodie/hadoop/pom.xml @@ -19,7 +19,7 @@ hudi org.apache.hudi -0.8.0-SNAPSHOT +0.8.0-rc1 ../../../pom.xml 4.0.0 diff --git a/docker/hoodie/hadoop/prestobase/pom.xml b/docker/hoodie/hadoop/prestobase/pom.xml index dea2f43..2430969 100644 --- a/docker/hoodie/hadoop/prestobase/pom.xml +++ b/docker/hoodie/hadoop/prestobase/pom.xml @@ -20,7 +20,7 @@ hudi-hadoop-docker org.apa
[jira] [Resolved] (HUDI-1712) Standardize prefix for hoodie lock configs
[ https://issues.apache.org/jira/browse/HUDI-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gary Li resolved HUDI-1712. --- Resolution: Resolved > Standardize prefix for hoodie lock configs > -- > > Key: HUDI-1712 > URL: https://issues.apache.org/jira/browse/HUDI-1712 > Project: Apache Hudi > Issue Type: Task > Components: Writer Core >Reporter: Nishith Agarwal >Assignee: Nishith Agarwal >Priority: Major > Labels: pull-request-available > Fix For: 0.8.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (HUDI-1712) Standardize prefix for hoodie lock configs
[ https://issues.apache.org/jira/browse/HUDI-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gary Li closed HUDI-1712. - > Standardize prefix for hoodie lock configs > -- > > Key: HUDI-1712 > URL: https://issues.apache.org/jira/browse/HUDI-1712 > Project: Apache Hudi > Issue Type: Task > Components: Writer Core >Reporter: Nishith Agarwal >Assignee: Nishith Agarwal >Priority: Major > Labels: pull-request-available > Fix For: 0.8.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-909) Integrate hudi with flink engine
[ https://issues.apache.org/jira/browse/HUDI-909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gary Li updated HUDI-909: - Issue Type: New Feature (was: Task) > Integrate hudi with flink engine > > > Key: HUDI-909 > URL: https://issues.apache.org/jira/browse/HUDI-909 > Project: Apache Hudi > Issue Type: New Feature >Reporter: wangxianghu#1 >Assignee: Xianghu Wang >Priority: Major > > Integrate hudi with flink engine -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-909) [UMBRELLA]Integrate hudi with flink engine
[ https://issues.apache.org/jira/browse/HUDI-909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gary Li updated HUDI-909: - Summary: [UMBRELLA]Integrate hudi with flink engine (was: Integrate hudi with flink engine) > [UMBRELLA]Integrate hudi with flink engine > -- > > Key: HUDI-909 > URL: https://issues.apache.org/jira/browse/HUDI-909 > Project: Apache Hudi > Issue Type: New Feature >Reporter: wangxianghu#1 >Assignee: Xianghu Wang >Priority: Major > > Integrate hudi with flink engine -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-1521) [UMBRELLA]HUDI Flink writer proposal
[ https://issues.apache.org/jira/browse/HUDI-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gary Li updated HUDI-1521: -- Summary: [UMBRELLA]HUDI Flink writer proposal (was: HUDI Flink writer proposal) > [UMBRELLA]HUDI Flink writer proposal > > > Key: HUDI-1521 > URL: https://issues.apache.org/jira/browse/HUDI-1521 > Project: Apache Hudi > Issue Type: New Feature > Components: Flink Integration >Reporter: Danny Chen >Assignee: Danny Chen >Priority: Major > > As the RFC-24 has described [1], we would promote the Flink writer as > following: > 1. Remove the single parallelism operator and add test framework > 2. Make the write task scalable > 3. Write as mini-batch > 4. Add a new index > So this is an umbrella issue, we would fix each as sub-tasks. > [1] > https://cwiki.apache.org/confluence/display/HUDI/RFC+-+24%3A+Hoodie+Flink+Writer+Proposal -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-1521) HUDI Flink writer proposal
[ https://issues.apache.org/jira/browse/HUDI-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gary Li updated HUDI-1521: -- Issue Type: New Feature (was: Improvement) > HUDI Flink writer proposal > -- > > Key: HUDI-1521 > URL: https://issues.apache.org/jira/browse/HUDI-1521 > Project: Apache Hudi > Issue Type: New Feature > Components: Flink Integration >Reporter: Danny Chen >Assignee: Danny Chen >Priority: Major > > As the RFC-24 has described [1], we would promote the Flink writer as > following: > 1. Remove the single parallelism operator and add test framework > 2. Make the write task scalable > 3. Write as mini-batch > 4. Add a new index > So this is an umbrella issue, we would fix each as sub-tasks. > [1] > https://cwiki.apache.org/confluence/display/HUDI/RFC+-+24%3A+Hoodie+Flink+Writer+Proposal -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-1521) [UMBRELLA]HUDI Flink writer proposal
[ https://issues.apache.org/jira/browse/HUDI-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gary Li updated HUDI-1521: -- Fix Version/s: 0.8.0 > [UMBRELLA]HUDI Flink writer proposal > > > Key: HUDI-1521 > URL: https://issues.apache.org/jira/browse/HUDI-1521 > Project: Apache Hudi > Issue Type: New Feature > Components: Flink Integration >Reporter: Danny Chen >Assignee: Danny Chen >Priority: Major > Fix For: 0.8.0 > > > As the RFC-24 has described [1], we would promote the Flink writer as > following: > 1. Remove the single parallelism operator and add test framework > 2. Make the write task scalable > 3. Write as mini-batch > 4. Add a new index > So this is an umbrella issue, we would fix each as sub-tasks. > [1] > https://cwiki.apache.org/confluence/display/HUDI/RFC+-+24%3A+Hoodie+Flink+Writer+Proposal -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-1521) [UMBRELLA] RFC-24 HUDI Flink writer proposal
[ https://issues.apache.org/jira/browse/HUDI-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gary Li updated HUDI-1521: -- Summary: [UMBRELLA] RFC-24 HUDI Flink writer proposal (was: [UMBRELLA]HUDI Flink writer proposal) > [UMBRELLA] RFC-24 HUDI Flink writer proposal > > > Key: HUDI-1521 > URL: https://issues.apache.org/jira/browse/HUDI-1521 > Project: Apache Hudi > Issue Type: New Feature > Components: Flink Integration >Reporter: Danny Chen >Assignee: Danny Chen >Priority: Major > Fix For: 0.8.0 > > > As the RFC-24 has described [1], we would promote the Flink writer as > following: > 1. Remove the single parallelism operator and add test framework > 2. Make the write task scalable > 3. Write as mini-batch > 4. Add a new index > So this is an umbrella issue, we would fix each as sub-tasks. > [1] > https://cwiki.apache.org/confluence/display/HUDI/RFC+-+24%3A+Hoodie+Flink+Writer+Proposal -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HUDI-1715) Refactor the BaseFlinkCommitActionExecutor to adapt to the new Flink writer
Gary Li created HUDI-1715: - Summary: Refactor the BaseFlinkCommitActionExecutor to adapt to the new Flink writer Key: HUDI-1715 URL: https://issues.apache.org/jira/browse/HUDI-1715 Project: Apache Hudi Issue Type: Improvement Reporter: Gary Li -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-1715) Refactor the BaseFlinkCommitActionExecutor to adapt to the new Flink writer
[ https://issues.apache.org/jira/browse/HUDI-1715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gary Li updated HUDI-1715: -- Component/s: Flink Integration > Refactor the BaseFlinkCommitActionExecutor to adapt to the new Flink writer > --- > > Key: HUDI-1715 > URL: https://issues.apache.org/jira/browse/HUDI-1715 > Project: Apache Hudi > Issue Type: Improvement > Components: Flink Integration >Reporter: Gary Li >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (HUDI-1581) Refactor the BaseFlinkCommitActionExecutor to adapt to the new Flink writer
[ https://issues.apache.org/jira/browse/HUDI-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gary Li closed HUDI-1581. - Resolution: Duplicate > Refactor the BaseFlinkCommitActionExecutor to adapt to the new Flink writer > --- > > Key: HUDI-1581 > URL: https://issues.apache.org/jira/browse/HUDI-1581 > Project: Apache Hudi > Issue Type: Sub-task > Components: Common Core >Reporter: Danny Chen >Priority: Major > > In order to adapt to the new Flink writer, the executor needs to support: > 1. specify the bucket type explicitly for a batch of records directly > 2. have control on when and how the underneath file handles roll over -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-1715) Refactor the BaseFlinkCommitActionExecutor to adapt to the new Flink writer
[ https://issues.apache.org/jira/browse/HUDI-1715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gary Li updated HUDI-1715: -- Description: In order to adapt to the new Flink writer, the executor needs to support: 1. specify the bucket type explicitly for a batch of records directly 2. have control on when and how the underneath file handles roll over > Refactor the BaseFlinkCommitActionExecutor to adapt to the new Flink writer > --- > > Key: HUDI-1715 > URL: https://issues.apache.org/jira/browse/HUDI-1715 > Project: Apache Hudi > Issue Type: Improvement > Components: Flink Integration >Reporter: Gary Li >Priority: Major > > In order to adapt to the new Flink writer, the executor needs to support: > 1. specify the bucket type explicitly for a batch of records directly > 2. have control on when and how the underneath file handles roll over -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (HUDI-1521) [UMBRELLA] RFC-24 HUDI Flink writer proposal
[ https://issues.apache.org/jira/browse/HUDI-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gary Li closed HUDI-1521. - > [UMBRELLA] RFC-24 HUDI Flink writer proposal > > > Key: HUDI-1521 > URL: https://issues.apache.org/jira/browse/HUDI-1521 > Project: Apache Hudi > Issue Type: New Feature > Components: Flink Integration >Reporter: Danny Chen >Assignee: Danny Chen >Priority: Major > Fix For: 0.8.0 > > > As the RFC-24 has described [1], we would promote the Flink writer as > following: > 1. Remove the single parallelism operator and add test framework > 2. Make the write task scalable > 3. Write as mini-batch > 4. Add a new index > So this is an umbrella issue, we would fix each as sub-tasks. > [1] > https://cwiki.apache.org/confluence/display/HUDI/RFC+-+24%3A+Hoodie+Flink+Writer+Proposal -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HUDI-1521) [UMBRELLA] RFC-24 HUDI Flink writer proposal
[ https://issues.apache.org/jira/browse/HUDI-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gary Li resolved HUDI-1521. --- Resolution: Implemented > [UMBRELLA] RFC-24 HUDI Flink writer proposal > > > Key: HUDI-1521 > URL: https://issues.apache.org/jira/browse/HUDI-1521 > Project: Apache Hudi > Issue Type: New Feature > Components: Flink Integration >Reporter: Danny Chen >Assignee: Danny Chen >Priority: Major > Fix For: 0.8.0 > > > As the RFC-24 has described [1], we would promote the Flink writer as > following: > 1. Remove the single parallelism operator and add test framework > 2. Make the write task scalable > 3. Write as mini-batch > 4. Add a new index > So this is an umbrella issue, we would fix each as sub-tasks. > [1] > https://cwiki.apache.org/confluence/display/HUDI/RFC+-+24%3A+Hoodie+Flink+Writer+Proposal -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] vburenin opened a new pull request #2714: [HUDI-1707] Reduces log level for too verbose messages from info to debug level.
vburenin opened a new pull request #2714: URL: https://github.com/apache/hudi/pull/2714 ## What is the purpose of the pull request Some log messages are too verbose and some are not easily readable. This PR moves the most verbose messages from info to debug level and improves the printout of the configuration info in DeltaStreamer. ## Verify this pull request This pull request is a trivial rework / code cleanup without any test coverage. - [x] Has a corresponding JIRA in PR title & commit - [x] Commit message is descriptive of the change - [x] CI is green -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (HUDI-1707) Improve Logging subsystem
[ https://issues.apache.org/jira/browse/HUDI-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-1707: - Labels: pull-request-available (was: ) > Improve Logging subsystem > - > > Key: HUDI-1707 > URL: https://issues.apache.org/jira/browse/HUDI-1707 > Project: Apache Hudi > Issue Type: Improvement > Components: Code Cleanup >Reporter: Volodymyr Burenin >Priority: Major > Labels: pull-request-available > > Currently Hudi produces relatively verbose logging at info level that is not > particularly useful, such as latency measurements of file system views and > printouts of commit timelines, which can be very large. > In addition, the logging subsystem is suboptimal: it formats all messages > before they are passed to the logger, so a lot of work is done regardless of > whether the logger will actually print the message. > It would also be nice to add more info-level messages that indicate which > phase Hudi is in: ideally, info-level logging should be limited so that just > looking at the logs makes it clear which phase Hudi is in and what it is > doing, without being too verbose. > TBD: Add more thoughts on logging subsystem -- This message was sent by Atlassian Jira (v8.3.4#803005)
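The description above concerns eager message formatting. As a minimal sketch of that point (not code from the PR; the class and message are made up, assuming only the log4j Logger/LogManager API already used elsewhere in the codebase): string concatenation inside an unguarded log call is evaluated even when the message is never printed, while a guarded debug call avoids the wasted work.

```
import org.apache.log4j.{LogManager, Logger}

object LoggingCostSketch {
  private val LOG: Logger = LogManager.getLogger(getClass)

  def report(commitTimeline: AnyRef): Unit = {
    // Unguarded info log: the concatenation (and commitTimeline.toString) runs
    // even if this message is never emitted -- the wasted work described above.
    LOG.info("Loaded commit timeline: " + commitTimeline)

    // Guarded debug log: the expensive rendering happens only when debug output
    // will actually be printed.
    if (LOG.isDebugEnabled) {
      LOG.debug("Loaded commit timeline: " + commitTimeline)
    }
  }
}
```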
[GitHub] [hudi] vburenin commented on a change in pull request #2687: [HUDI-1700] Hudi Meetup with Uber video link
vburenin commented on a change in pull request #2687: URL: https://github.com/apache/hudi/pull/2687#discussion_r600554090 ## File path: docs/_docs/0.7.0/1_4_powered_by.md ## @@ -146,6 +146,8 @@ Meanwhile, we build a set of data access standards based on Hudi, which provides 21. ["Meetup talk by Nishith Agarwal"](https://www.meetup.com/UberEvents/events/274924537/) - Uber Data Platforms Meetup, Dec 2020 +22. ["Apache Hudi Meetup at Uber with talks from AWS, CityStorageSystems & Uber"](https://youtu.be/cAvbBfMbaiA) - By Udit Mehrotra, Wenning Ding (AWS), Alexander Filipchik (CityStorageSystems), Prashant Wason, Satish Kotha (Uber), Feb 2021 Review comment: Still private -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] kema-wish opened a new pull request #2715: Fix non object id key
kema-wish opened a new pull request #2715: URL: https://github.com/apache/hudi/pull/2715 ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contributing.html before opening a pull request.* ## What is the purpose of the pull request *(For example: This pull request adds quick-start document.)* ## Brief change log *(for example:)* - *Modify AnnotationLocation checkstyle rule in checkstyle.xml* ## Verify this pull request *(Please pick either of the following options)* This pull request is a trivial rework / code cleanup without any test coverage. *(or)* This pull request is already covered by existing tests, such as *(please describe tests)*. (or) This change added tests and can be verified as follows: *(example:)* - *Added integration tests for end-to-end.* - *Added HoodieClientWriteTest to verify the change.* - *Manually verified the change by running a job locally.* ## Committer checklist - [ ] Has a corresponding JIRA in PR title & commit - [ ] Commit message is descriptive of the change - [ ] CI is green - [ ] Necessary doc changes done or have another open PR - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] kema-wish closed pull request #2715: Fix non object id key
kema-wish closed pull request #2715: URL: https://github.com/apache/hudi/pull/2715 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] n3nash merged pull request #2709: [HUDI-1713] Updating config name for concurrency
n3nash merged pull request #2709: URL: https://github.com/apache/hudi/pull/2709 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[hudi] branch asf-site updated: Updating config name (#2709)
This is an automated email from the ASF dual-hosted git repository. nagarwal pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/asf-site by this push: new f48fc59 Updating config name (#2709) f48fc59 is described below commit f48fc591cc2309152ed602401b973581e34a1916 Author: n3nash AuthorDate: Wed Mar 24 08:54:49 2021 -0700 Updating config name (#2709) --- docs/_docs/2_9_concurrency_control.md | 46 +-- 1 file changed, 23 insertions(+), 23 deletions(-) diff --git a/docs/_docs/2_9_concurrency_control.md b/docs/_docs/2_9_concurrency_control.md index e555d39..f3abc77 100644 --- a/docs/_docs/2_9_concurrency_control.md +++ b/docs/_docs/2_9_concurrency_control.md @@ -45,7 +45,7 @@ The following properties are needed to be set properly to turn on optimistic con ``` hoodie.write.concurrency.mode=optimistic_concurrency_control hoodie.failed.writes.cleaner.policy=LAZY -hoodie.writer.lock.provider= +hoodie.write.lock.provider= ``` There are 2 different server based lock providers that require different configuration to be set. @@ -53,23 +53,23 @@ There are 2 different server based lock providers that require different configu **`Zookeeper`** based lock provider ``` -hoodie.writer.lock.provider=org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider -hoodie.writer.lock.zookeeper.url -hoodie.writer.lock.zookeeper.port -hoodie.writer.lock.wait_time_ms -hoodie.writer.lock.num_retries -hoodie.writer.lock.lock_key -hoodie.writer.lock.zookeeper.zk_base_path +hoodie.write.lock.provider=org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider +hoodie.write.lock.zookeeper.url +hoodie.write.lock.zookeeper.port +hoodie.write.lock.wait_time_ms +hoodie.write.lock.num_retries +hoodie.write.lock.lock_key +hoodie.write.lock.zookeeper.zk_base_path ``` **`HiveMetastore`** based lock provider ``` -hoodie.writer.lock.provider=org.apache.hudi.hive.HiveMetastoreBasedLockProvider -hoodie.writer.lock.hivemetastore.database -hoodie.writer.lock.hivemetastore.table -hoodie.writer.lock.wait_time_ms -hoodie.writer.lock.num_retries +hoodie.write.lock.provider=org.apache.hudi.hive.HiveMetastoreBasedLockProvider +hoodie.write.lock.hivemetastore.database +hoodie.write.lock.hivemetastore.table +hoodie.write.lock.wait_time_ms +hoodie.write.lock.num_retries ``` `The HiveMetastore URI's are picked up from the hadoop configuration file loaded during runtime.` @@ -86,12 +86,12 @@ inputDF.write.format("hudi") .option(PRECOMBINE_FIELD_OPT_KEY, "ts") .option("hoodie.failed.writes.cleaner.policy", "LAZY") .option("hoodie.write.concurrency.mode", "optimistic_concurrency_control") - .option("hoodie.writer.lock.zookeeper.url", "zookeeper") - .option("hoodie.writer.lock.zookeeper.port", "2181") - .option("hoodie.writer.lock.wait_time_ms", "12000") - .option("hoodie.writer.lock.num_retries", "2") - .option("hoodie.writer.lock.lock_key", "test_table") - .option("hoodie.writer.lock.zookeeper.zk_base_path", "/test") + .option("hoodie.write.lock.zookeeper.url", "zookeeper") + .option("hoodie.write.lock.zookeeper.port", "2181") + .option("hoodie.write.lock.wait_time_ms", "12000") + .option("hoodie.write.lock.num_retries", "2") + .option("hoodie.write.lock.lock_key", "test_table") + .option("hoodie.write.lock.zookeeper.zk_base_path", "/test") .option(RECORDKEY_FIELD_OPT_KEY, "uuid") .option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath") .option(TABLE_NAME, tableName) @@ -128,15 +128,15 @@ Concurrent Writing to Hudi tables requires acquiring a 
lock with either Zookeepe Set the correct native lock provider client retries. NOTE that sometimes these settings are set on the server once and all clients inherit the same configs. Please check your settings before enabling optimistic concurrency. ``` -hoodie.writer.lock.wait_time_ms -hoodie.writer.lock.num_retries +hoodie.write.lock.wait_time_ms +hoodie.write.lock.num_retries ``` Set the correct hudi client retries for Zookeeper & HiveMetastore. This is useful in cases when native client retry settings cannot be changed. Please note that these retries will happen in addition to any native client retries that you may have set. ``` -hoodie.writer.lock.client.wait_time_ms -hoodie.writer.lock.client.num_retries +hoodie.write.lock.client.wait_time_ms +hoodie.write.lock.client.num_retries ``` *Setting the right values for these depends on a case by case basis; some defaults have been provided for general cases.*
[jira] [Updated] (HUDI-1713) Fix config name for concurrency
[ https://issues.apache.org/jira/browse/HUDI-1713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-1713: - Labels: pull-request-available (was: ) > Fix config name for concurrency > --- > > Key: HUDI-1713 > URL: https://issues.apache.org/jira/browse/HUDI-1713 > Project: Apache Hudi > Issue Type: Task > Components: Writer Core >Reporter: Nishith Agarwal >Assignee: Nishith Agarwal >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] n3nash edited a comment on pull request #2701: [HUDI 1623] New Hoodie Instant on disk format with end time and milliseconds granularity
n3nash edited a comment on pull request #2701: URL: https://github.com/apache/hudi/pull/2701#issuecomment-804432752 @vinothchandar Can you take an early cursory look at this PR ? I have not added any tests yet and more changes need to be done for the build to work. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[hudi] branch asf-site updated: Travis CI build asf-site
This is an automated email from the ASF dual-hosted git repository. vinoth pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/asf-site by this push: new cd78ade Travis CI build asf-site cd78ade is described below commit cd78ade8e43ce4e592df09e7ce1e775d009c44e1 Author: CI AuthorDate: Wed Mar 24 19:40:21 2021 + Travis CI build asf-site --- content/docs/concurrency_control.html | 46 +-- 1 file changed, 23 insertions(+), 23 deletions(-) diff --git a/content/docs/concurrency_control.html b/content/docs/concurrency_control.html index af97c0f..f94a03f 100644 --- a/content/docs/concurrency_control.html +++ b/content/docs/concurrency_control.html @@ -415,29 +415,29 @@ This feature is currently experimental and requires either Zookeeper or hoodie.write.concurrency.mode=optimistic_concurrency_control hoodie.failed.writes.cleaner.policy=LAZY -hoodie.writer.lock.provider=+hoodie.write.lock.provider= There are 2 different server based lock providers that require different configuration to be set. Zookeeper based lock provider -hoodie.writer.lock.provider=org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider -hoodie.writer.lock.zookeeper.url -hoodie.writer.lock.zookeeper.port -hoodie.writer.lock.wait_time_ms -hoodie.writer.lock.num_retries -hoodie.writer.lock.lock_key -hoodie.writer.lock.zookeeper.zk_base_path +hoodie.write.lock.provider=org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider +hoodie.write.lock.zookeeper.url +hoodie.write.lock.zookeeper.port +hoodie.write.lock.wait_time_ms +hoodie.write.lock.num_retries +hoodie.write.lock.lock_key +hoodie.write.lock.zookeeper.zk_base_path HiveMetastore based lock provider -hoodie.writer.lock.provider=org.apache.hudi.hive.HiveMetastoreBasedLockProvider -hoodie.writer.lock.hivemetastore.database -hoodie.writer.lock.hivemetastore.table -hoodie.writer.lock.wait_time_ms -hoodie.writer.lock.num_retries +hoodie.write.lock.provider=org.apache.hudi.hive.HiveMetastoreBasedLockProvider +hoodie.write.lock.hivemetastore.database +hoodie.write.lock.hivemetastore.table +hoodie.write.lock.wait_time_ms +hoodie.write.lock.num_retries The HiveMetastore URI's are picked up from the hadoop configuration file loaded during runtime. @@ -453,12 +453,12 @@ hoodie.writer.lock.num_retries .option(PRECOMBINE_FIELD_OPT_KEY, "ts") .option("hoodie.failed.writes.cleaner.policy", "LAZY") .option("hoodie.write.concurrency.mode", "optimistic_concurrency_control") - .option("hoodie.writer.lock.zookeeper.url", "zookeeper") - .option("hoodie.writer.lock.zookeeper.port", "2181") - .option("hoodie.writer.lock.wait_time_ms", "12000") - .option("hoodie.writer.lock.num_retries", "2") - .option("hoodie.writer.lock.lock_key", "test_table") - .option("hoodie.writer.lock.zookeeper.zk_base_path", "/test") + .option("hoodie.write.lock.zookeeper.url", "zookeeper") + .option("hoodie.write.lock.zookeeper.port", "2181") + .option("hoodie.write.lock.wait_time_ms", "12000") + .option("hoodie.write.lock.num_retries", "2") + .option("hoodie.write.lock.lock_key", "test_table") + .option("hoodie.write.lock.zookeeper.zk_base_path", "/test") .option(RECORDKEY_FIELD_OPT_KEY, "uuid") .option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath") .option(TABLE_NAME, tableName) @@ -495,14 +495,14 @@ A deltastreamer job can then be triggered as follows: Set the correct native lock provider client retries. NOTE that sometimes these settings are set on the server once and all clients inherit the same configs. 
Please check your settings before enabling optimistic concurrency. -hoodie.writer.lock.wait_time_ms -hoodie.writer.lock.num_retries +hoodie.write.lock.wait_time_ms +hoodie.write.lock.num_retries Set the correct hudi client retries for Zookeeper & HiveMetastore. This is useful in cases when native client retry settings cannot be changed. Please note that these retries will happen in addition to any native client retries that you may have set. -hoodie.writer.lock.client.wait_time_ms -hoodie.writer.lock.client.num_retries +hoodie.write.lock.client.wait_time_ms +hoodie.write.lock.client.num_retries Setting the right values for these depends on a case by case basis; some defaults have been provided for general cases.
[GitHub] [hudi] vinothchandar commented on a change in pull request #2645: [HUDI-1659] Basic Implementation Of Spark Sql Support
vinothchandar commented on a change in pull request #2645: URL: https://github.com/apache/hudi/pull/2645#discussion_r597997690 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java ## @@ -394,4 +405,36 @@ public IOType getIOType() { public HoodieBaseFile baseFileForMerge() { return baseFileToMerge; } + + /** + * A special record returned by {@link HoodieRecordPayload}, which means + * {@link HoodieMergeHandle} should just skip this record. + */ + private static class IgnoreRecord implements GenericRecord { Review comment: want to understand the need for this better. ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieWriteHandle.java ## @@ -54,9 +53,18 @@ public abstract class HoodieWriteHandle extends HoodieIOHandle { private static final Logger LOG = LogManager.getLogger(HoodieWriteHandle.class); + /** + * The input schema of the incoming dataframe. + */ + protected final Schema inputSchema; Review comment: as we discussed in the other PR, we typically call this the `writeSchema` ## File path: hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/HoodieBaseSqlTest.scala ## @@ -0,0 +1,72 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.hudi + +import java.io.File + +import org.apache.spark.sql.{Row, SparkSession} +import org.apache.spark.util.Utils +import org.scalactic.source +import org.scalatest.{BeforeAndAfterAll, FunSuite, Tag} + +class HoodieBaseSqlTest extends FunSuite with BeforeAndAfterAll { Review comment: can we start all test classes using `Test` convention. Thats what we use throughout the project. ## File path: hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/keygen/UuidKeyGenerator.java ## @@ -0,0 +1,52 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.hudi.keygen; + +import java.util.ArrayList; +import java.util.Arrays; +import java.util.UUID; +import java.util.stream.Collectors; +import org.apache.avro.generic.GenericRecord; +import org.apache.hudi.common.config.TypedProperties; +import org.apache.hudi.keygen.constant.KeyGeneratorOptions; + +/** + * A KeyGenerator which use the uuid as the record key. + */ +public class UuidKeyGenerator extends BuiltinKeyGenerator { Review comment: to revisit why we want this. ## File path: hudi-spark-datasource/hudi-spark2/src/main/antlr4/imports/SqlBase.g4 ## @@ -0,0 +1,1099 @@ +/* + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + * This file is an adaptation of Presto's presto-parser/src/main/antlr4/com/facebook/presto/sql/parser/SqlBase.g4 grammar. Review comment: I thought we planned to just use Spark SQL keywords and limit to Spark3 (which should already recognize delete/merge?). Is ant
[GitHub] [hudi] pengzhiwei2018 edited a comment on pull request #2645: [HUDI-1659] Basic Implementation Of Spark Sql Support
pengzhiwei2018 edited a comment on pull request #2645: URL: https://github.com/apache/hudi/pull/2645#issuecomment-803355840 > Getting started on this. Sorry for the delay. > > How important are the changes around writeSchema vs inputSchema and such changes to the SQL implementation? Hi @vinothchandar, thanks for your review. It's necessary to introduce the `inputSchema` & `tableSchema` to replace the original `writeSchema` for MergeInto. For example: ``` Merge Into h0 using ( select id, name, flag from s) as s0 on s0.id = h0.id when matched and flag ='u' then update set id = s0.name, name = s0.name when not matched then insert (id, name) values(s0.id, s0.name) ``` The input is `"select id, name, flag from s"`, whose schema is `(id, name, flag)`. But the record written to the table is `(id, name)` after the update & insert translation, so the inputSchema is not equal to the writeSchema and the original `writeSchema` cannot handle this scenario. I introduce the `inputSchema` & `tableSchema` to solve this problem: the `inputSchema` is used to parse the incoming record and the `tableSchema` to write & read records from the table. In most cases, except MergeInto, the `inputSchema` is the same as the `tableSchema`, so it should not affect the original logic, IMO. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
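To make the inputSchema vs tableSchema distinction concrete, here is a small Avro sketch (hypothetical record and field names, not code from the PR): a record decoded with the input schema of the MERGE source query is projected onto the table schema before it is written.

```
import org.apache.avro.SchemaBuilder
import org.apache.avro.generic.GenericData
import scala.collection.JavaConverters._

object SchemaSplitSketch extends App {
  // Schema of the incoming dataframe produced by the MERGE source query: (id, name, flag).
  val inputSchema = SchemaBuilder.record("source").fields()
    .requiredInt("id").requiredString("name").requiredString("flag").endRecord()
  // Schema of the target table: (id, name).
  val tableSchema = SchemaBuilder.record("target").fields()
    .requiredInt("id").requiredString("name").endRecord()

  // A record parsed with the input schema...
  val incoming = new GenericData.Record(inputSchema)
  incoming.put("id", 1)
  incoming.put("name", "a")
  incoming.put("flag", "u")

  // ...carries only the table's fields after the update/insert translation.
  val toWrite = new GenericData.Record(tableSchema)
  tableSchema.getFields.asScala.foreach(f => toWrite.put(f.name, incoming.get(f.name)))
  println(toWrite) // {"id": 1, "name": "a"}
}
```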
[GitHub] [hudi] vinothchandar commented on pull request #2645: [HUDI-1659] Basic Implementation Of Spark Sql Support
vinothchandar commented on pull request #2645: URL: https://github.com/apache/hudi/pull/2645#issuecomment-806141760 Let me see how/if we can simplify the inputSchema vs writeSchema thing. I went over the PR now. LGTM at a high level. A few questions though: - I see we are introducing some antlr parsing and injecting a custom parser for Spark 2.x. Is this done for backwards compat with Spark 2, and will it eventually be removed? - Do we reuse the MERGE/DELETE keywords from Spark 3? Are the Spark 3 and Spark 2 syntaxes different? Can you comment on how we are approaching all this. - Have you done any production testing of this PR? cc @kwondw could you also please chime in. We would like to land something basic, iterate, and get this out for 0.9.0 next month. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] vinothchandar commented on pull request #2645: [HUDI-1659] Basic Implementation Of Spark Sql Support
vinothchandar commented on pull request #2645: URL: https://github.com/apache/hudi/pull/2645#issuecomment-806159133 Can we also handle DELETE in this PR itself? That way, we have some basic support for all of the major DMLs -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] vinothchandar commented on pull request #2645: [HUDI-1659] Basic Implementation Of Spark Sql Support
vinothchandar commented on pull request #2645: URL: https://github.com/apache/hudi/pull/2645#issuecomment-806165264 cc @vingov this may also be useful for you -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] xiarixiaoyao commented on pull request #2645: [HUDI-1659] Basic Implementation Of Spark Sql Support
xiarixiaoyao commented on pull request #2645: URL: https://github.com/apache/hudi/pull/2645#issuecomment-806300975 @pengzhiwei2018 it's great work, thanks. The MERGE/DELETE expressions (MergeIntoTable, MergeAction, ...) in your PR are copied from v2Commands.scala in Spark 3.0, which will lead to class conflicts if we implement SQL support for Spark 3 later. Could you shade those keywords? @vinothchandar the MERGE/DELETE keywords from Spark 3 are incompatible with this PR; however, it's not a problem, we can introduce an extra module hudi-spark3-extensions to resolve those incompatibilities. I will put forward a new PR for Spark 3 in the next few days. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] xiarixiaoyao commented on a change in pull request #2651: [HUDI-1591] [RFC-26] Improve Hoodie Table Query Performance And Ease Of Use Fo…
xiarixiaoyao commented on a change in pull request #2651: URL: https://github.com/apache/hudi/pull/2651#discussion_r600990095 ## File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieFileIndex.scala ## @@ -0,0 +1,349 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hudi + +import java.util.Properties + +import scala.collection.JavaConverters._ +import org.apache.hadoop.fs.{FileStatus, Path} +import org.apache.hudi.client.common.HoodieSparkEngineContext +import org.apache.hudi.common.config.{HoodieMetadataConfig, SerializableConfiguration} +import org.apache.hudi.common.engine.HoodieLocalEngineContext +import org.apache.hudi.common.fs.FSUtils +import org.apache.hudi.common.model.HoodieBaseFile +import org.apache.hudi.common.table.{HoodieTableMetaClient, TableSchemaResolver} +import org.apache.hudi.common.table.view.HoodieTableFileSystemView +import org.apache.hudi.config.HoodieWriteConfig +import org.apache.spark.api.java.JavaSparkContext +import org.apache.spark.internal.Logging +import org.apache.spark.sql.catalyst.{InternalRow, expressions} +import org.apache.spark.sql.SparkSession +import org.apache.spark.sql.avro.SchemaConverters +import org.apache.spark.sql.catalyst.expressions.{AttributeReference, BoundReference, Expression, InterpretedPredicate} +import org.apache.spark.sql.catalyst.util.{CaseInsensitiveMap, DateTimeUtils} +import org.apache.spark.sql.execution.datasources.{FileIndex, FileStatusCache, NoopCache, PartitionDirectory, PartitionUtils} +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.types.StructType +import org.apache.spark.unsafe.types.UTF8String + +import scala.collection.mutable + +/** + * A File Index which support partition prune for hoodie snapshot and read-optimized + * query. + * Main steps to get the file list for query: + * 1、Load all files and partition values from the table path. + * 2、Do the partition prune by the partition filter condition. + * + * There are 3 cases for this: + * 1、If the partition columns size is equal to the actually partition path level, we + * read it as partitioned table.(e.g partition column is "dt", the partition path is "2021-03-10") + * + * 2、If the partition columns size is not equal to the partition path level, but the partition + * column size is "1" (e.g. partition column is "dt", but the partition path is "2021/03/10" + * who'es directory level is 3).We can still read it as a partitioned table. We will mapping the + * partition path (e.g. 2021/03/10) to the only partition column (e.g. "dt"). + * + * 3、Else the the partition columns size is not equal to the partition directory level and the + * size is great than "1" (e.g. 
partition column is "dt,hh", the partition path is "2021/03/10/12") + * , we read it as a None Partitioned table because we cannot know how to mapping the partition + * path with the partition columns in this case. + */ +case class HoodieFileIndex( + spark: SparkSession, + metaClient: HoodieTableMetaClient, + schemaSpec: Option[StructType], + options: Map[String, String], + @transient fileStatusCache: FileStatusCache = NoopCache) + extends FileIndex with Logging { + + private val basePath = metaClient.getBasePath + + @transient private val queryPath = new Path(options.getOrElse("path", "'path' option required")) + /** +* Get the schema of the table. +*/ + lazy val schema: StructType = schemaSpec.getOrElse({ +val schemaUtil = new TableSchemaResolver(metaClient) +SchemaConverters.toSqlType(schemaUtil.getTableAvroSchema) + .dataType.asInstanceOf[StructType] + }) + + /** +* Get the partition schema from the hoodie.properties. +*/ + private lazy val _partitionSchemaFromProperties: StructType = { +val tableConfig = metaClient.getTableConfig +val partitionColumns = tableConfig.getPartitionColumns +val nameFieldMap = schema.fields.map(filed => filed.name -> filed).toMap + +if (partitionColumns.isPresent) { + val partitionFields = partitionColumns.get().map(column => +nameFieldMap.getOrElse(column, throw new IllegalArgumentException(s"Cannot find column: '" + +
[GitHub] [hudi] xiarixiaoyao commented on a change in pull request #2645: [HUDI-1659] Basic Implementation Of Spark Sql Support
xiarixiaoyao commented on a change in pull request #2645: URL: https://github.com/apache/hudi/pull/2645#discussion_r600992762 ## File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/HoodieSparkSessionExtension.scala ## @@ -0,0 +1,48 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.hudi + +import org.apache.spark.SPARK_VERSION +import org.apache.spark.sql.SparkSessionExtensions +import org.apache.spark.sql.hudi.analysis.HoodieAnalysis +import org.apache.spark.sql.hudi.parser.HoodieSqlParser + +/** + * The Hoodie SparkSessionExtension for extending the syntax and add the rules. + */ +class HoodieSparkSessionExtension extends (SparkSessionExtensions => Unit) { + override def apply(extensions: SparkSessionExtensions): Unit = { +if (SPARK_VERSION.startsWith("2.")) { Review comment: some SQL expressions is restructured in spark3. for example InsertIntoStatement is used in spark3 instead of InsertIntoTable which used by spark2. it's better to introdcue a extra module for spark3 extensions -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] xiarixiaoyao commented on a change in pull request #2645: [HUDI-1659] Basic Implementation Of Spark Sql Support
xiarixiaoyao commented on a change in pull request #2645: URL: https://github.com/apache/hudi/pull/2645#discussion_r600993546 ## File path: hudi-spark-datasource/hudi-spark2/src/main/antlr4/imports/SqlBase.g4 ## @@ -0,0 +1,1099 @@ +/* + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + * This file is an adaptation of Presto's presto-parser/src/main/antlr4/com/facebook/presto/sql/parser/SqlBase.g4 grammar. Review comment: i agree with pengzhiwei2018 that introduce antlr4 for spark2. spark2 cannot use spark3‘ parser directly (many grammer is changed in spark3) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (HUDI-1716) rt view w/ MOR tables fails after schema evolution
sivabalan narayanan created HUDI-1716: - Summary: rt view w/ MOR tables fails after schema evolution Key: HUDI-1716 URL: https://issues.apache.org/jira/browse/HUDI-1716 Project: Apache Hudi Issue Type: Bug Components: Storage Management Reporter: sivabalan narayanan Looks like realtime view w/ MOR table fails if schema present in existing log file is evolved to add a new field. no issues w/ writing. but reading fails. More info: https://github.com/apache/hudi/issues/2675 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] nsivabalan commented on issue #2675: [SUPPORT] Unable to query MOR table after schema evolution
nsivabalan commented on issue #2675: URL: https://github.com/apache/hudi/issues/2675#issuecomment-806348073 Yes, you are right. I was able to reproduce the issue (local Spark) and have filed a [bug](https://issues.apache.org/jira/browse/HUDI-1716). I have yet to try out the Hive issue, but it could be the same. Appreciate any contribution :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (HUDI-1716) rt view w/ MOR tables fails after schema evolution
[ https://issues.apache.org/jira/browse/HUDI-1716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1716: -- Labels: sev:critical user-support-issues (was: ) > rt view w/ MOR tables fails after schema evolution > -- > > Key: HUDI-1716 > URL: https://issues.apache.org/jira/browse/HUDI-1716 > Project: Apache Hudi > Issue Type: Bug > Components: Storage Management >Reporter: sivabalan narayanan >Priority: Major > Labels: sev:critical, user-support-issues > Fix For: 0.9.0 > > > Looks like realtime view w/ MOR table fails if schema present in existing log > file is evolved to add a new field. no issues w/ writing. but reading fails. > > More info: https://github.com/apache/hudi/issues/2675 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-1716) rt view w/ MOR tables fails after schema evolution
[ https://issues.apache.org/jira/browse/HUDI-1716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1716: -- Fix Version/s: 0.9.0 > rt view w/ MOR tables fails after schema evolution > -- > > Key: HUDI-1716 > URL: https://issues.apache.org/jira/browse/HUDI-1716 > Project: Apache Hudi > Issue Type: Bug > Components: Storage Management >Reporter: sivabalan narayanan >Priority: Major > Fix For: 0.9.0 > > > Looks like realtime view w/ MOR table fails if schema present in existing log > file is evolved to add a new field. no issues w/ writing. but reading fails. > > More info: https://github.com/apache/hudi/issues/2675 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Reopened] (HUDI-1495) Upgrade Flink version to 1.12.0
[ https://issues.apache.org/jira/browse/HUDI-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen reopened HUDI-1495: -- Reopen for release 0.9.0 > Upgrade Flink version to 1.12.0 > --- > > Key: HUDI-1495 > URL: https://issues.apache.org/jira/browse/HUDI-1495 > Project: Apache Hudi > Issue Type: Task > Components: newbie >Reporter: Danny Chen >Assignee: Danny Chen >Priority: Major > Labels: easyfix, pull-request-available > Fix For: 0.7.0 > > > The apache Flink 1.12.0 has be released, upgrade the version to 1.12.0 in > order to adapter new Flink interfaces. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-1495) Upgrade Flink version to 1.12.0
[ https://issues.apache.org/jira/browse/HUDI-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen updated HUDI-1495: - Fix Version/s: (was: 0.7.0) 0.9.0 > Upgrade Flink version to 1.12.0 > --- > > Key: HUDI-1495 > URL: https://issues.apache.org/jira/browse/HUDI-1495 > Project: Apache Hudi > Issue Type: Task > Components: newbie >Reporter: Danny Chen >Assignee: Danny Chen >Priority: Major > Labels: easyfix, pull-request-available > Fix For: 0.9.0 > > > The apache Flink 1.12.0 has be released, upgrade the version to 1.12.0 in > order to adapter new Flink interfaces. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-1716) rt view w/ MOR tables fails after schema evolution
[ https://issues.apache.org/jira/browse/HUDI-1716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1716: -- Description: Looks like realtime view w/ MOR table fails if schema present in existing log file is evolved to add a new field. no issues w/ writing. but reading fails More info: [https://github.com/apache/hudi/issues/2675] Logs from local run: [https://gist.github.com/nsivabalan/656956ab313676617d84002ef8942198] diff with which above logs were generated: [https://gist.github.com/nsivabalan/84dad29bc1ab567ebb6ee8c63b3969ec] was: Looks like realtime view w/ MOR table fails if schema present in existing log file is evolved to add a new field. no issues w/ writing. but reading fails. More info: https://github.com/apache/hudi/issues/2675 > rt view w/ MOR tables fails after schema evolution > -- > > Key: HUDI-1716 > URL: https://issues.apache.org/jira/browse/HUDI-1716 > Project: Apache Hudi > Issue Type: Bug > Components: Storage Management >Reporter: sivabalan narayanan >Priority: Major > Labels: sev:critical, user-support-issues > Fix For: 0.9.0 > > > Looks like realtime view w/ MOR table fails if schema present in existing log > file is evolved to add a new field. no issues w/ writing. but reading fails > More info: [https://github.com/apache/hudi/issues/2675] > > Logs from local run: > [https://gist.github.com/nsivabalan/656956ab313676617d84002ef8942198] > diff with which above logs were generated: > [https://gist.github.com/nsivabalan/84dad29bc1ab567ebb6ee8c63b3969ec] > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HUDI-1711) Avro Schema Exception with Spark 3.0 in 0.7
[ https://issues.apache.org/jira/browse/HUDI-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17308346#comment-17308346 ] sivabalan narayanan commented on HUDI-1711: --- sure > Avro Schema Exception with Spark 3.0 in 0.7 > --- > > Key: HUDI-1711 > URL: https://issues.apache.org/jira/browse/HUDI-1711 > Project: Apache Hudi > Issue Type: Bug > Components: DeltaStreamer >Reporter: Balaji Varadarajan >Priority: Major > > GH: [https://github.com/apache/hudi/issues/2705] > > > {{21/03/22 10:10:35 WARN util.package: Truncated the string representation of > a plan since it was too large. This behavior can be adjusted by setting > 'spark.sql.debug.maxToStringFields'. > 21/03/22 10:10:35 ERROR executor.Executor: Exception in task 0.0 in stage 1.0 > (TID 1) > java.lang.RuntimeException: Error while decoding: > java.lang.NegativeArraySizeException: -1255727808 > createexternalrow(if (isnull(input[0, > struct, > true])) null else createexternalrow(if (input[0, > struct, > true].isNullAt) null else input[0, > struct, > true].id, if (input[0, > struct, > true].isNullAt) null else input[0, > struct, > true].name.toString, if (input[0, > struct, > true].isNullAt) null else input[0, > struct, > true].type.toString, if (input[0, > struct, > true].isNullAt) null else input[0, > struct, > true].url.toString, if (input[0, > struct, > true].isNullAt) null else input[0, > struct, > true].user.toString, if (input[0, > struct, > true].isNullAt) null else input[0, > struct, > true].password.toString, if (input[0, > struct, > true].isNullAt) null else input[0, > struct, > true].create_time.toString, if (input[0, > struct, > true].isNullAt) null else input[0, > struct, > true].create_user.toString, if (input[0, > struct, > true].isNullAt) null else input[0, > struct, > true].update_time.toString, if (input[0, > struct, > true].isNullAt) null else input[0, > struct, > true].update_user.toString, if (input[0, > struct, > true].isNullAt) null else input[0, > struct, > true].del_flag, StructField(id,IntegerType,false), > StructField(name,StringType,true), StructField(type,StringType,true), > StructField(url,StringType,true), StructField(user,StringType,true), > StructField(password,StringType,true), > StructField(create_time,StringType,true), > StructField(create_user,StringType,true), > StructField(update_time,StringType,true), > StructField(update_user,StringType,true), > StructField(del_flag,IntegerType,true)), if (isnull(input[1, > struct, > true])) null else createexternalrow(if (input[1, > struct, > true].isNullAt) null else input[1, > struct, > true].id, if (input[1, > struct, > true].isNullAt) null else input[1, > struct, > true].name.toString, if (input[1, > struct, > true].isNullAt) null else input[1, > struct, > true].type.toString, if (input[1, > struct, > true].isNullAt) null else input[1, > struct, > true].url.toString, if (input[1, > struct, > true].isNullAt) null else input[1, > struct, > true].user.toString, if (input[1, > struct, > true].isNullAt) null else input[1, > struct, > true].password.toString, if (input[1, > struct, > true].isNullAt) null else input[1, > struct, > true].create_time.toString, if (input[1, > struct, > true].isNullAt) null else input[1, > struct, > true].create_user.toString, if (input[1, > struct, > true].isNullAt) null else input[1, > struct, > true].update_time.toString, if (input[1, > struct, > true].isNullAt) null else input[1, > struct, > true].update_user.toString, if (input[1, > struct, > true].isNullAt) null else input[1, > 
struct, > true].del_flag, StructField(id,IntegerType,false), > StructField(name,StringType,true), StructField(type,StringType,true), > StructField(url,StringType,true), StructField(user,StringType,true), > StructField(password,StringType,true), > StructField(create_time,StringType,true), > StructField(create_user,StringType,true), > StructField(update_time,StringType,true), > StructField(update_user,StringType,true), > StructField(del_flag,IntegerType,true)), if (isnull(input[2, > struct, > false])) null else createexternalrow(if (input[2, > struct, > false].isNullAt) null else input[2, > struct, > false].version.toString, if (input[2, > struct, > false].isNullAt) null else input[2, > struct, > false].connector.toString, if (input[2, > struct, > false].isNullAt) null else input[2, > struct, > false].name.toString, if (input[2, > struct, > false].isNullAt) null else input[2, > struct, > false].ts_ms, if (input[2, > struct, > false].isNullAt) null else input[2, > struct, > false].snapshot.toString, if (input[2, > struct, > false].isNullAt) null else input[2, > struct, > false].db.toString, if (input[2, > struct, >
[jira] [Updated] (HUDI-1716) rt view w/ MOR tables fails after schema evolution
[ https://issues.apache.org/jira/browse/HUDI-1716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1716: -- Description: Looks like realtime view w/ MOR table fails if schema present in existing log file is evolved to add a new field. no issues w/ writing. but reading fails More info: [https://github.com/apache/hudi/issues/2675] Logs from local run: [https://gist.github.com/nsivabalan/656956ab313676617d84002ef8942198] diff with which above logs were generated: [https://gist.github.com/nsivabalan/84dad29bc1ab567ebb6ee8c63b3969ec] Steps to reproduce in spark shell: # create MOR table w/ schema1. # Ingest (with schema1) until log files are created. // verify via hudi-cli. I didn't see log files w/ just 1 batch of updates. If not, do multiple rounds until you see log files. # create a new schema2 with one new additional field. ingest a batch with schema2 that updates existing records. # read entire dataset. was: Looks like realtime view w/ MOR table fails if schema present in existing log file is evolved to add a new field. no issues w/ writing. but reading fails More info: [https://github.com/apache/hudi/issues/2675] Logs from local run: [https://gist.github.com/nsivabalan/656956ab313676617d84002ef8942198] diff with which above logs were generated: [https://gist.github.com/nsivabalan/84dad29bc1ab567ebb6ee8c63b3969ec] > rt view w/ MOR tables fails after schema evolution > -- > > Key: HUDI-1716 > URL: https://issues.apache.org/jira/browse/HUDI-1716 > Project: Apache Hudi > Issue Type: Bug > Components: Storage Management >Reporter: sivabalan narayanan >Priority: Major > Labels: sev:critical, user-support-issues > Fix For: 0.9.0 > > > Looks like realtime view w/ MOR table fails if schema present in existing log > file is evolved to add a new field. no issues w/ writing. but reading fails > More info: [https://github.com/apache/hudi/issues/2675] > > Logs from local run: > [https://gist.github.com/nsivabalan/656956ab313676617d84002ef8942198] > diff with which above logs were generated: > [https://gist.github.com/nsivabalan/84dad29bc1ab567ebb6ee8c63b3969ec] > > Steps to reproduce in spark shell: > # create MOR table w/ schema1. > # Ingest (with schema1) until log files are created. // verify via hudi-cli. > I didn't see log files w/ just 1 batch of updates. If not, do multiple rounds > until you see log files. > # create a new schema2 with one new additional field. ingest a batch with > schema2 that updates existing records. > # read entire dataset. > > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
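A rough spark-shell sketch of the reproduction steps above (a minimal sketch with made-up table name, fields, and path; not the exact script behind the linked gists, and per step 2 more upsert batches may be needed before log files show up):

```
// spark-shell: the Hudi Spark bundle is assumed on the classpath; spark.implicits._ is pre-imported.
import org.apache.spark.sql.SaveMode
import org.apache.hudi.DataSourceWriteOptions._
import org.apache.hudi.config.HoodieWriteConfig._

val basePath = "file:///tmp/hudi_mor_schema_evolution"

def upsert(df: org.apache.spark.sql.DataFrame, saveMode: SaveMode): Unit =
  df.write.format("hudi").
    option(TABLE_TYPE_OPT_KEY, "MERGE_ON_READ").
    option(RECORDKEY_FIELD_OPT_KEY, "id").
    option(PRECOMBINE_FIELD_OPT_KEY, "ts").
    option(PARTITIONPATH_FIELD_OPT_KEY, "dt").
    option(TABLE_NAME, "mor_schema_test").
    mode(saveMode).
    save(basePath)

// Step 1: create the MOR table with schema1 (id, ts, name, dt).
upsert(Seq((1, 100L, "a", "2021-03-24"), (2, 100L, "b", "2021-03-24")).toDF("id", "ts", "name", "dt"), SaveMode.Overwrite)

// Step 2: update existing keys with schema1 until log files appear (verify via hudi-cli; repeat if needed).
upsert(Seq((1, 101L, "a2", "2021-03-24")).toDF("id", "ts", "name", "dt"), SaveMode.Append)

// Step 3: update existing keys with schema2, which adds one new field ("extra").
upsert(Seq((1, 102L, "a3", "2021-03-24", "x")).toDF("id", "ts", "name", "dt", "extra"), SaveMode.Append)

// Step 4: snapshot (real-time view) read of the whole table -- this is where the read fails.
// The glob goes one partition level down to the files; adjust it to the actual layout.
spark.read.format("hudi").load(basePath + "/*/*").show(false)
```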
[GitHub] [hudi] codecov-io commented on pull request #2710: [RFC-20][HUDI-648] Implement error log/table for Datasource/DeltaStreamer/WriteClient/Compaction writes
codecov-io commented on pull request #2710: URL: https://github.com/apache/hudi/pull/2710#issuecomment-806393474 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2710?src=pr&el=h1) Report > Merging [#2710](https://codecov.io/gh/apache/hudi/pull/2710?src=pr&el=desc) (d446d2d) into [master](https://codecov.io/gh/apache/hudi/commit/d7b18783bdd6edd6355ee68714982401d3321f86?el=desc) (d7b1878) will **increase** coverage by `10.07%`. > The diff coverage is `n/a`. > :exclamation: Current head d446d2d differs from pull request most recent head 0cddd8f. Consider uploading reports for the commit 0cddd8f to get more accurate results [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2710/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2710?src=pr&el=tree) ```diff @@ Coverage Diff @@ ## master#2710 +/- ## = + Coverage 51.76% 61.84% +10.07% + Complexity 3601 332 -3269 = Files 476 54 -422 Lines 22583 1989-20594 Branches 2408 236 -2172 = - Hits 11689 1230-10459 + Misses 9877 638 -9239 + Partials 1017 121 -896 ``` | Flag | Coverage Δ | Complexity Δ | | |---|---|---|---| | hudicli | `?` | `?` | | | hudiclient | `?` | `?` | | | hudicommon | `?` | `?` | | | hudiflink | `?` | `?` | | | hudihadoopmr | `?` | `?` | | | hudisparkdatasource | `?` | `?` | | | hudisync | `?` | `?` | | | huditimelineservice | `?` | `?` | | | hudiutilities | `61.84% <ø> (-7.90%)` | `0.00 <ø> (ø)` | | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more. | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2710?src=pr&el=tree) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [...ies/exception/HoodieSnapshotExporterException.java](https://codecov.io/gh/apache/hudi/pull/2710/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2V4Y2VwdGlvbi9Ib29kaWVTbmFwc2hvdEV4cG9ydGVyRXhjZXB0aW9uLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | | | [.../apache/hudi/utilities/HoodieSnapshotExporter.java](https://codecov.io/gh/apache/hudi/pull/2710/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0hvb2RpZVNuYXBzaG90RXhwb3J0ZXIuamF2YQ==) | `5.17% <0.00%> (-83.63%)` | `0.00% <0.00%> (-28.00%)` | | | [...hudi/utilities/schema/JdbcbasedSchemaProvider.java](https://codecov.io/gh/apache/hudi/pull/2710/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9KZGJjYmFzZWRTY2hlbWFQcm92aWRlci5qYXZh) | `0.00% <0.00%> (-72.23%)` | `0.00% <0.00%> (-2.00%)` | | | [...he/hudi/utilities/transform/AWSDmsTransformer.java](https://codecov.io/gh/apache/hudi/pull/2710/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3RyYW5zZm9ybS9BV1NEbXNUcmFuc2Zvcm1lci5qYXZh) | `0.00% <0.00%> (-66.67%)` | `0.00% <0.00%> (-2.00%)` | | | [...in/java/org/apache/hudi/utilities/UtilHelpers.java](https://codecov.io/gh/apache/hudi/pull/2710/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL1V0aWxIZWxwZXJzLmphdmE=) | `40.69% <0.00%> (-23.84%)` | `27.00% <0.00%> (-6.00%)` | | | [.../hive/SlashEncodedHourPartitionValueExtractor.java](https://codecov.io/gh/apache/hudi/pull/2710/diff?src=pr&el=tree#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvU2xhc2hFbmNvZGVkSG91clBhcnRpdGlvblZhbHVlRXh0cmFjdG9yLmphdmE=) | | 
| | | [...g/apache/hudi/timeline/service/RequestHandler.java](https://codecov.io/gh/apache/hudi/pull/2710/diff?src=pr&el=tree#diff-aHVkaS10aW1lbGluZS1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL3RpbWVsaW5lL3NlcnZpY2UvUmVxdWVzdEhhbmRsZXIuamF2YQ==) | | | | | [...che/hudi/common/util/BufferedRandomAccessFile.java](https://codecov.io/gh/apache/hudi/pull/2710/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvQnVmZmVyZWRSYW5kb21BY2Nlc3NGaWxlLmphdmE=) | | | | | [...e/timeline/versioning/clean/CleanPlanMigrator.java](https://codecov.io/gh/apache/hudi/pull/2710/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL3ZlcnNpb25pbmcvY2xlYW4vQ2xlYW5QbGFuTWlncmF0b3IuamF2YQ==) | | | | | [...va/org/apache/hudi/table/format/FilePathUtils.java](https://codecov.io/gh/apache/hudi/pull/2710/diff?src=pr&el=tre
[jira] [Commented] (HUDI-1717) Metadata Table reader does not show correct view of the metadata
[ https://issues.apache.org/jira/browse/HUDI-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17308414#comment-17308414 ] Prashant Wason commented on HUDI-1717: -- [~vinothchandar] FYI > Metadata Table reader does not show correct view of the metadata > > > Key: HUDI-1717 > URL: https://issues.apache.org/jira/browse/HUDI-1717 > Project: Apache Hudi > Issue Type: Bug > Reporter: Prashant Wason > Priority: Blocker > > Dataset timeline: C1 C2 C3 Compaction.inflight C4 C5 > Metadata timeline: DC1 DC2 DC3 (DC = deltaCommit) > Assume the dataset timeline has some completed commits (C1, C2 ... C5) and an > async compaction operation in progress. Also assume that the metadata table > is synced only up to C3. > The MetadataTableWriter will not sync any more instants to the Metadata Table > since an incomplete instant (Compaction.inflight) is present next. > The same sync logic is also used by the MetadataReader to perform the > in-memory merge of the timeline. Hence, the reader will also not consider C4 and > C5, thereby providing an incorrect, older view of the FileSlices and > FileGroups. > Any future ingestion into this table MAY insert data into older versions of > the FileSlices, which will result in data loss when queried. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HUDI-1717) Metadata Table reader does not show correct view of the metadata
Prashant Wason created HUDI-1717: Summary: Metadata Table reader does not show correct view of the metadata Key: HUDI-1717 URL: https://issues.apache.org/jira/browse/HUDI-1717 Project: Apache Hudi Issue Type: Bug Reporter: Prashant Wason Dataset timeline: C1 C2 C3 Compaction.inflight C4 C5 Metadata timeline: DC1 DC2 DC3 (DC = deltaCommit) Assume the dataset timeline has some completed commits (C1, C2 ... C5) and an async compaction operation in progress. Also assume that the metadata table is synced only up to C3. The MetadataTableWriter will not sync any more instants to the Metadata Table since an incomplete instant (Compaction.inflight) is present next. The same sync logic is also used by the MetadataReader to perform the in-memory merge of the timeline. Hence, the reader will also not consider C4 and C5, thereby providing an incorrect, older view of the FileSlices and FileGroups. Any future ingestion into this table MAY insert data into older versions of the FileSlices, which will result in data loss when queried. -- This message was sent by Atlassian Jira (v8.3.4#803005)
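To make the failure mode concrete, here is a minimal Scala sketch of the "stop at the first incomplete instant" rule described in this report. The `Instant` case class and the instant names are illustrative stand-ins, not the actual metadata-table writer/reader code:

```scala
// Illustrative model of the dataset timeline from the description:
// C1 C2 C3 Compaction.inflight C4 C5
case class Instant(name: String, action: String, completed: Boolean)

val datasetTimeline = Seq(
  Instant("C1", "commit",     completed = true),
  Instant("C2", "commit",     completed = true),
  Instant("C3", "commit",     completed = true),
  Instant("CP", "compaction", completed = false), // async compaction, still inflight
  Instant("C4", "commit",     completed = true),
  Instant("C5", "commit",     completed = true)
)

// Both the metadata writer sync and the reader-side in-memory merge consider
// only the instants before the first incomplete one, so the merged view stops
// at C3: C4 and C5 stay invisible until the compaction completes, even though
// they are already committed on the dataset timeline.
val visibleToMetadata = datasetTimeline.takeWhile(_.completed).map(_.name)
println(visibleToMetadata) // List(C1, C2, C3)
```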