[jira] [Updated] (HUDI-1810) Azure CI integration test failed for CLI tests

2021-04-18 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-1810:
-
Fix Version/s: 0.9.0

> Azure CI integration test failed for CLI tests
> --
>
> Key: HUDI-1810
> URL: https://issues.apache.org/jira/browse/HUDI-1810
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: CLI, Testing
>Reporter: Raymond Xu
>Priority: Blocker
> Fix For: 0.9.0
>
>
> CLI job failure
> https://dev.azure.com/xushiyan/apache-hudi-ci/_build/results?buildId=29=logs=d5c42908-5572-5ce6-e4a8-5e2053b947e8=00d56f8a-c99a-5b4c-dcf4-5ca71d997069



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-1811) Azure CI connection refused error when init HoodieRealtimeRecordReader

2021-04-18 Thread Raymond Xu (Jira)
Raymond Xu created HUDI-1811:


 Summary: Azure CI connection refused error when init 
HoodieRealtimeRecordReader
 Key: HUDI-1811
 URL: https://issues.apache.org/jira/browse/HUDI-1811
 Project: Apache Hudi
  Issue Type: Bug
  Components: Testing
Reporter: Raymond Xu
 Fix For: 0.9.0


Failed job

https://dev.azure.com/xushiyan/apache-hudi-ci/_build/results?buildId=35=logs=d3721143-1417-5e3d-cf04-c39c0756eab9=a7783b9f-edd1-5bb0-5301-1955fdbfb2c4



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1810) Azure CI integration test failed for CLI tests

2021-04-18 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-1810:
-
Description: 
CLI job failure

https://dev.azure.com/xushiyan/apache-hudi-ci/_build/results?buildId=29=logs=d5c42908-5572-5ce6-e4a8-5e2053b947e8=00d56f8a-c99a-5b4c-dcf4-5ca71d997069

> Azure CI integration test failed for CLI tests
> --
>
> Key: HUDI-1810
> URL: https://issues.apache.org/jira/browse/HUDI-1810
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: CLI, Testing
>Reporter: Raymond Xu
>Priority: Blocker
>
> CLI job failure
> https://dev.azure.com/xushiyan/apache-hudi-ci/_build/results?buildId=29=logs=d5c42908-5572-5ce6-e4a8-5e2053b947e8=00d56f8a-c99a-5b4c-dcf4-5ca71d997069





[jira] [Created] (HUDI-1810) Azure CI integration test failed for CLI tests

2021-04-18 Thread Raymond Xu (Jira)
Raymond Xu created HUDI-1810:


 Summary: Azure CI integration test failed for CLI tests
 Key: HUDI-1810
 URL: https://issues.apache.org/jira/browse/HUDI-1810
 Project: Apache Hudi
  Issue Type: Bug
  Components: CLI, Testing
Reporter: Raymond Xu








[jira] [Updated] (HUDI-1251) [UMBRELLA] Migrate CI to azure

2021-04-18 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-1251:
-
Priority: Blocker  (was: Major)

> [UMBRELLA] Migrate CI to azure 
> ---
>
> Key: HUDI-1251
> URL: https://issues.apache.org/jira/browse/HUDI-1251
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Testing
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Blocker
>  Labels: hudi-umbrellas
> Fix For: 0.9.0
>
>
> Stabilize CI on azure





[jira] [Updated] (HUDI-1251) [UMBRELLA] Migrate CI to azure

2021-04-18 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-1251:
-
Summary: [UMBRELLA] Migrate CI to azure   (was: [UMBRELLA] CI stability )

> [UMBRELLA] Migrate CI to azure 
> ---
>
> Key: HUDI-1251
> URL: https://issues.apache.org/jira/browse/HUDI-1251
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Testing
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>  Labels: hudi-umbrellas
>
> Stabilize CI on azure





[jira] [Updated] (HUDI-1251) [UMBRELLA] Migrate CI to azure

2021-04-18 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-1251:
-
Fix Version/s: 0.9.0

> [UMBRELLA] Migrate CI to azure 
> ---
>
> Key: HUDI-1251
> URL: https://issues.apache.org/jira/browse/HUDI-1251
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Testing
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>  Labels: hudi-umbrellas
> Fix For: 0.9.0
>
>
> Stabilize CI on azure





[jira] [Updated] (HUDI-1251) [UMBRELLA] Migrate CI to azure

2021-04-18 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-1251:
-
Status: Open  (was: New)

> [UMBRELLA] Migrate CI to azure 
> ---
>
> Key: HUDI-1251
> URL: https://issues.apache.org/jira/browse/HUDI-1251
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Testing
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>  Labels: hudi-umbrellas
> Fix For: 0.9.0
>
>
> Stabilize CI on azure





[jira] [Updated] (HUDI-1251) [UMBRELLA] CI stability

2021-04-18 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-1251:
-
Description: Stabilize CI on azure  (was: Stabilize CI and ease debugging 
of integration test failures.)

> [UMBRELLA] CI stability 
> 
>
> Key: HUDI-1251
> URL: https://issues.apache.org/jira/browse/HUDI-1251
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Testing
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>  Labels: hudi-umbrellas
>
> Stabilize CI on azure





[jira] [Updated] (HUDI-1251) [UMBRELLA] CI stability

2021-04-18 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-1251:
-
Summary: [UMBRELLA] CI stability   (was: [UMBRELLA] CI stability and 
debugging integ tests)

> [UMBRELLA] CI stability 
> 
>
> Key: HUDI-1251
> URL: https://issues.apache.org/jira/browse/HUDI-1251
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Testing
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>  Labels: hudi-umbrellas
>
> Stabilize CI and ease debugging of integration test failures.





[GitHub] [hudi] hudi-bot edited a comment on pull request #2643: DO NOT MERGE (Azure CI) test branch ci

2021-04-18 Thread GitBox


hudi-bot edited a comment on pull request #2643:
URL: https://github.com/apache/hudi/pull/2643#issuecomment-792368481


   
   ## CI report:
   
   * 9831a6c50e9f49f8a71c02fc6ac50ae1446f7c1f UNKNOWN
   * a569dbe9409910fbb83b3764b300574c0e52612e Azure: 
[FAILURE](https://dev.azure.com/XUSH0012/0ef433cc-d4b4-47cc-b6a1-03d032ef546c/_build/results?buildId=142)
 
   * e6e9f1f1554a1474dd6c20338215030cad23a2e0 UNKNOWN
   * 2a6690a256c8cd8efe9ed2b1984b896fb27ef077 UNKNOWN
   * d8b7cca55e057a52a2e229d81e8cb52b60dc275f UNKNOWN
   * 3bce301333cc78194d13a702598b46e04fe9f85f UNKNOWN
   * f07f345baa450f3fec7eab59caa76b0fbda1e132 UNKNOWN
   * 869d2ce3fad330af93c1bb3b576824f519c6e68b UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codecov-commenter edited a comment on pull request #2784: [HUDI-1740] Fix insert-overwrite API archival

2021-04-18 Thread GitBox


codecov-commenter edited a comment on pull request #2784:
URL: https://github.com/apache/hudi/pull/2784#issuecomment-822168210


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/2784?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#2784](https://codecov.io/gh/apache/hudi/pull/2784?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (2327417) into 
[master](https://codecov.io/gh/apache/hudi/commit/8d29863c86aed57dc8f1a0a450bce3b256de2960?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (8d29863) will **decrease** coverage by `43.21%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2784/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2784?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
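   ```diff
   @@              Coverage Diff               @@
   ##              master    #2784       +/-   ##
   ==============================================
   - Coverage      52.60%    9.38%   -43.22%
   + Complexity      3709       48     -3661
   ==============================================
     Files            485       54      -431
     Lines          23224     1993    -21231
     Branches        2465      235     -2230
   ==============================================
   - Hits           12216      187    -12029
   + Misses          9929     1793     -8136
   + Partials        1079       13     -1066
   ```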
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `?` | `?` | |
   | hudiclient | `?` | `?` | |
   | hudicommon | `?` | `?` | |
   | hudiflink | `?` | `?` | |
   | hudihadoopmr | `?` | `?` | |
   | hudisparkdatasource | `?` | `?` | |
   | hudisync | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `9.38% <ø> (-60.42%)` | `48.00 <ø> (-325.00)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2784?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/2784/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | |
   | 
[...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/2784/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | |
   | 
[...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/2784/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | 
[.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/2784/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | 
[.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/2784/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | 

[GitHub] [hudi] codecov-commenter edited a comment on pull request #2784: [HUDI-1740] Fix insert-overwrite API archival

2021-04-18 Thread GitBox


codecov-commenter edited a comment on pull request #2784:
URL: https://github.com/apache/hudi/pull/2784#issuecomment-822168210


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/2784?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#2784](https://codecov.io/gh/apache/hudi/pull/2784?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (af963ac) into 
[master](https://codecov.io/gh/apache/hudi/commit/8d29863c86aed57dc8f1a0a450bce3b256de2960?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (8d29863) will **increase** coverage by `17.14%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2784/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2784?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
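   ```diff
   @@              Coverage Diff               @@
   ##              master    #2784       +/-   ##
   ==============================================
   + Coverage      52.60%   69.74%   +17.14%
   + Complexity      3709      372     -3337
   ==============================================
     Files            485       54      -431
     Lines          23224     1993    -21231
     Branches        2465      235     -2230
   ==============================================
   - Hits           12216     1390    -10826
   + Misses          9929      471     -9458
   + Partials        1079      132      -947
   ```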
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `?` | `?` | |
   | hudiclient | `∅ <ø> (∅)` | `0.00 <ø> (ø)` | |
   | hudicommon | `?` | `?` | |
   | hudiflink | `?` | `?` | |
   | hudihadoopmr | `?` | `?` | |
   | hudisparkdatasource | `?` | `?` | |
   | hudisync | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `69.74% <ø> (-0.06%)` | `372.00 <ø> (-1.00)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2784?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/hudi/pull/2784/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=)
 | `70.74% <0.00%> (-0.35%)` | `54.00% <0.00%> (-1.00%)` | |
   | 
[...a/org/apache/hudi/common/model/HoodieBaseFile.java](https://codecov.io/gh/apache/hudi/pull/2784/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0hvb2RpZUJhc2VGaWxlLmphdmE=)
 | | | |
   | 
[...org/apache/hudi/common/util/SpillableMapUtils.java](https://codecov.io/gh/apache/hudi/pull/2784/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvU3BpbGxhYmxlTWFwVXRpbHMuamF2YQ==)
 | | | |
   | 
[...common/model/HoodieFailedWritesCleaningPolicy.java](https://codecov.io/gh/apache/hudi/pull/2784/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0hvb2RpZUZhaWxlZFdyaXRlc0NsZWFuaW5nUG9saWN5LmphdmE=)
 | | | |
   | 
[...e/timeline/versioning/clean/CleanPlanMigrator.java](https://codecov.io/gh/apache/hudi/pull/2784/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL3ZlcnNpb25pbmcvY2xlYW4vQ2xlYW5QbGFuTWlncmF0b3IuamF2YQ==)
 | | | |
   | 
[.../apache/hudi/common/util/ObjectSizeCalculator.java](https://codecov.io/gh/apache/hudi/pull/2784/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvT2JqZWN0U2l6ZUNhbGN1bGF0b3IuamF2YQ==)
 | | 

[GitHub] [hudi] hudi-bot edited a comment on pull request #2643: DO NOT MERGE (Azure CI) test branch ci

2021-04-18 Thread GitBox


hudi-bot edited a comment on pull request #2643:
URL: https://github.com/apache/hudi/pull/2643#issuecomment-792368481


   
   ## CI report:
   
   * 9831a6c50e9f49f8a71c02fc6ac50ae1446f7c1f UNKNOWN
   * a569dbe9409910fbb83b3764b300574c0e52612e Azure: 
[FAILURE](https://dev.azure.com/XUSH0012/0ef433cc-d4b4-47cc-b6a1-03d032ef546c/_build/results?buildId=142)
 
   * e6e9f1f1554a1474dd6c20338215030cad23a2e0 UNKNOWN
   * 2a6690a256c8cd8efe9ed2b1984b896fb27ef077 UNKNOWN
   * d8b7cca55e057a52a2e229d81e8cb52b60dc275f UNKNOWN
   * 3bce301333cc78194d13a702598b46e04fe9f85f UNKNOWN
   * f07f345baa450f3fec7eab59caa76b0fbda1e132 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] codecov-commenter commented on pull request #2784: [HUDI-1740] Fix insert-overwrite API archival

2021-04-18 Thread GitBox


codecov-commenter commented on pull request #2784:
URL: https://github.com/apache/hudi/pull/2784#issuecomment-822168210


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/2784?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#2784](https://codecov.io/gh/apache/hudi/pull/2784?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (af963ac) into 
[master](https://codecov.io/gh/apache/hudi/commit/8d29863c86aed57dc8f1a0a450bce3b256de2960?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (8d29863) will **increase** coverage by `17.14%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2784/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2784?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
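   ```diff
   @@              Coverage Diff               @@
   ##              master    #2784       +/-   ##
   ==============================================
   + Coverage      52.60%   69.74%   +17.14%
   + Complexity      3709      372     -3337
   ==============================================
     Files            485       54      -431
     Lines          23224     1993    -21231
     Branches        2465      235     -2230
   ==============================================
   - Hits           12216     1390    -10826
   + Misses          9929      471     -9458
   + Partials        1079      132      -947
   ```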
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `?` | `?` | |
   | hudiclient | `?` | `?` | |
   | hudicommon | `?` | `?` | |
   | hudiflink | `?` | `?` | |
   | hudihadoopmr | `?` | `?` | |
   | hudisparkdatasource | `?` | `?` | |
   | hudisync | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `69.74% <ø> (-0.06%)` | `372.00 <ø> (-1.00)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2784?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/hudi/pull/2784/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=)
 | `70.74% <0.00%> (-0.35%)` | `54.00% <0.00%> (-1.00%)` | |
   | 
[...cala/org/apache/hudi/HoodieBootstrapRelation.scala](https://codecov.io/gh/apache/hudi/pull/2784/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL0hvb2RpZUJvb3RzdHJhcFJlbGF0aW9uLnNjYWxh)
 | | | |
   | 
[...i/hadoop/utils/HoodieRealtimeInputFormatUtils.java](https://codecov.io/gh/apache/hudi/pull/2784/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL3V0aWxzL0hvb2RpZVJlYWx0aW1lSW5wdXRGb3JtYXRVdGlscy5qYXZh)
 | | | |
   | 
[.../org/apache/hudi/sink/utils/NonThrownExecutor.java](https://codecov.io/gh/apache/hudi/pull/2784/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL3V0aWxzL05vblRocm93bkV4ZWN1dG9yLmphdmE=)
 | | | |
   | 
[...oop/realtime/HoodieParquetRealtimeInputFormat.java](https://codecov.io/gh/apache/hudi/pull/2784/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL3JlYWx0aW1lL0hvb2RpZVBhcnF1ZXRSZWFsdGltZUlucHV0Rm9ybWF0LmphdmE=)
 | | | |
   | 
[...n/scala/org/apache/hudi/HoodieMergeOnReadRDD.scala](https://codecov.io/gh/apache/hudi/pull/2784/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL0hvb2RpZU1lcmdlT25SZWFkUkRELnNjYWxh)
 | | | 

[GitHub] [hudi] ssdong commented on pull request #2784: [HUDI-1740] Fix insert-overwrite API archival

2021-04-18 Thread GitBox


ssdong commented on pull request #2784:
URL: https://github.com/apache/hudi/pull/2784#issuecomment-822161097


   @lw309637554 updated! Please take a look and let me know. 






[jira] [Created] (HUDI-1809) Flink merge on read input split uses wrong base file path for default merge type

2021-04-18 Thread Danny Chen (Jira)
Danny Chen created HUDI-1809:


 Summary: Flink merge on read input split uses wrong base file path 
for default merge type
 Key: HUDI-1809
 URL: https://issues.apache.org/jira/browse/HUDI-1809
 Project: Apache Hudi
  Issue Type: Bug
  Components: Flink Integration
Reporter: Danny Chen
Assignee: Danny Chen
 Fix For: 0.9.0


Should use the base file path instead of the table path.





[GitHub] [hudi] codecov-commenter edited a comment on pull request #2845: [HUDI-1723] Fix path selector listing files with the same mod date

2021-04-18 Thread GitBox


codecov-commenter edited a comment on pull request #2845:
URL: https://github.com/apache/hudi/pull/2845#issuecomment-822143179


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/2845?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > :exclamation: No coverage uploaded for pull request base 
(`master@4e050cc`). [Click here to learn what that 
means](https://docs.codecov.io/docs/error-reference?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#section-missing-base-commit).
   > The diff coverage is `40.00%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2845/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2845?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
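   ```diff
   @@              Coverage Diff               @@
   ##              master    #2845       +/-   ##
   ==============================================
     Coverage           ?   52.58%
     Complexity         ?     3708
   ==============================================
     Files              ?      485
     Lines              ?    23227
     Branches           ?     2466
   ==============================================
     Hits               ?    12214
     Misses             ?     9934
     Partials           ?     1079
   ```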
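The Coverage percentages in these Codecov diffs can be re-derived from the Hits, Misses, and Partials rows. The Java sketch below assumes Codecov's ratio hits / (hits + misses + partials) and its default `round: down`, `precision: 2` settings; the class `CoverageMath` and its `coverage` helper are hypothetical and not part of Hudi or Codecov.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class CoverageMath {

    // Hypothetical helper: coverage % = hits / (hits + misses + partials) * 100,
    // truncated to two decimals (assumes Codecov's default round: down, precision: 2).
    static BigDecimal coverage(long hits, long misses, long partials) {
        long lines = hits + misses + partials; // equals the "Lines" row of the report
        return BigDecimal.valueOf(hits)
                .multiply(BigDecimal.valueOf(100))
                .divide(BigDecimal.valueOf(lines), 2, RoundingMode.DOWN);
    }

    public static void main(String[] args) {
        // master@8d29863, as shown in the #2784 report
        System.out.println(coverage(12216, 9929, 1079)); // prints 52.60
        // #2845, all modules
        System.out.println(coverage(12214, 9934, 1079)); // prints 52.58
        // #2845, hudiutilities flag only
        System.out.println(coverage(1391, 471, 131));    // prints 69.79
    }
}
```

The raw ratio for #2845 is about 52.585%, so `RoundingMode.HALF_UP` would print 52.59 rather than the 52.58 shown in the report; truncation appears to be what Codecov applied here.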
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `40.29% <ø> (?)` | `215.00 <ø> (?)` | |
   | hudiclient | `∅ <ø> (?)` | `0.00 <ø> (?)` | |
   | hudicommon | `50.66% <ø> (?)` | `1976.00 <ø> (?)` | |
   | hudiflink | `56.51% <ø> (?)` | `516.00 <ø> (?)` | |
   | hudihadoopmr | `33.33% <ø> (?)` | `198.00 <ø> (?)` | |
   | hudisparkdatasource | `72.06% <ø> (?)` | `237.00 <ø> (?)` | |
   | hudisync | `45.70% <ø> (?)` | `131.00 <ø> (?)` | |
   | huditimelineservice | `64.36% <ø> (?)` | `62.00 <ø> (?)` | |
   | hudiutilities | `69.79% <40.00%> (?)` | `373.00 <0.00> (?)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2845?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...ies/sources/helpers/DatePartitionPathSelector.java](https://codecov.io/gh/apache/hudi/pull/2845/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvaGVscGVycy9EYXRlUGFydGl0aW9uUGF0aFNlbGVjdG9yLmphdmE=)
 | `54.68% <0.00%> (ø)` | `13.00 <0.00> (?)` | |
   | 
[...udi/utilities/sources/helpers/DFSPathSelector.java](https://codecov.io/gh/apache/hudi/pull/2845/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvaGVscGVycy9ERlNQYXRoU2VsZWN0b3IuamF2YQ==)
 | `82.60% <80.00%> (ø)` | `14.00 <0.00> (?)` | |
   






[GitHub] [hudi] pengzhiwei2018 commented on a change in pull request #2283: [HUDI-1415] Read Hoodie Table As Spark DataSource Table

2021-04-18 Thread GitBox


pengzhiwei2018 commented on a change in pull request #2283:
URL: https://github.com/apache/hudi/pull/2283#discussion_r615528821



##
File path: 
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DataSourceOptions.scala
##
@@ -353,6 +353,8 @@ object DataSourceWriteOptions {
   val HIVE_IGNORE_EXCEPTIONS_OPT_KEY = 
"hoodie.datasource.hive_sync.ignore_exceptions"
   val HIVE_SKIP_RO_SUFFIX = "hoodie.datasource.hive_sync.skip_ro_suffix"
   val HIVE_SUPPORT_TIMESTAMP = "hoodie.datasource.hive_sync.support_timestamp"
+  val HIVE_TABLE_PROPERTIES = "hoodie.datasource.hive_sync.table_properties"

Review comment:
   Good suggestion! +1 for this!








[GitHub] [hudi] codecov-commenter edited a comment on pull request #2845: [HUDI-1723] Fix path selector listing files with the same mod date

2021-04-18 Thread GitBox


codecov-commenter edited a comment on pull request #2845:
URL: https://github.com/apache/hudi/pull/2845#issuecomment-822143179


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/2845?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > :exclamation: No coverage uploaded for pull request base 
(`master@4e050cc`). [Click here to learn what that 
means](https://docs.codecov.io/docs/error-reference?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#section-missing-base-commit).
   > The diff coverage is `40.00%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2845/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2845?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
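   ```diff
   @@              Coverage Diff               @@
   ##              master    #2845       +/-   ##
   ==============================================
     Coverage           ?   69.79%
     Complexity         ?      373
   ==============================================
     Files              ?       54
     Lines              ?     1993
     Branches           ?      235
   ==============================================
     Hits               ?     1391
     Misses             ?      471
     Partials           ?      131
   ```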
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudiutilities | `69.79% <40.00%> (?)` | `373.00 <0.00> (?)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2845?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...ies/sources/helpers/DatePartitionPathSelector.java](https://codecov.io/gh/apache/hudi/pull/2845/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvaGVscGVycy9EYXRlUGFydGl0aW9uUGF0aFNlbGVjdG9yLmphdmE=) | `54.68% <0.00%> (ø)` | `13.00 <0.00> (?)` | |
   | [...udi/utilities/sources/helpers/DFSPathSelector.java](https://codecov.io/gh/apache/hudi/pull/2845/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvaGVscGVycy9ERlNQYXRoU2VsZWN0b3IuamF2YQ==) | `82.60% <80.00%> (ø)` | `14.00 <0.00> (?)` | |
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] pengzhiwei2018 commented on a change in pull request #2283: [HUDI-1415] Read Hoodie Table As Spark DataSource Table

2021-04-18 Thread GitBox


pengzhiwei2018 commented on a change in pull request #2283:
URL: https://github.com/apache/hudi/pull/2283#discussion_r615526012



##
File path: 
hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncConfig.java
##
@@ -88,6 +88,12 @@
   @Parameter(names = {"--verify-metadata-file-listing"}, description = "Verify file listing from Hudi's metadata against file system")
   public Boolean verifyMetadataFileListing = HoodieMetadataConfig.DEFAULT_METADATA_VALIDATE;
 
+  @Parameter(names = {"--table-properties"}, description = "Table properties to hive table")
+  public String tableProperties;
+
+  @Parameter(names = {"--serde-properties"}, description = "Serde properties to hive table")
+  public String serdeProperties;
+

Review comment:
   Yes, thanks for reminding me.








[GitHub] [hudi] pengzhiwei2018 commented on a change in pull request #2283: [HUDI-1415] Read Hoodie Table As Spark DataSource Table

2021-04-18 Thread GitBox


pengzhiwei2018 commented on a change in pull request #2283:
URL: https://github.com/apache/hudi/pull/2283#discussion_r615525852



##
File path: 
hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java
##
@@ -164,7 +165,13 @@ private void syncHoodieTable(String tableName, boolean useRealtimeInputFormat) {
 LOG.info("Storage partitions scan complete. Found " + writtenPartitionsSince.size());
 // Sync the partitions if needed
 syncPartitions(tableName, writtenPartitionsSince);
-
+// Sync the table properties if need
+if (cfg.tableProperties != null) {
+  Map tableProperties = ConfigUtils.toMap(cfg.tableProperties);
+  hoodieHiveClient.updateTableProperties(tableName, tableProperties);
+  LOG.info("Sync table properties for " + tableName + ", table properties is: "
+  + cfg.tableProperties);
+}

Review comment:
   Well, the `tableProperties` may change if the schema changes, so we need to update the table properties through a separate interface.
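To illustrate why a separate update step is useful, here is a minimal Java sketch that turns a `--table-properties` string into a map and picks out the entries whose values differ from what the metastore currently holds, i.e. the entries a schema change would force to be re-synced. It assumes newline-separated `key=value` pairs; the format actually accepted by Hudi's `ConfigUtils.toMap` is not shown in this thread, and `TablePropertiesSketch`, `parse`, and `changedEntries` are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Hudi. Assumes newline-separated key=value pairs.
final class TablePropertiesSketch {

  static Map<String, String> parse(String raw) {
    Map<String, String> props = new LinkedHashMap<>();
    if (raw == null) {
      return props;
    }
    for (String line : raw.split("\n")) {
      String trimmed = line.trim();
      if (trimmed.isEmpty()) {
        continue; // skip blank lines
      }
      int eq = trimmed.indexOf('=');
      if (eq <= 0) {
        throw new IllegalArgumentException("Expected key=value but got: " + trimmed);
      }
      // Split on the first '=' only, so values may themselves contain '='.
      props.put(trimmed.substring(0, eq).trim(), trimmed.substring(eq + 1).trim());
    }
    return props;
  }

  // Entries of `desired` that are missing from, or have a different value in, `current`.
  static Map<String, String> changedEntries(Map<String, String> current, Map<String, String> desired) {
    Map<String, String> changed = new LinkedHashMap<>();
    for (Map.Entry<String, String> e : desired.entrySet()) {
      if (!e.getValue().equals(current.get(e.getKey()))) {
        changed.put(e.getKey(), e.getValue());
      }
    }
    return changed;
  }
}
```

Comparing against the current values keeps the update idempotent: re-running the sync with an unchanged schema yields an empty map and nothing needs to be written.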








[GitHub] [hudi] codecov-commenter commented on pull request #2845: [HUDI-1723] Fix path selector listing files with the same mod date

2021-04-18 Thread GitBox


codecov-commenter commented on pull request #2845:
URL: https://github.com/apache/hudi/pull/2845#issuecomment-822143179


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/2845?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > :exclamation: No coverage uploaded for pull request base 
(`master@4e050cc`). [Click here to learn what that 
means](https://docs.codecov.io/docs/error-reference?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#section-missing-base-commit).
   > The diff coverage is `0.00%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2845/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2845?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
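   @@             Coverage Diff             @@
   ##            master    #2845      +/-   ##
   ===========================================
     Coverage         ?    9.38%
     Complexity       ?       48
   ===========================================
     Files            ?       54
     Lines            ?     1993
     Branches         ?      235
   ===========================================
     Hits             ?      187
     Misses           ?     1793
     Partials         ?       13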
   ```
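The headline coverage number in these reports is the share of tracked lines that are fully hit; in each table, hits + misses + partials add up to the line count (187 + 1793 + 13 = 1993 here). The following hypothetical Java helper, which assumes the Codecov definition hits / (hits + misses + partials) with partial lines counted as not covered, reproduces both figures reported for #2845: the same 1993 lines give 69.79% in one report and 9.38% in this one, and only the hit count differs.

```java
// Hypothetical helper, not part of Codecov or Hudi.
// Assumes coverage = hits / (hits + misses + partials); partial lines count as not covered.
final class CoverageSketch {

  static double coveragePercent(long hits, long misses, long partials) {
    long tracked = hits + misses + partials; // every tracked line falls in exactly one bucket
    if (tracked == 0) {
      throw new IllegalArgumentException("no tracked lines");
    }
    return 100.0 * hits / tracked;
  }
}
```

For example, `coveragePercent(1391, 471, 131)` is about 69.79 and `coveragePercent(187, 1793, 13)` is about 9.38.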
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudiutilities | `9.38% <0.00%> (?)` | `48.00 <0.00> (?)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2845?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...udi/utilities/sources/helpers/DFSPathSelector.java](https://codecov.io/gh/apache/hudi/pull/2845/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvaGVscGVycy9ERlNQYXRoU2VsZWN0b3IuamF2YQ==) | `0.00% <0.00%> (ø)` | `0.00 <0.00> (?)` | |
   | [...ies/sources/helpers/DatePartitionPathSelector.java](https://codecov.io/gh/apache/hudi/pull/2845/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvaGVscGVycy9EYXRlUGFydGl0aW9uUGF0aFNlbGVjdG9yLmphdmE=) | `0.00% <0.00%> (ø)` | `0.00 <0.00> (?)` | |
   






[GitHub] [hudi] pengzhiwei2018 commented on a change in pull request #2283: [HUDI-1415] Read Hoodie Table As Spark DataSource Table

2021-04-18 Thread GitBox


pengzhiwei2018 commented on a change in pull request #2283:
URL: https://github.com/apache/hudi/pull/2283#discussion_r615522929



##
File path: 
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
##
@@ -306,7 +311,10 @@ private[hudi] object HoodieSparkSqlWriter {
 } finally {
   writeClient.close()
 }
-val metaSyncSuccess = metaSync(parameters, basePath, jsc.hadoopConfiguration)
+val newParameters =
+  addSqlTableProperties(sqlContext.sparkSession.sessionState.conf, df.schema, parameters)

Review comment:
   Yeah, moving `addSqlTableProperties` into `metaSync` would simplify the logic.








[GitHub] [hudi] lw309637554 commented on pull request #2784: [HUDI-1740] Fix insert-overwrite API archival

2021-04-18 Thread GitBox


lw309637554 commented on pull request #2784:
URL: https://github.com/apache/hudi/pull/2784#issuecomment-822141958


   > @lw309637554 Thank you for your comments and I've replied. Please take a look and let me know.
   
   Replied; you can modify the log message.






[GitHub] [hudi] xushiyan commented on pull request #2845: [HUDI-1723] Fix path selector listing files with the same mod date

2021-04-18 Thread GitBox


xushiyan commented on pull request #2845:
URL: https://github.com/apache/hudi/pull/2845#issuecomment-822141919


   @vinothchandar This is the short-term fix for the bug. The logic was duplicated because the existing logic had been copied from one class to the other.






[GitHub] [hudi] lw309637554 commented on a change in pull request #2784: [HUDI-1740] Fix insert-overwrite API archival

2021-04-18 Thread GitBox


lw309637554 commented on a change in pull request #2784:
URL: https://github.com/apache/hudi/pull/2784#discussion_r615522271



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/utils/MetadataConversionUtils.java
##
@@ -105,14 +117,25 @@ public static HoodieArchivedMetaEntry createMetaWrapper(HoodieInstant hoodieInst
 return archivedMetaWrapper;
   }
 
-  public static HoodieArchivedMetaEntry createMetaWrapper(HoodieInstant hoodieInstant,
-  HoodieCommitMetadata hoodieCommitMetadata) {
-HoodieArchivedMetaEntry archivedMetaWrapper = new HoodieArchivedMetaEntry();
-archivedMetaWrapper.setCommitTime(hoodieInstant.getTimestamp());
-archivedMetaWrapper.setActionState(hoodieInstant.getState().name());
-archivedMetaWrapper.setHoodieCommitMetadata(convertCommitMetadata(hoodieCommitMetadata));
-archivedMetaWrapper.setActionType(ActionType.commit.name());
-return archivedMetaWrapper;
+  public static Option getInflightReplaceMetadata(HoodieTableMetaClient metaClient, HoodieInstant instant) throws IOException {

Review comment:
   okay








[jira] [Updated] (HUDI-1723) DFSPathSelector skips files with the same modify date when read up to source limit

2021-04-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-1723:
-
Labels: pull-request-available sev:critical user-support-issues  (was: 
sev:critical user-support-issues)

> DFSPathSelector skips files with the same modify date when read up to source 
> limit
> --
>
> Key: HUDI-1723
> URL: https://issues.apache.org/jira/browse/HUDI-1723
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: DeltaStreamer
>Reporter: Raymond Xu
>Priority: Blocker
>  Labels: pull-request-available, sev:critical, user-support-issues
> Fix For: 0.9.0
>
> Attachments: Screen Shot 2021-03-26 at 1.42.42 AM.png
>
>
> org.apache.hudi.utilities.sources.helpers.DFSPathSelector#listEligibleFiles
> filters the input files based on the last saved checkpoint, which is the
> modification date of the last read file. However, several files can share
> that modification date, so when reading stops at the source limit, the
> remaining files with the same date are skipped. An illustration is shown in
> the attached picture.
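The skipping described in HUDI-1723 can be reproduced in a few lines. The following is a minimal Java sketch, not Hudi's `DFSPathSelector` code: `FileEntry`, `PathSelectorSketch`, and its methods are hypothetical, files are assumed to be eligible only when strictly newer than the checkpoint, and the checkpoint is assumed to be the modification time of the last file read. One possible remedy, `selectWithTies`, keeps reading past the source limit while files share the last selected modification time; the change actually made in PR #2845 may differ.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a listed file: path, modification time, size in bytes.
final class FileEntry {
  final String path;
  final long modTime;
  final long size;

  FileEntry(String path, long modTime, long size) {
    this.path = path;
    this.modTime = modTime;
    this.size = size;
  }
}

// Hypothetical sketch of checkpoint-based file selection; not Hudi's DFSPathSelector.
final class PathSelectorSketch {

  // Stops as soon as the next file would exceed the limit (exhibits the skipping bug).
  static List<FileEntry> selectNaive(List<FileEntry> files, long checkpoint, long sourceLimit) {
    return select(files, checkpoint, sourceLimit, false);
  }

  // Keeps reading past the limit while files share the last selected modification time.
  static List<FileEntry> selectWithTies(List<FileEntry> files, long checkpoint, long sourceLimit) {
    return select(files, checkpoint, sourceLimit, true);
  }

  // New checkpoint: modification time of the last file read, or the old one if nothing was read.
  static long checkpointOf(List<FileEntry> batch, long previous) {
    return batch.isEmpty() ? previous : batch.get(batch.size() - 1).modTime;
  }

  private static List<FileEntry> select(
      List<FileEntry> files, long checkpoint, long sourceLimit, boolean keepTies) {
    List<FileEntry> eligible = new ArrayList<>();
    for (FileEntry f : files) {
      if (f.modTime > checkpoint) { // only files strictly newer than the checkpoint
        eligible.add(f);
      }
    }
    eligible.sort(Comparator.comparingLong((FileEntry f) -> f.modTime));

    List<FileEntry> batch = new ArrayList<>();
    long bytes = 0;
    for (FileEntry f : eligible) {
      boolean overLimit = bytes + f.size > sourceLimit;
      boolean tiesLast = !batch.isEmpty() && f.modTime == batch.get(batch.size() - 1).modTime;
      if (overLimit && !(keepTies && tiesLast)) {
        break;
      }
      batch.add(f);
      bytes += f.size;
    }
    return batch;
  }
}
```

With files `a` and `b` both modified at time 100, `c` at time 200, each 10 bytes, and a 15-byte limit, the naive selector reads only `a` and checkpoints at 100; the next round then sees only `c`, so `b` is never read. The tie-aware variant reads `a` and `b` together, at the cost of exceeding the limit for that one batch.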



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] xushiyan opened a new pull request #2845: [HUDI-1723] Fix path selector listing files with the same mod date

2021-04-18 Thread GitBox


xushiyan opened a new pull request #2845:
URL: https://github.com/apache/hudi/pull/2845


   For issues described in https://issues.apache.org/jira/browse/HUDI-1723
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.






[GitHub] [hudi] lw309637554 commented on a change in pull request #2784: [HUDI-1740] Fix insert-overwrite API archival

2021-04-18 Thread GitBox


lw309637554 commented on a change in pull request #2784:
URL: https://github.com/apache/hudi/pull/2784#discussion_r615520709



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/ReplaceArchivalHelper.java
##
@@ -68,6 +68,11 @@
   public static boolean deleteReplacedFileGroups(HoodieEngineContext context, HoodieTableMetaClient metaClient,
  TableFileSystemView fileSystemView,
  HoodieInstant instant, List replacedPartitions) {
+// There is no file id to be replaced in the very first replace commit file for insert overwrite operation
+if (replacedPartitions.isEmpty()) {
+  LOG.warn("Found empty partitionToReplaceFileIds");

Review comment:
   Yes, "Found no partition files to replace" would be better.








[GitHub] [hudi] codecov-commenter edited a comment on pull request #2645: [HUDI-1659] Basic Implementation Of Spark Sql Support

2021-04-18 Thread GitBox


codecov-commenter edited a comment on pull request #2645:
URL: https://github.com/apache/hudi/pull/2645#issuecomment-822128091


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/2645?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#2645](https://codecov.io/gh/apache/hudi/pull/2645?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (8304965) into 
[master](https://codecov.io/gh/apache/hudi/commit/18459d4045ec4a85081c227893b226a4d759f84b?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (18459d4) will **increase** coverage by `0.46%`.
   > The diff coverage is `54.75%`.
   
   > :exclamation: Current head 8304965 differs from pull request most recent 
head 20a927c. Consider uploading reports for the commit 20a927c to get more 
accurate results
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2645/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2645?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
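   @@             Coverage Diff             @@
   ##            master    #2645      +/-   ##
   ===========================================
   + Coverage    52.26%   52.73%   +0.46%
   - Complexity    3682     3822     +140
   ===========================================
     Files          484      509      +25
     Lines        23094    24656    +1562
     Branches      2456     2774     +318
   ===========================================
   + Hits         12070    13002     +932
   - Misses        9959    10380     +421
   - Partials      1065     1274     +209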
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `40.29% <ø> (+3.35%)` | `215.00 <ø> (+20.00)` | |
   | hudiclient | `∅ <ø> (∅)` | `0.00 <ø> (ø)` | |
   | hudicommon | `50.57% <11.11%> (-0.20%)` | `1979.00 <2.00> (+3.00)` | :arrow_down: |
   | hudiflink | `56.51% <ø> (-0.07%)` | `516.00 <ø> (+2.00)` | :arrow_down: |
   | hudihadoopmr | `33.33% <ø> (-0.12%)` | `198.00 <ø> (+1.00)` | :arrow_down: |
   | hudisparkdatasource | `65.00% <56.21%> (-6.34%)` | `348.00 <109.00> (+111.00)` | :arrow_down: |
   | hudisync | `45.62% <0.00%> (+0.15%)` | `131.00 <1.00> (+3.00)` | |
   | huditimelineservice | `64.36% <ø> (ø)` | `62.00 <ø> (ø)` | |
   | hudiutilities | `69.79% <ø> (+0.06%)` | `373.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2645?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...rg/apache/hudi/common/table/HoodieTableConfig.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL0hvb2RpZVRhYmxlQ29uZmlnLmphdmE=) | `41.66% <0.00%> (-1.55%)` | `17.00 <0.00> (ø)` | |
   | [...pache/hudi/common/table/HoodieTableMetaClient.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL0hvb2RpZVRhYmxlTWV0YUNsaWVudC5qYXZh) | `63.35% <0.00%> (-3.31%)` | `43.00 <0.00> (ø)` | |
   | [.../apache/hudi/common/table/TableSchemaResolver.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL1RhYmxlU2NoZW1hUmVzb2x2ZXIuamF2YQ==) | `0.00% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | [...he/hudi/exception/HoodieDuplicateKeyException.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZXhjZXB0aW9uL0hvb2RpZUR1cGxpY2F0ZUtleUV4Y2VwdGlvbi5qYXZh) | `0.00% <0.00%> (ø)` | `0.00 <0.00> (?)` | |
   | 

[GitHub] [hudi] codecov-commenter edited a comment on pull request #2843: [HUDI-1804] Continue to write when Flink write task restart because o…

2021-04-18 Thread GitBox


codecov-commenter edited a comment on pull request #2843:
URL: https://github.com/apache/hudi/pull/2843#issuecomment-821748199


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/2843?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#2843](https://codecov.io/gh/apache/hudi/pull/2843?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (6e943cf) into 
[master](https://codecov.io/gh/apache/hudi/commit/b6d949b48a649acac27d5d9b91677bf2e25e9342?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (b6d949b) will **decrease** coverage by `0.01%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2843/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2843?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
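   @@             Coverage Diff             @@
   ##            master    #2843      +/-   ##
   ===========================================
   - Coverage    52.60%   52.58%   -0.02%
   + Complexity    3709     3708       -1
   ===========================================
     Files          485      485
     Lines        23224    23227       +3
     Branches      2465     2466       +1
   ===========================================
   - Hits         12216    12214       -2
   - Misses        9929     9934       +5
     Partials      1079     1079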
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `40.29% <ø> (ø)` | `215.00 <ø> (ø)` | |
   | hudiclient | `∅ <ø> (∅)` | `0.00 <ø> (ø)` | |
   | hudicommon | `50.66% <ø> (-0.03%)` | `1976.00 <ø> (-1.00)` | |
   | hudiflink | `56.51% <ø> (-0.04%)` | `516.00 <ø> (ø)` | |
   | hudihadoopmr | `33.33% <ø> (ø)` | `198.00 <ø> (ø)` | |
   | hudisparkdatasource | `72.06% <ø> (ø)` | `237.00 <ø> (ø)` | |
   | hudisync | `45.70% <ø> (ø)` | `131.00 <ø> (ø)` | |
   | huditimelineservice | `64.36% <ø> (ø)` | `62.00 <ø> (ø)` | |
   | hudiutilities | `69.79% <ø> (ø)` | `373.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2843?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...ache/hudi/common/fs/inline/InMemoryFileSystem.java](https://codecov.io/gh/apache/hudi/pull/2843/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL2lubGluZS9Jbk1lbW9yeUZpbGVTeXN0ZW0uamF2YQ==) | `79.31% <0.00%> (-10.35%)` | `15.00% <0.00%> (-1.00%)` | |
   | [.../hudi/table/format/cow/CopyOnWriteInputFormat.java](https://codecov.io/gh/apache/hudi/pull/2843/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS90YWJsZS9mb3JtYXQvY293L0NvcHlPbldyaXRlSW5wdXRGb3JtYXQuamF2YQ==) | `55.33% <0.00%> (-0.75%)` | `20.00% <0.00%> (ø%)` | |
   | [...java/org/apache/hudi/common/fs/StorageSchemes.java](https://codecov.io/gh/apache/hudi/pull/2843/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL1N0b3JhZ2VTY2hlbWVzLmphdmE=) | `100.00% <0.00%> (ø)` | `10.00% <0.00%> (ø%)` | |
   






[GitHub] [hudi] codecov-commenter edited a comment on pull request #2844: [MINOR][hudi-sync] Fix typos

2021-04-18 Thread GitBox


codecov-commenter edited a comment on pull request #2844:
URL: https://github.com/apache/hudi/pull/2844#issuecomment-822005123


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/2844?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > :exclamation: No coverage uploaded for pull request base 
(`master@4e050cc`). [Click here to learn what that 
means](https://docs.codecov.io/docs/error-reference?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#section-missing-base-commit).
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2844/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2844?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
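   @@             Coverage Diff             @@
   ##            master    #2844      +/-   ##
   ===========================================
     Coverage         ?   52.61%
     Complexity       ?     3710
   ===========================================
     Files            ?      485
     Lines            ?    23227
     Branches         ?     2466
   ===========================================
     Hits             ?    12220
     Misses           ?     9930
     Partials         ?     1077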
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `40.29% <ø> (?)` | `215.00 <ø> (?)` | |
   | hudiclient | `∅ <ø> (?)` | `0.00 <ø> (?)` | |
   | hudicommon | `50.71% <ø> (?)` | `1977.00 <ø> (?)` | |
   | hudiflink | `56.51% <ø> (?)` | `516.00 <ø> (?)` | |
   | hudihadoopmr | `33.33% <ø> (?)` | `198.00 <ø> (?)` | |
   | hudisparkdatasource | `72.06% <ø> (?)` | `237.00 <ø> (?)` | |
   | hudisync | `45.70% <ø> (?)` | `131.00 <ø> (?)` | |
   | huditimelineservice | `64.36% <ø> (?)` | `62.00 <ø> (?)` | |
   | hudiutilities | `69.84% <ø> (?)` | `374.00 <ø> (?)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   






[GitHub] [hudi] xiarixiaoyao edited a comment on pull request #2722: [HUDI-1722]hive beeline/spark-sql query specified field on mor table occur NPE

2021-04-18 Thread GitBox


xiarixiaoyao edited a comment on pull request #2722:
URL: https://github.com/apache/hudi/pull/2722#issuecomment-822129270


   @lw309637554 Thanks for your review. I have answered your questions; please check them. Thanks.
   
   Another question: TestHoodieCombineHiveInputFormat.testHoodieRealtimeCombineHoodieInputFormat is disabled by default. I checked that test function and found some problems in it. Could I fix those problems and enable TestHoodieCombineHiveInputFormat.testHoodieRealtimeCombineHoodieInputFormat by default?






[GitHub] [hudi] codecov-commenter edited a comment on pull request #2645: [HUDI-1659] Basic Implementation Of Spark Sql Support

2021-04-18 Thread GitBox


codecov-commenter edited a comment on pull request #2645:
URL: https://github.com/apache/hudi/pull/2645#issuecomment-822128091


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/2645?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#2645](https://codecov.io/gh/apache/hudi/pull/2645?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (8304965) into 
[master](https://codecov.io/gh/apache/hudi/commit/18459d4045ec4a85081c227893b226a4d759f84b?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (18459d4) will **increase** coverage by `17.52%`.
   > The diff coverage is `n/a`.
   
   > :exclamation: Current head 8304965 differs from pull request most recent 
head 20a927c. Consider uploading reports for the commit 20a927c to get more 
accurate results
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2645/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2645?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
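   @@              Coverage Diff              @@
   ##             master    #2645       +/-   ##
   =============================================
   + Coverage     52.26%   69.79%   +17.52%
   + Complexity     3682      373     -3309
   =============================================
     Files           484       54      -430
     Lines         23094     1993    -21101
     Branches       2456      235     -2221
   =============================================
   - Hits          12070     1391    -10679
   + Misses         9959      471     -9488
   + Partials       1065      131      -934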
   ```
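As a sanity check on the figures above: Codecov reports coverage as hits / (hits + misses + partials), which here equals hits / lines. The small sketch below (class and method names are illustrative only) recomputes the master and PR percentages from the counts in this diff table.

```java
import java.util.Locale;

public class CoverageCheck {
    // Codecov-style coverage ratio: hits / (hits + misses + partials), as a percentage.
    static double coverage(int hits, int misses, int partials) {
        return 100.0 * hits / (hits + misses + partials);
    }

    // Two decimal places, matching the precision shown in the report.
    static String pct(double value) {
        return String.format(Locale.ROOT, "%.2f%%", value);
    }

    public static void main(String[] args) {
        // master: 12070 hits + 9959 misses + 1065 partials = 23094 lines
        String base = pct(coverage(12070, 9959, 1065));
        // #2645: 1391 hits + 471 misses + 131 partials = 1993 lines
        String head = pct(coverage(1391, 471, 131));
        System.out.println("master: " + base + ", #2645: " + head);
    }
}
```

Running `main` prints `master: 52.26%, #2645: 69.79%`. Note that the head report covers only 1993 of the 23094 lines, which is why the overall percentage can jump while many flags show `?`.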
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `?` | `?` | |
   | hudiclient | `?` | `?` | |
   | hudicommon | `?` | `?` | |
   | hudiflink | `?` | `?` | |
   | hudihadoopmr | `?` | `?` | |
   | hudisparkdatasource | `?` | `?` | |
   | hudisync | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `69.79% <ø> (+0.06%)` | `373.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2645?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...s/deltastreamer/HoodieMultiTableDeltaStreamer.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvSG9vZGllTXVsdGlUYWJsZURlbHRhU3RyZWFtZXIuamF2YQ==) | `78.39% <0.00%> (ø)` | `18.00% <0.00%> (ø%)` | |
   | [...ache/hudi/common/util/collection/DiskBasedMap.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvY29sbGVjdGlvbi9EaXNrQmFzZWRNYXAuamF2YQ==) | | | |
   | [...org/apache/hudi/cli/commands/BootstrapCommand.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpL2NvbW1hbmRzL0Jvb3RzdHJhcENvbW1hbmQuamF2YQ==) | | | |
   | [...i/hadoop/utils/HoodieRealtimeInputFormatUtils.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL3V0aWxzL0hvb2RpZVJlYWx0aW1lSW5wdXRGb3JtYXRVdGlscy5qYXZh) | | | |
   | [...rg/apache/hudi/metadata/MetadataPartitionType.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0YWRhdGEvTWV0YWRhdGFQYXJ0aXRpb25UeXBlLmphdmE=) | | | |
   | 

[GitHub] [hudi] codecov-commenter edited a comment on pull request #2844: [MINOR][hudi-sync] Fix typos

2021-04-18 Thread GitBox


codecov-commenter edited a comment on pull request #2844:
URL: https://github.com/apache/hudi/pull/2844#issuecomment-822005123










[GitHub] [hudi] xiarixiaoyao commented on a change in pull request #2722: [HUDI-1722]hive beeline/spark-sql query specified field on mor table occur NPE

2021-04-18 Thread GitBox


xiarixiaoyao commented on a change in pull request #2722:
URL: https://github.com/apache/hudi/pull/2722#discussion_r615508116



##
File path: hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieParquetRealtimeInputFormat.java
##
@@ -85,12 +85,14 @@ void addProjectionToJobConf(final RealtimeSplit realtimeSplit, final JobConf job
 // risk of experiencing race conditions. Hence, we synchronize on the JobConf object here. There is negligible
 // latency incurred here due to the synchronization since get record reader is called once per spilt before the
 // actual heavy lifting of reading the parquet files happen.
-if (jobConf.get(HoodieInputFormatUtils.HOODIE_READ_COLUMNS_PROP) == null) {
+if (jobConf.get(HoodieInputFormatUtils.HOODIE_READ_COLUMNS_PROP) == null
+|| (!realtimeSplit.getDeltaLogPaths().isEmpty() && !HoodieRealtimeInputFormatUtils.requiredProjectionFieldsExistInConf(jobConf))) {
   synchronized (jobConf) {
 LOG.info(
 "Before adding Hoodie columns, Projections :" + jobConf.get(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR)
 + ", Ids :" + jobConf.get(ColumnProjectionUtils.READ_COLUMN_IDS_CONF_STR));
-if (jobConf.get(HoodieInputFormatUtils.HOODIE_READ_COLUMNS_PROP) == null) {
+if (jobConf.get(HoodieInputFormatUtils.HOODIE_READ_COLUMNS_PROP) == null

Review comment:
   Cloning the Configuration object can be very expensive. To avoid unexpected performance regressions for workloads, we should not isolate the JobConf for each record reader.
   
   I also agree with reverting https://github.com/apache/hudi/pull/2190/files. However, if the current query does not involve any log files, adding the Hoodie additional projection columns will lead to unnecessary IO, since those extra columns would then be scanned as well.
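A minimal sketch of the trade-off above, modeled loosely on the diff and not taken from Hudi: all record readers share one configuration that is mutated in place under a lock (no clone), and projection columns are added only when nothing has been set yet or when a split with log files still lacks the required fields. A `Map<String, String>` stands in for `JobConf`, `Split` is a hypothetical stand-in for `RealtimeSplit`, and the property keys and `REQUIRED_FIELDS` list are illustrative assumptions. Callers should pass a thread-safe map such as `ConcurrentHashMap` so the unlocked first check is safe.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ProjectionGuard {
    // Hypothetical keys standing in for HOODIE_READ_COLUMNS_PROP and READ_COLUMN_NAMES_CONF_STR.
    static final String HOODIE_READ_COLUMNS_PROP = "hoodie.read.columns.set";
    static final String READ_COLUMN_NAMES = "read.column.names";
    // Hypothetical meta columns that a reader merging log records needs.
    static final List<String> REQUIRED_FIELDS =
        Arrays.asList("_hoodie_record_key", "_hoodie_commit_time", "_hoodie_partition_path");

    // Hypothetical stand-in for RealtimeSplit: only the delta log paths matter here.
    static final class Split {
        final List<String> deltaLogPaths;
        Split(List<String> deltaLogPaths) { this.deltaLogPaths = deltaLogPaths; }
    }

    static boolean requiredFieldsExist(Map<String, String> conf) {
        String cols = conf.getOrDefault(READ_COLUMN_NAMES, "");
        return Arrays.asList(cols.split(",")).containsAll(REQUIRED_FIELDS);
    }

    // Same shape as the new condition in the diff.
    static boolean needsProjection(Split split, Map<String, String> conf) {
        return conf.get(HOODIE_READ_COLUMNS_PROP) == null
            || (!split.deltaLogPaths.isEmpty() && !requiredFieldsExist(conf));
    }

    // The shared conf is updated in place; writers are serialized by locking the conf itself.
    static void addProjection(Split split, Map<String, String> conf) {
        if (!needsProjection(split, conf)) {
            return; // fast path: nothing to do, no lock taken
        }
        synchronized (conf) {
            if (!needsProjection(split, conf)) {
                return; // another reader updated the conf while we waited for the lock
            }
            if (!split.deltaLogPaths.isEmpty()) {
                // Only readers that merge log records pay for the extra meta columns.
                Set<String> cols = new LinkedHashSet<>();
                String existing = conf.getOrDefault(READ_COLUMN_NAMES, "");
                if (!existing.isEmpty()) {
                    cols.addAll(Arrays.asList(existing.split(",")));
                }
                cols.addAll(REQUIRED_FIELDS);
                conf.put(READ_COLUMN_NAMES, String.join(",", cols));
            }
            conf.put(HOODIE_READ_COLUMNS_PROP, "true");
        }
    }
}
```

A base-file-only split sets the marker without touching the column list; a later split with log files still gets the meta columns appended, because the second half of the condition re-checks the fields themselves.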








[GitHub] [hudi] xiarixiaoyao commented on a change in pull request #2722: [HUDI-1722]hive beeline/spark-sql query specified field on mor table occur NPE

2021-04-18 Thread GitBox


xiarixiaoyao commented on a change in pull request #2722:
URL: https://github.com/apache/hudi/pull/2722#discussion_r615508116



##
File path: hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieParquetRealtimeInputFormat.java
##
@@ -85,12 +85,14 @@ void addProjectionToJobConf(final RealtimeSplit realtimeSplit, final JobConf job
 // risk of experiencing race conditions. Hence, we synchronize on the JobConf object here. There is negligible
 // latency incurred here due to the synchronization since get record reader is called once per spilt before the
 // actual heavy lifting of reading the parquet files happen.
-if (jobConf.get(HoodieInputFormatUtils.HOODIE_READ_COLUMNS_PROP) == null) {
+if (jobConf.get(HoodieInputFormatUtils.HOODIE_READ_COLUMNS_PROP) == null
+|| (!realtimeSplit.getDeltaLogPaths().isEmpty() && !HoodieRealtimeInputFormatUtils.requiredProjectionFieldsExistInConf(jobConf))) {
   synchronized (jobConf) {
 LOG.info(
 "Before adding Hoodie columns, Projections :" + jobConf.get(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR)
 + ", Ids :" + jobConf.get(ColumnProjectionUtils.READ_COLUMN_IDS_CONF_STR));
-if (jobConf.get(HoodieInputFormatUtils.HOODIE_READ_COLUMNS_PROP) == null) {
+if (jobConf.get(HoodieInputFormatUtils.HOODIE_READ_COLUMNS_PROP) == null

Review comment:
   Cloning the Configuration object can be very expensive. To avoid unexpected performance regressions for workloads, we should not isolate the JobConf for each record reader.
   
   I also agree with reverting https://github.com/apache/hudi/pull/2190/files. However, if the current query does not involve any log files, adding the Hoodie additional projection columns will lead to unnecessary IO, since those extra columns would then be scanned as well.








[GitHub] [hudi] xiarixiaoyao edited a comment on pull request #2722: [HUDI-1722]hive beeline/spark-sql query specified field on mor table occur NPE

2021-04-18 Thread GitBox


xiarixiaoyao edited a comment on pull request #2722:
URL: https://github.com/apache/hudi/pull/2722#issuecomment-822129270


   @lw309637554 thanks for your review. I left comments on your questions.
   
   Another question: TestHoodieCombineHiveInputFormat.testHoodieRealtimeCombineHoodieInputFormat is disabled by default. I have checked that test function and found some problems in it. Could I fix those problems and enable TestHoodieCombineHiveInputFormat.testHoodieRealtimeCombineHoodieInputFormat by default?






[GitHub] [hudi] ssdong commented on pull request #2784: [HUDI-1740] Fix insert-overwrite API archival

2021-04-18 Thread GitBox


ssdong commented on pull request #2784:
URL: https://github.com/apache/hudi/pull/2784#issuecomment-822130380


   @lw309637554 Thank you for your comments and I've replied. Please take a 
look and let me know.  
   






[GitHub] [hudi] xiarixiaoyao edited a comment on pull request #2722: [HUDI-1722]hive beeline/spark-sql query specified field on mor table occur NPE

2021-04-18 Thread GitBox


xiarixiaoyao edited a comment on pull request #2722:
URL: https://github.com/apache/hudi/pull/2722#issuecomment-822129270


   @lw309637554 thanks for your review. I left comments on your questions.
   
   Another question: TestHoodieCombineHiveInputFormat.testHoodieRealtimeCombineHoodieInputFormat is disabled by default. I have checked that test function and found some problems in it. Could I fix those problems and enable TestHoodieCombineHiveInputFormat.testHoodieRealtimeCombineHoodieInputFormat by default?






[GitHub] [hudi] ssdong commented on a change in pull request #2784: [HUDI-1740] Fix insert-overwrite API archival

2021-04-18 Thread GitBox


ssdong commented on a change in pull request #2784:
URL: https://github.com/apache/hudi/pull/2784#discussion_r615512372



##
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/utils/MetadataConversionUtils.java
##
@@ -105,14 +117,25 @@ public static HoodieArchivedMetaEntry createMetaWrapper(HoodieInstant hoodieInst
 return archivedMetaWrapper;
   }
 
-  public static HoodieArchivedMetaEntry createMetaWrapper(HoodieInstant hoodieInstant,
-  HoodieCommitMetadata hoodieCommitMetadata) {
-HoodieArchivedMetaEntry archivedMetaWrapper = new HoodieArchivedMetaEntry();
-archivedMetaWrapper.setCommitTime(hoodieInstant.getTimestamp());
-archivedMetaWrapper.setActionState(hoodieInstant.getState().name());
-archivedMetaWrapper.setHoodieCommitMetadata(convertCommitMetadata(hoodieCommitMetadata));
-archivedMetaWrapper.setActionType(ActionType.commit.name());
-return archivedMetaWrapper;
+  public static Option<HoodieCommitMetadata> getInflightReplaceMetadata(HoodieTableMetaClient metaClient, HoodieInstant instant) throws IOException {

Review comment:
   Hmm, I don't think these methods are related to clustering. They are solely responsible for retrieving the corresponding commit files and parsing them, which makes them a better fit for a "meta conversion" file, and that naturally means `MetadataConversionUtils.java`. In fact, after this refactoring I've removed the dependency on `ClusteringUtils` from `MetadataConversionUtils.java`.








[GitHub] [hudi] ssdong commented on a change in pull request #2784: [HUDI-1740] Fix insert-overwrite API archival

2021-04-18 Thread GitBox


ssdong commented on a change in pull request #2784:
URL: https://github.com/apache/hudi/pull/2784#discussion_r615512419



##
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/utils/MetadataConversionUtils.java
##
@@ -105,14 +117,25 @@ public static HoodieArchivedMetaEntry createMetaWrapper(HoodieInstant hoodieInst
 return archivedMetaWrapper;
   }
 
-  public static HoodieArchivedMetaEntry createMetaWrapper(HoodieInstant hoodieInstant,
-  HoodieCommitMetadata hoodieCommitMetadata) {
-HoodieArchivedMetaEntry archivedMetaWrapper = new HoodieArchivedMetaEntry();
-archivedMetaWrapper.setCommitTime(hoodieInstant.getTimestamp());
-archivedMetaWrapper.setActionState(hoodieInstant.getState().name());
-archivedMetaWrapper.setHoodieCommitMetadata(convertCommitMetadata(hoodieCommitMetadata));
-archivedMetaWrapper.setActionType(ActionType.commit.name());
-return archivedMetaWrapper;
+  public static Option<HoodieCommitMetadata> getInflightReplaceMetadata(HoodieTableMetaClient metaClient, HoodieInstant instant) throws IOException {
+Option<byte[]> inflightContent = metaClient.getActiveTimeline().getInstantDetails(instant);
+if (!inflightContent.isPresent() || inflightContent.get().length == 0) {
+  // inflight files can be empty in some certain cases, e.g. when users opt in clustering
+  return Option.empty();
+}
+return Option.of(HoodieCommitMetadata.fromBytes(inflightContent.get(), HoodieCommitMetadata.class));
+  }
+
+  public static Option getRequestedReplaceMetadata(HoodieTableMetaClient metaClient, HoodieInstant instant) throws IOException {

Review comment:
   Same as above  
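The core of `getInflightReplaceMetadata` is "a missing or zero-length instant file means no metadata, not an error". A minimal sketch of that pattern, using `java.util.Optional` in place of Hudi's own `Option`, with a hypothetical `parseMetadata` (plain UTF-8 decoding) standing in for `HoodieCommitMetadata.fromBytes`:

```java
import java.nio.charset.StandardCharsets;
import java.util.Optional;

public class InflightMetadataSketch {
    // Hypothetical stand-in for HoodieCommitMetadata.fromBytes(...): just decodes the bytes.
    static String parseMetadata(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Absent or empty content yields Optional.empty(); inflight replace files can
    // legitimately be empty (for example with clustering), so this is not treated as an error.
    static Optional<String> readInflightMetadata(Optional<byte[]> inflightContent) {
        if (!inflightContent.isPresent() || inflightContent.get().length == 0) {
            return Optional.empty();
        }
        return Optional.of(parseMetadata(inflightContent.get()));
    }
}
```

With a non-throwing parser this could be written as `inflightContent.filter(b -> b.length > 0).map(InflightMetadataSketch::parseMetadata)`; the explicit `if` in the diff is presumably kept because the real parser declares `IOException`, which a lambda passed to `map` cannot throw.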








[GitHub] [hudi] xiarixiaoyao edited a comment on pull request #2722: [HUDI-1722]hive beeline/spark-sql query specified field on mor table occur NPE

2021-04-18 Thread GitBox


xiarixiaoyao edited a comment on pull request #2722:
URL: https://github.com/apache/hudi/pull/2722#issuecomment-822129270


   @lw309637554 thanks for your review. I left comments on your questions.
   
   Another question: TestHoodieCombineHiveInputFormat.testHoodieRealtimeCombineHoodieInputFormat is disabled by default. I have checked that test function and found some problems in it. Could I fix those problems and enable TestHoodieCombineHiveInputFormat.testHoodieRealtimeCombineHoodieInputFormat?






[GitHub] [hudi] xiarixiaoyao commented on a change in pull request #2722: [HUDI-1722]hive beeline/spark-sql query specified field on mor table occur NPE

2021-04-18 Thread GitBox


xiarixiaoyao commented on a change in pull request #2722:
URL: https://github.com/apache/hudi/pull/2722#discussion_r615510606



##
File path: hudi-hadoop-mr/src/test/java/org/apache/hudi/hadoop/functional/TestHoodieCombineHiveInputFormat.java
##
@@ -84,6 +86,73 @@ public void setUp() throws IOException, InterruptedException {
 HoodieTestUtils.init(MiniClusterUtil.configuration, tempDir.toAbsolutePath().toString(), HoodieTableType.MERGE_ON_READ);
   }
 
+  @Test
+  public void testMutilReaderRealtimeComineHoodieInputFormat() throws Exception {
+// test for hudi-1722
+Configuration conf = new Configuration();
+// initial commit
+Schema schema = HoodieAvroUtils.addMetadataFields(SchemaTestUtil.getEvolvedSchema());
+HoodieTestUtils.init(hadoopConf, tempDir.toAbsolutePath().toString(), HoodieTableType.MERGE_ON_READ);
+String commitTime = "100";
+final int numRecords = 1000;
+// Create 3 parquet files with 1000 records each
+File partitionDir = InputFormatTestUtil.prepareParquetTable(tempDir, schema, 3, numRecords, commitTime);
+InputFormatTestUtil.commit(tempDir, commitTime);
+
+String newCommitTime = "101";
+// to trigger the bug of HUDI-1772, only update fileid2
+// insert 1000 update records to log file 2
+// now fileid0, fileid1 has no log files, fileid2 has log file
+HoodieLogFormat.Writer writer =
+InputFormatTestUtil.writeDataBlockToLogFile(partitionDir, fs, schema, "fileid2", commitTime, newCommitTime,
+numRecords, numRecords, 0);
+writer.close();
+
+TableDesc tblDesc = Utilities.defaultTd;
+// Set the input format
+tblDesc.setInputFileFormatClass(HoodieParquetRealtimeInputFormat.class);
+PartitionDesc partDesc = new PartitionDesc(tblDesc, null);
+LinkedHashMap<Path, PartitionDesc> pt = new LinkedHashMap<>();
+LinkedHashMap<Path, ArrayList<String>> tableAlias = new LinkedHashMap<>();
+ArrayList<String> alias = new ArrayList<>();
+alias.add(tempDir.toAbsolutePath().toString());
+tableAlias.put(new Path(tempDir.toAbsolutePath().toString()), alias);
+pt.put(new Path(tempDir.toAbsolutePath().toString()), partDesc);
+
+MapredWork mrwork = new MapredWork();
+mrwork.getMapWork().setPathToPartitionInfo(pt);
+mrwork.getMapWork().setPathToAliases(tableAlias);
+Path mapWorkPath = new Path(tempDir.toAbsolutePath().toString());
+Utilities.setMapRedWork(conf, mrwork, mapWorkPath);
+jobConf = new JobConf(conf);
+// Add the paths
+FileInputFormat.setInputPaths(jobConf, partitionDir.getPath());
+jobConf.set(HAS_MAP_WORK, "true");
+// The following config tells Hive to choose ExecMapper to read the MAP_WORK
+jobConf.set(MAPRED_MAPPER_CLASS, ExecMapper.class.getName());
+// set SPLIT_MAXSIZE larger  to create one split for 3 files groups
+jobConf.set(org.apache.hadoop.mapreduce.lib.input.FileInputFormat.SPLIT_MAXSIZE, "12800");
+
+HoodieCombineHiveInputFormat combineHiveInputFormat = new HoodieCombineHiveInputFormat();
+String tripsHiveColumnTypes = "double,string,string,string,double,double,double,double,double";
+InputFormatTestUtil.setProjectFieldsForInputFormat(jobConf, schema, tripsHiveColumnTypes);
+InputSplit[] splits = combineHiveInputFormat.getSplits(jobConf, 1);
+// Since the SPLIT_SIZE is 3, we should create only 1 split with all 3 file groups
+assertEquals(1, splits.length);
+RecordReader recordReader =

Review comment:
   Yes, we only create one combine record reader, but this reader holds three RealtimeCompactedRecordReaders. The order in which those RealtimeCompactedRecordReaders are created leads to this NPE.
   For the test example, the combine record reader holds three RealtimeCompactedRecordReaders; call them creader1, creader2 and creader3:
   creader1: only has a base file
   creader2: only has a base file
   creader3: has a base file and a log file
   
   If creader3 is created first, the Hoodie additional projection columns are added to jobConf, and in this case the query succeeds.
   However, if creader1 or creader2 is created first, no Hoodie additional projection columns are added to jobConf, and the query fails.
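The ordering problem described above can be reproduced with a small self-contained model. In this sketch (an illustration, not Hudi code) a `HashMap` plays the shared `JobConf`, each reader is reduced to a flag saying whether its file slice has a log file, and `oldGuard`/`newGuard` mirror the `-` and `+` conditions from the `HoodieParquetRealtimeInputFormat` diff in this PR. The property keys are hypothetical, and the merge step is assumed to fail (the NPE) whenever a reader with a log file runs without the meta columns.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ReaderOrderDemo {
    static final String READ_COLUMNS_SET = "hoodie.read.columns.set";     // hypothetical key
    static final String PROJECTED_META = "hoodie.meta.columns.projected"; // hypothetical key

    // Old condition: only checks whether some earlier reader already set the property.
    static boolean oldGuard(boolean hasLogFile, Map<String, String> conf) {
        return conf.get(READ_COLUMNS_SET) == null; // hasLogFile is ignored, which is the bug
    }

    // New condition: also re-runs when this reader has log files but the meta columns are missing.
    static boolean newGuard(boolean hasLogFile, Map<String, String> conf) {
        return conf.get(READ_COLUMNS_SET) == null
            || (hasLogFile && !conf.containsKey(PROJECTED_META));
    }

    /** Returns true if every reader can merge its records, i.e. no simulated NPE. */
    static boolean runReaders(List<Boolean> hasLogFilePerReader, boolean useNewGuard) {
        Map<String, String> conf = new HashMap<>(); // one conf shared by all readers, never cloned
        for (boolean hasLogFile : hasLogFilePerReader) {
            boolean add = useNewGuard ? newGuard(hasLogFile, conf) : oldGuard(hasLogFile, conf);
            if (add) {
                if (hasLogFile) {
                    conf.put(PROJECTED_META, "true"); // only readers with log files add meta columns
                }
                conf.put(READ_COLUMNS_SET, "true");
            }
            if (hasLogFile && !conf.containsKey(PROJECTED_META)) {
                return false; // merging log records without the meta columns: the NPE case
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Boolean> creader3First = List.of(true, false, false);
        List<Boolean> creader1First = List.of(false, false, true);
        System.out.println("old guard, creader3 first: " + runReaders(creader3First, false));
        System.out.println("old guard, creader1 first: " + runReaders(creader1First, false));
        System.out.println("new guard, creader1 first: " + runReaders(creader1First, true));
    }
}
```

`main` prints `true`, `false`, `true`: with the old guard, the outcome depends on which reader touches the shared conf first, while the new guard makes it order-independent without cloning the conf.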








[GitHub] [hudi] xiarixiaoyao commented on pull request #2722: [HUDI-1722]hive beeline/spark-sql query specified field on mor table occur NPE

2021-04-18 Thread GitBox


xiarixiaoyao commented on pull request #2722:
URL: https://github.com/apache/hudi/pull/2722#issuecomment-822129270


   @lw309637554 thanks for your review. I left comments on your questions.
   
   Another question: TestHoodieCombineHiveInputFormat.testHoodieRealtimeCombineHoodieInputFormat is disabled by default. I have checked that test function and found some problems in it. Could I fix those problems and enable TestHoodieCombineHiveInputFormat.testHoodieRealtimeCombineHoodieInputFormat?






[GitHub] [hudi] ssdong commented on a change in pull request #2784: [HUDI-1740] Fix insert-overwrite API archival

2021-04-18 Thread GitBox


ssdong commented on a change in pull request #2784:
URL: https://github.com/apache/hudi/pull/2784#discussion_r615510834



##
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/ReplaceArchivalHelper.java
##
@@ -68,6 +68,11 @@
   public static boolean deleteReplacedFileGroups(HoodieEngineContext context, HoodieTableMetaClient metaClient,
  TableFileSystemView fileSystemView,
  HoodieInstant instant, List replacedPartitions) {
+// There is no file id to be replaced in the very first replace commit file for insert overwrite operation
+if (replacedPartitions.isEmpty()) {
+  LOG.warn("Found empty partitionToReplaceFileIds");

Review comment:
   `partitionToReplaceFileIds` is the field name directly taken from the 
replace commit file. I guess it is subject to change. What about just 
explaining the warning as `Found no partition files to replace`? 








[GitHub] [hudi] xiarixiaoyao commented on a change in pull request #2722: [HUDI-1722]hive beeline/spark-sql query specified field on mor table occur NPE

2021-04-18 Thread GitBox


xiarixiaoyao commented on a change in pull request #2722:
URL: https://github.com/apache/hudi/pull/2722#discussion_r615510606



##
File path: hudi-hadoop-mr/src/test/java/org/apache/hudi/hadoop/functional/TestHoodieCombineHiveInputFormat.java
##
@@ -84,6 +86,73 @@ public void setUp() throws IOException, InterruptedException {
 HoodieTestUtils.init(MiniClusterUtil.configuration, tempDir.toAbsolutePath().toString(), HoodieTableType.MERGE_ON_READ);
   }
 
+  @Test
+  public void testMutilReaderRealtimeComineHoodieInputFormat() throws Exception {
+// test for hudi-1722
+Configuration conf = new Configuration();
+// initial commit
+Schema schema = HoodieAvroUtils.addMetadataFields(SchemaTestUtil.getEvolvedSchema());
+HoodieTestUtils.init(hadoopConf, tempDir.toAbsolutePath().toString(), HoodieTableType.MERGE_ON_READ);
+String commitTime = "100";
+final int numRecords = 1000;
+// Create 3 parquet files with 1000 records each
+File partitionDir = InputFormatTestUtil.prepareParquetTable(tempDir, schema, 3, numRecords, commitTime);
+InputFormatTestUtil.commit(tempDir, commitTime);
+
+String newCommitTime = "101";
+// to trigger the bug of HUDI-1772, only update fileid2
+// insert 1000 update records to log file 2
+// now fileid0, fileid1 has no log files, fileid2 has log file
+HoodieLogFormat.Writer writer =
+InputFormatTestUtil.writeDataBlockToLogFile(partitionDir, fs, schema, "fileid2", commitTime, newCommitTime,
+numRecords, numRecords, 0);
+writer.close();
+
+TableDesc tblDesc = Utilities.defaultTd;
+// Set the input format
+tblDesc.setInputFileFormatClass(HoodieParquetRealtimeInputFormat.class);
+PartitionDesc partDesc = new PartitionDesc(tblDesc, null);
+LinkedHashMap<Path, PartitionDesc> pt = new LinkedHashMap<>();
+LinkedHashMap<Path, ArrayList<String>> tableAlias = new LinkedHashMap<>();
+ArrayList<String> alias = new ArrayList<>();
+alias.add(tempDir.toAbsolutePath().toString());
+tableAlias.put(new Path(tempDir.toAbsolutePath().toString()), alias);
+pt.put(new Path(tempDir.toAbsolutePath().toString()), partDesc);
+
+MapredWork mrwork = new MapredWork();
+mrwork.getMapWork().setPathToPartitionInfo(pt);
+mrwork.getMapWork().setPathToAliases(tableAlias);
+Path mapWorkPath = new Path(tempDir.toAbsolutePath().toString());
+Utilities.setMapRedWork(conf, mrwork, mapWorkPath);
+jobConf = new JobConf(conf);
+// Add the paths
+FileInputFormat.setInputPaths(jobConf, partitionDir.getPath());
+jobConf.set(HAS_MAP_WORK, "true");
+// The following config tells Hive to choose ExecMapper to read the MAP_WORK
+jobConf.set(MAPRED_MAPPER_CLASS, ExecMapper.class.getName());
+// set SPLIT_MAXSIZE larger  to create one split for 3 files groups
+jobConf.set(org.apache.hadoop.mapreduce.lib.input.FileInputFormat.SPLIT_MAXSIZE, "12800");
+
+HoodieCombineHiveInputFormat combineHiveInputFormat = new HoodieCombineHiveInputFormat();
+String tripsHiveColumnTypes = "double,string,string,string,double,double,double,double,double";
+InputFormatTestUtil.setProjectFieldsForInputFormat(jobConf, schema, tripsHiveColumnTypes);
+InputSplit[] splits = combineHiveInputFormat.getSplits(jobConf, 1);
+// Since the SPLIT_SIZE is 3, we should create only 1 split with all 3 file groups
+assertEquals(1, splits.length);
+RecordReader recordReader =

Review comment:
   Yes, we only create one combine record reader, but this reader holds three RealtimeCompactedRecordReaders. The order in which those RealtimeCompactedRecordReaders are executed leads to this NPE.
   For the test example, the combine record reader holds three RealtimeCompactedRecordReaders; call them creader1, creader2 and creader3:
   creader1: only has a base file
   creader2: only has a base file
   creader3: has a base file and a log file
   
   If creader3 is created first, the Hoodie additional projection columns are added to jobConf, and in this case the query succeeds.
   However, if creader1 or creader2 is created first, no Hoodie additional projection columns are added to jobConf, and the query fails.








[GitHub] [hudi] codecov-commenter commented on pull request #2645: [HUDI-1659] Basic Implementation Of Spark Sql Support

2021-04-18 Thread GitBox


codecov-commenter commented on pull request #2645:
URL: https://github.com/apache/hudi/pull/2645#issuecomment-822128091


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2645?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) Report
   > Merging 
[#2645](https://codecov.io/gh/apache/hudi/pull/2645?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (8304965) into 
[master](https://codecov.io/gh/apache/hudi/commit/18459d4045ec4a85081c227893b226a4d759f84b?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (18459d4) will **decrease** coverage by `42.88%`.
   > The diff coverage is `n/a`.
   
   > :exclamation: Current head 8304965 differs from pull request most recent 
head 20a927c. Consider uploading reports for the commit 20a927c to get more 
accurate results
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2645/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2645?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
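   @@             Coverage Diff              @@
   ##             master   #2645       +/-   ##
   ============================================
   - Coverage     52.26%   9.38%   -42.89%
   + Complexity     3682      48     -3634
   ============================================
     Files           484      54      -430
     Lines         23094    1993    -21101
     Branches       2456     235     -2221
   ============================================
   - Hits          12070     187    -11883
   + Misses         9959    1793     -8166
   + Partials       1065      13     -1052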
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `?` | `?` | |
   | hudiclient | `?` | `?` | |
   | hudicommon | `?` | `?` | |
   | hudiflink | `?` | `?` | |
   | hudihadoopmr | `?` | `?` | |
   | hudisparkdatasource | `?` | `?` | |
   | hudisync | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `9.38% <ø> (-60.35%)` | `48.00 <ø> (-325.00)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2645?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | |
   | [...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | |
   | [...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | [.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | [.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | 

[GitHub] [hudi] pengzhiwei2018 commented on a change in pull request #2645: [HUDI-1659] Basic Implementation Of Spark Sql Support

2021-04-18 Thread GitBox


pengzhiwei2018 commented on a change in pull request #2645:
URL: https://github.com/apache/hudi/pull/2645#discussion_r615508923



##
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java
##
@@ -444,7 +448,9 @@ private void writeToBuffer(HoodieRecord record) {
 }
 Option indexedRecord = getIndexedRecord(record);
 if (indexedRecord.isPresent()) {
-  recordList.add(indexedRecord.get());
+  if (indexedRecord.get() != IGNORE_RECORD) { // Skip the Ignore Record.

Review comment:
   Fixed~
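For context on this diff: `IGNORE_RECORD` works as a sentinel. It is a single shared instance that is compared by reference (`!=`), so a record the payload chooses to drop is never added to the buffer. The sketch below shows the pattern on its own. `SentinelBuffer` is a hypothetical class, `java.util.Optional` stands in for Hudi's own `Option`, and the delete branch of the real method is omitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical stand-in for the record buffer in HoodieAppendHandle. */
class SentinelBuffer {
  // One shared instance; identity comparison (!=) is what makes it a sentinel.
  static final Object IGNORE_RECORD = new Object();

  final List<Object> recordList = new ArrayList<>();

  void writeToBuffer(Optional<?> indexedRecord) {
    if (indexedRecord.isPresent()) {
      if (indexedRecord.get() != IGNORE_RECORD) { // skip the ignore record
        recordList.add(indexedRecord.get());
      }
    }
  }
}
```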




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] ssdong commented on a change in pull request #2784: [HUDI-1740] Fix insert-overwrite API archival

2021-04-18 Thread GitBox


ssdong commented on a change in pull request #2784:
URL: https://github.com/apache/hudi/pull/2784#discussion_r615508860



##
File path: hudi-common/src/test/java/org/apache/hudi/common/testutils/HoodieTestTable.java
##
@@ -177,10 +161,14 @@ public HoodieTestTable addDeltaCommit(String instantTime) throws Exception {
 return this;
   }
 
-  public HoodieTestTable addReplaceCommit(String instantTime, HoodieRequestedReplaceMetadata requestedReplaceMetadata, HoodieReplaceCommitMetadata metadata) throws Exception {
+  public HoodieTestTable addReplaceCommit(
+  String instantTime,
+  HoodieRequestedReplaceMetadata requestedReplaceMetadata,
+  HoodieReplaceCommitMetadata completeReplaceMetadata,
+  HoodieCommitMetadata inflightReplaceMetadata) throws Exception {

Review comment:
   Hmm, if you trace it down to `createInflightReplaceCommit`, where `HoodieCommitMetadata` is referenced, a `null` check is already present there. However, I do agree that an `Optional` reminds people of the null check, though the `o.isPresent()` check is _hardly_ any better than `o != null`. I could make the change and also update `createRequestedReplaceCommit` to use an Optional `HoodieRequestedReplaceMetadata` in the same way.
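The trade-off discussed here can be seen side by side. Both helpers below use the same runtime check. The difference is that the `Optional` version states in its signature that the value may be absent. The class and method names are hypothetical, and `java.util.Optional` stands in for Hudi's `org.apache.hudi.common.util.Option`.

```java
import java.util.Optional;

/** Hypothetical helpers contrasting a nullable parameter with an Optional one. */
final class InflightMetadataStyles {
  // Nullable style: nothing in the signature warns callers that null is allowed.
  static String describeNullable(String inflightMetadata) {
    return inflightMetadata != null ? "inflight with metadata" : "empty inflight";
  }

  // Optional style: absence is part of the signature, so callers must decide what to pass.
  static String describeOptional(Optional<String> inflightMetadata) {
    return inflightMetadata.isPresent() ? "inflight with metadata" : "empty inflight";
  }
}
```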








[GitHub] [hudi] codecov-commenter edited a comment on pull request #2843: [HUDI-1804] Continue to write when Flink write task restart because o…

2021-04-18 Thread GitBox


codecov-commenter edited a comment on pull request #2843:
URL: https://github.com/apache/hudi/pull/2843#issuecomment-821748199


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2843?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) Report
   > Merging [#2843](https://codecov.io/gh/apache/hudi/pull/2843?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) (6e943cf) into [master](https://codecov.io/gh/apache/hudi/commit/b6d949b48a649acac27d5d9b91677bf2e25e9342?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) (b6d949b) will **decrease** coverage by `43.21%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2843/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2843?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
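   ```diff
   @@             Coverage Diff              @@
   ##             master    #2843       +/-   ##
   =============================================
   - Coverage     52.60%    9.38%   -43.22%
   + Complexity     3709       48     -3661
   =============================================
     Files           485       54      -431
     Lines         23224     1993    -21231
     Branches       2465      235     -2230
   =============================================
   - Hits          12216      187    -12029
   + Misses         9929     1793     -8136
   + Partials       1079       13     -1066
   ```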
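Codecov documents its coverage ratio as hits / (hits + partials + misses), and the Lines row above equals the sum of the Hits, Misses and Partials rows. The hypothetical helper below recomputes both Coverage values from those rows as a sanity check.

```java
import java.util.Locale;

/** Recomputes a Codecov-style coverage ratio: hits / (hits + partials + misses). */
final class CoverageCheck {
  static String percent(long hits, long misses, long partials) {
    long lines = hits + misses + partials; // matches the "Lines" row
    double ratio = lines == 0 ? 0.0 : 100.0 * hits / lines;
    return String.format(Locale.ROOT, "%.2f%%", ratio);
  }
}
```

`percent(12216, 9929, 1079)` gives `52.60%` for master, and `percent(187, 1793, 13)` gives `9.38%` for #2843. Both match the Coverage row.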
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `?` | `?` | |
   | hudiclient | `?` | `?` | |
   | hudicommon | `?` | `?` | |
   | hudiflink | `?` | `?` | |
   | hudihadoopmr | `?` | `?` | |
   | hudisparkdatasource | `?` | `?` | |
   | hudisync | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `9.38% <ø> (-60.42%)` | `48.00 <ø> (-325.00)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2843?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/2843/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | |
   | [...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/2843/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | |
   | [...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/2843/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | [.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/2843/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | [.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/2843/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | 

[GitHub] [hudi] pengzhiwei2018 commented on pull request #2645: [HUDI-1659] Basic Implementation Of Spark Sql Support

2021-04-18 Thread GitBox


pengzhiwei2018 commented on pull request #2645:
URL: https://github.com/apache/hudi/pull/2645#issuecomment-822125975


   > @pengzhiwei2018 one more question: will we introduce a Catalog to manage table operations in the future?
   
   Yes, I agree with introducing a Catalog to manage table operations for Spark 3 in the future.






[GitHub] [hudi] pengzhiwei2018 commented on a change in pull request #2645: [HUDI-1659] Basic Implementation Of Spark Sql Support

2021-04-18 Thread GitBox


pengzhiwei2018 commented on a change in pull request #2645:
URL: https://github.com/apache/hudi/pull/2645#discussion_r615508220



##
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java
##
@@ -444,7 +448,9 @@ private void writeToBuffer(HoodieRecord record) {
 }
 Option indexedRecord = getIndexedRecord(record);
 if (indexedRecord.isPresent()) {
-  recordList.add(indexedRecord.get());
+  if (indexedRecord.get() != IGNORE_RECORD) { // Skip the Ignore Record.

Review comment:
   > @pengzhiwei2018 one more question: will we introduce a Catalog to manage table operations in the future?
   
   Yes, I agree with introducing a Catalog to manage table operations for Spark 3 in the future.








[GitHub] [hudi] xiarixiaoyao commented on a change in pull request #2722: [HUDI-1722]hive beeline/spark-sql query specified field on mor table occur NPE

2021-04-18 Thread GitBox


xiarixiaoyao commented on a change in pull request #2722:
URL: https://github.com/apache/hudi/pull/2722#discussion_r615508116



##
File path: hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieParquetRealtimeInputFormat.java
##
@@ -85,12 +85,14 @@ void addProjectionToJobConf(final RealtimeSplit realtimeSplit, final JobConf job
 // risk of experiencing race conditions. Hence, we synchronize on the JobConf object here. There is negligible
 // latency incurred here due to the synchronization since get record reader is called once per spilt before the
 // actual heavy lifting of reading the parquet files happen.
-if (jobConf.get(HoodieInputFormatUtils.HOODIE_READ_COLUMNS_PROP) == null) {
+if (jobConf.get(HoodieInputFormatUtils.HOODIE_READ_COLUMNS_PROP) == null
+|| (!realtimeSplit.getDeltaLogPaths().isEmpty() && !HoodieRealtimeInputFormatUtils.requiredProjectionFieldsExistInConf(jobConf))) {
   synchronized (jobConf) {
 LOG.info(
 "Before adding Hoodie columns, Projections :" + jobConf.get(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR)
 + ", Ids :" + jobConf.get(ColumnProjectionUtils.READ_COLUMN_IDS_CONF_STR));
-if (jobConf.get(HoodieInputFormatUtils.HOODIE_READ_COLUMNS_PROP) == null) {
+if (jobConf.get(HoodieInputFormatUtils.HOODIE_READ_COLUMNS_PROP) == null

Review comment:
   Cloning the Configuration object can be very expensive. To avoid unexpected performance regressions for workloads, we should not isolate the JobConf for different record readers.
   
   I also agree with reverting https://github.com/apache/hudi/pull/2190/files
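The alternative to cloning is what the diff already does: check, lock the shared `jobConf`, then check again before adding the projection columns. The sketch below shows that double-checked pattern under stated assumptions. A `ConcurrentMap` stands in for Hadoop's `JobConf`, so the unlocked read is safe. Both config keys and the class name are hypothetical, not Hudi's or Hive's real property names.

```java
import java.util.concurrent.ConcurrentMap;

/**
 * Hypothetical sketch: mutate one shared config (a ConcurrentMap standing in for
 * Hadoop's JobConf) under its own lock instead of cloning it per record reader.
 */
final class SharedConfProjection {
  static final String COLUMNS_SET_KEY = "example.hoodie.read.columns.set"; // hypothetical key
  static final String COLUMN_NAMES_KEY = "example.read.column.names";      // hypothetical key

  static void addProjectionOnce(ConcurrentMap<String, String> conf, String requiredColumns) {
    // Cheap check without the lock: after the first reader, this is the common path.
    if (conf.get(COLUMNS_SET_KEY) == null) {
      synchronized (conf) {
        // Re-check under the lock: another reader may have finished in the meantime.
        if (conf.get(COLUMNS_SET_KEY) == null) {
          String existing = conf.getOrDefault(COLUMN_NAMES_KEY, "");
          conf.put(COLUMN_NAMES_KEY,
              existing.isEmpty() ? requiredColumns : existing + "," + requiredColumns);
          conf.put(COLUMNS_SET_KEY, "true"); // published last, after the columns
        }
      }
    }
  }
}
```

Only the first reader pays for the lock and the mutation. Later readers see the flag and return immediately, which is why the comment in the diff calls the latency negligible.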








[GitHub] [hudi] pengzhiwei2018 commented on a change in pull request #2645: [HUDI-1659] Basic Implementation Of Spark Sql Support

2021-04-18 Thread GitBox


pengzhiwei2018 commented on a change in pull request #2645:
URL: https://github.com/apache/hudi/pull/2645#discussion_r615507564



##
File path: hudi-common/src/main/java/org/apache/hudi/common/model/DefaultHoodieRecordPayload.java
##
@@ -97,4 +86,22 @@ public DefaultHoodieRecordPayload(Option record) {
 }
 return metadata.isEmpty() ? Option.empty() : Option.of(metadata);
   }
+
+  protected boolean needUpdatePersistedRecord(IndexedRecord currentValue,
+  IndexedRecord incomingRecord, Properties properties) {
+/*
+ * Combining strategy here returns currentValue on disk if incoming record is older.
+ * The incoming record can be either a delete (sent as an upsert with _hoodie_is_deleted set to true)
+ * or an insert/update record. In any case, if it is older than the record in disk, the currentValue
+ * in disk is returned (to be rewritten with new commit time).
+ *
+ * NOTE: Deletes sent via EmptyHoodieRecordPayload and/or Delete operation type do not hit this code path

Review comment:
   Yes, it is used by the HoodieMergeHandle. Here I just moved the original code into `needUpdatePersistedRecord`, so that it can be reused by subclasses of `DefaultHoodieRecordPayload`, e.g. `ExpressionPayload`. It is just a code refactor here.
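The refactor described here is a template method. The merge entry point stays in the base class, and the "should the persisted record be replaced?" decision moves into a protected hook that subclasses can override. The sketch below uses hypothetical, simplified classes rather than Hudi's Avro types. It also assumes that ties on the ordering value go to the incoming record, and that the subclass's extra condition is just an example.

```java
/** Hypothetical, simplified record with an ordering (precombine) value. */
final class SketchRecord {
  final String key;
  final long orderingValue;

  SketchRecord(String key, long orderingValue) {
    this.key = key;
    this.orderingValue = orderingValue;
  }
}

/** Stand-in for DefaultHoodieRecordPayload: the merge decision lives in one overridable hook. */
class DefaultPayloadSketch {
  final SketchRecord combineAndGetUpdateValue(SketchRecord currentValue, SketchRecord incoming) {
    return needUpdatePersistedRecord(currentValue, incoming) ? incoming : currentValue;
  }

  // Keep the value on disk when the incoming record is older.
  protected boolean needUpdatePersistedRecord(SketchRecord currentValue, SketchRecord incoming) {
    return incoming.orderingValue >= currentValue.orderingValue;
  }
}

/** A subclass (in the spirit of ExpressionPayload) reuses the hook and adds a condition. */
class ConditionalPayloadSketch extends DefaultPayloadSketch {
  @Override
  protected boolean needUpdatePersistedRecord(SketchRecord currentValue, SketchRecord incoming) {
    return super.needUpdatePersistedRecord(currentValue, incoming) && !incoming.key.isEmpty();
  }
}
```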








[GitHub] [hudi] lw309637554 commented on pull request #2199: [HUDI-1264][WIP] spark incremental read support with replace

2021-04-18 Thread GitBox


lw309637554 commented on pull request #2199:
URL: https://github.com/apache/hudi/pull/2199#issuecomment-822124457


   > @satishkotha Is this PR still valid? @lw309637554 Can you please rebase this PR so we can get this landed?
   
   @n3nash @satishkotha 
   I think the solution in this PR is not very good.
   
   Hi, the solution in this pull request just filters the commits between the latest replace commit and the end commit. By comparison, HoodieParquetRealtimeInputFormat uses fsView.getLatestMergedFileSlicesBeforeOrOn to filter out replaced file slices. Should we change the Spark incremental relation to use fsView.getLatestMergedFileSlicesBeforeOrOn as well?
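The difference being raised is between filtering by commit range only and also filtering out file groups that a replace commit has superseded. The rough sketch below illustrates only the second idea. It does not reproduce `getLatestMergedFileSlicesBeforeOrOn`, and `SliceRef` and `IncrementalFilterSketch` are hypothetical.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical model: a file slice identified by its file group id and commit time. */
final class SliceRef {
  final String fileId;
  final String commitTime;

  SliceRef(String fileId, String commitTime) {
    this.fileId = fileId;
    this.commitTime = commitTime;
  }
}

final class IncrementalFilterSketch {
  /**
   * Keeps slices committed in (beginTime, endTime] whose file group was not
   * replaced. Instant times are compared as strings, which assumes fixed-width
   * timestamps such as yyyyMMddHHmmss.
   */
  static List<SliceRef> visibleSlices(List<SliceRef> slices, Set<String> replacedFileIds,
                                      String beginTime, String endTime) {
    return slices.stream()
        .filter(s -> s.commitTime.compareTo(beginTime) > 0)
        .filter(s -> s.commitTime.compareTo(endTime) <= 0)
        .filter(s -> !replacedFileIds.contains(s.fileId))
        .collect(Collectors.toList());
  }
}
```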






[jira] [Commented] (HUDI-1138) Re-implement marker files via timeline server

2021-04-18 Thread liwei (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17324649#comment-17324649
 ] 

liwei commented on HUDI-1138:
-

[~uditme] [~vinoth] I also think listing is a point where performance can be improved. In cloud storage such as S3 and Alibaba Cloud OSS, listing is expensive and slow.

Regarding this point in the description:

P.S: I was tempted to think the Spark listener mechanism can help us deal with failed tasks, but it has no guarantees. The writer job could die without deleting a partial file, i.e. it can improve things, but can't provide guarantees.

Could we use the listener anyway, and also delete the residual files during clean?

> Re-implement marker files via timeline server
> -
>
> Key: HUDI-1138
> URL: https://issues.apache.org/jira/browse/HUDI-1138
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Writer Core
>Affects Versions: 0.9.0
>Reporter: Vinoth Chandar
>Priority: Blocker
> Fix For: 0.9.0
>
>
> Even though one can argue that RFC-15/consolidated metadata removes the need for 
> deleting partial files written due to Spark task failures/stage retries, it 
> will still leave extra files inside the table (and users will pay for them 
> every month), so we need the marker mechanism to be able to delete these 
> partial files. 
> Here we explore whether we can improve the current marker file mechanism, which 
> creates one marker file per data file written, by 
> delegating the createMarker() call to the driver/timeline server and having it 
> write marker metadata into a single file handle that is flushed for 
> durability guarantees.
>  
> P.S: I was tempted to think the Spark listener mechanism can help us deal with 
> failed tasks, but it has no guarantees. The writer job could die without 
> deleting a partial file, i.e. it can improve things, but can't provide 
> guarantees.
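The proposal above replaces one marker file per data file with a single owner, the driver or timeline server, which appends marker entries to one file and flushes it for durability. Below is a minimal local-filesystem sketch of that idea. The class name, the one-line-per-marker format and the `.marker.` suffix are assumptions. A real implementation would write through the table's Hadoop `FileSystem` rather than `java.nio`.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

/** Hypothetical single-file marker log owned by one process (e.g. the timeline server). */
final class SingleFileMarkerWriter implements AutoCloseable {
  private final FileChannel channel;

  SingleFileMarkerWriter(Path markerFile) throws IOException {
    this.channel = FileChannel.open(markerFile,
        StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND);
  }

  /** Appends one marker line and forces it to disk; synchronized so lines never interleave. */
  synchronized void createMarker(String partitionPath, String dataFileName, String ioType)
      throws IOException {
    String line = partitionPath + "/" + dataFileName + ".marker." + ioType + "\n";
    ByteBuffer buf = ByteBuffer.wrap(line.getBytes(StandardCharsets.UTF_8));
    while (buf.hasRemaining()) {
      channel.write(buf);
    }
    channel.force(true); // flush content and metadata (e.g. length) before acknowledging
  }

  /** Reading back every marker is one file read instead of a directory listing. */
  static List<String> readMarkers(Path markerFile) throws IOException {
    return Files.readAllLines(markerFile, StandardCharsets.UTF_8);
  }

  @Override
  public void close() throws IOException {
    channel.close();
  }
}
```

The listing concern from the comment above is addressed by `readMarkers`, which issues one read of one object no matter how many data files were written.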





[GitHub] [hudi] RocMarshal closed pull request #2844: [Hotfix][hudi-sync] Fix typos

2021-04-18 Thread GitBox


RocMarshal closed pull request #2844:
URL: https://github.com/apache/hudi/pull/2844


   






[jira] [Updated] (HUDI-897) hudi support log append scenario with better write and asynchronous compaction

2021-04-18 Thread liwei (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liwei updated HUDI-897:
---
Status: In Progress  (was: Open)

> hudi support log append scenario with better write and asynchronous compaction
> --
>
> Key: HUDI-897
> URL: https://issues.apache.org/jira/browse/HUDI-897
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Compaction, Performance
>Affects Versions: 0.9.0
>Reporter: liwei
>Assignee: liwei
>Priority: Major
> Fix For: 0.9.0
>
> Attachments: image-2020-05-14-19-51-37-938.png, 
> image-2020-05-14-20-14-59-429.png
>
>
> I. Scenario
> The business scenarios of the data lake mainly include analysis of databases, 
> logs, and files.
> !image-2020-05-14-20-14-59-429.png|width=444,height=286!
> Databricks Delta Lake also targets these three scenarios. [1]
>  
> II. Current state of Hudi
> At present, Hudi supports the scenario where database CDC data is 
> incrementally written to Hudi reasonably well, and it also supports bulk-loading files into Hudi. 
> However, there is no good native support for log scenarios (which require 
> high-throughput writes with no updates or deletes, and where small files are a 
> key concern). Logs can currently be written through inserts without deduplication, but they are 
> still merged on the write side.
>  * In copy-on-write mode, when "hoodie.parquet.small.file.limit" is 100MB, 
>  every small batch spends time on merging, which reduces write 
> throughput.  
>  * This scenario is not suitable for merge-on-read. 
>  * The actual scenario only needs to write Parquet in batches at write time, 
> and then provide reverse compaction (similar to Delta Lake).
> III. What we can do
>   
>  1. On the write side, just write every batch to a Parquet file based on the 
> snapshot mechanism. Merging stays enabled by default, but users can turn off auto-merge for 
> higher write throughput.  
> 2. Add support in Hudi for asynchronously merging small Parquet files, like Databricks Delta 
> Lake's OPTIMIZE command. [2] 
>  
> [1] [https://databricks.com/product/delta-lake-on-databricks]
> [2] [https://docs.databricks.com/delta/optimizations/file-mgmt.html]
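Point 2 above is an asynchronous job that picks small Parquet files and merges them. One simple way to plan such a job is to pack files below a small-file threshold into groups that stay under a target size. The sketch below is a greedy planner that packs files in input order. It is not Hudi's clustering planner. `SmallFileGrouper` is hypothetical, and `smallFileLimit` only loosely mirrors the role of `hoodie.parquet.small.file.limit`.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical planner for asynchronously merging small Parquet files: files below
 * smallFileLimit are packed, in input order, into groups of at most targetGroupSize.
 */
final class SmallFileGrouper {
  static List<List<Long>> plan(List<Long> fileSizes, long smallFileLimit, long targetGroupSize) {
    List<List<Long>> groups = new ArrayList<>();
    List<Long> current = new ArrayList<>();
    long currentSize = 0;
    for (long size : fileSizes) {
      if (size >= smallFileLimit) {
        continue; // already large enough, leave it alone
      }
      if (!current.isEmpty() && currentSize + size > targetGroupSize) {
        groups.add(current); // close the group before it exceeds the target
        current = new ArrayList<>();
        currentSize = 0;
      }
      current.add(size);
      currentSize += size;
    }
    if (!current.isEmpty()) {
      groups.add(current);
    }
    groups.removeIf(g -> g.size() < 2); // merging a single file gains nothing
    return groups;
  }
}
```

With sizes in MB, `plan(Arrays.asList(10L, 20L, 200L, 30L, 90L, 5L), 100, 120)` yields `[[10, 20, 30], [90, 5]]`. The 200 MB file is skipped because it is already above the threshold.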





[jira] [Commented] (HUDI-897) hudi support log append scenario with better write and asynchronous compaction

2021-04-18 Thread liwei (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17324647#comment-17324647
 ] 

liwei commented on HUDI-897:


okay



[jira] [Resolved] (HUDI-897) hudi support log append scenario with better write and asynchronous compaction

2021-04-18 Thread liwei (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liwei resolved HUDI-897.

Resolution: Fixed



[GitHub] [hudi] lw309637554 commented on pull request #2784: [HUDI-1740] Fix insert-overwrite API archival

2021-04-18 Thread GitBox


lw309637554 commented on pull request #2784:
URL: https://github.com/apache/hudi/pull/2784#issuecomment-822117646


   @zherenyu831 @ssdong thanks for your contribution, left some minor comments






[GitHub] [hudi] lw309637554 commented on a change in pull request #2784: [HUDI-1740] Fix insert-overwrite API archival

2021-04-18 Thread GitBox


lw309637554 commented on a change in pull request #2784:
URL: https://github.com/apache/hudi/pull/2784#discussion_r615500929



##
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/utils/MetadataConversionUtils.java
##
@@ -105,14 +117,25 @@ public static HoodieArchivedMetaEntry createMetaWrapper(HoodieInstant hoodieInst
 return archivedMetaWrapper;
   }
 
-  public static HoodieArchivedMetaEntry createMetaWrapper(HoodieInstant hoodieInstant,
-  HoodieCommitMetadata hoodieCommitMetadata) {
-HoodieArchivedMetaEntry archivedMetaWrapper = new HoodieArchivedMetaEntry();
-archivedMetaWrapper.setCommitTime(hoodieInstant.getTimestamp());
-archivedMetaWrapper.setActionState(hoodieInstant.getState().name());
-archivedMetaWrapper.setHoodieCommitMetadata(convertCommitMetadata(hoodieCommitMetadata));
-archivedMetaWrapper.setActionType(ActionType.commit.name());
-return archivedMetaWrapper;
+  public static Option getInflightReplaceMetadata(HoodieTableMetaClient metaClient, HoodieInstant instant) throws IOException {

Review comment:
   Can we move this to `ClusteringUtils`?

##
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/utils/MetadataConversionUtils.java
##
@@ -105,14 +117,25 @@ public static HoodieArchivedMetaEntry createMetaWrapper(HoodieInstant hoodieInst
 return archivedMetaWrapper;
   }
 
-  public static HoodieArchivedMetaEntry createMetaWrapper(HoodieInstant hoodieInstant,
-  HoodieCommitMetadata hoodieCommitMetadata) {
-HoodieArchivedMetaEntry archivedMetaWrapper = new HoodieArchivedMetaEntry();
-archivedMetaWrapper.setCommitTime(hoodieInstant.getTimestamp());
-archivedMetaWrapper.setActionState(hoodieInstant.getState().name());
-archivedMetaWrapper.setHoodieCommitMetadata(convertCommitMetadata(hoodieCommitMetadata));
-archivedMetaWrapper.setActionType(ActionType.commit.name());
-return archivedMetaWrapper;
+  public static Option getInflightReplaceMetadata(HoodieTableMetaClient metaClient, HoodieInstant instant) throws IOException {
+Option inflightContent = metaClient.getActiveTimeline().getInstantDetails(instant);
+if (!inflightContent.isPresent() || inflightContent.get().length == 0) {
+  // inflight files can be empty in some certain cases, e.g. when users opt in clustering
+  return Option.empty();
+}
+return Option.of(HoodieCommitMetadata.fromBytes(inflightContent.get(), HoodieCommitMetadata.class));
+  }
+
+  public static Option getRequestedReplaceMetadata(HoodieTableMetaClient metaClient, HoodieInstant instant) throws IOException {

Review comment:
   Can we move this to `ClusteringUtils`?
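Wherever these helpers end up, the guard in `getInflightReplaceMetadata` could be factored out. Absent or zero-length instant content yields an empty result instead of a parse error. The sketch below is a hypothetical generic version. It uses `java.util.Optional` in place of Hudi's `Option`, and any function can serve as the parser, standing in for `HoodieCommitMetadata.fromBytes`.

```java
import java.util.Optional;
import java.util.function.Function;

/**
 * Hypothetical helper mirroring the guard in getInflightReplaceMetadata: missing or
 * zero-length instant content yields Optional.empty() instead of a parse failure.
 */
final class InstantContentSketch {
  static <T> Optional<T> parseIfNonEmpty(Optional<byte[]> content, Function<byte[], T> parser) {
    if (!content.isPresent() || content.get().length == 0) {
      return Optional.empty(); // e.g. an empty inflight file
    }
    return Optional.of(parser.apply(content.get()));
  }
}
```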








[GitHub] [hudi] lw309637554 commented on a change in pull request #2784: [HUDI-1740] Fix insert-overwrite API archival

2021-04-18 Thread GitBox


lw309637554 commented on a change in pull request #2784:
URL: https://github.com/apache/hudi/pull/2784#discussion_r615499401



##
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/ReplaceArchivalHelper.java
##
@@ -68,6 +68,11 @@
   public static boolean deleteReplacedFileGroups(HoodieEngineContext context, HoodieTableMetaClient metaClient,
  TableFileSystemView fileSystemView,
  HoodieInstant instant, List replacedPartitions) {
+// There is no file id to be replaced in the very first replace commit file for insert overwrite operation
+if (replacedPartitions.isEmpty()) {
+  LOG.warn("Found empty partitionToReplaceFileIds");

Review comment:
   Would it be better to say `replacedPartitions` instead of `partitionToReplaceFileIds` in this log message?








[GitHub] [hudi] lw309637554 commented on a change in pull request #2784: [HUDI-1740] Fix insert-overwrite API archival

2021-04-18 Thread GitBox


lw309637554 commented on a change in pull request #2784:
URL: https://github.com/apache/hudi/pull/2784#discussion_r615499171



##
File path: hudi-common/src/test/java/org/apache/hudi/common/testutils/HoodieTestTable.java
##
@@ -177,10 +161,14 @@ public HoodieTestTable addDeltaCommit(String instantTime) throws Exception {
 return this;
   }
 
-  public HoodieTestTable addReplaceCommit(String instantTime, HoodieRequestedReplaceMetadata requestedReplaceMetadata, HoodieReplaceCommitMetadata metadata) throws Exception {
+  public HoodieTestTable addReplaceCommit(
+  String instantTime,
+  HoodieRequestedReplaceMetadata requestedReplaceMetadata,
+  HoodieReplaceCommitMetadata completeReplaceMetadata,
+  HoodieCommitMetadata inflightReplaceMetadata) throws Exception {

Review comment:
   Can we change `HoodieCommitMetadata` to `Option<HoodieCommitMetadata>` to avoid null?








[GitHub] [hudi] ngonik commented on issue #1679: [HUDI-1609] How to disable Hive JDBC and enable metastore

2021-04-18 Thread GitBox


ngonik commented on issue #1679:
URL: https://github.com/apache/hudi/issues/1679#issuecomment-822089338


   Hey, I'm having the same issue with JSONException on EMR as mentioned above. Is there any update on that? Anything I can help with to make it work? Thanks!






[jira] [Updated] (HUDI-648) Implement error log/table for Datasource/DeltaStreamer/WriteClient/Compaction writes

2021-04-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-648:

Labels: pull-request-available sev:normal user-support-issues  (was: sev:normal user-support-issues)

> Implement error log/table for Datasource/DeltaStreamer/WriteClient/Compaction 
> writes
> 
>
> Key: HUDI-648
> URL: https://issues.apache.org/jira/browse/HUDI-648
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: DeltaStreamer, Spark Integration, Writer Core
>Reporter: Vinoth Chandar
>Assignee: liujinhui
>Priority: Major
>  Labels: pull-request-available, sev:normal, user-support-issues
> Attachments: image-2021-03-03-11-40-21-083.png
>
>
> We would like a way to hand the erroring records from writing or compaction 
> back to the users, in a separate table or log. This needs to work generically 
> across all the different writer paths.
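The core of this idea is that a failed write produces a record for the user instead of failing the job or silently dropping data. The record carries enough context to find and replay the input. The constants imported later in PR #2710 (`ERROR_COMMIT_TIME_METADATA_FIELD`, `ERROR_RECORD_KEY_METADATA_FIELD`, `ERROR_PARTITION_PATH_METADATA_FIELD`) suggest what such a record contains. Their actual values are not shown here, so the keys, the callback interface and the class in this sketch are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: failed writes become error rows instead of being dropped. */
final class ErrorRoutingSketch {
  /** Hypothetical writer callback; any exception marks the record as failed. */
  interface RecordWriter {
    void write(String recordKey, String partitionPath, String payload) throws Exception;
  }

  /** Each record is {recordKey, partitionPath, payload}; returns one error row per failure. */
  static List<Map<String, String>> writeAll(List<String[]> records, RecordWriter writer,
                                            String commitTime) {
    List<Map<String, String>> errorRows = new ArrayList<>();
    for (String[] r : records) {
      try {
        writer.write(r[0], r[1], r[2]);
      } catch (Exception e) {
        Map<String, String> row = new HashMap<>();
        row.put("commit_time", commitTime);  // cf. ERROR_COMMIT_TIME_METADATA_FIELD
        row.put("record_key", r[0]);         // cf. ERROR_RECORD_KEY_METADATA_FIELD
        row.put("partition_path", r[1]);     // cf. ERROR_PARTITION_PATH_METADATA_FIELD
        row.put("error_message", String.valueOf(e.getMessage()));
        errorRows.add(row);
      }
    }
    return errorRows;
  }
}
```

The returned rows could then be written to a separate error table or log, which is the part that must work across all writer paths.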





[GitHub] [hudi] xushiyan commented on a change in pull request #2710: [HUDI-648][RFC-20] Implement error log/table for Datasource/DeltaStreamer/WriteClient/Compaction writes

2021-04-18 Thread GitBox


xushiyan commented on a change in pull request #2710:
URL: https://github.com/apache/hudi/pull/2710#discussion_r615463905



##
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/error/HoodieBackedErrorTableWriter.java
##
@@ -0,0 +1,247 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.error;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.hudi.avro.HoodieAvroUtils;
+import org.apache.hudi.client.WriteStatus;
+import org.apache.hudi.common.config.HoodieErrorTableConfig;
+import org.apache.hudi.common.config.SerializableConfiguration;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieCleaningPolicy;
+import org.apache.hudi.common.model.HoodieRecord;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.model.HoodieTableType;
+import org.apache.hudi.common.model.HoodieFileFormat;
+import org.apache.hudi.common.model.HoodieRecordLocation;
+import org.apache.hudi.common.model.OverwriteWithLatestAvroSchemaPayload;
+import org.apache.hudi.common.model.HoodieKey;
+import org.apache.hudi.common.model.HoodieAvroPayload;
+import org.apache.hudi.common.model.OverwriteWithLatestAvroPayload;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.config.HoodieCompactionConfig;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.joda.time.DateTime;
+import org.joda.time.DateTimeZone;
+
+import java.io.IOException;
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.UUID;
+
+import static org.apache.hudi.common.config.HoodieErrorTableConfig.ERROR_COMMIT_TIME_METADATA_FIELD;
+import static org.apache.hudi.common.config.HoodieErrorTableConfig.ERROR_RECORD_KEY_METADATA_FIELD;
+import static org.apache.hudi.common.config.HoodieErrorTableConfig.ERROR_PARTITION_PATH_METADATA_FIELD;
+import static org.apache.hudi.common.config.HoodieErrorTableConfig.ERROR_FILE_ID_FIELD;
+import static org.apache.hudi.common.config.HoodieErrorTableConfig.ERROR_TABLE_NAME;
+import static org.apache.hudi.common.config.HoodieErrorTableConfig.ERROR_RECORD_UUID;
+import static org.apache.hudi.common.config.HoodieErrorTableConfig.ERROR_RECORD_TS;
+import static org.apache.hudi.common.config.HoodieErrorTableConfig.ERROR_RECORD_SCHEMA;
+import static org.apache.hudi.common.config.HoodieErrorTableConfig.ERROR_RECORD_RECORD;
+import static org.apache.hudi.common.config.HoodieErrorTableConfig.ERROR_RECORD_MESSAGE;
+import static org.apache.hudi.common.config.HoodieErrorTableConfig.ERROR_RECORD_CONTEXT;
+
+/**
+ * Writer implementation backed by an internal hudi table. Error records are saved within an internal COW table
+ * called Error table.
+ */
+public abstract class HoodieBackedErrorTableWriter  implements Serializable {
+
+  private static final Logger LOG = LogManager.getLogger(HoodieBackedErrorTableWriter.class);
+
+  protected HoodieWriteConfig errorTableWriteConfig;
+  protected HoodieWriteConfig datasetWriteConfig;
+  protected String tableName;
+
+  protected HoodieTableMetaClient metaClient;
+  protected SerializableConfiguration hadoopConf;
+  protected final transient HoodieEngineContext engineContext;
+  protected String basePath;
+
+  protected HoodieBackedErrorTableWriter(Configuration hadoopConf, HoodieWriteConfig writeConfig, HoodieEngineContext engineContext) {
+    this.datasetWriteConfig = writeConfig;
+    this.engineContext = engineContext;
+    this.hadoopConf = new SerializableConfiguration(hadoopConf);
+
+    if (writeConfig.errorTableEnabled()) {
+      this.tableName = writeConfig.getTableName() + HoodieErrorTableConfig.ERROR_TABLE_NAME_SUFFIX;
+      this.basePath = 

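The constructor excerpt above is cut off in the archive. It does show two things. First, the error table name is derived by appending `HoodieErrorTableConfig.ERROR_TABLE_NAME_SUFFIX` to the dataset table name when `writeConfig.errorTableEnabled()` is true. Second, the static imports hint at the fields an error row carries (`ERROR_RECORD_UUID`, `ERROR_RECORD_TS`, `ERROR_RECORD_SCHEMA`, `ERROR_RECORD_RECORD`, `ERROR_RECORD_MESSAGE`, `ERROR_RECORD_CONTEXT`). The sketch below is minimal and rests on two guesses: a suffix value of `_errors` and plain string map keys. The diff does not show the real constant values. `ErrorTableSketch` and its methods are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

/**
 * Hypothetical sketch, not the PR's code: derive an error table name and build
 * one error row whose fields loosely mirror the ERROR_RECORD_* constants.
 */
public class ErrorTableSketch {

  // Assumed value; the real HoodieErrorTableConfig.ERROR_TABLE_NAME_SUFFIX is not shown in the diff.
  static final String ERROR_TABLE_NAME_SUFFIX = "_errors";

  // Mirrors writeConfig.getTableName() + ERROR_TABLE_NAME_SUFFIX from the constructor.
  static String errorTableName(String datasetTableName) {
    return datasetTableName + ERROR_TABLE_NAME_SUFFIX;
  }

  // Map keys are illustrative stand-ins for the ERROR_RECORD_* field names.
  static Map<String, String> errorRow(String schema, String record, String message,
                                      String context, long tsMillis) {
    Map<String, String> row = new LinkedHashMap<>();
    row.put("uuid", UUID.randomUUID().toString()); // cf. ERROR_RECORD_UUID
    row.put("ts", Long.toString(tsMillis));         // cf. ERROR_RECORD_TS
    row.put("schema", schema);                      // cf. ERROR_RECORD_SCHEMA
    row.put("record", record);                      // cf. ERROR_RECORD_RECORD
    row.put("message", message);                    // cf. ERROR_RECORD_MESSAGE
    row.put("context", context);                    // cf. ERROR_RECORD_CONTEXT
    return row;
  }

  public static void main(String[] args) {
    System.out.println(errorTableName("trips")); // trips_errors
    Map<String, String> row = errorRow("{\"type\":\"string\"}", "\"bad\"",
        "schema mismatch", "upsert", 0L);
    System.out.println(row.keySet()); // [uuid, ts, schema, record, message, context]
  }
}
```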
[GitHub] [hudi] codecov-commenter edited a comment on pull request #2677: [HUDI-1714] Added tests to TestHoodieTimelineArchiveLog for the archival of compl…

2021-04-18 Thread GitBox


codecov-commenter edited a comment on pull request #2677:
URL: https://github.com/apache/hudi/pull/2677#issuecomment-822060324


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/2677?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#2677](https://codecov.io/gh/apache/hudi/pull/2677?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (a45db97) into 
[master](https://codecov.io/gh/apache/hudi/commit/4e050cc2ba2620d83687be5e5d69dcd747e9f72c?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (4e050cc) will **increase** coverage by `0.01%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2677/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2677?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
   @@ Coverage Diff  @@
   ## master#2677  +/-   ##
   
   + Coverage 52.58%   52.59%   +0.01% 
   - Complexity 3707 3709   +2 
   
 Files   485  485  
 Lines 2322723227  
 Branches   2466 2466  
   
   + Hits  1221312217   +4 
   + Misses 9934 9933   -1 
   + Partials   1080 1077   -3 
   ```
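The report's figures can be reproduced from the raw line counts. Coverage is hits divided by all tracked lines (hits + misses + partials). For master that is 12213 / 23227 ≈ 52.581%, and for the PR it is 12217 / 23227 ≈ 52.598%. The sketch below assumes Codecov's default display settings: two decimal places, rounded down. That assumption explains why the PR shows 52.59% rather than 52.60%. The class and method names are illustrative only.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

/** Illustrative helper reproducing the displayed coverage from raw counts. */
public class CoverageMath {

  // Coverage = hits / (hits + misses + partials), as a percentage,
  // truncated (rounded down) to two decimal places.
  static BigDecimal coverage(long hits, long misses, long partials) {
    long lines = hits + misses + partials;
    return BigDecimal.valueOf(hits * 100)
        .divide(BigDecimal.valueOf(lines), 2, RoundingMode.DOWN);
  }

  public static void main(String[] args) {
    System.out.println(coverage(12213, 9934, 1080)); // 52.58 (master)
    System.out.println(coverage(12217, 9933, 1077)); // 52.59 (PR #2677, all flags)
    System.out.println(coverage(1392, 471, 130));    // 69.84 (hudiutilities only)
  }
}
```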
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `40.29% <ø> (ø)` | `215.00 <ø> (ø)` | |
   | hudiclient | `∅ <ø> (∅)` | `0.00 <ø> (ø)` | |
   | hudicommon | `50.68% <ø> (+0.01%)` | `1976.00 <ø> (ø)` | |
   | hudiflink | `56.51% <ø> (ø)` | `516.00 <ø> (ø)` | |
   | hudihadoopmr | `33.33% <ø> (ø)` | `198.00 <ø> (ø)` | |
   | hudisparkdatasource | `72.06% <ø> (ø)` | `237.00 <ø> (ø)` | |
   | hudisync | `45.70% <ø> (ø)` | `131.00 <ø> (ø)` | |
   | huditimelineservice | `64.36% <ø> (ø)` | `62.00 <ø> (ø)` | |
   | hudiutilities | `69.84% <ø> (+0.10%)` | `374.00 <ø> (+2.00)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2677?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/hudi/pull/2677/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=) | `71.42% <0.00%> (+0.68%)` | `56.00% <0.00%> (+2.00%)` | |
   | [...e/hudi/common/table/log/HoodieLogFormatWriter.java](https://codecov.io/gh/apache/hudi/pull/2677/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL2xvZy9Ib29kaWVMb2dGb3JtYXRXcml0ZXIuamF2YQ==) | `79.68% <0.00%> (+1.56%)` | `26.00% <0.00%> (ø%)` | |
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codecov-commenter edited a comment on pull request #2677: [HUDI-1714] Added tests to TestHoodieTimelineArchiveLog for the archival of compl…

2021-04-18 Thread GitBox


codecov-commenter edited a comment on pull request #2677:
URL: https://github.com/apache/hudi/pull/2677#issuecomment-822060324










[GitHub] [hudi] codecov-commenter edited a comment on pull request #2677: [HUDI-1714] Added tests to TestHoodieTimelineArchiveLog for the archival of compl…

2021-04-18 Thread GitBox


codecov-commenter edited a comment on pull request #2677:
URL: https://github.com/apache/hudi/pull/2677#issuecomment-822060324


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/2677?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#2677](https://codecov.io/gh/apache/hudi/pull/2677?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (a45db97) into 
[master](https://codecov.io/gh/apache/hudi/commit/4e050cc2ba2620d83687be5e5d69dcd747e9f72c?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (4e050cc) will **increase** coverage by `17.26%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2677/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2677?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
   @@  Coverage Diff  @@
   ## master#2677   +/-   ##
   =
   + Coverage 52.58%   69.84%   +17.26% 
   + Complexity 3707  374 - 
   =
 Files   485   54  -431 
 Lines 23227 1993-21234 
 Branches   2466  235 -2231 
   =
   - Hits  12213 1392-10821 
   + Misses 9934  471 -9463 
   + Partials   1080  130  -950 
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `?` | `?` | |
   | hudiclient | `?` | `?` | |
   | hudicommon | `?` | `?` | |
   | hudiflink | `?` | `?` | |
   | hudihadoopmr | `?` | `?` | |
   | hudisparkdatasource | `?` | `?` | |
   | hudisync | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `69.84% <ø> (+0.10%)` | `374.00 <ø> (+2.00)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2677?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...rg/apache/hudi/common/util/SerializationUtils.java](https://codecov.io/gh/apache/hudi/pull/2677/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvU2VyaWFsaXphdGlvblV0aWxzLmphdmE=) | | | |
   | [...he/hudi/sink/partitioner/BucketAssignFunction.java](https://codecov.io/gh/apache/hudi/pull/2677/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL3BhcnRpdGlvbmVyL0J1Y2tldEFzc2lnbkZ1bmN0aW9uLmphdmE=) | | | |
   | [...e/hudi/exception/HoodieFlinkStreamerException.java](https://codecov.io/gh/apache/hudi/pull/2677/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9leGNlcHRpb24vSG9vZGllRmxpbmtTdHJlYW1lckV4Y2VwdGlvbi5qYXZh) | | | |
   | [...hudi/common/fs/inline/InLineFsDataInputStream.java](https://codecov.io/gh/apache/hudi/pull/2677/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL2lubGluZS9JbkxpbmVGc0RhdGFJbnB1dFN0cmVhbS5qYXZh) | | | |
   | [...sioning/clean/CleanMetadataV2MigrationHandler.java](https://codecov.io/gh/apache/hudi/pull/2677/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL3ZlcnNpb25pbmcvY2xlYW4vQ2xlYW5NZXRhZGF0YVYyTWlncmF0aW9uSGFuZGxlci5qYXZh) | | | |
   | [...java/org/apache/hudi/sink/StreamWriteFunction.java](https://codecov.io/gh/apache/hudi/pull/2677/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL1N0cmVhbVdyaXRlRnVuY3Rpb24uamF2YQ==) | | | |
   | 

[GitHub] [hudi] codecov-commenter commented on pull request #2677: [HUDI-1714] Added tests to TestHoodieTimelineArchiveLog for the archival of compl…

2021-04-18 Thread GitBox


codecov-commenter commented on pull request #2677:
URL: https://github.com/apache/hudi/pull/2677#issuecomment-822060324


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/2677?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#2677](https://codecov.io/gh/apache/hudi/pull/2677?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (a45db97) into 
[master](https://codecov.io/gh/apache/hudi/commit/4e050cc2ba2620d83687be5e5d69dcd747e9f72c?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (4e050cc) will **decrease** coverage by `43.19%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2677/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2677?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
   @@ Coverage Diff  @@
   ## master   #2677   +/-   ##
   
   - Coverage 52.58%   9.38%   -43.20% 
   + Complexity 3707  48 -3659 
   
 Files   485  54  -431 
 Lines 232271993-21234 
 Branches   2466 235 -2231 
   
   - Hits  12213 187-12026 
   + Misses 99341793 -8141 
   + Partials   1080  13 -1067 
   ```
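This report compares only 54 of the 485 files. Apparently only the `hudiutilities` flag had been uploaded at that point, so the large swing does not reflect a real regression; the version of this comment with all flags reports +0.01%. The header reads 43.19% while the diff reads -43.20%. One explanation consistent with the numbers is that the two are computed differently:

- The header delta comes from the unrounded percentages: 187 / 1993 ≈ 9.3828% against 12213 / 23227 ≈ 52.5810%, a drop of about 43.198 points, truncated to 43.19.
- The table subtracts the already-truncated values: 9.38 - 52.58 = -43.20.

The sketch below shows both computations and assumes truncation to two decimals. The names are illustrative only.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

/** Illustrative sketch of two ways to compute a coverage delta. */
public class CoverageDelta {

  // Coverage percentage kept at high precision (no display rounding yet).
  static BigDecimal exactPercent(long hits, long lines) {
    return BigDecimal.valueOf(hits * 100)
        .divide(BigDecimal.valueOf(lines), 10, RoundingMode.HALF_EVEN);
  }

  // Delta from the precise percentages, then truncated toward zero to 2 decimals.
  static BigDecimal deltaFromExact(long baseHits, long baseLines, long headHits, long headLines) {
    return exactPercent(headHits, headLines)
        .subtract(exactPercent(baseHits, baseLines))
        .setScale(2, RoundingMode.DOWN);
  }

  // Delta from the already-truncated, displayed percentages.
  static BigDecimal deltaFromDisplayed(long baseHits, long baseLines, long headHits, long headLines) {
    BigDecimal base = exactPercent(baseHits, baseLines).setScale(2, RoundingMode.DOWN);
    BigDecimal head = exactPercent(headHits, headLines).setScale(2, RoundingMode.DOWN);
    return head.subtract(base);
  }

  public static void main(String[] args) {
    // master: 12213 hits of 23227 lines; hudiutilities-only upload: 187 hits of 1993 lines
    System.out.println(deltaFromExact(12213, 23227, 187, 1993));     // -43.19 (header figure)
    System.out.println(deltaFromDisplayed(12213, 23227, 187, 1993)); // -43.20 (diff table figure)
  }
}
```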
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `?` | `?` | |
   | hudiclient | `?` | `?` | |
   | hudicommon | `?` | `?` | |
   | hudiflink | `?` | `?` | |
   | hudihadoopmr | `?` | `?` | |
   | hudisparkdatasource | `?` | `?` | |
   | hudisync | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `9.38% <ø> (-60.37%)` | `48.00 <ø> (-324.00)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2677?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/2677/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | |
   | [...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/2677/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | |
   | [...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/2677/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | [.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/2677/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | [.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/2677/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | 

[jira] [Commented] (HUDI-648) Implement error log/table for Datasource/DeltaStreamer/WriteClient/Compaction writes

2021-04-18 Thread Raymond Xu (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17324600#comment-17324600
 ] 

Raymond Xu commented on HUDI-648:
-

[~vinoth] The PR is [https://github.com/apache/hudi/pull/2710]; somehow it was not linked automatically.

In the RFC we previously discussed putting the error table alongside the 
main table versus having a global error table. I'm also OK with putting it 
alongside the metadata table, so that everything is in one place.

[~liujinhui] would you kindly update the RFC to reflect the latest design, 
given that you've implemented it in a slightly different way?

> Implement error log/table for Datasource/DeltaStreamer/WriteClient/Compaction 
> writes
> 
>
> Key: HUDI-648
> URL: https://issues.apache.org/jira/browse/HUDI-648
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: DeltaStreamer, Spark Integration, Writer Core
>Reporter: Vinoth Chandar
>Assignee: liujinhui
>Priority: Major
>  Labels: sev:normal, user-support-issues
> Attachments: image-2021-03-03-11-40-21-083.png
>
>
> We would like a way to hand the erroring records from writing or compaction 
> back to the users, in a separate table or log. This needs to work generically 
> across all the different writer paths.





[GitHub] [hudi] codecov-commenter edited a comment on pull request #2844: [Hotfix][hudi-sync] Fix typos

2021-04-18 Thread GitBox


codecov-commenter edited a comment on pull request #2844:
URL: https://github.com/apache/hudi/pull/2844#issuecomment-822005123


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/2844?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#2844](https://codecov.io/gh/apache/hudi/pull/2844?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (0e5e732) into 
[master](https://codecov.io/gh/apache/hudi/commit/4e050cc2ba2620d83687be5e5d69dcd747e9f72c?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (4e050cc) will **increase** coverage by `0.00%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2844/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2844?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
   @@Coverage Diff@@
   ## master#2844   +/-   ##
   =
 Coverage 52.58%   52.58%   
   - Complexity 3707 3708+1 
   =
 Files   485  485   
 Lines 2322723227   
 Branches   2466 2466   
   =
   + Hits  1221312214+1 
 Misses 9934 9934   
   + Partials   1080 1079-1 
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `40.29% <ø> (ø)` | `215.00 <ø> (ø)` | |
   | hudiclient | `∅ <ø> (∅)` | `0.00 <ø> (ø)` | |
   | hudicommon | `50.66% <ø> (ø)` | `1976.00 <ø> (ø)` | |
   | hudiflink | `56.51% <ø> (ø)` | `516.00 <ø> (ø)` | |
   | hudihadoopmr | `33.33% <ø> (ø)` | `198.00 <ø> (ø)` | |
   | hudisparkdatasource | `72.06% <ø> (ø)` | `237.00 <ø> (ø)` | |
   | hudisync | `45.70% <ø> (ø)` | `131.00 <ø> (ø)` | |
   | huditimelineservice | `64.36% <ø> (ø)` | `62.00 <ø> (ø)` | |
   | hudiutilities | `69.79% <ø> (+0.05%)` | `373.00 <ø> (+1.00)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2844?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/hudi/pull/2844/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=) | `71.08% <0.00%> (+0.34%)` | `55.00% <0.00%> (+1.00%)` | |
   






[GitHub] [hudi] codecov-commenter edited a comment on pull request #2844: [Hotfix][hudi-sync] Fix typos

2021-04-18 Thread GitBox


codecov-commenter edited a comment on pull request #2844:
URL: https://github.com/apache/hudi/pull/2844#issuecomment-822005123


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/2844?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#2844](https://codecov.io/gh/apache/hudi/pull/2844?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (0e5e732) into 
[master](https://codecov.io/gh/apache/hudi/commit/4e050cc2ba2620d83687be5e5d69dcd747e9f72c?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (4e050cc) will **increase** coverage by `17.21%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2844/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2844?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
   @@  Coverage Diff  @@
   ## master#2844   +/-   ##
   =
   + Coverage 52.58%   69.79%   +17.21% 
   + Complexity 3707  373 -3334 
   =
 Files   485   54  -431 
 Lines 23227 1993-21234 
 Branches   2466  235 -2231 
   =
   - Hits  12213 1391-10822 
   + Misses 9934  471 -9463 
   + Partials   1080  131  -949 
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `?` | `?` | |
   | hudiclient | `?` | `?` | |
   | hudicommon | `?` | `?` | |
   | hudiflink | `?` | `?` | |
   | hudihadoopmr | `?` | `?` | |
   | hudisparkdatasource | `?` | `?` | |
   | hudisync | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `69.79% <ø> (+0.05%)` | `373.00 <ø> (+1.00)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2844?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...ava/org/apache/hudi/exception/HoodieException.java](https://codecov.io/gh/apache/hudi/pull/2844/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZXhjZXB0aW9uL0hvb2RpZUV4Y2VwdGlvbi5qYXZh) | | | |
   | [...a/org/apache/hudi/avro/HoodieAvroWriteSupport.java](https://codecov.io/gh/apache/hudi/pull/2844/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvYXZyby9Ib29kaWVBdnJvV3JpdGVTdXBwb3J0LmphdmE=) | | | |
   | [.../apache/hudi/common/bootstrap/FileStatusUtils.java](https://codecov.io/gh/apache/hudi/pull/2844/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2Jvb3RzdHJhcC9GaWxlU3RhdHVzVXRpbHMuamF2YQ==) | | | |
   | [...he/hudi/common/util/HoodieRecordSizeEstimator.java](https://codecov.io/gh/apache/hudi/pull/2844/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvSG9vZGllUmVjb3JkU2l6ZUVzdGltYXRvci5qYXZh) | | | |
   | [...e/hudi/common/table/timeline/dto/FileSliceDTO.java](https://codecov.io/gh/apache/hudi/pull/2844/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL2R0by9GaWxlU2xpY2VEVE8uamF2YQ==) | | | |
   | [.../versioning/compaction/CompactionPlanMigrator.java](https://codecov.io/gh/apache/hudi/pull/2844/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL3ZlcnNpb25pbmcvY29tcGFjdGlvbi9Db21wYWN0aW9uUGxhbk1pZ3JhdG9yLmphdmE=) | | | |
   | 

[GitHub] [hudi] codecov-commenter commented on pull request #2844: [Hotfix][hudi-sync] Fix typos

2021-04-18 Thread GitBox


codecov-commenter commented on pull request #2844:
URL: https://github.com/apache/hudi/pull/2844#issuecomment-822005123


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/2844?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#2844](https://codecov.io/gh/apache/hudi/pull/2844?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (0e5e732) into 
[master](https://codecov.io/gh/apache/hudi/commit/4e050cc2ba2620d83687be5e5d69dcd747e9f72c?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (4e050cc) will **decrease** coverage by `43.19%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2844/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2844?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
   @@ Coverage Diff  @@
   ## master   #2844   +/-   ##
   
   - Coverage 52.58%   9.38%   -43.20% 
   + Complexity 3707  48 -3659 
   
 Files   485  54  -431 
 Lines 232271993-21234 
 Branches   2466 235 -2231 
   
   - Hits  12213 187-12026 
   + Misses 99341793 -8141 
   + Partials   1080  13 -1067 
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `?` | `?` | |
   | hudiclient | `?` | `?` | |
   | hudicommon | `?` | `?` | |
   | hudiflink | `?` | `?` | |
   | hudihadoopmr | `?` | `?` | |
   | hudisparkdatasource | `?` | `?` | |
   | hudisync | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `9.38% <ø> (-60.37%)` | `48.00 <ø> (-324.00)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2844?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/2844/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | |
   | [...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/2844/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | |
   | [...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/2844/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | [.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/2844/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | [.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/2844/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | 

[GitHub] [hudi] RocMarshal opened a new pull request #2844: [Hotfix][hudi-sync] Fix typos

2021-04-18 Thread GitBox


RocMarshal opened a new pull request #2844:
URL: https://github.com/apache/hudi/pull/2844


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] RocMarshal commented on a change in pull request #2822: [Hotfix][hudi-sync] Refactor method up to parent-class

2021-04-18 Thread GitBox


RocMarshal commented on a change in pull request #2822:
URL: https://github.com/apache/hudi/pull/2822#discussion_r615409612



##
File path: hudi-sync/hudi-sync-common/src/main/java/org/apache/hudi/sync/common/AbstractSyncHoodieClient.java
##
@@ -136,6 +141,42 @@ public MessageType getDataSchema() {
 }
   }
 
+  public abstract static class TypeOptimizer implements Serializable {

Review comment:
   @leesf Of course. I will add tests for it ASAP.








[GitHub] [hudi] lw309637554 commented on pull request #2722: [HUDI-1722]hive beeline/spark-sql query specified field on mor table occur NPE

2021-04-18 Thread GitBox


lw309637554 commented on pull request #2722:
URL: https://github.com/apache/hudi/pull/2722#issuecomment-821957510


   @xiarixiaoyao Thanks for your contribution. Adding the unit test is very necessary. I also left some comments on the resolution.






[GitHub] [hudi] lw309637554 commented on a change in pull request #2722: [HUDI-1722]hive beeline/spark-sql query specified field on mor table occur NPE

2021-04-18 Thread GitBox


lw309637554 commented on a change in pull request #2722:
URL: https://github.com/apache/hudi/pull/2722#discussion_r615366128



##
File path: hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieParquetRealtimeInputFormat.java
##
@@ -85,12 +85,14 @@ void addProjectionToJobConf(final RealtimeSplit realtimeSplit, final JobConf job
 // risk of experiencing race conditions. Hence, we synchronize on the JobConf object here. There is negligible
 // latency incurred here due to the synchronization since get record reader is called once per spilt before the
 // actual heavy lifting of reading the parquet files happen.
-if (jobConf.get(HoodieInputFormatUtils.HOODIE_READ_COLUMNS_PROP) == null) {
+if (jobConf.get(HoodieInputFormatUtils.HOODIE_READ_COLUMNS_PROP) == null
+|| (!realtimeSplit.getDeltaLogPaths().isEmpty() && !HoodieRealtimeInputFormatUtils.requiredProjectionFieldsExistInConf(jobConf))) {
   synchronized (jobConf) {
 LOG.info(
 "Before adding Hoodie columns, Projections :" + jobConf.get(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR)
 + ", Ids :" + jobConf.get(ColumnProjectionUtils.READ_COLUMN_IDS_CONF_STR));
-if (jobConf.get(HoodieInputFormatUtils.HOODIE_READ_COLUMNS_PROP) == null) {
+if (jobConf.get(HoodieInputFormatUtils.HOODIE_READ_COLUMNS_PROP) == null

Review comment:
   Can we isolate the JobConf for each record reader?
   Or just revert https://github.com/apache/hudi/pull/2190/files, i.e. delete the `if (!realtimeSplit.getDeltaLogPaths().isEmpty()) {` check?
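   The race behind this question can be modelled in a few lines. The sketch below is a hypothetical simplification and not Hudi's code. A `Map<String, String>` stands in for `JobConf`, and the property keys are invented. The `hasLogs` flag stands in for `!realtimeSplit.getDeltaLogPaths().isEmpty()`. The sketch also assumes, based on the reference to PR #2190, that the merge columns are only projected for splits that have delta log files. It compares three cases: the pre-patch check on a shared conf, the patched check on a shared conf, and the "isolate the JobConf per record reader" alternative. In Hadoop, a per-reader copy could be made with `new JobConf(jobConf)`.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the HUDI-1722 race; NOT Hudi's implementation.
// A Map<String, String> stands in for Hadoop's JobConf and the keys are invented.
public class ProjectionSketch {
  static final String READ_COLUMNS_SET = "sketch.read.columns.set"; // hypothetical flag key
  static final String PROJECTED = "sketch.projected.columns";       // hypothetical column-list key
  static final String MERGE_FIELD = "_hoodie_record_key";           // field needed to merge log records

  // Stand-in for requiredProjectionFieldsExistInConf(jobConf).
  static boolean requiredFieldsPresent(Map<String, String> conf) {
    String cols = conf.getOrDefault(PROJECTED, "");
    return Arrays.asList(cols.split(",")).contains(MERGE_FIELD);
  }

  // patched == false: pre-patch null check only; patched == true: adds the delta-log clause.
  static boolean needsProjection(Map<String, String> conf, boolean hasLogs, boolean patched) {
    boolean notYetProjected = conf.get(READ_COLUMNS_SET) == null;
    return patched ? notYetProjected || (hasLogs && !requiredFieldsPresent(conf)) : notYetProjected;
  }

  static void addProjection(Map<String, String> conf, boolean hasLogs, boolean patched) {
    if (needsProjection(conf, hasLogs, patched)) {
      synchronized (conf) {
        // Check again under the lock, mirroring the double check in the diff.
        if (needsProjection(conf, hasLogs, patched)) {
          if (hasLogs) {
            // Assumption: merge fields are only projected for splits that have log files.
            String cols = conf.getOrDefault(PROJECTED, "");
            conf.put(PROJECTED, cols.isEmpty() ? MERGE_FIELD : cols + "," + MERGE_FIELD);
          }
          conf.put(READ_COLUMNS_SET, "true");
        }
      }
    }
  }

  // One conf shared by a reader without logs (like fileid0) and then one with logs (like fileid2).
  static boolean sharedConfHasMergeField(boolean patched) {
    Map<String, String> shared = new HashMap<>();
    addProjection(shared, false, patched);
    addProjection(shared, true, patched);
    return requiredFieldsPresent(shared);
  }

  // The "isolate the JobConf" alternative: each reader mutates its own copy of the base conf.
  static boolean isolatedConfHasMergeField() {
    Map<String, String> base = new HashMap<>();
    Map<String, String> readerWithoutLogs = new HashMap<>(base);
    addProjection(readerWithoutLogs, false, false);
    Map<String, String> readerWithLogs = new HashMap<>(base);
    addProjection(readerWithLogs, true, false);
    return requiredFieldsPresent(readerWithLogs);
  }

  public static void main(String[] args) {
    System.out.println("pre-patch check, shared conf: " + sharedConfHasMergeField(false)); // false
    System.out.println("patched check, shared conf:   " + sharedConfHasMergeField(true));  // true
    System.out.println("pre-patch check, per-reader:  " + isolatedConfHasMergeField());    // true
  }
}
```

   In this model, the first reader has no log files, so it sets the flag without projecting the merge field. With the pre-patch null-only check, the second reader then skips projection. Both the patched condition and a per-reader copy avoid this. Copying costs a little memory per reader but removes the need for the `synchronized` block.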








[GitHub] [hudi] lw309637554 commented on a change in pull request #2722: [HUDI-1722]hive beeline/spark-sql query specified field on mor table occur NPE

2021-04-18 Thread GitBox


lw309637554 commented on a change in pull request #2722:
URL: https://github.com/apache/hudi/pull/2722#discussion_r615365836



##
File path: hudi-hadoop-mr/src/test/java/org/apache/hudi/hadoop/functional/TestHoodieCombineHiveInputFormat.java
##
@@ -84,6 +86,73 @@ public void setUp() throws IOException, InterruptedException {
 HoodieTestUtils.init(MiniClusterUtil.configuration, tempDir.toAbsolutePath().toString(), HoodieTableType.MERGE_ON_READ);
   }
 
+  @Test
+  public void testMutilReaderRealtimeComineHoodieInputFormat() throws Exception {
+// test for hudi-1722
+Configuration conf = new Configuration();
+// initial commit
+Schema schema = HoodieAvroUtils.addMetadataFields(SchemaTestUtil.getEvolvedSchema());
+HoodieTestUtils.init(hadoopConf, tempDir.toAbsolutePath().toString(), HoodieTableType.MERGE_ON_READ);
+String commitTime = "100";
+final int numRecords = 1000;
+// Create 3 parquet files with 1000 records each
+File partitionDir = InputFormatTestUtil.prepareParquetTable(tempDir, schema, 3, numRecords, commitTime);
+InputFormatTestUtil.commit(tempDir, commitTime);
+
+String newCommitTime = "101";
+// to trigger the bug of HUDI-1722, only update fileid2
+// insert 1000 update records to log file 2
+// now fileid0, fileid1 has no log files, fileid2 has log file
+HoodieLogFormat.Writer writer =
+InputFormatTestUtil.writeDataBlockToLogFile(partitionDir, fs, schema, "fileid2", commitTime, newCommitTime,
+numRecords, numRecords, 0);
+writer.close();
+
+TableDesc tblDesc = Utilities.defaultTd;
+// Set the input format
+tblDesc.setInputFileFormatClass(HoodieParquetRealtimeInputFormat.class);
+PartitionDesc partDesc = new PartitionDesc(tblDesc, null);
+LinkedHashMap<Path, PartitionDesc> pt = new LinkedHashMap<>();
+LinkedHashMap<Path, ArrayList<String>> tableAlias = new LinkedHashMap<>();
+ArrayList<String> alias = new ArrayList<>();
+alias.add(tempDir.toAbsolutePath().toString());
+tableAlias.put(new Path(tempDir.toAbsolutePath().toString()), alias);
+pt.put(new Path(tempDir.toAbsolutePath().toString()), partDesc);
+
+MapredWork mrwork = new MapredWork();
+mrwork.getMapWork().setPathToPartitionInfo(pt);
+mrwork.getMapWork().setPathToAliases(tableAlias);
+Path mapWorkPath = new Path(tempDir.toAbsolutePath().toString());
+Utilities.setMapRedWork(conf, mrwork, mapWorkPath);
+jobConf = new JobConf(conf);
+// Add the paths
+FileInputFormat.setInputPaths(jobConf, partitionDir.getPath());
+jobConf.set(HAS_MAP_WORK, "true");
+// The following config tells Hive to choose ExecMapper to read the MAP_WORK
+jobConf.set(MAPRED_MAPPER_CLASS, ExecMapper.class.getName());
+// set SPLIT_MAXSIZE larger  to create one split for 3 files groups
+jobConf.set(org.apache.hadoop.mapreduce.lib.input.FileInputFormat.SPLIT_MAXSIZE, "12800");
+
+HoodieCombineHiveInputFormat combineHiveInputFormat = new HoodieCombineHiveInputFormat();
+String tripsHiveColumnTypes = "double,string,string,string,double,double,double,double,double";
+InputFormatTestUtil.setProjectFieldsForInputFormat(jobConf, schema, tripsHiveColumnTypes);
+InputSplit[] splits = combineHiveInputFormat.getSplits(jobConf, 1);
+// Since the SPLIT_SIZE is 3, we should create only 1 split with all 3 file groups
+assertEquals(1, splits.length);
+RecordReader recordReader =

Review comment:
   Hello, it looks like only one RecordReader is created here?








[GitHub] [hudi] codecov-commenter edited a comment on pull request #2843: [HUDI-1804] Continue to write when Flink write task restart because o…

2021-04-18 Thread GitBox


codecov-commenter edited a comment on pull request #2843:
URL: https://github.com/apache/hudi/pull/2843#issuecomment-821748199


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2843?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) Report
   > :exclamation: No coverage uploaded for pull request base (`master@b6d949b`). [Click here to learn what that means](https://docs.codecov.io/docs/error-reference?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#section-missing-base-commit).
   > The diff coverage is `n/a`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2843/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2843?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
   @@            Coverage Diff            @@
   ##             master    #2843   +/-   ##
   =========================================
     Coverage          ?   52.58%
     Complexity        ?     3709
   =========================================
     Files             ?      485
     Lines             ?    23227
     Branches          ?     2466
   =========================================
     Hits              ?    12215
     Misses            ?     9934
     Partials          ?     1078
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `40.29% <ø> (?)` | `215.00 <ø> (?)` | |
   | hudiclient | `∅ <ø> (?)` | `0.00 <ø> (?)` | |
   | hudicommon | `50.66% <ø> (?)` | `1976.00 <ø> (?)` | |
   | hudiflink | `56.51% <ø> (?)` | `516.00 <ø> (?)` | |
   | hudihadoopmr | `33.33% <ø> (?)` | `198.00 <ø> (?)` | |
   | hudisparkdatasource | `72.06% <ø> (?)` | `237.00 <ø> (?)` | |
   | hudisync | `45.70% <ø> (?)` | `131.00 <ø> (?)` | |
   | huditimelineservice | `64.36% <ø> (?)` | `62.00 <ø> (?)` | |
   | hudiutilities | `69.84% <ø> (?)` | `374.00 <ø> (?)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
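   The totals in this report are internally consistent. Hits + Misses + Partials equals Lines (12215 + 9934 + 1078 = 23227), and coverage is Hits / Lines. The exact ratio is about 52.5896%, so the reported 52.58% implies rounding down rather than half-up. The minimal sketch below reproduces the figure under an assumption not stated in this report: that Codecov's default `round: down` and `precision: 2` settings apply. The same formula gives 69.84% for 1392 / 1993.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class CoverageSketch {
  // Coverage = hits / (hits + misses + partials); partially covered lines do not count as hits.
  // Rounds down to two decimals, assuming Codecov's default `round: down`, `precision: 2`.
  static BigDecimal coveragePercent(long hits, long misses, long partials) {
    long lines = hits + misses + partials;
    return BigDecimal.valueOf(hits)
        .multiply(BigDecimal.valueOf(100))
        .divide(BigDecimal.valueOf(lines), 2, RoundingMode.DOWN);
  }

  public static void main(String[] args) {
    System.out.println(coveragePercent(12215, 9934, 1078)); // 52.58
    System.out.println(coveragePercent(1392, 471, 130));    // 69.84
  }
}
```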
   
   







[GitHub] [hudi] codecov-commenter edited a comment on pull request #2843: [HUDI-1804] Continue to write when Flink write task restart because o…

2021-04-18 Thread GitBox


codecov-commenter edited a comment on pull request #2843:
URL: https://github.com/apache/hudi/pull/2843#issuecomment-821748199


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2843?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) Report
   > :exclamation: No coverage uploaded for pull request base (`master@b6d949b`). [Click here to learn what that means](https://docs.codecov.io/docs/error-reference?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#section-missing-base-commit).
   > The diff coverage is `n/a`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2843/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2843?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
   @@            Coverage Diff            @@
   ##             master    #2843   +/-   ##
   =========================================
     Coverage          ?   69.84%
     Complexity        ?      374
   =========================================
     Files             ?       54
     Lines             ?     1993
     Branches          ?      235
   =========================================
     Hits              ?     1392
     Misses            ?      471
     Partials          ?      130
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudiutilities | `69.84% <ø> (?)` | `374.00 <ø> (?)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   

