[jira] [Updated] (HUDI-1810) Azure CI integration test failed for CLI tests
[ https://issues.apache.org/jira/browse/HUDI-1810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raymond Xu updated HUDI-1810:
    Fix Version/s: 0.9.0

> Azure CI integration test failed for CLI tests
>
> Key: HUDI-1810
> URL: https://issues.apache.org/jira/browse/HUDI-1810
> Project: Apache Hudi
> Issue Type: Bug
> Components: CLI, Testing
> Reporter: Raymond Xu
> Priority: Blocker
> Fix For: 0.9.0
>
> CLI job failure
> https://dev.azure.com/xushiyan/apache-hudi-ci/_build/results?buildId=29=logs=d5c42908-5572-5ce6-e4a8-5e2053b947e8=00d56f8a-c99a-5b4c-dcf4-5ca71d997069

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HUDI-1811) Azure CI connection refused error when init HoodieRealtimeRecordReader
Raymond Xu created HUDI-1811:

Summary: Azure CI connection refused error when init HoodieRealtimeRecordReader
Key: HUDI-1811
URL: https://issues.apache.org/jira/browse/HUDI-1811
Project: Apache Hudi
Issue Type: Bug
Components: Testing
Reporter: Raymond Xu
Fix For: 0.9.0

Failed job
https://dev.azure.com/xushiyan/apache-hudi-ci/_build/results?buildId=35=logs=d3721143-1417-5e3d-cf04-c39c0756eab9=a7783b9f-edd1-5bb0-5301-1955fdbfb2c4
[jira] [Updated] (HUDI-1810) Azure CI integration test failed for CLI tests
[ https://issues.apache.org/jira/browse/HUDI-1810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raymond Xu updated HUDI-1810:
    Description: CLI job failure
    https://dev.azure.com/xushiyan/apache-hudi-ci/_build/results?buildId=29=logs=d5c42908-5572-5ce6-e4a8-5e2053b947e8=00d56f8a-c99a-5b4c-dcf4-5ca71d997069

> Azure CI integration test failed for CLI tests
>
> Key: HUDI-1810
> URL: https://issues.apache.org/jira/browse/HUDI-1810
> Project: Apache Hudi
> Issue Type: Bug
> Components: CLI, Testing
> Reporter: Raymond Xu
> Priority: Blocker
>
> CLI job failure
> https://dev.azure.com/xushiyan/apache-hudi-ci/_build/results?buildId=29=logs=d5c42908-5572-5ce6-e4a8-5e2053b947e8=00d56f8a-c99a-5b4c-dcf4-5ca71d997069
[jira] [Created] (HUDI-1810) Azure CI integration test failed for CLI tests
Raymond Xu created HUDI-1810:

Summary: Azure CI integration test failed for CLI tests
Key: HUDI-1810
URL: https://issues.apache.org/jira/browse/HUDI-1810
Project: Apache Hudi
Issue Type: Bug
Components: CLI, Testing
Reporter: Raymond Xu
[jira] [Updated] (HUDI-1251) [UMBRELLA] Migrate CI to azure
[ https://issues.apache.org/jira/browse/HUDI-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinoth Chandar updated HUDI-1251:
    Priority: Blocker (was: Major)

> [UMBRELLA] Migrate CI to azure
>
> Key: HUDI-1251
> URL: https://issues.apache.org/jira/browse/HUDI-1251
> Project: Apache Hudi
> Issue Type: Improvement
> Components: Testing
> Reporter: sivabalan narayanan
> Assignee: sivabalan narayanan
> Priority: Blocker
> Labels: hudi-umbrellas
> Fix For: 0.9.0
>
> Stabilize CI on azure
[jira] [Updated] (HUDI-1251) [UMBRELLA] Migrate CI to azure
[ https://issues.apache.org/jira/browse/HUDI-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinoth Chandar updated HUDI-1251:
    Summary: [UMBRELLA] Migrate CI to azure (was: [UMBRELLA] CI stability)

> [UMBRELLA] Migrate CI to azure
>
> Key: HUDI-1251
> URL: https://issues.apache.org/jira/browse/HUDI-1251
> Project: Apache Hudi
> Issue Type: Improvement
> Components: Testing
> Reporter: sivabalan narayanan
> Assignee: sivabalan narayanan
> Priority: Major
> Labels: hudi-umbrellas
>
> Stabilize CI on azure
[jira] [Updated] (HUDI-1251) [UMBRELLA] Migrate CI to azure
[ https://issues.apache.org/jira/browse/HUDI-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinoth Chandar updated HUDI-1251:
    Fix Version/s: 0.9.0

> [UMBRELLA] Migrate CI to azure
>
> Key: HUDI-1251
> URL: https://issues.apache.org/jira/browse/HUDI-1251
> Project: Apache Hudi
> Issue Type: Improvement
> Components: Testing
> Reporter: sivabalan narayanan
> Assignee: sivabalan narayanan
> Priority: Major
> Labels: hudi-umbrellas
> Fix For: 0.9.0
>
> Stabilize CI on azure
[jira] [Updated] (HUDI-1251) [UMBRELLA] Migrate CI to azure
[ https://issues.apache.org/jira/browse/HUDI-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinoth Chandar updated HUDI-1251:
    Status: Open (was: New)

> [UMBRELLA] Migrate CI to azure
>
> Key: HUDI-1251
> URL: https://issues.apache.org/jira/browse/HUDI-1251
> Project: Apache Hudi
> Issue Type: Improvement
> Components: Testing
> Reporter: sivabalan narayanan
> Assignee: sivabalan narayanan
> Priority: Major
> Labels: hudi-umbrellas
> Fix For: 0.9.0
>
> Stabilize CI on azure
[jira] [Updated] (HUDI-1251) [UMBRELLA] CI stability
[ https://issues.apache.org/jira/browse/HUDI-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinoth Chandar updated HUDI-1251:
    Description: Stabilize CI on azure (was: Stabilize CI and ease debugging of integration test failures.)

> [UMBRELLA] CI stability
>
> Key: HUDI-1251
> URL: https://issues.apache.org/jira/browse/HUDI-1251
> Project: Apache Hudi
> Issue Type: Improvement
> Components: Testing
> Reporter: sivabalan narayanan
> Assignee: sivabalan narayanan
> Priority: Major
> Labels: hudi-umbrellas
>
> Stabilize CI on azure
[jira] [Updated] (HUDI-1251) [UMBRELLA] CI stability
[ https://issues.apache.org/jira/browse/HUDI-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinoth Chandar updated HUDI-1251:
    Summary: [UMBRELLA] CI stability (was: [UMBRELLA] CI stability and debugging integ tests)

> [UMBRELLA] CI stability
>
> Key: HUDI-1251
> URL: https://issues.apache.org/jira/browse/HUDI-1251
> Project: Apache Hudi
> Issue Type: Improvement
> Components: Testing
> Reporter: sivabalan narayanan
> Assignee: sivabalan narayanan
> Priority: Major
> Labels: hudi-umbrellas
>
> Stabilize CI and ease debugging of integration test failures.
[GitHub] [hudi] hudi-bot edited a comment on pull request #2643: DO NOT MERGE (Azure CI) test branch ci
hudi-bot edited a comment on pull request #2643:
URL: https://github.com/apache/hudi/pull/2643#issuecomment-792368481

## CI report:

* 9831a6c50e9f49f8a71c02fc6ac50ae1446f7c1f UNKNOWN
* a569dbe9409910fbb83b3764b300574c0e52612e Azure: [FAILURE](https://dev.azure.com/XUSH0012/0ef433cc-d4b4-47cc-b6a1-03d032ef546c/_build/results?buildId=142)
* e6e9f1f1554a1474dd6c20338215030cad23a2e0 UNKNOWN
* 2a6690a256c8cd8efe9ed2b1984b896fb27ef077 UNKNOWN
* d8b7cca55e057a52a2e229d81e8cb52b60dc275f UNKNOWN
* 3bce301333cc78194d13a702598b46e04fe9f85f UNKNOWN
* f07f345baa450f3fec7eab59caa76b0fbda1e132 UNKNOWN
* 869d2ce3fad330af93c1bb3b576824f519c6e68b UNKNOWN

Bot commands
@hudi-bot supports the following commands:

- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] codecov-commenter edited a comment on pull request #2784: [HUDI-1740] Fix insert-overwrite API archival
codecov-commenter edited a comment on pull request #2784:
URL: https://github.com/apache/hudi/pull/2784#issuecomment-822168210

# [Codecov](https://codecov.io/gh/apache/hudi/pull/2784?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) Report
> Merging [#2784](https://codecov.io/gh/apache/hudi/pull/2784?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) (2327417) into [master](https://codecov.io/gh/apache/hudi/commit/8d29863c86aed57dc8f1a0a450bce3b256de2960?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) (8d29863) will **decrease** coverage by `43.21%`.
> The diff coverage is `n/a`.

[![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2784/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2784?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)

```diff
@@             Coverage Diff              @@
##             master    #2784       +/-   ##
- Coverage     52.60%    9.38%   -43.22%
+ Complexity     3709       48     -3661
  Files           485       54      -431
  Lines         23224     1993    -21231
  Branches       2465      235     -2230
- Hits          12216      187    -12029
+ Misses         9929     1793     -8136
+ Partials       1079       13     -1066
```

| Flag | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| hudicli | `?` | `?` | |
| hudiclient | `?` | `?` | |
| hudicommon | `?` | `?` | |
| hudiflink | `?` | `?` | |
| hudihadoopmr | `?` | `?` | |
| hudisparkdatasource | `?` | `?` | |
| hudisync | `?` | `?` | |
| huditimelineservice | `?` | `?` | |
| hudiutilities | `9.38% <ø> (-60.42%)` | `48.00 <ø> (-325.00)` | |

Flags with carried forward coverage won't be shown.
[Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more. | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2784?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/2784/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | | | [...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/2784/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | | | [...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/2784/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | | | [.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/2784/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh) | `0.00% <0.00%> 
(-100.00%)` | `0.00% <0.00%> (-1.00%)` | | | [.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/2784/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | | |
[GitHub] [hudi] codecov-commenter edited a comment on pull request #2784: [HUDI-1740] Fix insert-overwrite API archival
codecov-commenter edited a comment on pull request #2784:
URL: https://github.com/apache/hudi/pull/2784#issuecomment-822168210

# [Codecov](https://codecov.io/gh/apache/hudi/pull/2784?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) Report
> Merging [#2784](https://codecov.io/gh/apache/hudi/pull/2784?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) (af963ac) into [master](https://codecov.io/gh/apache/hudi/commit/8d29863c86aed57dc8f1a0a450bce3b256de2960?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) (8d29863) will **increase** coverage by `17.14%`.
> The diff coverage is `n/a`.

[![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2784/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2784?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)

```diff
@@              Coverage Diff              @@
##             master    #2784       +/-   ##
=============================================
+ Coverage     52.60%   69.74%   +17.14%
+ Complexity     3709      372     -3337
=============================================
  Files           485       54      -431
  Lines         23224     1993    -21231
  Branches       2465      235     -2230
=============================================
- Hits          12216     1390    -10826
+ Misses         9929      471     -9458
+ Partials       1079      132      -947
```

| Flag | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| hudicli | `?` | `?` | |
| hudiclient | `∅ <ø> (∅)` | `0.00 <ø> (ø)` | |
| hudicommon | `?` | `?` | |
| hudiflink | `?` | `?` | |
| hudihadoopmr | `?` | `?` | |
| hudisparkdatasource | `?` | `?` | |
| hudisync | `?` | `?` | |
| huditimelineservice | `?` | `?` | |
| hudiutilities | `69.74% <ø> (-0.06%)` | `372.00 <ø> (-1.00)` | |

Flags with carried forward coverage won't be shown.
[Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more. | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2784?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/hudi/pull/2784/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=) | `70.74% <0.00%> (-0.35%)` | `54.00% <0.00%> (-1.00%)` | | | [...a/org/apache/hudi/common/model/HoodieBaseFile.java](https://codecov.io/gh/apache/hudi/pull/2784/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0hvb2RpZUJhc2VGaWxlLmphdmE=) | | | | | [...org/apache/hudi/common/util/SpillableMapUtils.java](https://codecov.io/gh/apache/hudi/pull/2784/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvU3BpbGxhYmxlTWFwVXRpbHMuamF2YQ==) | | | | | [...common/model/HoodieFailedWritesCleaningPolicy.java](https://codecov.io/gh/apache/hudi/pull/2784/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0hvb2RpZUZhaWxlZFdyaXRlc0NsZWFuaW5nUG9saWN5LmphdmE=) | | | | | 
[...e/timeline/versioning/clean/CleanPlanMigrator.java](https://codecov.io/gh/apache/hudi/pull/2784/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL3ZlcnNpb25pbmcvY2xlYW4vQ2xlYW5QbGFuTWlncmF0b3IuamF2YQ==) | | | | | [.../apache/hudi/common/util/ObjectSizeCalculator.java](https://codecov.io/gh/apache/hudi/pull/2784/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvT2JqZWN0U2l6ZUNhbGN1bGF0b3IuamF2YQ==) | |
[GitHub] [hudi] hudi-bot edited a comment on pull request #2643: DO NOT MERGE (Azure CI) test branch ci
hudi-bot edited a comment on pull request #2643:
URL: https://github.com/apache/hudi/pull/2643#issuecomment-792368481

## CI report:

* 9831a6c50e9f49f8a71c02fc6ac50ae1446f7c1f UNKNOWN
* a569dbe9409910fbb83b3764b300574c0e52612e Azure: [FAILURE](https://dev.azure.com/XUSH0012/0ef433cc-d4b4-47cc-b6a1-03d032ef546c/_build/results?buildId=142)
* e6e9f1f1554a1474dd6c20338215030cad23a2e0 UNKNOWN
* 2a6690a256c8cd8efe9ed2b1984b896fb27ef077 UNKNOWN
* d8b7cca55e057a52a2e229d81e8cb52b60dc275f UNKNOWN
* 3bce301333cc78194d13a702598b46e04fe9f85f UNKNOWN
* f07f345baa450f3fec7eab59caa76b0fbda1e132 UNKNOWN

Bot commands
@hudi-bot supports the following commands:

- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] codecov-commenter commented on pull request #2784: [HUDI-1740] Fix insert-overwrite API archival
codecov-commenter commented on pull request #2784:
URL: https://github.com/apache/hudi/pull/2784#issuecomment-822168210

# [Codecov](https://codecov.io/gh/apache/hudi/pull/2784?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) Report
> Merging [#2784](https://codecov.io/gh/apache/hudi/pull/2784?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) (af963ac) into [master](https://codecov.io/gh/apache/hudi/commit/8d29863c86aed57dc8f1a0a450bce3b256de2960?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) (8d29863) will **increase** coverage by `17.14%`.
> The diff coverage is `n/a`.

[![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2784/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2784?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)

```diff
@@              Coverage Diff              @@
##             master    #2784       +/-   ##
=============================================
+ Coverage     52.60%   69.74%   +17.14%
+ Complexity     3709      372     -3337
=============================================
  Files           485       54      -431
  Lines         23224     1993    -21231
  Branches       2465      235     -2230
=============================================
- Hits          12216     1390    -10826
+ Misses         9929      471     -9458
+ Partials       1079      132      -947
```

| Flag | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| hudicli | `?` | `?` | |
| hudiclient | `?` | `?` | |
| hudicommon | `?` | `?` | |
| hudiflink | `?` | `?` | |
| hudihadoopmr | `?` | `?` | |
| hudisparkdatasource | `?` | `?` | |
| hudisync | `?` | `?` | |
| huditimelineservice | `?` | `?` | |
| hudiutilities | `69.74% <ø> (-0.06%)` | `372.00 <ø> (-1.00)` | |

Flags with carried forward coverage won't be shown.
[Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more. | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2784?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/hudi/pull/2784/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=) | `70.74% <0.00%> (-0.35%)` | `54.00% <0.00%> (-1.00%)` | | | [...cala/org/apache/hudi/HoodieBootstrapRelation.scala](https://codecov.io/gh/apache/hudi/pull/2784/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL0hvb2RpZUJvb3RzdHJhcFJlbGF0aW9uLnNjYWxh) | | | | | [...i/hadoop/utils/HoodieRealtimeInputFormatUtils.java](https://codecov.io/gh/apache/hudi/pull/2784/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL3V0aWxzL0hvb2RpZVJlYWx0aW1lSW5wdXRGb3JtYXRVdGlscy5qYXZh) | | | | | [.../org/apache/hudi/sink/utils/NonThrownExecutor.java](https://codecov.io/gh/apache/hudi/pull/2784/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL3V0aWxzL05vblRocm93bkV4ZWN1dG9yLmphdmE=) | | | | | 
[...oop/realtime/HoodieParquetRealtimeInputFormat.java](https://codecov.io/gh/apache/hudi/pull/2784/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL3JlYWx0aW1lL0hvb2RpZVBhcnF1ZXRSZWFsdGltZUlucHV0Rm9ybWF0LmphdmE=) | | | | | [...n/scala/org/apache/hudi/HoodieMergeOnReadRDD.scala](https://codecov.io/gh/apache/hudi/pull/2784/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL0hvb2RpZU1lcmdlT25SZWFkUkRELnNjYWxh) | | |
[GitHub] [hudi] ssdong commented on pull request #2784: [HUDI-1740] Fix insert-overwrite API archival
ssdong commented on pull request #2784:
URL: https://github.com/apache/hudi/pull/2784#issuecomment-822161097

@lw309637554 updated! Please take a look and let me know.
[jira] [Created] (HUDI-1809) Flink merge on read input split uses wrong base file path for default merge type
Danny Chen created HUDI-1809:

Summary: Flink merge on read input split uses wrong base file path for default merge type
Key: HUDI-1809
URL: https://issues.apache.org/jira/browse/HUDI-1809
Project: Apache Hudi
Issue Type: Bug
Components: Flink Integration
Reporter: Danny Chen
Assignee: Danny Chen
Fix For: 0.9.0

Should use the base file path instead of the table path.
[GitHub] [hudi] codecov-commenter edited a comment on pull request #2845: [HUDI-1723] Fix path selector listing files with the same mod date
codecov-commenter edited a comment on pull request #2845:
URL: https://github.com/apache/hudi/pull/2845#issuecomment-822143179

# [Codecov](https://codecov.io/gh/apache/hudi/pull/2845?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) Report
> :exclamation: No coverage uploaded for pull request base (`master@4e050cc`). [Click here to learn what that means](https://docs.codecov.io/docs/error-reference?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#section-missing-base-commit).
> The diff coverage is `40.00%`.

[![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2845/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2845?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)

```diff
@@            Coverage Diff            @@
##             master    #2845   +/-   ##
=========================================
  Coverage          ?   52.58%
  Complexity        ?     3708
=========================================
  Files             ?      485
  Lines             ?    23227
  Branches          ?     2466
=========================================
  Hits              ?    12214
  Misses            ?     9934
  Partials          ?     1079
```

| Flag | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| hudicli | `40.29% <ø> (?)` | `215.00 <ø> (?)` | |
| hudiclient | `∅ <ø> (?)` | `0.00 <ø> (?)` | |
| hudicommon | `50.66% <ø> (?)` | `1976.00 <ø> (?)` | |
| hudiflink | `56.51% <ø> (?)` | `516.00 <ø> (?)` | |
| hudihadoopmr | `33.33% <ø> (?)` | `198.00 <ø> (?)` | |
| hudisparkdatasource | `72.06% <ø> (?)` | `237.00 <ø> (?)` | |
| hudisync | `45.70% <ø> (?)` | `131.00 <ø> (?)` | |
| huditimelineservice | `64.36% <ø> (?)` | `62.00 <ø> (?)` | |
| hudiutilities | `69.79% <40.00%> (?)` | `373.00 <0.00> (?)` | |

Flags with carried forward coverage won't be shown.
[Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more. | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2845?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [...ies/sources/helpers/DatePartitionPathSelector.java](https://codecov.io/gh/apache/hudi/pull/2845/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvaGVscGVycy9EYXRlUGFydGl0aW9uUGF0aFNlbGVjdG9yLmphdmE=) | `54.68% <0.00%> (ø)` | `13.00 <0.00> (?)` | | | [...udi/utilities/sources/helpers/DFSPathSelector.java](https://codecov.io/gh/apache/hudi/pull/2845/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvaGVscGVycy9ERlNQYXRoU2VsZWN0b3IuamF2YQ==) | `82.60% <80.00%> (ø)` | `14.00 <0.00> (?)` | | -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] pengzhiwei2018 commented on a change in pull request #2283: [HUDI-1415] Read Hoodie Table As Spark DataSource Table
pengzhiwei2018 commented on a change in pull request #2283:
URL: https://github.com/apache/hudi/pull/2283#discussion_r615528821

## File path: hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DataSourceOptions.scala ##

@@ -353,6 +353,8 @@ object DataSourceWriteOptions {
   val HIVE_IGNORE_EXCEPTIONS_OPT_KEY = "hoodie.datasource.hive_sync.ignore_exceptions"
   val HIVE_SKIP_RO_SUFFIX = "hoodie.datasource.hive_sync.skip_ro_suffix"
   val HIVE_SUPPORT_TIMESTAMP = "hoodie.datasource.hive_sync.support_timestamp"
+  val HIVE_TABLE_PROPERTIES = "hoodie.datasource.hive_sync.table_properties"

Review comment:
Good suggestion! +1 for this!
[GitHub] [hudi] codecov-commenter edited a comment on pull request #2845: [HUDI-1723] Fix path selector listing files with the same mod date
codecov-commenter edited a comment on pull request #2845:
URL: https://github.com/apache/hudi/pull/2845#issuecomment-822143179

# [Codecov](https://codecov.io/gh/apache/hudi/pull/2845?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) Report
> :exclamation: No coverage uploaded for pull request base (`master@4e050cc`). [Click here to learn what that means](https://docs.codecov.io/docs/error-reference?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#section-missing-base-commit).
> The diff coverage is `40.00%`.

[![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2845/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2845?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)

```diff
@@            Coverage Diff            @@
##             master    #2845   +/-   ##
=========================================
  Coverage          ?   69.79%
  Complexity        ?      373
=========================================
  Files             ?       54
  Lines             ?     1993
  Branches          ?      235
=========================================
  Hits              ?     1391
  Misses            ?      471
  Partials          ?      131
```

| Flag | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| hudiutilities | `69.79% <40.00%> (?)` | `373.00 <0.00> (?)` | |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
| [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2845?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [...ies/sources/helpers/DatePartitionPathSelector.java](https://codecov.io/gh/apache/hudi/pull/2845/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvaGVscGVycy9EYXRlUGFydGl0aW9uUGF0aFNlbGVjdG9yLmphdmE=) | `54.68% <0.00%> (ø)` | `13.00 <0.00> (?)` | | | [...udi/utilities/sources/helpers/DFSPathSelector.java](https://codecov.io/gh/apache/hudi/pull/2845/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvaGVscGVycy9ERlNQYXRoU2VsZWN0b3IuamF2YQ==) | `82.60% <80.00%> (ø)` | `14.00 <0.00> (?)` | | -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] pengzhiwei2018 commented on a change in pull request #2283: [HUDI-1415] Read Hoodie Table As Spark DataSource Table
pengzhiwei2018 commented on a change in pull request #2283:
URL: https://github.com/apache/hudi/pull/2283#discussion_r615526012

## File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncConfig.java ##

@@ -88,6 +88,12 @@
   @Parameter(names = {"--verify-metadata-file-listing"}, description = "Verify file listing from Hudi's metadata against file system")
   public Boolean verifyMetadataFileListing = HoodieMetadataConfig.DEFAULT_METADATA_VALIDATE;

+  @Parameter(names = {"--table-properties"}, description = "Table properties to hive table")
+  public String tableProperties;
+
+  @Parameter(names = {"--serde-properties"}, description = "Serde properties to hive table")
+  public String serdeProperties;
+

Review comment:
Yes, thanks for reminding me.
[GitHub] [hudi] pengzhiwei2018 commented on a change in pull request #2283: [HUDI-1415] Read Hoodie Table As Spark DataSource Table
pengzhiwei2018 commented on a change in pull request #2283: URL: https://github.com/apache/hudi/pull/2283#discussion_r615525852 ## File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java ## @@ -164,7 +165,13 @@ private void syncHoodieTable(String tableName, boolean useRealtimeInputFormat) { LOG.info("Storage partitions scan complete. Found " + writtenPartitionsSince.size()); // Sync the partitions if needed syncPartitions(tableName, writtenPartitionsSince); - +// Sync the table properties if need +if (cfg.tableProperties != null) { + Map tableProperties = ConfigUtils.toMap(cfg.tableProperties); + hoodieHiveClient.updateTableProperties(tableName, tableProperties); + LOG.info("Sync table properties for " + tableName + ", table properties is: " + + cfg.tableProperties); +} Review comment: Well, the `tableProperties` may change if the schema has changed, so we need to update the table properties through a separate interface.
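The null-guarded sync in the diff above can be illustrated with a small standalone sketch. This is not the Hudi implementation: `parseProperties`, `TableClient`, and `syncTableProperties` are hypothetical names, and the one-`key=value`-pair-per-line input format is an assumption made only for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class TablePropertiesSketch {

    /** Hypothetical stand-in for the Hive client used in the diff. */
    interface TableClient {
        void updateTableProperties(String tableName, Map<String, String> properties);
    }

    /**
     * Hypothetical parser. Assumes one key=value pair per line; only the
     * first '=' separates key from value, so values may contain '='.
     */
    static Map<String, String> parseProperties(String raw) {
        Map<String, String> result = new LinkedHashMap<>();
        if (raw == null || raw.trim().isEmpty()) {
            return result;
        }
        for (String line : raw.split("\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            int eq = trimmed.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Malformed property: " + trimmed);
            }
            result.put(trimmed.substring(0, eq).trim(), trimmed.substring(eq + 1).trim());
        }
        return result;
    }

    /**
     * Pushes the properties on every sync when they are configured, since
     * they may have changed along with the schema. Returns whether an
     * update was issued.
     */
    static boolean syncTableProperties(TableClient client, String tableName, String raw) {
        if (raw == null) {
            return false; // nothing configured, leave the table untouched
        }
        client.updateTableProperties(tableName, parseProperties(raw));
        return true;
    }
}
```

Keeping the update behind its own method, separate from table creation, matches the point of the review comment: the properties are re-sent on each sync rather than written once.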
[GitHub] [hudi] codecov-commenter commented on pull request #2845: [HUDI-1723] Fix path selector listing files with the same mod date
codecov-commenter commented on pull request #2845: URL: https://github.com/apache/hudi/pull/2845#issuecomment-822143179 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2845?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) Report > :exclamation: No coverage uploaded for pull request base (`master@4e050cc`). [Click here to learn what that means](https://docs.codecov.io/docs/error-reference?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#section-missing-base-commit). > The diff coverage is `0.00%`. [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2845/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2845?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) ```diff @@ Coverage Diff@@ ## master #2845 +/- ## Coverage ? 9.38% Complexity? 48 Files ? 54 Lines ?1993 Branches ? 235 Hits ? 187 Misses?1793 Partials ? 13 ``` | Flag | Coverage Δ | Complexity Δ | | |---|---|---|---| | hudiutilities | `9.38% <0.00%> (?)` | `48.00 <0.00> (?)` | | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more. 
| [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2845?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [...udi/utilities/sources/helpers/DFSPathSelector.java](https://codecov.io/gh/apache/hudi/pull/2845/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvaGVscGVycy9ERlNQYXRoU2VsZWN0b3IuamF2YQ==) | `0.00% <0.00%> (ø)` | `0.00 <0.00> (?)` | | | [...ies/sources/helpers/DatePartitionPathSelector.java](https://codecov.io/gh/apache/hudi/pull/2845/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvaGVscGVycy9EYXRlUGFydGl0aW9uUGF0aFNlbGVjdG9yLmphdmE=) | `0.00% <0.00%> (ø)` | `0.00 <0.00> (?)` | | -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] pengzhiwei2018 commented on a change in pull request #2283: [HUDI-1415] Read Hoodie Table As Spark DataSource Table
pengzhiwei2018 commented on a change in pull request #2283: URL: https://github.com/apache/hudi/pull/2283#discussion_r615522929 ## File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala ## @@ -306,7 +311,10 @@ private[hudi] object HoodieSparkSqlWriter { } finally { writeClient.close() } -val metaSyncSuccess = metaSync(parameters, basePath, jsc.hadoopConfiguration) +val newParameters = + addSqlTableProperties(sqlContext.sparkSession.sessionState.conf, df.schema, parameters) Review comment: Yeah, moving `addSqlTableProperties` into `metaSync` can simplify the logic.
[GitHub] [hudi] lw309637554 commented on pull request #2784: [HUDI-1740] Fix insert-overwrite API archival
lw309637554 commented on pull request #2784: URL: https://github.com/apache/hudi/pull/2784#issuecomment-822141958 > @lw309637554 Thank you for your comments and I've replied. Please take a look and let me know. Replied; the log message can be modified.
[GitHub] [hudi] xushiyan commented on pull request #2845: [HUDI-1723] Fix path selector listing files with the same mod date
xushiyan commented on pull request #2845: URL: https://github.com/apache/hudi/pull/2845#issuecomment-822141919 @vinothchandar This is the short-term fix for the bug. The logic is duplicated because the existing logic had been copied over from one path selector to the other.
[GitHub] [hudi] lw309637554 commented on a change in pull request #2784: [HUDI-1740] Fix insert-overwrite API archival
lw309637554 commented on a change in pull request #2784: URL: https://github.com/apache/hudi/pull/2784#discussion_r615522271 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/utils/MetadataConversionUtils.java ## @@ -105,14 +117,25 @@ public static HoodieArchivedMetaEntry createMetaWrapper(HoodieInstant hoodieInst return archivedMetaWrapper; } - public static HoodieArchivedMetaEntry createMetaWrapper(HoodieInstant hoodieInstant, - HoodieCommitMetadata hoodieCommitMetadata) { -HoodieArchivedMetaEntry archivedMetaWrapper = new HoodieArchivedMetaEntry(); -archivedMetaWrapper.setCommitTime(hoodieInstant.getTimestamp()); -archivedMetaWrapper.setActionState(hoodieInstant.getState().name()); - archivedMetaWrapper.setHoodieCommitMetadata(convertCommitMetadata(hoodieCommitMetadata)); -archivedMetaWrapper.setActionType(ActionType.commit.name()); -return archivedMetaWrapper; + public static Option getInflightReplaceMetadata(HoodieTableMetaClient metaClient, HoodieInstant instant) throws IOException { Review comment: okay -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (HUDI-1723) DFSPathSelector skips files with the same modify date when read up to source limit
[ https://issues.apache.org/jira/browse/HUDI-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-1723: - Labels: pull-request-available sev:critical user-support-issues (was: sev:critical user-support-issues) > DFSPathSelector skips files with the same modify date when read up to source limit > -- > > Key: HUDI-1723 > URL: https://issues.apache.org/jira/browse/HUDI-1723 > Project: Apache Hudi > Issue Type: Bug > Components: DeltaStreamer > Reporter: Raymond Xu > Priority: Blocker > Labels: pull-request-available, sev:critical, user-support-issues > Fix For: 0.9.0 > > Attachments: Screen Shot 2021-03-26 at 1.42.42 AM.png > > > org.apache.hudi.utilities.sources.helpers.DFSPathSelector#listEligibleFiles filters the input files based on the last saved checkpoint, which is the modification date of the last file read. However, several files can share that modification date, so some of them are skipped when reading up to the source limit. An illustration is shown in the attached picture.
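The failure mode described in HUDI-1723 can be reproduced with a small standalone sketch. This is not the code from `DFSPathSelector` or from the fix in the pull request: `FileInfo`, `Selection`, and `select` are hypothetical names, and the `keepSameModTime` switch exists only to contrast the two behaviours. The assumed remedy shown here is to keep taking files that share the last selected modification time even after the byte limit is reached, so that the saved checkpoint never cuts a group of same-timestamp files in half.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

final class PathSelectorSketch {

    /** Hypothetical stand-in for a file status: name, modification time, length in bytes. */
    static final class FileInfo {
        final String name;
        final long modTime;
        final long len;

        FileInfo(String name, long modTime, long len) {
            this.name = name;
            this.modTime = modTime;
            this.len = len;
        }
    }

    /** Files chosen in one round plus the checkpoint to persist for the next round. */
    static final class Selection {
        final List<String> files;
        final long checkpoint;

        Selection(List<String> files, long checkpoint) {
            this.files = files;
            this.checkpoint = checkpoint;
        }
    }

    static Selection select(List<FileInfo> all, long lastCheckpoint, long sourceLimit,
                            boolean keepSameModTime) {
        // Only files strictly newer than the checkpoint are eligible, oldest first.
        List<FileInfo> eligible = all.stream()
                .filter(f -> f.modTime > lastCheckpoint)
                .sorted(Comparator.comparingLong((FileInfo f) -> f.modTime))
                .collect(Collectors.toList());

        List<String> picked = new ArrayList<>();
        long bytes = 0;
        long checkpoint = lastCheckpoint;
        for (FileInfo f : eligible) {
            boolean overLimit = bytes + f.len > sourceLimit;
            boolean sameAsLast = !picked.isEmpty() && f.modTime == checkpoint;
            // Naive variant: stop at the limit. Adjusted variant: stop only once
            // the modification time has moved past the last selected file.
            if (overLimit && !(keepSameModTime && sameAsLast)) {
                break;
            }
            picked.add(f.name);
            bytes += f.len;
            checkpoint = f.modTime;
        }
        return new Selection(picked, checkpoint);
    }
}
```

With files `a` (time 100), `b` and `c` (both time 200), and `d` (time 300), each 10 bytes, and a limit of 20 bytes, the naive variant selects `a` and `b` and saves checkpoint 200; the next round then only sees files newer than 200, so `c` is never read. The adjusted variant selects `a`, `b`, and `c` in the first round at the cost of exceeding the limit by one file.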
[GitHub] [hudi] xushiyan opened a new pull request #2845: [HUDI-1723] Fix path selector listing files with the same mod date
xushiyan opened a new pull request #2845: URL: https://github.com/apache/hudi/pull/2845 For issues described in https://issues.apache.org/jira/browse/HUDI-1723 ## Committer checklist - [ ] Has a corresponding JIRA in PR title & commit - [ ] Commit message is descriptive of the change - [ ] CI is green - [ ] Necessary doc changes done or have another open PR - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] lw309637554 commented on a change in pull request #2784: [HUDI-1740] Fix insert-overwrite API archival
lw309637554 commented on a change in pull request #2784: URL: https://github.com/apache/hudi/pull/2784#discussion_r615520709 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/ReplaceArchivalHelper.java ## @@ -68,6 +68,11 @@ public static boolean deleteReplacedFileGroups(HoodieEngineContext context, HoodieTableMetaClient metaClient, TableFileSystemView fileSystemView, HoodieInstant instant, List replacedPartitions) { +// There is no file id to be replaced in the very first replace commit file for insert overwrite operation +if (replacedPartitions.isEmpty()) { + LOG.warn("Found empty partitionToReplaceFileIds"); Review comment: Yes, "Found no partition files to replace" would be better.
[GitHub] [hudi] codecov-commenter edited a comment on pull request #2645: [HUDI-1659] Basic Implementation Of Spark Sql Support
codecov-commenter edited a comment on pull request #2645: URL: https://github.com/apache/hudi/pull/2645#issuecomment-822128091 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2645?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) Report > Merging [#2645](https://codecov.io/gh/apache/hudi/pull/2645?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) (8304965) into [master](https://codecov.io/gh/apache/hudi/commit/18459d4045ec4a85081c227893b226a4d759f84b?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) (18459d4) will **increase** coverage by `0.46%`. > The diff coverage is `54.75%`. > :exclamation: Current head 8304965 differs from pull request most recent head 20a927c. Consider uploading reports for the commit 20a927c to get more accurate results [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2645/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2645?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) ```diff @@ Coverage Diff @@ ## master#2645 +/- ## + Coverage 52.26% 52.73% +0.46% - Complexity 3682 3822 +140 Files 484 509 +25 Lines 2309424656+1562 Branches 2456 2774 +318 + Hits 1207013002 +932 - Misses 995910380 +421 - Partials 1065 1274 +209 ``` | Flag | Coverage Δ | Complexity Δ | | |---|---|---|---| | hudicli | `40.29% <ø> (+3.35%)` | `215.00 <ø> (+20.00)` | | | hudiclient | `∅ <ø> (∅)` | `0.00 <ø> (ø)` | | | hudicommon | `50.57% <11.11%> (-0.20%)` | `1979.00 <2.00> (+3.00)` | :arrow_down: | | hudiflink | `56.51% <ø> (-0.07%)` | `516.00 <ø> (+2.00)` | :arrow_down: | | hudihadoopmr | `33.33% <ø> (-0.12%)` | `198.00 <ø> (+1.00)` | :arrow_down: | | 
hudisparkdatasource | `65.00% <56.21%> (-6.34%)` | `348.00 <109.00> (+111.00)` | :arrow_down: | | hudisync | `45.62% <0.00%> (+0.15%)` | `131.00 <1.00> (+3.00)` | | | huditimelineservice | `64.36% <ø> (ø)` | `62.00 <ø> (ø)` | | | hudiutilities | `69.79% <ø> (+0.06%)` | `373.00 <ø> (ø)` | | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more. | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2645?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [...rg/apache/hudi/common/table/HoodieTableConfig.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL0hvb2RpZVRhYmxlQ29uZmlnLmphdmE=) | `41.66% <0.00%> (-1.55%)` | `17.00 <0.00> (ø)` | | | [...pache/hudi/common/table/HoodieTableMetaClient.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL0hvb2RpZVRhYmxlTWV0YUNsaWVudC5qYXZh) | `63.35% <0.00%> (-3.31%)` | `43.00 <0.00> (ø)` | | | [.../apache/hudi/common/table/TableSchemaResolver.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL1RhYmxlU2NoZW1hUmVzb2x2ZXIuamF2YQ==) | `0.00% <0.00%> (ø)` | `0.00 <0.00> (ø)` | | | 
[...he/hudi/exception/HoodieDuplicateKeyException.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZXhjZXB0aW9uL0hvb2RpZUR1cGxpY2F0ZUtleUV4Y2VwdGlvbi5qYXZh) | `0.00% <0.00%> (ø)` | `0.00 <0.00> (?)` | | |
[GitHub] [hudi] codecov-commenter edited a comment on pull request #2843: [HUDI-1804] Continue to write when Flink write task restart because o…
codecov-commenter edited a comment on pull request #2843: URL: https://github.com/apache/hudi/pull/2843#issuecomment-821748199 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2843?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) Report > Merging [#2843](https://codecov.io/gh/apache/hudi/pull/2843?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) (6e943cf) into [master](https://codecov.io/gh/apache/hudi/commit/b6d949b48a649acac27d5d9b91677bf2e25e9342?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) (b6d949b) will **decrease** coverage by `0.01%`. > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2843/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2843?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) ```diff @@ Coverage Diff @@ ## master#2843 +/- ## - Coverage 52.60% 52.58% -0.02% + Complexity 3709 3708 -1 Files 485 485 Lines 2322423227 +3 Branches 2465 2466 +1 - Hits 1221612214 -2 - Misses 9929 9934 +5 Partials 1079 1079 ``` | Flag | Coverage Δ | Complexity Δ | | |---|---|---|---| | hudicli | `40.29% <ø> (ø)` | `215.00 <ø> (ø)` | | | hudiclient | `∅ <ø> (∅)` | `0.00 <ø> (ø)` | | | hudicommon | `50.66% <ø> (-0.03%)` | `1976.00 <ø> (-1.00)` | | | hudiflink | `56.51% <ø> (-0.04%)` | `516.00 <ø> (ø)` | | | hudihadoopmr | `33.33% <ø> (ø)` | `198.00 <ø> (ø)` | | | hudisparkdatasource | `72.06% <ø> (ø)` | `237.00 <ø> (ø)` | | | hudisync | `45.70% <ø> (ø)` | `131.00 <ø> (ø)` | | | huditimelineservice | `64.36% <ø> (ø)` | `62.00 <ø> (ø)` | | | hudiutilities | `69.79% <ø> (ø)` | `373.00 <ø> (ø)` | | Flags with carried forward 
coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more. | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2843?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [...ache/hudi/common/fs/inline/InMemoryFileSystem.java](https://codecov.io/gh/apache/hudi/pull/2843/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL2lubGluZS9Jbk1lbW9yeUZpbGVTeXN0ZW0uamF2YQ==) | `79.31% <0.00%> (-10.35%)` | `15.00% <0.00%> (-1.00%)` | | | [.../hudi/table/format/cow/CopyOnWriteInputFormat.java](https://codecov.io/gh/apache/hudi/pull/2843/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS90YWJsZS9mb3JtYXQvY293L0NvcHlPbldyaXRlSW5wdXRGb3JtYXQuamF2YQ==) | `55.33% <0.00%> (-0.75%)` | `20.00% <0.00%> (ø%)` | | | [...java/org/apache/hudi/common/fs/StorageSchemes.java](https://codecov.io/gh/apache/hudi/pull/2843/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL1N0b3JhZ2VTY2hlbWVzLmphdmE=) | `100.00% <0.00%> (ø)` | `10.00% <0.00%> (ø%)` | | -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] codecov-commenter edited a comment on pull request #2844: [MINOR][hudi-sync] Fix typos
codecov-commenter edited a comment on pull request #2844: URL: https://github.com/apache/hudi/pull/2844#issuecomment-822005123 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2844?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) Report > :exclamation: No coverage uploaded for pull request base (`master@4e050cc`). [Click here to learn what that means](https://docs.codecov.io/docs/error-reference?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#section-missing-base-commit). > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2844/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2844?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) ```diff @@Coverage Diff@@ ## master#2844 +/- ## = Coverage ? 52.61% Complexity? 3710 = Files ? 485 Lines ?23227 Branches ? 2466 = Hits ?12220 Misses? 9930 Partials ? 1077 ``` | Flag | Coverage Δ | Complexity Δ | | |---|---|---|---| | hudicli | `40.29% <ø> (?)` | `215.00 <ø> (?)` | | | hudiclient | `∅ <ø> (?)` | `0.00 <ø> (?)` | | | hudicommon | `50.71% <ø> (?)` | `1977.00 <ø> (?)` | | | hudiflink | `56.51% <ø> (?)` | `516.00 <ø> (?)` | | | hudihadoopmr | `33.33% <ø> (?)` | `198.00 <ø> (?)` | | | hudisparkdatasource | `72.06% <ø> (?)` | `237.00 <ø> (?)` | | | hudisync | `45.70% <ø> (?)` | `131.00 <ø> (?)` | | | huditimelineservice | `64.36% <ø> (?)` | `62.00 <ø> (?)` | | | hudiutilities | `69.84% <ø> (?)` | `374.00 <ø> (?)` | | Flags with carried forward coverage won't be shown. 
[Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] xiarixiaoyao edited a comment on pull request #2722: [HUDI-1722]hive beeline/spark-sql query specified field on mor table occur NPE
xiarixiaoyao edited a comment on pull request #2722: URL: https://github.com/apache/hudi/pull/2722#issuecomment-822129270 @lw309637554 Thanks for your review. I have answered your questions; please check them, thanks. Another question: TestHoodieCombineHiveInputFormat.testHoodieRealtimeCombineHoodieInputFormat is disabled by default. I have checked that test function and found some problems in it. Could I fix those problems and enable TestHoodieCombineHiveInputFormat.testHoodieRealtimeCombineHoodieInputFormat by default?
[GitHub] [hudi] codecov-commenter edited a comment on pull request #2645: [HUDI-1659] Basic Implementation Of Spark Sql Support
codecov-commenter edited a comment on pull request #2645: URL: https://github.com/apache/hudi/pull/2645#issuecomment-822128091 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2645?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) Report > Merging [#2645](https://codecov.io/gh/apache/hudi/pull/2645?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) (8304965) into [master](https://codecov.io/gh/apache/hudi/commit/18459d4045ec4a85081c227893b226a4d759f84b?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) (18459d4) will **increase** coverage by `17.52%`. > The diff coverage is `n/a`. > :exclamation: Current head 8304965 differs from pull request most recent head 20a927c. Consider uploading reports for the commit 20a927c to get more accurate results [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2645/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2645?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) ```diff @@ Coverage Diff @@ ## master#2645 +/- ## = + Coverage 52.26% 69.79% +17.52% + Complexity 3682 373 -3309 = Files 484 54 -430 Lines 23094 1993-21101 Branches 2456 235 -2221 = - Hits 12070 1391-10679 + Misses 9959 471 -9488 + Partials 1065 131 -934 ``` | Flag | Coverage Δ | Complexity Δ | | |---|---|---|---| | hudicli | `?` | `?` | | | hudiclient | `?` | `?` | | | hudicommon | `?` | `?` | | | hudiflink | `?` | `?` | | | hudihadoopmr | `?` | `?` | | | hudisparkdatasource | `?` | `?` | | | hudisync | `?` | `?` | | | huditimelineservice | `?` | `?` | | | hudiutilities | `69.79% <ø> (+0.06%)` | `373.00 <ø> (ø)` | | Flags with carried forward coverage won't be 
shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more. | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2645?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [...s/deltastreamer/HoodieMultiTableDeltaStreamer.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvSG9vZGllTXVsdGlUYWJsZURlbHRhU3RyZWFtZXIuamF2YQ==) | `78.39% <0.00%> (ø)` | `18.00% <0.00%> (ø%)` | | | [...ache/hudi/common/util/collection/DiskBasedMap.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvY29sbGVjdGlvbi9EaXNrQmFzZWRNYXAuamF2YQ==) | | | | | [...org/apache/hudi/cli/commands/BootstrapCommand.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpL2NvbW1hbmRzL0Jvb3RzdHJhcENvbW1hbmQuamF2YQ==) | | | | | [...i/hadoop/utils/HoodieRealtimeInputFormatUtils.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL3V0aWxzL0hvb2RpZVJlYWx0aW1lSW5wdXRGb3JtYXRVdGlscy5qYXZh) | | | | | 
[...rg/apache/hudi/metadata/MetadataPartitionType.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0YWRhdGEvTWV0YWRhdGFQYXJ0aXRpb25UeXBlLmphdmE=) | | | | |
[GitHub] [hudi] codecov-commenter edited a comment on pull request #2844: [MINOR][hudi-sync] Fix typos
codecov-commenter edited a comment on pull request #2844: URL: https://github.com/apache/hudi/pull/2844#issuecomment-822005123
[GitHub] [hudi] xiarixiaoyao commented on a change in pull request #2722: [HUDI-1722]hive beeline/spark-sql query specified field on mor table occur NPE
xiarixiaoyao commented on a change in pull request #2722: URL: https://github.com/apache/hudi/pull/2722#discussion_r615508116 ## File path: hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieParquetRealtimeInputFormat.java ## @@ -85,12 +85,14 @@ void addProjectionToJobConf(final RealtimeSplit realtimeSplit, final JobConf job // risk of experiencing race conditions. Hence, we synchronize on the JobConf object here. There is negligible // latency incurred here due to the synchronization since get record reader is called once per spilt before the // actual heavy lifting of reading the parquet files happen. -if (jobConf.get(HoodieInputFormatUtils.HOODIE_READ_COLUMNS_PROP) == null) { +if (jobConf.get(HoodieInputFormatUtils.HOODIE_READ_COLUMNS_PROP) == null +|| (!realtimeSplit.getDeltaLogPaths().isEmpty() && !HoodieRealtimeInputFormatUtils.requiredProjectionFieldsExistInConf(jobConf))) { synchronized (jobConf) { LOG.info( "Before adding Hoodie columns, Projections :" + jobConf.get(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR) + ", Ids :" + jobConf.get(ColumnProjectionUtils.READ_COLUMN_IDS_CONF_STR)); -if (jobConf.get(HoodieInputFormatUtils.HOODIE_READ_COLUMNS_PROP) == null) { +if (jobConf.get(HoodieInputFormatUtils.HOODIE_READ_COLUMNS_PROP) == null Review comment: Cloning the Configuration object can be very expensive. To avoid unexpected performance regressions for workloads, we should not isolate the JobConf per record reader; I also agree with reverting https://github.com/apache/hudi/pull/2190/files. However, if the current query does not involve any log files, adding the additional Hoodie projection columns leads to unnecessary I/O, because those additional columns are then scanned as well.
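The widened guard in the diff above can be sketched in isolation. This is only a model under stated assumptions: a plain `Map` stands in for `JobConf`, the two keys and the `REQUIRED` column list are hypothetical, and what happens inside the guard (add the extra columns only for splits with log files, always set the flag) is an assumption based on the review comment, not the actual Hudi implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class ProjectionGuardSketch {
    // Hypothetical keys standing in for HOODIE_READ_COLUMNS_PROP and READ_COLUMN_NAMES_CONF_STR.
    static final String COLUMNS_SET_FLAG = "sketch.hoodie.read.columns.set";
    static final String READ_COLUMN_NAMES = "sketch.read.column.names";
    // Illustrative set of extra columns a log-merging reader might need (assumption).
    static final List<String> REQUIRED = Arrays.asList("_hoodie_record_key", "_hoodie_commit_time");

    // True when every required column is already part of the projection.
    static boolean requiredFieldsExist(Map<String, String> conf) {
        List<String> names = Arrays.asList(conf.getOrDefault(READ_COLUMN_NAMES, "").split(","));
        return names.containsAll(REQUIRED);
    }

    // Same shape as the condition in the diff: flag missing, or log files present but columns missing.
    static boolean needsUpdate(Map<String, String> conf, boolean hasLogFiles) {
        return conf.get(COLUMNS_SET_FLAG) == null || (hasLogFiles && !requiredFieldsExist(conf));
    }

    static void addProjection(Map<String, String> conf, boolean hasLogFiles) {
        if (!needsUpdate(conf, hasLogFiles)) {
            return; // cheap check outside the lock
        }
        synchronized (conf) {
            if (!needsUpdate(conf, hasLogFiles)) {
                return; // another reader updated the shared conf in the meantime
            }
            if (hasLogFiles) {
                // Only splits with log files pay for the extra columns (assumed behavior).
                List<String> names = new ArrayList<>();
                String current = conf.getOrDefault(READ_COLUMN_NAMES, "");
                if (!current.isEmpty()) {
                    names.addAll(Arrays.asList(current.split(",")));
                }
                for (String field : REQUIRED) {
                    if (!names.contains(field)) {
                        names.add(field);
                    }
                }
                conf.put(READ_COLUMN_NAMES, String.join(",", names));
            }
            conf.put(COLUMNS_SET_FLAG, "true");
        }
    }
}
```

With this shape, a base-file-only reader that runs first leaves the projection untouched, and a later reader with log files can still add the columns it needs, which is the trade-off the comment argues for.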
[GitHub] [hudi] xiarixiaoyao edited a comment on pull request #2722: [HUDI-1722]hive beeline/spark-sql query specified field on mor table occur NPE
xiarixiaoyao edited a comment on pull request #2722: URL: https://github.com/apache/hudi/pull/2722#issuecomment-822129270 @lw309637554 thanks for your review. I left comments on your questions. Another question: TestHoodieCombineHiveInputFormat.testHoodieRealtimeCombineHoodieInputFormat is disabled by default. I have checked that test function and found some problems in it. Could I fix those problems and enable TestHoodieCombineHiveInputFormat.testHoodieRealtimeCombineHoodieInputFormat by default?
[GitHub] [hudi] ssdong commented on pull request #2784: [HUDI-1740] Fix insert-overwrite API archival
ssdong commented on pull request #2784: URL: https://github.com/apache/hudi/pull/2784#issuecomment-822130380 @lw309637554 Thank you for your comments and I've replied. Please take a look and let me know. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] ssdong commented on a change in pull request #2784: [HUDI-1740] Fix insert-overwrite API archival
ssdong commented on a change in pull request #2784: URL: https://github.com/apache/hudi/pull/2784#discussion_r615512372 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/utils/MetadataConversionUtils.java ## @@ -105,14 +117,25 @@ public static HoodieArchivedMetaEntry createMetaWrapper(HoodieInstant hoodieInst return archivedMetaWrapper; } - public static HoodieArchivedMetaEntry createMetaWrapper(HoodieInstant hoodieInstant, - HoodieCommitMetadata hoodieCommitMetadata) { -HoodieArchivedMetaEntry archivedMetaWrapper = new HoodieArchivedMetaEntry(); -archivedMetaWrapper.setCommitTime(hoodieInstant.getTimestamp()); -archivedMetaWrapper.setActionState(hoodieInstant.getState().name()); - archivedMetaWrapper.setHoodieCommitMetadata(convertCommitMetadata(hoodieCommitMetadata)); -archivedMetaWrapper.setActionType(ActionType.commit.name()); -return archivedMetaWrapper; + public static Option getInflightReplaceMetadata(HoodieTableMetaClient metaClient, HoodieInstant instant) throws IOException { Review comment: Hmm, I don't think this method is relevant to clustering. They are solely responsible for retrieving the corresponding commit files and parsing them, which fits more naturally in a "meta conversion" file such as `MetadataConversionUtils.java`. In fact, I've gotten rid of the dependency upon `ClusteringUtils` in `MetadataConversionUtils.java` after this refactoring.
[GitHub] [hudi] ssdong commented on a change in pull request #2784: [HUDI-1740] Fix insert-overwrite API archival
ssdong commented on a change in pull request #2784: URL: https://github.com/apache/hudi/pull/2784#discussion_r615512419 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/utils/MetadataConversionUtils.java ## @@ -105,14 +117,25 @@ public static HoodieArchivedMetaEntry createMetaWrapper(HoodieInstant hoodieInst return archivedMetaWrapper; } - public static HoodieArchivedMetaEntry createMetaWrapper(HoodieInstant hoodieInstant, - HoodieCommitMetadata hoodieCommitMetadata) { -HoodieArchivedMetaEntry archivedMetaWrapper = new HoodieArchivedMetaEntry(); -archivedMetaWrapper.setCommitTime(hoodieInstant.getTimestamp()); -archivedMetaWrapper.setActionState(hoodieInstant.getState().name()); - archivedMetaWrapper.setHoodieCommitMetadata(convertCommitMetadata(hoodieCommitMetadata)); -archivedMetaWrapper.setActionType(ActionType.commit.name()); -return archivedMetaWrapper; + public static Option getInflightReplaceMetadata(HoodieTableMetaClient metaClient, HoodieInstant instant) throws IOException { +Option inflightContent = metaClient.getActiveTimeline().getInstantDetails(instant); +if (!inflightContent.isPresent() || inflightContent.get().length == 0) { + // inflight files can be empty in some certain cases, e.g. when users opt in clustering + return Option.empty(); +} +return Option.of(HoodieCommitMetadata.fromBytes(inflightContent.get(), HoodieCommitMetadata.class)); + } + + public static Option getRequestedReplaceMetadata(HoodieTableMetaClient metaClient, HoodieInstant instant) throws IOException { Review comment: Same as above -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
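The empty-content handling in `getInflightReplaceMetadata` above follows a common pattern: treat a missing or zero-length payload as "no metadata" rather than trying to parse it. A minimal sketch of that pattern, assuming `java.util.Optional` in place of Hudi's `Option` and a UTF-8 string in place of `HoodieCommitMetadata` (both substitutions are for illustration only):

```java
import java.nio.charset.StandardCharsets;
import java.util.Optional;

final class InflightMetadataSketch {
    // Returns the decoded payload, or empty when the file is absent or has no bytes.
    static Optional<String> parseInflight(Optional<byte[]> content) {
        if (!content.isPresent() || content.get().length == 0) {
            // An inflight file may legitimately be empty, so this is not an error.
            return Optional.empty();
        }
        return Optional.of(new String(content.get(), StandardCharsets.UTF_8));
    }
}
```

Callers then branch on `isPresent()` once, instead of guarding against both a missing file and an unparsable empty one.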
[GitHub] [hudi] xiarixiaoyao commented on a change in pull request #2722: [HUDI-1722]hive beeline/spark-sql query specified field on mor table occur NPE
xiarixiaoyao commented on a change in pull request #2722: URL: https://github.com/apache/hudi/pull/2722#discussion_r615510606 ## File path: hudi-hadoop-mr/src/test/java/org/apache/hudi/hadoop/functional/TestHoodieCombineHiveInputFormat.java ## @@ -84,6 +86,73 @@ public void setUp() throws IOException, InterruptedException { HoodieTestUtils.init(MiniClusterUtil.configuration, tempDir.toAbsolutePath().toString(), HoodieTableType.MERGE_ON_READ); } + @Test + public void testMutilReaderRealtimeComineHoodieInputFormat() throws Exception { +// test for hudi-1722 +Configuration conf = new Configuration(); +// initial commit +Schema schema = HoodieAvroUtils.addMetadataFields(SchemaTestUtil.getEvolvedSchema()); +HoodieTestUtils.init(hadoopConf, tempDir.toAbsolutePath().toString(), HoodieTableType.MERGE_ON_READ); +String commitTime = "100"; +final int numRecords = 1000; +// Create 3 parquet files with 1000 records each +File partitionDir = InputFormatTestUtil.prepareParquetTable(tempDir, schema, 3, numRecords, commitTime); +InputFormatTestUtil.commit(tempDir, commitTime); + +String newCommitTime = "101"; +// to trigger the bug of HUDI-1772, only update fileid2 +// insert 1000 update records to log file 2 +// now fileid0, fileid1 has no log files, fileid2 has log file +HoodieLogFormat.Writer writer = +InputFormatTestUtil.writeDataBlockToLogFile(partitionDir, fs, schema, "fileid2", commitTime, newCommitTime, +numRecords, numRecords, 0); +writer.close(); + +TableDesc tblDesc = Utilities.defaultTd; +// Set the input format +tblDesc.setInputFileFormatClass(HoodieParquetRealtimeInputFormat.class); +PartitionDesc partDesc = new PartitionDesc(tblDesc, null); +LinkedHashMap pt = new LinkedHashMap<>(); +LinkedHashMap> tableAlias = new LinkedHashMap<>(); +ArrayList alias = new ArrayList<>(); +alias.add(tempDir.toAbsolutePath().toString()); +tableAlias.put(new Path(tempDir.toAbsolutePath().toString()), alias); +pt.put(new Path(tempDir.toAbsolutePath().toString()), partDesc); + 
+MapredWork mrwork = new MapredWork(); +mrwork.getMapWork().setPathToPartitionInfo(pt); +mrwork.getMapWork().setPathToAliases(tableAlias); +Path mapWorkPath = new Path(tempDir.toAbsolutePath().toString()); +Utilities.setMapRedWork(conf, mrwork, mapWorkPath); +jobConf = new JobConf(conf); +// Add the paths +FileInputFormat.setInputPaths(jobConf, partitionDir.getPath()); +jobConf.set(HAS_MAP_WORK, "true"); +// The following config tells Hive to choose ExecMapper to read the MAP_WORK +jobConf.set(MAPRED_MAPPER_CLASS, ExecMapper.class.getName()); +// set SPLIT_MAXSIZE larger to create one split for 3 files groups + jobConf.set(org.apache.hadoop.mapreduce.lib.input.FileInputFormat.SPLIT_MAXSIZE, "12800"); + +HoodieCombineHiveInputFormat combineHiveInputFormat = new HoodieCombineHiveInputFormat(); +String tripsHiveColumnTypes = "double,string,string,string,double,double,double,double,double"; +InputFormatTestUtil.setProjectFieldsForInputFormat(jobConf, schema, tripsHiveColumnTypes); +InputSplit[] splits = combineHiveInputFormat.getSplits(jobConf, 1); +// Since the SPLIT_SIZE is 3, we should create only 1 split with all 3 file groups +assertEquals(1, splits.length); +RecordReader recordReader = Review comment: Yes, we only create one combine record reader, but it holds three RealtimeCompactedRecordReaders. The order in which those RealtimeCompactedRecordReaders are created leads to this NPE. For the test example, the combine record reader holds three RealtimeCompactedRecordReaders; call them creader1, creader2 and creader3. creader1: only has a base file. creader2: only has a base file. creader3: has a base file and a log file. If creader3 is created first, the additional Hoodie projection columns are added to the jobConf and the query succeeds. However, if creader1 or creader2 is created first, no additional Hoodie projection columns are added to the jobConf and the query fails.
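The order dependence described in this comment can be reproduced with a small model. The sketch below is not Hudi code: a `HashMap` stands in for the shared `JobConf`, the keys are hypothetical, and `initReader` models the pre-fix guard as the comment describes it (only the first reader to run gets to touch the projection).

```java
import java.util.HashMap;
import java.util.Map;

final class ReaderOrderSketch {
    static final String FLAG = "sketch.columns.set"; // hypothetical "already initialised" marker
    static final String COLUMNS = "sketch.columns";  // hypothetical projected-column list

    // Pre-fix style guard (assumed): once the flag is set, later readers change nothing.
    static void initReader(Map<String, String> conf, boolean hasLogFile) {
        if (conf.get(FLAG) == null) {
            if (hasLogFile) {
                conf.put(COLUMNS, conf.get(COLUMNS) + ",_hoodie_record_key");
            }
            conf.put(FLAG, "true");
        }
    }

    // Creates readers in the given order against one shared conf and reports
    // whether the column needed for merging log records ended up projected.
    static boolean mergeColumnProjected(boolean[] hasLogFileInCreationOrder) {
        Map<String, String> conf = new HashMap<>();
        conf.put(COLUMNS, "rider");
        for (boolean hasLogFile : hasLogFileInCreationOrder) {
            initReader(conf, hasLogFile);
        }
        return conf.get(COLUMNS).contains("_hoodie_record_key");
    }
}
```

Creating the log-file reader first (`{true, false, false}`) projects the column; creating it last (`{false, false, true}`) does not, which corresponds to the failing case in the comment.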
[GitHub] [hudi] xiarixiaoyao commented on pull request #2722: [HUDI-1722]hive beeline/spark-sql query specified field on mor table occur NPE
xiarixiaoyao commented on pull request #2722: URL: https://github.com/apache/hudi/pull/2722#issuecomment-822129270 @lw309637554 thanks for your review. I left comments on your questions. Another question: TestHoodieCombineHiveInputFormat.testHoodieRealtimeCombineHoodieInputFormat is disabled by default. I have checked that test function and found some problems in it. Could I fix those problems and enable TestHoodieCombineHiveInputFormat.testHoodieRealtimeCombineHoodieInputFormat?
[GitHub] [hudi] ssdong commented on a change in pull request #2784: [HUDI-1740] Fix insert-overwrite API archival
ssdong commented on a change in pull request #2784: URL: https://github.com/apache/hudi/pull/2784#discussion_r615510834 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/ReplaceArchivalHelper.java ## @@ -68,6 +68,11 @@ public static boolean deleteReplacedFileGroups(HoodieEngineContext context, HoodieTableMetaClient metaClient, TableFileSystemView fileSystemView, HoodieInstant instant, List replacedPartitions) { +// There is no file id to be replaced in the very first replace commit file for insert overwrite operation +if (replacedPartitions.isEmpty()) { + LOG.warn("Found empty partitionToReplaceFileIds"); Review comment: `partitionToReplaceFileIds` is the field name directly taken from the replace commit file. I guess it is subject to change. What about just explaining the warning as `Found no partition files to replace`? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] codecov-commenter commented on pull request #2645: [HUDI-1659] Basic Implementation Of Spark Sql Support
codecov-commenter commented on pull request #2645: URL: https://github.com/apache/hudi/pull/2645#issuecomment-822128091

# [Codecov](https://codecov.io/gh/apache/hudi/pull/2645?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) Report

> Merging [#2645](https://codecov.io/gh/apache/hudi/pull/2645?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) (8304965) into [master](https://codecov.io/gh/apache/hudi/commit/18459d4045ec4a85081c227893b226a4d759f84b?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) (18459d4) will **decrease** coverage by `42.88%`.
> The diff coverage is `n/a`.

> :exclamation: Current head 8304965 differs from pull request most recent head 20a927c. Consider uploading reports for the commit 20a927c to get more accurate results

[![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2645/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2645?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)

```diff
@@             Coverage Diff             @@
##            master    #2645      +/-   ##
- Coverage     52.26%    9.38%   -42.89%
+ Complexity     3682       48     -3634
  Files           484       54      -430
  Lines         23094     1993    -21101
  Branches       2456      235     -2221
- Hits          12070      187    -11883
+ Misses         9959     1793     -8166
+ Partials       1065       13     -1052
```

| Flag | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| hudicli | `?` | `?` | |
| hudiclient | `?` | `?` | |
| hudicommon | `?` | `?` | |
| hudiflink | `?` | `?` | |
| hudihadoopmr | `?` | `?` | |
| hudisparkdatasource | `?` | `?` | |
| hudisync | `?` | `?` | |
| huditimelineservice | `?` | `?` | |
| hudiutilities | `9.38% <ø> (-60.35%)` | `48.00 <ø> (-325.00)` | |

Flags with carried forward coverage won't be shown.
[Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more. | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2645?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | | | [...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | | | [...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | | | [.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh) | `0.00% <0.00%> 
(-100.00%)` | `0.00% <0.00%> (-1.00%)` | | | [.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | | |
[GitHub] [hudi] pengzhiwei2018 commented on a change in pull request #2645: [HUDI-1659] Basic Implementation Of Spark Sql Support
pengzhiwei2018 commented on a change in pull request #2645: URL: https://github.com/apache/hudi/pull/2645#discussion_r615508923 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java ## @@ -444,7 +448,9 @@ private void writeToBuffer(HoodieRecord record) { } Option indexedRecord = getIndexedRecord(record); if (indexedRecord.isPresent()) { - recordList.add(indexedRecord.get()); + if (indexedRecord.get() != IGNORE_RECORD) { // Skip the Ignore Record. Review comment: Fixed~ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
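The `IGNORE_RECORD` check in the diff above is a sentinel comparison: one shared marker object is compared by identity so that marked records are dropped before buffering. A minimal sketch of the idea, with a hypothetical sentinel and plain `Object` records in place of Hudi's types, and `java.util.Optional` in place of Hudi's `Option`:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

final class IgnoreRecordSketch {
    // Hypothetical sentinel; identity (not equals) decides whether a record is skipped.
    static final Object IGNORE_RECORD = new Object();

    // Buffers every present record except the sentinel.
    static List<Object> buffer(List<Optional<Object>> incoming) {
        List<Object> recordList = new ArrayList<>();
        for (Optional<Object> record : incoming) {
            if (record.isPresent() && record.get() != IGNORE_RECORD) {
                recordList.add(record.get());
            }
        }
        return recordList;
    }
}
```

Comparing with `!=` is deliberate here: a sentinel is meaningful only as that exact instance, so an `equals`-based check would be both slower and potentially wrong.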
[GitHub] [hudi] ssdong commented on a change in pull request #2784: [HUDI-1740] Fix insert-overwrite API archival
ssdong commented on a change in pull request #2784: URL: https://github.com/apache/hudi/pull/2784#discussion_r615508860 ## File path: hudi-common/src/test/java/org/apache/hudi/common/testutils/HoodieTestTable.java ## @@ -177,10 +161,14 @@ public HoodieTestTable addDeltaCommit(String instantTime) throws Exception { return this; } - public HoodieTestTable addReplaceCommit(String instantTime, HoodieRequestedReplaceMetadata requestedReplaceMetadata, HoodieReplaceCommitMetadata metadata) throws Exception { + public HoodieTestTable addReplaceCommit( + String instantTime, + HoodieRequestedReplaceMetadata requestedReplaceMetadata, + HoodieReplaceCommitMetadata completeReplaceMetadata, + HoodieCommitMetadata inflightReplaceMetadata) throws Exception { Review comment: Hmm, if you track it down to `createInflightReplaceCommit`, where `HoodieCommitMetadata` is referenced, a `null` check is performed there. However, I do agree that an `Optional` reminds people to do a null check, though the `o.isPresent()` check is _hardly_ any better than `o != null`. I could make the change and also update `createRequestedReplaceCommit` to take an Optional `HoodieRequestedReplaceMetadata` in the same way.
[GitHub] [hudi] codecov-commenter edited a comment on pull request #2843: [HUDI-1804] Continue to write when Flink write task restart because o…
codecov-commenter edited a comment on pull request #2843: URL: https://github.com/apache/hudi/pull/2843#issuecomment-821748199

# [Codecov](https://codecov.io/gh/apache/hudi/pull/2843?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) Report

> Merging [#2843](https://codecov.io/gh/apache/hudi/pull/2843?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) (6e943cf) into [master](https://codecov.io/gh/apache/hudi/commit/b6d949b48a649acac27d5d9b91677bf2e25e9342?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) (b6d949b) will **decrease** coverage by `43.21%`.
> The diff coverage is `n/a`.

[![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2843/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2843?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)

```diff
@@             Coverage Diff             @@
##            master    #2843      +/-   ##
- Coverage     52.60%    9.38%   -43.22%
+ Complexity     3709       48     -3661
  Files           485       54      -431
  Lines         23224     1993    -21231
  Branches       2465      235     -2230
- Hits          12216      187    -12029
+ Misses         9929     1793     -8136
+ Partials       1079       13     -1066
```

| Flag | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| hudicli | `?` | `?` | |
| hudiclient | `?` | `?` | |
| hudicommon | `?` | `?` | |
| hudiflink | `?` | `?` | |
| hudihadoopmr | `?` | `?` | |
| hudisparkdatasource | `?` | `?` | |
| hudisync | `?` | `?` | |
| huditimelineservice | `?` | `?` | |
| hudiutilities | `9.38% <ø> (-60.42%)` | `48.00 <ø> (-325.00)` | |

Flags with carried forward coverage won't be shown.
[Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more. | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2843?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/2843/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | | | [...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/2843/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | | | [...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/2843/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | | | [.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/2843/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh) | `0.00% <0.00%> 
(-100.00%)` | `0.00% <0.00%> (-1.00%)` | | | [.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/2843/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | | |
[GitHub] [hudi] pengzhiwei2018 commented on pull request #2645: [HUDI-1659] Basic Implementation Of Spark Sql Support
pengzhiwei2018 commented on pull request #2645: URL: https://github.com/apache/hudi/pull/2645#issuecomment-822125975 > @pengzhiwei2018 one more question, will we introduce Catalog to manage table operations in the future? Yes, I agree with introducing a Catalog to manage table operations for Spark 3 in the future. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] pengzhiwei2018 commented on a change in pull request #2645: [HUDI-1659] Basic Implementation Of Spark Sql Support
pengzhiwei2018 commented on a change in pull request #2645: URL: https://github.com/apache/hudi/pull/2645#discussion_r615508220
## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java
@@ -444,7 +448,9 @@ private void writeToBuffer(HoodieRecord record) {
     }
     Option indexedRecord = getIndexedRecord(record);
     if (indexedRecord.isPresent()) {
-      recordList.add(indexedRecord.get());
+      if (indexedRecord.get() != IGNORE_RECORD) { // Skip the Ignore Record.
Review comment: > @pengzhiwei2018 one more question, will we introduce Catalog to manage table operations in the future? Yes, I agree with introducing a Catalog to manage table operations for Spark 3 in the future.
[GitHub] [hudi] xiarixiaoyao commented on a change in pull request #2722: [HUDI-1722]hive beeline/spark-sql query specified field on mor table occur NPE
xiarixiaoyao commented on a change in pull request #2722: URL: https://github.com/apache/hudi/pull/2722#discussion_r615508116
## File path: hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieParquetRealtimeInputFormat.java
@@ -85,12 +85,14 @@ void addProjectionToJobConf(final RealtimeSplit realtimeSplit, final JobConf job
     // risk of experiencing race conditions. Hence, we synchronize on the JobConf object here. There is negligible
     // latency incurred here due to the synchronization since get record reader is called once per spilt before the
     // actual heavy lifting of reading the parquet files happen.
-    if (jobConf.get(HoodieInputFormatUtils.HOODIE_READ_COLUMNS_PROP) == null) {
+    if (jobConf.get(HoodieInputFormatUtils.HOODIE_READ_COLUMNS_PROP) == null
+        || (!realtimeSplit.getDeltaLogPaths().isEmpty() && !HoodieRealtimeInputFormatUtils.requiredProjectionFieldsExistInConf(jobConf))) {
       synchronized (jobConf) {
         LOG.info(
             "Before adding Hoodie columns, Projections :" + jobConf.get(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR)
                 + ", Ids :" + jobConf.get(ColumnProjectionUtils.READ_COLUMN_IDS_CONF_STR));
-        if (jobConf.get(HoodieInputFormatUtils.HOODIE_READ_COLUMNS_PROP) == null) {
+        if (jobConf.get(HoodieInputFormatUtils.HOODIE_READ_COLUMNS_PROP) == null
Review comment: Cloning the Configuration object can be very expensive. To avoid unexpected performance regressions for workloads, we should not isolate the JobConf for each record reader. I also agree with reverting https://github.com/apache/hudi/pull/2190/files
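The hunk quoted in this review follows a check, lock, re-check pattern on a configuration object shared by several record readers. A minimal sketch of that pattern is below. It assumes a plain `java.util.Properties` as a stand-in for Hadoop's `JobConf`, and the class name and the key `demo.read.columns` are hypothetical; it is not the code from the pull request.

```java
import java.util.Properties;

/** Sketch only: Properties stands in for Hadoop's JobConf; the key name is hypothetical. */
final class ProjectionGuardSketch {
    static final String READ_COLUMNS_PROP = "demo.read.columns"; // hypothetical key

    /**
     * Sets the projection at most once, even when several readers share the same conf.
     * Returns true if this call was the one that set it.
     */
    static boolean addProjectionOnce(Properties conf, String columns) {
        // Unsynchronized check first, so the common "already set" path takes no lock.
        if (conf.getProperty(READ_COLUMNS_PROP) == null) {
            synchronized (conf) {
                // Re-check under the lock: another thread may have set it in the meantime.
                if (conf.getProperty(READ_COLUMNS_PROP) == null) {
                    conf.setProperty(READ_COLUMNS_PROP, columns);
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(addProjectionOnce(conf, "id,ts"));
        System.out.println(addProjectionOnce(conf, "other"));
        System.out.println(conf.getProperty(READ_COLUMNS_PROP));
    }
}
```

Sharing one object and locking only on first use avoids the per-reader clone that the review comment calls expensive, at the cost of every reader seeing the same mutated configuration.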
[GitHub] [hudi] pengzhiwei2018 commented on a change in pull request #2645: [HUDI-1659] Basic Implementation Of Spark Sql Support
pengzhiwei2018 commented on a change in pull request #2645: URL: https://github.com/apache/hudi/pull/2645#discussion_r615507564
## File path: hudi-common/src/main/java/org/apache/hudi/common/model/DefaultHoodieRecordPayload.java
@@ -97,4 +86,22 @@ public DefaultHoodieRecordPayload(Option record) {
     }
     return metadata.isEmpty() ? Option.empty() : Option.of(metadata);
   }
+
+  protected boolean needUpdatePersistedRecord(IndexedRecord currentValue,
+      IndexedRecord incomingRecord, Properties properties) {
+    /*
+     * Combining strategy here returns currentValue on disk if incoming record is older.
+     * The incoming record can be either a delete (sent as an upsert with _hoodie_is_deleted set to true)
+     * or an insert/update record. In any case, if it is older than the record in disk, the currentValue
+     * in disk is returned (to be rewritten with new commit time).
+     *
+     * NOTE: Deletes sent via EmptyHoodieRecordPayload and/or Delete operation type do not hit this code path
Review comment: Yes, it is used by HoodieMergeHandle. Here I just moved the original code into `needUpdatePersistedRecord`, which can be used by subclasses of `DefaultHoodieRecordPayload`, e.g. `ExpressionPayload`. It is just a code refactor.
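The combining strategy described in the quoted code comment (keep the value on disk when the incoming record is older) can be sketched as follows. `Rec` and its `orderingVal` field are hypothetical stand-ins for an Avro record with an ordering field, and letting ties go to the incoming record is an assumption of this sketch, not something the message states.

```java
/** Sketch only: Rec is a hypothetical stand-in for an Avro IndexedRecord with an ordering field. */
final class OrderingMergeSketch {

    static final class Rec {
        final String key;
        final long orderingVal; // hypothetical ordering field, e.g. an event timestamp
        final String data;

        Rec(String key, long orderingVal, String data) {
            this.key = key;
            this.orderingVal = orderingVal;
            this.data = data;
        }
    }

    /** True when the incoming record is not older than the one on disk (assumption: ties favor incoming). */
    static boolean needUpdatePersistedRecord(Rec currentValue, Rec incomingRecord) {
        return incomingRecord.orderingVal >= currentValue.orderingVal;
    }

    /** Picks the record that should be written out for this key. */
    static Rec combine(Rec currentValue, Rec incomingRecord) {
        return needUpdatePersistedRecord(currentValue, incomingRecord) ? incomingRecord : currentValue;
    }

    public static void main(String[] args) {
        Rec onDisk = new Rec("k1", 100L, "v-disk");
        System.out.println(combine(onDisk, new Rec("k1", 90L, "v-late")).data);
        System.out.println(combine(onDisk, new Rec("k1", 120L, "v-new")).data);
    }
}
```

Pulling the comparison into its own method, as the refactor in this review does, lets a subclass reuse the ordering check while changing what happens to the chosen record.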
[GitHub] [hudi] lw309637554 commented on pull request #2199: [HUDI-1264][WIP] spark incremental read support with replace
lw309637554 commented on pull request #2199: URL: https://github.com/apache/hudi/pull/2199#issuecomment-822124457 > @satishkotha Is this PR still valid ? @lw309637554 Can you please rebase this PR so we can get this landed. @n3nash @satishkotha I think the solution in this PR is not very good. It only filters the commits between the latest replace commit and the end commit. By comparison, HoodieParquetRealtimeInputFormat uses fsView.getLatestMergedFileSlicesBeforeOrOn to keep only the file slices that have not been replaced. Should we change the Spark incremental relation to use fsView.getLatestMergedFileSlicesBeforeOrOn as well?
[jira] [Commented] (HUDI-1138) Re-implement marker files via timeline server
[ https://issues.apache.org/jira/browse/HUDI-1138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17324649#comment-17324649 ] liwei commented on HUDI-1138: - [~uditme] [~vinoth] I also think listing is a place where performance can be improved. In cloud storage such as S3 and Alibaba Cloud OSS, list operations are expensive and slow. Regarding the P.S. in the description ("I was tempted to think Spark listener mechanism can help us deal with failed tasks, but it has no guarantees. The writer job could die without deleting a partial file, i.e. it can improve things, but can't provide guarantees"): can we use it and delete the residue files in clean?
> Re-implement marker files via timeline server
> -
>
> Key: HUDI-1138
> URL: https://issues.apache.org/jira/browse/HUDI-1138
> Project: Apache Hudi
> Issue Type: Improvement
> Components: Writer Core
> Affects Versions: 0.9.0
> Reporter: Vinoth Chandar
> Priority: Blocker
> Fix For: 0.9.0
>
> Even though you can argue that RFC-15/consolidated metadata removes the need for deleting partial files written due to Spark task failures/stage retries, it will still leave extra files inside the table (and users will pay for it every month), and we need the marker mechanism to be able to delete these partial files.
> Here we explore if we can improve the current marker file mechanism, which creates one marker file per data file written, by delegating the createMarker() call to the driver/timeline server and having it write marker metadata into a single file handle that is flushed for durability guarantees.
>
> P.S: I was tempted to think Spark listener mechanism can help us deal with failed tasks, but it has no guarantees. The writer job could die without deleting a partial file, i.e. it can improve things, but can't provide guarantees.
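The idea in this issue (route every createMarker() call to one place that appends to a single handle and flushes) might look roughly like the sketch below. The class `MarkerLogSketch`, its one-path-per-line format, and the use of a generic `java.io.Writer` are all assumptions made for illustration; the issue does not specify a file format or an API.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

/** Sketch only: a hypothetical single-handle marker log, one data file path per line. */
final class MarkerLogSketch {
    private final Writer out;
    private int count;

    MarkerLogSketch(Writer out) {
        this.out = out;
    }

    /** Appends one marker entry and flushes before returning, so callers only proceed after the write. */
    synchronized void createMarker(String dataFilePath) {
        try {
            out.write(dataFilePath);
            out.write('\n');
            // flush() pushes buffered data to the underlying stream; a real
            // implementation would also need the storage layer to persist it.
            out.flush();
            count++;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    synchronized int markerCount() {
        return count;
    }

    public static void main(String[] args) {
        StringWriter sink = new StringWriter();
        MarkerLogSketch log = new MarkerLogSketch(sink);
        log.createMarker("2021/04/19/file-1.parquet");
        log.createMarker("2021/04/19/file-2.parquet");
        System.out.print(sink);
    }
}
```

Compared with one marker file per data file, finding the markers becomes a single read instead of a directory listing, which is the cost the comment above raises for S3 and OSS.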
[GitHub] [hudi] RocMarshal closed pull request #2844: [Hotfix][hudi-sync] Fix typos
RocMarshal closed pull request #2844: URL: https://github.com/apache/hudi/pull/2844
[jira] [Updated] (HUDI-897) hudi support log append scenario with better write and asynchronous compaction
[ https://issues.apache.org/jira/browse/HUDI-897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liwei updated HUDI-897: --- Status: In Progress (was: Open)
> hudi support log append scenario with better write and asynchronous compaction
> --
>
> Key: HUDI-897
> URL: https://issues.apache.org/jira/browse/HUDI-897
> Project: Apache Hudi
> Issue Type: Improvement
> Components: Compaction, Performance
> Affects Versions: 0.9.0
> Reporter: liwei
> Assignee: liwei
> Priority: Major
> Fix For: 0.9.0
>
> Attachments: image-2020-05-14-19-51-37-938.png, image-2020-05-14-20-14-59-429.png
>
> 1. Scenario
> The business scenarios of the data lake mainly include analysis of databases, logs, and files.
> !image-2020-05-14-20-14-59-429.png|width=444,height=286!
> Databricks Delta Lake also aims at these three scenarios. [1]
>
> 2. Hudi's current situation
> At present, Hudi supports well the scenario where database CDC is incrementally written to Hudi, and bulk-loading files into Hudi is also being worked on. However, there is no good native support for log scenarios (which require high-throughput writes, have no updates or deletes, and focus on small files). Today such data can be written through inserts without deduplication, but the inserts will still merge on the write side.
> * In copy-on-write mode with "hoodie.parquet.small.file.limit" at 100MB, every small batch spends time on merging, which reduces write throughput.
> * This scenario is not suitable for merge-on-read.
> * The actual scenario only needs to write Parquet files in batches, and then provide reverse compaction (similar to Delta Lake).
>
> 3. What we can do
> 1. On the write side, just write every batch to a Parquet file based on the snapshot mechanism. Merging is on by default; users can turn off the auto merge for more write throughput.
> 2. Hudi supports asynchronously merging small Parquet files, like Databricks Delta Lake's OPTIMIZE command. [2]
>
> [1] [https://databricks.com/product/delta-lake-on-databricks]
> [2] [https://docs.databricks.com/delta/optimizations/file-mgmt.html]
[jira] [Commented] (HUDI-897) hudi support log append scenario with better write and asynchronous compaction
[ https://issues.apache.org/jira/browse/HUDI-897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17324647#comment-17324647 ] liwei commented on HUDI-897: okay
[jira] [Resolved] (HUDI-897) hudi support log append scenario with better write and asynchronous compaction
[ https://issues.apache.org/jira/browse/HUDI-897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liwei resolved HUDI-897. Resolution: Fixed
[GitHub] [hudi] lw309637554 commented on pull request #2784: [HUDI-1740] Fix insert-overwrite API archival
lw309637554 commented on pull request #2784: URL: https://github.com/apache/hudi/pull/2784#issuecomment-822117646 @zherenyu831 @ssdong Thanks for your contribution. I left some minor comments.
[GitHub] [hudi] lw309637554 commented on a change in pull request #2784: [HUDI-1740] Fix insert-overwrite API archival
lw309637554 commented on a change in pull request #2784: URL: https://github.com/apache/hudi/pull/2784#discussion_r615500929
## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/utils/MetadataConversionUtils.java
@@ -105,14 +117,25 @@ public static HoodieArchivedMetaEntry createMetaWrapper(HoodieInstant hoodieInst
     return archivedMetaWrapper;
   }
-  public static HoodieArchivedMetaEntry createMetaWrapper(HoodieInstant hoodieInstant,
-      HoodieCommitMetadata hoodieCommitMetadata) {
-    HoodieArchivedMetaEntry archivedMetaWrapper = new HoodieArchivedMetaEntry();
-    archivedMetaWrapper.setCommitTime(hoodieInstant.getTimestamp());
-    archivedMetaWrapper.setActionState(hoodieInstant.getState().name());
-    archivedMetaWrapper.setHoodieCommitMetadata(convertCommitMetadata(hoodieCommitMetadata));
-    archivedMetaWrapper.setActionType(ActionType.commit.name());
-    return archivedMetaWrapper;
+  public static Option getInflightReplaceMetadata(HoodieTableMetaClient metaClient, HoodieInstant instant) throws IOException {
Review comment: Can we move this to ClusteringUtils?
## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/utils/MetadataConversionUtils.java
@@ -105,14 +117,25 @@ public static HoodieArchivedMetaEntry createMetaWrapper(HoodieInstant hoodieInst
     return archivedMetaWrapper;
   }
-  public static HoodieArchivedMetaEntry createMetaWrapper(HoodieInstant hoodieInstant,
-      HoodieCommitMetadata hoodieCommitMetadata) {
-    HoodieArchivedMetaEntry archivedMetaWrapper = new HoodieArchivedMetaEntry();
-    archivedMetaWrapper.setCommitTime(hoodieInstant.getTimestamp());
-    archivedMetaWrapper.setActionState(hoodieInstant.getState().name());
-    archivedMetaWrapper.setHoodieCommitMetadata(convertCommitMetadata(hoodieCommitMetadata));
-    archivedMetaWrapper.setActionType(ActionType.commit.name());
-    return archivedMetaWrapper;
+  public static Option getInflightReplaceMetadata(HoodieTableMetaClient metaClient, HoodieInstant instant) throws IOException {
+    Option inflightContent = metaClient.getActiveTimeline().getInstantDetails(instant);
+    if (!inflightContent.isPresent() || inflightContent.get().length == 0) {
+      // inflight files can be empty in some certain cases, e.g. when users opt in clustering
+      return Option.empty();
+    }
+    return Option.of(HoodieCommitMetadata.fromBytes(inflightContent.get(), HoodieCommitMetadata.class));
+  }
+
+  public static Option getRequestedReplaceMetadata(HoodieTableMetaClient metaClient, HoodieInstant instant) throws IOException {
Review comment: Can we move this to ClusteringUtils?
[GitHub] [hudi] lw309637554 commented on a change in pull request #2784: [HUDI-1740] Fix insert-overwrite API archival
lw309637554 commented on a change in pull request #2784: URL: https://github.com/apache/hudi/pull/2784#discussion_r615499401
## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/ReplaceArchivalHelper.java
@@ -68,6 +68,11 @@ public static boolean deleteReplacedFileGroups(HoodieEngineContext context, HoodieTableMetaClient metaClient,
     TableFileSystemView fileSystemView,
     HoodieInstant instant, List replacedPartitions) {
+    // There is no file id to be replaced in the very first replace commit file for insert overwrite operation
+    if (replacedPartitions.isEmpty()) {
+      LOG.warn("Found empty partitionToReplaceFileIds");
Review comment: Would it be better to say replacedPartitions instead of partitionToReplaceFileIds in this message?
[GitHub] [hudi] lw309637554 commented on a change in pull request #2784: [HUDI-1740] Fix insert-overwrite API archival
lw309637554 commented on a change in pull request #2784: URL: https://github.com/apache/hudi/pull/2784#discussion_r615499171
## File path: hudi-common/src/test/java/org/apache/hudi/common/testutils/HoodieTestTable.java
@@ -177,10 +161,14 @@ public HoodieTestTable addDeltaCommit(String instantTime) throws Exception {
     return this;
   }
-  public HoodieTestTable addReplaceCommit(String instantTime, HoodieRequestedReplaceMetadata requestedReplaceMetadata, HoodieReplaceCommitMetadata metadata) throws Exception {
+  public HoodieTestTable addReplaceCommit(
+      String instantTime,
+      HoodieRequestedReplaceMetadata requestedReplaceMetadata,
+      HoodieReplaceCommitMetadata completeReplaceMetadata,
+      HoodieCommitMetadata inflightReplaceMetadata) throws Exception {
Review comment: Can we change the HoodieCommitMetadata parameter to Option<HoodieCommitMetadata> to avoid passing null?
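The suggestion in this review (accept an optional value rather than a nullable one) can be illustrated with a small sketch. It uses `java.util.Optional` as a stand-in for Hudi's own `Option` type, and the class, method, and string payload are hypothetical; the real test helper takes metadata objects, not strings.

```java
import java.util.Optional;

/** Sketch only: java.util.Optional stands in for Hudi's Option; names here are hypothetical. */
final class ReplaceCommitSketch {

    /** Describes what would be written for the inflight file, without any null checks at the call site. */
    static String describeInflight(String instantTime, Optional<String> inflightMetadata) {
        return inflightMetadata
            .map(m -> instantTime + ": inflight file with metadata " + m)
            .orElse(instantTime + ": empty inflight file");
    }

    public static void main(String[] args) {
        System.out.println(describeInflight("001", Optional.of("{\"operationType\":\"INSERT_OVERWRITE\"}")));
        System.out.println(describeInflight("002", Optional.empty()));
    }
}
```

With this signature the "no inflight metadata" case is visible in the type, so a caller cannot forget it the way it could forget a null check.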
[GitHub] [hudi] ngonik commented on issue #1679: [HUDI-1609] How to disable Hive JDBC and enable metastore
ngonik commented on issue #1679: URL: https://github.com/apache/hudi/issues/1679#issuecomment-822089338 Hey, I'm having the same issue with JSONException on EMR as mentioned above. Is there any update on that? Anything I can help with to make it work? Thanks!
[jira] [Updated] (HUDI-648) Implement error log/table for Datasource/DeltaStreamer/WriteClient/Compaction writes
[ https://issues.apache.org/jira/browse/HUDI-648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-648: Labels: pull-request-available sev:normal user-support-issues (was: sev:normal user-support-issues)
> Implement error log/table for Datasource/DeltaStreamer/WriteClient/Compaction writes
> 
>
> Key: HUDI-648
> URL: https://issues.apache.org/jira/browse/HUDI-648
> Project: Apache Hudi
> Issue Type: New Feature
> Components: DeltaStreamer, Spark Integration, Writer Core
> Reporter: Vinoth Chandar
> Assignee: liujinhui
> Priority: Major
> Labels: pull-request-available, sev:normal, user-support-issues
> Attachments: image-2021-03-03-11-40-21-083.png
>
> We would like a way to hand the erroring records from writing or compaction back to the users, in a separate table or log. This needs to work generically across all the different writer paths.
[GitHub] [hudi] xushiyan commented on a change in pull request #2710: [HUDI-648][RFC-20] Implement error log/table for Datasource/DeltaStreamer/WriteClient/Compaction writes
xushiyan commented on a change in pull request #2710: URL: https://github.com/apache/hudi/pull/2710#discussion_r615463905
## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/error/HoodieBackedErrorTableWriter.java
@@ -0,0 +1,247 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.error;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.hudi.avro.HoodieAvroUtils;
+import org.apache.hudi.client.WriteStatus;
+import org.apache.hudi.common.config.HoodieErrorTableConfig;
+import org.apache.hudi.common.config.SerializableConfiguration;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieCleaningPolicy;
+import org.apache.hudi.common.model.HoodieRecord;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.model.HoodieTableType;
+import org.apache.hudi.common.model.HoodieFileFormat;
+import org.apache.hudi.common.model.HoodieRecordLocation;
+import org.apache.hudi.common.model.OverwriteWithLatestAvroSchemaPayload;
+import org.apache.hudi.common.model.HoodieKey;
+import org.apache.hudi.common.model.HoodieAvroPayload;
+import org.apache.hudi.common.model.OverwriteWithLatestAvroPayload;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.config.HoodieCompactionConfig;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.joda.time.DateTime;
+import org.joda.time.DateTimeZone;
+
+import java.io.IOException;
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.UUID;
+
+import static org.apache.hudi.common.config.HoodieErrorTableConfig.ERROR_COMMIT_TIME_METADATA_FIELD;
+import static org.apache.hudi.common.config.HoodieErrorTableConfig.ERROR_RECORD_KEY_METADATA_FIELD;
+import static org.apache.hudi.common.config.HoodieErrorTableConfig.ERROR_PARTITION_PATH_METADATA_FIELD;
+import static org.apache.hudi.common.config.HoodieErrorTableConfig.ERROR_FILE_ID_FIELD;
+import static org.apache.hudi.common.config.HoodieErrorTableConfig.ERROR_TABLE_NAME;
+import static org.apache.hudi.common.config.HoodieErrorTableConfig.ERROR_RECORD_UUID;
+import static org.apache.hudi.common.config.HoodieErrorTableConfig.ERROR_RECORD_TS;
+import static org.apache.hudi.common.config.HoodieErrorTableConfig.ERROR_RECORD_SCHEMA;
+import static org.apache.hudi.common.config.HoodieErrorTableConfig.ERROR_RECORD_RECORD;
+import static org.apache.hudi.common.config.HoodieErrorTableConfig.ERROR_RECORD_MESSAGE;
+import static org.apache.hudi.common.config.HoodieErrorTableConfig.ERROR_RECORD_CONTEXT;
+
+/**
+ * Writer implementation backed by an internal hudi table. Error records are saved within an internal COW table
+ * called Error table.
+ */
+public abstract class HoodieBackedErrorTableWriter implements Serializable {
+
+  private static final Logger LOG = LogManager.getLogger(HoodieBackedErrorTableWriter.class);
+
+  protected HoodieWriteConfig errorTableWriteConfig;
+  protected HoodieWriteConfig datasetWriteConfig;
+  protected String tableName;
+
+  protected HoodieTableMetaClient metaClient;
+  protected SerializableConfiguration hadoopConf;
+  protected final transient HoodieEngineContext engineContext;
+  protected String basePath;
+
+  protected HoodieBackedErrorTableWriter(Configuration hadoopConf, HoodieWriteConfig writeConfig, HoodieEngineContext engineContext) {
+    this.datasetWriteConfig = writeConfig;
+    this.engineContext = engineContext;
+    this.hadoopConf = new SerializableConfiguration(hadoopConf);
+
+    if (writeConfig.errorTableEnabled()) {
+      this.tableName = writeConfig.getTableName() + HoodieErrorTableConfig.ERROR_TABLE_NAME_SUFFIX;
+      this.basePath =
[GitHub] [hudi] codecov-commenter edited a comment on pull request #2677: [HUDI-1714] Added tests to TestHoodieTimelineArchiveLog for the archival of compl…
codecov-commenter edited a comment on pull request #2677: URL: https://github.com/apache/hudi/pull/2677#issuecomment-822060324

# [Codecov](https://codecov.io/gh/apache/hudi/pull/2677?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) Report
> Merging [#2677](https://codecov.io/gh/apache/hudi/pull/2677?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) (a45db97) into [master](https://codecov.io/gh/apache/hudi/commit/4e050cc2ba2620d83687be5e5d69dcd747e9f72c?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) (4e050cc) will **increase** coverage by `0.01%`.
> The diff coverage is `n/a`.

[![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2677/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2677?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)

```diff
@@             Coverage Diff              @@
##             master    #2677      +/-   ##
+ Coverage     52.58%   52.59%    +0.01%
- Complexity     3707     3709        +2
  Files           485      485
  Lines         23227    23227
  Branches       2466     2466
+ Hits          12213    12217        +4
+ Misses         9934     9933        -1
+ Partials       1080     1077        -3
```

| Flag | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| hudicli | `40.29% <ø> (ø)` | `215.00 <ø> (ø)` | |
| hudiclient | `∅ <ø> (∅)` | `0.00 <ø> (ø)` | |
| hudicommon | `50.68% <ø> (+0.01%)` | `1976.00 <ø> (ø)` | |
| hudiflink | `56.51% <ø> (ø)` | `516.00 <ø> (ø)` | |
| hudihadoopmr | `33.33% <ø> (ø)` | `198.00 <ø> (ø)` | |
| hudisparkdatasource | `72.06% <ø> (ø)` | `237.00 <ø> (ø)` | |
| hudisync | `45.70% <ø> (ø)` | `131.00 <ø> (ø)` | |
| huditimelineservice | `64.36% <ø> (ø)` | `62.00 <ø> (ø)` | |
| hudiutilities | `69.84% <ø> (+0.10%)` | `374.00 <ø> (+2.00)` | |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.

| [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2677?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| [...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/hudi/pull/2677/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=) | `71.42% <0.00%> (+0.68%)` | `56.00% <0.00%> (+2.00%)` | |
| [...e/hudi/common/table/log/HoodieLogFormatWriter.java](https://codecov.io/gh/apache/hudi/pull/2677/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL2xvZy9Ib29kaWVMb2dGb3JtYXRXcml0ZXIuamF2YQ==) | `79.68% <0.00%> (+1.56%)` | `26.00% <0.00%> (ø%)` | |
[GitHub] [hudi] codecov-commenter edited a comment on pull request #2677: [HUDI-1714] Added tests to TestHoodieTimelineArchiveLog for the archival of compl…
codecov-commenter edited a comment on pull request #2677: URL: https://github.com/apache/hudi/pull/2677#issuecomment-822060324 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2677?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) Report > Merging [#2677](https://codecov.io/gh/apache/hudi/pull/2677?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) (a45db97) into [master](https://codecov.io/gh/apache/hudi/commit/4e050cc2ba2620d83687be5e5d69dcd747e9f72c?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) (4e050cc) will **increase** coverage by `17.26%`. > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2677/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2677?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) ```diff @@ Coverage Diff @@ ## master#2677 +/- ## = + Coverage 52.58% 69.84% +17.26% + Complexity 3707 374 - = Files 485 54 -431 Lines 23227 1993-21234 Branches 2466 235 -2231 = - Hits 12213 1392-10821 + Misses 9934 471 -9463 + Partials 1080 130 -950 ``` | Flag | Coverage Δ | Complexity Δ | | |---|---|---|---| | hudicli | `?` | `?` | | | hudiclient | `?` | `?` | | | hudicommon | `?` | `?` | | | hudiflink | `?` | `?` | | | hudihadoopmr | `?` | `?` | | | hudisparkdatasource | `?` | `?` | | | hudisync | `?` | `?` | | | huditimelineservice | `?` | `?` | | | hudiutilities | `69.84% <ø> (+0.10%)` | `374.00 <ø> (+2.00)` | | Flags with carried forward coverage won't be shown. 
[Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more. | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2677?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [...rg/apache/hudi/common/util/SerializationUtils.java](https://codecov.io/gh/apache/hudi/pull/2677/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvU2VyaWFsaXphdGlvblV0aWxzLmphdmE=) | | | | | [...he/hudi/sink/partitioner/BucketAssignFunction.java](https://codecov.io/gh/apache/hudi/pull/2677/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL3BhcnRpdGlvbmVyL0J1Y2tldEFzc2lnbkZ1bmN0aW9uLmphdmE=) | | | | | [...e/hudi/exception/HoodieFlinkStreamerException.java](https://codecov.io/gh/apache/hudi/pull/2677/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9leGNlcHRpb24vSG9vZGllRmxpbmtTdHJlYW1lckV4Y2VwdGlvbi5qYXZh) | | | | | [...hudi/common/fs/inline/InLineFsDataInputStream.java](https://codecov.io/gh/apache/hudi/pull/2677/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL2lubGluZS9JbkxpbmVGc0RhdGFJbnB1dFN0cmVhbS5qYXZh) | | | | | 
[...sioning/clean/CleanMetadataV2MigrationHandler.java](https://codecov.io/gh/apache/hudi/pull/2677/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL3ZlcnNpb25pbmcvY2xlYW4vQ2xlYW5NZXRhZGF0YVYyTWlncmF0aW9uSGFuZGxlci5qYXZh) | | | | | [...java/org/apache/hudi/sink/StreamWriteFunction.java](https://codecov.io/gh/apache/hudi/pull/2677/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL1N0cmVhbVdyaXRlRnVuY3Rpb24uamF2YQ==) | | | | |
[GitHub] [hudi] codecov-commenter commented on pull request #2677: [HUDI-1714] Added tests to TestHoodieTimelineArchiveLog for the archival of compl…
codecov-commenter commented on pull request #2677: URL: https://github.com/apache/hudi/pull/2677#issuecomment-822060324 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2677?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) Report > Merging [#2677](https://codecov.io/gh/apache/hudi/pull/2677?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) (a45db97) into [master](https://codecov.io/gh/apache/hudi/commit/4e050cc2ba2620d83687be5e5d69dcd747e9f72c?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) (4e050cc) will **decrease** coverage by `43.19%`. > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2677/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2677?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) ```diff @@ Coverage Diff @@ ## master #2677 +/- ## - Coverage 52.58% 9.38% -43.20% + Complexity 3707 48 -3659 Files 485 54 -431 Lines 232271993-21234 Branches 2466 235 -2231 - Hits 12213 187-12026 + Misses 99341793 -8141 + Partials 1080 13 -1067 ``` | Flag | Coverage Δ | Complexity Δ | | |---|---|---|---| | hudicli | `?` | `?` | | | hudiclient | `?` | `?` | | | hudicommon | `?` | `?` | | | hudiflink | `?` | `?` | | | hudihadoopmr | `?` | `?` | | | hudisparkdatasource | `?` | `?` | | | hudisync | `?` | `?` | | | huditimelineservice | `?` | `?` | | | hudiutilities | `9.38% <ø> (-60.37%)` | `48.00 <ø> (-324.00)` | | Flags with carried forward coverage won't be shown. 
[Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more. | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2677?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/2677/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | | | [...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/2677/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | | | [...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/2677/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | | | [.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/2677/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh) | `0.00% <0.00%> 
(-100.00%)` | `0.00% <0.00%> (-1.00%)` | | | [.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/2677/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | | |
[jira] [Commented] (HUDI-648) Implement error log/table for Datasource/DeltaStreamer/WriteClient/Compaction writes
[ https://issues.apache.org/jira/browse/HUDI-648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17324600#comment-17324600 ]

Raymond Xu commented on HUDI-648:
-

[~vinoth] The PR is [https://github.com/apache/hudi/pull/2710]; for some reason it was not linked automatically. We previously discussed in the RFC whether to put the error table alongside the main table or to have a global error table. I'm also OK with putting it alongside the metadata table, so that everything is in one place.

[~liujinhui] Would you kindly update the RFC to reflect the latest design, given that you've implemented it in a slightly different way?

> Implement error log/table for Datasource/DeltaStreamer/WriteClient/Compaction writes
>
> Key: HUDI-648
> URL: https://issues.apache.org/jira/browse/HUDI-648
> Project: Apache Hudi
> Issue Type: New Feature
> Components: DeltaStreamer, Spark Integration, Writer Core
> Reporter: Vinoth Chandar
> Assignee: liujinhui
> Priority: Major
> Labels: sev:normal, user-support-issues
> Attachments: image-2021-03-03-11-40-21-083.png
>
> We would like a way to hand the erroring records from writing or compaction
> back to the users, in a separate table or log. This needs to work generically
> across all the different writer paths.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] codecov-commenter edited a comment on pull request #2844: [Hotfix][hudi-sync] Fix typos
codecov-commenter edited a comment on pull request #2844:
URL: https://github.com/apache/hudi/pull/2844#issuecomment-822005123

# [Codecov](https://codecov.io/gh/apache/hudi/pull/2844?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) Report
> Merging [#2844](https://codecov.io/gh/apache/hudi/pull/2844?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) (0e5e732) into [master](https://codecov.io/gh/apache/hudi/commit/4e050cc2ba2620d83687be5e5d69dcd747e9f72c?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) (4e050cc) will **increase** coverage by `0.00%`.
> The diff coverage is `n/a`.

[![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2844/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2844?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)

```diff
@@            Coverage Diff            @@
##             master    #2844   +/-   ##
=========================================
  Coverage     52.58%   52.58%
- Complexity     3707     3708    +1
=========================================
  Files           485      485
  Lines         23227    23227
  Branches       2466     2466
=========================================
+ Hits          12213    12214    +1
  Misses         9934     9934
+ Partials       1080     1079    -1
```

| Flag | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| hudicli | `40.29% <ø> (ø)` | `215.00 <ø> (ø)` | |
| hudiclient | `∅ <ø> (∅)` | `0.00 <ø> (ø)` | |
| hudicommon | `50.66% <ø> (ø)` | `1976.00 <ø> (ø)` | |
| hudiflink | `56.51% <ø> (ø)` | `516.00 <ø> (ø)` | |
| hudihadoopmr | `33.33% <ø> (ø)` | `198.00 <ø> (ø)` | |
| hudisparkdatasource | `72.06% <ø> (ø)` | `237.00 <ø> (ø)` | |
| hudisync | `45.70% <ø> (ø)` | `131.00 <ø> (ø)` | |
| huditimelineservice | `64.36% <ø> (ø)` | `62.00 <ø> (ø)` | |
| hudiutilities | `69.79% <ø> (+0.05%)` | `373.00 <ø> (+1.00)` | |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.

| [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2844?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| [...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/hudi/pull/2844/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=) | `71.08% <0.00%> (+0.34%)` | `55.00% <0.00%> (+1.00%)` | |

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] codecov-commenter edited a comment on pull request #2844: [Hotfix][hudi-sync] Fix typos
codecov-commenter edited a comment on pull request #2844: URL: https://github.com/apache/hudi/pull/2844#issuecomment-822005123 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2844?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) Report > Merging [#2844](https://codecov.io/gh/apache/hudi/pull/2844?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) (0e5e732) into [master](https://codecov.io/gh/apache/hudi/commit/4e050cc2ba2620d83687be5e5d69dcd747e9f72c?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) (4e050cc) will **increase** coverage by `17.21%`. > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2844/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2844?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) ```diff @@ Coverage Diff @@ ## master#2844 +/- ## = + Coverage 52.58% 69.79% +17.21% + Complexity 3707 373 -3334 = Files 485 54 -431 Lines 23227 1993-21234 Branches 2466 235 -2231 = - Hits 12213 1391-10822 + Misses 9934 471 -9463 + Partials 1080 131 -949 ``` | Flag | Coverage Δ | Complexity Δ | | |---|---|---|---| | hudicli | `?` | `?` | | | hudiclient | `?` | `?` | | | hudicommon | `?` | `?` | | | hudiflink | `?` | `?` | | | hudihadoopmr | `?` | `?` | | | hudisparkdatasource | `?` | `?` | | | hudisync | `?` | `?` | | | huditimelineservice | `?` | `?` | | | hudiutilities | `69.79% <ø> (+0.05%)` | `373.00 <ø> (+1.00)` | | Flags with carried forward coverage won't be shown. 
[Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more. | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2844?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [...ava/org/apache/hudi/exception/HoodieException.java](https://codecov.io/gh/apache/hudi/pull/2844/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZXhjZXB0aW9uL0hvb2RpZUV4Y2VwdGlvbi5qYXZh) | | | | | [...a/org/apache/hudi/avro/HoodieAvroWriteSupport.java](https://codecov.io/gh/apache/hudi/pull/2844/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvYXZyby9Ib29kaWVBdnJvV3JpdGVTdXBwb3J0LmphdmE=) | | | | | [.../apache/hudi/common/bootstrap/FileStatusUtils.java](https://codecov.io/gh/apache/hudi/pull/2844/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2Jvb3RzdHJhcC9GaWxlU3RhdHVzVXRpbHMuamF2YQ==) | | | | | [...he/hudi/common/util/HoodieRecordSizeEstimator.java](https://codecov.io/gh/apache/hudi/pull/2844/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvSG9vZGllUmVjb3JkU2l6ZUVzdGltYXRvci5qYXZh) | | | | | 
[...e/hudi/common/table/timeline/dto/FileSliceDTO.java](https://codecov.io/gh/apache/hudi/pull/2844/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL2R0by9GaWxlU2xpY2VEVE8uamF2YQ==) | | | | | [.../versioning/compaction/CompactionPlanMigrator.java](https://codecov.io/gh/apache/hudi/pull/2844/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL3ZlcnNpb25pbmcvY29tcGFjdGlvbi9Db21wYWN0aW9uUGxhbk1pZ3JhdG9yLmphdmE=) | | | | |
[GitHub] [hudi] codecov-commenter commented on pull request #2844: [Hotfix][hudi-sync] Fix typos
codecov-commenter commented on pull request #2844: URL: https://github.com/apache/hudi/pull/2844#issuecomment-822005123 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2844?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) Report > Merging [#2844](https://codecov.io/gh/apache/hudi/pull/2844?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) (0e5e732) into [master](https://codecov.io/gh/apache/hudi/commit/4e050cc2ba2620d83687be5e5d69dcd747e9f72c?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) (4e050cc) will **decrease** coverage by `43.19%`. > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2844/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2844?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) ```diff @@ Coverage Diff @@ ## master #2844 +/- ## - Coverage 52.58% 9.38% -43.20% + Complexity 3707 48 -3659 Files 485 54 -431 Lines 232271993-21234 Branches 2466 235 -2231 - Hits 12213 187-12026 + Misses 99341793 -8141 + Partials 1080 13 -1067 ``` | Flag | Coverage Δ | Complexity Δ | | |---|---|---|---| | hudicli | `?` | `?` | | | hudiclient | `?` | `?` | | | hudicommon | `?` | `?` | | | hudiflink | `?` | `?` | | | hudihadoopmr | `?` | `?` | | | hudisparkdatasource | `?` | `?` | | | hudisync | `?` | `?` | | | huditimelineservice | `?` | `?` | | | hudiutilities | `9.38% <ø> (-60.37%)` | `48.00 <ø> (-324.00)` | | Flags with carried forward coverage won't be shown. 
[Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more. | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2844?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/2844/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | | | [...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/2844/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | | | [...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/2844/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | | | [.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/2844/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh) | `0.00% <0.00%> 
(-100.00%)` | `0.00% <0.00%> (-1.00%)` | | | [.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/2844/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | | |
[GitHub] [hudi] RocMarshal opened a new pull request #2844: [Hotfix][hudi-sync] Fix typos
RocMarshal opened a new pull request #2844:
URL: https://github.com/apache/hudi/pull/2844

## *Tips*
- *Thank you very much for contributing to Apache Hudi.*
- *Please review https://hudi.apache.org/contributing.html before opening a pull request.*

## What is the purpose of the pull request

*(For example: This pull request adds quick-start document.)*

## Brief change log

*(for example:)*
- *Modify AnnotationLocation checkstyle rule in checkstyle.xml*

## Verify this pull request

*(Please pick either of the following options)*

This pull request is a trivial rework / code cleanup without any test coverage.

*(or)*

This pull request is already covered by existing tests, such as *(please describe tests)*.

(or)

This change added tests and can be verified as follows:

*(example:)*
- *Added integration tests for end-to-end.*
- *Added HoodieClientWriteTest to verify the change.*
- *Manually verified the change by running a job locally.*

## Committer checklist
- [ ] Has a corresponding JIRA in PR title & commit
- [ ] Commit message is descriptive of the change
- [ ] CI is green
- [ ] Necessary doc changes done or have another open PR
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] RocMarshal commented on a change in pull request #2822: [Hotfix][hudi-sync] Refactor method up to parent-class
RocMarshal commented on a change in pull request #2822:
URL: https://github.com/apache/hudi/pull/2822#discussion_r615409612

## File path: hudi-sync/hudi-sync-common/src/main/java/org/apache/hudi/sync/common/AbstractSyncHoodieClient.java ##

@@ -136,6 +141,42 @@ public MessageType getDataSchema() {
     }
   }
 
+  public abstract static class TypeOptimizer implements Serializable {

Review comment:
@leesf Of course. I will add tests for it ASAP.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] lw309637554 commented on pull request #2722: [HUDI-1722]hive beeline/spark-sql query specified field on mor table occur NPE
lw309637554 commented on pull request #2722:
URL: https://github.com/apache/hudi/pull/2722#issuecomment-821957510

@xiarixiaoyao Thanks for your contribution. Adding a unit test is very necessary. I also left some comments on the resolution.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] lw309637554 commented on a change in pull request #2722: [HUDI-1722]hive beeline/spark-sql query specified field on mor table occur NPE
lw309637554 commented on a change in pull request #2722:
URL: https://github.com/apache/hudi/pull/2722#discussion_r615366128

## File path: hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieParquetRealtimeInputFormat.java ##

@@ -85,12 +85,14 @@ void addProjectionToJobConf(final RealtimeSplit realtimeSplit, final JobConf job
     // risk of experiencing race conditions. Hence, we synchronize on the JobConf object here. There is negligible
     // latency incurred here due to the synchronization since get record reader is called once per spilt before the
     // actual heavy lifting of reading the parquet files happen.
-    if (jobConf.get(HoodieInputFormatUtils.HOODIE_READ_COLUMNS_PROP) == null) {
+    if (jobConf.get(HoodieInputFormatUtils.HOODIE_READ_COLUMNS_PROP) == null
+        || (!realtimeSplit.getDeltaLogPaths().isEmpty() && !HoodieRealtimeInputFormatUtils.requiredProjectionFieldsExistInConf(jobConf))) {
       synchronized (jobConf) {
         LOG.info(
             "Before adding Hoodie columns, Projections :" + jobConf.get(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR)
                 + ", Ids :" + jobConf.get(ColumnProjectionUtils.READ_COLUMN_IDS_CONF_STR));
-        if (jobConf.get(HoodieInputFormatUtils.HOODIE_READ_COLUMNS_PROP) == null) {
+        if (jobConf.get(HoodieInputFormatUtils.HOODIE_READ_COLUMNS_PROP) == null

Review comment:
Can we isolate the JobConf for each record reader? Or could we just revert https://github.com/apache/hudi/pull/2190/files, so that the "if (!realtimeSplit.getDeltaLogPaths().isEmpty()) {" check is deleted?

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
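The condition discussed in this review can be sketched in isolation. The following is a simplified illustration, not Hudi code: it assumes that several record readers share one configuration object, that the projection columns are only appended for splits that have delta log files, and it uses a plain `java.util.Properties` in place of Hadoop's `JobConf`. The class name, the two property keys, and the helper methods are all hypothetical; only the three `_hoodie_*` column names are taken from Hudi's metadata fields.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

public class ProjectionGuard {
    // Hypothetical keys standing in for the real Hadoop/Hudi configuration names.
    static final String READ_COLUMNS_DONE = "example.hoodie.read.columns.set";
    static final String COLUMN_NAMES = "example.read.column.names";
    static final List<String> REQUIRED =
        Arrays.asList("_hoodie_record_key", "_hoodie_commit_time", "_hoodie_partition_path");

    static boolean requiredFieldsPresent(Properties conf) {
        List<String> current = Arrays.asList(conf.getProperty(COLUMN_NAMES, "").split(","));
        return current.containsAll(REQUIRED);
    }

    // Mirrors the widened condition: the flag alone is not enough when this
    // split has delta logs and the required columns are still missing.
    static boolean needsProjection(Properties conf, boolean hasDeltaLogs) {
        return conf.getProperty(READ_COLUMNS_DONE) == null
            || (hasDeltaLogs && !requiredFieldsPresent(conf));
    }

    static void addProjection(Properties conf, boolean hasDeltaLogs) {
        if (needsProjection(conf, hasDeltaLogs)) {          // cheap check without the lock
            synchronized (conf) {
                if (needsProjection(conf, hasDeltaLogs)) {  // re-check while holding the lock
                    if (hasDeltaLogs) {
                        StringBuilder cols = new StringBuilder(conf.getProperty(COLUMN_NAMES, ""));
                        List<String> current = Arrays.asList(cols.toString().split(","));
                        for (String field : REQUIRED) {
                            if (!current.contains(field)) {
                                if (cols.length() > 0) {
                                    cols.append(',');
                                }
                                cols.append(field);
                            }
                        }
                        conf.setProperty(COLUMN_NAMES, cols.toString());
                    }
                    conf.setProperty(READ_COLUMNS_DONE, "true");
                }
            }
        }
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(COLUMN_NAMES, "fare");
        addProjection(conf, false); // first reader: split without log files
        addProjection(conf, true);  // second reader: split with log files
        System.out.println(conf.getProperty(COLUMN_NAMES));
    }
}
```

If the check were only `READ_COLUMNS_DONE == null`, the second call in `main` would be skipped and the reader for the split with log files would run without the required columns. Giving each record reader its own copy of the configuration, as the comment suggests, would avoid the shared flag altogether.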
[GitHub] [hudi] lw309637554 commented on a change in pull request #2722: [HUDI-1722]hive beeline/spark-sql query specified field on mor table occur NPE
lw309637554 commented on a change in pull request #2722:
URL: https://github.com/apache/hudi/pull/2722#discussion_r615365836

## File path: hudi-hadoop-mr/src/test/java/org/apache/hudi/hadoop/functional/TestHoodieCombineHiveInputFormat.java ##

@@ -84,6 +86,73 @@ public void setUp() throws IOException, InterruptedException {
     HoodieTestUtils.init(MiniClusterUtil.configuration, tempDir.toAbsolutePath().toString(), HoodieTableType.MERGE_ON_READ);
   }
 
+  @Test
+  public void testMutilReaderRealtimeComineHoodieInputFormat() throws Exception {
+    // test for hudi-1722
+    Configuration conf = new Configuration();
+    // initial commit
+    Schema schema = HoodieAvroUtils.addMetadataFields(SchemaTestUtil.getEvolvedSchema());
+    HoodieTestUtils.init(hadoopConf, tempDir.toAbsolutePath().toString(), HoodieTableType.MERGE_ON_READ);
+    String commitTime = "100";
+    final int numRecords = 1000;
+    // Create 3 parquet files with 1000 records each
+    File partitionDir = InputFormatTestUtil.prepareParquetTable(tempDir, schema, 3, numRecords, commitTime);
+    InputFormatTestUtil.commit(tempDir, commitTime);
+
+    String newCommitTime = "101";
+    // to trigger the bug of HUDI-1772, only update fileid2
+    // insert 1000 update records to log file 2
+    // now fileid0, fileid1 has no log files, fileid2 has log file
+    HoodieLogFormat.Writer writer =
+        InputFormatTestUtil.writeDataBlockToLogFile(partitionDir, fs, schema, "fileid2", commitTime, newCommitTime,
+            numRecords, numRecords, 0);
+    writer.close();
+
+    TableDesc tblDesc = Utilities.defaultTd;
+    // Set the input format
+    tblDesc.setInputFileFormatClass(HoodieParquetRealtimeInputFormat.class);
+    PartitionDesc partDesc = new PartitionDesc(tblDesc, null);
+    LinkedHashMap<Path, PartitionDesc> pt = new LinkedHashMap<>();
+    LinkedHashMap<Path, ArrayList<String>> tableAlias = new LinkedHashMap<>();
+    ArrayList<String> alias = new ArrayList<>();
+    alias.add(tempDir.toAbsolutePath().toString());
+    tableAlias.put(new Path(tempDir.toAbsolutePath().toString()), alias);
+    pt.put(new Path(tempDir.toAbsolutePath().toString()), partDesc);
+
+    MapredWork mrwork = new MapredWork();
+    mrwork.getMapWork().setPathToPartitionInfo(pt);
+    mrwork.getMapWork().setPathToAliases(tableAlias);
+    Path mapWorkPath = new Path(tempDir.toAbsolutePath().toString());
+    Utilities.setMapRedWork(conf, mrwork, mapWorkPath);
+    jobConf = new JobConf(conf);
+    // Add the paths
+    FileInputFormat.setInputPaths(jobConf, partitionDir.getPath());
+    jobConf.set(HAS_MAP_WORK, "true");
+    // The following config tells Hive to choose ExecMapper to read the MAP_WORK
+    jobConf.set(MAPRED_MAPPER_CLASS, ExecMapper.class.getName());
+    // set SPLIT_MAXSIZE larger to create one split for 3 files groups
+    jobConf.set(org.apache.hadoop.mapreduce.lib.input.FileInputFormat.SPLIT_MAXSIZE, "12800");
+
+    HoodieCombineHiveInputFormat combineHiveInputFormat = new HoodieCombineHiveInputFormat();
+    String tripsHiveColumnTypes = "double,string,string,string,double,double,double,double,double";
+    InputFormatTestUtil.setProjectFieldsForInputFormat(jobConf, schema, tripsHiveColumnTypes);
+    InputSplit[] splits = combineHiveInputFormat.getSplits(jobConf, 1);
+    // Since the SPLIT_SIZE is 3, we should create only 1 split with all 3 file groups
+    assertEquals(1, splits.length);
+    RecordReader recordReader =

Review comment:
Hello, I only see one record reader here?

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] codecov-commenter edited a comment on pull request #2843: [HUDI-1804] Continue to write when Flink write task restart because o…
codecov-commenter edited a comment on pull request #2843: URL: https://github.com/apache/hudi/pull/2843#issuecomment-821748199 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2843?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) Report > :exclamation: No coverage uploaded for pull request base (`master@b6d949b`). [Click here to learn what that means](https://docs.codecov.io/docs/error-reference?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#section-missing-base-commit). > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2843/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2843?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) ```diff @@Coverage Diff@@ ## master#2843 +/- ## = Coverage ? 52.58% Complexity? 3709 = Files ? 485 Lines ?23227 Branches ? 2466 = Hits ?12215 Misses? 9934 Partials ? 1078 ``` | Flag | Coverage Δ | Complexity Δ | | |---|---|---|---| | hudicli | `40.29% <ø> (?)` | `215.00 <ø> (?)` | | | hudiclient | `∅ <ø> (?)` | `0.00 <ø> (?)` | | | hudicommon | `50.66% <ø> (?)` | `1976.00 <ø> (?)` | | | hudiflink | `56.51% <ø> (?)` | `516.00 <ø> (?)` | | | hudihadoopmr | `33.33% <ø> (?)` | `198.00 <ø> (?)` | | | hudisparkdatasource | `72.06% <ø> (?)` | `237.00 <ø> (?)` | | | hudisync | `45.70% <ø> (?)` | `131.00 <ø> (?)` | | | huditimelineservice | `64.36% <ø> (?)` | `62.00 <ø> (?)` | | | hudiutilities | `69.84% <ø> (?)` | `374.00 <ø> (?)` | | Flags with carried forward coverage won't be shown. 
[Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
[GitHub] [hudi] codecov-commenter edited a comment on pull request #2843: [HUDI-1804] Continue to write when Flink write task restart because o…
codecov-commenter edited a comment on pull request #2843:
URL: https://github.com/apache/hudi/pull/2843#issuecomment-821748199
[GitHub] [hudi] codecov-commenter edited a comment on pull request #2843: [HUDI-1804] Continue to write when Flink write task restart because o…
codecov-commenter edited a comment on pull request #2843:
URL: https://github.com/apache/hudi/pull/2843#issuecomment-821748199

# [Codecov](https://codecov.io/gh/apache/hudi/pull/2843?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) Report
> :exclamation: No coverage uploaded for pull request base (`master@b6d949b`). [Click here to learn what that means](https://docs.codecov.io/docs/error-reference?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#section-missing-base-commit).
> The diff coverage is `n/a`.

[![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2843/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2843?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)

```diff
@@            Coverage Diff            @@
##             master    #2843   +/-   ##
=========================================
  Coverage          ?   69.84%
  Complexity        ?      374
=========================================
  Files             ?       54
  Lines             ?     1993
  Branches          ?      235
=========================================
  Hits              ?     1392
  Misses            ?      471
  Partials          ?      130
```

| Flag | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| hudiutilities | `69.84% <ø> (?)` | `374.00 <ø> (?)` | |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.