[GitHub] [hudi] codecov-io commented on pull request #2708: [HUDI-1712] Rename & standardize config to match other configs
codecov-io commented on pull request #2708: URL: https://github.com/apache/hudi/pull/2708#issuecomment-805560787 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2708?src=pr&el=h1) Report > Merging [#2708](https://codecov.io/gh/apache/hudi/pull/2708?src=pr&el=desc) (1aaf8b1) into [master](https://codecov.io/gh/apache/hudi/commit/0e6909d3e241c794ed1b9318fcb9142a36cb0133?el=desc) (0e6909d) will **decrease** coverage by `37.03%`. > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2708/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2708?src=pr&el=tree) ```diff @@ Coverage Diff @@ ## master #2708 +/- ## - Coverage 46.43% 9.40% -37.04% + Complexity 3278 48 -3230 Files 476 54 -422 Lines 22583 1989 -20594 Branches 2408 236 -2172 - Hits 10487 187 -10300 + Misses 11196 1789 -9407 + Partials 900 13 -887 ``` | Flag | Coverage Δ | Complexity Δ | | |---|---|---|---| | hudicli | `?` | `?` | | | hudiclient | `?` | `?` | | | hudicommon | `?` | `?` | | | hudiflink | `?` | `?` | | | hudihadoopmr | `?` | `?` | | | hudisparkdatasource | `?` | `?` | | | hudisync | `?` | `?` | | | huditimelineservice | `?` | `?` | | | hudiutilities | `9.40% <ø> (ø)` | `0.00 <ø> (ø)` | | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more. | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2708?src=pr&el=tree) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [...ava/org/apache/hudi/exception/HoodieException.java](https://codecov.io/gh/apache/hudi/pull/2708/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZXhjZXB0aW9uL0hvb2RpZUV4Y2VwdGlvbi5qYXZh) | | | | | [.../org/apache/hudi/sink/InstantGenerateOperator.java](https://codecov.io/gh/apache/hudi/pull/2708/diff?src=pr&el=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL0luc3RhbnRHZW5lcmF0ZU9wZXJhdG9yLmphdmE=) | | | | | [...in/java/org/apache/hudi/common/model/BaseFile.java](https://codecov.io/gh/apache/hudi/pull/2708/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0Jhc2VGaWxlLmphdmE=) | | | | | [...in/java/org/apache/hudi/cli/HoodiePrintHelper.java](https://codecov.io/gh/apache/hudi/pull/2708/diff?src=pr&el=tree#diff-aHVkaS1jbGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpL0hvb2RpZVByaW50SGVscGVyLmphdmE=) | | | | | [...util/jvm/OpenJ9MemoryLayoutSpecification64bit.java](https://codecov.io/gh/apache/hudi/pull/2708/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvanZtL09wZW5KOU1lbW9yeUxheW91dFNwZWNpZmljYXRpb242NGJpdC5qYXZh) | | | | | [...mmon/table/log/block/HoodieDeleteBlockVersion.java](https://codecov.io/gh/apache/hudi/pull/2708/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL2xvZy9ibG9jay9Ib29kaWVEZWxldGVCbG9ja1ZlcnNpb24uamF2YQ==) | | | | | [.../org/apache/hudi/common/model/BaseAvroPayload.java](https://codecov.io/gh/apache/hudi/pull/2708/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0Jhc2VBdnJvUGF5bG9hZC5qYXZh) | | | | | [...e/hudi/common/table/log/HoodieLogFormatWriter.java](https://codecov.io/gh/apache/hudi/pull/2708/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL2xvZy9Ib29kaWVMb2dGb3JtYXRXcml0ZXIuamF2YQ==) | | | | | 
[...che/hudi/common/util/collection/ImmutablePair.java](https://codecov.io/gh/apache/hudi/pull/2708/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvY29sbGVjdGlvbi9JbW11dGFibGVQYWlyLmphdmE=) | | | | | [...rg/apache/hudi/metadata/HoodieMetadataPayload.java](https://codecov.io/gh/apache/hudi/pull/2708/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0YWRhdGEvSG9vZGllTWV0YWRhdGFQYXlsb2FkLmphdmE=) | | | | | ... and [411 more](https://codecov.io/gh/apache/hudi/pull/2708/diff?src=pr&el=tree-more) | | -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] codecov-io edited a comment on pull request #2645: [HUDI-1659] Basic Implementation Of Spark Sql Support
codecov-io edited a comment on pull request #2645: URL: https://github.com/apache/hudi/pull/2645#issuecomment-792430670 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2645?src=pr&el=h1) Report > Merging [#2645](https://codecov.io/gh/apache/hudi/pull/2645?src=pr&el=desc) (c4316ae) into [master](https://codecov.io/gh/apache/hudi/commit/900de34e45b4c1d19c01ea84adc38413f2bd52ff?el=desc) (900de34) will **increase** coverage by `0.05%`. > The diff coverage is `53.16%`. [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2645/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2645?src=pr&el=tree) ```diff @@ Coverage Diff @@ ## master #2645 +/- ## + Coverage 51.76% 51.82% +0.05% - Complexity 3601 3682 +81 Files 476 493 +17 Lines 22579 23800 +1221 Branches 2407 2672 +265 + Hits 11688 12334 +646 - Misses 9874 10284 +410 - Partials 1017 1182 +165 ``` | Flag | Coverage Δ | Complexity Δ | | |---|---|---|---| | hudicli | `37.01% <ø> (ø)` | `0.00 <ø> (ø)` | | | hudiclient | `100.00% <ø> (ø)` | `0.00 <ø> (ø)` | | | hudicommon | `50.81% <7.31%> (-0.12%)` | `0.00 <2.00> (ø)` | | | hudiflink | `54.13% <ø> (-0.15%)` | `0.00 <ø> (ø)` | | | hudihadoopmr | `33.44% <ø> (ø)` | `0.00 <ø> (ø)` | | | hudisparkdatasource | `63.95% <54.82%> (-6.99%)` | `0.00 <76.00> (ø)` | | | hudisync | `45.50% <0.00%> (-0.20%)` | `0.00 <1.00> (ø)` | | | huditimelineservice | `64.36% <ø> (ø)` | `0.00 <ø> (ø)` | | | hudiutilities | `69.73% <ø> (ø)` | `0.00 <ø> (ø)` | | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more. | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2645?src=pr&el=tree) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [...g/apache/hudi/common/model/HoodiePayloadProps.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0hvb2RpZVBheWxvYWRQcm9wcy5qYXZh) | `0.00% <ø> (ø)` | `0.00 <0.00> (ø)` | | | [...rg/apache/hudi/common/table/HoodieTableConfig.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL0hvb2RpZVRhYmxlQ29uZmlnLmphdmE=) | `43.75% <0.00%> (-1.71%)` | `17.00 <0.00> (ø)` | | | [...pache/hudi/common/table/HoodieTableMetaClient.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL0hvb2RpZVRhYmxlTWV0YUNsaWVudC5qYXZh) | `65.09% <0.00%> (-3.22%)` | `43.00 <0.00> (ø)` | | | [.../apache/hudi/common/table/TableSchemaResolver.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL1RhYmxlU2NoZW1hUmVzb2x2ZXIuamF2YQ==) | `0.00% <0.00%> (ø)` | `0.00 <0.00> (ø)` | | | [...he/hudi/exception/HoodieDuplicateKeyException.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZXhjZXB0aW9uL0hvb2RpZUR1cGxpY2F0ZUtleUV4Y2VwdGlvbi5qYXZh) | `0.00% <0.00%> (ø)` | `0.00 <0.00> (?)` | | | [...n/scala/org/apache/hudi/HoodieSparkSqlWriter.scala](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL0hvb2RpZVNwYXJrU3FsV3JpdGVyLnNjYWxh) | `57.79% <ø> (ø)` | `0.00 <0.00> (ø)` | | | 
[...la/org/apache/spark/sql/hive/HiveClientUtils.scala](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9zcGFyay9zcWwvaGl2ZS9IaXZlQ2xpZW50VXRpbHMuc2NhbGE=) | `0.00% <0.00%> (ø)` | `0.00 <0.00> (?)` | | | [...e/spark/sql/catalyst/plans/logical/mergeInto.scala](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3BhcmsyL3NyYy9tYWluL3NjYWxhL29yZy9hcGFjaGUvc3Bhcmsvc3FsL2NhdGFseXN0L3BsYW5zL2xvZ2ljYWwvbWVyZ2VJbnRvLnNjYWxh) | `0.00% <0.00%> (ø)` | `0.00 <0.00> (?)` | | | [...he/spark/sql/hudi/parser/HoodieSqlAstBuilder.scala](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3BhcmsyL3NyYy9tYWluL3NjYWxhL29yZy9hcGFjaGUvc3Bhcmsvc3FsL2h1ZGkvcGFyc2VyL0hvb2RpZVNxbEFzdEJ1aWxkZXIuc2NhbGE=) | `0.00% <0.00%> (ø)` | `0.00 <0.00> (?)` | |
[GitHub] [hudi] codecov-io commented on pull request #2711: [hotfix] Log the error message for creating table source first
codecov-io commented on pull request #2711: URL: https://github.com/apache/hudi/pull/2711#issuecomment-805632411 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2711?src=pr&el=h1) Report > Merging [#2711](https://codecov.io/gh/apache/hudi/pull/2711?src=pr&el=desc) (ae47544) into [master](https://codecov.io/gh/apache/hudi/commit/03668dbaf1a60428d7e0d68c6622605e0809150a?el=desc) (03668db) will **decrease** coverage by `0.01%`. > The diff coverage is `50.00%`. [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2711/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2711?src=pr&el=tree) ```diff @@ Coverage Diff @@ ## master #2711 +/- ## - Coverage 51.74% 51.73% -0.02% + Complexity 3602 3601 -1 Files 476 476 Lines 22592 22595 +3 Branches 2409 2409 - Hits 11690 11689 -1 - Misses 9885 9888 +3 - Partials 1017 1018 +1 ``` | Flag | Coverage Δ | Complexity Δ | | |---|---|---|---| | hudicli | `37.01% <ø> (ø)` | `0.00 <ø> (ø)` | | | hudiclient | `100.00% <ø> (ø)` | `0.00 <ø> (ø)` | | | hudicommon | `50.94% <ø> (ø)` | `0.00 <ø> (ø)` | | | hudiflink | `54.08% <50.00%> (-0.05%)` | `0.00 <0.00> (ø)` | | | hudihadoopmr | `33.44% <ø> (ø)` | `0.00 <ø> (ø)` | | | hudisparkdatasource | `70.87% <ø> (ø)` | `0.00 <ø> (ø)` | | | hudisync | `45.58% <ø> (ø)` | `0.00 <ø> (ø)` | | | huditimelineservice | `64.36% <ø> (ø)` | `0.00 <ø> (ø)` | | | hudiutilities | `69.73% <ø> (-0.06%)` | `0.00 <ø> (ø)` | | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more. | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2711?src=pr&el=tree) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [...java/org/apache/hudi/table/HoodieTableFactory.java](https://codecov.io/gh/apache/hudi/pull/2711/diff?src=pr&el=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS90YWJsZS9Ib29kaWVUYWJsZUZhY3RvcnkuamF2YQ==) | `72.72% <50.00%> (-5.33%)` | `11.00 <0.00> (ø)` | | | [...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/hudi/pull/2711/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=) | `71.37% <0.00%> (-0.35%)` | `55.00% <0.00%> (-1.00%)` | | -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] garyli1019 merged pull request #2708: [HUDI-1712] Rename & standardize config to match other configs
garyli1019 merged pull request #2708: URL: https://github.com/apache/hudi/pull/2708 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[hudi] branch master updated (03668db -> 01a1d79)
This is an automated email from the ASF dual-hosted git repository. garyli pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git. from 03668db [HUDI-1710] Read optimized query type for Flink batch reader (#2702) add 01a1d79 [HUDI-1712] Rename & standardize config to match other configs (#2708) No new revisions were added by this update. Summary of changes: .../hudi/common/config/LockConfiguration.java | 2 +- .../testsuite/job/TestHoodieTestSuiteJob.java | 18 ++-- .../functional/TestHoodieDeltaStreamer.java | 34 +++--- 3 files changed, 27 insertions(+), 27 deletions(-)
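For context on the change itself: HUDI-1712 standardizes the prefix used by the lock-related write configs so they line up with the rest of Hudi's `hoodie.*` options. The sketch below only illustrates the shared-prefix pattern the LockConfiguration change follows; the constant names and property keys are assumptions for illustration, not the exact keys renamed by #2708.

```java
// Illustrative sketch of a standardized config prefix; all names here are hypothetical.
public final class LockConfigurationSketch {

  // One shared prefix keeps every lock option grouped with the other writer configs.
  public static final String LOCK_PREFIX = "hoodie.write.lock.";

  // Derived keys reuse the prefix instead of hard-coding their own spelling,
  // which is what "standardize config to match other configs" amounts to.
  public static final String LOCK_PROVIDER_CLASS_PROP = LOCK_PREFIX + "provider";
  public static final String LOCK_ACQUIRE_WAIT_TIMEOUT_MS_PROP = LOCK_PREFIX + "wait_time_ms";

  private LockConfigurationSketch() {
  }
}
```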
[GitHub] [hudi] Sugamber commented on issue #2637: [SUPPORT] - Partial Update : update few columns of a table
Sugamber commented on issue #2637: URL: https://github.com/apache/hudi/issues/2637#issuecomment-805645355 There is an open pull request for partial update for CoW table. https://github.com/apache/hudi/pull/1929 It looks like my use case. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] Sugamber commented on pull request #2666: [HUDI-1160] Support update partial fields for CoW table
Sugamber commented on pull request #2666: URL: https://github.com/apache/hudi/pull/2666#issuecomment-805652102 @liujinhui1994 We also need the same feature in Hudi. Is there a working branch we can refer to? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] liujinhui1994 commented on pull request #2666: [HUDI-1160] Support update partial fields for CoW table
liujinhui1994 commented on pull request #2666: URL: https://github.com/apache/hudi/pull/2666#issuecomment-805655005 @Sugamber This branch should be available -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] Sugamber commented on pull request #2666: [HUDI-1160] Support update partial fields for CoW table
Sugamber commented on pull request #2666: URL: https://github.com/apache/hudi/pull/2666#issuecomment-805666558 Is there any timeline for this pull request? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] liujinhui1994 commented on pull request #2666: [HUDI-1160] Support update partial fields for CoW table
liujinhui1994 commented on pull request #2666: URL: https://github.com/apache/hudi/pull/2666#issuecomment-805667943 Maybe after 0.8 is released. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] codecov-io edited a comment on pull request #2651: [HUDI-1591] [RFC-26] Improve Hoodie Table Query Performance And Ease Of Use Fo…
codecov-io edited a comment on pull request #2651: URL: https://github.com/apache/hudi/pull/2651#issuecomment-794945140 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2651?src=pr&el=h1) Report > Merging [#2651](https://codecov.io/gh/apache/hudi/pull/2651?src=pr&el=desc) (fb7a9b1) into [master](https://codecov.io/gh/apache/hudi/commit/ce3e8ec87083ef4cd4f33de39b6697f66ff3f277?el=desc) (ce3e8ec) will **increase** coverage by `17.95%`. > The diff coverage is `50.00%`. [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2651/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2651?src=pr&el=tree) ```diff @@ Coverage Diff @@ ## master #2651 +/- ## = + Coverage 51.76% 69.72% +17.95% + Complexity 3602 372 -3230 = Files 476 54 -422 Lines 22579 1995 -20584 Branches 2408 236 -2172 = - Hits 11688 1391 -10297 + Misses 9874 474 -9400 + Partials 1017 130 -887 ``` | Flag | Coverage Δ | Complexity Δ | | |---|---|---|---| | hudicli | `?` | `?` | | | hudiclient | `?` | `?` | | | hudicommon | `?` | `?` | | | hudiflink | `?` | `?` | | | hudihadoopmr | `?` | `?` | | | hudisparkdatasource | `?` | `?` | | | hudisync | `?` | `?` | | | huditimelineservice | `?` | `?` | | | hudiutilities | `69.72% <50.00%> (-0.06%)` | `0.00 <0.00> (ø)` | | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more. | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2651?src=pr&el=tree) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/hudi/pull/2651/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=) | `71.28% <50.00%> (-0.45%)` | `56.00 <0.00> (ø)` | | | [...ava/org/apache/hudi/exception/HoodieException.java](https://codecov.io/gh/apache/hudi/pull/2651/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZXhjZXB0aW9uL0hvb2RpZUV4Y2VwdGlvbi5qYXZh) | | | | | [.../org/apache/hudi/sink/InstantGenerateOperator.java](https://codecov.io/gh/apache/hudi/pull/2651/diff?src=pr&el=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL0luc3RhbnRHZW5lcmF0ZU9wZXJhdG9yLmphdmE=) | | | | | [...in/java/org/apache/hudi/common/model/BaseFile.java](https://codecov.io/gh/apache/hudi/pull/2651/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0Jhc2VGaWxlLmphdmE=) | | | | | [...in/java/org/apache/hudi/cli/HoodiePrintHelper.java](https://codecov.io/gh/apache/hudi/pull/2651/diff?src=pr&el=tree#diff-aHVkaS1jbGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpL0hvb2RpZVByaW50SGVscGVyLmphdmE=) | | | | | [...util/jvm/OpenJ9MemoryLayoutSpecification64bit.java](https://codecov.io/gh/apache/hudi/pull/2651/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvanZtL09wZW5KOU1lbW9yeUxheW91dFNwZWNpZmljYXRpb242NGJpdC5qYXZh) | | | | | [...mmon/table/log/block/HoodieDeleteBlockVersion.java](https://codecov.io/gh/apache/hudi/pull/2651/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL2xvZy9ibG9jay9Ib29kaWVEZWxldGVCbG9ja1ZlcnNpb24uamF2YQ==) | | | | | 
[.../org/apache/hudi/common/model/BaseAvroPayload.java](https://codecov.io/gh/apache/hudi/pull/2651/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0Jhc2VBdnJvUGF5bG9hZC5qYXZh) | | | | | [...e/hudi/common/table/log/HoodieLogFormatWriter.java](https://codecov.io/gh/apache/hudi/pull/2651/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL2xvZy9Ib29kaWVMb2dGb3JtYXRXcml0ZXIuamF2YQ==) | | | | | [...che/hudi/common/util/collection/ImmutablePair.java](https://codecov.io/gh/apache/hudi/pull/2651/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvY29sbGVjdGlvbi9JbW11dGFibGVQYWlyLmphdmE=) | | | | | ... and [405 more](https://codecov.io/gh/apache/hudi/pull/2651/diff?src=pr&el=tree-more) | | -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] garyli1019 merged pull request #2711: [hotfix] Log the error message for creating table source first
garyli1019 merged pull request #2711: URL: https://github.com/apache/hudi/pull/2711 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[hudi] branch master updated: [hotfix] Log the error message for creating table source first (#2711)
This is an automated email from the ASF dual-hosted git repository. garyli pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 29b79c9 [hotfix] Log the error message for creating table source first (#2711) 29b79c9 is described below commit 29b79c99b02d66ef9b087b56223e74c0d1f99e94 Author: Danny Chan AuthorDate: Wed Mar 24 18:25:37 2021 +0800 [hotfix] Log the error message for creating table source first (#2711) --- .../org/apache/hudi/table/HoodieTableFactory.java | 27 +++--- 1 file changed, 19 insertions(+), 8 deletions(-) diff --git a/hudi-flink/src/main/java/org/apache/hudi/table/HoodieTableFactory.java b/hudi-flink/src/main/java/org/apache/hudi/table/HoodieTableFactory.java index a2dac36..7ce8880 100644 --- a/hudi-flink/src/main/java/org/apache/hudi/table/HoodieTableFactory.java +++ b/hudi-flink/src/main/java/org/apache/hudi/table/HoodieTableFactory.java @@ -19,6 +19,7 @@ package org.apache.hudi.table; import org.apache.hudi.configuration.FlinkOptions; +import org.apache.hudi.exception.HoodieException; import org.apache.hudi.keygen.ComplexAvroKeyGenerator; import org.apache.hudi.util.AvroSchemaConverter; @@ -57,14 +58,24 @@ public class HoodieTableFactory implements TableSourceFactory, TableSin Configuration conf = FlinkOptions.fromMap(context.getTable().getOptions()); TableSchema schema = TableSchemaUtils.getPhysicalSchema(context.getTable().getSchema()); setupConfOptions(conf, context.getObjectIdentifier().getObjectName(), context.getTable(), schema); -Path path = new Path(conf.getOptional(FlinkOptions.PATH).orElseThrow(() -> -new ValidationException("Option [path] should be not empty."))); -return new HoodieTableSource( -schema, -path, -context.getTable().getPartitionKeys(), -conf.getString(FlinkOptions.PARTITION_DEFAULT_NAME), -conf); +// enclosing the code within a try catch block so that we can log the error message. +// Flink 1.11 did a bad compatibility for the old table factory, it uses the old factory +// to create the source/sink and catches all the exceptions then tries the new factory. +// +// log the error message first so that there is a chance to show the real failure cause. +try { + Path path = new Path(conf.getOptional(FlinkOptions.PATH).orElseThrow(() -> + new ValidationException("Option [path] should not be empty."))); + return new HoodieTableSource( + schema, + path, + context.getTable().getPartitionKeys(), + conf.getString(FlinkOptions.PARTITION_DEFAULT_NAME), + conf); +} catch (Throwable throwable) { + LOG.error("Create table source error", throwable); + throw new HoodieException(throwable); +} } @Override
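To see where this factory code is exercised: HoodieTableFactory is picked up when a Flink SQL table is declared with the Hudi connector, and `createTableSource` runs when that table is queried. The snippet below is a minimal, illustrative sketch (table name, columns, and the local path are made up); it assumes the Flink 1.11-era `TableEnvironment#executeSql` API and the `connector`/`path`/`table.type` options of the hudi-flink module. If the `path` option were omitted, the ValidationException raised inside `createTableSource` is now logged before being rethrown, instead of being silently swallowed by Flink's old-factory fallback.

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class HudiFlinkSourceSketch {
  public static void main(String[] args) {
    TableEnvironment tableEnv = TableEnvironment.create(
        EnvironmentSettings.newInstance().inStreamingMode().build());

    // Declaring the table routes source/sink creation through HoodieTableFactory.
    tableEnv.executeSql(
        "CREATE TABLE hudi_trips ("
            + "  uuid VARCHAR(40),"
            + "  fare DOUBLE,"
            + "  ts TIMESTAMP(3)"
            + ") WITH ("
            + "  'connector' = 'hudi',"
            + "  'path' = 'file:///tmp/hudi_trips'," // dropping this option triggers the now-logged ValidationException
            + "  'table.type' = 'MERGE_ON_READ'"
            + ")");

    // Querying the table is what invokes createTableSource.
    tableEnv.executeSql("SELECT uuid, fare FROM hudi_trips").print();
  }
}
```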
[GitHub] [hudi] Sugamber edited a comment on issue #2637: [SUPPORT] - Partial Update : update few columns of a table
Sugamber edited a comment on issue #2637: URL: https://github.com/apache/hudi/issues/2637#issuecomment-805645355 There is an open pull request for partial update for CoW table. https://github.com/apache/hudi/pull/1929 It looks like my use case is similar to this. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] IloveZiHan opened a new pull request #2713: Review the execution plan of Structured Streaming writes to Hudi
IloveZiHan opened a new pull request #2713: URL: https://github.com/apache/hudi/pull/2713 ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contributing.html before opening a pull request.* ## What is the purpose of the pull request *(For example: This pull request adds quick-start document.)* ## Brief change log *(for example:)* - *Modify AnnotationLocation checkstyle rule in checkstyle.xml* ## Verify this pull request *(Please pick either of the following options)* This pull request is a trivial rework / code cleanup without any test coverage. *(or)* This pull request is already covered by existing tests, such as *(please describe tests)*. (or) This change added tests and can be verified as follows: *(example:)* - *Added integration tests for end-to-end.* - *Added HoodieClientWriteTest to verify the change.* - *Manually verified the change by running a job locally.* ## Committer checklist - [ ] Has a corresponding JIRA in PR title & commit - [ ] Commit message is descriptive of the change - [ ] CI is green - [ ] Necessary doc changes done or have another open PR - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] IloveZiHan commented on pull request #2713: Review the execution plan of Structured Streaming writes to Hudi
IloveZiHan commented on pull request #2713: URL: https://github.com/apache/hudi/pull/2713#issuecomment-805712787 ok -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] IloveZiHan closed pull request #2713: Review the execution plan of Structured Streaming writes to Hudi
IloveZiHan closed pull request #2713: URL: https://github.com/apache/hudi/pull/2713 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] Sugamber commented on issue #2637: [SUPPORT] - Partial Update : update few columns of a table
Sugamber commented on issue #2637: URL: https://github.com/apache/hudi/issues/2637#issuecomment-805713856 @nsivabalan Do we have any timeline for these pull requests? Pull request 1: https://github.com/apache/hudi/pull/1929/ Pull request 2: https://github.com/apache/hudi/pull/2666 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (HUDI-1714) Improve code coverage of TestHoodieTimelineArchiveLog
Jagmeet Bali created HUDI-1714: -- Summary: Improve code coverage of TestHoodieTimelineArchiveLog Key: HUDI-1714 URL: https://issues.apache.org/jira/browse/HUDI-1714 Project: Apache Hudi Issue Type: Test Reporter: Jagmeet Bali Add tests for the newly added code which supports the archival of clean and rollback commits specifically around the getCleanInstantsToArchive codepath within HoodieTimelineArchiveLog -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] nsivabalan commented on issue #2675: [SUPPORT] Unable to query MOR table after schema evolution
nsivabalan commented on issue #2675: URL: https://github.com/apache/hudi/issues/2675#issuecomment-805805044 1. Do you use the RowBasedSchemaProvider and hence can't explicitly provide a schema? If you were to use your own schema registry, you could simply provide an updated schema to Hudi while writing. 2. Got it; a contribution would be welcome, and I can help review the patch. In the meantime, I will try out schema evolution on my end with a local setup. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
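Since issue #2675 is about an MOR table becoming unreadable after schema evolution, one quick sanity check, independent of which SchemaProvider is used, is Avro's own compatibility checker: new fields should carry defaults so records written with the old schema stay readable under the new one. A minimal sketch using only the Avro API (the trip/fare/tip schema is made up for illustration):

```java
import org.apache.avro.Schema;
import org.apache.avro.SchemaCompatibility;

public class SchemaEvolutionCheck {
  public static void main(String[] args) {
    // Schema the existing base/log files were written with.
    Schema oldSchema = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"trip\",\"fields\":["
            + "{\"name\":\"uuid\",\"type\":\"string\"},"
            + "{\"name\":\"fare\",\"type\":\"double\"}]}");

    // Evolved schema: the added field is nullable with a default, so old records remain readable.
    Schema newSchema = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"trip\",\"fields\":["
            + "{\"name\":\"uuid\",\"type\":\"string\"},"
            + "{\"name\":\"fare\",\"type\":\"double\"},"
            + "{\"name\":\"tip\",\"type\":[\"null\",\"double\"],\"default\":null}]}");

    // COMPATIBLE means data written with oldSchema can be read back with newSchema.
    SchemaCompatibility.SchemaPairCompatibility result =
        SchemaCompatibility.checkReaderWriterCompatibility(newSchema, oldSchema);
    System.out.println(result.getType());
  }
}
```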
[GitHub] [hudi] jsbali commented on a change in pull request #2677: Added tests to TestHoodieTimelineArchiveLog for the archival of compl…
jsbali commented on a change in pull request #2677: URL: https://github.com/apache/hudi/pull/2677#discussion_r600461519 ## File path: hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/io/TestHoodieTimelineArchiveLog.java ## @@ -388,6 +391,31 @@ public void testArchiveCommitSavepointNoHole() throws IOException { "Archived commits should always be safe"); } + @Test + public void testArchiveRollbacks() throws IOException { +HoodieWriteConfig cfg = HoodieWriteConfig.newBuilder().withPath(basePath) + .withSchema(HoodieTestDataGenerator.TRIP_EXAMPLE_SCHEMA).withParallelism(2, 2).forTable("test-trip-table") + .withCompactionConfig(HoodieCompactionConfig.newBuilder().retainCommits(1).archiveCommitsWith(2, 3).build()) +.build(); + +createCommitAndRollbackFile("100", "101", false); +createCommitAndRollbackFile("102", "103", false); +createCommitAndRollbackFile("104", "105", false); +createCommitAndRollbackFile("106", "107", false); + +HoodieTable table = HoodieSparkTable.create(cfg, context); +HoodieTimelineArchiveLog archiveLog = new HoodieTimelineArchiveLog(cfg, table); + +assertTrue(archiveLog.archiveIfRequired(context)); +HoodieTimeline timeline = metaClient.getActiveTimeline().reload().getCommitsTimeline().filterCompletedInstants(); +assertEquals(2, timeline.countInstants(), +"first two commits must have been archived"); +assertFalse(metaClient.getActiveTimeline().containsInstant(new HoodieInstant(false, HoodieTimeline.ROLLBACK_ACTION, "101")), +"first rollback must have been archived"); +assertFalse(metaClient.getActiveTimeline().containsInstant(new HoodieInstant(false, HoodieTimeline.ROLLBACK_ACTION, "103")), Review comment: Done -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
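For readers skimming the new tests: the write config they build is what drives archival. Below is a condensed restatement of that configuration, mirroring the builder calls quoted above; the comment on min/max semantics is a reading of the assertions in these tests rather than a statement of the full archival algorithm.

```java
import org.apache.hudi.common.testutils.HoodieTestDataGenerator;
import org.apache.hudi.config.HoodieCompactionConfig;
import org.apache.hudi.config.HoodieWriteConfig;

public class ArchivalConfigSketch {
  public static HoodieWriteConfig archivalConfig(String basePath) {
    // Once the active timeline holds more than 3 instants, archival trims it back
    // to 2, moving older instants (including clean/rollback ones) to the archived
    // timeline; retainCommits(1) is the cleaner retention consulted beforehand.
    return HoodieWriteConfig.newBuilder()
        .withPath(basePath)
        .withSchema(HoodieTestDataGenerator.TRIP_EXAMPLE_SCHEMA)
        .withParallelism(2, 2)
        .forTable("test-trip-table")
        .withCompactionConfig(HoodieCompactionConfig.newBuilder()
            .retainCommits(1)
            .archiveCommitsWith(2, 3) // (minCommitsToKeep, maxCommitsToKeep)
            .build())
        .build();
  }
}
```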
[GitHub] [hudi] jsbali commented on a change in pull request #2677: Added tests to TestHoodieTimelineArchiveLog for the archival of compl…
jsbali commented on a change in pull request #2677: URL: https://github.com/apache/hudi/pull/2677#discussion_r600462395 ## File path: hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/io/TestHoodieTimelineArchiveLog.java ## @@ -491,6 +519,166 @@ public void testConvertCommitMetadata() { assertEquals(expectedCommitMetadata.getOperationType(), WriteOperationType.INSERT.toString()); } + @Test + public void testArchiveCompletedClean() throws IOException { +HoodieWriteConfig cfg = + HoodieWriteConfig.newBuilder().withPath(basePath).withSchema(HoodieTestDataGenerator.TRIP_EXAMPLE_SCHEMA) +.withParallelism(2, 2).forTable("test-trip-table") + .withCompactionConfig(HoodieCompactionConfig.newBuilder().retainCommits(1).archiveCommitsWith(2, 3).build()) +.build(); +metaClient = HoodieTableMetaClient.reload(metaClient); + +createCleanMetadata("10", false); +createCleanMetadata("11", false); +createCleanMetadata("12", false); +HoodieInstant notArchivedInstant1 = new HoodieInstant(State.COMPLETED, "clean", "12"); +createCleanMetadata("13", false); +HoodieInstant notArchivedInstant2 = new HoodieInstant(State.COMPLETED, "clean", "13"); + +HoodieTable table = HoodieSparkTable.create(cfg, context, metaClient); +HoodieTimelineArchiveLog archiveLog = new HoodieTimelineArchiveLog(cfg, table); + +archiveLog.archiveIfRequired(context); + +List notArchivedInstants = metaClient.getActiveTimeline().reload().getInstants().collect(Collectors.toList()); +//There will be 3 * 2 files but due to TimelineLayoutV1 this will show as 2. +assertEquals(2, notArchivedInstants.size(), "Not archived instants should be 2"); +assertEquals(notArchivedInstants, Arrays.asList(notArchivedInstant1, notArchivedInstant2), ""); + } + + @Test + public void testArchiveCompletedRollback() throws IOException { +HoodieWriteConfig cfg = + HoodieWriteConfig.newBuilder().withPath(basePath).withSchema(HoodieTestDataGenerator.TRIP_EXAMPLE_SCHEMA) +.withParallelism(2, 2).forTable("test-trip-table") + .withCompactionConfig(HoodieCompactionConfig.newBuilder().retainCommits(1).archiveCommitsWith(2, 3).build()) +.build(); +metaClient = HoodieTableMetaClient.reload(metaClient); + +createCommitAndRollbackFile("6", "10", false); +createCommitAndRollbackFile("8", "11", false); +createCommitAndRollbackFile("7", "12", false); +HoodieInstant notArchivedInstant1 = new HoodieInstant(State.COMPLETED, "rollback", "12"); + +createCommitAndRollbackFile("5", "13", false); +HoodieInstant notArchivedInstant2 = new HoodieInstant(State.COMPLETED, "rollback", "13"); + +HoodieTable table = HoodieSparkTable.create(cfg, context, metaClient); +HoodieTimelineArchiveLog archiveLog = new HoodieTimelineArchiveLog(cfg, table); + +archiveLog.archiveIfRequired(context); + +List notArchivedInstants = metaClient.getActiveTimeline().reload().getRollbackTimeline().getInstants().collect(Collectors.toList()); +//There will be 2 * 2 files but due to TimelineLayoutV1 this will show as 2. 
+assertEquals(2, notArchivedInstants.size(), "Not archived instants should be 2"); +assertEquals(notArchivedInstants, Arrays.asList(notArchivedInstant1, notArchivedInstant2), ""); + } + + @Test + public void testArchiveCompletedShouldRetainMinInstantsIfInstantsGreaterThanMaxtoKeep() throws IOException { +int minInstants = 2; +int maxInstants = 10; +HoodieWriteConfig cfg = + HoodieWriteConfig.newBuilder().withPath(basePath).withSchema(HoodieTestDataGenerator.TRIP_EXAMPLE_SCHEMA) +.withParallelism(2, 2).forTable("test-trip-table") + .withCompactionConfig(HoodieCompactionConfig.newBuilder().retainCommits(1).archiveCommitsWith(minInstants, maxInstants).build()) +.build(); +metaClient = HoodieTableMetaClient.reload(metaClient); +for (int i = 0; i < maxInstants + 2; i++) { + createCleanMetadata(i + "", false); +} + +HoodieTable table = HoodieSparkTable.create(cfg, context, metaClient); +HoodieTimelineArchiveLog archiveLog = new HoodieTimelineArchiveLog(cfg, table); + +archiveLog.archiveIfRequired(context); +assertEquals(minInstants, metaClient.getActiveTimeline().reload().getInstants().count()); + } + + @Test + public void testArchiveCompletedShouldNotArchiveIfInstantsLessThanMaxtoKeep() throws IOException { +int minInstants = 2; +int maxInstants = 10; +HoodieWriteConfig cfg = + HoodieWriteConfig.newBuilder().withPath(basePath).withSchema(HoodieTestDataGenerator.TRIP_EXAMPLE_SCHEMA) +.withParallelism(2, 2).forTable("test-trip-table") + .withCompactionConfig(HoodieCompactionConfig.newBuilder().retainCommits(1).archiveCom
[GitHub] [hudi] jsbali commented on a change in pull request #2677: Added tests to TestHoodieTimelineArchiveLog for the archival of compl…
jsbali commented on a change in pull request #2677: URL: https://github.com/apache/hudi/pull/2677#discussion_r600463160 ## File path: hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/io/TestHoodieTimelineArchiveLog.java ## @@ -491,6 +519,166 @@ public void testConvertCommitMetadata() { assertEquals(expectedCommitMetadata.getOperationType(), WriteOperationType.INSERT.toString()); } + @Test + public void testArchiveCompletedClean() throws IOException { +HoodieWriteConfig cfg = + HoodieWriteConfig.newBuilder().withPath(basePath).withSchema(HoodieTestDataGenerator.TRIP_EXAMPLE_SCHEMA) +.withParallelism(2, 2).forTable("test-trip-table") + .withCompactionConfig(HoodieCompactionConfig.newBuilder().retainCommits(1).archiveCommitsWith(2, 3).build()) +.build(); +metaClient = HoodieTableMetaClient.reload(metaClient); + +createCleanMetadata("10", false); +createCleanMetadata("11", false); +createCleanMetadata("12", false); +HoodieInstant notArchivedInstant1 = new HoodieInstant(State.COMPLETED, "clean", "12"); +createCleanMetadata("13", false); +HoodieInstant notArchivedInstant2 = new HoodieInstant(State.COMPLETED, "clean", "13"); + +HoodieTable table = HoodieSparkTable.create(cfg, context, metaClient); +HoodieTimelineArchiveLog archiveLog = new HoodieTimelineArchiveLog(cfg, table); + +archiveLog.archiveIfRequired(context); + +List notArchivedInstants = metaClient.getActiveTimeline().reload().getInstants().collect(Collectors.toList()); +//There will be 3 * 2 files but due to TimelineLayoutV1 this will show as 2. +assertEquals(2, notArchivedInstants.size(), "Not archived instants should be 2"); +assertEquals(notArchivedInstants, Arrays.asList(notArchivedInstant1, notArchivedInstant2), ""); + } + + @Test + public void testArchiveCompletedRollback() throws IOException { +HoodieWriteConfig cfg = + HoodieWriteConfig.newBuilder().withPath(basePath).withSchema(HoodieTestDataGenerator.TRIP_EXAMPLE_SCHEMA) +.withParallelism(2, 2).forTable("test-trip-table") + .withCompactionConfig(HoodieCompactionConfig.newBuilder().retainCommits(1).archiveCommitsWith(2, 3).build()) +.build(); +metaClient = HoodieTableMetaClient.reload(metaClient); + +createCommitAndRollbackFile("6", "10", false); +createCommitAndRollbackFile("8", "11", false); +createCommitAndRollbackFile("7", "12", false); +HoodieInstant notArchivedInstant1 = new HoodieInstant(State.COMPLETED, "rollback", "12"); + +createCommitAndRollbackFile("5", "13", false); +HoodieInstant notArchivedInstant2 = new HoodieInstant(State.COMPLETED, "rollback", "13"); + +HoodieTable table = HoodieSparkTable.create(cfg, context, metaClient); +HoodieTimelineArchiveLog archiveLog = new HoodieTimelineArchiveLog(cfg, table); + +archiveLog.archiveIfRequired(context); + +List notArchivedInstants = metaClient.getActiveTimeline().reload().getRollbackTimeline().getInstants().collect(Collectors.toList()); +//There will be 2 * 2 files but due to TimelineLayoutV1 this will show as 2. 
+assertEquals(2, notArchivedInstants.size(), "Not archived instants should be 2"); +assertEquals(notArchivedInstants, Arrays.asList(notArchivedInstant1, notArchivedInstant2), ""); + } + + @Test + public void testArchiveCompletedShouldRetainMinInstantsIfInstantsGreaterThanMaxtoKeep() throws IOException { +int minInstants = 2; +int maxInstants = 10; +HoodieWriteConfig cfg = + HoodieWriteConfig.newBuilder().withPath(basePath).withSchema(HoodieTestDataGenerator.TRIP_EXAMPLE_SCHEMA) +.withParallelism(2, 2).forTable("test-trip-table") + .withCompactionConfig(HoodieCompactionConfig.newBuilder().retainCommits(1).archiveCommitsWith(minInstants, maxInstants).build()) +.build(); +metaClient = HoodieTableMetaClient.reload(metaClient); +for (int i = 0; i < maxInstants + 2; i++) { + createCleanMetadata(i + "", false); +} + +HoodieTable table = HoodieSparkTable.create(cfg, context, metaClient); +HoodieTimelineArchiveLog archiveLog = new HoodieTimelineArchiveLog(cfg, table); + +archiveLog.archiveIfRequired(context); +assertEquals(minInstants, metaClient.getActiveTimeline().reload().getInstants().count()); + } + + @Test + public void testArchiveCompletedShouldNotArchiveIfInstantsLessThanMaxtoKeep() throws IOException { +int minInstants = 2; +int maxInstants = 10; +HoodieWriteConfig cfg = + HoodieWriteConfig.newBuilder().withPath(basePath).withSchema(HoodieTestDataGenerator.TRIP_EXAMPLE_SCHEMA) +.withParallelism(2, 2).forTable("test-trip-table") + .withCompactionConfig(HoodieCompactionConfig.newBuilder().retainCommits(1).archiveCom
[GitHub] [hudi] jsbali commented on a change in pull request #2677: Added tests to TestHoodieTimelineArchiveLog for the archival of compl…
jsbali commented on a change in pull request #2677: URL: https://github.com/apache/hudi/pull/2677#discussion_r600463420 ## File path: hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/io/TestHoodieTimelineArchiveLog.java ## @@ -491,6 +519,166 @@ public void testConvertCommitMetadata() { assertEquals(expectedCommitMetadata.getOperationType(), WriteOperationType.INSERT.toString()); } + @Test + public void testArchiveCompletedClean() throws IOException { +HoodieWriteConfig cfg = + HoodieWriteConfig.newBuilder().withPath(basePath).withSchema(HoodieTestDataGenerator.TRIP_EXAMPLE_SCHEMA) +.withParallelism(2, 2).forTable("test-trip-table") + .withCompactionConfig(HoodieCompactionConfig.newBuilder().retainCommits(1).archiveCommitsWith(2, 3).build()) +.build(); +metaClient = HoodieTableMetaClient.reload(metaClient); + +createCleanMetadata("10", false); +createCleanMetadata("11", false); +createCleanMetadata("12", false); +HoodieInstant notArchivedInstant1 = new HoodieInstant(State.COMPLETED, "clean", "12"); +createCleanMetadata("13", false); +HoodieInstant notArchivedInstant2 = new HoodieInstant(State.COMPLETED, "clean", "13"); + +HoodieTable table = HoodieSparkTable.create(cfg, context, metaClient); +HoodieTimelineArchiveLog archiveLog = new HoodieTimelineArchiveLog(cfg, table); + +archiveLog.archiveIfRequired(context); + +List notArchivedInstants = metaClient.getActiveTimeline().reload().getInstants().collect(Collectors.toList()); +//There will be 3 * 2 files but due to TimelineLayoutV1 this will show as 2. +assertEquals(2, notArchivedInstants.size(), "Not archived instants should be 2"); +assertEquals(notArchivedInstants, Arrays.asList(notArchivedInstant1, notArchivedInstant2), ""); + } + + @Test + public void testArchiveCompletedRollback() throws IOException { +HoodieWriteConfig cfg = + HoodieWriteConfig.newBuilder().withPath(basePath).withSchema(HoodieTestDataGenerator.TRIP_EXAMPLE_SCHEMA) +.withParallelism(2, 2).forTable("test-trip-table") + .withCompactionConfig(HoodieCompactionConfig.newBuilder().retainCommits(1).archiveCommitsWith(2, 3).build()) +.build(); +metaClient = HoodieTableMetaClient.reload(metaClient); + +createCommitAndRollbackFile("6", "10", false); +createCommitAndRollbackFile("8", "11", false); +createCommitAndRollbackFile("7", "12", false); +HoodieInstant notArchivedInstant1 = new HoodieInstant(State.COMPLETED, "rollback", "12"); + +createCommitAndRollbackFile("5", "13", false); +HoodieInstant notArchivedInstant2 = new HoodieInstant(State.COMPLETED, "rollback", "13"); + +HoodieTable table = HoodieSparkTable.create(cfg, context, metaClient); +HoodieTimelineArchiveLog archiveLog = new HoodieTimelineArchiveLog(cfg, table); + +archiveLog.archiveIfRequired(context); + +List notArchivedInstants = metaClient.getActiveTimeline().reload().getRollbackTimeline().getInstants().collect(Collectors.toList()); +//There will be 2 * 2 files but due to TimelineLayoutV1 this will show as 2. 
+assertEquals(2, notArchivedInstants.size(), "Not archived instants should be 2"); +assertEquals(notArchivedInstants, Arrays.asList(notArchivedInstant1, notArchivedInstant2), ""); + } + + @Test + public void testArchiveCompletedShouldRetainMinInstantsIfInstantsGreaterThanMaxtoKeep() throws IOException { +int minInstants = 2; +int maxInstants = 10; +HoodieWriteConfig cfg = + HoodieWriteConfig.newBuilder().withPath(basePath).withSchema(HoodieTestDataGenerator.TRIP_EXAMPLE_SCHEMA) +.withParallelism(2, 2).forTable("test-trip-table") + .withCompactionConfig(HoodieCompactionConfig.newBuilder().retainCommits(1).archiveCommitsWith(minInstants, maxInstants).build()) +.build(); +metaClient = HoodieTableMetaClient.reload(metaClient); +for (int i = 0; i < maxInstants + 2; i++) { + createCleanMetadata(i + "", false); +} + +HoodieTable table = HoodieSparkTable.create(cfg, context, metaClient); +HoodieTimelineArchiveLog archiveLog = new HoodieTimelineArchiveLog(cfg, table); + +archiveLog.archiveIfRequired(context); +assertEquals(minInstants, metaClient.getActiveTimeline().reload().getInstants().count()); + } + + @Test + public void testArchiveCompletedShouldNotArchiveIfInstantsLessThanMaxtoKeep() throws IOException { +int minInstants = 2; +int maxInstants = 10; +HoodieWriteConfig cfg = + HoodieWriteConfig.newBuilder().withPath(basePath).withSchema(HoodieTestDataGenerator.TRIP_EXAMPLE_SCHEMA) +.withParallelism(2, 2).forTable("test-trip-table") + .withCompactionConfig(HoodieCompactionConfig.newBuilder().retainCommits(1).archiveCom
[GitHub] [hudi] jsbali commented on pull request #2677: Added tests to TestHoodieTimelineArchiveLog for the archival of compl…
jsbali commented on pull request #2677: URL: https://github.com/apache/hudi/pull/2677#issuecomment-805810171 @vinothchandar added [JIRA](https://issues.apache.org/jira/browse/HUDI-1714) for it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] jsbali edited a comment on pull request #2677: Added tests to TestHoodieTimelineArchiveLog for the archival of compl…
jsbali edited a comment on pull request #2677: URL: https://github.com/apache/hudi/pull/2677#issuecomment-805810171 @vinothchandar added [HUDI-1714](https://issues.apache.org/jira/browse/HUDI-1714) for it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[hudi] branch master updated: Moving to 0.9.0-SNAPSHOT on master branch.
This is an automated email from the ASF dual-hosted git repository. garyli pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 6e803e0 Moving to 0.9.0-SNAPSHOT on master branch. 6e803e0 is described below commit 6e803e08b1328b32a5c3a6acd8168fdabc8a1e50 Author: garyli1019 AuthorDate: Wed Mar 24 21:37:14 2021 +0800 Moving to 0.9.0-SNAPSHOT on master branch. --- docker/hoodie/hadoop/base/pom.xml | 2 +- docker/hoodie/hadoop/datanode/pom.xml | 2 +- docker/hoodie/hadoop/historyserver/pom.xml | 2 +- docker/hoodie/hadoop/hive_base/pom.xml | 2 +- docker/hoodie/hadoop/namenode/pom.xml | 2 +- docker/hoodie/hadoop/pom.xml| 2 +- docker/hoodie/hadoop/prestobase/pom.xml | 2 +- docker/hoodie/hadoop/spark_base/pom.xml | 2 +- docker/hoodie/hadoop/sparkadhoc/pom.xml | 2 +- docker/hoodie/hadoop/sparkmaster/pom.xml| 2 +- docker/hoodie/hadoop/sparkworker/pom.xml| 2 +- hudi-cli/pom.xml| 2 +- hudi-client/hudi-client-common/pom.xml | 4 ++-- hudi-client/hudi-flink-client/pom.xml | 4 ++-- hudi-client/hudi-java-client/pom.xml| 4 ++-- hudi-client/hudi-spark-client/pom.xml | 4 ++-- hudi-client/pom.xml | 2 +- hudi-common/pom.xml | 2 +- hudi-examples/pom.xml | 2 +- hudi-flink/pom.xml | 2 +- hudi-hadoop-mr/pom.xml | 2 +- hudi-integ-test/pom.xml | 2 +- hudi-spark-datasource/hudi-spark-common/pom.xml | 4 ++-- hudi-spark-datasource/hudi-spark/pom.xml| 4 ++-- hudi-spark-datasource/hudi-spark2/pom.xml | 4 ++-- hudi-spark-datasource/hudi-spark3/pom.xml | 4 ++-- hudi-spark-datasource/pom.xml | 2 +- hudi-sync/hudi-dla-sync/pom.xml | 2 +- hudi-sync/hudi-hive-sync/pom.xml| 2 +- hudi-sync/hudi-sync-common/pom.xml | 2 +- hudi-sync/pom.xml | 2 +- hudi-timeline-service/pom.xml | 2 +- hudi-utilities/pom.xml | 2 +- packaging/hudi-flink-bundle/pom.xml | 2 +- packaging/hudi-hadoop-mr-bundle/pom.xml | 2 +- packaging/hudi-hive-sync-bundle/pom.xml | 2 +- packaging/hudi-integ-test-bundle/pom.xml| 2 +- packaging/hudi-presto-bundle/pom.xml| 2 +- packaging/hudi-spark-bundle/pom.xml | 2 +- packaging/hudi-timeline-server-bundle/pom.xml | 2 +- packaging/hudi-utilities-bundle/pom.xml | 2 +- pom.xml | 2 +- 42 files changed, 50 insertions(+), 50 deletions(-) diff --git a/docker/hoodie/hadoop/base/pom.xml b/docker/hoodie/hadoop/base/pom.xml index 42eb158..19a9bef 100644 --- a/docker/hoodie/hadoop/base/pom.xml +++ b/docker/hoodie/hadoop/base/pom.xml @@ -19,7 +19,7 @@ hudi-hadoop-docker org.apache.hudi -0.8.0-SNAPSHOT +0.9.0-SNAPSHOT 4.0.0 pom diff --git a/docker/hoodie/hadoop/datanode/pom.xml b/docker/hoodie/hadoop/datanode/pom.xml index 3ac8ec0..ca77f0d 100644 --- a/docker/hoodie/hadoop/datanode/pom.xml +++ b/docker/hoodie/hadoop/datanode/pom.xml @@ -19,7 +19,7 @@ hudi-hadoop-docker org.apache.hudi -0.8.0-SNAPSHOT +0.9.0-SNAPSHOT 4.0.0 pom diff --git a/docker/hoodie/hadoop/historyserver/pom.xml b/docker/hoodie/hadoop/historyserver/pom.xml index b0c5a77..c911d87 100644 --- a/docker/hoodie/hadoop/historyserver/pom.xml +++ b/docker/hoodie/hadoop/historyserver/pom.xml @@ -19,7 +19,7 @@ hudi-hadoop-docker org.apache.hudi -0.8.0-SNAPSHOT +0.9.0-SNAPSHOT 4.0.0 pom diff --git a/docker/hoodie/hadoop/hive_base/pom.xml b/docker/hoodie/hadoop/hive_base/pom.xml index 62ea4c1..3d95036 100644 --- a/docker/hoodie/hadoop/hive_base/pom.xml +++ b/docker/hoodie/hadoop/hive_base/pom.xml @@ -19,7 +19,7 @@ hudi-hadoop-docker org.apache.hudi -0.8.0-SNAPSHOT +0.9.0-SNAPSHOT 4.0.0 pom diff --git a/docker/hoodie/hadoop/namenode/pom.xml 
b/docker/hoodie/hadoop/namenode/pom.xml index dcd874c..3d3fd2f 100644 --- a/docker/hoodie/hadoop/namenode/pom.xml +++ b/docker/hoodie/hadoop/namenode/pom.xml @@ -19,7 +19,7 @@ hudi-hadoop-docker org.apache.hudi -0.8.0-SNAPSHOT +0.9.0-SNAPSHOT 4.0.0 pom diff --git a/docker/hoodie/hadoop/pom.xml b/docker/hoodie/hadoop/pom.xml index 612067d..20e8b5f 100644 --- a/docker/hoodie/hadoop/pom.xml +++ b/docker/hoodie/hadoop/pom.xml @@ -19,7 +19,7 @@ hudi org.apache.hudi -0.8.0-SNAPSHOT +0.9.0-SNAPSHOT ../../../pom.xml 4.0.0 diff --git a/docker/hoodie/hadoop/prestobase/pom.xml b/docker/hoodie/hadoop/prestobase/pom
[hudi] branch release-0.8.0 created (now 9bfd810)
This is an automated email from the ASF dual-hosted git repository. garyli pushed a change to branch release-0.8.0 in repository https://gitbox.apache.org/repos/asf/hudi.git. at 9bfd810 Create release branch for version 0.8.0. This branch includes the following new commits: new 9bfd810 Create release branch for version 0.8.0. The 1 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference.
[hudi] 01/01: Create release branch for version 0.8.0.
This is an automated email from the ASF dual-hosted git repository. garyli pushed a commit to branch release-0.8.0 in repository https://gitbox.apache.org/repos/asf/hudi.git commit 9bfd810e745202bd27980fa696d61b47922f46a9 Author: garyli1019 AuthorDate: Wed Mar 24 21:37:43 2021 +0800 Create release branch for version 0.8.0. --- docker/hoodie/hadoop/base/pom.xml | 2 +- docker/hoodie/hadoop/datanode/pom.xml | 2 +- docker/hoodie/hadoop/historyserver/pom.xml | 2 +- docker/hoodie/hadoop/hive_base/pom.xml | 2 +- docker/hoodie/hadoop/namenode/pom.xml | 2 +- docker/hoodie/hadoop/pom.xml| 2 +- docker/hoodie/hadoop/prestobase/pom.xml | 2 +- docker/hoodie/hadoop/spark_base/pom.xml | 2 +- docker/hoodie/hadoop/sparkadhoc/pom.xml | 2 +- docker/hoodie/hadoop/sparkmaster/pom.xml| 2 +- docker/hoodie/hadoop/sparkworker/pom.xml| 2 +- hudi-cli/pom.xml| 2 +- hudi-client/hudi-client-common/pom.xml | 4 ++-- hudi-client/hudi-flink-client/pom.xml | 4 ++-- hudi-client/hudi-java-client/pom.xml| 4 ++-- hudi-client/hudi-spark-client/pom.xml | 4 ++-- hudi-client/pom.xml | 2 +- hudi-common/pom.xml | 2 +- hudi-examples/pom.xml | 2 +- hudi-flink/pom.xml | 2 +- hudi-hadoop-mr/pom.xml | 2 +- hudi-integ-test/pom.xml | 2 +- hudi-spark-datasource/hudi-spark-common/pom.xml | 4 ++-- hudi-spark-datasource/hudi-spark/pom.xml| 4 ++-- hudi-spark-datasource/hudi-spark2/pom.xml | 4 ++-- hudi-spark-datasource/hudi-spark3/pom.xml | 4 ++-- hudi-spark-datasource/pom.xml | 2 +- hudi-sync/hudi-dla-sync/pom.xml | 2 +- hudi-sync/hudi-hive-sync/pom.xml| 2 +- hudi-sync/hudi-sync-common/pom.xml | 2 +- hudi-sync/pom.xml | 2 +- hudi-timeline-service/pom.xml | 2 +- hudi-utilities/pom.xml | 2 +- packaging/hudi-flink-bundle/pom.xml | 2 +- packaging/hudi-hadoop-mr-bundle/pom.xml | 2 +- packaging/hudi-hive-sync-bundle/pom.xml | 2 +- packaging/hudi-integ-test-bundle/pom.xml| 2 +- packaging/hudi-presto-bundle/pom.xml| 2 +- packaging/hudi-spark-bundle/pom.xml | 2 +- packaging/hudi-timeline-server-bundle/pom.xml | 2 +- packaging/hudi-utilities-bundle/pom.xml | 2 +- pom.xml | 2 +- 42 files changed, 50 insertions(+), 50 deletions(-) diff --git a/docker/hoodie/hadoop/base/pom.xml b/docker/hoodie/hadoop/base/pom.xml index 42eb158..3e2bc48 100644 --- a/docker/hoodie/hadoop/base/pom.xml +++ b/docker/hoodie/hadoop/base/pom.xml @@ -19,7 +19,7 @@ hudi-hadoop-docker org.apache.hudi -0.8.0-SNAPSHOT +0.8.0-rc1 4.0.0 pom diff --git a/docker/hoodie/hadoop/datanode/pom.xml b/docker/hoodie/hadoop/datanode/pom.xml index 3ac8ec0..561d1a9 100644 --- a/docker/hoodie/hadoop/datanode/pom.xml +++ b/docker/hoodie/hadoop/datanode/pom.xml @@ -19,7 +19,7 @@ hudi-hadoop-docker org.apache.hudi -0.8.0-SNAPSHOT +0.8.0-rc1 4.0.0 pom diff --git a/docker/hoodie/hadoop/historyserver/pom.xml b/docker/hoodie/hadoop/historyserver/pom.xml index b0c5a77..b06a238 100644 --- a/docker/hoodie/hadoop/historyserver/pom.xml +++ b/docker/hoodie/hadoop/historyserver/pom.xml @@ -19,7 +19,7 @@ hudi-hadoop-docker org.apache.hudi -0.8.0-SNAPSHOT +0.8.0-rc1 4.0.0 pom diff --git a/docker/hoodie/hadoop/hive_base/pom.xml b/docker/hoodie/hadoop/hive_base/pom.xml index 62ea4c1..c17c3da 100644 --- a/docker/hoodie/hadoop/hive_base/pom.xml +++ b/docker/hoodie/hadoop/hive_base/pom.xml @@ -19,7 +19,7 @@ hudi-hadoop-docker org.apache.hudi -0.8.0-SNAPSHOT +0.8.0-rc1 4.0.0 pom diff --git a/docker/hoodie/hadoop/namenode/pom.xml b/docker/hoodie/hadoop/namenode/pom.xml index dcd874c..ab7251c 100644 --- a/docker/hoodie/hadoop/namenode/pom.xml +++ b/docker/hoodie/hadoop/namenode/pom.xml @@ -19,7 +19,7 @@ hudi-hadoop-docker 
org.apache.hudi -0.8.0-SNAPSHOT +0.8.0-rc1 4.0.0 pom diff --git a/docker/hoodie/hadoop/pom.xml b/docker/hoodie/hadoop/pom.xml index 612067d..deff4ba 100644 --- a/docker/hoodie/hadoop/pom.xml +++ b/docker/hoodie/hadoop/pom.xml @@ -19,7 +19,7 @@ hudi org.apache.hudi -0.8.0-SNAPSHOT +0.8.0-rc1 ../../../pom.xml 4.0.0 diff --git a/docker/hoodie/hadoop/prestobase/pom.xml b/docker/hoodie/hadoop/prestobase/pom.xml index dea2f43..2430969 100644 --- a/docker/hoodie/hadoop/prestobase/pom.xml +++ b/docker/hoodie/hadoop/prestobase/pom.xml @@ -20,7 +20,7 @@ hudi-hadoop-docker org.apa
[jira] [Resolved] (HUDI-1712) Standardize prefix for hoodie lock configs
[ https://issues.apache.org/jira/browse/HUDI-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gary Li resolved HUDI-1712. --- Resolution: Resolved > Standardize prefix for hoodie lock configs > -- > > Key: HUDI-1712 > URL: https://issues.apache.org/jira/browse/HUDI-1712 > Project: Apache Hudi > Issue Type: Task > Components: Writer Core >Reporter: Nishith Agarwal >Assignee: Nishith Agarwal >Priority: Major > Labels: pull-request-available > Fix For: 0.8.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (HUDI-1712) Standardize prefix for hoodie lock configs
[ https://issues.apache.org/jira/browse/HUDI-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gary Li closed HUDI-1712. - > Standardize prefix for hoodie lock configs > -- > > Key: HUDI-1712 > URL: https://issues.apache.org/jira/browse/HUDI-1712 > Project: Apache Hudi > Issue Type: Task > Components: Writer Core >Reporter: Nishith Agarwal >Assignee: Nishith Agarwal >Priority: Major > Labels: pull-request-available > Fix For: 0.8.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-909) Integrate hudi with flink engine
[ https://issues.apache.org/jira/browse/HUDI-909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gary Li updated HUDI-909: - Issue Type: New Feature (was: Task) > Integrate hudi with flink engine > > > Key: HUDI-909 > URL: https://issues.apache.org/jira/browse/HUDI-909 > Project: Apache Hudi > Issue Type: New Feature >Reporter: wangxianghu#1 >Assignee: Xianghu Wang >Priority: Major > > Integrate hudi with flink engine -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-909) [UMBRELLA]Integrate hudi with flink engine
[ https://issues.apache.org/jira/browse/HUDI-909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gary Li updated HUDI-909: - Summary: [UMBRELLA]Integrate hudi with flink engine (was: Integrate hudi with flink engine) > [UMBRELLA]Integrate hudi with flink engine > -- > > Key: HUDI-909 > URL: https://issues.apache.org/jira/browse/HUDI-909 > Project: Apache Hudi > Issue Type: New Feature >Reporter: wangxianghu#1 >Assignee: Xianghu Wang >Priority: Major > > Integrate hudi with flink engine -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-1521) [UMBRELLA]HUDI Flink writer proposal
[ https://issues.apache.org/jira/browse/HUDI-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gary Li updated HUDI-1521: -- Summary: [UMBRELLA]HUDI Flink writer proposal (was: HUDI Flink writer proposal) > [UMBRELLA]HUDI Flink writer proposal > > > Key: HUDI-1521 > URL: https://issues.apache.org/jira/browse/HUDI-1521 > Project: Apache Hudi > Issue Type: New Feature > Components: Flink Integration >Reporter: Danny Chen >Assignee: Danny Chen >Priority: Major > > As the RFC-24 has described [1], we would promote the Flink writer as > following: > 1. Remove the single parallelism operator and add test framework > 2. Make the write task scalable > 3. Write as mini-batch > 4. Add a new index > So this is an umbrella issue, we would fix each as sub-tasks. > [1] > https://cwiki.apache.org/confluence/display/HUDI/RFC+-+24%3A+Hoodie+Flink+Writer+Proposal -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-1521) HUDI Flink writer proposal
[ https://issues.apache.org/jira/browse/HUDI-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gary Li updated HUDI-1521: -- Issue Type: New Feature (was: Improvement) > HUDI Flink writer proposal > -- > > Key: HUDI-1521 > URL: https://issues.apache.org/jira/browse/HUDI-1521 > Project: Apache Hudi > Issue Type: New Feature > Components: Flink Integration >Reporter: Danny Chen >Assignee: Danny Chen >Priority: Major > > As the RFC-24 has described [1], we would promote the Flink writer as > following: > 1. Remove the single parallelism operator and add test framework > 2. Make the write task scalable > 3. Write as mini-batch > 4. Add a new index > So this is an umbrella issue, we would fix each as sub-tasks. > [1] > https://cwiki.apache.org/confluence/display/HUDI/RFC+-+24%3A+Hoodie+Flink+Writer+Proposal -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-1521) [UMBRELLA]HUDI Flink writer proposal
[ https://issues.apache.org/jira/browse/HUDI-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gary Li updated HUDI-1521: -- Fix Version/s: 0.8.0 > [UMBRELLA]HUDI Flink writer proposal > > > Key: HUDI-1521 > URL: https://issues.apache.org/jira/browse/HUDI-1521 > Project: Apache Hudi > Issue Type: New Feature > Components: Flink Integration >Reporter: Danny Chen >Assignee: Danny Chen >Priority: Major > Fix For: 0.8.0 > > > As the RFC-24 has described [1], we would promote the Flink writer as > following: > 1. Remove the single parallelism operator and add test framework > 2. Make the write task scalable > 3. Write as mini-batch > 4. Add a new index > So this is an umbrella issue, we would fix each as sub-tasks. > [1] > https://cwiki.apache.org/confluence/display/HUDI/RFC+-+24%3A+Hoodie+Flink+Writer+Proposal -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-1521) [UMBRELLA] RFC-24 HUDI Flink writer proposal
[ https://issues.apache.org/jira/browse/HUDI-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gary Li updated HUDI-1521: -- Summary: [UMBRELLA] RFC-24 HUDI Flink writer proposal (was: [UMBRELLA]HUDI Flink writer proposal) > [UMBRELLA] RFC-24 HUDI Flink writer proposal > > > Key: HUDI-1521 > URL: https://issues.apache.org/jira/browse/HUDI-1521 > Project: Apache Hudi > Issue Type: New Feature > Components: Flink Integration >Reporter: Danny Chen >Assignee: Danny Chen >Priority: Major > Fix For: 0.8.0 > > > As the RFC-24 has described [1], we would promote the Flink writer as > following: > 1. Remove the single parallelism operator and add test framework > 2. Make the write task scalable > 3. Write as mini-batch > 4. Add a new index > So this is an umbrella issue, we would fix each as sub-tasks. > [1] > https://cwiki.apache.org/confluence/display/HUDI/RFC+-+24%3A+Hoodie+Flink+Writer+Proposal -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HUDI-1715) Refactor the BaseFlinkCommitActionExecutor to adapt to the new Flink writer
Gary Li created HUDI-1715: - Summary: Refactor the BaseFlinkCommitActionExecutor to adapt to the new Flink writer Key: HUDI-1715 URL: https://issues.apache.org/jira/browse/HUDI-1715 Project: Apache Hudi Issue Type: Improvement Reporter: Gary Li -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-1715) Refactor the BaseFlinkCommitActionExecutor to adapt to the new Flink writer
[ https://issues.apache.org/jira/browse/HUDI-1715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gary Li updated HUDI-1715: -- Component/s: Flink Integration > Refactor the BaseFlinkCommitActionExecutor to adapt to the new Flink writer > --- > > Key: HUDI-1715 > URL: https://issues.apache.org/jira/browse/HUDI-1715 > Project: Apache Hudi > Issue Type: Improvement > Components: Flink Integration >Reporter: Gary Li >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (HUDI-1581) Refactor the BaseFlinkCommitActionExecutor to adapt to the new Flink writer
[ https://issues.apache.org/jira/browse/HUDI-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gary Li closed HUDI-1581. - Resolution: Duplicate > Refactor the BaseFlinkCommitActionExecutor to adapt to the new Flink writer > --- > > Key: HUDI-1581 > URL: https://issues.apache.org/jira/browse/HUDI-1581 > Project: Apache Hudi > Issue Type: Sub-task > Components: Common Core >Reporter: Danny Chen >Priority: Major > > In order to adapt to the new Flink writer, the executor needs to support: > 1. specify the bucket type explicitly for a batch of records directly > 2. have control on when and how the underneath file handles roll over -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-1715) Refactor the BaseFlinkCommitActionExecutor to adapt to the new Flink writer
[ https://issues.apache.org/jira/browse/HUDI-1715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gary Li updated HUDI-1715: -- Description: In order to adapt to the new Flink writer, the executor needs to support: 1. specify the bucket type explicitly for a batch of records directly 2. have control on when and how the underneath file handles roll over > Refactor the BaseFlinkCommitActionExecutor to adapt to the new Flink writer > --- > > Key: HUDI-1715 > URL: https://issues.apache.org/jira/browse/HUDI-1715 > Project: Apache Hudi > Issue Type: Improvement > Components: Flink Integration >Reporter: Gary Li >Priority: Major > > In order to adapt to the new Flink writer, the executor needs to support: > 1. specify the bucket type explicitly for a batch of records directly > 2. have control on when and how the underneath file handles roll over -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (HUDI-1521) [UMBRELLA] RFC-24 HUDI Flink writer proposal
[ https://issues.apache.org/jira/browse/HUDI-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gary Li closed HUDI-1521. - > [UMBRELLA] RFC-24 HUDI Flink writer proposal > > > Key: HUDI-1521 > URL: https://issues.apache.org/jira/browse/HUDI-1521 > Project: Apache Hudi > Issue Type: New Feature > Components: Flink Integration >Reporter: Danny Chen >Assignee: Danny Chen >Priority: Major > Fix For: 0.8.0 > > > As the RFC-24 has described [1], we would promote the Flink writer as > following: > 1. Remove the single parallelism operator and add test framework > 2. Make the write task scalable > 3. Write as mini-batch > 4. Add a new index > So this is an umbrella issue, we would fix each as sub-tasks. > [1] > https://cwiki.apache.org/confluence/display/HUDI/RFC+-+24%3A+Hoodie+Flink+Writer+Proposal -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HUDI-1521) [UMBRELLA] RFC-24 HUDI Flink writer proposal
[ https://issues.apache.org/jira/browse/HUDI-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gary Li resolved HUDI-1521. --- Resolution: Implemented > [UMBRELLA] RFC-24 HUDI Flink writer proposal > > > Key: HUDI-1521 > URL: https://issues.apache.org/jira/browse/HUDI-1521 > Project: Apache Hudi > Issue Type: New Feature > Components: Flink Integration >Reporter: Danny Chen >Assignee: Danny Chen >Priority: Major > Fix For: 0.8.0 > > > As the RFC-24 has described [1], we would promote the Flink writer as > following: > 1. Remove the single parallelism operator and add test framework > 2. Make the write task scalable > 3. Write as mini-batch > 4. Add a new index > So this is an umbrella issue, we would fix each as sub-tasks. > [1] > https://cwiki.apache.org/confluence/display/HUDI/RFC+-+24%3A+Hoodie+Flink+Writer+Proposal -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] vburenin opened a new pull request #2714: [HUDI-1707] Reduces log level for too verbose messages from info to debug level.
vburenin opened a new pull request #2714: URL: https://github.com/apache/hudi/pull/2714 ## What is the purpose of the pull request Some log messages are too verbose and some are not easily readable. This PR moves the most verbose messages from info to debug level and improves the printout of the configuration info in DeltaStreamer. ## Verify this pull request This pull request is a trivial rework / code cleanup without any test coverage. - [x] Has a corresponding JIRA in PR title & commit - [x] Commit message is descriptive of the change - [x] CI is green -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (HUDI-1707) Improve Logging subsystem
[ https://issues.apache.org/jira/browse/HUDI-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-1707: - Labels: pull-request-available (was: ) > Improve Logging subsystem > - > > Key: HUDI-1707 > URL: https://issues.apache.org/jira/browse/HUDI-1707 > Project: Apache Hudi > Issue Type: Improvement > Components: Code Cleanup >Reporter: Volodymyr Burenin >Priority: Major > Labels: pull-request-available > > Currently Hudi produces relatively verbose logging at info level that is not > particularly useful, such as latency measurements of file system views and > printouts of commit timelines, which can be very large. > In addition, the logging subsystem is suboptimal: it formats all messages > before they are passed to the logger, so a lot of work is done regardless of > whether the logger will actually print the message. > It would also be nice to add more info-level messages that indicate which > phase Hudi is in: ideally, info-level logging should be limited so that just > looking at the logs makes it clear which phase Hudi is in and what it is > doing, without being too verbose. > TBD: Add more thoughts on logging subsystem -- This message was sent by Atlassian Jira (v8.3.4#803005)
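The description above concerns eager message formatting. As a minimal sketch of that point (not code from the PR; the class and message are made up, assuming only the log4j Logger/LogManager API already used elsewhere in the codebase): string concatenation inside an unguarded log call is evaluated even when the message is never printed, while a guarded debug call avoids the wasted work.

```
import org.apache.log4j.{LogManager, Logger}

object LoggingCostSketch {
  private val LOG: Logger = LogManager.getLogger(getClass)

  def report(commitTimeline: AnyRef): Unit = {
    // Unguarded info log: the concatenation (and commitTimeline.toString) runs
    // even if this message is never emitted -- the wasted work described above.
    LOG.info("Loaded commit timeline: " + commitTimeline)

    // Guarded debug log: the expensive rendering happens only when debug output
    // will actually be printed.
    if (LOG.isDebugEnabled) {
      LOG.debug("Loaded commit timeline: " + commitTimeline)
    }
  }
}
```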
[GitHub] [hudi] vburenin commented on a change in pull request #2687: [HUDI-1700] Hudi Meetup with Uber video link
vburenin commented on a change in pull request #2687: URL: https://github.com/apache/hudi/pull/2687#discussion_r600554090 ## File path: docs/_docs/0.7.0/1_4_powered_by.md ## @@ -146,6 +146,8 @@ Meanwhile, we build a set of data access standards based on Hudi, which provides 21. ["Meetup talk by Nishith Agarwal"](https://www.meetup.com/UberEvents/events/274924537/) - Uber Data Platforms Meetup, Dec 2020 +22. ["Apache Hudi Meetup at Uber with talks from AWS, CityStorageSystems & Uber"](https://youtu.be/cAvbBfMbaiA) - By Udit Mehrotra, Wenning Ding (AWS), Alexander Filipchik (CityStorageSystems), Prashant Wason, Satish Kotha (Uber), Feb 2021 Review comment: Still private -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] kema-wish opened a new pull request #2715: Fix non object id key
kema-wish opened a new pull request #2715: URL: https://github.com/apache/hudi/pull/2715 ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contributing.html before opening a pull request.* ## What is the purpose of the pull request *(For example: This pull request adds quick-start document.)* ## Brief change log *(for example:)* - *Modify AnnotationLocation checkstyle rule in checkstyle.xml* ## Verify this pull request *(Please pick either of the following options)* This pull request is a trivial rework / code cleanup without any test coverage. *(or)* This pull request is already covered by existing tests, such as *(please describe tests)*. (or) This change added tests and can be verified as follows: *(example:)* - *Added integration tests for end-to-end.* - *Added HoodieClientWriteTest to verify the change.* - *Manually verified the change by running a job locally.* ## Committer checklist - [ ] Has a corresponding JIRA in PR title & commit - [ ] Commit message is descriptive of the change - [ ] CI is green - [ ] Necessary doc changes done or have another open PR - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] kema-wish closed pull request #2715: Fix non object id key
kema-wish closed pull request #2715: URL: https://github.com/apache/hudi/pull/2715 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] n3nash merged pull request #2709: [HUDI-1713] Updating config name for concurrency
n3nash merged pull request #2709: URL: https://github.com/apache/hudi/pull/2709 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[hudi] branch asf-site updated: Updating config name (#2709)
This is an automated email from the ASF dual-hosted git repository. nagarwal pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/asf-site by this push: new f48fc59 Updating config name (#2709) f48fc59 is described below commit f48fc591cc2309152ed602401b973581e34a1916 Author: n3nash AuthorDate: Wed Mar 24 08:54:49 2021 -0700 Updating config name (#2709) --- docs/_docs/2_9_concurrency_control.md | 46 +-- 1 file changed, 23 insertions(+), 23 deletions(-) diff --git a/docs/_docs/2_9_concurrency_control.md b/docs/_docs/2_9_concurrency_control.md index e555d39..f3abc77 100644 --- a/docs/_docs/2_9_concurrency_control.md +++ b/docs/_docs/2_9_concurrency_control.md @@ -45,7 +45,7 @@ The following properties are needed to be set properly to turn on optimistic con ``` hoodie.write.concurrency.mode=optimistic_concurrency_control hoodie.failed.writes.cleaner.policy=LAZY -hoodie.writer.lock.provider= +hoodie.write.lock.provider= ``` There are 2 different server based lock providers that require different configuration to be set. @@ -53,23 +53,23 @@ There are 2 different server based lock providers that require different configu **`Zookeeper`** based lock provider ``` -hoodie.writer.lock.provider=org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider -hoodie.writer.lock.zookeeper.url -hoodie.writer.lock.zookeeper.port -hoodie.writer.lock.wait_time_ms -hoodie.writer.lock.num_retries -hoodie.writer.lock.lock_key -hoodie.writer.lock.zookeeper.zk_base_path +hoodie.write.lock.provider=org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider +hoodie.write.lock.zookeeper.url +hoodie.write.lock.zookeeper.port +hoodie.write.lock.wait_time_ms +hoodie.write.lock.num_retries +hoodie.write.lock.lock_key +hoodie.write.lock.zookeeper.zk_base_path ``` **`HiveMetastore`** based lock provider ``` -hoodie.writer.lock.provider=org.apache.hudi.hive.HiveMetastoreBasedLockProvider -hoodie.writer.lock.hivemetastore.database -hoodie.writer.lock.hivemetastore.table -hoodie.writer.lock.wait_time_ms -hoodie.writer.lock.num_retries +hoodie.write.lock.provider=org.apache.hudi.hive.HiveMetastoreBasedLockProvider +hoodie.write.lock.hivemetastore.database +hoodie.write.lock.hivemetastore.table +hoodie.write.lock.wait_time_ms +hoodie.write.lock.num_retries ``` `The HiveMetastore URI's are picked up from the hadoop configuration file loaded during runtime.` @@ -86,12 +86,12 @@ inputDF.write.format("hudi") .option(PRECOMBINE_FIELD_OPT_KEY, "ts") .option("hoodie.failed.writes.cleaner.policy", "LAZY") .option("hoodie.write.concurrency.mode", "optimistic_concurrency_control") - .option("hoodie.writer.lock.zookeeper.url", "zookeeper") - .option("hoodie.writer.lock.zookeeper.port", "2181") - .option("hoodie.writer.lock.wait_time_ms", "12000") - .option("hoodie.writer.lock.num_retries", "2") - .option("hoodie.writer.lock.lock_key", "test_table") - .option("hoodie.writer.lock.zookeeper.zk_base_path", "/test") + .option("hoodie.write.lock.zookeeper.url", "zookeeper") + .option("hoodie.write.lock.zookeeper.port", "2181") + .option("hoodie.write.lock.wait_time_ms", "12000") + .option("hoodie.write.lock.num_retries", "2") + .option("hoodie.write.lock.lock_key", "test_table") + .option("hoodie.write.lock.zookeeper.zk_base_path", "/test") .option(RECORDKEY_FIELD_OPT_KEY, "uuid") .option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath") .option(TABLE_NAME, tableName) @@ -128,15 +128,15 @@ Concurrent Writing to Hudi tables requires acquiring a 
lock with either Zookeepe Set the correct native lock provider client retries. NOTE that sometimes these settings are set on the server once and all clients inherit the same configs. Please check your settings before enabling optimistic concurrency. ``` -hoodie.writer.lock.wait_time_ms -hoodie.writer.lock.num_retries +hoodie.write.lock.wait_time_ms +hoodie.write.lock.num_retries ``` Set the correct hudi client retries for Zookeeper & HiveMetastore. This is useful in cases when native client retry settings cannot be changed. Please note that these retries will happen in addition to any native client retries that you may have set. ``` -hoodie.writer.lock.client.wait_time_ms -hoodie.writer.lock.client.num_retries +hoodie.write.lock.client.wait_time_ms +hoodie.write.lock.client.num_retries ``` *Setting the right values for these depends on a case by case basis; some defaults have been provided for general cases.*
[jira] [Updated] (HUDI-1713) Fix config name for concurrency
[ https://issues.apache.org/jira/browse/HUDI-1713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-1713: - Labels: pull-request-available (was: ) > Fix config name for concurrency > --- > > Key: HUDI-1713 > URL: https://issues.apache.org/jira/browse/HUDI-1713 > Project: Apache Hudi > Issue Type: Task > Components: Writer Core >Reporter: Nishith Agarwal >Assignee: Nishith Agarwal >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] n3nash edited a comment on pull request #2701: [HUDI 1623] New Hoodie Instant on disk format with end time and milliseconds granularity
n3nash edited a comment on pull request #2701: URL: https://github.com/apache/hudi/pull/2701#issuecomment-804432752 @vinothchandar Can you take an early cursory look at this PR ? I have not added any tests yet and more changes need to be done for the build to work. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[hudi] branch asf-site updated: Travis CI build asf-site
This is an automated email from the ASF dual-hosted git repository. vinoth pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/asf-site by this push: new cd78ade Travis CI build asf-site cd78ade is described below commit cd78ade8e43ce4e592df09e7ce1e775d009c44e1 Author: CI AuthorDate: Wed Mar 24 19:40:21 2021 + Travis CI build asf-site --- content/docs/concurrency_control.html | 46 +-- 1 file changed, 23 insertions(+), 23 deletions(-) diff --git a/content/docs/concurrency_control.html b/content/docs/concurrency_control.html index af97c0f..f94a03f 100644 --- a/content/docs/concurrency_control.html +++ b/content/docs/concurrency_control.html @@ -415,29 +415,29 @@ This feature is currently experimental and requires either Zookeeper or hoodie.write.concurrency.mode=optimistic_concurrency_control hoodie.failed.writes.cleaner.policy=LAZY -hoodie.writer.lock.provider=+hoodie.write.lock.provider= There are 2 different server based lock providers that require different configuration to be set. Zookeeper based lock provider -hoodie.writer.lock.provider=org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider -hoodie.writer.lock.zookeeper.url -hoodie.writer.lock.zookeeper.port -hoodie.writer.lock.wait_time_ms -hoodie.writer.lock.num_retries -hoodie.writer.lock.lock_key -hoodie.writer.lock.zookeeper.zk_base_path +hoodie.write.lock.provider=org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider +hoodie.write.lock.zookeeper.url +hoodie.write.lock.zookeeper.port +hoodie.write.lock.wait_time_ms +hoodie.write.lock.num_retries +hoodie.write.lock.lock_key +hoodie.write.lock.zookeeper.zk_base_path HiveMetastore based lock provider -hoodie.writer.lock.provider=org.apache.hudi.hive.HiveMetastoreBasedLockProvider -hoodie.writer.lock.hivemetastore.database -hoodie.writer.lock.hivemetastore.table -hoodie.writer.lock.wait_time_ms -hoodie.writer.lock.num_retries +hoodie.write.lock.provider=org.apache.hudi.hive.HiveMetastoreBasedLockProvider +hoodie.write.lock.hivemetastore.database +hoodie.write.lock.hivemetastore.table +hoodie.write.lock.wait_time_ms +hoodie.write.lock.num_retries The HiveMetastore URI's are picked up from the hadoop configuration file loaded during runtime. @@ -453,12 +453,12 @@ hoodie.writer.lock.num_retries .option(PRECOMBINE_FIELD_OPT_KEY, "ts") .option("hoodie.failed.writes.cleaner.policy", "LAZY") .option("hoodie.write.concurrency.mode", "optimistic_concurrency_control") - .option("hoodie.writer.lock.zookeeper.url", "zookeeper") - .option("hoodie.writer.lock.zookeeper.port", "2181") - .option("hoodie.writer.lock.wait_time_ms", "12000") - .option("hoodie.writer.lock.num_retries", "2") - .option("hoodie.writer.lock.lock_key", "test_table") - .option("hoodie.writer.lock.zookeeper.zk_base_path", "/test") + .option("hoodie.write.lock.zookeeper.url", "zookeeper") + .option("hoodie.write.lock.zookeeper.port", "2181") + .option("hoodie.write.lock.wait_time_ms", "12000") + .option("hoodie.write.lock.num_retries", "2") + .option("hoodie.write.lock.lock_key", "test_table") + .option("hoodie.write.lock.zookeeper.zk_base_path", "/test") .option(RECORDKEY_FIELD_OPT_KEY, "uuid") .option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath") .option(TABLE_NAME, tableName) @@ -495,14 +495,14 @@ A deltastreamer job can then be triggered as follows: Set the correct native lock provider client retries. NOTE that sometimes these settings are set on the server once and all clients inherit the same configs. 
Please check your settings before enabling optimistic concurrency. -hoodie.writer.lock.wait_time_ms -hoodie.writer.lock.num_retries +hoodie.write.lock.wait_time_ms +hoodie.write.lock.num_retries Set the correct hudi client retries for Zookeeper & HiveMetastore. This is useful in cases when native client retry settings cannot be changed. Please note that these retries will happen in addition to any native client retries that you may have set. -hoodie.writer.lock.client.wait_time_ms -hoodie.writer.lock.client.num_retries +hoodie.write.lock.client.wait_time_ms +hoodie.write.lock.client.num_retries Setting the right values for these depends on a case by case basis; some defaults have been provided for general cases.
[GitHub] [hudi] vinothchandar commented on a change in pull request #2645: [HUDI-1659] Basic Implementation Of Spark Sql Support
vinothchandar commented on a change in pull request #2645: URL: https://github.com/apache/hudi/pull/2645#discussion_r597997690 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java ## @@ -394,4 +405,36 @@ public IOType getIOType() { public HoodieBaseFile baseFileForMerge() { return baseFileToMerge; } + + /** + * A special record returned by {@link HoodieRecordPayload}, which means + * {@link HoodieMergeHandle} should just skip this record. + */ + private static class IgnoreRecord implements GenericRecord { Review comment: want to understand the need for this better. ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieWriteHandle.java ## @@ -54,9 +53,18 @@ public abstract class HoodieWriteHandle extends HoodieIOHandle { private static final Logger LOG = LogManager.getLogger(HoodieWriteHandle.class); + /** + * The input schema of the incoming dataframe. + */ + protected final Schema inputSchema; Review comment: as we discussed in the other PR, we typically call this the `writeSchema` ## File path: hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/HoodieBaseSqlTest.scala ## @@ -0,0 +1,72 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.hudi + +import java.io.File + +import org.apache.spark.sql.{Row, SparkSession} +import org.apache.spark.util.Utils +import org.scalactic.source +import org.scalatest.{BeforeAndAfterAll, FunSuite, Tag} + +class HoodieBaseSqlTest extends FunSuite with BeforeAndAfterAll { Review comment: can we start all test classes using `Test` convention. Thats what we use throughout the project. ## File path: hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/keygen/UuidKeyGenerator.java ## @@ -0,0 +1,52 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.hudi.keygen; + +import java.util.ArrayList; +import java.util.Arrays; +import java.util.UUID; +import java.util.stream.Collectors; +import org.apache.avro.generic.GenericRecord; +import org.apache.hudi.common.config.TypedProperties; +import org.apache.hudi.keygen.constant.KeyGeneratorOptions; + +/** + * A KeyGenerator which use the uuid as the record key. + */ +public class UuidKeyGenerator extends BuiltinKeyGenerator { Review comment: to revisit why we want this. ## File path: hudi-spark-datasource/hudi-spark2/src/main/antlr4/imports/SqlBase.g4 ## @@ -0,0 +1,1099 @@ +/* + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + * This file is an adaptation of Presto's presto-parser/src/main/antlr4/com/facebook/presto/sql/parser/SqlBase.g4 grammar. Review comment: I thought we planned to just use Spark SQL keywords and limit to Spark3 (which should already recognize delete/merge?). Is ant
[GitHub] [hudi] pengzhiwei2018 edited a comment on pull request #2645: [HUDI-1659] Basic Implementation Of Spark Sql Support
pengzhiwei2018 edited a comment on pull request #2645: URL: https://github.com/apache/hudi/pull/2645#issuecomment-803355840 > Getting started on this. Sorry for the delay. > > How important are the changes around writeSchema vs inputSchema and such changes to the SQL implementation? Hi @vinothchandar, thanks for your review. It's necessary to introduce the `inputSchema` & `tableSchema` to replace the original `writeSchema` for MergeInto. For example: ``` Merge Into h0 using ( select id, name, flag from s) as s0 on s0.id = h0.id when matched and flag ='u' then update set id = s0.name, name = s0.name when not matched then insert (id, name) values(s0.id, s0.name) ``` The input is `"select id, name, flag from s"`, whose schema is `(id, name, flag)`. But the record written to the table is `(id, name)` after the update & insert translation, so the inputSchema is not equal to the writeSchema and the original `writeSchema` cannot handle this scenario. I introduce the `inputSchema` & `tableSchema` to solve this problem: the `inputSchema` is used to parse the incoming record and the `tableSchema` to write & read records from the table. In most cases, except MergeInto, the `inputSchema` is the same as the `tableSchema`, so it should not affect the original logic, IMO. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
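To make the inputSchema vs tableSchema distinction concrete, here is a small Avro sketch (hypothetical record and field names, not code from the PR): a record decoded with the input schema of the MERGE source query is projected onto the table schema before it is written.

```
import org.apache.avro.SchemaBuilder
import org.apache.avro.generic.GenericData
import scala.collection.JavaConverters._

object SchemaSplitSketch extends App {
  // Schema of the incoming dataframe produced by the MERGE source query: (id, name, flag).
  val inputSchema = SchemaBuilder.record("source").fields()
    .requiredInt("id").requiredString("name").requiredString("flag").endRecord()
  // Schema of the target table: (id, name).
  val tableSchema = SchemaBuilder.record("target").fields()
    .requiredInt("id").requiredString("name").endRecord()

  // A record parsed with the input schema...
  val incoming = new GenericData.Record(inputSchema)
  incoming.put("id", 1)
  incoming.put("name", "a")
  incoming.put("flag", "u")

  // ...carries only the table's fields after the update/insert translation.
  val toWrite = new GenericData.Record(tableSchema)
  tableSchema.getFields.asScala.foreach(f => toWrite.put(f.name, incoming.get(f.name)))
  println(toWrite) // {"id": 1, "name": "a"}
}
```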
[GitHub] [hudi] vinothchandar commented on pull request #2645: [HUDI-1659] Basic Implementation Of Spark Sql Support
vinothchandar commented on pull request #2645: URL: https://github.com/apache/hudi/pull/2645#issuecomment-806141760 Let me see how/if we can simplify the inputSchema vs writeSchema thing. I went over the PR now. LGTM at a high level. A few questions though: - I see we are introducing some antlr parsing and injecting a custom parser for Spark 2.x. Is this done for backwards compat with Spark 2, and will it eventually be removed? - Do we reuse the MERGE/DELETE keywords from Spark 3? Are the Spark 3 and Spark 2 syntaxes different? Can you comment on how we are approaching all this. - Have you done any production testing of this PR? cc @kwondw could you also please chime in. We would like to land something basic, iterate, and get this out for 0.9.0 next month. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] vinothchandar commented on pull request #2645: [HUDI-1659] Basic Implementation Of Spark Sql Support
vinothchandar commented on pull request #2645: URL: https://github.com/apache/hudi/pull/2645#issuecomment-806159133 Can we also handle DELETE in this PR itself? That way, we have some basic support for all of the major DMLs -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] vinothchandar commented on pull request #2645: [HUDI-1659] Basic Implementation Of Spark Sql Support
vinothchandar commented on pull request #2645: URL: https://github.com/apache/hudi/pull/2645#issuecomment-806165264 cc @vingov this may also be useful for you -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] xiarixiaoyao commented on pull request #2645: [HUDI-1659] Basic Implementation Of Spark Sql Support
xiarixiaoyao commented on pull request #2645: URL: https://github.com/apache/hudi/pull/2645#issuecomment-806300975 @pengzhiwei2018 it's great work, thanks. The MERGE/DELETE expressions (MergeIntoTable, MergeAction, ...) in your PR are copied from v2Commands.scala in Spark 3.0, which will lead to class conflicts if we implement SQL support for Spark 3 later. Could you shade those keywords? @vinothchandar the MERGE/DELETE keywords from Spark 3 are incompatible with this PR; however, it's not a problem, we can introduce an extra module hudi-spark3-extensions to resolve those incompatibilities. I will put forward a new PR for Spark 3 in the next few days. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] xiarixiaoyao commented on a change in pull request #2651: [HUDI-1591] [RFC-26] Improve Hoodie Table Query Performance And Ease Of Use Fo…
xiarixiaoyao commented on a change in pull request #2651: URL: https://github.com/apache/hudi/pull/2651#discussion_r600990095 ## File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieFileIndex.scala ## @@ -0,0 +1,349 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hudi + +import java.util.Properties + +import scala.collection.JavaConverters._ +import org.apache.hadoop.fs.{FileStatus, Path} +import org.apache.hudi.client.common.HoodieSparkEngineContext +import org.apache.hudi.common.config.{HoodieMetadataConfig, SerializableConfiguration} +import org.apache.hudi.common.engine.HoodieLocalEngineContext +import org.apache.hudi.common.fs.FSUtils +import org.apache.hudi.common.model.HoodieBaseFile +import org.apache.hudi.common.table.{HoodieTableMetaClient, TableSchemaResolver} +import org.apache.hudi.common.table.view.HoodieTableFileSystemView +import org.apache.hudi.config.HoodieWriteConfig +import org.apache.spark.api.java.JavaSparkContext +import org.apache.spark.internal.Logging +import org.apache.spark.sql.catalyst.{InternalRow, expressions} +import org.apache.spark.sql.SparkSession +import org.apache.spark.sql.avro.SchemaConverters +import org.apache.spark.sql.catalyst.expressions.{AttributeReference, BoundReference, Expression, InterpretedPredicate} +import org.apache.spark.sql.catalyst.util.{CaseInsensitiveMap, DateTimeUtils} +import org.apache.spark.sql.execution.datasources.{FileIndex, FileStatusCache, NoopCache, PartitionDirectory, PartitionUtils} +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.types.StructType +import org.apache.spark.unsafe.types.UTF8String + +import scala.collection.mutable + +/** + * A File Index which support partition prune for hoodie snapshot and read-optimized + * query. + * Main steps to get the file list for query: + * 1、Load all files and partition values from the table path. + * 2、Do the partition prune by the partition filter condition. + * + * There are 3 cases for this: + * 1、If the partition columns size is equal to the actually partition path level, we + * read it as partitioned table.(e.g partition column is "dt", the partition path is "2021-03-10") + * + * 2、If the partition columns size is not equal to the partition path level, but the partition + * column size is "1" (e.g. partition column is "dt", but the partition path is "2021/03/10" + * who'es directory level is 3).We can still read it as a partitioned table. We will mapping the + * partition path (e.g. 2021/03/10) to the only partition column (e.g. "dt"). + * + * 3、Else the the partition columns size is not equal to the partition directory level and the + * size is great than "1" (e.g. 
partition column is "dt,hh", the partition path is "2021/03/10/12") + * , we read it as a None Partitioned table because we cannot know how to mapping the partition + * path with the partition columns in this case. + */ +case class HoodieFileIndex( + spark: SparkSession, + metaClient: HoodieTableMetaClient, + schemaSpec: Option[StructType], + options: Map[String, String], + @transient fileStatusCache: FileStatusCache = NoopCache) + extends FileIndex with Logging { + + private val basePath = metaClient.getBasePath + + @transient private val queryPath = new Path(options.getOrElse("path", "'path' option required")) + /** +* Get the schema of the table. +*/ + lazy val schema: StructType = schemaSpec.getOrElse({ +val schemaUtil = new TableSchemaResolver(metaClient) +SchemaConverters.toSqlType(schemaUtil.getTableAvroSchema) + .dataType.asInstanceOf[StructType] + }) + + /** +* Get the partition schema from the hoodie.properties. +*/ + private lazy val _partitionSchemaFromProperties: StructType = { +val tableConfig = metaClient.getTableConfig +val partitionColumns = tableConfig.getPartitionColumns +val nameFieldMap = schema.fields.map(filed => filed.name -> filed).toMap + +if (partitionColumns.isPresent) { + val partitionFields = partitionColumns.get().map(column => +nameFieldMap.getOrElse(column, throw new IllegalArgumentException(s"Cannot find column: '" + +
[GitHub] [hudi] xiarixiaoyao commented on a change in pull request #2645: [HUDI-1659] Basic Implementation Of Spark Sql Support
xiarixiaoyao commented on a change in pull request #2645: URL: https://github.com/apache/hudi/pull/2645#discussion_r600992762 ## File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/HoodieSparkSessionExtension.scala ## @@ -0,0 +1,48 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.hudi + +import org.apache.spark.SPARK_VERSION +import org.apache.spark.sql.SparkSessionExtensions +import org.apache.spark.sql.hudi.analysis.HoodieAnalysis +import org.apache.spark.sql.hudi.parser.HoodieSqlParser + +/** + * The Hoodie SparkSessionExtension for extending the syntax and add the rules. + */ +class HoodieSparkSessionExtension extends (SparkSessionExtensions => Unit) { + override def apply(extensions: SparkSessionExtensions): Unit = { +if (SPARK_VERSION.startsWith("2.")) { Review comment: some SQL expressions is restructured in spark3. for example InsertIntoStatement is used in spark3 instead of InsertIntoTable which used by spark2. it's better to introdcue a extra module for spark3 extensions -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] xiarixiaoyao commented on a change in pull request #2645: [HUDI-1659] Basic Implementation Of Spark Sql Support
xiarixiaoyao commented on a change in pull request #2645: URL: https://github.com/apache/hudi/pull/2645#discussion_r600993546 ## File path: hudi-spark-datasource/hudi-spark2/src/main/antlr4/imports/SqlBase.g4 ## @@ -0,0 +1,1099 @@ +/* + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + * This file is an adaptation of Presto's presto-parser/src/main/antlr4/com/facebook/presto/sql/parser/SqlBase.g4 grammar. Review comment: i agree with pengzhiwei2018 that introduce antlr4 for spark2. spark2 cannot use spark3‘ parser directly (many grammer is changed in spark3) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (HUDI-1716) rt view w/ MOR tables fails after schema evolution
sivabalan narayanan created HUDI-1716: - Summary: rt view w/ MOR tables fails after schema evolution Key: HUDI-1716 URL: https://issues.apache.org/jira/browse/HUDI-1716 Project: Apache Hudi Issue Type: Bug Components: Storage Management Reporter: sivabalan narayanan Looks like realtime view w/ MOR table fails if schema present in existing log file is evolved to add a new field. no issues w/ writing. but reading fails. More info: https://github.com/apache/hudi/issues/2675 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] nsivabalan commented on issue #2675: [SUPPORT] Unable to query MOR table after schema evolution
nsivabalan commented on issue #2675: URL: https://github.com/apache/hudi/issues/2675#issuecomment-806348073 Yes, you are right. I was able to reproduce the issue (local Spark) and have filed a [bug](https://issues.apache.org/jira/browse/HUDI-1716). I have yet to try out the Hive issue, but it could be the same. Appreciate any contribution :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (HUDI-1716) rt view w/ MOR tables fails after schema evolution
[ https://issues.apache.org/jira/browse/HUDI-1716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1716: -- Labels: sev:critical user-support-issues (was: ) > rt view w/ MOR tables fails after schema evolution > -- > > Key: HUDI-1716 > URL: https://issues.apache.org/jira/browse/HUDI-1716 > Project: Apache Hudi > Issue Type: Bug > Components: Storage Management >Reporter: sivabalan narayanan >Priority: Major > Labels: sev:critical, user-support-issues > Fix For: 0.9.0 > > > Looks like realtime view w/ MOR table fails if schema present in existing log > file is evolved to add a new field. no issues w/ writing. but reading fails. > > More info: https://github.com/apache/hudi/issues/2675 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-1716) rt view w/ MOR tables fails after schema evolution
[ https://issues.apache.org/jira/browse/HUDI-1716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1716: -- Fix Version/s: 0.9.0 > rt view w/ MOR tables fails after schema evolution > -- > > Key: HUDI-1716 > URL: https://issues.apache.org/jira/browse/HUDI-1716 > Project: Apache Hudi > Issue Type: Bug > Components: Storage Management >Reporter: sivabalan narayanan >Priority: Major > Fix For: 0.9.0 > > > Looks like realtime view w/ MOR table fails if schema present in existing log > file is evolved to add a new field. no issues w/ writing. but reading fails. > > More info: https://github.com/apache/hudi/issues/2675 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Reopened] (HUDI-1495) Upgrade Flink version to 1.12.0
[ https://issues.apache.org/jira/browse/HUDI-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen reopened HUDI-1495: -- Reopen for release 0.9.0 > Upgrade Flink version to 1.12.0 > --- > > Key: HUDI-1495 > URL: https://issues.apache.org/jira/browse/HUDI-1495 > Project: Apache Hudi > Issue Type: Task > Components: newbie >Reporter: Danny Chen >Assignee: Danny Chen >Priority: Major > Labels: easyfix, pull-request-available > Fix For: 0.7.0 > > > The apache Flink 1.12.0 has be released, upgrade the version to 1.12.0 in > order to adapter new Flink interfaces. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-1495) Upgrade Flink version to 1.12.0
[ https://issues.apache.org/jira/browse/HUDI-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen updated HUDI-1495: - Fix Version/s: (was: 0.7.0) 0.9.0 > Upgrade Flink version to 1.12.0 > --- > > Key: HUDI-1495 > URL: https://issues.apache.org/jira/browse/HUDI-1495 > Project: Apache Hudi > Issue Type: Task > Components: newbie >Reporter: Danny Chen >Assignee: Danny Chen >Priority: Major > Labels: easyfix, pull-request-available > Fix For: 0.9.0 > > > The apache Flink 1.12.0 has be released, upgrade the version to 1.12.0 in > order to adapter new Flink interfaces. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-1716) rt view w/ MOR tables fails after schema evolution
[ https://issues.apache.org/jira/browse/HUDI-1716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1716: -- Description: Looks like realtime view w/ MOR table fails if schema present in existing log file is evolved to add a new field. no issues w/ writing. but reading fails More info: [https://github.com/apache/hudi/issues/2675] Logs from local run: [https://gist.github.com/nsivabalan/656956ab313676617d84002ef8942198] diff with which above logs were generated: [https://gist.github.com/nsivabalan/84dad29bc1ab567ebb6ee8c63b3969ec] was: Looks like realtime view w/ MOR table fails if schema present in existing log file is evolved to add a new field. no issues w/ writing. but reading fails. More info: https://github.com/apache/hudi/issues/2675 > rt view w/ MOR tables fails after schema evolution > -- > > Key: HUDI-1716 > URL: https://issues.apache.org/jira/browse/HUDI-1716 > Project: Apache Hudi > Issue Type: Bug > Components: Storage Management >Reporter: sivabalan narayanan >Priority: Major > Labels: sev:critical, user-support-issues > Fix For: 0.9.0 > > > Looks like realtime view w/ MOR table fails if schema present in existing log > file is evolved to add a new field. no issues w/ writing. but reading fails > More info: [https://github.com/apache/hudi/issues/2675] > > Logs from local run: > [https://gist.github.com/nsivabalan/656956ab313676617d84002ef8942198] > diff with which above logs were generated: > [https://gist.github.com/nsivabalan/84dad29bc1ab567ebb6ee8c63b3969ec] > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HUDI-1711) Avro Schema Exception with Spark 3.0 in 0.7
[ https://issues.apache.org/jira/browse/HUDI-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17308346#comment-17308346 ] sivabalan narayanan commented on HUDI-1711: --- sure > Avro Schema Exception with Spark 3.0 in 0.7 > --- > > Key: HUDI-1711 > URL: https://issues.apache.org/jira/browse/HUDI-1711 > Project: Apache Hudi > Issue Type: Bug > Components: DeltaStreamer >Reporter: Balaji Varadarajan >Priority: Major > > GH: [https://github.com/apache/hudi/issues/2705] > > > {{21/03/22 10:10:35 WARN util.package: Truncated the string representation of > a plan since it was too large. This behavior can be adjusted by setting > 'spark.sql.debug.maxToStringFields'. > 21/03/22 10:10:35 ERROR executor.Executor: Exception in task 0.0 in stage 1.0 > (TID 1) > java.lang.RuntimeException: Error while decoding: > java.lang.NegativeArraySizeException: -1255727808 > createexternalrow(if (isnull(input[0, > struct, > true])) null else createexternalrow(if (input[0, > struct, > true].isNullAt) null else input[0, > struct, > true].id, if (input[0, > struct, > true].isNullAt) null else input[0, > struct, > true].name.toString, if (input[0, > struct, > true].isNullAt) null else input[0, > struct, > true].type.toString, if (input[0, > struct, > true].isNullAt) null else input[0, > struct, > true].url.toString, if (input[0, > struct, > true].isNullAt) null else input[0, > struct, > true].user.toString, if (input[0, > struct, > true].isNullAt) null else input[0, > struct, > true].password.toString, if (input[0, > struct, > true].isNullAt) null else input[0, > struct, > true].create_time.toString, if (input[0, > struct, > true].isNullAt) null else input[0, > struct, > true].create_user.toString, if (input[0, > struct, > true].isNullAt) null else input[0, > struct, > true].update_time.toString, if (input[0, > struct, > true].isNullAt) null else input[0, > struct, > true].update_user.toString, if (input[0, > struct, > true].isNullAt) null else input[0, > struct, > true].del_flag, StructField(id,IntegerType,false), > StructField(name,StringType,true), StructField(type,StringType,true), > StructField(url,StringType,true), StructField(user,StringType,true), > StructField(password,StringType,true), > StructField(create_time,StringType,true), > StructField(create_user,StringType,true), > StructField(update_time,StringType,true), > StructField(update_user,StringType,true), > StructField(del_flag,IntegerType,true)), if (isnull(input[1, > struct, > true])) null else createexternalrow(if (input[1, > struct, > true].isNullAt) null else input[1, > struct, > true].id, if (input[1, > struct, > true].isNullAt) null else input[1, > struct, > true].name.toString, if (input[1, > struct, > true].isNullAt) null else input[1, > struct, > true].type.toString, if (input[1, > struct, > true].isNullAt) null else input[1, > struct, > true].url.toString, if (input[1, > struct, > true].isNullAt) null else input[1, > struct, > true].user.toString, if (input[1, > struct, > true].isNullAt) null else input[1, > struct, > true].password.toString, if (input[1, > struct, > true].isNullAt) null else input[1, > struct, > true].create_time.toString, if (input[1, > struct, > true].isNullAt) null else input[1, > struct, > true].create_user.toString, if (input[1, > struct, > true].isNullAt) null else input[1, > struct, > true].update_time.toString, if (input[1, > struct, > true].isNullAt) null else input[1, > struct, > true].update_user.toString, if (input[1, > struct, > true].isNullAt) null else input[1, > 
struct, > true].del_flag, StructField(id,IntegerType,false), > StructField(name,StringType,true), StructField(type,StringType,true), > StructField(url,StringType,true), StructField(user,StringType,true), > StructField(password,StringType,true), > StructField(create_time,StringType,true), > StructField(create_user,StringType,true), > StructField(update_time,StringType,true), > StructField(update_user,StringType,true), > StructField(del_flag,IntegerType,true)), if (isnull(input[2, > struct, > false])) null else createexternalrow(if (input[2, > struct, > false].isNullAt) null else input[2, > struct, > false].version.toString, if (input[2, > struct, > false].isNullAt) null else input[2, > struct, > false].connector.toString, if (input[2, > struct, > false].isNullAt) null else input[2, > struct, > false].name.toString, if (input[2, > struct, > false].isNullAt) null else input[2, > struct, > false].ts_ms, if (input[2, > struct, > false].isNullAt) null else input[2, > struct, > false].snapshot.toString, if (input[2, > struct, > false].isNullAt) null else input[2, > struct, > false].db.toString, if (input[2, > struct, >
[jira] [Updated] (HUDI-1716) rt view w/ MOR tables fails after schema evolution
[ https://issues.apache.org/jira/browse/HUDI-1716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1716: -- Description: Looks like realtime view w/ MOR table fails if schema present in existing log file is evolved to add a new field. no issues w/ writing. but reading fails More info: [https://github.com/apache/hudi/issues/2675] Logs from local run: [https://gist.github.com/nsivabalan/656956ab313676617d84002ef8942198] diff with which above logs were generated: [https://gist.github.com/nsivabalan/84dad29bc1ab567ebb6ee8c63b3969ec] Steps to reproduce in spark shell: # create MOR table w/ schema1. # Ingest (with schema1) until log files are created. // verify via hudi-cli. I didn't see log files w/ just 1 batch of updates. If not, do multiple rounds until you see log files. # create a new schema2 with one new additional field. ingest a batch with schema2 that updates existing records. # read entire dataset. was: Looks like realtime view w/ MOR table fails if schema present in existing log file is evolved to add a new field. no issues w/ writing. but reading fails More info: [https://github.com/apache/hudi/issues/2675] Logs from local run: [https://gist.github.com/nsivabalan/656956ab313676617d84002ef8942198] diff with which above logs were generated: [https://gist.github.com/nsivabalan/84dad29bc1ab567ebb6ee8c63b3969ec] > rt view w/ MOR tables fails after schema evolution > -- > > Key: HUDI-1716 > URL: https://issues.apache.org/jira/browse/HUDI-1716 > Project: Apache Hudi > Issue Type: Bug > Components: Storage Management >Reporter: sivabalan narayanan >Priority: Major > Labels: sev:critical, user-support-issues > Fix For: 0.9.0 > > > Looks like realtime view w/ MOR table fails if schema present in existing log > file is evolved to add a new field. no issues w/ writing. but reading fails > More info: [https://github.com/apache/hudi/issues/2675] > > Logs from local run: > [https://gist.github.com/nsivabalan/656956ab313676617d84002ef8942198] > diff with which above logs were generated: > [https://gist.github.com/nsivabalan/84dad29bc1ab567ebb6ee8c63b3969ec] > > Steps to reproduce in spark shell: > # create MOR table w/ schema1. > # Ingest (with schema1) until log files are created. // verify via hudi-cli. > I didn't see log files w/ just 1 batch of updates. If not, do multiple rounds > until you see log files. > # create a new schema2 with one new additional field. ingest a batch with > schema2 that updates existing records. > # read entire dataset. > > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
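A rough spark-shell sketch of the reproduction steps above (a minimal sketch with made-up table name, fields, and path; not the exact script behind the linked gists, and per step 2 more upsert batches may be needed before log files show up):

```
// spark-shell: the Hudi Spark bundle is assumed on the classpath; spark.implicits._ is pre-imported.
import org.apache.spark.sql.SaveMode
import org.apache.hudi.DataSourceWriteOptions._
import org.apache.hudi.config.HoodieWriteConfig._

val basePath = "file:///tmp/hudi_mor_schema_evolution"

def upsert(df: org.apache.spark.sql.DataFrame, saveMode: SaveMode): Unit =
  df.write.format("hudi").
    option(TABLE_TYPE_OPT_KEY, "MERGE_ON_READ").
    option(RECORDKEY_FIELD_OPT_KEY, "id").
    option(PRECOMBINE_FIELD_OPT_KEY, "ts").
    option(PARTITIONPATH_FIELD_OPT_KEY, "dt").
    option(TABLE_NAME, "mor_schema_test").
    mode(saveMode).
    save(basePath)

// Step 1: create the MOR table with schema1 (id, ts, name, dt).
upsert(Seq((1, 100L, "a", "2021-03-24"), (2, 100L, "b", "2021-03-24")).toDF("id", "ts", "name", "dt"), SaveMode.Overwrite)

// Step 2: update existing keys with schema1 until log files appear (verify via hudi-cli; repeat if needed).
upsert(Seq((1, 101L, "a2", "2021-03-24")).toDF("id", "ts", "name", "dt"), SaveMode.Append)

// Step 3: update existing keys with schema2, which adds one new field ("extra").
upsert(Seq((1, 102L, "a3", "2021-03-24", "x")).toDF("id", "ts", "name", "dt", "extra"), SaveMode.Append)

// Step 4: snapshot (real-time view) read of the whole table -- this is where the read fails.
// The glob goes one partition level down to the files; adjust it to the actual layout.
spark.read.format("hudi").load(basePath + "/*/*").show(false)
```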
[GitHub] [hudi] codecov-io commented on pull request #2710: [RFC-20][HUDI-648] Implement error log/table for Datasource/DeltaStreamer/WriteClient/Compaction writes
codecov-io commented on pull request #2710: URL: https://github.com/apache/hudi/pull/2710#issuecomment-806393474 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2710?src=pr&el=h1) Report > Merging [#2710](https://codecov.io/gh/apache/hudi/pull/2710?src=pr&el=desc) (d446d2d) into [master](https://codecov.io/gh/apache/hudi/commit/d7b18783bdd6edd6355ee68714982401d3321f86?el=desc) (d7b1878) will **increase** coverage by `10.07%`. > The diff coverage is `n/a`. > :exclamation: Current head d446d2d differs from pull request most recent head 0cddd8f. Consider uploading reports for the commit 0cddd8f to get more accurate results [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2710/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2710?src=pr&el=tree) ```diff @@ Coverage Diff @@ ## master#2710 +/- ## = + Coverage 51.76% 61.84% +10.07% + Complexity 3601 332 -3269 = Files 476 54 -422 Lines 22583 1989-20594 Branches 2408 236 -2172 = - Hits 11689 1230-10459 + Misses 9877 638 -9239 + Partials 1017 121 -896 ``` | Flag | Coverage Δ | Complexity Δ | | |---|---|---|---| | hudicli | `?` | `?` | | | hudiclient | `?` | `?` | | | hudicommon | `?` | `?` | | | hudiflink | `?` | `?` | | | hudihadoopmr | `?` | `?` | | | hudisparkdatasource | `?` | `?` | | | hudisync | `?` | `?` | | | huditimelineservice | `?` | `?` | | | hudiutilities | `61.84% <ø> (-7.90%)` | `0.00 <ø> (ø)` | | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more. | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2710?src=pr&el=tree) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [...ies/exception/HoodieSnapshotExporterException.java](https://codecov.io/gh/apache/hudi/pull/2710/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2V4Y2VwdGlvbi9Ib29kaWVTbmFwc2hvdEV4cG9ydGVyRXhjZXB0aW9uLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | | | [.../apache/hudi/utilities/HoodieSnapshotExporter.java](https://codecov.io/gh/apache/hudi/pull/2710/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0hvb2RpZVNuYXBzaG90RXhwb3J0ZXIuamF2YQ==) | `5.17% <0.00%> (-83.63%)` | `0.00% <0.00%> (-28.00%)` | | | [...hudi/utilities/schema/JdbcbasedSchemaProvider.java](https://codecov.io/gh/apache/hudi/pull/2710/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9KZGJjYmFzZWRTY2hlbWFQcm92aWRlci5qYXZh) | `0.00% <0.00%> (-72.23%)` | `0.00% <0.00%> (-2.00%)` | | | [...he/hudi/utilities/transform/AWSDmsTransformer.java](https://codecov.io/gh/apache/hudi/pull/2710/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3RyYW5zZm9ybS9BV1NEbXNUcmFuc2Zvcm1lci5qYXZh) | `0.00% <0.00%> (-66.67%)` | `0.00% <0.00%> (-2.00%)` | | | [...in/java/org/apache/hudi/utilities/UtilHelpers.java](https://codecov.io/gh/apache/hudi/pull/2710/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL1V0aWxIZWxwZXJzLmphdmE=) | `40.69% <0.00%> (-23.84%)` | `27.00% <0.00%> (-6.00%)` | | | [.../hive/SlashEncodedHourPartitionValueExtractor.java](https://codecov.io/gh/apache/hudi/pull/2710/diff?src=pr&el=tree#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvU2xhc2hFbmNvZGVkSG91clBhcnRpdGlvblZhbHVlRXh0cmFjdG9yLmphdmE=) | | 
| | | [...g/apache/hudi/timeline/service/RequestHandler.java](https://codecov.io/gh/apache/hudi/pull/2710/diff?src=pr&el=tree#diff-aHVkaS10aW1lbGluZS1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL3RpbWVsaW5lL3NlcnZpY2UvUmVxdWVzdEhhbmRsZXIuamF2YQ==) | | | | | [...che/hudi/common/util/BufferedRandomAccessFile.java](https://codecov.io/gh/apache/hudi/pull/2710/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvQnVmZmVyZWRSYW5kb21BY2Nlc3NGaWxlLmphdmE=) | | | | | [...e/timeline/versioning/clean/CleanPlanMigrator.java](https://codecov.io/gh/apache/hudi/pull/2710/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL3ZlcnNpb25pbmcvY2xlYW4vQ2xlYW5QbGFuTWlncmF0b3IuamF2YQ==) | | | | | [...va/org/apache/hudi/table/format/FilePathUtils.java](https://codecov.io/gh/apache/hudi/pull/2710/diff?src=pr&el=tre
[jira] [Commented] (HUDI-1717) Metadata Table reader does not show correct view of the metadata
[ https://issues.apache.org/jira/browse/HUDI-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17308414#comment-17308414 ] Prashant Wason commented on HUDI-1717: -- [~vinothchandar] FYI > Metadata Table reader does not show correct view of the metadata > > > Key: HUDI-1717 > URL: https://issues.apache.org/jira/browse/HUDI-1717 > Project: Apache Hudi > Issue Type: Bug > Reporter: Prashant Wason > Priority: Blocker > > Dataset timeline: C1 C2 C3 Compaction.inflight C4 C5 > Metadata timeline: DC1 DC2 DC3 (DC = deltaCommit) > Assume the dataset timeline has some completed commits (C1, C2 ... C5) and an > async compaction operation in progress. Also assume that the metadata table > is synced only up to C3. > The MetadataTableWriter will not sync any more instants to the Metadata Table > since an incomplete instant (Compaction.inflight) is present next. > The same sync logic is also used by the MetadataReader to perform the > in-memory merge of the timeline. Hence, the reader will also not consider C4 and > C5, thereby providing an incorrect, older view of the FileSlices and > FileGroups. > Any future ingestion into this table MAY insert data into older versions of > the FileSlices, which will result in data loss when queried. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HUDI-1717) Metadata Table reader does not show correct view of the metadata
Prashant Wason created HUDI-1717: Summary: Metadata Table reader does not show correct view of the metadata Key: HUDI-1717 URL: https://issues.apache.org/jira/browse/HUDI-1717 Project: Apache Hudi Issue Type: Bug Reporter: Prashant Wason Dataset timeline: C1 C2 C3 Compaction.inflight C4 C5 Metadata timeline: DC1 DC2 DC3 (DC = deltaCommit) Assume the dataset timeline has some completed commits (C1, C2 ... C5) and an async compaction operation in progress. Also assume that the metadata table is synced only up to C3. The MetadataTableWriter will not sync any more instants to the Metadata Table since an incomplete instant (Compaction.inflight) is present next. The same sync logic is also used by the MetadataReader to perform the in-memory merge of the timeline. Hence, the reader will also not consider C4 and C5, thereby providing an incorrect, older view of the FileSlices and FileGroups. Any future ingestion into this table MAY insert data into older versions of the FileSlices, which will result in data loss when queried. -- This message was sent by Atlassian Jira (v8.3.4#803005)
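To make the failure mode concrete, here is a minimal Scala sketch of the "stop at the first incomplete instant" rule described in this report. The `Instant` case class and the instant names are illustrative stand-ins, not the actual metadata-table writer/reader code:

```scala
// Illustrative model of the dataset timeline from the description:
// C1 C2 C3 Compaction.inflight C4 C5
case class Instant(name: String, action: String, completed: Boolean)

val datasetTimeline = Seq(
  Instant("C1", "commit",     completed = true),
  Instant("C2", "commit",     completed = true),
  Instant("C3", "commit",     completed = true),
  Instant("CP", "compaction", completed = false), // async compaction, still inflight
  Instant("C4", "commit",     completed = true),
  Instant("C5", "commit",     completed = true)
)

// Both the metadata writer sync and the reader-side in-memory merge consider
// only the instants before the first incomplete one, so the merged view stops
// at C3: C4 and C5 stay invisible until the compaction completes, even though
// they are already committed on the dataset timeline.
val visibleToMetadata = datasetTimeline.takeWhile(_.completed).map(_.name)
println(visibleToMetadata) // List(C1, C2, C3)
```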