[jira] [Commented] (HUDI-1783) Support Huawei Cloud Object Storage
    [ https://issues.apache.org/jira/browse/HUDI-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17318424#comment-17318424 ]

vinoyang commented on HUDI-1783:
--------------------------------

[~xiaotaotao] I have given you Jira contributor permission.

> Support Huawei Cloud Object Storage
> -----------------------------------
>
>                 Key: HUDI-1783
>                 URL: https://issues.apache.org/jira/browse/HUDI-1783
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: Common Core
>            Reporter: tao meng
>            Assignee: tao meng
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.9.0
>
> add support for Huawei Cloud Object Storage

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Assigned] (HUDI-1783) Support Huawei Cloud Object Storage
    [ https://issues.apache.org/jira/browse/HUDI-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

vinoyang reassigned HUDI-1783:
------------------------------

    Assignee: tao meng

> Support Huawei Cloud Object Storage
> -----------------------------------
>
>                 Key: HUDI-1783
>                 URL: https://issues.apache.org/jira/browse/HUDI-1783
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: Common Core
>            Reporter: tao meng
>            Assignee: tao meng
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.9.0
>
> add support for Huawei Cloud Object Storage

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (HUDI-1783) Support Huawei Cloud Object Storage
    [ https://issues.apache.org/jira/browse/HUDI-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

vinoyang updated HUDI-1783:
---------------------------
    Summary: Support Huawei Cloud Object Storage  (was: support Huawei Cloud Object Storage)

> Support Huawei Cloud Object Storage
> -----------------------------------
>
>                 Key: HUDI-1783
>                 URL: https://issues.apache.org/jira/browse/HUDI-1783
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: Common Core
>            Reporter: tao meng
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.9.0
>
> add support for Huawei Cloud Object Storage

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[hudi] branch master updated: [HUDI-1783] Support Huawei Cloud Object Storage (#2796)
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git

The following commit(s) were added to refs/heads/master by this push:
     new 8d4a7fe  [HUDI-1783] Support Huawei Cloud Object Storage (#2796)

8d4a7fe is described below

commit 8d4a7fe33e041719e2509f1f8ad3667e2ae7bbb4
Author: xiarixiaoyao
AuthorDate: Sat Apr 10 13:02:11 2021 +0800

    [HUDI-1783] Support Huawei Cloud Object Storage (#2796)
---
 .../src/main/java/org/apache/hudi/common/fs/StorageSchemes.java | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/hudi-common/src/main/java/org/apache/hudi/common/fs/StorageSchemes.java b/hudi-common/src/main/java/org/apache/hudi/common/fs/StorageSchemes.java
index 7ebf641..56c9c8e 100644
--- a/hudi-common/src/main/java/org/apache/hudi/common/fs/StorageSchemes.java
+++ b/hudi-common/src/main/java/org/apache/hudi/common/fs/StorageSchemes.java
@@ -53,7 +53,9 @@ public enum StorageSchemes {
   // Databricks file system
   DBFS("dbfs", false),
   // IBM Cloud Object Storage
-  COS("cos", false);
+  COS("cos", false),
+  // Huawei Cloud Object Storage
+  OBS("obs", false);
 
   private String scheme;
   private boolean supportsAppend;
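For context, the enum above is Hudi's registry of recognized filesystem schemes. A minimal sketch of how the new entry is typically exercised, assuming the static helpers `isSchemeSupported`/`isAppendSupported` that accompany the fields shown in the diff (the helpers are not part of this commit's hunk):

```java
import org.apache.hudi.common.fs.StorageSchemes;

public class ObsSchemeCheck {
  public static void main(String[] args) {
    // With OBS("obs", false) registered, a base path such as
    // obs://bucket/warehouse/hudi_table now resolves to a known scheme.
    System.out.println(StorageSchemes.isSchemeSupported("obs")); // expected: true
    // The second constructor argument is supportsAppend, so appends are reported
    // as unsupported for OBS, same as the other object stores in the enum.
    System.out.println(StorageSchemes.isAppendSupported("obs")); // expected: false
  }
}
```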
[GitHub] [hudi] yanghua merged pull request #2796: [HUDI-1783] Support Huawei Cloud Object Storage
yanghua merged pull request #2796: URL: https://github.com/apache/hudi/pull/2796 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] yanghua commented on pull request #2796: [HUDI-1783] Support Huawei Cloud Object Storage
yanghua commented on pull request #2796: URL: https://github.com/apache/hudi/pull/2796#issuecomment-817078875

> Already updated the code. @yanghua thanks for your review, sorry for that low-level mistake.

It doesn't matter, just relax.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] xiarixiaoyao commented on pull request #2722: [HUDI-1722]hive beeline/spark-sql query specified field on mor table occur NPE
xiarixiaoyao commented on pull request #2722: URL: https://github.com/apache/hudi/pull/2722#issuecomment-817071591

@garyli1019 unit tests have been added, please review again, thanks.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] xiarixiaoyao commented on pull request #2796: [HUDI-1783] Support Huawei Cloud Object Storage
xiarixiaoyao commented on pull request #2796: URL: https://github.com/apache/hudi/pull/2796#issuecomment-817070783

Already updated the code. @yanghua thanks for your review, sorry for that low-level mistake.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] codecov-io edited a comment on pull request #2798: [HUDI-1785] Move OperationConverter to hudi-client-common for code reuse
codecov-io edited a comment on pull request #2798: URL: https://github.com/apache/hudi/pull/2798#issuecomment-817064692

# [Codecov](https://codecov.io/gh/apache/hudi/pull/2798?src=pr&el=h1) Report
> Merging [#2798](https://codecov.io/gh/apache/hudi/pull/2798?src=pr&el=desc) (81fd3bb) into [master](https://codecov.io/gh/apache/hudi/commit/18459d4045ec4a85081c227893b226a4d759f84b?el=desc) (18459d4) will **increase** coverage by `0.29%`.
> The diff coverage is `n/a`.

```diff
@@             Coverage Diff              @@
##             master    #2798      +/-  ##
============================================
+ Coverage     52.26%   52.56%   +0.29%
- Complexity     3682     3708      +26
  Files           484      484
  Lines         23094    23167      +73
  Branches       2456     2459       +3
+ Hits          12070    12177     +107
+ Misses         9959     9919      -40
- Partials       1065     1071       +6
============================================
```

| Flag | Coverage Δ | Complexity Δ |
|---|---|---|
| hudicli | `40.29% <ø> (+3.35%)` | `0.00 <ø> (ø)` |
| hudiclient | `∅ <ø> (∅)` | `0.00 <ø> (ø)` |
| hudicommon | `50.67% <ø> (-0.10%)` | `0.00 <ø> (ø)` |
| hudiflink | `56.60% <ø> (+0.02%)` | `0.00 <ø> (ø)` |
| hudihadoopmr | `33.44% <ø> (ø)` | `0.00 <ø> (ø)` |
| hudisparkdatasource | `71.33% <ø> (ø)` | `0.00 <ø> (ø)` |
| hudisync | `45.70% <ø> (+0.23%)` | `0.00 <ø> (ø)` |
| huditimelineservice | `64.36% <ø> (ø)` | `0.00 <ø> (ø)` |
| hudiutilities | `69.84% <ø> (+0.12%)` | `0.00 <ø> (ø)` |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.

| Impacted Files | Coverage Δ | Complexity Δ |
|---|---|---|
| .../org/apache/hudi/streamer/FlinkStreamerConfig.java | `0.00% <ø> (ø)` | `0.00 <0.00> (ø)` |
| ...i/utilities/deltastreamer/HoodieDeltaStreamer.java | `71.78% <ø> (+0.59%)` | `18.00 <0.00> (ø)` |
| ...s/deltastreamer/HoodieMultiTableDeltaStreamer.java | `78.39% <ø> (ø)` | `18.00 <0.00> (ø)` |
| .../apache/hudi/sink/compact/CompactionPlanEvent.java | `50.00% <0.00%> (-50.00%)` | `3.00% <0.00%> (ø%)` |
| ...pache/hudi/sink/compact/CompactionCommitEvent.java | `43.75% <0.00%> (-43.75%)` | `3.00% <0.00%> (ø%)` |
| ...i/common/table/timeline/TimelineMetadataUtils.java | `70.17% <0.00%> (-2.56%)` | `17.00% <0.00%> (ø%)` |
| ...he/hudi/sink/partitioner/BucketAssignFunction.java | `85.18% <0.00%> (-2.49%)` | `17.00% <0.00%> (+2.00%)` |
| ...in/java/org/apache/hudi/table/HoodieTableSink.java | `11.90% <0.00%> (-0.30%)` | `2.00% <0.00%> (ø%)` |
| .../common/table/timeline/HoodieArchivedTimeline.java | `0.00% <0.00%> (ø)` | `0.00% <0.00%> (ø%)` |
[GitHub] [hudi] codecov-io commented on pull request #2798: [HUDI-1785] Move OperationConverter to hudi-client-common for code reuse
codecov-io commented on pull request #2798: URL: https://github.com/apache/hudi/pull/2798#issuecomment-817064692

# [Codecov](https://codecov.io/gh/apache/hudi/pull/2798?src=pr&el=h1) Report
> Merging [#2798](https://codecov.io/gh/apache/hudi/pull/2798?src=pr&el=desc) (81fd3bb) into [master](https://codecov.io/gh/apache/hudi/commit/18459d4045ec4a85081c227893b226a4d759f84b?el=desc) (18459d4) will **increase** coverage by `0.42%`.
> The diff coverage is `n/a`.

```diff
@@             Coverage Diff              @@
##             master    #2798      +/-  ##
============================================
+ Coverage     52.26%   52.68%   +0.42%
+ Complexity     3682     3515     -167
  Files           484      461      -23
  Lines         23094    21554    -1540
  Branches       2456     2303     -153
- Hits          12070    11356     -714
+ Misses         9959     9202     -757
+ Partials       1065      996      -69
============================================
```

| Flag | Coverage Δ | Complexity Δ |
|---|---|---|
| hudicli | `40.29% <ø> (+3.35%)` | `0.00 <ø> (ø)` |
| hudiclient | `∅ <ø> (∅)` | `0.00 <ø> (ø)` |
| hudicommon | `50.67% <ø> (-0.10%)` | `0.00 <ø> (ø)` |
| hudiflink | `56.60% <ø> (+0.02%)` | `0.00 <ø> (ø)` |
| hudihadoopmr | `33.44% <ø> (ø)` | `0.00 <ø> (ø)` |
| hudisparkdatasource | `71.33% <ø> (ø)` | `0.00 <ø> (ø)` |
| hudisync | `?` | `?` |
| huditimelineservice | `?` | `?` |
| hudiutilities | `69.84% <ø> (+0.12%)` | `0.00 <ø> (ø)` |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.

| Impacted Files | Coverage Δ | Complexity Δ |
|---|---|---|
| .../org/apache/hudi/streamer/FlinkStreamerConfig.java | `0.00% <ø> (ø)` | `0.00 <0.00> (ø)` |
| ...i/utilities/deltastreamer/HoodieDeltaStreamer.java | `71.78% <ø> (+0.59%)` | `18.00 <0.00> (ø)` |
| ...s/deltastreamer/HoodieMultiTableDeltaStreamer.java | `78.39% <ø> (ø)` | `18.00 <0.00> (ø)` |
| .../apache/hudi/sink/compact/CompactionPlanEvent.java | `50.00% <0.00%> (-50.00%)` | `3.00% <0.00%> (ø%)` |
| ...pache/hudi/sink/compact/CompactionCommitEvent.java | `43.75% <0.00%> (-43.75%)` | `3.00% <0.00%> (ø%)` |
| ...i/common/table/timeline/TimelineMetadataUtils.java | `70.17% <0.00%> (-2.56%)` | `17.00% <0.00%> (ø%)` |
| ...he/hudi/sink/partitioner/BucketAssignFunction.java | `85.18% <0.00%> (-2.49%)` | `17.00% <0.00%> (+2.00%)` |
| ...in/java/org/apache/hudi/table/HoodieTableSink.java | `11.90% <0.00%> (-0.30%)` | `2.00% <0.00%> (ø%)` |
| .../common/table/timeline/HoodieArchivedTimeline.java | `0.00% <0.00%> (ø)` | `0.00% <0.00%> (ø%)` |
[GitHub] [hudi] codecov-io commented on pull request #2799: [HUDI-1784] Added print detailed stack log when hbase connection error
codecov-io commented on pull request #2799: URL: https://github.com/apache/hudi/pull/2799#issuecomment-817063994

# [Codecov](https://codecov.io/gh/apache/hudi/pull/2799?src=pr&el=h1) Report
> Merging [#2799](https://codecov.io/gh/apache/hudi/pull/2799?src=pr&el=desc) (84cf6fa) into [master](https://codecov.io/gh/apache/hudi/commit/6786581c4842e47e1a8a8e942f54003dc151c7c6?el=desc) (6786581) will **increase** coverage by `17.22%`.
> The diff coverage is `n/a`.

```diff
@@              Coverage Diff              @@
##             master    #2799       +/-  ##
=============================================
+ Coverage     52.54%   69.77%   +17.22%
+ Complexity     3707      374     -3333
  Files           485       54      -431
  Lines         23171     1995    -21176
  Branches       2459      235     -2224
- Hits          12176     1392    -10784
+ Misses         9923      473     -9450
+ Partials       1072      130      -942
=============================================
```

| Flag | Coverage Δ | Complexity Δ |
|---|---|---|
| hudicli | `?` | `?` |
| hudiclient | `?` | `?` |
| hudicommon | `?` | `?` |
| hudiflink | `?` | `?` |
| hudihadoopmr | `?` | `?` |
| hudisparkdatasource | `?` | `?` |
| hudisync | `?` | `?` |
| huditimelineservice | `?` | `?` |
| hudiutilities | `69.77% <ø> (+0.05%)` | `0.00 <ø> (ø)` |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.

| Impacted Files | Coverage Δ | Complexity Δ |
|---|---|---|
| ...in/java/org/apache/hudi/hive/HoodieHiveClient.java | | |
| .../common/bloom/HoodieDynamicBoundedBloomFilter.java | | |
| ...a/org/apache/hudi/common/bloom/InternalFilter.java | | |
| ...adoop/realtime/HoodieHFileRealtimeInputFormat.java | | |
| ...rg/apache/hudi/hadoop/HoodieROTablePathFilter.java | | |
| ...hudi/common/model/HoodieReplaceCommitMetadata.java | | |
| .../apache/hudi/common/model/CompactionOperation.java | | |
| ...va/org/apache/hudi/cli/commands/CleansCommand.java | | |
| .../apache/hudi/common/bootstrap/FileStatusUtils.java | | |
| ...a/org/apache/hudi/common/util/ReflectionUtils.java | | |
| ... and 422 more | | |

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hj2016 commented on issue #2623: org.apache.hudi.exception.HoodieDependentSystemUnavailableException:System HBASE unavailable.
hj2016 commented on issue #2623: URL: https://github.com/apache/hudi/issues/2623#issuecomment-817062980

@n3nash @nsivabalan @root18039532923 I also have the same connection problem. Debugging showed that it was caused by a jar conflict. I submitted PR https://github.com/apache/hudi/pull/2799. I hope that when the connection error is reported, a more detailed stack log will be printed to help accurately locate the problem.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (HUDI-1784) Added print detailed stack log when hbase connection error
    [ https://issues.apache.org/jira/browse/HUDI-1784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-1784:
---------------------------------
    Labels: pull-request-available  (was: )

> Added print detailed stack log when hbase connection error
> -----------------------------------------------------------
>
>                 Key: HUDI-1784
>                 URL: https://issues.apache.org/jira/browse/HUDI-1784
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: Index
>            Reporter: jing
>            Assignee: jing
>            Priority: Major
>              Labels: pull-request-available
>
> I tried to upgrade HDFS to version 3.0 and found that HBase reported a connection error even though HBase itself was healthy; debugging showed it was a jar conflict. The exception did not print a detailed stack log, so the cause of the problem could not be located precisely.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[GitHub] [hudi] hj2016 opened a new pull request #2799: [HUDI-1784] Added print detailed stack log when hbase connection error
hj2016 opened a new pull request #2799: URL: https://github.com/apache/hudi/pull/2799

## *Tips*
- *Thank you very much for contributing to Apache Hudi.*
- *Please review https://hudi.apache.org/contributing.html before opening a pull request.*

## What is the purpose of the pull request

I tried to upgrade HDFS to version 3.0 and found that HBase reported a connection error even though HBase itself was healthy; debugging showed it was a jar conflict. The exception did not print a detailed stack log, so the cause of the problem could not be located precisely. This PR prints a detailed stack log when the HBase connection errors.

## Committer checklist

 - [ ] Has a corresponding JIRA in PR title & commit
 - [ ] Commit message is descriptive of the change
 - [ ] CI is green
 - [ ] Necessary doc changes done or have another open PR
 - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
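For illustration, the change this PR describes amounts to logging the caught exception object rather than only its message, so the underlying cause (here, a jar conflict) shows up in the output. A hedged sketch; the class and method names below are hypothetical, not the PR's actual diff:

```java
import org.apache.log4j.LogManager;
import org.apache.log4j.Logger;

public class HBaseConnectionLogging {

  private static final Logger LOG = LogManager.getLogger(HBaseConnectionLogging.class);

  // Hypothetical helper: passing `e` as the second argument makes log4j print the
  // full stack trace, which is what reveals e.g. a NoSuchMethodError caused by a
  // jar conflict; logging only e.getMessage() hides it.
  public static void reportConnectionFailure(Exception e) {
    LOG.error("HBase connection establishment failed", e);
  }
}
```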
[jira] [Updated] (HUDI-1785) Move OperationConverter to hudi-client-common for code reuse
    [ https://issues.apache.org/jira/browse/HUDI-1785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-1785:
---------------------------------
    Labels: pull-request-available  (was: )

> Move OperationConverter to hudi-client-common for code reuse
> ------------------------------------------------------------
>
>                 Key: HUDI-1785
>                 URL: https://issues.apache.org/jira/browse/HUDI-1785
>             Project: Apache Hudi
>          Issue Type: Improvement
>            Reporter: Xianghu Wang
>            Assignee: Xianghu Wang
>            Priority: Major
>              Labels: pull-request-available
>
> Currently, `OperationConverter` has been introduced twice, in the `hudi-flink` and `hudi-utilities` modules; we can move it to `hudi-client-common` so it can be used by both.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[GitHub] [hudi] wangxianghu opened a new pull request #2798: [HUDI-1785] Move OperationConverter to hudi-client-common for code reuse
wangxianghu opened a new pull request #2798: URL: https://github.com/apache/hudi/pull/2798

## *Tips*
- *Thank you very much for contributing to Apache Hudi.*
- *Please review https://hudi.apache.org/contributing.html before opening a pull request.*

## What is the purpose of the pull request

Currently, `OperationConverter` has been introduced twice, in the `hudi-flink` and `hudi-utilities` modules; we can move it to `hudi-client-common` so it can be used by both.

## Committer checklist

 - [ ] Has a corresponding JIRA in PR title & commit
 - [ ] Commit message is descriptive of the change
 - [ ] CI is green
 - [ ] Necessary doc changes done or have another open PR
 - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (HUDI-1785) Move OperationConverter to hudi-client-common for code reuse
Xianghu Wang created HUDI-1785:
----------------------------------
             Summary: Move OperationConverter to hudi-client-common for code reuse
                 Key: HUDI-1785
                 URL: https://issues.apache.org/jira/browse/HUDI-1785
             Project: Apache Hudi
          Issue Type: Improvement
            Reporter: Xianghu Wang

Currently, `OperationConverter` has been introduced twice, in the `hudi-flink` and `hudi-utilities` modules; we can move it to `hudi-client-common` so it can be used by both.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
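As a rough illustration of the class being moved: in the DeltaStreamer-style configs, such a converter is a JCommander `IStringConverter` that parses an `--op` argument into the shared write-operation enum. A sketch under assumptions; the exact Hudi class name and the `WriteOperationType.fromValue` helper are assumed, not confirmed by this issue:

```java
import com.beust.jcommander.IStringConverter;
import org.apache.hudi.common.model.WriteOperationType;

// Hypothetical sketch: converts a command-line value such as "upsert" or
// "bulk_insert" into the enum shared by hudi-flink and hudi-utilities, which is
// why a single copy in hudi-client-common can serve both modules.
public class OperationConverterSketch implements IStringConverter<WriteOperationType> {
  @Override
  public WriteOperationType convert(String value) {
    return WriteOperationType.fromValue(value.toLowerCase());
  }
}
```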
[jira] [Assigned] (HUDI-1785) Move OperationConverter to hudi-client-common for code reuse
    [ https://issues.apache.org/jira/browse/HUDI-1785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xianghu Wang reassigned HUDI-1785:
----------------------------------

    Assignee: Xianghu Wang

> Move OperationConverter to hudi-client-common for code reuse
> ------------------------------------------------------------
>
>                 Key: HUDI-1785
>                 URL: https://issues.apache.org/jira/browse/HUDI-1785
>             Project: Apache Hudi
>          Issue Type: Improvement
>            Reporter: Xianghu Wang
>            Assignee: Xianghu Wang
>            Priority: Major
>
> Currently, `OperationConverter` has been introduced twice, in the `hudi-flink` and `hudi-utilities` modules; we can move it to `hudi-client-common` so it can be used by both.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Created] (HUDI-1784) Added print detailed stack log when hbase connection error
jing created HUDI-1784:
---------------------------
             Summary: Added print detailed stack log when hbase connection error
                 Key: HUDI-1784
                 URL: https://issues.apache.org/jira/browse/HUDI-1784
             Project: Apache Hudi
          Issue Type: Improvement
          Components: Index
            Reporter: jing
            Assignee: jing

I tried to upgrade HDFS to version 3.0 and found that HBase reported a connection error even though HBase itself was healthy; debugging showed it was a jar conflict. The exception did not print a detailed stack log, so the cause of the problem could not be located precisely.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[GitHub] [hudi] jintaoguan commented on pull request #2773: [HUDI-1764] Add Hudi-CLI support for clustering
jintaoguan commented on pull request #2773: URL: https://github.com/apache/hudi/pull/2773#issuecomment-817008126 Sure. Will do that. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] satishkotha edited a comment on pull request #2773: [HUDI-1764] Add Hudi-CLI support for clustering
satishkotha edited a comment on pull request #2773: URL: https://github.com/apache/hudi/pull/2773#issuecomment-817002604

@jintaoguan LGTM. Can you raise a PR to update the documentation on the [CLI page](https://hudi.apache.org/docs/deployment.html#cli) and add example command-line screenshots? The documentation is in the 'asf-site' branch; see "content/docs/deployment.html" in that branch.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] satishkotha edited a comment on pull request #2773: [HUDI-1764] Add Hudi-CLI support for clustering
satishkotha edited a comment on pull request #2773: URL: https://github.com/apache/hudi/pull/2773#issuecomment-817002604

@jintaoguan LGTM. Can you raise a PR to update the documentation on the [CLI page](https://hudi.apache.org/docs/deployment.html#cli) and add example command-line screenshots? The documentation is in the 'asf-site' branch; see "content/docs/deployment.html" in that branch.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] satishkotha commented on pull request #2773: [HUDI-1764] Add Hudi-CLI support for clustering
satishkotha commented on pull request #2773: URL: https://github.com/apache/hudi/pull/2773#issuecomment-817002604

@jintaoguan LGTM. Can you raise a PR to update the documentation on the [CLI page](https://hudi.apache.org/docs/deployment.html#cli) and add example command-line screenshots? The documentation is in the 'asf-site' branch; see "content/docs/0.7.0-deployment.html" in that branch.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] satishkotha commented on pull request #2388: [HUDI-1353] add incremental timeline support for pending clustering ops
satishkotha commented on pull request #2388: URL: https://github.com/apache/hudi/pull/2388#issuecomment-816997366

@n3nash I don't have time in the next 2-3 weeks to get this done. If you prefer, we can close this one; I can reopen (the same PR or a different one) when I'm ready.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] n3nash commented on a change in pull request #2793: [HUDI-57] Support ORC Storage
n3nash commented on a change in pull request #2793: URL: https://github.com/apache/hudi/pull/2793#discussion_r610913501

##
File path: hudi-common/src/test/java/org/apache/hudi/common/util/TestAvroOrcUtils.java
##
@@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.util;
+
+import static org.apache.hudi.common.testutils.HoodieTestDataGenerator.AVRO_SCHEMA;
+
+import java.util.Arrays;
+import java.util.List;
+import org.apache.avro.Schema;
+import org.apache.hudi.common.testutils.HoodieCommonTestHarness;
+import org.apache.orc.TypeDescription;
+
+import org.junit.jupiter.params.ParameterizedTest;
+import org.junit.jupiter.params.provider.Arguments;
+import org.junit.jupiter.params.provider.MethodSource;
+
+import static org.junit.jupiter.api.Assertions.assertEquals;
+
+public class TestAvroOrcUtils extends HoodieCommonTestHarness {
+
+  public static List testCreateOrcSchemaArgs() {
+    // the ORC schema is constructed in the same order as AVRO_SCHEMA:
+    // TRIP_SCHEMA_PREFIX, EXTRA_TYPE_SCHEMA, MAP_TYPE_SCHEMA, FARE_NESTED_SCHEMA, TIP_NESTED_SCHEMA, TRIP_SCHEMA_SUFFIX
+    // The following types are tested:
+    // DATE, DECIMAL, LONG, INT, BYTES, ARRAY, RECORD, MAP, STRING, FLOAT, DOUBLE
+    TypeDescription orcSchema = TypeDescription.fromString("struct<"

Review comment:
    Is this testing all primitive types?

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
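To make the reviewer's question concrete, here is a single-type assertion of the kind such a test would contain, using the `AvroOrcUtils.createOrcSchema(Schema)` signature visible elsewhere in this PR. The record shape below is illustrative, not `HoodieTestDataGenerator.AVRO_SCHEMA`:

```java
import org.apache.avro.Schema;
import org.apache.hudi.common.util.AvroOrcUtils;
import org.apache.orc.TypeDescription;

import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.assertEquals;

public class AvroOrcPrimitiveSketch {

  @Test
  public void longFieldMapsToOrcBigint() {
    // Avro "long" is expected to map to ORC's bigint; one such check per primitive
    // type would answer whether all primitives are covered.
    Schema avro = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"r\",\"fields\":[{\"name\":\"ts\",\"type\":\"long\"}]}");
    TypeDescription orc = AvroOrcUtils.createOrcSchema(avro);
    assertEquals(TypeDescription.fromString("struct<ts:bigint>").toString(), orc.toString());
  }
}
```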
[GitHub] [hudi] n3nash commented on a change in pull request #2793: [HUDI-57] Support ORC Storage
n3nash commented on a change in pull request #2793: URL: https://github.com/apache/hudi/pull/2793#discussion_r610913343

##
File path: hudi-common/src/main/java/org/apache/hudi/common/util/OrcReaderIterator.java
##
@@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.util;
+
+import java.util.List;
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericData.Record;
+import org.apache.orc.storage.ql.exec.vector.VectorizedRowBatch;
+import org.apache.hudi.exception.HoodieIOException;
+
+import org.apache.orc.RecordReader;
+import org.apache.orc.TypeDescription;
+
+import java.io.IOException;
+import java.util.Iterator;
+
+/**
+ * This class wraps a ORC reader and provides an iterator based api to read from an ORC file.
+ */
+public class OrcReaderIterator<T> implements Iterator<T> {

Review comment:
    Corresponding test class

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] n3nash commented on a change in pull request #2793: [HUDI-57] Support ORC Storage
n3nash commented on a change in pull request #2793: URL: https://github.com/apache/hudi/pull/2793#discussion_r610913269

##
File path: hudi-common/src/main/java/org/apache/hudi/common/util/OrcUtils.java
##
@@ -0,0 +1,172 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.util;
+
+import java.io.IOException;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.stream.Collectors;
+import org.apache.avro.Schema;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.orc.storage.ql.exec.vector.BytesColumnVector;
+import org.apache.orc.storage.ql.exec.vector.VectorizedRowBatch;
+import org.apache.hudi.avro.HoodieAvroWriteSupport;
+import org.apache.hudi.common.bloom.BloomFilter;
+import org.apache.hudi.common.bloom.BloomFilterFactory;
+import org.apache.hudi.common.bloom.BloomFilterTypeCode;
+import org.apache.hudi.common.model.HoodieRecord;
+import org.apache.hudi.exception.HoodieException;
+import org.apache.hudi.exception.HoodieIOException;
+import org.apache.hudi.exception.MetadataNotFoundException;
+import org.apache.orc.OrcFile;
+import org.apache.orc.OrcProto.UserMetadataItem;
+import org.apache.orc.Reader;
+import org.apache.orc.Reader.Options;
+import org.apache.orc.RecordReader;
+import org.apache.orc.TypeDescription;
+
+/**
+ * Utility functions for ORC files.
+ */
+public class OrcUtils {

Review comment:
    Add corresponding test class to test all public methods

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] n3nash commented on a change in pull request #2793: [HUDI-57] Support ORC Storage
n3nash commented on a change in pull request #2793: URL: https://github.com/apache/hudi/pull/2793#discussion_r610913156

##
File path: hudi-common/src/main/java/org/apache/hudi/io/storage/HoodieOrcReader.java
##
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.io.storage;
+
+import java.io.IOException;
+import java.util.Iterator;
+import java.util.Set;
+import org.apache.avro.Schema;
+import org.apache.avro.generic.IndexedRecord;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.hudi.common.bloom.BloomFilter;
+import org.apache.hudi.common.util.AvroOrcUtils;
+import org.apache.hudi.common.util.OrcReaderIterator;
+import org.apache.hudi.common.util.OrcUtils;
+import org.apache.hudi.exception.HoodieIOException;
+import org.apache.orc.OrcFile;
+import org.apache.orc.Reader;
+import org.apache.orc.Reader.Options;
+import org.apache.orc.RecordReader;
+import org.apache.orc.TypeDescription;
+
+public class HoodieOrcReader<R extends IndexedRecord> implements HoodieFileReader<R> {

Review comment:
    Add corresponding test class

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] n3nash commented on a change in pull request #2793: [HUDI-57] Support ORC Storage
n3nash commented on a change in pull request #2793: URL: https://github.com/apache/hudi/pull/2793#discussion_r610911298

##
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/storage/HoodieOrcWriter.java
##
@@ -0,0 +1,176 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.io.storage;
+
+import static org.apache.hudi.avro.HoodieAvroWriteSupport.HOODIE_AVRO_BLOOM_FILTER_METADATA_KEY;
+import static org.apache.hudi.avro.HoodieAvroWriteSupport.HOODIE_BLOOM_FILTER_TYPE_CODE;
+import static org.apache.hudi.avro.HoodieAvroWriteSupport.HOODIE_MAX_RECORD_KEY_FOOTER;
+import static org.apache.hudi.avro.HoodieAvroWriteSupport.HOODIE_MIN_RECORD_KEY_FOOTER;
+
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.List;
+import java.util.concurrent.atomic.AtomicLong;
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.generic.IndexedRecord;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.orc.storage.ql.exec.vector.ColumnVector;
+import org.apache.orc.storage.ql.exec.vector.VectorizedRowBatch;
+import org.apache.hudi.common.bloom.BloomFilter;
+import org.apache.hudi.common.bloom.HoodieDynamicBoundedBloomFilter;
+import org.apache.orc.OrcFile;
+import org.apache.orc.TypeDescription;
+import org.apache.orc.Writer;
+import org.apache.hudi.avro.HoodieAvroUtils;
+import org.apache.hudi.common.engine.TaskContextSupplier;
+import org.apache.hudi.common.fs.FSUtils;
+import org.apache.hudi.common.fs.HoodieWrapperFileSystem;
+import org.apache.hudi.common.model.HoodieRecord;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.util.AvroOrcUtils;
+
+public class HoodieOrcWriter<T extends HoodieRecordPayload, R extends IndexedRecord>
+    implements HoodieFileWriter<R> {
+  private static final AtomicLong RECORD_INDEX = new AtomicLong(1);
+
+  private final long maxFileSize;
+  private final Schema avroSchema;
+  private final List<TypeDescription> fieldTypes;
+  private final List<String> fieldNames;
+  private final VectorizedRowBatch batch;
+  private final Writer writer;
+
+  private final Path file;
+  private final HoodieWrapperFileSystem fs;
+  private final String instantTime;
+  private final TaskContextSupplier taskContextSupplier;
+
+  private HoodieOrcConfig orcConfig;
+  private String minRecordKey;
+  private String maxRecordKey;
+
+  public HoodieOrcWriter(String instantTime, Path file, HoodieOrcConfig config, Schema schema,
+      TaskContextSupplier taskContextSupplier) throws IOException {
+
+    Configuration conf = FSUtils.registerFileSystem(file, config.getHadoopConf());
+    this.file = HoodieWrapperFileSystem.convertToHoodiePath(file, conf);
+    this.fs = (HoodieWrapperFileSystem) this.file.getFileSystem(conf);
+    this.instantTime = instantTime;
+    this.taskContextSupplier = taskContextSupplier;
+
+    this.avroSchema = schema;
+    final TypeDescription orcSchema = AvroOrcUtils.createOrcSchema(avroSchema);
+    this.fieldTypes = orcSchema.getChildren();
+    this.fieldNames = orcSchema.getFieldNames();
+    this.maxFileSize = config.getMaxFileSize();
+    this.batch = orcSchema.createRowBatch();
+    OrcFile.WriterOptions writerOptions = OrcFile.writerOptions(conf)
+        .blockSize(config.getBlockSize())
+        .stripeSize(config.getStripeSize())
+        .compress(config.getCompressionKind())
+        .bufferSize(config.getBlockSize())
+        .fileSystem(fs)
+        .setSchema(orcSchema);
+    this.writer = OrcFile.createWriter(this.file, writerOptions);
+    this.orcConfig = config;
+  }
+
+  @Override
+  public void writeAvroWithMetadata(R avroRecord, HoodieRecord record) throws IOException {
+    String seqId = HoodieRecord.generateSequenceId(instantTime, taskContextSupplier.getPartitionIdSupplier().get(),
+        RECORD_INDEX.getAndIncrement());
+    HoodieAvroUtils.addHoodieKeyToRecord((GenericRecord) avroRecord, record.getRecordKey(),
+        record.getPartitionPath(), file.getName());
+    HoodieAvroUtils
+        .addCommitMetadataToRecord((GenericRecord) avroRecord, instantTime, seqId);
+
+    writeAvro(record.getRecordKey(), avroRecord);
+  }
+
+  @Override
+  public
[GitHub] [hudi] n3nash commented on a change in pull request #2793: [HUDI-57] Support ORC Storage
n3nash commented on a change in pull request #2793: URL: https://github.com/apache/hudi/pull/2793#discussion_r610909843

##
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieStorageConfig.java
##
@@ -39,10 +39,19 @@
   public static final String DEFAULT_PARQUET_BLOCK_SIZE_BYTES = DEFAULT_PARQUET_FILE_MAX_BYTES;
   public static final String PARQUET_PAGE_SIZE_BYTES = "hoodie.parquet.page.size";
   public static final String DEFAULT_PARQUET_PAGE_SIZE_BYTES = String.valueOf(1 * 1024 * 1024);
+
   public static final String HFILE_FILE_MAX_BYTES = "hoodie.hfile.max.file.size";
   public static final String HFILE_BLOCK_SIZE_BYTES = "hoodie.hfile.block.size";
   public static final String DEFAULT_HFILE_BLOCK_SIZE_BYTES = String.valueOf(1 * 1024 * 1024);
   public static final String DEFAULT_HFILE_FILE_MAX_BYTES = String.valueOf(120 * 1024 * 1024);
+
+  public static final String ORC_FILE_MAX_BYTES = "hoodie.orc.max.file.size";
+  public static final String DEFAULT_ORC_FILE_MAX_BYTES = String.valueOf(120 * 1024 * 1024);
+  public static final String ORC_STRIPE_SIZE = "hoodie.orc.stripe.size";

Review comment:
    Can you please add comments on what the stripe size is used for?

##
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieStorageConfig.java
##
@@ -39,10 +39,19 @@
+  public static final String ORC_STRIPE_SIZE = "hoodie.orc.stripe.size";
+  public static final String DEFAULT_ORC_STRIPE_SIZE = String.valueOf(64 * 1024 * 1024);
+  public static final String ORC_BLOCK_SIZE = "hoodie.orc.block.size";

Review comment:
    Same for block size

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
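Illustrative wording for the comments the reviewer is requesting; the explanations below are general ORC facts, not text from this PR:

```java
public class OrcConfigCommentsSketch {
  // An ORC stripe is the unit of independent reading within a file: the writer
  // buffers rows and flushes them stripe by stripe, so larger stripes favor big
  // sequential reads at the cost of more writer memory.
  public static final String ORC_STRIPE_SIZE = "hoodie.orc.stripe.size";
  public static final String DEFAULT_ORC_STRIPE_SIZE = String.valueOf(64 * 1024 * 1024);

  // File-system block size requested when the ORC file is created; it governs how
  // the file is laid out across HDFS blocks.
  public static final String ORC_BLOCK_SIZE = "hoodie.orc.block.size";
}
```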
[GitHub] [hudi] vansimonsen commented on issue #2294: [SUPPORT] java.lang.IllegalArgumentException: Can not create a Path from an empty string on non partitioned COW table
vansimonsen commented on issue #2294: URL: https://github.com/apache/hudi/issues/2294#issuecomment-816889220

> @vansimonsen : Can you open a new GH issue with the stack trace.
> @rubenssoto : I believe the PR landed before 0.7.0 was cut.

@bvaradar https://github.com/apache/hudi/issues/2797

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] vansimonsen opened a new issue #2797: [SUPPORT] Can not create a Path from an empty string on unpartitioned table
vansimonsen opened a new issue #2797: URL: https://github.com/apache/hudi/issues/2797

**Describe the problem you faced**

* An exception is thrown when trying to sync unpartitioned tables to the Hive metastore (AWS Glue Data Catalog) using Hudi (tested on `0.6.0`, `0.7.0` and `0.8.0`)
* Using Hudi on AWS EMR, with pyspark
* Hudi config for unpartitioned tables:

```
hudiConfig = {
    "hoodie.datasource.write.precombine.field": ,
    "hoodie.datasource.write.recordkey.field": _PRIMARY_KEY_COLUMN,
    "hoodie.datasource.write.keygenerator.class": 'org.apache.hudi.keygen.NonpartitionedKeyGenerator',
    "hoodie.datasource.hive_sync.partition_extractor_class": 'org.apache.hudi.hive.NonPartitionedExtractor',
    "hoodie.datasource.write.hive_style_partitioning": "true",
    "className": "org.apache.hudi",
    "hoodie.datasource.hive_sync.use_jdbc": "false",
    "hoodie.consistency.check.enabled": "true",
    "hoodie.datasource.hive_sync.database": DB_NAME,
    "hoodie.datasource.hive_sync.enable": "true",
    "hoodie.datasource.hive_sync.support_timestamp": "true",
}
```

**To Reproduce**

Steps to reproduce the behavior:
1. Run Hudi with Hive integration
2. Try to create an unpartitioned table with the config specified above

**Expected behavior**

The table is created without throwing the exception, with no partitions and no `default` partitionpath.

**Environment Description**

* Hudi version : `0.6.0`, `0.7.0` and `0.8.0`
* Spark version : `2.4.7`
* Hive version : AWS Glue Data Catalog integration on EMR
* Hadoop version : Amazon Hadoop distribution
* Storage (HDFS/S3/GCS..) : S3
* Running on Docker? (yes/no) : no

**Stacktrace**

```
org.apache.hudi.hive.HoodieHiveSyncException: Failed to get update last commit time synced to 20210407181606
	at org.apache.hudi.hive.HoodieHiveClient.updateLastCommitTimeSynced(HoodieHiveClient.java:496)
	at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:150)
	at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:94)
	at org.apache.hudi.HoodieSparkSqlWriter$.org$apache$hudi$HoodieSparkSqlWriter$$syncHive(HoodieSparkSqlWriter.scala:355)
	at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$metaSync$2.apply(HoodieSparkSqlWriter.scala:403)
	at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$metaSync$2.apply(HoodieSparkSqlWriter.scala:399)
	at scala.collection.mutable.HashSet.foreach(HashSet.scala:78)
	at org.apache.hudi.HoodieSparkSqlWriter$.metaSync(HoodieSparkSqlWriter.scala:399)
	at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:460)
	at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:217)
	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:134)
	at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:173)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:169)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:197)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:194)
	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:169)
	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:114)
	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:112)
	at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:696)
	at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:696)
	at org.apache.spark.sql.execution.SQLExecution$.org$apache$spark$sql$execution$SQLExecution$$executeQuery$1(SQLExecution.scala:83)
	at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1$$anonfun$apply$1.apply(SQLExecution.scala:94)
	at org.apache.spark.sql.execution.QueryExecutionMetrics$.withMetrics(QueryExecutionMetrics.scala:141)
	at org.apache.spark.sql.execution.SQLExecution$.org$apache$spark$sql$execution$SQLExecution$$withMetrics(SQLExecution.scala:178)
	at
```
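For reference, a minimal sketch of how a config map like the one quoted above is handed to the Hudi datasource, written against the Java Spark API. The table name, key columns and path are placeholders standing in for the values elided in the issue, and a SparkSession with Hive support is assumed:

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;

public class UnpartitionedWriteSketch {

  // Sketch only: "order", "id" and "updated_at" are hypothetical placeholders,
  // not values taken from the reporter's job.
  public static void write(Dataset<Row> df, String basePath) {
    Map<String, String> hudiConfig = new HashMap<>();
    hudiConfig.put("hoodie.table.name", "order");
    hudiConfig.put("hoodie.datasource.write.recordkey.field", "id");
    hudiConfig.put("hoodie.datasource.write.precombine.field", "updated_at");
    hudiConfig.put("hoodie.datasource.write.keygenerator.class",
        "org.apache.hudi.keygen.NonpartitionedKeyGenerator");
    hudiConfig.put("hoodie.datasource.hive_sync.partition_extractor_class",
        "org.apache.hudi.hive.NonPartitionedExtractor");
    hudiConfig.put("hoodie.datasource.hive_sync.enable", "true");

    // The failure in the stack trace above surfaces during the hive-sync step of
    // this write, after the data files themselves are committed.
    df.write().format("hudi").options(hudiConfig).mode(SaveMode.Append).save(basePath);
  }
}
```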
[GitHub] [hudi] codecov-io edited a comment on pull request #2740: [HUDI-1055] Remove hardcoded parquet in tests
codecov-io edited a comment on pull request #2740: URL: https://github.com/apache/hudi/pull/2740#issuecomment-809855336

# [Codecov](https://codecov.io/gh/apache/hudi/pull/2740?src=pr&el=h1) Report
> Merging [#2740](https://codecov.io/gh/apache/hudi/pull/2740?src=pr&el=desc) (3e41f49) into [master](https://codecov.io/gh/apache/hudi/commit/6786581c4842e47e1a8a8e942f54003dc151c7c6?el=desc) (6786581) will **decrease** coverage by `43.17%`.
> The diff coverage is `n/a`.

```diff
@@              Coverage Diff              @@
##             master    #2740       +/-  ##
=============================================
- Coverage     52.54%    9.37%   -43.18%
+ Complexity     3707       48     -3659
  Files           485       54      -431
  Lines         23171     1995    -21176
  Branches       2459      235     -2224
- Hits          12176      187    -11989
+ Misses         9923     1795     -8128
+ Partials       1072       13     -1059
=============================================
```

| Flag | Coverage Δ | Complexity Δ |
|---|---|---|
| hudicli | `?` | `?` |
| hudiclient | `?` | `?` |
| hudicommon | `?` | `?` |
| hudiflink | `?` | `?` |
| hudihadoopmr | `?` | `?` |
| hudisparkdatasource | `?` | `?` |
| hudisync | `?` | `?` |
| huditimelineservice | `?` | `?` |
| hudiutilities | `9.37% <ø> (-60.36%)` | `0.00 <ø> (ø)` |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.

| Impacted Files | Coverage Δ | Complexity Δ |
|---|---|---|
| ...va/org/apache/hudi/utilities/IdentitySplitter.java | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` |
| ...va/org/apache/hudi/utilities/schema/SchemaSet.java | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` |
| ...a/org/apache/hudi/utilities/sources/RowSource.java | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` |
| .../org/apache/hudi/utilities/sources/AvroSource.java | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` |
| .../org/apache/hudi/utilities/sources/JsonSource.java | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` |
| ...rg/apache/hudi/utilities/sources/CsvDFSSource.java | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-10.00%)` |
| ...g/apache/hudi/utilities/sources/JsonDFSSource.java | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` |
| ...apache/hudi/utilities/sources/JsonKafkaSource.java | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-6.00%)` |
| ...pache/hudi/utilities/sources/ParquetDFSSource.java | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-5.00%)` |
| ...lities/schema/SchemaProviderWithPostProcessor.java | `0.00% <0.00%>
[GitHub] [hudi] TeRS-K commented on a change in pull request #2740: [HUDI-1055] Remove hardcoded parquet in tests
TeRS-K commented on a change in pull request #2740: URL: https://github.com/apache/hudi/pull/2740#discussion_r610789102

##
File path: hudi-cli/src/main/scala/org/apache/hudi/cli/SparkHelpers.scala
##
@@ -40,7 +40,7 @@ import scala.collection.mutable._
 object SparkHelpers {
   @throws[Exception]
   def skipKeysAndWriteNewFile(instantTime: String, fs: FileSystem, sourceFile: Path, destinationFile: Path, keysToSkip: Set[String]) {
-    val sourceRecords = ParquetUtils.readAvroRecords(fs.getConf, sourceFile)
+    val sourceRecords = new ParquetUtils().readAvroRecords(fs.getConf, sourceFile)

Review comment:
    I removed all instances of `new ParquetUtils()` except for the one in `HoodieSparkBootstrapSchemaProvider::getBootstrapSourceSchema`, as `readSchema()` returns a different schema type per file format: for Parquet, `readSchema` returns an `org.apache.parquet.schema.MessageType`; for ORC (in future work), `readSchema` would return an `org.apache.orc.TypeDescription`. So this method cannot be pulled up into the base class. Does that make sense?

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
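A compact sketch of the design trade-off described in this comment: methods with a format-independent return type can live on a shared base class, while `readSchema` stays per-format because Parquet reports `org.apache.parquet.schema.MessageType` and ORC reports `org.apache.orc.TypeDescription`. The class names below are hypothetical and the method bodies are stubs:

```java
import java.util.Collections;
import java.util.List;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.orc.TypeDescription;

// Hypothetical names, illustrating the pattern only.
abstract class BaseFileUtilsSketch {
  // Shared: every format can surface its rows as Avro records.
  abstract List<GenericRecord> readAvroRecords(Configuration conf, Path filePath);
}

class OrcUtilsSketch extends BaseFileUtilsSketch {
  @Override
  List<GenericRecord> readAvroRecords(Configuration conf, Path filePath) {
    return Collections.emptyList(); // stub; real code would iterate an ORC reader
  }

  // Kept off the base class: the ORC schema type shares no useful supertype with
  // Parquet's MessageType, so callers need the concrete utils class anyway.
  TypeDescription readSchema(Configuration conf, Path filePath) {
    return TypeDescription.fromString("struct<id:string>"); // stub
  }
}
```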
[GitHub] [hudi] rubenssoto edited a comment on issue #2509: [SUPPORT]Hudi saves TimestampType as bigInt
rubenssoto edited a comment on issue #2509: URL: https://github.com/apache/hudi/issues/2509#issuecomment-816812025

Hello guys, @satishkotha @nsivabalan

The Athena behavior changed (screenshots):
https://user-images.githubusercontent.com/36298331/114213658-a841c400-9939-11eb-9fc9-a2e51761908e.png
https://user-images.githubusercontent.com/36298331/114213672-ad067800-9939-11eb-872d-fe264f97fcde.png

This is great news, but the BETWEEN operator doesn't work. For example, this query works:

select count(1) FROM "order" WHERE created_date >= cast('2021-04-07 03:00:00.000' as timestamp)

and this query doesn't work:

select count(1) FROM "order" WHERE created_date between cast('2021-04-09 14:00:00.000' as timestamp) and cast('2021-04-09 15:00:00.000' as timestamp) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
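(Since standard SQL defines `a BETWEEN x AND y` as `a >= x AND a <= y`, an equivalent rewrite of the failing query, untested against Athena here, would be: `select count(1) FROM "order" WHERE created_date >= cast('2021-04-09 14:00:00.000' as timestamp) and created_date <= cast('2021-04-09 15:00:00.000' as timestamp)`.)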
[GitHub] [hudi] TeRS-K closed pull request #2793: [HUDI-57] Support ORC Storage
TeRS-K closed pull request #2793: URL: https://github.com/apache/hudi/pull/2793 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] yanghua commented on a change in pull request #2796: [HUDI-1783]support Huawei Cloud Object Storage
yanghua commented on a change in pull request #2796: URL: https://github.com/apache/hudi/pull/2796#discussion_r610693252

## File path: hudi-common/src/main/java/org/apache/hudi/common/fs/StorageSchemes.java

@@ -53,8 +53,9 @@
   // Databricks file system
   DBFS("dbfs", false),
   // IBM Cloud Object Storage
-  COS("cos", false);
-
+  COS("cos", false),
+  // Huawei Cloud Object Storage
+  OBS("obs", false);

Review comment: Can we add an empty line like before? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
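(For context, a self-contained sketch of the enum pattern being extended in this file; the `isSchemeSupported` and `isAppendSupported` lookups below are illustrative assumptions, not the actual Hudi methods.)

```java
public enum StorageSchemesSketch {
  // Each constant pairs a filesystem scheme with whether it supports append.
  COS("cos", false),   // IBM Cloud Object Storage
  OBS("obs", false);   // Huawei Cloud Object Storage

  private final String scheme;
  private final boolean supportsAppend;

  StorageSchemesSketch(String scheme, boolean supportsAppend) {
    this.scheme = scheme;
    this.supportsAppend = supportsAppend;
  }

  // Hypothetical lookup: true if the given scheme is registered at all.
  public static boolean isSchemeSupported(String scheme) {
    for (StorageSchemesSketch s : values()) {
      if (s.scheme.equals(scheme)) {
        return true;
      }
    }
    return false;
  }

  // Hypothetical check: true if the registered scheme allows appends.
  public static boolean isAppendSupported(String scheme) {
    for (StorageSchemesSketch s : values()) {
      if (s.scheme.equals(scheme) && s.supportsAppend) {
        return true;
      }
    }
    return false;
  }
}
```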
[GitHub] [hudi] kimberlyamandalu commented on issue #2620: [SUPPORT] Performance Tuning: Slow stages (Building Workload Profile & Getting Small files from partitions) during Hudi Writes
kimberlyamandalu commented on issue #2620: URL: https://github.com/apache/hudi/issues/2620#issuecomment-816698689

> @kimberlyamandalu : do you have a support ticket for your question. lets not pollute this issue. we can create a new one for your use-case and can discuss over there

Hi @nsivabalan, no, I do not have a separate ticket for my question. I thought it might be related to this issue, so I chimed in. I can open a new ticket for my use case so we can isolate it. Sorry for the confusion. Thanks. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] xiarixiaoyao commented on pull request #2796: [HUDI-1783]support Huawei Cloud Object Storage
xiarixiaoyao commented on pull request #2796: URL: https://github.com/apache/hudi/pull/2796#issuecomment-816688293 @nsivabalan could you please help review this PR? Thanks. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (HUDI-1783) support Huawei Cloud Object Storage
[ https://issues.apache.org/jira/browse/HUDI-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-1783: - Labels: pull-request-available (was: ) > support Huawei Cloud Object Storage > > > Key: HUDI-1783 > URL: https://issues.apache.org/jira/browse/HUDI-1783 > Project: Apache Hudi > Issue Type: Bug > Components: Common Core >Reporter: tao meng >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > > add support for Huawei Cloud Object Storage -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-1752) Add HoodieFlinkClient InsertOverwrite
[ https://issues.apache.org/jira/browse/HUDI-1752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1752: -- Fix Version/s: (was: 0.8.0) 0.9.0 > Add HoodieFlinkClient InsertOverwrite > - > > Key: HUDI-1752 > URL: https://issues.apache.org/jira/browse/HUDI-1752 > Project: Apache Hudi > Issue Type: New Feature > Components: CLI, Flink Integration >Reporter: xurunbai >Priority: Minor > Labels: features > Fix For: 0.9.0 > > > Add HoodieFlinkClient InsertOverwrite -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] xiarixiaoyao opened a new pull request #2796: [HUDI-1783]support Huawei Cloud Object Storage
xiarixiaoyao opened a new pull request #2796: URL: https://github.com/apache/hudi/pull/2796

## *Tips*
- *Thank you very much for contributing to Apache Hudi.*
- *Please review https://hudi.apache.org/contributing.html before opening a pull request.*

## What is the purpose of the pull request

Add support for Huawei Cloud Object Storage.

## Brief change log

*(for example:)*
- *Modify AnnotationLocation checkstyle rule in checkstyle.xml*

## Verify this pull request

*(Please pick either of the following options)*

This pull request is a trivial rework / code cleanup without any test coverage.

*(or)*

This pull request is already covered by existing tests, such as *(please describe tests)*.

(or)

This change added tests and can be verified as follows:

*(example:)*
- *Added integration tests for end-to-end.*
- *Added HoodieClientWriteTest to verify the change.*
- *Manually verified the change by running a job locally.*

## Committer checklist

- [ ] Has a corresponding JIRA in PR title & commit
- [ ] Commit message is descriptive of the change
- [ ] CI is green
- [ ] Necessary doc changes done or have another open PR
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (HUDI-1783) support Huawei Cloud Object Storage
tao meng created HUDI-1783: -- Summary: support Huawei Cloud Object Storage Key: HUDI-1783 URL: https://issues.apache.org/jira/browse/HUDI-1783 Project: Apache Hudi Issue Type: Bug Components: Common Core Reporter: tao meng Fix For: 0.9.0 add support for Huawei Cloud Object Storage -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] Magicbeanbuyer commented on issue #2498: [SUPPORT] Hudi MERGE_ON_READ load to dataframe fails for the versions [0.6.0],[0.7.0] and runs for [0.5.3]
Magicbeanbuyer commented on issue #2498: URL: https://github.com/apache/hudi/issues/2498#issuecomment-816555746 Hey @nsivabalan, we have wrapped up our POC and therefore no longer have the setup. Sorry we couldn't contribute further to the issue. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] xiarixiaoyao edited a comment on pull request #2720: [HUDI-1719]hive on spark/mr,Incremental query of the mor table, the partition field is incorrect
xiarixiaoyao edited a comment on pull request #2720: URL: https://github.com/apache/hudi/pull/2720#issuecomment-816553709

@nsivabalan I found a question: why is this test in TestHoodieCombineHiveInputFormat disabled?

@Test
@Disabled
public void testHoodieRealtimeCombineHoodieInputFormat() throws Exception { ... }

This UT fails when I enable it. I have now fixed the bug in this UT. Can you push a patch for this test? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[hudi] branch asf-site updated: Travis CI build asf-site
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/hudi.git

The following commit(s) were added to refs/heads/asf-site by this push:
     new b353b0b  Travis CI build asf-site
b353b0b is described below

commit b353b0b66cf17ccfbbf55d58d741fdebd66ac0a4
Author: CI
AuthorDate: Fri Apr 9 08:39:56 2021 +0000

    Travis CI build asf-site
---
 content/docs/0.8.0-concurrency_control.html |  6 --
 content/docs/0.8.0-configurations.html      | 32 -
 content/docs/concurrency_control.html       |  6 --
 content/docs/configurations.html            | 32 -
 4 files changed, 36 insertions(+), 40 deletions(-)

diff --git a/content/docs/0.8.0-concurrency_control.html b/content/docs/0.8.0-concurrency_control.html
index d28ec87..4e540b5 100644
--- a/content/docs/0.8.0-concurrency_control.html
+++ b/content/docs/0.8.0-concurrency_control.html
@@ -403,8 +403,6 @@ hoodie.write.lock.provider=lock-provider-classname
 hoodie.write.lock.provider=org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider
 hoodie.write.lock.zookeeper.url
 hoodie.write.lock.zookeeper.port
-hoodie.write.lock.wait_time_ms
-hoodie.write.lock.num_retries
 hoodie.write.lock.zookeeper.lock_key
 hoodie.write.lock.zookeeper.base_path
@@ -414,8 +412,6 @@ hoodie.write.lock.zookeeper.base_path
 hoodie.write.lock.provider=org.apache.hudi.hive.HiveMetastoreBasedLockProvider
 hoodie.write.lock.hivemetastore.database
 hoodie.write.lock.hivemetastore.table
-hoodie.write.lock.wait_time_ms
-hoodie.write.lock.num_retries

 The HiveMetastore URI's are picked up from the hadoop configuration file loaded during runtime.
@@ -433,8 +429,6 @@ hoodie.write.lock.num_retries
 .option("hoodie.write.concurrency.mode", "optimistic_concurrency_control")
 .option("hoodie.write.lock.zookeeper.url", "zookeeper")
 .option("hoodie.write.lock.zookeeper.port", "2181")
-.option("hoodie.write.lock.wait_time_ms", "12000")
-.option("hoodie.write.lock.num_retries", "2")
 .option("hoodie.write.lock.zookeeper.lock_key", "test_table")
 .option("hoodie.write.lock.zookeeper.base_path", "/test")
 .option(RECORDKEY_FIELD_OPT_KEY, "uuid")
diff --git a/content/docs/0.8.0-configurations.html b/content/docs/0.8.0-configurations.html
index 434418f..960dfb4 100644
--- a/content/docs/0.8.0-configurations.html
+++ b/content/docs/0.8.0-configurations.html
@@ -999,6 +999,10 @@ HoodieWriteConfig can be built using a builder pattern as below.
 Property: hoodie.cleaner.policy
 Cleaning policy to be used. Hudi will delete older versions of parquet files to re-claim space. Any Query/Computation referring to this version of the file will fail. It is good to make sure that the data is retained for more than the maximum query execution time.
+withFailedWritesCleaningPolicy(policy = HoodieFailedWritesCleaningPolicy.EAGER)
+Property: hoodie.cleaner.policy.failed.writes
+ Cleaning policy for failed writes to be used. Hudi will delete any files written by failed writes to re-claim space. Choose to perform this rollback of failed writes eagerly before every writer starts (only supported for single writer) or lazily by the cleaner (required for multi-writers)
+
 retainCommits(no_of_commits_to_retain = 24)
 Property: hoodie.cleaner.commits.retained
 Number of commits to retain. So data will be retained for num_of_commits * time_between_commits (scheduled). This also directly translates into how much you can incrementally pull on this table
@@ -1360,59 +1364,59 @@ Each clustering operation can create multiple groups. Total amount of data proce
 withLockConfig (HoodieLockConfig)
 withLockProvider(lockProvider = org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider)
-Property: hoodie.writer.lock.provider
+Property: hoodie.write.lock.provider
 Lock provider class name, user can provide their own implementation of LockProvider which should be subclass of org.apache.hudi.common.lock.LockProvider
 withZkQuorum(zkQuorum)
-Property: hoodie.writer.lock.zookeeper.url
+Property: hoodie.write.lock.zookeeper.url
 Set the list of comma separated servers to connect to
 withZkBasePath(zkBasePath)
-Property: hoodie.writer.lock.zookeeper.base_path [Required]
+Property: hoodie.write.lock.zookeeper.base_path [Required]
 The base path on Zookeeper under which to create a ZNode to acquire the lock. This should be common for all jobs writing to the same table
 withZkPort(zkPort)
-Property: hoodie.writer.lock.zookeeper.port [Required]
+Property: hoodie.write.lock.zookeeper.port [Required]
 The connection port to be used for Zookeeper
 withZkLockKey(zkLockKey)
-Property: hoodie.writer.lock.zookeeper.lock_key [Required]
+Property: hoodie.write.lock.zookeeper.lock_key [Required]
 Key name under base_path at which to create a ZNode and
[GitHub] [hudi] yanghua commented on pull request #2793: [HUDI-57] Support ORC Storage
yanghua commented on pull request #2793: URL: https://github.com/apache/hudi/pull/2793#issuecomment-816495293

> How can I trigger a rebuild?

Option 1: close and reopen the PR. Option 2: push an empty commit via a git command. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
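(For the second option, the usual invocation is `git commit --allow-empty -m "trigger CI"` followed by `git push`; the commit message itself is arbitrary.)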
[GitHub] [hudi] n3nash merged pull request #2794: [MINOR] Fix concurrency docs
n3nash merged pull request #2794: URL: https://github.com/apache/hudi/pull/2794 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[hudi] branch asf-site updated: [MINOR] Fix concurrency docs (#2794)
This is an automated email from the ASF dual-hosted git repository.

nagarwal pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/hudi.git

The following commit(s) were added to refs/heads/asf-site by this push:
     new a961350  [MINOR] Fix concurrency docs (#2794)
a961350 is described below

commit a961350740abf4d1637798bc287bd0b6b9800305
Author: n3nash
AuthorDate: Fri Apr 9 00:48:45 2021 -0700

    [MINOR] Fix concurrency docs (#2794)
---
 docs/_docs/0.8.0/2_4_configurations.md      | 32 -
 docs/_docs/0.8.0/2_9_concurrency_control.md |  6 --
 docs/_docs/2_4_configurations.md            | 32 -
 docs/_docs/2_9_concurrency_control.md       |  6 --
 4 files changed, 36 insertions(+), 40 deletions(-)

diff --git a/docs/_docs/0.8.0/2_4_configurations.md b/docs/_docs/0.8.0/2_4_configurations.md
index 0a5a4ab..207bf80 100644
--- a/docs/_docs/0.8.0/2_4_configurations.md
+++ b/docs/_docs/0.8.0/2_4_configurations.md
@@ -469,6 +469,10 @@ Configs that control compaction (merging of log files onto a new parquet base fi
 Property: `hoodie.cleaner.policy`
 Cleaning policy to be used. Hudi will delete older versions of parquet files to re-claim space. Any Query/Computation referring to this version of the file will fail. It is good to make sure that the data is retained for more than the maximum query execution time.
+ withFailedWritesCleaningPolicy(policy = HoodieFailedWritesCleaningPolicy.EAGER) {#withFailedWritesCleaningPolicy}
+Property: `hoodie.cleaner.policy.failed.writes`
+ Cleaning policy for failed writes to be used. Hudi will delete any files written by failed writes to re-claim space. Choose to perform this rollback of failed writes `eagerly` before every writer starts (only supported for single writer) or `lazily` by the cleaner (required for multi-writers)
+
 retainCommits(no_of_commits_to_retain = 24) {#retainCommits}
 Property: `hoodie.cleaner.commits.retained`
 Number of commits to retain. So data will be retained for num_of_commits * time_between_commits (scheduled). This also directly translates into how much you can incrementally pull on this table
@@ -831,59 +835,59 @@ Configs that control locking mechanisms if [WriteConcurrencyMode=optimistic_conc
 [withLockConfig](#withLockConfig) (HoodieLockConfig)
 withLockProvider(lockProvider = org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider) {#withLockProvider}
-Property: `hoodie.writer.lock.provider`
+Property: `hoodie.write.lock.provider`
 Lock provider class name, user can provide their own implementation of LockProvider which should be subclass of org.apache.hudi.common.lock.LockProvider
 withZkQuorum(zkQuorum) {#withZkQuorum}
-Property: `hoodie.writer.lock.zookeeper.url`
+Property: `hoodie.write.lock.zookeeper.url`
 Set the list of comma separated servers to connect to
 withZkBasePath(zkBasePath) {#withZkBasePath}
-Property: `hoodie.writer.lock.zookeeper.base_path` [Required]
+Property: `hoodie.write.lock.zookeeper.base_path` [Required]
 The base path on Zookeeper under which to create a ZNode to acquire the lock. This should be common for all jobs writing to the same table
 withZkPort(zkPort) {#withZkPort}
-Property: `hoodie.writer.lock.zookeeper.port` [Required]
+Property: `hoodie.write.lock.zookeeper.port` [Required]
 The connection port to be used for Zookeeper
 withZkLockKey(zkLockKey) {#withZkLockKey}
-Property: `hoodie.writer.lock.zookeeper.lock_key` [Required]
+Property: `hoodie.write.lock.zookeeper.lock_key` [Required]
 Key name under base_path at which to create a ZNode and acquire lock. Final path on zk will look like base_path/lock_key. We recommend setting this to the table name
 withZkConnectionTimeoutInMs(connectionTimeoutInMs = 15000) {#withZkConnectionTimeoutInMs}
-Property: `hoodie.writer.lock.zookeeper.connection_timeout_ms`
+Property: `hoodie.write.lock.zookeeper.connection_timeout_ms`
 How long to wait when connecting to ZooKeeper before considering the connection a failure
 withZkSessionTimeoutInMs(sessionTimeoutInMs = 6) {#withZkSessionTimeoutInMs}
-Property: `hoodie.writer.lock.zookeeper.session_timeout_ms`
+Property: `hoodie.write.lock.zookeeper.session_timeout_ms`
 How long to wait after losing a connection to ZooKeeper before the session is expired
 withNumRetries(num_retries = 3) {#withNumRetries}
-Property: `hoodie.writer.lock.num_retries`
+Property: `hoodie.write.lock.num_retries`
 Maximum number of times to retry by lock provider client
 withRetryWaitTimeInMillis(retryWaitTimeInMillis = 5000) {#withRetryWaitTimeInMillis}
-Property: `hoodie.writer.lock.wait_time_ms_between_retry`
+Property: `hoodie.write.lock.wait_time_ms_between_retry`
 Initial amount of time to wait between retries by lock provider client
 withHiveDatabaseName(hiveDatabaseName) {#withHiveDatabaseName}
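(To make the renamed lock properties concrete, a minimal Java sketch of an OCC-enabled Spark datasource write follows; `df` and `basePath` are placeholders, and the values mirror the example in these docs rather than a tested setup.)

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;

public class OccWriteSketch {
  // Sketch only: wires the corrected hoodie.write.lock.* property names
  // from the docs fix above into a Spark datasource write.
  public static void write(Dataset<Row> df, String basePath) {
    df.write().format("hudi")
        .option("hoodie.write.concurrency.mode", "optimistic_concurrency_control")
        .option("hoodie.write.lock.provider",
            "org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider")
        .option("hoodie.write.lock.zookeeper.url", "zookeeper")
        .option("hoodie.write.lock.zookeeper.port", "2181")
        .option("hoodie.write.lock.zookeeper.lock_key", "test_table")
        .option("hoodie.write.lock.zookeeper.base_path", "/test")
        .mode(SaveMode.Append)
        .save(basePath);
  }
}
```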
[GitHub] [hudi] garyli1019 commented on pull request #2786: [HUDI-1782] Add more options for HUDI Flink
garyli1019 commented on pull request #2786: URL: https://github.com/apache/hudi/pull/2786#issuecomment-816462388

> > Should we change the 0.8.0 doc as well? It will be merged soon. #2792
>
> I think it is not necessary? People would always see the master document. If the new options only affect the master then it should be fine.

There are many users still using an older version; AWS EMR, for example, is still on 0.6.0. So the versioned doc is still worth maintaining. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (HUDI-1711) Avro Schema Exception with Spark 3.0 in 0.7
[ https://issues.apache.org/jira/browse/HUDI-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17317684#comment-17317684 ] sivabalan narayanan commented on HUDI-1711: ---

[~cdmikechen]: Can you give me the Avro schema for which the exception is seen, and can you please confirm my understanding as to when exactly the exception is seen:
* You see the exception with the default value for "hoodie.deltastreamer.schemaprovider.spark_avro_post_processor.enable", which means it is enabled.
* Did you try w/ spark2* with hudi 0.7.0?
* FYI: in the latest release, we have added support for a custom deserializer for Kafka which is capable of leveraging the latest schema from the schema registry: [https://github.com/apache/hudi/pull/2619]

> Avro Schema Exception with Spark 3.0 in 0.7 > --- > > Key: HUDI-1711 > URL: https://issues.apache.org/jira/browse/HUDI-1711 > Project: Apache Hudi > Issue Type: Bug > Components: DeltaStreamer >Reporter: Balaji Varadarajan >Assignee: sivabalan narayanan >Priority: Major > Labels: sev:critical, user-support-issues > > GH: [https://github.com/apache/hudi/issues/2705] > > > {{21/03/22 10:10:35 WARN util.package: Truncated the string representation of > a plan since it was too large. This behavior can be adjusted by setting > 'spark.sql.debug.maxToStringFields'. > 21/03/22 10:10:35 ERROR executor.Executor: Exception in task 0.0 in stage 1.0 > (TID 1) > java.lang.RuntimeException: Error while decoding: > java.lang.NegativeArraySizeException: -1255727808 > createexternalrow(if (isnull(input[0, > struct, > true])) null else createexternalrow(if (input[0, > struct, > true].isNullAt) null else input[0, > struct, > true].id, if (input[0, > struct, > true].isNullAt) null else input[0, > struct, > true].name.toString, if (input[0, > struct, > true].isNullAt) null else input[0, > struct, > true].type.toString, if (input[0, > struct, > true].isNullAt) null else input[0, > struct, > true].url.toString, if (input[0, > struct, > true].isNullAt) null else input[0, > struct, > true].user.toString, if (input[0, > struct, > true].isNullAt) null else input[0, > struct, > true].password.toString, if (input[0, > struct, > true].isNullAt) null else input[0, > struct, > true].create_time.toString, if (input[0, > struct, > true].isNullAt) null else input[0, > struct, > true].create_user.toString, if (input[0, > struct, > true].isNullAt) null else input[0, > struct, > true].update_time.toString, if (input[0, > struct, > true].isNullAt) null else input[0, > struct, > true].update_user.toString, if (input[0, > struct, > true].isNullAt) null else input[0, > struct, > true].del_flag, StructField(id,IntegerType,false), > StructField(name,StringType,true), StructField(type,StringType,true), > StructField(url,StringType,true), StructField(user,StringType,true), > StructField(password,StringType,true), > StructField(create_time,StringType,true), > StructField(create_user,StringType,true), > StructField(update_time,StringType,true), > StructField(update_user,StringType,true), > StructField(del_flag,IntegerType,true)), if (isnull(input[1, > struct, > true])) null else createexternalrow(if (input[1, > struct, > true].isNullAt) null else input[1, > struct, > true].id, if (input[1, > struct, > true].isNullAt) null else input[1, > struct, > true].name.toString, if (input[1, > struct, > true].isNullAt) null else input[1, > struct, > true].type.toString, if (input[1, > struct, > true].isNullAt) null else input[1, > struct, > true].url.toString, if (input[1, > struct, >
true].isNullAt) null else input[1, > struct, > true].user.toString, if (input[1, > struct, > true].isNullAt) null else input[1, > struct, > true].password.toString, if (input[1, > struct, > true].isNullAt) null else input[1, > struct, > true].create_time.toString, if (input[1, > struct, > true].isNullAt) null else input[1, > struct, > true].create_user.toString, if (input[1, > struct, > true].isNullAt) null else input[1, > struct, > true].update_time.toString, if (input[1, > struct, > true].isNullAt) null else input[1, > struct, > true].update_user.toString, if (input[1, > struct, > true].isNullAt) null else input[1, > struct, > true].del_flag, StructField(id,IntegerType,false), > StructField(name,StringType,true), StructField(type,StringType,true), > StructField(url,StringType,true), StructField(user,StringType,true), > StructField(password,StringType,true), > StructField(create_time,StringType,true), > StructField(create_user,StringType,true), > StructField(update_time,StringType,true), > StructField(update_user,StringType,true), > StructField(del_flag,IntegerType,true)), if (isnull(input[2,