[jira] [Commented] (HUDI-1783) Support Huawei Cloud Object Storage

2021-04-09 Thread vinoyang (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17318424#comment-17318424
 ] 

vinoyang commented on HUDI-1783:


[~xiaotaotao] I have given you Jira contributor permission.

> Support Huawei Cloud Object Storage
> 
>
> Key: HUDI-1783
> URL: https://issues.apache.org/jira/browse/HUDI-1783
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Common Core
>Reporter: tao meng
>Assignee: tao meng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Add support for Huawei Cloud Object Storage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HUDI-1783) Support Huawei Cloud Object Storage

2021-04-09 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang reassigned HUDI-1783:
--

Assignee: tao meng

> Support Huawei Cloud Object Storage
> 
>
> Key: HUDI-1783
> URL: https://issues.apache.org/jira/browse/HUDI-1783
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Common Core
>Reporter: tao meng
>Assignee: tao meng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Add support for Huawei Cloud Object Storage.





[jira] [Updated] (HUDI-1783) Support Huawei Cloud Object Storage

2021-04-09 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-1783:
---
Summary: Support Huawei Cloud Object Storage  (was: support Huawei Cloud Object Storage)

> Support Huawei Cloud Object Storage
> 
>
> Key: HUDI-1783
> URL: https://issues.apache.org/jira/browse/HUDI-1783
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Common Core
>Reporter: tao meng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Add support for Huawei Cloud Object Storage.





[hudi] branch master updated: [HUDI-1783] Support Huawei Cloud Object Storage (#2796)

2021-04-09 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 8d4a7fe  [HUDI-1783] Support Huawei Cloud Object Storage (#2796)
8d4a7fe is described below

commit 8d4a7fe33e041719e2509f1f8ad3667e2ae7bbb4
Author: xiarixiaoyao 
AuthorDate: Sat Apr 10 13:02:11 2021 +0800

[HUDI-1783] Support Huawei Cloud Object Storage (#2796)
---
 .../src/main/java/org/apache/hudi/common/fs/StorageSchemes.java   | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git 
a/hudi-common/src/main/java/org/apache/hudi/common/fs/StorageSchemes.java 
b/hudi-common/src/main/java/org/apache/hudi/common/fs/StorageSchemes.java
index 7ebf641..56c9c8e 100644
--- a/hudi-common/src/main/java/org/apache/hudi/common/fs/StorageSchemes.java
+++ b/hudi-common/src/main/java/org/apache/hudi/common/fs/StorageSchemes.java
@@ -53,7 +53,9 @@ public enum StorageSchemes {
   // Databricks file system
   DBFS("dbfs", false),
   // IBM Cloud Object Storage
-  COS("cos", false);
+  COS("cos", false),
+  // Huawei Cloud Object Storage
+  OBS("obs", false);
 
   private String scheme;
   private boolean supportsAppend;
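
The one-line enum addition is the whole change: Hudi routes storage-scheme checks through this list. A minimal, self-contained sketch of the pattern being extended (simplified names; the helper methods are illustrative assumptions, not the exact Hudi API):

```java
import java.util.Arrays;

// Simplified sketch of the StorageSchemes pattern the diff above extends.
// The helper methods are assumptions for illustration, not the Hudi API.
enum Scheme {
    HDFS("hdfs", true),   // supports appends
    COS("cos", false),    // IBM Cloud Object Storage
    OBS("obs", false);    // Huawei Cloud Object Storage (the new entry)

    private final String scheme;
    private final boolean supportsAppend;

    Scheme(String scheme, boolean supportsAppend) {
        this.scheme = scheme;
        this.supportsAppend = supportsAppend;
    }

    // A scheme is recognized if any registered entry matches it.
    static boolean isSchemeSupported(String s) {
        return Arrays.stream(values()).anyMatch(v -> v.scheme.equals(s));
    }

    // Object stores register supportsAppend=false, so append-dependent code
    // paths can refuse them up front.
    static boolean isAppendSupported(String s) {
        return Arrays.stream(values()).anyMatch(v -> v.supportsAppend && v.scheme.equals(s));
    }

    public static void main(String[] args) {
        System.out.println(isSchemeSupported("obs")); // true after this patch
        System.out.println(isAppendSupported("obs")); // false: no appends on OBS
    }
}
```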


[GitHub] [hudi] yanghua merged pull request #2796: [HUDI-1783] Support Huawei Cloud Object Storage

2021-04-09 Thread GitBox


yanghua merged pull request #2796:
URL: https://github.com/apache/hudi/pull/2796


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] yanghua commented on pull request #2796: [HUDI-1783] Support Huawei Cloud Object Storage

2021-04-09 Thread GitBox


yanghua commented on pull request #2796:
URL: https://github.com/apache/hudi/pull/2796#issuecomment-817078875


   > already updated the code. @yanghua thanks for your review, sorry for that low-level mistake.
   
   It doesn't matter, just relax.






[GitHub] [hudi] xiarixiaoyao commented on pull request #2722: [HUDI-1722]hive beeline/spark-sql query specified field on mor table occur NPE

2021-04-09 Thread GitBox


xiarixiaoyao commented on pull request #2722:
URL: https://github.com/apache/hudi/pull/2722#issuecomment-817071591


   @garyli1019 the unit test has been added, please review again, thanks






[GitHub] [hudi] xiarixiaoyao commented on pull request #2796: [HUDI-1783] Support Huawei Cloud Object Storage

2021-04-09 Thread GitBox


xiarixiaoyao commented on pull request #2796:
URL: https://github.com/apache/hudi/pull/2796#issuecomment-817070783


   already updated the code. @yanghua thanks for your review, sorry for that low-level mistake.






[GitHub] [hudi] codecov-io edited a comment on pull request #2798: [HUDI-1785] Move OperationConverter to hudi-client-common for code reuse

2021-04-09 Thread GitBox


codecov-io edited a comment on pull request #2798:
URL: https://github.com/apache/hudi/pull/2798#issuecomment-817064692


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2798?src=pr=h1) Report
   > Merging 
[#2798](https://codecov.io/gh/apache/hudi/pull/2798?src=pr=desc) (81fd3bb) 
into 
[master](https://codecov.io/gh/apache/hudi/commit/18459d4045ec4a85081c227893b226a4d759f84b?el=desc)
 (18459d4) will **increase** coverage by `0.29%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2798/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2798?src=pr=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #2798      +/-   ##
   ============================================
   + Coverage     52.26%   52.56%   +0.29%
   - Complexity     3682     3708      +26
   ============================================
     Files           484      484
     Lines         23094    23167      +73
     Branches       2456     2459       +3
   ============================================
   + Hits          12070    12177     +107
   + Misses         9959     9919      -40
   - Partials       1065     1071       +6
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `40.29% <ø> (+3.35%)` | `0.00 <ø> (ø)` | |
   | hudiclient | `∅ <ø> (∅)` | `0.00 <ø> (ø)` | |
   | hudicommon | `50.67% <ø> (-0.10%)` | `0.00 <ø> (ø)` | |
   | hudiflink | `56.60% <ø> (+0.02%)` | `0.00 <ø> (ø)` | |
   | hudihadoopmr | `33.44% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudisparkdatasource | `71.33% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudisync | `45.70% <ø> (+0.23%)` | `0.00 <ø> (ø)` | |
   | huditimelineservice | `64.36% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudiutilities | `69.84% <ø> (+0.12%)` | `0.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2798?src=pr=tree) | Coverage Δ 
| Complexity Δ | |
   |---|---|---|---|
   | 
[.../org/apache/hudi/streamer/FlinkStreamerConfig.java](https://codecov.io/gh/apache/hudi/pull/2798/diff?src=pr=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zdHJlYW1lci9GbGlua1N0cmVhbWVyQ29uZmlnLmphdmE=)
 | `0.00% <ø> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[...i/utilities/deltastreamer/HoodieDeltaStreamer.java](https://codecov.io/gh/apache/hudi/pull/2798/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvSG9vZGllRGVsdGFTdHJlYW1lci5qYXZh)
 | `71.78% <ø> (+0.59%)` | `18.00 <0.00> (ø)` | |
   | 
[...s/deltastreamer/HoodieMultiTableDeltaStreamer.java](https://codecov.io/gh/apache/hudi/pull/2798/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvSG9vZGllTXVsdGlUYWJsZURlbHRhU3RyZWFtZXIuamF2YQ==)
 | `78.39% <ø> (ø)` | `18.00 <0.00> (ø)` | |
   | 
[.../apache/hudi/sink/compact/CompactionPlanEvent.java](https://codecov.io/gh/apache/hudi/pull/2798/diff?src=pr=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL2NvbXBhY3QvQ29tcGFjdGlvblBsYW5FdmVudC5qYXZh)
 | `50.00% <0.00%> (-50.00%)` | `3.00% <0.00%> (ø%)` | |
   | 
[...pache/hudi/sink/compact/CompactionCommitEvent.java](https://codecov.io/gh/apache/hudi/pull/2798/diff?src=pr=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL2NvbXBhY3QvQ29tcGFjdGlvbkNvbW1pdEV2ZW50LmphdmE=)
 | `43.75% <0.00%> (-43.75%)` | `3.00% <0.00%> (ø%)` | |
   | 
[...i/common/table/timeline/TimelineMetadataUtils.java](https://codecov.io/gh/apache/hudi/pull/2798/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL1RpbWVsaW5lTWV0YWRhdGFVdGlscy5qYXZh)
 | `70.17% <0.00%> (-2.56%)` | `17.00% <0.00%> (ø%)` | |
   | 
[...he/hudi/sink/partitioner/BucketAssignFunction.java](https://codecov.io/gh/apache/hudi/pull/2798/diff?src=pr=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL3BhcnRpdGlvbmVyL0J1Y2tldEFzc2lnbkZ1bmN0aW9uLmphdmE=)
 | `85.18% <0.00%> (-2.49%)` | `17.00% <0.00%> (+2.00%)` | :arrow_down: |
   | 
[...in/java/org/apache/hudi/table/HoodieTableSink.java](https://codecov.io/gh/apache/hudi/pull/2798/diff?src=pr=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS90YWJsZS9Ib29kaWVUYWJsZVNpbmsuamF2YQ==)
 | `11.90% <0.00%> (-0.30%)` | `2.00% <0.00%> (ø%)` | |
   | 
[.../common/table/timeline/HoodieArchivedTimeline.java](https://codecov.io/gh/apache/hudi/pull/2798/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL0hvb2RpZUFyY2hpdmVkVGltZWxpbmUuamF2YQ==)
 | `0.00% <0.00%> (ø)` | `0.00% <0.00%> (ø%)` | |
   | 

[GitHub] [hudi] codecov-io commented on pull request #2798: [HUDI-1785] Move OperationConverter to hudi-client-common for code reuse

2021-04-09 Thread GitBox


codecov-io commented on pull request #2798:
URL: https://github.com/apache/hudi/pull/2798#issuecomment-817064692


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2798?src=pr=h1) Report
   > Merging 
[#2798](https://codecov.io/gh/apache/hudi/pull/2798?src=pr=desc) (81fd3bb) 
into 
[master](https://codecov.io/gh/apache/hudi/commit/18459d4045ec4a85081c227893b226a4d759f84b?el=desc)
 (18459d4) will **increase** coverage by `0.42%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2798/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2798?src=pr=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #2798      +/-   ##
   ============================================
   + Coverage     52.26%   52.68%   +0.42%
   + Complexity     3682     3515     -167
   ============================================
     Files           484      461      -23
     Lines         23094    21554    -1540
     Branches       2456     2303     -153
   ============================================
   - Hits          12070    11356     -714
   + Misses         9959     9202     -757
   + Partials       1065      996      -69
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `40.29% <ø> (+3.35%)` | `0.00 <ø> (ø)` | |
   | hudiclient | `∅ <ø> (∅)` | `0.00 <ø> (ø)` | |
   | hudicommon | `50.67% <ø> (-0.10%)` | `0.00 <ø> (ø)` | |
   | hudiflink | `56.60% <ø> (+0.02%)` | `0.00 <ø> (ø)` | |
   | hudihadoopmr | `33.44% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudisparkdatasource | `71.33% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudisync | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `69.84% <ø> (+0.12%)` | `0.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2798?src=pr=tree) | Coverage Δ 
| Complexity Δ | |
   |---|---|---|---|
   | 
[.../org/apache/hudi/streamer/FlinkStreamerConfig.java](https://codecov.io/gh/apache/hudi/pull/2798/diff?src=pr=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zdHJlYW1lci9GbGlua1N0cmVhbWVyQ29uZmlnLmphdmE=)
 | `0.00% <ø> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[...i/utilities/deltastreamer/HoodieDeltaStreamer.java](https://codecov.io/gh/apache/hudi/pull/2798/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvSG9vZGllRGVsdGFTdHJlYW1lci5qYXZh)
 | `71.78% <ø> (+0.59%)` | `18.00 <0.00> (ø)` | |
   | 
[...s/deltastreamer/HoodieMultiTableDeltaStreamer.java](https://codecov.io/gh/apache/hudi/pull/2798/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvSG9vZGllTXVsdGlUYWJsZURlbHRhU3RyZWFtZXIuamF2YQ==)
 | `78.39% <ø> (ø)` | `18.00 <0.00> (ø)` | |
   | 
[.../apache/hudi/sink/compact/CompactionPlanEvent.java](https://codecov.io/gh/apache/hudi/pull/2798/diff?src=pr=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL2NvbXBhY3QvQ29tcGFjdGlvblBsYW5FdmVudC5qYXZh)
 | `50.00% <0.00%> (-50.00%)` | `3.00% <0.00%> (ø%)` | |
   | 
[...pache/hudi/sink/compact/CompactionCommitEvent.java](https://codecov.io/gh/apache/hudi/pull/2798/diff?src=pr=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL2NvbXBhY3QvQ29tcGFjdGlvbkNvbW1pdEV2ZW50LmphdmE=)
 | `43.75% <0.00%> (-43.75%)` | `3.00% <0.00%> (ø%)` | |
   | 
[...i/common/table/timeline/TimelineMetadataUtils.java](https://codecov.io/gh/apache/hudi/pull/2798/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL1RpbWVsaW5lTWV0YWRhdGFVdGlscy5qYXZh)
 | `70.17% <0.00%> (-2.56%)` | `17.00% <0.00%> (ø%)` | |
   | 
[...he/hudi/sink/partitioner/BucketAssignFunction.java](https://codecov.io/gh/apache/hudi/pull/2798/diff?src=pr=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL3BhcnRpdGlvbmVyL0J1Y2tldEFzc2lnbkZ1bmN0aW9uLmphdmE=)
 | `85.18% <0.00%> (-2.49%)` | `17.00% <0.00%> (+2.00%)` | :arrow_down: |
   | 
[...in/java/org/apache/hudi/table/HoodieTableSink.java](https://codecov.io/gh/apache/hudi/pull/2798/diff?src=pr=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS90YWJsZS9Ib29kaWVUYWJsZVNpbmsuamF2YQ==)
 | `11.90% <0.00%> (-0.30%)` | `2.00% <0.00%> (ø%)` | |
   | 
[.../common/table/timeline/HoodieArchivedTimeline.java](https://codecov.io/gh/apache/hudi/pull/2798/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL0hvb2RpZUFyY2hpdmVkVGltZWxpbmUuamF2YQ==)
 | `0.00% <0.00%> (ø)` | `0.00% <0.00%> (ø%)` | |
   | 

[GitHub] [hudi] codecov-io commented on pull request #2799: [HUDI-1784] Added print detailed stack log when hbase connection error

2021-04-09 Thread GitBox


codecov-io commented on pull request #2799:
URL: https://github.com/apache/hudi/pull/2799#issuecomment-817063994


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2799?src=pr=h1) Report
   > Merging 
[#2799](https://codecov.io/gh/apache/hudi/pull/2799?src=pr=desc) (84cf6fa) 
into 
[master](https://codecov.io/gh/apache/hudi/commit/6786581c4842e47e1a8a8e942f54003dc151c7c6?el=desc)
 (6786581) will **increase** coverage by `17.22%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2799/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2799?src=pr=tree)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #2799       +/-   ##
   =============================================
   + Coverage     52.54%   69.77%   +17.22%
   + Complexity     3707      374     -3333
   =============================================
     Files           485       54      -431
     Lines         23171     1995    -21176
     Branches       2459      235     -2224
   =============================================
   - Hits          12176     1392    -10784
   + Misses         9923      473     -9450
   + Partials       1072      130      -942
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `?` | `?` | |
   | hudiclient | `?` | `?` | |
   | hudicommon | `?` | `?` | |
   | hudiflink | `?` | `?` | |
   | hudihadoopmr | `?` | `?` | |
   | hudisparkdatasource | `?` | `?` | |
   | hudisync | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `69.77% <ø> (+0.05%)` | `0.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2799?src=pr=tree) | Coverage Δ 
| Complexity Δ | |
   |---|---|---|---|
   | 
[...in/java/org/apache/hudi/hive/HoodieHiveClient.java](https://codecov.io/gh/apache/hudi/pull/2799/diff?src=pr=tree#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSG9vZGllSGl2ZUNsaWVudC5qYXZh)
 | | | |
   | 
[.../common/bloom/HoodieDynamicBoundedBloomFilter.java](https://codecov.io/gh/apache/hudi/pull/2799/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2Jsb29tL0hvb2RpZUR5bmFtaWNCb3VuZGVkQmxvb21GaWx0ZXIuamF2YQ==)
 | | | |
   | 
[...a/org/apache/hudi/common/bloom/InternalFilter.java](https://codecov.io/gh/apache/hudi/pull/2799/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2Jsb29tL0ludGVybmFsRmlsdGVyLmphdmE=)
 | | | |
   | 
[...adoop/realtime/HoodieHFileRealtimeInputFormat.java](https://codecov.io/gh/apache/hudi/pull/2799/diff?src=pr=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL3JlYWx0aW1lL0hvb2RpZUhGaWxlUmVhbHRpbWVJbnB1dEZvcm1hdC5qYXZh)
 | | | |
   | 
[...rg/apache/hudi/hadoop/HoodieROTablePathFilter.java](https://codecov.io/gh/apache/hudi/pull/2799/diff?src=pr=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL0hvb2RpZVJPVGFibGVQYXRoRmlsdGVyLmphdmE=)
 | | | |
   | 
[...hudi/common/model/HoodieReplaceCommitMetadata.java](https://codecov.io/gh/apache/hudi/pull/2799/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0hvb2RpZVJlcGxhY2VDb21taXRNZXRhZGF0YS5qYXZh)
 | | | |
   | 
[.../apache/hudi/common/model/CompactionOperation.java](https://codecov.io/gh/apache/hudi/pull/2799/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0NvbXBhY3Rpb25PcGVyYXRpb24uamF2YQ==)
 | | | |
   | 
[...va/org/apache/hudi/cli/commands/CleansCommand.java](https://codecov.io/gh/apache/hudi/pull/2799/diff?src=pr=tree#diff-aHVkaS1jbGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpL2NvbW1hbmRzL0NsZWFuc0NvbW1hbmQuamF2YQ==)
 | | | |
   | 
[.../apache/hudi/common/bootstrap/FileStatusUtils.java](https://codecov.io/gh/apache/hudi/pull/2799/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2Jvb3RzdHJhcC9GaWxlU3RhdHVzVXRpbHMuamF2YQ==)
 | | | |
   | 
[...a/org/apache/hudi/common/util/ReflectionUtils.java](https://codecov.io/gh/apache/hudi/pull/2799/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvUmVmbGVjdGlvblV0aWxzLmphdmE=)
 | | | |
   | ... and [422 
more](https://codecov.io/gh/apache/hudi/pull/2799/diff?src=pr=tree-more) | |
   






[GitHub] [hudi] hj2016 commented on issue #2623: org.apache.hudi.exception.HoodieDependentSystemUnavailableException:System HBASE unavailable.

2021-04-09 Thread GitBox


hj2016 commented on issue #2623:
URL: https://github.com/apache/hudi/issues/2623#issuecomment-817062980


   @n3nash @nsivabalan @root18039532923 I also hit the same connection problem. Debugging showed it was caused by a jar conflict. I submitted a PR [https://github.com/apache/hudi/pull/2799] so that when the connection error is reported, a more detailed stack trace is printed, which helps locate the problem precisely.






[jira] [Updated] (HUDI-1784) Added print detailed stack log when hbase connection error

2021-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-1784:
-
Labels: pull-request-available  (was: )

> Added print detailed stack log when hbase connection error
> --
>
> Key: HUDI-1784
> URL: https://issues.apache.org/jira/browse/HUDI-1784
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Index
>Reporter: jing
>Assignee: jing
>Priority: Major
>  Labels: pull-request-available
>
> I tried to upgrade HDFS to version 3.0 and found that HBase reported an error
> and could not connect, even though HBase itself was healthy; debugging showed
> it was a jar conflict. The exception did not print a detailed stack trace, so
> the root cause of the problem could not be located precisely.
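
The fix being described boils down to keeping the original exception as the cause when rethrowing or logging, so the nested stack trace survives. A minimal sketch of that pattern (all names assumed; not the actual Hudi code):

```java
import java.io.IOException;

// Sketch of the fix pattern described above; names are assumptions.
public class StackTraceDemo {

    // Stand-in for an HBase connection attempt failing due to a jar conflict.
    static void connect() throws IOException {
        throw new IOException("connection refused");
    }

    public static void main(String[] args) {
        try {
            connect();
        } catch (IOException e) {
            // Passing `e` as the cause preserves the nested stack trace in the
            // output; rethrowing with only e.getMessage() would discard it and
            // hide the real culprit (here, a conflicting jar on the classpath).
            throw new RuntimeException("System HBASE unavailable", e);
        }
    }
}
```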





[GitHub] [hudi] hj2016 opened a new pull request #2799: [HUDI-1784] Added print detailed stack log when hbase connection error

2021-04-09 Thread GitBox


hj2016 opened a new pull request #2799:
URL: https://github.com/apache/hudi/pull/2799


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   I tried to upgrade HDFS to version 3.0 and found that HBase reported an error and could not connect, even though HBase itself was healthy; debugging showed it was a jar conflict. The exception did not print a detailed stack trace, so the root cause of the problem could not be located precisely.
   
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.






[jira] [Updated] (HUDI-1785) Move OperationConverter to hudi-client-common for code reuse

2021-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-1785:
-
Labels: pull-request-available  (was: )

> Move OperationConverter to hudi-client-common for code reuse
> 
>
> Key: HUDI-1785
> URL: https://issues.apache.org/jira/browse/HUDI-1785
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Xianghu Wang
>Assignee: Xianghu Wang
>Priority: Major
>  Labels: pull-request-available
>
> Currently, `OperationConverter` is duplicated in the `hudi-flink` and
> `hudi-utilities` modules; we can move it to `hudi-client-common` so that
> both can use it.





[GitHub] [hudi] wangxianghu opened a new pull request #2798: [HUDI-1785] Move OperationConverter to hudi-client-common for code reuse

2021-04-09 Thread GitBox


wangxianghu opened a new pull request #2798:
URL: https://github.com/apache/hudi/pull/2798


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   
   Currently, `OperationConverter` is duplicated in the `hudi-flink` and `hudi-utilities` modules; we can move it to `hudi-client-common` so that both can use it.
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.






[jira] [Created] (HUDI-1785) Move OperationConverter to hudi-client-common for code reuse

2021-04-09 Thread Xianghu Wang (Jira)
Xianghu Wang created HUDI-1785:
--

 Summary: Move OperationConverter to hudi-client-common for code 
reuse
 Key: HUDI-1785
 URL: https://issues.apache.org/jira/browse/HUDI-1785
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Xianghu Wang


Currently, `OperationConverter` is duplicated in the `hudi-flink` and `hudi-utilities` modules; we can move it to `hudi-client-common` so that both can use it.
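
For context, the class in question is a small CLI-argument converter. A sketch of its likely shape, assuming the JCommander `IStringConverter` pattern used for streamer flags (the body is an assumption, not a quote of the Hudi source):

```java
import com.beust.jcommander.IStringConverter;
import org.apache.hudi.common.model.WriteOperationType;

// Likely shape of the duplicated class (assumed for illustration): it parses
// a CLI string such as "upsert" into the WriteOperationType enum. Since both
// hudi-flink and hudi-utilities need exactly this, a single copy in
// hudi-client-common suffices.
public class OperationConverter implements IStringConverter<WriteOperationType> {
    @Override
    public WriteOperationType convert(String value) {
        return WriteOperationType.fromValue(value);
    }
}
```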





[jira] [Assigned] (HUDI-1785) Move OperationConverter to hudi-client-common for code reuse

2021-04-09 Thread Xianghu Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xianghu Wang reassigned HUDI-1785:
--

Assignee: Xianghu Wang

> Move OperationConverter to hudi-client-common for code reuse
> 
>
> Key: HUDI-1785
> URL: https://issues.apache.org/jira/browse/HUDI-1785
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Xianghu Wang
>Assignee: Xianghu Wang
>Priority: Major
>
> Currently, `OperationConverter` is duplicated in the `hudi-flink` and
> `hudi-utilities` modules; we can move it to `hudi-client-common` so that
> both can use it.





[jira] [Created] (HUDI-1784) Added print detailed stack log when hbase connection error

2021-04-09 Thread jing (Jira)
jing created HUDI-1784:
--

 Summary: Added print detailed stack log when hbase connection error
 Key: HUDI-1784
 URL: https://issues.apache.org/jira/browse/HUDI-1784
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Index
Reporter: jing
Assignee: jing


I tried to upgrade HDFS to version 3.0 and found that HBase reported an error and could not connect, even though HBase itself was healthy; debugging showed it was a jar conflict. The exception did not print a detailed stack trace, so the root cause of the problem could not be located precisely.





[GitHub] [hudi] jintaoguan commented on pull request #2773: [HUDI-1764] Add Hudi-CLI support for clustering

2021-04-09 Thread GitBox


jintaoguan commented on pull request #2773:
URL: https://github.com/apache/hudi/pull/2773#issuecomment-817008126


   Sure. Will do that.






[GitHub] [hudi] satishkotha edited a comment on pull request #2773: [HUDI-1764] Add Hudi-CLI support for clustering

2021-04-09 Thread GitBox


satishkotha edited a comment on pull request #2773:
URL: https://github.com/apache/hudi/pull/2773#issuecomment-817002604


   @jintaoguan LGTM. Can you raise a PR to update the documentation on the [CLI page](https://hudi.apache.org/docs/deployment.html#cli) and add example command-line screenshots? The documentation is in the 'asf-site' branch; see "content/docs/deployment.html" there.






[GitHub] [hudi] satishkotha edited a comment on pull request #2773: [HUDI-1764] Add Hudi-CLI support for clustering

2021-04-09 Thread GitBox


satishkotha edited a comment on pull request #2773:
URL: https://github.com/apache/hudi/pull/2773#issuecomment-817002604


   @jintaoguan LGTM. Can you raise a PR to update the documentation on the [CLI page](https://hudi.apache.org/docs/deployment.html#cli) and add example command-line screenshots? The documentation is in the 'asf-site' branch; see "content/docs/deployment.html" there.






[GitHub] [hudi] satishkotha commented on pull request #2773: [HUDI-1764] Add Hudi-CLI support for clustering

2021-04-09 Thread GitBox


satishkotha commented on pull request #2773:
URL: https://github.com/apache/hudi/pull/2773#issuecomment-817002604


   @jintaoguan LGTM. Can you raise a PR to update the documentation on the [CLI page](https://hudi.apache.org/docs/deployment.html#cli) and add example command-line screenshots? The documentation is in the 'asf-site' branch; see "content/docs/0.7.0-deployment.html" there.






[GitHub] [hudi] satishkotha commented on pull request #2388: [HUDI-1353] add incremental timeline support for pending clustering ops

2021-04-09 Thread GitBox


satishkotha commented on pull request #2388:
URL: https://github.com/apache/hudi/pull/2388#issuecomment-816997366


   @n3nash I don't have time in the next 2-3 weeks to get this done. If you prefer, we can close this one; I can reopen (the same PR or a different one) when I'm ready.






[GitHub] [hudi] n3nash commented on a change in pull request #2793: [HUDI-57] Support ORC Storage

2021-04-09 Thread GitBox


n3nash commented on a change in pull request #2793:
URL: https://github.com/apache/hudi/pull/2793#discussion_r610913501



##
File path: 
hudi-common/src/test/java/org/apache/hudi/common/util/TestAvroOrcUtils.java
##
@@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.util;
+
+import static 
org.apache.hudi.common.testutils.HoodieTestDataGenerator.AVRO_SCHEMA;
+
+import java.util.Arrays;
+import java.util.List;
+import org.apache.avro.Schema;
+import org.apache.hudi.common.testutils.HoodieCommonTestHarness;
+import org.apache.orc.TypeDescription;
+
+import org.junit.jupiter.params.ParameterizedTest;
+import org.junit.jupiter.params.provider.Arguments;
+import org.junit.jupiter.params.provider.MethodSource;
+
+import static org.junit.jupiter.api.Assertions.assertEquals;
+
+public class TestAvroOrcUtils extends HoodieCommonTestHarness {
+
+  public static List<Arguments> testCreateOrcSchemaArgs() {
+// the ORC schema is constructed in the order as AVRO_SCHEMA:
+// TRIP_SCHEMA_PREFIX, EXTRA_TYPE_SCHEMA, MAP_TYPE_SCHEMA, 
FARE_NESTED_SCHEMA, TIP_NESTED_SCHEMA, TRIP_SCHEMA_SUFFIX
+// The following types are tested:
+// DATE, DECIMAL, LONG, INT, BYTES, ARRAY, RECORD, MAP, STRING, FLOAT, 
DOUBLE
+TypeDescription orcSchema = TypeDescription.fromString("struct<"

Review comment:
   Is this testing all primitive types?








[GitHub] [hudi] n3nash commented on a change in pull request #2793: [HUDI-57] Support ORC Storage

2021-04-09 Thread GitBox


n3nash commented on a change in pull request #2793:
URL: https://github.com/apache/hudi/pull/2793#discussion_r610913343



##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/util/OrcReaderIterator.java
##
@@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.util;
+
+import java.util.List;
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericData.Record;
+import org.apache.orc.storage.ql.exec.vector.VectorizedRowBatch;
+import org.apache.hudi.exception.HoodieIOException;
+
+import org.apache.orc.RecordReader;
+import org.apache.orc.TypeDescription;
+
+import java.io.IOException;
+import java.util.Iterator;
+
+/**
+ * This class wraps an ORC reader and provides an iterator-based API to read from an ORC file.
+ */
+public class OrcReaderIterator<T> implements Iterator<T> {

Review comment:
   Corresponding test class
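
The class under review adapts ORC's batch-at-a-time reading to `java.util.Iterator` by buffering one batch and handing out records until it is exhausted. A self-contained sketch of that adaptation (plain arrays stand in for `VectorizedRowBatch`; this is not the actual Hudi code):

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Sketch of the batch-to-iterator adaptation used by OrcReaderIterator.
// int[] batches stand in for ORC's VectorizedRowBatch.
final class BatchIterator implements Iterator<Integer> {
    private final int[][] batches;
    private int batchIdx = 0;
    private int rowIdx = 0;

    BatchIterator(int[][] batches) {
        this.batches = batches;
    }

    @Override
    public boolean hasNext() {
        // Skip past exhausted (or empty) batches before answering.
        while (batchIdx < batches.length && rowIdx >= batches[batchIdx].length) {
            batchIdx++;
            rowIdx = 0;
        }
        return batchIdx < batches.length;
    }

    @Override
    public Integer next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return batches[batchIdx][rowIdx++];
    }

    public static void main(String[] args) {
        BatchIterator it = new BatchIterator(new int[][]{{1, 2}, {}, {3}});
        while (it.hasNext()) {
            System.out.println(it.next()); // prints 1, 2, 3
        }
    }
}
```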








[GitHub] [hudi] n3nash commented on a change in pull request #2793: [HUDI-57] Support ORC Storage

2021-04-09 Thread GitBox


n3nash commented on a change in pull request #2793:
URL: https://github.com/apache/hudi/pull/2793#discussion_r610913269



##
File path: hudi-common/src/main/java/org/apache/hudi/common/util/OrcUtils.java
##
@@ -0,0 +1,172 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.util;
+
+import java.io.IOException;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.stream.Collectors;
+import org.apache.avro.Schema;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.orc.storage.ql.exec.vector.BytesColumnVector;
+import org.apache.orc.storage.ql.exec.vector.VectorizedRowBatch;
+import org.apache.hudi.avro.HoodieAvroWriteSupport;
+import org.apache.hudi.common.bloom.BloomFilter;
+import org.apache.hudi.common.bloom.BloomFilterFactory;
+import org.apache.hudi.common.bloom.BloomFilterTypeCode;
+import org.apache.hudi.common.model.HoodieRecord;
+import org.apache.hudi.exception.HoodieException;
+import org.apache.hudi.exception.HoodieIOException;
+import org.apache.hudi.exception.MetadataNotFoundException;
+import org.apache.orc.OrcFile;
+import org.apache.orc.OrcProto.UserMetadataItem;
+import org.apache.orc.Reader;
+import org.apache.orc.Reader.Options;
+import org.apache.orc.RecordReader;
+import org.apache.orc.TypeDescription;
+
+/**
+ * Utility functions for ORC files.
+ */
+public class OrcUtils {

Review comment:
   Add corresponding test class to test all public methods








[GitHub] [hudi] n3nash commented on a change in pull request #2793: [HUDI-57] Support ORC Storage

2021-04-09 Thread GitBox


n3nash commented on a change in pull request #2793:
URL: https://github.com/apache/hudi/pull/2793#discussion_r610913156



##
File path: 
hudi-common/src/main/java/org/apache/hudi/io/storage/HoodieOrcReader.java
##
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.io.storage;
+
+import java.io.IOException;
+import java.util.Iterator;
+import java.util.Set;
+import org.apache.avro.Schema;
+import org.apache.avro.generic.IndexedRecord;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.hudi.common.bloom.BloomFilter;
+import org.apache.hudi.common.util.AvroOrcUtils;
+import org.apache.hudi.common.util.OrcReaderIterator;
+import org.apache.hudi.common.util.OrcUtils;
+import org.apache.hudi.exception.HoodieIOException;
+import org.apache.orc.OrcFile;
+import org.apache.orc.Reader;
+import org.apache.orc.Reader.Options;
+import org.apache.orc.RecordReader;
+import org.apache.orc.TypeDescription;
+
+public class HoodieOrcReader<R extends IndexedRecord> implements HoodieFileReader<R> {

Review comment:
   Add corresponding test class








[GitHub] [hudi] n3nash commented on a change in pull request #2793: [HUDI-57] Support ORC Storage

2021-04-09 Thread GitBox


n3nash commented on a change in pull request #2793:
URL: https://github.com/apache/hudi/pull/2793#discussion_r610911298



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/storage/HoodieOrcWriter.java
##
@@ -0,0 +1,176 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.io.storage;
+
+import static 
org.apache.hudi.avro.HoodieAvroWriteSupport.HOODIE_AVRO_BLOOM_FILTER_METADATA_KEY;
+import static 
org.apache.hudi.avro.HoodieAvroWriteSupport.HOODIE_BLOOM_FILTER_TYPE_CODE;
+import static 
org.apache.hudi.avro.HoodieAvroWriteSupport.HOODIE_MAX_RECORD_KEY_FOOTER;
+import static 
org.apache.hudi.avro.HoodieAvroWriteSupport.HOODIE_MIN_RECORD_KEY_FOOTER;
+
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.List;
+import java.util.concurrent.atomic.AtomicLong;
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.generic.IndexedRecord;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.orc.storage.ql.exec.vector.ColumnVector;
+import org.apache.orc.storage.ql.exec.vector.VectorizedRowBatch;
+import org.apache.hudi.common.bloom.BloomFilter;
+import org.apache.hudi.common.bloom.HoodieDynamicBoundedBloomFilter;
+import org.apache.orc.OrcFile;
+import org.apache.orc.TypeDescription;
+import org.apache.orc.Writer;
+import org.apache.hudi.avro.HoodieAvroUtils;
+import org.apache.hudi.common.engine.TaskContextSupplier;
+import org.apache.hudi.common.fs.FSUtils;
+import org.apache.hudi.common.fs.HoodieWrapperFileSystem;
+import org.apache.hudi.common.model.HoodieRecord;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.util.AvroOrcUtils;
+
+public class HoodieOrcWriter<T extends HoodieRecordPayload, R extends IndexedRecord>
+implements HoodieFileWriter<R> {
+  private static final AtomicLong RECORD_INDEX = new AtomicLong(1);
+
+  private final long maxFileSize;
+  private final Schema avroSchema;
+  private final List fieldTypes;
+  private final List fieldNames;
+  private final VectorizedRowBatch batch;
+  private final Writer writer;
+
+  private final Path file;
+  private final HoodieWrapperFileSystem fs;
+  private final String instantTime;
+  private final TaskContextSupplier taskContextSupplier;
+
+  private HoodieOrcConfig orcConfig;
+  private String minRecordKey;
+  private String maxRecordKey;
+
+  public HoodieOrcWriter(String instantTime, Path file, HoodieOrcConfig 
config, Schema schema,
+  TaskContextSupplier taskContextSupplier) throws IOException {
+
+Configuration conf = FSUtils.registerFileSystem(file, 
config.getHadoopConf());
+this.file = HoodieWrapperFileSystem.convertToHoodiePath(file, conf);
+this.fs = (HoodieWrapperFileSystem) this.file.getFileSystem(conf);
+this.instantTime = instantTime;
+this.taskContextSupplier = taskContextSupplier;
+
+this.avroSchema = schema;
+final TypeDescription orcSchema = AvroOrcUtils.createOrcSchema(avroSchema);
+this.fieldTypes = orcSchema.getChildren();
+this.fieldNames = orcSchema.getFieldNames();
+this.maxFileSize = config.getMaxFileSize();
+this.batch = orcSchema.createRowBatch();
+OrcFile.WriterOptions writerOptions = OrcFile.writerOptions(conf)
+.blockSize(config.getBlockSize())
+.stripeSize(config.getStripeSize())
+.compress(config.getCompressionKind())
+.bufferSize(config.getBlockSize())
+.fileSystem(fs)
+.setSchema(orcSchema);
+this.writer = OrcFile.createWriter(this.file, writerOptions);
+this.orcConfig = config;
+  }
+
+  @Override
+  public void writeAvroWithMetadata(R avroRecord, HoodieRecord record) throws 
IOException {
+String seqId = HoodieRecord.generateSequenceId(instantTime, 
taskContextSupplier.getPartitionIdSupplier().get(),
+RECORD_INDEX.getAndIncrement());
+HoodieAvroUtils.addHoodieKeyToRecord((GenericRecord) avroRecord, 
record.getRecordKey(),
+record.getPartitionPath(), file.getName());
+HoodieAvroUtils
+.addCommitMetadataToRecord((GenericRecord) avroRecord, instantTime, 
seqId);
+
+writeAvro(record.getRecordKey(), avroRecord);
+  }
+
+  @Override
+  public 

[GitHub] [hudi] n3nash commented on a change in pull request #2793: [HUDI-57] Support ORC Storage

2021-04-09 Thread GitBox


n3nash commented on a change in pull request #2793:
URL: https://github.com/apache/hudi/pull/2793#discussion_r610909843



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieStorageConfig.java
##
@@ -39,10 +39,19 @@
   public static final String DEFAULT_PARQUET_BLOCK_SIZE_BYTES = 
DEFAULT_PARQUET_FILE_MAX_BYTES;
   public static final String PARQUET_PAGE_SIZE_BYTES = 
"hoodie.parquet.page.size";
   public static final String DEFAULT_PARQUET_PAGE_SIZE_BYTES = 
String.valueOf(1 * 1024 * 1024);
+
   public static final String HFILE_FILE_MAX_BYTES = 
"hoodie.hfile.max.file.size";
   public static final String HFILE_BLOCK_SIZE_BYTES = 
"hoodie.hfile.block.size";
   public static final String DEFAULT_HFILE_BLOCK_SIZE_BYTES = String.valueOf(1 
* 1024 * 1024);
   public static final String DEFAULT_HFILE_FILE_MAX_BYTES = String.valueOf(120 
* 1024 * 1024);
+
+  public static final String ORC_FILE_MAX_BYTES = "hoodie.orc.max.file.size";
+  public static final String DEFAULT_ORC_FILE_MAX_BYTES = String.valueOf(120 * 
1024 * 1024);
+  public static final String ORC_STRIPE_SIZE = "hoodie.orc.stripe.size";

Review comment:
   Can you please add comments on what the stripe size is used for?

##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieStorageConfig.java
##
@@ -39,10 +39,19 @@
   public static final String DEFAULT_PARQUET_BLOCK_SIZE_BYTES = 
DEFAULT_PARQUET_FILE_MAX_BYTES;
   public static final String PARQUET_PAGE_SIZE_BYTES = 
"hoodie.parquet.page.size";
   public static final String DEFAULT_PARQUET_PAGE_SIZE_BYTES = 
String.valueOf(1 * 1024 * 1024);
+
   public static final String HFILE_FILE_MAX_BYTES = 
"hoodie.hfile.max.file.size";
   public static final String HFILE_BLOCK_SIZE_BYTES = 
"hoodie.hfile.block.size";
   public static final String DEFAULT_HFILE_BLOCK_SIZE_BYTES = String.valueOf(1 
* 1024 * 1024);
   public static final String DEFAULT_HFILE_FILE_MAX_BYTES = String.valueOf(120 
* 1024 * 1024);
+
+  public static final String ORC_FILE_MAX_BYTES = "hoodie.orc.max.file.size";
+  public static final String DEFAULT_ORC_FILE_MAX_BYTES = String.valueOf(120 * 
1024 * 1024);
+  public static final String ORC_STRIPE_SIZE = "hoodie.orc.stripe.size";
+  public static final String DEFAULT_ORC_STRIPE_SIZE = String.valueOf(64 * 
1024 * 1024);
+  public static final String ORC_BLOCK_SIZE = "hoodie.orc.block.size";

Review comment:
   Same for block size
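
For reference, the semantics the reviewer is asking to document, phrased as the kind of comments that could be added (standard ORC behavior; the constants are copied from the diff, the wording is not from the PR):

```java
// Sketch of the requested documentation comments (standard ORC semantics).
public class OrcSizeNotes {
    // Stripe size: ORC groups rows into stripes; a stripe is the smallest unit
    // a reader can process independently, so it bounds read parallelism and
    // writer memory usage.
    public static final String ORC_STRIPE_SIZE = "hoodie.orc.stripe.size";
    public static final String DEFAULT_ORC_STRIPE_SIZE = String.valueOf(64 * 1024 * 1024);

    // Block size: the HDFS block size hint for the file; with block padding,
    // ORC avoids letting a stripe straddle a block boundary, which would force
    // a reader to touch two blocks for one stripe.
    public static final String ORC_BLOCK_SIZE = "hoodie.orc.block.size";
}
```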








[GitHub] [hudi] vansimonsen commented on issue #2294: [SUPPORT] java.lang.IllegalArgumentException: Can not create a Path from an empty string on non partitioned COW table

2021-04-09 Thread GitBox


vansimonsen commented on issue #2294:
URL: https://github.com/apache/hudi/issues/2294#issuecomment-816889220


   > @vansimonsen: Can you open a new GH issue with the stack trace. @rubenssoto: I believe the PR landed before 0.7.0 was cut.
   
   @bvaradar
   
   https://github.com/apache/hudi/issues/2797






[GitHub] [hudi] vansimonsen opened a new issue #2797: [SUPPORT] Can not create a Path from an empty string on unpartitioned table

2021-04-09 Thread GitBox


vansimonsen opened a new issue #2797:
URL: https://github.com/apache/hudi/issues/2797


   **Describe the problem you faced**
   
   * Issue creating unpartitioned tables in the Hive metastore (AWS Glue Data Catalog) using Hudi (tested on `0.6.0`, `0.7.0`, and `0.8.0`)
   * Using Hudi on AWS EMR, with PySpark
   
* Hudi config for unpartitioned tables
```
   hudiConfig = {
   "hoodie.datasource.write.precombine.field": ,
   "hoodie.datasource.write.recordkey.field": _PRIMARY_KEY_COLUMN,
   "hoodie.datasource.write.keygenerator.class": 
'org.apache.hudi.keygen.NonpartitionedKeyGenerator',
   "hoodie.datasource.hive_sync.partition_extractor_class": 
'org.apache.hudi.hive.NonPartitionedExtractor',
   "hoodie.datasource.write.hive_style_partitioning": "true",
   "className": "org.apache.hudi",
   "hoodie.datasource.hive_sync.use_jdbc": "false",
   "hoodie.consistency.check.enabled": "true",
   "hoodie.datasource.hive_sync.database": DB_NAME,
   "hoodie.datasource.hive_sync.enable": "true",
   "hoodie.datasource.hive_sync.support_timestamp": "true",
   }
```

   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Run Hudi with Hive integration
   2. Try to create an unpartitioned table with the config specified above
   
   **Expected behavior**
   
   The table would be created without throwing the exception, and without any partition or a `default` partitionpath
   
   **Environment Description**
   
   * Hudi version : `0.6.0`, `0.7.0` and `0.8.0`
   
   * Spark version :  `2.4.7` 
   
   * Hive version : Aws glue data catalog integration on EMR
   
   * Hadoop version : Amazon Hadoop distribution
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : no
   
   
   **Stacktrace**
   
   
```
org.apache.hudi.hive.HoodieHiveSyncException: Failed to get update last 
commit time synced to 20210407181606
at 
org.apache.hudi.hive.HoodieHiveClient.updateLastCommitTimeSynced(HoodieHiveClient.java:496)
at 
org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:150)
at 
org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:94)
at 
org.apache.hudi.HoodieSparkSqlWriter$.org$apache$hudi$HoodieSparkSqlWriter$$syncHive(HoodieSparkSqlWriter.scala:355)
at 
org.apache.hudi.HoodieSparkSqlWriter$$anonfun$metaSync$2.apply(HoodieSparkSqlWriter.scala:403)
at 
org.apache.hudi.HoodieSparkSqlWriter$$anonfun$metaSync$2.apply(HoodieSparkSqlWriter.scala:399)
at scala.collection.mutable.HashSet.foreach(HashSet.scala:78)
at 
org.apache.hudi.HoodieSparkSqlWriter$.metaSync(HoodieSparkSqlWriter.scala:399)
at 
org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:460)
at 
org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:217)
at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:134)
at 
org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:173)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:169)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:197)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at 
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:194)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:169)
at 
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:114)
at 
org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:112)
at 
org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:696)
at 
org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:696)
at 
org.apache.spark.sql.execution.SQLExecution$.org$apache$spark$sql$execution$SQLExecution$$executeQuery$1(SQLExecution.scala:83)
at 
org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1$$anonfun$apply$1.apply(SQLExecution.scala:94)
at 
org.apache.spark.sql.execution.QueryExecutionMetrics$.withMetrics(QueryExecutionMetrics.scala:141)
at 
org.apache.spark.sql.execution.SQLExecution$.org$apache$spark$sql$execution$SQLExecution$$withMetrics(SQLExecution.scala:178)
at 

[GitHub] [hudi] codecov-io edited a comment on pull request #2740: [HUDI-1055] Remove hardcoded parquet in tests

2021-04-09 Thread GitBox


codecov-io edited a comment on pull request #2740:
URL: https://github.com/apache/hudi/pull/2740#issuecomment-809855336


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2740?src=pr=h1) Report
   > Merging 
[#2740](https://codecov.io/gh/apache/hudi/pull/2740?src=pr=desc) (3e41f49) 
into 
[master](https://codecov.io/gh/apache/hudi/commit/6786581c4842e47e1a8a8e942f54003dc151c7c6?el=desc)
 (6786581) will **decrease** coverage by `43.17%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2740/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2740?src=pr=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #2740       +/-   ##
   =============================================
   - Coverage     52.54%    9.37%    -43.18%
   + Complexity     3707       48     -3659
   =============================================
     Files           485       54      -431
     Lines         23171     1995    -21176
     Branches       2459      235     -2224
   =============================================
   - Hits          12176      187    -11989
   + Misses         9923     1795     -8128
   + Partials       1072       13     -1059
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `?` | `?` | |
   | hudiclient | `?` | `?` | |
   | hudicommon | `?` | `?` | |
   | hudiflink | `?` | `?` | |
   | hudihadoopmr | `?` | `?` | |
   | hudisparkdatasource | `?` | `?` | |
   | hudisync | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `9.37% <ø> (-60.36%)` | `0.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2740?src=pr=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/2740/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | |
   | [...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/2740/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | |
   | [...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/2740/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | [.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/2740/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | [.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/2740/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | [...rg/apache/hudi/utilities/sources/CsvDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2740/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQ3N2REZTU291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-10.00%)` | |
   | [...g/apache/hudi/utilities/sources/JsonDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2740/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkRGU1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | [...apache/hudi/utilities/sources/JsonKafkaSource.java](https://codecov.io/gh/apache/hudi/pull/2740/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkthZmthU291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-6.00%)` | |
   | [...pache/hudi/utilities/sources/ParquetDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2740/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUGFycXVldERGU1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-5.00%)` | |
   | [...lities/schema/SchemaProviderWithPostProcessor.java](https://codecov.io/gh/apache/hudi/pull/2740/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFQcm92aWRlcldpdGhQb3N0UHJvY2Vzc29yLmphdmE=) | `0.00% <0.00%>

[GitHub] [hudi] TeRS-K commented on a change in pull request #2740: [HUDI-1055] Remove hardcoded parquet in tests

2021-04-09 Thread GitBox


TeRS-K commented on a change in pull request #2740:
URL: https://github.com/apache/hudi/pull/2740#discussion_r610789102



##
File path: hudi-cli/src/main/scala/org/apache/hudi/cli/SparkHelpers.scala
##
@@ -40,7 +40,7 @@ import scala.collection.mutable._
 object SparkHelpers {
   @throws[Exception]
   def skipKeysAndWriteNewFile(instantTime: String, fs: FileSystem, sourceFile: Path, destinationFile: Path, keysToSkip: Set[String]) {
-    val sourceRecords = ParquetUtils.readAvroRecords(fs.getConf, sourceFile)
+    val sourceRecords = new ParquetUtils().readAvroRecords(fs.getConf, sourceFile)

Review comment:
   I removed all instances of `new ParquetUtils()` except for the one in `HoodieSparkBootstrapSchemaProvider::getBootstrapSourceSchema`, as `readSchema()` returns a different schema type per file format. For Parquet, `readSchema` returns an `org.apache.parquet.schema.MessageType`; for ORC (in future work), `readSchema` would return an `org.apache.orc.TypeDescription`. So this method cannot be pulled up into the base class. Does that make sense?
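   A minimal standalone sketch of that type mismatch (class names, signatures, and stubs below are illustrative, not the actual Hudi code): each format's `readSchema` returns its own schema type, so the method only fits in a shared base class behind a generic type parameter.
   
   ```java
   // Illustrative sketch only: format-specific schema types keep readSchema()
   // from having a single non-generic signature in a shared base class.
   import org.apache.orc.TypeDescription;
   import org.apache.parquet.schema.MessageType;
   import org.apache.parquet.schema.Types;
   
   abstract class BaseFileUtilsSketch<S> {
     // S is the format-specific schema type (MessageType, TypeDescription, ...).
     abstract S readSchema(String filePath);
   }
   
   class ParquetUtilsSketch extends BaseFileUtilsSketch<MessageType> {
     @Override
     MessageType readSchema(String filePath) {
       return Types.buildMessage().named("stub"); // stub; real code would read the Parquet footer
     }
   }
   
   class OrcUtilsSketch extends BaseFileUtilsSketch<TypeDescription> {
     @Override
     TypeDescription readSchema(String filePath) {
       return TypeDescription.createStruct(); // stub; real code would read the ORC file tail
     }
   }
   
   class SchemaTypeDemo {
     public static void main(String[] args) {
       System.out.println(new ParquetUtilsSketch().readSchema("f").getClass()); // MessageType
       System.out.println(new OrcUtilsSketch().readSchema("f").getClass());     // TypeDescription
     }
   }
   ```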




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] rubenssoto edited a comment on issue #2509: [SUPPORT]Hudi saves TimestampType as bigInt

2021-04-09 Thread GitBox


rubenssoto edited a comment on issue #2509:
URL: https://github.com/apache/hudi/issues/2509#issuecomment-816812025


   Hello Guys,
   
   @satishkotha  @nsivabalan
   
   Athena behavior has changed:
   
   https://user-images.githubusercontent.com/36298331/114213658-a841c400-9939-11eb-9fc9-a2e51761908e.png
   https://user-images.githubusercontent.com/36298331/114213672-ad067800-9939-11eb-872d-fe264f97fcde.png
   
   
   This is great news, but the BETWEEN operator doesn't work.
   
   For example, this query works:
   select count(1) FROM "order" WHERE created_date >= cast('2021-04-07 03:00:00.000' as timestamp)
   
   and this query doesn't work:
   select count(1) FROM "order" WHERE created_date between cast('2021-04-09 14:00:00.000' as timestamp) and cast('2021-04-09 15:00:00.000' as timestamp)
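   For reference, a small Spark sketch (run locally, not on Athena; the data row is hypothetical) of the standard-SQL equivalence the queries above rely on: `x BETWEEN a AND b` is defined as `x >= a AND x <= b`, so both forms should return the same count, and a mismatch points at the engine rather than the query.
   
   ```java
   // Demonstrates that BETWEEN and the explicit >=/<= range predicate agree,
   // per standard SQL semantics. Spark stands in for Athena here.
   import org.apache.spark.sql.SparkSession;
   
   public class BetweenEquivalenceSketch {
     public static void main(String[] args) {
       SparkSession spark = SparkSession.builder()
           .appName("between-sketch").master("local[1]").getOrCreate();
   
       // One hypothetical row inside the queried window.
       spark.sql("SELECT cast('2021-04-09 14:30:00.000' as timestamp) AS created_date")
           .createOrReplaceTempView("order_view");
   
       long viaRange = spark.sql(
           "SELECT count(1) FROM order_view"
               + " WHERE created_date >= cast('2021-04-09 14:00:00.000' as timestamp)"
               + " AND created_date <= cast('2021-04-09 15:00:00.000' as timestamp)")
           .first().getLong(0);
   
       long viaBetween = spark.sql(
           "SELECT count(1) FROM order_view"
               + " WHERE created_date BETWEEN cast('2021-04-09 14:00:00.000' as timestamp)"
               + " AND cast('2021-04-09 15:00:00.000' as timestamp)")
           .first().getLong(0);
   
       System.out.println(viaRange + " == " + viaBetween); // 1 == 1
       spark.stop();
     }
   }
   ```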


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] rubenssoto commented on issue #2509: [SUPPORT]Hudi saves TimestampType as bigInt

2021-04-09 Thread GitBox


rubenssoto commented on issue #2509:
URL: https://github.com/apache/hudi/issues/2509#issuecomment-816812025


   Hello Guys,
   
   Athena behavior has changed:
   
   https://user-images.githubusercontent.com/36298331/114213658-a841c400-9939-11eb-9fc9-a2e51761908e.png
   https://user-images.githubusercontent.com/36298331/114213672-ad067800-9939-11eb-872d-fe264f97fcde.png
   
   
   This is great news, but the BETWEEN operator doesn't work.
   
   For example, this query works:
   select count(1) FROM "order" WHERE created_date >= cast('2021-04-07 03:00:00.000' as timestamp)
   
   and this query doesn't work:
   select count(1) FROM "order" WHERE created_date between cast('2021-04-09 14:00:00.000' as timestamp) and cast('2021-04-09 15:00:00.000' as timestamp)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] TeRS-K closed pull request #2793: [HUDI-57] Support ORC Storage

2021-04-09 Thread GitBox


TeRS-K closed pull request #2793:
URL: https://github.com/apache/hudi/pull/2793


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] yanghua commented on a change in pull request #2796: [HUDI-1783]support Huawei Cloud Object Storage

2021-04-09 Thread GitBox


yanghua commented on a change in pull request #2796:
URL: https://github.com/apache/hudi/pull/2796#discussion_r610693252



##
File path: hudi-common/src/main/java/org/apache/hudi/common/fs/StorageSchemes.java
##
@@ -53,8 +53,9 @@
   // Databricks file system
   DBFS("dbfs", false),
   // IBM Cloud Object Storage
-  COS("cos", false);
-
+  COS("cos", false),
+  // Huawei Cloud Object Storage
+  OBS("obs", false);

Review comment:
   Can we add an empty line like before?
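   For context, a self-contained sketch (a simplified stand-in, not the actual Hudi class or its full API) of what this enum entry drives: Hudi matches a path's URI scheme against the registered constants to decide whether the storage is supported and whether it allows appends, so registering `obs` is what makes Huawei OBS paths recognizable.
   
   ```java
   // Simplified stand-in for StorageSchemes; the lookup helpers are illustrative.
   import java.util.Arrays;
   
   enum StorageSchemesSketch {
     // IBM Cloud Object Storage
     COS("cos", false),
     // Huawei Cloud Object Storage (the entry added by this PR)
     OBS("obs", false);
   
     private final String scheme;
     private final boolean supportsAppend;
   
     StorageSchemesSketch(String scheme, boolean supportsAppend) {
       this.scheme = scheme;
       this.supportsAppend = supportsAppend;
     }
   
     // True if some registered constant claims the given URI scheme.
     static boolean isSchemeSupported(String scheme) {
       return Arrays.stream(values()).anyMatch(s -> s.scheme.equals(scheme));
     }
   
     // True only for schemes whose file system supports appends.
     static boolean isAppendSupported(String scheme) {
       return Arrays.stream(values())
           .anyMatch(s -> s.scheme.equals(scheme) && s.supportsAppend);
     }
   
     public static void main(String[] args) {
       System.out.println(isSchemeSupported("obs")); // true once OBS is registered
       System.out.println(isAppendSupported("obs")); // false: no appends on OBS
     }
   }
   ```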




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] kimberlyamandalu commented on issue #2620: [SUPPORT] Performance Tuning: Slow stages (Building Workload Profile & Getting Small files from partitions) during Hudi Writes

2021-04-09 Thread GitBox


kimberlyamandalu commented on issue #2620:
URL: https://github.com/apache/hudi/issues/2620#issuecomment-816698689


   > @kimberlyamandalu : do you have a support ticket for your question. lets 
not pollute this issue. we can create a new one for your use-case and can 
discuss over there
   
   Hi @nsivabalan, no, I do not have a separate ticket for my question. I thought it might be related to this one, so I chimed in. I can open a new ticket for my use case so we can isolate it. Sorry for the confusion. Thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] xiarixiaoyao commented on pull request #2796: [HUDI-1783]support Huawei Cloud Object Storage

2021-04-09 Thread GitBox


xiarixiaoyao commented on pull request #2796:
URL: https://github.com/apache/hudi/pull/2796#issuecomment-816688293


   @nsivabalan could you please help review this PR? Thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-1783) support Huawei Cloud Object Storage

2021-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-1783:
-
Labels: pull-request-available  (was: )

> support Huawei  Cloud Object Storage
> 
>
> Key: HUDI-1783
> URL: https://issues.apache.org/jira/browse/HUDI-1783
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Common Core
>Reporter: tao meng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> add support  for Huawei Cloud Object Storage



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1752) Add HoodieFlinkClient InsertOverwrite

2021-04-09 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1752:
--
Fix Version/s: (was: 0.8.0)
   0.9.0

> Add HoodieFlinkClient InsertOverwrite
> -
>
> Key: HUDI-1752
> URL: https://issues.apache.org/jira/browse/HUDI-1752
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: CLI, Flink Integration
>Reporter: xurunbai
>Priority: Minor
>  Labels: features
> Fix For: 0.9.0
>
>
> Add HoodieFlinkClient InsertOverwrite



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] xiarixiaoyao opened a new pull request #2796: [HUDI-1783]support Huawei Cloud Object Storage

2021-04-09 Thread GitBox


xiarixiaoyao opened a new pull request #2796:
URL: https://github.com/apache/hudi/pull/2796


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   
   Add support for Huawei Cloud Object Storage.
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (HUDI-1783) support Huawei Cloud Object Storage

2021-04-09 Thread tao meng (Jira)
tao meng created HUDI-1783:
--

 Summary: support Huawei  Cloud Object Storage
 Key: HUDI-1783
 URL: https://issues.apache.org/jira/browse/HUDI-1783
 Project: Apache Hudi
  Issue Type: Bug
  Components: Common Core
Reporter: tao meng
 Fix For: 0.9.0


add support  for Huawei Cloud Object Storage



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] Magicbeanbuyer commented on issue #2498: [SUPPORT] Hudi MERGE_ON_READ load to dataframe fails for the versions [0.6.0],[0.7.0] and runs for [0.5.3]

2021-04-09 Thread GitBox


Magicbeanbuyer commented on issue #2498:
URL: https://github.com/apache/hudi/issues/2498#issuecomment-816555746


   Hey @nsivabalan, 
   
   We have wrapped up our POC and therefore no longer have the setup. Sorry we couldn't contribute further to the issue.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] xiarixiaoyao edited a comment on pull request #2720: [HUDI-1719]hive on spark/mr,Incremental query of the mor table, the partition field is incorrect

2021-04-09 Thread GitBox


xiarixiaoyao edited a comment on pull request #2720:
URL: https://github.com/apache/hudi/pull/2720#issuecomment-816553709


   @nsivabalan I found a question: why is the test in TestHoodieCombineHiveInputFormat disabled?
   
   @Test
   @Disabled
   public void testHoodieRealtimeCombineHoodieInputFormat() throws Exception {
   
   This UT fails when I enable it. I have now fixed the bug in this UT; can you push a patch for this test?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] xiarixiaoyao commented on pull request #2720: [HUDI-1719]hive on spark/mr,Incremental query of the mor table, the partition field is incorrect

2021-04-09 Thread GitBox


xiarixiaoyao commented on pull request #2720:
URL: https://github.com/apache/hudi/pull/2720#issuecomment-816553709


   @nsivabalan I found a question: why is the test in TestHoodieCombineHiveInputFormat disabled?
   
   @Test
   @Disabled
   public void testHoodieRealtimeCombineHoodieInputFormat() throws Exception {
   
   This UT fails when I enable it. I have now fixed the bug in this UT; can you push a patch for this test?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[hudi] branch asf-site updated: Travis CI build asf-site

2021-04-09 Thread vinoth
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new b353b0b  Travis CI build asf-site
b353b0b is described below

commit b353b0b66cf17ccfbbf55d58d741fdebd66ac0a4
Author: CI 
AuthorDate: Fri Apr 9 08:39:56 2021 +

Travis CI build asf-site
---
 content/docs/0.8.0-concurrency_control.html |  6 --
 content/docs/0.8.0-configurations.html  | 32 -
 content/docs/concurrency_control.html   |  6 --
 content/docs/configurations.html| 32 -
 4 files changed, 36 insertions(+), 40 deletions(-)

diff --git a/content/docs/0.8.0-concurrency_control.html 
b/content/docs/0.8.0-concurrency_control.html
index d28ec87..4e540b5 100644
--- a/content/docs/0.8.0-concurrency_control.html
+++ b/content/docs/0.8.0-concurrency_control.html
@@ -403,8 +403,6 @@ hoodie.write.lock.provider=lock-provider-classname
 hoodie.write.lock.provider=org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider
 hoodie.write.lock.zookeeper.url
 hoodie.write.lock.zookeeper.port
-hoodie.write.lock.wait_time_ms
-hoodie.write.lock.num_retries
 hoodie.write.lock.zookeeper.lock_key
 hoodie.write.lock.zookeeper.base_path
 
@@ -414,8 +412,6 @@ hoodie.write.lock.zookeeper.base_path
 hoodie.write.lock.provider=org.apache.hudi.hive.HiveMetastoreBasedLockProvider
 hoodie.write.lock.hivemetastore.database
 hoodie.write.lock.hivemetastore.table
-hoodie.write.lock.wait_time_ms
-hoodie.write.lock.num_retries
 
 
 The HiveMetastore URI's are picked up from 
the hadoop configuration file loaded during runtime.
@@ -433,8 +429,6 @@ hoodie.write.lock.num_retries
.option("hoodie.write.concurrency.mode", "optimistic_concurrency_control")
.option("hoodie.write.lock.zookeeper.url", "zookeeper")
.option("hoodie.write.lock.zookeeper.port", 
"2181")
-   .option("hoodie.write.lock.wait_time_ms", "12000")
-   .option("hoodie.write.lock.num_retries", "2")
.option("hoodie.write.lock.zookeeper.lock_key", 
"test_table")
.option("hoodie.write.lock.zookeeper.base_path", "/test")
.option(RECORDKEY_FIELD_OPT_KEY, "uuid")
diff --git a/content/docs/0.8.0-configurations.html 
b/content/docs/0.8.0-configurations.html
index 434418f..960dfb4 100644
--- a/content/docs/0.8.0-configurations.html
+++ b/content/docs/0.8.0-configurations.html
@@ -999,6 +999,10 @@ HoodieWriteConfig can be built using a builder pattern as 
below.
 Property: hoodie.cleaner.policy 
  Cleaning policy to be used. Hudi will delete older 
versions of parquet files to re-claim space. Any Query/Computation referring to 
this version of the file will fail. It is good to make sure that the data is 
retained for more than the maximum query execution time.
 
+withFailedWritesCleaningPolicy(policy 
= HoodieFailedWritesCleaningPolicy.EAGER)
+Property: hoodie.cleaner.policy.failed.writes 
+ Cleaning policy for failed writes to be used. Hudi 
will delete any files written by failed writes to re-claim space. Choose to 
perform this rollback of failed writes eagerly before every writer starts (only 
supported for single writer) or lazily 
by the cleaner (required for multi-writers)
+
 retainCommits(no_of_commits_to_retain = 24)
 Property: hoodie.cleaner.commits.retained 
 Number of commits to retain. So data will be retained 
for num_of_commits * time_between_commits (scheduled). This also directly 
translates into how much you can incrementally pull on this table
@@ -1360,59 +1364,59 @@ Each clustering operation can create multiple groups. 
Total amount of data proce
 withLockConfig (HoodieLockConfig) 
 
 withLockProvider(lockProvider = 
org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider)
-Property: hoodie.writer.lock.provider 
+Property: hoodie.write.lock.provider 

 Lock provider class name, user can provide their own 
implementation of LockProvider which should be subclass of 
org.apache.hudi.common.lock.LockProvider
 
 withZkQuorum(zkQuorum)
-Property: hoodie.writer.lock.zookeeper.url 
+Property: hoodie.write.lock.zookeeper.url 
 Set the list of comma separated servers to connect 
to
 
 withZkBasePath(zkBasePath)
-Property: hoodie.writer.lock.zookeeper.base_path 
[Required] 
+Property: hoodie.write.lock.zookeeper.base_path 
[Required] 
 The base path on Zookeeper under which to create a 
ZNode to acquire the lock. This should be common for all jobs writing to the 
same table
 
 withZkPort(zkPort)
-Property: hoodie.writer.lock.zookeeper.port [Required] 

+Property: hoodie.write.lock.zookeeper.port [Required] 

 The connection port to be used for 
Zookeeper
 
 withZkLockKey(zkLockKey)
-Property: hoodie.writer.lock.zookeeper.lock_key 
[Required] 
+Property: hoodie.write.lock.zookeeper.lock_key 
[Required] 
 Key name under base_path at which to create a ZNode 
and 

[GitHub] [hudi] yanghua commented on pull request #2793: [HUDI-57] Support ORC Storage

2021-04-09 Thread GitBox


yanghua commented on pull request #2793:
URL: https://github.com/apache/hudi/pull/2793#issuecomment-816495293


   >  How can I trigger a rebuild?
   
   Option 1: close and reopen the PR.
   Option 2: push an empty commit via the git command line (e.g. `git commit --allow-empty -m "trigger CI"`, then `git push`).
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] n3nash merged pull request #2794: [MINOR] Fix concurrency docs

2021-04-09 Thread GitBox


n3nash merged pull request #2794:
URL: https://github.com/apache/hudi/pull/2794


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[hudi] branch asf-site updated: [MINOR] Fix concurrency docs (#2794)

2021-04-09 Thread nagarwal
This is an automated email from the ASF dual-hosted git repository.

nagarwal pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new a961350  [MINOR] Fix concurrency docs (#2794)
a961350 is described below

commit a961350740abf4d1637798bc287bd0b6b9800305
Author: n3nash 
AuthorDate: Fri Apr 9 00:48:45 2021 -0700

[MINOR] Fix concurrency docs (#2794)
---
 docs/_docs/0.8.0/2_4_configurations.md  | 32 -
 docs/_docs/0.8.0/2_9_concurrency_control.md |  6 --
 docs/_docs/2_4_configurations.md| 32 -
 docs/_docs/2_9_concurrency_control.md   |  6 --
 4 files changed, 36 insertions(+), 40 deletions(-)

diff --git a/docs/_docs/0.8.0/2_4_configurations.md 
b/docs/_docs/0.8.0/2_4_configurations.md
index 0a5a4ab..207bf80 100644
--- a/docs/_docs/0.8.0/2_4_configurations.md
+++ b/docs/_docs/0.8.0/2_4_configurations.md
@@ -469,6 +469,10 @@ Configs that control compaction (merging of log files onto 
a new parquet base fi
 Property: `hoodie.cleaner.policy` 
  Cleaning policy to be used. Hudi will delete older 
versions of parquet files to re-claim space. Any Query/Computation referring to 
this version of the file will fail. It is good to make sure that the data is 
retained for more than the maximum query execution time.
 
+ withFailedWritesCleaningPolicy(policy = 
HoodieFailedWritesCleaningPolicy.EAGER) {#withFailedWritesCleaningPolicy} 
+Property: `hoodie.cleaner.policy.failed.writes` 
+ Cleaning policy for failed writes to be used. Hudi 
will delete any files written by failed writes to re-claim space. Choose to 
perform this rollback of failed writes `eagerly` before every writer starts 
(only supported for single writer) or `lazily` by the cleaner (required for 
multi-writers)
+
  retainCommits(no_of_commits_to_retain = 24) {#retainCommits} 
 Property: `hoodie.cleaner.commits.retained` 
 Number of commits to retain. So data will be retained 
for num_of_commits * time_between_commits (scheduled). This also directly 
translates into how much you can incrementally pull on this table
@@ -831,59 +835,59 @@ Configs that control locking mechanisms if 
[WriteConcurrencyMode=optimistic_conc
 [withLockConfig](#withLockConfig) (HoodieLockConfig) 
 
  withLockProvider(lockProvider = 
org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider) 
{#withLockProvider}
-Property: `hoodie.writer.lock.provider` 
+Property: `hoodie.write.lock.provider` 
 Lock provider class name, user can provide their own 
implementation of LockProvider which should be subclass of 
org.apache.hudi.common.lock.LockProvider
 
  withZkQuorum(zkQuorum) {#withZkQuorum}
-Property: `hoodie.writer.lock.zookeeper.url` 
+Property: `hoodie.write.lock.zookeeper.url` 
 Set the list of comma separated servers to connect 
to
 
  withZkBasePath(zkBasePath) {#withZkBasePath}
-Property: `hoodie.writer.lock.zookeeper.base_path` [Required] 
+Property: `hoodie.write.lock.zookeeper.base_path` [Required] 
 The base path on Zookeeper under which to create a 
ZNode to acquire the lock. This should be common for all jobs writing to the 
same table
 
  withZkPort(zkPort) {#withZkPort}
-Property: `hoodie.writer.lock.zookeeper.port` [Required] 
+Property: `hoodie.write.lock.zookeeper.port` [Required] 
 The connection port to be used for Zookeeper
 
  withZkLockKey(zkLockKey) {#withZkLockKey}
-Property: `hoodie.writer.lock.zookeeper.lock_key` [Required] 
+Property: `hoodie.write.lock.zookeeper.lock_key` [Required] 
 Key name under base_path at which to create a ZNode 
and acquire lock. Final path on zk will look like base_path/lock_key. We 
recommend setting this to the table name
 
  withZkConnectionTimeoutInMs(connectionTimeoutInMs = 15000) 
{#withZkConnectionTimeoutInMs}
-Property: `hoodie.writer.lock.zookeeper.connection_timeout_ms` 
+Property: `hoodie.write.lock.zookeeper.connection_timeout_ms` 
 How long to wait when connecting to ZooKeeper before 
considering the connection a failure
 
  withZkSessionTimeoutInMs(sessionTimeoutInMs = 6) 
{#withZkSessionTimeoutInMs}
-Property: `hoodie.writer.lock.zookeeper.session_timeout_ms` 
+Property: `hoodie.write.lock.zookeeper.session_timeout_ms` 
 How long to wait after losing a connection to 
ZooKeeper before the session is expired
 
  withNumRetries(num_retries = 3) {#withNumRetries}
-Property: `hoodie.writer.lock.num_retries` 
+Property: `hoodie.write.lock.num_retries` 
 Maximum number of times to retry by lock provider 
client
 
  withRetryWaitTimeInMillis(retryWaitTimeInMillis = 5000) 
{#withRetryWaitTimeInMillis}
-Property: `hoodie.writer.lock.wait_time_ms_between_retry` 
+Property: `hoodie.write.lock.wait_time_ms_between_retry` 
 Initial amount of time to wait between retries by 
lock provider client
 
  withHiveDatabaseName(hiveDatabaseName) {#withHiveDatabaseName}
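
Taken together, the two commits above rename the lock configs from hoodie.writer.lock.* to hoodie.write.lock.* and document hoodie.cleaner.policy.failed.writes. A hedged sketch of a Spark writer using the corrected keys and the values shown in the diffs (input path, record key field, and table base path are hypothetical; using the enum names EAGER/LAZY as the failed-writes policy option values is an assumption):

```java
// Sketch of a multi-writer Hudi write using the corrected hoodie.write.lock.*
// keys; the old hoodie.writer.lock.* spellings were the doc bug fixed above.
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class MultiWriterConfigSketch {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("hudi-multi-writer-sketch")
        .master("local[2]")
        .getOrCreate();

    Dataset<Row> df = spark.read().json("/tmp/input.json"); // hypothetical input

    df.write().format("hudi")
        .option("hoodie.table.name", "test_table")
        .option("hoodie.datasource.write.recordkey.field", "uuid") // hypothetical key field
        // Optimistic concurrency control with the Zookeeper-based lock provider,
        // using the values from the concurrency_control doc above.
        .option("hoodie.write.concurrency.mode", "optimistic_concurrency_control")
        .option("hoodie.write.lock.provider",
            "org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider")
        .option("hoodie.write.lock.zookeeper.url", "zookeeper")
        .option("hoodie.write.lock.zookeeper.port", "2181")
        .option("hoodie.write.lock.zookeeper.lock_key", "test_table")
        .option("hoodie.write.lock.zookeeper.base_path", "/test")
        // Multi-writer setups need lazy cleaning of failed writes; EAGER is the
        // single-writer default per hoodie.cleaner.policy.failed.writes above.
        .option("hoodie.cleaner.policy.failed.writes", "LAZY")
        .mode(SaveMode.Append)
        .save("/tmp/hudi/test_table"); // hypothetical base path

    spark.stop();
  }
}
```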

[GitHub] [hudi] garyli1019 commented on pull request #2786: [HUDI-1782] Add more options for HUDI Flink

2021-04-09 Thread GitBox


garyli1019 commented on pull request #2786:
URL: https://github.com/apache/hudi/pull/2786#issuecomment-816462388


   > > Should we change the 0.8.0 doc as well? It will be merged soon. #2792
   > 
   > I think it is not necessary ? People would always see the master document.
   
   If the new options only affect master, then it should be fine. Many users are still on older versions; AWS EMR, for example, is still on 0.6.0. So the versioned docs are still worth maintaining.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-1711) Avro Schema Exception with Spark 3.0 in 0.7

2021-04-09 Thread sivabalan narayanan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17317684#comment-17317684
 ] 

sivabalan narayanan commented on HUDI-1711:
---

[~cdmikechen]: Can you give me the Avro schema for which the exception is seen, and can you please confirm my understanding of when exactly the exception occurs?
 * You see the exception with the default value of "hoodie.deltastreamer.schemaprovider.spark_avro_post_processor.enable", which means it is enabled.
 * Did you try Spark 2.x with Hudi 0.7.0?
 * FYI: in the latest release, we added support for a custom Kafka deserializer that can leverage the latest schema from the schema registry: [https://github.com/apache/hudi/pull/2619]

> Avro Schema Exception with Spark 3.0 in 0.7
> ---
>
> Key: HUDI-1711
> URL: https://issues.apache.org/jira/browse/HUDI-1711
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: DeltaStreamer
>Reporter: Balaji Varadarajan
>Assignee: sivabalan narayanan
>Priority: Major
>  Labels: sev:critical, user-support-issues
>
> GH: [https://github.com/apache/hudi/issues/2705]
>  
>  
> {{21/03/22 10:10:35 WARN util.package: Truncated the string representation of 
> a plan since it was too large. This behavior can be adjusted by setting 
> 'spark.sql.debug.maxToStringFields'.
> 21/03/22 10:10:35 ERROR executor.Executor: Exception in task 0.0 in stage 1.0 
> (TID 1)
> java.lang.RuntimeException: Error while decoding: 
> java.lang.NegativeArraySizeException: -1255727808
> createexternalrow(...): the generated decoder expression over input rows whose struct fields are
> StructField(id,IntegerType,false), StructField(name,StringType,true), StructField(type,StringType,true),
> StructField(url,StringType,true), StructField(user,StringType,true), StructField(password,StringType,true),
> StructField(create_time,StringType,true), StructField(create_user,StringType,true),
> StructField(update_time,StringType,true), StructField(update_user,StringType,true),
> StructField(del_flag,IntegerType,true)