[jira] [Commented] (HUDI-677) Abstract/Refactor all transaction management logic into a set of classes from HoodieWriteClient

2020-03-28 Thread hong dongdong (Jira)


[ https://issues.apache.org/jira/browse/HUDI-677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17070217#comment-17070217 ]

hong dongdong commented on HUDI-677:


[~vinoth] Of course not. 

> Abstract/Refactor all transaction management logic into a set of classes from 
> HoodieWriteClient
> ---
>
> Key: HUDI-677
> URL: https://issues.apache.org/jira/browse/HUDI-677
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Code Cleanup
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
> Fix For: 0.6.0
>
>
> Over time, a lot of the core transaction management code has been split
> across various files in hudi-client. We want to clean this up and present a
> clean interface.
> Some notes, thoughts, and suggestions follow.
>  
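The consolidation sketched in this issue — pulling the instant lifecycle (begin, commit, rollback) out of HoodieWriteClient into one place — could look roughly like the following. This is a minimal sketch under assumed names; `HoodieTransactionManager`, `InMemoryTxnManager`, and all method names below are illustrative, not actual Hudi APIs.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class TransactionManagerSketch {

  // Hypothetical interface — names are illustrative, not the actual Hudi API.
  interface HoodieTransactionManager {
    String beginCommit();             // returns a new instant time
    void commit(String instantTime);  // transitions inflight -> completed
    void rollback(String instantTime);// discards a failed inflight instant
    Set<String> completedInstants();
  }

  // Minimal in-memory implementation, for illustration only.
  static class InMemoryTxnManager implements HoodieTransactionManager {
    private final Set<String> inflight = new LinkedHashSet<>();
    private final Set<String> completed = new LinkedHashSet<>();
    private long counter = 0;

    public String beginCommit() {
      // 14-digit placeholder standing in for a yyyyMMddHHmmss instant time.
      String instant = String.format("%014d", counter++);
      inflight.add(instant);
      return instant;
    }

    public void commit(String instant) {
      if (!inflight.remove(instant)) {
        throw new IllegalStateException("not inflight: " + instant);
      }
      completed.add(instant);
    }

    public void rollback(String instant) {
      inflight.remove(instant);
    }

    public Set<String> completedInstants() {
      return completed;
    }
  }

  public static void main(String[] args) {
    HoodieTransactionManager txn = new InMemoryTxnManager();
    String t1 = txn.beginCommit();
    txn.commit(t1);
    String t2 = txn.beginCommit();
    txn.rollback(t2);
    System.out.println(txn.completedInstants());
  }
}
```

The point of the sketch is that the write client would only call begin/commit/rollback, with the instant bookkeeping hidden behind the interface.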



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] codecov-io edited a comment on issue #1460: [HUDI-679] Make io package Spark free

2020-03-28 Thread GitBox
codecov-io edited a comment on issue #1460: [HUDI-679] Make io package Spark 
free
URL: https://github.com/apache/incubator-hudi/pull/1460#issuecomment-605428358
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1460?src=pr=h1) 
Report
   > Merging 
[#1460](https://codecov.io/gh/apache/incubator-hudi/pull/1460?src=pr=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/1713f686f86e8c2f0a908c313cca9b595c6aed33=desc)
 will **decrease** coverage by `0.06%`.
   > The diff coverage is `97.36%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1460/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1460?src=pr=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #1460      +/-   ##
   ============================================
   - Coverage     67.66%   67.60%   -0.07%     
     Complexity      261      261              
   ============================================
     Files           342      348       +6     
     Lines         16510    16670     +160     
     Branches       1684     1693       +9     
   ============================================
   + Hits          11172    11270      +98     
   - Misses         4599     4661      +62     
     Partials        739      739              
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-hudi/pull/1460?src=pr=tree) | 
Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[.../hudi/execution/MergeOnReadLazyInsertIterable.java](https://codecov.io/gh/apache/incubator-hudi/pull/1460/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZXhlY3V0aW9uL01lcmdlT25SZWFkTGF6eUluc2VydEl0ZXJhYmxlLmphdmE=)
 | `64.70% <66.66%> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[...g/apache/hudi/client/SparkTaskContextSupplier.java](https://codecov.io/gh/apache/incubator-hudi/pull/1460/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpZW50L1NwYXJrVGFza0NvbnRleHRTdXBwbGllci5qYXZh)
 | `100.00% <100.00%> (ø)` | `0.00 <0.00> (?)` | |
   | 
[...g/apache/hudi/execution/BulkInsertMapFunction.java](https://codecov.io/gh/apache/incubator-hudi/pull/1460/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZXhlY3V0aW9uL0J1bGtJbnNlcnRNYXBGdW5jdGlvbi5qYXZh)
 | `100.00% <100.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[.../hudi/execution/CopyOnWriteLazyInsertIterable.java](https://codecov.io/gh/apache/incubator-hudi/pull/1460/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZXhlY3V0aW9uL0NvcHlPbldyaXRlTGF6eUluc2VydEl0ZXJhYmxlLmphdmE=)
 | `80.76% <100.00%> (+0.37%)` | `0.00 <0.00> (ø)` | |
   | 
[...in/java/org/apache/hudi/io/HoodieAppendHandle.java](https://codecov.io/gh/apache/incubator-hudi/pull/1460/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW8vSG9vZGllQXBwZW5kSGFuZGxlLmphdmE=)
 | `84.17% <100.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[...in/java/org/apache/hudi/io/HoodieCreateHandle.java](https://codecov.io/gh/apache/incubator-hudi/pull/1460/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW8vSG9vZGllQ3JlYXRlSGFuZGxlLmphdmE=)
 | `84.61% <100.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[...ain/java/org/apache/hudi/io/HoodieMergeHandle.java](https://codecov.io/gh/apache/incubator-hudi/pull/1460/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW8vSG9vZGllTWVyZ2VIYW5kbGUuamF2YQ==)
 | `79.31% <100.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[...ain/java/org/apache/hudi/io/HoodieWriteHandle.java](https://codecov.io/gh/apache/incubator-hudi/pull/1460/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW8vSG9vZGllV3JpdGVIYW5kbGUuamF2YQ==)
 | `75.00% <100.00%> (+1.66%)` | `0.00 <0.00> (ø)` | |
   | 
[...rg/apache/hudi/io/storage/HoodieParquetWriter.java](https://codecov.io/gh/apache/incubator-hudi/pull/1460/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW8vc3RvcmFnZS9Ib29kaWVQYXJxdWV0V3JpdGVyLmphdmE=)
 | `100.00% <100.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[...he/hudi/io/storage/HoodieStorageWriterFactory.java](https://codecov.io/gh/apache/incubator-hudi/pull/1460/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW8vc3RvcmFnZS9Ib29kaWVTdG9yYWdlV3JpdGVyRmFjdG9yeS5qYXZh)
 | `93.75% <100.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | ... and [22 
more](https://codecov.io/gh/apache/incubator-hudi/pull/1460/diff?src=pr=tree-more)
 | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1460?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1460?src=pr=footer).
 Last update 

[GitHub] [incubator-hudi] leesf commented on issue #1460: [HUDI-679] Make io package Spark free

2020-03-28 Thread GitBox
leesf commented on issue #1460: [HUDI-679] Make io package Spark free
URL: https://github.com/apache/incubator-hudi/pull/1460#issuecomment-605560443
 
 
   @yanghua @vinothchandar Thanks for your review, just updated the PR to 
address your comments. PTAL.
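For context on what making the io package "Spark free" entails: write handles would depend on an engine-agnostic task-context supplier instead of calling Spark's TaskContext directly. A rough sketch follows, with the caveat that only `SparkTaskContextSupplier` is confirmed by the PR's file list — the interface shape and the local implementation below are assumptions for illustration.

```java
import java.util.function.Supplier;

public class TaskContextSupplierSketch {

  // Engine-agnostic supplier the io package can depend on.
  // Method names here are illustrative, not the exact Hudi API.
  interface TaskContextSupplier {
    Supplier<Integer> getPartitionIdSupplier();
    Supplier<Integer> getStageIdSupplier();
    Supplier<Long> getAttemptIdSupplier();
  }

  // A local (non-Spark) implementation, e.g. for tests or single-JVM runs.
  // A Spark-backed one would delegate to TaskContext.get() instead.
  static class LocalTaskContextSupplier implements TaskContextSupplier {
    public Supplier<Integer> getPartitionIdSupplier() { return () -> 0; }
    public Supplier<Integer> getStageIdSupplier() { return () -> 0; }
    public Supplier<Long> getAttemptIdSupplier() { return () -> 0L; }
  }

  public static void main(String[] args) {
    TaskContextSupplier ctx = new LocalTaskContextSupplier();
    // A write handle would ask the supplier rather than Spark directly.
    System.out.println("partition=" + ctx.getPartitionIdSupplier().get());
  }
}
```

With this inversion, classes like HoodieAppendHandle and HoodieCreateHandle stay free of a compile-time Spark dependency.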


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


Build failed in Jenkins: hudi-snapshot-deployment-0.5 #231

2020-03-28 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 2.40 KB...]
/home/jenkins/tools/maven/apache-maven-3.5.4/conf:
logging
settings.xml
toolchains.xml

/home/jenkins/tools/maven/apache-maven-3.5.4/conf/logging:
simplelogger.properties

/home/jenkins/tools/maven/apache-maven-3.5.4/lib:
aopalliance-1.0.jar
cdi-api-1.0.jar
cdi-api.license
commons-cli-1.4.jar
commons-cli.license
commons-io-2.5.jar
commons-io.license
commons-lang3-3.5.jar
commons-lang3.license
ext
guava-20.0.jar
guice-4.2.0-no_aop.jar
jansi-1.17.1.jar
jansi-native
javax.inject-1.jar
jcl-over-slf4j-1.7.25.jar
jcl-over-slf4j.license
jsr250-api-1.0.jar
jsr250-api.license
maven-artifact-3.5.4.jar
maven-artifact.license
maven-builder-support-3.5.4.jar
maven-builder-support.license
maven-compat-3.5.4.jar
maven-compat.license
maven-core-3.5.4.jar
maven-core.license
maven-embedder-3.5.4.jar
maven-embedder.license
maven-model-3.5.4.jar
maven-model-builder-3.5.4.jar
maven-model-builder.license
maven-model.license
maven-plugin-api-3.5.4.jar
maven-plugin-api.license
maven-repository-metadata-3.5.4.jar
maven-repository-metadata.license
maven-resolver-api-1.1.1.jar
maven-resolver-api.license
maven-resolver-connector-basic-1.1.1.jar
maven-resolver-connector-basic.license
maven-resolver-impl-1.1.1.jar
maven-resolver-impl.license
maven-resolver-provider-3.5.4.jar
maven-resolver-provider.license
maven-resolver-spi-1.1.1.jar
maven-resolver-spi.license
maven-resolver-transport-wagon-1.1.1.jar
maven-resolver-transport-wagon.license
maven-resolver-util-1.1.1.jar
maven-resolver-util.license
maven-settings-3.5.4.jar
maven-settings-builder-3.5.4.jar
maven-settings-builder.license
maven-settings.license
maven-shared-utils-3.2.1.jar
maven-shared-utils.license
maven-slf4j-provider-3.5.4.jar
maven-slf4j-provider.license
org.eclipse.sisu.inject-0.3.3.jar
org.eclipse.sisu.inject.license
org.eclipse.sisu.plexus-0.3.3.jar
org.eclipse.sisu.plexus.license
plexus-cipher-1.7.jar
plexus-cipher.license
plexus-component-annotations-1.7.1.jar
plexus-component-annotations.license
plexus-interpolation-1.24.jar
plexus-interpolation.license
plexus-sec-dispatcher-1.4.jar
plexus-sec-dispatcher.license
plexus-utils-3.1.0.jar
plexus-utils.license
slf4j-api-1.7.25.jar
slf4j-api.license
wagon-file-3.1.0.jar
wagon-file.license
wagon-http-3.1.0-shaded.jar
wagon-http.license
wagon-provider-api-3.1.0.jar
wagon-provider-api.license

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/ext:
README.txt

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native:
freebsd32
freebsd64
linux32
linux64
osx
README.txt
windows32
windows64

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/osx:
libjansi.jnilib

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows32:
jansi.dll

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows64:
jansi.dll
Finished /home/jenkins/tools/maven/apache-maven-3.5.4 Directory Listing :
Detected current version as: 
'HUDI_home=
0.6.0-SNAPSHOT'
[INFO] Scanning for projects...
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-spark_2.11:jar:0.6.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 
org.apache.hudi:hudi-spark_${scala.binary.version}:[unknown-version], 

 line 26, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-timeline-service:jar:0.6.0-SNAPSHOT
[WARNING] 'build.plugins.plugin.(groupId:artifactId)' must be unique but found 
duplicate declaration of plugin org.jacoco:jacoco-maven-plugin @ 
org.apache.hudi:hudi-timeline-service:[unknown-version], 

 line 58, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-utilities_2.11:jar:0.6.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 
org.apache.hudi:hudi-utilities_${scala.binary.version}:[unknown-version], 

 line 26, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-spark-bundle_2.11:jar:0.6.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 

[incubator-hudi] branch asf-site updated: Travis CI build asf-site

2020-03-28 Thread vinoth
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new 96f3d74  Travis CI build asf-site
96f3d74 is described below

commit 96f3d746d491c06e67c494985452fd95d0b831ee
Author: CI 
AuthorDate: Sun Mar 29 03:30:47 2020 +

Travis CI build asf-site
---
 test-content/assets/js/lunr/lunr-store.js |   4 +-
 test-content/cn/docs/0.5.2-querying_data.html | 102 +-
 test-content/cn/docs/querying_data.html   | 102 +-
 test-content/docs/0.5.2-querying_data.html|   3 +-
 test-content/docs/querying_data.html  |   3 +-
 5 files changed, 204 insertions(+), 10 deletions(-)

diff --git a/test-content/assets/js/lunr/lunr-store.js 
b/test-content/assets/js/lunr/lunr-store.js
index 1077cf3..9351ab6 100644
--- a/test-content/assets/js/lunr/lunr-store.js
+++ b/test-content/assets/js/lunr/lunr-store.js
@@ -435,7 +435,7 @@ var store = [{
"url": "https://hudi.apache.org/docs/0.5.2-writing_data.html",
 "teaser":"https://hudi.apache.org/assets/images/500x300.png"},{
"title": "Querying Hudi Datasets",
-"excerpt":"Conceptually, Hudi physically stores data once on DFS, while providing three logical views on top of it, as described earlier. Once the dataset is synced to the Hive Metastore, it provides external Hive tables backed by Hudi's custom input formats. Once the appropriate Hudi bundle is provided, the dataset can be queried by popular query engines such as Hive, Spark and Presto. Specifically, two Hive tables named after the table name are registered during the write. For example, if table name = hudi_tbl, we get hudi_tbl, which realizes the read-optimized view of the dataset backed by HoodieParquetInputFormat, offering purely columnar data, and hudi_tbl_rt, which realizes the real-time view backed by HoodieParquetRealtimeInputFormat, offering a merged view of base and log data. As described in the concepts section, a key primitive needed for incremental processing is incremental pull (to obtain a change stream/log from the dataset). You can pull a Hudi dataset incrementally, meaning you get only all the updated and new rows since a specified instant time. This, used together with upserts, is useful for building certain [...]
+"excerpt":"Conceptually, Hudi physically stores data once on DFS, while providing three logical views on top of it, as described earlier. Once the dataset is synced to the Hive Metastore, it provides external Hive tables backed by Hudi's custom input formats. Once the appropriate Hudi bundle is provided, the dataset can be queried by popular query engines such as Hive, Spark and Presto. Specifically, two Hive tables named after the table name are registered during the write. For example, if table name = hudi_tbl, we get hudi_tbl, which realizes the read-optimized view of the dataset backed by HoodieParquetInputFormat, offering purely columnar data, and hudi_tbl_rt, which realizes the real-time view backed by HoodieParquetRealtimeInputFormat, offering a merged view of base and log data. As described in the concepts section, a key primitive needed for incremental processing is incremental pull (to obtain a change stream/log from the dataset). You can pull a Hudi dataset incrementally, meaning you get only all the updated and new rows since a specified instant time. This, used together with upserts, is useful for building certain [...]
 "tags": [],
"url": "https://hudi.apache.org/cn/docs/0.5.2-querying_data.html",
 "teaser":"https://hudi.apache.org/assets/images/500x300.png"},{
@@ -600,7 +600,7 @@ var store = [{
"url": "https://hudi.apache.org/docs/writing_data.html",
 "teaser":"https://hudi.apache.org/assets/images/500x300.png"},{
"title": "Querying Hudi Datasets",
-"excerpt":"Conceptually, Hudi physically stores data once on DFS, while providing three logical views on top of it, as described earlier. Once the dataset is synced to the Hive Metastore, it provides external Hive tables backed by Hudi's custom input formats. Once the appropriate Hudi bundle is provided, the dataset can be queried by popular query engines such as Hive, Spark and Presto. Specifically, two Hive tables named after the table name are registered during the write. For example, if table name = hudi_tbl, we get hudi_tbl, which realizes the read-optimized view of the dataset backed by HoodieParquetInputFormat, offering purely columnar data, and hudi_tbl_rt, which realizes the real-time view backed by HoodieParquetRealtimeInputFormat, offering a merged view of base and log data. As described in the concepts section, a key primitive needed for incremental processing is incremental pull (to obtain a change stream/log from the dataset). You can pull a Hudi dataset incrementally, meaning you get only all the updated and new rows since a specified instant time. This, used together with upserts, is useful for building certain [...]
+"excerpt":"Conceptually, Hudi physically stores data once on DFS, while providing three logical views on top of it, as described earlier. Once the dataset is synced to the Hive Metastore, it provides external Hive tables backed by Hudi's custom input formats. Once the appropriate Hudi bundle is provided, the dataset can be queried by popular query engines such as Hive, Spark and Presto. Specifically, two Hive tables named after the table name are registered during the write. For example, if table name = hudi_tbl, we get hudi_tbl, which realizes the read-optimized view of the dataset backed by HoodieParquetInputFormat, offering purely columnar data, and hudi_tbl_rt, which realizes the real-time view backed by HoodieParquetRealtimeInputFormat, offering a merged view of base and log data. As described in the concepts section, a key primitive needed for incremental processing is incremental pull (to obtain a change stream/log from the dataset). You can pull a Hudi dataset incrementally, meaning you get only all the updated and new rows since a specified instant time. This, used together with upserts, is useful for building certain [...]
 "tags": [],
"url": "https://hudi.apache.org/cn/docs/querying_data.html",
 "teaser":"https://hudi.apache.org/assets/images/500x300.png"},{
diff --git a/test-content/cn/docs/0.5.2-querying_data.html 
b/test-content/cn/docs/0.5.2-querying_data.html
index 0f4a441..5d337d6 100644
--- a/test-content/cn/docs/0.5.2-querying_data.html
+++ b/test-content/cn/docs/0.5.2-querying_data.html
@@ -335,6 +335,12 @@
   
 IN THIS PAGE
 
+  Query Engine Support Matrix
+
+  Read Optimized Table
+  Real Time Table
+
+  
   Hive
 
  Read Optimized Table
@@ -352,7 +358,7 @@
   Presto
  Impala (this feature is not officially released yet)
 
-  Read Optimized Table
+  Read Optimized Table
 
   
 
@@ -377,6 +383,94 @@
and joined with other tables (datasets/dimensions) to write out deltas to a target Hudi dataset. The incremental view is realized by querying one of the tables above with a special configuration
that tells the query planner that only incremental data needs to be fetched from the dataset.
 
+Query Engine Support Matrix
+
+The table below shows whether each query engine supports the Hudi format
+
+Read Optimized Table
+
+
+  
+
+  Query Engine
+  Real-time View
+  Incremental Pull
+
+  
+  
+
+  Hive
+  Y
+  Y
+
+
+  Spark SQL
+  Y
+  Y
+
+
+  Spark Datasource
+  Y
+  Y
+
+
+  Presto
+  Y
+  N
+
+
+  Impala
+  Y
+  N
+
+  
+
+
+Real Time Table
+
+
+  
+
+  Query Engine
+  Real-time View
+  Incremental Pull
+  Read Optimized Table
+
+  
+  
+
+  Hive
+  Y
+  Y
+  Y
+
+
+  Spark SQL
+  Y
+  Y
+  Y
+
+

[GitHub] [incubator-hudi] vinothchandar commented on issue #1421: [HUDI-724] Parallelize getSmallFiles for partitions

2020-03-28 Thread GitBox
vinothchandar commented on issue #1421: [HUDI-724] Parallelize getSmallFiles 
for partitions
URL: https://github.com/apache/incubator-hudi/pull/1421#issuecomment-605552930
 
 
   Took a pass. LGTM overall.
   Since @bvaradar is the assignee, his call :) 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[incubator-hudi] branch asf-site updated: [MINOR] Update doc to include inc query on partitions (#1454)

2020-03-28 Thread leesf
This is an automated email from the ASF dual-hosted git repository.

leesf pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new 18ce570  [MINOR] Update doc to include inc query on partitions (#1454)
18ce570 is described below

commit 18ce5708e073e80779f6dcc00d388b4cb0cc758a
Author: YanJia-Gary-Li 
AuthorDate: Sat Mar 28 20:28:48 2020 -0700

[MINOR] Update doc to include inc query on partitions (#1454)
---
 docs/_docs/0.5.2/2_3_querying_data.cn.md | 31 ++-
 docs/_docs/0.5.2/2_3_querying_data.md|  3 ++-
 docs/_docs/2_3_querying_data.cn.md   | 31 ++-
 docs/_docs/2_3_querying_data.md  |  3 ++-
 4 files changed, 64 insertions(+), 4 deletions(-)

diff --git a/docs/_docs/0.5.2/2_3_querying_data.cn.md 
b/docs/_docs/0.5.2/2_3_querying_data.cn.md
index 74afcef..77ad2d7 100644
--- a/docs/_docs/0.5.2/2_3_querying_data.cn.md
+++ b/docs/_docs/0.5.2/2_3_querying_data.cn.md
@@ -25,6 +25,33 @@ language: cn
 
and joined with other tables (datasets/dimensions) to [write out deltas](/cn/docs/0.5.2-writing_data.html) to a target Hudi dataset. The incremental view is realized by querying one of the tables above with a special configuration
that tells the query planner that only incremental data needs to be fetched from the dataset.
 
+
+## 查询引擎支持列表
+
+下面的表格展示了各查询引擎是否支持Hudi格式
+
+### 读优化表
+  
+|查询引擎|实时视图|增量拉取|
+|||---|
+|**Hive**|Y|Y|
+|**Spark SQL**|Y|Y|
+|**Spark Datasource**|Y|Y|
+|**Presto**|Y|N|
+|**Impala**|Y|N|
+
+
+### 实时表
+
+|查询引擎|实时视图|增量拉取|读优化表|
+|||---|--|
+|**Hive**|Y|Y|Y|
+|**Spark SQL**|Y|Y|Y|
+|**Spark Datasource**|N|N|Y|
+|**Presto**|N|N|Y|
+|**Impala**|N|N|Y|
+
+
 接下来,我们将详细讨论在每个查询引擎上如何访问所有三个视图。
 
 ## Hive
@@ -128,7 +155,9 @@ scala> sqlContext.sql("select count(*) from hudi_rt where 
datestr = '2016-10-02'
  DataSourceReadOptions.VIEW_TYPE_INCREMENTAL_OPT_VAL())
  .option(DataSourceReadOptions.BEGIN_INSTANTTIME_OPT_KEY(),
 )
- .load(tablePath); // For incremental view, pass in the root/base path of 
dataset
+ .option(DataSourceReadOptions.INCR_PATH_GLOB_OPT_KEY(),
+"/year=2020/month=*/day=*") // 可选,从指定的分区增量拉取
+ .load(tablePath); // 用数据集的最底层路径
 ```
 
 请参阅[设置](/cn/docs/0.5.2-configurations.html#spark-datasource)部分,以查看所有数据源选项。
diff --git a/docs/_docs/0.5.2/2_3_querying_data.md 
b/docs/_docs/0.5.2/2_3_querying_data.md
index 0c28b12..9d17e72 100644
--- a/docs/_docs/0.5.2/2_3_querying_data.md
+++ b/docs/_docs/0.5.2/2_3_querying_data.md
@@ -55,7 +55,7 @@ Note that `Read Optimized` queries are not applicable for 
COPY_ON_WRITE tables.
 |**Spark SQL**|Y|Y|Y|
 |**Spark Datasource**|N|N|Y|
 |**Presto**|N|N|Y|
-|**Impala**|N|N|N|
+|**Impala**|N|N|Y|
 
 
 In sections, below we will discuss specific setup to access different query 
types from different query engines. 
@@ -148,6 +148,7 @@ The following snippet shows how to obtain all records 
changed after `beginInstan
  .format("org.apache.hudi")
  .option(DataSourceReadOptions.QUERY_TYPE_OPT_KEY(), 
DataSourceReadOptions.QUERY_TYPE_INCREMENTAL_OPT_VAL())
  .option(DataSourceReadOptions.BEGIN_INSTANTTIME_OPT_KEY(), 
)
+ .option(DataSourceReadOptions.INCR_PATH_GLOB_OPT_KEY(), 
"/year=2020/month=*/day=*") // Optional, use glob pattern if querying certain 
partitions
  .load(tablePath); // For incremental query, pass in the root/base path of 
table
  
 hudiIncQueryDF.createOrReplaceTempView("hudi_trips_incremental")
diff --git a/docs/_docs/2_3_querying_data.cn.md 
b/docs/_docs/2_3_querying_data.cn.md
index b2c4870..1fa91d1 100644
--- a/docs/_docs/2_3_querying_data.cn.md
+++ b/docs/_docs/2_3_querying_data.cn.md
@@ -24,6 +24,33 @@ language: cn
 
并与其他表(数据集/维度)结合以[写出增量](/cn/docs/writing_data.html)到目标Hudi数据集。增量视图是通过查询上表之一实现的,并具有特殊配置,
 该特殊配置指示查询计划仅需要从数据集中获取增量数据。
 
+
+## 查询引擎支持列表
+
+下面的表格展示了各查询引擎是否支持Hudi格式
+
+### 读优化表
+  
+|查询引擎|实时视图|增量拉取|
+|||---|
+|**Hive**|Y|Y|
+|**Spark SQL**|Y|Y|
+|**Spark Datasource**|Y|Y|
+|**Presto**|Y|N|
+|**Impala**|Y|N|
+
+
+### 实时表
+
+|查询引擎|实时视图|增量拉取|读优化表|
+|||---|--|
+|**Hive**|Y|Y|Y|
+|**Spark SQL**|Y|Y|Y|
+|**Spark Datasource**|N|N|Y|
+|**Presto**|N|N|Y|
+|**Impala**|N|N|Y|
+
+
 接下来,我们将详细讨论在每个查询引擎上如何访问所有三个视图。
 
 ## Hive
@@ -127,7 +154,9 @@ scala> sqlContext.sql("select count(*) from hudi_rt where 
datestr = '2016-10-02'
  DataSourceReadOptions.VIEW_TYPE_INCREMENTAL_OPT_VAL())
  .option(DataSourceReadOptions.BEGIN_INSTANTTIME_OPT_KEY(),
 )
- .load(tablePath); // For incremental view, pass in the root/base path of 
dataset
+ .option(DataSourceReadOptions.INCR_PATH_GLOB_OPT_KEY(),
+"/year=2020/month=*/day=*") // 可选,从指定的分区增量拉取
+ .load(tablePath); // 用数据集的最底层路径
 ```
 
 请参阅[设置](/cn/docs/configurations.html#spark-datasource)部分,以查看所有数据源选项。
diff --git a/docs/_docs/2_3_querying_data.md b/docs/_docs/2_3_querying_data.md
index 875b7f0..3e6a436 100644
--- a/docs/_docs/2_3_querying_data.md
+++ 
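The `INCR_PATH_GLOB_OPT_KEY` option added by this commit restricts an incremental query to partitions matching a glob such as `/year=2020/month=*/day=*`. As a rough illustration of the matching semantics only (the class and method names below are hypothetical, not Hudi APIs), the JDK's glob `PathMatcher` behaves the same way:

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class IncrPathGlobDemo {

  // Keep only the partition paths that match the glob pattern; a '*' matches
  // within a single path segment, so month=*/day=* spans exactly two levels.
  static List<String> filterPartitions(List<String> partitionPaths, String glob) {
    PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
    return partitionPaths.stream()
        .filter(p -> matcher.matches(Paths.get(p)))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<String> partitions = Arrays.asList(
        "year=2020/month=03/day=28",
        "year=2019/month=12/day=31");
    System.out.println(filterPartitions(partitions, "year=2020/month=*/day=*"));
  }
}
```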

[GitHub] [incubator-hudi] leesf merged pull request #1454: MINOR update doc to include inc query on partitions

2020-03-28 Thread GitBox
leesf merged pull request #1454: MINOR update doc to include inc query on 
partitions
URL: https://github.com/apache/incubator-hudi/pull/1454
 
 
   




[GitHub] [incubator-hudi] vinothchandar commented on issue #1455: [SUPPORT] Hudi upsert run into exception: java.lang.NoSuchMethodError: java.lang.Math.floorMod(JI)I

2020-03-28 Thread GitBox
vinothchandar commented on issue #1455: [SUPPORT] Hudi upsert run into 
exception:  java.lang.NoSuchMethodError: java.lang.Math.floorMod(JI)I
URL: https://github.com/apache/incubator-hudi/issues/1455#issuecomment-605552838
 
 
   Thanks @EdwinGuo and @lamber-ken .. Please raise a JIRA if there is follow-up work and close this issue :)
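For reference on the error above: the descriptor `(JI)I` denotes `Math.floorMod(long, int)`, an overload only added in Java 9, so bytecode compiled against a newer JDK throws `NoSuchMethodError` on a Java 8 JVM. A minimal sketch of a Java-8-safe equivalent (illustrative only, not necessarily the fix that was applied):

```java
public class FloorModCompat {

  // Java 8 only has floorMod(int, int) and floorMod(long, long); delegating
  // to the (long, long) overload reproduces floorMod(long, int). The result's
  // magnitude is always < |y|, so the narrowing cast to int is safe.
  static int floorMod(long x, int y) {
    return (int) Math.floorMod(x, (long) y);
  }

  public static void main(String[] args) {
    System.out.println(floorMod(-7L, 3));  // floor-style modulo: 2, not -1
  }
}
```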




[GitHub] [incubator-hudi] vinothchandar commented on issue #1453: HUDI-644 kafka connect checkpoint provider

2020-03-28 Thread GitBox
vinothchandar commented on issue #1453: HUDI-644 kafka connect checkpoint 
provider
URL: https://github.com/apache/incubator-hudi/pull/1453#issuecomment-605552741
 
 
   Slightly behind.. Will chime in here soon :) 




[jira] [Assigned] (HUDI-677) Abstract/Refactor all transaction management logic into a set of classes from HoodieWriteClient

2020-03-28 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar reassigned HUDI-677:
---

Assignee: Vinoth Chandar  (was: hong dongdong)

> Abstract/Refactor all transaction management logic into a set of classes from 
> HoodieWriteClient
> ---
>
> Key: HUDI-677
> URL: https://issues.apache.org/jira/browse/HUDI-677
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Code Cleanup
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
> Fix For: 0.6.0
>
>
> Over time a lot of the core transaction management code has been  split 
> across various files in hudi-client.. We want to clean this up and present a 
> nice interface.. 
> Some notes and thoughts and suggestions..  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-677) Abstract/Refactor all transaction management logic into a set of classes from HoodieWriteClient

2020-03-28 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17070195#comment-17070195
 ] 

Vinoth Chandar commented on HUDI-677:
-

Let me take this over..  if you don't mind [~hongdongdong] .. we can discuss on 
the PR.. 

> Abstract/Refactor all transaction management logic into a set of classes from 
> HoodieWriteClient
> ---
>
> Key: HUDI-677
> URL: https://issues.apache.org/jira/browse/HUDI-677
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Code Cleanup
>Reporter: Vinoth Chandar
>Assignee: hong dongdong
>Priority: Major
> Fix For: 0.6.0
>
>
> Over time a lot of the core transaction management code has been  split 
> across various files in hudi-client.. We want to clean this up and present a 
> nice interface.. 
> Some notes and thoughts and suggestions..  
>  





[GitHub] [incubator-hudi] yanghua commented on a change in pull request #1460: [HUDI-679] Make io package Spark free

2020-03-28 Thread GitBox
yanghua commented on a change in pull request #1460: [HUDI-679] Make io package 
Spark free
URL: https://github.com/apache/incubator-hudi/pull/1460#discussion_r399737990
 
 

 ##
 File path: 
hudi-client/src/main/java/org/apache/hudi/client/SparkTaskContextDetailSupplier.java
 ##
 @@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.client;
+
+import org.apache.spark.TaskContext;
+
+import java.io.Serializable;
+import java.util.function.Supplier;
+
+/**
+ * Spark Supplier.
+ */
+public interface SparkTaskContextDetailSupplier extends Supplier, 
Serializable {
 
 Review comment:
   Just now seeing the latest changes from @leesf. Yes, this seems like a better abstraction.




[GitHub] [incubator-hudi] hddong commented on a change in pull request #1452: [HUDI-740]Fix can not specify the sparkMaster of cleans run command

2020-03-28 Thread GitBox
hddong commented on a change in pull request #1452: [HUDI-740]Fix can not 
specify the sparkMaster of cleans run command
URL: https://github.com/apache/incubator-hudi/pull/1452#discussion_r399733435
 
 

 ##
 File path: hudi-cli/src/main/java/org/apache/hudi/cli/commands/SparkMain.java
 ##
 @@ -62,7 +63,9 @@ public static void main(String[] args) throws Exception {
 
 SparkCommand cmd = SparkCommand.valueOf(command);
 
-JavaSparkContext jsc = SparkUtil.initJavaSparkConf("hoodie-cli-" + 
command);
+JavaSparkContext jsc = cmd == SparkCommand.CLEAN
 
 Review comment:
   @prashantwason my mistake, thanks for pointing it out.




[jira] [Updated] (HUDI-736) Simplify ReflectionUtils#getTopLevelClasses

2020-03-28 Thread Suneel Marthi (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated HUDI-736:
---
Status: Open  (was: New)

> Simplify ReflectionUtils#getTopLevelClasses 
> 
>
> Key: HUDI-736
> URL: https://issues.apache.org/jira/browse/HUDI-736
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Code Cleanup
>Reporter: Vinoth Chandar
>Assignee: Suneel Marthi
>Priority: Major
>






[jira] [Updated] (HUDI-736) Simplify ReflectionUtils#getTopLevelClasses

2020-03-28 Thread Suneel Marthi (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated HUDI-736:
---
Fix Version/s: 0.6.0

> Simplify ReflectionUtils#getTopLevelClasses 
> 
>
> Key: HUDI-736
> URL: https://issues.apache.org/jira/browse/HUDI-736
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Code Cleanup
>Reporter: Vinoth Chandar
>Assignee: Suneel Marthi
>Priority: Major
> Fix For: 0.6.0
>
>






[jira] [Assigned] (HUDI-736) Simplify ReflectionUtils#getTopLevelClasses

2020-03-28 Thread Suneel Marthi (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi reassigned HUDI-736:
--

Assignee: Suneel Marthi

> Simplify ReflectionUtils#getTopLevelClasses 
> 
>
> Key: HUDI-736
> URL: https://issues.apache.org/jira/browse/HUDI-736
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Code Cleanup
>Reporter: Vinoth Chandar
>Assignee: Suneel Marthi
>Priority: Major
>






[jira] [Commented] (HUDI-737) Simplify/Eliminate need for CollectionUtils#Maps/MapsBuilder

2020-03-28 Thread Suneel Marthi (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17070158#comment-17070158
 ] 

Suneel Marthi commented on HUDI-737:


Fixed as part of HUDI-479

> Simplify/Eliminate need for CollectionUtils#Maps/MapsBuilder
> 
>
> Key: HUDI-737
> URL: https://issues.apache.org/jira/browse/HUDI-737
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Code Cleanup
>Reporter: Vinoth Chandar
>Assignee: Suneel Marthi
>Priority: Major
> Fix For: 0.6.0
>
>






[jira] [Resolved] (HUDI-479) Eliminate use of guava if possible

2020-03-28 Thread Suneel Marthi (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi resolved HUDI-479.

Resolution: Fixed

> Eliminate use of guava if possible
> --
>
> Key: HUDI-479
> URL: https://issues.apache.org/jira/browse/HUDI-479
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Code Cleanup
>Reporter: Vinoth Chandar
>Assignee: Suneel Marthi
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Closed] (HUDI-479) Eliminate use of guava if possible

2020-03-28 Thread Suneel Marthi (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi closed HUDI-479.
--

> Eliminate use of guava if possible
> --
>
> Key: HUDI-479
> URL: https://issues.apache.org/jira/browse/HUDI-479
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Code Cleanup
>Reporter: Vinoth Chandar
>Assignee: Suneel Marthi
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Reopened] (HUDI-479) Eliminate use of guava if possible

2020-03-28 Thread Suneel Marthi (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi reopened HUDI-479:


> Eliminate use of guava if possible
> --
>
> Key: HUDI-479
> URL: https://issues.apache.org/jira/browse/HUDI-479
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Code Cleanup
>Reporter: Vinoth Chandar
>Assignee: Suneel Marthi
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Updated] (HUDI-744) Redistribute files in hudi-common utils package

2020-03-28 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-744:

Status: Open  (was: New)

> Redistribute files in hudi-common utils package
> ---
>
> Key: HUDI-744
> URL: https://issues.apache.org/jira/browse/HUDI-744
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Code Cleanup
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
>






[jira] [Created] (HUDI-744) Redistribute files in hudi-common utils package

2020-03-28 Thread Vinoth Chandar (Jira)
Vinoth Chandar created HUDI-744:
---

 Summary: Redistribute files in hudi-common utils package
 Key: HUDI-744
 URL: https://issues.apache.org/jira/browse/HUDI-744
 Project: Apache Hudi (incubating)
  Issue Type: Sub-task
  Components: Code Cleanup
Reporter: Vinoth Chandar
Assignee: Vinoth Chandar








[GitHub] [incubator-hudi] umehrot2 commented on a change in pull request #1427: [HUDI-727]: Copy default values of fields if not present when rewriting incoming record with new schema

2020-03-28 Thread GitBox
umehrot2 commented on a change in pull request #1427: [HUDI-727]: Copy default 
values of fields if not present when rewriting incoming record with new schema
URL: https://github.com/apache/incubator-hudi/pull/1427#discussion_r399722115
 
 

 ##
 File path: 
hudi-common/src/main/java/org/apache/hudi/common/util/HoodieAvroUtils.java
 ##
 @@ -214,6 +220,63 @@ private static GenericRecord rewrite(GenericRecord 
record, Schema schemaWithFiel
 return newRecord;
   }
 
+  /*
+  This function takes the union of all the fields except hoodie metadata fields
+   */
+  private static List<Schema.Field> getAllFieldsToWrite(Schema oldSchema, Schema newSchema) {
+    Set<Schema.Field> allFields = new HashSet<>(oldSchema.getFields());
+    List<Schema.Field> fields = new ArrayList<>(oldSchema.getFields());
+for (Schema.Field f : newSchema.getFields()) {
+  if (!allFields.contains(f) && !isMetadataField(f.name())) {
+fields.add(f);
+  }
+}
+
+return fields;
+  }
+
+  private static void populateNewRecordAsPerDataType(GenericRecord record, 
Field field) {
+switch (getSchemaTypeForField(field)) {
+  case STRING:
+  case BYTES:
+  case ENUM:
+  case FIXED:
+record.put(field.name(), field.defaultVal() == null ? null : (String) 
field.defaultVal());
 
 Review comment:
   Why do we need this casting of individual data types? It seems we can just pass `field.defaultVal()` as it is, because `record.put` expects an `Object`, and `field.defaultVal()` returns exactly that.
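To illustrate the point above, using a plain `Map` as a hypothetical stand-in for Avro's `GenericRecord` (whose `put(String, Object)` likewise takes an `Object`): casting a value to its concrete type immediately before passing it to an `Object` parameter has no effect.

```java
import java.util.HashMap;
import java.util.Map;

public class DefaultValPutDemo {

  // The Map plays the role of GenericRecord here; put accepts any Object,
  // so no per-type cast of the default value is needed before storing it.
  static Object putAndGet(Map<String, Object> record, String name, Object defaultVal) {
    record.put(name, defaultVal);
    return record.get(name);
  }

  public static void main(String[] args) {
    Map<String, Object> record = new HashMap<>();
    System.out.println(putAndGet(record, "longField", 42L));     // stored as-is
    System.out.println(putAndGet(record, "stringField", "abc")); // stored as-is
  }
}
```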




[GitHub] [incubator-hudi] umehrot2 commented on a change in pull request #1427: [HUDI-727]: Copy default values of fields if not present when rewriting incoming record with new schema

2020-03-28 Thread GitBox
umehrot2 commented on a change in pull request #1427: [HUDI-727]: Copy default 
values of fields if not present when rewriting incoming record with new schema
URL: https://github.com/apache/incubator-hudi/pull/1427#discussion_r399722305
 
 

 ##
 File path: 
hudi-common/src/main/java/org/apache/hudi/common/util/HoodieAvroUtils.java
 ##
 @@ -214,6 +220,63 @@ private static GenericRecord rewrite(GenericRecord 
record, Schema schemaWithFiel
 return newRecord;
   }
 
+  /*
+  This function takes the union of all the fields except hoodie metadata fields
+   */
+  private static List<Schema.Field> getAllFieldsToWrite(Schema oldSchema, Schema newSchema) {
+    Set<Schema.Field> allFields = new HashSet<>(oldSchema.getFields());
+    List<Schema.Field> fields = new ArrayList<>(oldSchema.getFields());
+for (Schema.Field f : newSchema.getFields()) {
+  if (!allFields.contains(f) && !isMetadataField(f.name())) {
+fields.add(f);
+  }
+}
+
+return fields;
+  }
+
+  private static void populateNewRecordAsPerDataType(GenericRecord record, 
Field field) {
+switch (getSchemaTypeForField(field)) {
+  case STRING:
+  case BYTES:
+  case ENUM:
+  case FIXED:
+record.put(field.name(), field.defaultVal() == null ? null : (String) 
field.defaultVal());
+break;
+  case LONG:
+record.put(field.name(), field.defaultVal() == null ? null : (long) 
field.defaultVal());
+break;
+  case INT:
+record.put(field.name(), field.defaultVal() == null ? null : (int) 
field.defaultVal());
+break;
+  case FLOAT:
+record.put(field.name(), field.defaultVal() == null ? null : (float) 
field.defaultVal());
+break;
+  case DOUBLE:
+record.put(field.name(), field.defaultVal() == null ? null : (double) 
field.defaultVal());
+break;
+  case BOOLEAN:
+record.put(field.name(), field.defaultVal() == null ? null : (boolean) 
field.defaultVal());
+break;
+  default:
+record.put(field.name(), field.defaultVal());
+}
+  }
+
+  private static Schema.Type getSchemaTypeForField(Field field) {
+if (!field.schema().getType().equals(Schema.Type.UNION)) {
+  return field.schema().getType();
+}
+
+for (Schema schema : field.schema().getTypes()) {
+  if (!schema.getType().equals(Schema.Type.NULL)) {
+return schema.getType();
+  }
+}
+
+return Schema.Type.STRING;
 
 Review comment:
   Shouldn't we return `Schema.Type.NULL` here? It seems the only case where we will reach this line is when the type is null.
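The behavior under discussion can be sketched on plain type names standing in for `Schema.Type` (hypothetical names, not the Avro API): unwrap a union by returning its first non-null branch, and fall back to the null type rather than string when every branch is null, as the review suggests.

```java
import java.util.Arrays;
import java.util.List;

public class UnionTypeDemo {

  // Return the first branch of a union that is not "null"; if all branches
  // are null, the union itself is effectively the null type.
  static String firstNonNullBranch(List<String> unionBranches) {
    for (String branch : unionBranches) {
      if (!branch.equals("null")) {
        return branch;
      }
    }
    return "null";
  }

  public static void main(String[] args) {
    System.out.println(firstNonNullBranch(Arrays.asList("null", "long")));  // long
    System.out.println(firstNonNullBranch(Arrays.asList("null")));          // null
  }
}
```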




[GitHub] [incubator-hudi] umehrot2 commented on a change in pull request #1427: [HUDI-727]: Copy default values of fields if not present when rewriting incoming record with new schema

2020-03-28 Thread GitBox
umehrot2 commented on a change in pull request #1427: [HUDI-727]: Copy default 
values of fields if not present when rewriting incoming record with new schema
URL: https://github.com/apache/incubator-hudi/pull/1427#discussion_r399719119
 
 

 ##
 File path: 
hudi-common/src/main/java/org/apache/hudi/common/util/HoodieAvroUtils.java
 ##
 @@ -168,7 +169,7 @@ public static GenericRecord 
addHoodieKeyToRecord(GenericRecord record, String re
*/
  public static Schema appendNullSchemaFields(Schema schema, List<String> newFieldNames) {
    List<Schema.Field> newFields = schema.getFields().stream()
-.map(field -> new Field(field.name(), field.schema(), field.doc(), 
field.defaultValue())).collect(Collectors.toList());
+.map(field -> new Field(field.name(), field.schema(), field.doc(), 
field.defaultVal())).collect(Collectors.toList());
 for (String newField : newFieldNames) {
   newFields.add(new Schema.Field(newField, METADATA_FIELD_SCHEMA, "", 
NullNode.getInstance()));
 
 Review comment:
   Shall we change this line as well to consistently use the object API ?




[GitHub] [incubator-hudi] umehrot2 commented on a change in pull request #1427: [HUDI-727]: Copy default values of fields if not present when rewriting incoming record with new schema

2020-03-28 Thread GitBox
umehrot2 commented on a change in pull request #1427: [HUDI-727]: Copy default 
values of fields if not present when rewriting incoming record with new schema
URL: https://github.com/apache/incubator-hudi/pull/1427#discussion_r399722622
 
 

 ##
 File path: 
hudi-common/src/main/java/org/apache/hudi/common/util/HoodieAvroUtils.java
 ##
 @@ -214,6 +220,63 @@ private static GenericRecord rewrite(GenericRecord 
record, Schema schemaWithFiel
 return newRecord;
   }
 
+  /*
+  This function takes the union of all the fields except hoodie metadata fields
+   */
+  private static List<Schema.Field> getAllFieldsToWrite(Schema oldSchema, Schema newSchema) {
+    Set<Schema.Field> allFields = new HashSet<>(oldSchema.getFields());
+    List<Schema.Field> fields = new ArrayList<>(oldSchema.getFields());
+for (Schema.Field f : newSchema.getFields()) {
+  if (!allFields.contains(f) && !isMetadataField(f.name())) {
+fields.add(f);
+  }
+}
+
+return fields;
+  }
+
+  private static void populateNewRecordAsPerDataType(GenericRecord record, 
Field field) {
 
 Review comment:
   nit: May be rename this to `populateFieldWithDefaultValue` as that seems to 
be the intent of this function.




[GitHub] [incubator-hudi] umehrot2 commented on a change in pull request #1427: [HUDI-727]: Copy default values of fields if not present when rewriting incoming record with new schema

2020-03-28 Thread GitBox
umehrot2 commented on a change in pull request #1427: [HUDI-727]: Copy default 
values of fields if not present when rewriting incoming record with new schema
URL: https://github.com/apache/incubator-hudi/pull/1427#discussion_r399721017
 
 

 ##
 File path: 
hudi-common/src/main/java/org/apache/hudi/common/util/HoodieAvroUtils.java
 ##
 @@ -204,8 +205,13 @@ public static GenericRecord 
rewriteRecordWithOnlyNewSchemaFields(GenericRecord r
 
   private static GenericRecord rewrite(GenericRecord record, Schema 
schemaWithFields, Schema newSchema) {
 GenericRecord newRecord = new GenericData.Record(newSchema);
-for (Schema.Field f : schemaWithFields.getFields()) {
-  newRecord.put(f.name(), record.get(f.name()));
+//get union of both the schemas, and then populate the fields in the new 
record
+for (Schema.Field f : getAllFieldsToWrite(schemaWithFields, newSchema)) {
 
 Review comment:
   This is an internal function that is used by both `rewriteRecordWithOnlyNewSchemaFields` and `rewriteRecord`. `getAllFieldsToWrite` does not really make sense for `rewriteRecordWithOnlyNewSchemaFields` and won't do anything in that case, because the old and new schemas are the same.
   
   I think it would be better to refactor `rewrite` to receive `List<Schema.Field> fieldsToWrite` as a parameter instead of `schemaWithFields`. For `rewriteRecord` we can call `getAllFieldsToWrite` and pass its result, while for `rewriteRecordWithOnlyNewSchemaFields` we can just pass `schema.getFields()`.
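The union-of-fields computation being refactored can be sketched on plain strings (illustrative names only; the real code operates on `Schema.Field`): take all old-schema fields, then append any new-schema field that is neither already present nor a Hoodie metadata field.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class FieldUnionDemo {

  // LinkedHashSet keeps insertion order: old fields first, then new-only
  // fields, while the set semantics drop duplicates and metadata fields
  // are filtered out explicitly.
  static List<String> fieldsToWrite(List<String> oldFields, List<String> newFields,
                                    Set<String> metadataFields) {
    Set<String> result = new LinkedHashSet<>(oldFields);
    for (String f : newFields) {
      if (!metadataFields.contains(f)) {
        result.add(f);
      }
    }
    return new ArrayList<>(result);
  }

  public static void main(String[] args) {
    Set<String> meta = new HashSet<>(Arrays.asList("_hoodie_commit_time"));
    System.out.println(
        fieldsToWrite(Arrays.asList("a", "b"),
                      Arrays.asList("b", "c", "_hoodie_commit_time"), meta));
  }
}
```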




[GitHub] [incubator-hudi] umehrot2 commented on a change in pull request #1427: [HUDI-727]: Copy default values of fields if not present when rewriting incoming record with new schema

2020-03-28 Thread GitBox
umehrot2 commented on a change in pull request #1427: [HUDI-727]: Copy default 
values of fields if not present when rewriting incoming record with new schema
URL: https://github.com/apache/incubator-hudi/pull/1427#discussion_r399718998
 
 

 ##
 File path: 
hudi-common/src/main/java/org/apache/hudi/common/util/HoodieAvroUtils.java
 ##
 @@ -104,15 +103,15 @@ public static Schema addMetadataFields(Schema schema) {
List<Schema.Field> parentFields = new ArrayList<>();
 
 Schema.Field commitTimeField =
-new Schema.Field(HoodieRecord.COMMIT_TIME_METADATA_FIELD, 
METADATA_FIELD_SCHEMA, "", NullNode.getInstance());
+new Schema.Field(HoodieRecord.COMMIT_TIME_METADATA_FIELD, 
METADATA_FIELD_SCHEMA, "", null);
 
 Review comment:
   Minor: we probably should do `(Object) null` to force it to resolve to the 
new API that accepts object, because null by itself can either refer to 
JsonNode or an Object




[GitHub] [incubator-hudi] vinothchandar commented on issue #1458: Issue with running compaction on a MOR dataset with org.apache.hudi.payload.AWSDmsAvroPayload

2020-03-28 Thread GitBox
vinothchandar commented on issue #1458: Issue with running compaction on a MOR 
dataset with org.apache.hudi.payload.AWSDmsAvroPayload
URL: https://github.com/apache/incubator-hudi/issues/1458#issuecomment-605527152
 
 
   @PhatakN1 This is because the `.AWSDmsAvroPayload` you used is not on the CLI classpath... which is rather surprising, since the CLI does pull in utilities as a dependency..
   
   @lamber-ken that's a good line of investigation to pursue..




[GitHub] [incubator-hudi] garyli1019 commented on a change in pull request #1453: HUDI-644 kafka connect checkpoint provider

2020-03-28 Thread GitBox
garyli1019 commented on a change in pull request #1453: HUDI-644 kafka connect 
checkpoint provider
URL: https://github.com/apache/incubator-hudi/pull/1453#discussion_r399703856
 
 

 ##
 File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/checkpoint/KafkaConnectHdfsProvider.java
 ##
 @@ -0,0 +1,140 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities.sources.checkpoint;
+
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.PathFilter;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+/**
+ * Generate checkpoint from Kafka-Connect-HDFS.
+ */
+public class KafkaConnectHdfsProvider implements CheckPointProvider {
 
 Review comment:
   yes, prefer to keep the PR small enough for review. WDYT?
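For context on what such a provider parses: the Kafka Connect HDFS sink commits files named `<topic>+<kafkaPartition>+<startOffset>+<endOffset>.<ext>`, so the last consumed offset per partition can be recovered from filenames alone. A hedged sketch of that parsing (the regex and return format are illustrative, not the PR's actual code):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KafkaConnectFileNameDemo {

  // Kafka Connect HDFS sink file name: <topic>+<partition>+<startOffset>+<endOffset>.<ext>
  private static final Pattern NAME =
      Pattern.compile("([\\w-]+)\\+(\\d+)\\+(\\d+)\\+(\\d+)\\.\\w+");

  // Returns "topic,partition:endOffset", or null if the name does not match
  // the commit-file convention.
  static String parseCheckpoint(String fileName) {
    Matcher m = NAME.matcher(fileName);
    if (!m.matches()) {
      return null;
    }
    return m.group(1) + "," + m.group(2) + ":" + m.group(4);
  }

  public static void main(String[] args) {
    System.out.println(parseCheckpoint("trips+0+100+200.parquet"));  // trips,0:200
  }
}
```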




[GitHub] [incubator-hudi] garyli1019 commented on issue #1454: MINOR update doc to include inc query on partitions

2020-03-28 Thread GitBox
garyli1019 commented on issue #1454: MINOR update doc to include inc query on 
partitions
URL: https://github.com/apache/incubator-hudi/pull/1454#issuecomment-605511322
 
 
   thanks for reviewing! Addressed the comment




[GitHub] [incubator-hudi] garyli1019 commented on a change in pull request #1454: MINOR update doc to include inc query on partitions

2020-03-28 Thread GitBox
garyli1019 commented on a change in pull request #1454: MINOR update doc to 
include inc query on partitions
URL: https://github.com/apache/incubator-hudi/pull/1454#discussion_r399702927
 
 

 ##
 File path: docs/_docs/0.5.2/2_3_querying_data.md
 ##
 @@ -146,8 +146,9 @@ The following snippet shows how to obtain all records 
changed after `beginInstan
 ```java
  Dataset hudiIncQueryDF = spark.read()
  .format("org.apache.hudi")
- .option(DataSourceReadOptions.QUERY_TYPE_OPT_KEY(), 
DataSourceReadOptions.QUERY_TYPE_INCREMENTAL_OPT_VAL())
+ 
.option(DataSourceReadOptions.QUERY_TYPE_OPT_KEY(),DataSourceReadOptions.QUERY_TYPE_INCREMENTAL_OPT_VAL())
 
 Review comment:
   yes, deleted the space accidentally 




[GitHub] [incubator-hudi] vinothchandar commented on issue #1457: [HUDI-741] Added checks to validate Hoodie's schema evolution.

2020-03-28 Thread GitBox
vinothchandar commented on issue #1457: [HUDI-741] Added checks to validate 
Hoodie's schema evolution.
URL: https://github.com/apache/incubator-hudi/pull/1457#issuecomment-605508358
 
 
   @pratyakshsharma  @umehrot2 could you folks help review this, given your 
interests in this area




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1460: [HUDI-679] Make io package Spark free

2020-03-28 Thread GitBox
vinothchandar commented on a change in pull request #1460: [HUDI-679] Make io 
package Spark free
URL: https://github.com/apache/incubator-hudi/pull/1460#discussion_r399699506
 
 

 ##
 File path: 
hudi-client/src/main/java/org/apache/hudi/client/SparkTaskContextDetailSupplier.java
 ##
 @@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.client;
+
+import org.apache.spark.TaskContext;
+
+import java.io.Serializable;
+import java.util.function.Supplier;
+
+/**
+ * Spark Supplier.
+ */
+public interface SparkTaskContextDetailSupplier extends Supplier, 
Serializable {
 
 Review comment:
  Should the interface be something generic like `WriteTaskContextSupplier`, which is extended by `SparkTaskContextSupplier`?




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1460: [HUDI-679] Make io package Spark free

2020-03-28 Thread GitBox
vinothchandar commented on a change in pull request #1460: [HUDI-679] Make io 
package Spark free
URL: https://github.com/apache/incubator-hudi/pull/1460#discussion_r399699941
 
 

 ##
 File path: hudi-client/src/main/java/org/apache/hudi/client/Suppliers.java
 ##
 @@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.client;
+
+import java.io.Serializable;
+
+/**
+ * A bundle of Suppliers.
+ */
+public class Suppliers implements Serializable {
 
 Review comment:
  then this can go away. I feel this is an additional abstraction that we may not need.




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1460: [HUDI-679] Make io package Spark free

2020-03-28 Thread GitBox
vinothchandar commented on a change in pull request #1460: [HUDI-679] Make io 
package Spark free
URL: https://github.com/apache/incubator-hudi/pull/1460#discussion_r399700184
 
 

 ##
 File path: hudi-client/src/main/java/org/apache/hudi/io/HoodieWriteHandle.java
 ##
 @@ -55,26 +55,27 @@
   protected final String partitionPath;
   protected final String fileId;
   protected final String writeToken;
+  protected final Suppliers suppliers;
 
   public HoodieWriteHandle(HoodieWriteConfig config, String instantTime, 
String partitionPath,
-   String fileId, HoodieTable hoodieTable) {
+   String fileId, HoodieTable hoodieTable, 
Suppliers suppliers) {
 super(config, instantTime, hoodieTable);
 this.partitionPath = partitionPath;
 this.fileId = fileId;
-this.writeToken = makeSparkWriteToken();
 this.originalSchema = new Schema.Parser().parse(config.getSchema());
 this.writerSchema = createHoodieWriteSchema(originalSchema);
 this.timer = new HoodieTimer().startTimer();
 this.writeStatus = (WriteStatus) 
ReflectionUtils.loadClass(config.getWriteStatusClassName(),
 !hoodieTable.getIndex().isImplicitWithStorage(), 
config.getWriteStatusFailureFraction());
+this.suppliers = suppliers;
+this.writeToken = makeSparkWriteToken();
 
 Review comment:
   rename to just `makeWriteToken()`? 




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1460: [HUDI-679] Make io package Spark free

2020-03-28 Thread GitBox
vinothchandar commented on a change in pull request #1460: [HUDI-679] Make io 
package Spark free
URL: https://github.com/apache/incubator-hudi/pull/1460#discussion_r399700058
 
 

 ##
 File path: 
hudi-client/src/main/java/org/apache/hudi/execution/MergeOnReadLazyInsertIterable.java
 ##
 @@ -35,8 +36,9 @@
 public class MergeOnReadLazyInsertIterable 
extends CopyOnWriteLazyInsertIterable {
 
   public MergeOnReadLazyInsertIterable(Iterator> 
sortedRecordItr, HoodieWriteConfig config,
-  String instantTime, HoodieTable hoodieTable, String idPfx) {
-super(sortedRecordItr, config, instantTime, hoodieTable, idPfx);
+   String instantTime, HoodieTable 
hoodieTable, String idPfx,
 
 Review comment:
  side point: we should fix method arg formatting consistently between IntelliJ and checkstyle. I keep seeing these sorts of whitespace changes in PRs.




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1460: [HUDI-679] Make io package Spark free

2020-03-28 Thread GitBox
vinothchandar commented on a change in pull request #1460: [HUDI-679] Make io 
package Spark free
URL: https://github.com/apache/incubator-hudi/pull/1460#discussion_r399699794
 
 

 ##
 File path: 
hudi-client/src/main/java/org/apache/hudi/client/SparkTaskContextDetailSupplier.java
 ##
 @@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.client;
+
+import org.apache.spark.TaskContext;
+
+import java.io.Serializable;
+import java.util.function.Supplier;
+
+/**
+ * Spark Supplier.
+ */
+public interface SparkTaskContextDetailSupplier extends Supplier, 
Serializable {
 
 Review comment:
  I am not sure this abstraction is at the right level. Does this itself have to be a `Supplier`? I think we can just have three methods that return the `Supplier`s, and pass just one argument through the code path, i.e. the `SparkTaskContextSupplier` instance.
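
   A minimal sketch of the single-argument approach suggested above, assuming stub values in place of the real Spark calls (the class and method names here are illustrative, not the merged implementation):

   ```java
   import java.io.Serializable;
   import java.util.function.Supplier;

   // Hypothetical sketch: one serializable holder passed through the code path,
   // exposing the three task-context values as Suppliers. On Spark, the lambdas
   // would delegate to org.apache.spark.TaskContext (getPartitionId(), stageId(),
   // taskAttemptId()); fixed stub values are used here so the sketch is runnable.
   public class SparkTaskContextSupplier implements Serializable {

     public Supplier<Integer> getPartitionIdSupplier() {
       return () -> 0; // Spark: TaskContext.getPartitionId()
     }

     public Supplier<Integer> getStageIdSupplier() {
       return () -> 0; // Spark: TaskContext.get().stageId()
     }

     public Supplier<Long> getAttemptIdSupplier() {
       return () -> 0L; // Spark: TaskContext.get().taskAttemptId()
     }
   }
   ```

   A consumer such as `HoodieWriteHandle` would then take just this one object instead of three separate `Supplier` constructor arguments.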




[jira] [Commented] (HUDI-722) IndexOutOfBoundsException in MessageColumnIORecordConsumer.addBinary when writing parquet

2020-03-28 Thread Alexander Filipchik (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17069995#comment-17069995
 ] 

Alexander Filipchik commented on HUDI-722:
--

Sure. The context -> we have a very schema-heavy stream, meaning the schema has 
multiple levels: arrays of structs which themselves contain arrays of structs. I 
saw 2500 columns at the parquet level.

We caught a bunch of issues with Avro conversions on that stream. It works fine 
on 0.5, but when I tried to upgrade to 0.6 I got this error. The table type is 
MOR and the operation is INSERT.

If you want, we can do a f2f session (Zoom, Hangouts), as it will be easier to 
explain or even debug.

> IndexOutOfBoundsException in MessageColumnIORecordConsumer.addBinary when 
> writing parquet
> -
>
> Key: HUDI-722
> URL: https://issues.apache.org/jira/browse/HUDI-722
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Writer Core
>Reporter: Alexander Filipchik
>Assignee: lamber-ken
>Priority: Major
> Fix For: 0.6.0
>
>
> Some writes fail with java.lang.IndexOutOfBoundsException : Invalid array 
> range: X to X inside MessageColumnIORecordConsumer.addBinary call.
> Specifically: getColumnWriter().write(value, r[currentLevel], 
> currentColumnIO.getDefinitionLevel());
> fails as size of r is the same as current level. What can be causing it?
>  
> It gets executed via: ParquetWriter.write(IndexedRecord) Library version: 
> 1.10.1 Avro is a very complex object (~2.5k columns, highly nested, arrays of 
> unions present).
> But what is surprising is that it fails to write top level field: 
> PrimitiveColumnIO _hoodie_commit_time r:0 d:1 [_hoodie_commit_time] which is 
> the first top level field in Avro: {"_hoodie_commit_time": "20200317215711", 
> "_hoodie_commit_seqno": "20200317215711_0_650",



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] pratyakshsharma commented on a change in pull request #1453: HUDI-644 kafka connect checkpoint provider

2020-03-28 Thread GitBox
pratyakshsharma commented on a change in pull request #1453: HUDI-644 kafka 
connect checkpoint provider
URL: https://github.com/apache/incubator-hudi/pull/1453#discussion_r399679398
 
 

 ##
 File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/checkpoint/KafkaConnectHdfsProvider.java
 ##
 @@ -0,0 +1,140 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities.sources.checkpoint;
+
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.PathFilter;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+/**
+ * Generate checkpoint from Kafka-Connect-HDFS.
+ */
+public class KafkaConnectHdfsProvider implements CheckPointProvider {
 
 Review comment:
  I am more inclined towards integrating this tool at the proper places in our 
code base rather than specifying the checkpoint manually. Do you plan to do this 
integration in a separate PR? 




[incubator-hudi] branch master updated (04449f3 -> ac73bdc)

2020-03-28 Thread sivabalan
This is an automated email from the ASF dual-hosted git repository.

sivabalan pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git.


from 04449f3  [HUDI-743]: Remove FileIOUtils.close() (#1461)
 add ac73bdc  [HUDI-430] Adding InlineFileSystem to support embedding any 
file format as an InlineFile (#1176)

No new revisions were added by this update.

Summary of changes:
 hudi-client/pom.xml|   1 +
 hudi-common/pom.xml|  15 +
 .../hudi/common/inline/fs/InLineFSUtils.java   |  98 ++
 .../hudi/common/inline/fs/InLineFileSystem.java| 133 
 .../common/inline/fs/InLineFsDataInputStream.java  | 114 +++
 .../hudi/common/inline/fs/InMemoryFileSystem.java  | 138 
 .../hudi/common/inline/fs/FileSystemTestUtils.java |  66 
 .../hudi/common/inline/fs/TestHFileInLining.java   | 241 ++
 .../common/inline/fs/TestInLineFileSystem.java | 356 +
 .../common/inline/fs/TestInMemoryFileSystem.java   | 145 +
 .../utilities/inline/fs/TestParquetInLining.java   | 153 +
 pom.xml|   7 -
 12 files changed, 1460 insertions(+), 7 deletions(-)
 create mode 100644 
hudi-common/src/main/java/org/apache/hudi/common/inline/fs/InLineFSUtils.java
 create mode 100644 
hudi-common/src/main/java/org/apache/hudi/common/inline/fs/InLineFileSystem.java
 create mode 100644 
hudi-common/src/main/java/org/apache/hudi/common/inline/fs/InLineFsDataInputStream.java
 create mode 100644 
hudi-common/src/main/java/org/apache/hudi/common/inline/fs/InMemoryFileSystem.java
 create mode 100644 
hudi-common/src/test/java/org/apache/hudi/common/inline/fs/FileSystemTestUtils.java
 create mode 100644 
hudi-common/src/test/java/org/apache/hudi/common/inline/fs/TestHFileInLining.java
 create mode 100644 
hudi-common/src/test/java/org/apache/hudi/common/inline/fs/TestInLineFileSystem.java
 create mode 100644 
hudi-common/src/test/java/org/apache/hudi/common/inline/fs/TestInMemoryFileSystem.java
 create mode 100644 
hudi-utilities/src/test/java/org/apache/hudi/utilities/inline/fs/TestParquetInLining.java



[GitHub] [incubator-hudi] nsivabalan merged pull request #1176: [HUDI-430] Adding InlineFileSystem to support embedding any file format as an InlineFile

2020-03-28 Thread GitBox
nsivabalan merged pull request #1176: [HUDI-430] Adding InlineFileSystem to 
support embedding any file format as an InlineFile
URL: https://github.com/apache/incubator-hudi/pull/1176
 
 
   




[GitHub] [incubator-hudi] pratyakshsharma commented on a change in pull request #1453: HUDI-644 kafka connect checkpoint provider

2020-03-28 Thread GitBox
pratyakshsharma commented on a change in pull request #1453: HUDI-644 kafka 
connect checkpoint provider
URL: https://github.com/apache/incubator-hudi/pull/1453#discussion_r399679177
 
 

 ##
 File path: 
hudi-utilities/src/test/java/org/apache/hudi/utilities/sources/checkpoint/TestCheckPointProvider.java
 ##
 @@ -0,0 +1,73 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities.sources.checkpoint;
+
+import org.apache.hudi.common.HoodieCommonTestHarness;
+import org.apache.hudi.common.model.HoodieTestUtils;
+import org.apache.hudi.common.util.FSUtils;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.junit.Before;
+import org.junit.Test;
+
+import java.io.File;
+
+import static org.junit.Assert.assertEquals;
+
+public class TestCheckPointProvider extends HoodieCommonTestHarness {
+  private FileSystem fs = null;
+  private String topicPath = null;
+
+  @Before
+  public void init() {
+// Prepare directories
+initPath();
+topicPath = basePath + "/topic1";
+final Configuration hadoopConf = HoodieTestUtils.getDefaultHadoopConf();
+fs = FSUtils.getFs(basePath, hadoopConf);
+new File(topicPath).mkdirs();
+  }
+
+  @Test
+  public void testKafkaConnectHdfsProvider() throws Exception {
+// create regular kafka connect hdfs dirs
+new File(topicPath + "/year=2016/month=05/day=01/").mkdirs();
+new File(topicPath + "/year=2016/month=05/day=02/").mkdirs();
+// kafka connect tmp folder
+new File(topicPath + "/TMP").mkdirs();
+// tmp file that being written
+new File(topicPath + "/TMP/" + "topic1+0+301+400.parquet").createNewFile();
+// regular parquet files
+new File(topicPath + "/year=2016/month=05/day=01/"
++ "topic1+0+100+200.parquet").createNewFile();
+new File(topicPath + "/year=2016/month=05/day=01/"
++ "topic1+1+100+200.parquet").createNewFile();
+new File(topicPath + "/year=2016/month=05/day=02/"
++ "topic1+0+201+300.parquet").createNewFile();
+// noise parquet file
+new File(topicPath + "/year=2016/month=05/day=01/"
++ "random_snappy_1.parquet").createNewFile();
+new File(topicPath + "/year=2016/month=05/day=02/"
++ "random_snappy_2.parquet").createNewFile();
+CheckPointProvider c = new KafkaConnectHdfsProvider(new Path(topicPath), 
fs);
+assertEquals(c.getCheckpoint(), "topic1,0:300,1:200");
 
 Review comment:
   I see. 
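
   The checkpoint string asserted in the test above (`"topic1,0:300,1:200"`) follows from the Kafka-Connect-HDFS committed-file naming scheme `<topic>+<partition>+<startOffset>+<endOffset>.parquet`: per partition, keep the largest end offset seen. A hedged, self-contained sketch of that derivation (class and method names are mine, not the PR's `KafkaConnectHdfsProvider`):

   ```java
   import java.util.List;
   import java.util.Map;
   import java.util.TreeMap;
   import java.util.regex.Matcher;
   import java.util.regex.Pattern;

   // Illustrative sketch: compute the checkpoint string from file names of the
   // form "<topic>+<partition>+<startOffset>+<endOffset>.parquet", keeping the
   // maximum end offset per partition and skipping files that do not match.
   public class KafkaConnectCheckpointSketch {

     private static final Pattern COMMITTED_FILE =
         Pattern.compile("([\\w-]+)\\+(\\d+)\\+(\\d+)\\+(\\d+)\\.parquet");

     public static String checkpoint(List<String> fileNames) {
       String topic = null;
       // TreeMap keeps partitions ordered, yielding "topic,0:...,1:..." output
       Map<Integer, Long> maxEndOffset = new TreeMap<>();
       for (String name : fileNames) {
         Matcher m = COMMITTED_FILE.matcher(name);
         if (!m.matches()) {
           continue; // ignore noise files that do not follow the naming scheme
         }
         topic = m.group(1);
         int partition = Integer.parseInt(m.group(2));
         long endOffset = Long.parseLong(m.group(4));
         maxEndOffset.merge(partition, endOffset, Math::max);
       }
       StringBuilder sb = new StringBuilder(topic);
       for (Map.Entry<Integer, Long> e : maxEndOffset.entrySet()) {
         sb.append(',').append(e.getKey()).append(':').append(e.getValue());
       }
       return sb.toString();
     }
   }
   ```

   Fed the same file names the test creates (two files for partition 0 ending at 200 and 300, one for partition 1 ending at 200, plus a noise file), this produces exactly the asserted string.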




[GitHub] [incubator-hudi] pratyakshsharma commented on a change in pull request #1452: [HUDI-740]Fix can not specify the sparkMaster of cleans run command

2020-03-28 Thread GitBox
pratyakshsharma commented on a change in pull request #1452: [HUDI-740]Fix can 
not specify the sparkMaster of cleans run command
URL: https://github.com/apache/incubator-hudi/pull/1452#discussion_r399679036
 
 

 ##
 File path: hudi-cli/src/main/java/org/apache/hudi/cli/commands/SparkMain.java
 ##
 @@ -62,7 +63,9 @@ public static void main(String[] args) throws Exception {
 
 SparkCommand cmd = SparkCommand.valueOf(command);
 
-JavaSparkContext jsc = SparkUtil.initJavaSparkConf("hoodie-cli-" + 
command);
+JavaSparkContext jsc = cmd == SparkCommand.CLEAN
 
 Review comment:
   @hddong I can see in the SparkMain.java class that sparkMaster and sparkMemory 
are both present in the args of other commands as well. For example - 
   
   1. 
https://github.com/apache/incubator-hudi/blob/04449f33feb300b99750c52ec37f2561aa644456/hudi-cli/src/main/java/org/apache/hudi/cli/commands/SparkMain.java#L216
   2. 
https://github.com/apache/incubator-hudi/blob/04449f33feb300b99750c52ec37f2561aa644456/hudi-cli/src/main/java/org/apache/hudi/cli/commands/SparkMain.java#L234
   
   Am I missing something here? 
   
   Also all these positional arguments will be changed to proper config objects 
as per this PR (https://github.com/apache/incubator-hudi/pull/1174). You might 
want to take a look at this one. 




[GitHub] [incubator-hudi] leesf commented on issue #1460: [HUDI-679] Make io package Spark free

2020-03-28 Thread GitBox
leesf commented on issue #1460: [HUDI-679] Make io package Spark free
URL: https://github.com/apache/incubator-hudi/pull/1460#issuecomment-605462038
 
 
   @yanghua Updated this PR to  package three suppliers into `Suppliers`.  
PTAL, thanks.




[GitHub] [incubator-hudi] leesf commented on a change in pull request #1460: [HUDI-679] Make io package Spark free

2020-03-28 Thread GitBox
leesf commented on a change in pull request #1460: [HUDI-679] Make io package 
Spark free
URL: https://github.com/apache/incubator-hudi/pull/1460#discussion_r399673870
 
 

 ##
 File path: 
hudi-client/src/main/java/org/apache/hudi/execution/CopyOnWriteLazyInsertIterable.java
 ##
 @@ -50,15 +51,23 @@
   protected final HoodieTable hoodieTable;
   protected final String idPrefix;
   protected int numFilesWritten;
+  protected Supplier idSupplier;
+  protected Supplier stageSupplier;
+  protected Supplier attemptSupplier;
 
   public CopyOnWriteLazyInsertIterable(Iterator> 
sortedRecordItr, HoodieWriteConfig config,
-   String instantTime, HoodieTable 
hoodieTable, String idPrefix) {
+   String instantTime, HoodieTable 
hoodieTable, String idPrefix,
+   Supplier idSupplier, 
Supplier stageSupplier,
+   Supplier attemptSupplier) {
 
 Review comment:
   > Actually, I am not sure if we can package these three args into a DTO 
structure. Just a thought, you can ignore.
   
   Yes, I think it is better.




[GitHub] [incubator-hudi] hddong commented on a change in pull request #1452: [HUDI-740]Fix can not specify the sparkMaster of cleans run command

2020-03-28 Thread GitBox
hddong commented on a change in pull request #1452: [HUDI-740]Fix can not 
specify the sparkMaster of cleans run command
URL: https://github.com/apache/incubator-hudi/pull/1452#discussion_r399670689
 
 

 ##
 File path: hudi-cli/src/main/java/org/apache/hudi/cli/commands/SparkMain.java
 ##
 @@ -62,7 +63,9 @@ public static void main(String[] args) throws Exception {
 
 SparkCommand cmd = SparkCommand.valueOf(command);
 
-JavaSparkContext jsc = SparkUtil.initJavaSparkConf("hoodie-cli-" + 
command);
+JavaSparkContext jsc = cmd == SparkCommand.CLEAN
 
 Review comment:
  @yanghua yep, it is better to add sparkMaster for other commands. Is it 
needed in this PR? I think I can create a new JIRA to do this, and then we can 
init the jsc with sparkMaster for all commands. 




[GitHub] [incubator-hudi] hddong commented on issue #1449: [HUDI-698]Add unit test for CleansCommand

2020-03-28 Thread GitBox
hddong commented on issue #1449: [HUDI-698]Add unit test for CleansCommand
URL: https://github.com/apache/incubator-hudi/pull/1449#issuecomment-605455411
 
 
  @yanghua It all passed locally. It may be caused by the two jsc instances 
reading data from different dfs (HadoopConfig) in the travis environment; I will 
make some adjustments.




[GitHub] [incubator-hudi] yanghua commented on a change in pull request #1460: [HUDI-679] Make io package Spark free

2020-03-28 Thread GitBox
yanghua commented on a change in pull request #1460: [HUDI-679] Make io package 
Spark free
URL: https://github.com/apache/incubator-hudi/pull/1460#discussion_r399660338
 
 

 ##
 File path: 
hudi-client/src/main/java/org/apache/hudi/execution/BulkInsertMapFunction.java
 ##
 @@ -51,6 +51,7 @@ public BulkInsertMapFunction(String instantTime, 
HoodieWriteConfig config, Hoodi
   @Override
   public Iterator> call(Integer partition, 
Iterator> sortedRecordItr) {
 return new CopyOnWriteLazyInsertIterable<>(sortedRecordItr, config, 
instantTime, hoodieTable,
-fileIDPrefixes.get(partition));
+fileIDPrefixes.get(partition), hoodieTable.getIdSupplier(), 
hoodieTable.getStageSupplier(),
 
 Review comment:
  `hoodieTable.getIdSupplier()` is not clear here. I suggest we rename 
these getters to e.g. `getPartitionIdSupplier`, `getStageId` and `getAttemptId`?




[GitHub] [incubator-hudi] yanghua commented on a change in pull request #1460: [HUDI-679] Make io package Spark free

2020-03-28 Thread GitBox
yanghua commented on a change in pull request #1460: [HUDI-679] Make io package 
Spark free
URL: https://github.com/apache/incubator-hudi/pull/1460#discussion_r399660094
 
 

 ##
 File path: 
hudi-client/src/main/java/org/apache/hudi/client/SparkTaskContextDetailSupplier.java
 ##
 @@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.client;
+
+import org.apache.spark.TaskContext;
+
+import java.io.Serializable;
+import java.util.function.Supplier;
+
+/**
+ * Spark Supplier.
+ */
+public interface SparkTaskContextDetailSupplier extends Supplier, 
Serializable {
+
+  /**
+   * Supplier to get partition id.
+   */
+  SparkTaskContextDetailSupplier PARTITION_SUPPLIER = () -> 
TaskContext.getPartitionId();
 
 Review comment:
   `PARTITION_ID_SUPPLIER `?




[GitHub] [incubator-hudi] yanghua commented on a change in pull request #1460: [HUDI-679] Make io package Spark free

2020-03-28 Thread GitBox
yanghua commented on a change in pull request #1460: [HUDI-679] Make io package 
Spark free
URL: https://github.com/apache/incubator-hudi/pull/1460#discussion_r399660167
 
 

 ##
 File path: 
hudi-client/src/main/java/org/apache/hudi/client/SparkTaskContextDetailSupplier.java
 ##
 @@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.client;
+
+import org.apache.spark.TaskContext;
+
+import java.io.Serializable;
+import java.util.function.Supplier;
+
+/**
+ * Spark Supplier.
+ */
+public interface SparkTaskContextDetailSupplier extends Supplier, 
Serializable {
+
+  /**
+   * Supplier to get partition id.
+   */
+  SparkTaskContextDetailSupplier PARTITION_SUPPLIER = () -> 
TaskContext.getPartitionId();
+
+  /**
+   * Supplier to get stage id.
+   */
+  SparkTaskContextDetailSupplier STAGE_SUPPLIER = () -> 
TaskContext.get().stageId();
+
+  /**
+   * Supplier to get task attempt id.
+   */
+  SparkTaskContextDetailSupplier ATTEMPT_SUPPLIER = () -> 
TaskContext.get().taskAttemptId();
 
 Review comment:
   `ATTEMPT_ID_SUPPLIER`?




[GitHub] [incubator-hudi] yanghua commented on a change in pull request #1460: [HUDI-679] Make io package Spark free

2020-03-28 Thread GitBox
yanghua commented on a change in pull request #1460: [HUDI-679] Make io package 
Spark free
URL: https://github.com/apache/incubator-hudi/pull/1460#discussion_r399660466
 
 

 ##
 File path: 
hudi-client/src/main/java/org/apache/hudi/execution/CopyOnWriteLazyInsertIterable.java
 ##
 @@ -50,15 +51,23 @@
   protected final HoodieTable hoodieTable;
   protected final String idPrefix;
   protected int numFilesWritten;
+  protected Supplier idSupplier;
+  protected Supplier stageSupplier;
+  protected Supplier attemptSupplier;
 
   public CopyOnWriteLazyInsertIterable(Iterator> 
sortedRecordItr, HoodieWriteConfig config,
-   String instantTime, HoodieTable 
hoodieTable, String idPrefix) {
+   String instantTime, HoodieTable 
hoodieTable, String idPrefix,
+   Supplier idSupplier, 
Supplier stageSupplier,
+   Supplier attemptSupplier) {
 
 Review comment:
   Actually, I wonder if we could package these three args into a DTO 
structure. Just a thought; feel free to ignore.


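The DTO idea floated above can be sketched as follows. This is a minimal, hypothetical sketch: the class name `TaskContextSuppliers` and the fixed supplier values standing in for Spark's `TaskContext` are assumptions for illustration, not code from the PR.

```java
import java.io.Serializable;
import java.util.function.Supplier;

// Hypothetical DTO bundling the three task-context suppliers into one object,
// so constructors such as CopyOnWriteLazyInsertIterable's would take a single
// parameter instead of three separate Supplier arguments.
public class TaskContextSuppliers implements Serializable {
  private final Supplier<Integer> partitionIdSupplier;
  private final Supplier<Integer> stageIdSupplier;
  private final Supplier<Long> taskAttemptIdSupplier;

  public TaskContextSuppliers(Supplier<Integer> partitionIdSupplier,
                              Supplier<Integer> stageIdSupplier,
                              Supplier<Long> taskAttemptIdSupplier) {
    this.partitionIdSupplier = partitionIdSupplier;
    this.stageIdSupplier = stageIdSupplier;
    this.taskAttemptIdSupplier = taskAttemptIdSupplier;
  }

  public int getPartitionId() { return partitionIdSupplier.get(); }

  public int getStageId() { return stageIdSupplier.get(); }

  public long getTaskAttemptId() { return taskAttemptIdSupplier.get(); }

  public static void main(String[] args) {
    // Fixed values stand in for Spark's TaskContext, which is only
    // available inside a running Spark task.
    TaskContextSuppliers ctx =
        new TaskContextSuppliers(() -> 3, () -> 1, () -> 42L);
    System.out.println(ctx.getPartitionId()); // prints 3
    System.out.println(ctx.getStageId());     // prints 1
    System.out.println(ctx.getTaskAttemptId()); // prints 42
  }
}
```

In a Spark context the callers would construct this once from `TaskContext` suppliers and pass the single object down, keeping `hudi-io` free of any Spark import.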


[GitHub] [incubator-hudi] yanghua commented on a change in pull request #1460: [HUDI-679] Make io package Spark free

2020-03-28 Thread GitBox
yanghua commented on a change in pull request #1460: [HUDI-679] Make io package 
Spark free
URL: https://github.com/apache/incubator-hudi/pull/1460#discussion_r399660132
 
 

 ##
 File path: 
hudi-client/src/main/java/org/apache/hudi/client/SparkTaskContextDetailSupplier.java
 ##
 @@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.client;
+
+import org.apache.spark.TaskContext;
+
+import java.io.Serializable;
+import java.util.function.Supplier;
+
+/**
+ * Spark Supplier.
+ */
+public interface SparkTaskContextDetailSupplier extends Supplier, 
Serializable {
+
+  /**
+   * Supplier to get partition id.
+   */
+  SparkTaskContextDetailSupplier PARTITION_SUPPLIER = () -> 
TaskContext.getPartitionId();
+
+  /**
+   * Supplier to get stage id.
+   */
+  SparkTaskContextDetailSupplier STAGE_SUPPLIER = () -> 
TaskContext.get().stageId();
 
 Review comment:
   `STAGE_ID_SUPPLIER`?


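For reference, the constants-on-an-interface pattern under review, with the `*_ID_SUPPLIER` names suggested above, looks roughly like this. Plain constant values stand in for the `TaskContext.getPartitionId()`, `stageId()`, and `taskAttemptId()` calls, which only work inside a running Spark task, so this is a hedged sketch rather than the PR's actual code.

```java
import java.io.Serializable;
import java.util.function.Supplier;

// Sketch of a serializable supplier interface with named constant instances,
// mirroring the structure discussed in the review comments.
public class SupplierNamingSketch {

  interface TaskContextDetailSupplier extends Supplier<Number>, Serializable {

    /** Supplier to get the partition id (TaskContext.getPartitionId() in the PR). */
    TaskContextDetailSupplier PARTITION_ID_SUPPLIER = () -> 0;

    /** Supplier to get the stage id (TaskContext.get().stageId() in the PR). */
    TaskContextDetailSupplier STAGE_ID_SUPPLIER = () -> 1;

    /** Supplier to get the task attempt id (TaskContext.get().taskAttemptId() in the PR). */
    TaskContextDetailSupplier ATTEMPT_ID_SUPPLIER = () -> 2L;
  }

  public static void main(String[] args) {
    System.out.println(TaskContextDetailSupplier.PARTITION_ID_SUPPLIER.get()); // prints 0
    System.out.println(TaskContextDetailSupplier.STAGE_ID_SUPPLIER.get());     // prints 1
    System.out.println(TaskContextDetailSupplier.ATTEMPT_ID_SUPPLIER.get());   // prints 2
  }
}
```

Because the interface extends `Serializable`, the lambda-backed constants can be shipped to executors, which is what lets the `io` package depend only on `Supplier` instead of `TaskContext`.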


[GitHub] [incubator-hudi] yanghua commented on a change in pull request #1460: [HUDI-679] Make io package Spark free

2020-03-28 Thread GitBox
yanghua commented on a change in pull request #1460: [HUDI-679] Make io package 
Spark free
URL: https://github.com/apache/incubator-hudi/pull/1460#discussion_r399660595
 
 

 ##
 File path: hudi-client/src/main/java/org/apache/hudi/io/HoodieWriteHandle.java
 ##
 @@ -55,26 +55,32 @@
   protected final String partitionPath;
   protected final String fileId;
   protected final String writeToken;
+  protected final Supplier idSupplier;
 
 Review comment:
   Ditto. IMO, `id` here is not clear; partition id is better.




[GitHub] [incubator-hudi] yanghua commented on a change in pull request #1460: [HUDI-679] Make io package Spark free

2020-03-28 Thread GitBox
yanghua commented on a change in pull request #1460: [HUDI-679] Make io package 
Spark free
URL: https://github.com/apache/incubator-hudi/pull/1460#discussion_r399660372
 
 

 ##
 File path: 
hudi-client/src/main/java/org/apache/hudi/execution/CopyOnWriteLazyInsertIterable.java
 ##
 @@ -50,15 +51,23 @@
   protected final HoodieTable hoodieTable;
   protected final String idPrefix;
   protected int numFilesWritten;
+  protected Supplier idSupplier;
 
 Review comment:
   ditto




[GitHub] [incubator-hudi] leesf commented on a change in pull request #1454: MINOR update doc to include inc query on partitions

2020-03-28 Thread GitBox
leesf commented on a change in pull request #1454: MINOR update doc to include 
inc query on partitions
URL: https://github.com/apache/incubator-hudi/pull/1454#discussion_r399658424
 
 

 ##
 File path: docs/_docs/0.5.2/2_3_querying_data.md
 ##
 @@ -146,8 +146,9 @@ The following snippet shows how to obtain all records 
changed after `beginInstan
 ```java
 Dataset<Row> hudiIncQueryDF = spark.read()
  .format("org.apache.hudi")
- .option(DataSourceReadOptions.QUERY_TYPE_OPT_KEY(), 
DataSourceReadOptions.QUERY_TYPE_INCREMENTAL_OPT_VAL())
+ 
.option(DataSourceReadOptions.QUERY_TYPE_OPT_KEY(),DataSourceReadOptions.QUERY_TYPE_INCREMENTAL_OPT_VAL())
 
 Review comment:
   Better to keep a blank space after the comma, i.e. keep it as is?




[GitHub] [incubator-hudi] leesf commented on issue #1460: [HUDI-679] Make io package Spark free

2020-03-28 Thread GitBox
leesf commented on issue #1460: [HUDI-679] Make io package Spark free
URL: https://github.com/apache/incubator-hudi/pull/1460#issuecomment-605434298
 
 
   @yanghua Updated this PR to address your comments.




[GitHub] [incubator-hudi] leesf commented on a change in pull request #1460: [HUDI-679] Make io package Spark free

2020-03-28 Thread GitBox
leesf commented on a change in pull request #1460: [HUDI-679] Make io package 
Spark free
URL: https://github.com/apache/incubator-hudi/pull/1460#discussion_r399651507
 
 

 ##
 File path: hudi-client/src/main/java/org/apache/hudi/client/SparkSupplier.java
 ##
 @@ -0,0 +1,33 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.client;
+
+import org.apache.spark.TaskContext;
+
+import java.io.Serializable;
+import java.util.function.Supplier;
+
+/**
+ * Spark Supplier.
+ */
+public interface SparkSupplier extends Supplier, Serializable {
+  SparkSupplier PARTITION_SUPPLIER = () -> 
TaskContext.getPartitionId();
 
 Review comment:
   > Can we add empty line to split this definition? Additionally, add some 
comments?
   
   sure




[GitHub] [incubator-hudi] leesf commented on a change in pull request #1460: [HUDI-679] Make io package Spark free

2020-03-28 Thread GitBox
leesf commented on a change in pull request #1460: [HUDI-679] Make io package 
Spark free
URL: https://github.com/apache/incubator-hudi/pull/1460#discussion_r399651495
 
 

 ##
 File path: hudi-client/src/main/java/org/apache/hudi/client/SparkSupplier.java
 ##
 @@ -0,0 +1,33 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.client;
+
+import org.apache.spark.TaskContext;
+
+import java.io.Serializable;
+import java.util.function.Supplier;
+
+/**
+ * Spark Supplier.
+ */
+public interface SparkSupplier extends Supplier, Serializable {
 
 Review comment:
   sounds reasonable.




[GitHub] [incubator-hudi] yanghua commented on a change in pull request #1460: [HUDI-679] Make io package Spark free

2020-03-28 Thread GitBox
yanghua commented on a change in pull request #1460: [HUDI-679] Make io package 
Spark free
URL: https://github.com/apache/incubator-hudi/pull/1460#discussion_r399649196
 
 

 ##
 File path: hudi-client/src/main/java/org/apache/hudi/client/SparkSupplier.java
 ##
 @@ -0,0 +1,33 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.client;
+
+import org.apache.spark.TaskContext;
+
+import java.io.Serializable;
+import java.util.function.Supplier;
+
+/**
+ * Spark Supplier.
+ */
+public interface SparkSupplier extends Supplier, Serializable {
 
 Review comment:
   Considering this interface supplies some information about `TaskContext`, 
can we rename it to `SparkTaskContextDetailSupplier`?




[GitHub] [incubator-hudi] yanghua commented on a change in pull request #1460: [HUDI-679] Make io package Spark free

2020-03-28 Thread GitBox
yanghua commented on a change in pull request #1460: [HUDI-679] Make io package 
Spark free
URL: https://github.com/apache/incubator-hudi/pull/1460#discussion_r399649258
 
 

 ##
 File path: hudi-client/src/main/java/org/apache/hudi/client/SparkSupplier.java
 ##
 @@ -0,0 +1,33 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.client;
+
+import org.apache.spark.TaskContext;
+
+import java.io.Serializable;
+import java.util.function.Supplier;
+
+/**
+ * Spark Supplier.
+ */
+public interface SparkSupplier extends Supplier, Serializable {
+  SparkSupplier PARTITION_SUPPLIER = () -> 
TaskContext.getPartitionId();
 
 Review comment:
   Can we add an empty line to split this definition? Additionally, add some 
comments?




[GitHub] [incubator-hudi] codecov-io commented on issue #1460: [HUDI-679] Make io package Spark free

2020-03-28 Thread GitBox
codecov-io commented on issue #1460: [HUDI-679] Make io package Spark free
URL: https://github.com/apache/incubator-hudi/pull/1460#issuecomment-605428358
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1460?src=pr=h1) 
Report
   > Merging 
[#1460](https://codecov.io/gh/apache/incubator-hudi/pull/1460?src=pr=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/1713f686f86e8c2f0a908c313cca9b595c6aed33=desc)
 will **decrease** coverage by `0.16%`.
   > The diff coverage is `97.67%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1460/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1460?src=pr=tree)
   
   ```diff
   @@ Coverage Diff  @@
   ## master#1460  +/-   ##
   
   - Coverage 67.66%   67.50%   -0.17% 
 Complexity  261  261  
   
 Files   342  344   +2 
 Lines 1651016589  +79 
 Branches   1684 1694  +10 
   
   + Hits  1117211198  +26 
   - Misses 4599 4652  +53 
 Partials739  739  
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-hudi/pull/1460?src=pr=tree) | 
Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[.../hudi/execution/MergeOnReadLazyInsertIterable.java](https://codecov.io/gh/apache/incubator-hudi/pull/1460/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZXhlY3V0aW9uL01lcmdlT25SZWFkTGF6eUluc2VydEl0ZXJhYmxlLmphdmE=)
 | `64.70% <66.66%> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[...ain/java/org/apache/hudi/client/SparkSupplier.java](https://codecov.io/gh/apache/incubator-hudi/pull/1460/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpZW50L1NwYXJrU3VwcGxpZXIuamF2YQ==)
 | `100.00% <100.00%> (ø)` | `0.00 <0.00> (?)` | |
   | 
[...g/apache/hudi/execution/BulkInsertMapFunction.java](https://codecov.io/gh/apache/incubator-hudi/pull/1460/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZXhlY3V0aW9uL0J1bGtJbnNlcnRNYXBGdW5jdGlvbi5qYXZh)
 | `100.00% <100.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[.../hudi/execution/CopyOnWriteLazyInsertIterable.java](https://codecov.io/gh/apache/incubator-hudi/pull/1460/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZXhlY3V0aW9uL0NvcHlPbldyaXRlTGF6eUluc2VydEl0ZXJhYmxlLmphdmE=)
 | `81.48% <100.00%> (+1.08%)` | `0.00 <0.00> (ø)` | |
   | 
[...in/java/org/apache/hudi/io/HoodieAppendHandle.java](https://codecov.io/gh/apache/incubator-hudi/pull/1460/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW8vSG9vZGllQXBwZW5kSGFuZGxlLmphdmE=)
 | `84.17% <100.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[...in/java/org/apache/hudi/io/HoodieCreateHandle.java](https://codecov.io/gh/apache/incubator-hudi/pull/1460/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW8vSG9vZGllQ3JlYXRlSGFuZGxlLmphdmE=)
 | `84.61% <100.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[...ain/java/org/apache/hudi/io/HoodieMergeHandle.java](https://codecov.io/gh/apache/incubator-hudi/pull/1460/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW8vSG9vZGllTWVyZ2VIYW5kbGUuamF2YQ==)
 | `79.31% <100.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[...ain/java/org/apache/hudi/io/HoodieWriteHandle.java](https://codecov.io/gh/apache/incubator-hudi/pull/1460/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW8vSG9vZGllV3JpdGVIYW5kbGUuamF2YQ==)
 | `74.46% <100.00%> (+1.13%)` | `0.00 <0.00> (ø)` | |
   | 
[...rg/apache/hudi/io/storage/HoodieParquetWriter.java](https://codecov.io/gh/apache/incubator-hudi/pull/1460/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW8vc3RvcmFnZS9Ib29kaWVQYXJxdWV0V3JpdGVyLmphdmE=)
 | `100.00% <100.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[...he/hudi/io/storage/HoodieStorageWriterFactory.java](https://codecov.io/gh/apache/incubator-hudi/pull/1460/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW8vc3RvcmFnZS9Ib29kaWVTdG9yYWdlV3JpdGVyRmFjdG9yeS5qYXZh)
 | `93.75% <100.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | ... and [19 
more](https://codecov.io/gh/apache/incubator-hudi/pull/1460/diff?src=pr=tree-more)
 | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1460?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1460?src=pr=footer).
 Last update 

[GitHub] [incubator-hudi] pratyakshsharma commented on a change in pull request #1150: [HUDI-288]: Add support for ingesting multiple kafka streams in a single DeltaStreamer deployment

2020-03-28 Thread GitBox
pratyakshsharma commented on a change in pull request #1150: [HUDI-288]: Add 
support for ingesting multiple kafka streams in a single DeltaStreamer 
deployment
URL: https://github.com/apache/incubator-hudi/pull/1150#discussion_r399646598
 
 

 ##
 File path: 
hudi-client/src/test/java/org/apache/hudi/common/HoodieTestDataGenerator.java
 ##
 @@ -124,6 +138,18 @@ public static void writePartitionMetadata(FileSystem fs, 
String[] partitionPaths
 }
   }
 
+  public TestRawTripPayload generateRandomValueAsPerSchema(String schemaStr, 
HoodieKey key, String commitTime) throws IOException {
 
 Review comment:
   I feel we do not gain much by defining enums here :) 




[GitHub] [incubator-hudi] pratyakshsharma commented on a change in pull request #1150: [HUDI-288]: Add support for ingesting multiple kafka streams in a single DeltaStreamer deployment

2020-03-28 Thread GitBox
pratyakshsharma commented on a change in pull request #1150: [HUDI-288]: Add 
support for ingesting multiple kafka streams in a single DeltaStreamer 
deployment
URL: https://github.com/apache/incubator-hudi/pull/1150#discussion_r399646360
 
 

 ##
 File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieMultiTableDeltaStreamer.java
 ##
 @@ -0,0 +1,259 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities.deltastreamer;
+
+import org.apache.hadoop.fs.FileUtil;
+import org.apache.hudi.DataSourceWriteOptions;
+import org.apache.hudi.common.util.FSUtils;
+import org.apache.hudi.common.util.TypedProperties;
+import org.apache.hudi.exception.HoodieException;
+import org.apache.hudi.utilities.UtilHelpers;
+import org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.Config;
+import org.apache.hudi.utilities.schema.SchemaRegistryProvider;
+
+import com.beust.jcommander.JCommander;
+import com.google.common.base.Strings;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.api.java.JavaSparkContext;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Set;
+
+/**
+ * Wrapper over HoodieDeltaStreamer.java class.
+ * Helps with ingesting incremental data into hoodie datasets for multiple 
tables.
+ * Currently supports only COPY_ON_WRITE storage type.
+ */
+public class HoodieMultiTableDeltaStreamer {
+
+  private static Logger logger = 
LogManager.getLogger(HoodieMultiTableDeltaStreamer.class);
+
+  private List tableExecutionObjects;
+  private transient JavaSparkContext jssc;
+  private Set successTables;
+  private Set failedTables;
+
+  public HoodieMultiTableDeltaStreamer(String[] args, JavaSparkContext jssc) 
throws IOException {
+this.tableExecutionObjects = new ArrayList<>();
+this.successTables = new HashSet<>();
+this.failedTables = new HashSet<>();
+this.jssc = jssc;
+String commonPropsFile = getCommonPropsFileName(args);
+String configFolder = getConfigFolder(args);
+FileSystem fs = FSUtils.getFs(commonPropsFile, jssc.hadoopConfiguration());
+configFolder = configFolder.charAt(configFolder.length() - 1) == '/' ? 
configFolder.substring(0, configFolder.length() - 1) : configFolder;
+checkIfPropsFileAndConfigFolderExist(commonPropsFile, configFolder, fs);
+TypedProperties properties = UtilHelpers.readConfig(fs, new 
Path(commonPropsFile), new ArrayList<>()).getConfig();
+//get the tables to be ingested and their corresponding config files from 
this properties instance
+populateTableExecutionObjectList(properties, configFolder, fs, args);
+  }
+
+  private void checkIfPropsFileAndConfigFolderExist(String commonPropsFile, 
String configFolder, FileSystem fs) throws IOException {
+if (!fs.exists(new Path(commonPropsFile))) {
+  throw new IllegalArgumentException("Please provide valid common config 
file path!");
+}
+
+if (!fs.exists(new Path(configFolder))) {
+  fs.mkdirs(new Path(configFolder));
+}
+  }
+
+  private void checkIfTableConfigFileExists(String configFolder, FileSystem 
fs, String configFilePath) throws IOException {
+if (!fs.exists(new Path(configFilePath)) || !fs.isFile(new 
Path(configFilePath))) {
+  throw new IllegalArgumentException("Please provide valid table config 
file path!");
+}
+
+Path path = new Path(configFilePath);
+Path filePathInConfigFolder = new Path(configFolder, path.getName());
+if (!fs.exists(filePathInConfigFolder)) {
+  FileUtil.copy(fs, path, fs, filePathInConfigFolder, false, fs.getConf());
+}
+  }
+
+  //commonProps are passed as parameter which contain table to config file 
mapping
+  private void populateTableExecutionObjectList(TypedProperties properties, 
String configFolder, FileSystem fs, String[] args) throws IOException {
+List tablesToBeIngested = getTablesToBeIngested(properties);
+TableExecutionObject executionObject;
+for (String table : tablesToBeIngested) {
+  String[] 

[jira] [Reopened] (HUDI-743) Remove FileIOUtils.close()

2020-03-28 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang reopened HUDI-743:
---

> Remove FileIOUtils.close()
> --
>
> Key: HUDI-743
> URL: https://issues.apache.org/jira/browse/HUDI-743
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Code Cleanup
>Reporter: Suneel Marthi
>Assignee: Suneel Marthi
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Closed] (HUDI-743) Remove FileIOUtils.close()

2020-03-28 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang closed HUDI-743.
-
Resolution: Done

Done via master branch: 04449f33feb300b99750c52ec37f2561aa644456

> Remove FileIOUtils.close()
> --
>
> Key: HUDI-743
> URL: https://issues.apache.org/jira/browse/HUDI-743
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Code Cleanup
>Reporter: Suneel Marthi
>Assignee: Suneel Marthi
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[incubator-hudi] branch master updated: [HUDI-743]: Remove FileIOUtils.close() (#1461)

2020-03-28 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 04449f3  [HUDI-743]: Remove FileIOUtils.close() (#1461)
04449f3 is described below

commit 04449f33feb300b99750c52ec37f2561aa644456
Author: Suneel Marthi 
AuthorDate: Sat Mar 28 06:03:15 2020 -0400

[HUDI-743]: Remove FileIOUtils.close() (#1461)
---
 .../main/java/org/apache/hudi/metrics/Metrics.java |  3 +--
 .../org/apache/hudi/common/util/FileIOUtils.java   | 25 --
 2 files changed, 1 insertion(+), 27 deletions(-)

diff --git a/hudi-client/src/main/java/org/apache/hudi/metrics/Metrics.java 
b/hudi-client/src/main/java/org/apache/hudi/metrics/Metrics.java
index b6d2f7a..b62279e 100644
--- a/hudi-client/src/main/java/org/apache/hudi/metrics/Metrics.java
+++ b/hudi-client/src/main/java/org/apache/hudi/metrics/Metrics.java
@@ -18,7 +18,6 @@
 
 package org.apache.hudi.metrics;
 
-import org.apache.hudi.common.util.FileIOUtils;
 import org.apache.hudi.config.HoodieWriteConfig;
 import org.apache.hudi.exception.HoodieException;
 
@@ -53,7 +52,7 @@ public class Metrics {
 Runtime.getRuntime().addShutdownHook(new Thread(() -> {
   try {
 reporter.report();
-FileIOUtils.close(reporter.getReporter(), true);
+getReporter().close();
   } catch (Exception e) {
 e.printStackTrace();
   }
diff --git 
a/hudi-common/src/main/java/org/apache/hudi/common/util/FileIOUtils.java 
b/hudi-common/src/main/java/org/apache/hudi/common/util/FileIOUtils.java
index 65a28b0..f1095b6 100644
--- a/hudi-common/src/main/java/org/apache/hudi/common/util/FileIOUtils.java
+++ b/hudi-common/src/main/java/org/apache/hudi/common/util/FileIOUtils.java
@@ -18,10 +18,7 @@
 
 package org.apache.hudi.common.util;
 
-import javax.annotation.Nullable;
-
 import java.io.ByteArrayOutputStream;
-import java.io.Closeable;
 import java.io.File;
 import java.io.FileOutputStream;
 import java.io.IOException;
@@ -94,26 +91,4 @@ public class FileIOUtils {
 out.flush();
 out.close();
   }
-
-  /**
-   * Closes a {@link Closeable}, with control over whether an {@code IOException} may be thrown.
-   * @param closeable the {@code Closeable} object to be closed, or null,
-   *          in which case this method does nothing.
-   * @param swallowIOException if true, don't propagate IO exceptions thrown by the {@code close} methods.
-   *
-   * @throws IOException if {@code swallowIOException} is false and {@code close} throws an {@code IOException}.
-   */
-  public static void close(@Nullable Closeable closeable, boolean swallowIOException)
-      throws IOException {
-    if (closeable == null) {
-      return;
-    }
-    try {
-      closeable.close();
-    } catch (IOException e) {
-      if (!swallowIOException) {
-        throw e;
-      }
-    }
-  }
 }
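The removed `close(Closeable, boolean)` helper becomes unnecessary once callers use try-with-resources, which closes the resource automatically and lets the call site decide what to do with a close-time exception. A minimal sketch (the `NoisyResource` class is a hypothetical stand-in for something like a metrics reporter):

```java
import java.io.Closeable;
import java.io.IOException;

public class TryWithResourcesDemo {

    // A toy Closeable whose close() fails, so we can show the
    // "swallow the IOException" behaviour the removed helper provided.
    static class NoisyResource implements Closeable {
        @Override
        public void close() throws IOException {
            throw new IOException("close failed");
        }
    }

    public static void main(String[] args) {
        // Equivalent of FileIOUtils.close(resource, /* swallowIOException= */ true):
        // the try-with-resources block closes r, and the catch swallows the failure.
        try (NoisyResource r = new NoisyResource()) {
            // use r ...
        } catch (IOException e) {
            System.out.println("swallowed: " + e.getMessage()); // prints "swallowed: close failed"
        }
    }
}
```

Dropping `swallowIOException` would mean simply rethrowing (or not catching) the exception instead.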



[jira] [Updated] (HUDI-743) Remove FileIOUtils.close()

2020-03-28 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-743:
--
Status: Closed  (was: Patch Available)

> Remove FileIOUtils.close()
> --
>
> Key: HUDI-743
> URL: https://issues.apache.org/jira/browse/HUDI-743
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Code Cleanup
>Reporter: Suneel Marthi
>Assignee: Suneel Marthi
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[GitHub] [incubator-hudi] yanghua merged pull request #1461: [HUDI-743]: Remove FileIOUtils.close()

2020-03-28 Thread GitBox
yanghua merged pull request #1461: [HUDI-743]: Remove FileIOUtils.close()
URL: https://github.com/apache/incubator-hudi/pull/1461
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] codecov-io edited a comment on issue #1461: [HUDI-743]: Remove FileIOUtils.close()

2020-03-28 Thread GitBox
codecov-io edited a comment on issue #1461: [HUDI-743]: Remove 
FileIOUtils.close()
URL: https://github.com/apache/incubator-hudi/pull/1461#issuecomment-605422050
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1461?src=pr=h1) 
Report
   > Merging 
[#1461](https://codecov.io/gh/apache/incubator-hudi/pull/1461?src=pr=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/8c3001363d80b29733470221c192a72f541381c5=desc)
 will **increase** coverage by `0.03%`.
   > The diff coverage is `0.00%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1461/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1461?src=pr=tree)
   
   ```diff
   @@ Coverage Diff  @@
   ## master#1461  +/-   ##
   
   + Coverage 67.49%   67.52%   +0.03% 
 Complexity  261  261  
   
 Files   343  343  
 Lines 1657316565   -8 
 Branches   1694 1693   -1 
   
 Hits  1118611186  
   + Misses 4648 4641   -7 
   + Partials739  738   -1 
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-hudi/pull/1461?src=pr=tree) | 
Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...src/main/java/org/apache/hudi/metrics/Metrics.java](https://codecov.io/gh/apache/incubator-hudi/pull/1461/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9NZXRyaWNzLmphdmE=)
 | `72.22% <0.00%> (+2.77%)` | `0.00 <0.00> (ø)` | |
   | 
[.../java/org/apache/hudi/common/util/FileIOUtils.java](https://codecov.io/gh/apache/incubator-hudi/pull/1461/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvRmlsZUlPVXRpbHMuamF2YQ==)
 | `65.51% <ø> (+11.46%)` | `0.00 <0.00> (ø)` | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1461?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1461?src=pr=footer).
 Last update 
[8c30013...8152c49](https://codecov.io/gh/apache/incubator-hudi/pull/1461?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   




[GitHub] [incubator-hudi] codecov-io commented on issue #1461: [HUDI-743]: Remove FileIOUtils.close()

2020-03-28 Thread GitBox
codecov-io commented on issue #1461: [HUDI-743]: Remove FileIOUtils.close()
URL: https://github.com/apache/incubator-hudi/pull/1461#issuecomment-605422050
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1461?src=pr=h1) 
Report
   > Merging 
[#1461](https://codecov.io/gh/apache/incubator-hudi/pull/1461?src=pr=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/8c3001363d80b29733470221c192a72f541381c5=desc)
 will **increase** coverage by `0.03%`.
   > The diff coverage is `0.00%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1461/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1461?src=pr=tree)
   
   ```diff
   @@ Coverage Diff  @@
   ## master#1461  +/-   ##
   
   + Coverage 67.49%   67.52%   +0.03% 
 Complexity  261  261  
   
 Files   343  343  
 Lines 1657316565   -8 
 Branches   1694 1693   -1 
   
 Hits  1118611186  
   + Misses 4648 4641   -7 
   + Partials739  738   -1 
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-hudi/pull/1461?src=pr=tree) | 
Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...src/main/java/org/apache/hudi/metrics/Metrics.java](https://codecov.io/gh/apache/incubator-hudi/pull/1461/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9NZXRyaWNzLmphdmE=)
 | `72.22% <0.00%> (+2.77%)` | `0.00 <0.00> (ø)` | |
   | 
[.../java/org/apache/hudi/common/util/FileIOUtils.java](https://codecov.io/gh/apache/incubator-hudi/pull/1461/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvRmlsZUlPVXRpbHMuamF2YQ==)
 | `65.51% <ø> (+11.46%)` | `0.00 <0.00> (ø)` | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1461?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1461?src=pr=footer).
 Last update 
[8c30013...8152c49](https://codecov.io/gh/apache/incubator-hudi/pull/1461?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   




[jira] [Updated] (HUDI-743) Remove FileIOUtils.close()

2020-03-28 Thread Suneel Marthi (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated HUDI-743:
---
Status: Patch Available  (was: In Progress)

> Remove FileIOUtils.close()
> --
>
> Key: HUDI-743
> URL: https://issues.apache.org/jira/browse/HUDI-743
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Code Cleanup
>Reporter: Suneel Marthi
>Assignee: Suneel Marthi
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Updated] (HUDI-743) Remove FileIOUtils.close()

2020-03-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-743:

Labels: pull-request-available  (was: )

> Remove FileIOUtils.close()
> --
>
> Key: HUDI-743
> URL: https://issues.apache.org/jira/browse/HUDI-743
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Code Cleanup
>Reporter: Suneel Marthi
>Assignee: Suneel Marthi
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>






[GitHub] [incubator-hudi] smarthi opened a new pull request #1461: [HUDI-743]: Remove FileIOUtils.close()

2020-03-28 Thread GitBox
smarthi opened a new pull request #1461: [HUDI-743]: Remove FileIOUtils.close()
URL: https://github.com/apache/incubator-hudi/pull/1461
 
 
   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   
   Remove FileIOUtils.close() method - not needed anymore
   
   ## Brief change log
   
   Removed FileIOUtils.close() method - not needed anymore
   
   ## Verify this pull request
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   ## Committer checklist
   
- [X] Has a corresponding JIRA in PR title & commit

- [X] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.




[jira] [Updated] (HUDI-743) Remove FileIOUtils.close()

2020-03-28 Thread Suneel Marthi (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated HUDI-743:
---
Status: Open  (was: New)

> Remove FileIOUtils.close()
> --
>
> Key: HUDI-743
> URL: https://issues.apache.org/jira/browse/HUDI-743
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Code Cleanup
>Reporter: Suneel Marthi
>Assignee: Suneel Marthi
>Priority: Major
> Fix For: 0.6.0
>
>






[jira] [Updated] (HUDI-743) Remove FileIOUtils.close()

2020-03-28 Thread Suneel Marthi (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated HUDI-743:
---
Status: In Progress  (was: Open)

> Remove FileIOUtils.close()
> --
>
> Key: HUDI-743
> URL: https://issues.apache.org/jira/browse/HUDI-743
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Code Cleanup
>Reporter: Suneel Marthi
>Assignee: Suneel Marthi
>Priority: Major
> Fix For: 0.6.0
>
>






[jira] [Created] (HUDI-743) Remove FileIOUtils.close()

2020-03-28 Thread Suneel Marthi (Jira)
Suneel Marthi created HUDI-743:
--

 Summary: Remove FileIOUtils.close()
 Key: HUDI-743
 URL: https://issues.apache.org/jira/browse/HUDI-743
 Project: Apache Hudi (incubating)
  Issue Type: Improvement
  Components: Code Cleanup
Reporter: Suneel Marthi
Assignee: Suneel Marthi
 Fix For: 0.6.0








[GitHub] [incubator-hudi] yanghua commented on a change in pull request #1452: [HUDI-740]Fix can not specify the sparkMaster of cleans run command

2020-03-28 Thread GitBox
yanghua commented on a change in pull request #1452: [HUDI-740]Fix can not 
specify the sparkMaster of cleans run command
URL: https://github.com/apache/incubator-hudi/pull/1452#discussion_r399634497
 
 

 ##
 File path: hudi-cli/src/main/java/org/apache/hudi/cli/commands/SparkMain.java
 ##
 @@ -62,7 +63,9 @@ public static void main(String[] args) throws Exception {
 
 SparkCommand cmd = SparkCommand.valueOf(command);
 
-JavaSparkContext jsc = SparkUtil.initJavaSparkConf("hoodie-cli-" + command);
+JavaSparkContext jsc = cmd == SparkCommand.CLEAN
 
 Review comment:
   I have the same concern as @pratyakshsharma; can we make the other commands also support specifying the Spark master?




[GitHub] [incubator-hudi] yanghua commented on issue #1449: [HUDI-698]Add unit test for CleansCommand

2020-03-28 Thread GitBox
yanghua commented on issue #1449: [HUDI-698]Add unit test for CleansCommand
URL: https://github.com/apache/incubator-hudi/pull/1449#issuecomment-605409519
 
 
   @hddong Still failed...




[jira] [Updated] (HUDI-479) Eliminate use of guava if possible

2020-03-28 Thread Suneel Marthi (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated HUDI-479:
---
Status: Closed  (was: Patch Available)

> Eliminate use of guava if possible
> --
>
> Key: HUDI-479
> URL: https://issues.apache.org/jira/browse/HUDI-479
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Code Cleanup
>Reporter: Vinoth Chandar
>Assignee: Suneel Marthi
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Updated] (HUDI-737) Simplify/Eliminate need for CollectionUtils#Maps/MapsBuilder

2020-03-28 Thread Suneel Marthi (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated HUDI-737:
---
Status: Closed  (was: Patch Available)

> Simplify/Eliminate need for CollectionUtils#Maps/MapsBuilder
> 
>
> Key: HUDI-737
> URL: https://issues.apache.org/jira/browse/HUDI-737
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Code Cleanup
>Reporter: Vinoth Chandar
>Assignee: Suneel Marthi
>Priority: Major
> Fix For: 0.6.0
>
>






[incubator-hudi] branch master updated: HUDI-479: Eliminate or Minimize use of Guava if possible (#1159)

2020-03-28 Thread smarthi
This is an automated email from the ASF dual-hosted git repository.

smarthi pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 8c30013  HUDI-479: Eliminate or Minimize use of Guava if possible 
(#1159)
8c30013 is described below

commit 8c3001363d80b29733470221c192a72f541381c5
Author: Suneel Marthi 
AuthorDate: Sat Mar 28 03:11:32 2020 -0400

HUDI-479: Eliminate or Minimize use of Guava if possible (#1159)
---
 .../apache/hudi/cli/commands/RollbacksCommand.java |   4 +-
 .../common/HoodieTestCommitMetadataGenerator.java  |  20 ++--
 .../org/apache/hudi/client/HoodieWriteClient.java  |   6 +-
 .../hudi/index/bloom/BloomIndexFileInfo.java   |   5 +-
 .../org/apache/hudi/io/HoodieAppendHandle.java |   2 +-
 .../apache/hudi/metrics/JmxMetricsReporter.java|  10 +-
 .../org/apache/hudi/metrics/JmxReporterServer.java |  18 ++--
 .../main/java/org/apache/hudi/metrics/Metrics.java |   5 +-
 .../compact/HoodieMergeOnReadTableCompactor.java   |   4 +-
 .../apache/hudi/table/rollback/RollbackHelper.java |   2 +-
 .../index/bloom/TestHoodieGlobalBloomIndex.java|  12 +--
 .../java/org/apache/hudi/table/TestCleaner.java|  51 +-
 .../strategy/TestHoodieCompactionStrategy.java |  37 +--
 .../apache/hudi/avro/MercifulJsonConverter.java|  25 +++--
 .../org/apache/hudi/common/model/HoodieRecord.java |   7 +-
 .../hudi/common/table/HoodieTableMetaClient.java   |   6 +-
 .../table/timeline/HoodieActiveTimeline.java   |  11 +-
 .../table/timeline/HoodieArchivedTimeline.java |   1 +
 .../table/timeline/HoodieDefaultTimeline.java  |  10 +-
 .../hudi/common/table/timeline/HoodieInstant.java  |   6 +-
 .../IncrementalTimelineSyncFileSystemView.java |   2 +-
 .../view/RemoteHoodieTableFileSystemView.java  |   2 +-
 .../org/apache/hudi/common/util/AvroUtils.java |  18 ++--
 .../org/apache/hudi/common/util/CleanerUtils.java  |  11 +-
 .../apache/hudi/common/util/CollectionUtils.java   | 111 +
 .../java/org/apache/hudi/common/util/FSUtils.java  |   4 +-
 .../org/apache/hudi/common/util/FileIOUtils.java   |  25 +
 .../apache/hudi/common/util/ReflectionUtils.java   |  69 +++--
 .../hudi/common/minicluster/HdfsTestService.java   |  10 +-
 .../common/minicluster/ZookeeperTestService.java   |   6 +-
 .../common/model/TestHoodieCommitMetadata.java |   1 +
 .../table/string/TestHoodieActiveTimeline.java |  12 +--
 .../table/view/TestIncrementalFSViewSync.java  |  22 ++--
 .../view/TestPriorityBasedFileSystemView.java  |   5 +-
 .../hudi/common/util/CompactionTestUtils.java  |  15 +--
 .../hudi/common/util/TestCompactionUtils.java  |  10 +-
 .../org/apache/hudi/common/util/TestFSUtils.java   |   2 +-
 .../realtime/HoodieParquetRealtimeInputFormat.java |   4 +-
 .../org/apache/hudi/hive/util/HiveTestService.java |   9 +-
 .../org/apache/hudi/integ/ITTestHoodieDemo.java|  40 
 .../org/apache/hudi/HoodieDataSourceHelpers.java   |   5 +-
 .../hudi/utilities/HoodieSnapshotExporter.java |   4 +-
 .../hudi/utilities/sources/HoodieIncrSource.java   |   2 +-
 .../apache/hudi/utilities/UtilitiesTestBase.java   |   4 +-
 pom.xml|   7 --
 style/checkstyle.xml   |   4 +-
 46 files changed, 429 insertions(+), 217 deletions(-)

diff --git 
a/hudi-cli/src/main/java/org/apache/hudi/cli/commands/RollbacksCommand.java 
b/hudi-cli/src/main/java/org/apache/hudi/cli/commands/RollbacksCommand.java
index 4a122c6..3993714 100644
--- a/hudi-cli/src/main/java/org/apache/hudi/cli/commands/RollbacksCommand.java
+++ b/hudi-cli/src/main/java/org/apache/hudi/cli/commands/RollbacksCommand.java
@@ -28,9 +28,9 @@ import 
org.apache.hudi.common.table.timeline.HoodieActiveTimeline;
 import org.apache.hudi.common.table.timeline.HoodieInstant;
 import org.apache.hudi.common.table.timeline.HoodieInstant.State;
 import org.apache.hudi.common.util.AvroUtils;
+import org.apache.hudi.common.util.CollectionUtils;
 import org.apache.hudi.common.util.collection.Pair;
 
-import com.google.common.collect.ImmutableSet;
 import org.springframework.shell.core.CommandMarker;
 import org.springframework.shell.core.annotation.CliCommand;
 import org.springframework.shell.core.annotation.CliOption;
@@ -123,7 +123,7 @@ public class RollbacksCommand implements CommandMarker {
   class RollbackTimeline extends HoodieActiveTimeline {
 
 public RollbackTimeline(HoodieTableMetaClient metaClient) {
-  super(metaClient, ImmutableSet.builder().add(HoodieTimeline.ROLLBACK_EXTENSION).build());
+  super(metaClient, CollectionUtils.createImmutableSet(HoodieTimeline.ROLLBACK_EXTENSION));
 }
   }
 }
diff --git 
a/hudi-cli/src/test/java/org/apache/hudi/cli/common/HoodieTestCommitMetadataGenerator.java
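The commit above replaces Guava's `ImmutableSet.builder()` with a project-local `CollectionUtils.createImmutableSet`. A hedged sketch of what such a helper can look like using only the JDK — this is an assumed implementation for illustration, not the actual code in hudi-common:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

public class ImmutableSets {

    // Plausible shape of CollectionUtils.createImmutableSet (assumption):
    // wrap the varargs in an unmodifiable LinkedHashSet, preserving insertion order
    // the way Guava's ImmutableSet does.
    @SafeVarargs
    public static <T> Set<T> createImmutableSet(T... elements) {
        return Collections.unmodifiableSet(new LinkedHashSet<>(Arrays.asList(elements)));
    }

    public static void main(String[] args) {
        // Same call shape as the RollbacksCommand change in this commit.
        Set<String> exts = createImmutableSet(".rollback");
        System.out.println(exts.contains(".rollback")); // prints "true"
        try {
            exts.add(".commit"); // mutation must fail on an immutable set
        } catch (UnsupportedOperationException e) {
            System.out.println("immutable"); // prints "immutable"
        }
    }
}
```

Dropping the Guava dependency this way trades the builder API for a plain varargs factory, which is all most call sites in the diff use.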
 

[GitHub] [incubator-hudi] smarthi merged pull request #1159: [HUDI-479] Eliminate or Minimize use of Guava if possible

2020-03-28 Thread GitBox
smarthi merged pull request #1159: [HUDI-479] Eliminate or Minimize use of 
Guava if possible
URL: https://github.com/apache/incubator-hudi/pull/1159
 
 
   




[GitHub] [incubator-hudi] codecov-io edited a comment on issue #1159: [HUDI-479] Eliminate or Minimize use of Guava if possible

2020-03-28 Thread GitBox
codecov-io edited a comment on issue #1159: [HUDI-479] Eliminate or Minimize 
use of Guava if possible
URL: https://github.com/apache/incubator-hudi/pull/1159#issuecomment-596089314
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1159?src=pr=h1) 
Report
   > Merging 
[#1159](https://codecov.io/gh/apache/incubator-hudi/pull/1159?src=pr=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/e101ea9bd4405a461bc78aad1af64499f797daed=desc)
 will **decrease** coverage by `67.04%`.
   > The diff coverage is `0.00%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1159/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1159?src=pr=tree)
   
   ```diff
   @@ Coverage Diff  @@
   ## master   #1159   +/-   ##
   
   - Coverage 67.68%   0.63%   -67.05% 
   + Complexity  261   2  -259 
   
 Files   341 294   -47 
 Lines 16511   14544 -1967 
 Branches   16881483  -205 
   
   - Hits  11175  92-11083 
   - Misses 4599   14449 +9850 
   + Partials737   3  -734 
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-hudi/pull/1159?src=pr=tree) | 
Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...java/org/apache/hudi/client/HoodieWriteClient.java](https://codecov.io/gh/apache/incubator-hudi/pull/1159/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpZW50L0hvb2RpZVdyaXRlQ2xpZW50LmphdmE=)
 | `0.00% <0.00%> (-69.78%)` | `0.00 <0.00> (ø)` | |
   | 
[...rg/apache/hudi/index/bloom/BloomIndexFileInfo.java](https://codecov.io/gh/apache/incubator-hudi/pull/1159/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW5kZXgvYmxvb20vQmxvb21JbmRleEZpbGVJbmZvLmphdmE=)
 | `0.00% <0.00%> (-46.88%)` | `0.00 <0.00> (ø)` | |
   | 
[...in/java/org/apache/hudi/io/HoodieAppendHandle.java](https://codecov.io/gh/apache/incubator-hudi/pull/1159/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW8vSG9vZGllQXBwZW5kSGFuZGxlLmphdmE=)
 | `0.00% <ø> (-84.18%)` | `0.00 <0.00> (ø)` | |
   | 
[...va/org/apache/hudi/metrics/JmxMetricsReporter.java](https://codecov.io/gh/apache/incubator-hudi/pull/1159/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9KbXhNZXRyaWNzUmVwb3J0ZXIuamF2YQ==)
 | `0.00% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[...ava/org/apache/hudi/metrics/JmxReporterServer.java](https://codecov.io/gh/apache/incubator-hudi/pull/1159/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9KbXhSZXBvcnRlclNlcnZlci5qYXZh)
 | `0.00% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[...src/main/java/org/apache/hudi/metrics/Metrics.java](https://codecov.io/gh/apache/incubator-hudi/pull/1159/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9NZXRyaWNzLmphdmE=)
 | `0.00% <0.00%> (-70.28%)` | `0.00 <0.00> (ø)` | |
   | 
[...table/compact/HoodieMergeOnReadTableCompactor.java](https://codecov.io/gh/apache/incubator-hudi/pull/1159/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdGFibGUvY29tcGFjdC9Ib29kaWVNZXJnZU9uUmVhZFRhYmxlQ29tcGFjdG9yLmphdmE=)
 | `0.00% <0.00%> (-90.11%)` | `0.00 <0.00> (ø)` | |
   | 
[...org/apache/hudi/table/rollback/RollbackHelper.java](https://codecov.io/gh/apache/incubator-hudi/pull/1159/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdGFibGUvcm9sbGJhY2svUm9sbGJhY2tIZWxwZXIuamF2YQ==)
 | `0.00% <0.00%> (-80.44%)` | `0.00 <0.00> (ø)` | |
   | 
[...va/org/apache/hudi/avro/MercifulJsonConverter.java](https://codecov.io/gh/apache/incubator-hudi/pull/1159/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvYXZyby9NZXJjaWZ1bEpzb25Db252ZXJ0ZXIuamF2YQ==)
 | `0.00% <0.00%> (-58.72%)` | `0.00 <0.00> (ø)` | |
   | 
[...ava/org/apache/hudi/common/model/HoodieRecord.java](https://codecov.io/gh/apache/incubator-hudi/pull/1159/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0hvb2RpZVJlY29yZC5qYXZh)
 | `0.00% <0.00%> (-82.82%)` | `0.00 <0.00> (ø)` | |
   | ... and [320 
more](https://codecov.io/gh/apache/incubator-hudi/pull/1159/diff?src=pr=tree-more)
 | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1159?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1159?src=pr=footer).
 Last update 

[GitHub] [incubator-hudi] codecov-io edited a comment on issue #1159: [HUDI-479] Eliminate or Minimize use of Guava if possible

2020-03-28 Thread GitBox
codecov-io edited a comment on issue #1159: [HUDI-479] Eliminate or Minimize 
use of Guava if possible
URL: https://github.com/apache/incubator-hudi/pull/1159#issuecomment-596089314
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1159?src=pr=h1) 
Report
   > Merging 
[#1159](https://codecov.io/gh/apache/incubator-hudi/pull/1159?src=pr=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/e101ea9bd4405a461bc78aad1af64499f797daed=desc)
 will **decrease** coverage by `0.18%`.
   > The diff coverage is `46.01%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1159/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1159?src=pr=tree)
   
   ```diff
   @@ Coverage Diff  @@
   ## master#1159  +/-   ##
   
   - Coverage 67.68%   67.49%   -0.19% 
 Complexity  261  261  
   
 Files   341  343   +2 
 Lines 1651116573  +62 
 Branches   1688 1694   +6 
   
   + Hits  1117511186  +11 
   - Misses 4599 4648  +49 
   - Partials737  739   +2 
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-hudi/pull/1159?src=pr=tree) | 
Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...in/java/org/apache/hudi/io/HoodieAppendHandle.java](https://codecov.io/gh/apache/incubator-hudi/pull/1159/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW8vSG9vZGllQXBwZW5kSGFuZGxlLmphdmE=)
 | `84.17% <ø> (ø)` | `0.00 <0.00> (ø)` | |
   | 
| [...va/org/apache/hudi/metrics/JmxMetricsReporter.java](https://codecov.io/gh/apache/incubator-hudi/pull/1159/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9KbXhNZXRyaWNzUmVwb3J0ZXIuamF2YQ==) | `0.00% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
| [...ava/org/apache/hudi/metrics/JmxReporterServer.java](https://codecov.io/gh/apache/incubator-hudi/pull/1159/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9KbXhSZXBvcnRlclNlcnZlci5qYXZh) | `0.00% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
| [...pache/hudi/common/table/HoodieTableMetaClient.java](https://codecov.io/gh/apache/incubator-hudi/pull/1159/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL0hvb2RpZVRhYmxlTWV0YUNsaWVudC5qYXZh) | `76.77% <ø> (ø)` | `0.00 <0.00> (ø)` | |
| [.../common/table/timeline/HoodieArchivedTimeline.java](https://codecov.io/gh/apache/incubator-hudi/pull/1159/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL0hvb2RpZUFyY2hpdmVkVGltZWxpbmUuamF2YQ==) | `43.24% <ø> (ø)` | `0.00 <0.00> (ø)` | |
| [...le/view/IncrementalTimelineSyncFileSystemView.java](https://codecov.io/gh/apache/incubator-hudi/pull/1159/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3ZpZXcvSW5jcmVtZW50YWxUaW1lbGluZVN5bmNGaWxlU3lzdGVtVmlldy5qYXZh) | `86.44% <ø> (ø)` | `0.00 <0.00> (ø)` | |
| [...on/table/view/RemoteHoodieTableFileSystemView.java](https://codecov.io/gh/apache/incubator-hudi/pull/1159/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3ZpZXcvUmVtb3RlSG9vZGllVGFibGVGaWxlU3lzdGVtVmlldy5qYXZh) | `77.59% <ø> (ø)` | `0.00 <0.00> (ø)` | |
| [...main/java/org/apache/hudi/common/util/FSUtils.java](https://codecov.io/gh/apache/incubator-hudi/pull/1159/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvRlNVdGlscy5qYXZh) | `69.27% <ø> (ø)` | `0.00 <0.00> (ø)` | |
| [...pache/hudi/utilities/sources/HoodieIncrSource.java](https://codecov.io/gh/apache/incubator-hudi/pull/1159/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSG9vZGllSW5jclNvdXJjZS5qYXZh) | `92.59% <ø> (ø)` | `7.00 <0.00> (ø)` | |
| [...a/org/apache/hudi/common/util/ReflectionUtils.java](https://codecov.io/gh/apache/incubator-hudi/pull/1159/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvUmVmbGVjdGlvblV0aWxzLmphdmE=) | `30.76% <3.22%> (-31.74%)` | `0.00 <0.00> (ø)` | |
| ... and [30 more](https://codecov.io/gh/apache/incubator-hudi/pull/1159/diff?src=pr=tree-more) | |
   
------

[Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1159?src=pr=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1159?src=pr=footer).

[GitHub] [incubator-hudi] leesf commented on issue #1460: [HUDI-679] Make io package Spark free

2020-03-28 Thread GitBox
leesf commented on issue #1460: [HUDI-679] Make io package Spark free
URL: https://github.com/apache/incubator-hudi/pull/1460#issuecomment-605404425
 
 
   @vinothchandar @yanghua Please take a look when you are free. Thanks.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Updated] (HUDI-679) Make io package spark free

2020-03-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-679:

Labels: pull-request-available  (was: )

> Make io package spark free
> --
>
> Key: HUDI-679
> URL: https://issues.apache.org/jira/browse/HUDI-679
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>Reporter: leesf
>Assignee: leesf
>Priority: Major
>  Labels: pull-request-available
>






[GitHub] [incubator-hudi] leesf opened a new pull request #1460: [HUDI-679] Make io package Spark free

2020-03-28 Thread GitBox
leesf opened a new pull request #1460: [HUDI-679] Make io package Spark free
URL: https://github.com/apache/incubator-hudi/pull/1460
 
 
   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a pull request.*
   
   ## What is the purpose of the pull request
   
   Make io package spark free.
   
   ## Brief change log
   
   * Introduce SparkSupplier.java
   * Remove the usage of TaskContext in io package.
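
   The two bullets above describe hiding Spark's `TaskContext` behind an engine-neutral supplier so the io package no longer imports Spark directly. The sketch below illustrates the general shape of such an abstraction; the class name `SparkSupplier` comes from the change log, but its constructor, method names, and `Supplier`-based design here are assumptions for illustration, not the PR's actual API.

   ```java
   import java.io.Serializable;
   import java.util.function.Supplier;

   // Hypothetical sketch: the io package depends only on this class,
   // while Spark-specific code (e.g. TaskContext.getPartitionId())
   // is injected from the outside as a plain java.util.function.Supplier.
   public class SparkSupplier implements Serializable {

     private final Supplier<Integer> partitionIdSupplier;

     public SparkSupplier(Supplier<Integer> partitionIdSupplier) {
       this.partitionIdSupplier = partitionIdSupplier;
     }

     // io-package code calls this instead of touching TaskContext.
     public int getPartitionId() {
       return partitionIdSupplier.get();
     }

     public static void main(String[] args) {
       // In tests or non-Spark engines, plug in a trivial supplier;
       // in Spark, the caller would pass TaskContext::getPartitionId.
       SparkSupplier supplier = new SparkSupplier(() -> 0);
       System.out.println(supplier.getPartitionId()); // prints 0
     }
   }
   ```

   With this inversion, only the caller that constructs the supplier needs a Spark dependency, which is what makes the io package "Spark free".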
   
   ## Verify this pull request
   
   This pull request is a trivial rework / code cleanup without any test coverage.
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit
- [ ] Commit message is descriptive of the change
- [ ] CI is green
- [ ] Necessary doc changes done or have another open PR
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

