[jira] [Updated] (HUDI-757) Add a command to hudi-cli to export commit metadata

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-757:

Labels: pull-request-available  (was: )

> Add a command to hudi-cli to export commit metadata
> ---
>
> Key: HUDI-757
> URL: https://issues.apache.org/jira/browse/HUDI-757
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>Reporter: Prashant Wason
>Priority: Minor
>  Labels: pull-request-available
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> HUDI stores commit-related information in files within the .hoodie directory. 
> Each commit, deltacommit, rollback, etc. creates one or more files. To 
> prevent a large number of files, older files are consolidated and 
> moved into a commit archive, which holds multiple such files written together 
> in the format of HUDI log files.
> During debugging or development of new features, it may be necessary 
> to refer to the metadata of older commits, cleanups, or rollbacks. 
> There is no simple way to get these from a production setup, especially from 
> the archive files.
> This enhancement provides a hudi-cli command that allows exporting metadata 
> from HUDI commit archives.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] prashantwason opened a new pull request #1476: [HUDI-757] Added hudi-cli command to export metadata of Instants.

2020-03-31 Thread GitBox
prashantwason opened a new pull request #1476: [HUDI-757] Added hudi-cli 
command to export metadata of Instants.
URL: https://github.com/apache/incubator-hudi/pull/1476
 
 
   ## What is the purpose of the pull request
   
   Added hudi-cli command to export metadata of Instants.
   
   ## Brief change log
   
   Added a new command to hudi-cli 
   
   ## Verify this pull request
   
   This change can be verified by running hudi-cli, connecting to a HUDI table 
and running the following command:
   
   ```
   hudi:db.table-> export instants --localFolder /tmp/dump --limit 5 --actions clean,rollback,commit --desc false
   ```
   
   After this, 5 instant files should have been created in the /tmp/dump folder.
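   The `--actions`, `--limit`, and `--desc` flags above amount to a selection over instant filenames of the form `<timestamp>.<action>`. A minimal plain-Java sketch of that selection logic (a hypothetical helper for illustration, not the actual hudi-cli implementation):

   ```java
   import java.util.Comparator;
   import java.util.List;
   import java.util.Set;
   import java.util.stream.Collectors;

   // Hypothetical sketch, not the actual hudi-cli code: apply the
   // --actions / --limit / --desc selection to instant filenames of the
   // form "<timestamp>.<action>".
   public class InstantFilter {

     public static List<String> select(List<String> instants, Set<String> actions,
                                       int limit, boolean descending) {
       Comparator<String> byTimestamp = Comparator.naturalOrder();
       return instants.stream()
           // Keep only the requested action types (commit, clean, rollback, ...).
           .filter(f -> actions.contains(f.substring(f.indexOf('.') + 1)))
           // Fixed-width yyyyMMddHHmmss timestamps sort chronologically as strings.
           .sorted(descending ? byTimestamp.reversed() : byTimestamp)
           .limit(limit)
           .collect(Collectors.toList());
     }
   }
   ```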
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Updated] (HUDI-757) Add a command to hudi-cli to export commit metadata

2020-03-31 Thread Prashant Wason (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Wason updated HUDI-757:

Status: In Progress  (was: Open)

> Add a command to hudi-cli to export commit metadata
> ---
>
> Key: HUDI-757
> URL: https://issues.apache.org/jira/browse/HUDI-757
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>Reporter: Prashant Wason
>Priority: Minor
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> HUDI stores commit-related information in files within the .hoodie directory. 
> Each commit, deltacommit, rollback, etc. creates one or more files. To 
> prevent a large number of files, older files are consolidated and 
> moved into a commit archive, which holds multiple such files written together 
> in the format of HUDI log files.
> During debugging or development of new features, it may be necessary 
> to refer to the metadata of older commits, cleanups, or rollbacks. 
> There is no simple way to get these from a production setup, especially from 
> the archive files.
> This enhancement provides a hudi-cli command that allows exporting metadata 
> from HUDI commit archives.





[jira] [Commented] (HUDI-757) Add a command to hudi-cli to export commit metadata

2020-03-31 Thread Prashant Wason (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17072394#comment-17072394
 ] 

Prashant Wason commented on HUDI-757:
-

A test run on my setup against a hudi table exported 2761 instants into a local 
directory.

 

hudi:pwason_db.pwason_test_table->export instants --localFolder /home/pwason/instant_dump

...

Exported 2761 Instants

 

ls -al  /home/pwason/instant_dump/

drwxrwxr-x 2 pwason users 131072 Mar 31 20:55 .
drwxr-xr-x 17 pwason users 4096 Mar 31 19:56 ..
-rw-rw-r-- 1 pwason users 15000711 Mar 31 20:55 20200228063343.commit
-rw-rw-r-- 1 pwason users 12042716 Mar 31 20:55 20200228094924.commit
-rw-rw-r-- 1 pwason users 73320 Mar 31 20:55 20200228094925.clean
-rw-rw-r-- 1 pwason users 11420128 Mar 31 20:55 20200228211516.commit
-rw-rw-r-- 1 pwason users 73320 Mar 31 20:55 20200228211540.clean
-rw-rw-r-- 1 pwason users 7567466 Mar 31 20:55 20200228221520.commit

> Add a command to hudi-cli to export commit metadata
> ---
>
> Key: HUDI-757
> URL: https://issues.apache.org/jira/browse/HUDI-757
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>Reporter: Prashant Wason
>Priority: Minor
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> HUDI stores commit-related information in files within the .hoodie directory. 
> Each commit, deltacommit, rollback, etc. creates one or more files. To 
> prevent a large number of files, older files are consolidated and 
> moved into a commit archive, which holds multiple such files written together 
> in the format of HUDI log files.
> During debugging or development of new features, it may be necessary 
> to refer to the metadata of older commits, cleanups, or rollbacks. 
> There is no simple way to get these from a production setup, especially from 
> the archive files.
> This enhancement provides a hudi-cli command that allows exporting metadata 
> from HUDI commit archives.





[jira] [Updated] (HUDI-757) Add a command to hudi-cli to export commit metadata

2020-03-31 Thread Prashant Wason (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Wason updated HUDI-757:

Status: Open  (was: New)

> Add a command to hudi-cli to export commit metadata
> ---
>
> Key: HUDI-757
> URL: https://issues.apache.org/jira/browse/HUDI-757
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>Reporter: Prashant Wason
>Priority: Minor
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> HUDI stores commit-related information in files within the .hoodie directory. 
> Each commit, deltacommit, rollback, etc. creates one or more files. To 
> prevent a large number of files, older files are consolidated and 
> moved into a commit archive, which holds multiple such files written together 
> in the format of HUDI log files.
> During debugging or development of new features, it may be necessary 
> to refer to the metadata of older commits, cleanups, or rollbacks. 
> There is no simple way to get these from a production setup, especially from 
> the archive files.
> This enhancement provides a hudi-cli command that allows exporting metadata 
> from HUDI commit archives.





[jira] [Created] (HUDI-757) Add a command to hudi-cli to export commit metadata

2020-03-31 Thread Prashant Wason (Jira)
Prashant Wason created HUDI-757:
---

 Summary: Add a command to hudi-cli to export commit metadata
 Key: HUDI-757
 URL: https://issues.apache.org/jira/browse/HUDI-757
 Project: Apache Hudi (incubating)
  Issue Type: Improvement
Reporter: Prashant Wason


HUDI stores commit-related information in files within the .hoodie directory. 
Each commit, deltacommit, rollback, etc. creates one or more files. To 
prevent a large number of files, older files are consolidated and 
moved into a commit archive, which holds multiple such files written together 
in the format of HUDI log files.

During debugging or development of new features, it may be necessary 
to refer to the metadata of older commits, cleanups, or rollbacks. 
There is no simple way to get these from a production setup, especially from the 
archive files.

This enhancement provides a hudi-cli command that allows exporting metadata 
from HUDI commit archives.
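The consolidation described above (folding instant files older than some retention point into an archive) can be sketched in plain Java. `ArchiveSelector` and its retention rule are hypothetical illustrations, not Hudi's actual archiving code; only the `<timestamp>.<action>` instant filename pattern is taken from the real layout.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Hudi's actual archiving code: pick the instant
// files that are old enough to be consolidated into a commit archive.
public class ArchiveSelector {

  // Instant files are named "<timestamp>.<action>", e.g. "20200228063343.commit".
  // Timestamps are fixed-width yyyyMMddHHmmss strings, so plain string
  // comparison matches chronological order.
  public static List<String> selectForArchiving(List<String> instantFiles,
                                                String earliestToKeep) {
    List<String> toArchive = new ArrayList<>();
    for (String file : instantFiles) {
      String timestamp = file.substring(0, file.indexOf('.'));
      if (timestamp.compareTo(earliestToKeep) < 0) {
        toArchive.add(file);
      }
    }
    return toArchive;
  }
}
```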





[jira] [Commented] (HUDI-69) Support realtime view in Spark datasource #136

2020-03-31 Thread Yanjia Gary Li (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-69?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17072382#comment-17072382
 ] 

Yanjia Gary Li commented on HUDI-69:


[~vinoth] I am happy to work on this ticket. Please assign it to me.

> Support realtime view in Spark datasource #136
> --
>
> Key: HUDI-69
> URL: https://issues.apache.org/jira/browse/HUDI-69
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: Spark Integration
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
> Fix For: 0.6.0
>
>
> https://github.com/uber/hudi/issues/136





[jira] [Updated] (HUDI-756) Organize Action execution logic into a nicer class hierarchy in hudi-client

2020-03-31 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-756:

Status: Open  (was: New)

> Organize Action execution logic into a nicer class hierarchy in hudi-client
> ---
>
> Key: HUDI-756
> URL: https://issues.apache.org/jira/browse/HUDI-756
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Writer Core
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
> Fix For: 0.6.0
>
>






[jira] [Updated] (HUDI-756) Organize Action execution logic into a nicer class hierarchy in hudi-client

2020-03-31 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-756:

Status: In Progress  (was: Open)

> Organize Action execution logic into a nicer class hierarchy in hudi-client
> ---
>
> Key: HUDI-756
> URL: https://issues.apache.org/jira/browse/HUDI-756
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Writer Core
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
> Fix For: 0.6.0
>
>






[GitHub] [incubator-hudi] codecov-io edited a comment on issue #1459: [HUDI-418] [HUDI-421] Bootstrap Index using HFile and File System View Changes with unit-test

2020-03-31 Thread GitBox
codecov-io edited a comment on issue #1459: [HUDI-418] [HUDI-421] Bootstrap 
Index using HFile and File System View Changes with unit-test
URL: https://github.com/apache/incubator-hudi/pull/1459#issuecomment-607013294
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1459?src=pr=h1) 
Report
   > Merging 
[#1459](https://codecov.io/gh/apache/incubator-hudi/pull/1459?src=pr=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/78b3194e8241c519a85310997f31b2b55df487e1=desc)
 will **increase** coverage by `0.28%`.
   > The diff coverage is `78.27%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1459/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1459?src=pr=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #1459      +/-   ##
   ============================================
   + Coverage     67.64%   67.92%   +0.28%
     Complexity      261      261
   ============================================
     Files           348      351       +3
     Lines         16672    17115     +443
     Branches       1694     1726      +32
   ============================================
   + Hits          11278    11626     +348
   - Misses         4653     4725      +72
   - Partials        741      764      +23
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-hudi/pull/1459?src=pr=tree) | 
Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...che/hudi/common/table/timeline/HoodieTimeline.java](https://codecov.io/gh/apache/incubator-hudi/pull/1459/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL0hvb2RpZVRpbWVsaW5lLmphdmE=)
 | `100.00% <ø> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[...ain/java/org/apache/hudi/avro/HoodieAvroUtils.java](https://codecov.io/gh/apache/incubator-hudi/pull/1459/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvYXZyby9Ib29kaWVBdnJvVXRpbHMuamF2YQ==)
 | `80.35% <12.50%> (-11.40%)` | `0.00 <0.00> (ø)` | |
   | 
[...c/main/java/org/apache/hudi/common/fs/FSUtils.java](https://codecov.io/gh/apache/incubator-hudi/pull/1459/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL0ZTVXRpbHMuamF2YQ==)
 | `66.33% <25.00%> (-0.84%)` | `0.00 <0.00> (ø)` | |
   | 
[.../common/table/view/RocksDbBasedFileSystemView.java](https://codecov.io/gh/apache/incubator-hudi/pull/1459/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3ZpZXcvUm9ja3NEYkJhc2VkRmlsZVN5c3RlbVZpZXcuamF2YQ==)
 | `82.16% <52.94%> (-6.58%)` | `0.00 <0.00> (ø)` | |
   | 
[...i/common/table/view/HoodieTableFileSystemView.java](https://codecov.io/gh/apache/incubator-hudi/pull/1459/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3ZpZXcvSG9vZGllVGFibGVGaWxlU3lzdGVtVmlldy5qYXZh)
 | `87.34% <57.14%> (-10.94%)` | `0.00 <0.00> (ø)` | |
   | 
[.../hudi/common/model/BootstrapSourceFileMapping.java](https://codecov.io/gh/apache/incubator-hudi/pull/1459/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0Jvb3RzdHJhcFNvdXJjZUZpbGVNYXBwaW5nLmphdmE=)
 | `65.51% <65.51%> (ø)` | `0.00 <0.00> (?)` | |
   | 
[...on/table/view/SpillableMapBasedFileSystemView.java](https://codecov.io/gh/apache/incubator-hudi/pull/1459/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3ZpZXcvU3BpbGxhYmxlTWFwQmFzZWRGaWxlU3lzdGVtVmlldy5qYXZh)
 | `67.56% <70.00%> (+0.90%)` | `0.00 <0.00> (ø)` | |
   | 
[...rg/apache/hudi/common/model/HoodieFileGroupId.java](https://codecov.io/gh/apache/incubator-hudi/pull/1459/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0hvb2RpZUZpbGVHcm91cElkLmphdmE=)
 | `77.77% <75.00%> (-0.80%)` | `0.00 <0.00> (ø)` | |
   | 
[...g/apache/hudi/common/bootstrap/BootstrapIndex.java](https://codecov.io/gh/apache/incubator-hudi/pull/1459/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2Jvb3RzdHJhcC9Cb290c3RyYXBJbmRleC5qYXZh)
 | `80.28% <80.28%> (ø)` | `0.00 <0.00> (?)` | |
   | 
[...common/table/view/FileSystemViewStorageConfig.java](https://codecov.io/gh/apache/incubator-hudi/pull/1459/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3ZpZXcvRmlsZVN5c3RlbVZpZXdTdG9yYWdlQ29uZmlnLmphdmE=)
 | `83.56% <81.81%> (-0.57%)` | `0.00 <0.00> (ø)` | |
   | ... and [17 
more](https://codecov.io/gh/apache/incubator-hudi/pull/1459/diff?src=pr=tree-more)
 | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1459?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = 

[GitHub] [incubator-hudi] bvaradar commented on issue #1459: [HUDI-418] [HUDI-421] Bootstrap Index using HFile and File System View Changes with unit-test

2020-03-31 Thread GitBox
bvaradar commented on issue #1459: [HUDI-418] [HUDI-421] Bootstrap Index using 
HFile and File System View Changes with unit-test
URL: https://github.com/apache/incubator-hudi/pull/1459#issuecomment-607014929
 
 
   @vinothchandar  : Ready for review (cc @umehrot2 )




[GitHub] [incubator-hudi] codecov-io commented on issue #1459: [WIP] [HUDI-418] [HUDI-421] Bootstrap Index using HFile and File System View Changes with unit-test

2020-03-31 Thread GitBox
codecov-io commented on issue #1459: [WIP] [HUDI-418] [HUDI-421] Bootstrap 
Index using HFile and File System View Changes with unit-test
URL: https://github.com/apache/incubator-hudi/pull/1459#issuecomment-607013294
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1459?src=pr=h1) 
Report
   > Merging 
[#1459](https://codecov.io/gh/apache/incubator-hudi/pull/1459?src=pr=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/78b3194e8241c519a85310997f31b2b55df487e1=desc)
 will **increase** coverage by `0.28%`.
   > The diff coverage is `78.27%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1459/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1459?src=pr=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #1459      +/-   ##
   ============================================
   + Coverage     67.64%   67.92%   +0.28%
     Complexity      261      261
   ============================================
     Files           348      351       +3
     Lines         16672    17115     +443
     Branches       1694     1726      +32
   ============================================
   + Hits          11278    11626     +348
   - Misses         4653     4725      +72
   - Partials        741      764      +23
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-hudi/pull/1459?src=pr=tree) | 
Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...che/hudi/common/table/timeline/HoodieTimeline.java](https://codecov.io/gh/apache/incubator-hudi/pull/1459/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL0hvb2RpZVRpbWVsaW5lLmphdmE=)
 | `100.00% <ø> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[...ain/java/org/apache/hudi/avro/HoodieAvroUtils.java](https://codecov.io/gh/apache/incubator-hudi/pull/1459/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvYXZyby9Ib29kaWVBdnJvVXRpbHMuamF2YQ==)
 | `80.35% <12.50%> (-11.40%)` | `0.00 <0.00> (ø)` | |
   | 
[...c/main/java/org/apache/hudi/common/fs/FSUtils.java](https://codecov.io/gh/apache/incubator-hudi/pull/1459/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL0ZTVXRpbHMuamF2YQ==)
 | `66.33% <25.00%> (-0.84%)` | `0.00 <0.00> (ø)` | |
   | 
[.../common/table/view/RocksDbBasedFileSystemView.java](https://codecov.io/gh/apache/incubator-hudi/pull/1459/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3ZpZXcvUm9ja3NEYkJhc2VkRmlsZVN5c3RlbVZpZXcuamF2YQ==)
 | `82.16% <52.94%> (-6.58%)` | `0.00 <0.00> (ø)` | |
   | 
[...i/common/table/view/HoodieTableFileSystemView.java](https://codecov.io/gh/apache/incubator-hudi/pull/1459/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3ZpZXcvSG9vZGllVGFibGVGaWxlU3lzdGVtVmlldy5qYXZh)
 | `87.34% <57.14%> (-10.94%)` | `0.00 <0.00> (ø)` | |
   | 
[.../hudi/common/model/BootstrapSourceFileMapping.java](https://codecov.io/gh/apache/incubator-hudi/pull/1459/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0Jvb3RzdHJhcFNvdXJjZUZpbGVNYXBwaW5nLmphdmE=)
 | `65.51% <65.51%> (ø)` | `0.00 <0.00> (?)` | |
   | 
[...on/table/view/SpillableMapBasedFileSystemView.java](https://codecov.io/gh/apache/incubator-hudi/pull/1459/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3ZpZXcvU3BpbGxhYmxlTWFwQmFzZWRGaWxlU3lzdGVtVmlldy5qYXZh)
 | `67.56% <70.00%> (+0.90%)` | `0.00 <0.00> (ø)` | |
   | 
[...rg/apache/hudi/common/model/HoodieFileGroupId.java](https://codecov.io/gh/apache/incubator-hudi/pull/1459/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0hvb2RpZUZpbGVHcm91cElkLmphdmE=)
 | `77.77% <75.00%> (-0.80%)` | `0.00 <0.00> (ø)` | |
   | 
[...g/apache/hudi/common/bootstrap/BootstrapIndex.java](https://codecov.io/gh/apache/incubator-hudi/pull/1459/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2Jvb3RzdHJhcC9Cb290c3RyYXBJbmRleC5qYXZh)
 | `80.28% <80.28%> (ø)` | `0.00 <0.00> (?)` | |
   | 
[...common/table/view/FileSystemViewStorageConfig.java](https://codecov.io/gh/apache/incubator-hudi/pull/1459/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3ZpZXcvRmlsZVN5c3RlbVZpZXdTdG9yYWdlQ29uZmlnLmphdmE=)
 | `83.56% <81.81%> (-0.57%)` | `0.00 <0.00> (ø)` | |
   | ... and [17 
more](https://codecov.io/gh/apache/incubator-hudi/pull/1459/diff?src=pr=tree-more)
 | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1459?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = 

[GitHub] [incubator-hudi] garyli1019 commented on issue #1453: HUDI-644 kafka connect checkpoint provider

2020-03-31 Thread GitBox
garyli1019 commented on issue #1453: HUDI-644 kafka connect checkpoint provider
URL: https://github.com/apache/incubator-hudi/pull/1453#issuecomment-607011011
 
 
   Comments addressed. Please review.




Build failed in Jenkins: hudi-snapshot-deployment-0.5 #234

2020-03-31 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 2.37 KB...]
/home/jenkins/tools/maven/apache-maven-3.5.4/conf:
logging
settings.xml
toolchains.xml

/home/jenkins/tools/maven/apache-maven-3.5.4/conf/logging:
simplelogger.properties

/home/jenkins/tools/maven/apache-maven-3.5.4/lib:
aopalliance-1.0.jar
cdi-api-1.0.jar
cdi-api.license
commons-cli-1.4.jar
commons-cli.license
commons-io-2.5.jar
commons-io.license
commons-lang3-3.5.jar
commons-lang3.license
ext
guava-20.0.jar
guice-4.2.0-no_aop.jar
jansi-1.17.1.jar
jansi-native
javax.inject-1.jar
jcl-over-slf4j-1.7.25.jar
jcl-over-slf4j.license
jsr250-api-1.0.jar
jsr250-api.license
maven-artifact-3.5.4.jar
maven-artifact.license
maven-builder-support-3.5.4.jar
maven-builder-support.license
maven-compat-3.5.4.jar
maven-compat.license
maven-core-3.5.4.jar
maven-core.license
maven-embedder-3.5.4.jar
maven-embedder.license
maven-model-3.5.4.jar
maven-model-builder-3.5.4.jar
maven-model-builder.license
maven-model.license
maven-plugin-api-3.5.4.jar
maven-plugin-api.license
maven-repository-metadata-3.5.4.jar
maven-repository-metadata.license
maven-resolver-api-1.1.1.jar
maven-resolver-api.license
maven-resolver-connector-basic-1.1.1.jar
maven-resolver-connector-basic.license
maven-resolver-impl-1.1.1.jar
maven-resolver-impl.license
maven-resolver-provider-3.5.4.jar
maven-resolver-provider.license
maven-resolver-spi-1.1.1.jar
maven-resolver-spi.license
maven-resolver-transport-wagon-1.1.1.jar
maven-resolver-transport-wagon.license
maven-resolver-util-1.1.1.jar
maven-resolver-util.license
maven-settings-3.5.4.jar
maven-settings-builder-3.5.4.jar
maven-settings-builder.license
maven-settings.license
maven-shared-utils-3.2.1.jar
maven-shared-utils.license
maven-slf4j-provider-3.5.4.jar
maven-slf4j-provider.license
org.eclipse.sisu.inject-0.3.3.jar
org.eclipse.sisu.inject.license
org.eclipse.sisu.plexus-0.3.3.jar
org.eclipse.sisu.plexus.license
plexus-cipher-1.7.jar
plexus-cipher.license
plexus-component-annotations-1.7.1.jar
plexus-component-annotations.license
plexus-interpolation-1.24.jar
plexus-interpolation.license
plexus-sec-dispatcher-1.4.jar
plexus-sec-dispatcher.license
plexus-utils-3.1.0.jar
plexus-utils.license
slf4j-api-1.7.25.jar
slf4j-api.license
wagon-file-3.1.0.jar
wagon-file.license
wagon-http-3.1.0-shaded.jar
wagon-http.license
wagon-provider-api-3.1.0.jar
wagon-provider-api.license

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/ext:
README.txt

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native:
freebsd32
freebsd64
linux32
linux64
osx
README.txt
windows32
windows64

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/osx:
libjansi.jnilib

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows32:
jansi.dll

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows64:
jansi.dll
Finished /home/jenkins/tools/maven/apache-maven-3.5.4 Directory Listing :
Detected current version as: 
'HUDI_home=
0.6.0-SNAPSHOT'
[INFO] Scanning for projects...
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-spark_2.11:jar:0.6.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 
org.apache.hudi:hudi-spark_${scala.binary.version}:[unknown-version], 

 line 26, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-timeline-service:jar:0.6.0-SNAPSHOT
[WARNING] 'build.plugins.plugin.(groupId:artifactId)' must be unique but found 
duplicate declaration of plugin org.jacoco:jacoco-maven-plugin @ 
org.apache.hudi:hudi-timeline-service:[unknown-version], 

 line 58, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-utilities_2.11:jar:0.6.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 
org.apache.hudi:hudi-utilities_${scala.binary.version}:[unknown-version], 

 line 26, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-spark-bundle_2.11:jar:0.6.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 

[GitHub] [incubator-hudi] garyli1019 commented on a change in pull request #1453: HUDI-644 kafka connect checkpoint provider

2020-03-31 Thread GitBox
garyli1019 commented on a change in pull request #1453: HUDI-644 kafka connect 
checkpoint provider
URL: https://github.com/apache/incubator-hudi/pull/1453#discussion_r401331016
 
 

 ##
 File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/checkpoint/KafkaConnectHdfsProvider.java
 ##
 @@ -0,0 +1,140 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities.sources.checkpoint;
+
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.PathFilter;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+/**
+ * Generate checkpoint from Kafka-Connect-HDFS.
+ */
+public class KafkaConnectHdfsProvider implements CheckPointProvider {
 
 Review comment:
   renamed to `InitialCheckPointProvider`




[GitHub] [incubator-hudi] garyli1019 commented on a change in pull request #1453: HUDI-644 kafka connect checkpoint provider

2020-03-31 Thread GitBox
garyli1019 commented on a change in pull request #1453: HUDI-644 kafka connect 
checkpoint provider
URL: https://github.com/apache/incubator-hudi/pull/1453#discussion_r401330800
 
 

 ##
 File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/checkpoint/CheckPointProvider.java
 ##
 @@ -0,0 +1,31 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities.sources.checkpoint;
+
+import java.io.IOException;
+
+/**
+ * Provide checkpoint for delta streamer.
+ */
+public interface CheckPointProvider {
+  /**
+   * Get checkpoint string recognizable for delta streamer.
+   */
+  String getCheckpoint() throws IOException;
 
 Review comment:
   Used `HoodieException`
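   For context, the idea behind a Kafka-Connect-HDFS checkpoint provider can be sketched in plain Java: recover the last committed offsets from output filenames and turn them into a checkpoint string for the delta streamer. Both the filename layout (`topic+partition+startOffset+endOffset.parquet`) and the checkpoint layout (`topic,partition:offset`) are assumptions for illustration here, not the verified formats used by this PR.

   ```java
   import java.util.regex.Matcher;
   import java.util.regex.Pattern;

   // Toy sketch of the KafkaConnectHdfsProvider idea; the filename and
   // checkpoint formats below are assumed for illustration only.
   public class CheckpointFromFilename {

     private static final Pattern FILENAME =
         Pattern.compile("([\\w-]+)\\+(\\d+)\\+(\\d+)\\+(\\d+)\\.parquet");

     public static String checkpoint(String filename) {
       Matcher m = FILENAME.matcher(filename);
       if (!m.matches()) {
         throw new IllegalArgumentException("unexpected filename: " + filename);
       }
       // Resume one past the last committed end offset.
       long next = Long.parseLong(m.group(4)) + 1;
       return m.group(1) + "," + m.group(2) + ":" + next;
     }
   }
   ```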




[GitHub] [incubator-hudi] umehrot2 commented on issue #1475: [HUDI-426][WIP] Initial implementation for Bootstrapping data source

2020-03-31 Thread GitBox
umehrot2 commented on issue #1475: [HUDI-426][WIP] Initial implementation for 
Bootstrapping data source
URL: https://github.com/apache/incubator-hudi/pull/1475#issuecomment-606998860
 
 
   @bvaradar if you glance through my implementation, one thing I might need 
from you is for `HoodieBaseFile` to store the `FileStatus` of the external 
data file instead of just the string path. I need the list of `FileStatus` 
objects for the external data files to work with here.




[GitHub] [incubator-hudi] hddong commented on issue #1452: [HUDI-740]Fix can not specify the sparkMaster of cleans run command

2020-03-31 Thread GitBox
hddong commented on issue #1452: [HUDI-740]Fix can not specify the sparkMaster 
of cleans run command
URL: https://github.com/apache/incubator-hudi/pull/1452#issuecomment-606996719
 
 
   @yanghua please have a review when you are free. I will add `sparkMaster` for 
the other commands when I add the test class; it's safer to do so.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Updated] (HUDI-426) Implement Spark DataSource Support for querying bootstrapped tables

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-426:

Labels: pull-request-available  (was: )

> Implement Spark DataSource Support for querying bootstrapped tables
> ---
>
> Key: HUDI-426
> URL: https://issues.apache.org/jira/browse/HUDI-426
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Spark Integration
>Reporter: Balaji Varadarajan
>Assignee: Nicholas Jiang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>
> We need the ability in SparkDataSource to query a COW table which is 
> bootstrapped as per 
> [https://cwiki.apache.org/confluence/display/HUDI/RFC+-+12+:+Efficient+Migration+of+Large+Parquet+Tables+to+Apache+Hudi#RFC-12:EfficientMigrationofLargeParquetTablestoApacheHudi-BootstrapIndex:]
>  
> The current implementation delegates to the Parquet DataSource, but this 
> won't work as we need the ability to stitch the columns externally.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] umehrot2 opened a new pull request #1475: [HUDI-426][WIP] Initial implementation for Bootstrapping data source

2020-03-31 Thread GitBox
umehrot2 opened a new pull request #1475: [HUDI-426][WIP] Initial 
implementation for Bootstrapping data source
URL: https://github.com/apache/incubator-hudi/pull/1475
 
 
   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] codecov-io edited a comment on issue #1440: [HUDI-731] Add ChainedTransformer

2020-03-31 Thread GitBox
codecov-io edited a comment on issue #1440: [HUDI-731] Add ChainedTransformer
URL: https://github.com/apache/incubator-hudi/pull/1440#issuecomment-602962209
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1440?src=pr=h1) 
Report
   > Merging 
[#1440](https://codecov.io/gh/apache/incubator-hudi/pull/1440?src=pr=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/ce0a4c64d07d6eea926d1bfb92b69ae387b88f50=desc)
 will **decrease** coverage by `0.05%`.
   > The diff coverage is `80.95%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1440/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1440?src=pr=tree)
   
   ```diff
    @@             Coverage Diff              @@
    ##             master    #1440      +/-   ##
    ============================================
    - Coverage     67.70%   67.64%   -0.06%
    - Complexity      261      266       +5
    ============================================
      Files           348      349       +1
      Lines         16683    16689       +6
      Branches       1694     1699       +5
    ============================================
    - Hits          11295    11290       -5
    - Misses         4647     4658      +11
      Partials        741      741
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-hudi/pull/1440?src=pr=tree) | 
Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...i/utilities/deltastreamer/HoodieDeltaStreamer.java](https://codecov.io/gh/apache/incubator-hudi/pull/1440/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvSG9vZGllRGVsdGFTdHJlYW1lci5qYXZh)
 | `78.60% <20.00%> (-1.60%)` | `8.00 <0.00> (ø)` | |
   | 
[...in/java/org/apache/hudi/utilities/UtilHelpers.java](https://codecov.io/gh/apache/incubator-hudi/pull/1440/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL1V0aWxIZWxwZXJzLmphdmE=)
 | `65.41% <100.00%> (+2.62%)` | `21.00 <3.00> (+1.00)` | |
   | 
[...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/incubator-hudi/pull/1440/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=)
 | `72.00% <100.00%> (-0.28%)` | `38.00 <0.00> (ø)` | |
   | 
[...e/hudi/utilities/transform/ChainedTransformer.java](https://codecov.io/gh/apache/incubator-hudi/pull/1440/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3RyYW5zZm9ybS9DaGFpbmVkVHJhbnNmb3JtZXIuamF2YQ==)
 | `100.00% <100.00%> (ø)` | `4.00 <4.00> (?)` | |
   | 
[...g/apache/hudi/metrics/InMemoryMetricsReporter.java](https://codecov.io/gh/apache/incubator-hudi/pull/1440/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9Jbk1lbW9yeU1ldHJpY3NSZXBvcnRlci5qYXZh)
 | `40.00% <0.00%> (-40.00%)` | `0.00% <0.00%> (ø%)` | |
   | 
[...src/main/java/org/apache/hudi/metrics/Metrics.java](https://codecov.io/gh/apache/incubator-hudi/pull/1440/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9NZXRyaWNzLmphdmE=)
 | `58.33% <0.00%> (-13.89%)` | `0.00% <0.00%> (ø%)` | |
   | 
[...ache/hudi/common/fs/inline/InMemoryFileSystem.java](https://codecov.io/gh/apache/incubator-hudi/pull/1440/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL2lubGluZS9Jbk1lbW9yeUZpbGVTeXN0ZW0uamF2YQ==)
 | `79.31% <0.00%> (-10.35%)` | `0.00% <0.00%> (ø%)` | |
   | 
[...e/compact/strategy/DayBasedCompactionStrategy.java](https://codecov.io/gh/apache/incubator-hudi/pull/1440/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdGFibGUvY29tcGFjdC9zdHJhdGVneS9EYXlCYXNlZENvbXBhY3Rpb25TdHJhdGVneS5qYXZh)
 | `65.00% <0.00%> (-1.67%)` | `0.00% <0.00%> (ø%)` | |
   | 
[...ava/org/apache/hudi/common/model/HoodieRecord.java](https://codecov.io/gh/apache/incubator-hudi/pull/1440/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0hvb2RpZVJlY29yZC5qYXZh)
 | `81.03% <0.00%> (-1.51%)` | `0.00% <0.00%> (ø%)` | |
   | 
[...org/apache/hudi/config/HoodieHBaseIndexConfig.java](https://codecov.io/gh/apache/incubator-hudi/pull/1440/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29uZmlnL0hvb2RpZUhCYXNlSW5kZXhDb25maWcuamF2YQ==)
 | `45.94% <0.00%> (-0.73%)` | `0.00% <0.00%> (ø%)` | |
   | ... and [22 
more](https://codecov.io/gh/apache/incubator-hudi/pull/1440/diff?src=pr=tree-more)
 | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1440?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 

[GitHub] [incubator-hudi] hddong commented on issue #1471: [WIP][HUDI-752]Make CompactionAdminClient spark-free

2020-03-31 Thread GitBox
hddong commented on issue #1471: [WIP][HUDI-752]Make CompactionAdminClient 
spark-free
URL: https://github.com/apache/incubator-hudi/pull/1471#issuecomment-606987392
 
 
   @vinothchandar I totally agree with you; we need the abstraction class first. 
My original idea was that we have many transforms (List to List) that use the 
jsc. They do not have to depend on Spark, so we can take them out.
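   A minimal sketch of what such an engine-neutral abstraction could look like; all names here (`ParallelTransformer`, `LocalTransformer`) are hypothetical illustrations and not from the Hudi codebase — in Spark, an implementation would delegate to `jsc.parallelize(...).map(...).collect()` instead:

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

/**
 * Hypothetical engine-neutral interface for the List-to-List transforms
 * mentioned above, so callers no longer depend on a JavaSparkContext.
 */
interface ParallelTransformer {
  <I, O> List<O> map(List<I> input, Function<I, O> fn);
}

/** Engine-free implementation usable in tests or non-Spark callers. */
class LocalTransformer implements ParallelTransformer {
  @Override
  public <I, O> List<O> map(List<I> input, Function<I, O> fn) {
    // Plain sequential mapping; a Spark implementation would distribute this.
    return input.stream().map(fn).collect(Collectors.toList());
  }
}
```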


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] yanghua commented on issue #1472: [HUDI-754] Configure .asf.yaml for Hudi Github repository

2020-03-31 Thread GitBox
yanghua commented on issue #1472: [HUDI-754] Configure .asf.yaml for Hudi 
Github repository
URL: https://github.com/apache/incubator-hudi/pull/1472#issuecomment-606983494
 
 
   It seems the configuration of tags did not take effect?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] yanghua merged pull request #1472: [HUDI-754] Configure .asf.yaml for Hudi Github repository

2020-03-31 Thread GitBox
yanghua merged pull request #1472: [HUDI-754] Configure .asf.yaml for Hudi 
Github repository
URL: https://github.com/apache/incubator-hudi/pull/1472
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


Error while running github feature from .asf.yaml in incubator-hudi!

2020-03-31 Thread Apache Infrastructure
An error occurred while running github feature in .asf.yaml!:
.asf.yaml: Invalid GitHub label 'incremental processing' - must be lowercase 
alphanumerical and <= 35 characters!
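Per the error above, one possible fix is to remove the spaces from the offending labels (GitHub repository topics are generally lowercase alphanumerics and hyphens; `stream processing` and `data integration` would hit the same check). This rewritten label list is a sketch of such a fix, not the committed change:

```yaml
github:
  description: "Upserts, Deletes And Incremental Processing on Big Data."
  homepage: https://hudi.apache.org/
  labels:
    - hudi
    - apachehudi
    - datalake
    - incremental-processing   # was "incremental processing"; spaces rejected
    - bigdata
    - stream-processing        # was "stream processing"
    - data-integration         # was "data integration"
    - apachespark
```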


[GitHub] [incubator-hudi] yanghua commented on issue #1472: [HUDI-754] Configure .asf.yaml for Hudi Github repository

2020-03-31 Thread GitBox
yanghua commented on issue #1472: [HUDI-754] Configure .asf.yaml for Hudi 
Github repository
URL: https://github.com/apache/incubator-hudi/pull/1472#issuecomment-606982866
 
 
   Thanks @vinothchandar, let's merge it and see what happens.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[incubator-hudi] branch master updated: [HUDI-754] Configure .asf.yaml for Hudi Github repository (#1472)

2020-03-31 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new c146ca9  [HUDI-754] Configure .asf.yaml for Hudi Github repository 
(#1472)
c146ca9 is described below

commit c146ca90fdd102f2bda90a2b15f6aef56414f9f4
Author: vinoyang 
AuthorDate: Wed Apr 1 10:02:47 2020 +0800

[HUDI-754] Configure .asf.yaml for Hudi Github repository (#1472)

* [HUDI-754] Configure .asf.yaml for Hudi Github repository
---
 .asf.yaml | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/.asf.yaml b/.asf.yaml
new file mode 100644
index 000..4a74628
--- /dev/null
+++ b/.asf.yaml
@@ -0,0 +1,12 @@
+github:
+  description: "Upserts, Deletes And Incremental Processing on Big Data."
+  homepage: https://hudi.apache.org/
+  labels:
+    - hudi
+    - apachehudi
+    - datalake
+    - incremental processing
+    - bigdata
+    - stream processing
+    - data integration
+    - apachespark



[GitHub] [incubator-hudi] codecov-io edited a comment on issue #1473: [HUDI-568] Improve unit test coverage

2020-03-31 Thread GitBox
codecov-io edited a comment on issue #1473: [HUDI-568] Improve unit test 
coverage
URL: https://github.com/apache/incubator-hudi/pull/1473#issuecomment-606973658
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1473?src=pr=h1) 
Report
   > Merging 
[#1473](https://codecov.io/gh/apache/incubator-hudi/pull/1473?src=pr=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/78b3194e8241c519a85310997f31b2b55df487e1=desc)
 will **increase** coverage by `0.40%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1473/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1473?src=pr=tree)
   
   ```diff
    @@             Coverage Diff              @@
    ##             master    #1473      +/-   ##
    ============================================
    + Coverage     67.64%   68.05%   +0.40%
      Complexity      261      261
    ============================================
      Files           348      348
      Lines         16672    16670       -2
      Branches       1694     1694
    ============================================
    + Hits          11278    11344      +66
    + Misses         4653     4586      -67
    + Partials        741      740       -1
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-hudi/pull/1473?src=pr=tree) | 
Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...apache/hudi/common/util/collection/RocksDBDAO.java](https://codecov.io/gh/apache/incubator-hudi/pull/1473/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvY29sbGVjdGlvbi9Sb2Nrc0RCREFPLmphdmE=)
 | `74.86% <ø> (+17.40%)` | `0.00 <0.00> (ø)` | |
   | 
[...pache/hudi/common/table/HoodieTableMetaClient.java](https://codecov.io/gh/apache/incubator-hudi/pull/1473/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL0hvb2RpZVRhYmxlTWV0YUNsaWVudC5qYXZh)
 | `83.76% <0.00%> (+7.14%)` | `0.00% <0.00%> (ø%)` | |
   | 
[.../hudi/hadoop/realtime/HoodieRealtimeFileSplit.java](https://codecov.io/gh/apache/incubator-hudi/pull/1473/diff?src=pr=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL3JlYWx0aW1lL0hvb2RpZVJlYWx0aW1lRmlsZVNwbGl0LmphdmE=)
 | `100.00% <0.00%> (+73.52%)` | `0.00% <0.00%> (ø%)` | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1473?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1473?src=pr=footer).
 Last update 
[78b3194...5d45687](https://codecov.io/gh/apache/incubator-hudi/pull/1473?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] codecov-io commented on issue #1473: [HUDI-568] Improve unit test coverage

2020-03-31 Thread GitBox
codecov-io commented on issue #1473: [HUDI-568] Improve unit test coverage
URL: https://github.com/apache/incubator-hudi/pull/1473#issuecomment-606973658
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1473?src=pr=h1) 
Report
   > Merging 
[#1473](https://codecov.io/gh/apache/incubator-hudi/pull/1473?src=pr=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/78b3194e8241c519a85310997f31b2b55df487e1=desc)
 will **increase** coverage by `0.40%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1473/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1473?src=pr=tree)
   
   ```diff
    @@             Coverage Diff              @@
    ##             master    #1473      +/-   ##
    ============================================
    + Coverage     67.64%   68.05%   +0.40%
      Complexity      261      261
    ============================================
      Files           348      348
      Lines         16672    16670       -2
      Branches       1694     1694
    ============================================
    + Hits          11278    11344      +66
    + Misses         4653     4586      -67
    + Partials        741      740       -1
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-hudi/pull/1473?src=pr=tree) | 
Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...apache/hudi/common/util/collection/RocksDBDAO.java](https://codecov.io/gh/apache/incubator-hudi/pull/1473/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvY29sbGVjdGlvbi9Sb2Nrc0RCREFPLmphdmE=)
 | `74.86% <ø> (+17.40%)` | `0.00 <0.00> (ø)` | |
   | 
[...pache/hudi/common/table/HoodieTableMetaClient.java](https://codecov.io/gh/apache/incubator-hudi/pull/1473/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL0hvb2RpZVRhYmxlTWV0YUNsaWVudC5qYXZh)
 | `83.76% <0.00%> (+7.14%)` | `0.00% <0.00%> (ø%)` | |
   | 
[.../hudi/hadoop/realtime/HoodieRealtimeFileSplit.java](https://codecov.io/gh/apache/incubator-hudi/pull/1473/diff?src=pr=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL3JlYWx0aW1lL0hvb2RpZVJlYWx0aW1lRmlsZVNwbGl0LmphdmE=)
 | `100.00% <0.00%> (+73.52%)` | `0.00% <0.00%> (ø%)` | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1473?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1473?src=pr=footer).
 Last update 
[78b3194...5d45687](https://codecov.io/gh/apache/incubator-hudi/pull/1473?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] codecov-io commented on issue #1474: [HUDI-562] Enable testing at debug log level

2020-03-31 Thread GitBox
codecov-io commented on issue #1474: [HUDI-562] Enable testing at debug log 
level
URL: https://github.com/apache/incubator-hudi/pull/1474#issuecomment-606972178
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1474?src=pr=h1) 
Report
   > Merging 
[#1474](https://codecov.io/gh/apache/incubator-hudi/pull/1474?src=pr=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/78b3194e8241c519a85310997f31b2b55df487e1=desc)
 will **increase** coverage by `0.26%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1474/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1474?src=pr=tree)
   
   ```diff
    @@             Coverage Diff              @@
    ##             master    #1474      +/-   ##
    ============================================
    + Coverage     67.64%   67.91%   +0.26%
      Complexity      261      261
    ============================================
      Files           348      348
      Lines         16672    16672
      Branches       1694     1694
    ============================================
    + Hits          11278    11323      +45
    + Misses         4653     4606      -47
    - Partials        741      743       +2
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-hudi/pull/1474?src=pr=tree) | 
Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...a/org/apache/hudi/common/util/collection/Pair.java](https://codecov.io/gh/apache/incubator-hudi/pull/1474/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvY29sbGVjdGlvbi9QYWlyLmphdmE=)
 | `72.00% <0.00%> (-4.00%)` | `0.00% <0.00%> (ø%)` | |
   | 
[...e/hudi/timeline/service/FileSystemViewHandler.java](https://codecov.io/gh/apache/incubator-hudi/pull/1474/diff?src=pr=tree#diff-aHVkaS10aW1lbGluZS1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL3RpbWVsaW5lL3NlcnZpY2UvRmlsZVN5c3RlbVZpZXdIYW5kbGVyLmphdmE=)
 | `90.14% <0.00%> (+0.93%)` | `0.00% <0.00%> (ø%)` | |
   | 
[...g/apache/hudi/hadoop/HoodieParquetInputFormat.java](https://codecov.io/gh/apache/incubator-hudi/pull/1474/diff?src=pr=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL0hvb2RpZVBhcnF1ZXRJbnB1dEZvcm1hdC5qYXZh)
 | `80.99% <0.00%> (+1.65%)` | `0.00% <0.00%> (ø%)` | |
   | 
[.../org/apache/hudi/index/bloom/HoodieBloomIndex.java](https://codecov.io/gh/apache/incubator-hudi/pull/1474/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW5kZXgvYmxvb20vSG9vZGllQmxvb21JbmRleC5qYXZh)
 | `96.49% <0.00%> (+1.75%)` | `0.00% <0.00%> (ø%)` | |
   | 
[...pache/hudi/common/table/HoodieTableMetaClient.java](https://codecov.io/gh/apache/incubator-hudi/pull/1474/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL0hvb2RpZVRhYmxlTWV0YUNsaWVudC5qYXZh)
 | `80.51% <0.00%> (+3.89%)` | `0.00% <0.00%> (ø%)` | |
   | 
[...hadoop/realtime/RealtimeCompactedRecordReader.java](https://codecov.io/gh/apache/incubator-hudi/pull/1474/diff?src=pr=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL3JlYWx0aW1lL1JlYWx0aW1lQ29tcGFjdGVkUmVjb3JkUmVhZGVyLmphdmE=)
 | `63.82% <0.00%> (+4.25%)` | `0.00% <0.00%> (ø%)` | |
   | 
[...java/org/apache/hudi/io/HoodieKeyLookupHandle.java](https://codecov.io/gh/apache/incubator-hudi/pull/1474/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW8vSG9vZGllS2V5TG9va3VwSGFuZGxlLmphdmE=)
 | `90.00% <0.00%> (+6.00%)` | `0.00% <0.00%> (ø%)` | |
   | 
[.../hadoop/realtime/AbstractRealtimeRecordReader.java](https://codecov.io/gh/apache/incubator-hudi/pull/1474/diff?src=pr=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL3JlYWx0aW1lL0Fic3RyYWN0UmVhbHRpbWVSZWNvcmRSZWFkZXIuamF2YQ==)
 | `83.54% <0.00%> (+9.49%)` | `0.00% <0.00%> (ø%)` | |
   | 
[...i/index/bloom/BucketizedBloomCheckPartitioner.java](https://codecov.io/gh/apache/incubator-hudi/pull/1474/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW5kZXgvYmxvb20vQnVja2V0aXplZEJsb29tQ2hlY2tQYXJ0aXRpb25lci5qYXZh)
 | `93.61% <0.00%> (+10.63%)` | `0.00% <0.00%> (ø%)` | |
   | 
[...rg/apache/hudi/hadoop/HoodieROTablePathFilter.java](https://codecov.io/gh/apache/incubator-hudi/pull/1474/diff?src=pr=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL0hvb2RpZVJPVGFibGVQYXRoRmlsdGVyLmphdmE=)
 | `76.11% <0.00%> (+11.94%)` | `0.00% <0.00%> (ø%)` | |
   | ... and [1 
more](https://codecov.io/gh/apache/incubator-hudi/pull/1474/diff?src=pr=tree-more)
 | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1474?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not 

[jira] [Created] (HUDI-756) Organize Action execution logic into a nicer class hierarchy in hudi-client

2020-03-31 Thread Vinoth Chandar (Jira)
Vinoth Chandar created HUDI-756:
---

 Summary: Organize Action execution logic into a nicer class 
hierarchy in hudi-client
 Key: HUDI-756
 URL: https://issues.apache.org/jira/browse/HUDI-756
 Project: Apache Hudi (incubating)
  Issue Type: Sub-task
  Components: Writer Core
Reporter: Vinoth Chandar
Assignee: Vinoth Chandar
 Fix For: 0.6.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-435) Make async compaction/cleaning extensible to new usages

2020-03-31 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-435:

Summary: Make async compaction/cleaning extensible to new usages  (was: 
Make async compaction extensible to be available in other components. )

> Make async compaction/cleaning extensible to new usages
> ---
>
> Key: HUDI-435
> URL: https://issues.apache.org/jira/browse/HUDI-435
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Compaction, Writer Core
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
> Fix For: 0.6.0
>
>
> Once the HFile-based index is available, the next step is to make compaction 
> extensible so it is available for all components.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-519) Document the need for Avro dependency shading/relocation for custom payloads, need for spark-avro

2020-03-31 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17072254#comment-17072254
 ] 

leesf commented on HUDI-519:


Yes, it is done in 0.5.1. see 
[http://hudi.apache.org/releases.html#release-highlights-1]

> Document the need for Avro dependency shading/relocation for custom payloads, 
> need for spark-avro
> -
>
> Key: HUDI-519
> URL: https://issues.apache.org/jira/browse/HUDI-519
> Project: Apache Hudi (incubating)
>  Issue Type: Task
>  Components: Docs, Usability
>Reporter: Udit Mehrotra
>Assignee: leesf
>Priority: Major
> Fix For: 0.5.1
>
>
> In [https://github.com/apache/incubator-hudi/pull/1005] we are migrating Hudi 
> to Spark 2.4.4. As part of this migration, we also had to migrate Hudi to use 
> Avro 1.8.2 (required by Spark), while Hive still uses an older version of Avro.
> This has resulted in the need to shade Avro in *hadoop-mr-bundle*. This has 
> implications for users of Hudi who implement custom record payloads. They 
> would have to start shading Avro in their custom jars, similar to how it is 
> shaded in *hadoop-mr-bundle*.
> This Jira is to track the documentation of this caveat in the release notes, 
> and if needed at other places like the website.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-519) Document the need for Avro dependency shading/relocation for custom payloads, need for spark-avro

2020-03-31 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-519.
--
Fix Version/s: (was: 0.6.0)
   0.5.1
   Resolution: Fixed

> Document the need for Avro dependency shading/relocation for custom payloads, 
> need for spark-avro
> -
>
> Key: HUDI-519
> URL: https://issues.apache.org/jira/browse/HUDI-519
> Project: Apache Hudi (incubating)
>  Issue Type: Task
>  Components: Docs, Usability
>Reporter: Udit Mehrotra
>Assignee: leesf
>Priority: Major
> Fix For: 0.5.1
>
>
> In [https://github.com/apache/incubator-hudi/pull/1005] we are migrating Hudi 
> to Spark 2.4.4. As part of this migration, we also had to migrate Hudi to use 
> Avro 1.8.2 (required by Spark), while Hive still uses an older version of Avro.
> This has resulted in the need to shade Avro in *hadoop-mr-bundle*. This has 
> implications for users of Hudi who implement custom record payloads. They 
> would have to start shading Avro in their custom jars, similar to how it is 
> shaded in *hadoop-mr-bundle*.
> This Jira is to track the documentation of this caveat in the release notes, 
> and if needed at other places like the website.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] ramachandranms commented on a change in pull request #1468: [HUDI-748] Adding .codecov.yml to set exclusions for code coverage reports.

2020-03-31 Thread GitBox
ramachandranms commented on a change in pull request #1468: [HUDI-748] Adding 
.codecov.yml to set exclusions for code coverage reports.
URL: https://github.com/apache/incubator-hudi/pull/1468#discussion_r401257947
 
 

 ##
 File path: .codecov.yml
 ##
 @@ -0,0 +1,46 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# For more configuration details:
+# https://docs.codecov.io/docs/codecov-yaml
+
+# Check if this file is valid by running in bash:
+# curl -X POST --data-binary @.codecov.yml https://codecov.io/validate
+
+# Ignoring Paths
+# --
+# which folders/files to ignore
+ignore:
+  - "hudi-common/src/main/java/org/apache/hudi/avro/model/*"
+  - "hudi-common/src/main/java/org/apache/hudi/common/model/*"
 
 Review comment:
    They are just model classes and only have properties plus getter/setter 
methods. Coverage does not work well with classes that have a lot of member 
declarations, since it only tags executed lines (getters/setters). Since most of 
the getters and setters are autogenerated or trivial code, there is no real 
benefit in testing them.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] ramachandranms opened a new pull request #1474: [HUDI-562] Enable testing at debug log level

2020-03-31 Thread GitBox
ramachandranms opened a new pull request #1474: [HUDI-562] Enable testing at 
debug log level
URL: https://github.com/apache/incubator-hudi/pull/1474
 
 
   ## What is the purpose of the pull request
   
   This pull request updates all the log4j configuration files that affect 
testing. The following changes are made:
   * Enable tests to be run at the DEBUG log level
   * Log only WARN and above to the console to conserve log file size
   
   This is to ensure that tests will execute all code paths, even the ones
   written under DEBUG log levels. This will improve coverage as well as
   ensure there are no surprises when the DEBUG log level is enabled in
   production.
   
   ## Brief change log
   
   * Modified all log4j testing properties to enable execution at the DEBUG 
level but log only WARN and above to the console.
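   The combination described above (loggers at DEBUG, console gated at WARN) can be sketched in a log4j 1.x properties file; the appender name and pattern below are illustrative, not taken from the PR:

```properties
# Run loggers at DEBUG so DEBUG-guarded code paths execute during tests.
log4j.rootLogger=DEBUG, CONSOLE

# Console appender with a WARN threshold: DEBUG/INFO events still reach the
# appender (so their code paths are exercised) but are filtered from output.
log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
log4j.appender.CONSOLE.Threshold=WARN
log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout
log4j.appender.CONSOLE.layout.ConversionPattern=%d %-5p %c{1}: %m%n
```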
   
   ## Verify this pull request
   
   This pull request is a trivial log4j properties change without any test 
coverage.
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.




[jira] [Created] (HUDI-755) Modify log4j properties to allow execution of tests in debug log mode

2020-03-31 Thread Ramachandran M S (Jira)
Ramachandran M S created HUDI-755:
-

 Summary: Modify log4j properties to allow execution of tests in 
debug log mode
 Key: HUDI-755
 URL: https://issues.apache.org/jira/browse/HUDI-755
 Project: Apache Hudi (incubating)
  Issue Type: Sub-task
Reporter: Ramachandran M S
Assignee: Ramachandran M S


Currently tests run at the log level of WARN. This skips some of the code paths 
that would be exercised if debug logging is turned on in production. Since some 
of the code inside debug sections is critical and could break production if not 
tested, we need to run tests at the DEBUG log level.

 

We also need to maintain a balance: DEBUG & INFO logs should not go to the 
console, so as not to overwhelm the log file size. Travis caps each log file at 
4 MB. So we need to enable code execution at the DEBUG level but log only WARN 
and above to the console.
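
As a sketch, a log4j 1.x properties file achieving this split might look like the following (appender name and pattern are illustrative, not the project's actual configuration): the root logger runs at DEBUG so guarded code paths execute, while the console appender's `Threshold` keeps anything below WARN out of the Travis log.

```properties
# Root logger at DEBUG: isDebugEnabled() returns true, so DEBUG-guarded
# code paths actually execute during tests.
log4j.rootLogger=DEBUG, CONSOLE

# Console appender thresholded at WARN: only WARN and above reach the
# console, keeping the CI log under Travis's 4 MB cap.
log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
log4j.appender.CONSOLE.Threshold=WARN
log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout
log4j.appender.CONSOLE.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n
```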



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] ramachandranms opened a new pull request #1473: [HUDI-568] Improve unit test coverage

2020-03-31 Thread GitBox
ramachandranms opened a new pull request #1473: [HUDI-568] Improve unit test 
coverage
URL: https://github.com/apache/incubator-hudi/pull/1473
 
 
   ## What is the purpose of the pull request
   
   Improve code coverage for the following classes
   * HoodieTableMetaClient
   * RocksDBDAO
   * HoodieRealtimeFileSplit
   
   
   ## Brief change log
   
   Added new tests for functions missing coverage
   
   ## Verify this pull request
   
   Added tests in the following files
   * TestRocksDBManager.java
   * TestHoodieTableMetaClient.java
   * TestHoodieRealtimeFileSplit.java
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.




[jira] [Updated] (HUDI-568) Improve unit test coverage for HoodieTableMetaClient, RocksDBDAO & HoodieRealtimeFileSplit

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-568:

Labels: pull-request-available  (was: )

> Improve unit test coverage for HoodieTableMetaClient, RocksDBDAO & 
> HoodieRealtimeFileSplit
> --
>
> Key: HUDI-568
> URL: https://issues.apache.org/jira/browse/HUDI-568
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>Reporter: Prashant Wason
>Assignee: Ramachandran M S
>Priority: Minor
>  Labels: pull-request-available
>






[jira] [Updated] (HUDI-568) Improve unit test coverage for HoodieTableMetaClient, RocksDBDAO & HoodieRealtimeFileSplit

2020-03-31 Thread Ramachandran M S (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramachandran M S updated HUDI-568:
--
Summary: Improve unit test coverage for HoodieTableMetaClient, RocksDBDAO & 
HoodieRealtimeFileSplit  (was: Improve unit test coverage for 
org.apache.hudi.common.table.HoodieTableMetaClient)

> Improve unit test coverage for HoodieTableMetaClient, RocksDBDAO & 
> HoodieRealtimeFileSplit
> --
>
> Key: HUDI-568
> URL: https://issues.apache.org/jira/browse/HUDI-568
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>Reporter: Prashant Wason
>Priority: Minor
>






[jira] [Assigned] (HUDI-568) Improve unit test coverage for HoodieTableMetaClient, RocksDBDAO & HoodieRealtimeFileSplit

2020-03-31 Thread Ramachandran M S (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramachandran M S reassigned HUDI-568:
-

Assignee: Ramachandran M S

> Improve unit test coverage for HoodieTableMetaClient, RocksDBDAO & 
> HoodieRealtimeFileSplit
> --
>
> Key: HUDI-568
> URL: https://issues.apache.org/jira/browse/HUDI-568
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>Reporter: Prashant Wason
>Assignee: Ramachandran M S
>Priority: Minor
>






[GitHub] [incubator-hudi] garyli1019 commented on issue #1453: HUDI-644 kafka connect checkpoint provider

2020-03-31 Thread GitBox
garyli1019 commented on issue #1453: HUDI-644 kafka connect checkpoint provider
URL: https://github.com/apache/incubator-hudi/pull/1453#issuecomment-606906561
 
 
   sounds like a plan! Will update this PR soon




[jira] [Updated] (HUDI-677) Abstract/Refactor all transaction management logic into a set of classes

2020-03-31 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-677:

Summary: Abstract/Refactor all transaction management logic into a set of 
classes   (was: Abstract/Refactor all transaction management logic into a set 
of classes from HoodieWriteClient)

> Abstract/Refactor all transaction management logic into a set of classes 
> -
>
> Key: HUDI-677
> URL: https://issues.apache.org/jira/browse/HUDI-677
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Code Cleanup
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
> Fix For: 0.6.0
>
>
> Over time, a lot of the core transaction management code has been split 
> across various files in hudi-client. We want to clean this up and present a 
> nice interface. 
> Some notes, thoughts, and suggestions:  
>  





[GitHub] [incubator-hudi] PhatakN1 commented on issue #1458: Issue with running compaction on a MOR dataset with org.apache.hudi.payload.AWSDmsAvroPayload

2020-03-31 Thread GitBox
PhatakN1 commented on issue #1458: Issue with running compaction on a MOR 
dataset with org.apache.hudi.payload.AWSDmsAvroPayload
URL: https://github.com/apache/incubator-hudi/issues/1458#issuecomment-606752893
 
 
   I was running this on EMR and there was an issue with EMR. This will be 
fixed in EMR release 5.30.
   Thanks for your help and support




[GitHub] [incubator-hudi] codecov-io edited a comment on issue #1472: [HUDI-754] Configure .asf.yaml for Hudi Github repository

2020-03-31 Thread GitBox
codecov-io edited a comment on issue #1472: [HUDI-754] Configure .asf.yaml for 
Hudi Github repository
URL: https://github.com/apache/incubator-hudi/pull/1472#issuecomment-606734785
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1472?src=pr=h1) 
Report
   > Merging 
[#1472](https://codecov.io/gh/apache/incubator-hudi/pull/1472?src=pr=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/9ecf0ccfb263cee8190afeb965ba350525860d6e=desc)
 will **decrease** coverage by `0.02%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1472/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1472?src=pr=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #1472      +/-   ##
   ============================================
   - Coverage     67.66%   67.64%   -0.03%
     Complexity      261      261
   ============================================
     Files           348      348
     Lines         16683    16672      -11
     Branches       1694     1694
   ============================================
   - Hits          11289    11278      -11
     Misses         4653     4653
     Partials        741      741
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-hudi/pull/1472?src=pr=tree) | 
Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...ache/hudi/common/fs/inline/InMemoryFileSystem.java](https://codecov.io/gh/apache/incubator-hudi/pull/1472/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL2lubGluZS9Jbk1lbW9yeUZpbGVTeXN0ZW0uamF2YQ==)
 | `79.31% <0.00%> (-10.35%)` | `0.00% <0.00%> (ø%)` | |
   | 
[...e/compact/strategy/DayBasedCompactionStrategy.java](https://codecov.io/gh/apache/incubator-hudi/pull/1472/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdGFibGUvY29tcGFjdC9zdHJhdGVneS9EYXlCYXNlZENvbXBhY3Rpb25TdHJhdGVneS5qYXZh)
 | `65.00% <0.00%> (-1.67%)` | `0.00% <0.00%> (ø%)` | |
   | 
[...ava/org/apache/hudi/common/model/HoodieRecord.java](https://codecov.io/gh/apache/incubator-hudi/pull/1472/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0hvb2RpZVJlY29yZC5qYXZh)
 | `81.03% <0.00%> (-1.51%)` | `0.00% <0.00%> (ø%)` | |
   | 
[...che/hudi/common/util/BufferedRandomAccessFile.java](https://codecov.io/gh/apache/incubator-hudi/pull/1472/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvQnVmZmVyZWRSYW5kb21BY2Nlc3NGaWxlLmphdmE=)
 | `54.38% <0.00%> (-0.88%)` | `0.00% <0.00%> (ø%)` | |
   | 
[...org/apache/hudi/config/HoodieHBaseIndexConfig.java](https://codecov.io/gh/apache/incubator-hudi/pull/1472/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29uZmlnL0hvb2RpZUhCYXNlSW5kZXhDb25maWcuamF2YQ==)
 | `45.94% <0.00%> (-0.73%)` | `0.00% <0.00%> (ø%)` | |
   | 
[...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/incubator-hudi/pull/1472/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=)
 | `72.00% <0.00%> (-0.28%)` | `38.00% <0.00%> (ø%)` | |
   | 
[...pache/hudi/common/table/HoodieTableMetaClient.java](https://codecov.io/gh/apache/incubator-hudi/pull/1472/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL0hvb2RpZVRhYmxlTWV0YUNsaWVudC5qYXZh)
 | `76.62% <0.00%> (-0.16%)` | `0.00% <0.00%> (ø%)` | |
   | 
[...ache/hudi/common/util/collection/DiskBasedMap.java](https://codecov.io/gh/apache/incubator-hudi/pull/1472/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvY29sbGVjdGlvbi9EaXNrQmFzZWRNYXAuamF2YQ==)
 | `82.94% <0.00%> (-0.14%)` | `0.00% <0.00%> (ø%)` | |
   | 
[.../main/java/org/apache/hudi/client/WriteStatus.java](https://codecov.io/gh/apache/incubator-hudi/pull/1472/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpZW50L1dyaXRlU3RhdHVzLmphdmE=)
 | `69.64% <0.00%> (ø)` | `0.00% <0.00%> (ø%)` | |
   | 
[.../java/org/apache/hudi/client/HoodieReadClient.java](https://codecov.io/gh/apache/incubator-hudi/pull/1472/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpZW50L0hvb2RpZVJlYWRDbGllbnQuamF2YQ==)
 | `100.00% <0.00%> (ø)` | `0.00% <0.00%> (ø%)` | |
   | ... and [17 
more](https://codecov.io/gh/apache/incubator-hudi/pull/1472/diff?src=pr=tree-more)
 | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1472?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 

[GitHub] [incubator-hudi] vinothchandar commented on issue #1453: HUDI-644 kafka connect checkpoint provider

2020-03-31 Thread GitBox
vinothchandar commented on issue #1453: HUDI-644 kafka connect checkpoint 
provider
URL: https://github.com/apache/incubator-hudi/pull/1453#issuecomment-606735369
 
 
   @garyli1019 I am fine either way.. if this is going to be a utility for now, 
that's okay. But let's still clarify the naming to be 
`InitialCheckpointProvider`. 
   
   As for the big picture, 
   
   With the bootstrap work that is going on from @bvaradar & @umehrot2, here 
is a future I think of: 
   
   - User writes data to S3 using another mechanism (sqoop, connect, ...) at 
`/old/dataset/path`
   - DeltaStreamer can support a `--bootstrap-from /old/dataset/path` and 
`--initial-checkpoint-provider SqoopCheckpointProvider.class`; then it will 
seamlessly perform the initial bootstrap, extract a checkpoint, and keep 
incrementally ingesting from there.. 
   
   I was trying to see if we can jump ahead with the checkpoint flag now 
itself.. (it's a matter of UX: providing it one time via `--checkpoint` vs. the 
provider being invoked by DeltaStreamer itself).. 
   
   
   
   




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1471: [WIP][HUDI-752]Make CompactionAdminClient spark-free

2020-03-31 Thread GitBox
vinothchandar commented on a change in pull request #1471: [WIP][HUDI-752]Make 
CompactionAdminClient spark-free
URL: https://github.com/apache/incubator-hudi/pull/1471#discussion_r400979587
 
 

 ##
 File path: 
hudi-client/src/main/java/org/apache/hudi/client/utils/SparkEngineUtils.java
 ##
 @@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.client.utils;
+
+import org.apache.spark.SparkContext;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.api.java.function.FlatMapFunction;
+import org.apache.spark.api.java.function.Function;
+
+import java.util.List;
+
+/**
+ * Util class for Spark engine.
+ */
+public class SparkEngineUtils {
+  private static JavaSparkContext jsc;
+
+  public static void setJsc(JavaSparkContext javaSparkContext) {
+jsc = javaSparkContext;
+  }
+
+  /**
+   * Get the only SparkContext from JVM.
+   */
+  public static JavaSparkContext getJsc() {
+if (jsc == null) {
+  jsc = new JavaSparkContext(SparkContext.getOrCreate());
+}
+return jsc;
+  }
+
+  /**
+   * Parallelize map function.
+   */
+  public static <I, O> List<O> parallelizeMap(List<I> list, int num, Function<I, O> f) {
 
 Review comment:
   My early thoughts are that this EngineContext abstraction needs to abstract 
Hudi logic, and not operate at the level of `map`, `filter` etc... (if we 
wanted that, we could think of Beam). There would be operations with different 
signatures across engines, e.g. something like `sortAndRepartitionWithPartitions` 
exists for Spark RDDs and not DataFrames.
   
   I may be wrong.. but it's worth first compiling a table of all RDD APIs we 
invoke today and their inputs, and seeing how this can evolve.. ? 
   
   




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1471: [WIP][HUDI-752]Make CompactionAdminClient spark-free

2020-03-31 Thread GitBox
vinothchandar commented on a change in pull request #1471: [WIP][HUDI-752]Make 
CompactionAdminClient spark-free
URL: https://github.com/apache/incubator-hudi/pull/1471#discussion_r400975640
 
 

 ##
 File path: 
hudi-client/src/main/java/org/apache/hudi/client/utils/SparkEngineUtils.java
 ##
 @@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.client.utils;
+
+import org.apache.spark.SparkContext;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.api.java.function.FlatMapFunction;
+import org.apache.spark.api.java.function.Function;
+
+import java.util.List;
+
+/**
+ * Util class for Spark Engine.
+ */
+public class SparkEngineUtils {
 
 Review comment:
   Also I would have a base interface `EngineContext` (let's please not 
overload Utils anymore.. I am trying to move us towards SRP principles).. and 
subclass `SparkRDDEngineContext` (we may add a DataFrame engine, Flink 
engine).. and generify the code such that we pass the engineContext once to 
HoodieWriteClient and the rest of the code can execute.. 
   
   This sort of PoC will be extremely valuable to us at this stage, rather than 
doing classes one by one.. We will take a long time to be done otherwise :) .. 
@yanghua would you also agree with my thoughts here 
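
   The shape being suggested might look roughly like this (a hypothetical sketch; the names `LocalEngineContext`, `WriteClientSketch`, and the `map` signature are illustrative, not the actual Hudi API — a real `SparkRDDEngineContext` would delegate to `JavaSparkContext`):

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Base abstraction: engine-specific execution hidden behind one interface.
// Per the comment above, a real version would expose Hudi-level operations;
// a raw map() is shown only to keep the sketch self-contained.
interface EngineContext {
  <I, O> List<O> map(List<I> input, Function<I, O> fn, int parallelism);
}

// Trivial single-JVM implementation, useful for tests; a SparkRDDEngineContext
// subclass would instead parallelize via JavaSparkContext.
class LocalEngineContext implements EngineContext {
  @Override
  public <I, O> List<O> map(List<I> input, Function<I, O> fn, int parallelism) {
    return input.stream().map(fn).collect(Collectors.toList());
  }
}

// The context is injected exactly once; the rest of the code is engine-free.
class WriteClientSketch {
  private final EngineContext context;

  WriteClientSketch(EngineContext context) {
    this.context = context;
  }

  List<Integer> doubleAll(List<Integer> values) {
    return context.map(values, v -> v * 2, 1);
  }
}
```

   Swapping engines then only changes the single constructor argument, not the client code.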




[jira] [Commented] (HUDI-519) Document the need for Avro dependency shading/relocation for custom payloads, need for spark-avro

2020-03-31 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17071847#comment-17071847
 ] 

Vinoth Chandar commented on HUDI-519:
-

this was done already IIUC.. right [~xleesf]? 

> Document the need for Avro dependency shading/relocation for custom payloads, 
> need for spark-avro
> -
>
> Key: HUDI-519
> URL: https://issues.apache.org/jira/browse/HUDI-519
> Project: Apache Hudi (incubating)
>  Issue Type: Task
>  Components: Docs, Usability
>Reporter: Udit Mehrotra
>Assignee: leesf
>Priority: Major
> Fix For: 0.6.0
>
>
> In [https://github.com/apache/incubator-hudi/pull/1005] we are migrating Hudi 
> to Spark 2.4.4. As part of this migration, we also had to migrate Hudi to use 
> Avro 1.8.2 (required by spark), while Hive still uses older version of Avro.
> This has resulted in the need to shade Avro in *hadoop-mr-bundle*. This has 
> implications for users of Hudi who implement custom record payloads. They 
> would have to start shading Avro in their custom jars, similar to how it is 
> shaded in *hadoop-mr-bundle*.
> This Jira is to track the documentation of this caveat in release notes, and 
> if needed at other places like website etc.





[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1472: [HUDI-754] Configure .asf.yaml for Hudi Github repository

2020-03-31 Thread GitBox
vinothchandar commented on a change in pull request #1472: [HUDI-754] Configure 
.asf.yaml for Hudi Github repository
URL: https://github.com/apache/incubator-hudi/pull/1472#discussion_r400964428
 
 

 ##
 File path: .asf.yaml
 ##
 @@ -0,0 +1,10 @@
+github:
+  description: "Upserts Deletes And Incremental Processing on Big Data."
+  homepage: https://hudi.apache.org/
+  labels:
+- hudi
+- apachehudi
+- datalake
+- delta
+- incremental processing
+- bigdata
 
 Review comment:
   let's add `stream processing`, `data integration` (we support 
deltastreamer), `apachespark`?




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1472: [HUDI-754] Configure .asf.yaml for Hudi Github repository

2020-03-31 Thread GitBox
vinothchandar commented on a change in pull request #1472: [HUDI-754] Configure 
.asf.yaml for Hudi Github repository
URL: https://github.com/apache/incubator-hudi/pull/1472#discussion_r400963416
 
 

 ##
 File path: .asf.yaml
 ##
 @@ -0,0 +1,10 @@
+github:
+  description: "Upserts Deletes And Incremental Processing on Big Data."
+  homepage: https://hudi.apache.org/
+  labels:
+- hudi
+- apachehudi
+- datalake
+- delta
 
 Review comment:
   lets remove this 




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1472: [HUDI-754] Configure .asf.yaml for Hudi Github repository

2020-03-31 Thread GitBox
vinothchandar commented on a change in pull request #1472: [HUDI-754] Configure 
.asf.yaml for Hudi Github repository
URL: https://github.com/apache/incubator-hudi/pull/1472#discussion_r400963176
 
 

 ##
 File path: .asf.yaml
 ##
 @@ -0,0 +1,10 @@
+github:
+  description: "Upserts Deletes And Incremental Processing on Big Data."
 
 Review comment:
   nit: comma `Upserts, Deletes And` ? 




[jira] [Commented] (HUDI-309) General Redesign of Archived Timeline for efficient scan and management

2020-03-31 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17071842#comment-17071842
 ] 

Vinoth Chandar commented on HUDI-309:
-

Folks, I think we would agree this is a large effort with potential overlap 
with RFC-15.. I was thinking about a way to make progress here on this specific 
problem and unblock other projects along the way. 

Specific problem: during write operations, we cache the input using Spark 
caching and compute a workload profile for purposes of file sizing etc. We also 
persist this information in the inflight commit/deltacommit file, for doing 
rollbacks. i.e. if the write fails midway leaving a .inflight 
commit/deltacommit, then upon the next write we will read the workload profile 
written into the commit/deltacommit and attempt to delete leftover files or log 
rollback blocks into log files to nullify the partial writes we might have 
written... Note that we will not read base or log files that are inflight in 
the active timeline, by checking whether the instant was inflight. But if we 
don't perform any rollback action and enough time passes, then this instant 
will be archived, and that's where the trouble is. Once an instant goes into 
the archived timeline today, there is no way to check its individual state 
(inflight vs completed).. and this is what this JIRA was trying to handle in a 
generic way, so that the in-memory caching is not relied upon for correctness.

Thinking back, I think we can shelve this JIRA as a longer-term effort and use 
an alternate approach to solve the specific problem above. During each write 
(from Create and Merge handles; code is in HoodieTable.java) we already write 
out marker files under .hoodie that correspond 1-1 with each file being created 
or merged today. In case of a partial write, the marker file would be left 
behind (we need to ensure in code that we commit first and then delete markers) 
and we can directly use it to perform the rollback... (note that we need to 
handle backwards compatibility with existing timelines, and also support 
downgrades to the old behavior)

Let me know if this makes sense in a general way.. We can file a separate JIRA 
and get working on it.. 

[~xleesf] [~vbalaji] [~vinoyang] [~nagarwal] 
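
The marker-based rollback described above could be sketched roughly as follows (a hypothetical illustration only: the `.marker` suffix, the `.hoodie/.temp/<instantTime>/` layout, and `MarkerRollbackSketch` are assumptions, not Hudi's actual implementation):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

class MarkerRollbackSketch {
  // Assume each data file written during an instant leaves a mirror-path
  // marker under .hoodie/.temp/<instantTime>/; leftover markers after a
  // failed write identify exactly the partial files to delete.
  static void rollback(Path tableBase, String instantTime) throws IOException {
    Path markerDir = tableBase.resolve(".hoodie").resolve(".temp").resolve(instantTime);
    if (!Files.isDirectory(markerDir)) {
      return; // no markers left behind for this instant, nothing to roll back
    }
    try (Stream<Path> markers = Files.walk(markerDir)) {
      markers.filter(Files::isRegularFile).forEach(marker -> {
        // the marker's path relative to the marker dir mirrors the data
        // file's path relative to the table base
        String rel = markerDir.relativize(marker).toString();
        Path dataFile = tableBase.resolve(rel.replace(".marker", ""));
        try {
          Files.deleteIfExists(dataFile); // undo the partial write
        } catch (IOException ignored) {
          // best-effort sketch; real code would surface this failure
        }
      });
    }
  }
}
```

This avoids reading any workload profile from the (possibly archived) commit metadata, which is the crux of the proposal.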


> General Redesign of Archived Timeline for efficient scan and management
> ---
>
> Key: HUDI-309
> URL: https://issues.apache.org/jira/browse/HUDI-309
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: Common Core
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Major
> Fix For: 0.6.0
>
> Attachments: Archive TImeline Notes by Vinoth 1.jpg, Archived 
> Timeline Notes by Vinoth 2.jpg
>
>
> As designed by Vinoth:
> Goals
>  # Archived Metadata should be scannable in the same way as data
>  # Provides more safety by always serving committed data independent of 
> timeframe when the corresponding commit action was tried. Currently, we 
> implicitly assume a data file to be valid if its commit time is older than 
> the earliest time in the active timeline. While this works ok, any inherent 
> bugs in rollback could inadvertently expose a possibly duplicate file when 
> its commit timestamp becomes older than that of any commits in the timeline.
>  # We had to deal with lot of corner cases because of the way we treat a 
> "commit" as special after it gets archived. Examples also include Savepoint 
> handling logic by cleaner.
> # Small Files : For Cloud stores, archiving simply moves files from one 
> directory to another causing the archive folder to grow. We need a way to 
> efficiently compact these files and at the same time be friendly to scans
> Design:
>  The basic file-group abstraction for managing file versions of data files 
> can be extended to managing archived commit metadata. The idea is to use an 
> optimal format (like HFile) for storing a compacted version of <Instant, 
> Metadata> pairs. Every archiving run will read <Instant, Metadata> pairs 
> from the active timeline and append to indexable log files. We will run 
> periodic minor compactions to merge multiple log files into a compacted 
> HFile storing metadata for a time-range. It should also be noted that we 
> will partition by the action types (commit/clean). This design would allow 
> the archived timeline to be queryable for determining whether an instant is 
> valid or not.





[jira] [Closed] (HUDI-751) Fix some coding issues reported by FindBugs

2020-03-31 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang closed HUDI-751.
-
Resolution: Fixed

Fixed via master branch: 78b3194e8241c519a85310997f31b2b55df487e1

> Fix some coding issues reported by FindBugs
> ---
>
> Key: HUDI-751
> URL: https://issues.apache.org/jira/browse/HUDI-751
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>Reporter: Shao Feng Shi
>Assignee: Shao Feng Shi
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When I go through the code base, the FindBugs plugin in my IDEA reports 
> several issues, such as
> 1) Classes implementing "Serializable" don't have a "serialVersionUID";
> 2) Inner classes weren't declared as static;
> 3) Some static constant variables were not marked as final;
> 4) Some variables don't follow the naming convention, etc.;
> 5) JDBC Connection resources weren't closed after use;
>  
> I fixed them quickly, and will raise a pull request.
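
For illustration, the first four findings and the JDBC one can be sketched in a single class (the names `OuterConfig`, `Builder`, and `rowCount` are made up for this example, not from the Hudi codebase):

```java
import java.io.Serializable;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

class OuterConfig implements Serializable {
  // 1) a Serializable class should declare a serialVersionUID
  private static final long serialVersionUID = 1L;

  // 3) + 4) static constants should be final and UPPER_CASE
  static final int DEFAULT_RETRIES = 3;

  // 2) an inner class that never reads outer instance state should be static,
  //    so it does not retain an implicit reference to the enclosing object
  static class Builder {
    int retries = DEFAULT_RETRIES;
  }

  // 5) close JDBC resources deterministically with try-with-resources
  static int rowCount(String url, String sql) throws SQLException {
    try (Connection conn = DriverManager.getConnection(url);
         Statement stmt = conn.createStatement();
         ResultSet rs = stmt.executeQuery(sql)) {
      int count = 0;
      while (rs.next()) {
        count++;
      }
      return count; // conn/stmt/rs are all closed automatically here
    }
  }
}
```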





[jira] [Updated] (HUDI-751) Fix some coding issues reported by FindBugs

2020-03-31 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-751:
--
Status: Open  (was: New)

> Fix some coding issues reported by FindBugs
> ---
>
> Key: HUDI-751
> URL: https://issues.apache.org/jira/browse/HUDI-751
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>Reporter: Shao Feng Shi
>Assignee: Shao Feng Shi
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When I go through the code base, the FindBugs plugin in my IDEA reports 
> several issues, such as
> 1) Classes implementing "Serializable" don't have a "serialVersionUID";
> 2) Inner classes weren't declared as static;
> 3) Some static constant variables were not marked as final;
> 4) Some variables don't follow the naming convention, etc.;
> 5) JDBC Connection resources weren't closed after use;
>  
> I fixed them quickly, and will raise a pull request.





[jira] [Assigned] (HUDI-751) Fix some coding issues reported by FindBugs

2020-03-31 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang reassigned HUDI-751:
-

Assignee: Shao Feng Shi

> Fix some coding issues reported by FindBugs
> ---
>
> Key: HUDI-751
> URL: https://issues.apache.org/jira/browse/HUDI-751
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>Reporter: Shao Feng Shi
>Assignee: Shao Feng Shi
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When I go through the code base, the FindBugs plugin in my IDEA reports 
> several issues, such as
> 1) Classes implementing "Serializable" don't have a "serialVersionUID";
> 2) Inner classes weren't declared as static;
> 3) Some static constant variables were not marked as final;
> 4) Some variables don't follow the naming convention, etc.;
> 5) JDBC Connection resources weren't closed after use;
>  
> I fixed them quickly, and will raise a pull request.





[jira] [Updated] (HUDI-742) Fix java.lang.NoSuchMethodError: java.lang.Math.floorMod(JI)I

2020-03-31 Thread edwinguo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

edwinguo updated HUDI-742:
--
Status: Closed  (was: Patch Available)

> Fix java.lang.NoSuchMethodError: java.lang.Math.floorMod(JI)I
> -
>
> Key: HUDI-742
> URL: https://issues.apache.org/jira/browse/HUDI-742
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Spark Integration
>Reporter: lamber-ken
>Assignee: edwinguo
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> *ISSUE* : https://github.com/apache/incubator-hudi/issues/1455
> {code:java}
> at org.apache.hudi.client.HoodieWriteClient.upsert(HoodieWriteClient.java:193)
> at org.apache.hudi.DataSourceUtils.doWriteOperation(DataSourceUtils.java:206)
> at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:144)
> at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:108)
> at 
> org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
> at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
> at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
> at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:156)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
> at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
> at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:83)
> at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:83)
> at 
> org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
> at 
> org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
> at 
> org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:84)
> at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:165)
> at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:74)
> at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
> at 
> org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
> at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
> at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
> ... 49 elided
> Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: 
> Task 44 in stage 11.0 failed 4 times, most recent failure: Lost task 44.3 in 
> stage 11.0 (TID 975, ip-10-81-135-85.ec2.internal, executor 6): 
> java.lang.NoSuchMethodError: java.lang.Math.floorMod(JI)I
> at 
> org.apache.hudi.index.bloom.BucketizedBloomCheckPartitioner.getPartition(BucketizedBloomCheckPartitioner.java:148)
> at 
> org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:151)
> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
> at org.apache.spark.scheduler.Task.run(Task.scala:123)
> at 
> org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
> at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Driver stacktrace:
> at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:2041)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:2029)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:2028)
> at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
> at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2028)
> at 
> 
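For context on the error quoted above: `(JI)I` is the JVM descriptor for the `Math.floorMod(long, int)` overload, which only exists since JDK 9, so bytecode compiled against a newer JDK fails with `NoSuchMethodError` on a Java 8 runtime. A sketch of a Java-8-compatible replacement (illustrative only, not necessarily the actual fix applied in Hudi):

```java
// The descriptor (JI)I in the error denotes Math.floorMod(long, int),
// an overload added only in JDK 9; on a Java 8 runtime the call fails
// with java.lang.NoSuchMethodError.
public class FloorModCompat {

  // Java-8-compatible alternative: delegate to the (long, long) overload,
  // which has existed since Java 8.
  public static int floorMod(long x, int y) {
    // floorMod's result has the sign of y and magnitude < |y|,
    // so narrowing the long result back to int is lossless.
    return (int) Math.floorMod(x, (long) y);
  }
}
```

Alternatively, compiling with `--release 8` (or `-source`/`-target 8` with a JDK 8 `bootclasspath`) catches this class of mismatch at build time.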

[incubator-hudi] branch master updated: [HUDI-751] Fix some coding issues reported by FindBugs (#1470)

2020-03-31 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 78b3194  [HUDI-751] Fix some coding issues reported by FindBugs (#1470)
78b3194 is described below

commit 78b3194e8241c519a85310997f31b2b55df487e1
Author: Shaofeng Shi 
AuthorDate: Tue Mar 31 21:19:32 2020 +0800

[HUDI-751] Fix some coding issues reported by FindBugs (#1470)
---
 .../org/apache/hudi/cli/commands/RollbacksCommand.java   |  2 +-
 .../org/apache/hudi/cli/commands/SparkEnvCommand.java|  6 +++---
 .../java/org/apache/hudi/cli/commands/StatsCommand.java  |  6 +++---
 .../main/java/org/apache/hudi/cli/utils/HiveUtil.java| 16 
 .../java/org/apache/hudi/client/HoodieCleanClient.java   |  1 +
 .../java/org/apache/hudi/client/HoodieReadClient.java|  1 +
 .../java/org/apache/hudi/client/HoodieWriteClient.java   |  1 +
 .../main/java/org/apache/hudi/client/WriteStatus.java|  1 +
 .../org/apache/hudi/config/HoodieHBaseIndexConfig.java   |  2 +-
 .../BoundedPartitionAwareCompactionStrategy.java |  2 +-
 .../compact/strategy/DayBasedCompactionStrategy.java |  6 +++---
 .../hudi/common/config/SerializableConfiguration.java|  1 +
 .../org/apache/hudi/common/model/HoodieBaseFile.java |  1 +
 .../java/org/apache/hudi/common/model/HoodieLogFile.java |  2 ++
 .../java/org/apache/hudi/common/model/HoodieRecord.java  | 10 +-
 .../hudi/common/model/HoodieRollingStatMetadata.java |  2 +-
 .../apache/hudi/common/table/HoodieTableMetaClient.java  |  7 ---
 .../common/table/timeline/HoodieDefaultTimeline.java |  1 +
 .../java/org/apache/hudi/common/util/HoodieTimer.java|  2 +-
 .../apache/hudi/common/util/collection/DiskBasedMap.java |  2 +-
 .../org/apache/hudi/hadoop/HoodieROTablePathFilter.java  |  1 +
 .../hive/SlashEncodedDayPartitionValueExtractor.java |  1 +
 .../org/apache/hudi/utilities/HDFSParquetImporter.java   |  1 +
 .../apache/hudi/utilities/deltastreamer/Compactor.java   |  1 +
 .../apache/hudi/utilities/deltastreamer/DeltaSync.java   |  5 +++--
 .../utilities/deltastreamer/HoodieDeltaStreamer.java |  3 +++
 .../apache/hudi/utilities/perf/TimelineServerPerf.java   |  1 +
 .../org/apache/hudi/utilities/sources/CsvDFSSource.java  |  8 +---
 .../hudi/utilities/sources/HiveIncrPullSource.java   |  2 ++
 .../hudi/utilities/sources/helpers/AvroConvertor.java|  1 +
 .../hudi/utilities/sources/helpers/DFSPathSelector.java  |  2 +-
 31 files changed, 57 insertions(+), 41 deletions(-)

diff --git 
a/hudi-cli/src/main/java/org/apache/hudi/cli/commands/RollbacksCommand.java 
b/hudi-cli/src/main/java/org/apache/hudi/cli/commands/RollbacksCommand.java
index 9e4bf28..70b34bc 100644
--- a/hudi-cli/src/main/java/org/apache/hudi/cli/commands/RollbacksCommand.java
+++ b/hudi-cli/src/main/java/org/apache/hudi/cli/commands/RollbacksCommand.java
@@ -120,7 +120,7 @@ public class RollbacksCommand implements CommandMarker {
   /**
* An Active timeline containing only rollbacks.
*/
-  class RollbackTimeline extends HoodieActiveTimeline {
+  static class RollbackTimeline extends HoodieActiveTimeline {
 
 public RollbackTimeline(HoodieTableMetaClient metaClient) {
   super(metaClient, 
CollectionUtils.createImmutableSet(HoodieTimeline.ROLLBACK_EXTENSION));
diff --git 
a/hudi-cli/src/main/java/org/apache/hudi/cli/commands/SparkEnvCommand.java 
b/hudi-cli/src/main/java/org/apache/hudi/cli/commands/SparkEnvCommand.java
index d209a08..7969808 100644
--- a/hudi-cli/src/main/java/org/apache/hudi/cli/commands/SparkEnvCommand.java
+++ b/hudi-cli/src/main/java/org/apache/hudi/cli/commands/SparkEnvCommand.java
@@ -34,7 +34,7 @@ import java.util.Map;
 @Component
 public class SparkEnvCommand implements CommandMarker {
 
-  public static Map<String, String> env = new HashMap<String, String>();
+  public static Map<String, String> env = new HashMap<>();
 
   @CliCommand(value = "set", help = "Set spark launcher env to cli")
   public void setEnv(@CliOption(key = {"conf"}, help = "Env config to be set") 
final String confMap) {
@@ -49,8 +49,8 @@ public class SparkEnvCommand implements CommandMarker {
   public String showAllEnv() {
 String[][] rows = new String[env.size()][2];
 int i = 0;
-for (String key: env.keySet()) {
-  rows[i] = new String[]{key, env.get(key)};
+for (Map.Entry<String, String> entry: env.entrySet()) {
+  rows[i] = new String[]{entry.getKey(), entry.getValue()};
   i++;
 }
 return HoodiePrintHelper.print(new String[] {"key", "value"}, rows);
diff --git 
a/hudi-cli/src/main/java/org/apache/hudi/cli/commands/StatsCommand.java 
b/hudi-cli/src/main/java/org/apache/hudi/cli/commands/StatsCommand.java
index 9db544c..e5be0e4 100644
--- a/hudi-cli/src/main/java/org/apache/hudi/cli/commands/StatsCommand.java
+++ b/hudi-cli/src/main/java/org/apache/hudi/cli/commands/StatsCommand.java
@@ 

[GitHub] [incubator-hudi] yanghua merged pull request #1470: [HUDI-751] Fix some coding issues reported by FindBugs

2020-03-31 Thread GitBox
yanghua merged pull request #1470: [HUDI-751] Fix some coding issues reported 
by FindBugs
URL: https://github.com/apache/incubator-hudi/pull/1470
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Updated] (HUDI-742) Fix java.lang.NoSuchMethodError: java.lang.Math.floorMod(JI)I

2020-03-31 Thread edwinguo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

edwinguo updated HUDI-742:
--
Status: Patch Available  (was: In Progress)

> Fix java.lang.NoSuchMethodError: java.lang.Math.floorMod(JI)I
> -
>
> Key: HUDI-742
> URL: https://issues.apache.org/jira/browse/HUDI-742
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Spark Integration
>Reporter: lamber-ken
>Assignee: edwinguo
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> *ISSUE* : https://github.com/apache/incubator-hudi/issues/1455
> {code:java}
> at org.apache.hudi.client.HoodieWriteClient.upsert(HoodieWriteClient.java:193)
> at org.apache.hudi.DataSourceUtils.doWriteOperation(DataSourceUtils.java:206)
> at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:144)
> at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:108)
> at 
> org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
> at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
> at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
> at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:156)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
> at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
> at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:83)
> at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:83)
> at 
> org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
> at 
> org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
> at 
> org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:84)
> at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:165)
> at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:74)
> at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
> at 
> org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
> at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
> at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
> ... 49 elided
> Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: 
> Task 44 in stage 11.0 failed 4 times, most recent failure: Lost task 44.3 in 
> stage 11.0 (TID 975, ip-10-81-135-85.ec2.internal, executor 6): 
> java.lang.NoSuchMethodError: java.lang.Math.floorMod(JI)I
> at 
> org.apache.hudi.index.bloom.BucketizedBloomCheckPartitioner.getPartition(BucketizedBloomCheckPartitioner.java:148)
> at 
> org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:151)
> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
> at org.apache.spark.scheduler.Task.run(Task.scala:123)
> at 
> org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
> at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Driver stacktrace:
> at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:2041)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:2029)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:2028)
> at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
> at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2028)
> at 
> 

[GitHub] [incubator-hudi] EdwinGuo closed issue #1455: [SUPPORT] Hudi upsert run into exception: java.lang.NoSuchMethodError: java.lang.Math.floorMod(JI)I

2020-03-31 Thread GitBox
EdwinGuo closed issue #1455: [SUPPORT] Hudi upsert run into exception:  
java.lang.NoSuchMethodError: java.lang.Math.floorMod(JI)I
URL: https://github.com/apache/incubator-hudi/issues/1455
 
 
   




[GitHub] [incubator-hudi] yanghua opened a new pull request #1472: [HUDI-754] Configure .asf.yaml for Hudi Github repository

2020-03-31 Thread GitBox
yanghua opened a new pull request #1472: [HUDI-754] Configure .asf.yaml for 
Hudi Github repository
URL: https://github.com/apache/incubator-hudi/pull/1472
 
 
   
   
   ## What is the purpose of the pull request
   
   *This pull request configures `.asf.yaml` for Hudi Github repository*
   
   ## Brief change log
   
 - *Configure .asf.yaml for Hudi Github repository*
   
   ## Verify this pull request
   
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.




[jira] [Updated] (HUDI-754) Config repository metadata

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-754:

Labels: pull-request-available  (was: )

> Config repository metadata
> --
>
> Key: HUDI-754
> URL: https://issues.apache.org/jira/browse/HUDI-754
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>Reporter: vinoyang
>Priority: Major
>  Labels: pull-request-available
>
> e.g.:
> {code:java}
> github:
>   description: "JSONP module for Apache Foobar"
>   homepage: https://foobar.apache.org/
>   labels:
> - json
> - jsonp
> - foobar
> - apache
> {code}





[jira] [Created] (HUDI-754) Config repository metadata

2020-03-31 Thread vinoyang (Jira)
vinoyang created HUDI-754:
-

 Summary: Config repository metadata
 Key: HUDI-754
 URL: https://issues.apache.org/jira/browse/HUDI-754
 Project: Apache Hudi (incubating)
  Issue Type: Sub-task
Reporter: vinoyang


e.g.:


{code:java}
github:
  description: "JSONP module for Apache Foobar"
  homepage: https://foobar.apache.org/
  labels:
- json
- jsonp
- foobar
- apache
{code}






[jira] [Updated] (HUDI-753) Configure .asf.yaml for Hudi Github repository

2020-03-31 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-753:
--
Status: Open  (was: New)

> Configure .asf.yaml for Hudi Github repository
> --
>
> Key: HUDI-753
> URL: https://issues.apache.org/jira/browse/HUDI-753
> Project: Apache Hudi (incubating)
>  Issue Type: Task
>Reporter: vinoyang
>Assignee: vinoyang
>Priority: Major
>
> Recently, the ASF infra team released a nice feature named 
> [.asf.yaml|https://cwiki.apache.org/confluence/display/INFRA/.asf.yaml+features+for+git+repositories].
>  We can configure and verify those config options step by step.





[jira] [Created] (HUDI-753) Configure .asf.yaml for Hudi Github repository

2020-03-31 Thread vinoyang (Jira)
vinoyang created HUDI-753:
-

 Summary: Configure .asf.yaml for Hudi Github repository
 Key: HUDI-753
 URL: https://issues.apache.org/jira/browse/HUDI-753
 Project: Apache Hudi (incubating)
  Issue Type: Task
Reporter: vinoyang
Assignee: vinoyang


Recently, the ASF infra team released a nice feature named 
[.asf.yaml|https://cwiki.apache.org/confluence/display/INFRA/.asf.yaml+features+for+git+repositories].
 We can configure and verify those config options step by step.





[GitHub] [incubator-hudi] hddong commented on a change in pull request #1471: [WIP][HUDI-752]Make CompactionAdminClient spark-free

2020-03-31 Thread GitBox
hddong commented on a change in pull request #1471: [WIP][HUDI-752]Make 
CompactionAdminClient spark-free
URL: https://github.com/apache/incubator-hudi/pull/1471#discussion_r400795505
 
 

 ##
 File path: 
hudi-client/src/main/java/org/apache/hudi/client/utils/SparkEngineUtils.java
 ##
 @@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.client.utils;
+
+import org.apache.spark.SparkContext;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.api.java.function.FlatMapFunction;
+import org.apache.spark.api.java.function.Function;
+
+import java.util.List;
+
+/**
+ * Util class for Spark Engine.
+ */
+public class SparkEngineUtils {
 
 Review comment:
   @yanghua IMO, this has the same purpose as 
[HUDI-678](https://github.com/apache/incubator-hudi/pull/1418): it just makes this 
class spark-free. Once all Spark RDD computation has moved into this class, we can 
abstract a FlinkEngine in the same way.
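The engine abstraction this comment hints at (confining the RDD calls behind an interface so that a Flink-based implementation can later sit alongside the Spark one) could be sketched roughly as follows; all interface and class names here are hypothetical:

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical sketch: client code such as CompactionAdminClient would depend
// only on this interface, while engine-specific modules supply implementations.
interface EngineUtils {
  <I, O> List<O> map(List<I> input, Function<I, O> fn);
}

// A trivial single-JVM implementation; a SparkEngineUtils would instead run
// fn over a JavaRDD, and a FlinkEngineUtils over a Flink DataSet/DataStream.
class LocalEngineUtils implements EngineUtils {
  @Override
  public <I, O> List<O> map(List<I> input, Function<I, O> fn) {
    return input.stream().map(fn).collect(Collectors.toList());
  }
}
```

The point of the sketch is only that once distributed operations are hidden behind such an interface, the calling classes themselves no longer import any Spark types.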




[GitHub] [incubator-hudi] hddong commented on issue #1471: [WIP][HUDI-752]Make CompactionAdminClient spark-free

2020-03-31 Thread GitBox
hddong commented on issue #1471: [WIP][HUDI-752]Make CompactionAdminClient 
spark-free
URL: https://github.com/apache/incubator-hudi/pull/1471#issuecomment-606522425
 
 
   @yanghua thanks, I just wanted to have a try. I will communicate first next time.




[GitHub] [incubator-hudi] yanghua commented on issue #1471: [WIP][HUDI-752]Make CompactionAdminClient spark-free

2020-03-31 Thread GitBox
yanghua commented on issue #1471: [WIP][HUDI-752]Make CompactionAdminClient 
spark-free
URL: https://github.com/apache/incubator-hudi/pull/1471#issuecomment-606516510
 
 
   @hddong Thanks for your contribution. Any time you work on the spark-free 
issue, please make sure you have communicated with @vinothchandar .




[GitHub] [incubator-hudi] yanghua commented on a change in pull request #1471: [WIP][HUDI-752]Make CompactionAdminClient spark-free

2020-03-31 Thread GitBox
yanghua commented on a change in pull request #1471: [WIP][HUDI-752]Make 
CompactionAdminClient spark-free
URL: https://github.com/apache/incubator-hudi/pull/1471#discussion_r400779018
 
 

 ##
 File path: 
hudi-client/src/main/java/org/apache/hudi/client/utils/SparkEngineUtils.java
 ##
 @@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.client.utils;
+
+import org.apache.spark.SparkContext;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.api.java.function.FlatMapFunction;
+import org.apache.spark.api.java.function.Function;
+
+import java.util.List;
+
+/**
+ * Util class for Spark Engine.
+ */
+public class SparkEngineUtils {
 
 Review comment:
   Actually, if we depend on this class, we cannot call it **spark-free**.




[jira] [Updated] (HUDI-752) Make CompactionAdminClient spark-free

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-752:

Labels: pull-request-available  (was: )

> Make CompactionAdminClient spark-free
> -
>
> Key: HUDI-752
> URL: https://issues.apache.org/jira/browse/HUDI-752
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>Reporter: hong dongdong
>Assignee: hong dongdong
>Priority: Major
>  Labels: pull-request-available
>
> Now, we always pass the jsc around, but there can be only one SparkContext in a 
> JVM. So we can store it in a factory class and then get it everywhere. After 
> that, we can make many classes spark-free.





[GitHub] [incubator-hudi] hddong opened a new pull request #1471: [HUDI-752]Make CompactionAdminClient spark-free

2020-03-31 Thread GitBox
hddong opened a new pull request #1471: [HUDI-752]Make CompactionAdminClient 
spark-free
URL: https://github.com/apache/incubator-hudi/pull/1471
 
 
   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   
   *There can be only one `SparkContext` in a JVM. So we can store it in a 
`Factory` class and then get it everywhere. After that, we can make many classes 
spark-free.*
   
   ## Brief change log
   
   *(for example:)*
 - *Make CompactionAdminClient spark-free*
   
   ## Verify this pull request
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.




[jira] [Updated] (HUDI-752) Make CompactionAdminClient spark-free

2020-03-31 Thread hong dongdong (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hong dongdong updated HUDI-752:
---
Description: Now, we always pass the jsc around, but there can be only one 
SparkContext in a JVM. So we can store it in a factory class and then get it 
everywhere. After that, we can make many classes spark-free.

> Make CompactionAdminClient spark-free
> -
>
> Key: HUDI-752
> URL: https://issues.apache.org/jira/browse/HUDI-752
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>Reporter: hong dongdong
>Assignee: hong dongdong
>Priority: Major
>
> Now, we always pass the jsc around, but there can be only one SparkContext in a 
> JVM. So we can store it in a factory class and then get it everywhere. After 
> that, we can make many classes spark-free.





[jira] [Created] (HUDI-752) Make CompactionAdminClient spark-free

2020-03-31 Thread hong dongdong (Jira)
hong dongdong created HUDI-752:
--

 Summary: Make CompactionAdminClient spark-free
 Key: HUDI-752
 URL: https://issues.apache.org/jira/browse/HUDI-752
 Project: Apache Hudi (incubating)
  Issue Type: Sub-task
Reporter: hong dongdong
Assignee: hong dongdong








[GitHub] [incubator-hudi] yanghua commented on a change in pull request #1440: [HUDI-731] Add ChainedTransformer

2020-03-31 Thread GitBox
yanghua commented on a change in pull request #1440: [HUDI-731] Add 
ChainedTransformer
URL: https://github.com/apache/incubator-hudi/pull/1440#discussion_r400675355
 
 

 ##
 File path: 
hudi-utilities/src/test/java/org/apache/hudi/utilities/TestUtilHelpers.java
 ##
 @@ -0,0 +1,101 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.utilities;
+
+import org.apache.hudi.common.config.TypedProperties;
+import org.apache.hudi.utilities.transform.ChainedTransformer;
+import org.apache.hudi.utilities.transform.Transformer;
+
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.SparkSession;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.experimental.runners.Enclosed;
+import org.junit.rules.ExpectedException;
+import org.junit.runner.RunWith;
+
+import java.io.IOException;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.List;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertNull;
+import static org.junit.Assert.assertTrue;
+
+@RunWith(Enclosed.class)
+public class TestUtilHelpers {
+
+  public static class TestCreateTransformer {
 
 Review comment:
   Ok, reasonable.




[GitHub] [incubator-hudi] yanghua commented on a change in pull request #1440: [HUDI-731] Add ChainedTransformer

2020-03-31 Thread GitBox
yanghua commented on a change in pull request #1440: [HUDI-731] Add 
ChainedTransformer
URL: https://github.com/apache/incubator-hudi/pull/1440#discussion_r400674693
 
 

 ##
 File path: 
hudi-utilities/src/test/java/org/apache/hudi/utilities/TestUtilHelpers.java
 ##
 @@ -0,0 +1,101 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.utilities;
+
+import org.apache.hudi.common.config.TypedProperties;
+import org.apache.hudi.utilities.transform.ChainedTransformer;
+import org.apache.hudi.utilities.transform.Transformer;
+
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.SparkSession;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.experimental.runners.Enclosed;
+import org.junit.rules.ExpectedException;
+import org.junit.runner.RunWith;
+
+import java.io.IOException;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.List;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertNull;
+import static org.junit.Assert.assertTrue;
+
+@RunWith(Enclosed.class)
+public class TestUtilHelpers {
 
 Review comment:
   Yes, I am a bit worried about the issues you mentioned. Another concern is 
convention: most test classes currently include Javadoc, even when they seem 
to describe themselves. Of course, this is another subjective point worth 
considering. OK, I no longer require a Javadoc here.
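As a small, hypothetical illustration of the convention under discussion (the class and method names below are invented for the example, not taken from the Hudi code base), even a seemingly self-describing helper can carry a one-line class-level Javadoc:

```java
/**
 * Example of the class-level Javadoc convention under discussion.
 * The class and method names here are invented, not from Hudi.
 */
class DescribedHelper {

  /** Applies two successive increments to {@code x}. */
  static int incrementTwice(int x) {
    return x + 1 + 1;
  }

  public static void main(String[] args) {
    System.out.println(incrementTwice(40)); // 42
  }
}
```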




[jira] [Updated] (HUDI-751) Fix some coding issues reported by FindBugs

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-751:

Labels: pull-request-available  (was: )

> Fix some coding issues reported by FindBugs
> ---
>
> Key: HUDI-751
> URL: https://issues.apache.org/jira/browse/HUDI-751
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>Reporter: Shao Feng Shi
>Priority: Major
>  Labels: pull-request-available
>
> When I go through the code base, the FindBugs plugin in my IDEA reports 
> several issues, such as:
> 1) Classes implementing "Serializable" don't declare a "serialVersionUID";
> 2) Inner classes aren't declared static;
> 3) Some static constant variables aren't marked final;
> 4) Some variables don't follow the naming convention;
> 5) A JDBC Connection resource isn't closed after use.
>  
> I fixed them quickly and will raise a pull request.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] shaofengshi opened a new pull request #1470: HUDI-751 Fix some coding issues reported by FindBugs

2020-03-31 Thread GitBox
shaofengshi opened a new pull request #1470: HUDI-751 Fix some coding issues 
reported by FindBugs
URL: https://github.com/apache/incubator-hudi/pull/1470
 
 
   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   
   This pull request fixes several coding issues reported by FindBugs, 
improving code quality.
   
   ## Brief change log
   
 - 1) Classes implementing "Serializable" didn't declare a "serialVersionUID"; 
added a "serialVersionUID" to them.
   
 - 2) Inner classes weren't declared static; marked them static.
   
 - 3) Some static constant variables weren't marked final; marked them final.
   
 - 4) Some variables didn't follow the naming convention (e.g. constants 
should be all upper case); renamed them.
   
 - 5) A JDBC Connection resource wasn't closed after use; it is now closed 
via a try-with-resources block.
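The fixes listed above are standard FindBugs remediations. A minimal, hypothetical sketch of each pattern (class and member names are invented for the example, and a simple AutoCloseable stands in for a real JDBC Connection):

```java
import java.io.Serializable;

// Hypothetical illustration of the five FindBugs fixes listed above;
// none of these names come from the Hudi code base.
class FindBugsFixes implements Serializable {

  // 1) Serializable classes declare an explicit serialVersionUID.
  private static final long serialVersionUID = 1L;

  // 3) + 4) Static constants are final and named in upper case.
  static final int MAX_RETRIES = 3;

  // 2) An inner class that never touches the outer instance is static,
  //    so it holds no hidden reference to the enclosing object.
  static class Helper {
    int doubled(int x) {
      return 2 * x;
    }
  }

  // 5) Resources are closed via try-with-resources rather than a manual
  //    close() call that can be skipped on an exception path.
  static String useAndClose(AutoCloseable resource) throws Exception {
    try (AutoCloseable r = resource) {
      return "closed";
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println(new Helper().doubled(21));           // 42
    System.out.println(useAndClose(() -> { /* no-op */ })); // closed
  }
}
```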
   
   ## Verify this pull request
   
   Running 'mvn test' passed all tests.
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   
   ## Committer checklist
   
- [x] Has a corresponding JIRA in PR title & commit

- [x] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.

