[jira] [Closed] (HUDI-580) Fix incorrect license header in files

2020-02-29 Thread Suneel Marthi (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi closed HUDI-580.
--

> Fix incorrect license header in files
> -
>
> Key: HUDI-580
> URL: https://issues.apache.org/jira/browse/HUDI-580
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: newbie
>Reporter: leesf
>Assignee: lamber-ken
>Priority: Blocker
>  Labels: compliance, pull-request-available
> Fix For: 0.5.2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Issues pointed out in general@incubator ML, more context here: 
> [https://lists.apache.org/thread.html/rd3f4a72d82a4a5a81b2c6bd71e1417054daa38637ce8e07901f26f04%40%3Cgeneral.incubator.apache.org%3E]
>  
> Would be good to get it fixed before the next release.
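
For reference, the standard ASF source header carried by Hudi files is the Apache
License 2.0 notice below, reproduced from the new Java file reviewed later in this
digest. Whether every file flagged by HUDI-580 was missing this block entirely or
carried a variant of it is not stated in the issue:

    /*
     * Licensed to the Apache Software Foundation (ASF) under one
     * or more contributor license agreements.  See the NOTICE file
     * distributed with this work for additional information
     * regarding copyright ownership.  The ASF licenses this file
     * to you under the Apache License, Version 2.0 (the
     * "License"); you may not use this file except in compliance
     * with the License.  You may obtain a copy of the License at
     *
     *      http://www.apache.org/licenses/LICENSE-2.0
     *
     * Unless required by applicable law or agreed to in writing, software
     * distributed under the License is distributed on an "AS IS" BASIS,
     * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
     * See the License for the specific language governing permissions and
     * limitations under the License.
     */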



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-605) Avoid calculating the size of schema redundantly

2020-02-29 Thread Suneel Marthi (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi closed HUDI-605.
--

> Avoid calculating the size of schema redundantly  
> --
>
> Key: HUDI-605
> URL: https://issues.apache.org/jira/browse/HUDI-605
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Common Core
>Reporter: lamber-ken
>Assignee: lamber-ken
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.5.2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Avoid calculating the size of schema redundantly.
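
The summary above is terse; as a purely illustrative Java sketch of the idea, the
schema size can be computed once and reused instead of being recalculated for every
record. The class and method names below are invented for illustration and are not
the actual Hudi patch:

    import org.apache.avro.Schema;

    // Illustrative only: cache the result of an expensive schema-size
    // computation instead of repeating it for every record processed.
    public class CachedSchemaSizeEstimator {
      private final Schema schema;
      private long cachedSchemaSize = -1L; // computed lazily, then reused

      public CachedSchemaSizeEstimator(Schema schema) {
        this.schema = schema;
      }

      public long schemaSize() {
        if (cachedSchemaSize < 0) {
          // Stand-in for whatever size calculation was previously repeated.
          cachedSchemaSize = schema.toString().length();
        }
        return cachedSchemaSize;
      }
    }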



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[incubator-hudi] branch release-0.5.2 created (now afaf4ba)

2020-02-29 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a change to branch release-0.5.2
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git.


  at afaf4ba  Create release branch for version 0.5.2.

This branch includes the following new commits:

 new afaf4ba  Create release branch for version 0.5.2.

The 1 revision listed above as "new" is entirely new to this
repository and will be described in a separate email.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.




[incubator-hudi] branch master updated: Moving to 0.6.0-SNAPSHOT on master branch.

2020-02-29 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 0dc8e49  Moving to 0.6.0-SNAPSHOT on master branch.
0dc8e49 is described below

commit 0dc8e493aa1658910a3519df3941278d9d072c18
Author: yanghua 
AuthorDate: Sun Mar 1 15:08:30 2020 +0800

Moving to 0.6.0-SNAPSHOT on master branch.
---
 docker/hoodie/hadoop/base/pom.xml | 2 +-
 docker/hoodie/hadoop/datanode/pom.xml | 2 +-
 docker/hoodie/hadoop/historyserver/pom.xml| 2 +-
 docker/hoodie/hadoop/hive_base/pom.xml| 2 +-
 docker/hoodie/hadoop/namenode/pom.xml | 2 +-
 docker/hoodie/hadoop/pom.xml  | 2 +-
 docker/hoodie/hadoop/prestobase/pom.xml   | 2 +-
 docker/hoodie/hadoop/spark_base/pom.xml   | 2 +-
 docker/hoodie/hadoop/sparkadhoc/pom.xml   | 2 +-
 docker/hoodie/hadoop/sparkmaster/pom.xml  | 2 +-
 docker/hoodie/hadoop/sparkworker/pom.xml  | 2 +-
 hudi-cli/pom.xml  | 2 +-
 hudi-client/pom.xml   | 2 +-
 hudi-common/pom.xml   | 2 +-
 hudi-hadoop-mr/pom.xml| 2 +-
 hudi-hive/pom.xml | 2 +-
 hudi-integ-test/pom.xml   | 2 +-
 hudi-spark/pom.xml| 2 +-
 hudi-timeline-service/pom.xml | 2 +-
 hudi-utilities/pom.xml| 2 +-
 packaging/hudi-hadoop-mr-bundle/pom.xml   | 2 +-
 packaging/hudi-hive-bundle/pom.xml| 2 +-
 packaging/hudi-presto-bundle/pom.xml  | 2 +-
 packaging/hudi-spark-bundle/pom.xml   | 2 +-
 packaging/hudi-timeline-server-bundle/pom.xml | 2 +-
 packaging/hudi-utilities-bundle/pom.xml   | 2 +-
 pom.xml   | 2 +-
 27 files changed, 27 insertions(+), 27 deletions(-)

diff --git a/docker/hoodie/hadoop/base/pom.xml 
b/docker/hoodie/hadoop/base/pom.xml
index 0cbd377..55205ee 100644
--- a/docker/hoodie/hadoop/base/pom.xml
+++ b/docker/hoodie/hadoop/base/pom.xml
@@ -19,7 +19,7 @@
   
 hudi-hadoop-docker
 org.apache.hudi
-0.5.2-SNAPSHOT
+0.6.0-SNAPSHOT
   
   4.0.0
   pom
diff --git a/docker/hoodie/hadoop/datanode/pom.xml 
b/docker/hoodie/hadoop/datanode/pom.xml
index 034aebe..e8c95f9 100644
--- a/docker/hoodie/hadoop/datanode/pom.xml
+++ b/docker/hoodie/hadoop/datanode/pom.xml
@@ -19,7 +19,7 @@
   
 hudi-hadoop-docker
 org.apache.hudi
-0.5.2-SNAPSHOT
+0.6.0-SNAPSHOT
   
   4.0.0
   pom
diff --git a/docker/hoodie/hadoop/historyserver/pom.xml 
b/docker/hoodie/hadoop/historyserver/pom.xml
index b41ca5c..725cdcf 100644
--- a/docker/hoodie/hadoop/historyserver/pom.xml
+++ b/docker/hoodie/hadoop/historyserver/pom.xml
@@ -19,7 +19,7 @@
   
 hudi-hadoop-docker
 org.apache.hudi
-0.5.2-SNAPSHOT
+0.6.0-SNAPSHOT
   
   4.0.0
   pom
diff --git a/docker/hoodie/hadoop/hive_base/pom.xml 
b/docker/hoodie/hadoop/hive_base/pom.xml
index d65e230..04aac75 100644
--- a/docker/hoodie/hadoop/hive_base/pom.xml
+++ b/docker/hoodie/hadoop/hive_base/pom.xml
@@ -19,7 +19,7 @@
   
 hudi-hadoop-docker
 org.apache.hudi
-0.5.2-SNAPSHOT
+0.6.0-SNAPSHOT
   
   4.0.0
   pom
diff --git a/docker/hoodie/hadoop/namenode/pom.xml 
b/docker/hoodie/hadoop/namenode/pom.xml
index c35ff45..4ec1f9a 100644
--- a/docker/hoodie/hadoop/namenode/pom.xml
+++ b/docker/hoodie/hadoop/namenode/pom.xml
@@ -19,7 +19,7 @@
   
 hudi-hadoop-docker
 org.apache.hudi
-0.5.2-SNAPSHOT
+0.6.0-SNAPSHOT
   
   4.0.0
   pom
diff --git a/docker/hoodie/hadoop/pom.xml b/docker/hoodie/hadoop/pom.xml
index e2d0482..bedd3b4 100644
--- a/docker/hoodie/hadoop/pom.xml
+++ b/docker/hoodie/hadoop/pom.xml
@@ -19,7 +19,7 @@
   
 hudi
 org.apache.hudi
-0.5.2-SNAPSHOT
+0.6.0-SNAPSHOT
 ../../../pom.xml
   
   4.0.0
diff --git a/docker/hoodie/hadoop/prestobase/pom.xml 
b/docker/hoodie/hadoop/prestobase/pom.xml
index fd96e21..2ba319c 100644
--- a/docker/hoodie/hadoop/prestobase/pom.xml
+++ b/docker/hoodie/hadoop/prestobase/pom.xml
@@ -22,7 +22,7 @@
   
 hudi-hadoop-docker
 org.apache.hudi
-0.5.2-SNAPSHOT
+0.6.0-SNAPSHOT
   
   4.0.0
   pom
diff --git a/docker/hoodie/hadoop/spark_base/pom.xml 
b/docker/hoodie/hadoop/spark_base/pom.xml
index e9a4d5a..6385305 100644
--- a/docker/hoodie/hadoop/spark_base/pom.xml
+++ b/docker/hoodie/hadoop/spark_base/pom.xml
@@ -19,7 +19,7 @@
   
 hudi-hadoop-docker
 org.apache.hudi
-0.5.2-SNAPSHOT
+0.6.0-SNAPSHOT
   
   4.0.0
   pom
diff --git a/docker/hoodie/hadoop/sparkadhoc/pom.xml 
b/docker/hoodie/hadoop/sparkadhoc/pom.xml
index 1e008e5..c1babf4 100644
--- a/docker/hoodie/hadoop/sparkadhoc/pom.xml
+++ b/docker/hoodie/hadoop/sparkadhoc/pom.xml
@@ -19,7 +19,7 @@
   
 hudi-hadoop-docker
 org.apache.hudi
-

[GitHub] [incubator-hudi] codecov-io commented on issue #1364: [HUDI-599] Update release guide & release scripts due to the change of scala 2.12 build

2020-02-29 Thread GitBox
codecov-io commented on issue #1364: [HUDI-599] Update release guide & release 
scripts due to the change of scala 2.12 build
URL: https://github.com/apache/incubator-hudi/pull/1364#issuecomment-593063573
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1364?src=pr=h1) 
Report
   > Merging 
[#1364](https://codecov.io/gh/apache/incubator-hudi/pull/1364?src=pr=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/acf359c834bc1d9b9c4ea64d362ea20c7410c70a?src=pr=desc)
 will **decrease** coverage by `0.01%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1364/graphs/tree.svg?width=650=VTTXabwbs2=150=pr)](https://codecov.io/gh/apache/incubator-hudi/pull/1364?src=pr=tree)
   
    ```diff
    @@             Coverage Diff              @@
    ##             master    #1364      +/-   ##
    ============================================
    - Coverage     67.09%    67.08%     -0.02%
      Complexity      223       223
    ============================================
      Files           333       333
      Lines         16207     16207
      Branches       1657      1657
    ============================================
    - Hits          10874     10872         -2
    - Misses         4597      4598         +1
    - Partials        736       737         +1
    ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-hudi/pull/1364?src=pr=tree) | 
Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...i/utilities/deltastreamer/HoodieDeltaStreamer.java](https://codecov.io/gh/apache/incubator-hudi/pull/1364/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvSG9vZGllRGVsdGFTdHJlYW1lci5qYXZh)
 | `79.79% <0%> (-1.02%)` | `8% <0%> (ø)` | |
   | 
[...che/hudi/common/util/BufferedRandomAccessFile.java](https://codecov.io/gh/apache/incubator-hudi/pull/1364/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvQnVmZmVyZWRSYW5kb21BY2Nlc3NGaWxlLmphdmE=)
 | `54.38% <0%> (-0.88%)` | `0% <0%> (ø)` | |
   | 
[...a/org/apache/hudi/common/util/collection/Pair.java](https://codecov.io/gh/apache/incubator-hudi/pull/1364/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvY29sbGVjdGlvbi9QYWlyLmphdmE=)
 | `76% <0%> (+4%)` | `0% <0%> (ø)` | :arrow_down: |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1364?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1364?src=pr=footer).
 Last update 
[acf359c...b972b17](https://codecov.io/gh/apache/incubator-hudi/pull/1364?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Closed] (HUDI-599) Update release guide & release scripts due to the change of scala 2.12 build

2020-02-29 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-599.
--

> Update release guide & release scripts due to the change of scala 2.12 build
> 
>
> Key: HUDI-599
> URL: https://issues.apache.org/jira/browse/HUDI-599
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Release  Administrative
>Reporter: leesf
>Assignee: leesf
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.5.2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Update release guide due to the change of scala 2.12 build, PR link below
> [https://github.com/apache/incubator-hudi/pull/1293]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-599) Update release guide & release scripts due to the change of scala 2.12 build

2020-02-29 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf resolved HUDI-599.

Resolution: Fixed

Fixed via master: 0cde27e63c2cf9b70f24f0ae6b63fad9259b28d3, and updated the
release guide accordingly.

> Update release guide & release scripts due to the change of scala 2.12 build
> 
>
> Key: HUDI-599
> URL: https://issues.apache.org/jira/browse/HUDI-599
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Release  Administrative
>Reporter: leesf
>Assignee: leesf
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.5.2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Update release guide due to the change of scala 2.12 build, PR link below
> [https://github.com/apache/incubator-hudi/pull/1293]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[incubator-hudi] branch master updated: [MINOR] Fix cut_release_branch script missed a double quotation marks (#1365)

2020-02-29 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 9160084  [MINOR] Fix cut_release_branch script missed a double 
quotation marks (#1365)
9160084 is described below

commit 9160084bb147552d670235d27ecb198239ee32e5
Author: vinoyang 
AuthorDate: Sun Mar 1 14:34:15 2020 +0800

[MINOR] Fix cut_release_branch script missed a double quotation marks 
(#1365)
---
 scripts/release/cut_release_branch.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/release/cut_release_branch.sh 
b/scripts/release/cut_release_branch.sh
index 8e6e923..f8f0d9f 100755
--- a/scripts/release/cut_release_branch.sh
+++ b/scripts/release/cut_release_branch.sh
@@ -73,7 +73,7 @@ echo "next_release: ${NEXT_VERSION_IN_BASE_BRANCH}"
 echo "working master branch: ${MASTER_BRANCH}"
 echo "working release branch: ${RELEASE_BRANCH}"
 echo "local repo dir: ~/${LOCAL_CLONE_DIR}/${HUDI_ROOT_DIR}"
-echo "RC_NUM: $RC_NUM
+echo "RC_NUM: $RC_NUM"
 echo "==="
 
 cd ~



[GitHub] [incubator-hudi] yanghua merged pull request #1365: [MINOR] Fix cut_release_branch script missed a double quotation marks

2020-02-29 Thread GitBox
yanghua merged pull request #1365: [MINOR] Fix cut_release_branch script missed 
a double quotation marks
URL: https://github.com/apache/incubator-hudi/pull/1365
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Updated] (HUDI-599) Update release guide & release scripts due to the change of scala 2.12 build

2020-02-29 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-599:
---
Status: Open  (was: New)

> Update release guide & release scripts due to the change of scala 2.12 build
> 
>
> Key: HUDI-599
> URL: https://issues.apache.org/jira/browse/HUDI-599
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Release  Administrative
>Reporter: leesf
>Assignee: leesf
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.5.2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Update release guide due to the change of scala 2.12 build, PR link below
> [https://github.com/apache/incubator-hudi/pull/1293]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] yanghua opened a new pull request #1365: [MINOR] Fix cut_release_branch script missed a double quotation marks

2020-02-29 Thread GitBox
yanghua opened a new pull request #1365: [MINOR] Fix cut_release_branch script 
missed a double quotation marks
URL: https://github.com/apache/incubator-hudi/pull/1365
 
 
   ## What is the purpose of the pull request
   
   *This pull request fixes a missing closing double quotation mark in the 
cut_release_branch script.*
   
   ## Brief change log
   
 - *Fix a missing closing double quotation mark in the cut_release_branch script*
   
   ## Verify this pull request
   
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] yanghua merged pull request #1364: [HUDI-599] Update release guide & release scripts due to the change of scala 2.12 build

2020-02-29 Thread GitBox
yanghua merged pull request #1364: [HUDI-599] Update release guide & release 
scripts due to the change of scala 2.12 build
URL: https://github.com/apache/incubator-hudi/pull/1364
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[incubator-hudi] branch master updated: [HUDI-599] Update release guide & release scripts due to the change of scala 2.12 build (#1364)

2020-02-29 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 0cde27e  [HUDI-599] Update release guide & release scripts due to the 
change of scala 2.12 build (#1364)
0cde27e is described below

commit 0cde27e63c2cf9b70f24f0ae6b63fad9259b28d3
Author: leesf <490081...@qq.com>
AuthorDate: Sun Mar 1 14:30:32 2020 +0800

[HUDI-599] Update release guide & release scripts due to the change of 
scala 2.12 build (#1364)
---
 scripts/release/deploy_staging_jars.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/release/deploy_staging_jars.sh 
b/scripts/release/deploy_staging_jars.sh
index b02a7d4..885c25f 100755
--- a/scripts/release/deploy_staging_jars.sh
+++ b/scripts/release/deploy_staging_jars.sh
@@ -54,5 +54,5 @@ cd ..
 
 echo "Deploying to repository.apache.org with scala version ${SCALA_VERSION}"
 
-COMMON_OPTIONS="-Pscala-${SCALA_VERSION} -Prelease -DskipTests 
-DretryFailedDeploymentCount=10 -DdeployArtifacts=true"
+COMMON_OPTIONS="-Dscala-${SCALA_VERSION} -Prelease -DskipTests 
-DretryFailedDeploymentCount=10 -DdeployArtifacts=true"
 $MVN clean deploy $COMMON_OPTIONS



[jira] [Updated] (HUDI-599) Update release guide & release scripts due to the change of scala 2.12 build

2020-02-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-599:

Labels: pull-request-available  (was: )

> Update release guide & release scripts due to the change of scala 2.12 build
> 
>
> Key: HUDI-599
> URL: https://issues.apache.org/jira/browse/HUDI-599
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Release  Administrative
>Reporter: leesf
>Assignee: leesf
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.5.2
>
>
> Update release guide due to the change of scala 2.12 build, PR link below
> [https://github.com/apache/incubator-hudi/pull/1293]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] leesf opened a new pull request #1364: [HUDI-599] Update release guide & release scripts due to the change of scala 2.12 build

2020-02-29 Thread GitBox
leesf opened a new pull request #1364: [HUDI-599] Update release guide & 
release scripts due to the change of scala 2.12 build
URL: https://github.com/apache/incubator-hudi/pull/1364
 
 
   
   ## Brief change log
   
   update release scripts due to 
https://github.com/apache/incubator-hudi/pull/1293
   
   ## Verify this pull request
   
   run ./deploy_staging_jars.sh
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


svn commit: r38340 - /release/incubator/hudi/KEYS

2020-02-29 Thread vinoyang
Author: vinoyang
Date: Sun Mar  1 05:47:02 2020
New Revision: 38340

Log:
Update KEYS file

Modified:
release/incubator/hudi/KEYS

Modified: release/incubator/hudi/KEYS
==
--- release/incubator/hudi/KEYS (original)
+++ release/incubator/hudi/KEYS Sun Mar  1 05:47:02 2020
@@ -332,3 +332,40 @@ ePILQEvmZ8GgwevHx170WUjKWBpLLSFl4zgXIl9Q
 =FZo/
 -END PGP PUBLIC KEY BLOCK-
 
+pub   rsa2048 2020-03-01 [SC]
+  C3A96EC77149571AE89F82764C86684D047DE03C
+uid   [ultimate] vinoyang (apache gpg) 
+sig 34C86684D047DE03C 2020-03-01  vinoyang (apache gpg) 

+sub   rsa2048 2020-03-01 [E]
+sig  4C86684D047DE03C 2020-03-01  vinoyang (apache gpg) 

+
+-BEGIN PGP PUBLIC KEY BLOCK-
+
+mQENBF5bOScBCADbO+U5FnwUK2Nf+NjwTRK2HZYsckLbjpO9we6OuASbYmWSqMhu
+qdhGgwu37E7qjQW4nE8s1hzOPrVUlSzFhXTRjMmRtR1wviWs1ibwH/AQA0HGH9jE
+1p9O6Dy6tASSE0Hd+rxUaFmYxabC6BLopJccD9ANhkhnH9UVbHSDn5RXVV7eEvy5
+wzFHCQavqj4oMAGRUXIHl3uh4U9ROiAU3UQRkZ3pnf3R6tbCtw1Jv/NcWztaLQLm
+aMkr3hSFrrZ8xR2hAOhOs4+Y+i3bELg9bBmvz1lCQRYDEY82lugPyqpdKc98KTAw
+gVgQoXjJhw2+bC/4KTaXwwAcexuh7YfcNJAvABEBAAG0K3Zpbm95YW5nIChhcGFj
+aGUgZ3BnKSA8dmlub3lhbmdAYXBhY2hlLm9yZz6JAU4EEwEIADgWIQTDqW7HcUlX
+GuifgnZMhmhNBH3gPAUCXls5JwIbAwULCQgHAgYVCgkICwIEFgIDAQIeAQIXgAAK
+CRBMhmhNBH3gPJvxB/9gRVhXMXJXH7Z1JNf7zjdRXbdtQgMr2/WVLnbAevjU+Btf
+gHO+KllJZoFXFhHJWSRld7OaNJC0k9V9UyMgOBS1hvCRFEbMH8mRisux4JDx9/F6
+0/gWCzoK3pY2EevzDtr5bOx6C4GPoSjAfBUifTB5YixnX57ePxLnZmeeME+dQ+/1
+fMkpR6a5Eo4sLiVOwhBgYMGKHr6GZSVd23CSyPVxDKuImNfehZQLCtNq8LiKFTY9
+tGhInVdbLUJCPiXoxIKpinVtYxUGoW8LHUFaVq3BenG4wKx5s/ImU824pAtWJ0CK
+NHbzwF7oPckfM1hdT3ZycfUK1qoX6FoEVtZZ1QxTuQENBF5bOScBCADKl9sswehe
+S+2zrhR1C5U3gNqZYD/MSIdx3K2k//BjweYZCqxtzR1J9JtitrA0WJKF8NnK2dF+
+FkpC1iDduvAAZXw94tGb9qyTeSXhZ4gFAxfbRwthEhP0GYya1bhtM+gi2zOW+tsp
+KSYwCBUoAk8PKI3ZPyiWJNhlmsOolSg4IF50PWzhXet0t+OeJaBGNffdfERF7TF/
+y1lBu5RLiLxUDYc5tV80dA3MNLDkCKW16OlCAkxH8+IkZ5Z2eprDaFDBwDo0/5jk
+pET8XBCjBReaFsleYBkZmwdbzeurkj8sTa0GQZKdeBynciDqbREmWulkkTp9jGTv
+lRarr3woa2f7ABEBAAGJATYEGAEIACAWIQTDqW7HcUlXGuifgnZMhmhNBH3gPAUC
+Xls5JwIbDAAKCRBMhmhNBH3gPM9FCACO9+sqdi7wkp8asbpS6WzjZ0FS3KbW3IoW
+QgbVx9t4mB4cGq91h6CnbDGZnr2qlRKwCCAijuUfBTPER8lzyltOVos22FbHXWa+
+Oqicjn336aysnFZuNTvnvYsWvlwvW5AAVCZn4YfE0qYB6oHCBZLdg4YFQRx6U1t5
+CXIaSBYhtOhp0VJ4+0X9chmMmSpJayutFaykU2AnZwLe8a5EppT/NXe6db1oV/c5
+k2TGkmCbCkVobp4AElQ28fQ/sAYtVWLO6wGEpFH/HWUeMjXTsun2mY25jVr5X4CJ
+FA3V4MZ7SGwKjZZa6oep6lPoig/R4MfsDwQ2zW/vLFPel1am406v
+=Z0T0
+-END PGP PUBLIC KEY BLOCK-




svn commit: r38339 - /dev/incubator/hudi/KEYS

2020-02-29 Thread vinoyang
Author: vinoyang
Date: Sun Mar  1 04:49:47 2020
New Revision: 38339

Log:
Update KEYS file in release repo

Modified:
dev/incubator/hudi/KEYS

Modified: dev/incubator/hudi/KEYS
==
--- dev/incubator/hudi/KEYS (original)
+++ dev/incubator/hudi/KEYS Sun Mar  1 04:49:47 2020
@@ -332,3 +332,40 @@ ePILQEvmZ8GgwevHx170WUjKWBpLLSFl4zgXIl9Q
 =FZo/
 -END PGP PUBLIC KEY BLOCK-
 
+pub   rsa2048 2020-03-01 [SC]
+  C3A96EC77149571AE89F82764C86684D047DE03C
+uid   [ultimate] vinoyang (apache gpg) 
+sig 34C86684D047DE03C 2020-03-01  vinoyang (apache gpg) 

+sub   rsa2048 2020-03-01 [E]
+sig  4C86684D047DE03C 2020-03-01  vinoyang (apache gpg) 

+
+-BEGIN PGP PUBLIC KEY BLOCK-
+
+mQENBF5bOScBCADbO+U5FnwUK2Nf+NjwTRK2HZYsckLbjpO9we6OuASbYmWSqMhu
+qdhGgwu37E7qjQW4nE8s1hzOPrVUlSzFhXTRjMmRtR1wviWs1ibwH/AQA0HGH9jE
+1p9O6Dy6tASSE0Hd+rxUaFmYxabC6BLopJccD9ANhkhnH9UVbHSDn5RXVV7eEvy5
+wzFHCQavqj4oMAGRUXIHl3uh4U9ROiAU3UQRkZ3pnf3R6tbCtw1Jv/NcWztaLQLm
+aMkr3hSFrrZ8xR2hAOhOs4+Y+i3bELg9bBmvz1lCQRYDEY82lugPyqpdKc98KTAw
+gVgQoXjJhw2+bC/4KTaXwwAcexuh7YfcNJAvABEBAAG0K3Zpbm95YW5nIChhcGFj
+aGUgZ3BnKSA8dmlub3lhbmdAYXBhY2hlLm9yZz6JAU4EEwEIADgWIQTDqW7HcUlX
+GuifgnZMhmhNBH3gPAUCXls5JwIbAwULCQgHAgYVCgkICwIEFgIDAQIeAQIXgAAK
+CRBMhmhNBH3gPJvxB/9gRVhXMXJXH7Z1JNf7zjdRXbdtQgMr2/WVLnbAevjU+Btf
+gHO+KllJZoFXFhHJWSRld7OaNJC0k9V9UyMgOBS1hvCRFEbMH8mRisux4JDx9/F6
+0/gWCzoK3pY2EevzDtr5bOx6C4GPoSjAfBUifTB5YixnX57ePxLnZmeeME+dQ+/1
+fMkpR6a5Eo4sLiVOwhBgYMGKHr6GZSVd23CSyPVxDKuImNfehZQLCtNq8LiKFTY9
+tGhInVdbLUJCPiXoxIKpinVtYxUGoW8LHUFaVq3BenG4wKx5s/ImU824pAtWJ0CK
+NHbzwF7oPckfM1hdT3ZycfUK1qoX6FoEVtZZ1QxTuQENBF5bOScBCADKl9sswehe
+S+2zrhR1C5U3gNqZYD/MSIdx3K2k//BjweYZCqxtzR1J9JtitrA0WJKF8NnK2dF+
+FkpC1iDduvAAZXw94tGb9qyTeSXhZ4gFAxfbRwthEhP0GYya1bhtM+gi2zOW+tsp
+KSYwCBUoAk8PKI3ZPyiWJNhlmsOolSg4IF50PWzhXet0t+OeJaBGNffdfERF7TF/
+y1lBu5RLiLxUDYc5tV80dA3MNLDkCKW16OlCAkxH8+IkZ5Z2eprDaFDBwDo0/5jk
+pET8XBCjBReaFsleYBkZmwdbzeurkj8sTa0GQZKdeBynciDqbREmWulkkTp9jGTv
+lRarr3woa2f7ABEBAAGJATYEGAEIACAWIQTDqW7HcUlXGuifgnZMhmhNBH3gPAUC
+Xls5JwIbDAAKCRBMhmhNBH3gPM9FCACO9+sqdi7wkp8asbpS6WzjZ0FS3KbW3IoW
+QgbVx9t4mB4cGq91h6CnbDGZnr2qlRKwCCAijuUfBTPER8lzyltOVos22FbHXWa+
+Oqicjn336aysnFZuNTvnvYsWvlwvW5AAVCZn4YfE0qYB6oHCBZLdg4YFQRx6U1t5
+CXIaSBYhtOhp0VJ4+0X9chmMmSpJayutFaykU2AnZwLe8a5EppT/NXe6db1oV/c5
+k2TGkmCbCkVobp4AElQ28fQ/sAYtVWLO6wGEpFH/HWUeMjXTsun2mY25jVr5X4CJ
+FA3V4MZ7SGwKjZZa6oep6lPoig/R4MfsDwQ2zW/vLFPel1am406v
+=Z0T0
+-END PGP PUBLIC KEY BLOCK-




[GitHub] [incubator-hudi] codecov-io commented on issue #1360: [HUDI-344][RFC-09] Hudi Dataset Snapshot Exporter

2020-02-29 Thread GitBox
codecov-io commented on issue #1360: [HUDI-344][RFC-09] Hudi Dataset Snapshot 
Exporter
URL: https://github.com/apache/incubator-hudi/pull/1360#issuecomment-593050937
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1360?src=pr=h1) 
Report
   > :exclamation: No coverage uploaded for pull request base 
(`master@b7f35be`). [Click here to learn what that 
means](https://docs.codecov.io/docs/error-reference#section-missing-base-commit).
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1360/graphs/tree.svg?width=650=VTTXabwbs2=150=pr)](https://codecov.io/gh/apache/incubator-hudi/pull/1360?src=pr=tree)
   
    ```diff
    @@            Coverage Diff             @@
    ##             master    #1360    +/-   ##
    =========================================
      Coverage          ?    0.64%
      Complexity        ?        2
    =========================================
      Files             ?      287
      Lines             ?    14310
      Branches          ?     1463
    =========================================
      Hits              ?       92
      Misses            ?    14215
      Partials          ?        3
    ```
   
   
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1360?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1360?src=pr=footer).
 Last update 
[b7f35be...d6ffad9](https://codecov.io/gh/apache/incubator-hudi/pull/1360?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] OpenOpened commented on issue #1360: [HUDI-344][RFC-09] Hudi Dataset Snapshot Exporter

2020-02-29 Thread GitBox
OpenOpened commented on issue #1360: [HUDI-344][RFC-09] Hudi Dataset Snapshot 
Exporter
URL: https://github.com/apache/incubator-hudi/pull/1360#issuecomment-593049426
 
 
   @xushiyan Based on your suggestions, I have optimized the code; please check 
it again. Thanks.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


Build failed in Jenkins: hudi-snapshot-deployment-0.5 #203

2020-02-29 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 2.34 KB...]
/home/jenkins/tools/maven/apache-maven-3.5.4/conf:
logging
settings.xml
toolchains.xml

/home/jenkins/tools/maven/apache-maven-3.5.4/conf/logging:
simplelogger.properties

/home/jenkins/tools/maven/apache-maven-3.5.4/lib:
aopalliance-1.0.jar
cdi-api-1.0.jar
cdi-api.license
commons-cli-1.4.jar
commons-cli.license
commons-io-2.5.jar
commons-io.license
commons-lang3-3.5.jar
commons-lang3.license
ext
guava-20.0.jar
guice-4.2.0-no_aop.jar
jansi-1.17.1.jar
jansi-native
javax.inject-1.jar
jcl-over-slf4j-1.7.25.jar
jcl-over-slf4j.license
jsr250-api-1.0.jar
jsr250-api.license
maven-artifact-3.5.4.jar
maven-artifact.license
maven-builder-support-3.5.4.jar
maven-builder-support.license
maven-compat-3.5.4.jar
maven-compat.license
maven-core-3.5.4.jar
maven-core.license
maven-embedder-3.5.4.jar
maven-embedder.license
maven-model-3.5.4.jar
maven-model-builder-3.5.4.jar
maven-model-builder.license
maven-model.license
maven-plugin-api-3.5.4.jar
maven-plugin-api.license
maven-repository-metadata-3.5.4.jar
maven-repository-metadata.license
maven-resolver-api-1.1.1.jar
maven-resolver-api.license
maven-resolver-connector-basic-1.1.1.jar
maven-resolver-connector-basic.license
maven-resolver-impl-1.1.1.jar
maven-resolver-impl.license
maven-resolver-provider-3.5.4.jar
maven-resolver-provider.license
maven-resolver-spi-1.1.1.jar
maven-resolver-spi.license
maven-resolver-transport-wagon-1.1.1.jar
maven-resolver-transport-wagon.license
maven-resolver-util-1.1.1.jar
maven-resolver-util.license
maven-settings-3.5.4.jar
maven-settings-builder-3.5.4.jar
maven-settings-builder.license
maven-settings.license
maven-shared-utils-3.2.1.jar
maven-shared-utils.license
maven-slf4j-provider-3.5.4.jar
maven-slf4j-provider.license
org.eclipse.sisu.inject-0.3.3.jar
org.eclipse.sisu.inject.license
org.eclipse.sisu.plexus-0.3.3.jar
org.eclipse.sisu.plexus.license
plexus-cipher-1.7.jar
plexus-cipher.license
plexus-component-annotations-1.7.1.jar
plexus-component-annotations.license
plexus-interpolation-1.24.jar
plexus-interpolation.license
plexus-sec-dispatcher-1.4.jar
plexus-sec-dispatcher.license
plexus-utils-3.1.0.jar
plexus-utils.license
slf4j-api-1.7.25.jar
slf4j-api.license
wagon-file-3.1.0.jar
wagon-file.license
wagon-http-3.1.0-shaded.jar
wagon-http.license
wagon-provider-api-3.1.0.jar
wagon-provider-api.license

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/ext:
README.txt

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native:
freebsd32
freebsd64
linux32
linux64
osx
README.txt
windows32
windows64

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/osx:
libjansi.jnilib

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows32:
jansi.dll

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows64:
jansi.dll
Finished /home/jenkins/tools/maven/apache-maven-3.5.4 Directory Listing :
Detected current version as: 
'HUDI_home=
0.5.2-SNAPSHOT'
[INFO] Scanning for projects...
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-spark_2.11:jar:0.5.2-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 
org.apache.hudi:hudi-spark_${scala.binary.version}:[unknown-version], 

 line 26, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-timeline-service:jar:0.5.2-SNAPSHOT
[WARNING] 'build.plugins.plugin.(groupId:artifactId)' must be unique but found 
duplicate declaration of plugin org.jacoco:jacoco-maven-plugin @ 
org.apache.hudi:hudi-timeline-service:[unknown-version], 

 line 58, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-utilities_2.11:jar:0.5.2-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 
org.apache.hudi:hudi-utilities_${scala.binary.version}:[unknown-version], 

 line 26, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-spark-bundle_2.11:jar:0.5.2-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 

[GitHub] [incubator-hudi] OpenOpened commented on a change in pull request #1360: [HUDI-344][RFC-09] Hudi Dataset Snapshot Exporter

2020-02-29 Thread GitBox
OpenOpened commented on a change in pull request #1360: [HUDI-344][RFC-09] Hudi 
Dataset Snapshot Exporter
URL: https://github.com/apache/incubator-hudi/pull/1360#discussion_r386073635
 
 

 ##
 File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieSnapshotExporter.java
 ##
 @@ -0,0 +1,219 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities;
+
+import com.beust.jcommander.JCommander;
+import com.beust.jcommander.Parameter;
+
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.FileUtil;
+import org.apache.hadoop.fs.Path;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.common.SerializableConfiguration;
+import org.apache.hudi.common.model.HoodiePartitionMetadata;
+import org.apache.hudi.common.table.HoodieTableConfig;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.HoodieTimeline;
+import org.apache.hudi.common.table.TableFileSystemView;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.view.HoodieTableFileSystemView;
+import org.apache.hudi.common.util.FSUtils;
+import org.apache.hudi.common.util.Option;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.sql.Column;
+import org.apache.spark.sql.SaveMode;
+import org.apache.spark.sql.SparkSession;
+
+import scala.Tuple2;
+import scala.collection.JavaConversions;
+
+import java.io.IOException;
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.stream.Collectors;
+
+/**
+ * Export the latest records of Hudi dataset to a set of external files (e.g., 
plain parquet files).
+ */
+
+public class HoodieSnapshotExporter {
+  private static final Logger LOG = 
LogManager.getLogger(HoodieSnapshotExporter.class);
+
+  public static class Config implements Serializable {
+@Parameter(names = {"--source-base-path", "-sbp"}, description = "Base 
path for the source Hudi dataset to be snapshotted", required = true)
+String basePath = null;
+
+@Parameter(names = {"--target-base-path", "-tbp"}, description = "Base 
path for the target output files (snapshots)", required = true)
+String outputPath = null;
+
+@Parameter(names = {"--snapshot-prefix", "-sp"}, description = "Snapshot 
prefix or directory under the target base path in order to segregate different 
snapshots")
+String snapshotPrefix;
+
+@Parameter(names = {"--output-format", "-of"}, description = "e.g. Hudi or 
Parquet", required = true)
+String outputFormat;
+
+@Parameter(names = {"--output-partition-field", "-opf"}, description = "A 
field to be used by Spark repartitioning")
+String outputPartitionField;
+  }
+
+  public void export(SparkSession spark, Config cfg) throws IOException {
+String sourceBasePath = cfg.basePath;
+String targetBasePath = cfg.outputPath;
+String snapshotPrefix = cfg.snapshotPrefix;
+String outputFormat = cfg.outputFormat;
+String outputPartitionField = cfg.outputPartitionField;
+JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());
+FileSystem fs = FSUtils.getFs(sourceBasePath, jsc.hadoopConfiguration());
+
+final SerializableConfiguration serConf = new 
SerializableConfiguration(jsc.hadoopConfiguration());
+final HoodieTableMetaClient tableMetadata = new 
HoodieTableMetaClient(fs.getConf(), sourceBasePath);
+final TableFileSystemView.BaseFileOnlyView fsView = new 
HoodieTableFileSystemView(tableMetadata,
+
tableMetadata.getActiveTimeline().getCommitsTimeline().filterCompletedInstants());
+// Get the latest commit
+Option latestCommit =
+
tableMetadata.getActiveTimeline().getCommitsTimeline().filterCompletedInstants().lastInstant();
+if (!latestCommit.isPresent()) {
+  LOG.warn("No commits present. Nothing to snapshot");
+  return;
+}
+final String latestCommitTimestamp = latestCommit.get().getTimestamp();
+LOG.info(String.format("Starting to snapshot latest version files which 
are also 

[jira] [Updated] (HUDI-344) Hudi Dataset Snapshot Exporter

2020-02-29 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-344:

Fix Version/s: 0.6.0

> Hudi Dataset Snapshot Exporter
> --
>
> Key: HUDI-344
> URL: https://issues.apache.org/jira/browse/HUDI-344
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: Utilities
>Reporter: Raymond Xu
>Priority: Major
>  Labels: features, pull-request-available
> Fix For: 0.6.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> A dataset exporter tool for snapshotting. See 
> [RFC-9|https://cwiki.apache.org/confluence/display/HUDI/RFC-9%3A+Hudi+Dataset+Snapshot+Exporter]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] bhasudha edited a comment on issue #1325: presto - querying nested object in parquet file created by hudi

2020-02-29 Thread GitBox
bhasudha edited a comment on issue #1325: presto - querying nested object in 
parquet file created by hudi
URL: https://github.com/apache/incubator-hudi/issues/1325#issuecomment-592997350
 
 
   @adamjoneill I tried to reproduce this in local docker (with a small 
modification to the Hudi docker demo setup) and could not. I am able to select * 
on both schemas (with and without the simple identifier in the array element).
   
   I wanted to verify that a Presto select * query against a Hudi parquet file 
is NOT currently failing, and that is confirmed. It looks like there are some 
environment differences between this setup and yours. Let's see if we can dig 
into that further.
   
   Here is the setup and the steps to reproduce.
   
   I am using the hoodie docker demo setup on latest master.
   Hudi version : 0.5.2-incubating
   Spark version : 2.4.4
   Hive version : Hive 2.3.3,
   Presto 0.217
   Hadoop version : 2.8.4
   Storage (HDFS/S3/GCS..) : HDFS
   Running on Docker? (yes/no) : yes
   
   I tried both with and without the simple identifier inside the array item. 
Here is the setup I used:
   
   cat nested-table-kafka-source.properties
include=base.properties
# Key fields, for kafka example
hoodie.datasource.write.recordkey.field=id
hoodie.datasource.write.partitionpath.field=date
# Schema provider props (change to absolute path based on your 
installation)
   
hoodie.deltastreamer.schemaprovider.source.schema.file=/var/demo/config/nested_table_schema.avsc
   
hoodie.deltastreamer.schemaprovider.target.schema.file=/var/demo/config/nested_table_schema.avsc
# Kafka Source
hoodie.deltastreamer.source.kafka.topic=nested_table
#Kafka props
bootstrap.servers=kafkabroker:9092
auto.offset.reset=earliest
   
   
   
   cat nested_table_schema.avsc
{
  "name": "nested_table",
  "type": "record",
  "fields": [
{
  "name": "id",
  "type": "int"
},
{
  "name": "date",
  "type":"string"
},
{
  "name": "foos",
  "type": {
"type": "array",
"items": {
  "name": "foos_record",
  "type": "record",
  "fields": [
{  // remove this field for the variant without the simple identifier
  "name": "id",
  "type": "int"
},
{
  "name": "bar",
  "type": {
"name": "bar",
"type": "record",
"fields": [
  {
"name": "id",
"type": "int"
  },
  {
"name": "name",
"type": "string"
  }
]
  }
}
  ]
}
  }
}
  ]
}
   
   cat nested_table.json  // remove the id entry inside foos[{elem}] to try the 
variant without the simple identifier.
{"id":1,"date": 
"2020/02/15","foos":[{"id":11,"bar":{"id":1,"name":"OneBar"}},{"id":12,"bar":{"id":2,"name":"TwoBar"}},{"id":13,"bar":{"id":3,"name":"ThreeBar"}}]}
{"id":2,"date": 
"2020/02/16","foos":[{"id":21,"bar":{"id":2,"name":"OneBar"}},{"id":22,"bar":{"id":2,"name":"TwoBar"}},{"id":23,"bar":{"id":3,"name":"ThreeBar"}}]}
{"id":3,"date": 
"2020/02/17","foos":[{"id":31,"bar":{"id":3,"name":"OneBar"}},{"id":32,"bar":{"id":2,"name":"TwoBar"}},{"id":33,"bar":{"id":3,"name":"ThreeBar"}}]}
{"id":4,"date": 
"2020/02/18","foos":[{"id":41,"bar":{"id":4,"name":"OneBar"}},{"id":42,"bar":{"id":2,"name":"TwoBar"}},{"id":43,"bar":{"id":3,"name":"ThreeBar"}}]}
   
   
   Steps to publish to Kafka, ingest from Kafka, and sync with Hive are 
documented on the website - https://hudi.apache.org/docs/docker_demo.html. I 
modified them a bit for this example.
   // publish data from local to kafka. cd into the hudi repo dir.
   cat docker/demo/data/nested_table.json | kafkacat -b kafkabroker -t 
nested_table -P
   
   //verify kafka publish using
   kafkacat -b kafkabroker -L -J | jq .
   
   // Now hop onto one of the adhoc containers
   docker exec -it adhoc-2 /bin/bash
   
   // using deltastreamer ingest data into hdfs
   spark-submit --class 
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer 
$HUDI_UTILITIES_BUNDLE --table-type COPY_ON_WRITE --source-class 
org.apache.hudi.utilities.sources.JsonKafkaSource --source-ordering-field id  
--target-base-path /user/hive/warehouse/nested_table_cow --target-table 
nested_table_cow --props /var/demo/config/nested-table-kafka-source.properties 
--schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider
   
   // verify 

[GitHub] [incubator-hudi] bhasudha commented on issue #1325: presto - querying nested object in parquet file created by hudi

2020-02-29 Thread GitBox
bhasudha commented on issue #1325: presto - querying nested object in parquet 
file created by hudi
URL: https://github.com/apache/incubator-hudi/issues/1325#issuecomment-592997350
 
 
   @adamjoneill I tried to reproduce this in local docker (with a small 
modification to the Hudi docker demo setup) and could not. I am able to select * 
on both schemas (with and without the simple identifier in the array element). 
Here is the setup and the steps to reproduce.
   
   I am using the hoodie docker demo setup on latest master.
   Hudi version : 0.5.2-incubating
   Spark version : 2.4.4
   Hive version : Hive 2.3.3,
   Presto 0.217
   Hadoop version : 2.8.4
   Storage (HDFS/S3/GCS..) : HDFS
   Running on Docker? (yes/no) : yes
   
   I tried both with and without the simple identifier inside the array item. 
Here is the setup I used:
   
   cat nested-table-kafka-source.properties
include=base.properties
# Key fields, for kafka example
hoodie.datasource.write.recordkey.field=id
hoodie.datasource.write.partitionpath.field=date
# Schema provider props (change to absolute path based on your 
installation)
   
hoodie.deltastreamer.schemaprovider.source.schema.file=/var/demo/config/nested_table_schema.avsc
   
hoodie.deltastreamer.schemaprovider.target.schema.file=/var/demo/config/nested_table_schema.avsc
# Kafka Source
hoodie.deltastreamer.source.kafka.topic=nested_table
#Kafka props
bootstrap.servers=kafkabroker:9092
auto.offset.reset=earliest
   
   
   
   cat nested_table_schema.avsc
{
  "name": "nested_table",
  "type": "record",
  "fields": [
{
  "name": "id",
  "type": "int"
},
{
  "name": "date",
  "type":"string"
},
{
  "name": "foos",
  "type": {
"type": "array",
"items": {
  "name": "foos_record",
  "type": "record",
  "fields": [
{  // remove this field for the variant without the simple identifier
  "name": "id",
  "type": "int"
},
{
  "name": "bar",
  "type": {
"name": "bar",
"type": "record",
"fields": [
  {
"name": "id",
"type": "int"
  },
  {
"name": "name",
"type": "string"
  }
]
  }
}
  ]
}
  }
}
  ]
}
   
   cat nested_table.json  // remove the id entry inside foos[{elem}] to try the 
variant without the simple identifier.
{"id":1,"date": 
"2020/02/15","foos":[{"id":11,"bar":{"id":1,"name":"OneBar"}},{"id":12,"bar":{"id":2,"name":"TwoBar"}},{"id":13,"bar":{"id":3,"name":"ThreeBar"}}]}
{"id":2,"date": 
"2020/02/16","foos":[{"id":21,"bar":{"id":2,"name":"OneBar"}},{"id":22,"bar":{"id":2,"name":"TwoBar"}},{"id":23,"bar":{"id":3,"name":"ThreeBar"}}]}
{"id":3,"date": 
"2020/02/17","foos":[{"id":31,"bar":{"id":3,"name":"OneBar"}},{"id":32,"bar":{"id":2,"name":"TwoBar"}},{"id":33,"bar":{"id":3,"name":"ThreeBar"}}]}
{"id":4,"date": 
"2020/02/18","foos":[{"id":41,"bar":{"id":4,"name":"OneBar"}},{"id":42,"bar":{"id":2,"name":"TwoBar"}},{"id":43,"bar":{"id":3,"name":"ThreeBar"}}]}
   
   
   Steps to publish to Kafka, ingest from Kafka, and sync with Hive are 
documented on the website - https://hudi.apache.org/docs/docker_demo.html. I 
modified them a bit for this example.
   // publish data from local to kafka. cd into the hudi repo dir.
   cat docker/demo/data/nested_table.json | kafkacat -b kafkabroker -t 
nested_table -P
   
   //verify kafka publish using
   kafkacat -b kafkabroker -L -J | jq .
   
   // Now hop onto one of the adhoc containers
   docker exec -it adhoc-2 /bin/bash
   
   // using deltastreamer ingest data into hdfs
   spark-submit --class 
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer 
$HUDI_UTILITIES_BUNDLE --table-type COPY_ON_WRITE --source-class 
org.apache.hudi.utilities.sources.JsonKafkaSource --source-ordering-field id  
--target-base-path /user/hive/warehouse/nested_table_cow --target-table 
nested_table_cow --props /var/demo/config/nested-table-kafka-source.properties 
--schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider
   
   // verify ingest by viewing this page in web hdfs
   http://namenode:50070/explorer.html#/user/hive/warehouse/nested_table_cow
   
   // sync to hive to make it available as a hive table.
   /var/hoodie/ws/hudi-hive/run_sync_tool.sh  --jdbc-url 
jdbc:hive2://hiveserver:1 --user