[GitHub] [incubator-hudi] yanghua commented on a change in pull request #1390: [HUDI-634] Cut 0.5.2 documentation and write release note

2020-03-11 Thread GitBox
yanghua commented on a change in pull request #1390: [HUDI-634] Cut 0.5.2 
documentation and write release note
URL: https://github.com/apache/incubator-hudi/pull/1390#discussion_r391408675
 
 

 ##
 File path: docs/_pages/releases.md
 ##
 @@ -6,6 +6,32 @@ toc: true
 last_modified_at: 2019-12-30T15:59:57-04:00
 ---
 
+## [Release 
0.5.2-incubating](https://github.com/apache/incubator-hudi/releases/tag/release-0.5.2-incubating)
 ([docs](/docs/0.5.2-quick-start-guide.html))
+
+### Download Information
+ * Source Release : [Apache Hudi (incubating) 0.5.2-incubating Source Release](https://www.apache.org/dist/incubator/hudi/0.5.2-incubating/hudi-0.5.2-incubating.src.tgz) ([asc](https://www.apache.org/dist/incubator/hudi/0.5.2-incubating/hudi-0.5.2-incubating.src.tgz.asc), [sha512](https://www.apache.org/dist/incubator/hudi/0.5.2-incubating/hudi-0.5.2-incubating.src.tgz.sha512))
+ * Apache Hudi (incubating) jars corresponding to this release are available [here](https://repository.apache.org/#nexus-search;quick~hudi)
+
+### Migration Guide for this release
 
 Review comment:
  OK. Actually, IMO we also need to add this for 0.5.1 as well. In any case, let's discuss it later.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1390: [HUDI-634] Cut 0.5.2 documentation and write release note

2020-03-11 Thread GitBox
vinothchandar commented on a change in pull request #1390: [HUDI-634] Cut 0.5.2 
documentation and write release note
URL: https://github.com/apache/incubator-hudi/pull/1390#discussion_r391401705
 
 

 ##
 File path: docs/_pages/releases.md
 ##
 @@ -6,6 +6,32 @@ toc: true
 last_modified_at: 2019-12-30T15:59:57-04:00
 ---
 
+## [Release 
0.5.2-incubating](https://github.com/apache/incubator-hudi/releases/tag/release-0.5.2-incubating)
 ([docs](/docs/0.5.2-quick-start-guide.html))
+
+### Download Information
+ * Source Release : [Apache Hudi (incubating) 0.5.2-incubating Source Release](https://www.apache.org/dist/incubator/hudi/0.5.2-incubating/hudi-0.5.2-incubating.src.tgz) ([asc](https://www.apache.org/dist/incubator/hudi/0.5.2-incubating/hudi-0.5.2-incubating.src.tgz.asc), [sha512](https://www.apache.org/dist/incubator/hudi/0.5.2-incubating/hudi-0.5.2-incubating.src.tgz.sha512))
+ * Apache Hudi (incubating) jars corresponding to this release are available [here](https://repository.apache.org/#nexus-search;quick~hudi)
+
+### Migration Guide for this release
 
 Review comment:
  Can we call out the `KeyGenerator` rename? This is critical; at least one user has already raised it on Slack or the mailing list.




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1390: [HUDI-634] Cut 0.5.2 documentation and write release note

2020-03-11 Thread GitBox
vinothchandar commented on a change in pull request #1390: [HUDI-634] Cut 0.5.2 
documentation and write release note
URL: https://github.com/apache/incubator-hudi/pull/1390#discussion_r391401512
 
 

 ##
 File path: docs/_pages/releases.md
 ##
 @@ -6,6 +6,32 @@ toc: true
 last_modified_at: 2019-12-30T15:59:57-04:00
 ---
 
+## [Release 
0.5.2-incubating](https://github.com/apache/incubator-hudi/releases/tag/release-0.5.2-incubating)
 ([docs](/docs/0.5.2-quick-start-guide.html))
+
+### Download Information
+ * Source Release : [Apache Hudi (incubating) 0.5.2-incubating Source Release](https://www.apache.org/dist/incubator/hudi/0.5.2-incubating/hudi-0.5.2-incubating.src.tgz) ([asc](https://www.apache.org/dist/incubator/hudi/0.5.2-incubating/hudi-0.5.2-incubating.src.tgz.asc), [sha512](https://www.apache.org/dist/incubator/hudi/0.5.2-incubating/hudi-0.5.2-incubating.src.tgz.sha512))
+ * Apache Hudi (incubating) jars corresponding to this release are available [here](https://repository.apache.org/#nexus-search;quick~hudi)
+
+### Migration Guide for this release
+ * Support for overwriting the payload implementation in `hoodie.properties` by specifying the `hoodie.compaction.payload.class` config option. Previously, once the payload class was set in `hoodie.properties`, it could not be changed. In some cases, if the code is refactored and the jar updated, one may need to pass the new payload class name.
+ * Write client restructuring has moved classes around ([HUDI-554](https://issues.apache.org/jira/browse/HUDI-554)). The `client` package now has all the various client classes that do the transaction management. `func` was renamed to `execution`, and some helpers moved to `client/utils`. All compaction code under `io` is now under `table/compact`, rollback code is under `table/rollback`, and in general all code for individual operations lives under `table`.
+ * Simplified `HoodieBloomIndex` by removing the 2GB limit handling. Prior to Spark 2.4.0, each Spark partition had a 2GB limit. In Hudi 0.5.1, after upgrading to Spark 2.4.4, this limitation no longer applies, hence the safe parallelism constraint in `HoodieBloomIndex` was removed.
 
 Review comment:
  Yes, correct. The question we need to ask ourselves is "does the user need to take any action to migrate?" No action is needed here; that is why I don't think this needs to be moved out.
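The payload-class override described in the first migration bullet above can be sketched as a small standalone analogue. This is an illustrative simplification, not Hudi's actual resolution code; the `resolvePayloadClass` helper and the `org.example.*` class names are hypothetical:

```java
import java.util.Properties;

public class PayloadClassOverride {
    static final String PAYLOAD_CLASS_KEY = "hoodie.compaction.payload.class";

    // Illustrative only: if the writer config supplies a payload class, it
    // takes precedence over the value frozen into hoodie.properties at table
    // creation; otherwise the table's original value is kept.
    static String resolvePayloadClass(Properties tableProps, Properties writerConfig) {
        String override = writerConfig.getProperty(PAYLOAD_CLASS_KEY);
        return (override != null && !override.isEmpty())
                ? override
                : tableProps.getProperty(PAYLOAD_CLASS_KEY);
    }

    public static void main(String[] args) {
        Properties table = new Properties();
        table.setProperty(PAYLOAD_CLASS_KEY, "org.example.OldPayload");
        Properties writer = new Properties();
        writer.setProperty(PAYLOAD_CLASS_KEY, "org.example.NewPayload");
        // The writer-supplied class wins over the table's stored one.
        System.out.println(resolvePayloadClass(table, writer)); // org.example.NewPayload
    }
}
```

This matches the described behavior: before the change, only the stored `hoodie.properties` value would ever be used.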




Build failed in Jenkins: hudi-snapshot-deployment-0.5 #214

2020-03-11 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 2.39 KB...]
/home/jenkins/tools/maven/apache-maven-3.5.4/conf:
logging
settings.xml
toolchains.xml

/home/jenkins/tools/maven/apache-maven-3.5.4/conf/logging:
simplelogger.properties

/home/jenkins/tools/maven/apache-maven-3.5.4/lib:
aopalliance-1.0.jar
cdi-api-1.0.jar
cdi-api.license
commons-cli-1.4.jar
commons-cli.license
commons-io-2.5.jar
commons-io.license
commons-lang3-3.5.jar
commons-lang3.license
ext
guava-20.0.jar
guice-4.2.0-no_aop.jar
jansi-1.17.1.jar
jansi-native
javax.inject-1.jar
jcl-over-slf4j-1.7.25.jar
jcl-over-slf4j.license
jsr250-api-1.0.jar
jsr250-api.license
maven-artifact-3.5.4.jar
maven-artifact.license
maven-builder-support-3.5.4.jar
maven-builder-support.license
maven-compat-3.5.4.jar
maven-compat.license
maven-core-3.5.4.jar
maven-core.license
maven-embedder-3.5.4.jar
maven-embedder.license
maven-model-3.5.4.jar
maven-model-builder-3.5.4.jar
maven-model-builder.license
maven-model.license
maven-plugin-api-3.5.4.jar
maven-plugin-api.license
maven-repository-metadata-3.5.4.jar
maven-repository-metadata.license
maven-resolver-api-1.1.1.jar
maven-resolver-api.license
maven-resolver-connector-basic-1.1.1.jar
maven-resolver-connector-basic.license
maven-resolver-impl-1.1.1.jar
maven-resolver-impl.license
maven-resolver-provider-3.5.4.jar
maven-resolver-provider.license
maven-resolver-spi-1.1.1.jar
maven-resolver-spi.license
maven-resolver-transport-wagon-1.1.1.jar
maven-resolver-transport-wagon.license
maven-resolver-util-1.1.1.jar
maven-resolver-util.license
maven-settings-3.5.4.jar
maven-settings-builder-3.5.4.jar
maven-settings-builder.license
maven-settings.license
maven-shared-utils-3.2.1.jar
maven-shared-utils.license
maven-slf4j-provider-3.5.4.jar
maven-slf4j-provider.license
org.eclipse.sisu.inject-0.3.3.jar
org.eclipse.sisu.inject.license
org.eclipse.sisu.plexus-0.3.3.jar
org.eclipse.sisu.plexus.license
plexus-cipher-1.7.jar
plexus-cipher.license
plexus-component-annotations-1.7.1.jar
plexus-component-annotations.license
plexus-interpolation-1.24.jar
plexus-interpolation.license
plexus-sec-dispatcher-1.4.jar
plexus-sec-dispatcher.license
plexus-utils-3.1.0.jar
plexus-utils.license
slf4j-api-1.7.25.jar
slf4j-api.license
wagon-file-3.1.0.jar
wagon-file.license
wagon-http-3.1.0-shaded.jar
wagon-http.license
wagon-provider-api-3.1.0.jar
wagon-provider-api.license

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/ext:
README.txt

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native:
freebsd32
freebsd64
linux32
linux64
osx
README.txt
windows32
windows64

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/osx:
libjansi.jnilib

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows32:
jansi.dll

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows64:
jansi.dll
Finished /home/jenkins/tools/maven/apache-maven-3.5.4 Directory Listing :
Detected current version as: 'HUDI_home= 0.6.0-SNAPSHOT'
[INFO] Scanning for projects...
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-spark_2.11:jar:0.6.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 
org.apache.hudi:hudi-spark_${scala.binary.version}:[unknown-version], 

 line 26, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-timeline-service:jar:0.6.0-SNAPSHOT
[WARNING] 'build.plugins.plugin.(groupId:artifactId)' must be unique but found 
duplicate declaration of plugin org.jacoco:jacoco-maven-plugin @ 
org.apache.hudi:hudi-timeline-service:[unknown-version], 

 line 58, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-utilities_2.11:jar:0.6.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 
org.apache.hudi:hudi-utilities_${scala.binary.version}:[unknown-version], 

 line 26, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-spark-bundle_2.11:jar:0.6.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 

[incubator-hudi] branch master updated (0f892ef -> c40a0d4)

2020-03-11 Thread vbalaji
This is an automated email from the ASF dual-hosted git repository.

vbalaji pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git.


from 0f892ef  [HUDI-692] Add delete savepoint for cli (#1397)
 add c40a0d4  [HUDI-656][Performance] Return a dummy Spark relation after 
writing the DataFrame (#1394)

No new revisions were added by this update.

Summary of changes:
 .../main/scala/org/apache/hudi/DefaultSource.scala | 20 +-
 .../scala/org/apache/hudi/HudiEmptyRelation.scala  | 24 +++---
 2 files changed, 31 insertions(+), 13 deletions(-)
 copy hudi-cli/src/main/java/org/apache/hudi/cli/Main.java => 
hudi-spark/src/main/scala/org/apache/hudi/HudiEmptyRelation.scala (60%)



[GitHub] [incubator-hudi] bvaradar merged pull request #1394: [HUDI-656][Performance] Return a dummy Spark relation after writing the DataFrame

2020-03-11 Thread GitBox
bvaradar merged pull request #1394: [HUDI-656][Performance] Return a dummy 
Spark relation after writing the DataFrame
URL: https://github.com/apache/incubator-hudi/pull/1394
 
 
   




[GitHub] [incubator-hudi] yanghua commented on issue #1390: [HUDI-634] Cut 0.5.2 documentation and write release note

2020-03-11 Thread GitBox
yanghua commented on issue #1390: [HUDI-634] Cut 0.5.2 documentation and write 
release note
URL: https://github.com/apache/incubator-hudi/pull/1390#issuecomment-597984210
 
 
   > In that sense, I feel a few points you have don't fall into this category. Also, we are missing an important one: key generators have been moved into their own package.
   
   @vinothchandar I did check the key generator change and confirmed it happened in 0.5.1, not 0.5.2. So I think we should not add it here. If you agree, I can file a separate issue to update the 0.5.1 release note. Correct me if I am wrong.
   
   




[GitHub] [incubator-hudi] yanghua commented on a change in pull request #1390: [HUDI-634] Cut 0.5.2 documentation and write release note

2020-03-11 Thread GitBox
yanghua commented on a change in pull request #1390: [HUDI-634] Cut 0.5.2 
documentation and write release note
URL: https://github.com/apache/incubator-hudi/pull/1390#discussion_r391379013
 
 

 ##
 File path: docs/_pages/releases.md
 ##
 @@ -6,6 +6,32 @@ toc: true
 last_modified_at: 2019-12-30T15:59:57-04:00
 ---
 
+## [Release 
0.5.2-incubating](https://github.com/apache/incubator-hudi/releases/tag/release-0.5.2-incubating)
 ([docs](/docs/0.5.2-quick-start-guide.html))
+
+### Download Information
+ * Source Release : [Apache Hudi (incubating) 0.5.2-incubating Source Release](https://www.apache.org/dist/incubator/hudi/0.5.2-incubating/hudi-0.5.2-incubating.src.tgz) ([asc](https://www.apache.org/dist/incubator/hudi/0.5.2-incubating/hudi-0.5.2-incubating.src.tgz.asc), [sha512](https://www.apache.org/dist/incubator/hudi/0.5.2-incubating/hudi-0.5.2-incubating.src.tgz.sha512))
+ * Apache Hudi (incubating) jars corresponding to this release are available [here](https://repository.apache.org/#nexus-search;quick~hudi)
+
+### Migration Guide for this release
+ * Support for overwriting the payload implementation in `hoodie.properties` by specifying the `hoodie.compaction.payload.class` config option. Previously, once the payload class was set in `hoodie.properties`, it could not be changed. In some cases, if the code is refactored and the jar updated, one may need to pass the new payload class name.
+ * Write client restructuring has moved classes around ([HUDI-554](https://issues.apache.org/jira/browse/HUDI-554)). The `client` package now has all the various client classes that do the transaction management. `func` was renamed to `execution`, and some helpers moved to `client/utils`. All compaction code under `io` is now under `table/compact`, rollback code is under `table/rollback`, and in general all code for individual operations lives under `table`.
+ * Simplified `HoodieBloomIndex` by removing the 2GB limit handling. Prior to Spark 2.4.0, each Spark partition had a 2GB limit. In Hudi 0.5.1, after upgrading to Spark 2.4.4, this limitation no longer applies, hence the safe parallelism constraint in `HoodieBloomIndex` was removed.
 
 Review comment:
   IMO, users only need to be aware of this change. It happens internally within `HoodieBloomIndex`, right?




[GitHub] [incubator-hudi] lamber-ken commented on a change in pull request #1377: [HUDI-663] Fix HoodieDeltaStreamer offset not handled correctly

2020-03-11 Thread GitBox
lamber-ken commented on a change in pull request #1377: [HUDI-663] Fix 
HoodieDeltaStreamer offset not handled correctly
URL: https://github.com/apache/incubator-hudi/pull/1377#discussion_r391352601
 
 

 ##
 File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/KafkaOffsetGen.java
 ##
 @@ -180,7 +180,7 @@ public KafkaOffsetGen(TypedProperties props) {
   .map(x -> new TopicPartition(x.topic(), 
x.partition())).collect(Collectors.toSet());
 
   // Determine the offset ranges to read from
-  if (lastCheckpointStr.isPresent()) {
+  if (lastCheckpointStr.isPresent() && !lastCheckpointStr.get().isEmpty()) 
{
 
 Review comment:
   Thanks, I need to think about it, wait a moment : )
   




[jira] [Created] (HUDI-693) Add unit test for hudi-cli module

2020-03-11 Thread hong dongdong (Jira)
hong dongdong created HUDI-693:
--

 Summary: Add unit test for hudi-cli module
 Key: HUDI-693
 URL: https://issues.apache.org/jira/browse/HUDI-693
 Project: Apache Hudi (incubating)
  Issue Type: New Feature
  Components: CLI, Testing
Reporter: hong dongdong


There are no unit tests for this module overall; we need to add them.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] hddong commented on issue #1397: [HUDI-692] Add delete savepoint for cli

2020-03-11 Thread GitBox
hddong commented on issue #1397: [HUDI-692] Add delete savepoint for cli
URL: https://github.com/apache/incubator-hudi/pull/1397#issuecomment-597953902
 
 
   @vinothchandar Yes, I will create a JIRA for writing unit tests for this module.




[GitHub] [incubator-hudi] garyli1019 commented on a change in pull request #1377: [HUDI-663] Fix HoodieDeltaStreamer offset not handled correctly

2020-03-11 Thread GitBox
garyli1019 commented on a change in pull request #1377: [HUDI-663] Fix 
HoodieDeltaStreamer offset not handled correctly
URL: https://github.com/apache/incubator-hudi/pull/1377#discussion_r391350401
 
 

 ##
 File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/KafkaOffsetGen.java
 ##
 @@ -180,7 +180,7 @@ public KafkaOffsetGen(TypedProperties props) {
   .map(x -> new TopicPartition(x.topic(), 
x.partition())).collect(Collectors.toSet());
 
   // Determine the offset ranges to read from
-  if (lastCheckpointStr.isPresent()) {
+  if (lastCheckpointStr.isPresent() && !lastCheckpointStr.get().isEmpty()) 
{
 
 Review comment:
   In this case I think the user could use `cfg.checkpoint = xxx` to reset the checkpoint. It would be concerning to me if the DeltaStreamer automatically reset the checkpoint and I wasn't aware of it.




[GitHub] [incubator-hudi] lamber-ken commented on a change in pull request #1377: [HUDI-663] Fix HoodieDeltaStreamer offset not handled correctly

2020-03-11 Thread GitBox
lamber-ken commented on a change in pull request #1377: [HUDI-663] Fix 
HoodieDeltaStreamer offset not handled correctly
URL: https://github.com/apache/incubator-hudi/pull/1377#discussion_r391348916
 
 

 ##
 File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/KafkaOffsetGen.java
 ##
 @@ -180,7 +180,7 @@ public KafkaOffsetGen(TypedProperties props) {
   .map(x -> new TopicPartition(x.topic(), 
x.partition())).collect(Collectors.toSet());
 
   // Determine the offset ranges to read from
-  if (lastCheckpointStr.isPresent()) {
+  if (lastCheckpointStr.isPresent() && !lastCheckpointStr.get().isEmpty()) 
{
 
 Review comment:
   I'd prefer to push the control behavior down to the data source (e.g., Kafka / Pulsar).




[GitHub] [incubator-hudi] lamber-ken commented on a change in pull request #1377: [HUDI-663] Fix HoodieDeltaStreamer offset not handled correctly

2020-03-11 Thread GitBox
lamber-ken commented on a change in pull request #1377: [HUDI-663] Fix 
HoodieDeltaStreamer offset not handled correctly
URL: https://github.com/apache/incubator-hudi/pull/1377#discussion_r391348111
 
 

 ##
 File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/KafkaOffsetGen.java
 ##
 @@ -180,7 +180,7 @@ public KafkaOffsetGen(TypedProperties props) {
   .map(x -> new TopicPartition(x.topic(), 
x.partition())).collect(Collectors.toSet());
 
   // Determine the offset ranges to read from
-  if (lastCheckpointStr.isPresent()) {
+  if (lastCheckpointStr.isPresent() && !lastCheckpointStr.get().isEmpty()) 
{
 
 Review comment:
   Hi @garyli1019, let's imagine a scenario with a topic that has no data: the first commit will save an empty checkpoint, and then every subsequent commit will throw an exception (even if we send messages to Kafka). In that case, `EARLIEST` or `LATEST` no longer has any effect.
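The failure mode behind the patch can be demonstrated with a small standalone sketch. It uses `java.util.Optional` in place of Hudi's `Option`, and the `oldGuard`/`newGuard` helper names are hypothetical; only the guard expressions themselves come from the diff:

```java
import java.util.Optional;

public class CheckpointGuard {
    // The original guard: an Optional holding an empty string still reports
    // isPresent(), so a checkpoint saved against an empty topic slips through.
    static boolean oldGuard(Optional<String> lastCheckpointStr) {
        return lastCheckpointStr.isPresent();
    }

    // The fixed guard from the PR: additionally reject an empty checkpoint
    // string, so the code falls back to the configured EARLIEST/LATEST reset.
    static boolean newGuard(Optional<String> lastCheckpointStr) {
        return lastCheckpointStr.isPresent() && !lastCheckpointStr.get().isEmpty();
    }

    public static void main(String[] args) {
        Optional<String> emptyCheckpoint = Optional.of("");  // topic had no data
        Optional<String> realCheckpoint = Optional.of("topic,0:42"); // illustrative format

        System.out.println(oldGuard(emptyCheckpoint)); // true  (wrongly resumable)
        System.out.println(newGuard(emptyCheckpoint)); // false (falls back to reset)
        System.out.println(newGuard(realCheckpoint));  // true
    }
}
```

This is why, without the extra emptiness check, the empty checkpoint keeps being treated as a valid resume point on every subsequent commit.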




[jira] [Updated] (HUDI-644) checkpoint generator tool for delta streamer

2020-03-11 Thread Yanjia Gary Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yanjia Gary Li updated HUDI-644:

Summary: checkpoint generator tool for delta streamer  (was: Enable to 
retrieve checkpoint from previous commits in Delta Streamer)

> checkpoint generator tool for delta streamer
> 
>
> Key: HUDI-644
> URL: https://issues.apache.org/jira/browse/HUDI-644
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: DeltaStreamer
>Reporter: Yanjia Gary Li
>Assignee: Yanjia Gary Li
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This ticket is to resolve the following problem:
> The user is using a homebrew Spark data source to read new data and write to 
> a Hudi table. 
> The user would like to migrate to Delta Streamer. 
> But the Delta Streamer only checks the last commit's metadata; if there is no 
> checkpoint info, it will use the default, which for the Kafka source is 
> LATEST. 
> The user would like to run the homebrew Spark data source reader and Delta 
> Streamer in parallel to prevent data loss, but the Spark data source writer 
> will commit without checkpoint info, which will reset the Delta Streamer. 
> So an option to let the user retrieve the checkpoint from previous commits, 
> instead of only the latest commit, would be helpful for the migration. 



--


[GitHub] [incubator-hudi] garyli1019 commented on a change in pull request #1377: [HUDI-663] Fix HoodieDeltaStreamer offset not handled correctly

2020-03-11 Thread GitBox
garyli1019 commented on a change in pull request #1377: [HUDI-663] Fix 
HoodieDeltaStreamer offset not handled correctly
URL: https://github.com/apache/incubator-hudi/pull/1377#discussion_r391344867
 
 

 ##
 File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/KafkaOffsetGen.java
 ##
 @@ -180,7 +180,7 @@ public KafkaOffsetGen(TypedProperties props) {
   .map(x -> new TopicPartition(x.topic(), 
x.partition())).collect(Collectors.toSet());
 
   // Determine the offset ranges to read from
-  if (lastCheckpointStr.isPresent()) {
+  if (lastCheckpointStr.isPresent() && !lastCheckpointStr.get().isEmpty()) 
{
 
 Review comment:
   I agree with @bvaradar here. I tried:
   ```java
   else if (commitMetadata.getMetadata(CHECKPOINT_KEY) != null
       && !commitMetadata.getMetadata(CHECKPOINT_KEY).isEmpty()) {
     resumeCheckpointStr = Option.of(commitMetadata.getMetadata(CHECKPOINT_KEY));
   }
   ```
   The unit tests passed.
   If you mean that the application throws an exception when the checkpoint is empty, I think that is the desired behavior. WDYT?




[GitHub] [incubator-hudi] lamber-ken edited a comment on issue #1373: [WIP] [HUDI-646] Re-enable TestUpdateSchemaEvolution to reproduce CI error

2020-03-11 Thread GitBox
lamber-ken edited a comment on issue #1373: [WIP] [HUDI-646] Re-enable 
TestUpdateSchemaEvolution to reproduce CI error
URL: https://github.com/apache/incubator-hudi/pull/1373#issuecomment-596656326
 
 
   hi @vinothchandar 
   
   After many tests, I finally found the answer; let me explain this issue step by step.
   
   Three questions:
   1. Why does the CI build fail after the code cleanup?
   2. Why doesn't JUnit report a failure in the local environment?
   3. Why does `TestHBaseQPSResourceAllocator` affect `TestUpdateSchemaEvolution`?
   
    Stack trace:
   ```
   Job aborted due to stage failure: Task 7 in stage 1.0 failed 1 times, most 
recent failure: Lost task 7.0 in stage 1.0 (TID 15, localhost, executor 
driver): org.apache.parquet.io.ParquetDecodingException: Can not read value at 
0 in block -1 in file 
file:/tmp/junit3406952253616234024/2016/01/31/f1-0_7-0-7_100.parquet
at 
org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:251)
at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:132)
at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:136)
at 
org.apache.hudi.common.util.ParquetUtils.readAvroRecords(ParquetUtils.java:190)
at 
org.apache.hudi.client.TestUpdateSchemaEvolution.lambda$testSchemaEvolutionOnUpdate$dfb2f24e$1(TestUpdateSchemaEvolution.java:123)
at 
org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1.apply(JavaPairRDD.scala:1040)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
at scala.collection.Iterator$class.foreach(Iterator.scala:891)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
at 
scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
at 
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
at 
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
at scala.collection.AbstractIterator.to(Iterator.scala:1334)
at 
scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1334)
at 
scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1334)
at 
org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:945)
at 
org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:945)
at 
org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
at 
org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:123)
at 
org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
   Caused by: java.lang.UnsupportedOperationException: Byte-buffer read 
unsupported by input stream
at 
org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:146)
at 
org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:143)
at 
org.apache.parquet.hadoop.util.H2SeekableInputStream$H2Reader.read(H2SeekableInputStream.java:81)
at 
org.apache.parquet.hadoop.util.H2SeekableInputStream.readFully(H2SeekableInputStream.java:90)
at 
org.apache.parquet.hadoop.util.H2SeekableInputStream.readFully(H2SeekableInputStream.java:75)
at 
org.apache.parquet.hadoop.ParquetFileReader$ConsecutiveChunkList.readAll(ParquetFileReader.java:1174)
at 
org.apache.parquet.hadoop.ParquetFileReader.readNextRowGroup(ParquetFileReader.java:805)
at 
org.apache.parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:127)
at 
org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:222)
... 29 more
   ```
   
   **Step 1:** Check whether the `f1-0_7-0-7_100.parquet` file is complete. I used the HDFS API to check it; the file is fine.
   
   **Step 2:** Noticed the `UnsupportedOperationException: Byte-buffer read unsupported by input stream` exception.
   I added some log statements to `FSDataInputStream`, reran the unit test on Travis, and found two different implementation classes:
   
   - org.apache.hadoop.fs.FSDataInputStream
   - org.apache.hadoop.fs.ChecksumFileSystem.ChecksumFSInputChecker
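The difference between the two implementations can be illustrated with a pure-Java analogue of Hadoop's byte-buffer read dispatch. This is not Hadoop's actual code; `SeekableStream`, `RawStream`, and `CheckedStream` are hypothetical stand-ins for the real classes, mimicking how a stream that does not opt into byte-buffer reads ends up throwing the exception seen in the trace:

```java
import java.nio.ByteBuffer;

public class ByteBufferReadDemo {
    // Analogue of Hadoop's byte-buffer read contract: the base type throws
    // unless a concrete implementation overrides the ByteBuffer overload.
    interface SeekableStream {
        default int read(ByteBuffer buf) {
            throw new UnsupportedOperationException(
                "Byte-buffer read unsupported by input stream");
        }
    }

    // Analogue of a raw stream that does support ByteBuffer reads.
    static class RawStream implements SeekableStream {
        @Override
        public int read(ByteBuffer buf) {
            buf.put((byte) 1); // pretend we read one byte
            return 1;
        }
    }

    // Analogue of ChecksumFSInputChecker: no ByteBuffer support, so the
    // throwing default is used -- the root cause surfaced in the stack trace.
    static class CheckedStream implements SeekableStream { }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(4);
        System.out.println(new RawStream().read(buf)); // 1
        try {
            new CheckedStream().read(buf);
        } catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

So whether the test fails depends entirely on which stream implementation the filesystem hands back, which is consistent with the behavior differing between the CI and local environments.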
   
   

[GitHub] [incubator-hudi] codecov-io edited a comment on issue #1159: [HUDI-479] Eliminate or Minimize use of Guava if possible

2020-03-11 Thread GitBox
codecov-io edited a comment on issue #1159: [HUDI-479] Eliminate or Minimize 
use of Guava if possible
URL: https://github.com/apache/incubator-hudi/pull/1159#issuecomment-596089314
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1159?src=pr=h1) 
Report
   > Merging 
[#1159](https://codecov.io/gh/apache/incubator-hudi/pull/1159?src=pr=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/7194514aff33815a2f4d6d1847f00b94d1a1a36b?src=pr=desc)
 will **decrease** coverage by `0.16%`.
   > The diff coverage is `55%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1159/graphs/tree.svg?width=650=VTTXabwbs2=150=pr)](https://codecov.io/gh/apache/incubator-hudi/pull/1159?src=pr=tree)
   
   ```diff
   @@ Coverage Diff  @@
   ## master#1159  +/-   ##
   
   - Coverage 67.45%   67.28%   -0.17% 
 Complexity  230  230  
   
 Files   336  338   +2 
 Lines 1636616469 +103 
 Branches   1672 1684  +12 
   
   + Hits  1103911081  +42 
   - Misses 4592 4647  +55 
   - Partials735  741   +6
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-hudi/pull/1159?src=pr=tree) | 
Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...pache/hudi/utilities/sources/HoodieIncrSource.java](https://codecov.io/gh/apache/incubator-hudi/pull/1159/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSG9vZGllSW5jclNvdXJjZS5qYXZh)
 | `92.59% <ø> (ø)` | `7 <0> (ø)` | :arrow_down: |
   | 
[...in/java/org/apache/hudi/io/HoodieAppendHandle.java](https://codecov.io/gh/apache/incubator-hudi/pull/1159/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW8vSG9vZGllQXBwZW5kSGFuZGxlLmphdmE=)
 | `83.76% <ø> (ø)` | `0 <0> (ø)` | :arrow_down: |
   | 
[...mmon/versioning/clean/CleanV1MigrationHandler.java](https://codecov.io/gh/apache/incubator-hudi/pull/1159/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3ZlcnNpb25pbmcvY2xlYW4vQ2xlYW5WMU1pZ3JhdGlvbkhhbmRsZXIuamF2YQ==)
 | `90.24% <0%> (ø)` | `0 <0> (ø)` | :arrow_down: |
   | 
[...pache/hudi/utilities/HoodieWithTimelineServer.java](https://codecov.io/gh/apache/incubator-hudi/pull/1159/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0hvb2RpZVdpdGhUaW1lbGluZVNlcnZlci5qYXZh)
 | `0% <0%> (ø)` | `0 <0> (ø)` | :arrow_down: |
   | 
[...e/hudi/common/util/queue/BoundedInMemoryQueue.java](https://codecov.io/gh/apache/incubator-hudi/pull/1159/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvcXVldWUvQm91bmRlZEluTWVtb3J5UXVldWUuamF2YQ==)
 | `91.13% <0%> (ø)` | `0 <0> (ø)` | :arrow_down: |
   | 
[...oning/compaction/CompactionV2MigrationHandler.java](https://codecov.io/gh/apache/incubator-hudi/pull/1159/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3ZlcnNpb25pbmcvY29tcGFjdGlvbi9Db21wYWN0aW9uVjJNaWdyYXRpb25IYW5kbGVyLmphdmE=)
 | `60% <0%> (ø)` | `0 <0> (ø)` | :arrow_down: |
   | 
[...mmon/versioning/clean/CleanV2MigrationHandler.java](https://codecov.io/gh/apache/incubator-hudi/pull/1159/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3ZlcnNpb25pbmcvY2xlYW4vQ2xlYW5WMk1pZ3JhdGlvbkhhbmRsZXIuamF2YQ==)
 | `94.87% <0%> (ø)` | `0 <0> (ø)` | :arrow_down: |
   | 
[...che/hudi/common/table/log/HoodieLogFileReader.java](https://codecov.io/gh/apache/incubator-hudi/pull/1159/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL2xvZy9Ib29kaWVMb2dGaWxlUmVhZGVyLmphdmE=)
 | `74.02% <0%> (ø)` | `0 <0> (ø)` | :arrow_down: |
   | 
[...common/table/view/AbstractTableFileSystemView.java](https://codecov.io/gh/apache/incubator-hudi/pull/1159/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3ZpZXcvQWJzdHJhY3RUYWJsZUZpbGVTeXN0ZW1WaWV3LmphdmE=)
 | `92.66% <0%> (ø)` | `0 <0> (ø)` | :arrow_down: |
   | 
[...pache/hudi/common/model/TimelineLayoutVersion.java](https://codecov.io/gh/apache/incubator-hudi/pull/1159/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL1RpbWVsaW5lTGF5b3V0VmVyc2lvbi5qYXZh)
 | `65% <0%> (ø)` | `0 <0> (ø)` | :arrow_down: |
   | ... and [47 
more](https://codecov.io/gh/apache/incubator-hudi/pull/1159/diff?src=pr=tree-more)
 | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1159?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  

[GitHub] [incubator-hudi] codecov-io edited a comment on issue #1159: [HUDI-479] Eliminate or Minimize use of Guava if possible

2020-03-11 Thread GitBox
codecov-io edited a comment on issue #1159: [HUDI-479] Eliminate or Minimize 
use of Guava if possible
URL: https://github.com/apache/incubator-hudi/pull/1159#issuecomment-596089314
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1159?src=pr=h1) 
Report
   > Merging 
[#1159](https://codecov.io/gh/apache/incubator-hudi/pull/1159?src=pr=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/7194514aff33815a2f4d6d1847f00b94d1a1a36b=desc)
 will **decrease** coverage by `0.16%`.
   > The diff coverage is `55.00%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1159/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1159?src=pr=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #1159      +/-   ##
   ============================================
   - Coverage     67.45%   67.28%   -0.17%
     Complexity      230      230
   ============================================
     Files           336      338       +2
     Lines         16366    16469     +103
     Branches       1672     1684      +12
   ============================================
   + Hits          11039    11081      +42
   - Misses         4592     4647      +55
   - Partials        735      741       +6
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-hudi/pull/1159?src=pr=tree) | 
Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...org/apache/hudi/config/HoodieCompactionConfig.java](https://codecov.io/gh/apache/incubator-hudi/pull/1159/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29uZmlnL0hvb2RpZUNvbXBhY3Rpb25Db25maWcuamF2YQ==)
 | `80.00% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[...in/java/org/apache/hudi/io/HoodieAppendHandle.java](https://codecov.io/gh/apache/incubator-hudi/pull/1159/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW8vSG9vZGllQXBwZW5kSGFuZGxlLmphdmE=)
 | `83.76% <ø> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[...pache/hudi/common/model/TimelineLayoutVersion.java](https://codecov.io/gh/apache/incubator-hudi/pull/1159/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL1RpbWVsaW5lTGF5b3V0VmVyc2lvbi5qYXZh)
 | `65.00% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[...pache/hudi/common/table/HoodieTableMetaClient.java](https://codecov.io/gh/apache/incubator-hudi/pull/1159/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL0hvb2RpZVRhYmxlTWV0YUNsaWVudC5qYXZh)
 | `76.77% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[...che/hudi/common/table/log/HoodieLogFileReader.java](https://codecov.io/gh/apache/incubator-hudi/pull/1159/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL2xvZy9Ib29kaWVMb2dGaWxlUmVhZGVyLmphdmE=)
 | `74.02% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[...common/table/view/AbstractTableFileSystemView.java](https://codecov.io/gh/apache/incubator-hudi/pull/1159/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3ZpZXcvQWJzdHJhY3RUYWJsZUZpbGVTeXN0ZW1WaWV3LmphdmE=)
 | `92.66% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[...common/table/view/FileSystemViewStorageConfig.java](https://codecov.io/gh/apache/incubator-hudi/pull/1159/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3ZpZXcvRmlsZVN5c3RlbVZpZXdTdG9yYWdlQ29uZmlnLmphdmE=)
 | `84.12% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[...on/table/view/RemoteHoodieTableFileSystemView.java](https://codecov.io/gh/apache/incubator-hudi/pull/1159/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3ZpZXcvUmVtb3RlSG9vZGllVGFibGVGaWxlU3lzdGVtVmlldy5qYXZh)
 | `77.95% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[...e/hudi/common/util/queue/BoundedInMemoryQueue.java](https://codecov.io/gh/apache/incubator-hudi/pull/1159/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvcXVldWUvQm91bmRlZEluTWVtb3J5UXVldWUuamF2YQ==)
 | `91.13% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[...pache/hudi/common/versioning/MetadataMigrator.java](https://codecov.io/gh/apache/incubator-hudi/pull/1159/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3ZlcnNpb25pbmcvTWV0YWRhdGFNaWdyYXRvci5qYXZh)
 | `58.33% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | ... and [47 
more](https://codecov.io/gh/apache/incubator-hudi/pull/1159/diff?src=pr=tree-more)
 | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1159?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 

[GitHub] [incubator-hudi] codecov-io edited a comment on issue #1350: [HUDI-629]: Replace Guava's Hashing with an equivalent in NumericUtils.java

2020-03-11 Thread GitBox
codecov-io edited a comment on issue #1350: [HUDI-629]: Replace Guava's Hashing 
with an equivalent in NumericUtils.java
URL: https://github.com/apache/incubator-hudi/pull/1350#issuecomment-59868
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1350?src=pr=h1) 
Report
   > Merging 
[#1350](https://codecov.io/gh/apache/incubator-hudi/pull/1350?src=pr=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/7194514aff33815a2f4d6d1847f00b94d1a1a36b=desc)
 will **decrease** coverage by `0.02%`.
   > The diff coverage is `58.50%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1350/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1350?src=pr=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #1350      +/-   ##
   ============================================
   - Coverage     67.45%   67.42%   -0.03%
     Complexity      230      230
   ============================================
     Files           336      337       +1
     Lines         16366    16386      +20
     Branches       1672     1672
   ============================================
   + Hits          11039    11049      +10
   - Misses         4592     4598       +6
   - Partials        735      739       +4
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-hudi/pull/1350?src=pr=tree) | 
Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...org/apache/hudi/config/HoodieCompactionConfig.java](https://codecov.io/gh/apache/incubator-hudi/pull/1350/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29uZmlnL0hvb2RpZUNvbXBhY3Rpb25Db25maWcuamF2YQ==)
 | `80.80% <0.00%> (+0.80%)` | `0.00 <0.00> (ø)` | |
   | 
[...java/org/apache/hudi/config/HoodieWriteConfig.java](https://codecov.io/gh/apache/incubator-hudi/pull/1350/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29uZmlnL0hvb2RpZVdyaXRlQ29uZmlnLmphdmE=)
 | `83.84% <ø> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[...pache/hudi/common/model/TimelineLayoutVersion.java](https://codecov.io/gh/apache/incubator-hudi/pull/1350/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL1RpbWVsaW5lTGF5b3V0VmVyc2lvbi5qYXZh)
 | `65.00% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[...pache/hudi/common/table/HoodieTableMetaClient.java](https://codecov.io/gh/apache/incubator-hudi/pull/1350/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL0hvb2RpZVRhYmxlTWV0YUNsaWVudC5qYXZh)
 | `76.77% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[...common/table/view/FileSystemViewStorageConfig.java](https://codecov.io/gh/apache/incubator-hudi/pull/1350/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3ZpZXcvRmlsZVN5c3RlbVZpZXdTdG9yYWdlQ29uZmlnLmphdmE=)
 | `84.12% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[...e/hudi/common/util/queue/BoundedInMemoryQueue.java](https://codecov.io/gh/apache/incubator-hudi/pull/1350/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvcXVldWUvQm91bmRlZEluTWVtb3J5UXVldWUuamF2YQ==)
 | `91.13% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[...pache/hudi/common/versioning/MetadataMigrator.java](https://codecov.io/gh/apache/incubator-hudi/pull/1350/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3ZlcnNpb25pbmcvTWV0YWRhdGFNaWdyYXRvci5qYXZh)
 | `58.33% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[...mmon/versioning/clean/CleanV2MigrationHandler.java](https://codecov.io/gh/apache/incubator-hudi/pull/1350/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3ZlcnNpb25pbmcvY2xlYW4vQ2xlYW5WMk1pZ3JhdGlvbkhhbmRsZXIuamF2YQ==)
 | `94.87% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[...in/java/org/apache/hudi/hive/HoodieHiveClient.java](https://codecov.io/gh/apache/incubator-hudi/pull/1350/diff?src=pr=tree#diff-aHVkaS1oaXZlLXN5bmMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGl2ZS9Ib29kaWVIaXZlQ2xpZW50LmphdmE=)
 | `61.70% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[.../apache/hudi/hive/MultiPartKeysValueExtractor.java](https://codecov.io/gh/apache/incubator-hudi/pull/1350/diff?src=pr=tree#diff-aHVkaS1oaXZlLXN5bmMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGl2ZS9NdWx0aVBhcnRLZXlzVmFsdWVFeHRyYWN0b3IuamF2YQ==)
 | `55.55% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | ... and [32 
more](https://codecov.io/gh/apache/incubator-hudi/pull/1350/diff?src=pr=tree-more)
 | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1350?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 

[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1390: [HUDI-634] Cut 0.5.2 documentation and write release note

2020-03-11 Thread GitBox
vinothchandar commented on a change in pull request #1390: [HUDI-634] Cut 0.5.2 
documentation and write release note
URL: https://github.com/apache/incubator-hudi/pull/1390#discussion_r391334410
 
 

 ##
 File path: docs/_pages/releases.md
 ##
 @@ -6,6 +6,32 @@ toc: true
 last_modified_at: 2019-12-30T15:59:57-04:00
 ---
 
+## [Release 
0.5.2-incubating](https://github.com/apache/incubator-hudi/releases/tag/release-0.5.2-incubating)
 ([docs](/docs/0.5.2-quick-start-guide.html))
+
+### Download Information
+ * Source Release : [Apache Hudi(incubating) 0.5.2-incubating Source 
Release](https://www.apache.org/dist/incubator/hudi/0.5.2-incubating/hudi-0.5.2-incubating.src.tgz)
 
([asc](https://www.apache.org/dist/incubator/hudi/0.5.2-incubating/hudi-0.5.2-incubating.src.tgz.asc),
 
[sha512](https://www.apache.org/dist/incubator/hudi/0.5.2-incubating/hudi-0.5.2-incubating.src.tgz.sha512))
+ * Apache Hudi (incubating) jars corresponding to this release are available [here](https://repository.apache.org/#nexus-search;quick~hudi)
+
+### Migration Guide for this release
+ * Support for overriding the payload implementation in `hoodie.properties` via the `hoodie.compaction.payload.class` config option. Previously, once the payload class was set in `hoodie.properties`, it could not be changed. In some cases, if the code is refactored and the jar updated, one may need to pass the new payload class name.
+ * Write client restructuring has moved classes around ([HUDI-554](https://issues.apache.org/jira/browse/HUDI-554)). Package `client` now holds all the client classes that handle transaction management. `func` was renamed to `execution`, and some helpers moved to `client/utils`. All compaction code previously under `io` now lives under `table/compact`, rollback code under `table/rollback`, and in general all code for individual operations under `table`.
 
 Review comment:
   Let's make it clearer that this only affects the apps/projects depending on 
`hudi-client`. Users of deltastreamer/datasource will see no change.. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
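
The migration note quoted above describes letting a `hoodie.compaction.payload.class` value passed at write time win over the value persisted in `hoodie.properties`. A minimal standalone sketch of that precedence rule (illustrative code, not Hudi's actual implementation; the `com.example.*` class names are invented):

```java
import java.util.Properties;

// Illustrative sketch of the override semantics described above: a payload
// class supplied in the per-write config takes precedence over the one
// persisted in hoodie.properties.
public class PayloadClassOverrideDemo {

    static final String PAYLOAD_KEY = "hoodie.compaction.payload.class";

    // Returns the write-config value when present, else the persisted table value.
    public static String resolvePayloadClass(Properties tableProps, Properties writeConfig) {
        String overridden = writeConfig.getProperty(PAYLOAD_KEY);
        return overridden != null ? overridden : tableProps.getProperty(PAYLOAD_KEY);
    }

    // Exercises both cases: no override (table value wins) and override set.
    public static String[] demo() {
        Properties table = new Properties();
        table.setProperty(PAYLOAD_KEY, "com.example.OldPayload"); // persisted once

        Properties write = new Properties();                      // per-write config
        String withoutOverride = resolvePayloadClass(table, write);

        write.setProperty(PAYLOAD_KEY, "com.example.NewPayload"); // refactored class name
        String withOverride = resolvePayloadClass(table, write);

        return new String[] {withoutOverride, withOverride};
    }

    public static void main(String[] args) {
        String[] r = demo();
        System.out.println(r[0]); // com.example.OldPayload
        System.out.println(r[1]); // com.example.NewPayload
    }
}
```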


[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1390: [HUDI-634] Cut 0.5.2 documentation and write release note

2020-03-11 Thread GitBox
vinothchandar commented on a change in pull request #1390: [HUDI-634] Cut 0.5.2 
documentation and write release note
URL: https://github.com/apache/incubator-hudi/pull/1390#discussion_r391334510
 
 

 ##
 File path: docs/_pages/releases.md
 ##
 @@ -6,6 +6,32 @@ toc: true
 last_modified_at: 2019-12-30T15:59:57-04:00
 ---
 
+## [Release 
0.5.2-incubating](https://github.com/apache/incubator-hudi/releases/tag/release-0.5.2-incubating)
 ([docs](/docs/0.5.2-quick-start-guide.html))
+
+### Download Information
+ * Source Release : [Apache Hudi(incubating) 0.5.2-incubating Source 
Release](https://www.apache.org/dist/incubator/hudi/0.5.2-incubating/hudi-0.5.2-incubating.src.tgz)
 
([asc](https://www.apache.org/dist/incubator/hudi/0.5.2-incubating/hudi-0.5.2-incubating.src.tgz.asc),
 
[sha512](https://www.apache.org/dist/incubator/hudi/0.5.2-incubating/hudi-0.5.2-incubating.src.tgz.sha512))
+ * Apache Hudi (incubating) jars corresponding to this release are available [here](https://repository.apache.org/#nexus-search;quick~hudi)
+
+### Migration Guide for this release
+ * Support for overriding the payload implementation in `hoodie.properties` via the `hoodie.compaction.payload.class` config option. Previously, once the payload class was set in `hoodie.properties`, it could not be changed. In some cases, if the code is refactored and the jar updated, one may need to pass the new payload class name.
+ * Write client restructuring has moved classes around ([HUDI-554](https://issues.apache.org/jira/browse/HUDI-554)). Package `client` now holds all the client classes that handle transaction management. `func` was renamed to `execution`, and some helpers moved to `client/utils`. All compaction code previously under `io` now lives under `table/compact`, rollback code under `table/rollback`, and in general all code for individual operations under `table`.
+ * Simplified `HoodieBloomIndex` without the need for 2GB limit handling. Prior to Spark 2.4.0, each Spark partition had a 2GB limit. In Hudi 0.5.1, after the upgrade to Spark 2.4.4, this limitation no longer applies, so the safe parallelism constraint in `HoodieBloomIndex` was removed.
 
 Review comment:
   Again, this is not needed for migration, right? 




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1390: [HUDI-634] Cut 0.5.2 documentation and write release note

2020-03-11 Thread GitBox
vinothchandar commented on a change in pull request #1390: [HUDI-634] Cut 0.5.2 
documentation and write release note
URL: https://github.com/apache/incubator-hudi/pull/1390#discussion_r391334001
 
 

 ##
 File path: docs/_pages/releases.md
 ##
 @@ -6,6 +6,32 @@ toc: true
 last_modified_at: 2019-12-30T15:59:57-04:00
 ---
 
+## [Release 
0.5.2-incubating](https://github.com/apache/incubator-hudi/releases/tag/release-0.5.2-incubating)
 ([docs](/docs/0.5.2-quick-start-guide.html))
+
+### Download Information
+ * Source Release : [Apache Hudi(incubating) 0.5.2-incubating Source 
Release](https://www.apache.org/dist/incubator/hudi/0.5.2-incubating/hudi-0.5.2-incubating.src.tgz)
 
([asc](https://www.apache.org/dist/incubator/hudi/0.5.2-incubating/hudi-0.5.2-incubating.src.tgz.asc),
 
[sha512](https://www.apache.org/dist/incubator/hudi/0.5.2-incubating/hudi-0.5.2-incubating.src.tgz.sha512))
+ * Apache Hudi (incubating) jars corresponding to this release are available [here](https://repository.apache.org/#nexus-search;quick~hudi)
+
+### Migration Guide for this release
+ * Support for overriding the payload implementation in `hoodie.properties` via the `hoodie.compaction.payload.class` config option. Previously, once the payload class was set in `hoodie.properties`, it could not be changed. In some cases, if the code is refactored and the jar updated, one may need to pass the new payload class name.
 
 Review comment:
   I am not sure if this is needed for migration, right? its just a feature. 
Can we move this to highlights?




[GitHub] [incubator-hudi] vinothchandar merged pull request #1397: [HUDI-692] Add delete savepoint for cli

2020-03-11 Thread GitBox
vinothchandar merged pull request #1397: [HUDI-692] Add delete savepoint for cli
URL: https://github.com/apache/incubator-hudi/pull/1397
 
 
   




[incubator-hudi] branch master updated: [HUDI-692] Add delete savepoint for cli (#1397)

2020-03-11 Thread vinoth
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 0f892ef  [HUDI-692] Add delete savepoint for cli (#1397)
0f892ef is described below

commit 0f892ef62c76436b17030e6edb4642f476d7de1e
Author: hongdd 
AuthorDate: Thu Mar 12 07:49:02 2020 +0800

[HUDI-692] Add delete savepoint for cli (#1397)

* Add delete savepoint for cli
* Add check
* Move JavaSparkContext to try
---
 .../hudi/cli/commands/SavepointsCommand.java   | 39 +-
 1 file changed, 30 insertions(+), 9 deletions(-)

diff --git 
a/hudi-cli/src/main/java/org/apache/hudi/cli/commands/SavepointsCommand.java 
b/hudi-cli/src/main/java/org/apache/hudi/cli/commands/SavepointsCommand.java
index 65a813d..9ef15ac 100644
--- a/hudi-cli/src/main/java/org/apache/hudi/cli/commands/SavepointsCommand.java
+++ b/hudi-cli/src/main/java/org/apache/hudi/cli/commands/SavepointsCommand.java
@@ -76,17 +76,17 @@ public class SavepointsCommand implements CommandMarker {
   return "Commit " + commitTime + " not found in Commits " + timeline;
 }
 
-JavaSparkContext jsc = SparkUtil.initJavaSparkConf("Create Savepoint");
-HoodieWriteClient client = createHoodieClient(jsc, 
metaClient.getBasePath());
 String result;
-if (client.savepoint(commitTime, user, comments)) {
-  // Refresh the current
-  refreshMetaClient();
-  result = String.format("The commit \"%s\" has been savepointed.", 
commitTime);
-} else {
-  result = String.format("Failed: Could not savepoint commit \"%s\".", 
commitTime);
+try (JavaSparkContext jsc = SparkUtil.initJavaSparkConf("Create 
Savepoint")) {
+  HoodieWriteClient client = createHoodieClient(jsc, 
metaClient.getBasePath());
+  if (client.savepoint(commitTime, user, comments)) {
+// Refresh the current
+refreshMetaClient();
+result = String.format("The commit \"%s\" has been savepointed.", 
commitTime);
+  } else {
+result = String.format("Failed: Could not savepoint commit \"%s\".", 
commitTime);
+  }
 }
-jsc.close();
 return result;
   }
 
@@ -127,6 +127,27 @@ public class SavepointsCommand implements CommandMarker {
 return "Metadata for table " + 
HoodieCLI.getTableMetaClient().getTableConfig().getTableName() + " refreshed.";
   }
 
+  @CliCommand(value = "savepoint delete", help = "Delete the savepoint")
+  public String deleteSavepoint(@CliOption(key = {"commit"}, help = "Delete a 
savepoint") final String commitTime) throws Exception {
+HoodieTableMetaClient metaClient = HoodieCLI.getTableMetaClient();
+HoodieTimeline completedInstants = 
metaClient.getActiveTimeline().getSavePointTimeline().filterCompletedInstants();
+if (completedInstants.empty()) {
+  throw new HoodieException("There are no completed savepoint to run 
delete");
+}
+HoodieInstant savePoint = new HoodieInstant(false, 
HoodieTimeline.SAVEPOINT_ACTION, commitTime);
+
+if (!completedInstants.containsInstant(savePoint)) {
+  return "Commit " + commitTime + " not found in Commits " + 
completedInstants;
+}
+
+try (JavaSparkContext jsc = SparkUtil.initJavaSparkConf("Delete 
Savepoint")) {
+  HoodieWriteClient client = createHoodieClient(jsc, 
metaClient.getBasePath());
+  client.deleteSavepoint(commitTime);
+  refreshMetaClient();
+}
+return "Savepoint " + commitTime + " deleted";
+  }
+
   private static HoodieWriteClient createHoodieClient(JavaSparkContext jsc, 
String basePath) throws Exception {
 HoodieWriteConfig config = 
HoodieWriteConfig.newBuilder().withPath(basePath)
 
.withIndexConfig(HoodieIndexConfig.newBuilder().withIndexType(HoodieIndex.IndexType.BLOOM).build()).build();
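
The commit above moves the `JavaSparkContext` into a try-with-resources block; before the change, the trailing `jsc.close()` was skipped whenever `savepoint(...)` threw, leaking the context. A standalone sketch of why the pattern helps, using a stand-in `AutoCloseable` (an assumption for demonstration purposes, since the real Spark class needs a cluster runtime):

```java
// FakeContext stands in for JavaSparkContext: try-with-resources guarantees
// close() runs on every exit path, including when the body throws.
public class TryWithResourcesDemo {

    static class FakeContext implements AutoCloseable {
        boolean closed = false;

        @Override
        public void close() {
            closed = true;
        }
    }

    // Mimics the fixed savepoint flow: the context is closed even when the
    // body fails, unlike a plain close() call after the work.
    public static FakeContext runSavepoint(boolean fail) {
        FakeContext ctx = new FakeContext();
        try (FakeContext c = ctx) {
            if (fail) {
                throw new RuntimeException("savepoint failed");
            }
        } catch (RuntimeException e) {
            // swallowed for the demo; a real caller would report the failure
        }
        return ctx;
    }

    public static void main(String[] args) {
        System.out.println("closed after failure: " + runSavepoint(true).closed);  // true
        System.out.println("closed after success: " + runSavepoint(false).closed); // true
    }
}
```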



[incubator-hudi] branch master updated: [MINOR] Removing code which is duplicated from the base class HoodieWriteHandle. (#1399)

2020-03-11 Thread vinoth
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 7d66831  [MINOR] Removing code which is duplicated from the base class 
HoodieWriteHandle. (#1399)
7d66831 is described below

commit 7d668314447650243ec5c872229efdd02fb0212c
Author: Prashant Wason 
AuthorDate: Wed Mar 11 16:43:04 2020 -0700

[MINOR] Removing code which is duplicated from the base class 
HoodieWriteHandle. (#1399)
---
 .../java/org/apache/hudi/io/HoodieMergeHandle.java | 54 --
 1 file changed, 54 deletions(-)

diff --git 
a/hudi-client/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java 
b/hudi-client/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java
index 078c47f..199c0a0 100644
--- a/hudi-client/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java
+++ b/hudi-client/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java
@@ -93,65 +93,11 @@ public class HoodieMergeHandle extends HoodieWrit
   }
 
   @Override
-  public Path makeNewPath(String partitionPath) {
-Path path = FSUtils.getPartitionPath(config.getBasePath(), partitionPath);
-try {
-  fs.mkdirs(path); // create a new partition as needed.
-} catch (IOException e) {
-  throw new HoodieIOException("Failed to make dir " + path, e);
-}
-
-return new Path(path.toString(), FSUtils.makeDataFileName(instantTime, 
writeToken, fileId));
-  }
-
-  @Override
   public Schema getWriterSchema() {
 return writerSchema;
   }
 
   /**
-   * Determines whether we can accept the incoming records, into the current 
file. Depending on
-   * 
-   * - Whether it belongs to the same partitionPath as existing records - 
Whether the current file written bytes lt max
-   * file size
-   */
-  @Override
-  public boolean canWrite(HoodieRecord record) {
-return false;
-  }
-
-  /**
-   * Perform the actual writing of the given record into the backing file.
-   */
-  @Override
-  public void write(HoodieRecord record, Option insertValue) {
-// NO_OP
-  }
-
-  /**
-   * Perform the actual writing of the given record into the backing file.
-   */
-  @Override
-  public void write(HoodieRecord record, Option avroRecord, 
Option exception) {
-Option recordMetadata = record.getData().getMetadata();
-if (exception.isPresent() && exception.get() instanceof Throwable) {
-  // Not throwing exception from here, since we don't want to fail the 
entire job for a single record
-  writeStatus.markFailure(record, exception.get(), recordMetadata);
-  LOG.error("Error writing record " + record, exception.get());
-} else {
-  write(record, avroRecord);
-}
-  }
-
-  /**
-   * Rewrite the GenericRecord with the Schema containing the Hoodie Metadata 
fields.
-   */
-  @Override
-  protected GenericRecord rewriteRecord(GenericRecord record) {
-return HoodieAvroUtils.rewriteRecord(record, writerSchema);
-  }
-
-  /**
* Extract old file path, initialize StorageWriter and WriteStatus.
*/
   private void init(String fileId, String partitionPath, HoodieBaseFile 
dataFileToBeMerged) {
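
The removal above works because each deleted override merely repeated the base-class body from `HoodieWriteHandle`; deleting such an override is behavior-preserving, since the subclass simply inherits the base method. A toy sketch of that refactor (class and method names are illustrative, not Hudi's real signatures):

```java
// When a subclass override duplicates the base implementation verbatim,
// deleting the override changes nothing: the method is inherited instead.
public class InheritDemo {

    static class BaseHandle {
        public String rewriteRecord(String record) {
            return "rewritten:" + record;
        }
    }

    // Before the cleanup this class re-declared rewriteRecord() with an
    // identical body; after the cleanup it inherits it from BaseHandle.
    static class MergeHandle extends BaseHandle {
    }

    public static void main(String[] args) {
        System.out.println(new MergeHandle().rewriteRecord("r1")); // rewritten:r1
    }
}
```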



[GitHub] [incubator-hudi] vinothchandar merged pull request #1399: [MINOR] Removing code which is duplicated from the base class.

2020-03-11 Thread GitBox
vinothchandar merged pull request #1399: [MINOR] Removing code which is 
duplicated from the base class.
URL: https://github.com/apache/incubator-hudi/pull/1399
 
 
   




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1399: [MINOR] Removing code which is duplicated from the base class.

2020-03-11 Thread GitBox
vinothchandar commented on a change in pull request #1399: [MINOR] Removing 
code which is duplicated from the base class.
URL: https://github.com/apache/incubator-hudi/pull/1399#discussion_r391329711
 
 

 ##
 File path: hudi-client/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java
 ##
 @@ -92,65 +92,11 @@ public static Schema createHoodieWriteSchema(Schema 
originalSchema) {
 return HoodieAvroUtils.addMetadataFields(originalSchema);
 
 Review comment:
   this method is also unused?




[incubator-hudi] branch master updated (1ca912a -> 7194514)

2020-03-11 Thread vbalaji
This is an automated email from the ASF dual-hosted git repository.

vbalaji pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git.


from 1ca912a  [HUDI-667] Fixing delete tests for DeltaStreamer (#1395)
 add 7194514  [HUDI-689] Change CLI command names to not have overlap 
(#1392)

No new revisions were added by this update.

Summary of changes:
 .../src/main/java/org/apache/hudi/cli/commands/CommitsCommand.java  | 2 +-
 .../main/java/org/apache/hudi/cli/commands/CompactionCommand.java   | 6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)



[GitHub] [incubator-hudi] bvaradar merged pull request #1392: [HUDI-689] Change CLI command names to not have overlap

2020-03-11 Thread GitBox
bvaradar merged pull request #1392: [HUDI-689] Change CLI command names to not 
have overlap
URL: https://github.com/apache/incubator-hudi/pull/1392
 
 
   




[GitHub] [incubator-hudi] bvaradar merged pull request #1395: [HUDI-667] Fixing delete tests for DeltaStreamer

2020-03-11 Thread GitBox
bvaradar merged pull request #1395: [HUDI-667] Fixing delete tests for 
DeltaStreamer
URL: https://github.com/apache/incubator-hudi/pull/1395
 
 
   




[incubator-hudi] branch master updated: [HUDI-667] Fixing delete tests for DeltaStreamer (#1395)

2020-03-11 Thread vbalaji
This is an automated email from the ASF dual-hosted git repository.

vbalaji pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 1ca912a  [HUDI-667] Fixing delete tests for DeltaStreamer (#1395)
1ca912a is described below

commit 1ca912af0904283a270a822d5876babca5c89739
Author: Sivabalan Narayanan 
AuthorDate: Wed Mar 11 16:19:23 2020 -0700

[HUDI-667] Fixing delete tests for DeltaStreamer (#1395)
---
 .../hudi/common/HoodieTestDataGenerator.java   | 42 +++---
 .../hudi/utilities/TestHoodieDeltaStreamer.java|  4 +--
 2 files changed, 23 insertions(+), 23 deletions(-)

diff --git 
a/hudi-client/src/test/java/org/apache/hudi/common/HoodieTestDataGenerator.java 
b/hudi-client/src/test/java/org/apache/hudi/common/HoodieTestDataGenerator.java
index e0d2a53..6d86e93 100644
--- 
a/hudi-client/src/test/java/org/apache/hudi/common/HoodieTestDataGenerator.java
+++ 
b/hudi-client/src/test/java/org/apache/hudi/common/HoodieTestDataGenerator.java
@@ -399,7 +399,6 @@ public class HoodieTestDataGenerator {
*/
  public Stream<HoodieRecord> generateUniqueUpdatesStream(String commitTime, Integer n) {
    final Set<KeyPartition> used = new HashSet<>();
-
 if (n > numExistingKeys) {
   throw new IllegalArgumentException("Requested unique updates is greater 
than number of available keys");
 }
@@ -429,24 +428,24 @@ public class HoodieTestDataGenerator {
*/
  public Stream<HoodieKey> generateUniqueDeleteStream(Integer n) {
    final Set<KeyPartition> used = new HashSet<>();
-
 if (n > numExistingKeys) {
   throw new IllegalArgumentException("Requested unique deletes is greater 
than number of available keys");
 }
 
-return IntStream.range(0, n).boxed().map(i -> {
-  int index = numExistingKeys == 1 ? 0 : RAND.nextInt(numExistingKeys - 1);
-  KeyPartition kp = existingKeys.get(index);
-  // Find the available keyPartition starting from randomly chosen one.
-  while (used.contains(kp)) {
+    List<HoodieKey> result = new ArrayList<>();
+for (int i = 0; i < n; i++) {
+  int index = RAND.nextInt(numExistingKeys);
+  while (!existingKeys.containsKey(index)) {
 index = (index + 1) % numExistingKeys;
-kp = existingKeys.get(index);
   }
-  existingKeys.remove(kp);
+  KeyPartition kp = existingKeys.remove(index);
+  existingKeys.put(index, existingKeys.get(numExistingKeys - 1));
+  existingKeys.remove(numExistingKeys - 1);
   numExistingKeys--;
   used.add(kp);
-  return kp.key;
-});
+  result.add(kp.key);
+}
+return result.stream();
   }
 
   /**
@@ -458,28 +457,29 @@ public class HoodieTestDataGenerator {
*/
  public Stream<HoodieRecord> generateUniqueDeleteRecordStream(String commitTime, Integer n) {
    final Set<KeyPartition> used = new HashSet<>();
-
 if (n > numExistingKeys) {
   throw new IllegalArgumentException("Requested unique deletes is greater 
than number of available keys");
 }
 
-return IntStream.range(0, n).boxed().map(i -> {
-  int index = numExistingKeys == 1 ? 0 : RAND.nextInt(numExistingKeys - 1);
-  KeyPartition kp = existingKeys.get(index);
-  // Find the available keyPartition starting from randomly chosen one.
-  while (used.contains(kp)) {
+    List<HoodieRecord> result = new ArrayList<>();
+for (int i = 0; i < n; i++) {
+  int index = RAND.nextInt(numExistingKeys);
+  while (!existingKeys.containsKey(index)) {
 index = (index + 1) % numExistingKeys;
-kp = existingKeys.get(index);
   }
-  existingKeys.remove(kp);
+  // swap chosen index with last index and remove last entry. 
+  KeyPartition kp = existingKeys.remove(index);
+  existingKeys.put(index, existingKeys.get(numExistingKeys - 1));
+  existingKeys.remove(numExistingKeys - 1);
   numExistingKeys--;
   used.add(kp);
   try {
-return new HoodieRecord(kp.key, generateRandomDeleteValue(kp.key, 
commitTime));
+result.add(new HoodieRecord(kp.key, generateRandomDeleteValue(kp.key, 
commitTime)));
   } catch (IOException e) {
 throw new HoodieIOException(e.getMessage(), e);
   }
-});
+}
+return result.stream();
   }
 
   public String[] getPartitionPaths() {
diff --git 
a/hudi-utilities/src/test/java/org/apache/hudi/utilities/TestHoodieDeltaStreamer.java
 
b/hudi-utilities/src/test/java/org/apache/hudi/utilities/TestHoodieDeltaStreamer.java
index 9d324dc..100faa2 100644
--- 
a/hudi-utilities/src/test/java/org/apache/hudi/utilities/TestHoodieDeltaStreamer.java
+++ 
b/hudi-utilities/src/test/java/org/apache/hudi/utilities/TestHoodieDeltaStreamer.java
@@ -422,8 +422,8 @@ public class TestHoodieDeltaStreamer extends 
UtilitiesTestBase {
   } else {
 TestHelpers.assertAtleastNCompactionCommits(5, tableBasePath, dfs);
   }
-  TestHelpers.assertRecordCount(totalRecords + 200, tableBasePath + 
"/*/*.parquet", sqlContext);

[GitHub] [incubator-hudi] codecov-io edited a comment on issue #1399: [MINOR] Removing code which is duplicated from the base class.

2020-03-11 Thread GitBox
codecov-io edited a comment on issue #1399: [MINOR] Removing code which is 
duplicated from the base class.
URL: https://github.com/apache/incubator-hudi/pull/1399#issuecomment-597925767
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1399?src=pr=h1) 
Report
   > Merging 
[#1399](https://codecov.io/gh/apache/incubator-hudi/pull/1399?src=pr=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/77d5b92d88d6583bdfc09e4c10ecfe7ddbb04806=desc)
 will **increase** coverage by `0.09%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1399/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1399?src=pr=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #1399      +/-   ##
   ============================================
   + Coverage     67.40%   67.49%    +0.09%
     Complexity      230      230
   ============================================
     Files           336      336
     Lines         16366    16351       -15
     Branches       1672     1671        -1
   ============================================
   + Hits          11031    11036        +5
   + Misses         4602     4580       -22
   - Partials        733      735        +2
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-hudi/pull/1399?src=pr=tree) | 
Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...ain/java/org/apache/hudi/io/HoodieMergeHandle.java](https://codecov.io/gh/apache/incubator-hudi/pull/1399/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW8vSG9vZGllTWVyZ2VIYW5kbGUuamF2YQ==)
 | `78.87% <ø> (+6.89%)` | `0.00 <0.00> (ø)` | |
   | 
[...g/apache/hudi/metrics/InMemoryMetricsReporter.java](https://codecov.io/gh/apache/incubator-hudi/pull/1399/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9Jbk1lbW9yeU1ldHJpY3NSZXBvcnRlci5qYXZh)
 | `25.00% <0.00%> (-50.00%)` | `0.00% <0.00%> (ø%)` | |
   | 
[...n/java/org/apache/hudi/common/model/HoodieKey.java](https://codecov.io/gh/apache/incubator-hudi/pull/1399/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0hvb2RpZUtleS5qYXZh)
 | `88.88% <0.00%> (-5.56%)` | `0.00% <0.00%> (ø%)` | |
   | 
[...i/utilities/deltastreamer/HoodieDeltaStreamer.java](https://codecov.io/gh/apache/incubator-hudi/pull/1399/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvSG9vZGllRGVsdGFTdHJlYW1lci5qYXZh)
 | `79.79% <0.00%> (-1.02%)` | `8.00% <0.00%> (ø%)` | |
   | 
[...ache/hudi/common/util/collection/DiskBasedMap.java](https://codecov.io/gh/apache/incubator-hudi/pull/1399/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvY29sbGVjdGlvbi9EaXNrQmFzZWRNYXAuamF2YQ==)
 | `83.07% <0.00%> (+8.46%)` | `0.00% <0.00%> (ø%)` | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1399?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1399?src=pr=footer).
 Last update 
[77d5b92...dc187ca](https://codecov.io/gh/apache/incubator-hudi/pull/1399?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   




[GitHub] [incubator-hudi] codecov-io commented on issue #1399: [MINOR] Removing code which is duplicated from the base class.

2020-03-11 Thread GitBox
codecov-io commented on issue #1399: [MINOR] Removing code which is duplicated 
from the base class.
URL: https://github.com/apache/incubator-hudi/pull/1399#issuecomment-597925767
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1399?src=pr=h1) 
Report
   > Merging 
[#1399](https://codecov.io/gh/apache/incubator-hudi/pull/1399?src=pr=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/77d5b92d88d6583bdfc09e4c10ecfe7ddbb04806?src=pr=desc)
 will **decrease** coverage by `66.76%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1399/graphs/tree.svg?width=650=VTTXabwbs2=150=pr)](https://codecov.io/gh/apache/incubator-hudi/pull/1399?src=pr=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##            master    #1399       +/-   ##
   =============================================
   - Coverage     67.4%    0.64%    -66.77%
   + Complexity     230        2       -228
   =============================================
     Files          336      289        -47
     Lines        16366    14360      -2006
     Branches      1672     1466       -206
   =============================================
   - Hits         11031       92     -10939
   - Misses        4602    14265      +9663
   + Partials       733        3       -730
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-hudi/pull/1399?src=pr=tree) | 
Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...ain/java/org/apache/hudi/io/HoodieMergeHandle.java](https://codecov.io/gh/apache/incubator-hudi/pull/1399/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW8vSG9vZGllTWVyZ2VIYW5kbGUuamF2YQ==)
 | `0% <ø> (-71.98%)` | `0 <0> (ø)` | |
   | 
[...apache/hudi/common/model/HoodieDeltaWriteStat.java](https://codecov.io/gh/apache/incubator-hudi/pull/1399/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0hvb2RpZURlbHRhV3JpdGVTdGF0LmphdmE=)
 | `0% <0%> (-100%)` | `0% <0%> (ø)` | |
   | 
[.../index/hbase/DefaultHBaseQPSResourceAllocator.java](https://codecov.io/gh/apache/incubator-hudi/pull/1399/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW5kZXgvaGJhc2UvRGVmYXVsdEhCYXNlUVBTUmVzb3VyY2VBbGxvY2F0b3IuamF2YQ==)
 | `0% <0%> (-100%)` | `0% <0%> (ø)` | |
   | 
[...org/apache/hudi/common/model/HoodieFileFormat.java](https://codecov.io/gh/apache/incubator-hudi/pull/1399/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0hvb2RpZUZpbGVGb3JtYXQuamF2YQ==)
 | `0% <0%> (-100%)` | `0% <0%> (ø)` | |
   | 
[...g/apache/hudi/execution/BulkInsertMapFunction.java](https://codecov.io/gh/apache/incubator-hudi/pull/1399/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZXhlY3V0aW9uL0J1bGtJbnNlcnRNYXBGdW5jdGlvbi5qYXZh)
 | `0% <0%> (-100%)` | `0% <0%> (ø)` | |
   | 
[.../common/util/queue/IteratorBasedQueueProducer.java](https://codecov.io/gh/apache/incubator-hudi/pull/1399/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvcXVldWUvSXRlcmF0b3JCYXNlZFF1ZXVlUHJvZHVjZXIuamF2YQ==)
 | `0% <0%> (-100%)` | `0% <0%> (ø)` | |
   | 
[...rg/apache/hudi/index/bloom/KeyRangeLookupTree.java](https://codecov.io/gh/apache/incubator-hudi/pull/1399/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW5kZXgvYmxvb20vS2V5UmFuZ2VMb29rdXBUcmVlLmphdmE=)
 | `0% <0%> (-100%)` | `0% <0%> (ø)` | |
   | 
[...e/hudi/common/table/timeline/dto/FileGroupDTO.java](https://codecov.io/gh/apache/incubator-hudi/pull/1399/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL2R0by9GaWxlR3JvdXBEVE8uamF2YQ==)
 | `0% <0%> (-100%)` | `0% <0%> (ø)` | |
   | 
[...apache/hudi/timeline/service/handlers/Handler.java](https://codecov.io/gh/apache/incubator-hudi/pull/1399/diff?src=pr=tree#diff-aHVkaS10aW1lbGluZS1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL3RpbWVsaW5lL3NlcnZpY2UvaGFuZGxlcnMvSGFuZGxlci5qYXZh)
 | `0% <0%> (-100%)` | `0% <0%> (ø)` | |
   | 
[...che/hudi/common/table/timeline/dto/LogFileDTO.java](https://codecov.io/gh/apache/incubator-hudi/pull/1399/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL2R0by9Mb2dGaWxlRFRPLmphdmE=)
 | `0% <0%> (-100%)` | `0% <0%> (ø)` | |
   | ... and [290 
more](https://codecov.io/gh/apache/incubator-hudi/pull/1399/diff?src=pr=tree-more)
 | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1399?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1399?src=pr=footer).
 Last update 

[GitHub] [incubator-hudi] lamber-ken commented on issue #1398: [SUPPORT] DeltaStreamer - NoClassDefFoundError for HiveDriver

2020-03-11 Thread GitBox
lamber-ken commented on issue #1398: [SUPPORT] DeltaStreamer - 
NoClassDefFoundError for HiveDriver
URL: https://github.com/apache/incubator-hudi/issues/1398#issuecomment-597816698
 
 
   The `hive-jdbc` jar is missing from the Spark jars folder; download it and add it there:
   
https://repository.cloudera.com/artifactory/cloudera-repos/org/apache/hive/hive-jdbc/2.1.1-cdh6.1.0/hive-jdbc-2.1.1-cdh6.1.0.jar
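
   A quick pre-check like the following can turn the opaque `NoClassDefFoundError` in the report below into an actionable message before a long-running job starts. This is a hedged, illustrative sketch (the class name `DriverCheck` and method `isOnClasspath` are my own, not a Hudi or Spark API); `org.apache.hive.jdbc.HiveDriver` is the class the stacktrace reports missing:

```java
// Illustrative sketch: verify a JDBC driver class is visible on the
// classpath before enabling hive sync, instead of failing mid-job.
public class DriverCheck {

  public static boolean isOnClasspath(String className) {
    try {
      // initialize=false: we only care whether the class can be located
      Class.forName(className, false, DriverCheck.class.getClassLoader());
      return true;
    } catch (ClassNotFoundException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    // java.sql.Driver ships with the JDK, so this check passes here; on the
    // cluster you would check "org.apache.hive.jdbc.HiveDriver" instead.
    System.out.println(isOnClasspath("java.sql.Driver")); // prints true
  }
}
```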




[jira] [Commented] (HUDI-242) Support Efficient bootstrap of large parquet datasets to Hudi

2020-03-11 Thread Pratyaksh Sharma (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17057323#comment-17057323
 ] 

Pratyaksh Sharma commented on HUDI-242:
---

I would like to work on hive sync integration and presto integration of 
bootstrapped table. [~vbalaji]

> Support Efficient bootstrap of large parquet datasets to Hudi
> -
>
> Key: HUDI-242
> URL: https://issues.apache.org/jira/browse/HUDI-242
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Usability
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Major
> Fix For: 0.6.0
>
>
>  Support Efficient bootstrap of large parquet tables



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] pratyakshsharma edited a comment on issue #1150: [HUDI-288]: Add support for ingesting multiple kafka streams in a single DeltaStreamer deployment

2020-03-11 Thread GitBox
pratyakshsharma edited a comment on issue #1150: [HUDI-288]: Add support for 
ingesting multiple kafka streams in a single DeltaStreamer deployment
URL: https://github.com/apache/incubator-hudi/pull/1150#issuecomment-597641119
 
 
   > @pratyakshsharma : Let us know if you need any help on getting this 
through :) ?
   
   @bvaradar The fixes in this PR depend on 
https://github.com/apache/incubator-hudi/pull/1395. Let us try to close it as 
soon as possible to get this through. :) Also I have tried to address most of 
the things. Please take a pass. 




[GitHub] [incubator-hudi] utk-spartan edited a comment on issue #1384: [SUPPORT] Hudi datastore missing updates for many records

2020-03-11 Thread GitBox
utk-spartan edited a comment on issue #1384: [SUPPORT] Hudi datastore missing 
updates for many records
URL: https://github.com/apache/incubator-hudi/issues/1384#issuecomment-597750294
 
 
   Could this be related to https://issues.apache.org/jira/browse/HUDI-409? We recently encountered parquet corruption errors (magic number mismatch) while reading from Presto on a fresh Hudi table, and no errors/warnings were reported by Spark or in the Hudi commit metadata files.




[GitHub] [incubator-hudi] utk-spartan commented on issue #1384: [SUPPORT] Hudi datastore missing updates for many records

2020-03-11 Thread GitBox
utk-spartan commented on issue #1384: [SUPPORT] Hudi datastore missing updates 
for many records
URL: https://github.com/apache/incubator-hudi/issues/1384#issuecomment-597750294
 
 
   Could this be related to https://issues.apache.org/jira/browse/HUDI-409? We recently encountered parquet corruption errors (magic number mismatch) while reading from Presto on a fresh Hudi table.




[GitHub] [incubator-hudi] utk-spartan commented on issue #1384: [SUPPORT] Hudi datastore missing updates for many records

2020-03-11 Thread GitBox
utk-spartan commented on issue #1384: [SUPPORT] Hudi datastore missing updates 
for many records
URL: https://github.com/apache/incubator-hudi/issues/1384#issuecomment-597748228
 
 
   This is for COW tables. Upon analyzing the data, missing record updates were below 0.01% for older updated data but have recently increased to around 20-30%.
   
   We can't find any failures in the Spark logs.




[GitHub] [incubator-hudi] bvaradar commented on issue #1384: [SUPPORT] Hudi datastore missing updates for many records

2020-03-11 Thread GitBox
bvaradar commented on issue #1384: [SUPPORT] Hudi datastore missing updates for 
many records
URL: https://github.com/apache/incubator-hudi/issues/1384#issuecomment-597746244
 
 
   @jainnidhi703 : Is this a MOR or COW table? Also, can you give us some idea of the % of missing records? And can you inspect the Spark logs to see if there are any other failures?




[GitHub] [incubator-hudi] eigakow opened a new issue #1398: [SUPPORT] DeltaStreamer - NoClassDefFoundError for HiveDriver

2020-03-11 Thread GitBox
eigakow opened a new issue #1398: [SUPPORT] DeltaStreamer - 
NoClassDefFoundError for HiveDriver
URL: https://github.com/apache/incubator-hudi/issues/1398
 
 
   **Describe the problem you faced**
   
   Using DeltaStreamer with --enable-hive-sync throws 
`java.lang.NoClassDefFoundError: org/apache/hive/jdbc/HiveDriver` error.
   Should I change something in the default compilation process to include this 
class?
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1.  Properties file:
   ```
   hoodie.datasource.write.recordkey.field=ts
   hoodie.datasource.write.partitionpath.field=ts
   
hoodie.deltastreamer.schemaprovider.source.schema.file=file:///home/director/me/hudi-0.5.1-incubating/schema.avro
   
hoodie.deltastreamer.schemaprovider.target.schema.file=file:///home/director/me/hudi-0.5.1-incubating/schema.avro
   source-class=FR24JsonKafkaSource
   
bootstrap.servers=streaming-kafka-broker-1:9092,streaming-kafka-broker-2:9092,streaming-kafka-broker-3:9092
   group.id=hudi_testing
   hoodie.deltastreamer.source.kafka.topic=fr-bru
   enable.auto.commit=false
   schemaprovider-class=org.apache.hudi.utilities.schema.FilebasedSchemaProvider
   auto.offset.reset=earliest
   
   hoodie.datasource.hive_sync.database=fr24raw
   hoodie.datasource.hive_sync.table=test_hudi
   
hoodie.datasource.hive_sync.jdbcurl=jdbc:hive2://master-1.bigdatapoc.local:1/default;principal=hive/master-1.bigdatapoc.local@BIGDATAPOC.LOCAL
   hoodie.datasource.hive_sync.assume_date_partitioning=true
   hoodie.datasource.hive_sync.useJdbc=false
   ```
   2. Launch spark-submit with HoodieDeltaStreamer
   ```
   spark-submit --master yarn  --class 
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer --jars 
$(pwd)/../my-app-1-jar-with-dependencies.jar 
$(pwd)/packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.5.1-incubating.jar
 --props hdfs:///tmp/hudi-fr24.properties --target-base-path 
adl://XXX.azuredatalakestore.net/test-hudi --table-type MERGE_ON_READ 
--target-table test_hudi --source-class FR24JsonKafkaSource  
--schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider 
--enable-hive-sync --continuous --source-limit 100
   ```
   **Expected behavior**
   
   Sync to hive works
   
   **Environment Description**
   
   * Hudi version : hudi-0.5.1-incubating
   
   * Spark version : 2.4.0-cdh6.1.0
   
   * Hive version : 2.1.1-cdh6.1.0
   
   * Hadoop version : 3.0.0-cdh6.1.0
   
   * Storage (HDFS/S3/GCS..) : ADLS
   
   * Running on Docker? (yes/no) : no
   
   
   **Stacktrace**
   
   ```
   20/03/11 16:04:47 INFO cluster.YarnScheduler: Removed TaskSet 37.0, whose 
tasks have all completed, from pool
   20/03/11 16:04:47 INFO scheduler.DAGScheduler: ResultStage 37 (collect at 
HoodieMergeOnReadTableCompactor.java:208) finished in 0.679 s
   20/03/11 16:04:47 INFO scheduler.DAGScheduler: Job 12 finished: collect at 
HoodieMergeOnReadTableCompactor.java:208, took 0.680344 s
   20/03/11 16:04:47 INFO compact.HoodieMergeOnReadTableCompactor: Total of 0 
compactions are retrieved
   20/03/11 16:04:47 INFO compact.HoodieMergeOnReadTableCompactor: Total number 
of latest files slices 4
   20/03/11 16:04:47 INFO compact.HoodieMergeOnReadTableCompactor: Total number 
of log files 0
   20/03/11 16:04:47 INFO compact.HoodieMergeOnReadTableCompactor: Total number 
of file slices 4
   20/03/11 16:04:47 WARN compact.HoodieMergeOnReadTableCompactor: After 
filtering, Nothing to compact for 
adl://ecintpocdl.azuredatalakestore.net/FlightRadar24/test-hudi3
   20/03/11 16:04:47 INFO deltastreamer.DeltaSync: Syncing target hoodie table 
with hive table(test_hudi). Hive metastore URL 
:jdbc:hive2://master-1.bigdatapoc.local:1/default;principal=hive/master-1.bigdatapoc.local@BIGDATAPOC.LOCAL,
 basePath :adl://XXX.azuredatalakestore.net/test-hudi
   20/03/11 16:04:47 INFO deltastreamer.HoodieDeltaStreamer: Delta Sync 
shutdown. Error ?false
   20/03/11 16:04:47 WARN deltastreamer.HoodieDeltaStreamer: Gracefully 
shutting down compactor
   20/03/11 16:05:00 INFO deltastreamer.HoodieDeltaStreamer: Compactor shutting 
down properly!!
   20/03/11 16:05:00 ERROR deltastreamer.AbstractDeltaStreamerService: Service 
shutdown with error
   java.util.concurrent.ExecutionException: java.lang.NoClassDefFoundError: 
org/apache/hive/jdbc/HiveDriver
   at 
java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
   at 
java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
   at 
org.apache.hudi.utilities.deltastreamer.AbstractDeltaStreamerService.waitForShutdown(AbstractDeltaStreamerService.java:72)
   at 
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:117)
   at 
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:295)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 

[jira] [Commented] (HUDI-437) Support user-defined index

2020-03-11 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17057058#comment-17057058
 ] 

leesf commented on HUDI-437:


Sure, will open a PR in the next few days.

> Support user-defined index
> --
>
> Key: HUDI-437
> URL: https://issues.apache.org/jira/browse/HUDI-437
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Index, newbie, Writer Core
>Reporter: leesf
>Assignee: leesf
>Priority: Major
> Fix For: 0.6.0
>
>
> Currently, Hudi does not support user-defined index, and will throw exception 
> if configured other index type except for HBASE/INMEMORY/BLOOM/GLOBAL_BLOOM



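
A minimal sketch of what the proposed user-defined index support could look like: instead of rejecting anything outside HBASE/INMEMORY/BLOOM/GLOBAL_BLOOM, load a configured class name by reflection. This is an assumption-laden illustration, not Hudi's actual API: `HoodieIndexLike`, `MyCustomIndex`, and `loadIndex` are hypothetical names.

```java
import java.lang.reflect.Constructor;

// Hypothetical sketch of pluggable index loading (names are illustrative,
// not Hudi's real interfaces): a configured fully-qualified class name is
// instantiated by reflection rather than matched against a fixed enum.
public class UserDefinedIndexLoader {

  public interface HoodieIndexLike {
    String name();
  }

  public static class MyCustomIndex implements HoodieIndexLike {
    public MyCustomIndex() {}

    @Override
    public String name() {
      return "my-custom-index";
    }
  }

  // Instantiate the configured index class via its public no-arg constructor.
  public static HoodieIndexLike loadIndex(String className) {
    try {
      Constructor<?> ctor = Class.forName(className).getConstructor();
      return (HoodieIndexLike) ctor.newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IllegalArgumentException("Cannot load index class: " + className, e);
    }
  }

  public static void main(String[] args) {
    HoodieIndexLike index = loadIndex(MyCustomIndex.class.getName());
    System.out.println(index.name()); // prints my-custom-index
  }
}
```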


[GitHub] [incubator-hudi] pratyakshsharma commented on issue #1150: [HUDI-288]: Add support for ingesting multiple kafka streams in a single DeltaStreamer deployment

2020-03-11 Thread GitBox
pratyakshsharma commented on issue #1150: [HUDI-288]: Add support for ingesting 
multiple kafka streams in a single DeltaStreamer deployment
URL: https://github.com/apache/incubator-hudi/pull/1150#issuecomment-597641119
 
 
   > @pratyakshsharma : Let us know if you need any help on getting this 
through :) ?
   
   @bvaradar The fixes in this PR depend on 
https://github.com/apache/incubator-hudi/pull/1395. Let us try to close it as 
soon as possible to get this through. :) 




[GitHub] [incubator-hudi] pratyakshsharma commented on issue #1174: [HUDI-96]: Implemented command line options instead of positional arguments for CLI commands

2020-03-11 Thread GitBox
pratyakshsharma commented on issue #1174: [HUDI-96]: Implemented command line 
options instead of positional arguments for CLI commands
URL: https://github.com/apache/incubator-hudi/pull/1174#issuecomment-597638791
 
 
   @n3nash Did you get a chance to review it again? Was thinking of rebasing it 
along with addressing your comments. 




[GitHub] [incubator-hudi] hddong commented on issue #1397: [HUDI-692] Add delete savepoint for cli

2020-03-11 Thread GitBox
hddong commented on issue #1397: [HUDI-692] Add delete savepoint for cli
URL: https://github.com/apache/incubator-hudi/pull/1397#issuecomment-597628877
 
 
   @yanghua thanks for your review.




[GitHub] [incubator-hudi] adamjoneill commented on issue #1325: presto - querying nested object in parquet file created by hudi

2020-03-11 Thread GitBox
adamjoneill commented on issue #1325: presto - querying nested object in 
parquet file created by hudi
URL: https://github.com/apache/incubator-hudi/issues/1325#issuecomment-597608269
 
 
   @bhasudha your example looks fine. Apart from the different environment setups, it looks like you were able to query a record without a nested simple item successfully.




[jira] [Resolved] (HUDI-688) Ensure NOTICE contains all notices for the dependencies called out in LICENSE

2020-03-11 Thread Suneel Marthi (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi resolved HUDI-688.

Resolution: Fixed

> Ensure NOTICE contains all notices for the dependencies called out in LICENSE
> -
>
> Key: HUDI-688
> URL: https://issues.apache.org/jira/browse/HUDI-688
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Release & Administrative
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.5.2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Closed] (HUDI-688) Ensure NOTICE contains all notices for the dependencies called out in LICENSE

2020-03-11 Thread Suneel Marthi (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi closed HUDI-688.
--

> Ensure NOTICE contains all notices for the dependencies called out in LICENSE
> -
>
> Key: HUDI-688
> URL: https://issues.apache.org/jira/browse/HUDI-688
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Release & Administrative
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.5.2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[incubator-hudi] branch master updated: [HUDI-688] Paring down the NOTICE file to minimum required notices (#1391)

2020-03-11 Thread smarthi
This is an automated email from the ASF dual-hosted git repository.

smarthi pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new dd7cf38  [HUDI-688] Paring down the NOTICE file to minimum required 
notices (#1391)
dd7cf38 is described below

commit dd7cf38a137ed43b8f00aa14c985bf8b106e256f
Author: vinoth chandar 
AuthorDate: Wed Mar 11 05:24:07 2020 -0700

[HUDI-688] Paring down the NOTICE file to minimum required notices (#1391)

- Based on analysis, we don't need to call out anything
 - We only do source releases at this time
 - Fix typo in LICENSE
---
 LICENSE |  2 +-
 NOTICE  | 81 -
 2 files changed, 1 insertion(+), 82 deletions(-)

diff --git a/LICENSE b/LICENSE
index 34d6be6..28dfacd 100644
--- a/LICENSE
+++ b/LICENSE
@@ -284,7 +284,7 @@ SOFTWARE.
 
 ---
 
-This product includes code from org.apache.hadoop.
+This product includes code from Apache Hadoop
 
 * org.apache.hudi.common.bloom.filter.InternalDynamicBloomFilter.java adapted 
from org.apache.hadoop.util.bloom.DynamicBloomFilter.java
 
diff --git a/NOTICE b/NOTICE
index c0469fa..ecd4479 100644
--- a/NOTICE
+++ b/NOTICE
@@ -3,84 +3,3 @@ Copyright 2019 and onwards The Apache Software Foundation
 
 This product includes software developed at
 The Apache Software Foundation (http://www.apache.org/).
-
-This project bundles the following dependencies
-
-
-Metrics
-Copyright 2010-2013 Coda Hale and Yammer, Inc.
-
-This product includes software developed by Coda Hale and Yammer, Inc.
-
--
-Guava
-Copyright (C) 2007 The Guava Authors
-
-Licensed under the Apache License, Version 2.0
-
--
-Kryo (https://github.com/EsotericSoftware/kryo)
-Copyright (c) 2008-2018, Nathan Sweet All rights reserved.
-
-Redistribution and use in source and binary forms, with or without 
modification, are permitted provided that the
-following conditions are met:
-
-Redistributions of source code must retain the above copyright notice, this 
list of conditions and the following disclaimer.
-Redistributions in binary form must reproduce the above copyright notice, this 
list of conditions and the following disclaimer in the documentation and/or 
other materials provided with the distribution.
-
-Neither the name of Esoteric Software nor the names of its contributors may be 
used to endorse or promote products derived from this software without specific 
prior written permission.
-THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 
SERVICES; LOSS OF USE, DATA, OR PROF [...]
-
-
-Jackson JSON Processor
-
-This copy of Jackson JSON processor streaming parser/generator is licensed 
under the
-Apache (Software) License, version 2.0 ("the License").
-See the License for details about distribution rights, and the
-specific rights regarding derivate works.
-
-You may obtain a copy of the License at:
-
-http://www.apache.org/licenses/LICENSE-2.0
-
---
-
-Gson
-Copyright 2008 Google Inc.
-
-Licensed under the Apache License, Version 2.0 (the "License");
-you may not use this file except in compliance with the License.
-You may obtain a copy of the License at
-
-http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License.
-
-
-= Apache Hadoop 2.8.5 =
-Apache Hadoop
-Copyright 2009-2017 The Apache Software Foundation
-
-= Apache Hive 2.3.1 =
-Apache Hive
-Copyright 2008-2017 The Apache Software Foundation
-
-= Apache Spark 2.4.4 =
-Apache Spark
-Copyright 2014 and onwards The Apache Software Foundation
-
-= Apache Kafka 2.0.0 =
-Apache Kafka
-Copyright 2020 The Apache Software Foundation.
-
-= Apache HBase 1.2.3 =
-Apache HBase
-Copyright 2007-2019 The Apache Software Foundation.
-
-= Apache Avro 1.8.2 =
-Apache Avro
-Copyright 2010-2019 The Apache Software Foundation.
\ No newline at end of file



[GitHub] [incubator-hudi] smarthi merged pull request #1391: [HUDI-688] Paring down the NOTICE file to minimum required notices

2020-03-11 Thread GitBox
smarthi merged pull request #1391: [HUDI-688] Paring down the NOTICE file to 
minimum required notices
URL: https://github.com/apache/incubator-hudi/pull/1391
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Closed] (HUDI-670) Improve unit test coverage for org.apache.hudi.common.util.collection.DiskBasedMap

2020-03-11 Thread Suneel Marthi (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi closed HUDI-670.
--

> Improve unit test coverage for 
> org.apache.hudi.common.util.collection.DiskBasedMap
> --
>
> Key: HUDI-670
> URL: https://issues.apache.org/jira/browse/HUDI-670
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>Reporter: Prashant Wason
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>   Original Estimate: 2h
>  Time Spent: 20m
>  Remaining Estimate: 1h 40m
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-670) Improve unit test coverage for org.apache.hudi.common.util.collection.DiskBasedMap

2020-03-11 Thread Suneel Marthi (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi resolved HUDI-670.

Fix Version/s: 0.6.0
   Resolution: Fixed

> Improve unit test coverage for 
> org.apache.hudi.common.util.collection.DiskBasedMap
> --
>
> Key: HUDI-670
> URL: https://issues.apache.org/jira/browse/HUDI-670
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>Reporter: Prashant Wason
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>   Original Estimate: 2h
>  Time Spent: 20m
>  Remaining Estimate: 1h 40m
>






[incubator-hudi] branch master updated: [HUDI-670] Added test cases for TestDiskBasedMap. (#1379)

2020-03-11 Thread smarthi
This is an automated email from the ASF dual-hosted git repository.

smarthi pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new cf0a4c1  [HUDI-670] Added test cases for TestDiskBasedMap. (#1379)
cf0a4c1 is described below

commit cf0a4c19bc4ed850172e6ac938f57a0bf7e96353
Author: Prashant Wason 
AuthorDate: Wed Mar 11 05:03:03 2020 -0700

[HUDI-670] Added test cases for TestDiskBasedMap. (#1379)

* [HUDI-670] Added test cases for TestDiskBasedMap.

* Update TestDiskBasedMap.java

Co-authored-by: Suneel Marthi 
---
 .../common/util/collection/TestDiskBasedMap.java   | 25 ++
 1 file changed, 25 insertions(+)

diff --git 
a/hudi-common/src/test/java/org/apache/hudi/common/util/collection/TestDiskBasedMap.java
 
b/hudi-common/src/test/java/org/apache/hudi/common/util/collection/TestDiskBasedMap.java
old mode 100644
new mode 100755
index 2cc726e..3fcfab5
--- 
a/hudi-common/src/test/java/org/apache/hudi/common/util/collection/TestDiskBasedMap.java
+++ 
b/hudi-common/src/test/java/org/apache/hudi/common/util/collection/TestDiskBasedMap.java
@@ -20,6 +20,7 @@ package org.apache.hudi.common.util.collection;
 
 import org.apache.hudi.common.HoodieCommonTestHarness;
 import org.apache.hudi.common.model.AvroBinaryTestPayload;
+import org.apache.hudi.common.model.HoodieAvroPayload;
 import org.apache.hudi.common.model.HoodieKey;
 import org.apache.hudi.common.model.HoodieRecord;
 import org.apache.hudi.common.model.HoodieRecordPayload;
@@ -42,9 +43,11 @@ import java.io.IOException;
 import java.io.UncheckedIOException;
 import java.net.URISyntaxException;
 import java.util.ArrayList;
+import java.util.HashMap;
 import java.util.HashSet;
 import java.util.Iterator;
 import java.util.List;
+import java.util.Map;
 import java.util.Set;
 import java.util.UUID;
 import java.util.stream.Collectors;
@@ -184,6 +187,28 @@ public class TestDiskBasedMap extends HoodieCommonTestHarness {
     assertTrue(payloadSize > 0);
   }
 
+  @Test
+  public void testPutAll() throws IOException, URISyntaxException {
+    DiskBasedMap<String, HoodieRecord> records = new DiskBasedMap<>(basePath);
+    List<IndexedRecord> iRecords = SchemaTestUtil.generateHoodieTestRecords(0, 100);
+    Map<String, HoodieRecord> recordMap = new HashMap<>();
+    iRecords.forEach(r -> {
+      String key = ((GenericRecord) r).get(HoodieRecord.RECORD_KEY_METADATA_FIELD).toString();
+      String partitionPath = ((GenericRecord) r).get(HoodieRecord.PARTITION_PATH_METADATA_FIELD).toString();
+      HoodieRecord value = new HoodieRecord<>(new HoodieKey(key, partitionPath), new HoodieAvroPayload(Option.of((GenericRecord) r)));
+      recordMap.put(key, value);
+    });
+
+    records.putAll(recordMap);
+    // make sure records have spilled to disk
+    assertTrue(records.sizeOfFileOnDiskInBytes() > 0);
+
+    // make sure all added records are present
+    for (Map.Entry<String, HoodieRecord> entry : records.entrySet()) {
+      assertTrue(recordMap.containsKey(entry.getKey()));
+    }
+  }
+
   /**
* @na: Leaving this test here for a quick performance test
*/
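The testPutAll diff above checks two properties of a spillable map: bulk-inserted entries end up persisted in the backing file, and every inserted key remains retrievable. A minimal sketch of that contract, using a toy file-backed map as a stand-in (this is not Hudi's actual DiskBasedMap implementation, just the pattern the test exercises):

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.HashMap;
import java.util.Map;

public class PutAllSketch {
    // Toy spillable map: every value is appended to a file on disk, and an
    // in-memory index records which keys are present and where they live.
    static class FileBackedMap {
        private final RandomAccessFile file;
        private final Map<String, Long> index = new HashMap<>();

        FileBackedMap(File f) throws IOException {
            this.file = new RandomAccessFile(f, "rw");
        }

        void putAll(Map<String, String> m) throws IOException {
            for (Map.Entry<String, String> e : m.entrySet()) {
                long pos = file.length();
                file.seek(pos);
                file.writeUTF(e.getValue()); // value spills to disk
                index.put(e.getKey(), pos);  // key stays in memory
            }
        }

        long sizeOfFileOnDiskInBytes() throws IOException {
            return file.length();
        }

        boolean containsKey(String k) {
            return index.containsKey(k);
        }
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("spill", ".bin");
        tmp.deleteOnExit();
        FileBackedMap records = new FileBackedMap(tmp);

        Map<String, String> recordMap = new HashMap<>();
        for (int i = 0; i < 100; i++) {
            recordMap.put("key-" + i, "value-" + i);
        }
        records.putAll(recordMap);

        // Same two checks as the test: data spilled to disk, all keys present.
        System.out.println(records.sizeOfFileOnDiskInBytes() > 0);
        boolean allPresent = recordMap.keySet().stream().allMatch(records::containsKey);
        System.out.println(allPresent);
    }
}
```

The real DiskBasedMap additionally serializes full records and supports iteration over disk-resident values; this sketch only mirrors the spill-then-verify shape of the new test.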



[GitHub] [incubator-hudi] smarthi merged pull request #1379: [HUDI-670] Added test cases for TestDiskBasedMap.

2020-03-11 Thread GitBox
smarthi merged pull request #1379: [HUDI-670] Added test cases for 
TestDiskBasedMap.
URL: https://github.com/apache/incubator-hudi/pull/1379
 
 
   




[GitHub] [incubator-hudi] raimarkuehnsiemens commented on issue #143: Tracking ticket for folks to be added to slack group

2020-03-11 Thread GitBox
raimarkuehnsiemens commented on issue #143: Tracking ticket for folks to be 
added to slack group
URL: https://github.com/apache/incubator-hudi/issues/143#issuecomment-597592275
 
 
   Can you add me: raimar.wag...@siemens.com?
   Thanks!




[GitHub] [incubator-hudi] codecov-io commented on issue #1397: [HUDI-692] Add delete savepoint for cli

2020-03-11 Thread GitBox
codecov-io commented on issue #1397: [HUDI-692] Add delete savepoint for cli
URL: https://github.com/apache/incubator-hudi/pull/1397#issuecomment-597573024
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1397?src=pr=h1) 
Report
   > Merging 
[#1397](https://codecov.io/gh/apache/incubator-hudi/pull/1397?src=pr=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/77d5b92d88d6583bdfc09e4c10ecfe7ddbb04806=desc)
 will **decrease** coverage by `0.01%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1397/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1397?src=pr=tree)
   
   ```diff
    @@             Coverage Diff              @@
    ##             master    #1397      +/-   ##
    ============================================
    - Coverage     67.40%   67.38%    -0.02%
      Complexity      230      230
    ============================================
      Files           336      336
      Lines         16366    16366
      Branches       1672     1672
    ============================================
    - Hits          11031    11028        -3
      Misses        4602     4602
    - Partials       733      736        +3
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-hudi/pull/1397?src=pr=tree) | 
Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...n/java/org/apache/hudi/common/model/HoodieKey.java](https://codecov.io/gh/apache/incubator-hudi/pull/1397/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0hvb2RpZUtleS5qYXZh)
 | `88.88% <0.00%> (-5.56%)` | `0.00% <0.00%> (ø%)` | |
   | 
[...i/utilities/deltastreamer/HoodieDeltaStreamer.java](https://codecov.io/gh/apache/incubator-hudi/pull/1397/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvSG9vZGllRGVsdGFTdHJlYW1lci5qYXZh)
 | `79.79% <0.00%> (-1.02%)` | `8.00% <0.00%> (ø%)` | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1397?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
    > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1397?src=pr=footer).
 Last update 
[77d5b92...e6fe5b9](https://codecov.io/gh/apache/incubator-hudi/pull/1397?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   




[GitHub] [incubator-hudi] jainnidhi703 edited a comment on issue #1384: [SUPPORT] Hudi datastore missing updates for many records

2020-03-11 Thread GitBox
jainnidhi703 edited a comment on issue #1384: [SUPPORT] Hudi datastore missing 
updates for many records
URL: https://github.com/apache/incubator-hudi/issues/1384#issuecomment-597564066
 
 
   The issue was present on 0.4.7, so we thought it might be due to 
https://github.com/apache/incubator-hudi/issues/418, and upgraded to Hudi 
0.5.1. But the issue was still not resolved.




[GitHub] [incubator-hudi] jainnidhi703 commented on issue #1384: [SUPPORT] Hudi datastore missing updates for many records

2020-03-11 Thread GitBox
jainnidhi703 commented on issue #1384: [SUPPORT] Hudi datastore missing updates 
for many records
URL: https://github.com/apache/incubator-hudi/issues/1384#issuecomment-597564066
 
 
   The issue was present on 0.4.7, so we thought it might be due to 
https://github.com/apache/incubator-hudi/issues/418, and upgraded to Hudi 
0.5.1. But the issue still persists.




[GitHub] [incubator-hudi] yanghua commented on issue #1390: [HUDI-634] Cut 0.5.2 documentation and write release note

2020-03-11 Thread GitBox
yanghua commented on issue #1390: [HUDI-634] Cut 0.5.2 documentation and write 
release note
URL: https://github.com/apache/incubator-hudi/pull/1390#issuecomment-597526215
 
 
   @vinothchandar @bvaradar I have addressed most of your suggestions and have 
also cut the 0.5.2 docs. Please review again.




[GitHub] [incubator-hudi] yanghua commented on a change in pull request #1397: [HUDI-692] Add delete savepoint for cli

2020-03-11 Thread GitBox
yanghua commented on a change in pull request #1397: [HUDI-692] Add delete 
savepoint for cli
URL: https://github.com/apache/incubator-hudi/pull/1397#discussion_r390833231
 
 

 ##
 File path: 
hudi-cli/src/main/java/org/apache/hudi/cli/commands/SavepointsCommand.java
 ##
 @@ -127,6 +127,26 @@ public String refreshMetaClient() {
     return "Metadata for table " + HoodieCLI.getTableMetaClient().getTableConfig().getTableName() + " refreshed.";
   }
 
+  @CliCommand(value = "savepoint delete", help = "Delete the savepoint")
+  public String deleteSavepoint(@CliOption(key = {"commit"}, help = "Delete a savepoint") final String commitTime) throws Exception {
+    HoodieTableMetaClient metaClient = HoodieCLI.getTableMetaClient();
+    HoodieTimeline completedInstants = metaClient.getActiveTimeline().getSavePointTimeline().filterCompletedInstants();
+    if (completedInstants.empty()) {
+      throw new HoodieException("There is no completed savepoint to delete");
+    }
+    HoodieInstant savePoint = new HoodieInstant(false, HoodieTimeline.SAVEPOINT_ACTION, commitTime);
+
+    if (!completedInstants.containsInstant(savePoint)) {
+      return "Commit " + commitTime + " not found in Commits " + completedInstants;
+    }
+
+    JavaSparkContext jsc = SparkUtil.initJavaSparkConf("Delete Savepoint");
+    HoodieWriteClient client = createHoodieClient(jsc, metaClient.getBasePath());
+    client.deleteSavepoint(commitTime);
+    jsc.close();
 
 Review comment:
   `JavaSparkContext` implements the `AutoCloseable` interface; can we use 
`try-with-resources` here to avoid a resource leak?
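A sketch of the try-with-resources pattern this review comment suggests. Because `JavaSparkContext` implements `AutoCloseable`, wrapping it in try-with-resources guarantees `close()` runs even if `deleteSavepoint` throws. `SparkResource` below is a hypothetical stand-in for `JavaSparkContext` (the real class needs a Spark runtime), and the commit time is an illustrative value:

```java
import java.util.ArrayList;
import java.util.List;

public class SavepointDeleteSketch {
    static final List<String> events = new ArrayList<>();

    // Stand-in for JavaSparkContext: AutoCloseable, records when it closes.
    static class SparkResource implements AutoCloseable {
        @Override
        public void close() {
            events.add("closed");
        }
    }

    // Stand-in for HoodieWriteClient.deleteSavepoint(commitTime).
    static void deleteSavepoint(SparkResource ctx, String commitTime) {
        events.add("deleted " + commitTime);
    }

    public static void main(String[] args) {
        // Equivalent of the suggested shape:
        //   try (JavaSparkContext jsc = SparkUtil.initJavaSparkConf("Delete Savepoint")) {
        //       HoodieWriteClient client = createHoodieClient(jsc, basePath);
        //       client.deleteSavepoint(commitTime);
        //   }   // jsc.close() runs here, even on exception
        try (SparkResource jsc = new SparkResource()) {
            deleteSavepoint(jsc, "20200311000000");
        }
        System.out.println(String.join(",", events));
    }
}
```

The close event is recorded after the delete, showing the resource is released on scope exit without an explicit `jsc.close()` call.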




[GitHub] [incubator-hudi] dengziming commented on a change in pull request #1151: [HUDI-476] Add hudi-examples module

2020-03-11 Thread GitBox
dengziming commented on a change in pull request #1151: [HUDI-476] Add 
hudi-examples module
URL: https://github.com/apache/incubator-hudi/pull/1151#discussion_r390827545
 
 

 ##
 File path: 
hudi-examples/src/main/java/org/apache/hudi/examples/deltastreamer/HoodieDeltaStreamerDfsSourceExample.java
 ##
 @@ -0,0 +1,81 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.examples.deltastreamer;
+
+import org.apache.hudi.DataSourceWriteOptions;
+import org.apache.hudi.examples.common.HoodieExampleDataGenerator;
+import org.apache.hudi.examples.common.HoodieExampleSparkUtils;
+import org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer;
+import org.apache.hudi.utilities.sources.JsonDFSSource;
+import org.apache.hudi.utilities.transform.IdentityTransformer;
+
+import com.beust.jcommander.JCommander;
+import org.apache.spark.SparkConf;
+import org.apache.spark.api.java.JavaSparkContext;
+
+
+/**
+ * Simple examples of #{@link HoodieDeltaStreamer} from #{@link JsonDFSSource}.
+ *
+ * To run this example, you should
+ *   1. prepare sample data as 
`hudi-examples/src/main/resources/dfs-delta-streamer`
+ *   2. For running in IDE, set VM options `-Dspark.master=local[2]`
+ *   3. For running in shell, using `spark-submit`
+ *
+ * Usage: HoodieDeltaStreamerDfsSourceExample \
 
 Review comment:
   This is a good idea; I will try to extract the data preparation part on its own.




[GitHub] [incubator-hudi] dengziming commented on a change in pull request #1151: [HUDI-476] Add hudi-examples module

2020-03-11 Thread GitBox
dengziming commented on a change in pull request #1151: [HUDI-476] Add 
hudi-examples module
URL: https://github.com/apache/incubator-hudi/pull/1151#discussion_r390818477
 
 

 ##
 File path: 
hudi-examples/src/main/java/org/apache/hudi/examples/deltastreamer/HoodieDeltaStreamerDfsSourceExample.java
 ##
 @@ -0,0 +1,81 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.examples.deltastreamer;
+
+import org.apache.hudi.DataSourceWriteOptions;
+import org.apache.hudi.examples.common.HoodieExampleDataGenerator;
+import org.apache.hudi.examples.common.HoodieExampleSparkUtils;
+import org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer;
+import org.apache.hudi.utilities.sources.JsonDFSSource;
+import org.apache.hudi.utilities.transform.IdentityTransformer;
+
+import com.beust.jcommander.JCommander;
+import org.apache.spark.SparkConf;
+import org.apache.spark.api.java.JavaSparkContext;
+
+
+/**
+ * Simple examples of #{@link HoodieDeltaStreamer} from #{@link JsonDFSSource}.
+ *
+ * To run this example, you should
+ *   1. prepare sample data as 
`hudi-examples/src/main/resources/dfs-delta-streamer`
+ *   2. For running in IDE, set VM options `-Dspark.master=local[2]`
+ *   3. For running in shell, using `spark-submit`
+ *
+ * Usage: HoodieDeltaStreamerDfsSourceExample \
+ *--target-base-path /tmp/hoodie/dfsdeltatable \
+ *--table-type MERGE_ON_READ \
+ *--target-table dfsdeltatable
+ *
+ */
+public class HoodieDeltaStreamerDfsSourceExample {
+
+  public static void main(String[] args) throws Exception {
+
+final HoodieDeltaStreamer.Config cfg = defaultDfsStreamerConfig();
 
 Review comment:
   The advantage of adding these configs in code is that developers and users 
can execute the examples directly. Since the main purpose of an example is to 
serve as a tutorial, users will remove the hardcoded values when developing 
their own applications, so we can retain them here.
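The trade-off described here — runnable defaults hardcoded in the example, overridable from the command line — can be sketched as below. The `Config` class and flag names are illustrative (taken from the usage string in the diff), not Hudi's actual `HoodieDeltaStreamer.Config`, and the real code uses JCommander rather than this hand-rolled parser:

```java
public class ExampleConfigSketch {
    // Illustrative config with runnable defaults, so the example works
    // out of the box with no arguments.
    static class Config {
        String targetBasePath = "/tmp/hoodie/dfsdeltatable";
        String tableType = "MERGE_ON_READ";
        String targetTable = "dfsdeltatable";
    }

    // Start from defaults, then let any supplied flags override them.
    static Config parse(String[] args) {
        Config cfg = new Config();
        for (int i = 0; i + 1 < args.length; i += 2) {
            switch (args[i]) {
                case "--target-base-path": cfg.targetBasePath = args[i + 1]; break;
                case "--table-type":       cfg.tableType = args[i + 1];      break;
                case "--target-table":     cfg.targetTable = args[i + 1];    break;
                default: break; // ignore unknown flags in this sketch
            }
        }
        return cfg;
    }

    public static void main(String[] args) {
        Config defaults = parse(new String[0]);
        Config overridden = parse(new String[]{"--table-type", "COPY_ON_WRITE"});
        System.out.println(defaults.tableType + " " + overridden.tableType);
    }
}
```

With no flags the example runs with its tutorial defaults; a user adapting it passes their own values and later deletes the hardcoded ones.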




[jira] [Updated] (HUDI-634) Cut 0.5.2 documentation and write release note

2020-03-11 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-634:
--
Summary: Cut 0.5.2 documentation and write release note  (was: Write 
release blog and document breaking changes for 0.5.2 release)

> Cut 0.5.2 documentation and write release note
> --
>
> Key: HUDI-634
> URL: https://issues.apache.org/jira/browse/HUDI-634
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Release & Administrative
>Reporter: Vinoth Chandar
>Assignee: vinoyang
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.5.2
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> * Write Client restructuring has moved classes around (HUDI-554) 





[jira] [Updated] (HUDI-688) Ensure NOTICE contains all notices for the dependencies called out in LICENSE

2020-03-11 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-688:
--
Status: In Progress  (was: Open)

> Ensure NOTICE contains all notices for the dependencies called out in LICENSE
> -
>
> Key: HUDI-688
> URL: https://issues.apache.org/jira/browse/HUDI-688
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Release & Administrative
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.5.2
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[GitHub] [incubator-hudi] dengziming commented on a change in pull request #1151: [HUDI-476] Add hudi-examples module

2020-03-11 Thread GitBox
dengziming commented on a change in pull request #1151: [HUDI-476] Add 
hudi-examples module
URL: https://github.com/apache/incubator-hudi/pull/1151#discussion_r390802033
 
 

 ##
 File path: hudi-examples/pom.xml
 ##
 @@ -0,0 +1,206 @@
+
+
+http://maven.apache.org/POM/4.0.0; 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance; 
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 
http://maven.apache.org/xsd/maven-4.0.0.xsd;>
+
+  
+hudi
+org.apache.hudi
+0.5.2-SNAPSHOT
+  
+  4.0.0
+
+  hudi-examples
+  jar
+
+  
+${project.parent.basedir}
+  
+
+  
+
+  
+src/main/resources
+  
+
+
+
+  
+org.apache.maven.plugins
 
 Review comment:
   I think it's better to not have a fat jar here and add a 
`run_hudi_example.sh`.




[GitHub] [incubator-hudi] dengziming commented on a change in pull request #1151: [HUDI-476] Add hudi-examples module

2020-03-11 Thread GitBox
dengziming commented on a change in pull request #1151: [HUDI-476] Add 
hudi-examples module
URL: https://github.com/apache/incubator-hudi/pull/1151#discussion_r390802033
 
 

 ##
 File path: hudi-examples/pom.xml
 ##
 @@ -0,0 +1,206 @@
+
+
+http://maven.apache.org/POM/4.0.0; 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance; 
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 
http://maven.apache.org/xsd/maven-4.0.0.xsd;>
+
+  
+hudi
+org.apache.hudi
+0.5.2-SNAPSHOT
+  
+  4.0.0
+
+  hudi-examples
+  jar
+
+  
+${project.parent.basedir}
+  
+
+  
+
+  
+src/main/resources
+  
+
+
+
+  
+org.apache.maven.plugins
 
 Review comment:
   I think it's better to not have a fat jar here.




[GitHub] [incubator-hudi] hddong commented on issue #1397: [HUDI-692] Add delete savepoint for cli

2020-03-11 Thread GitBox
hddong commented on issue #1397: [HUDI-692] Add delete savepoint for cli
URL: https://github.com/apache/incubator-hudi/pull/1397#issuecomment-597498318
 
 
   @yanghua @vinothchandar please review.




[jira] [Updated] (HUDI-554) Restructure code/packages to move more code back into hudi-writer-common

2020-03-11 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-554:
--
Fix Version/s: (was: 0.6.0)
   0.5.2

> Restructure code/packages  to move more code back into hudi-writer-common
> -
>
> Key: HUDI-554
> URL: https://issues.apache.org/jira/browse/HUDI-554
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Code Cleanup
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[GitHub] [incubator-hudi] codecov-io edited a comment on issue #1165: [HUDI-76] Add CSV Source support for Hudi Delta Streamer

2020-03-11 Thread GitBox
codecov-io edited a comment on issue #1165: [HUDI-76] Add CSV Source support 
for Hudi Delta Streamer
URL: https://github.com/apache/incubator-hudi/pull/1165#issuecomment-597470147
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1165?src=pr=h1) 
Report
   > Merging 
[#1165](https://codecov.io/gh/apache/incubator-hudi/pull/1165?src=pr=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/77d5b92d88d6583bdfc09e4c10ecfe7ddbb04806=desc)
 will **increase** coverage by `0.01%`.
   > The diff coverage is `100.00%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1165/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1165?src=pr=tree)
   
   ```diff
    @@             Coverage Diff              @@
    ##             master    #1165      +/-   ##
    ============================================
    + Coverage     67.40%   67.41%    +0.01%
    - Complexity      230      240       +10
    ============================================
      Files           336      337        +1
      Lines         16366    16391       +25
      Branches       1672     1676        +4
    ============================================
    + Hits          11031    11050       +19
    - Misses        4602     4605        +3
    - Partials       733      736        +3
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-hudi/pull/1165?src=pr=tree) | 
Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...rg/apache/hudi/utilities/sources/CsvDFSSource.java](https://codecov.io/gh/apache/incubator-hudi/pull/1165/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQ3N2REZTU291cmNlLmphdmE=)
 | `100.00% <100.00%> (ø)` | `10.00 <10.00> (?)` | |
   | 
[...g/apache/hudi/metrics/InMemoryMetricsReporter.java](https://codecov.io/gh/apache/incubator-hudi/pull/1165/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9Jbk1lbW9yeU1ldHJpY3NSZXBvcnRlci5qYXZh)
 | `25.00% <0.00%> (-50.00%)` | `0.00% <0.00%> (ø%)` | |
   | 
[...n/java/org/apache/hudi/common/model/HoodieKey.java](https://codecov.io/gh/apache/incubator-hudi/pull/1165/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0hvb2RpZUtleS5qYXZh)
 | `88.88% <0.00%> (-5.56%)` | `0.00% <0.00%> (ø%)` | |
   | 
[...i/utilities/deltastreamer/HoodieDeltaStreamer.java](https://codecov.io/gh/apache/incubator-hudi/pull/1165/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvSG9vZGllRGVsdGFTdHJlYW1lci5qYXZh)
 | `79.79% <0.00%> (-1.02%)` | `8.00% <0.00%> (ø%)` | |
   | 
[...che/hudi/common/util/BufferedRandomAccessFile.java](https://codecov.io/gh/apache/incubator-hudi/pull/1165/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvQnVmZmVyZWRSYW5kb21BY2Nlc3NGaWxlLmphdmE=)
 | `54.38% <0.00%> (-0.88%)` | `0.00% <0.00%> (ø%)` | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1165?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
    > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1165?src=pr=footer).
 Last update 
[77d5b92...fb6bc0b](https://codecov.io/gh/apache/incubator-hudi/pull/1165?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   




[GitHub] [incubator-hudi] codecov-io commented on issue #1165: [HUDI-76] Add CSV Source support for Hudi Delta Streamer

2020-03-11 Thread GitBox
codecov-io commented on issue #1165: [HUDI-76] Add CSV Source support for Hudi 
Delta Streamer
URL: https://github.com/apache/incubator-hudi/pull/1165#issuecomment-597470147
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1165?src=pr=h1) 
Report
   > Merging 
[#1165](https://codecov.io/gh/apache/incubator-hudi/pull/1165?src=pr=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/77d5b92d88d6583bdfc09e4c10ecfe7ddbb04806?src=pr=desc)
 will **increase** coverage by `0.01%`.
   > The diff coverage is `100%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1165/graphs/tree.svg?width=650=VTTXabwbs2=150=pr)](https://codecov.io/gh/apache/incubator-hudi/pull/1165?src=pr=tree)
   
   ```diff
    @@             Coverage Diff              @@
    ##             master    #1165      +/-   ##
    ============================================
    + Coverage      67.4%   67.41%    +0.01%
    - Complexity      230      240       +10
    ============================================
      Files           336      337        +1
      Lines         16366    16391       +25
      Branches       1672     1676        +4
    ============================================
    + Hits          11031    11050       +19
    - Misses        4602     4605        +3
    - Partials       733      736        +3
   ```
   
   
    | [Impacted Files](https://codecov.io/gh/apache/incubator-hudi/pull/1165?src=pr=tree) | Coverage Δ | Complexity Δ | |
    |---|---|---|---|
    | [...rg/apache/hudi/utilities/sources/CsvDFSSource.java](https://codecov.io/gh/apache/incubator-hudi/pull/1165/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQ3N2REZTU291cmNlLmphdmE=) | `100% <100%> (ø)` | `10 <10> (?)` | |
    | [...g/apache/hudi/metrics/InMemoryMetricsReporter.java](https://codecov.io/gh/apache/incubator-hudi/pull/1165/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9Jbk1lbW9yeU1ldHJpY3NSZXBvcnRlci5qYXZh) | `25% <0%> (-50%)` | `0% <0%> (ø)` | |
    | [...n/java/org/apache/hudi/common/model/HoodieKey.java](https://codecov.io/gh/apache/incubator-hudi/pull/1165/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0hvb2RpZUtleS5qYXZh) | `88.88% <0%> (-5.56%)` | `0% <0%> (ø)` | |
    | [...i/utilities/deltastreamer/HoodieDeltaStreamer.java](https://codecov.io/gh/apache/incubator-hudi/pull/1165/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvSG9vZGllRGVsdGFTdHJlYW1lci5qYXZh) | `79.79% <0%> (-1.02%)` | `8% <0%> (ø)` | |
    | [...che/hudi/common/util/BufferedRandomAccessFile.java](https://codecov.io/gh/apache/incubator-hudi/pull/1165/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvQnVmZmVyZWRSYW5kb21BY2Nlc3NGaWxlLmphdmE=) | `54.38% <0%> (-0.88%)` | `0% <0%> (ø)` | |
   
   --
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1165?src=pr=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1165?src=pr=footer). Last update [77d5b92...fb6bc0b](https://codecov.io/gh/apache/incubator-hudi/pull/1165?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Updated] (HUDI-692) Add delete savepoint for cli

2020-03-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-692:

Labels: pull-request-available  (was: )

> Add delete savepoint for cli
> 
>
> Key: HUDI-692
> URL: https://issues.apache.org/jira/browse/HUDI-692
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: CLI
>Reporter: hong dongdong
>Assignee: hong dongdong
>Priority: Major
>  Labels: pull-request-available
>
> deleteSavepoint is already provided in HoodieWriteClient but not exposed to the user; add it to the CLI.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] hddong opened a new pull request #1397: [HUDI-692] Add delete savepoint for cli

2020-03-11 Thread GitBox
hddong opened a new pull request #1397: [HUDI-692] Add delete savepoint for cli
URL: https://github.com/apache/incubator-hudi/pull/1397
 
 
   ## What is the purpose of the pull request
   
   *deleteSavepoint is already provided in HoodieWriteClient but not exposed to the user; this PR adds it to the CLI.*
   
   ## Brief change log
   
 - *Add delete savepoint for cli*
   
   ## Verify this pull request
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.


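The pattern this PR follows — exposing an already-existing client method (HoodieWriteClient#deleteSavepoint) through a CLI command — can be sketched with a minimal, self-contained command registry. This is illustrative only: `MiniCli` and its handler names are invented and do not reflect Hudi's actual Spring Shell wiring.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class MiniCli {
    // Hypothetical stand-in for HoodieWriteClient#deleteSavepoint.
    static boolean deleteSavepoint(String commitTime) {
        return commitTime != null && !commitTime.isEmpty();
    }

    private final Map<String, Function<String, String>> commands = new HashMap<>();

    MiniCli() {
        // Registering the command is all it takes to make the
        // existing client API reachable from the CLI.
        commands.put("savepoint delete", time ->
            deleteSavepoint(time)
                ? "Savepoint " + time + " deleted"
                : "Failed to delete savepoint");
    }

    String run(String command, String arg) {
        Function<String, String> handler = commands.get(command);
        return handler == null ? "Unknown command: " + command : handler.apply(arg);
    }

    public static void main(String[] args) {
        System.out.println(new MiniCli().run("savepoint delete", "20200311000000"));
    }
}
```

The point of the sketch is that the CLI layer stays thin: it parses arguments and delegates, while the deletion logic lives entirely in the client.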


[jira] [Assigned] (HUDI-692) Add delete savepoint for cli

2020-03-11 Thread hong dongdong (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hong dongdong reassigned HUDI-692:
--

Assignee: hong dongdong

> Add delete savepoint for cli
> 
>
> Key: HUDI-692
> URL: https://issues.apache.org/jira/browse/HUDI-692
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: CLI
>Reporter: hong dongdong
>Assignee: hong dongdong
>Priority: Major
>
> deleteSavepoint is already provided in HoodieWriteClient but not exposed to the user; add it to the CLI.





[jira] [Created] (HUDI-692) Add delete savepoint for cli

2020-03-11 Thread hong dongdong (Jira)
hong dongdong created HUDI-692:
--

 Summary: Add delete savepoint for cli
 Key: HUDI-692
 URL: https://issues.apache.org/jira/browse/HUDI-692
 Project: Apache Hudi (incubating)
  Issue Type: New Feature
  Components: CLI
Reporter: hong dongdong


deleteSavepoint is already provided in HoodieWriteClient but not exposed to the user; add it to the CLI.





[GitHub] [incubator-hudi] yihua commented on a change in pull request #1165: [HUDI-76] Add CSV Source support for Hudi Delta Streamer

2020-03-11 Thread GitBox
yihua commented on a change in pull request #1165: [HUDI-76] Add CSV Source 
support for Hudi Delta Streamer
URL: https://github.com/apache/incubator-hudi/pull/1165#discussion_r390761706
 
 

 ##
 File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/TestHoodieDeltaStreamer.java
 ##
 @@ -693,6 +699,146 @@ public void testParquetDFSSourceWithSchemaFilesAndTransformer() throws Exception
     testParquetDFSSource(true, TripsWithDistanceTransformer.class.getName());
   }
 
+  private void prepareCsvDFSSource(
+      boolean hasHeader, char sep, boolean useSchemaProvider, boolean hasTransformer) throws IOException {
+    String sourceRoot = dfsBasePath + "/csvFiles";
+    String recordKeyField = (hasHeader || useSchemaProvider) ? "_row_key" : "_c0";
+
+    // Properties used for testing delta-streamer with CSV source
+    TypedProperties csvProps = new TypedProperties();
+    csvProps.setProperty("include", "base.properties");
+    csvProps.setProperty("hoodie.datasource.write.recordkey.field", recordKeyField);
+    csvProps.setProperty("hoodie.datasource.write.partitionpath.field", "not_there");
+    if (useSchemaProvider) {
+      csvProps.setProperty("hoodie.deltastreamer.schemaprovider.source.schema.file", dfsBasePath + "/source-flattened.avsc");
+      if (hasTransformer) {
+        csvProps.setProperty("hoodie.deltastreamer.schemaprovider.target.schema.file", dfsBasePath + "/target-flattened.avsc");
+      }
+    }
+    csvProps.setProperty("hoodie.deltastreamer.source.dfs.root", sourceRoot);
+
+    if (sep != ',') {
+      if (sep == '\t') {
+        csvProps.setProperty("hoodie.deltastreamer.csv.sep", "\\t");
+      } else {
+        csvProps.setProperty("hoodie.deltastreamer.csv.sep", Character.toString(sep));
+      }
+    }
+    if (hasHeader) {
+      csvProps.setProperty("hoodie.deltastreamer.csv.header", Boolean.toString(hasHeader));
+    }
+
+    UtilitiesTestBase.Helpers.savePropsToDFS(csvProps, dfs, dfsBasePath + "/" + PROPS_FILENAME_TEST_CSV);
+
+    String path = sourceRoot + "/1.csv";
+    HoodieTestDataGenerator dataGenerator = new HoodieTestDataGenerator();
+    UtilitiesTestBase.Helpers.saveCsvToDFS(
+        hasHeader, sep,
+        Helpers.jsonifyRecords(dataGenerator.generateInserts("000", CSV_NUM_RECORDS, true)),
+        dfs, path);
+  }
+
+  private void testCsvDFSSource(
+      boolean hasHeader, char sep, boolean useSchemaProvider, String transformerClassName) throws Exception {
+    prepareCsvDFSSource(hasHeader, sep, useSchemaProvider, transformerClassName != null);
+    String tableBasePath = dfsBasePath + "/test_csv_table" + testNum;
+    String sourceOrderingField = (hasHeader || useSchemaProvider) ? "timestamp" : "_c0";
+    HoodieDeltaStreamer deltaStreamer =
+        new HoodieDeltaStreamer(TestHelpers.makeConfig(
+            tableBasePath, Operation.INSERT, CsvDFSSource.class.getName(),
+            transformerClassName, PROPS_FILENAME_TEST_CSV, false,
+            useSchemaProvider, 1000, false, null, null, sourceOrderingField), jsc);
+    deltaStreamer.sync();
+    TestHelpers.assertRecordCount(CSV_NUM_RECORDS, tableBasePath + "/*/*.parquet", sqlContext);
+    testNum++;
+  }
+
+  @Test
+  public void testCsvDFSSourceWithHeaderWithoutSchemaProviderAndNoTransformer() throws Exception {
+    // The CSV files have header, the columns are separated by ',', the default separator
+    // No schema provider is specified, no transformer is applied
+    // In this case, the source schema comes from the inferred schema of the CSV files
+    testCsvDFSSource(true, ',', false, null);
+  }
+
+  @Test
+  public void testCsvDFSSourceWithHeaderAndSepWithoutSchemaProviderAndNoTransformer() throws Exception {
+    // The CSV files have header, the columns are separated by '\t',
+    // which is passed in through the Hudi CSV properties
+    // No schema provider is specified, no transformer is applied
+    // In this case, the source schema comes from the inferred schema of the CSV files
+    testCsvDFSSource(true, '\t', false, null);
+  }
+
+  @Test
+  public void testCsvDFSSourceWithHeaderAndSepWithSchemaProviderAndNoTransformer() throws Exception {
+    // The CSV files have header, the columns are separated by '\t'
+    // File schema provider is used, no transformer is applied
+    // In this case, the source schema comes from the source Avro schema file
+    testCsvDFSSource(true, '\t', true, null);
+  }
+
+  @Test
+  public void testCsvDFSSourceWithHeaderAndSepWithoutSchemaProviderAndWithTransformer() throws Exception {
+    // The CSV files have header, the columns are separated by '\t'
+    // No schema provider is specified, transformer is applied
+    // In this case, the source schema comes from the inferred schema of the CSV files.
+    // Target schema is determined based on the Dataframe after transformation
+    testCsvDFSSource(true, '\t', false, TripsWithDistanceTransformer.class.getName());
+  }
+
+  @Test
+ 

[GitHub] [incubator-hudi] yihua commented on a change in pull request #1165: [HUDI-76] Add CSV Source support for Hudi Delta Streamer

2020-03-11 Thread GitBox
yihua commented on a change in pull request #1165: [HUDI-76] Add CSV Source 
support for Hudi Delta Streamer
URL: https://github.com/apache/incubator-hudi/pull/1165#discussion_r390763221
 
 

 ##
 File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/sources/TestCsvDFSSource.java
 ##
 @@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities.sources;
+
+import org.apache.hudi.common.model.HoodieRecord;
+import org.apache.hudi.common.util.TypedProperties;
+import org.apache.hudi.utilities.UtilitiesTestBase;
+import org.apache.hudi.utilities.schema.FilebasedSchemaProvider;
+
+import org.apache.hadoop.fs.Path;
+import org.junit.Before;
+
+import java.io.IOException;
+import java.util.List;
+
+/**
+ * Basic tests for {@link CsvDFSSource}.
 
 Review comment:
   Actually, this class runs the tests defined in `AbstractDFSSourceTestBase`, with the CSV-source-specific logic implemented in this class.




[GitHub] [incubator-hudi] yihua commented on a change in pull request #1165: [HUDI-76] Add CSV Source support for Hudi Delta Streamer

2020-03-11 Thread GitBox
yihua commented on a change in pull request #1165: [HUDI-76] Add CSV Source 
support for Hudi Delta Streamer
URL: https://github.com/apache/incubator-hudi/pull/1165#discussion_r390760597
 
 

 ##
 File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/CsvDFSSource.java
 ##
 @@ -0,0 +1,125 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities.sources;
+
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.TypedProperties;
+import org.apache.hudi.common.util.collection.Pair;
+import org.apache.hudi.utilities.schema.SchemaProvider;
+import org.apache.hudi.utilities.sources.helpers.DFSPathSelector;
+
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.sql.DataFrameReader;
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.SparkSession;
+import org.apache.spark.sql.avro.SchemaConverters;
+import org.apache.spark.sql.types.StructType;
+
+import java.util.Arrays;
+import java.util.List;
+
+/**
+ * Reads data from CSV files on DFS as the data source.
+ *
+ * Internally, we use Spark to read CSV files thus any limitation of Spark CSV also applies here
+ * (e.g., limited support for nested schema).
+ *
+ * You can set the CSV-specific configs in the format of hoodie.deltastreamer.csv.*
+ * that are Spark compatible to deal with CSV files in Hudi.  The supported options are:
+ *
+ *   "sep", "encoding", "quote", "escape", "charToEscapeQuoteEscaping", "comment",
+ *   "header", "enforceSchema", "inferSchema", "samplingRatio", "ignoreLeadingWhiteSpace",
+ *   "ignoreTrailingWhiteSpace", "nullValue", "emptyValue", "nanValue", "positiveInf",
+ *   "negativeInf", "dateFormat", "timestampFormat", "maxColumns", "maxCharsPerColumn",
+ *   "mode", "columnNameOfCorruptRecord", "multiLine"
+ *
+ * Detailed information of these CSV options can be found at:
+ * https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/DataFrameReader.html#csv-scala.collection.Seq-
+ *
+ * If the source Avro schema is provided through the {@link org.apache.hudi.utilities.schema.FilebasedSchemaProvider}
+ * using "hoodie.deltastreamer.schemaprovider.source.schema.file" config, the schema is
+ * passed to the CSV reader without inferring the schema from the CSV file.
+ */
+public class CsvDFSSource extends RowSource {
+  // CsvSource config prefix
+  public static final String CSV_SRC_CONFIG_PREFIX = "hoodie.deltastreamer.csv.";
+  // CSV-specific configurations to pass in from Hudi to Spark
+  public static final List<String> CSV_CONFIG_KEYS = Arrays.asList(
+      "sep", "encoding", "quote", "escape", "charToEscapeQuoteEscaping", "comment",
+      "header", "enforceSchema", "inferSchema", "samplingRatio", "ignoreLeadingWhiteSpace",
+      "ignoreTrailingWhiteSpace", "nullValue", "emptyValue", "nanValue", "positiveInf",
+      "negativeInf", "dateFormat", "timestampFormat", "maxColumns", "maxCharsPerColumn",
+      "mode", "columnNameOfCorruptRecord", "multiLine"
+  );
+
+  private final DFSPathSelector pathSelector;
+  private final StructType sourceSchema;
+
+  public CsvDFSSource(TypedProperties props,
+      JavaSparkContext sparkContext,
+      SparkSession sparkSession,
+      SchemaProvider schemaProvider) {
+    super(props, sparkContext, sparkSession, schemaProvider);
+    this.pathSelector = new DFSPathSelector(props, sparkContext.hadoopConfiguration());
+    if (overriddenSchemaProvider != null) {
+      sourceSchema = (StructType) SchemaConverters.toSqlType(overriddenSchemaProvider.getSourceSchema()).dataType();
 
 Review comment:
   Good point.  Done.


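The config pass-through described in the CsvDFSSource javadoc above — options set under the `hoodie.deltastreamer.csv.` prefix and forwarded to Spark's CSV reader under their bare names — can be sketched in plain Java without Hudi or Spark on the classpath. The class and method names here (`CsvOptionFilter`, `extractCsvOptions`) are invented for illustration and are not Hudi's actual API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

public class CsvOptionFilter {
    static final String PREFIX = "hoodie.deltastreamer.csv.";
    // A subset of the supported keys listed in the CsvDFSSource javadoc.
    static final List<String> SUPPORTED = List.of("sep", "header", "inferSchema");

    // Collect supported CSV options whose keys carry the Hudi prefix,
    // returning them keyed by the bare option name Spark expects.
    static Map<String, String> extractCsvOptions(Properties props) {
        Map<String, String> options = new HashMap<>();
        for (String key : SUPPORTED) {
            String value = props.getProperty(PREFIX + key);
            if (value != null) {
                options.put(key, value);
            }
        }
        return options;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("hoodie.deltastreamer.csv.sep", "\t");
        props.setProperty("hoodie.deltastreamer.csv.header", "true");
        props.setProperty("unrelated.key", "ignored");
        System.out.println(extractCsvOptions(props));
    }
}
```

In the real source the resulting map would be handed to `DataFrameReader.option(...)` calls before reading; the sketch only shows the prefix-stripping step.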


[GitHub] [incubator-hudi] yihua commented on a change in pull request #1165: [HUDI-76] Add CSV Source support for Hudi Delta Streamer

2020-03-11 Thread GitBox
yihua commented on a change in pull request #1165: [HUDI-76] Add CSV Source 
support for Hudi Delta Streamer
URL: https://github.com/apache/incubator-hudi/pull/1165#discussion_r390761144
 
 

 ##
 File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/TestHoodieDeltaStreamer.java
 ##
 @@ -653,7 +659,7 @@ private void prepareParquetDFSSource(boolean useSchemaProvider, boolean hasTrans
     if (useSchemaProvider) {
       parquetProps.setProperty("hoodie.deltastreamer.schemaprovider.source.schema.file", dfsBasePath + "/source.avsc");
       if (hasTransformer) {
-        parquetProps.setProperty("hoodie.deltastreamer.schemaprovider.source.schema.file", dfsBasePath + "/target.avsc");
+        parquetProps.setProperty("hoodie.deltastreamer.schemaprovider.target.schema.file", dfsBasePath + "/target.avsc");
 
 Review comment:
   I don't remember fixing unit tests. Given that this is optional, it is possible that the data written differs from the designated schema. However, I think the integration tests should be able to catch any issue due to schema mismatch.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] yihua commented on a change in pull request #1165: [HUDI-76] Add CSV Source support for Hudi Delta Streamer

2020-03-11 Thread GitBox
yihua commented on a change in pull request #1165: [HUDI-76] Add CSV Source 
support for Hudi Delta Streamer
URL: https://github.com/apache/incubator-hudi/pull/1165#discussion_r390762093
 
 

 ##
 File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/UtilitiesTestBase.java
 ##
 @@ -193,19 +204,60 @@ public static void saveStringsToDFS(String[] lines, FileSystem fs, String target
       os.close();
     }
 
+    /**
+     * Converts the json records into CSV format and writes to a file.
+     *
+     * @param hasHeader  whether the CSV file should have a header line.
+     * @param sep  the column separator to use.
+     * @param lines  the records in JSON format.
+     * @param fs  {@link FileSystem} instance.
+     * @param targetPath  File path.
+     * @throws IOException
+     */
+    public static void saveCsvToDFS(
+        boolean hasHeader, char sep,
+        String[] lines, FileSystem fs, String targetPath) throws IOException {
+      Builder csvSchemaBuilder = CsvSchema.builder();
+
+      ArrayNode arrayNode = mapper.createArrayNode();
+      Arrays.stream(lines).forEachOrdered(
+          line -> {
+            try {
+              arrayNode.add(mapper.readValue(line, ObjectNode.class));
+            } catch (IOException e) {
+              e.printStackTrace();
 
 Review comment:
   This should not happen, but I agree that we can throw an exception here to catch any conversion issues. Note that this is only used in test code.


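The change the reviewer suggests above — replacing `e.printStackTrace()` inside a stream lambda with a thrown exception — is commonly done by wrapping the checked `IOException` in `UncheckedIOException`, since lambdas passed to `Stream.map` cannot declare checked exceptions. This is a minimal, self-contained sketch; the names `LambdaRethrow`, `parseAll`, and `parse` are invented and do not correspond to the Hudi helper.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class LambdaRethrow {
    // Parse each line, wrapping the checked IOException so a bad record
    // aborts the pipeline instead of being silently printed and dropped.
    static List<Integer> parseAll(String[] lines) {
        return Arrays.stream(lines)
            .map(line -> {
                try {
                    return parse(line);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            })
            .collect(Collectors.toList());
    }

    // Stand-in for a conversion step that can fail with a checked exception.
    static int parse(String line) throws IOException {
        try {
            return Integer.parseInt(line.trim());
        } catch (NumberFormatException e) {
            throw new IOException("bad record: " + line, e);
        }
    }
}
```

Callers that want the checked exception back can catch `UncheckedIOException` around the stream and rethrow its cause.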