[GitHub] [incubator-hudi] lamber-ken edited a comment on issue #1128: [HUDI-453] Fix throw failed to archive commits error when writing data to MOR/COW table

2019-12-28 Thread GitBox
lamber-ken edited a comment on issue #1128: [HUDI-453] Fix throw failed to 
archive commits error when writing data to MOR/COW table
URL: https://github.com/apache/incubator-hudi/pull/1128#issuecomment-569474638
 
 
   > @lamber-ken : there is a bug in 
HoodieActiveTimeline.saveToCleanRequested(). It was never meant to be empty. 
The cleaner plan needs to be stored in requested file in the timeline
   > 
   > Remove these 2 lines
   > 
   > * // Plan is only stored in auxiliary folder
   > * createFileInMetaPath(instant.getFileName(), Option.empty(), false);
   >   and add
   > 
   > * createFileInMetaPath(instant.getFileName(), content, false);
   > 
   > This should guarantee that the requested clean file is non-empty and no longer 
needs any special casing in archiving.
   
   Done. I think it is not a logic error, so unit tests should not need to cover it, because 
we don't create an empty file manually. WDYT?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] yanghua commented on issue #1151: [HUDI-476] Add hudi-examples module

2019-12-28 Thread GitBox
yanghua commented on issue #1151: [HUDI-476] Add hudi-examples module
URL: https://github.com/apache/incubator-hudi/pull/1151#issuecomment-569479377
 
 
   Hi @dengziming, thanks for your contribution! Please follow 
the [new contribution 
guidance](http://hudi.apache.org/contributing.html#life-of-a-contributor) on 
how to title the PR. 
   
   First, you can squash your commits into one commit.
   
   Regarding adding a new module, we'd better hear @vinothchandar's 
opinion first. I agree that it's better to have an examples module.




[jira] [Created] (HUDI-478) build_local_docker_images.sh

2019-12-28 Thread zhangpu (Jira)
zhangpu created HUDI-478:


 Summary: build_local_docker_images.sh
 Key: HUDI-478
 URL: https://issues.apache.org/jira/browse/HUDI-478
 Project: Apache Hudi (incubating)
  Issue Type: Bug
  Components: Testing
Reporter: zhangpu


Running the ./build_local_docker_images.sh command produces the following error:

[ERROR] Failed to execute goal org.apache.rat:apache-rat-plugin:0.12:check 
(default) on project hudi-cli: Too many files with unapproved license: 1 See 
RAT report in: /xxx/hudi/incubator-hudi/hudi-cli/target/rat.txt -> [Help 1]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] lamber-ken edited a comment on issue #1128: [HUDI-453] Fix throw failed to archive commits error when writing data to MOR/COW table

2019-12-28 Thread GitBox
lamber-ken edited a comment on issue #1128: [HUDI-453] Fix throw failed to 
archive commits error when writing data to MOR/COW table
URL: https://github.com/apache/incubator-hudi/pull/1128#issuecomment-569474638
 
 
   > @lamber-ken : there is a bug in 
HoodieActiveTimeline.saveToCleanRequested(). It was never meant to be empty. 
The cleaner plan needs to be stored in requested file in the timeline
   > 
   > Remove these 2 lines
   > 
   > * // Plan is only stored in auxiliary folder
   > * createFileInMetaPath(instant.getFileName(), Option.empty(), false);
   >   and add
   > 
   > * createFileInMetaPath(instant.getFileName(), content, false);
   > 
   > This should guarantee that the requested clean file is non-empty and no longer 
needs any special casing in archiving.
   
   Done. I think it's a logic error; unit tests should not cover it, because we 
don't create an empty file manually. WDYT?




[GitHub] [incubator-hudi] lamber-ken edited a comment on issue #1128: [HUDI-453] Fix throw failed to archive commits error when writing data to MOR/COW table

2019-12-28 Thread GitBox
lamber-ken edited a comment on issue #1128: [HUDI-453] Fix throw failed to 
archive commits error when writing data to MOR/COW table
URL: https://github.com/apache/incubator-hudi/pull/1128#issuecomment-569474638
 
 
   > @lamber-ken : there is a bug in 
HoodieActiveTimeline.saveToCleanRequested(). It was never meant to be empty. 
The cleaner plan needs to be stored in requested file in the timeline
   > 
   > Remove these 2 lines
   > 
   > * // Plan is only stored in auxiliary folder
   > * createFileInMetaPath(instant.getFileName(), Option.empty(), false);
   >   and add
   > 
   > * createFileInMetaPath(instant.getFileName(), content, false);
   > 
   > This should guarantee that the requested clean file is non-empty and no longer 
needs any special casing in archiving.
   
   Done. I think it's a logic error; unit tests should not cover it. WDYT?




[GitHub] [incubator-hudi] lamber-ken commented on issue #1128: [HUDI-453] Fix throw failed to archive commits error when writing data to MOR/COW table

2019-12-28 Thread GitBox
lamber-ken commented on issue #1128: [HUDI-453] Fix throw failed to archive 
commits error when writing data to MOR/COW table
URL: https://github.com/apache/incubator-hudi/pull/1128#issuecomment-569474638
 
 
   > @lamber-ken : there is a bug in 
HoodieActiveTimeline.saveToCleanRequested(). It was never meant to be empty. 
The cleaner plan needs to be stored in requested file in the timeline
   > 
   > Remove these 2 lines
   > 
   > * // Plan is only stored in auxiliary folder
   > * createFileInMetaPath(instant.getFileName(), Option.empty(), false);
   >   and add
   > 
   > * createFileInMetaPath(instant.getFileName(), content, false);
   > 
   > This should guarantee that the requested clean file is non-empty and no longer 
needs any special casing in archiving.
   
   Done. I think it's a logic error; unit tests should not cover it. WDYT?




[GitHub] [incubator-hudi] dengziming commented on issue #1151: Hudi-476: Add hudi-examples module

2019-12-28 Thread GitBox
dengziming commented on issue #1151: Hudi-476: Add hudi-examples module
URL: https://github.com/apache/incubator-hudi/pull/1151#issuecomment-569474270
 
 
   @leesf Hi, PTAL.




[GitHub] [incubator-hudi] dengziming opened a new pull request #1151: Hudi-476: Add hudi-examples module

2019-12-28 Thread GitBox
dengziming opened a new pull request #1151: Hudi-476: Add hudi-examples module
URL: https://github.com/apache/incubator-hudi/pull/1151
 
 
   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   
   See [HUDI-476](https://issues.apache.org/jira/browse/HUDI-476). This PR adds a 
hudi-examples module.
   
   ## Brief change log
   
 - add hudi-examples module and add other modules as dependencies in pom
 - add scala dependencies and scala-maven-plugin in parent pom
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   




[GitHub] [incubator-hudi] yanghua commented on a change in pull request #1120: [HUDI-440] Rework the hudi web site

2019-12-28 Thread GitBox
yanghua commented on a change in pull request #1120: [HUDI-440] Rework the hudi 
web site
URL: https://github.com/apache/incubator-hudi/pull/1120#discussion_r361825874
 
 

 ##
 File path: newui/docs/_docs/1_4_powered_by.md
 ##
 @@ -0,0 +1,63 @@
+---
+title: "Talks & Powered By"
+keywords: hudi, talks, presentation
+permalink: /docs/powered_by.html
+---
+
+## Adoption
+
+### Uber
+
+Apache Hudi was originally developed at [Uber](https://uber.com), to achieve 
[low latency database ingestion, with high 
efficiency](http://www.slideshare.net/vinothchandar/hadoop-strata-talk-uber-your-hadoop-has-arrived/32).
+It has been in production since Aug 2016, powering the massive [100PB data 
lake](https://eng.uber.com/uber-big-data-platform/), including highly business 
critical tables like core trips,riders,partners. It also 
+powers several incremental Hive ETL pipelines and being currently integrated 
into Uber's data dispersal system.
+
+### Amazon Web Services
+Amazon Web Services is the World's leading cloud services provider. Apache 
Hudi is [pre-installed](https://aws.amazon.com/emr/features/hudi/) with the AWS 
Elastic Map Reduce 
+offering, providing means for AWS users to perform record-level 
updates/deletes and manage storage efficiently.
+
+### EMIS Health
+
+[EMIS Health][https://www.emishealth.com/] is the largest provider of Primary 
Care IT software in the UK with datasets including more than 500Bn healthcare 
records. HUDI is used to manage their analytics dataset in production and 
keeping them up-to-date with their upstream source. Presto is being used to 
query the data written in HUDI format.
 
 Review comment:
   `[EMIS Health][https://www.emishealth.com/]` should be `[EMIS 
Health](https://www.emishealth.com/)`? cc @lamber-ken 




[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1073: [HUDI-377] Adding Delete() support to DeltaStreamer

2019-12-28 Thread GitBox
bvaradar commented on a change in pull request #1073: [HUDI-377] Adding 
Delete() support to DeltaStreamer
URL: https://github.com/apache/incubator-hudi/pull/1073#discussion_r352920319
 
 

 ##
 File path: 
hudi-client/src/test/java/org/apache/hudi/common/HoodieTestDataGenerator.java
 ##
 @@ -77,9 +77,10 @@
   + "{\"name\": \"rider\", \"type\": \"string\"}," + "{\"name\": 
\"driver\", \"type\": \"string\"},"
   + "{\"name\": \"begin_lat\", \"type\": \"double\"}," + "{\"name\": 
\"begin_lon\", \"type\": \"double\"},"
   + "{\"name\": \"end_lat\", \"type\": \"double\"}," + "{\"name\": 
\"end_lon\", \"type\": \"double\"},"
-  + "{\"name\":\"fare\",\"type\": \"double\"}]}";
+  + "{\"name\":\"fare\",\"type\": \"double\"},"
+  + "{\"name\": \"_hoodie_delete_marker\", \"type\": 
[\"null\",\"string\"], \"default\": null} ]}";
 
 Review comment:
   _hoodie_delete_marker -> _hoodie_is_deleted ?
   Also can we use boolean type instead of String ?




[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1073: [HUDI-377] Adding Delete() support to DeltaStreamer

2019-12-28 Thread GitBox
bvaradar commented on a change in pull request #1073: [HUDI-377] Adding 
Delete() support to DeltaStreamer
URL: https://github.com/apache/incubator-hudi/pull/1073#discussion_r361825095
 
 

 ##
 File path: 
hudi-client/src/test/java/org/apache/hudi/common/HoodieTestDataGenerator.java
 ##
 @@ -74,14 +75,15 @@
   public static final String[] DEFAULT_PARTITION_PATHS =
   {DEFAULT_FIRST_PARTITION_PATH, DEFAULT_SECOND_PARTITION_PATH, 
DEFAULT_THIRD_PARTITION_PATH};
   public static final int DEFAULT_PARTITION_DEPTH = 3;
-  public static String TRIP_EXAMPLE_SCHEMA = "{\"type\": \"record\",\"name\": 
\"triprec\",\"fields\": [ "
-  + "{\"name\": \"timestamp\",\"type\": \"double\"},{\"name\": 
\"_row_key\", \"type\": \"string\"},"
-  + "{\"name\": \"rider\", \"type\": \"string\"},{\"name\": \"driver\", 
\"type\": \"string\"},"
-  + "{\"name\": \"begin_lat\", \"type\": \"double\"},{\"name\": 
\"begin_lon\", \"type\": \"double\"},"
-  + "{\"name\": \"end_lat\", \"type\": \"double\"},{\"name\": \"end_lon\", 
\"type\": \"double\"},"
-  + "{\"name\":\"fare\",\"type\": \"double\"}]}";
+  public static String TRIP_EXAMPLE_SCHEMA = "{\"type\": \"record\"," + 
"\"name\": \"triprec\"," + "\"fields\": [ "
+  + "{\"name\": \"timestamp\",\"type\": \"double\"}," + "{\"name\": 
\"_row_key\", \"type\": \"string\"},"
+  + "{\"name\": \"rider\", \"type\": \"string\"}," + "{\"name\": 
\"driver\", \"type\": \"string\"},"
+  + "{\"name\": \"begin_lat\", \"type\": \"double\"}," + "{\"name\": 
\"begin_lon\", \"type\": \"double\"},"
+  + "{\"name\": \"end_lat\", \"type\": \"double\"}," + "{\"name\": 
\"end_lon\", \"type\": \"double\"},"
+  + "{\"name\":\"fare\",\"type\": \"double\"},"
+  + "{\"name\": \"_hoodie_delete_marker\", \"type\": \"boolean\", 
\"default\": false} ]}";
   public static String NULL_SCHEMA = 
Schema.create(Schema.Type.NULL).toString();
-  public static String TRIP_HIVE_COLUMN_TYPES = 
"double,string,string,string,double,double,double,double,double";
+  public static String TRIP_HIVE_COLUMN_TYPES = 
"double,string,string,string,double,double,double,double,double,string";
 
 Review comment:
   string -> boolean ?




Build failed in Jenkins: hudi-snapshot-deployment-0.5 #143

2019-12-28 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 2.18 KB...]
/home/jenkins/tools/maven/apache-maven-3.5.4/bin:
m2.conf
mvn
mvn.cmd
mvnDebug
mvnDebug.cmd
mvnyjp

/home/jenkins/tools/maven/apache-maven-3.5.4/boot:
plexus-classworlds-2.5.2.jar

/home/jenkins/tools/maven/apache-maven-3.5.4/conf:
logging
settings.xml
toolchains.xml

/home/jenkins/tools/maven/apache-maven-3.5.4/conf/logging:
simplelogger.properties

/home/jenkins/tools/maven/apache-maven-3.5.4/lib:
aopalliance-1.0.jar
cdi-api-1.0.jar
cdi-api.license
commons-cli-1.4.jar
commons-cli.license
commons-io-2.5.jar
commons-io.license
commons-lang3-3.5.jar
commons-lang3.license
ext
guava-20.0.jar
guice-4.2.0-no_aop.jar
jansi-1.17.1.jar
jansi-native
javax.inject-1.jar
jcl-over-slf4j-1.7.25.jar
jcl-over-slf4j.license
jsr250-api-1.0.jar
jsr250-api.license
maven-artifact-3.5.4.jar
maven-artifact.license
maven-builder-support-3.5.4.jar
maven-builder-support.license
maven-compat-3.5.4.jar
maven-compat.license
maven-core-3.5.4.jar
maven-core.license
maven-embedder-3.5.4.jar
maven-embedder.license
maven-model-3.5.4.jar
maven-model-builder-3.5.4.jar
maven-model-builder.license
maven-model.license
maven-plugin-api-3.5.4.jar
maven-plugin-api.license
maven-repository-metadata-3.5.4.jar
maven-repository-metadata.license
maven-resolver-api-1.1.1.jar
maven-resolver-api.license
maven-resolver-connector-basic-1.1.1.jar
maven-resolver-connector-basic.license
maven-resolver-impl-1.1.1.jar
maven-resolver-impl.license
maven-resolver-provider-3.5.4.jar
maven-resolver-provider.license
maven-resolver-spi-1.1.1.jar
maven-resolver-spi.license
maven-resolver-transport-wagon-1.1.1.jar
maven-resolver-transport-wagon.license
maven-resolver-util-1.1.1.jar
maven-resolver-util.license
maven-settings-3.5.4.jar
maven-settings-builder-3.5.4.jar
maven-settings-builder.license
maven-settings.license
maven-shared-utils-3.2.1.jar
maven-shared-utils.license
maven-slf4j-provider-3.5.4.jar
maven-slf4j-provider.license
org.eclipse.sisu.inject-0.3.3.jar
org.eclipse.sisu.inject.license
org.eclipse.sisu.plexus-0.3.3.jar
org.eclipse.sisu.plexus.license
plexus-cipher-1.7.jar
plexus-cipher.license
plexus-component-annotations-1.7.1.jar
plexus-component-annotations.license
plexus-interpolation-1.24.jar
plexus-interpolation.license
plexus-sec-dispatcher-1.4.jar
plexus-sec-dispatcher.license
plexus-utils-3.1.0.jar
plexus-utils.license
slf4j-api-1.7.25.jar
slf4j-api.license
wagon-file-3.1.0.jar
wagon-file.license
wagon-http-3.1.0-shaded.jar
wagon-http.license
wagon-provider-api-3.1.0.jar
wagon-provider-api.license

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/ext:
README.txt

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native:
freebsd32
freebsd64
linux32
linux64
osx
README.txt
windows32
windows64

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/osx:
libjansi.jnilib

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows32:
jansi.dll

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows64:
jansi.dll
Finished /home/jenkins/tools/maven/apache-maven-3.5.4 Directory Listing :
Detected current version as: 
'HUDI_home=
0.5.1-SNAPSHOT'
[INFO] Scanning for projects...
[INFO] 
[INFO] Reactor Build Order:
[INFO] 
[INFO] Hudi   [pom]
[INFO] hudi-common[jar]
[INFO] hudi-timeline-service  [jar]
[INFO] hudi-hadoop-mr [jar]
[INFO] hudi-client[jar]
[INFO] hudi-hive  [jar]
[INFO] hudi-spark [jar]
[INFO] hudi-utilities [jar]
[INFO] hudi-cli   [jar]
[INFO] hudi-hadoop-mr-bundle  [jar]
[INFO] hudi-hive-bundle   [jar]
[INFO] hudi-spark-bundle  [jar]
[INFO] hudi-presto-bundle [jar]
[INFO] hudi-utilities-bundle  [jar]
[INFO] hudi-timeline-server-bundle

[GitHub] [incubator-hudi] wangxianghu commented on issue #1148: [MINOR] Update the java doc of HoodieTableType

2019-12-28 Thread GitBox
wangxianghu commented on issue #1148: [MINOR] Update the java doc of 
HoodieTableType
URL: https://github.com/apache/incubator-hudi/pull/1148#issuecomment-569470817
 
 
   OK, thanks.




[jira] [Created] (HUDI-477) Add HoodieClient Example code to hudi-examples

2019-12-28 Thread dengziming (Jira)
dengziming created HUDI-477:
---

 Summary: Add HoodieClient Example code to hudi-examples
 Key: HUDI-477
 URL: https://issues.apache.org/jira/browse/HUDI-477
 Project: Apache Hudi (incubating)
  Issue Type: Sub-task
Reporter: dengziming


The code in incubator-hudi/hudi-client/src/test/java/HoodieClientExample.java 
could be reused, but it relies on two test classes, HoodieTestDataGenerator and 
HoodieClientTestUtils, so moving it may be complex.





[jira] [Created] (HUDI-476) Add a hudi-examples module

2019-12-28 Thread dengziming (Jira)
dengziming created HUDI-476:
---

 Summary: Add a hudi-examples module
 Key: HUDI-476
 URL: https://issues.apache.org/jira/browse/HUDI-476
 Project: Apache Hudi (incubating)
  Issue Type: Sub-task
Reporter: dengziming


Add a hudi-examples module to hold some example code.





[jira] [Commented] (HUDI-475) Add hudi-examples module and move example codes to it and also add some necessary codes

2019-12-28 Thread dengziming (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17004628#comment-17004628
 ] 

dengziming commented on HUDI-475:
-

I will split it into smaller issues to make code review more convenient.

> Add hudi-examples module and move example codes to it and also add some 
> necessary codes
> ---
>
> Key: HUDI-475
> URL: https://issues.apache.org/jira/browse/HUDI-475
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>Reporter: dengziming
>Priority: Major
>  Labels: examples
>
> # Hudi doesn't have an examples module, and it would be better to add one for the 
> benefit of users and developers.
>  # The incubator-hudi/hudi-client/src/test/java/HoodieClientExample.java code 
> could be moved to the examples module.
>  # incubator-hudi/hudi-spark/src/test/java/HoodieJavaApp.java and 
> HoodieJavaStreamingApp could be moved to the examples module.
>  # The code in [quickstart|https://hudi.apache.org/quickstart.html] can be 
> added to the examples module.
>  # Other suggestions are welcome.





[jira] [Commented] (HUDI-242) Support Efficient bootstrap of large parquet datasets to Hudi

2019-12-28 Thread Balaji Varadarajan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17004625#comment-17004625
 ] 

Balaji Varadarajan commented on HUDI-242:
-

Sure [~Pratyaksh], do any of the unassigned tasks interest you?

> Support Efficient bootstrap of large parquet datasets to Hudi
> -
>
> Key: HUDI-242
> URL: https://issues.apache.org/jira/browse/HUDI-242
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Usability
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Major
> Fix For: 0.5.1
>
>
>  Support Efficient bootstrap of large parquet tables





[jira] [Assigned] (HUDI-417) Refactor HoodieWriteClient so that commit logic can be shareable by both bootstrap and normal write operations

2019-12-28 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan reassigned HUDI-417:
---

Assignee: Balaji Varadarajan

> Refactor HoodieWriteClient so that commit logic can be shareable by both 
> bootstrap and normal write operations
> --
>
> Key: HUDI-417
> URL: https://issues.apache.org/jira/browse/HUDI-417
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Writer Core
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Major
> Fix For: 0.5.1
>
>
>  
> Basic Code Changes are present in the fork : 
> [https://github.com/bvaradar/hudi/tree/vb_bootstrap]
>  
> The current implementation of HoodieBootstrapClient has duplicate code for 
> committing bootstrap. 
> [https://github.com/bvaradar/hudi/blob/vb_bootstrap/hudi-client/src/main/java/org/apache/hudi/bootstrap/HoodieBootstrapClient.java]
>  
>  
> We can have an independent PR which would move this commit functionality 
> from HoodieWriteClient to a new base class AbstractHoodieWriteClient, which 
> HoodieBootstrapClient can inherit.
>  





[jira] [Commented] (HUDI-417) Refactor HoodieWriteClient so that commit logic can be shareable by both bootstrap and normal write operations

2019-12-28 Thread Balaji Varadarajan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17004624#comment-17004624
 ] 

Balaji Varadarajan commented on HUDI-417:
-

[~nicholasjiang]: Sorry, I didn't get a chance to respond to your question. I am 
back from vacation. This is an initial step, as it would make it easier to make 
further changes with fewer conflicts. I will take it up, as I have cycles 
tonight to get it done. As per the earlier discussion, the Spark datasource support 
task is not on the critical path for now, but it would be great if you could start 
looking into it right away.

> Refactor HoodieWriteClient so that commit logic can be shareable by both 
> bootstrap and normal write operations
> --
>
> Key: HUDI-417
> URL: https://issues.apache.org/jira/browse/HUDI-417
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Writer Core
>Reporter: Balaji Varadarajan
>Priority: Major
> Fix For: 0.5.1
>
>
>  
> Basic Code Changes are present in the fork : 
> [https://github.com/bvaradar/hudi/tree/vb_bootstrap]
>  
> The current implementation of HoodieBootstrapClient has duplicate code for 
> committing bootstrap. 
> [https://github.com/bvaradar/hudi/blob/vb_bootstrap/hudi-client/src/main/java/org/apache/hudi/bootstrap/HoodieBootstrapClient.java]
>  
>  
> We can have an independent PR which would move this commit functionality 
> from HoodieWriteClient to a new base class AbstractHoodieWriteClient, which 
> HoodieBootstrapClient can inherit.
>  





[jira] [Created] (HUDI-475) Add hudi-examples module and move example codes to it and also add some necessary codes

2019-12-28 Thread dengziming (Jira)
dengziming created HUDI-475:
---

 Summary: Add hudi-examples module and move example codes to it and 
also add some necessary codes
 Key: HUDI-475
 URL: https://issues.apache.org/jira/browse/HUDI-475
 Project: Apache Hudi (incubating)
  Issue Type: Improvement
Reporter: dengziming


# Hudi doesn't have an examples module, and it would be better to add one for the 
benefit of users and developers.
 # The incubator-hudi/hudi-client/src/test/java/HoodieClientExample.java code could 
be moved to the examples module.
 # incubator-hudi/hudi-spark/src/test/java/HoodieJavaApp.java and 
HoodieJavaStreamingApp could be moved to the examples module.
 # The code in [quickstart|https://hudi.apache.org/quickstart.html] can be 
added to the examples module.
 # Other suggestions are welcome.





[GitHub] [incubator-hudi] bvaradar commented on issue #1128: [HUDI-453] Fix throw failed to archive commits error when writing data to MOR/COW table

2019-12-28 Thread GitBox
bvaradar commented on issue #1128: [HUDI-453] Fix throw failed to archive 
commits error when writing data to MOR/COW table
URL: https://github.com/apache/incubator-hudi/pull/1128#issuecomment-569469565
 
 
   @lamber-ken : there is a bug in HoodieActiveTimeline.saveToCleanRequested(). 
It was never meant to be empty. The cleaner plan needs to be stored in 
requested file in the timeline 
   
   Remove these 2 lines 
 - // Plan is only stored in auxiliary folder
 -  createFileInMetaPath(instant.getFileName(), Option.empty(), false);
   and add
   + createFileInMetaPath(instant.getFileName(), content, false);
   
   This should guarantee that the requested clean file is non-empty and no longer 
needs any special casing in archiving.
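
   The effect of that change can be illustrated with a minimal, self-contained sketch. This is not Hudi code: `RequestedFileSketch` and its `createFile` helper are hypothetical stand-ins that only assume a "create file" helper taking optional content, written against `java.nio.file`. Passing an empty `Optional` yields a zero-length file, while passing the serialized plan yields a non-empty one.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

// Hypothetical stand-in, not the Hudi implementation.
final class RequestedFileSketch {

  // Writes the given content to metaDir/fileName, or creates an empty file
  // when no content is supplied. Returns the path that was written.
  static Path createFile(Path metaDir, String fileName, Optional<byte[]> content)
      throws IOException {
    Path file = metaDir.resolve(fileName);
    return Files.write(file, content.orElse(new byte[0]));
  }

  public static void main(String[] args) throws IOException {
    Path metaDir = Files.createTempDirectory("meta-sketch");
    // Placeholder bytes standing in for a serialized cleaner plan.
    byte[] plan = "serialized cleaner plan".getBytes(StandardCharsets.UTF_8);

    // Old behaviour in the sketch: no content, so the file is empty.
    Path empty = createFile(metaDir, "001.clean.requested", Optional.empty());
    // Suggested behaviour in the sketch: the plan is stored in the file.
    Path full = createFile(metaDir, "002.clean.requested", Optional.of(plan));

    System.out.println("empty size = " + Files.size(empty));
    System.out.println("full size  = " + Files.size(full));
  }
}
```

   With the content always written, a reader of the requested file would no longer have to treat a zero-length file as a special case.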




[GitHub] [incubator-hudi] yanghua commented on issue #1148: [MINOR] Update the java doc of HoodieTableType

2019-12-28 Thread GitBox
yanghua commented on issue #1148: [MINOR] Update the java doc of HoodieTableType
URL: https://github.com/apache/incubator-hudi/pull/1148#issuecomment-569466545
 
 
   Two additional tips:
   
   1. Always capitalize the first letter of the commit message (and the first 
letter of the title of the PR).
   2. It is better to avoid `'` anywhere in the commit message, 
because it may break the commit message.




[incubator-hudi] branch master updated: [MINOR] Update the java doc of HoodieTableType (#1148)

2019-12-28 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 01c25d6  [MINOR] Update the java doc of HoodieTableType (#1148)
01c25d6 is described below

commit 01c25d6afff703156c31c4f4c12138113dca494a
Author: Mathieu <49835526+wangxian...@users.noreply.github.com>
AuthorDate: Sun Dec 29 09:57:19 2019 +0800

[MINOR] Update the java doc of HoodieTableType (#1148)
---
 .../src/main/java/org/apache/hudi/common/model/HoodieTableType.java | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git 
a/hudi-common/src/main/java/org/apache/hudi/common/model/HoodieTableType.java 
b/hudi-common/src/main/java/org/apache/hudi/common/model/HoodieTableType.java
index 17c7fd2..6c24e39 100644
--- 
a/hudi-common/src/main/java/org/apache/hudi/common/model/HoodieTableType.java
+++ 
b/hudi-common/src/main/java/org/apache/hudi/common/model/HoodieTableType.java
@@ -21,14 +21,14 @@ package org.apache.hudi.common.model;
 /**
  * Type of the Hoodie Table.
  * 
- * Currently, 1 type is supported
+ * Currently, 2 types are supported.
  * 
  * COPY_ON_WRITE - Performs upserts by versioning entire files, with later 
versions containing newer value of a record.
  * 
- * In the future, following might be added.
- * 
  * MERGE_ON_READ - Speeds up upserts, by delaying merge until enough work 
piles up.
  * 
+ * In the future, following might be added.
+ * 
  * SIMPLE_LSM - A simple 2 level LSM tree.
  */
 public enum HoodieTableType {



[GitHub] [incubator-hudi] yanghua merged pull request #1148: [MINOR] Update the java doc of HoodieTableType

2019-12-28 Thread GitBox
yanghua merged pull request #1148: [MINOR] Update the java doc of 
HoodieTableType
URL: https://github.com/apache/incubator-hudi/pull/1148
 
 
   




[GitHub] [incubator-hudi] lamber-ken commented on a change in pull request #1128: [HUDI-453] Fix throw failed to archive commits error when writing data to MOR/COW table

2019-12-28 Thread GitBox
lamber-ken commented on a change in pull request #1128: [HUDI-453] Fix throw 
failed to archive commits error when writing data to MOR/COW table
URL: https://github.com/apache/incubator-hudi/pull/1128#discussion_r361819871
 
 

 ##
 File path: 
hudi-client/src/main/java/org/apache/hudi/io/HoodieCommitArchiveLog.java
 ##
 @@ -249,6 +249,13 @@ public void archive(List instants) throws 
HoodieCommitException {
   LOG.info("Wrapper schema " + wrapperSchema.toString());
   List records = new ArrayList<>();
   for (HoodieInstant hoodieInstant : instants) {
+
+// filter empty instant, like *.commit.requested
+byte[] instantDetails = 
commitTimeline.getInstantDetails(hoodieInstant).get();
+if (instantDetails == null || instantDetails.length == 0) {
+  continue;
+}
+
 
 Review comment:
   Yeah, I will enrich the unit tests.




[GitHub] [incubator-hudi] lamber-ken edited a comment on issue #1128: [HUDI-453] Fix throw failed to archive commits error when writing data to MOR/COW table

2019-12-28 Thread GitBox
lamber-ken edited a comment on issue #1128: [HUDI-453] Fix throw failed to 
archive commits error when writing data to MOR/COW table
URL: https://github.com/apache/incubator-hudi/pull/1128#issuecomment-569461408
 
 
   Hi @bvaradar, I went through the process again. There are three kinds of 
empty files (`*.clean.requested`, `*.commit.requested`, and 
`*.deltacommit.requested`).
   
   Reading an empty `*.clean.requested` file throws a `Not an Avro data file` 
error, because `org.apache.avro.file.DataFileReader#openReader` cannot read it.
   
   When reading `*.commit.requested` and `*.deltacommit.requested` files, 
`HoodieCommitMetadata#fromBytes` checks whether the commit file is empty and 
returns a new `HoodieCommitMetadata` instance if it is.
   
   The empty `*.clean.requested` files are created by 
`HoodieActiveTimeline#saveToCleanRequested`.
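
   The distinction described in this comment (empty instant files must be skipped or special-cased before deserialization) can be sketched in plain Java. This is a minimal illustration only, not Hudi's implementation: `EmptyInstantFilterSketch`, `hasDetails`, and `archivable` are hypothetical names, and the map of file names to bytes stands in for the timeline.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the archiving loop; these are not Hudi classes.
final class EmptyInstantFilterSketch {

  // True when the instant file has content that is worth deserializing.
  static boolean hasDetails(byte[] instantDetails) {
    return instantDetails != null && instantDetails.length > 0;
  }

  // Keeps only the instants whose backing file is non-empty, preserving order.
  static List<String> archivable(Map<String, byte[]> instantFiles) {
    List<String> result = new ArrayList<>();
    for (Map.Entry<String, byte[]> entry : instantFiles.entrySet()) {
      if (hasDetails(entry.getValue())) {
        result.add(entry.getKey());
      }
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, byte[]> files = new LinkedHashMap<>();
    files.put("001.commit", "{}".getBytes(StandardCharsets.UTF_8));
    files.put("002.clean.requested", new byte[0]); // empty file
    files.put("003.deltacommit.requested", null);  // no details at all
    System.out.println(archivable(files));
  }
}
```

   Filtering before parsing keeps a format-specific reader (such as an Avro file reader) from ever seeing zero-length input.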




[GitHub] [incubator-hudi] lamber-ken commented on issue #1128: [HUDI-453] Fix throw failed to archive commits error when writing data to MOR/COW table

2019-12-28 Thread GitBox
lamber-ken commented on issue #1128: [HUDI-453] Fix throw failed to archive 
commits error when writing data to MOR/COW table
URL: https://github.com/apache/incubator-hudi/pull/1128#issuecomment-569461408
 
 
   Hi @bvaradar, I went through the process again. There are three kinds of 
empty files (`*.clean.requested`, `*.commit.requested`, and 
`*.deltacommit.requested`); these files need to be read later.
   
   Reading an empty `*.clean.requested` file throws a `Not an Avro data file` 
error, because `org.apache.avro.file.DataFileReader#openReader` cannot read it.
   
   When reading `*.commit.requested` and `*.deltacommit.requested` files, 
`HoodieCommitMetadata#fromBytes` checks whether the commit file is empty and 
returns a new `HoodieCommitMetadata` instance if it is.
   
   The empty `*.clean.requested` files are created by 
`HoodieActiveTimeline#saveToCleanRequested`.




[GitHub] [incubator-hudi] lamber-ken edited a comment on issue #1128: [HUDI-453] Fix throw failed to archive commits error when writing data to MOR/COW table

2019-12-28 Thread GitBox
lamber-ken edited a comment on issue #1128: [HUDI-453] Fix throw failed to 
archive commits error when writing data to MOR/COW table
URL: https://github.com/apache/incubator-hudi/pull/1128#issuecomment-569461408
 
 
   Hi @bvaradar, I went through the process again. There are three kinds of 
empty files (`*.clean.requested`, `*.commit.requested`, and 
`*.deltacommit.requested`).
   
   Reading an empty `*.clean.requested` file throws a `Not an Avro data file` 
error, because `org.apache.avro.file.DataFileReader#openReader` cannot read it.
   
   When reading `*.commit.requested` and `*.deltacommit.requested` files, 
`HoodieCommitMetadata#fromBytes` checks whether the commit file is empty and 
returns a new `HoodieCommitMetadata` instance if it is.
   
   The empty `*.clean.requested` files are created by 
`HoodieActiveTimeline#saveToCleanRequested`.




[jira] [Commented] (HUDI-474) Delta Streamer is not able to read the commit files

2019-12-28 Thread Balaji Varadarajan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17004569#comment-17004569
 ] 

Balaji Varadarajan commented on HUDI-474:
-

[~srkhan]  As per your comment in the email thread ("when I have checked, 
folder .aux was empty ..."), it looks like you had upgraded deltastreamer to 
the latest master (which includes 
[https://github.com/apache/incubator-hudi/pull/1009]) when running the earlier 
clean operations, but the stack trace suggests that the older layout version 
is being used. 

Is it possible that you are using the 0.5.0 version of deltastreamer, or have 
you set the configuration hoodie.timeline.layout.version? In any case, can you 
attach the complete logs of your deltastreamer run (where you saw the problem) 
to this Jira? I did not see the attachment in your email to the dev ML.

Balaji.V

> Delta Streamer is not able to read the commit files
> ---
>
> Key: HUDI-474
> URL: https://issues.apache.org/jira/browse/HUDI-474
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: DeltaStreamer
>Reporter: Shahida Khan
>Assignee: Balaji Varadarajan
>Priority: Major
> Fix For: 0.5.1
>
> Attachments: Gmail - Commit time issue in DeltaStreamer 
> (Real-Time).pdf
>
>
> DeltaStreamer is not able to read the correct commit files when the job is 
> deployed in real time.
> Below is the stack trace: 
> java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: Could not read commit details from hdfs:/user/hive/warehouse/hudi.db/tbltest/.hoodie/.aux/20191226153400.clean.requested
>     at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
>     at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
>     at org.apache.hudi.utilities.deltastreamer.AbstractDeltaStreamerService.waitForShutdown(AbstractDeltaStreamerService.java:72)
>     at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:117)
>     at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:297)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:688)
> Caused by: org.apache.hudi.exception.HoodieException: Could not read commit details from hdfs:/user/hive/warehouse/hudi.db/tbltest/.hoodie/.aux/20191226153400.clean.requested
>     at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$0(HoodieDeltaStreamer.java:411)
>     at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-402) Code clean up in DataSourceUtils class

2019-12-28 Thread Pratyaksh Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pratyaksh Sharma updated HUDI-402:
--
Fix Version/s: 0.5.1

> Code clean up in DataSourceUtils class
> --
>
> Key: HUDI-402
> URL: https://issues.apache.org/jira/browse/HUDI-402
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Code Cleanup
>Reporter: Pratyaksh Sharma
>Assignee: Pratyaksh Sharma
>Priority: Major
> Fix For: 0.5.1
>
>
> In function getNestedFieldValAsString, we call getNestedFieldVal function to 
> get the value. Then we check if the object returned is null, which is always 
> false, since the called function throws an exception rather than returning 
> null. 
> Need to change the code accordingly.
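
The dead null check described in this issue can be illustrated with a small, self-contained sketch. It is not the `DataSourceUtils` code: `NestedFieldSketch` and the `Map`-based record are hypothetical stand-ins. They only mirror the stated contract, namely that the lookup throws rather than returning null, so the caller needs no null check.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical analogue of the two functions named in the issue; not Hudi code.
final class NestedFieldSketch {

  // Walks a dotted path through nested maps. Throws when any part is missing,
  // so a successful call never returns null.
  static Object getNestedFieldVal(Map<String, Object> record, String fieldName) {
    Object current = record;
    for (String part : fieldName.split("\\.")) {
      if (!(current instanceof Map)) {
        throw new IllegalArgumentException(fieldName + " field not found in record");
      }
      current = ((Map<?, ?>) current).get(part);
      if (current == null) {
        throw new IllegalArgumentException(fieldName + " field not found in record");
      }
    }
    return current;
  }

  // No null check here: it would always be false given the contract above.
  static String getNestedFieldValAsString(Map<String, Object> record, String fieldName) {
    return getNestedFieldVal(record, fieldName).toString();
  }

  public static void main(String[] args) {
    Map<String, Object> address = new HashMap<>();
    address.put("city", "Pune");
    Map<String, Object> record = new HashMap<>();
    record.put("address", address);
    System.out.println(getNestedFieldValAsString(record, "address.city"));
  }
}
```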





[jira] [Updated] (HUDI-406) Introduce Default partition path in TimestampBasedKeyGenerator

2019-12-28 Thread Pratyaksh Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pratyaksh Sharma updated HUDI-406:
--
Fix Version/s: 0.5.1

> Introduce Default partition path in TimestampBasedKeyGenerator
> --
>
> Key: HUDI-406
> URL: https://issues.apache.org/jira/browse/HUDI-406
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: DeltaStreamer, newbie
>Reporter: Pratyaksh Sharma
>Assignee: Pratyaksh Sharma
>Priority: Major
> Fix For: 0.5.1
>
>
> TimestampBasedKeyGenerator is used to define custom partition path formats 
> based on a timestamp field. However, if the value of such a field is not 
> present in an incoming record, we should fall back to a default partition 
> path, like other key generators do. Currently we throw an exception in such cases.
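
A minimal sketch of the proposed fallback, under stated assumptions: `TimestampPartitionSketch`, the `"default"` path value, the `yyyy/MM/dd` output format, and epoch-second input are all illustrative choices and are not taken from `TimestampBasedKeyGenerator`.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical illustration of falling back to a default partition path.
final class TimestampPartitionSketch {

  // Assumed fallback value; the name actually used by Hudi may differ.
  static final String DEFAULT_PARTITION_PATH = "default";

  private static final DateTimeFormatter FORMAT =
      DateTimeFormatter.ofPattern("yyyy/MM/dd").withZone(ZoneOffset.UTC);

  // Returns a date-based partition path, or the default when the field is absent.
  static String partitionPath(Long epochSeconds) {
    if (epochSeconds == null) {
      return DEFAULT_PARTITION_PATH;
    }
    return FORMAT.format(Instant.ofEpochSecond(epochSeconds));
  }

  public static void main(String[] args) {
    System.out.println(partitionPath(0L));
    System.out.println(partitionPath(null));
  }
}
```

Returning a sentinel path instead of throwing lets records with a missing timestamp still be written, at the cost of collecting them in a single partition.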



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-242) Support Efficient bootstrap of large parquet datasets to Hudi

2019-12-28 Thread Pratyaksh Sharma (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17004552#comment-17004552
 ] 

Pratyaksh Sharma commented on HUDI-242:
---

Hi [~vbalaji], I would like to work with you on this. Please let me know if I 
can pick up any of the above tasks. :)

> Support Efficient bootstrap of large parquet datasets to Hudi
> -
>
> Key: HUDI-242
> URL: https://issues.apache.org/jira/browse/HUDI-242
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Usability
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Major
> Fix For: 0.5.1
>
>
>  Support Efficient bootstrap of large parquet tables



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HUDI-310) DynamoDB/Kinesis Change Capture using Delta Streamer

2019-12-28 Thread Suneel Marthi (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi reassigned HUDI-310:
--

Assignee: Suneel Marthi

> DynamoDB/Kinesis Change Capture using Delta Streamer
> 
>
> Key: HUDI-310
> URL: https://issues.apache.org/jira/browse/HUDI-310
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: DeltaStreamer
>Reporter: Vinoth Chandar
>Assignee: Suneel Marthi
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] pratyakshsharma commented on issue #1075: [HUDI-114]: added option to overwrite payload implementation in hoodie.properties file

2019-12-28 Thread GitBox
pratyakshsharma commented on issue #1075: [HUDI-114]: added option to overwrite 
payload implementation in hoodie.properties file
URL: https://github.com/apache/incubator-hudi/pull/1075#issuecomment-569436705
 
 
   @cdmikechen Good catch! Have taken care of it now. 




[GitHub] [incubator-hudi] pratyakshsharma commented on issue #1075: [HUDI-114]: added option to overwrite payload implementation in hoodie.properties file

2019-12-28 Thread GitBox
pratyakshsharma commented on issue #1075: [HUDI-114]: added option to overwrite 
payload implementation in hoodie.properties file
URL: https://github.com/apache/incubator-hudi/pull/1075#issuecomment-569434731
 
 
   @n3nash Done with the changes. 




[GitHub] [incubator-hudi] pratyakshsharma commented on a change in pull request #1075: [HUDI-114]: added option to overwrite payload implementation in hoodie.properties file

2019-12-28 Thread GitBox
pratyakshsharma commented on a change in pull request #1075: [HUDI-114]: added 
option to overwrite payload implementation in hoodie.properties file
URL: https://github.com/apache/incubator-hudi/pull/1075#discussion_r361804460
 
 

 ##
 File path: hudi-cli/src/main/java/org/apache/hudi/cli/HoodieCLI.java
 ##
 @@ -74,7 +74,8 @@ public static void initFS(boolean force) throws IOException {
   }
 
   public static void refreshTableMetadata() {
-setTableMetaClient(new HoodieTableMetaClient(HoodieCLI.conf, basePath, 
false, HoodieCLI.consistencyGuardConfig));
+setTableMetaClient(new HoodieTableMetaClient(HoodieCLI.conf, basePath, 
false, HoodieCLI.consistencyGuardConfig,
 
 Review comment:
   Made the changes @n3nash 




[jira] [Commented] (HUDI-446) Refactor the codes based on scala codestyle PublicMethodsHaveTypeChecker rule

2019-12-28 Thread lamber-ken (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17004524#comment-17004524
 ] 

lamber-ken commented on HUDI-446:
-

Hi [~soundhearer], since the checkstyle rules are still under discussion, 
please don't fix these checkstyle issues for now. 

I'll let you know later. :)

> Refactor the codes based on scala codestyle PublicMethodsHaveTypeChecker rule
> -
>
> Key: HUDI-446
> URL: https://issues.apache.org/jira/browse/HUDI-446
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Code Cleanup
>Reporter: lamber-ken
>Assignee: Leping Huang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-118) Hudi CLI : Provide options for passing properties to Compactor, Cleaner and ParquetImporter

2019-12-28 Thread Pratyaksh Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pratyaksh Sharma updated HUDI-118:
--
Fix Version/s: 0.5.1

> Hudi CLI : Provide options for passing properties to Compactor, Cleaner and 
> ParquetImporter 
> 
>
> Key: HUDI-118
> URL: https://issues.apache.org/jira/browse/HUDI-118
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: CLI, Common Core, newbie
>Reporter: Balaji Varadarajan
>Assignee: Pratyaksh Sharma
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> For non-trivial CLI operations, we have a standalone script in hudi-utilities 
> that users can call directly using spark-submit (usually). We also have 
> commands in hudi-cli to invoke the commands directly from hudi-cli shell.
> There was an earlier effort to allow users to pass properties directly to the 
> scripts in hudi-utilities but we still need to give the same functionality to 
> the corresponding commands in hudi-cli.
> In hudi-cli, the Compaction (schedule/compact), Cleaner and HDFSParquetImporter 
> commands do not have an option to pass a DFS properties file. This is a follow-up 
> to PR [https://github.com/apache/incubator-hudi/pull/691]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-114) Allow for clients to overwrite the payload implementation in hoodie.properties

2019-12-28 Thread Pratyaksh Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pratyaksh Sharma updated HUDI-114:
--
Fix Version/s: 0.5.1

> Allow for clients to overwrite the payload implementation in hoodie.properties
> --
>
> Key: HUDI-114
> URL: https://issues.apache.org/jira/browse/HUDI-114
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: newbie, Writer Core
>Reporter: Nishith Agarwal
>Assignee: Pratyaksh Sharma
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Right now, once the payload class is set in hoodie.properties, it cannot 
> be changed. In some cases, if a code refactor is done and the jar updated, 
> one may need to pass the new payload class name.
> Also, fix picking up the payload name for the datasource API. By default, 
> HoodieAvroPayload is written, whereas the default for the datasource API is 
> OverwriteLatestAvroPayload.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] pratyakshsharma commented on a change in pull request #1080: [HUDI-118]: Options provided for passing properties to Cleaner, compactor and importer commands

2019-12-28 Thread GitBox
pratyakshsharma commented on a change in pull request #1080: [HUDI-118]: 
Options provided for passing properties to Cleaner, compactor and importer 
commands
URL: https://github.com/apache/incubator-hudi/pull/1080#discussion_r361802264
 
 

 ##
 File path: 
hudi-cli/src/main/java/org/apache/hudi/cli/commands/CleansCommand.java
 ##
 @@ -129,4 +125,31 @@ public String showCleanPartitions(@CliOption(key = 
{"clean"}, help = "clean to s
 return HoodiePrintHelper.print(header, new HashMap<>(), sortByField, 
descending, limit, headerOnly, rows);
 
   }
+
+  @CliCommand(value = "cleans run", help = "run clean")
+  public String runClean(@CliOption(key = "sparkMemory", 
unspecifiedDefaultValue = "4G",
+  help = "Spark executor memory") final String sparkMemory,
+ @CliOption(key = "propsFilePath", help = "path to 
properties file on localfs or dfs with configurations for hoodie client for 
cleaning",
+   unspecifiedDefaultValue = "") final String 
propsFilePath,
+ @CliOption(key = "hoodieConfigs", help = "Any 
configuration that can be set in the properties file can be passed here in the 
form of an array",
 
 Review comment:
   Made the changes. @n3nash 




[GitHub] [incubator-hudi] XuQianJin-Stars commented on a change in pull request #1106: [HUDI-209] Implement JMX metrics reporter

2019-12-28 Thread GitBox
XuQianJin-Stars commented on a change in pull request #1106: [HUDI-209] 
Implement JMX metrics reporter
URL: https://github.com/apache/incubator-hudi/pull/1106#discussion_r361799570
 
 

 ##
 File path: 
hudi-client/src/main/java/org/apache/hudi/metrics/JmxMetricsReporter.java
 ##
 @@ -82,6 +102,106 @@ public void report() {
 
   @Override
   public Closeable getReporter() {
-return null;
+return jmxServer.getReporter();
+  }
+
+  @Override
+  public void stop() {
+if (jmxServer != null) {
+  try {
+jmxServer.stop();
+  } catch (IOException e) {
+LOG.error("Failed to stop JMX server.", e);
+  }
+}
+  }
+
+  /**
+   * JMX Server implementation that JMX clients can connect to.
+   *
+   * Heavily based on j256 simplejmx project
 
 Review comment:
   > whats the licensing of this code?
   
   The ISC License (https://opensource.org/licenses/ISC). Is this license 
acceptable?




[jira] [Updated] (HUDI-322) DeltaSteamer should pick checkpoints off only deltacommits for MOR tables

2019-12-28 Thread Shahida Khan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shahida Khan updated HUDI-322:
--
Component/s: newbie

> DeltaSteamer should pick checkpoints off only deltacommits for MOR tables
> -
>
> Key: HUDI-322
> URL: https://issues.apache.org/jira/browse/HUDI-322
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: DeltaStreamer, newbie
>Reporter: Vinoth Chandar
>Assignee: Shahida Khan
>Priority: Major
> Fix For: 0.5.1
>
>
> When using DeltaStreamer with MOR, the checkpoints would be written out to 
> .deltacommit files (and not .commit files). We need to confirm the behavior 
> and change code such that it reads from the correct metadata file..  
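
The behavior asked for here can be sketched as picking the checkpoint source by table type. This is a hypothetical illustration, not DeltaStreamer code: `CheckpointSourceSketch`, the string table types, and the `<timestamp>.<action>` file-name list are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch: choose which completed instant files carry checkpoints.
final class CheckpointSourceSketch {

  // MOR tables write deltacommits; COW tables write commits.
  static String completedAction(String tableType) {
    return "MERGE_ON_READ".equals(tableType) ? "deltacommit" : "commit";
  }

  // Latest file for the table's action, assuming fixed-width timestamp prefixes
  // so that lexicographic order matches chronological order.
  static Optional<String> latestCheckpointFile(List<String> timelineFiles, String tableType) {
    String suffix = "." + completedAction(tableType);
    return timelineFiles.stream()
        .filter(name -> name.endsWith(suffix))
        .max(Comparator.naturalOrder());
  }

  public static void main(String[] args) {
    List<String> files = Arrays.asList(
        "20191226153400.commit",
        "20191227101500.deltacommit",
        "20191228090000.deltacommit",
        "20191228100000.clean");
    System.out.println(latestCheckpointFile(files, "MERGE_ON_READ"));
    System.out.println(latestCheckpointFile(files, "COPY_ON_WRITE"));
  }
}
```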



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HUDI-446) Refactor the codes based on scala codestyle PublicMethodsHaveTypeChecker rule

2019-12-28 Thread Leping Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leping Huang reassigned HUDI-446:
-

Assignee: Leping Huang

> Refactor the codes based on scala codestyle PublicMethodsHaveTypeChecker rule
> -
>
> Key: HUDI-446
> URL: https://issues.apache.org/jira/browse/HUDI-446
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Code Cleanup
>Reporter: lamber-ken
>Assignee: Leping Huang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] pratyakshsharma edited a comment on issue #1080: [HUDI-118]: Options provided for passing properties to Cleaner, compactor and importer commands

2019-12-28 Thread GitBox
pratyakshsharma edited a comment on issue #1080: [HUDI-118]: Options provided 
for passing properties to Cleaner, compactor and importer commands
URL: https://github.com/apache/incubator-hudi/pull/1080#issuecomment-569427142
 
 
   @n3nash Sorry for the delay. Actually I was busy with HUDI-288. Doing the 
changes now. 




[jira] [Created] (HUDI-474) Delta Streamer is not able to read the commit files

2019-12-28 Thread Shahida Khan (Jira)
Shahida Khan created HUDI-474:
-

 Summary: Delta Streamer is not able to read the commit files
 Key: HUDI-474
 URL: https://issues.apache.org/jira/browse/HUDI-474
 Project: Apache Hudi (incubating)
  Issue Type: Bug
  Components: DeltaStreamer
Reporter: Shahida Khan
 Fix For: 0.5.1
 Attachments: Gmail - Commit time issue in DeltaStreamer (Real-Time).pdf

DeltaStreamer is not able to read the correct commit files when the job is 
deployed in real time.

Below is the stack trace: 

java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: Could not read commit details from hdfs:/user/hive/warehouse/hudi.db/tbltest/.hoodie/.aux/20191226153400.clean.requested
    at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
    at org.apache.hudi.utilities.deltastreamer.AbstractDeltaStreamerService.waitForShutdown(AbstractDeltaStreamerService.java:72)
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:117)
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:297)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:688)
Caused by: org.apache.hudi.exception.HoodieException: Could not read commit details from hdfs:/user/hive/warehouse/hudi.db/tbltest/.hoodie/.aux/20191226153400.clean.requested
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$0(HoodieDeltaStreamer.java:411)
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] pratyakshsharma commented on issue #1080: [HUDI-118]: Options provided for passing properties to Cleaner, compactor and importer commands

2019-12-28 Thread GitBox
pratyakshsharma commented on issue #1080: [HUDI-118]: Options provided for 
passing properties to Cleaner, compactor and importer commands
URL: https://github.com/apache/incubator-hudi/pull/1080#issuecomment-569427142
 
 
   @n3nash Sorry for the delay. Actually I was busy with HUDI-288. Wanted to 
know one thing - why have we removed @CliAvailabilityIndicator annotated 
functions from all the CLI commands in master branch? 




[GitHub] [incubator-hudi] XuQianJin-Stars commented on a change in pull request #1106: [HUDI-209] Implement JMX metrics reporter

2019-12-28 Thread GitBox
XuQianJin-Stars commented on a change in pull request #1106: [HUDI-209] 
Implement JMX metrics reporter
URL: https://github.com/apache/incubator-hudi/pull/1106#discussion_r361799570
 
 

 ##
 File path: 
hudi-client/src/main/java/org/apache/hudi/metrics/JmxMetricsReporter.java
 ##
 @@ -82,6 +102,106 @@ public void report() {
 
   @Override
   public Closeable getReporter() {
-return null;
+return jmxServer.getReporter();
+  }
+
+  @Override
+  public void stop() {
+if (jmxServer != null) {
+  try {
+jmxServer.stop();
+  } catch (IOException e) {
+LOG.error("Failed to stop JMX server.", e);
+  }
+}
+  }
+
+  /**
+   * JMX Server implementation that JMX clients can connect to.
+   *
+   * Heavily based on j256 simplejmx project
 
 Review comment:
   > whats the licensing of this code?
   
   The ISC License (https://opensource.org/licenses/ISC)




[GitHub] [incubator-hudi] pratyakshsharma opened a new pull request #1150: [HUDI-288]: Add support for ingesting multiple kafka streams in a single DeltaStreamer deployment

2019-12-28 Thread GitBox
pratyakshsharma opened a new pull request #1150: [HUDI-288]: Add support for 
ingesting multiple kafka streams in a single DeltaStreamer deployment
URL: https://github.com/apache/incubator-hudi/pull/1150
 
 
   




[GitHub] [incubator-hudi] wojustme closed pull request #1104: [HUDI-404] fix the error of compiling project.

2019-12-28 Thread GitBox
wojustme closed pull request #1104: [HUDI-404] fix the error of compiling 
project.
URL: https://github.com/apache/incubator-hudi/pull/1104
 
 
   




[GitHub] [incubator-hudi] wojustme commented on issue #1104: [HUDI-404] fix the error of compiling project.

2019-12-28 Thread GitBox
wojustme commented on issue #1104: [HUDI-404] fix the error of compiling 
project.
URL: https://github.com/apache/incubator-hudi/pull/1104#issuecomment-569418057
 
 
   @lamber-ken @leesf 
   Sorry for the late reply.
   I tried again; I may have set a wrong config in 'setting.xml'.
   Thanks.




[jira] [Created] (HUDI-473) QuickstartUtils

2019-12-28 Thread zhangpu (Jira)
zhangpu created HUDI-473:


 Summary: QuickstartUtils 
 Key: HUDI-473
 URL: https://issues.apache.org/jira/browse/HUDI-473
 Project: Apache Hudi (incubating)
  Issue Type: Improvement
  Components: Usability
Reporter: zhangpu


 First, call dataGen.generateInserts to write the data. Then, when another 
process calls dataGen.generateUpdates, the following exception is thrown:
Exception in thread "main" java.lang.IllegalArgumentException: bound must be positive
    at java.util.Random.nextInt(Random.java:388)
    at org.apache.hudi.QuickstartUtils$DataGenerator.generateUpdates(QuickstartUtils.java:163)

Is the design reasonable?
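
For context, `java.util.Random#nextInt(int bound)` throws `IllegalArgumentException("bound must be positive")` whenever the bound is zero or negative. A plausible reading of the report, stated here as an assumption, is that a generator created in a new process has no previously inserted keys to pick from. The sketch below uses a hypothetical `UpdateGeneratorSketch` class (not `QuickstartUtils.DataGenerator`) to show that failure mode with an explicit guard and a clearer error.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical generator that tracks its own inserted keys in memory.
final class UpdateGeneratorSketch {

  private final List<String> existingKeys = new ArrayList<>();
  private final Random random = new Random(42);

  // Creates n new keys and remembers them for later updates.
  List<String> generateInserts(int n) {
    List<String> created = new ArrayList<>();
    for (int i = 0; i < n; i++) {
      String key = "key-" + existingKeys.size();
      existingKeys.add(key);
      created.add(key);
    }
    return created;
  }

  // Picks n keys at random from the remembered ones. Without the guard,
  // random.nextInt(0) would throw "bound must be positive" on a fresh instance.
  List<String> generateUpdates(int n) {
    if (existingKeys.isEmpty()) {
      throw new IllegalStateException("generateInserts must be called on this instance first");
    }
    List<String> updates = new ArrayList<>();
    for (int i = 0; i < n; i++) {
      updates.add(existingKeys.get(random.nextInt(existingKeys.size())));
    }
    return updates;
  }

  public static void main(String[] args) {
    UpdateGeneratorSketch generator = new UpdateGeneratorSketch();
    generator.generateInserts(3);
    System.out.println(generator.generateUpdates(2));
    try {
      new UpdateGeneratorSketch().generateUpdates(1);
    } catch (IllegalStateException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Because the key list lives only in memory, a second process would need either to run its own inserts first or to load existing keys from the table before generating updates.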


