[GitHub] [incubator-hudi] pratyakshsharma commented on a change in pull request #1039: [HUDI-340]: made max events to read from kafka source configurable

2019-11-25 Thread GitBox
pratyakshsharma commented on a change in pull request #1039: [HUDI-340]: made 
max events to read from kafka source configurable
URL: https://github.com/apache/incubator-hudi/pull/1039#discussion_r350581065
 
 

 ##
 File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/sources/TestKafkaSource.java
 ##
 @@ -131,6 +140,78 @@ public void testJsonKafkaSource() throws IOException {
     assertEquals(Option.empty(), fetch4AsRows.getBatch());
   }
 
+  @Test
+  public void testJsonKafkaSourceWithDefaultUpperCap() throws IOException {
+    // topic setup.
+    testUtils.createTopic(TEST_TOPIC_NAME, 2);
+    HoodieTestDataGenerator dataGenerator = new HoodieTestDataGenerator();
+    TypedProperties props = createPropsForJsonSource(Long.MAX_VALUE);
+
+    Source jsonSource = new JsonKafkaSource(props, jsc, sparkSession, schemaProvider);
+    SourceFormatAdapter kafkaSource = new SourceFormatAdapter(jsonSource);
+    Config.DEFAULT_MAX_EVENTS_FROM_KAFKA_SOURCE = 500;
+
+    /*
+     * 1. Extract without any checkpoint => get all the data, respecting default upper cap since both sourceLimit
+     * and maxEventsFromKafkaSourceProp are set to Long.MAX_VALUE
+     */
+    testUtils.sendMessages(TEST_TOPIC_NAME, Helpers.jsonifyRecords(dataGenerator.generateInserts("000", 1000)));
 Review comment:
   Done. @leesf 
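For context, the capping behavior this PR makes configurable can be sketched as below. This is a minimal, hypothetical sketch: the class, field, and method names are illustrative, not Hudi's actual ones; the idea is only that the effective batch size is the smaller of the caller-supplied sourceLimit and the configured maximum.

```java
// Hypothetical sketch of capping events read per batch from a Kafka source.
public class SourceLimitSketch {

    // The effective limit is the smaller of the caller's sourceLimit and the
    // configured maximum (illustrative names, not Hudi's actual fields).
    static long effectiveLimit(long sourceLimit, long maxEventsFromKafkaSource) {
        return Math.min(sourceLimit, maxEventsFromKafkaSource);
    }

    public static void main(String[] args) {
        // sourceLimit left at Long.MAX_VALUE falls back to the configured cap.
        System.out.println(effectiveLimit(Long.MAX_VALUE, 500));  // 500
        // An explicit smaller sourceLimit wins over the cap.
        System.out.println(effectiveLimit(200, 500));             // 200
    }
}
```

This is why the test above expects the default cap of 500 to apply even though both limits were initialized to Long.MAX_VALUE.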


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Assigned] (HUDI-209) Implement JMX metrics reporter

2019-11-25 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang reassigned HUDI-209:
-

Assignee: (was: vinoyang)

> Implement JMX metrics reporter
> --
>
> Key: HUDI-209
> URL: https://issues.apache.org/jira/browse/HUDI-209
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>Reporter: vinoyang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, there are only two reporters, {{MetricsGraphiteReporter}} and 
> {{InMemoryMetricsReporter}}, and {{InMemoryMetricsReporter}} is used only for testing, 
> so in practice we have just one metrics reporter. Since JMX is a standard for 
> monitoring on the JVM platform, I propose providing a JMX metrics reporter. 
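The core of such a reporter can be sketched with only the JDK: expose the metrics as an MXBean on the platform MBeanServer, so any JMX client (jconsole, jmxterm) attached to the JVM can read them. This is a hedged sketch; the bean and metric names are illustrative and not Hudi's actual reporter API.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Minimal JMX-reporter sketch: register a metrics MXBean so external
// JMX clients can read metric values from this JVM.
public class JmxReporterSketch {

    public interface HoodieMetricsMXBean {
        long getCommitCount();
    }

    public static class HoodieMetrics implements HoodieMetricsMXBean {
        private volatile long commitCount;
        public void onCommit() { commitCount++; }
        @Override public long getCommitCount() { return commitCount; }
    }

    // Registers the bean and returns the value a JMX client would observe.
    public static long report() throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("org.apache.hudi.example:type=HoodieMetrics");
        if (server.isRegistered(name)) {
            server.unregisterMBean(name);  // keep the sketch re-runnable
        }
        HoodieMetrics metrics = new HoodieMetrics();
        server.registerMBean(metrics, name);
        metrics.onCommit();
        metrics.onCommit();
        // Any JMX client attached to this JVM now sees CommitCount == 2.
        return (Long) server.getAttribute(name, "CommitCount");
    }

    public static void main(String[] args) throws Exception {
        System.out.println("CommitCount via JMX: " + report());
    }
}
```

Unlike Graphite (push), JMX is pull-based: the reporter only has to keep the MBean's attributes current and clients query on their own schedule.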



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-118) Provide CLI Option for passing properties to Compactor, Cleaner and ParquetImporter

2019-11-25 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16982116#comment-16982116
 ] 

Vinoth Chandar commented on HUDI-118:
-

Hmmm. I don't have context on this myself. I'll have to kick this one to [~vbalaji] 

> Provide CLI Option for passing properties to Compactor, Cleaner and 
> ParquetImporter
> ---
>
> Key: HUDI-118
> URL: https://issues.apache.org/jira/browse/HUDI-118
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: CLI, newbie
>Reporter: Balaji Varadarajan
>Assignee: Pratyaksh Sharma
>Priority: Minor
>
> Compaction (schedule/compact), Cleaner and HDFSParquetImporter command does 
> not have option to pass DFS properties file. This is a followup to PR 
> https://github.com/apache/incubator-hudi/pull/691





Build failed in Jenkins: hudi-snapshot-deployment-0.5 #110

2019-11-25 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 2.18 KB...]
/home/jenkins/tools/maven/apache-maven-3.5.4/bin:
m2.conf
mvn
mvn.cmd
mvnDebug
mvnDebug.cmd
mvnyjp

/home/jenkins/tools/maven/apache-maven-3.5.4/boot:
plexus-classworlds-2.5.2.jar

/home/jenkins/tools/maven/apache-maven-3.5.4/conf:
logging
settings.xml
toolchains.xml

/home/jenkins/tools/maven/apache-maven-3.5.4/conf/logging:
simplelogger.properties

/home/jenkins/tools/maven/apache-maven-3.5.4/lib:
aopalliance-1.0.jar
cdi-api-1.0.jar
cdi-api.license
commons-cli-1.4.jar
commons-cli.license
commons-io-2.5.jar
commons-io.license
commons-lang3-3.5.jar
commons-lang3.license
ext
guava-20.0.jar
guice-4.2.0-no_aop.jar
jansi-1.17.1.jar
jansi-native
javax.inject-1.jar
jcl-over-slf4j-1.7.25.jar
jcl-over-slf4j.license
jsr250-api-1.0.jar
jsr250-api.license
maven-artifact-3.5.4.jar
maven-artifact.license
maven-builder-support-3.5.4.jar
maven-builder-support.license
maven-compat-3.5.4.jar
maven-compat.license
maven-core-3.5.4.jar
maven-core.license
maven-embedder-3.5.4.jar
maven-embedder.license
maven-model-3.5.4.jar
maven-model-builder-3.5.4.jar
maven-model-builder.license
maven-model.license
maven-plugin-api-3.5.4.jar
maven-plugin-api.license
maven-repository-metadata-3.5.4.jar
maven-repository-metadata.license
maven-resolver-api-1.1.1.jar
maven-resolver-api.license
maven-resolver-connector-basic-1.1.1.jar
maven-resolver-connector-basic.license
maven-resolver-impl-1.1.1.jar
maven-resolver-impl.license
maven-resolver-provider-3.5.4.jar
maven-resolver-provider.license
maven-resolver-spi-1.1.1.jar
maven-resolver-spi.license
maven-resolver-transport-wagon-1.1.1.jar
maven-resolver-transport-wagon.license
maven-resolver-util-1.1.1.jar
maven-resolver-util.license
maven-settings-3.5.4.jar
maven-settings-builder-3.5.4.jar
maven-settings-builder.license
maven-settings.license
maven-shared-utils-3.2.1.jar
maven-shared-utils.license
maven-slf4j-provider-3.5.4.jar
maven-slf4j-provider.license
org.eclipse.sisu.inject-0.3.3.jar
org.eclipse.sisu.inject.license
org.eclipse.sisu.plexus-0.3.3.jar
org.eclipse.sisu.plexus.license
plexus-cipher-1.7.jar
plexus-cipher.license
plexus-component-annotations-1.7.1.jar
plexus-component-annotations.license
plexus-interpolation-1.24.jar
plexus-interpolation.license
plexus-sec-dispatcher-1.4.jar
plexus-sec-dispatcher.license
plexus-utils-3.1.0.jar
plexus-utils.license
slf4j-api-1.7.25.jar
slf4j-api.license
wagon-file-3.1.0.jar
wagon-file.license
wagon-http-3.1.0-shaded.jar
wagon-http.license
wagon-provider-api-3.1.0.jar
wagon-provider-api.license

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/ext:
README.txt

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native:
freebsd32
freebsd64
linux32
linux64
osx
README.txt
windows32
windows64

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/osx:
libjansi.jnilib

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows32:
jansi.dll

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows64:
jansi.dll
Finished /home/jenkins/tools/maven/apache-maven-3.5.4 Directory Listing :
Detected current version as: 
'HUDI_home=
0.5.1-SNAPSHOT'
[INFO] Scanning for projects...
[INFO] 
[INFO] Reactor Build Order:
[INFO] 
[INFO] Hudi   [pom]
[INFO] hudi-common[jar]
[INFO] hudi-timeline-service  [jar]
[INFO] hudi-hadoop-mr [jar]
[INFO] hudi-client[jar]
[INFO] hudi-hive  [jar]
[INFO] hudi-spark [jar]
[INFO] hudi-utilities [jar]
[INFO] hudi-cli   [jar]
[INFO] hudi-hadoop-mr-bundle  [jar]
[INFO] hudi-hive-bundle   [jar]
[INFO] hudi-spark-bundle  [jar]
[INFO] hudi-presto-bundle [jar]
[INFO] hudi-utilities-bundle  [jar]
[INFO] hudi-timeline-server-bundle

[jira] [Commented] (HUDI-184) Integrate Hudi with Apache Flink

2019-11-25 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16982107#comment-16982107
 ] 

Vinoth Chandar commented on HUDI-184:
-

Okay cool then. On using Flink's state as the index, if you take a look at 
HBaseIndex, it should be similar and you'll get an idea of what it takes (of 
course we can change the APIs as needed)

> Integrate Hudi with Apache Flink
> 
>
> Key: HUDI-184
> URL: https://issues.apache.org/jira/browse/HUDI-184
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: Write Client
>Reporter: vinoyang
>Assignee: vinoyang
>Priority: Major
>
> Apache Flink is a popular streaming processing engine.
> Integrating Hudi with Flink is a valuable work.
> The discussion mailing thread is here: 
> [https://lists.apache.org/api/source.lua/1533de2d4cd4243fa9e8f8bf057ffd02f2ac0bec7c7539d8f72166ea@%3Cdev.hudi.apache.org%3E]





[jira] [Commented] (HUDI-118) Provide CLI Option for passing properties to Compactor, Cleaner and ParquetImporter

2019-11-25 Thread vinoyang (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16982085#comment-16982085
 ] 

vinoyang commented on HUDI-118:
---

[~Pratyaksh] I have nothing to add. Let [~vinoth] make the decision.

> Provide CLI Option for passing properties to Compactor, Cleaner and 
> ParquetImporter
> ---
>
> Key: HUDI-118
> URL: https://issues.apache.org/jira/browse/HUDI-118
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: CLI, newbie
>Reporter: Balaji Varadarajan
>Assignee: Pratyaksh Sharma
>Priority: Minor
>
> Compaction (schedule/compact), Cleaner and HDFSParquetImporter command does 
> not have option to pass DFS properties file. This is a followup to PR 
> https://github.com/apache/incubator-hudi/pull/691





[jira] [Commented] (HUDI-358) Add Java-doc and importOrder checkstyle rule

2019-11-25 Thread lamber-ken (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16982083#comment-16982083
 ] 

lamber-ken commented on HUDI-358:
-

Hi, [~yanghua], I also can't assign this to myself. [~vinoth] can you help?

> Add Java-doc and importOrder checkstyle rule
> 
>
> Key: HUDI-358
> URL: https://issues.apache.org/jira/browse/HUDI-358
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Common Core
>Reporter: lamber-ken
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> 1. Add Javadoc and ImportOrder checkstyle rules.
> 2. Keep severity at the info level until the issue is finished.





[jira] [Commented] (HUDI-184) Integrate Hudi with Apache Flink

2019-11-25 Thread vinoyang (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16982081#comment-16982081
 ] 

vinoyang commented on HUDI-184:
---

{quote}In spark, we can run two spark jobs  (i.e the jobs tab you see in Spark 
UI) in parallel within the same physical set of executors.. Can Flink allow us 
to do this ?
{quote}
Yes.

> Integrate Hudi with Apache Flink
> 
>
> Key: HUDI-184
> URL: https://issues.apache.org/jira/browse/HUDI-184
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: Write Client
>Reporter: vinoyang
>Assignee: vinoyang
>Priority: Major
>
> Apache Flink is a popular streaming processing engine.
> Integrating Hudi with Flink is a valuable work.
> The discussion mailing thread is here: 
> [https://lists.apache.org/api/source.lua/1533de2d4cd4243fa9e8f8bf057ffd02f2ac0bec7c7539d8f72166ea@%3Cdev.hudi.apache.org%3E]





[jira] [Commented] (HUDI-358) Add Java-doc and importOrder checkstyle rule

2019-11-25 Thread vinoyang (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16982080#comment-16982080
 ] 

vinoyang commented on HUDI-358:
---

[~lamber-ken] I cannot assign this ticket to you. You can assign it to yourself.

> Add Java-doc and importOrder checkstyle rule
> 
>
> Key: HUDI-358
> URL: https://issues.apache.org/jira/browse/HUDI-358
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Common Core
>Reporter: lamber-ken
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> 1. Add Javadoc and ImportOrder checkstyle rules.
> 2. Keep severity at the info level until the issue is finished.





[jira] [Updated] (HUDI-367) Implement Prometheus metrics reporter

2019-11-25 Thread lamber-ken (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lamber-ken updated HUDI-367:

Labels: WIP  (was: )

> Implement Prometheus metrics reporter
> -
>
> Key: HUDI-367
> URL: https://issues.apache.org/jira/browse/HUDI-367
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>Reporter: lamber-ken
>Priority: Major
>  Labels: WIP
> Attachments: image-2019-11-26-10-45-24-185.png
>
>
> Implement a PrometheusGateway metrics reporter; it is a push-model metrics 
> system.
> !image-2019-11-26-10-45-24-185.png!
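The push model behind a Pushgateway reporter can be sketched as follows. This is a hedged sketch: it only builds a payload in Prometheus' text exposition format; the metric and job names are illustrative, not Hudi's actual ones, and the HTTP push itself is left as a comment.

```java
// Sketch of a Pushgateway-style reporter: serialize a gauge in Prometheus'
// text exposition format, then (in a real reporter) PUT it to the gateway.
public class PushGatewaySketch {

    // One gauge sample in text exposition format.
    static String gauge(String name, String help, double value) {
        return "# HELP " + name + " " + help + "\n"
             + "# TYPE " + name + " gauge\n"
             + name + " " + value + "\n";
    }

    public static void main(String[] args) {
        String payload = gauge("hudi_commit_duration_ms", "Duration of the last commit.", 1234.0);
        System.out.print(payload);
        // A real reporter would periodically PUT this payload to
        // http://<pushgateway-host>:9091/metrics/job/<job-name>, and Prometheus
        // would then scrape the gateway on its own schedule.
    }
}
```

The push model fits batch/ingestion jobs well: the job pushes on its own cadence instead of needing to stay alive long enough to be scraped.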





[jira] [Created] (HUDI-367) Implement Prometheus metrics reporter

2019-11-25 Thread lamber-ken (Jira)
lamber-ken created HUDI-367:
---

 Summary: Implement Prometheus metrics reporter
 Key: HUDI-367
 URL: https://issues.apache.org/jira/browse/HUDI-367
 Project: Apache Hudi (incubating)
  Issue Type: New Feature
Reporter: lamber-ken
 Attachments: image-2019-11-26-10-45-24-185.png

Implement a Prometheus metrics reporter; it is a push-model metrics system.

!image-2019-11-26-10-45-24-185.png!





[jira] [Updated] (HUDI-367) Implement Prometheus metrics reporter

2019-11-25 Thread lamber-ken (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lamber-ken updated HUDI-367:

Description: 
Implement a PrometheusGateway metrics reporter; it is a push-model metrics system.

!image-2019-11-26-10-45-24-185.png!

  was:
Implement a Prometheus metrics reporter; it is a push-model metrics system.

!image-2019-11-26-10-45-24-185.png!


> Implement Prometheus metrics reporter
> -
>
> Key: HUDI-367
> URL: https://issues.apache.org/jira/browse/HUDI-367
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>Reporter: lamber-ken
>Priority: Major
> Attachments: image-2019-11-26-10-45-24-185.png
>
>
> Implement a PrometheusGateway metrics reporter; it is a push-model metrics 
> system.
> !image-2019-11-26-10-45-24-185.png!





[jira] [Created] (HUDI-366) Refactor hudi-hadoop-mr based on new ImportOrder code style rule

2019-11-25 Thread lamber-ken (Jira)
lamber-ken created HUDI-366:
---

 Summary: Refactor hudi-hadoop-mr based on new ImportOrder code 
style rule
 Key: HUDI-366
 URL: https://issues.apache.org/jira/browse/HUDI-366
 Project: Apache Hudi (incubating)
  Issue Type: Sub-task
Reporter: lamber-ken


Refactor hudi-hadoop-mr based on new ImportOrder code style rule





[jira] [Updated] (HUDI-364) Refactor hudi-hive based on new ImportOrder code style rule

2019-11-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-364:

Labels: pull-request-available  (was: )

> Refactor hudi-hive based on new ImportOrder code style rule
> ---
>
> Key: HUDI-364
> URL: https://issues.apache.org/jira/browse/HUDI-364
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Hive Integration
>Reporter: lamber-ken
>Priority: Critical
>  Labels: pull-request-available
>
> Refactor hudi-hive based on new ImportOrder code style rule





[GitHub] [incubator-hudi] lamber-ken opened a new pull request #1048: [HUDI-364] Refactor hudi-hive based on new ImportOrder code style rule

2019-11-25 Thread GitBox
lamber-ken opened a new pull request #1048: [HUDI-364] Refactor hudi-hive based 
on new ImportOrder code style rule
URL: https://github.com/apache/incubator-hudi/pull/1048
 
 
   ## What is the purpose of the pull request
   
   Refactor hudi-hive based on new ImportOrder code style rule
   
   ## Brief change log
   
 - Refactor hudi-hive based on new ImportOrder code style rule.
 - Change severity of checkstyle.xml file to `error`.
 - Fix some other minor hotfixes.
   
   ## Verify this pull request
   
   This pull request is a code cleanup without any test coverage.
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.




[jira] [Created] (HUDI-365) Refactor hudi-cli based on new ImportOrder code style rule

2019-11-25 Thread Gurudatt Kulkarni (Jira)
Gurudatt Kulkarni created HUDI-365:
--

 Summary: Refactor hudi-cli based on new ImportOrder code style rule
 Key: HUDI-365
 URL: https://issues.apache.org/jira/browse/HUDI-365
 Project: Apache Hudi (incubating)
  Issue Type: Sub-task
Reporter: Gurudatt Kulkarni
Assignee: Gurudatt Kulkarni








[jira] [Comment Edited] (HUDI-354) Introduce stricter comment and code style validation rules

2019-11-25 Thread lamber-ken (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16982033#comment-16982033
 ] 

lamber-ken edited comment on HUDI-354 at 11/26/19 1:53 AM:
---

Because applying each new checkstyle rule is a big job, I created a new issue 
to handle the ImportOrder rule.

[https://issues.apache.org/jira/projects/HUDI/issues/HUDI-363]


was (Author: lamber-ken):
Because applying each new checkstyle rule is a big job, I created a new issue 
to handle the ImportOrder rule.

> Introduce stricter comment and code style validation rules
> --
>
> Key: HUDI-354
> URL: https://issues.apache.org/jira/browse/HUDI-354
> Project: Apache Hudi (incubating)
>  Issue Type: Task
>Reporter: vinoyang
>Priority: Major
> Attachments: Screenshot 2019-11-22 at 4.58.58 PM.png, Screenshot 
> 2019-11-22 at 5.02.32 PM.png
>
>
> This is an umbrella issue used to track apply some stricter comment and code 
> style validation rules for the whole project. The rules list below:
>  # All public classes must add class-level comments;
>  # All comments must end with a clear "."
>  # In the import statement of the class, clearly distinguish (by blank lines) 
> the import of Java SE and the import of non-java SE. Currently, I saw at 
> least two projects(Spark and Flink) that implement this rule. Flink 
> implements stricter rules than Spark. It is divided into several blocks from 
> top to bottom(owner import -> non-owner and non-JavaSE import -> Java SE 
> import -> static import), each block are sorted according to the natural 
> sequence of letters;
>  # Reconfirm that methods and their comments are consistent;
> Each project sub-module maps to one subtask.
> How to find all the invalidated points?
>  * Add the XML code snippet into {{PROJECT_ROOT/style/checkstyle.xml}} : 
> {code:java}
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
>value="Import {0} appears after other imports that it should precede"/>
> 
> 
> 
> 
>value="Redundant import {0}."/>
> 
> 
> 
> 
> {code}
>  *  Make sure you have installed CheckStyle-IDEA plugin and activated for the 
> project.
>  * Scan the project module you want to refactor and fix all the issues one by 
> one.
>  





[jira] [Commented] (HUDI-354) Introduce stricter comment and code style validation rules

2019-11-25 Thread Gurudatt Kulkarni (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16982050#comment-16982050
 ] 

Gurudatt Kulkarni commented on HUDI-354:


[~lamber-ken] Awesome. Will rebase to master. Thanks. 

> Introduce stricter comment and code style validation rules
> --
>
> Key: HUDI-354
> URL: https://issues.apache.org/jira/browse/HUDI-354
> Project: Apache Hudi (incubating)
>  Issue Type: Task
>Reporter: vinoyang
>Priority: Major
> Attachments: Screenshot 2019-11-22 at 4.58.58 PM.png, Screenshot 
> 2019-11-22 at 5.02.32 PM.png
>
>
> This is an umbrella issue used to track apply some stricter comment and code 
> style validation rules for the whole project. The rules list below:
>  # All public classes must add class-level comments;
>  # All comments must end with a clear "."
>  # In the import statement of the class, clearly distinguish (by blank lines) 
> the import of Java SE and the import of non-java SE. Currently, I saw at 
> least two projects(Spark and Flink) that implement this rule. Flink 
> implements stricter rules than Spark. It is divided into several blocks from 
> top to bottom(owner import -> non-owner and non-JavaSE import -> Java SE 
> import -> static import), each block are sorted according to the natural 
> sequence of letters;
>  # Reconfirm that methods and their comments are consistent;
> Each project sub-module maps to one subtask.
> How to find all the invalidated points?
>  * Add the XML code snippet into {{PROJECT_ROOT/style/checkstyle.xml}} : 
> {code:java}
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
>value="Import {0} appears after other imports that it should precede"/>
> 
> 
> 
> 
>value="Redundant import {0}."/>
> 
> 
> 
> 
> {code}
>  *  Make sure you have installed CheckStyle-IDEA plugin and activated for the 
> project.
>  * Scan the project module you want to refactor and fix all the issues one by 
> one.
>  





[jira] [Assigned] (HUDI-349) Make cleaner retention based on time period to account for higher deviations in ingestion runs

2019-11-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan reassigned HUDI-349:
---

Assignee: Aravind Suresh  (was: Balaji Varadarajan)

> Make cleaner retention based on time period to account for higher deviations 
> in ingestion runs
> --
>
> Key: HUDI-349
> URL: https://issues.apache.org/jira/browse/HUDI-349
> Project: Apache Hudi (incubating)
>  Issue Type: Task
>  Components: Cleaner, newbie
>Reporter: Balaji Varadarajan
>Assignee: Aravind Suresh
>Priority: Major
> Fix For: 0.5.1
>
>
> Commit-based cleaning retains a fixed number of commits. Ingestion time can 
> vary across runs due to various factors. To bound the maximum running time of 
> a query and to provide a consistent retention period, it is better to use a 
> time-based retention config (e.g. 12h).




[jira] [Assigned] (HUDI-349) Make cleaner retention based on time period to account for higher deviations in ingestion runs

2019-11-25 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan reassigned HUDI-349:
---

Assignee: Balaji Varadarajan

> Make cleaner retention based on time period to account for higher deviations 
> in ingestion runs
> --
>
> Key: HUDI-349
> URL: https://issues.apache.org/jira/browse/HUDI-349
> Project: Apache Hudi (incubating)
>  Issue Type: Task
>  Components: Cleaner, newbie
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Major
> Fix For: 0.5.1
>
>
> Commit-based cleaning retains a fixed number of commits. Ingestion time can 
> vary across runs due to various factors. To bound the maximum running time of 
> a query and to provide a consistent retention period, it is better to use a 
> time-based retention config (e.g. 12h).





[jira] [Created] (HUDI-364) Refactor hudi-hive based on new ImportOrder code style rule

2019-11-25 Thread lamber-ken (Jira)
lamber-ken created HUDI-364:
---

 Summary: Refactor hudi-hive based on new ImportOrder code style 
rule
 Key: HUDI-364
 URL: https://issues.apache.org/jira/browse/HUDI-364
 Project: Apache Hudi (incubating)
  Issue Type: Sub-task
  Components: Hive Integration
Reporter: lamber-ken


Refactor hudi-hive based on new ImportOrder code style rule





[jira] [Updated] (HUDI-363) Refactor codes based on ImportOrder code style rule

2019-11-25 Thread lamber-ken (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lamber-ken updated HUDI-363:

Description: 
Refactor code based on the ImportOrder code style rule. Many places need to be 
refactored, so this rule may need some subtasks.

Follow the steps below to fix:

1. Set the severity of ImportOrder to the error level in your local env.

2. Use this command to check the module you are working on:
{code:java}
mvn -pl hudi-common checkstyle:check
{code}
3. Remember to reset the severity to info before committing.

  was:
Refactor code based on the ImportOrder code style rule. Many places need to be 
refactored, so this rule may need some subtasks.

Follow the steps below to fix:

1. Set the severity of ImportOrder to the error level in your local env.

2. Use this command to check the module you are working on:
{code:java}
mvn -pl hudi-common checkstyle:check
{code}


> Refactor codes based on ImportOrder code style rule
> ---
>
> Key: HUDI-363
> URL: https://issues.apache.org/jira/browse/HUDI-363
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>Reporter: lamber-ken
>Priority: Critical
>
> Refactor code based on the ImportOrder code style rule. Many places need to 
> be refactored, so this rule may need some subtasks.
> Follow the steps below to fix:
> 1. Set the severity of ImportOrder to the error level in your local env.
> 2. Use this command to check the module you are working on:
> {code:java}
> mvn -pl hudi-common checkstyle:check
> {code}
> 3. Remember to reset the severity to info before committing.





[jira] [Updated] (HUDI-363) Refactor codes based on ImportOrder code style rule

2019-11-25 Thread lamber-ken (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lamber-ken updated HUDI-363:

Description: 
Refactor code based on the ImportOrder code style rule. Many places need to be 
refactored, so this rule may need some subtasks.

Follow the steps below to fix:

1. Set the severity of ImportOrder to the error level in your local env.

2. Use this command to check the module you are working on:
{code:java}
mvn -pl hudi-common checkstyle:check
{code}

  was:Refactor code based on the ImportOrder code style rule. Many places need 
to be refactored, so this rule may need some subtasks


> Refactor codes based on ImportOrder code style rule
> ---
>
> Key: HUDI-363
> URL: https://issues.apache.org/jira/browse/HUDI-363
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>Reporter: lamber-ken
>Priority: Critical
>
> Refactor code based on the ImportOrder code style rule. Many places need to 
> be refactored, so this rule may need some subtasks.
> Follow the steps below to fix:
> 1. Set the severity of ImportOrder to the error level in your local env.
> 2. Use this command to check the module you are working on:
> {code:java}
> mvn -pl hudi-common checkstyle:check
> {code}





[jira] [Commented] (HUDI-354) Introduce stricter comment and code style validation rules

2019-11-25 Thread lamber-ken (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16982033#comment-16982033
 ] 

lamber-ken commented on HUDI-354:
-

Because applying each new checkstyle rule is a big job, I created a new issue 
to handle the ImportOrder rule.

> Introduce stricter comment and code style validation rules
> --
>
> Key: HUDI-354
> URL: https://issues.apache.org/jira/browse/HUDI-354
> Project: Apache Hudi (incubating)
>  Issue Type: Task
>Reporter: vinoyang
>Priority: Major
> Attachments: Screenshot 2019-11-22 at 4.58.58 PM.png, Screenshot 
> 2019-11-22 at 5.02.32 PM.png
>
>
> This is an umbrella issue used to track apply some stricter comment and code 
> style validation rules for the whole project. The rules list below:
>  # All public classes must add class-level comments;
>  # All comments must end with a clear "."
>  # In the import statement of the class, clearly distinguish (by blank lines) 
> the import of Java SE and the import of non-java SE. Currently, I saw at 
> least two projects(Spark and Flink) that implement this rule. Flink 
> implements stricter rules than Spark. It is divided into several blocks from 
> top to bottom(owner import -> non-owner and non-JavaSE import -> Java SE 
> import -> static import), each block are sorted according to the natural 
> sequence of letters;
>  # Reconfirm that methods and their comments are consistent;
> Each project sub-module maps to one subtask.
> How to find all the invalidated points?
>  * Add the XML code snippet into {{PROJECT_ROOT/style/checkstyle.xml}} : 
> {code:java}
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
>value="Import {0} appears after other imports that it should precede"/>
> 
> 
> 
> 
>value="Redundant import {0}."/>
> 
> 
> 
> 
> {code}
>  *  Make sure you have installed CheckStyle-IDEA plugin and activated for the 
> project.
>  * Scan the project module you want to refactor and fix all the issues one by 
> one.
>  





[jira] [Created] (HUDI-363) Refactor codes based on ImportOrder code style rule

2019-11-25 Thread lamber-ken (Jira)
lamber-ken created HUDI-363:
---

 Summary: Refactor codes based on ImportOrder code style rule
 Key: HUDI-363
 URL: https://issues.apache.org/jira/browse/HUDI-363
 Project: Apache Hudi (incubating)
  Issue Type: Improvement
Reporter: lamber-ken


Refactor code based on the ImportOrder code style rule. Many places need to be 
refactored, so this rule may need some subtasks





[jira] [Commented] (HUDI-354) Introduce stricter comment and code style validation rules

2019-11-25 Thread lamber-ken (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16982020#comment-16982020
 ] 

lamber-ken commented on HUDI-354:
-

Hi [~gurudatt], you can rebase onto the master branch and you will see the latest 
checkstyle.xml

> Introduce stricter comment and code style validation rules
> --
>
> Key: HUDI-354
> URL: https://issues.apache.org/jira/browse/HUDI-354
> Project: Apache Hudi (incubating)
>  Issue Type: Task
>Reporter: vinoyang
>Priority: Major
> Attachments: Screenshot 2019-11-22 at 4.58.58 PM.png, Screenshot 
> 2019-11-22 at 5.02.32 PM.png
>
>
> This is an umbrella issue used to track apply some stricter comment and code 
> style validation rules for the whole project. The rules list below:
>  # All public classes must add class-level comments;
>  # All comments must end with a clear "."
>  # In the import statement of the class, clearly distinguish (by blank lines) 
> the import of Java SE and the import of non-java SE. Currently, I saw at 
> least two projects(Spark and Flink) that implement this rule. Flink 
> implements stricter rules than Spark. It is divided into several blocks from 
> top to bottom(owner import -> non-owner and non-JavaSE import -> Java SE 
> import -> static import), each block are sorted according to the natural 
> sequence of letters;
>  # Reconfirm that each method and its comment are consistent;
> Each project sub-module maps to one subtask.
> How to find all the violations?
>  * Add the XML code snippet into {{PROJECT_ROOT/style/checkstyle.xml}} : 
> {code:java}
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
>value="Import {0} appears after other imports that it should precede"/>
> 
> 
> 
> 
>value="Redundant import {0}."/>
> 
> 
> 
> 
> {code}
>  *  Make sure you have installed CheckStyle-IDEA plugin and activated for the 
> project.
>  * Scan the project module you want to refactor and fix all the issues one by 
> one.
>  





[GitHub] [incubator-hudi] bschell commented on issue #1040: [HUDI-327] Add null/empty checks to key generators

2019-11-25 Thread GitBox
bschell commented on issue #1040: [HUDI-327] Add null/empty checks to key 
generators
URL: https://github.com/apache/incubator-hudi/pull/1040#issuecomment-558409453
 
 
   oops, my mistake. Should be good now.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (HUDI-184) Integrate Hudi with Apache Flink

2019-11-25 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16981964#comment-16981964
 ] 

Vinoth Chandar commented on HUDI-184:
-

>> Yes, It seems the ingestion and compaction steps are independent of each 
>> other? We just let them exist in the same Spark job? If so, it's also not a 
>> problem in Flink.

Yes, they are independent, and compaction can run concurrently while ingestion is 
running. In Spark, we can run two Spark jobs (i.e. the Jobs tab you see in the 
Spark UI) in parallel within the same physical set of executors. Can Flink 
allow us to do this?
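For readers unfamiliar with the scheduling model being discussed, here is a plain-Java sketch (not Spark or Hudi code; the task bodies are hypothetical stand-ins) of two independent jobs, ingestion and compaction, submitted from separate threads onto one shared worker pool. This mirrors the pattern where two Spark jobs share one physical set of executors:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ConcurrentJobsSketch {
    public static void main(String[] args) throws Exception {
        // Shared pool, analogous to one physical set of executors.
        ExecutorService pool = Executors.newFixedThreadPool(2);
        // Hypothetical stand-ins for the ingestion and compaction jobs.
        Future<String> ingestion = pool.submit(() -> "ingestion done");
        Future<String> compaction = pool.submit(() -> "compaction done");
        // Both tasks run concurrently; neither blocks the other.
        System.out.println(ingestion.get());
        System.out.println(compaction.get());
        pool.shutdown();
    }
}
```

Whether Flink can host two such concurrently running "jobs" inside one deployment is exactly the question raised above.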

> Integrate Hudi with Apache Flink
> 
>
> Key: HUDI-184
> URL: https://issues.apache.org/jira/browse/HUDI-184
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: Write Client
>Reporter: vinoyang
>Assignee: vinoyang
>Priority: Major
>
> Apache Flink is a popular streaming processing engine.
> Integrating Hudi with Flink is a valuable work.
> The discussion mailing thread is here: 
> [https://lists.apache.org/api/source.lua/1533de2d4cd4243fa9e8f8bf057ffd02f2ac0bec7c7539d8f72166ea@%3Cdev.hudi.apache.org%3E]





[GitHub] [incubator-hudi] leesf commented on issue #1039: [HUDI-340]: made max events to read from kafka source configurable

2019-11-25 Thread GitBox
leesf commented on issue #1039: [HUDI-340]: made max events to read from kafka 
source configurable
URL: https://github.com/apache/incubator-hudi/pull/1039#issuecomment-558364384
 
 
   @pratyakshsharma Thanks for your work. It's almost ready.




[GitHub] [incubator-hudi] leesf commented on a change in pull request #1039: [HUDI-340]: made max events to read from kafka source configurable

2019-11-25 Thread GitBox
leesf commented on a change in pull request #1039: [HUDI-340]: made max events 
to read from kafka source configurable
URL: https://github.com/apache/incubator-hudi/pull/1039#discussion_r350450697
 
 

 ##
 File path: 
hudi-utilities/src/test/java/org/apache/hudi/utilities/sources/TestKafkaSource.java
 ##
 @@ -131,6 +140,78 @@ public void testJsonKafkaSource() throws IOException {
 assertEquals(Option.empty(), fetch4AsRows.getBatch());
   }
 
+  @Test
+  public void testJsonKafkaSourceWithDefaultUpperCap() throws IOException {
+// topic setup.
+testUtils.createTopic(TEST_TOPIC_NAME, 2);
+HoodieTestDataGenerator dataGenerator = new HoodieTestDataGenerator();
+TypedProperties props = createPropsForJsonSource(Long.MAX_VALUE);
+
+Source jsonSource = new JsonKafkaSource(props, jsc, sparkSession, 
schemaProvider);
+SourceFormatAdapter kafkaSource = new SourceFormatAdapter(jsonSource);
+Config.DEFAULT_MAX_EVENTS_FROM_KAFKA_SOURCE = 500;
+
+/*
+1. Extract without any checkpoint => get all the data, respecting default 
upper cap since both sourceLimit and
+maxEventsFromKafkaSourceProp are set to Long.MAX_VALUE
+ */
+testUtils.sendMessages(TEST_TOPIC_NAME, 
Helpers.jsonifyRecords(dataGenerator.generateInserts("000",1000)));
 
 Review comment:
   `dataGenerator.generateInserts("000",1000)` -> 
`dataGenerator.generateInserts("000", 1000)` add a blank.




[jira] [Closed] (HUDI-362) Adds a check for the existence of field

2019-11-25 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-362.
--
Fix Version/s: 0.5.1
   Resolution: Fixed

Fixed via master: 44823041a37601fed8163502272a8fcb7a5be45d

> Adds a check for the existence of field
> ---
>
> Key: HUDI-362
> URL: https://issues.apache.org/jira/browse/HUDI-362
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: CLI
>Reporter: hong dongdong
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
> Attachments: image-2019-11-25-15-32-14-057.png, 
> image-2019-11-25-15-33-21-610.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Use command
> {code:java}
> commits show --sortBy "Total Bytes Written" --desc true --limit 10{code}
> when the sortBy field is not in the columns, it throws 
> !image-2019-11-25-15-32-14-057.png!
> It is better to give a friendly hint as: !image-2019-11-25-15-33-21-610.png!





[jira] [Closed] (HUDI-358) Add Java-doc and importOrder checkstyle rule

2019-11-25 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-358.
--
Fix Version/s: 0.5.1
   Resolution: Fixed

Fixed via master: 212282c8aaf623f451e3f72674ed4d3ed550602d

> Add Java-doc and importOrder checkstyle rule
> 
>
> Key: HUDI-358
> URL: https://issues.apache.org/jira/browse/HUDI-358
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Common Core
>Reporter: lamber-ken
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> 1. Add Java-doc and importOrder checkstyle rules.
> 2. Keep severity at info level until the issue is finished.





[jira] [Closed] (HUDI-359) Add hudi-env for hudi-cli module

2019-11-25 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-359.
--
Fix Version/s: 0.5.1
   Resolution: Fixed

Fixed via master: a7e07cd910425b5cfe9886677e780bfb2ae96c52

> Add hudi-env for hudi-cli module
> 
>
> Key: HUDI-359
> URL: https://issues.apache.org/jira/browse/HUDI-359
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: CLI
>Reporter: hong dongdong
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Add hudi-env.sh for hudi-cli module to set running environments.





[GitHub] [incubator-hudi] vinothchandar merged pull request #1042: [HUDI-359] Add hudi-env for hudi-cli module

2019-11-25 Thread GitBox
vinothchandar merged pull request #1042: [HUDI-359] Add hudi-env for hudi-cli 
module
URL: https://github.com/apache/incubator-hudi/pull/1042
 
 
   




[incubator-hudi] branch master updated: [HUDI-359] Add hudi-env for hudi-cli module (#1042)

2019-11-25 Thread vinoth
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new a7e07cd  [HUDI-359] Add hudi-env for hudi-cli module (#1042)
a7e07cd is described below

commit a7e07cd910425b5cfe9886677e780bfb2ae96c52
Author: hongdd 
AuthorDate: Tue Nov 26 05:25:42 2019 +0800

[HUDI-359] Add hudi-env for hudi-cli module (#1042)
---
 hudi-cli/{hudi-cli.sh => conf/hudi-env.sh} | 18 --
 hudi-cli/hudi-cli.sh   | 14 +-
 2 files changed, 9 insertions(+), 23 deletions(-)

diff --git a/hudi-cli/hudi-cli.sh b/hudi-cli/conf/hudi-env.sh
old mode 100755
new mode 100644
similarity index 61%
copy from hudi-cli/hudi-cli.sh
copy to hudi-cli/conf/hudi-env.sh
index a0b242a..0499502
--- a/hudi-cli/hudi-cli.sh
+++ b/hudi-cli/conf/hudi-env.sh
@@ -18,17 +18,7 @@
 # limitations under the License.
 

 
-DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
-HOODIE_JAR=`ls $DIR/target/hudi-cli-*.jar | grep -v source | grep -v javadoc`
-if [ -z "$HADOOP_CONF_DIR" ]; then
-  echo "setting hadoop conf dir"
-  HADOOP_CONF_DIR="/etc/hadoop/conf"
-fi
-if [ -z "$SPARK_CONF_DIR" ]; then
-  echo "setting spark conf dir"
-  SPARK_CONF_DIR="/etc/spark/conf"
-fi
-if [ -z "$CLIENT_JAR" ]; then
-  echo "client jar location not set"
-fi
-java -cp 
${HADOOP_CONF_DIR}:${SPARK_CONF_DIR}:$DIR/target/lib/*:$HOODIE_JAR:${CLIENT_JAR}
 -DSPARK_CONF_DIR=${SPARK_CONF_DIR} -DHADOOP_CONF_DIR=${HADOOP_CONF_DIR} 
org.springframework.shell.Bootstrap $@
+# Set the necessary environment variables
+export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/etc/hadoop/conf"}
+export SPARK_CONF_DIR=${SPARK_CONF_DIR:-"/etc/spark/conf"}
+export CLIENT_JAR=${CLIENT_JAR}
diff --git a/hudi-cli/hudi-cli.sh b/hudi-cli/hudi-cli.sh
index a0b242a..3ab0096 100755
--- a/hudi-cli/hudi-cli.sh
+++ b/hudi-cli/hudi-cli.sh
@@ -20,15 +20,11 @@
 
 DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
 HOODIE_JAR=`ls $DIR/target/hudi-cli-*.jar | grep -v source | grep -v javadoc`
-if [ -z "$HADOOP_CONF_DIR" ]; then
-  echo "setting hadoop conf dir"
-  HADOOP_CONF_DIR="/etc/hadoop/conf"
-fi
-if [ -z "$SPARK_CONF_DIR" ]; then
-  echo "setting spark conf dir"
-  SPARK_CONF_DIR="/etc/spark/conf"
-fi
+
+. "${DIR}"/conf/hudi-env.sh
+
 if [ -z "$CLIENT_JAR" ]; then
-  echo "client jar location not set"
+  echo "Client jar location not set, please set it in conf/hudi-env.sh"
 fi
+
 java -cp 
${HADOOP_CONF_DIR}:${SPARK_CONF_DIR}:$DIR/target/lib/*:$HOODIE_JAR:${CLIENT_JAR}
 -DSPARK_CONF_DIR=${SPARK_CONF_DIR} -DHADOOP_CONF_DIR=${HADOOP_CONF_DIR} 
org.springframework.shell.Bootstrap $@



[GitHub] [incubator-hudi] vinothchandar commented on issue #1040: [HUDI-327] Add null/empty checks to key generators

2019-11-25 Thread GitBox
vinothchandar commented on issue #1040: [HUDI-327] Add null/empty checks to key 
generators
URL: https://github.com/apache/incubator-hudi/pull/1040#issuecomment-558341088
 
 
   I think there is an error:
   
   ```
   [ERROR] 
/home/travis/build/apache/incubator-hudi/hudi-spark/src/test/scala/TestDataSourceDefaults.scala:22:
 error: object EmptyHoodieRecordPayload is not a member of package 
org.apache.hudi
   2690[ERROR] import org.apache.hudi.{ComplexKeyGenerator, 
DataSourceWriteOptions, EmptyHoodieRecordPayload, 
OverwriteWithLatestAvroPayload, SimpleKeyGenerator}
   2691[ERROR]^
   2692[ERROR] one error found
   ```




[GitHub] [incubator-hudi] bschell commented on issue #1040: [HUDI-327] Add null/empty checks to key generators

2019-11-25 Thread GitBox
bschell commented on issue #1040: [HUDI-327] Add null/empty checks to key 
generators
URL: https://github.com/apache/incubator-hudi/pull/1040#issuecomment-558313317
 
 
   done




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1001: [HUDI-325] Fix Hive partition error for updated HDFS Hudi table

2019-11-25 Thread GitBox
vinothchandar commented on a change in pull request #1001: [HUDI-325] Fix Hive 
partition error for updated HDFS Hudi table
URL: https://github.com/apache/incubator-hudi/pull/1001#discussion_r350386205
 
 

 ##
 File path: hudi-hive/src/main/java/org/apache/hudi/hive/HoodieHiveClient.java
 ##
 @@ -192,7 +192,9 @@ private String getPartitionClause(String partition) {
 String alterTable = "ALTER TABLE " + syncConfig.tableName;
 for (String partition : partitions) {
   String partitionClause = getPartitionClause(partition);
-  String fullPartitionPath = FSUtils.getPartitionPath(syncConfig.basePath, 
partition).toString();
+  Path partitionPath = FSUtils.getPartitionPath(syncConfig.basePath, 
partition);
+  String fullPartitionPath = 
!partitionPath.toUri().getScheme().equals("hdfs") ? partitionPath.toString()
 
 Review comment:
   small nit: can we use the `StorageSchemes.HDFS` enum? this way we can track 
this special handling over time 
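A minimal illustration of the scheme check under discussion, using only `java.net.URI` (the `StorageSchemes` enum and `FSUtils` are Hudi classes not reproduced here; `isHdfs` is a hypothetical helper, not the actual patch):

```java
import java.net.URI;

public class SchemeCheckSketch {
    // Hypothetical helper: only paths with the "hdfs" scheme get the
    // special full-partition-path handling.
    static boolean isHdfs(String path) {
        return "hdfs".equals(URI.create(path).getScheme());
    }

    public static void main(String[] args) {
        System.out.println(isHdfs("hdfs://namenode:8020/tbl/dt=2019-11-25"));
        System.out.println(isHdfs("s3a://bucket/tbl/dt=2019-11-25"));
    }
}
```

Routing the literal `"hdfs"` through an enum such as `StorageSchemes.HDFS`, as suggested, keeps this special case discoverable in one place.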




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1001: [HUDI-325] Fix Hive partition error for updated HDFS Hudi table

2019-11-25 Thread GitBox
vinothchandar commented on a change in pull request #1001: [HUDI-325] Fix Hive 
partition error for updated HDFS Hudi table
URL: https://github.com/apache/incubator-hudi/pull/1001#discussion_r350386470
 
 

 ##
 File path: hudi-common/src/main/java/org/apache/hudi/common/util/FSUtils.java
 ##
 @@ -532,6 +532,13 @@ public static Path getPartitionPath(Path basePath, String 
partitionPath) {
 return ((partitionPath == null) || (partitionPath.isEmpty())) ? basePath : 
new Path(basePath, partitionPath);
   }
 
+  /**
+   * Get HDFS full partition path (e.g. hdfs://ip-address:8020:/)
+   */
+  public static String getHDFSFullPartitionPath(FileSystem fs, Path 
partitionPath) {
 
 Review comment:
   nit : rename to `getDFSFullPartitionPath()` since this is agnostic of hdfs 
itself.




[jira] [Commented] (HUDI-76) CSV Source support for Hudi Delta Streamer

2019-11-25 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-76?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16981839#comment-16981839
 ] 

Vinoth Chandar commented on HUDI-76:


> , can we assume that the CSV files sit on DFS so we only consider fetching 
> the files on DFS based on the last modified time?

I think yes. 

 

Clean up what you find along the way, sure :) . 
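The fetch-by-last-modified-time idea can be sketched against the local filesystem (a hedged analogy only; a real DFS-backed CSV source would use the Hadoop `FileSystem` API, and the class and method names here are hypothetical):

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class NewCsvFilesSketch {
    // Hypothetical sketch: return names of files modified after a checkpoint,
    // the same filter a DFS-backed CSV source would apply on each fetch.
    static List<String> newFilesSince(File dir, long checkpointMillis) {
        List<String> names = new ArrayList<>();
        File[] files = dir.listFiles();
        if (files != null) {
            for (File f : files) {
                if (f.isFile() && f.lastModified() > checkpointMillis) {
                    names.add(f.getName());
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        File dir = new File(System.getProperty("java.io.tmpdir"), "csv-source-demo");
        dir.mkdirs();
        new File(dir, "batch1.csv").createNewFile();
        // A checkpoint of 0 picks up every existing file.
        System.out.println(newFilesSince(dir, 0L).contains("batch1.csv"));
    }
}
```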

> CSV Source support for Hudi Delta Streamer
> --
>
> Key: HUDI-76
> URL: https://issues.apache.org/jira/browse/HUDI-76
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: deltastreamer, Incremental Pull
>Reporter: Balaji Varadarajan
>Assignee: Ethan Guo
>Priority: Minor
>
> DeltaStreamer does not have support to pull CSV data from sources (hdfs log 
> files/kafka). THis ticket is to provide support for csv sources.





[GitHub] [incubator-hudi] vinothchandar commented on issue #1043: [HUDI-358] Add Java-doc and importOrder checkstyle rule

2019-11-25 Thread GitBox
vinothchandar commented on issue #1043: [HUDI-358] Add Java-doc and importOrder 
checkstyle rule
URL: https://github.com/apache/incubator-hudi/pull/1043#issuecomment-558307683
 
 
   Looks like a good rule to have. Just trying to understand the implications. 
Merging. 




[incubator-hudi] branch master updated: [HUDI-358] Add Java-doc and importOrder checkstyle rule (#1043)

2019-11-25 Thread vinoth
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 212282c  [HUDI-358] Add Java-doc and importOrder checkstyle rule 
(#1043)
212282c is described below

commit 212282c8aaf623f451e3f72674ed4d3ed550602d
Author: 谢磊 
AuthorDate: Tue Nov 26 03:36:23 2019 +0800

[HUDI-358] Add Java-doc and importOrder checkstyle rule (#1043)

- import groups are separated by one blank line
- org.apache.hudi.* at the top location
---
 style/checkstyle.xml | 31 ++-
 1 file changed, 26 insertions(+), 5 deletions(-)

diff --git a/style/checkstyle.xml b/style/checkstyle.xml
index 7eab7b4..91f51c3 100644
--- a/style/checkstyle.xml
+++ b/style/checkstyle.xml
@@ -206,11 +206,6 @@
 
 
 -->
-
-
-
-
-
 
 
 
@@ -274,5 +269,31 @@
 
 
 
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
 
 



[GitHub] [incubator-hudi] vinothchandar merged pull request #1043: [HUDI-358] Add Java-doc and importOrder checkstyle rule

2019-11-25 Thread GitBox
vinothchandar merged pull request #1043: [HUDI-358] Add Java-doc and 
importOrder checkstyle rule
URL: https://github.com/apache/incubator-hudi/pull/1043
 
 
   




[GitHub] [incubator-hudi] vinothchandar commented on issue #1043: [HUDI-358] Add Java-doc and importOrder checkstyle rule

2019-11-25 Thread GitBox
vinothchandar commented on issue #1043: [HUDI-358] Add Java-doc and importOrder 
checkstyle rule
URL: https://github.com/apache/incubator-hudi/pull/1043#issuecomment-558307083
 
 
   @leesf @lamber-ken how does this apply to all the existing formatting 
already in the code? Do people hit this as they keep changing files?
   
   




[GitHub] [incubator-hudi] vinothchandar merged pull request #1047: [HUDI-362] Adds a check for the existence of field

2019-11-25 Thread GitBox
vinothchandar merged pull request #1047: [HUDI-362] Adds a check for the 
existence of field
URL: https://github.com/apache/incubator-hudi/pull/1047
 
 
   




[incubator-hudi] branch master updated: [HUDI-362] Adds a check for the existence of field (#1047)

2019-11-25 Thread vinoth
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 4482304  [HUDI-362] Adds a check for the existence of field (#1047)
4482304 is described below

commit 44823041a37601fed8163502272a8fcb7a5be45d
Author: hongdd 
AuthorDate: Tue Nov 26 03:31:07 2019 +0800

[HUDI-362] Adds a check for the existence of field (#1047)
---
 hudi-cli/src/main/java/org/apache/hudi/cli/HoodiePrintHelper.java | 4 
 hudi-cli/src/main/java/org/apache/hudi/cli/TableHeader.java   | 4 
 2 files changed, 8 insertions(+)

diff --git a/hudi-cli/src/main/java/org/apache/hudi/cli/HoodiePrintHelper.java 
b/hudi-cli/src/main/java/org/apache/hudi/cli/HoodiePrintHelper.java
index 635097f..3cce301 100644
--- a/hudi-cli/src/main/java/org/apache/hudi/cli/HoodiePrintHelper.java
+++ b/hudi-cli/src/main/java/org/apache/hudi/cli/HoodiePrintHelper.java
@@ -59,6 +59,10 @@ public class HoodiePrintHelper {
   return HoodiePrintHelper.print(rowHeader);
 }
 
+if (!sortByField.isEmpty() && !rowHeader.containsField(sortByField)) {
+  return String.format("Field[%s] is not in table, given columns[%s]", 
sortByField, rowHeader.getFieldNames());
+}
+
 Table table =
 new Table(rowHeader, fieldNameToConverterMap, 
Option.ofNullable(sortByField.isEmpty() ? null : sortByField),
 Option.ofNullable(isDescending), Option.ofNullable(limit <= 0 ? 
null : limit)).addAllRows(rows).flip();
diff --git a/hudi-cli/src/main/java/org/apache/hudi/cli/TableHeader.java 
b/hudi-cli/src/main/java/org/apache/hudi/cli/TableHeader.java
index 0472b0f..e257e36 100644
--- a/hudi-cli/src/main/java/org/apache/hudi/cli/TableHeader.java
+++ b/hudi-cli/src/main/java/org/apache/hudi/cli/TableHeader.java
@@ -68,4 +68,8 @@ public class TableHeader {
   public int getNumFields() {
 return fieldNames.size();
   }
+
+  public boolean containsField(String fieldName) {
+return fieldNames.contains(fieldName);
+  }
 }
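The merged check can be exercised in isolation; a small plain-Java sketch mirroring the message format from `HoodiePrintHelper` above (the column names are illustrative, not Hudi's actual CLI headers):

```java
import java.util.Arrays;
import java.util.List;

public class SortFieldGuardSketch {
    public static void main(String[] args) {
        // Illustrative table columns; not the real CLI headers.
        List<String> fieldNames = Arrays.asList("CommitTime", "Total Bytes Written");
        String sortByField = "Total Bytes";  // not a real column
        if (!sortByField.isEmpty() && !fieldNames.contains(sortByField)) {
            // Friendly hint instead of an exception, as in the merged patch.
            System.out.println(String.format(
                "Field[%s] is not in table, given columns[%s]", sortByField, fieldNames));
        }
    }
}
```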



[GitHub] [incubator-hudi] vinothchandar commented on issue #1047: [HUDI-362] Adds a check for the existence of field

2019-11-25 Thread GitBox
vinothchandar commented on issue #1047: [HUDI-362] Adds a check for the 
existence of field
URL: https://github.com/apache/incubator-hudi/pull/1047#issuecomment-558305590
 
 
   Thanks @hddong !




[GitHub] [incubator-hudi] vinothchandar commented on issue #1021: how can i deal this problem when partition's value changed with the same row_key?

2019-11-25 Thread GitBox
vinothchandar commented on issue #1021: how can i deal this problem when 
partition's value changed with the same row_key? 
URL: https://github.com/apache/incubator-hudi/issues/1021#issuecomment-558302481
 
 
   @nsivabalan Could you debug why the second write is not getting tagged to 
the first partition? That's the crux of it.




[GitHub] [incubator-hudi] leesf commented on a change in pull request #1045: [HUDI-209] Implement JMX metrics reporter

2019-11-25 Thread GitBox
leesf commented on a change in pull request #1045: [HUDI-209] Implement JMX 
metrics reporter
URL: https://github.com/apache/incubator-hudi/pull/1045#discussion_r350199901
 
 

 ##
 File path: 
hudi-client/src/main/java/org/apache/hudi/metrics/JmxMetricsReporter.java
 ##
 @@ -0,0 +1,87 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.metrics;
+
+import java.io.Closeable;
+
+import java.lang.management.ManagementFactory;
+import java.rmi.registry.LocateRegistry;
+import javax.management.remote.JMXConnectorServer;
+import javax.management.remote.JMXConnectorServerFactory;
+import javax.management.remote.JMXServiceURL;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieException;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+/**
+ * Implementation of JMX reporter, which is used to report JMX metrics.
+ */
+public class JmxMetricsReporter extends MetricsReporter {
+
+  private static Logger logger = 
LogManager.getLogger(JmxMetricsReporter.class);
+  private final JMXConnectorServer connector;
+  private String host;
+  private int port;
+
+  public JmxMetricsReporter(HoodieWriteConfig config) {
+try {
+  // Check the host and port here
+  this.host = config.getJmxHost();
+  this.port = config.getJmxPort();
+  if (host == null || port == 0) {
+throw new RuntimeException(
+String.format("Jmx cannot be initialized with host[%s] and 
port[%s].",
+host, port));
+  }
+  LocateRegistry.createRegistry(port);
+  String serviceUrl =
+  "service:jmx:rmi://" + host + ":" + port + "/jndi/rmi://" + host + 
":" + port + "/jmxrmi";
+  JMXServiceURL url = new JMXServiceURL(serviceUrl);
+  this.connector = JMXConnectorServerFactory
+  .newJMXConnectorServer(url, null, 
ManagementFactory.getPlatformMBeanServer());
+} catch (Exception e) {
+  String msg = "Jmx initialize failed: ";
+  logger.error(msg, e);
+  throw new HoodieException(msg, e);
+}
+  }
+
+  @Override
+  public void start() {
+try {
+  if (connector != null) {
+connector.start();
+  } else {
+logger.error("Cannot start as the jmxReporter is null.");
+  }
+} catch (Exception e) {
+  throw new HoodieException(e);
+}
 
 Review comment:
   ```java
    if (connector != null) {
      try {
        connector.start();
      } catch (IOException e) {
        throw new HoodieIOException(e);
      }
    } else {
      logger.error("Cannot start as the jmxReporter is null.");
    }
   ```
   
   Looks better, WDYT?




[GitHub] [incubator-hudi] leesf commented on a change in pull request #1045: [HUDI-209] Implement JMX metrics reporter

2019-11-25 Thread GitBox
leesf commented on a change in pull request #1045: [HUDI-209] Implement JMX 
metrics reporter
URL: https://github.com/apache/incubator-hudi/pull/1045#discussion_r350198461
 
 

 ##
 File path: 
hudi-client/src/main/java/org/apache/hudi/metrics/MetricsReporter.java
 ##
 @@ -19,6 +19,7 @@
 package org.apache.hudi.metrics;
 
 import java.io.Closeable;
+import java.io.IOException;
 
 Review comment:
   Would you please remove `import java.io.IOException;` too?




[GitHub] [incubator-hudi] leesf commented on a change in pull request #1045: [HUDI-209] Implement JMX metrics reporter

2019-11-25 Thread GitBox
leesf commented on a change in pull request #1045: [HUDI-209] Implement JMX 
metrics reporter
URL: https://github.com/apache/incubator-hudi/pull/1045#discussion_r350198077
 
 

 ##
 File path: 
hudi-client/src/main/java/org/apache/hudi/metrics/MetricsReporter.java
 ##
 @@ -28,7 +29,7 @@
   /**
* Push out metrics at scheduled intervals
*/
-  public abstract void start();
+  public abstract void start() throws IOException;
 
 Review comment:
   Would you please remove the `throws IOException` too?




[jira] [Commented] (HUDI-118) Provide CLI Option for passing properties to Compactor, Cleaner and ParquetImporter

2019-11-25 Thread Pratyaksh Sharma (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16981557#comment-16981557
 ] 

Pratyaksh Sharma commented on HUDI-118:
---

[~yanghua] I guess it is already done. The option is already present in all the 
required classes. Is there anything else to be done or can we close this 
ticket? 

> Provide CLI Option for passing properties to Compactor, Cleaner and 
> ParquetImporter
> ---
>
> Key: HUDI-118
> URL: https://issues.apache.org/jira/browse/HUDI-118
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: CLI, newbie
>Reporter: Balaji Varadarajan
>Assignee: Pratyaksh Sharma
>Priority: Minor
>
> Compaction (schedule/compact), Cleaner and HDFSParquetImporter command does 
> not have option to pass DFS properties file. This is a followup to PR 
> https://github.com/apache/incubator-hudi/pull/691





[GitHub] [incubator-hudi] leesf commented on issue #1040: [HUDI-327] Add null/empty checks to key generators

2019-11-25 Thread GitBox
leesf commented on issue #1040: [HUDI-327] Add null/empty checks to key 
generators
URL: https://github.com/apache/incubator-hudi/pull/1040#issuecomment-558118734
 
 
   Hi @bschell Would you please rebase against master to solve the conflicts.




[GitHub] [incubator-hudi] leesf commented on issue #1044: [HUDI-361] Implement CSV metrics reporter

2019-11-25 Thread GitBox
leesf commented on issue #1044: [HUDI-361] Implement CSV metrics reporter
URL: https://github.com/apache/incubator-hudi/pull/1044#issuecomment-558114628
 
 
   Hi @XuQianJin-Stars I am also wondering whether this kind of reporter is 
easy and common to use?




[GitHub] [incubator-hudi] leesf commented on a change in pull request #1045: [HUDI-209] Implement JMX metrics reporter

2019-11-25 Thread GitBox
leesf commented on a change in pull request #1045: [HUDI-209] Implement JMX 
metrics reporter
URL: https://github.com/apache/incubator-hudi/pull/1045#discussion_r350127951
 
 

 ##
 File path: 
hudi-client/src/main/java/org/apache/hudi/metrics/MetricsReporter.java
 ##
 @@ -28,7 +29,7 @@
   /**
* Push out metrics at scheduled intervals
*/
-  public abstract void start();
+  public abstract void start() throws IOException;
 
 Review comment:
   This would get removed as pointed out above.




[GitHub] [incubator-hudi] leesf commented on a change in pull request #1045: [HUDI-209] Implement JMX metrics reporter

2019-11-25 Thread GitBox
leesf commented on a change in pull request #1045: [HUDI-209] Implement JMX 
metrics reporter
URL: https://github.com/apache/incubator-hudi/pull/1045#discussion_r350128094
 
 

 ##
 File path: 
hudi-client/src/main/java/org/apache/hudi/metrics/MetricsReporter.java
 ##
 @@ -19,6 +19,7 @@
 package org.apache.hudi.metrics;
 
 import java.io.Closeable;
+import java.io.IOException;
 
 Review comment:
   This would get removed as well.




[GitHub] [incubator-hudi] leesf commented on a change in pull request #1045: [HUDI-209] Implement JMX metrics reporter

2019-11-25 Thread GitBox
leesf commented on a change in pull request #1045: [HUDI-209] Implement JMX 
metrics reporter
URL: https://github.com/apache/incubator-hudi/pull/1045#discussion_r350127640
 
 

 ##
 File path: 
hudi-client/src/main/java/org/apache/hudi/metrics/JmxMetricsReporter.java
 ##
 @@ -0,0 +1,85 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.metrics;
+
+import java.io.Closeable;
+
+import java.io.IOException;
+import java.lang.management.ManagementFactory;
+import java.rmi.registry.LocateRegistry;
+import javax.management.remote.JMXConnectorServer;
+import javax.management.remote.JMXConnectorServerFactory;
+import javax.management.remote.JMXServiceURL;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieException;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+/**
+ * Implementation of a JMX reporter, which is used to report metrics over JMX.
+ */
+public class JmxMetricsReporter extends MetricsReporter {
+
+  private static Logger logger = LogManager.getLogger(JmxMetricsReporter.class);
+  private final JMXConnectorServer connector;
+  private String host;
+  private int port;
+
+  public JmxMetricsReporter(HoodieWriteConfig config) {
+    try {
+      // Check the host and port here
+      this.host = config.getJmxHost();
+      this.port = config.getJmxPort();
+      if (host == null || port == 0) {
+        throw new RuntimeException(
+            String.format("Jmx cannot be initialized with host[%s] and port[%s].", host, port));
+      }
+      LocateRegistry.createRegistry(port);
+      String serviceUrl =
+          "service:jmx:rmi://" + host + ":" + port + "/jndi/rmi://" + host + ":" + port + "/jmxrmi";
+      JMXServiceURL url = new JMXServiceURL(serviceUrl);
+      this.connector = JMXConnectorServerFactory
+          .newJMXConnectorServer(url, null, ManagementFactory.getPlatformMBeanServer());
+    } catch (Exception e) {
+      String msg = "Jmx initialize failed: ";
+      logger.error(msg, e);
+      throw new HoodieException(msg, e);
+    }
+  }
+
+  @Override
+  public void start() throws IOException {
+    if (connector != null) {
+      connector.start();
+    } else {
+      logger.error("Cannot start as the jmxReporter is null.");
+    }
+  }
+
+  @Override
+  public void report() {
+
+  }
 
 Review comment:
   Better to change this to `public void report() {}`? There is no need to add 
an empty line.




[GitHub] [incubator-hudi] leesf commented on a change in pull request #1045: [HUDI-209] Implement JMX metrics reporter

2019-11-25 Thread GitBox
leesf commented on a change in pull request #1045: [HUDI-209] Implement JMX 
metrics reporter
URL: https://github.com/apache/incubator-hudi/pull/1045#discussion_r350126884
 
 

 ##
 File path: 
hudi-client/src/main/java/org/apache/hudi/metrics/JmxMetricsReporter.java
 ##
 @@ -0,0 +1,85 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.metrics;
+
+import java.io.Closeable;
+
+import java.io.IOException;
+import java.lang.management.ManagementFactory;
+import java.rmi.registry.LocateRegistry;
+import javax.management.remote.JMXConnectorServer;
+import javax.management.remote.JMXConnectorServerFactory;
+import javax.management.remote.JMXServiceURL;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieException;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+/**
+ * Implementation of a JMX reporter, which is used to report metrics over JMX.
+ */
+public class JmxMetricsReporter extends MetricsReporter {
+
+  private static Logger logger = LogManager.getLogger(JmxMetricsReporter.class);
+  private final JMXConnectorServer connector;
+  private String host;
+  private int port;
+
+  public JmxMetricsReporter(HoodieWriteConfig config) {
+    try {
+      // Check the host and port here
+      this.host = config.getJmxHost();
+      this.port = config.getJmxPort();
+      if (host == null || port == 0) {
+        throw new RuntimeException(
+            String.format("Jmx cannot be initialized with host[%s] and port[%s].", host, port));
+      }
+      LocateRegistry.createRegistry(port);
+      String serviceUrl =
+          "service:jmx:rmi://" + host + ":" + port + "/jndi/rmi://" + host + ":" + port + "/jmxrmi";
+      JMXServiceURL url = new JMXServiceURL(serviceUrl);
+      this.connector = JMXConnectorServerFactory
+          .newJMXConnectorServer(url, null, ManagementFactory.getPlatformMBeanServer());
+    } catch (Exception e) {
+      String msg = "Jmx initialize failed: ";
+      logger.error(msg, e);
+      throw new HoodieException(msg, e);
+    }
+  }
+
+  @Override
+  public void start() throws IOException {
 
 Review comment:
   How about catching the exception from `connector.start` and throwing 
`HoodieIOException` instead? `HoodieIOException` is more common in the project, 
and then we would not need to modify `MetricsReporter#start`.
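
   The suggestion could be sketched roughly as follows. The types below are 
simplified stand-ins for illustration, not Hudi's real classes; whether the 
final patch wraps the exception exactly this way is an assumption:

```java
import java.io.IOException;

// Sketch of the reviewer's suggestion: wrap the checked IOException from
// connector.start() in an unchecked HoodieIOException, so the abstract
// MetricsReporter#start() signature does not need `throws IOException`.
public class JmxStartSketch {

  // Stand-in for org.apache.hudi.exception.HoodieIOException.
  public static class HoodieIOException extends RuntimeException {
    public HoodieIOException(String msg, IOException cause) {
      super(msg, cause);
    }
  }

  // Stand-in for the JMXConnectorServer used in the quoted diff.
  public interface Connector {
    void start() throws IOException;
  }

  private final Connector connector;

  public JmxStartSketch(Connector connector) {
    this.connector = connector;
  }

  // The signature stays free of checked exceptions; callers see an
  // unchecked HoodieIOException instead.
  public void start() {
    try {
      connector.start();
    } catch (IOException e) {
      throw new HoodieIOException("Jmx connector failed to start", e);
    }
  }

  public static void main(String[] args) {
    new JmxStartSketch(() -> System.out.println("connector started")).start();
  }
}
```

   This keeps the abstract method's contract unchanged for the other reporter 
implementations while still surfacing the failure.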




[incubator-hudi] branch master updated (c335510 -> 845a050)

2019-11-25 Thread leesf
This is an automated email from the ASF dual-hosted git repository.

leesf pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git.


from c335510  [HUDI-328] Adding delete api to HoodieWriteClient (#1004)
 add 845a050  [MINOR] Some minor optimizations in HoodieJavaStreamingApp (#1046)

No new revisions were added by this update.

Summary of changes:
 .../src/test/java/HoodieJavaStreamingApp.java  | 27 +-
 1 file changed, 11 insertions(+), 16 deletions(-)



[GitHub] [incubator-hudi] leesf merged pull request #1046: Use commitimestamp2 instead of commitimestamp1 in HoodieJavaStreamingApp

2019-11-25 Thread GitBox
leesf merged pull request #1046: Use commitimestamp2 instead of  
commitimestamp1 in HoodieJavaStreamingApp
URL: https://github.com/apache/incubator-hudi/pull/1046
 
 
   




[GitHub] [incubator-hudi] hddong commented on a change in pull request #1042: [HUDI-359] Add hudi-env for hudi-cli module

2019-11-25 Thread GitBox
hddong commented on a change in pull request #1042: [HUDI-359] Add hudi-env for 
hudi-cli module
URL: https://github.com/apache/incubator-hudi/pull/1042#discussion_r350072902
 
 

 ##
 File path: hudi-cli/hudi-cli.sh
 ##
 @@ -20,15 +20,12 @@
 
 DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
 HOODIE_JAR=`ls $DIR/target/hudi-cli-*.jar | grep -v source | grep -v javadoc`
-if [ -z "$HADOOP_CONF_DIR" ]; then
-  echo "setting hadoop conf dir"
-  HADOOP_CONF_DIR="/etc/hadoop/conf"
-fi
-if [ -z "$SPARK_CONF_DIR" ]; then
-  echo "setting spark conf dir"
-  SPARK_CONF_DIR="/etc/spark/conf"
-fi
+
+. "${DIR}"/conf/hudi-env.sh
+
 if [ -z "$CLIENT_JAR" ]; then
-  echo "client jar location not set"
+  echo "Client jar location not set, please set it in conf/hudi-env.sh"
+  exit
 
 Review comment:
   @vinothchandar Found that `exit` caused the checks to fail; I have deleted it.




[GitHub] [incubator-hudi] hddong commented on a change in pull request #1042: [HUDI-359] Add hudi-env for hudi-cli module

2019-11-25 Thread GitBox
hddong commented on a change in pull request #1042: [HUDI-359] Add hudi-env for 
hudi-cli module
URL: https://github.com/apache/incubator-hudi/pull/1042#discussion_r350072902
 
 

 ##
 File path: hudi-cli/hudi-cli.sh
 ##
 @@ -20,15 +20,12 @@
 
 DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
 HOODIE_JAR=`ls $DIR/target/hudi-cli-*.jar | grep -v source | grep -v javadoc`
-if [ -z "$HADOOP_CONF_DIR" ]; then
-  echo "setting hadoop conf dir"
-  HADOOP_CONF_DIR="/etc/hadoop/conf"
-fi
-if [ -z "$SPARK_CONF_DIR" ]; then
-  echo "setting spark conf dir"
-  SPARK_CONF_DIR="/etc/spark/conf"
-fi
+
+. "${DIR}"/conf/hudi-env.sh
+
 if [ -z "$CLIENT_JAR" ]; then
-  echo "client jar location not set"
+  echo "Client jar location not set, please set it in conf/hudi-env.sh"
+  exit
 
 Review comment:
   @vinothchandar Found that `exit` caused the checks to fail, so I had to delete it.




[GitHub] [incubator-hudi] hddong removed a comment on issue #1042: [HUDI-359] Add hudi-env for hudi-cli module

2019-11-25 Thread GitBox
hddong removed a comment on issue #1042: [HUDI-359] Add hudi-env for hudi-cli 
module
URL: https://github.com/apache/incubator-hudi/pull/1042#issuecomment-557976681
 
 
   @vinothchandar, the checks keep failing; do you have any suggestions or 
could you help?




[GitHub] [incubator-hudi] fbalicchia commented on issue #1046: Use commitimestamp2 instead of commitimestamp1 in HoodieJavaStreamingApp

2019-11-25 Thread GitBox
fbalicchia commented on issue #1046: Use commitimestamp2 instead of  
commitimestamp1 in HoodieJavaStreamingApp
URL: https://github.com/apache/incubator-hudi/pull/1046#issuecomment-558044299
 
 
   > Please remove `unused import - java.util.concurrent.Callable`; you can 
click the details below to see the error message.
   
   done




[GitHub] [incubator-hudi] leesf commented on a change in pull request #1039: [HUDI-340]: made max events to read from kafka source configurable

2019-11-25 Thread GitBox
leesf commented on a change in pull request #1039: [HUDI-340]: made max events 
to read from kafka source configurable
URL: https://github.com/apache/incubator-hudi/pull/1039#discussion_r350031504
 
 

 ##
 File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/SourceFormatAdapter.java
 ##
 @@ -126,4 +127,13 @@ public SourceFormatAdapter(Source source) {
      throw new IllegalArgumentException("Unknown source type (" + source.getSourceType() + ")");
    }
  }
+
+  /**
+   * This method is needed to set Config.DEFAULT_MAX_EVENTS_FROM_KAFKA_SOURCE to some lower value for test cases,
+   * gets called only from TestKafkaSource.java class
+   * @param maxEventsFromKafkaSource
+   */
+  public void setDefaultMaxEventsFromKafkaSource(long maxEventsFromKafkaSource) {
 
 Review comment:
   > @leesf regarding your first point, basically I wanted to show a scenario 
where sourceLimit and Config.MAX_EVENTS_FROM_KAFKA_SOURCE_PROP are set to 
Long.MAX_VALUE and the default value comes into play when the number of events 
in the kafka topic is greater than that default value (i.e. greater than 5M in 
the normal case). That would require consuming that many events from the kafka 
topic, hence I added the function to reset DEFAULT_MAX_EVENTS_FROM_KAFKA_SOURCE. 
IMHO, with a smaller number of events in the topic, this scenario is not 
depicted properly. WDYT?
   
   Make sense.
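
   The upper-cap behaviour described in the quoted reply can be sketched as 
follows. The method name and the 5M default below are illustrative assumptions, 
not Hudi's exact implementation:

```java
// Hedged sketch of the capping rule discussed in this thread: the batch size
// is bounded by both sourceLimit and the configured max-events property, and
// when both are left at Long.MAX_VALUE the default upper cap still applies.
public class KafkaCapSketch {

  // Assumed default upper cap (5M in the "normal case" mentioned above).
  static final long DEFAULT_MAX_EVENTS_FROM_KAFKA_SOURCE = 5_000_000L;

  // Effective number of events to read for one batch.
  public static long eventsToRead(long availableInTopic, long sourceLimit, long maxEventsProp) {
    long cap = Math.min(sourceLimit, maxEventsProp);
    if (cap == Long.MAX_VALUE) {
      // Neither limit was set explicitly: fall back to the default upper cap.
      cap = DEFAULT_MAX_EVENTS_FROM_KAFKA_SOURCE;
    }
    return Math.min(availableInTopic, cap);
  }

  public static void main(String[] args) {
    // 10M events available, both limits unset: only the default cap is read.
    System.out.println(eventsToRead(10_000_000L, Long.MAX_VALUE, Long.MAX_VALUE)); // prints 5000000
  }
}
```

   Lowering the default cap in the test (as the quoted test does with 500) 
makes this fallback observable without producing millions of records.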

