[jira] [Commented] (HUDI-554) Restructure code/packages to move more code back into hudi-writer-common

2020-02-09 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17033417#comment-17033417
 ] 

Vinoth Chandar commented on HUDI-554:
-

No.. I am going to just organize a bit better and remove Spark from a few more 
packages (if possible), before we do the splitting..

Makes sense? Taking care of the known, lower-hanging fruit.

> Restructure code/packages  to move more code back into hudi-writer-common
> -
>
> Key: HUDI-554
> URL: https://issues.apache.org/jira/browse/HUDI-554
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Code Cleanup
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-85) Improve bloom index speed using interval trees for range pruning

2020-02-09 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-85?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17033412#comment-17033412
 ] 

Vinoth Chandar commented on HUDI-85:


Don't recall...  :/ 

> Improve bloom index speed using interval trees for range pruning
> 
>
> Key: HUDI-85
> URL: https://issues.apache.org/jira/browse/HUDI-85
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Index, Performance
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> https://github.com/apache/incubator-hudi/pull/513
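
(Editorial aside, not part of the thread: the linked PR's idea is to prune candidate files by their stored (minKey, maxKey) record-key ranges before consulting any bloom filters. The sketch below is a hypothetical, much-simplified illustration of that pruning check — a plain list scan rather than the interval tree the issue title refers to; class and field names are invented for illustration.)

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of range pruning (NOT Hudi's actual code from PR #513).
// A real interval tree answers "which ranges contain this key" in O(log n + k);
// this compact version scans a list to keep the idea readable.
public class RangePruningSketch {

  static final class FileKeyRange {
    final String fileId;
    final String minRecordKey;
    final String maxRecordKey;

    FileKeyRange(String fileId, String minRecordKey, String maxRecordKey) {
      this.fileId = fileId;
      this.minRecordKey = minRecordKey;
      this.maxRecordKey = maxRecordKey;
    }
  }

  /** Keep only files whose [min, max] key range could contain the record key. */
  static List<String> candidateFiles(List<FileKeyRange> ranges, String recordKey) {
    List<String> candidates = new ArrayList<>();
    for (FileKeyRange range : ranges) {
      if (range.minRecordKey.compareTo(recordKey) <= 0
          && range.maxRecordKey.compareTo(recordKey) >= 0) {
        candidates.add(range.fileId); // only these files need a bloom filter lookup
      }
    }
    return candidates;
  }
}
```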



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] lamber-ken commented on a change in pull request #1317: [HUDI-605] Avoid calculating the size of schema redundantly

2020-02-09 Thread GitBox
lamber-ken commented on a change in pull request #1317: [HUDI-605] Avoid 
calculating the size of schema redundantly
URL: https://github.com/apache/incubator-hudi/pull/1317#discussion_r376899375
 
 

 ##
 File path: 
hudi-common/src/main/java/org/apache/hudi/common/util/collection/DiskBasedMap.java
 ##
 @@ -223,8 +223,8 @@ private R get(ValueMetadata entry) {
   private synchronized R put(T key, R value, boolean flush) {
 try {
   byte[] val = SerializationUtils.serialize(value);
-  Integer valueSize = val.length;
-  Long timestamp = System.currentTimeMillis();
+  int valueSize = val.length;
 
 Review comment:
   Done.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] lamber-ken commented on a change in pull request #1317: [HUDI-605] Avoid calculating the size of schema redundantly

2020-02-09 Thread GitBox
lamber-ken commented on a change in pull request #1317: [HUDI-605] Avoid 
calculating the size of schema redundantly
URL: https://github.com/apache/incubator-hudi/pull/1317#discussion_r376898473
 
 

 ##
 File path: 
hudi-common/src/main/java/org/apache/hudi/common/util/collection/DiskBasedMap.java
 ##
 @@ -223,8 +223,8 @@ private R get(ValueMetadata entry) {
   private synchronized R put(T key, R value, boolean flush) {
 try {
   byte[] val = SerializationUtils.serialize(value);
-  Integer valueSize = val.length;
-  Long timestamp = System.currentTimeMillis();
+  int valueSize = val.length;
 
 Review comment:
   Will revert it if this against contribution guide. Thanks


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] lamber-ken commented on a change in pull request #1317: [HUDI-605] Avoid calculating the size of schema redundantly

2020-02-09 Thread GitBox
lamber-ken commented on a change in pull request #1317: [HUDI-605] Avoid 
calculating the size of schema redundantly
URL: https://github.com/apache/incubator-hudi/pull/1317#discussion_r376897263
 
 

 ##
 File path: 
hudi-common/src/main/java/org/apache/hudi/common/util/collection/DiskBasedMap.java
 ##
 @@ -223,8 +223,8 @@ private R get(ValueMetadata entry) {
   private synchronized R put(T key, R value, boolean flush) {
 try {
   byte[] val = SerializationUtils.serialize(value);
-  Integer valueSize = val.length;
-  Long timestamp = System.currentTimeMillis();
+  int valueSize = val.length;
 
 Review comment:
   The type of `val.length` is the primitive type (int), and the type of 
`System.currentTimeMillis()` is the primitive type (long).
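
(Editorial aside, not part of the patch: assigning a primitive result to a boxed `Integer`/`Long` forces an autoboxing allocation on every call, while the primitive declaration does not. A minimal standalone illustration:)

```java
public class BoxingExample {
  public static void main(String[] args) {
    byte[] val = {1, 2, 3};

    Integer boxedSize = val.length;                   // autoboxes int -> Integer
    Long boxedTimestamp = System.currentTimeMillis(); // autoboxes long -> Long

    int size = val.length;                            // stays a primitive int
    long timestamp = System.currentTimeMillis();      // stays a primitive long

    System.out.println(boxedSize + " " + boxedTimestamp + " " + size + " " + timestamp);
  }
}
```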


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1106: [HUDI-209] Implement JMX metrics reporter

2020-02-09 Thread GitBox
vinothchandar commented on a change in pull request #1106: [HUDI-209] Implement 
JMX metrics reporter
URL: https://github.com/apache/incubator-hudi/pull/1106#discussion_r376895998
 
 

 ##
 File path: 
hudi-client/src/main/java/org/apache/hudi/metrics/JmxMetricsReporter.java
 ##
 @@ -82,6 +102,106 @@ public void report() {
 
   @Override
   public Closeable getReporter() {
-return null;
+return jmxServer.getReporter();
+  }
+
+  @Override
+  public void stop() {
+if (jmxServer != null) {
+  try {
+jmxServer.stop();
+  } catch (IOException e) {
+LOG.error("Failed to stop JMX server.", e);
+  }
+}
+  }
+
+  /**
+   * JMX Server implementation that JMX clients can connect to.
+   *
+   * Heavily based on j256 simplejmx project
 
 Review comment:
   Apache and MIT are licenses we can clear easily; others need more scrutiny. Can 
we re-implement this ourselves, please? 
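
(Editorial aside: a minimal JMX connector server can be stood up with only JDK classes. The sketch below is a hypothetical starting point for such a re-implementation — not the PR's code; host/port handling, security, and error handling are omitted.)

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.rmi.registry.LocateRegistry;
import javax.management.MBeanServer;
import javax.management.remote.JMXConnectorServer;
import javax.management.remote.JMXConnectorServerFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical JDK-only sketch of a JMX connector server (not the code under review).
public class SimpleJmxServerSketch {

  private final JMXConnectorServer connectorServer;

  public SimpleJmxServerSketch(String host, int port) throws IOException {
    // An RMI registry must exist for the connector to bind into.
    LocateRegistry.createRegistry(port);
    MBeanServer mBeanServer = ManagementFactory.getPlatformMBeanServer();
    JMXServiceURL url = new JMXServiceURL(
        "service:jmx:rmi://" + host + "/jndi/rmi://" + host + ":" + port + "/jmxrmi");
    this.connectorServer = JMXConnectorServerFactory.newJMXConnectorServer(url, null, mBeanServer);
  }

  public void start() throws IOException {
    connectorServer.start();
  }

  public void stop() throws IOException {
    connectorServer.stop();
  }
}
```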


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1317: [HUDI-605] Avoid calculating the size of schema redundantly

2020-02-09 Thread GitBox
vinothchandar commented on a change in pull request #1317: [HUDI-605] Avoid 
calculating the size of schema redundantly
URL: https://github.com/apache/incubator-hudi/pull/1317#discussion_r376894610
 
 

 ##
 File path: 
hudi-common/src/main/java/org/apache/hudi/common/util/collection/DiskBasedMap.java
 ##
 @@ -223,8 +223,8 @@ private R get(ValueMetadata entry) {
   private synchronized R put(T key, R value, boolean flush) {
 try {
   byte[] val = SerializationUtils.serialize(value);
-  Integer valueSize = val.length;
-  Long timestamp = System.currentTimeMillis();
+  int valueSize = val.length;
 
 Review comment:
   why change these? 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Updated] (HUDI-604) Update docker page

2020-02-09 Thread lamber-ken (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lamber-ken updated HUDI-604:

Summary: Update docker page  (was: Update docker pages)

> Update docker page
> --
>
> Key: HUDI-604
> URL: https://issues.apache.org/jira/browse/HUDI-604
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Docs
>Reporter: lamber-ken
>Assignee: lamber-ken
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> 1. Change one-line commands to multi-line
> 2. Unify code indentation



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-604) Update docker pages

2020-02-09 Thread lamber-ken (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lamber-ken updated HUDI-604:

Summary: Update docker pages  (was: Update docker pages )

> Update docker pages
> ---
>
> Key: HUDI-604
> URL: https://issues.apache.org/jira/browse/HUDI-604
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Docs
>Reporter: lamber-ken
>Assignee: lamber-ken
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> 1. Change one-line commands to multi-line
> 2. Unify code indentation



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HUDI-604) Update docker pages

2020-02-09 Thread lamber-ken (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lamber-ken reassigned HUDI-604:
---

Assignee: lamber-ken

> Update docker pages 
> 
>
> Key: HUDI-604
> URL: https://issues.apache.org/jira/browse/HUDI-604
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Docs
>Reporter: lamber-ken
>Assignee: lamber-ken
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> 1. Change one-line commands to multi-line
> 2. Unify code indentation



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] vinothchandar commented on issue #1314: [HUDI-542] Introduce a new pom module named hudi-writer-common

2020-02-09 Thread GitBox
vinothchandar commented on issue #1314: [HUDI-542] Introduce a new pom module 
named hudi-writer-common
URL: https://github.com/apache/incubator-hudi/pull/1314#issuecomment-583975092
 
 
   @yanghua ack. Do we need a feature branch for this work? Can't we do this on 
master with incremental commits? 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Updated] (HUDI-605) Avoid calculating the size of schema redundantly

2020-02-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-605:

Labels: pull-request-available  (was: )

> Avoid calculating the size of schema redundantly  
> --
>
> Key: HUDI-605
> URL: https://issues.apache.org/jira/browse/HUDI-605
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Common Core
>Reporter: lamber-ken
>Priority: Critical
>  Labels: pull-request-available
>
> Avoid calculating the size of schema redundantly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] lamber-ken opened a new pull request #1317: [HUDI-605] Avoid calculating the size of schema redundantly

2020-02-09 Thread GitBox
lamber-ken opened a new pull request #1317: [HUDI-605] Avoid calculating the 
size of schema redundantly
URL: https://github.com/apache/incubator-hudi/pull/1317
 
 
   ## What is the purpose of the pull request
   
   The same schema object is shared amongst all records in the JVM.
   
   ## Brief change log
   
 - Calculate the size of the schema when initializing HoodieRecordSizeEstimator
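
   (Editorial aside: a hedged sketch of the general pattern this change describes — names below are illustrative, not the actual Hudi classes. The shared schema's size is measured once at construction and reused for every record estimate, instead of being recomputed per record.)

```java
import org.apache.avro.Schema;

// Illustrative sketch only (not the actual HoodieRecordSizeEstimator): the
// schema is shared by all records, so its size is measured once up front.
public class CachingSizeEstimatorSketch<R> {

  private final long schemaSize;

  public CachingSizeEstimatorSketch(Schema schema) {
    // Hypothetical stand-in for whatever object-size estimate is used;
    // the point is that it runs once, in the constructor.
    this.schemaSize = schema.toString().length();
  }

  /** The per-record estimate reuses the cached schema size instead of recomputing it. */
  public long sizeEstimate(R record, long payloadSizeEstimate) {
    return schemaSize + payloadSizeEstimate;
  }
}
```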
   
   ## Verify this pull request
   
   This pull request is already covered by existing tests: 
   `org.apache.hudi.common.util.collection.TestDiskBasedMap`.
   
   ## Committer checklist
   
- [X] Has a corresponding JIRA in PR title & commit

- [X] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Updated] (HUDI-605) Avoid calculating the size of schema redundantly

2020-02-09 Thread lamber-ken (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lamber-ken updated HUDI-605:

Status: Open  (was: New)

> Avoid calculating the size of schema redundantly  
> --
>
> Key: HUDI-605
> URL: https://issues.apache.org/jira/browse/HUDI-605
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Common Core
>Reporter: lamber-ken
>Priority: Critical
>
> Avoid calculating the size of schema redundantly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-605) Avoid calculating the size of schema redundantly

2020-02-09 Thread lamber-ken (Jira)
lamber-ken created HUDI-605:
---

 Summary: Avoid calculating the size of schema redundantly  
 Key: HUDI-605
 URL: https://issues.apache.org/jira/browse/HUDI-605
 Project: Apache Hudi (incubating)
  Issue Type: Improvement
  Components: Common Core
Reporter: lamber-ken


Avoid calculating the size of schema redundantly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-604) Update docker pages

2020-02-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-604:

Labels: pull-request-available  (was: )

> Update docker pages 
> 
>
> Key: HUDI-604
> URL: https://issues.apache.org/jira/browse/HUDI-604
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Docs
>Reporter: lamber-ken
>Priority: Major
>  Labels: pull-request-available
>
> 1. Change one-line commands to multi-line
> 2. Unify code indentation



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] lamber-ken opened a new pull request #1316: [HUDI-604] Update docker pages

2020-02-09 Thread GitBox
lamber-ken opened a new pull request #1316: [HUDI-604] Update docker pages
URL: https://github.com/apache/incubator-hudi/pull/1316
 
 
   ## What is the purpose of the pull request
   
   1. Change one-line commands to multi-line 
   2. Modify sub-titles
   
   ## Verify this pull request
   
   **Compare changes**
   - https://hudi.apache.org/docs/docker_demo.html
   - https://lamber-ken.github.io/docs/docker_demo.html
   
   ## Committer checklist
   
- [X] Has a corresponding JIRA in PR title & commit

- [X] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Created] (HUDI-604) Update docker pages

2020-02-09 Thread lamber-ken (Jira)
lamber-ken created HUDI-604:
---

 Summary: Update docker pages 
 Key: HUDI-604
 URL: https://issues.apache.org/jira/browse/HUDI-604
 Project: Apache Hudi (incubating)
  Issue Type: Improvement
  Components: Docs
Reporter: lamber-ken


1. Change one-line commands to multi-line

2. Unify code indentation



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-604) Update docker pages

2020-02-09 Thread lamber-ken (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lamber-ken updated HUDI-604:

Status: Open  (was: New)

> Update docker pages 
> 
>
> Key: HUDI-604
> URL: https://issues.apache.org/jira/browse/HUDI-604
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Docs
>Reporter: lamber-ken
>Priority: Major
>
> 1. Change one-line commands to multi-line
> 2. Unify code indentation



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Build failed in Jenkins: hudi-snapshot-deployment-0.5 #184

2020-02-09 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 2.28 KB...]
plexus-classworlds-2.5.2.jar

/home/jenkins/tools/maven/apache-maven-3.5.4/conf:
logging
settings.xml
toolchains.xml

/home/jenkins/tools/maven/apache-maven-3.5.4/conf/logging:
simplelogger.properties

/home/jenkins/tools/maven/apache-maven-3.5.4/lib:
aopalliance-1.0.jar
cdi-api-1.0.jar
cdi-api.license
commons-cli-1.4.jar
commons-cli.license
commons-io-2.5.jar
commons-io.license
commons-lang3-3.5.jar
commons-lang3.license
ext
guava-20.0.jar
guice-4.2.0-no_aop.jar
jansi-1.17.1.jar
jansi-native
javax.inject-1.jar
jcl-over-slf4j-1.7.25.jar
jcl-over-slf4j.license
jsr250-api-1.0.jar
jsr250-api.license
maven-artifact-3.5.4.jar
maven-artifact.license
maven-builder-support-3.5.4.jar
maven-builder-support.license
maven-compat-3.5.4.jar
maven-compat.license
maven-core-3.5.4.jar
maven-core.license
maven-embedder-3.5.4.jar
maven-embedder.license
maven-model-3.5.4.jar
maven-model-builder-3.5.4.jar
maven-model-builder.license
maven-model.license
maven-plugin-api-3.5.4.jar
maven-plugin-api.license
maven-repository-metadata-3.5.4.jar
maven-repository-metadata.license
maven-resolver-api-1.1.1.jar
maven-resolver-api.license
maven-resolver-connector-basic-1.1.1.jar
maven-resolver-connector-basic.license
maven-resolver-impl-1.1.1.jar
maven-resolver-impl.license
maven-resolver-provider-3.5.4.jar
maven-resolver-provider.license
maven-resolver-spi-1.1.1.jar
maven-resolver-spi.license
maven-resolver-transport-wagon-1.1.1.jar
maven-resolver-transport-wagon.license
maven-resolver-util-1.1.1.jar
maven-resolver-util.license
maven-settings-3.5.4.jar
maven-settings-builder-3.5.4.jar
maven-settings-builder.license
maven-settings.license
maven-shared-utils-3.2.1.jar
maven-shared-utils.license
maven-slf4j-provider-3.5.4.jar
maven-slf4j-provider.license
org.eclipse.sisu.inject-0.3.3.jar
org.eclipse.sisu.inject.license
org.eclipse.sisu.plexus-0.3.3.jar
org.eclipse.sisu.plexus.license
plexus-cipher-1.7.jar
plexus-cipher.license
plexus-component-annotations-1.7.1.jar
plexus-component-annotations.license
plexus-interpolation-1.24.jar
plexus-interpolation.license
plexus-sec-dispatcher-1.4.jar
plexus-sec-dispatcher.license
plexus-utils-3.1.0.jar
plexus-utils.license
slf4j-api-1.7.25.jar
slf4j-api.license
wagon-file-3.1.0.jar
wagon-file.license
wagon-http-3.1.0-shaded.jar
wagon-http.license
wagon-provider-api-3.1.0.jar
wagon-provider-api.license

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/ext:
README.txt

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native:
freebsd32
freebsd64
linux32
linux64
osx
README.txt
windows32
windows64

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/osx:
libjansi.jnilib

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows32:
jansi.dll

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows64:
jansi.dll
Finished /home/jenkins/tools/maven/apache-maven-3.5.4 Directory Listing :
Detected current version as: 
'HUDI_home=
0.5.2-SNAPSHOT'
[INFO] Scanning for projects...
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-spark_2.11:jar:0.5.2-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 
org.apache.hudi:hudi-spark_${scala.binary.version}:[unknown-version], 

 line 26, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-utilities_2.11:jar:0.5.2-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 
org.apache.hudi:hudi-utilities_${scala.binary.version}:[unknown-version], 

 line 26, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-spark-bundle_2.11:jar:0.5.2-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 
org.apache.hudi:hudi-spark-bundle_${scala.binary.version}:[unknown-version], 

 line 26, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-utilities-bundle_2.11:jar:0.5.2-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 

[GitHub] [incubator-hudi] lamber-ken commented on issue #1306: [HUDI-598] Update quick start page

2020-02-09 Thread GitBox
lamber-ken commented on issue #1306: [HUDI-598] Update quick start page
URL: https://github.com/apache/incubator-hudi/pull/1306#issuecomment-583936293
 
 
   hi @bhasudha, all review comments are addressed and fixed. :)
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Resolved] (HUDI-560) Remove legacy IdentityTransformer

2020-02-09 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang resolved HUDI-560.
---
Resolution: Fixed

Done via master branch: 91f47802ebead2259f1d05fc9dc3323a46b4be56

> Remove legacy IdentityTransformer
> -
>
> Key: HUDI-560
> URL: https://issues.apache.org/jira/browse/HUDI-560
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Code Cleanup
>Reporter: vinoyang
>Assignee: wangxianghu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently, {{IdentityTransformer}} is not used anywhere in the Hudi 
> codebase, and it seems to be just a pass-through transformer. Can we 
> remove it?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] yanghua merged pull request #1264: [HUDI-560] Remove legacy IdentityTransformer

2020-02-09 Thread GitBox
yanghua merged pull request #1264: [HUDI-560] Remove legacy IdentityTransformer
URL: https://github.com/apache/incubator-hudi/pull/1264
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[incubator-hudi] branch master updated: [HUDI-560] Remove legacy IdentityTransformer (#1264)

2020-02-09 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 5fdf5a1  [HUDI-560] Remove legacy IdentityTransformer (#1264)
5fdf5a1 is described below

commit 5fdf5a192706d8c8a0432297cca1e6e097de0e58
Author: Mathieu <49835526+wangxian...@users.noreply.github.com>
AuthorDate: Mon Feb 10 10:04:58 2020 +0800

[HUDI-560] Remove legacy IdentityTransformer (#1264)
---
 .../utilities/transform/IdentityTransformer.java   | 38 --
 1 file changed, 38 deletions(-)

diff --git 
a/hudi-utilities/src/main/java/org/apache/hudi/utilities/transform/IdentityTransformer.java
 
b/hudi-utilities/src/main/java/org/apache/hudi/utilities/transform/IdentityTransformer.java
deleted file mode 100644
index 31f0ce6..000
--- 
a/hudi-utilities/src/main/java/org/apache/hudi/utilities/transform/IdentityTransformer.java
+++ /dev/null
@@ -1,38 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements.  See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership.  The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License.  You may obtain a copy of the License at
- *
- *  http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.hudi.utilities.transform;
-
-import org.apache.hudi.common.util.TypedProperties;
-
-import org.apache.spark.api.java.JavaSparkContext;
-import org.apache.spark.sql.Dataset;
-import org.apache.spark.sql.Row;
-import org.apache.spark.sql.SparkSession;
-
-/**
- * Identity transformer.
- */
-public class IdentityTransformer implements Transformer {
-
-  @Override
-  public Dataset<Row> apply(JavaSparkContext jsc, SparkSession sparkSession, 
Dataset<Row> rowDataset,
-  TypedProperties properties) {
-return rowDataset;
-  }
-}



[GitHub] [incubator-hudi] wangxianghu commented on issue #1224: [HUDI-397] Normalize log print statement

2020-02-09 Thread GitBox
wangxianghu commented on issue #1224: [HUDI-397] Normalize log print statement
URL: https://github.com/apache/incubator-hudi/pull/1224#issuecomment-583918838
 
 
   Hi @n3nash, Sorry for the delay, I'll fix it soon.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] leesf commented on a change in pull request #1106: [HUDI-209] Implement JMX metrics reporter

2020-02-09 Thread GitBox
leesf commented on a change in pull request #1106: [HUDI-209] Implement JMX 
metrics reporter
URL: https://github.com/apache/incubator-hudi/pull/1106#discussion_r376786221
 
 

 ##
 File path: 
hudi-client/src/main/java/org/apache/hudi/metrics/JmxMetricsReporter.java
 ##
 @@ -82,6 +102,106 @@ public void report() {
 
   @Override
   public Closeable getReporter() {
-return null;
+return jmxServer.getReporter();
+  }
+
+  @Override
+  public void stop() {
+if (jmxServer != null) {
+  try {
+jmxServer.stop();
+  } catch (IOException e) {
+LOG.error("Failed to stop JMX server.", e);
+  }
+}
+  }
+
+  /**
+   * JMX Server implementation that JMX clients can connect to.
+   *
+   * Heavily based on j256 simplejmx project
 
 Review comment:
   Would we add the license to the LICENSE file? @vinothchandar let's get this 
landed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] leesf commented on a change in pull request #1253: [HUDI-558] Introduce ability to compress bloom filters while storing in parquet

2020-02-09 Thread GitBox
leesf commented on a change in pull request #1253: [HUDI-558] Introduce ability 
to compress bloom filters while storing in parquet
URL: https://github.com/apache/incubator-hudi/pull/1253#discussion_r376785421
 
 

 ##
 File path: 
hudi-common/src/test/java/org/apache/hudi/common/util/TestGzipCompressionUtils.java
 ##
 @@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.util;
+
+import org.apache.hudi.common.bloom.filter.BloomFilter;
+import org.apache.hudi.common.bloom.filter.SimpleBloomFilter;
+
+import org.apache.hadoop.util.hash.Hash;
+import org.junit.Assert;
+import org.junit.Test;
+
+import java.util.UUID;
+
 
 Review comment:
   Adding some annotations for the class would be better.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] leesf commented on a change in pull request #1253: [HUDI-558] Introduce ability to compress bloom filters while storing in parquet

2020-02-09 Thread GitBox
leesf commented on a change in pull request #1253: [HUDI-558] Introduce ability 
to compress bloom filters while storing in parquet
URL: https://github.com/apache/incubator-hudi/pull/1253#discussion_r376785469
 
 

 ##
 File path: 
hudi-common/src/test/java/org/apache/hudi/common/util/TestGzipCompressionUtils.java
 ##
 @@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.util;
+
+import org.apache.hudi.common.bloom.filter.BloomFilter;
+import org.apache.hudi.common.bloom.filter.SimpleBloomFilter;
+
+import org.apache.hadoop.util.hash.Hash;
+import org.junit.Assert;
+import org.junit.Test;
+
+import java.util.UUID;
+
+public class TestGzipCompressionUtils {
+
+  @Test
+  public void testCompressDeCompress() {
 
 Review comment:
   +1
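
(Editorial aside: the kind of round trip such a test exercises looks roughly like the hedged sketch below — plain JDK GZIP streams over a serialized bloom filter string. This is illustrative only and not the PR's actual GzipCompressionUtils.)

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Illustrative GZIP compress/decompress round trip (not the PR's utility class).
public class GzipRoundTripSketch {

  static byte[] compress(String input) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
      gzip.write(input.getBytes(StandardCharsets.UTF_8));
    }
    return bytes.toByteArray();
  }

  static String decompress(byte[] compressed) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
      byte[] buffer = new byte[4096];
      int read;
      while ((read = gzip.read(buffer)) != -1) {
        out.write(buffer, 0, read);
      }
    }
    return new String(out.toByteArray(), StandardCharsets.UTF_8);
  }

  public static void main(String[] args) throws IOException {
    String serializedFilter = "a-serialized-bloom-filter-string"; // placeholder payload
    byte[] packed = compress(serializedFilter);
    System.out.println(decompress(packed).equals(serializedFilter)); // prints true
  }
}
```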


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] leesf commented on issue #1200: [HUDI-514] A schema provider to get metadata through Jdbc

2020-02-09 Thread GitBox
leesf commented on issue #1200: [HUDI-514] A schema provider to get metadata 
through Jdbc
URL: https://github.com/apache/incubator-hudi/pull/1200#issuecomment-583846298
 
 
   @OpenOpened Only left some minor comments, otherwise looks good to me.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] leesf commented on a change in pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc

2020-02-09 Thread GitBox
leesf commented on a change in pull request #1200: [HUDI-514] A schema provider 
to get metadata through Jdbc
URL: https://github.com/apache/incubator-hudi/pull/1200#discussion_r376784592
 
 

 ##
 File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/UtilHelpers.java
 ##
 @@ -236,4 +250,57 @@ public static TypedProperties readConfig(InputStream in) 
throws IOException {
 defaults.load(in);
 return defaults;
   }
+
+  /**
+   * Call a Spark function to get the schema through JDBC.
+   * @param options
+   * @return
+   * @throws Exception
+   */
+  public static Schema getSchema(Map<String, String> options) throws Exception 
{
+scala.collection.immutable.Map<String, String> ioptions = 
toScalaImmutableMap(options);
+JDBCOptions jdbcOptions = new JDBCOptions(ioptions);
+Connection conn = JdbcUtils.createConnectionFactory(jdbcOptions).apply();
+String url = jdbcOptions.url();
+String table = jdbcOptions.tableOrQuery();
+JdbcOptionsInWrite jdbcOptionsInWrite = new JdbcOptionsInWrite(ioptions);
+boolean tableExists = JdbcUtils.tableExists(conn, jdbcOptionsInWrite);
+if (tableExists) {
+  JdbcDialect dialect = JdbcDialects.get(url);
+  try {
+PreparedStatement statement = 
conn.prepareStatement(dialect.getSchemaQuery(table));
+try {
+  statement.setQueryTimeout(Integer.parseInt(options.get("timeout")));
+  ResultSet rs = statement.executeQuery();
+  try {
+StructType structType;
+if (Boolean.parseBoolean(ioptions.get("nullable").get())) {
+  structType = JdbcUtils.getSchema(rs, dialect, true);
+} else {
+  structType = JdbcUtils.getSchema(rs, dialect, false);
+}
+return 
AvroConversionUtils.convertStructTypeToAvroSchema(structType, table, "hoodie." 
+ table);
+  } finally {
+rs.close();
+  }
+} finally {
+  statement.close();
+}
+  } finally {
+conn.close();
+  }
+} else {
+  throw new HoodieException(String.format("%s table not exists!", table));
+}
+  }
+
+  @SuppressWarnings("unchecked")
+  private static <K, V> scala.collection.immutable.Map<K, V> 
toScalaImmutableMap(java.util.Map<K, V> javaMap) {
 
 Review comment:
   Also, is the method (`toScalaImmutableMap`) copied from another project or 
implemented on our own?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] leesf commented on a change in pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc

2020-02-09 Thread GitBox
leesf commented on a change in pull request #1200: [HUDI-514] A schema provider 
to get metadata through Jdbc
URL: https://github.com/apache/incubator-hudi/pull/1200#discussion_r376784460
 
 

 ##
 File path: 
hudi-utilities/src/test/java/org/apache/hudi/utilities/TestJdbcbasedSchemaProvider.java
 ##
 @@ -0,0 +1,82 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities;
+
+import org.apache.hudi.common.util.TypedProperties;
+import org.apache.hudi.exception.HoodieException;
+import org.apache.hudi.utilities.schema.JdbcbasedSchemaProvider;
+
+import org.apache.avro.Schema;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.junit.After;
+import org.junit.Before;
+import org.junit.Test;
+
+import java.io.IOException;
+import java.sql.Connection;
+import java.sql.DriverManager;
+import java.sql.PreparedStatement;
+import java.sql.SQLException;
+
+import static org.junit.Assert.assertEquals;
+
 
 Review comment:
   It would be better to add some annotations.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] leesf commented on issue #1315: [Hudi-108] Removing 2GB spark partition limitations in HoodieBloomIndex with spark 2.4.4

2020-02-09 Thread GitBox
leesf commented on issue #1315: [Hudi-108] Removing 2GB spark partition 
limitations in HoodieBloomIndex with spark 2.4.4
URL: https://github.com/apache/incubator-hudi/pull/1315#issuecomment-583845543
 
 
   Looks much cleaner after the change. @nsivabalan, would you also please take a 
look at the Travis failure?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Resolved] (HUDI-585) Optimize the steps of building with scala-2.12

2020-02-09 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf resolved HUDI-585.

Fix Version/s: 0.5.2
   Resolution: Fixed

Fixed via master: 425e3e6c78b9be00fc3fecfc335c94e05a1c70e5

> Optimize the steps of building with scala-2.12 
> ---
>
> Key: HUDI-585
> URL: https://issues.apache.org/jira/browse/HUDI-585
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Utilities
>Reporter: lamber-ken
>Assignee: lamber-ken
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Optimize the steps of building with scala-2.12.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-596) KafkaConsumer need to be closed

2020-02-09 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf resolved HUDI-596.

Fix Version/s: 0.5.2
   Resolution: Fixed

Fixed via master: 347e297ac19ed55172e84e13075e19ce060954c6

> KafkaConsumer need to be closed
> ---
>
> Key: HUDI-596
> URL: https://issues.apache.org/jira/browse/HUDI-596
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Utilities
>Reporter: dengziming
>Assignee: dengziming
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> `offsetGen.getNextOffsetRanges` is called periodically in the DeltaStreamer 
> application, and it creates a `new KafkaConsumer(kafkaParams)` without closing it, 
> so an exception is thrown after a while.
> ```
> java.net.SocketException: Too many open files
>   at sun.nio.ch.Net.socket0(Native Method)
>   at sun.nio.ch.Net.socket(Net.java:411)
>   at sun.nio.ch.Net.socket(Net.java:404)
>   at sun.nio.ch.SocketChannelImpl.(SocketChannelImpl.java:105)
>   at 
> sun.nio.ch.SelectorProviderImpl.openSocketChannel(SelectorProviderImpl.java:60)
>   at java.nio.channels.SocketChannel.open(SocketChannel.java:145)
>   at org.apache.kafka.common.network.Selector.connect(Selector.java:211)
>   at 
> org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:864)
>   at org.apache.kafka.clients.NetworkClient.ready(NetworkClient.java:265)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.trySend(ConsumerNetworkClient.java:485)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:261)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:242)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:218)
>   at 
> org.apache.kafka.clients.consumer.internals.Fetcher.getTopicMetadata(Fetcher.java:274)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.partitionsFor(KafkaConsumer.java:1774)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.partitionsFor(KafkaConsumer.java:1742)
>   at 
> org.apache.hudi.utilities.sources.helpers.KafkaOffsetGen.getNextOffsetRanges(KafkaOffsetGen.java:177)
>   at 
> org.apache.hudi.utilities.sources.JsonKafkaSource.fetchNewData(JsonKafkaSource.java:56)
>   at org.apache.hudi.utilities.sources.Source.fetchNext(Source.java:73)
>   at 
> org.apache.hudi.utilities.deltastreamer.SourceFormatAdapter.fetchNewDataInRowFormat(SourceFormatAdapter.java:107)
>   at 
> org.apache.hudi.utilities.deltastreamer.DeltaSync.readFromSource(DeltaSync.java:288)
>   at 
> org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:226)
> ```
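
(Editorial aside: the usual remedy for this kind of leak, sketched below in a hedged form — parameter and topic names are placeholders, and this is not the actual Hudi fix — is to scope the consumer with try-with-resources so it is closed after each metadata poll.)

```java
import java.util.List;
import java.util.Map;

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.PartitionInfo;

// Illustrative sketch (not the actual Hudi patch): KafkaConsumer implements
// Closeable, so try-with-resources guarantees the socket is released even when
// this code runs periodically.
public class ClosingConsumerSketch {

  static List<PartitionInfo> fetchPartitions(Map<String, Object> kafkaParams, String topic) {
    try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(kafkaParams)) {
      return consumer.partitionsFor(topic);
    } // consumer.close() runs here, avoiding "Too many open files"
  }
}
```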



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-596) KafkaConsumer need to be closed

2020-02-09 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-596:
---
Status: Open  (was: New)

> KafkaConsumer need to be closed
> ---
>
> Key: HUDI-596
> URL: https://issues.apache.org/jira/browse/HUDI-596
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Utilities
>Reporter: dengziming
>Assignee: dengziming
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> `offsetGen.getNextOffsetRanges` is called periodically in the DeltaStreamer 
> application, and it creates a `new KafkaConsumer(kafkaParams)` without closing it, 
> so an exception is thrown after a while.
> ```
> java.net.SocketException: Too many open files
>   at sun.nio.ch.Net.socket0(Native Method)
>   at sun.nio.ch.Net.socket(Net.java:411)
>   at sun.nio.ch.Net.socket(Net.java:404)
>   at sun.nio.ch.SocketChannelImpl.(SocketChannelImpl.java:105)
>   at 
> sun.nio.ch.SelectorProviderImpl.openSocketChannel(SelectorProviderImpl.java:60)
>   at java.nio.channels.SocketChannel.open(SocketChannel.java:145)
>   at org.apache.kafka.common.network.Selector.connect(Selector.java:211)
>   at 
> org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:864)
>   at org.apache.kafka.clients.NetworkClient.ready(NetworkClient.java:265)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.trySend(ConsumerNetworkClient.java:485)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:261)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:242)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:218)
>   at 
> org.apache.kafka.clients.consumer.internals.Fetcher.getTopicMetadata(Fetcher.java:274)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.partitionsFor(KafkaConsumer.java:1774)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.partitionsFor(KafkaConsumer.java:1742)
>   at 
> org.apache.hudi.utilities.sources.helpers.KafkaOffsetGen.getNextOffsetRanges(KafkaOffsetGen.java:177)
>   at 
> org.apache.hudi.utilities.sources.JsonKafkaSource.fetchNewData(JsonKafkaSource.java:56)
>   at org.apache.hudi.utilities.sources.Source.fetchNext(Source.java:73)
>   at 
> org.apache.hudi.utilities.deltastreamer.SourceFormatAdapter.fetchNewDataInRowFormat(SourceFormatAdapter.java:107)
>   at 
> org.apache.hudi.utilities.deltastreamer.DeltaSync.readFromSource(DeltaSync.java:288)
>   at 
> org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:226)
> ```



--
This message was sent by Atlassian Jira
(v8.3.4#803005)