[jira] [Updated] (HUDI-470) Fix NPE when print result via hudi-cli

2019-12-25 Thread lamber-ken (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lamber-ken updated HUDI-470:

Description: 
The size of `rows` is wrong. For example:
{code:java}
List<String> allRecords = Arrays.asList();

// Bug: the "+ 1" allocates one extra row that the loop never fills,
// so the last element of `rows` stays null and FlipTable throws an NPE.
String[][] rows = new String[allRecords.size() + 1][];
int i = 0;
for (String record : allRecords) {
    String[] data = new String[1];
    data[0] = record;
    rows[i++] = data;
}

HoodiePrintHelper.print(new String[]{"Partition Path"}, rows);
{code}
Result:
{code:java}
Exception in thread "main" java.lang.NullPointerException
    at com.jakewharton.fliptables.FlipTable.<init>(FlipTable.java:37)
    at com.jakewharton.fliptables.FlipTable.of(FlipTable.java:20)
    at org.apache.hudi.cli.HoodiePrintHelper.printTextTable(HoodiePrintHelper.java:110)
    at org.apache.hudi.cli.HoodiePrintHelper.print(HoodiePrintHelper.java:43)
    at org.apache.hudi.cli.commands.RepairsCommand.main(RepairsCommand.java:127)
{code}
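A minimal sketch of the implied one-line fix (an inference from the off-by-one `+ 1` above, not the actual patch text; the merged change touches RepairsCommand and HoodieLogFileCommand):
{code:java}
// Size the array to exactly the number of records, so no row is left null.
String[][] rows = new String[allRecords.size()][];
int i = 0;
for (String record : allRecords) {
    rows[i++] = new String[]{record};
}

HoodiePrintHelper.print(new String[]{"Partition Path"}, rows);
{code}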
 

  was:
The size of `rows` is wrong. For example:
{code:java}
List<String> allRecords = Arrays.asList();

// Bug: the "+ 1" allocates one extra row that the loop never fills,
// so the last element of `rows` stays null and FlipTable throws an NPE.
String[][] rows = new String[allRecords.size() + 1][];
int i = 0;
for (String record : allRecords) {
    String[] data = new String[1];
    data[0] = record;
    rows[i++] = data;
}

HoodiePrintHelper.print(new String[]{"Partition Path"}, rows);
{code}
Result:
{code:java}
Exception in thread "main" java.lang.NullPointerException
    at com.jakewharton.fliptables.FlipTable.<init>(FlipTable.java:37)
    at com.jakewharton.fliptables.FlipTable.of(FlipTable.java:20)
    at org.apache.hudi.cli.HoodiePrintHelper.printTextTable(HoodiePrintHelper.java:110)
    at org.apache.hudi.cli.HoodiePrintHelper.print(HoodiePrintHelper.java:43)
    at org.apache.hudi.cli.commands.RepairsCommand.main(RepairsCommand.java:127)
{code}
 


> Fix NPE when print result via hudi-cli
> --
>
> Key: HUDI-470
> URL: https://issues.apache.org/jira/browse/HUDI-470
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>Reporter: lamber-ken
>Assignee: lamber-ken
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The size of `rows` is wrong. For example:
> {code:java}
> List<String> allRecords = Arrays.asList();
> // Bug: the "+ 1" allocates one extra row that the loop never fills,
> // so the last element of `rows` stays null and FlipTable throws an NPE.
> String[][] rows = new String[allRecords.size() + 1][];
> int i = 0;
> for (String record : allRecords) {
>     String[] data = new String[1];
>     data[0] = record;
>     rows[i++] = data;
> }
> HoodiePrintHelper.print(new String[]{"Partition Path"}, rows);
> {code}
> Result:
> {code:java}
> Exception in thread "main" java.lang.NullPointerException
>     at com.jakewharton.fliptables.FlipTable.<init>(FlipTable.java:37)
>     at com.jakewharton.fliptables.FlipTable.of(FlipTable.java:20)
>     at org.apache.hudi.cli.HoodiePrintHelper.printTextTable(HoodiePrintHelper.java:110)
>     at org.apache.hudi.cli.HoodiePrintHelper.print(HoodiePrintHelper.java:43)
>     at org.apache.hudi.cli.commands.RepairsCommand.main(RepairsCommand.java:127)
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-470) Fix NPE when print result via hudi-cli

2019-12-25 Thread lamber-ken (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lamber-ken updated HUDI-470:

Description: 
The size of `rows` is wrong. For example:
{code:java}
List<String> allRecords = Arrays.asList();

// Bug: the "+ 1" allocates one extra row that the loop never fills,
// so the last element of `rows` stays null and FlipTable throws an NPE.
String[][] rows = new String[allRecords.size() + 1][];
int i = 0;
for (String record : allRecords) {
    String[] data = new String[1];
    data[0] = record;
    rows[i++] = data;
}

HoodiePrintHelper.print(new String[]{"Partition Path"}, rows);
{code}
Result:
{code:java}
Exception in thread "main" java.lang.NullPointerException
    at com.jakewharton.fliptables.FlipTable.<init>(FlipTable.java:37)
    at com.jakewharton.fliptables.FlipTable.of(FlipTable.java:20)
    at org.apache.hudi.cli.HoodiePrintHelper.printTextTable(HoodiePrintHelper.java:110)
    at org.apache.hudi.cli.HoodiePrintHelper.print(HoodiePrintHelper.java:43)
    at org.apache.hudi.cli.commands.RepairsCommand.main(RepairsCommand.java:127)
{code}
 

  was:Fix NPE when print result via hudi-cli


> Fix NPE when print result via hudi-cli
> --
>
> Key: HUDI-470
> URL: https://issues.apache.org/jira/browse/HUDI-470
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>Reporter: lamber-ken
>Assignee: lamber-ken
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The size of `rows` is wrong. For example:
> {code:java}
> List<String> allRecords = Arrays.asList();
> // Bug: the "+ 1" allocates one extra row that the loop never fills,
> // so the last element of `rows` stays null and FlipTable throws an NPE.
> String[][] rows = new String[allRecords.size() + 1][];
> int i = 0;
> for (String record : allRecords) {
>     String[] data = new String[1];
>     data[0] = record;
>     rows[i++] = data;
> }
> HoodiePrintHelper.print(new String[]{"Partition Path"}, rows);
> {code}
> Result:
> {code:java}
> Exception in thread "main" java.lang.NullPointerException
>     at com.jakewharton.fliptables.FlipTable.<init>(FlipTable.java:37)
>     at com.jakewharton.fliptables.FlipTable.of(FlipTable.java:20)
>     at org.apache.hudi.cli.HoodiePrintHelper.printTextTable(HoodiePrintHelper.java:110)
>     at org.apache.hudi.cli.HoodiePrintHelper.print(HoodiePrintHelper.java:43)
>     at org.apache.hudi.cli.commands.RepairsCommand.main(RepairsCommand.java:127)
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] lamber-ken commented on issue #1138: [HUDI-470] Fix NPE when print result via hudi-cli

2019-12-25 Thread GitBox
lamber-ken commented on issue #1138: [HUDI-470] Fix NPE when print result via 
hudi-cli
URL: https://github.com/apache/incubator-hudi/pull/1138#issuecomment-569002628
 
 
   > It's a pity these methods have no unit tests; it would have been easy to 
find these issues if we did. Additionally, we can give more detailed information 
in the Jira description, just like what you did in the PR's description. WDYT?
   
   +1, many methods need unit tests; will enrich the Jira description.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Updated] (HUDI-470) Fix NPE when print result via hudi-cli

2019-12-25 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-470:
--
Fix Version/s: 0.5.1

> Fix NPE when print result via hudi-cli
> --
>
> Key: HUDI-470
> URL: https://issues.apache.org/jira/browse/HUDI-470
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>Reporter: lamber-ken
>Assignee: lamber-ken
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Fix NPE when print result via hudi-cli



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-470) Fix NPE when print result via hudi-cli

2019-12-25 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang resolved HUDI-470.
---
Resolution: Fixed

Fixed via master branch: a63bcc7c102e2a71bd63dac699a29d2ad92e9321

> Fix NPE when print result via hudi-cli
> --
>
> Key: HUDI-470
> URL: https://issues.apache.org/jira/browse/HUDI-470
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>Reporter: lamber-ken
>Assignee: lamber-ken
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Fix NPE when print result via hudi-cli



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] yanghua commented on issue #1138: [HUDI-470] Fix NPE when print result via hudi-cli

2019-12-25 Thread GitBox
yanghua commented on issue #1138: [HUDI-470] Fix NPE when print result via 
hudi-cli
URL: https://github.com/apache/incubator-hudi/pull/1138#issuecomment-569001616
 
 
   It's a pity these methods have no unit tests; it would have been easy to 
find these issues if we did. Additionally, we can give more detailed information 
in the Jira description, just like what you did in the PR's description. WDYT?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[incubator-hudi] branch master updated (def18a5 -> 842eabb)

2019-12-25 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git.


from def18a5  [MINOR] optimize hudi timeline service (#1137)
 add 842eabb  [HUDI-470] Fix NPE when print result via hudi-cli (#1138)

No new revisions were added by this update.

Summary of changes:
 .../main/java/org/apache/hudi/cli/commands/HoodieLogFileCommand.java| 2 +-
 hudi-cli/src/main/java/org/apache/hudi/cli/commands/RepairsCommand.java | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)



[GitHub] [incubator-hudi] yanghua merged pull request #1138: [HUDI-470] Fix NPE when print result via hudi-cli

2019-12-25 Thread GitBox
yanghua merged pull request #1138: [HUDI-470] Fix NPE when print result via 
hudi-cli
URL: https://github.com/apache/incubator-hudi/pull/1138
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] cdmikechen edited a comment on issue #1139: [MINOR]Optimize hudi-client module

2019-12-25 Thread GitBox
cdmikechen edited a comment on issue #1139: [MINOR]Optimize hudi-client module
URL: https://github.com/apache/incubator-hudi/pull/1139#issuecomment-568963144
 
 
   maybe use `git merge --squash` first to squash into a single commit, please.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] cdmikechen commented on issue #1139: [MINOR]Optimize hudi-client module

2019-12-25 Thread GitBox
cdmikechen commented on issue #1139: [MINOR]Optimize hudi-client module
URL: https://github.com/apache/incubator-hudi/pull/1139#issuecomment-568963144
 
 
   maybe use `git merge --squash` first to squash into a single commit, please.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Closed] (HUDI-467) Query RT Table in Hive found java.lang.NoClassDefFoundError Exception

2019-12-25 Thread cdmikechen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

cdmikechen closed HUDI-467.
---
Resolution: Not A Problem

use hadoop-mr-bundle

> Query RT Table in Hive found java.lang.NoClassDefFoundError Exception
> -
>
> Key: HUDI-467
> URL: https://issues.apache.org/jira/browse/HUDI-467
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Hive Integration
>Reporter: cdmikechen
>Assignee: cdmikechen
>Priority: Major
>
> When creating a *MERGE_ON_READ* table in hudi and syncing it to hive, hudi 
> will create two tables named *table_name* and *table_name_rt*. When I query 
> *table_name_rt*, I get a *java.lang.NoClassDefFoundError* exception:
> {code}
> java.lang.RuntimeException: java.lang.NoClassDefFoundError: 
> org/apache/parquet/avro/AvroSchemaConverter
>   at 
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:89)
>  ~[hive-service-2.3.3.jar:2.3.3]
>   at 
> org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
>  ~[hive-service-2.3.3.jar:2.3.3]
>   at 
> org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
>  ~[hive-service-2.3.3.jar:2.3.3]
>   at java.security.AccessController.doPrivileged(Native Method) 
> ~[?:1.8.0_201]
>   at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_201]
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
>  ~[hadoop-common-2.8.5.jar:?]
>   at 
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
>  ~[hive-service-2.3.3.jar:2.3.3]
>   at com.sun.proxy.$Proxy47.fetchResults(Unknown Source) ~[?:?]
>   at 
> org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:559) 
> ~[hive-service-2.3.3.jar:2.3.3]
>   at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:751)
>  ~[hive-service-2.3.3.jar:2.3.3]
>   at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1717)
>  ~[hive-exec-2.3.3.jar:2.3.3]
>   at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1702)
>  ~[hive-exec-2.3.3.jar:2.3.3]
>   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) 
> ~[hive-exec-2.3.3.jar:2.3.3]
>   at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) 
> ~[hive-exec-2.3.3.jar:2.3.3]
>   at 
> org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
>  ~[hive-service-2.3.3.jar:2.3.3]
>   at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
>  ~[hive-exec-2.3.3.jar:2.3.3]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  [?:1.8.0_201]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  [?:1.8.0_201]
>   at java.lang.Thread.run(Thread.java:748) [?:1.8.0_201]
> Caused by: java.lang.NoClassDefFoundError: 
> org/apache/parquet/avro/AvroSchemaConverter
>   at 
> org.apache.hudi.hadoop.realtime.AbstractRealtimeRecordReader.init(AbstractRealtimeRecordReader.java:341)
>  ~[?:?]
>   at 
> org.apache.hudi.hadoop.realtime.AbstractRealtimeRecordReader.<init>(AbstractRealtimeRecordReader.java:108)
>  ~[?:?]
>   at 
> org.apache.hudi.hadoop.realtime.RealtimeCompactedRecordReader.<init>(RealtimeCompactedRecordReader.java:50)
>  ~[?:?]
>   at 
> org.apache.hudi.hadoop.realtime.HoodieRealtimeRecordReader.constructRecordReader(HoodieRealtimeRecordReader.java:69)
>  ~[?:?]
>   at 
> org.apache.hudi.hadoop.realtime.HoodieRealtimeRecordReader.<init>(HoodieRealtimeRecordReader.java:47)
>  ~[?:?]
>   at 
> org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat.getRecordReader(HoodieParquetRealtimeInputFormat.java:254)
>  ~[?:?]
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.getRecordReader(FetchOperator.java:695)
>  ~[hive-exec-2.3.3.jar:2.3.3]
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:333)
>  ~[hive-exec-2.3.3.jar:2.3.3]
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:459)
>  ~[hive-exec-2.3.3.jar:2.3.3]
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:428) 
> ~[hive-exec-2.3.3.jar:2.3.3]
>   at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:147) 
> ~[hive-exec-2.3.3.jar:2.3.3]
>   at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2208) 
> ~[hive-exec-2.3.3.jar:2.3.3]
>   at 
> 

[GitHub] [incubator-hudi] cdmikechen edited a comment on issue #1075: [HUDI-114]: added option to overwrite payload implementation in hoodie.properties file

2019-12-25 Thread GitBox
cdmikechen edited a comment on issue #1075: [HUDI-114]: added option to 
overwrite payload implementation in hoodie.properties file
URL: https://github.com/apache/incubator-hudi/pull/1075#issuecomment-568866635
 
 
   Can I ask an additional question?
   I found that on the first commit hudi will initialize the table type in hoodie.properties:
   
https://github.com/apache/incubator-hudi/blob/313fab5fd1ef715f98a123d0e09f6010daacab68/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala#L126
   And this code links to 
   
https://github.com/apache/incubator-hudi/blob/9a1f698eef103adadbf7a1bf7b5eb94fb84e/hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableMetaClient.java#L286-L290
   Then links to 
   
https://github.com/apache/incubator-hudi/blob/9a1f698eef103adadbf7a1bf7b5eb94fb84e/hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableMetaClient.java#L300-L318
   Does this mean that no matter what the user sets for `payloadClassName`, hudi always 
recognizes 
`HoodieTableConfig.HOODIE_PAYLOAD_CLASS_PROP_NAME` (`hoodie.compaction.payload.class`)
 as the default value? Should it be fixed?
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


Build failed in Jenkins: hudi-snapshot-deployment-0.5 #140

2019-12-25 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 2.17 KB...]
/home/jenkins/tools/maven/apache-maven-3.5.4/bin:
m2.conf
mvn
mvn.cmd
mvnDebug
mvnDebug.cmd
mvnyjp

/home/jenkins/tools/maven/apache-maven-3.5.4/boot:
plexus-classworlds-2.5.2.jar

/home/jenkins/tools/maven/apache-maven-3.5.4/conf:
logging
settings.xml
toolchains.xml

/home/jenkins/tools/maven/apache-maven-3.5.4/conf/logging:
simplelogger.properties

/home/jenkins/tools/maven/apache-maven-3.5.4/lib:
aopalliance-1.0.jar
cdi-api-1.0.jar
cdi-api.license
commons-cli-1.4.jar
commons-cli.license
commons-io-2.5.jar
commons-io.license
commons-lang3-3.5.jar
commons-lang3.license
ext
guava-20.0.jar
guice-4.2.0-no_aop.jar
jansi-1.17.1.jar
jansi-native
javax.inject-1.jar
jcl-over-slf4j-1.7.25.jar
jcl-over-slf4j.license
jsr250-api-1.0.jar
jsr250-api.license
maven-artifact-3.5.4.jar
maven-artifact.license
maven-builder-support-3.5.4.jar
maven-builder-support.license
maven-compat-3.5.4.jar
maven-compat.license
maven-core-3.5.4.jar
maven-core.license
maven-embedder-3.5.4.jar
maven-embedder.license
maven-model-3.5.4.jar
maven-model-builder-3.5.4.jar
maven-model-builder.license
maven-model.license
maven-plugin-api-3.5.4.jar
maven-plugin-api.license
maven-repository-metadata-3.5.4.jar
maven-repository-metadata.license
maven-resolver-api-1.1.1.jar
maven-resolver-api.license
maven-resolver-connector-basic-1.1.1.jar
maven-resolver-connector-basic.license
maven-resolver-impl-1.1.1.jar
maven-resolver-impl.license
maven-resolver-provider-3.5.4.jar
maven-resolver-provider.license
maven-resolver-spi-1.1.1.jar
maven-resolver-spi.license
maven-resolver-transport-wagon-1.1.1.jar
maven-resolver-transport-wagon.license
maven-resolver-util-1.1.1.jar
maven-resolver-util.license
maven-settings-3.5.4.jar
maven-settings-builder-3.5.4.jar
maven-settings-builder.license
maven-settings.license
maven-shared-utils-3.2.1.jar
maven-shared-utils.license
maven-slf4j-provider-3.5.4.jar
maven-slf4j-provider.license
org.eclipse.sisu.inject-0.3.3.jar
org.eclipse.sisu.inject.license
org.eclipse.sisu.plexus-0.3.3.jar
org.eclipse.sisu.plexus.license
plexus-cipher-1.7.jar
plexus-cipher.license
plexus-component-annotations-1.7.1.jar
plexus-component-annotations.license
plexus-interpolation-1.24.jar
plexus-interpolation.license
plexus-sec-dispatcher-1.4.jar
plexus-sec-dispatcher.license
plexus-utils-3.1.0.jar
plexus-utils.license
slf4j-api-1.7.25.jar
slf4j-api.license
wagon-file-3.1.0.jar
wagon-file.license
wagon-http-3.1.0-shaded.jar
wagon-http.license
wagon-provider-api-3.1.0.jar
wagon-provider-api.license

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/ext:
README.txt

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native:
freebsd32
freebsd64
linux32
linux64
osx
README.txt
windows32
windows64

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/osx:
libjansi.jnilib

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows32:
jansi.dll

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows64:
jansi.dll
Finished /home/jenkins/tools/maven/apache-maven-3.5.4 Directory Listing :
Detected current version as: 
'HUDI_home=
0.5.1-SNAPSHOT'
[INFO] Scanning for projects...
[INFO] 
[INFO] Reactor Build Order:
[INFO] 
[INFO] Hudi   [pom]
[INFO] hudi-common[jar]
[INFO] hudi-timeline-service  [jar]
[INFO] hudi-hadoop-mr [jar]
[INFO] hudi-client[jar]
[INFO] hudi-hive  [jar]
[INFO] hudi-spark [jar]
[INFO] hudi-utilities [jar]
[INFO] hudi-cli   [jar]
[INFO] hudi-hadoop-mr-bundle  [jar]
[INFO] hudi-hive-bundle   [jar]
[INFO] hudi-spark-bundle  [jar]
[INFO] hudi-presto-bundle [jar]
[INFO] hudi-utilities-bundle  [jar]
[INFO] hudi-timeline-server-bundle

[GitHub] [incubator-hudi] yanghua commented on issue #626: Adding documentation for hudi test suite

2019-12-25 Thread GitBox
yanghua commented on issue #626: Adding documentation for hudi test suite
URL: https://github.com/apache/incubator-hudi/pull/626#issuecomment-568950880
 
 
   Hi @vinothchandar and @n3nash, can we move it into the root dir of 
`hudi-test-suite` as a README file?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] cdmikechen edited a comment on issue #1073: [HUDI-377] Adding Delete() support to DeltaStreamer

2019-12-25 Thread GitBox
cdmikechen edited a comment on issue #1073: [HUDI-377] Adding Delete() support 
to DeltaStreamer
URL: https://github.com/apache/incubator-hudi/pull/1073#issuecomment-568942977
 
 
   @vinothchandar 
   > Are you asking for incremental pull to provide both the before and after 
images of a record, like how 
   > Oracle ogg CDC Stream is? if so, this is a much larger feature.. we can 
discuss on a separate JIRA.
   
   I mean, can hudi get incremental data via the spark datasource api, like
   ```java
   Dataset<Row> hoodieIncViewDF = spark.read()
       .format("org.apache.hudi")
       .option(DataSourceReadOptions.VIEW_TYPE_OPT_KEY(),
           DataSourceReadOptions.VIEW_TYPE_INCREMENTAL_OPT_VAL())
       .option(DataSourceReadOptions.BEGIN_INSTANTTIME_OPT_KEY(),
           "<begin_instant_time>")
       .load(tablePath);
   ```
   Or a Hive query like
   ```sql
   set hoodie.lims_method.consume.mode=INCREMENTAL;
   set hoodie.lims_method.consume.start.timestamp=<timestamp>;
   set hoodie.lims_method.consume.max.commits=1;
   select `_hoodie_commit_time`, <columns> from table_name 
   where `_hoodie_commit_time` >= '<timestamp>';
   ```
   Should we also support some incremental view, api, or method to get deleted 
rows after a delete action? I think this should be considered at the same time, 
after this issue and other related issues are submitted.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] listenLearning commented on issue #1135: [HUDI-233]Redo log statements using SLF4J

2019-12-25 Thread GitBox
listenLearning commented on issue #1135: [HUDI-233]Redo log statements using 
SLF4J
URL: https://github.com/apache/incubator-hudi/pull/1135#issuecomment-568947384
 
 
   copy that


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] listenLearning closed pull request #1135: [HUDI-233]Redo log statements using SLF4J

2019-12-25 Thread GitBox
listenLearning closed pull request #1135: [HUDI-233]Redo log statements using 
SLF4J
URL: https://github.com/apache/incubator-hudi/pull/1135
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] leesf commented on issue #1135: [HUDI-233]Redo log statements using SLF4J

2019-12-25 Thread GitBox
leesf commented on issue #1135: [HUDI-233]Redo log statements using SLF4J
URL: https://github.com/apache/incubator-hudi/pull/1135#issuecomment-568946135
 
 
   > @leesf @lamber-ken who were looking into this before..
   > @leesf could you shepherd this one? My concerns are mostly around all 
bundles working smoothly.
   > 
   > IIUC this PR is low touch.. Just adds log4j facade and changes code.. We 
need to verify logs do show up on all of Hive, presto, spark logs by running 
through the demo steps..
   
   Yes, the redo work should be verified carefully.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] SteNicholas opened a new pull request #1139: [MINOR]Optimize hudi-client module

2019-12-25 Thread GitBox
SteNicholas opened a new pull request #1139: [MINOR]Optimize hudi-client module
URL: https://github.com/apache/incubator-hudi/pull/1139
 
 
   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   
   Optimize hudi-client module code, including simplifying lambda expressions, 
fixing spelling mistakes, and removing unused variables.
   
   ## Brief change log
   
 - Simplify lambda expressions.
 - Fix spelling mistakes.
 - Remove unused variables.
   
   ## Verify this pull request
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] cdmikechen commented on a change in pull request #1073: [HUDI-377] Adding Delete() support to DeltaStreamer

2019-12-25 Thread GitBox
cdmikechen commented on a change in pull request #1073: [HUDI-377] Adding 
Delete() support to DeltaStreamer
URL: https://github.com/apache/incubator-hudi/pull/1073#discussion_r361347283
 
 

 ##
 File path: 
hudi-spark/src/main/java/org/apache/hudi/OverwriteWithLatestAvroPayload.java
 ##
 @@ -61,8 +60,15 @@ public OverwriteWithLatestAvroPayload 
preCombine(OverwriteWithLatestAvroPayload
 
   @Override
  public Option<IndexedRecord> combineAndGetUpdateValue(IndexedRecord 
currentValue, Schema schema) throws IOException {
+
 
 Review comment:
   @vinothchandar 
   > Doing this in getInsertValue() means even inserts with the flag set will 
be deleted.. Not sure if this is intended behavior.. We only want to delete if 
updating and marker set?
   
   If this is in a Kappa architecture, it works. But in a Lambda-style 
architecture, data sometimes needs to be rebuilt, and it may get the whole 
data change log by bulk insert.
   Of course, this is just my assumption; maybe this case hasn't come up in our 
tests at present. If I am overthinking it and it can't occur in actual cases, 
please ignore my review.
   
   > do you have a performance concern here? `Option.of` should be very cheap 
right.. In any case, we can achieve the effect of what you mean, by simply 
hanging onto the original Option[GenericRecord]?
   
   Yes, `Option.of` may allocate another object. I personally feel that if an 
object already exists, unless there is a specific need, we should try to use 
the original object instead of creating a new one.
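   
   For context, a self-contained sketch of the delete-marker check being 
debated (the `_hoodie_is_deleted` field name follows the HUDI-377 discussion 
but is an assumption here, and this helper is illustrative rather than the 
PR's actual diff):
   ```java
   import org.apache.avro.generic.GenericRecord;

   final class DeleteMarkerUtil {
     // Returns true only when the record's schema declares the marker field
     // and the record carries a true value; records without the field are
     // treated as regular upserts.
     static boolean isDeleted(GenericRecord record) {
       if (record.getSchema().getField("_hoodie_is_deleted") == null) {
         return false;
       }
       Object flag = record.get("_hoodie_is_deleted");
       return flag instanceof Boolean && (Boolean) flag;
     }
   }
   ```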


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] cdmikechen commented on a change in pull request #1073: [HUDI-377] Adding Delete() support to DeltaStreamer

2019-12-25 Thread GitBox
cdmikechen commented on a change in pull request #1073: [HUDI-377] Adding 
Delete() support to DeltaStreamer
URL: https://github.com/apache/incubator-hudi/pull/1073#discussion_r361347283
 
 

 ##
 File path: 
hudi-spark/src/main/java/org/apache/hudi/OverwriteWithLatestAvroPayload.java
 ##
 @@ -61,8 +60,15 @@ public OverwriteWithLatestAvroPayload 
preCombine(OverwriteWithLatestAvroPayload
 
   @Override
  public Option<IndexedRecord> combineAndGetUpdateValue(IndexedRecord 
currentValue, Schema schema) throws IOException {
+
 
 Review comment:
   @vinothchandar 
   > Doing this in getInsertValue() means even inserts with the flag set will 
be deleted.. Not sure if this is intended behavior.. We only want to delete if 
updating and marker set?
   
   If this is in a Kappa architecture, it works. But in a Lambda-style 
architecture, data sometimes needs to be rebuilt, and it may get the whole 
data change log by bulk insert.
   Of course, this is just my assumption; maybe this case hasn't come up in our 
tests at present. If I am overthinking it and it can't occur in actual cases, 
please ignore my review.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Updated] (HUDI-115) Enhance OverwriteWithLatestAvroPayload to also respect ordering value of record in storage

2019-12-25 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-115:

Status: New  (was: Open)

> Enhance OverwriteWithLatestAvroPayload to also respect ordering value of 
> record in storage
> --
>
> Key: HUDI-115
> URL: https://issues.apache.org/jira/browse/HUDI-115
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Spark Integration
>Reporter: Vinoth Chandar
>Assignee: Yuanbin Cheng
>Priority: Major
>
> https://lists.apache.org/thread.html/45035cc88901b37e3f985b72def90ee5529c4caf87e48d650c00327d@
>  
> context here 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-47) Revisit null checks in the Log Blocks, merge lazyreading with this null check #340

2019-12-25 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-47?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-47:
---
Component/s: Code Cleanup

> Revisit null checks in the Log Blocks, merge lazyreading with this null check 
> #340
> --
>
> Key: HUDI-47
> URL: https://issues.apache.org/jira/browse/HUDI-47
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Code Cleanup, Storage Management
>Reporter: Vinoth Chandar
>Priority: Major
>
> https://github.com/uber/hudi/issues/340



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-155) Expose a merge-policy enum

2019-12-25 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-155:

Status: New  (was: Open)

> Expose a merge-policy enum
> --
>
> Key: HUDI-155
> URL: https://issues.apache.org/jira/browse/HUDI-155
> Project: Apache Hudi (incubating)
>  Issue Type: Task
>  Components: Writer Core
>Reporter: Nishith Agarwal
>Priority: Major
>
> Configs such as { PREFER_LATEST, PREFER_EARLIEST, CUSTOM } to allow the user 
> to choose the type of merge based on the payload implementation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-72) Cap number of Spark partitions during write/compaction phases

2019-12-25 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-72?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-72:
---
Status: New  (was: Open)

> Cap number of Spark partitions during write/compaction phases
> -
>
> Key: HUDI-72
> URL: https://issues.apache.org/jira/browse/HUDI-72
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: Writer Core
>Reporter: Vinoth Chandar
>Assignee: omkar vinit joshi
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-33) Introduce config to allow users to control case-sensitivity in column projections #431

2019-12-25 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-33?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-33:
---
Status: New  (was: Open)

> Introduce config to allow users to control case-sensitivity in column 
> projections #431
> --
>
> Key: HUDI-33
> URL: https://issues.apache.org/jira/browse/HUDI-33
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Spark Integration
>Reporter: Vinoth Chandar
>Priority: Major
>
> https://github.com/uber/hudi/issues/431



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-202) Remove all one-time changes done for seamlessly migrating users from com.uber.hoodie to org.apache.hudi

2019-12-25 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-202:

Status: New  (was: Open)

> Remove all one-time changes done for seamlessly migrating users from 
> com.uber.hoodie to org.apache.hudi
> ---
>
> Key: HUDI-202
> URL: https://issues.apache.org/jira/browse/HUDI-202
> Project: Apache Hudi (incubating)
>  Issue Type: Task
>  Components: Release & Administrative
>Reporter: Balaji Varadarajan
>Priority: Major
>
> Original PR: [https://github.com/apache/incubator-hudi/pull/830]
> Wiki: 
> [https://cwiki.apache.org/confluence/display/HUDI/Migration+Guide+From+com.uber.hoodie+to+org.apache.hudi]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-23) Consider adding split pruning support #504

2019-12-25 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-23?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-23:
---
Status: New  (was: Open)

> Consider adding split pruning support #504
> --
>
> Key: HUDI-23
> URL: https://issues.apache.org/jira/browse/HUDI-23
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: Hive Integration
>Reporter: Vinoth Chandar
>Assignee: Bhavani Sudha Saktheeswaran
>Priority: Major
>
> https://github.com/uber/hudi/issues/504



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-55) Investigate support for bucketed tables ala Hive #74

2019-12-25 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-55?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-55:
---
Status: New  (was: Open)

> Investigate support for bucketed tables ala Hive #74
> 
>
> Key: HUDI-55
> URL: https://issues.apache.org/jira/browse/HUDI-55
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: Hive Integration
>Reporter: Vinoth Chandar
>Priority: Major
>
> https://github.com/uber/hudi/issues/74



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-24) Caching .hoodie file in inputformat to reduce lookups #503

2019-12-25 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-24?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-24:
---
Status: New  (was: Open)

> Caching .hoodie file in inputformat to reduce lookups #503
> --
>
> Key: HUDI-24
> URL: https://issues.apache.org/jira/browse/HUDI-24
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: Hive Integration
>Reporter: Vinoth Chandar
>Assignee: Bhavani Sudha Saktheeswaran
>Priority: Major
>
> https://github.com/uber/hudi/issues/503



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-145) Limit the amount of partitions considered for GlobalBloomIndex

2019-12-25 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-145:

Status: New  (was: Open)

> Limit the amount of partitions considered for GlobalBloomIndex
> --
>
> Key: HUDI-145
> URL: https://issues.apache.org/jira/browse/HUDI-145
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Index, newbie
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
>
> Currently, the global bloom index will check inputs against files in all 
> partitions. In a lot of cases, the user may clearly know the range of 
> partitions actually impacted by updates (e.g. the upstream system drops 
> updates older than a year, ...). In such a scenario, it may make sense to 
> support an option for global bloom to control how many partitions to match 
> against, to gain performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-95) Support IgniteFS/igfs storage via Hudi

2019-12-25 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-95?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-95:
---
Status: New  (was: Open)

> Support IgniteFS/igfs storage via Hudi
> --
>
> Key: HUDI-95
> URL: https://issues.apache.org/jira/browse/HUDI-95
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: Storage Management
>Reporter: Vinoth Chandar
>Priority: Major
>
> [https://github.com/vinothchandar/incubator-hudi/commit/dd578947ec1db9388038f0a1863a90b3761cd571]
>  
> has some test code. It seems to work with appends also.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-284) Need Tests for Hudi handling of schema evolution

2019-12-25 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-284:

Component/s: Testing

> Need Tests for Hudi handling of schema evolution
> -
>
> Key: HUDI-284
> URL: https://issues.apache.org/jira/browse/HUDI-284
> Project: Apache Hudi (incubating)
>  Issue Type: Test
>  Components: Common Core, newbie, Testing
>Reporter: Balaji Varadarajan
>Priority: Major
>
> Context in : 
> https://github.com/apache/incubator-hudi/pull/927#pullrequestreview-293449514



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-64) Estimation of compression ratio & other dynamic storage knobs based on historical stats

2019-12-25 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-64?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-64:
---
Status: New  (was: Open)

> Estimation of compression ratio & other dynamic storage knobs based on 
> historical stats
> ---
>
> Key: HUDI-64
> URL: https://issues.apache.org/jira/browse/HUDI-64
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: Storage Management, Writer Core
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
>
> Something core to Hudi writing is using heuristics or runtime workload 
> statistics to optimize aspects of storage like file sizes, partitioning and 
> so on.  
> Below lists all such places. 
>  
>  # Compression ratio for parquet 
> [https://github.com/apache/incubator-hudi/blob/a4f9d7575f39bb79089714049ffea12ba5f25ec8/hudi-client/src/main/java/org/apache/hudi/config/HoodieStorageConfig.java#L46]
>  . This is used by HoodieWrapperFileSystem, to estimate amount of bytes it 
> has written for a given parquet file and closes the parquet file once the 
> configured size has reached. DFSOutputStream level we only know bytes written 
> before compression. Once enough data has been written, it should be possible 
> to replace this by a simple estimate of what the avg record size would be 
> (commit metadata would give you size and number of records in each file)
>  # Very similar problem exists for log files 
> [https://github.com/apache/incubator-hudi/blob/a4f9d7575f39bb79089714049ffea12ba5f25ec8/hudi-client/src/main/java/org/apache/hudi/config/HoodieStorageConfig.java#L52]
>  We write data into logs in avro and can log updates to same record in 
> parquet multiple times. We need to estimate again how large the log file(s) 
> can grow to, and still we would be able to produce a parquet file of 
> configured size during compaction. (hope I conveyed this clearly)
>  # WorkloadProfile : 
> [https://github.com/apache/incubator-hudi/blob/b19bed442d84c1cb1e48d184c9554920735bcb6c/hudi-client/src/main/java/org/apache/hudi/table/WorkloadProfile.java]
>  caches the input records using Spark Caching and computes the shape of the 
> workload, i.e how many records per partition, how many inserts vs updates 
> etc. This is used by the Partitioner here 
> [https://github.com/apache/incubator-hudi/blob/b19bed442d84c1cb1e48d184c9554920735bcb6c/hudi-client/src/main/java/org/apache/hudi/table/HoodieCopyOnWriteTable.java#L141]
>  for assigning records to a file group. This is the critical one to replace 
> for Flink support and probably the hardest, since we need to guess input, 
> which is not always possible? 
>  # Within partitioner, we already derive a simple average size per record 
> [https://github.com/apache/incubator-hudi/blob/b19bed442d84c1cb1e48d184c9554920735bcb6c/hudi-client/src/main/java/org/apache/hudi/table/HoodieCopyOnWriteTable.java#L756]
>  from the last commit metadata alone. This can be generalized.  (default : 
> [https://github.com/apache/incubator-hudi/blob/b19bed442d84c1cb1e48d184c9554920735bcb6c/hudi-client/src/main/java/org/apache/hudi/config/HoodieCompactionConfig.java#L71])
>  
> Our goal in this Jira is to see if we could derive this information in the 
> background purely using the commit metadata. Some parts of this are 
> open-ended. A good starting point would be to see what's feasible and 
> estimate ROI before actually implementing.
>  
>  
>  
>  
>  
>  
> Roughly along the likes of. [https://github.com/uber/hudi/issues/270] 
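
As a toy illustration of item 4 above (a back-of-envelope sketch only; 
`CommitStats` and its fields are made up here, not Hudi's real commit-metadata 
classes), the average record size falls out of commit metadata as total bytes 
written divided by total records written:
{code:java}
import java.util.List;

class CommitStats {
  // Stand-ins for what commit metadata reports per file:
  long bytesWritten;
  long recordsWritten;
}

class RecordSizeEstimator {
  // Averages record size over recent commits, falling back to a configured
  // default for tables with no history yet.
  static long avgRecordSize(List<CommitStats> recentCommits, long fallbackBytes) {
    long bytes = 0;
    long records = 0;
    for (CommitStats c : recentCommits) {
      bytes += c.bytesWritten;
      records += c.recordsWritten;
    }
    return records > 0 ? bytes / records : fallbackBytes;
  }
}
{code}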



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-103) Revisit handling of pending compaction in file-system view

2019-12-25 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-103:

Status: New  (was: Open)

> Revisit handling of pending compaction in file-system view
> --
>
> Key: HUDI-103
> URL: https://issues.apache.org/jira/browse/HUDI-103
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Common Core
>Reporter: Balaji Varadarajan
>Priority: Minor
>
> This came out of code review for Timeline Server. 
> [https://github.com/apache/incubator-hudi/pull/600/files#r276825586]
> We need to investigate if the pending compaction operation can be handled in 
> a better way so that it can easily be applied to File Stitching as well.
>  
> The current approach stems from the requirement that we have to satisfy both 
> use-case : 
>  1. Fetch latest file slice without any regard for pending compaction for 
> appending
>  2. Fetch Merged file slices for realtime views with pending compaction 
> factored
> To avoid duplicate storage, we are doing lazy merging due to pending 
> compaction. 
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-241) Track per column level statistics for each file

2019-12-25 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-241:

Status: New  (was: Open)

> Track per column level statistics for each file 
> 
>
> Key: HUDI-241
> URL: https://issues.apache.org/jira/browse/HUDI-241
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Common Core
>Reporter: Balaji Varadarajan
>Priority: Major
>
> Hudi currently maintains statistics for record keys. We should collect 
> similar statistics for other columns and expose them from the timeline server.
> Query engines can then be integrated with the timeline server to make use of 
> this information.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-26) Introduce a way to collapse filegroups into one and reindex #491

2019-12-25 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-26?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-26:
---
Status: New  (was: Open)

> Introduce a way to collapse filegroups into one and reindex #491
> 
>
> Key: HUDI-26
> URL: https://issues.apache.org/jira/browse/HUDI-26
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Common Core, Writer Core
>Reporter: Vinoth Chandar
>Priority: Major
>
> https://github.com/uber/hudi/issues/491



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-112) Supporting a Collapse type of operation

2019-12-25 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-112:

Status: New  (was: Open)

> Supporting a Collapse type of operation
> ---
>
> Key: HUDI-112
> URL: https://issues.apache.org/jira/browse/HUDI-112
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: Common Core
>Reporter: Nishith Agarwal
>Assignee: Nishith Agarwal
>Priority: Major
>
> Currently, for COPY_ON_WRITE tables, Hudi automatically adjusts small files by 
> packing inserts and sending them over to a particular file based on the small 
> file size limits set in the client config.
> One of the side effects of this is that the time taken to rewrite the small 
> files into larger ones is borne by the writer (or the ingestor). In cases 
> where we continuously want really low ingestion latency ( < 5 mins ), having 
> the writer enlarge the small files may not be preferable.
> It would help if the writer could schedule a collapse sort of operation that 
> is later picked up asynchronously by a job/thread (different from the 
> ingestor), collapsing N files into M files and thereby also enlarging the 
> file sizes.
> The mechanism should support different strategies for scheduling collapse so 
> we can perform even smarter data layout during such rewriting, for eg., group 
> certain record_keys together in a single file from N different files to allow 
> for better query performance and more.
> MERGE_ON_READ on the other hand solves this in a different way. We can send 
> inserts to log files (for a base columnar file) and when the compaction kicks 
> in, it would automatically resize the file. However, the reader (realtime 
> query) would have to pay a small penalty here to merge the log files with the 
> base columnar files to get the freshest data. 
> In any case, we need a mechanism to collapse older smaller files into larger 
> ones while also keeping the query cost low. Creating this ticket to discuss 
> more around this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-16) Merkle tree based Hoodie dataset integrity #527

2019-12-25 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-16?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-16:
---
Status: New  (was: Open)

> Merkle tree based Hoodie dataset integrity #527
> ---
>
> Key: HUDI-16
> URL: https://issues.apache.org/jira/browse/HUDI-16
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Common Core, Storage Management, Writer Core
>Reporter: Vinoth Chandar
>Priority: Major
>
> https://github.com/uber/hudi/issues/527



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-388) Support DDL / DML SparkSQL statements which useful for admins

2019-12-25 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-388:

Status: New  (was: Open)

> Support DDL / DML SparkSQL statements which useful for admins
> -
>
> Key: HUDI-388
> URL: https://issues.apache.org/jira/browse/HUDI-388
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: CLI
>Reporter: lamber-ken
>Assignee: lamber-ken
>Priority: Major
>
> *Purpose*
> Currently, hudi offers some tools to operate an ecosystem of Hudi datasets, 
> including hudi-cli, metrics, and the spark ui[1]. It would be easier for 
> admins to manage hudi datasets via customized DDL SQL statements instead of 
> via hudi-cli.
>  
> After SPARK-18127, we can customize the spark session with our own optimizer, 
> parser, analyzer, and physical plan strategy rules in Spark. Here are the 
> steps to extend the spark session:
> 1, We need a tool to parse the SparkSQL statements, like antlr or RegExp.
> 2, A class which extends org.apache.spark.sql.SparkSessionExtensions and 
> injects the parser.
> 3, Run the customized statements by extending 
> org.apache.spark.sql.execution.command.RunnableCommand.
>  
> *Demo*
> 1, Extend SparkSessionExtensions
> {code:java}
> class HudiSparkSessionExtension extends (SparkSessionExtensions => Unit) {
>   override def apply(extensions: SparkSessionExtensions): Unit = {
> extensions.injectParser { (session, parser) =>
>   new HudiDDLParser(parser)
> }
>   }
> } {code}
>  
> 2, Extend RunnableCommand
> {code:java}
> case class HudiStatCommand(path: String) extends RunnableCommand {
>   override val output: Seq[Attribute] = {
> Seq(
>   AttributeReference("CommitTime", StringType, nullable = false)(),
>   AttributeReference("Total Upserted", IntegerType, nullable = false)(),
>   AttributeReference("Total Written", IntegerType, nullable = false)(),
>   AttributeReference("Write Amplifiation Factor", DoubleType, nullable = 
> false)()
> )
>   }
>   override def run(sparkSession: SparkSession): Seq[Row] = {
> Seq(
>   Row("20191207003131", 0, 10, 0.1),
>   Row("20191207003200", 4, 10, 2.50),
>   Row("Total", 4, 20, 5.00)
> )
>   }
> }
> {code}
>  
> 3, demo result, mock data
> {code:java}
> +--------------+--------------+-------------+--------------------------+
> |CommitTime    |Total Upserted|Total Written|Write Amplification Factor|
> +--------------+--------------+-------------+--------------------------+
> |20191207003131|             0|           10|                       0.1|
> |20191207003200|             4|           10|                       2.5|
> |         Total|             4|           20|                       5.0|
> +--------------+--------------+-------------+--------------------------+
> {code}
>  
> [https://github.com/lamber-ken/hudi-work]
> [http://hudi.apache.org/admin_guide.html]
> https://issues.apache.org/jira/browse/SPARK-18127
>  
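
A usage sketch for wiring in such an extension (assumptions: the fully 
qualified class name `org.apache.hudi.HudiSparkSessionExtension` and the 
`HUDI STATS` statement syntax are illustrative placeholders; SPARK-18127 does 
allow injecting a `SparkSessionExtensions => Unit` class through the 
`spark.sql.extensions` config):
{code:java}
import org.apache.spark.sql.SparkSession;

public class HudiDDLDemo {
  public static void main(String[] args) {
    // SPARK-18127 config hook; the extension class name is illustrative and
    // assumes the demo extension above is on the classpath.
    SparkSession spark = SparkSession.builder()
        .master("local[*]")
        .config("spark.sql.extensions", "org.apache.hudi.HudiSparkSessionExtension")
        .getOrCreate();

    // A statement recognized by HudiDDLParser would route to HudiStatCommand;
    // the exact SQL syntax here is a placeholder, not real Hudi syntax.
    spark.sql("HUDI STATS '/path/to/hudi/table'").show();
  }
}
{code}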



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-404) Compile Master's Source Code Error

2019-12-25 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-404:

Status: New  (was: Open)

> Compile Master's Source Code Error
> ---
>
> Key: HUDI-404
> URL: https://issues.apache.org/jira/browse/HUDI-404
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Usability
>Reporter: Xurenhe
>Priority: Major
>  Labels: pull-request-available
> Attachments: hudi-compile-error.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Hi, I downloaded the source code of Hudi. When I used the command 'mvn clean 
> package -DskipTests -DskipITs' to compile this project, some errors happened.
> I checked the maven dependencies and found one missing dependency: 
> 'com.google.code.findbugs:jsr305'.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-443) Add slides for Hadoop summit 2019, Bangalore to powered-by page

2019-12-25 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-443:

Status: In Progress  (was: Open)

> Add slides for Hadoop summit 2019, Bangalore to powered-by page
> ---
>
> Key: HUDI-443
> URL: https://issues.apache.org/jira/browse/HUDI-443
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Docs, newbie
>Reporter: Pratyaksh Sharma
>Assignee: Pratyaksh Sharma
>Priority: Major
> Fix For: 0.5.1
>
>
> Add slides for the talk on Apache Hudi and debezium at Hadoop summit 2019, 
> Bangalore to powered-by page



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-352) The official documentation about project structure missed hudi-timeline-service module

2019-12-25 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-352:

Status: New  (was: Open)

> The official documentation about project structure missed 
> hudi-timeline-service module
> --
>
> Key: HUDI-352
> URL: https://issues.apache.org/jira/browse/HUDI-352
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Docs
>Reporter: vinoyang
>Priority: Major
>  Labels: starter
>
> The official documentation about the project structure[1] misses the 
> hudi-timeline-service module; we should add it.
> [1]: http://hudi.apache.org/contributing.html#code--project-structure



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-281) HiveSync failure through Spark when useJdbc is set to false

2019-12-25 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-281:

Status: New  (was: Open)

> HiveSync failure through Spark when useJdbc is set to false
> ---
>
> Key: HUDI-281
> URL: https://issues.apache.org/jira/browse/HUDI-281
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Hive Integration, Spark Integration, Usability
>Reporter: Udit Mehrotra
>Priority: Major
>
> Table creation with Hive sync through Spark fails when I set *useJdbc* to 
> *false*. Currently I had to modify the code to set *useJdbc* to *false*, as 
> there is no *DataSourceOption* through which I can specify this field when 
> running Hudi code.
> {noformat}
> java.lang.NoSuchMethodError: 
> org.apache.hadoop.hive.ql.session.SessionState.start(Lorg/apache/hudi/org/apache/hadoop_hive/conf/HiveConf;)Lorg/apache/hadoop/hive/ql/session/SessionState;
>   at 
> org.apache.hudi.hive.HoodieHiveClient.updateHiveSQLs(HoodieHiveClient.java:527)
>   at 
> org.apache.hudi.hive.HoodieHiveClient.updateHiveSQLUsingHiveDriver(HoodieHiveClient.java:517)
>   at 
> org.apache.hudi.hive.HoodieHiveClient.updateHiveSQL(HoodieHiveClient.java:507)
>   at 
> org.apache.hudi.hive.HoodieHiveClient.createTable(HoodieHiveClient.java:272)
>   at org.apache.hudi.hive.HiveSyncTool.syncSchema(HiveSyncTool.java:132)
>   at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:96)
>   at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:68)
>   at 
> org.apache.hudi.HoodieSparkSqlWriter$.syncHive(HoodieSparkSqlWriter.scala:235)
>   at 
> org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:169)
>   at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:91)
>   at 
> org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:156)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>   at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
>   at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
>   at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
>   at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
>   at 
> org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
>   at 
> org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
>   at 
> org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
>   at 
> org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
>   at 
> org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
>   at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
>   at 
> org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229){noformat}
> I was expecting this to fail through Spark, because *hive-exec* is not shaded 
> inside *hudi-spark-bundle*, while *HiveConf* is shaded and relocated. This 
> *SessionState* is coming from the spark-hive jar, and obviously it does not 
> accept the relocated *HiveConf*.
> We in *EMR* are running into the same problem when trying to integrate with 
> the Glue Catalog. For this we have to create the Hive metastore client through 
> *Hive.get(conf).getMsc()* instead of how it is being done now, so that 
> alternate implementations of the metastore can be created. However, because 
> hive-exec is not shaded but HiveConf is relocated, we run into the same issues 
> there.
> Shading *hive-exec* would not be recommended either, because it is itself an 
> uber jar that shades a lot of things, and all of them would end up in the 
> *hudi-spark-bundle* jar. We would not want to go down that route. That is why 
> we suggest considering removing any shading of Hive libraries.

[jira] [Updated] (HUDI-110) Better defaults for Partition extractor for Spark DataSOurce and DeltaStreamer

2019-12-25 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-110:

Status: New  (was: Open)

> Better defaults for Partition extractor for Spark DataSOurce and DeltaStreamer
> --
>
> Key: HUDI-110
> URL: https://issues.apache.org/jira/browse/HUDI-110
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: DeltaStreamer, Spark Integration, Usability
>Reporter: Balaji Varadarajan
>Priority: Minor
>
> Currently, SlashEncodedDayPartitionValueExtractor is the default being used. 
> This is not a common format outside Uber.
>  
> Also, the Spark DataSource API provides a partitionBy clause, which has not 
> been integrated for the Hudi Data Source. We need to investigate how we can 
> leverage the partitionBy clause for partitioning; a sketch of the two styles 
> follows this description.
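
For illustration, a minimal sketch (an editor's illustration, not from the issue) of the two styles being compared, assuming a DataFrame `df` and a target `basePath`:
{code:java}
// Today: the partition path is configured via a Hudi write option
df.write().format("org.apache.hudi")
    .option("hoodie.datasource.write.partitionpath.field", "location")
    .mode("append")
    .save(basePath);

// Being proposed: let Spark's own partitionBy clause drive partitioning
// (not yet integrated for the Hudi datasource)
df.write().format("org.apache.hudi")
    .partitionBy("location")
    .mode("append")
    .save(basePath);
{code}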



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-142) GCS: Hoodie-CLI failing with NoSuchMethodError:

2019-12-25 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-142:

Status: New  (was: Open)

> GCS: Hoodie-CLI failing with NoSuchMethodError:
> ---
>
> Key: HUDI-142
> URL: https://issues.apache.org/jira/browse/HUDI-142
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: CLI
>Reporter: Balaji Varadarajan
>Priority: Minor
>  Labels: gcs-parity
>
> “””
> 19/05/08 02:54:39 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> Command failed java.lang.NoSuchMethodError: 
> org.apache.hadoop.hdfs.DFSUtil.getRandom()Ljava/util/Random;
> Exception in thread "Spring Shell" java.lang.NoSuchMethodError: 
> org.apache.hadoop.hdfs.DFSUtil.getRandom()Ljava/util/Random;
>  
> “””
>  
> _Status_: Can be fixed by making hoodie-cli a fat jar. The suggested workaround 
> for now is to include HADOOP_HOME/lib/* in the classpath along with 
> hoodie-utilities-bundle, and to remove hoodie-cli/target/lib/* from the classpath.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-110) Better defaults for Partition extractor for Spark DataSOurce and DeltaStreamer

2019-12-25 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-110:

Status: Open  (was: New)

> Better defaults for Partition extractor for Spark DataSOurce and DeltaStreamer
> --
>
> Key: HUDI-110
> URL: https://issues.apache.org/jira/browse/HUDI-110
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: DeltaStreamer, Spark Integration, Usability
>Reporter: Balaji Varadarajan
>Priority: Minor
>
> Currently, SlashEncodedDayPartitionValueExtractor is the default being used. 
> This is not a common format outside Uber.
>  
> Also, the Spark DataSource API provides a partitionBy clause, which has not 
> been integrated for the Hudi Data Source. We need to investigate how we can 
> leverage the partitionBy clause for partitioning.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-17) Better documentation for paths passed to incr and ro views from Spark datasource

2019-12-25 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-17?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-17:
---
Status: New  (was: Open)

> Better documentation for paths passed to incr and ro views from Spark 
> datasource
> 
>
> Key: HUDI-17
> URL: https://issues.apache.org/jira/browse/HUDI-17
> Project: Apache Hudi (incubating)
>  Issue Type: Task
>  Components: Docs, Spark Integration
>Reporter: Vinoth Chandar
>Priority: Major
>
> https://github.com/uber/hudi/issues/524



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] cdmikechen commented on issue #1073: [HUDI-377] Adding Delete() support to DeltaStreamer

2019-12-25 Thread GitBox
cdmikechen commented on issue #1073: [HUDI-377] Adding Delete() support to 
DeltaStreamer
URL: https://github.com/apache/incubator-hudi/pull/1073#issuecomment-568942977
 
 
   @vinothchandar 
   > Are you asking for incremental pull to provide both the before and after 
images of a record, like how 
   > the Oracle OGG CDC stream does? If so, this is a much larger feature.. we can 
discuss it on a separate JIRA.
   
   I mean that hudi can already get incremental data via the Spark datasource 
API, like
   ```java
   Dataset<Row> hoodieIncViewDF = spark.read()
       .format("org.apache.hudi")
       .option(DataSourceReadOptions.VIEW_TYPE_OPT_KEY(),
           DataSourceReadOptions.VIEW_TYPE_INCREMENTAL_OPT_VAL())
       .option(DataSourceReadOptions.BEGIN_INSTANTTIME_OPT_KEY(),
           <beginInstantTime>)
       .load(tablePath);
   ```
   Or a Hive query like
   ```sql
   set hoodie.lims_method.consume.mode=INCREMENTAL;
   set hoodie.lims_method.consume.start.timestamp=<commitTime>;
   set hoodie.lims_method.consume.max.commits=1;
   select `_hoodie_commit_time`, <columns> from table_name where 
   `_hoodie_commit_time` >= '<commitTime>';
   ```
   Should we also support some API or method to get deleted rows after a delete 
action? I think this should be considered at the same time, after this issue and 
the other related issues are submitted.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Updated] (HUDI-470) Fix NPE when print result via hudi-cli

2019-12-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-470:

Labels: pull-request-available  (was: )

> Fix NPE when print result via hudi-cli
> --
>
> Key: HUDI-470
> URL: https://issues.apache.org/jira/browse/HUDI-470
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>Reporter: lamber-ken
>Assignee: lamber-ken
>Priority: Major
>  Labels: pull-request-available
>
> Fix NPE when print result via hudi-cli



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] lamber-ken opened a new pull request #1138: [HUDI-470] Fix NPE when print result via hudi-cli

2019-12-25 Thread GitBox
lamber-ken opened a new pull request #1138: [HUDI-470] Fix NPE when print 
result via hudi-cli
URL: https://github.com/apache/incubator-hudi/pull/1138
 
 
   ## What is the purpose of the pull request
   
   The size of `rows` is wrong; for example:
   ```java
   List<String> allRecords = Arrays.asList();
   
   String[][] rows = new String[allRecords.size() + 1][];
   int i = 0;
   for (String record : allRecords) {
   String[] data = new String[1];
   data[0] = record;
   rows[i++] = data;
   }
   
   HoodiePrintHelper.print(new String[]{"Partition Path"}, rows);
   ```
   Result is:
   ```
   Exception in thread "main" java.lang.NullPointerException
at com.jakewharton.fliptables.FlipTable.<init>(FlipTable.java:37)
at com.jakewharton.fliptables.FlipTable.of(FlipTable.java:20)
at 
org.apache.hudi.cli.HoodiePrintHelper.printTextTable(HoodiePrintHelper.java:110)
at 
org.apache.hudi.cli.HoodiePrintHelper.print(HoodiePrintHelper.java:43)
at 
org.apache.hudi.cli.commands.RepairsCommand.main(RepairsCommand.java:127)
   ```
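   
   For illustration, a minimal sketch of why this NPEs and one possible fix 
(sizing the array to the record count; this is a sketch, not necessarily the 
exact change in this PR):
   ```java
   // new String[allRecords.size() + 1][] leaves the last slot null, and
   // FlipTable throws an NPE when it iterates over that null row.
   // Sizing the array to the record count avoids the null slot:
   String[][] rows = new String[allRecords.size()][];
   int i = 0;
   for (String record : allRecords) {
       rows[i++] = new String[] {record};
   }
   ```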
   
   ## Brief change log
   
 - Fix NPE when print result via hudi-cli
   
   ## Verify this pull request
   
   This pull request is a code cleanup without any test coverage.
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] SteNicholas edited a comment on issue #1136: [MINOR]Optimize hudi-cli module

2019-12-25 Thread GitBox
SteNicholas edited a comment on issue #1136: [MINOR]Optimize hudi-cli module
URL: https://github.com/apache/incubator-hudi/pull/1136#issuecomment-568942473
 
 
   > `hudi-cli` as it is is weak on unit tests.. Do you want to first take up 
that task on the current master branch.. that way we would know how/if large 
changes like these affect the functionality..
   > 
   > Your changes may be fine. but I am just saying it would be good to first 
add some tests to the module.. (again not a problem of this PR, but something 
that affects it)
   
   @vinothchandar My original thought was only to optimize the code of hudi-cli; 
I never considered the unit tests. I would like to first take up that task on the 
current master branch.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] SteNicholas commented on issue #1136: [MINOR]Optimize hudi-cli module

2019-12-25 Thread GitBox
SteNicholas commented on issue #1136: [MINOR]Optimize hudi-cli module
URL: https://github.com/apache/incubator-hudi/pull/1136#issuecomment-568942473
 
 
   > `hudi-cli` as it is is weak on unit tests.. Do you want to first take up 
that task on the current master branch.. that way we would know how/if large 
changes like these affect the functionality..
   > 
   > Your changes may be fine. but I am just saying it would be good to first 
add some tests to the module.. (again not a problem of this PR, but something 
that affects it)
   
   My original thought was only to optimize the code of hudi-cli; I never 
considered the unit tests. I would like to first take up that task on the 
current master branch.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] SteNicholas commented on a change in pull request #1136: [MINOR]Optimize hudi-cli module

2019-12-25 Thread GitBox
SteNicholas commented on a change in pull request #1136: [MINOR]Optimize 
hudi-cli module
URL: https://github.com/apache/incubator-hudi/pull/1136#discussion_r361345650
 
 

 ##
 File path: hudi-cli/src/main/java/org/apache/hudi/cli/HoodiePrintHelper.java
 ##
 @@ -84,7 +84,7 @@ private static String print(Table buffer) {
 buffer.getFieldNames().toArray(header);
 
 String[][] rows =
-buffer.getRenderRows().stream().map(l -> 
l.stream().toArray(String[]::new)).toArray(String[][]::new);
+buffer.getRenderRows().stream().map(l -> l.toArray(new 
String[0])).toArray(String[][]::new);
 
 Review comment:
   This concerns pre-sizing the array for the `toArray` method, and I have 
already modified it. 
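   
   For context, a minimal sketch (an editor's illustration, not from this PR) of 
the two equivalent forms under discussion; note that on modern JVMs passing a 
zero-length array is generally at least as fast as passing a pre-sized one:
   ```java
   import java.util.Arrays;
   import java.util.List;
   
   public class ToArraySketch {
       public static void main(String[] args) {
           List<String> l = Arrays.asList("a", "b", "c");
           // Stream-based copy, as in the old code
           String[] viaStream = l.stream().toArray(String[]::new);
           // Collection.toArray with a zero-length hint, as in the new code;
           // the runtime allocates a correctly sized result array itself
           String[] viaToArray = l.toArray(new String[0]);
           System.out.println(viaStream.length == viaToArray.length); // true
       }
   }
   ```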


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Assigned] (HUDI-470) Fix NPE when print result via hudi-cli

2019-12-25 Thread lamber-ken (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lamber-ken reassigned HUDI-470:
---

Assignee: lamber-ken

> Fix NPE when print result via hudi-cli
> --
>
> Key: HUDI-470
> URL: https://issues.apache.org/jira/browse/HUDI-470
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>Reporter: lamber-ken
>Assignee: lamber-ken
>Priority: Major
>
> Fix NPE when print result via hudi-cli



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-470) Fix NPE when print result via hudi-cli

2019-12-25 Thread lamber-ken (Jira)
lamber-ken created HUDI-470:
---

 Summary: Fix NPE when print result via hudi-cli
 Key: HUDI-470
 URL: https://issues.apache.org/jira/browse/HUDI-470
 Project: Apache Hudi (incubating)
  Issue Type: Bug
Reporter: lamber-ken


Fix NPE when print result via hudi-cli



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-260) Hudi Spark Bundle does not work when passed in extraClassPath option

2019-12-25 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-260:

Component/s: (was: Usability)
 (was: Spark Integration)

> Hudi Spark Bundle does not work when passed in extraClassPath option
> 
>
> Key: HUDI-260
> URL: https://issues.apache.org/jira/browse/HUDI-260
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Spark datasource
>Reporter: Vinoth Chandar
>Assignee: Udit Mehrotra
>Priority: Major
>
> On EMR's side we have the same findings. *a + b + c + d* work in the following 
> cases:
>  * The bundle jar (with databricks-avro shaded) is specified using the *--jars* 
> or *spark.jars* option
>  * The bundle jar (with databricks-avro shaded) is placed in the Spark home 
> jars folder, i.e. the */usr/lib/spark/jars* folder
> However, it does not work if the jar is specified using the 
> *spark.driver.extraClassPath* and *spark.executor.extraClassPath* options, 
> which is what EMR uses to configure external dependencies. Although we can 
> drop the jar in the */usr/lib/spark/jars* folder, I am not sure that is 
> recommended, because that folder is supposed to contain the jars coming from 
> Spark. Extra dependencies from the user's side would be better off specified 
> through the *extraClassPath* option.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-307) Dataframe written with Date,Timestamp, Decimal is read with same types

2019-12-25 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-307:

Component/s: (was: Spark Integration)
 Spark datasource

> Dataframe written with Date,Timestamp, Decimal is read with same types
> --
>
> Key: HUDI-307
> URL: https://issues.apache.org/jira/browse/HUDI-307
> Project: Apache Hudi (incubating)
>  Issue Type: Test
>  Components: Spark datasource
>Reporter: Cosmin Iordache
>Priority: Minor
> Fix For: 0.5.1
>
>
> A small test for the COW table to check the persistence of Date, Timestamp, 
> and Decimal types.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-439) Fix HoodieSparkSqlWriter wrt code refactoring

2019-12-25 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-439:

Component/s: (was: Spark Integration)
 Spark datasource

> Fix HoodieSparkSqlWriter wrt code refactoring
> -
>
> Key: HUDI-439
> URL: https://issues.apache.org/jira/browse/HUDI-439
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Spark datasource
>Reporter: sivabalan narayanan
>Priority: Major
> Fix For: 0.5.1
>
>
> HoodieSparkSqlWriter has some common code paths for the write and delete paths. 
> When I added support for deletes, it wasn't easy to keep the code paths common, 
> because HoodieWriteClient has a generic type in Java and Scala expects the 
> type to be declared. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-69) Support realtime view in Spark datasource #136

2019-12-25 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-69?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-69:
---
Component/s: (was: Spark Integration)
 (was: Realtime View)
 Spark datasource

> Support realtime view in Spark datasource #136
> --
>
> Key: HUDI-69
> URL: https://issues.apache.org/jira/browse/HUDI-69
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: Spark datasource
>Reporter: Vinoth Chandar
>Priority: Major
>
> https://github.com/uber/hudi/issues/136



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-303) Avro schema case sensitivity testing

2019-12-25 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-303:

Component/s: (was: Testing)
 (was: Spark Integration)
 (was: Hive Integration)
 Spark datasource

> Avro schema case sensitivity testing
> 
>
> Key: HUDI-303
> URL: https://issues.apache.org/jira/browse/HUDI-303
> Project: Apache Hudi (incubating)
>  Issue Type: Test
>  Components: Spark datasource
>Reporter: Udit Mehrotra
>Priority: Minor
>
> As a fallout of [PR 956|https://github.com/apache/incubator-hudi/pull/956] we 
> would like to understand how Avro behaves with case-sensitive column names.
> A couple of action items (a sketch follows the list):
>  * Test with different field names just differing in case.
>  * *AbstractRealtimeRecordReader* is one of the classes where we are 
> converting Avro Schema field names to lower case, to be able to verify them 
> against column names from Hive. We can consider removing the *lowercase* 
> conversion there if we verify it does not break anything.
>  
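
For illustration, a minimal sketch (an editor's illustration, not from the issue) of the first action item — an Avro schema whose field names differ only in case, which Avro itself permits because field names are case-sensitive:
{code:java}
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;

public class CaseSensitiveSchemaSketch {
    public static void main(String[] args) {
        // Two fields differing only in case; lowercasing them (as in
        // AbstractRealtimeRecordReader) would make them collide.
        Schema schema = SchemaBuilder.record("rec").fields()
            .optionalString("columnA")
            .optionalString("columna")
            .endRecord();
        System.out.println(schema.toString(true));
    }
}
{code}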



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-33) Introduce config to allow users to control case-sensitivity in column projections #431

2019-12-25 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-33?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-33:
---
Component/s: (was: Spark Integration)
 (was: Presto Integration)
 (was: Hive Integration)
 Spark datasource

> Introduce config to allow users to control case-sensitivity in column 
> projections #431
> --
>
> Key: HUDI-33
> URL: https://issues.apache.org/jira/browse/HUDI-33
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Spark datasource
>Reporter: Vinoth Chandar
>Priority: Major
>
> https://github.com/uber/hudi/issues/431



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-254) Provide mechanism for installing hudi-spark-bundle onto an existing spark installation

2019-12-25 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-254:

Component/s: (was: Spark Integration)

> Provide mechanism for installing hudi-spark-bundle onto an existing spark 
> installation
> --
>
> Key: HUDI-254
> URL: https://issues.apache.org/jira/browse/HUDI-254
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Spark datasource
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
> Fix For: 0.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> A lot of discussions around this kicked off from 
> [https://github.com/apache/incubator-hudi/issues/869] 
> Breaking this down into phases, when we drop the hudi-spark-bundle*.jar into 
> the `jars` folder:
>  
> a) Writing data via Hudi datasource should work 
> b) Spark datasource reads should work
>  
> c)  a + Hive Sync should work
> d) SparkSQL on Hive synced table works 
>  
> Start with Spark 2.3 (current demo setup) and then proceed to 2.4 and iron 
> out issues.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] cdmikechen commented on issue #1119: [HUDI-469] Fix: HoodieCommitMetadata only show first commit insert rows.

2019-12-25 Thread GitBox
cdmikechen commented on issue #1119: [HUDI-469] Fix: HoodieCommitMetadata only 
show first commit insert rows.
URL: https://github.com/apache/incubator-hudi/pull/1119#issuecomment-568941316
 
 
   > Lets first triage the issue, file a JIRA and then proceed with a PR?
   
   OK! I have opened JIRA issue HUDI-469 and renamed the title.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Updated] (HUDI-469) HoodieCommitMetadata only show first commit insert rows.

2019-12-25 Thread cdmikechen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

cdmikechen updated HUDI-469:

Status: Patch Available  (was: In Progress)

> HoodieCommitMetadata only show first commit insert rows. 
> -
>
> Key: HUDI-469
> URL: https://issues.apache.org/jira/browse/HUDI-469
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: CLI
>Reporter: cdmikechen
>Assignee: cdmikechen
>Priority: Major
> Fix For: 0.5.1
>
>
> When I ran hudi-cli to get insert rows, I found that hudi-cli cannot get 
> insert rows that are not in the first commit. The 
> {{HoodieCommitMetadata.fetchTotalInsertRecordsWritten()}} method uses 
> {{stat.getPrevCommit().equalsIgnoreCase("null")}} to filter for the first 
> commit. This check should be removed; a sketch follows below.
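
For illustration, a rough sketch (an editor's reconstruction; the surrounding field and method names are assumed from the class's shape, not quoted from the codebase) of the filter in question:
{code:java}
public long fetchTotalInsertRecordsWritten() {
  long totalInsertRecordsWritten = 0;
  for (List<HoodieWriteStat> stats : partitionToWriteStats.values()) {
    for (HoodieWriteStat stat : stats) {
      // Only stats whose previous commit is the literal "null" (i.e. records
      // inserted by the very first commit) are counted, so inserts from any
      // later commit are reported as 0. Removing this check fixes that.
      if (stat.getPrevCommit().equalsIgnoreCase("null")) {
        totalInsertRecordsWritten += stat.getNumWrites();
      }
    }
  }
  return totalInsertRecordsWritten;
}
{code}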



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-469) HoodieCommitMetadata only show first commit insert rows.

2019-12-25 Thread cdmikechen (Jira)
cdmikechen created HUDI-469:
---

 Summary: HoodieCommitMetadata only show first commit insert rows. 
 Key: HUDI-469
 URL: https://issues.apache.org/jira/browse/HUDI-469
 Project: Apache Hudi (incubating)
  Issue Type: Bug
  Components: CLI
Reporter: cdmikechen
Assignee: cdmikechen
 Fix For: 0.5.1


When I ran hudi-cli to get insert rows, I found that hudi-cli cannot get 
insert rows that are not in the first commit. I found that the 
{{HoodieCommitMetadata.fetchTotalInsertRecordsWritten()}} method uses 
{{stat.getPrevCommit().equalsIgnoreCase("null")}} to filter for the first 
commit. This check should be removed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1136: [MINOR]Optimize hudi-cli module

2019-12-25 Thread GitBox
vinothchandar commented on a change in pull request #1136: [MINOR]Optimize 
hudi-cli module
URL: https://github.com/apache/incubator-hudi/pull/1136#discussion_r361343736
 
 

 ##
 File path: hudi-cli/src/main/java/org/apache/hudi/cli/HoodiePrintHelper.java
 ##
 @@ -84,7 +84,7 @@ private static String print(Table buffer) {
 buffer.getFieldNames().toArray(header);
 
 String[][] rows =
-buffer.getRenderRows().stream().map(l -> 
l.stream().toArray(String[]::new)).toArray(String[][]::new);
+buffer.getRenderRows().stream().map(l -> l.toArray(new 
String[0])).toArray(String[][]::new);
 
 Review comment:
   why `new String[0]`? 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] lamber-ken commented on issue #1135: [HUDI-233]Redo log statements using SLF4J

2019-12-25 Thread GitBox
lamber-ken commented on issue #1135: [HUDI-233]Redo log statements using SLF4J
URL: https://github.com/apache/incubator-hudi/pull/1135#issuecomment-568940452
 
 
   > @leesf @lamber-ken who were looking into this before..
   > @leesf could you shepherd this one? My concerns are mostly around all 
bundles working smoothly.
   > 
   > IIUC this PR is low touch.. Just adds log4j facade and changes code.. We 
need to verify logs do show up on all of Hive, presto, spark logs by running 
through the demo steps..
   
   I think @leesf's advice is reasonable: redo the log statements module by module.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1073: [HUDI-377] Adding Delete() support to DeltaStreamer

2019-12-25 Thread GitBox
vinothchandar commented on a change in pull request #1073: [HUDI-377] Adding 
Delete() support to DeltaStreamer
URL: https://github.com/apache/incubator-hudi/pull/1073#discussion_r361343021
 
 

 ##
 File path: 
hudi-spark/src/main/java/org/apache/hudi/OverwriteWithLatestAvroPayload.java
 ##
 @@ -61,8 +60,15 @@ public OverwriteWithLatestAvroPayload 
preCombine(OverwriteWithLatestAvroPayload
 
   @Override
   public Option combineAndGetUpdateValue(IndexedRecord 
currentValue, Schema schema) throws IOException {
+
 
 Review comment:
   Doing this in `getInsertValue()` means even inserts with the flag set will 
be deleted.. Not sure if this is intended behavior.. We only want to delete if 
`updating and marker` set? 
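   
   For illustration, a rough sketch (an editor's reconstruction; the marker 
field name `_hoodie_is_deleted` is an assumption, not confirmed by this PR) of 
deleting only on the update path rather than in `getInsertValue()`:
   ```java
   @Override
   public Option<IndexedRecord> combineAndGetUpdateValue(IndexedRecord currentValue, Schema schema) throws IOException {
     // Option is org.apache.hudi.common.util.Option
     Option<IndexedRecord> insertValue = getInsertValue(schema);
     if (!insertValue.isPresent()) {
       return Option.empty();
     }
     GenericRecord record = (GenericRecord) insertValue.get();
     // Check the assumed delete-marker field only when combining an update,
     // so plain inserts carrying the flag are not silently dropped.
     Object marker = record.get("_hoodie_is_deleted");
     if (marker instanceof Boolean && (Boolean) marker) {
       return Option.empty();
     }
     return insertValue;
   }
   ```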


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1073: [HUDI-377] Adding Delete() support to DeltaStreamer

2019-12-25 Thread GitBox
vinothchandar commented on a change in pull request #1073: [HUDI-377] Adding 
Delete() support to DeltaStreamer
URL: https://github.com/apache/incubator-hudi/pull/1073#discussion_r361343052
 
 

 ##
 File path: 
hudi-spark/src/main/java/org/apache/hudi/OverwriteWithLatestAvroPayload.java
 ##
 @@ -61,8 +60,15 @@ public OverwriteWithLatestAvroPayload 
preCombine(OverwriteWithLatestAvroPayload
 
   @Override
   public Option combineAndGetUpdateValue(IndexedRecord 
currentValue, Schema schema) throws IOException {
+
 
 Review comment:
   do you have a performance concern here? `Option.of` should be very cheap 
right..  In any case, we can achieve the effect of what you mean by simply 
hanging onto the original `Option[GenericRecord]`?
   
   ```
   Option<IndexedRecord> val = getInsertValue(schema);
   GenericRecord genericRecord = (GenericRecord) val.get();
   ...
   
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1073: [HUDI-377] Adding Delete() support to DeltaStreamer

2019-12-25 Thread GitBox
vinothchandar commented on a change in pull request #1073: [HUDI-377] Adding 
Delete() support to DeltaStreamer
URL: https://github.com/apache/incubator-hudi/pull/1073#discussion_r361342980
 
 

 ##
 File path: 
hudi-client/src/test/java/org/apache/hudi/common/HoodieTestDataGenerator.java
 ##
 @@ -148,27 +169,27 @@ public static void createCommitFile(String basePath, 
String commitTime) throws I
   public static void createCommitFile(String basePath, String commitTime, 
Configuration configuration) {
 Arrays.asList(HoodieTimeline.makeCommitFileName(commitTime), 
HoodieTimeline.makeInflightCommitFileName(commitTime),
 HoodieTimeline.makeRequestedCommitFileName(commitTime)).forEach(f -> {
-  Path commitFile = new Path(
-  basePath + "/" + HoodieTableMetaClient.METAFOLDER_NAME + "/" + 
f);
-  FSDataOutputStream os = null;
+  Path commitFile = new Path(
+  basePath + "/" + HoodieTableMetaClient.METAFOLDER_NAME + "/" + f);
+  FSDataOutputStream os = null;
+  try {
+FileSystem fs = FSUtils.getFs(basePath, configuration);
+os = fs.create(commitFile, true);
+HoodieCommitMetadata commitMetadata = new HoodieCommitMetadata();
+// Write empty commit metadata
+os.writeBytes(new 
String(commitMetadata.toJsonString().getBytes(StandardCharsets.UTF_8)));
+  } catch (IOException ioe) {
+throw new HoodieIOException(ioe.getMessage(), ioe);
+  } finally {
+if (null != os) {
   try {
-FileSystem fs = FSUtils.getFs(basePath, configuration);
 
 Review comment:
   seems like this is just an indentation change? or has code changed here? 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1073: [HUDI-377] Adding Delete() support to DeltaStreamer

2019-12-25 Thread GitBox
vinothchandar commented on a change in pull request #1073: [HUDI-377] Adding 
Delete() support to DeltaStreamer
URL: https://github.com/apache/incubator-hudi/pull/1073#discussion_r361342961
 
 

 ##
 File path: 
hudi-client/src/test/java/org/apache/hudi/common/HoodieTestDataGenerator.java
 ##
 @@ -148,27 +169,27 @@ public static void createCommitFile(String basePath, 
String commitTime) throws I
   public static void createCommitFile(String basePath, String commitTime, 
Configuration configuration) {
 Arrays.asList(HoodieTimeline.makeCommitFileName(commitTime), 
HoodieTimeline.makeInflightCommitFileName(commitTime),
 HoodieTimeline.makeRequestedCommitFileName(commitTime)).forEach(f -> {
-  Path commitFile = new Path(
-  basePath + "/" + HoodieTableMetaClient.METAFOLDER_NAME + "/" + 
f);
-  FSDataOutputStream os = null;
+  Path commitFile = new Path(
 
 Review comment:
   is this indentation correct.. just curious 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] vinothchandar edited a comment on issue #1073: [HUDI-377] Adding Delete() support to DeltaStreamer

2019-12-25 Thread GitBox
vinothchandar edited a comment on issue #1073: [HUDI-377] Adding Delete() 
support to DeltaStreamer
URL: https://github.com/apache/incubator-hudi/pull/1073#issuecomment-568939561
 
 
   @cdmikechen it's good to check on the payload .. I think there is also another 
issue open to track that.. @nsivabalan, maybe close the loop on that, get the 
build passing, and we can do a follow-on as needed. Otherwise, PR LGTM.
   
   > WDYT @vinothchandar . If the delete API is already supported, should a query 
API also be provided to query the deleted data?
   
   Are you asking for incremental pull to provide both the before and after 
images of a record, like how the Oracle OGG CDC stream does?  If so, this is a 
much larger feature.. we can discuss it on a separate JIRA. 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] vinothchandar commented on issue #1073: [HUDI-377] Adding Delete() support to DeltaStreamer

2019-12-25 Thread GitBox
vinothchandar commented on issue #1073: [HUDI-377] Adding Delete() support to 
DeltaStreamer
URL: https://github.com/apache/incubator-hudi/pull/1073#issuecomment-568939561
 
 
   @cdmikechen it's good to check on the payload .. I think there is also another 
issue open to track that.. @nsivabalan, maybe close the loop on that, and we 
can do a follow-on as needed. Otherwise, PR LGTM.
   
   > WDYT @vinothchandar . If the delete API is already supported, should a query 
API also be provided to query the deleted data?
   
   Are you asking for incremental pull to provide both the before and after 
images of a record, like how the Oracle OGG CDC stream does?  If so, this is a 
much larger feature.. we can discuss it on a separate JIRA. 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] lamber-ken commented on issue #1120: [HUDI-440] Rework the hudi web site

2019-12-25 Thread GitBox
lamber-ken commented on issue #1120: [HUDI-440] Rework the hudi web site
URL: https://github.com/apache/incubator-hudi/pull/1120#issuecomment-568939134
 
 
   Hi @vinothchandar, thanks for your review. I updated pr according to your 
advice.  


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] vinothchandar commented on issue #736: hoodie-hive-hundle don't have hive jars

2019-12-25 Thread GitBox
vinothchandar commented on issue #736: hoodie-hive-hundle don't have hive jars
URL: https://github.com/apache/incubator-hudi/issues/736#issuecomment-568938832
 
 
   I think the code assumes everything is in the hadoop conf 
   
   ```
   if [ -z "$HADOOP_CONF_DIR" ]; then
 echo "setting hadoop conf dir"
 HADOOP_CONF_DIR="${HADOOP_HOME}/etc/hadoop"
   fi
   ```
   
   In either case, I still can't understand how adding conf to the classpath will 
resolve the driver not being found.. is there a direct link? i.e. do you know 
why exactly adding the `conf` directory helps? 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] vinothchandar commented on issue #828: Synchronizing to hive partition is incorrect

2019-12-25 Thread GitBox
vinothchandar commented on issue #828: Synchronizing to hive partition is 
incorrect
URL: https://github.com/apache/incubator-hudi/issues/828#issuecomment-568938690
 
 
   In that case, I am not sure why the code would not find the partitions it 
just wrote. Is this S3? Maybe an eventual consistency issue? 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] leesf commented on issue #1135: [HUDI-233]Redo log statements using SLF4J

2019-12-25 Thread GitBox
leesf commented on issue #1135: [HUDI-233]Redo log statements using SLF4J
URL: https://github.com/apache/incubator-hudi/pull/1135#issuecomment-568937528
 
 
   Thanks for opening the PR @listenLearning . Here are some of my thoughts 
(a sketch of the migration follows the list).
   - we must ensure the new logging setup plays well with spark/hive/presto to 
avoid version conflicts; more context 
[here](https://issues.apache.org/jira/projects/HUDI/issues/HUDI-233?filter=allopenissues).
   - so please verify it in your own cluster to ensure the modification is 
correct. 
   - only replacing `LogManager.getLogger` with `LoggerFactory.getLogger` is 
incomplete; you also need to replace the `LOG.info`, `LOG.error`, and `LOG.debug` 
statements across the project.
   - IMHO, I would like to suggest redoing the log statements module by module.
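   
   For reference, a minimal sketch (an editor's illustration, assuming the 
SLF4J facade over a log4j backend) of the per-class migration being discussed:
   ```java
   // Before: log4j 1.x used directly
   // import org.apache.log4j.LogManager;
   // import org.apache.log4j.Logger;
   // private static final Logger LOG = LogManager.getLogger(Foo.class);
   
   // After: the SLF4J facade; the concrete backend is bound at runtime
   import org.slf4j.Logger;
   import org.slf4j.LoggerFactory;
   
   public class Foo {
     private static final Logger LOG = LoggerFactory.getLogger(Foo.class);
   
     void run() {
       // Parameterized logging replaces string concatenation
       LOG.info("processed {} records", 42);
     }
   }
   ```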


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Updated] (HUDI-453) Throw failed to archive commits error when writing data to MOR/COW table

2019-12-25 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-453:

Fix Version/s: 0.5.1

> Throw failed to archive commits error when writing data to MOR/COW table
> 
>
> Key: HUDI-453
> URL: https://issues.apache.org/jira/browse/HUDI-453
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Write Client
>Reporter: lamber-ken
>Assignee: lamber-ken
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> A "failed to archive commits" error is thrown when writing data to a table; 
> here are the reproduction steps.
> *1, Build from latest source*
> {code:java}
> mvn clean package -DskipTests -DskipITs -Dcheckstyle.skip=true -Drat.skip=true
> {code}
> *2, Write Data*
> {code:java}
> export SPARK_HOME=/work/BigData/install/spark/spark-2.3.3-bin-hadoop2.6
> ${SPARK_HOME}/bin/spark-shell --jars `ls 
> packaging/hudi-spark-bundle/target/hudi-spark-bundle-*.*.*-SNAPSHOT.jar` 
> --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
> import org.apache.spark.sql.SaveMode._
> var datas = List("{ \"name\": \"kenken\", \"ts\": 1574297893836, \"age\": 12, 
> \"location\": \"latitude\"}")
> val df = spark.read.json(spark.sparkContext.parallelize(datas, 2))
> df.write.format("org.apache.hudi").
> option("hoodie.insert.shuffle.parallelism", "10").
> option("hoodie.upsert.shuffle.parallelism", "10").
> option("hoodie.delete.shuffle.parallelism", "10").
> option("hoodie.bulkinsert.shuffle.parallelism", "10").
> option("hoodie.datasource.write.recordkey.field", "name").
> option("hoodie.datasource.write.partitionpath.field", "location").
> option("hoodie.datasource.write.precombine.field", "ts").
> option("hoodie.table.name", "hudi_mor_table").
> mode(Overwrite).
> save("file:///tmp/hudi_mor_table")
> {code}
> *3, Append Data*
> {code:java}
> df.write.format("org.apache.hudi").
> option("hoodie.insert.shuffle.parallelism", "10").
> option("hoodie.upsert.shuffle.parallelism", "10").
> option("hoodie.delete.shuffle.parallelism", "10").
> option("hoodie.bulkinsert.shuffle.parallelism", "10").
> option("hoodie.datasource.write.recordkey.field", "name").
> option("hoodie.datasource.write.partitionpath.field", "location").
> option("hoodie.datasource.write.precombine.field", "ts").
> option("hoodie.keep.max.commits", "5").
> option("hoodie.keep.min.commits", "4").
> option("hoodie.cleaner.commits.retained", "3").
> option("hoodie.table.name", "hudi_mor_table").
> mode(Append).
> save("file:///tmp/hudi_mor_table")
> {code}
> *4, Repeat the Append Data operation (above) about six times, and you will get 
> the stack trace*
> {code:java}
> 19/12/23 01:30:48 ERROR HoodieCommitArchiveLog: Failed to archive commits, 
> .commit file: 20191224004558.clean.requested
> java.io.IOException: Not an Avro data file
> at org.apache.avro.file.DataFileReader.openReader(DataFileReader.java:50)
> at 
> org.apache.hudi.common.util.AvroUtils.deserializeAvroMetadata(AvroUtils.java:147)
> at 
> org.apache.hudi.common.util.CleanerUtils.getCleanerPlan(CleanerUtils.java:88)
> at 
> org.apache.hudi.io.HoodieCommitArchiveLog.convertToAvroRecord(HoodieCommitArchiveLog.java:294)
> at 
> org.apache.hudi.io.HoodieCommitArchiveLog.archive(HoodieCommitArchiveLog.java:253)
> at 
> org.apache.hudi.io.HoodieCommitArchiveLog.archiveIfRequired(HoodieCommitArchiveLog.java:122)
> at org.apache.hudi.HoodieWriteClient.commit(HoodieWriteClient.java:562)
> at org.apache.hudi.HoodieWriteClient.commit(HoodieWriteClient.java:523)
> at org.apache.hudi.HoodieWriteClient.commit(HoodieWriteClient.java:514)
> at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:159)
> at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:91)
> at 
> org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
> at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
> at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
> at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at 

[jira] [Closed] (HUDI-468) Not an avro data file - error while archiving post rename() change

2019-12-25 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar closed HUDI-468.
---
Resolution: Duplicate

dupe of HUDI-453 

> Not an avro data file - error while archiving post rename() change
> --
>
> Key: HUDI-468
> URL: https://issues.apache.org/jira/browse/HUDI-468
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Write Client
>Reporter: Vinoth Chandar
>Assignee: Balaji Varadarajan
>Priority: Major
> Fix For: 0.5.1
>
>
> Here are the reproduction steps.
> 1, Build from latest source
> mvn clean package -DskipTests -DskipITs -Dcheckstyle.skip=true -Drat.skip=true
> 2, Write Data
> export SPARK_HOME=/work/BigData/install/spark/spark-2.3.3-bin-hadoop2.6
> ${SPARK_HOME}/bin/spark-shell --jars `ls 
> packaging/hudi-spark-bundle/target/hudi-spark-bundle-*.*.*-SNAPSHOT.jar` 
> --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
> import org.apache.spark.sql.SaveMode._
> var datas = List("{ \"name\": \"kenken\", \"ts\": 1574297893836, \"age\": 
> 12, \"location\": \"latitude\"}")
> val df = spark.read.json(spark.sparkContext.parallelize(datas, 2))
> df.write.format("org.apache.hudi").
>     option("hoodie.insert.shuffle.parallelism", "10").
>     option("hoodie.upsert.shuffle.parallelism", "10").
>     option("hoodie.delete.shuffle.parallelism", "10").
>     option("hoodie.bulkinsert.shuffle.parallelism", "10").
>     option("hoodie.datasource.write.recordkey.field", "name").
>     option("hoodie.datasource.write.partitionpath.field", "location").
>     option("hoodie.datasource.write.precombine.field", "ts").
>     option("[hoodie.table.name|http://hoodie.table.name/];, "hudi_mor_table").
>     mode(Overwrite).
>     save("file:///tmp/hudi_mor_table")
> 3, Append Data
> df.write.format("org.apache.hudi").
>     option("hoodie.insert.shuffle.parallelism", "10").
>     option("hoodie.upsert.shuffle.parallelism", "10").
>     option("hoodie.delete.shuffle.parallelism", "10").
>     option("hoodie.bulkinsert.shuffle.parallelism", "10").
>     option("hoodie.datasource.write.recordkey.field", "name").
>     option("hoodie.datasource.write.partitionpath.field", "location").
>     option("hoodie.datasource.write.precombine.field", "ts").
>     option("hoodie.keep.max.commits", "5").
>     option("hoodie.keep.min.commits", "4").
>     option("hoodie.cleaner.commits.retained", "3").
>     option("[hoodie.table.name|http://hoodie.table.name/];, "hudi_mor_table").
>     mode(Append).
>     save("file:///tmp/hudi_mor_table")
> 4, Repeat the Append Data operation (above) about six times, and you will get 
> the stack trace
> 19/12/24 13:34:09 ERROR HoodieCommitArchiveLog: Failed to archive commits, 
> .commit file: 20191224132942.clean.requested
> java.io.IOException: Not an Avro data file
> at org.apache.avro.file.DataFileReader.openReader(DataFileReader.java:50)
> at 
> org.apache.hudi.common.util.AvroUtils.deserializeAvroMetadata(AvroUtils.java:147)
> at 
> org.apache.hudi.common.util.CleanerUtils.getCleanerPlan(CleanerUtils.java:88)
> at 
> org.apache.hudi.io.HoodieCommitArchiveLog.convertToAvroRecord(HoodieCommitArchiveLog.java:294)
> at 
> org.apache.hudi.io.HoodieCommitArchiveLog.archive(HoodieCommitArchiveLog.java:253)
> at 
> org.apache.hudi.io.HoodieCommitArchiveLog.archiveIfRequired(HoodieCommitArchiveLog.java:122)
> at org.apache.hudi.HoodieWriteClient.commit(HoodieWriteClient.java:562)
> at org.apache.hudi.HoodieWriteClient.commit(HoodieWriteClient.java:523)
> at org.apache.hudi.HoodieWriteClient.commit(HoodieWriteClient.java:514)
> at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:159)
> at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:91)
> at 
> org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
> at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
> at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
> at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
> at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)

[jira] [Created] (HUDI-468) Not an avro data file - error while archiving post rename() change

2019-12-25 Thread Vinoth Chandar (Jira)
Vinoth Chandar created HUDI-468:
---

 Summary: Not an avro data file - error while archiving post 
rename() change
 Key: HUDI-468
 URL: https://issues.apache.org/jira/browse/HUDI-468
 Project: Apache Hudi (incubating)
  Issue Type: Bug
  Components: Write Client
Reporter: Vinoth Chandar
Assignee: Balaji Varadarajan
 Fix For: 0.5.1


Here are the reproduction steps.


1, Build from latest source
mvn clean package -DskipTests -DskipITs -Dcheckstyle.skip=true -Drat.skip=true


2, Write Data
export SPARK_HOME=/work/BigData/install/spark/spark-2.3.3-bin-hadoop2.6
${SPARK_HOME}/bin/spark-shell --jars `ls 
packaging/hudi-spark-bundle/target/hudi-spark-bundle-*.*.*-SNAPSHOT.jar` --conf 
'spark.serializer=org.apache.spark.serializer.KryoSerializer'


import org.apache.spark.sql.SaveMode._


var datas = List("{ \"name\": \"kenken\", \"ts\": 1574297893836, \"age\": 12, 
\"location\": \"latitude\"}")
val df = spark.read.json(spark.sparkContext.parallelize(datas, 2))
df.write.format("org.apache.hudi").
    option("hoodie.insert.shuffle.parallelism", "10").
    option("hoodie.upsert.shuffle.parallelism", "10").
    option("hoodie.delete.shuffle.parallelism", "10").
    option("hoodie.bulkinsert.shuffle.parallelism", "10").
    option("hoodie.datasource.write.recordkey.field", "name").
    option("hoodie.datasource.write.partitionpath.field", "location").
    option("hoodie.datasource.write.precombine.field", "ts").
    option("[hoodie.table.name|http://hoodie.table.name/];, "hudi_mor_table").
    mode(Overwrite).
    save("file:///tmp/hudi_mor_table")


3, Append Data

df.write.format("org.apache.hudi").
    option("hoodie.insert.shuffle.parallelism", "10").
    option("hoodie.upsert.shuffle.parallelism", "10").
    option("hoodie.delete.shuffle.parallelism", "10").
    option("hoodie.bulkinsert.shuffle.parallelism", "10").
    option("hoodie.datasource.write.recordkey.field", "name").
    option("hoodie.datasource.write.partitionpath.field", "location").
    option("hoodie.datasource.write.precombine.field", "ts").
    option("hoodie.keep.max.commits", "5").
    option("hoodie.keep.min.commits", "4").
    option("hoodie.cleaner.commits.retained", "3").
    option("[hoodie.table.name|http://hoodie.table.name/];, "hudi_mor_table").
    mode(Append).
    save("file:///tmp/hudi_mor_table")


4, Repeat the Append Data operation (above) about six times, and you will get the 
stack trace
19/12/24 13:34:09 ERROR HoodieCommitArchiveLog: Failed to archive commits, 
.commit file: 20191224132942.clean.requested
java.io.IOException: Not an Avro data file
at org.apache.avro.file.DataFileReader.openReader(DataFileReader.java:50)
at 
org.apache.hudi.common.util.AvroUtils.deserializeAvroMetadata(AvroUtils.java:147)
at org.apache.hudi.common.util.CleanerUtils.getCleanerPlan(CleanerUtils.java:88)
at 
org.apache.hudi.io.HoodieCommitArchiveLog.convertToAvroRecord(HoodieCommitArchiveLog.java:294)
at 
org.apache.hudi.io.HoodieCommitArchiveLog.archive(HoodieCommitArchiveLog.java:253)
at 
org.apache.hudi.io.HoodieCommitArchiveLog.archiveIfRequired(HoodieCommitArchiveLog.java:122)
at org.apache.hudi.HoodieWriteClient.commit(HoodieWriteClient.java:562)
at org.apache.hudi.HoodieWriteClient.commit(HoodieWriteClient.java:523)
at org.apache.hudi.HoodieWriteClient.commit(HoodieWriteClient.java:514)
at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:159)
at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:91)
at 
org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] vinothchandar merged pull request #1137: [MINOR]Optimize hudi-timeline-service module

2019-12-25 Thread GitBox
vinothchandar merged pull request #1137: [MINOR]Optimize hudi-timeline-service 
module
URL: https://github.com/apache/incubator-hudi/pull/1137
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] vinothchandar commented on issue #1137: [MINOR]Optimize hudi-timeline-service module

2019-12-25 Thread GitBox
vinothchandar commented on issue #1137: [MINOR]Optimize hudi-timeline-service 
module
URL: https://github.com/apache/incubator-hudi/pull/1137#issuecomment-568935856
 
 
   Good cleanup! thanks! 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[incubator-hudi] branch master updated (3c811ec -> def18a5)

2019-12-25 Thread vinoth
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git.


from 3c811ec  [MINOR] fix typos
 add def18a5  [MINOR] optimize hudi timeline service (#1137)

No new revisions were added by this update.

Summary of changes:
 .../apache/hudi/timeline/service/FileSystemViewHandler.java| 10 --
 .../java/org/apache/hudi/timeline/service/TimelineService.java |  2 +-
 .../apache/hudi/timeline/service/handlers/DataFileHandler.java |  2 +-
 .../hudi/timeline/service/handlers/FileSliceHandler.java   |  2 +-
 .../apache/hudi/timeline/service/handlers/TimelineHandler.java |  2 +-
 5 files changed, 8 insertions(+), 10 deletions(-)



[GitHub] [incubator-hudi] vinothchandar commented on issue #1077: [HUDI-335] : Improvements to DiskbasedMap

2019-12-25 Thread GitBox
vinothchandar commented on issue #1077: [HUDI-335] : Improvements to 
DiskbasedMap
URL: https://github.com/apache/incubator-hudi/pull/1077#issuecomment-568935637
 
 
   @nbalajee But this seems simple enough that we can do it ourselves, right? I am a bit hesitant to reuse code as-is from Cassandra; there is ongoing maintenance overhead to consider.




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1106: [HUDI-209] Implement JMX metrics reporter

2019-12-25 Thread GitBox
vinothchandar commented on a change in pull request #1106: [HUDI-209] Implement 
JMX metrics reporter
URL: https://github.com/apache/incubator-hudi/pull/1106#discussion_r361340278
 
 

 ##
 File path: hudi-client/src/main/java/org/apache/hudi/metrics/Metrics.java
 ##
 @@ -47,7 +48,7 @@ private Metrics(HoodieWriteConfig metricConfig) {
 if (reporter == null) {
   throw new RuntimeException("Cannot initialize Reporter.");
 }
-// reporter.start();
+reporter.start();
 
 Review comment:
   You mean commented? Can you try git blaming the lines and see if there are any clues? IIRC, there was some issue that made us do this, but I am not sure, tbh. Uncommenting looks reasonable to me, but better to fully understand the context.




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1106: [HUDI-209] Implement JMX metrics reporter

2019-12-25 Thread GitBox
vinothchandar commented on a change in pull request #1106: [HUDI-209] Implement 
JMX metrics reporter
URL: https://github.com/apache/incubator-hudi/pull/1106#discussion_r361340214
 
 

 ##
 File path: 
hudi-client/src/main/java/org/apache/hudi/metrics/JmxMetricsReporter.java
 ##
 @@ -18,47 +18,73 @@
 
 package org.apache.hudi.metrics;
 
+import org.apache.hudi.client.utils.NetUtils;
 import org.apache.hudi.config.HoodieWriteConfig;
 import org.apache.hudi.exception.HoodieException;
 
-import com.google.common.base.Preconditions;
+import com.codahale.metrics.MetricRegistry;
+import com.codahale.metrics.jmx.JmxReporter;
 import org.apache.log4j.LogManager;
 import org.apache.log4j.Logger;
 
+import javax.management.MBeanServer;
 import javax.management.remote.JMXConnectorServer;
 import javax.management.remote.JMXConnectorServerFactory;
 import javax.management.remote.JMXServiceURL;
 
 import java.io.Closeable;
+import java.io.IOException;
 import java.lang.management.ManagementFactory;
+import java.net.MalformedURLException;
+import java.rmi.NoSuchObjectException;
 import java.rmi.registry.LocateRegistry;
+import java.rmi.registry.Registry;
+import java.rmi.server.UnicastRemoteObject;
+import java.util.Iterator;
 
 /**
  * Implementation of JMX reporter, which is used to report JMX metrics.
  */
 public class JmxMetricsReporter extends MetricsReporter {
 
   private static final Logger LOG = 
LogManager.getLogger(JmxMetricsReporter.class);
-  private final JMXConnectorServer connector;
-  private String host;
-  private int port;
+  private final JmxServer jmxServer;
 
-  public JmxMetricsReporter(HoodieWriteConfig config) {
+  public JmxMetricsReporter(HoodieWriteConfig config, MetricRegistry registry) 
{
 try {
   // Check the host and port here
-  this.host = config.getJmxHost();
-  this.port = config.getJmxPort();
-  if (host == null || port == 0) {
-throw new RuntimeException(
+  String host = config.getJmxHost();
+  String portsConfig = config.getJmxPorts();
+  if (host == null || portsConfig == null) {
+throw new HoodieException(
 String.format("Jmx cannot be initialized with host[%s] and 
port[%s].",
-host, port));
+host, portsConfig));
   }
-  LocateRegistry.createRegistry(port);
-  String serviceUrl =
-  "service:jmx:rmi://" + host + ":" + port + "/jndi/rmi://" + host + 
":" + port + "/jmxrmi";
-  JMXServiceURL url = new JMXServiceURL(serviceUrl);
-  this.connector = JMXConnectorServerFactory
-  .newJMXConnectorServer(url, null, 
ManagementFactory.getPlatformMBeanServer());
+
+  Iterator<Integer> ports = NetUtils.getPortRangeFromString(portsConfig);
+  JmxServer successfullyStartedServer = null;
 
 Review comment:
   Why do we need two separate `JmxServer` variables just to denote success or failure? Can we use a boolean for that and just a single `JmxServer server` declaration?
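
   For illustration, a minimal sketch of the single-variable approach suggested above, reusing the `JmxServer`, `ports`, `LOG`, and `HoodieException` names from this diff; the loop body is an assumption, not the final implementation:
{code:java}
// Sketch: one JmxServer variable plus a boolean success flag.
JmxServer server = null;
boolean started = false;
while (ports.hasNext() && !started) {
  int port = ports.next();
  try {
    server = new JmxServer(port); // assumed JmxServer(int) constructor from this PR
    started = true;
  } catch (Exception e) {
    LOG.warn("Could not start JMX server on port " + port, e);
  }
}
if (!started) {
  throw new HoodieException("Could not start JMX server on any configured port.");
}
this.jmxServer = server;
{code}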




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1106: [HUDI-209] Implement JMX metrics reporter

2019-12-25 Thread GitBox
vinothchandar commented on a change in pull request #1106: [HUDI-209] Implement 
JMX metrics reporter
URL: https://github.com/apache/incubator-hudi/pull/1106#discussion_r361340058
 
 

 ##
 File path: hudi-client/src/main/java/org/apache/hudi/client/utils/NetUtils.java
 ##
 @@ -0,0 +1,202 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.client.utils;
+
+import org.apache.hudi.exception.HoodieException;
+
+import java.net.Inet4Address;
+import java.net.Inet6Address;
+import java.net.InetAddress;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.Iterator;
+
+/**
+ * Utility for various network related tasks (such as finding free ports).
+ */
+public class NetUtils {
+  // 
+  //  Encoding of IP addresses for URLs
+  // 
+
+  /**
+   * Encodes an IP address properly as a URL string. This method makes sure 
that IPv6 addresses have
+   * the proper formatting to be included in URLs.
+   *
+   * @param address The IP address to encode.
+   * @return The proper URL string encoded IP address.
+   */
+  public static String ipAddressToUrlString(InetAddress address) {
 
 Review comment:
   Would like to make sure the methods here are not copied from someplace else.. I see a similar class in Flink. We have to attribute code if we reuse it from another project, and this adds overhead in terms of license clearance when we release.. Please respond, so we can decide accordingly.




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1106: [HUDI-209] Implement JMX metrics reporter

2019-12-25 Thread GitBox
vinothchandar commented on a change in pull request #1106: [HUDI-209] Implement 
JMX metrics reporter
URL: https://github.com/apache/incubator-hudi/pull/1106#discussion_r361340146
 
 

 ##
 File path: 
hudi-client/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
 ##
 @@ -478,8 +478,8 @@ public String getJmxHost() {
 return props.getProperty(HoodieMetricsConfig.JMX_HOST);
   }
 
-  public int getJmxPort() {
-return Integer.parseInt(props.getProperty(HoodieMetricsConfig.JMX_PORT));
+  public String getJmxPorts() {
 
 Review comment:
   rename: getJmxPort() [singular] 




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1106: [HUDI-209] Implement JMX metrics reporter

2019-12-25 Thread GitBox
vinothchandar commented on a change in pull request #1106: [HUDI-209] Implement 
JMX metrics reporter
URL: https://github.com/apache/incubator-hudi/pull/1106#discussion_r361340118
 
 

 ##
 File path: 
hudi-client/src/main/java/org/apache/hudi/client/utils/UnionIterator.java
 ##
 @@ -0,0 +1,106 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.client.utils;
+
+import org.apache.hudi.exception.HoodieException;
+
+import java.util.ArrayList;
+import java.util.Iterator;
+import java.util.List;
+import java.util.NoSuchElementException;
+
+/**
+ * An iterator that concatenates a collection of iterators. The UnionIterator 
is a mutable, reusable
+ * type.
+ *
+ * @param <T> The type returned by the iterator.
+ */
+public class UnionIterator<T> implements Iterator<T>, Iterable<T> {
 
 Review comment:
   Let's get rid of this class and simply do `Stream.concat`.
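
   For reference, a minimal sketch of the `Stream.concat` alternative with two hypothetical list sources (illustrative only, not code from this PR):
{code:java}
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.stream.Stream;

List<String> first = Arrays.asList("a", "b");
List<String> second = Arrays.asList("c", "d");
// Stream.concat yields the elements of both sources in order,
// which removes the need for a custom UnionIterator class.
Iterator<String> union = Stream.concat(first.stream(), second.stream()).iterator();
while (union.hasNext()) {
  System.out.println(union.next()); // prints a, b, c, d (one per line)
}
{code}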




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1106: [HUDI-209] Implement JMX metrics reporter

2019-12-25 Thread GitBox
vinothchandar commented on a change in pull request #1106: [HUDI-209] Implement 
JMX metrics reporter
URL: https://github.com/apache/incubator-hudi/pull/1106#discussion_r361340224
 
 

 ##
 File path: 
hudi-client/src/main/java/org/apache/hudi/metrics/JmxMetricsReporter.java
 ##
 @@ -82,6 +102,106 @@ public void report() {
 
   @Override
   public Closeable getReporter() {
-return null;
+return jmxServer.getReporter();
+  }
+
+  @Override
+  public void stop() {
+if (jmxServer != null) {
+  try {
+jmxServer.stop();
+  } catch (IOException e) {
+LOG.error("Failed to stop JMX server.", e);
+  }
+}
+  }
+
+  /**
+   * JMX Server implementation that JMX clients can connect to.
+   *
+   * Heavily based on j256 simplejmx project
 
 Review comment:
   What's the licensing of this code?




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1106: [HUDI-209] Implement JMX metrics reporter

2019-12-25 Thread GitBox
vinothchandar commented on a change in pull request #1106: [HUDI-209] Implement 
JMX metrics reporter
URL: https://github.com/apache/incubator-hudi/pull/1106#discussion_r361340109
 
 

 ##
 File path: hudi-client/src/main/java/org/apache/hudi/client/utils/NetUtils.java
 ##
 @@ -0,0 +1,202 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.client.utils;
+
+import org.apache.hudi.exception.HoodieException;
+
+import java.net.Inet4Address;
+import java.net.Inet6Address;
+import java.net.InetAddress;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.Iterator;
+
+/**
+ * Utility for various network related tasks (such as finding free ports).
+ */
+public class NetUtils {
+  // 
+  //  Encoding of IP addresses for URLs
+  // 
+
+  /**
+   * Encodes an IP address properly as a URL string. This method makes sure 
that IPv6 addresses have
+   * the proper formatting to be included in URLs.
+   *
+   * @param address The IP address to encode.
+   * @return The proper URL string encoded IP address.
+   */
+  public static String ipAddressToUrlString(InetAddress address) {
+if (address == null) {
+  throw new NullPointerException("address is null");
+} else if (address instanceof Inet4Address) {
+  return address.getHostAddress();
+} else if (address instanceof Inet6Address) {
+  return getIPv6UrlRepresentation((Inet6Address) address);
+} else {
+  throw new IllegalArgumentException("Unrecognized type of InetAddress: " 
+ address);
+}
+  }
+
+  /**
+   * Creates a compressed URL style representation of an Inet6Address.
+   *
+   * This method copies and adopts code from Google's Guava library.
+   * We re-implement this here in order to reduce dependency on Guava. The 
Guava library has
+   * frequently caused dependency conflicts in the past.
+   */
+  private static String getIPv6UrlRepresentation(Inet6Address address) {
+return getIPv6UrlRepresentation(address.getAddress());
+  }
+
+  /**
+   * Creates a compressed URL style representation of an Inet6Address.
+   *
+   * This method copies and adopts code from Google's Guava library.
+   * We re-implement this here in order to reduce dependency on Guava. The 
Guava library has
+   * frequently caused dependency conflicts in the past.
+   */
+  private static String getIPv6UrlRepresentation(byte[] addressBytes) {
+// first, convert bytes to 16 bit chunks
+int[] hextets = new int[8];
+for (int i = 0; i < hextets.length; i++) {
+  hextets[i] = (addressBytes[2 * i] & 0xFF) << 8 | (addressBytes[2 * i + 
1] & 0xFF);
+}
+
+// now, find the sequence of zeros that should be compressed
+int bestRunStart = -1;
+int bestRunLength = -1;
+int runStart = -1;
+for (int i = 0; i < hextets.length + 1; i++) {
+  if (i < hextets.length && hextets[i] == 0) {
+if (runStart < 0) {
+  runStart = i;
+}
+  } else if (runStart >= 0) {
+int runLength = i - runStart;
+if (runLength > bestRunLength) {
+  bestRunStart = runStart;
+  bestRunLength = runLength;
+}
+runStart = -1;
+  }
+}
+if (bestRunLength >= 2) {
+  Arrays.fill(hextets, bestRunStart, bestRunStart + bestRunLength, -1);
+}
+
+// convert into text form
+StringBuilder buf = new StringBuilder(40);
+buf.append('[');
+
+boolean lastWasNumber = false;
+for (int i = 0; i < hextets.length; i++) {
+  boolean thisIsNumber = hextets[i] >= 0;
+  if (thisIsNumber) {
+if (lastWasNumber) {
+  buf.append(':');
+}
+buf.append(Integer.toHexString(hextets[i]));
+  } else {
+if (i == 0 || lastWasNumber) {
+  buf.append("::");
+}
+  }
+  lastWasNumber = thisIsNumber;
+}
+buf.append(']');
+return buf.toString();
+  }
+
+  // 
+  //  Port range parsing
+  // 
+
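
For illustration, a hedged usage sketch of the `ipAddressToUrlString` method quoted above (the `NetUtils` class name is from this diff; the addresses are arbitrary examples):
{code:java}
import java.net.InetAddress;

// IPv4 addresses pass through unchanged; IPv6 addresses are bracketed and
// their longest run of zero hextets is collapsed to "::" (RFC 5952 style).
System.out.println(NetUtils.ipAddressToUrlString(InetAddress.getByName("192.168.1.1")));
// -> 192.168.1.1
System.out.println(NetUtils.ipAddressToUrlString(InetAddress.getByName("2001:db8:0:0:0:0:0:1")));
// -> [2001:db8::1]
{code}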

[GitHub] [incubator-hudi] SteNicholas opened a new pull request #1137: [MINOR]Optimize hudi-timeline-service module

2019-12-25 Thread GitBox
SteNicholas opened a new pull request #1137: [MINOR]Optimize 
hudi-timeline-service module
URL: https://github.com/apache/incubator-hudi/pull/1137
 
 
   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   
   Optimize the hudi-timeline-service module code, including simplifying lambda expressions and fixing spelling mistakes.
   
   ## Brief change log
   
 - Simplify lambda expressions.
 - Fix spelling mistakes.
   
   ## Verify this pull request
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.




[GitHub] [incubator-hudi] SteNicholas opened a new pull request #1136: [MINOR]Optimize hudi-cli module

2019-12-25 Thread GitBox
SteNicholas opened a new pull request #1136: [MINOR]Optimize hudi-cli module
URL: https://github.com/apache/incubator-hudi/pull/1136
 
 
   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   
   Optimize the hudi-cli module code, including simplifying lambda expressions, fixing spelling mistakes, and removing unused variables.
   
   ## Brief change log
   
 - Simplify lambda expressions.
 - Fix spelling mistakes.
 - Remove unused variables.
   
   ## Verify this pull request
   
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.




[GitHub] [incubator-hudi] leesf commented on issue #1130: [HUDI-445]Refactor the codes based on scala codestyle BlockImportChecker rule

2019-12-25 Thread GitBox
leesf commented on issue #1130: [HUDI-445]Refactor the codes based on scala 
codestyle BlockImportChecker rule
URL: https://github.com/apache/incubator-hudi/pull/1130#issuecomment-568894069
 
 
   @sev7e0 Closing the PR as discussed above. :)




[GitHub] [incubator-hudi] leesf closed pull request #1130: [HUDI-445]Refactor the codes based on scala codestyle BlockImportChecker rule

2019-12-25 Thread GitBox
leesf closed pull request #1130: [HUDI-445]Refactor the codes based on scala 
codestyle BlockImportChecker rule
URL: https://github.com/apache/incubator-hudi/pull/1130
 
 
   




[incubator-hudi] branch master updated (8affdf8 -> 3c811ec)

2019-12-25 Thread leesf
This is an automated email from the ASF dual-hosted git repository.

leesf pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git.


from 8affdf8  [HUDI-416] Improve hint information for cli (#1110)
 add 3c811ec  [MINOR] fix typos

No new revisions were added by this update.

Summary of changes:
 hudi-cli/src/main/java/org/apache/hudi/cli/Table.java| 2 +-
 hudi-client/src/main/java/org/apache/hudi/HoodieWriteClient.java | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)



[GitHub] [incubator-hudi] leesf merged pull request #1134: [MINOR] fix typo

2019-12-25 Thread GitBox
leesf merged pull request #1134: [MINOR] fix typo
URL: https://github.com/apache/incubator-hudi/pull/1134
 
 
   




[jira] [Closed] (HUDI-416) Improve hint information for Cli

2019-12-25 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-416.
--
Fix Version/s: 0.5.1
   Resolution: Fixed

Fixed via master: 8affdf8bcbb4c7b236283e97c3afad186d5b6a3e

> Improve hint information for Cli
> 
>
> Key: HUDI-416
> URL: https://issues.apache.org/jira/browse/HUDI-416
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: CLI
>Reporter: hong dongdong
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Right now, the cli always gives this error message: 
> {code:java}
> Command 'desc' was found but is not currently available (type 'help' then 
> ENTER to learn about this command)
> {code}
> but it is confusing to the user. We can give a clearer hint like:
> {code:java}
> Command failed java.lang.NullPointerException: There is no hudi dataset. 
> Please use connect command to set dataset first
> {code}





[GitHub] [incubator-hudi] leesf merged pull request #1110: [HUDI-416] improve hint information for cli

2019-12-25 Thread GitBox
leesf merged pull request #1110: [HUDI-416] improve hint information for cli
URL: https://github.com/apache/incubator-hudi/pull/1110
 
 
   



