[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1289: [HUDI-92] Provide reasonable names for Spark DAG stages in Hudi.

2020-02-05 Thread GitBox
vinothchandar commented on a change in pull request #1289: [HUDI-92] Provide 
reasonable names for Spark DAG stages in Hudi.
URL: https://github.com/apache/incubator-hudi/pull/1289#discussion_r375667569
 
 

 ##
 File path: 
hudi-client/src/main/java/org/apache/hudi/index/bloom/HoodieBloomIndex.java
 ##
 @@ -254,6 +255,7 @@ private int determineParallelism(int inputParallelism, int 
totalSubPartitions) {
 
 if (config.getBloomIndexPruneByRanges()) {
   // also obtain file ranges, if range pruning is enabled
+  jsc.setJobDescription("Obtain file ranges as range pruning is enabled");
 
 Review comment:
   `Obtain key ranges for file slices (range pruning=on)`


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1289: [HUDI-92] Provide reasonable names for Spark DAG stages in Hudi.

2020-02-05 Thread GitBox
vinothchandar commented on a change in pull request #1289: [HUDI-92] Provide 
reasonable names for Spark DAG stages in Hudi.
URL: https://github.com/apache/incubator-hudi/pull/1289#discussion_r375668377
 
 

 ##
 File path: 
hudi-client/src/test/java/org/apache/hudi/HoodieClientTestHarness.java
 ##
 @@ -107,11 +107,12 @@ protected void initSparkContexts(String appName) {
   }
 
   /**
-   * Initializes the Spark contexts ({@link JavaSparkContext} and {@link 
SQLContext}) with a default name
-   * TestHoodieClient.
+   * Initializes the Spark contexts ({@link JavaSparkContext} and {@link 
SQLContext}) 
+   * with a default name matching the name of the class.
*/
   protected void initSparkContexts() {
-initSparkContexts("TestHoodieClient");
+String ctxName = this.getClass().getSimpleName() + "#" + 
testName.getMethodName();
+initSparkContexts(ctxName);
 
 Review comment:
   can we do this in a single line? 
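A minimal sketch of the single-line form being suggested. The class and field below are illustrative stand-ins for the real `HoodieClientTestHarness` and its JUnit `TestName` rule:

```java
public class ContextNameSketch {

    // Stand-in for the JUnit TestName rule value in the real test harness.
    private final String testMethodName = "testUpsert";

    // The two statements from the diff collapsed into a single expression.
    public String contextName() {
        return this.getClass().getSimpleName() + "#" + testMethodName;
    }

    public static void main(String[] args) {
        // Prints the Spark app name the test would use, e.g. "ContextNameSketch#testUpsert".
        System.out.println(new ContextNameSketch().contextName());
    }
}
```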




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1289: [HUDI-92] Provide reasonable names for Spark DAG stages in Hudi.

2020-02-05 Thread GitBox
vinothchandar commented on a change in pull request #1289: [HUDI-92] Provide 
reasonable names for Spark DAG stages in Hudi.
URL: https://github.com/apache/incubator-hudi/pull/1289#discussion_r375667688
 
 

 ##
 File path: 
hudi-client/src/main/java/org/apache/hudi/io/compact/HoodieMergeOnReadTableCompactor.java
 ##
 @@ -94,6 +94,7 @@
 
.map(CompactionOperation::convertFromAvroRecordInstance).collect(toList());
 LOG.info("Compactor compacting " + operations + " files");
 
+jsc.setJobGroup(this.getClass().getSimpleName(), "Compacting files");
 
 Review comment:
   Compacting file slices?




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1289: [HUDI-92] Provide reasonable names for Spark DAG stages in Hudi.

2020-02-05 Thread GitBox
vinothchandar commented on a change in pull request #1289: [HUDI-92] Provide 
reasonable names for Spark DAG stages in Hudi.
URL: https://github.com/apache/incubator-hudi/pull/1289#discussion_r375665978
 
 

 ##
 File path: hudi-client/src/main/java/org/apache/hudi/CompactionAdminClient.java
 ##
 @@ -398,6 +400,7 @@ private ValidationOpResult 
validateCompactionOperation(HoodieTableMetaClient met
   "Number of Compaction Operations :" + plan.getOperations().size() + 
" for instant :" + compactionInstant);
  List<CompactionOperation> ops = plan.getOperations().stream()
  .map(CompactionOperation::convertFromAvroRecordInstance).collect(Collectors.toList());
+  jsc.setJobGroup(this.getClass().getSimpleName(), "Generate renaming 
operations");
 
 Review comment:
   change to `Generate compaction unscheduling operations` ? 




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1289: [HUDI-92] Provide reasonable names for Spark DAG stages in Hudi.

2020-02-05 Thread GitBox
vinothchandar commented on a change in pull request #1289: [HUDI-92] Provide 
reasonable names for Spark DAG stages in Hudi.
URL: https://github.com/apache/incubator-hudi/pull/1289#discussion_r375667190
 
 

 ##
 File path: hudi-client/src/main/java/org/apache/hudi/HoodieWriteClient.java
 ##
 @@ -586,6 +586,7 @@ public boolean savepoint(String commitTime, String user, 
String comment) {
   HoodieTimeline.compareTimestamps(commitTime, lastCommitRetained, 
HoodieTimeline.GREATER_OR_EQUAL),
   "Could not savepoint commit " + commitTime + " as this is beyond the 
lookup window " + lastCommitRetained);
 
+  jsc.setJobGroup(this.getClass().getSimpleName(), "Collecting latest 
files in partition");
 
 Review comment:
   In general, let's provide some context on the higher-level action being 
performed, i.e. savepoints, compaction, rollbacks, etc. In that spirit, change 
to `Collecting latest files for savepoint`? 
   
   Also, I wonder if we can include the `commitTime` in the detail, i.e. 
`Collecting latest files for savepoint 2020020501`. This way, you can just go 
to past runs on the Spark history server and relate them to commits in Hudi.. 
Even better, if someone is running DeltaStreamer in continuous mode, they can 
see activity for commits over time 
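To make the commit-time suggestion concrete, here is a hedged sketch of building such a job-group description; the action name and commit time are illustrative values, and in the real code the resulting string would be the `description` argument to `JavaSparkContext.setJobGroup(groupId, description)`:

```java
public class JobGroupDescriptionSketch {

    // Builds a description like "Collecting latest files for savepoint 2020020501",
    // so Spark history server entries can be related back to Hudi commits.
    public static String describe(String action, String commitTime) {
        return "Collecting latest files for " + action + " " + commitTime;
    }

    public static void main(String[] args) {
        // Illustrative values; a real caller would pass the current instant time.
        System.out.println(describe("savepoint", "2020020501"));
    }
}
```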




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1289: [HUDI-92] Provide reasonable names for Spark DAG stages in Hudi.

2020-02-05 Thread GitBox
vinothchandar commented on a change in pull request #1289: [HUDI-92] Provide 
reasonable names for Spark DAG stages in Hudi.
URL: https://github.com/apache/incubator-hudi/pull/1289#discussion_r375668006
 
 

 ##
 File path: 
hudi-client/src/main/java/org/apache/hudi/table/HoodieCopyOnWriteTable.java
 ##
 @@ -298,6 +298,7 @@ public HoodieCleanerPlan scheduleClean(JavaSparkContext 
jsc) {
   int cleanerParallelism = Math.min(partitionsToClean.size(), 
config.getCleanerParallelism());
   LOG.info("Using cleanerParallelism: " + cleanerParallelism);
 
+  jsc.setJobGroup(this.getClass().getSimpleName(), "Generates List of 
files to be cleaned");
 
 Review comment:
   I know the comments say `files` and you are just using that, but it would be 
nice to stick to our terminology as much as possible: 
https://cwiki.apache.org/confluence/display/HUDI/Design+And+Architecture 




[GitHub] [incubator-hudi] UZi5136225 commented on issue #1308: [Hudi-561] partition path config

2020-02-05 Thread GitBox
UZi5136225 commented on issue #1308: [Hudi-561] partition path config
URL: https://github.com/apache/incubator-hudi/pull/1308#issuecomment-582763965
 
 
   Currently the Hudi partition field is configured using 
hoodie.datasource.write.partitionpath.field.
   If the content of the partition field is 2020/02/06, we can correctly 
partition the data and build the Hudi data directory.
   But usually the data is not 2020/02/06 but 2020-02-06 15:34:20 (yyyy-MM-dd 
HH:mm:ss) or some other format, and from such formats Hudi cannot create the 
data directory correctly or in a friendly way.
   Therefore, I want to add source and target time-format configuration in 
   dfs-source.properties for time conversion.
   @vinothchandar 
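A hedged sketch of the conversion being proposed: reformatting a source timestamp such as `2020-02-06 15:34:20` into a `yyyy/MM/dd` partition path. The actual property names Hudi would use are not settled here; this only illustrates the source/target format step:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class PartitionPathFormatSketch {

    // Converts a timestamp string from a source format to a target
    // partition-path format, e.g. "yyyy-MM-dd HH:mm:ss" -> "yyyy/MM/dd".
    public static String toPartitionPath(String value, String sourceFormat, String targetFormat) {
        DateTimeFormatter source = DateTimeFormatter.ofPattern(sourceFormat);
        DateTimeFormatter target = DateTimeFormatter.ofPattern(targetFormat);
        return LocalDateTime.parse(value, source).format(target);
    }

    public static void main(String[] args) {
        // "2020-02-06 15:34:20" becomes the partition path "2020/02/06".
        System.out.println(toPartitionPath("2020-02-06 15:34:20", "yyyy-MM-dd HH:mm:ss", "yyyy/MM/dd"));
    }
}
```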
   




[GitHub] [incubator-hudi] vinothchandar edited a comment on issue #1289: [HUDI-92] Provide reasonable names for Spark DAG stages in Hudi.

2020-02-05 Thread GitBox
vinothchandar edited a comment on issue #1289: [HUDI-92] Provide reasonable 
names for Spark DAG stages in Hudi.
URL: https://github.com/apache/incubator-hudi/pull/1289#issuecomment-582759343
 
 
   >The unit tests need to be run as follows:
   > mvn test -DSPARK_EVLOG_DIR=/path/for/spark/event/log
   
   let's add this to the README, under a new section `Running Tests` 




[GitHub] [incubator-hudi] vinothchandar commented on issue #1289: [HUDI-92] Provide reasonable names for Spark DAG stages in Hudi.

2020-02-05 Thread GitBox
vinothchandar commented on issue #1289: [HUDI-92] Provide reasonable names for 
Spark DAG stages in Hudi.
URL: https://github.com/apache/incubator-hudi/pull/1289#issuecomment-582759343
 
 
   >The unit tests need to be run as follows:
   > mvn test -DSPARK_EVLOG_DIR=/path/for/spark/event/log
   let's add this to the README, under a new section `Running Tests` 




[GitHub] [incubator-hudi] vinothchandar commented on issue #1289: [HUDI-92] Provide reasonable names for Spark DAG stages in Hudi.

2020-02-05 Thread GitBox
vinothchandar commented on issue #1289: [HUDI-92] Provide reasonable names for 
Spark DAG stages in Hudi.
URL: https://github.com/apache/incubator-hudi/pull/1289#issuecomment-582758667
 
 
   @prashantwason this is so awesome! Started reviewing this .. 




[jira] [Commented] (HUDI-553) Building/Running Hudi on higher java versions

2020-02-05 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-553?focusedCommentId=17031304&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17031304
 ] 

Vinoth Chandar commented on HUDI-553:
-

yes.. we would like hudi to run on jdk 9, 11 without issues if possible.. 
interested in picking this up? :) 

> Building/Running Hudi on higher java versions
> -
>
> Key: HUDI-553
> URL: https://issues.apache.org/jira/browse/HUDI-553
> Project: Apache Hudi (incubating)
>  Issue Type: Task
>  Components: Usability
>Reporter: Vinoth Chandar
>Priority: Major
> Fix For: 0.6.0
>
>
> [https://github.com/apache/incubator-hudi/issues/1235] 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] vinothchandar commented on issue #1232: [HUDI-529] Added cobertura coverage reporting support.

2020-02-05 Thread GitBox
vinothchandar commented on issue #1232: [HUDI-529] Added cobertura coverage 
reporting support.
URL: https://github.com/apache/incubator-hudi/pull/1232#issuecomment-582756949
 
 
   I see you want Gradle mostly because of JaCoCo?  Let's hash out the need for 
bringing in Gradle, which seems like overkill.. 
   you could just add an extra step in Jenkins with a script like this: 
https://github.com/rix0rrr/cover2cover (don't know if this specific one works, 
but you get my general point).




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1151: [WIP] [HUDI-476] Add hudi-examples module

2020-02-05 Thread GitBox
vinothchandar commented on a change in pull request #1151: [WIP] [HUDI-476] Add 
hudi-examples module
URL: https://github.com/apache/incubator-hudi/pull/1151#discussion_r375657392
 
 

 ##
 File path: 
hudi-examples/src/main/java/org/apache/hudi/examples/spark/HoodieWriteClientExample.java
 ##
 @@ -0,0 +1,133 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.examples.spark;
+
+import org.apache.hudi.HoodieWriteClient;
+import org.apache.hudi.WriteStatus;
+import org.apache.hudi.common.model.HoodieAvroPayload;
+import org.apache.hudi.common.model.HoodieKey;
+import org.apache.hudi.common.model.HoodieRecord;
+import org.apache.hudi.common.model.HoodieTableType;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.util.FSUtils;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.config.HoodieCompactionConfig;
+import org.apache.hudi.config.HoodieIndexConfig;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.examples.common.HoodieExampleDataGenerator;
+import org.apache.hudi.examples.common.HoodieExampleSparkUtils;
+import org.apache.hudi.index.HoodieIndex;
+
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.SparkConf;
+import org.apache.spark.api.java.JavaRDD;
+import org.apache.spark.api.java.JavaSparkContext;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.stream.Collectors;
+
+
+/**
+ * Simple examples of #{@link HoodieWriteClient}.
+ *
+ * To run this example, you should
+ *   1. For running in IDE, set VM options `-Dspark.master=local[2]`
+ *   2. For running in shell, using `spark-submit`
+ *
+ * Usage: HoodieWriteClientExample <tablePath> <tableName>
+ * <tablePath> and <tableName> describe the root path of the Hudi dataset and the table name;
+ * for example, `HoodieWriteClientExample file:///tmp/hoodie/sample-table 
hoodie_rt`
+ */
+public class HoodieWriteClientExample {
 
 Review comment:
   Please feel free to remove `HoodieClientExample` in 
hudi-client/src/test/java in favor of this.. (may need to confirm that the 
integ-test does not depend on it) 




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1151: [WIP] [HUDI-476] Add hudi-examples module

2020-02-05 Thread GitBox
vinothchandar commented on a change in pull request #1151: [WIP] [HUDI-476] Add 
hudi-examples module
URL: https://github.com/apache/incubator-hudi/pull/1151#discussion_r375654298
 
 

 ##
 File path: hudi-examples/pom.xml
 ##
 @@ -0,0 +1,206 @@
+
+
+<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+
+  <parent>
+    <artifactId>hudi</artifactId>
+    <groupId>org.apache.hudi</groupId>
+    <version>0.5.2-SNAPSHOT</version>
+  </parent>
+  <modelVersion>4.0.0</modelVersion>
+
+  <artifactId>hudi-examples</artifactId>
+  <packaging>jar</packaging>
+
+  <properties>
+    <main.basedir>${project.parent.basedir}</main.basedir>
+  </properties>
+
+  <build>
+    <resources>
+      <resource>
+        <directory>src/main/resources</directory>
+      </resource>
+    </resources>
+
+    <plugins>
+      <plugin>
+        <groupId>org.apache.maven.plugins</groupId>
+        <artifactId>maven-dependency-plugin</artifactId>
+        <executions>
+          <execution>
+            <id>copy-dependencies</id>
+            <phase>prepare-package</phase>
+            <goals>
+              <goal>copy-dependencies</goal>
+            </goals>
+            <configuration>
+              <outputDirectory>${project.build.directory}/lib</outputDirectory>
+              <overWriteReleases>true</overWriteReleases>
+              <overWriteSnapshots>true</overWriteSnapshots>
+              <overWriteIfNewer>true</overWriteIfNewer>
+            </configuration>
+          </execution>
+        </executions>
+      </plugin>
+      <plugin>
+        <groupId>net.alchim31.maven</groupId>
+        <artifactId>scala-maven-plugin</artifactId>
+        <executions>
+          <execution>
+            <id>scala-compile-first</id>
+            <phase>process-resources</phase>
+            <goals>
+              <goal>add-source</goal>
+              <goal>compile</goal>
+            </goals>
+          </execution>
+        </executions>
+      </plugin>
+      <plugin>
+        <groupId>org.apache.maven.plugins</groupId>
+        <artifactId>maven-compiler-plugin</artifactId>
+        <executions>
+          <execution>
+            <id>compile</id>
+            <goals>
+              <goal>compile</goal>
+            </goals>
+          </execution>
+        </executions>
+      </plugin>
+      <plugin>
+        <groupId>org.apache.maven.plugins</groupId>
+        <artifactId>maven-jar-plugin</artifactId>
+        <executions>
+          <execution>
+            <goals>
+              <goal>test-jar</goal>
+            </goals>
+            <phase>test-compile</phase>
+          </execution>
+        </executions>
+        <configuration>
+          <skip>false</skip>
+        </configuration>
+      </plugin>
+      <plugin>
+        <groupId>org.apache.rat</groupId>
+        <artifactId>apache-rat-plugin</artifactId>
+      </plugin>
+    </plugins>
+  </build>
+
+  <dependencies>
+    <dependency>
+      <groupId>org.scala-lang</groupId>
+      <artifactId>scala-library</artifactId>
+      <version>${scala.version}</version>
+    </dependency>
+
+    <dependency>
+      <groupId>org.apache.hudi</groupId>
+      <artifactId>hudi-common</artifactId>
+      <version>${project.version}</version>
+    </dependency>
+
+    <dependency>
+      <groupId>org.apache.hudi</groupId>
+      <artifactId>hudi-cli</artifactId>
+      <version>${project.version}</version>
+    </dependency>
+
+    <dependency>
+      <groupId>org.apache.hudi</groupId>
+      <artifactId>hudi-client</artifactId>
+      <version>${project.version}</version>
+    </dependency>
+
+    <dependency>
+      <groupId>org.apache.hudi</groupId>
+      <artifactId>hudi-utilities_${scala.binary.version}</artifactId>
+      <version>${project.version}</version>
+    </dependency>
+
+    <dependency>
+      <groupId>org.apache.hudi</groupId>
+      <artifactId>hudi-spark_${scala.binary.version}</artifactId>
+      <version>${project.version}</version>
+    </dependency>
+
+    <dependency>
+      <groupId>org.apache.hudi</groupId>
+      <artifactId>hudi-hadoop-mr</artifactId>
+      <version>${project.version}</version>
+    </dependency>
+
+    <dependency>
+      <groupId>org.apache.hudi</groupId>
+      <artifactId>hudi-hive</artifactId>
+      <version>${project.version}</version>
+    </dependency>
+
+    <dependency>
+      <groupId>org.apache.hudi</groupId>
+      <artifactId>hudi-timeline-service</artifactId>
+      <version>${project.version}</version>
+    </dependency>
+
+    <dependency>
+      <groupId>org.apache.spark</groupId>
+      <artifactId>spark-core_${scala.binary.version}</artifactId>
+    </dependency>
+    <dependency>
+      <groupId>org.apache.spark</groupId>
+      <artifactId>spark-sql_${scala.binary.version}</artifactId>
+    </dependency>
+    <dependency>
+      <groupId>org.apache.spark</groupId>
+      <artifactId>spark-avro_${scala.binary.version}</artifactId>
+      <scope>provided</scope>
 
 Review comment:
   why have this specifically as `provided` given its in that scope in parent 
pom already?




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1151: [WIP] [HUDI-476] Add hudi-examples module

2020-02-05 Thread GitBox
vinothchandar commented on a change in pull request #1151: [WIP] [HUDI-476] Add 
hudi-examples module
URL: https://github.com/apache/incubator-hudi/pull/1151#discussion_r375654859
 
 

 ##
 File path: 
hudi-examples/src/main/java/org/apache/hudi/examples/common/HoodieExampleDataGenerator.java
 ##
 @@ -0,0 +1,216 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.examples.common;
+
+import org.apache.hudi.common.model.HoodieAvroPayload;
+import org.apache.hudi.common.model.HoodieKey;
+import org.apache.hudi.common.model.HoodieRecord;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.util.HoodieAvroUtils;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.TypedProperties;
+import org.apache.hudi.utilities.schema.SchemaProvider;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.spark.api.java.JavaSparkContext;
+
+import java.io.IOException;
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Random;
+import java.util.UUID;
+import java.util.stream.Collectors;
+import java.util.stream.IntStream;
+import java.util.stream.Stream;
+
+
+/**
+ * Class to be used to generate test data.
+ */
+public class HoodieExampleDataGenerator<T extends HoodieRecordPayload<T>> {
 
 Review comment:
   I think we had to do this for QuickStartUtils as well.. cc @bhasudha .. Maybe 
we can create a code cleanup JIRA to consolidate this data generation into a 
common module inside Hudi and re-use it consistently? 




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1151: [WIP] [HUDI-476] Add hudi-examples module

2020-02-05 Thread GitBox
vinothchandar commented on a change in pull request #1151: [WIP] [HUDI-476] Add 
hudi-examples module
URL: https://github.com/apache/incubator-hudi/pull/1151#discussion_r375656925
 
 

 ##
 File path: 
hudi-examples/src/main/java/org/apache/hudi/examples/deltastreamer/HoodieDeltaStreamerKafkaSourceExample.java
 ##
 @@ -0,0 +1,87 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.examples.deltastreamer;
+
+import org.apache.hudi.DataSourceWriteOptions;
+import org.apache.hudi.examples.common.HoodieExampleDataGenerator;
+import org.apache.hudi.examples.common.HoodieExampleSparkUtils;
+import org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer;
+import org.apache.hudi.utilities.sources.JsonKafkaSource;
+import org.apache.hudi.utilities.transform.IdentityTransformer;
+
+import com.beust.jcommander.JCommander;
+import org.apache.spark.SparkConf;
+import org.apache.spark.api.java.JavaSparkContext;
+
+
+/**
+ * Simple examples of #{@link HoodieDeltaStreamer} from #{@link 
JsonKafkaSource}.
 
 Review comment:
   Also, could we add more javadoc here describing what each example is trying 
to achieve? 




[GitHub] [incubator-hudi] hmatu commented on a change in pull request #1242: [HUDI-544] Adjust the read and write path of archive

2020-02-05 Thread GitBox
hmatu commented on a change in pull request #1242: [HUDI-544] Adjust the read 
and write path of archive
URL: https://github.com/apache/incubator-hudi/pull/1242#discussion_r375657909
 
 

 ##
 File path: 
hudi-cli/src/main/java/org/apache/hudi/cli/commands/ArchivedCommitsCommand.java
 ##
 @@ -138,9 +139,11 @@ public String showCommits(
   throws IOException {
 
 System.out.println("===> Showing only " + limit + " archived 
commits <===");
-String basePath = HoodieCLI.getTableMetaClient().getBasePath();
+HoodieTableMetaClient metaClient = HoodieCLI.getTableMetaClient();
+String basePath = metaClient.getBasePath();
+Path archivePath = new Path(metaClient.getArchivePath() + 
"/.commits_.archive*");
 FileStatus[] fsStatuses =
-FSUtils.getFs(basePath, HoodieCLI.conf).globStatus(new Path(basePath + 
"/.hoodie/.commits_.archive*"));
+FSUtils.getFs(basePath, HoodieCLI.conf).globStatus(archivePath);
 
 Review comment:
   Thanks for your explanation, but I still think the right way is to add 
`archiveFolderPattern` to the `show archived commits` command, like `show 
archived commit stats` does.




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1151: [WIP] [HUDI-476] Add hudi-examples module

2020-02-05 Thread GitBox
vinothchandar commented on a change in pull request #1151: [WIP] [HUDI-476] Add 
hudi-examples module
URL: https://github.com/apache/incubator-hudi/pull/1151#discussion_r375656811
 
 

 ##
 File path: 
hudi-examples/src/main/java/org/apache/hudi/examples/deltastreamer/HoodieDeltaStreamerKafkaSourceExample.java
 ##
 @@ -0,0 +1,87 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.examples.deltastreamer;
+
+import org.apache.hudi.DataSourceWriteOptions;
+import org.apache.hudi.examples.common.HoodieExampleDataGenerator;
+import org.apache.hudi.examples.common.HoodieExampleSparkUtils;
+import org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer;
+import org.apache.hudi.utilities.sources.JsonKafkaSource;
+import org.apache.hudi.utilities.transform.IdentityTransformer;
+
+import com.beust.jcommander.JCommander;
+import org.apache.spark.SparkConf;
+import org.apache.spark.api.java.JavaSparkContext;
+
+
+/**
+ * Simple examples of #{@link HoodieDeltaStreamer} from #{@link 
JsonKafkaSource}.
+ *
+ * To run this example, you should
+ *1. Start Zookeeper and the Kafka demo server
+ *2. For running in IDE, set VM options `-Dspark.master=local[2]`
+ *3. For running in shell, using `spark-submit`
+ *4. produce some data to hoodie-source-topic configured by 
`hoodie.deltastreamer.source.kafka.topic`
+ *
+ * Usage: HoodieDeltaStreamerKafkaSourceExample \
+ *--target-base-path /tmp/hoodie/kafkadeltatable \
+ *--table-type MERGE_ON_READ \
+ *--target-table kafkadeltatable
+ */
+public class HoodieDeltaStreamerKafkaSourceExample {
+
+  public static void main(String[] args) throws Exception {
+
+final HoodieDeltaStreamer.Config cfg = defaultKafkaDeltaStreamerConfig();
+new JCommander(cfg).parse(args);
+
+SparkConf sparkConf = 
HoodieExampleSparkUtils.defaultSparkConf("hoodie-delta-streamer-kafka-source-example");
+JavaSparkContext jsc = new JavaSparkContext(sparkConf);
+
+try {
+  new HoodieDeltaStreamer(cfg, jsc).sync();
+} finally {
+  jsc.stop();
+}
+  }
+
+  /**
+   * also see #{@link HoodieDeltaStreamer.Config} for more params.
+   * @return default params for Kafka DeltaStreamer
+   */
+  private static HoodieDeltaStreamer.Config defaultKafkaDeltaStreamerConfig() {
+
+HoodieDeltaStreamer.Config cfg = new HoodieDeltaStreamer.Config();
+
+cfg.configs.add(String.format("%s=uuid", 
DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY()));
 
 Review comment:
   A few more comments for the important configs, to guide the user? For 
example, for 
   `cfg.configs.add("bootstrap.servers=localhost:9092");`
   
   we can add `// The Kafka cluster we want to ingest from` 
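As a sketch of the commented-config style being asked for (the keys mirror those in the quoted snippet; the values and comment wording are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

public class CommentedConfigSketch {

    // Assembles the DeltaStreamer-style key=value config strings, with a
    // guiding comment on each important entry.
    public static List<String> kafkaConfigs() {
        List<String> configs = new ArrayList<>();
        // The Kafka cluster we want to ingest from.
        configs.add("bootstrap.servers=localhost:9092");
        // Field used as the Hudi record key (mirrors RECORDKEY_FIELD_OPT_KEY).
        configs.add("hoodie.datasource.write.recordkey.field=uuid");
        return configs;
    }

    public static void main(String[] args) {
        kafkaConfigs().forEach(System.out::println);
    }
}
```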




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1151: [WIP] [HUDI-476] Add hudi-examples module

2020-02-05 Thread GitBox
vinothchandar commented on a change in pull request #1151: [WIP] [HUDI-476] Add 
hudi-examples module
URL: https://github.com/apache/incubator-hudi/pull/1151#discussion_r375656391
 
 

 ##
 File path: 
hudi-examples/src/main/java/org/apache/hudi/examples/deltastreamer/HoodieDeltaStreamerDfsSourceExample.java
 ##
 @@ -0,0 +1,81 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.examples.deltastreamer;
+
+import org.apache.hudi.DataSourceWriteOptions;
+import org.apache.hudi.examples.common.HoodieExampleDataGenerator;
+import org.apache.hudi.examples.common.HoodieExampleSparkUtils;
+import org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer;
+import org.apache.hudi.utilities.sources.JsonDFSSource;
+import org.apache.hudi.utilities.transform.IdentityTransformer;
+
+import com.beust.jcommander.JCommander;
+import org.apache.spark.SparkConf;
+import org.apache.spark.api.java.JavaSparkContext;
+
+
+/**
+ * Simple examples of #{@link HoodieDeltaStreamer} from #{@link JsonDFSSource}.
+ *
+ * To run this example, you should
+ *   1. prepare sample data as 
`hudi-examples/src/main/resources/dfs-delta-streamer`
+ *   2. For running in IDE, set VM options `-Dspark.master=local[2]`
+ *   3. For running in shell, using `spark-submit`
+ *
+ * Usage: HoodieDeltaStreamerDfsSourceExample \
+ *--target-base-path /tmp/hoodie/dfsdeltatable \
+ *--table-type MERGE_ON_READ \
+ *--target-table dfsdeltatable
+ *
+ */
+public class HoodieDeltaStreamerDfsSourceExample {
+
+  public static void main(String[] args) throws Exception {
+
+final HoodieDeltaStreamer.Config cfg = defaultDfsStreamerConfig();
 
 Review comment:
   Since a typical user will provide configs to the delta streamer via property 
files or the command line, can we follow the same approach here, instead of 
constructing the deltastreamer config object programmatically? (The programmatic 
approach is awesome for the spark datasource, where users typically supply 
options in code.) 
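   To make the property-file approach concrete, here is a minimal sketch of loading such a config with plain `java.util.Properties` (the key names and values are hypothetical, not the actual DeltaStreamer property names):

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class PropsConfigSketch {

    // Parse a properties-file payload; in practice this would be read from a
    // DFS path passed on the command line rather than an in-memory string.
    static Properties load(String fileContents) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(fileContents));
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical keys mirroring the example's command-line flags
        String fileContents =
                "target.base.path=/tmp/hoodie/dfsdeltatable\n"
                + "table.type=MERGE_ON_READ\n"
                + "target.table=dfsdeltatable\n";
        Properties props = load(fileContents);
        System.out.println(props.getProperty("table.type"));
    }
}
```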


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1151: [WIP] [HUDI-476] Add hudi-examples module

2020-02-05 Thread GitBox
vinothchandar commented on a change in pull request #1151: [WIP] [HUDI-476] Add 
hudi-examples module
URL: https://github.com/apache/incubator-hudi/pull/1151#discussion_r375653917
 
 

 ##
 File path: hudi-examples/pom.xml
 ##
 @@ -0,0 +1,206 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<project xmlns="http://maven.apache.org/POM/4.0.0"
+         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+
+  <parent>
+    <artifactId>hudi</artifactId>
+    <groupId>org.apache.hudi</groupId>
+    <version>0.5.2-SNAPSHOT</version>
+  </parent>
+  <modelVersion>4.0.0</modelVersion>
+
+  <artifactId>hudi-examples</artifactId>
+  <packaging>jar</packaging>
+
+  <properties>
+    <main.basedir>${project.parent.basedir}</main.basedir>
+  </properties>
+
+  <build>
+    <resources>
+      <resource>
+        <directory>src/main/resources</directory>
+      </resource>
+    </resources>
+
+    <plugins>
+      <plugin>
+        <groupId>org.apache.maven.plugins</groupId>
 
 Review comment:
   IIUC, we will be building a fat jar for the purpose of running the examples 
from the command line? 




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1151: [WIP] [HUDI-476] Add hudi-examples module

2020-02-05 Thread GitBox
vinothchandar commented on a change in pull request #1151: [WIP] [HUDI-476] Add 
hudi-examples module
URL: https://github.com/apache/incubator-hudi/pull/1151#discussion_r375655701
 
 

 ##
 File path: 
hudi-examples/src/main/java/org/apache/hudi/examples/deltastreamer/HoodieDeltaStreamerDfsSourceExample.java
 ##
 @@ -0,0 +1,81 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.examples.deltastreamer;
+
+import org.apache.hudi.DataSourceWriteOptions;
+import org.apache.hudi.examples.common.HoodieExampleDataGenerator;
+import org.apache.hudi.examples.common.HoodieExampleSparkUtils;
+import org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer;
+import org.apache.hudi.utilities.sources.JsonDFSSource;
+import org.apache.hudi.utilities.transform.IdentityTransformer;
+
+import com.beust.jcommander.JCommander;
+import org.apache.spark.SparkConf;
+import org.apache.spark.api.java.JavaSparkContext;
+
+
+/**
+ * Simple example of {@link HoodieDeltaStreamer} reading from a {@link JsonDFSSource}.
+ *
+ * To run this example, you should:
+ *   1. prepare sample data under `hudi-examples/src/main/resources/dfs-delta-streamer`
+ *   2. to run in an IDE, set the VM option `-Dspark.master=local[2]`
+ *   3. to run from a shell, use `spark-submit`
+ *
+ * Usage: HoodieDeltaStreamerDfsSourceExample \
 
 Review comment:
   Can we provide a full working command for this and all the examples? Also, as 
a general comment, examples are great when the user can just hit run or execute 
a command and it takes care of things like data prep (step 1).
   
   Can we make data prep part of the examples themselves, and also provide sane 
defaults for the input/output paths, e.g. 
`/tmp/hudi-examples/dfsdeltastreamer/input` and 
`/tmp/hudi-examples/dfsdeltastreamer/output`? 
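   As an illustration of folding data prep into the example itself, here is a minimal sketch that writes a small mock JSON batch under the suggested default input path (the path, file name, and record shape are hypothetical stand-ins for HoodieExampleDataGenerator output):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class ExampleDataPrep {

    // Default input path suggested in the review; purely illustrative
    static final String INPUT_DIR = "/tmp/hudi-examples/dfsdeltastreamer/input";

    // Write a small batch of mock JSON records so the example runs without
    // a separate manual data-prep step.
    static Path prepareInput(String inputDir) throws IOException {
        Path dir = Paths.get(inputDir);
        Files.createDirectories(dir);
        List<String> records = Arrays.asList(
                "{\"key\": \"r1\", \"ts\": 1, \"rider\": \"rider-1\"}",
                "{\"key\": \"r2\", \"ts\": 2, \"rider\": \"rider-2\"}");
        return Files.write(dir.resolve("batch-0.json"), records);
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Wrote sample data to " + prepareInput(INPUT_DIR));
    }
}
```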




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1151: [WIP] [HUDI-476] Add hudi-examples module

2020-02-05 Thread GitBox
vinothchandar commented on a change in pull request #1151: [WIP] [HUDI-476] Add 
hudi-examples module
URL: https://github.com/apache/incubator-hudi/pull/1151#discussion_r375657650
 
 

 ##
 File path: pom.xml
 ##
 @@ -408,6 +418,13 @@
 ${log4j.version}
   
 
+  
 
 Review comment:
   why is this needed at the parent pom level? 




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1151: [WIP] [HUDI-476] Add hudi-examples module

2020-02-05 Thread GitBox
vinothchandar commented on a change in pull request #1151: [WIP] [HUDI-476] Add 
hudi-examples module
URL: https://github.com/apache/incubator-hudi/pull/1151#discussion_r375657083
 
 

 ##
 File path: 
hudi-examples/src/main/java/org/apache/hudi/examples/deltastreamer/HoodieDeltaStreamerSimpleExample.java
 ##
 @@ -0,0 +1,105 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.examples.deltastreamer;
+
+import org.apache.hudi.DataSourceWriteOptions;
+import org.apache.hudi.common.model.HoodieAvroPayload;
+import org.apache.hudi.common.table.timeline.HoodieActiveTimeline;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.TypedProperties;
+import org.apache.hudi.examples.common.HoodieExampleDataGenerator;
+import org.apache.hudi.examples.common.HoodieExampleSparkUtils;
+import org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer;
+import org.apache.hudi.utilities.schema.SchemaProvider;
+import org.apache.hudi.utilities.sources.InputBatch;
+import org.apache.hudi.utilities.sources.JsonSource;
+import org.apache.hudi.utilities.transform.IdentityTransformer;
+
+import com.beust.jcommander.JCommander;
+import org.apache.spark.SparkConf;
+import org.apache.spark.api.java.JavaRDD;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.sql.SparkSession;
+
+import java.util.List;
+
+/**
+ * Simple examples of {@link HoodieDeltaStreamer}.
+ * This class uses data from a mock {@link HoodieExampleDataGenerator}.
+ *
+ * To run this example, you should:
+ *   1. to run in an IDE, set the VM option `-Dspark.master=local[2]`
+ *   2. to run from a shell, use `spark-submit`
+ *
+ * Usage: HoodieDeltaStreamerSimpleExample \
+ *--target-base-path /tmp/hoodie/deltastreamertable \
+ *--table-type MERGE_ON_READ \
+ *--target-table deltastreamertable
+ */
+public class HoodieDeltaStreamerSimpleExample {
 
 Review comment:
   This can actually be an example for a custom data source. 




Build failed in Jenkins: hudi-snapshot-deployment-0.5 #181

2020-02-05 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 2.28 KB...]
/home/jenkins/tools/maven/apache-maven-3.5.4/boot:
plexus-classworlds-2.5.2.jar

/home/jenkins/tools/maven/apache-maven-3.5.4/conf:
logging
settings.xml
toolchains.xml

/home/jenkins/tools/maven/apache-maven-3.5.4/conf/logging:
simplelogger.properties

/home/jenkins/tools/maven/apache-maven-3.5.4/lib:
aopalliance-1.0.jar
cdi-api-1.0.jar
cdi-api.license
commons-cli-1.4.jar
commons-cli.license
commons-io-2.5.jar
commons-io.license
commons-lang3-3.5.jar
commons-lang3.license
ext
guava-20.0.jar
guice-4.2.0-no_aop.jar
jansi-1.17.1.jar
jansi-native
javax.inject-1.jar
jcl-over-slf4j-1.7.25.jar
jcl-over-slf4j.license
jsr250-api-1.0.jar
jsr250-api.license
maven-artifact-3.5.4.jar
maven-artifact.license
maven-builder-support-3.5.4.jar
maven-builder-support.license
maven-compat-3.5.4.jar
maven-compat.license
maven-core-3.5.4.jar
maven-core.license
maven-embedder-3.5.4.jar
maven-embedder.license
maven-model-3.5.4.jar
maven-model-builder-3.5.4.jar
maven-model-builder.license
maven-model.license
maven-plugin-api-3.5.4.jar
maven-plugin-api.license
maven-repository-metadata-3.5.4.jar
maven-repository-metadata.license
maven-resolver-api-1.1.1.jar
maven-resolver-api.license
maven-resolver-connector-basic-1.1.1.jar
maven-resolver-connector-basic.license
maven-resolver-impl-1.1.1.jar
maven-resolver-impl.license
maven-resolver-provider-3.5.4.jar
maven-resolver-provider.license
maven-resolver-spi-1.1.1.jar
maven-resolver-spi.license
maven-resolver-transport-wagon-1.1.1.jar
maven-resolver-transport-wagon.license
maven-resolver-util-1.1.1.jar
maven-resolver-util.license
maven-settings-3.5.4.jar
maven-settings-builder-3.5.4.jar
maven-settings-builder.license
maven-settings.license
maven-shared-utils-3.2.1.jar
maven-shared-utils.license
maven-slf4j-provider-3.5.4.jar
maven-slf4j-provider.license
org.eclipse.sisu.inject-0.3.3.jar
org.eclipse.sisu.inject.license
org.eclipse.sisu.plexus-0.3.3.jar
org.eclipse.sisu.plexus.license
plexus-cipher-1.7.jar
plexus-cipher.license
plexus-component-annotations-1.7.1.jar
plexus-component-annotations.license
plexus-interpolation-1.24.jar
plexus-interpolation.license
plexus-sec-dispatcher-1.4.jar
plexus-sec-dispatcher.license
plexus-utils-3.1.0.jar
plexus-utils.license
slf4j-api-1.7.25.jar
slf4j-api.license
wagon-file-3.1.0.jar
wagon-file.license
wagon-http-3.1.0-shaded.jar
wagon-http.license
wagon-provider-api-3.1.0.jar
wagon-provider-api.license

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/ext:
README.txt

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native:
freebsd32
freebsd64
linux32
linux64
osx
README.txt
windows32
windows64

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/osx:
libjansi.jnilib

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows32:
jansi.dll

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows64:
jansi.dll
Finished /home/jenkins/tools/maven/apache-maven-3.5.4 Directory Listing :
Detected current version as: 
'HUDI_home=
0.5.2-SNAPSHOT'
[INFO] Scanning for projects...
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-spark_2.11:jar:0.5.2-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 
org.apache.hudi:hudi-spark_${scala.binary.version}:[unknown-version], 

 line 26, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-utilities_2.11:jar:0.5.2-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 
org.apache.hudi:hudi-utilities_${scala.binary.version}:[unknown-version], 

 line 26, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-spark-bundle_2.11:jar:0.5.2-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 
org.apache.hudi:hudi-spark-bundle_${scala.binary.version}:[unknown-version], 

 line 26, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-utilities-bundle_2.11:jar:0.5.2-SNAPSHOT
[WARNING] 'artifactId' contains an 

[jira] [Created] (HUDI-602) Add some guidance about how to judge the scope of MINOR to the contribution guidance

2020-02-05 Thread vinoyang (Jira)
vinoyang created HUDI-602:
-

 Summary: Add some guidance about how to judge the scope of MINOR 
to the contribution guidance
 Key: HUDI-602
 URL: https://issues.apache.org/jira/browse/HUDI-602
 Project: Apache Hudi (incubating)
  Issue Type: Improvement
  Components: Docs
Reporter: vinoyang
Assignee: vinoyang


Currently, the semantics and the scope of a "MINOR" PR are not entirely clear. 
Some big changes should not be "MINOR" PRs. We need some guidance to tell 
contributors how to judge which kinds of PRs qualify as "MINOR".

There is a thread on the hudi dev mailing list talking about this topic: 
https://lists.apache.org/thread.html/rc77546c7de7b2fa40330f889ed39dec7c0ab335e30e93d25745508ef%40%3Cdev.hudi.apache.org%3E



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] hddong commented on a change in pull request #1242: [HUDI-544] Adjust the read and write path of archive

2020-02-05 Thread GitBox
hddong commented on a change in pull request #1242: [HUDI-544] Adjust the read 
and write path of archive
URL: https://github.com/apache/incubator-hudi/pull/1242#discussion_r375606750
 
 

 ##
 File path: 
hudi-cli/src/main/java/org/apache/hudi/cli/commands/ArchivedCommitsCommand.java
 ##
 @@ -138,9 +139,11 @@ public String showCommits(
   throws IOException {
 
 System.out.println("===> Showing only " + limit + " archived 
commits <===");
-String basePath = HoodieCLI.getTableMetaClient().getBasePath();
+HoodieTableMetaClient metaClient = HoodieCLI.getTableMetaClient();
+String basePath = metaClient.getBasePath();
+Path archivePath = new Path(metaClient.getArchivePath() + 
"/.commits_.archive*");
 FileStatus[] fsStatuses =
-FSUtils.getFs(basePath, HoodieCLI.conf).globStatus(new Path(basePath + 
"/.hoodie/.commits_.archive*"));
+FSUtils.getFs(basePath, HoodieCLI.conf).globStatus(archivePath);
 
 Review comment:
   @hmatu @n3nash It returns `/table/.hoodie/.commits_.archive*` if old tables 
use the `DEFAULT` (`''`) archive folder, and 
`/table/.hoodie/archived/.commits_.archive*` if old tables use the `archived` 
folder to archive. So it will return the correct path, based on the archive path 
stored in `.hoodie`.
   On the other hand, `archiveFolderPattern` should `allow for users to be able 
to provide a full path of the files under the archive folder and read it`.
   




[GitHub] [incubator-hudi] hddong commented on a change in pull request #1242: [HUDI-544] Adjust the read and write path of archive

2020-02-05 Thread GitBox
hddong commented on a change in pull request #1242: [HUDI-544] Adjust the read 
and write path of archive
URL: https://github.com/apache/incubator-hudi/pull/1242#discussion_r375606750
 
 

 ##
 File path: 
hudi-cli/src/main/java/org/apache/hudi/cli/commands/ArchivedCommitsCommand.java
 ##
 @@ -138,9 +139,11 @@ public String showCommits(
   throws IOException {
 
 System.out.println("===> Showing only " + limit + " archived 
commits <===");
-String basePath = HoodieCLI.getTableMetaClient().getBasePath();
+HoodieTableMetaClient metaClient = HoodieCLI.getTableMetaClient();
+String basePath = metaClient.getBasePath();
+Path archivePath = new Path(metaClient.getArchivePath() + 
"/.commits_.archive*");
 FileStatus[] fsStatuses =
-FSUtils.getFs(basePath, HoodieCLI.conf).globStatus(new Path(basePath + 
"/.hoodie/.commits_.archive*"));
+FSUtils.getFs(basePath, HoodieCLI.conf).globStatus(archivePath);
 
 Review comment:
   @hmatu It returns `/table/.hoodie/.commits_.archive*` if old tables use the 
`DEFAULT` (`''`) archive folder, and `/table/.hoodie/archived/.commits_.archive*` 
if old tables use the `archived` folder to archive. So it will return the 
correct path, based on the archive path stored in `.hoodie`.
   On the other hand, `archiveFolderPattern` should `allow for users to be able 
to provide a full path of the files under the archive folder and read it`.




[GitHub] [incubator-hudi] yanghua commented on issue #1305: [MINOR] Remove the declaration of thrown RuntimeException

2020-02-05 Thread GitBox
yanghua commented on issue #1305: [MINOR] Remove the declaration of thrown 
RuntimeException
URL: https://github.com/apache/incubator-hudi/pull/1305#issuecomment-582695979
 
 
   > cc @yanghua I actually thought of assigning this to you..to get your take 
on whether this qualifies as MINOR.. I think it does
   
   @vinothchandar +1 it does




[GitHub] [incubator-hudi] lamber-ken closed pull request #1304: [MINOR] Replace Thread.sleep with TimeUnit*

2020-02-05 Thread GitBox
lamber-ken closed pull request #1304: [MINOR] Replace Thread.sleep with 
TimeUnit*
URL: https://github.com/apache/incubator-hudi/pull/1304
 
 
   




[jira] [Updated] (HUDI-601) Improve unit test coverage for HoodieAvroWriteSupport, HoodieRealtimeRecordReader, RealtimeCompactedRecordReader

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-601:

Labels: pull-request-available  (was: )

> Improve unit test coverage for HoodieAvroWriteSupport, 
> HoodieRealtimeRecordReader, RealtimeCompactedRecordReader
> 
>
> Key: HUDI-601
> URL: https://issues.apache.org/jira/browse/HUDI-601
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Priority: Major
>  Labels: pull-request-available
>






[GitHub] [incubator-hudi] modi95 opened a new pull request #1310: [HUDI-601] Improve unit test coverage for HoodieAvroWriteSupport, HoodieRealtimeRecordReader, RealtimeCompactedRecordReader

2020-02-05 Thread GitBox
modi95 opened a new pull request #1310: [HUDI-601] Improve unit test coverage 
for HoodieAvroWriteSupport, HoodieRealtimeRecordReader, 
RealtimeCompactedRecordReader
URL: https://github.com/apache/incubator-hudi/pull/1310
 
 
   Raises unit test coverage for the following classes:
   - org.apache.hudi.avro.HoodieAvroWriteSupport
   - org.apache.hudi.hadoop.realtime.HoodieRealtimeRecordReader
   - org.apache.hudi.hadoop.realtime.RealtimeCompactedRecordReader
   
   
   This PR does not add any new logic - it simply improves unit test coverage
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.




[jira] [Created] (HUDI-601) Improve unit test coverage for HoodieAvroWriteSupport, HoodieRealtimeRecordReader, RealtimeCompactedRecordReader

2020-02-05 Thread Abhishek Modi (Jira)
Abhishek Modi created HUDI-601:
--

 Summary: Improve unit test coverage for HoodieAvroWriteSupport, 
HoodieRealtimeRecordReader, RealtimeCompactedRecordReader
 Key: HUDI-601
 URL: https://issues.apache.org/jira/browse/HUDI-601
 Project: Apache Hudi (incubating)
  Issue Type: Sub-task
Reporter: Abhishek Modi








[GitHub] [incubator-hudi] lamber-ken edited a comment on issue #1290: [HUDI-584] Relocate spark-avro dependency by maven-shade-plugin

2020-02-05 Thread GitBox
lamber-ken edited a comment on issue #1290: [HUDI-584] Relocate spark-avro 
dependency by maven-shade-plugin
URL: https://github.com/apache/incubator-hudi/pull/1290#issuecomment-582669261
 
 
   Hi @vinothchandar, sorry, I didn't know you and @umehrot2 had talked about 
this before.   
   
https://github.com/apache/incubator-hudi/pull/1005#pullrequestreview-340275874




[jira] [Updated] (HUDI-571) Modify Hudi CLI to show archived commits

2020-02-05 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-571:
---
Status: Closed  (was: Patch Available)

> Modify Hudi CLI to show archived commits
> 
>
> Key: HUDI-571
> URL: https://issues.apache.org/jira/browse/HUDI-571
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: CLI
>Reporter: satish
>Assignee: satish
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Hudi CLI has 'show archived commits' command which is not very helpful
>  
> {code:java}
> ->show archived commits
> ===> Showing only 10 archived commits <===
>     
>     | CommitTime    | CommitType|
>     |===|
>     | 2019033304| commit    |
>     | 20190323220154| commit    |
>     | 20190323220154| commit    |
>     | 20190323224004| commit    |
>     | 20190323224013| commit    |
>     | 20190323224229| commit    |
>     | 20190323224229| commit    |
>     | 20190323232849| commit    |
>     | 20190323233109| commit    |
>     | 20190323233109| commit    |
>  {code}
> Modify or introduce new command to make it easy to debug
>  





[jira] [Updated] (HUDI-571) Modify Hudi CLI to show archived commits

2020-02-05 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-571:
---
Fix Version/s: 0.5.2

> Modify Hudi CLI to show archived commits
> 
>
> Key: HUDI-571
> URL: https://issues.apache.org/jira/browse/HUDI-571
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: CLI
>Reporter: satish
>Assignee: satish
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.5.2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Hudi CLI has 'show archived commits' command which is not very helpful
>  
> {code:java}
> ->show archived commits
> ===> Showing only 10 archived commits <===
>     
>     | CommitTime    | CommitType|
>     |===|
>     | 2019033304| commit    |
>     | 20190323220154| commit    |
>     | 20190323220154| commit    |
>     | 20190323224004| commit    |
>     | 20190323224013| commit    |
>     | 20190323224229| commit    |
>     | 20190323224229| commit    |
>     | 20190323232849| commit    |
>     | 20190323233109| commit    |
>     | 20190323233109| commit    |
>  {code}
> Modify or introduce new command to make it easy to debug
>  





[GitHub] [incubator-hudi] lamber-ken commented on issue #1290: [HUDI-584] Relocate spark-avro dependency by maven-shade-plugin

2020-02-05 Thread GitBox
lamber-ken commented on issue #1290: [HUDI-584] Relocate spark-avro dependency 
by maven-shade-plugin
URL: https://github.com/apache/incubator-hudi/pull/1290#issuecomment-582669261
 
 
   Hi @vinothchandar, I didn't know you and @umehrot2 had talked about this  
   
https://github.com/apache/incubator-hudi/pull/1005#pullrequestreview-340275874




[jira] [Resolved] (HUDI-570) Improve unit test coverage FSUtils.java

2020-02-05 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf resolved HUDI-570.

Fix Version/s: 0.5.2
   Resolution: Fixed

Fixed via master: 1fb0b001a38ddc940995e45f5cd53701d0110c3b

> Improve unit test coverage FSUtils.java
> ---
>
> Key: HUDI-570
> URL: https://issues.apache.org/jira/browse/HUDI-570
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>Reporter: Balajee Nagasubramaniam
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.5.2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Add test cases for 
> - deleteOlderRollbackMetaFiles()
> - deleteOlderCleanMetaFiles()





[jira] [Updated] (HUDI-570) Improve unit test coverage FSUtils.java

2020-02-05 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-570:
---
Status: Open  (was: New)

> Improve unit test coverage FSUtils.java
> ---
>
> Key: HUDI-570
> URL: https://issues.apache.org/jira/browse/HUDI-570
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>Reporter: Balajee Nagasubramaniam
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Add test cases for 
> - deleteOlderRollbackMetaFiles()
> - deleteOlderCleanMetaFiles()





[jira] [Resolved] (HUDI-587) Jacoco coverage report is not generated

2020-02-05 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf resolved HUDI-587.

Fix Version/s: 0.5.2
   Resolution: Fixed

Fixed via master: d26dc0b229043afa5aefca239e72f40d80446917

> Jacoco coverage report is not generated
> ---
>
> Key: HUDI-587
> URL: https://issues.apache.org/jira/browse/HUDI-587
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>Reporter: Prashant Wason
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.5.2
>
>   Original Estimate: 1h
>  Time Spent: 20m
>  Remaining Estimate: 40m
>
> When running tests, the jacoco coverage report is not generated. The jacoco 
> plugin is loaded, it sets the correct Java Agent line, bit it fails to find 
> the execution data file after tests complete.
> Example:
> mvn test -Dtest=TestHoodieActiveTimeline
> ...
> 22:42:40 [INFO] — jacoco-maven-plugin:0.7.8:prepare-agent (pre-unit-test) @ 
> hudi-common —
>  22:42:40 [INFO] *surefireArgLine set to 
> javaagent:/home/pwason/.m2/repository/org/jacoco/org.jacoco.agent/0.7.8/org.jacoco.agent-0.7.8-runtime.jar=destfile=/home/pwason/work/java/incubator-hudi/hudi-common/target/coverage-reports/jacocout.exec*
> *...*
> 22:42:49 [INFO] — jacoco-maven-plugin:0.7.8:report (post-unit-test) @ 
> hudi-common —
>  22:42:49 [INFO] *Skipping JaCoCo execution due to missing execution data 
> file.*
>  





[GitHub] [incubator-hudi] lamber-ken commented on issue #1290: [HUDI-584] Relocate spark-avro dependency by maven-shade-plugin

2020-02-05 Thread GitBox
lamber-ken commented on issue #1290: [HUDI-584] Relocate spark-avro dependency 
by maven-shade-plugin
URL: https://github.com/apache/incubator-hudi/pull/1290#issuecomment-582663093
 
 
   Hi @vinothchandar @bhasudha @umehrot2, thank you all for discussing this 
here; I think it's a meaningful discussion. I wrote an email to talk about this, 
so for further discussion, please use email, thanks. :)




[GitHub] [incubator-hudi] lamber-ken commented on issue #1290: [HUDI-584] Relocate spark-avro dependency by maven-shade-plugin

2020-02-05 Thread GitBox
lamber-ken commented on issue #1290: [HUDI-584] Relocate spark-avro dependency 
by maven-shade-plugin
URL: https://github.com/apache/incubator-hudi/pull/1290#issuecomment-582661108
 
 
   hi @bhasudha, `Spark 3.0.0-preview2` is pre-built with Scala 2.12, so users 
need to build hudi with scala-2.12 first. You can download the spark-preview 
package from 
[here](https://www.apache.org/dyn/closer.lua/spark/spark-3.0.0-preview2/spark-3.0.0-preview2-bin-hadoop2.7.tgz).
   
   
![image](https://user-images.githubusercontent.com/20113411/73892288-78201080-48b1-11ea-8a68-ff4473640694.png)
   




[GitHub] [incubator-hudi] umehrot2 commented on issue #1290: [HUDI-584] Relocate spark-avro dependency by maven-shade-plugin

2020-02-05 Thread GitBox
umehrot2 commented on issue #1290: [HUDI-584] Relocate spark-avro dependency by 
maven-shade-plugin
URL: https://github.com/apache/incubator-hudi/pull/1290#issuecomment-582655960
 
 
   Another thing to consider here is that the `spark-avro` module is only used 
for converting between Spark's struct type and the Avro schema, while all the 
code for the actual data conversion is maintained inside Hudi.
   
   I recently fixed an issue with this schema conversion 
(https://github.com/apache/incubator-hudi/pull/1223/files#diff-3c046573a91f36ba0f12dad0e3395dc9R346), 
where spark-avro 2.4.4 created Avro namespaces differently than the earlier 
databricks-avro did, and this resulted in a change in Hudi's code as well. 
Considering this, it might be good to actually pin the spark-avro version and 
relocate it, so that the schema conversion logic in use stays fixed and 
customers avoid running into such issues.




[GitHub] [incubator-hudi] bhasudha commented on issue #1290: [HUDI-584] Relocate spark-avro dependency by maven-shade-plugin

2020-02-05 Thread GitBox
bhasudha commented on issue #1290: [HUDI-584] Relocate spark-avro dependency by 
maven-shade-plugin
URL: https://github.com/apache/incubator-hudi/pull/1290#issuecomment-582654435
 
 
   > > If the user wants some changes in spark-avro:3.0-preview2, the right way 
is modify spark-version to 3.0-preview2 at pom.xml file, then build hudi 
project source.
   > 
   > Understood... but we would like to avoid this need for building a version 
of hudi by themselves. Users should be able to do `--packages 
` and be on their way.. This change forces everyone who 
is not on the spark version used by hudi to do their own builds.. I still feel 
this is not a good idea.
   > 
   > Once again, I wish we can have more upfront discussion on the JIRA, rather 
than on a PR, around issues like this. (I feel bad when someone puts in the 
work to do the implementation that we cannot take :()
   
   +1. Good catch. @lamber-ken I think Vinoth brings up a valid point. Although 
your PR intends to make it easier for users not to have to care about Scala 
2.11 vs. Scala 2.12, we also need to avoid coupling Hudi to a specific 
spark-avro version, be it 2.4.4 or 3.0-preview2.




[GitHub] [incubator-hudi] umehrot2 edited a comment on issue #1290: [HUDI-584] Relocate spark-avro dependency by maven-shade-plugin

2020-02-05 Thread GitBox
umehrot2 edited a comment on issue #1290: [HUDI-584] Relocate spark-avro 
dependency by maven-shade-plugin
URL: https://github.com/apache/incubator-hudi/pull/1290#issuecomment-582650497
 
 
   As @vinothchandar mentioned, we did discuss the possibility of doing this, 
but were held back by the reason Vinoth mentioned above.
   
   But just thinking out loud: now that Hudi has been migrated to `spark 
2.4.4`, is it really recommended for users on older or newer Spark versions to 
simply drop in the pre-built Hudi jars built against Spark 2.4.4? Shouldn't the 
recommendation for such users be to build their own Hudi jars against the Spark 
version they use? That would help them catch any compatibility issues at 
compile time. I am not sure how safe it is for us to claim that users can use 
our pre-built jars regardless of Spark version.
   
   And if building your own jars is the recommendation, then I think this 
change can be pulled in.




[GitHub] [incubator-hudi] umehrot2 commented on issue #1290: [HUDI-584] Relocate spark-avro dependency by maven-shade-plugin

2020-02-05 Thread GitBox
umehrot2 commented on issue #1290: [HUDI-584] Relocate spark-avro dependency by 
maven-shade-plugin
URL: https://github.com/apache/incubator-hudi/pull/1290#issuecomment-582650497
 
 
   As @vinothchandar mentioned, we did discuss the possibility of doing this, 
but were held back by the reason Vinoth mentioned above.
   
   But just thinking out loud: now that Hudi has been migrated to `spark 
2.4.4`, is it really recommended for users on older or newer Spark versions to 
simply drop in the pre-built Hudi jars built against Spark 2.4.4? Shouldn't the 
recommendation for such users be to build their own Hudi jars against the Spark 
version they use? That would help them catch any compatibility issues at 
compile time. I am not sure how safe it is for us to claim that users can use 
our pre-built jars regardless of Spark version.
   
   And if building your own jars is the recommendation, then I think this 
change can be pulled in.




[GitHub] [incubator-hudi] n3nash merged pull request #1298: [HUDI-587] Fixed generation of jacoco coverage reports.

2020-02-05 Thread GitBox
n3nash merged pull request #1298: [HUDI-587] Fixed generation of jacoco 
coverage reports.
URL: https://github.com/apache/incubator-hudi/pull/1298
 
 
   




[GitHub] [incubator-hudi] n3nash commented on a change in pull request #1298: [HUDI-587] Fixed generation of jacoco coverage reports.

2020-02-05 Thread GitBox
n3nash commented on a change in pull request #1298: [HUDI-587] Fixed generation 
of jacoco coverage reports.
URL: https://github.com/apache/incubator-hudi/pull/1298#discussion_r375543391
 
 

 ##
 File path: pom.xml
 ##
 @@ -113,6 +113,7 @@
 
 provided
 
+-Xmx1024m -XX:MaxPermSize=256m
 
 Review comment:
   Okay, makes sense. 
   
   @vinothchandar I don't see it slowing down, just had a hunch, merging this 
as this does not seem to have any side-effect.




[incubator-hudi] branch master updated (1fb0b00 -> d26dc0b)

2020-02-05 Thread nagarwal
This is an automated email from the ASF dual-hosted git repository.

nagarwal pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git.


from 1fb0b00  [HUDI-570] - Improve test coverage for FSUtils.java
 add d26dc0b  [HUDI-587] Fixed generation of jacoco coverage reports.

No new revisions were added by this update.

Summary of changes:
 pom.xml | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)



[incubator-hudi] branch master updated (462fd02 -> 1fb0b00)

2020-02-05 Thread nagarwal
This is an automated email from the ASF dual-hosted git repository.

nagarwal pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git.


from 462fd02  [HUDI-571] Add 'commits show archived' command to CLI
 add 1fb0b00  [HUDI-570] - Improve test coverage for FSUtils.java

No new revisions were added by this update.

Summary of changes:
 .../java/org/apache/hudi/common/util/FSUtils.java  |  8 +-
 .../org/apache/hudi/common/util/TestFSUtils.java   | 99 +-
 2 files changed, 100 insertions(+), 7 deletions(-)



[GitHub] [incubator-hudi] n3nash merged pull request #1307: [HUDI-570] Improve test coverage for FSUtils.java

2020-02-05 Thread GitBox
n3nash merged pull request #1307: [HUDI-570] Improve test coverage for 
FSUtils.java
URL: https://github.com/apache/incubator-hudi/pull/1307
 
 
   




[GitHub] [incubator-hudi] n3nash commented on issue #1309: [HUDI-592] Remove duplicated dependencies in the pom file of test suite module

2020-02-05 Thread GitBox
n3nash commented on issue #1309: [HUDI-592] Remove duplicated dependencies in 
the pom file of test suite module
URL: https://github.com/apache/incubator-hudi/pull/1309#issuecomment-582642492
 
 
   @yanghua Could you take a look at the failed build, please?




[jira] [Created] (HUDI-600) Cleaner fails with AVRO exception when upgrading from 0.5.0 to master

2020-02-05 Thread Nishith Agarwal (Jira)
Nishith Agarwal created HUDI-600:


 Summary: Cleaner fails with AVRO exception when upgrading from 
0.5.0 to master
 Key: HUDI-600
 URL: https://issues.apache.org/jira/browse/HUDI-600
 Project: Apache Hudi (incubating)
  Issue Type: Bug
  Components: Cleaner
Reporter: Nishith Agarwal


```
org.apache.avro.AvroTypeException: Found 
org.apache.hudi.avro.model.HoodieCleanMetadata, expecting 
org.apache.hudi.avro.model.HoodieCleanerPlan, missing required field policy
at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:292)
at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
at org.apache.avro.io.ResolvingDecoder.readFieldOrder(ResolvingDecoder.java:130)
at 
org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:215)
at 
org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:175)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:145)
at org.apache.avro.file.DataFileStream.next(DataFileStream.java:233)
at org.apache.avro.file.DataFileStream.next(DataFileStream.java:220)
at 
org.apache.hudi.common.util.AvroUtils.deserializeAvroMetadata(AvroUtils.java:149)
at org.apache.hudi.common.util.CleanerUtils.getCleanerPlan(CleanerUtils.java:88)
at org.apache.hudi.HoodieCleanClient.runClean(HoodieCleanClient.java:144)
at org.apache.hudi.HoodieCleanClient.lambda$clean$0(HoodieCleanClient.java:89)
at 
java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:647)
at org.apache.hudi.HoodieCleanClient.clean(HoodieCleanClient.java:87)
at org.apache.hudi.HoodieWriteClient.clean(HoodieWriteClient.java:837)
at org.apache.hudi.HoodieWriteClient.postCommit(HoodieWriteClient.java:514)
at 
org.apache.hudi.AbstractHoodieWriteClient.commit(AbstractHoodieWriteClient.java:156)
at 
org.apache.hudi.AbstractHoodieWriteClient.commit(AbstractHoodieWriteClient.java:100)
at 
org.apache.hudi.AbstractHoodieWriteClient.commit(AbstractHoodieWriteClient.java:91)
at 
org.apache.hudi.HoodieSparkSqlWriter$.checkWriteStatus(HoodieSparkSqlWriter.scala:261)
at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:183)
at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:91)
at 
org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:156)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at 
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
at 
org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
at 
org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
at 
org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
at 
org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
```
 
[~varadarb] any ideas about this?
 
[~thesquelched] fyi



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] lamber-ken edited a comment on issue #1290: [HUDI-584] Relocate spark-avro dependency by maven-shade-plugin

2020-02-05 Thread GitBox
lamber-ken edited a comment on issue #1290: [HUDI-584] Relocate spark-avro 
dependency by maven-shade-plugin
URL: https://github.com/apache/incubator-hudi/pull/1290#issuecomment-582628959
 
 
   > > If the user wants some changes in spark-avro:3.0-preview2, the right way 
is modify spark-version to 3.0-preview2 at pom.xml file, then build hudi 
project source.
   > 
   > Understood... but we would like to avoid this need for building a version 
of hudi by themselves. Users should be able to do `--packages 
` and be on their way.. This change forces everyone who 
is not on the spark version used by hudi to do their own builds.. I still feel 
this is not a good idea.
   > 
   > Once again, I wish we can have more upfront discussion on the JIRA, rather 
than on a PR, around issues like this. (I feel bad when someone puts in the 
work to do the implementation that we cannot take :()
   
   Thanks, got it. Any suggestion is welcome. I will file a DISCUSS email about 
this. 




[GitHub] [incubator-hudi] lamber-ken commented on issue #1290: [HUDI-584] Relocate spark-avro dependency by maven-shade-plugin

2020-02-05 Thread GitBox
lamber-ken commented on issue #1290: [HUDI-584] Relocate spark-avro dependency 
by maven-shade-plugin
URL: https://github.com/apache/incubator-hudi/pull/1290#issuecomment-582628959
 
 
   > > If the user wants some changes in spark-avro:3.0-preview2, the right way 
is modify spark-version to 3.0-preview2 at pom.xml file, then build hudi 
project source.
   > 
   > Understood... but we would like to avoid this need for building a version 
of hudi by themselves. Users should be able to do `--packages 
` and be on their way.. This change forces everyone who 
is not on the spark version used by hudi to do their own builds.. I still feel 
this is not a good idea.
   > 
   > Once again, I wish we can have more upfront discussion on the JIRA, rather 
than on a PR, around issues like this. (I feel bad when someone puts in the 
work to do the implementation that we cannot take :()
   
   Thanks, got it. I will file a DISCUSS email about this. 




[GitHub] [incubator-hudi] lamber-ken commented on issue #1304: [MINOR] Replace Thread.sleep with TimeUnit*

2020-02-05 Thread GitBox
lamber-ken commented on issue #1304: [MINOR] Replace Thread.sleep with TimeUnit*
URL: https://github.com/apache/incubator-hudi/pull/1304#issuecomment-582627717
 
 
   > IMO anything that needs some kind of subjective debate, cannot be MINOR..
   > 
   > @lamber-ken For these kinds of PRs, can we first check with one of the 
commiters or engage on a JIRA. I feel we need not do this change
   
   Okay, thanks. 
   
   `TimeUnit` gives us a readability improvement; we no longer need to multiply 
by 1000.
   
![image](https://user-images.githubusercontent.com/20113411/73885833-6df71580-48a3-11ea-96ac-be3e6ed19fda.png)
   

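   The readability argument can be sketched with a minimal stdlib-only example 
(this is an illustration, not Hudi code; class and method names are made up):

```java
import java.util.concurrent.TimeUnit;

public class SleepComparison {
    public static void main(String[] args) throws InterruptedException {
        // Plain Thread.sleep takes milliseconds, so the reader has to
        // decode "2 * 1000" back into "2 seconds":
        Thread.sleep(2 * 1000);

        // TimeUnit names the unit explicitly and converts internally:
        TimeUnit.SECONDS.sleep(2);

        // Unit conversions also become explicit instead of hand-computed:
        long millis = TimeUnit.SECONDS.toMillis(2);
        System.out.println(millis); // prints 2000
    }
}
```

   Both calls sleep for the same duration; the difference is purely how clearly 
the duration reads at the call site.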



[GitHub] [incubator-hudi] nbalajee commented on a change in pull request #1307: [HUDI-570] Improve test coverage for FSUtils.java

2020-02-05 Thread GitBox
nbalajee commented on a change in pull request #1307: [HUDI-570] Improve test 
coverage for FSUtils.java
URL: https://github.com/apache/incubator-hudi/pull/1307#discussion_r375520095
 
 

 ##
 File path: 
hudi-common/src/test/java/org/apache/hudi/common/util/TestFSUtils.java
 ##
 @@ -43,11 +49,33 @@
 import java.util.stream.Stream;
 
 import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertTrue;
 
 /**
  * Tests file system utils.
  */
 public class TestFSUtils extends HoodieCommonTestHarness {
+  private long minRollbackToKeep = 10;
+  private long minCleanToKeep = 10;
+  protected transient FileSystem fs;
+  protected String basePath = null;
+
+  @Before
 
 Review comment:
   incorporated the comments.




[GitHub] [incubator-hudi] nbalajee commented on a change in pull request #1307: [HUDI-570] Improve test coverage for FSUtils.java

2020-02-05 Thread GitBox
nbalajee commented on a change in pull request #1307: [HUDI-570] Improve test 
coverage for FSUtils.java
URL: https://github.com/apache/incubator-hudi/pull/1307#discussion_r375520024
 
 

 ##
 File path: 
hudi-common/src/test/java/org/apache/hudi/common/util/TestFSUtils.java
 ##
 @@ -43,11 +49,33 @@
 import java.util.stream.Stream;
 
 import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertTrue;
 
 /**
  * Tests file system utils.
  */
 public class TestFSUtils extends HoodieCommonTestHarness {
+  private long minRollbackToKeep = 10;
 
 Review comment:
   incorporated the comments.




[jira] [Resolved] (HUDI-499) Allow partition path to be updated with GLOBAL_BLOOM index

2020-02-05 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu resolved HUDI-499.
-
Resolution: Implemented

Configuration docs will be published later.

> Allow partition path to be updated with GLOBAL_BLOOM index
> --
>
> Key: HUDI-499
> URL: https://issues.apache.org/jira/browse/HUDI-499
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Index
>Reporter: Raymond Xu
>Assignee: Raymond Xu
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.5.2
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> h3. Context
> When a record is to be updated with a new partition path, and when set to 
> GLOBAL_BLOOM as index, the current logic implemented in 
> [https://github.com/apache/incubator-hudi/pull/1091/] ignores the new 
> partition path and update the record in the original partition path.
> h3. Proposed change
> Allow records to be inserted into their new partition paths and delete the 
> records in the old partition paths. A configuration (e.g. 
> {{hoodie.index.bloom.update.partition.path=true}}) can be added to enable 
> this feature.
> h4. An example use case
> A Hudi dataset manages people info and partitioned by birthday. In most 
> cases, where people info are updated, birthdays are not to be changed (that's 
> why we choose it as partition field). But in some edge cases where birthday 
> info are input wrongly and we want to manually fix it or allow user to 
> updated it occasionally. In this case, option 2 would be helpful in keeping 
> records in the expected partition, so that a query like "show me people who 
> were born after 2000" would work.
>  





[jira] [Updated] (HUDI-499) Allow partition path to be updated with GLOBAL_BLOOM index

2020-02-05 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-499:

Description: 
h3. Context

When a record is to be updated with a new partition path, and the index is set 
to GLOBAL_BLOOM, the current logic implemented in 
[https://github.com/apache/incubator-hudi/pull/1091/] ignores the new partition 
path and updates the record in the original partition path.
h3. Proposed change

Allow records to be inserted into their new partition paths and delete the 
records in the old partition paths. A configuration (e.g. 
{{hoodie.index.bloom.update.partition.path=true}}) can be added to enable this 
feature.
h4. An example use case

A Hudi dataset manages people info and is partitioned by birthday. In most 
cases where people info is updated, birthdays are not changed (that's why we 
chose it as the partition field). But in some edge cases, birthday info is 
input wrongly and we want to fix it manually, or we want to allow users to 
update it occasionally. In this case, option 2 would be helpful in keeping 
records in the expected partition, so that a query like "show me people who 
were born after 2000" would work.

 

  was:
h3. Context

When a record is to be updated with a new partition path, and when set to 
GLOBAL_BLOOM as index, the current logic implemented in 
[https://github.com/apache/incubator-hudi/pull/1091/] ignores the new partition 
path and update the record in the original partition path.
h3. Proposed change

Allow records to be inserted into their new partition paths and delete the 
records in the old partition paths. A configuration (e.g. 
{{hoodie.index.bloom.should.update.partition.path=true}}) can be added to 
enable this feature.
h4. An example use case

A Hudi dataset manages people info and partitioned by birthday. In most cases, 
where people info are updated, birthdays are not to be changed (that's why we 
choose it as partition field). But in some edge cases where birthday info are 
input wrongly and we want to manually fix it or allow user to updated it 
occasionally. In this case, option 2 would be helpful in keeping records in the 
expected partition, so that a query like "show me people who were born after 
2000" would work.

 


> Allow partition path to be updated with GLOBAL_BLOOM index
> --
>
> Key: HUDI-499
> URL: https://issues.apache.org/jira/browse/HUDI-499
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Index
>Reporter: Raymond Xu
>Assignee: Raymond Xu
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.5.2
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> h3. Context
> When a record is to be updated with a new partition path, and when set to 
> GLOBAL_BLOOM as index, the current logic implemented in 
> [https://github.com/apache/incubator-hudi/pull/1091/] ignores the new 
> partition path and update the record in the original partition path.
> h3. Proposed change
> Allow records to be inserted into their new partition paths and delete the 
> records in the old partition paths. A configuration (e.g. 
> {{hoodie.index.bloom.update.partition.path=true}}) can be added to enable 
> this feature.
> h4. An example use case
> A Hudi dataset manages people info and partitioned by birthday. In most 
> cases, where people info are updated, birthdays are not to be changed (that's 
> why we choose it as partition field). But in some edge cases where birthday 
> info are input wrongly and we want to manually fix it or allow user to 
> updated it occasionally. In this case, option 2 would be helpful in keeping 
> records in the expected partition, so that a query like "show me people who 
> were born after 2000" would work.
>  



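The proposed behavior above can be sketched with a toy global index in plain 
Java (hypothetical names throughout; the real logic lives in Hudi's bloom index 
code): a record key resolves to exactly one partition, and when an update 
arrives with a different partition path and the flag is enabled, the record is 
deleted from the old partition and inserted into the new one.

```java
import java.util.HashMap;
import java.util.Map;

public class GlobalIndexSketch {
    // recordKey -> partitionPath, mimicking what a global index resolves
    private final Map<String, String> index = new HashMap<>();
    // stands in for hoodie.index.bloom.update.partition.path
    private final boolean updatePartitionPath;

    GlobalIndexSketch(boolean updatePartitionPath) {
        this.updatePartitionPath = updatePartitionPath;
    }

    /** Returns the partition the record ends up in after an upsert. */
    String upsert(String recordKey, String incomingPartition) {
        String existing = index.get(recordKey);
        if (existing == null || updatePartitionPath) {
            // plain insert, or delete-from-old-partition + insert-into-new
            index.put(recordKey, incomingPartition);
            return incomingPartition;
        }
        // flag off: ignore the new partition path, update in place
        return existing;
    }
}
```

With the flag off, a corrected birthday leaves the record in the wrong 
partition (current behavior); with the flag on, the record moves to the 
corrected partition, so the "born after 2000" query works.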


[GitHub] [incubator-hudi] vinothchandar commented on issue #1290: [HUDI-584] Relocate spark-avro dependency by maven-shade-plugin

2020-02-05 Thread GitBox
vinothchandar commented on issue #1290: [HUDI-584] Relocate spark-avro 
dependency by maven-shade-plugin
URL: https://github.com/apache/incubator-hudi/pull/1290#issuecomment-582609037
 
 
   >If the user wants some changes in spark-avro:3.0-preview2, the right way is 
modify spark-version to 3.0-preview2 at pom.xml file, then build hudi project 
source.
   
   Understood... but we would like to avoid users needing to build a version of 
hudi themselves. Users should be able to do `--packages 
` and be on their way.. This change forces everyone who 
is not on the spark version used by hudi to do their own builds.. I still feel 
this is not a good idea. 
   
   Once again, I wish we can have more upfront discussion on the JIRA, rather 
than on a PR, around issues like this. (I feel bad when someone puts in the 
work to do the implementation that we cannot take :() 




[GitHub] [incubator-hudi] vinothchandar commented on issue #1304: [MINOR] Replace Thread.sleep with TimeUnit*

2020-02-05 Thread GitBox
vinothchandar commented on issue #1304: [MINOR] Replace Thread.sleep with 
TimeUnit*
URL: https://github.com/apache/incubator-hudi/pull/1304#issuecomment-582605546
 
 
   @yanghua Same here... This PR IMO is making a subjective argument.. I 
personally feel `Thread.sleep` is more universally known and this change does 
not improve the code quality per se significantly.. WDYT 




[GitHub] [incubator-hudi] vinothchandar commented on issue #1305: [MINOR] Remove the declaration of thrown RuntimeException

2020-02-05 Thread GitBox
vinothchandar commented on issue #1305: [MINOR] Remove the declaration of 
thrown RuntimeException
URL: https://github.com/apache/incubator-hudi/pull/1305#issuecomment-582605003
 
 
   cc @yanghua  I actually thought of assigning this to you..to get your take 
on whether this qualifies as MINOR.. I think it does




[GitHub] [incubator-hudi] prashantwason commented on issue #1289: [HUDI-92] Provide reasonable names for Spark DAG stages in Hudi.

2020-02-05 Thread GitBox
prashantwason commented on issue #1289: [HUDI-92] Provide reasonable names for 
Spark DAG stages in Hudi.
URL: https://github.com/apache/incubator-hudi/pull/1289#issuecomment-582587735
 
 
   Spark History Server Screenshots
   
   ![List of 
tests](https://user-images.githubusercontent.com/58448203/73878606-b1a03f80-480f-11ea-89c0-13da2feb244c.png)
   
   
   ![Event 
timeline](https://user-images.githubusercontent.com/58448203/73878671-d1cffe80-480f-11ea-8ea7-770f80c4213a.png)
   
   
   
![TestHoodieClientCopyOnWrite](https://user-images.githubusercontent.com/58448203/73878634-c086f200-480f-11ea-8b4f-96c810ee6fbf.png)
   




[incubator-hudi] branch master updated: [HUDI-571] Add 'commits show archived' command to CLI

2020-02-05 Thread nagarwal
This is an automated email from the ASF dual-hosted git repository.

nagarwal pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 462fd02  [HUDI-571] Add 'commits show archived' command to CLI
462fd02 is described below

commit 462fd025563b0ae8a4d4f28d366a9bbfca070d3f
Author: Satish Kotha 
AuthorDate: Wed Jan 22 13:50:34 2020 -0800

[HUDI-571] Add 'commits show archived' command to CLI
---
 .../apache/hudi/cli/commands/CommitsCommand.java   | 105 +--
 .../apache/hudi/io/TestHoodieCommitArchiveLog.java |  73 
 .../apache/hudi/common/model/HoodieWriteStat.java  |   3 +-
 .../apache/hudi/common/table/HoodieTimeline.java   |   8 +
 .../table/timeline/HoodieActiveTimeline.java   |  89 --
 .../table/timeline/HoodieArchivedTimeline.java | 192 ++---
 .../table/timeline/HoodieDefaultTimeline.java  |  81 -
 .../common/table/TestHoodieTableMetaClient.java|  35 
 8 files changed, 385 insertions(+), 201 deletions(-)

diff --git 
a/hudi-cli/src/main/java/org/apache/hudi/cli/commands/CommitsCommand.java 
b/hudi-cli/src/main/java/org/apache/hudi/cli/commands/CommitsCommand.java
index c0f8ead..3a11e58 100644
--- a/hudi-cli/src/main/java/org/apache/hudi/cli/commands/CommitsCommand.java
+++ b/hudi-cli/src/main/java/org/apache/hudi/cli/commands/CommitsCommand.java
@@ -28,9 +28,12 @@ import org.apache.hudi.common.model.HoodieWriteStat;
 import org.apache.hudi.common.table.HoodieTableMetaClient;
 import org.apache.hudi.common.table.HoodieTimeline;
 import org.apache.hudi.common.table.timeline.HoodieActiveTimeline;
+import org.apache.hudi.common.table.timeline.HoodieArchivedTimeline;
+import org.apache.hudi.common.table.timeline.HoodieDefaultTimeline;
 import org.apache.hudi.common.table.timeline.HoodieInstant;
 import org.apache.hudi.common.util.NumericUtils;
 
+import org.apache.hudi.common.util.StringUtils;
 import org.apache.spark.launcher.SparkLauncher;
 import org.springframework.shell.core.CommandMarker;
 import org.springframework.shell.core.annotation.CliCommand;
@@ -38,7 +41,10 @@ import org.springframework.shell.core.annotation.CliOption;
 import org.springframework.stereotype.Component;
 
 import java.io.IOException;
+import java.time.ZonedDateTime;
 import java.util.ArrayList;
+import java.util.Collections;
+import java.util.Date;
 import java.util.HashMap;
 import java.util.List;
 import java.util.Map;
@@ -51,6 +57,49 @@ import java.util.stream.Collectors;
 @Component
 public class CommitsCommand implements CommandMarker {
 
+  private String printCommits(HoodieDefaultTimeline timeline,
+  final Integer limit, final String sortByField,
+  final boolean descending,
+  final boolean headerOnly) throws IOException {
+final List<Comparable[]> rows = new ArrayList<>();
+
+final List<HoodieInstant> commits = timeline.getCommitsTimeline().filterCompletedInstants()
+.getInstants().collect(Collectors.toList());
+// timeline can be read from multiple files. So sort is needed instead of reversing the collection
+Collections.sort(commits, HoodieInstant.COMPARATOR.reversed());
+
+for (int i = 0; i < commits.size(); i++) {
+  final HoodieInstant commit = commits.get(i);
+  final HoodieCommitMetadata commitMetadata = 
HoodieCommitMetadata.fromBytes(
+  timeline.getInstantDetails(commit).get(),
+  HoodieCommitMetadata.class);
+  rows.add(new Comparable[]{commit.getTimestamp(),
+  commitMetadata.fetchTotalBytesWritten(),
+  commitMetadata.fetchTotalFilesInsert(),
+  commitMetadata.fetchTotalFilesUpdated(),
+  commitMetadata.fetchTotalPartitionsWritten(),
+  commitMetadata.fetchTotalRecordsWritten(),
+  commitMetadata.fetchTotalUpdateRecordsWritten(),
+  commitMetadata.fetchTotalWriteErrors()});
+}
+
+final Map<String, Function<Object, String>> fieldNameToConverterMap = new HashMap<>();
+fieldNameToConverterMap.put("Total Bytes Written", entry -> {
+  return NumericUtils.humanReadableByteCount((Double.valueOf(entry.toString())));
+});
+
+final TableHeader header = new TableHeader()
+.addTableHeaderField("CommitTime")
+.addTableHeaderField("Total Bytes Written")
+.addTableHeaderField("Total Files Added")
+.addTableHeaderField("Total Files Updated")
+.addTableHeaderField("Total Partitions Written")
+.addTableHeaderField("Total Records Written")
+.addTableHeaderField("Total Update Records Written")
+.addTableHeaderField("Total Errors");
+return HoodiePrintHelper.print(header, fieldNameToConverterMap, 
sortByField, descending, limit, headerOnly, rows);
+  }
+
   @CliCommand(value = "commits show", help = "Show the commits")
   public String 

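The inline comment in the diff above notes that an archived timeline can be assembled from multiple files, so the merged instant list is not guaranteed to arrive in commit-time order; an explicit sort with a reversed comparator is therefore correct where a plain `Collections.reverse` would not be. A minimal self-contained sketch of that point (timestamps are illustrative; a natural-order string comparator stands in for `HoodieInstant.COMPARATOR`):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class ReversedSortSketch {

  // Analogue of Collections.sort(commits, HoodieInstant.COMPARATOR.reversed()):
  // sorting descending works regardless of the input order, whereas reversing
  // the list would only be correct if it were already sorted ascending.
  public static List<String> newestFirst(List<String> timestamps) {
    List<String> sorted = new ArrayList<>(timestamps);
    Collections.sort(sorted, Comparator.<String>naturalOrder().reversed());
    return sorted;
  }

  public static void main(String[] args) {
    // Instants interleaved across two archive files: not globally ordered.
    List<String> merged = Arrays.asList("20200101", "20200301", "20200201");
    System.out.println(newestFirst(merged)); // [20200301, 20200201, 20200101]
  }
}
```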
[GitHub] [incubator-hudi] n3nash merged pull request #1274: [HUDI-571] Add 'commits show archived' command to CLI

2020-02-05 Thread GitBox
n3nash merged pull request #1274: [HUDI-571] Add 'commits show archived' 
command to CLI
URL: https://github.com/apache/incubator-hudi/pull/1274
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] vinothchandar commented on issue #1187: [HUDI-499] Allow update partition path with GLOBAL_BLOOM

2020-02-05 Thread GitBox
vinothchandar commented on issue #1187: [HUDI-499] Allow update partition path 
with GLOBAL_BLOOM
URL: https://github.com/apache/incubator-hudi/pull/1187#issuecomment-582522731
 
 
   Thanks and Congrats! :) landed! 




[incubator-hudi] branch master updated: [HUDI-499] Allow update partition path with GLOBAL_BLOOM (#1187)

2020-02-05 Thread vinoth
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new c1516df  [HUDI-499] Allow update partition path with GLOBAL_BLOOM 
(#1187)
c1516df is described below

commit c1516df8ac55757ebd07d8aa459a0ceedeccab7b
Author: Raymond Xu <2701446+xushi...@users.noreply.github.com>
AuthorDate: Wed Feb 5 09:33:33 2020 -0800

[HUDI-499] Allow update partition path with GLOBAL_BLOOM (#1187)

* Handle partition path update by deleting a record from the old partition and
  insert into the new one
* Add a new configuration "hoodie.bloom.index.update.partition.path" to
  enable the behavior
* Add a new unit test case for global bloom index
---
 .../org/apache/hudi/config/HoodieIndexConfig.java  | 18 +
 .../org/apache/hudi/config/HoodieWriteConfig.java  |  4 +
 .../hudi/index/bloom/HoodieGlobalBloomIndex.java   | 23 +-
 .../index/bloom/TestHoodieGlobalBloomIndex.java| 89 ++
 4 files changed, 131 insertions(+), 3 deletions(-)

diff --git 
a/hudi-client/src/main/java/org/apache/hudi/config/HoodieIndexConfig.java 
b/hudi-client/src/main/java/org/apache/hudi/config/HoodieIndexConfig.java
index d39fae1..db83498 100644
--- a/hudi-client/src/main/java/org/apache/hudi/config/HoodieIndexConfig.java
+++ b/hudi-client/src/main/java/org/apache/hudi/config/HoodieIndexConfig.java
@@ -77,6 +77,17 @@ public class HoodieIndexConfig extends DefaultHoodieConfig {
   public static final String BLOOM_INDEX_INPUT_STORAGE_LEVEL = 
"hoodie.bloom.index.input.storage.level";
   public static final String DEFAULT_BLOOM_INDEX_INPUT_STORAGE_LEVEL = 
"MEMORY_AND_DISK_SER";
 
+  /**
+   * Only applies if index type is GLOBAL_BLOOM.
+   * <p>
+   * When set to true, an update to a record with a different partition from its existing one
+   * will insert the record to the new partition and delete it from the old partition.
+   * <p>
+   * When set to false, a record will be updated to the old partition.
+   */
+  public static final String BLOOM_INDEX_UPDATE_PARTITION_PATH = 
"hoodie.bloom.index.update.partition.path";
+  public static final String DEFAULT_BLOOM_INDEX_UPDATE_PARTITION_PATH = 
"false";
+
   private HoodieIndexConfig(Properties props) {
 super(props);
   }
@@ -176,6 +187,11 @@ public class HoodieIndexConfig extends DefaultHoodieConfig 
{
   return this;
 }
 
+public Builder withBloomIndexUpdatePartitionPath(boolean 
updatePartitionPath) {
+  props.setProperty(BLOOM_INDEX_UPDATE_PARTITION_PATH, 
String.valueOf(updatePartitionPath));
+  return this;
+}
+
 public HoodieIndexConfig build() {
   HoodieIndexConfig config = new HoodieIndexConfig(props);
   setDefaultOnCondition(props, !props.containsKey(INDEX_TYPE_PROP), 
INDEX_TYPE_PROP, DEFAULT_INDEX_TYPE);
@@ -190,6 +206,8 @@ public class HoodieIndexConfig extends DefaultHoodieConfig {
   DEFAULT_BLOOM_INDEX_USE_CACHING);
   setDefaultOnCondition(props, 
!props.containsKey(BLOOM_INDEX_INPUT_STORAGE_LEVEL), 
BLOOM_INDEX_INPUT_STORAGE_LEVEL,
   DEFAULT_BLOOM_INDEX_INPUT_STORAGE_LEVEL);
+  setDefaultOnCondition(props, 
!props.containsKey(BLOOM_INDEX_UPDATE_PARTITION_PATH),
+  BLOOM_INDEX_UPDATE_PARTITION_PATH, 
DEFAULT_BLOOM_INDEX_UPDATE_PARTITION_PATH);
   setDefaultOnCondition(props, 
!props.containsKey(BLOOM_INDEX_TREE_BASED_FILTER_PROP),
   BLOOM_INDEX_TREE_BASED_FILTER_PROP, 
DEFAULT_BLOOM_INDEX_TREE_BASED_FILTER);
   setDefaultOnCondition(props, 
!props.containsKey(BLOOM_INDEX_BUCKETIZED_CHECKING_PROP),
diff --git 
a/hudi-client/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java 
b/hudi-client/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
index 7fc0680..642384b 100644
--- a/hudi-client/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
+++ b/hudi-client/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
@@ -431,6 +431,10 @@ public class HoodieWriteConfig extends DefaultHoodieConfig 
{
 return 
StorageLevel.fromString(props.getProperty(HoodieIndexConfig.BLOOM_INDEX_INPUT_STORAGE_LEVEL));
   }
 
+  public boolean getBloomIndexUpdatePartitionPath() {
+return 
Boolean.parseBoolean(props.getProperty(HoodieIndexConfig.BLOOM_INDEX_UPDATE_PARTITION_PATH));
+  }
+
   /**
* storage properties.
*/
diff --git 
a/hudi-client/src/main/java/org/apache/hudi/index/bloom/HoodieGlobalBloomIndex.java
 
b/hudi-client/src/main/java/org/apache/hudi/index/bloom/HoodieGlobalBloomIndex.java
index be6f524..ba8976b 100644
--- 
a/hudi-client/src/main/java/org/apache/hudi/index/bloom/HoodieGlobalBloomIndex.java
+++ 
b/hudi-client/src/main/java/org/apache/hudi/index/bloom/HoodieGlobalBloomIndex.java
@@ -18,6 +18,7 @@
 
 package org.apache.hudi.index.bloom;
 
+import org.apache.hudi.common.model.EmptyHoodieRecordPayload;
 import 

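As a rough illustration of the behavior this commit describes (not Hudi's actual classes or API), the new `hoodie.bloom.index.update.partition.path` flag decides whether a record whose incoming partition differs from the indexed one is migrated (delete from the old partition, insert into the new) or kept in the partition the index already knows about:

```java
import java.util.ArrayList;
import java.util.List;

public class GlobalIndexSketch {

  // Each op is encoded as "DELETE:<partition>" or "UPSERT:<partition>" for brevity.
  public static List<String> route(String indexedPartition, String incomingPartition,
      boolean updatePartitionPath) {
    List<String> ops = new ArrayList<>();
    if (!indexedPartition.equals(incomingPartition) && updatePartitionPath) {
      // Flag on: delete from the old partition, insert into the new one.
      ops.add("DELETE:" + indexedPartition);
      ops.add("UPSERT:" + incomingPartition);
    } else {
      // Flag off (the default): the update stays in the indexed partition.
      ops.add("UPSERT:" + indexedPartition);
    }
    return ops;
  }

  public static void main(String[] args) {
    System.out.println(route("2020/01/01", "2020/02/05", true));
    System.out.println(route("2020/01/01", "2020/02/05", false));
  }
}
```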
[GitHub] [incubator-hudi] vinothchandar merged pull request #1187: [HUDI-499] Allow update partition path with GLOBAL_BLOOM

2020-02-05 Thread GitBox
vinothchandar merged pull request #1187: [HUDI-499] Allow update partition path 
with GLOBAL_BLOOM
URL: https://github.com/apache/incubator-hudi/pull/1187
 
 
   




[jira] [Updated] (HUDI-321) Support bulkinsert in HDFSParquetImporter

2020-02-05 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-321:

Priority: Minor  (was: Trivial)

> Support bulkinsert in HDFSParquetImporter
> -
>
> Key: HUDI-321
> URL: https://issues.apache.org/jira/browse/HUDI-321
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Utilities
>Reporter: Raymond Xu
>Assignee: Raymond Xu
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, HDFSParquetImporter only support upsert and insert mode. It is 
> useful to have bulk insert mode.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-553) Building/Running Hudi on higher java versions

2020-02-05 Thread Raymond Xu (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17030821#comment-17030821
 ] 

Raymond Xu commented on HUDI-553:
-

[~vinoth] really great to see this is planned! Wondering if the target is java 
11?

> Building/Running Hudi on higher java versions
> -
>
> Key: HUDI-553
> URL: https://issues.apache.org/jira/browse/HUDI-553
> Project: Apache Hudi (incubating)
>  Issue Type: Task
>  Components: Usability
>Reporter: Vinoth Chandar
>Priority: Major
> Fix For: 0.6.0
>
>
> [https://github.com/apache/incubator-hudi/issues/1235] 





[GitHub] [incubator-hudi] xushiyan edited a comment on issue #1190: [HUDI-499] Add configuration docs

2020-02-05 Thread GitBox
xushiyan edited a comment on issue #1190: [HUDI-499] Add configuration docs
URL: https://github.com/apache/incubator-hudi/pull/1190#issuecomment-582500892
 
 
   @leesf As we discussed, this is blocked until `_docs/0.5.1`  is created




[GitHub] [incubator-hudi] xushiyan commented on issue #1190: [HUDI-499] Add configuration docs

2020-02-05 Thread GitBox
xushiyan commented on issue #1190: [HUDI-499] Add configuration docs
URL: https://github.com/apache/incubator-hudi/pull/1190#issuecomment-582500892
 
 
   blocked for now until `_docs/0.5.1`  is created




[GitHub] [incubator-hudi] xushiyan edited a comment on issue #1187: [HUDI-499] Allow update partition path with GLOBAL_BLOOM

2020-02-05 Thread GitBox
xushiyan edited a comment on issue #1187: [HUDI-499] Allow update partition 
path with GLOBAL_BLOOM
URL: https://github.com/apache/incubator-hudi/pull/1187#issuecomment-582498755
 
 
   > @xushiyan I think there is a checkstyle error. can you please take a look
   
   @vinothchandar Fixed!




[GitHub] [incubator-hudi] xushiyan commented on issue #1187: [HUDI-499] Allow update partition path with GLOBAL_BLOOM

2020-02-05 Thread GitBox
xushiyan commented on issue #1187: [HUDI-499] Allow update partition path with 
GLOBAL_BLOOM
URL: https://github.com/apache/incubator-hudi/pull/1187#issuecomment-582498755
 
 
   > @xushiyan I think there is a checkstyle error. can you please take a look
   
   Fixed!




[incubator-hudi] branch master updated: [MINOR] Remove the declaration of thrown RuntimeException (#1305)

2020-02-05 Thread leesf
This is an automated email from the ASF dual-hosted git repository.

leesf pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 46842f4  [MINOR] Remove the declaration of thrown RuntimeException 
(#1305)
46842f4 is described below

commit 46842f4e92202bca9a3a4f972f481c93ccc29a3c
Author: lamber-ken 
AuthorDate: Wed Feb 5 23:23:20 2020 +0800

[MINOR] Remove the declaration of thrown RuntimeException (#1305)
---
 .../apache/hudi/cli/commands/SparkEnvCommand.java  |  3 +-
 .../apache/hudi/avro/MercifulJsonConverter.java| 39 --
 .../hudi/common/table/HoodieTableMetaClient.java   |  5 ++-
 .../org/apache/hudi/common/util/RocksDBDAO.java|  2 +-
 .../hudi/exception/TableNotFoundException.java |  3 +-
 .../org/apache/hudi/hive/HoodieHiveClient.java |  4 +--
 .../apache/hudi/utilities/HDFSParquetImporter.java |  4 +--
 .../hudi/utilities/HoodieWithTimelineServer.java   |  2 +-
 8 files changed, 23 insertions(+), 39 deletions(-)

diff --git 
a/hudi-cli/src/main/java/org/apache/hudi/cli/commands/SparkEnvCommand.java 
b/hudi-cli/src/main/java/org/apache/hudi/cli/commands/SparkEnvCommand.java
index e5a8d4e..d209a08 100644
--- a/hudi-cli/src/main/java/org/apache/hudi/cli/commands/SparkEnvCommand.java
+++ b/hudi-cli/src/main/java/org/apache/hudi/cli/commands/SparkEnvCommand.java
@@ -37,8 +37,7 @@ public class SparkEnvCommand implements CommandMarker {
  public static Map<String, String> env = new HashMap<String, String>();
 
   @CliCommand(value = "set", help = "Set spark launcher env to cli")
-  public void setEnv(@CliOption(key = {"conf"}, help = "Env config to be set") 
final String confMap)
-  throws IllegalArgumentException {
+  public void setEnv(@CliOption(key = {"conf"}, help = "Env config to be set") 
final String confMap) {
 String[] map = confMap.split("=");
 if (map.length != 2) {
   throw new IllegalArgumentException("Illegal set parameter, please use 
like [set --conf SPARK_HOME=/usr/etc/spark]");
diff --git 
a/hudi-common/src/main/java/org/apache/hudi/avro/MercifulJsonConverter.java 
b/hudi-common/src/main/java/org/apache/hudi/avro/MercifulJsonConverter.java
index 20b00f1..3f5df01 100644
--- a/hudi-common/src/main/java/org/apache/hudi/avro/MercifulJsonConverter.java
+++ b/hudi-common/src/main/java/org/apache/hudi/avro/MercifulJsonConverter.java
@@ -143,15 +143,13 @@ public class MercifulJsonConverter {
   return res.getRight();
 }
 
-protected abstract Pair<Boolean, Object> convert(Object value, String name, Schema schema)
-throws HoodieJsonToAvroConversionException;
+protected abstract Pair<Boolean, Object> convert(Object value, String name, Schema schema);
   }
 
   private static JsonToAvroFieldProcessor generateBooleanTypeHandler() {
 return new JsonToAvroFieldProcessor() {
   @Override
-  public Pair<Boolean, Object> convert(Object value, String name, Schema schema)
-  throws HoodieJsonToAvroConversionException {
+  public Pair<Boolean, Object> convert(Object value, String name, Schema schema) {
 if (value instanceof Boolean) {
   return Pair.of(true, value);
 }
@@ -163,8 +161,7 @@ public class MercifulJsonConverter {
   private static JsonToAvroFieldProcessor generateIntTypeHandler() {
 return new JsonToAvroFieldProcessor() {
   @Override
-  public Pair<Boolean, Object> convert(Object value, String name, Schema schema)
-  throws HoodieJsonToAvroConversionException {
+  public Pair<Boolean, Object> convert(Object value, String name, Schema schema) {
 if (value instanceof Number) {
   return Pair.of(true, ((Number) value).intValue());
 } else if (value instanceof String) {
@@ -178,8 +175,7 @@ public class MercifulJsonConverter {
   private static JsonToAvroFieldProcessor generateDoubleTypeHandler() {
 return new JsonToAvroFieldProcessor() {
   @Override
-  public Pair<Boolean, Object> convert(Object value, String name, Schema schema)
-  throws HoodieJsonToAvroConversionException {
+  public Pair<Boolean, Object> convert(Object value, String name, Schema schema) {
 if (value instanceof Number) {
   return Pair.of(true, ((Number) value).doubleValue());
 } else if (value instanceof String) {
@@ -193,8 +189,7 @@ public class MercifulJsonConverter {
   private static JsonToAvroFieldProcessor generateFloatTypeHandler() {
 return new JsonToAvroFieldProcessor() {
   @Override
-  public Pair<Boolean, Object> convert(Object value, String name, Schema schema)
-  throws HoodieJsonToAvroConversionException {
+  public Pair<Boolean, Object> convert(Object value, String name, Schema schema) {
 if (value instanceof Number) {
   return Pair.of(true, ((Number) value).floatValue());
 } else if (value instanceof String) {
@@ -208,8 +203,7 @@ public class MercifulJsonConverter {
   private static JsonToAvroFieldProcessor generateLongTypeHandler() {
 return new JsonToAvroFieldProcessor() {
   @Override
-  

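The `JsonToAvroFieldProcessor` pattern touched by this cleanup can be sketched in a self-contained form. Names are simplified and `Map.Entry` stands in for Hudi's `Pair<Boolean, Object>`; the Boolean lets the caller distinguish "could not convert" from a legitimate null value:

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

public class FieldProcessorSketch {

  abstract static class JsonToAvroFieldProcessor {
    public Object convertToAvro(Object value, String name) {
      Map.Entry<Boolean, Object> res = convert(value, name);
      if (!res.getKey()) {
        throw new IllegalArgumentException("Cannot convert field " + name);
      }
      return res.getValue();
    }

    // Returns (matched, convertedValue) — no checked exception declared,
    // mirroring the cleanup in this commit.
    protected abstract Map.Entry<Boolean, Object> convert(Object value, String name);
  }

  // "Merciful" int handler: accepts a real number or a numeric string.
  static final JsonToAvroFieldProcessor INT_HANDLER = new JsonToAvroFieldProcessor() {
    @Override
    protected Map.Entry<Boolean, Object> convert(Object value, String name) {
      if (value instanceof Number) {
        return new SimpleEntry<>(true, ((Number) value).intValue());
      } else if (value instanceof String) {
        return new SimpleEntry<>(true, Integer.parseInt((String) value));
      }
      return new SimpleEntry<>(false, null);
    }
  };

  public static void main(String[] args) {
    System.out.println(INT_HANDLER.convertToAvro("42", "age"));   // 42
    System.out.println(INT_HANDLER.convertToAvro(7.9, "count"));  // 7
  }
}
```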
[GitHub] [incubator-hudi] leesf merged pull request #1305: [MINOR] Remove the declaration of thrown RuntimeException

2020-02-05 Thread GitBox
leesf merged pull request #1305: [MINOR] Remove the declaration of thrown 
RuntimeException
URL: https://github.com/apache/incubator-hudi/pull/1305
 
 
   




[GitHub] [incubator-hudi] yanghua opened a new pull request #1309: [HUDI-592] Remove duplicated dependencies in the pom file of test suite module

2020-02-05 Thread GitBox
yanghua opened a new pull request #1309: [HUDI-592] Remove duplicated 
dependencies in the pom file of test suite module
URL: https://github.com/apache/incubator-hudi/pull/1309
 
 
   
   
   ## What is the purpose of the pull request
   
   *This pull request removes duplicated dependencies in the pom file of test 
suite module*
   
   ## Brief change log
   
 - *Remove duplicated dependencies in the pom file of test suite module*
   
   ## Verify this pull request
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.




[jira] [Updated] (HUDI-592) Remove duplicated dependencies in the pom file of test suite module

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-592:

Labels: pull-request-available  (was: )

> Remove duplicated dependencies in the pom file of test suite module
> ---
>
> Key: HUDI-592
> URL: https://issues.apache.org/jira/browse/HUDI-592
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Testing
>Reporter: vinoyang
>Assignee: vinoyang
>Priority: Major
>  Labels: pull-request-available
>
> There are some duplicated dependencies in the pom file of the test suite 
> module, such as {{hadoop-hdfs}} and {{hadoop-common}}. We need to remove 
> these dependencies.





[GitHub] [incubator-hudi] lamber-ken commented on issue #1293: [HUDI-585] Optimize the steps of building with scala-2.12

2020-02-05 Thread GitBox
lamber-ken commented on issue #1293: [HUDI-585] Optimize the steps of building 
with scala-2.12
URL: https://github.com/apache/incubator-hudi/pull/1293#issuecomment-582456190
 
 
   > +1 LGTM, also tested in window, except for hudi-integ-test(not install 
docker), other modules works fine. Thanks for your work @lamber-ken . Merging.
   
   Thanks :)




[incubator-hudi] branch master updated: [HUDI-585] Optimize the steps of building with scala-2.12 (#1293)

2020-02-05 Thread leesf
This is an automated email from the ASF dual-hosted git repository.

leesf pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 425e3e6  [HUDI-585] Optimize the steps of building with scala-2.12 
(#1293)
425e3e6 is described below

commit 425e3e6c78b9be00fc3fecfc335c94e05a1c70e5
Author: lamber-ken 
AuthorDate: Wed Feb 5 23:13:10 2020 +0800

[HUDI-585] Optimize the steps of building with scala-2.12 (#1293)
---
 LICENSE |  2 -
 README.md   |  6 +--
 dev/change-scala-version.sh | 66 -
 hudi-spark/pom.xml  |  2 +-
 hudi-utilities/pom.xml  |  2 +-
 packaging/hudi-spark-bundle/pom.xml |  2 +-
 packaging/hudi-utilities-bundle/pom.xml |  2 +-
 pom.xml |  5 +++
 8 files changed, 11 insertions(+), 76 deletions(-)

diff --git a/LICENSE b/LICENSE
index 85b7fea..e5cb0ce 100644
--- a/LICENSE
+++ b/LICENSE
@@ -245,8 +245,6 @@ This product includes code from Apache Spark
 
 * org.apache.hudi.AvroConversionHelper copied from classes in 
org/apache/spark/sql/avro package
 
-* dev/change-scala-version.sh copied from 
https://github.com/apache/spark/blob/branch-2.4/dev/change-scala-version.sh
-
 Copyright: 2014 and onwards The Apache Software Foundation
 Home page: http://spark.apache.org/
 License: http://www.apache.org/licenses/LICENSE-2.0
diff --git a/README.md b/README.md
index ae53e72..6bb5659 100644
--- a/README.md
+++ b/README.md
@@ -65,12 +65,10 @@ mvn clean javadoc:aggregate -Pjavadocs
 
 ### Build with Scala 2.12
 
-The default Scala version supported is 2.11. To build for Scala 2.12 version, 
after code checkout run dev/change-scala-version.sh 
-and build using `scala-2.12` profile
+The default Scala version supported is 2.11. To build for Scala 2.12 version, 
build using `scala-2.12` profile
 
 ```
-dev/change-scala-version 2.12
-mvn clean package -DskipTests -DskipITs -Pscala-2.12
+mvn clean package -DskipTests -DskipITs -Dscala-2.12
 ```
 
 ## Quickstart
diff --git a/dev/change-scala-version.sh b/dev/change-scala-version.sh
deleted file mode 100755
index 151581d..000
--- a/dev/change-scala-version.sh
+++ /dev/null
@@ -1,66 +0,0 @@
-#!/usr/bin/env bash
-
-#
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-
-set -e
-
-VALID_VERSIONS=( 2.11 2.12 )
-
-usage() {
-  echo "Usage: $(basename $0) [-h|--help] <version>
-where <version> :
-  -h| --help Display this help text
-  valid version values : ${VALID_VERSIONS[*]}
-" 1>&2
-  exit 1
-}
-
-if [[ ($# -ne 1) || ( $1 == "--help") ||  $1 == "-h" ]]; then
-  usage
-fi
-
-TO_VERSION=$1
-
-check_scala_version() {
-  for i in ${VALID_VERSIONS[*]}; do [ $i = "$1" ] && return 0; done
-  echo "Invalid Scala version: $1. Valid versions: ${VALID_VERSIONS[*]}" 1>&2
-  exit 1
-}
-
-check_scala_version "$TO_VERSION"
-
-if [ $TO_VERSION = "2.11" ]; then
-  FROM_VERSION="2.12"
-else
-  FROM_VERSION="2.11"
-fi
-
-sed_i() {
-  sed -e "$1" "$2" > "$2.tmp" && mv "$2.tmp" "$2"
-}
-
-export -f sed_i
-
-BASEDIR=$(dirname $0)/..
-find "$BASEDIR" -name 'pom.xml' -not -path '*target*' -print \
-  -exec bash -c "sed_i 's/\(artifactId.*\)_'$FROM_VERSION'/\1_'$TO_VERSION'/g' 
{}" \;
-
-# Also update <scala.binary.version> in parent POM
-# Match any scala binary version to ensure idempotency
-sed_i '1,/<scala\.binary\.version>[0-9]*\.[0-9]*</s/<scala\.binary\.version>[0-9]*\.[0-9]*</<scala.binary.version>'$TO_VERSION'</' \
-  "$BASEDIR/pom.xml"
diff --git a/hudi-spark/pom.xml b/hudi-spark/pom.xml
   <modelVersion>4.0.0</modelVersion>
 
-  <artifactId>hudi-spark_2.11</artifactId>
+  <artifactId>hudi-spark_${scala.binary.version}</artifactId>
   <packaging>jar</packaging>
 
   
diff --git a/hudi-utilities/pom.xml b/hudi-utilities/pom.xml
index ed0b283..3c1e0fc 100644
--- a/hudi-utilities/pom.xml
+++ b/hudi-utilities/pom.xml
@@ -23,7 +23,7 @@
   
   <modelVersion>4.0.0</modelVersion>
 
-  <artifactId>hudi-utilities_2.11</artifactId>
+  <artifactId>hudi-utilities_${scala.binary.version}</artifactId>
   <packaging>jar</packaging>
 
   
diff --git a/packaging/hudi-spark-bundle/pom.xml 
b/packaging/hudi-spark-bundle/pom.xml
index afce774..754b5cf 100644
--- a/packaging/hudi-spark-bundle/pom.xml
+++ b/packaging/hudi-spark-bundle/pom.xml
@@ -23,7 +23,7 @@
 ../../pom.xml
   
   <modelVersion>4.0.0</modelVersion>
-  <artifactId>hudi-spark-bundle_2.11</artifactId>
+  <artifactId>hudi-spark-bundle_${scala.binary.version}</artifactId>
   <packaging>jar</packaging>
 
   
diff --git a/packaging/hudi-utilities-bundle/pom.xml 

[GitHub] [incubator-hudi] leesf merged pull request #1293: [HUDI-585] Optimize the steps of building with scala-2.12

2020-02-05 Thread GitBox
leesf merged pull request #1293: [HUDI-585] Optimize the steps of building with 
scala-2.12
URL: https://github.com/apache/incubator-hudi/pull/1293
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] lamber-ken commented on a change in pull request #1293: [HUDI-585] Optimize the steps of building with scala-2.12

2020-02-05 Thread GitBox
lamber-ken commented on a change in pull request #1293: [HUDI-585] Optimize the 
steps of building with scala-2.12
URL: https://github.com/apache/incubator-hudi/pull/1293#discussion_r375271334
 
 

 ##
 File path: pom.xml
 ##
 @@ -1052,6 +1052,10 @@
 
 
 Review comment:
   Already revert it.




[GitHub] [incubator-hudi] lamber-ken commented on a change in pull request #1293: [HUDI-585] Optimize the steps of building with scala-2.12

2020-02-05 Thread GitBox
lamber-ken commented on a change in pull request #1293: [HUDI-585] Optimize the 
steps of building with scala-2.12
URL: https://github.com/apache/incubator-hudi/pull/1293#discussion_r375269481
 
 

 ##
 File path: pom.xml
 ##
 @@ -1052,6 +1052,10 @@
 
 
 Review comment:
   Not necessary, this profile doesn't do anything.




[GitHub] [incubator-hudi] leesf commented on a change in pull request #1293: [HUDI-585] Optimize the steps of building with scala-2.12

2020-02-05 Thread GitBox
leesf commented on a change in pull request #1293: [HUDI-585] Optimize the 
steps of building with scala-2.12
URL: https://github.com/apache/incubator-hudi/pull/1293#discussion_r375265150
 
 

 ##
 File path: pom.xml
 ##
 @@ -1052,6 +1052,10 @@
 
 
 Review comment:
   So this line would be removed?




[jira] [Updated] (HUDI-599) Update release guide & release scripts due to the change of scala 2.12 build

2020-02-05 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-599:
---
Summary: Update release guide & release scripts due to the change of scala 
2.12 build  (was: Update release guide/release scripts due to the change of 
scala 2.12 build)

> Update release guide & release scripts due to the change of scala 2.12 build
> 
>
> Key: HUDI-599
> URL: https://issues.apache.org/jira/browse/HUDI-599
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Release  Administrative
>Reporter: leesf
>Assignee: leesf
>Priority: Major
> Fix For: 0.5.2
>
>
> Update release guide due to the change of scala 2.12 build, PR link below
> [https://github.com/apache/incubator-hudi/pull/1293]





[jira] [Updated] (HUDI-599) Update release guide/release scripts due to the change of scala 2.12 build

2020-02-05 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-599:
---
Summary: Update release guide/release scripts due to the change of scala 
2.12 build  (was: Update Release guide due to the change of scala 2.12 build)

> Update release guide/release scripts due to the change of scala 2.12 build
> --
>
> Key: HUDI-599
> URL: https://issues.apache.org/jira/browse/HUDI-599
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Release  Administrative
>Reporter: leesf
>Assignee: leesf
>Priority: Major
> Fix For: 0.5.2
>
>
> Update release guide due to the change of scala 2.12 build, PR link below
> [https://github.com/apache/incubator-hudi/pull/1293]





[jira] [Created] (HUDI-599) Update Release guide due to the change of scala 2.12 build

2020-02-05 Thread leesf (Jira)
leesf created HUDI-599:
--

 Summary: Update Release guide due to the change of scala 2.12 build
 Key: HUDI-599
 URL: https://issues.apache.org/jira/browse/HUDI-599
 Project: Apache Hudi (incubating)
  Issue Type: Improvement
  Components: Release & Administrative
Reporter: leesf
Assignee: leesf
 Fix For: 0.5.2


Update release guide due to the change of scala 2.12 build, PR link below

[https://github.com/apache/incubator-hudi/pull/1293]





[GitHub] [incubator-hudi] lamber-ken commented on issue #1293: [HUDI-585] Optimize the steps of building with scala-2.12

2020-02-05 Thread GitBox
lamber-ken commented on issue #1293: [HUDI-585] Optimize the steps of building 
with scala-2.12
URL: https://github.com/apache/incubator-hudi/pull/1293#issuecomment-582415582
 
 
   > @lamber-ken Sorry for the late update; tested and it worked fine on Linux, 
let's go forward.
   
   Welcome :) 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] lamber-ken edited a comment on issue #1293: [HUDI-585] Optimize the steps of building with scala-2.12

2020-02-05 Thread GitBox
lamber-ken edited a comment on issue #1293: [HUDI-585] Optimize the steps of 
building with scala-2.12
URL: https://github.com/apache/incubator-hudi/pull/1293#issuecomment-582415582
 
 
   > @lamber-ken Sorry for the late update; tested and it worked fine on Linux, 
let's go forward.
   
   No problem :) 




[jira] [Closed] (HUDI-595) code cleanup

2020-02-05 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-595.
--

> code cleanup 
> -
>
> Key: HUDI-595
> URL: https://issues.apache.org/jira/browse/HUDI-595
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Code Cleanup
>Reporter: Suneel Marthi
>Assignee: Suneel Marthi
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.5.2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Moving out the cleanup code from PR #1159 into a separate PR.





[jira] [Assigned] (HUDI-586) Revisit the release guide

2020-02-05 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf reassigned HUDI-586:
--

Assignee: leesf

> Revisit the release guide
> -
>
> Key: HUDI-586
> URL: https://issues.apache.org/jira/browse/HUDI-586
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Release & Administrative
>Reporter: leesf
>Assignee: leesf
>Priority: Major
> Fix For: 0.6.0
>
>
> Currently, the release guide is not very standard, mainly in the 
> finalize-the-release step. We could refer to the Flink guide 
> [https://cwiki.apache.org/confluence/display/FLINK/Creating+a+Flink+Release]; 
> the main change might be not adding rc-{RC_NUM} to the pom.xml.





[jira] [Commented] (HUDI-590) Cut a new Doc version 0.5.1 explicitly

2020-02-05 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17030656#comment-17030656
 ] 

leesf commented on HUDI-590:


[~bhavanisudha] It would be better to create the 0.5.1 version sooner, since 
there are some docs updates in existing PRs. WDYT?

> Cut a new Doc version 0.5.1 explicitly
> --
>
> Key: HUDI-590
> URL: https://issues.apache.org/jira/browse/HUDI-590
> Project: Apache Hudi (incubating)
>  Issue Type: Task
>  Components: Docs, Release & Administrative
>Reporter: Bhavani Sudha
>Assignee: Bhavani Sudha
>Priority: Major
>
> The latest version of docs needs to be tagged as 0.5.1 explicitly in the 
> site. Follow instructions in 
> [https://github.com/apache/incubator-hudi/blob/asf-site/README.md#updating-site]
>  to create a new dir 0.5.1 under docs/_docs/ 





[jira] [Updated] (HUDI-595) code cleanup

2020-02-05 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-595:
---
Status: Open  (was: New)

> code cleanup 
> -
>
> Key: HUDI-595
> URL: https://issues.apache.org/jira/browse/HUDI-595
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Code Cleanup
>Reporter: Suneel Marthi
>Assignee: Suneel Marthi
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.5.2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Moving out the cleanup code from PR #1159 into a separate PR.





[jira] [Resolved] (HUDI-595) code cleanup

2020-02-05 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf resolved HUDI-595.

Resolution: Fixed

Fixed via master: 594da28fbf64fb20432e718a409577fd10516c4a

> code cleanup 
> -
>
> Key: HUDI-595
> URL: https://issues.apache.org/jira/browse/HUDI-595
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Code Cleanup
>Reporter: Suneel Marthi
>Assignee: Suneel Marthi
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.5.2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Moving out the cleanup code from PR #1159 into a separate PR.





[GitHub] [incubator-hudi] leesf commented on issue #1293: [HUDI-585] Optimize the steps of building with scala-2.12

2020-02-05 Thread GitBox
leesf commented on issue #1293: [HUDI-585] Optimize the steps of building with 
scala-2.12
URL: https://github.com/apache/incubator-hudi/pull/1293#issuecomment-582405982
 
 
   @lamber-ken Sorry for the late update; tested and it worked fine on Linux, 
let's go forward.




[GitHub] [incubator-hudi] pratyakshsharma commented on issue #765: [WIP] Fix KafkaAvroSource to use the latest schema

2020-02-05 Thread GitBox
pratyakshsharma commented on issue #765: [WIP] Fix KafkaAvroSource to use the 
latest schema
URL: https://github.com/apache/incubator-hudi/pull/765#issuecomment-582404722
 
 
   ack. Will take a pass.




[GitHub] [incubator-hudi] pratyakshsharma commented on issue #1165: [HUDI-76] Add CSV Source support for Hudi Delta Streamer

2020-02-05 Thread GitBox
pratyakshsharma commented on issue #1165: [HUDI-76] Add CSV Source support for 
Hudi Delta Streamer
URL: https://github.com/apache/incubator-hudi/pull/1165#issuecomment-582404480
 
 
   @vinothchandar ack.

