[incubator-hudi] branch master updated: [MINOR] Fix resource cleanup in TestTableSchemaEvolution (#1640)

2020-05-20 Thread vinoth
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new f802d44  [MINOR] Fix resource cleanup in TestTableSchemaEvolution 
(#1640)
f802d44 is described below

commit f802d4400b14a4b11be6f6a1b758e1bb6fbb62a3
Author: Raymond Xu <2701446+xushi...@users.noreply.github.com>
AuthorDate: Wed May 20 05:07:30 2020 -0700

[MINOR] Fix resource cleanup in TestTableSchemaEvolution (#1640)

- Remove Xms; it is not needed.
- Extending the process exit timeout from 30 to 120 sec should be safe to do
---
 .../test/java/org/apache/hudi/client/TestTableSchemaEvolution.java  | 6 +++---
 pom.xml | 3 ++-
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git 
a/hudi-client/src/test/java/org/apache/hudi/client/TestTableSchemaEvolution.java
 
b/hudi-client/src/test/java/org/apache/hudi/client/TestTableSchemaEvolution.java
index f9e59c9..d3081e8 100644
--- 
a/hudi-client/src/test/java/org/apache/hudi/client/TestTableSchemaEvolution.java
+++ 
b/hudi-client/src/test/java/org/apache/hudi/client/TestTableSchemaEvolution.java
@@ -75,13 +75,13 @@ public class TestTableSchemaEvolution extends 
TestHoodieClientBase {
   + TRIP_SCHEMA_SUFFIX;
 
   @BeforeEach
-  public void setUp() throws Exception {
+  public void setUp() throws IOException {
 initResources();
   }
 
   @AfterEach
-  public void tearDown() {
-cleanupSparkContexts();
+  public void tearDown() throws IOException {
+cleanupResources();
   }
 
   @Test
diff --git a/pom.xml b/pom.xml
index b2792f0..9fe27d7 100644
--- a/pom.xml
+++ b/pom.xml
@@ -245,7 +245,8 @@
 ${maven-surefire-plugin.version}
 
   ${skipUTs}
-  -Xms256m -Xmx2g
+  -Xmx2g
+  
120
   
 
   ${surefire-log4j.file}
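The archive strips XML tags from the diff above, so the surefire stanza is hard to read. The following is a reconstruction of the plausible post-commit configuration: tag names are inferred from Maven Surefire's documented parameters and this project's visible property placeholders, not copied from the actual pom.xml.

```xml
<!-- Reconstruction (tag names inferred, not verbatim from the pom) -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <version>${maven-surefire-plugin.version}</version>
  <configuration>
    <skipTests>${skipUTs}</skipTests>
    <!-- -Xms256m removed by this commit; only the heap ceiling remains -->
    <argLine>-Xmx2g</argLine>
    <!-- raised from Surefire's 30s default so slow forks can exit cleanly -->
    <forkedProcessExitTimeoutInSeconds>120</forkedProcessExitTimeoutInSeconds>
  </configuration>
</plugin>
```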



[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1640: [MINOR] Fix resource cleanup in TestTableSchemaEvolution

2020-05-20 Thread GitBox


vinothchandar commented on a change in pull request #1640:
URL: https://github.com/apache/incubator-hudi/pull/1640#discussion_r427956293



##
File path: pom.xml
##
@@ -245,7 +245,8 @@
 ${maven-surefire-plugin.version}
 
   ${skipUTs}
-  -Xms256m -Xmx2g
+  -Xmx2g

Review comment:
   It's fine for now, but we can split the PRs next time.
   
   Also, I'd appreciate a more descriptive commit message (we need to 
standardize this as a group) :)






This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-889) Writer supports useJdbc configuration when hive synchronization is enabled

2020-05-20 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-889:
-
Fix Version/s: (was: 0.5.3)

> Writer supports useJdbc configuration when hive synchronization is enabled
> --
>
> Key: HUDI-889
> URL: https://issues.apache.org/jira/browse/HUDI-889
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Writer Core
>Reporter: dzcxzl
>Priority: Trivial
> Fix For: 0.6.0
>
>
> hudi-hive-sync supports the useJdbc = false configuration, but the writer 
> does not provide this configuration at this stage



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Reopened] (HUDI-889) Writer supports useJdbc configuration when hive synchronization is enabled

2020-05-20 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan reopened HUDI-889:
--

> Writer supports useJdbc configuration when hive synchronization is enabled
> --
>
> Key: HUDI-889
> URL: https://issues.apache.org/jira/browse/HUDI-889
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Writer Core
>Reporter: dzcxzl
>Priority: Trivial
> Fix For: 0.6.0
>
>
> hudi-hive-sync supports the useJdbc = false configuration, but the writer 
> does not provide this configuration at this stage





[GitHub] [incubator-hudi] vinothchandar commented on pull request #1538: [HUDI-803]: added more test cases in TestHoodieAvroUtils.class

2020-05-20 Thread GitBox


vinothchandar commented on pull request #1538:
URL: https://github.com/apache/incubator-hudi/pull/1538#issuecomment-631444156


   @pratyakshsharma by close, you mean final review and merge right? :) 







[GitHub] [incubator-hudi] pratyakshsharma opened a new pull request #1647: [HUDI-867]: fixed IllegalArgumentException from graphite metrics in deltaStreamer continuous mode

2020-05-20 Thread GitBox


pratyakshsharma opened a new pull request #1647:
URL: https://github.com/apache/incubator-hudi/pull/1647


   
   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   
   Added a way to create metric names with the updated table name in every 
iteration so that the IllegalArgumentException does not come up. 
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.







[jira] [Updated] (HUDI-867) Graphite metrics are throwing IllegalArgumentException on continuous mode

2020-05-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-867:

Labels: bug-bash-0.6.0 pull-request-available  (was: bug-bash-0.6.0)

> Graphite metrics are throwing IllegalArgumentException on continuous mode
> -
>
> Key: HUDI-867
> URL: https://issues.apache.org/jira/browse/HUDI-867
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: DeltaStreamer
>Reporter: João Esteves
>Assignee: Pratyaksh Sharma
>Priority: Major
>  Labels: bug-bash-0.6.0, pull-request-available
>
> Hello everyone, I am trying to extract Graphite metrics from Hudi using a 
> Spark Streaming process, but the method that sends metrics is throwing 
> java.lang.IllegalArgumentException after the first microbatch, like this:
> {code:java}
> 20/05/06 11:49:25 ERROR Metrics: Failed to send metrics: 
> java.lang.IllegalArgumentException: A metric named 
> kafka_hudi.finalize.duration already exists
>   at 
> org.apache.hudi.com.codahale.metrics.MetricRegistry.register(MetricRegistry.java:97)
>   at org.apache.hudi.metrics.Metrics.registerGauge(Metrics.java:83)
>   at 
> org.apache.hudi.metrics.HoodieMetrics.updateFinalizeWriteMetrics(HoodieMetrics.java:177)
>   at 
> org.apache.hudi.HoodieWriteClient.lambda$finalizeWrite$14(HoodieWriteClient.java:1233)
>   at org.apache.hudi.common.util.Option.ifPresent(Option.java:96)
>   at 
> org.apache.hudi.HoodieWriteClient.finalizeWrite(HoodieWriteClient.java:1231)
>   at org.apache.hudi.HoodieWriteClient.commit(HoodieWriteClient.java:497)
>   at org.apache.hudi.HoodieWriteClient.commit(HoodieWriteClient.java:479)
>   at org.apache.hudi.HoodieWriteClient.commit(HoodieWriteClient.java:470)
>   at 
> org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:152)
>   at 
> org.apache.hudi.HoodieStreamingSink$$anonfun$1$$anonfun$2.apply(HoodieStreamingSink.scala:51)
>   at 
> org.apache.hudi.HoodieStreamingSink$$anonfun$1$$anonfun$2.apply(HoodieStreamingSink.scala:51)
>   at scala.util.Try$.apply(Try.scala:192)
>   at 
> org.apache.hudi.HoodieStreamingSink$$anonfun$1.apply(HoodieStreamingSink.scala:50)
>   at 
> org.apache.hudi.HoodieStreamingSink$$anonfun$1.apply(HoodieStreamingSink.scala:50)
>   at 
> org.apache.hudi.HoodieStreamingSink.retry(HoodieStreamingSink.scala:114)
>   at 
> org.apache.hudi.HoodieStreamingSink.addBatch(HoodieStreamingSink.scala:49)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$5$$anonfun$apply$17.apply(MicroBatchExecution.scala:537)
>   at 
> org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:84)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:165)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:74)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$5.apply(MicroBatchExecution.scala:535)
>   at 
> org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:351)
>   at 
> org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution.org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch(MicroBatchExecution.scala:534)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply$mcV$sp(MicroBatchExecution.scala:198)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:166)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:166)
>   at 
> org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:351)
>   at 
> org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1.apply$mcZ$sp(MicroBatchExecution.scala:166)
>   at 
> org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution.runActivatedStream(MicroBatchExecution.scala:160)
>   at 
> 
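The trace above (truncated in the archive) boils down to the Codahale `MetricRegistry.register` call rejecting a second metric under an existing name on the second microbatch. The sketch below reproduces the failure mode and one possible guard with a stdlib-only stand-in; `MiniRegistry` is a simplified illustration, not Hudi's actual fix or Dropwizard's real class.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Simplified stand-in for a Codahale-style MetricRegistry (illustration only):
// register() rejects duplicate names, which is exactly what the stack trace
// above shows for "kafka_hudi.finalize.duration".
class MiniRegistry {
  private final Map<String, Supplier<Long>> gauges = new ConcurrentHashMap<>();

  void register(String name, Supplier<Long> gauge) {
    if (gauges.putIfAbsent(name, gauge) != null) {
      throw new IllegalArgumentException("A metric named " + name + " already exists");
    }
  }

  // One possible guard: drop any stale gauge before re-registering,
  // so repeated microbatches can refresh the same metric name.
  void reRegister(String name, Supplier<Long> gauge) {
    gauges.remove(name); // no-op if absent
    register(name, gauge);
  }

  long value(String name) {
    return gauges.get(name).get();
  }
}

public class GaugeReRegistration {
  public static void main(String[] args) {
    MiniRegistry registry = new MiniRegistry();
    registry.reRegister("kafka_hudi.finalize.duration", () -> 100L);
    // Second microbatch: same name again -- would throw without the guard.
    registry.reRegister("kafka_hudi.finalize.duration", () -> 200L);
    System.out.println(registry.value("kafka_hudi.finalize.duration")); // prints 200
  }
}
```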

[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1566: [HUDI-603]: DeltaStreamer can now fetch schema before every run in continuous mode

2020-05-20 Thread GitBox


bvaradar commented on a change in pull request #1566:
URL: https://github.com/apache/incubator-hudi/pull/1566#discussion_r428209317



##
File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/schema/SchemaRegistryProvider.java
##
@@ -81,11 +66,22 @@ private static Schema getSchema(String registryUrl) throws 
IOException {
 
   @Override
   public Schema getSourceSchema() {
-return schema;
+String registryUrl = config.getString(Config.SRC_SCHEMA_REGISTRY_URL_PROP);
+try {
+  return getSchema(registryUrl);
+} catch (IOException ioe) {
+  throw new HoodieIOException("Error reading source schema from registry 
:" + registryUrl, ioe);
+}
   }
 
   @Override
   public Schema getTargetSchema() {
-return targetSchema;
+String registryUrl = config.getString(Config.SRC_SCHEMA_REGISTRY_URL_PROP);
+String targetRegistryUrl = 
config.getString(Config.TARGET_SCHEMA_REGISTRY_URL_PROP, registryUrl);
+try {
+  return getSchema(targetRegistryUrl);

Review comment:
   yes, target schema is allowed to be different than source schema due to 
transformations and this is fine.
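The `getTargetSchema` change in the diff above reads a target registry URL with the source registry URL as its fallback. A minimal sketch of that fallback pattern follows; the property keys are illustrative stand-ins, not necessarily Hudi's actual `Config` constants.

```java
import java.util.Properties;

// Sketch of the fallback pattern from the diff: the target registry URL
// defaults to the source registry URL when not explicitly configured.
public class RegistryUrls {
  // Hypothetical key names, for illustration only.
  static final String SRC_URL_PROP = "schemaprovider.registry.url";
  static final String TARGET_URL_PROP = "schemaprovider.registry.targetUrl";

  public static String targetRegistryUrl(Properties props) {
    String srcUrl = props.getProperty(SRC_URL_PROP);
    // Fall back to the source URL, mirroring
    // config.getString(TARGET_SCHEMA_REGISTRY_URL_PROP, registryUrl) above.
    return props.getProperty(TARGET_URL_PROP, srcUrl);
  }
}
```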









[jira] [Commented] (HUDI-890) Prepare for 0.5.3 patch release

2020-05-20 Thread Balaji Varadarajan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112436#comment-17112436
 ] 

Balaji Varadarajan commented on HUDI-890:
-

[~shivnarayan]: HUDI-846 is merged to master (as part of PR: 
[https://github.com/apache/incubator-hudi/pull/1634/files]). Let's port it to 
0.5.3 as well. 

> Prepare for 0.5.3 patch release
> ---
>
> Key: HUDI-890
> URL: https://issues.apache.org/jira/browse/HUDI-890
> Project: Apache Hudi (incubating)
>  Issue Type: Task
>Reporter: Bhavani Sudha
>Assignee: Bhavani Sudha
>Priority: Major
> Fix For: 0.5.3
>
>
> The following commits are included in this release.
>  * #1372 HUDI-652 Decouple HoodieReadClient and AbstractHoodieClient to break 
> the inheritance chain
>  * #1388 HUDI-681 Remove embeddedTimelineService from HoodieReadClient
>  * #1350 HUDI-629: Replace Guava's Hashing with an equivalent in 
> NumericUtils.java
>  * #1505 [HUDI - 738] Add validation to DeltaStreamer to fail fast when 
> filterDupes is enabled on UPSERT mode.
>  * #1517 HUDI-799 Use appropriate FS when loading configs
>  * #1406 HUDI-713 Fix conversion of Spark array of struct type to Avro schema
>  * #1394 HUDI-656[Performance] Return a dummy Spark relation after writing 
> the DataFrame
>  * #1576 HUDI-850 Avoid unnecessary listings in incremental cleaning mode
>  * #1421 HUDI-724 Parallelize getSmallFiles for partitions
>  * #1330 HUDI-607 Fix to allow creation/syncing of Hive tables partitioned by 
> Date type columns
>  * #1413 Add constructor to HoodieROTablePathFilter
>  * #1415 HUDI-539 Make ROPathFilter conf member serializable
>  * #1578 Add changes for presto mor queries
>  * #1506 HUDI-782 Add support of Aliyun object storage service.
>  * #1432 HUDI-716 Exception: Not an Avro data file when running 
> HoodieCleanClient.runClean
>  * #1422 HUDI-400 Check upgrade from old plan to new plan for compaction
>  * #1448 [MINOR] Update DOAP with 0.5.2 Release
>  * #1466 HUDI-742 Fix Java Math Exception
>  * #1416 HUDI-717 Fixed usage of HiveDriver for DDL statements.
>  * #1427 HUDI-727: Copy default values of fields if not present when 
> rewriting incoming record with new schema
>  * #1515 HUDI-795 Handle auto-deleted empty aux folder
>  * #1547 [MINOR]: Fix cli docs for DeltaStreamer
>  * #1580 HUDI-852 adding check for table name for Append Save mode
>  * #1537 [MINOR] fixed building IndexFileFilter with a wrong condition in 
> HoodieGlobalBloomIndex class
>  * #1434 HUDI-616 Fixed parquet files getting created on local FS
>  * #1633 HUDI-858 Allow multiple operations to be executed within a single 
> commit
>  * #1634 HUDI-846 Enable Incremental cleaning and embedded timeline-server by 
> default
>  * #1596 HUDI-863 get decimal properties from derived spark DataType
>  * #1636 HUDI-895 Remove unnecessary listing .hoodie folder when using 
> timeline server
>  * #1584 HUDI-902 Avoid exception when getSchemaProvider
>  * #1612 HUDI-528 Handle empty commit in incremental pulling
>  * #1511 HUDI-789 Adjust logic of upsert in HDFSParquetImporter
>  * #1627 HUDI-889 Writer supports useJdbc configuration when hive 
> synchronization is enabled





[jira] [Updated] (HUDI-848) Turn on embedded timeline server by default for all writes

2020-05-20 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan updated HUDI-848:

Status: In Progress  (was: Open)

> Turn on embedded timeline server by default for all writes
> --
>
> Key: HUDI-848
> URL: https://issues.apache.org/jira/browse/HUDI-848
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Writer Core
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Major
> Fix For: 0.6.0, 0.5.3
>
>
> Includes RDD level, Spark DS and DeltaStreamer





[GitHub] [incubator-hudi] vinothchandar merged pull request #1538: [HUDI-803]: added more test cases in TestHoodieAvroUtils.class

2020-05-20 Thread GitBox


vinothchandar merged pull request #1538:
URL: https://github.com/apache/incubator-hudi/pull/1538


   







[incubator-hudi] branch master updated: [HUDI-803] Replaced used of NullNode with JsonProperties.NULL_VALUE in HoodieAvroUtils (#1538)

2020-05-20 Thread vinoth
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 6a0aa9a  [HUDI-803] Replaced used of NullNode with 
JsonProperties.NULL_VALUE in HoodieAvroUtils (#1538)
6a0aa9a is described below

commit 6a0aa9a645d11ed7b50e18aa0563dafcd9d145f7
Author: Pratyaksh Sharma 
AuthorDate: Wed May 20 21:34:43 2020 +0530

[HUDI-803] Replaced used of NullNode with JsonProperties.NULL_VALUE in 
HoodieAvroUtils (#1538)

- added more test cases in TestHoodieAvroUtils.class

Co-authored-by: Vinoth Chandar 
---
 .../java/org/apache/hudi/avro/HoodieAvroUtils.java | 17 ++--
 .../org/apache/hudi/avro/TestHoodieAvroUtils.java  | 93 +-
 2 files changed, 97 insertions(+), 13 deletions(-)

diff --git 
a/hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java 
b/hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java
index 8c22122..38b9d32 100644
--- a/hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java
+++ b/hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java
@@ -18,7 +18,7 @@
 
 package org.apache.hudi.avro;
 
-import org.apache.avro.JsonProperties.Null;
+import org.apache.avro.JsonProperties;
 import org.apache.hudi.common.model.HoodieRecord;
 import org.apache.hudi.exception.HoodieIOException;
 import org.apache.hudi.exception.SchemaCompatabilityException;
@@ -64,7 +64,7 @@ public class HoodieAvroUtils {
   private static ThreadLocal reuseDecoder = 
ThreadLocal.withInitial(() -> null);
 
   // All metadata fields are optional strings.
-  private static final Schema METADATA_FIELD_SCHEMA =
+  static final Schema METADATA_FIELD_SCHEMA =
   Schema.createUnion(Arrays.asList(Schema.create(Schema.Type.NULL), 
Schema.create(Schema.Type.STRING)));
 
   private static final Schema RECORD_KEY_SCHEMA = initRecordKeySchema();
@@ -96,7 +96,6 @@ public class HoodieAvroUtils {
 writer.write(record, jsonEncoder);
 jsonEncoder.flush();
 return out.toByteArray();
-//metadata.toJsonString().getBytes(StandardCharsets.UTF_8));
   }
 
   /**
@@ -142,15 +141,15 @@ public class HoodieAvroUtils {
 List parentFields = new ArrayList<>();
 
 Schema.Field commitTimeField =
-new Schema.Field(HoodieRecord.COMMIT_TIME_METADATA_FIELD, 
METADATA_FIELD_SCHEMA, "", NullNode.getInstance());
+new Schema.Field(HoodieRecord.COMMIT_TIME_METADATA_FIELD, 
METADATA_FIELD_SCHEMA, "", JsonProperties.NULL_VALUE);
 Schema.Field commitSeqnoField =
-new Schema.Field(HoodieRecord.COMMIT_SEQNO_METADATA_FIELD, 
METADATA_FIELD_SCHEMA, "", NullNode.getInstance());
+new Schema.Field(HoodieRecord.COMMIT_SEQNO_METADATA_FIELD, 
METADATA_FIELD_SCHEMA, "", JsonProperties.NULL_VALUE);
 Schema.Field recordKeyField =
-new Schema.Field(HoodieRecord.RECORD_KEY_METADATA_FIELD, 
METADATA_FIELD_SCHEMA, "", NullNode.getInstance());
+new Schema.Field(HoodieRecord.RECORD_KEY_METADATA_FIELD, 
METADATA_FIELD_SCHEMA, "", JsonProperties.NULL_VALUE);
 Schema.Field partitionPathField =
-new Schema.Field(HoodieRecord.PARTITION_PATH_METADATA_FIELD, 
METADATA_FIELD_SCHEMA, "", NullNode.getInstance());
+new Schema.Field(HoodieRecord.PARTITION_PATH_METADATA_FIELD, 
METADATA_FIELD_SCHEMA, "", JsonProperties.NULL_VALUE);
 Schema.Field fileNameField =
-new Schema.Field(HoodieRecord.FILENAME_METADATA_FIELD, 
METADATA_FIELD_SCHEMA, "", NullNode.getInstance());
+new Schema.Field(HoodieRecord.FILENAME_METADATA_FIELD, 
METADATA_FIELD_SCHEMA, "", JsonProperties.NULL_VALUE);
 
 parentFields.add(commitTimeField);
 parentFields.add(commitSeqnoField);
@@ -272,7 +271,7 @@ public class HoodieAvroUtils {
 GenericRecord newRecord = new GenericData.Record(newSchema);
 for (Schema.Field f : fieldsToWrite) {
   if (record.get(f.name()) == null) {
-if (f.defaultVal() instanceof Null) {
+if (f.defaultVal() instanceof JsonProperties.Null) {
   newRecord.put(f.name(), null);
 } else {
   newRecord.put(f.name(), f.defaultVal());
diff --git 
a/hudi-common/src/test/java/org/apache/hudi/avro/TestHoodieAvroUtils.java 
b/hudi-common/src/test/java/org/apache/hudi/avro/TestHoodieAvroUtils.java
index 9c5e046..7d5cf04 100644
--- a/hudi-common/src/test/java/org/apache/hudi/avro/TestHoodieAvroUtils.java
+++ b/hudi-common/src/test/java/org/apache/hudi/avro/TestHoodieAvroUtils.java
@@ -18,16 +18,24 @@
 
 package org.apache.hudi.avro;
 
+import org.apache.avro.JsonProperties;
+import org.apache.hudi.common.model.HoodieRecord;
+import org.apache.hudi.exception.SchemaCompatabilityException;
+
 import org.apache.avro.Schema;
 import org.apache.avro.generic.GenericData;
 import org.apache.avro.generic.GenericRecord;
+import org.codehaus.jackson.node.NullNode;
 import 

[GitHub] [incubator-hudi] xushiyan commented on pull request #1644: [HUDI-811] Restructure test packages in hudi-common

2020-05-20 Thread GitBox


xushiyan commented on pull request #1644:
URL: https://github.com/apache/incubator-hudi/pull/1644#issuecomment-631577878


   ![Screen Shot 2020-05-20 at 8 57 05 
AM](https://user-images.githubusercontent.com/2701446/82468795-15294280-9a78-11ea-909a-bf09da83d7a4.png)
   
   Test classes under `functional` and `testutils`:
   - `TestHoodieLogFormat` and `TestHoodieLogFormatAppendFailure` involve the 
minicluster and hence go to `functional`
   - Same for `org.apache.hudi.hadoop.functional.TestHoodieCombineHiveInputFormat`
   







[GitHub] [incubator-hudi] bvaradar commented on pull request #1645: [HUDI-707]Add unit test for StatsCommand

2020-05-20 Thread GitBox


bvaradar commented on pull request #1645:
URL: https://github.com/apache/incubator-hudi/pull/1645#issuecomment-631624593


   @yanghua @leesf : Would you be interested in shepherding this PR when it is 
ready?







[jira] [Commented] (HUDI-846) Turn on incremental cleaning bu default in 0.6.0

2020-05-20 Thread Balaji Varadarajan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112434#comment-17112434
 ] 

Balaji Varadarajan commented on HUDI-846:
-

Incremental cleaning is enabled by default as part of 
[https://github.com/apache/incubator-hudi/pull/1634/files]

> Turn on incremental cleaning bu default in 0.6.0
> 
>
> Key: HUDI-846
> URL: https://issues.apache.org/jira/browse/HUDI-846
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Cleaner
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.6.0, 0.5.3
>
>
> Incremental cleaner will track commits that have happened since the last 
> clean operation to figure out partitions which needs to be scanned for 
> cleaning. This avoids the costly scanning of all partition paths.
> Incremental cleaning is currently disabled by default. We need to enable it 
> by default in 0.6.0.
> No special handling is required for upgrade/downgrade scenarios as 
> incremental cleaning relies on standard format of commit metadata 





[jira] [Resolved] (HUDI-846) Turn on incremental cleaning bu default in 0.6.0

2020-05-20 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan resolved HUDI-846.
-
Resolution: Fixed

> Turn on incremental cleaning bu default in 0.6.0
> 
>
> Key: HUDI-846
> URL: https://issues.apache.org/jira/browse/HUDI-846
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Cleaner
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.6.0, 0.5.3
>
>
> Incremental cleaner will track commits that have happened since the last 
> clean operation to figure out partitions which needs to be scanned for 
> cleaning. This avoids the costly scanning of all partition paths.
> Incremental cleaning is currently disabled by default. We need to enable it 
> by default in 0.6.0.
> No special handling is required for upgrade/downgrade scenarios as 
> incremental cleaning relies on standard format of commit metadata 





[jira] [Updated] (HUDI-846) Turn on incremental cleaning bu default in 0.6.0

2020-05-20 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan updated HUDI-846:

Status: In Progress  (was: Open)

> Turn on incremental cleaning bu default in 0.6.0
> 
>
> Key: HUDI-846
> URL: https://issues.apache.org/jira/browse/HUDI-846
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Cleaner
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.6.0, 0.5.3
>
>
> Incremental cleaner will track commits that have happened since the last 
> clean operation to figure out partitions which needs to be scanned for 
> cleaning. This avoids the costly scanning of all partition paths.
> Incremental cleaning is currently disabled by default. We need to enable it 
> by default in 0.6.0.
> No special handling is required for upgrade/downgrade scenarios as 
> incremental cleaning relies on standard format of commit metadata 





[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1566: [HUDI-603]: DeltaStreamer can now fetch schema before every run in continuous mode

2020-05-20 Thread GitBox


bvaradar commented on a change in pull request #1566:
URL: https://github.com/apache/incubator-hudi/pull/1566#discussion_r428206400



##
File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/schema/SchemaSet.java
##
@@ -0,0 +1,44 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities.schema;
+
+import java.io.Serializable;
+import java.util.HashSet;
+import org.apache.avro.Schema;
+import org.apache.avro.SchemaNormalization;
+
+import java.util.Set;
+
+/**
+ * Tracks already processed schemas.
+ */
+public class SchemaSet implements Serializable {
+
+  private final Set processedSchema = new HashSet<>();

Review comment:
   I think this is similar in scope to how sparkConf maintains avro 
schemas. In continuous mode, we reuse the same spark session. I think this is 
ok.
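The `SchemaSet` in the diff above (its generics are stripped by the archive; the real field is plausibly a `Set<Long>` of fingerprints, given the `SchemaNormalization` import) dedupes already-processed schemas by a numeric fingerprint rather than holding full schema objects. A stdlib-only sketch of that idea follows; CRC32 over the schema text stands in for Avro's `SchemaNormalization.parsingFingerprint64` purely to keep the example dependency-free.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.zip.CRC32;

// Stand-in for the SchemaSet above: track processed schemas by fingerprint.
public class SchemaSetSketch {
  private final Set<Long> processedSchema = new HashSet<>();

  // CRC32 is a placeholder fingerprint; the real class would use Avro's
  // 64-bit parsing fingerprint of the normalized schema.
  private static long fingerprint(String schemaJson) {
    CRC32 crc = new CRC32();
    crc.update(schemaJson.getBytes());
    return crc.getValue();
  }

  /** Returns true the first time a schema is seen, false on repeats. */
  public boolean markProcessed(String schemaJson) {
    return processedSchema.add(fingerprint(schemaJson));
  }
}
```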









[GitHub] [incubator-hudi] garyli1019 commented on pull request #1643: [HUDI-110] Spark Datasource Auto Partition Extractor

2020-05-20 Thread GitBox


garyli1019 commented on pull request #1643:
URL: https://github.com/apache/incubator-hudi/pull/1643#issuecomment-631589860


   @vinothchandar Yes, this feature is already supported. Maybe I misunderstood 
this ticket: https://issues.apache.org/jira/browse/HUDI-110. Need @bvaradar's 
input here.
   I will add the partition discovery example to the doc. I remember someone 
asked questions related to this topic before.







[incubator-hudi] branch hudi_test_suite_refactor updated (6472886 -> 2773fe9)

2020-05-20 Thread nagarwal
This is an automated email from the ASF dual-hosted git repository.

nagarwal pushed a change to branch hudi_test_suite_refactor
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git.


 discard 6472886  [HUDI-394] Provide a basic implementation of test suite
 add 2773fe9  [HUDI-394] Provide a basic implementation of test suite

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (6472886)
            \
             N -- N -- N   refs/heads/hudi_test_suite_refactor (2773fe9)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.

Any revisions marked "omit" are not gone; other references still
refer to them.  Any revisions marked "discard" are gone forever.

No new revisions were added by this update.

Summary of changes:
 .../test/java/org/apache/hudi/testsuite/job/TestHoodieTestSuiteJob.java | 2 --
 1 file changed, 2 deletions(-)



[GitHub] [incubator-hudi] bvaradar commented on pull request #1643: [HUDI-110] Spark Datasource Auto Partition Extractor

2020-05-20 Thread GitBox


bvaradar commented on pull request #1643:
URL: https://github.com/apache/incubator-hudi/pull/1643#issuecomment-631630091


   @garyli1019 : There are two parts to it. The ticket was originally created to 
track making the hive-style partitioning scheme the default in Hudi. Spark 
supports this same style. Given the adoption, changing the default partition 
style has implications for backward compatibility and needs a discussion.
   
   The other part is about how to make use of the partition configuration Spark 
captures in partitionBy(..) and use it to directly configure the KeyGenerator. 
Let me know if this makes sense. 
   
   
   SlashEncodedDayPartitionValueExtractor is the default being used. This is 
not a common format outside Uber.
   
   
   Also, the Spark DataSource API provides a partitionBy clause, which has not 
been integrated for the Hudi Data Source. We need to investigate how we can 
leverage the partitionBy clause for partitioning.
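For readers following along, here is a minimal sketch of the two partition layouts being contrasted above (illustrative Java only, not Hudi code; the class and method names are made up):

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Illustration of the two partition path styles discussed in the thread.
class PartitionPathStyles {

    // Slash-encoded day layout, e.g. "2020/05/20" -- the shape the default
    // SlashEncodedDayPartitionValueExtractor expects.
    static String slashEncodedDay(LocalDate d) {
        return d.format(DateTimeFormatter.ofPattern("yyyy/MM/dd"));
    }

    // Hive-style layout, e.g. "year=2020/month=05/day=20" -- the shape Spark
    // auto-discovers as partition columns.
    static String hiveStyle(LocalDate d) {
        return String.format("year=%04d/month=%02d/day=%02d",
                d.getYear(), d.getMonthValue(), d.getDayOfMonth());
    }
}
```

The backward-compatibility concern follows directly: existing tables already laid out in the slash-encoded style would not be readable under a hive-style default without a migration.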







[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1566: [HUDI-603]: DeltaStreamer can now fetch schema before every run in continuous mode

2020-05-20 Thread GitBox


bvaradar commented on a change in pull request #1566:
URL: https://github.com/apache/incubator-hudi/pull/1566#discussion_r428208406



##
File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/schema/SchemaRegistryProvider.java
##
@@ -81,11 +66,22 @@ private static Schema getSchema(String registryUrl) throws IOException {
 
   @Override
   public Schema getSourceSchema() {
-return schema;
+String registryUrl = config.getString(Config.SRC_SCHEMA_REGISTRY_URL_PROP);

Review comment:
   this is called in every run and if we detect schema change, we register 
and recreate write client.
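As a rough illustration of the behavior described (hypothetical names, not Hudi's actual classes -- the real implementation compares Avro schemas fetched from the registry):

```java
import java.util.Objects;

// Sketch: fetch the latest schema on every run; rebuild the write client
// only when the schema actually changed since the previous run.
class SchemaRefreshingWriter {
    private String currentSchema;   // schema the client was last built with
    private int rebuildCount = 0;   // stand-in for write-client recreation

    // Hypothetical hook called at the start of each DeltaStreamer run.
    void onRunStart(String latestSchema) {
        if (!Objects.equals(latestSchema, currentSchema)) {
            currentSchema = latestSchema;
            rebuildCount++;         // here the real code recreates the client
        }
    }

    int getRebuildCount() {
        return rebuildCount;
    }
}
```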









[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1566: [HUDI-603]: DeltaStreamer can now fetch schema before every run in continuous mode

2020-05-20 Thread GitBox


bvaradar commented on a change in pull request #1566:
URL: https://github.com/apache/incubator-hudi/pull/1566#discussion_r428210549



##
File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/schema/SchemaSet.java
##
@@ -0,0 +1,44 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities.schema;
+
+import java.io.Serializable;
+import java.util.HashSet;
+import org.apache.avro.Schema;
+import org.apache.avro.SchemaNormalization;
+
+import java.util.Set;
+
+/**
+ * Tracks already processed schemas.
+ */
+public class SchemaSet implements Serializable {
+

Review comment:
   I will add serialVersionUID. Usually, a serialVersionUID mismatch is a clue 
to an underlying problem, namely a package version mismatch. 








[jira] [Resolved] (HUDI-848) Turn on embedded timeline server by default for all writes

2020-05-20 Thread Balaji Varadarajan (Jira)


[ https://issues.apache.org/jira/browse/HUDI-848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Balaji Varadarajan resolved HUDI-848.
-
Resolution: Fixed

> Turn on embedded timeline server by default for all writes
> --
>
> Key: HUDI-848
> URL: https://issues.apache.org/jira/browse/HUDI-848
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Writer Core
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Major
> Fix For: 0.6.0, 0.5.3
>
>
> Includes RDD level, Spark DS and DeltaStreamer



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[incubator-hudi] branch hudi_test_suite_refactor updated (2773fe9 -> 7781692)

2020-05-20 Thread nagarwal
This is an automated email from the ASF dual-hosted git repository.

nagarwal pushed a change to branch hudi_test_suite_refactor
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git.


 discard 2773fe9  [HUDI-394] Provide a basic implementation of test suite
 add 7781692  [HUDI-394] Provide a basic implementation of test suite

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (2773fe9)
            \
             N -- N -- N   refs/heads/hudi_test_suite_refactor (7781692)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.

Any revisions marked "omit" are not gone; other references still
refer to them.  Any revisions marked "discard" are gone forever.

No new revisions were added by this update.

Summary of changes:
 .../java/org/apache/hudi/testsuite/job/TestHoodieTestSuiteJob.java | 4 ----
 1 file changed, 4 deletions(-)



[GitHub] [incubator-hudi] vinothchandar commented on pull request #1638: HUDI-515 Resolve API conflict for Hive 2 & Hive 3

2020-05-20 Thread GitBox


vinothchandar commented on pull request #1638:
URL: https://github.com/apache/incubator-hudi/pull/1638#issuecomment-631440488


   FWIW using reflection is probably the only way, if the API between Hive 2 
and 3 has broken (IIUC it has) 







[jira] [Updated] (HUDI-916) Add support for multiple date/time formats in TimestampBasedKeyGenerator

2020-05-20 Thread Pratyaksh Sharma (Jira)


[ https://issues.apache.org/jira/browse/HUDI-916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pratyaksh Sharma updated HUDI-916:
--
Status: Open  (was: New)

> Add support for multiple date/time formats in TimestampBasedKeyGenerator
> 
>
> Key: HUDI-916
> URL: https://issues.apache.org/jira/browse/HUDI-916
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: DeltaStreamer, newbie
>Reporter: Pratyaksh Sharma
>Assignee: Pratyaksh Sharma
>Priority: Major
> Fix For: 0.6.0
>
>
> Currently TimestampBasedKeyGenerator supports only one input date/time format 
> for creating custom partition paths using timestamp-based logic. We need to 
> support multiple input formats there. 
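An illustrative sketch of the requested behavior (not Hudi code; the class and method names here are made up): try each configured input pattern in order until one parses.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;

// Parse a date value against a list of configured formats, first match wins.
class MultiFormatDateParser {

    private final List<DateTimeFormatter> formatters = new ArrayList<>();

    MultiFormatDateParser(List<String> patterns) {
        for (String p : patterns) {
            formatters.add(DateTimeFormatter.ofPattern(p));
        }
    }

    LocalDate parse(String value) {
        for (DateTimeFormatter f : formatters) {
            try {
                return LocalDate.parse(value, f);
            } catch (DateTimeParseException ignored) {
                // fall through to the next configured format
            }
        }
        throw new IllegalArgumentException("No configured format matches: " + value);
    }
}
```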





[jira] [Created] (HUDI-916) Add support for multiple date/time formats in TimestampBasedKeyGenerator

2020-05-20 Thread Pratyaksh Sharma (Jira)
Pratyaksh Sharma created HUDI-916:
-

 Summary: Add support for multiple date/time formats in 
TimestampBasedKeyGenerator
 Key: HUDI-916
 URL: https://issues.apache.org/jira/browse/HUDI-916
 Project: Apache Hudi (incubating)
  Issue Type: Improvement
  Components: DeltaStreamer, newbie
Reporter: Pratyaksh Sharma
Assignee: Pratyaksh Sharma
 Fix For: 0.6.0


Currently TimestampBasedKeyGenerator supports only one input date/time format 
for creating custom partition paths using timestamp-based logic. We need to 
support multiple input formats there. 





[GitHub] [incubator-hudi] vinothchandar commented on pull request #1484: [HUDI-316] : Hbase qps repartition writestatus

2020-05-20 Thread GitBox


vinothchandar commented on pull request #1484:
URL: https://github.com/apache/incubator-hudi/pull/1484#issuecomment-631426244


   @v3nkatesh allow me to give some context around why we are strict about 
guava. Hudi, as you know, has to be dropped into many different services 
(hive/spark/presto, ...) and guava, as universal as it is, presents a jar 
conflict nightmare... 
   
   On re-using code itself, I know we have done this in a few places in the 
past, but this often leads to maintenance issues more often than not: we change 
the re-used code, people fix the original code, and over time we don't invest 
in getting upstream bug fixes. So for small stuff like this, I prefer that we 
write it ourselves. 
   
   Is this the original 
[code](https://github.com/google/guava/blob/master/guava/src/com/google/common/util/concurrent/RateLimiter.java)?
 I would say if we are trimming down that file and using some parts verbatim, 
we are still reusing code. 
   
   The act of adding new things to LICENSE/NOTICE is not that straightforward, 
given we don't have an entry for guava yet. We need to examine guava's NOTICE, 
its dependencies, etc. I thought, even for you, just writing a small 
class and being done would be a better use of time? 
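For context, a hand-rolled limiter along the lines suggested could be as small as this (illustrative sketch only -- not Guava's RateLimiter and not the code under review; it uses a coarse one-second window rather than Guava's smooth token refill):

```java
// Allow at most permitsPerSecond acquisitions per one-second window.
class SimpleRateLimiter {

    private final int permitsPerSecond;
    private long windowStartMs;
    private int usedInWindow;

    SimpleRateLimiter(int permitsPerSecond) {
        this.permitsPerSecond = permitsPerSecond;
        this.windowStartMs = System.currentTimeMillis();
    }

    // Returns true if a permit was available in the current window.
    synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        if (now - windowStartMs >= 1000) {
            // start a fresh window
            windowStartMs = now;
            usedInWindow = 0;
        }
        if (usedInWindow < permitsPerSecond) {
            usedInWindow++;
            return true;
        }
        return false;
    }
}
```

A class this size avoids both the guava dependency and the LICENSE/NOTICE bookkeeping discussed above.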
   
   
   
   
   







[GitHub] [incubator-hudi] vinothchandar commented on issue #1641: [SUPPORT] Failed to merge old record into new file for key xxx from old file 123.parquet to new file 456.parquet

2020-05-20 Thread GitBox


vinothchandar commented on issue #1641:
URL: https://github.com/apache/incubator-hudi/issues/1641#issuecomment-631438361


   Looks like a schema mismatch. Did you change a number to a string, e.g.? 







[GitHub] [incubator-hudi] vinothchandar commented on issue #1641: [SUPPORT] Failed to merge old record into new file for key xxx from old file 123.parquet to new file 456.parquet

2020-05-20 Thread GitBox


vinothchandar commented on issue #1641:
URL: https://github.com/apache/incubator-hudi/issues/1641#issuecomment-631438538


   cc @lamber-ken @leesf any of you , interested in helping here? :) 







[jira] [Commented] (HUDI-914) support different target data clusters

2020-05-20 Thread Vinoth Chandar (Jira)


[ https://issues.apache.org/jira/browse/HUDI-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112128#comment-17112128 ]

Vinoth Chandar commented on HUDI-914:
-

For my understanding, what's a specific scenario where you cannot run on the 
target cluster, but have to run Hudi writing off another cluster? 

> support different target data clusters
> --
>
> Key: HUDI-914
> URL: https://issues.apache.org/jira/browse/HUDI-914
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: DeltaStreamer
>Reporter: liujinhui
>Assignee: liujinhui
>Priority: Major
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Currently hudi-DeltaStreamer does not support writing to different target 
> clusters. The specific scenario is as follows: generally, Hudi tasks run on 
> an independent cluster. To write data to a target data cluster, you 
> generally rely on its core-site.xml and hdfs-site.xml; sometimes you need to 
> write data to a different target cluster, but the cluster running the Hudi 
> task does not have the core-site.xml and hdfs-site.xml of the target 
> cluster. Although data can be written by specifying the NameNode IP address 
> of the target cluster, this loses HDFS high availability, so I plan to use 
> the contents of the core-site.xml and hdfs-site.xml files of the target 
> cluster as configuration items and configure them in the 
> dfs-source.properties or kafka-source.properties file of Hudi.
> Is there a better way to solve this problem?
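One way the proposal could look in code (purely hypothetical sketch -- the property prefix and class below are made up for illustration, not an existing Hudi config): carry the target cluster's core-site/hdfs-site entries in the source properties file under a prefix, then strip the prefix before handing them to the Hadoop Configuration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Extract prefixed Hadoop settings from a DeltaStreamer properties file.
class TargetClusterConfig {

    // Assumed, illustrative key prefix -- not a real Hudi property.
    static final String PREFIX = "hoodie.deltastreamer.target.hadoop.";

    static Map<String, String> extract(Properties props) {
        Map<String, String> hadoopConf = new HashMap<>();
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith(PREFIX)) {
                // e.g. "...target.hadoop.dfs.nameservices" -> "dfs.nameservices"
                hadoopConf.put(key.substring(PREFIX.length()), props.getProperty(key));
            }
        }
        return hadoopConf;
    }
}
```

Applying the extracted map to a Hadoop Configuration would preserve nameservice-based (highly available) addressing without shipping the target cluster's XML files.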





[GitHub] [incubator-hudi] vinothchandar commented on pull request #1538: [HUDI-803]: added more test cases in TestHoodieAvroUtils.class

2020-05-20 Thread GitBox


vinothchandar commented on pull request #1538:
URL: https://github.com/apache/incubator-hudi/pull/1538#issuecomment-631445518


   @pratyakshsharma could you rebase again and re-push? Codecov seems to need 
that to work. 







[GitHub] [incubator-hudi] pratyakshsharma commented on pull request #1538: [HUDI-803]: added more test cases in TestHoodieAvroUtils.class

2020-05-20 Thread GitBox


pratyakshsharma commented on pull request #1538:
URL: https://github.com/apache/incubator-hudi/pull/1538#issuecomment-631472874


   > @pratyakshsharma by close, you mean final review and merge right? :)
   
   Yes :)
   
   Rebased and pushed again.







[GitHub] [incubator-hudi] codecov-commenter commented on pull request #1647: [HUDI-867]: fixed IllegalArgumentException from graphite metrics in deltaStreamer continuous mode

2020-05-20 Thread GitBox


codecov-commenter commented on pull request #1647:
URL: https://github.com/apache/incubator-hudi/pull/1647#issuecomment-631511240


   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1647?src=pr=h1) 
Report
   > Merging 
[#1647](https://codecov.io/gh/apache/incubator-hudi/pull/1647?src=pr=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/244d47494e2d4d5b3ca60e460e1feb9351fb8e69=desc)
 will **increase** coverage by `1.76%`.
   > The diff coverage is `0.00%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1647/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1647?src=pr=tree)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #1647      +/-   ##
   ============================================
   + Coverage     16.60%   18.37%   +1.76%     
   - Complexity      800      855      +55     
   ============================================
     Files           344      344              
     Lines         15172    15165       -7     
     Branches       1512     1512              
   ============================================
   + Hits           2520     2786     +266     
   + Misses        12320    12026     -294     
   - Partials        332      353      +21     
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-hudi/pull/1647?src=pr=tree) | 
Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...java/org/apache/hudi/config/HoodieWriteConfig.java](https://codecov.io/gh/apache/incubator-hudi/pull/1647/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29uZmlnL0hvb2RpZVdyaXRlQ29uZmlnLmphdmE=)
 | `43.91% <ø> (+1.65%)` | `48.00 <0.00> (ø)` | |
   | 
[...in/java/org/apache/hudi/metrics/HoodieMetrics.java](https://codecov.io/gh/apache/incubator-hudi/pull/1647/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9Ib29kaWVNZXRyaWNzLmphdmE=)
 | `18.86% <0.00%> (-0.37%)` | `6.00 <0.00> (ø)` | |
   | 
[...le/view/IncrementalTimelineSyncFileSystemView.java](https://codecov.io/gh/apache/incubator-hudi/pull/1647/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3ZpZXcvSW5jcmVtZW50YWxUaW1lbGluZVN5bmNGaWxlU3lzdGVtVmlldy5qYXZh)
 | `4.51% <0.00%> (+0.56%)` | `4.00% <0.00%> (+1.00%)` | |
   | 
[...common/table/view/AbstractTableFileSystemView.java](https://codecov.io/gh/apache/incubator-hudi/pull/1647/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3ZpZXcvQWJzdHJhY3RUYWJsZUZpbGVTeXN0ZW1WaWV3LmphdmE=)
 | `8.59% <0.00%> (+2.34%)` | `6.00% <0.00%> (+1.00%)` | |
   | 
[...i/common/table/timeline/HoodieDefaultTimeline.java](https://codecov.io/gh/apache/incubator-hudi/pull/1647/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL0hvb2RpZURlZmF1bHRUaW1lbGluZS5qYXZh)
 | `52.30% <0.00%> (+4.61%)` | `28.00% <0.00%> (+4.00%)` | |
   | 
[.../main/java/org/apache/hudi/common/util/Option.java](https://codecov.io/gh/apache/incubator-hudi/pull/1647/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvT3B0aW9uLmphdmE=)
 | `51.35% <0.00%> (+5.40%)` | `18.00% <0.00%> (+1.00%)` | |
   | 
[...udi/timeline/service/handlers/BaseFileHandler.java](https://codecov.io/gh/apache/incubator-hudi/pull/1647/diff?src=pr=tree#diff-aHVkaS10aW1lbGluZS1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL3RpbWVsaW5lL3NlcnZpY2UvaGFuZGxlcnMvQmFzZUZpbGVIYW5kbGVyLmphdmE=)
 | `11.11% <0.00%> (+11.11%)` | `1.00% <0.00%> (+1.00%)` | |
   | 
[...common/table/view/PriorityBasedFileSystemView.java](https://codecov.io/gh/apache/incubator-hudi/pull/1647/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3ZpZXcvUHJpb3JpdHlCYXNlZEZpbGVTeXN0ZW1WaWV3LmphdmE=)
 | `11.94% <0.00%> (+11.94%)` | `4.00% <0.00%> (+4.00%)` | |
   | 
[...i/common/table/view/HoodieTableFileSystemView.java](https://codecov.io/gh/apache/incubator-hudi/pull/1647/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3ZpZXcvSG9vZGllVGFibGVGaWxlU3lzdGVtVmlldy5qYXZh)
 | `37.93% <0.00%> (+13.79%)` | `9.00% <0.00%> (+2.00%)` | |
   | 
[...common/table/view/FileSystemViewStorageConfig.java](https://codecov.io/gh/apache/incubator-hudi/pull/1647/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3ZpZXcvRmlsZVN5c3RlbVZpZXdTdG9yYWdlQ29uZmlnLmphdmE=)
 | `63.49% <0.00%> (+14.28%)` | `8.00% <0.00%> (+3.00%)` | |
   | ... and [9 
more](https://codecov.io/gh/apache/incubator-hudi/pull/1647/diff?src=pr=tree-more)
 | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1647?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  

[GitHub] [incubator-hudi] codecov-commenter edited a comment on pull request #1538: [HUDI-803]: added more test cases in TestHoodieAvroUtils.class

2020-05-20 Thread GitBox


codecov-commenter edited a comment on pull request #1538:
URL: https://github.com/apache/incubator-hudi/pull/1538#issuecomment-631510907


   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1538?src=pr=h1) 
Report
   > Merging 
[#1538](https://codecov.io/gh/apache/incubator-hudi/pull/1538?src=pr=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/74ecc27e920c70fa4598d8e5a696954203a5b127=desc)
 will **decrease** coverage by `0.01%`.
   > The diff coverage is `50.00%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1538/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1538?src=pr=tree)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #1538      +/-   ##
   ============================================
   - Coverage     18.34%   18.33%   -0.02%     
   - Complexity      854      855       +1     
   ============================================
     Files           344      344              
     Lines         15172    15167       -5     
     Branches       1512     1512              
   ============================================
   - Hits           2784     2781       -3     
   + Misses        12035    12033       -2     
     Partials        353      353              
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-hudi/pull/1538?src=pr=tree) | 
Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...ain/java/org/apache/hudi/avro/HoodieAvroUtils.java](https://codecov.io/gh/apache/incubator-hudi/pull/1538/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvYXZyby9Ib29kaWVBdnJvVXRpbHMuamF2YQ==)
 | `48.09% <50.00%> (-1.91%)` | `22.00 <0.00> (ø)` | |
   | 
[...apache/hudi/common/fs/HoodieWrapperFileSystem.java](https://codecov.io/gh/apache/incubator-hudi/pull/1538/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL0hvb2RpZVdyYXBwZXJGaWxlU3lzdGVtLmphdmE=)
 | `22.69% <0.00%> (+0.70%)` | `29.00% <0.00%> (+1.00%)` | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1538?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1538?src=pr=footer).
 Last update 
[74ecc27...1373dee](https://codecov.io/gh/apache/incubator-hudi/pull/1538?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   







[GitHub] [incubator-hudi] hddong commented on a change in pull request #1558: [HUDI-796]: added deduping logic for upserts case

2020-05-20 Thread GitBox


hddong commented on a change in pull request #1558:
URL: https://github.com/apache/incubator-hudi/pull/1558#discussion_r428056297



##
File path: hudi-cli/src/main/java/org/apache/hudi/cli/commands/SparkMain.java
##
@@ -263,13 +265,26 @@ private static int compact(JavaSparkContext jsc, String 
basePath, String tableNa
   }
 
   private static int deduplicatePartitionPath(JavaSparkContext jsc, String 
duplicatedPartitionPath,
-  String repairedOutputPath, String basePath, String dryRun) {
+  String repairedOutputPath, String basePath, boolean dryRun, String 
dedupeType) {
 DedupeSparkJob job = new DedupeSparkJob(basePath, duplicatedPartitionPath, 
repairedOutputPath, new SQLContext(jsc),
-FSUtils.getFs(basePath, jsc.hadoopConfiguration()));
-job.fixDuplicates(Boolean.parseBoolean(dryRun));
+FSUtils.getFs(basePath, jsc.hadoopConfiguration()), 
getDedupeType(dedupeType));
+job.fixDuplicates(dryRun);
 return 0;
   }
 
+  private static Enumeration.Value getDedupeType(String type) {
+switch (type) {
+  case "insertType":
+return DeDupeType.insertType();
+  case "updateType":
+return DeDupeType.updateType();
+  case "upsertType":
+return DeDupeType.upsertType();
+  default:
+throw new IllegalArgumentException("Please provide valid dedupe 
type!");
+}
+  }
+

Review comment:
   Can use `DeDupeType.withName("insertType")` instead.
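For readers more familiar with Java, the suggestion is analogous to replacing the hand-written switch with a lookup by name (illustrative Java sketch; the actual code under review is a Scala Enumeration and would use `DeDupeType.withName(...)`, and the constant names here follow the uppercase convention suggested below):

```java
// Enum lookup by name replaces the manual switch over string literals.
enum DeDupeType { INSERT_TYPE, UPDATE_TYPE, UPSERT_TYPE }

class DedupeTypeResolver {
    static DeDupeType resolve(String type) {
        try {
            return DeDupeType.valueOf(type);
        } catch (IllegalArgumentException e) {
            // Same error behavior as the switch's default branch.
            throw new IllegalArgumentException("Please provide valid dedupe type!", e);
        }
    }
}
```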

##
File path: hudi-cli/src/main/scala/org/apache/hudi/cli/DeDupeType.scala
##
@@ -0,0 +1,28 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.cli
+
+object DeDupeType extends Enumeration {
+
+  type dedupeType = Value
+
+  val insertType = Value("insertType")
+  val updateType = Value("updateType")
+  val upsertType = Value("upsertType")

Review comment:
   Can we make it all uppercase to keep the format uniform
   
https://github.com/apache/incubator-hudi/blob/74ecc27e920c70fa4598d8e5a696954203a5b127/hudi-common/src/main/java/org/apache/hudi/common/model/WriteOperationType.java#L30-L34

##
File path: hudi-cli/src/main/java/org/apache/hudi/cli/commands/SparkMain.java
##
@@ -263,13 +265,26 @@ private static int compact(JavaSparkContext jsc, String 
basePath, String tableNa
   }
 
   private static int deduplicatePartitionPath(JavaSparkContext jsc, String 
duplicatedPartitionPath,
-  String repairedOutputPath, String basePath, String dryRun) {
+  String repairedOutputPath, String basePath, boolean dryRun, String 
dedupeType) {
 DedupeSparkJob job = new DedupeSparkJob(basePath, duplicatedPartitionPath, 
repairedOutputPath, new SQLContext(jsc),
-FSUtils.getFs(basePath, jsc.hadoopConfiguration()));
-job.fixDuplicates(Boolean.parseBoolean(dryRun));
+FSUtils.getFs(basePath, jsc.hadoopConfiguration()), 
getDedupeType(dedupeType));
+job.fixDuplicates(dryRun);
 return 0;
   }
 
+  private static Enumeration.Value getDedupeType(String type) {
+switch (type) {
+  case "insertType":
+return DeDupeType.insertType();
+  case "updateType":
+return DeDupeType.updateType();
+  case "upsertType":
+return DeDupeType.upsertType();
+  default:
+throw new IllegalArgumentException("Please provide valid dedupe 
type!");
+}
+  }
+

Review comment:
   Can use `DeDupeType.withName("insertType")` instead?

##
File path: 
hudi-cli/src/main/java/org/apache/hudi/cli/commands/RepairsCommand.java
##
@@ -77,7 +77,9 @@ public String deduplicate(
   help = "Spark executor memory") final String sparkMemory,
   @CliOption(key = {"dryrun"},
   help = "Should we actually remove duplicates or just run and store 
result to repairedOutputPath",
-  unspecifiedDefaultValue = "true") final boolean dryRun)
+  unspecifiedDefaultValue = "true") final boolean dryRun,
+  @CliOption(key = {"dedupeType"}, help = "Check DeDupeType.scala for 
valid values",
+  unspecifiedDefaultValue = "insertType") final String dedupeType)

Review comment:
   It's better to show the three types in help string and have a type check 
at first line of 

[GitHub] [incubator-hudi] codecov-commenter edited a comment on pull request #1433: [HUDI-728]: Implement custom key generator

2020-05-20 Thread GitBox


codecov-commenter edited a comment on pull request #1433:
URL: https://github.com/apache/incubator-hudi/pull/1433#issuecomment-631535136


   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1433?src=pr=h1) 
Report
   > Merging 
[#1433](https://codecov.io/gh/apache/incubator-hudi/pull/1433?src=pr=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/25a0080b2f6ddce0e528b2a72aea33a565f0e565=desc)
 will **increase** coverage by `1.55%`.
   > The diff coverage is `8.45%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1433/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1433?src=pr=tree)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #1433      +/-   ##
   ============================================
   + Coverage     16.71%   18.26%   +1.55%     
   - Complexity      795      854      +59     
   ============================================
     Files           340      347       +7     
     Lines         15030    15262     +232     
     Branches       1499     1525      +26     
   ============================================
   + Hits           2512     2788     +276     
   + Misses        12188    12122      -66     
   - Partials        330      352      +22     
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-hudi/pull/1433?src=pr=tree) | 
Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...e/hudi/exception/HoodieDeltaStreamerException.java](https://codecov.io/gh/apache/incubator-hudi/pull/1433/diff?src=pr=tree#diff-aHVkaS1zcGFyay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9leGNlcHRpb24vSG9vZGllRGVsdGFTdHJlYW1lckV4Y2VwdGlvbi5qYXZh)
 | `0.00% <ø> (ø)` | `0.00 <0.00> (?)` | |
   | 
[...va/org/apache/hudi/keygen/ComplexKeyGenerator.java](https://codecov.io/gh/apache/incubator-hudi/pull/1433/diff?src=pr=tree#diff-aHVkaS1zcGFyay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9rZXlnZW4vQ29tcGxleEtleUdlbmVyYXRvci5qYXZh)
 | `0.00% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[...ava/org/apache/hudi/keygen/CustomKeyGenerator.java](https://codecov.io/gh/apache/incubator-hudi/pull/1433/diff?src=pr=tree#diff-aHVkaS1zcGFyay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9rZXlnZW4vQ3VzdG9tS2V5R2VuZXJhdG9yLmphdmE=)
 | `0.00% <0.00%> (ø)` | `0.00 <0.00> (?)` | |
   | [...g/apache/hudi/keygen/GlobalDeleteKeyGenerator.java](https://codecov.io/gh/apache/incubator-hudi/pull/1433/diff?src=pr=tree#diff-aHVkaS1zcGFyay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9rZXlnZW4vR2xvYmFsRGVsZXRlS2V5R2VuZXJhdG9yLmphdmE=) | `0.00% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | [...apache/hudi/keygen/NonpartitionedKeyGenerator.java](https://codecov.io/gh/apache/incubator-hudi/pull/1433/diff?src=pr=tree#diff-aHVkaS1zcGFyay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9rZXlnZW4vTm9ucGFydGl0aW9uZWRLZXlHZW5lcmF0b3IuamF2YQ==) | `0.00% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | [...apache/hudi/keygen/TimestampBasedKeyGenerator.java](https://codecov.io/gh/apache/incubator-hudi/pull/1433/diff?src=pr=tree#diff-aHVkaS1zcGFyay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9rZXlnZW4vVGltZXN0YW1wQmFzZWRLZXlHZW5lcmF0b3IuamF2YQ==) | `0.00% <0.00%> (ø)` | `0.00 <0.00> (?)` | |
   | [...ava/org/apache/hudi/keygen/SimpleKeyGenerator.java](https://codecov.io/gh/apache/incubator-hudi/pull/1433/diff?src=pr=tree#diff-aHVkaS1zcGFyay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9rZXlnZW4vU2ltcGxlS2V5R2VuZXJhdG9yLmphdmE=) | `73.68% <75.00%> (+14.86%)` | `0.00 <0.00> (ø)` | |
   | [...in/scala/org/apache/hudi/AvroConversionUtils.scala](https://codecov.io/gh/apache/incubator-hudi/pull/1433/diff?src=pr=tree#diff-aHVkaS1zcGFyay9zcmMvbWFpbi9zY2FsYS9vcmcvYXBhY2hlL2h1ZGkvQXZyb0NvbnZlcnNpb25VdGlscy5zY2FsYQ==) | `45.45% <0.00%> (-4.55%)` | `0.00% <0.00%> (ø%)` | |
   | [...c/main/java/org/apache/hudi/index/HoodieIndex.java](https://codecov.io/gh/apache/incubator-hudi/pull/1433/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW5kZXgvSG9vZGllSW5kZXguamF2YQ==) | `36.84% <0.00%> (-4.34%)` | `3.00% <0.00%> (ø%)` | |
   | [...ain/java/org/apache/hudi/avro/HoodieAvroUtils.java](https://codecov.io/gh/apache/incubator-hudi/pull/1433/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvYXZyby9Ib29kaWVBdnJvVXRpbHMuamF2YQ==) | `50.00% <0.00%> (-3.97%)` | `22.00% <0.00%> (ø%)` | |
   | ... and [36 more](https://codecov.io/gh/apache/incubator-hudi/pull/1433/diff?src=pr=tree-more) | |

   --

   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1433?src=pr=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1433?src=pr=footer). Last update

[GitHub] [incubator-hudi] vinothchandar merged pull request #1640: [MINOR] Fix resource cleanup in TestTableSchemaEvolution

2020-05-20 Thread GitBox


vinothchandar merged pull request #1640:
URL: https://github.com/apache/incubator-hudi/pull/1640


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] codecov-commenter edited a comment on pull request #1647: [HUDI-867]: fixed IllegalArgumentException from graphite metrics in deltaStreamer continuous mode

2020-05-20 Thread GitBox


codecov-commenter edited a comment on pull request #1647:
URL: https://github.com/apache/incubator-hudi/pull/1647#issuecomment-631511240


   # [Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1647?src=pr=h1) Report
   > Merging [#1647](https://codecov.io/gh/apache/incubator-hudi/pull/1647?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-hudi/commit/244d47494e2d4d5b3ca60e460e1feb9351fb8e69=desc) will **increase** coverage by `1.76%`.
   > The diff coverage is `0.00%`.

   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-hudi/pull/1647/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1647?src=pr=tree)
   
   ```diff
   @@ Coverage Diff  @@
   ## master#1647  +/-   ##
   
   + Coverage 16.60%   18.37%   +1.76% 
   - Complexity  800  855  +55 
   
 Files   344  344  
 Lines 1517215165   -7 
 Branches   1512 1512  
   
   + Hits   2520 2786 +266 
   + Misses1232012026 -294 
   - Partials332  353  +21 
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-hudi/pull/1647?src=pr=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...java/org/apache/hudi/config/HoodieWriteConfig.java](https://codecov.io/gh/apache/incubator-hudi/pull/1647/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29uZmlnL0hvb2RpZVdyaXRlQ29uZmlnLmphdmE=) | `43.91% <ø> (+1.65%)` | `48.00 <0.00> (ø)` | |
   | [...in/java/org/apache/hudi/metrics/HoodieMetrics.java](https://codecov.io/gh/apache/incubator-hudi/pull/1647/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9Ib29kaWVNZXRyaWNzLmphdmE=) | `18.86% <0.00%> (-0.37%)` | `6.00 <0.00> (ø)` | |
   | [...le/view/IncrementalTimelineSyncFileSystemView.java](https://codecov.io/gh/apache/incubator-hudi/pull/1647/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3ZpZXcvSW5jcmVtZW50YWxUaW1lbGluZVN5bmNGaWxlU3lzdGVtVmlldy5qYXZh) | `4.51% <0.00%> (+0.56%)` | `4.00% <0.00%> (+1.00%)` | |
   | [...common/table/view/AbstractTableFileSystemView.java](https://codecov.io/gh/apache/incubator-hudi/pull/1647/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3ZpZXcvQWJzdHJhY3RUYWJsZUZpbGVTeXN0ZW1WaWV3LmphdmE=) | `8.59% <0.00%> (+2.34%)` | `6.00% <0.00%> (+1.00%)` | |
   | [...i/common/table/timeline/HoodieDefaultTimeline.java](https://codecov.io/gh/apache/incubator-hudi/pull/1647/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL0hvb2RpZURlZmF1bHRUaW1lbGluZS5qYXZh) | `52.30% <0.00%> (+4.61%)` | `28.00% <0.00%> (+4.00%)` | |
   | [.../main/java/org/apache/hudi/common/util/Option.java](https://codecov.io/gh/apache/incubator-hudi/pull/1647/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvT3B0aW9uLmphdmE=) | `51.35% <0.00%> (+5.40%)` | `18.00% <0.00%> (+1.00%)` | |
   | [...udi/timeline/service/handlers/BaseFileHandler.java](https://codecov.io/gh/apache/incubator-hudi/pull/1647/diff?src=pr=tree#diff-aHVkaS10aW1lbGluZS1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL3RpbWVsaW5lL3NlcnZpY2UvaGFuZGxlcnMvQmFzZUZpbGVIYW5kbGVyLmphdmE=) | `11.11% <0.00%> (+11.11%)` | `1.00% <0.00%> (+1.00%)` | |
   | [...common/table/view/PriorityBasedFileSystemView.java](https://codecov.io/gh/apache/incubator-hudi/pull/1647/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3ZpZXcvUHJpb3JpdHlCYXNlZEZpbGVTeXN0ZW1WaWV3LmphdmE=) | `11.94% <0.00%> (+11.94%)` | `4.00% <0.00%> (+4.00%)` | |
   | [...i/common/table/view/HoodieTableFileSystemView.java](https://codecov.io/gh/apache/incubator-hudi/pull/1647/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3ZpZXcvSG9vZGllVGFibGVGaWxlU3lzdGVtVmlldy5qYXZh) | `37.93% <0.00%> (+13.79%)` | `9.00% <0.00%> (+2.00%)` | |
   | [...common/table/view/FileSystemViewStorageConfig.java](https://codecov.io/gh/apache/incubator-hudi/pull/1647/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3ZpZXcvRmlsZVN5c3RlbVZpZXdTdG9yYWdlQ29uZmlnLmphdmE=) | `63.49% <0.00%> (+14.28%)` | `8.00% <0.00%> (+3.00%)` | |
   | ... and [9 more](https://codecov.io/gh/apache/incubator-hudi/pull/1647/diff?src=pr=tree-more) | |

   --

   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1647?src=pr=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ =

[GitHub] [incubator-hudi] codecov-commenter commented on pull request #1433: [HUDI-728]: Implement custom key generator

2020-05-20 Thread GitBox


codecov-commenter commented on pull request #1433:
URL: https://github.com/apache/incubator-hudi/pull/1433#issuecomment-631535136


   # [Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1433?src=pr=h1) Report
   > Merging [#1433](https://codecov.io/gh/apache/incubator-hudi/pull/1433?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-hudi/commit/25a0080b2f6ddce0e528b2a72aea33a565f0e565=desc) will **increase** coverage by `1.55%`.
   > The diff coverage is `8.45%`.

   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-hudi/pull/1433/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1433?src=pr=tree)
   
   ```diff
   @@ Coverage Diff  @@
   ## master#1433  +/-   ##
   
   + Coverage 16.71%   18.26%   +1.55% 
   - Complexity  795  854  +59 
   
 Files   340  347   +7 
 Lines 1503015262 +232 
 Branches   1499 1525  +26 
   
   + Hits   2512 2788 +276 
   + Misses1218812122  -66 
   - Partials330  352  +22 
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-hudi/pull/1433?src=pr=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...e/hudi/exception/HoodieDeltaStreamerException.java](https://codecov.io/gh/apache/incubator-hudi/pull/1433/diff?src=pr=tree#diff-aHVkaS1zcGFyay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9leGNlcHRpb24vSG9vZGllRGVsdGFTdHJlYW1lckV4Y2VwdGlvbi5qYXZh) | `0.00% <ø> (ø)` | `0.00 <0.00> (?)` | |
   | [...va/org/apache/hudi/keygen/ComplexKeyGenerator.java](https://codecov.io/gh/apache/incubator-hudi/pull/1433/diff?src=pr=tree#diff-aHVkaS1zcGFyay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9rZXlnZW4vQ29tcGxleEtleUdlbmVyYXRvci5qYXZh) | `0.00% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | [...ava/org/apache/hudi/keygen/CustomKeyGenerator.java](https://codecov.io/gh/apache/incubator-hudi/pull/1433/diff?src=pr=tree#diff-aHVkaS1zcGFyay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9rZXlnZW4vQ3VzdG9tS2V5R2VuZXJhdG9yLmphdmE=) | `0.00% <0.00%> (ø)` | `0.00 <0.00> (?)` | |
   | [...g/apache/hudi/keygen/GlobalDeleteKeyGenerator.java](https://codecov.io/gh/apache/incubator-hudi/pull/1433/diff?src=pr=tree#diff-aHVkaS1zcGFyay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9rZXlnZW4vR2xvYmFsRGVsZXRlS2V5R2VuZXJhdG9yLmphdmE=) | `0.00% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | [...apache/hudi/keygen/NonpartitionedKeyGenerator.java](https://codecov.io/gh/apache/incubator-hudi/pull/1433/diff?src=pr=tree#diff-aHVkaS1zcGFyay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9rZXlnZW4vTm9ucGFydGl0aW9uZWRLZXlHZW5lcmF0b3IuamF2YQ==) | `0.00% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | [...apache/hudi/keygen/TimestampBasedKeyGenerator.java](https://codecov.io/gh/apache/incubator-hudi/pull/1433/diff?src=pr=tree#diff-aHVkaS1zcGFyay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9rZXlnZW4vVGltZXN0YW1wQmFzZWRLZXlHZW5lcmF0b3IuamF2YQ==) | `0.00% <0.00%> (ø)` | `0.00 <0.00> (?)` | |
   | [...ava/org/apache/hudi/keygen/SimpleKeyGenerator.java](https://codecov.io/gh/apache/incubator-hudi/pull/1433/diff?src=pr=tree#diff-aHVkaS1zcGFyay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9rZXlnZW4vU2ltcGxlS2V5R2VuZXJhdG9yLmphdmE=) | `73.68% <75.00%> (+14.86%)` | `0.00 <0.00> (ø)` | |
   | [...in/scala/org/apache/hudi/AvroConversionUtils.scala](https://codecov.io/gh/apache/incubator-hudi/pull/1433/diff?src=pr=tree#diff-aHVkaS1zcGFyay9zcmMvbWFpbi9zY2FsYS9vcmcvYXBhY2hlL2h1ZGkvQXZyb0NvbnZlcnNpb25VdGlscy5zY2FsYQ==) | `45.45% <0.00%> (-4.55%)` | `0.00% <0.00%> (ø%)` | |
   | [...c/main/java/org/apache/hudi/index/HoodieIndex.java](https://codecov.io/gh/apache/incubator-hudi/pull/1433/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW5kZXgvSG9vZGllSW5kZXguamF2YQ==) | `36.84% <0.00%> (-4.34%)` | `3.00% <0.00%> (ø%)` | |
   | [...ain/java/org/apache/hudi/avro/HoodieAvroUtils.java](https://codecov.io/gh/apache/incubator-hudi/pull/1433/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvYXZyby9Ib29kaWVBdnJvVXRpbHMuamF2YQ==) | `50.00% <0.00%> (-3.97%)` | `22.00% <0.00%> (ø%)` | |
   | ... and [36 more](https://codecov.io/gh/apache/incubator-hudi/pull/1433/diff?src=pr=tree-more) | |

   --

   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1433?src=pr=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1433?src=pr=footer). Last update

[GitHub] [incubator-hudi] vinothchandar edited a comment on pull request #1484: [HUDI-316] : Hbase qps repartition writestatus

2020-05-20 Thread GitBox


vinothchandar edited a comment on pull request #1484:
URL: https://github.com/apache/incubator-hudi/pull/1484#issuecomment-631426244


   @v3nkatesh allow me to give some context around why we are strict about 
guava.. Hudi, as you know, has to be dropped into many different services 
(hive/spark/presto, ...) and guava, as universal as it is, presents a jar 
conflict nightmare... 
   
   On re-using code itself, I know we have done this in a few places in the 
past, but it leads to maintenance issues more often than not... we change the 
re-used copy, others fix the original code, and over time we don't invest in 
getting upstream bug fixes etc.. So for small stuff like this, I prefer that we 
write it ourselves.. 
   
   Is this the original 
[code](https://github.com/google/guava/blob/master/guava/src/com/google/common/util/concurrent/RateLimiter.java)?
 I would say if we are trimming down that file and using some parts verbatim, 
we are still reusing code.. (which I think is what we are doing)
   
   The act of adding new things to LICENSE/NOTICE is not that straightforward, 
given we don't have an entry for guava yet.. We need to examine guava's 
NOTICE, its dependencies, etc.. I thought, even for you, just writing a small 
class and being done would be a better use of time? 
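   For context, the "small class" suggested here in place of Guava's RateLimiter could look roughly like the sketch below — an illustrative example of the core pacing idea, not the code that was actually contributed to Hudi:

```java
import java.util.concurrent.TimeUnit;

public class SimpleRateLimiter {
    private final long intervalNanos; // spacing between consecutive permits
    private long nextFreeNanos;       // earliest time the next permit may be handed out

    public SimpleRateLimiter(double permitsPerSecond) {
        this.intervalNanos = (long) (TimeUnit.SECONDS.toNanos(1) / permitsPerSecond);
        this.nextFreeNanos = System.nanoTime();
    }

    // Blocks until the next permit is available; returns the nanoseconds waited.
    public synchronized long acquire() {
        long now = System.nanoTime();
        long wait = Math.max(0L, nextFreeNanos - now);
        // Schedule the next permit one interval after the later of "now" and the
        // previously scheduled slot, which smooths out bursts.
        nextFreeNanos = Math.max(now, nextFreeNanos) + intervalNanos;
        if (wait > 0) {
            try {
                TimeUnit.NANOSECONDS.sleep(wait);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
            }
        }
        return wait;
    }

    public static void main(String[] args) {
        SimpleRateLimiter limiter = new SimpleRateLimiter(1000.0); // ~1000 permits/sec
        long totalWaitNanos = 0;
        for (int i = 0; i < 5; i++) {
            totalWaitNanos += limiter.acquire();
        }
        System.out.println(totalWaitNanos >= 0);
    }
}
```

   Guava's real RateLimiter additionally supports warm-up periods and multi-permit acquisition; the point of the sketch is only that the basic pacing logic fits in a few lines without a new dependency or LICENSE/NOTICE work.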
   
   
   
   
   







[jira] [Commented] (HUDI-890) Prepare for 0.5.3 patch release

2020-05-20 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112120#comment-17112120
 ] 

Vinoth Chandar commented on HUDI-890:
-

[~vbalaji] can make the call on HUDI-846 (I am okay with turning them on, but 
Balaji, it's worth reviewing the ported patch once for the incremental cleaning 
issue we fixed, and confirming) .. 

On HUDI-889, we can untag it for 0.5.3 

> Prepare for 0.5.3 patch release
> ---
>
> Key: HUDI-890
> URL: https://issues.apache.org/jira/browse/HUDI-890
> Project: Apache Hudi (incubating)
>  Issue Type: Task
>Reporter: Bhavani Sudha
>Assignee: Bhavani Sudha
>Priority: Major
> Fix For: 0.5.3
>
>
> The following commits are included in this release.
>  * #1372 HUDI-652 Decouple HoodieReadClient and AbstractHoodieClient to break 
> the inheritance chain
>  * #1388 HUDI-681 Remove embeddedTimelineService from HoodieReadClient
>  * #1350 HUDI-629: Replace Guava's Hashing with an equivalent in 
> NumericUtils.java
>  * #1505 [HUDI - 738] Add validation to DeltaStreamer to fail fast when 
> filterDupes is enabled on UPSERT mode.
>  * #1517 HUDI-799 Use appropriate FS when loading configs
>  * #1406 HUDI-713 Fix conversion of Spark array of struct type to Avro schema
>  * #1394 HUDI-656[Performance] Return a dummy Spark relation after writing 
> the DataFrame
>  * #1576 HUDI-850 Avoid unnecessary listings in incremental cleaning mode
>  * #1421 HUDI-724 Parallelize getSmallFiles for partitions
>  * #1330 HUDI-607 Fix to allow creation/syncing of Hive tables partitioned by 
> Date type columns
>  * #1413 Add constructor to HoodieROTablePathFilter
>  * #1415 HUDI-539 Make ROPathFilter conf member serializable
>  * #1578 Add changes for presto mor queries
>  * #1506 HUDI-782 Add support of Aliyun object storage service.
>  * #1432 HUDI-716 Exception: Not an Avro data file when running 
> HoodieCleanClient.runClean
>  * #1422 HUDI-400 Check upgrade from old plan to new plan for compaction
>  * #1448 [MINOR] Update DOAP with 0.5.2 Release
>  * #1466 HUDI-742 Fix Java Math Exception
>  * #1416 HUDI-717 Fixed usage of HiveDriver for DDL statements.
>  * #1427 HUDI-727: Copy default values of fields if not present when 
> rewriting incoming record with new schema
>  * #1515 HUDI-795 Handle auto-deleted empty aux folder
>  * #1547 [MINOR]: Fix cli docs for DeltaStreamer
>  * #1580 HUDI-852 adding check for table name for Append Save mode
>  * #1537 [MINOR] fixed building IndexFileFilter with a wrong condition in 
> HoodieGlobalBloomIndex class
>  * #1434 HUDI-616 Fixed parquet files getting created on local FS
>  * #1633 HUDI-858 Allow multiple operations to be executed within a single 
> commit
>  * #1634 HUDI-846 Enable Incremental cleaning and embedded timeline-server by 
> default
>  * #1596 HUDI-863 get decimal properties from derived spark DataType
>  * #1636 HUDI-895 Remove unnecessary listing .hoodie folder when using 
> timeline server
>  * #1584 HUDI-902 Avoid exception when getSchemaProvider
>  * #1612 HUDI-528 Handle empty commit in incremental pulling
>  * #1511 HUDI-789 Adjust logic of upsert in HDFSParquetImporter
>  * #1627 HUDI-889 Writer supports useJdbc configuration when hive 
> synchronization is enabled



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] vinothchandar commented on pull request #1643: [HUDI-110] Spark Datasource Auto Partition Extractor

2020-05-20 Thread GitBox


vinothchandar commented on pull request #1643:
URL: https://github.com/apache/incubator-hudi/pull/1643#issuecomment-631431498


   IIRC we already support generating the partition path in the 
`/partitionKey=partitionValue` folder structure.. Not sure what this PR is 
adding







[jira] [Commented] (HUDI-914) support different target data clusters

2020-05-20 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112126#comment-17112126
 ] 

Vinoth Chandar commented on HUDI-914:
-

>Although specifying the namenode IP address of the target cluster can be 
>written, this loses HDFS high availability 
I think you are referring to the fact that the other configs for the HA 
NameNode won't be picked up, for example? 

I think having a way to explicitly pick up configuration for the target cluster 
in delta streamer and the data source (IIUC you will just be augmenting the 
sparkContext with these additional configurations) is a good addition.. 

> support different target data clusters
> --
>
> Key: HUDI-914
> URL: https://issues.apache.org/jira/browse/HUDI-914
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: DeltaStreamer
>Reporter: liujinhui
>Assignee: liujinhui
>Priority: Major
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Currently hudi-DeltaStreamer does not support writing to different target 
> clusters. The specific scenario is as follows: generally, Hudi tasks run on 
> an independent cluster, and writing data to the target data cluster relies 
> on core-site.xml and hdfs-site.xml. Sometimes you need to write to a 
> different target data cluster, but the cluster running the Hudi task does 
> not have the core-site.xml and hdfs-site.xml of the target cluster. Although 
> the write can work by specifying the namenode IP address of the target 
> cluster, this loses HDFS high availability, so I plan to use the contents of 
> the core-site.xml and hdfs-site.xml files of the target cluster as 
> configuration items and configure them in the dfs-source.properties or 
> kafka-source.properties file of Hudi.
> Is there a better way to solve this problem?





[incubator-hudi] branch master updated: [HUDI-846][HUDI-848] Enable Incremental cleaning and embedded timeline-server by default (#1634)

2020-05-20 Thread vinoth
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 74ecc27  [HUDI-846][HUDI-848] Enable Incremental cleaning and embedded 
timeline-server by default (#1634)
74ecc27 is described below

commit 74ecc27e920c70fa4598d8e5a696954203a5b127
Author: Balaji Varadarajan 
AuthorDate: Wed May 20 05:29:43 2020 -0700

[HUDI-846][HUDI-848] Enable Incremental cleaning and embedded 
timeline-server by default (#1634)
---
 .../apache/hudi/config/HoodieCompactionConfig.java |  2 +-
 .../org/apache/hudi/config/HoodieWriteConfig.java  |  2 +-
 .../table/action/compact/TestHoodieCompactor.java  |  9 -
 hudi-hive-sync/pom.xml |  6 ---
 .../hudi/hive/testutils/HiveTestService.java   |  1 +
 hudi-spark/pom.xml | 44 +-
 hudi-utilities/pom.xml |  7 +---
 pom.xml|  9 +
 8 files changed, 56 insertions(+), 24 deletions(-)

diff --git 
a/hudi-client/src/main/java/org/apache/hudi/config/HoodieCompactionConfig.java 
b/hudi-client/src/main/java/org/apache/hudi/config/HoodieCompactionConfig.java
index bb087a2..d135a81 100644
--- 
a/hudi-client/src/main/java/org/apache/hudi/config/HoodieCompactionConfig.java
+++ 
b/hudi-client/src/main/java/org/apache/hudi/config/HoodieCompactionConfig.java
@@ -96,7 +96,7 @@ public class HoodieCompactionConfig extends 
DefaultHoodieConfig {
   private static final String DEFAULT_CLEANER_POLICY = 
HoodieCleaningPolicy.KEEP_LATEST_COMMITS.name();
   private static final String DEFAULT_AUTO_CLEAN = "true";
   private static final String DEFAULT_INLINE_COMPACT = "false";
-  private static final String DEFAULT_INCREMENTAL_CLEANER = "false";
+  private static final String DEFAULT_INCREMENTAL_CLEANER = "true";
   private static final String DEFAULT_INLINE_COMPACT_NUM_DELTA_COMMITS = "1";
   private static final String DEFAULT_CLEANER_FILE_VERSIONS_RETAINED = "3";
   private static final String DEFAULT_CLEANER_COMMITS_RETAINED = "10";
diff --git 
a/hudi-client/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java 
b/hudi-client/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
index 11931c1..3f0f619 100644
--- a/hudi-client/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
+++ b/hudi-client/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
@@ -82,7 +82,7 @@ public class HoodieWriteConfig extends DefaultHoodieConfig {
   private static final String DEFAULT_FINALIZE_WRITE_PARALLELISM = 
DEFAULT_PARALLELISM;
 
   private static final String EMBEDDED_TIMELINE_SERVER_ENABLED = 
"hoodie.embed.timeline.server";
-  private static final String DEFAULT_EMBEDDED_TIMELINE_SERVER_ENABLED = 
"false";
+  private static final String DEFAULT_EMBEDDED_TIMELINE_SERVER_ENABLED = 
"true";
 
   private static final String FAIL_ON_TIMELINE_ARCHIVING_ENABLED_PROP = 
"hoodie.fail.on.timeline.archiving";
   private static final String DEFAULT_FAIL_ON_TIMELINE_ARCHIVING_ENABLED = 
"true";
diff --git 
a/hudi-client/src/test/java/org/apache/hudi/table/action/compact/TestHoodieCompactor.java
 
b/hudi-client/src/test/java/org/apache/hudi/table/action/compact/TestHoodieCompactor.java
index 0ebebed..9aec8ad 100644
--- 
a/hudi-client/src/test/java/org/apache/hudi/table/action/compact/TestHoodieCompactor.java
+++ 
b/hudi-client/src/test/java/org/apache/hudi/table/action/compact/TestHoodieCompactor.java
@@ -30,6 +30,9 @@ import org.apache.hudi.common.model.HoodieTableType;
 import org.apache.hudi.common.model.HoodieTestUtils;
 import org.apache.hudi.common.table.HoodieTableMetaClient;
 import org.apache.hudi.common.table.timeline.HoodieActiveTimeline;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieInstant.State;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
 import org.apache.hudi.common.util.Option;
 import org.apache.hudi.config.HoodieCompactionConfig;
 import org.apache.hudi.config.HoodieIndexConfig;
@@ -152,9 +155,13 @@ public class TestHoodieCompactor extends 
HoodieClientTestHarness {
   HoodieIndex index = new HoodieBloomIndex<>(config);
   updatedRecords = index.tagLocation(updatedRecordsRDD, jsc, 
table).collect();
 
-  // Write them to corresponding avro logfiles
+  // Write them to corresponding avro logfiles. Also, set the state 
transition properly.
   HoodieTestUtils.writeRecordsToLogFiles(fs, metaClient.getBasePath(),
   HoodieTestDataGenerator.AVRO_SCHEMA_WITH_METADATA_FIELDS, 
updatedRecords);
+  metaClient.getActiveTimeline().transitionRequestedToInflight(new 
HoodieInstant(State.REQUESTED,
+  HoodieTimeline.DELTA_COMMIT_ACTION, newCommitTime), Option.empty());
+  writeClient.commit(newCommitTime, 
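The diff above (truncated in the archive) flips `hoodie.embed.timeline.server` to `true` by default, along with the incremental cleaner default. A minimal sketch of how a job could opt back out, assuming properties-based configuration — only the key string is taken verbatim from `HoodieWriteConfig.java` in the diff; the surrounding usage is illustrative, not Hudi's builder API:

```java
import java.util.Properties;

public class TimelineServerOptOut {
    public static void main(String[] args) {
        Properties writerProps = new Properties();
        // Setting this key to "false" restores the pre-change default for jobs
        // that cannot run the embedded timeline server.
        writerProps.setProperty("hoodie.embed.timeline.server", "false");
        System.out.println(writerProps.getProperty("hoodie.embed.timeline.server")); // prints "false"
    }
}
```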

[jira] [Commented] (HUDI-890) Prepare for 0.5.3 patch release

2020-05-20 Thread sivabalan narayanan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112136#comment-17112136
 ] 

sivabalan narayanan commented on HUDI-890:
--

sure. Will wait to hear from [~vbalaji]






[jira] [Comment Edited] (HUDI-890) Prepare for 0.5.3 patch release

2020-05-20 Thread sivabalan narayanan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112136#comment-17112136
 ] 

sivabalan narayanan edited comment on HUDI-890 at 5/20/20, 12:30 PM:
-

sure. Will wait to hear from [~vbalaji]. Sorry the other one was already 
merged. 


was (Author: shivnarayan):
sure. Will wait to hear from [~vbalaji]






[GitHub] [incubator-hudi] pratyakshsharma opened a new pull request #1648: [HUDI-916]: added support for multiple input formats in TimestampBasedKeyGenerator

2020-05-20 Thread GitBox


pratyakshsharma opened a new pull request #1648:
URL: https://github.com/apache/incubator-hudi/pull/1648


   …dKeyGenerator
   
   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.







[jira] [Updated] (HUDI-916) Add support for multiple date/time formats in TimestampBasedKeyGenerator

2020-05-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-916:

Labels: pull-request-available  (was: )

> Add support for multiple date/time formats in TimestampBasedKeyGenerator
> 
>
> Key: HUDI-916
> URL: https://issues.apache.org/jira/browse/HUDI-916
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: DeltaStreamer, newbie
>Reporter: Pratyaksh Sharma
>Assignee: Pratyaksh Sharma
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>
> Currently TimestampBasedKeyGenerator supports only one input date/time format
> for creating custom partition paths using timestamp-based logic. We need to
> support multiple input formats there.
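
The requested behavior can be sketched as follows. This is a minimal, hypothetical illustration (class and method names are assumptions, not Hudi's actual API): each configured format is tried in order until one successfully parses the input value.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

// Hypothetical sketch (not Hudi's actual API): try each configured
// input format in order until one successfully parses the value.
public class MultiFormatTimestampParser {

  private final String[] patterns;

  public MultiFormatTimestampParser(String... patterns) {
    this.patterns = patterns;
  }

  public Date parse(String value) {
    for (String pattern : patterns) {
      try {
        return new SimpleDateFormat(pattern).parse(value);
      } catch (ParseException ignored) {
        // Fall through and try the next candidate format.
      }
    }
    throw new IllegalArgumentException("Unparseable timestamp: " + value);
  }

  public static void main(String[] args) {
    MultiFormatTimestampParser parser =
        new MultiFormatTimestampParser("yyyy-MM-dd'T'HH:mm:ss", "yyyyMMdd");
    // Both inputs parse, each matching a different configured pattern.
    System.out.println(parser.parse("2020-05-20T05:07:30"));
    System.out.println(parser.parse("20200520"));
  }
}
```

A real implementation would likely also need per-format time zones and locales, and should reuse (or thread-confine) the parsers, since SimpleDateFormat is not thread-safe.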





[GitHub] [incubator-hudi] pratyakshsharma commented on pull request #1648: [HUDI-916]: added support for multiple input formats in TimestampBasedKeyGenerator

2020-05-20 Thread GitBox


pratyakshsharma commented on pull request #1648:
URL: https://github.com/apache/incubator-hudi/pull/1648#issuecomment-631488125


   @nsivabalan Raised a separate PR for 
https://github.com/apache/incubator-hudi/pull/1597. Please take a look. 







[jira] [Updated] (HUDI-889) Writer supports useJdbc configuration when hive synchronization is enabled

2020-05-20 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-889:

Fix Version/s: (was: 0.5.3)
   0.6.0

> Writer supports useJdbc configuration when hive synchronization is enabled
> --
>
> Key: HUDI-889
> URL: https://issues.apache.org/jira/browse/HUDI-889
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Writer Core
>Reporter: dzcxzl
>Priority: Trivial
> Fix For: 0.6.0
>
>
> hudi-hive-sync supports the useJdbc = false configuration, but the writer 
> does not provide this configuration at this stage





[GitHub] [incubator-hudi] vinothchandar merged pull request #1634: [HUDI-846][HUDI-848] Enable Incremental cleaning and embedded timeline-server by default

2020-05-20 Thread GitBox


vinothchandar merged pull request #1634:
URL: https://github.com/apache/incubator-hudi/pull/1634


   







[jira] [Updated] (HUDI-889) Writer supports useJdbc configuration when hive synchronization is enabled

2020-05-20 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-889:

Fix Version/s: 0.5.3

> Writer supports useJdbc configuration when hive synchronization is enabled
> --
>
> Key: HUDI-889
> URL: https://issues.apache.org/jira/browse/HUDI-889
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Writer Core
>Reporter: dzcxzl
>Priority: Trivial
> Fix For: 0.6.0, 0.5.3
>
>
> hudi-hive-sync supports the useJdbc = false configuration, but the writer 
> does not provide this configuration at this stage





[GitHub] [incubator-hudi] pratyakshsharma commented on pull request #1538: [HUDI-803]: added more test cases in TestHoodieAvroUtils.class

2020-05-20 Thread GitBox


pratyakshsharma commented on pull request #1538:
URL: https://github.com/apache/incubator-hudi/pull/1538#issuecomment-631453314


   > @pratyakshsharma could you rebase again and repush .. codecov seems to 
need that to work..
   
   Ack. 







[jira] [Resolved] (HUDI-889) Writer supports useJdbc configuration when hive synchronization is enabled

2020-05-20 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan resolved HUDI-889.
--
Fix Version/s: (was: 0.6.0)
   0.5.3
   Resolution: Fixed

https://github.com/apache/incubator-hudi/commit/32bada29dc95f1d5910713ae6b4f4a4ef39677c9

> Writer supports useJdbc configuration when hive synchronization is enabled
> --
>
> Key: HUDI-889
> URL: https://issues.apache.org/jira/browse/HUDI-889
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Writer Core
>Reporter: dzcxzl
>Priority: Trivial
> Fix For: 0.5.3
>
>
> hudi-hive-sync supports the useJdbc = false configuration, but the writer 
> does not provide this configuration at this stage





[GitHub] [incubator-hudi] codecov-commenter commented on pull request #1538: [HUDI-803]: added more test cases in TestHoodieAvroUtils.class

2020-05-20 Thread GitBox


codecov-commenter commented on pull request #1538:
URL: https://github.com/apache/incubator-hudi/pull/1538#issuecomment-631510907


   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1538?src=pr=h1) 
Report
   > Merging 
[#1538](https://codecov.io/gh/apache/incubator-hudi/pull/1538?src=pr=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/74ecc27e920c70fa4598d8e5a696954203a5b127=desc)
 will **decrease** coverage by `0.01%`.
   > The diff coverage is `50.00%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1538/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1538?src=pr=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #1538      +/-   ##
   ============================================
   - Coverage     18.34%   18.33%   -0.02%
   - Complexity      854      855       +1
   ============================================
     Files           344      344
     Lines         15172    15167       -5
     Branches       1512     1512
   ============================================
   - Hits           2784     2781       -3
   + Misses        12035    12033       -2
     Partials        353      353
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-hudi/pull/1538?src=pr=tree) | 
Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...ain/java/org/apache/hudi/avro/HoodieAvroUtils.java](https://codecov.io/gh/apache/incubator-hudi/pull/1538/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvYXZyby9Ib29kaWVBdnJvVXRpbHMuamF2YQ==)
 | `48.09% <50.00%> (-1.91%)` | `22.00 <0.00> (ø)` | |
   | 
[...apache/hudi/common/fs/HoodieWrapperFileSystem.java](https://codecov.io/gh/apache/incubator-hudi/pull/1538/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL0hvb2RpZVdyYXBwZXJGaWxlU3lzdGVtLmphdmE=)
 | `22.69% <0.00%> (+0.70%)` | `29.00% <0.00%> (+1.00%)` | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1538?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1538?src=pr=footer).
 Last update 
[74ecc27...1373dee](https://codecov.io/gh/apache/incubator-hudi/pull/1538?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   







[incubator-hudi] branch hudi_test_suite_refactor updated (681cce9 -> 6472886)

2020-05-20 Thread nagarwal
This is an automated email from the ASF dual-hosted git repository.

nagarwal pushed a change to branch hudi_test_suite_refactor
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git.


 discard 681cce9  [HUDI-394] Provide a basic implementation of test suite
 add 6472886  [HUDI-394] Provide a basic implementation of test suite

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (681cce9)
\
 N -- N -- N   refs/heads/hudi_test_suite_refactor (6472886)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.

Any revisions marked "omit" are not gone; other references still
refer to them.  Any revisions marked "discard" are gone forever.

No new revisions were added by this update.

Summary of changes:
 .../hudi/testsuite/writer/DeltaInputWriter.java|  2 +-
 .../hudi/testsuite/job/TestHoodieTestSuiteJob.java | 68 --
 2 files changed, 39 insertions(+), 31 deletions(-)



[GitHub] [incubator-hudi] xushiyan commented on a change in pull request #1640: [MINOR] Fix resource cleanup in TestTableSchemaEvolution

2020-05-20 Thread GitBox


xushiyan commented on a change in pull request #1640:
URL: https://github.com/apache/incubator-hudi/pull/1640#discussion_r428110614



##
File path: pom.xml
##
@@ -245,7 +245,8 @@
 ${maven-surefire-plugin.version}
 
   ${skipUTs}
-  -Xms256m -Xmx2g
+  -Xmx2g

Review comment:
   @vinothchandar noted. I'll make sure message is self-explanatory onwards.









[GitHub] [incubator-hudi] sathyaprakashg commented on issue #143: Tracking ticket for folks to be added to slack group

2020-05-20 Thread GitBox


sathyaprakashg commented on issue #143:
URL: https://github.com/apache/incubator-hudi/issues/143#issuecomment-631715287


   please add sathyapraka...@zillowgroup.com







[incubator-hudi] branch hudi_test_suite_refactor updated (51048f6 -> 894ab75)

2020-05-20 Thread nagarwal
This is an automated email from the ASF dual-hosted git repository.

nagarwal pushed a change to branch hudi_test_suite_refactor
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git.


 discard 51048f6  [HUDI-394] Provide a basic implementation of test suite
 add 894ab75  [HUDI-394] Provide a basic implementation of test suite

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (51048f6)
\
 N -- N -- N   refs/heads/hudi_test_suite_refactor (894ab75)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.

Any revisions marked "omit" are not gone; other references still
refer to them.  Any revisions marked "discard" are gone forever.

No new revisions were added by this update.

Summary of changes:
 .../hudi/testsuite/job/TestHoodieTestSuiteJob.java|  1 -
 .../src/test/resources/test-suite/complex-source.avsc | 19 ++-
 .../src/test/resources/test-suite/source.avsc | 19 ++-
 .../src/test/resources/test-suite/target.avsc | 19 ++-
 .../test/resources/test-suite/test-source.properties  | 17 +
 .../delta-streamer-config/complex-source.avsc | 19 ++-
 6 files changed, 89 insertions(+), 5 deletions(-)



[GitHub] [incubator-hudi] garyli1019 commented on pull request #1643: [HUDI-110] Spark Datasource Auto Partition Extractor

2020-05-20 Thread GitBox


garyli1019 commented on pull request #1643:
URL: https://github.com/apache/incubator-hudi/pull/1643#issuecomment-631709558


   Thanks @bvaradar, so this is more on the writer side. I will take a closer look.







[GitHub] [incubator-hudi] garyli1019 closed pull request #1643: [HUDI-110] Spark Datasource Auto Partition Extractor

2020-05-20 Thread GitBox


garyli1019 closed pull request #1643:
URL: https://github.com/apache/incubator-hudi/pull/1643


   







[GitHub] [incubator-hudi] codecov-commenter edited a comment on pull request #1644: [HUDI-811] Restructure test packages in hudi-common

2020-05-20 Thread GitBox


codecov-commenter edited a comment on pull request #1644:
URL: https://github.com/apache/incubator-hudi/pull/1644#issuecomment-631673778


   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1644?src=pr=h1) 
Report
   > Merging 
[#1644](https://codecov.io/gh/apache/incubator-hudi/pull/1644?src=pr=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/6a0aa9a645d11ed7b50e18aa0563dafcd9d145f7=desc)
 will **not change** coverage.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1644/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1644?src=pr=tree)
   
   ```diff
   @@            Coverage Diff            @@
   ##           master     #1644   +/-   ##
   =========================================
     Coverage     18.33%    18.33%
     Complexity      855       855
   =========================================
     Files           344       344
     Lines         15167     15167
     Branches       1512      1512
   =========================================
     Hits           2781      2781
     Misses        12033     12033
     Partials        353       353
   ```
   
   
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1644?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1644?src=pr=footer).
 Last update 
[6a0aa9a...af47cf0](https://codecov.io/gh/apache/incubator-hudi/pull/1644?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   







[GitHub] [incubator-hudi] codecov-commenter commented on pull request #1644: [HUDI-811] Restructure test packages in hudi-common

2020-05-20 Thread GitBox


codecov-commenter commented on pull request #1644:
URL: https://github.com/apache/incubator-hudi/pull/1644#issuecomment-631673778


   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1644?src=pr=h1) 
Report
   > Merging 
[#1644](https://codecov.io/gh/apache/incubator-hudi/pull/1644?src=pr=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/6a0aa9a645d11ed7b50e18aa0563dafcd9d145f7=desc)
 will **not change** coverage.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1644/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1644?src=pr=tree)
   
   ```diff
   @@            Coverage Diff            @@
   ##           master     #1644   +/-   ##
   =========================================
     Coverage     18.33%    18.33%
     Complexity      855       855
   =========================================
     Files           344       344
     Lines         15167     15167
     Branches       1512      1512
   =========================================
     Hits           2781      2781
     Misses        12033     12033
     Partials        353       353
   ```
   
   
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1644?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1644?src=pr=footer).
 Last update 
[6a0aa9a...af47cf0](https://codecov.io/gh/apache/incubator-hudi/pull/1644?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   







[GitHub] [incubator-hudi] codecov-commenter edited a comment on pull request #1644: [HUDI-811] Restructure test packages in hudi-common

2020-05-20 Thread GitBox


codecov-commenter edited a comment on pull request #1644:
URL: https://github.com/apache/incubator-hudi/pull/1644#issuecomment-631673778


   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1644?src=pr=h1) 
Report
   > Merging 
[#1644](https://codecov.io/gh/apache/incubator-hudi/pull/1644?src=pr=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/6a0aa9a645d11ed7b50e18aa0563dafcd9d145f7=desc)
 will **decrease** coverage by `0.02%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1644/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1644?src=pr=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #1644      +/-   ##
   ============================================
   - Coverage     18.33%   18.30%   -0.03%
   + Complexity      855      854       -1
   ============================================
     Files           344      344
     Lines         15167    15167
     Branches       1512     1512
   ============================================
   - Hits           2781     2777       -4
   - Misses        12033    12036       +3
   - Partials        353      354       +1
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-hudi/pull/1644?src=pr=tree) | 
Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...che/hudi/common/table/log/HoodieLogFileReader.java](https://codecov.io/gh/apache/incubator-hudi/pull/1644/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL2xvZy9Ib29kaWVMb2dGaWxlUmVhZGVyLmphdmE=)
 | `0.00% <ø> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[...che/hudi/common/table/timeline/TimelineLayout.java](https://codecov.io/gh/apache/incubator-hudi/pull/1644/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL1RpbWVsaW5lTGF5b3V0LmphdmE=)
 | `78.57% <0.00%> (-14.29%)` | `3.00% <0.00%> (ø%)` | |
   | 
[...apache/hudi/common/fs/HoodieWrapperFileSystem.java](https://codecov.io/gh/apache/incubator-hudi/pull/1644/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL0hvb2RpZVdyYXBwZXJGaWxlU3lzdGVtLmphdmE=)
 | `21.98% <0.00%> (-0.71%)` | `28.00% <0.00%> (-1.00%)` | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1644?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1644?src=pr=footer).
 Last update 
[6a0aa9a...ed4ca32](https://codecov.io/gh/apache/incubator-hudi/pull/1644?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   







[GitHub] [incubator-hudi] codecov-commenter commented on pull request #1650: [HUDI-541]: replaced dataFile/df with baseFile/bf throughout code base

2020-05-20 Thread GitBox


codecov-commenter commented on pull request #1650:
URL: https://github.com/apache/incubator-hudi/pull/1650#issuecomment-631782718


   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1650?src=pr=h1) 
Report
   > Merging 
[#1650](https://codecov.io/gh/apache/incubator-hudi/pull/1650?src=pr=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/6a0aa9a645d11ed7b50e18aa0563dafcd9d145f7=desc)
 will **not change** coverage.
   > The diff coverage is `18.66%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1650/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1650?src=pr=tree)
   
   ```diff
   @@            Coverage Diff            @@
   ##           master     #1650   +/-   ##
   =========================================
     Coverage     18.33%    18.33%
     Complexity      855       855
   =========================================
     Files           344       344
     Lines         15167     15167
     Branches       1512      1512
   =========================================
     Hits           2781      2781
     Misses        12033     12033
     Partials        353       353
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-hudi/pull/1650?src=pr=tree) | 
Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[.../org/apache/hudi/client/CompactionAdminClient.java](https://codecov.io/gh/apache/incubator-hudi/pull/1650/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpZW50L0NvbXBhY3Rpb25BZG1pbkNsaWVudC5qYXZh)
 | `0.00% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[.../java/org/apache/hudi/client/HoodieReadClient.java](https://codecov.io/gh/apache/incubator-hudi/pull/1650/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpZW50L0hvb2RpZVJlYWRDbGllbnQuamF2YQ==)
 | `0.00% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[...java/org/apache/hudi/io/HoodieKeyLookupHandle.java](https://codecov.io/gh/apache/incubator-hudi/pull/1650/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW8vSG9vZGllS2V5TG9va3VwSGFuZGxlLmphdmE=)
 | `0.00% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[...ain/java/org/apache/hudi/io/HoodieMergeHandle.java](https://codecov.io/gh/apache/incubator-hudi/pull/1650/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW8vSG9vZGllTWVyZ2VIYW5kbGUuamF2YQ==)
 | `0.00% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[...java/org/apache/hudi/io/HoodieRangeInfoHandle.java](https://codecov.io/gh/apache/incubator-hudi/pull/1650/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW8vSG9vZGllUmFuZ2VJbmZvSGFuZGxlLmphdmE=)
 | `0.00% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[...main/java/org/apache/hudi/io/HoodieReadHandle.java](https://codecov.io/gh/apache/incubator-hudi/pull/1650/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW8vSG9vZGllUmVhZEhhbmRsZS5qYXZh)
 | `0.00% <ø> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[.../org/apache/hudi/table/HoodieCopyOnWriteTable.java](https://codecov.io/gh/apache/incubator-hudi/pull/1650/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdGFibGUvSG9vZGllQ29weU9uV3JpdGVUYWJsZS5qYXZh)
 | `5.74% <0.00%> (ø)` | `4.00 <0.00> (ø)` | |
   | 
[...g/apache/hudi/table/action/clean/CleanPlanner.java](https://codecov.io/gh/apache/incubator-hudi/pull/1650/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdGFibGUvYWN0aW9uL2NsZWFuL0NsZWFuUGxhbm5lci5qYXZh)
 | `12.40% <0.00%> (ø)` | `5.00 <0.00> (ø)` | |
   | 
[...hudi/table/action/commit/CommitActionExecutor.java](https://codecov.io/gh/apache/incubator-hudi/pull/1650/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdGFibGUvYWN0aW9uL2NvbW1pdC9Db21taXRBY3Rpb25FeGVjdXRvci5qYXZh)
 | `11.11% <0.00%> (ø)` | `4.00 <0.00> (ø)` | |
   | 
[...ction/compact/HoodieMergeOnReadTableCompactor.java](https://codecov.io/gh/apache/incubator-hudi/pull/1650/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdGFibGUvYWN0aW9uL2NvbXBhY3QvSG9vZGllTWVyZ2VPblJlYWRUYWJsZUNvbXBhY3Rvci5qYXZh)
 | `0.00% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | ... and [24 
more](https://codecov.io/gh/apache/incubator-hudi/pull/1650/diff?src=pr=tree-more)
 | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1650?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1650?src=pr=footer).
 Last update 
[6a0aa9a...f7f45b0](https://codecov.io/gh/apache/incubator-hudi/pull/1650?src=pr=lastupdated).
 Read the [comment 

[GitHub] [incubator-hudi] pratyakshsharma commented on pull request #1562: [HUDI-837]: implemented custom deserializer for AvroKafkaSource

2020-05-20 Thread GitBox


pratyakshsharma commented on pull request #1562:
URL: https://github.com/apache/incubator-hudi/pull/1562#issuecomment-631763979


   @n3nash got a chance to look at this? :)







[GitHub] [incubator-hudi] codecov-commenter edited a comment on pull request #1650: [HUDI-541]: replaced dataFile/df with baseFile/bf throughout code base

2020-05-20 Thread GitBox


codecov-commenter edited a comment on pull request #1650:
URL: https://github.com/apache/incubator-hudi/pull/1650#issuecomment-631782718


   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1650?src=pr=h1) 
Report
   > Merging 
[#1650](https://codecov.io/gh/apache/incubator-hudi/pull/1650?src=pr=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/6a0aa9a645d11ed7b50e18aa0563dafcd9d145f7=desc)
 will **not change** coverage.
   > The diff coverage is `18.66%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1650/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1650?src=pr=tree)
   
   ```diff
   @@            Coverage Diff            @@
   ##           master     #1650   +/-   ##
   =========================================
     Coverage     18.33%    18.33%
     Complexity      855       855
   =========================================
     Files           344       344
     Lines         15167     15167
     Branches       1512      1512
   =========================================
     Hits           2781      2781
     Misses        12033     12033
     Partials        353       353
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-hudi/pull/1650?src=pr=tree) | 
Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[.../org/apache/hudi/client/CompactionAdminClient.java](https://codecov.io/gh/apache/incubator-hudi/pull/1650/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpZW50L0NvbXBhY3Rpb25BZG1pbkNsaWVudC5qYXZh)
 | `0.00% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[.../java/org/apache/hudi/client/HoodieReadClient.java](https://codecov.io/gh/apache/incubator-hudi/pull/1650/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpZW50L0hvb2RpZVJlYWRDbGllbnQuamF2YQ==)
 | `0.00% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[...java/org/apache/hudi/io/HoodieKeyLookupHandle.java](https://codecov.io/gh/apache/incubator-hudi/pull/1650/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW8vSG9vZGllS2V5TG9va3VwSGFuZGxlLmphdmE=)
 | `0.00% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[...ain/java/org/apache/hudi/io/HoodieMergeHandle.java](https://codecov.io/gh/apache/incubator-hudi/pull/1650/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW8vSG9vZGllTWVyZ2VIYW5kbGUuamF2YQ==)
 | `0.00% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[...java/org/apache/hudi/io/HoodieRangeInfoHandle.java](https://codecov.io/gh/apache/incubator-hudi/pull/1650/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW8vSG9vZGllUmFuZ2VJbmZvSGFuZGxlLmphdmE=)
 | `0.00% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[...main/java/org/apache/hudi/io/HoodieReadHandle.java](https://codecov.io/gh/apache/incubator-hudi/pull/1650/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW8vSG9vZGllUmVhZEhhbmRsZS5qYXZh)
 | `0.00% <ø> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[.../org/apache/hudi/table/HoodieCopyOnWriteTable.java](https://codecov.io/gh/apache/incubator-hudi/pull/1650/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdGFibGUvSG9vZGllQ29weU9uV3JpdGVUYWJsZS5qYXZh)
 | `5.74% <0.00%> (ø)` | `4.00 <0.00> (ø)` | |
   | 
[...g/apache/hudi/table/action/clean/CleanPlanner.java](https://codecov.io/gh/apache/incubator-hudi/pull/1650/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdGFibGUvYWN0aW9uL2NsZWFuL0NsZWFuUGxhbm5lci5qYXZh)
 | `12.40% <0.00%> (ø)` | `5.00 <0.00> (ø)` | |
   | 
[...hudi/table/action/commit/CommitActionExecutor.java](https://codecov.io/gh/apache/incubator-hudi/pull/1650/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdGFibGUvYWN0aW9uL2NvbW1pdC9Db21taXRBY3Rpb25FeGVjdXRvci5qYXZh)
 | `11.11% <0.00%> (ø)` | `4.00 <0.00> (ø)` | |
   | 
[...ction/compact/HoodieMergeOnReadTableCompactor.java](https://codecov.io/gh/apache/incubator-hudi/pull/1650/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdGFibGUvYWN0aW9uL2NvbXBhY3QvSG9vZGllTWVyZ2VPblJlYWRUYWJsZUNvbXBhY3Rvci5qYXZh)
 | `0.00% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | ... and [24 
more](https://codecov.io/gh/apache/incubator-hudi/pull/1650/diff?src=pr=tree-more)
 | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1650?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1650?src=pr=footer).
 Last update 
[6a0aa9a...f7f45b0](https://codecov.io/gh/apache/incubator-hudi/pull/1650?src=pr=lastupdated).
 Read the [comment 

[GitHub] [incubator-hudi] HariprasadAllaka1612 commented on issue #1641: [SUPPORT] Failed to merge old record into new file for key xxx from old file 123.parquet to new file 456.parquet

2020-05-20 Thread GitBox


HariprasadAllaka1612 commented on issue #1641:
URL: https://github.com/apache/incubator-hudi/issues/1641#issuecomment-631674561


   We can close this issue. The problem was that the Parquet file and the Hive
   table synced to it had two different schemas. It is fixed by forcing the
   Parquet schema to always match the Hive metastore.
   
   Thank you.







[GitHub] [incubator-hudi] codecov-commenter edited a comment on pull request #1644: [HUDI-811] Restructure test packages in hudi-common

2020-05-20 Thread GitBox


codecov-commenter edited a comment on pull request #1644:
URL: https://github.com/apache/incubator-hudi/pull/1644#issuecomment-631673778


   # [Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1644?src=pr=h1) Report
   > Merging 
[#1644](https://codecov.io/gh/apache/incubator-hudi/pull/1644?src=pr=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/6a0aa9a645d11ed7b50e18aa0563dafcd9d145f7=desc)
 will **decrease** coverage by `0.01%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1644/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1644?src=pr=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #1644      +/-   ##
   ============================================
   - Coverage     18.33%   18.32%   -0.02%
   + Complexity      855      854       -1
   ============================================
     Files           344      344
     Lines         15167    15167
     Branches       1512     1512
   ============================================
   - Hits           2781     2779       -2
   - Misses        12033    12035       +2
     Partials        353      353
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-hudi/pull/1644?src=pr=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...apache/hudi/common/fs/HoodieWrapperFileSystem.java](https://codecov.io/gh/apache/incubator-hudi/pull/1644/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL0hvb2RpZVdyYXBwZXJGaWxlU3lzdGVtLmphdmE=) | `21.98% <0.00%> (-0.71%)` | `28.00% <0.00%> (-1.00%)` | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1644?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1644?src=pr=footer).
 Last update 
[6a0aa9a...2915023](https://codecov.io/gh/apache/incubator-hudi/pull/1644?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[incubator-hudi] branch hudi_test_suite_refactor updated (7781692 -> 51048f6)

2020-05-20 Thread nagarwal
This is an automated email from the ASF dual-hosted git repository.

nagarwal pushed a change to branch hudi_test_suite_refactor
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git.


 discard 7781692  [HUDI-394] Provide a basic implementation of test suite
 add 3c9da2e  [HUDI-895] Remove unnecessary listing .hoodie folder when 
using timeline server (#1636)
 add 29edf4b  [HUDI-407] Adding Simple Index to Hoodie. (#1402)
 add 57132f7  [HUDI-705] Add unit test for RollbacksCommand (#1611)
 add 459356e  [HUDI-863] get decimal properties from derived spark DataType 
(#1596)
 add 2600d2d  [MINOR] Fix apache-rat violations (#1639)
 add e6f3bf1  [HUDI-858] Allow multiple operations to be executed within a 
single commit (#1633)
 add 161a798  [HUDI-706] Add unit test for SavepointsCommand (#1624)
 add 0dc2fa6  [MINOR] Fix HoodieCompactor config abbreviation (#1642)
 add 244d474  [HUDI-888] fix NullPointerException in HoodieCompactor (#1622)
 add f802d44  [MINOR] Fix resource cleanup in TestTableSchemaEvolution 
(#1640)
 add 74ecc27  [HUDI-846][HUDI-848] Enable Incremental cleaning and embedded 
timeline-server by default (#1634)
 add 6a0aa9a  [HUDI-803] Replaced used of NullNode with 
JsonProperties.NULL_VALUE in HoodieAvroUtils (#1538)
 add 51048f6  [HUDI-394] Provide a basic implementation of test suite

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (7781692)
            \
             N -- N -- N   refs/heads/hudi_test_suite_refactor (51048f6)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.

Any revisions marked "omit" are not gone; other references still
refer to them.  Any revisions marked "discard" are gone forever.

No new revisions were added by this update.

Summary of changes:
 .../apache/hudi/cli/HoodieTableHeaderFields.java   |  15 +
 .../apache/hudi/cli/commands/RollbacksCommand.java |  19 +-
 .../hudi/cli/commands/SavepointsCommand.java   |  76 +--
 .../org/apache/hudi/cli/commands/SparkMain.java|  50 +-
 .../hudi/cli/commands/TestRollbacksCommand.java| 182 
 .../hudi/cli/commands/TestSavepointsCommand.java   | 110 +
 .../hudi/cli/integ/ITTestSavepointsCommand.java| 157 +++
 .../org/apache/hudi/client/HoodieWriteClient.java  |   6 +-
 .../apache/hudi/client/utils/SparkConfigUtils.java |   4 +
 .../apache/hudi/config/HoodieCompactionConfig.java |   2 +-
 .../org/apache/hudi/config/HoodieIndexConfig.java  |  46 ++
 .../org/apache/hudi/config/HoodieWriteConfig.java  |  44 +-
 .../java/org/apache/hudi/index/HoodieIndex.java|  15 +-
 .../org/apache/hudi/index/HoodieIndexUtils.java|  90 
 .../apache/hudi/index/bloom/HoodieBloomIndex.java  |  35 +-
 .../hudi/index/bloom/HoodieGlobalBloomIndex.java   |   7 +-
 .../hudi/index/simple/HoodieGlobalSimpleIndex.java | 169 +++
 .../hudi/index/simple/HoodieSimpleIndex.java   | 181 
 .../hudi/io/HoodieKeyLocationFetchHandle.java  |  57 +++
 .../java/org/apache/hudi/table/HoodieTable.java|   6 +-
 .../action/commit/BaseCommitActionExecutor.java|   3 +-
 .../hudi/table/action/commit/BulkInsertHelper.java |   3 +-
 .../TestHoodieClientOnCopyOnWriteStorage.java  |  38 ++
 .../hudi/client/TestTableSchemaEvolution.java  |   6 +-
 .../org/apache/hudi/index/TestHoodieIndex.java | 510 +++--
 .../hudi/index/bloom/TestHoodieBloomIndex.java |   1 -
 .../hudi/io/TestHoodieKeyLocationFetchHandle.java  | 210 +
 .../table/action/compact/TestHoodieCompactor.java  |   9 +-
 .../java/org/apache/hudi/avro/HoodieAvroUtils.java |  35 +-
 .../table/timeline/HoodieActiveTimeline.java   |  20 +-
 .../common/table/view/FileSystemViewManager.java   |  59 ++-
 .../hudi/common/util/ObjectSizeCalculator.java |  32 +-
 .../org/apache/hudi/common/util/ParquetUtils.java  |  52 ++-
 .../org/apache/hudi/avro/TestHoodieAvroUtils.java  |  93 +++-
 .../apache/hudi/common/util/TestParquetUtils.java  |  35 +-
 hudi-hive-sync/pom.xml |   6 -
 .../hudi/hive/testutils/HiveTestService.java   |   1 +
 hudi-integ-test/pom.xml|   4 +
 hudi-spark/pom.xml |  44 +-
 .../org/apache/hudi/AvroConversionHelper.scala |  22 +-
 .../org/apache/hudi/AvroConversionUtils.scala  |   4 +-
 hudi-test-suite/pom.xml|   7 +-
 hudi-utilities/pom.xml |  11 +-
 .../org/apache/hudi/utilities/HoodieCompactor.java |  27 +-
 .../exception/HoodieSnapshotExporterException.java |  18 +
 

[jira] [Updated] (HUDI-541) Replace variables/comments named "data files" to "base file"

2020-05-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-541:

Labels: pull-request-available  (was: )

> Replace variables/comments named "data files" to "base file"
> 
>
> Key: HUDI-541
> URL: https://issues.apache.org/jira/browse/HUDI-541
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Code Cleanup, newbie
>Reporter: Vinoth Chandar
>Assignee: Pratyaksh Sharma
>Priority: Major
>  Labels: pull-request-available
>
> Per cWiki design and arch page, we should converge on the same terminology.. 
> We have _HoodieBaseFile_.. we should ensure all variables of this type are 
> named _baseFile_ or _bf_ , as opposed to _dataFile_ or _df_. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] pratyakshsharma opened a new pull request #1650: [HUDI-541]: replaced dataFile/df with baseFile/bf throughout code base

2020-05-20 Thread GitBox


pratyakshsharma opened a new pull request #1650:
URL: https://github.com/apache/incubator-hudi/pull/1650


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] codecov-commenter edited a comment on pull request #1644: [HUDI-811] Restructure test packages in hudi-common

2020-05-20 Thread GitBox


codecov-commenter edited a comment on pull request #1644:
URL: https://github.com/apache/incubator-hudi/pull/1644#issuecomment-631673778


   # [Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1644?src=pr=h1) Report
   > Merging 
[#1644](https://codecov.io/gh/apache/incubator-hudi/pull/1644?src=pr=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/6a0aa9a645d11ed7b50e18aa0563dafcd9d145f7=desc)
 will **decrease** coverage by `0.02%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1644/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1644?src=pr=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #1644      +/-   ##
   ============================================
   - Coverage     18.33%   18.30%   -0.03%
   + Complexity      855      854       -1
   ============================================
     Files           344      344
     Lines         15167    15167
     Branches       1512     1512
   ============================================
   - Hits           2781     2777       -4
   - Misses        12033    12036       +3
   - Partials        353      354       +1
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-hudi/pull/1644?src=pr=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...che/hudi/common/table/timeline/TimelineLayout.java](https://codecov.io/gh/apache/incubator-hudi/pull/1644/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL1RpbWVsaW5lTGF5b3V0LmphdmE=) | `78.57% <0.00%> (-14.29%)` | `3.00% <0.00%> (ø%)` | |
   | [...apache/hudi/common/fs/HoodieWrapperFileSystem.java](https://codecov.io/gh/apache/incubator-hudi/pull/1644/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL0hvb2RpZVdyYXBwZXJGaWxlU3lzdGVtLmphdmE=) | `21.98% <0.00%> (-0.71%)` | `28.00% <0.00%> (-1.00%)` | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1644?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1644?src=pr=footer).
 Last update 
[6a0aa9a...2915023](https://codecov.io/gh/apache/incubator-hudi/pull/1644?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] xushiyan commented on pull request #1644: [HUDI-811] Restructure test packages in hudi-common

2020-05-20 Thread GitBox


xushiyan commented on pull request #1644:
URL: https://github.com/apache/incubator-hudi/pull/1644#issuecomment-631820696


   @yanghua The PR is ready for review. Thanks.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] xushiyan edited a comment on pull request #1644: [HUDI-811] Restructure test packages in hudi-common

2020-05-20 Thread GitBox


xushiyan edited a comment on pull request #1644:
URL: https://github.com/apache/incubator-hudi/pull/1644#issuecomment-631577878


   
   ![Screen Shot 2020-05-20 at 7 24 30 
PM](https://user-images.githubusercontent.com/2701446/82516434-906a1300-9acf-11ea-8d40-03d21d4ccaf2.png)
   
   
   Test classes under `functional` and `testutils`
   - `TestHoodieLogFormat` and `TestHoodieLogFormatAppendFailure` involve a 
minicluster and hence go to `functional`
   - Tried to move `org.apache.hudi.common.util.TestDFSPropertiesConfiguration` 
to `functional` but ran into errors from `TestParquetUtils` which implies 
inter-class side-effects. The investigation is beyond the scope of this PR, 
hence leaving it there.
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[incubator-hudi] branch asf-site updated: Travis CI build asf-site

2020-05-20 Thread vinoth
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new d0c3b9f  Travis CI build asf-site
d0c3b9f is described below

commit d0c3b9fb1095eaae1cda48bbef0de32defdb04b8
Author: CI 
AuthorDate: Thu May 21 04:46:35 2020 +

Travis CI build asf-site
---
 content/cn/docs/0.5.2-querying_data.html | 4 ++--
 content/cn/docs/querying_data.html   | 4 ++--
 content/cn/docs/quick-start-guide.html   | 2 ++
 content/docs/0.5.2-querying_data.html| 4 ++--
 content/docs/querying_data.html  | 4 ++--
 content/docs/quick-start-guide.html  | 6 --
 6 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/content/cn/docs/0.5.2-querying_data.html 
b/content/cn/docs/0.5.2-querying_data.html
index eeaf7a1..4ed98db 100644
--- a/content/cn/docs/0.5.2-querying_data.html
+++ b/content/cn/docs/0.5.2-querying_data.html
@@ -360,7 +360,7 @@
 
   
   Presto
-  Impala(此功能还未正式发布)
+  Impala (3.4 or later)
 
   读优化表
 
@@ -677,7 +677,7 @@ Upsert实用程序(HoodieDeltaStreamer
 Presto是一种常用的查询引擎,可提供交互式查询性能。 Hudi RO表可以在Presto中无缝查询。
 这需要在整个安装过程中将hudi-presto-bundle 
jar放入presto_install/plugin/hive-hadoop2/中。
 
-Impala(此功能还未正式发布)
+Impala (3.4 or later)
 
 读优化表
 
diff --git a/content/cn/docs/querying_data.html 
b/content/cn/docs/querying_data.html
index e33d18b..002c34c 100644
--- a/content/cn/docs/querying_data.html
+++ b/content/cn/docs/querying_data.html
@@ -360,7 +360,7 @@
 
   
   Presto
-  Impala(此功能还未正式发布)
+  Impala (3.4 or later)
 
   读优化表
 
@@ -677,7 +677,7 @@ Upsert实用程序(HoodieDeltaStreamer
 Presto是一种常用的查询引擎,可提供交互式查询性能。 Hudi RO表可以在Presto中无缝查询。
 这需要在整个安装过程中将hudi-presto-bundle 
jar放入presto_install/plugin/hive-hadoop2/中。
 
-Impala(此功能还未正式发布)
+Impala (3.4 or later)
 
 读优化表
 
diff --git a/content/cn/docs/quick-start-guide.html 
b/content/cn/docs/quick-start-guide.html
index 984639a..adc2bfc 100644
--- a/content/cn/docs/quick-start-guide.html
+++ b/content/cn/docs/quick-start-guide.html
@@ -410,6 +410,8 @@
 read.
 format("org.apache.hudi").
 load(basePath + "/*/*/*/*")
+//load(basePath) 如果使用 "/partitionKey=partitionValue" 
文件夹命名格式,Spark将自动识别分区信息
+
 roViewDF.registerTempTable("hudi_ro_table")
spark.sql("select fare, begin_lon, begin_lat, ts from hudi_ro_table where fare > 20.0").show()
 spark.sql("select 
_hoodie_commit_time, _hoodie_record_key, _hoodie_partition_path, rider, driver, 
fare from  hudi_ro_table").show()
diff --git a/content/docs/0.5.2-querying_data.html 
b/content/docs/0.5.2-querying_data.html
index 859bd0c..4315d03 100644
--- a/content/docs/0.5.2-querying_data.html
+++ b/content/docs/0.5.2-querying_data.html
@@ -357,7 +357,7 @@
 
   
   Presto
-  Impala (Not Officially 
Released)
+  Impala (3.4 or later)
 
   Snapshot Query
 
@@ -672,7 +672,7 @@ Please refer to confi
 Presto is a popular query engine, providing interactive query performance. 
Presto currently supports snapshot queries on COPY_ON_WRITE and read optimized 
queries 
 on MERGE_ON_READ Hudi tables. This requires the hudi-presto-bundle jar to be placed into presto_install/plugin/hive-hadoop2/, 
across the installation.
 
-Impala (Not Officially Released)
+Impala (3.4 or later)
 
 Snapshot Query
 
diff --git a/content/docs/querying_data.html b/content/docs/querying_data.html
index e8dbe1a..2cfa722 100644
--- a/content/docs/querying_data.html
+++ b/content/docs/querying_data.html
@@ -357,7 +357,7 @@
 
   
   Presto
-  Impala (Not Officially 
Released)
+  Impala (3.4 or later)
 
   Snapshot Query
 
@@ -672,7 +672,7 @@ Please refer to configurati
 Presto is a popular query engine, providing interactive query performance. 
Presto currently supports snapshot queries on COPY_ON_WRITE and read optimized 
queries 
 on MERGE_ON_READ Hudi tables. This requires the hudi-presto-bundle jar to be placed into presto_install/plugin/hive-hadoop2/, 
across the installation.
 
-Impala (Not Officially Released)
+Impala (3.4 or later)
 
 Snapshot Query
 
diff --git a/content/docs/quick-start-guide.html 
b/content/docs/quick-start-guide.html
index fa00061..76f0967 100644
--- a/content/docs/quick-start-guide.html
+++ b/content/docs/quick-start-guide.html
@@ -446,7 +446,8 @@ Here we are using the default write operation : 
   read.
   format("hudi").
   load(basePath + "/*/*/*/*")
-tripsSnapshotDF.createOrReplaceTempView("hudi_trips_snapshot")
+//load(basePath) use "/partitionKey=partitionValue" folder 
structure for Spark auto partition discovery
+tripsSnapshotDF.createOrReplaceTempView("hudi_trips_snapshot")
 
spark.sql("select fare, begin_lon, begin_lat, ts from hudi_trips_snapshot where fare > 20.0").show()
 spark.sql("select 
_hoodie_commit_time, _hoodie_record_key, _hoodie_partition_path, rider, driver, 
fare from  hudi_trips_snapshot").show()
@@ -637,7 +638,8 @@ Here we are 

[jira] [Updated] (HUDI-905) Support PrunedFilteredScan for Spark Datasource

2020-05-20 Thread Yanjia Gary Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yanjia Gary Li updated HUDI-905:

Priority: Minor  (was: Major)

> Support PrunedFilteredScan for Spark Datasource
> ---
>
> Key: HUDI-905
> URL: https://issues.apache.org/jira/browse/HUDI-905
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>Reporter: Yanjia Gary Li
>Priority: Minor
>
> Hudi Spark Datasource incremental view currently is using 
> DataSourceReadOptions.PUSH_DOWN_INCR_FILTERS_OPT_KEY to push down the filter.
> If we wanna use Spark predicate pushdown in a native way, we need to 
> implement PrunedFilteredScan for Hudi Datasource.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
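The contract HUDI-905 asks for can be pictured without Spark. The sketch below is not Hudi or Spark code — the class and method names (`PrunedFilteredScanSketch`, `buildScan`) are hypothetical — but it mirrors what Spark's `PrunedFilteredScan` hands a datasource: the columns the query needs plus the filters the engine is willing to push down, so the source can avoid reading and shipping the rest. The idea in the JIRA is that the engine would then supply filters through the scan API instead of the `PUSH_DOWN_INCR_FILTERS_OPT_KEY` option.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Minimal illustration (hypothetical names, not Hudi code) of what a
// PrunedFilteredScan-style contract asks a datasource to do: apply the
// pushed-down filter, then return only the requested columns.
public class PrunedFilteredScanSketch {
  static List<Map<String, Object>> buildScan(
      List<Map<String, Object>> rows,
      List<String> requiredColumns,
      Predicate<Map<String, Object>> pushedFilter) {
    return rows.stream()
        .filter(pushedFilter)          // filter pushdown: skip non-matching rows
        .map(r -> {                    // column pruning: keep only needed columns
          Map<String, Object> pruned = new LinkedHashMap<>();
          for (String c : requiredColumns) {
            pruned.put(c, r.get(c));
          }
          return pruned;
        })
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<Map<String, Object>> rows = List.of(
        Map.of("key", "r1", "fare", 10.0, "rider", "a"),
        Map.of("key", "r2", "fare", 25.0, "rider", "b"));
    // The engine asks for (key, fare) where fare > 20.0
    List<Map<String, Object>> out =
        buildScan(rows, List.of("key", "fare"), r -> (double) r.get("fare") > 20.0);
    System.out.println(out); // one row, pruned to key and fare
  }
}
```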


[jira] [Updated] (HUDI-905) Support PrunedFilteredScan for Spark Datasource

2020-05-20 Thread Yanjia Gary Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yanjia Gary Li updated HUDI-905:

Status: Open  (was: New)

> Support PrunedFilteredScan for Spark Datasource
> ---
>
> Key: HUDI-905
> URL: https://issues.apache.org/jira/browse/HUDI-905
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>Reporter: Yanjia Gary Li
>Priority: Minor
>
> Hudi Spark Datasource incremental view currently is using 
> DataSourceReadOptions.PUSH_DOWN_INCR_FILTERS_OPT_KEY to push down the filter.
> If we wanna use Spark predicate pushdown in a native way, we need to 
> implement PrunedFilteredScan for Hudi Datasource.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-905) Support PrunedFilteredScan for Spark Datasource

2020-05-20 Thread Yanjia Gary Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yanjia Gary Li updated HUDI-905:

Component/s: Spark Integration

> Support PrunedFilteredScan for Spark Datasource
> ---
>
> Key: HUDI-905
> URL: https://issues.apache.org/jira/browse/HUDI-905
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: Spark Integration
>Reporter: Yanjia Gary Li
>Priority: Minor
>
> Hudi Spark Datasource incremental view currently is using 
> DataSourceReadOptions.PUSH_DOWN_INCR_FILTERS_OPT_KEY to push down the filter.
> If we wanna use Spark predicate pushdown in a native way, we need to 
> implement PrunedFilteredScan for Hudi Datasource.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] xushiyan edited a comment on pull request #1644: [HUDI-811] Restructure test packages in hudi-common

2020-05-20 Thread GitBox


xushiyan edited a comment on pull request #1644:
URL: https://github.com/apache/incubator-hudi/pull/1644#issuecomment-631577878


   ![Screen Shot 2020-05-20 at 8 57 05 
AM](https://user-images.githubusercontent.com/2701446/82468795-15294280-9a78-11ea-909a-bf09da83d7a4.png)
   
   Test classes under `functional` and `testutils`
   - `TestHoodieLogFormat` and `TestHoodieLogFormatAppendFailure` involve a 
minicluster and hence go to `functional`
   - Tried to move `org.apache.hudi.common.util.TestDFSPropertiesConfiguration` 
to `functional` but ran into errors from `TestParquetUtils` which implies 
inter-class side-effects. The investigation is beyond the scope of this PR, 
hence leaving it there.
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] codecov-commenter edited a comment on pull request #1645: [HUDI-707]Add unit test for StatsCommand

2020-05-20 Thread GitBox


codecov-commenter edited a comment on pull request #1645:
URL: https://github.com/apache/incubator-hudi/pull/1645#issuecomment-631841340


   # [Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1645?src=pr=h1) Report
   > Merging 
[#1645](https://codecov.io/gh/apache/incubator-hudi/pull/1645?src=pr=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/74ecc27e920c70fa4598d8e5a696954203a5b127=desc)
 will **decrease** coverage by `0.01%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1645/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1645?src=pr=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #1645      +/-   ##
   ============================================
   - Coverage     18.34%   18.33%   -0.02%
   - Complexity      854      855       +1
   ============================================
     Files           344      344
     Lines         15172    15167       -5
     Branches       1512     1512
   ============================================
   - Hits           2784     2781       -3
   + Misses        12035    12033       -2
     Partials        353      353
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-hudi/pull/1645?src=pr=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...ain/java/org/apache/hudi/avro/HoodieAvroUtils.java](https://codecov.io/gh/apache/incubator-hudi/pull/1645/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvYXZyby9Ib29kaWVBdnJvVXRpbHMuamF2YQ==) | `48.09% <0.00%> (-1.91%)` | `22.00% <0.00%> (ø%)` | |
   | [...apache/hudi/common/fs/HoodieWrapperFileSystem.java](https://codecov.io/gh/apache/incubator-hudi/pull/1645/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL0hvb2RpZVdyYXBwZXJGaWxlU3lzdGVtLmphdmE=) | `22.69% <0.00%> (+0.70%)` | `29.00% <0.00%> (+1.00%)` | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1645?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1645?src=pr=footer).
 Last update 
[74ecc27...6a43c92](https://codecov.io/gh/apache/incubator-hudi/pull/1645?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] leesf commented on a change in pull request #1651: [MINOR] add impala release and spark partition discovery

2020-05-20 Thread GitBox


leesf commented on a change in pull request #1651:
URL: https://github.com/apache/incubator-hudi/pull/1651#discussion_r428440522



##
File path: docs/_docs/1_1_quick_start_guide.md
##
@@ -297,6 +298,7 @@ tripsSnapshotDF = spark. \
   read. \
   format("hudi"). \
   load(basePath + "/*/*/*/*")
+# load(basePath) use "/partitionKey=partitionValue" folder structure for Spark 
auto partition discovery

Review comment:
   \# -> //





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] leesf commented on a change in pull request #1651: [MINOR] add impala release and spark partition discovery

2020-05-20 Thread GitBox


leesf commented on a change in pull request #1651:
URL: https://github.com/apache/incubator-hudi/pull/1651#discussion_r428440522



##
File path: docs/_docs/1_1_quick_start_guide.md
##
@@ -297,6 +298,7 @@ tripsSnapshotDF = spark. \
   read. \
   format("hudi"). \
   load(basePath + "/*/*/*/*")
+# load(basePath) use "/partitionKey=partitionValue" folder structure for Spark 
auto partition discovery

Review comment:
   # -> //





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




Build failed in Jenkins: hudi-snapshot-deployment-0.5 #284

2020-05-20 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 2.40 KB...]
/home/jenkins/tools/maven/apache-maven-3.5.4/conf:
logging
settings.xml
toolchains.xml

/home/jenkins/tools/maven/apache-maven-3.5.4/conf/logging:
simplelogger.properties

/home/jenkins/tools/maven/apache-maven-3.5.4/lib:
aopalliance-1.0.jar
cdi-api-1.0.jar
cdi-api.license
commons-cli-1.4.jar
commons-cli.license
commons-io-2.5.jar
commons-io.license
commons-lang3-3.5.jar
commons-lang3.license
ext
guava-20.0.jar
guice-4.2.0-no_aop.jar
jansi-1.17.1.jar
jansi-native
javax.inject-1.jar
jcl-over-slf4j-1.7.25.jar
jcl-over-slf4j.license
jsr250-api-1.0.jar
jsr250-api.license
maven-artifact-3.5.4.jar
maven-artifact.license
maven-builder-support-3.5.4.jar
maven-builder-support.license
maven-compat-3.5.4.jar
maven-compat.license
maven-core-3.5.4.jar
maven-core.license
maven-embedder-3.5.4.jar
maven-embedder.license
maven-model-3.5.4.jar
maven-model-builder-3.5.4.jar
maven-model-builder.license
maven-model.license
maven-plugin-api-3.5.4.jar
maven-plugin-api.license
maven-repository-metadata-3.5.4.jar
maven-repository-metadata.license
maven-resolver-api-1.1.1.jar
maven-resolver-api.license
maven-resolver-connector-basic-1.1.1.jar
maven-resolver-connector-basic.license
maven-resolver-impl-1.1.1.jar
maven-resolver-impl.license
maven-resolver-provider-3.5.4.jar
maven-resolver-provider.license
maven-resolver-spi-1.1.1.jar
maven-resolver-spi.license
maven-resolver-transport-wagon-1.1.1.jar
maven-resolver-transport-wagon.license
maven-resolver-util-1.1.1.jar
maven-resolver-util.license
maven-settings-3.5.4.jar
maven-settings-builder-3.5.4.jar
maven-settings-builder.license
maven-settings.license
maven-shared-utils-3.2.1.jar
maven-shared-utils.license
maven-slf4j-provider-3.5.4.jar
maven-slf4j-provider.license
org.eclipse.sisu.inject-0.3.3.jar
org.eclipse.sisu.inject.license
org.eclipse.sisu.plexus-0.3.3.jar
org.eclipse.sisu.plexus.license
plexus-cipher-1.7.jar
plexus-cipher.license
plexus-component-annotations-1.7.1.jar
plexus-component-annotations.license
plexus-interpolation-1.24.jar
plexus-interpolation.license
plexus-sec-dispatcher-1.4.jar
plexus-sec-dispatcher.license
plexus-utils-3.1.0.jar
plexus-utils.license
slf4j-api-1.7.25.jar
slf4j-api.license
wagon-file-3.1.0.jar
wagon-file.license
wagon-http-3.1.0-shaded.jar
wagon-http.license
wagon-provider-api-3.1.0.jar
wagon-provider-api.license

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/ext:
README.txt

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native:
freebsd32
freebsd64
linux32
linux64
osx
README.txt
windows32
windows64

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/osx:
libjansi.jnilib

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows32:
jansi.dll

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows64:
jansi.dll
Finished /home/jenkins/tools/maven/apache-maven-3.5.4 Directory Listing :
Detected current version as: 
'HUDI_home=
0.6.0-SNAPSHOT'
[INFO] Scanning for projects...
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-spark_2.11:jar:0.6.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 
org.apache.hudi:hudi-spark_${scala.binary.version}:[unknown-version], 

 line 26, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-timeline-service:jar:0.6.0-SNAPSHOT
[WARNING] 'build.plugins.plugin.(groupId:artifactId)' must be unique but found 
duplicate declaration of plugin org.jacoco:jacoco-maven-plugin @ 
org.apache.hudi:hudi-timeline-service:[unknown-version], 

 line 58, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-utilities_2.11:jar:0.6.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 
org.apache.hudi:hudi-utilities_${scala.binary.version}:[unknown-version], 

 line 26, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-spark-bundle_2.11:jar:0.6.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 

[GitHub] [incubator-hudi] garyli1019 opened a new pull request #1651: [MINOR] add impala release and spark partition discovery

2020-05-20 Thread GitBox


garyli1019 opened a new pull request #1651:
URL: https://github.com/apache/incubator-hudi/pull/1651


   Minor doc edit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] garyli1019 commented on a change in pull request #1651: [MINOR] add impala release and spark partition discovery

2020-05-20 Thread GitBox


garyli1019 commented on a change in pull request #1651:
URL: https://github.com/apache/incubator-hudi/pull/1651#discussion_r428441106



##
File path: docs/_docs/1_1_quick_start_guide.md
##
@@ -297,6 +298,7 @@ tripsSnapshotDF = spark. \
   read. \
   format("hudi"). \
   load(basePath + "/*/*/*/*")
+# load(basePath) use "/partitionKey=partitionValue" folder structure for Spark 
auto partition discovery

Review comment:
   this is Python





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] codecov-commenter edited a comment on pull request #1644: [HUDI-811] Restructure test packages in hudi-common

2020-05-20 Thread GitBox


codecov-commenter edited a comment on pull request #1644:
URL: https://github.com/apache/incubator-hudi/pull/1644#issuecomment-631673778


   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1644?src=pr=h1) 
Report
   > Merging 
[#1644](https://codecov.io/gh/apache/incubator-hudi/pull/1644?src=pr=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/6a0aa9a645d11ed7b50e18aa0563dafcd9d145f7=desc)
 will **decrease** coverage by `0.01%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1644/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1644?src=pr=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #1644      +/-   ##
   ============================================
   - Coverage     18.33%   18.32%    -0.02%
   + Complexity      855      854        -1
   ============================================
     Files           344      344
     Lines         15167    15167
     Branches       1512     1512
   ============================================
   - Hits           2781     2779        -2
   - Misses        12033    12035        +2
     Partials        353      353
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-hudi/pull/1644?src=pr=tree) | 
Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...che/hudi/common/table/log/HoodieLogFileReader.java](https://codecov.io/gh/apache/incubator-hudi/pull/1644/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL2xvZy9Ib29kaWVMb2dGaWxlUmVhZGVyLmphdmE=)
 | `0.00% <ø> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[...apache/hudi/common/fs/HoodieWrapperFileSystem.java](https://codecov.io/gh/apache/incubator-hudi/pull/1644/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL0hvb2RpZVdyYXBwZXJGaWxlU3lzdGVtLmphdmE=)
 | `21.98% <0.00%> (-0.71%)` | `28.00% <0.00%> (-1.00%)` | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1644?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1644?src=pr=footer).
 Last update 
[6a0aa9a...6f52762](https://codecov.io/gh/apache/incubator-hudi/pull/1644?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] yanghua commented on pull request #1645: [HUDI-707]Add unit test for StatsCommand

2020-05-20 Thread GitBox


yanghua commented on pull request #1645:
URL: https://github.com/apache/incubator-hudi/pull/1645#issuecomment-631836600


   > you be interested in shepherding this PR when it is
   
   Yes, of course.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] codecov-commenter commented on pull request #1645: [HUDI-707]Add unit test for StatsCommand

2020-05-20 Thread GitBox


codecov-commenter commented on pull request #1645:
URL: https://github.com/apache/incubator-hudi/pull/1645#issuecomment-631841340


   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1645?src=pr=h1) 
Report
   > Merging 
[#1645](https://codecov.io/gh/apache/incubator-hudi/pull/1645?src=pr=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/74ecc27e920c70fa4598d8e5a696954203a5b127=desc)
 will **decrease** coverage by `0.01%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1645/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1645?src=pr=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #1645      +/-   ##
   ============================================
   - Coverage     18.34%   18.33%    -0.02%
   - Complexity      854      855        +1
   ============================================
     Files           344      344
     Lines         15172    15167        -5
     Branches       1512     1512
   ============================================
   - Hits           2784     2781        -3
   + Misses        12035    12033        -2
     Partials        353      353
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-hudi/pull/1645?src=pr=tree) | 
Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...ain/java/org/apache/hudi/avro/HoodieAvroUtils.java](https://codecov.io/gh/apache/incubator-hudi/pull/1645/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvYXZyby9Ib29kaWVBdnJvVXRpbHMuamF2YQ==)
 | `48.09% <0.00%> (-1.91%)` | `22.00% <0.00%> (ø%)` | |
   | 
[...apache/hudi/common/fs/HoodieWrapperFileSystem.java](https://codecov.io/gh/apache/incubator-hudi/pull/1645/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL0hvb2RpZVdyYXBwZXJGaWxlU3lzdGVtLmphdmE=)
 | `22.69% <0.00%> (+0.70%)` | `29.00% <0.00%> (+1.00%)` | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1645?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1645?src=pr=footer).
 Last update 
[74ecc27...6a43c92](https://codecov.io/gh/apache/incubator-hudi/pull/1645?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org








[GitHub] [incubator-hudi] leesf merged pull request #1651: [MINOR] add impala release and spark partition discovery

2020-05-20 Thread GitBox


leesf merged pull request #1651:
URL: https://github.com/apache/incubator-hudi/pull/1651


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[incubator-hudi] branch asf-site updated: [MINOR] add impala release and spark partition discovery (#1651)

2020-05-20 Thread leesf
This is an automated email from the ASF dual-hosted git repository.

leesf pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new 349be47  [MINOR] add impala release and spark partition discovery 
(#1651)
349be47 is described below

commit 349be47d8830489bc8c3d130683ad561ea8005ca
Author: Gary Li 
AuthorDate: Wed May 20 21:44:35 2020 -0700

[MINOR] add impala release and spark partition discovery (#1651)
---
 docs/_docs/0.5.2/2_3_querying_data.cn.md | 2 +-
 docs/_docs/0.5.2/2_3_querying_data.md| 2 +-
 docs/_docs/1_1_quick_start_guide.cn.md   | 2 ++
 docs/_docs/1_1_quick_start_guide.md  | 2 ++
 docs/_docs/2_3_querying_data.cn.md   | 2 +-
 docs/_docs/2_3_querying_data.md  | 2 +-
 6 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/docs/_docs/0.5.2/2_3_querying_data.cn.md 
b/docs/_docs/0.5.2/2_3_querying_data.cn.md
index 77ad2d7..d37d8f2 100644
--- a/docs/_docs/0.5.2/2_3_querying_data.cn.md
+++ b/docs/_docs/0.5.2/2_3_querying_data.cn.md
@@ -176,7 +176,7 @@ scala> sqlContext.sql("select count(*) from hudi_rt where 
datestr = '2016-10-02'
 Presto是一种常用的查询引擎,可提供交互式查询性能。 Hudi RO表可以在Presto中无缝查询。
 这需要在整个安装过程中将`hudi-presto-bundle` jar放入`/plugin/hive-hadoop2/`中。
 
-## Impala(此功能还未正式发布)
+## Impala (3.4 or later)
 
 ### 读优化表
 
diff --git a/docs/_docs/0.5.2/2_3_querying_data.md 
b/docs/_docs/0.5.2/2_3_querying_data.md
index 9d17e72..00c8a48 100644
--- a/docs/_docs/0.5.2/2_3_querying_data.md
+++ b/docs/_docs/0.5.2/2_3_querying_data.md
@@ -171,7 +171,7 @@ Additionally, `HoodieReadClient` offers the following 
functionality using Hudi's
 Presto is a popular query engine, providing interactive query performance. 
Presto currently supports snapshot queries on COPY_ON_WRITE and read optimized 
queries 
 on MERGE_ON_READ Hudi tables. This requires the `hudi-presto-bundle` jar to be 
placed into `/plugin/hive-hadoop2/`, across the installation.
 
-## Impala (Not Officially Released)
+## Impala (3.4 or later)
 
 ### Snapshot Query
 
diff --git a/docs/_docs/1_1_quick_start_guide.cn.md 
b/docs/_docs/1_1_quick_start_guide.cn.md
index f20e212..9137f91 100644
--- a/docs/_docs/1_1_quick_start_guide.cn.md
+++ b/docs/_docs/1_1_quick_start_guide.cn.md
@@ -70,6 +70,8 @@ val roViewDF = spark.
 read.
 format("org.apache.hudi").
 load(basePath + "/*/*/*/*")
+//load(basePath) 如果使用 "/partitionKey=partitionValue" 文件夹命名格式,Spark将自动识别分区信息
+
 roViewDF.registerTempTable("hudi_ro_table")
 spark.sql("select fare, begin_lon, begin_lat, ts from  hudi_ro_table where 
fare > 20.0").show()
 spark.sql("select _hoodie_commit_time, _hoodie_record_key, 
_hoodie_partition_path, rider, driver, fare from  hudi_ro_table").show()
diff --git a/docs/_docs/1_1_quick_start_guide.md 
b/docs/_docs/1_1_quick_start_guide.md
index 3e088dd..62939cb 100644
--- a/docs/_docs/1_1_quick_start_guide.md
+++ b/docs/_docs/1_1_quick_start_guide.md
@@ -92,6 +92,7 @@ val tripsSnapshotDF = spark.
   read.
   format("hudi").
   load(basePath + "/*/*/*/*")
+//load(basePath) use "/partitionKey=partitionValue" folder structure for Spark 
auto partition discovery
 tripsSnapshotDF.createOrReplaceTempView("hudi_trips_snapshot")
 
 spark.sql("select fare, begin_lon, begin_lat, ts from  hudi_trips_snapshot 
where fare > 20.0").show()
@@ -297,6 +298,7 @@ tripsSnapshotDF = spark. \
   read. \
   format("hudi"). \
   load(basePath + "/*/*/*/*")
+# load(basePath) use "/partitionKey=partitionValue" folder structure for Spark 
auto partition discovery
 
 tripsSnapshotDF.createOrReplaceTempView("hudi_trips_snapshot")
 
diff --git a/docs/_docs/2_3_querying_data.cn.md 
b/docs/_docs/2_3_querying_data.cn.md
index 1fa91d1..0aeb104 100644
--- a/docs/_docs/2_3_querying_data.cn.md
+++ b/docs/_docs/2_3_querying_data.cn.md
@@ -175,7 +175,7 @@ scala> sqlContext.sql("select count(*) from hudi_rt where 
datestr = '2016-10-02'
 Presto是一种常用的查询引擎,可提供交互式查询性能。 Hudi RO表可以在Presto中无缝查询。
 这需要在整个安装过程中将`hudi-presto-bundle` jar放入`/plugin/hive-hadoop2/`中。
 
-## Impala(此功能还未正式发布)
+## Impala (3.4 or later)
 
 ### 读优化表
 
diff --git a/docs/_docs/2_3_querying_data.md b/docs/_docs/2_3_querying_data.md
index 3e6a436..568d3ba 100644
--- a/docs/_docs/2_3_querying_data.md
+++ b/docs/_docs/2_3_querying_data.md
@@ -170,7 +170,7 @@ Additionally, `HoodieReadClient` offers the following 
functionality using Hudi's
 Presto is a popular query engine, providing interactive query performance. 
Presto currently supports snapshot queries on COPY_ON_WRITE and read optimized 
queries 
 on MERGE_ON_READ Hudi tables. This requires the `hudi-presto-bundle` jar to be 
placed into `/plugin/hive-hadoop2/`, across the installation.
 
-## Impala (Not Officially Released)
+## Impala (3.4 or later)
 
 ### Snapshot Query
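
The `load(basePath)` note added in the commit above relies on Spark's partition discovery over Hive-style `/partitionKey=partitionValue` folders. As a minimal pure-Python sketch of that path convention (the helper `discover_partitions` is hypothetical, not a Spark or Hudi API):

```python
# Sketch of Hive-style partition discovery: Spark infers partition columns
# from "/key=value" path segments when load(basePath) is used instead of a
# glob like basePath + "/*/*/*/*". Helper name and behavior are illustrative.

def discover_partitions(path):
    """Return {column: value} pairs parsed from key=value path segments."""
    partitions = {}
    for segment in path.strip("/").split("/"):
        if "=" in segment:
            key, _, value = segment.partition("=")
            partitions[key] = value
    return partitions

print(discover_partitions("basePath/region=americas/country=us/data.parquet"))
# → {'region': 'americas', 'country': 'us'}
```

With a glob path, Spark reads the files but cannot infer partition columns from the layout; with `load(basePath)` over `key=value` folders, they surface as columns automatically.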
 



[jira] [Updated] (HUDI-905) Support PrunedFilteredScan for Spark Datasource

2020-05-20 Thread Yanjia Gary Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yanjia Gary Li updated HUDI-905:

Description: 
Hudi Spark Datasource incremental view currently uses
DataSourceReadOptions.PUSH_DOWN_INCR_FILTERS_OPT_KEY to push down filters.

To use Spark predicate pushdown natively, we need to implement
PrunedFilteredScan for the Hudi Datasource.

> Support PrunedFilteredScan for Spark Datasource
> ---
>
> Key: HUDI-905
> URL: https://issues.apache.org/jira/browse/HUDI-905
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>Reporter: Yanjia Gary Li
>Priority: Major
>
> Hudi Spark Datasource incremental view currently uses
> DataSourceReadOptions.PUSH_DOWN_INCR_FILTERS_OPT_KEY to push down filters.
> To use Spark predicate pushdown natively, we need to
> implement PrunedFilteredScan for the Hudi Datasource.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
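
For context on the Jira above: `PrunedFilteredScan` is the Spark SQL source trait whose `buildScan(requiredColumns, filters)` lets a data source return only the needed columns and apply pushed-down filters. A plain-Python sketch of that idea (all names here are illustrative, not the actual Spark or Hudi API):

```python
# Sketch of the PrunedFilteredScan idea: the source receives the required
# columns and the pushed-down filters, and returns only matching rows with
# only the requested columns. Pure-Python illustration, not the Scala trait.

def build_scan(rows, required_columns, filters):
    """filters: list of (column, op, value) with op in {'=', '>', '<'}."""
    ops = {
        "=": lambda a, b: a == b,
        ">": lambda a, b: a > b,
        "<": lambda a, b: a < b,
    }
    out = []
    for row in rows:
        if all(ops[op](row[col], val) for col, op, val in filters):
            out.append({c: row[c] for c in required_columns})
    return out

rows = [
    {"rider": "A", "fare": 10.0, "ts": 1},
    {"rider": "B", "fare": 25.0, "ts": 2},
]
print(build_scan(rows, ["rider"], [("fare", ">", 20.0)]))  # → [{'rider': 'B'}]
```

The benefit over the current option-key approach is that Spark itself decides which predicates to push down, with no Hudi-specific configuration.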


[jira] [Commented] (HUDI-648) Implement error log/table for Datasource/DeltaStreamer/WriteClient/Compaction writes

2020-05-20 Thread Raymond Xu (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112712#comment-17112712
 ] 

Raymond Xu commented on HUDI-648:
-

I guess an RFC may not be necessary. I had a look into HoodieWriteClient, and it
seems that `postWrite()`, `postCommit()`, and `completeCompaction()` are suitable
places to write errors into the metadata dir. WDYT? [~vinoth] [~liujinhui]

> Implement error log/table for Datasource/DeltaStreamer/WriteClient/Compaction 
> writes
> 
>
> Key: HUDI-648
> URL: https://issues.apache.org/jira/browse/HUDI-648
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: DeltaStreamer, Spark Integration, Writer Core
>Reporter: Vinoth Chandar
>Priority: Major
>
> We would like a way to hand the erroring records from writing or compaction 
> back to the users, in a separate table or log. This needs to work generically 
> across all the different writer paths.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
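
A rough sketch of the hook idea discussed in the comment above — collecting failed records during a write and flushing them in a post-commit step. Everything here (`ErrorTableWriter`, `record_failure`, `post_commit`) is hypothetical and not Hudi's actual API:

```python
# Hypothetical sketch of an "error table" hook: records that fail during a
# write are collected, then flushed to a side location keyed by the commit
# instant in a post-commit step. Names are illustrative only.

class ErrorTableWriter:
    def __init__(self):
        self.pending = []    # failures from the in-flight write
        self.committed = []  # stands in for the error log/table

    def record_failure(self, record_key, reason):
        self.pending.append({"record_key": record_key, "reason": reason})

    def post_commit(self, instant_time):
        # Flush errors alongside the commit they belong to, then reset.
        for err in self.pending:
            self.committed.append({**err, "instant": instant_time})
        self.pending = []
        return len(self.committed)

w = ErrorTableWriter()
w.record_failure("uuid-1", "schema mismatch")
print(w.post_commit("20200520050730"))  # → 1
```

Tagging each error with the commit instant would let users query failures incrementally, the same way they query the data itself.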


[GitHub] [incubator-hudi] yanghua commented on pull request #1572: [HUDI-836] Implement datadog metrics reporter

2020-05-20 Thread GitBox


yanghua commented on pull request #1572:
URL: https://github.com/apache/incubator-hudi/pull/1572#issuecomment-631840124


   @vinothchandar If you are busy with other things, can we merge this first
and iterate later?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] codecov-commenter edited a comment on pull request #1644: [HUDI-811] Restructure test packages in hudi-common

2020-05-20 Thread GitBox


codecov-commenter edited a comment on pull request #1644:
URL: https://github.com/apache/incubator-hudi/pull/1644#issuecomment-631673778


   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1644?src=pr=h1) 
Report
   > Merging 
[#1644](https://codecov.io/gh/apache/incubator-hudi/pull/1644?src=pr=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/6a0aa9a645d11ed7b50e18aa0563dafcd9d145f7=desc)
 will **not change** coverage.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1644/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1644?src=pr=tree)
   
   ```diff
   @@            Coverage Diff            @@
   ##           master    #1644   +/-   ##
   =========================================
     Coverage     18.33%   18.33%
     Complexity      855      855
   =========================================
     Files           344      344
     Lines         15167    15167
     Branches       1512     1512
   =========================================
     Hits           2781     2781
     Misses        12033    12033
     Partials        353      353
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-hudi/pull/1644?src=pr=tree) | 
Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...che/hudi/common/table/log/HoodieLogFileReader.java](https://codecov.io/gh/apache/incubator-hudi/pull/1644/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL2xvZy9Ib29kaWVMb2dGaWxlUmVhZGVyLmphdmE=)
 | `0.00% <ø> (ø)` | `0.00 <0.00> (ø)` | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1644?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1644?src=pr=footer).
 Last update 
[6a0aa9a...ed4ca32](https://codecov.io/gh/apache/incubator-hudi/pull/1644?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] yanghua commented on pull request #1644: [HUDI-811] Restructure test packages in hudi-common

2020-05-20 Thread GitBox


yanghua commented on pull request #1644:
URL: https://github.com/apache/incubator-hudi/pull/1644#issuecomment-631836269


   OK, will review later



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-905) Support PrunedFilteredScan for Spark Datasource

2020-05-20 Thread Yanjia Gary Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yanjia Gary Li updated HUDI-905:

Summary: Support PrunedFilteredScan for Spark Datasource  (was: Support 
native filter pushdown for Spark Datasource)

> Support PrunedFilteredScan for Spark Datasource
> ---
>
> Key: HUDI-905
> URL: https://issues.apache.org/jira/browse/HUDI-905
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>Reporter: Yanjia Gary Li
>Assignee: Yanjia Gary Li
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

