[GitHub] [incubator-hudi] n3nash edited a comment on pull request #1100: [HUDI-289] Implement a test suite to support long running test for Hudi writing and querying end-end

2020-04-23 Thread GitBox


n3nash edited a comment on pull request #1100:
URL: https://github.com/apache/incubator-hudi/pull/1100#issuecomment-618816317


   @yanghua no worries, thanks for trying, I've pushed the changes. We will 
continue to have conflicting files given the many new commits every day; we 
should merge this PR soon to avoid this situation every time.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] n3nash edited a comment on pull request #1100: [HUDI-289] Implement a test suite to support long running test for Hudi writing and querying end-end

2020-04-23 Thread GitBox


n3nash edited a comment on pull request #1100:
URL: https://github.com/apache/incubator-hudi/pull/1100#issuecomment-618816317


   @yanghua no worries, thanks for trying, I've pushed the changes



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] n3nash commented on pull request #1100: [HUDI-289] Implement a test suite to support long running test for Hudi writing and querying end-end

2020-04-23 Thread GitBox


n3nash commented on pull request #1100:
URL: https://github.com/apache/incubator-hudi/pull/1100#issuecomment-618816317


   @yanghua no worries, I've pushed the changes



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[incubator-hudi] branch hudi_test_suite_refactor updated (95283be -> 08f9a76)

2020-04-23 Thread nagarwal
This is an automated email from the ASF dual-hosted git repository.

nagarwal pushed a change to branch hudi_test_suite_refactor
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git.


 discard 95283be  Testing running 3 builds to limit total build time
omit c13e885  [HUDI-394] Provide a basic implementation of test suite
 add ddd105b  [HUDI-772] Make UserDefinedBulkInsertPartitioner configurable 
for DataSource (#1500)
 add 2a2f31d  [MINOR] Remove reduntant code and fix typo in 
HoodieDefaultTimeline (#1535)
 add 332072b  [HUDI-371] Supporting hive combine input format for realtime 
tables (#1503)
 add 84dd904  [HUDI-789]Adjust logic of upsert in HDFSParquetImporter 
(#1511)
 add 62bd3e7  [HUDI-757] Added hudi-cli command to export metadata of 
Instants.
 add 2a56f82  [HUDI-821] Fixing JCommander param parsing in deltastreamer 
(#1525)
 add 6e15eeb  [HUDI-809] Migrate CommonTestHarness to JUnit 5 (#1530)
 add 26684f5  [HUDI-816] Fixed MAX_MEMORY_FOR_MERGE_PROP and 
MAX_MEMORY_FOR_COMPACTION_PROP do not work due to HUDI-678 (#1536)
 add aea7c16  [HUDI-795] Handle auto-deleted empty aux folder (#1515)
 add 19cc15c  [MINOR]: Fix cli docs for DeltaStreamer (#1547)
 add 08f9a76  [HUDI-394] Provide a basic implementation of test suite

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (95283be)
\
 N -- N -- N   refs/heads/hudi_test_suite_refactor (08f9a76)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.

Any revisions marked "omit" are not gone; other references still
refer to them.  Any revisions marked "discard" are gone forever.

No new revisions were added by this update.

Summary of changes:
 .../apache/hudi/cli/commands/ExportCommand.java| 231 
 .../apache/hudi/client/utils/SparkConfigUtils.java |  10 +-
 .../org/apache/hudi/config/HoodieWriteConfig.java  |  10 +
 .../apache/hudi/table/HoodieCommitArchiveLog.java  |  24 +-
 .../hudi/client/utils/TestSparkConfigUtils.java|  65 +++
 .../hudi/common/HoodieMergeOnReadTestUtils.java|   4 +-
 .../java/org/apache/hudi/avro/HoodieAvroUtils.java |  29 +
 .../table/timeline/HoodieDefaultTimeline.java  |   6 +-
 .../table/timeline/TimelineMetadataUtils.java  |   4 +
 .../hudi/common/util/collection/ArrayUtils.java|  62 ++
 .../common/table/TestHoodieTableMetaClient.java|  54 +-
 .../hudi/common/table/TestTimelineLayout.java  |  24 +-
 .../table/view/TestHoodieTableFileSystemView.java  | 335 ++-
 .../table/view/TestRocksDbBasedFileSystemView.java |   4 +-
 .../HoodieCommonTestHarnessJunit5.java}|  33 +-
 .../apache/hudi/common/util/TestFileIOUtils.java   |  20 +-
 hudi-hadoop-mr/pom.xml |   8 +-
 .../hadoop/hive/HoodieCombineHiveInputFormat.java  | 626 -
 .../hive/HoodieCombineRealtimeFileSplit.java   | 169 ++
 .../hive/HoodieCombineRealtimeHiveSplit.java   |  27 +-
 .../realtime/AbstractRealtimeRecordReader.java |   3 +
 .../HoodieCombineRealtimeRecordReader.java | 103 
 .../realtime/HoodieParquetRealtimeInputFormat.java |   2 +-
 .../realtime/HoodieRealtimeRecordReader.java   |   1 +
 .../realtime/RealtimeUnmergedRecordReader.java |  22 +-
 .../apache/hudi/hadoop/InputFormatTestUtil.java| 165 --
 .../hudi/hadoop/TestHoodieParquetInputFormat.java  |  99 ++--
 .../hudi/hadoop/TestHoodieROTablePathFilter.java   |  26 +-
 .../realtime/TestHoodieCombineHiveInputFormat.java | 156 +
 .../realtime/TestHoodieRealtimeRecordReader.java   | 206 +++
 .../main/java/org/apache/hudi/DataSourceUtils.java |  27 +-
 hudi-spark/src/test/java/DataSourceTestUtils.java  |  13 +
 hudi-spark/src/test/java/DataSourceUtilsTest.java  |  86 +++
 .../apache/hudi/utilities/HDFSParquetImporter.java |  22 +-
 .../deltastreamer/HoodieDeltaStreamer.java |  18 +-
 .../HoodieMultiTableDeltaStreamer.java |   5 +-
 .../hudi/utilities/TestHDFSParquetImporter.java| 255 +++--
 .../hudi/utilities/TestHoodieSnapshotCopier.java   |  22 +-
 .../TestKafkaConnectHdfsProvider.java  |  20 +-
 39 files changed, 2125 insertions(+), 871 deletions(-)
 create mode 100644 
hudi-cli/src/main/java/org/apache/hudi/cli/commands/ExportCommand.java
 create mode 100644 
hudi-client/src/test/java/org/apache/hudi/client/utils/TestSparkConfigUtils.java
 create mode 100644 
hudi-common/src/main/java/org/apache/hudi/common/util/collection/ArrayUtils.java
 copy 
hudi-common/src/test/java/org/apache/hudi/common/{table/view/TestRocksDBBasedIncrementalFSViewSync.java
 => 

[GitHub] [incubator-hudi] codecov-io edited a comment on pull request #1553: [HUDI-810] Migrate ClientTestHarness to JUnit 5

2020-04-23 Thread GitBox


codecov-io edited a comment on pull request #1553:
URL: https://github.com/apache/incubator-hudi/pull/1553#issuecomment-618814181


   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1553?src=pr=h1) 
Report
   > Merging 
[#1553](https://codecov.io/gh/apache/incubator-hudi/pull/1553?src=pr=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/19cc15c0987043175685aaeb45facb15af23e34f=desc)
 will **increase** coverage by `0.01%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1553/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1553?src=pr=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #1553      +/-   ##
   ============================================
   + Coverage     71.66%   71.68%    +0.01%
     Complexity      294      294
   ============================================
     Files           378      378
     Lines         16551    16551
     Branches       1670     1670
   ============================================
   + Hits          11861    11864        +3
   + Misses         3959     3956        -3
     Partials        731      731
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-hudi/pull/1553?src=pr=tree) | 
Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...ache/hudi/common/fs/inline/InMemoryFileSystem.java](https://codecov.io/gh/apache/incubator-hudi/pull/1553/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL2lubGluZS9Jbk1lbW9yeUZpbGVTeXN0ZW0uamF2YQ==)
 | `89.65% <0.00%> (+10.34%)` | `0.00% <0.00%> (ø%)` | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1553?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1553?src=pr=footer).
 Last update 
[19cc15c...79e937d](https://codecov.io/gh/apache/incubator-hudi/pull/1553?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org





[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1433: [HUDI-728]: Implement custom key generator

2020-04-23 Thread GitBox


vinothchandar commented on a change in pull request #1433:
URL: https://github.com/apache/incubator-hudi/pull/1433#discussion_r411494931



##
File path: hudi-spark/src/main/java/org/apache/hudi/keygen/KeyGenerator.java
##
@@ -40,4 +40,22 @@ protected KeyGenerator(TypedProperties config) {
* Generate a Hoodie Key out of provided generic record.
*/
   public abstract HoodieKey getKey(GenericRecord record);
+
+  public abstract String getPartitionPath(GenericRecord record, String 
partitionPathField);
+
+  public abstract String getRecordKey(GenericRecord record);
+
+  public enum PartitionKeyType {
+simple("simple"), complex("complex"), timestampBased("timestampBased"), 
noPartition("noPartition");

Review comment:
   please name enum elements like constants.. all caps
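   A hedged sketch of that rename (same string values; the `getValue()` 
accessor is assumed here for illustration):
   ```java
   public enum PartitionKeyType {
     SIMPLE("simple"), COMPLEX("complex"), TIMESTAMP_BASED("timestampBased"), NO_PARTITION("noPartition");

     private final String value;

     PartitionKeyType(String value) {
       this.value = value;
     }

     public String getValue() {
       return value;
     }
   }
   ```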

##
File path: hudi-spark/src/main/java/org/apache/hudi/keygen/KeyGenerator.java
##
@@ -40,4 +40,22 @@ protected KeyGenerator(TypedProperties config) {
* Generate a Hoodie Key out of provided generic record.
*/
   public abstract HoodieKey getKey(GenericRecord record);
+
+  public abstract String getPartitionPath(GenericRecord record, String 
partitionPathField);

Review comment:
   javadocs for these abstract methods? 
   
   Also I am not sure if it makes sense to add these component methods here.. 
We already have a top level one `getKey()`

##
File path: 
hudi-spark/src/main/java/org/apache/hudi/keygen/CustomKeyGenerator.java
##
@@ -0,0 +1,143 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.keygen;
+
+import org.apache.hudi.DataSourceWriteOptions;
+import org.apache.hudi.common.model.HoodieKey;
+import org.apache.hudi.common.config.TypedProperties;
+
+import org.apache.avro.generic.GenericRecord;
+import org.apache.hudi.exception.HoodieDeltaStreamerException;
+import org.apache.hudi.exception.HoodieKeyException;
+
+import java.util.Arrays;
+import java.util.List;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.keygen.KeyGenerator.PartitionKeyType.noPartition;
+import static org.apache.hudi.keygen.KeyGenerator.PartitionKeyType.simple;
+import static 
org.apache.hudi.keygen.KeyGenerator.PartitionKeyType.timestampBased;
+
+/**
+ * This is a generic implementation of KeyGenerator where users can configure 
record key as a single field or a combination of fields.
+ * Similarly partition path can be configured to have multiple fields or only 
one field. This class expects value for prop
+ * "hoodie.datasource.write.partitionpath.field" in a specific format. For 
example:
+ *
+ * properties.put("hoodie.datasource.write.partitionpath.field", 
"field1:PartitionKeyType1,field2:PartitionKeyType2").
+ *
+ * The complete partition path is created as <value for field1>/<value for field2> and so on.
+ *
+ * Few points to consider:
+ * 1. If you want to customise some partition path field on a timestamp basis, 
you can use field1:timestampBased

Review comment:
   all this needs to be documented for the user? 
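   For reference, a minimal sketch of the configuration format the javadoc 
describes (field names are hypothetical, and the `CustomKeyGenerator` wiring 
is assumed from this PR, not a final API):
   ```java
   import org.apache.avro.generic.GenericRecord;
   import org.apache.hudi.common.config.TypedProperties;
   import org.apache.hudi.common.model.HoodieKey;
   import org.apache.hudi.keygen.CustomKeyGenerator;

   class CustomKeyGenExample {
     // Hypothetical wiring: record key from one field, partition path from two
     // "field:type" pairs, as described in the javadoc above.
     static HoodieKey keyFor(GenericRecord record) {
       TypedProperties props = new TypedProperties();
       props.put("hoodie.datasource.write.recordkey.field", "tran_id");
       props.put("hoodie.datasource.write.partitionpath.field",
           "store_state:simple,tran_date:simple");
       // e.g. partition path "IL/2019-03-14" (exact format depends on the
       // final implementation)
       return new CustomKeyGenerator(props).getKey(record);
     }
   }
   ```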

##
File path: 
hudi-spark/src/main/java/org/apache/hudi/keygen/ComplexKeyGenerator.java
##
@@ -64,6 +62,32 @@ public HoodieKey getKey(GenericRecord record) {
   throw new HoodieKeyException("Unable to find field names for record key 
or partition path in cfg");
 }
 
+String recordKey = getRecordKey(record);
+StringBuilder partitionPath = new StringBuilder();
+for (String partitionPathField : partitionPathFields) {
+  partitionPath.append(getPartitionPath(record, partitionPathField));
+  partitionPath.append(DEFAULT_PARTITION_PATH_SEPARATOR);
+}
+partitionPath.deleteCharAt(partitionPath.length() - 1);
+
+return new HoodieKey(recordKey, partitionPath.toString());
+  }
+
+  @Override
+  public String getPartitionPath(GenericRecord record, String 
partitionPathField) {

Review comment:
   I assume this whole change is just restructuring code 

##
File path: hudi-spark/src/main/java/org/apache/hudi/keygen/KeyGenerator.java
##
@@ -40,4 +40,22 @@ protected KeyGenerator(TypedProperties config) {
* Generate a Hoodie Key out of provided generic record.
*/
   public abstract HoodieKey getKey(GenericRecord record);
+

[GitHub] [incubator-hudi] n3nash edited a comment on issue #1549: Potential issue when using Deltastreamer with DMS

2020-04-23 Thread GitBox


n3nash edited a comment on issue #1549:
URL: https://github.com/apache/incubator-hudi/issues/1549#issuecomment-618811796


   @vinothchandar  We do invoke the same payload when combining records during 
merge/compaction. For deletes, the payload has to be an empty payload and then 
the record should be skipped -> 
https://github.com/apache/incubator-hudi/blob/master/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/RealtimeCompactedRecordReader.java#L94
   
   @PhatakN1 when you try deletes, is that an empty payload? Or is this 
something you just drive through configs in deltastreamer?
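   
   For concreteness, a minimal hedged sketch of such a delete-aware payload 
(a hypothetical class, not the actual `AWSDmsAvroPayload` source):
   ```java
   import java.io.IOException;

   import org.apache.avro.Schema;
   import org.apache.avro.generic.GenericRecord;
   import org.apache.avro.generic.IndexedRecord;
   import org.apache.hudi.common.model.OverwriteWithLatestAvroPayload;
   import org.apache.hudi.common.util.Option;

   // Hypothetical payload: returning an empty Option tells the merge to skip
   // the record, which is how a delete becomes invisible to queries.
   public class DeleteAwareDmsPayload extends OverwriteWithLatestAvroPayload {

     public DeleteAwareDmsPayload(GenericRecord record, Comparable orderingVal) {
       super(record, orderingVal);
     }

     @Override
     public Option<IndexedRecord> combineAndGetUpdateValue(IndexedRecord currentValue, Schema schema)
         throws IOException {
       Option<IndexedRecord> incoming = getInsertValue(schema);
       if (incoming.isPresent()) {
         Object op = ((GenericRecord) incoming.get()).get("Op");
         if (op != null && "D".equals(op.toString())) {
           return Option.empty(); // delete: empty payload, record gets skipped
         }
       }
       return incoming;
     }
   }
   ```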



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] n3nash commented on issue #1549: Potential issue when using Deltastreamer with DMS

2020-04-23 Thread GitBox


n3nash commented on issue #1549:
URL: https://github.com/apache/incubator-hudi/issues/1549#issuecomment-618811796


   We do invoke the same payload when combining records during 
merge/compaction. For deletes, the payload has to be an empty payload and then 
the record should be skipped -> 
https://github.com/apache/incubator-hudi/blob/master/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/RealtimeCompactedRecordReader.java#L94
   
   @PhatakN1 when you try deletes, is that an empty payload?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] n3nash edited a comment on issue #1549: Potential issue when using Deltastreamer with DMS

2020-04-23 Thread GitBox


n3nash edited a comment on issue #1549:
URL: https://github.com/apache/incubator-hudi/issues/1549#issuecomment-618811796


   @vinothchandar  We do invoke the same payload when combining records during 
merge/compaction. For deletes, the payload has to be an empty payload and then 
the record should be skipped -> 
https://github.com/apache/incubator-hudi/blob/master/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/RealtimeCompactedRecordReader.java#L94
   
   @PhatakN1 when you try deletes, is that an empty payload?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] bvaradar commented on issue #1555: [SUPPORT] Meet java.lang.IllegalAccessError: class org.apache.hadoop.hdfs.web.HftpFileSystem

2020-04-23 Thread GitBox


bvaradar commented on issue #1555:
URL: https://github.com/apache/incubator-hudi/issues/1555#issuecomment-618808782


   @allenzhg : The exception strongly suggests you have two different versions 
of Hadoop on the classpath (likely your 3.x, plus the 2.x brought in by Spark). 
Spark 2.4.x comes pre-built with Hadoop 2.7, which could conflict with your 
Hadoop 3.x setup. It is likely unrelated to Hudi. 
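   
   One quick, hedged way to confirm which Hadoop jars actually win on the 
classpath (standard Java and Hadoop APIs only):
   ```java
   import org.apache.hadoop.fs.FileSystem;
   import org.apache.hadoop.util.VersionInfo;

   public class HadoopClasspathCheck {
     public static void main(String[] args) {
       // Which Hadoop version is actually linked at runtime...
       System.out.println("Hadoop version: " + VersionInfo.getVersion());
       // ...and which jar the FileSystem class was loaded from.
       System.out.println("Loaded from: "
           + FileSystem.class.getProtectionDomain().getCodeSource().getLocation());
     }
   }
   ```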



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] vinothchandar commented on issue #1549: Potential issue when using Deltastreamer with DMS

2020-04-23 Thread GitBox


vinothchandar commented on issue #1549:
URL: https://github.com/apache/incubator-hudi/issues/1549#issuecomment-618806837


   as long as the records are the same and you are using the payload, it 
shouldn't matter... 
   
   Let me try to repro this myself.. I am puzzled since I do see the payload 
class written into hoodie.properties.. So what should happen is that the 
payload's `combineAndGetUpdateValue()` should be invoked ... From the code 
though, it seems like this may not be happening..
   
   cc @n3nash are you able to confirm? My understanding was we will invoke the 
same payload in rt merge path. no?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] PhatakN1 edited a comment on issue #1549: Potential issue when using Deltastreamer with DMS

2020-04-23 Thread GitBox


PhatakN1 edited a comment on issue #1549:
URL: https://github.com/apache/incubator-hudi/issues/1549#issuecomment-618803454


   These are the contents of hoodie.properties
   ```
   

   hoodie.compaction.payload.class=org.apache.hudi.payload.AWSDmsAvroPayload
   hoodie.table.name=retail_transactions
   hoodie.archivelog.folder=archived
   hoodie.table.type=MERGE_ON_READ
   hoodie.timeline.layout.version=1
   

   ```
   
   Some more background and context on what I did.
   I used MySQL --> DMS --> S3 --> Hudi for the initial load of the table. This is 
where I used 
hoodie.compaction.payload.class=org.apache.hudi.payload.AWSDmsAvroPayload in my 
command.
   
   For CDC, I used MySQL --> DMS --> Kafka --> Hudi. Here, I used JsonKafkaSource 
in my command. 
   Would this cause an issue somewhere?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1540: [HUDI-819] Fix a bug with MergeOnReadLazyInsertIterable.

2020-04-23 Thread GitBox


vinothchandar commented on a change in pull request #1540:
URL: https://github.com/apache/incubator-hudi/pull/1540#discussion_r414295319



##
File path: hudi-client/src/main/java/org/apache/hudi/io/AppendHandleFactory.java
##
@@ -0,0 +1,36 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.io;
+
+import org.apache.hudi.client.SparkTaskContextSupplier;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.table.HoodieTable;
+
+public class AppendHandleFactory extends 
CreateHandleFactory {

Review comment:
   This inheritance is kind of confusing.. 

##
File path: hudi-client/src/main/java/org/apache/hudi/io/AppendHandleFactory.java
##
@@ -0,0 +1,36 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.io;
+
+import org.apache.hudi.client.SparkTaskContextSupplier;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.table.HoodieTable;
+
+public class AppendHandleFactory extends 
CreateHandleFactory {

Review comment:
   if you need some method in both Append and Create Handle factories, 
perhaps move it as a default method in the interface? 
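   A hedged sketch of that suggestion (illustrative names and signatures, not 
the PR's final API):
   ```java
   package org.apache.hudi.io;

   import org.apache.hudi.client.SparkTaskContextSupplier;
   import org.apache.hudi.common.model.HoodieRecordPayload;
   import org.apache.hudi.config.HoodieWriteConfig;
   import org.apache.hudi.table.HoodieTable;

   public interface WriteHandleFactory<T extends HoodieRecordPayload> {

     HoodieWriteHandle<T> create(HoodieWriteConfig config, String instantTime,
         HoodieTable<T> hoodieTable, String partitionPath, String fileIdPrefix,
         SparkTaskContextSupplier sparkTaskContextSupplier);

     // Logic shared by the Create and Append factories lives here as a default
     // method instead of being inherited from CreateHandleFactory.
     default String getNextFileId(String idPrefix, int numFilesWritten) {
       return String.format("%s-%d", idPrefix, numFilesWritten);
     }
   }
   ```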

##
File path: 
hudi-client/src/main/java/org/apache/hudi/io/WriteHandleCreatorFactory.java
##
@@ -0,0 +1,30 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.io;
+
+import org.apache.hudi.client.SparkTaskContextSupplier;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.table.HoodieTable;
+
+public interface WriteHandleCreatorFactory {

Review comment:
   Rename to just `WriteHandleFactory`?

##
File path: 
hudi-client/src/main/java/org/apache/hudi/execution/LazyInsertIterable.java
##
@@ -43,26 +44,34 @@
 /**
  * Lazy Iterable, that writes a stream of HoodieRecords sorted by the 
partitionPath, into new files.
  */
-public class CopyOnWriteLazyInsertIterable
+public class LazyInsertIterable
 extends LazyIterableIterator, List> {
 
   protected final HoodieWriteConfig hoodieConfig;
   protected final String instantTime;
   protected final HoodieTable hoodieTable;
   protected final String idPrefix;
-  protected int numFilesWritten;
   protected SparkTaskContextSupplier sparkTaskContextSupplier;
+  protected WriteHandleCreatorFactory writeHandleCreatorFactory;

Review comment:
   please rename this accordingly as well





This 

[GitHub] [incubator-hudi] PhatakN1 commented on issue #1549: Potential issue when using Deltastreamer with DMS

2020-04-23 Thread GitBox


PhatakN1 commented on issue #1549:
URL: https://github.com/apache/incubator-hudi/issues/1549#issuecomment-618803454


   These are the contents of hoodie.properties
   

   hoodie.compaction.payload.class=org.apache.hudi.payload.AWSDmsAvroPayload
   hoodie.table.name=retail_transactions
   hoodie.archivelog.folder=archived
   hoodie.table.type=MERGE_ON_READ
   hoodie.timeline.layout.version=1
   

   Some more background and context on what I did.
   I used MySQL --> DMS --> S3 --> Hudi for the initial load of the table. This is 
where I used 
hoodie.compaction.payload.class=org.apache.hudi.payload.AWSDmsAvroPayload in my 
command.
   
   For CDC, I used MySQL --> DMS --> Kafka --> Hudi. Here, I used JsonKafkaSource 
in my command. 
   Would this cause an issue somewhere?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] bvaradar commented on issue #1555: [SUPPORT] Meet java.lang.IllegalAccessError: class org.apache.hadoop.hdfs.web.HftpFileSystem

2020-04-23 Thread GitBox


bvaradar commented on issue #1555:
URL: https://github.com/apache/incubator-hudi/issues/1555#issuecomment-618800754


   Yes @vinothchandar  I will handle it. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] vinothchandar commented on issue #1549: Potential issue when using Deltastreamer with DMS

2020-04-23 Thread GitBox


vinothchandar commented on issue #1549:
URL: https://github.com/apache/incubator-hudi/issues/1549#issuecomment-618800202


   @PhatakN1 ah okay.. Since Hudi itself is not aware of DMS or the `"Op": "D"` 
field, it does log a data block with the deleted record.. I suspect the 
`AwsDMSPayload` is not getting used for merging the base and log files for the 
query.. 
   
   Could you also paste the contents of `.hoodie/hoodie.properties`?  



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] PhatakN1 edited a comment on issue #1549: Potential issue when using Deltastreamer with DMS

2020-04-23 Thread GitBox


PhatakN1 edited a comment on issue #1549:
URL: https://github.com/apache/incubator-hudi/issues/1549#issuecomment-618292312


   If MOR inserts go to a parquet file but updates go to a log file, then a 
query on the _ro table will show the inserts since the last compaction but not 
the updates. Isn't that like providing an inconsistent state of data? So I 
still see all inserts since the last compaction but none of the updates?
   
   These are the contents of the log file, using `show logfile records` in hudi-cli:
   ```
   {"_hoodie_commit_time": "20200422083923", "_hoodie_commit_seqno": 
"20200422083923_1_2", "_hoodie_record_key": "11", "_hoodie_partition_path": 
"2019-03-14", "_hoodie_file_name": "c9df1d00-5dda-4bf7-8f27-1d4534bbbe4c-0", 
"dms_received_ts": "2020-04-22T08:38:36.873970Z", "tran_id": 11, "tran_date": 
"2019-03-14", "store_id": 5, "store_city": "CHICAGO", "store_state": "IL", 
"item_code": "XX", "quantity": 15, "total": 106.25, "Op": "D"}
   ```
   
   This is the log file metadata
   ```
   ║ 20200422083923 │ 1   │ AVRO_DATA_BLOCK │ 
{"SCHEMA":"{\"type\":\"record\",\"name\":\"retail_transactions\",\"fields\":[{\"name\":\"_hoodie_commit_time\",\"type\":[\"null\",\"string\"],\"doc\":\"\",\"default\":null},{\"name\":\"_hoodie_commit_seqno\",\"type\":[\"null\",\"string\"],\"doc\":\"\",\"default\":null},{\"name\":\"_hoodie_record_key\",\"type\":[\"null\",\"string\"],\"doc\":\"\",\"default\":null},{\"name\":\"_hoodie_partition_path\",\"type\":[\"null\",\"string\"],\"doc\":\"\",\"default\":null},{\"name\":\"_hoodie_file_name\",\"type\":[\"null\",\"string\"],\"doc\":\"\",\"default\":null},{\"name\":\"dms_received_ts\",\"type\":\"string\"},{\"name\":\"tran_id\",\"type\":\"int\"},{\"name\":\"tran_date\",\"type\":\"string\"},{\"name\":\"store_id\",\"type\":\"int\"},{\"name\":\"store_city\",\"type\":\"string\"},{\"name\":\"store_state\",\"type\":\"string\"},{\"name\":\"item_code\",\"type\":\"string\"},{\"name\":\"quantity\",\"type\":\"int\"},{\"name\":\"total\",\"type\":\"float\"},{\"name\":\"Op\",\"type\":\"string\"}]}","INSTANT_TIME":"20200422083923"}
 │ {} ║
   ```
   
   The name of the parquet file in the partition is 
c9df1d00-5dda-4bf7-8f27-1d4534bbbe4c-0_3-23-40_20200422072539.parquet and the 
log file name is 
`c9df1d00-5dda-4bf7-8f27-1d4534bbbe4c-0_20200422072539.log.1_1-24-33`
   
   The partition metadata contents are 
   ```
   commitTime=20200422072539
   partitionDepth=1
   ```
   Not sure why a query on the _rt table does not reflect the delete. 
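   
   For concreteness, a hedged sketch of how the two Hive-synced views could be 
compared (table names assumed from `hoodie.table.name` above; requires the 
Hudi Hive/hadoop-mr support on the Spark classpath):
   ```java
   import org.apache.spark.sql.SparkSession;

   public class CompareRoRtViews {
     public static void main(String[] args) {
       SparkSession spark = SparkSession.builder().enableHiveSupport().getOrCreate();
       // _ro (read-optimized) reads only the base parquet: the deleted row stays
       // visible until compaction rewrites the base file.
       spark.sql("SELECT _hoodie_record_key, Op FROM retail_transactions_ro "
           + "WHERE tran_id = 11").show();
       // _rt (real-time) merges the log block through the payload at query time:
       // with a delete-aware payload, the row should disappear here.
       spark.sql("SELECT _hoodie_record_key, Op FROM retail_transactions_rt "
           + "WHERE tran_id = 11").show();
     }
   }
   ```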



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] vinothchandar commented on issue #1556: [SUPPORT] Input path in s3 doesn't exist if the write multiple datasets to s3 in a single execution

2020-04-23 Thread GitBox


vinothchandar commented on issue #1556:
URL: https://github.com/apache/incubator-hudi/issues/1556#issuecomment-618798845


   trying to understand, are you concurrently writing to the same dataset using 
two writers? 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] vinothchandar commented on issue #1555: [SUPPORT] Meet java.lang.IllegalAccessError: class org.apache.hadoop.hdfs.web.HftpFileSystem

2020-04-23 Thread GitBox


vinothchandar commented on issue #1555:
URL: https://github.com/apache/incubator-hudi/issues/1555#issuecomment-618797288


   @bvaradar are you able to tackle this one? 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




Build failed in Jenkins: hudi-snapshot-deployment-0.5 #257

2020-04-23 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 2.31 KB...]
/home/jenkins/tools/maven/apache-maven-3.5.4/conf:
logging
settings.xml
toolchains.xml

/home/jenkins/tools/maven/apache-maven-3.5.4/conf/logging:
simplelogger.properties

/home/jenkins/tools/maven/apache-maven-3.5.4/lib:
aopalliance-1.0.jar
cdi-api-1.0.jar
cdi-api.license
commons-cli-1.4.jar
commons-cli.license
commons-io-2.5.jar
commons-io.license
commons-lang3-3.5.jar
commons-lang3.license
ext
guava-20.0.jar
guice-4.2.0-no_aop.jar
jansi-1.17.1.jar
jansi-native
javax.inject-1.jar
jcl-over-slf4j-1.7.25.jar
jcl-over-slf4j.license
jsr250-api-1.0.jar
jsr250-api.license
maven-artifact-3.5.4.jar
maven-artifact.license
maven-builder-support-3.5.4.jar
maven-builder-support.license
maven-compat-3.5.4.jar
maven-compat.license
maven-core-3.5.4.jar
maven-core.license
maven-embedder-3.5.4.jar
maven-embedder.license
maven-model-3.5.4.jar
maven-model-builder-3.5.4.jar
maven-model-builder.license
maven-model.license
maven-plugin-api-3.5.4.jar
maven-plugin-api.license
maven-repository-metadata-3.5.4.jar
maven-repository-metadata.license
maven-resolver-api-1.1.1.jar
maven-resolver-api.license
maven-resolver-connector-basic-1.1.1.jar
maven-resolver-connector-basic.license
maven-resolver-impl-1.1.1.jar
maven-resolver-impl.license
maven-resolver-provider-3.5.4.jar
maven-resolver-provider.license
maven-resolver-spi-1.1.1.jar
maven-resolver-spi.license
maven-resolver-transport-wagon-1.1.1.jar
maven-resolver-transport-wagon.license
maven-resolver-util-1.1.1.jar
maven-resolver-util.license
maven-settings-3.5.4.jar
maven-settings-builder-3.5.4.jar
maven-settings-builder.license
maven-settings.license
maven-shared-utils-3.2.1.jar
maven-shared-utils.license
maven-slf4j-provider-3.5.4.jar
maven-slf4j-provider.license
org.eclipse.sisu.inject-0.3.3.jar
org.eclipse.sisu.inject.license
org.eclipse.sisu.plexus-0.3.3.jar
org.eclipse.sisu.plexus.license
plexus-cipher-1.7.jar
plexus-cipher.license
plexus-component-annotations-1.7.1.jar
plexus-component-annotations.license
plexus-interpolation-1.24.jar
plexus-interpolation.license
plexus-sec-dispatcher-1.4.jar
plexus-sec-dispatcher.license
plexus-utils-3.1.0.jar
plexus-utils.license
slf4j-api-1.7.25.jar
slf4j-api.license
wagon-file-3.1.0.jar
wagon-file.license
wagon-http-3.1.0-shaded.jar
wagon-http.license
wagon-provider-api-3.1.0.jar
wagon-provider-api.license

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/ext:
README.txt

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native:
freebsd32
freebsd64
linux32
linux64
osx
README.txt
windows32
windows64

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/osx:
libjansi.jnilib

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows32:
jansi.dll

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows64:
jansi.dll
Finished /home/jenkins/tools/maven/apache-maven-3.5.4 Directory Listing :
Detected current version as: 
'HUDI_home=
0.6.0-SNAPSHOT'
[INFO] Scanning for projects...
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-spark_2.11:jar:0.6.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 
org.apache.hudi:hudi-spark_${scala.binary.version}:[unknown-version], 

 line 26, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-timeline-service:jar:0.6.0-SNAPSHOT
[WARNING] 'build.plugins.plugin.(groupId:artifactId)' must be unique but found 
duplicate declaration of plugin org.jacoco:jacoco-maven-plugin @ 
org.apache.hudi:hudi-timeline-service:[unknown-version], 

 line 58, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-utilities_2.11:jar:0.6.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 
org.apache.hudi:hudi-utilities_${scala.binary.version}:[unknown-version], 

 line 26, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-spark-bundle_2.11:jar:0.6.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 

[jira] [Assigned] (HUDI-836) Implement datadog metrics reporter

2020-04-23 Thread lamber-ken (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lamber-ken reassigned HUDI-836:
---

Assignee: Raymond Xu

> Implement datadog metrics reporter
> --
>
> Key: HUDI-836
> URL: https://issues.apache.org/jira/browse/HUDI-836
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: Common Core
>Reporter: Raymond Xu
>Assignee: Raymond Xu
>Priority: Major
> Fix For: 0.6.0
>
>
> To implement a new metrics reporter type for datadog API



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-836) Implement datadog metrics reporter

2020-04-23 Thread lamber-ken (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091134#comment-17091134
 ] 

lamber-ken commented on HUDI-836:
-

(y)

> Implement datadog metrics reporter
> --
>
> Key: HUDI-836
> URL: https://issues.apache.org/jira/browse/HUDI-836
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: Common Core
>Reporter: Raymond Xu
>Assignee: Raymond Xu
>Priority: Major
> Fix For: 0.6.0
>
>
> To implement a new metrics reporter type for datadog API



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] codecov-io edited a comment on pull request #1100: [HUDI-289] Implement a test suite to support long running test for Hudi writing and querying end-end

2020-04-23 Thread GitBox


codecov-io edited a comment on pull request #1100:
URL: https://github.com/apache/incubator-hudi/pull/1100#issuecomment-61645


   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1100?src=pr=h1) 
Report
   > Merging 
[#1100](https://codecov.io/gh/apache/incubator-hudi/pull/1100?src=pr=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/09fd6f64c527e6a822c4e17dc4e61b8fdee28189=desc)
 will **decrease** coverage by `0.57%`.
   > The diff coverage is `63.62%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1100/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1100?src=pr=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #1100      +/-   ##
   ============================================
   - Coverage     72.32%   71.75%    -0.58%
   - Complexity      294      566      +272
   ============================================
     Files           374      418       +44
     Lines         16366    17611     +1245
     Branches       1649     1772      +123
   ============================================
   + Hits          11836    12636      +800
   - Misses         3798     4175      +377
   - Partials        732      800       +68
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-hudi/pull/1100?src=pr=tree) | 
Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...c/main/java/org/apache/hudi/hive/HiveSyncTool.java](https://codecov.io/gh/apache/incubator-hudi/pull/1100/diff?src=pr=tree#diff-aHVkaS1oaXZlLXN5bmMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGl2ZS9IaXZlU3luY1Rvb2wuamF2YQ==)
 | `71.73% <0.00%> (-2.42%)` | `0.00 <0.00> (ø)` | |
   | 
[...in/java/org/apache/hudi/hive/HoodieHiveClient.java](https://codecov.io/gh/apache/incubator-hudi/pull/1100/diff?src=pr=tree#diff-aHVkaS1oaXZlLXN5bmMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGl2ZS9Ib29kaWVIaXZlQ2xpZW50LmphdmE=)
 | `74.36% <0.00%> (-0.27%)` | `0.00 <0.00> (ø)` | |
   | 
[...src/main/java/org/apache/hudi/DataSourceUtils.java](https://codecov.io/gh/apache/incubator-hudi/pull/1100/diff?src=pr=tree#diff-aHVkaS1zcGFyay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9EYXRhU291cmNlVXRpbHMuamF2YQ==)
 | `49.45% <0.00%> (-1.12%)` | `0.00 <0.00> (ø)` | |
   | 
[...src/main/java/org/apache/hudi/QuickstartUtils.java](https://codecov.io/gh/apache/incubator-hudi/pull/1100/diff?src=pr=tree#diff-aHVkaS1zcGFyay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9RdWlja3N0YXJ0VXRpbHMuamF2YQ==)
 | `0.00% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[...va/org/apache/hudi/keygen/ComplexKeyGenerator.java](https://codecov.io/gh/apache/incubator-hudi/pull/1100/diff?src=pr=tree#diff-aHVkaS1zcGFyay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9rZXlnZW4vQ29tcGxleEtleUdlbmVyYXRvci5qYXZh)
 | `91.66% <ø> (+4.82%)` | `0.00 <0.00> (ø)` | |
   | 
[...apache/hudi/keygen/NonpartitionedKeyGenerator.java](https://codecov.io/gh/apache/incubator-hudi/pull/1100/diff?src=pr=tree#diff-aHVkaS1zcGFyay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9rZXlnZW4vTm9ucGFydGl0aW9uZWRLZXlHZW5lcmF0b3IuamF2YQ==)
 | `0.00% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[...apache/hudi/testsuite/DFSSparkAvroDeltaWriter.java](https://codecov.io/gh/apache/incubator-hudi/pull/1100/diff?src=pr=tree#diff-aHVkaS10ZXN0LXN1aXRlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL3Rlc3RzdWl0ZS9ERlNTcGFya0F2cm9EZWx0YVdyaXRlci5qYXZh)
 | `0.00% <0.00%> (ø)` | `0.00 <0.00> (?)` | |
   | 
[...hudi/testsuite/dag/SimpleWorkflowDagGenerator.java](https://codecov.io/gh/apache/incubator-hudi/pull/1100/diff?src=pr=tree#diff-aHVkaS10ZXN0LXN1aXRlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL3Rlc3RzdWl0ZS9kYWcvU2ltcGxlV29ya2Zsb3dEYWdHZW5lcmF0b3IuamF2YQ==)
 | `0.00% <0.00%> (ø)` | `0.00 <0.00> (?)` | |
   | 
[...pache/hudi/testsuite/dag/nodes/BulkInsertNode.java](https://codecov.io/gh/apache/incubator-hudi/pull/1100/diff?src=pr=tree#diff-aHVkaS10ZXN0LXN1aXRlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL3Rlc3RzdWl0ZS9kYWcvbm9kZXMvQnVsa0luc2VydE5vZGUuamF2YQ==)
 | `0.00% <0.00%> (ø)` | `0.00 <0.00> (?)` | |
   | 
[...org/apache/hudi/testsuite/dag/nodes/CleanNode.java](https://codecov.io/gh/apache/incubator-hudi/pull/1100/diff?src=pr=tree#diff-aHVkaS10ZXN0LXN1aXRlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL3Rlc3RzdWl0ZS9kYWcvbm9kZXMvQ2xlYW5Ob2RlLmphdmE=)
 | `0.00% <0.00%> (ø)` | `0.00 <0.00> (?)` | |
   | ... and [97 
more](https://codecov.io/gh/apache/incubator-hudi/pull/1100/diff?src=pr=tree-more)
 | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1100?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1100?src=pr=footer).
 Last update 

[GitHub] [incubator-hudi] yanghua commented on pull request #1100: [HUDI-289] Implement a test suite to support long running test for Hudi writing and querying end-end

2020-04-23 Thread GitBox


yanghua commented on pull request #1100:
URL: https://github.com/apache/incubator-hudi/pull/1100#issuecomment-618771761


   @n3nash Still conflicting files... I tried to fix them yesterday. You may 
need to `pull --rebase` before force-pushing?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] umehrot2 commented on issue #1550: Hudi 0.5.2 inability save complex type with nullable = true [SUPPORT]

2020-04-23 Thread GitBox


umehrot2 commented on issue #1550:
URL: https://github.com/apache/incubator-hudi/issues/1550#issuecomment-618769492


   @badion yeah, the fix for this did not make it into 0.5.2. You can either 
build your own Hudi with this patch applied on top of 0.5.2 or wait until the 
next release.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[incubator-hudi] branch hudi_test_suite_refactor updated (7313a22 -> 95283be)

2020-04-23 Thread nagarwal
This is an automated email from the ASF dual-hosted git repository.

nagarwal pushed a change to branch hudi_test_suite_refactor
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git.


omit 7313a22  trigger rebuild
omit 908e57c  [HUDI-397]Normalize log print statement (#1224)
omit 7ab93b0  Testing running 3 builds to limit total build time
omit 0c75316  [HUDI-394] Provide a basic implementation of test suite
omit 19cc15c  [MINOR]: Fix cli docs for DeltaStreamer (#1547)
omit aea7c16  [HUDI-795] Handle auto-deleted empty aux folder (#1515)
omit 26684f5  [HUDI-816] Fixed MAX_MEMORY_FOR_MERGE_PROP and 
MAX_MEMORY_FOR_COMPACTION_PROP do not work due to HUDI-678 (#1536)
omit 6e15eeb  [HUDI-809] Migrate CommonTestHarness to JUnit 5 (#1530)
omit 2a56f82  [HUDI-821] Fixing JCommander param parsing in deltastreamer 
(#1525)
omit 62bd3e7  [HUDI-757] Added hudi-cli command to export metadata of 
Instants.
omit 84dd904  [HUDI-789]Adjust logic of upsert in HDFSParquetImporter 
(#1511)
omit 332072b  [HUDI-371] Supporting hive combine input format for realtime 
tables (#1503)
omit 2a2f31d  [MINOR] Remove reduntant code and fix typo in 
HoodieDefaultTimeline (#1535)
omit ddd105b  [HUDI-772] Make UserDefinedBulkInsertPartitioner configurable 
for DataSource (#1500)
 add c13e885  [HUDI-394] Provide a basic implementation of test suite
 add 95283be  Testing running 3 builds to limit total build time

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (7313a22)
\
 N -- N -- N   refs/heads/hudi_test_suite_refactor (95283be)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.

Any revisions marked "omit" are not gone; other references still
refer to them.  Any revisions marked "discard" are gone forever.

No new revisions were added by this update.

Summary of changes:
 .../apache/hudi/cli/commands/ExportCommand.java| 231 
 .../apache/hudi/client/utils/SparkConfigUtils.java |  10 +-
 .../org/apache/hudi/config/HoodieWriteConfig.java  |  10 -
 .../apache/hudi/table/HoodieCommitArchiveLog.java  |  24 +-
 .../hudi/client/utils/TestSparkConfigUtils.java|  65 ---
 .../hudi/common/HoodieMergeOnReadTestUtils.java|   4 +-
 .../java/org/apache/hudi/avro/HoodieAvroUtils.java |  29 -
 .../table/timeline/HoodieDefaultTimeline.java  |   6 +-
 .../table/timeline/TimelineMetadataUtils.java  |   4 -
 .../hudi/common/util/collection/ArrayUtils.java|  62 --
 .../common/table/TestHoodieTableMetaClient.java|  54 +-
 .../hudi/common/table/TestTimelineLayout.java  |  24 +-
 .../table/view/TestHoodieTableFileSystemView.java  | 335 +--
 .../table/view/TestRocksDbBasedFileSystemView.java |   4 +-
 .../testutils/HoodieCommonTestHarnessJunit5.java   |  52 --
 .../apache/hudi/common/util/TestFileIOUtils.java   |  20 +-
 hudi-hadoop-mr/pom.xml |   8 +-
 .../hadoop/hive/HoodieCombineHiveInputFormat.java  | 626 +
 .../hive/HoodieCombineRealtimeFileSplit.java   | 169 --
 .../hive/HoodieCombineRealtimeHiveSplit.java   |  44 --
 .../realtime/AbstractRealtimeRecordReader.java |   3 -
 .../HoodieCombineRealtimeRecordReader.java | 103 
 .../realtime/HoodieParquetRealtimeInputFormat.java |   2 +-
 .../realtime/HoodieRealtimeRecordReader.java   |   1 -
 .../realtime/RealtimeUnmergedRecordReader.java |  22 +-
 .../apache/hudi/hadoop/InputFormatTestUtil.java| 165 ++
 .../hudi/hadoop/TestHoodieParquetInputFormat.java  |  99 ++--
 .../hudi/hadoop/TestHoodieROTablePathFilter.java   |  26 +-
 .../realtime/TestHoodieCombineHiveInputFormat.java | 156 -
 .../realtime/TestHoodieRealtimeRecordReader.java   | 206 ---
 .../main/java/org/apache/hudi/DataSourceUtils.java |  27 +-
 hudi-spark/src/test/java/DataSourceTestUtils.java  |  13 -
 hudi-spark/src/test/java/DataSourceUtilsTest.java  |  86 ---
 .../hudi/testsuite/dag/nodes/BulkInsertNode.java   |   2 +-
 .../apache/hudi/testsuite/dag/nodes/CleanNode.java |   2 +-
 .../hudi/testsuite/dag/nodes/CompactNode.java  |   2 +-
 .../apache/hudi/testsuite/dag/nodes/DagNode.java   |   6 +-
 .../hudi/testsuite/dag/nodes/HiveQueryNode.java|   6 +-
 .../hudi/testsuite/dag/nodes/HiveSyncNode.java |   2 +-
 .../hudi/testsuite/dag/nodes/InsertNode.java   |   6 +-
 .../hudi/testsuite/dag/nodes/RollbackNode.java |   4 +-
 .../testsuite/dag/nodes/ScheduleCompactNode.java   |   4 +-
 .../testsuite/dag/nodes/SparkSQLQueryNode.java |   4 +-
 .../hudi/testsuite/dag/nodes/UpsertNode.java   |   4 +-
 

[incubator-hudi] branch hudi_test_suite_refactor updated (908e57c -> 7313a22)

2020-04-23 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a change to branch hudi_test_suite_refactor
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git.


from 908e57c  [HUDI-397]Normalize log print statement (#1224)
 add 7313a22  trigger rebuild

No new revisions were added by this update.

Summary of changes:



[jira] [Created] (HUDI-836) Implement datadog metrics reporter

2020-04-23 Thread Raymond Xu (Jira)
Raymond Xu created HUDI-836:
---

 Summary: Implement datadog metrics reporter
 Key: HUDI-836
 URL: https://issues.apache.org/jira/browse/HUDI-836
 Project: Apache Hudi (incubating)
  Issue Type: New Feature
  Components: Common Core
Reporter: Raymond Xu
 Fix For: 0.6.0


To implement a new metrics reporter type for datadog API



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-773) Hudi On Azure Data Lake Storage V2

2020-04-23 Thread Yanjia Gary Li (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091042#comment-17091042
 ] 

Yanjia Gary Li commented on HUDI-773:
-

Hello [~sasikumar.venkat], could you try the following:

mount your storage account to Databricks
{code:java}
dbutils.fs.mount(
source = "abfss://x...@xxx.dfs.core.windows.net",
mountPoint = "/mountpoint",
extraConfigs = configs)
{code}
When writing to Hudi, use the abfss URL
{code:java}
save("abfss://<>.dfs.core.windows.net/hudi-tables/customer"){code}
When reading Hudi data, use the mount point
{code:java}
load("/mountpoint/hudi-tables/customer")
{code}
I believe this error could be related to Databricks internal setup

> Hudi On Azure Data Lake Storage V2
> --
>
> Key: HUDI-773
> URL: https://issues.apache.org/jira/browse/HUDI-773
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: Usability
>Reporter: Yanjia Gary Li
>Assignee: Yanjia Gary Li
>Priority: Minor
> Fix For: 0.6.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HUDI-620) Hive Sync Integration of bootstrapped table

2020-04-23 Thread Udit Mehrotra (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Udit Mehrotra reassigned HUDI-620:
--

Assignee: Udit Mehrotra

> Hive Sync Integration of bootstrapped table
> ---
>
> Key: HUDI-620
> URL: https://issues.apache.org/jira/browse/HUDI-620
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Hive Integration
>Reporter: Balaji Varadarajan
>Assignee: Udit Mehrotra
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] satishkotha commented on pull request #1540: [HUDI-819] Fix a bug with MergeOnReadLazyInsertIterable.

2020-04-23 Thread GitBox


satishkotha commented on pull request #1540:
URL: https://github.com/apache/incubator-hudi/pull/1540#issuecomment-618717557


   > @satishkotha let's then break that up into a separate JIRA (tagged with 
Code Cleanup component). We can limit scope to these insert related handles and 
move on.. wdyt
   
   @vinothchandar Sure. that sounds good. I created HUDI-835 for this. Let me 
know if you have any other comments on create/append part



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (HUDI-835) refactor HoodieMergeHandle into factory pattern

2020-04-23 Thread satish (Jira)
satish created HUDI-835:
---

 Summary: refactor HoodieMergeHandle into factory pattern
 Key: HUDI-835
 URL: https://issues.apache.org/jira/browse/HUDI-835
 Project: Apache Hudi (incubating)
  Issue Type: Improvement
  Components: Code Cleanup
Reporter: satish
Assignee: satish


As part of [this PR|https://github.com/apache/incubator-hudi/pull/1540], we 
changed Create and Append handles to use factory pattern to avoid code 
duplication. 

For consistency, we want to move HoodieMergeHandle also into factory pattern. 
One possible approach to achieving that is to move 'recordItr' from MergeHandle 
constructor into UpdateHandler class.
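A rough sketch of that direction (illustrative names and signatures, not a 
final design):

{code:java}
import org.apache.hudi.common.model.HoodieRecordPayload;
import org.apache.hudi.config.HoodieWriteConfig;
import org.apache.hudi.io.HoodieMergeHandle;
import org.apache.hudi.table.HoodieTable;

// The factory builds the handle without the record iterator (unlike today's
// constructor); an UpdateHandler-style collaborator supplies records
// afterwards, mirroring the Create/Append factories from PR #1540.
public interface MergeHandleFactory<T extends HoodieRecordPayload> {
  HoodieMergeHandle<T> createMergeHandle(HoodieWriteConfig config, String instantTime,
      HoodieTable<T> hoodieTable, String partitionPath, String fileId);
}
{code}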



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] hmatu commented on pull request #1557: [HUDI-834] Concrete signature of HoodieRecordPayload#combineAndGetUpdateValue & HoodieRecordPayload#getInsertValue

2020-04-23 Thread GitBox


hmatu commented on pull request #1557:
URL: https://github.com/apache/incubator-hudi/pull/1557#issuecomment-618691728


   +1, LGTM



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] lamber-ken commented on issue #1552: Time taken for upserting hudi table is increasing with increase in number of partitions

2020-04-23 Thread GitBox


lamber-ken commented on issue #1552:
URL: https://github.com/apache/incubator-hudi/issues/1552#issuecomment-618683897


   hi @harshi2506, we need more Spark INFO logs. You can upload the logfile to 
Google Drive, 
   e.g. https://drive.google.com/file/d/1zzyaySDJqPgAdTSLnKwOG667QGvZhd03



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] lamber-ken edited a comment on issue #1552: Time taken for upserting hudi table is increasing with increase in number of partitions

2020-04-23 Thread GitBox


lamber-ken edited a comment on issue #1552:
URL: https://github.com/apache/incubator-hudi/issues/1552#issuecomment-618657572


   User report: upsert hoodie log
   ```
   Started at 20/04/22 20:12:14 
   
   
   20/04/22 20:15:30 INFO HoodieTableMetaClient: Finished Loading Table of type 
COPY_ON_WRITE from 
   20/04/22 20:15:30 INFO HoodieTableMetaClient: Loading Active commit timeline 
for 
   20/04/22 20:15:30 INFO HoodieActiveTimeline: Loaded instants 
java.util.stream.ReferencePipeline$Head@4af81941
   20/04/22 20:15:30 INFO HoodieCommitArchiveLog: No Instants to archive
   20/04/22 20:15:30 INFO HoodieWriteClient: Auto cleaning is enabled. Running 
cleaner now
   20/04/22 20:15:30 INFO HoodieWriteClient: Cleaner started
   20/04/22 20:15:30 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient 
from 
   20/04/22 20:15:30 INFO FSUtils: Hadoop Configuration: fs.defaultFS:hdfs, 
Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, 
mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, 
hdfs-site.xml, emrfs-site.xml, __spark_hadoop_conf__.xml, 
file:/etc/spark/conf.dist/hive-site.xml], FileSystem: 
[com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem@3c0cfaad]
   20/04/22 20:15:31 INFO HoodieTableConfig: Loading dataset properties from 
.hoodie/hoodie.properties
   20/04/22 20:15:31 INFO S3NativeFileSystem: Opening 
.hoodie/hoodie.properties' for reading
   20/04/22 20:15:31 WARN S3CryptoModuleAE: Unable to detect encryption 
information for object '.hoodie/hoodie.properties' in bucket 
'delta-data-devo'. Returning object without decryption.
   20/04/22 20:15:31 INFO HoodieTableMetaClient: Finished Loading Table of type 
COPY_ON_WRITE from 
   20/04/22 20:15:31 INFO HoodieTableMetaClient: Loading Active commit timeline 
for 
   20/04/22 20:15:31 INFO HoodieActiveTimeline: Loaded instants 
java.util.stream.ReferencePipeline$Head@72dd302
   20/04/22 20:15:31 INFO FileSystemViewManager: Creating View Manager with 
storage type :MEMORY
   20/04/22 20:15:31 INFO FileSystemViewManager: Creating in-memory based Table 
View
   20/04/22 20:15:31 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient 
from 
   20/04/22 20:15:31 INFO FSUtils: Hadoop Configuration: fs.defaultFS: hdfs, 
Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, 
mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, 
hdfs-site.xml, emrfs-site.xml, __spark_hadoop_conf__.xml, 
file:/etc/spark/conf.dist/hive-site.xml], FileSystem: 
[com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem@3c0cfaad]
   20/04/22 20:15:31 INFO HoodieTableConfig: Loading dataset properties from 
.hoodie/hoodie.properties
   20/04/22 20:15:31 INFO S3NativeFileSystem: Opening 
'.hoodie/hoodie.properties' for reading
   20/04/22 20:15:31 WARN S3CryptoModuleAE: Unable to detect encryption 
information for object '.hoodie/hoodie.properties' in bucket 
'delta-data-devo'. Returning object without decryption.
   20/04/22 20:15:31 INFO HoodieTableMetaClient: Finished Loading Table of type 
COPY_ON_WRITE from 
   20/04/22 20:15:31 INFO HoodieTableMetaClient: Loading Active commit timeline 
for 
   20/04/22 20:15:31 INFO HoodieActiveTimeline: Loaded instants 
java.util.stream.ReferencePipeline$Head@678852b5
   20/04/22 20:24:33 INFO HoodieCopyOnWriteTable: Partitions to clean up : 
[2001/05/02, 2001/05/07, 2001/05/09, 2001/05/10, 2001/05/17, 2001/05/18, 
2001/05/21, 2001/06/01, 2001/06/04, 2001/06/08, 2001/06/20, 2001/06/21, 
2001/07/17, 2001/07/23, 2001/07/25, 2001/07/30, 2001/08/02, 2001/08/03, 
2001/08/07, 2001/08/08, 2001/08/09, 2001/08/14, 2001/08/23, 2001/09/05, 
2001/09/06, 2001/09/07, 2001/09/13, 2001/09/14, 2001/10/02, 2001/10/03, 
2001/10/04, 2001/10/09, 2001/11/01, 2001/11/09, 2001/11/14, 2001/11/15, 
2001/11/16, 2001/11/19, 2001/11/20, 2001/11/21, 2001/11/27, 2001/11/28, 
2001/11/29, 2001/11/30, 2001/12/03, 2001/12/07, 2001/12/10, 2001/12/11, 
2001/12/12, 2001/12/13, 2001/12/17, 2001/12/20, 2001/12/21, 2001/12/25, 
2001/12/26, 2001/12/27, 2001/12/28, 2001/12/29, 2001/12/31, 2002/01/02, 
2002/01/03, 2002/01/07, 2002/01/08, 2002/01/09, 2002/01/11, 2002/01/13, 
2002/01/14, 2002/01/15, 2002/01/16, 2002/01/17, 2002/01/18, 2002/01/21, 
2002/01/22, 2002/01/23, 2002/01/25, 2002/01/28, 2002/01/29, 2002/01/30, 
2002/02/03, 2002/02/05, 2002/02/06, 2002/02/07, 2002/02/11, 2002/02/12, 
2002/02/14, 2002/02/15, 2002/02/18, 2002/02/19, 2002/02/20, 2002/02/21, 
2002/02/22, 2002/02/26, 2002/03/02, 2002/03/04, 2002/03/06, 2002/03/10, 
2002/03/15, 2002/03/17, 2002/03/19, 2002/03/20, 2002/03/21, 2002/03/22, 
2002/03/25, 2002/03/26, 2002/03/27, 2002/03/28, 2002/03/30, 2002/04/02, 
2002/04/03, 2002/04/04, 2002/04/05, 2002/04/07, 2002/04/09, 2002/04/10, 
2002/04/11, 2002/04/14, 2002/04/16, 2002/04/17, 2002/04/22, 2002/04/23, 
2002/04/25, 2002/04/30, 2002/05/01, 2002/05/02, 2002/05/06, 2002/05/08, 
2002/05/09, 2002/05/12, 2002/05/13, 2002/05/14, 2002/05/17, 2002/05/19, 
2002/05/20, 2002/05/21, 2002/05/22, 

[GitHub] [incubator-hudi] lamber-ken edited a comment on issue #1552: Time taken for upserting hudi table is increasing with increase in number of partitions

2020-04-23 Thread GitBox


lamber-ken edited a comment on issue #1552:
URL: https://github.com/apache/incubator-hudi/issues/1552#issuecomment-618657572


   User report: upsert hoodie log, cost about 30min
   ```
   Started at 20/04/22 20:12:14 
   
   
   20/04/22 20:15:30 INFO HoodieTableMetaClient: Finished Loading Table of type 
COPY_ON_WRITE from 
   20/04/22 20:15:30 INFO HoodieTableMetaClient: Loading Active commit timeline 
for 
   20/04/22 20:15:30 INFO HoodieActiveTimeline: Loaded instants 
java.util.stream.ReferencePipeline$Head@4af81941
   20/04/22 20:15:30 INFO HoodieCommitArchiveLog: No Instants to archive
   20/04/22 20:15:30 INFO HoodieWriteClient: Auto cleaning is enabled. Running 
cleaner now
   20/04/22 20:15:30 INFO HoodieWriteClient: Cleaner started
   20/04/22 20:15:30 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient 
from 
   20/04/22 20:15:30 INFO FSUtils: Hadoop Configuration: fs.defaultFS:hdfs, 
Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, 
mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, 
hdfs-site.xml, emrfs-site.xml, __spark_hadoop_conf__.xml, 
file:/etc/spark/conf.dist/hive-site.xml], FileSystem: 
[com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem@3c0cfaad]
   20/04/22 20:15:31 INFO HoodieTableConfig: Loading dataset properties from 
.hoodie/hoodie.properties
   20/04/22 20:15:31 INFO S3NativeFileSystem: Opening 
.hoodie/hoodie.properties' for reading
   20/04/22 20:15:31 WARN S3CryptoModuleAE: Unable to detect encryption 
information for object '.hoodie/hoodie.properties' in bucket 
'delta-data-devo'. Returning object without decryption.
   20/04/22 20:15:31 INFO HoodieTableMetaClient: Finished Loading Table of type 
COPY_ON_WRITE from 
   20/04/22 20:15:31 INFO HoodieTableMetaClient: Loading Active commit timeline 
for 
   20/04/22 20:15:31 INFO HoodieActiveTimeline: Loaded instants 
java.util.stream.ReferencePipeline$Head@72dd302
   20/04/22 20:15:31 INFO FileSystemViewManager: Creating View Manager with 
storage type :MEMORY
   20/04/22 20:15:31 INFO FileSystemViewManager: Creating in-memory based Table 
View
   20/04/22 20:15:31 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient 
from 
   20/04/22 20:15:31 INFO FSUtils: Hadoop Configuration: fs.defaultFS: hdfs, 
Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, 
mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, 
hdfs-site.xml, emrfs-site.xml, __spark_hadoop_conf__.xml, 
file:/etc/spark/conf.dist/hive-site.xml], FileSystem: 
[com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem@3c0cfaad]
   20/04/22 20:15:31 INFO HoodieTableConfig: Loading dataset properties from 
.hoodie/hoodie.properties
   20/04/22 20:15:31 INFO S3NativeFileSystem: Opening 
'.hoodie/hoodie.properties' for reading
   20/04/22 20:15:31 WARN S3CryptoModuleAE: Unable to detect encryption 
information for object '.hoodie/hoodie.properties' in bucket 
'delta-data-devo'. Returning object without decryption.
   20/04/22 20:15:31 INFO HoodieTableMetaClient: Finished Loading Table of type 
COPY_ON_WRITE from 
   20/04/22 20:15:31 INFO HoodieTableMetaClient: Loading Active commit timeline 
for 
   20/04/22 20:15:31 INFO HoodieActiveTimeline: Loaded instants 
java.util.stream.ReferencePipeline$Head@678852b5
   20/04/22 20:24:33 INFO HoodieCopyOnWriteTable: Partitions to clean up : 
[2001/05/02, 2001/05/07, 2001/05/09, 2001/05/10, 2001/05/17, 2001/05/18, 
2001/05/21, 2001/06/01, 2001/06/04, 2001/06/08, 2001/06/20, 2001/06/21, 
2001/07/17, 2001/07/23, 2001/07/25, 2001/07/30, 2001/08/02, 2001/08/03, 
2001/08/07, 2001/08/08, 2001/08/09, 2001/08/14, 2001/08/23, 2001/09/05, 
2001/09/06, 2001/09/07, 2001/09/13, 2001/09/14, 2001/10/02, 2001/10/03, 
2001/10/04, 2001/10/09, 2001/11/01, 2001/11/09, 2001/11/14, 2001/11/15, 
2001/11/16, 2001/11/19, 2001/11/20, 2001/11/21, 2001/11/27, 2001/11/28, 
2001/11/29, 2001/11/30, 2001/12/03, 2001/12/07, 2001/12/10, 2001/12/11, 
2001/12/12, 2001/12/13, 2001/12/17, 2001/12/20, 2001/12/21, 2001/12/25, 
2001/12/26, 2001/12/27, 2001/12/28, 2001/12/29, 2001/12/31, 2002/01/02, 
2002/01/03, 2002/01/07, 2002/01/08, 2002/01/09, 2002/01/11, 2002/01/13, 
2002/01/14, 2002/01/15, 2002/01/16, 2002/01/17, 2002/01/18, 2002/01/21, 
2002/01/22, 2002/01/23, 2002/01/25, 2002/01/28, 2002/01/29, 2002/01/30, 
2002/02/03, 2002/02/05, 2002/02/06, 2002/02/07, 2002/02/11, 2002/02/12, 
2002/02/14, 2002/02/15, 2002/02/18, 2002/02/19, 2002/02/20, 2002/02/21, 
2002/02/22, 2002/02/26, 2002/03/02, 2002/03/04, 2002/03/06, 2002/03/10, 
2002/03/15, 2002/03/17, 2002/03/19, 2002/03/20, 2002/03/21, 2002/03/22, 
2002/03/25, 2002/03/26, 2002/03/27, 2002/03/28, 2002/03/30, 2002/04/02, 
2002/04/03, 2002/04/04, 2002/04/05, 2002/04/07, 2002/04/09, 2002/04/10, 
2002/04/11, 2002/04/14, 2002/04/16, 2002/04/17, 2002/04/22, 2002/04/23, 
2002/04/25, 2002/04/30, 2002/05/01, 2002/05/02, 2002/05/06, 2002/05/08, 
2002/05/09, 2002/05/12, 2002/05/13, 2002/05/14, 2002/05/17, 2002/05/19, 
2002/05/20, 2002/05/21, 

[GitHub] [incubator-hudi] lamber-ken commented on issue #1552: Time taken for upserting hudi table is increasing with increase in number of partitions

2020-04-23 Thread GitBox


lamber-ken commented on issue #1552:
URL: https://github.com/apache/incubator-hudi/issues/1552#issuecomment-618657572


   Upsert hoodie log, cost about 30min
   ```
   Started at 20/04/22 20:12:14 
   
   
   20/04/22 20:15:30 INFO HoodieTableMetaClient: Finished Loading Table of type 
COPY_ON_WRITE from 
   20/04/22 20:15:30 INFO HoodieTableMetaClient: Loading Active commit timeline 
for 
   20/04/22 20:15:30 INFO HoodieActiveTimeline: Loaded instants 
java.util.stream.ReferencePipeline$Head@4af81941
   20/04/22 20:15:30 INFO HoodieCommitArchiveLog: No Instants to archive
   20/04/22 20:15:30 INFO HoodieWriteClient: Auto cleaning is enabled. Running 
cleaner now
   20/04/22 20:15:30 INFO HoodieWriteClient: Cleaner started
   20/04/22 20:15:30 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient 
from 
   20/04/22 20:15:30 INFO FSUtils: Hadoop Configuration: fs.defaultFS:hdfs, 
Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, 
mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, 
hdfs-site.xml, emrfs-site.xml, __spark_hadoop_conf__.xml, 
file:/etc/spark/conf.dist/hive-site.xml], FileSystem: 
[com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem@3c0cfaad]
   20/04/22 20:15:31 INFO HoodieTableConfig: Loading dataset properties from 
.hoodie/hoodie.properties
   20/04/22 20:15:31 INFO S3NativeFileSystem: Opening 
.hoodie/hoodie.properties' for reading
   20/04/22 20:15:31 WARN S3CryptoModuleAE: Unable to detect encryption 
information for object '.hoodie/hoodie.properties' in bucket 
'delta-data-devo'. Returning object without decryption.
   20/04/22 20:15:31 INFO HoodieTableMetaClient: Finished Loading Table of type 
COPY_ON_WRITE from 
   20/04/22 20:15:31 INFO HoodieTableMetaClient: Loading Active commit timeline 
for 
   20/04/22 20:15:31 INFO HoodieActiveTimeline: Loaded instants 
java.util.stream.ReferencePipeline$Head@72dd302
   20/04/22 20:15:31 INFO FileSystemViewManager: Creating View Manager with 
storage type :MEMORY
   20/04/22 20:15:31 INFO FileSystemViewManager: Creating in-memory based Table 
View
   20/04/22 20:15:31 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient 
from 
   20/04/22 20:15:31 INFO FSUtils: Hadoop Configuration: fs.defaultFS: hdfs, 
Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, 
mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, 
hdfs-site.xml, emrfs-site.xml, __spark_hadoop_conf__.xml, 
file:/etc/spark/conf.dist/hive-site.xml], FileSystem: 
[com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem@3c0cfaad]
   20/04/22 20:15:31 INFO HoodieTableConfig: Loading dataset properties from 
.hoodie/hoodie.properties
   20/04/22 20:15:31 INFO S3NativeFileSystem: Opening 
'.hoodie/hoodie.properties' for reading
   20/04/22 20:15:31 WARN S3CryptoModuleAE: Unable to detect encryption 
information for object '.hoodie/hoodie.properties' in bucket 
'delta-data-devo'. Returning object without decryption.
   20/04/22 20:15:31 INFO HoodieTableMetaClient: Finished Loading Table of type 
COPY_ON_WRITE from 
   20/04/22 20:15:31 INFO HoodieTableMetaClient: Loading Active commit timeline 
for 
   20/04/22 20:15:31 INFO HoodieActiveTimeline: Loaded instants 
java.util.stream.ReferencePipeline$Head@678852b5
   20/04/22 20:24:33 INFO HoodieCopyOnWriteTable: Partitions to clean up : 
[2001/05/02, 2001/05/07, 2001/05/09, 2001/05/10, 2001/05/17, 2001/05/18, 
2001/05/21, 2001/06/01, 2001/06/04, 2001/06/08, 2001/06/20, 2001/06/21, 
2001/07/17, 2001/07/23, 2001/07/25, 2001/07/30, 2001/08/02, 2001/08/03, 
2001/08/07, 2001/08/08, 2001/08/09, 2001/08/14, 2001/08/23, 2001/09/05, 
2001/09/06, 2001/09/07, 2001/09/13, 2001/09/14, 2001/10/02, 2001/10/03, 
2001/10/04, 2001/10/09, 2001/11/01, 2001/11/09, 2001/11/14, 2001/11/15, 
2001/11/16, 2001/11/19, 2001/11/20, 2001/11/21, 2001/11/27, 2001/11/28, 
2001/11/29, 2001/11/30, 2001/12/03, 2001/12/07, 2001/12/10, 2001/12/11, 
2001/12/12, 2001/12/13, 2001/12/17, 2001/12/20, 2001/12/21, 2001/12/25, 
2001/12/26, 2001/12/27, 2001/12/28, 2001/12/29, 2001/12/31, 2002/01/02, 
2002/01/03, 2002/01/07, 2002/01/08, 2002/01/09, 2002/01/11, 2002/01/13, 
2002/01/14, 2002/01/15, 2002/01/16, 2002/01/17, 2002/01/18, 2002/01/21, 
2002/01/22, 2002/01/23, 2002/01/25, 2002/01/28, 2002/01/29, 2002/01/30, 
2002/02/03, 2002/02/05, 2002/02/06, 2002/02/07, 2002/02/11, 2002/02/12, 
2002/02/14, 2002/02/15, 2002/02/18, 2002/02/19, 2002/02/20, 2002/02/21, 
2002/02/22, 2002/02/26, 2002/03/02, 2002/03/04, 2002/03/06, 2002/03/10, 
2002/03/15, 2002/03/17, 2002/03/19, 2002/03/20, 2002/03/21, 2002/03/22, 
2002/03/25, 2002/03/26, 2002/03/27, 2002/03/28, 2002/03/30, 2002/04/02, 
2002/04/03, 2002/04/04, 2002/04/05, 2002/04/07, 2002/04/09, 2002/04/10, 
2002/04/11, 2002/04/14, 2002/04/16, 2002/04/17, 2002/04/22, 2002/04/23, 
2002/04/25, 2002/04/30, 2002/05/01, 2002/05/02, 2002/05/06, 2002/05/08, 
2002/05/09, 2002/05/12, 2002/05/13, 2002/05/14, 2002/05/17, 2002/05/19, 
2002/05/20, 2002/05/21, 2002/05/22, 

[GitHub] [incubator-hudi] TisonKun commented on issue #1557: [HUDI-834] Concrete signature of HoodieRecordPayload#combineAndGetUpdateValue & HoodieRecordPayload#getInsertValue

2020-04-23 Thread GitBox


TisonKun commented on issue #1557:
URL: https://github.com/apache/incubator-hudi/pull/1557#issuecomment-618436893


   Hold on: if `HoodieRecordPayload` is already user-facing, we cannot change 
the signature of the interface.
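
   For reference, one conventional way to evolve an already user-facing 
interface without breaking implementors is a default-method bridge. A 
hypothetical sketch with illustrative names, not the approach decided for 
#1557:

   ```java
   import java.util.Optional;

   // Hypothetical sketch: adding a new method with a default body keeps
   // existing user implementations of a public interface source- and
   // binary-compatible.
   interface RecordPayload {
     String legacyValue(); // existing method implementors already provide

     // New method: implementors inherit this default instead of breaking.
     default Optional<String> value() {
       return Optional.ofNullable(legacyValue());
     }
   }
   ```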



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] codecov-io commented on issue #1557: [HUDI-834] Concrete signature of HoodieRecordPayload#combineAndGetUpdateValue & HoodieRecordPayload#getInsertValue

2020-04-23 Thread GitBox


codecov-io commented on issue #1557:
URL: https://github.com/apache/incubator-hudi/pull/1557#issuecomment-618435496


   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1557?src=pr=h1) 
Report
   > Merging 
[#1557](https://codecov.io/gh/apache/incubator-hudi/pull/1557?src=pr=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/ddd105bb3119174b613c6917ee25795f2939f430=desc)
 will **decrease** coverage by `0.66%`.
   > The diff coverage is `40.00%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1557/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1557?src=pr=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #1557      +/-   ##
   ============================================
   - Coverage     72.35%   71.69%   -0.67%     
     Complexity      294      294              
   ============================================
     Files           374      378       +4     
     Lines         16377    16549     +172     
     Branches       1650     1670      +20     
   ============================================
   + Hits          11849    11864      +15     
   - Misses         3797     3954     +157     
     Partials        731      731              
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-hudi/pull/1557?src=pr=tree) | 
Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...he/hudi/common/model/EmptyHoodieRecordPayload.java](https://codecov.io/gh/apache/incubator-hudi/pull/1557/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0VtcHR5SG9vZGllUmVjb3JkUGF5bG9hZC5qYXZh)
 | `100.00% <ø> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[...rg/apache/hudi/common/model/HoodieAvroPayload.java](https://codecov.io/gh/apache/incubator-hudi/pull/1557/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0hvb2RpZUF2cm9QYXlsb2FkLmphdmE=)
 | `84.61% <ø> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[.../apache/hudi/common/model/HoodieRecordPayload.java](https://codecov.io/gh/apache/incubator-hudi/pull/1557/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0hvb2RpZVJlY29yZFBheWxvYWQuamF2YQ==)
 | `100.00% <ø> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[...ava/org/apache/hudi/payload/AWSDmsAvroPayload.java](https://codecov.io/gh/apache/incubator-hudi/pull/1557/diff?src=pr=tree#diff-aHVkaS1zcGFyay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9wYXlsb2FkL0FXU0Rtc0F2cm9QYXlsb2FkLmphdmE=)
 | `0.00% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[...i/common/model/OverwriteWithLatestAvroPayload.java](https://codecov.io/gh/apache/incubator-hudi/pull/1557/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL092ZXJ3cml0ZVdpdGhMYXRlc3RBdnJvUGF5bG9hZC5qYXZh)
 | `56.25% <100.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[...di/hadoop/realtime/HoodieRealtimeRecordReader.java](https://codecov.io/gh/apache/incubator-hudi/pull/1557/diff?src=pr=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL3JlYWx0aW1lL0hvb2RpZVJlYWx0aW1lUmVjb3JkUmVhZGVyLmphdmE=)
 | `70.00% <0.00%> (-14.22%)` | `0.00% <0.00%> (ø%)` | |
   | 
[...ain/java/org/apache/hudi/avro/HoodieAvroUtils.java](https://codecov.io/gh/apache/incubator-hudi/pull/1557/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvYXZyby9Ib29kaWVBdnJvVXRpbHMuamF2YQ==)
 | `84.82% <0.00%> (-8.32%)` | `0.00% <0.00%> (ø%)` | |
   | 
[...i/common/table/timeline/TimelineMetadataUtils.java](https://codecov.io/gh/apache/incubator-hudi/pull/1557/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL1RpbWVsaW5lTWV0YWRhdGFVdGlscy5qYXZh)
 | `93.02% <0.00%> (-2.22%)` | `0.00% <0.00%> (ø%)` | |
   | 
[.../org/apache/hudi/table/HoodieCommitArchiveLog.java](https://codecov.io/gh/apache/incubator-hudi/pull/1557/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdGFibGUvSG9vZGllQ29tbWl0QXJjaGl2ZUxvZy5qYXZh)
 | `76.43% <0.00%> (-1.06%)` | `0.00% <0.00%> (ø%)` | |
   | 
[...i/common/table/timeline/HoodieDefaultTimeline.java](https://codecov.io/gh/apache/incubator-hudi/pull/1557/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL0hvb2RpZURlZmF1bHRUaW1lbGluZS5qYXZh)
 | `92.30% <0.00%> (-0.12%)` | `0.00% <0.00%> (ø%)` | |
   | ... and [12 
more](https://codecov.io/gh/apache/incubator-hudi/pull/1557/diff?src=pr=tree-more)
 | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1557?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1557?src=pr=footer).

[jira] [Resolved] (HUDI-761) Organize Rollback/Savepoint/Restore action implementation under a single package

2020-04-23 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar resolved HUDI-761.
-
Resolution: Fixed

> Organize Rollback/Savepoint/Restore action implementation under a single 
> package
> 
>
> Key: HUDI-761
> URL: https://issues.apache.org/jira/browse/HUDI-761
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Code Cleanup, Writer Core
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] badion edited a comment on issue #1550: Hudi 0.5.2 inability to save complex type with nullable = true [SUPPORT]

2020-04-23 Thread GitBox


badion edited a comment on issue #1550:
URL: https://github.com/apache/incubator-hudi/issues/1550#issuecomment-618411983


   @vinothchandar Seems like the issue is gone after building the .jar file 
from commit (merge) _ce0a4c64d07d6eea926d1bfb92b69ae387b88f50_, which 
apparently landed after the _Hudi 0.5.2_ release. One thing to note: we also 
tried the hudi jar from Maven Central, and it seems it doesn't include the 
avro fix yet. 
   
   I think we will wait for the next **release**, which will include those changes.
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] badion commented on issue #1550: Hudi 0.5.2 inability to save complex type with nullable = true [SUPPORT]

2020-04-23 Thread GitBox


badion commented on issue #1550:
URL: https://github.com/apache/incubator-hudi/issues/1550#issuecomment-618411983


   @vinothchandar Seems like the issue is gone after building the .jar file 
from commit (merge) _ce0a4c64d07d6eea926d1bfb92b69ae387b88f50_, which 
apparently landed after the _Hudi 0.5.2_ release. One thing to note: we also 
tried the hudi jar from Maven Central, and it seems it doesn't include the 
avro fix yet. 
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-834) Concrete signature of HoodieRecordPayload#combineAndGetUpdateValue & HoodieRecordPayload#getInsertValue

2020-04-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-834:

Labels: pull-request-available  (was: )

> Concrete signature of HoodieRecordPayload#combineAndGetUpdateValue & 
> HoodieRecordPayload#getInsertValue
> ---
>
> Key: HUDI-834
> URL: https://issues.apache.org/jira/browse/HUDI-834
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>Reporter: Zili Chen
>Priority: Minor
>  Labels: pull-request-available
>
> So far, the return type of {{HoodieRecordPayload#combineAndGetUpdateValue}} & 
> {{HoodieRecordPayload#getInsertValue}} is effectively 
> {{Option<IndexedRecord>}}. Instead of doing an unchecked cast at
> org/apache/hudi/hadoop/realtime/RealtimeCompactedRecordReader.java:88,
> I propose we use {{Option<IndexedRecord>}} as the return type of these two 
> methods, replacing the current generic {{Option}}.
> FYI, I encountered this ticket when trying to get rid of the self type 
> parameter in {{HoodieRecordPayload}}, and found it a bit awkward to do this 
> casting without a self type. Fortunately, we can make the return type 
> concrete directly.
> cc [~vinoth] [~leesf]
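
A hedged sketch of the proposed concrete signatures (inferred from the ticket 
description; the merged change may differ):

{code:java}
import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.generic.IndexedRecord;
import org.apache.hudi.common.util.Option;

// Sketch: with a concrete element type, readers such as
// RealtimeCompactedRecordReader no longer need an unchecked cast.
public interface PayloadSketch {
  Option<IndexedRecord> combineAndGetUpdateValue(IndexedRecord currentValue,
      Schema schema) throws IOException;

  Option<IndexedRecord> getInsertValue(Schema schema) throws IOException;
}
{code}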



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] jenu9417 commented on issue #1528: [SUPPORT] Issue while writing to HDFS via hudi. Only `/.hoodie` folder is written.

2020-04-23 Thread GitBox


jenu9417 commented on issue #1528:
URL: https://github.com/apache/incubator-hudi/issues/1528#issuecomment-618336942


   @lamber-ken  @vinothchandar 
   The above-mentioned suggestions work fine; the write time has now dropped 
drastically.
   Thank you for the continued support.
   
   Closing the ticket, since the original issue is resolved now.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] PhatakN1 commented on issue #1549: Potential issue when using Deltastreamer with DMS

2020-04-23 Thread GitBox


PhatakN1 commented on issue #1549:
URL: https://github.com/apache/incubator-hudi/issues/1549#issuecomment-618292312


   If MOR inserts go to a parquet file but updates go to a log file, then a 
query on the _ro table will show the inserts since the last compaction but not 
the updates. Isn't that providing an inconsistent view of the data? I would 
still see all inserts since the last compaction but none of the updates.
   
   These are the contents of the log file, shown via `show logfile records` in hudi-cli:
   {"_hoodie_commit_time": "20200422083923", "_hoodie_commit_seqno": 
"20200422083923_1_2", "_hoodie_record_key": "11", "_hoodie_partition_path": 
"2019-03-14", "_hoodie_file_name": "c9df1d00-5dda-4bf7-8f27-1d4534bbbe4c-0", 
"dms_received_ts": "2020-04-22T08:38:36.873970Z", "tran_id": 11, "tran_date": 
"2019-03-14", "store_id": 5, "store_city": "CHICAGO", "store_state": "IL", 
"item_code": "XX", "quantity": 15, "total": 106.25, "Op": "D"}
   
   This is the log file metadata
   ║ 20200422083923 │ 1   │ AVRO_DATA_BLOCK │ 
{"SCHEMA":"{\"type\":\"record\",\"name\":\"retail_transactions\",\"fields\":[{\"name\":\"_hoodie_commit_time\",\"type\":[\"null\",\"string\"],\"doc\":\"\",\"default\":null},{\"name\":\"_hoodie_commit_seqno\",\"type\":[\"null\",\"string\"],\"doc\":\"\",\"default\":null},{\"name\":\"_hoodie_record_key\",\"type\":[\"null\",\"string\"],\"doc\":\"\",\"default\":null},{\"name\":\"_hoodie_partition_path\",\"type\":[\"null\",\"string\"],\"doc\":\"\",\"default\":null},{\"name\":\"_hoodie_file_name\",\"type\":[\"null\",\"string\"],\"doc\":\"\",\"default\":null},{\"name\":\"dms_received_ts\",\"type\":\"string\"},{\"name\":\"tran_id\",\"type\":\"int\"},{\"name\":\"tran_date\",\"type\":\"string\"},{\"name\":\"store_id\",\"type\":\"int\"},{\"name\":\"store_city\",\"type\":\"string\"},{\"name\":\"store_state\",\"type\":\"string\"},{\"name\":\"item_code\",\"type\":\"string\"},{\"name\":\"quantity\",\"type\":\"int\"},{\"name\":\"total\",\"type\":\"float\"},{\"name\":\"Op\",\"type\":\"string\"}]}","INSTANT_TIME":"20200422083923"}
 │ {} ║
   
   The name of the parquet file in the partition is 
c9df1d00-5dda-4bf7-8f27-1d4534bbbe4c-0_3-23-40_20200422072539.parquet and the 
log file name is 
.c9df1d00-5dda-4bf7-8f27-1d4534bbbe4c-0_20200422072539.log.1_1-24-33
   
   The partition metadata contents are 
   commitTime=20200422072539
   partitionDepth=1
   
   Not sure why a query on the _rt table does not reflect the delete. 
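
   For context on the delete path: the DMS payload is expected to drop records 
whose "Op" column is "D" at merge time, which is why the deleted row should 
disappear from _rt queries once the log file is merged. A simplified sketch of 
that contract (not the verbatim Hudi source):

   ```java
   import org.apache.avro.generic.GenericRecord;
   import org.apache.avro.generic.IndexedRecord;
   import org.apache.hudi.common.util.Option;

   final class DmsDeleteSketch {
     // Returning Option.empty() from a payload merge tells Hudi to treat the
     // record as deleted in the merged (realtime) view.
     static Option<IndexedRecord> handleDeleteOperation(IndexedRecord value) {
       if (value instanceof GenericRecord) {
         Object op = ((GenericRecord) value).get("Op");
         if (op != null && "D".equalsIgnoreCase(op.toString())) {
           return Option.empty();
         }
       }
       return Option.of(value);
     }
   }
   ```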



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] yanghua commented on issue #1100: [HUDI-289] Implement a test suite to support long running test for Hudi writing and querying end-end

2020-04-23 Thread GitBox


yanghua commented on issue #1100:
URL: https://github.com/apache/incubator-hudi/pull/1100#issuecomment-618251852


   > @yanghua need you to lead the Azure pipelines for the test suite and other 
tickets assigned to you under the umbrella ticket.
   
   @n3nash Thanks for your hard work. I will pick this work up after merging 
this huge PR.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Assigned] (HUDI-396) Provide documentation to describe how to use the test suite

2020-04-23 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang reassigned HUDI-396:
-

Assignee: wangxianghu

> Provide documentation to describe how to use the test suite
> --
>
> Key: HUDI-396
> URL: https://issues.apache.org/jira/browse/HUDI-396
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Docs
>Reporter: vinoyang
>Assignee: wangxianghu
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-591) Support Spark version upgrade

2020-04-23 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang closed HUDI-591.
-
Resolution: Fixed

> Support Spark version upgrade
> -
>
> Key: HUDI-591
> URL: https://issues.apache.org/jira/browse/HUDI-591
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Testing
>Reporter: vinoyang
>Assignee: vinoyang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Recently, the version of Spark that Hudi depends on was bumped from 
> 2.2.x to 2.4.x. However, this test suite was written back when Hudi still 
> depended on Spark 2.2.x. After rebasing the test suite branch onto the 
> master branch, some unit test cases fail. The major reason is that the Avro 
> dependency has become a built-in module in Spark. We need to fix them.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-592) Remove duplicated dependencies in the pom file of test suite module

2020-04-23 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang closed HUDI-592.
-
Resolution: Fixed

> Remove duplicated dependencies in the pom file of test suite module
> ---
>
> Key: HUDI-592
> URL: https://issues.apache.org/jira/browse/HUDI-592
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Testing
>Reporter: vinoyang
>Assignee: vinoyang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> There are some duplicated dependencies in the pom file of the test suite 
> module, such as {{hadoop-hdfs}} and {{hadoop-common}}. We need to remove 
> these dependencies.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-592) Remove duplicated dependencies in the pom file of test suite module

2020-04-23 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-592:
--
Status: Open  (was: New)

> Remove duplicated dependencies in the pom file of test suite module
> ---
>
> Key: HUDI-592
> URL: https://issues.apache.org/jira/browse/HUDI-592
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Testing
>Reporter: vinoyang
>Assignee: vinoyang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> There are some duplicated dependencies in the pom file of the test suite 
> module, such as {{hadoop-hdfs}} and {{hadoop-common}}. We need to remove 
> these dependencies.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-591) Support Spark version upgrade

2020-04-23 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-591:
--
Status: Open  (was: New)

> Support Spark version upgrade
> -
>
> Key: HUDI-591
> URL: https://issues.apache.org/jira/browse/HUDI-591
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Testing
>Reporter: vinoyang
>Assignee: vinoyang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Recently, the version of Spark that Hudi depends on was bumped from 
> 2.2.x to 2.4.x. However, this test suite was written back when Hudi still 
> depended on Spark 2.2.x. After rebasing the test suite branch onto the 
> master branch, some unit test cases fail. The major reason is that the Avro 
> dependency has become a built-in module in Spark. We need to fix them.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[incubator-hudi] branch hudi_test_suite_refactor updated (e7b1474 -> 908e57c)

2020-04-23 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a change to branch hudi_test_suite_refactor
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git.


 discard e7b1474  [HUDI-397]Normalize log print statement (#1224)
omit da3232e  Testing running 3 builds to limit total build time
omit c13e885  [HUDI-394] Provide a basic implementation of test suite
 add ddd105b  [HUDI-772] Make UserDefinedBulkInsertPartitioner configurable 
for DataSource (#1500)
 add 2a2f31d  [MINOR] Remove reduntant code and fix typo in 
HoodieDefaultTimeline (#1535)
 add 332072b  [HUDI-371] Supporting hive combine input format for realtime 
tables (#1503)
 add 84dd904  [HUDI-789]Adjust logic of upsert in HDFSParquetImporter 
(#1511)
 add 62bd3e7  [HUDI-757] Added hudi-cli command to export metadata of 
Instants.
 add 2a56f82  [HUDI-821] Fixing JCommander param parsing in deltastreamer 
(#1525)
 add 6e15eeb  [HUDI-809] Migrate CommonTestHarness to JUnit 5 (#1530)
 add 26684f5  [HUDI-816] Fixed MAX_MEMORY_FOR_MERGE_PROP and 
MAX_MEMORY_FOR_COMPACTION_PROP do not work due to HUDI-678 (#1536)
 add aea7c16  [HUDI-795] Handle auto-deleted empty aux folder (#1515)
 add 19cc15c  [MINOR]: Fix cli docs for DeltaStreamer (#1547)
 add 0c75316  [HUDI-394] Provide a basic implementation of test suite
 add 7ab93b0  Testing running 3 builds to limit total build time
 add 908e57c  [HUDI-397]Normalize log print statement (#1224)

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (e7b1474)
\
 N -- N -- N   refs/heads/hudi_test_suite_refactor (908e57c)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.

Any revisions marked "omit" are not gone; other references still
refer to them.  Any revisions marked "discard" are gone forever.

No new revisions were added by this update.

Summary of changes:
 .../apache/hudi/cli/commands/ExportCommand.java| 231 
 .../apache/hudi/client/utils/SparkConfigUtils.java |  10 +-
 .../org/apache/hudi/config/HoodieWriteConfig.java  |  10 +
 .../apache/hudi/table/HoodieCommitArchiveLog.java  |  24 +-
 .../hudi/client/utils/TestSparkConfigUtils.java|  65 +++
 .../hudi/common/HoodieMergeOnReadTestUtils.java|   4 +-
 .../java/org/apache/hudi/avro/HoodieAvroUtils.java |  29 +
 .../table/timeline/HoodieDefaultTimeline.java  |   6 +-
 .../table/timeline/TimelineMetadataUtils.java  |   4 +
 .../hudi/common/util/collection/ArrayUtils.java|  62 ++
 .../common/table/TestHoodieTableMetaClient.java|  54 +-
 .../hudi/common/table/TestTimelineLayout.java  |  24 +-
 .../table/view/TestHoodieTableFileSystemView.java  | 335 ++-
 .../table/view/TestRocksDbBasedFileSystemView.java |   4 +-
 .../HoodieCommonTestHarnessJunit5.java}|  33 +-
 .../apache/hudi/common/util/TestFileIOUtils.java   |  20 +-
 hudi-hadoop-mr/pom.xml |   8 +-
 .../hadoop/hive/HoodieCombineHiveInputFormat.java  | 626 -
 .../hive/HoodieCombineRealtimeFileSplit.java   | 169 ++
 .../hive/HoodieCombineRealtimeHiveSplit.java   |  27 +-
 .../realtime/AbstractRealtimeRecordReader.java |   3 +
 .../HoodieCombineRealtimeRecordReader.java | 103 
 .../realtime/HoodieParquetRealtimeInputFormat.java |   2 +-
 .../realtime/HoodieRealtimeRecordReader.java   |   1 +
 .../realtime/RealtimeUnmergedRecordReader.java |  22 +-
 .../apache/hudi/hadoop/InputFormatTestUtil.java| 165 --
 .../hudi/hadoop/TestHoodieParquetInputFormat.java  |  99 ++--
 .../hudi/hadoop/TestHoodieROTablePathFilter.java   |  26 +-
 .../realtime/TestHoodieCombineHiveInputFormat.java | 156 +
 .../realtime/TestHoodieRealtimeRecordReader.java   | 206 +++
 .../main/java/org/apache/hudi/DataSourceUtils.java |  27 +-
 hudi-spark/src/test/java/DataSourceTestUtils.java  |  13 +
 hudi-spark/src/test/java/DataSourceUtilsTest.java  |  86 +++
 .../apache/hudi/utilities/HDFSParquetImporter.java |  22 +-
 .../deltastreamer/HoodieDeltaStreamer.java |  18 +-
 .../HoodieMultiTableDeltaStreamer.java |   5 +-
 .../hudi/utilities/TestHDFSParquetImporter.java| 255 +++--
 .../hudi/utilities/TestHoodieSnapshotCopier.java   |  22 +-
 .../TestKafkaConnectHdfsProvider.java  |  20 +-
 39 files changed, 2125 insertions(+), 871 deletions(-)
 create mode 100644 
hudi-cli/src/main/java/org/apache/hudi/cli/commands/ExportCommand.java
 create mode 100644 
hudi-client/src/test/java/org/apache/hudi/client/utils/TestSparkConfigUtils.java
 create mode 100644 

[jira] [Updated] (HUDI-704) Add unit test for RepairsCommand

2020-04-23 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-704:
--
Status: Open  (was: New)

> Add unit test for RepairsCommand
> 
>
> Key: HUDI-704
> URL: https://issues.apache.org/jira/browse/HUDI-704
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>Reporter: hong dongdong
>Assignee: hong dongdong
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-397) Normalize log print statement

2020-04-23 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang closed HUDI-397.
-
Fix Version/s: 0.6.0
   Resolution: Done

Done via hudi_test_suite_refactor branch: 
e7b1474e4e0eedc98e1883d2d8f27469368f141b

> Normalize log print statement
> -
>
> Key: HUDI-397
> URL: https://issues.apache.org/jira/browse/HUDI-397
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Testing
>Reporter: vinoyang
>Assignee: wangxianghu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> In the test suite module, there are many logging statements that look like 
> this pattern:
> {code:java}
> log.info(String.format("- inserting input data %s --", this.getName()));
> {code}
> IMO, it's not a good design. We need to refactor it.
>  
>  
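
A hedged sketch of one such normalization, assuming SLF4J-style parameterized 
logging (the convention actually adopted in #1224 may differ):

{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

final class LoggingSketch {
  private static final Logger LOG = LoggerFactory.getLogger(LoggingSketch.class);

  void insertInput(String name) {
    // Before: log.info(String.format("- inserting input data %s --", name));
    // After: the framework formats lazily, only if INFO is enabled.
    LOG.info("Inserting input data {}", name);
  }
}
{code}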



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[incubator-hudi] branch hudi_test_suite_refactor updated (da3232e -> e7b1474)

2020-04-23 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a change to branch hudi_test_suite_refactor
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git.


from da3232e  Testing running 3 builds to limit total build time
 add e7b1474  [HUDI-397]Normalize log print statement (#1224)

No new revisions were added by this update.

Summary of changes:
 .../apache/hudi/testsuite/dag/nodes/BulkInsertNode.java  |  2 +-
 .../org/apache/hudi/testsuite/dag/nodes/CleanNode.java   |  2 +-
 .../org/apache/hudi/testsuite/dag/nodes/CompactNode.java |  2 +-
 .../org/apache/hudi/testsuite/dag/nodes/DagNode.java |  6 +++---
 .../apache/hudi/testsuite/dag/nodes/HiveQueryNode.java   |  6 +++---
 .../apache/hudi/testsuite/dag/nodes/HiveSyncNode.java|  2 +-
 .../org/apache/hudi/testsuite/dag/nodes/InsertNode.java  |  6 +++---
 .../apache/hudi/testsuite/dag/nodes/RollbackNode.java|  4 ++--
 .../hudi/testsuite/dag/nodes/ScheduleCompactNode.java|  4 ++--
 .../hudi/testsuite/dag/nodes/SparkSQLQueryNode.java  |  4 ++--
 .../org/apache/hudi/testsuite/dag/nodes/UpsertNode.java  |  4 ++--
 .../hudi/testsuite/dag/scheduler/DagScheduler.java   | 12 ++--
 .../apache/hudi/testsuite/generator/DeltaGenerator.java  |  6 +++---
 .../generator/GenericRecordFullPayloadGenerator.java | 10 +-
 .../apache/hudi/testsuite/job/HoodieTestSuiteJob.java| 10 +-
 .../testsuite/reader/DFSHoodieDatasetInputReader.java| 16 
 .../hudi/testsuite/writer/AvroDeltaInputWriter.java  |  7 ---
 .../reader/TestDFSHoodieDatasetInputReader.java  |  1 +
 18 files changed, 53 insertions(+), 51 deletions(-)