[GitHub] [incubator-hudi] lamber-ken opened a new pull request #1209: [CLEAN] Fix partition typo

2020-01-10 Thread GitBox
lamber-ken opened a new pull request #1209: [CLEAN] Fix partition typo
URL: https://github.com/apache/incubator-hudi/pull/1209
 
 
   
   ## What is the purpose of the pull request
   
   Fix `partition` typo.
   
   ## Brief change log
   
 - *Fix `partition` typo.*
   
   ## Verify this pull request
   
   This pull request is code cleanup without any test coverage.
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] leesf commented on a change in pull request #1208: [HUDI-304] Bring back spotless plugin

2020-01-10 Thread GitBox
leesf commented on a change in pull request #1208: [HUDI-304] Bring back 
spotless plugin
URL: https://github.com/apache/incubator-hudi/pull/1208#discussion_r365497823
 
 

 ##
 File path: style/eclipse-java-google-style.xml
 ##
 @@ -0,0 +1,351 @@
+
+
 
 Review comment:
   Seems that the Apache License is OK; also checked Apache Avro: 
https://github.com/apache/avro/blob/8026c8ffe4ef67ab419dba73910636bf2c1a691c/lang/java/eclipse-java-formatter.xml
 cc @bvaradar 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] leesf commented on a change in pull request #1208: [HUDI-304] Bring back spotless plugin

2020-01-10 Thread GitBox
leesf commented on a change in pull request #1208: [HUDI-304] Bring back 
spotless plugin
URL: https://github.com/apache/incubator-hudi/pull/1208#discussion_r365497747
 
 

 ##
 File path: style/checkstyle.xml
 ##
 @@ -61,10 +61,11 @@
 
 
 
+
 
 Review comment:
   Commented it out intentionally.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Updated] (HUDI-304) Bring back spotless plugin

2020-01-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-304:

Labels: pull-request-available  (was: )

> Bring back spotless plugin 
> ---
>
> Key: HUDI-304
> URL: https://issues.apache.org/jira/browse/HUDI-304
> Project: Apache Hudi (incubating)
>  Issue Type: Task
>  Components: Code Cleanup, Testing
>Reporter: Balaji Varadarajan
>Assignee: leesf
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>
> The spotless plugin has been turned off because the eclipse style format it was 
> referencing was removed for compliance reasons. 
> We use the google style eclipse format with some changes. (The XML in the diff 
> below was stripped by the mail renderer; what survives shows a change at line 90 
> whose content is gone, and a formatter setting at line 242 whose value was 
> changed from "100" to "120".)
>  
> The eclipse style sheet was originally obtained from 
> [https://github.com/google/styleguide], which is under the CC-BY 3.0 license and is 
> not compatible with source distribution (see 
> [https://www.apache.org/legal/resolved.html#cc-by]). 
>  
> We need to figure out a way to bring this back
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] leesf opened a new pull request #1208: [HUDI-304] Bring back spotless plugin

2020-01-10 Thread GitBox
leesf opened a new pull request #1208: [HUDI-304] Bring back spotless plugin
URL: https://github.com/apache/incubator-hudi/pull/1208
 
 
   ## What is the purpose of the pull request
   Bring back the spotless plugin, first introduced in #945. Running `mvn 
spotless:apply` fixes code format errors automatically. 
   
   ## Brief change log
   
 - *Introduce eclipse-java-google-style.xml*
 - *Comment LineLength module in checkstyle.xml*
   
   ## Verify this pull request
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] wangxianghu commented on a change in pull request #1207: [HUDI-458] Redo hudi-hadoop-mr log statements using SLF4J

2020-01-10 Thread GitBox
wangxianghu commented on a change in pull request #1207: [HUDI-458] Redo 
hudi-hadoop-mr log statements using SLF4J
URL: https://github.com/apache/incubator-hudi/pull/1207#discussion_r365497345
 
 

 ##
 File path: 
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/hive/HoodieCombineHiveInputFormat.java
 ##
 @@ -555,8 +533,8 @@ public int hashCode() {
 // Store the previous value for the path specification
 String oldPaths = 
job.get(org.apache.hadoop.mapreduce.lib.input.FileInputFormat.INPUT_DIR);
 if (LOG.isDebugEnabled()) {
-  LOG.debug("The received input paths are: [" + oldPaths + "] against the 
property "
-  + org.apache.hadoop.mapreduce.lib.input.FileInputFormat.INPUT_DIR);
+  LOG.debug("The received input paths are: [{}] against the property {}", 
oldPaths,
+  org.apache.hadoop.mapreduce.lib.input.FileInputFormat.INPUT_DIR);
 
 Review comment:
   OK, thanks, I will fix it.
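
   For readers skimming the thread, here is a minimal sketch of the parameterized SLF4J logging style this PR converts to (the class and variable names below are illustrative, not taken from the Hudi codebase):

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class InputPathLoggingSketch {

  private static final Logger LOG = LoggerFactory.getLogger(InputPathLoggingSketch.class);

  public void logPaths(String oldPaths, String inputDirProperty) {
    // String concatenation builds the full message even when DEBUG is disabled;
    // the {} placeholders defer formatting until the logger decides to emit the line.
    if (LOG.isDebugEnabled()) {
      LOG.debug("The received input paths are: [{}] against the property {}", oldPaths, inputDirProperty);
    }
  }
}
```

   The isDebugEnabled() guard is optional once placeholders are used, but keeping it matches the surrounding Hudi code shown in the diff.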


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] yanghua commented on issue #1207: [HUDI-458] Redo hudi-hadoop-mr log statements using SLF4J

2020-01-10 Thread GitBox
yanghua commented on issue #1207: [HUDI-458] Redo hudi-hadoop-mr log statements 
using SLF4J
URL: https://github.com/apache/incubator-hudi/pull/1207#issuecomment-573278027
 
 
   @wangxianghu The Travis build has failed. Please recheck your PR.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] hmatu commented on a change in pull request #1207: [HUDI-458] Redo hudi-hadoop-mr log statements using SLF4J

2020-01-10 Thread GitBox
hmatu commented on a change in pull request #1207: [HUDI-458] Redo 
hudi-hadoop-mr log statements using SLF4J
URL: https://github.com/apache/incubator-hudi/pull/1207#discussion_r365496893
 
 

 ##
 File path: 
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HoodieROTablePathFilter.java
 ##
 @@ -101,15 +100,15 @@ public boolean accept(Path path) {
   // Try to use the caches.
   if (nonHoodiePathCache.contains(folder.toString())) {
 if (LOG.isDebugEnabled()) {
-  LOG.debug("Accepting non-hoodie path from cache: " + path);
+  LOG.debug("Accepting non-hoodie path from cache: {}", path);
 }
 return true;
   }
 
   if (hoodiePathCache.containsKey(folder.toString())) {
 if (LOG.isDebugEnabled()) {
-  LOG.debug(String.format("%s Hoodie path checked against cache, 
accept => %s \n", path,
-  hoodiePathCache.get(folder.toString()).contains(path)));
+  LOG.debug("{} Hoodie path checked against cache, accept => {} \n", 
path,
 
 Review comment:
   Remove `\n`


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] hmatu commented on a change in pull request #1207: [HUDI-458] Redo hudi-hadoop-mr log statements using SLF4J

2020-01-10 Thread GitBox
hmatu commented on a change in pull request #1207: [HUDI-458] Redo 
hudi-hadoop-mr log statements using SLF4J
URL: https://github.com/apache/incubator-hudi/pull/1207#discussion_r365496931
 
 

 ##
 File path: 
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/hive/HoodieCombineHiveInputFormat.java
 ##
 @@ -555,8 +533,8 @@ public int hashCode() {
 // Store the previous value for the path specification
 String oldPaths = 
job.get(org.apache.hadoop.mapreduce.lib.input.FileInputFormat.INPUT_DIR);
 if (LOG.isDebugEnabled()) {
-  LOG.debug("The received input paths are: [" + oldPaths + "] against the 
property "
-  + org.apache.hadoop.mapreduce.lib.input.FileInputFormat.INPUT_DIR);
+  LOG.debug("The received input paths are: [{}] against the property {}", 
oldPaths,
+  org.apache.hadoop.mapreduce.lib.input.FileInputFormat.INPUT_DIR);
 
 Review comment:
   Code padding.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] hmatu commented on a change in pull request #1207: [HUDI-458] Redo hudi-hadoop-mr log statements using SLF4J

2020-01-10 Thread GitBox
hmatu commented on a change in pull request #1207: [HUDI-458] Redo 
hudi-hadoop-mr log statements using SLF4J
URL: https://github.com/apache/incubator-hudi/pull/1207#discussion_r365496880
 
 

 ##
 File path: 
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HoodieROTablePathFilter.java
 ##
 @@ -144,30 +143,32 @@ public boolean accept(Path path) {
   if (!hoodiePathCache.containsKey(folder.toString())) {
 hoodiePathCache.put(folder.toString(), new HashSet<>());
   }
-  LOG.info("Based on hoodie metadata from base path: " + 
baseDir.toString() + ", caching " + latestFiles.size()
-  + " files under " + folder);
+  LOG.info("Based on hoodie metadata from base path: {}, caching {} 
files under {}",
+  baseDir,
+  latestFiles.size(),
+  folder);
   for (HoodieDataFile lfile : latestFiles) {
 hoodiePathCache.get(folder.toString()).add(new 
Path(lfile.getPath()));
   }
 
   // accept the path, if its among the latest files.
   if (LOG.isDebugEnabled()) {
-LOG.debug(String.format("%s checked after cache population, accept 
=> %s \n", path,
-hoodiePathCache.get(folder.toString()).contains(path)));
+LOG.debug("{} checked after cache population, accept => {} \n", 
path,
+hoodiePathCache.get(folder.toString()).contains(path));
   }
   return hoodiePathCache.get(folder.toString()).contains(path);
 } catch (DatasetNotFoundException e) {
   // Non-hoodie path, accept it.
   if (LOG.isDebugEnabled()) {
-LOG.debug(String.format("(1) Caching non-hoodie path under %s \n", 
folder.toString()));
+LOG.debug("(1) Caching non-hoodie path under {} \n", folder);
 
 Review comment:
   Remove `\n`


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] wangxianghu opened a new pull request #1207: [HUDI-458] Redo hudi-hadoop-mr log statements using SLF4J

2020-01-10 Thread GitBox
wangxianghu opened a new pull request #1207: [HUDI-458] Redo hudi-hadoop-mr log 
statements using SLF4J
URL: https://github.com/apache/incubator-hudi/pull/1207
 
 
   ## What is the purpose of the pull request
   
   *Redo hudi-hadoop-mr log statements using SLF4J*
   
   ## Brief change log
   
   *Redo hudi-hadoop-mr log statements using SLF4J*
   
   ## Verify this pull request
   
   This pull request should be covered by existing tests, such as *(please 
describe tests)*.
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Updated] (HUDI-458) Redo hudi-hadoop-mr log statements using SLF4J

2020-01-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-458:

Labels: pull-request-available  (was: )

> Redo hudi-hadoop-mr log statements using SLF4J
> --
>
> Key: HUDI-458
> URL: https://issues.apache.org/jira/browse/HUDI-458
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Code Cleanup
>Reporter: leesf
>Assignee: wangxianghu
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Build failed in Jenkins: hudi-snapshot-deployment-0.5 #156

2020-01-10 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 2.75 KB...]
guice-4.2.0-no_aop.jar
jansi-1.17.1.jar
jansi-native
javax.inject-1.jar
jcl-over-slf4j-1.7.25.jar
jcl-over-slf4j.license
jsr250-api-1.0.jar
jsr250-api.license
maven-artifact-3.5.4.jar
maven-artifact.license
maven-builder-support-3.5.4.jar
maven-builder-support.license
maven-compat-3.5.4.jar
maven-compat.license
maven-core-3.5.4.jar
maven-core.license
maven-embedder-3.5.4.jar
maven-embedder.license
maven-model-3.5.4.jar
maven-model-builder-3.5.4.jar
maven-model-builder.license
maven-model.license
maven-plugin-api-3.5.4.jar
maven-plugin-api.license
maven-repository-metadata-3.5.4.jar
maven-repository-metadata.license
maven-resolver-api-1.1.1.jar
maven-resolver-api.license
maven-resolver-connector-basic-1.1.1.jar
maven-resolver-connector-basic.license
maven-resolver-impl-1.1.1.jar
maven-resolver-impl.license
maven-resolver-provider-3.5.4.jar
maven-resolver-provider.license
maven-resolver-spi-1.1.1.jar
maven-resolver-spi.license
maven-resolver-transport-wagon-1.1.1.jar
maven-resolver-transport-wagon.license
maven-resolver-util-1.1.1.jar
maven-resolver-util.license
maven-settings-3.5.4.jar
maven-settings-builder-3.5.4.jar
maven-settings-builder.license
maven-settings.license
maven-shared-utils-3.2.1.jar
maven-shared-utils.license
maven-slf4j-provider-3.5.4.jar
maven-slf4j-provider.license
org.eclipse.sisu.inject-0.3.3.jar
org.eclipse.sisu.inject.license
org.eclipse.sisu.plexus-0.3.3.jar
org.eclipse.sisu.plexus.license
plexus-cipher-1.7.jar
plexus-cipher.license
plexus-component-annotations-1.7.1.jar
plexus-component-annotations.license
plexus-interpolation-1.24.jar
plexus-interpolation.license
plexus-sec-dispatcher-1.4.jar
plexus-sec-dispatcher.license
plexus-utils-3.1.0.jar
plexus-utils.license
slf4j-api-1.7.25.jar
slf4j-api.license
wagon-file-3.1.0.jar
wagon-file.license
wagon-http-3.1.0-shaded.jar
wagon-http.license
wagon-provider-api-3.1.0.jar
wagon-provider-api.license

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/ext:
README.txt

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native:
freebsd32
freebsd64
linux32
linux64
osx
README.txt
windows32
windows64

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/osx:
libjansi.jnilib

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows32:
jansi.dll

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows64:
jansi.dll
Finished /home/jenkins/tools/maven/apache-maven-3.5.4 Directory Listing :
Detected current version as: 'HUDI_home= 0.5.1-SNAPSHOT'
[INFO] Scanning for projects...
[INFO] 
[INFO] Reactor Build Order:
[INFO] 
[INFO] Hudi   [pom]
[INFO] hudi-common[jar]
[INFO] hudi-timeline-service  [jar]
[INFO] hudi-hadoop-mr [jar]
[INFO] hudi-client[jar]
[INFO] hudi-hive  [jar]
[INFO] hudi-spark [jar]
[INFO] hudi-utilities [jar]
[INFO] hudi-cli   [jar]
[INFO] hudi-hadoop-mr-bundle  [jar]
[INFO] hudi-hive-bundle   [jar]
[INFO] hudi-spark-bundle  [jar]
[INFO] hudi-presto-bundle [jar]
[INFO] hudi-utilities-bundle  [jar]
[INFO] hudi-timeline-server-bundle[jar]
[INFO] hudi-hadoop-docker [pom]
[INFO] hudi-hadoop-base-docker[pom]
[INFO] hudi-hadoop-namenode-docker[pom]
[INFO] hudi-hadoop-datanode-docker[pom]
[INFO] hudi-hadoop-history-docker [pom]
[INFO] hudi-hadoop-hive-docker[pom]
[INFO] hudi-hadoop-sparkbase-docker   [pom]
[INFO] 

[GitHub] [incubator-hudi] yanghua commented on issue #1191: [HUDI-503] Add hudi test suite documentation into the README file of the test suite module

2020-01-10 Thread GitBox
yanghua commented on issue #1191: [HUDI-503] Add hudi test suite documentation 
into the README file of the test suite module
URL: https://github.com/apache/incubator-hudi/pull/1191#issuecomment-573271577
 
 
   > @yanghua looks good, did you try running it in docker ? Also, can you 
squash your commits and then I can merge this PR ?
   
   Absolutely, I can squash the commits. Sorry, I did not verify those commands 
in the docker. My local docker env always has some problems. Can you help to 
verify them?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] bschell commented on issue #1194: [HUDI-326] Add support to delete records with only record_key

2020-01-10 Thread GitBox
bschell commented on issue #1194: [HUDI-326] Add support to delete records with 
only record_key
URL: https://github.com/apache/incubator-hudi/pull/1194#issuecomment-573269632
 
 
   @n3nash Squashed the commits and updated PR. Also created the ticket here: 
https://issues.apache.org/jira/browse/HUDI-520 to decide on the approach for 
null/empty handling.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] bschell commented on a change in pull request #1194: [HUDI-326] Add support to delete records with only record_key

2020-01-10 Thread GitBox
bschell commented on a change in pull request #1194: [HUDI-326] Add support to 
delete records with only record_key
URL: https://github.com/apache/incubator-hudi/pull/1194#discussion_r365490837
 
 

 ##
 File path: 
hudi-spark/src/main/java/org/apache/hudi/keygen/GlobalDeleteKeyGenerator.java
 ##
 @@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.keygen;
+
+import java.util.Arrays;
+import java.util.List;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.hudi.DataSourceUtils;
+import org.apache.hudi.DataSourceWriteOptions;
+import org.apache.hudi.common.model.HoodieKey;
+import org.apache.hudi.common.util.TypedProperties;
+import org.apache.hudi.exception.HoodieKeyException;
+
+/**
+ * Key generator for deletes using global indices. Global index deletes do not 
require partition value
+ * so this key generator avoids using partition value for generating HoodieKey.
+ */
+public class GlobalDeleteKeyGenerator extends KeyGenerator {
+
+  private static final String EMPTY_PARTITION = "";
+  private static final String NULL_RECORDKEY_PLACEHOLDER = "__null__";
+  private static final String EMPTY_RECORDKEY_PLACEHOLDER = "__empty__";
+
+  protected final List<String> recordKeyFields;
+
+  public GlobalDeleteKeyGenerator(TypedProperties config) {
+super(config);
+this.recordKeyFields = 
Arrays.asList(config.getString(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY()).split(","));
+  }
+
+  @Override
+  public HoodieKey getKey(GenericRecord record) {
+if (recordKeyFields == null) {
+  throw new HoodieKeyException("Unable to find field names for record key 
or partition path in cfg");
+}
+
+boolean keyIsNullEmpty = true;
+StringBuilder recordKey = new StringBuilder();
+for (String recordKeyField : recordKeyFields) {
+  String recordKeyValue = 
DataSourceUtils.getNestedFieldValAsString(record, recordKeyField, true);
+  if (recordKeyValue == null) {
 
 Review comment:
   good point, will do.
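
   As a side note for readers, a minimal sketch of how a key generator like this would be exercised; the property key string is assumed to match DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY(), and the wrapper class is illustrative:

```java
import org.apache.avro.generic.GenericRecord;
import org.apache.hudi.common.model.HoodieKey;
import org.apache.hudi.common.util.TypedProperties;
import org.apache.hudi.keygen.GlobalDeleteKeyGenerator;

public class GlobalDeleteKeyGeneratorExample {

  public static HoodieKey keyFor(GenericRecord record) {
    TypedProperties props = new TypedProperties();
    // Record key field(s) to concatenate; the partition path is ignored for global deletes.
    props.setProperty("hoodie.datasource.write.recordkey.field", "id");

    GlobalDeleteKeyGenerator keyGen = new GlobalDeleteKeyGenerator(props);
    // Returns a HoodieKey carrying the record key and an empty partition path.
    return keyGen.getKey(record);
  }
}
```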


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Updated] (HUDI-248) CLI doesn't allow rolling back a Delta commit

2020-01-10 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-248:
---
Status: Closed  (was: Patch Available)

> CLI doesn't allow rolling back a Delta commit
> -
>
> Key: HUDI-248
> URL: https://issues.apache.org/jira/browse/HUDI-248
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: CLI, Usability
>Reporter: Rahul Bhartia
>Assignee: leesf
>Priority: Minor
>  Labels: aws-emr, pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> [https://github.com/apache/incubator-hudi/blob/master/hudi-cli/src/main/java/org/apache/hudi/cli/commands/CommitsCommand.java#L128]
>  
> When trying to find a match for the passed-in commit value, the "commit rollback" 
> command always defaults to using HoodieTimeline.COMMIT_ACTION - and hence 
> doesn't allow rolling back delta commits.
> Note: Delta Commits can be rolled back using a HoodieWriteClient, so it seems 
> like it's just a matter of having to match against both COMMIT_ACTION and 
> DELTA_COMMIT_ACTION in the CLI.
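
For context, a minimal sketch of the action-agnostic lookup described here (it mirrors the fix that appears later in this digest; the wrapper class and method are illustrative):

```java
import org.apache.hudi.common.table.timeline.HoodieActiveTimeline;
import org.apache.hudi.common.table.timeline.HoodieTimeline;

public class RollbackLookupSketch {

  // True when a completed instant with the given time exists, whatever its action type.
  public static boolean canRollback(HoodieActiveTimeline activeTimeline, String commitTime) {
    HoodieTimeline completed = activeTimeline.getCommitsTimeline().filterCompletedInstants();
    // Filtering by timestamp alone matches both COMMIT_ACTION and DELTA_COMMIT_ACTION instants.
    HoodieTimeline matching = completed.filter(instant -> instant.getTimestamp().equals(commitTime));
    return !matching.empty();
  }
}
```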



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-248) CLI doesn't allow rolling back a Delta commit

2020-01-10 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17013324#comment-17013324
 ] 

leesf commented on HUDI-248:


Fixed via master: 04afac977d4bd615c217349083b5f86cfa8060c4

> CLI doesn't allow rolling back a Delta commit
> -
>
> Key: HUDI-248
> URL: https://issues.apache.org/jira/browse/HUDI-248
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: CLI, Usability
>Reporter: Rahul Bhartia
>Assignee: leesf
>Priority: Minor
>  Labels: aws-emr, pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> [https://github.com/apache/incubator-hudi/blob/master/hudi-cli/src/main/java/org/apache/hudi/cli/commands/CommitsCommand.java#L128]
>  
> When trying to find a match for the passed-in commit value, the "commit rollback" 
> command always defaults to using HoodieTimeline.COMMIT_ACTION - and hence 
> doesn't allow rolling back delta commits.
> Note: Delta Commits can be rolled back using a HoodieWriteClient, so it seems 
> like it's just a matter of having to match against both COMMIT_ACTION and 
> DELTA_COMMIT_ACTION in the CLI.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-513) Hudi Cli can't rollback deltastreamer commit.

2020-01-10 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-513.
--
Resolution: Duplicate

> Hudi Cli can't rollback deltastreamer commit.
> -
>
> Key: HUDI-513
> URL: https://issues.apache.org/jira/browse/HUDI-513
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 0.5.0
>Reporter: Alexander Filipchik
>Priority: Major
>
> commits show returns: 
> CommitTime: 20200106235639
> commit rollback --commit 20200106235639
> fails with:
> Commit 20200107021902 not found in Commits 
> org.apache.hudi.common.table.timeline.HoodieDefaultTimeline: 
> [20200106235639__deltacommit__COMPLETED],
>  
> Probably due to:
> [https://github.com/apache/incubator-hudi/blob/master/hudi-cli/src/main/java/org/apache/hudi/cli/commands/CommitsCommand.java#L99]
>  
> same applies to other commit commands. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-513) Hudi Cli can't rollback deltastreamer commit.

2020-01-10 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-513:
---
Status: Open  (was: New)

> Hudi Cli can't rollback deltastreamer commit.
> -
>
> Key: HUDI-513
> URL: https://issues.apache.org/jira/browse/HUDI-513
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 0.5.0
>Reporter: Alexander Filipchik
>Priority: Major
>
> commits show returns: 
> CommitTime: 20200106235639
> commit rollback --commit 20200106235639
> fails with:
> Commit 20200107021902 not found in Commits 
> org.apache.hudi.common.table.timeline.HoodieDefaultTimeline: 
> [20200106235639__deltacommit__COMPLETED],
>  
> Probably due to:
> [https://github.com/apache/incubator-hudi/blob/master/hudi-cli/src/main/java/org/apache/hudi/cli/commands/CommitsCommand.java#L99]
>  
> same applies to other commit commands. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-469) HoodieCommitMetadata only show first commit insert rows.

2020-01-10 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17013321#comment-17013321
 ] 

leesf commented on HUDI-469:


Fixed via master: b95367d82a4e0c31a526cf8b292388355847e274

> HoodieCommitMetadata only show first commit insert rows. 
> -
>
> Key: HUDI-469
> URL: https://issues.apache.org/jira/browse/HUDI-469
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: CLI
>Reporter: cdmikechen
>Assignee: cdmikechen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When I run the hudi cli to get insert rows, I found that it cannot get 
> insert rows for any commit other than the first one. The 
> {{HoodieCommitMetadata.fetchTotalInsertRecordsWritten()}} method uses 
> {{stat.getPrevCommit().equalsIgnoreCase("null")}} to filter for the first commit. 
> This check should be removed.
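
For context, a minimal sketch of the corrected accumulation (it mirrors commit b95367d referenced above; the wrapper class is illustrative):

```java
import java.util.List;
import java.util.Map;
import org.apache.hudi.common.model.HoodieWriteStat;

public class InsertCountSketch {

  // Sums inserts across all commits instead of only the first one.
  public static long totalInsertRecordsWritten(Map<String, List<HoodieWriteStat>> partitionToWriteStats) {
    long total = 0;
    for (List<HoodieWriteStat> stats : partitionToWriteStats.values()) {
      for (HoodieWriteStat stat : stats) {
        // The old code additionally required stat.getPrevCommit() to equal "null",
        // which only ever matched files written by the first commit.
        if (stat.getPrevCommit() != null) {
          total += stat.getNumInserts();
        }
      }
    }
    return total;
  }
}
```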



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-469) HoodieCommitMetadata only show first commit insert rows.

2020-01-10 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-469:
---
Status: Closed  (was: Patch Available)

> HoodieCommitMetadata only show first commit insert rows. 
> -
>
> Key: HUDI-469
> URL: https://issues.apache.org/jira/browse/HUDI-469
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: CLI
>Reporter: cdmikechen
>Assignee: cdmikechen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When I run the hudi cli to get insert rows, I found that it cannot get 
> insert rows for any commit other than the first one. The 
> {{HoodieCommitMetadata.fetchTotalInsertRecordsWritten()}} method uses 
> {{stat.getPrevCommit().equalsIgnoreCase("null")}} to filter for the first commit. 
> This check should be removed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HUDI-506) Optimize the new website based on feedback

2020-01-10 Thread lamber-ken (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010164#comment-17010164
 ] 

lamber-ken edited comment on HUDI-506 at 1/11/20 1:00 AM:
--

Thanks for your feedback (y)

1. DataGenerator is an inner class in QuickstartUtils.

2. The FAQ page contains the section "How do I model the data stored in Hudi".

3. The Concepts page contains the section about "commit".

4. Done.


was (Author: lamber-ken):
Thanks for your feedback (y)

> Optimize the new website based on feedback
> --
>
> Key: HUDI-506
> URL: https://issues.apache.org/jira/browse/HUDI-506
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Docs
>Reporter: sivabalan narayanan
>Assignee: lamber-ken
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> A few issues.
> 1. Under Quickstart -> Set up spark shell:
>  "Data-generator" links to QuickStartUtils? Is that intended?
> 2. Under Quickstart -> Insert data:
> shouldn't "Modelling data stored in hudi" link to 
> "https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=113709185#FAQ-HowdoImodelthedatastoredinHudi"?
>  Why link it to the general FAQ page? 
> 3. Under Quickstart -> Update data:
> "commit" links to the "concepts" page. Is that intended? 
> 4. The "file a jira" link takes you to the summary page in hudi. Should we fix it 
> to launch "create new ticket" with some fields (like labels or tags as 
> needed) auto-populated so that we can track them? 
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-520) Decide on keyGenerator strategy for handling null/empty recordkeys

2020-01-10 Thread Brandon Scheller (Jira)
Brandon Scheller created HUDI-520:
-

 Summary: Decide on keyGenerator strategy for handling null/empty 
recordkeys 
 Key: HUDI-520
 URL: https://issues.apache.org/jira/browse/HUDI-520
 Project: Apache Hudi (incubating)
  Issue Type: Bug
Reporter: Brandon Scheller


Currently key-generator implementations write out "__null__" for null values 
and "__empty__" for empty values in order to provide a distinction between the two. 
This can add extra overhead to large data lakes that might not use this 
distinction.

This Jira is to decide on a consistent strategy for handling null/empty record 
keys in key generators.

 

The current strategy can be seen within ComplexKeyGenerator
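
As a concrete illustration of that strategy, here is a small sketch based on the placeholder constants visible in the GlobalDeleteKeyGenerator diff earlier in this digest (the helper itself is not Hudi code):

```java
public class RecordKeyPlaceholderSketch {

  private static final String NULL_RECORDKEY_PLACEHOLDER = "__null__";
  private static final String EMPTY_RECORDKEY_PLACEHOLDER = "__empty__";

  // A field "id" whose value is null becomes "id:__null__" and an empty string
  // becomes "id:__empty__", so the two cases remain distinguishable in the key.
  public static String encodeField(String field, String value) {
    if (value == null) {
      return field + ":" + NULL_RECORDKEY_PLACEHOLDER;
    }
    if (value.isEmpty()) {
      return field + ":" + EMPTY_RECORDKEY_PLACEHOLDER;
    }
    return field + ":" + value;
  }
}
```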



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] lamber-ken commented on issue #1196: [feedback] new site feedback

2020-01-10 Thread GitBox
lamber-ken commented on issue #1196: [feedback] new site feedback
URL: https://github.com/apache/incubator-hudi/issues/1196#issuecomment-573258075
 
 
   
![image](https://user-images.githubusercontent.com/20113411/72195451-51ef8800-344d-11ea-98ff-da5e109cb836.png)
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] n3nash commented on a change in pull request #1005: [HUDI-91][HUDI-12]Migrate to spark 2.4.4, migrate to spark-avro library instead of databricks-avro, add support for Decima

2020-01-10 Thread GitBox
n3nash commented on a change in pull request #1005: [HUDI-91][HUDI-12]Migrate 
to spark 2.4.4, migrate to spark-avro library instead of databricks-avro, add 
support for Decimal/Date types
URL: https://github.com/apache/incubator-hudi/pull/1005#discussion_r365480126
 
 

 ##
 File path: packaging/hudi-hadoop-mr-bundle/pom.xml
 ##
 @@ -143,17 +143,7 @@
 
   org.apache.avro
   avro
-  ${mr.bundle.avro.scope}
+  compile
 
   
-
-  
-
-  mr-bundle-shade-avro
 
 Review comment:
   @vinothchandar @bvaradar Yes, this will affect the custom payload 
implementation on the reader side. But we are anyways going to make some 
changes in how the payload packages are loaded so we should be able to absorb 
this change as part of those considerations.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[incubator-hudi] branch master updated: [CLEAN] replace utf-8 constant with StandardCharsets.UTF_8

2020-01-10 Thread nagarwal
This is an automated email from the ASF dual-hosted git repository.

nagarwal pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new e103165  [CLEAN] replace utf-8 constant with StandardCharsets.UTF_8
e103165 is described below

commit e1031650833e3d359e00473884ceac272eac3afa
Author: lamber-ken 
AuthorDate: Wed Jan 8 11:12:10 2020 +0800

[CLEAN] replace utf-8 constant with StandardCharsets.UTF_8
---
 .../main/java/org/apache/hudi/common/model/HoodieCommitMetadata.java | 4 ++--
 .../src/main/java/org/apache/hudi/common/util/HoodieAvroUtils.java   | 5 +++--
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git 
a/hudi-common/src/main/java/org/apache/hudi/common/model/HoodieCommitMetadata.java
 
b/hudi-common/src/main/java/org/apache/hudi/common/model/HoodieCommitMetadata.java
index fcca320..f16ef2f 100644
--- 
a/hudi-common/src/main/java/org/apache/hudi/common/model/HoodieCommitMetadata.java
+++ 
b/hudi-common/src/main/java/org/apache/hudi/common/model/HoodieCommitMetadata.java
@@ -30,7 +30,7 @@ import org.apache.log4j.Logger;
 
 import java.io.IOException;
 import java.io.Serializable;
-import java.nio.charset.Charset;
+import java.nio.charset.StandardCharsets;
 import java.util.ArrayList;
 import java.util.HashMap;
 import java.util.List;
@@ -327,7 +327,7 @@ public class HoodieCommitMetadata implements Serializable {
 
  public static <T> T fromBytes(byte[] bytes, Class<T> clazz) throws IOException {
 try {
-  return fromJsonString(new String(bytes, Charset.forName("utf-8")), 
clazz);
+  return fromJsonString(new String(bytes, StandardCharsets.UTF_8), clazz);
 } catch (Exception e) {
   throw new IOException("unable to read commit metadata", e);
 }
diff --git 
a/hudi-common/src/main/java/org/apache/hudi/common/util/HoodieAvroUtils.java 
b/hudi-common/src/main/java/org/apache/hudi/common/util/HoodieAvroUtils.java
index 40f907a..d030ce8 100644
--- a/hudi-common/src/main/java/org/apache/hudi/common/util/HoodieAvroUtils.java
+++ b/hudi-common/src/main/java/org/apache/hudi/common/util/HoodieAvroUtils.java
@@ -40,6 +40,7 @@ import java.io.ByteArrayOutputStream;
 import java.io.IOException;
 import java.io.InputStream;
 import java.io.OutputStream;
+import java.nio.charset.StandardCharsets;
 import java.util.ArrayList;
 import java.util.Arrays;
 import java.util.List;
@@ -217,7 +218,7 @@ public class HoodieAvroUtils {
 ByteArrayOutputStream baos = new ByteArrayOutputStream();
 try {
   OutputStream out = new DeflaterOutputStream(baos);
-  out.write(text.getBytes("UTF-8"));
+  out.write(text.getBytes(StandardCharsets.UTF_8));
   out.close();
 } catch (IOException e) {
   throw new HoodieIOException("IOException while compressing text " + 
text, e);
@@ -234,7 +235,7 @@ public class HoodieAvroUtils {
   while ((len = in.read(buffer)) > 0) {
 baos.write(buffer, 0, len);
   }
-  return new String(baos.toByteArray(), "UTF-8");
+  return new String(baos.toByteArray(), StandardCharsets.UTF_8);
 } catch (IOException e) {
   throw new HoodieIOException("IOException while decompressing text", e);
 }



[GitHub] [incubator-hudi] n3nash merged pull request #1204: [CLEAN] replace utf-8 constant with StandardCharsets.UTF_8

2020-01-10 Thread GitBox
n3nash merged pull request #1204: [CLEAN] replace utf-8 constant with 
StandardCharsets.UTF_8
URL: https://github.com/apache/incubator-hudi/pull/1204
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Updated] (HUDI-469) HoodieCommitMetadata only show first commit insert rows.

2020-01-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-469:

Labels: pull-request-available  (was: )

> HoodieCommitMetadata only show first commit insert rows. 
> -
>
> Key: HUDI-469
> URL: https://issues.apache.org/jira/browse/HUDI-469
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: CLI
>Reporter: cdmikechen
>Assignee: cdmikechen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>
> When I run the hudi cli to get insert rows, I found that it cannot get 
> insert rows for any commit other than the first one. The 
> {{HoodieCommitMetadata.fetchTotalInsertRecordsWritten()}} method uses 
> {{stat.getPrevCommit().equalsIgnoreCase("null")}} to filter for the first commit. 
> This check should be removed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] n3nash merged pull request #1119: [HUDI-469] Fix: HoodieCommitMetadata only show first commit insert rows.

2020-01-10 Thread GitBox
n3nash merged pull request #1119: [HUDI-469] Fix: HoodieCommitMetadata only 
show first commit insert rows.
URL: https://github.com/apache/incubator-hudi/pull/1119
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[incubator-hudi] branch master updated: [HUDI-469] Fix: HoodieCommitMetadata only show first commit insert rows.

2020-01-10 Thread nagarwal
This is an automated email from the ASF dual-hosted git repository.

nagarwal pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new b95367d  [HUDI-469] Fix: HoodieCommitMetadata only show first commit 
insert rows.
b95367d is described below

commit b95367d82a4e0c31a526cf8b292388355847e274
Author: Thinking <744417...@qq.com>
AuthorDate: Fri Jan 10 16:29:51 2020 +0800

[HUDI-469] Fix: HoodieCommitMetadata only show first commit insert rows.
---
 .../main/java/org/apache/hudi/common/model/HoodieCommitMetadata.java   | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git 
a/hudi-common/src/main/java/org/apache/hudi/common/model/HoodieCommitMetadata.java
 
b/hudi-common/src/main/java/org/apache/hudi/common/model/HoodieCommitMetadata.java
index 475f75c..fcca320 100644
--- 
a/hudi-common/src/main/java/org/apache/hudi/common/model/HoodieCommitMetadata.java
+++ 
b/hudi-common/src/main/java/org/apache/hudi/common/model/HoodieCommitMetadata.java
@@ -175,7 +175,8 @@ public class HoodieCommitMetadata implements Serializable {
 long totalInsertRecordsWritten = 0;
 for (List<HoodieWriteStat> stats : partitionToWriteStats.values()) {
   for (HoodieWriteStat stat : stats) {
-if (stat.getPrevCommit() != null && 
stat.getPrevCommit().equalsIgnoreCase("null")) {
+// determine insert rows in every file
+if (stat.getPrevCommit() != null) {
   totalInsertRecordsWritten += stat.getNumInserts();
 }
   }



[GitHub] [incubator-hudi] n3nash commented on a change in pull request #1119: [HUDI-469] Fix: HoodieCommitMetadata only show first commit insert rows.

2020-01-10 Thread GitBox
n3nash commented on a change in pull request #1119: [HUDI-469] Fix: 
HoodieCommitMetadata only show first commit insert rows.
URL: https://github.com/apache/incubator-hudi/pull/1119#discussion_r365478532
 
 

 ##
 File path: 
hudi-common/src/main/java/org/apache/hudi/common/model/HoodieCommitMetadata.java
 ##
 @@ -175,7 +175,8 @@ public long fetchTotalInsertRecordsWritten() {
 long totalInsertRecordsWritten = 0;
 for (List<HoodieWriteStat> stats : partitionToWriteStats.values()) {
   for (HoodieWriteStat stat : stats) {
-if (stat.getPrevCommit() != null && 
stat.getPrevCommit().equalsIgnoreCase("null")) {
+// It was only possible to determine the number of rows to insert for 
the first commit before. Currently, this problem is fixed
 
 Review comment:
   Thank you!


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] n3nash commented on issue #1191: [HUDI-503] Add hudi test suite documentation into the README file of the test suite module

2020-01-10 Thread GitBox
n3nash commented on issue #1191: [HUDI-503] Add hudi test suite documentation 
into the README file of the test suite module
URL: https://github.com/apache/incubator-hudi/pull/1191#issuecomment-573254806
 
 
   @yanghua looks good, did you try running it in docker ? Also, can you squash 
your commits and then I can merge this PR ?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] n3nash commented on a change in pull request #1194: [HUDI-326] Add support to delete records with only record_key

2020-01-10 Thread GitBox
n3nash commented on a change in pull request #1194: [HUDI-326] Add support to 
delete records with only record_key
URL: https://github.com/apache/incubator-hudi/pull/1194#discussion_r365478024
 
 

 ##
 File path: 
hudi-spark/src/main/java/org/apache/hudi/keygen/GlobalDeleteKeyGenerator.java
 ##
 @@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.keygen;
+
+import java.util.Arrays;
+import java.util.List;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.hudi.DataSourceUtils;
+import org.apache.hudi.DataSourceWriteOptions;
+import org.apache.hudi.common.model.HoodieKey;
+import org.apache.hudi.common.util.TypedProperties;
+import org.apache.hudi.exception.HoodieKeyException;
+
+/**
+ * Key generator for deletes using global indices. Global index deletes do not 
require partition value
+ * so this key generator avoids using partition value for generating HoodieKey.
+ */
+public class GlobalDeleteKeyGenerator extends KeyGenerator {
+
+  private static final String EMPTY_PARTITION = "";
+  private static final String NULL_RECORDKEY_PLACEHOLDER = "__null__";
+  private static final String EMPTY_RECORDKEY_PLACEHOLDER = "__empty__";
+
+  protected final List<String> recordKeyFields;
+
+  public GlobalDeleteKeyGenerator(TypedProperties config) {
+super(config);
+this.recordKeyFields = 
Arrays.asList(config.getString(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY()).split(","));
+  }
+
+  @Override
+  public HoodieKey getKey(GenericRecord record) {
+if (recordKeyFields == null) {
+  throw new HoodieKeyException("Unable to find field names for record key 
or partition path in cfg");
+}
+
+boolean keyIsNullEmpty = true;
+StringBuilder recordKey = new StringBuilder();
+for (String recordKeyField : recordKeyFields) {
+  String recordKeyValue = 
DataSourceUtils.getNestedFieldValAsString(record, recordKeyField, true);
+  if (recordKeyValue == null) {
 
 Review comment:
   Interesting, I think we should fix the ComplexKeyGenerator as well, writing 
extra bytes like "__null__" just to denote null values doesn't seem like the 
best idea for data lakes. 
   Can you open a ticket for addressing that ? We can for now keep this 
implementation consistent with ComplexKeyGenerator. Once you open the ticket, 
please link it here and squash your commits to 1 then I can merge this PR, 
thanks


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] n3nash commented on a change in pull request #1194: [HUDI-326] Add support to delete records with only record_key

2020-01-10 Thread GitBox
n3nash commented on a change in pull request #1194: [HUDI-326] Add support to 
delete records with only record_key
URL: https://github.com/apache/incubator-hudi/pull/1194#discussion_r365478024
 
 

 ##
 File path: 
hudi-spark/src/main/java/org/apache/hudi/keygen/GlobalDeleteKeyGenerator.java
 ##
 @@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.keygen;
+
+import java.util.Arrays;
+import java.util.List;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.hudi.DataSourceUtils;
+import org.apache.hudi.DataSourceWriteOptions;
+import org.apache.hudi.common.model.HoodieKey;
+import org.apache.hudi.common.util.TypedProperties;
+import org.apache.hudi.exception.HoodieKeyException;
+
+/**
+ * Key generator for deletes using global indices. Global index deletes do not 
require partition value
+ * so this key generator avoids using partition value for generating HoodieKey.
+ */
+public class GlobalDeleteKeyGenerator extends KeyGenerator {
+
+  private static final String EMPTY_PARTITION = "";
+  private static final String NULL_RECORDKEY_PLACEHOLDER = "__null__";
+  private static final String EMPTY_RECORDKEY_PLACEHOLDER = "__empty__";
+
+  protected final List<String> recordKeyFields;
+
+  public GlobalDeleteKeyGenerator(TypedProperties config) {
+super(config);
+this.recordKeyFields = 
Arrays.asList(config.getString(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY()).split(","));
+  }
+
+  @Override
+  public HoodieKey getKey(GenericRecord record) {
+if (recordKeyFields == null) {
+  throw new HoodieKeyException("Unable to find field names for record key 
or partition path in cfg");
+}
+
+boolean keyIsNullEmpty = true;
+StringBuilder recordKey = new StringBuilder();
+for (String recordKeyField : recordKeyFields) {
+  String recordKeyValue = 
DataSourceUtils.getNestedFieldValAsString(record, recordKeyField, true);
+  if (recordKeyValue == null) {
 
 Review comment:
   Interesting, I think we should fix the ComplexKeyGenerator as well, writing 
extra bytes like "__null__" just to denote null values doesn't seem like the 
best idea for data lakes. 
   Can you open a ticket for addressing that ? We can for now keep this 
implementation consistent with ComplexKeyGenerator.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] n3nash commented on a change in pull request #1194: [HUDI-326] Add support to delete records with only record_key

2020-01-10 Thread GitBox
n3nash commented on a change in pull request #1194: [HUDI-326] Add support to 
delete records with only record_key
URL: https://github.com/apache/incubator-hudi/pull/1194#discussion_r365478024
 
 

 ##
 File path: 
hudi-spark/src/main/java/org/apache/hudi/keygen/GlobalDeleteKeyGenerator.java
 ##
 @@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.keygen;
+
+import java.util.Arrays;
+import java.util.List;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.hudi.DataSourceUtils;
+import org.apache.hudi.DataSourceWriteOptions;
+import org.apache.hudi.common.model.HoodieKey;
+import org.apache.hudi.common.util.TypedProperties;
+import org.apache.hudi.exception.HoodieKeyException;
+
+/**
+ * Key generator for deletes using global indices. Global index deletes do not 
require partition value
+ * so this key generator avoids using partition value for generating HoodieKey.
+ */
+public class GlobalDeleteKeyGenerator extends KeyGenerator {
+
+  private static final String EMPTY_PARTITION = "";
+  private static final String NULL_RECORDKEY_PLACEHOLDER = "__null__";
+  private static final String EMPTY_RECORDKEY_PLACEHOLDER = "__empty__";
+
+  protected final List<String> recordKeyFields;
+
+  public GlobalDeleteKeyGenerator(TypedProperties config) {
+super(config);
+this.recordKeyFields = 
Arrays.asList(config.getString(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY()).split(","));
+  }
+
+  @Override
+  public HoodieKey getKey(GenericRecord record) {
+if (recordKeyFields == null) {
+  throw new HoodieKeyException("Unable to find field names for record key 
or partition path in cfg");
+}
+
+boolean keyIsNullEmpty = true;
+StringBuilder recordKey = new StringBuilder();
+for (String recordKeyField : recordKeyFields) {
+  String recordKeyValue = 
DataSourceUtils.getNestedFieldValAsString(record, recordKeyField, true);
+  if (recordKeyValue == null) {
 
 Review comment:
   Interesting, I think we should fix the ComplexKeyGenerator as well, writing 
extra bytes like "__null__" just to denote null values doesn't seem like the 
best idea for data lakes. 
   Can you open a ticket for addressing that ? We can for now keep this 
implementation consistent with ComplexKeyGenerator. Once you open the ticket, 
please link it here and then I can merge this PR, thanks


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[incubator-hudi] branch master updated: [HUDI-248] CLI doesn't allow rolling back a Delta commit

2020-01-10 Thread nagarwal
This is an automated email from the ASF dual-hosted git repository.

nagarwal pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 04afac9  [HUDI-248] CLI doesn't allow rolling back a Delta commit
04afac9 is described below

commit 04afac977d4bd615c217349083b5f86cfa8060c4
Author: leesf <490081...@qq.com>
AuthorDate: Thu Jan 9 17:43:34 2020 +0800

[HUDI-248] CLI doesn't allow rolling back a Delta commit
---
 .../main/java/org/apache/hudi/cli/commands/CommitsCommand.java   | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git 
a/hudi-cli/src/main/java/org/apache/hudi/cli/commands/CommitsCommand.java 
b/hudi-cli/src/main/java/org/apache/hudi/cli/commands/CommitsCommand.java
index a7cf32a..c0f8ead 100644
--- a/hudi-cli/src/main/java/org/apache/hudi/cli/commands/CommitsCommand.java
+++ b/hudi-cli/src/main/java/org/apache/hudi/cli/commands/CommitsCommand.java
@@ -95,11 +95,10 @@ public class CommitsCommand implements CommandMarker {
   @CliOption(key = {"sparkProperties"}, help = "Spark Properties File 
Path") final String sparkPropertiesPath)
   throws Exception {
 HoodieActiveTimeline activeTimeline = 
HoodieCLI.getTableMetaClient().getActiveTimeline();
-HoodieTimeline timeline = 
activeTimeline.getCommitsTimeline().filterCompletedInstants();
-HoodieInstant commitInstant = new HoodieInstant(false, 
HoodieTimeline.COMMIT_ACTION, commitTime);
-
-if (!timeline.containsInstant(commitInstant)) {
-  return "Commit " + commitTime + " not found in Commits " + timeline;
+HoodieTimeline completedTimeline = 
activeTimeline.getCommitsTimeline().filterCompletedInstants();
+HoodieTimeline filteredTimeline = completedTimeline.filter(instant -> 
instant.getTimestamp().equals(commitTime));
+if (filteredTimeline.empty()) {
+  return "Commit " + commitTime + " not found in Commits " + 
completedTimeline;
 }
 
 SparkLauncher sparkLauncher = SparkUtil.initLauncher(sparkPropertiesPath);



[GitHub] [incubator-hudi] n3nash merged pull request #1201: [HUDI-248] CLI doesn't allow rolling back a Delta commit

2020-01-10 Thread GitBox
n3nash merged pull request #1201: [HUDI-248] CLI doesn't allow rolling back a 
Delta commit
URL: https://github.com/apache/incubator-hudi/pull/1201
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Created] (HUDI-519) Document the need for Avro dependency shading/relocation for custom payloads

2020-01-10 Thread Udit Mehrotra (Jira)
Udit Mehrotra created HUDI-519:
--

 Summary: Document the need for Avro dependency shading/relocation 
for custom payloads 
 Key: HUDI-519
 URL: https://issues.apache.org/jira/browse/HUDI-519
 Project: Apache Hudi (incubating)
  Issue Type: Task
  Components: Docs, Usability
Reporter: Udit Mehrotra


In [https://github.com/apache/incubator-hudi/pull/1005] we are migrating Hudi 
to Spark 2.4.4. As part of this migration, we also had to migrate Hudi to use 
Avro 1.8.2 (required by Spark), while Hive still uses an older version of Avro.

This has resulted in the need to shade Avro in *hadoop-mr-bundle*. This has 
implications for users of Hudi who implement custom record payloads: they would 
have to start shading Avro in their custom jars, similar to how it is shaded in 
*hadoop-mr-bundle*.

This Jira is to track the documentation of this caveat in the release notes 
and, if needed, in other places like the website.
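
As a rough illustration (hypothetical class, not taken from the codebase): a custom payload compiles directly against org.apache.avro types, so once the bundle ships a relocated copy of Avro, the user jar has to apply the same relocation for both sides to agree on the classes at runtime.

```
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;

// Hypothetical fragment of a custom payload: it references org.apache.avro types directly.
// If hudi-hadoop-mr-bundle ships Avro under a relocated package, a jar containing this class
// must be built with the same relocation applied (e.g. via a shade/relocation step), otherwise
// the bundle and the user jar disagree on which GenericRecord class they are passing around.
public class CustomPayloadFragment {
  public Object readField(GenericRecord record, Schema schema, String fieldName) {
    return schema.getField(fieldName) != null ? record.get(fieldName) : null;
  }
}
```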



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] umehrot2 commented on issue #1005: [HUDI-91][HUDI-12]Migrate to spark 2.4.4, migrate to spark-avro library instead of databricks-avro, add support for Decimal/Date types

2020-01-10 Thread GitBox
umehrot2 commented on issue #1005: [HUDI-91][HUDI-12]Migrate to spark 2.4.4, 
migrate to spark-avro library instead of databricks-avro, add support for 
Decimal/Date types
URL: https://github.com/apache/incubator-hudi/pull/1005#issuecomment-573235373
 
 
   @bvaradar Created a JIRA to track documentation of Avro shading caveat 
https://issues.apache.org/jira/browse/HUDI-519


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] bschell commented on a change in pull request #1194: [HUDI-326] Add support to delete records with only record_key

2020-01-10 Thread GitBox
bschell commented on a change in pull request #1194: [HUDI-326] Add support to 
delete records with only record_key
URL: https://github.com/apache/incubator-hudi/pull/1194#discussion_r365459763
 
 

 ##
 File path: 
hudi-spark/src/main/java/org/apache/hudi/keygen/GlobalDeleteKeyGenerator.java
 ##
 @@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.keygen;
+
+import java.util.Arrays;
+import java.util.List;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.hudi.DataSourceUtils;
+import org.apache.hudi.DataSourceWriteOptions;
+import org.apache.hudi.common.model.HoodieKey;
+import org.apache.hudi.common.util.TypedProperties;
+import org.apache.hudi.exception.HoodieKeyException;
+
+/**
+ * Key generator for deletes using global indices. Global index deletes do not 
require partition value
+ * so this key generator avoids using partition value for generating HoodieKey.
+ */
+public class GlobalDeleteKeyGenerator extends KeyGenerator {
+
+  private static final String EMPTY_PARTITION = "";
+  private static final String NULL_RECORDKEY_PLACEHOLDER = "__null__";
+  private static final String EMPTY_RECORDKEY_PLACEHOLDER = "__empty__";
+
+  protected final List recordKeyFields;
+
+  public GlobalDeleteKeyGenerator(TypedProperties config) {
+super(config);
+this.recordKeyFields = 
Arrays.asList(config.getString(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY()).split(","));
+  }
+
+  @Override
+  public HoodieKey getKey(GenericRecord record) {
+if (recordKeyFields == null) {
+  throw new HoodieKeyException("Unable to find field names for record key 
or partition path in cfg");
+}
+
+boolean keyIsNullEmpty = true;
+StringBuilder recordKey = new StringBuilder();
+for (String recordKeyField : recordKeyFields) {
+  String recordKeyValue = 
DataSourceUtils.getNestedFieldValAsString(record, recordKeyField, true);
+  if (recordKeyValue == null) {
 
 Review comment:
   For more context, I am trying to maintain record_key compatibility with
   
https://github.com/apache/incubator-hudi/blob/master/hudi-spark/src/main/java/org/apache/hudi/ComplexKeyGenerator.java
 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] bschell commented on a change in pull request #1194: [HUDI-326] Add support to delete records with only record_key

2020-01-10 Thread GitBox
bschell commented on a change in pull request #1194: [HUDI-326] Add support to 
delete records with only record_key
URL: https://github.com/apache/incubator-hudi/pull/1194#discussion_r365459763
 
 

 ##
 File path: 
hudi-spark/src/main/java/org/apache/hudi/keygen/GlobalDeleteKeyGenerator.java
 ##
 @@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.keygen;
+
+import java.util.Arrays;
+import java.util.List;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.hudi.DataSourceUtils;
+import org.apache.hudi.DataSourceWriteOptions;
+import org.apache.hudi.common.model.HoodieKey;
+import org.apache.hudi.common.util.TypedProperties;
+import org.apache.hudi.exception.HoodieKeyException;
+
+/**
+ * Key generator for deletes using global indices. Global index deletes do not 
require partition value
+ * so this key generator avoids using partition value for generating HoodieKey.
+ */
+public class GlobalDeleteKeyGenerator extends KeyGenerator {
+
+  private static final String EMPTY_PARTITION = "";
+  private static final String NULL_RECORDKEY_PLACEHOLDER = "__null__";
+  private static final String EMPTY_RECORDKEY_PLACEHOLDER = "__empty__";
+
+  protected final List recordKeyFields;
+
+  public GlobalDeleteKeyGenerator(TypedProperties config) {
+super(config);
+this.recordKeyFields = 
Arrays.asList(config.getString(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY()).split(","));
+  }
+
+  @Override
+  public HoodieKey getKey(GenericRecord record) {
+if (recordKeyFields == null) {
+  throw new HoodieKeyException("Unable to find field names for record key 
or partition path in cfg");
+}
+
+boolean keyIsNullEmpty = true;
+StringBuilder recordKey = new StringBuilder();
+for (String recordKeyField : recordKeyFields) {
+  String recordKeyValue = 
DataSourceUtils.getNestedFieldValAsString(record, recordKeyField, true);
+  if (recordKeyValue == null) {
 
 Review comment:
   @n3nash For more context, I am trying to maintain record_key compatibility with
   
https://github.com/apache/incubator-hudi/blob/master/hudi-spark/src/main/java/org/apache/hudi/ComplexKeyGenerator.java
 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] umehrot2 commented on a change in pull request #1005: [HUDI-91][HUDI-12]Migrate to spark 2.4.4, migrate to spark-avro library instead of databricks-avro, add support for Deci

2020-01-10 Thread GitBox
umehrot2 commented on a change in pull request #1005: [HUDI-91][HUDI-12]Migrate 
to spark 2.4.4, migrate to spark-avro library instead of databricks-avro, add 
support for Decimal/Date types
URL: https://github.com/apache/incubator-hudi/pull/1005#discussion_r365457608
 
 

 ##
 File path: LICENSE
 ##
 @@ -241,26 +241,6 @@ This product includes code from 
https://github.com/twitter/commons/blob/master/s
  limitations under the License.
 
=
 
-This product includes code from Databricks spark-avro with the below license
 
 Review comment:
   Done


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] zhedoubushishi commented on a change in pull request #1109: [HUDI-238] - Migrating to Scala 2.12

2020-01-10 Thread GitBox
zhedoubushishi commented on a change in pull request #1109: [HUDI-238] - 
Migrating to Scala 2.12
URL: https://github.com/apache/incubator-hudi/pull/1109#discussion_r365385384
 
 

 ##
 File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/KafkaOffsetGen.java
 ##
 @@ -175,93 +155,101 @@ public static long totalNewMessages(OffsetRange[] 
ranges) {
 
 private static final String KAFKA_TOPIC_NAME = 
"hoodie.deltastreamer.source.kafka.topic";
 private static final String MAX_EVENTS_FROM_KAFKA_SOURCE_PROP = 
"hoodie.deltastreamer.kafka.source.maxEvents";
-private static final KafkaResetOffsetStrategies DEFAULT_AUTO_RESET_OFFSET 
= KafkaResetOffsetStrategies.LARGEST;
+private static final KafkaResetOffsetStrategies DEFAULT_AUTO_RESET_OFFSET 
= KafkaResetOffsetStrategies.latest;
 public static final long DEFAULT_MAX_EVENTS_FROM_KAFKA_SOURCE = 500;
 public static long maxEventsFromKafkaSource = 
DEFAULT_MAX_EVENTS_FROM_KAFKA_SOURCE;
   }
 
-  private final HashMap kafkaParams;
+  private final HashMap kafkaParams;
   private final TypedProperties props;
   protected final String topicName;
 
   public KafkaOffsetGen(TypedProperties props) {
 this.props = props;
-kafkaParams = new HashMap();
+kafkaParams = new HashMap();
 for (Object prop : props.keySet()) {
   kafkaParams.put(prop.toString(), props.getString(prop.toString()));
 }
 DataSourceUtils.checkRequiredProperties(props, 
Collections.singletonList(Config.KAFKA_TOPIC_NAME));
 topicName = props.getString(Config.KAFKA_TOPIC_NAME);
   }
 
+  public HashMap getKafkaProperties() {
+final HashMap kafkaParams;
+kafkaParams = new HashMap();
+for (Object prop : props.keySet()) {
+  kafkaParams.put(prop.toString(), props.get(prop));
+}
+kafkaParams.put(ConsumerConfig.GROUP_ID_CONFIG, String.valueOf(new 
Random().nextInt(1)));
 
 Review comment:
   +1, do we really need to set the consumer group id here? From my understanding 
this config depends on the way you want to consume the Kafka topic, so should it 
be set from the user side?
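
A minimal sketch of the alternative being suggested here (hypothetical helper, not the KafkaOffsetGen code): only generate a group id when the user has not supplied one.

```
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;
import org.apache.kafka.clients.consumer.ConsumerConfig;

// Hypothetical helper: respect a user-configured group.id and only fall back to a
// generated one when the property is absent, instead of always overwriting it.
public class KafkaGroupIdExample {
  public static Map<String, Object> withGroupId(Map<String, Object> userParams) {
    Map<String, Object> params = new HashMap<>(userParams);
    params.computeIfAbsent(ConsumerConfig.GROUP_ID_CONFIG,
        k -> "hudi-deltastreamer-" + UUID.randomUUID());
    return params;
  }
}
```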


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (HUDI-518) compact error when hoodie.compact.inline is true

2020-01-10 Thread liujianhui (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17013063#comment-17013063
 ] 

liujianhui commented on HUDI-518:
-

the generated compaction instant time should be greater than the commit instant time
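
A minimal sketch of the collision being described, assuming the second-granularity yyyyMMddHHmmss instant format visible in the quoted stack trace (20200110171526): when the write finishes within the same second, the next generated instant is not strictly greater than the completed delta commit.

```
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class InstantCollisionExample {
  // Assumption for illustration: instant times are formatted at second granularity.
  private static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

  public static void main(String[] args) {
    String deltaCommitInstant = LocalDateTime.now().format(FMT);
    // A compaction scheduled within the same second produces an identical instant time,
    // which violates the "compaction instant must be strictly greater" precondition.
    String compactionInstant = LocalDateTime.now().format(FMT);
    System.out.println(deltaCommitInstant.compareTo(compactionInstant) < 0
        ? "compaction instant is newer"
        : "collision: instant times are not strictly increasing");
  }
}
```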

> compact error when hoodie.compact.inline is true
> 
>
> Key: HUDI-518
> URL: https://issues.apache.org/jira/browse/HUDI-518
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Compaction
>Reporter: liujianhui
>Priority: Minor
>
> # set the property [hoodie.compact.inline|http://hoodie.compact.inline/] as 
> true
>  # the duration of the write process is 1 second
>  # the instant time of the compaction is the same as the commit instant time
>  
> {code}
> java.lang.IllegalArgumentException: Following instants have timestamps >= 
> compactionInstant (20200110171526) Instants 
> :[[20200110171526__deltacommit__COMPLETED]]
>  at com.google.common.base.Preconditions.checkArgument(Preconditions.java:92)
>  at 
> org.apache.hudi.HoodieWriteClient.scheduleCompactionAtInstant(HoodieWriteClient.java:1043)
>  at 
> org.apache.hudi.HoodieWriteClient.scheduleCompaction(HoodieWriteClient.java:1018)
>  at 
> org.apache.hudi.HoodieWriteClient.forceCompact(HoodieWriteClient.java:1292)
>  at org.apache.hudi.HoodieWriteClient.commit(HoodieWriteClient.java:510)
>  at org.apache.hudi.HoodieWriteClient.commit(HoodieWriteClient.java:479)
>  at org.apache.hudi.HoodieWriteClient.commit(HoodieWriteClient.java:470)
>  at 
> org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:152)
>  at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:91)
>  at 
> org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
>  at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
>  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
>  at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
>  at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
>  at 
> org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
>  at 
> org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
>  at 
> org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
>  at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
>  at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
>  at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
>  at 
> org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
>  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-518) compact error when hoodie.compact.inline is true

2020-01-10 Thread liujianhui (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liujianhui updated HUDI-518:

Description: 
# set the property [hoodie.compact.inline|http://hoodie.compact.inline/] as true
 # the duration of the write process is 1 second
 # the instant time of the compaction is the same as the commit instant time

 

{code}

java.lang.IllegalArgumentException: Following instants have timestamps >= 
compactionInstant (20200110171526) Instants 
:[[20200110171526__deltacommit__COMPLETED]]
 at com.google.common.base.Preconditions.checkArgument(Preconditions.java:92)
 at 
org.apache.hudi.HoodieWriteClient.scheduleCompactionAtInstant(HoodieWriteClient.java:1043)
 at 
org.apache.hudi.HoodieWriteClient.scheduleCompaction(HoodieWriteClient.java:1018)
 at org.apache.hudi.HoodieWriteClient.forceCompact(HoodieWriteClient.java:1292)
 at org.apache.hudi.HoodieWriteClient.commit(HoodieWriteClient.java:510)
 at org.apache.hudi.HoodieWriteClient.commit(HoodieWriteClient.java:479)
 at org.apache.hudi.HoodieWriteClient.commit(HoodieWriteClient.java:470)
 at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:152)
 at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:91)
 at 
org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
 at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
 at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
 at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
 at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
 at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
 at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
 at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
 at 
org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
 at 
org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
 at 
org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
 at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
 at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
 at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
 at 
org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
 at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)

{code}

  was:
# set the property [hoodie.compact.inline|http://hoodie.compact.inline/] as true
 # the duration of the write process is 1 second
 # the instant time of the compaction is the same as the commit instant time

 

```

java.lang.IllegalArgumentException: Following instants have timestamps >= 
compactionInstant (20200110171526) Instants 
:[[20200110171526__deltacommit__COMPLETED]]
 at com.google.common.base.Preconditions.checkArgument(Preconditions.java:92)
 at 
org.apache.hudi.HoodieWriteClient.scheduleCompactionAtInstant(HoodieWriteClient.java:1043)
 at 
org.apache.hudi.HoodieWriteClient.scheduleCompaction(HoodieWriteClient.java:1018)
 at org.apache.hudi.HoodieWriteClient.forceCompact(HoodieWriteClient.java:1292)
 at org.apache.hudi.HoodieWriteClient.commit(HoodieWriteClient.java:510)
 at org.apache.hudi.HoodieWriteClient.commit(HoodieWriteClient.java:479)
 at org.apache.hudi.HoodieWriteClient.commit(HoodieWriteClient.java:470)
 at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:152)
 at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:91)
 at 
org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
 at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
 at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
 at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
 at 

[jira] [Created] (HUDI-517) compact error when hoodie.compact.inline is true

2020-01-10 Thread liujianhui (Jira)
liujianhui created HUDI-517:
---

 Summary: compact error when hoodie.compact.inline is true
 Key: HUDI-517
 URL: https://issues.apache.org/jira/browse/HUDI-517
 Project: Apache Hudi (incubating)
  Issue Type: Bug
  Components: Compaction
Reporter: liujianhui


# set the property [hoodie.compact.inline|http://hoodie.compact.inline/] as true
 # the duration of the write process is 1 second
 # the instant time of the compaction is the same as the commit instant time

 

```

java.lang.IllegalArgumentException: Following instants have timestamps >= 
compactionInstant (20200110171526) Instants 
:[[20200110171526__deltacommit__COMPLETED]]
 at com.google.common.base.Preconditions.checkArgument(Preconditions.java:92)
 at 
org.apache.hudi.HoodieWriteClient.scheduleCompactionAtInstant(HoodieWriteClient.java:1043)
 at 
org.apache.hudi.HoodieWriteClient.scheduleCompaction(HoodieWriteClient.java:1018)
 at org.apache.hudi.HoodieWriteClient.forceCompact(HoodieWriteClient.java:1292)
 at org.apache.hudi.HoodieWriteClient.commit(HoodieWriteClient.java:510)
 at org.apache.hudi.HoodieWriteClient.commit(HoodieWriteClient.java:479)
 at org.apache.hudi.HoodieWriteClient.commit(HoodieWriteClient.java:470)
 at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:152)
 at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:91)
 at 
org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
 at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
 at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
 at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
 at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
 at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
 at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
 at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
 at 
org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
 at 
org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
 at 
org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
 at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
 at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
 at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
 at 
org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
 at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)

```



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-518) compact error when hoodie.compact.inline is true

2020-01-10 Thread liujianhui (Jira)
liujianhui created HUDI-518:
---

 Summary: compact error when hoodie.compact.inline is true
 Key: HUDI-518
 URL: https://issues.apache.org/jira/browse/HUDI-518
 Project: Apache Hudi (incubating)
  Issue Type: Bug
  Components: Compaction
Reporter: liujianhui


# set the property [hoodie.compact.inline|http://hoodie.compact.inline/] as true
 # the duration of the write process is 1 second
 # the instant time of the compaction is the same as the commit instant time

 

```

java.lang.IllegalArgumentException: Following instants have timestamps >= 
compactionInstant (20200110171526) Instants 
:[[20200110171526__deltacommit__COMPLETED]]
 at com.google.common.base.Preconditions.checkArgument(Preconditions.java:92)
 at 
org.apache.hudi.HoodieWriteClient.scheduleCompactionAtInstant(HoodieWriteClient.java:1043)
 at 
org.apache.hudi.HoodieWriteClient.scheduleCompaction(HoodieWriteClient.java:1018)
 at org.apache.hudi.HoodieWriteClient.forceCompact(HoodieWriteClient.java:1292)
 at org.apache.hudi.HoodieWriteClient.commit(HoodieWriteClient.java:510)
 at org.apache.hudi.HoodieWriteClient.commit(HoodieWriteClient.java:479)
 at org.apache.hudi.HoodieWriteClient.commit(HoodieWriteClient.java:470)
 at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:152)
 at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:91)
 at 
org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
 at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
 at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
 at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
 at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
 at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
 at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
 at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
 at 
org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
 at 
org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
 at 
org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
 at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
 at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
 at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
 at 
org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
 at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)

```



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] liujianhuiouc commented on issue #143: Tracking ticket for folks to be added to slack group

2020-01-10 Thread GitBox
liujianhuiouc commented on issue #143: Tracking ticket for folks to be added to 
slack group
URL: https://github.com/apache/incubator-hudi/issues/143#issuecomment-573111663
 
 
   please add me: liujianhui...@163.com


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] OpenOpened commented on a change in pull request #1188: [HUDI-502] provide a custom time zone definition for TimestampBasedKeyGenerator

2020-01-10 Thread GitBox
OpenOpened commented on a change in pull request #1188: [HUDI-502] provide a 
custom time zone definition for TimestampBasedKeyGenerator
URL: https://github.com/apache/incubator-hudi/pull/1188#discussion_r365284878
 
 

 ##
 File path: 
hudi-utilities/src/test/java/org/apache/hudi/utilities/TestTimestampBasedKeyGenerator.java
 ##
 @@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.hudi.DataSourceWriteOptions;
+import org.apache.hudi.common.model.HoodieKey;
+import org.apache.hudi.common.util.SchemaTestUtil;
+import org.apache.hudi.common.util.TypedProperties;
+import org.apache.hudi.utilities.keygen.TimestampBasedKeyGenerator;
+import org.junit.Before;
+import org.junit.Test;
+
+import java.io.IOException;
+
+import static org.junit.Assert.assertEquals;
+
+public class TestTimestampBasedKeyGenerator {
+  private Schema schema = SchemaTestUtil.getTimestampEvolvedSchema();
+  private GenericRecord baseRecord = null;
+
+  public TestTimestampBasedKeyGenerator() throws IOException {
+  }
+
+  @Before
+  public void initialize() throws IOException {
+baseRecord = SchemaTestUtil
+.generateAvroRecordFromJson(schema, 1, "001", "f1");
+  }
+
+  private TypedProperties getBaseKeyConfig(String recordKeyFieldName, String 
partitionPathField, String hiveStylePartitioning) {
+TypedProperties props = new TypedProperties();
+props.setProperty(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY(), 
recordKeyFieldName);
+props.setProperty(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY(), 
partitionPathField);
+
props.setProperty(DataSourceWriteOptions.HIVE_STYLE_PARTITIONING_OPT_KEY(), 
hiveStylePartitioning);
+props.setProperty("hoodie.deltastreamer.keygen.timebased.timestamp.type", 
"EPOCHMILLISECONDS");
+
props.setProperty("hoodie.deltastreamer.keygen.timebased.output.dateformat", 
"-MM-dd hh");
+props.setProperty("hoodie.deltastreamer.keygen.timebased.timezone", 
"GMT+8:00");
+return props;
+  }
+
+  @Test
+  public void testTimestampBasedKeyGenerator() {
+// if timezone is GMT+8:00
+baseRecord.put("createTime", 1578283932000L);
+TypedProperties props = getBaseKeyConfig("field1", "createTime", "false");
+HoodieKey hk1 = new TimestampBasedKeyGenerator(props).getKey(baseRecord);
+assertEquals(hk1.getPartitionPath(), "2020-01-06 12");
+
+// if timezone is GMT
+props.setProperty("hoodie.deltastreamer.keygen.timebased.timezone", "GMT");
+HoodieKey hk2 = new TimestampBasedKeyGenerator(props).getKey(baseRecord);
+assertEquals(hk2.getPartitionPath(), "2020-01-06 04");
+
+// if timestamp is DATE_STRING, and timestamp type is DATE_STRING
+baseRecord.put("createTime", "2020-01-06 12:12:12");
+props.setProperty("hoodie.deltastreamer.keygen.timebased.timestamp.type", 
"DATE_STRING");
+
props.setProperty("hoodie.deltastreamer.keygen.timebased.input.dateformat", 
"-MM-dd hh:mm:ss");
+props.setProperty("hoodie.deltastreamer.keygen.timebased.timezone", 
"GMT+8:00");
+HoodieKey hk3 = new TimestampBasedKeyGenerator(props).getKey(baseRecord);
+assertEquals(hk3.getPartitionPath(), "2020-01-06 12");
+  }
+}
 
 Review comment:
   already covered.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (HUDI-458) Redo hudi-hadoop-mr log statements using SLF4J

2020-01-10 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012868#comment-17012868
 ] 

leesf commented on HUDI-458:


[~wangxianghu] Feel free to pick it up, all yours. :)

> Redo hudi-hadoop-mr log statements using SLF4J
> --
>
> Key: HUDI-458
> URL: https://issues.apache.org/jira/browse/HUDI-458
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Code Cleanup
>Reporter: leesf
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] leesf commented on a change in pull request #1188: [HUDI-502] provide a custom time zone definition for TimestampBasedKeyGenerator

2020-01-10 Thread GitBox
leesf commented on a change in pull request #1188: [HUDI-502] provide a custom 
time zone definition for TimestampBasedKeyGenerator
URL: https://github.com/apache/incubator-hudi/pull/1188#discussion_r365232440
 
 

 ##
 File path: 
hudi-utilities/src/test/java/org/apache/hudi/utilities/TestTimestampBasedKeyGenerator.java
 ##
 @@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.hudi.DataSourceWriteOptions;
+import org.apache.hudi.common.model.HoodieKey;
+import org.apache.hudi.common.util.SchemaTestUtil;
+import org.apache.hudi.common.util.TypedProperties;
+import org.apache.hudi.utilities.keygen.TimestampBasedKeyGenerator;
+import org.junit.Before;
+import org.junit.Test;
+
+import java.io.IOException;
+
+import static org.junit.Assert.assertEquals;
+
+public class TestTimestampBasedKeyGenerator {
+  private Schema schema = SchemaTestUtil.getTimestampEvolvedSchema();
+  private GenericRecord baseRecord = null;
+
+  public TestTimestampBasedKeyGenerator() throws IOException {
+  }
+
+  @Before
+  public void initialize() throws IOException {
+baseRecord = SchemaTestUtil
+.generateAvroRecordFromJson(schema, 1, "001", "f1");
+  }
+
+  private TypedProperties getBaseKeyConfig(String recordKeyFieldName, String 
partitionPathField, String hiveStylePartitioning) {
+TypedProperties props = new TypedProperties();
+props.setProperty(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY(), 
recordKeyFieldName);
+props.setProperty(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY(), 
partitionPathField);
+
props.setProperty(DataSourceWriteOptions.HIVE_STYLE_PARTITIONING_OPT_KEY(), 
hiveStylePartitioning);
+props.setProperty("hoodie.deltastreamer.keygen.timebased.timestamp.type", 
"EPOCHMILLISECONDS");
+
props.setProperty("hoodie.deltastreamer.keygen.timebased.output.dateformat", 
"-MM-dd hh");
+props.setProperty("hoodie.deltastreamer.keygen.timebased.timezone", 
"GMT+8:00");
+return props;
+  }
+
+  @Test
+  public void testTimestampBasedKeyGenerator() {
+// if timezone is GMT+8:00
+baseRecord.put("createTime", 1578283932000L);
+TypedProperties props = getBaseKeyConfig("field1", "createTime", "false");
+HoodieKey hk1 = new TimestampBasedKeyGenerator(props).getKey(baseRecord);
+assertEquals(hk1.getPartitionPath(), "2020-01-06 12");
+
+// if timezone is GMT
+props.setProperty("hoodie.deltastreamer.keygen.timebased.timezone", "GMT");
+HoodieKey hk2 = new TimestampBasedKeyGenerator(props).getKey(baseRecord);
+assertEquals(hk2.getPartitionPath(), "2020-01-06 04");
+
+// if timestamp is DATE_STRING, and timestamp type is DATE_STRING
+baseRecord.put("createTime", "2020-01-06 12:12:12");
+props.setProperty("hoodie.deltastreamer.keygen.timebased.timestamp.type", 
"DATE_STRING");
+
props.setProperty("hoodie.deltastreamer.keygen.timebased.input.dateformat", 
"-MM-dd hh:mm:ss");
+props.setProperty("hoodie.deltastreamer.keygen.timebased.timezone", 
"GMT+8:00");
+HoodieKey hk3 = new TimestampBasedKeyGenerator(props).getKey(baseRecord);
+assertEquals(hk3.getPartitionPath(), "2020-01-06 12");
+  }
+}
 
 Review comment:
   > Then what do I need to do? Add more test units?
   
   You may copy the code from the gist and replace the current 
`TestTimestampBasedKeyGenerator` class with it.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] OpenOpened commented on a change in pull request #1188: [HUDI-502] provide a custom time zone definition for TimestampBasedKeyGenerator

2020-01-10 Thread GitBox
OpenOpened commented on a change in pull request #1188: [HUDI-502] provide a 
custom time zone definition for TimestampBasedKeyGenerator
URL: https://github.com/apache/incubator-hudi/pull/1188#discussion_r365228893
 
 

 ##
 File path: 
hudi-utilities/src/test/java/org/apache/hudi/utilities/TestTimestampBasedKeyGenerator.java
 ##
 @@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.hudi.DataSourceWriteOptions;
+import org.apache.hudi.common.model.HoodieKey;
+import org.apache.hudi.common.util.SchemaTestUtil;
+import org.apache.hudi.common.util.TypedProperties;
+import org.apache.hudi.utilities.keygen.TimestampBasedKeyGenerator;
+import org.junit.Before;
+import org.junit.Test;
+
+import java.io.IOException;
+
+import static org.junit.Assert.assertEquals;
+
+public class TestTimestampBasedKeyGenerator {
+  private Schema schema = SchemaTestUtil.getTimestampEvolvedSchema();
+  private GenericRecord baseRecord = null;
+
+  public TestTimestampBasedKeyGenerator() throws IOException {
+  }
+
+  @Before
+  public void initialize() throws IOException {
+baseRecord = SchemaTestUtil
+.generateAvroRecordFromJson(schema, 1, "001", "f1");
+  }
+
+  private TypedProperties getBaseKeyConfig(String recordKeyFieldName, String 
partitionPathField, String hiveStylePartitioning) {
+TypedProperties props = new TypedProperties();
+props.setProperty(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY(), 
recordKeyFieldName);
+props.setProperty(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY(), 
partitionPathField);
+
props.setProperty(DataSourceWriteOptions.HIVE_STYLE_PARTITIONING_OPT_KEY(), 
hiveStylePartitioning);
+props.setProperty("hoodie.deltastreamer.keygen.timebased.timestamp.type", 
"EPOCHMILLISECONDS");
+
props.setProperty("hoodie.deltastreamer.keygen.timebased.output.dateformat", 
"-MM-dd hh");
+props.setProperty("hoodie.deltastreamer.keygen.timebased.timezone", 
"GMT+8:00");
+return props;
+  }
+
+  @Test
+  public void testTimestampBasedKeyGenerator() {
+// if timezone is GMT+8:00
+baseRecord.put("createTime", 1578283932000L);
+TypedProperties props = getBaseKeyConfig("field1", "createTime", "false");
+HoodieKey hk1 = new TimestampBasedKeyGenerator(props).getKey(baseRecord);
+assertEquals(hk1.getPartitionPath(), "2020-01-06 12");
+
+// if timezone is GMT
+props.setProperty("hoodie.deltastreamer.keygen.timebased.timezone", "GMT");
+HoodieKey hk2 = new TimestampBasedKeyGenerator(props).getKey(baseRecord);
+assertEquals(hk2.getPartitionPath(), "2020-01-06 04");
+
+// if timestamp is DATE_STRING, and timestamp type is DATE_STRING
+baseRecord.put("createTime", "2020-01-06 12:12:12");
+props.setProperty("hoodie.deltastreamer.keygen.timebased.timestamp.type", 
"DATE_STRING");
+
props.setProperty("hoodie.deltastreamer.keygen.timebased.input.dateformat", 
"-MM-dd hh:mm:ss");
+props.setProperty("hoodie.deltastreamer.keygen.timebased.timezone", 
"GMT+8:00");
+HoodieKey hk3 = new TimestampBasedKeyGenerator(props).getKey(baseRecord);
+assertEquals(hk3.getPartitionPath(), "2020-01-06 12");
+  }
+}
 
 Review comment:
   Then what do I need to do? Add more test units?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] OpenOpened commented on a change in pull request #1188: [HUDI-502] provide a custom time zone definition for TimestampBasedKeyGenerator

2020-01-10 Thread GitBox
OpenOpened commented on a change in pull request #1188: [HUDI-502] provide a 
custom time zone definition for TimestampBasedKeyGenerator
URL: https://github.com/apache/incubator-hudi/pull/1188#discussion_r365228893
 
 

 ##
 File path: 
hudi-utilities/src/test/java/org/apache/hudi/utilities/TestTimestampBasedKeyGenerator.java
 ##
 @@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.hudi.DataSourceWriteOptions;
+import org.apache.hudi.common.model.HoodieKey;
+import org.apache.hudi.common.util.SchemaTestUtil;
+import org.apache.hudi.common.util.TypedProperties;
+import org.apache.hudi.utilities.keygen.TimestampBasedKeyGenerator;
+import org.junit.Before;
+import org.junit.Test;
+
+import java.io.IOException;
+
+import static org.junit.Assert.assertEquals;
+
+public class TestTimestampBasedKeyGenerator {
+  private Schema schema = SchemaTestUtil.getTimestampEvolvedSchema();
+  private GenericRecord baseRecord = null;
+
+  public TestTimestampBasedKeyGenerator() throws IOException {
+  }
+
+  @Before
+  public void initialize() throws IOException {
+baseRecord = SchemaTestUtil
+.generateAvroRecordFromJson(schema, 1, "001", "f1");
+  }
+
+  private TypedProperties getBaseKeyConfig(String recordKeyFieldName, String 
partitionPathField, String hiveStylePartitioning) {
+TypedProperties props = new TypedProperties();
+props.setProperty(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY(), 
recordKeyFieldName);
+props.setProperty(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY(), 
partitionPathField);
+
props.setProperty(DataSourceWriteOptions.HIVE_STYLE_PARTITIONING_OPT_KEY(), 
hiveStylePartitioning);
+props.setProperty("hoodie.deltastreamer.keygen.timebased.timestamp.type", 
"EPOCHMILLISECONDS");
+
props.setProperty("hoodie.deltastreamer.keygen.timebased.output.dateformat", 
"-MM-dd hh");
+props.setProperty("hoodie.deltastreamer.keygen.timebased.timezone", 
"GMT+8:00");
+return props;
+  }
+
+  @Test
+  public void testTimestampBasedKeyGenerator() {
+// if timezone is GMT+8:00
+baseRecord.put("createTime", 1578283932000L);
+TypedProperties props = getBaseKeyConfig("field1", "createTime", "false");
+HoodieKey hk1 = new TimestampBasedKeyGenerator(props).getKey(baseRecord);
+assertEquals(hk1.getPartitionPath(), "2020-01-06 12");
+
+// if timezone is GMT
+props.setProperty("hoodie.deltastreamer.keygen.timebased.timezone", "GMT");
+HoodieKey hk2 = new TimestampBasedKeyGenerator(props).getKey(baseRecord);
+assertEquals(hk2.getPartitionPath(), "2020-01-06 04");
+
+// if timestamp is DATE_STRING, and timestamp type is DATE_STRING
+baseRecord.put("createTime", "2020-01-06 12:12:12");
+props.setProperty("hoodie.deltastreamer.keygen.timebased.timestamp.type", 
"DATE_STRING");
+
props.setProperty("hoodie.deltastreamer.keygen.timebased.input.dateformat", 
"-MM-dd hh:mm:ss");
+props.setProperty("hoodie.deltastreamer.keygen.timebased.timezone", 
"GMT+8:00");
+HoodieKey hk3 = new TimestampBasedKeyGenerator(props).getKey(baseRecord);
+assertEquals(hk3.getPartitionPath(), "2020-01-06 12");
+  }
+}
 
 Review comment:
   Then what do I need to do? Add more test units?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] OpenOpened commented on a change in pull request #1188: [HUDI-502] provide a custom time zone definition for TimestampBasedKeyGenerator

2020-01-10 Thread GitBox
OpenOpened commented on a change in pull request #1188: [HUDI-502] provide a 
custom time zone definition for TimestampBasedKeyGenerator
URL: https://github.com/apache/incubator-hudi/pull/1188#discussion_r365227387
 
 

 ##
 File path: 
hudi-utilities/src/test/java/org/apache/hudi/utilities/TestTimestampBasedKeyGenerator.java
 ##
 @@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.hudi.DataSourceWriteOptions;
+import org.apache.hudi.common.model.HoodieKey;
+import org.apache.hudi.common.util.SchemaTestUtil;
+import org.apache.hudi.common.util.TypedProperties;
+import org.apache.hudi.utilities.keygen.TimestampBasedKeyGenerator;
+import org.junit.Before;
+import org.junit.Test;
+
+import java.io.IOException;
+
+import static org.junit.Assert.assertEquals;
+
+public class TestTimestampBasedKeyGenerator {
+  private Schema schema = SchemaTestUtil.getTimestampEvolvedSchema();
+  private GenericRecord baseRecord = null;
+
+  public TestTimestampBasedKeyGenerator() throws IOException {
+  }
+
+  @Before
+  public void initialize() throws IOException {
+baseRecord = SchemaTestUtil
+.generateAvroRecordFromJson(schema, 1, "001", "f1");
+  }
+
+  private TypedProperties getBaseKeyConfig(String recordKeyFieldName, String 
partitionPathField, String hiveStylePartitioning) {
+TypedProperties props = new TypedProperties();
+props.setProperty(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY(), 
recordKeyFieldName);
+props.setProperty(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY(), 
partitionPathField);
+
props.setProperty(DataSourceWriteOptions.HIVE_STYLE_PARTITIONING_OPT_KEY(), 
hiveStylePartitioning);
+props.setProperty("hoodie.deltastreamer.keygen.timebased.timestamp.type", 
"EPOCHMILLISECONDS");
+
props.setProperty("hoodie.deltastreamer.keygen.timebased.output.dateformat", 
"-MM-dd hh");
+props.setProperty("hoodie.deltastreamer.keygen.timebased.timezone", 
"GMT+8:00");
+return props;
+  }
+
+  @Test
+  public void testTimestampBasedKeyGenerator() {
+// if timezone is GMT+8:00
+baseRecord.put("createTime", 1578283932000L);
+TypedProperties props = getBaseKeyConfig("field1", "createTime", "false");
+HoodieKey hk1 = new TimestampBasedKeyGenerator(props).getKey(baseRecord);
+assertEquals(hk1.getPartitionPath(), "2020-01-06 12");
+
+// if timezone is GMT
+props.setProperty("hoodie.deltastreamer.keygen.timebased.timezone", "GMT");
+HoodieKey hk2 = new TimestampBasedKeyGenerator(props).getKey(baseRecord);
+assertEquals(hk2.getPartitionPath(), "2020-01-06 04");
+
+// if timestamp is DATE_STRING, and timestamp type is DATE_STRING
+baseRecord.put("createTime", "2020-01-06 12:12:12");
+props.setProperty("hoodie.deltastreamer.keygen.timebased.timestamp.type", 
"DATE_STRING");
+
props.setProperty("hoodie.deltastreamer.keygen.timebased.input.dateformat", 
"-MM-dd hh:mm:ss");
+props.setProperty("hoodie.deltastreamer.keygen.timebased.timezone", 
"GMT+8:00");
+HoodieKey hk3 = new TimestampBasedKeyGenerator(props).getKey(baseRecord);
+assertEquals(hk3.getPartitionPath(), "2020-01-06 12");
+  }
+}
 
 Review comment:
   Is that right? This is the web address: 
[gist](https://gist.github.com/OpenOpened/2b50b86f21c36f3e07c78cb44d97ec9c)


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (HUDI-458) Redo hudi-hadoop-mr log statements using SLF4J

2020-01-10 Thread wangxianghu (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012809#comment-17012809
 ] 

wangxianghu commented on HUDI-458:
--

Hi [~xleesf], may I take this?

> Redo hudi-hadoop-mr log statements using SLF4J
> --
>
> Key: HUDI-458
> URL: https://issues.apache.org/jira/browse/HUDI-458
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Code Cleanup
>Reporter: leesf
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] leesf commented on a change in pull request #1188: [HUDI-502] provide a custom time zone definition for TimestampBasedKeyGenerator

2020-01-10 Thread GitBox
leesf commented on a change in pull request #1188: [HUDI-502] provide a custom 
time zone definition for TimestampBasedKeyGenerator
URL: https://github.com/apache/incubator-hudi/pull/1188#discussion_r365178140
 
 

 ##
 File path: 
hudi-utilities/src/test/java/org/apache/hudi/utilities/TestTimestampBasedKeyGenerator.java
 ##
 @@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.hudi.DataSourceWriteOptions;
+import org.apache.hudi.common.model.HoodieKey;
+import org.apache.hudi.common.util.SchemaTestUtil;
+import org.apache.hudi.common.util.TypedProperties;
+import org.apache.hudi.utilities.keygen.TimestampBasedKeyGenerator;
+import org.junit.Before;
+import org.junit.Test;
+
+import java.io.IOException;
+
+import static org.junit.Assert.assertEquals;
+
+public class TestTimestampBasedKeyGenerator {
+  private Schema schema = SchemaTestUtil.getTimestampEvolvedSchema();
+  private GenericRecord baseRecord = null;
+
+  public TestTimestampBasedKeyGenerator() throws IOException {
+  }
+
+  @Before
+  public void initialize() throws IOException {
+    baseRecord = SchemaTestUtil
+        .generateAvroRecordFromJson(schema, 1, "001", "f1");
+  }
+
+  private TypedProperties getBaseKeyConfig(String recordKeyFieldName, String partitionPathField, String hiveStylePartitioning) {
+    TypedProperties props = new TypedProperties();
+    props.setProperty(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY(), recordKeyFieldName);
+    props.setProperty(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY(), partitionPathField);
+    props.setProperty(DataSourceWriteOptions.HIVE_STYLE_PARTITIONING_OPT_KEY(), hiveStylePartitioning);
+    props.setProperty("hoodie.deltastreamer.keygen.timebased.timestamp.type", "EPOCHMILLISECONDS");
+    props.setProperty("hoodie.deltastreamer.keygen.timebased.output.dateformat", "yyyy-MM-dd hh");
+    props.setProperty("hoodie.deltastreamer.keygen.timebased.timezone", "GMT+8:00");
+    return props;
+  }
+
+  @Test
+  public void testTimestampBasedKeyGenerator() {
+    // if timezone is GMT+8:00
+    baseRecord.put("createTime", 1578283932000L);
+    TypedProperties props = getBaseKeyConfig("field1", "createTime", "false");
+    HoodieKey hk1 = new TimestampBasedKeyGenerator(props).getKey(baseRecord);
+    assertEquals(hk1.getPartitionPath(), "2020-01-06 12");
+
+    // if timezone is GMT
+    props.setProperty("hoodie.deltastreamer.keygen.timebased.timezone", "GMT");
+    HoodieKey hk2 = new TimestampBasedKeyGenerator(props).getKey(baseRecord);
+    assertEquals(hk2.getPartitionPath(), "2020-01-06 04");
+
+    // if the timestamp is a date string and the timestamp type is DATE_STRING
+    baseRecord.put("createTime", "2020-01-06 12:12:12");
+    props.setProperty("hoodie.deltastreamer.keygen.timebased.timestamp.type", "DATE_STRING");
+    props.setProperty("hoodie.deltastreamer.keygen.timebased.input.dateformat", "yyyy-MM-dd hh:mm:ss");
+    props.setProperty("hoodie.deltastreamer.keygen.timebased.timezone", "GMT+8:00");
+    HoodieKey hk3 = new TimestampBasedKeyGenerator(props).getKey(baseRecord);
+    assertEquals(hk3.getPartitionPath(), "2020-01-06 12");
+  }
+}
 
 Review comment:
   Click the gist above.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] OpenOpened commented on a change in pull request #1188: [HUDI-502] provide a custom time zone definition for TimestampBasedKeyGenerator

2020-01-10 Thread GitBox
OpenOpened commented on a change in pull request #1188: [HUDI-502] provide a 
custom time zone definition for TimestampBasedKeyGenerator
URL: https://github.com/apache/incubator-hudi/pull/1188#discussion_r365175719
 
 

 ##
 File path: 
hudi-utilities/src/test/java/org/apache/hudi/utilities/TestTimestampBasedKeyGenerator.java
 ##
 @@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.hudi.DataSourceWriteOptions;
+import org.apache.hudi.common.model.HoodieKey;
+import org.apache.hudi.common.util.SchemaTestUtil;
+import org.apache.hudi.common.util.TypedProperties;
+import org.apache.hudi.utilities.keygen.TimestampBasedKeyGenerator;
+import org.junit.Before;
+import org.junit.Test;
+
+import java.io.IOException;
+
+import static org.junit.Assert.assertEquals;
+
+public class TestTimestampBasedKeyGenerator {
+  private Schema schema = SchemaTestUtil.getTimestampEvolvedSchema();
+  private GenericRecord baseRecord = null;
+
+  public TestTimestampBasedKeyGenerator() throws IOException {
+  }
+
+  @Before
+  public void initialize() throws IOException {
+    baseRecord = SchemaTestUtil
+        .generateAvroRecordFromJson(schema, 1, "001", "f1");
+  }
+
+  private TypedProperties getBaseKeyConfig(String recordKeyFieldName, String partitionPathField, String hiveStylePartitioning) {
+    TypedProperties props = new TypedProperties();
+    props.setProperty(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY(), recordKeyFieldName);
+    props.setProperty(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY(), partitionPathField);
+    props.setProperty(DataSourceWriteOptions.HIVE_STYLE_PARTITIONING_OPT_KEY(), hiveStylePartitioning);
+    props.setProperty("hoodie.deltastreamer.keygen.timebased.timestamp.type", "EPOCHMILLISECONDS");
+    props.setProperty("hoodie.deltastreamer.keygen.timebased.output.dateformat", "yyyy-MM-dd hh");
+    props.setProperty("hoodie.deltastreamer.keygen.timebased.timezone", "GMT+8:00");
+    return props;
+  }
+
+  @Test
+  public void testTimestampBasedKeyGenerator() {
+    // if timezone is GMT+8:00
+    baseRecord.put("createTime", 1578283932000L);
+    TypedProperties props = getBaseKeyConfig("field1", "createTime", "false");
+    HoodieKey hk1 = new TimestampBasedKeyGenerator(props).getKey(baseRecord);
+    assertEquals(hk1.getPartitionPath(), "2020-01-06 12");
+
+    // if timezone is GMT
+    props.setProperty("hoodie.deltastreamer.keygen.timebased.timezone", "GMT");
+    HoodieKey hk2 = new TimestampBasedKeyGenerator(props).getKey(baseRecord);
+    assertEquals(hk2.getPartitionPath(), "2020-01-06 04");
+
+    // if the timestamp is a date string and the timestamp type is DATE_STRING
+    baseRecord.put("createTime", "2020-01-06 12:12:12");
+    props.setProperty("hoodie.deltastreamer.keygen.timebased.timestamp.type", "DATE_STRING");
+    props.setProperty("hoodie.deltastreamer.keygen.timebased.input.dateformat", "yyyy-MM-dd hh:mm:ss");
+    props.setProperty("hoodie.deltastreamer.keygen.timebased.timezone", "GMT+8:00");
+    HoodieKey hk3 = new TimestampBasedKeyGenerator(props).getKey(baseRecord);
+    assertEquals(hk3.getPartitionPath(), "2020-01-06 12");
+  }
+}
 
 Review comment:
   OK, what should I do?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] leesf commented on a change in pull request #1188: [HUDI-502] provide a custom time zone definition for TimestampBasedKeyGenerator

2020-01-10 Thread GitBox
leesf commented on a change in pull request #1188: [HUDI-502] provide a custom 
time zone definition for TimestampBasedKeyGenerator
URL: https://github.com/apache/incubator-hudi/pull/1188#discussion_r365166437
 
 

 ##
 File path: 
hudi-utilities/src/test/java/org/apache/hudi/utilities/TestTimestampBasedKeyGenerator.java
 ##
 @@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.hudi.DataSourceWriteOptions;
+import org.apache.hudi.common.model.HoodieKey;
+import org.apache.hudi.common.util.SchemaTestUtil;
+import org.apache.hudi.common.util.TypedProperties;
+import org.apache.hudi.utilities.keygen.TimestampBasedKeyGenerator;
+import org.junit.Before;
+import org.junit.Test;
+
+import java.io.IOException;
+
+import static org.junit.Assert.assertEquals;
+
+public class TestTimestampBasedKeyGenerator {
+  private Schema schema = SchemaTestUtil.getTimestampEvolvedSchema();
+  private GenericRecord baseRecord = null;
+
+  public TestTimestampBasedKeyGenerator() throws IOException {
+  }
+
+  @Before
+  public void initialize() throws IOException {
+    baseRecord = SchemaTestUtil
+        .generateAvroRecordFromJson(schema, 1, "001", "f1");
+  }
+
+  private TypedProperties getBaseKeyConfig(String recordKeyFieldName, String partitionPathField, String hiveStylePartitioning) {
+    TypedProperties props = new TypedProperties();
+    props.setProperty(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY(), recordKeyFieldName);
+    props.setProperty(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY(), partitionPathField);
+    props.setProperty(DataSourceWriteOptions.HIVE_STYLE_PARTITIONING_OPT_KEY(), hiveStylePartitioning);
+    props.setProperty("hoodie.deltastreamer.keygen.timebased.timestamp.type", "EPOCHMILLISECONDS");
+    props.setProperty("hoodie.deltastreamer.keygen.timebased.output.dateformat", "yyyy-MM-dd hh");
+    props.setProperty("hoodie.deltastreamer.keygen.timebased.timezone", "GMT+8:00");
+    return props;
+  }
+
+  @Test
+  public void testTimestampBasedKeyGenerator() {
+    // if timezone is GMT+8:00
+    baseRecord.put("createTime", 1578283932000L);
+    TypedProperties props = getBaseKeyConfig("field1", "createTime", "false");
+    HoodieKey hk1 = new TimestampBasedKeyGenerator(props).getKey(baseRecord);
+    assertEquals(hk1.getPartitionPath(), "2020-01-06 12");
+
+    // if timezone is GMT
+    props.setProperty("hoodie.deltastreamer.keygen.timebased.timezone", "GMT");
+    HoodieKey hk2 = new TimestampBasedKeyGenerator(props).getKey(baseRecord);
+    assertEquals(hk2.getPartitionPath(), "2020-01-06 04");
+
+    // if the timestamp is a date string and the timestamp type is DATE_STRING
+    baseRecord.put("createTime", "2020-01-06 12:12:12");
+    props.setProperty("hoodie.deltastreamer.keygen.timebased.timestamp.type", "DATE_STRING");
+    props.setProperty("hoodie.deltastreamer.keygen.timebased.input.dateformat", "yyyy-MM-dd hh:mm:ss");
+    props.setProperty("hoodie.deltastreamer.keygen.timebased.timezone", "GMT+8:00");
+    HoodieKey hk3 = new TimestampBasedKeyGenerator(props).getKey(baseRecord);
+    assertEquals(hk3.getPartitionPath(), "2020-01-06 12");
+  }
+}
 
 Review comment:
   I think we could optimize `TestTimestampBasedKeyGenerator` along the lines of this 
[gist](https://gist.github.com/leesf/b36654ba68c25a48d56958dabec07b83), WDYT?
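   Not the linked gist itself, but as one illustration of such a refactor, the repeated configure-and-assert steps could be folded into a small helper inside the same test class; the helper name and structure below are illustrative only, and it assumes the `baseRecord` field and `getBaseKeyConfig(...)` method from the diff above.

```java
// Illustrative sketch only; relies on the test class context shown in the diff.
private void assertPartitionPath(TypedProperties props, Object createTime, String expectedPartitionPath) {
  baseRecord.put("createTime", createTime);
  HoodieKey key = new TimestampBasedKeyGenerator(props).getKey(baseRecord);
  assertEquals(expectedPartitionPath, key.getPartitionPath());
}

@Test
public void testTimestampBasedKeyGenerator() {
  TypedProperties props = getBaseKeyConfig("field1", "createTime", "false");
  assertPartitionPath(props, 1578283932000L, "2020-01-06 12");         // epoch millis, GMT+8:00

  props.setProperty("hoodie.deltastreamer.keygen.timebased.timezone", "GMT");
  assertPartitionPath(props, 1578283932000L, "2020-01-06 04");         // epoch millis, GMT

  props.setProperty("hoodie.deltastreamer.keygen.timebased.timestamp.type", "DATE_STRING");
  props.setProperty("hoodie.deltastreamer.keygen.timebased.input.dateformat", "yyyy-MM-dd hh:mm:ss");
  props.setProperty("hoodie.deltastreamer.keygen.timebased.timezone", "GMT+8:00");
  assertPartitionPath(props, "2020-01-06 12:12:12", "2020-01-06 12");  // date-string input, GMT+8:00
}
```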


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] leesf commented on a change in pull request #1188: [HUDI-502] provide a custom time zone definition for TimestampBasedKeyGenerator

2020-01-10 Thread GitBox
leesf commented on a change in pull request #1188: [HUDI-502] provide a custom 
time zone definition for TimestampBasedKeyGenerator
URL: https://github.com/apache/incubator-hudi/pull/1188#discussion_r365165705
 
 

 ##
 File path: hudi-common/src/test/resources/timestamp-test-evolved.avsc
 ##
 @@ -0,0 +1,35 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+{
+  "namespace": "example.avro",
+  "type": "record",
+  "name": "User",
+  "fields": [
+{"name": "field1", "type": ["null", "string"], "default": null},
+{"name": "field2", "type": ["null", "string"], "default": null},
+{"name": "createTime", "type": ["null", "string"], "default": null},
+{"name": "name", "type": ["null", "string"], "default": null},
+{"name": "favoriteIntNumber",  "type": ["null", "int"], "default": null},
+{"name": "favoriteNumber",  "type": ["null", "long"], "default": null},
+{"name": "favoriteFloatNumber",  "type": ["null", "float"], "default": 
null},
+{"name": "favoriteDoubleNumber",  "type": ["null", "double"], "default": 
null},
+{"name": "tags", "type": ["null", {"values": ["null", {"fields": 
[{"default": null, "type": ["null", "string"], "name": "item1"}, {"default": 
null, "type": ["null", "string"], "name": "item2"} ], "type": "record", "name": 
"tagsMapItems"} ], "type": "map"} ], "default": null},
+{"default": null, "name": "testNestedRecord", "type": ["null", {"fields": 
[{"default": null, "name": "isAdmin", "type": ["null", "boolean"] }, 
{"default": null, "name": "userId", "type": ["null", "string"] } ], "name": 
"notes", "type": "record"}]},
+{"default": null, "name": "stringArray", "type": ["null", {"items": 
"string", "type": "array"}]}
 
 Review comment:
   Should we remove these unused fields?
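   As a rough sketch of what a trimmed schema could look like if only the fields the key generator test actually reads (field1 and createTime) are kept — expressed as a parsed string so the snippet is self-contained; whether the other fields are needed by other tests is for the author to confirm.

```java
import org.apache.avro.Schema;

public class TrimmedTimestampSchemaSketch {
  // Hypothetical trimmed-down counterpart of timestamp-test-evolved.avsc.
  static final Schema TRIMMED = new Schema.Parser().parse(
      "{\"namespace\": \"example.avro\", \"type\": \"record\", \"name\": \"User\", \"fields\": ["
          + "{\"name\": \"field1\", \"type\": [\"null\", \"string\"], \"default\": null},"
          + "{\"name\": \"createTime\", \"type\": [\"null\", \"string\"], \"default\": null}"
          + "]}");
}
```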


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (HUDI-114) Allow for clients to overwrite the payload implementation in hoodie.properties

2020-01-10 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012594#comment-17012594
 ] 

leesf commented on HUDI-114:


Fixed via master: 3c90d252cc464fbd4ec3554fc930e41a0fcaa29f

> Allow for clients to overwrite the payload implementation in hoodie.properties
> --
>
> Key: HUDI-114
> URL: https://issues.apache.org/jira/browse/HUDI-114
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: newbie, Writer Core
>Reporter: Nishith Agarwal
>Assignee: Pratyaksh Sharma
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Right now, once the payload class is set in hoodie.properties, it cannot 
> be changed. In some cases, if a code refactor is done and the jar updated, 
> one may need to pass the new payload class name.
> Also, fix picking up the payload name for the datasource API. By default 
> HoodieAvroPayload is written, whereas for the datasource API the default is 
> OverwriteWithLatestAvroPayload.
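As a rough illustration of the user-facing side of this fix, a write could pass the payload class explicitly so hoodie.properties records the intended implementation; the option keys and class name below are assumptions for this Hudi version, not taken from the patch.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;

public class PayloadOverrideSketch {
  // Hypothetical sketch; real writes also need the other required Hudi options
  // (table name, precombine field, etc.).
  static void writeWithExplicitPayload(Dataset<Row> df, String basePath) {
    df.write()
        .format("org.apache.hudi")
        .option("hoodie.datasource.write.recordkey.field", "field1")
        .option("hoodie.datasource.write.payload.class",
            "org.apache.hudi.common.model.OverwriteWithLatestAvroPayload")
        .mode(SaveMode.Append)
        .save(basePath);
  }
}
```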



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-114) Allow for clients to overwrite the payload implementation in hoodie.properties

2020-01-10 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-114:
---
Status: Closed  (was: Patch Available)

> Allow for clients to overwrite the payload implementation in hoodie.properties
> --
>
> Key: HUDI-114
> URL: https://issues.apache.org/jira/browse/HUDI-114
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: newbie, Writer Core
>Reporter: Nishith Agarwal
>Assignee: Pratyaksh Sharma
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Right now, once the payload class is set in hoodie.properties, it cannot 
> be changed. In some cases, if a code refactor is done and the jar updated, 
> one may need to pass the new payload class name.
> Also, fix picking up the payload name for the datasource API. By default 
> HoodieAvroPayload is written, whereas for the datasource API the default is 
> OverwriteWithLatestAvroPayload.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-307) Dataframe written with Date,Timestamp, Decimal is read with same types

2020-01-10 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012592#comment-17012592
 ] 

leesf commented on HUDI-307:


[~arw357] Hi, do you have time to pick up the ticket?

> Dataframe written with Date,Timestamp, Decimal is read with same types
> --
>
> Key: HUDI-307
> URL: https://issues.apache.org/jira/browse/HUDI-307
> Project: Apache Hudi (incubating)
>  Issue Type: Test
>  Components: Spark Integration
>Reporter: Cosmin Iordache
>Priority: Minor
> Fix For: 0.5.1
>
>
> Small test for a COW table to check the persistence of Date, Timestamp, and 
> Decimal types
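A rough sketch of what such a round-trip check could look like; the option keys and read glob are assumptions, and this is not an existing Hudi test.

```java
import java.math.BigDecimal;
import java.sql.Date;
import java.sql.Timestamp;
import java.util.Collections;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class DateTimestampDecimalRoundTrip {
  // Hypothetical sketch; real writes also need the other required Hudi options
  // (table name, precombine field, etc.).
  static void roundTrip(SparkSession spark, String basePath) {
    StructType schema = new StructType()
        .add("key", DataTypes.StringType)
        .add("dt", DataTypes.DateType)
        .add("ts", DataTypes.TimestampType)
        .add("amount", DataTypes.createDecimalType(10, 2));
    Row row = RowFactory.create("k1", Date.valueOf("2020-01-06"),
        Timestamp.valueOf("2020-01-06 12:12:12"), new BigDecimal("12.34"));
    Dataset<Row> df = spark.createDataFrame(Collections.singletonList(row), schema);

    df.write().format("org.apache.hudi")
        .option("hoodie.datasource.write.recordkey.field", "key")
        .mode(SaveMode.Overwrite)
        .save(basePath);

    // Expect dt, ts and amount to come back with their original Spark types.
    spark.read().format("org.apache.hudi").load(basePath + "/*").printSchema();
  }
}
```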



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] cdmikechen commented on a change in pull request #1119: [HUDI-469] Fix: HoodieCommitMetadata only show first commit insert rows.

2020-01-10 Thread GitBox
cdmikechen commented on a change in pull request #1119: [HUDI-469] Fix: 
HoodieCommitMetadata only show first commit insert rows.
URL: https://github.com/apache/incubator-hudi/pull/1119#discussion_r365119041
 
 

 ##
 File path: 
hudi-common/src/main/java/org/apache/hudi/common/model/HoodieCommitMetadata.java
 ##
 @@ -175,7 +175,8 @@ public long fetchTotalInsertRecordsWritten() {
 long totalInsertRecordsWritten = 0;
 for (List stats : partitionToWriteStats.values()) {
   for (HoodieWriteStat stat : stats) {
-if (stat.getPrevCommit() != null && 
stat.getPrevCommit().equalsIgnoreCase("null")) {
+// It was only possible to determine the number of rows to insert for 
the first commit before. Currently, this problem is fixed
 
 Review comment:
   @n3nash 
   OK~ I have changed it.
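   For readers following the thread, a heavily simplified sketch of the idea under discussion — counting inserts from every commit's write stats instead of only from file groups whose previous commit is "null". The accessor used here is hypothetical, and this is not the actual patch in this PR.

```java
// Hypothetical sketch only.
public long fetchTotalInsertRecordsWritten() {
  long totalInsertRecordsWritten = 0;
  for (List<HoodieWriteStat> stats : partitionToWriteStats.values()) {
    for (HoodieWriteStat stat : stats) {
      // Sum per-stat insert counts so commits after the first one are included too.
      totalInsertRecordsWritten += stat.getNumInserts(); // hypothetical accessor
    }
  }
  return totalInsertRecordsWritten;
}
```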


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] hddong commented on a change in pull request #1157: [HUDI-332]Add operation type (insert/upsert/bulkinsert/delete) to HoodieCommitMetadata

2020-01-10 Thread GitBox
hddong commented on a change in pull request #1157: [HUDI-332]Add operation 
type (insert/upsert/bulkinsert/delete) to HoodieCommitMetadata
URL: https://github.com/apache/incubator-hudi/pull/1157#discussion_r365110224
 
 

 ##
 File path: 
hudi-client/src/test/java/org/apache/hudi/io/TestHoodieCommitArchiveLog.java
 ##
 @@ -397,10 +402,81 @@ public void testArchiveCommitCompactionNoHole() throws 
IOException {
 timeline.containsInstant(new HoodieInstant(false, 
HoodieTimeline.COMMIT_ACTION, "107")));
   }
 
+  @Test
+  public void testArchiveCommitAndDeepCopy() throws IOException {
 
 Review comment:
   > Is this method used to test the enum issue in Avro? If so, you can remove it.
   
   Yes, it has been removed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] hddong commented on issue #1157: [HUDI-332]Add operation type (insert/upsert/bulkinsert/delete) to HoodieCommitMetadata

2020-01-10 Thread GitBox
hddong commented on issue #1157: [HUDI-332]Add operation type 
(insert/upsert/bulkinsert/delete) to HoodieCommitMetadata
URL: https://github.com/apache/incubator-hudi/pull/1157#issuecomment-572918303
 
 
   @bvaradar Thanks very much for your review; all of them have been addressed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services