[GitHub] [incubator-hudi] hddong opened a new pull request #1449: [HUDI-698]Add unit test for CleansCommand

2020-03-25 Thread GitBox
hddong opened a new pull request #1449: [HUDI-698]Add unit test for 
CleansCommand
URL: https://github.com/apache/incubator-hudi/pull/1449
 
 
   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a pull request.*

   ## What is the purpose of the pull request

   *Add unit test for CleansCommand in hudi-cli module*

   ## Brief change log

   - *Add unit test for CleansCommand*

   ## Verify this pull request

   - *Add unit test for CleansCommand.*

   ## Committer checklist

   - [ ] Has a corresponding JIRA in PR title & commit
   - [ ] Commit message is descriptive of the change
   - [ ] CI is green
   - [ ] Necessary doc changes done or have another open PR
   - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Updated] (HUDI-698) Add unit test for CleansCommand

2020-03-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-698:

Labels: pull-request-available  (was: )

> Add unit test for CleansCommand
> ---
>
> Key: HUDI-698
> URL: https://issues.apache.org/jira/browse/HUDI-698
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: CLI, Testing
>Reporter: hong dongdong
>Assignee: hong dongdong
>Priority: Major
>  Labels: pull-request-available
>
> Add unit test for CleansCommand in hudi-cli module



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] yanghua commented on a change in pull request #1418: [HUDI-678] Make config package spark free

2020-03-25 Thread GitBox
yanghua commented on a change in pull request #1418: [HUDI-678] Make config 
package spark free
URL: https://github.com/apache/incubator-hudi/pull/1418#discussion_r398345709
 
 

 ##
 File path: 
hudi-client/src/main/java/org/apache/hudi/config/HoodieMemoryConfig.java
 ##
 @@ -113,52 +112,20 @@ public Builder withWriteStatusFailureFraction(double failureFraction) {
       return this;
     }
 
-    /**
-     * Dynamic calculation of max memory to use for for spillable map. user.available.memory = spark.executor.memory *
-     * (1 - spark.memory.fraction) spillable.available.memory = user.available.memory * hoodie.memory.fraction. Anytime
-     * the spark.executor.memory or the spark.memory.fraction is changed, the memory used for spillable map changes
-     * accordingly
-     */
-    private long getMaxMemoryAllowedForMerge(String maxMemoryFraction) {
-      final String SPARK_EXECUTOR_MEMORY_PROP = "spark.executor.memory";
-      final String SPARK_EXECUTOR_MEMORY_FRACTION_PROP = "spark.memory.fraction";
-      // This is hard-coded in spark code {@link
-      // https://github.com/apache/spark/blob/576c43fb4226e4efa12189b41c3bc862019862c6/core/src/main/scala/org/apache/
-      // spark/memory/UnifiedMemoryManager.scala#L231} so have to re-define this here
-      final String DEFAULT_SPARK_EXECUTOR_MEMORY_FRACTION = "0.6";
-      // This is hard-coded in spark code {@link
-      // https://github.com/apache/spark/blob/576c43fb4226e4efa12189b41c3bc862019862c6/core/src/main/scala/org/apache/
-      // spark/SparkContext.scala#L471} so have to re-define this here
-      final String DEFAULT_SPARK_EXECUTOR_MEMORY_MB = "1024"; // in MB
-
-      if (SparkEnv.get() != null) {
-        // 1 GB is the default conf used by Spark, look at SparkContext.scala
-        long executorMemoryInBytes = Utils.memoryStringToMb(
-            SparkEnv.get().conf().get(SPARK_EXECUTOR_MEMORY_PROP, DEFAULT_SPARK_EXECUTOR_MEMORY_MB)) * 1024 * 1024L;
-        // 0.6 is the default value used by Spark,
-        // look at {@link
-        // https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkConf.scala#L507}
-        double memoryFraction = Double.parseDouble(
-            SparkEnv.get().conf().get(SPARK_EXECUTOR_MEMORY_FRACTION_PROP, DEFAULT_SPARK_EXECUTOR_MEMORY_FRACTION));
-        double maxMemoryFractionForMerge = Double.parseDouble(maxMemoryFraction);
-        double userAvailableMemory = executorMemoryInBytes * (1 - memoryFraction);
-        long maxMemoryForMerge = (long) Math.floor(userAvailableMemory * maxMemoryFractionForMerge);
-        return Math.max(DEFAULT_MIN_MEMORY_FOR_SPILLABLE_MAP_IN_BYTES, maxMemoryForMerge);
-      } else {
-        return DEFAULT_MAX_MEMORY_FOR_SPILLABLE_MAP_IN_BYTES;
-      }
-    }
-
     public HoodieMemoryConfig build() {
       HoodieMemoryConfig config = new HoodieMemoryConfig(props);
       setDefaultOnCondition(props, !props.containsKey(MAX_MEMORY_FRACTION_FOR_COMPACTION_PROP),
           MAX_MEMORY_FRACTION_FOR_COMPACTION_PROP, DEFAULT_MAX_MEMORY_FRACTION_FOR_COMPACTION);
       setDefaultOnCondition(props, !props.containsKey(MAX_MEMORY_FRACTION_FOR_MERGE_PROP),
           MAX_MEMORY_FRACTION_FOR_MERGE_PROP, DEFAULT_MAX_MEMORY_FRACTION_FOR_MERGE);
+      long maxMemoryAllowedForMerge =
+          SparkConfigUtils.getMaxMemoryAllowedForMerge(props.getProperty(MAX_MEMORY_FRACTION_FOR_MERGE_PROP));
       setDefaultOnCondition(props, !props.containsKey(MAX_MEMORY_FOR_MERGE_PROP), MAX_MEMORY_FOR_MERGE_PROP,
-          String.valueOf(getMaxMemoryAllowedForMerge(props.getProperty(MAX_MEMORY_FRACTION_FOR_MERGE_PROP))));
+          String.valueOf(maxMemoryAllowedForMerge));
+      long maxMemoryAllowedForCompaction =
+          SparkConfigUtils.getMaxMemoryAllowedForMerge(props.getProperty(MAX_MEMORY_FRACTION_FOR_COMPACTION_PROP));
 
 Review comment:
   Yes, you are right. Maybe we have to extract these two lines into an extra 
method.
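For reference, the formula in the Javadoc removed by this hunk can be expressed without any Spark classes. The following is a minimal sketch, not the code in the pull request: the class name `SpillableMemorySketch`, its parameter names, and the lower bound used in `main` are assumptions made for illustration.

```java
// Illustrative sketch only; the class and parameter names are hypothetical.
final class SpillableMemorySketch {

  /**
   * user.available.memory      = executorMemoryBytes * (1 - sparkMemoryFraction)
   * spillable.available.memory = user.available.memory * hoodieMemoryFraction
   * The result is never smaller than minBytes.
   */
  static long maxMemoryForMerge(long executorMemoryBytes, double sparkMemoryFraction,
      double hoodieMemoryFraction, long minBytes) {
    double userAvailableMemory = executorMemoryBytes * (1 - sparkMemoryFraction);
    long maxMemoryForMerge = (long) Math.floor(userAvailableMemory * hoodieMemoryFraction);
    return Math.max(minBytes, maxMemoryForMerge);
  }

  public static void main(String[] args) {
    long oneGb = 1024L * 1024 * 1024;       // 1 GB executor memory, the default cited in the removed comment
    long lowerBound = 100L * 1024 * 1024;   // assumed lower bound, chosen only for this example
    System.out.println(maxMemoryForMerge(oneGb, 0.6, 0.6, lowerBound));
  }
}
```

Passing the executor memory and fractions as plain arguments keeps the arithmetic testable on its own; where those values come from (Spark or elsewhere) is left to the caller.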




[GitHub] [incubator-hudi] yanghua commented on a change in pull request #1448: [MINOR] Update DOAP with 0.5.2 Release

2020-03-25 Thread GitBox
yanghua commented on a change in pull request #1448: [MINOR] Update DOAP with 
0.5.2 Release
URL: https://github.com/apache/incubator-hudi/pull/1448#discussion_r398342379
 
 

 ##
 File path: doap_HUDI.rdf
 ##
 @@ -46,6 +46,11 @@
 2020-01-31
 0.5.1
   
+  
 
 Review comment:
   I have added a step to the "Finalize the release" section:
   
   ```
   Update DOAP file in the root of the project via sending a PR like this one.
   ```




[GitHub] [incubator-hudi] yanghua commented on a change in pull request #1448: [MINOR] Update DOAP with 0.5.2 Release

2020-03-25 Thread GitBox
yanghua commented on a change in pull request #1448: [MINOR] Update DOAP with 
0.5.2 Release
URL: https://github.com/apache/incubator-hudi/pull/1448#discussion_r398339368
 
 

 ##
 File path: doap_HUDI.rdf
 ##
 @@ -46,6 +46,11 @@
 2020-01-31
 0.5.1
   
+  
 
 Review comment:
   OK 




[jira] [Updated] (HUDI-731) Implement a chained transformer for deltastreamer that can chain other transformer implementations

2020-03-25 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-731:

Fix Version/s: 0.6.0

> Implement a chained transformer for deltastreamer that can chain other 
> transformer implementations
> --
>
> Key: HUDI-731
> URL: https://issues.apache.org/jira/browse/HUDI-731
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: DeltaStreamer, Utilities
>Reporter: Vinoth Chandar
>Assignee: Raymond Xu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[GitHub] [incubator-hudi] smarthi commented on a change in pull request #1159: [HUDI-479] Eliminate or Minimize use of Guava if possible

2020-03-25 Thread GitBox
smarthi commented on a change in pull request #1159: [HUDI-479] Eliminate or 
Minimize use of Guava if possible
URL: https://github.com/apache/incubator-hudi/pull/1159#discussion_r398333676
 
 

 ##
 File path: 
hudi-common/src/main/java/org/apache/hudi/common/util/ReflectionUtils.java
 ##
 @@ -90,16 +99,58 @@ public static Object loadClass(String clazz, Object... constructorArgs) {
   }
 
   /**
-   * Return stream of top level class names in the same class path as passed-in class.
-   * 
-   * @param clazz
+   * Scans all classes accessible from the context class loader
+   * which belong to the given package and subpackages.
+   *
+   * @param clazz class
+   * @return Stream of Class names in package
*/
-  public static Stream getTopLevelClassesInClasspath(Class clazz) {
+  public static Stream getTopLevelClassesInClasspath(Class clazz) {
 
 Review comment:
   yes - let's raise a jira




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1159: [HUDI-479] Eliminate or Minimize use of Guava if possible

2020-03-25 Thread GitBox
vinothchandar commented on a change in pull request #1159: [HUDI-479] Eliminate 
or Minimize use of Guava if possible
URL: https://github.com/apache/incubator-hudi/pull/1159#discussion_r39816
 
 

 ##
 File path: 
hudi-common/src/main/java/org/apache/hudi/common/util/CollectionUtils.java
 ##
 @@ -0,0 +1,132 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.util;
+
+import java.util.Collections;
+import java.util.Iterator;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+public class CollectionUtils {
+  /**
+   * Determines whether two iterators contain equal elements in the same order. More specifically,
+   * this method returns {@code true} if {@code iterator1} and {@code iterator2} contain the same
+   * number of elements and every element of {@code iterator1} is equal to the corresponding element
+   * of {@code iterator2}.
+   *
+   * Note that this will modify the supplied iterators, since they will have been advanced some
+   * number of elements forward.
+   */
+  public static boolean elementsEqual(Iterator iterator1, Iterator iterator2) {
+    while (iterator1.hasNext()) {
+      if (!iterator2.hasNext()) {
+        return false;
+      }
+      Object o1 = iterator1.next();
+      Object o2 = iterator2.next();
+      if (!Objects.equals(o1, o2)) {
+        return false;
+      }
+    }
+    return !iterator2.hasNext();
+  }
+
+  @SafeVarargs
+  public static <T> Set<T> createSetFromElements(final T... elements) {
+    return Stream.of(elements).collect(Collectors.toSet());
+  }
+
+  public static <K, V> Map<K, V> createImmutableMap(final K key, final V value) {
+    return Collections.unmodifiableMap(Collections.singletonMap(key, value));
+  }
+
+  @SafeVarargs
+  public static <T> List<T> createImmutableList(final T... elements) {
+    return Collections.unmodifiableList(Stream.of(elements).collect(Collectors.toList()));
+  }
+
+  public static <K, V> Map<K, V> createImmutableMap(final Map<K, V> map) {
+    return Collections.unmodifiableMap(map);
+  }
+
+  @SafeVarargs
+  public static <T> Set<T> createImmutableSet(final T... elements) {
+    return Collections.unmodifiableSet(Stream.of(elements).collect(Collectors.toSet()));
+  }
+
+  public static <T> Set<T> createImmutableSet(final Set<T> set) {
+    return Collections.unmodifiableSet(set);
+  }
+
+  public static <T> List<T> createImmutableList(final List<T> list) {
+    return Collections.unmodifiableList(list);
+  }
+
+  private static Object[] checkElementsNotNull(Object... array) {
+    return checkElementsNotNull(array, array.length);
+  }
+
+  private static Object[] checkElementsNotNull(Object[] array, int length) {
+    for (int i = 0; i < length; i++) {
+      checkElementNotNull(array[i], i);
+    }
+    return array;
+  }
+
+  private static Object checkElementNotNull(Object element, int index) {
+    if (element == null) {
+      throw new NullPointerException("at index " + index);
+    }
+    return element;
+  }
+
+  public static class Maps {
 
 Review comment:
   We can file a follow-up JIRA and deal with it later as well.




[GitHub] [incubator-hudi] smarthi commented on a change in pull request #1159: [HUDI-479] Eliminate or Minimize use of Guava if possible

2020-03-25 Thread GitBox
smarthi commented on a change in pull request #1159: [HUDI-479] Eliminate or 
Minimize use of Guava if possible
URL: https://github.com/apache/incubator-hudi/pull/1159#discussion_r398332715
 
 

 ##
 File path: 
hudi-common/src/main/java/org/apache/hudi/common/util/CollectionUtils.java
 ##
 @@ -0,0 +1,132 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.util;
+
+import java.util.Collections;
+import java.util.Iterator;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+public class CollectionUtils {
+  /**
+   * Determines whether two iterators contain equal elements in the same order. More specifically,
+   * this method returns {@code true} if {@code iterator1} and {@code iterator2} contain the same
+   * number of elements and every element of {@code iterator1} is equal to the corresponding element
+   * of {@code iterator2}.
+   *
+   * Note that this will modify the supplied iterators, since they will have been advanced some
+   * number of elements forward.
+   */
+  public static boolean elementsEqual(Iterator iterator1, Iterator iterator2) {
+    while (iterator1.hasNext()) {
+      if (!iterator2.hasNext()) {
+        return false;
+      }
+      Object o1 = iterator1.next();
+      Object o2 = iterator2.next();
+      if (!Objects.equals(o1, o2)) {
+        return false;
+      }
+    }
+    return !iterator2.hasNext();
+  }
+
+  @SafeVarargs
+  public static <T> Set<T> createSetFromElements(final T... elements) {
+    return Stream.of(elements).collect(Collectors.toSet());
+  }
+
+  public static <K, V> Map<K, V> createImmutableMap(final K key, final V value) {
+    return Collections.unmodifiableMap(Collections.singletonMap(key, value));
+  }
+
+  @SafeVarargs
+  public static <T> List<T> createImmutableList(final T... elements) {
+    return Collections.unmodifiableList(Stream.of(elements).collect(Collectors.toList()));
+  }
+
+  public static <K, V> Map<K, V> createImmutableMap(final Map<K, V> map) {
+    return Collections.unmodifiableMap(map);
+  }
+
+  @SafeVarargs
+  public static <T> Set<T> createImmutableSet(final T... elements) {
+    return Collections.unmodifiableSet(Stream.of(elements).collect(Collectors.toSet()));
+  }
+
+  public static <T> Set<T> createImmutableSet(final Set<T> set) {
+    return Collections.unmodifiableSet(set);
+  }
+
+  public static <T> List<T> createImmutableList(final List<T> list) {
+    return Collections.unmodifiableList(list);
+  }
+
+  private static Object[] checkElementsNotNull(Object... array) {
+    return checkElementsNotNull(array, array.length);
+  }
+
+  private static Object[] checkElementsNotNull(Object[] array, int length) {
+    for (int i = 0; i < length; i++) {
+      checkElementNotNull(array[i], i);
+    }
+    return array;
+  }
+
+  private static Object checkElementNotNull(Object element, int index) {
+    if (element == null) {
+      throw new NullPointerException("at index " + index);
+    }
+    return element;
+  }
+
+  public static class Maps {
 
 Review comment:
   +1 - this was the closest to replacing Guava's ImmutableMap Builder pattern 
that was being widely used across the project. 
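The body of `Maps.MapBuilder` is not shown in this hunk. As a rough illustration of the call style being discussed, here is a minimal sketch of a builder that mimics the `put(...).build()` chain; the class name `MapBuilderSketch` and its behavior are assumptions, not the implementation in the pull request.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a small builder with an ImmutableMap.Builder-like call style.
final class MapBuilderSketch<K, V> {
  private final Map<K, V> entries = new HashMap<>();

  MapBuilderSketch<K, V> put(K key, V value) {
    entries.put(key, value);
    return this;
  }

  Map<K, V> build() {
    // Copy, then wrap, so later put() calls on the builder cannot change the returned map.
    return Collections.unmodifiableMap(new HashMap<>(entries));
  }

  public static void main(String[] args) {
    Map<String, Integer> map = new MapBuilderSketch<String, Integer>()
        .put("a", 1)
        .put("b", 2)
        .build();
    System.out.println(map.get("b"));
  }
}
```

Unlike Guava's `ImmutableMap.Builder`, a `HashMap`-backed builder such as this one accepts null keys and values and silently overwrites duplicate keys, so it is not a drop-in behavioral replacement.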




[GitHub] [incubator-hudi] lamber-ken commented on issue #1446: [MINOR] Add link for yields.io usage

2020-03-25 Thread GitBox
lamber-ken commented on issue #1446: [MINOR] Add link for yields.io usage
URL: https://github.com/apache/incubator-hudi/pull/1446#issuecomment-604243812
 
 
   > @lamber-ken can you please update the README on asf-site with some details 
on the automation and tips like these.. ? There is a section already.
   
   OK




[GitHub] [incubator-hudi] vinothchandar commented on issue #1446: [MINOR] Add link for yields.io usage

2020-03-25 Thread GitBox
vinothchandar commented on issue #1446: [MINOR] Add link for yields.io usage
URL: https://github.com/apache/incubator-hudi/pull/1446#issuecomment-604238808
 
 
   @lamber-ken can you please update the README on asf-site with some details 
on the automation and tips like these.. ? There is a section already. 




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1159: [HUDI-479] Eliminate or Minimize use of Guava if possible

2020-03-25 Thread GitBox
vinothchandar commented on a change in pull request #1159: [HUDI-479] Eliminate 
or Minimize use of Guava if possible
URL: https://github.com/apache/incubator-hudi/pull/1159#discussion_r398323808
 
 

 ##
 File path: 
hudi-common/src/main/java/org/apache/hudi/common/util/FileIOUtils.java
 ##
 @@ -91,4 +94,55 @@ public static void writeStringToFile(String str, String filePath) throws IOExcep
     out.flush();
     out.close();
   }
+
+  /**
+   * Closes a {@link Closeable}, with control over whether an {@code IOException} may be thrown.
+   * @param closeable the {@code Closeable} object to be closed, or null,
+   *  in which case this method does nothing.
+   * @param swallowIOException if true, don't propagate IO exceptions thrown by the {@code close} methods.
+   *
+   * @throws IOException if {@code swallowIOException} is false and {@code close} throws an {@code IOException}.
+   */
+  public static void close(@Nullable Closeable closeable, boolean swallowIOException)
+      throws IOException {
+    if (closeable == null) {
+      return;
+    }
+    try {
+      closeable.close();
+    } catch (IOException e) {
+      if (!swallowIOException) {
+        throw e;
+      }
+    }
+  }
+
+  /** Maximum loop count when creating temp directories. */
+  private static final int TEMP_DIR_ATTEMPTS = 1;
+
+  /**
+   * Create a Temporary Directory.
+   * @return {@code File}
+   */
+  public static File createTempDir() {
 
 Review comment:
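   This is mostly used by tests, right? I think we should just replace those
   uses with JUnit's temporary folder, or use `Files.createTempDirectory()` from
   the JDK here instead of creating the directory by hand. This code and its
   retry loop can then be simplified to one line. Note that JUnit's temporary
   folder deletes the directories after each test, whereas with
   `Files.createTempDirectory()` cleanup still has to be arranged explicitly
   (for example with `File.deleteOnExit()` or a shutdown hook).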
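A minimal sketch of the `Files.createTempDirectory()` suggestion, assuming a hypothetical class `TempDirSketch` and an arbitrary `hudi-test` prefix. The JDK call itself does not register any delete-on-exit hook, so the sketch removes the directory explicitly.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative sketch; the class name and the "hudi-test" prefix are assumptions.
final class TempDirSketch {

  /** Creates a uniquely named directory under the default temp location. */
  static Path createTempDir() {
    try {
      return Files.createTempDirectory("hudi-test");
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) throws IOException {
    Path dir = createTempDir();
    try {
      System.out.println("created " + dir + ", isDirectory=" + Files.isDirectory(dir));
    } finally {
      // Explicit cleanup; this simple delete works because the directory is empty.
      Files.deleteIfExists(dir);
    }
  }
}
```

In tests, JUnit 4's `TemporaryFolder` rule is an alternative that creates the folder before each test and attempts to delete it afterwards, so no cleanup code is needed in the test body.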




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1159: [HUDI-479] Eliminate or Minimize use of Guava if possible

2020-03-25 Thread GitBox
vinothchandar commented on a change in pull request #1159: [HUDI-479] Eliminate 
or Minimize use of Guava if possible
URL: https://github.com/apache/incubator-hudi/pull/1159#discussion_r398318223
 
 

 ##
 File path: 
hudi-cli/src/test/java/org/apache/hudi/cli/common/HoodieTestCommitMetadataGenerator.java
 ##
 @@ -95,10 +96,9 @@ public static HoodieCommitMetadata generateCommitMetadata(String basePath) throw
         HoodieTestUtils.createNewDataFile(basePath, DEFAULT_FIRST_PARTITION_PATH, "000");
     String file1P1C0 =
         HoodieTestUtils.createNewDataFile(basePath, DEFAULT_SECOND_PARTITION_PATH, "000");
-    return generateCommitMetadata(new ImmutableMap.Builder<String, List<String>>()
-      .put(DEFAULT_FIRST_PARTITION_PATH, new ImmutableList.Builder<>().add(file1P0C0).build())
-      .put(DEFAULT_SECOND_PARTITION_PATH, new ImmutableList.Builder<>().add(file1P1C0).build())
-      .build());
+    return generateCommitMetadata(new Maps.MapBuilder<String, List<String>>()
 
 Review comment:
   This can just be replaced with an in-place Map construction syntax:
   
   ```
   Map<String, String> doubleBraceMap = new HashMap<String, String>() {{
     put("key1", "value1");
     put("key2", "value2");
   }};
   ```




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1159: [HUDI-479] Eliminate or Minimize use of Guava if possible

2020-03-25 Thread GitBox
vinothchandar commented on a change in pull request #1159: [HUDI-479] Eliminate 
or Minimize use of Guava if possible
URL: https://github.com/apache/incubator-hudi/pull/1159#discussion_r398320467
 
 

 ##
 File path: 
hudi-common/src/main/java/org/apache/hudi/common/util/CleanerUtils.java
 ##
 @@ -28,19 +28,19 @@
 import org.apache.hudi.common.versioning.clean.CleanV1MigrationHandler;
 import org.apache.hudi.common.versioning.clean.CleanV2MigrationHandler;
 
-import com.google.common.collect.ImmutableMap;
-
 import java.io.IOException;
 import java.util.List;
 
+import static org.apache.hudi.common.util.CollectionUtils.Maps;
+
 public class CleanerUtils {
   public static final Integer CLEAN_METADATA_VERSION_1 = CleanV1MigrationHandler.VERSION;
   public static final Integer CLEAN_METADATA_VERSION_2 = CleanV2MigrationHandler.VERSION;
   public static final Integer LATEST_CLEAN_METADATA_VERSION = CLEAN_METADATA_VERSION_2;
 
   public static HoodieCleanMetadata convertCleanMetadata(HoodieTableMetaClient metaClient,
       String startCleanTime, Option durationInMs, List cleanStats) {
-    ImmutableMap.Builder partitionMetadataBuilder = ImmutableMap.builder();
+    Maps.MapBuilder partitionMetadataBuilder = new Maps.MapBuilder<>();
 
 Review comment:
   This can probably just be replaced by a concrete map implementation in
   place, right? This sort of usage does not really warrant the builder
   pattern, IMO.
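As a rough illustration of the "concrete map in place" idea, the sketch below fills a plain `HashMap` inside a loop and wraps it once at the end. The class `PlainMapSketch`, the method name, and the input shape are hypothetical stand-ins and do not reflect the real `convertCleanMetadata` types.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch; the method and its inputs are hypothetical stand-ins.
final class PlainMapSketch {

  static Map<String, Integer> fileCountsByPartition(Map<String, List<String>> filesByPartition) {
    // A plain HashMap is filled directly; no builder object is involved.
    Map<String, Integer> counts = new HashMap<>();
    for (Map.Entry<String, List<String>> entry : filesByPartition.entrySet()) {
      counts.put(entry.getKey(), entry.getValue().size());
    }
    // Wrap only if callers must not modify the result.
    return Collections.unmodifiableMap(counts);
  }

  public static void main(String[] args) {
    Map<String, List<String>> input = new HashMap<>();
    input.put("2020/03/25", Arrays.asList("file1", "file2"));
    System.out.println(fileCountsByPartition(input).get("2020/03/25")); // prints 2
  }
}
```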




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1159: [HUDI-479] Eliminate or Minimize use of Guava if possible

2020-03-25 Thread GitBox
vinothchandar commented on a change in pull request #1159: [HUDI-479] Eliminate 
or Minimize use of Guava if possible
URL: https://github.com/apache/incubator-hudi/pull/1159#discussion_r398321039
 
 

 ##
 File path: 
hudi-common/src/main/java/org/apache/hudi/common/util/CollectionUtils.java
 ##
 @@ -0,0 +1,132 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.util;
+
+import java.util.Collections;
+import java.util.Iterator;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+public class CollectionUtils {
+  /**
+   * Determines whether two iterators contain equal elements in the same order. More specifically,
+   * this method returns {@code true} if {@code iterator1} and {@code iterator2} contain the same
+   * number of elements and every element of {@code iterator1} is equal to the corresponding element
+   * of {@code iterator2}.
+   *
+   * Note that this will modify the supplied iterators, since they will have been advanced some
+   * number of elements forward.
+   */
+  public static boolean elementsEqual(Iterator iterator1, Iterator iterator2) {
+    while (iterator1.hasNext()) {
+      if (!iterator2.hasNext()) {
+        return false;
+      }
+      Object o1 = iterator1.next();
+      Object o2 = iterator2.next();
+      if (!Objects.equals(o1, o2)) {
+        return false;
+      }
+    }
+    return !iterator2.hasNext();
+  }
+
+  @SafeVarargs
+  public static <T> Set<T> createSetFromElements(final T... elements) {
 
 Review comment:
   rename to just `createSet`? 
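The `elementsEqual` Javadoc in the hunk above warns that the supplied iterators are advanced as a side effect. The sketch below restates that logic in a hypothetical class `ElementsEqualSketch` (the wildcard `Iterator<?>` parameters are an assumption) and shows where the iterators end up after a mismatch.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Objects;

// Illustrative sketch; restates the elementsEqual logic quoted in the hunk above.
final class ElementsEqualSketch {

  static boolean elementsEqual(Iterator<?> iterator1, Iterator<?> iterator2) {
    while (iterator1.hasNext()) {
      if (!iterator2.hasNext()) {
        return false;
      }
      if (!Objects.equals(iterator1.next(), iterator2.next())) {
        return false;
      }
    }
    return !iterator2.hasNext();
  }

  public static void main(String[] args) {
    List<String> a = Arrays.asList("x", "y", "z");
    List<String> b = Arrays.asList("x", "q", "z");
    Iterator<String> left = a.iterator();
    Iterator<String> right = b.iterator();
    System.out.println(elementsEqual(left, right)); // false: the second elements differ
    System.out.println(left.next());                // z: both iterators were advanced past the mismatch
  }
}
```

Callers that need the data again after the comparison should therefore obtain fresh iterators from the underlying collections.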




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1159: [HUDI-479] Eliminate or Minimize use of Guava if possible

2020-03-25 Thread GitBox
vinothchandar commented on a change in pull request #1159: [HUDI-479] Eliminate 
or Minimize use of Guava if possible
URL: https://github.com/apache/incubator-hudi/pull/1159#discussion_r398322800
 
 

 ##
 File path: 
hudi-common/src/main/java/org/apache/hudi/common/util/CollectionUtils.java
 ##
 @@ -0,0 +1,132 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.util;
+
+import java.util.Collections;
+import java.util.Iterator;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+public class CollectionUtils {
+  /**
+   * Determines whether two iterators contain equal elements in the same order. More specifically,
+   * this method returns {@code true} if {@code iterator1} and {@code iterator2} contain the same
+   * number of elements and every element of {@code iterator1} is equal to the corresponding element
+   * of {@code iterator2}.
+   *
+   * Note that this will modify the supplied iterators, since they will have been advanced some
+   * number of elements forward.
+   */
+  public static boolean elementsEqual(Iterator iterator1, Iterator iterator2) {
+    while (iterator1.hasNext()) {
+      if (!iterator2.hasNext()) {
+        return false;
+      }
+      Object o1 = iterator1.next();
+      Object o2 = iterator2.next();
+      if (!Objects.equals(o1, o2)) {
+        return false;
+      }
+    }
+    return !iterator2.hasNext();
+  }
+
+  @SafeVarargs
+  public static <T> Set<T> createSetFromElements(final T... elements) {
+    return Stream.of(elements).collect(Collectors.toSet());
+  }
+
+  public static <K, V> Map<K, V> createImmutableMap(final K key, final V value) {
+    return Collections.unmodifiableMap(Collections.singletonMap(key, value));
+  }
+
+  @SafeVarargs
+  public static <T> List<T> createImmutableList(final T... elements) {
+    return Collections.unmodifiableList(Stream.of(elements).collect(Collectors.toList()));
+  }
+
+  public static <K, V> Map<K, V> createImmutableMap(final Map<K, V> map) {
+    return Collections.unmodifiableMap(map);
+  }
+
+  @SafeVarargs
+  public static <T> Set<T> createImmutableSet(final T... elements) {
+    return Collections.unmodifiableSet(Stream.of(elements).collect(Collectors.toSet()));
+  }
+
+  public static <T> Set<T> createImmutableSet(final Set<T> set) {
+    return Collections.unmodifiableSet(set);
+  }
+
+  public static <T> List<T> createImmutableList(final List<T> list) {
+    return Collections.unmodifiableList(list);
+  }
+
+  private static Object[] checkElementsNotNull(Object... array) {
+    return checkElementsNotNull(array, array.length);
+  }
+
+  private static Object[] checkElementsNotNull(Object[] array, int length) {
+    for (int i = 0; i < length; i++) {
+      checkElementNotNull(array[i], i);
+    }
+    return array;
+  }
+
+  private static Object checkElementNotNull(Object element, int index) {
+    if (element == null) {
+      throw new NullPointerException("at index " + index);
+    }
+    return element;
+  }
+
+  public static class Maps {
 
 Review comment:
   I am wondering if we can eliminate the need for this class and the builder
class by replacing them with plain HashMaps inline?
   - In places where a static HashMap is needed, we can use the map
initialization syntax I pasted above.
   
   I don't feel very strongly about this, but I think we can avoid builder
patterns where they are not actually used that way.




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1159: [HUDI-479] Eliminate or Minimize use of Guava if possible

2020-03-25 Thread GitBox
vinothchandar commented on a change in pull request #1159: [HUDI-479] Eliminate 
or Minimize use of Guava if possible
URL: https://github.com/apache/incubator-hudi/pull/1159#discussion_r398322962
 
 

 ##
 File path: 
hudi-common/src/main/java/org/apache/hudi/common/util/FileIOUtils.java
 ##
 @@ -91,4 +94,55 @@ public static void writeStringToFile(String str, String filePath) throws IOExcep
     out.flush();
     out.close();
   }
+
+  /**
+   * Closes a {@link Closeable}, with control over whether an {@code IOException} may be thrown.
 
 Review comment:
   did we reuse this code from somewhere?




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1159: [HUDI-479] Eliminate or Minimize use of Guava if possible

2020-03-25 Thread GitBox
vinothchandar commented on a change in pull request #1159: [HUDI-479] Eliminate 
or Minimize use of Guava if possible
URL: https://github.com/apache/incubator-hudi/pull/1159#discussion_r398325200
 
 

 ##
 File path: 
hudi-common/src/main/java/org/apache/hudi/common/util/ReflectionUtils.java
 ##
 @@ -90,16 +99,58 @@ public static Object loadClass(String clazz, Object... constructorArgs) {
   }
 
   /**
-   * Return stream of top level class names in the same class path as passed-in class.
-   *
-   * @param clazz
+   * Scans all classes accessible from the context class loader
+   * which belong to the given package and subpackages.
+   *
+   * @param clazz class
+   * @return Stream of Class names in package
    */
-  public static Stream getTopLevelClassesInClasspath(Class clazz) {
+  public static Stream getTopLevelClassesInClasspath(Class clazz) {
 
 Review comment:
   This method is only used by the dummy bundle main classes, right? Ideally,
I'd like to simplify this down the line. If you agree, we can raise a JIRA?




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1159: [HUDI-479] Eliminate or Minimize use of Guava if possible

2020-03-25 Thread GitBox
vinothchandar commented on a change in pull request #1159: [HUDI-479] Eliminate 
or Minimize use of Guava if possible
URL: https://github.com/apache/incubator-hudi/pull/1159#discussion_r398321995
 
 

 ##
 File path: 
hudi-common/src/main/java/org/apache/hudi/common/util/CollectionUtils.java
 ##
 @@ -0,0 +1,132 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.util;
+
+import java.util.Collections;
+import java.util.Iterator;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+public class CollectionUtils {
+  /**
+   * Determines whether two iterators contain equal elements in the same order. More specifically,
+   * this method returns {@code true} if {@code iterator1} and {@code iterator2} contain the same
+   * number of elements and every element of {@code iterator1} is equal to the corresponding element
+   * of {@code iterator2}.
+   *
+   * Note that this will modify the supplied iterators, since they will have been advanced some
+   * number of elements forward.
+   */
+  public static boolean elementsEqual(Iterator iterator1, Iterator iterator2) {
+    while (iterator1.hasNext()) {
+      if (!iterator2.hasNext()) {
+        return false;
+      }
+      Object o1 = iterator1.next();
+      Object o2 = iterator2.next();
+      if (!Objects.equals(o1, o2)) {
+        return false;
+      }
+    }
+    return !iterator2.hasNext();
+  }
+
+  @SafeVarargs
+  public static <T> Set<T> createSetFromElements(final T... elements) {
+    return Stream.of(elements).collect(Collectors.toSet());
+  }
+
+  public static <K, V> Map<K, V> createImmutableMap(final K key, final V value) {
+    return Collections.unmodifiableMap(Collections.singletonMap(key, value));
+  }
+
+  @SafeVarargs
+  public static <T> List<T> createImmutableList(final T... elements) {
+    return Collections.unmodifiableList(Stream.of(elements).collect(Collectors.toList()));
+  }
+
+  public static <K, V> Map<K, V> createImmutableMap(final Map<K, V> map) {
+    return Collections.unmodifiableMap(map);
+  }
+
+  @SafeVarargs
+  public static <T> Set<T> createImmutableSet(final T... elements) {
+    return Collections.unmodifiableSet(Stream.of(elements).collect(Collectors.toSet()));
+  }
+
+  public static <T> Set<T> createImmutableSet(final Set<T> set) {
+    return Collections.unmodifiableSet(set);
+  }
+
+  public static <T> List<T> createImmutableList(final List<T> list) {
+    return Collections.unmodifiableList(list);
+  }
+
+  private static Object[] checkElementsNotNull(Object... array) {
+    return checkElementsNotNull(array, array.length);
+  }
+
+  private static Object[] checkElementsNotNull(Object[] array, int length) {
+    for (int i = 0; i < length; i++) {
+      checkElementNotNull(array[i], i);
+    }
+    return array;
+  }
+
+  private static Object checkElementNotNull(Object element, int index) {
+    if (element == null) {
 
 Review comment:
   just call Objects.requireNonNull(obj, string) ? 
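
A minimal sketch of what this suggestion could look like, assuming the helper keeps its current name and "at index N" message. `NullCheckSketch` is a hypothetical class used only for illustration; `Objects.requireNonNull(obj, message)` is the standard JDK call and returns its argument when it is non-null.

```java
import java.util.Objects;

class NullCheckSketch {
  // Hypothetical stand-in for checkElementNotNull: delegates to the JDK helper,
  // which throws NullPointerException with the given message when element is null.
  public static Object checkElementNotNull(Object element, int index) {
    return Objects.requireNonNull(element, "at index " + index);
  }

  public static Object[] checkElementsNotNull(Object... array) {
    for (int i = 0; i < array.length; i++) {
      checkElementNotNull(array[i], i);
    }
    return array;
  }

  public static void main(String[] args) {
    checkElementsNotNull("a", "b");
    try {
      checkElementsNotNull("a", null);
    } catch (NullPointerException e) {
      // The message is built from the index of the first null element.
      System.out.println(e.getMessage());
    }
  }
}
```

One trade-off worth noting: the message string is concatenated on every call, even when the element is non-null. The `Supplier<String>` overload of `requireNonNull` would avoid that if it matters here.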




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1159: [HUDI-479] Eliminate or Minimize use of Guava if possible

2020-03-25 Thread GitBox
vinothchandar commented on a change in pull request #1159: [HUDI-479] Eliminate 
or Minimize use of Guava if possible
URL: https://github.com/apache/incubator-hudi/pull/1159#discussion_r398321209
 
 

 ##
 File path: 
hudi-common/src/main/java/org/apache/hudi/common/util/CollectionUtils.java
 ##
 @@ -0,0 +1,132 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.util;
+
+import java.util.Collections;
+import java.util.Iterator;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+public class CollectionUtils {
+  /**
+   * Determines whether two iterators contain equal elements in the same order. More specifically,
+   * this method returns {@code true} if {@code iterator1} and {@code iterator2} contain the same
+   * number of elements and every element of {@code iterator1} is equal to the corresponding element
+   * of {@code iterator2}.
+   *
+   * Note that this will modify the supplied iterators, since they will have been advanced some
+   * number of elements forward.
+   */
+  public static boolean elementsEqual(Iterator iterator1, Iterator iterator2) {
+    while (iterator1.hasNext()) {
+      if (!iterator2.hasNext()) {
+        return false;
+      }
+      Object o1 = iterator1.next();
+      Object o2 = iterator2.next();
+      if (!Objects.equals(o1, o2)) {
+        return false;
+      }
+    }
+    return !iterator2.hasNext();
+  }
+
+  @SafeVarargs
+  public static <T> Set<T> createSetFromElements(final T... elements) {
+    return Stream.of(elements).collect(Collectors.toSet());
+  }
+
+  public static <K, V> Map<K, V> createImmutableMap(final K key, final V value) {
+    return Collections.unmodifiableMap(Collections.singletonMap(key, value));
+  }
+
+  @SafeVarargs
+  public static <T> List<T> createImmutableList(final T... elements) {
+    return Collections.unmodifiableList(Stream.of(elements).collect(Collectors.toList()));
+  }
+
+  public static <K, V> Map<K, V> createImmutableMap(final Map<K, V> map) {
+    return Collections.unmodifiableMap(map);
+  }
+
+  @SafeVarargs
+  public static <T> Set<T> createImmutableSet(final T... elements) {
+    return Collections.unmodifiableSet(Stream.of(elements).collect(Collectors.toSet()));
 
 Review comment:
   reuse and call createSet() from above?
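
A minimal sketch of the reuse being suggested, assuming `createSet()` refers to the `createSetFromElements` method defined earlier in the same diff. `SetFactorySketch` is a hypothetical holder class for illustration only.

```java
import java.util.Collections;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class SetFactorySketch {
  @SafeVarargs
  public static <T> Set<T> createSetFromElements(final T... elements) {
    return Stream.of(elements).collect(Collectors.toSet());
  }

  @SafeVarargs
  public static <T> Set<T> createImmutableSet(final T... elements) {
    // Delegate to the mutable factory, then wrap the result in a read-only view.
    return Collections.unmodifiableSet(createSetFromElements(elements));
  }
}
```

Because the mutable set never escapes `createImmutableSet`, the unmodifiable view is effectively immutable for callers.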




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1159: [HUDI-479] Eliminate or Minimize use of Guava if possible

2020-03-25 Thread GitBox
vinothchandar commented on a change in pull request #1159: [HUDI-479] Eliminate 
or Minimize use of Guava if possible
URL: https://github.com/apache/incubator-hudi/pull/1159#discussion_r398323238
 
 

 ##
 File path: 
hudi-common/src/main/java/org/apache/hudi/common/util/FileIOUtils.java
 ##
 @@ -91,4 +94,55 @@ public static void writeStringToFile(String str, String filePath) throws IOExcep
     out.flush();
     out.close();
   }
+
+  /**
+   * Closes a {@link Closeable}, with control over whether an {@code IOException} may be thrown.
+   * @param closeable the {@code Closeable} object to be closed, or null,
+   *  in which case this method does nothing.
+   * @param swallowIOException if true, don't propagate IO exceptions thrown by the {@code close} methods.
+   *
+   * @throws IOException if {@code swallowIOException} is false and {@code close} throws an {@code IOException}.
+   */
+  public static void close(@Nullable Closeable closeable, boolean swallowIOException)
+      throws IOException {
+    if (closeable == null) {
+      return;
+    }
+    try {
+      closeable.close();
 
 Review comment:
   just the following in the try block? 
   
   ```
   if (closeable != null) {
     closeable.close();
   }
   ```
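
For context, the whole method might then look roughly like the sketch below. The `catch` branch is an assumption reconstructed from the Javadoc in the diff (the quoted hunk stops at `closeable.close()`), and `CloseSketch` is a hypothetical class name.

```java
import java.io.Closeable;
import java.io.IOException;

class CloseSketch {
  public static void close(Closeable closeable, boolean swallowIOException) throws IOException {
    try {
      // Null check folded into the try block, as suggested in the comment above.
      if (closeable != null) {
        closeable.close();
      }
    } catch (IOException e) {
      // Assumed behavior, per the Javadoc: rethrow unless the caller asked to swallow.
      if (!swallowIOException) {
        throw e;
      }
    }
  }
}
```

The early-return form and this form behave the same; the difference is purely one less exit point.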




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1159: [HUDI-479] Eliminate or Minimize use of Guava if possible

2020-03-25 Thread GitBox
vinothchandar commented on a change in pull request #1159: [HUDI-479] Eliminate 
or Minimize use of Guava if possible
URL: https://github.com/apache/incubator-hudi/pull/1159#discussion_r398323935
 
 

 ##
 File path: 
hudi-common/src/main/java/org/apache/hudi/common/util/ReflectionUtils.java
 ##
 @@ -21,21 +21,30 @@
 import org.apache.hudi.common.model.HoodieRecordPayload;
 import org.apache.hudi.exception.HoodieException;
 
-import com.google.common.reflect.ClassPath;
-import com.google.common.reflect.ClassPath.ClassInfo;
+import org.slf4j.Logger;
 
 Review comment:
   I think we use log4j directly?




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1159: [HUDI-479] Eliminate or Minimize use of Guava if possible

2020-03-25 Thread GitBox
vinothchandar commented on a change in pull request #1159: [HUDI-479] Eliminate 
or Minimize use of Guava if possible
URL: https://github.com/apache/incubator-hudi/pull/1159#discussion_r398318223
 
 

 ##
 File path: 
hudi-cli/src/test/java/org/apache/hudi/cli/common/HoodieTestCommitMetadataGenerator.java
 ##
 @@ -95,10 +96,9 @@ public static HoodieCommitMetadata generateCommitMetadata(String basePath) throw
         HoodieTestUtils.createNewDataFile(basePath, DEFAULT_FIRST_PARTITION_PATH, "000");
     String file1P1C0 =
         HoodieTestUtils.createNewDataFile(basePath, DEFAULT_SECOND_PARTITION_PATH, "000");
-    return generateCommitMetadata(new ImmutableMap.Builder()
-      .put(DEFAULT_FIRST_PARTITION_PATH, new ImmutableList.Builder<>().add(file1P0C0).build())
-      .put(DEFAULT_SECOND_PARTITION_PATH, new ImmutableList.Builder<>().add(file1P1C0).build())
-      .build());
+    return generateCommitMetadata(new Maps.MapBuilder<String, List<String>>()
 
 Review comment:
   This can just be replaced with an in-place Map construction syntax?
   
   ```
   Map<String, String> doubleBraceMap = new HashMap<String, String>() {{
   put("key1", "value1");
   put("key2", "value2");
   }};
   ```
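
As a point of comparison, here is a sketch of the same idea without double-brace initialization, which creates an anonymous `HashMap` subclass at each use site. The class name, method name, and partition keys are hypothetical and only stand in for the test generator's constants.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PartitionFilesSketch {
  // Hypothetical helper: keys and file ids are illustrative, not Hudi constants.
  public static Map<String, List<String>> partitionToFiles(String file1, String file2) {
    Map<String, List<String>> map = new HashMap<>();
    map.put("partition1", Collections.singletonList(file1));
    map.put("partition2", Collections.singletonList(file2));
    // Wrap before returning so callers cannot mutate the mapping.
    return Collections.unmodifiableMap(map);
  }
}
```

Either form removes the need for a dedicated builder class; the plain `HashMap` variant is slightly longer but avoids the extra anonymous class.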




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1418: [HUDI-678] Make config package spark free

2020-03-25 Thread GitBox
vinothchandar commented on a change in pull request #1418: [HUDI-678] Make 
config package spark free
URL: https://github.com/apache/incubator-hudi/pull/1418#discussion_r398317120
 
 

 ##
 File path: 
hudi-client/src/main/java/org/apache/hudi/config/HoodieMemoryConfig.java
 ##
 @@ -113,52 +112,20 @@ public Builder withWriteStatusFailureFraction(double failureFraction) {
       return this;
     }
 
-    /**
-     * Dynamic calculation of max memory to use for for spillable map. user.available.memory = spark.executor.memory *
-     * (1 - spark.memory.fraction) spillable.available.memory = user.available.memory * hoodie.memory.fraction. Anytime
-     * the spark.executor.memory or the spark.memory.fraction is changed, the memory used for spillable map changes
-     * accordingly
-     */
-    private long getMaxMemoryAllowedForMerge(String maxMemoryFraction) {
-      final String SPARK_EXECUTOR_MEMORY_PROP = "spark.executor.memory";
-      final String SPARK_EXECUTOR_MEMORY_FRACTION_PROP = "spark.memory.fraction";
-      // This is hard-coded in spark code {@link
-      // https://github.com/apache/spark/blob/576c43fb4226e4efa12189b41c3bc862019862c6/core/src/main/scala/org/apache/
-      // spark/memory/UnifiedMemoryManager.scala#L231} so have to re-define this here
-      final String DEFAULT_SPARK_EXECUTOR_MEMORY_FRACTION = "0.6";
-      // This is hard-coded in spark code {@link
-      // https://github.com/apache/spark/blob/576c43fb4226e4efa12189b41c3bc862019862c6/core/src/main/scala/org/apache/
-      // spark/SparkContext.scala#L471} so have to re-define this here
-      final String DEFAULT_SPARK_EXECUTOR_MEMORY_MB = "1024"; // in MB
-
-      if (SparkEnv.get() != null) {
-        // 1 GB is the default conf used by Spark, look at SparkContext.scala
-        long executorMemoryInBytes = Utils.memoryStringToMb(
-            SparkEnv.get().conf().get(SPARK_EXECUTOR_MEMORY_PROP, DEFAULT_SPARK_EXECUTOR_MEMORY_MB)) * 1024 * 1024L;
-        // 0.6 is the default value used by Spark,
-        // look at {@link
-        // https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkConf.scala#L507}
-        double memoryFraction = Double.parseDouble(
-            SparkEnv.get().conf().get(SPARK_EXECUTOR_MEMORY_FRACTION_PROP, DEFAULT_SPARK_EXECUTOR_MEMORY_FRACTION));
-        double maxMemoryFractionForMerge = Double.parseDouble(maxMemoryFraction);
-        double userAvailableMemory = executorMemoryInBytes * (1 - memoryFraction);
-        long maxMemoryForMerge = (long) Math.floor(userAvailableMemory * maxMemoryFractionForMerge);
-        return Math.max(DEFAULT_MIN_MEMORY_FOR_SPILLABLE_MAP_IN_BYTES, maxMemoryForMerge);
-      } else {
-        return DEFAULT_MAX_MEMORY_FOR_SPILLABLE_MAP_IN_BYTES;
-      }
-    }
-
     public HoodieMemoryConfig build() {
       HoodieMemoryConfig config = new HoodieMemoryConfig(props);
       setDefaultOnCondition(props, !props.containsKey(MAX_MEMORY_FRACTION_FOR_COMPACTION_PROP),
           MAX_MEMORY_FRACTION_FOR_COMPACTION_PROP, DEFAULT_MAX_MEMORY_FRACTION_FOR_COMPACTION);
       setDefaultOnCondition(props, !props.containsKey(MAX_MEMORY_FRACTION_FOR_MERGE_PROP),
           MAX_MEMORY_FRACTION_FOR_MERGE_PROP, DEFAULT_MAX_MEMORY_FRACTION_FOR_MERGE);
+      long maxMemoryAllowedForMerge =
+          SparkConfigUtils.getMaxMemoryAllowedForMerge(props.getProperty(MAX_MEMORY_FRACTION_FOR_MERGE_PROP));
       setDefaultOnCondition(props, !props.containsKey(MAX_MEMORY_FOR_MERGE_PROP), MAX_MEMORY_FOR_MERGE_PROP,
-          String.valueOf(getMaxMemoryAllowedForMerge(props.getProperty(MAX_MEMORY_FRACTION_FOR_MERGE_PROP))));
+          String.valueOf(maxMemoryAllowedForMerge));
+      long maxMemoryAllowedForCompaction =
+          SparkConfigUtils.getMaxMemoryAllowedForMerge(props.getProperty(MAX_MEMORY_FRACTION_FOR_COMPACTION_PROP));
 
 Review comment:
   If the classes in `config` call SparkConfigUtils, then we cannot claim
it's Spark free, right?
   
   cc @yanghua as well 
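
To illustrate the concern, the arithmetic in the removed method does not itself need Spark once the executor settings are passed in as plain values. The sketch below follows the formula in the removed Javadoc. The class name, the parameter names, and the `ASSUMED_MIN_BYTES` floor are hypothetical (the real `DEFAULT_MIN_MEMORY_FOR_SPILLABLE_MAP_IN_BYTES` value is not shown in the diff), and it is not the approach the PR actually takes.

```java
class SpillableMemorySketch {
  // Assumed floor for illustration only; the real constant's value is not in the diff.
  static final long ASSUMED_MIN_BYTES = 100L * 1024 * 1024;

  public static long maxMemoryForMerge(long executorMemoryMb,
                                       double sparkMemoryFraction,
                                       double hoodieMemoryFraction) {
    long executorMemoryInBytes = executorMemoryMb * 1024 * 1024L;
    // user.available.memory = spark.executor.memory * (1 - spark.memory.fraction)
    double userAvailableMemory = executorMemoryInBytes * (1 - sparkMemoryFraction);
    // spillable.available.memory = user.available.memory * hoodie.memory.fraction
    long maxMemoryForMerge = (long) Math.floor(userAvailableMemory * hoodieMemoryFraction);
    return Math.max(ASSUMED_MIN_BYTES, maxMemoryForMerge);
  }
}
```

With this shape, only the caller that reads `spark.executor.memory` and `spark.memory.fraction` would need a Spark dependency, and the `config` package could stay free of it.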




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1418: [HUDI-678] Make config package spark free

2020-03-25 Thread GitBox
vinothchandar commented on a change in pull request #1418: [HUDI-678] Make 
config package spark free
URL: https://github.com/apache/incubator-hudi/pull/1418#discussion_r398316910
 
 

 ##
 File path: 
hudi-client/src/main/java/org/apache/hudi/config/HoodieMemoryConfig.java
 ##
 @@ -113,40 +112,8 @@ public Builder withWriteStatusFailureFraction(double failureFraction) {
       return this;
     }
 
-    /**
-     * Dynamic calculation of max memory to use for for spillable map. user.available.memory = spark.executor.memory *
-     * (1 - spark.memory.fraction) spillable.available.memory = user.available.memory * hoodie.memory.fraction. Anytime
-     * the spark.executor.memory or the spark.memory.fraction is changed, the memory used for spillable map changes
-     * accordingly
-     */
     private long getMaxMemoryAllowedForMerge(String maxMemoryFraction) {
-      final String SPARK_EXECUTOR_MEMORY_PROP = "spark.executor.memory";
-      final String SPARK_EXECUTOR_MEMORY_FRACTION_PROP = "spark.memory.fraction";
-      // This is hard-coded in spark code {@link
-      // https://github.com/apache/spark/blob/576c43fb4226e4efa12189b41c3bc862019862c6/core/src/main/scala/org/apache/
-      // spark/memory/UnifiedMemoryManager.scala#L231} so have to re-define this here
-      final String DEFAULT_SPARK_EXECUTOR_MEMORY_FRACTION = "0.6";
-      // This is hard-coded in spark code {@link
-      // https://github.com/apache/spark/blob/576c43fb4226e4efa12189b41c3bc862019862c6/core/src/main/scala/org/apache/
-      // spark/SparkContext.scala#L471} so have to re-define this here
-      final String DEFAULT_SPARK_EXECUTOR_MEMORY_MB = "1024"; // in MB
-
-      if (SparkEnv.get() != null) {
-        // 1 GB is the default conf used by Spark, look at SparkContext.scala
-        long executorMemoryInBytes = Utils.memoryStringToMb(
-            SparkEnv.get().conf().get(SPARK_EXECUTOR_MEMORY_PROP, DEFAULT_SPARK_EXECUTOR_MEMORY_MB)) * 1024 * 1024L;
-        // 0.6 is the default value used by Spark,
-        // look at {@link
-        // https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkConf.scala#L507}
-        double memoryFraction = Double.parseDouble(
-            SparkEnv.get().conf().get(SPARK_EXECUTOR_MEMORY_FRACTION_PROP, DEFAULT_SPARK_EXECUTOR_MEMORY_FRACTION));
-        double maxMemoryFractionForMerge = Double.parseDouble(maxMemoryFraction);
-        double userAvailableMemory = executorMemoryInBytes * (1 - memoryFraction);
-        long maxMemoryForMerge = (long) Math.floor(userAvailableMemory * maxMemoryFractionForMerge);
-        return Math.max(DEFAULT_MIN_MEMORY_FOR_SPILLABLE_MAP_IN_BYTES, maxMemoryForMerge);
-      } else {
-        return DEFAULT_MAX_MEMORY_FOR_SPILLABLE_MAP_IN_BYTES;
-      }
+      return ConfigUtils.getMaxMemoryAllowedForMerge(props, maxMemoryFraction);
 
 Review comment:
   @leesf I think this is still not addressed?




[jira] [Created] (HUDI-735) Improve deltastreamer error message when case mismatch of commandline arguments.

2020-03-25 Thread Vinoth Chandar (Jira)
Vinoth Chandar created HUDI-735:
---

 Summary: Improve deltastreamer error message when case mismatch of 
commandline arguments.
 Key: HUDI-735
 URL: https://issues.apache.org/jira/browse/HUDI-735
 Project: Apache Hudi (incubating)
  Issue Type: Improvement
  Components: DeltaStreamer, Usability, Utilities
Reporter: Vinoth Chandar


Team,

When following the blog "Change Capture Using AWS Database Migration
Service and Hudi" with my own data set, the initial load works perfectly.
When issuing the command with the DMS CDC files on S3, I get the following
error:

20/03/24 17:56:28 ERROR HoodieDeltaStreamer: Got error running delta sync once. Shutting down
org.apache.hudi.exception.HoodieException: Please provide a valid schema provider class!
 at org.apache.hudi.utilities.sources.InputBatch.getSchemaProvider(InputBatch.java:53)
 at org.apache.hudi.utilities.deltastreamer.DeltaSync.readFromSource(DeltaSync.java:312)
 at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:226)

I tried using --schemaprovider-class
org.apache.hudi.utilities.schema.FilebasedSchemaProvider.Source and provided
the schema. The error no longer occurs, but nothing is written to Hudi.

I am not performing any transformations (other than the DMS transform) and
using default record key strategy.

If the team has any pointers, please let me know.

Thank you!

---


Thank you Vinoth. I was able to find the issue. All my column names were in
upper case. I switched the column names and table names to lower case and
it works perfectly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1448: [MINOR] Update DOAP with 0.5.2 Release

2020-03-25 Thread GitBox
vinothchandar commented on a change in pull request #1448: [MINOR] Update DOAP 
with 0.5.2 Release
URL: https://github.com/apache/incubator-hudi/pull/1448#discussion_r398315198
 
 

 ##
 File path: doap_HUDI.rdf
 ##
 @@ -46,6 +46,11 @@
 2020-01-31
 0.5.1
   
+  
 
 Review comment:
   @yanghua let's update this step in the release guide? 




[GitHub] [incubator-hudi] vinothchandar commented on issue #1443: [SUPPORT] Spark-Hudi consumes too much space in a temp folder while upsert

2020-03-25 Thread GitBox
vinothchandar commented on issue #1443: [SUPPORT] Spark-Hudi consumes too much 
space in a temp folder while upsert
URL: https://github.com/apache/incubator-hudi/issues/1443#issuecomment-604226504
 
 
   good idea @lamber-ken !




[GitHub] [incubator-hudi] codecov-io edited a comment on issue #1159: [HUDI-479] Eliminate or Minimize use of Guava if possible

2020-03-25 Thread GitBox
codecov-io edited a comment on issue #1159: [HUDI-479] Eliminate or Minimize 
use of Guava if possible
URL: https://github.com/apache/incubator-hudi/pull/1159#issuecomment-596089314
 
 
   # [Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1159?src=pr&el=h1) Report
   > Merging [#1159](https://codecov.io/gh/apache/incubator-hudi/pull/1159?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-hudi/commit/e101ea9bd4405a461bc78aad1af64499f797daed&el=desc) will **decrease** coverage by `0.18%`.
   > The diff coverage is `44.82%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-hudi/pull/1159/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1159?src=pr&el=tree)
   
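   ```diff
   @@            Coverage Diff             @@
   ##           master    #1159      +/-   ##
   ==========================================
   - Coverage   67.68%   67.49%   -0.19%
     Complexity    261      261
   ==========================================
     Files         341      342       +1
     Lines       16511    16584      +73
     Branches     1688     1699      +11
   ==========================================
   + Hits        11175    11194      +19
   - Misses       4599     4650      +51
   - Partials      737      740       +3
   ```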
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-hudi/pull/1159?src=pr&el=tree) | 
Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...in/java/org/apache/hudi/io/HoodieAppendHandle.java](https://codecov.io/gh/apache/incubator-hudi/pull/1159/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW8vSG9vZGllQXBwZW5kSGFuZGxlLmphdmE=)
 | `84.17% <ø> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[...va/org/apache/hudi/metrics/JmxMetricsReporter.java](https://codecov.io/gh/apache/incubator-hudi/pull/1159/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9KbXhNZXRyaWNzUmVwb3J0ZXIuamF2YQ==)
 | `0.00% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[...ava/org/apache/hudi/metrics/JmxReporterServer.java](https://codecov.io/gh/apache/incubator-hudi/pull/1159/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9KbXhSZXBvcnRlclNlcnZlci5qYXZh)
 | `0.00% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[...pache/hudi/common/table/HoodieTableMetaClient.java](https://codecov.io/gh/apache/incubator-hudi/pull/1159/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL0hvb2RpZVRhYmxlTWV0YUNsaWVudC5qYXZh)
 | `76.77% <ø> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[...main/java/org/apache/hudi/common/util/FSUtils.java](https://codecov.io/gh/apache/incubator-hudi/pull/1159/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvRlNVdGlscy5qYXZh)
 | `69.27% <ø> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[...pache/hudi/utilities/sources/HoodieIncrSource.java](https://codecov.io/gh/apache/incubator-hudi/pull/1159/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSG9vZGllSW5jclNvdXJjZS5qYXZh)
 | `92.59% <ø> (ø)` | `7.00 <0.00> (ø)` | |
   | 
[...a/org/apache/hudi/common/util/ReflectionUtils.java](https://codecov.io/gh/apache/incubator-hudi/pull/1159/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvUmVmbGVjdGlvblV0aWxzLmphdmE=)
 | `30.76% <3.22%> (-31.74%)` | `0.00 <0.00> (ø)` | |
   | 
[.../java/org/apache/hudi/common/util/FileIOUtils.java](https://codecov.io/gh/apache/incubator-hudi/pull/1159/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvRmlsZUlPVXRpbHMuamF2YQ==)
 | `54.54% <33.33%> (-10.98%)` | `0.00 <0.00> (ø)` | |
   | 
[...a/org/apache/hudi/common/util/CollectionUtils.java](https://codecov.io/gh/apache/incubator-hudi/pull/1159/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvQ29sbGVjdGlvblV0aWxzLmphdmE=)
 | `47.22% <47.22%> (ø)` | `0.00 <0.00> (?)` | |
   | 
[...java/org/apache/hudi/client/HoodieWriteClient.java](https://codecov.io/gh/apache/incubator-hudi/pull/1159/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpZW50L0hvb2RpZVdyaXRlQ2xpZW50LmphdmE=)
 | `69.77% <100.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | ... and [18 
more](https://codecov.io/gh/apache/incubator-hudi/pull/1159/diff?src=pr&el=tree-more)
 | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1159?src=pr&el=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1159?src=pr&el=footer).
 Last

[GitHub] [incubator-hudi] codecov-io edited a comment on issue #1159: [HUDI-479] Eliminate or Minimize use of Guava if possible

2020-03-25 Thread GitBox
codecov-io edited a comment on issue #1159: [HUDI-479] Eliminate or Minimize 
use of Guava if possible
URL: https://github.com/apache/incubator-hudi/pull/1159#issuecomment-596089314
 
 
   # [Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1159?src=pr&el=h1) Report
   > Merging [#1159](https://codecov.io/gh/apache/incubator-hudi/pull/1159?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-hudi/commit/e101ea9bd4405a461bc78aad1af64499f797daed&el=desc) will **decrease** coverage by `0.18%`.
   > The diff coverage is `44.82%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-hudi/pull/1159/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1159?src=pr&el=tree)
   
   ```diff
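   @@             Coverage Diff              @@
   ##             master    #1159      +/-   ##
   ============================================
   - Coverage     67.68%   67.49%   -0.19%
     Complexity      261      261
   ============================================
     Files           341      342       +1
     Lines         16511    16584      +73
     Branches       1688     1699      +11
   ============================================
   + Hits          11175    11194      +19
   - Misses         4599     4650      +51
   - Partials        737      740       +3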
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-hudi/pull/1159?src=pr&el=tree) | 
Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...in/java/org/apache/hudi/io/HoodieAppendHandle.java](https://codecov.io/gh/apache/incubator-hudi/pull/1159/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW8vSG9vZGllQXBwZW5kSGFuZGxlLmphdmE=)
 | `84.17% <ø> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[...va/org/apache/hudi/metrics/JmxMetricsReporter.java](https://codecov.io/gh/apache/incubator-hudi/pull/1159/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9KbXhNZXRyaWNzUmVwb3J0ZXIuamF2YQ==)
 | `0.00% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[...ava/org/apache/hudi/metrics/JmxReporterServer.java](https://codecov.io/gh/apache/incubator-hudi/pull/1159/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9KbXhSZXBvcnRlclNlcnZlci5qYXZh)
 | `0.00% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[...pache/hudi/common/table/HoodieTableMetaClient.java](https://codecov.io/gh/apache/incubator-hudi/pull/1159/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL0hvb2RpZVRhYmxlTWV0YUNsaWVudC5qYXZh)
 | `76.77% <ø> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[...main/java/org/apache/hudi/common/util/FSUtils.java](https://codecov.io/gh/apache/incubator-hudi/pull/1159/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvRlNVdGlscy5qYXZh)
 | `69.27% <ø> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[...pache/hudi/utilities/sources/HoodieIncrSource.java](https://codecov.io/gh/apache/incubator-hudi/pull/1159/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSG9vZGllSW5jclNvdXJjZS5qYXZh)
 | `92.59% <ø> (ø)` | `7.00 <0.00> (ø)` | |
   | 
[...a/org/apache/hudi/common/util/ReflectionUtils.java](https://codecov.io/gh/apache/incubator-hudi/pull/1159/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvUmVmbGVjdGlvblV0aWxzLmphdmE=)
 | `30.76% <3.22%> (-31.74%)` | `0.00 <0.00> (ø)` | |
   | 
[.../java/org/apache/hudi/common/util/FileIOUtils.java](https://codecov.io/gh/apache/incubator-hudi/pull/1159/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvRmlsZUlPVXRpbHMuamF2YQ==)
 | `54.54% <33.33%> (-10.98%)` | `0.00 <0.00> (ø)` | |
   | 
[...a/org/apache/hudi/common/util/CollectionUtils.java](https://codecov.io/gh/apache/incubator-hudi/pull/1159/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvQ29sbGVjdGlvblV0aWxzLmphdmE=)
 | `47.22% <47.22%> (ø)` | `0.00 <0.00> (?)` | |
   | 
[...java/org/apache/hudi/client/HoodieWriteClient.java](https://codecov.io/gh/apache/incubator-hudi/pull/1159/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpZW50L0hvb2RpZVdyaXRlQ2xpZW50LmphdmE=)
 | `69.77% <100.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | ... and [18 
more](https://codecov.io/gh/apache/incubator-hudi/pull/1159/diff?src=pr&el=tree-more)
 | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1159?src=pr&el=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1159?src=pr&el=footer).
 Last

Build failed in Jenkins: hudi-snapshot-deployment-0.5 #228

2020-03-25 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 2.35 KB...]
/home/jenkins/tools/maven/apache-maven-3.5.4/conf:
logging
settings.xml
toolchains.xml

/home/jenkins/tools/maven/apache-maven-3.5.4/conf/logging:
simplelogger.properties

/home/jenkins/tools/maven/apache-maven-3.5.4/lib:
aopalliance-1.0.jar
cdi-api-1.0.jar
cdi-api.license
commons-cli-1.4.jar
commons-cli.license
commons-io-2.5.jar
commons-io.license
commons-lang3-3.5.jar
commons-lang3.license
ext
guava-20.0.jar
guice-4.2.0-no_aop.jar
jansi-1.17.1.jar
jansi-native
javax.inject-1.jar
jcl-over-slf4j-1.7.25.jar
jcl-over-slf4j.license
jsr250-api-1.0.jar
jsr250-api.license
maven-artifact-3.5.4.jar
maven-artifact.license
maven-builder-support-3.5.4.jar
maven-builder-support.license
maven-compat-3.5.4.jar
maven-compat.license
maven-core-3.5.4.jar
maven-core.license
maven-embedder-3.5.4.jar
maven-embedder.license
maven-model-3.5.4.jar
maven-model-builder-3.5.4.jar
maven-model-builder.license
maven-model.license
maven-plugin-api-3.5.4.jar
maven-plugin-api.license
maven-repository-metadata-3.5.4.jar
maven-repository-metadata.license
maven-resolver-api-1.1.1.jar
maven-resolver-api.license
maven-resolver-connector-basic-1.1.1.jar
maven-resolver-connector-basic.license
maven-resolver-impl-1.1.1.jar
maven-resolver-impl.license
maven-resolver-provider-3.5.4.jar
maven-resolver-provider.license
maven-resolver-spi-1.1.1.jar
maven-resolver-spi.license
maven-resolver-transport-wagon-1.1.1.jar
maven-resolver-transport-wagon.license
maven-resolver-util-1.1.1.jar
maven-resolver-util.license
maven-settings-3.5.4.jar
maven-settings-builder-3.5.4.jar
maven-settings-builder.license
maven-settings.license
maven-shared-utils-3.2.1.jar
maven-shared-utils.license
maven-slf4j-provider-3.5.4.jar
maven-slf4j-provider.license
org.eclipse.sisu.inject-0.3.3.jar
org.eclipse.sisu.inject.license
org.eclipse.sisu.plexus-0.3.3.jar
org.eclipse.sisu.plexus.license
plexus-cipher-1.7.jar
plexus-cipher.license
plexus-component-annotations-1.7.1.jar
plexus-component-annotations.license
plexus-interpolation-1.24.jar
plexus-interpolation.license
plexus-sec-dispatcher-1.4.jar
plexus-sec-dispatcher.license
plexus-utils-3.1.0.jar
plexus-utils.license
slf4j-api-1.7.25.jar
slf4j-api.license
wagon-file-3.1.0.jar
wagon-file.license
wagon-http-3.1.0-shaded.jar
wagon-http.license
wagon-provider-api-3.1.0.jar
wagon-provider-api.license

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/ext:
README.txt

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native:
freebsd32
freebsd64
linux32
linux64
osx
README.txt
windows32
windows64

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/osx:
libjansi.jnilib

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows32:
jansi.dll

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows64:
jansi.dll
Finished /home/jenkins/tools/maven/apache-maven-3.5.4 Directory Listing :
Detected current version as: 
'HUDI_home=
0.6.0-SNAPSHOT'
[INFO] Scanning for projects...
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-spark_2.11:jar:0.6.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 
org.apache.hudi:hudi-spark_${scala.binary.version}:[unknown-version], 

 line 26, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-timeline-service:jar:0.6.0-SNAPSHOT
[WARNING] 'build.plugins.plugin.(groupId:artifactId)' must be unique but found 
duplicate declaration of plugin org.jacoco:jacoco-maven-plugin @ 
org.apache.hudi:hudi-timeline-service:[unknown-version], 

 line 58, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-utilities_2.11:jar:0.6.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 
org.apache.hudi:hudi-utilities_${scala.binary.version}:[unknown-version], 

 line 26, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-spark-bundle_2.11:jar:0.6.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 
or

[incubator-hudi] branch master updated: [MINOR] Update DOAP with 0.5.2 Release (#1448)

2020-03-25 Thread smarthi
This is an automated email from the ASF dual-hosted git repository.

smarthi pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new e101ea9  [MINOR] Update DOAP with 0.5.2 Release (#1448)
e101ea9 is described below

commit e101ea9bd4405a461bc78aad1af64499f797daed
Author: Suneel Marthi 
AuthorDate: Wed Mar 25 23:37:32 2020 -0400

[MINOR] Update DOAP with 0.5.2 Release (#1448)
---
 doap_HUDI.rdf | 5 +
 1 file changed, 5 insertions(+)

diff --git a/doap_HUDI.rdf b/doap_HUDI.rdf
index c33d201..af45a41 100644
--- a/doap_HUDI.rdf
+++ b/doap_HUDI.rdf
@@ -46,6 +46,11 @@
 2020-01-31
 0.5.1
   
+  
+Apache Hudi-incubating 0.5.2
+2020-03-26
+0.5.2
+  
 
 
   



[GitHub] [incubator-hudi] smarthi merged pull request #1448: [MINOR] Update DOAP with 0.5.2 Release

2020-03-25 Thread GitBox
smarthi merged pull request #1448: [MINOR] Update DOAP with 0.5.2 Release
URL: https://github.com/apache/incubator-hudi/pull/1448
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] smarthi opened a new pull request #1448: [MINOR] Update DOAP with 0.5.2 Release

2020-03-25 Thread GitBox
smarthi opened a new pull request #1448: [MINOR] Update DOAP with 0.5.2 Release
URL: https://github.com/apache/incubator-hudi/pull/1448
 
 
   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   
   Update DOAP
   
   ## Brief change log
   
   Updated DOAP for 0.5.2 Release
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.




[incubator-hudi] branch asf-site updated: Travis CI build asf-site

2020-03-25 Thread vinoth
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new 6712fb2  Travis CI build asf-site
6712fb2 is described below

commit 6712fb2ee460ccd98ff6869f6de22a3ac6961819
Author: CI 
AuthorDate: Thu Mar 26 02:07:43 2020 +

Travis CI build asf-site
---
 test-content/cn/releases.html | 2 +-
 test-content/releases.html| 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/test-content/cn/releases.html b/test-content/cn/releases.html
index ed77b36..a48f73c 100644
--- a/test-content/cn/releases.html
+++ b/test-content/cn/releases.html
@@ -245,7 +245,7 @@
 
 
 Raw Release Notes
-The raw release notes are available https://issues.apache.org/jira/projects/HUDI/versions/12346606#release-report-tab-body";>here
+The raw release notes are available https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12322822&version=12346606";>here
 
 https://github.com/apache/incubator-hudi/releases/tag/release-0.5.1-incubating";>Release
 0.5.1-incubating (docs)
 
diff --git a/test-content/releases.html b/test-content/releases.html
index 9f97bff..303ba95 100644
--- a/test-content/releases.html
+++ b/test-content/releases.html
@@ -252,7 +252,7 @@
 
 
 Raw Release Notes
-The raw release notes are available https://issues.apache.org/jira/projects/HUDI/versions/12346606#release-report-tab-body";>here
+The raw release notes are available https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12322822&version=12346606";>here
 
 https://github.com/apache/incubator-hudi/releases/tag/release-0.5.1-incubating";>Release
 0.5.1-incubating (docs)
 



[incubator-hudi] branch asf-site updated: Update the release note url for 0.5.2 (#1447)

2020-03-25 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new f33188a  Update the release note url for 0.5.2 (#1447)
f33188a is described below

commit f33188a8c619d1434201856f582b6be0c81a92ff
Author: vinoyang 
AuthorDate: Thu Mar 26 10:05:35 2020 +0800

Update the release note url for 0.5.2 (#1447)
---
 content/cn/docs/powered_by.html | 2 +-
 content/cn/releases.html| 2 +-
 content/docs/powered_by.html| 2 +-
 content/releases.html   | 2 +-
 docs/_pages/releases.cn.md  | 2 +-
 docs/_pages/releases.md | 2 +-
 6 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/content/cn/docs/powered_by.html b/content/cn/docs/powered_by.html
index 5866edd..f852d64 100644
--- a/content/cn/docs/powered_by.html
+++ b/content/cn/docs/powered_by.html
@@ -351,7 +351,7 @@ Hudi还支持几个增量的Hive ETL管道,并且目前已集成到Uber的数
 
 Yields.io
 
-Yields.io是第一个使用AI在企业范围内进行自动模型验证和实时监控的金融科技平台。他们的数据湖由Hudi管理,他们还积极使用Hudi为增量式、跨语言/平台机器学习构建基础架构。
+https://www.yields.io/Blog/Apache-Hudi-at-Yields";>Yields.io是第一个使用AI在企业范围内进行自动模型验证和实时监控的金融科技平台。他们的数据湖由Hudi管理,他们还积极使用Hudi为增量式、跨语言/平台机器学习构建基础架构。
 
 Yotpo
 
diff --git a/content/cn/releases.html b/content/cn/releases.html
index ed77b36..a48f73c 100644
--- a/content/cn/releases.html
+++ b/content/cn/releases.html
@@ -245,7 +245,7 @@
 
 
 Raw Release Notes
-The raw release notes are available https://issues.apache.org/jira/projects/HUDI/versions/12346606#release-report-tab-body";>here
+The raw release notes are available https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12322822&version=12346606";>here
 
 https://github.com/apache/incubator-hudi/releases/tag/release-0.5.1-incubating";>Release
 0.5.1-incubating (docs)
 
diff --git a/content/docs/powered_by.html b/content/docs/powered_by.html
index 86e6f32..e5b52e3 100644
--- a/content/docs/powered_by.html
+++ b/content/docs/powered_by.html
@@ -355,7 +355,7 @@ offering, providing means for AWS users to perform 
record-level updates/deletes
 
 Yields.io
 
-Yields.io is the first FinTech platform that uses AI for automated model 
validation and real-time monitoring on an enterprise-wide scale. Their data 
lake is managed by Hudi. They are also actively building their infrastructure 
for incremental, cross language/platform machine learning using Hudi.
+Yields.io is the first FinTech platform that uses AI for automated model 
validation and real-time monitoring on an enterprise-wide scale. Their https://www.yields.io/Blog/Apache-Hudi-at-Yields";>data lake is 
managed by Hudi. They are also actively building their infrastructure for 
incremental, cross language/platform machine learning using Hudi.
 
 Yotpo
 
diff --git a/content/releases.html b/content/releases.html
index 9f97bff..303ba95 100644
--- a/content/releases.html
+++ b/content/releases.html
@@ -252,7 +252,7 @@
 
 
 Raw Release Notes
-The raw release notes are available https://issues.apache.org/jira/projects/HUDI/versions/12346606#release-report-tab-body";>here
+The raw release notes are available https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12322822&version=12346606";>here
 
 https://github.com/apache/incubator-hudi/releases/tag/release-0.5.1-incubating";>Release
 0.5.1-incubating (docs)
 
diff --git a/docs/_pages/releases.cn.md b/docs/_pages/releases.cn.md
index 181df17..211b55a 100644
--- a/docs/_pages/releases.cn.md
+++ b/docs/_pages/releases.cn.md
@@ -31,7 +31,7 @@ temp_query --sql "select Instant, NumInserts, NumWrites from 
satishkotha_debug w
 ```
 
 ### Raw Release Notes
-  The raw release notes are available 
[here](https://issues.apache.org/jira/projects/HUDI/versions/12346606#release-report-tab-body)
+  The raw release notes are available 
[here](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12322822&version=12346606)
 
 ## [Release 
0.5.1-incubating](https://github.com/apache/incubator-hudi/releases/tag/release-0.5.1-incubating)
 ([docs](/docs/0.5.1-quick-start-guide.html))
 
diff --git a/docs/_pages/releases.md b/docs/_pages/releases.md
index c4d34bb..aacdcde 100644
--- a/docs/_pages/releases.md
+++ b/docs/_pages/releases.md
@@ -30,7 +30,7 @@ temp_query --sql "select Instant, NumInserts, NumWrites from 
satishkotha_debug w
 ```
 
 ### Raw Release Notes
-  The raw release notes are available 
[here](https://issues.apache.org/jira/projects/HUDI/versions/12346606#release-report-tab-body)
+  The raw release notes are available 
[here](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12322822&version=12346606)
 
 ## [Release 
0.5.1-incubating](https://github.com/apache/incubator-hudi/releases/tag/release-0.5.1-incubating)
 ([docs](/docs/0.5.1-quick-start-guide.html))
 



[GitHub] [incubator-hudi] yanghua merged pull request #1447: [MINOR] Update the release note url for 0.5.2

2020-03-25 Thread GitBox
yanghua merged pull request #1447: [MINOR] Update the release note url for 0.5.2
URL: https://github.com/apache/incubator-hudi/pull/1447
 
 
   




[incubator-hudi] annotated tag release-0.5.2-incubating updated (41202da -> 8c4a620)

2020-03-25 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a change to annotated tag release-0.5.2-incubating
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git.


*** WARNING: tag release-0.5.2-incubating was modified! ***

from 41202da  (commit)
  to 8c4a620  (tag)
 tagging 41202da7788193da77f1ae4b784127bb93eaae2c (commit)
 replaces release-0.5.2-incubating-rc2
  by yanghua
  on Thu Mar 26 09:42:40 2020 +0800

- Log -
release-0.5.2-incubating
-BEGIN PGP SIGNATURE-

iQEzBAABCAAdFiEEw6lux3FJVxron4J2TIZoTQR94DwFAl58CJAACgkQTIZoTQR9
4Dxa3wf8CQZ7DVVJb9z9NO4hhl+ObBUKk9XJtBcL8tW60nVI7bP6gJ4Egq4a2wpG
qHUX6lsvBKZ+mnOHlEk3pCwu+D/x1pRprD6qcSGvAjafVnsDeAybNI6qSsuRaRdL
68CQsdR7tLyLibEQ24RukHs0CU38mc1GviUuRFxmrPmlFKZP+LCs+Ym21vmOjo1F
6FwLcjjUgweZsEm92zgvWSN2tbrKRXtLu1i6oRZSlX2HkdQ7ULDUFF5hmRwY1eS3
sOWsOzuzvkySkE0J4rvyh6NHEMtA4uGgbq9LtQJIrLjAmKXV369MSPWG9058bqQk
fKItkwYmzn4BRf+cyplKJC3hqmqfLg==
=c25o
-END PGP SIGNATURE-
---


No new revisions were added by this update.

Summary of changes:



[GitHub] [incubator-hudi] codecov-io edited a comment on issue #1159: [HUDI-479] Eliminate or Minimize use of Guava if possible

2020-03-25 Thread GitBox
codecov-io edited a comment on issue #1159: [HUDI-479] Eliminate or Minimize 
use of Guava if possible
URL: https://github.com/apache/incubator-hudi/pull/1159#issuecomment-596089314
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1159?src=pr&el=h1) 
Report
   > Merging 
[#1159](https://codecov.io/gh/apache/incubator-hudi/pull/1159?src=pr&el=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/5eed6c98a8880dc3b4e64ec9ff9dae4859e89b4c&el=desc)
 will **increase** coverage by `66.83%`.
   > The diff coverage is `43.10%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1159/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1159?src=pr&el=tree)
   
   ```diff
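   @@              Coverage Diff               @@
   ##             master    #1159       +/-   ##
   =============================================
   + Coverage      0.63%   67.47%   +66.83%
   - Complexity        2      261     +259
   =============================================
     Files           292      342      +50
     Lines         14482    16584    +2102
     Branches       1477     1699     +222
   =============================================
   + Hits             92    11190   +11098
   + Misses        14387     4655    -9732
   - Partials          3      739     +736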
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-hudi/pull/1159?src=pr&el=tree) | 
Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...in/java/org/apache/hudi/io/HoodieAppendHandle.java](https://codecov.io/gh/apache/incubator-hudi/pull/1159/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW8vSG9vZGllQXBwZW5kSGFuZGxlLmphdmE=)
 | `84.17% <ø> (+84.17%)` | `0.00 <0.00> (ø)` | |
   | 
[...va/org/apache/hudi/metrics/JmxMetricsReporter.java](https://codecov.io/gh/apache/incubator-hudi/pull/1159/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9KbXhNZXRyaWNzUmVwb3J0ZXIuamF2YQ==)
 | `0.00% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[...ava/org/apache/hudi/metrics/JmxReporterServer.java](https://codecov.io/gh/apache/incubator-hudi/pull/1159/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9KbXhSZXBvcnRlclNlcnZlci5qYXZh)
 | `0.00% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[...src/main/java/org/apache/hudi/metrics/Metrics.java](https://codecov.io/gh/apache/incubator-hudi/pull/1159/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9NZXRyaWNzLmphdmE=)
 | `58.33% <0.00%> (+58.33%)` | `0.00 <0.00> (ø)` | |
   | 
[...pache/hudi/common/table/HoodieTableMetaClient.java](https://codecov.io/gh/apache/incubator-hudi/pull/1159/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL0hvb2RpZVRhYmxlTWV0YUNsaWVudC5qYXZh)
 | `76.77% <ø> (+76.77%)` | `0.00 <0.00> (ø)` | |
   | 
[...main/java/org/apache/hudi/common/util/FSUtils.java](https://codecov.io/gh/apache/incubator-hudi/pull/1159/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvRlNVdGlscy5qYXZh)
 | `69.27% <ø> (+69.27%)` | `0.00 <0.00> (ø)` | |
   | 
[...pache/hudi/utilities/sources/HoodieIncrSource.java](https://codecov.io/gh/apache/incubator-hudi/pull/1159/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSG9vZGllSW5jclNvdXJjZS5qYXZh)
 | `92.59% <ø> (ø)` | `7.00 <0.00> (?)` | |
   | 
[...a/org/apache/hudi/common/util/ReflectionUtils.java](https://codecov.io/gh/apache/incubator-hudi/pull/1159/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvUmVmbGVjdGlvblV0aWxzLmphdmE=)
 | `30.76% <3.22%> (+30.76%)` | `0.00 <0.00> (ø)` | |
   | 
[.../java/org/apache/hudi/common/util/FileIOUtils.java](https://codecov.io/gh/apache/incubator-hudi/pull/1159/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvRmlsZUlPVXRpbHMuamF2YQ==)
 | `52.27% <26.66%> (+52.27%)` | `0.00 <0.00> (ø)` | |
   | 
[...a/org/apache/hudi/common/util/CollectionUtils.java](https://codecov.io/gh/apache/incubator-hudi/pull/1159/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvQ29sbGVjdGlvblV0aWxzLmphdmE=)
 | `47.22% <47.22%> (ø)` | `0.00 <0.00> (?)` | |
   | ... and [320 
more](https://codecov.io/gh/apache/incubator-hudi/pull/1159/diff?src=pr&el=tree-more)
 | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1159?src=pr&el=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1159

[GitHub] [incubator-hudi] yanghua opened a new pull request #1447: [MINOR] Update the release note url for 0.5.2

2020-03-25 Thread GitBox
yanghua opened a new pull request #1447: [MINOR] Update the release note url 
for 0.5.2
URL: https://github.com/apache/incubator-hudi/pull/1447
 
 
   
   
   ## What is the purpose of the pull request
   
   *This pull request updates the release note URL for 0.5.2*
   
   ## Brief change log
   
 - *Update the release note url for 0.5.2*
   
   ## Verify this pull request
   
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.




[GitHub] [incubator-hudi] lamber-ken commented on issue #1446: [MINOR] Add link for yields.io usage

2020-03-25 Thread GitBox
lamber-ken commented on issue #1446: [MINOR] Add link for yields.io usage
URL: https://github.com/apache/incubator-hudi/pull/1446#issuecomment-604174107
 
 
   btw, if the CI build fails, you can use the following command to trigger 
a rebuild. @vinothchandar 
   ```
   git commit --allow-empty -m 'trigger rebuild' 
   ```
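
The empty commit only triggers a rebuild once it is pushed to the branch the CI watches. A minimal, self-contained sketch of the full sequence follows; it assumes a remote named `origin` and uses a throwaway local bare repository as a stand-in for the real GitHub fork (the paths and the `ci@example.invalid` identity are hypothetical, chosen only so the sketch runs anywhere):

```shell
set -eu

# Throwaway sandbox: a bare repository stands in for the real remote (assumption).
work=$(mktemp -d)
git init -q --bare "$work/remote.git"
git init -q "$work/clone"
cd "$work/clone"
git config user.name "ci-sketch"
git config user.email "ci@example.invalid"
git remote add origin "$work/remote.git"

# Stand-in for the existing PR history.
git commit -q --allow-empty -m 'initial'

# The command from the comment above: a commit with no file changes.
git commit -q --allow-empty -m 'trigger rebuild'

# Pushing the new commit is what would make the CI pick up the branch again.
git push -q origin HEAD

count=$(git rev-list --count HEAD)
echo "commits on branch: $count"
```

In a real checkout only the `git commit --allow-empty` and `git push` steps apply; the rest is scaffolding for the sandbox.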




[GitHub] [incubator-hudi] lamber-ken edited a comment on issue #1446: [MINOR] Add link for yields.io usage

2020-03-25 Thread GitBox
lamber-ken edited a comment on issue #1446: [MINOR] Add link for yields.io usage
URL: https://github.com/apache/incubator-hudi/pull/1446#issuecomment-604169166
 
 
   Hi @vinothchandar, 
   when you submitted the PR, the Travis build failed. Judging from its stack 
trace, we may have hit a bug:
   https://travis-ci.org/github/apache/incubator-hudi/builds/667011488
   
   After the merge, it finally worked:
   https://travis-ci.org/github/apache/incubator-hudi/builds/667011979


[GitHub] [incubator-hudi] lamber-ken commented on issue #1443: [SUPPORT] Spark-Hudi consumes too much space in a temp folder while upsert

2020-03-25 Thread GitBox
lamber-ken commented on issue #1443: [SUPPORT] Spark-Hudi consumes too much 
space in a temp folder while upsert
URL: https://github.com/apache/incubator-hudi/issues/1443#issuecomment-604171160
 
 
   > @lamber-ken this has come up in various tickets already..
   > @n3nash should we bump up the default merge size? what do you guys use at 
uber?
   
   right, we can add it to the Troubleshooting Guide.


[GitHub] [incubator-hudi] lamber-ken commented on issue #1446: [MINOR] Add link for yields.io usage

2020-03-25 Thread GitBox
lamber-ken commented on issue #1446: [MINOR] Add link for yields.io usage
URL: https://github.com/apache/incubator-hudi/pull/1446#issuecomment-604169166
 
 
   hi @vinothchandar, it may have been delayed; it finally worked.
   https://travis-ci.org/github/apache/incubator-hudi/builds/667011979


[GitHub] [incubator-hudi] smarthi commented on issue #1159: [HUDI-479] Eliminate or Minimize use of Guava if possible

2020-03-25 Thread GitBox
smarthi commented on issue #1159: [HUDI-479] Eliminate or Minimize use of Guava 
if possible
URL: https://github.com/apache/incubator-hudi/pull/1159#issuecomment-604166428
 
 
   > @smarthi is this ready for final review? Have we eliminated Guava at this 
point?
   
   @vinothchandar please review


[GitHub] [incubator-hudi] leesf commented on issue #1418: [HUDI-678] Make config package spark free

2020-03-25 Thread GitBox
leesf commented on issue #1418: [HUDI-678] Make config package spark free
URL: https://github.com/apache/incubator-hudi/pull/1418#issuecomment-604155771
 
 
   @vinothchandar Updated the PR to address your comments, PTAL, thanks.


[incubator-hudi] branch asf-site updated: Travis CI build asf-site

2020-03-25 Thread vinoth
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new 3353180  Travis CI build asf-site
3353180 is described below

commit 33531804037eb843d4939b5f8c3779aaf0fc5a25
Author: CI 
AuthorDate: Wed Mar 25 23:44:20 2020 +

Travis CI build asf-site
---
 test-content/cn/docs/powered_by.html | 2 +-
 test-content/docs/powered_by.html| 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/test-content/cn/docs/powered_by.html 
b/test-content/cn/docs/powered_by.html
index 5866edd..f852d64 100644
--- a/test-content/cn/docs/powered_by.html
+++ b/test-content/cn/docs/powered_by.html
@@ -351,7 +351,7 @@ Hudi还支持几个增量的Hive ETL管道,并且目前已集成到Uber的数
 
 Yields.io
 
-Yields.io是第一个使用AI在企业范围内进行自动模型验证和实时监控的金融科技平台。他们的数据湖由Hudi管理,他们还积极使用Hudi为增量式、跨语言/平台机器学习构建基础架构。
+https://www.yields.io/Blog/Apache-Hudi-at-Yields";>Yields.io是第一个使用AI在企业范围内进行自动模型验证和实时监控的金融科技平台。他们的数据湖由Hudi管理,他们还积极使用Hudi为增量式、跨语言/平台机器学习构建基础架构。
 
 Yotpo
 
diff --git a/test-content/docs/powered_by.html 
b/test-content/docs/powered_by.html
index 86e6f32..e5b52e3 100644
--- a/test-content/docs/powered_by.html
+++ b/test-content/docs/powered_by.html
@@ -355,7 +355,7 @@ offering, providing means for AWS users to perform 
record-level updates/deletes
 
 Yields.io
 
-Yields.io is the first FinTech platform that uses AI for automated model 
validation and real-time monitoring on an enterprise-wide scale. Their data 
lake is managed by Hudi. They are also actively building their infrastructure 
for incremental, cross language/platform machine learning using Hudi.
+Yields.io is the first FinTech platform that uses AI for automated model 
validation and real-time monitoring on an enterprise-wide scale. Their https://www.yields.io/Blog/Apache-Hudi-at-Yields";>data lake is 
managed by Hudi. They are also actively building their infrastructure for 
incremental, cross language/platform machine learning using Hudi.
 
 Yotpo
 



[GitHub] [incubator-hudi] umehrot2 commented on issue #1406: [HUDI-713] Fix conversion of Spark array of struct type to Avro schema

2020-03-25 Thread GitBox
umehrot2 commented on issue #1406: [HUDI-713] Fix conversion of Spark array of 
struct type to Avro schema
URL: https://github.com/apache/incubator-hudi/pull/1406#issuecomment-604137304
 
 
   > Sorry did not mean to hijack this fix.. Just trying to understand how 
it'll break compatibility while we are here.. All this schema namespace business 
is only before writing parquet files right... Once you are able to write 
parquet, it should be readable by parquet-avro for merging? (which has nothing 
to do with apache-spark-avro or databricks-spark-avro)... what causes the 
breakage?
   
   All I can think of is, since the old namespace is stored in the 
`parquet.avro.schema` in the actual parquet file, it might conflict with the 
new schema that has a different namespace. 
   @zhedoubushishi is looking into this.
   
   One good thing is that at least it should not affect users using 
`FileBaseSchemaProvider` or `SchemaRegistryProvider` with `DeltaStreamer`, in 
which case, from what I see, we directly use the schema that the user has passed.


[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1402: [WIP][HUDI-407] Adding Simple Index

2020-03-25 Thread GitBox
vinothchandar commented on a change in pull request #1402: [WIP][HUDI-407] 
Adding Simple Index
URL: https://github.com/apache/incubator-hudi/pull/1402#discussion_r398223341
 
 

 ##
 File path: 
hudi-common/src/main/java/org/apache/hudi/common/util/ParquetUtils.java
 ##
 @@ -103,6 +120,42 @@
 return rowKeys;
   }
 
+  /**
+   * Read the rows with record key and partition path from the given parquet 
file
+   *
+   * @param filePath  The parquet file path.
+   * @param configuration configuration to build fs object
+   * @return Set Set of row keys matching candidateRecordKeys
+   */
+  public static List, Option>> 
fetchRecordKeyPartitionPathFromParquet(Configuration configuration, Path 
filePath,
+   
   String baseInstantTime,
+   
   String fileId) {
+List, Option>> rows = new 
ArrayList<>();
+try {
+  if (!filePath.getFileSystem(configuration).exists(filePath)) {
+return new ArrayList<>();
+  }
+  Configuration conf = new Configuration(configuration);
+  conf.addResource(FSUtils.getFs(filePath.toString(), conf).getConf());
+  Schema readSchema = HoodieAvroUtils.getRecordKeyPartitionPathSchema();
+  AvroReadSupport.setAvroReadSchema(conf, readSchema);
+  AvroReadSupport.setRequestedProjection(conf, readSchema);
+  ParquetReader reader = 
AvroParquetReader.builder(filePath).withConf(conf).build();
 
 Review comment:
   I think we can fix it in this patch itself.. it's a critical aspect 


[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1402: [WIP][HUDI-407] Adding Simple Index

2020-03-25 Thread GitBox
vinothchandar commented on a change in pull request #1402: [WIP][HUDI-407] 
Adding Simple Index
URL: https://github.com/apache/incubator-hudi/pull/1402#discussion_r398222054
 
 

 ##
 File path: 
hudi-client/src/main/java/org/apache/hudi/index/bloom/HoodieSimpleIndex.java
 ##
 @@ -0,0 +1,244 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.index.bloom;
+
+import org.apache.hudi.WriteStatus;
+import org.apache.hudi.common.model.HoodieDataFile;
+import org.apache.hudi.common.model.HoodieKey;
+import org.apache.hudi.common.model.HoodieRecord;
+import org.apache.hudi.common.model.HoodieRecordLocation;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.ParquetUtils;
+import org.apache.hudi.common.util.collection.Pair;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.table.HoodieTable;
+
+import com.clearspring.analytics.util.Lists;
+import com.google.common.annotations.VisibleForTesting;
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.api.java.JavaPairRDD;
+import org.apache.spark.api.java.JavaRDD;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.api.java.Optional;
+import org.apache.spark.api.java.function.Function;
+import org.apache.spark.api.java.function.PairFunction;
+import org.apache.spark.storage.StorageLevel;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Map;
+import java.util.stream.Collectors;
+
+import scala.Tuple2;
+
+import static java.util.stream.Collectors.toList;
+
+/**
+ * A simple index which reads interested fields from parquet and joins with 
incoming records to find the tagged location
 
 Review comment:
   Let's write the docs using higher-level abstractions like file groups/slices 
and base/log files? 
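
For illustration, the lookup that the Javadoc above describes (read the key fields of existing records, join them with the incoming records, tag each incoming record with its location) can be sketched in plain Java. This is only a minimal sketch under stated assumptions: `SimpleIndexSketch`, `buildLocationLookup`, and `tagLocations` are hypothetical names, the existing rows are passed in as string arrays rather than read from parquet, and an in-memory map stands in for the Spark join. It is not the code in the PR.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public final class SimpleIndexSketch {

    // Builds the lookup side of the join: (partitionPath, recordKey) -> fileId.
    // Each row is {partitionPath, recordKey, fileId}; in a real index these
    // values would come from the stored files (assumption for this sketch).
    static Map<List<String>, String> buildLocationLookup(String[][] existingRows) {
        Map<List<String>, String> lookup = new HashMap<>();
        for (String[] row : existingRows) {
            lookup.put(Arrays.asList(row[0], row[1]), row[2]);
        }
        return lookup;
    }

    // Tags each incoming {partitionPath, recordKey} with the fileId it already
    // lives in, or null when the key is not found (i.e. a new record).
    static Map<List<String>, String> tagLocations(String[][] incoming,
                                                  Map<List<String>, String> lookup) {
        Map<List<String>, String> tagged = new LinkedHashMap<>();
        for (String[] rec : incoming) {
            List<String> key = Arrays.asList(rec[0], rec[1]);
            tagged.put(key, lookup.get(key));
        }
        return tagged;
    }

    public static void main(String[] args) {
        String[][] existing = {
            {"2020/03/25", "k1", "file-1"},
            {"2020/03/25", "k2", "file-2"},
        };
        String[][] incoming = {
            {"2020/03/25", "k1"},  // already stored, should be tagged
            {"2020/03/25", "k9"},  // not stored, stays untagged
        };
        System.out.println(tagLocations(incoming, buildLocationLookup(existing)));
    }
}
```

A tagged record would be routed as an update to the file it maps to, while an untagged one would be treated as an insert.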


[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1402: [WIP][HUDI-407] Adding Simple Index

2020-03-25 Thread GitBox
vinothchandar commented on a change in pull request #1402: [WIP][HUDI-407] 
Adding Simple Index
URL: https://github.com/apache/incubator-hudi/pull/1402#discussion_r398221409
 
 

 ##
 File path: 
hudi-client/src/main/java/org/apache/hudi/config/HoodieIndexConfig.java
 ##
 @@ -58,6 +58,11 @@
   public static final String DEFAULT_BLOOM_INDEX_FILTER_TYPE = 
BloomFilterTypeCode.SIMPLE.name();
   public static final String HOODIE_BLOOM_INDEX_FILTER_DYNAMIC_MAX_ENTRIES = 
"hoodie.bloom.index.filter.dynamic.max.entries";
   public static final String 
DEFAULT_HOODIE_BLOOM_INDEX_FILTER_DYNAMIC_MAX_ENTRIES = "10";
+  public static final String SIMPLE_BLOOM_INDEX_USE_CACHING_PROP = 
"hoodie.simple.bloom.index.use.caching";
 
 Review comment:
   Calling it `SimpleBloom` when it does not use any bloom filters is very 
confusing.. Can we make it just `SimpleIndex`? 


[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1402: [WIP][HUDI-407] Adding Simple Index

2020-03-25 Thread GitBox
vinothchandar commented on a change in pull request #1402: [WIP][HUDI-407] 
Adding Simple Index
URL: https://github.com/apache/incubator-hudi/pull/1402#discussion_r398222688
 
 

 ##
 File path: 
hudi-client/src/main/java/org/apache/hudi/index/bloom/HoodieSimpleIndex.java
 ##
 @@ -0,0 +1,244 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.index.bloom;
+
+import org.apache.hudi.WriteStatus;
+import org.apache.hudi.common.model.HoodieDataFile;
+import org.apache.hudi.common.model.HoodieKey;
+import org.apache.hudi.common.model.HoodieRecord;
+import org.apache.hudi.common.model.HoodieRecordLocation;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.ParquetUtils;
+import org.apache.hudi.common.util.collection.Pair;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.table.HoodieTable;
+
+import com.clearspring.analytics.util.Lists;
+import com.google.common.annotations.VisibleForTesting;
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.api.java.JavaPairRDD;
+import org.apache.spark.api.java.JavaRDD;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.api.java.Optional;
+import org.apache.spark.api.java.function.Function;
+import org.apache.spark.api.java.function.PairFunction;
+import org.apache.spark.storage.StorageLevel;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Map;
+import java.util.stream.Collectors;
+
+import scala.Tuple2;
+
+import static java.util.stream.Collectors.toList;
+
+/**
+ * A simple index which reads interested fields from parquet and joins with 
incoming records to find the tagged location
+ *
+ * @param 
+ */
+public class HoodieSimpleIndex extends 
HoodieBloomIndex {
 
 Review comment:
   I think this is very confusing.. maybe we need to push a bunch of this up to 
the `HoodieIndex` abstract class or introduce a new class..  


[GitHub] [incubator-hudi] vinothchandar commented on issue #1406: [HUDI-713] Fix conversion of Spark array of struct type to Avro schema

2020-03-25 Thread GitBox
vinothchandar commented on issue #1406: [HUDI-713] Fix conversion of Spark 
array of struct type to Avro schema
URL: https://github.com/apache/incubator-hudi/pull/1406#issuecomment-604128720
 
 
   Sorry did not mean to hijack this fix.. Just trying to understand how it'll 
break compatibility while we are here.. All this schema namespace business is 
only before writing parquet files right... Once you are able to write parquet, 
it should be readable by parquet-avro for merging? (which has nothing to do 
with apache-spark-avro or databricks-spark-avro)... what causes the breakage?


[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1418: [HUDI-678] Make config package spark free

2020-03-25 Thread GitBox
vinothchandar commented on a change in pull request #1418: [HUDI-678] Make 
config package spark free
URL: https://github.com/apache/incubator-hudi/pull/1418#discussion_r398216210
 
 

 ##
 File path: 
hudi-client/src/main/java/org/apache/hudi/client/utils/ConfigUtils.java
 ##
 @@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.client.utils;
+
+import org.apache.hudi.config.HoodieIndexConfig;
+
+import org.apache.spark.SparkEnv;
+import org.apache.spark.storage.StorageLevel;
+import org.apache.spark.util.Utils;
+
+import java.util.Properties;
+
+import static 
org.apache.hudi.config.HoodieMemoryConfig.DEFAULT_MAX_MEMORY_FOR_SPILLABLE_MAP_IN_BYTES;
+import static 
org.apache.hudi.config.HoodieMemoryConfig.DEFAULT_MIN_MEMORY_FOR_SPILLABLE_MAP_IN_BYTES;
+import static 
org.apache.hudi.config.HoodieWriteConfig.WRITE_STATUS_STORAGE_LEVEL;
+
+/**
+ * Config utils.
+ */
+public class ConfigUtils {
 
 Review comment:
   rename to `SparkConfigUtils`? to make it explicit? 


[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1418: [HUDI-678] Make config package spark free

2020-03-25 Thread GitBox
vinothchandar commented on a change in pull request #1418: [HUDI-678] Make 
config package spark free
URL: https://github.com/apache/incubator-hudi/pull/1418#discussion_r398217268
 
 

 ##
 File path: 
hudi-client/src/main/java/org/apache/hudi/config/HoodieMemoryConfig.java
 ##
 @@ -113,40 +112,8 @@ public Builder withWriteStatusFailureFraction(double 
failureFraction) {
   return this;
 }
 
-/**
- * Dynamic calculation of max memory to use for for spillable map. 
user.available.memory = spark.executor.memory *
- * (1 - spark.memory.fraction) spillable.available.memory = 
user.available.memory * hoodie.memory.fraction. Anytime
- * the spark.executor.memory or the spark.memory.fraction is changed, the 
memory used for spillable map changes
- * accordingly
- */
 private long getMaxMemoryAllowedForMerge(String maxMemoryFraction) {
-  final String SPARK_EXECUTOR_MEMORY_PROP = "spark.executor.memory";
-  final String SPARK_EXECUTOR_MEMORY_FRACTION_PROP = 
"spark.memory.fraction";
-  // This is hard-coded in spark code {@link
-  // 
https://github.com/apache/spark/blob/576c43fb4226e4efa12189b41c3bc862019862c6/core/src/main/scala/org/apache/
-  // spark/memory/UnifiedMemoryManager.scala#L231} so have to re-define 
this here
-  final String DEFAULT_SPARK_EXECUTOR_MEMORY_FRACTION = "0.6";
-  // This is hard-coded in spark code {@link
-  // 
https://github.com/apache/spark/blob/576c43fb4226e4efa12189b41c3bc862019862c6/core/src/main/scala/org/apache/
-  // spark/SparkContext.scala#L471} so have to re-define this here
-  final String DEFAULT_SPARK_EXECUTOR_MEMORY_MB = "1024"; // in MB
-
-  if (SparkEnv.get() != null) {
-// 1 GB is the default conf used by Spark, look at SparkContext.scala
-long executorMemoryInBytes = Utils.memoryStringToMb(
-SparkEnv.get().conf().get(SPARK_EXECUTOR_MEMORY_PROP, 
DEFAULT_SPARK_EXECUTOR_MEMORY_MB)) * 1024 * 1024L;
-// 0.6 is the default value used by Spark,
-// look at {@link
-// 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkConf.scala#L507}
-double memoryFraction = Double.parseDouble(
-SparkEnv.get().conf().get(SPARK_EXECUTOR_MEMORY_FRACTION_PROP, 
DEFAULT_SPARK_EXECUTOR_MEMORY_FRACTION));
-double maxMemoryFractionForMerge = 
Double.parseDouble(maxMemoryFraction);
-double userAvailableMemory = executorMemoryInBytes * (1 - 
memoryFraction);
-long maxMemoryForMerge = (long) Math.floor(userAvailableMemory * 
maxMemoryFractionForMerge);
-return Math.max(DEFAULT_MIN_MEMORY_FOR_SPILLABLE_MAP_IN_BYTES, 
maxMemoryForMerge);
-  } else {
-return DEFAULT_MAX_MEMORY_FOR_SPILLABLE_MAP_IN_BYTES;
-  }
+  return ConfigUtils.getMaxMemoryAllowedForMerge(props, maxMemoryFraction);
 
 Review comment:
   Okay.. if we call a Spark-specific class here, then this does not achieve 
the purpose, right? 
   
   i.e. you cannot move `ConfigUtils` to hudi-spark and keep config in 
`hudi-writer-common` without making hudi-writer-common depend on `ConfigUtils`? 
   
   We should change the caller of `getMaxMemoryAllowedForMerge` to use 
`ConfigUtils.getXX()`, just like how you did for storage level? 
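
For reference, the calculation in the removed Javadoc (user.available.memory = spark.executor.memory * (1 - spark.memory.fraction), then spillable.available.memory = user.available.memory * hoodie.memory.fraction, bounded below by a minimum) can be written without any Spark dependency once the caller supplies the values. The following is a minimal sketch under assumptions: `SpillableMemorySketch`, `maxMemoryForMerge`, and the 100 MB floor are hypothetical, since the actual value of `DEFAULT_MIN_MEMORY_FOR_SPILLABLE_MAP_IN_BYTES` does not appear in the diff.

```java
public final class SpillableMemorySketch {

    // Hypothetical floor for this sketch; the real default is not shown in the diff.
    static final long MIN_BYTES_FOR_SPILLABLE_MAP = 100L * 1024 * 1024;

    // Applies the formula from the removed Javadoc. The caller is expected to
    // resolve the executor memory and fractions from its own engine config.
    static long maxMemoryForMerge(long executorMemoryBytes,
                                  double sparkMemoryFraction,
                                  double hoodieMemoryFraction) {
        double userAvailableMemory = executorMemoryBytes * (1 - sparkMemoryFraction);
        long maxMemoryForMerge = (long) Math.floor(userAvailableMemory * hoodieMemoryFraction);
        return Math.max(MIN_BYTES_FOR_SPILLABLE_MAP, maxMemoryForMerge);
    }

    public static void main(String[] args) {
        long fourGiB = 4L * 1024 * 1024 * 1024;
        // 4 GiB * (1 - 0.5) * 0.25 = 512 MiB
        System.out.println(maxMemoryForMerge(fourGiB, 0.5, 0.25));
        // A tiny executor falls back to the floor.
        System.out.println(maxMemoryForMerge(1024L * 1024, 0.5, 0.25));
    }
}
```

Keeping the arithmetic in a function that takes plain numbers is one way to let the Spark-specific lookup live with the caller, which seems to be the direction of the comment above.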


[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1418: [HUDI-678] Make config package spark free

2020-03-25 Thread GitBox
vinothchandar commented on a change in pull request #1418: [HUDI-678] Make 
config package spark free
URL: https://github.com/apache/incubator-hudi/pull/1418#discussion_r398216438
 
 

 ##
 File path: 
hudi-client/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
 ##
 @@ -565,6 +556,7 @@ public FileSystemViewStorageConfig 
getClientSpecifiedViewStorageConfig() {
 private boolean isMemoryConfigSet = false;
 private boolean isViewConfigSet = false;
 private boolean isConsistencyGuardSet = false;
+private boolean isEngineConfigSet = false;
 
 Review comment:
   remove if unused? 


[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1418: [HUDI-678] Make config package spark free

2020-03-25 Thread GitBox
vinothchandar commented on a change in pull request #1418: [HUDI-678] Make 
config package spark free
URL: https://github.com/apache/incubator-hudi/pull/1418#discussion_r398215708
 
 

 ##
 File path: 
hudi-client/src/main/java/org/apache/hudi/index/bloom/HoodieBloomIndex.java
 ##
 @@ -69,7 +70,8 @@ public HoodieBloomIndex(HoodieWriteConfig config) {
 
 // Step 0: cache the input record RDD
 if (config.getBloomIndexUseCaching()) {
-  recordRDD.persist(config.getBloomIndexInputStorageLevel());
+  StorageLevel storageLevel = 
ConfigUtils.getBloomIndexInputStorageLevel(config.getProps());
 
 Review comment:
   single line? (also the usages above)


[GitHub] [incubator-hudi] vinothchandar commented on issue #1446: [MINOR] Add link for yields.io usage

2020-03-25 Thread GitBox
vinothchandar commented on issue #1446: [MINOR] Add link for yields.io usage
URL: https://github.com/apache/incubator-hudi/pull/1446#issuecomment-604120086
 
 
   ah.. I was hoping to see the CI build pass. :) it's still there after 
merging.. no problem 


[incubator-hudi] branch asf-site updated: [MINOR] Add link for yields.io usage (#1446)

2020-03-25 Thread smarthi
This is an automated email from the ASF dual-hosted git repository.

smarthi pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new 770ea3f  [MINOR] Add link for yields.io usage (#1446)
770ea3f is described below

commit 770ea3f75422fe2347113030c52edf8d0e96a64b
Author: vinoth chandar 
AuthorDate: Wed Mar 25 15:24:37 2020 -0700

[MINOR] Add link for yields.io usage (#1446)
---
 docs/_docs/1_4_powered_by.cn.md | 2 +-
 docs/_docs/1_4_powered_by.md| 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/_docs/1_4_powered_by.cn.md b/docs/_docs/1_4_powered_by.cn.md
index 82f9c02..500771f 100644
--- a/docs/_docs/1_4_powered_by.cn.md
+++ b/docs/_docs/1_4_powered_by.cn.md
@@ -20,7 +20,7 @@ Hudi还支持几个增量的Hive ETL管道,并且目前已集成到Uber的数
 
 ### Yields.io
 
-Yields.io是第一个使用AI在企业范围内进行自动模型验证和实时监控的金融科技平台。他们的数据湖由Hudi管理,他们还积极使用Hudi为增量式、跨语言/平台机器学习构建基础架构。
+[Yields.io](https://www.yields.io/Blog/Apache-Hudi-at-Yields)是第一个使用AI在企业范围内进行自动模型验证和实时监控的金融科技平台。他们的数据湖由Hudi管理,他们还积极使用Hudi为增量式、跨语言/平台机器学习构建基础架构。
 
 ### Yotpo
 
diff --git a/docs/_docs/1_4_powered_by.md b/docs/_docs/1_4_powered_by.md
index 229150e..bee6bb9 100644
--- a/docs/_docs/1_4_powered_by.md
+++ b/docs/_docs/1_4_powered_by.md
@@ -23,7 +23,7 @@ offering, providing means for AWS users to perform 
record-level updates/deletes
 
 ### Yields.io
 
-Yields.io is the first FinTech platform that uses AI for automated model 
validation and real-time monitoring on an enterprise-wide scale. Their data 
lake is managed by Hudi. They are also actively building their infrastructure 
for incremental, cross language/platform machine learning using Hudi.
+Yields.io is the first FinTech platform that uses AI for automated model 
validation and real-time monitoring on an enterprise-wide scale. Their [data 
lake](https://www.yields.io/Blog/Apache-Hudi-at-Yields) is managed by Hudi. 
They are also actively building their infrastructure for incremental, cross 
language/platform machine learning using Hudi.
 
 ### Yotpo
 



[GitHub] [incubator-hudi] smarthi merged pull request #1446: [MINOR] Add link for yields.io usage

2020-03-25 Thread GitBox
smarthi merged pull request #1446: [MINOR] Add link for yields.io usage
URL: https://github.com/apache/incubator-hudi/pull/1446
 
 
   


[GitHub] [incubator-hudi] vinothchandar opened a new pull request #1446: [MINOR] Add link for yields.io usage

2020-03-25 Thread GitBox
vinothchandar opened a new pull request #1446: [MINOR] Add link for yields.io 
usage
URL: https://github.com/apache/incubator-hudi/pull/1446
 
 
   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.


[jira] [Closed] (HUDI-711) Exporter copy method gets passed wrong file list

2020-03-25 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu closed HUDI-711.
---

> Exporter copy method gets passed wrong file list
> 
>
> Key: HUDI-711
> URL: https://issues.apache.org/jira/browse/HUDI-711
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Utilities
>Reporter: Raymond Xu
>Assignee: Raymond Xu
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> [https://github.com/apache/incubator-hudi/blob/99b7e9eb9ef8827c1e06b7e8621b6be6403b061e/hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieSnapshotExporter.java#L159]
> param {{dataFiles}} contains all data files and was passed to perform per 
> partition copy
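
To illustrate the kind of issue described above, here is a minimal sketch of passing each partition only its own files instead of the full list. It rests on assumptions: `PerPartitionCopySketch` and `groupByPartition` are hypothetical names, file paths are plain strings, and the partition is taken to be everything before the last `/`. It is not the actual fix that went into `HoodieSnapshotExporter`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public final class PerPartitionCopySketch {

    // Groups file paths by partition so that a per-partition copy step can be
    // handed only the files that belong to that partition.
    static Map<String, List<String>> groupByPartition(List<String> allDataFiles) {
        Map<String, List<String>> byPartition = new LinkedHashMap<>();
        for (String file : allDataFiles) {
            int slash = file.lastIndexOf('/');
            String partition = slash < 0 ? "" : file.substring(0, slash);
            byPartition.computeIfAbsent(partition, p -> new ArrayList<>()).add(file);
        }
        return byPartition;
    }

    public static void main(String[] args) {
        List<String> allDataFiles = Arrays.asList(
            "2020/03/24/a.parquet",
            "2020/03/25/b.parquet",
            "2020/03/24/c.parquet");
        // Each partition sees only its own files, never the whole list.
        groupByPartition(allDataFiles).forEach((partition, files) ->
            System.out.println(partition + " -> " + files));
    }
}
```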



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-711) Exporter copy method gets passed wrong file list

2020-03-25 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu resolved HUDI-711.
-
Resolution: Fixed

> Exporter copy method gets passed wrong file list
> 
>
> Key: HUDI-711
> URL: https://issues.apache.org/jira/browse/HUDI-711
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Utilities
>Reporter: Raymond Xu
>Assignee: Raymond Xu
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> [https://github.com/apache/incubator-hudi/blob/99b7e9eb9ef8827c1e06b7e8621b6be6403b061e/hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieSnapshotExporter.java#L159]
> param {{dataFiles}} contains all data files and was passed to perform per 
> partition copy





[jira] [Reopened] (HUDI-711) Exporter copy method gets passed wrong file list

2020-03-25 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu reopened HUDI-711:
-

> Exporter copy method gets passed wrong file list
> 
>
> Key: HUDI-711
> URL: https://issues.apache.org/jira/browse/HUDI-711
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Utilities
>Reporter: Raymond Xu
>Assignee: Raymond Xu
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> [https://github.com/apache/incubator-hudi/blob/99b7e9eb9ef8827c1e06b7e8621b6be6403b061e/hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieSnapshotExporter.java#L159]
> param {{dataFiles}} contains all data files and was passed to perform per 
> partition copy





[jira] [Updated] (HUDI-711) Exporter copy method gets passed wrong file list

2020-03-25 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-711:

Status: Closed  (was: Patch Available)

> Exporter copy method gets passed wrong file list
> 
>
> Key: HUDI-711
> URL: https://issues.apache.org/jira/browse/HUDI-711
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Utilities
>Reporter: Raymond Xu
>Assignee: Raymond Xu
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> [https://github.com/apache/incubator-hudi/blob/99b7e9eb9ef8827c1e06b7e8621b6be6403b061e/hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieSnapshotExporter.java#L159]
> param {{dataFiles}} contains all data files and was passed to perform per 
> partition copy





[GitHub] [incubator-hudi] vinothchandar commented on issue #1396: [HUDI-687] Stop incremental reader on RO table before a pending compaction

2020-03-25 Thread GitBox
vinothchandar commented on issue #1396: [HUDI-687] Stop incremental reader on 
RO table before a pending compaction
URL: https://github.com/apache/incubator-hudi/pull/1396#issuecomment-603902158
 
 
   if @bvaradar approves, it should be sufficient.. I will take another pass at 
it today




[GitHub] [incubator-hudi] vinothchandar commented on issue #1440: [HUDI-731] Add ChainedTransformer

2020-03-25 Thread GitBox
vinothchandar commented on issue #1440: [HUDI-731] Add ChainedTransformer
URL: https://github.com/apache/incubator-hudi/pull/1440#issuecomment-603901546
 
 
   sorry for dropping off.. crazy times :/ 
   
   I agree with the tradeoffs and points discussed here.. Even though users can 
subclass the new class, that adds documentation/discoverability overhead 
for the user.. At the UX level, it may be good to just turn our existing param 
into one that takes a list of class names and use this class internally.. 
   
   +1 on reworking towards that.. thanks @xushiyan 
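
The suggestion above (accept a list of class names and chain them internally) could look roughly like the sketch below. The `Transformer` interface here is a simplified string-to-string stand-in, not Hudi's actual Transformer API, and every name in the block is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch; the interface is a simplified stand-in for illustration only.
final class ChainedTransformerSketch {

    interface Transformer {
        String apply(String input);
    }

    public static final class Trim implements Transformer {
        public String apply(String input) {
            return input.trim();
        }
    }

    public static final class Upper implements Transformer {
        public String apply(String input) {
            return input.toUpperCase(Locale.ROOT);
        }
    }

    // Instantiates each named class reflectively and applies them in list order.
    static Transformer fromClassNames(List<String> classNames) {
        List<Transformer> chain = new ArrayList<>();
        for (String name : classNames) {
            try {
                chain.add((Transformer) Class.forName(name).getDeclaredConstructor().newInstance());
            } catch (ReflectiveOperationException e) {
                throw new IllegalArgumentException("Cannot instantiate transformer " + name, e);
            }
        }
        return input -> {
            String current = input;
            for (Transformer t : chain) {
                current = t.apply(current);
            }
            return current;
        };
    }
}
```

With this shape the user-facing parameter stays a single comma-separated value, and the chaining class remains an internal detail.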




[GitHub] [incubator-hudi] leesf commented on issue #1418: [HUDI-678] Make config package spark free

2020-03-25 Thread GitBox
leesf commented on issue #1418: [HUDI-678] Make config package spark free
URL: https://github.com/apache/incubator-hudi/pull/1418#issuecomment-603873115
 
 
   > @leesf sorry behind on reviews.. Will get stuff ready for you by your day 
time
   
   No worries..




[GitHub] [incubator-hudi] yanghua merged pull request #1444: [MINOR] Update web site for 0.5.2 doc

2020-03-25 Thread GitBox
yanghua merged pull request #1444: [MINOR] Update web site for 0.5.2 doc
URL: https://github.com/apache/incubator-hudi/pull/1444
 
 
   




[GitHub] [incubator-hudi] yanghua commented on issue #1444: [MINOR] Update web site for 0.5.2 doc

2020-03-25 Thread GitBox
yanghua commented on issue #1444: [MINOR] Update web site for 0.5.2 doc
URL: https://github.com/apache/incubator-hudi/pull/1444#issuecomment-603870415
 
 
   > LGTM, as long as you have tested once locally
   
   Yes, I viewed the 0.5.2 docs and some other pages locally.




[GitHub] [incubator-hudi] vinothchandar commented on issue #1443: [SUPPORT] Spark-Hudi consumes too much space in a temp folder while upsert

2020-03-25 Thread GitBox
vinothchandar commented on issue #1443: [SUPPORT] Spark-Hudi consumes too much 
space in a temp folder while upsert
URL: https://github.com/apache/incubator-hudi/issues/1443#issuecomment-603869842
 
 
   @lamber-ken this has come up in various tickets already..
   @n3nash should we bump up the default merge size? what do you guys use at 
uber?




[GitHub] [incubator-hudi] vinothchandar commented on issue #1418: [HUDI-678] Make config package spark free

2020-03-25 Thread GitBox
vinothchandar commented on issue #1418: [HUDI-678] Make config package spark 
free
URL: https://github.com/apache/incubator-hudi/pull/1418#issuecomment-603869397
 
 
   @leesf sorry behind on reviews.. Will get stuff ready for you by your day 
time




[GitHub] [incubator-hudi] yanghua merged pull request #1445: [MINOR] Fix javadoc of InsertBucket

2020-03-25 Thread GitBox
yanghua merged pull request #1445: [MINOR] Fix javadoc of InsertBucket
URL: https://github.com/apache/incubator-hudi/pull/1445
 
 
   




[incubator-hudi] branch master updated: [MINOR] Fix javadoc of InsertBucket (#1445)

2020-03-25 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 5eed6c9  [MINOR] Fix javadoc of InsertBucket (#1445)
5eed6c9 is described below

commit 5eed6c98a8880dc3b4e64ec9ff9dae4859e89b4c
Author: Mathieu 
AuthorDate: Wed Mar 25 22:25:47 2020 +0800

[MINOR] Fix javadoc of InsertBucket (#1445)
---
 .../src/main/java/org/apache/hudi/table/HoodieCopyOnWriteTable.java | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hudi-client/src/main/java/org/apache/hudi/table/HoodieCopyOnWriteTable.java b/hudi-client/src/main/java/org/apache/hudi/table/HoodieCopyOnWriteTable.java
index 94c520c..0b16efe 100644
--- a/hudi-client/src/main/java/org/apache/hudi/table/HoodieCopyOnWriteTable.java
+++ b/hudi-client/src/main/java/org/apache/hudi/table/HoodieCopyOnWriteTable.java
@@ -499,7 +499,7 @@ public class HoodieCopyOnWriteTable extends Hoodi
   }
 
   /**
-   * Helper class for an insert bucket along with the weight [0.0, 0.1] that defines the amount of incoming inserts that
+   * Helper class for an insert bucket along with the weight [0.0, 1.0] that defines the amount of incoming inserts that
    * should be allocated to the bucket.
    */
   class InsertBucket implements Serializable {
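
The corrected javadoc describes a weight in [0.0, 1.0] that decides what share of incoming inserts a bucket receives. A minimal sketch of how such weights can route a record is shown below; it assumes the weights sum to 1.0 and that the caller supplies a value `r` in [0.0, 1.0) (for example a hash of the record key scaled to that range). The class and method names are hypothetical and this is not the partitioner code in HoodieCopyOnWriteTable.

```java
import java.util.List;

// Hypothetical sketch of weight-based routing; not Hudi's actual partitioner.
final class InsertBucketSketch {

    // Returns the index of the bucket whose cumulative weight range contains r.
    // Assumes each weight is in [0.0, 1.0], the weights sum to 1.0, and r is in [0.0, 1.0).
    static int pickBucket(List<Double> weights, double r) {
        double cumulative = 0.0;
        for (int i = 0; i < weights.size(); i++) {
            cumulative += weights.get(i);
            if (r < cumulative) {
                return i;
            }
        }
        // Guards against floating-point rounding when r is very close to 1.0.
        return weights.size() - 1;
    }
}
```

Under these assumptions a bucket with weight 0.5 receives roughly half of the inserts, which is why an upper bound of 0.1 in the old javadoc was misleading.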



[GitHub] [incubator-hudi] yanghua commented on a change in pull request #1444: [MINOR] Update web site for 0.5.2 doc

2020-03-25 Thread GitBox
yanghua commented on a change in pull request #1444: [MINOR] Update web site 
for 0.5.2 doc
URL: https://github.com/apache/incubator-hudi/pull/1444#discussion_r397893404
 
 

 ##
 File path: content/sitemap.xml
 ##
 @@ -265,6 +265,138 @@
 <lastmod>2019-12-30T14:59:57-05:00</lastmod>
 </url>
 <url>
+<loc>http://0.0.0.0:4000/cn/docs/0.5.2-s3_hoodie.html</loc>
+<lastmod>2019-12-30T14:59:57-05:00</lastmod>
+</url>
+<url>
+<loc>http://0.0.0.0:4000/docs/0.5.2-s3_hoodie.html</loc>
+<lastmod>2019-12-30T14:59:57-05:00</lastmod>
+</url>
+<url>
+<loc>http://0.0.0.0:4000/cn/docs/0.5.2-gcs_hoodie.html</loc>
+<lastmod>2019-12-30T14:59:57-05:00</lastmod>
+</url>
+<url>
+<loc>http://0.0.0.0:4000/docs/0.5.2-gcs_hoodie.html</loc>
+<lastmod>2019-12-30T14:59:57-05:00</lastmod>
+</url>
+<url>
+<loc>http://0.0.0.0:4000/cn/docs/0.5.2-migration_guide.html</loc>
 
 Review comment:
   It seems that it's an auto-generated HTML page for every released version.




[GitHub] [incubator-hudi] wangxianghu commented on issue #1445: [MINOR] Fix javadoc of InsertBucket

2020-03-25 Thread GitBox
wangxianghu commented on issue #1445: [MINOR] Fix javadoc of InsertBucket
URL: https://github.com/apache/incubator-hudi/pull/1445#issuecomment-603862058
 
 
   @yanghua could you please take a look




[GitHub] [incubator-hudi] wangxianghu opened a new pull request #1445: [MINOR] Fix javadoc of InsertBucket

2020-03-25 Thread GitBox
wangxianghu opened a new pull request #1445: [MINOR] Fix javadoc of InsertBucket
URL: https://github.com/apache/incubator-hudi/pull/1445
 
 
   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   
   *Fix javadoc of InsertBucket*
   
   ## Brief change log
   
   *Fix javadoc of InsertBucket*
 
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.




[GitHub] [incubator-hudi] leesf commented on issue #1418: [HUDI-678] Make config package spark free

2020-03-25 Thread GitBox
leesf commented on issue #1418: [HUDI-678] Make config package spark free
URL: https://github.com/apache/incubator-hudi/pull/1418#issuecomment-603850989
 
 
   @vinothchandar Any concerns here?




[GitHub] [incubator-hudi] tverdokhlebd closed issue #1443: [SUPPORT] Spark-Hudi consumes too much space in a temp folder while upsert

2020-03-25 Thread GitBox
tverdokhlebd closed issue #1443: [SUPPORT] Spark-Hudi consumes too much space 
in a temp folder while upsert
URL: https://github.com/apache/incubator-hudi/issues/1443
 
 
   




[GitHub] [incubator-hudi] tverdokhlebd commented on issue #1443: [SUPPORT] Spark-Hudi consumes too much space in a temp folder while upsert

2020-03-25 Thread GitBox
tverdokhlebd commented on issue #1443: [SUPPORT] Spark-Hudi consumes too much 
space in a temp folder while upsert
URL: https://github.com/apache/incubator-hudi/issues/1443#issuecomment-603846771
 
 
   @lamber-ken , thanks, the problem is solved.




[GitHub] [incubator-hudi] leesf commented on a change in pull request #1444: [MINOR] Update web site for 0.5.2 doc

2020-03-25 Thread GitBox
leesf commented on a change in pull request #1444: [MINOR] Update web site for 
0.5.2 doc
URL: https://github.com/apache/incubator-hudi/pull/1444#discussion_r397838610
 
 

 ##
 File path: content/sitemap.xml
 ##
 @@ -265,6 +265,138 @@
 <lastmod>2019-12-30T14:59:57-05:00</lastmod>
 </url>
 <url>
+<loc>http://0.0.0.0:4000/cn/docs/0.5.2-s3_hoodie.html</loc>
+<lastmod>2019-12-30T14:59:57-05:00</lastmod>
+</url>
+<url>
+<loc>http://0.0.0.0:4000/docs/0.5.2-s3_hoodie.html</loc>
+<lastmod>2019-12-30T14:59:57-05:00</lastmod>
+</url>
+<url>
+<loc>http://0.0.0.0:4000/cn/docs/0.5.2-gcs_hoodie.html</loc>
+<lastmod>2019-12-30T14:59:57-05:00</lastmod>
+</url>
+<url>
+<loc>http://0.0.0.0:4000/docs/0.5.2-gcs_hoodie.html</loc>
+<lastmod>2019-12-30T14:59:57-05:00</lastmod>
+</url>
+<url>
+<loc>http://0.0.0.0:4000/cn/docs/0.5.2-migration_guide.html</loc>
 
 Review comment:
   is this needed?




[jira] [Updated] (HUDI-733) presto query data error

2020-03-25 Thread jing (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jing updated HUDI-733:
--
Description: 
We found a column-ordering issue in Hudi when importing data through the API 
(spark.read.json("filename") reads the file into a DataFrame, which is then 
written to Hudi). The original data is rowkey:1 dt:2 time:3.

When the data is queried through Presto, the values are wrong (rowkey:2 dt:1 
time:2), but they are correct in Hive.

After analysis: when dt is used as the partition column, its value is also 
written into the parquet file (dt = xxx), although the value of a partition 
column should come from the Hudi partition path. However, I found that Presto 
maps the queried columns one-to-one, by position, onto the columns in the 
parquet file. It does not match on column names.

Transformation methods and suggestions:
 # Can the inputformat class skip reading the value of the partition column 
dt from the parquet file?
 # Can the Hive sync be done without dt as a partition column? Consider 
adding a column such as repl_dt as the partition column and keeping dt as an 
ordinary field.
 # Do not write the dt column to the parquet file.
 # Write dt to the parquet file, but as the last column.

 

[~bhasudha]

  was:
We found a column-ordering issue in Hudi when importing data through the API 
(spark.read.json("filename") reads the file into a DataFrame, which is then 
written to Hudi). The original data is rowkey:1 dt:2 time:3.

When the data is queried through Presto, the values are wrong (rowkey:2 dt:1 
time:2), but they are correct in Hive.

After analysis: when dt is used as the partition column, its value is also 
written into the parquet file (dt = xxx), although the value of a partition 
column should come from the Hudi partition path. However, I found that Presto 
maps the queried columns one-to-one, by position, onto the columns in the 
parquet file. It does not match on column names.

Transformation methods and suggestions:
 # Can the inputformat class skip reading the value of the partition column 
dt from the parquet file?
 # Can the Hive sync be done without dt as a partition column? Consider 
adding a column such as repl_dt as the partition column and keeping dt as an 
ordinary field.
 # Do not write the dt column to the parquet file.
 # Write dt to the parquet file, but as the last column.

 

@Sudha


> presto query data error
> ---
>
> Key: HUDI-733
> URL: https://issues.apache.org/jira/browse/HUDI-733
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Presto Integration
>Affects Versions: 0.5.1
>Reporter: jing
>Priority: Major
> Attachments: hive_table.png, parquet_context.png, parquet_schema.png, 
> presto_query_data.png
>
>
> We found a column-ordering issue in Hudi when importing data through the API 
> (spark.read.json("filename") reads the file into a DataFrame, which is then 
> written to Hudi). The original data is rowkey:1 dt:2 time:3.
> When the data is queried through Presto, the values are wrong (rowkey:2 dt:1 
> time:2), but they are correct in Hive.
> After analysis: when dt is used as the partition column, its value is also 
> written into the parquet file (dt = xxx), although the value of a partition 
> column should come from the Hudi partition path. However, I found that Presto 
> maps the queried columns one-to-one, by position, onto the columns in the 
> parquet file. It does not match on column names.
> Transformation methods and suggestions:
>  # Can the inputformat class skip reading the value of the partition column 
> dt from the parquet file?
>  # Can the Hive sync be done without dt as a partition column? Consider 
> adding a column such as repl_dt as the partition column and keeping dt as an 
> ordinary field.
>  # Do not write the dt column to the parquet file.
>  # Write dt to the parquet file, but as the last column.
>  
> [~bhasudha]
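
The difference between positional and name-based column resolution described in this issue can be sketched as follows. The column layout is an assumption chosen for illustration (a file that physically stores rowkey, dt, time while the table's data columns are rowkey, time), so the sketch does not reproduce the exact values reported above. All names are hypothetical and no Presto or Hive API is used.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of two ways a reader can map table columns onto file columns.
final class ColumnResolutionSketch {

    // Takes the i-th file value for the i-th table column, ignoring file column names.
    static Map<String, Object> readByPosition(List<String> tableColumns, List<Object> fileValues) {
        Map<String, Object> row = new LinkedHashMap<>();
        for (int i = 0; i < tableColumns.size(); i++) {
            row.put(tableColumns.get(i), fileValues.get(i));
        }
        return row;
    }

    // Looks each table column up by name in the file schema.
    // Assumes every table column is present in the file.
    static Map<String, Object> readByName(List<String> tableColumns,
                                          List<String> fileColumns,
                                          List<Object> fileValues) {
        Map<String, Object> row = new LinkedHashMap<>();
        for (String column : tableColumns) {
            row.put(column, fileValues.get(fileColumns.indexOf(column)));
        }
        return row;
    }
}
```

With file columns (rowkey, dt, time) holding (1, 2, 3) and table columns (rowkey, time), `readByPosition` assigns time the value 2, which really belongs to dt, while `readByName` assigns time the value 3. Writing dt as the last column, or not writing it at all, keeps the positional mapping aligned.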





[GitHub] [incubator-hudi] yanghua opened a new pull request #1444: [MINOR] Update web site for 0.5.2 doc

2020-03-25 Thread GitBox
yanghua opened a new pull request #1444: [MINOR] Update web site for 0.5.2 doc
URL: https://github.com/apache/incubator-hudi/pull/1444
 
 
   
   
   ## What is the purpose of the pull request
   
   *This pull request updates web site for 0.5.2 doc*
   
   ## Brief change log
   
 - *Update web site for 0.5.2 doc*
   
   ## Verify this pull request
   
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.




[jira] [Updated] (HUDI-624) Split some of the code from PR for HUDI-479

2020-03-25 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-624:
--
Fix Version/s: (was: 0.5.2)
   0.6.0

> Split some of the code from PR for HUDI-479 
> 
>
> Key: HUDI-624
> URL: https://issues.apache.org/jira/browse/HUDI-624
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Code Cleanup
>Reporter: Suneel Marthi
>Assignee: Suneel Marthi
>Priority: Major
>  Labels: patch, pull-request-available
> Fix For: 0.6.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This Jira is to reduce the size of the code base in PR# 1159 for HUDI-479, 
> making it easier for review.





[jira] [Updated] (HUDI-615) Add Test cases for StringUtils.

2020-03-25 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-615:
--
Fix Version/s: (was: 0.5.2)
   0.6.0

> Add Test cases for StringUtils.
> ---
>
> Key: HUDI-615
> URL: https://issues.apache.org/jira/browse/HUDI-615
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Utilities
>Reporter: Suneel Marthi
>Assignee: Suneel Marthi
>Priority: Major
>  Labels: pull-request-available, test
> Fix For: 0.6.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Presently no Tests exist for org.apache.hudi.common.util.StringUtils - add 
> tests.





[jira] [Updated] (HUDI-590) Cut a new Doc version 0.5.1 explicitly

2020-03-25 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-590:
--
Fix Version/s: (was: 0.5.2)
   0.6.0

> Cut a new Doc version 0.5.1 explicitly
> --
>
> Key: HUDI-590
> URL: https://issues.apache.org/jira/browse/HUDI-590
> Project: Apache Hudi (incubating)
>  Issue Type: Task
>  Components: Docs, Release & Administrative
>Reporter: Bhavani Sudha
>Assignee: Bhavani Sudha
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The latest version of docs needs to be tagged as 0.5.1 explicitly in the 
> site. Follow instructions in 
> [https://github.com/apache/incubator-hudi/blob/asf-site/README.md#updating-site]
>  to create a new dir 0.5.1 under docs/_docs/ 





[jira] [Updated] (HUDI-571) Modify Hudi CLI to show archived commits

2020-03-25 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-571:
--
Fix Version/s: (was: 0.5.2)
   0.6.0

> Modify Hudi CLI to show archived commits
> 
>
> Key: HUDI-571
> URL: https://issues.apache.org/jira/browse/HUDI-571
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: CLI
>Reporter: satish
>Assignee: satish
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Hudi CLI has 'show archived commits' command which is not very helpful
>  
> {code:java}
> ->show archived commits
> ===> Showing only 10 archived commits <===
>     | CommitTime     | CommitType |
>     |================|============|
>     | 2019033304     | commit     |
>     | 20190323220154 | commit     |
>     | 20190323220154 | commit     |
>     | 20190323224004 | commit     |
>     | 20190323224013 | commit     |
>     | 20190323224229 | commit     |
>     | 20190323224229 | commit     |
>     | 20190323232849 | commit     |
>     | 20190323233109 | commit     |
>     | 20190323233109 | commit     |
>  {code}
> Modify or introduce new command to make it easy to debug
>  





[jira] [Updated] (HUDI-114) Allow for clients to overwrite the payload implementation in hoodie.properties

2020-03-25 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-114:
--
Fix Version/s: (was: 0.5.2)
   0.6.0

> Allow for clients to overwrite the payload implementation in hoodie.properties
> --
>
> Key: HUDI-114
> URL: https://issues.apache.org/jira/browse/HUDI-114
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: newbie, Writer Core
>Reporter: Nishith Agarwal
>Assignee: Pratyaksh Sharma
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Right now, once the payload class is set in hoodie.properties, it cannot be 
> changed. In some cases, if a code refactor is done and the jar updated, one 
> may need to pass the new payload class name.
> Also, fix picking up the payload name for the datasource API. By default 
> HoodieAvroPayload is written, whereas the default for the datasource API is 
> OverwriteLatestAvroPayload.





[jira] [Closed] (HUDI-597) Enable incremental pulling from defined partitions

2020-03-25 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang closed HUDI-597.
-

> Enable incremental pulling from defined partitions
> --
>
> Key: HUDI-597
> URL: https://issues.apache.org/jira/browse/HUDI-597
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>Reporter: Yanjia Gary Li
>Assignee: Yanjia Gary Li
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.5.2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> For the use case where I only need the incremental changes for certain 
> partitions, I currently have to do an incremental pull over the entire 
> dataset first and then filter in Spark.
> If we could use the partition folders directly as part of the input path, it 
> could run faster by loading only the relevant parquet files.
> Example:
>  
> {code:java}
> import org.apache.hudi.DataSourceReadOptions
>
> // path is the base path of the Hudi dataset
> spark.read.format("org.apache.hudi")
>   .option(DataSourceReadOptions.VIEW_TYPE_OPT_KEY, DataSourceReadOptions.VIEW_TYPE_INCREMENTAL_OPT_VAL)
>   .option(DataSourceReadOptions.BEGIN_INSTANTTIME_OPT_KEY, "000")
>   .option(DataSourceReadOptions.INCR_PATH_GLOB_OPT_KEY, "/year=2016/*/*/*")
>   .load(path)
> {code}
>  





[jira] [Closed] (HUDI-554) Restructure code/packages to move more code back into hudi-writer-common

2020-03-25 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang closed HUDI-554.
-

> Restructure code/packages  to move more code back into hudi-writer-common
> -
>
> Key: HUDI-554
> URL: https://issues.apache.org/jira/browse/HUDI-554
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Code Cleanup
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Closed] (HUDI-570) Improve unit test coverage FSUtils.java

2020-03-25 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang closed HUDI-570.
-

> Improve unit test coverage FSUtils.java
> ---
>
> Key: HUDI-570
> URL: https://issues.apache.org/jira/browse/HUDI-570
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>Reporter: Balajee Nagasubramaniam
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.5.2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Add test cases for 
> - deleteOlderRollbackMetaFiles()
> - deleteOlderCleanMetaFiles()





[jira] [Closed] (HUDI-617) Add support for data types convertible to String in TimestampBasedKeyGenerator

2020-03-25 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang closed HUDI-617.
-

> Add support for data types convertible to String in TimestampBasedKeyGenerator
> --
>
> Key: HUDI-617
> URL: https://issues.apache.org/jira/browse/HUDI-617
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Utilities
>Reporter: Amit Singh
>Priority: Minor
>  Labels: easyfix, pull-request-available
> Fix For: 0.5.2
>
> Attachments: test_data.json, test_schema.avsc
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently, TimestampBasedKeyGenerator only supports 4 data types for the 
> partition key: Double, Long, Float and String. However, if 
> `avro.java.string` is not specified in the provided schema, Hudi throws the 
> following error:
>  org.apache.hudi.exception.HoodieNotSupportedException: Unexpected type for 
> partition field: org.apache.avro.util.Utf8
>  at 
> org.apache.hudi.utilities.keygen.TimestampBasedKeyGenerator.getKey(TimestampBasedKeyGenerator.java:111)
>  at 
> org.apache.hudi.utilities.deltastreamer.DeltaSync.lambda$readFromSource$f92c188c$1(DeltaSync.java:338)
> 
>  It would be better if the support were generalised to include data types 
> that can be converted to String, such as `Utf8`, since all of these types 
> implement the `CharSequence` interface.
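
The generalisation requested above amounts to checking for `CharSequence` instead of `String`. The sketch below shows that idea under two assumptions: the partition value holds epoch seconds, and `StringBuilder` stands in for Avro's `Utf8` (both implement `CharSequence`) so that the example needs no Avro dependency. The class and method names are hypothetical and this is not the actual TimestampBasedKeyGenerator code.

```java
// Hypothetical sketch; not the actual TimestampBasedKeyGenerator implementation.
final class PartitionValueSketch {

    // Converts a partition field value to a long, assuming it holds epoch seconds.
    static long toEpochSeconds(Object value) {
        if (value instanceof Double) {
            return ((Double) value).longValue();
        } else if (value instanceof Float) {
            return ((Float) value).longValue();
        } else if (value instanceof Long) {
            return (Long) value;
        } else if (value instanceof CharSequence) {
            // Covers String as well as any other CharSequence implementation.
            return Long.parseLong(value.toString());
        }
        String type = value == null ? "null" : value.getClass().getName();
        throw new IllegalArgumentException("Unexpected type for partition field: " + type);
    }
}
```

Because the check is on the interface, a value such as `new StringBuilder("1585094400")` is accepted in the same branch as a plain `String`.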





[jira] [Closed] (HUDI-585) Optimize the steps of building with scala-2.12

2020-03-25 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang closed HUDI-585.
-

> Optimize the steps of building with scala-2.12 
> ---
>
> Key: HUDI-585
> URL: https://issues.apache.org/jira/browse/HUDI-585
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Utilities
>Reporter: lamber-ken
>Assignee: lamber-ken
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Optimize the steps of building with scala-2.12.





[jira] [Closed] (HUDI-587) Jacoco coverage report is not generated

2020-03-25 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang closed HUDI-587.
-

> Jacoco coverage report is not generated
> ---
>
> Key: HUDI-587
> URL: https://issues.apache.org/jira/browse/HUDI-587
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>Reporter: Prashant Wason
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.5.2
>
>   Original Estimate: 1h
>  Time Spent: 20m
>  Remaining Estimate: 40m
>
> When running tests, the jacoco coverage report is not generated. The jacoco 
> plugin is loaded, it sets the correct Java Agent line, but it fails to find 
> the execution data file after tests complete.
> Example:
> mvn test -Dtest=TestHoodieActiveTimeline
> ...
> 22:42:40 [INFO] --- jacoco-maven-plugin:0.7.8:prepare-agent (pre-unit-test) @ hudi-common ---
> 22:42:40 [INFO] *surefireArgLine set to 
> javaagent:/home/pwason/.m2/repository/org/jacoco/org.jacoco.agent/0.7.8/org.jacoco.agent-0.7.8-runtime.jar=destfile=/home/pwason/work/java/incubator-hudi/hudi-common/target/coverage-reports/jacocout.exec*
> *...*
> 22:42:49 [INFO] --- jacoco-maven-plugin:0.7.8:report (post-unit-test) @ hudi-common ---
> 22:42:49 [INFO] *Skipping JaCoCo execution due to missing execution data file.*
>  





[jira] [Closed] (HUDI-604) Update docker page

2020-03-25 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang closed HUDI-604.
-

> Update docker page
> --
>
> Key: HUDI-604
> URL: https://issues.apache.org/jira/browse/HUDI-604
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Docs
>Reporter: lamber-ken
>Assignee: lamber-ken
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> 1. Change one-line commands to multiple lines
> 2. Unify code indentation





[jira] [Closed] (HUDI-297) [Presto] Reuse table metadata listing and reduce namenode RPCs for Presto RO View queries

2020-03-25 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang closed HUDI-297.
-

> [Presto] Reuse table metadata listing and reduce namenode RPCs for Presto RO 
> View queries
> -
>
> Key: HUDI-297
> URL: https://issues.apache.org/jira/browse/HUDI-297
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Presto Integration
>Reporter: Bhavani Sudha Saktheeswaran
>Assignee: Bhavani Sudha
>Priority: Minor
> Fix For: 0.5.2
>
>
> This is described in detail in this Presto issue - 
> [https://github.com/prestodb/presto/issues/13511]
>  
>  





[jira] [Closed] (HUDI-499) Allow partition path to be updated with GLOBAL_BLOOM index

2020-03-25 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang closed HUDI-499.
-

> Allow partition path to be updated with GLOBAL_BLOOM index
> --
>
> Key: HUDI-499
> URL: https://issues.apache.org/jira/browse/HUDI-499
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Index
>Reporter: Raymond Xu
>Assignee: Raymond Xu
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.5.2
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> h3. Context
> When a record is to be updated with a new partition path and the index is 
> set to GLOBAL_BLOOM, the current logic implemented in 
> [https://github.com/apache/incubator-hudi/pull/1091/] ignores the new 
> partition path and updates the record in the original partition path.
> h3. Proposed change
> Allow records to be inserted into their new partition paths and delete the 
> records in the old partition paths. A configuration (e.g. 
> {{hoodie.index.bloom.update.partition.path=true}}) can be added to enable 
> this feature.
> h4. An example use case
> A Hudi dataset manages people info and is partitioned by birthday. In most 
> cases where people info is updated, birthdays do not change (that's 
> why we chose it as the partition field). But in some edge cases birthday 
> info was entered wrongly, and we want to fix it manually or allow users to 
> update it occasionally. In this case, option 2 would be helpful in keeping 
> records in the expected partition, so that a query like "show me people who 
> were born after 2000" would work.
>  
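
The proposed change can be illustrated with a small sketch. This is not Hudi's index code: the class `GlobalIndexRoutingSketch`, its in-memory map and the string operations it returns are all hypothetical, and the boolean flag only mirrors the example setting `hoodie.index.bloom.update.partition.path` from the description.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class GlobalIndexRoutingSketch {

    // Hypothetical global index: record key -> partition path holding the record.
    private final Map<String, String> keyToPartition = new HashMap<>();
    private final boolean updatePartitionPath;

    GlobalIndexRoutingSketch(boolean updatePartitionPath) {
        this.updatePartitionPath = updatePartitionPath;
    }

    /** Returns the operations an incoming record would produce. */
    List<String> route(String recordKey, String incomingPartition) {
        List<String> ops = new ArrayList<>();
        String existing = keyToPartition.get(recordKey);
        if (existing == null) {
            // Key not seen before: plain insert.
            ops.add("INSERT " + recordKey + " into " + incomingPartition);
            keyToPartition.put(recordKey, incomingPartition);
        } else if (updatePartitionPath && !existing.equals(incomingPartition)) {
            // Proposed behaviour: delete from the old partition, insert into the new one.
            ops.add("DELETE " + recordKey + " from " + existing);
            ops.add("INSERT " + recordKey + " into " + incomingPartition);
            keyToPartition.put(recordKey, incomingPartition);
        } else {
            // Behaviour described under Context: the new partition path is ignored.
            ops.add("UPDATE " + recordKey + " in " + existing);
        }
        return ops;
    }

    public static void main(String[] args) {
        GlobalIndexRoutingSketch index = new GlobalIndexRoutingSketch(true);
        System.out.println(index.route("person-1", "1999-01-01"));
        System.out.println(index.route("person-1", "2001-01-01"));
    }
}
```

With the flag off, the second call would yield a single update in the original partition, which is why a query filtered on the partition field can miss the corrected record.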





[jira] [Closed] (HUDI-596) KafkaConsumer need to be closed

2020-03-25 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang closed HUDI-596.
-

> KafkaConsumer need to be closed
> ---
>
> Key: HUDI-596
> URL: https://issues.apache.org/jira/browse/HUDI-596
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Utilities
>Reporter: dengziming
>Assignee: dengziming
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> `offsetGen.getNextOffsetRanges` is called periodically in the DeltaStreamer 
> application, and each call creates a `new KafkaConsumer(kafkaParams)` without 
> closing it, so an exception is thrown after a while.
> ```
> java.net.SocketException: Too many open files
>   at sun.nio.ch.Net.socket0(Native Method)
>   at sun.nio.ch.Net.socket(Net.java:411)
>   at sun.nio.ch.Net.socket(Net.java:404)
>   at sun.nio.ch.SocketChannelImpl.<init>(SocketChannelImpl.java:105)
>   at 
> sun.nio.ch.SelectorProviderImpl.openSocketChannel(SelectorProviderImpl.java:60)
>   at java.nio.channels.SocketChannel.open(SocketChannel.java:145)
>   at org.apache.kafka.common.network.Selector.connect(Selector.java:211)
>   at 
> org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:864)
>   at org.apache.kafka.clients.NetworkClient.ready(NetworkClient.java:265)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.trySend(ConsumerNetworkClient.java:485)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:261)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:242)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:218)
>   at 
> org.apache.kafka.clients.consumer.internals.Fetcher.getTopicMetadata(Fetcher.java:274)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.partitionsFor(KafkaConsumer.java:1774)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.partitionsFor(KafkaConsumer.java:1742)
>   at 
> org.apache.hudi.utilities.sources.helpers.KafkaOffsetGen.getNextOffsetRanges(KafkaOffsetGen.java:177)
>   at 
> org.apache.hudi.utilities.sources.JsonKafkaSource.fetchNewData(JsonKafkaSource.java:56)
>   at org.apache.hudi.utilities.sources.Source.fetchNext(Source.java:73)
>   at 
> org.apache.hudi.utilities.deltastreamer.SourceFormatAdapter.fetchNewDataInRowFormat(SourceFormatAdapter.java:107)
>   at 
> org.apache.hudi.utilities.deltastreamer.DeltaSync.readFromSource(DeltaSync.java:288)
>   at 
> org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:226)
> ```
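
One common way to avoid this kind of leak is try-with-resources, sketched below. This is not the actual fix applied to KafkaOffsetGen. `StubConsumer` is a hypothetical stand-in so the example runs without Kafka on the classpath; the real `KafkaConsumer` is closeable, so the same pattern should apply to it.

```java
import java.util.concurrent.atomic.AtomicInteger;

class ConsumerCloseSketch {

    // Counts consumers that are currently open, standing in for open sockets.
    static final AtomicInteger OPEN = new AtomicInteger();

    /** Hypothetical stand-in for a consumer that holds a network connection. */
    static class StubConsumer implements AutoCloseable {
        StubConsumer() {
            OPEN.incrementAndGet();
        }

        int partitionCount(String topic) {
            return 3; // fixed value, only for illustration
        }

        @Override
        public void close() {
            OPEN.decrementAndGet();
        }
    }

    /** Called periodically; the consumer is closed on every exit path. */
    static int nextOffsetRanges(String topic) {
        try (StubConsumer consumer = new StubConsumer()) {
            return consumer.partitionCount(topic);
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            nextOffsetRanges("some-topic");
        }
        System.out.println("open consumers: " + OPEN.get());
    }
}
```

Because the resource is released even when the body throws, repeated calls do not accumulate open handles.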





[jira] [Closed] (HUDI-647) Change community.html page with new PPMC/committers

2020-03-25 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang closed HUDI-647.
-

> Change community.html page with new PPMC/committers
> ---
>
> Key: HUDI-647
> URL: https://issues.apache.org/jira/browse/HUDI-647
> Project: Apache Hudi (incubating)
>  Issue Type: Task
>  Components: Release & Administrative
>Reporter: Vinoth Chandar
>Assignee: vinoyang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> [~yanghua] assigned to you since this is part of the maturity matrix as 
> well.. feel free to reassign if you don't have cycles.. 
> we need to denote 
> PPMC, for you and [~xleesf] 
> [~shivnarayan] as committer




