[GitHub] [incubator-hudi] dengziming commented on pull request #1151: [HUDI-476] Add hudi-examples module

2020-05-14 Thread GitBox


dengziming commented on pull request #1151:
URL: https://github.com/apache/incubator-hudi/pull/1151#issuecomment-629044776


   > Can you confirm if you have run these examples locally once and verified the instructions work?
   
   @vinothchandar, I ran these examples locally and ensured they do work, but I haven't tried them in yarn-cluster mode.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[incubator-hudi] branch master updated: HUDI-528 Handle empty commit in incremental pulling (#1612)

2020-05-14 Thread vbalaji
This is an automated email from the ASF dual-hosted git repository.

vbalaji pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new a64afdf  HUDI-528 Handle empty commit in incremental pulling (#1612)
a64afdf is described below

commit a64afdfd17ac974e451bceb877f3d40a9c775253
Author: Gary Li 
AuthorDate: Thu May 14 22:55:25 2020 -0700

HUDI-528 Handle empty commit in incremental pulling (#1612)
---
 .../org/apache/hudi/IncrementalRelation.scala   | 29 +-
 .../apache/hudi/functional/TestDataSource.scala |  8 ++
 2 files changed, 20 insertions(+), 17 deletions(-)

diff --git a/hudi-spark/src/main/scala/org/apache/hudi/IncrementalRelation.scala b/hudi-spark/src/main/scala/org/apache/hudi/IncrementalRelation.scala
index 8bb4609..436895b 100644
--- a/hudi-spark/src/main/scala/org/apache/hudi/IncrementalRelation.scala
+++ b/hudi-spark/src/main/scala/org/apache/hudi/IncrementalRelation.scala
@@ -19,9 +19,9 @@ package org.apache.hudi
 
 import org.apache.hadoop.fs.GlobPattern
 import org.apache.hadoop.fs.Path
+import org.apache.hudi.avro.HoodieAvroUtils
 import org.apache.hudi.common.model.{HoodieCommitMetadata, HoodieRecord, HoodieTableType}
-import org.apache.hudi.common.table.HoodieTableMetaClient
-import org.apache.hudi.common.util.ParquetUtils
+import org.apache.hudi.common.table.{HoodieTableMetaClient, TableSchemaResolver}
 import org.apache.hudi.config.HoodieWriteConfig
 import org.apache.hudi.exception.HoodieException
 import org.apache.hudi.table.HoodieTable
@@ -47,8 +47,7 @@ class IncrementalRelation(val sqlContext: SQLContext,
 
   private val log = LogManager.getLogger(classOf[IncrementalRelation])
 
-  val fs = new Path(basePath).getFileSystem(sqlContext.sparkContext.hadoopConfiguration)
-  val metaClient = new HoodieTableMetaClient(sqlContext.sparkContext.hadoopConfiguration, basePath, true)
+  private val metaClient = new HoodieTableMetaClient(sqlContext.sparkContext.hadoopConfiguration, basePath, true)
   // MOR tables not supported yet
   if (metaClient.getTableType.equals(HoodieTableType.MERGE_ON_READ)) {
     throw new HoodieException("Incremental view not implemented yet, for merge-on-read tables")
@@ -56,7 +55,7 @@ class IncrementalRelation(val sqlContext: SQLContext,
   // TODO : Figure out a valid HoodieWriteConfig
   private val hoodieTable = HoodieTable.create(metaClient, HoodieWriteConfig.newBuilder().withPath(basePath).build(),
     sqlContext.sparkContext.hadoopConfiguration)
-  val commitTimeline = hoodieTable.getMetaClient.getCommitTimeline.filterCompletedInstants()
+  private val commitTimeline = hoodieTable.getMetaClient.getCommitTimeline.filterCompletedInstants()
   if (commitTimeline.empty()) {
     throw new HoodieException("No instants to incrementally pull")
   }
@@ -65,25 +64,21 @@ class IncrementalRelation(val sqlContext: SQLContext,
       s"option ${DataSourceReadOptions.BEGIN_INSTANTTIME_OPT_KEY}")
   }
 
-  val lastInstant = commitTimeline.lastInstant().get()
+  private val lastInstant = commitTimeline.lastInstant().get()
 
-  val commitsToReturn = commitTimeline.findInstantsInRange(
+  private val commitsToReturn = commitTimeline.findInstantsInRange(
     optParams(DataSourceReadOptions.BEGIN_INSTANTTIME_OPT_KEY),
     optParams.getOrElse(DataSourceReadOptions.END_INSTANTTIME_OPT_KEY, lastInstant.getTimestamp))
     .getInstants.iterator().toList
 
-  // use schema from a file produced in the latest instant
-  val latestSchema = {
-    // use last instant if instant range is empty
-    val instant = commitsToReturn.lastOption.getOrElse(lastInstant)
-    val latestMeta = HoodieCommitMetadata
-      .fromBytes(commitTimeline.getInstantDetails(instant).get, classOf[HoodieCommitMetadata])
-    val metaFilePath = latestMeta.getFileIdAndFullPaths(basePath).values().iterator().next()
-    AvroConversionUtils.convertAvroSchemaToStructType(ParquetUtils.readAvroSchema(
-      sqlContext.sparkContext.hadoopConfiguration, new Path(metaFilePath)))
+  // use schema from latest metadata, if not present, read schema from the data file
+  private val latestSchema = {
+    val schemaUtil = new TableSchemaResolver(metaClient)
+    val tableSchema = HoodieAvroUtils.createHoodieWriteSchema(schemaUtil.getTableAvroSchemaWithoutMetadataFields);
+    AvroConversionUtils.convertAvroSchemaToStructType(tableSchema)
   }
 
-  val filters = {
+  private val filters = {
     if (optParams.contains(DataSourceReadOptions.PUSH_DOWN_INCR_FILTERS_OPT_KEY)) {
       val filterStr = optParams.getOrElse(
         DataSourceReadOptions.PUSH_DOWN_INCR_FILTERS_OPT_KEY,
diff --git a/hudi-spark/src/test/scala/org/apache/hudi/functional/TestDataSource.scala b/hudi-spark/src/test/scala/org/apache/hudi/functional/TestDataSource.scala
index fdd02bf..8352485 100644
---

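Why this fix matters to readers of the incremental view: the old code pulled the schema from a data file referenced by the latest commit metadata (`getFileIdAndFullPaths(...).values().iterator().next()`), which blows up when that commit wrote no files; the new code asks `TableSchemaResolver` for the table schema instead. A minimal sketch of the read path this protects, assuming the `hoodie.datasource.*` option keys of the 0.6.0-SNAPSHOT line and a made-up table path:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class IncrementalPullExample {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("hudi-incremental-pull")
        .master("local[2]")
        .getOrCreate();

    // Pull only records committed after the given instant; with this fix the
    // read succeeds even if commits in the range are empty. The path and
    // instant time are placeholders.
    Dataset<Row> incremental = spark.read()
        .format("org.apache.hudi")
        .option("hoodie.datasource.query.type", "incremental")
        .option("hoodie.datasource.read.begin.instanttime", "20200514000000")
        .load("/tmp/hudi/example_table");

    incremental.show();
  }
}
```
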
[GitHub] [incubator-hudi] bvaradar merged pull request #1612: [HUDI-528] Handle empty commit in incremental pulling

2020-05-14 Thread GitBox


bvaradar merged pull request #1612:
URL: https://github.com/apache/incubator-hudi/pull/1612


   







[GitHub] [incubator-hudi] bvaradar closed pull request #1532: [HUDI-794]: implemented optional use of --config-folder option in HoodieDeltaStreamer

2020-05-14 Thread GitBox


bvaradar closed pull request #1532:
URL: https://github.com/apache/incubator-hudi/pull/1532


   







[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1532: [HUDI-794]: implemented optional use of --config-folder option in HoodieDeltaStreamer

2020-05-14 Thread GitBox


bvaradar commented on a change in pull request #1532:
URL: https://github.com/apache/incubator-hudi/pull/1532#discussion_r425579387



##
File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/DeltaStreamerUtility.java
##
@@ -0,0 +1,128 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities;
+
+import org.apache.hudi.common.config.TypedProperties;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer;
+import org.apache.hudi.utilities.deltastreamer.HoodieMultiTableDeltaStreamer;
+import org.apache.hudi.utilities.deltastreamer.TableExecutionContext;
+
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.FileUtil;
+import org.apache.hadoop.fs.Path;
+
+import java.io.IOException;
+import java.util.ArrayList;
+
+public class DeltaStreamerUtility {
+
+  public static String getDefaultConfigFilePath(String configFolder, String database, String currentTable) {
+    return configFolder + Constants.FILEDELIMITER + database + Constants.UNDERSCORE + currentTable + Constants.DEFAULT_CONFIG_FILE_NAME_SUFFIX;
+  }
+
+  public static String getTableWithDatabase(TableExecutionContext context) {
+    return context.getDatabase() + Constants.DELIMITER + context.getTableName();
+  }
+
+  public static void checkIfPropsFileAndConfigFolderExist(String commonPropsFile, String configFolder, FileSystem fs) throws IOException {
+    if (!fs.exists(new Path(commonPropsFile))) {
+      throw new IllegalArgumentException("Please provide valid common config file path!");
+    }
+
+    if (!fs.exists(new Path(configFolder))) {
+      fs.mkdirs(new Path(configFolder));
+    }
+  }
+
+  public static void checkIfTableConfigFileExists(String configFolder, FileSystem fs, String configFilePath) throws IOException {
+    if (!fs.exists(new Path(configFilePath)) || !fs.isFile(new Path(configFilePath))) {
+      throw new IllegalArgumentException("Please provide valid table config file path!");
+    }
+
+    Path path = new Path(configFilePath);
+    Path filePathInConfigFolder = new Path(configFolder, path.getName());
+    if (!fs.exists(filePathInConfigFolder)) {
+      FileUtil.copy(fs, path, fs, filePathInConfigFolder, false, fs.getConf());
+    }
+  }
+
+  public static TypedProperties getTablePropertiesFromConfigFolder(HoodieDeltaStreamer.Config cfg, FileSystem fs) throws IOException {

Review comment:
   @pratyakshsharma : Sorry for the delay. I think we can close this PR. As a next step towards enhancing HoodieMultiTableDeltaStreamer, I think we can work on feature parity: supporting parallel DeltaStreamers, MOR tables, and async compaction. These would make HoodieMultiTableDeltaStreamer a really powerful tool :)

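For anyone skimming the thread, a hedged sketch of how the helpers quoted above would be wired together; the paths are placeholders and `Constants` is defined elsewhere in this (now closed) PR:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class ConfigFolderExample {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());

    // Validates the common props file; creates the config folder if missing.
    DeltaStreamerUtility.checkIfPropsFileAndConfigFolderExist(
        "/configs/common.properties", "/configs/tables", fs);

    // Per-table config path: <configFolder>/<database>_<table><suffix>.
    String tableConfig = DeltaStreamerUtility.getDefaultConfigFilePath(
        "/configs/tables", "warehouse_db", "orders");

    // Throws if the table config file is missing; otherwise ensures a copy
    // of it exists inside the config folder.
    DeltaStreamerUtility.checkIfTableConfigFileExists("/configs/tables", fs, tableConfig);
  }
}
```
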








[GitHub] [incubator-hudi] bvaradar commented on pull request #1520: [HUDI-797] Small performance improvement for rewriting records.

2020-05-14 Thread GitBox


bvaradar commented on pull request #1520:
URL: https://github.com/apache/incubator-hudi/pull/1520#issuecomment-629040082


   @prashantwason : Looking at the comments, it looks like this PR is going to be abandoned. If so, can you please close this PR?







[GitHub] [incubator-hudi] yanghua commented on pull request #1611: [HUDI-705]Add unit test for RollbacksCommand

2020-05-14 Thread GitBox


yanghua commented on pull request #1611:
URL: https://github.com/apache/incubator-hudi/pull/1611#issuecomment-629039996


   > @yanghua you can review this as well if possible :)
   
   OK







[GitHub] [incubator-hudi] dengziming commented on a change in pull request #1151: [HUDI-476] Add hudi-examples module

2020-05-14 Thread GitBox


dengziming commented on a change in pull request #1151:
URL: https://github.com/apache/incubator-hudi/pull/1151#discussion_r425577242



##
File path: hudi-examples/pom.xml
##
@@ -0,0 +1,198 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!-- ASF license header omitted (stripped by the mail archive) -->
+<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+
+  <parent>
+    <artifactId>hudi</artifactId>
+    <groupId>org.apache.hudi</groupId>
+    <version>0.6.0-SNAPSHOT</version>
+  </parent>
+  <modelVersion>4.0.0</modelVersion>
+
+  <artifactId>hudi-examples</artifactId>
+  <packaging>jar</packaging>
+
+  <properties>
+    <main.basedir>${project.parent.basedir}</main.basedir>
+  </properties>
+
+  <build>
+    <resources>
+      <resource>
+        <directory>src/main/resources</directory>
+      </resource>
+    </resources>
+
+    <plugins>
+      <plugin>
+        <groupId>org.apache.maven.plugins</groupId>
+        <artifactId>maven-dependency-plugin</artifactId>
+        <executions>
+          <execution>
+            <id>copy-dependencies</id>
+            <phase>prepare-package</phase>
+            <goals>
+              <goal>copy-dependencies</goal>
+            </goals>
+            <configuration>
+              <outputDirectory>${project.build.directory}/lib</outputDirectory>
+              <overWriteReleases>true</overWriteReleases>
+              <overWriteSnapshots>true</overWriteSnapshots>
+              <overWriteIfNewer>true</overWriteIfNewer>
+            </configuration>
+          </execution>
+        </executions>
+      </plugin>
+      <plugin>
+        <groupId>net.alchim31.maven</groupId>
+        <artifactId>scala-maven-plugin</artifactId>
+        <executions>
+          <execution>
+            <id>scala-compile-first</id>
+            <phase>process-resources</phase>
+            <goals>
+              <goal>add-source</goal>
+              <goal>compile</goal>
+            </goals>
+          </execution>
+        </executions>
+      </plugin>
+      <plugin>
+        <groupId>org.apache.maven.plugins</groupId>
+        <artifactId>maven-compiler-plugin</artifactId>
+        <executions>
+          <execution>
+            <id>compile</id>
+            <goals>
+              <goal>compile</goal>
+            </goals>
+          </execution>
+        </executions>
+      </plugin>
+      <plugin>
+        <groupId>org.apache.maven.plugins</groupId>
+        <artifactId>maven-jar-plugin</artifactId>
+        <executions>
+          <execution>
+            <goals>
+              <goal>test-jar</goal>
+            </goals>
+            <phase>test-compile</phase>
+          </execution>
+        </executions>
+        <configuration>
+          <skip>false</skip>
+        </configuration>
+      </plugin>
+      <plugin>
+        <groupId>org.apache.rat</groupId>
+        <artifactId>apache-rat-plugin</artifactId>
+      </plugin>
+    </plugins>
+  </build>
+
+  <dependencies>
+    <dependency>
+      <groupId>org.scala-lang</groupId>
+      <artifactId>scala-library</artifactId>
+      <version>${scala.version}</version>
+    </dependency>
+
+    <dependency>
+      <groupId>org.apache.hudi</groupId>
+      <artifactId>hudi-common</artifactId>
+      <version>${project.version}</version>
+    </dependency>
+
+    <dependency>
+      <groupId>org.apache.hudi</groupId>
+      <artifactId>hudi-cli</artifactId>
+      <version>${project.version}</version>
+    </dependency>
+
+    <dependency>
+      <groupId>org.apache.hudi</groupId>
+      <artifactId>hudi-client</artifactId>
+      <version>${project.version}</version>
+    </dependency>
+
+    <dependency>
+      <groupId>org.apache.hudi</groupId>
+      <artifactId>hudi-utilities_${scala.binary.version}</artifactId>
+      <version>${project.version}</version>
+    </dependency>
+
+    <dependency>
+      <groupId>org.apache.hudi</groupId>
+      <artifactId>hudi-spark_${scala.binary.version}</artifactId>
+      <version>${project.version}</version>
+    </dependency>
+
+    <dependency>
+      <groupId>org.apache.hudi</groupId>
+      <artifactId>hudi-hadoop-mr</artifactId>
+      <version>${project.version}</version>
Review comment:
   the build process will fail if the versions are removed, and other modules also have `project.version` in their dependencies.









[GitHub] [incubator-hudi] bvaradar closed issue #1586: [SUPPORT] DMS with 2 key example

2020-05-14 Thread GitBox


bvaradar closed issue #1586:
URL: https://github.com/apache/incubator-hudi/issues/1586


   







[GitHub] [incubator-hudi] bvaradar commented on issue #1630: [SUPPORT] Latest commit does not have any schema in commit metadata

2020-05-14 Thread GitBox


bvaradar commented on issue #1630:
URL: https://github.com/apache/incubator-hudi/issues/1630#issuecomment-629037540


   Cherry-picking selective diffs is always a tricky business. Maybe you can use master, or use 0.5.2 and apply the patch and try. Also, the 0.5.3 release is going to happen shortly and will contain the above patch.







[GitHub] [incubator-hudi] codecov-io edited a comment on pull request #1151: [HUDI-476] Add hudi-examples module

2020-05-14 Thread GitBox


codecov-io edited a comment on pull request #1151:
URL: https://github.com/apache/incubator-hudi/pull/1151#issuecomment-593277561


   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1151?src=pr=h1) 
Report
   > Merging 
[#1151](https://codecov.io/gh/apache/incubator-hudi/pull/1151?src=pr=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/2a56f82908a8b8f788a7547d3c707c144696c1df=desc)
 will **increase** coverage by `0.08%`.
   > The diff coverage is `0.00%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1151/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1151?src=pr=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##           master    #1151       +/-   ##
   ============================================
   + Coverage     71.65%   71.73%    +0.08%
   - Complexity      294     1089      +795
   ============================================
     Files           378      385        +7
     Lines         16541    16604       +63
     Branches       1670     1669        -1
   ============================================
   + Hits          11852    11911       +59
   - Misses         3957     3963        +6
   + Partials        732      730        -2
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-hudi/pull/1151?src=pr=tree) | 
Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...rg/apache/hudi/common/model/HoodieAvroPayload.java](https://codecov.io/gh/apache/incubator-hudi/pull/1151/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0hvb2RpZUF2cm9QYXlsb2FkLmphdmE=)
 | `78.57% <0.00%> (-6.05%)` | `0.00 <0.00> (ø)` | |
   | 
[...g/apache/hudi/metrics/InMemoryMetricsReporter.java](https://codecov.io/gh/apache/incubator-hudi/pull/1151/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9Jbk1lbW9yeU1ldHJpY3NSZXBvcnRlci5qYXZh)
 | `40.00% <0.00%> (-40.00%)` | `0.00% <0.00%> (ø%)` | |
   | 
[...src/main/java/org/apache/hudi/metrics/Metrics.java](https://codecov.io/gh/apache/incubator-hudi/pull/1151/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9NZXRyaWNzLmphdmE=)
 | `56.75% <0.00%> (-10.82%)` | `0.00% <0.00%> (ø%)` | |
   | 
[...ache/hudi/common/fs/inline/InMemoryFileSystem.java](https://codecov.io/gh/apache/incubator-hudi/pull/1151/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL2lubGluZS9Jbk1lbW9yeUZpbGVTeXN0ZW0uamF2YQ==)
 | `79.31% <0.00%> (-10.35%)` | `0.00% <0.00%> (ø%)` | |
   | 
[.../apache/hudi/common/table/TableSchemaResolver.java](https://codecov.io/gh/apache/incubator-hudi/pull/1151/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL1RhYmxlU2NoZW1hUmVzb2x2ZXIuamF2YQ==)
 | `56.71% <0.00%> (-7.15%)` | `0.00% <0.00%> (ø%)` | |
   | 
[...le/action/rollback/BaseRollbackActionExecutor.java](https://codecov.io/gh/apache/incubator-hudi/pull/1151/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdGFibGUvYWN0aW9uL3JvbGxiYWNrL0Jhc2VSb2xsYmFja0FjdGlvbkV4ZWN1dG9yLmphdmE=)
 | `70.83% <0.00%> (-6.95%)` | `0.00% <0.00%> (ø%)` | |
   | 
[.../main/java/org/apache/hudi/common/util/Option.java](https://codecov.io/gh/apache/incubator-hudi/pull/1151/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvT3B0aW9uLmphdmE=)
 | `75.67% <0.00%> (-5.41%)` | `17.00% <0.00%> (+17.00%)` | :arrow_down: |
   | 
[...ain/java/org/apache/hudi/avro/HoodieAvroUtils.java](https://codecov.io/gh/apache/incubator-hudi/pull/1151/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvYXZyby9Ib29kaWVBdnJvVXRpbHMuamF2YQ==)
 | `80.95% <0.00%> (-3.87%)` | `22.00% <0.00%> (+22.00%)` | :arrow_down: |
   | 
[...g/apache/hudi/table/action/clean/CleanPlanner.java](https://codecov.io/gh/apache/incubator-hudi/pull/1151/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdGFibGUvYWN0aW9uL2NsZWFuL0NsZWFuUGxhbm5lci5qYXZh)
 | `86.86% <0.00%> (-2.89%)` | `5.00% <0.00%> (+5.00%)` | :arrow_down: |
   | 
[...src/main/java/org/apache/hudi/DataSourceUtils.java](https://codecov.io/gh/apache/incubator-hudi/pull/1151/diff?src=pr=tree#diff-aHVkaS1zcGFyay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9EYXRhU291cmNlVXRpbHMuamF2YQ==)
 | `55.55% <0.00%> (-1.15%)` | `0.00% <0.00%> (ø%)` | |
   | ... and [69 
more](https://codecov.io/gh/apache/incubator-hudi/pull/1151/diff?src=pr=tree-more)
 | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1151?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 

[GitHub] [incubator-hudi] codecov-io edited a comment on pull request #1602: [HUDI-494] fix incorrect record size estimation

2020-05-14 Thread GitBox


codecov-io edited a comment on pull request #1602:
URL: https://github.com/apache/incubator-hudi/pull/1602#issuecomment-628183877


   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1602?src=pr=h1) 
Report
   > Merging 
[#1602](https://codecov.io/gh/apache/incubator-hudi/pull/1602?src=pr=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/32bada29dc95f1d5910713ae6b4f4a4ef39677c9=desc)
 will **increase** coverage by `0.00%`.
   > The diff coverage is `100.00%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1602/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1602?src=pr=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##           master    #1602       +/-   ##
   ============================================
     Coverage     71.76%   71.76%
   - Complexity     1087     1090        +3
   ============================================
     Files           385      385
     Lines         16584    16585        +1
     Branches       1668     1666        -2
   ============================================
   + Hits          11902    11903        +1
   + Misses         3954     3953        -1
   - Partials        728      729        +1
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-hudi/pull/1602?src=pr=tree) | 
Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[.../org/apache/hudi/table/HoodieCopyOnWriteTable.java](https://codecov.io/gh/apache/incubator-hudi/pull/1602/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdGFibGUvSG9vZGllQ29weU9uV3JpdGVUYWJsZS5qYXZh)
 | `57.14% <ø> (-4.93%)` | `4.00 <0.00> (ø)` | |
   | 
[...he/hudi/table/action/commit/UpsertPartitioner.java](https://codecov.io/gh/apache/incubator-hudi/pull/1602/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdGFibGUvYWN0aW9uL2NvbW1pdC9VcHNlcnRQYXJ0aXRpb25lci5qYXZh)
 | `95.65% <100.00%> (+0.68%)` | `15.00 <1.00> (ø)` | |
   | 
[...ache/hudi/common/fs/inline/InMemoryFileSystem.java](https://codecov.io/gh/apache/incubator-hudi/pull/1602/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL2lubGluZS9Jbk1lbW9yeUZpbGVTeXN0ZW0uamF2YQ==)
 | `79.31% <0.00%> (-10.35%)` | `0.00% <0.00%> (ø%)` | |
   | 
[...i/utilities/keygen/TimestampBasedKeyGenerator.java](https://codecov.io/gh/apache/incubator-hudi/pull/1602/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2tleWdlbi9UaW1lc3RhbXBCYXNlZEtleUdlbmVyYXRvci5qYXZh)
 | `58.82% <0.00%> (-0.16%)` | `7.00% <0.00%> (+2.00%)` | :arrow_down: |
   | 
[...ain/java/org/apache/hudi/avro/HoodieAvroUtils.java](https://codecov.io/gh/apache/incubator-hudi/pull/1602/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvYXZyby9Ib29kaWVBdnJvVXRpbHMuamF2YQ==)
 | `80.95% <0.00%> (+1.12%)` | `22.00% <0.00%> (ø%)` | |
   | 
[...src/main/java/org/apache/hudi/metrics/Metrics.java](https://codecov.io/gh/apache/incubator-hudi/pull/1602/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9NZXRyaWNzLmphdmE=)
 | `67.56% <0.00%> (+10.81%)` | `0.00% <0.00%> (ø%)` | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1602?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1602?src=pr=footer).
 Last update 
[32bada2...88e2177](https://codecov.io/gh/apache/incubator-hudi/pull/1602?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   







[GitHub] [incubator-hudi] zherenyu831 commented on issue #1631: [SUPPORT] After changed schema could not update schema of Hive

2020-05-14 Thread GitBox


zherenyu831 commented on issue #1631:
URL: https://github.com/apache/incubator-hudi/issues/1631#issuecomment-629015268


   @bvaradar 
   Thank you, I think this is the only solution that could work for now.







[GitHub] [incubator-hudi] zherenyu831 closed issue #1631: [SUPPORT] After changed schema could not update schema of Hive

2020-05-14 Thread GitBox


zherenyu831 closed issue #1631:
URL: https://github.com/apache/incubator-hudi/issues/1631


   







[GitHub] [incubator-hudi] codecov-io edited a comment on pull request #1602: [HUDI-494] fix incorrect record size estimation

2020-05-14 Thread GitBox


codecov-io edited a comment on pull request #1602:
URL: https://github.com/apache/incubator-hudi/pull/1602#issuecomment-628183877


   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1602?src=pr=h1) 
Report
   > Merging 
[#1602](https://codecov.io/gh/apache/incubator-hudi/pull/1602?src=pr=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/32bada29dc95f1d5910713ae6b4f4a4ef39677c9=desc)
 will **decrease** coverage by `0.03%`.
   > The diff coverage is `100.00%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1602/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1602?src=pr=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##           master    #1602       +/-   ##
   ============================================
   - Coverage     71.76%   71.73%    -0.04%
   - Complexity     1087     1090        +3
   ============================================
     Files           385      385
     Lines         16584    16585        +1
     Branches       1668     1666        -2
   ============================================
   - Hits          11902    11897        -5
   - Misses         3954     3960        +6
     Partials        728      728
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-hudi/pull/1602?src=pr=tree) | 
Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[.../org/apache/hudi/table/HoodieCopyOnWriteTable.java](https://codecov.io/gh/apache/incubator-hudi/pull/1602/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdGFibGUvSG9vZGllQ29weU9uV3JpdGVUYWJsZS5qYXZh)
 | `57.14% <ø> (-4.93%)` | `4.00 <0.00> (ø)` | |
   | 
[...he/hudi/table/action/commit/UpsertPartitioner.java](https://codecov.io/gh/apache/incubator-hudi/pull/1602/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdGFibGUvYWN0aW9uL2NvbW1pdC9VcHNlcnRQYXJ0aXRpb25lci5qYXZh)
 | `95.65% <100.00%> (+0.68%)` | `15.00 <1.00> (ø)` | |
   | 
[...g/apache/hudi/metrics/InMemoryMetricsReporter.java](https://codecov.io/gh/apache/incubator-hudi/pull/1602/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9Jbk1lbW9yeU1ldHJpY3NSZXBvcnRlci5qYXZh)
 | `40.00% <0.00%> (-40.00%)` | `0.00% <0.00%> (ø%)` | |
   | 
[...ache/hudi/common/fs/inline/InMemoryFileSystem.java](https://codecov.io/gh/apache/incubator-hudi/pull/1602/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL2lubGluZS9Jbk1lbW9yeUZpbGVTeXN0ZW0uamF2YQ==)
 | `79.31% <0.00%> (-10.35%)` | `0.00% <0.00%> (ø%)` | |
   | 
[...i/utilities/keygen/TimestampBasedKeyGenerator.java](https://codecov.io/gh/apache/incubator-hudi/pull/1602/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2tleWdlbi9UaW1lc3RhbXBCYXNlZEtleUdlbmVyYXRvci5qYXZh)
 | `58.82% <0.00%> (-0.16%)` | `7.00% <0.00%> (+2.00%)` | :arrow_down: |
   | 
[...ain/java/org/apache/hudi/avro/HoodieAvroUtils.java](https://codecov.io/gh/apache/incubator-hudi/pull/1602/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvYXZyby9Ib29kaWVBdnJvVXRpbHMuamF2YQ==)
 | `80.95% <0.00%> (+1.12%)` | `22.00% <0.00%> (ø%)` | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1602?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1602?src=pr=footer).
 Last update 
[32bada2...88e2177](https://codecov.io/gh/apache/incubator-hudi/pull/1602?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   







[GitHub] [incubator-hudi] garyli1019 commented on pull request #1592: [Hudi-69] Spark Datasource for MOR table

2020-05-14 Thread GitBox


garyli1019 commented on pull request #1592:
URL: https://github.com/apache/incubator-hudi/pull/1592#issuecomment-629011466


   Thanks @xushiyan! I will give it a try.







[GitHub] [incubator-hudi] nsivabalan commented on a change in pull request #1433: [HUDI-728]: Implement custom key generator

2020-05-14 Thread GitBox


nsivabalan commented on a change in pull request #1433:
URL: https://github.com/apache/incubator-hudi/pull/1433#discussion_r425547719



##
File path: hudi-spark/src/main/java/org/apache/hudi/keygen/SimpleKeyGenerator.java
##
@@ -66,6 +68,14 @@ public HoodieKey getKey(GenericRecord record) {
       partitionPath = partitionPathField + "=" + partitionPath;
     }
 
-    return new HoodieKey(recordKey, partitionPath);
+    return partitionPath;
+  }
+
+  public String getRecordKey(GenericRecord record) {

Review comment:
   did you think about whether we need to make this an abstract method in KeyGenerator? 

##
File path: hudi-spark/src/main/java/org/apache/hudi/keygen/CustomKeyGenerator.java
##
@@ -0,0 +1,128 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.keygen;
+
+import org.apache.hudi.DataSourceWriteOptions;
+import org.apache.hudi.common.model.HoodieKey;
+import org.apache.hudi.common.config.TypedProperties;
+
+import org.apache.avro.generic.GenericRecord;
+import org.apache.hudi.exception.HoodieDeltaStreamerException;
+import org.apache.hudi.exception.HoodieKeyException;
+
+import java.util.Arrays;
+import java.util.List;
+import java.util.stream.Collectors;
+
+/**
+ * This is a generic implementation of KeyGenerator where users can configure record key as a single field or a combination of fields.
+ * Similarly partition path can be configured to have multiple fields or only one field. This class expects value for prop
+ * "hoodie.datasource.write.partitionpath.field" in a specific format. For example:
+ *
+ * properties.put("hoodie.datasource.write.partitionpath.field", "field1:PartitionKeyType1,field2:PartitionKeyType2").
+ *
+ * The complete partition path is created as <value for field1>/<value for field2> and so on.
+ *
+ * Few points to consider:
+ * 1. If you want to customise some partition path field on a timestamp basis, you can use field1:timestampBased

Review comment:
   minor typo. "customize"
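
   To make the javadoc concrete, a sketch of the configuration it describes; the field names here are invented, and the partition-type tokens (`simple`, `timestampBased`) follow the javadoc's own example rather than a confirmed enum:

```java
import org.apache.avro.generic.GenericRecord;
import org.apache.hudi.common.config.TypedProperties;
import org.apache.hudi.common.model.HoodieKey;
import org.apache.hudi.keygen.CustomKeyGenerator;

public class CustomKeyGenExample {
  // record is any Avro record carrying "order_id", "rider" and "ts" fields.
  static HoodieKey keyFor(GenericRecord record) {
    TypedProperties props = new TypedProperties();
    props.setProperty("hoodie.datasource.write.recordkey.field", "order_id");
    // Partition path becomes <rider value>/<formatted ts value>.
    props.setProperty("hoodie.datasource.write.partitionpath.field",
        "rider:simple,ts:timestampBased");
    return new CustomKeyGenerator(props).getKey(record);
  }
}
```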

##
File path: hudi-spark/src/main/java/org/apache/hudi/keygen/ComplexKeyGenerator.java
##
@@ -49,21 +50,44 @@
 
   public ComplexKeyGenerator(TypedProperties props) {
     super(props);
-    this.recordKeyFields = Arrays.asList(props.getString(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY()).split(","))
-        .stream().map(String::trim).collect(Collectors.toList());
-    this.partitionPathFields =
-        Arrays.asList(props.getString(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY()).split(","))
-            .stream().map(String::trim).collect(Collectors.toList());
+    DataSourceUtils.checkRequiredProperties(props, Arrays.asList(
+        DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY(),
+        DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY())
+    );
+    this.recordKeyFields = Arrays.stream(props.getString(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY()).split(","))
+        .map(String::trim).collect(Collectors.toList());
+    this.partitionPathFields = Arrays.stream(props.getString(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY()).split(","))
+        .map(String::trim).collect(Collectors.toList());
     this.hiveStylePartitioning = props.getBoolean(DataSourceWriteOptions.HIVE_STYLE_PARTITIONING_OPT_KEY(),
         Boolean.parseBoolean(DataSourceWriteOptions.DEFAULT_HIVE_STYLE_PARTITIONING_OPT_VAL()));
   }
 
   @Override
   public HoodieKey getKey(GenericRecord record) {
-    if (recordKeyFields == null || partitionPathFields == null) {
-      throw new HoodieKeyException("Unable to find field names for record key or partition path in cfg");
+    String recordKey = getRecordKey(record);
+    StringBuilder partitionPath = new StringBuilder();
+    for (String partitionPathField : partitionPathFields) {
+      partitionPath.append(getPartitionPath(record, partitionPathField));
+      partitionPath.append(DEFAULT_PARTITION_PATH_SEPARATOR);
+    }
+    partitionPath.deleteCharAt(partitionPath.length() - 1);
+
+    return new HoodieKey(recordKey, partitionPath.toString());
+  }
+
+  public String getPartitionPath(GenericRecord record, String partitionPathField) {

Review comment:
   does this need to be public? why not protected or package-private?

Build failed in Jenkins: hudi-snapshot-deployment-0.5 #278

2020-05-14 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 2.38 KB...]
/home/jenkins/tools/maven/apache-maven-3.5.4/conf:
logging
settings.xml
toolchains.xml

/home/jenkins/tools/maven/apache-maven-3.5.4/conf/logging:
simplelogger.properties

/home/jenkins/tools/maven/apache-maven-3.5.4/lib:
aopalliance-1.0.jar
cdi-api-1.0.jar
cdi-api.license
commons-cli-1.4.jar
commons-cli.license
commons-io-2.5.jar
commons-io.license
commons-lang3-3.5.jar
commons-lang3.license
ext
guava-20.0.jar
guice-4.2.0-no_aop.jar
jansi-1.17.1.jar
jansi-native
javax.inject-1.jar
jcl-over-slf4j-1.7.25.jar
jcl-over-slf4j.license
jsr250-api-1.0.jar
jsr250-api.license
maven-artifact-3.5.4.jar
maven-artifact.license
maven-builder-support-3.5.4.jar
maven-builder-support.license
maven-compat-3.5.4.jar
maven-compat.license
maven-core-3.5.4.jar
maven-core.license
maven-embedder-3.5.4.jar
maven-embedder.license
maven-model-3.5.4.jar
maven-model-builder-3.5.4.jar
maven-model-builder.license
maven-model.license
maven-plugin-api-3.5.4.jar
maven-plugin-api.license
maven-repository-metadata-3.5.4.jar
maven-repository-metadata.license
maven-resolver-api-1.1.1.jar
maven-resolver-api.license
maven-resolver-connector-basic-1.1.1.jar
maven-resolver-connector-basic.license
maven-resolver-impl-1.1.1.jar
maven-resolver-impl.license
maven-resolver-provider-3.5.4.jar
maven-resolver-provider.license
maven-resolver-spi-1.1.1.jar
maven-resolver-spi.license
maven-resolver-transport-wagon-1.1.1.jar
maven-resolver-transport-wagon.license
maven-resolver-util-1.1.1.jar
maven-resolver-util.license
maven-settings-3.5.4.jar
maven-settings-builder-3.5.4.jar
maven-settings-builder.license
maven-settings.license
maven-shared-utils-3.2.1.jar
maven-shared-utils.license
maven-slf4j-provider-3.5.4.jar
maven-slf4j-provider.license
org.eclipse.sisu.inject-0.3.3.jar
org.eclipse.sisu.inject.license
org.eclipse.sisu.plexus-0.3.3.jar
org.eclipse.sisu.plexus.license
plexus-cipher-1.7.jar
plexus-cipher.license
plexus-component-annotations-1.7.1.jar
plexus-component-annotations.license
plexus-interpolation-1.24.jar
plexus-interpolation.license
plexus-sec-dispatcher-1.4.jar
plexus-sec-dispatcher.license
plexus-utils-3.1.0.jar
plexus-utils.license
slf4j-api-1.7.25.jar
slf4j-api.license
wagon-file-3.1.0.jar
wagon-file.license
wagon-http-3.1.0-shaded.jar
wagon-http.license
wagon-provider-api-3.1.0.jar
wagon-provider-api.license

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/ext:
README.txt

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native:
freebsd32
freebsd64
linux32
linux64
osx
README.txt
windows32
windows64

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/osx:
libjansi.jnilib

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows32:
jansi.dll

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows64:
jansi.dll
Finished /home/jenkins/tools/maven/apache-maven-3.5.4 Directory Listing :
Detected current version as: 
'HUDI_home=
0.6.0-SNAPSHOT'
[INFO] Scanning for projects...
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for org.apache.hudi:hudi-spark_2.11:jar:0.6.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ org.apache.hudi:hudi-spark_${scala.binary.version}:[unknown-version], line 26, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for org.apache.hudi:hudi-timeline-service:jar:0.6.0-SNAPSHOT
[WARNING] 'build.plugins.plugin.(groupId:artifactId)' must be unique but found duplicate declaration of plugin org.jacoco:jacoco-maven-plugin @ org.apache.hudi:hudi-timeline-service:[unknown-version], line 58, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for org.apache.hudi:hudi-utilities_2.11:jar:0.6.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ org.apache.hudi:hudi-utilities_${scala.binary.version}:[unknown-version], line 26, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for org.apache.hudi:hudi-spark-bundle_2.11:jar:0.6.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 
[GitHub] [incubator-hudi] vinothchandar commented on pull request #1402: [HUDI-407] Adding Simple Index

2020-05-14 Thread GitBox


vinothchandar commented on pull request #1402:
URL: https://github.com/apache/incubator-hudi/pull/1402#issuecomment-629007570


   Yes. Please add a good commit message, given this is an important feature







[GitHub] [incubator-hudi] garyli1019 commented on pull request #1602: [HUDI-494] fix incorrect record size estimation

2020-05-14 Thread GitBox


garyli1019 commented on pull request #1602:
URL: https://github.com/apache/incubator-hudi/pull/1602#issuecomment-629005508


   switched to `totalBytesWritten > hoodieWriteConfig.getParquetSmallFileLimit()`. I think this way would have minimal impact and handle this bug.

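For context, a sketch of the guard being discussed; the method shape and variable names are illustrative rather than the PR's exact code, though `getParquetSmallFileLimit()` and `getCopyOnWriteRecordSizeEstimate()` are HoodieWriteConfig getters:

```java
import org.apache.hudi.config.HoodieWriteConfig;

public class RecordSizeEstimate {
  // Only trust a commit's write stats for record-size estimation once it has
  // written more bytes than the small-file limit; otherwise fall back to the
  // configured estimate. This keeps one tiny commit from skewing file sizing.
  static long averageBytesPerRecord(long totalBytesWritten, long totalRecordsWritten,
                                    HoodieWriteConfig config) {
    if (totalBytesWritten > config.getParquetSmallFileLimit() && totalRecordsWritten > 0) {
      return totalBytesWritten / totalRecordsWritten;
    }
    return config.getCopyOnWriteRecordSizeEstimate();
  }
}
```
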






[GitHub] [incubator-hudi] nsivabalan commented on a change in pull request #1409: [HUDI-714]Add javadoc and comments to hudi write method link

2020-05-14 Thread GitBox


nsivabalan commented on a change in pull request #1409:
URL: https://github.com/apache/incubator-hudi/pull/1409#discussion_r425540622



##
File path: hudi-spark/src/main/java/org/apache/hudi/DataSourceUtils.java
##
@@ -241,6 +241,13 @@ public static HoodieRecord 
createHoodieRecord(GenericRecord gr, Comparable order
 return new HoodieRecord<>(hKey, payload);
   }
 
+  /**
+   * Drop duplicate records from incoming records.

Review comment:
   actually, my bad. we should word it differently: "Drop records already present in the dataset". "Drop duplicates" might give a wrong notion. 









[GitHub] [incubator-hudi] nsivabalan commented on pull request #1402: [HUDI-407] Adding Simple Index

2020-05-14 Thread GitBox


nsivabalan commented on pull request #1402:
URL: https://github.com/apache/incubator-hudi/pull/1402#issuecomment-628997531


   @vinothchandar : tests are passing. Let me know if you want me to squash all commits.







[GitHub] [incubator-hudi] yanghua commented on pull request #1558: [HUDI-796]: added deduping logic for upserts case

2020-05-14 Thread GitBox


yanghua commented on pull request #1558:
URL: https://github.com/apache/incubator-hudi/pull/1558#issuecomment-628985363


   > I will let @yanghua see this home
   
   OK, and @pratyakshsharma first of all, please fix all the conflicting files.







[GitHub] [incubator-hudi] dengziming commented on pull request #1151: [HUDI-476] Add hudi-examples module

2020-05-14 Thread GitBox


dengziming commented on pull request #1151:
URL: https://github.com/apache/incubator-hudi/pull/1151#issuecomment-628968945


   @vinothchandar sorry, a little busy these days. I will address your comments in a few days.







[GitHub] [incubator-hudi] leesf commented on pull request #1622: [HUDI-888] fix NullPointerException

2020-05-14 Thread GitBox


leesf commented on pull request #1622:
URL: https://github.com/apache/incubator-hudi/pull/1622#issuecomment-628966948


   @rolandjohann Would you please check why Travis is red?







[GitHub] [incubator-hudi] codecov-io commented on pull request #1616: [HUDI-786] Fixing read beyond inline length in InlineFS

2020-05-14 Thread GitBox


codecov-io commented on pull request #1616:
URL: https://github.com/apache/incubator-hudi/pull/1616#issuecomment-628960229


   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1616?src=pr=h1) 
Report
   > Merging 
[#1616](https://codecov.io/gh/apache/incubator-hudi/pull/1616?src=pr=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/fa6aba751d8de16d9d109a8cfc21150b17b59cff=desc)
 will **increase** coverage by `0.01%`.
   > The diff coverage is `72.72%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1616/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1616?src=pr=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##           master    #1616       +/-   ##
   ============================================
   + Coverage     71.78%   71.80%    +0.01%
   - Complexity     1087     1089        +2
   ============================================
     Files           385      385
     Lines         16575    16611       +36
     Branches       1668     1669        +1
   ============================================
   + Hits          11899    11928       +29
   - Misses         3947     3952        +5
   - Partials        729      731        +2
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-hudi/pull/1616?src=pr=tree) | 
Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...hudi/common/fs/inline/InLineFsDataInputStream.java](https://codecov.io/gh/apache/incubator-hudi/pull/1616/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL2lubGluZS9JbkxpbmVGc0RhdGFJbnB1dFN0cmVhbS5qYXZh)
 | `61.76% <72.72%> (+23.30%)` | `0.00 <0.00> (ø)` | |
   | 
[...ache/hudi/common/fs/inline/InMemoryFileSystem.java](https://codecov.io/gh/apache/incubator-hudi/pull/1616/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL2lubGluZS9Jbk1lbW9yeUZpbGVTeXN0ZW0uamF2YQ==)
 | `79.31% <0.00%> (-10.35%)` | `0.00% <0.00%> (ø%)` | |
   | 
[...src/main/java/org/apache/hudi/DataSourceUtils.java](https://codecov.io/gh/apache/incubator-hudi/pull/1616/diff?src=pr=tree#diff-aHVkaS1zcGFyay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9EYXRhU291cmNlVXRpbHMuamF2YQ==)
 | `55.55% <0.00%> (-1.15%)` | `0.00% <0.00%> (ø%)` | |
   | 
[...i/utilities/keygen/TimestampBasedKeyGenerator.java](https://codecov.io/gh/apache/incubator-hudi/pull/1616/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2tleWdlbi9UaW1lc3RhbXBCYXNlZEtleUdlbmVyYXRvci5qYXZh)
 | `58.82% <0.00%> (-0.16%)` | `7.00% <0.00%> (+2.00%)` | :arrow_down: |
   | 
[...c/main/java/org/apache/hudi/table/HoodieTable.java](https://codecov.io/gh/apache/incubator-hudi/pull/1616/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdGFibGUvSG9vZGllVGFibGUuamF2YQ==)
 | `83.46% <0.00%> (-0.13%)` | `22.00% <0.00%> (ø%)` | |
   | 
[...n/scala/org/apache/hudi/HoodieSparkSqlWriter.scala](https://codecov.io/gh/apache/incubator-hudi/pull/1616/diff?src=pr=tree#diff-aHVkaS1zcGFyay9zcmMvbWFpbi9zY2FsYS9vcmcvYXBhY2hlL2h1ZGkvSG9vZGllU3BhcmtTcWxXcml0ZXIuc2NhbGE=)
 | `53.29% <0.00%> (-0.04%)` | `0.00% <0.00%> (ø%)` | |
   | 
[.../java/org/apache/hudi/client/HoodieReadClient.java](https://codecov.io/gh/apache/incubator-hudi/pull/1616/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpZW50L0hvb2RpZVJlYWRDbGllbnQuamF2YQ==)
 | `100.00% <0.00%> (ø)` | `0.00% <0.00%> (ø%)` | |
   | 
[...java/org/apache/hudi/client/utils/ClientUtils.java](https://codecov.io/gh/apache/incubator-hudi/pull/1616/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpZW50L3V0aWxzL0NsaWVudFV0aWxzLmphdmE=)
 | `75.00% <0.00%> (ø)` | `1.00% <0.00%> (ø%)` | |
   | 
[...java/org/apache/hudi/common/fs/StorageSchemes.java](https://codecov.io/gh/apache/incubator-hudi/pull/1616/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL1N0b3JhZ2VTY2hlbWVzLmphdmE=)
 | `100.00% <0.00%> (ø)` | `4.00% <0.00%> (ø%)` | |
   | 
[.../org/apache/hudi/table/HoodieCopyOnWriteTable.java](https://codecov.io/gh/apache/incubator-hudi/pull/1616/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdGFibGUvSG9vZGllQ29weU9uV3JpdGVUYWJsZS5qYXZh)
 | `62.06% <0.00%> (ø)` | `4.00% <0.00%> (ø%)` | |
   | ... and [7 
more](https://codecov.io/gh/apache/incubator-hudi/pull/1616/diff?src=pr=tree-more)
 | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1616?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1616?src=pr=footer).
 Last update 

[GitHub] [incubator-hudi] v3nkatesh commented on pull request #1484: [HUDI-316] : Hbase qps repartition writestatus

2020-05-14 Thread GitBox


v3nkatesh commented on pull request #1484:
URL: https://github.com/apache/incubator-hudi/pull/1484#issuecomment-628953098


   > @v3nkatesh There are a couple of pending comments from @satishkotha. If you can finish up with those, we can merge this PR with the condition that we need to add to the LICENSE file because of the copied code in the rate limiter here -> https://github.com/apache/incubator-hudi/blob/master/LICENSE
   
   @n3nash I have replied to the remaining comments. For the RateLimiter class, can you check if the new refactoring is good enough to skip the LICENSE part?







[GitHub] [incubator-hudi] v3nkatesh commented on a change in pull request #1484: [HUDI-316] : Hbase qps repartition writestatus

2020-05-14 Thread GitBox


v3nkatesh commented on a change in pull request #1484:
URL: https://github.com/apache/incubator-hudi/pull/1484#discussion_r425498242



##
File path: hudi-client/src/main/java/org/apache/hudi/index/hbase/HBaseIndex.java
##
@@ -83,13 +88,14 @@
   private static final byte[] COMMIT_TS_COLUMN = Bytes.toBytes("commit_ts");
   private static final byte[] FILE_NAME_COLUMN = Bytes.toBytes("file_name");
   private static final byte[] PARTITION_PATH_COLUMN = Bytes.toBytes("partition_path");
-  private static final int SLEEP_TIME_MILLISECONDS = 100;
 
   private static final Logger LOG = LogManager.getLogger(HBaseIndex.class);
   private static Connection hbaseConnection = null;
   private HBaseIndexQPSResourceAllocator hBaseIndexQPSResourceAllocator = null;
-  private float qpsFraction;
   private int maxQpsPerRegionServer;
+  private long totalNumInserts;
+  private int numWriteStatusWithInserts;

Review comment:
   These 2 are actually used in multiple places inside the class, so I did not refactor them. It's possible to move them inside methods, but that would mean re-calculating.









[GitHub] [incubator-hudi] v3nkatesh commented on a change in pull request #1484: [HUDI-316] : Hbase qps repartition writestatus

2020-05-14 Thread GitBox


v3nkatesh commented on a change in pull request #1484:
URL: https://github.com/apache/incubator-hudi/pull/1484#discussion_r425496973



##
File path: hudi-client/src/main/java/org/apache/hudi/index/hbase/HBaseIndex.java
##
@@ -322,66 +347,94 @@ private boolean checkIfValidCommit(HoodieTableMetaClient metaClient, String comm
   /**
    * Helper method to facilitate performing mutations (including puts and deletes) in Hbase.
    */
-  private void doMutations(BufferedMutator mutator, List<Mutation> mutations) throws IOException {
+  private void doMutations(BufferedMutator mutator, List<Mutation> mutations, RateLimiter limiter) throws IOException {
     if (mutations.isEmpty()) {
       return;
     }
+    // report number of operations to account per second with rate limiter.
+    // If #limiter.getRate() operations are acquired within 1 second, ratelimiter will limit the rest of calls
+    // for within that second
+    limiter.acquire(mutations.size());
     mutator.mutate(mutations);
     mutator.flush();
     mutations.clear();

Review comment:
   I think we synced offline on this, but let me try to summarize: a token will be acquired only after the previous batch is done, so it won't flood the system or over-utilize the cluster. The side effect, though, is HBase operations running slower than intended. Yes, metrics on the operation will be useful; I will create a follow-up ticket.

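For readers without the diff open, the throttling pattern under discussion looks roughly like this; the sketch substitutes Guava's `RateLimiter` for the PR's copied implementation (the source of the LICENSE discussion above), and the QPS figure is a placeholder:

```java
import com.google.common.util.concurrent.RateLimiter;
import org.apache.hadoop.hbase.client.BufferedMutator;
import org.apache.hadoop.hbase.client.Mutation;

import java.io.IOException;
import java.util.List;

class ThrottledMutations {
  // Budget of HBase operations per second; a made-up number for illustration.
  private final RateLimiter limiter = RateLimiter.create(1000);

  void doMutations(BufferedMutator mutator, List<Mutation> mutations) throws IOException {
    if (mutations.isEmpty()) {
      return;
    }
    // Blocks until the batch fits the per-second budget; this is why the only
    // side effect is slower HBase operations, never an unthrottled flood.
    limiter.acquire(mutations.size());
    mutator.mutate(mutations);
    mutator.flush();
    mutations.clear();
  }
}
```
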








[GitHub] [incubator-hudi] nsivabalan commented on a change in pull request #1616: [HUDI-786] Fixing read beyond inline length in InlineFS

2020-05-14 Thread GitBox


nsivabalan commented on a change in pull request #1616:
URL: https://github.com/apache/incubator-hudi/pull/1616#discussion_r425494577



##
File path: hudi-common/src/main/java/org/apache/hudi/common/fs/inline/InLineFsDataInputStream.java
##
@@ -56,24 +56,29 @@ public long getPos() throws IOException {
 
   @Override
   public int read(long position, byte[] buffer, int offset, int length) throws IOException {
+    if ((length - offset) > this.length) {
+      throw new IOException("Attempting to read past inline content");
+    }
     return outerStream.read(startOffset + position, buffer, offset, length);
   }
 
   @Override
   public void readFully(long position, byte[] buffer, int offset, int length) throws IOException {
+    if ((length - offset) > this.length) {
+      throw new IOException("Attempting to read past inline content");
+    }
     outerStream.readFully(startOffset + position, buffer, offset, length);
   }
 
   @Override
   public void readFully(long position, byte[] buffer)
       throws IOException {
-    outerStream.readFully(startOffset + position, buffer, 0, buffer.length);
+    readFully(position, buffer, 0, buffer.length);
   }
 
   @Override
   public boolean seekToNewSource(long targetPos) throws IOException {

Review comment:
   have fixed seek(targetPos). Trying to understand seekToNewSource(long targetPos). The docs say "Seek to the given position on an alternate copy of the data. Returns true if alternate copy is found, false otherwise". I am not sure how to go about this. 
   
   We could check for bounds and return false (if targetPos > length), but what happens in the case where an alternate copy is not found? 
   
   Or I could do something like this: 
   
   ```
   @Override
   public boolean seekToNewSource(long targetPos) throws IOException {
     boolean returnVal = outerStream.seekToNewSource(startOffset + targetPos);
     if (returnVal) {
       if (targetPos > length) {
         returnVal = false;
       }
     }
     return returnVal;
   }
   ```









[GitHub] [incubator-hudi] n3nash commented on a change in pull request #1484: [HUDI-316] : Hbase qps repartition writestatus

2020-05-14 Thread GitBox


n3nash commented on a change in pull request #1484:
URL: https://github.com/apache/incubator-hudi/pull/1484#discussion_r425489735



##
File path: hudi-client/src/main/java/org/apache/hudi/index/hbase/HBaseIndex.java
##
@@ -83,13 +88,14 @@
   private static final byte[] COMMIT_TS_COLUMN = Bytes.toBytes("commit_ts");
   private static final byte[] FILE_NAME_COLUMN = Bytes.toBytes("file_name");
   private static final byte[] PARTITION_PATH_COLUMN = 
Bytes.toBytes("partition_path");
-  private static final int SLEEP_TIME_MILLISECONDS = 100;
 
   private static final Logger LOG = LogManager.getLogger(HBaseIndex.class);
   private static Connection hbaseConnection = null;
   private HBaseIndexQPSResourceAllocator hBaseIndexQPSResourceAllocator = null;
-  private float qpsFraction;
   private int maxQpsPerRegionServer;
+  private long totalNumInserts;
+  private int numWriteStatusWithInserts;

Review comment:
   @v3nkatesh can you address or respond to this comment? 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] garyli1019 commented on pull request #1602: [HUDI-494] fix incorrect record size estimation

2020-05-14 Thread GitBox


garyli1019 commented on pull request #1602:
URL: https://github.com/apache/incubator-hudi/pull/1602#issuecomment-628934924


   > commit1 only wrote 1 record but the parquet file is 20MB 
   
   @vinothchandar Sorry, this example is bad... Let's say an 8MB (2M entries) bloom filter + 200 records produces a 10MB parquet file. If these 200 records are assigned to an existing partition, they will very likely be inserted into an existing file, so no problem. But if they go to a new partition, then this small file is inevitable. 
   
   In the next run, we consider this 10MB file as a small file. We calculate `averageRecordSize = 10MB/200 = 50KB`; let's say we set the max parquet size to 100MB, so `(100MB - 10MB)/50KB = 1800` records are assigned to fill this small file. For other files, each will be assigned 2000 records.
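   
   The arithmetic above as a quick sketch (illustrative values and names, not Hudi's actual partitioner code):
   
   ```java
   long totalBytesWritten = 10L * 1024 * 1024;   // 10MB parquet from commit1
   long recordsWritten = 200;
   long maxFileSizeBytes = 100L * 1024 * 1024;   // max parquet file size
   
   long avgRecordSize = totalBytesWritten / recordsWritten;        // ~50KB, bloom filter included
   long insertsToFillSmallFile =
       (maxFileSizeBytes - totalBytesWritten) / avgRecordSize;     // ~1800 records
   long insertsPerNewFile = maxFileSizeBytes / avgRecordSize;      // ~2000 records
   ```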
   
   Yes, we can never get a pure record size. Even if we deduct the bloom filter size in this case, with the other metadata we still have `(2MB/200) = 10KB`, which will still produce somewhat oversized small files... 
   
   Another idea in my mind:
   
   - if the `totalBytesWritten` is less than the 
`DEFAULT_PARQUET_SMALL_FILE_LIMIT_BYTES`, then skip calculating size from this 
commit.
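   
   Roughly, as a sketch (the accessor and config names here are assumptions based on the commit metadata and config mentioned above):
   
   ```java
   long smallFileLimitBytes = config.getParquetSmallFileLimit(); // DEFAULT_PARQUET_SMALL_FILE_LIMIT_BYTES
   if (commitMetadata.fetchTotalBytesWritten() >= smallFileLimitBytes) {
     // only commits that wrote files beyond the small-file threshold feed the estimate
     avgSize = commitMetadata.fetchTotalBytesWritten() / commitMetadata.fetchTotalRecordsWritten();
   }
   ```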



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] EdwinGuo commented on issue #1630: [SUPPORT] Latest commit does not have any schema in commit metadata

2020-05-14 Thread GitBox


EdwinGuo commented on issue #1630:
URL: https://github.com/apache/incubator-hudi/issues/1630#issuecomment-628934279


   Thanks @bvaradar. Do you think it makes sense for post-0.5.1 versions to support deletes on tables written by 0.5.0 or older? Where is the schema stored for 0.5.0? Is it only inferred from the parquet/avro data? Thanks!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] bvaradar commented on issue #1630: [SUPPORT] Latest commit does not have any schema in commit metadata

2020-05-14 Thread GitBox


bvaradar commented on issue #1630:
URL: https://github.com/apache/incubator-hudi/issues/1630#issuecomment-628918822


   No, I meant the feature of writing schema to commit file. It was added in 
0.5.1. Pre-0.5.1 commit files won't have schema in commit metadata.
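   
   For readers following along, a rough sketch of that check (assuming HoodieCommitMetadata's SCHEMA_KEY constant and getMetadata accessor):
   
   ```java
   // Pre-0.5.1 commits carry no schema entry in the commit metadata,
   // so the lookup comes back null/empty and a data-file fallback is needed.
   String schemaStr = commitMetadata.getMetadata(HoodieCommitMetadata.SCHEMA_KEY);
   boolean hasSchema = schemaStr != null && !schemaStr.isEmpty();
   ```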



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] xushiyan edited a comment on pull request #1514: [WIP] [HUDI-774] Addressing incorrect Spark to Avro schema generation

2020-05-14 Thread GitBox


xushiyan edited a comment on pull request #1514:
URL: https://github.com/apache/incubator-hudi/pull/1514#issuecomment-628915411


   @afilipchik The master codebase has been migrated to JUnit 5. Please kindly rebase and update the usage to JUnit 5 APIs where applicable.
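   
   For anyone doing the same migration, the typical mechanical change is small (illustrative):
   
   ```java
   // JUnit 4:
   //   import org.junit.Test;
   //   import static org.junit.Assert.assertEquals;
   // JUnit 5:
   import org.junit.jupiter.api.Test;
   import static org.junit.jupiter.api.Assertions.assertEquals;
   
   class ExampleMigratedTest {
     @Test
     void behavesAsBefore() {
       assertEquals(4, 2 + 2); // Assertions.* replaces Assert.*
     }
   }
   ```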



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] xushiyan commented on pull request #1514: [WIP] [HUDI-774] Addressing incorrect Spark to Avro schema generation

2020-05-14 Thread GitBox


xushiyan commented on pull request #1514:
URL: https://github.com/apache/incubator-hudi/pull/1514#issuecomment-628915411


   @afilipchik The master codebase has been migrated to JUnit 5. Please kindly upgrade the usage to JUnit 5 APIs.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] vinothchandar edited a comment on pull request #1602: [HUDI-494] fix incorrect record size estimation

2020-05-14 Thread GitBox


vinothchandar edited a comment on pull request #1602:
URL: https://github.com/apache/incubator-hudi/pull/1602#issuecomment-628910318


   @garyli1019 thinking about it, even today without the bloom filters, the parquet size includes additional stats and metadata stored internally.. So, it's never going to be the pure record size, right? 
   
   >commit1 only wrote 1 record but the parquet file is 20MB
   
   This feels like a misconfigured bloom filter.. a 20MB bloom filter is just too much.. Like I mentioned, the dynamic bloom filter approach caps the footer size, so hopefully something like this does not happen.. Also, a future write will treat the 20MB file as a small file only if you explicitly lowered the default (100MB) to less than 20MB, right? Overall, I am saying - this configuration seems to be asking for trouble :P 
   
   cc @nsivabalan who implemented the dynamic bloom filters.. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] vinothchandar commented on pull request #1509: [HUDI-525] lack of insert info in delta_commit inflight

2020-05-14 Thread GitBox


vinothchandar commented on pull request #1509:
URL: https://github.com/apache/incubator-hudi/pull/1509#issuecomment-628902977


   @n3nash is this ready to land 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] vinothchandar closed pull request #1387: [WIP] [HUDI-674] Rename hudi-hadoop-mr-bundle to hudi-hive-bundle

2020-05-14 Thread GitBox


vinothchandar closed pull request #1387:
URL: https://github.com/apache/incubator-hudi/pull/1387


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] vinothchandar commented on pull request #1562: [HUDI-837]: implemented custom deserializer for AvroKafkaSource

2020-05-14 Thread GitBox


vinothchandar commented on pull request #1562:
URL: https://github.com/apache/incubator-hudi/pull/1562#issuecomment-628899382


   @n3nash can you review this and take it home? 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] vinothchandar closed pull request #1253: [WIP] [HUDI-558] Introduce ability to compress bloom filters while storing in parquet

2020-05-14 Thread GitBox


vinothchandar closed pull request #1253:
URL: https://github.com/apache/incubator-hudi/pull/1253


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] vinothchandar commented on pull request #1597: [WIP] Added a MultiFormatTimestampBasedKeyGenerator that allows for multipl…

2020-05-14 Thread GitBox


vinothchandar commented on pull request #1597:
URL: https://github.com/apache/incubator-hudi/pull/1597#issuecomment-628898626


   @pratyakshsharma in the meantime, if you want to absorb this into #1433 , we 
can do that as well. Assuming @allenerb does not mind.. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] vinothchandar commented on pull request #1597: Added a MultiFormatTimestampBasedKeyGenerator that allows for multipl…

2020-05-14 Thread GitBox


vinothchandar commented on pull request #1597:
URL: https://github.com/apache/incubator-hudi/pull/1597#issuecomment-628898234


   Actually, thinking again.. we can take the time we want on this and get this 
into 0.6.0..
   
   Will mark this as WIP and come back to it after 0.5.3 is pushed out :) 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] vinothchandar commented on pull request #1597: Added a MultiFormatTimestampBasedKeyGenerator that allows for multipl…

2020-05-14 Thread GitBox


vinothchandar commented on pull request #1597:
URL: https://github.com/apache/incubator-hudi/pull/1597#issuecomment-628896478


   @allenerb  wondering if you have a JIRA for this already.. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] vinothchandar commented on pull request #1565: [HUDI-73]: implemented vanilla AvroKafkaSource

2020-05-14 Thread GitBox


vinothchandar commented on pull request #1565:
URL: https://github.com/apache/incubator-hudi/pull/1565#issuecomment-628889429


   Overall, this PR is nice in the sense that it lets us read data from Kafka using AVRO, with a fixed schema.. 
   but then, it cannot handle evolutions that well (this is an expected tradeoff, right).. I feel we should understand a little better how people are sending vanilla avro data into kafka today, before we can resolve the open items here.. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1565: [HUDI-73]: implemented vanilla AvroKafkaSource

2020-05-14 Thread GitBox


vinothchandar commented on a change in pull request #1565:
URL: https://github.com/apache/incubator-hudi/pull/1565#discussion_r425432347



##
File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/serde/AbstractHoodieKafkaAvroDeserializer.java
##
@@ -0,0 +1,97 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities.serde;
+
+import org.apache.hudi.utilities.schema.FilebasedSchemaProvider;
+import 
org.apache.hudi.utilities.serde.config.HoodieKafkaAvroDeserializationConfig;
+
+import io.confluent.kafka.serializers.KafkaAvroDeserializerConfig;
+import kafka.utils.VerifiableProperties;
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericDatumReader;
+import org.apache.avro.io.DatumReader;
+import org.apache.avro.io.DecoderFactory;
+import org.apache.avro.specific.SpecificDatumReader;
+import org.apache.kafka.common.errors.SerializationException;
+
+import java.io.IOException;
+import java.nio.ByteBuffer;
+
+public class AbstractHoodieKafkaAvroDeserializer {
+
+  private final DecoderFactory decoderFactory = DecoderFactory.get();
+  private boolean useSpecificAvroReader = false;
+  private Schema sourceSchema;
+
+  public AbstractHoodieKafkaAvroDeserializer(VerifiableProperties properties) {
+this.sourceSchema = new 
Schema.Parser().parse(properties.props().getProperty(FilebasedSchemaProvider.Config.SOURCE_SCHEMA_PROP));
+  }
+
+  protected void configure(HoodieKafkaAvroDeserializationConfig config) {
+useSpecificAvroReader = config
+  .getBoolean(KafkaAvroDeserializerConfig.SPECIFIC_AVRO_READER_CONFIG);
+  }
+
+  protected Object deserialize(byte[] payload) throws SerializationException {
+return deserialize(null, null, payload, sourceSchema);
+  }
+
+  protected Object deserialize(String topic, Boolean isKey, byte[] payload, 
Schema readerSchema) {
+try {
+  ByteBuffer buffer = this.getByteBuffer(payload);
+  int id = buffer.getInt();

Review comment:
   if we are reusing code, we also need to be mindful of updates needed for NOTICE and LICENSE. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1562: [HUDI-837]: implemented custom deserializer for AvroKafkaSource

2020-05-14 Thread GitBox


vinothchandar commented on a change in pull request #1562:
URL: https://github.com/apache/incubator-hudi/pull/1562#discussion_r425429344



##
File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/serde/HoodieAvroKafkaDeserializer.java
##
@@ -0,0 +1,78 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities.sources.serde;
+
+import org.apache.hudi.common.config.TypedProperties;
+import org.apache.hudi.utilities.UtilHelpers;
+import org.apache.hudi.utilities.schema.SchemaProvider;
+
+import io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer;
+import io.confluent.kafka.serializers.KafkaAvroDeserializerConfig;
+import kafka.serializer.Decoder;
+import kafka.utils.VerifiableProperties;
+import org.apache.avro.Schema;
+import org.apache.kafka.common.errors.SerializationException;
+
+import java.io.IOException;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Properties;
+
+/**
+ * This is a custom implementation of kafka.serializer.Decoder which aims 
at deserializing all the incoming messages
+ * with same schema (which is latest).
+ */
+public class HoodieAvroKafkaDeserializer extends AbstractKafkaAvroDeserializer 
implements Decoder {

Review comment:
   the source itself has to be renamed then.. let's keep this class named 
consistently with the source class in Hudi and that should be fine.. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] vinothchandar commented on pull request #1562: [HUDI-837]: implemented custom deserializer for AvroKafkaSource

2020-05-14 Thread GitBox


vinothchandar commented on pull request #1562:
URL: https://github.com/apache/incubator-hudi/pull/1562#issuecomment-628885456


   > Not sure of how to mock the same here since it is library class.
   
   We can just mock the response it will send into a test SchemaProvider.. We 
need not mock SchemaRegistry itself 
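   
   In other words, a test can stub the provider directly, roughly like this (a sketch; assumes SchemaProvider's (props, jssc) constructor shape):
   
   ```java
   import org.apache.avro.Schema;
   import org.apache.hudi.common.config.TypedProperties;
   import org.apache.hudi.utilities.schema.SchemaProvider;
   
   // Illustrative stub: returns a canned schema, no schema registry involved.
   public class FixedSchemaProvider extends SchemaProvider {
     private final Schema schema;
   
     public FixedSchemaProvider(Schema schema) {
       super(new TypedProperties(), null); // no Spark context needed for a stub
       this.schema = schema;
     }
   
     @Override
     public Schema getSourceSchema() {
       return schema;
     }
   }
   ```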



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1562: [HUDI-837]: implemented custom deserializer for AvroKafkaSource

2020-05-14 Thread GitBox


vinothchandar commented on a change in pull request #1562:
URL: https://github.com/apache/incubator-hudi/pull/1562#discussion_r425421081



##
File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/AvroKafkaSource.java
##
@@ -45,11 +46,14 @@
 
   private final KafkaOffsetGen offsetGen;
 
+  private final String useCustomDeserializerProp = 
"hoodie.deltastreamer.kafka.custom.avro.deserializer";

Review comment:
   static string, all caps variable name? 
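   
   i.e., something along these lines (illustrative):
   
   ```java
   private static final String USE_CUSTOM_DESERIALIZER_PROP =
       "hoodie.deltastreamer.kafka.custom.avro.deserializer";
   ```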

##
File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/UtilHelpers.java
##
@@ -106,6 +106,16 @@ public static SchemaProvider createSchemaProvider(String 
schemaProviderClass, Ty
 }
   }
 
+  public static SchemaProvider createSchemaProvider(String schemaProviderClass,
+TypedProperties cfg) 
throws IOException {
+try {
+  return schemaProviderClass == null ? null :
+(SchemaProvider) ReflectionUtils.loadClass(schemaProviderClass, cfg);
+} catch (Throwable e) {
+  throw new IOException("Could not load schema provider class " + 
schemaProviderClass, e);

Review comment:
   throw HoodieException instead? 
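   
   Concretely, the suggested tweak would look roughly like (a sketch based on the diff above):
   
   ```java
   public static SchemaProvider createSchemaProvider(String schemaProviderClass,
       TypedProperties cfg) {
     try {
       return schemaProviderClass == null ? null
           : (SchemaProvider) ReflectionUtils.loadClass(schemaProviderClass, cfg);
     } catch (Throwable e) {
       throw new HoodieException("Could not load schema provider class " + schemaProviderClass, e);
     }
   }
   ```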

##
File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/AvroKafkaSource.java
##
@@ -23,6 +23,7 @@
 import org.apache.hudi.utilities.schema.SchemaProvider;
 import org.apache.hudi.utilities.sources.helpers.KafkaOffsetGen;
 import 
org.apache.hudi.utilities.sources.helpers.KafkaOffsetGen.CheckpointUtils;
+import org.apache.hudi.utilities.sources.serde.HoodieAvroKafkaDeserializer;
 
 import io.confluent.kafka.serializers.KafkaAvroDeserializer;

Review comment:
   @pratyakshsharma I thought this is what this deserializer does. no? 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] vinothchandar commented on pull request #1433: [HUDI-728]: Implement custom key generator

2020-05-14 Thread GitBox


vinothchandar commented on pull request #1433:
URL: https://github.com/apache/incubator-hudi/pull/1433#issuecomment-628873883


   @nsivabalan can you shepherd this one home from here? 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] vinothchandar merged pull request #1541: [HUDI-843] Add ability to specify time unit for TimestampBasedKeyGenerator

2020-05-14 Thread GitBox


vinothchandar merged pull request #1541:
URL: https://github.com/apache/incubator-hudi/pull/1541


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[incubator-hudi] branch master updated: [HUDI-843] Add ability to specify time unit for TimestampBasedKeyGenerator (#1541)

2020-05-14 Thread vinoth
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new f094f42  [HUDI-843] Add ability to specify time unit for  
TimestampBasedKeyGenerator (#1541)
f094f42 is described below

commit f094f4285782e1253b52fef788fa7461c67d95a7
Author: Alexander Filipchik 
AuthorDate: Thu May 14 13:37:59 2020 -0700

[HUDI-843] Add ability to specify time unit for  TimestampBasedKeyGenerator 
(#1541)



Co-authored-by: Alex Filipchik 
Co-authored-by: Vinoth Chandar 
---
 .../keygen/TimestampBasedKeyGenerator.java | 47 ++
 .../keygen/TestTimestampBasedKeyGenerator.java | 26 +---
 2 files changed, 60 insertions(+), 13 deletions(-)

diff --git 
a/hudi-utilities/src/main/java/org/apache/hudi/utilities/keygen/TimestampBasedKeyGenerator.java
 
b/hudi-utilities/src/main/java/org/apache/hudi/utilities/keygen/TimestampBasedKeyGenerator.java
index 919a2ef..e5bdc64 100644
--- 
a/hudi-utilities/src/main/java/org/apache/hudi/utilities/keygen/TimestampBasedKeyGenerator.java
+++ 
b/hudi-utilities/src/main/java/org/apache/hudi/utilities/keygen/TimestampBasedKeyGenerator.java
@@ -35,6 +35,10 @@ import java.util.Arrays;
 import java.util.Collections;
 import java.util.Date;
 import java.util.TimeZone;
+import java.util.concurrent.TimeUnit;
+
+import static java.util.concurrent.TimeUnit.MILLISECONDS;
+import static java.util.concurrent.TimeUnit.SECONDS;
 
 /**
  * Key generator, that relies on timestamps for partitioning field. Still 
picks record key by name.
@@ -42,9 +46,11 @@ import java.util.TimeZone;
 public class TimestampBasedKeyGenerator extends SimpleKeyGenerator {
 
   enum TimestampType implements Serializable {
-UNIX_TIMESTAMP, DATE_STRING, MIXED, EPOCHMILLISECONDS
+UNIX_TIMESTAMP, DATE_STRING, MIXED, EPOCHMILLISECONDS, SCALAR
   }
 
+  private final TimeUnit timeUnit;
+
   private final TimestampType timestampType;
 
   private SimpleDateFormat inputDateFormat;
@@ -62,6 +68,8 @@ public class TimestampBasedKeyGenerator extends 
SimpleKeyGenerator {
 
 // One value from TimestampType above
 private static final String TIMESTAMP_TYPE_FIELD_PROP = 
"hoodie.deltastreamer.keygen.timebased.timestamp.type";
+private static final String INPUT_TIME_UNIT =
+"hoodie.deltastreamer.keygen.timebased.timestamp.scalar.time.unit";
 private static final String TIMESTAMP_INPUT_DATE_FORMAT_PROP =
 "hoodie.deltastreamer.keygen.timebased.input.dateformat";
 private static final String TIMESTAMP_OUTPUT_DATE_FORMAT_PROP =
@@ -84,6 +92,21 @@ public class TimestampBasedKeyGenerator extends 
SimpleKeyGenerator {
   this.inputDateFormat = new 
SimpleDateFormat(config.getString(Config.TIMESTAMP_INPUT_DATE_FORMAT_PROP));
   this.inputDateFormat.setTimeZone(timeZone);
 }
+
+switch (this.timestampType) {
+  case EPOCHMILLISECONDS:
+timeUnit = MILLISECONDS;
+break;
+  case UNIX_TIMESTAMP:
+timeUnit = SECONDS;
+break;
+  case SCALAR:
+String timeUnitStr = config.getString(Config.INPUT_TIME_UNIT, 
TimeUnit.SECONDS.toString());
+timeUnit = TimeUnit.valueOf(timeUnitStr.toUpperCase());
+break;
+  default:
+timeUnit = null;
+}
   }
 
   @Override
@@ -96,21 +119,20 @@ public class TimestampBasedKeyGenerator extends 
SimpleKeyGenerator {
 partitionPathFormat.setTimeZone(timeZone);
 
 try {
-  long unixTime;
+  long timeMs;
   if (partitionVal instanceof Double) {
-unixTime = ((Double) partitionVal).longValue();
+timeMs = convertLongTimeToMillis(((Double) partitionVal).longValue());
   } else if (partitionVal instanceof Float) {
-unixTime = ((Float) partitionVal).longValue();
+timeMs = convertLongTimeToMillis(((Float) partitionVal).longValue());
   } else if (partitionVal instanceof Long) {
-unixTime = (Long) partitionVal;
+timeMs = convertLongTimeToMillis((Long) partitionVal);
   } else if (partitionVal instanceof CharSequence) {
-unixTime = inputDateFormat.parse(partitionVal.toString()).getTime() / 
1000;
+timeMs = inputDateFormat.parse(partitionVal.toString()).getTime();
   } else {
 throw new HoodieNotSupportedException(
   "Unexpected type for partition field: " + 
partitionVal.getClass().getName());
   }
-  Date timestamp = this.timestampType == TimestampType.EPOCHMILLISECONDS ? 
new Date(unixTime) : new Date(unixTime * 1000);
-
+  Date timestamp = new Date(timeMs);
   String recordKey = DataSourceUtils.getNestedFieldValAsString(record, 
recordKeyField, true);
   if (recordKey == null || recordKey.isEmpty()) {
 throw new HoodieKeyException("recordKey value: \"" + recordKey + "\" 
for field: \"" + recordKeyField + "\" 
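
For context, a minimal sketch of how the new SCALAR type might be configured, using the property keys visible in the diff above (the surrounding setup is illustrative):

```java
import java.util.Properties;

Properties props = new Properties();
// One value from the TimestampType enum; SCALAR is new in this commit.
props.setProperty("hoodie.deltastreamer.keygen.timebased.timestamp.type", "SCALAR");
// Time unit applied to SCALAR values; per the diff, defaults to SECONDS.
props.setProperty("hoodie.deltastreamer.keygen.timebased.timestamp.scalar.time.unit", "MILLISECONDS");
```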

[GitHub] [incubator-hudi] vinothchandar commented on pull request #1433: [HUDI-728]: Implement custom key generator

2020-05-14 Thread GitBox


vinothchandar commented on pull request #1433:
URL: https://github.com/apache/incubator-hudi/pull/1433#issuecomment-628873197


   @pratyakshsharma Rebased and removed the parquet files etc.. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] vinothchandar commented on pull request #1597: Added a MultiFormatTimestampBasedKeyGenerator that allows for multipl…

2020-05-14 Thread GitBox


vinothchandar commented on pull request #1597:
URL: https://github.com/apache/incubator-hudi/pull/1597#issuecomment-628868262


   Hi @allenerb sg.. Will do some minor changes and try to get this landed. 
Will file a follow up JIRA.. which you or @pratyakshsharma or someone can take 
up .. 
   
   Let us know if you need a hand with getting Hudi working as well.. We can 
also chat on Slack to setup a time to meet.. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (HUDI-900) Metadata Bootstrap Key Generator needs to handle complex keys correctly

2020-05-14 Thread Balaji Varadarajan (Jira)
Balaji Varadarajan created HUDI-900:
---

 Summary: Metadata Bootstrap Key Generator needs to handle complex 
keys correctly
 Key: HUDI-900
 URL: https://issues.apache.org/jira/browse/HUDI-900
 Project: Apache Hudi (incubating)
  Issue Type: Sub-task
  Components: Writer Core
Reporter: Balaji Varadarajan
 Fix For: 0.6.0


Look at ComplexKeyGenerator. Make sure MetadataBootstrap is of same format.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HUDI-900) Metadata Bootstrap Key Generator needs to handle complex keys correctly

2020-05-14 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan reassigned HUDI-900:
---

Assignee: Balaji Varadarajan

> Metadata Bootstrap Key Generator needs to handle complex keys correctly
> ---
>
> Key: HUDI-900
> URL: https://issues.apache.org/jira/browse/HUDI-900
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Writer Core
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Major
> Fix For: 0.6.0
>
>
> Look at ComplexKeyGenerator. Make sure MetadataBootstrap is of same format.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-900) Metadata Bootstrap Key Generator needs to handle complex keys correctly

2020-05-14 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan updated HUDI-900:

Status: Open  (was: New)

> Metadata Bootstrap Key Generator needs to handle complex keys correctly
> ---
>
> Key: HUDI-900
> URL: https://issues.apache.org/jira/browse/HUDI-900
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Writer Core
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Major
> Fix For: 0.6.0
>
>
> Look at ComplexKeyGenerator. Make sure MetadataBootstrap is of same format.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] codecov-io edited a comment on pull request #1612: [HUDI-528] Handle empty commit in incremental pulling

2020-05-14 Thread GitBox


codecov-io edited a comment on pull request #1612:
URL: https://github.com/apache/incubator-hudi/pull/1612#issuecomment-626417448


   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1612?src=pr=h1) 
Report
   > Merging 
[#1612](https://codecov.io/gh/apache/incubator-hudi/pull/1612?src=pr=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/3a2fe13fcb7c168f8ff023e3bdb6ae482b400316=desc)
 will **increase** coverage by `0.03%`.
   > The diff coverage is `100.00%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1612/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1612?src=pr=tree)
   
   ```diff
   @@ Coverage Diff  @@
   ## master#1612  +/-   ##
   
   + Coverage 71.77%   71.81%   +0.03% 
 Complexity 1087 1087  
   
 Files   385  385  
 Lines 1659116587   -4 
 Branches   1669 1668   -1 
   
   + Hits  1190911912   +3 
   + Misses 3953 3949   -4 
   + Partials729  726   -3 
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-hudi/pull/1612?src=pr=tree) | 
Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...in/scala/org/apache/hudi/IncrementalRelation.scala](https://codecov.io/gh/apache/incubator-hudi/pull/1612/diff?src=pr=tree#diff-aHVkaS1zcGFyay9zcmMvbWFpbi9zY2FsYS9vcmcvYXBhY2hlL2h1ZGkvSW5jcmVtZW50YWxSZWxhdGlvbi5zY2FsYQ==)
 | `72.41% <100.00%> (-0.17%)` | `0.00 <0.00> (ø)` | |
   | 
[...java/org/apache/hudi/common/util/ParquetUtils.java](https://codecov.io/gh/apache/incubator-hudi/pull/1612/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvUGFycXVldFV0aWxzLmphdmE=)
 | `73.68% <0.00%> (-2.64%)` | `0.00% <0.00%> (ø%)` | |
   | 
[...che/hudi/common/util/BufferedRandomAccessFile.java](https://codecov.io/gh/apache/incubator-hudi/pull/1612/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvQnVmZmVyZWRSYW5kb21BY2Nlc3NGaWxlLmphdmE=)
 | `55.26% <0.00%> (+0.87%)` | `0.00% <0.00%> (ø%)` | |
   | 
[...n/scala/org/apache/hudi/HoodieSparkSqlWriter.scala](https://codecov.io/gh/apache/incubator-hudi/pull/1612/diff?src=pr=tree#diff-aHVkaS1zcGFyay9zcmMvbWFpbi9zY2FsYS9vcmcvYXBhY2hlL2h1ZGkvSG9vZGllU3BhcmtTcWxXcml0ZXIuc2NhbGE=)
 | `55.08% <0.00%> (+1.79%)` | `0.00% <0.00%> (ø%)` | |
   | 
[...in/scala/org/apache/hudi/AvroConversionUtils.scala](https://codecov.io/gh/apache/incubator-hudi/pull/1612/diff?src=pr=tree#diff-aHVkaS1zcGFyay9zcmMvbWFpbi9zY2FsYS9vcmcvYXBhY2hlL2h1ZGkvQXZyb0NvbnZlcnNpb25VdGlscy5zY2FsYQ==)
 | `58.33% <0.00%> (+4.16%)` | `0.00% <0.00%> (ø%)` | |
   | 
[...ache/hudi/common/fs/inline/InMemoryFileSystem.java](https://codecov.io/gh/apache/incubator-hudi/pull/1612/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL2lubGluZS9Jbk1lbW9yeUZpbGVTeXN0ZW0uamF2YQ==)
 | `89.65% <0.00%> (+10.34%)` | `0.00% <0.00%> (ø%)` | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1612?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1612?src=pr=footer).
 Last update 
[3a2fe13...f1dde61](https://codecov.io/gh/apache/incubator-hudi/pull/1612?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] codecov-io edited a comment on pull request #1612: [HUDI-528] Handle empty commit in incremental pulling

2020-05-14 Thread GitBox


codecov-io edited a comment on pull request #1612:
URL: https://github.com/apache/incubator-hudi/pull/1612#issuecomment-626417448


   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1612?src=pr=h1) 
Report
   > Merging 
[#1612](https://codecov.io/gh/apache/incubator-hudi/pull/1612?src=pr=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/3a2fe13fcb7c168f8ff023e3bdb6ae482b400316=desc)
 will **increase** coverage by `0.01%`.
   > The diff coverage is `100.00%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1612/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1612?src=pr=tree)
   
   ```diff
   @@ Coverage Diff  @@
   ## master#1612  +/-   ##
   
   + Coverage 71.77%   71.79%   +0.01% 
 Complexity 1087 1087  
   
 Files   385  385  
 Lines 1659116588   -3 
 Branches   1669 1668   -1 
   
   + Hits  1190911910   +1 
   + Misses 3953 3952   -1 
   + Partials729  726   -3 
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-hudi/pull/1612?src=pr=tree) | 
Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...in/scala/org/apache/hudi/IncrementalRelation.scala](https://codecov.io/gh/apache/incubator-hudi/pull/1612/diff?src=pr=tree#diff-aHVkaS1zcGFyay9zcmMvbWFpbi9zY2FsYS9vcmcvYXBhY2hlL2h1ZGkvSW5jcmVtZW50YWxSZWxhdGlvbi5zY2FsYQ==)
 | `72.88% <100.00%> (+0.30%)` | `0.00 <0.00> (ø)` | |
   | 
[...java/org/apache/hudi/common/util/ParquetUtils.java](https://codecov.io/gh/apache/incubator-hudi/pull/1612/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvUGFycXVldFV0aWxzLmphdmE=)
 | `73.68% <0.00%> (-2.64%)` | `0.00% <0.00%> (ø%)` | |
   | 
[...che/hudi/common/util/BufferedRandomAccessFile.java](https://codecov.io/gh/apache/incubator-hudi/pull/1612/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvQnVmZmVyZWRSYW5kb21BY2Nlc3NGaWxlLmphdmE=)
 | `55.26% <0.00%> (+0.87%)` | `0.00% <0.00%> (ø%)` | |
   | 
[...n/scala/org/apache/hudi/HoodieSparkSqlWriter.scala](https://codecov.io/gh/apache/incubator-hudi/pull/1612/diff?src=pr=tree#diff-aHVkaS1zcGFyay9zcmMvbWFpbi9zY2FsYS9vcmcvYXBhY2hlL2h1ZGkvSG9vZGllU3BhcmtTcWxXcml0ZXIuc2NhbGE=)
 | `55.08% <0.00%> (+1.79%)` | `0.00% <0.00%> (ø%)` | |
   | 
[...in/scala/org/apache/hudi/AvroConversionUtils.scala](https://codecov.io/gh/apache/incubator-hudi/pull/1612/diff?src=pr=tree#diff-aHVkaS1zcGFyay9zcmMvbWFpbi9zY2FsYS9vcmcvYXBhY2hlL2h1ZGkvQXZyb0NvbnZlcnNpb25VdGlscy5zY2FsYQ==)
 | `58.33% <0.00%> (+4.16%)` | `0.00% <0.00%> (ø%)` | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1612?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1612?src=pr=footer).
 Last update 
[3a2fe13...f1dde61](https://codecov.io/gh/apache/incubator-hudi/pull/1612?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (HUDI-899) Add a knob to change partition-path style while performing metadata bootstrap

2020-05-14 Thread Balaji Varadarajan (Jira)
Balaji Varadarajan created HUDI-899:
---

 Summary: Add a knob to change partition-path style while 
performing metadata bootstrap
 Key: HUDI-899
 URL: https://issues.apache.org/jira/browse/HUDI-899
 Project: Apache Hudi (incubating)
  Issue Type: Sub-task
  Components: Writer Core
Reporter: Balaji Varadarajan
 Fix For: 0.6.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] allenerb commented on pull request #1597: Added a MultiFormatTimestampBasedKeyGenerator that allows for multipl…

2020-05-14 Thread GitBox


allenerb commented on pull request #1597:
URL: https://github.com/apache/incubator-hudi/pull/1597#issuecomment-628853876


   Hi Vinoth,
   
   Apologies for not getting back to this PR to make changes.  I’ve been
   swamped trying to get Hudi working in the environment (still struggling
   with it).  I don’t mind at all going in and trying to merge the thing, but
   at the moment I’m spending all my time trying to figure this thing out!  :-)
   
   If you’re ok with taking it as-is at the moment, I’d definitely sign up for
   updating it in the near future, but I do think the idea of merging the two
   made a lot of sense.  I also saw your email about trying to get the PR
   backlog wrapped up for release 0.5.3 so I’m good with whatever you decide.
   I don’t want to be your bottleneck right now.
   
   Allen
   
   On Thu, May 14, 2020 at 3:32 PM vinoth chandar wrote:
   
   > @allenerb  Trying to understand the next
   > steps here.. Are we deciding between doing a separate class and merging
   > this into the existing class?
   >
   > I am happy to take the patch as-is and do the consolidation as follow on
   > work as well. let me know what you both think cc @pratyakshsharma



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] xushiyan commented on pull request #1592: [Hudi-69] Spark Datasource for MOR table

2020-05-14 Thread GitBox


xushiyan commented on pull request #1592:
URL: https://github.com/apache/incubator-hudi/pull/1592#issuecomment-628845961


   @garyli1019 it does look very weird... getting an NPE at org.apache.hudi.functional.TestDataSource.testSparkDatasourceForMergeOnRead(TestDataSource.scala:227) means the `basePath` was somehow set to null. But wouldn't it have been null at L222 and L223 of the same method too? The test method passes locally for me as well. I don't have a clue why Travis ran it differently.. but at least try this change for that method:
   
   ```scala
   @Test
   def testSparkDatasourceForMergeOnRead(@TempDir tempDir: java.nio.file.Path): Unit = {
     val basePath = tempDir.toAbsolutePath.toString
     // ... rest of the test body, using this basePath
   }
   ```
   JUnit 5 should give it a new dedicated temp dir.. see if the NPE still persists..



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] vinothchandar commented on pull request #1597: Added a MultiFormatTimestampBasedKeyGenerator that allows for multipl…

2020-05-14 Thread GitBox


vinothchandar commented on pull request #1597:
URL: https://github.com/apache/incubator-hudi/pull/1597#issuecomment-628843862


   @allenerb Trying to understand the next steps here.. Are we deciding between 
doing a separate class and merging this into the existing class? 
   
   I am happy to take the patch as-is and do the consolidation as follow on 
work as well. let me know what you both think cc @pratyakshsharma 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (HUDI-898) Need to add Schema parameter to HoodieRecordPayload::preCombine

2020-05-14 Thread Yixue (Andrew) Zhu (Jira)
Yixue (Andrew) Zhu created HUDI-898:
---

 Summary: Need to add Schema parameter to 
HoodieRecordPayload::preCombine
 Key: HUDI-898
 URL: https://issues.apache.org/jira/browse/HUDI-898
 Project: Apache Hudi (incubating)
  Issue Type: Improvement
  Components: Common Core
Reporter: Yixue (Andrew) Zhu


We are working on Mongo Oplog integration with Hudi, to stream Mongo updates to 
Hudi tables.

There are 4 Mongo OpLog operations we need to handle: CRUD (create, read, update, delete).

Currently Hudi handles create/read and delete, but not update, well with the existing preCombine API in the HoodieRecordPayload class. In particular, the Update operation contains a "patch" field, which is extended JSON describing updates for dot-separated field paths.

We need to pass the Avro schema to the preCombine API for this to work:

The BaseAvroPayload constructor does accept a GenericRecord, which holds an Avro schema reference, but it materializes the GenericRecord to bytes to support serialization/deserialization by ExternalSpillableMap, so the schema is no longer available when preCombine runs.

Is there concern/objection to this? In other words, have I overlooked something?
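
A signature sketch of the proposal (hypothetical, not the current API):

```java
import org.apache.avro.Schema;

// Hypothetical: preCombine gains the table's Avro schema so a payload can
// interpret a partial "patch" update against named fields.
public interface HoodieRecordPayload<T extends HoodieRecordPayload<T>> {
  T preCombine(T oldValue, Schema schema);
}
```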

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] codecov-io edited a comment on pull request #1602: [HUDI-494] fix incorrect record size estimation

2020-05-14 Thread GitBox


codecov-io edited a comment on pull request #1602:
URL: https://github.com/apache/incubator-hudi/pull/1602#issuecomment-628183877


   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1602?src=pr=h1) 
Report
   > Merging 
[#1602](https://codecov.io/gh/apache/incubator-hudi/pull/1602?src=pr=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/32bada29dc95f1d5910713ae6b4f4a4ef39677c9=desc)
 will **increase** coverage by `0.07%`.
   > The diff coverage is `96.15%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1602/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1602?src=pr=tree)
   
   ```diff
   @@ Coverage Diff  @@
   ## master#1602  +/-   ##
   
   + Coverage 71.76%   71.84%   +0.07% 
   + Complexity 1087 1085   -2 
   
 Files   385  385  
 Lines 1658416592   +8 
 Branches   1668 1674   +6 
   
   + Hits  1190211920  +18 
   + Misses 3954 3946   -8 
   + Partials728  726   -2 
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-hudi/pull/1602?src=pr=tree) | 
Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[.../org/apache/hudi/table/HoodieCopyOnWriteTable.java](https://codecov.io/gh/apache/incubator-hudi/pull/1602/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdGFibGUvSG9vZGllQ29weU9uV3JpdGVUYWJsZS5qYXZh)
 | `57.14% <ø> (-4.93%)` | `4.00 <0.00> (ø)` | |
   | 
[...org/apache/hudi/common/bloom/BloomFilterUtils.java](https://codecov.io/gh/apache/incubator-hudi/pull/1602/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2Jsb29tL0Jsb29tRmlsdGVyVXRpbHMuamF2YQ==)
 | `75.00% <0.00%> (ø)` | `3.00 <0.00> (ø)` | |
   | 
[...he/hudi/table/action/commit/UpsertPartitioner.java](https://codecov.io/gh/apache/incubator-hudi/pull/1602/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdGFibGUvYWN0aW9uL2NvbW1pdC9VcHNlcnRQYXJ0aXRpb25lci5qYXZh)
 | `96.07% <100.00%> (+1.11%)` | `16.00 <2.00> (+1.00)` | |
   | 
[...apache/hudi/common/model/HoodieCommitMetadata.java](https://codecov.io/gh/apache/incubator-hudi/pull/1602/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0hvb2RpZUNvbW1pdE1ldGFkYXRhLmphdmE=)
 | `57.65% <100.00%> (+2.61%)` | `27.00 <0.00> (-3.00)` | :arrow_up: |
   | 
[...g/apache/hudi/metrics/InMemoryMetricsReporter.java](https://codecov.io/gh/apache/incubator-hudi/pull/1602/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9Jbk1lbW9yeU1ldHJpY3NSZXBvcnRlci5qYXZh)
 | `40.00% <0.00%> (-40.00%)` | `0.00% <0.00%> (ø%)` | |
   | 
[...ain/java/org/apache/hudi/avro/HoodieAvroUtils.java](https://codecov.io/gh/apache/incubator-hudi/pull/1602/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvYXZyby9Ib29kaWVBdnJvVXRpbHMuamF2YQ==)
 | `80.95% <0.00%> (+1.12%)` | `22.00% <0.00%> (ø%)` | |
   | 
[...java/org/apache/hudi/config/HoodieIndexConfig.java](https://codecov.io/gh/apache/incubator-hudi/pull/1602/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29uZmlnL0hvb2RpZUluZGV4Q29uZmlnLmphdmE=)
 | `69.84% <0.00%> (+6.34%)` | `3.00% <0.00%> (ø%)` | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1602?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1602?src=pr=footer).
 Last update 
[32bada2...2f8cf11](https://codecov.io/gh/apache/incubator-hudi/pull/1602?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] garyli1019 commented on pull request #1592: [Hudi-69] Spark Datasource for MOR table

2020-05-14 Thread GitBox


garyli1019 commented on pull request #1592:
URL: https://github.com/apache/incubator-hudi/pull/1592#issuecomment-628828782


   Hello @xushiyan , may I ask a question regarding the functional testing? I believe you are the expert on this topic in our community :) 
   This PR passed CI locally but failed on Travis with 
`java.lang.NullPointerException at org.apache.hudi.functional.TestDataSource.testSparkDatasourceForMergeOnRead(TestDataSource.scala:227)`. Do you know where I should look to debug this issue?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] codecov-io edited a comment on pull request #1602: [HUDI-494] fix incorrect record size estimation

2020-05-14 Thread GitBox


codecov-io edited a comment on pull request #1602:
URL: https://github.com/apache/incubator-hudi/pull/1602#issuecomment-628183877


   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1602?src=pr=h1) 
Report
   > Merging 
[#1602](https://codecov.io/gh/apache/incubator-hudi/pull/1602?src=pr=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/32bada29dc95f1d5910713ae6b4f4a4ef39677c9=desc)
 will **increase** coverage by `0.07%`.
   > The diff coverage is `96.15%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1602/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1602?src=pr=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #1602      +/-   ##
   ============================================
   + Coverage     71.76%   71.84%   +0.07%
   + Complexity     1087     1085       -2
   ============================================
     Files           385      385
     Lines         16584    16592       +8
     Branches       1668     1674       +6
   ============================================
   + Hits          11902    11920      +18
   + Misses         3954     3946       -8
   + Partials       728      726       -2
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-hudi/pull/1602?src=pr=tree) | 
Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[.../org/apache/hudi/table/HoodieCopyOnWriteTable.java](https://codecov.io/gh/apache/incubator-hudi/pull/1602/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdGFibGUvSG9vZGllQ29weU9uV3JpdGVUYWJsZS5qYXZh)
 | `57.14% <ø> (-4.93%)` | `4.00 <0.00> (ø)` | |
   | 
[...org/apache/hudi/common/bloom/BloomFilterUtils.java](https://codecov.io/gh/apache/incubator-hudi/pull/1602/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2Jsb29tL0Jsb29tRmlsdGVyVXRpbHMuamF2YQ==)
 | `75.00% <0.00%> (ø)` | `3.00 <0.00> (ø)` | |
   | 
[...he/hudi/table/action/commit/UpsertPartitioner.java](https://codecov.io/gh/apache/incubator-hudi/pull/1602/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdGFibGUvYWN0aW9uL2NvbW1pdC9VcHNlcnRQYXJ0aXRpb25lci5qYXZh)
 | `96.07% <100.00%> (+1.11%)` | `16.00 <2.00> (+1.00)` | |
   | 
[...apache/hudi/common/model/HoodieCommitMetadata.java](https://codecov.io/gh/apache/incubator-hudi/pull/1602/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0hvb2RpZUNvbW1pdE1ldGFkYXRhLmphdmE=)
 | `57.65% <100.00%> (+2.61%)` | `27.00 <0.00> (-3.00)` | :arrow_up: |
   | 
[...g/apache/hudi/metrics/InMemoryMetricsReporter.java](https://codecov.io/gh/apache/incubator-hudi/pull/1602/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9Jbk1lbW9yeU1ldHJpY3NSZXBvcnRlci5qYXZh)
 | `40.00% <0.00%> (-40.00%)` | `0.00% <0.00%> (ø%)` | |
   | 
[...ain/java/org/apache/hudi/avro/HoodieAvroUtils.java](https://codecov.io/gh/apache/incubator-hudi/pull/1602/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvYXZyby9Ib29kaWVBdnJvVXRpbHMuamF2YQ==)
 | `80.95% <0.00%> (+1.12%)` | `22.00% <0.00%> (ø%)` | |
   | 
[...java/org/apache/hudi/config/HoodieIndexConfig.java](https://codecov.io/gh/apache/incubator-hudi/pull/1602/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29uZmlnL0hvb2RpZUluZGV4Q29uZmlnLmphdmE=)
 | `69.84% <0.00%> (+6.34%)` | `3.00% <0.00%> (ø%)` | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1602?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1602?src=pr=footer).
 Last update 
[32bada2...2f8cf11](https://codecov.io/gh/apache/incubator-hudi/pull/1602?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] vinothchandar commented on pull request #1584: fix schema provider issue

2020-05-14 Thread GitBox


vinothchandar commented on pull request #1584:
URL: https://github.com/apache/incubator-hudi/pull/1584#issuecomment-628828057


   To reduce context switching, assigning to @bvaradar, who is looking into a 
couple of other PRs around the schema provider 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] vinothchandar commented on pull request #1402: [HUDI-407] Adding Simple Index

2020-05-14 Thread GitBox


vinothchandar commented on pull request #1402:
URL: https://github.com/apache/incubator-hudi/pull/1402#issuecomment-628827254


   @nsivabalan please ping me when the tests are passing.. Will make a final 
pass and land 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] vinothchandar commented on pull request #1596: [HUDI-863] get decimal properties from derived spark DataType

2020-05-14 Thread GitBox


vinothchandar commented on pull request #1596:
URL: https://github.com/apache/incubator-hudi/pull/1596#issuecomment-628821960


   Added you as a contributor on JIRA.. So you should be able to claim those 
now! Let us know if you still face issues.. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-864) parquet schema conflict: optional binary (UTF8) is not a group

2020-05-14 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17107561#comment-17107561
 ] 

Vinoth Chandar commented on HUDI-864:
-

oops.. slipped past my radar.. 

I prefer not to get into shading parquet.. The way we have it right now, we just 
use the parquet version on the engine, which makes it easy for users to 
troubleshoot issues based on what applies to their engine.. 

1 & 2 seem good to me. 

> parquet schema conflict: optional binary  (UTF8) is not a group
> ---
>
> Key: HUDI-864
> URL: https://issues.apache.org/jira/browse/HUDI-864
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>Reporter: Roland Johann
>Priority: Major
>  Labels: bug-bash-0.6.0
>
> When dealing with struct types like this
> {code:json}
> {
>   "type": "struct",
>   "fields": [
> {
>   "name": "categoryResults",
>   "type": {
> "type": "array",
> "elementType": {
>   "type": "struct",
>   "fields": [
> {
>   "name": "categoryId",
>   "type": "string",
>   "nullable": true,
>   "metadata": {}
> }
>   ]
> },
> "containsNull": true
>   },
>   "nullable": true,
>   "metadata": {}
> }
>   ]
> }
> {code}
> The second ingest batch throws that exception:
> {code}
> ERROR [Executor task launch worker for task 15] 
> commit.BaseCommitActionExecutor (BaseCommitActionExecutor.java:264) - Error 
> upserting bucketType UPDATE for partition :0
> org.apache.hudi.exception.HoodieException: 
> org.apache.hudi.exception.HoodieException: 
> java.util.concurrent.ExecutionException: 
> org.apache.hudi.exception.HoodieException: operation has failed
>   at 
> org.apache.hudi.table.action.commit.CommitActionExecutor.handleUpdateInternal(CommitActionExecutor.java:100)
>   at 
> org.apache.hudi.table.action.commit.CommitActionExecutor.handleUpdate(CommitActionExecutor.java:76)
>   at 
> org.apache.hudi.table.action.deltacommit.DeltaCommitActionExecutor.handleUpdate(DeltaCommitActionExecutor.java:73)
>   at 
> org.apache.hudi.table.action.commit.BaseCommitActionExecutor.handleUpsertPartition(BaseCommitActionExecutor.java:258)
>   at 
> org.apache.hudi.table.action.commit.BaseCommitActionExecutor.handleInsertPartition(BaseCommitActionExecutor.java:271)
>   at 
> org.apache.hudi.table.action.commit.BaseCommitActionExecutor.lambda$execute$caffe4c4$1(BaseCommitActionExecutor.java:104)
>   at 
> org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
>   at 
> org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:853)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:853)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>   at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:337)
>   at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:335)
>   at 
> org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1182)
>   at 
> org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1156)
>   at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091)
>   at 
> org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156)
>   at 
> org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:882)
>   at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>   at org.apache.spark.scheduler.Task.run(Task.scala:123)
>   at 
> org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
>   at 
> 

[jira] [Updated] (HUDI-864) parquet schema conflict: optional binary (UTF8) is not a group

2020-05-14 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-864:

Labels:   (was: bug-bash-0.6.0)

> parquet schema conflict: optional binary  (UTF8) is not a group
> ---
>
> Key: HUDI-864
> URL: https://issues.apache.org/jira/browse/HUDI-864
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>Reporter: Roland Johann
>Priority: Major
>
> When dealing with struct types like this
> {code:json}
> {
>   "type": "struct",
>   "fields": [
> {
>   "name": "categoryResults",
>   "type": {
> "type": "array",
> "elementType": {
>   "type": "struct",
>   "fields": [
> {
>   "name": "categoryId",
>   "type": "string",
>   "nullable": true,
>   "metadata": {}
> }
>   ]
> },
> "containsNull": true
>   },
>   "nullable": true,
>   "metadata": {}
> }
>   ]
> }
> {code}
> The second ingest batch throws that exception:
> {code}
> ERROR [Executor task launch worker for task 15] 
> commit.BaseCommitActionExecutor (BaseCommitActionExecutor.java:264) - Error 
> upserting bucketType UPDATE for partition :0
> org.apache.hudi.exception.HoodieException: 
> org.apache.hudi.exception.HoodieException: 
> java.util.concurrent.ExecutionException: 
> org.apache.hudi.exception.HoodieException: operation has failed
>   at 
> org.apache.hudi.table.action.commit.CommitActionExecutor.handleUpdateInternal(CommitActionExecutor.java:100)
>   at 
> org.apache.hudi.table.action.commit.CommitActionExecutor.handleUpdate(CommitActionExecutor.java:76)
>   at 
> org.apache.hudi.table.action.deltacommit.DeltaCommitActionExecutor.handleUpdate(DeltaCommitActionExecutor.java:73)
>   at 
> org.apache.hudi.table.action.commit.BaseCommitActionExecutor.handleUpsertPartition(BaseCommitActionExecutor.java:258)
>   at 
> org.apache.hudi.table.action.commit.BaseCommitActionExecutor.handleInsertPartition(BaseCommitActionExecutor.java:271)
>   at 
> org.apache.hudi.table.action.commit.BaseCommitActionExecutor.lambda$execute$caffe4c4$1(BaseCommitActionExecutor.java:104)
>   at 
> org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
>   at 
> org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:853)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:853)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>   at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:337)
>   at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:335)
>   at 
> org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1182)
>   at 
> org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1156)
>   at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091)
>   at 
> org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156)
>   at 
> org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:882)
>   at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>   at org.apache.spark.scheduler.Task.run(Task.scala:123)
>   at 
> org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hudi.exception.HoodieException: 
> java.util.concurrent.ExecutionException: 
> org.apache.hudi.exception.HoodieException: operation has 

[jira] [Updated] (HUDI-864) parquet schema conflict: optional binary (UTF8) is not a group

2020-05-14 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-864:

Affects Version/s: 0.5.2

> parquet schema conflict: optional binary  (UTF8) is not a group
> ---
>
> Key: HUDI-864
> URL: https://issues.apache.org/jira/browse/HUDI-864
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>Affects Versions: 0.5.2
>Reporter: Roland Johann
>Priority: Major
>
> When dealing with struct types like this
> {code:json}
> {
>   "type": "struct",
>   "fields": [
> {
>   "name": "categoryResults",
>   "type": {
> "type": "array",
> "elementType": {
>   "type": "struct",
>   "fields": [
> {
>   "name": "categoryId",
>   "type": "string",
>   "nullable": true,
>   "metadata": {}
> }
>   ]
> },
> "containsNull": true
>   },
>   "nullable": true,
>   "metadata": {}
> }
>   ]
> }
> {code}
> The second ingest batch throws that exception:
> {code}
> ERROR [Executor task launch worker for task 15] 
> commit.BaseCommitActionExecutor (BaseCommitActionExecutor.java:264) - Error 
> upserting bucketType UPDATE for partition :0
> org.apache.hudi.exception.HoodieException: 
> org.apache.hudi.exception.HoodieException: 
> java.util.concurrent.ExecutionException: 
> org.apache.hudi.exception.HoodieException: operation has failed
>   at 
> org.apache.hudi.table.action.commit.CommitActionExecutor.handleUpdateInternal(CommitActionExecutor.java:100)
>   at 
> org.apache.hudi.table.action.commit.CommitActionExecutor.handleUpdate(CommitActionExecutor.java:76)
>   at 
> org.apache.hudi.table.action.deltacommit.DeltaCommitActionExecutor.handleUpdate(DeltaCommitActionExecutor.java:73)
>   at 
> org.apache.hudi.table.action.commit.BaseCommitActionExecutor.handleUpsertPartition(BaseCommitActionExecutor.java:258)
>   at 
> org.apache.hudi.table.action.commit.BaseCommitActionExecutor.handleInsertPartition(BaseCommitActionExecutor.java:271)
>   at 
> org.apache.hudi.table.action.commit.BaseCommitActionExecutor.lambda$execute$caffe4c4$1(BaseCommitActionExecutor.java:104)
>   at 
> org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
>   at 
> org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:853)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:853)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>   at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:337)
>   at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:335)
>   at 
> org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1182)
>   at 
> org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1156)
>   at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091)
>   at 
> org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156)
>   at 
> org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:882)
>   at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>   at org.apache.spark.scheduler.Task.run(Task.scala:123)
>   at 
> org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hudi.exception.HoodieException: 
> java.util.concurrent.ExecutionException: 
> 

[GitHub] [incubator-hudi] rolandjohann commented on pull request #1596: [HUDI-863] get decimal properties from derived spark DataType

2020-05-14 Thread GitBox


rolandjohann commented on pull request #1596:
URL: https://github.com/apache/incubator-hudi/pull/1596#issuecomment-628817206


   @nsivabalan it seems that I can't do that. The "assign to me" link is missing, 
and clicking on Assignee "Unassigned" doesn't turn the field into the expected 
people selector. This is the same for HUDI-888. Can you assign those two 
issues to me?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] garyli1019 commented on pull request #1602: [HUDI-494] fix incorrect record size estimation

2020-05-14 Thread GitBox


garyli1019 commented on pull request #1602:
URL: https://github.com/apache/incubator-hudi/pull/1602#issuecomment-628817124


   @vinothchandar I definitely agree a statistical table would be a better 
approach, but I believe it will take a while. I am happy to contribute to this 
topic as well. 
   Any other recommendation for a short-term fix for this issue? I believe this 
bug could happen again: when something upstream goes wrong, like Kafka or HDFS 
going down in production for a short period of time, Hudi has a chance of 
making an abnormally small commit.
   Regarding the bloom filter size, I think they all use the bloom filter 
entries and FP rate to calculate the size: simple, dynamic, local, and global. 
Once we switch to the parquet-native approach, we can change the way the 
estimation is done. I think the calculation could be accurate. The HBASE index 
is not covered in this PR, though.
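
   For readers following along: the "entries plus FP rate" sizing mentioned 
above is the standard bloom filter math. Below is a minimal sketch of that 
formula; it illustrates the calculation only, it is not Hudi's actual 
`BloomFilterUtils` code, and the example values in `main` are assumptions.

   ```java
   public final class BloomFilterSizing {

     // Bits needed for n entries at false-positive rate p: m = -n * ln(p) / (ln 2)^2
     static long numBits(long numEntries, double fpRate) {
       return (long) Math.ceil(-numEntries * Math.log(fpRate) / (Math.log(2) * Math.log(2)));
     }

     // Optimal number of hash functions: k = (m / n) * ln 2
     static int numHashFunctions(long numBits, long numEntries) {
       return Math.max(1, (int) Math.round((double) numBits / numEntries * Math.log(2)));
     }

     public static void main(String[] args) {
       long entries = 60_000;         // assumed example values only
       double fpRate = 0.000000001;
       long bits = numBits(entries, fpRate);
       System.out.printf("bits=%d, hashes=%d%n", bits, numHashFunctions(bits, entries));
     }
   }
   ```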



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-863) nested structs containing decimal types lead to null pointer exception

2020-05-14 Thread Roland Johann (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roland Johann updated HUDI-863:
---
Status: Patch Available  (was: In Progress)

> nested structs containing decimal types lead to null pointer exception
> --
>
> Key: HUDI-863
> URL: https://issues.apache.org/jira/browse/HUDI-863
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>Reporter: Roland Johann
>Priority: Major
>  Labels: bug-bash-0.6.0, pull-request-available
> Fix For: 0.6.0
>
>
> Currently the Avro schema gets passed to 
> AvroConversionHelper.createConverterToAvro, which itself processes the passed 
> Spark SQL DataTypes recursively to resolve structs, arrays, etc. The Avro schema 
> gets passed into the recursion, but without selecting the relevant field and 
> therefore the schema of that field. That leads to a null pointer exception when 
> decimal types are processed, because in that case the schema of the 
> field is retrieved by calling getField on the root schema, which is not 
> defined when we deal with nested records.
> [AvroConversionHelper.scala#L291|https://github.com/apache/incubator-hudi/blob/master/hudi-spark/src/main/scala/org/apache/hudi/AvroConversionHelper.scala#L291]
> The proposed solution is to remove the dependency on the avro schema and 
> derive the particular avro schema for the decimal converter creator case only.
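
The derivation the last sentence describes might look roughly like the sketch below: build the decimal Avro schema from the Spark DecimalType's own precision and scale, so nested fields never need a getField lookup on the root schema. This is an illustration of the idea with made-up class and method names, not the actual patch.

{code:java}
import org.apache.avro.LogicalTypes;
import org.apache.avro.Schema;
import org.apache.spark.sql.types.DecimalType;

final class DecimalSchemas {

  // Derive the Avro schema for a decimal purely from the Spark type.
  static Schema forDecimal(DecimalType dt) {
    int size = minBytesForPrecision(dt.precision());
    // A fixed-size carrier; bytes-backed decimals would work as well.
    Schema fixed = Schema.createFixed(
        "decimal_" + dt.precision() + "_" + dt.scale(), null, null, size);
    return LogicalTypes.decimal(dt.precision(), dt.scale()).addToSchema(fixed);
  }

  // Minimum bytes needed to hold 10^precision - 1 in two's complement.
  static int minBytesForPrecision(int precision) {
    int bytes = 1;
    while (Math.pow(2, 8 * bytes - 1) < Math.pow(10, precision)) {
      bytes++;
    }
    return bytes;
  }
}
{code}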



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] bvaradar commented on issue #1631: [SUPPORT] After changed schema could not update schema of Hive

2020-05-14 Thread GitBox


bvaradar commented on issue #1631:
URL: https://github.com/apache/incubator-hudi/issues/1631#issuecomment-628801677


   @zherenyu831 : Schema evolution rules do not allow adding columns in the 
middle of the schema. You would need to add them at the end; a small 
compatibility check illustrating the rule is sketched below. 
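
   A minimal sketch using Avro's own compatibility checker; the schemas here 
are made-up examples. Note that Avro itself matches fields by name, so this 
only illustrates the "append with a default" half of the rule; the 
middle-of-schema restriction presumably comes from downstream consumers that 
map columns positionally.

   ```java
   import org.apache.avro.Schema;
   import org.apache.avro.SchemaCompatibility;

   public class EvolutionCheck {
     public static void main(String[] args) {
       Schema writer = new Schema.Parser().parse(
           "{\"type\":\"record\",\"name\":\"r\",\"fields\":["
         + "{\"name\":\"id\",\"type\":\"string\"}]}");
       // New column appended at the end, with a null default.
       Schema reader = new Schema.Parser().parse(
           "{\"type\":\"record\",\"name\":\"r\",\"fields\":["
         + "{\"name\":\"id\",\"type\":\"string\"},"
         + "{\"name\":\"new_col\",\"type\":[\"null\",\"string\"],\"default\":null}]}");
       // Prints COMPATIBLE: records written under the old schema can still be read.
       System.out.println(SchemaCompatibility
           .checkReaderWriterCompatibility(reader, writer).getType());
     }
   }
   ```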



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] EdwinGuo edited a comment on issue #1630: [SUPPORT] Latest commit does not have any schema in commit metadata

2020-05-14 Thread GitBox


EdwinGuo edited a comment on issue #1630:
URL: https://github.com/apache/incubator-hudi/issues/1630#issuecomment-628778573


   @bvaradar Thanks for the response. You mean the delete feature? What I'm 
working on is writing data to storage through Hudi version 0.5.0, but I was 
using a fresh build from the git sha mentioned above 
(https://github.com/apache/incubator-hudi/commit/506447fd4fde4cd922f7aa8f4e17a7f0dc97).
 So the workaround I used was to take a build from that sha, do an upsert (so 
that I have the schema under the extraMetadata section of the most recent 
commit file), and then do the delete (sketched below); this is the only way to 
make the 0.5.0 data deletable.
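
   A rough sketch of that upsert-then-delete sequence with the Spark 
datasource. The option keys shown are the standard write options; the table 
name, base path, and the omitted record-key/precombine options are 
placeholders, and the build/version caveats above still apply.

   ```java
   import org.apache.spark.sql.Dataset;
   import org.apache.spark.sql.Row;
   import org.apache.spark.sql.SaveMode;

   class UpsertThenDelete {
     // df holds the records to delete; other required write options
     // (record key, precombine field, etc.) are omitted for brevity.
     static void run(Dataset<Row> df, String tableName, String basePath) {
       df.write().format("org.apache.hudi")
           .option("hoodie.datasource.write.operation", "upsert")
           .option("hoodie.table.name", tableName)
           .mode(SaveMode.Append)
           .save(basePath);   // refreshes commit metadata, including the schema

       df.write().format("org.apache.hudi")
           .option("hoodie.datasource.write.operation", "delete")
           .option("hoodie.table.name", tableName)
           .mode(SaveMode.Append)
           .save(basePath);   // the delete can now find a schema to work with
     }
   }
   ```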
   
   Thanks for the help.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] EdwinGuo commented on issue #1630: [SUPPORT] Latest commit does not have any schema in commit metadata

2020-05-14 Thread GitBox


EdwinGuo commented on issue #1630:
URL: https://github.com/apache/incubator-hudi/issues/1630#issuecomment-628778573


   @bvaradar Thanks for the response. You mean the delete feature? What I'm 
working on is writing data to storage through Hudi version 0.5.0, but I was 
using a fresh build from the git sha mentioned above 
(https://github.com/apache/incubator-hudi/commit/506447fd4fde4cd922f7aa8f4e17a7f0dc97).
 So the workaround I used was to take a build from that sha, do an upsert, and 
then do the delete; this is the only way to make the 0.5.0 data deletable.
   
   Thanks for the help.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] vinothchandar commented on pull request #1151: [HUDI-476] Add hudi-examples module

2020-05-14 Thread GitBox


vinothchandar commented on pull request #1151:
URL: https://github.com/apache/incubator-hudi/pull/1151#issuecomment-628775947


   @lamber-ken are you able to take this across the finish line? @dengziming has 
something that is very close to a first version.. we can try to land that and 
then improve on it 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-767) Support transformation when export to Hudi

2020-05-14 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17107485#comment-17107485
 ] 

Vinoth Chandar commented on HUDI-767:
-

yeah agree. we can defer this 

> Support transformation when export to Hudi
> --
>
> Key: HUDI-767
> URL: https://issues.apache.org/jira/browse/HUDI-767
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Utilities
>Reporter: Raymond Xu
>Priority: Major
> Fix For: 0.6.1
>
>
> Main logic described in 
> https://github.com/apache/incubator-hudi/issues/1480#issuecomment-608529410
> In HoodieSnapshotExporter, we could extend the feature to include 
> transformation when --output-format hudi, using a custom Transformer
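
What a custom Transformer looks like, for context: hudi-utilities defines a Transformer interface that takes the input Dataset and returns a transformed one. The sketch below follows that shape, but import paths vary across Hudi versions and the "name" column is a made-up example, so treat it as illustrative only.

{code:java}
import org.apache.hudi.common.util.TypedProperties;
import org.apache.hudi.utilities.transform.Transformer;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.functions;

public class UppercaseNameTransformer implements Transformer {

  // Upper-case a (hypothetical) "name" column before the data is written.
  @Override
  public Dataset<Row> apply(JavaSparkContext jsc, SparkSession sparkSession,
                            Dataset<Row> rowDataset, TypedProperties properties) {
    return rowDataset.withColumn("name", functions.upper(rowDataset.col("name")));
  }
}
{code}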



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] vinothchandar commented on pull request #1094: [WIP] [HUDI-375] Refactor the configure framework of hudi project

2020-05-14 Thread GitBox


vinothchandar commented on pull request #1094:
URL: https://github.com/apache/incubator-hudi/pull/1094#issuecomment-628766636


   cc @n3nash we need something like this.. but more full-fledged, with fallback 
key support etc (roughly sketched below) ... 
   
   closing and saving for later 
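
   A minimal sketch of what "fallback key support" could mean: a config key 
that first checks its current name, then older/alternate names, then a 
default. The class and method names here are made up for illustration; this 
is not an actual Hudi API.

   ```java
   import java.util.Arrays;
   import java.util.List;
   import java.util.Properties;

   final class ConfigKey {
     private final String key;
     private final List<String> fallbackKeys;
     private final String defaultValue;

     ConfigKey(String key, String defaultValue, String... fallbackKeys) {
       this.key = key;
       this.defaultValue = defaultValue;
       this.fallbackKeys = Arrays.asList(fallbackKeys);
     }

     // Resolution order: current key, then fallback keys, then the default.
     String resolve(Properties props) {
       if (props.containsKey(key)) {
         return props.getProperty(key);
       }
       for (String fallback : fallbackKeys) {
         if (props.containsKey(fallback)) {
           return props.getProperty(fallback);
         }
       }
       return defaultValue;
     }
   }
   ```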



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] vinothchandar closed pull request #1094: [WIP] [HUDI-375] Refactor the configure framework of hudi project

2020-05-14 Thread GitBox


vinothchandar closed pull request #1094:
URL: https://github.com/apache/incubator-hudi/pull/1094


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] vinothchandar commented on pull request #1409: [HUDI-714]Add javadoc and comments to hudi write method link

2020-05-14 Thread GitBox


vinothchandar commented on pull request #1409:
URL: https://github.com/apache/incubator-hudi/pull/1409#issuecomment-628765699


   @nsivabalan can you please re-review and see this home



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] vinothchandar commented on pull request #1471: [WIP][HUDI-752]Make CompactionAdminClient spark-free

2020-05-14 Thread GitBox


vinothchandar commented on pull request #1471:
URL: https://github.com/apache/incubator-hudi/pull/1471#issuecomment-628765061


   Closing due to inactivity; we have done a few different fixes around this. 
Please rebase and reopen if it's still relevant 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] vinothchandar closed pull request #1471: [WIP][HUDI-752]Make CompactionAdminClient spark-free

2020-05-14 Thread GitBox


vinothchandar closed pull request #1471:
URL: https://github.com/apache/incubator-hudi/pull/1471


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-774) Spark to Avro converter incorrectly generates optional fields

2020-05-14 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17107473#comment-17107473
 ] 

Vinoth Chandar commented on HUDI-774:
-

yes [~uditme] is going to look at it as well 

> Spark to Avro converter incorrectly generates optional fields
> -
>
> Key: HUDI-774
> URL: https://issues.apache.org/jira/browse/HUDI-774
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>Reporter: Alexander Filipchik
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I think https://issues.apache.org/jira/browse/SPARK-28008 is a good 
> description of what is happening.
>  
> It can cause a situation where the schema in the MOR log files is incompatible 
> with the schema produced by RowBasedSchemaProvider, so compactions will stall.
>  
> I have a fix, which is a bit hacky -> postprocess the schema produced by the 
> converter and
> 1) Make sure unions with null types have those null types at position 0
> 2) They have default values set to null
> I couldn't find a way to do a clean fix, as some of the problematic classes 
> are from Hive and called from Spark.
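
The post-processing in step 1) might look roughly like the sketch below: reorder a union so the NULL branch comes first, which is what lets a null default validate under Avro's rules. This is an illustration of the idea, not the reporter's actual fix.

{code:java}
import java.util.ArrayList;
import java.util.List;
import org.apache.avro.Schema;

final class UnionFix {

  // Move the NULL branch of a union to position 0; non-unions pass through.
  static Schema nullFirst(Schema schema) {
    if (schema.getType() != Schema.Type.UNION) {
      return schema;
    }
    List<Schema> branches = new ArrayList<>(schema.getTypes());
    boolean hadNull = branches.removeIf(s -> s.getType() == Schema.Type.NULL);
    if (!hadNull) {
      return schema; // nothing to reorder
    }
    branches.add(0, Schema.create(Schema.Type.NULL));
    return Schema.createUnion(branches);
  }
}
{code}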



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] vinothchandar commented on pull request #1558: [HUDI-796]: added deduping logic for upserts case

2020-05-14 Thread GitBox


vinothchandar commented on pull request #1558:
URL: https://github.com/apache/incubator-hudi/pull/1558#issuecomment-628761300


   I will let @yanghua see this home 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] vinothchandar commented on pull request #1596: [HUDI-863] get decimal properties from derived spark DataType

2020-05-14 Thread GitBox


vinothchandar commented on pull request #1596:
URL: https://github.com/apache/incubator-hudi/pull/1596#issuecomment-628755876


   @umehrot2 gentle ping :) 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] vinothchandar commented on pull request #1611: [HUDI-705]Add unit test for RollbacksCommand

2020-05-14 Thread GitBox


vinothchandar commented on pull request #1611:
URL: https://github.com/apache/incubator-hudi/pull/1611#issuecomment-628755381


   @yanghua you can review this as well if possible :) 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-888) NPE when compacting via hudi-cli and providing a compaction props file

2020-05-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-888:

Labels: pull-request-available  (was: )

> NPE when compacting via hudi-cli and providing a compaction props file
> --
>
> Key: HUDI-888
> URL: https://issues.apache.org/jira/browse/HUDI-888
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>Reporter: Roland Johann
>Priority: Major
>  Labels: pull-request-available
>
> When we schedule compaction via hudi-cli and provide compaction props via the 
> `propsFilePath` argument, we get an NPE because the file system has not been 
> initialized in the constructor of HoodieCompactor.java
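
The usual shape of a fix for this class of NPE is to resolve the FileSystem lazily instead of in the constructor, so that argument parsing (including propsFilePath) happens first. A sketch with made-up names, assuming standard Hadoop APIs; not the actual patch:

{code:java}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

class LazyFsHolder {
  private final Configuration conf;
  private transient FileSystem fs; // resolved on first use, not in the constructor

  LazyFsHolder(Configuration conf) {
    this.conf = conf;
  }

  FileSystem getFs(String basePath) throws IOException {
    if (fs == null) {
      // Safe here: runs after CLI arguments and props have been parsed.
      fs = new Path(basePath).getFileSystem(conf);
    }
    return fs;
  }
}
{code}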



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] vinothchandar commented on pull request #1622: [HUDI-888] fix NullPointerException

2020-05-14 Thread GitBox


vinothchandar commented on pull request #1622:
URL: https://github.com/apache/incubator-hudi/pull/1622#issuecomment-628751099


   @leesf can you please review this one? 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1616: [HUDI-786] Fixing read beyond inline length in InlineFS

2020-05-14 Thread GitBox


vinothchandar commented on a change in pull request #1616:
URL: https://github.com/apache/incubator-hudi/pull/1616#discussion_r425276842



##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/fs/inline/InLineFsDataInputStream.java
##
@@ -56,24 +56,29 @@ public long getPos() throws IOException {
 
   @Override
   public int read(long position, byte[] buffer, int offset, int length) throws 
IOException {
+if ((length - offset) > this.length) {
+  throw new IOException("Attempting to read past inline content");
+}
 return outerStream.read(startOffset + position, buffer, offset, length);
   }
 
   @Override
   public void readFully(long position, byte[] buffer, int offset, int length) 
throws IOException {
+if ((length - offset) > this.length) {
+  throw new IOException("Attempting to read past inline content");
+}
 outerStream.readFully(startOffset + position, buffer, offset, length);
   }
 
   @Override
   public void readFully(long position, byte[] buffer)
   throws IOException {
-outerStream.readFully(startOffset + position, buffer, 0, buffer.length);
+readFully(position, buffer, 0, buffer.length);
   }
 
   @Override
   public boolean seekToNewSource(long targetPos) throws IOException {

Review comment:
   +1 we need to rethink this entire class this way .. 
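
   For reference, a stricter guard would also take the read position into 
account, so a short read far into the stream can't slip past the inline 
region. A sketch only, assuming `position` is relative to the start of the 
inline content:

   ```java
   import java.io.IOException;

   final class InlineBounds {
     // Reject any range whose end would pass the inline region.
     static void check(long position, int length, long inlineLength) throws IOException {
       if (position < 0 || length < 0 || position + length > inlineLength) {
         throw new IOException("Attempting to read past inline content");
       }
     }
   }
   ```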





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] vinothchandar commented on pull request #1566: [HUDI-603]: DeltaStreamer can now fetch schema before every run in continuous mode

2020-05-14 Thread GitBox


vinothchandar commented on pull request #1566:
URL: https://github.com/apache/incubator-hudi/pull/1566#issuecomment-628749732


   @bvaradar this and #1518 are again related.. Can you take both of these home? 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] vinothchandar commented on pull request #1518: [HUDI-723] Register avro schema if infered from SQL transformation

2020-05-14 Thread GitBox


vinothchandar commented on pull request #1518:
URL: https://github.com/apache/incubator-hudi/pull/1518#issuecomment-628747799


   @bvaradar Could you review this one?  It hits close to the 
transformer/schemaprovider changes, which you are more familiar with 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] bvaradar commented on issue #1630: [SUPPORT] Latest commit does not have any schema in commit metadata

2020-05-14 Thread GitBox


bvaradar commented on issue #1630:
URL: https://github.com/apache/incubator-hudi/issues/1630#issuecomment-628744270


   @EdwinGuo : This feature is available only from 0.5.1 onwards. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



