[jira] [Assigned] (HUDI-266) Utility Script to restore Hudi dataset to previous instant

2020-04-01 Thread Nishith Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nishith Agarwal reassigned HUDI-266:


Assignee: Ramachandran M S

> Utility Script to restore Hudi dataset to previous instant
> --
>
> Key: HUDI-266
> URL: https://issues.apache.org/jira/browse/HUDI-266
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Common Core
>Reporter: Balaji Varadarajan
>Assignee: Ramachandran M S
>Priority: Major
>
> Recently, we added the restoreToInstant() API. We need to expose it as a utility
> script so that users can invoke it directly.
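
For context, a minimal sketch of what such a utility could look like, assuming the HoodieWriteClient#restoreToInstant(String) API mentioned above (the tool class, argument handling, and Spark wiring are illustrative, not the final script):

```
// Hypothetical sketch only; class name and argument handling are illustrative.
// Assumes HoodieWriteClient#restoreToInstant(String) from the hudi-client module.
import org.apache.hudi.client.HoodieWriteClient;
import org.apache.hudi.config.HoodieWriteConfig;
import org.apache.spark.api.java.JavaSparkContext;

public class RestoreToInstantTool {
  public static void main(String[] args) {
    String basePath = args[0];     // base path of the Hudi dataset
    String instantTime = args[1];  // instant to restore the dataset to
    HoodieWriteConfig config = HoodieWriteConfig.newBuilder().withPath(basePath).build();
    JavaSparkContext jsc = new JavaSparkContext("local[2]", "hudi-restore-tool");
    try {
      HoodieWriteClient client = new HoodieWriteClient(jsc, config);
      client.restoreToInstant(instantTime);
      client.close();
    } finally {
      jsc.stop();
    }
  }
}
```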



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] vinothchandar commented on issue #1472: [HUDI-754] Configure .asf.yaml for Hudi Github repository

2020-04-01 Thread GitBox
vinothchandar commented on issue #1472: [HUDI-754] Configure .asf.yaml for Hudi 
Github repository
URL: https://github.com/apache/incubator-hudi/pull/1472#issuecomment-607610527
 
 
   Awesome catch, @lamber-ken!


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


Build failed in Jenkins: hudi-snapshot-deployment-0.5 #235

2020-04-01 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 2.34 KB...]
/home/jenkins/tools/maven/apache-maven-3.5.4/conf:
logging
settings.xml
toolchains.xml

/home/jenkins/tools/maven/apache-maven-3.5.4/conf/logging:
simplelogger.properties

/home/jenkins/tools/maven/apache-maven-3.5.4/lib:
aopalliance-1.0.jar
cdi-api-1.0.jar
cdi-api.license
commons-cli-1.4.jar
commons-cli.license
commons-io-2.5.jar
commons-io.license
commons-lang3-3.5.jar
commons-lang3.license
ext
guava-20.0.jar
guice-4.2.0-no_aop.jar
jansi-1.17.1.jar
jansi-native
javax.inject-1.jar
jcl-over-slf4j-1.7.25.jar
jcl-over-slf4j.license
jsr250-api-1.0.jar
jsr250-api.license
maven-artifact-3.5.4.jar
maven-artifact.license
maven-builder-support-3.5.4.jar
maven-builder-support.license
maven-compat-3.5.4.jar
maven-compat.license
maven-core-3.5.4.jar
maven-core.license
maven-embedder-3.5.4.jar
maven-embedder.license
maven-model-3.5.4.jar
maven-model-builder-3.5.4.jar
maven-model-builder.license
maven-model.license
maven-plugin-api-3.5.4.jar
maven-plugin-api.license
maven-repository-metadata-3.5.4.jar
maven-repository-metadata.license
maven-resolver-api-1.1.1.jar
maven-resolver-api.license
maven-resolver-connector-basic-1.1.1.jar
maven-resolver-connector-basic.license
maven-resolver-impl-1.1.1.jar
maven-resolver-impl.license
maven-resolver-provider-3.5.4.jar
maven-resolver-provider.license
maven-resolver-spi-1.1.1.jar
maven-resolver-spi.license
maven-resolver-transport-wagon-1.1.1.jar
maven-resolver-transport-wagon.license
maven-resolver-util-1.1.1.jar
maven-resolver-util.license
maven-settings-3.5.4.jar
maven-settings-builder-3.5.4.jar
maven-settings-builder.license
maven-settings.license
maven-shared-utils-3.2.1.jar
maven-shared-utils.license
maven-slf4j-provider-3.5.4.jar
maven-slf4j-provider.license
org.eclipse.sisu.inject-0.3.3.jar
org.eclipse.sisu.inject.license
org.eclipse.sisu.plexus-0.3.3.jar
org.eclipse.sisu.plexus.license
plexus-cipher-1.7.jar
plexus-cipher.license
plexus-component-annotations-1.7.1.jar
plexus-component-annotations.license
plexus-interpolation-1.24.jar
plexus-interpolation.license
plexus-sec-dispatcher-1.4.jar
plexus-sec-dispatcher.license
plexus-utils-3.1.0.jar
plexus-utils.license
slf4j-api-1.7.25.jar
slf4j-api.license
wagon-file-3.1.0.jar
wagon-file.license
wagon-http-3.1.0-shaded.jar
wagon-http.license
wagon-provider-api-3.1.0.jar
wagon-provider-api.license

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/ext:
README.txt

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native:
freebsd32
freebsd64
linux32
linux64
osx
README.txt
windows32
windows64

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/osx:
libjansi.jnilib

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows32:
jansi.dll

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows64:
jansi.dll
Finished /home/jenkins/tools/maven/apache-maven-3.5.4 Directory Listing :
Detected current version as: 
'HUDI_home=
0.6.0-SNAPSHOT'
[INFO] Scanning for projects...
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-spark_2.11:jar:0.6.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 
org.apache.hudi:hudi-spark_${scala.binary.version}:[unknown-version], 

 line 26, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-timeline-service:jar:0.6.0-SNAPSHOT
[WARNING] 'build.plugins.plugin.(groupId:artifactId)' must be unique but found 
duplicate declaration of plugin org.jacoco:jacoco-maven-plugin @ 
org.apache.hudi:hudi-timeline-service:[unknown-version], 

 line 58, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-utilities_2.11:jar:0.6.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 
org.apache.hudi:hudi-utilities_${scala.binary.version}:[unknown-version], 

 line 26, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-spark-bundle_2.11:jar:0.6.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 

[jira] [Commented] (HUDI-210) Implement prometheus metrics reporter

2020-04-01 Thread vinoyang (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17073307#comment-17073307
 ] 

vinoyang commented on HUDI-210:
---

[~rxu] I agree with your opinion; we also use Prometheus as a unified metrics
system. [~x1q1j1] Any progress?
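
For illustration, a minimal sketch of a pushgateway-based reporter using the io.prometheus.client Java library (the metric and job names are made up; this is a sketch, not Hudi's eventual reporter):

```
// Illustrative sketch: push a single gauge to a Prometheus pushgateway.
// Assumes the simpleclient and simpleclient_pushgateway artifacts on the classpath.
import io.prometheus.client.CollectorRegistry;
import io.prometheus.client.Gauge;
import io.prometheus.client.exporter.PushGateway;

public class PrometheusReporterSketch {
  public static void main(String[] args) throws Exception {
    CollectorRegistry registry = new CollectorRegistry();
    Gauge commitDuration = Gauge.build()
        .name("hudi_commit_duration_ms")
        .help("Duration of the last commit in milliseconds.")
        .register(registry);
    commitDuration.set(1234);
    // Push everything registered above under one job name.
    new PushGateway("localhost:9091").pushAdd(registry, "hudi_metrics");
  }
}
```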

> Implement prometheus metrics reporter
> -
>
> Key: HUDI-210
> URL: https://issues.apache.org/jira/browse/HUDI-210
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: Common Core
>Reporter: vinoyang
>Assignee: Forward Xu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Since Prometheus is a very popular monitoring system and time series 
> database, it would be better to provide a metrics reporter to report metrics 
> to prometheus.





[GitHub] [incubator-hudi] hddong commented on a change in pull request #1471: [WIP][HUDI-752]Make CompactionAdminClient spark-free

2020-04-01 Thread GitBox
hddong commented on a change in pull request #1471: [WIP][HUDI-752]Make 
CompactionAdminClient spark-free
URL: https://github.com/apache/incubator-hudi/pull/1471#discussion_r402006542
 
 

 ##
 File path: 
hudi-client/src/main/java/org/apache/hudi/client/utils/SparkEngineUtils.java
 ##
 @@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.client.utils;
+
+import org.apache.spark.SparkContext;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.api.java.function.FlatMapFunction;
+import org.apache.spark.api.java.function.Function;
+
+import java.util.List;
+
+/**
+ * Util class for Spark Engine.
+ */
+public class SparkEngineUtils {
 
 Review comment:
   Yes, agree too @yanghua @vinothchandar 




[GitHub] [incubator-hudi] umehrot2 commented on issue #1459: [HUDI-418] [HUDI-421] Bootstrap Index using HFile and File System View Changes with unit-test

2020-04-01 Thread GitBox
umehrot2 commented on issue #1459: [HUDI-418] [HUDI-421] Bootstrap Index using 
HFile and File System View Changes with unit-test
URL: https://github.com/apache/incubator-hudi/pull/1459#issuecomment-607572115
 
 
   > @umehrot2 : Made changes to track external file status.
   
   Thanks! I plan to start reviewing this tomorrow.
   
   




[GitHub] [incubator-hudi] n3nash commented on a change in pull request #1242: [HUDI-544] Archived commits command code cleanup

2020-04-01 Thread GitBox
n3nash commented on a change in pull request #1242: [HUDI-544] Archived commits 
command code cleanup
URL: https://github.com/apache/incubator-hudi/pull/1242#discussion_r401971898
 
 

 ##
 File path: hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
 ##
 @@ -119,8 +119,10 @@ private[hudi] object HoodieSparkSqlWriter {
 
   // Create the table if not present
   if (!exists) {
+val archiveLogFolder = parameters.getOrElse(
+  HoodieTableConfig.HOODIE_ARCHIVELOG_FOLDER_PROP_NAME, "archived")
 HoodieTableMetaClient.initTableType(sparkContext.hadoopConfiguration, 
path.get, tableType,
-  tblName.get, "archived", parameters(PAYLOAD_CLASS_OPT_KEY))
+  tblName.get, archiveLogFolder, parameters(PAYLOAD_CLASS_OPT_KEY))
 
 Review comment:
   @hddong I'm accepting this and will merge after you create another ticket for a
release note: if someone was actually overriding HOODIE_ARCHIVELOG_FOLDER_PROP_NAME,
it was not being honored before and now will be, so those cases can break (even
though this is the right thing to do).
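
For readers following along, a hedged sketch of what this change means for users (the DataFrame wiring, `df`, `basePath`, and the folder name are illustrative; the option key is the constant from the diff):

```
// Illustrative: after this change, an explicitly supplied archive log folder is
// honored at table creation time (previously "archived" was always hard-coded).
df.write.format("org.apache.hudi")
  .option(HoodieTableConfig.HOODIE_ARCHIVELOG_FOLDER_PROP_NAME, "my_archived")
  // ... other required hudi options (table name, record key, etc.) ...
  .save(basePath)
```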




[GitHub] [incubator-hudi] n3nash commented on issue #1310: [HUDI-601] Improve unit test coverage for HoodieAvroWriteSupport, HoodieRealtimeRecordReader, RealtimeCompactedRecordReader

2020-04-01 Thread GitBox
n3nash commented on issue #1310: [HUDI-601] Improve unit test coverage for 
HoodieAvroWriteSupport, HoodieRealtimeRecordReader, 
RealtimeCompactedRecordReader
URL: https://github.com/apache/incubator-hudi/pull/1310#issuecomment-607540951
 
 
   @modi95 could you please rebase and address the comment so this can be merged?




[GitHub] [incubator-hudi] n3nash commented on a change in pull request #1476: [HUDI-757] Added hudi-cli command to export metadata of Instants.

2020-04-01 Thread GitBox
n3nash commented on a change in pull request #1476: [HUDI-757] Added hudi-cli 
command to export metadata of Instants.
URL: https://github.com/apache/incubator-hudi/pull/1476#discussion_r401969804
 
 

 ##
 File path: 
hudi-cli/src/main/java/org/apache/hudi/cli/commands/ExportCommand.java
 ##
 @@ -0,0 +1,152 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.cli.commands;
+
+import org.apache.hudi.avro.HoodieAvroUtils;
+import org.apache.hudi.avro.model.HoodieArchivedMetaEntry;
+import org.apache.hudi.cli.HoodieCLI;
+import org.apache.hudi.common.fs.FSUtils;
+import org.apache.hudi.common.model.HoodieLogFile;
+import org.apache.hudi.common.table.log.HoodieLogFormat;
+import org.apache.hudi.common.table.log.HoodieLogFormat.Reader;
+import org.apache.hudi.common.table.log.block.HoodieAvroDataBlock;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.generic.IndexedRecord;
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.springframework.shell.core.CommandMarker;
+import org.springframework.shell.core.annotation.CliCommand;
+import org.springframework.shell.core.annotation.CliOption;
+import org.springframework.stereotype.Component;
+
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Set;
+import java.util.stream.Collectors;
+
+/**
+ * CLI command to export various information from a HUDI dataset.
+ */
+@Component
+public class ExportCommand implements CommandMarker {
+
+  @CliCommand(value = "export instants", help = "Export Instants and their 
metadata from the Timeline")
+  public String showArchivedCommits(
+  @CliOption(key = {"limit"}, help = "Limit Instants", 
unspecifiedDefaultValue = "-1") final Integer limit,
+  @CliOption(key = {"actions"}, help = "Comma seperated list of Instant 
actions to export",
+unspecifiedDefaultValue = 
"clean,commit,deltacommit,rollback,savepoint,restore") final String filter,
+  @CliOption(key = {"desc"}, help = "Ordering", unspecifiedDefaultValue = 
"false") final boolean descending,
+  @CliOption(key = {"localFolder"}, help = "Local Folder to export to", 
mandatory = true) String localFolder)
+  throws IOException {
+
+final String basePath = HoodieCLI.getTableMetaClient().getBasePath();
+final Path archivePath = new Path(basePath + 
"/.hoodie/.commits_.archive*");
+final Path metaPath = new Path(basePath + "/.hoodie/");
+final Set<String> actionSet = new HashSet<>(Arrays.asList(filter.split(",")));
+int numExports = limit == -1 ? Integer.MAX_VALUE : limit;
+int numCopied = 0;
+
+if (! new File(localFolder).isDirectory()) {
+  throw new RuntimeException(localFolder + " is not a valid local 
directory");
+}
+
+// The non-archived instants are of the format <timestamp>.<action>[.<requested|inflight>]. We only
+// want the completed ones which do not have the requested/inflight suffix.
+FileStatus[] statuses = FSUtils.getFs(basePath, HoodieCLI.conf).listStatus(metaPath);
+List<FileStatus> nonArchivedStatuses = Arrays.stream(statuses).filter(f -> {
 
 Review comment:
   Can you use the HoodieActiveTimeline here instead to filter down to the
non-requested/inflight instants? Once you do that, the path to .hoodie is known to
you and the name of the commit file can be obtained from HoodieInstant; then you can
do a getFileStatus() if needed. It's cleaner this way than doing indexOf and
substring etc., which can break.
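
For reference, a minimal sketch of the suggested approach (the surrounding wiring is illustrative; filterCompletedInstants() and HoodieInstant#getFileName() are the timeline APIs being referred to):

```
// Illustrative fragment: use the active timeline to get completed instants,
// then derive each commit file name from the HoodieInstant itself.
HoodieTimeline completedTimeline =
    HoodieCLI.getTableMetaClient().getActiveTimeline().filterCompletedInstants();
List<HoodieInstant> completed =
    completedTimeline.getInstants().collect(Collectors.toList());
for (HoodieInstant instant : completed) {
  // No indexOf/substring on file names; the instant knows its own file name.
  FileStatus status = FSUtils.getFs(basePath, HoodieCLI.conf)
      .getFileStatus(new Path(metaPath, instant.getFileName()));
  // ... export the file as before ...
}
```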




[GitHub] [incubator-hudi] n3nash commented on a change in pull request #1457: [HUDI-741] Added checks to validate Hoodie's schema evolution.

2020-04-01 Thread GitBox
n3nash commented on a change in pull request #1457: [HUDI-741] Added checks to 
validate Hoodie's schema evolution.
URL: https://github.com/apache/incubator-hudi/pull/1457#discussion_r401966800
 
 

 ##
 File path: 
hudi-common/src/main/java/org/apache/hudi/common/avro/SchemaCompatibility.java
 ##
 @@ -0,0 +1,566 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.avro;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.avro.AvroRuntimeException;
+import org.apache.avro.Schema;
+import org.apache.avro.Schema.Field;
+import org.apache.avro.Schema.Type;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * NOTE: This code is copied from org.apache.avro.SchemaCompatibility and 
changed for HUDI use case.
+ *
+ * HUDI requires a Schema to be specified in HoodieWriteConfig and is used by 
the HoodieWriteClient to
+ * create the records. The schema is also saved in the data files (parquet 
format) and log files (avro format).
+ * Since a schema is required each time new data is ingested into a HUDI 
dataset, schema can be evolved over time.
+ *
+ * HUDI specific validation of schema evolution should ensure that a newer 
schema can be used for the dataset by
+ * checking that the data written using the old schema can be read using the 
new schema.
+ *
+ * New Schema is compatible only if:
+ * 1. There is no change in schema
+ * 2. A field has been added and it has a default value specified
+ *
+ * New Schema is incompatible if:
+ * 1. A field has been deleted
+ * 2. A field has been renamed (treated as delete + add)
+ * 3. A field's type has changed to be incompatible with the older type
+ */
+public class SchemaCompatibility {
 
 Review comment:
   @prashantwason Can you please mark the lines/methods you have changed, maybe
with **MOD**, for future reference? Also, can you add 1-2 lines as to why each
change is needed?
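
As background, stock Avro exposes the reader/writer check that this class adapts; a minimal sketch using org.apache.avro.SchemaCompatibility, shown only to illustrate the "added field needs a default" rule described above:

```
// Illustrative: an evolved reader schema (extra field with a default) stays
// compatible with data written under the old writer schema.
import org.apache.avro.Schema;
import org.apache.avro.SchemaCompatibility;

public class CompatibilitySketch {
  public static void main(String[] args) {
    Schema writer = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"rec\",\"fields\":["
            + "{\"name\":\"id\",\"type\":\"string\"}]}");
    Schema reader = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"rec\",\"fields\":["
            + "{\"name\":\"id\",\"type\":\"string\"},"
            + "{\"name\":\"new_field\",\"type\":[\"null\",\"string\"],\"default\":null}]}");
    // Prints COMPATIBLE; removing the default on new_field makes it INCOMPATIBLE.
    System.out.println(
        SchemaCompatibility.checkReaderWriterCompatibility(reader, writer).getType());
  }
}
```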




[GitHub] [incubator-hudi] n3nash commented on a change in pull request #1457: [HUDI-741] Added checks to validate Hoodie's schema evolution.

2020-04-01 Thread GitBox
n3nash commented on a change in pull request #1457: [HUDI-741] Added checks to 
validate Hoodie's schema evolution.
URL: https://github.com/apache/incubator-hudi/pull/1457#discussion_r401965698
 
 

 ##
 File path: 
hudi-client/src/test/java/org/apache/hudi/client/TestTableSchemaEvolution.java
 ##
 @@ -0,0 +1,410 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.client;
+
+import org.apache.hudi.common.HoodieClientTestUtils;
+import org.apache.hudi.common.HoodieTestDataGenerator;
+import org.apache.hudi.common.model.HoodieRecord;
+import org.apache.hudi.common.model.HoodieTableType;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.config.HoodieIndexConfig;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieInsertException;
+import org.apache.hudi.exception.HoodieUpsertException;
+import org.apache.hudi.index.HoodieIndex.IndexType;
+import org.junit.After;
+import org.junit.Before;
+import org.junit.Test;
+
+import java.io.IOException;
+import java.util.List;
+
+import static 
org.apache.hudi.common.HoodieTestDataGenerator.TRIP_EXAMPLE_SCHEMA;
+import static 
org.apache.hudi.common.table.timeline.versioning.TimelineLayoutVersion.VERSION_1;
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertTrue;
+import static org.junit.Assert.fail;
+
+public class TestTableSchemaEvolution extends TestHoodieClientBase {
+  private final String initCommitTime = "000";
+  private HoodieTableType tableType = HoodieTableType.COPY_ON_WRITE;
+  private HoodieTestDataGenerator dataGenDevolved = new 
HoodieTestDataGenerator(TRIP_EXAMPLE_SCHEMA_DEVOLVED);
+  private HoodieTestDataGenerator dataGenEvolved = new 
HoodieTestDataGenerator(TRIP_EXAMPLE_SCHEMA_EVOLVED);
+
+  // TRIP_EXAMPLE_SCHEMA with a new_field added
+  public static final String TRIP_EXAMPLE_SCHEMA_EVOLVED = "{\"type\": 
\"record\"," + "\"name\": \"triprec\"," + "\"fields\": [ "
+  + "{\"name\": \"timestamp\",\"type\": \"double\"}," + "{\"name\": 
\"_row_key\", \"type\": \"string\"},"
+  + "{\"name\": \"rider\", \"type\": \"string\"}," + "{\"name\": 
\"driver\", \"type\": \"string\"},"
+  + "{\"name\": \"begin_lat\", \"type\": \"double\"}," + "{\"name\": 
\"begin_lon\", \"type\": \"double\"},"
+  + "{\"name\": \"end_lat\", \"type\": \"double\"}," + "{\"name\": 
\"end_lon\", \"type\": \"double\"},"
+  + "{\"name\": \"new_field\", \"type\": [\"null\", \"string\"], 
\"default\": null},"
+  + "{\"name\": \"fare\",\"type\": {\"type\":\"record\", 
\"name\":\"fare\",\"fields\": ["
+  + "{\"name\": \"amount\",\"type\": \"double\"},{\"name\": \"currency\", 
\"type\": \"string\"}]}},"
+  + "{\"name\": \"_hoodie_is_deleted\", \"type\": \"boolean\", 
\"default\": false} ]}";
+  // TRIP_EXAMPLE_SCHEMA with driver field removed
+  public static final String TRIP_EXAMPLE_SCHEMA_DEVOLVED = "{\"type\": 
\"record\"," + "\"name\": \"triprec\"," + "\"fields\": [ "
+  + "{\"name\": \"timestamp\",\"type\": \"double\"}," + "{\"name\": 
\"_row_key\", \"type\": \"string\"},"
+  + "{\"name\": \"rider\", \"type\": \"string\"},"
+  + "{\"name\": \"begin_lat\", \"type\": \"double\"}," + "{\"name\": 
\"begin_lon\", \"type\": \"double\"},"
+  + "{\"name\": \"end_lat\", \"type\": \"double\"}," + "{\"name\": 
\"end_lon\", \"type\": \"double\"},"
+  + "{\"name\": \"fare\",\"type\": {\"type\":\"record\", 
\"name\":\"fare\",\"fields\": ["
+  + "{\"name\": \"amount\",\"type\": \"double\"},{\"name\": \"currency\", 
\"type\": \"string\"}]}},"
+  + "{\"name\": \"_hoodie_is_deleted\", \"type\": \"boolean\", 
\"default\": false} ]}";
+
+  @Before
+  public void setUp() throws Exception {
+initResources();
+  }
+
+  @After
+  public void tearDown() {
+cleanupSparkContexts();
+  }
+
+  @Test
+  public void testMORTable() throws Exception {
+tableType = HoodieTableType.MERGE_ON_READ;
+initMetaClient();
+
+// Create the table
+HoodieTableMetaClient.initTableType(metaClient.getHadoopConf(), 
metaClient.getBasePath(),

[GitHub] [incubator-hudi] n3nash commented on a change in pull request #1457: [HUDI-741] Added checks to validate Hoodie's schema evolution.

2020-04-01 Thread GitBox
n3nash commented on a change in pull request #1457: [HUDI-741] Added checks to 
validate Hoodie's schema evolution.
URL: https://github.com/apache/incubator-hudi/pull/1457#discussion_r401964863

[GitHub] [incubator-hudi] n3nash commented on a change in pull request #1457: [HUDI-741] Added checks to validate Hoodie's schema evolution.

2020-04-01 Thread GitBox
n3nash commented on a change in pull request #1457: [HUDI-741] Added checks to 
validate Hoodie's schema evolution.
URL: https://github.com/apache/incubator-hudi/pull/1457#discussion_r401964812

[jira] [Assigned] (HUDI-741) Fix Hoodie's schema evolution checks

2020-04-01 Thread Nishith Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nishith Agarwal reassigned HUDI-741:


Assignee: Prashant Wason

> Fix Hoodie's schema evolution checks
> 
>
> Key: HUDI-741
> URL: https://issues.apache.org/jira/browse/HUDI-741
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>Reporter: Prashant Wason
>Assignee: Prashant Wason
>Priority: Minor
>  Labels: pull-request-available
>   Original Estimate: 120h
>  Time Spent: 10m
>  Remaining Estimate: 119h 50m
>
> HUDI requires a Schema to be specified in HoodieWriteConfig and is used by 
> the HoodieWriteClient to create the records. The schema is also saved in the 
> data files (parquet format) and log files (avro format).
> Since a schema is required each time new data is ingested into a HUDI 
> dataset, schema can be evolved over time. But HUDI should ensure that the 
> evolved schema is compatible with the older schema.
> HUDI specific validation of schema evolution should ensure that a newer 
> schema can be used for the dataset by checking that the data written using 
> the old schema can be read using the new schema.





[GitHub] [incubator-hudi] n3nash commented on a change in pull request #1473: [HUDI-568] Improve unit test coverage

2020-04-01 Thread GitBox
n3nash commented on a change in pull request #1473: [HUDI-568] Improve unit 
test coverage
URL: https://github.com/apache/incubator-hudi/pull/1473#discussion_r401959992
 
 

 ##
 File path: 
hudi-hadoop-mr/src/test/java/org/apache/hudi/hadoop/realtime/TestHoodieRealtimeFileSplit.java
 ##
 @@ -0,0 +1,129 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.hadoop.realtime;
+
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.io.Text;
+import org.apache.hadoop.mapred.FileSplit;
+import org.junit.Before;
+import org.junit.Test;
+import org.mockito.InOrder;
+import org.mockito.invocation.InvocationOnMock;
+import org.mockito.stubbing.Answer;
+
+import java.io.DataInput;
+import java.io.DataOutput;
+import java.io.IOException;
+import java.nio.charset.StandardCharsets;
+import java.util.Collections;
+import java.util.List;
+
+import static org.junit.Assert.assertEquals;
+import static org.mockito.AdditionalMatchers.aryEq;
+import static org.mockito.Matchers.any;
+import static org.mockito.Matchers.anyByte;
+import static org.mockito.Matchers.anyInt;
+import static org.mockito.Mockito.doAnswer;
+import static org.mockito.Mockito.doNothing;
+import static org.mockito.Mockito.eq;
+import static org.mockito.Mockito.inOrder;
+import static org.mockito.Mockito.mock;
+import static org.mockito.Mockito.times;
+import static org.mockito.Mockito.when;
+
+public class TestHoodieRealtimeFileSplit {
+
+  private HoodieRealtimeFileSplit split;
+  private String basePath = "/tmp";
 
 Review comment:
   Take a look at 
https://github.com/apache/incubator-hudi/blob/master/hudi-common/src/test/java/org/apache/hudi/common/HoodieCommonTestHarness.java#L48
 to better initialize things like basePath and other variables that you intend 
to use throughout the execution of the test
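
For illustration, the harness pattern being pointed to boils down to deriving basePath from a JUnit TemporaryFolder instead of hard-coding /tmp (a sketch of that pattern, not the harness itself):

```
// Sketch: per-test, auto-cleaned base path via JUnit 4's TemporaryFolder,
// similar in spirit to what HoodieCommonTestHarness provides.
import org.junit.Before;
import org.junit.Rule;
import org.junit.rules.TemporaryFolder;

public class TestWithTempBasePath {
  @Rule
  public TemporaryFolder tmpFolder = new TemporaryFolder();

  private String basePath;

  @Before
  public void setUp() throws Exception {
    basePath = tmpFolder.getRoot().getAbsolutePath(); // unique per test run
  }
}
```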




[GitHub] [incubator-hudi] n3nash commented on a change in pull request #1473: [HUDI-568] Improve unit test coverage

2020-04-01 Thread GitBox
n3nash commented on a change in pull request #1473: [HUDI-568] Improve unit 
test coverage
URL: https://github.com/apache/incubator-hudi/pull/1473#discussion_r401959992
 Review comment:
   Can you use the basePath invocation strategy from a different class? Take a look at
https://github.com/apache/incubator-hudi/blob/master/hudi-common/src/test/java/org/apache/hudi/common/HoodieCommonTestHarness.java#L48




[GitHub] [incubator-hudi] n3nash commented on a change in pull request #1473: [HUDI-568] Improve unit test coverage

2020-04-01 Thread GitBox
n3nash commented on a change in pull request #1473: [HUDI-568] Improve unit 
test coverage
URL: https://github.com/apache/incubator-hudi/pull/1473#discussion_r401959992
 Review comment:
   Can you use the basePath invocations from a different class? Take a look at
https://github.com/apache/incubator-hudi/blob/master/hudi-common/src/test/java/org/apache/hudi/common/HoodieCommonTestHarness.java#L48




[GitHub] [incubator-hudi] n3nash commented on a change in pull request #1473: [HUDI-568] Improve unit test coverage

2020-04-01 Thread GitBox
n3nash commented on a change in pull request #1473: [HUDI-568] Improve unit 
test coverage
URL: https://github.com/apache/incubator-hudi/pull/1473#discussion_r401958980
 
 

 ##
 File path: 
hudi-hadoop-mr/src/test/java/org/apache/hudi/hadoop/realtime/TestHoodieRealtimeFileSplit.java
 ##
 @@ -0,0 +1,129 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.hadoop.realtime;
+
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.io.Text;
+import org.apache.hadoop.mapred.FileSplit;
+import org.junit.Before;
+import org.junit.Test;
+import org.mockito.InOrder;
+import org.mockito.invocation.InvocationOnMock;
+import org.mockito.stubbing.Answer;
+
+import java.io.DataInput;
+import java.io.DataOutput;
+import java.io.IOException;
+import java.nio.charset.StandardCharsets;
+import java.util.Collections;
+import java.util.List;
+
+import static org.junit.Assert.assertEquals;
+import static org.mockito.AdditionalMatchers.aryEq;
+import static org.mockito.Matchers.any;
+import static org.mockito.Matchers.anyByte;
+import static org.mockito.Matchers.anyInt;
+import static org.mockito.Mockito.doAnswer;
+import static org.mockito.Mockito.doNothing;
+import static org.mockito.Mockito.eq;
+import static org.mockito.Mockito.inOrder;
+import static org.mockito.Mockito.mock;
+import static org.mockito.Mockito.times;
+import static org.mockito.Mockito.when;
+
+public class TestHoodieRealtimeFileSplit {
+
+  private HoodieRealtimeFileSplit split;
+  private String basePath = "/tmp";
+  private List<String> deltaLogPaths = Collections.singletonList("/tmp/1.log");
+  private FileSplit baseFileSplit = new FileSplit(new Path("/tmp", 
"test.file"), 0, 100, new String[]{});
+  private String maxCommitTime = "10001";
+
+  @Before
+  public void setUp() throws Exception {
+split = new HoodieRealtimeFileSplit(baseFileSplit, basePath, 
deltaLogPaths, maxCommitTime);
+  }
+
+  @Test
+  public void write() throws IOException {
 
 Review comment:
   Can you add comments for each of the tests you have added to explain what you
are trying to test? Especially with the mocks, it is unclear to a new reader.




[GitHub] [incubator-hudi] n3nash commented on a change in pull request #1473: [HUDI-568] Improve unit test coverage

2020-04-01 Thread GitBox
n3nash commented on a change in pull request #1473: [HUDI-568] Improve unit 
test coverage
URL: https://github.com/apache/incubator-hudi/pull/1473#discussion_r401955661
 
 

 ##
 File path: 
hudi-common/src/test/java/org/apache/hudi/common/util/collection/TestRocksDBManager.java
 ##
 @@ -99,25 +104,119 @@ public void testRocksDBManager() {
 List> gotPayloads =
 dbManager.prefixSearch(family, 
prefix).collect(Collectors.toList());
 Integer expCount = countsMap.get(family).get(prefix);
+System.out.printf("%s,%s: %d, %d\n", prefix, family, expCount == null 
? 0L : expCount.longValue(), gotPayloads.size());
 
 Review comment:
   Can we make this a log? If there is already an assert, this might not be needed.
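
For illustration, a sketch of the printf as a logger call instead (assuming log4j, which is a guess at the module's logging setup):

```
// Sketch: replace System.out.printf with a logger.
private static final org.apache.log4j.Logger LOG =
    org.apache.log4j.LogManager.getLogger(TestRocksDBManager.class);

// ... inside the loop, instead of the printf:
LOG.info(String.format("%s,%s: %d, %d", prefix, family,
    expCount == null ? 0L : expCount.longValue(), gotPayloads.size()));
```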




[GitHub] [incubator-hudi] n3nash commented on a change in pull request #1473: [HUDI-568] Improve unit test coverage

2020-04-01 Thread GitBox
n3nash commented on a change in pull request #1473: [HUDI-568] Improve unit 
test coverage
URL: https://github.com/apache/incubator-hudi/pull/1473#discussion_r401955661
 Review comment:
   Can we make this a log?




[GitHub] [incubator-hudi] n3nash commented on a change in pull request #1473: [HUDI-568] Improve unit test coverage

2020-04-01 Thread GitBox
n3nash commented on a change in pull request #1473: [HUDI-568] Improve unit 
test coverage
URL: https://github.com/apache/incubator-hudi/pull/1473#discussion_r401955277
 
 

 ##
 File path: 
hudi-common/src/main/java/org/apache/hudi/common/util/collection/RocksDBDAO.java
 ##
 @@ -75,9 +75,6 @@ public RocksDBDAO(String basePath, String rocksDBBasePath) {
* Create RocksDB if not initialized.
*/
   private RocksDB getRocksDB() {
-if (null == rocksDB) {
 
 Review comment:
   What if getRocksDB is called repeatedly? It should not init() every time. Do we
plan to add a different check in init() to allow for better code coverage?
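
For context, the guard being discussed is the null check the diff removes; a sketch of the lazy, idempotent form:

```
// Sketch: lazy initialization that is safe under repeated calls;
// init() only runs when rocksDB has not been created yet.
private RocksDB getRocksDB() {
  if (null == rocksDB) {
    init();
  }
  return rocksDB;
}
```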




[GitHub] [incubator-hudi] bvaradar commented on issue #1459: [HUDI-418] [HUDI-421] Bootstrap Index using HFile and File System View Changes with unit-test

2020-04-01 Thread GitBox
bvaradar commented on issue #1459: [HUDI-418] [HUDI-421] Bootstrap Index using 
HFile and File System View Changes with unit-test
URL: https://github.com/apache/incubator-hudi/pull/1459#issuecomment-607463279
 
 
   
   @umehrot2 : Made changes to track external file status. 
   
   @vinothchandar : Ready for review.
   




[jira] [Commented] (HUDI-210) Implement prometheus metrics reporter

2020-04-01 Thread Raymond Xu (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17073046#comment-17073046
 ] 

Raymond Xu commented on HUDI-210:
-

[~vinoth] I also think supporting multiple metrics backends gives Hudi a
competitive advantage in user adoption. Any org that wants to adopt Hudi
probably already has an established reporting backend, and out-of-the-box
support for that backend is a good +1 for using Hudi. By selectively supporting
the most popular ones, we can win most orgs' +1s on this. That list won't be
long anyway (I'd guess 4 or 5 would suffice?).


> Implement prometheus metrics reporter
> -
>
> Key: HUDI-210
> URL: https://issues.apache.org/jira/browse/HUDI-210
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: Common Core
>Reporter: vinoyang
>Assignee: Forward Xu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Since Prometheus is a very popular monitoring system and time series 
> database, it would be better to provide a metrics reporter to report metrics 
> to prometheus.





[GitHub] [incubator-hudi] lamber-ken commented on issue #1472: [HUDI-754] Configure .asf.yaml for Hudi Github repository

2020-04-01 Thread GitBox
lamber-ken commented on issue #1472: [HUDI-754] Configure .asf.yaml for Hudi 
Github repository
URL: https://github.com/apache/incubator-hudi/pull/1472#issuecomment-607401231
 
 
   It says:
   ```
   an error occurred while running github feature in .asf.yaml!:
   .asf.yaml: Invalid GitHub label 'incremental processing' - must be lowercase 
alphanumerical and <= 35 characters!
   ```
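
For reference, a hedged sketch of the corresponding .asf.yaml fix (the label list is illustrative; GitHub topics must be lowercase, may use hyphens, and are limited to 35 characters):

```
# Illustrative .asf.yaml snippet: 'incremental processing' fails validation,
# so the space is replaced with a hyphen.
github:
  labels:
    - hudi
    - incremental-processing
```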




[GitHub] [incubator-hudi] lamber-ken commented on issue #1472: [HUDI-754] Configure .asf.yaml for Hudi Github repository

2020-04-01 Thread GitBox
lamber-ken commented on issue #1472: [HUDI-754] Configure .asf.yaml for Hudi 
Github repository
URL: https://github.com/apache/incubator-hudi/pull/1472#issuecomment-607400717
 
 
   hi @yanghua @vinothchandar, go ahead with
   
https://lists.apache.org/thread.html/r2e5ede6c5e00532db9bee01c99dc250c43b17cf120c02685844602d9%40%3Ccommits.hudi.apache.org%3E
   
   
![image](https://user-images.githubusercontent.com/20113411/78170071-fbf71300-7484-11ea-835e-f0192925350d.png)
   
   
   




[jira] [Commented] (HUDI-722) IndexOutOfBoundsException in MessageColumnIORecordConsumer.addBinary when writing parquet

2020-04-01 Thread lamber-ken (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17073006#comment-17073006
 ] 

lamber-ken commented on HUDI-722:
-

IMO, it's hard to debug this issue, because the writes do not always fail. I am
sorry that I cannot do a f2f session because of network restrictions. :(

> IndexOutOfBoundsException in MessageColumnIORecordConsumer.addBinary when 
> writing parquet
> -
>
> Key: HUDI-722
> URL: https://issues.apache.org/jira/browse/HUDI-722
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Writer Core
>Reporter: Alexander Filipchik
>Assignee: lamber-ken
>Priority: Major
> Fix For: 0.6.0
>
>
> Some writes fail with java.lang.IndexOutOfBoundsException: Invalid array 
> range: X to X inside the MessageColumnIORecordConsumer.addBinary call.
> Specifically: getColumnWriter().write(value, r[currentLevel], 
> currentColumnIO.getDefinitionLevel());
> fails because the size of r is the same as the current level. What could be 
> causing it?
>  
> It gets executed via ParquetWriter.write(IndexedRecord). Library version: 
> 1.10.1. The Avro record is a very complex object (~2.5k columns, highly 
> nested, arrays of unions present).
> But what is surprising is that it fails to write a top-level field: 
> PrimitiveColumnIO _hoodie_commit_time r:0 d:1 [_hoodie_commit_time], which is 
> the first top-level field in the Avro record: {"_hoodie_commit_time": 
> "20200317215711", "_hoodie_commit_seqno": "20200317215711_0_650",



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-718) java.lang.ClassCastException during upsert

2020-04-01 Thread lamber-ken (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17072975#comment-17072975
 ] 

lamber-ken commented on HUDI-718:
-

IMO, hudi-0.5 depends on avro-1.7.0 while hudi-0.5.2 depends on avro-1.8.2, so 
they may not be compatible.
As another way to solve this issue, please try replacing the "fixed" type with 
"string".
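
To make the suggestion concrete, here is a small sketch of the two schema shapes 
involved (the field and schema names are illustrative; the actual table schema 
is not shown in this issue):

{code:java}
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;

public class FixedVsStringSketch {
  public static void main(String[] args) {
    // An Avro "fixed" field is carried in memory as org.apache.avro.generic.GenericFixed ...
    Schema fixedType = SchemaBuilder.fixed("KeyFixed").size(16);
    // ... while a "string" field is carried as org.apache.avro.util.Utf8 (a CharSequence).
    Schema stringType = Schema.create(Schema.Type.STRING);

    System.out.println(fixedType);  // {"type":"fixed","name":"KeyFixed","size":16}
    System.out.println(stringType); // "string"

    // If the writer schema declares "fixed" but the in-memory value arrives as a Utf8
    // (plausible after an Avro upgrade changes how old data is read back), parquet-avro's
    // AvroWriteSupport casts the value to GenericFixed and fails with exactly the
    // ClassCastException in the stack trace above; a "string" field avoids that cast.
  }
}
{code}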

 

> java.lang.ClassCastException during upsert
> --
>
> Key: HUDI-718
> URL: https://issues.apache.org/jira/browse/HUDI-718
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: DeltaStreamer
>Reporter: Alexander Filipchik
>Assignee: lamber-ken
>Priority: Major
> Fix For: 0.6.0
>
> Attachments: image-2020-03-21-16-49-28-905.png
>
>
> The dataset was created using hudi 0.5, and now I am trying to migrate it to 
> the latest master. The table is written using SqlTransformer. Exception:
>  
> Caused by: org.apache.hudi.exception.HoodieUpsertException: Failed to merge 
> old record into new file for key bla.bla from old file 
> gs://../2020/03/15/7b75931f-ff2f-4bf4-8949-5c437112be79-0_0-35-1196_20200316234140.parquet
>  to new file 
> gs://.../2020/03/15/7b75931f-ff2f-4bf4-8949-5c437112be79-0_1-39-1506_20200317190948.parquet
>  at org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:246)
>  at 
> org.apache.hudi.table.HoodieCopyOnWriteTable$UpdateHandler.consumeOneRecord(HoodieCopyOnWriteTable.java:433)
>  at 
> org.apache.hudi.table.HoodieCopyOnWriteTable$UpdateHandler.consumeOneRecord(HoodieCopyOnWriteTable.java:423)
>  at 
> org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:37)
>  at 
> org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:121)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  ... 3 more
> Caused by: java.lang.ClassCastException: org.apache.avro.util.Utf8 cannot be 
> cast to org.apache.avro.generic.GenericFixed
>  at 
> org.apache.parquet.avro.AvroWriteSupport.writeValueWithoutConversion(AvroWriteSupport.java:336)
>  at 
> org.apache.parquet.avro.AvroWriteSupport.writeValue(AvroWriteSupport.java:275)
>  at 
> org.apache.parquet.avro.AvroWriteSupport.writeRecordFields(AvroWriteSupport.java:191)
>  at org.apache.parquet.avro.AvroWriteSupport.write(AvroWriteSupport.java:165)
>  at 
> org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:128)
>  at org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:299)
>  at 
> org.apache.hudi.io.storage.HoodieParquetWriter.writeAvro(HoodieParquetWriter.java:103)
>  at org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:242)
>  ... 8 more



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HUDI-718) java.lang.ClassCastException during upsert

2020-04-01 Thread lamber-ken (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17072874#comment-17072874
 ] 

lamber-ken edited comment on HUDI-718 at 4/1/20, 4:46 PM:
--

hi [~afilipchik], can you share the schema of the old parquet file? And the type 
of the bla.bla field?


was (Author: lamber-ken):
hi [~afilipchik], can you share the schema of the old parquet file? 

> java.lang.ClassCastException during upsert
> --
>
> Key: HUDI-718
> URL: https://issues.apache.org/jira/browse/HUDI-718
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: DeltaStreamer
>Reporter: Alexander Filipchik
>Assignee: lamber-ken
>Priority: Major
> Fix For: 0.6.0
>
> Attachments: image-2020-03-21-16-49-28-905.png
>
>
> The dataset was created using hudi 0.5, and now I am trying to migrate it to 
> the latest master. The table is written using SqlTransformer. Exception:
>  
> Caused by: org.apache.hudi.exception.HoodieUpsertException: Failed to merge 
> old record into new file for key bla.bla from old file 
> gs://../2020/03/15/7b75931f-ff2f-4bf4-8949-5c437112be79-0_0-35-1196_20200316234140.parquet
>  to new file 
> gs://.../2020/03/15/7b75931f-ff2f-4bf4-8949-5c437112be79-0_1-39-1506_20200317190948.parquet
>  at org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:246)
>  at 
> org.apache.hudi.table.HoodieCopyOnWriteTable$UpdateHandler.consumeOneRecord(HoodieCopyOnWriteTable.java:433)
>  at 
> org.apache.hudi.table.HoodieCopyOnWriteTable$UpdateHandler.consumeOneRecord(HoodieCopyOnWriteTable.java:423)
>  at 
> org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:37)
>  at 
> org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:121)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  ... 3 more
> Caused by: java.lang.ClassCastException: org.apache.avro.util.Utf8 cannot be 
> cast to org.apache.avro.generic.GenericFixed
>  at 
> org.apache.parquet.avro.AvroWriteSupport.writeValueWithoutConversion(AvroWriteSupport.java:336)
>  at 
> org.apache.parquet.avro.AvroWriteSupport.writeValue(AvroWriteSupport.java:275)
>  at 
> org.apache.parquet.avro.AvroWriteSupport.writeRecordFields(AvroWriteSupport.java:191)
>  at org.apache.parquet.avro.AvroWriteSupport.write(AvroWriteSupport.java:165)
>  at 
> org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:128)
>  at org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:299)
>  at 
> org.apache.hudi.io.storage.HoodieParquetWriter.writeAvro(HoodieParquetWriter.java:103)
>  at org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:242)
>  ... 8 more



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HUDI-718) java.lang.ClassCastException during upsert

2020-04-01 Thread lamber-ken (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17072874#comment-17072874
 ] 

lamber-ken edited comment on HUDI-718 at 4/1/20, 4:46 PM:
--

hi [~afilipchik], can you share the schema of the old parquet file? And what's 
the type of the bla.bla field?


was (Author: lamber-ken):
hi [~afilipchik], can you share the schema of the old parquet file? And the type 
of the bla.bla field?

> java.lang.ClassCastException during upsert
> --
>
> Key: HUDI-718
> URL: https://issues.apache.org/jira/browse/HUDI-718
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: DeltaStreamer
>Reporter: Alexander Filipchik
>Assignee: lamber-ken
>Priority: Major
> Fix For: 0.6.0
>
> Attachments: image-2020-03-21-16-49-28-905.png
>
>
> The dataset was created using hudi 0.5, and now I am trying to migrate it to 
> the latest master. The table is written using SqlTransformer. Exception:
>  
> Caused by: org.apache.hudi.exception.HoodieUpsertException: Failed to merge 
> old record into new file for key bla.bla from old file 
> gs://../2020/03/15/7b75931f-ff2f-4bf4-8949-5c437112be79-0_0-35-1196_20200316234140.parquet
>  to new file 
> gs://.../2020/03/15/7b75931f-ff2f-4bf4-8949-5c437112be79-0_1-39-1506_20200317190948.parquet
>  at org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:246)
>  at 
> org.apache.hudi.table.HoodieCopyOnWriteTable$UpdateHandler.consumeOneRecord(HoodieCopyOnWriteTable.java:433)
>  at 
> org.apache.hudi.table.HoodieCopyOnWriteTable$UpdateHandler.consumeOneRecord(HoodieCopyOnWriteTable.java:423)
>  at 
> org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:37)
>  at 
> org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:121)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  ... 3 more
> Caused by: java.lang.ClassCastException: org.apache.avro.util.Utf8 cannot be 
> cast to org.apache.avro.generic.GenericFixed
>  at 
> org.apache.parquet.avro.AvroWriteSupport.writeValueWithoutConversion(AvroWriteSupport.java:336)
>  at 
> org.apache.parquet.avro.AvroWriteSupport.writeValue(AvroWriteSupport.java:275)
>  at 
> org.apache.parquet.avro.AvroWriteSupport.writeRecordFields(AvroWriteSupport.java:191)
>  at org.apache.parquet.avro.AvroWriteSupport.write(AvroWriteSupport.java:165)
>  at 
> org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:128)
>  at org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:299)
>  at 
> org.apache.hudi.io.storage.HoodieParquetWriter.writeAvro(HoodieParquetWriter.java:103)
>  at org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:242)
>  ... 8 more



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] yanghua commented on issue #1472: [HUDI-754] Configure .asf.yaml for Hudi Github repository

2020-04-01 Thread GitBox
yanghua commented on issue #1472: [HUDI-754] Configure .asf.yaml for Hudi 
Github repository
URL: https://github.com/apache/incubator-hudi/pull/1472#issuecomment-607336667
 
 
   @vinothchandar It takes effect here: 
https://github.com/apache/infrastructure-staging-test/blob/master/.asf.yaml#L12
   
   I guess maybe the `label` does not support blank spaces?
   
   Can you try adding a tag containing a blank space via Github to see if it 
supports blank spaces?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] vinothchandar edited a comment on issue #1472: [HUDI-754] Configure .asf.yaml for Hudi Github repository

2020-04-01 Thread GitBox
vinothchandar edited a comment on issue #1472: [HUDI-754] Configure .asf.yaml 
for Hudi Github repository
URL: https://github.com/apache/incubator-hudi/pull/1472#issuecomment-607329538
 
 
   edit: it's just the tags.. interesting.. maybe look at other projects doing 
this?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] vinothchandar commented on issue #1472: [HUDI-754] Configure .asf.yaml for Hudi Github repository

2020-04-01 Thread GitBox
vinothchandar commented on issue #1472: [HUDI-754] Configure .asf.yaml for Hudi 
Github repository
URL: https://github.com/apache/incubator-hudi/pull/1472#issuecomment-607329538
 
 
   Hmmm not sure if we have to wait some time for it to take effect.. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (HUDI-718) java.lang.ClassCastException during upsert

2020-04-01 Thread lamber-ken (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17072874#comment-17072874
 ] 

lamber-ken commented on HUDI-718:
-

hi [~afilipchik], can you share the schema of the old parquet file? 

> java.lang.ClassCastException during upsert
> --
>
> Key: HUDI-718
> URL: https://issues.apache.org/jira/browse/HUDI-718
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: DeltaStreamer
>Reporter: Alexander Filipchik
>Assignee: lamber-ken
>Priority: Major
> Fix For: 0.6.0
>
> Attachments: image-2020-03-21-16-49-28-905.png
>
>
> The dataset was created using hudi 0.5, and now I am trying to migrate it to 
> the latest master. The table is written using SqlTransformer. Exception:
>  
> Caused by: org.apache.hudi.exception.HoodieUpsertException: Failed to merge 
> old record into new file for key bla.bla from old file 
> gs://../2020/03/15/7b75931f-ff2f-4bf4-8949-5c437112be79-0_0-35-1196_20200316234140.parquet
>  to new file 
> gs://.../2020/03/15/7b75931f-ff2f-4bf4-8949-5c437112be79-0_1-39-1506_20200317190948.parquet
>  at org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:246)
>  at 
> org.apache.hudi.table.HoodieCopyOnWriteTable$UpdateHandler.consumeOneRecord(HoodieCopyOnWriteTable.java:433)
>  at 
> org.apache.hudi.table.HoodieCopyOnWriteTable$UpdateHandler.consumeOneRecord(HoodieCopyOnWriteTable.java:423)
>  at 
> org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:37)
>  at 
> org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:121)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  ... 3 more
> Caused by: java.lang.ClassCastException: org.apache.avro.util.Utf8 cannot be 
> cast to org.apache.avro.generic.GenericFixed
>  at 
> org.apache.parquet.avro.AvroWriteSupport.writeValueWithoutConversion(AvroWriteSupport.java:336)
>  at 
> org.apache.parquet.avro.AvroWriteSupport.writeValue(AvroWriteSupport.java:275)
>  at 
> org.apache.parquet.avro.AvroWriteSupport.writeRecordFields(AvroWriteSupport.java:191)
>  at org.apache.parquet.avro.AvroWriteSupport.write(AvroWriteSupport.java:165)
>  at 
> org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:128)
>  at org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:299)
>  at 
> org.apache.hudi.io.storage.HoodieParquetWriter.writeAvro(HoodieParquetWriter.java:103)
>  at org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:242)
>  ... 8 more



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-731) Implement a chained transformer for deltastreamer that can chain other transformer implementations

2020-04-01 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang closed HUDI-731.
-
Resolution: Implemented

Implemented via master branch: 5b53b0d85e0d60a37c37941b5a653b0718534e7b

> Implement a chained transformer for deltastreamer that can chain other 
> transformer implementations
> --
>
> Key: HUDI-731
> URL: https://issues.apache.org/jira/browse/HUDI-731
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: DeltaStreamer, Utilities
>Reporter: Vinoth Chandar
>Assignee: Raymond Xu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[incubator-hudi] branch master updated: [HUDI-731] Add ChainedTransformer (#1440)

2020-04-01 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 5b53b0d  [HUDI-731] Add ChainedTransformer (#1440)
5b53b0d is described below

commit 5b53b0d85e0d60a37c37941b5a653b0718534e7b
Author: Raymond Xu <2701446+xushi...@users.noreply.github.com>
AuthorDate: Wed Apr 1 08:21:31 2020 -0700

[HUDI-731] Add ChainedTransformer (#1440)

* [HUDI-731] Add ChainedTransformer
---
 .../org/apache/hudi/utilities/UtilHelpers.java |  13 ++-
 .../hudi/utilities/deltastreamer/DeltaSync.java|   8 +-
 .../deltastreamer/HoodieDeltaStreamer.java |  21 -
 .../utilities/transform/ChainedTransformer.java|  54 +++
 .../hudi/utilities/TestHoodieDeltaStreamer.java|  67 +++---
 .../org/apache/hudi/utilities/TestUtilHelpers.java | 101 +
 .../transform/TestChainedTransformer.java  |  92 +++
 .../{ => transform}/TestFlatteningTransformer.java |   4 +-
 8 files changed, 314 insertions(+), 46 deletions(-)

diff --git a/hudi-utilities/src/main/java/org/apache/hudi/utilities/UtilHelpers.java b/hudi-utilities/src/main/java/org/apache/hudi/utilities/UtilHelpers.java
index 8930084..222a391 100644
--- a/hudi-utilities/src/main/java/org/apache/hudi/utilities/UtilHelpers.java
+++ b/hudi-utilities/src/main/java/org/apache/hudi/utilities/UtilHelpers.java
@@ -34,6 +34,7 @@ import org.apache.hudi.exception.HoodieIOException;
 import org.apache.hudi.index.HoodieIndex;
 import org.apache.hudi.utilities.schema.SchemaProvider;
 import org.apache.hudi.utilities.sources.Source;
+import org.apache.hudi.utilities.transform.ChainedTransformer;
 import org.apache.hudi.utilities.transform.Transformer;
 
 import org.apache.avro.Schema;
@@ -67,7 +68,9 @@ import java.sql.DriverManager;
 import java.sql.PreparedStatement;
 import java.sql.ResultSet;
 import java.sql.SQLException;
+import java.util.ArrayList;
 import java.util.Arrays;
+import java.util.Collections;
 import java.util.Enumeration;
 import java.util.HashMap;
 import java.util.List;
@@ -102,11 +105,15 @@ public class UtilHelpers {
     }
   }
 
-  public static Transformer createTransformer(String transformerClass) throws IOException {
+  public static Option<Transformer> createTransformer(List<String> classNames) throws IOException {
     try {
-      return transformerClass == null ? null : (Transformer) ReflectionUtils.loadClass(transformerClass);
+      List<Transformer> transformers = new ArrayList<>();
+      for (String className : Option.ofNullable(classNames).orElse(Collections.emptyList())) {
+        transformers.add(ReflectionUtils.loadClass(className));
+      }
+      return transformers.isEmpty() ? Option.empty() : Option.of(new ChainedTransformer(transformers));
     } catch (Throwable e) {
-      throw new IOException("Could not load transformer class " + transformerClass, e);
+      throw new IOException("Could not load transformer class(es) " + classNames, e);
     }
   }
 
diff --git a/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java b/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java
index 99cb497..5cc33ee 100644
--- a/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java
+++ b/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java
@@ -106,7 +106,7 @@ public class DeltaSync implements Serializable {
   /**
    * Allows transforming source to target table before writing.
    */
-  private transient Transformer transformer;
+  private transient Option<Transformer> transformer;
 
   /**
    * Extract the key for the target table.
@@ -173,7 +173,7 @@ public class DeltaSync implements Serializable {
 
     refreshTimeline();
 
-    this.transformer = UtilHelpers.createTransformer(cfg.transformerClassName);
+    this.transformer = UtilHelpers.createTransformer(cfg.transformerClassNames);
     this.keyGenerator = DataSourceUtils.createKeyGenerator(props);
 
     this.formatAdapter = new SourceFormatAdapter(
@@ -281,14 +281,14 @@ public class DeltaSync implements Serializable {
     final Option<JavaRDD<GenericRecord>> avroRDDOptional;
     final String checkpointStr;
     final SchemaProvider schemaProvider;
-    if (transformer != null) {
+    if (transformer.isPresent()) {
       // Transformation is needed. Fetch New rows in Row Format, apply transformation and then convert them
       // to generic records for writing
       InputBatch<Dataset<Row>> dataAndCheckpoint =
          formatAdapter.fetchNewDataInRowFormat(resumeCheckpointStr, cfg.sourceLimit);
 
       Option<Dataset<Row>> transformed =
-          dataAndCheckpoint.getBatch().map(data -> transformer.apply(jssc, sparkSession, data, props));
+          dataAndCheckpoint.getBatch().map(data -> transformer.get().apply(jssc, sparkSession, data, props));
      checkpointStr = 
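
To make the intent of the change concrete: multiple transformer class names are 
now loaded and composed so that each transformer consumes the previous one's 
output. A minimal sketch of that composition (the interface is simplified, 
properties are reduced to java.util.Properties, and the class names are 
illustrative rather than the exact Hudi signatures):

```java
import java.util.List;

import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

// Simplified stand-in for org.apache.hudi.utilities.transform.Transformer.
interface Transformer {
  Dataset<Row> apply(JavaSparkContext jsc, SparkSession spark, Dataset<Row> rows, java.util.Properties props);
}

// Sketch of the chaining idea behind ChainedTransformer.
class ChainedTransformerSketch implements Transformer {
  private final List<Transformer> transformers;

  ChainedTransformerSketch(List<Transformer> transformers) {
    this.transformers = transformers;
  }

  @Override
  public Dataset<Row> apply(JavaSparkContext jsc, SparkSession spark, Dataset<Row> rows, java.util.Properties props) {
    Dataset<Row> current = rows;
    for (Transformer t : transformers) {
      current = t.apply(jsc, spark, current, props); // each stage feeds the next, in list order
    }
    return current;
  }
}
```

With this in place, HoodieDeltaStreamer accepts a list of transformer class 
names (cfg.transformerClassNames in the hunk above) instead of a single class 
name.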

Error while running github feature from .asf.yaml in incubator-hudi!

2020-04-01 Thread Apache Infrastructure
An error occurred while running github feature in .asf.yaml!:
.asf.yaml: Invalid GitHub label 'incremental processing' - must be lowercase 
alphanumerical and <= 35 characters!


[GitHub] [incubator-hudi] yanghua merged pull request #1440: [HUDI-731] Add ChainedTransformer

2020-04-01 Thread GitBox
yanghua merged pull request #1440: [HUDI-731] Add ChainedTransformer
URL: https://github.com/apache/incubator-hudi/pull/1440
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] yanghua commented on issue #1440: [HUDI-731] Add ChainedTransformer

2020-04-01 Thread GitBox
yanghua commented on issue #1440: [HUDI-731] Add ChainedTransformer
URL: https://github.com/apache/incubator-hudi/pull/1440#issuecomment-607313717
 
 
   thanks @xushiyan , merging...


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1471: [WIP][HUDI-752]Make CompactionAdminClient spark-free

2020-04-01 Thread GitBox
vinothchandar commented on a change in pull request #1471: [WIP][HUDI-752]Make 
CompactionAdminClient spark-free
URL: https://github.com/apache/incubator-hudi/pull/1471#discussion_r401677960
 
 

 ##
 File path: 
hudi-client/src/main/java/org/apache/hudi/client/utils/SparkEngineUtils.java
 ##
 @@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.client.utils;
+
+import org.apache.spark.SparkContext;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.api.java.function.FlatMapFunction;
+import org.apache.spark.api.java.function.Function;
+
+import java.util.List;
+
+/**
+ * Util class for Spark Engine.
+ */
+public class SparkEngineUtils {
 
 Review comment:
   yes.. I am with you... @yanghua and @hddong 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] xushiyan commented on issue #1440: [HUDI-731] Add ChainedTransformer

2020-04-01 Thread GitBox
xushiyan commented on issue #1440: [HUDI-731] Add ChainedTransformer
URL: https://github.com/apache/incubator-hudi/pull/1440#issuecomment-607291746
 
 
   @vinothchandar @yanghua Please check the last commit and merge if it's ok. 
Thanks.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[incubator-hudi] branch master updated: [HUDI-749] Fix hudi-timeline-server-bundle run_server.sh start error (#1477)

2020-04-01 Thread leesf
This is an automated email from the ASF dual-hosted git repository.

leesf pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 2a611f4  [HUDI-749] Fix hudi-timeline-server-bundle run_server.sh 
start error (#1477)
2a611f4 is described below

commit 2a611f4ad3816b67b54e0fcd1ef588668fba0732
Author: Trevor <33487819+trevor-zh...@users.noreply.github.com>
AuthorDate: Wed Apr 1 22:19:54 2020 +0800

[HUDI-749] Fix hudi-timeline-server-bundle run_server.sh start error (#1477)
---
 packaging/hudi-timeline-server-bundle/run_server.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/packaging/hudi-timeline-server-bundle/run_server.sh b/packaging/hudi-timeline-server-bundle/run_server.sh
index 487cbb5..479eda1 100755
--- a/packaging/hudi-timeline-server-bundle/run_server.sh
+++ b/packaging/hudi-timeline-server-bundle/run_server.sh
@@ -18,7 +18,7 @@
 
 DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
 #Ensure we pick the right jar even for hive11 builds
-HOODIE_JAR=`ls -c $DIR/target/hudi-timeline-server-bundle-*.jar | grep -v test | head -1`
+HOODIE_JAR=`ls -c $DIR/target/hudi-timeline-server-bundle-*.jar | grep -v test | grep -v source | head -1`
 
 if [ -z "$HADOOP_HOME" ]; then
   echo "HADOOP_HOME not set. It must be set"



[GitHub] [incubator-hudi] leesf merged pull request #1477: [HUDI-749] Fix hudi-timeline-server-bundle./run_server.sh start error

2020-04-01 Thread GitBox
leesf merged pull request #1477: [HUDI-749] Fix 
hudi-timeline-server-bundle./run_server.sh start error
URL: https://github.com/apache/incubator-hudi/pull/1477
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] Trevor-zhang commented on a change in pull request #1477: [HUDI-749] Fix hudi-timeline-server-bundle./run_server.sh start error

2020-04-01 Thread GitBox
Trevor-zhang commented on a change in pull request #1477: [HUDI-749] Fix 
hudi-timeline-server-bundle./run_server.sh start error
URL: https://github.com/apache/incubator-hudi/pull/1477#discussion_r401647927
 
 

 ##
 File path: packaging/hudi-timeline-server-bundle/run_server.sh
 ##
 @@ -19,7 +19,7 @@
 
 DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
 #Ensure we pick the right jar even for hive11 builds
-HOODIE_JAR=`ls -c $DIR/target/hudi-timeline-server-bundle-*.jar | grep -v test | head -1`
+HOODIE_JAR=`ls c $DIR/target/hudi-timeline-server-bundle*.jar | grep -v test | grep -v source | head -1`
 
 Review comment:
   > nit:
   > change to
   > HOODIE_JAR=`ls -c $DIR/target/hudi-timeline-server-bundle-*.jar | grep -v test | grep -v source | head -1` should be ok (just need to add `| grep -v source`).
   
   It's my fault! 
   done.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] Trevor-zhang commented on a change in pull request #1477: [HUDI-749] Fix hudi-timeline-server-bundle./run_server.sh start error

2020-04-01 Thread GitBox
Trevor-zhang commented on a change in pull request #1477: [HUDI-749] Fix 
hudi-timeline-server-bundle./run_server.sh start error
URL: https://github.com/apache/incubator-hudi/pull/1477#discussion_r401647133
 
 

 ##
 File path: packaging/hudi-timeline-server-bundle/run_server.sh
 ##
 @@ -19,7 +19,7 @@
 
 DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
 #Ensure we pick the right jar even for hive11 builds
-HOODIE_JAR=`ls -c $DIR/target/hudi-timeline-server-bundle-*.jar | grep -v test | head -1`
+HOODIE_JAR=`ls c $DIR/target/hudi-timeline-server-bundle*.jar | grep -v test | grep -v source | head -1`
 
 Review comment:
   > nit: `ls c` to `ls -c`
   
   done.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Assigned] (HUDI-723) SqlTransformer's schema sometimes is not registered.

2020-04-01 Thread lamber-ken (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lamber-ken reassigned HUDI-723:
---

Assignee: (was: lamber-ken)

> SqlTransformer's schema sometimes is not registered. 
> -
>
> Key: HUDI-723
> URL: https://issues.apache.org/jira/browse/HUDI-723
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: DeltaStreamer
>Reporter: Alexander Filipchik
>Priority: Major
> Fix For: 0.6.0
>
>
> If the schema is inferred from RowBasedSchemaProvider when the SQL 
> transformer is used, it also needs to be registered. 
>  
> The current way only works if the SchemaProvider has a valid target schema. 
> If one wants to use the schema from the SQL transformation, the result of 
> RowBasedSchemaProvider.getTargetSchema needs to be passed into something like:
> {code:java}
> private void setupWriteClient(SchemaProvider schemaProvider) {
>   LOG.info("Setting up Hoodie Write Client");
>   registerAvroSchemas(schemaProvider);
>   HoodieWriteConfig hoodieCfg = getHoodieClientConfig(schemaProvider);
>   writeClient = new HoodieWriteClient<>(jssc, hoodieCfg, true);
>   onInitializingHoodieWriteClient.apply(writeClient);
> }
> {code}
> The existing method will not work, as it is checking for:
> {code:java}
> if ((null != schemaProvider) && (null == writeClient)) {
> {code}
> and writeClient is already configured. 
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] leesf commented on a change in pull request #1477: [HUDI-749] Fix hudi-timeline-server-bundle./run_server.sh start error

2020-04-01 Thread GitBox
leesf commented on a change in pull request #1477: [HUDI-749] Fix 
hudi-timeline-server-bundle./run_server.sh start error
URL: https://github.com/apache/incubator-hudi/pull/1477#discussion_r401610086
 
 

 ##
 File path: packaging/hudi-timeline-server-bundle/run_server.sh
 ##
 @@ -19,7 +19,7 @@
 
 DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
 #Ensure we pick the right jar even for hive11 builds
-HOODIE_JAR=`ls -c $DIR/target/hudi-timeline-server-bundle-*.jar | grep -v test | head -1`
+HOODIE_JAR=`ls c $DIR/target/hudi-timeline-server-bundle*.jar | grep -v test | grep -v source | head -1`
 
 Review comment:
   nit: 
   change to 
   HOODIE_JAR=\`ls -c $DIR/target/hudi-timeline-server-bundle-*.jar | grep -v test | grep -v source | head -1\` should be ok (just need to add `| grep -v source`).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] lamber-ken commented on a change in pull request #1477: [HUDI-749] Fix hudi-timeline-server-bundle./run_server.sh start error

2020-04-01 Thread GitBox
lamber-ken commented on a change in pull request #1477: [HUDI-749] Fix 
hudi-timeline-server-bundle./run_server.sh start error
URL: https://github.com/apache/incubator-hudi/pull/1477#discussion_r401600700
 
 

 ##
 File path: packaging/hudi-timeline-server-bundle/run_server.sh
 ##
 @@ -19,7 +19,7 @@
 
 DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
 #Ensure we pick the right jar even for hive11 builds
-HOODIE_JAR=`ls -c $DIR/target/hudi-timeline-server-bundle-*.jar | grep -v test | head -1`
+HOODIE_JAR=`ls c $DIR/target/hudi-timeline-server-bundle*.jar | grep -v test | grep -v source | head -1`
 
 Review comment:
   nit: `ls c` to `ls -c`


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Updated] (HUDI-749) packaging/hudi-timeline-server-bundle./run_server.sh start error

2020-04-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-749:

Labels: newbie pull-request-available  (was: newbie)

> packaging/hudi-timeline-server-bundle./run_server.sh start error
> 
>
> Key: HUDI-749
> URL: https://issues.apache.org/jira/browse/HUDI-749
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: newbie
>Reporter: leesf
>Assignee: Trevorzhang
>Priority: Minor
>  Labels: newbie, pull-request-available
> Fix For: 0.6.0
>
>
> ./run_server.sh fails to start.
> Changing 
> HOODIE_JAR=`ls -c $DIR/target/hudi-timeline-server-bundle-*.jar | grep -v 
> test | head -1` to 
> HOODIE_JAR=`ls -c $DIR/target/hudi-timeline-server-bundle-*.jar | grep -v 
> test | grep -v source | head -1`
> should fix the issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] Trevor-zhang opened a new pull request #1477: [HUDI-749] Fix packaging/hudi-timeline-server-bundle./run_server.sh s…

2020-04-01 Thread GitBox
Trevor-zhang opened a new pull request #1477: [HUDI-749] Fix 
packaging/hudi-timeline-server-bundle./run_server.sh s…
URL: https://github.com/apache/incubator-hudi/pull/1477
 
 
   …tart error
   
   
   ## What is the purpose of the pull request
   
   Fix packaging/hudi-timeline-server-bundle./run_server.sh s…
   
   ## Brief change log
   
   Fix the HOODIE_JAR selection in run_server.sh so it skips the sources jar.
   
   ## Verify this pull request
   This change can be verified by running run_server.sh and checking that the 
correct bundle jar is picked.
   
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] yanghua commented on a change in pull request #1471: [WIP][HUDI-752]Make CompactionAdminClient spark-free

2020-04-01 Thread GitBox
yanghua commented on a change in pull request #1471: [WIP][HUDI-752]Make 
CompactionAdminClient spark-free
URL: https://github.com/apache/incubator-hudi/pull/1471#discussion_r401510164
 
 

 ##
 File path: 
hudi-client/src/main/java/org/apache/hudi/client/utils/SparkEngineUtils.java
 ##
 @@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.client.utils;
+
+import org.apache.spark.SparkContext;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.api.java.function.FlatMapFunction;
+import org.apache.spark.api.java.function.Function;
+
+import java.util.List;
+
+/**
+ * Util class for Spark Engine.
+ */
+public class SparkEngineUtils {
 
 Review comment:
   Yes, I suggest we temporarily block this PR. We internally expect that an 
incomplete version based on the Flink implementation will be available this 
Friday. Can we look at its implementation and then discuss further?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Assigned] (HUDI-749) packaging/hudi-timeline-server-bundle./run_server.sh start error

2020-04-01 Thread Trevorzhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trevorzhang reassigned HUDI-749:


Assignee: Trevorzhang

> packaging/hudi-timeline-server-bundle./run_server.sh start error
> 
>
> Key: HUDI-749
> URL: https://issues.apache.org/jira/browse/HUDI-749
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: newbie
>Reporter: leesf
>Assignee: Trevorzhang
>Priority: Minor
>  Labels: newbie
> Fix For: 0.6.0
>
>
> ./run_server.sh fails to start.
> Changing 
> HOODIE_JAR=`ls -c $DIR/target/hudi-timeline-server-bundle-*.jar | grep -v 
> test | head -1` to 
> HOODIE_JAR=`ls -c $DIR/target/hudi-timeline-server-bundle-*.jar | grep -v 
> test | grep -v source | head -1`
> should fix the issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-69) Support realtime view in Spark datasource #136

2020-04-01 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-69?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17072488#comment-17072488
 ] 

Vinoth Chandar commented on HUDI-69:


[~garyli1019] Great.. Assigned to you!

> Support realtime view in Spark datasource #136
> --
>
> Key: HUDI-69
> URL: https://issues.apache.org/jira/browse/HUDI-69
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: Spark Integration
>Reporter: Vinoth Chandar
>Assignee: Yanjia Gary Li
>Priority: Major
> Fix For: 0.6.0
>
>
> https://github.com/uber/hudi/issues/136



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HUDI-69) Support realtime view in Spark datasource #136

2020-04-01 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-69?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar reassigned HUDI-69:
--

Assignee: Yanjia Gary Li  (was: Vinoth Chandar)

> Support realtime view in Spark datasource #136
> --
>
> Key: HUDI-69
> URL: https://issues.apache.org/jira/browse/HUDI-69
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: Spark Integration
>Reporter: Vinoth Chandar
>Assignee: Yanjia Gary Li
>Priority: Major
> Fix For: 0.6.0
>
>
> https://github.com/uber/hudi/issues/136



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HUDI-43) Introduce a WriteContext abstraction to HoodieWriteClient #384

2020-04-01 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-43?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar reassigned HUDI-43:
--

Assignee: Vinoth Chandar

> Introduce a WriteContext abstraction to HoodieWriteClient #384
> --
>
> Key: HUDI-43
> URL: https://issues.apache.org/jira/browse/HUDI-43
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Code Cleanup, Writer Core
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
>
> [https://github.com/uber/hudi/issues/384]
>  
> HoodieTable, WriteConfig, and other classes passed between "client" and 
> "io", etc., need to standardize on this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)