[GitHub] [incubator-hudi] HariprasadAllaka1612 edited a comment on issue #888: Exception in thread "main" com.uber.hoodie.exception.DatasetNotFoundException: Hoodie dataset not found in path

2019-09-17 Thread GitBox
HariprasadAllaka1612 edited a comment on issue #888: Exception in thread "main" 
com.uber.hoodie.exception.DatasetNotFoundException: Hoodie dataset not found in 
path
URL: https://github.com/apache/incubator-hudi/issues/888#issuecomment-532071607
 
 
   Hi Vinoth,
   
   Thanks for the reply.
   
   a. This is the first time I am writing data to that path.
   b. I tried with Overwrite, but I get the same exception. 
   
inputDF
  .write.format("com.uber.hoodie")
  .option(HoodieWriteConfig.TABLE_NAME, tablename)
  .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "GameId")
  .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY, "OperatorShortName")
  .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, "HandledTimestamp")
  .option(DataSourceWriteOptions.OPERATION_OPT_KEY, DataSourceWriteOptions.UPSERT_OPERATION_OPT_VAL)
  .option(DataSourceWriteOptions.STORAGE_TYPE_OPT_KEY, DataSourceWriteOptions.MOR_STORAGE_TYPE_OPT_VAL)
  .mode(SaveMode.Overwrite)
  .save("s3a://" + "gat-datalake-raw-dev" + "/Games3")
   
   Below is the exception with the Overwrite save mode, for your reference:
   Exception in thread "main" com.uber.hoodie.exception.DatasetNotFoundException: Hoodie dataset not found in path s3a://gat-datalake-raw-dev/Games3\.hoodie
       at com.uber.hoodie.exception.DatasetNotFoundException.checkValidDataset(DatasetNotFoundException.java:45)
       at com.uber.hoodie.common.table.HoodieTableMetaClient.<init>(HoodieTableMetaClient.java:91)
       at com.uber.hoodie.common.table.HoodieTableMetaClient.<init>(HoodieTableMetaClient.java:78)
       at com.uber.hoodie.common.table.HoodieTableMetaClient.initializePathAsHoodieDataset(HoodieTableMetaClient.java:310)
       at com.uber.hoodie.common.table.HoodieTableMetaClient.initTableType(HoodieTableMetaClient.java:248)
       at com.uber.hoodie.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:136)
       at com.uber.hoodie.DefaultSource.createRelation(DefaultSource.scala:91)
       at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
       at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
       at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
       at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
       at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
       at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
       at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
       at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
       at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
       at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
       at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
       at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
       at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:668)
       at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:668)
       at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
       at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
       at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
       at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:668)
       at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:276)
       at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:270)
       at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:228)
       at com.playngodataengg.scala.dao.DataAccessS3.writeDataToRefinedS3(DataAccessS3.scala:28)
       at com.playngodataengg.scala.controller.GameAndProviderDataTransform.processData(GameAndProviderDataTransform.scala:29)
       at com.playngodataengg.scala.action.GameAndProviderData$.main(GameAndProviderData.scala:10)
       at com.playngodataengg.scala.action.GameAndProviderData.main(GameAndProviderData.scala)
   
   And I wanted to provide the dependencies I have in my Maven project as well.
   
   <project xmlns="http://maven.apache.org/POM/4.0.0"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
     <modelVersion>4.0.0</modelVersion>
     <groupId>com.playngodataengg.scala</groupId>
     <artifactId>playngodataengg</artifactId>
     <version>1.0-SNAPSHOT</version>
     <inceptionYear>2008</inceptionYear>
     <properties>
       <scala.version>2.11.12</scala.version>
       <spark.version>2.4.0</spark.version>
       <scala.compat.version>2.11</scala.compat.version>
     </properties>

     <repositories>
       <repository>
         <id>scala-tools.org</id>
         <name>Scala-Tools Maven2 Repository</name>
         <url>http://scala-tools.org/repo-releases</url>
       </repository>
       <repository>
         <id>redshift</id>
         <url>http://redshift-maven-repository.s3-website-us-east-1.amazonaws.com/release</url>
       </repository>
     </repositories>

     <pluginRepositories>
       <pluginRepository>
         <id>scala-tools.org</id>
         <name>Scala-Tools Maven2 Repository</name>
         <url>http://scala-tools.org/repo-releases</url>
       </pluginRepository>
     </pluginRepositories>

     <dependencies>
       <dependency>
         <groupId>org.apache.httpcomponents</groupId>
         <artifactId>httpasyncclient</artifactId>
         <version>4.0.2</version>
         <exclusions>
           <exclusion>
             <groupId>org.apache.httpcomponents</groupId>
             <artifactId>httpcore</artifactId>
           </exclusion>
         </exclusions>
       </dependency>
       <dependency>
         <groupId>org.apache.httpcomponents</groupId>
         <artifactId>httpclient</artifactId>
         <version>4.5.2</version>
       </dependency>
     </dependencies>
   </project>

[GitHub] [incubator-hudi] firecast edited a comment on issue #894: Getting java.lang.NoSuchMethodError while doing Hive sync

2019-09-17 Thread GitBox
firecast edited a comment on issue #894: Getting java.lang.NoSuchMethodError 
while doing Hive sync
URL: https://github.com/apache/incubator-hudi/issues/894#issuecomment-532158394
 
 
   Will test that @vinothchandar and let you know. Just to put the whole setup 
I'm using into context, I am using IntelliJ IDEA to run the spark job locally. 
Here is a part of my build configuration. Am I supposed to add hudi-hive jars 
separately?
   
   ```sbt
   scalaVersion := "2.11.12"
   val sparkVersion = "2.4.3"
   
   libraryDependencies ++= Seq(
     "org.scala-lang" % "scala-compiler" % scalaVersion.value % "provided",
   
     "org.apache.spark" %% "spark-core" % sparkVersion % "provided",
     "org.apache.spark" %% "spark-sql" % sparkVersion % "provided",
     "org.apache.spark" %% "spark-hive" % sparkVersion % "provided",
   
     "org.apache.spark" %% "spark-sql-kafka-0-10" % sparkVersion % "provided",
   
     "com.databricks" %% "spark-avro" % "4.0.0",
   
     "org.apache.hadoop" % "hadoop-aws" % "2.8.5",
     "com.amazonaws" % "aws-java-sdk-s3" % "1.11.631",
   
     "com.facebook.presto" % "presto-jdbc" % "0.221",
     "io.spray" %% "spray-json" % "1.3.4",
     "io.minio" % "minio" % "6.0.11",
   
     "org.apache.hudi" % "hudi-spark-bundle" % "0.5.0-incubating-rc1" from "file:///Users/xxx/Documents/incubator-hudi/packaging/hudi-spark-bundle/target/hudi-spark-bundle-0.5.0-incubating-rc1.jar"
   )
   
   dependencyOverrides ++= Seq(
     "com.fasterxml.jackson.core" % "jackson-databind" % "2.6.7",
     "org.slf4j" % "slf4j-log4j12" % "1.7.28" % Test
   )
   ```




[jira] [Assigned] (HUDI-215) Update documentation for joining slack group

2019-09-17 Thread Pratyaksh Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pratyaksh Sharma reassigned HUDI-215:
-

Assignee: Pratyaksh Sharma

> Update documentation for joining slack group
> 
>
> Key: HUDI-215
> URL: https://issues.apache.org/jira/browse/HUDI-215
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Docs
>Reporter: Pratyaksh Sharma
>Assignee: Pratyaksh Sharma
>Priority: Major
>  Labels: documentation, newbie, pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently we have a list of pre-approved mail domains for joining apache-hudi 
> slack group. If anyone, whose mail-id is not present in that list, wants to 
> join the group, he/she has to check out github issue - 
> [https://github.com/apache/incubator-hudi/issues/143]. 
> However there is a documentation gap as this issue is not mentioned in the 
> documentation. This Jira is regarding updating the documentation to mention 
> this github issue in community.html page.





[jira] [Updated] (HUDI-215) Update documentation for joining slack group

2019-09-17 Thread Pratyaksh Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pratyaksh Sharma updated HUDI-215:
--
Status: Patch Available  (was: In Progress)

> Update documentation for joining slack group
> 
>
> Key: HUDI-215
> URL: https://issues.apache.org/jira/browse/HUDI-215
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Docs
>Reporter: Pratyaksh Sharma
>Assignee: Pratyaksh Sharma
>Priority: Major
>  Labels: documentation, newbie, pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently we have a list of pre-approved mail domains for joining apache-hudi 
> slack group. If anyone, whose mail-id is not present in that list, wants to 
> join the group, he/she has to check out github issue - 
> [https://github.com/apache/incubator-hudi/issues/143]. 
> However there is a documentation gap as this issue is not mentioned in the 
> documentation. This Jira is regarding updating the documentation to mention 
> this github issue in community.html page.





[jira] [Updated] (HUDI-215) Update documentation for joining slack group

2019-09-17 Thread Pratyaksh Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pratyaksh Sharma updated HUDI-215:
--
Status: Closed  (was: Patch Available)

> Update documentation for joining slack group
> 
>
> Key: HUDI-215
> URL: https://issues.apache.org/jira/browse/HUDI-215
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Docs
>Reporter: Pratyaksh Sharma
>Assignee: Pratyaksh Sharma
>Priority: Major
>  Labels: documentation, newbie, pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently we have a list of pre-approved mail domains for joining apache-hudi 
> slack group. If anyone, whose mail-id is not present in that list, wants to 
> join the group, he/she has to check out github issue - 
> [https://github.com/apache/incubator-hudi/issues/143]. 
> However there is a documentation gap as this issue is not mentioned in the 
> documentation. This Jira is regarding updating the documentation to mention 
> this github issue in community.html page.





[GitHub] [incubator-hudi] HariprasadAllaka1612 opened a new issue #898: Caused by: org.apache.hudi.exception.HoodieUpsertException: Failed to upsert for commit time

2019-09-17 Thread GitBox
HariprasadAllaka1612 opened a new issue #898: Caused by: 
org.apache.hudi.exception.HoodieUpsertException: Failed to upsert for commit 
time
URL: https://github.com/apache/incubator-hudi/issues/898
 
 
   I am trying to write data to S3 as a Hudi file.
   
   Code:
inputDF
  .write.format("org.apache.hudi")
  .option(HoodieWriteConfig.TABLE_NAME, tablename)
  .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "GameId")
  .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY, "OperatorShortName")
  .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, "HandledTimestamp")
  .option(DataSourceWriteOptions.OPERATION_OPT_KEY, DataSourceWriteOptions.UPSERT_OPERATION_OPT_VAL)
  .option(DataSourceWriteOptions.STORAGE_TYPE_OPT_KEY, DataSourceWriteOptions.MOR_STORAGE_TYPE_OPT_VAL)
  .mode(SaveMode.Append)
  .save("s3a://" + s3RawDataLakeBucket + "/Games2")
   
   I am coming across the exception below:
   
   Exception in thread "main" java.lang.reflect.InvocationTargetException
       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
       at java.lang.reflect.Method.invoke(Method.java:498)
       at com.intellij.rt.execution.CommandLineWrapper.main(CommandLineWrapper.java:66)
   Caused by: org.apache.hudi.exception.HoodieUpsertException: Failed to upsert for commit time 20190917110809
       at org.apache.hudi.HoodieWriteClient.upsert(HoodieWriteClient.java:177)
       at org.apache.hudi.DataSourceUtils.doWriteOperation(DataSourceUtils.java:181)
       at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:143)
       at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:91)
       at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
       at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
       at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
       at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
       at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
       at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
       at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
       at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
       at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
       at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
       at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
       at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
       at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:668)
       at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:668)
       at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
       at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
       at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
       at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:668)
       at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:276)
       at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:270)
       at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:228)
       at com.playngoplatform.scala.dao.DataAccessS3.writeDataToRefinedS3(DataAccessS3.scala:30)
       at com.playngoplatform.scala.controller.GameAndProviderDataTransform.processData(GameAndProviderDataTransform.scala:27)
       at com.playngoplatform.scala.action.GameAndProviderData$.main(GameAndProviderData.scala:10)
       at com.playngoplatform.scala.action.GameAndProviderData.main(GameAndProviderData.scala)
       ... 5 more
   Caused by: java.lang.NoSuchMethodError: org.apache.hadoop.fs.FSDataOutputStream: method <init>(Ljava/io/OutputStream;)V not found
       at org.apache.hudi.common.io.storage.SizeAwareFSDataOutputStream.<init>(SizeAwareFSDataOutputStream.java:46)
       at org.apache.hudi.common.io.storage.HoodieWrapperFileSystem.wrapOutputStream(HoodieWrapperFileSystem.java:160)
       at org.apache.hudi.common.io.storage.HoodieWrapperFileSystem.create(HoodieWrapperFileSystem.java:168)
       at

[jira] [Updated] (HUDI-228) Add Jira Conventions to contributing/community pages of HUDI

2019-09-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-228:

Labels: pull-request-available  (was: )

> Add Jira Conventions to contributing/community pages of HUDI
> 
>
> Key: HUDI-228
> URL: https://issues.apache.org/jira/browse/HUDI-228
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: asf-migration, newbie
>Reporter: BALAJI VARADARAJAN
>Assignee: Pratyaksh Sharma
>Priority: Minor
>  Labels: pull-request-available
>
> When filing or updating a JIRA for Apache Hudi, kindly make sure
>  
> (a) the issue type and versions (when resolving the ticket) are set correctly.
> (b) Also, the summary needs to be descriptive enough to catch the essence of 
> the problem/features.
> (c) Capture the version of Hoodie/Spark/Hive/Hadoop/Cloud environments in the 
> ticket
>  
> When opening a github PR corresponding to the JIRA, reference Jira-id in the 
> commit message
>  





[jira] [Created] (HUDI-255) Translate Talks & Powered By page

2019-09-17 Thread leesf (Jira)
leesf created HUDI-255:
--

 Summary: Translate Talks & Powered By page
 Key: HUDI-255
 URL: https://issues.apache.org/jira/browse/HUDI-255
 Project: Apache Hudi (incubating)
  Issue Type: Sub-task
Reporter: leesf
Assignee: leesf
 Fix For: 0.5.0


The online HTML web page: [https://hudi.apache.org/powered_by.html]





[GitHub] [incubator-hudi] pratyakshsharma opened a new pull request #899: [HUDI-228] Contributing page updated to include JIRA guidelines

2019-09-17 Thread GitBox
pratyakshsharma opened a new pull request #899: [HUDI-228] Contributing page 
updated to include JIRA guidelines
URL: https://github.com/apache/incubator-hudi/pull/899
 
 
   Jira - HUDI-228




[jira] [Resolved] (HUDI-228) Add Jira Conventions to contributing/community pages of HUDI

2019-09-17 Thread Pratyaksh Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pratyaksh Sharma resolved HUDI-228.
---
Fix Version/s: 0.5.0
   Resolution: Fixed

> Add Jira Conventions to contributing/community pages of HUDI
> 
>
> Key: HUDI-228
> URL: https://issues.apache.org/jira/browse/HUDI-228
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: asf-migration, newbie
>Reporter: BALAJI VARADARAJAN
>Assignee: Pratyaksh Sharma
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When filing or updating a JIRA for Apache Hudi, kindly make sure
>  
> (a) the issue type and versions (when resolving the ticket) are set correctly.
> (b) Also, the summary needs to be descriptive enough to catch the essence of 
> the problem/features.
> (c) Capture the version of Hoodie/Spark/Hive/Hadoop/Cloud environments in the 
> ticket
>  
> When opening a github PR corresponding to the JIRA, reference Jira-id in the 
> commit message
>  





[GitHub] [incubator-hudi] leesf commented on issue #900: [docs][chinese] update permalink for translated pages(quickstart.cn.md, use_cases.cn.md)

2019-09-17 Thread GitBox
leesf commented on issue #900: [docs][chinese] update permalink for translated 
pages(quickstart.cn.md, use_cases.cn.md)
URL: https://github.com/apache/incubator-hudi/pull/900#issuecomment-532177020
 
 
   cc @vinothchandar 




[GitHub] [incubator-hudi] leesf opened a new pull request #900: [docs][chinese] update permalink for translated pages(quickstart.cn.md, use_cases.cn.md)

2019-09-17 Thread GitBox
leesf opened a new pull request #900: [docs][chinese] update permalink for 
translated pages(quickstart.cn.md, use_cases.cn.md)
URL: https://github.com/apache/incubator-hudi/pull/900
 
 
   Fix permalink.




[jira] [Assigned] (HUDI-253) DeltaStreamer should report nicer error messages for misconfigs

2019-09-17 Thread Pratyaksh Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pratyaksh Sharma reassigned HUDI-253:
-

Assignee: Pratyaksh Sharma

> DeltaStreamer should report nicer error messages for misconfigs
> ---
>
> Key: HUDI-253
> URL: https://issues.apache.org/jira/browse/HUDI-253
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: deltastreamer
>Reporter: Vinoth Chandar
>Assignee: Pratyaksh Sharma
>Priority: Major
>
> e.g: 
> https://lists.apache.org/thread.html/4fdcdd7ba77a4f0366ec0e95f54298115fcc9567f6b0c9998f1b92b7@
>  





[jira] [Created] (HUDI-256) Translate Comparison page

2019-09-17 Thread leesf (Jira)
leesf created HUDI-256:
--

 Summary: Translate Comparison page
 Key: HUDI-256
 URL: https://issues.apache.org/jira/browse/HUDI-256
 Project: Apache Hudi (incubating)
  Issue Type: Sub-task
  Components: docs-chinese
Reporter: leesf
Assignee: leesf
 Fix For: 0.5.0


The online HTML web page: [https://hudi.apache.org/comparison.html]





[jira] [Updated] (HUDI-255) Translate Talks & Powered By page

2019-09-17 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-255:
---
Component/s: docs-chinese

> Translate Talks & Powered By page
> -
>
> Key: HUDI-255
> URL: https://issues.apache.org/jira/browse/HUDI-255
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: docs-chinese
>Reporter: leesf
>Assignee: leesf
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The online HTML web page: [https://hudi.apache.org/powered_by.html]





[jira] [Updated] (HUDI-255) Translate Talks & Powered By page

2019-09-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-255:

Labels: pull-request-available  (was: )

> Translate Talks & Powered By page
> -
>
> Key: HUDI-255
> URL: https://issues.apache.org/jira/browse/HUDI-255
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>Reporter: leesf
>Assignee: leesf
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>
> The online HTML web page: [https://hudi.apache.org/powered_by.html]





[GitHub] [incubator-hudi] leesf opened a new pull request #901: [HUDI-255] Translate Talks & Powered By page

2019-09-17 Thread GitBox
leesf opened a new pull request #901: [HUDI-255] Translate Talks & Powered By 
page
URL: https://github.com/apache/incubator-hudi/pull/901
 
 
   see [jira](https://jira.apache.org/jira/browse/HUDI-255)
   
   cc @yihua PTAL when you are free. Thanks.




[GitHub] [incubator-hudi] bhasudha commented on a change in pull request #896: Updating site to reflect recent doc changes

2019-09-17 Thread GitBox
bhasudha commented on a change in pull request #896: Updating site to reflect 
recent doc changes
URL: https://github.com/apache/incubator-hudi/pull/896#discussion_r325156491
 
 

 ##
 File path: content/404.html
 ##
 @@ -6,25 +6,25 @@
 
 
 Page Not Found | Hudi
-
+
 
 Review comment:
   @yanghua can you help with the removing the '/' changes in the original PR 
[#843](https://github.com/apache/incubator-hudi/pull/843)?




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #896: Updating site to reflect recent doc changes

2019-09-17 Thread GitBox
vinothchandar commented on a change in pull request #896: Updating site to 
reflect recent doc changes
URL: https://github.com/apache/incubator-hudi/pull/896#discussion_r325244857
 
 

 ##
 File path: content/404.html
 ##
 @@ -6,25 +6,25 @@
 
 
 Page Not Found | Hudi
-
+
 
 Review comment:
   @yanghua I find it to be the opposite.. you can try opening `content/index.html` as a plain file in the browser and it seems to break the css/js. Can you point @bhasudha, maybe, to the line that changed this behavior? We can test it out.




[incubator-hudi] branch master updated: [HUDI-121] Update Release notes and fix master version

2019-09-17 Thread vbalaji
This is an automated email from the ASF dual-hosted git repository.

vbalaji pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new c1e7d0e  [HUDI-121] Update Release notes and fix master version
c1e7d0e is described below

commit c1e7d0e5a6ebc6fa416fc463ea6526f64a39afc5
Author: Balaji Varadarajan 
AuthorDate: Tue Sep 17 09:50:00 2019 -0700

[HUDI-121] Update Release notes and fix master version
---
 RELEASE_NOTES.md  | 3 +++
 docker/hoodie/hadoop/base/pom.xml | 2 +-
 docker/hoodie/hadoop/datanode/pom.xml | 2 +-
 docker/hoodie/hadoop/historyserver/pom.xml| 2 +-
 docker/hoodie/hadoop/hive_base/pom.xml| 2 +-
 docker/hoodie/hadoop/namenode/pom.xml | 2 +-
 docker/hoodie/hadoop/pom.xml  | 2 +-
 docker/hoodie/hadoop/prestobase/pom.xml   | 2 +-
 docker/hoodie/hadoop/spark_base/pom.xml   | 2 +-
 docker/hoodie/hadoop/sparkadhoc/pom.xml   | 2 +-
 docker/hoodie/hadoop/sparkmaster/pom.xml  | 2 +-
 docker/hoodie/hadoop/sparkworker/pom.xml  | 2 +-
 hudi-cli/pom.xml  | 2 +-
 hudi-client/pom.xml   | 2 +-
 hudi-common/pom.xml   | 2 +-
 hudi-hadoop-mr/pom.xml| 2 +-
 hudi-hive/pom.xml | 2 +-
 hudi-integ-test/pom.xml   | 2 +-
 hudi-spark/pom.xml| 2 +-
 hudi-timeline-service/pom.xml | 2 +-
 hudi-utilities/pom.xml| 2 +-
 packaging/hudi-hadoop-mr-bundle/pom.xml   | 2 +-
 packaging/hudi-hive-bundle/pom.xml| 2 +-
 packaging/hudi-presto-bundle/pom.xml  | 2 +-
 packaging/hudi-spark-bundle/pom.xml   | 2 +-
 packaging/hudi-timeline-server-bundle/pom.xml | 2 +-
 packaging/hudi-utilities-bundle/pom.xml   | 2 +-
 pom.xml   | 2 +-
 28 files changed, 30 insertions(+), 27 deletions(-)

diff --git a/RELEASE_NOTES.md b/RELEASE_NOTES.md
index 45cf4db..fbbca8f 100644
--- a/RELEASE_NOTES.md
+++ b/RELEASE_NOTES.md
@@ -8,6 +8,9 @@ Release 0.5.0-incubating
  * Bug fixes in query side integration, hive-sync, deltaStreamer, compaction, 
rollbacks, restore
 
 ### Full PR List
+  * **Balaji Varadarajan** [HUDI-257] Fix Bloom Index unit-test failures
+  * **Balaji Varadarajan** [HUDI-252] Add Disclaimer and cleanup NOTICE and 
LICENSE files in hudi. Identify packages which are under non-apache license in 
LICENSE file
+  * **Taher Koitwala** [HUDI-62] Index Lookup Timer added to HoodieWriteClient
   * **Balaji Varadarajan** [HUDI-249] Update Release-notes. Add sign-artifacts 
to POM and release related scripts. Add missing license headers and update 
NOTICE.txt files
   * **Vinoth Chandar** [HUDI-244] : Hive sync should escape partition field 
name - now supports field names beginning with '_' for e.g
   * **Balaji Varadarajan** [HUDI-250] Ensure Hudi CLI wrapper works with non 
snapshot jars too
diff --git a/docker/hoodie/hadoop/base/pom.xml 
b/docker/hoodie/hadoop/base/pom.xml
index 4e4be4b..52dd2a8 100644
--- a/docker/hoodie/hadoop/base/pom.xml
+++ b/docker/hoodie/hadoop/base/pom.xml
@@ -19,7 +19,7 @@
   
 hudi-hadoop-docker
 org.apache.hudi
-0.6.0-SNAPSHOT
+0.5.1-SNAPSHOT
   
   4.0.0
   pom
diff --git a/docker/hoodie/hadoop/datanode/pom.xml 
b/docker/hoodie/hadoop/datanode/pom.xml
index 667f978..23cb64d 100644
--- a/docker/hoodie/hadoop/datanode/pom.xml
+++ b/docker/hoodie/hadoop/datanode/pom.xml
@@ -19,7 +19,7 @@
   
 hudi-hadoop-docker
 org.apache.hudi
-0.6.0-SNAPSHOT
+0.5.1-SNAPSHOT
   
   4.0.0
   pom
diff --git a/docker/hoodie/hadoop/historyserver/pom.xml 
b/docker/hoodie/hadoop/historyserver/pom.xml
index 1779ecc..d35e940 100644
--- a/docker/hoodie/hadoop/historyserver/pom.xml
+++ b/docker/hoodie/hadoop/historyserver/pom.xml
@@ -19,7 +19,7 @@
   
 hudi-hadoop-docker
 org.apache.hudi
-0.6.0-SNAPSHOT
+0.5.1-SNAPSHOT
   
   4.0.0
   pom
diff --git a/docker/hoodie/hadoop/hive_base/pom.xml 
b/docker/hoodie/hadoop/hive_base/pom.xml
index 66301c1..2f7c2b5 100644
--- a/docker/hoodie/hadoop/hive_base/pom.xml
+++ b/docker/hoodie/hadoop/hive_base/pom.xml
@@ -19,7 +19,7 @@
   
 hudi-hadoop-docker
 org.apache.hudi
-0.6.0-SNAPSHOT
+0.5.1-SNAPSHOT
   
   4.0.0
   pom
diff --git a/docker/hoodie/hadoop/namenode/pom.xml 
b/docker/hoodie/hadoop/namenode/pom.xml
index 8c63cdc..a996f57 100644
--- a/docker/hoodie/hadoop/namenode/pom.xml
+++ b/docker/hoodie/hadoop/namenode/pom.xml
@@ -19,7 +19,7 @@
   
 hudi-hadoop-docker
 org.apache.hudi
-0.6.0-SNAPSHOT
+0.5.1-SNAPSHOT
   
   4.0.0
   pom
diff --git a/docker/hoodie/hadoop/pom.xml b/docker/hoodie/hadoop/pom.xml
index 4446255..fff962f 100644
--- a/docker/hoodie/hadoop/pom.xml
+++ b/docker/hoodie/hadoop/pom.xml
@@ -19,7 

svn commit: r35916 - /release/incubator/hudi/KEYS

2019-09-17 Thread vbalaji
Author: vbalaji
Date: Tue Sep 17 18:27:29 2019
New Revision: 35916

Log:
Adding KEYS to HUDI release dist


Added:
release/incubator/hudi/KEYS

Added: release/incubator/hudi/KEYS
==
--- release/incubator/hudi/KEYS (added)
+++ release/incubator/hudi/KEYS Tue Sep 17 18:27:29 2019
@@ -0,0 +1,275 @@
+This file contains the PGP keys of various developers.
+
+Users: pgp < KEYS
+   gpg --import KEYS
+Developers:
+pgp -kxa <your name> and append it to this file.
+(pgpk -ll <your name> && pgpk -xa <your name>) >> this file.
+(gpg --list-sigs <your name>
+ && gpg --armor --export <your name>) >> this file.
+
+pub   4096R/D3541808 2014-01-09
+uid   [ultimate] Suneel Marthi (CODE SIGNING KEY) 
+sig 3D3541808 2014-01-09  Suneel Marthi (CODE SIGNING KEY) 

+sub   4096R/AF46E2DE 2014-01-09
+sig  D3541808 2014-01-09  Suneel Marthi (CODE SIGNING KEY) 

+
+-BEGIN PGP PUBLIC KEY BLOCK-
+Comment: GPGTools - https://gpgtools.org
+
+mQINBFLPJmEBEAC9d/dUZCXeyhB0fVGmJAjdjXfLebav4VqGdNZC+M1T9C3dcVsh
+X/JGme5bjJeIgVwiH5UsdNceYn1+hyxs8jXuRAWEWKP76gD+pNrp8Az0ZdBkJoAy
+zCywOPtJV2PCOz7+S5ri2nUA2+1Kgcu6IlSLMmYAGO0IAmRrjBEzxy9iGaxiNGTc
+LvQt/iVtIXWkKKI8yvpoJ8iFf3TGhpjgaC/h7cJP3zpy0SScmhJJASLXRsfocLv9
+sle6ndN9IPbDtRW8cL7Fk3VQlzp1ToVjmnQTyZZ6S1WafsjzCZ9hLN+k++o8VbvY
+v3icY6Sy0BKz0J6KwaxTkuZ6w1K7oUkVOQboKaWFIEdO+jwrEmU+Puyd8Np8jLnF
+Q0Y5GPfyMlqM3S/zaDm1t4D1eb5FLciStkxfg5wPVK6TkqB325KVD3aio5C7E7kt
+aQechHxaJXCQOtCtVY4X+L4iClnMSuk+hcSc8W8MYRTSVansItK0vI9eQZXMnpan
+w9/jk5rS4Gts1rHB7+kdjT3QRJmkyk6fEFT0fz5tfMC7N8waeEUhCaRW6lAoiqDW
+NW1h+0UGxJw+9YcGxBC0kkt3iofNOWQWmuf/BS3DHPKT7XV/YtBHe44wW0sF5L5P
+nfQUHpnA3pcZ0En6bXAvepKVZTNdOWWJqMyHV+436DA+33h45QL6lWb/GwARAQAB
+tDVTdW5lZWwgTWFydGhpIChDT0RFIFNJR05JTkcgS0VZKSA8c21hcnRoaUBhcGFj
+aGUub3JnPokCNwQTAQoAIQUCUs8mYQIbAwULCQgHAwUVCgkICwUWAgMBAAIeAQIX
+gAAKCRC08czE01QYCOKKEAChRtHBoYNTX+RZbFO0Kl1GlN+i1Ik0shEm5ZJ56XHv
+AnFx/gRK7CfZzJswWo7kf2s/dvJiFfs+rrolYVuO6E8gNhAaTEomSuvWQAMHdPcR
+9G5APRKCSkbZYugElqplEbSphk78FKoFO+sml52M7Pr9jj88ApBjoFVVY8njdnNq
+6DVlaDsg8YninCD78Z7PNFnRGwxyZ8Qd4Dh0rG+MUTfAWopZu6/MxpQxU7QpeVeX
+SIMLg7ClFrGfXnZcszYF4dnav1aa0i7W88PAdYNPko7tC5qz5yv2ep7t2gRbcYKf
+RXhYC2FHQey3wPhMKjA8V436lAqmfYnY/YdmhEy9Xq/1EdX1nHsQ7OEkfgXK14WM
+F+rnqXRAl/0cwiyb41eocdg5kpZFIKgCYT02usLWxwNnd3jOCe109Ze3y3acN/G8
++xOf9YRfNVAe6pD8H6ieRbv9gRjBmsbz9bXQCmxFnDqxNri5Me6gBAQPNmYTJD0h
+jgJTK6o0vJ0pwjBLauasJsLu+1tR3Cb0dxPE+JVaTF26FCd7pM7W6KdVfod9ZfrN
+cSyJ/cECc2KvYVGmTjQNVo1dYG0awBachlWnYNt+0Qx4opLsczZOLtPKtFY4BJA7
+aZoXT4Qf9yB8km7x2/cgNExVbFummToJ/IP3M39/EaryspsQQuM5Qu5Q5lZp8Qnn
+ybkCDQRSzyZhARAA7bAawFzbJaghYnm6mTZyGG5hQmfAynbF6cPAE+g2SnXcNQjP
+6kjYx3tSpb7rEzmjQqs46ztqdec6PIVBMhakON6z27Zz+IviAtO/TcaZHWNuCAjw
+FXVQZ+tYsSeiKInttfkrQc8jXAHWwSkSjLqNpvQpBdBEX80MYkFB6ZPOeON2+/Ta
+GC1H/HU2YngF0qQSmG33KKG6ezihBJdKxU6t2tsQfTlCmZW6R6MGpS9fVurYMKBk
+vR+7RGZ/H6dSjWPcpxhusGg92J9uz7r5SopN1wSdyPMUCMAFGeyoxcAuBDl38quU
+H/ENG3x5LDPq2aEH2AJ6yvZfIXbeJ1zmXf2cAHv+HbmvZaTSp0XIjq8Yxh8NkYEC
+ZdfRWmsGLIpU16TkBijpK3Dn9MDXjHGT3V8/qfdpURtMvIaL8WFrq9ejcy/vGRFn
+mCYqxIIPH+vLiMXKWtuMc61GN3ES21msKQH6IuQxxfQLyhK44L/pv7FpF4E+6LaE
+8uRwAex5HIDpR1v4aJq089rRtye9VXTJJLZ7lYs0HctdZ30QbBRWT4jS9d9rj3cr
+HgQ7mIGO9TAfK2kWc6AJN/EvxPWNbOwptsTUzAF/adiy9ax8C18iw7nKczC+2eN6
+UcbxXiPdytuKYK7O9A8S9e1w89GwpxYN7Xfn2o6QfpSbL9cLKiinOeV+xikAEQEA
+AYkCHwQYAQoACQUCUs8mYQIbDAAKCRC08czE01QYCG7yD/471dmyOD+go8cZkdqR
+3CHhjH03odtI0EJNVy4VGEC0r9paz3BWYTy18LqWYkw3ygphOIU1r8/7QK3H5Ke3
+c4yCSUxaMk5SlAJ+iVRek5TABkR8+zI+ZN5pQtqRH+ya5JxV4F/Sx5Q3KWMzpvgY
+n6AgSSc3hEfkgdI7SalIeyLaLDWv+RFdGZ5JU5gD28C0G8BeH8L62x6sixZcqoGT
+oy9rwkjs45/ZmmvBZhd1wLvC/au8l2Ecou6O8+8m26W8Z7vCuGKxuWn0KV3DLLWe
+66uchDVlakGoMJSPIK06JWYUlE+gL0CW+U2ekt/v2qb8hGgMVET3CBAMq+bFWuJ6
+juX7hJd7wHtCFfjnFDDAkdp2IIIZAlBW6FZGv7pJ82xsW6pSAg0A7VrV6nTtMtDv
+T8esOfo/t4t0gaL7bivy9DVVdATbUBcJJFpoVoe5MxiyjptveqPzIRwzt04n52Ph
+ordVWAnX5AokXWTg+Glem/EWEuf7jUuZArfqCSl/sZoQdXGTjR7G4iFscispji4+
+kNjVQsItqFbgDpuc6n+GcFxlKQ7YMCnu5MVtTV01U4lFs0qy0NTUqsuR35DM4z14
+DkFmj1upWAayCoXTpKzsHBvJZPC+Wqf9Pl3O47apelg7KxU3S011YfXpVPvCTKBv
+kD2o/5GKWS5QkSUEUXXY1oDiLg==
+=f8kJ
+-END PGP PUBLIC KEY BLOCK-
+
+pub   rsa4096 2019-07-29 [SC]
+  AF9BAF79D311A3D3288E583F24A499037262AAA4
+uid   [ultimate] Balaji Varadarajan 
+sig 324A499037262AAA4 2019-07-29  Balaji Varadarajan 

+sub   rsa4096 2019-07-29 [E]
+sig  24A499037262AAA4 2019-07-29  Balaji Varadarajan 

+
+-BEGIN PGP PUBLIC KEY BLOCK-
+
+mQINBF0+XtEBEADpNIZkDKZrwrHy7x8uJBSelnMGvd9z6+PYmvWYVvoGnjipjC7L
+fXzaZGofmKxDEKtQI5ip/4DlX/vRVjwNdaPfelLCPN+dZy73m2NcYH2v9OgVNf/L
+L6eqispkqIbmGRwJqq3YfsrDSqlJ5gS9B7/rSUyKx33sKzm0uHT+E/fg45q8AJBn
+ef/Y2zvSu7Stv9wYrXGBrOlwBpiRUoobcF7utAtLcr18DLgRD3K3trWpjLJqFf6O
+LDiFR25VmCQ6Lr/vPKICil75Z91CgRzkHl44drZffzqOzljz62nawSMhxzuX8ryO
+pTG8Wq3U1dS3699iCgMPYeHB4C43c0ieZf/+y7uJD7GwW7Jfnc1GuN3OwiDA16yh
+NfDQhhXlZf+iKAOBhkIGqYgy2+l587etTZqUBKWIjxwVobhX6VHKXDTC7YYxnw8n
+4emuF4nxC5ySfuJBaMFCTBvgALoBPJA4spS+uBFVygM7/ZMR2KUywhajqbpm4iEw

svn commit: r35917 - /release/hudi/

2019-09-17 Thread vbalaji
Author: vbalaji
Date: Tue Sep 17 18:28:58 2019
New Revision: 35917

Log:
Removing HUDI directory from top-level and moving to incubator


Removed:
release/hudi/



[GitHub] [incubator-hudi] umehrot2 commented on issue #869: Hudi Spark error when spark bundle jar is added to spark's classpath

2019-09-17 Thread GitBox
umehrot2 commented on issue #869: Hudi Spark error when spark bundle jar is 
added to spark's classpath
URL: https://github.com/apache/incubator-hudi/issues/869#issuecomment-532352413
 
 
   @vinothchandar Yes, will continue the discussion there. Will test my job, as well as the demo setup, by dropping the hudi spark bundle (with shaded databricks avro) into the Spark jars folder.




[jira] [Updated] (HUDI-254) Provide mechanism for installing hudi-spark-bundle onto an existing spark installation

2019-09-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-254:

Labels: pull-request-available  (was: )

> Provide mechanism for installing hudi-spark-bundle onto an existing spark 
> installation
> --
>
> Key: HUDI-254
> URL: https://issues.apache.org/jira/browse/HUDI-254
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Spark datasource, SparkSQL Support
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
>  Labels: pull-request-available
>
> A lot of discussions around this kicked off from 
> [https://github.com/apache/incubator-hudi/issues/869] 
> Breaking down into phases, when we drop the hudi-spark-bundle*.jar onto the 
> `jars` folder 
>  
> a) Writing data via Hudi datasource should work 
> b) Spark datasource reads should work
>  
> c)  a + Hive Sync should work
> d) SparkSQL on Hive synced table works 
>  
> Start with Spark 2.3 (current demo setup) and then proceed to 2.4 and iron 
> out issues.
>  





[GitHub] [incubator-hudi] vinothchandar opened a new pull request #903: [HUDI-254]: Bundle and shade databricks/avro with spark bundle

2019-09-17 Thread GitBox
vinothchandar opened a new pull request #903: [HUDI-254]: Bundle and shade 
databricks/avro with spark bundle
URL: https://github.com/apache/incubator-hudi/pull/903
 
 
- Spark 2.4 onwards, Spark has built-in support; shading avoids conflicts
- Spark 2.3 still needs this bundled, so that dropping the bundle into the jars folder works
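
For readers following along in sbt rather than Maven, a rough analogue of this kind of relocation (the PR itself does the shading with the maven-shade-plugin; the relocation prefix below is assumed, not taken from the PR) would be:

```sbt
// sbt-assembly shade rule: relocate the bundled com.databricks.spark.avro classes
// under a project-specific prefix so they cannot conflict with Spark's own classes.
assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("com.databricks.spark.avro.**" -> "org.apache.hudi.com.databricks.spark.avro.@1").inAll
)
```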




[GitHub] [incubator-hudi] HariprasadAllaka1612 commented on issue #898: Caused by: org.apache.hudi.exception.HoodieUpsertException: Failed to upsert for commit time

2019-09-17 Thread GitBox
HariprasadAllaka1612 commented on issue #898: Caused by: 
org.apache.hudi.exception.HoodieUpsertException: Failed to upsert for commit 
time
URL: https://github.com/apache/incubator-hudi/issues/898#issuecomment-532378362
 
 
   I will change the version of hadoop to 2.8.4 and check if it works.
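
   A quick way to confirm the mismatch behind the NoSuchMethodError, as a hedged diagnostic sketch (the object name is made up), is to print what is actually on the classpath:

   ```scala
   import org.apache.hadoop.fs.FSDataOutputStream
   import org.apache.hadoop.util.VersionInfo

   // Hypothetical diagnostic: print the Hadoop version on the classpath and the
   // constructors FSDataOutputStream actually exposes. The stack trace above says
   // the <init>(Ljava/io/OutputStream;)V form is missing, which is the Hadoop 3
   // symptom tracked in HUDI-259.
   object HadoopClasspathCheck extends App {
     println(s"Hadoop version on classpath: ${VersionInfo.getVersion}")
     classOf[FSDataOutputStream].getConstructors.foreach(println)
   }
   ```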




[jira] [Updated] (HUDI-259) Hadoop 3 support for Hudi writing

2019-09-17 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-259:

Description: 
Sample issues

 

[https://github.com/apache/incubator-hudi/issues/735]

[https://github.com/apache/incubator-hudi/issues/877#issuecomment-528433568] 

[https://github.com/apache/incubator-hudi/issues/898]

 

> Hadoop 3 support for Hudi writing
> -
>
> Key: HUDI-259
> URL: https://issues.apache.org/jira/browse/HUDI-259
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Usability
>Reporter: Vinoth Chandar
>Priority: Major
>
> Sample issues
>  
> [https://github.com/apache/incubator-hudi/issues/735]
> [https://github.com/apache/incubator-hudi/issues/877#issuecomment-528433568] 
> [https://github.com/apache/incubator-hudi/issues/898]
>  





[incubator-hudi] tag release-0.5.0-incubating-rc2 created (now ffa2be3)

2019-09-17 Thread vbalaji
This is an automated email from the ASF dual-hosted git repository.

vbalaji pushed a change to tag release-0.5.0-incubating-rc2
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git.


  at ffa2be3  (commit)
No new revisions were added by this update.



[jira] [Resolved] (HUDI-62) Add metrics around IOHandle times #297

2019-09-17 Thread BALAJI VARADARAJAN (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-62?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BALAJI VARADARAJAN resolved HUDI-62.

Resolution: Fixed

> Add metrics around IOHandle times #297
> --
>
> Key: HUDI-62
> URL: https://issues.apache.org/jira/browse/HUDI-62
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: newbie, Performance, Write Client
>Reporter: Vinoth Chandar
>Assignee: Taher Koitawala
>Priority: Major
>  Labels: pull-request-available, realtime-data-lakes
> Fix For: 0.5.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> https://github.com/uber/hudi/issues/297





[jira] [Closed] (HUDI-62) Add metrics around IOHandle times #297

2019-09-17 Thread BALAJI VARADARAJAN (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-62?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BALAJI VARADARAJAN closed HUDI-62.
--

> Add metrics around IOHandle times #297
> --
>
> Key: HUDI-62
> URL: https://issues.apache.org/jira/browse/HUDI-62
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: newbie, Performance, Write Client
>Reporter: Vinoth Chandar
>Assignee: Taher Koitawala
>Priority: Major
>  Labels: pull-request-available, realtime-data-lakes
> Fix For: 0.5.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> https://github.com/uber/hudi/issues/297





[jira] [Updated] (HUDI-62) Add metrics around IOHandle times #297

2019-09-17 Thread BALAJI VARADARAJAN (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-62?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BALAJI VARADARAJAN updated HUDI-62:
---
Fix Version/s: 0.5.0

> Add metrics around IOHandle times #297
> --
>
> Key: HUDI-62
> URL: https://issues.apache.org/jira/browse/HUDI-62
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: newbie, Performance, Write Client
>Reporter: Vinoth Chandar
>Assignee: Taher Koitawala
>Priority: Major
>  Labels: pull-request-available, realtime-data-lakes
> Fix For: 0.5.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> https://github.com/uber/hudi/issues/297





svn commit: r35918 - /dev/hudi/

2019-09-17 Thread vbalaji
Author: vbalaji
Date: Tue Sep 17 18:31:35 2019
New Revision: 35918

Log:
Moving hudi from top-level to incubator

Removed:
dev/hudi/



[GitHub] [incubator-hudi] vinothchandar commented on issue #894: Getting java.lang.NoSuchMethodError while doing Hive sync

2019-09-17 Thread GitBox
vinothchandar commented on issue #894: Getting java.lang.NoSuchMethodError 
while doing Hive sync
URL: https://github.com/apache/incubator-hudi/issues/894#issuecomment-532347935
 
 
   >> Am I supposed to add hudi-hive jars separately?
   
   No.. it's all there in the bundle.
   
   Some general context: if you are writing a Spark job, it's better to just depend on `hudi-spark`, which will pull in hudi-hive. That way you have control over which versions to exclude and bring in. With a bundled jar (true for any bundled/fat/uber jar), you have no control to, say, tell Hudi not to bring its version of Hive.
   
   Can you try building a fat jar and running your job once via `spark-submit` locally? I imagine you added the Spark jars to your IntelliJ module to be able to run the program locally. Want to see if the jar conflict is coming from that.. I still can't understand where `Hive 2.3.2-amzn-2` comes from, based on what you shared.
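
   As a hedged sketch of that suggestion (the artifact name follows the hudi-spark module referenced above, and the exclusion is only illustrative, not a recommendation from this thread):

   ```sbt
   // Depend on the plain hudi-spark module instead of the bundle, so transitive
   // dependencies such as hudi-hive stay visible to sbt and can be excluded or
   // overridden per-organization.
   libraryDependencies += ("org.apache.hudi" % "hudi-spark" % "0.5.0-incubating-rc1")
     .excludeAll(ExclusionRule(organization = "org.apache.hive"))
   ```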




[GitHub] [incubator-hudi] umehrot2 commented on issue #770: remove com.databricks:spark-avro to build spark avro schema by itself

2019-09-17 Thread GitBox
umehrot2 commented on issue #770: remove com.databricks:spark-avro to build 
spark avro schema by itself
URL: https://github.com/apache/incubator-hudi/pull/770#issuecomment-532354352
 
 
   @vinothchandar @cdmikechen 
   
   I was able to read and write the `Decimal` type correctly by upgrading the parquet version to `1.8.2`. This PR needs to be updated accordingly.
   
   Is there a way we can prioritize this work and get it merged? Is there any additional testing I can help perform to give us confidence that it can be merged? @cdmikechen you mentioned there are still some issues. If you would like to point them out here, I would be willing to help out with that as well.
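
   One possible round-trip test, sketched under assumptions (table name, path, and field names below are invented; the option keys are the Hudi datasource string keys):

   ```scala
   import org.apache.spark.sql.SparkSession
   import org.apache.spark.sql.functions.lit
   import org.apache.spark.sql.types.DecimalType

   // Write a DecimalType column through the Hudi datasource and read it back,
   // checking the value survives the Avro/Parquet conversion discussed above.
   val spark = SparkSession.builder().master("local[*]").appName("decimal-check").getOrCreate()
   import spark.implicits._

   val df = Seq(("id-1", "2019-09-17", "EU")).toDF("uuid", "ts", "region")
     .withColumn("amount", lit("12.34").cast(DecimalType(10, 2)))

   df.write.format("org.apache.hudi")
     .option("hoodie.table.name", "decimal_check")
     .option("hoodie.datasource.write.recordkey.field", "uuid")
     .option("hoodie.datasource.write.partitionpath.field", "region")
     .option("hoodie.datasource.write.precombine.field", "ts")
     .mode("overwrite")
     .save("/tmp/decimal_check")

   spark.read.format("org.apache.hudi").load("/tmp/decimal_check/*").select("uuid", "amount").show()
   ```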
   
   
   
   




[GitHub] [incubator-hudi] vinothchandar commented on issue #888: Exception in thread "main" com.uber.hoodie.exception.DatasetNotFoundException: Hoodie dataset not found in path

2019-09-17 Thread GitBox
vinothchandar commented on issue #888: Exception in thread "main" 
com.uber.hoodie.exception.DatasetNotFoundException: Hoodie dataset not found in 
path
URL: https://github.com/apache/incubator-hudi/issues/888#issuecomment-532384495
 
 
   so the structure is getting created, but somehow the path is wrong? Really weird. The code I pasted is what throws the error.. as you can see, the check is done using the standard Hadoop `Path` class.
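
   For reference, a rough Scala sketch of the kind of check being described (not the actual Hudi source, which is Java; the method and exception names here are simplified):

   ```scala
   import org.apache.hadoop.conf.Configuration
   import org.apache.hadoop.fs.{FileSystem, Path}

   // Sketch of the validation: look for the .hoodie metadata folder under the base
   // path using the standard Hadoop Path/FileSystem API, and fail if it is absent.
   def checkValidDataset(basePath: String, conf: Configuration): Unit = {
     val metaPath = new Path(basePath, ".hoodie")
     val fs: FileSystem = metaPath.getFileSystem(conf)
     if (!fs.exists(metaPath)) {
       throw new IllegalArgumentException(s"Hoodie dataset not found in path $metaPath")
     }
   }
   ```

   A platform-dependent separator sneaking into that join would produce exactly the `Games3\.hoodie` path seen in the report above.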




[GitHub] [incubator-hudi] yanghua commented on a change in pull request #896: Updating site to reflect recent doc changes

2019-09-17 Thread GitBox
yanghua commented on a change in pull request #896: Updating site to reflect 
recent doc changes
URL: https://github.com/apache/incubator-hudi/pull/896#discussion_r325268090
 
 

 ##
 File path: content/404.html
 ##
 @@ -6,25 +6,25 @@
 
 
 Page Not Found | Hudi
-
+
 
 Review comment:
   @bhasudha @vinothchandar Sorry, I have tested, and you are right. When I removed the leading `/`, it worked fine. If it starts with `/`, the page still renders correctly when run via `bundle exec jekyll serve`, but it cannot be rendered when we just open the generated HTML file in the browser.
   
   So it's really an issue.




[jira] [Updated] (HUDI-248) CLI doesn't allow rolling back a Delta commit

2019-09-17 Thread BALAJI VARADARAJAN (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BALAJI VARADARAJAN updated HUDI-248:

Fix Version/s: (was: 0.6.0)
   0.5.1

> CLI doesn't allow rolling back a Delta commit
> -
>
> Key: HUDI-248
> URL: https://issues.apache.org/jira/browse/HUDI-248
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: CLI
>Reporter: Rahul Bhartia
>Priority: Minor
>  Labels: aws-emr
> Fix For: 0.5.1
>
>
> [https://github.com/apache/incubator-hudi/blob/master/hudi-cli/src/main/java/org/apache/hudi/cli/commands/CommitsCommand.java#L128]
>  
> When trying to find a match for the passed-in commit value, the "commit rollback" 
> command always defaults to using HoodieTimeline.COMMIT_ACTION - and hence 
> doesn't allow rolling back delta commits.
> Note: delta commits can be rolled back using a HoodieWriteClient, so it seems 
> like it's just a matter of matching against both COMMIT_ACTION and 
> DELTA_COMMIT_ACTION in the CLI.





[jira] [Updated] (HUDI-254) Provide mechanism for installing hudi-spark-bundle onto an existing spark installation

2019-09-17 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-254:

Labels:   (was: pull-request-available)

> Provide mechanism for installing hudi-spark-bundle onto an existing spark 
> installation
> --
>
> Key: HUDI-254
> URL: https://issues.apache.org/jira/browse/HUDI-254
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Spark datasource, SparkSQL Support
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> A lot of discussions around this kicked off from 
> [https://github.com/apache/incubator-hudi/issues/869] 
> Breaking down into phases, when we drop the hudi-spark-bundle*.jar onto the 
> `jars` folder 
>  
> a) Writing data via Hudi datasource should work 
> b) Spark datasource reads should work
>  
> c)  a + Hive Sync should work
> d) SparkSQL on Hive synced table works 
>  
> Start with Spark 2.3 (current demo setup) and then proceed to 2.4 and iron 
> out issues.
>  





svn commit: r35920 - in /dev/incubator/hudi: ./ hudi-0.5.0-incubating-rc1/ hudi-0.5.0-incubating-rc2/

2019-09-17 Thread vbalaji
Author: vbalaji
Date: Tue Sep 17 20:32:22 2019
New Revision: 35920

Log:
Uploading hudi-0.5.0-incubating-rc2 source bundle

Added:
dev/incubator/hudi/KEYS
dev/incubator/hudi/hudi-0.5.0-incubating-rc1/

dev/incubator/hudi/hudi-0.5.0-incubating-rc1/hudi-0.5.0-incubating-rc1.src.tgz  
 (with props)

dev/incubator/hudi/hudi-0.5.0-incubating-rc1/hudi-0.5.0-incubating-rc1.src.tgz.asc

dev/incubator/hudi/hudi-0.5.0-incubating-rc1/hudi-0.5.0-incubating-rc1.src.tgz.sha512
dev/incubator/hudi/hudi-0.5.0-incubating-rc2/

dev/incubator/hudi/hudi-0.5.0-incubating-rc2/hudi-0.5.0-incubating-rc2.src.tgz  
 (with props)

dev/incubator/hudi/hudi-0.5.0-incubating-rc2/hudi-0.5.0-incubating-rc2.src.tgz.asc

dev/incubator/hudi/hudi-0.5.0-incubating-rc2/hudi-0.5.0-incubating-rc2.src.tgz.sha512
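
For anyone validating these candidate artifacts, a typical ASF verification flow looks like the following (file names taken from the listing above; the dist URL is assumed):

    wget https://dist.apache.org/repos/dist/dev/incubator/hudi/KEYS
    gpg --import KEYS
    gpg --verify hudi-0.5.0-incubating-rc2.src.tgz.asc hudi-0.5.0-incubating-rc2.src.tgz
    sha512sum -c hudi-0.5.0-incubating-rc2.src.tgz.sha512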

Added: dev/incubator/hudi/KEYS
==
--- dev/incubator/hudi/KEYS (added)
+++ dev/incubator/hudi/KEYS Tue Sep 17 20:32:22 2019
@@ -0,0 +1,275 @@
+This file contains the PGP keys of various developers.
+
+Users: pgp < KEYS
+   gpg --import KEYS
+Developers:
+pgp -kxa <your name> and append it to this file.
+(pgpk -ll <your name> && pgpk -xa <your name>) >> this file.
+(gpg --list-sigs <your name>
+ && gpg --armor --export <your name>) >> this file.
+
+pub   4096R/D3541808 2014-01-09
+uid   [ultimate] Suneel Marthi (CODE SIGNING KEY) 
+sig 3D3541808 2014-01-09  Suneel Marthi (CODE SIGNING KEY) 

+sub   4096R/AF46E2DE 2014-01-09
+sig  D3541808 2014-01-09  Suneel Marthi (CODE SIGNING KEY) 

+
+-BEGIN PGP PUBLIC KEY BLOCK-
+Comment: GPGTools - https://gpgtools.org
+
+mQINBFLPJmEBEAC9d/dUZCXeyhB0fVGmJAjdjXfLebav4VqGdNZC+M1T9C3dcVsh
+X/JGme5bjJeIgVwiH5UsdNceYn1+hyxs8jXuRAWEWKP76gD+pNrp8Az0ZdBkJoAy
+zCywOPtJV2PCOz7+S5ri2nUA2+1Kgcu6IlSLMmYAGO0IAmRrjBEzxy9iGaxiNGTc
+LvQt/iVtIXWkKKI8yvpoJ8iFf3TGhpjgaC/h7cJP3zpy0SScmhJJASLXRsfocLv9
+sle6ndN9IPbDtRW8cL7Fk3VQlzp1ToVjmnQTyZZ6S1WafsjzCZ9hLN+k++o8VbvY
+v3icY6Sy0BKz0J6KwaxTkuZ6w1K7oUkVOQboKaWFIEdO+jwrEmU+Puyd8Np8jLnF
+Q0Y5GPfyMlqM3S/zaDm1t4D1eb5FLciStkxfg5wPVK6TkqB325KVD3aio5C7E7kt
+aQechHxaJXCQOtCtVY4X+L4iClnMSuk+hcSc8W8MYRTSVansItK0vI9eQZXMnpan
+w9/jk5rS4Gts1rHB7+kdjT3QRJmkyk6fEFT0fz5tfMC7N8waeEUhCaRW6lAoiqDW
+NW1h+0UGxJw+9YcGxBC0kkt3iofNOWQWmuf/BS3DHPKT7XV/YtBHe44wW0sF5L5P
+nfQUHpnA3pcZ0En6bXAvepKVZTNdOWWJqMyHV+436DA+33h45QL6lWb/GwARAQAB
+tDVTdW5lZWwgTWFydGhpIChDT0RFIFNJR05JTkcgS0VZKSA8c21hcnRoaUBhcGFj
+aGUub3JnPokCNwQTAQoAIQUCUs8mYQIbAwULCQgHAwUVCgkICwUWAgMBAAIeAQIX
+gAAKCRC08czE01QYCOKKEAChRtHBoYNTX+RZbFO0Kl1GlN+i1Ik0shEm5ZJ56XHv
+AnFx/gRK7CfZzJswWo7kf2s/dvJiFfs+rrolYVuO6E8gNhAaTEomSuvWQAMHdPcR
+9G5APRKCSkbZYugElqplEbSphk78FKoFO+sml52M7Pr9jj88ApBjoFVVY8njdnNq
+6DVlaDsg8YninCD78Z7PNFnRGwxyZ8Qd4Dh0rG+MUTfAWopZu6/MxpQxU7QpeVeX
+SIMLg7ClFrGfXnZcszYF4dnav1aa0i7W88PAdYNPko7tC5qz5yv2ep7t2gRbcYKf
+RXhYC2FHQey3wPhMKjA8V436lAqmfYnY/YdmhEy9Xq/1EdX1nHsQ7OEkfgXK14WM
+F+rnqXRAl/0cwiyb41eocdg5kpZFIKgCYT02usLWxwNnd3jOCe109Ze3y3acN/G8
++xOf9YRfNVAe6pD8H6ieRbv9gRjBmsbz9bXQCmxFnDqxNri5Me6gBAQPNmYTJD0h
+jgJTK6o0vJ0pwjBLauasJsLu+1tR3Cb0dxPE+JVaTF26FCd7pM7W6KdVfod9ZfrN
+cSyJ/cECc2KvYVGmTjQNVo1dYG0awBachlWnYNt+0Qx4opLsczZOLtPKtFY4BJA7
+aZoXT4Qf9yB8km7x2/cgNExVbFummToJ/IP3M39/EaryspsQQuM5Qu5Q5lZp8Qnn
+ybkCDQRSzyZhARAA7bAawFzbJaghYnm6mTZyGG5hQmfAynbF6cPAE+g2SnXcNQjP
+6kjYx3tSpb7rEzmjQqs46ztqdec6PIVBMhakON6z27Zz+IviAtO/TcaZHWNuCAjw
+FXVQZ+tYsSeiKInttfkrQc8jXAHWwSkSjLqNpvQpBdBEX80MYkFB6ZPOeON2+/Ta
+GC1H/HU2YngF0qQSmG33KKG6ezihBJdKxU6t2tsQfTlCmZW6R6MGpS9fVurYMKBk
+vR+7RGZ/H6dSjWPcpxhusGg92J9uz7r5SopN1wSdyPMUCMAFGeyoxcAuBDl38quU
+H/ENG3x5LDPq2aEH2AJ6yvZfIXbeJ1zmXf2cAHv+HbmvZaTSp0XIjq8Yxh8NkYEC
+ZdfRWmsGLIpU16TkBijpK3Dn9MDXjHGT3V8/qfdpURtMvIaL8WFrq9ejcy/vGRFn
+mCYqxIIPH+vLiMXKWtuMc61GN3ES21msKQH6IuQxxfQLyhK44L/pv7FpF4E+6LaE
+8uRwAex5HIDpR1v4aJq089rRtye9VXTJJLZ7lYs0HctdZ30QbBRWT4jS9d9rj3cr
+HgQ7mIGO9TAfK2kWc6AJN/EvxPWNbOwptsTUzAF/adiy9ax8C18iw7nKczC+2eN6
+UcbxXiPdytuKYK7O9A8S9e1w89GwpxYN7Xfn2o6QfpSbL9cLKiinOeV+xikAEQEA
+AYkCHwQYAQoACQUCUs8mYQIbDAAKCRC08czE01QYCG7yD/471dmyOD+go8cZkdqR
+3CHhjH03odtI0EJNVy4VGEC0r9paz3BWYTy18LqWYkw3ygphOIU1r8/7QK3H5Ke3
+c4yCSUxaMk5SlAJ+iVRek5TABkR8+zI+ZN5pQtqRH+ya5JxV4F/Sx5Q3KWMzpvgY
+n6AgSSc3hEfkgdI7SalIeyLaLDWv+RFdGZ5JU5gD28C0G8BeH8L62x6sixZcqoGT
+oy9rwkjs45/ZmmvBZhd1wLvC/au8l2Ecou6O8+8m26W8Z7vCuGKxuWn0KV3DLLWe
+66uchDVlakGoMJSPIK06JWYUlE+gL0CW+U2ekt/v2qb8hGgMVET3CBAMq+bFWuJ6
+juX7hJd7wHtCFfjnFDDAkdp2IIIZAlBW6FZGv7pJ82xsW6pSAg0A7VrV6nTtMtDv
+T8esOfo/t4t0gaL7bivy9DVVdATbUBcJJFpoVoe5MxiyjptveqPzIRwzt04n52Ph
+ordVWAnX5AokXWTg+Glem/EWEuf7jUuZArfqCSl/sZoQdXGTjR7G4iFscispji4+
+kNjVQsItqFbgDpuc6n+GcFxlKQ7YMCnu5MVtTV01U4lFs0qy0NTUqsuR35DM4z14
+DkFmj1upWAayCoXTpKzsHBvJZPC+Wqf9Pl3O47apelg7KxU3S011YfXpVPvCTKBv
+kD2o/5GKWS5QkSUEUXXY1oDiLg==
+=f8kJ
+-END PGP PUBLIC KEY BLOCK-
+
+pub   rsa4096 2019-07-29 [SC]
+  AF9BAF79D311A3D3288E583F24A499037262AAA4
+uid   [ultimate] Balaji Varadarajan 
+sig 324A499037262AAA4 2019-07-29  Balaji Varadarajan 

+sub   rsa4096 

[jira] [Assigned] (HUDI-257) Unit tests intermittently failing

2019-09-17 Thread BALAJI VARADARAJAN (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BALAJI VARADARAJAN reassigned HUDI-257:
---

Assignee: BALAJI VARADARAJAN

> Unit tests intermittently failing 
> --
>
> Key: HUDI-257
> URL: https://issues.apache.org/jira/browse/HUDI-257
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Common Core
>Reporter: BALAJI VARADARAJAN
>Assignee: BALAJI VARADARAJAN
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
>   TestHoodieBloomIndex.testCheckUUIDsAgainstOneFile:270 » HoodieIndex Error 
> chec...
>   TestHoodieBloomIndex.testCheckUUIDsAgainstOneFile:270 » HoodieIndex Error 
> chec...
>   TestHoodieBloomIndex.testCheckUUIDsAgainstOneFile:270 » HoodieIndex Error 
> chec...
>   TestHoodieBloomIndex.testCheckUUIDsAgainstOneFile:270 » HoodieIndex Error 
> chec...
>   TestCopyOnWriteTable.testUpdateRecords:170 » HoodieIO Failed to read footer 
> fo...



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Closed] (HUDI-257) Unit tests intermittently failing

2019-09-17 Thread BALAJI VARADARAJAN (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BALAJI VARADARAJAN closed HUDI-257.
---

> Unit tests intermittently failing 
> --
>
> Key: HUDI-257
> URL: https://issues.apache.org/jira/browse/HUDI-257
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Common Core
>Reporter: BALAJI VARADARAJAN
>Assignee: BALAJI VARADARAJAN
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
>   TestHoodieBloomIndex.testCheckUUIDsAgainstOneFile:270 » HoodieIndex Error 
> chec...
>   TestHoodieBloomIndex.testCheckUUIDsAgainstOneFile:270 » HoodieIndex Error 
> chec...
>   TestHoodieBloomIndex.testCheckUUIDsAgainstOneFile:270 » HoodieIndex Error 
> chec...
>   TestHoodieBloomIndex.testCheckUUIDsAgainstOneFile:270 » HoodieIndex Error 
> chec...
>   TestCopyOnWriteTable.testUpdateRecords:170 » HoodieIO Failed to read footer 
> fo...



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Resolved] (HUDI-257) Unit tests intermittently failing

2019-09-17 Thread BALAJI VARADARAJAN (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BALAJI VARADARAJAN resolved HUDI-257.
-
Resolution: Fixed

> Unit tests intermittently failing 
> --
>
> Key: HUDI-257
> URL: https://issues.apache.org/jira/browse/HUDI-257
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Common Core
>Reporter: BALAJI VARADARAJAN
>Assignee: BALAJI VARADARAJAN
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
>   TestHoodieBloomIndex.testCheckUUIDsAgainstOneFile:270 » HoodieIndex Error 
> chec...
>   TestHoodieBloomIndex.testCheckUUIDsAgainstOneFile:270 » HoodieIndex Error 
> chec...
>   TestHoodieBloomIndex.testCheckUUIDsAgainstOneFile:270 » HoodieIndex Error 
> chec...
>   TestHoodieBloomIndex.testCheckUUIDsAgainstOneFile:270 » HoodieIndex Error 
> chec...
>   TestCopyOnWriteTable.testUpdateRecords:170 » HoodieIO Failed to read footer 
> fo...



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[GitHub] [incubator-hudi] HariprasadAllaka1612 edited a comment on issue #888: Exception in thread "main" com.uber.hoodie.exception.DatasetNotFoundException: Hoodie dataset not found in path

2019-09-17 Thread GitBox
HariprasadAllaka1612 edited a comment on issue #888: Exception in thread "main" 
com.uber.hoodie.exception.DatasetNotFoundException: Hoodie dataset not found in 
path
URL: https://github.com/apache/incubator-hudi/issues/888#issuecomment-532071607
 
 
   Hi Vinoth,
   
   Thanks for the reply.
   
   a. This is the first time I am writing data to that path.
   b. I tried with Overwrite, but I get the same exception. 
   ```
   inputDF
     .write.format("com.uber.hoodie")
     .option(HoodieWriteConfig.TABLE_NAME, tablename)
     .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "GameId")
     .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY, "OperatorShortName")
     .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, "HandledTimestamp")
     .option(DataSourceWriteOptions.OPERATION_OPT_KEY, DataSourceWriteOptions.UPSERT_OPERATION_OPT_VAL)
     .option(DataSourceWriteOptions.STORAGE_TYPE_OPT_KEY, DataSourceWriteOptions.MOR_STORAGE_TYPE_OPT_VAL)
     .mode(SaveMode.Overwrite)
     .save("s3a://" + "gat-datalake-raw-dev" + "/Games3")
   ```
   
   Below is the exception with the Overwrite save mode, for your reference:
   
   Exception in thread "main" com.uber.hoodie.exception.DatasetNotFoundException: Hoodie dataset not found in path s3a://gat-datalake-raw-dev/Games3\.hoodie
   	at com.uber.hoodie.exception.DatasetNotFoundException.checkValidDataset(DatasetNotFoundException.java:45)
   	at com.uber.hoodie.common.table.HoodieTableMetaClient.<init>(HoodieTableMetaClient.java:91)
   	at com.uber.hoodie.common.table.HoodieTableMetaClient.<init>(HoodieTableMetaClient.java:78)
   	at com.uber.hoodie.common.table.HoodieTableMetaClient.initializePathAsHoodieDataset(HoodieTableMetaClient.java:310)
   	at com.uber.hoodie.common.table.HoodieTableMetaClient.initTableType(HoodieTableMetaClient.java:248)
   	at com.uber.hoodie.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:136)
   	at com.uber.hoodie.DefaultSource.createRelation(DefaultSource.scala:91)
   	at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
   	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
   	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
   	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
   	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
   	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
   	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
   	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
   	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
   	at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:668)
   	at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:668)
   	at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
   	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
   	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
   	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:668)
   	at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:276)
   	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:270)
   	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:228)
   	at com.playngodataengg.scala.dao.DataAccessS3.writeDataToRefinedS3(DataAccessS3.scala:28)
   	at com.playngodataengg.scala.controller.GameAndProviderDataTransform.processData(GameAndProviderDataTransform.scala:29)
   	at com.playngodataengg.scala.action.GameAndProviderData$.main(GameAndProviderData.scala:10)
   	at com.playngodataengg.scala.action.GameAndProviderData.main(GameAndProviderData.scala)
   
   I also wanted to share the dependencies I have in my Maven project.
   
   <project xmlns="http://maven.apache.org/POM/4.0.0"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
     <modelVersion>4.0.0</modelVersion>
     <groupId>com.playngodataengg.scala</groupId>
     <artifactId>playngodataengg</artifactId>
     <version>1.0-SNAPSHOT</version>
     <inceptionYear>2008</inceptionYear>
     <properties>
       <scala.version>2.11.12</scala.version>
       <spark.version>2.4.0</spark.version>
       <scala.compat.version>2.11</scala.compat.version>
     </properties>
     <!-- remainder of the POM (the dependency list) was truncated in the archive -->

[jira] [Created] (HUDI-259) Hadoop 3 support for Hudi writing

2019-09-17 Thread Vinoth Chandar (Jira)
Vinoth Chandar created HUDI-259:
---

 Summary: Hadoop 3 support for Hudi writing
 Key: HUDI-259
 URL: https://issues.apache.org/jira/browse/HUDI-259
 Project: Apache Hudi (incubating)
  Issue Type: Improvement
  Components: Usability
Reporter: Vinoth Chandar






--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (HUDI-257) Unit tests intermittently failing

2019-09-17 Thread BALAJI VARADARAJAN (Jira)
BALAJI VARADARAJAN created HUDI-257:
---

 Summary: Unit tests intermittently failing 
 Key: HUDI-257
 URL: https://issues.apache.org/jira/browse/HUDI-257
 Project: Apache Hudi (incubating)
  Issue Type: Bug
  Components: Common Core
Reporter: BALAJI VARADARAJAN
 Fix For: 0.5.0


  TestHoodieBloomIndex.testCheckUUIDsAgainstOneFile:270 » HoodieIndex Error 
chec...

  TestHoodieBloomIndex.testCheckUUIDsAgainstOneFile:270 » HoodieIndex Error 
chec...

  TestHoodieBloomIndex.testCheckUUIDsAgainstOneFile:270 » HoodieIndex Error 
chec...

  TestHoodieBloomIndex.testCheckUUIDsAgainstOneFile:270 » HoodieIndex Error 
chec...

  TestCopyOnWriteTable.testUpdateRecords:170 » HoodieIO Failed to read footer 
fo...



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (HUDI-258) Hive Query engine not supporting join queries between RT and RO tables

2019-09-17 Thread BALAJI VARADARAJAN (Jira)
BALAJI VARADARAJAN created HUDI-258:
---

 Summary: Hive Query engine not supporting join queries between RT 
and RO tables
 Key: HUDI-258
 URL: https://issues.apache.org/jira/browse/HUDI-258
 Project: Apache Hudi (incubating)
  Issue Type: Bug
  Components: Hive Integration
Reporter: BALAJI VARADARAJAN


Description : 
[https://github.com/apache/incubator-hudi/issues/789#issuecomment-512740619]

 

Root Cause: Hive tracks getSplits calls by dataset basePath and does not 
take the InputFormatClass into account. Hence getSplits() is called only once. In 
the case of RO and RT tables, both have the same dataset base-path but differ 
in the InputFormatClass. Due to this, the Hive join query returns weird 
results.
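
A hypothetical sketch of the distinction (illustrative names, not Hive internals): a split cache keyed by base path alone collapses the RO and RT views into one entry, while keying by (base path, input format class) keeps them apart.

{code:java}
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

class SplitCacheSketch {
  static final class Key {
    final String basePath;
    final Class<?> inputFormatClass; // dropping this field reproduces the reported behavior

    Key(String basePath, Class<?> inputFormatClass) {
      this.basePath = basePath;
      this.inputFormatClass = inputFormatClass;
    }

    @Override
    public boolean equals(Object o) {
      if (!(o instanceof Key)) {
        return false;
      }
      Key k = (Key) o;
      return basePath.equals(k.basePath) && inputFormatClass.equals(k.inputFormatClass);
    }

    @Override
    public int hashCode() {
      return Objects.hash(basePath, inputFormatClass);
    }
  }

  // One splits entry per (basePath, inputFormatClass), so RO and RT no longer collide.
  final Map<Key, Object> splitsCache = new ConcurrentHashMap<>();
}
{code}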

 

=

The result of the demo is very strange
(Step 6(a))

 

{{ select `_hoodie_commit_time`, symbol, ts, volume, open, close  from 
stock_ticks_mor_rt where  symbol = 'GOOG';
 select `_hoodie_commit_time`, symbol, ts, volume, open, close  from 
stock_ticks_mor where  symbol = 'GOOG';}}

These return the same results as in the demo.

BUT!

 

{{select a.key,a.ts, b.ts from stock_ticks_mor a join stock_ticks_mor_rt b  on 
a.key=b.key where a.ts != b.ts
...
+--------+-------+-------+--+
| a.key  | a.ts  | b.ts  |
+--------+-------+-------+--+
+--------+-------+-------+--+}}

 

{{0: jdbc:hive2://hiveserver:1> select a.key,a.ts,b.ts from 
stock_ticks_mor_rt a join stock_ticks_mor b on a.key = b.key where a.key= 
'GOOG_2018-08-31 10';
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the 
future versions. Consider using a different execution engine (i.e. spark, tez) 
or using Hive 1.X releases.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in 
[jar:file:/opt/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in 
[jar:file:/opt/hadoop-2.8.4/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Execution log at: 
/tmp/root/root_20190718091316_ec40e8f2-be17-4450-bb75-8db9f4390041.log
2019-07-18 09:13:20 Starting to launch local task to process map join;  maximum 
memory = 477626368
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further 
details.
2019-07-18 09:13:21 Dump the side-table for tag: 0 with group count: 1 into 
file: 
file:/tmp/root/60ae1624-3514-4ddd-9bc1-5d2349d922d6/hive_2019-07-18_09-13-16_658_8306103829282410332-1/-local-10005/HashTable-Stage-3/MapJoin-mapfile50--.hashtable
2019-07-18 09:13:21 Uploaded 1 File to: 
file:/tmp/root/60ae1624-3514-4ddd-9bc1-5d2349d922d6/hive_2019-07-18_09-13-16_658_8306103829282410332-1/-local-10005/HashTable-Stage-3/MapJoin-mapfile50--.hashtable
 (317 bytes)
2019-07-18 09:13:21 End of local task; Time Taken: 1.688 sec.
+---------------------+----------------------+----------------------+--+
|        a.key        |         a.ts         |         b.ts         |
+---------------------+----------------------+----------------------+--+
| GOOG_2018-08-31 10  | 2018-08-31 10:29:00  | 2018-08-31 10:29:00  |
+---------------------+----------------------+----------------------+--+
1 row selected (7.207 seconds)
0: jdbc:hive2://hiveserver:1> select a.key,a.ts,b.ts from stock_ticks_mor a 
join stock_ticks_mor_rt b on a.key = b.key where a.key= 'GOOG_2018-08-31 10';
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the 
future versions. Consider using a different execution engine (i.e. spark, tez) 
or using Hive 1.X releases.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in 
[jar:file:/opt/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in 
[jar:file:/opt/hadoop-2.8.4/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Execution log at: 
/tmp/root/root_20190718091348_72a5fc30-fc04-41c1-b2e3-5f943e4d5c08.log
2019-07-18 09:13:51 Starting to launch local task to process map join;  maximum 
memory = 477626368
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further 
details.
2019-07-18 09:13:53 Dump the side-table for tag: 0 with group count: 1 into 
file: 

[GitHub] [incubator-hudi] HariprasadAllaka1612 commented on issue #888: Exception in thread "main" com.uber.hoodie.exception.DatasetNotFoundException: Hoodie dataset not found in path

2019-09-17 Thread GitBox
HariprasadAllaka1612 commented on issue #888: Exception in thread "main" 
com.uber.hoodie.exception.DatasetNotFoundException: Hoodie dataset not found in 
path
URL: https://github.com/apache/incubator-hudi/issues/888#issuecomment-532362160
 
 
   I can see the .hoodie directory under Games3. It contains the structure below.
   
   
![Capture](https://user-images.githubusercontent.com/55284877/65071733-b2832280-d98f-11e9-8b8f-ac714a2f3a0b.JPG)
   
   I am moving to the master branch now. Earlier, when I was getting this issue, I 
was adding com.uber.hoodie:hoodie-spark and other libraries (hoodie-hive, 
hoodie-client, hoodie-common, etc.) as Maven dependencies.
   
   Could you also please tell me the significance of the code you gave above?
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] bvaradar opened a new pull request #902: [HUDI-257] Fix Bloom Index unit-test failures

2019-09-17 Thread GitBox
bvaradar opened a new pull request #902: [HUDI-257] Fix Bloom Index unit-test 
failures
URL: https://github.com/apache/incubator-hudi/pull/902
 
 
   This was due to a test setup issue. I was able to reproduce the failure 
consistently, and after adding an initialization/cleanup step the tests pass 
consistently.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[incubator-hudi] branch master updated: Updating release notes and preparing for 0.5.0-incubating-rc2 release

2019-09-17 Thread vbalaji
This is an automated email from the ASF dual-hosted git repository.

vbalaji pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 4bda742  Updating release notes and preparing for 0.5.0-incubating-rc2 
release
4bda742 is described below

commit 4bda742a938a344fb667d5d86ef1007de5c29365
Author: Balaji Varadarajan 
AuthorDate: Tue Sep 17 12:41:54 2019 -0700

Updating release notes and preparing for 0.5.0-incubating-rc2 release
---
 RELEASE_NOTES.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/RELEASE_NOTES.md b/RELEASE_NOTES.md
index fbbca8f..5cec3ee 100644
--- a/RELEASE_NOTES.md
+++ b/RELEASE_NOTES.md
@@ -8,6 +8,7 @@ Release 0.5.0-incubating
  * Bug fixes in query side integration, hive-sync, deltaStreamer, compaction, 
rollbacks, restore
 
 ### Full PR List
+  * **Vinoth Chandar** [HUDI-254]: Bundle and shade databricks/avro with spark 
bundle
   * **Balaji Varadarajan** [HUDI-257] Fix Bloom Index unit-test failures
   * **Balaji Varadarajan** [HUDI-252] Add Disclaimer and cleanup NOTICE and 
LICENSE files in hudi. Identify packages which are under non-apache license in 
LICENSE file
   * **Taher Koitwala** [HUDI-62] Index Lookup Timer added to HoodieWriteClient



[incubator-hudi] branch release-0.5.0 updated (9bf8ec3 -> ffa2be3)

2019-09-17 Thread vbalaji
This is an automated email from the ASF dual-hosted git repository.

vbalaji pushed a change to branch release-0.5.0
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git.


 discard 9bf8ec3  [HUDI-121] Preparing for Release 0.5.0-incubating-rc2
 add e217db5  [HUDI-254]: Bundle and shade databricks/avro with spark bundle
 add 4bda742  Updating release notes and preparing for 0.5.0-incubating-rc2 
release
 new ffa2be3  [HUDI-121] Preparing for Release 0.5.0-incubating-rc2

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (9bf8ec3)
\
 N -- N -- N   refs/heads/release-0.5.0 (ffa2be3)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.

Any revisions marked "omit" are not gone; other references still
refer to them.  Any revisions marked "discard" are gone forever.

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 RELEASE_NOTES.md| 1 +
 hudi-spark/pom.xml  | 1 -
 packaging/hudi-spark-bundle/pom.xml | 6 ++
 3 files changed, 7 insertions(+), 1 deletion(-)



[incubator-hudi] 01/01: [HUDI-121] Preparing for Release 0.5.0-incubating-rc2

2019-09-17 Thread vbalaji
This is an automated email from the ASF dual-hosted git repository.

vbalaji pushed a commit to branch release-0.5.0
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git

commit ffa2be321dd4b894b66f0453080b7a60a3260831
Author: Balaji Varadarajan 
AuthorDate: Tue Sep 17 10:35:16 2019 -0700

[HUDI-121] Preparing for Release 0.5.0-incubating-rc2
---
 docker/hoodie/hadoop/base/pom.xml | 2 +-
 docker/hoodie/hadoop/datanode/pom.xml | 2 +-
 docker/hoodie/hadoop/historyserver/pom.xml| 2 +-
 docker/hoodie/hadoop/hive_base/pom.xml| 2 +-
 docker/hoodie/hadoop/namenode/pom.xml | 2 +-
 docker/hoodie/hadoop/pom.xml  | 2 +-
 docker/hoodie/hadoop/prestobase/pom.xml   | 2 +-
 docker/hoodie/hadoop/spark_base/pom.xml   | 2 +-
 docker/hoodie/hadoop/sparkadhoc/pom.xml   | 2 +-
 docker/hoodie/hadoop/sparkmaster/pom.xml  | 2 +-
 docker/hoodie/hadoop/sparkworker/pom.xml  | 2 +-
 hudi-cli/pom.xml  | 2 +-
 hudi-client/pom.xml   | 2 +-
 hudi-common/pom.xml   | 2 +-
 hudi-hadoop-mr/pom.xml| 2 +-
 hudi-hive/pom.xml | 2 +-
 hudi-integ-test/pom.xml   | 2 +-
 hudi-spark/pom.xml| 2 +-
 hudi-timeline-service/pom.xml | 2 +-
 hudi-utilities/pom.xml| 2 +-
 packaging/hudi-hadoop-mr-bundle/pom.xml   | 2 +-
 packaging/hudi-hive-bundle/pom.xml| 2 +-
 packaging/hudi-presto-bundle/pom.xml  | 2 +-
 packaging/hudi-spark-bundle/pom.xml   | 2 +-
 packaging/hudi-timeline-server-bundle/pom.xml | 2 +-
 packaging/hudi-utilities-bundle/pom.xml   | 2 +-
 pom.xml   | 2 +-
 27 files changed, 27 insertions(+), 27 deletions(-)

diff --git a/docker/hoodie/hadoop/base/pom.xml 
b/docker/hoodie/hadoop/base/pom.xml
index 52dd2a8..8cb0ab2 100644
--- a/docker/hoodie/hadoop/base/pom.xml
+++ b/docker/hoodie/hadoop/base/pom.xml
@@ -19,7 +19,7 @@
   
 hudi-hadoop-docker
 org.apache.hudi
-0.5.1-SNAPSHOT
+0.5.0-incubating-rc2
   
   4.0.0
   pom
diff --git a/docker/hoodie/hadoop/datanode/pom.xml 
b/docker/hoodie/hadoop/datanode/pom.xml
index 23cb64d..ed4533f 100644
--- a/docker/hoodie/hadoop/datanode/pom.xml
+++ b/docker/hoodie/hadoop/datanode/pom.xml
@@ -19,7 +19,7 @@
   
 hudi-hadoop-docker
 org.apache.hudi
-0.5.1-SNAPSHOT
+0.5.0-incubating-rc2
   
   4.0.0
   pom
diff --git a/docker/hoodie/hadoop/historyserver/pom.xml 
b/docker/hoodie/hadoop/historyserver/pom.xml
index d35e940..b3455c6 100644
--- a/docker/hoodie/hadoop/historyserver/pom.xml
+++ b/docker/hoodie/hadoop/historyserver/pom.xml
@@ -19,7 +19,7 @@
   
 hudi-hadoop-docker
 org.apache.hudi
-0.5.1-SNAPSHOT
+0.5.0-incubating-rc2
   
   4.0.0
   pom
diff --git a/docker/hoodie/hadoop/hive_base/pom.xml 
b/docker/hoodie/hadoop/hive_base/pom.xml
index 2f7c2b5..0afaa0e 100644
--- a/docker/hoodie/hadoop/hive_base/pom.xml
+++ b/docker/hoodie/hadoop/hive_base/pom.xml
@@ -19,7 +19,7 @@
   
 hudi-hadoop-docker
 org.apache.hudi
-0.5.1-SNAPSHOT
+0.5.0-incubating-rc2
   
   4.0.0
   pom
diff --git a/docker/hoodie/hadoop/namenode/pom.xml 
b/docker/hoodie/hadoop/namenode/pom.xml
index a996f57..257781a 100644
--- a/docker/hoodie/hadoop/namenode/pom.xml
+++ b/docker/hoodie/hadoop/namenode/pom.xml
@@ -19,7 +19,7 @@
   
 hudi-hadoop-docker
 org.apache.hudi
-0.5.1-SNAPSHOT
+0.5.0-incubating-rc2
   
   4.0.0
   pom
diff --git a/docker/hoodie/hadoop/pom.xml b/docker/hoodie/hadoop/pom.xml
index fff962f..a339226 100644
--- a/docker/hoodie/hadoop/pom.xml
+++ b/docker/hoodie/hadoop/pom.xml
@@ -19,7 +19,7 @@
   
 hudi
 org.apache.hudi
-0.5.1-SNAPSHOT
+0.5.0-incubating-rc2
 ../../../pom.xml
   
   4.0.0
diff --git a/docker/hoodie/hadoop/prestobase/pom.xml 
b/docker/hoodie/hadoop/prestobase/pom.xml
index fa1d2ef..090cba6 100644
--- a/docker/hoodie/hadoop/prestobase/pom.xml
+++ b/docker/hoodie/hadoop/prestobase/pom.xml
@@ -39,7 +39,7 @@
   
 hudi-hadoop-docker
 org.apache.hudi
-0.5.1-SNAPSHOT
+0.5.0-incubating-rc2
   
   4.0.0
   pom
diff --git a/docker/hoodie/hadoop/spark_base/pom.xml 
b/docker/hoodie/hadoop/spark_base/pom.xml
index 32b33e0..28a3b78 100644
--- a/docker/hoodie/hadoop/spark_base/pom.xml
+++ b/docker/hoodie/hadoop/spark_base/pom.xml
@@ -19,7 +19,7 @@
   
 hudi-hadoop-docker
 org.apache.hudi
-0.5.1-SNAPSHOT
+0.5.0-incubating-rc2
   
   4.0.0
   pom
diff --git a/docker/hoodie/hadoop/sparkadhoc/pom.xml 
b/docker/hoodie/hadoop/sparkadhoc/pom.xml
index 80a811c..0c6e1b4 100644
--- a/docker/hoodie/hadoop/sparkadhoc/pom.xml
+++ b/docker/hoodie/hadoop/sparkadhoc/pom.xml
@@ -19,7 +19,7 @@
   
 hudi-hadoop-docker
 org.apache.hudi
-0.5.1-SNAPSHOT
+0.5.0-incubating-rc2
   
   4.0.0
   pom
diff --git 

[GitHub] [incubator-hudi] vinothchandar commented on issue #898: Caused by: org.apache.hudi.exception.HoodieUpsertException: Failed to upsert for commit time

2019-09-17 Thread GitBox
vinothchandar commented on issue #898: Caused by: 
org.apache.hudi.exception.HoodieUpsertException: Failed to upsert for commit 
time
URL: https://github.com/apache/incubator-hudi/issues/898#issuecomment-532290482
 
 
   @HariprasadAllaka1612 what version of Hadoop and Hudi are you using? We have 
not tested with Hadoop 3.0, for example, if that is what you are using.
   
   ```
   Caused by: java.lang.NoSuchMethodError: 
org.apache.hadoop.fs.FSDataOutputStream: method <init>(Ljava/io/OutputStream;)V 
not found
   ```
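   
   One quick way to check which Hadoop version Spark is actually running against 
(a standard Hadoop API, shown here as a spark-shell one-liner):
   
   ```
   scala> org.apache.hadoop.util.VersionInfo.getVersion
   res0: String = 2.8.4   // example output; yours will differ
   ```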



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[incubator-hudi] 01/01: [HUDI-121] Preparing for Release 0.5.0-incubating-rc2

2019-09-17 Thread vbalaji
This is an automated email from the ASF dual-hosted git repository.

vbalaji pushed a commit to branch release-0.5.0
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git

commit 9bf8ec3807dfc70f8967e125170488ed9a206234
Author: Balaji Varadarajan 
AuthorDate: Tue Sep 17 10:35:16 2019 -0700

[HUDI-121] Preparing for Release 0.5.0-incubating-rc2
---
 docker/hoodie/hadoop/base/pom.xml | 2 +-
 docker/hoodie/hadoop/datanode/pom.xml | 2 +-
 docker/hoodie/hadoop/historyserver/pom.xml| 2 +-
 docker/hoodie/hadoop/hive_base/pom.xml| 2 +-
 docker/hoodie/hadoop/namenode/pom.xml | 2 +-
 docker/hoodie/hadoop/pom.xml  | 2 +-
 docker/hoodie/hadoop/prestobase/pom.xml   | 2 +-
 docker/hoodie/hadoop/spark_base/pom.xml   | 2 +-
 docker/hoodie/hadoop/sparkadhoc/pom.xml   | 2 +-
 docker/hoodie/hadoop/sparkmaster/pom.xml  | 2 +-
 docker/hoodie/hadoop/sparkworker/pom.xml  | 2 +-
 hudi-cli/pom.xml  | 2 +-
 hudi-client/pom.xml   | 2 +-
 hudi-common/pom.xml   | 2 +-
 hudi-hadoop-mr/pom.xml| 2 +-
 hudi-hive/pom.xml | 2 +-
 hudi-integ-test/pom.xml   | 2 +-
 hudi-spark/pom.xml| 2 +-
 hudi-timeline-service/pom.xml | 2 +-
 hudi-utilities/pom.xml| 2 +-
 packaging/hudi-hadoop-mr-bundle/pom.xml   | 2 +-
 packaging/hudi-hive-bundle/pom.xml| 2 +-
 packaging/hudi-presto-bundle/pom.xml  | 2 +-
 packaging/hudi-spark-bundle/pom.xml   | 2 +-
 packaging/hudi-timeline-server-bundle/pom.xml | 2 +-
 packaging/hudi-utilities-bundle/pom.xml   | 2 +-
 pom.xml   | 2 +-
 27 files changed, 27 insertions(+), 27 deletions(-)

diff --git a/docker/hoodie/hadoop/base/pom.xml 
b/docker/hoodie/hadoop/base/pom.xml
index 52dd2a8..8cb0ab2 100644
--- a/docker/hoodie/hadoop/base/pom.xml
+++ b/docker/hoodie/hadoop/base/pom.xml
@@ -19,7 +19,7 @@
   
 hudi-hadoop-docker
 org.apache.hudi
-0.5.1-SNAPSHOT
+0.5.0-incubating-rc2
   
   4.0.0
   pom
diff --git a/docker/hoodie/hadoop/datanode/pom.xml 
b/docker/hoodie/hadoop/datanode/pom.xml
index 23cb64d..ed4533f 100644
--- a/docker/hoodie/hadoop/datanode/pom.xml
+++ b/docker/hoodie/hadoop/datanode/pom.xml
@@ -19,7 +19,7 @@
   
 hudi-hadoop-docker
 org.apache.hudi
-0.5.1-SNAPSHOT
+0.5.0-incubating-rc2
   
   4.0.0
   pom
diff --git a/docker/hoodie/hadoop/historyserver/pom.xml 
b/docker/hoodie/hadoop/historyserver/pom.xml
index d35e940..b3455c6 100644
--- a/docker/hoodie/hadoop/historyserver/pom.xml
+++ b/docker/hoodie/hadoop/historyserver/pom.xml
@@ -19,7 +19,7 @@
   
 hudi-hadoop-docker
 org.apache.hudi
-0.5.1-SNAPSHOT
+0.5.0-incubating-rc2
   
   4.0.0
   pom
diff --git a/docker/hoodie/hadoop/hive_base/pom.xml 
b/docker/hoodie/hadoop/hive_base/pom.xml
index 2f7c2b5..0afaa0e 100644
--- a/docker/hoodie/hadoop/hive_base/pom.xml
+++ b/docker/hoodie/hadoop/hive_base/pom.xml
@@ -19,7 +19,7 @@
   
 hudi-hadoop-docker
 org.apache.hudi
-0.5.1-SNAPSHOT
+0.5.0-incubating-rc2
   
   4.0.0
   pom
diff --git a/docker/hoodie/hadoop/namenode/pom.xml 
b/docker/hoodie/hadoop/namenode/pom.xml
index a996f57..257781a 100644
--- a/docker/hoodie/hadoop/namenode/pom.xml
+++ b/docker/hoodie/hadoop/namenode/pom.xml
@@ -19,7 +19,7 @@
   
 hudi-hadoop-docker
 org.apache.hudi
-0.5.1-SNAPSHOT
+0.5.0-incubating-rc2
   
   4.0.0
   pom
diff --git a/docker/hoodie/hadoop/pom.xml b/docker/hoodie/hadoop/pom.xml
index fff962f..a339226 100644
--- a/docker/hoodie/hadoop/pom.xml
+++ b/docker/hoodie/hadoop/pom.xml
@@ -19,7 +19,7 @@
   
 hudi
 org.apache.hudi
-0.5.1-SNAPSHOT
+0.5.0-incubating-rc2
 ../../../pom.xml
   
   4.0.0
diff --git a/docker/hoodie/hadoop/prestobase/pom.xml 
b/docker/hoodie/hadoop/prestobase/pom.xml
index fa1d2ef..090cba6 100644
--- a/docker/hoodie/hadoop/prestobase/pom.xml
+++ b/docker/hoodie/hadoop/prestobase/pom.xml
@@ -39,7 +39,7 @@
   
 hudi-hadoop-docker
 org.apache.hudi
-0.5.1-SNAPSHOT
+0.5.0-incubating-rc2
   
   4.0.0
   pom
diff --git a/docker/hoodie/hadoop/spark_base/pom.xml 
b/docker/hoodie/hadoop/spark_base/pom.xml
index 32b33e0..28a3b78 100644
--- a/docker/hoodie/hadoop/spark_base/pom.xml
+++ b/docker/hoodie/hadoop/spark_base/pom.xml
@@ -19,7 +19,7 @@
   
 hudi-hadoop-docker
 org.apache.hudi
-0.5.1-SNAPSHOT
+0.5.0-incubating-rc2
   
   4.0.0
   pom
diff --git a/docker/hoodie/hadoop/sparkadhoc/pom.xml 
b/docker/hoodie/hadoop/sparkadhoc/pom.xml
index 80a811c..0c6e1b4 100644
--- a/docker/hoodie/hadoop/sparkadhoc/pom.xml
+++ b/docker/hoodie/hadoop/sparkadhoc/pom.xml
@@ -19,7 +19,7 @@
   
 hudi-hadoop-docker
 org.apache.hudi
-0.5.1-SNAPSHOT
+0.5.0-incubating-rc2
   
   4.0.0
   pom
diff --git 

[incubator-hudi] branch release-0.5.0 updated (26966c5 -> 9bf8ec3)

2019-09-17 Thread vbalaji
This is an automated email from the ASF dual-hosted git repository.

vbalaji pushed a change to branch release-0.5.0
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git.


omit 26966c5  [HUDI-121] ASF Release : Ensure NOTICE.txt is generated as 
part of cutting release branch
omit bb83355  [HUDI-249] Updating Notice files
omit b8afb1a  Setting Release Version to 0.5.0-incubating-rc1
 add 7190c02  [HUDI-249] Updating Notice files
 add 63cc455  [HUDI-121] ASF Release : Ensure NOTICE.txt is generated as 
part of cutting release branch
 add c0f42af  [HUDI-62] Index Lookup Timer added to HoodieWriteClient
 add 629698d  [HUDI-252] Add Disclaimer and cleanup NOTICE and LICENSE 
files in hudi
 add 96a46d8  [HUDI-252] Identify packages which are under non-apache 
license in LICENSE file
 add 2c6da09  [HUDI-257] Fix Bloom Index unit-test failures
 add c1e7d0e  [HUDI-121] Update Release notes and fix master version
 new 9bf8ec3  [HUDI-121] Preparing for Release 0.5.0-incubating-rc2

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (26966c5)
\
 N -- N -- N   refs/heads/release-0.5.0 (9bf8ec3)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.

Any revisions marked "omit" are not gone; other references still
refer to them.  Any revisions marked "discard" are gone forever.

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 DISCLAIMER |   6 +
 LICENSE| 332 +++
 LICENSE.txt| 614 -
 .../main/resources/META-INF/NOTICE.txt => NOTICE   |  53 +-
 NOTICE.txt | 392 -
 RELEASE_NOTES.md   |   3 +
 docker/hoodie/hadoop/base/pom.xml  |   2 +-
 docker/hoodie/hadoop/datanode/pom.xml  |   2 +-
 docker/hoodie/hadoop/historyserver/pom.xml |   2 +-
 docker/hoodie/hadoop/hive_base/pom.xml |   2 +-
 docker/hoodie/hadoop/namenode/pom.xml  |   2 +-
 docker/hoodie/hadoop/pom.xml   |   2 +-
 docker/hoodie/hadoop/prestobase/pom.xml|   2 +-
 docker/hoodie/hadoop/spark_base/pom.xml|   2 +-
 docker/hoodie/hadoop/sparkadhoc/pom.xml|   2 +-
 docker/hoodie/hadoop/sparkmaster/pom.xml   |   2 +-
 docker/hoodie/hadoop/sparkworker/pom.xml   |   2 +-
 hudi-cli/pom.xml   |   2 +-
 hudi-client/pom.xml|   2 +-
 .../java/org/apache/hudi/HoodieWriteClient.java|  16 +-
 .../org/apache/hudi/metrics/HoodieMetrics.java |  21 +-
 .../hudi/index/bloom/TestHoodieBloomIndex.java |   2 +
 .../apache/hudi/table/TestCopyOnWriteTable.java|   2 +
 hudi-common/pom.xml|   2 +-
 hudi-hadoop-mr/pom.xml |   2 +-
 hudi-hive/pom.xml  |   2 +-
 hudi-integ-test/pom.xml|   2 +-
 hudi-spark/pom.xml |   2 +-
 hudi-timeline-service/pom.xml  |   2 +-
 hudi-utilities/pom.xml |   2 +-
 packaging/hudi-hadoop-mr-bundle/pom.xml|   2 +-
 packaging/hudi-hive-bundle/pom.xml |   2 +-
 packaging/hudi-presto-bundle/pom.xml   |   2 +-
 packaging/hudi-spark-bundle/pom.xml|   2 +-
 packaging/hudi-timeline-server-bundle/pom.xml  |   2 +-
 packaging/hudi-utilities-bundle/pom.xml|   2 +-
 pom.xml|   2 +-
 37 files changed, 432 insertions(+), 1063 deletions(-)
 create mode 100644 DISCLAIMER
 create mode 100644 LICENSE
 delete mode 100644 LICENSE.txt
 copy hudi-timeline-service/src/main/resources/META-INF/NOTICE.txt => NOTICE 
(80%)
 delete mode 100644 NOTICE.txt



[jira] [Updated] (HUDI-255) Translate Talks & Powered By page

2019-09-17 Thread BALAJI VARADARAJAN (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BALAJI VARADARAJAN updated HUDI-255:

Fix Version/s: (was: 0.5.0)
   0.5.1

> Translate Talks & Powered By page
> -
>
> Key: HUDI-255
> URL: https://issues.apache.org/jira/browse/HUDI-255
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: docs-chinese
>Reporter: leesf
>Assignee: leesf
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The online HTML web page: [https://hudi.apache.org/powered_by.html]



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Updated] (HUDI-256) Translate Comparison page

2019-09-17 Thread BALAJI VARADARAJAN (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BALAJI VARADARAJAN updated HUDI-256:

Fix Version/s: (was: 0.5.0)
   0.5.1

> Translate Comparison page
> -
>
> Key: HUDI-256
> URL: https://issues.apache.org/jira/browse/HUDI-256
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: docs-chinese
>Reporter: leesf
>Assignee: leesf
>Priority: Major
> Fix For: 0.5.1
>
>
> The online HTML web page: [https://hudi.apache.org/comparison.html]



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Closed] (HUDI-252) Add Disclaimer and cleanup NOTICE and LICENSE files in hudi

2019-09-17 Thread BALAJI VARADARAJAN (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BALAJI VARADARAJAN closed HUDI-252.
---

> Add Disclaimer and cleanup NOTICE and LICENSE files in hudi
> ---
>
> Key: HUDI-252
> URL: https://issues.apache.org/jira/browse/HUDI-252
> Project: Apache Hudi (incubating)
>  Issue Type: Task
>  Components: asf-migration
>Reporter: BALAJI VARADARAJAN
>Assignee: BALAJI VARADARAJAN
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Add DIsclaimer and cleanup NOTICE and LICENSE files in hudi



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Resolved] (HUDI-252) Add Disclaimer and cleanup NOTICE and LICENSE files in hudi

2019-09-17 Thread BALAJI VARADARAJAN (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BALAJI VARADARAJAN resolved HUDI-252.
-
Resolution: Fixed

> Add Disclaimer and cleanup NOTICE and LICENSE files in hudi
> ---
>
> Key: HUDI-252
> URL: https://issues.apache.org/jira/browse/HUDI-252
> Project: Apache Hudi (incubating)
>  Issue Type: Task
>  Components: asf-migration
>Reporter: BALAJI VARADARAJAN
>Assignee: BALAJI VARADARAJAN
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Add DIsclaimer and cleanup NOTICE and LICENSE files in hudi



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[GitHub] [incubator-hudi] vinothchandar commented on issue #888: Exception in thread "main" com.uber.hoodie.exception.DatasetNotFoundException: Hoodie dataset not found in path

2019-09-17 Thread GitBox
vinothchandar commented on issue #888: Exception in thread "main" 
com.uber.hoodie.exception.DatasetNotFoundException: Hoodie dataset not found in 
path
URL: https://github.com/apache/incubator-hudi/issues/888#issuecomment-532353921
 
 
   This does not seem like a jar issue 
   
   `Exception in thread "main" 
com.uber.hoodie.exception.DatasetNotFoundException: Hoodie dataset not found in 
path s3a://gat-datalake-raw-dev/Games3\.hoodie` 
   
   Since it also fails with `Overwrite`, are you running on Windows? I am wondering 
how it's `Games3\.hoodie` and not `Games3/.hoodie` ..  Can you see a `.hoodie` 
underneath the `Games3` folder after the failure?  I can't see anything in the code 
that may inject a `\`. Can you move to the master branch and try once? (That way I 
can correlate the lines in your stack trace better as well.)
   
   ```
   this.metaPath = new Path(basePath, METAFOLDER_NAME).toString();
   Path metaPathDir = new Path(this.metaPath);
   this.fs = getFs();
   DatasetNotFoundException.checkValidDataset(fs, basePathDir, metaPathDir);
   ```
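   
   For what it's worth, the Path join in that snippet normalizes separators by 
itself, which is why a literal `\` in the resolved meta path is surprising; a small 
standalone illustration using the standard org.apache.hadoop.fs.Path API:
   
   ```
   import org.apache.hadoop.fs.Path;
   
   public class PathJoinDemo {
     public static void main(String[] args) {
       // Path(parent, child) always joins with "/", regardless of the host OS separator.
       Path metaPath = new Path("s3a://gat-datalake-raw-dev/Games3", ".hoodie");
       System.out.println(metaPath); // s3a://gat-datalake-raw-dev/Games3/.hoodie
     }
   }
   ```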


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Updated] (HUDI-257) Unit tests intermittently failing

2019-09-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-257:

Labels: pull-request-available  (was: )

> Unit tests intermittently failing 
> --
>
> Key: HUDI-257
> URL: https://issues.apache.org/jira/browse/HUDI-257
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Common Core
>Reporter: BALAJI VARADARAJAN
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>
>   TestHoodieBloomIndex.testCheckUUIDsAgainstOneFile:270 » HoodieIndex Error 
> chec...
>   TestHoodieBloomIndex.testCheckUUIDsAgainstOneFile:270 » HoodieIndex Error 
> chec...
>   TestHoodieBloomIndex.testCheckUUIDsAgainstOneFile:270 » HoodieIndex Error 
> chec...
>   TestHoodieBloomIndex.testCheckUUIDsAgainstOneFile:270 » HoodieIndex Error 
> chec...
>   TestCopyOnWriteTable.testUpdateRecords:170 » HoodieIO Failed to read footer 
> fo...



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[GitHub] [incubator-hudi] bvaradar commented on issue #902: [HUDI-257] Fix Bloom Index unit-test failures

2019-09-17 Thread GitBox
bvaradar commented on issue #902: [HUDI-257] Fix Bloom Index unit-test failures
URL: https://github.com/apache/incubator-hudi/pull/902#issuecomment-532302020
 
 
   @vinothchandar @n3nash : Going ahead and merging this change to master to 
kickstart RC2. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] bvaradar merged pull request #902: [HUDI-257] Fix Bloom Index unit-test failures

2019-09-17 Thread GitBox
bvaradar merged pull request #902: [HUDI-257] Fix Bloom Index unit-test failures
URL: https://github.com/apache/incubator-hudi/pull/902
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[incubator-hudi] branch master updated: [HUDI-257] Fix Bloom Index unit-test failures

2019-09-17 Thread vbalaji
This is an automated email from the ASF dual-hosted git repository.

vbalaji pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 2c6da09  [HUDI-257] Fix Bloom Index unit-test failures
2c6da09 is described below

commit 2c6da09d9d17f33ebc025c9ec9fa949605288bb7
Author: Balaji Varadarajan 
AuthorDate: Mon Sep 16 23:42:14 2019 -0700

[HUDI-257] Fix Bloom Index unit-test failures
---
 .../src/test/java/org/apache/hudi/index/bloom/TestHoodieBloomIndex.java | 2 ++
 .../src/test/java/org/apache/hudi/table/TestCopyOnWriteTable.java   | 2 ++
 2 files changed, 4 insertions(+)

diff --git 
a/hudi-client/src/test/java/org/apache/hudi/index/bloom/TestHoodieBloomIndex.java
 
b/hudi-client/src/test/java/org/apache/hudi/index/bloom/TestHoodieBloomIndex.java
index 8976683..28d19eb 100644
--- 
a/hudi-client/src/test/java/org/apache/hudi/index/bloom/TestHoodieBloomIndex.java
+++ 
b/hudi-client/src/test/java/org/apache/hudi/index/bloom/TestHoodieBloomIndex.java
@@ -91,6 +91,7 @@ public class TestHoodieBloomIndex extends 
HoodieClientTestHarness {
   public void setUp() throws Exception {
 initSparkContexts("TestHoodieBloomIndex");
 initTempFolderAndPath();
+initFileSystem();
 HoodieTestUtils.init(jsc.hadoopConfiguration(), basePath);
 // We have some records to be tagged (two different partitions)
 schemaStr = 
FileIOUtils.readAsUTFString(getClass().getResourceAsStream("/exampleSchema.txt"));
@@ -100,6 +101,7 @@ public class TestHoodieBloomIndex extends 
HoodieClientTestHarness {
   @After
   public void tearDown() throws Exception {
 cleanupSparkContexts();
+cleanupFileSystem();
 cleanupTempFolderAndPath();
   }
 
diff --git 
a/hudi-client/src/test/java/org/apache/hudi/table/TestCopyOnWriteTable.java 
b/hudi-client/src/test/java/org/apache/hudi/table/TestCopyOnWriteTable.java
index da7a0e8..6439d75 100644
--- a/hudi-client/src/test/java/org/apache/hudi/table/TestCopyOnWriteTable.java
+++ b/hudi-client/src/test/java/org/apache/hudi/table/TestCopyOnWriteTable.java
@@ -75,6 +75,7 @@ public class TestCopyOnWriteTable extends 
HoodieClientTestHarness {
 initTempFolderAndPath();
 initTableType();
 initTestDataGenerator();
+initFileSystem();
   }
 
   @After
@@ -82,6 +83,7 @@ public class TestCopyOnWriteTable extends 
HoodieClientTestHarness {
 cleanupSparkContexts();
 cleanupTempFolderAndPath();
 cleanupTableType();
+cleanupFileSystem();
 cleanupTestDataGenerator();
   }
 



[GitHub] [incubator-hudi] vinothchandar commented on issue #770: remove com.databricks:spark-avro to build spark avro schema by itself

2019-09-17 Thread GitBox
vinothchandar commented on issue #770: remove com.databricks:spark-avro to 
build spark avro schema by itself
URL: https://github.com/apache/incubator-hudi/pull/770#issuecomment-532360758
 
 
   @umehrot2 https://github.com/apache/incubator-hudi/pull/903 opened this for 
shading changes.. FYI.. 
   
   On bumping up versions, there are a few compatibility considerations: 
   - Bumping parquet to 1.8.2 may be OK, since Spark 2.2+ has that
   - Avro however is still 1.7.7 up through Spark 2.3 
   The spark bundle will only include parquet-avro and use the avro jars from the 
Spark installation. Thus simply bumping parquet to 1.8.2 inside Hudi and then 
running against a Spark 2.4 installation could work. Supporting Decimal on Spark 
2.3 and earlier might be tricky..  thoughts? 
   
   Also feel free to open a new PR, since @cdmikechen will take a few weeks to 
circle back, as he mentioned. 
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] bvaradar merged pull request #903: [HUDI-254]: Bundle and shade databricks/avro with spark bundle

2019-09-17 Thread GitBox
bvaradar merged pull request #903: [HUDI-254]: Bundle and shade databricks/avro 
with spark bundle
URL: https://github.com/apache/incubator-hudi/pull/903
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[incubator-hudi] branch master updated: [HUDI-254]: Bundle and shade databricks/avro with spark bundle

2019-09-17 Thread vbalaji
This is an automated email from the ASF dual-hosted git repository.

vbalaji pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new e217db5  [HUDI-254]: Bundle and shade databricks/avro with spark bundle
e217db5 is described below

commit e217db56ab30ec9630411aefc16516747b3112e0
Author: Vinoth Chandar 
AuthorDate: Tue Sep 17 12:00:41 2019 -0700

[HUDI-254]: Bundle and shade databricks/avro with spark bundle

 - spark 2.4 onwards, spark has built in support. shading to avoid conflicts
 - spark 2.3 still needs this bundled, so that dropping bundle into jars 
folder would work
---
 hudi-spark/pom.xml  | 1 -
 packaging/hudi-spark-bundle/pom.xml | 6 ++
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/hudi-spark/pom.xml b/hudi-spark/pom.xml
index eb86e75..2701ac4 100644
--- a/hudi-spark/pom.xml
+++ b/hudi-spark/pom.xml
@@ -216,7 +216,6 @@
       <groupId>com.databricks</groupId>
       <artifactId>spark-avro_2.11</artifactId>
       <version>4.0.0</version>
-      <scope>provided</scope>
     </dependency>
 
diff --git a/packaging/hudi-spark-bundle/pom.xml 
b/packaging/hudi-spark-bundle/pom.xml
index 153e69b..e8ef205 100644
--- a/packaging/hudi-spark-bundle/pom.xml
+++ b/packaging/hudi-spark-bundle/pom.xml
@@ -84,6 +84,8 @@
                   <include>org.apache.hive:hive-service-rpc</include>
                   <include>org.apache.hive:hive-metastore</include>
                   <include>org.apache.hive:hive-jdbc</include>
+
+                  <include>com.databricks:spark-avro_2.11</include>
                 </includes>
 
@@ -127,6 +129,10 @@
                   <pattern>org.apache.commons.codec.</pattern>
                   <shadedPattern>org.apache.hudi.org.apache.commons.codec.</shadedPattern>
                 </relocation>
+                <relocation>
+                  <pattern>com.databricks.</pattern>
+                  <shadedPattern>org.apache.hudi.com.databricks.</shadedPattern>
+                </relocation>
 

[jira] [Created] (HUDI-260) Hudi Spark Bundle does not work when passed in extraClassPath option

2019-09-17 Thread Vinoth Chandar (Jira)
Vinoth Chandar created HUDI-260:
---

 Summary: Hudi Spark Bundle does not work when passed in 
extraClassPath option
 Key: HUDI-260
 URL: https://issues.apache.org/jira/browse/HUDI-260
 Project: Apache Hudi (incubating)
  Issue Type: Improvement
  Components: Spark datasource, SparkSQL Support
Reporter: Vinoth Chandar
Assignee: Vinoth Chandar






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-260) Hudi Spark Bundle does not work when passed in extraClassPath option

2019-09-17 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-260:

Description: 
On EMR's side we have the same findings. *a + b + c + d* work in the following 
cases:
 * The bundle jar (with databricks-avro shaded) is specified using the *--jars* or 
*spark.jars* option
 * The bundle jar (with databricks-avro shaded) is placed in the Spark home 
jars folder, i.e. the */usr/lib/spark/jars* folder

However, it does not work if the jar is specified using the 
*spark.driver.extraClassPath* and *spark.executor.extraClassPath* options, which 
is what EMR uses to configure external dependencies. Although we can drop the 
jar in the */usr/lib/spark/jars* folder, I am not sure that is recommended, 
because that folder is supposed to contain the jars coming from Spark. Extra 
dependencies from the user's side would be better off specified through the 
*extraClassPath* option.
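
As an illustration of the two submission styles being compared (the jar path and application details are placeholders):

{code:bash}
# Works: bundle passed via --jars (or spark.jars)
spark-submit --jars /path/to/hudi-spark-bundle.jar --class com.example.MyApp my-app.jar

# Fails per this report: bundle passed via extraClassPath
spark-submit \
  --conf spark.driver.extraClassPath=/path/to/hudi-spark-bundle.jar \
  --conf spark.executor.extraClassPath=/path/to/hudi-spark-bundle.jar \
  --class com.example.MyApp my-app.jar
{code}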

> Hudi Spark Bundle does not work when passed in extraClassPath option
> 
>
> Key: HUDI-260
> URL: https://issues.apache.org/jira/browse/HUDI-260
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Spark datasource, SparkSQL Support
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
>
> On EMR's side we have the same findings. *a + b + c + d* work in the following 
> cases:
>  * The bundle jar (with databricks-avro shaded) is specified using the *--jars* 
> or *spark.jars* option
>  * The bundle jar (with databricks-avro shaded) is placed in the Spark home 
> jars folder, i.e. the */usr/lib/spark/jars* folder
> However, it does not work if the jar is specified using the 
> *spark.driver.extraClassPath* and *spark.executor.extraClassPath* options, 
> which is what EMR uses to configure external dependencies. Although we can 
> drop the jar in the */usr/lib/spark/jars* folder, I am not sure that is 
> recommended, because that folder is supposed to contain the jars coming from 
> Spark. Extra dependencies from the user's side would be better off specified 
> through the *extraClassPath* option.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] yanghua commented on a change in pull request #896: Updating site to reflect recent doc changes

2019-09-17 Thread GitBox
yanghua commented on a change in pull request #896: Updating site to reflect 
recent doc changes
URL: https://github.com/apache/incubator-hudi/pull/896#discussion_r325438410
 
 

 ##
 File path: content/404.html
 ##
 @@ -6,25 +6,25 @@
 
 
 Page Not Found | Hudi
-
+
 
 Review comment:
   @vinothchandar Should I create an issue to describe the problem and submit a 
PR?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] umehrot2 commented on issue #770: remove com.databricks:spark-avro to build spark avro schema by itself

2019-09-17 Thread GitBox
umehrot2 commented on issue #770: remove com.databricks:spark-avro to build 
spark avro schema by itself
URL: https://github.com/apache/incubator-hudi/pull/770#issuecomment-532475222
 
 
   @vinothchandar At the moment, I cannot think of a good way to upgrade the 
avro version while still continuing to support Spark 2.3 or earlier. What 
@cdmikechen mentioned about asking users for the additional step of dropping the 
`avro 1.8.2` jars into Spark's classpath could be one option.
   
   If we agree that is fine, either @cdmikechen or I can create a new PR based 
off this one, with the following changes:
   - Upgrade the parquet version (see the sketch below)
   - Roll back the Timestamp conversion to Logical Type, and continue to support 
it as a String
   
   It appears that with the above 2 changes, this PR can be in a state to be 
merged. We can continue on the Timestamp issue in a separate Jira/PR.
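   
   For reference, the first change would be roughly a one-line POM edit along 
these lines (the property name and exact target version are assumptions based on 
this thread, not the actual Hudi POM):
   
   ```
   <properties>
     <!-- hypothetical: the Parquet version Hudi builds against -->
     <parquet.version>1.8.2</parquet.version>
   </properties>
   ```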


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Updated] (HUDI-261) Failed to create deltacommit.inflight file, duplicate timestamp issue

2019-09-17 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-261:

Description: 
{{Hudi jobs started failing with }}
 {{Found commits after time :20190916210221, please rollback greater commits 
first}}

 

This occurred after a "Failed to create deltacommit inflight file" exception:
{code:bash}
{{Exception in thread "main" org.apache.hudi.exception.HoodieUpsertException: 
Failed to upsert for commit time 20190916210221 at 
org.apache.hudi.HoodieWriteClient.upsert(HoodieWriteClient.java:177) at 
org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:353)
 at 
org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:228) 
at 
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:123)
 at 
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:290)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498) at 
org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) at 
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
 at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198) at 
org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228) at 
org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137) at 
org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) Caused by: 
org.apache.hudi.exception.HoodieIOException: Failed to create file 
gs:///.hoodie/20190916210223.deltacommit.inflight at 
org.apache.hudi.common.table.timeline.HoodieActiveTimeline.createFileInPath(HoodieActiveTimeline.java:391)
 at 
org.apache.hudi.common.table.timeline.HoodieActiveTimeline.createFileInMetaPath(HoodieActiveTimeline.java:371)
 at 
org.apache.hudi.common.table.timeline.HoodieActiveTimeline.saveToInflight(HoodieActiveTimeline.java:359)
 at 
org.apache.hudi.HoodieWriteClient.saveWorkloadProfileMetadataToInflight(HoodieWriteClient.java:417)
 at 
org.apache.hudi.HoodieWriteClient.upsertRecordsInternal(HoodieWriteClient.java:440)
 at org.apache.hudi.HoodieWriteClient.upsert(HoodieWriteClient.java:172) ... 14 
more Caused by: java.io.IOException: 
com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.json.GoogleJsonResponseException:
 412 Precondition Failed { "code" : 412, "errors" : [

{ "domain" : "global", "location" : "If-Match", "locationType" : "header", 
"message" : "Precondition Failed", "reason" : "conditionNotMet" }

], "message" : "Precondition Failed" } at 
com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.waitForCompletionAndThrowIfUploadFailed(AbstractGoogleAsyncWriteChannel.java:367)
 at 
com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.close(AbstractGoogleAsyncWriteChannel.java:238)
 at java.nio.channels.Channels$1.close(Channels.java:178) at 
java.io.FilterOutputStream.close(FilterOutputStream.java:159) at 
com.google.cloud.hadoop.fs.gcs.GoogleHadoopOutputStream.close(GoogleHadoopOutputStream.java:127)
 at 
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
 at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106) 
at 
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
 at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106) 
at 
org.apache.hudi.common.io.storage.SizeAwareFSDataOutputStream.close(SizeAwareFSDataOutputStream.java:66)
 at 
org.apache.hudi.common.table.timeline.HoodieActiveTimeline.createFileInPath(HoodieActiveTimeline.java:388)
 ... 19 more Caused by: 
com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.json.GoogleJsonResponseException:
 412 Precondition Failed}}
{code}

  was:
{{Hudi jobs started failing with }}
 {{Found commits after time :20190916210221, please rollback greater commits 
first}}

 

This occurred after a "Failed to create deltacommit inflight file" exception:
{code:shell}
{{Exception in thread "main" org.apache.hudi.exception.HoodieUpsertException: 
Failed to upsert for commit time 20190916210221 at 
org.apache.hudi.HoodieWriteClient.upsert(HoodieWriteClient.java:177) at 
org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:353)
 at 
org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:228) 
at 
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:123)
 at 
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:290)
 at 

[jira] [Updated] (HUDI-261) Failed to create deltacommit.inflight file, duplicate timestamp issue

2019-09-17 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-261:

Description: 
{{Hudi jobs started failing with }}
 {{Found commits after time :20190916210221, please rollback greater commits 
first}}

 

This occurred after a "Failed to create deltacommit inflight file" exception:
{code:shell}
{{Exception in thread "main" org.apache.hudi.exception.HoodieUpsertException: 
Failed to upsert for commit time 20190916210221 at 
org.apache.hudi.HoodieWriteClient.upsert(HoodieWriteClient.java:177) at 
org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:353)
 at 
org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:228) 
at 
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:123)
 at 
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:290)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498) at 
org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) at 
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
 at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198) at 
org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228) at 
org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137) at 
org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) Caused by: 
org.apache.hudi.exception.HoodieIOException: Failed to create file 
gs:///.hoodie/20190916210223.deltacommit.inflight at 
org.apache.hudi.common.table.timeline.HoodieActiveTimeline.createFileInPath(HoodieActiveTimeline.java:391)
 at 
org.apache.hudi.common.table.timeline.HoodieActiveTimeline.createFileInMetaPath(HoodieActiveTimeline.java:371)
 at 
org.apache.hudi.common.table.timeline.HoodieActiveTimeline.saveToInflight(HoodieActiveTimeline.java:359)
 at 
org.apache.hudi.HoodieWriteClient.saveWorkloadProfileMetadataToInflight(HoodieWriteClient.java:417)
 at 
org.apache.hudi.HoodieWriteClient.upsertRecordsInternal(HoodieWriteClient.java:440)
 at org.apache.hudi.HoodieWriteClient.upsert(HoodieWriteClient.java:172) ... 14 
more Caused by: java.io.IOException: 
com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.json.GoogleJsonResponseException:
 412 Precondition Failed { "code" : 412, "errors" : [

{ "domain" : "global", "location" : "If-Match", "locationType" : "header", 
"message" : "Precondition Failed", "reason" : "conditionNotMet" }

], "message" : "Precondition Failed" } at 
com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.waitForCompletionAndThrowIfUploadFailed(AbstractGoogleAsyncWriteChannel.java:367)
 at 
com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.close(AbstractGoogleAsyncWriteChannel.java:238)
 at java.nio.channels.Channels$1.close(Channels.java:178) at 
java.io.FilterOutputStream.close(FilterOutputStream.java:159) at 
com.google.cloud.hadoop.fs.gcs.GoogleHadoopOutputStream.close(GoogleHadoopOutputStream.java:127)
 at 
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
 at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106) 
at 
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
 at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106) 
at 
org.apache.hudi.common.io.storage.SizeAwareFSDataOutputStream.close(SizeAwareFSDataOutputStream.java:66)
 at 
org.apache.hudi.common.table.timeline.HoodieActiveTimeline.createFileInPath(HoodieActiveTimeline.java:388)
 ... 19 more Caused by: 
com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.json.GoogleJsonResponseException:
 412 Precondition Failed}}
{code}

  was:
{{Hudi jobs started failing with }}
 {{Found commits after time :20190916210221, please rollback greater commits 
first}}

 

This occurred after a "Failed to create deltacommit inflight file" exception:

{{Exception in thread "main" org.apache.hudi.exception.HoodieUpsertException: 
Failed to upsert for commit time 20190916210221 at 
org.apache.hudi.HoodieWriteClient.upsert(HoodieWriteClient.java:177) at 
org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:353)
 at 
org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:228) 
at 
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:123)
 at 
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:290)
 at 

[GitHub] [incubator-hudi] cdmikechen commented on issue #770: remove com.databricks:spark-avro to build spark avro schema by itself

2019-09-17 Thread GitBox
cdmikechen commented on issue #770: remove com.databricks:spark-avro to build 
spark avro schema by itself
URL: https://github.com/apache/incubator-hudi/pull/770#issuecomment-532471718
 
 
   @vinothchandar 
   The avro 1.7.7 jar under Spark can be directly replaced with 1.8.2. I have 
tested some of the code and verified that directly replacing the jar is 
feasible. In most cases, the avro 1.8.2 API is compatible with avro 1.7.7.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] cdmikechen edited a comment on issue #770: remove com.databricks:spark-avro to build spark avro schema by itself

2019-09-17 Thread GitBox
cdmikechen edited a comment on issue #770: remove com.databricks:spark-avro to 
build spark avro schema by itself
URL: https://github.com/apache/incubator-hudi/pull/770#issuecomment-532471718
 
 
   @vinothchandar 
   The avro 1.7.7 jar under Spark can be directly replaced with 1.8.2. I have 
tested some code and verified that directly replacing the jar is feasible. In 
most cases, the avro 1.8.2 API is compatible with avro 1.7.7.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Closed] (HUDI-254) Provide mechanism for installing hudi-spark-bundle onto an existing spark installation

2019-09-17 Thread BALAJI VARADARAJAN (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BALAJI VARADARAJAN closed HUDI-254.
---

> Provide mechanism for installing hudi-spark-bundle onto an existing spark 
> installation
> --
>
> Key: HUDI-254
> URL: https://issues.apache.org/jira/browse/HUDI-254
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Spark datasource, SparkSQL Support
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
> Fix For: 0.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> A lot of discussions around this kicked off from 
> [https://github.com/apache/incubator-hudi/issues/869] 
> Breaking down into phases, when we drop the hudi-spark-bundle*.jar onto the 
> `jars` folder 
>  
> a) Writing data via Hudi datasource should work 
> b) Spark datasource reads should work
>  
> c)  a + Hive Sync should work
> d) SparkSQL on Hive synced table works 
>  
> Start with Spark 2.3 (current demo setup) and then proceed to 2.4 and iron 
> out issues.
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Resolved] (HUDI-254) Provide mechanism for installing hudi-spark-bundle onto an existing spark installation

2019-09-17 Thread BALAJI VARADARAJAN (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BALAJI VARADARAJAN resolved HUDI-254.
-
Resolution: Fixed

> Provide mechanism for installing hudi-spark-bundle onto an existing spark 
> installation
> --
>
> Key: HUDI-254
> URL: https://issues.apache.org/jira/browse/HUDI-254
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Spark datasource, SparkSQL Support
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
> Fix For: 0.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> A lot of discussions around this kicked off from 
> [https://github.com/apache/incubator-hudi/issues/869] 
> Breaking down into phases, when we drop the hudi-spark-bundle*.jar onto the 
> `jars` folder 
>  
> a) Writing data via Hudi datasource should work 
> b) Spark datasource reads should work
>  
> c)  a + Hive Sync should work
> d) SparkSQL on Hive synced table works 
>  
> Start with Spark 2.3 (current demo setup) and then proceed to 2.4 and iron 
> out issues.
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Updated] (HUDI-254) Provide mechanism for installing hudi-spark-bundle onto an existing spark installation

2019-09-17 Thread BALAJI VARADARAJAN (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BALAJI VARADARAJAN updated HUDI-254:

Fix Version/s: 0.5.0

> Provide mechanism for installing hudi-spark-bundle onto an existing spark 
> installation
> --
>
> Key: HUDI-254
> URL: https://issues.apache.org/jira/browse/HUDI-254
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Spark datasource, SparkSQL Support
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
> Fix For: 0.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> A lot of discussions around this kicked off from 
> [https://github.com/apache/incubator-hudi/issues/869] 
> Breaking down into phases, when we drop the hudi-spark-bundle*.jar onto the 
> `jars` folder 
>  
> a) Writing data via Hudi datasource should work 
> b) Spark datasource reads should work
>  
> c)  a + Hive Sync should work
> d) SparkSQL on Hive synced table works 
>  
> Start with Spark 2.3 (current demo setup) and then proceed to 2.4 and iron 
> out issues.
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (HUDI-260) Hudi Spark Bundle does not work when passed in extraClassPath option

2019-09-17 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16931904#comment-16931904
 ] 

Vinoth Chandar commented on HUDI-260:
-

[~uditme] Let me reproduce this on the docker setup and see what's going on. 
Mind pasting the exception you get when you try to do a + b? 

> Hudi Spark Bundle does not work when passed in extraClassPath option
> 
>
> Key: HUDI-260
> URL: https://issues.apache.org/jira/browse/HUDI-260
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Spark datasource, SparkSQL Support
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
>
> On EMR's side we have the same findings. *a + b + c + d* work in the 
> following cases:
>  * The bundle jar (with databricks-avro shaded) is specified using the 
> *--jars* or *spark.jars* option
>  * The bundle jar (with databricks-avro shaded) is placed in the Spark Home 
> jars folder, i.e. the */usr/lib/spark/jars* folder
> However, it does not work if the jar is specified using the 
> *spark.driver.extraClassPath* and *spark.executor.extraClassPath* options, 
> which is what EMR uses to configure external dependencies. We could drop 
> the jar in the */usr/lib/spark/jars* folder, but I am not sure that is 
> recommended, because that folder is supposed to contain the jars that ship 
> with Spark. Extra dependencies from the user's side are better specified 
> through the *extraClassPath* option.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] cdmikechen edited a comment on issue #770: remove com.databricks:spark-avro to build spark avro schema by itself

2019-09-17 Thread GitBox
cdmikechen edited a comment on issue #770: remove com.databricks:spark-avro to 
build spark avro schema by itself
URL: https://github.com/apache/incubator-hudi/pull/770#issuecomment-532470332
 
 
   @umehrot2 
   In addition to the decimal problem, I also fixed a timestamp conversion 
problem. 
   On a Spark dataset, this PR gets the right result, but there are still some 
problems with Hive and Spark SQL: Hive 2.3 does not correctly identify the 
logical type in a parquet-avro file, so a timestamp may be read back as a long 
in Hive 2.3.
   I modified Hive's 
`org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableTimestampObjectInspector`
 source to solve this problem.
   ```java
   package org.apache.hadoop.hive.serde2.objectinspector.primitive;
   
   import java.sql.Timestamp;
   
   import org.apache.hadoop.hive.serde2.io.TimestampWritable;
   import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory;
   import org.apache.hadoop.io.LongWritable;
   
   public class WritableTimestampObjectInspector extends
       AbstractPrimitiveWritableObjectInspector implements
       SettableTimestampObjectInspector {
   
     public WritableTimestampObjectInspector() {
       super(TypeInfoFactory.timestampTypeInfo);
     }
   
     // parquet-avro surfaces the timestamp logical type as epoch millis in a
     // LongWritable, so each accessor converts that case back to a Timestamp.
     @Override
     public TimestampWritable getPrimitiveWritableObject(Object o) {
       if (o instanceof LongWritable) {
         return (TimestampWritable) PrimitiveObjectInspectorFactory
             .writableTimestampObjectInspector
             .create(new Timestamp(((LongWritable) o).get()));
       }
       return o == null ? null : (TimestampWritable) o;
     }
   
     public Timestamp getPrimitiveJavaObject(Object o) {
       if (o instanceof LongWritable) {
         return new Timestamp(((LongWritable) o).get());
       }
       return o == null ? null : ((TimestampWritable) o).getTimestamp();
     }
   
     public Object copyObject(Object o) {
       if (o instanceof LongWritable) {
         return new TimestampWritable(new Timestamp(((LongWritable) o).get()));
       }
       return o == null ? null : new TimestampWritable((TimestampWritable) o);
     }
   
     public Object set(Object o, byte[] bytes, int offset) {
       if (o instanceof LongWritable) {
         o = PrimitiveObjectInspectorFactory.writableTimestampObjectInspector
             .create(new Timestamp(((LongWritable) o).get()));
       } else {
         ((TimestampWritable) o).set(bytes, offset);
       }
       return o;
     }
   
     public Object set(Object o, Timestamp t) {
       if (t == null) {
         return null;
       }
       if (o instanceof LongWritable) {
         o = PrimitiveObjectInspectorFactory.writableTimestampObjectInspector.create(t);
       } else {
         ((TimestampWritable) o).set(t);
       }
       return o;
     }
   
     public Object set(Object o, TimestampWritable t) {
       if (t == null) {
         return null;
       }
       if (o instanceof LongWritable) {
         o = PrimitiveObjectInspectorFactory.writableTimestampObjectInspector
             .create(new Timestamp(((LongWritable) o).get()));
       } else {
         ((TimestampWritable) o).set(t);
       }
       return o;
     }
   
     public Object create(byte[] bytes, int offset) {
       return new TimestampWritable(bytes, offset);
     }
   
     public Object create(Timestamp t) {
       return new TimestampWritable(t);
     }
   }
   ```
   I'm looking for a solution that doesn't require modifying the Hive source 
code. Let me know if you can come up with any good ideas.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] cdmikechen commented on issue #770: remove com.databricks:spark-avro to build spark avro schema by itself

2019-09-17 Thread GitBox
cdmikechen commented on issue #770: remove com.databricks:spark-avro to build 
spark avro schema by itself
URL: https://github.com/apache/incubator-hudi/pull/770#issuecomment-532470332
 
 
   @umehrot2 
   In addition to the decimal problem, I also fixed a timestamp conversion 
problem. 
   On a Spark dataset, this PR gets the right result, but there are still some 
problems with Hive and Spark SQL: Hive 2.3 does not correctly identify the 
logical type in a parquet-avro file, so a timestamp may be read back as a long 
in Hive 2.3.
   I modified Hive's 
`org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableTimestampObjectInspector`
 source to solve this problem.
   ```java
   package org.apache.hadoop.hive.serde2.objectinspector.primitive;
   
   import java.sql.Timestamp;
   
   import org.apache.hadoop.hive.serde2.io.TimestampWritable;
   import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory;
   import org.apache.hadoop.io.LongWritable;
   
   public class WritableTimestampObjectInspector extends
       AbstractPrimitiveWritableObjectInspector implements
       SettableTimestampObjectInspector {
   
     public WritableTimestampObjectInspector() {
       super(TypeInfoFactory.timestampTypeInfo);
     }
   
     // parquet-avro surfaces the timestamp logical type as epoch millis in a
     // LongWritable, so each accessor converts that case back to a Timestamp.
     @Override
     public TimestampWritable getPrimitiveWritableObject(Object o) {
       if (o instanceof LongWritable) {
         return (TimestampWritable) PrimitiveObjectInspectorFactory
             .writableTimestampObjectInspector
             .create(new Timestamp(((LongWritable) o).get()));
       }
       return o == null ? null : (TimestampWritable) o;
     }
   
     public Timestamp getPrimitiveJavaObject(Object o) {
       if (o instanceof LongWritable) {
         return new Timestamp(((LongWritable) o).get());
       }
       return o == null ? null : ((TimestampWritable) o).getTimestamp();
     }
   
     public Object copyObject(Object o) {
       if (o instanceof LongWritable) {
         return new TimestampWritable(new Timestamp(((LongWritable) o).get()));
       }
       return o == null ? null : new TimestampWritable((TimestampWritable) o);
     }
   
     public Object set(Object o, byte[] bytes, int offset) {
       if (o instanceof LongWritable) {
         o = PrimitiveObjectInspectorFactory.writableTimestampObjectInspector
             .create(new Timestamp(((LongWritable) o).get()));
       } else {
         ((TimestampWritable) o).set(bytes, offset);
       }
       return o;
     }
   
     public Object set(Object o, Timestamp t) {
       if (t == null) {
         return null;
       }
       if (o instanceof LongWritable) {
         o = PrimitiveObjectInspectorFactory.writableTimestampObjectInspector.create(t);
       } else {
         ((TimestampWritable) o).set(t);
       }
       return o;
     }
   
     public Object set(Object o, TimestampWritable t) {
       if (t == null) {
         return null;
       }
       if (o instanceof LongWritable) {
         o = PrimitiveObjectInspectorFactory.writableTimestampObjectInspector
             .create(new Timestamp(((LongWritable) o).get()));
       } else {
         ((TimestampWritable) o).set(t);
       }
       return o;
     }
   
     public Object create(byte[] bytes, int offset) {
       return new TimestampWritable(bytes, offset);
     }
   
     public Object create(Timestamp t) {
       return new TimestampWritable(t);
     }
   }
   ```
   I'm looking for a solution that doesn't require modifying the Hive source 
code. Let me know if you can come up with any good ideas.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (HUDI-254) Provide mechanism for installing hudi-spark-bundle onto an existing spark installation

2019-09-17 Thread Udit Mehrotra (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16931817#comment-16931817
 ] 

Udit Mehrotra commented on HUDI-254:


[~vinoth] On EMR's side we have the same findings. *a + b + c + d* work in the 
following cases:
 * The bundle jar (with databricks-avro shaded) is specified using the 
*--jars* or *spark.jars* option
 * The bundle jar (with databricks-avro shaded) is placed in the Spark Home 
jars folder, i.e. the */usr/lib/spark/jars* folder

However, it does not work if the jar is specified using the 
*spark.driver.extraClassPath* and *spark.executor.extraClassPath* options, 
which is what EMR uses to configure external dependencies. We could drop the 
jar in the */usr/lib/spark/jars* folder, but I am not sure that is recommended, 
because that folder is supposed to contain the jars that ship with Spark. 
Extra dependencies from the user's side are better specified through the 
*extraClassPath* option.
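
As an illustration, here is a minimal sketch of the two setups that work 
versus the one that fails. The jar names, paths, and the job class below are 
assumptions for illustration, not taken from our actual configuration:
{code:bash}
# Works: pass the bundle via --jars (or equivalently spark.jars)
spark-submit \
  --jars /usr/lib/hudi/hudi-spark-bundle-0.5.0-SNAPSHOT.jar \
  --class com.example.MyHudiJob my-job.jar

# Works: copy the bundle into Spark's own jars folder
sudo cp hudi-spark-bundle-0.5.0-SNAPSHOT.jar /usr/lib/spark/jars/

# Fails on EMR: supply the bundle only via the extraClassPath options
spark-submit \
  --conf spark.driver.extraClassPath=/usr/lib/hudi/hudi-spark-bundle-0.5.0-SNAPSHOT.jar \
  --conf spark.executor.extraClassPath=/usr/lib/hudi/hudi-spark-bundle-0.5.0-SNAPSHOT.jar \
  --class com.example.MyHudiJob my-job.jar
{code}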

> Provide mechanism for installing hudi-spark-bundle onto an existing spark 
> installation
> --
>
> Key: HUDI-254
> URL: https://issues.apache.org/jira/browse/HUDI-254
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Spark datasource, SparkSQL Support
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
> Fix For: 0.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> A lot of discussions around this kicked off from 
> [https://github.com/apache/incubator-hudi/issues/869] 
> Breaking down into phases, when we drop the hudi-spark-bundle*.jar onto the 
> `jars` folder 
>  
> a) Writing data via Hudi datasource should work 
> b) Spark datasource reads should work
>  
> c)  a + Hive Sync should work
> d) SparkSQL on Hive synced table works 
>  
> Start with Spark 2.3 (current demo setup) and then proceed to 2.4 and iron 
> out issues.
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (HUDI-261) Failed to create deltacommit.inflight file, duplicate timestamp issue

2019-09-17 Thread Elon Azoulay (Jira)
Elon Azoulay created HUDI-261:
-

 Summary: Failed to create deltacommit.inflight file, duplicate 
timestamp issue
 Key: HUDI-261
 URL: https://issues.apache.org/jira/browse/HUDI-261
 Project: Apache Hudi (incubating)
  Issue Type: Bug
  Components: deltastreamer
Reporter: Elon Azoulay


{{Hudi jobs started failing with }}
{{Found commits after time :20190916210221, please rollback greater commits 
first}}

 

This occurred after a "Failed to create deltacommit inflight file" exception:

{{Exception in thread "main" org.apache.hudi.exception.HoodieUpsertException: 
Failed to upsert for commit time 20190916210221 at 
org.apache.hudi.HoodieWriteClient.upsert(HoodieWriteClient.java:177) at 
org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:353)
 at 
org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:228) 
at 
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:123)
 at 
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:290)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498) at 
org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) at 
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
 at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198) at 
org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228) at 
org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137) at 
org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) Caused by: 
org.apache.hudi.exception.HoodieIOException: Failed to create file 
gs://css-data-warehouse-hudi-ingest/hudi/data/hudi_ingest_raw/order_central_updates_latest/.hoodie/20190916210223.deltacommit.inflight
 at 
org.apache.hudi.common.table.timeline.HoodieActiveTimeline.createFileInPath(HoodieActiveTimeline.java:391)
 at 
org.apache.hudi.common.table.timeline.HoodieActiveTimeline.createFileInMetaPath(HoodieActiveTimeline.java:371)
 at 
org.apache.hudi.common.table.timeline.HoodieActiveTimeline.saveToInflight(HoodieActiveTimeline.java:359)
 at 
org.apache.hudi.HoodieWriteClient.saveWorkloadProfileMetadataToInflight(HoodieWriteClient.java:417)
 at 
org.apache.hudi.HoodieWriteClient.upsertRecordsInternal(HoodieWriteClient.java:440)
 at org.apache.hudi.HoodieWriteClient.upsert(HoodieWriteClient.java:172) ... 14 
more Caused by: java.io.IOException: 
com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.json.GoogleJsonResponseException:
 412 Precondition Failed \{ "code" : 412, "errors" : [ { "domain" : "global", 
"location" : "If-Match", "locationType" : "header", "message" : "Precondition 
Failed", "reason" : "conditionNotMet" } ], "message" : "Precondition Failed" } 
at 
com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.waitForCompletionAndThrowIfUploadFailed(AbstractGoogleAsyncWriteChannel.java:367)
 at 
com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.close(AbstractGoogleAsyncWriteChannel.java:238)
 at java.nio.channels.Channels$1.close(Channels.java:178) at 
java.io.FilterOutputStream.close(FilterOutputStream.java:159) at 
com.google.cloud.hadoop.fs.gcs.GoogleHadoopOutputStream.close(GoogleHadoopOutputStream.java:127)
 at 
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
 at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106) 
at 
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
 at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106) 
at 
org.apache.hudi.common.io.storage.SizeAwareFSDataOutputStream.close(SizeAwareFSDataOutputStream.java:66)
 at 
org.apache.hudi.common.table.timeline.HoodieActiveTimeline.createFileInPath(HoodieActiveTimeline.java:388)
 ... 19 more Caused by: 
com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.json.GoogleJsonResponseException:
 412 Precondition Failed}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-261) Failed to create deltacommit.inflight file, duplicate timestamp issue

2019-09-17 Thread Elon Azoulay (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elon Azoulay updated HUDI-261:
--
Description: 
{{Hudi jobs started failing with }}
 {{Found commits after time :20190916210221, please rollback greater commits 
first}}

 

This occurred after a "Failed to create deltacommit inflight file" exception:

{{Exception in thread "main" org.apache.hudi.exception.HoodieUpsertException: 
Failed to upsert for commit time 20190916210221 at 
org.apache.hudi.HoodieWriteClient.upsert(HoodieWriteClient.java:177) at 
org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:353)
 at 
org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:228) 
at 
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:123)
 at 
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:290)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498) at 
org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) at 
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
 at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198) at 
org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228) at 
org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137) at 
org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) Caused by: 
org.apache.hudi.exception.HoodieIOException: Failed to create file 
gs:///.hoodie/20190916210223.deltacommit.inflight at 
org.apache.hudi.common.table.timeline.HoodieActiveTimeline.createFileInPath(HoodieActiveTimeline.java:391)
 at 
org.apache.hudi.common.table.timeline.HoodieActiveTimeline.createFileInMetaPath(HoodieActiveTimeline.java:371)
 at 
org.apache.hudi.common.table.timeline.HoodieActiveTimeline.saveToInflight(HoodieActiveTimeline.java:359)
 at 
org.apache.hudi.HoodieWriteClient.saveWorkloadProfileMetadataToInflight(HoodieWriteClient.java:417)
 at 
org.apache.hudi.HoodieWriteClient.upsertRecordsInternal(HoodieWriteClient.java:440)
 at org.apache.hudi.HoodieWriteClient.upsert(HoodieWriteClient.java:172) ... 14 
more Caused by: java.io.IOException: 
com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.json.GoogleJsonResponseException:
 412 Precondition Failed { "code" : 412, "errors" : [

{ "domain" : "global", "location" : "If-Match", "locationType" : "header", 
"message" : "Precondition Failed", "reason" : "conditionNotMet" }

], "message" : "Precondition Failed" } at 
com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.waitForCompletionAndThrowIfUploadFailed(AbstractGoogleAsyncWriteChannel.java:367)
 at 
com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.close(AbstractGoogleAsyncWriteChannel.java:238)
 at java.nio.channels.Channels$1.close(Channels.java:178) at 
java.io.FilterOutputStream.close(FilterOutputStream.java:159) at 
com.google.cloud.hadoop.fs.gcs.GoogleHadoopOutputStream.close(GoogleHadoopOutputStream.java:127)
 at 
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
 at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106) 
at 
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
 at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106) 
at 
org.apache.hudi.common.io.storage.SizeAwareFSDataOutputStream.close(SizeAwareFSDataOutputStream.java:66)
 at 
org.apache.hudi.common.table.timeline.HoodieActiveTimeline.createFileInPath(HoodieActiveTimeline.java:388)
 ... 19 more Caused by: 
com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.json.GoogleJsonResponseException:
 412 Precondition Failed}}

  was:
{{Hudi jobs started failing with }}
{{Found commits after time :20190916210221, please rollback greater commits 
first}}

 

This occurred after a "Failed to create deltacommit inflight file" exception:

{{Exception in thread "main" org.apache.hudi.exception.HoodieUpsertException: 
Failed to upsert for commit time 20190916210221 at 
org.apache.hudi.HoodieWriteClient.upsert(HoodieWriteClient.java:177) at 
org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:353)
 at 
org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:228) 
at 
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:123)
 at 
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:290)
 at 

[GitHub] [incubator-hudi] cdmikechen edited a comment on issue #770: remove com.databricks:spark-avro to build spark avro schema by itself

2019-09-17 Thread GitBox
cdmikechen edited a comment on issue #770: remove com.databricks:spark-avro to 
build spark avro schema by itself
URL: https://github.com/apache/incubator-hudi/pull/770#issuecomment-532471718
 
 
   @vinothchandar 
   The avro 1.7.7 jar under Spark can be directly replaced with 1.8.2. I have 
tested some code and verified that directly replacing the jar is feasible. In 
most cases, the avro 1.8.2 API is compatible with avro 1.7.7.
   This method applies to Spark 2.2 and 2.3; Spark 2.4 can be used directly 
because it ships with its own avro support.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] cdmikechen edited a comment on issue #770: remove com.databricks:spark-avro to build spark avro schema by itself

2019-09-17 Thread GitBox
cdmikechen edited a comment on issue #770: remove com.databricks:spark-avro to 
build spark avro schema by itself
URL: https://github.com/apache/incubator-hudi/pull/770#issuecomment-532471718
 
 
   @vinothchandar 
   The avro 1.7.7 jar under Spark can be directly replaced with 1.8.2. I have 
tested some code on Spark 2.2 and verified that directly replacing the jar is 
feasible. In most cases, the avro 1.8.2 API is compatible with avro 1.7.7.
   This method applies to Spark 2.2 and 2.3; Spark 2.4 can be used directly 
because it ships with its own avro support.
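   For anyone trying this, a rough sketch of the swap (a hedged example; the 
$SPARK_HOME layout and the exact jar file names are assumptions, adjust them 
to your installation):
   ```bash
   # Replace the avro 1.7.7 jar that ships with Spark 2.2/2.3 with avro 1.8.2.
   rm "$SPARK_HOME"/jars/avro-1.7.7.jar
   cp avro-1.8.2.jar "$SPARK_HOME"/jars/
   ```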


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (HUDI-261) Failed to create deltacommit.inflight file, duplicate timestamp issue

2019-09-17 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16931939#comment-16931939
 ] 

Vinoth Chandar commented on HUDI-261:
-

The interesting thing is that the timestamps don't match between the commit 
and the underlying file; that does not make sense, actually. 
 
Failed to upsert for commit time 20190916210221 at 
org.apache.hudi.HoodieWriteClient.upsert(HoodieWriteClient.java:177) 

and 

org.apache.hudi.exception.HoodieIOException: Failed to create file 
gs:///.hoodie/20190916210223.deltacommit.inflight

[~z0le] and the error seems to be the following? 

{code}
Exception in thread "main" java.lang.IllegalArgumentException: Earliest write 
inflight instant time must be later than compaction time. Earliest 
:[==>20190916210221__deltacommit__INFLIGHT], Compaction scheduled at 
20190916210418
{code}
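
One way to sanity-check the timeline for such overlapping instants is to list 
the files under the table's `.hoodie` folder sorted by name, since the instant 
timestamps are embedded in the file names (a sketch; the bucket and table path 
below are placeholders, as the real path is elided in the report):
{code:bash}
gsutil ls gs://your-bucket/path/to/table/.hoodie/ | sort
{code}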

> Failed to create deltacommit.inflight file, duplicate timestamp issue
> -
>
> Key: HUDI-261
> URL: https://issues.apache.org/jira/browse/HUDI-261
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: deltastreamer
>Reporter: Elon Azoulay
>Priority: Major
>
> {{Hudi jobs started failing with }}
>  {{Found commits after time :20190916210221, please rollback greater commits 
> first}}
>  
> This occurred after a "Failed to create deltacommit inflight file" exception:
> {code:bash}
> {{Exception in thread "main" org.apache.hudi.exception.HoodieUpsertException: 
> Failed to upsert for commit time 20190916210221 at 
> org.apache.hudi.HoodieWriteClient.upsert(HoodieWriteClient.java:177) at 
> org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:353)
>  at 
> org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:228)
>  at 
> org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:123)
>  at 
> org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:290)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) 
> at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
>  at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198) 
> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228) at 
> org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137) at 
> org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) Caused by: 
> org.apache.hudi.exception.HoodieIOException: Failed to create file 
> gs:///.hoodie/20190916210223.deltacommit.inflight at 
> org.apache.hudi.common.table.timeline.HoodieActiveTimeline.createFileInPath(HoodieActiveTimeline.java:391)
>  at 
> org.apache.hudi.common.table.timeline.HoodieActiveTimeline.createFileInMetaPath(HoodieActiveTimeline.java:371)
>  at 
> org.apache.hudi.common.table.timeline.HoodieActiveTimeline.saveToInflight(HoodieActiveTimeline.java:359)
>  at 
> org.apache.hudi.HoodieWriteClient.saveWorkloadProfileMetadataToInflight(HoodieWriteClient.java:417)
>  at 
> org.apache.hudi.HoodieWriteClient.upsertRecordsInternal(HoodieWriteClient.java:440)
>  at org.apache.hudi.HoodieWriteClient.upsert(HoodieWriteClient.java:172) ... 
> 14 more Caused by: java.io.IOException: 
> com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.json.GoogleJsonResponseException:
>  412 Precondition Failed { "code" : 412, "errors" : [
> { "domain" : "global", "location" : "If-Match", "locationType" : "header", 
> "message" : "Precondition Failed", "reason" : "conditionNotMet" }
> ], "message" : "Precondition Failed" } at 
> com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.waitForCompletionAndThrowIfUploadFailed(AbstractGoogleAsyncWriteChannel.java:367)
>  at 
> com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.close(AbstractGoogleAsyncWriteChannel.java:238)
>  at java.nio.channels.Channels$1.close(Channels.java:178) at 
> java.io.FilterOutputStream.close(FilterOutputStream.java:159) at 
> com.google.cloud.hadoop.fs.gcs.GoogleHadoopOutputStream.close(GoogleHadoopOutputStream.java:127)
>  at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
>  at 
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106) at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
>  at 
> 

[jira] [Commented] (HUDI-261) Failed to create deltacommit.inflight file, duplicate timestamp issue

2019-09-17 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16931940#comment-16931940
 ] 

Vinoth Chandar commented on HUDI-261:
-

`Found commits after time :20190916210221, please rollback greater commits 
first` is thrown from two places, so a stack trace would be great for looking 
at the line number.

> Failed to create deltacommit.inflight file, duplicate timestamp issue
> -
>
> Key: HUDI-261
> URL: https://issues.apache.org/jira/browse/HUDI-261
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: deltastreamer
>Reporter: Elon Azoulay
>Priority: Major
>
> {{Hudi jobs started failing with }}
>  {{Found commits after time :20190916210221, please rollback greater commits 
> first}}
>  
> This occurred after a "Failed to create deltacommit inflight file" exception:
> {code:bash}
> {{Exception in thread "main" org.apache.hudi.exception.HoodieUpsertException: 
> Failed to upsert for commit time 20190916210221 at 
> org.apache.hudi.HoodieWriteClient.upsert(HoodieWriteClient.java:177) at 
> org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:353)
>  at 
> org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:228)
>  at 
> org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:123)
>  at 
> org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:290)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) 
> at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
>  at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198) 
> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228) at 
> org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137) at 
> org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) Caused by: 
> org.apache.hudi.exception.HoodieIOException: Failed to create file 
> gs:///.hoodie/20190916210223.deltacommit.inflight at 
> org.apache.hudi.common.table.timeline.HoodieActiveTimeline.createFileInPath(HoodieActiveTimeline.java:391)
>  at 
> org.apache.hudi.common.table.timeline.HoodieActiveTimeline.createFileInMetaPath(HoodieActiveTimeline.java:371)
>  at 
> org.apache.hudi.common.table.timeline.HoodieActiveTimeline.saveToInflight(HoodieActiveTimeline.java:359)
>  at 
> org.apache.hudi.HoodieWriteClient.saveWorkloadProfileMetadataToInflight(HoodieWriteClient.java:417)
>  at 
> org.apache.hudi.HoodieWriteClient.upsertRecordsInternal(HoodieWriteClient.java:440)
>  at org.apache.hudi.HoodieWriteClient.upsert(HoodieWriteClient.java:172) ... 
> 14 more Caused by: java.io.IOException: 
> com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.json.GoogleJsonResponseException:
>  412 Precondition Failed { "code" : 412, "errors" : [
> { "domain" : "global", "location" : "If-Match", "locationType" : "header", 
> "message" : "Precondition Failed", "reason" : "conditionNotMet" }
> ], "message" : "Precondition Failed" } at 
> com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.waitForCompletionAndThrowIfUploadFailed(AbstractGoogleAsyncWriteChannel.java:367)
>  at 
> com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.close(AbstractGoogleAsyncWriteChannel.java:238)
>  at java.nio.channels.Channels$1.close(Channels.java:178) at 
> java.io.FilterOutputStream.close(FilterOutputStream.java:159) at 
> com.google.cloud.hadoop.fs.gcs.GoogleHadoopOutputStream.close(GoogleHadoopOutputStream.java:127)
>  at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
>  at 
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106) at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
>  at 
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106) at 
> org.apache.hudi.common.io.storage.SizeAwareFSDataOutputStream.close(SizeAwareFSDataOutputStream.java:66)
>  at 
> org.apache.hudi.common.table.timeline.HoodieActiveTimeline.createFileInPath(HoodieActiveTimeline.java:388)
>  ... 19 more Caused by: 
> com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.json.GoogleJsonResponseException:
>  412 Precondition Failed}}
> {code}



--
This message was sent by Atlassian Jira

[jira] [Commented] (HUDI-254) Provide mechanism for installing hudi-spark-bundle onto an existing spark installation

2019-09-17 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16931902#comment-16931902
 ] 

Vinoth Chandar commented on HUDI-254:
-

[~uditme] Sorry to keep switching JIRAs. Seems this needs to be closed so 
Balaji can include it in the release notes :) 

Awesome that we could narrow it down to this specific extraClassPath issue. 
Let's hash it out in HUDI-260. Already copied your text over. 

> Provide mechanism for installing hudi-spark-bundle onto an existing spark 
> installation
> --
>
> Key: HUDI-254
> URL: https://issues.apache.org/jira/browse/HUDI-254
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Spark datasource, SparkSQL Support
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
> Fix For: 0.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> A lot of discussions around this kicked off from 
> [https://github.com/apache/incubator-hudi/issues/869] 
> Breaking down into phases, when we drop the hudi-spark-bundle*.jar onto the 
> `jars` folder 
>  
> a) Writing data via Hudi datasource should work 
> b) Spark datasource reads should work
>  
> c)  a + Hive Sync should work
> d) SparkSQL on Hive synced table works 
>  
> Start with Spark 2.3 (current demo setup) and then proceed to 2.4 and iron 
> out issues.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] umehrot2 commented on issue #869: Hudi Spark error when spark bundle jar is added to spark's classpath

2019-09-17 Thread GitBox
umehrot2 commented on issue #869: Hudi Spark error when spark bundle jar is 
added to spark's classpath
URL: https://github.com/apache/incubator-hudi/issues/869#issuecomment-532404509
 
 
   Tested this on our side and commented on 
https://issues.apache.org/jira/browse/HUDI-254 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (HUDI-260) Hudi Spark Bundle does not work when passed in extraClassPath option

2019-09-17 Thread Udit Mehrotra (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16931918#comment-16931918
 ] 

Udit Mehrotra commented on HUDI-260:


Thanks for creating and checking on this issue. Here is the exception we get:


{noformat}
Driver stacktrace:
  at 
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:2041)
  at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:2029)
  at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:2028)
  at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
  at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2028)
  at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:966)
  at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:966)
  at scala.Option.foreach(Option.scala:257)
  at 
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:966)
  at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2262)
  at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2211)
  at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2200)
  at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
  at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:777)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2061)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2082)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2101)
  at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1364)
  at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
  at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
  at org.apache.spark.rdd.RDD.take(RDD.scala:1337)
  at org.apache.spark.rdd.RDD$$anonfun$isEmpty$1.apply$mcZ$sp(RDD.scala:1472)
  at org.apache.spark.rdd.RDD$$anonfun$isEmpty$1.apply(RDD.scala:1472)
  at org.apache.spark.rdd.RDD$$anonfun$isEmpty$1.apply(RDD.scala:1472)
  at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
  at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
  at org.apache.spark.rdd.RDD.isEmpty(RDD.scala:1471)
  at org.apache.spark.api.java.JavaRDDLike$class.isEmpty(JavaRDDLike.scala:544)
  at org.apache.spark.api.java.AbstractJavaRDDLike.isEmpty(JavaRDDLike.scala:45)
  at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:136)
  at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:91)
  at 
org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
  at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
  at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
  at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
  at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
  at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
  at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:156)
  at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
  at 
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
  at 
org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
  at 
org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
  at 
org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
  at 
org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
  at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
  at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
  at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
  at 
org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
  ... 

Build failed in Jenkins: hudi-snapshot-deployment-0.5 #41

2019-09-17 Thread Apache Jenkins Server
See 


--
[...truncated 2.18 KB...]
m2.conf
mvn
mvn.cmd
mvnDebug
mvnDebug.cmd
mvnyjp

/home/jenkins/tools/maven/apache-maven-3.5.4/boot:
plexus-classworlds-2.5.2.jar

/home/jenkins/tools/maven/apache-maven-3.5.4/conf:
logging
settings.xml
toolchains.xml

/home/jenkins/tools/maven/apache-maven-3.5.4/conf/logging:
simplelogger.properties

/home/jenkins/tools/maven/apache-maven-3.5.4/lib:
aopalliance-1.0.jar
cdi-api-1.0.jar
cdi-api.license
commons-cli-1.4.jar
commons-cli.license
commons-io-2.5.jar
commons-io.license
commons-lang3-3.5.jar
commons-lang3.license
ext
guava-20.0.jar
guice-4.2.0-no_aop.jar
jansi-1.17.1.jar
jansi-native
javax.inject-1.jar
jcl-over-slf4j-1.7.25.jar
jcl-over-slf4j.license
jsr250-api-1.0.jar
jsr250-api.license
maven-artifact-3.5.4.jar
maven-artifact.license
maven-builder-support-3.5.4.jar
maven-builder-support.license
maven-compat-3.5.4.jar
maven-compat.license
maven-core-3.5.4.jar
maven-core.license
maven-embedder-3.5.4.jar
maven-embedder.license
maven-model-3.5.4.jar
maven-model-builder-3.5.4.jar
maven-model-builder.license
maven-model.license
maven-plugin-api-3.5.4.jar
maven-plugin-api.license
maven-repository-metadata-3.5.4.jar
maven-repository-metadata.license
maven-resolver-api-1.1.1.jar
maven-resolver-api.license
maven-resolver-connector-basic-1.1.1.jar
maven-resolver-connector-basic.license
maven-resolver-impl-1.1.1.jar
maven-resolver-impl.license
maven-resolver-provider-3.5.4.jar
maven-resolver-provider.license
maven-resolver-spi-1.1.1.jar
maven-resolver-spi.license
maven-resolver-transport-wagon-1.1.1.jar
maven-resolver-transport-wagon.license
maven-resolver-util-1.1.1.jar
maven-resolver-util.license
maven-settings-3.5.4.jar
maven-settings-builder-3.5.4.jar
maven-settings-builder.license
maven-settings.license
maven-shared-utils-3.2.1.jar
maven-shared-utils.license
maven-slf4j-provider-3.5.4.jar
maven-slf4j-provider.license
org.eclipse.sisu.inject-0.3.3.jar
org.eclipse.sisu.inject.license
org.eclipse.sisu.plexus-0.3.3.jar
org.eclipse.sisu.plexus.license
plexus-cipher-1.7.jar
plexus-cipher.license
plexus-component-annotations-1.7.1.jar
plexus-component-annotations.license
plexus-interpolation-1.24.jar
plexus-interpolation.license
plexus-sec-dispatcher-1.4.jar
plexus-sec-dispatcher.license
plexus-utils-3.1.0.jar
plexus-utils.license
slf4j-api-1.7.25.jar
slf4j-api.license
wagon-file-3.1.0.jar
wagon-file.license
wagon-http-3.1.0-shaded.jar
wagon-http.license
wagon-provider-api-3.1.0.jar
wagon-provider-api.license

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/ext:
README.txt

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native:
freebsd32
freebsd64
linux32
linux64
osx
README.txt
windows32
windows64

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/osx:
libjansi.jnilib

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows32:
jansi.dll

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows64:
jansi.dll
Finished /home/jenkins/tools/maven/apache-maven-3.5.4 Directory Listing :
Detected current version as: 
'HUDI_home=
0.5.1-SNAPSHOT'
[INFO] Scanning for projects...
[INFO] 
[INFO] Reactor Build Order:
[INFO] 
[INFO] Hudi   [pom]
[INFO] hudi-common[jar]
[INFO] hudi-timeline-service  [jar]
[INFO] hudi-hadoop-mr [jar]
[INFO] hudi-client[jar]
[INFO] hudi-hive  [jar]
[INFO] hudi-spark [jar]
[INFO] hudi-utilities [jar]
[INFO] hudi-cli   [jar]
[INFO] hudi-hadoop-mr-bundle  [jar]
[INFO] hudi-hive-bundle   [jar]
[INFO] hudi-spark-bundle  [jar]
[INFO] hudi-presto-bundle [jar]
[INFO] hudi-utilities-bundle  [jar]
[INFO] hudi-timeline-server-bundle[jar]
[INFO] hudi-hadoop-docker 

[jira] [Commented] (HUDI-215) Update documentation for joining slack group

2019-09-17 Thread leesf (Jira)


[ https://issues.apache.org/jira/browse/HUDI-215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932079#comment-16932079 ]

leesf commented on HUDI-215:


Fixed via asf-site: cb57c912bf3d9ceb8b81a512b17d61cbd7ad1af9

> Update documentation for joining slack group
> 
>
> Key: HUDI-215
> URL: https://issues.apache.org/jira/browse/HUDI-215
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Docs
>Reporter: Pratyaksh Sharma
>Assignee: Pratyaksh Sharma
>Priority: Major
>  Labels: documentation, newbie, pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently we have a list of pre-approved mail domains for joining the apache-hudi 
> Slack group. Anyone whose mail id is not on that list has to comment on GitHub 
> issue [https://github.com/apache/incubator-hudi/issues/143] to be added to the 
> group. However, there is a documentation gap: this issue is not mentioned in the 
> documentation. This Jira tracks updating the documentation to reference the 
> GitHub issue on the community.html page.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] leesf commented on issue #143: Tracking ticket for folks to be added to slack group

2019-09-17 Thread GitBox
leesf commented on issue #143: Tracking ticket for folks to be added to slack 
group
URL: https://github.com/apache/incubator-hudi/issues/143#issuecomment-532526762
 
 
   Hi @vinothchandar, could you please add leesf0...@gmail.com to the Slack 
group?




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #896: Updating site to reflect recent doc changes

2019-09-17 Thread GitBox
vinothchandar commented on a change in pull request #896: Updating site to 
reflect recent doc changes
URL: https://github.com/apache/incubator-hudi/pull/896#discussion_r325476668
 
 

 ##
 File path: content/404.html
 ##
 @@ -6,25 +6,25 @@
 
 
 Page Not Found | Hudi
-
+
 
 Review comment:
   I'll let you and @bhasudha decide. If the fix is simple, she can also do it 
in this PR if you give pointers.




[GitHub] [incubator-hudi] yanghua commented on a change in pull request #896: Updating site to reflect recent doc changes

2019-09-17 Thread GitBox
yanghua commented on a change in pull request #896: Updating site to reflect 
recent doc changes
URL: https://github.com/apache/incubator-hudi/pull/896#discussion_r325178134
 
 

 ##
 File path: content/404.html
 ##
 @@ -6,25 +6,25 @@
 
 
 Page Not Found | Hudi
-
+
 
 Review comment:
   @bhasudha It seems that if the `href` does not start with `/`, the URL of 
the resource cannot be resolved correctly: a relative href is resolved against 
the current page's directory rather than the site root, as sketched below.
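
   To illustrate the resolution difference (a hedged sketch, not taken from
   the PR; the page URL and asset path below are hypothetical):

   import java.net.URI

   object HrefResolution {
     def main(args: Array[String]): Unit = {
       // Hypothetical page deep in the site hierarchy.
       val page = new URI("https://hudi.apache.org/docs/quickstart.html")
       // Relative href: resolved against the page's directory.
       println(page.resolve("assets/css/main.css"))
       // -> https://hudi.apache.org/docs/assets/css/main.css (breaks on nested pages)
       // Root-relative href: resolved against the site root.
       println(page.resolve("/assets/css/main.css"))
       // -> https://hudi.apache.org/assets/css/main.css (stable everywhere)
     }
   }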

