[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #780: Cleanup Maven POM/Classpath

2019-07-11 Thread GitBox
vinothchandar commented on a change in pull request #780: Cleanup Maven 
POM/Classpath
URL: https://github.com/apache/incubator-hudi/pull/780#discussion_r302763907
 
 

 ##
 File path: hoodie-client/pom.xml
 ##
 @@ -78,6 +79,71 @@
   hoodie-timeline-service
   ${project.version}
 
+
+
+
+  log4j
+  log4j
+
+
+
+
+  org.apache.parquet
+  parquet-avro
+
+
+  org.apache.parquet
+  parquet-hadoop
+
+
+
+
+  org.apache.spark
+  spark-core_2.11
+
+
+  org.apache.spark
+  spark-sql_2.11
+
+
+
+
+  io.dropwizard.metrics
+  metrics-graphite
+
+
+  io.dropwizard.metrics
+  metrics-core
+
+
+
+  com.google.guava
+  guava
+
+
+
+  com.beust
+  jcommander
+  1.48
+
+
+
+  org.htrace
 
 Review comment:
   removed already


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #780: Cleanup Maven POM/Classpath

2019-07-11 Thread GitBox
vinothchandar commented on a change in pull request #780: Cleanup Maven 
POM/Classpath
URL: https://github.com/apache/incubator-hudi/pull/780#discussion_r302762037
 
 

 ##
 File path: hoodie-cli/pom.xml
 ##
 @@ -29,8 +29,6 @@
   
 1.2.0.RELEASE
 org.springframework.shell.Bootstrap
-1.2.17
-4.10
 
 Review comment:
   already removed junit.




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #780: Cleanup Maven POM/Classpath

2019-07-11 Thread GitBox
vinothchandar commented on a change in pull request #780: Cleanup Maven 
POM/Classpath
URL: https://github.com/apache/incubator-hudi/pull/780#discussion_r302763536
 
 

 ##
 File path: hoodie-client/pom.xml
 ##
 @@ -78,6 +79,71 @@
   hoodie-timeline-service
   ${project.version}
 
+
+
+
+  log4j
+  log4j
+
+
+
+
+  org.apache.parquet
+  parquet-avro
+
+
+  org.apache.parquet
 
 Review comment:
   removed already




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #780: Cleanup Maven POM/Classpath

2019-07-11 Thread GitBox
vinothchandar commented on a change in pull request #780: Cleanup Maven 
POM/Classpath
URL: https://github.com/apache/incubator-hudi/pull/780#discussion_r302762896
 
 

 ##
 File path: hoodie-cli/pom.xml
 ##
 @@ -159,67 +176,51 @@
   spark-sql_2.11
 
 
+
 
-  com.jakewharton.fliptables
-  fliptables
-  1.0.2
+  commons-dbcp
+  commons-dbcp
 
 
 
-  log4j
-  log4j
-  ${log4j.version}
+  org.springframework.shell
+  spring-shell
+  ${spring.shell.version}
 
 
 
-  com.uber.hoodie
-  hoodie-hive
-  ${project.version}
+  de.vandermeer
+  asciitable
+  0.2.5
 
 
 
-  com.uber.hoodie
-  hoodie-client
-  ${project.version}
+  com.jakewharton.fliptables
+  fliptables
+  1.0.2
 
 
+
+  joda-time
+  joda-time
 
 Review comment:
   removed version




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #780: Cleanup Maven POM/Classpath

2019-07-11 Thread GitBox
vinothchandar commented on a change in pull request #780: Cleanup Maven 
POM/Classpath
URL: https://github.com/apache/incubator-hudi/pull/780#discussion_r302762792
 
 

 ##
 File path: hoodie-cli/pom.xml
 ##
 @@ -159,67 +176,51 @@
   spark-sql_2.11
 
 
+
 
-  com.jakewharton.fliptables
-  fliptables
-  1.0.2
+  commons-dbcp
+  commons-dbcp
 
 
 
-  log4j
-  log4j
-  ${log4j.version}
+  org.springframework.shell
+  spring-shell
+  ${spring.shell.version}
 
 
 
-  com.uber.hoodie
-  hoodie-hive
-  ${project.version}
+  de.vandermeer
 
 Review comment:
   not needed. removed already




[GitHub] [incubator-hudi] yihua opened a new pull request #781: [HUDI-161] Remove --key-generator-class CLI arg in HoodieDeltaStreamer and use key generator class specified in datasource properties

2019-07-11 Thread GitBox
yihua opened a new pull request #781: [HUDI-161] Remove --key-generator-class 
CLI arg in HoodieDeltaStreamer and use key generator class specified in 
datasource properties
URL: https://github.com/apache/incubator-hudi/pull/781
 
 
   1. Removing --key-generator-class CLI arg in HoodieDeltaStreamer.
   2. Changing the method `createKeyGenerator()` in `DataSourceUtils` class to 
directly get the class name of the key generator from the properties, instead 
of using the passed-in argument of the key generator class.
   3. Adjusting the tests in `TestHoodieDeltaStreamer` to take the specified 
key generator class.  Adding one test around the invalid class name of the key 
generator.
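
   A minimal sketch of what item 2 describes, assuming a reflective lookup keyed by a property. This is not the actual `DataSourceUtils` code: the property name `example.keygenerator.class`, the `KeyGen` interface and `SimpleKeyGen` are hypothetical stand-ins for illustration.

```java
import java.util.Properties;

public class KeyGeneratorFromProps {
    // Hypothetical property name, used here for illustration only.
    static final String KEYGEN_CLASS_PROP = "example.keygenerator.class";

    /** Hypothetical stand-in for a key generator contract. */
    public interface KeyGen {
        String recordKey(String row);
    }

    /** Example implementation exposing a constructor that takes Properties. */
    public static class SimpleKeyGen implements KeyGen {
        public SimpleKeyGen(Properties props) {
            // A real generator would read its field names from props.
        }

        @Override
        public String recordKey(String row) {
            // Treat the first comma-separated field as the record key.
            return row.split(",")[0];
        }
    }

    /** Reads the class name from the properties and instantiates it reflectively. */
    public static KeyGen createKeyGenerator(Properties props) {
        String className = props.getProperty(KEYGEN_CLASS_PROP);
        if (className == null || className.isEmpty()) {
            throw new IllegalArgumentException("Missing property " + KEYGEN_CLASS_PROP);
        }
        try {
            return (KeyGen) Class.forName(className)
                    .getConstructor(Properties.class)
                    .newInstance(props);
        } catch (ReflectiveOperationException | ClassCastException e) {
            // Covers an unknown class name, a missing constructor and a wrong type.
            throw new IllegalArgumentException(
                    "Could not load key generator class " + className, e);
        }
    }
}
```

   With this shape, an invalid class name surfaces as a single `IllegalArgumentException`, which is the kind of case the new test in item 3 would exercise.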




[GitHub] [incubator-hudi] thesuperzapper commented on issue #638: [HUDI-12] Upgrade to Spark 2.4, Avro 1.8.2, Parquet 1.10.0...

2019-07-11 Thread GitBox
thesuperzapper commented on issue #638: [HUDI-12] Upgrade to Spark 2.4, Avro 
1.8.2, Parquet 1.10.0... 
URL: https://github.com/apache/incubator-hudi/pull/638#issuecomment-510649936
 
 
   We are working on #780 first




[GitHub] [incubator-hudi] vinothchandar commented on issue #764: Hoodie 0.4.7: Error upserting bucketType UPDATE for partition #, No value present

2019-07-11 Thread GitBox
vinothchandar commented on issue #764: Hoodie 0.4.7:  Error upserting 
bucketType UPDATE for partition #, No value present
URL: https://github.com/apache/incubator-hudi/issues/764#issuecomment-510644864
 
 
   @amaranathv again, the issue comes from this line:
   `.onParentPath(new Path(hoodieTable.getMetaClient().getBasePath(), partitionPath))`,
   where I suspect partitionPath is null. Can you please ensure your dataset
   contains non-null partition paths? (I am going to add an explicit exception to
   flag this case in the KeyGenerator subclasses or somewhere else.)
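
   A minimal sketch of the kind of explicit check mentioned above, under the assumption that a null partition path should fail fast and an empty one means a non-partitioned dataset. `PartitionPathCheck` and `resolvePartitionDir` are hypothetical names, not Hudi code, and plain strings stand in for Hadoop `Path` objects.

```java
public class PartitionPathCheck {

    /**
     * Resolves the directory for a record's partition.
     * Assumption: null is always an error; an empty string maps to the base path.
     */
    public static String resolvePartitionDir(String basePath, String partitionPath) {
        if (partitionPath == null) {
            // Fail with a clear message instead of a failure deep inside the writer.
            throw new IllegalArgumentException(
                    "partitionPath is null; check the key generator and input data");
        }
        if (partitionPath.isEmpty()) {
            // Non-partitioned dataset: files live directly under the base path.
            return basePath;
        }
        return basePath + "/" + partitionPath;
    }
}
```

   Handling the empty string separately avoids building a path from an empty component, which is the failure reported in the stack trace on this issue.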




[GitHub] [incubator-hudi] vinothchandar commented on issue #714: Performance Comparison of HoodieDeltaStreamer and DataSourceAPI

2019-07-11 Thread GitBox
vinothchandar commented on issue #714: Performance Comparison of 
HoodieDeltaStreamer and DataSourceAPI
URL: https://github.com/apache/incubator-hudi/issues/714#issuecomment-510636174
 
 
   One thing I notice is that about 10 minutes is spent retrying the stages.
   This usually indicates that the job is running in a degraded state. Do you
   know what these failures are? Also, stage 3 simply parses the input and caches
   the hoodie records in memory. Not sure why this takes 7 minutes; it usually
   indicates input data parsing issues.




[GitHub] [incubator-hudi] amaranathv commented on issue #764: Hoodie 0.4.7: Error upserting bucketType UPDATE for partition #, No value present

2019-07-11 Thread GitBox
amaranathv commented on issue #764: Hoodie 0.4.7:  Error upserting bucketType 
UPDATE for partition #, No value present
URL: https://github.com/apache/incubator-hudi/issues/764#issuecomment-510609967
 
 
   DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY-> 
"com.uber.hoodie.NonpartitionedKeyGenerator"
   
   




[GitHub] [incubator-hudi] amaranathv commented on issue #764: Hoodie 0.4.7: Error upserting bucketType UPDATE for partition #, No value present

2019-07-11 Thread GitBox
amaranathv commented on issue #764: Hoodie 0.4.7:  Error upserting bucketType 
UPDATE for partition #, No value present
URL: https://github.com/apache/incubator-hudi/issues/764#issuecomment-510609856
 
 
   yes




[GitHub] [incubator-hudi] vinothchandar commented on issue #764: Hoodie 0.4.7: Error upserting bucketType UPDATE for partition #, No value present

2019-07-11 Thread GitBox
vinothchandar commented on issue #764: Hoodie 0.4.7:  Error upserting 
bucketType UPDATE for partition #, No value present
URL: https://github.com/apache/incubator-hudi/issues/764#issuecomment-510606829
 
 
   @amaranathv just to confirm you are using `NonpartitionedKeyGenerator` as 
the key generator? 




[GitHub] [incubator-hudi] amaranathv edited a comment on issue #764: Hoodie 0.4.7: Error upserting bucketType UPDATE for partition #, No value present

2019-07-11 Thread GitBox
amaranathv edited a comment on issue #764: Hoodie 0.4.7:  Error upserting 
bucketType UPDATE for partition #, No value present
URL: https://github.com/apache/incubator-hudi/issues/764#issuecomment-510581260
 
 
   I am getting same error.
   ```
   scala> .save("/datalake/888/888/888/hive/warehouse/test_hudi_spark_no_part_1_mor")
   19/07/11 12:31:45 WARN TaskSetManager: Lost task 0.0 in stage 304.0 (TID 464, 8.uhc.com, executor 2): com.uber.hoodie.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :0
       at com.uber.hoodie.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:274)
       at com.uber.hoodie.HoodieWriteClient.lambda$upsertRecordsInternal$7ef77fd$1(HoodieWriteClient.java:451)
       at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
       at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
       at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:844)
       at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:844)
       at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
       at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
       at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
       at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:336)
       at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:334)
       at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1055)
       at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1029)
       at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:969)
       at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1029)
       at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:760)
       at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:334)
       at org.apache.spark.rdd.RDD.iterator(RDD.scala:285)
       at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
       at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
       at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
       at org.apache.spark.scheduler.Task.run(Task.scala:108)
       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       at java.lang.Thread.run(Thread.java:748)
   Caused by: com.uber.hoodie.exception.HoodieUpsertException: Failed to initialize HoodieAppendHandle for FileId: 951d569b-188d-46e4-ad94-a32525fac797-0 on commit 20190711123144 on HDFS path /datalake/optum/optuminsight/udw/hive/warehouse/test_hudi_spark_no_part_1_mor
       at com.uber.hoodie.io.HoodieAppendHandle.init(HoodieAppendHandle.java:141)
       at com.uber.hoodie.io.HoodieAppendHandle.doAppend(HoodieAppendHandle.java:193)
       at com.uber.hoodie.table.HoodieMergeOnReadTable.handleUpdate(HoodieMergeOnReadTable.java:118)
       at com.uber.hoodie.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:266)
       ... 28 more
   Caused by: java.lang.IllegalArgumentException: Can not create a Path from an empty string
       at org.apache.hadoop.fs.Path.checkPathArg(Path.java:130)
       at org.apache.hadoop.fs.Path.<init>(Path.java:138)
       at org.apache.hadoop.fs.Path.<init>(Path.java:92)
       at com.uber.hoodie.io.HoodieAppendHandle.createLogWriter(HoodieAppendHandle.java:277)
       at com.uber.hoodie.io.HoodieAppendHandle.init(HoodieAppendHandle.java:132)
       ... 31 more

   19/07/11 12:31:45 ERROR TaskSetManager: Task 0 in stage 304.0 failed 4 times; aborting job
   org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 304.0 failed 4 times, most recent failure: Lost task 0.3 in stage 304.0 (TID 467, dbslt1829.uhc.com, executor 2): com.uber.hoodie.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :0
       at com.uber.hoodie.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:274)
       at com.uber.hoodie.HoodieWriteClient.lambda$upsertRecordsInternal$7ef77fd$1(HoodieWriteClient.java:451)
       at
   ```

[GitHub] [incubator-hudi] vinothchandar commented on issue #550: NoSuchMethodError: org.eclipse.jetty.servlet.ServletMapping.setDefault(Z)V

2019-07-11 Thread GitBox
vinothchandar commented on issue #550: NoSuchMethodError: 
org.eclipse.jetty.servlet.ServletMapping.setDefault(Z)V
URL: https://github.com/apache/incubator-hudi/issues/550#issuecomment-510590195
 
 
   In #751, I actually found that the jetty jars are needed by the new
   embedded timeline server.
   
   ```
   2019-07-11 17:54:10 INFO  FileSystemViewManager:176 - Creating embedded rocks-db based Table View
   Exception in thread "main" java.lang.NoClassDefFoundError: org/eclipse/jetty/util/log/Log
       at io.javalin.core.util.JettyServerUtil.(JettyServerUtil.kt:34)
       at io.javalin.Javalin.create(Javalin.java:106)
       at com.uber.hoodie.timeline.service.TimelineService.startService(TimelineService.java:101)
       at com.uber.hoodie.client.embedded.EmbeddedTimelineService.startServer(EmbeddedTimelineService.java:72)
       at com.uber.hoodie.AbstractHoodieClient.startEmbeddedServerView(AbstractHoodieClient.java:98)
       at com.uber.hoodie.AbstractHoodieClient.<init>(AbstractHoodieClient.java:66)
       at com.uber.hoodie.HoodieWriteClient.<init>(HoodieWriteClient.java:134)
       at com.uber.hoodie.HoodieWriteClient.<init>(HoodieWriteClient.java:129)
       at com.uber.hoodie.HoodieWriteClient.<init>(HoodieWriteClient.java:123)
       at com.uber.hoodie.utilities.deltastreamer.DeltaSync.setupWriteClient(DeltaSync.java:442)
       at com.uber.hoodie.utilities.deltastreamer.DeltaSync.<init>(DeltaSync.java:185)
       at com.uber.hoodie.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.<init>(HoodieDeltaStreamer.java:362)
       at com.uber.hoodie.utilities.deltastreamer.HoodieDeltaStreamer.<init>(HoodieDeltaStreamer.java:101)
       at com.uber.hoodie.utilities.deltastreamer.HoodieDeltaStreamer.<init>(HoodieDeltaStreamer.java:95)
       at com.uber.hoodie.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:289)
       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
       at java.lang.reflect.Method.invoke(Method.java:498)
       at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
       at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
       at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
       at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
       at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
       at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
   Caused by: java.lang.ClassNotFoundException: org.eclipse.jetty.util.log.Log
       at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
       at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
       at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
   ```
   
   But let's shade them.




[GitHub] [incubator-hudi] vinothchandar commented on issue #736: hoodie-hive-hundle don't have hive jars

2019-07-11 Thread GitBox
vinothchandar commented on issue #736: hoodie-hive-hundle don't have hive jars
URL: https://github.com/apache/incubator-hudi/issues/736#issuecomment-510581625
 
 
   @cdmikechen these jars are in the hive installation, that's why we don't
   bundle them.
   
   ```
   ls hive/lib/datanucleus-*
   hive/lib/datanucleus-api-jdo-4.2.4.jar   
hive/lib/datanucleus-core-4.1.17.jar  hive/lib/datanucleus-rdbms-4.1.19.jar
   root@adhoc-2:/opt#
   ```
   
   Is it possible that the script is just not picking them up? Are you able to
   repro this on top of #751 and see if this is still an issue?




[GitHub] [incubator-hudi] vinothchandar closed issue #705: hadoop 2.8.x miss RecoveryInProgressException class

2019-07-11 Thread GitBox
vinothchandar closed issue #705: hadoop 2.8.x miss RecoveryInProgressException 
class
URL: https://github.com/apache/incubator-hudi/issues/705
 
 
   




[GitHub] [incubator-hudi] vinothchandar commented on issue #735: Failed to instantiate SLF4J LoggerFactory

2019-07-11 Thread GitBox
vinothchandar commented on issue #735: Failed to instantiate SLF4J LoggerFactory
URL: https://github.com/apache/incubator-hudi/issues/735#issuecomment-510579271
 
 
   @Gowthamsb12 #751 is redoing the bundles. But for this issue, I think you
   should be able to find log4j if the classpath is set correctly?




[GitHub] [incubator-hudi] tooptoop4 commented on issue #638: [HUDI-12] Upgrade to Spark 2.4, Avro 1.8.2, Parquet 1.10.0...

2019-07-11 Thread GitBox
tooptoop4 commented on issue #638: [HUDI-12] Upgrade to Spark 2.4, Avro 1.8.2, 
Parquet 1.10.0... 
URL: https://github.com/apache/incubator-hudi/pull/638#issuecomment-510575991
 
 
   @thesuperzapper @bvaradar gentle ping




[GitHub] [incubator-hudi] NetsanetGeb edited a comment on issue #714: Performance Comparison of HoodieDeltaStreamer and DataSourceAPI

2019-07-11 Thread GitBox
NetsanetGeb edited a comment on issue #714: Performance Comparison of 
HoodieDeltaStreamer and DataSourceAPI
URL: https://github.com/apache/incubator-hudi/issues/714#issuecomment-510569606
 
 
   **Benchmarking Hudi Upsert** 
   
   I am trying to benchmark the Hudi upsert operation. Ingesting 6 GB of data
   takes 38 minutes on the cluster described below. How can I improve this?
   
   For my specific use case, I used a spliced JSON data source with a schema
   of 20 columns.
   The settings I used for a cluster with 30 GB of RAM and 100 GB of
   available disk are:
   spark.driver.memory = 4096m
   spark.executor.memory = 6144m
   spark.executor.instances =3
   spark.driver.cores =1
   spark.executor.cores =1
   hoodie.datasource.write.operation="upsert"
   hoodie.upsert.shuffle.parallellism="1500"
   
   You can see the details from the UI of the spark job provided below:
   
   
![hudiUpsert1](https://user-images.githubusercontent.com/25975892/61070032-f39a0c00-a40d-11e9-9f41-7909f0a045d4.png)
   
   
![hudiUpsert2](https://user-images.githubusercontent.com/25975892/61070050-057baf00-a40e-11e9-9139-b97c421ac99b.png)
   
   




[GitHub] [incubator-hudi] NetsanetGeb opened a new issue #714: Performance Comparison of HoodieDeltaStreamer and DataSourceAPI

2019-07-11 Thread GitBox
NetsanetGeb opened a new issue #714: Performance Comparison of 
HoodieDeltaStreamer and DataSourceAPI
URL: https://github.com/apache/incubator-hudi/issues/714
 
 
   I am trying to ingest data using the DataSource API and HoodieDeltaStreamer.
   The ingestion latency of the DataSource API with HoodieSparkSQLWriter is
   high compared to the delta streamer. Both the datasource and the delta streamer
   use the same APIs, so why is it slow? Are there specific options we could
   set to minimize the ingestion latency?
   
   For example: when I run the delta streamer, it takes about 1 minute to
   insert some data. If I use the DataSource API with HoodieSparkSQLWriter, it
   takes 4 minutes. How can we optimize this?
   
   I am attaching the spark UI of HoodieDeltaStreamer and DataSourceAPI 
respectively. 
   
   Spark UI of HoodieDeltaStreamer
   Note: The class Main is same as the HoodieDeltaStreamer
   
   
![Spark_UI_of_HoodieDeltaStreamer](https://user-images.githubusercontent.com/25975892/58870227-16732b00-86c0-11e9-8907-1a751dd18958.png)
   
   SparkUI of DataSourceAPI
   Note: I am using the hoodie delta streamer first to get the dataframe and
   checkpoint, and then use the hoodie spark sql writer for ingestion.
   
   
![Spark_UI_of_SparkSQLWritter_DataSourceAPI](https://user-images.githubusercontent.com/25975892/58870529-a87b3380-86c0-11e9-8431-d0233618bd7c.png)
   




[GitHub] [incubator-hudi] vinothchandar commented on issue #700: HUDI-138 - Meta Files handling also need to support consistency guard

2019-07-11 Thread GitBox
vinothchandar commented on issue #700: HUDI-138 - Meta Files handling also need 
to support consistency guard
URL: https://github.com/apache/incubator-hudi/pull/700#issuecomment-510562983
 
 
   @bvaradar is this ready for merging? 




[GitHub] [incubator-hudi] vinothchandar commented on issue #780: Cleanup Maven POM/Classpath

2019-07-11 Thread GitBox
vinothchandar commented on issue #780: Cleanup Maven POM/Classpath
URL: https://github.com/apache/incubator-hudi/pull/780#issuecomment-510547665
 
 
   @thesuperzapper as for testing, if you can run the demo steps once and 
confirm there are no NoClassDefFound errors and such, it would be a good start. 
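
   One way to spot-check for missing classes before running the full demo is a small classpath probe. The sketch below is only an illustration, not part of the PR; `ClasspathProbe` is a hypothetical helper, and the class names to check would be passed as arguments on the same classpath as the bundle under test.

```java
public class ClasspathProbe {

    /** Returns true if the named class can be found and linked on this classpath. */
    public static boolean isLoadable(String className) {
        try {
            // initialize=false: locate the class without running static initializers.
            Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers NoClassDefFoundError raised by missing dependencies.
            return false;
        }
    }

    public static void main(String[] args) {
        // Usage: java -cp <bundle jars> ClasspathProbe org.eclipse.jetty.util.log.Log ...
        for (String name : args) {
            System.out.println((isLoadable(name) ? "OK      " : "MISSING ") + name);
        }
    }
}
```

   A probe like this only shows that a class is present; it does not replace running the demo steps, since version conflicts can still fail at call time.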




[GitHub] [incubator-hudi] vinothchandar commented on issue #780: Cleanup Maven POM/Classpath

2019-07-11 Thread GitBox
vinothchandar commented on issue #780: Cleanup Maven POM/Classpath
URL: https://github.com/apache/incubator-hudi/pull/780#issuecomment-510538727
 
 
   @thesuperzapper Thanks! This is great. Let me review these. I have also
   been working on #751 around the same thing. Ideally, we can merge these
   changes onto a branch and test.

