[GitHub] [incubator-hudi] amaranathv commented on issue #764: Hoodie 0.4.7: Error upserting bucketType UPDATE for partition #, No value present

2019-07-18 Thread GitBox
amaranathv commented on issue #764: Hoodie 0.4.7:  Error upserting bucketType 
UPDATE for partition #, No value present
URL: https://github.com/apache/incubator-hudi/issues/764#issuecomment-513093444
 
 
   I am still working on the performance side of copy-on-write. Will do the 
testing again after the performance tests complete.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] amaranathv commented on issue #764: Hoodie 0.4.7: Error upserting bucketType UPDATE for partition #, No value present

2019-07-11 Thread GitBox
amaranathv commented on issue #764: Hoodie 0.4.7:  Error upserting bucketType 
UPDATE for partition #, No value present
URL: https://github.com/apache/incubator-hudi/issues/764#issuecomment-510609967
 
 
   DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY -> 
"com.uber.hoodie.NonpartitionedKeyGenerator"
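
   For context, a minimal Spark datasource write using this key generator 
might look like the sketch below. The record key field, table name, and path 
are illustrative assumptions, not values taken from this thread:

```scala
// Hedged sketch: writing a non-partitioned Hudi 0.4.x (com.uber.hoodie) table.
// "id", "test_table", and the save path are hypothetical placeholders.
import com.uber.hoodie.DataSourceWriteOptions

df.write
  .format("com.uber.hoodie")
  .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "id")
  .option(DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY,
    "com.uber.hoodie.NonpartitionedKeyGenerator")
  .option("hoodie.table.name", "test_table")
  .mode("append")
  .save("/tmp/test_table")
```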
   
   




[GitHub] [incubator-hudi] amaranathv commented on issue #764: Hoodie 0.4.7: Error upserting bucketType UPDATE for partition #, No value present

2019-07-11 Thread GitBox
amaranathv commented on issue #764: Hoodie 0.4.7:  Error upserting bucketType 
UPDATE for partition #, No value present
URL: https://github.com/apache/incubator-hudi/issues/764#issuecomment-510609856
 
 
   yes




[GitHub] [incubator-hudi] amaranathv commented on issue #764: Hoodie 0.4.7: Error upserting bucketType UPDATE for partition #, No value present

2019-07-11 Thread GitBox
amaranathv commented on issue #764: Hoodie 0.4.7:  Error upserting bucketType 
UPDATE for partition #, No value present
URL: https://github.com/apache/incubator-hudi/issues/764#issuecomment-510581260
 
 
   I am getting the same error.
   
   scala> 
.save("/datalake/888/888/888/hive/warehouse/test_hudi_spark_no_part_1_mor")
   19/07/11 12:31:45 WARN TaskSetManager: Lost task 0.0 in stage 304.0 (TID 
464, 8.uhc.com, executor 2): 
com.uber.hoodie.exception.HoodieUpsertException: Error upserting bucketType 
UPDATE for partition :0
   at 
com.uber.hoodie.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:274)
   at 
com.uber.hoodie.HoodieWriteClient.lambda$upsertRecordsInternal$7ef77fd$1(HoodieWriteClient.java:451)
   at 
org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
   at 
org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
   at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:844)
   at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:844)
   at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
   at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
   at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
   at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:336)
   at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:334)
   at 
org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1055)
   at 
org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1029)
   at 
org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:969)
   at 
org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1029)
   at 
org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:760)
   at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:334)
   at org.apache.spark.rdd.RDD.iterator(RDD.scala:285)
   at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
   at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
   at org.apache.spark.scheduler.Task.run(Task.scala:108)
   at 
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
   at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   at java.lang.Thread.run(Thread.java:748)
   Caused by: com.uber.hoodie.exception.HoodieUpsertException: Failed to 
initialize HoodieAppendHandle for FileId: 
951d569b-188d-46e4-ad94-a32525fac797-0 on commit 20190711123144 on HDFS path 
/datalake/optum/optuminsight/udw/hive/warehouse/test_hudi_spark_no_part_1_mor
   at 
com.uber.hoodie.io.HoodieAppendHandle.init(HoodieAppendHandle.java:141)
   at 
com.uber.hoodie.io.HoodieAppendHandle.doAppend(HoodieAppendHandle.java:193)
   at 
com.uber.hoodie.table.HoodieMergeOnReadTable.handleUpdate(HoodieMergeOnReadTable.java:118)
   at 
com.uber.hoodie.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:266)
   ... 28 more
   Caused by: java.lang.IllegalArgumentException: Can not create a Path from an 
empty string
   at org.apache.hadoop.fs.Path.checkPathArg(Path.java:130)
   at org.apache.hadoop.fs.Path.<init>(Path.java:138)
   at org.apache.hadoop.fs.Path.<init>(Path.java:92)
   at 
com.uber.hoodie.io.HoodieAppendHandle.createLogWriter(HoodieAppendHandle.java:277)
   at 
com.uber.hoodie.io.HoodieAppendHandle.init(HoodieAppendHandle.java:132)
   ... 31 more
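   
   The root cause in the trace above is Hadoop's Path constructor rejecting 
an empty string, i.e. the log-writer is being handed an empty partition path. 
A minimal reproduction of just that exception (assuming only hadoop-common on 
the classpath) would be:

```scala
// Path.checkPathArg rejects empty input before any filesystem access.
import org.apache.hadoop.fs.Path

new Path("") // throws java.lang.IllegalArgumentException:
             // "Can not create a Path from an empty string"
```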
   
   19/07/11 12:31:45 ERROR TaskSetManager: Task 0 in stage 304.0 failed 4 
times; aborting job
   org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
stage 304.0 failed 4 times, most recent failure: Lost task 0.3 in stage 304.0 
(TID 467, dbslt1829.uhc.com, executor 2): 
com.uber.hoodie.exception.HoodieUpsertException: Error upserting bucketType 
UPDATE for partition :0
   at 
com.uber.hoodie.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:274)
   at 
com.uber.hoodie.HoodieWriteClient.lambda$upsertRecordsInternal$7ef77fd$1(HoodieWriteClient.java:451)
   at 

[GitHub] [incubator-hudi] amaranathv commented on issue #764: Hoodie 0.4.7: Error upserting bucketType UPDATE for partition #, No value present

2019-07-09 Thread GitBox
amaranathv commented on issue #764: Hoodie 0.4.7:  Error upserting bucketType 
UPDATE for partition #, No value present
URL: https://github.com/apache/incubator-hudi/issues/764#issuecomment-509889062
 
 
   I am facing a similar issue while creating the MOR tables. Please take a look.
   
   ERROR Log :
   
spark-submit --master yarn  --class 
com.uber.hoodie.utilities.deltastreamer.HoodieDeltaStreamer `ls 
/mapr/user/avenka23/hoodie/incubator-hudi/packaging/hoodie-utilities-bundle/target/hoodie-utilities-bundle*-SNAPSHOT.jar`
   --props 
/user/avenka23/delta-streamer/config/dfs-source_no_partition.properties   
--schemaprovider-class com.uber.hoodie.utilities.schema.FilebasedSchemaProvider 
  --source-class com.uber.hoodie.utilities.sources.JsonDFSSource   
--source-ordering-field ts   --target-base-path 
/../stock_ticks_cow_no_part_DEMO_MR --target-table 
stock_ticks_cow_no_part_DEMO_MR  --storage-type MERGE_ON_READ 
--key-generator-class com.uber.hoodie.NonpartitionedKeyGenerator
   19/07/09 22:01:15 WARN SchedulerConfGenerator: Job Scheduling Configs will 
not be in effect as spark.scheduler.mode is not set to FAIR at instatiation 
time. Continuing without scheduling configs
   19/07/09 22:01:20 WARN Client: Neither spark.yarn.jars nor 
spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
   ERROR StatusLogger No log4j2 configuration file found. Using default 
configuration: logging only errors to the console.
   19/07/09 22:01:35 WARN SparkContext: Using an existing SparkContext; some 
configuration may not take effect.
   19/07/09 22:01:38 WARN TaskSetManager: Lost task 1.0 in stage 1.0 (TID 2, 
dsfsdf.sdfsd.com, executor 2): java.lang.IllegalArgumentException: Can not 
create a Path from an empty string
   at org.apache.hadoop.fs.Path.checkPathArg(Path.java:130)
   at org.apache.hadoop.fs.Path.<init>(Path.java:138)
   at org.apache.hadoop.fs.Path.<init>(Path.java:92)
   at 
com.uber.hoodie.table.HoodieMergeOnReadTable.lambda$rollback$5(HoodieMergeOnReadTable.java:510)
   at 
java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
   at 
java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
   at 
java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
   at 
java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
   at 
java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
   at 
java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
   at 
java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
   at 
java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
   at 
java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
   at 
com.uber.hoodie.table.HoodieMergeOnReadTable.rollback(HoodieMergeOnReadTable.java:505)
   at 
com.uber.hoodie.table.HoodieMergeOnReadTable.lambda$rollback$328a965c$1(HoodieMergeOnReadTable.java:307)
   at 
org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1.apply(JavaPairRDD.scala:1040)
   at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
   at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:462)
   at scala.collection.Iterator$class.foreach(Iterator.scala:893)
   at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
   at 
scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
   at 
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
   at 
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
   at 
scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
   at scala.collection.AbstractIterator.to(Iterator.scala:1336)
   at 
scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
   at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1336)
   at 
scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
   at scala.collection.AbstractIterator.toArray(Iterator.scala:1336)
   at 
org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:936)
   at 
org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:936)
   at 
org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2069)
   at 
org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2069)
   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
   at org.apache.spark.scheduler.Task.run(Task.scala:108)
   at 
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
   at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   at