[GitHub] [incubator-hudi] amaranathv commented on issue #764: Hoodie 0.4.7: Error upserting bucketType UPDATE for partition #, No value present
amaranathv commented on issue #764: Hoodie 0.4.7: Error upserting bucketType UPDATE for partition #, No value present URL: https://github.com/apache/incubator-hudi/issues/764#issuecomment-513093444 I am still working on performance side of the copy of on write.Will do the testing again after the performance test complete. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-hudi] amaranathv commented on issue #764: Hoodie 0.4.7: Error upserting bucketType UPDATE for partition #, No value present
amaranathv commented on issue #764: Hoodie 0.4.7: Error upserting bucketType UPDATE for partition #, No value present URL: https://github.com/apache/incubator-hudi/issues/764#issuecomment-510609967 DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY-> "com.uber.hoodie.NonpartitionedKeyGenerator" This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-hudi] amaranathv commented on issue #764: Hoodie 0.4.7: Error upserting bucketType UPDATE for partition #, No value present
amaranathv commented on issue #764: Hoodie 0.4.7: Error upserting bucketType UPDATE for partition #, No value present URL: https://github.com/apache/incubator-hudi/issues/764#issuecomment-510609856 yes This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-hudi] amaranathv commented on issue #764: Hoodie 0.4.7: Error upserting bucketType UPDATE for partition #, No value present
amaranathv commented on issue #764: Hoodie 0.4.7: Error upserting bucketType UPDATE for partition #, No value present URL: https://github.com/apache/incubator-hudi/issues/764#issuecomment-510581260 I am getting same error. scala> .save("/datalake/888/888/888/hive/warehouse/test_hudi_spark_no_part_1_mor") 19/07/11 12:31:45 WARN TaskSetManager: Lost task 0.0 in stage 304.0 (TID 464, 8.uhc.com, executor 2): com.uber.hoodie.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :0 at com.uber.hoodie.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:274) at com.uber.hoodie.HoodieWriteClient.lambda$upsertRecordsInternal$7ef77fd$1(HoodieWriteClient.java:451) at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102) at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:844) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:844) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:336) at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:334) at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1055) at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1029) at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:969) at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1029) at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:760) at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:334) at org.apache.spark.rdd.RDD.iterator(RDD.scala:285) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) at org.apache.spark.scheduler.Task.run(Task.scala:108) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: com.uber.hoodie.exception.HoodieUpsertException: Failed to initialize HoodieAppendHandle for FileId: 951d569b-188d-46e4-ad94-a32525fac797-0 on commit 20190711123144 on HDFS path /datalake/optum/optuminsight/udw/hive/warehouse/test_hudi_spark_no_part_1_mor at com.uber.hoodie.io.HoodieAppendHandle.init(HoodieAppendHandle.java:141) at com.uber.hoodie.io.HoodieAppendHandle.doAppend(HoodieAppendHandle.java:193) at com.uber.hoodie.table.HoodieMergeOnReadTable.handleUpdate(HoodieMergeOnReadTable.java:118) at com.uber.hoodie.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:266) ... 28 more Caused by: java.lang.IllegalArgumentException: Can not create a Path from an empty string at org.apache.hadoop.fs.Path.checkPathArg(Path.java:130) at org.apache.hadoop.fs.Path.(Path.java:138) at org.apache.hadoop.fs.Path.(Path.java:92) at com.uber.hoodie.io.HoodieAppendHandle.createLogWriter(HoodieAppendHandle.java:277) at com.uber.hoodie.io.HoodieAppendHandle.init(HoodieAppendHandle.java:132) ... 31 more 19/07/11 12:31:45 ERROR TaskSetManager: Task 0 in stage 304.0 failed 4 times; aborting job org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 304.0 failed 4 times, most recent failure: Lost task 0.3 in stage 304.0 (TID 467, dbslt1829.uhc.com, executor 2): com.uber.hoodie.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :0 at com.uber.hoodie.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:274) at com.uber.hoodie.HoodieWriteClient.lambda$upsertRecordsInternal$7ef77fd$1(HoodieWriteClient.java:451) at
[GitHub] [incubator-hudi] amaranathv commented on issue #764: Hoodie 0.4.7: Error upserting bucketType UPDATE for partition #, No value present
amaranathv commented on issue #764: Hoodie 0.4.7: Error upserting bucketType UPDATE for partition #, No value present URL: https://github.com/apache/incubator-hudi/issues/764#issuecomment-509889062 I am facing similar issue while creating the MOR tables.Please take a look. ERROR Log : spark-submit --master yarn --class com.uber.hoodie.utilities.deltastreamer.HoodieDeltaStreamer `ls /mapr/user/avenka23/hoodie/incubator-hudi/packaging/hoodie-utilities-bundle/target/hoodie-utilities-bundle*-SNAPSHOT.jar` --props /user/avenka23/delta-streamer/config/dfs-source_no_partition.properties --schemaprovider-class com.uber.hoodie.utilities.schema.FilebasedSchemaProvider --source-class com.uber.hoodie.utilities.sources.JsonDFSSource --source-ordering-field ts --target-base-path /../stock_ticks_cow_no_part_DEMO_MR --target-table stock_ticks_cow_no_part_DEMO_MR --storage-type MERGE_ON_READ --key-generator-class com.uber.hoodie.NonpartitionedKeyGenerator 19/07/09 22:01:15 WARN SchedulerConfGenerator: Job Scheduling Configs will not be in effect as spark.scheduler.mode is not set to FAIR at instatiation time. Continuing without scheduling configs 19/07/09 22:01:20 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME. ERROR StatusLogger No log4j2 configuration file found. Using default configuration: logging only errors to the console. 19/07/09 22:01:35 WARN SparkContext: Using an existing SparkContext; some configuration may not take effect. 19/07/09 22:01:38 WARN TaskSetManager: Lost task 1.0 in stage 1.0 (TID 2, dsfsdf.sdfsd.com, executor 2): java.lang.IllegalArgumentException: Can not create a Path from an empty string at org.apache.hadoop.fs.Path.checkPathArg(Path.java:130) at org.apache.hadoop.fs.Path.(Path.java:138) at org.apache.hadoop.fs.Path.(Path.java:92) at com.uber.hoodie.table.HoodieMergeOnReadTable.lambda$rollback$5(HoodieMergeOnReadTable.java:510) at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184) at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175) at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382) at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151) at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174) at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418) at com.uber.hoodie.table.HoodieMergeOnReadTable.rollback(HoodieMergeOnReadTable.java:505) at com.uber.hoodie.table.HoodieMergeOnReadTable.lambda$rollback$328a965c$1(HoodieMergeOnReadTable.java:307) at org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1.apply(JavaPairRDD.scala:1040) at scala.collection.Iterator$$anon$11.next(Iterator.scala:409) at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:462) at scala.collection.Iterator$class.foreach(Iterator.scala:893) at scala.collection.AbstractIterator.foreach(Iterator.scala:1336) at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59) at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104) at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48) at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310) at scala.collection.AbstractIterator.to(Iterator.scala:1336) at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302) at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1336) at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289) at scala.collection.AbstractIterator.toArray(Iterator.scala:1336) at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:936) at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:936) at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2069) at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2069) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) at org.apache.spark.scheduler.Task.run(Task.scala:108) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at