Re: [I] [SUPPORT] org.apache.avro.SchemaParseException: Can't redefine: array When there are Top level variables , Struct and Array[struct] (no complex datatype within array[struct]) [hudi]
ad1happy2go commented on issue #7717: URL: https://github.com/apache/hudi/issues/7717#issuecomment-2049464984

@Jonathanrodrigr12 Did you also have multiple "value" columns across structs? This may be the same as the issue raised by @junkri (https://github.com/apache/hudi/issues/10983) and not this original issue.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
junkri commented on issue #7717: URL: https://github.com/apache/hudi/issues/7717#issuecomment-2045350745

@Jonathanrodrigr12 I think I ran into the same problem as you. I can see in your screenshot that you have the field called "value" defined multiple times, once as a `decimal` and once as a `struct`. I've just raised a separate issue covering this; please check whether it covers your situation as well: https://github.com/apache/hudi/issues/10983
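Avro requires every named record type to be unique within one schema document, and Hudi converts the Spark schema to Avro before merging. The sketch below is a toy model of that conversion (plain Python, not Hudi's actual converter): if each nested record's Avro name is derived from its column name, two structs that both expose a child called `value` produce two record definitions with the same name, which is exactly what Avro refuses to serialize. The `to_avro` and `record_names` helpers are hypothetical, for illustration only.

```python
# Toy model of naming nested Avro records after their Spark column name.
# Illustrative only -- not Hudi or Avro source code.

def to_avro(struct, name):
    """Convert a toy Spark-style schema (dict of field -> type) into an
    Avro-style record dict, naming each nested record after its field."""
    fields = []
    for field, ftype in struct.items():
        if isinstance(ftype, dict):  # nested struct
            fields.append({"name": field, "type": to_avro(ftype, field)})
        else:                        # primitive type
            fields.append({"name": field, "type": ftype})
    return {"type": "record", "name": name, "fields": fields}

def record_names(schema, out):
    """Collect every record name defined in an Avro-style schema dict."""
    if isinstance(schema, dict) and schema.get("type") == "record":
        out.append(schema["name"])
        for f in schema["fields"]:
            record_names(f["type"], out)
    return out

# Two sibling structs whose inner field is called "value" -- the shape
# described in this thread.
spark_schema = {
    "payment": {"value": {"amount": "double"}},
    "order":   {"value": {"total": "double"}},
}

names = record_names(to_avro(spark_schema, "top"), [])
dupes = {n for n in names if names.count(n) > 1}
print(dupes)  # {'value'} -> the name Avro then refuses to redefine
```

Under this model the fix is the one suggested in the linked issue: rename one of the colliding columns (or nested fields) so every derived record name is unique.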
Jonathanrodrigr12 commented on issue #7717: URL: https://github.com/apache/hudi/issues/7717#issuecomment-1969960684

I am using Spark version 3.4.1 and Hudi 0.14.0.
ad1happy2go commented on issue #7717: URL: https://github.com/apache/hudi/issues/7717#issuecomment-1966705910

Which Hudi and Spark versions are you using, @Jonathanrodrigr12?
Jonathanrodrigr12 commented on issue #7717: URL: https://github.com/apache/hudi/issues/7717#issuecomment-1962262653

Hi, I have the same problem, but I am using the HoodieMultiTableStreamer.

**Description**

I have many Parquet files, all of which have this struct:

![image](https://github.com/apache/hudi/assets/53848036/2c15084d-b17c-471f-8a5d-0b77391a7958)

The first time I run the job on EMR Serverless the data is saved, but on the second attempt I get the error below.

**Expected behavior**

The second write succeeds.

**Environment Description**

* Hudi: hudi-utilities-bundle_2.12-0.14.0-amzn-0.jar
* Spark version: 3.4.1
* EMR: 6.15.0

**Stack Trace**

```
org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :0
	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:342)
	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleInsertPartition(BaseSparkCommitActionExecutor.java:348)
	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.lambda$mapPartitionsAsRDD$a3ab3c4$1(BaseSparkCommitActionExecutor.java:259)
	at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1(JavaRDDLike.scala:102)
	at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1$adapted(JavaRDDLike.scala:102)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:905)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2$adapted(RDD.scala:905)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
	at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:377)
	at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1552)
	at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1462)
	at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1526)
	at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1349)
	at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:375)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:326)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92)
	at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
	at org.apache.spark.scheduler.Task.run(Task.scala:141)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:563)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1541)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:566)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.hudi.exception.HoodieException: org.apache.avro.SchemaParseException: Can't redefine: value
	at org.apache.hudi.table.action.commit.HoodieMergeHelper.runMerge(HoodieMergeHelper.java:149)
	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpdateInternal(BaseSparkCommitActionExecutor.java:387)
	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpdate(BaseSparkCommitActionExecutor.java:369)
	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:335)
	... 30 more
Caused by: org.apache.avro.SchemaParseException: Can't redefine: value
	at org.apache.avro.Schema$Names.put(Schema.java:1586)
	at org.apache.avro.Schema$NamedSchema.writeNameRef(Schema.java:844)
	at org.apache.avro.Schema$RecordSchema.toJson(Schema.java:1011)
	at org.apache.avro.Schema$UnionSchema.toJson(Schema.java:1278)
	at org.apache.avro.Schema$RecordSchema.fieldsToJson(Schema.java:1039)
	at org.apache.avro.Schema$RecordSchema.toJson(Schema.java:1023)
	at org.apache.avro.Schema$ArraySchema.toJson(Schema.java:1173)
	at org.apache.avro.Schema$UnionSchema.toJson(Schema.java:1278)
	at org.apache.avro.Schema$RecordSchema.fieldsToJson(Schema.java:1039)
	at ...
```
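The `org.apache.avro.Schema$Names.put` frame in the stack trace above is where Avro enforces record-name uniqueness while serializing the schema back to JSON: registering a name that is already present throws, which surfaces here as the `SchemaParseException`. A minimal Python sketch of that check (illustrative only, not the Avro source):

```python
# Sketch of the uniqueness rule enforced by Avro's Schema$Names.put.
# Illustrative Python, not Avro source code.

class Names(dict):
    """A registry of named schemas; redefining a name is an error."""
    def put(self, name, schema):
        if name in self:
            raise ValueError(f"Can't redefine: {name}")
        self[name] = schema

names = Names()
names.put("value", {"type": "record", "name": "value"})   # first definition: ok
try:
    names.put("value", {"type": "record", "name": "value"})  # second: rejected
except ValueError as e:
    err = e
print(err)  # Can't redefine: value
```

This is why the error only appears once two schema branches define a record with the same name, even though each branch is valid on its own.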