[ https://issues.apache.org/jira/browse/SPARK-32122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-32122.
----------------------------------
    Resolution: Cannot Reproduce

Let's resolve this as Cannot Reproduce, since it can no longer be reproduced on 3.0. We could identify which JIRA fixed it and discuss the feasibility of backporting.

> Exception while writing dataframe with enum fields
> --------------------------------------------------
>
>                 Key: SPARK-32122
>                 URL: https://issues.apache.org/jira/browse/SPARK-32122
>             Project: Spark
>          Issue Type: Question
>          Components: SQL
>    Affects Versions: 2.4.3
>            Reporter: Sai kiran Krishna murthy
>            Priority: Minor
>              Labels: avro, spark-sql
>
> I have an Avro schema with one field that is an enum, and I am trying to
> enforce this schema when writing my DataFrame. The code looks something
> like this:
> {code:java}
> case class Name1(id: String, count: Int, val_type: String)
>
> val schema = """{
>   |  "type" : "record",
>   |  "name" : "name1",
>   |  "namespace" : "com.data",
>   |  "fields" : [
>   |    {
>   |      "name" : "id",
>   |      "type" : "string"
>   |    },
>   |    {
>   |      "name" : "count",
>   |      "type" : "int"
>   |    },
>   |    {
>   |      "name" : "val_type",
>   |      "type" : {
>   |        "type" : "enum",
>   |        "name" : "ValType",
>   |        "symbols" : [ "s1", "s2" ]
>   |      }
>   |    }
>   |  ]
>   |}""".stripMargin
>
> val df = Seq(
>   Name1("1", 2, "s1"),
>   Name1("1", 3, "s2"),
>   Name1("1", 4, "s2"),
>   Name1("11", 2, "s1")).toDF()
>
> df.write.format("avro").option("avroSchema", schema).save("data/tes2/")
> {code}
> This code fails with the following exception:
>
> {noformat}
> 2020-06-28 23:28:10 ERROR Utils:91 - Aborting task
> org.apache.avro.AvroRuntimeException: Not a union: "string"
> 	at org.apache.avro.Schema.getTypes(Schema.java:299)
> 	at org.apache.spark.sql.avro.AvroSerializer.org$apache$spark$sql$avro$AvroSerializer$$resolveNullableType(AvroSerializer.scala:229)
> 	at org.apache.spark.sql.avro.AvroSerializer$$anonfun$3.apply(AvroSerializer.scala:209)
> 	at org.apache.spark.sql.avro.AvroSerializer$$anonfun$3.apply(AvroSerializer.scala:208)
> 	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> 	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> 	at scala.collection.immutable.List.foreach(List.scala:392)
> 	at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
> 	at scala.collection.immutable.List.map(List.scala:296)
> 	at org.apache.spark.sql.avro.AvroSerializer.newStructConverter(AvroSerializer.scala:208)
> 	at org.apache.spark.sql.avro.AvroSerializer.<init>(AvroSerializer.scala:51)
> 	at org.apache.spark.sql.avro.AvroOutputWriter.serializer$lzycompute(AvroOutputWriter.scala:42)
> 	at org.apache.spark.sql.avro.AvroOutputWriter.serializer(AvroOutputWriter.scala:42)
> 	at org.apache.spark.sql.avro.AvroOutputWriter.write(AvroOutputWriter.scala:64)
> 	at org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.write(FileFormatDataWriter.scala:137)
> 	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:245)
> 	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:242)
> 	at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1394)
> 	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:248)
> 	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:170)
> 	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:169)
> 	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
> 	at org.apache.spark.scheduler.Task.run(Task.scala:121)
> 	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
> 	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
> 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 	at java.lang.Thread.run(Thread.java:748)
> 2020-06-28 23:28:10 ERROR Utils:91 - Aborting task
> {noformat}
>
> I understand this is because the type of val_type is `String` in the case
> class. Can you please advise how I can solve this problem without having to
> change the underlying Avro schema?
> Thanks!

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
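[Editor's note] Based on the report and the resolution above, two practical options follow. This is a minimal sketch, not a confirmed fix: it assumes `spark`, `df`, and `schema` are defined as in the snippet above, spark-avro is on the classpath, and the output path `data/tes2_as_string/` is illustrative:

{code:java}
// Option 1 (Spark 2.4.x): omit the explicit "avroSchema" option so the writer
// derives the Avro schema from the DataFrame itself. val_type is then written
// as a plain Avro "string" instead of an enum, which avoids the failing
// resolveNullableType path -- at the cost of losing enum enforcement in the
// output files.
df.write.format("avro").save("data/tes2_as_string/")

// Option 2: upgrade to Spark 3.0+, where (per the resolution above) the
// original write with the enforced enum schema could no longer be reproduced
// as failing:
// df.write.format("avro").option("avroSchema", schema).save("data/tes2/")
{code}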