[ https://issues.apache.org/jira/browse/SPARK-32122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-32122.
----------------------------------
    Resolution: Cannot Reproduce

Let's resolve this as Cannot Reproduce, since it can no longer be reproduced on 3.0. We could identify which JIRA fixed it and discuss the feasibility of backporting.

> Exception while writing dataframe with enum fields
> --------------------------------------------------
>
>                 Key: SPARK-32122
>                 URL: https://issues.apache.org/jira/browse/SPARK-32122
>             Project: Spark
>          Issue Type: Question
>          Components: SQL
>    Affects Versions: 2.4.3
>            Reporter: Sai kiran Krishna murthy
>            Priority: Minor
>              Labels: avro, spark-sql
>
> I have an Avro schema with one field that is an enum, and I am trying to
> enforce this schema when writing my DataFrame. The code looks something
> like this:
> {code:java}
> case class Name1(id: String, count: Int, val_type: String)
>
> val schema = """{
>   |  "type" : "record",
>   |  "name" : "name1",
>   |  "namespace" : "com.data",
>   |  "fields" : [
>   |    {
>   |      "name" : "id",
>   |      "type" : "string"
>   |    },
>   |    {
>   |      "name" : "count",
>   |      "type" : "int"
>   |    },
>   |    {
>   |      "name" : "val_type",
>   |      "type" : {
>   |        "type" : "enum",
>   |        "name" : "ValType",
>   |        "symbols" : [ "s1", "s2" ]
>   |      }
>   |    }
>   |  ]
>   |}""".stripMargin
>
> val df = Seq(
>   Name1("1", 2, "s1"),
>   Name1("1", 3, "s2"),
>   Name1("1", 4, "s2"),
>   Name1("11", 2, "s1")).toDF()
>
> df.write.format("avro").option("avroSchema", schema).save("data/tes2/")
> {code}
> This code fails with the following exception:
>
> {noformat}
> 2020-06-28 23:28:10 ERROR Utils:91 - Aborting task
> org.apache.avro.AvroRuntimeException: Not a union: "string"
> 	at org.apache.avro.Schema.getTypes(Schema.java:299)
> 	at org.apache.spark.sql.avro.AvroSerializer.org$apache$spark$sql$avro$AvroSerializer$$resolveNullableType(AvroSerializer.scala:229)
> 	at org.apache.spark.sql.avro.AvroSerializer$$anonfun$3.apply(AvroSerializer.scala:209)
> 	at org.apache.spark.sql.avro.AvroSerializer$$anonfun$3.apply(AvroSerializer.scala:208)
> 	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> 	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> 	at scala.collection.immutable.List.foreach(List.scala:392)
> 	at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
> 	at scala.collection.immutable.List.map(List.scala:296)
> 	at org.apache.spark.sql.avro.AvroSerializer.newStructConverter(AvroSerializer.scala:208)
> 	at org.apache.spark.sql.avro.AvroSerializer.<init>(AvroSerializer.scala:51)
> 	at org.apache.spark.sql.avro.AvroOutputWriter.serializer$lzycompute(AvroOutputWriter.scala:42)
> 	at org.apache.spark.sql.avro.AvroOutputWriter.serializer(AvroOutputWriter.scala:42)
> 	at org.apache.spark.sql.avro.AvroOutputWriter.write(AvroOutputWriter.scala:64)
> 	at org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.write(FileFormatDataWriter.scala:137)
> 	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:245)
> 	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:242)
> 	at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1394)
> 	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:248)
> 	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:170)
> 	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:169)
> 	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
> 	at org.apache.spark.scheduler.Task.run(Task.scala:121)
> 	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
> 	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
> 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 	at java.lang.Thread.run(Thread.java:748)
> 2020-06-28 23:28:10 ERROR Utils:91 - Aborting task
> {noformat}
>
> I understand this is because the type of val_type is `String` in the case
> class. Can you please advise how I can solve this problem without having to
> change the underlying Avro schema?
> Thanks!

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
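[Editor's note] Based on the report and the resolution above, two practical options follow. This is a minimal sketch, not a confirmed fix: it assumes `spark`, `df`, and `schema` are defined as in the snippet above, spark-avro is on the classpath, and the output path `data/tes2_as_string/` is illustrative:

{code:java}
// Option 1 (Spark 2.4.x): omit the explicit "avroSchema" option so the writer
// derives the Avro schema from the DataFrame itself. val_type is then written
// as a plain Avro "string" instead of an enum, which avoids the failing
// resolveNullableType path -- at the cost of losing enum enforcement in the
// output files.
df.write.format("avro").save("data/tes2_as_string/")

// Option 2: upgrade to Spark 3.0+, where (per the resolution above) the
// original write with the enforced enum schema could no longer be reproduced
// as failing:
// df.write.format("avro").option("avroSchema", schema).save("data/tes2/")
{code}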