gengliangwang commented on a change in pull request #25419: [SPARK-28698][SQL] 
Support user-specified output schema in `to_avro`
URL: https://github.com/apache/spark/pull/25419#discussion_r313208446
 
 

 ##########
 File path: 
external/avro/src/test/scala/org/apache/spark/sql/avro/AvroCatalystDataConversionSuite.scala
 ##########
 @@ -209,4 +209,32 @@ class AvroCatalystDataConversionSuite extends SparkFunSuite
       checkUnsupportedRead(input, avroSchema)
     }
   }
+
+  test("user-specified schema") {
+    val data = Literal("SPADES")
+    val jsonFormatSchema =
+      """
+        |{ "type": "enum",
+        |  "name": "Suit",
+        |  "symbols" : ["SPADES", "HEARTS", "DIAMONDS", "CLUBS"]
+        |}
+      """.stripMargin
+    checkEvaluation(
+      AvroDataToCatalyst(
+        CatalystDataToAvro(
+          data,
+          Some(jsonFormatSchema)),
+        jsonFormatSchema,
+        options = Map.empty),
+      data.eval())
+    val message = intercept[SparkException] {
+      AvroDataToCatalyst(
+        CatalystDataToAvro(
+          data,
+          None),
+        jsonFormatSchema,
+        options = Map.empty).eval()
+    }.getMessage
+    assert(message.contains("Malformed records are detected in record parsing."))
 
 Review comment:
   Here `AvroDataToCatalyst` is used only to check the Avro schema that 
`CatalystDataToAvro` outputs.
   1. When `jsonFormatSchema` is provided to `CatalystDataToAvro`, the output 
Avro schema is the `enum` type, and we validate it with `AvroDataToCatalyst`. 
This proves that the provided schema works.
   2. When `jsonFormatSchema` is `None`, the output Avro schema is the 
`string` type, which can't be parsed as the `enum` type.
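
   To illustrate why case 2 fails, here is a minimal, dependency-free sketch 
of the two Avro binary encodings involved. It assumes the encoding rules of 
the Avro specification (a `string` is a zigzag varint length followed by UTF-8 
bytes, an `enum` is a zigzag varint symbol index). The object 
`AvroEnumVsString` and its helpers are hypothetical names for this 
illustration only; they are not Spark or Avro APIs, and Spark's actual read 
path goes through the Avro library rather than code like this.

```scala
object AvroEnumVsString {
  // Symbols of the "Suit" enum from the test's jsonFormatSchema.
  val symbols = Vector("SPADES", "HEARTS", "DIAMONDS", "CLUBS")

  // Zigzag + variable-length encoding, as Avro uses for int and long values.
  def encodeLong(n: Long): Array[Byte] = {
    var z = (n << 1) ^ (n >> 63)
    val out = scala.collection.mutable.ArrayBuffer.empty[Byte]
    while ((z & ~0x7FL) != 0) {
      out += ((z & 0x7F) | 0x80).toByte
      z = z >>> 7
    }
    out += z.toByte
    out.toArray
  }

  // Decodes a single zigzag varint from the start of `bytes`.
  def decodeLong(bytes: Array[Byte]): Long = {
    var result = 0L
    var shift = 0
    var i = 0
    var more = true
    while (more) {
      val b = bytes(i)
      result = result | ((b & 0x7FL) << shift)
      more = (b & 0x80) != 0
      shift += 7
      i += 1
    }
    (result >>> 1) ^ -(result & 1)
  }

  // What a writer with a `string` schema emits: length, then UTF-8 bytes.
  def encodeString(s: String): Array[Byte] = {
    val utf8 = s.getBytes("UTF-8")
    encodeLong(utf8.length) ++ utf8
  }

  // What a writer with the `enum` schema emits: the symbol's index.
  def encodeEnum(s: String): Array[Byte] = encodeLong(symbols.indexOf(s))

  // A reader with the `enum` schema interprets the leading varint as an index.
  def decodeEnum(bytes: Array[Byte]): Option[String] = {
    val idx = decodeLong(bytes)
    if (idx >= 0 && idx < symbols.length) Some(symbols(idx.toInt)) else None
  }
}
```

   In this sketch, `decodeEnum(encodeEnum("SPADES"))` yields `Some("SPADES")`, 
while `decodeEnum(encodeString("SPADES"))` yields `None`: the string's length 
prefix decodes to index 6, which is outside the four enum symbols.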
   
   I will change the order of the two checks in the test case and add a new 
test case for an invalid user-specified schema.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
