gengliangwang commented on a change in pull request #25419: [SPARK-28698][SQL]
Support user-specified output schema in `to_avro`
URL: https://github.com/apache/spark/pull/25419#discussion_r313208446
##########
File path:
external/avro/src/test/scala/org/apache/spark/sql/avro/AvroCatalystDataConversionSuite.scala
##########
@@ -209,4 +209,32 @@ class AvroCatalystDataConversionSuite extends SparkFunSuite
checkUnsupportedRead(input, avroSchema)
}
}
+
+ test("user-specified schema") {
+ val data = Literal("SPADES")
+ val jsonFormatSchema =
+ """
+ |{ "type": "enum",
+ | "name": "Suit",
+ | "symbols" : ["SPADES", "HEARTS", "DIAMONDS", "CLUBS"]
+ |}
+ """.stripMargin
+ checkEvaluation(
+ AvroDataToCatalyst(
+ CatalystDataToAvro(
+ data,
+ Some(jsonFormatSchema)),
+ jsonFormatSchema,
+ options = Map.empty),
+ data.eval())
+ val message = intercept[SparkException] {
+ AvroDataToCatalyst(
+ CatalystDataToAvro(
+ data,
+ None),
+ jsonFormatSchema,
+ options = Map.empty).eval()
+ }.getMessage
+ assert(message.contains("Malformed records are detected in record parsing."))
Review comment:
Here `AvroDataToCatalyst` is used only to check the Avro schema of the output
of `CatalystDataToAvro`.
1. When `jsonFormatSchema` is provided to `CatalystDataToAvro`, the output
Avro schema is of `enum` type, and we validate it with `AvroDataToCatalyst`.
This shows that the provided schema takes effect.
2. When `jsonFormatSchema` is `None`, the output Avro schema is of `string`
type, so the output can't be parsed as `enum` type.
I will swap the order of the two checks in this test case and add a new test
case for an invalid user-specified schema.
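To illustrate why the second check fails, here is a minimal sketch. It assumes
Avro's documented binary encoding: an enum is written as the zero-based index
of the symbol, and a string as a length prefix followed by UTF-8 bytes. The
object and helper names are hypothetical, and this is a hand-rolled model
rather than the code path Spark or the Avro library actually uses.

```scala
// Illustrative only: models Avro's binary encoding for single-byte cases.
// It is not the Avro library and not Spark's converters.
object EnumVsStringSketch {
  val symbols: Vector[String] = Vector("SPADES", "HEARTS", "DIAMONDS", "CLUBS")

  // Avro writes ints and longs as zig-zag varints; values in [-64, 63] fit in one byte.
  def zigZagEncode(n: Int): Int = (n << 1) ^ (n >> 31)
  def zigZagDecode(n: Int): Int = (n >>> 1) ^ -(n & 1)

  // Enum: the zero-based index of the symbol, written as an int.
  def writeAsEnum(symbol: String): Array[Byte] = {
    val idx = symbols.indexOf(symbol)
    require(idx >= 0, s"unknown symbol: $symbol")
    Array(zigZagEncode(idx).toByte)
  }

  // String: the byte length written as a long, followed by the UTF-8 bytes.
  def writeAsString(s: String): Array[Byte] = {
    val utf8 = s.getBytes("UTF-8")
    require(utf8.length < 64, "sketch only handles single-byte length prefixes")
    zigZagEncode(utf8.length).toByte +: utf8
  }

  // Interpret the leading byte as an enum index; Left when it is out of range.
  def readAsEnum(bytes: Array[Byte]): Either[String, String] = {
    val idx = zigZagDecode(bytes.head & 0xff)
    symbols.lift(idx).toRight(s"enum index $idx out of range for ${symbols.size} symbols")
  }
}
```

In this model, `readAsEnum(writeAsEnum("SPADES"))` round-trips, while
`readAsEnum(writeAsString("SPADES"))` decodes the length prefix as index 6,
which is outside the four declared symbols. The exact exception raised by the
real reader may differ.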
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
With regards,
Apache Git Services