giamo commented on a change in pull request #24405: [SPARK-27506][SQL] Allow deserialization of Avro data using compatible schemas

URL: https://github.com/apache/spark/pull/24405#discussion_r304597229
##########
File path: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroDataToCatalyst.scala
##########

@@ -53,7 +54,9 @@ case class AvroDataToCatalyst(
   @transient private lazy val avroSchema = new Schema.Parser().parse(jsonFormatSchema)
-  @transient private lazy val reader = new GenericDatumReader[Any](avroSchema)
+  @transient private lazy val reader = writerJsonFormatSchema

Review comment:
No: if writerJsonFormatSchema is set, then both writerJsonFormatSchema and jsonFormatSchema are used (in the `.map`); otherwise only jsonFormatSchema is used (in the `.getOrElse`). In the first case the GenericDatumReader is built with the constructor that accepts two schemas (writer and reader), while in the second case it is built with the constructor that accepts a single schema, which is then assumed to be the same for both writing and reading: https://avro.apache.org/docs/1.8.2/api/java/org/apache/avro/generic/GenericDatumReader.html

Please refer to the Jira ticket for a more detailed explanation: https://issues.apache.org/jira/browse/SPARK-27506
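The `.map`/`.getOrElse` pattern described above can be sketched as follows. This is a hedged illustration, not the exact PR code: the `writerJsonFormatSchema` field is assumed to be an `Option[String]` carrying the writer's schema JSON, alongside the mandatory `jsonFormatSchema` string, matching the constructor overloads documented for `GenericDatumReader`.

```scala
import org.apache.avro.Schema
import org.apache.avro.generic.GenericDatumReader

// Assumed fields, mirroring the diff excerpt above:
//   jsonFormatSchema: String               - the reader (expected) schema
//   writerJsonFormatSchema: Option[String] - the writer schema, if it differs

@transient private lazy val avroSchema: Schema =
  new Schema.Parser().parse(jsonFormatSchema)

@transient private lazy val reader = writerJsonFormatSchema
  .map { writerJson =>
    val writerSchema = new Schema.Parser().parse(writerJson)
    // Two-schema constructor: data is decoded with the writer schema and
    // resolved to the (possibly different but compatible) reader schema.
    new GenericDatumReader[Any](writerSchema, avroSchema)
  }
  .getOrElse {
    // Single-schema constructor: the same schema is assumed for both
    // writing and reading.
    new GenericDatumReader[Any](avroSchema)
  }
```

When the two schemas differ, Avro's schema resolution rules (e.g. field defaults, promoted types) decide whether the writer data can be read with the reader schema; the single-schema constructor simply skips that resolution step.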
