giamo commented on a change in pull request #24405: [SPARK-27506][SQL] Allow deserialization of Avro data using compatible schemas
URL: https://github.com/apache/spark/pull/24405#discussion_r304597229
 
 

 ##########
 File path: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroDataToCatalyst.scala
 ##########
 @@ -53,7 +54,9 @@ case class AvroDataToCatalyst(
 
   @transient private lazy val avroSchema = new Schema.Parser().parse(jsonFormatSchema)
 
-  @transient private lazy val reader = new GenericDatumReader[Any](avroSchema)
+  @transient private lazy val reader = writerJsonFormatSchema
 
 Review comment:
   No: if `writerJsonFormatSchema` is set, then both `writerJsonFormatSchema` and `jsonFormatSchema` are used (in the `.map`); otherwise only `jsonFormatSchema` is used (in the `.getOrElse`).
   
   In the first case the `GenericDatumReader` is built with the constructor accepting two schemas (writer and reader), while in the second case it is built with the constructor accepting a single schema, which is then assumed to be both the writer and the reader schema: https://avro.apache.org/docs/1.8.2/api/java/org/apache/avro/generic/GenericDatumReader.html
   
   Please refer to the Jira ticket for a more detailed explanation: https://issues.apache.org/jira/browse/SPARK-27506
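   To illustrate, here is a minimal, self-contained Scala sketch of the `.map`/`.getOrElse` selection described above. The `twoSchemaReader`/`oneSchemaReader` helpers are hypothetical stand-ins for the two `GenericDatumReader` constructors (so the snippet runs without the Avro jar); only the selection pattern mirrors the actual change.
   
   ```scala
   object ReaderSelectionSketch {
     // Stand-in for GenericDatumReader(writerSchema, readerSchema) -- illustrative only.
     def twoSchemaReader(writer: String, reader: String): String =
       s"reader(writer=$writer, reader=$reader)"
   
     // Stand-in for GenericDatumReader(schema), where one schema serves both roles.
     def oneSchemaReader(schema: String): String =
       s"reader(schema=$schema)"
   
     // If the optional writer schema is present, use both schemas (.map);
     // otherwise fall back to the single-schema constructor (.getOrElse).
     def buildReader(writerJsonFormatSchema: Option[String],
                     jsonFormatSchema: String): String =
       writerJsonFormatSchema
         .map(w => twoSchemaReader(w, jsonFormatSchema))
         .getOrElse(oneSchemaReader(jsonFormatSchema))
   }
   ```
   
   With a writer schema set, `buildReader(Some("w"), "r")` takes the `.map` branch; with `None` it takes the `.getOrElse` branch.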

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]

