cloud-fan commented on code in PR #41052:
URL: https://github.com/apache/spark/pull/41052#discussion_r1211185323
##########
connector/avro/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala:
##########
@@ -117,6 +119,10 @@ private[sql] class AvroDeserializer(
val incompatibleMsg = errorPrefix +
s"schema is incompatible (avroType = $avroType, sqlType =
${catalystType.sql})"
+ val realDataType = SchemaConverters.toSqlType(avroType).dataType
+ val confKey = SQLConf.LEGACY_AVRO_ALLOW_INCOMPATIBLE_SCHEMA
+ val preventReadingIncorrectType = !SQLConf.get.getConf(confKey)
+
(avroType.getType, catalystType) match {
Review Comment:
After thinking about it more, I think the current code structure is fragile
and not future-proof. What if a new logical type is added on top of avro
BOOLEAN and we don't want to read it as a boolean type? Today the code matches
on the avro physical type and the requested catalyst type, and only checks the
avro type's corresponding catalyst type in certain cases, so a new logical
type could slip through the physical-type match.
Instead of a deny list, I think an allow list is more future-proof: we
explicitly enumerate every case that we allow. For example, we can still
allow reading a long type avro value as timestamp type for backward
compatibility reasons.