cloud-fan commented on code in PR #41052:
URL: https://github.com/apache/spark/pull/41052#discussion_r1211185323
##########
connector/avro/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala:
##########
@@ -117,6 +119,10 @@ private[sql] class AvroDeserializer(
val incompatibleMsg = errorPrefix +
s"schema is incompatible (avroType = $avroType, sqlType =
${catalystType.sql})"
+ val realDataType = SchemaConverters.toSqlType(avroType).dataType
+ val confKey = SQLConf.LEGACY_AVRO_ALLOW_INCOMPATIBLE_SCHEMA
+ val preventReadingIncorrectType = !SQLConf.get.getConf(confKey)
+
(avroType.getType, catalystType) match {
Review Comment:
After thinking about it more, I think the current code structure is fragile
and not future-proof. What if a new logical type is added on top of avro
BOOLEAN and we don't want to read it as a boolean type? Today the code matches
on the avro physical type and the requested catalyst type, and only checks the
avro type's corresponding catalyst type in certain cases, so a new logical
type could slip through the physical-type match.
Instead of a deny list, I think an allow list is more future-proof: we
explicitly enumerate every case that we allow. For example, we can still
allow reading a long type avro value as timestamp type for backward
compatibility reasons.