pradomota commented on issue #26524: [SPARK-29898][SQL] Support Avro Custom Logical Types
URL: https://github.com/apache/spark/pull/26524#issuecomment-558628286

The Avro spec defines a set of logical types, but it also describes them as an extensibility point, so the logical types defined in the spec aren't the only possible ones. In fact, the Avro Object Model contains the support needed to build custom logical types.

This enables different scenarios. For example, in [this Stack Overflow question](https://stackoverflow.com/questions/49034266/how-to-define-a-logicaltype-in-avro-java) the user is creating a new logical type that encrypts the data before serialization.

In my team, we are using custom logical types for several reasons:

1. To enforce a certain syntax on some strings (e.g. phone numbers, URIs)
2. To avoid the limitations of the built-in Avro logical types (e.g. Avro's timestamps don't have offset information)

Here's a specific example. Imagine the following Avro schema:

```json
{
  "namespace": "org.example",
  "name": "User",
  "type": "record",
  "doc": "User Record",
  "fields": [
    {
      "name": "email",
      "type": "string",
      "doc": "User email"
    },
    {
      "name": "creationDateTime",
      "type": {
        "type": "string",
        "logicalType": "iso-date"
      },
      "doc": "User creation date in ISO format"
    }
  ]
}
```

The logical type `iso-date` encodes datetime values as ISO 8601 strings. We want this logical type to map to `DateType` in Spark, but the current Avro support maps it to a `StringType` column.

This PR contains a proposal that adds the `org.example.CustomAvroLogicalCatalystMapper` interface, which allows users to control how Avro fields marked with the logical type `iso-date` are serialized into Spark objects, therefore allowing the user to map this logical type to the `DateType` SQL type.
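To make the intended mapping concrete, here is a minimal sketch in plain Python using only the standard library. It is not the interface proposed in the PR and does not call any Spark or Avro API. The names `LOGICAL_TYPE_MAPPERS`, `resolve_field`, and `convert_record` are hypothetical, and the sketch assumes that a `DateType` value is represented as the number of days since 1970-01-01.

```python
import json
from datetime import date, datetime

# The example schema from above, reduced to the parts that matter here.
SCHEMA_JSON = """
{
  "namespace": "org.example", "name": "User", "type": "record",
  "fields": [
    {"name": "email", "type": "string"},
    {"name": "creationDateTime",
     "type": {"type": "string", "logicalType": "iso-date"}}
  ]
}
"""


def iso_date_to_epoch_days(value: str) -> int:
    """Parse an ISO 8601 date or datetime string and return the number of
    days between 1970-01-01 and its calendar date (the date is taken as
    written, without converting the offset to UTC)."""
    parsed = datetime.fromisoformat(value)
    return (parsed.date() - date(1970, 1, 1)).days


# Hypothetical registry: logical type name -> (SQL type name, converter).
LOGICAL_TYPE_MAPPERS = {"iso-date": ("DateType", iso_date_to_epoch_days)}

# Fallback for plain Avro types; only "string" is needed for this schema.
DEFAULT_TYPES = {"string": "StringType"}


def resolve_field(field):
    """Return (sql_type_name, converter) for one Avro field definition."""
    avro_type = field["type"]
    if isinstance(avro_type, dict):
        logical = avro_type.get("logicalType")
        if logical in LOGICAL_TYPE_MAPPERS:
            return LOGICAL_TYPE_MAPPERS[logical]
        # Unknown logical type: fall back to the underlying Avro type.
        avro_type = avro_type["type"]
    return DEFAULT_TYPES[avro_type], (lambda v: v)


def convert_record(schema, record):
    """Apply each field's converter to the matching value in the record."""
    converted = {}
    for field in schema["fields"]:
        _, convert = resolve_field(field)
        converted[field["name"]] = convert(record[field["name"]])
    return converted


schema = json.loads(SCHEMA_JSON)
column_types = {f["name"]: resolve_field(f)[0] for f in schema["fields"]}
row = convert_record(
    schema,
    {"email": "user@example.org",
     "creationDateTime": "2019-11-26T10:15:30+01:00"},
)
print(column_types)
print(row)
```

Without an entry for `iso-date` in the registry, `creationDateTime` falls back to its underlying `string` type, which mirrors the `StringType` behaviour described above.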
