pradomota commented on issue #26524: [SPARK-29898][SQL] Support Avro Custom 
Logical Types
URL: https://github.com/apache/spark/pull/26524#issuecomment-558628286
 
 
   The Avro spec defines a set of logical types, but it also treats logical 
types as an extensibility point, so the types defined in the spec aren't the 
only possible ones. In fact, the Avro Object Model already contains the 
support needed to build custom logical types.
   
   This enables different scenarios. For example, in [this Stack Overflow 
question](https://stackoverflow.com/questions/49034266/how-to-define-a-logicaltype-in-avro-java)
 the user creates a new logical type that encrypts the data before 
serialization. In my team, we use custom logical types for several reasons:
   1. To enforce a certain syntax on some strings (e.g. phone numbers, URIs)
   2. To avoid the limitations of the built-in Avro logical types (e.g. Avro's 
timestamps don't carry offset information)
   
   Here's a specific example. Imagine the following Avro schema:
   ```json
   {
     "namespace": "org.example",
     "name": "User",
     "type": "record",
     "doc": "User Record",
     "fields": [
       {
         "name": "email",
         "type": "string",
         "doc": "User email"
       },
       {
         "name": "creationDateTime",
         "type": {
           "type": "string",
           "logicalType": "iso-date"
         },
         "doc": "User creation date in ISO format"
       }
     ]
   }
   ```
   The logical type `iso-date` encodes datetime values as ISO 8601 strings. We 
want this logical type to map to a `DateType` in Spark, but the current Avro 
support maps it to a `StringType` column.
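   To make the intended mapping concrete, here is a minimal sketch of the 
conversion such a logical type would need. It is not code from this PR: the 
class and method names are hypothetical, it uses only `java.time`, and it 
assumes that a value carrying an offset is truncated to the calendar date in 
its own offset. The result is the number of days since 1970-01-01, which is 
how Spark represents `DateType` values internally.

```java
import java.time.LocalDate;
import java.time.OffsetDateTime;
import java.time.format.DateTimeParseException;

// Hypothetical helper for illustration only; not part of Spark, Avro, or this PR.
final class IsoDateSketch {

    // Converts an ISO 8601 string to days since the Unix epoch.
    // Accepts either a full offset datetime ("2019-11-26T10:15:30+01:00")
    // or a plain date ("2019-11-26").
    static int toEpochDay(String iso) {
        LocalDate date;
        try {
            // Assumption: keep the calendar date as written, in the value's own offset.
            date = OffsetDateTime.parse(iso).toLocalDate();
        } catch (DateTimeParseException e) {
            // Fall back to a date-only value; throws if this fails as well.
            date = LocalDate.parse(iso);
        }
        return Math.toIntExact(date.toEpochDay());
    }

    public static void main(String[] args) {
        System.out.println(toEpochDay("2019-11-26"));
        System.out.println(toEpochDay("2019-11-26T10:15:30+01:00"));
    }
}
```

   Whether the date should instead be normalized to UTC before truncation is 
a design choice that a real mapper would have to make explicitly.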
   
   This PR proposes adding the 
`org.example.CustomAvroLogicalCatalystMapper` interface, which lets users 
control how Avro fields marked with the logical type `iso-date` are 
converted into Spark values, so that this logical type can be mapped to the 
`DateType` SQL type.

