Zhen-hao opened a new pull request #35379:
URL: https://github.com/apache/spark/pull/35379


   **What's the problem to fix?**
   
   `AvroSerializer`'s implementation, at least in `newConverter`, is not written 
purely against the `InternalRow` and `SpecializedGetters` interface: it assumes 
implementation details of particular row implementations. 
   
   For example, in 
   
   ```scala
   case (TimestampType, LONG) => avroType.getLogicalType match {
     // For backward compatibility, if the Avro type is Long and it is not logical type
     // (the `null` case), output the timestamp value as with millisecond precision.
     case null | _: TimestampMillis => (getter, ordinal) =>
       DateTimeUtils.microsToMillis(timestampRebaseFunc(getter.getLong(ordinal)))
     case _: TimestampMicros => (getter, ordinal) =>
       timestampRebaseFunc(getter.getLong(ordinal))
     case other => throw new IncompatibleSchemaException(errorPrefix +
       s"SQL type ${TimestampType.sql} cannot be converted to Avro logical type $other")
   }
   ```
   
   it assumes the `InternalRow` instance encodes `TimestampType` values as 
`java.lang.Long`. That is true for `UnsafeRow` but not for `GenericInternalRow`. 
   
   Hence the code above throws runtime exceptions when used on an instance of 
`GenericInternalRow`, which is the common case for data coming from Python (PySpark).
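   The failure mode can be sketched with a minimal, self-contained example. The 
`Getters`, `PrimitiveRow`, and `ObjectRow` classes below are hypothetical 
stand-ins for `SpecializedGetters`, `UnsafeRow`, and `GenericInternalRow` — they 
are not Spark's actual classes, just an illustration of why a converter that 
hard-codes `getLong` breaks on an object-backed row:
   
   ```scala
   // Hypothetical, simplified accessor interface (stand-in for SpecializedGetters).
   trait Getters {
     def getLong(ordinal: Int): Long
   }
   
   // UnsafeRow-like: values are stored as primitive longs, so getLong always works.
   final class PrimitiveRow(values: Array[Long]) extends Getters {
     def getLong(ordinal: Int): Long = values(ordinal)
   }
   
   // GenericInternalRow-like: values are boxed objects, so getLong only works
   // if a Long was actually stored at that ordinal.
   final class ObjectRow(values: Array[Any]) extends Getters {
     def getLong(ordinal: Int): Long = values(ordinal).asInstanceOf[Long]
   }
   
   val prim = new PrimitiveRow(Array(1644000000000000L))
   prim.getLong(0) // fine: the value really is a primitive long
   
   val gen = new ObjectRow(Array[Any](java.time.Instant.ofEpochSecond(1644000000L)))
   // gen.getLong(0) throws ClassCastException: an Instant, not a Long, was stored
   ```
   
   A converter written purely against the interface would instead let the row 
implementation decide how to surface the value, rather than assuming the 
physical encoding.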
   
   This PR may not be complete, as I don't have much free time to work on it, 
but it should be a good improvement for now.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


