Zhen-hao commented on pull request #35379: URL: https://github.com/apache/spark/pull/35379#issuecomment-1028676684
> Left a few inline comments about specific implementation details, but I have a higher-level concern. I'm not confident that this is the right direction to be moving in. It seems that the problem is that whoever created the `GenericInternalRow` put the wrong data type into the field. `GenericInternalRow` implements the `SpecializedGetters` interface, so `getDecimal` should work properly on it, right? If the Python UDF logic is putting a `BigDecimal` there, that seems wrong to me, because the implementation of `BaseGenericInternalRow#getDecimal` (which is used by `GenericInternalRow`) assumes that the field contains a `Decimal`. The Python UDF logic should either wrap the `BigDecimal` inside a `Decimal` when storing the object, or use a subclass of `GenericInternalRow` that overrides `getDecimal` to wrap it on an as-needed basis in the getter.
>
> LMK what you think.

I totally agree that this may not be the right direction, but any fix should make my added unit tests pass, or turn the runtime errors into compile errors. This is just my "easy" fix for an issue we ran into with a client; we'd like to contribute it back and/or make others aware of the problem. If I had unlimited free time, I would remove all the OO designs and rewrite the whole of Spark with typeclasses, as it has been painful to work with type casts and interfaces containing `Any`, `AnyRef`, or `Object`.

I'll address your inline comments later.
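The two remedies suggested in the review (wrap the `BigDecimal` at write time, or override the getter to wrap lazily) can be sketched as follows. Note this uses hypothetical, simplified stand-ins for Spark's `Decimal` and `GenericInternalRow` classes (here `GenericInternalRowSketch`, `LenientInternalRow`, and `DecimalFixes` are invented names), not the real Spark code:

```scala
import java.math.BigDecimal

// Hypothetical, simplified stand-in for Spark's Decimal: just a
// wrapper around java.math.BigDecimal for illustration.
final case class Decimal(underlying: BigDecimal)

// Simplified stand-in for GenericInternalRow. The getter mirrors the
// assumption in BaseGenericInternalRow#getDecimal: the field must
// already hold a Decimal, so a raw BigDecimal stored here would fail
// with a ClassCastException at read time.
class GenericInternalRowSketch(val values: Array[Any]) {
  def getDecimal(ordinal: Int): Decimal =
    values(ordinal).asInstanceOf[Decimal]
}

object DecimalFixes {
  // Remedy 1: wrap the BigDecimal in a Decimal at write time, so the
  // unmodified getter works as-is.
  def rowWithWrappedValue(bd: BigDecimal): GenericInternalRowSketch =
    new GenericInternalRowSketch(Array[Any](Decimal(bd)))
}

// Remedy 2: a subclass whose getter wraps a raw BigDecimal on an
// as-needed basis, leaving already-wrapped values untouched.
class LenientInternalRow(values: Array[Any])
    extends GenericInternalRowSketch(values) {
  override def getDecimal(ordinal: Int): Decimal =
    values(ordinal) match {
      case d: Decimal     => d
      case bd: BigDecimal => Decimal(bd)
      case other =>
        throw new ClassCastException(s"unexpected field type: $other")
    }
}
```

Remedy 1 keeps the invariant at a single write site; Remedy 2 tolerates both representations at every read, at the cost of a per-read type dispatch.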
