Zhen-hao commented on pull request #35379: URL: https://github.com/apache/spark/pull/35379#issuecomment-1028676684
> Left a few inline comments about specific implementation details, but I have a higher-level concern. I'm not confident that this is the right direction to be moving in. It seems that the problem is that whoever created the `GenericInternalRow` put the wrong data type into the field. `GenericInternalRow` implements the `SpecializedGetters` interface, so `getDecimal` should work properly on it, right? If the Python UDF logic is putting a `BigDecimal` there, that seems wrong to me, because the implementation of `BaseGenericInternalRow#getDecimal` (which is used by `GenericInternalRow`) assumes that the field contains a `Decimal`. The Python UDF logic should either wrap the `BigDecimal` inside a `Decimal` when storing the object, or use a subclass of `GenericInternalRow` that overrides `getDecimal` to wrap it on an as-needed basis in the getter.
>
> LMK what you think.

I totally agree that this may not be the right direction, but any fix should make my added unit tests pass, or turn the runtime errors into compile errors. This is just my "easy" fix for an issue we ran into with a client; we'd like to contribute it back and/or make others aware of the problem. If I had unlimited free time, I would remove all the OO designs and rewrite the whole of Spark with typeclasses, as it has been painful to work with type casts and interfaces containing `Any`, `AnyRef`, or `Object`.

I'll address your inline comments later.
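The two remedies suggested in the review (wrap the `BigDecimal` at write time, or override the getter to wrap lazily) can be sketched as follows. Note this uses hypothetical, simplified stand-ins for Spark's `Decimal` and `GenericInternalRow` classes (here `GenericInternalRowSketch`, `LenientInternalRow`, and `DecimalFixes` are invented names), not the real Spark code:

```scala
import java.math.BigDecimal

// Hypothetical, simplified stand-in for Spark's Decimal: just a
// wrapper around java.math.BigDecimal for illustration.
final case class Decimal(underlying: BigDecimal)

// Simplified stand-in for GenericInternalRow. The getter mirrors the
// assumption in BaseGenericInternalRow#getDecimal: the field must
// already hold a Decimal, so a raw BigDecimal stored here would fail
// with a ClassCastException at read time.
class GenericInternalRowSketch(val values: Array[Any]) {
  def getDecimal(ordinal: Int): Decimal =
    values(ordinal).asInstanceOf[Decimal]
}

object DecimalFixes {
  // Remedy 1: wrap the BigDecimal in a Decimal at write time, so the
  // unmodified getter works as-is.
  def rowWithWrappedValue(bd: BigDecimal): GenericInternalRowSketch =
    new GenericInternalRowSketch(Array[Any](Decimal(bd)))
}

// Remedy 2: a subclass whose getter wraps a raw BigDecimal on an
// as-needed basis, leaving already-wrapped values untouched.
class LenientInternalRow(values: Array[Any])
    extends GenericInternalRowSketch(values) {
  override def getDecimal(ordinal: Int): Decimal =
    values(ordinal) match {
      case d: Decimal     => d
      case bd: BigDecimal => Decimal(bd)
      case other =>
        throw new ClassCastException(s"unexpected field type: $other")
    }
}
```

Remedy 1 keeps the invariant at a single write site; Remedy 2 tolerates both representations at every read, at the cost of a per-read type dispatch.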
