Zhen-hao commented on pull request #35379: URL: https://github.com/apache/spark/pull/35379#issuecomment-1044223469
> > I totally agree that this may not be the right direction.
> >
> > But any fix should pass my added unit tests or produce compile errors instead of runtime errors.
> >
> > This is just my "easy" fix for an issue that we had with a client. We'd like to contribute back and/or make others aware of the issue.
> >
> > If I had unlimited free time, I would remove all the OO designs and rewrite the whole of Spark with typeclasses, as it has been painful to work with type casts and interfaces containing `Any`, `AnyRef`, or `Object`.
> >
> > I'll address your inline comments later.
>
> Actually, could you provide an end-to-end Python UDF example? It can be in the PR description instead of code changes.

Good question. We didn't find the issue with the public Spark API. We had a copy of the `org.apache.spark.sql.avro` package in our code base, because it is private to Spark and we wanted to build a layer on top of it as a serialization/deserialization library. We didn't see the problem until we offered our Python users an Avro serialization UDF. We first worked around the issue by having Python users call the Scala API via `sc._jvm` instead of the UDF. I don't know how to reproduce the issue with the public API.

Again, the nature of the change in this PR is more about making the Scala code hermetic (or mitigating some leaky abstractions). This PR doesn't need to be merged if there are better ways to improve.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
