Zhen-hao commented on pull request #35379: URL: https://github.com/apache/spark/pull/35379#issuecomment-1044223469
> > I totally agree that this may not be the right direction.
> >
> > But any fix should pass my added unit tests or produce compile errors instead of runtime errors.
> >
> > This is just my "easy" fix for an issue that we had with a client. We'd like to contribute back and/or make others aware of the issue.
> >
> > If I had unlimited free time, I would remove all the OO designs and rewrite the whole of Spark with typeclasses, as it has been painful to work with type casts and interfaces containing `Any`, `AnyRef`, or `Object`.
> >
> > I'll address your inline comments later.
>
> Actually, could you provide an end-to-end Python UDF example? It can be in the PR description instead of code changes.

Good question. We didn't find the issue with the public Spark API. We had a copy of the `org.apache.spark.sql.avro` package in our code base, because it is private to Spark and we wanted to build a layer on top of it as a serialization/deserialization library. We didn't see the problem until we offered our Python users an Avro serialization UDF. We first worked around the issue by having Python users call the Scala API via `sc._jvm` instead of the UDF. I don't know how to reproduce the issue with the public API.

Again, the nature of the change in this PR is more about making the Scala code hermetic (or mitigating some leaky abstractions). This PR doesn't need to be merged if there are better ways to improve.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
