grundprinzip commented on PR #39137:
URL: https://github.com/apache/spark/pull/39137#issuecomment-1365099279

   > * I worried that maybe it would not be easy to maintenance when the rules 
on one side (PySpark vs JVM) were arbitrarily changed in the future. So, I 
wanted to manage all errors in a single error class file(error-class.json) 
across the entire Apache Spark project to reduce the management cost.
   
   I think it's fine to reuse the `error-class.json` file. Let's figure out a 
way to bundle it as part of the PySpark build. If we can't do that, let's add a 
`python-error-class.json` and verify its integrity in `SparkThrowableSuite` the 
same way as we do for the `error-class.json`. 
   
   > * I thought I might see an advantage in that we can simply reuse the 
existing error class as it is without adding a new one when there is a similar 
error already defined on the JVM side in the future.
   
   I'm not sure how much of a benefit that we get here. For all server side 
exceptions they're thrown as SparkExceptions anyways and will be properly 
handled. This is where the `CapturedException` comes in. For client side 
exceptions, we should use the conceptually same approach but in a language 
idiomatic way. 
   
   > * Like the functions in `functions.py` , most of PySpark's functions 
leverage the JVM's logic, so it is assumed that the JVM will run at least once. 
So I thought that calling the error implemented in is acceptable for the 
expected overhead.
   
   The functions that are handled in this PR are purely client side and should 
not require the vm traverse. 
   
   
   Since the primary reason for `SparkThrowableHelper` is to provide a lookup 
in a JSON file and map parameters correctly, I feel that this behavior should 
be replicated in all of the clients (PySpark, PySpark with SparkConnect, Scala 
for Spark Connect etc) but follow the message format.
   
   @HyukjinKwon what is your opinion?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to