peter-toth edited a comment on pull request #31682:
URL: https://github.com/apache/spark/pull/31682#issuecomment-788247450


   > My idea is to let one `Pickler` instance only handle data of the same 
schema.
   > 
   > IIUC the Python UDF operator needs to send the input (values of (c1, c2)) 
from JVM to Python, run the UDF, and send back the UDF result (values of (c3, 
c4)) from Python to JVM. Since the `Pickler` instance is used to serialize both 
the input and output data, the bug happens. Do I understand it correctly?
   
   No, sorry. The issue is that the `Pickler` instance in the JVM serializes the 
input data `(c1, c2)` = `((1.0, 1.0), (1, 1))` as if it were `((1.0, 1.0), 
(1.0, 1.0))`, i.e. it sends the serialized data as something like `((1.0, 1.0), 
<some short (hash?) code of the (1.0, 1.0) instance we've seen before>)`. The 
other pickler on the Python side, the one that serializes the output (and which 
is actually not a pyrolite `Pickler` but some Python lib), has nothing to do 
with the issue.
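
   To illustrate the kind of conflation described above, here is a toy sketch in Python. It is not pyrolite's code: `encode_with_value_memo` and `decode` are hypothetical helpers, and the assumption is only that a memo keyed by value equality (rather than object identity) replaces a later, equal-comparing value with a back-reference to the first one. In Python, `(1, 1) == (1.0, 1.0)` and both hash the same, so a plain `dict` reproduces the effect.

```python
def encode_with_value_memo(values):
    """Toy encoder (hypothetical): a value equal to one seen before
    is replaced by a back-reference to the first occurrence."""
    memo = {}      # keyed by ==/hash, NOT by object identity
    encoded = []
    for v in values:
        if v in memo:
            encoded.append(("ref", memo[v]))   # short code for "seen before"
        else:
            memo[v] = len(memo)
            encoded.append(("put", v))
    return encoded


def decode(encoded):
    """Toy decoder: resolves back-references to the first stored value."""
    seen, out = [], []
    for tag, payload in encoded:
        if tag == "put":
            seen.append(payload)
            out.append(payload)
        else:
            out.append(seen[payload])
    return out


row = ((1.0, 1.0), (1, 1))                 # (c1, c2) from the example
encoded = encode_with_value_memo(row)
decoded = tuple(decode(encoded))

# Element types per column after the round trip.
types = [[type(x).__name__ for x in col] for col in decoded]
print(encoded)
print(types)
```

   In this sketch the integer column comes back as floats, because `(1, 1)` was never written out, only a reference to `(1.0, 1.0)`. A memo keyed by identity, or one memo per schema, would not conflate the two values.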




