peter-toth edited a comment on pull request #31682:
URL: https://github.com/apache/spark/pull/31682#issuecomment-788247450


   > My idea is to let one `Pickler` instance only handle data of the same 
schema.
   > 
   > IIUC the Python UDF operator needs to send the input (values of (c1, c2)) 
from JVM to Python, run the UDF, and send back the UDF result (values of (c3, 
c4)) from Python to JVM. Since the `Pickler` instance is used to serialize both 
the input and output data, the bug happens. Do I understand it correctly?
   
   No, sorry. The issue is that the `Pickler` instance in the JVM that serializes 
the input data `(c1, c2)` = `((1.0, 1.0), (1, 1))` serializes it as if it were 
`((1.0, 1.0), (1.0, 1.0))`, i.e. it sends the serialized data as something like 
`((1.0, 1.0), <some short (hash?) code of the (1.0, 1.0) instance we've seen 
before>)`. The other pickler on the Python side, the one that serializes the 
output, has nothing to do with the issue (and it is actually not a Pyrolite 
`Pickler` but some Python library).
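
   To illustrate the kind of mix-up described above, here is a minimal toy sketch in Python. It is not Pyrolite's code: `toy_dump` and `toy_load` are hypothetical helpers, and the memo keyed by value equality is an assumption made only to reproduce the symptom, where `(1, 1)` is replaced by a back-reference to the earlier `(1.0, 1.0)`.

```python
def toy_dump(values, memo):
    """Encode values, replacing anything 'already seen' with a back-reference.

    Assumption for illustration: the memo is keyed by value equality,
    so (1, 1) is treated as a repeat of (1.0, 1.0).
    """
    encoded = []
    for value in values:
        if value in memo:
            # Looks like a repeat: emit only the short reference code.
            encoded.append(("ref", memo[value]))
        else:
            memo[value] = len(memo)
            encoded.append(("val", value))
    return encoded


def toy_load(encoded):
    """Decode the toy stream, resolving back-references by position."""
    seen = []
    decoded = []
    for kind, payload in encoded:
        if kind == "val":
            seen.append(payload)
            decoded.append(payload)
        else:
            decoded.append(seen[payload])
    return decoded


row = [(1.0, 1.0), (1, 1)]          # (c1, c2) from the example above
encoded = toy_dump(row, memo={})
decoded = toy_load(encoded)

print(encoded)
print(decoded)
```

   In this sketch the second column comes back as floats because the integer tuple was never written to the stream, only a reference to the first tuple. Python's own `pickle` module memoizes by object identity, so it does not conflate these two tuples.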

