[GitHub] [spark] peter-toth edited a comment on pull request #31682: [WIP][SPARK-34545][SQL] Fix issues with valueCompare feature of pyrolite

GitBox Mon, 01 Mar 2021 12:34:26 -0800


peter-toth edited a comment on pull request #31682:
URL: https://github.com/apache/spark/pull/31682#issuecomment-788247450



   > My idea is to let one `Pickler` instance only handle data of the same 
schema.
   > 
   > IIUC the Python UDF operator needs to send the input (values of (c1, c2)) 
from JVM to Python, run the UDF, and send back the UDF result (values of (c3, 
c4)) from Python to JVM. Since the `Pickler` instance is used to serialize both 
the input and output data, the bug happens. Do I understand it correctly?
   
   No, sorry. The issue is that the `Pickler` instance in the JVM that 
serializes the input data `(c1, c2)` = `((1.0, 1.0), (1, 1))` serializes it as 
if it were `((1.0, 1.0), (1.0, 1.0))`, i.e. it sends the serialized data as 
something like `((1.0, 1.0), <some short (hash?) code of the (1.0, 1.0) 
instance we've seen before>)`. The other pickler on the Python side, which 
serializes the output (and which is actually not a pyrolite `Pickler` but some 
Python lib), has nothing to do with the issue.
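
   To make the mechanism concrete, here is a toy sketch in Python. It is not 
pyrolite's code; `toy_dump` and `toy_load` are hypothetical names, and Python's 
tuple equality (where `(1.0, 1.0) == (1, 1)`) only stands in for whatever 
equality the JVM-side memo uses. It assumes a serializer that emits a short 
back-reference whenever it meets a value equal to one it has already written:

```python
def toy_dump(values, value_compare=True):
    """Toy serializer: emit ("val", v) the first time a value is seen and
    ("ref", index) afterwards. With value_compare=True the memo is keyed by
    value equality; otherwise it is keyed by object identity."""
    memo = {}
    out = []
    for v in values:
        key = v if value_compare else id(v)
        if key in memo:
            out.append(("ref", memo[key]))   # short code for "seen before"
        else:
            memo[key] = len(out)
            out.append(("val", v))
    return out


def toy_load(stream):
    """Toy deserializer: resolve each ("ref", index) to the earlier value."""
    seen = []
    for tag, payload in stream:
        seen.append(payload if tag == "val" else seen[payload])
    return seen


row = [(1.0, 1.0), (1, 1)]                                  # (c1, c2)
by_value = toy_load(toy_dump(row, value_compare=True))      # c2 becomes a ref to c1
by_identity = toy_load(toy_dump(row, value_compare=False))  # c2 is written out
```

   With the value-keyed memo, the second element is replaced by a reference to 
the first, so the reader gets floats where the writer had ints. With the 
identity-keyed memo both elements are written out and the types survive.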


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



