[GitHub] [spark] peter-toth edited a comment on pull request #31682: [WIP][SPARK-34545][SQL] Fix issues with valueCompare feature of pyrolite

GitBox Mon, 01 Mar 2021 12:40:07 -0800


peter-toth edited a comment on pull request #31682:
URL: https://github.com/apache/spark/pull/31682#issuecomment-788247450



   > My idea is to let one `Pickler` instance only handle data of the same schema.
   > 
   > IIUC the Python UDF operator needs to send the input (values of (c1, c2)) from JVM to Python, run the UDF, and send back the UDF result (values of (c3, c4)) from Python to JVM. Since the `Pickler` instance is used to serialize both the input and output data, the bug happens. Do I understand it correctly?
   
   No, sorry. The issue is that the `Pickler` instance in the JVM serializes the input data `(c1, c2)` = `((1.0, 1.0), (1, 1))` as if it were `((1.0, 1.0), (1.0, 1.0))`, i.e. it sends the serialized data as something like `((1.0, 1.0), <some short (hash?) code of the (1.0, 1.0) instance we've seen before>)`. On the Python side, the other pickler, which serializes the output (and is actually not a pyrolite `Pickler` but a Python library), has nothing to do with the issue.
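
To make the failure mode concrete, here is a minimal Python sketch of a serializer whose memo is keyed by value equality. It is only a toy model of the behaviour described above, not pyrolite's code: `ToyPickler`, `load`, and the `PUT`/`GET` opcodes are hypothetical names made up for this illustration. It relies on the fact that in Python `(1.0, 1.0) == (1, 1)` and both tuples hash the same, which is assumed here to mirror how the two rows compare as equal on the JVM side.

```python
class ToyPickler:
    """Toy serializer with a value-keyed memo (hypothetical, not pyrolite)."""

    def __init__(self):
        # Maps a previously seen value to its memo index.
        # Lookup uses == and hash(), so (1, 1) matches (1.0, 1.0).
        self.memo = {}

    def dump(self, row):
        out = []
        for col in row:
            if col in self.memo:
                # "Seen before": emit only a short back-reference.
                out.append(("GET", self.memo[col]))
            else:
                idx = len(self.memo)
                self.memo[col] = idx
                out.append(("PUT", idx, col))
        return out


def load(stream):
    """Toy deserializer: resolves GET back-references from its own memo."""
    memo = {}
    row = []
    for op in stream:
        if op[0] == "PUT":
            memo[op[1]] = op[2]
            row.append(op[2])
        else:
            row.append(memo[op[1]])
    return tuple(row)


sent = ((1.0, 1.0), (1, 1))
wire = ToyPickler().dump(sent)
received = load(wire)

# The second column went over the wire as a back-reference to the first,
# so the receiver sees floats where the sender had ints.
print(wire)
print([type(v).__name__ for v in received[1]])
```

In this sketch the receiving side does nothing wrong: the type information is already lost when the sender decides that `(1, 1)` has been "seen before".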


