HyukjinKwon opened a new pull request #29114:
URL: https://github.com/apache/spark/pull/29114


   ### What changes were proposed in this pull request?
   
   This PR upgrades PySpark's embedded cloudpickle to the latest release, cloudpickle v1.5.0 (see https://github.com/cloudpipe/cloudpickle/blob/v1.5.0/cloudpickle/cloudpickle.py).
   
   ### Why are the changes needed?
   
   There are many bug fixes. For example, the bug described in the JIRA:
   
   dill unpickling fails because cloudpickle defines `types.ClassType`, which is undefined in dill. This results in the following error:
   
   ```
   Traceback (most recent call last):
     File "/usr/local/lib/python3.6/site-packages/apache_beam/internal/pickler.py", line 279, in loads
       return dill.loads(s)
     File "/usr/local/lib/python3.6/site-packages/dill/_dill.py", line 317, in loads
       return load(file, ignore)
     File "/usr/local/lib/python3.6/site-packages/dill/_dill.py", line 305, in load
       obj = pik.load()
     File "/usr/local/lib/python3.6/site-packages/dill/_dill.py", line 577, in _load_type
       return _reverse_typemap[name]
   KeyError: 'ClassType'
   ```
   
   See also https://github.com/cloudpipe/cloudpickle/issues/82. This was fixed in cloudpickle 1.3.0+ (https://github.com/cloudpipe/cloudpickle/pull/337), but PySpark's cloudpickle.py doesn't have this change yet.
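
The failure mode in the traceback can be illustrated with a small stdlib-only toy model. It is a sketch under stated assumptions, not dill's or cloudpickle's actual code: it assumes the alias is added as `types.ClassType = type` on Python 3, and that type names are resolved through a dictionary built from the `builtins` and `types` namespaces, as the `_reverse_typemap[name]` frame suggests. The helper names `build_reverse_typemap`, `save_type`, and `load_type` are hypothetical.

```python
import builtins
import types


def build_reverse_typemap(namespace):
    """Map type names to type objects (toy stand-in for a name-based registry)."""
    return {name: obj for name, obj in namespace.items()
            if isinstance(obj, type)}


def save_type(obj, reverse_typemap):
    """Return the last registered name for obj; this plays the role of the payload."""
    names = [name for name, candidate in reverse_typemap.items()
             if candidate is obj]
    return names[-1]


def load_type(name, reverse_typemap):
    """Resolve a name on the receiving side; raises KeyError for unknown names."""
    return reverse_typemap[name]


# Sender: a process in which some library added the alias (assumption).
sender_namespace = {**vars(builtins), **vars(types)}
sender_namespace["ClassType"] = type
sender_map = build_reverse_typemap(sender_namespace)

# Receiver: a clean process that never saw the alias.
receiver_map = build_reverse_typemap({**vars(builtins), **vars(types)})

# The alias shadows the ordinary name "type" on the sender side.
payload = save_type(type, sender_map)

try:
    load_type(payload, receiver_map)
    receiver_error = None
except KeyError as exc:
    receiver_error = exc
```

In this model the sender emits the alias name instead of `type`, and the receiver fails on lookup because it never registered that name, which mirrors the `KeyError` shown above.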
   
   More notably, it now supports the C pickle implementation with Python 3.8, which hugely improves performance. This is already adopted in other projects such as Ray.
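
For context, Python 3.8 added the `reducer_override` hook to `pickle.Pickler`, which lets a subclass customize how individual objects are reduced while the rest of the work stays in the standard pickler. The sketch below uses only the standard library and is not cloudpickle's code; it assumes this hook is the kind of mechanism the C-backed path relies on. `ByValuePickler`, `dumps_by_value`, and `make_local_class` are hypothetical names used for illustration.

```python
import io
import pickle


class ByValuePickler(pickle.Pickler):
    """Hypothetical pickler that serializes locally defined classes by value."""

    def reducer_override(self, obj):
        # Intercept only classes defined inside a function; these cannot be
        # pickled by reference because they are not importable by name.
        if isinstance(obj, type) and "<locals>" in obj.__qualname__:
            attrs = {k: v for k, v in vars(obj).items()
                     if not k.startswith("__")}
            # Rebuild the class on load by calling type(name, bases, attrs).
            return type, (obj.__name__, obj.__bases__, attrs)
        # Everything else falls back to the default reduction.
        return NotImplemented


def dumps_by_value(obj):
    """Serialize obj with ByValuePickler and return the bytes."""
    buf = io.BytesIO()
    ByValuePickler(buf, protocol=pickle.HIGHEST_PROTOCOL).dump(obj)
    return buf.getvalue()


def make_local_class():
    """Return a class that plain pickle cannot serialize by reference."""
    class Point:
        dims = 2
    return Point


restored = pickle.loads(dumps_by_value(make_local_class()))
```

`pickle.Pickler` resolves to the C implementation whenever the `_pickle` extension module is available, so a subclass like this keeps the fast path for every object it does not intercept. This requires Python 3.8 or later.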
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, the bug fixes described above. Internally, users can also leverage the faster cloudpickle backed by the C pickle implementation.
   
   ### How was this patch tested?
   
   Jenkins will test it out.

