codesue opened a new pull request #28950:
URL: https://github.com/apache/spark/pull/28950
### What changes were proposed in this pull request?

Update the cloudpickle bundled in PySpark to cloudpickle v1.4.1 (see https://github.com/cloudpipe/cloudpickle/blob/v1.4.1/cloudpickle/cloudpickle.py). This is the latest cloudpickle release at the time of writing. Note that it does not support Python 2.

### Why are the changes needed?

PySpark's cloudpickle.py, like cloudpickle releases below 1.3.0, interferes with dill unpickling because it sets `types.ClassType` at import time, a name that dill cannot resolve when unpickling. This results in the following error:

```
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/apache_beam/internal/pickler.py", line 279, in loads
    return dill.loads(s)
  File "/usr/local/lib/python3.6/site-packages/dill/_dill.py", line 317, in loads
    return load(file, ignore)
  File "/usr/local/lib/python3.6/site-packages/dill/_dill.py", line 305, in load
    obj = pik.load()
  File "/usr/local/lib/python3.6/site-packages/dill/_dill.py", line 577, in _load_type
    return _reverse_typemap[name]
KeyError: 'ClassType'
```

(See https://github.com/cloudpipe/cloudpickle/issues/82)

This was fixed in cloudpickle 1.3.0 (https://github.com/cloudpipe/cloudpickle/pull/337), but PySpark's bundled cloudpickle.py does not include that change yet.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Ran the existing PySpark test suite with `python/run-tests`.
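A minimal, self-contained sketch of the `ClassType` failure described under "Why are the changes needed?". It uses neither dill nor cloudpickle: `build_reverse_typemap` and `name_for` are hypothetical, simplified stand-ins for a name-based type table like dill's `_reverse_typemap`, assuming that such a table is built by scanning `builtins` and the `types` module, and that older cloudpickle releases assigned `types.ClassType = type` at import time on Python 3.

```python
import builtins
import types


def build_reverse_typemap():
    """Hypothetical, simplified name -> type table (not dill's actual code).

    Assumption: built by scanning the builtins and ``types`` namespaces
    for built-in type objects, in that order.
    """
    items = list(vars(builtins).items()) + list(vars(types).items())
    return {
        name: obj
        for name, obj in items
        if isinstance(obj, type) and getattr(obj, "__module__", None) == "builtins"
    }


def name_for(typemap, obj):
    """Return the name a name-based pickler would record for ``obj``.

    Later entries win, so an alias found in ``types`` overrides the
    builtin name.
    """
    forward = {value: name for name, value in typemap.items()}
    return forward[obj]


# Pickling side: the cloudpickle-style monkeypatch is active when the table is built.
types.ClassType = type  # assumed to mirror what older cloudpickle did on Python 3
try:
    patched = build_reverse_typemap()
finally:
    del types.ClassType  # undo the patch so the simulation stays contained

# Unpickling side: a clean interpreter where nothing patched ``types``.
clean = build_reverse_typemap()

recorded = name_for(patched, type)
print(recorded)  # ClassType
try:
    clean[recorded]
except KeyError as exc:
    print("KeyError:", exc)  # KeyError: 'ClassType'
```

Because the pickling side records `type` under the alias `ClassType`, an unpickler whose table was built without the patch cannot look that name up, which matches the `KeyError: 'ClassType'` in the traceback. Not creating the alias in the first place avoids the mismatch.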
