HyukjinKwon commented on a change in pull request #20691: [SPARK-18161] [Python] Update cloudpickle to v0.6.1
URL: https://github.com/apache/spark/pull/20691#discussion_r249232099
 
 

 ##########
 File path: python/pyspark/broadcast.py
 ##########
 @@ -110,7 +110,7 @@ def __init__(self, sc=None, value=None, pickle_registry=None, path=None,
 
     def dump(self, value, f):
         try:
-            pickle.dump(value, f, 2)
+            pickle.dump(value, f, pickle.HIGHEST_PROTOCOL)
 
 Review comment:
   Yeah, it would be great to know why this was set to 2 previously. I suspect
   there's no particular reason, but it would be good to double check and note
   the reason here if we can find it.
   
   The highest pickle protocol is 2 in Python 2 and 4 in Python 3.4+, so this
   changes the protocol from 2 to 4 when running on Python 3.4+.
   
   One possibility is that it was set to 2 so that data written by one Python
   version could be read by another, but I don't think that's guaranteed in
   PySpark anyway. Maybe we should explicitly note this somewhere as well.
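
   To make the protocol difference concrete, here is a minimal sketch, assuming
   Python 3. The helper `declared_protocol` and the sample value are hypothetical
   and only for illustration; they are not part of PySpark. The sketch relies on
   the fact that pickles written with protocol 2 or higher begin with the PROTO
   opcode (byte 0x80) followed by one byte holding the protocol number.

```python
import pickle


def declared_protocol(payload):
    """Return the protocol number declared in a pickle's header.

    Hypothetical helper: pickles written with protocol >= 2 start with the
    PROTO opcode (0x80) followed by a single byte holding the protocol.
    Returns None for protocol 0/1 pickles, which have no such header.
    """
    if payload[:1] == b"\x80":
        return payload[1]
    return None


value = {"rows": [1, 2, 3], "name": "broadcast"}  # illustrative sample data

# What the old code did: always write protocol 2.
legacy = pickle.dumps(value, 2)

# What the new code does: use whatever this interpreter supports at most.
newest = pickle.dumps(value, pickle.HIGHEST_PROTOCOL)

print("protocol 2 header:", declared_protocol(legacy))
print("HIGHEST_PROTOCOL header:", declared_protocol(newest))

# The interpreter that wrote the data can read both back.
assert pickle.loads(legacy) == value
assert pickle.loads(newest) == value
```

   An interpreter whose highest supported protocol is lower than the declared
   one cannot load the payload, which is the cross-version concern raised above.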
