HyukjinKwon commented on a change in pull request #20691: [SPARK-18161] [Python] Update cloudpickle to v0.6.1
URL: https://github.com/apache/spark/pull/20691#discussion_r249232099

##########
File path: python/pyspark/broadcast.py
##########
@@ -110,7 +110,7 @@ def __init__(self, sc=None, value=None, pickle_registry=None, path=None,
     def dump(self, value, f):
         try:
-            pickle.dump(value, f, 2)
+            pickle.dump(value, f, pickle.HIGHEST_PROTOCOL)

Review comment:
Yeah, it would be great to know why it was set to 2 previously. I suspect there's no particular reason, but it would be good to double check and record the reason if we can find it.

The highest pickle protocol is 2 in Python 2 and 4 in Python 3.4+, so this changes the protocol from 2 to 4 on Python 3.4+.

One possibility is that it was set to 2 out of concern for writing and reading across different Python versions, but I don't think that is guaranteed in PySpark anyway. Maybe we should explicitly note this somewhere as well.
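
To make the protocol change concrete, here is a minimal sketch (not taken from the PR) that serializes the same value with protocol 2 and with `pickle.HIGHEST_PROTOCOL`, then reads back the protocol number recorded in each stream. The helper names `dump_with_protocol` and `stream_protocol` are hypothetical and exist only for this illustration. It assumes Python 3; note that on Python 3.8 and later `pickle.HIGHEST_PROTOCOL` is 5 rather than 4.

```python
import io
import pickle


def dump_with_protocol(value, protocol):
    """Serialize `value` into bytes using the given pickle protocol (hypothetical helper)."""
    buf = io.BytesIO()
    pickle.dump(value, buf, protocol)
    return buf.getvalue()


def stream_protocol(data):
    """Return the protocol number recorded at the start of a pickle stream.

    Streams written with protocol 2 or later begin with the PROTO opcode
    (0x80) followed by a single byte holding the protocol number.
    """
    if data[:1] != b"\x80":
        raise ValueError("no PROTO opcode; stream was written with protocol 0 or 1")
    return data[1]


value = {"key": [1, 2, 3]}

old_stream = dump_with_protocol(value, 2)
new_stream = dump_with_protocol(value, pickle.HIGHEST_PROTOCOL)

print("protocol 2 stream reports:", stream_protocol(old_stream))
print("HIGHEST_PROTOCOL stream reports:", stream_protocol(new_stream))

# The same interpreter can read both streams back.
assert pickle.loads(old_stream) == value
assert pickle.loads(new_stream) == value
```

A stream written with a newer protocol cannot be loaded by an interpreter that does not support that protocol, for example a protocol 4 stream under Python 2. That is the cross-version concern raised above, and it only matters if data written by one Python version is read by another.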