Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21157
> Ok, it looks like it was @HyukjinKwon who suggested that we remove this
hack in general rather than keep the partial workaround. Can I get your thoughts
on why? It seems like the partial workaround would give us the best of both
worlds (e.g. we don't break people's existing Spark code and we handle Python
tuples better).
Sorry for the late response. Yes, I spent some time looking into this
named tuple hack, and my impression was that we should not have added such a
fix just to allow named tuple pickling. The named tuple hack was introduced
for both cloudpickle (the SQL path) and the normal pickle path, if I am not mistaken.
Cloudpickle on the PySpark side now supports it, so the workaround for the
cases above should be to use CloudPickler where possible. I think the PySpark
API exposes this pickler (see `SparkContext`'s `__init__`). @superbobry,
mind if I ask you to document this workaround and add a test (and see if it
really works)?
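For reference, a minimal sketch of what that workaround might look like, assuming `CloudPickleSerializer` from `pyspark.serializers` and the `serializer` parameter of `SparkContext`'s `__init__`; this is only an illustration and has not been tested against this branch:

```python
# Sketch: pass CloudPickleSerializer explicitly so that namedtuples in
# closures and RDD elements are pickled by cloudpickle instead of the
# default pickler (i.e. without relying on the namedtuple hack).
from collections import namedtuple

from pyspark import SparkContext
from pyspark.serializers import CloudPickleSerializer

Point = namedtuple("Point", ["x", "y"])

if __name__ == "__main__":
    sc = SparkContext("local[2]", "namedtuple-workaround",
                      serializer=CloudPickleSerializer())
    rdd = sc.parallelize([Point(1, 2), Point(3, 4)])
    # The closure returns namedtuples; cloudpickle should round-trip them.
    print(rdd.map(lambda p: Point(p.x + 1, p.y + 1)).collect())
    sc.stop()
```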