Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21157
> Ok, it looks like it was @HyukjinKwon who suggested that we remove this
hack in general rather than keep the partial workaround. Can I get your thoughts
on why? It seems like the partial workaround would give us the best of both
worlds (e.g. we don't break people's existing Spark code and we handle Python
tuples better).
Sorry for the late response. Yes, I spent some time looking into this
named tuple hack, and my impression was that we should not have added such a
fix just to allow named tuple pickling. The named tuple hack was introduced
for both cloudpickle (the SQL path) and the normal pickle path, if I am not mistaken.
Cloudpickle on the PySpark side now supports it, so the workaround for the
cases above should be to use CloudPickler where possible. I think the PySpark
API exposes this pickler (see `SparkContext`'s `__init__`). @superbobry,
mind if I ask you to document this workaround and add a test (and see if it
really works)?
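For reference, a minimal sketch of what that workaround might look like, assuming `CloudPickleSerializer` from `pyspark.serializers` and the `serializer` parameter of `SparkContext`'s `__init__`; this is only an illustration and has not been tested against this branch:

```python
# Sketch: pass CloudPickleSerializer explicitly so that namedtuples in
# closures and RDD elements are pickled by cloudpickle instead of the
# default pickler (i.e. without relying on the namedtuple hack).
from collections import namedtuple

from pyspark import SparkContext
from pyspark.serializers import CloudPickleSerializer

Point = namedtuple("Point", ["x", "y"])

if __name__ == "__main__":
    sc = SparkContext("local[2]", "namedtuple-workaround",
                      serializer=CloudPickleSerializer())
    rdd = sc.parallelize([Point(1, 2), Point(3, 4)])
    # The closure returns namedtuples; cloudpickle should round-trip them.
    print(rdd.map(lambda p: Point(p.x + 1, p.y + 1)).collect())
    sc.stop()
```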