[GitHub] spark issue #23008: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-11-12 Thread superbobry
Github user superbobry commented on the issue: https://github.com/apache/spark/pull/23008 Interestingly, `cloudpickle` adds overhead even if the namedtuple is importable: ```bash $ cat a.py from collections import namedtuple A = namedtuple("A", ["foo", "bar"])

[GitHub] spark issue #23008: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-11-12 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23008 BTW, let.s test them in end-to-end. For instance, `spark.range(1).rdd.map(lambda blabla).count()` --- - To unsubscribe,

[GitHub] spark issue #23008: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-11-12 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23008 If the perf diff is big, let's don't change but document that we can use `CloudPickleSerializer()` to avoid breaking change. If the perf diff is rather trivial, let's check if we can

[GitHub] spark issue #23008: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-11-12 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23008 Nope, it should be manually done.. should be great to have it FWIW. I am not yet sure how we're going to measure the performance. I think you can show the performance diff for

[GitHub] spark issue #23008: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-11-12 Thread superbobry
Github user superbobry commented on the issue: https://github.com/apache/spark/pull/23008 Is there a benchmark suite for PySpark? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #23008: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-11-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23008 **[Test build #98710 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98710/testReport)** for PR 23008 at commit

[GitHub] spark issue #23008: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-11-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23008 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98710/ Test PASSed. ---

[GitHub] spark issue #23008: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-11-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23008 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #23008: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-11-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23008 **[Test build #98710 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98710/testReport)** for PR 23008 at commit

[GitHub] spark issue #23008: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-11-11 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23008 ok to test --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail:

[GitHub] spark issue #23008: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-11-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23008 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #23008: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-11-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23008 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #23008: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-11-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23008 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional