GitHub user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/19226#discussion_r139035222
--- Diff: python/pyspark/tests.py ---
@@ -644,6 +644,18 @@ def test_cartesian_chaining(self):
                 set([(x, (y, y)) for x in range(10) for y in range(10)])
             )
+    def test_zip_chaining(self):
+        # Tests for SPARK-21985
+        rdd = self.sc.parallelize('abc')
--- End diff --
I'd set the number of partitions explicitly, because whether `zip`
reserializes the RDDs depends on it.
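
A minimal sketch of what that could look like, not the code from the PR. The partition count of 2, the helper names, and the assertions are all assumptions (the assertions are modelled on the neighbouring `test_cartesian_chaining`); only `parallelize`, `zip`, `getNumPartitions` and `collect` are real PySpark calls.

```python
# Assumed value: the review comment does not say which partition count to use.
NUM_PARTITIONS = 2


def expected_left_nested(data):
    """Pairs expected from rdd.zip(rdd).zip(rdd) for an RDD built from data."""
    return {((x, x), x) for x in data}


def expected_right_nested(data):
    """Pairs expected from rdd.zip(rdd.zip(rdd)) for an RDD built from data."""
    return {(x, (x, x)) for x in data}


def check_zip_chaining(sc, data="abc", num_partitions=NUM_PARTITIONS):
    """Run both zip chains on an RDD whose partition count is pinned.

    `sc` is expected to be a live pyspark.SparkContext (hypothetical helper,
    not part of the Spark test suite).
    """
    # Passing the second argument fixes the number of partitions, so the
    # result no longer depends on the context's default parallelism.
    rdd = sc.parallelize(data, num_partitions)
    assert rdd.getNumPartitions() == num_partitions
    assert set(rdd.zip(rdd).zip(rdd).collect()) == expected_left_nested(data)
    assert set(rdd.zip(rdd.zip(rdd)).collect()) == expected_right_nested(data)
```

With a running `SparkContext` bound to `sc`, calling `check_zip_chaining(sc)` exercises both chains; inside the test class the same idea would be `self.sc.parallelize('abc', 2)`.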