Github user tgravescs commented on a diff in the pull request:
https://github.com/apache/spark/pull/22112#discussion_r212332064
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -855,16 +858,17 @@ abstract class RDD[T: ClassTag](
* a map on the other).
*/
def zip[U: ClassTag](other: RDD[U]): RDD[(T, U)] = withScope {
-    zipPartitions(other, preservesPartitioning = false) { (thisIter, otherIter) =>
-      new Iterator[(T, U)] {
-        def hasNext: Boolean = (thisIter.hasNext, otherIter.hasNext) match {
-          case (true, true) => true
-          case (false, false) => false
-          case _ => throw new SparkException("Can only zip RDDs with " +
-            "same number of elements in each partition")
+    zipPartitionsInternal(other, preservesPartitioning = false, orderSensitiveFunc = true) {
--- End diff --
I don't want to do zip here. I want to finish that discussion, as I've stated above. I'm not convinced we should "fix" zip. I was trying to reach a consensus on that before we did anything, but since we were not able to, I think we should leave it as is until we do.
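For context, the behavior under discussion is the strict per-partition length check in `RDD.zip`: both iterators must be exhausted at the same time, otherwise Spark throws a `SparkException`. The following is a minimal sketch of that semantics using plain Python iterators; it is not Spark code, and the name `zip_strict` is an illustrative assumption.

```python
# Hypothetical sketch (not Spark code): models the per-partition
# strict-length check that RDD.zip performs, using plain iterators.
def zip_strict(left, right):
    """Yield pairs from two iterables; raise if they differ in length,
    mirroring Spark's "Can only zip RDDs with same number of elements
    in each partition" error."""
    left_it, right_it = iter(left), iter(right)
    sentinel = object()
    while True:
        a = next(left_it, sentinel)
        b = next(right_it, sentinel)
        if a is sentinel and b is sentinel:
            return  # both exhausted together: lengths matched
        if a is sentinel or b is sentinel:
            # one side ran out first, like (true, false) in the Scala match
            raise ValueError(
                "Can only zip RDDs with same number of elements in each partition")
        yield (a, b)

print(list(zip_strict([1, 2], ["a", "b"])))  # [(1, 'a'), (2, 'b')]
```

Note that Python's built-in `zip` silently truncates to the shorter input, which is exactly the behavior the Scala `match` on `(thisIter.hasNext, otherIter.hasNext)` is written to forbid.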
---