GitHub user mridulm commented on the issue: https://github.com/apache/spark/pull/21698

@cloud-fan There is no ambiguity in the output of map: one record in, one record out. In the case of zip, as you said, the number of output records is the minimum of the two inputs. Given this, there is no ambiguity in the cardinality of zip().map(). I think @jiangxb1987's point was that which two tuples from rdd1 and rdd2 get zipped together can be arbitrary, and I agree with that.

Note, by the way, that the problem I surfaced above will cause data loss even after the fix proposed in this PR by @jiangxb1987.
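
To make the distinction concrete, here is a minimal plain-Python sketch, not Spark code. It assumes that a recomputed, unordered input can be modeled as the same records in a different order. The helper names `recompute` and `zip_then_map` are hypothetical, and Python's built-in `zip` stands in for the "min of both" cardinality described above.

```python
import random


def recompute(records, seed):
    """Model one (re)computation of an unordered input:
    the same records come back, but in an arbitrary order."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    return shuffled


def zip_then_map(left, right, fn):
    """Pair records positionally (zip), then apply a
    one-record-in, one-record-out function (map)."""
    return [fn(a, b) for a, b in zip(left, right)]


# Hypothetical inputs standing in for rdd1 and rdd2.
rdd1 = list(range(5))
rdd2 = ["a", "b", "c"]

# Two attempts at the same computation, e.g. an original run and a retry.
first_attempt = zip_then_map(
    recompute(rdd1, seed=1), recompute(rdd2, seed=1), lambda a, b: (a, b)
)
retry = zip_then_map(
    recompute(rdd1, seed=2), recompute(rdd2, seed=2), lambda a, b: (a, b)
)

# The record count is the same on every attempt ...
print(len(first_attempt), len(retry))
# ... but which records end up paired depends on the input order.
print(first_attempt)
print(retry)
```

In this model the length of the result is fixed by the input sizes alone, while the contents of each pair depend on the order in which each input happened to be produced.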