[GitHub] spark issue #21698: [SPARK-23243][Core] Fix RDD.repartition() data correctne...

mridulm Thu, 12 Jul 2018 11:24:00 -0700

Github user mridulm commented on the issue:

    https://github.com/apache/spark/pull/21698
  
    @cloud-fan There is no ambiguity in the output of map: one record in, one 
record out.
    In the case of zip, as you said, the number of output records is the min of 
both inputs.
    Given this, there is no ambiguity in the cardinality of zip().map(). I think 
@jiangxb1987's point was that which two tuples from rdd1 and rdd2 get zipped 
together can be arbitrary, and I agree with that.
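
    The distinction between cardinality and pairing can be sketched in plain 
Python. This is only an illustration under stated assumptions, not Spark code: 
the lists `rdd1` and `rdd2` stand in for the records of the two RDDs, 
`zip_then_map` is a hypothetical helper, and reversing `rdd1` stands in for a 
recomputation that yields the same records in a different order.

```python
def zip_then_map(left, right, fn):
    """Pair records positionally, then apply fn to each pair (one in, one out)."""
    return [fn(a, b) for a, b in zip(left, right)]


# Hypothetical stand-ins for the records of two RDDs (not real Spark RDDs).
rdd1 = [1, 2, 3, 4, 5]
rdd2 = ["a", "b", "c"]

# First evaluation: records arrive in their original order.
baseline = zip_then_map(rdd1, rdd2, lambda a, b: (a, b))

# Assumed "recomputation": same records in rdd1, different arrival order.
reordered = list(reversed(rdd1))
recomputed = zip_then_map(reordered, rdd2, lambda a, b: (a, b))

# The record count is the same in both runs: min(len(rdd1), len(rdd2)).
print(len(baseline), len(recomputed))

# The pairs themselves differ, because pairing depends on arrival order.
print(baseline)
print(recomputed)
```

    In this sketch the output count stays fixed across both runs, while the 
contents of the pairs depend on the order in which the input records arrive.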
    
    Note that the problem I surfaced above will cause data loss even after the 
fix proposed in this PR by @jiangxb1987.



