Github user jiangxb1987 commented on the issue:
https://github.com/apache/spark/pull/21698
IIUC the output produced by `rdd1.zip(rdd2).map(v => (computeKey(v._1,
v._2), computeValue(v._1, v._2)))` shall always have the same cardinality, no
matter how many tasks are retried, so where is the data loss issue?
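
A minimal sketch of the cardinality argument, using plain Scala collections as a stand-in for RDDs (the `computeKey`/`computeValue` bodies here are hypothetical placeholders, not from the PR):

```scala
object ZipCardinality {
  // Hypothetical stand-ins for the comment's computeKey/computeValue.
  def computeKey(a: Int, b: Int): Int = a + b
  def computeValue(a: Int, b: Int): Int = a * b

  def main(args: Array[String]): Unit = {
    val rdd1 = List(1, 2, 3)
    val rdd2 = List(10, 20, 30)
    // Mirrors rdd1.zip(rdd2).map(v => (computeKey(v._1, v._2), computeValue(v._1, v._2))).
    val out = rdd1.zip(rdd2).map(v => (computeKey(v._1, v._2), computeValue(v._1, v._2)))
    // zip pairs elements positionally and map is one-to-one, so the output
    // always has exactly as many elements as the zipped input.
    assert(out.size == rdd1.size)
    println(out)
  }
}
```

Even if the values produced per element differ across task retries (when the zipped inputs are nondeterministic), the element count itself is fixed by the zip, which is the point of the question above.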
