cloud-fan commented on issue #25751: [SPARK-29042][Core] Sampling-based RDD with unordered input should be INDETERMINATE
URL: https://github.com/apache/spark/pull/25751#issuecomment-530703027

Do we have any queries that return wrong results because of this? The round-robin partitioner is expected to return the same output when a task is rerun; otherwise we need to rerun the entire stage. This is required for the correctness of `repartition`. However, I don't think sample has the same problem. End users expect sample to return random output, so it doesn't matter if Spark returns different output when tasks of a sample are rerun.
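
To make the distinction concrete, here is a toy sketch in plain Python. It is not Spark code: `round_robin` and `sample` are hypothetical stand-ins that assume both operations decide per arrival position, and the two input lists model the same records arriving in a different order on a rerun.

```python
import random


def round_robin(records, num_partitions):
    # Toy model (assumption): a record's partition depends only on its arrival position.
    parts = [[] for _ in range(num_partitions)]
    for position, record in enumerate(records):
        parts[position % num_partitions].append(record)
    return parts


def sample(records, fraction, seed):
    # Toy model (assumption): one random draw per arrival position, with a fixed task seed.
    rng = random.Random(seed)
    return [record for record in records if rng.random() < fraction]


# Same records, different arrival order, as after rerunning an unordered upstream task.
first_run = ["a", "b", "c", "d", "e", "f"]
rerun = ["b", "a", "d", "c", "f", "e"]

rr_first = round_robin(first_run, 2)
rr_rerun = round_robin(rerun, 2)
print(rr_first)  # [['a', 'c', 'e'], ['b', 'd', 'f']]
print(rr_rerun)  # [['b', 'd', 'f'], ['a', 'c', 'e']]

# The same seed keeps the same positions, but those positions may hold different records.
sample_first = sample(first_run, 0.5, seed=42)
sample_rerun = sample(rerun, 0.5, seed=42)
print(sample_first, sample_rerun)
```

In this sketch every record moves to the other partition on rerun, which is the situation where, per the comment, the entire stage has to be rerun. The sample case also depends on input order, but the rerun yields another sample of the same input, which is the behavior the comment argues users already expect.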
