[GitHub] [spark] cloud-fan commented on issue #25751: [SPARK-29042][Core] Sampling-based RDD with unordered input should be INDETERMINATE

GitBox Thu, 12 Sep 2019 00:31:11 -0700

cloud-fan commented on issue #25751: [SPARK-29042][Core] Sampling-based RDD 
with unordered input should be INDETERMINATE
URL: https://github.com/apache/spark/pull/25751#issuecomment-530703027
 
 
   Do we have any queries that return wrong results because of it?
   
   For the round-robin partitioner, there is an expectation that it returns the 
same output when rerun; otherwise we need to rerun the entire stage. This is 
for the correctness of `repartition`.
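
   As a rough illustration of why the rerun output matters here, the sketch below models round-robin partitioning in plain Python. It is a simplified model and not Spark's actual implementation: the helper `round_robin`, the four sample records, and the two-partition setup are all assumptions made for the example.

```python
def round_robin(records, num_partitions):
    """Assign each record to a partition purely by its position in the input.

    Hypothetical helper for illustration only; not a Spark API.
    """
    partitions = [[] for _ in range(num_partitions)]
    for position, record in enumerate(records):
        partitions[position % num_partitions].append(record)
    return partitions


# First attempt: the upstream task emits its records in one order.
first_attempt = round_robin(["a", "b", "c", "d"], 2)

# Rerun: the same records arrive in a different order (unordered input).
rerun = round_robin(["b", "a", "c", "d"], 2)

# Assume only partition 1 is recomputed from the rerun, while partition 0
# keeps the output of the first attempt.
mixed = first_attempt[0] + rerun[1]

print(sorted(mixed))  # "a" appears twice and "b" is missing
```

   Because the target partition depends only on a record's position, mixing output from two attempts can duplicate some records and drop others unless the whole stage is rerun.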
   
   However, I don't think sample has the same problem. End users would expect 
sample to return random output, so it doesn't matter if Spark returns different 
output when tasks of sample are rerun.


