srowen commented on pull request #32415: URL: https://github.com/apache/spark/pull/32415#issuecomment-832333786
Yeah, it does seem like the variation here is due to distributing the computation. It might even be 'reasonable' to expect, given the tiny data set, but it isn't very good for confidence in the implementation via tests. I do agree that this variation is not due to the changes here. For that reason, I'd suggest upping the iterations in the relevant tests to hundreds, as that seems necessary for proper convergence, and then just asserting whatever result it comes up with. We can look at why it's so sensitive later; at worst, it is already an issue.