GitHub user akopich commented on the issue: https://github.com/apache/spark/pull/19565

@hhbyyh, with "filter before sample", the overhead is negligible in a local test. Regarding "sample before filter", you are right: strictly speaking, `miniBatchFraction` should be adjusted there, which is why I prefer "filter before sample". Also note that the "sample before filter" version is logically equivalent to the current upstream/master.
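To make the `miniBatchFraction` point concrete, here is a rough sketch of the arithmetic in plain Python. It is not the PR's code. It assumes the filter drops some documents (for example, empty ones) and that the optimizer derives a nominal batch size as `ceil(miniBatchFraction * corpusSize)`. The function names and the numbers are hypothetical and only illustrate why the two orderings may differ.

```python
import math


def nominal_size(mini_batch_fraction, corpus_size):
    """Batch size the optimizer would assume (assumed formula, see lead-in)."""
    return math.ceil(mini_batch_fraction * corpus_size)


def expected_non_empty(mini_batch_fraction, non_empty_docs):
    """Expected number of documents that survive both sampling and filtering.

    With independent per-document sampling this expectation is the same
    whichever step runs first.
    """
    return mini_batch_fraction * non_empty_docs


def adjusted_fraction(mini_batch_fraction, total_docs, non_empty_docs):
    """Fraction of the unfiltered corpus that actually ends up in a batch."""
    return mini_batch_fraction * non_empty_docs / total_docs


# Hypothetical corpus: 1000 documents, 800 of which pass the filter.
total_docs, non_empty_docs, fraction = 1000, 800, 0.25

# "Filter before sample": the corpus size already excludes filtered documents,
# so the nominal size matches the expected batch size.
filter_first_nominal = nominal_size(fraction, non_empty_docs)

# "Sample before filter": the corpus size still counts filtered documents,
# so the nominal size overstates what the batch really contains.
sample_first_nominal = nominal_size(fraction, total_docs)

# Expected real batch content, identical under both orderings.
expected_batch = expected_non_empty(fraction, non_empty_docs)
```

Under these assumptions, "filter before sample" keeps the nominal and expected sizes in agreement without further changes, while "sample before filter" would need the fraction scaled by the share of documents that pass the filter, as computed by `adjusted_fraction`.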