Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19369#discussion_r141808396

    --- Diff: core/src/main/scala/org/apache/spark/storage/BlockReplicationPolicy.scala ---
    @@ -85,11 +65,9 @@ object BlockReplicationUtils {
        * randomly shuffle elems
        */
       def getRandomSample[T](elems: Seq[T], m: Int, r: Random): List[T] = {
    -    if (elems.size > m) {
    -      getSampleIds(elems.size, m, r).map(elems(_))
    -    } else {
    -      r.shuffle(elems).toList
    -    }
    +    // This takes linear space, but is stable wrt m. That is for a fixed
    --- End diff --

    I see now. I suspect you should just remove the test instead. @shubhamchopra what do you think? I believe you created most of this code.

    BTW if performance really mattered here we could make this sampling method a little bit more efficient (avoid foldLeft, -1 at the end, etc) but I don't think it matters.
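For illustration only, here is one way a sampler could use linear space and stay stable with respect to `m`. The code comment in the diff is cut off, so the reading of "stable" used here is an assumption: for a fixed `elems` and a fixed seed, the sample for a smaller `m` is a prefix of the sample for a larger `m`. The names `SamplingSketch` and `sampleStable` are hypothetical, and this is a partial Fisher-Yates shuffle, not the code in the PR.

```scala
import scala.util.Random

object SamplingSketch {
  // Hypothetical helper, not Spark's implementation: picks min(m, elems.size)
  // distinct elements using a partial Fisher-Yates shuffle over the indices.
  def sampleStable[T](elems: Seq[T], m: Int, r: Random): List[T] = {
    val n = elems.size
    val k = math.max(0, math.min(m, n))
    // Linear space: one index slot per element.
    val idx = Array.range(0, n)
    var i = 0
    while (i < k) {
      // Choose uniformly from the not-yet-fixed suffix [i, n).
      val j = i + r.nextInt(n - i)
      val tmp = idx(i)
      idx(i) = idx(j)
      idx(j) = tmp
      i += 1
    }
    // Copy to an indexed sequence so lookups are O(1) even if elems is a List.
    val indexed = elems.toIndexedSeq
    idx.iterator.take(k).map(indexed(_)).toList
  }
}
```

Because the i-th random draw does not depend on `m`, two calls with the same seed agree on their common prefix. The loop also avoids `foldLeft` and any index adjustment at the end.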