GitHub user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/19369#discussion_r141808396
--- Diff:
core/src/main/scala/org/apache/spark/storage/BlockReplicationPolicy.scala ---
@@ -85,11 +65,9 @@ object BlockReplicationUtils {
* randomly shuffle elems
*/
def getRandomSample[T](elems: Seq[T], m: Int, r: Random): List[T] = {
- if (elems.size > m) {
- getSampleIds(elems.size, m, r).map(elems(_))
- } else {
- r.shuffle(elems).toList
- }
+ // This takes linear space, but is stable wrt m. That is for a fixed
--- End diff --
I see now. I suspect you should just remove the test instead.
@shubhamchopra what do you think? I believe you created most of this code.
BTW, if performance really mattered here, we could make this sampling method
a little more efficient (avoid the foldLeft, the -1 at the end, etc.), but I
don't think it matters.
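
To make the efficiency remark concrete, here is a rough sketch of what a leaner index sampler could look like. It assumes the existing helper is a Floyd-style sampler that builds 1-based indices in a foldLeft and subtracts 1 at the end; that helper is not shown in this diff, so treat that as an assumption. The names `SamplingSketch` and `sampleIds` are hypothetical and not part of Spark.

```scala
import scala.collection.mutable
import scala.util.Random

// Hypothetical sketch, not the Spark implementation.
object SamplingSketch {
  // Picks m distinct indices from 0 until n using Floyd's sampling algorithm.
  // Works directly with 0-based indices, so no "- 1" pass is needed at the end,
  // and uses a plain loop over one mutable set instead of a foldLeft.
  def sampleIds(n: Int, m: Int, r: Random): List[Int] = {
    require(m >= 0 && m <= n, "m must be in [0, n]")
    val picked = mutable.LinkedHashSet.empty[Int]
    var i = n - m
    while (i < n) {
      val t = r.nextInt(i + 1) // uniform in [0, i]
      // add returns false if t was already picked; i itself cannot have been
      // picked yet, because every earlier pick is strictly less than i.
      if (!picked.add(t)) picked += i
      i += 1
    }
    picked.toList
  }
}
```

Usage would be along the lines of `SamplingSketch.sampleIds(elems.size, m, r).map(elems(_))`. This only trims constant factors, so it is probably not worth doing unless this path shows up in profiles.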