GitHub user superbobry commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19369#discussion_r141671388
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/BlockReplicationPolicy.scala ---
    @@ -85,11 +65,9 @@ object BlockReplicationUtils {
        *         randomly shuffle elems
        */
       def getRandomSample[T](elems: Seq[T], m: Int, r: Random): List[T] = {
    -    if (elems.size > m) {
    -      getSampleIds(elems.size, m, r).map(elems(_))
    -    } else {
    -      r.shuffle(elems).toList
    -    }
    +    // This takes linear space, but is stable wrt m. That is for a fixed
    --- End diff --
    
    Yes, the purpose was indeed that.
    
    Do you think the contract tested in `BlockManagerReplicationBehavior` makes
    sense? Personally, I fail to see why this contract is important, but if it
    is needed, I don't think we can do better than O(elems.size). Otherwise, the
    solution is to just remove the test.
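
To make the trade-off concrete, here is a minimal sketch of what "stable wrt m" could mean. It assumes the sample is taken by shuffling all of `elems` and keeping the first `m`; `StableSampleSketch` and `stableSample` are hypothetical names for illustration, not the code in this PR. The full shuffle is what costs O(elems.size).

```scala
import scala.util.Random

object StableSampleSketch {
  // Hypothetical helper (an assumption, not the PR's implementation):
  // shuffle every element, then keep the first m. The shuffle copies the
  // whole sequence, so this is linear in elems.size regardless of m.
  def stableSample[T](elems: Seq[T], m: Int, r: Random): List[T] =
    r.shuffle(elems).take(m).toList

  def main(args: Array[String]): Unit = {
    val elems = (1 to 10).toList
    val seed = 42L

    // Same seed, different m: the shuffle order is identical, so the
    // smaller sample is a prefix of the larger one.
    val small = stableSample(elems, 3, new Random(seed))
    val large = stableSample(elems, 5, new Random(seed))

    assert(large.take(3) == small)
    println(s"m=3: $small")
    println(s"m=5: $large")
  }
}
```

If the prefix property is not required, a sampler that draws only `m` indices avoids the full shuffle, but then increasing `m` may change which elements were picked first.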

