Github user squito commented on the issue:

    https://github.com/apache/spark/pull/16867
  
    A median heap is a good idea to try.  In fact, `slice` is `O(n)` because of 
the way it's implemented: it actually iterates through the first `n/2` elements 
(even though it should be able to do something smarter).
    
    You are comparing with a *lot* of tasks in your experiments.  Are you 
really running 100k tasks?  I have encouraged users with big clusters to use 
10k - 50k tasks, but generally things got worse around 100k or so.  (To be 
honest, I didn't spend much time investigating why it got slower, as things 
seemed to be working well within that range.)  But at least it's not hurting at 
a smaller scale.
    
    Also -- you are showing that getting the median is faster, but keep in mind 
you are also slowing down each call to `handleSuccessfulTask`, since it now has 
to do an `O(log n)` insertion (even if you use a median heap).  If you have lots 
of really fast tasks, this may be significant -- and you pay that price even 
with speculation disabled!
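    To make the `O(log n)` insert / `O(1)` median tradeoff concrete, here is a 
minimal sketch of the classic two-heap median structure (Java for compactness; 
the class and method names are illustrative, not Spark's actual implementation):

```java
import java.util.Collections;
import java.util.PriorityQueue;

// Running median via two heaps: a max-heap holds the lower half of the
// values, a min-heap holds the upper half. Each insert is O(log n); reading
// the median is O(1), unlike re-sorting or slicing on every query.
class MedianHeap {
    private final PriorityQueue<Long> lower =
        new PriorityQueue<>(Collections.reverseOrder()); // max-heap
    private final PriorityQueue<Long> upper =
        new PriorityQueue<>();                           // min-heap

    void insert(long x) {
        if (lower.isEmpty() || x <= lower.peek()) lower.add(x);
        else upper.add(x);
        // Rebalance so the halves differ in size by at most one,
        // with the lower half allowed to hold the extra element.
        if (lower.size() > upper.size() + 1) upper.add(lower.poll());
        else if (upper.size() > lower.size()) lower.add(upper.poll());
    }

    double median() {
        if (lower.size() == upper.size())
            return (lower.peek() + upper.peek()) / 2.0;
        return lower.peek();
    }
}
```

    Note the cost is paid on every task completion, not just when the median 
is queried -- which is exactly the concern above for jobs with many short tasks.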
    
    I'm still on the fence about this too, just sharing my thoughts.


