Github user tgravescs commented on the pull request:

    https://github.com/apache/spark/pull/6082#issuecomment-102130616
  
    Since you don't want to give me testing information, I ran this myself.
    
    On a 100 node cluster, with utilization at about 11% from other workloads, very
    little churn, and no one else in my queue, I ran wordcount on the README.md file
    asking for 3 containers of 3g each, waiting for all containers
    (spark.scheduler.minRegisteredResourcesRatio 1.0 and
    spark.scheduler.maxRegisteredResourcesWaitingTime 240000). Here are the results.
    I flipped back and forth between running with the config one way, then the other,
    and repeated. I wasn't concerned with what the job was doing, just how long it
    takes to get containers.
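
    For reference, here is a rough sketch of the kind of job and configuration
    described above (Scala). The two scheduler keys are the ones named in this
    comment; the app name and the wordcount body are just my illustration, not the
    exact job I ran.

        import org.apache.spark.{SparkConf, SparkContext}

        object AllocationTimingTest {
          def main(args: Array[String]): Unit = {
            val conf = new SparkConf()
              .setAppName("yarn-allocation-timing-test")                          // hypothetical name
              .set("spark.scheduler.minRegisteredResourcesRatio", "1.0")          // wait for all executors
              .set("spark.scheduler.maxRegisteredResourcesWaitingTime", "240000") // up to 240s
              .set("spark.executor.instances", "3")                               // 3 containers...
              .set("spark.executor.memory", "3g")                                 // ...of 3g each
            val sc = new SparkContext(conf)

            // the work itself doesn't matter, only how long it takes to get containers
            sc.textFile("README.md")
              .flatMap(_.split("\\s+"))
              .map(word => (word, 1))
              .reduceByKey(_ + _)
              .count()

            sc.stop()
          }
        }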
    
    For our multi-tenant environments I would consider this about the best case. I
    also ran a couple at 5 seconds just to see; those were run after all the others,
    so the cluster was less likely to be in very similar conditions.
    
    Also note this has the bug where the interval actually starts at 400ms rather than 200ms.
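
    To make the 400ms point concrete: one way an exponential backoff ends up starting
    at double the configured value is if the interval is doubled before the first wait
    rather than after it. The sketch below is purely illustrative and is not the actual
    allocator code.

        object BackoffSketch {
          // Illustrative only: the interval is doubled *before* it is first used, so the
          // first effective wait is 400ms even though 200ms was configured.
          var allocationInterval = 200L   // configured initial allocation interval (ms)
          val heartbeatInterval = 5000L   // cap at the regular heartbeat interval (ms)

          def nextAllocationInterval(): Long = {
            allocationInterval = math.min(allocationInterval * 2, heartbeatInterval)
            allocationInterval
          }

          def main(args: Array[String]): Unit =
            // prints Vector(400, 800, 1600, 3200, 5000); the 200ms value is never used
            println((1 to 5).map(_ => nextAllocationInterval()))
        }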
    
    200ms with backoff | 3000ms static | 5000ms static
    ------------------ | ------------- | -------------
    27s | 31s | 25s
    24s | 27s | 27s
    30s | 30s | 28s
    31s | 31s | 28s
    24s | 29s | 28s

    27.2s average | 29.6s average | 27.2s average
    
    same thing but asking for 30 containers
    200ms with backoff | 3000ms static | 5000ms static
    ------------------ | ------------- | -------------
    37s | 44s | 47s
    37s | 44s | 42s
    42s | 40s | 47s
    41s | 41s |
    41s | 41s |

    39.6s average | 42s average | 45.3s average
    
    same thing but asking for 100 containers
    200ms with backoff | 3000ms static | 5000ms static
    ------------------ | ------------- | -------------
    1min 11s | 1min 14s | 1min 17s
    1min 20s | 1min 14s | 1min 21s
    1min 38s | 1min 27s |
    1min 10s | 1min 10s |

    1min 19.75s average | 1min 16.25s average | 1min 19s average
    
    While there is enough variation between jobs that a lot of the time this won't be
    noticeable, on average it did improve things by a little over 2 seconds in this
    particular scenario. I have my doubts this is going to be noticeable in a real
    production environment since there are variations between run times. I guess if
    you are down to optimizing at the ms level this would help, but we can put it in
    and see.
    
    I'm ok with this going in if other committers are ok with it. I do think we should
    change the default to 3000ms instead of 5000ms. That will be plenty safe for the RM
    but will improve more cases than just this.
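
    If the default doesn't change, the interval can still be overridden per
    application. The key name below is my assumption about what this PR is tuning,
    not something stated in this comment.

        import org.apache.spark.SparkConf

        // Hypothetical per-application override of the YARN allocator heartbeat
        // interval (assumed key spark.yarn.scheduler.heartbeat.interval-ms).
        val conf = new SparkConf()
          .set("spark.yarn.scheduler.heartbeat.interval-ms", "3000")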

