Re: Review Request 51027: Track allocation candidates to bound allocator.

Jacob Janco Fri, 23 Sep 2016 16:48:07 -0700


> On Sept. 23, 2016, 2:40 a.m., Guangya Liu wrote:
> > It is really weird that the performance of 
> > `SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.DeclineOffers/7`
> >  does not improve much when calling `addSlave`. We need to check why the 
> > `addSlave` time is the same. Without the fix, `addSlave` calls `allocate` for 
> > each agent, but with the fix only one `allocate` is called.
> > 
> > ```
> > Without fix:
> > [==========] Running 1 test from 1 test case.
> > [----------] Global test environment set-up.
> > [----------] 1 test from SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test
> > [ RUN      ] SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.DeclineOffers/7
> > Using 1000 agents and 6000 frameworks
> > Added 6000 frameworks in 122268us
> > Added 1000 agents in 42.037104secs
> > 
> > With fix:
> > [==========] Running 1 test from 1 test case.
> > [----------] Global test environment set-up.
> > [----------] 1 test from SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test
> > [ RUN      ] SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.DeclineOffers/7
> > Using 1000 agents and 6000 frameworks
> > Added 6000 frameworks in 116107us
> > Added 1000 agents in 41.615396secs
> > ```
> 
> Guangya Liu wrote:
>     Jacob, I ran more tests against the code from Aug 23, when I posted some 
> results in this RR, and found that the test result is different. I did the 
> following to get the Aug 23 code.
>     
>     ```
>     LiuGuangyas-MacBook-Pro:build gyliu$ git checkout 2f78a440ef4201c5b11fb92c225694e84a60369c
>     
>     LiuGuangyas-MacBook-Pro:build gyliu$ git log -1
>     commit 2f78a440ef4201c5b11fb92c225694e84a60369c
>     Author: Gilbert Song <songzihao1...@gmail.com>
>     Date:   Mon Aug 22 13:00:58 2016 -0700
>     
>         Fixed potential flakiness in ROOT_RecoverOrphanedPersistentVolume.
>     
>         Review: https://reviews.apache.org/r/51271/
>     ```
>     
>     The test result still seems the same as now (without your patch, using 
> the code from Aug 23):
>     
>     ```
>     [==========] Running 1 test from 1 test case.
>     [----------] Global test environment set-up.
>     [----------] 1 test from SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test
>     [ RUN      ] SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.DeclineOffers/7
>     Using 1000 agents and 6000 frameworks
>     Added 6000 frameworks in 144272us
>     Added 1000 agents in 43.107001secs
>     ```
>     
>     But anyway, I think that we need to find out why the performance of 
> `addSlave` was not improved by your patch.
> 
> Jacob Janco wrote:
>     Yes agreed, per our Slack discussions, I'll look into this. Thanks for 
> posting the followup.
> 
> Benjamin Mahler wrote:
>     `addSlave()` is asynchronous and we do not wait for all of the 
> `addSlave()` futures to complete, so any speedup in `addSlave()` will only 
> affect the next caller that waits for a result from the allocator.
> 
> Benjamin Mahler wrote:
>     Ah, I missed that we do a `Clock::settle()`, never mind :)
> 
> Guangya Liu wrote:
>     Some thoughts on why `addSlave` does not improve much...
>     
>     Without Jacob's patch, the logic would be:
>     
>     ```
>     addSlave -> allocate the single slave
>     addSlave -> allocate the single slave
>     addSlave -> allocate the single slave
>     ...
>     addSlave -> allocate the single slave
>     ```
>     
>     With Jacob's patch, the logic would be:
>     
>     ```
>     addSlave
>     addSlave
>     addSlave
>     ...
>     addSlave -> allocate for **all** of the slaves
>     ```
>     
>     The time taken to `allocate a single slave N times` and to `allocate N 
> slaves in one allocate` request should not differ much. The only 
> difference is that one loops over the event queue while the other loops inside 
> the allocator, which is why the performance barely changes here.
>     
>     But this will have a large impact when adding frameworks, or on other 
> allocator events that call `allocate(slaves)`. One proposal is to add new 
> benchmark test cases that follow the logic below. Without Jacob's patch, each 
> `addFramework` operation calls `allocate(slaves)`; with Jacob's patch, 
> `allocate(slaves)` is called only once.
>     
>     ```
>     1) Add slaves first
>     2) Add frameworks
>     ```
>     
>     We may get some performance improvement with the above case.
>     
>     Currently, all of the benchmark tests use 
>     
>     ```
>     1) Add frameworks
>     2) Add agents
>     ```
>     
>     That's why there is not much performance improvement...


This makes sense, Guangya. I'm in the process of creating a minimal benchmark 
that adds a set of slaves and then adds frameworks. I'll post here if the 
results are interesting.
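
For illustration only, here is a minimal sketch of the argument above. It is 
not the patch and not the Mesos allocator: `AllocatorModel`, `trigger`, 
`settle`, and the two counters are hypothetical names, the model is 
single-threaded, and it counts agent visits instead of doing real allocation 
work. It assumes that a triggered allocation is enqueued only when none is 
pending and that candidates accumulate until that allocation runs.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <set>
#include <utility>

// Hypothetical toy model of an allocator behind an event queue.
class AllocatorModel
{
public:
  explicit AllocatorModel(bool coalesce) : coalesce_(coalesce) {}

  // Adding an agent triggers an allocation for that agent only.
  void addAgent(int agentId)
  {
    agents_.insert(agentId);
    trigger({agentId});
  }

  // Adding a framework triggers an allocation over all known agents.
  void addFramework()
  {
    trigger(agents_);
  }

  // Drain the event queue (a stand-in for waiting until the allocator
  // has processed everything that was dispatched to it).
  void settle()
  {
    while (!queue_.empty()) {
      std::function<void()> event = std::move(queue_.front());
      queue_.pop_front();
      event();
    }
  }

  std::size_t allocationRuns = 0; // Number of allocate() calls.
  std::size_t agentVisits = 0;    // Total agents looped over.

private:
  void trigger(const std::set<int>& agents)
  {
    if (!coalesce_) {
      // One queued allocation per trigger.
      queue_.push_back([this, agents] { allocate(agents); });
      return;
    }

    // Accumulate candidates; enqueue only if nothing is pending.
    candidates_.insert(agents.begin(), agents.end());
    if (!pending_) {
      pending_ = true;
      queue_.push_back([this] {
        std::set<int> batch;
        batch.swap(candidates_); // Candidates are cleared when processed.
        pending_ = false;
        allocate(batch);
      });
    }
  }

  void allocate(const std::set<int>& agents)
  {
    ++allocationRuns;
    agentVisits += agents.size();
  }

  bool coalesce_;
  bool pending_ = false;
  std::set<int> agents_;
  std::set<int> candidates_;
  std::deque<std::function<void()>> queue_;
};
```

In this model, adding 100 agents and then settling gives 100 allocation runs 
without coalescing and 1 run with it, but 100 agent visits either way. Adding 
50 frameworks afterwards and settling again costs 5000 further agent visits 
without coalescing and only 100 with it.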


- Jacob


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51027/#review150123
-----------------------------------------------------------


On Sept. 23, 2016, 4:32 p.m., Jacob Janco wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51027/
> -----------------------------------------------------------
> 
> (Updated Sept. 23, 2016, 4:32 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Guangya Liu, James Peach, Klaus 
> Ma, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-3157
>     https://issues.apache.org/jira/browse/MESOS-3157
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> - Triggered allocations dispatch allocate() only
>   if there is no pending allocation in the queue.
> - Allocation candidates are accumulated and only
>   cleared when enqueued allocations are processed.
> 
> 
> Diffs
> -----
> 
>   src/master/allocator/mesos/hierarchical.hpp 
> 2c31471ee0f5d6836393bf87ff9ecfd8df835013 
>   src/master/allocator/mesos/hierarchical.cpp 
> 2d56bd011f2c87c67a02d0ae467a4a537d36867e 
> 
> Diff: https://reviews.apache.org/r/51027/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> note: check without filters depends on https://reviews.apache.org/r/51028
> 
> With new benchmark https://reviews.apache.org/r/49617: 
> Sample output without 51027:
> [ RUN      ] SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.FrameworkFailover/22
> Using 10000 agents and 3000 frameworks
> Added 3000 frameworks in 57251us
> Added 10000 agents in 3.21345353333333mins
> allocator settled after  1.61236038333333mins
> [       OK ] SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.FrameworkFailover/22 (290578 ms)
> 
> Sample output with 51027:
> [ RUN      ] SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.FrameworkFailover/22
> Using 10000 agents and 3000 frameworks
> Added 3000 frameworks in 39817us
> Added 10000 agents in 3.22860541666667mins
> allocator settled after  25.525654secs
> [       OK ] SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.FrameworkFailover/22 (220137 ms)
> 
> 
> Thanks,
> 
> Jacob Janco
> 
>
