Re: Review Request 48914: Added GPU_RESOURCES capability to FrameworkInfo.

Guangya Liu Sun, 19 Jun 2016 22:50:49 -0700


> On 六月 19, 2016, 2:18 a.m., Guangya Liu wrote:
> > src/master/allocator/mesos/hierarchical.cpp, lines 1277-1279
> > <https://reviews.apache.org/r/48914/diff/1/?file=1423480#file1423480line1277>
> >
> >     This logic seems also allow agents without gpu resources offered to the 
> > framework with gpuCapble, is it expected behavior?
> 
> Kevin Klues wrote:
>     Yes, that was my intended behaviour as I wrote it. If we think there is 
> some problem with this, we should discuss it further.
> 
> Guangya Liu wrote:
>     So what is the use of the agent without GPU for the framework which 
> request gpu resources?
> 
> Klaus Ma wrote:
>     As far as I known, some GPU application can also run on CPU.
> 
> Kevin Klues wrote:
>     Not all tasks launched by a GPU capable framework will necessarily 
> require GPU resources. Some will, some wont -- but at least this capability 
> gives these frameworks a chance to be offered GPUs before non-GPU-capable 
> frameworks starve them out.  In other words, this capability is more about 
> making sure that frameworks that definitely *never* need GPU resources, don't 
> fill up GPU capable machines with non-GPU workloads. It shouldn't restrict 
> frameworks that have this capability from receiving their fair share of 
> resources in the cluster.


Got it, then I think that here should have some comments to clarify the 
behavior as discussed above to make sure everyone on the same page.


> On 六月 19, 2016, 2:18 a.m., Guangya Liu wrote:
> > src/master/allocator/mesos/hierarchical.cpp, line 1274
> > <https://reviews.apache.org/r/48914/diff/1/?file=1423480#file1423480line1274>
> >
> >     s/slaves/agents
> 
> Kevin Klues wrote:
>     I thought about this as I was writing it, but I left it as slave because 
> all of the other comments in the file use slave. I figured it was better to 
> stay consistent and wait for a full pass to change it than have one coment 
> refering to agent and the next referring to slave.
> 
> Guangya Liu wrote:
>     I think that we should use `agent` but not `slave` for the new code as we 
> need finally update all `slave` to `agent`, you can take a look at 
> https://github.com/apache/mesos/blob/master/src/tests/hierarchical_allocator_tests.cpp
>  , all of the latest code are using `agent` but not `slave`
> 
> Kevin Klues wrote:
>     I'm still not convinced. In the case of adding a new test, I agree -- 
> everything should be worded as `agent`. This is because you typically consume 
> each test individually, and so long as the terminology within the test is 
> self-consistent, everything looks good.
>     
>     I just think its weird -- within the same function -- to have one comment 
> say `slave`, the next say `agent`, and then another say `slave`, all within 
> 10 lines of each other.

You may see that all logs and outputs in this file are using `agent` but not 
`slave`, it was fixed here https://reviews.apache.org/r/45213/diff/3#8 , I 
think that we should not use `slave` from now on but keep `agent` for all new 
introduced code.


- Guangya


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48914/#review138432
-----------------------------------------------------------


On 六月 18, 2016, 10:07 p.m., Kevin Klues wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/48914/
> -----------------------------------------------------------
> 
> (Updated 六月 18, 2016, 10:07 p.m.)
> 
> 
> Review request for mesos and Benjamin Mahler.
> 
> 
> Bugs: MESOS-5634
>     https://issues.apache.org/jira/browse/MESOS-5634
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Due to the scarce resource problem described in MESOS-5377, we are
> introducing a GPU_RESOURCES Framework capability. This capability
> allows the Mesos allocator to make better decisions about which
> frameworks should receive resources from GPU capable machines. In
> essence, the allocator ONLY allocate resources from GPU capable
> machines to frameworks that have this capability. This is necessary to
> prevent non-GPU workloads from filling up the GPU machines and
> preventing GPU workloads to run.
> 
> 
> Diffs
> -----
> 
>   include/mesos/mesos.proto e4c5bd31cf035707036eb509336fe051119b4e78 
>   include/mesos/v1/mesos.proto 9be22f02861f1eb89ab547d88530faf90ebee7ab 
>   src/master/allocator/mesos/hierarchical.hpp 
> 9c6b23abe2b0cb16412f1ed90165f8d0c14552fa 
>   src/master/allocator/mesos/hierarchical.cpp 
> 8b7b3afb5770c617918ec4864faaf8d8a7a81ef2 
>   src/tests/containerizer/nvidia_gpu_isolator_tests.cpp 
> e06d107f2dcdb9b470e330c8ceee66a54220d41b 
> 
> Diff: https://reviews.apache.org/r/48914/diff/
> 
> 
> Testing
> -------
> 
> $ make -j check; sudo GTEST_FILTER="*NVIDIA*" src/mesos-tests
> 
> 
> Thanks,
> 
> Kevin Klues
> 
>

Re: Review Request 48914: Added GPU_RESOURCES capability to FrameworkInfo.

Reply via email to