Thanks again!

We're not (yet) using docker (or any other form of containerization for
that matter).

I think in the meantime the approach of sub-advertising resources should
work for us.
As we're spinning clusters up and down on demand on GCE, I can probably do
this pretty easily by adding these lines to the GCE VM startup script:

cpus="$( grep -c ^processor /proc/cpuinfo )"
advertised_cpu=$((some appropriate conversion from cpus))
sudo sh -c "echo cpus:$advertised_cpu > /etc/mesos-slave/resources"
sudo service mesos-slave restart
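
For the "appropriate conversion" step, here's a minimal sketch of what I have in mind (the one-CPU reserve below is just an illustrative assumption; you'd tune it for however heavy your per-node services are):

```shell
#!/bin/sh
# Compute how many CPUs to advertise to Mesos, holding some back
# for system-level services (log aggregation, mesos-slave itself, etc.).
# The reserve of 1 CPU is an assumption for illustration only.
advertised_cpus() {
    total="$1"
    reserve=1
    adv=$(( total - reserve ))
    # Never advertise fewer than 1 CPU, or the slave becomes useless.
    [ "$adv" -lt 1 ] && adv=1
    echo "$adv"
}

advertised_cpus 16   # prints 15 on a 16-CPU machine
```
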

On Thu, Jan 8, 2015 at 10:22 AM, Tom Arnfeld <[email protected]> wrote:

>  That's a great point Itamar, and something we discussed quite some time
> ago here but never implemented. These are the first options that spring
> to mind that I can remember...
>
> - Are you using docker containers for your tasks? Why not use containers
> pre-configured on the box for these services too?
> - Build some custom init scripts for your services (perhaps systemd and
> the like can do this for you) that will drop your PIDs into cgroups after
> they launch, which would allow you to reserve those resources you need
> using the same resource system as the popular container tools.
> - Do you need to actually reserve these resources? Perhaps if you're only
> concerned about memory, or CPU, you could just advertise your slaves as
> having less than the machine actually has (using the --resources flag to
> mesos-slave).
>
> With any of these three approaches you are still going to need to modify
> the --resources flag on each slave to ensure fewer resources than are
> actually available are advertised to the cluster.
>
> Maybe those options are of some use. If you do end up implementing
> something in this area for setting aside resources for these auxiliary
> services, I'd love to know how you end up doing it!
>
>
> --
>
> Tom Arnfeld
> Developer // DueDil
>
> On Thursday, Jan 8, 2015 at 7:32 am, Itamar Ostricher <[email protected]>,
> wrote:
>
>> Thanks everybody for all your insights!
>>
>> I totally agree with the last response from Tom.
>> The per-node services definitely belong to the level that provisions the
>> machine and the mesos-slave service itself (in our case, pre-configured GCE
>> images).
>>
>> So I guess the problem I wanted to solve is more general - how can I make
>> sure there are resources reserved for all of the system-level stuff that
>> is running outside of the mesos context?
>> To be more specific, if I have a machine with 16 CPUs, it is common that
>> my framework will schedule 16 heavy number-crunching processes on it.
>> This can starve anything else that's running on the machine... (like the
>> logging aggregation service, and the mesos-slave service itself)
>> (this probably explains the lost tasks we've been observing)
>> What's the best-practice solution for this situation?
>>
>> On Wed, Jan 7, 2015 at 2:09 AM, Tom Arnfeld <[email protected]> wrote:
>>
>>> I completely agree with Charles, though I think I can appreciate what
>>> you're trying to do here. Take the log aggregation service as an example,
>>> you want that on every slave to aggregate logs, but want to avoid using yet
>>> another layer of configuration management to deploy it.
>>>
>>> I'm of the opinion that these kinds of auxiliary services which all work
>>> together (the mesos-slave process included) to define what we mean by a
>>> "slave" are the responsibility of whoever/whatever is provisioning the
>>> mesos-slave process and possibly even the machine itself. In our case,
>>> that's Chef. IMO once a slave registers with the mesos cluster it's
>>> immediately ready to start doing work, and mesos will actually start
>>> offering that slave immediately.
>>>
>>> If you continue down this path you're also going to run into a variety
>>> of interesting timing issues when these services fail, or when you want to
>>> upgrade them. I'd suggest taking a look at some kind of more advanced
>>> process monitor to run these aux services like M/Monit instead of mesos
>>> (via Marathon).
>>>
>>> Think of it another way, would you want something running through mesos
>>> to install apt package updates once a day? That'd be super weird, so why
>>> would log aggregation be any different?
>>>
>>> --
>>>
>>> Tom Arnfeld
>>> Developer // DueDil
>>>
>>>
>>>  On Tue, Jan 6, 2015 at 11:57 PM, Charles Baker <[email protected]>
>>> wrote:
>>>
>>>> It seems like an 'anti-pattern' (for lack of a better term) to attempt
>>>> to force locality on a bunch of dependency services launched through
>>>> Marathon. I thought the whole idea of Mesos (and Marathon) was to treat the
>>>> data center as one giant computer in which it fundamentally should not
>>>> matter where your services are launched. I obviously don't know the
>>>> details of the use-case and may be grossly misunderstanding what you are
>>>> trying to do, but to me it sounds like you are attempting to shoehorn a
>>>> non-distributed application into a distributed architecture. If this is the
>>>> case, you may want to revisit your implementation and try to decouple the
>>>> application's requirement of node-level dependency locality. It is also a
>>>> good opportunity to possibly redesign a monolithic application into a
>>>> distributed one.
>>>>
>>>> On Tue, Jan 6, 2015 at 12:53 PM, David Greenberg <
>>>> [email protected]> wrote:
>>>>
>>>>> Tom is absolutely correct--you also need to ensure that your "special
>>>>> tasks" run as a user which is assigned a role w/ a special reservation to
>>>>> ensure they can always launch.
>>>>>
>>>>> On Tue, Jan 6, 2015 at 2:38 PM, Tom Arnfeld <[email protected]> wrote:
>>>>>
>>>>>> I'm not sure if I'm fully aware of the use case but if you use a
>>>>>> different framework (e.g. Marathon) to launch these services, should the
>>>>>> service die and need to be re-launched (or even the slave restart) could
>>>>>> you not be in a position where another framework has consumed all
>>>>>> resources
>>>>>> on that slave and your "core" tasks cannot launch?
>>>>>>
>>>>>> Maybe if you're just using Marathon it might provide a sort of
>>>>>> priority to decide who gets what resources first, but with multiple
>>>>>> frameworks you might need to look into the slave resource reservations 
>>>>>> and
>>>>>> framework roles.
>>>>>>
>>>>>> FWIW We're configuring these things out of band (via Chef to be
>>>>>> specific).
>>>>>>
>>>>>> Hope this helps!
>>>>>>
>>>>>> --
>>>>>>
>>>>>> Tom Arnfeld
>>>>>> Developer // DueDil
>>>>>>
>>>>>> (+44) 7525940046
>>>>>> 25 Christopher Street, London, EC2A 2BS
>>>>>>
>>>>>>
>>>>>> On Tue, Jan 6, 2015 at 9:05 AM, Itamar Ostricher <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I was wondering if the best approach to do what I want is to use
>>>>>>> mesos itself, or other Linux system tools.
>>>>>>>
>>>>>>> There are a bunch of services that our framework assumes are running
>>>>>>> on all participating slaves (e.g. logging service, data-bridge service,
>>>>>>> etc.).
>>>>>>> One approach to do that is in the infrastructure level, making sure
>>>>>>> that slave nodes are configured correctly (e.g. with pre-configured 
>>>>>>> images,
>>>>>>> or other provisioning systems).
>>>>>>> Another approach would be to use mesos itself (maybe with something
>>>>>>> like Marathon) to schedule these services on all slave nodes.
>>>>>>>
>>>>>>> The advantage of the mesos-based approach is that it becomes trivial
>>>>>>> to account for the resource consumption of said services (e.g. make sure
>>>>>>> there's always at least 1 CPU dedicated to this).
>>>>>>> I'm not sure how to achieve something similar with the
>>>>>>> system-approach.
>>>>>>>
>>>>>>> Does anyone have any insights on this?
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
