Thanks again! We're not (yet) using docker (or any other form of containerization for that matter).
I think in the meantime the approach of under-advertising resources should work for us.
As we're spinning clusters up and down on demand on GCE, I can probably do this pretty easily by adding these lines to the GCE VM startup script:

cpus="$( grep -c ^processor /proc/cpuinfo )"
advertised_cpu=$((some appropriate conversion from cpus))
sudo sh -c "echo cpus:$advertised_cpu > /etc/mesos-slave/resources"
sudo service mesos-slave restart

On Thu, Jan 8, 2015 at 10:22 AM, Tom Arnfeld <[email protected]> wrote:

> That's a great point Itamar, and something we discussed quite some time
> ago here but never implemented. These are the first few options that spring
> to mind that I can remember...
>
> - Are you using docker containers for your tasks? Why not use containers
> pre-configured on the box for these services too?
> - Build some custom init scripts for your services (perhaps systemd and
> the like can do this for you) that will drop your PIDs into cgroups after
> they launch, which would allow you to reserve those resources you need
> using the same resource system as the popular container tools.
> - Do you need to actually reserve these resources? Perhaps if you're only
> concerned about memory, or CPU, you could just advertise your slaves as
> having less than the machine actually has (using the --resources flag to
> mesos-slave).
>
> With any of these three approaches you still are going to need to modify
> the --resources flag on each slave to ensure fewer resources than are
> actually available are advertised to the cluster.
>
> Maybe those options are of some use. If you do end up implementing
> something in this area for setting aside resources for these auxiliary
> services, I'd love to know how you end up doing it!
>
>
> --
>
> Tom Arnfeld
> Developer // DueDil
>
> On Thursday, Jan 8, 2015 at 7:32 am, Itamar Ostricher <[email protected]>,
> wrote:
>
>> Thanks everybody for all your insights!
>>
>> I totally agree with the last response from Tom.
>> The per-node services definitely belong to the level that provisions the
>> machine and the mesos-slave service itself (in our case, pre-configured GCE
>> images).
>>
>> So I guess the problem I wanted to solve is more general - how can I make
>> sure there are resources reserved for all of the system-level stuff that
>> is running outside of the mesos context?
>> To be more specific, if I have a machine with 16 CPUs, it is common that
>> my framework will schedule 16 heavy number-crunching processes on it.
>> This can starve anything else that's running on the machine... (like the
>> logging aggregation service, and the mesos-slave service itself)
>> (this probably explains the lost tasks we've been observing)
>> What's the best-practice solution for this situation?
>>
>> On Wed, Jan 7, 2015 at 2:09 AM, Tom Arnfeld <[email protected]> wrote:
>>
>>> I completely agree with Charles, though I think I can appreciate what
>>> you're trying to do here. Take the log aggregation service as an example,
>>> you want that on every slave to aggregate logs, but want to avoid using yet
>>> another layer of configuration management to deploy it.
>>>
>>> I'm of the opinion that these kinds of auxiliary services which all work
>>> together (the mesos-slave process included) to define what we mean by a
>>> "slave" are the responsibility of whoever/whatever is provisioning the
>>> mesos-slave process and possibly even the machine itself. In our case,
>>> that's Chef. IMO once a slave registers with the mesos cluster it's
>>> immediately ready to start doing work, and mesos will actually start
>>> offering that slave immediately.
>>>
>>> If you continue down this path you're also going to run into a variety
>>> of interesting timing issues when these services fail, or when you want to
>>> upgrade them. I'd suggest taking a look at some kind of more advanced
>>> process monitor to run these aux services like M/Monit instead of mesos
>>> (via Marathon).
>>>
>>> Think of it another way, would you want something running through mesos
>>> to install apt package updates once a day? That'd be super weird, so why
>>> would log aggregation be any different?
>>>
>>> --
>>>
>>> Tom Arnfeld
>>> Developer // DueDil
>>>
>>>
>>> On Tue, Jan 6, 2015 at 11:57 PM, Charles Baker <[email protected]> wrote:
>>>
>>>> It seems like an 'anti-pattern' (for lack of a better term) to attempt
>>>> to force locality on a bunch of dependency services launched through
>>>> Marathon. I thought the whole idea of Mesos (and Marathon) was to treat the
>>>> data center as one giant computer in which it fundamentally should not
>>>> matter where your services are launched. Although I obviously don't know
>>>> the details of the use-case and may be grossly misunderstanding what you
>>>> are trying to do, to me it sounds like you are attempting to shoehorn a
>>>> non-distributed application into a distributed architecture. If this is the
>>>> case, you may want to revisit your implementation and try to decouple the
>>>> application's requirement of node-level dependency locality. It is also a
>>>> good opportunity to possibly redesign a monolithic application into a
>>>> distributed one.
>>>>
>>>> On Tue, Jan 6, 2015 at 12:53 PM, David Greenberg <[email protected]> wrote:
>>>>
>>>>> Tom is absolutely correct--you also need to ensure that your "special
>>>>> tasks" run as a user which is assigned a role w/ a special reservation to
>>>>> ensure they can always launch.
>>>>>
>>>>> On Tue, Jan 6, 2015 at 2:38 PM, Tom Arnfeld <[email protected]> wrote:
>>>>>
>>>>>> I'm not sure if I'm fully aware of the use case but if you use a
>>>>>> different framework (aka Marathon) to launch these services, should the
>>>>>> service die and need to be re-launched (or even the slave restarts) could
>>>>>> you not be in a position where another framework has consumed all resources
>>>>>> on that slave and your "core" tasks cannot launch?
>>>>>>
>>>>>> Maybe if you're just using Marathon it might provide a sort of
>>>>>> priority to decide who gets what resources first, but with multiple
>>>>>> frameworks you might need to look into the slave resource reservations
>>>>>> and framework roles.
>>>>>>
>>>>>> FWIW We're configuring these things out of band (via Chef to be
>>>>>> specific).
>>>>>>
>>>>>> Hope this helps!
>>>>>>
>>>>>> --
>>>>>>
>>>>>> Tom Arnfeld
>>>>>> Developer // DueDil
>>>>>>
>>>>>> (+44) 7525940046
>>>>>> 25 Christopher Street, London, EC2A 2BS
>>>>>>
>>>>>>
>>>>>> On Tue, Jan 6, 2015 at 9:05 AM, Itamar Ostricher <[email protected]> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I was wondering if the best approach to do what I want is to use
>>>>>>> mesos itself, or other Linux system tools.
>>>>>>>
>>>>>>> There are a bunch of services that our framework assumes are running
>>>>>>> on all participating slaves (e.g. logging service, data-bridge service,
>>>>>>> etc.).
>>>>>>> One approach to do that is at the infrastructure level, making sure
>>>>>>> that slave nodes are configured correctly (e.g. with pre-configured
>>>>>>> images, or other provisioning systems).
>>>>>>> Another approach would be to use mesos itself (maybe with something
>>>>>>> like Marathon) to schedule these services on all slave nodes.
>>>>>>>
>>>>>>> The advantage of the mesos-based approach is that it becomes trivial
>>>>>>> to account for the resource consumption of said services (e.g. make sure
>>>>>>> there's always at least 1 CPU dedicated to this).
>>>>>>> I'm not sure how to achieve something similar with the
>>>>>>> system-level approach.
>>>>>>>
>>>>>>> Does anyone have any insights on this?
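
The startup-script lines at the top of this thread leave the conversion from detected CPUs to advertised CPUs unspecified. As a rough sketch only, one possible conversion is to hold back a fixed number of CPUs for the system-level services and never advertise fewer than one. The function name advertised_cpus and the choice of reserving 1 CPU are assumptions made for illustration, not something anyone in the thread settled on.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): compute how many CPUs to
# advertise, given the detected CPU count and a number of CPUs to hold
# back for services running outside of mesos.
advertised_cpus() {
    total="$1"      # CPUs detected on the machine
    reserved="$2"   # CPUs to keep for system-level services (assumed value)
    advertised=$((total - reserved))
    # Never advertise fewer than one CPU, even on very small machines.
    if [ "$advertised" -lt 1 ]; then
        advertised=1
    fi
    echo "$advertised"
}

# Example: a 16-CPU machine with 1 CPU held back.
advertised_cpus 16 1
```

In the startup script, the result could then be assigned with something like advertised_cpu="$(advertised_cpus "$cpus" 1)" before the line that writes /etc/mesos-slave/resources. Whether a fixed reserve or a proportional one is more appropriate depends on how heavy the per-node services are.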

