Sorry to drop off the thread here. Airbnb is using mesos/hadoop. Not sure I understand the question: mesos/hadoop manages the task tracker lifecycle itself, so there is no need for Marathon here. With the upcoming Docker integration you'll be able to isolate the task trackers in containers: https://github.com/mesosphere/medea
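For anyone following along: mesos/hadoop plugs in as the JobTracker's task scheduler, which is why Marathon isn't needed. A minimal mapred-site.xml sketch (property names as in the mesos/hadoop README; the master address and executor URI below are just placeholders you'd swap for your own):

<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <value>org.apache.hadoop.mapred.MesosScheduler</value>
</property>
<property>
  <!-- The underlying scheduler that MesosScheduler delegates to. -->
  <name>mapred.mesos.taskScheduler</name>
  <value>org.apache.hadoop.mapred.JobQueueTaskScheduler</value>
</property>
<property>
  <!-- Mesos master to register with (placeholder address). -->
  <name>mapred.mesos.master</name>
  <value>zk://localhost:2181/mesos</value>
</property>
<property>
  <!-- Hadoop distribution the executor downloads to run task trackers
       on each slave (placeholder path). -->
  <name>mapred.mesos.executor.uri</name>
  <value>hdfs://localhost:9000/hadoop.tar.gz</value>
</property>

With this in place the MesosScheduler launches and tears down task trackers on Mesos slaves as jobs need slots, which is the lifecycle management mentioned above.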
On Thu, Feb 6, 2014 at 12:33 AM, Tom Arnfeld <[email protected]> wrote:

> Hey Tobi,
>
> That's my thinking too. Having taken a closer look at Marathon I now
> realise at what level it sits (I previously thought it was a framework
> itself).
>
> Do you know of anyone currently running Hadoop task trackers using
> Marathon? If so, do you think it would be possible to implement a
> scheduler similar to the one provided by https://github.com/mesos/hadoop,
> if there isn't one already? Or is the best way simply to launch
> long-running task trackers?
>
> My point being, I'd like to be able to isolate the Hadoop task trackers
> (and even Chronos tasks, for example) within Docker containers so that
> Hadoop tasks can use the dependencies built into the Docker image.
>
> Thanks.
>
> --
>
> Tom Arnfeld
> Developer // DueDil
>
> On 6 Feb 2014, at 01:32, Tobias Knaup <[email protected]> wrote:
>
> Hi Tom,
>
> Docker is definitely a good option for this. Marathon already has basic
> support for Docker, and there has been some work recently to integrate it
> more tightly with Mesos.
>
> Cheers,
>
> Tobi
>
>
> On Tue, Feb 4, 2014 at 4:31 AM, Tom Arnfeld <[email protected]> wrote:
>
>> I'm investigating the possibility of using Mesos to solve the problem of
>> resource allocation between a Hadoop cluster and a set of Jenkins slaves
>> (and I like the possibility of being able to easily deploy other
>> frameworks). One of the biggest overhanging questions I can't seem to
>> find an answer to is how to manage system dependencies across a wide
>> variety of frameworks, and across the jobs running within those
>> frameworks.
>>
>> I came across this thread
>> (http://www.mail-archive.com/[email protected]/msg00301.html)
>> and caching executor files seems to be the leading solution, though it
>> isn't implemented yet. I too would really like to avoid shipping system
>> dependencies (C dependencies for Python packages, as an example) along
>> with every single job, and I'm especially unsure how this would interact
>> with the Hadoop/Jenkins Mesos schedulers (as each Hadoop job may require
>> its own system dependencies).
>>
>> More importantly, the architecture of the machine submitting the job is
>> often different from that of the slaves, so we can't simply ship all the
>> built dependencies with the task.
>>
>> We're solving this problem at the moment for Hadoop by installing all
>> the dependencies we require on every Hadoop task tracker node, which is
>> far from ideal. For Jenkins, we're using Docker to isolate the execution
>> of different types of jobs, and we've built all the system dependencies
>> for a suite of jobs into Docker images.
>>
>> I like the idea of continuing down the path of Docker for process
>> isolation and system dependency management, but I don't see any easy way
>> for this to interact with the existing Hadoop/Jenkins/etc. schedulers. I
>> guess it'd require us to build our own schedulers/executors that wrap
>> the process in a Docker container.
>>
>> I'd love to hear how others are solving this problem, and/or whether
>> Docker seems like the wrong way to go.
>>
>> --
>>
>> Tom Arnfeld
>> Developer // DueDil
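P.S. On Tom's earlier point about wrapping the process in a Docker container yourself: until the tighter integration lands, one stopgap is to point a framework's executor command at a thin wrapper script. This is only a sketch; the image name is a placeholder and you should check the flags against the Docker version you run:

#!/bin/sh
# Hypothetical wrapper: run the task's command inside an image that
# bundles its system dependencies (image name is a placeholder).
# Mounting the working directory exposes the Mesos sandbox to the task.
exec docker run --rm \
  -v "$(pwd)":/sandbox \
  -w /sandbox \
  example/hadoop-deps \
  "$@"

Something like this gets you the dependency isolation without changing the scheduler, at the cost of managing the images yourself.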

