Hey Tobi,

That’s my thinking too. Having taken a closer look at Marathon, I now realise at what level it sits (I previously thought it was a framework itself).
Do you know of anyone currently running Hadoop task trackers using Marathon? If so, do you think it would be possible to implement a scheduler similar to the task scheduler provided by https://github.com/mesos/hadoop, if there isn’t one already? Or is the best way simply to launch long-running task trackers? My point being, I’d like to be able to isolate the Hadoop task trackers (and even Chronos tasks, for example) within Docker containers, so that Hadoop tasks can use the dependencies built into the Docker image.

Thanks.

—

Tom Arnfeld
Developer // DueDil

On 6 Feb 2014, at 01:32, Tobias Knaup <[email protected]> wrote:

> Hi Tom,
>
> Docker is definitely a good option for this. Marathon already has basic
> support for Docker, and there has been some work recently to integrate it
> more tightly with Mesos.
>
> Cheers,
>
> Tobi
>
>
> On Tue, Feb 4, 2014 at 4:31 AM, Tom Arnfeld <[email protected]> wrote:
>
> I’m investigating the possibility of using Mesos to solve the problem of
> resource allocation between a Hadoop cluster and a set of Jenkins slaves
> (and I like the possibility of being able to easily deploy other
> frameworks). One of the biggest outstanding questions I can’t seem to find
> an answer to is how to manage system dependencies across a wide variety of
> frameworks, and across the jobs running within those frameworks.
>
> I came across this thread
> (http://www.mail-archive.com/[email protected]/msg00301.html) and caching
> executor files seems to be the running solution, though it is not
> implemented yet. I too would really like to avoid shipping system
> dependencies (C dependencies for Python packages, for example) along with
> every single job, and I’m especially unsure how this would interact with
> the Hadoop/Jenkins Mesos schedulers (as each Hadoop job may require its
> own system dependencies).
>
> More importantly, the architecture of the machine submitting the job is
> often different from that of the slaves, so we can’t simply ship all the
> built dependencies with the task.
>
> We’re solving this problem at the moment for Hadoop by installing all the
> dependencies we require on every Hadoop task tracker node, which is far
> from ideal. For Jenkins, we’re using Docker to isolate the execution of
> different types of jobs, and have built all the system dependencies for a
> suite of jobs into Docker images.
>
> I like the idea of continuing down the path of Docker for process
> isolation and system dependency management, but I don’t see any easy way
> for this to interact with the existing Hadoop/Jenkins/etc. schedulers. I
> guess it’d require us to build our own schedulers/executors that wrap the
> process in a Docker container.
>
> I’d love to hear how others are solving this problem… and/or whether
> Docker seems like the wrong way to go.
>
> —
>
> Tom Arnfeld
> Developer // DueDil
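[Editor's note on the "launch long-running task trackers" option discussed above: one way this is typically done is by submitting an app definition to Marathon's REST API (`POST /v2/apps`), letting Marathon keep N task trackers alive. The sketch below is illustrative only — the app id, Docker image, resource numbers, and Marathon host are hypothetical, and the exact shape of the `container` stanza varied across early Marathon versions, so the official docs should be checked for the version in use.]

```python
import json

# A hypothetical Marathon app definition that keeps several Hadoop task
# trackers running inside a Docker image with the job dependencies baked
# in. Field names loosely follow Marathon's /v2/apps JSON schema; the
# image, command, and resource figures are placeholders, not a tested
# configuration.
app = {
    "id": "hadoop-tasktracker",
    "cmd": "hadoop tasktracker",   # command run inside the container
    "cpus": 2.0,
    "mem": 4096.0,
    "instances": 5,                # Marathon restarts any instance that dies
    "container": {
        # Placeholder image with the cluster's C/Python dependencies built in.
        "image": "docker:duedil/hadoop-deps:latest",
    },
}

payload = json.dumps(app)
print(payload)

# Submitting it would be a plain HTTP POST to a Marathon master,
# e.g. (hostname is hypothetical):
#   curl -X POST -H 'Content-Type: application/json' \
#        -d @app.json http://marathon.example.com:8080/v2/apps
```

This sidesteps writing a custom Mesos scheduler: Marathon owns the task-tracker lifecycle, and the Docker image carries the dependencies, which is exactly the isolation property asked about above.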

