Sorry to drop off the thread here. Airbnb is using mesos/hadoop. I'm not
sure I understand the question: mesos/hadoop manages the task tracker
lifecycle itself, so there's no need for Marathon here.
With the upcoming Docker integration, you'll be able to isolate the task
trackers in containers:
https://github.com/mesosphere/medea
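
For anyone curious, wiring mesos/hadoop in is mostly mapred-site.xml
configuration, roughly along these lines (the ZooKeeper address and the
executor URI are placeholders):

  <property>
    <name>mapred.jobtracker.taskScheduler</name>
    <value>org.apache.hadoop.mapred.MesosScheduler</value>
  </property>
  <property>
    <name>mapred.mesos.master</name>
    <value>zk://zk1:2181,zk2:2181/mesos</value>
  </property>
  <property>
    <name>mapred.mesos.executor.uri</name>
    <value>hdfs://namenode:9000/hadoop-2.0.0-mr1.tar.gz</value>
  </property>

With that in place the JobTracker starts and stops TaskTrackers as Mesos
tasks on its own.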


On Thu, Feb 6, 2014 at 12:33 AM, Tom Arnfeld <[email protected]> wrote:

> Hey Tobi,
>
> That's my thinking too. Having taken a closer look at Marathon, I now
> realise at what level it sits (I previously thought it was a framework
> itself).
>
> Do you know of anyone currently running Hadoop task trackers under
> Marathon? If so, do you think it would be possible to implement a
> scheduler similar to the one provided by https://github.com/mesos/hadoop,
> if there isn't one already? Or is it best to simply launch long-running
> task trackers?
>
> My point being, I'd like to be able to isolate the Hadoop task trackers
> (and even Chronos tasks, for example) within Docker containers, so that
> Hadoop tasks can use the dependencies built into the Docker image.
>
> Thanks.
>
> --
>
> Tom Arnfeld
> Developer // DueDil
>
> On 6 Feb 2014, at 01:32, Tobias Knaup <[email protected]> wrote:
>
> Hi Tom,
>
> Docker is definitely a good option for this. Marathon already has basic
> support for Docker, and there has been some work recently to integrate it
> more tightly with Mesos.
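>
> To give you a flavour, a Marathon app definition with a Docker image
> currently looks something like this (the image name and command are made
> up, and the exact fields may change as the integration matures):
>
>     {
>       "id": "jenkins-slave",
>       "cmd": "java -jar slave.jar",
>       "cpus": 1,
>       "mem": 1024,
>       "instances": 1,
>       "container": {
>         "image": "docker:///duedil/jenkins-slave",
>         "options": []
>       }
>     }
>
> POST that to Marathon's REST API and it will keep the instances running,
> restarting them if they die.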
>
> Cheers,
>
> Tobi
>
>
> On Tue, Feb 4, 2014 at 4:31 AM, Tom Arnfeld <[email protected]> wrote:
>
>> I'm investigating the possibility of using Mesos to solve the problem of
>> resource allocation between a Hadoop cluster and a set of Jenkins slaves
>> (and I like the possibility of easily deploying other frameworks). One of
>> the biggest open questions I can't find an answer to is how to manage
>> system dependencies across a wide variety of frameworks, and across the
>> jobs running within those frameworks.
>>
>> I came across this thread (
>> http://www.mail-archive.com/[email protected]/msg00301.html) and
>> caching executor files seems to be the leading solution, though it isn't
>> implemented yet. I too would really like to avoid shipping system
>> dependencies (C dependencies for Python packages, for example) along with
>> every single job, and I'm especially unsure how this would interact with
>> the Hadoop/Jenkins Mesos schedulers (each Hadoop job may require its own
>> system dependencies).
>>
>> More importantly, the architecture of the machine submitting the job is
>> often different from that of the slaves, so we can't simply ship
>> pre-built dependencies with the task.
>>
>> We're solving this problem for Hadoop at the moment by installing every
>> dependency we require on every Hadoop task tracker node, which is far
>> from ideal. For Jenkins, we're using Docker to isolate the execution of
>> different types of jobs, and we've built all the system dependencies for
>> a suite of jobs into Docker images.
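>>
>> For illustration, each of those images is built from a short Dockerfile
>> along these lines (the package list is just an example):
>>
>>     FROM ubuntu:12.04
>>     # System libraries that our jobs' Python packages compile against
>>     RUN apt-get update && apt-get install -y \
>>         build-essential python-dev libxml2-dev libssl-dev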
>>
>> I like the idea of continuing down the path of Docker for process
>> isolation and system dependency management, but I don't see any easy way
>> for this to interact with the existing Hadoop/Jenkins/etc. schedulers. I
>> guess it would require us to build our own schedulers/executors that wrap
>> the task process in a Docker container.
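>>
>> To sketch what I mean on the executor side (entirely hypothetical; the
>> image name and command are made up):
>>
>>     import subprocess
>>
>>     def run_in_container(image, task_cmd):
>>         # Instead of exec'ing the task directly, wrap it in `docker run`
>>         # so it sees the image's dependencies rather than the host's.
>>         return subprocess.Popen(["docker", "run", "--rm", image] + task_cmd)
>>
>>     # e.g. launch a task tracker from an image with our deps baked in
>>     proc = run_in_container("duedil/hadoop-deps", ["hadoop", "tasktracker"])
>>     proc.wait()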
>>
>> I'd love to hear how others are solving this problem... and/or whether
>> Docker seems like the wrong way to go.
>>
>> --
>>
>> Tom Arnfeld
>> Developer // DueDil
>>
>
>
>
