Hi Alex,



Great to hear you're hoping to use Hadoop on Mesos. We've been running it for a 
good 6 months and it's been awesome.




I'll answer the simpler question first: running multiple JobTrackers should be 
just fine, even multiple JTs with HA enabled (we do this). The Mesos scheduler 
for Hadoop ships all of the configuration options each TaskTracker needs 
within Mesos, so there's nothing you need to configure specifically on 
each slave.
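
For context, a minimal JobTracker-side mapred-site.xml for the Mesos framework 
looks roughly like the sketch below. Property names are the ones the mesos/hadoop 
README documents (double-check against your version); the master address and 
executor URI are placeholders you'd substitute for your own.

```xml
<!-- Sketch of a mapred-site.xml for one JobTracker on Mesos. -->
<!-- The host names and URIs below are placeholders.          -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>jobtracker-host:9001</value>
  </property>
  <property>
    <!-- Hand task scheduling over to the Mesos framework scheduler. -->
    <name>mapred.jobtracker.taskScheduler</name>
    <value>org.apache.hadoop.mapred.MesosScheduler</value>
  </property>
  <property>
    <!-- The underlying Hadoop scheduler Mesos delegates to. -->
    <name>mapred.mesos.taskScheduler</name>
    <value>org.apache.hadoop.mapred.JobQueueTaskScheduler</value>
  </property>
  <property>
    <!-- Mesos master, here via a ZooKeeper quorum (placeholder hosts). -->
    <name>mapred.mesos.master</name>
    <value>zk://zk1:2181,zk2:2181,zk3:2181/mesos</value>
  </property>
  <property>
    <!-- Tarball every TaskTracker downloads when it launches. -->
    <name>mapred.mesos.executor.uri</name>
    <value>hdfs://namenode:9000/hadoop-2.5.0-cdh5.3.0.tar.gz</value>
  </property>
</configuration>
```

Since all of this ships with the framework, each JobTracker only needs its own 
copy of this file, not the slaves.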




# Slow slot allocations




If you only have a few slaves, limited resources overall and a large amount of 
resources per slot, you might end up with a pretty small slot allocation (e.g. 5 
mappers and 1 reducer). Because of the nature of Hadoop, slots are static for 
each TaskTracker, and the framework makes a best effort to figure out what 
balance of map/reduce slots to launch on the cluster.




Because of this, the current stable version of the framework has a few issues 
when running on small clusters, especially when you don't configure min/max 
slot capacity for each JobTracker. A few links below:




- https://github.com/mesos/hadoop/issues/32

- https://github.com/mesos/hadoop/issues/31

- https://github.com/mesos/hadoop/issues/28

- https://github.com/mesos/hadoop/issues/26
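
For reference, the min/max slot capacity I mentioned is set per JobTracker in 
mapred-site.xml. The property names below are from memory of the framework's 
README, so verify them against your version; the values are purely illustrative.

```xml
<!-- Illustrative floor on slots so small clusters don't starve one phase. -->
<property>
  <name>mapred.mesos.total.map.slots.minimum</name>
  <value>4</value>
</property>
<property>
  <name>mapred.mesos.total.reduce.slots.minimum</name>
  <value>2</value>
</property>
```

Setting a reduce-slot minimum in particular helps avoid the situation where all 
offered resources get consumed by map slots up front.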




Having said that, we've been working on a solution to this problem which 
enables Hadoop to launch different types of slots over the lifetime of a single 
job, meaning you could start with 5 map slots and 1 reduce slot, and end with 0 
map slots and 6 reduce slots. It's not perfect, but it's a decent optimisation 
if you still need to use Hadoop.




- https://github.com/mesos/hadoop/pull/33


You may also want to look into how large your executor URI is (the tarball 
containing the Hadoop distribution that gets downloaded for each TaskTracker) and 
how long that takes to download. It might be that the TaskTrackers are taking 
a while to bootstrap.
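
As a rough back-of-the-envelope check, the size and bandwidth numbers below are 
made up; measure your own tarball and network instead.

```shell
# Rough bootstrap-cost estimate for the executor tarball.
# SIZE_MB and BANDWIDTH_MBPS are assumed numbers; measure yours,
# e.g. with: time curl -sSL -o /dev/null "$EXECUTOR_URI"
SIZE_MB=210          # size of the Hadoop tar.gz (CDH tarballs can be ~200MB+)
BANDWIDTH_MBPS=30    # sustained MB/s each slave sees from the artifact server
SECONDS_PER_TT=$((SIZE_MB / BANDWIDTH_MBPS))
echo "~${SECONDS_PER_TT}s per TaskTracker just to fetch the executor"
```

If that number is large, hosting the tarball somewhere close to the slaves (or 
trimming it down) can noticeably speed up TaskTracker launch.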




# HA Hadoop JTs




The framework currently does not support a full HA setup, however that's not a 
huge issue. The JT will automatically restart jobs where they left off on its 
own when a failover occurs, but for the time being all the task trackers will 
be killed and new ones spawned. Depending on your setup, this downtime could be 
fairly negligible.




# Multiple versions of hadoop on the cluster




This is totally fine. Each JT configuration can be given its own Hadoop tar.gz 
file with the right version in it, and they will all happily share the Mesos 
cluster.
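
Concretely, each JobTracker's mapred-site.xml just points at a different 
tarball via the executor URI property. The URIs below are placeholders.

```xml
<!-- JobTracker A: CDH 5.3.0 (placeholder URI). -->
<property>
  <name>mapred.mesos.executor.uri</name>
  <value>hdfs://namenode:9000/dist/hadoop-2.5.0-cdh5.3.0.tar.gz</value>
</property>

<!-- JobTracker B, in its own config directory: a different build. -->
<property>
  <name>mapred.mesos.executor.uri</name>
  <value>hdfs://namenode:9000/dist/hadoop-1.2.1.tar.gz</value>
</property>
```

Each TaskTracker downloads whichever tarball its JobTracker ships, so the 
versions never collide on the slaves.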




I hope this makes sense! Ping me on IRC (tarnfeld) if you run into anything 
funky on that branch for flexi trackers.




Tom.


--


Tom Arnfeld

Developer // DueDil

On Thu, Jan 29, 2015 at 4:09 PM, Alex <alex.m.lis...@gmail.com> wrote:

> Hi guys,
> I'm a Hadoop and Mesos n00b, so please be gentle. I'm trying to set up a
> Mesos cluster, and my ultimate goal is to introduce Mesos in my
> organization by showing off it's ability to run multiple Hadoop
> clusters, plus other stuff, on the same resources. I'd like to be able
> to do this with a HA configuration as close as possible to something we
> would run in production.
> I've successfully set up a Mesos cluster with 3 masters and 4 slaves,
> but I'm having trouble getting Hadoop jobs to run on top of it. I'm
> using Mesos 0.21.1 and Hadoop CDH 5.3.0. Initially I tried to follow the
> Mesosphere tutorial[1], but it looks like it is very outdated and I
> didn't get very far. Then I tried following the instructions in the
> github repo[2], but they're also less than ideal.
> I've managed to get a Hadoop jobtracker running on one of the masters, I
> can submit jobs to it and they eventually finish. The strange thing is
> that they take a really long time to start the reduce task, so much so
> that the first few times I thought it wasn't working at all. Here's part
> of the output for a simple wordcount example:
> 15/01/29 16:37:58 INFO mapred.JobClient:  map 0% reduce 0%
> 15/01/29 16:39:23 INFO mapred.JobClient:  map 25% reduce 0%
> 15/01/29 16:39:31 INFO mapred.JobClient:  map 50% reduce 0%
> 15/01/29 16:39:34 INFO mapred.JobClient:  map 75% reduce 0%
> 15/01/29 16:39:37 INFO mapred.JobClient:  map 100% reduce 0%
> 15/01/29 16:56:25 INFO mapred.JobClient:  map 100% reduce 100%
> 15/01/29 16:56:29 INFO mapred.JobClient: Job complete: job_201501291533_0004
> Mesos started 3 task trackers which ran the map tasks pretty fast, but
> then it looks like it was stuck for quite a while before launching a
> fourth task tracker to run the reduce task. Is this normal, or is there
> something wrong here?
> More questions: my configuration file looks a lot like the example in
> the github repo, but that's listed as being representative of a
> pseudo-distributed configuration. What should it look like for a real
> distributed setup? How can I go about running multiple Hadoop clusters?
> Currently, all three masters have the same configuration file, so they
> all create a different framework. How should things be set up for a
> high-availability Hadoop framework that can survive the failure of a
> Master? What do I need to do to run multiple versions of Hadoop on the
> same Mesos cluster?
> I'd really appreciate any hints of documentation or tutorials I may have
> missed. Even better would be examples of Puppet configurations to set
> something like this up, but I guess that's probably unlikely.
> Thanks a lot in advance,
> Alex
> [1] https://mesosphere.com/docs/tutorials/run-hadoop-on-mesos/
> [2] https://github.com/mesos/hadoop
