Hi guys, I'm a Hadoop and Mesos n00b, so please be gentle. I'm trying to set up a Mesos cluster, and my ultimate goal is to introduce Mesos in my organization by showing off its ability to run multiple Hadoop clusters, plus other stuff, on the same resources. I'd like to do this with an HA configuration as close as possible to something we would run in production.
I've successfully set up a Mesos cluster with 3 masters and 4 slaves, but I'm having trouble getting Hadoop jobs to run on top of it. I'm using Mesos 0.21.1 and Hadoop CDH 5.3.0. Initially I tried to follow the Mesosphere tutorial[1], but it looks like it is very outdated and I didn't get very far. Then I tried following the instructions in the github repo[2], but they're also less than ideal.

I've managed to get a Hadoop jobtracker running on one of the masters, and I can submit jobs to it and they eventually finish. The strange thing is that they take a really long time to start the reduce task, so much so that the first few times I thought it wasn't working at all. Here's part of the output for a simple wordcount example:

15/01/29 16:37:58 INFO mapred.JobClient: map 0% reduce 0%
15/01/29 16:39:23 INFO mapred.JobClient: map 25% reduce 0%
15/01/29 16:39:31 INFO mapred.JobClient: map 50% reduce 0%
15/01/29 16:39:34 INFO mapred.JobClient: map 75% reduce 0%
15/01/29 16:39:37 INFO mapred.JobClient: map 100% reduce 0%
15/01/29 16:56:25 INFO mapred.JobClient: map 100% reduce 100%
15/01/29 16:56:29 INFO mapred.JobClient: Job complete: job_201501291533_0004

Mesos started 3 task trackers which ran the map tasks pretty fast, but then it looks like it was stuck for quite a while before launching a fourth task tracker to run the reduce task. Is this normal, or is there something wrong here?

More questions:

- My configuration file looks a lot like the example in the github repo, but that's listed as being representative of a pseudo-distributed configuration. What should it look like for a real distributed setup?
- How can I go about running multiple Hadoop clusters? Currently, all three masters have the same configuration file, so they all create a different framework.
- How should things be set up for a high-availability Hadoop framework that can survive the failure of a master?
- What do I need to do to run multiple versions of Hadoop on the same Mesos cluster?
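For reference, the Mesos-specific part of my mapred-site.xml is essentially the pseudo-distributed example from the github repo[2]; it looks roughly like this (the hostnames, ZooKeeper quorum, and tarball path below are placeholders for my setup, not literal values):

```xml
<configuration>
  <!-- Hand the JobTracker's task scheduling over to the Mesos scheduler -->
  <property>
    <name>mapred.jobtracker.taskScheduler</name>
    <value>org.apache.hadoop.mapred.MesosScheduler</value>
  </property>
  <!-- The underlying Hadoop scheduler that MesosScheduler delegates to -->
  <property>
    <name>mapred.mesos.taskScheduler</name>
    <value>org.apache.hadoop.mapred.JobQueueTaskScheduler</value>
  </property>
  <!-- How the framework finds the Mesos masters (placeholder ZK quorum) -->
  <property>
    <name>mapred.mesos.master</name>
    <value>zk://master1:2181,master2:2181,master3:2181/mesos</value>
  </property>
  <!-- Hadoop tarball the slaves fetch to launch TaskTrackers (placeholder path) -->
  <property>
    <name>mapred.mesos.executor.uri</name>
    <value>hdfs://namenode:9000/hadoop-2.5.0-cdh5.3.0.tar.gz</value>
  </property>
</configuration>
```

All three masters currently share this same file, which I suspect is part of why each one registers its own framework.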
I'd really appreciate any pointers to documentation or tutorials I may have missed. Even better would be examples of Puppet configurations to set something like this up, but I guess that's probably unlikely.

Thanks a lot in advance,
Alex

[1] https://mesosphere.com/docs/tutorials/run-hadoop-on-mesos/
[2] https://github.com/mesos/hadoop

