Could you post the logs from the executors that run the JobTracker and the TaskTrackers? They would help in finding the cause of this problem.
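On the slot check quoted below from ResourcePolicyVariable: here is a minimal sketch of that arithmetic, in case it helps explain why an offer that looks large enough can still be declined. The class and method names (`SlotCheck`, `slots`) are hypothetical, and the container and per-slot values passed in `main` are assumptions for illustration only, not values read from your configuration. Only the offered amounts (2.0 cpus, 1724.0 mem, 44124.0 disk) and the pending counts (4 map, 1 reduce) come from your output.

```java
public class SlotCheck {

    /**
     * Number of slots an offer can hold, following the arithmetic quoted
     * from ResourcePolicyVariable: the TaskTracker container overhead is
     * subtracted first, then the remainder is divided by the per-slot need.
     */
    static int slots(int mapSlotsMax, int reduceSlotsMax,
                     double cpus, double mem, double disk,
                     double containerCpus, double containerMem, double containerDisk,
                     double slotCpus, double slotMem, double slotDisk) {
        int slots = mapSlotsMax + reduceSlotsMax;
        slots = (int) Math.min(slots, (cpus - containerCpus) / slotCpus);
        slots = (int) Math.min(slots, (mem - containerMem) / slotMem);
        slots = (int) Math.min(slots, (disk - containerDisk) / slotDisk);
        return slots;
    }

    public static void main(String[] args) {
        // Offered resources are taken from the thread; the container and
        // per-slot figures below are ASSUMED values, chosen for illustration.
        int s = slots(4, 1,
                2.0, 1724.0, 44124.0,   // offered cpus, mem, disk
                1.0, 1024.0, 1024.0,    // assumed container overhead
                1.0, 1024.0, 1024.0);   // assumed per-slot requirement
        System.out.println("slots = " + s
                + (s < 1 ? " (offer declined)" : " (offer usable)"));
    }
}
```

Under those assumed values, memory is the limiting resource: whatever is left after the container overhead is smaller than one slot, so the cast truncates the result to zero and the `slots < 1` branch rejects the offer. If your actual container and slot settings differ, the outcome may differ too, so it is worth checking them against the offer.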

On Fri, May 8, 2015 at 3:05 AM, Brian Topping <[email protected]> wrote:

> I think there's something weird here:
>
> cpus: offered 2.0 needed at least 1.0
> mem : offered 1724.0 needed at least 1024.0
> disk: offered 44124.0 needed at least 1024.0
> ports: at least 2 (sufficient)
>
> Am I misreading this? All of the requirements seem to be met.
>
> Presumably it's this code from o.a.h.mapred.ResourcePolicyVariable:
>
>     int slots = mapSlotsMax + reduceSlotsMax;
>     slots = (int) Math.min(slots, (cpus - containerCpus) / slotCpus);
>     slots = (int) Math.min(slots, (mem - containerMem) / slotMem);
>     slots = (int) Math.min(slots, (disk - containerDisk) / slotDisk);
>
>     // Is this offer too small for even the minimum slots?
>     if (slots < 1) {
>         return false;
>     }
>
> Not exactly sure what this is doing.
>
> Sorry for the noise.
>
>
> On May 7, 2015, at 6:32 PM, Brian Topping <[email protected]> wrote:
>
> Presumably https://gist.github.com/briantopping/311960f8e5454dbe9aab has
> some more information necessary at this point... sorry for the omission..
>
> On May 7, 2015, at 6:05 PM, Tom Arnfeld <[email protected]> wrote:
>
> Hi Brian,
>
> At this point you should see the TT attempting to be launched via Mesos.
> The "launched but not heartbeat yet" count tells us that the framework has
> accepted resources for 4 slots but the TT hasn't actually come up yet.
>
> Do you see the task in your Mesos cluster UI, and is there anything
> interesting in the task logs?
>
> --
>
> Tom Arnfeld
> Developer // DueDil
>
> (+44) 7525940046
> 25 Christopher Street, London, EC2A 2BS
>
>
> On Thu, May 7, 2015 at 12:01 PM, Brian Topping <[email protected]>
> wrote:
>
>> Thanks guys, this was helpful. I started the job tracker as a service,
>> but apparently I never started the task tracker (or it failed to start and
>> I didn't notice). I started it after Haosdent's message, but wasn't able to
>> see any difference and I kept poking around.
>>
>> After making some changes and the VM wouldn't boot, my OCD got the better
>> of me and I reinstalled everything from scratch. There are just too many
>> moving parts to hassle you guys with an imperfect install on my end.
>>
>> This time through, I felt a lot more confident to use the Mesosphere
>> RPMs, but I couldn't find the best way to get things launched.
>> https://docs.mesosphere.com/reference/packages/ has a Last-Modified
>> of Fri, 01 May 2015 18:46:10 GMT (one week ago), but the RHEL 6 RPMs don't
>> have any init.d service descriptions as the packages page would indicate.
>> For now, I just launched them manually, but would like to get the machine
>> to completely load on boot as services.
>>
>> At this point, I have tested Mesos with:
>>
>> mesos-execute
>> --master="localhost:5050" --name="test-exec" --command="sleep 10"
>>
>> The only problem there is it seems that "localhost" isn't good enough for
>> my install, it needs to be the FQDN, but it works and the job flows through
>> the UI.
>>
>> Now, back to a hadoop job. When I try the job now, the logs show the
>> following stream of repeated messages:
>>
>> 2015-05-07 17:52:53,124 INFO org.apache.hadoop.mapred.ResourcePolicy:
>> Satisfied map and reduce slots needed.
>> 2015-05-07 17:52:53,340 INFO org.apache.hadoop.mapred.MesosScheduler:
>> Unknown/exited TaskTracker: http://10.211.55.16:50060.
>> [Repeated a few times a second for five seconds]
>>
>> 2015-05-07 17:49:08,914 INFO org.apache.hadoop.mapred.ResourcePolicy:
>> JobTracker Status
>>
>> Pending Map Tasks: 4
>> Pending Reduce Tasks: 1
>> Running Map Tasks: 0
>> Running Reduce Tasks: 0
>> Idle Map Slots: 0
>> Idle Reduce Slots: 0
>> Inactive Map Slots: 4 (launched but no hearbeat yet)
>> Inactive Reduce Slots: 1 (launched but no hearbeat yet)
>> Needed Map Slots: 0
>> Needed Reduce Slots: 0
>> Unhealthy Trackers: 0
>>
>>
>> This looks close.
>>
>> What's the best way to get a JDWP port set up to break in this code (i.e.
>> learning to fish...)?
>>
>> best, Brian
>>
>>
>> On May 7, 2015, at 12:11 PM, Adam Bordelon <[email protected]> wrote:
>>
>> From the mesos-master log and the JT log, it doesn't look like the
>> MesosScheduler ever registered with Mesos, which should mean that it
>> wouldn't start any TTs or map/reduce tasks. However, your `ps` output does
>> seem to show a tasktracker running. Did you start that yourself (or
>> automatically as a system service)?
>>
>> On Wed, May 6, 2015 at 9:32 AM, haosdent <[email protected]> wrote:
>>
>>> Did you start the tasktracker successfully?
>>>
>>> On Wed, May 6, 2015 at 11:32 PM, Brian Topping <[email protected]>
>>> wrote:
>>>
>>>> Hi all, I'm happy to report that I'm very close to
>>>> getting 2.6.0-cdh5.4.0 integrated against Mesos 0.22.1 with the
>>>> hadoop-mesos 0.10 code on Github. Hoping someone might have a few minutes
>>>> to parse what I've got here and suggest something to try.
>>>>
>>>> https://gist.github.com/briantopping/0dfd0777ff4ce5a81219 hopefully
>>>> has all the data necessary between the console output of the client run,
>>>> the mesos master and slave console, the XML configuration of the JT and the
>>>> output that was generated by it. Please let me know if I've left something
>>>> out.
>>>>
>>>> I iterated a few times getting all the errors from missing paths or
>>>> libraries sorted out, but the example client ultimately just sits waiting
>>>> forever at "map 0% reduce 0%".
>>>>
>>>> Any input kindly appreciated!
>>>>
>>>> Brian
>>>>
>>>
>>> --
>>> Best Regards,
>>> Haosdent Huang
>>>
>>
>

--
Best Regards,
Haosdent Huang

