Thanks Haosdent, I've updated https://gist.github.com/briantopping/311960f8e5454dbe9aab with the output logs of what I am currently seeing. I've edited them for size: the message "INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://10.211.55.16:50060" appeared a few thousand times in the logs. The configuration I have is probably still broken; 50060 is a Jetty port that returns a Cloudera string when I telnet to it.
The errors I saw below were apparently the result of building against the older version of CDH. When I updated the hadoop-mesos POM to match my deployment version, the incorrectly calculated "slots" problem in my previous message was resolved.

My current problem is a Hadoop logging problem and has nothing to do with Mesos, so I didn't post about it. I changed hadoop.log.dir=/var/log/hadoop in /etc/hadoop/conf.pseudo.mr1/log4j.properties, but it didn't make any difference. Just getting back into it now.

> On May 8, 2015, at 1:56 PM, haosdent <[email protected]> wrote:
>
> Could you post the log in executors which run jobtracker and tasktrackers? It
> would be helpful to find the cause of this problem.
>
> On Fri, May 8, 2015 at 3:05 AM, Brian Topping <[email protected]
> <mailto:[email protected]>> wrote:
> I think there's something weird here:
>> cpus: offered 2.0 needed at least 1.0
>> mem : offered 1724.0 needed at least 1024.0
>> disk: offered 44124.0 needed at least 1024.0
>> ports: at least 2 (sufficient)
>
> Am I misreading this? All of the requirements seem to be met.
>
> Presumably it's this code from o.a.h.mapred.ResourcePolicyVariable:
>
>> int slots = mapSlotsMax + reduceSlotsMax;
>> slots = (int) Math.min(slots, (cpus - containerCpus) / slotCpus);
>> slots = (int) Math.min(slots, (mem - containerMem) / slotMem);
>> slots = (int) Math.min(slots, (disk - containerDisk) / slotDisk);
>>
>> // Is this offer too small for even the minimum slots?
>> if (slots < 1) {
>> return false;
>> }
>
> Not exactly sure what this is doing.
>
> Sorry for the noise.
>
>>
>> On May 7, 2015, at 6:32 PM, Brian Topping <[email protected]
>> <mailto:[email protected]>> wrote:
>>
>> Presumably https://gist.github.com/briantopping/311960f8e5454dbe9aab
>> <https://gist.github.com/briantopping/311960f8e5454dbe9aab> has some more
>> information necessary at this point... sorry for the omission..
>>
>>> On May 7, 2015, at 6:05 PM, Tom Arnfeld <[email protected]
>>> <mailto:[email protected]>> wrote:
>>>
>>> Hi Brian,
>>>
>>> At this point you should see the TT attempting to be launched via Mesos.
>>> The "launched but not heartbeat yet" count tells us that the framework has
>>> accepted resources for 4 slots but the TT hasn't actually come up yet.
>>>
>>> Do you see the task in your Mesos cluster UI, and is there anything
>>> interesting in the task logs?
>>>
>>> --
>>>
>>> Tom Arnfeld
>>> Developer // DueDil
>>>
>>> (+44) 7525940046 <tel:%28%2B44%29%207525940046>
>>> 25 Christopher Street, London, EC2A 2BS
>>>
>>>
>>> On Thu, May 7, 2015 at 12:01 PM, Brian Topping <[email protected]
>>> <mailto:[email protected]>> wrote:
>>>
>>> Thanks guys, this was helpful. I started the job tracker as a service, but
>>> apparently I never started the task tracker (or it failed to start and I
>>> didn't notice). I started it after Haosdent's message, but wasn't able to
>>> see any difference and I kept poking around.
>>>
>>> After making some changes and the VM wouldn't boot, my OCD got the better
>>> of me and I reinstalled everything from scratch. There are just too many
>>> moving parts to hassle you guys with an imperfect install on my end.
>>>
>>> This time through, I felt a lot more confident to use the Mesosphere RPMs,
>>> but I couldn't find the best way to get things launched.
>>> https://docs.mesosphere.com/reference/packages/
>>> <https://docs.mesosphere.com/reference/packages/> has a Last-Modified of
>>> Fri, 01 May 2015 18:46:10 GMT (one week ago), but the RHEL 6 RPMs don't
>>> have any init.d service descriptions as the packages page would indicate.
>>> For now, I just launched them manually, but would like to get the machine
>>> to completely load on boot as services.
>>>
>>> At this point, I have tested Mesos with:
>>>
>>> mesos-execute --master="localhost:5050" --name="test-exec"
>>> --command="sleep 10"
>>>
>>> The only problem there is it seems that "localhost" isn't good enough for
>>> my install, it needs to be the FQDN, but it works and the job flows through
>>> the UI.
>>>
>>> Now, back to a hadoop job. When I try the job now, the logs show the
>>> following stream of repeated messages:
>>>
>>>> 2015-05-07 17:52:53,124 INFO org.apache.hadoop.mapred.ResourcePolicy:
>>>> Satisfied map and reduce slots needed.
>>>> 2015-05-07 17:52:53,340 INFO org.apache.hadoop.mapred.MesosScheduler:
>>>> Unknown/exited TaskTracker: http://10.211.55.16:50060
>>>> <http://10.211.55.16:50060/>.
>>>> [Repeated a few times a second for five seconds]
>>>> 2015-05-07 17:49:08,914 INFO org.apache.hadoop.mapred.ResourcePolicy:
>>>> JobTracker Status
>>>> Pending Map Tasks: 4
>>>> Pending Reduce Tasks: 1
>>>> Running Map Tasks: 0
>>>> Running Reduce Tasks: 0
>>>> Idle Map Slots: 0
>>>> Idle Reduce Slots: 0
>>>> Inactive Map Slots: 4 (launched but no hearbeat yet)
>>>> Inactive Reduce Slots: 1 (launched but no hearbeat yet)
>>>> Needed Map Slots: 0
>>>> Needed Reduce Slots: 0
>>>> Unhealthy Trackers: 0
>>>
>>> This looks close.
>>>
>>> What's the best way to get a JDWP port set up to break in this code (i.e.
>>> learning to fish...)?
>>>
>>> best, Brian
>>>
>>>
>>>> On May 7, 2015, at 12:11 PM, Adam Bordelon <[email protected]
>>>> <mailto:[email protected]>> wrote:
>>>>
>>>> From the mesos-master log and the JT log, it doesn't look like the
>>>> MesosScheduler ever registered with Mesos, which should mean that it
>>>> wouldn't start any TTs or map/reduce tasks. However, your `ps` output does
>>>> seem to show a tasktracker running. Did you start that yourself (or
>>>> automatically as a system service)?
>>>>
>>>> On Wed, May 6, 2015 at 9:32 AM, haosdent <[email protected]
>>>> <mailto:[email protected]>> wrote:
>>>> Do you start tasktracker successfully?
>>>>
>>>> On Wed, May 6, 2015 at 11:32 PM, Brian Topping <[email protected]
>>>> <mailto:[email protected]>> wrote:
>>>> Hi all, I'm happy to report that I'm very close to getting 2.6.0-cdh5.4.0
>>>> integrated against Mesos 0.22.1 with the hadoop-mesos 0.10 code on Github.
>>>> Hoping someone might have a few minutes to parse what I've got here and
>>>> suggest something to try.
>>>>
>>>> https://gist.github.com/briantopping/0dfd0777ff4ce5a81219
>>>> <https://gist.github.com/briantopping/0dfd0777ff4ce5a81219> hopefully has
>>>> all the data necessary between the console output of the client run, the
>>>> mesos master and slave console, the XML configuration of the JT and the
>>>> output that was generated by it. Please let me know if I've left something
>>>> out.
>>>>
>>>> I iterated a few times getting all the errors from missing paths or
>>>> libraries sorted out, but the example client ultimately just sits waiting
>>>> forever at "map 0% reduce 0%".
>>>>
>>>> Any input kindly appreciated!
>>>>
>>>> Brian
>>>>
>>>>
>>>>
>>>> --
>>>> Best Regards,
>>>> Haosdent Huang
>>>>
>>>
>>> <signature.asc>
>>>
>>
>
>
>
>
> --
> Best Regards,
> Haosdent Huang
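For anyone puzzling over the slot arithmetic quoted earlier in the thread, here is a minimal standalone sketch of the same calculation. The offer values (2.0 cpus, 1724.0 mem, 44124.0 disk) come from the log lines above. Everything else is an assumption for illustration only: the class name SlotMath, the method slotsForOffer, the 4 map + 1 reduce slot maximum, and the per-container and per-slot sizes of 1 CPU / 1024 MB mem / 1024 MB disk are hypothetical and are not confirmed to be hadoop-mesos defaults or the configuration used here. It is not a claim about what actually went wrong in this install; it only shows one way the check can reject an offer that looks sufficient, because the container overhead is subtracted before dividing by the slot size.

```java
// Standalone sketch of the slot arithmetic quoted from ResourcePolicyVariable.
// SlotMath and slotsForOffer are hypothetical names used for illustration.
class SlotMath {

    // Returns how many slots fit in an offer once the fixed container
    // (TaskTracker) overhead has been subtracted from each resource.
    static int slotsForOffer(int mapSlotsMax, int reduceSlotsMax,
                             double cpus, double mem, double disk,
                             double containerCpus, double containerMem, double containerDisk,
                             double slotCpus, double slotMem, double slotDisk) {
        int slots = mapSlotsMax + reduceSlotsMax;
        // Each cast truncates toward zero, so a fractional slot counts as none.
        slots = (int) Math.min(slots, (cpus - containerCpus) / slotCpus);
        slots = (int) Math.min(slots, (mem - containerMem) / slotMem);
        slots = (int) Math.min(slots, (disk - containerDisk) / slotDisk);
        return slots;
    }

    public static void main(String[] args) {
        // Offer values are from the thread; all other values are assumed.
        int slots = slotsForOffer(4, 1,
                2.0, 1724.0, 44124.0,   // offered cpus, mem, disk
                1.0, 1024.0, 1024.0,    // assumed container overhead
                1.0, 1024.0, 1024.0);   // assumed per-slot requirement
        // Mirrors the "if (slots < 1) return false;" check in the quoted code.
        System.out.println("slots = " + slots + ", offer accepted: " + (slots >= 1));
    }
}
```

With these assumed sizes, memory is the limiting resource: (1724 - 1024) / 1024 is about 0.68, which truncates to 0 slots, so the offer would be declined even though every individual "needed at least" figure in the log is satisfied.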

