Re: Debugging hadoop-mesos

haosdent Thu, 07 May 2015 23:58:24 -0700

Could you post the logs from the executors that run the JobTracker and TaskTrackers?
They would help to find the cause of this problem.


On Fri, May 8, 2015 at 3:05 AM, Brian Topping <[email protected]>
wrote:

> I think there's something weird here:
>
>   cpus: offered 2.0 needed at least 1.0
>   mem : offered 1724.0 needed at least 1024.0
>   disk: offered 44124.0 needed at least 1024.0
>   ports:  at least 2 (sufficient)
>
>
> Am I misreading this? All of the requirements seem to be met.
>
> Presumably it's this code from o.a.h.mapred.ResourcePolicyVariable:
>
> int slots = mapSlotsMax + reduceSlotsMax;
> slots = (int) Math.min(slots, (cpus - containerCpus) / slotCpus);
> slots = (int) Math.min(slots, (mem - containerMem) / slotMem);
> slots = (int) Math.min(slots, (disk - containerDisk) / slotDisk);
>
>
> // Is this offer too small for even the minimum slots?
> if (slots < 1) {
>   return false;
> }
>
>
> Not exactly sure what this is doing.
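
On what that snippet does: it caps the slot count by each resource in turn, after subtracting a fixed per-container overhead from the offer. Below is a minimal sketch of the same arithmetic, using the cpus/mem/disk values from the offer above. The class name SlotCheckSketch, the slotsFor helper, the maximum of 5 slots, and all the container and per-slot sizes are hypothetical values chosen for illustration; they are not taken from your configuration or from the hadoop-mesos defaults.

```java
public class SlotCheckSketch {
    /**
     * Mirrors the quoted arithmetic: subtract the fixed container overhead
     * from each offered resource, divide by the per-slot size, and keep the
     * smallest result. The (int) cast truncates toward zero.
     */
    static int slotsFor(int maxSlots,
                        double cpus, double mem, double disk,
                        double containerCpus, double containerMem, double containerDisk,
                        double slotCpus, double slotMem, double slotDisk) {
        int slots = maxSlots;
        slots = (int) Math.min(slots, (cpus - containerCpus) / slotCpus);
        slots = (int) Math.min(slots, (mem - containerMem) / slotMem);
        slots = (int) Math.min(slots, (disk - containerDisk) / slotDisk);
        return slots;
    }

    public static void main(String[] args) {
        // Offer values come from the log above; every other number is assumed.
        int slots = slotsFor(5,
                2.0, 1724.0, 44124.0,   // offered cpus, mem, disk
                1.0, 1024.0, 1024.0,    // hypothetical container overhead
                1.0, 1024.0, 1024.0);   // hypothetical per-slot sizes
        System.out.println("slots = " + slots
                + (slots < 1 ? " (offer rejected)" : " (offer usable)"));
    }
}
```

With these assumed sizes, memory is the limiting resource: (1724 - 1024) / 1024 is below 1, so the cast truncates it to 0 and the offer would be rejected even though 1724 exceeds 1024. Whether that is what happens on your cluster depends on the values actually configured.
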
>
> Sorry for the noise.
>
>
> On May 7, 2015, at 6:32 PM, Brian Topping <[email protected]> wrote:
>
> Presumably https://gist.github.com/briantopping/311960f8e5454dbe9aab has
> some more information necessary at this point... sorry for the omission..
>
> On May 7, 2015, at 6:05 PM, Tom Arnfeld <[email protected]> wrote:
>
> Hi Brian,
>
> At this point you should see the TT attempting to be launched via Mesos.
> The "launched but not heartbeat yet" count tells us that the framework has
> accepted resources for 4 slots but the TT hasn't actually come up yet.
>
> Do you see the task in your Mesos cluster UI, and is there anything
> interesting in the task logs?
>
> --
>
> Tom Arnfeld
> Developer // DueDil
>
> (+44) 7525940046
> 25 Christopher Street, London, EC2A 2BS
>
>
> On Thu, May 7, 2015 at 12:01 PM, Brian Topping <[email protected]>
> wrote:
>
>> Thanks guys, this was helpful. I started the job tracker as a service,
>> but apparently I never started the task tracker (or it failed to start and
>> I didn't notice). I started it after Haosdent's message, but wasn't able to
>> see any difference and I kept poking around.
>>
>> After I made some changes and the VM wouldn't boot, my OCD got the better
>> of me and I reinstalled everything from scratch. There are just too many
>> moving parts to hassle you guys with an imperfect install on my end.
>>
>> This time through, I felt a lot more confident to use the Mesosphere
>> RPMs, but I couldn't find the best way to get things launched.
>> https://docs.mesosphere.com/reference/packages/ has a Last-Modified
>> of Fri, 01 May 2015 18:46:10 GMT (one week ago), but the RHEL 6 RPMs don't
>> have any init.d service descriptions as the packages page would indicate.
>> For now, I just launched them manually, but would like to get the machine
>> to completely load on boot as services.
>>
>> At this point, I have tested Mesos with:
>>
>>  mesos-execute --master="localhost:5050" --name="test-exec" --command="sleep 10"
>>
>> The only problem is that "localhost" doesn't seem to be good enough for
>> my install; the master needs to be given as the FQDN. With that, it works
>> and the job flows through the UI.
>>
>> Now, back to a hadoop job. When I try the job now, the logs show the
>> following stream of repeated messages:
>>
>> 2015-05-07 17:52:53,124 INFO org.apache.hadoop.mapred.ResourcePolicy: Satisfied map and reduce slots needed.
>> 2015-05-07 17:52:53,340 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://10.211.55.16:50060.
>> [Repeated a few times a second for five seconds]
>>
>> 2015-05-07 17:49:08,914 INFO org.apache.hadoop.mapred.ResourcePolicy: JobTracker Status
>>       Pending Map Tasks: 4
>>    Pending Reduce Tasks: 1
>>       Running Map Tasks: 0
>>    Running Reduce Tasks: 0
>>          Idle Map Slots: 0
>>       Idle Reduce Slots: 0
>>      Inactive Map Slots: 4 (launched but no hearbeat yet)
>>   Inactive Reduce Slots: 1 (launched but no hearbeat yet)
>>        Needed Map Slots: 0
>>     Needed Reduce Slots: 0
>>      Unhealthy Trackers: 0
>>
>>
>> This looks close.
>>
>> What's the best way to get a JDWP port set up to break in this code (i.e.
>> learning to fish...)?
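
On the JDWP question: the standard JVM option is -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005, where port 5005 is an arbitrary choice. How the option reaches the JobTracker JVM depends on the install; for MR1 it is usually appended to HADOOP_JOBTRACKER_OPTS in hadoop-env.sh, but treat that as an assumption for the CDH packaging. A small sketch for checking whether a JVM actually picked the option up follows (JdwpCheck and findJdwpArg are hypothetical names, not part of Hadoop or hadoop-mesos):

```java
import java.lang.management.ManagementFactory;
import java.util.List;

public class JdwpCheck {
    /** Returns the first JVM argument that enables JDWP, or null if none is present. */
    static String findJdwpArg(List<String> jvmArgs) {
        for (String arg : jvmArgs) {
            // -agentlib:jdwp is the current form; -Xrunjdwp is the legacy spelling.
            if (arg.startsWith("-agentlib:jdwp") || arg.startsWith("-Xrunjdwp")) {
                return arg;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Arguments the running JVM was started with (not the program arguments).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        String jdwp = findJdwpArg(jvmArgs);
        System.out.println(jdwp == null
                ? "no JDWP agent configured"
                : "JDWP enabled via: " + jdwp);
    }
}
```

Once the agent is listening, a remote debugger can attach to that port and you can set a breakpoint in the ResourcePolicy code you quoted.
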
>>
>> best, Brian
>>
>>
>>  On May 7, 2015, at 12:11 PM, Adam Bordelon <[email protected]> wrote:
>>
>> From the mesos-master log and the JT log, it doesn't look like the
>> MesosScheduler ever registered with Mesos, which should mean that it
>> wouldn't start any TTs or map/reduce tasks. However, your `ps` output does
>> seem to show a tasktracker running. Did you start that yourself (or
>> automatically as a system service)?
>>
>> On Wed, May 6, 2015 at 9:32 AM, haosdent <[email protected]> wrote:
>>
>>> Did you start the tasktracker successfully?
>>>
>>> On Wed, May 6, 2015 at 11:32 PM, Brian Topping <[email protected]>
>>> wrote:
>>>
>>>> Hi all, I'm happy to report that I'm very close to
>>>> getting Hadoop 2.6.0-cdh5.4.0 integrated against Mesos 0.22.1 with the
>>>> hadoop-mesos 0.10 code on Github. Hoping someone might have a few minutes
>>>> to parse what I've got here and suggest something to try.
>>>>
>>>>  https://gist.github.com/briantopping/0dfd0777ff4ce5a81219 hopefully
>>>> has all the data necessary between the console output of the client run,
>>>> the mesos master and slave console, the XML configuration of the JT and the
>>>> output that was generated by it. Please let me know if I've left something
>>>> out.
>>>>
>>>> I iterated a few times getting all the errors from missing paths or
>>>> libraries sorted out, but the example client ultimately just sits waiting
>>>> forever at "map 0% reduce 0%".
>>>>
>>>> Any input kindly appreciated!
>>>>
>>>> Brian
>>>>
>>>
>>>
>>>
>>>  --
>>> Best Regards,
>>> Haosdent Huang
>>>
>>
>>
>>  <signature.asc>
>
>
>
>
>


-- 
Best Regards,
Haosdent Huang
