Re: hadoop job stuck.

Dick Davies Fri, 16 Jan 2015 03:22:12 -0800

To view the slave's logs, you need to be able to connect to that URL
from the machine running your browser, not just from the master
(the data is read directly from the slave by your browser; it doesn't
go via the master).
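
One way to check this from the machine where the browser runs is a small
script like the sketch below. It assumes Python 3 is available on that
machine; check_endpoint is just an illustrative helper name, not part of
Mesos. It reports separately whether the hostname resolves and whether
the port accepts a TCP connection, which are the two "potential reasons"
listed in the error.

```python
import socket


def check_endpoint(host, port, timeout=3.0):
    """Return (resolves, connects) for host:port as seen from this machine."""
    try:
        # Step 1: can this machine resolve the hostname at all?
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return (False, False)  # name does not resolve from here
    try:
        # Step 2: can this machine open a TCP connection to the port?
        with socket.create_connection((host, port), timeout=timeout):
            return (True, True)
    except OSError:
        return (True, False)  # name resolves, but the port is not reachable
```

Calling check_endpoint("centos-2.local", 5051) on the browser's machine
(hostname and port taken from the error message quoted below) should give
(True, True) if the UI is able to fetch the sandbox; (False, False) would
point at name resolution, and (True, False) at a firewall or routing issue.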



On 15 January 2015 at 21:42, Dan Dong <[email protected]> wrote:
> Hi, All,
>   Now that the sandbox can be viewed in the Mesos UI, I see the following
> message (the same error appears for every slave's sandbox):
>
> "Failed to connect to slave '20150115-144719-3205108908-5050-4552-S0' on
> 'centos-2.local:5051'.
>
> Potential reasons:
>
> The slave's hostname, 'centos-2.local', is not accessible from your network
> The slave's port, '5051', is not accessible from your network"
>
>
> I checked that:
> I can log in to slave centos-2.local from any machine in the cluster
> without a password using "ssh centos-2.local";
> port 5051 on slave centos-2.local can be reached from the master using
> "telnet centos-2.local 5051".
>
> So I'm confused: what is the problem here?
>
> Cheers,
> Dan
>
>
>
> 2015-01-14 15:33 GMT-06:00 Brenden Matthews <[email protected]>:
>
>> Would need the task logs from the slave which the TaskTracker was launched
>> on, to debug this further.
>>
>> On Wed, Jan 14, 2015 at 1:28 PM, Dan Dong <[email protected]> wrote:
>>>
>>> I checked that /etc/hosts is correct. Master and slave can ssh to each
>>> other by hostname without a password, and hadoop runs well without mesos,
>>> but it gets stuck when running on mesos.
>>>
>>> Cheers,
>>> Dan
>>>
>>> 2015-01-14 15:02 GMT-06:00 Brenden Matthews
>>> <[email protected]>:
>>>
>>>> At first glance, it looks like `/etc/hosts` might be set incorrectly,
>>>> so the hostname of the worker cannot be resolved.
>>>>
>>>> See here for more: https://wiki.apache.org/hadoop/UnknownHost
>>>>
>>>> On Wed, Jan 14, 2015 at 12:32 PM, Vinod Kone <[email protected]>
>>>> wrote:
>>>>>
>>>>> What do the master logs say?
>>>>>
>>>>> On Wed, Jan 14, 2015 at 12:21 PM, Dan Dong <[email protected]> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>   When I run hadoop jobs on Mesos (0.21.0), the jobs are stuck
>>>>>> forever:
>>>>>> 15/01/14 13:59:30 INFO mapred.FileInputFormat: Total input paths to
>>>>>> process : 8
>>>>>> 15/01/14 13:59:30 INFO mapred.JobClient: Running job:
>>>>>> job_201501141358_0001
>>>>>> 15/01/14 13:59:31 INFO mapred.JobClient:  map 0% reduce 0%
>>>>>>
>>>>>> From jobtracker log I see:
>>>>>> 2015-01-14 13:59:35,542 INFO org.apache.hadoop.mapred.ResourcePolicy:
>>>>>> Launching task Task_Tracker_0 on http://centos-2.local:31911 with
>>>>>> mapSlots=1 reduceSlots=0
>>>>>> 2015-01-14 14:04:35,552 WARN org.apache.hadoop.mapred.MesosScheduler:
>>>>>> Tracker http://centos-2.local:31911 failed to launch within 300 seconds,
>>>>>> killing it
>>>>>>
>>>>>>  I manually started the namenode and jobtracker on the master node and
>>>>>> the datanode on the slave, but I could not see a tasktracker started by
>>>>>> mesos on the slave. Note that if I run hadoop directly without Mesos (of
>>>>>> course the conf files are different and the tasktracker is started
>>>>>> manually on the slave), everything works fine. Any hints?
>>>>>>
>>>>>> Cheers,
>>>>>> Dan
>>>>>
>>>>>
>>>>
>>>
>>
>
