Re: hadoop job stuck.

Dan Dong Thu, 15 Jan 2015 13:44:00 -0800

Hi all,
  The sandbox can now be viewed in the Mesos UI, and I see the following
message (the same error appears on every slave's sandbox):


"Failed to connect to slave '20150115-144719-3205108908-5050-4552-S0' on
'centos-2.local:5051'.
Potential reasons:

   - The slave's hostname, 'centos-2.local', is not accessible from your
   network
   - The slave's port, '5051', is not accessible from your network"


I checked that:

   - slave centos-2.local can be logged into from any machine in the
   cluster without a password using "ssh centos-2.local";
   - port 5051 on slave centos-2.local can be reached from the master
   using "telnet centos-2.local 5051".
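
Both checks above were run from machines inside the cluster, while the error text refers to "your network", so it may be worth repeating them from the machine where the browser runs. A minimal sketch, assuming Python 3 is available there; `check_endpoint` is a hypothetical helper name for illustration, not part of Mesos or Hadoop:

```python
import socket


def check_endpoint(host, port, timeout=3.0):
    """Return (resolved_ip, reachable) for host:port as seen from this machine."""
    try:
        # Resolve the name the same way a local client would
        # (/etc/hosts, DNS, and so on, depending on the system setup).
        ip = socket.gethostbyname(host)
    except socket.gaierror:
        # The name does not resolve on this machine.
        return None, False
    try:
        # Plain TCP connect, comparable to "telnet host port".
        with socket.create_connection((ip, port), timeout=timeout):
            return ip, True
    except OSError:
        # Resolved, but the port refused the connection or timed out.
        return ip, False
```

Calling `check_endpoint("centos-2.local", 5051)` returns `(None, False)` when the name does not resolve on that machine, and an address paired with `False` when the name resolves but the port cannot be reached.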


I'm confused about what the problem is here.

Cheers,
Dan


2015-01-14 15:33 GMT-06:00 Brenden Matthews <[email protected]>:

> I would need the task logs from the slave on which the TaskTracker was
> launched to debug this further.
>
> On Wed, Jan 14, 2015 at 1:28 PM, Dan Dong <[email protected]> wrote:
>
>> I checked that /etc/hosts is correct. The master and slave can ssh to each
>> other by hostname without a password, and Hadoop runs well without Mesos,
>> but it gets stuck when running on Mesos.
>>
>> Cheers,
>> Dan
>>
>> 2015-01-14 15:02 GMT-06:00 Brenden Matthews <[email protected]>:
>>
>>> At first glance, it looks like `/etc/hosts` might be set incorrectly
>>> and it cannot resolve the hostname of the worker.
>>>
>>> See here for more: https://wiki.apache.org/hadoop/UnknownHost
>>>
>>> On Wed, Jan 14, 2015 at 12:32 PM, Vinod Kone <[email protected]>
>>> wrote:
>>>
>>>> What do the master logs say?
>>>>
>>>> On Wed, Jan 14, 2015 at 12:21 PM, Dan Dong <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>   When I run Hadoop jobs on Mesos (0.21.0), the jobs hang forever:
>>>>> 15/01/14 13:59:30 INFO mapred.FileInputFormat: Total input paths to
>>>>> process : 8
>>>>> 15/01/14 13:59:30 INFO mapred.JobClient: Running job:
>>>>> job_201501141358_0001
>>>>> 15/01/14 13:59:31 INFO mapred.JobClient:  map 0% reduce 0%
>>>>>
>>>>> From jobtracker log I see:
>>>>> 2015-01-14 13:59:35,542 INFO org.apache.hadoop.mapred.ResourcePolicy:
>>>>> Launching task Task_Tracker_0 on http://centos-2.local:31911 with
>>>>> mapSlots=1 reduceSlots=0
>>>>> 2015-01-14 14:04:35,552 WARN org.apache.hadoop.mapred.MesosScheduler:
>>>>> Tracker http://centos-2.local:31911 failed to launch within 300
>>>>> seconds, killing it
>>>>>
>>>>>  I manually started the namenode and jobtracker on the master node and
>>>>> the datanode on the slave, but I could not see a tasktracker started by
>>>>> Mesos on the slave. Note that if I run Hadoop directly without Mesos (of
>>>>> course the conf files are different and the tasktracker is started
>>>>> manually on the slave), everything works fine. Any hints?
>>>>>
>>>>> Cheers,
>>>>> Dan
>>>>>
>>>>
>>>>
>>>
>>
>
