Re: hadoop job stuck.

craig mcmillan Fri, 16 Jan 2015 03:30:20 -0800

dan,

to function correctly, the mesos web ui requires that the slave ip addresses the master uses are directly accessible from your browser: it doesn't work if your slaves are not accessible to your browser on the same ip address that the master uses, such as happens when the master uses a private ip address for the slaves

you can get the slave logs directly by logging onto the slave and looking in /tmp/mesos/slaves/ (at least, this is where they are by default in the 0.21.0-1.0.ubuntu1404 i am using), then following the rest of the path from the url in the web ui
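
A rough sketch of that lookup in Python, to be run on the slave itself. It assumes the default work directory /tmp/mesos mentioned above and that each sandbox keeps its task output in files named stdout and stderr somewhere below slaves/; the function name find_sandbox_logs is made up for illustration and is not part of mesos.

```python
import os


def find_sandbox_logs(work_dir="/tmp/mesos"):
    """Return sorted paths of stdout/stderr files below <work_dir>/slaves.

    Assumption: work_dir is the slave's work directory (default /tmp/mesos)
    and sandboxes store task output in files named 'stdout' and 'stderr'.
    """
    root = os.path.join(work_dir, "slaves")
    found = []
    # os.walk yields nothing if root does not exist and does not follow
    # symlinked directories, so each log file is reported once.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name in ("stdout", "stderr"):
                found.append(os.path.join(dirpath, name))
    return sorted(found)
```

Calling find_sandbox_logs() on the slave and printing the result should list the candidate log files; the one for a given task is the path whose tail matches the url shown in the web ui.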


:craig


On 16 Jan 2015, at 11:20, Dick Davies wrote:

To view the slave's logs, you need to be able to connect to that URL
from your browser, not from the master
(the data is read directly from the slave by your browser; it doesn't
go via the master).
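
One way to test this is to run a small check on the machine where the browser runs, rather than on the master. The Python sketch below is only an illustration: the function name check_slave_reachable is hypothetical, and it only tests name resolution and a plain TCP connection, not what the web UI does beyond that.

```python
import socket


def check_slave_reachable(host, port, timeout=3.0):
    """Return (ok, detail) for a TCP connection attempt to host:port."""
    # Step 1: can this machine resolve the slave's hostname at all?
    try:
        socket.getaddrinfo(host, port)
    except socket.gaierror as exc:
        return False, f"cannot resolve {host}: {exc}"
    # Step 2: can this machine open a TCP connection to the slave's port?
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, f"connected to {host}:{port}"
    except OSError as exc:
        return False, f"resolved {host} but cannot connect to port {port}: {exc}"
```

For the case in this thread the call would be check_slave_reachable("centos-2.local", 5051). If it fails on the browser's machine while telnet from the master succeeds, the two "potential reasons" in the web UI message apply to the browser's network, not the cluster's.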


On 15 January 2015 at 21:42, Dan Dong <[email protected]> wrote:
Hi, All,
Now the sandbox can be viewed on the mesos UI, and I see the following info (the
same error appears on every slave sandbox):
"Failed to connect to slave '20150115-144719-3205108908-5050-4552-S0' on
'centos-2.local:5051'.

Potential reasons:
The slave's hostname, 'centos-2.local', is not accessible from your network
The slave's port, '5051', is not accessible from your network"


I checked that:
slave centos-2.local can be logged into from any machine in the cluster without
a password by "ssh centos-2.local";
port 5051 on slave centos-2.local can be connected to from the master by "telnet
centos-2.local 5051"

Confused what's the problem here?

Cheers,
Dan

2015-01-14 15:33 GMT-06:00 Brenden Matthews <[email protected]>:
Would need the task logs from the slave which the TaskTracker was launched
on, to debug this further.

On Wed, Jan 14, 2015 at 1:28 PM, Dan Dong <[email protected]> wrote:
Checked that /etc/hosts is correct; master and slave can ssh into each other by
hostname without a password, and hadoop runs well without mesos, but it
gets stuck when running on mesos.

Cheers,
Dan

2015-01-14 15:02 GMT-06:00 Brenden Matthews
<[email protected]>:
At first glance, it looks like `/etc/hosts` might be set incorrectly
and it cannot resolve the hostname of the worker.

See here for more: https://wiki.apache.org/hadoop/UnknownHost

On Wed, Jan 14, 2015 at 12:32 PM, Vinod Kone <[email protected]>
wrote:
What do the master logs say?

On Wed, Jan 14, 2015 at 12:21 PM, Dan Dong <[email protected]> wrote:
Hi,
When I run hadoop jobs on Mesos (0.21.0), the jobs are stuck
forever:
15/01/14 13:59:30 INFO mapred.FileInputFormat: Total input paths to process : 8
15/01/14 13:59:30 INFO mapred.JobClient: Running job: job_201501141358_0001
15/01/14 13:59:31 INFO mapred.JobClient:  map 0% reduce 0%

From the jobtracker log I see:
2015-01-14 13:59:35,542 INFO org.apache.hadoop.mapred.ResourcePolicy: Launching task Task_Tracker_0 on http://centos-2.local:31911 with mapSlots=1 reduceSlots=0
2015-01-14 14:04:35,552 WARN org.apache.hadoop.mapred.MesosScheduler: Tracker http://centos-2.local:31911 failed to launch within 300 seconds, killing it

I manually started the namenode and jobtracker on the master node and the
datanode on the slave, but I could not see a tasktracker started by mesos on the
slave. Note that if I run hadoop directly without Mesos (of course the conf
files are different and the tasktracker is started manually on the slave),
everything works fine. Any hints?

Cheers,
Dan
