Hi all,

Now that I can get to the sandbox view in the Mesos UI, I see the following message (the same error appears for every slave's sandbox):
"Failed to connect to slave '20150115-144719-3205108908-5050-4552-S0' on 'centos-2.local:5051'.
Potential reasons:
- The slave's hostname, 'centos-2.local', is not accessible from your network
- The slave's port, '5051', is not accessible from your network"

I checked that:
- slave centos-2.local can be logged into from any machine in the cluster without a password, using "ssh centos-2.local";
- port 5051 on slave centos-2.local can be reached from the master, using "telnet centos-2.local 5051".

I'm confused about what the problem is here.

Cheers,
Dan

2015-01-14 15:33 GMT-06:00 Brenden Matthews <[email protected]>:

> Would need the task logs from the slave which the TaskTracker was launched
> on, to debug this further.
>
> On Wed, Jan 14, 2015 at 1:28 PM, Dan Dong <[email protected]> wrote:
>
>> Checked that /etc/hosts is correct; master and slave can ssh into each
>> other by hostname without a password, and hadoop runs well without mesos,
>> but it gets stuck when running on mesos.
>>
>> Cheers,
>> Dan
>>
>> 2015-01-14 15:02 GMT-06:00 Brenden Matthews <[email protected]>:
>>
>>> At a first glance, it looks like `/etc/hosts` might be set incorrectly
>>> and it cannot resolve the hostname of the worker.
>>>
>>> See here for more: https://wiki.apache.org/hadoop/UnknownHost
>>>
>>> On Wed, Jan 14, 2015 at 12:32 PM, Vinod Kone <[email protected]> wrote:
>>>
>>>> What do the master logs say?
>>>>
>>>> On Wed, Jan 14, 2015 at 12:21 PM, Dan Dong <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>> When I run hadoop jobs on Mesos (0.21.0), the jobs are stuck forever:
>>>>> 15/01/14 13:59:30 INFO mapred.FileInputFormat: Total input paths to process : 8
>>>>> 15/01/14 13:59:30 INFO mapred.JobClient: Running job: job_201501141358_0001
>>>>> 15/01/14 13:59:31 INFO mapred.JobClient: map 0% reduce 0%
>>>>>
>>>>> From the jobtracker log I see:
>>>>> 2015-01-14 13:59:35,542 INFO org.apache.hadoop.mapred.ResourcePolicy: Launching task Task_Tracker_0 on http://centos-2.local:31911 with mapSlots=1 reduceSlots=0
>>>>> 2015-01-14 14:04:35,552 WARN org.apache.hadoop.mapred.MesosScheduler: Tracker http://centos-2.local:31911 failed to launch within 300 seconds, killing it
>>>>>
>>>>> I manually started the namenode and jobtracker on the master node and
>>>>> the datanode on the slave, but I could not see a tasktracker started by
>>>>> mesos on the slave. Note that if I run hadoop directly without Mesos (of
>>>>> course the conf files are different and the tasktracker is started
>>>>> manually on the slave), everything works fine. Any hints?
>>>>>
>>>>> Cheers,
>>>>> Dan
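
The two checks described in the newest message (does the hostname resolve, and does port 5051 accept a TCP connection) can be repeated from any machine with a short script. The following is only a minimal Python sketch: `check_endpoint` is a hypothetical helper name, not part of Mesos or Hadoop, and the 3-second timeout is an arbitrary choice. Assuming the UI's sandbox view is fetched by the browser directly from the slave, it may be worth running this on the machine where the browser runs, not only on the master.

```python
import socket


def check_endpoint(host, port, timeout=3.0):
    """Return (resolved_ip, reachable) for host:port as seen from THIS machine.

    resolved_ip is None when the hostname cannot be resolved.
    reachable is True only if a TCP connection to the port succeeds.
    """
    # Step 1: hostname resolution (the first "potential reason" in the error).
    try:
        ip = socket.gethostbyname(host)
    except OSError:
        return None, False

    # Step 2: TCP reachability of the port (the second "potential reason").
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return ip, True
    except OSError:
        return ip, False
```

For example, `check_endpoint("centos-2.local", 5051)` returning `(None, False)` would point at name resolution on that machine, while an IP address paired with `False` would point at the port being blocked or not listening.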

