The IP addresses are suspicious: 127.0.1.1:5051 looks like the slave is registering over localhost. Make sure you pass the correct `--ip` flag to both the master and the slave. The correct value is the IP address of the interface that connects these machines.
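A quick way to see where 127.0.1.1 might come from is to check what a machine's own hostname resolves to. The sketch below is only an illustration in Python, not part of Mesos: `is_loopback` and `resolved_address` are hypothetical helper names, and it assumes (as is common on Ubuntu) that /etc/hosts maps the hostname to 127.0.1.1, so a daemon started without `--ip` may end up advertising that address.

```python
import ipaddress
import socket


def is_loopback(address: str) -> bool:
    """Return True if the address is a loopback address (127.0.0.0/8 or ::1)."""
    return ipaddress.ip_address(address).is_loopback


def resolved_address() -> str:
    """Resolve this machine's own hostname to an IPv4 address."""
    return socket.gethostbyname(socket.gethostname())


if __name__ == "__main__":
    try:
        addr = resolved_address()
    except OSError as exc:
        # The hostname may not resolve at all on a misconfigured machine.
        print(f"could not resolve local hostname: {exc}")
    else:
        if is_loopback(addr):
            print(f"{addr} is loopback; other machines cannot reach it")
        else:
            print(f"{addr} is not loopback; check that it is the right interface")
```

If this reports a 127.x.x.x address on the master or on a slave, that machine most likely needs an explicit `--ip` set to its address on the shared network.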
If the hostname is correct

Name: hotbox-30.stanford.edu
Address: 10.79.6.70

then --ip 10.79.6.70

On 28 June 2014 02:12, Sammy Steele <[email protected]> wrote:

> Thanks so much for your help! I am completely new to Mesos, so I am not
> sure if I am interpreting your question correctly. Do you mean these logs
> (which I've attached pictures of)? Or do you mean the log generated by the
> manager which looks like this (after attempting to register one slave on
> the same computer and one on a different computer):
>
> I0627 17:01:39.385833 32259 replica.cpp:661] Replica learned TRUNCATE
> action at position 1640
> I0627 17:01:46.255574 32260 http.cpp:452] HTTP request for
> '/master/state.json'
> I0627 17:01:55.490910 32255 master.cpp:2477] Re-registering slave
> 20140627-105920-16777343-5050-32615-0 at slave(1)@127.0.1.1:5051 (
> hotbox-30.Stanford.EDU)
> I0627 17:01:55.491575 32258 registrar.cpp:422] Attempting to update the
> 'registry'
> I0627 17:01:55.493095 32260 log.cpp:680] Attempting to append 322 bytes to
> the log
> I0627 17:01:55.493242 32255 coordinator.cpp:340] Coordinator attempting to
> write APPEND action at position 1641
> I0627 17:01:55.493803 32255 replica.cpp:508] Replica received write
> request for position 1641
> I0627 17:01:55.494604 32255 leveldb.cpp:343] Persisting action (342 bytes)
> to leveldb took 738686ns
> I0627 17:01:55.494701 32255 replica.cpp:676] Persisted action at 1641
> I0627 17:01:55.495028 32258 replica.cpp:655] Replica received learned
> notice for position 1641
> I0627 17:01:55.495808 32258 leveldb.cpp:343] Persisting action (344 bytes)
> to leveldb took 699563ns
> I0627 17:01:55.495873 32258 replica.cpp:676] Persisted action at 1641
> I0627 17:01:55.495931 32258 replica.cpp:661] Replica learned APPEND action
> at position 1641
> I0627 17:01:55.496388 32261 registrar.cpp:479] Successfully updated
> 'registry'
> I0627 17:01:55.496537 32256 log.cpp:699] Attempting to truncate the log to
> 1641
> I0627 17:01:55.496665 32260 coordinator.cpp:340] Coordinator attempting to
> write TRUNCATE action at position 1642
> I0627 17:01:55.496690 32257 master.cpp:2528] Re-registered slave
> 20140627-105920-16777343-5050-32615-0 at slave(1)@127.0.1.1:5051 (
> hotbox-30.Stanford.EDU)
> I0627 17:01:55.496824 32257 master.cpp:3472] Adding slave
> 20140627-105920-16777343-5050-32615-0 at slave(1)@127.0.1.1:5051 (
> hotbox-30.Stanford.EDU) with cpus(*):8; mem(*):15024; disk(*):448079;
> ports(*):[31000-32000]
> I0627 17:01:55.497179 32256 replica.cpp:508] Replica received write
> request for position 1642
> I0627 17:01:55.497406 32257 hierarchical_allocator_process.hpp:444] Added
> slave 20140627-105920-16777343-5050-32615-0 (hotbox-30.Stanford.EDU) with
> cpus(*):8; mem(*):15024; disk(*):448079; ports(*):[31000-32000] (and
> cpus(*):8; mem(*):15024; disk(*):448079; ports(*):[31000-32000] available)
> I0627 17:01:55.497931 32256 leveldb.cpp:343] Persisting action (18 bytes)
> to leveldb took 685099ns
> I0627 17:01:55.498000 32256 replica.cpp:676] Persisted action at 1642
> I0627 17:01:55.498262 32261 replica.cpp:655] Replica received learned
> notice for position 1642
> I0627 17:01:55.499034 32261 leveldb.cpp:343] Persisting action (20 bytes)
> to leveldb took 676723ns
> I0627 17:01:55.499114 32261 leveldb.cpp:401] Deleting ~2 keys from leveldb
> took 17977ns
> I0627 17:01:55.499174 32261 replica.cpp:676] Persisted action at 1642
> I0627 17:01:55.499232 32261 replica.cpp:661] Replica learned TRUNCATE
> action at position 1642
> I0627 17:01:56.261155 32261 http.cpp:452] HTTP request for
> '/master/state.json'
> I0627 17:02:06.306571 32257 http.cpp:452] HTTP request for
> '/master/state.json'
> I0627 17:02:16.337009 32255 http.cpp:452] HTTP request for
> '/master/state.json'
> I0627 17:02:26.346575 32256 http.cpp:452] HTTP request for
> '/master/state.json'
> I0627 17:02:36.356895 32257 http.cpp:452] HTTP request for
> '/master/state.json'
>
> Or this LOG which seems to be generated by mesos in the working directory?
>
> 2014/06/27-17:01:39.350738 7f83eac8f740 Recovering log #81
> 2014/06/27-17:01:39.350881 7f83eac8f740 Level-0 table #85: started
> 2014/06/27-17:01:39.353463 7f83eac8f740 Level-0 table #85: 1720 bytes OK
> 2014/06/27-17:01:39.358567 7f83eac8f740 Delete type=0 #81
> 2014/06/27-17:01:39.358813 7f83eac8f740 Delete type=3 #78
> 2014/06/27-17:01:39.359606 7f83e2f48700 Level-0 table #88: started
> 2014/06/27-17:01:39.359817 7f83e2f48700 Level-0 table #88: 0 bytes OK
> 2014/06/27-17:01:39.360829 7f83e2f48700 Delete type=0 #86
> 2014/06/27-17:01:39.361097 7f83e2f48700 Manual compaction at level-0 from
> (begin) .. (end); will stop at '0000001639' @ 4929 : 1
> 2014/06/27-17:01:39.361107 7f83e2f48700 Compacting 1@0 + 1@1 files
> 2014/06/27-17:01:39.363837 7f83e2f48700 Generated table #89: 3 keys, 523
> bytes
> 2014/06/27-17:01:39.363851 7f83e2f48700 Compacted 1@0 + 1@1 files => 523
> bytes
> 2014/06/27-17:01:39.364422 7f83e2f48700 compacted to: files[ 0 1 0 0 0 0 0
> ]
> 2014/06/27-17:01:39.364852 7f83e2f48700 Delete type=2 #85
> 2014/06/27-17:01:39.365031 7f83e2f48700 Delete type=2 #83
> 2014/06/27-17:01:39.365305 7f83e2f48700 Manual compaction at level-0 from
> '0000001639' @ 4929 : 1 .. (end); will stop at (end)
>
>
>
> On Fri, Jun 27, 2014 at 4:50 PM, Vinod Kone <[email protected]> wrote:
>
>> It looks like the framework and slave are not able to properly register
>> with the master due to networking issues. There should be log messages
>> indicating whether master received registration requests are not.
>>
>> > "I0627 16:02:42.431401 10059 slave.cpp:2873] Current usage 0.81%. Max
>> allowed age: 6.243193692985590days)", indicating that it has connected to
>> the master.
>>
>> This just tells you that slave is running. It has nothing to do with
>> whether it is registered with the master or not.
>>
>> What do master and framework logs say?
>>
>>
>>
>> On Fri, Jun 27, 2014 at 4:36 PM, Sammy Steele <[email protected]>
>> wrote:
>>
>>> Hi,
>>>
>>> I am trying to set up Mesos with 1 master and 3 slaves running on
>>> several computers all on the same switch. Each computer is a 4-socket dual
>>> core machine running ubuntu 13.04. I have installed mesos, and can create
>>> one master and one slave when ssh'd into the same computer using the local
>>> IP. However, when I try to create a slave on a second computer connected
>>> via the master's public IP, the slave appears to register. I get a
>>> message:
>>>
>>> "I0627 16:02:42.431401 10059 slave.cpp:2873] Current usage 0.81%. Max
>>> allowed age: 6.243193692985590days)", indicating that it has connected to
>>> the master.
>>>
>>> However, the Mesos tracking website does not recognize the second
>>> slave.
>>>
>>> Additionally, frameworks that are started when ssh'd into a different
>>> computer (than the master), the framework stalls out at:
>>>
>>> "I0627 15:52:44.045642 10254 sched.cpp:230] No credentials provided.
>>> Attempting to register without authentication." The task also fails to
>>> appear on the mesos tracking website. However, frameworks that are launches
>>> from the same computer as the master using the local IP do execute
>>> normally.
>>>
>>> I am wondering if you have encountered this issue. I have been searching
>>> the web for a solution, but have not been able to find one. I would really
>>> appreciate any insights you might have.
>>>
>>
>>
>

