The IP addresses look suspicious: 127.0.1.1:5051 is a loopback address, so the
slave is registering with an address that only the local machine can reach.
Make sure you pass the correct `--ip` flag to both the master and the slave.
The correct value is the IP address of the interface that connects these
machines.

If the hostname lookup is correct:

Name:   hotbox-30.stanford.edu
Address: 10.79.6.70

then use --ip=10.79.6.70
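
If it helps, here is a minimal Python sketch for checking this on each machine before starting Mesos. It only tests whether an address is in the loopback range and builds the flag from it. The helper names (is_loopback, advertised_address, ip_flag) are made up for illustration and are not part of Mesos, and the `--ip=VALUE` form is what I would expect the master and slave to accept.

```python
import ipaddress
import socket


def is_loopback(address: str) -> bool:
    """Return True if the address is in a loopback range (e.g. 127.0.0.0/8)."""
    return ipaddress.ip_address(address).is_loopback


def advertised_address(hostname: str) -> str:
    """Resolve a hostname the way a local process would (consults /etc/hosts)."""
    return socket.gethostbyname(hostname)


def ip_flag(address: str) -> str:
    """Build the --ip argument, refusing addresses other machines cannot reach."""
    if is_loopback(address):
        raise ValueError(f"{address} is a loopback address; pick the LAN interface")
    return f"--ip={address}"


# Hypothetical usage on one of the machines (not executed here):
#   addr = advertised_address(socket.gethostname())
#   print(addr, "loopback" if is_loopback(addr) else "looks routable")
```

If advertised_address() returns something like 127.0.1.1 for the machine's own hostname, the name is probably mapped to loopback in /etc/hosts, which would be consistent with what the master log shows.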



On 28 June 2014 02:12, Sammy Steele <[email protected]> wrote:

> Thanks so much for your help! I am completely new to Mesos, so I am not
> sure if I am interpreting your question correctly. Do you mean these logs
> (which I've attached pictures of)? Or do you mean the log generated by the
> manager which looks like this (after attempting to register one slave on
> the same computer and one on a different computer):
>
> I0627 17:01:39.385833 32259 replica.cpp:661] Replica learned TRUNCATE
> action at position 1640
> I0627 17:01:46.255574 32260 http.cpp:452] HTTP request for
> '/master/state.json'
> I0627 17:01:55.490910 32255 master.cpp:2477] Re-registering slave
> 20140627-105920-16777343-5050-32615-0 at slave(1)@127.0.1.1:5051 (
> hotbox-30.Stanford.EDU)
> I0627 17:01:55.491575 32258 registrar.cpp:422] Attempting to update the
> 'registry'
> I0627 17:01:55.493095 32260 log.cpp:680] Attempting to append 322 bytes to
> the log
> I0627 17:01:55.493242 32255 coordinator.cpp:340] Coordinator attempting to
> write APPEND action at position 1641
> I0627 17:01:55.493803 32255 replica.cpp:508] Replica received write
> request for position 1641
> I0627 17:01:55.494604 32255 leveldb.cpp:343] Persisting action (342 bytes)
> to leveldb took 738686ns
> I0627 17:01:55.494701 32255 replica.cpp:676] Persisted action at 1641
> I0627 17:01:55.495028 32258 replica.cpp:655] Replica received learned
> notice for position 1641
> I0627 17:01:55.495808 32258 leveldb.cpp:343] Persisting action (344 bytes)
> to leveldb took 699563ns
> I0627 17:01:55.495873 32258 replica.cpp:676] Persisted action at 1641
> I0627 17:01:55.495931 32258 replica.cpp:661] Replica learned APPEND action
> at position 1641
> I0627 17:01:55.496388 32261 registrar.cpp:479] Successfully updated
> 'registry'
> I0627 17:01:55.496537 32256 log.cpp:699] Attempting to truncate the log to
> 1641
> I0627 17:01:55.496665 32260 coordinator.cpp:340] Coordinator attempting to
> write TRUNCATE action at position 1642
> I0627 17:01:55.496690 32257 master.cpp:2528] Re-registered slave
> 20140627-105920-16777343-5050-32615-0 at slave(1)@127.0.1.1:5051 (
> hotbox-30.Stanford.EDU)
> I0627 17:01:55.496824 32257 master.cpp:3472] Adding slave
> 20140627-105920-16777343-5050-32615-0 at slave(1)@127.0.1.1:5051 (
> hotbox-30.Stanford.EDU) with cpus(*):8; mem(*):15024; disk(*):448079;
> ports(*):[31000-32000]
> I0627 17:01:55.497179 32256 replica.cpp:508] Replica received write
> request for position 1642
> I0627 17:01:55.497406 32257 hierarchical_allocator_process.hpp:444] Added
> slave 20140627-105920-16777343-5050-32615-0 (hotbox-30.Stanford.EDU) with
> cpus(*):8; mem(*):15024; disk(*):448079; ports(*):[31000-32000] (and
> cpus(*):8; mem(*):15024; disk(*):448079; ports(*):[31000-32000] available)
> I0627 17:01:55.497931 32256 leveldb.cpp:343] Persisting action (18 bytes)
> to leveldb took 685099ns
> I0627 17:01:55.498000 32256 replica.cpp:676] Persisted action at 1642
> I0627 17:01:55.498262 32261 replica.cpp:655] Replica received learned
> notice for position 1642
> I0627 17:01:55.499034 32261 leveldb.cpp:343] Persisting action (20 bytes)
> to leveldb took 676723ns
> I0627 17:01:55.499114 32261 leveldb.cpp:401] Deleting ~2 keys from leveldb
> took 17977ns
> I0627 17:01:55.499174 32261 replica.cpp:676] Persisted action at 1642
> I0627 17:01:55.499232 32261 replica.cpp:661] Replica learned TRUNCATE
> action at position 1642
> I0627 17:01:56.261155 32261 http.cpp:452] HTTP request for
> '/master/state.json'
> I0627 17:02:06.306571 32257 http.cpp:452] HTTP request for
> '/master/state.json'
> I0627 17:02:16.337009 32255 http.cpp:452] HTTP request for
> '/master/state.json'
> I0627 17:02:26.346575 32256 http.cpp:452] HTTP request for
> '/master/state.json'
> I0627 17:02:36.356895 32257 http.cpp:452] HTTP request for
> '/master/state.json'
>
> Or this LOG which seems to be generated by mesos in the working directory?
>
> 2014/06/27-17:01:39.350738 7f83eac8f740 Recovering log #81
> 2014/06/27-17:01:39.350881 7f83eac8f740 Level-0 table #85: started
> 2014/06/27-17:01:39.353463 7f83eac8f740 Level-0 table #85: 1720 bytes OK
> 2014/06/27-17:01:39.358567 7f83eac8f740 Delete type=0 #81
> 2014/06/27-17:01:39.358813 7f83eac8f740 Delete type=3 #78
> 2014/06/27-17:01:39.359606 7f83e2f48700 Level-0 table #88: started
> 2014/06/27-17:01:39.359817 7f83e2f48700 Level-0 table #88: 0 bytes OK
> 2014/06/27-17:01:39.360829 7f83e2f48700 Delete type=0 #86
> 2014/06/27-17:01:39.361097 7f83e2f48700 Manual compaction at level-0 from
> (begin) .. (end); will stop at '0000001639' @ 4929 : 1
> 2014/06/27-17:01:39.361107 7f83e2f48700 Compacting 1@0 + 1@1 files
> 2014/06/27-17:01:39.363837 7f83e2f48700 Generated table #89: 3 keys, 523
> bytes
> 2014/06/27-17:01:39.363851 7f83e2f48700 Compacted 1@0 + 1@1 files => 523
> bytes
> 2014/06/27-17:01:39.364422 7f83e2f48700 compacted to: files[ 0 1 0 0 0 0 0
> ]
> 2014/06/27-17:01:39.364852 7f83e2f48700 Delete type=2 #85
> 2014/06/27-17:01:39.365031 7f83e2f48700 Delete type=2 #83
> 2014/06/27-17:01:39.365305 7f83e2f48700 Manual compaction at level-0 from
> '0000001639' @ 4929 : 1 .. (end); will stop at (end)
>
>
>
> On Fri, Jun 27, 2014 at 4:50 PM, Vinod Kone <[email protected]> wrote:
>
>> It looks like the framework and slave are not able to properly register
>> with the master due to networking issues. There should be log messages
>> indicating whether the master received registration requests or not.
>>
>> > "I0627 16:02:42.431401 10059 slave.cpp:2873] Current usage 0.81%. Max
>> allowed age: 6.243193692985590days)", indicating that it has connected to
>> the master.
>>
>> This just tells you that the slave is running. It has nothing to do with
>> whether it is registered with the master or not.
>>
>> What do master and framework logs say?
>>
>>
>>
>> On Fri, Jun 27, 2014 at 4:36 PM, Sammy Steele <[email protected]>
>> wrote:
>>
>>> Hi,
>>>
>>> I am trying to set up Mesos with 1 master and 3 slaves running on
>>> several computers all on the same switch. Each computer is a 4-socket dual
>>> core machine running ubuntu 13.04. I have installed mesos, and can create
>>> one master and one slave when ssh'd into the same computer using the local
>>> IP. However, when I try to create a slave on a second computer connected
>>> via the master's public IP, the slave appears to register.  I get a
>>> message:
>>>
>>> "I0627 16:02:42.431401 10059 slave.cpp:2873] Current usage 0.81%. Max
>>> allowed age: 6.243193692985590days)", indicating that it has connected to
>>> the master.
>>>
>>>  However, the Mesos tracking website does not recognize the second
>>> slave.
>>>
>>> Additionally, when a framework is started while ssh'd into a different
>>> computer (than the master), it stalls out at:
>>>
>>>  "I0627 15:52:44.045642 10254 sched.cpp:230] No credentials provided.
>>> Attempting to register without authentication." The task also fails to
>>> appear on the mesos tracking website. However, frameworks that are launched
>>> from the same computer as the master using the local IP do execute
>>> normally.
>>>
>>> I am wondering if you have encountered this issue. I have been searching
>>> the web for a solution, but have not been able to find one. I would really
>>> appreciate any insights you might have.
>>>
>>
>>
>
