Currently, the master needs to be able to open a connection back to the
slave.
If you look at the log lines, you'll see the master is seeing the slave at
127.0.1.1:

"slave(1)@127.0.1.1:5051 (192.168.48.150)"

The master is going to try to connect to 127.0.1.1:5051. That is a loopback
address, so from the master's machine it does not reach the slave, and the
connection appears to fail immediately (hence the disconnection).
Can you try setting --ip=192.168.48.150 on the slave? That will ensure the
slave binds to the IP address you're expecting.
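
If it helps to check this on the slave VM, here is a rough Python sketch of
the reasoning above. It is only an illustration, not anything Mesos ships:
the helper names (advertised_ip, is_reachable_from_other_hosts,
local_hostname_ip) are made up for this example, and it assumes that without
--ip the slave advertises whatever the local hostname resolves to.

```python
import ipaddress
import socket


def advertised_ip(pid: str) -> str:
    """Extract the IP from a PID string such as 'slave(1)@127.0.1.1:5051'."""
    address = pid.rsplit("@", 1)[1]   # e.g. '127.0.1.1:5051'
    return address.rsplit(":", 1)[0]  # e.g. '127.0.1.1'


def is_reachable_from_other_hosts(ip: str) -> bool:
    """A loopback address only refers to the machine that uses it."""
    return not ipaddress.ip_address(ip).is_loopback


def local_hostname_ip() -> str:
    """Return what the local hostname resolves to.

    Assumption: this is the address the slave ends up advertising
    when --ip is not set.
    """
    return socket.gethostbyname(socket.gethostname())


if __name__ == "__main__":
    # PID copied from the master log in this thread.
    ip = advertised_ip("slave(1)@127.0.1.1:5051")
    print(ip, "usable by the master:", is_reachable_from_other_hosts(ip))

    # Run on the slave VM to see what its hostname resolves to.
    local = local_hostname_ip()
    print(local, "usable by the master:", is_reachable_from_other_hosts(local))
```

If the second line printed on the slave shows a 127.x.x.x address, the
hostname is resolving to loopback, and passing --ip explicitly should
sidestep that.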

On Mon, Dec 15, 2014 at 4:33 PM, Tim Chen <[email protected]> wrote:
>
> Is there anything in the ERROR/WARNING logs?
>
> Tim
>
> On Mon, Dec 15, 2014 at 4:22 PM, Arunabha Ghosh <[email protected]>
> wrote:
>>
>> Hi,
>>     I've set up a test Mesos cluster on a few VMs running locally. I have
>> three masters and two slaves:
>>
>> masters : 192.168.48.14[5 - 7]
>> slaves : 192.168.48.15[0 - 1]
>>
>> The masters start up correctly and are able to elect a leader. The slaves
>> can find the master and register, but for some reason they immediately
>> disconnect.
>>
>>
>> *On the master (mesos-master.INFO)*
>>
>> master.cpp:3122] Registered slave
>> 20141215-160321-2435885248-5050-20424-S68 at slave(1)@127.0.1.1:5051
>> (192.168.48.150) with cpus(*):1; mem(*):489; disk(*):13901;
>> ports(*):[31000-32000]
>> I1215 16:15:51.970082 20448 hierarchical_allocator_process.hpp:442] Added
>> slave 20141215-160321-2435885248-5050-20424-S68 (192.168.48.150) with
>> cpus(*):1; mem(*):489; disk(*):13901; ports(*):[31000-32000] (and
>> cpus(*):1; mem(*):489; disk(*):13901; ports(*):[31000-32000] available)
>> I1215 16:15:51.970474 20454 master.cpp:839] Slave
>> 20141215-160321-2435885248-5050-20424-S68 at slave(1)@127.0.1.1:5051
>> (192.168.48.150) disconnected
>> I1215 16:15:51.970546 20454 master.cpp:1789] Disconnecting slave
>> 20141215-160321-2435885248-5050-20424-S68 at slave(1)@127.0.1.1:5051
>> (192.168.48.150)
>> I1215 16:15:51.970612 20454 master.cpp:1808] Deactivating slave
>> 20141215-160321-2435885248-5050-20424-S68 at slave(1)@127.0.1.1:5051
>> (192.168.48.150)
>> I1215 16:15:51.970772 20454 hierarchical_allocator_process.hpp:481] Slave
>> 20141215-160321-2435885248-5050-20424-S68 deactivated
>> I1215 16:15:51.975980 20453 replica.cpp:655] Replica received learned
>> notice for position 276
>> I1215 16:15:51.977501 20453 leveldb.cpp:343] Persisting action (20 bytes)
>> to leveldb took 1.475474ms
>> I1215 16:15:51.977625 20453 leveldb.cpp:401] Deleting ~2 keys from
>> leveldb took 50280ns
>>
>> *On the slave (mesos-slave.INFO)*
>>
>> Dec 15 16:06:09 ubuntu mesos-slave[18118]: 2014-12-15
>> 16:06:09,209:18118(0x7fa67d700700):ZOO_INFO@check_events@1750: session
>> establishment complete on server [192.168.48.147:2181],
>> sessionId=0x34a5067fd9e0001, negotiated timeout=10000
>> Dec 15 16:06:09 ubuntu mesos-slave[18118]: I1215 16:06:09.210183 18140
>> group.cpp:313] Group process (group(1)@127.0.1.1:5051) connected to
>> ZooKeeper
>> Dec 15 16:06:09 ubuntu mesos-slave[18118]: I1215 16:06:09.210248 18140
>> group.cpp:790] Syncing group operations: queue size (joins, cancels, datas)
>> = (0, 0, 0)
>> Dec 15 16:06:09 ubuntu mesos-slave[18118]: I1215 16:06:09.210270 18140
>> group.cpp:385] Trying to create path '/mesos' in ZooKeeper
>> Dec 15 16:06:09 ubuntu mesos-slave[18118]: I1215 16:06:09.213835 18140
>> detector.cpp:138] Detected a new leader: (id='55')
>> Dec 15 16:06:09 ubuntu mesos-slave[18118]: I1215 16:06:09.214570 18140
>> group.cpp:659] Trying to get '/mesos/info_0000000055' in ZooKeeper
>> Dec 15 16:06:09 ubuntu mesos-slave[18118]: I1215 16:06:09.215833 18141
>> detector.cpp:433] A new leading master ([email protected]:5050)
>> is detected
>> Dec 15 16:06:09 ubuntu mesos-slave[18118]: I1215 16:06:09.220592 18141
>> state.cpp:33] Recovering state from '/home/agh/mesos-work/meta'
>> Dec 15 16:06:09 ubuntu mesos-slave[18118]: I1215 16:06:09.220757 18141
>> state.cpp:62] Failed to find the latest slave from
>> '/home/agh/mesos-work/meta'
>> Dec 15 16:06:09 ubuntu mesos-slave[18118]: I1215 16:06:09.226416 18136
>> status_update_manager.cpp:197] Recovering status update manager
>> Dec 15 16:06:09 ubuntu mesos-slave[18118]: I1215 16:06:09.226963 18134
>> containerizer.cpp:281] Recovering containerizer
>> Dec 15 16:06:09 ubuntu mesos-slave[18118]: I1215 16:06:09.228973 18135
>> slave.cpp:3466] Finished recovery
>> Dec 15 16:06:09 ubuntu mesos-slave[18118]: I1215 16:06:09.230242 18137
>> status_update_manager.cpp:171] Pausing sending status updates
>> Dec 15 16:06:09 ubuntu mesos-slave[18118]: I1215 16:06:09.230450 18135
>> slave.cpp:602] New master detected at [email protected]:5050
>> Dec 15 16:06:09 ubuntu mesos-slave[18118]: I1215 16:06:09.230873 18135
>> slave.cpp:627] No credentials provided. Attempting to register without
>> authentication
>> Dec 15 16:06:09 ubuntu mesos-slave[18118]: I1215 16:06:09.231045 18135
>> slave.cpp:638] Detecting new master
>> Dec 15 16:07:09 ubuntu mesos-slave[18118]: I1215 16:07:09.225389 18141
>> slave.cpp:3321] Current usage 12.01%. Max allowed age: 5.459239732780289days
>> Dec 15 16:08:09 ubuntu mesos-slave[18118]: I1215 16:08:09.228869 18141
>> slave.cpp:3321] Current usage 12.01%. Max allowed age: 5.459239732780289days
>> Dec 15 16:09:09 ubuntu mesos-slave[18118]: I1215 16:09:09.252048 18141
>> slave.cpp:3321] Current usage 12.01%. Max allowed age: 5.459239732780289days
>> Dec 15 16:09:27 ubuntu mesos-slave[18118]: I1215 16:09:27.288277 18141
>> http.cpp:330] HTTP request for '/slave(1)/state.json'
>> Dec 15 16:10:09 ubuntu mesos-slave[18118]: I1215 16:10:09.271672 18138
>> slave.cpp:3321] Current usage 12.01%. Max allowed age: 5.459239732780289days
>>
>> It does not look like the slave is disconnecting, so why does the master
>> think the slave keeps disconnecting and deactivate the slave?
>>
>> Thanks,
>> Arunabha
>>
>>
>>
>>