I forgot to CC the list.

Begin forwarded message:

> From: Scott Atchley <[email protected]>
> Date: August 24, 2010 5:27:26 PM EDT
> To: Joshua Randall <[email protected]>
> Subject: Re: [Pvfs2-users] "Remote Endpoint is Closed" error starting 
> pvfs2-server
> 
> On Aug 24, 2010, at 3:12 PM, Joshua Randall wrote:
> 
>> Scott,
>> 
>> I modified the header file, recompiled, and ran it again -- here is the 
>> relevant portion of the debug output:
>> 
>>> [D 08/24 20:06] Passing mx://renton:0:3 as BMI listen address.
>>> [D 08/24 20:06] bmi_mx: bmx_peer_addref refcount was 0.
>>> [D 08/24 20:06] Server using shm key hint: 1937657261
>>> [D 08/24 20:06] [BMI CONTROL]: BMI_set_info: set_info: 0 option: 11
>>> [D 08/24 20:06] [BMI CONTROL]: BMI_set_info: set_info: 0 option: 12
>>> [D 08/24 20:06] dbpf_thread_initialize: initialized
>>> [D 08/24 20:06] dbpf_thread_function started
>>> [D 08/24 20:06] [SYNC_COALESCE]: dbpf_sync_context_init for context 0 called
>>> [D 08/24 20:06] bmi_mx: bmx_peer_addref refcount was 0.
>>> [D 08/24 20:06] bmi_mx: Setting peer mx://begbie:0:3 to BMX_PEER_WAIT.
>>> [D 08/24 20:06] bmi_mx: bmx_peer_addref refcount was 1.
>>> [D 08/24 20:06] bmi_mx: bmx_peer_addref refcount was 0.
>>> [D 08/24 20:06] bmi_mx: Setting peer mx://tommy:0:3 to BMX_PEER_WAIT.
>>> [D 08/24 20:06] bmi_mx: bmx_peer_addref refcount was 1.
>>> OMX: Completing iconnect request: Remote Endpoint is Closed
>> 
>> I don't really understand what is supposed to happen here -- the other two 
>> machines are not running a pvfs2 server at the moment because all three of 
>> them have this error and close before the others can be started.  Surely 
>> what should happen is some kind of polling loop waiting for the other 
>> servers to be ready?  That seems to be what is implied by going into the 
>> "BMX_PEER_WAIT" state, but it seems to be having a problem maintaining that 
>> state for some reason.
>> 
>> Josh.
> 
> This output is from renton. It tries to connect to begbie and tommy, but they 
> do not have an open MX endpoint. The connect fails and PVFS2 gives up.
> 
> I have not experimented much with multiple servers. Perhaps someone else can 
> chime in as to whether there should be a specific order for bringing up servers 
> (e.g. in Lustre the metadata server must come up before the storage servers).
> 
> Another possibility is that PVFS2 retries failed connections when using 
> sockets but does not when using MX. Can anyone verify this?
> 
> Lastly, I expected to see more messages from bmi_mx. Is BMX_DB_CONN set 
> in the BMX_DB_MASK?
> 
> Scott
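
For anyone following along, here is a rough sketch of the kind of polling loop Josh describes above: keep retrying peers whose endpoint is not open yet instead of giving up on the first failure. This is not PVFS2 or bmi_mx code, and nothing in this thread confirms that bmi_mx behaves this way. The function name connect_with_retry, the connect callable, and the attempts/delay parameters are all hypothetical, chosen only for illustration.

```python
import time


def connect_with_retry(connect, peers, attempts=5, delay=0.5, sleep=time.sleep):
    """Keep trying to reach every peer until all connect or attempts run out.

    `connect` is a hypothetical callable that raises ConnectionError while
    the remote endpoint is still closed. Returns the peers that never
    answered (an empty list means everyone connected).
    """
    pending = list(peers)
    for attempt in range(attempts):
        still_waiting = []
        for peer in pending:
            try:
                connect(peer)
            except ConnectionError:
                # Peer is not up yet: keep it in the wait list.
                still_waiting.append(peer)
        pending = still_waiting
        if not pending:
            return []
        if attempt < attempts - 1:
            sleep(delay)   # back off before polling again
            delay *= 2     # double the wait after each failed round
    return pending
```

With a loop like this, the order in which the three servers are started would not matter, provided each one comes up before the others exhaust their attempts.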


_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
