I forgot to CC the list.

Begin forwarded message:
> From: Scott Atchley <[email protected]>
> Date: August 24, 2010 5:27:26 PM EDT
> To: Joshua Randall <[email protected]>
> Subject: Re: [Pvfs2-users] "Remote Endpoint is Closed" error starting pvfs2-server
>
> On Aug 24, 2010, at 3:12 PM, Joshua Randall wrote:
>
>> Scott,
>>
>> I modified the header file, recompiled, and ran it again -- here is the
>> relevant portion of the debug output:
>>
>>> [D 08/24 20:06] Passing mx://renton:0:3 as BMI listen address.
>>> [D 08/24 20:06] bmi_mx: bmx_peer_addref refcount was 0.
>>> [D 08/24 20:06] Server using shm key hint: 1937657261
>>> [D 08/24 20:06] [BMI CONTROL]: BMI_set_info: set_info: 0 option: 11
>>> [D 08/24 20:06] [BMI CONTROL]: BMI_set_info: set_info: 0 option: 12
>>> [D 08/24 20:06] dbpf_thread_initialize: initialized
>>> [D 08/24 20:06] dbpf_thread_function started
>>> [D 08/24 20:06] [SYNC_COALESCE]: dbpf_sync_context_init for context 0 called
>>> [D 08/24 20:06] bmi_mx: bmx_peer_addref refcount was 0.
>>> [D 08/24 20:06] bmi_mx: Setting peer mx://begbie:0:3 to BMX_PEER_WAIT.
>>> [D 08/24 20:06] bmi_mx: bmx_peer_addref refcount was 1.
>>> [D 08/24 20:06] bmi_mx: bmx_peer_addref refcount was 0.
>>> [D 08/24 20:06] bmi_mx: Setting peer mx://tommy:0:3 to BMX_PEER_WAIT.
>>> [D 08/24 20:06] bmi_mx: bmx_peer_addref refcount was 1.
>>> OMX: Completing iconnect request: Remote Endpoint is Closed
>>
>> I don't really understand what is supposed to happen here -- the other two
>> machines are not running a pvfs2 server at the moment because all three of
>> them have this error and close before the others can be started. Surely
>> what should happen is some kind of polling loop waiting for the other
>> servers to be ready? That seems to be what is implied by going into the
>> "BMX_PEER_WAIT" state, but it seems to be having a problem maintaining that
>> state for some reason.
>>
>> Josh.
>
> This output is from renton. It tries to connect to begbie and tommy, but they
> do not have an open MX endpoint. The connect fails and PVFS2 gives up.
>
> I have not experimented much with multiple servers. Perhaps someone else can
> chime in as to whether there should be a specific order for bringing up servers
> (e.g. in Lustre the metadata server must come up before the storage servers).
>
> Another possibility is that PVFS2 retries with socket connections but not
> with MX. Can anyone verify this?
>
> Lastly, I expected to see some more messages from bmi_mx. Is BMX_DB_CONN set
> in the BMX_DB_MASK?
>
> Scott

_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
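Josh's question is whether a server should poll until its peers are ready instead of exiting on the first failed connect. A minimal C sketch of that idea, using capped exponential backoff, might look like this. It assumes everything it names: `fake_peer`, `try_connect`, `sleep_ms` and `connect_with_retry` are hypothetical, not BMI or Open-MX calls, and the "peer" simply comes up after a fixed number of failed attempts. It is not a description of what PVFS2 actually does; per the thread, the connect fails and PVFS2 gives up.

```c
#define _POSIX_C_SOURCE 199309L  /* expose nanosleep() under strict C modes */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical stand-in for a remote server: its endpoint "opens" after
 * attempts_until_up failed connects. Not a BMI or Open-MX structure. */
typedef struct {
    const char *addr;
    int attempts_until_up;
} fake_peer;

/* Hypothetical connect attempt; returning false plays the role of
 * "Remote Endpoint is Closed". */
static bool try_connect(fake_peer *p)
{
    if (p->attempts_until_up > 0) {
        p->attempts_until_up--;
        return false;
    }
    return true;
}

static void sleep_ms(long ms)
{
    struct timespec ts = { .tv_sec = ms / 1000,
                           .tv_nsec = (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/* Retry with capped exponential backoff instead of giving up after the
 * first failure. Returns the attempt number that succeeded, or -1. */
static int connect_with_retry(fake_peer *p, int max_attempts,
                              long base_ms, long max_ms)
{
    assert(max_attempts > 0 && base_ms > 0 && max_ms >= base_ms);
    long delay = base_ms;
    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        if (try_connect(p))
            return attempt;
        fprintf(stderr, "connect to %s failed (attempt %d), retry in %ld ms\n",
                p->addr, attempt, delay);
        if (attempt < max_attempts)
            sleep_ms(delay);            /* wait before the next try */
        delay = (delay * 2 > max_ms) ? max_ms : delay * 2;
    }
    return -1;
}
```

With `fake_peer begbie = { "mx://begbie:0:3", 3 };`, `connect_with_retry(&begbie, 5, 1, 8)` succeeds on the fourth attempt; a peer that never opens its endpoint returns -1 after five tries. Capping the delay keeps a late-starting server from waiting too long between polls.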
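Scott's last question implies that bmi_mx's connection messages are only printed when the BMX_DB_CONN bit is included in the compile-time BMX_DB_MASK. A minimal C sketch of that gating pattern, assuming made-up bit values: only BMX_DB_CONN and BMX_DB_MASK come from the thread, while EXAMPLE_DB_ERR, MASK_ERRORS_ONLY, MASK_WITH_CONN and db_enabled are hypothetical names, and the real definitions in the bmi_mx source may differ.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values for illustration only; the real BMX_DB_*
 * definitions live in the bmi_mx source and may differ. */
#define EXAMPLE_DB_ERR   (1u << 0)
#define BMX_DB_CONN      (1u << 3)

/* Two example masks standing in for BMX_DB_MASK: one without connection
 * tracing, one with BMX_DB_CONN added. */
#define MASK_ERRORS_ONLY (EXAMPLE_DB_ERR)
#define MASK_WITH_CONN   (EXAMPLE_DB_ERR | BMX_DB_CONN)

/* A debug message tagged with `flag` is emitted only if that bit is set
 * in the mask the module was compiled with. */
static bool db_enabled(unsigned flag, unsigned mask)
{
    assert(flag != 0);  /* every message carries at least one flag */
    return (flag & mask) != 0;
}
```

Under MASK_ERRORS_ONLY, `db_enabled(BMX_DB_CONN, ...)` is false, so connection messages stay silent; adding BMX_DB_CONN to the mask and recompiling turns them on.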
