Hi Josh,

I have not tried Open-MX in a while, but several people have used it with PVFS2. Have you tried the omx_pingpong test to verify that Open-MX is passing traffic correctly?
You may also want to ask about this error on the Open-MX mail list:

http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/open-mx-devel

Scott

On Aug 14, 2010, at 11:50 AM, Joshua Randall wrote:

> I am using pvfs-2.8.2 and it works using TCP, but I ideally want to run it
> using open-mx. I have installed and configured open-mx-1.3.1 and it is
> running on all three servers.
>
> Does anyone actually have Open-MX working with PVFS2? I have set the
> MX_IMM_ACK environment variable to 1 as directed in the FAQ, and all my
> connectivity tests with Open-MX seem to work just fine.
>
> Below I have attached relevant output and configuration files.
>
> Thanks for any help you can offer!
>
> Josh.
>
> The output of omx_info shows all three hosts are successfully communicating
> over ethernet.
>> $ sudo /opt/open-mx/bin/omx_info
>> Open-MX version 1.3.1
>> build: jrand...@tommy:/usr/local/src/open-mx/open-mx-1.3.1 Fri Aug 13
>> 19:07:08 BST 2010
>>
>> Found 1 boards (32 max) supporting 32 endpoints each:
>> tommy:0 (board #0 name eth3 addr 00:1b:21:4f:4b:e6)
>> managed by driver 'ixgbe'
>> attached to numa node 0
>>
>> Peer table is ready, mapper is 00:00:00:00:00:00
>> ================================================
>> 0) 00:1b:21:4f:4b:e6 tommy:0
>> 1) 00:1b:21:4d:ba:92 renton:0
>> 2) 00:1b:21:4f:4d:5a begbie:0
>
> The output of omx_endpoint_info shows all 32 endpoints are available.
>> $ sudo /opt/open-mx/bin/omx_endpoint_info
>> tommy:0 (board #0 name eth3 addr 00:1b:21:4f:4b:e6)
>> ==============================================
>> raw open by pid 20653 (omxoed)
>> 0 regular endpoints open (out of 32)
>
> When I run pvfs2-server, with PVFS2_DEBUGMASK="all" I get a "Remote Endpoint
> is Closed" error and the server exits with code 255.
>> $ sudo /usr/local/sbin/pvfs2-server /etc/pvfs2-fs.conf -d
>> [S 08/14 16:40] PVFS2 Server on node tommy version 2.8.2 starting...
>> [D 08/14 16:40] Logging all (mask 18446744073709551615) >> [D 08/14 16:40] PINT_encode_initialize >> [D 08/14 16:40] lebf_initialize >> [D 08/14 16:40] check_req_size >> [D 08/14 16:40] encode_common >> [D 08/14 16:40] lebf_encode_req >> [D 08/14 16:40] lebf_encode_rel >> [D 08/14 16:40] check_resp_size >> [D 08/14 16:40] encode_common >> [D 08/14 16:40] lebf_encode_resp >> [D 08/14 16:40] lebf_encode_rel >> [D 08/14 16:40] check_req_size >> [D 08/14 16:40] encode_common >> [D 08/14 16:40] lebf_encode_req >> [D 08/14 16:40] lebf_encode_rel >> [D 08/14 16:40] check_resp_size >> [D 08/14 16:40] encode_common >> [D 08/14 16:40] lebf_encode_resp >> [D 08/14 16:40] lebf_encode_rel >> [D 08/14 16:40] check_req_size >> [D 08/14 16:40] encode_common >> [D 08/14 16:40] lebf_encode_req >> [D 08/14 16:40] PINT_do_request_commit: commit node 0x7fff0a7e6e40 >> [D 08/14 16:40] node stored at 0 >> [D 08/14 16:40] clearing tree >> [D 08/14 16:40] lebf_encode_rel >> [D 08/14 16:40] check_resp_size >> [D 08/14 16:40] encode_common >> [D 08/14 16:40] lebf_encode_resp >> [D 08/14 16:40] lebf_encode_rel >> [D 08/14 16:40] check_req_size >> [D 08/14 16:40] encode_common >> [D 08/14 16:40] lebf_encode_req >> [D 08/14 16:40] lebf_encode_rel >> [D 08/14 16:40] check_resp_size >> [D 08/14 16:40] encode_common >> [D 08/14 16:40] lebf_encode_resp >> [D 08/14 16:40] lebf_encode_rel >> [D 08/14 16:40] check_req_size >> [D 08/14 16:40] encode_common >> [D 08/14 16:40] lebf_encode_req >> [D 08/14 16:40] lebf_encode_rel >> [D 08/14 16:40] check_resp_size >> [D 08/14 16:40] encode_common >> [D 08/14 16:40] lebf_encode_resp >> [D 08/14 16:40] lebf_encode_rel >> [D 08/14 16:40] check_req_size >> [D 08/14 16:40] encode_common >> [D 08/14 16:40] lebf_encode_req >> [D 08/14 16:40] lebf_encode_rel >> [D 08/14 16:40] check_resp_size >> [D 08/14 16:40] encode_common >> [D 08/14 16:40] lebf_encode_resp >> [D 08/14 16:40] lebf_encode_rel >> [D 08/14 16:40] check_req_size >> [D 08/14 16:40] encode_common 
>> [D 08/14 16:40] lebf_encode_req >> [D 08/14 16:40] lebf_encode_rel >> [D 08/14 16:40] check_resp_size >> [D 08/14 16:40] encode_common >> [D 08/14 16:40] lebf_encode_resp >> [D 08/14 16:40] lebf_encode_rel >> [D 08/14 16:40] check_req_size >> [D 08/14 16:40] encode_common >> [D 08/14 16:40] lebf_encode_req >> [D 08/14 16:40] lebf_encode_rel >> [D 08/14 16:40] check_resp_size >> [D 08/14 16:40] encode_common >> [D 08/14 16:40] lebf_encode_resp >> [D 08/14 16:40] lebf_encode_rel >> [D 08/14 16:40] check_req_size >> [D 08/14 16:40] encode_common >> [D 08/14 16:40] lebf_encode_req >> [D 08/14 16:40] lebf_encode_rel >> [D 08/14 16:40] check_resp_size >> [D 08/14 16:40] encode_common >> [D 08/14 16:40] lebf_encode_resp >> [D 08/14 16:40] lebf_encode_rel >> [D 08/14 16:40] check_req_size >> [D 08/14 16:40] encode_common >> [D 08/14 16:40] lebf_encode_req >> [D 08/14 16:40] lebf_encode_rel >> [D 08/14 16:40] check_resp_size >> [D 08/14 16:40] encode_common >> [D 08/14 16:40] lebf_encode_resp >> [D 08/14 16:40] lebf_encode_rel >> [D 08/14 16:40] check_req_size >> [D 08/14 16:40] encode_common >> [D 08/14 16:40] lebf_encode_req >> [D 08/14 16:40] lebf_encode_rel >> [D 08/14 16:40] check_resp_size >> [D 08/14 16:40] encode_common >> [D 08/14 16:40] lebf_encode_resp >> [D 08/14 16:40] lebf_encode_rel >> [D 08/14 16:40] check_req_size >> [D 08/14 16:40] encode_common >> [D 08/14 16:40] lebf_encode_req >> [D 08/14 16:40] lebf_encode_rel >> [D 08/14 16:40] check_resp_size >> [D 08/14 16:40] encode_common >> [D 08/14 16:40] lebf_encode_resp >> [D 08/14 16:40] lebf_encode_rel >> [D 08/14 16:40] check_req_size >> [D 08/14 16:40] encode_common >> [D 08/14 16:40] lebf_encode_req >> [D 08/14 16:40] lebf_encode_rel >> [D 08/14 16:40] check_resp_size >> [D 08/14 16:40] encode_common >> [D 08/14 16:40] lebf_encode_resp >> [D 08/14 16:40] lebf_encode_rel >> [D 08/14 16:40] check_resp_size >> [D 08/14 16:40] encode_common >> [D 08/14 16:40] lebf_encode_resp >> [D 08/14 16:40] 
lebf_encode_rel >> [D 08/14 16:40] check_req_size >> [D 08/14 16:40] encode_common >> [D 08/14 16:40] lebf_encode_req >> [D 08/14 16:40] lebf_encode_rel >> [D 08/14 16:40] check_resp_size >> [D 08/14 16:40] encode_common >> [D 08/14 16:40] lebf_encode_resp >> [D 08/14 16:40] lebf_encode_rel >> [D 08/14 16:40] check_req_size >> [D 08/14 16:40] encode_common >> [D 08/14 16:40] lebf_encode_req >> [D 08/14 16:40] lebf_encode_rel >> [D 08/14 16:40] check_resp_size >> [D 08/14 16:40] encode_common >> [D 08/14 16:40] lebf_encode_resp >> [D 08/14 16:40] lebf_encode_rel >> [D 08/14 16:40] check_req_size >> [D 08/14 16:40] encode_common >> [D 08/14 16:40] lebf_encode_req >> [D 08/14 16:40] lebf_encode_rel >> [D 08/14 16:40] check_resp_size >> [D 08/14 16:40] encode_common >> [D 08/14 16:40] lebf_encode_resp >> [D 08/14 16:40] lebf_encode_rel >> [D 08/14 16:40] check_req_size >> [D 08/14 16:40] encode_common >> [D 08/14 16:40] lebf_encode_req >> [D 08/14 16:40] lebf_encode_rel >> [D 08/14 16:40] check_resp_size >> [D 08/14 16:40] encode_common >> [D 08/14 16:40] lebf_encode_resp >> [D 08/14 16:40] lebf_encode_rel >> [D 08/14 16:40] check_req_size >> [D 08/14 16:40] encode_common >> [D 08/14 16:40] lebf_encode_req >> [D 08/14 16:40] lebf_encode_rel >> [D 08/14 16:40] check_resp_size >> [D 08/14 16:40] encode_common >> [D 08/14 16:40] lebf_encode_resp >> [D 08/14 16:40] lebf_encode_rel >> [D 08/14 16:40] check_req_size >> [D 08/14 16:40] encode_common >> [D 08/14 16:40] lebf_encode_req >> [D 08/14 16:40] lebf_encode_rel >> [D 08/14 16:40] check_resp_size >> [D 08/14 16:40] encode_common >> [D 08/14 16:40] lebf_encode_resp >> [D 08/14 16:40] lebf_encode_rel >> [D 08/14 16:40] check_req_size >> [D 08/14 16:40] encode_common >> [D 08/14 16:40] lebf_encode_req >> [D 08/14 16:40] lebf_encode_rel >> [D 08/14 16:40] check_resp_size >> [D 08/14 16:40] encode_common >> [D 08/14 16:40] lebf_encode_resp >> [D 08/14 16:40] lebf_encode_rel >> [D 08/14 16:40] check_req_size >> [D 08/14 
16:40] encode_common >> [D 08/14 16:40] lebf_encode_req >> [D 08/14 16:40] lebf_encode_rel >> [D 08/14 16:40] check_resp_size >> [D 08/14 16:40] encode_common >> [D 08/14 16:40] lebf_encode_resp >> [D 08/14 16:40] lebf_encode_rel >> [D 08/14 16:40] check_req_size >> [D 08/14 16:40] encode_common >> [D 08/14 16:40] lebf_encode_req >> [D 08/14 16:40] lebf_encode_rel >> [D 08/14 16:40] check_resp_size >> [D 08/14 16:40] encode_common >> [D 08/14 16:40] lebf_encode_resp >> [D 08/14 16:40] lebf_encode_rel >> [D 08/14 16:40] check_req_size >> [D 08/14 16:40] encode_common >> [D 08/14 16:40] lebf_encode_req >> [D 08/14 16:40] lebf_encode_rel >> [D 08/14 16:40] check_resp_size >> [D 08/14 16:40] encode_common >> [D 08/14 16:40] lebf_encode_resp >> [D 08/14 16:40] lebf_encode_rel >> [D 08/14 16:40] check_req_size >> [D 08/14 16:40] encode_common >> [D 08/14 16:40] lebf_encode_req >> [D 08/14 16:40] lebf_encode_rel >> [D 08/14 16:40] check_resp_size >> [D 08/14 16:40] encode_common >> [D 08/14 16:40] lebf_encode_resp >> [D 08/14 16:40] lebf_encode_rel >> [D 08/14 16:40] check_req_size >> [D 08/14 16:40] encode_common >> [D 08/14 16:40] lebf_encode_req >> [D 08/14 16:40] lebf_encode_rel >> [D 08/14 16:40] check_resp_size >> [D 08/14 16:40] encode_common >> [D 08/14 16:40] lebf_encode_resp >> [D 08/14 16:40] lebf_encode_rel >> [D 08/14 16:40] check_req_size >> [D 08/14 16:40] encode_common >> [D 08/14 16:40] lebf_encode_req >> [D 08/14 16:40] lebf_encode_rel >> [D 08/14 16:40] check_resp_size >> [D 08/14 16:40] encode_common >> [D 08/14 16:40] lebf_encode_resp >> [D 08/14 16:40] lebf_encode_rel >> [D 08/14 16:40] check_req_size >> [D 08/14 16:40] encode_common >> [D 08/14 16:40] lebf_encode_req >> [D 08/14 16:40] lebf_encode_rel >> [D 08/14 16:40] check_resp_size >> [D 08/14 16:40] encode_common >> [D 08/14 16:40] lebf_encode_resp >> [D 08/14 16:40] lebf_encode_rel >> [D 08/14 16:40] check_req_size >> [D 08/14 16:40] encode_common >> [D 08/14 16:40] lebf_encode_req >> [D 08/14 
16:40] lebf_encode_rel >> [D 08/14 16:40] check_resp_size >> [D 08/14 16:40] encode_common >> [D 08/14 16:40] lebf_encode_resp >> [D 08/14 16:40] lebf_encode_rel >> [D 08/14 16:40] check_req_size >> [D 08/14 16:40] encode_common >> [D 08/14 16:40] lebf_encode_req >> [D 08/14 16:40] lebf_encode_rel >> [D 08/14 16:40] check_resp_size >> [D 08/14 16:40] encode_common >> [D 08/14 16:40] lebf_encode_resp >> [D 08/14 16:40] lebf_encode_rel >> [D 08/14 16:40] check_req_size >> [D 08/14 16:40] encode_common >> [D 08/14 16:40] lebf_encode_req >> [D 08/14 16:40] PINT_do_request_commit: commit node 0x7fff0a7e6e40 >> [D 08/14 16:40] node stored at 0 >> [D 08/14 16:40] clearing tree >> [D 08/14 16:40] lebf_encode_rel >> [D 08/14 16:40] check_resp_size >> [D 08/14 16:40] encode_common >> [D 08/14 16:40] lebf_encode_resp >> [D 08/14 16:40] lebf_encode_rel >> [D 08/14 16:40] check_req_size >> [D 08/14 16:40] encode_common >> [D 08/14 16:40] lebf_encode_req >> [D 08/14 16:40] lebf_encode_rel >> [D 08/14 16:40] check_resp_size >> [D 08/14 16:40] encode_common >> [D 08/14 16:40] lebf_encode_resp >> [D 08/14 16:40] lebf_encode_rel >> [D 08/14 16:40] check_req_size >> [D 08/14 16:40] encode_common >> [D 08/14 16:40] lebf_encode_req >> [D 08/14 16:40] lebf_encode_rel >> [D 08/14 16:40] check_resp_size >> [D 08/14 16:40] encode_common >> [D 08/14 16:40] lebf_encode_resp >> [D 08/14 16:40] lebf_encode_rel >> [D 08/14 16:40] check_req_size >> [D 08/14 16:40] encode_common >> [D 08/14 16:40] lebf_encode_req >> [D 08/14 16:40] lebf_encode_rel >> [D 08/14 16:40] check_resp_size >> [D 08/14 16:40] encode_common >> [D 08/14 16:40] lebf_encode_resp >> [D 08/14 16:40] lebf_encode_rel >> [D 08/14 16:40] check_req_size >> [D 08/14 16:40] encode_common >> [D 08/14 16:40] lebf_encode_req >> [D 08/14 16:40] lebf_encode_rel >> [D 08/14 16:40] check_resp_size >> [D 08/14 16:40] encode_common >> [D 08/14 16:40] lebf_encode_resp >> [D 08/14 16:40] lebf_encode_rel >> [D 08/14 16:40] Passing mx://tommy:0:0 
as BMI listen address.
>> OMX: Emulating MX_DISABLE_SHMEM as OMX_DISABLE_SHARED
>> OMX: Forcing shared comms to disabled
>> OMX: Setting 4 bits of context id at offset 60 in matching
>> [D 08/14 16:40] Server using shm key hint: 1937657271
>> [D 08/14 16:40] [BMI CONTROL]: BMI_set_info: set_info: 0 option: 11
>> [D 08/14 16:40] [BMI CONTROL]: BMI_set_info: set_info: 0 option: 12
>> [D 08/14 16:40] dbpf_thread_initialize: initialized
>> [D 08/14 16:40] dbpf_thread_function started
>> [D 08/14 16:40] [SYNC_COALESCE]: dbpf_sync_context_init for context 0 called
>> OMX: Completing iconnect request: Remote Endpoint is Closed
>
> My pvfs2-fs.conf file contains:
>> <Defaults>
>>     UnexpectedRequests 50
>>     EventLogging all
>>     EnableTracing no
>>     LogStamp datetime
>>     BMIModules bmi_mx
>>     FlowModules flowproto_multiqueue
>>     PerfUpdateInterval 1000
>>     ServerJobBMITimeoutSecs 30
>>     ServerJobFlowTimeoutSecs 30
>>     ClientJobBMITimeoutSecs 300
>>     ClientJobFlowTimeoutSecs 300
>>     ClientRetryLimit 5
>>     ClientRetryDelayMilliSecs 2000
>>     PrecreateBatchSize 512
>>     PrecreateLowThreshold 256
>>
>>     StorageSpace /raid/pvfs2-storage-space
>>     LogFile /var/log/pvfs2-server.log
>> </Defaults>
>>
>> <Aliases>
>>     Alias begbie mx://begbie:0:0
>>     Alias renton mx://renton:0:0
>>     Alias tommy mx://tommy:0:0
>> </Aliases>
>>
>> <Filesystem>
>>     Name pvfs2-fs
>>     ID 1937657241
>>     RootHandle 1048576
>>     FileStuffing yes
>>     <MetaHandleRanges>
>>         Range begbie 3-1537228672809129302
>>         Range renton 1537228672809129303-3074457345618258602
>>         Range tommy 3074457345618258603-4611686018427387902
>>     </MetaHandleRanges>
>>     <DataHandleRanges>
>>         Range begbie 4611686018427387903-6148914691236517202
>>         Range renton 6148914691236517203-7686143364045646502
>>         Range tommy 7686143364045646503-9223372036854775802
>>     </DataHandleRanges>
>>     <StorageHints>
>>         TroveSyncMeta yes
>>         TroveSyncData no
>>         TroveMethod alt-aio
>>     </StorageHints>
>> </Filesystem>
>
> _______________________________________________
> Pvfs2-users mailing list
> [email protected]
> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
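
Alongside omx_pingpong, one cheap thing to rule out is a naming mismatch between the pvfs2-fs.conf aliases and the Open-MX peer table. The sketch below is not part of PVFS2 or Open-MX; the helper names are made up for illustration, and it assumes the alias form is mx://host:board:endpoint, which is inferred only from the "host:board" names that omx_info prints. The input strings are copied from the output and config quoted in this thread. It also checks that the handle ranges are contiguous.

```python
import re

# Peer table lines copied from the omx_info output quoted in this thread.
OMX_INFO_PEERS = """\
0) 00:1b:21:4f:4b:e6 tommy:0
1) 00:1b:21:4d:ba:92 renton:0
2) 00:1b:21:4f:4d:5a begbie:0
"""

# Alias and Range lines copied from the quoted pvfs2-fs.conf.
PVFS2_CONF = """\
Alias begbie mx://begbie:0:0
Alias renton mx://renton:0:0
Alias tommy mx://tommy:0:0
Range begbie 3-1537228672809129302
Range renton 1537228672809129303-3074457345618258602
Range tommy 3074457345618258603-4611686018427387902
Range begbie 4611686018427387903-6148914691236517202
Range renton 6148914691236517203-7686143364045646502
Range tommy 7686143364045646503-9223372036854775802
"""


def parse_peers(text):
    """Return the set of 'host:board' names listed in an omx_info peer table."""
    pattern = r"^[ \t]*\d+\)[ \t]+\S+[ \t]+(\S+)[ \t]*$"
    return {m.group(1) for m in re.finditer(pattern, text, re.M)}


def parse_aliases(text):
    """Return {alias: (host, board, endpoint)}.

    Assumption: addresses look like mx://host:board:endpoint.
    """
    pattern = r"^[ \t]*Alias[ \t]+(\S+)[ \t]+mx://([^:\s]+):(\d+):(\d+)[ \t]*$"
    return {
        m.group(1): (m.group(2), int(m.group(3)), int(m.group(4)))
        for m in re.finditer(pattern, text, re.M)
    }


def parse_ranges(text):
    """Return a list of (alias, first_handle, last_handle) tuples."""
    pattern = r"^[ \t]*Range[ \t]+(\S+)[ \t]+(\d+)-(\d+)[ \t]*$"
    return [
        (m.group(1), int(m.group(2)), int(m.group(3)))
        for m in re.finditer(pattern, text, re.M)
    ]


def missing_peers(aliases, peers):
    """Return aliases whose host:board is absent from the peer table."""
    return sorted(
        name for name, (host, board, _ep) in aliases.items()
        if f"{host}:{board}" not in peers
    )


def range_gaps(ranges):
    """Return adjacent range pairs that overlap or leave a gap between them."""
    ordered = sorted(ranges, key=lambda r: r[1])
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if a[2] + 1 != b[1]]


if __name__ == "__main__":
    aliases = parse_aliases(PVFS2_CONF)
    print("aliases missing from peer table:",
          missing_peers(aliases, parse_peers(OMX_INFO_PEERS)))
    print("non-contiguous handle ranges:", range_gaps(parse_ranges(PVFS2_CONF)))
```

An empty result from both checks only rules out a typo in the aliases or handle ranges. It does not explain the "Remote Endpoint is Closed" message, which still needs the omx_pingpong test or the Open-MX list.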
