On Dec 21, 2006, at 4:57 PM, Pete Wyckoff wrote:
[EMAIL PROTECTED] wrote on Thu, 21 Dec 2006 16:26 -0500:
On Dec 21, 2006, at 3:59 PM, Pete Wyckoff wrote:
[EMAIL PROTECTED] wrote on Thu, 21 Dec 2006 15:50 -0500:
Client posts a receive with op_id 5, bmi tag 1 and length 32808
Client posts an unexpected send with op_id 7, bmi tag 1 and length 24
[..]
Server receives unexpected recv with bmi tag 1 and length 24
Server posts an expected send with op_id 79, bmi tag 1 and length 816
[..]
On the Client:
[E 15:40:10.538206] job_time_mgr_expire: job time out: cancelling bmi operation, job_id: 4.
[E 15:40:10.538421] job_time_mgr_expire: job time out: cancelling bmi operation, job_id: 6.
[..]
On the Server:
[E 12/21 15:40] job_time_mgr_expire: job time out: cancelling bmi operation, job_id: 78.
[..]
I did not think the op_ids would match, but bmi_mx does not see the
timed out ops in any post_send or post_recv functions. Are these
operations passing through bmi_mx (possibly via other BMI_meth_*
functions) or are these unrelated to bmi_mx?
IDs are assigned to jobs. IDs are also assigned to BMI operations.
They share the same number space but are different things. A job
may require a few BMI operations to go to completion, and perhaps a
few disk operations. Job id 78 seems to require BMI id 79, for
instance.
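A minimal sketch of what "different things in the same number space" means, assuming a single counter that hands out IDs to both jobs and the BMI operations they post. The counter, the `struct job` layout, and every name here are hypothetical illustrations, not the actual PVFS2 ID generator; the point is only that a job taking ID 78 and then posting a BMI operation naturally leaves that operation with ID 79.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical ID type; PVFS2's real job and BMI ID types differ. */
typedef uint64_t sketch_id_t;

#define SKETCH_MAX_BMI 4

/* One shared counter: jobs and BMI operations draw from the same space. */
static sketch_id_t sketch_next_id = 1;

static sketch_id_t sketch_alloc_id(void)
{
    return sketch_next_id++;
}

/* Hypothetical job record: its own ID plus the IDs of its BMI ops. */
struct sketch_job {
    sketch_id_t job_id;
    sketch_id_t bmi_ids[SKETCH_MAX_BMI];
    int nbmi;
};

/* Start a job that needs nbmi BMI operations.  The job takes an ID
 * first, then each BMI operation it posts takes the next one, so a
 * job's ID and its BMI op IDs are adjacent but never equal. */
static void sketch_job_start(struct sketch_job *j, int nbmi)
{
    assert(nbmi >= 0 && nbmi <= SKETCH_MAX_BMI);
    j->job_id = sketch_alloc_id();
    j->nbmi = nbmi;
    for (int i = 0; i < nbmi; i++)
        j->bmi_ids[i] = sketch_alloc_id();
}
```

With this scheme, a timed-out job ID in the log will never appear in a method's post_send or post_recv; one has to map the job to the BMI IDs it allocated.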
Ok, it helps if you set *outcount in BMI_meth_testcontext() to let
BMI know that you completed something. ;-)
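A simplified sketch of the fix, assuming a method keeps finished operations on a small completion queue. The real method callback takes additional arguments (such as a context and an idle timeout) and uses PVFS2's own types; every type and function name below is hypothetical. What matters is the final assignment: if `*outcount` is never set to the number of completions copied out, the caller sees nothing finish and the job eventually times out, as in the logs above.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_QCAP 16

/* Hypothetical stand-ins for BMI's op ID and size types. */
typedef int64_t sketch_op_id_t;
typedef int64_t sketch_size_t;

/* One completed operation, as the method would record it. */
struct sketch_op {
    sketch_op_id_t id;
    int error_code;
    sketch_size_t actual_size;
    void *user_ptr;
};

/* Hypothetical ring buffer of completions filled by the progress engine. */
struct sketch_queue {
    struct sketch_op ops[SKETCH_QCAP];
    int head;
    int count;
};

/* Copy up to incount completions into the caller's arrays and report
 * how many were copied through *outcount. */
static int sketch_testcontext(struct sketch_queue *q, int incount,
                              sketch_op_id_t *out_ids, int *outcount,
                              int *error_codes, sketch_size_t *sizes,
                              void **user_ptrs)
{
    assert(incount >= 0);
    int n = 0;
    while (n < incount && q->count > 0) {
        struct sketch_op *op = &q->ops[q->head];
        out_ids[n] = op->id;
        error_codes[n] = op->error_code;
        sizes[n] = op->actual_size;
        user_ptrs[n] = op->user_ptr;
        q->head = (q->head + 1) % SKETCH_QCAP;
        q->count--;
        n++;
    }
    /* The easy-to-miss step: tell the caller what completed. */
    *outcount = n;
    return 0;
}
```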
After fixing other miscellaneous bugs, I now get:
% pvfs2-ping -m /mnt/pvfs2
(1) Parsing tab file...
(2) Initializing system interface...
(3) Initializing each file system found in tab file: /etc/pvfs2tab...
PVFS2 servers: mx://fog33:0:3
Storage name: pvfs2-fs
Local mount point: /mnt/pvfs2
/mnt/pvfs2: Ok
(4) Searching for /mnt/pvfs2 in pvfstab...
PVFS2 servers: mx://fog33:0:3
Storage name: pvfs2-fs
Local mount point: /mnt/pvfs2
meta servers:
mx://fog33:0:3
data servers:
mx://fog33:0:3
(5) Verifying that all servers are responding...
meta servers:
mx://fog33:0:3 Ok
data servers:
mx://fog33:0:3 Ok
(6) Verifying that fsid 1318064247 is acceptable to all servers...
Ok; all servers understand fs_id 1318064247
(7) Verifying that root handle is owned by one server...
Root handle: 1048576
Ok; root handle is owned by exactly one server.
zsh: segmentation fault (core dumped) pvfs2-ping -m /mnt/pvfs2
The segfault is in my cleanup code and I am looking into it.
Scott
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers