On Dec 21, 2006, at 4:57 PM, Pete Wyckoff wrote:

[EMAIL PROTECTED] wrote on Thu, 21 Dec 2006 16:26 -0500:
On Dec 21, 2006, at 3:59 PM, Pete Wyckoff wrote:
[EMAIL PROTECTED] wrote on Thu, 21 Dec 2006 15:50 -0500:
Client posts a receive with op_id 5, bmi tag 1 and length 32808
Client posts an unexpected send with op_id 7, bmi tag 1 and length 24
[..]
Server receives unexpected recv with bmi tag 1 and length 24
Server posts an expected send with op_id 79, bmi tag 1 and length 816
[..]
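The exchange above relies on tag matching: the server's 816-byte expected send on tag 1 should land in the 32808-byte receive the client posted on tag 1. The following is a minimal sketch of that matching rule, assuming a receive matches when the tag is equal and the posted buffer is large enough. The struct posted_recv type and the match_expected() helper are hypothetical illustrations, not the bmi_mx data structures, and peer addressing is ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified posted-receive record (not the real bmi_mx struct). */
struct posted_recv {
    long op_id;    /* BMI operation id assigned when the receive was posted */
    int tag;       /* BMI tag the receive is waiting for */
    size_t length; /* capacity of the posted buffer, in bytes */
    int matched;   /* nonzero once an incoming message has been delivered */
};

/* Find the first unmatched receive posted with `tag` whose buffer can hold
 * `msg_len` bytes; mark it matched and return it, or return NULL if none fits.
 * Assumption: matching is by tag and size only (sender is not checked here). */
static struct posted_recv *match_expected(struct posted_recv *recvs, size_t n,
                                          int tag, size_t msg_len)
{
    assert(recvs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        struct posted_recv *r = &recvs[i];
        if (!r->matched && r->tag == tag && msg_len <= r->length) {
            r->matched = 1;
            return r;
        }
    }
    return NULL;
}
```

With the values from the trace, a receive { 5, 1, 32808, 0 } matches an 816-byte send on tag 1 and is then consumed, so a second 816-byte send on the same tag would find no posted receive.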
On the Client:
[E 15:40:10.538206] job_time_mgr_expire: job time out: cancelling bmi operation, job_id: 4.
[E 15:40:10.538421] job_time_mgr_expire: job time out: cancelling bmi operation, job_id: 6.
[..]
On the Server:
[E 12/21 15:40] job_time_mgr_expire: job time out: cancelling bmi operation, job_id: 78.
[..]
I did not think the op_ids would match, but bmi_mx does not see the
timed out ops in any post_send or post_recv functions. Are these
operations passing through bmi_mx (possibly via other BMI_meth_*
functions) or are these unrelated to bmi_mx?

IDs are assigned to jobs.  IDs are also assigned to BMI operations.
They share the same number space but are different things.  A job
may require a few BMI operations to go to completion, and perhaps a
few disk operations. Job id 78 seems to require BMI id 79, for instance.
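A small sketch of what "the same number space" implies, assuming a single counter hands out both job ids and BMI operation ids. The next_id counter, struct job, and start_job_with_bmi_op() are hypothetical names for illustration; the real PVFS2 id allocation may differ.

```c
#include <assert.h>

/* Hypothetical single id counter shared by jobs and BMI operations. */
static long next_id = 1;

static long new_id(void)
{
    return next_id++;
}

/* Hypothetical job record: its own id plus the one BMI operation it needs. */
struct job {
    long job_id;
    long bmi_op_id;
};

/* Create a job that needs one BMI operation. Both ids come from the same
 * counter, so if nothing else allocates an id in between, the BMI op id is
 * simply the job id plus one. */
static struct job start_job_with_bmi_op(void)
{
    struct job j;
    j.job_id = new_id();
    j.bmi_op_id = new_id();
    assert(j.bmi_op_id != j.job_id);
    return j;
}
```

Under this assumption, starting from id 4 yields job 4 with BMI op 5 and then job 6 with BMI op 7, which lines up with the client's timed-out job ids 4 and 6 against op_ids 5 and 7, just as server job 78 pairs with BMI op 79.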

Ok, it helps if you set *outcount in BMI_meth_testcontext() to let BMI know that you completed something. ;-)

After fixing other miscellaneous bugs, I now get:

% pvfs2-ping -m /mnt/pvfs2

(1) Parsing tab file...

(2) Initializing system interface...

(3) Initializing each file system found in tab file: /etc/pvfs2tab...

   PVFS2 servers: mx://fog33:0:3
   Storage name: pvfs2-fs
   Local mount point: /mnt/pvfs2
   /mnt/pvfs2: Ok

(4) Searching for /mnt/pvfs2 in pvfstab...

   PVFS2 servers: mx://fog33:0:3
   Storage name: pvfs2-fs
   Local mount point: /mnt/pvfs2

   meta servers:
   mx://fog33:0:3

   data servers:
   mx://fog33:0:3

(5) Verifying that all servers are responding...

   meta servers:
   mx://fog33:0:3 Ok

   data servers:
   mx://fog33:0:3 Ok

(6) Verifying that fsid 1318064247 is acceptable to all servers...

   Ok; all servers understand fs_id 1318064247

(7) Verifying that root handle is owned by one server...

   Root handle: 1048576
     Ok; root handle is owned by exactly one server.

zsh: segmentation fault (core dumped)  pvfs2-ping -m /mnt/pvfs2

The segfault is in my cleanup code and I am looking into it.

Scott
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
