[EMAIL PROTECTED] wrote on Tue, 14 Mar 2006 16:26 -0600:
>  I've attached the output of a 6.4GByte run which failed, and 
> quasi-calculated that it failed after transmitting about 650MBytes.  
> (this was assumed to be the case by taking a rough 90% of the 2800 
> completions in this run to have transmitted 256KB, though my estimation 
> may have been pointless and/or plain wrong considering the problem at 
> hand) 
> This type of test is 100% reproducible on my end, varying between runs 
> with only the failing mopid changing.
> The test is:
> 'pvfs2-cp -t [6.4GB file on local array] [pvfs2-fs on remote server 
> mounted to 2TB array]'

Yes, this is the bug I found and fixed in bmi_ib but haven't checked in
yet, because I got embroiled in what may be a different problem of
magically disappearing messages (possibly some broken IB on our end),
and I tried to simplify some data structures at the same time.  But I'm
going to pull just the minimal fixes for this into a clean tree and see
if I can isolate the important parts at least.

> When going through the output, I noticed that in several cases the 
> completions generated for the BMI_context's mopids would be off by 5-10.  
> This seemed to fix itself and/or be handled by the server without 
> issue; however, towards the end I noticed a case where the IDs would 
> get off by 35+ and the transfer would fail with a CTS error.

It's okay for the mopids to be off; what's actually broken is that the
code used to ack the wrong CTS buffer numbers.

                -- Pete
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
