Hmm, well, something is still the matter. Presumably I didn't get everything rebuilt and linked correctly, but it's not clear what exactly is wrong, and I've double-checked that the correct pvfs and mpich are in use. And I rebuilt the mpich executable with the new mpich, so . . .

PVFS servers still run and ping successfully. The MPI job still craps out, but it looks to be failing differently.

Here is a run without debug (connection refused errors?):

http://www.parl.clemson.edu/~bradles/downloads/anl-io-bm-mx-16-2.o168517

And here is a run with debug enabled:

http://www.parl.clemson.edu/~bradles/downloads/anl-io-bm-mx-16-2.o168520

That looks more like the timeout stuff again, I guess, but with a lot less network activity this time around. Any more assistance?

Cheers,
Brad

On Thu, Mar 5, 2009 at 2:37 PM, Scott Atchley wrote:
> On Mar 5, 2009, at 1:52 PM, Bradley Settlemyer wrote:
>
>> Heh, the job works whenever I do that:
>>
>> http://www.parl.clemson.edu/~bradles/downloads/anl-io-bm-mx-16-2.o168456
>>
>> However, this run had a really slow write in the second instance:
>>
>> http://www.parl.clemson.edu/~bradles/downloads/anl-io-bm-mx-16-2.o168495
>>
>> Both include debug from two procs (on separate nodes). Hope that is okay.
>>
>> Cheers,
>> Brad
>
> Brad,
>
> This bug is fixed in PVFS 2.8.1.
>
> What happened in the second run above is the client disconnected and then
> reconnected. The server did not realize that the client went away and the
> server never replies to the new connection request.
>
> Remember to unset PVFS2_DEBUGMASK or your performance will be horrible. :-)
>
> Scott
