Thanks Rob.  I'm glad you were able to verify this.  For me it happens
only with 2 PVFS2 servers.  I've done some debugging on it in the past
but have only come up with a few clues.

1) hpio hangs in MPI_File_delete.  Basically, on the PVFS2 server, I
think the reads never get fully pushed out of the request scheduler, so
the delete waits for the read to finish before it can proceed, hence
the hang.

2)  I've tried to simplify the problem, but believe it or not, this is
pretty much the simplest case I can get.  It could be a problem with the
datatype I/O code, but since it works with 1 and 3 servers, I'm not
entirely convinced of that.

3)  Honestly, I'm not entirely sure where the problem is, but if I had
to guess, it's a flow problem where the read flow isn't being marked
complete (and thus isn't flushed from the request scheduler).

4)  It appears that one server actually deletes the file but the other
doesn't, so only one of the reads is incomplete.

Hope that helps some.  I'll keep working on it myself. =)  Extracting
the PVFS2-only parts of the code is a bit difficult; I was hoping to
avoid that if possible.  If we can't make any progress, I'll try to do
that.  I appreciate the quick response.

Avery

On Thu, 2006-02-23 at 14:55 -0600, Robert Latham wrote:
> On Thu, Feb 23, 2006 at 12:38:26PM -0600, Avery Ching wrote:
> > By the way, is the datatype branch going to make it to ROMIO at some
> > point?  The major bug I've been trying to fix occurs when using the
> > datatype I/O branch of the PVFS2 ROMIO driver with 2 PVFS2 servers.
> > 
> > mpiexec -n 2 ./hpio-debug -o 11 -t 10 -m 1 -n 10 -c 4096 -p 128 -d pvfs2:/mnt/pvfs2
> > 
> > It works fine with POSIX, list I/O, and collective I/O, just not
> > datatype I/O.  It could be something in my ROMIO driver or down inside
> > PVFS2.  Hard to say. =)  Basically, writes are fine, but reads hang,
> > as if they never truly complete.
> 
> 
> hey avery
> Well, I tried your dtype code that you sent us a while back, and
> that command worked great with 2 clients and 3 PVFS2 servers.  I'm
> seeing the two-server problem now.  I'll try to narrow down the
> problem.  If you can extract the PVFS2-only parts of your code,
> that'll make it a lot easier to debug, but if not, that's OK.
> 
> ==rob
> 

_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
