Awesome, Sam!
Can you check in your changes to test/posix/open.c as well?
I will modify the rest (test/posix/io.c, test/posix/iox.c).
Thanks!
Murali

> Hi Guys,
>
> I finally found the bug(s) causing these hangs. The first problem was
> in the request scheduler, with the new crdirent pass-through changes.
> The handle the crdirent gets queued on is actually the directory handle
> instead of the dirent handle, so the operation on that handle is
> technically read-only. Treating it as modifying was causing other
> operations that came along (setattr, for example) to get queued instead
> of scheduled.
>
> That was the first cause of hangs. The second was in the way the sync
> coalescing code worked. There were cases where operations were getting
> queued as ready-to-be-synced (coalesced), but the following operations
> that got serviced were failing (appropriately, with EEXIST) and never
> calling any of the coalescing code. Julian and I had talked about this
> being a problem a while back, but I guess it never got looked at. In
> any case, I was able to clean up the sync coalescing code some, so it
> was probably worth it.
>
> The tests of doing multiple simultaneous creates and unlinks to the same
> file seem to work fine now, including the open test in test/posix. Let
> me know if any of you still have problems.
>
> Thanks,
>
> -sam
>
> Murali Vilayannur wrote:
> > Hi RobL,
> >> I'm seeing this on chiba with posix ior and the flash io benchmark
> >> (oddly enough, just with the parallel netcdf version, not the hdf5
> >> one).
> >>
> >> I agree it's something related to chiba, but have no idea what it
> >> could be. I tried a different version of Berkeley DB and saw the same
> >> results.
> >>
> >> pvfs2-ping works, but pvfs2-ls hangs in getattr. servers don't
> >> *appear* to be stuck in anything, but I only hooked up a debugger to
> >> the server-that-wouldn't-die.
> >
> > I really wish we could rule this as a Chiba-specific bug, since it is so
> > hard to reproduce elsewhere or we would have known by now! :)
> > Since I was using the vfs interface and not the mpi-io interface,
> > I can only think of the simultaneous create and unlink being the issue
> > here.
> > The other possibility is a bug in the aio libraries on Chiba...
> > If anyone has any insights or has seen similar behavior, do let us know!
> > thanks,
> > Murali

_______________________________________________
Pvfs2-developers mailing list
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
