Awesome, Sam!
Can you check in your changes to test/posix/open.c as well?
I will modify the rest (test/posix/io.c, test/posix/iox.c)
thanks!
Murali

>
> Hi Guys,
>
> I finally found the bug(s) causing these hangs.  The first problem was
> in the request scheduler, with the new crdirent pass-through changes.
> The handle the crdirent gets queued on is actually the directory handle
> instead of the dirent handle, so the operation on that handle is
> technically read-only.  Treating it as modifying was causing other
> operations that came along (setattr, for example) to get queued instead
> of scheduled.
>
> That was the first cause of hangs.  The second was in the way the sync
> coalescing code worked.  There were cases where operations were getting
> queued as ready-to-be-synced (coalesced), but the following operations
> that got serviced were failing (appropriately with EEXIST), and never
> calling any of the coalescing code.  Julian and I had talked about this
> being a problem a while back, but I guess it never got looked at.  In
> any case, I was able to clean up the sync coalescing code a bit, so it was
> probably worth it.
>
> The tests of doing multiple simultaneous creates and unlinks to the same
> file seem to work fine now, including the open test in test/posix.  Let
> me know if any of you still have problems.
>
> Thanks,
>
> -sam
>
> Murali Vilayannur wrote:
> > Hi RobL,
> >> I'm seeing this on chiba with posix ior and the flash io benchmark
> >> (oddly enough, just with the parallel netcdf version, not the hdf5
> >> one).
> >>
> >> I agree it's something related to chiba, but have no idea what it
> >> could be.  I tried a different version of Berkeley DB and saw the same
> >> results.
> >>
> >> pvfs2-ping works, but pvfs2-ls hangs in getattr.  servers don't
> >> *appear* to be stuck in anything, but I only hooked up a debugger to
> >> the server-that-wouldn't-die.
> >
> > I really wish we could rule this as a Chiba-specific bug, since it is so
> > hard to reproduce elsewhere or we would have known by now! :)
> > Since I was using the vfs interface and not the mpi-io interface,
> > I can only think of the simultaneous create and unlink being the issue
> > here.
> > The other possibility is a bug in the aio libraries on Chiba...
> > If anyone has any insights or has seen similar behavior, do let us know!
> > thanks,
> > Murali
> >
> >
>
>
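For reference, the kind of test Sam describes above (several processes
creating and unlinking the same file at once) can be sketched roughly as
follows. This is only a minimal illustration, not the code in
test/posix/open.c: the names run_create_unlink_race and hammer_path, the
timeout parameter, and the use of fork() are all assumptions. A create
may fail with EEXIST and an unlink with ENOENT when another process wins
the race, and both are treated as expected. A child that hangs is killed
by alarm() and counted as a failure.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not from test/posix/open.c): repeatedly create and
 * unlink `path`, returning the number of unexpected errors seen. */
static int hammer_path(const char *path, int iters)
{
    int unexpected = 0;

    for (int i = 0; i < iters; i++) {
        /* Exclusive create: losing the race yields EEXIST, which is fine. */
        int fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0600);
        if (fd >= 0)
            close(fd);
        else if (errno != EEXIST)
            unexpected++;

        /* Another process may already have removed it: ENOENT is fine. */
        if (unlink(path) != 0 && errno != ENOENT)
            unexpected++;
    }
    return unexpected;
}

/* Hypothetical driver: fork `nprocs` children that all hammer the same
 * path. Returns the number of children that failed, including any that
 * were killed by the alarm because they hung. */
int run_create_unlink_race(const char *path, int nprocs, int iters,
                           unsigned timeout_sec)
{
    int failed = 0;
    int started = 0;

    assert(path != NULL && nprocs > 0 && iters > 0);

    for (int p = 0; p < nprocs; p++) {
        pid_t pid = fork();
        if (pid < 0) {            /* could not start this child */
            failed++;
            continue;
        }
        if (pid == 0) {
            alarm(timeout_sec);   /* SIGALRM terminates a hung child */
            _exit(hammer_path(path, iters) == 0 ? 0 : 1);
        }
        started++;
    }

    for (int p = 0; p < started; p++) {
        int status;
        if (wait(&status) < 0 || !WIFEXITED(status) ||
            WEXITSTATUS(status) != 0)
            failed++;
    }

    unlink(path);                 /* best-effort cleanup; ENOENT is fine */
    return failed;
}
```

Calling run_create_unlink_race() with a path on the file system under
test (for example a file below an assumed mount point such as
/mnt/pvfs2), 8 processes, 1000 iterations and a 60 second timeout should
return 0 if no process hangs or sees an unexpected error.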
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
