On Sat, 5 Mar 2011 21:54:20 +0000 Simon Wilkinson <[email protected]> wrote:
> On 5 Mar 2011, at 21:31, Ryan C. Underwood wrote:
> >
> > I also looked for any read, seek, or stat call that returned
> > negative. No luck. It seems like all the threads are being
> > captured...

The fileserver storage backend code caches file descriptors, so a
previous access could have opened it. Either that, or we're somehow
failing before we get to accessing the file data. But that seems
unlikely if reads before the 2G mark are fine; you can access the
beginning of the file, right?

You could restart the dafileserver process and start strace'ing right
away, or try to correlate the open file descriptors in /proc/foo/fd.
Of course, you can't do that if you wait until after the salvage has
happened, since the fileserver won't have the file open anymore.

Note that there probably won't be any relevant seeks (the calls to
look for are pread, open, and fstat), and a short read can trigger
this, which won't show up as a negative return.

Also, can you check if /vicepa/AFSIDat/3=/3=++U/8/L3/Que++kB44 is
actually 2147483648 bytes long? Can you read the contents of the file
directly from vicepa successfully? (Just don't change anything in the
data or metadata of the file.)

> Another thing you could try is (if this is a test system) attach to
> the fileserver process with gdb, and set a breakpoint at VTakeOffline.
> Then try and reproduce the problem. Hopefully, when the fileserver
> decides to take the volume offline, you'll hit that breakpoint, and
> 'bt' will let us know exactly where this is being triggered.

Or this. Or we could change the error messages to actually provide
useful information, since they're currently amazingly unhelpful.

-- 
Andrew Deason
[email protected]
_______________________________________________
OpenAFS-info mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-info
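
P.S. For the size check and direct read suggested above, here is a
rough sketch in Python, not anything taken from the fileserver code.
It opens the file read-only, compares st_size from fstat against the
number of bytes pread actually returns, and records the first offset
where a pread came back short. The names check_file and TWO_GIB are
made up for illustration. One caveat: even a read-only open can update
the file's atime depending on mount options, so treat this as
something to try on a test system.

```python
import os

TWO_GIB = 2**31  # 2147483648, the size asked about above


def check_file(path, chunk_size=1 << 20):
    """Read `path` read-only with pread and compare against st_size.

    Returns (st_size, bytes_read, first_short_offset). The last value is
    None if every pread returned the full amount requested.
    """
    fd = os.open(path, os.O_RDONLY)  # never opened for writing
    try:
        size = os.fstat(fd).st_size
        offset = 0
        short_at = None
        while offset < size:
            want = min(chunk_size, size - offset)
            data = os.pread(fd, want, offset)  # positional read, no seek
            if len(data) < want and short_at is None:
                # A short read is not an error return, so it has to be
                # detected by comparing lengths.
                short_at = offset + len(data)
            if not data:
                break  # hit EOF before st_size bytes were read
            offset += len(data)
        return size, offset, short_at
    finally:
        os.close(fd)
```

Calling check_file("/vicepa/AFSIDat/3=/3=++U/8/L3/Que++kB44") and
comparing the first element of the result with TWO_GIB would answer
the size question; a non-None third element would point at where a
read came up short.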
