On Wed, May 25, 2011 at 5:52 PM, Andrew Deason <[email protected]> wrote:
> Brief summary: I'm asking a question about what our Linux readpage
> handler is allowed to do. Some Red Hat folks have been asked about this,
> but so far the response hasn't been very quick or useful, so I thought
> I'd ask here if anyone happened to know.
>
> Hi,
>
> So, I've seen a few Linux panics on 2.6.9-89.mumble (RHEL4) from this
> assert (OpenAFS client is 1.4-based):
>
> Assertion failure in journal_start() at fs/jbd/transaction.c:274:
> "handle->h_transaction->t_journal == journal"
>
> The backtrace shows that this is from libafs trying to write to the
> cache (from afs_GetDownDSlot), via afs_linux_readpage, which is
> triggered from a page fault for a buffer that someone is trying to write
> to an ext3 filesystem (which is separate from the ext3 disk cache fs).
>
> The 'handle->h_transaction->t_journal' in that assert is for the "other"
> ext3 fs, and 'journal' is for the cache ext3 fs (in the panics that I've
> seen).
>
> Based on that, and after reading through some Linux code, it looks like
> ext3 will set current->journal_info, and then try to copy in pages from
> the supplied user buffer, which can trigger libafs. So when we write to
> the ext3 cache, the journal struct we want to use is different than the
> one in the current->journal_info transaction, and ext3 explodes.
>
> Now, the thing is, I can reproduce this same situation on my own box,
> and nothing blows up. From some extra print statements and such, I can
> see that I'm in the afs_GetDownDSlot code path from a page fault, but
> current->journal_info is always NULL by the time afs_linux_readpage gets
> called, so the problem doesn't come up. Which leaves me a bit confused.
>
>
> So, my question here is what is supposed to happen? Is
> current->journal_info supposed to have the journal transaction of the
> current process (in which case I assume the readpage handler is not
> allowed to start write transactions, but I can't find this warned
> against anywhere), or is something supposed to reset the current task's
> journal_info or otherwise somehow guard against this?
>
> There may be some site-specific patches and stuff involved here and I'm
> not trying to debug the panic itself on this list. I'd just like to know
> what is intended to happen in a situation like this, if anyone can
> provide any info.
I don't think it's been an issue before at all, and so it's simply not
been on the radar. That said, I wonder if the site in question uses
btrfs? I seem to recall there were some dodgy btrfs patches a while ago
which diddled with journal state. Of course, I could totally be in left
field here.

--
Derrick
