I opened a bug ticket for this (RT #56542). I am also sending it to openafs-devel because I'd appreciate more eyes on it to confirm that my analysis is correct.

----------------
We recently saw a kernel crash on a machine running RHEL4, which hit the following assert in (linux)/fs/namei.c:

        void page_put_link(struct dentry *dentry, struct nameidata *nd)
        {
                if (!IS_ERR(nd_get_link(nd))) {
                        struct page *page;
                        page = find_get_page(dentry->d_inode->i_mapping, 0);
                        if (!page)
                                BUG();

I did some research, and it appears that the symlink caching API in the Linux kernel source was changed on August 20, 2005 by the following commits:

-       "Fix nasty ncpfs symlink handling bug."
        Linus Torvalds [Sat, 20 Aug 2005 01:02:56 +0000 (18:02 -0700)]
see:
        http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=cc314eef0128a807e50fa03baf2d0abc0647952c

-       "[PATCH] Fix up symlink function pointers"
        author  Al Viro <[EMAIL PROTECTED]>
        Fri, 19 Aug 2005 23:17:39 +0000 (00:17 +0100)
        committer       Linus Torvalds <[EMAIL PROTECTED]>
        Sat, 20 Aug 2005 01:08:21 +0000 (18:08 -0700)
see:
        http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=008b150a3c4d971cd65d02d107b8fcc860bc959c


My understanding of the discussion that preceded this change is that the symlink caching API in the Linux kernel:

        page_follow_link_light(), page_put_link(), page_readlink(), etc.

is unsafe for a network filesystem to use on kernels that predate the above commits. Since OpenAFS uses this API, I believe that OpenAFS is vulnerable to kernel crashes when running on kernels older than 2.6.13 (the first release incorporating the above changes).


I don't have a test case to reproduce the crash yet, but my understanding is that the kernel can crash while following symlinks during a path name lookup. One thread can be following a symlink while another thread causes the file cache pages that contain the symlink text to be evicted from memory. When the first thread then calls page_put_link(), find_get_page() returns NULL and the BUG() triggers.
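
The race described above can be illustrated with a small user-space model. This is only a sketch, not kernel code: every toy_* name is hypothetical, "evicted" is assumed to mean that the page is dropped from the inode's mapping while the thread following the link still holds a reference, and the cookie-passing variant is an assumption about how the changed calling convention avoids the second lookup.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for kernel objects; all names are hypothetical. */
struct toy_page {
        int refcount;           /* number of references held on the page */
        const char *text;       /* symlink target stored in the page */
};

struct toy_mapping {
        struct toy_page *slot0; /* page at index 0, NULL once evicted */
};

/* Look up page 0 and take a reference, in the spirit of find_get_page(). */
struct toy_page *toy_find_get_page(struct toy_mapping *m)
{
        if (m->slot0)
                m->slot0->refcount++;
        return m->slot0;
}

/* follow_link step: pin the page and hand back the symlink text. */
const char *toy_follow_link(struct toy_mapping *m, struct toy_page **cookie)
{
        struct toy_page *p = toy_find_get_page(m);

        if (!p)
                return NULL;
        *cookie = p;            /* remembered only by the cookie variant */
        return p->text;
}

/* Another thread evicts the page: the mapping drops its own reference. */
void toy_evict(struct toy_mapping *m)
{
        if (m->slot0) {
                m->slot0->refcount--;
                m->slot0 = NULL;
        }
}

/*
 * Old-style put_link: find the page again through the mapping.
 * Returns -1 at the point where the kernel code would call BUG().
 */
int toy_put_link_old(struct toy_mapping *m)
{
        struct toy_page *p = toy_find_get_page(m);

        if (!p)
                return -1;
        assert(p->refcount >= 2);
        p->refcount -= 2;       /* lookup reference + follow_link reference */
        return 0;
}

/* Cookie-style put_link: release the page that follow_link pinned. */
int toy_put_link_new(struct toy_page *cookie)
{
        assert(cookie->refcount > 0);
        cookie->refcount--;
        return 0;
}
```

Calling toy_follow_link(), then toy_evict(), then toy_put_link_old() on the same mapping returns -1, which corresponds to the BUG() path; toy_put_link_new() with the saved cookie succeeds because it never consults the mapping again.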

Apparently, the symlink caching API was only suitable for local filesystems prior to the change that went into 2.6.13.



Here is a link to the original discussion of this bug on linux-kernel back in 2005:

Subject: Kernel bug: Bad page state: related to generic symlink code and mmap
        From:       Anton Altaparmakov <[EMAIL PROTECTED]>
        Date:       2005-08-19 11:14:48
        Message-ID: [EMAIL PROTECTED]

archived at:
        http://www.uwsg.indiana.edu/hypermail/linux/kernel/0508.2/0858.html
or:
        http://marc.info/?l=linux-kernel&m=112445020708392


I believe that, in order to be safe, OpenAFS cannot use the page_*link() API on kernels that do not include the patch. The patch changed the calling convention of the:

        inode_operations.follow_link()
        inode_operations.put_link()

methods, so it should be possible to detect whether a kernel includes it with an autoconf test.
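
One possible shape for the test program behind such a check, as a sketch only: it assigns functions with the new-style signatures to the inode_operations members, and the check would pass only if this compiles without a pointer-type diagnostic (for example with -Werror). The signatures used here (follow_link() returning void *, put_link() taking an extra void * cookie) are an assumption about the post-patch interface and should be verified against the kernel headers. The struct declarations are stand-ins so that the fragment builds in user space; a real probe would include <linux/fs.h> instead. The probe_* names are hypothetical.

```c
#include <stddef.h>

/*
 * Stand-in declarations (assumed new-style layout). A real configure
 * probe would take these from <linux/fs.h> rather than declare them.
 */
struct dentry;
struct nameidata;

struct inode_operations {
        void *(*follow_link)(struct dentry *, struct nameidata *);
        void (*put_link)(struct dentry *, struct nameidata *, void *);
};

/* Probe functions written against the assumed new calling convention. */
void *probe_follow_link(struct dentry *dentry, struct nameidata *nd)
{
        (void)dentry;
        (void)nd;
        return NULL;
}

void probe_put_link(struct dentry *dentry, struct nameidata *nd, void *cookie)
{
        (void)dentry;
        (void)nd;
        (void)cookie;
}

/*
 * If follow_link were declared to return int, or put_link lacked the
 * third argument, these initializers would draw an incompatible
 * pointer type diagnostic from the compiler.
 */
struct inode_operations probe_ops = {
        .follow_link = probe_follow_link,
        .put_link = probe_put_link,
};
```

A failed compile would then select the fallback code for unpatched kernels.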




To fix things up on old kernels, I can think of two options:

        1. Disable symlink caching via the page_*link() API on unpatched
           kernels.  OpenAFS already has code which might be made to work:

                src/afs/LINUX/osi_vnodeops.c::
                        afs_linux_readlink()
                        afs_linux_follow_link()

           but these methods were written for pre-2.4 kernels, so they'd
           need updating.


        2. Prior to the patch that went into linux-2.6.13, the NFS client
           code in the Linux kernel was using its own symlink caching
           code, which is why NFS was never affected by the bug.
           Adopting something similar to the old NFS code should fix
           OpenAFS; here is a patch from 2005 which should be suitable as
           a starting point:


Subject: Re: Kernel bug: Bad page state: related to generic symlink code and mmap
        From:       Al Viro <[EMAIL PROTECTED]>
        Date:       2005-08-19 18:02:18
        Message-ID: [EMAIL PROTECTED]

archived at:
        http://www.uwsg.indiana.edu/hypermail/linux/kernel/0508.2/0923.html
or:
        http://marc.info/?l=linux-kernel&m=112447444702991



I don't know if this bug affects 2.4 kernels as well as 2.6 kernels older than 2.6.13.


-Chris Wing
[EMAIL PROTECTED]
