Re: kernel BUG at fs/dcache.c:648! with v3.11-7890-ge5c832d

2013-09-10 Thread Linus Torvalds
On Tue, Sep 10, 2013 at 8:56 PM, Al Viro wrote:
>
> I do. What we need on the second pass (one where we currently
> take seq_writelock()) is exclusion against writers; nothing we are
> doing is worth disturbing the readers - we don't change any data
> structures. And simple grabbing the
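The distinction Al draws here (exclude writers on the second pass, but do not force lockless readers to retry) can be sketched with a toy userspace model. This is only an illustration under stated assumptions: it is not the kernel's seqlock_t, every toy_* name is hypothetical, and it uses default sequentially consistent C11 atomics instead of the barriers a real seqlock needs around the protected data.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy model only; NOT the kernel's seqlock_t. All toy_* names are hypothetical. */
struct toy_seqlock {
    atomic_uint seq;    /* even: no write in progress; odd: write in progress */
    atomic_flag locked; /* simple spinlock serializing all lock holders */
};

#define TOY_SEQLOCK_INIT { 0, ATOMIC_FLAG_INIT }

static void toy_spin_lock(struct toy_seqlock *sl)
{
    while (atomic_flag_test_and_set(&sl->locked))
        ; /* spin until the flag was previously clear */
}

static void toy_spin_unlock(struct toy_seqlock *sl)
{
    atomic_flag_clear(&sl->locked);
}

/* Writer: takes the lock AND bumps the sequence, so lockless readers retry. */
static void toy_write_lock(struct toy_seqlock *sl)
{
    toy_spin_lock(sl);
    atomic_fetch_add(&sl->seq, 1);
}

static void toy_write_unlock(struct toy_seqlock *sl)
{
    atomic_fetch_add(&sl->seq, 1);
    toy_spin_unlock(sl);
}

/* Locking reader: excludes writers but leaves the sequence untouched. */
static void toy_read_excl_lock(struct toy_seqlock *sl)
{
    toy_spin_lock(sl);
}

static void toy_read_excl_unlock(struct toy_seqlock *sl)
{
    toy_spin_unlock(sl);
}

/* Lockless reader: sample an even sequence, then check it afterwards. */
static unsigned toy_read_begin(struct toy_seqlock *sl)
{
    unsigned s;

    while ((s = atomic_load(&sl->seq)) & 1)
        ; /* a write is in progress; wait for it to finish */
    return s;
}

static bool toy_read_retry(struct toy_seqlock *sl, unsigned start)
{
    return atomic_load(&sl->seq) != start;
}

/* Single-threaded walk-through of the difference between the two lock modes. */
static void toy_seqlock_demo(void)
{
    struct toy_seqlock sl = TOY_SEQLOCK_INIT;
    unsigned start = toy_read_begin(&sl);

    toy_read_excl_lock(&sl);
    toy_read_excl_unlock(&sl);
    assert(!toy_read_retry(&sl, start)); /* locking reader disturbed nobody */

    toy_write_lock(&sl);
    toy_write_unlock(&sl);
    assert(toy_read_retry(&sl, start));  /* writer forces a retry */
}
```

In this model a holder of toy_read_excl_lock() still keeps writers out, because both paths contend on the same spinlock, while a concurrent lockless reader sees an unchanged sequence and has no reason to retry.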

Re: kernel BUG at fs/dcache.c:648! with v3.11-7890-ge5c832d

2013-09-10 Thread Al Viro
On Tue, Sep 10, 2013 at 08:22:56PM -0700, Linus Torvalds wrote:
> On Tue, Sep 10, 2013 at 6:48 PM, Waiman Long wrote:
> >
> > I need to clean up some comments in the code. The other thing that I want to
> > do is to introduce read_seqlock/read_sequnlock() primitives that do the
> > locking

Re: kernel BUG at fs/dcache.c:648! with v3.11-7890-ge5c832d

2013-09-10 Thread Linus Torvalds
On Tue, Sep 10, 2013 at 6:48 PM, Waiman Long wrote:
>
> I need to clean up some comments in the code. The other thing that I want to
> do is to introduce read_seqlock/read_sequnlock() primitives that do the
> locking without incrementing the sequence number.

I don't understand. That's the whole

Re: kernel BUG at fs/dcache.c:648! with v3.11-7890-ge5c832d

2013-09-10 Thread Waiman Long
On 09/10/2013 04:25 PM, Linus Torvalds wrote:
> On Tue, Sep 10, 2013 at 12:57 PM, Mace Moneta wrote:
>> The (first) patch looks good; no recurrence. It has only taken 3-5 minutes
>> before, and I've been up for about half an hour now.
>
> Ok, good. It's pushed out. Al, your third pile of VFS stuff is

Re: kernel BUG at fs/dcache.c:648! with v3.11-7890-ge5c832d

2013-09-10 Thread Josh Boyer
On Tue, Sep 10, 2013 at 4:25 PM, Linus Torvalds wrote:
> On Tue, Sep 10, 2013 at 12:57 PM, Mace Moneta wrote:
>> The (first) patch looks good; no recurrence. It has only taken 3-5 minutes
>> before, and I've been up for about half an hour now.
>
> Ok, good. It's pushed out.

Thanks to the both

Re: kernel BUG at fs/dcache.c:648! with v3.11-7890-ge5c832d

2013-09-10 Thread Linus Torvalds
On Tue, Sep 10, 2013 at 12:57 PM, Mace Moneta wrote:
> The (first) patch looks good; no recurrence. It has only taken 3-5 minutes
> before, and I've been up for about half an hour now.

Ok, good. It's pushed out. Al, your third pile of VFS stuff is also merged. Waiman, that means that your RCU

Re: kernel BUG at fs/dcache.c:648! with v3.11-7890-ge5c832d

2013-09-10 Thread Linus Torvalds
On Tue, Sep 10, 2013 at 12:13 PM, Al Viro wrote:
>
> Ugh... I really don't like that - your patch introduces the situations
> when race with chroot can lead to two absolute symlinks in the same path
> being interpreted wrt different roots. And yes, sure, anybody who gets
> in that kind of races

Re: kernel BUG at fs/dcache.c:648! with v3.11-7890-ge5c832d

2013-09-10 Thread Linus Torvalds
On Tue, Sep 10, 2013 at 11:29 AM, Mace Moneta wrote:
> I can patch and compile a kernel if you post the patch you want tested.

It was attached in that previous email.. Apparently I'm not the only one who misses things like versions in subject lines or other small "details" in emails ;)

Re: kernel BUG at fs/dcache.c:648! with v3.11-7890-ge5c832d

2013-09-10 Thread Al Viro
On Tue, Sep 10, 2013 at 12:01:22PM -0700, Linus Torvalds wrote:
> On Tue, Sep 10, 2013 at 11:43 AM, Al Viro wrote:
> >
> > !LOOKUP_ROOT: we set nd->root the first time we need / (in the very
> > beginning if it's an absolute pathname, on the first absolute symlink
> > otherwise). In non-RCU mode

Re: kernel BUG at fs/dcache.c:648! with v3.11-7890-ge5c832d

2013-09-10 Thread Linus Torvalds
On Tue, Sep 10, 2013 at 11:43 AM, Al Viro wrote:
>
> !LOOKUP_ROOT: we set nd->root the first time we need / (in the very
> beginning if it's an absolute pathname, on the first absolute symlink
> otherwise). In non-RCU mode we hold a reference to it; in RCU mode
> we do not. As the result,

Re: kernel BUG at fs/dcache.c:648! with v3.11-7890-ge5c832d

2013-09-10 Thread Al Viro
On Tue, Sep 10, 2013 at 11:25:44AM -0700, Linus Torvalds wrote:
>         nd->flags &= ~LOOKUP_RCU;
>         if (!(nd->flags & LOOKUP_ROOT))
>                 nd->root.mnt = NULL;
>         unlock_rcu_walk();
>
> and my unlazy_walk() essentially terminated the
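The quoted RCU-mode teardown can be exercised in isolation with a small compilable model. This is a sketch under assumptions, not the code in fs/namei.c: the toy_* types are hypothetical stand-ins, the flag values are arbitrary rather than the kernel's, and the RCU unlock is a stub that only counts calls. It shows the one behavior the snippet encodes: on leaving RCU mode the root is forgotten unless the caller supplied it via LOOKUP_ROOT.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; flag values are arbitrary, not the kernel's. */
#define TOY_LOOKUP_RCU  0x1u
#define TOY_LOOKUP_ROOT 0x2u

struct toy_mount { int unused; };
struct toy_path { struct toy_mount *mnt; };
struct toy_nameidata {
    unsigned flags;
    struct toy_path root;
};

/* Stub for unlock_rcu_walk(): only counts how often it was called. */
static int toy_rcu_unlock_calls;

static void toy_unlock_rcu_walk(void)
{
    toy_rcu_unlock_calls++;
}

/* Mirrors the quoted snippet line for line, using the toy types. */
static void toy_terminate_rcu_walk(struct toy_nameidata *nd)
{
    nd->flags &= ~TOY_LOOKUP_RCU;
    if (!(nd->flags & TOY_LOOKUP_ROOT))
        nd->root.mnt = NULL; /* root was not pinned in RCU mode: forget it */
    toy_unlock_rcu_walk();
}

/* Walk-through of both cases. */
static void toy_terminate_demo(void)
{
    struct toy_mount m = { 0 };
    struct toy_nameidata plain = { TOY_LOOKUP_RCU, { &m } };
    struct toy_nameidata rooted = { TOY_LOOKUP_RCU | TOY_LOOKUP_ROOT, { &m } };

    toy_terminate_rcu_walk(&plain);
    assert(plain.root.mnt == NULL);  /* cleared: walk chose the root itself */

    toy_terminate_rcu_walk(&rooted);
    assert(rooted.root.mnt == &m);   /* kept: caller supplied the root */
}
```

Any path that leaves RCU mode without performing the equivalent of this clearing step would keep a root pointer for which no reference was ever taken, which is the mismatch the thread is discussing.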

Re: kernel BUG at fs/dcache.c:648! with v3.11-7890-ge5c832d

2013-09-10 Thread Linus Torvalds
On Tue, Sep 10, 2013 at 10:47 AM, Linus Torvalds wrote:
>
> But yes, e5c832d is obviously the "fixed" kernel. Let me think about this.

Ok, I think I found it.

I missed that "terminate_walk()" for the RCU case does this:

        nd->flags &= ~LOOKUP_RCU;
        if (!(nd->flags

Re: kernel BUG at fs/dcache.c:648! with v3.11-7890-ge5c832d

2013-09-10 Thread Linus Torvalds
On Tue, Sep 10, 2013 at 10:14 AM, Josh Boyer wrote:
>
> We've had a user report a backtrace from hitting the
> BUG_ON(!ret->d_lockref.count) added with the lockref infrastructure
> (commit 98474236f72) on rawhide today[1]. I've grabbed the backtrace
> below. The user has btrfs, NFS, and sshfs

Re: kernel BUG at fs/dcache.c:648! with v3.11-7890-ge5c832d

2013-09-10 Thread Josh Boyer
On Tue, Sep 10, 2013 at 1:33 PM, Linus Torvalds wrote:
> On Tue, Sep 10, 2013 at 10:14 AM, Josh Boyer wrote:
>>
>> We've had a user report a backtrace from hitting the
>> BUG_ON(!ret->d_lockref.count) added with the lockref infrastructure
>> (commit 98474236f72) on rawhide today[1]. I've

Re: kernel BUG at fs/dcache.c:648! with v3.11-7890-ge5c832d

2013-09-10 Thread Linus Torvalds
On Tue, Sep 10, 2013 at 10:39 AM, Josh Boyer wrote:
>
> The subject says v3.11-7890-ge5c832d, which is the git-describe output
> of the mainline kernel for that Fedora build.

Duh. I just read the bugzilla and the oops ;)

But yes, e5c832d is obviously the "fixed" kernel. Let me think about this.

kernel BUG at fs/dcache.c:648! with v3.11-7890-ge5c832d

2013-09-10 Thread Josh Boyer
Hi All, We've had a user report a backtrace from hitting the BUG_ON(!ret->d_lockref.count) added with the lockref infrastructure (commit 98474236f72) on rawhide today[1]. I've grabbed the backtrace below. The user has btrfs, NFS, and sshfs in usage with this oops. I've not seen anything
