Re: Regression in next with ext4 oops

2016-10-05 Thread Kalle Valo
Tony Lindgren  writes:

> * Kalle Valo  [161004 12:42]:
>> Tony Lindgren  writes:
>> 
>> > And the patch below seems to fix the issue as the driver is now
>> > using devm_kzalloc. Will do some more testing and then will post
>> > a proper patch. The same issue might be there for SPI glue also.
>> 
>> This was already posted and you even acked it :)
>> 
>> wlcore: sdio: drop kfree for memory allocated with devm_kzalloc
>> 
>> https://patchwork.kernel.org/patch/9353985/
>
> Heh well now we know :) Can you please apply it as it fixes a memory
> corruption issue?

Yeah, I'll apply it to wireless-drivers.git soon.

> The SPI glue does not have this same issue.

Good to know, thanks for checking.

-- 
Kalle Valo


Re: Regression in next with ext4 oops

2016-10-05 Thread Kalle Valo
Tony Lindgren  writes:

> * Kalle Valo  [161004 12:42]:
>> Tony Lindgren  writes:
>> 
>> > And the patch below seems to fix the issue as the driver is now
>> > using devm_kzalloc. Will do some more testing and then will post
>> > a proper patch. The same issue might be there for SPI glue also.
>> 
>> This was already posted and you even acked it :)
>> 
>> wlcore: sdio: drop kfree for memory allocated with devm_kzalloc
>> 
>> https://patchwork.kernel.org/patch/9353985/
>
> Heh well now we know :) Can you please apply it as it fixes a memory
> corruption issue?

Yeah, I'll apply it to wireless-drivers.git soon.

> The SPI glue does not have this same issue.

Good to know, thanks for checking.

-- 
Kalle Valo


Re: Regression in next with ext4 oops

2016-10-04 Thread Jan Kara
On Tue 04-10-16 15:56:39, Al Viro wrote:
> On Tue, Oct 04, 2016 at 11:00:41AM +0200, Jan Kara wrote:
> > Hi!
> > 
> > On Mon 03-10-16 16:30:55, Tony Lindgren wrote:
> > > I'm seeing a repeatable oops with Linux next while running
> > > update-initramfs, see below. I tried reverting commit 59aa5a3aeead
> > > ("fscrypto: make filename crypto functions return 0 on success")
> > > as that's the only commit changing ext4_htree_store_dirent, but
> > > that did not help.
> > > 
> > > Anybody else seeing something like this?
> > 
> > Never seen this but I suspect it is a fallout from Al's directory locking
> > changes. In particular ext4_htree_fill_tree() builds rb-tree of found
> > directory entries in file->private_data (and generally modifies the
> > structure stored there) but after Al's changes we don't have exclusive
> > access to struct file if I'm right so if two processes end up calling
> > getdents() for the same 'struct file' we are doomed.
> 
> RTFS.  We sure as hell *do* have exclusive access to struct file.  See
> /* POSIX.1-2008/SUSv4 Section XSI 2.9.7 */
> if (S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode))
> f->f_mode |= FMODE_ATOMIC_POS;
> in do_dentry_open() as well as
> if (file && (file->f_mode & FMODE_ATOMIC_POS)) {
> if (file_count(file) > 1) {
> v |= FDPUT_POS_UNLOCK;
> mutex_lock(>f_pos_lock);
> }
> }
> in __fdget_pos() and
> f = fdget_pos(fd);
> if (!f.file)
> return -EBADF;
> in getdents(2).

Yeah, sorry. I've misunderstood how the FMODE_ATOMIC_POS thing works...

Honza
-- 
Jan Kara 
SUSE Labs, CR


Re: Regression in next with ext4 oops

2016-10-04 Thread Jan Kara
On Tue 04-10-16 15:56:39, Al Viro wrote:
> On Tue, Oct 04, 2016 at 11:00:41AM +0200, Jan Kara wrote:
> > Hi!
> > 
> > On Mon 03-10-16 16:30:55, Tony Lindgren wrote:
> > > I'm seeing a repeatable oops with Linux next while running
> > > update-initramfs, see below. I tried reverting commit 59aa5a3aeead
> > > ("fscrypto: make filename crypto functions return 0 on success")
> > > as that's the only commit changing ext4_htree_store_dirent, but
> > > that did not help.
> > > 
> > > Anybody else seeing something like this?
> > 
> > Never seen this but I suspect it is a fallout from Al's directory locking
> > changes. In particular ext4_htree_fill_tree() builds rb-tree of found
> > directory entries in file->private_data (and generally modifies the
> > structure stored there) but after Al's changes we don't have exclusive
> > access to struct file if I'm right so if two processes end up calling
> > getdents() for the same 'struct file' we are doomed.
> 
> RTFS.  We sure as hell *do* have exclusive access to struct file.  See
> /* POSIX.1-2008/SUSv4 Section XSI 2.9.7 */
> if (S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode))
> f->f_mode |= FMODE_ATOMIC_POS;
> in do_dentry_open() as well as
> if (file && (file->f_mode & FMODE_ATOMIC_POS)) {
> if (file_count(file) > 1) {
> v |= FDPUT_POS_UNLOCK;
> mutex_lock(>f_pos_lock);
> }
> }
> in __fdget_pos() and
> f = fdget_pos(fd);
> if (!f.file)
> return -EBADF;
> in getdents(2).

Yeah, sorry. I've misunderstood how the FMODE_ATOMIC_POS thing works...

Honza
-- 
Jan Kara 
SUSE Labs, CR


Re: Regression in next with ext4 oops

2016-10-04 Thread Tony Lindgren
* Kalle Valo  [161004 12:42]:
> Tony Lindgren  writes:
> 
> > * Tony Lindgren  [161004 12:17]:
> >> Hi,
> >> 
> >> * Al Viro  [161004 08:00]:
> >> > On Tue, Oct 04, 2016 at 10:02:31AM -0400, Theodore Ts'o wrote:
> >> > > On Tue, Oct 04, 2016 at 11:00:41AM +0200, Jan Kara wrote:
> >> > > > Never seen this but I suspect it is a fallout from Al's directory 
> >> > > > locking
> >> > > > changes. In particular ext4_htree_fill_tree() builds rb-tree of found
> >> > > > directory entries in file->private_data (and generally modifies the
> >> > > > structure stored there) but after Al's changes we don't have 
> >> > > > exclusive
> >> > > > access to struct file if I'm right so if two processes end up calling
> >> > > > getdents() for the same 'struct file' we are doomed.
> >> > > 
> >> > > I haven't seen it either, and I've been doing a lot of testing on the
> >> > > ext4 test branch.  So I'm guessing Tony has the only reliable repro
> >> > > for the problem at the moment.  That being said, it shouldn't be that
> >> > > hard to create a test case for this and add it to xfstests.
> >> > > 
> >> > > I'm pretty sure Jan is right about this, though, but it would be great
> >> > > to a get a quick confirmation from Tony if at all possible.
> >> > 
> >> > Jan is wrong - we do have per-struct-file serialization for getdents()
> >> > et.al.  It might be a race between getdents() on *different* struct
> >> > file for the same directory, but ->private_data is not a problem.
> >> 
> >> OK found the guilty person after git bisect and that's me.
> >> 
> >> Git bisect points to commit d776fc86b82f ("wlcore: sdio: Populate config
> >> firmware data"), so adding Kalle to Cc.
> >> 
> >> Looks like update-initramfs does rmmod of wlcore_sdio and that triggers
> >> some issue with the wlcore driver or with SDIO/MMC. Or maybe it's a memory
> >> corruption issue. I don't know yet exactly what's going on here yet but
> >> I plan to find out after some lunch.
> >
> > And the patch below seems to fix the issue as the driver is now
> > using devm_kzalloc. Will do some more testing and then will post
> > a proper patch. The same issue might be there for SPI glue also.
> 
> This was already posted and you even acked it :)
> 
> wlcore: sdio: drop kfree for memory allocated with devm_kzalloc
> 
> https://patchwork.kernel.org/patch/9353985/

Heh well now we know :) Can you please apply it as it fixes a memory
corruption issue?

The SPI glue does not have this same issue.

Regards,

Tony


Re: Regression in next with ext4 oops

2016-10-04 Thread Tony Lindgren
* Kalle Valo  [161004 12:42]:
> Tony Lindgren  writes:
> 
> > * Tony Lindgren  [161004 12:17]:
> >> Hi,
> >> 
> >> * Al Viro  [161004 08:00]:
> >> > On Tue, Oct 04, 2016 at 10:02:31AM -0400, Theodore Ts'o wrote:
> >> > > On Tue, Oct 04, 2016 at 11:00:41AM +0200, Jan Kara wrote:
> >> > > > Never seen this but I suspect it is a fallout from Al's directory 
> >> > > > locking
> >> > > > changes. In particular ext4_htree_fill_tree() builds rb-tree of found
> >> > > > directory entries in file->private_data (and generally modifies the
> >> > > > structure stored there) but after Al's changes we don't have 
> >> > > > exclusive
> >> > > > access to struct file if I'm right so if two processes end up calling
> >> > > > getdents() for the same 'struct file' we are doomed.
> >> > > 
> >> > > I haven't seen it either, and I've been doing a lot of testing on the
> >> > > ext4 test branch.  So I'm guessing Tony has the only reliable repro
> >> > > for the problem at the moment.  That being said, it shouldn't be that
> >> > > hard to create a test case for this and add it to xfstests.
> >> > > 
> >> > > I'm pretty sure Jan is right about this, though, but it would be great
> >> > > to a get a quick confirmation from Tony if at all possible.
> >> > 
> >> > Jan is wrong - we do have per-struct-file serialization for getdents()
> >> > et.al.  It might be a race between getdents() on *different* struct
> >> > file for the same directory, but ->private_data is not a problem.
> >> 
> >> OK found the guilty person after git bisect and that's me.
> >> 
> >> Git bisect points to commit d776fc86b82f ("wlcore: sdio: Populate config
> >> firmware data"), so adding Kalle to Cc.
> >> 
> >> Looks like update-initramfs does rmmod of wlcore_sdio and that triggers
> >> some issue with the wlcore driver or with SDIO/MMC. Or maybe it's a memory
> >> corruption issue. I don't know yet exactly what's going on here yet but
> >> I plan to find out after some lunch.
> >
> > And the patch below seems to fix the issue as the driver is now
> > using devm_kzalloc. Will do some more testing and then will post
> > a proper patch. The same issue might be there for SPI glue also.
> 
> This was already posted and you even acked it :)
> 
> wlcore: sdio: drop kfree for memory allocated with devm_kzalloc
> 
> https://patchwork.kernel.org/patch/9353985/

Heh well now we know :) Can you please apply it as it fixes a memory
corruption issue?

The SPI glue does not have this same issue.

Regards,

Tony


Re: Regression in next with ext4 oops

2016-10-04 Thread Kalle Valo
Tony Lindgren  writes:

> * Tony Lindgren  [161004 12:17]:
>> Hi,
>> 
>> * Al Viro  [161004 08:00]:
>> > On Tue, Oct 04, 2016 at 10:02:31AM -0400, Theodore Ts'o wrote:
>> > > On Tue, Oct 04, 2016 at 11:00:41AM +0200, Jan Kara wrote:
>> > > > Never seen this but I suspect it is a fallout from Al's directory 
>> > > > locking
>> > > > changes. In particular ext4_htree_fill_tree() builds rb-tree of found
>> > > > directory entries in file->private_data (and generally modifies the
>> > > > structure stored there) but after Al's changes we don't have exclusive
>> > > > access to struct file if I'm right so if two processes end up calling
>> > > > getdents() for the same 'struct file' we are doomed.
>> > > 
>> > > I haven't seen it either, and I've been doing a lot of testing on the
>> > > ext4 test branch.  So I'm guessing Tony has the only reliable repro
>> > > for the problem at the moment.  That being said, it shouldn't be that
>> > > hard to create a test case for this and add it to xfstests.
>> > > 
>> > > I'm pretty sure Jan is right about this, though, but it would be great
>> > > to a get a quick confirmation from Tony if at all possible.
>> > 
>> > Jan is wrong - we do have per-struct-file serialization for getdents()
>> > et.al.  It might be a race between getdents() on *different* struct
>> > file for the same directory, but ->private_data is not a problem.
>> 
>> OK found the guilty person after git bisect and that's me.
>> 
>> Git bisect points to commit d776fc86b82f ("wlcore: sdio: Populate config
>> firmware data"), so adding Kalle to Cc.
>> 
>> Looks like update-initramfs does rmmod of wlcore_sdio and that triggers
>> some issue with the wlcore driver or with SDIO/MMC. Or maybe it's a memory
>> corruption issue. I don't know yet exactly what's going on here yet but
>> I plan to find out after some lunch.
>
> And the patch below seems to fix the issue as the driver is now
> using devm_kzalloc. Will do some more testing and then will post
> a proper patch. The same issue might be there for SPI glue also.

This was already posted and you even acked it :)

wlcore: sdio: drop kfree for memory allocated with devm_kzalloc

https://patchwork.kernel.org/patch/9353985/

-- 
Kalle Valo


Re: Regression in next with ext4 oops

2016-10-04 Thread Kalle Valo
Tony Lindgren  writes:

> * Tony Lindgren  [161004 12:17]:
>> Hi,
>> 
>> * Al Viro  [161004 08:00]:
>> > On Tue, Oct 04, 2016 at 10:02:31AM -0400, Theodore Ts'o wrote:
>> > > On Tue, Oct 04, 2016 at 11:00:41AM +0200, Jan Kara wrote:
>> > > > Never seen this but I suspect it is a fallout from Al's directory 
>> > > > locking
>> > > > changes. In particular ext4_htree_fill_tree() builds rb-tree of found
>> > > > directory entries in file->private_data (and generally modifies the
>> > > > structure stored there) but after Al's changes we don't have exclusive
>> > > > access to struct file if I'm right so if two processes end up calling
>> > > > getdents() for the same 'struct file' we are doomed.
>> > > 
>> > > I haven't seen it either, and I've been doing a lot of testing on the
>> > > ext4 test branch.  So I'm guessing Tony has the only reliable repro
>> > > for the problem at the moment.  That being said, it shouldn't be that
>> > > hard to create a test case for this and add it to xfstests.
>> > > 
>> > > I'm pretty sure Jan is right about this, though, but it would be great
>> > > to a get a quick confirmation from Tony if at all possible.
>> > 
>> > Jan is wrong - we do have per-struct-file serialization for getdents()
>> > et.al.  It might be a race between getdents() on *different* struct
>> > file for the same directory, but ->private_data is not a problem.
>> 
>> OK found the guilty person after git bisect and that's me.
>> 
>> Git bisect points to commit d776fc86b82f ("wlcore: sdio: Populate config
>> firmware data"), so adding Kalle to Cc.
>> 
>> Looks like update-initramfs does rmmod of wlcore_sdio and that triggers
>> some issue with the wlcore driver or with SDIO/MMC. Or maybe it's a memory
>> corruption issue. I don't know yet exactly what's going on here yet but
>> I plan to find out after some lunch.
>
> And the patch below seems to fix the issue as the driver is now
> using devm_kzalloc. Will do some more testing and then will post
> a proper patch. The same issue might be there for SPI glue also.

This was already posted and you even acked it :)

wlcore: sdio: drop kfree for memory allocated with devm_kzalloc

https://patchwork.kernel.org/patch/9353985/

-- 
Kalle Valo


Re: Regression in next with ext4 oops

2016-10-04 Thread Tony Lindgren
* Tony Lindgren  [161004 12:17]:
> Hi,
> 
> * Al Viro  [161004 08:00]:
> > On Tue, Oct 04, 2016 at 10:02:31AM -0400, Theodore Ts'o wrote:
> > > On Tue, Oct 04, 2016 at 11:00:41AM +0200, Jan Kara wrote:
> > > > Never seen this but I suspect it is a fallout from Al's directory 
> > > > locking
> > > > changes. In particular ext4_htree_fill_tree() builds rb-tree of found
> > > > directory entries in file->private_data (and generally modifies the
> > > > structure stored there) but after Al's changes we don't have exclusive
> > > > access to struct file if I'm right so if two processes end up calling
> > > > getdents() for the same 'struct file' we are doomed.
> > > 
> > > I haven't seen it either, and I've been doing a lot of testing on the
> > > ext4 test branch.  So I'm guessing Tony has the only reliable repro
> > > for the problem at the moment.  That being said, it shouldn't be that
> > > hard to create a test case for this and add it to xfstests.
> > > 
> > > I'm pretty sure Jan is right about this, though, but it would be great
> > > to a get a quick confirmation from Tony if at all possible.
> > 
> > Jan is wrong - we do have per-struct-file serialization for getdents()
> > et.al.  It might be a race between getdents() on *different* struct
> > file for the same directory, but ->private_data is not a problem.
> 
> OK found the guilty person after git bisect and that's me.
> 
> Git bisect points to commit d776fc86b82f ("wlcore: sdio: Populate config
> firmware data"), so adding Kalle to Cc.
> 
> Looks like update-initramfs does rmmod of wlcore_sdio and that triggers
> some issue with the wlcore driver or with SDIO/MMC. Or maybe it's a memory
> corruption issue. I don't know yet exactly what's going on here yet but
> I plan to find out after some lunch.

And the patch below seems to fix the issue as the driver is now
using devm_kzalloc. Will do some more testing and then will post
a proper patch. The same issue might be there for SPI glue also.

Regards,

Tony

8< -
--- a/drivers/net/wireless/ti/wlcore/sdio.c
+++ b/drivers/net/wireless/ti/wlcore/sdio.c
@@ -391,7 +391,6 @@ static void wl1271_remove(struct sdio_func *func)
pm_runtime_get_noresume(>dev);
 
platform_device_unregister(glue->core);
-   kfree(glue);
 }
 
 #ifdef CONFIG_PM


Re: Regression in next with ext4 oops

2016-10-04 Thread Tony Lindgren
* Tony Lindgren  [161004 12:17]:
> Hi,
> 
> * Al Viro  [161004 08:00]:
> > On Tue, Oct 04, 2016 at 10:02:31AM -0400, Theodore Ts'o wrote:
> > > On Tue, Oct 04, 2016 at 11:00:41AM +0200, Jan Kara wrote:
> > > > Never seen this but I suspect it is a fallout from Al's directory 
> > > > locking
> > > > changes. In particular ext4_htree_fill_tree() builds rb-tree of found
> > > > directory entries in file->private_data (and generally modifies the
> > > > structure stored there) but after Al's changes we don't have exclusive
> > > > access to struct file if I'm right so if two processes end up calling
> > > > getdents() for the same 'struct file' we are doomed.
> > > 
> > > I haven't seen it either, and I've been doing a lot of testing on the
> > > ext4 test branch.  So I'm guessing Tony has the only reliable repro
> > > for the problem at the moment.  That being said, it shouldn't be that
> > > hard to create a test case for this and add it to xfstests.
> > > 
> > > I'm pretty sure Jan is right about this, though, but it would be great
> > > to a get a quick confirmation from Tony if at all possible.
> > 
> > Jan is wrong - we do have per-struct-file serialization for getdents()
> > et.al.  It might be a race between getdents() on *different* struct
> > file for the same directory, but ->private_data is not a problem.
> 
> OK found the guilty person after git bisect and that's me.
> 
> Git bisect points to commit d776fc86b82f ("wlcore: sdio: Populate config
> firmware data"), so adding Kalle to Cc.
> 
> Looks like update-initramfs does rmmod of wlcore_sdio and that triggers
> some issue with the wlcore driver or with SDIO/MMC. Or maybe it's a memory
> corruption issue. I don't know yet exactly what's going on here yet but
> I plan to find out after some lunch.

And the patch below seems to fix the issue as the driver is now
using devm_kzalloc. Will do some more testing and then will post
a proper patch. The same issue might be there for SPI glue also.

Regards,

Tony

8< -
--- a/drivers/net/wireless/ti/wlcore/sdio.c
+++ b/drivers/net/wireless/ti/wlcore/sdio.c
@@ -391,7 +391,6 @@ static void wl1271_remove(struct sdio_func *func)
pm_runtime_get_noresume(>dev);
 
platform_device_unregister(glue->core);
-   kfree(glue);
 }
 
 #ifdef CONFIG_PM


Re: Regression in next with ext4 oops

2016-10-04 Thread Tony Lindgren
* Theodore Ts'o  [161004 12:08]:
> On Tue, Oct 04, 2016 at 03:59:15PM +0100, Al Viro wrote:
> > Jan is wrong - we do have per-struct-file serialization for getdents()
> > et.al.  It might be a race between getdents() on *different* struct
> > file for the same directory, but ->private_data is not a problem.
> 
> So the rb_insert_color() OOPS seems to indicate that the rbtree has
> gotten corrupted, and it hangs off of private_data.

Yes so it seems..

> And I've just checked the ext4.git tree and I can't see any changes in
> the dev branch (which I was just about to push to Linus) that would
> impact the readdir path for non-encrypted directories.  So if it's a
> race on ->private_data I'm not sure what else might be going wrong.
> 
> Hmm... Tony, what version was the last good version where it wouldn't
> reproduce for you?  v4.7?   Does it reproduce for you on v4.8?

Well I just found it's caused by my commit d776fc86b82f ("wlcore:
sdio: Populate config firmware data") and has nothing to do with
ext4, see the mail I just posted.

> Also, would you mind seeing if you can reproduce it on the dev
> branch of the ext4 git tree?
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git dev

Seems like there's nothing wrong with ext4, this is probably some
memory corruption issue.

Thanks everybody for help and sorry for the false ext4 alert.

Regards,

Tony


Re: Regression in next with ext4 oops

2016-10-04 Thread Tony Lindgren
* Theodore Ts'o  [161004 12:08]:
> On Tue, Oct 04, 2016 at 03:59:15PM +0100, Al Viro wrote:
> > Jan is wrong - we do have per-struct-file serialization for getdents()
> > et.al.  It might be a race between getdents() on *different* struct
> > file for the same directory, but ->private_data is not a problem.
> 
> So the rb_insert_color() OOPS seems to indicate that the rbtree has
> gotten corrupted, and it hangs off of private_data.

Yes so it seems..

> And I've just checked the ext4.git tree and I can't see any changes in
> the dev branch (which I was just about to push to Linus) that would
> impact the readdir path for non-encrypted directories.  So if it's a
> race on ->private_data I'm not sure what else might be going wrong.
> 
> Hmm... Tony, what version was the last good version where it wouldn't
> reproduce for you?  v4.7?   Does it reproduce for you on v4.8?

Well I just found it's caused by my commit d776fc86b82f ("wlcore:
sdio: Populate config firmware data") and has nothing to do with
ext4, see the mail I just posted.

> Also, would you mind seeing if you can reproduce it on the dev
> branch of the ext4 git tree?
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git dev

Seems like there's nothing wrong with ext4, this is probably some
memory corruption issue.

Thanks everybody for help and sorry for the false ext4 alert.

Regards,

Tony


Re: Regression in next with ext4 oops

2016-10-04 Thread Tony Lindgren
Hi,

* Al Viro  [161004 08:00]:
> On Tue, Oct 04, 2016 at 10:02:31AM -0400, Theodore Ts'o wrote:
> > On Tue, Oct 04, 2016 at 11:00:41AM +0200, Jan Kara wrote:
> > > Never seen this but I suspect it is a fallout from Al's directory locking
> > > changes. In particular ext4_htree_fill_tree() builds rb-tree of found
> > > directory entries in file->private_data (and generally modifies the
> > > structure stored there) but after Al's changes we don't have exclusive
> > > access to struct file if I'm right so if two processes end up calling
> > > getdents() for the same 'struct file' we are doomed.
> > 
> > I haven't seen it either, and I've been doing a lot of testing on the
> > ext4 test branch.  So I'm guessing Tony has the only reliable repro
> > for the problem at the moment.  That being said, it shouldn't be that
> > hard to create a test case for this and add it to xfstests.
> > 
> > I'm pretty sure Jan is right about this, though, but it would be great
> > to a get a quick confirmation from Tony if at all possible.
> 
> Jan is wrong - we do have per-struct-file serialization for getdents()
> et.al.  It might be a race between getdents() on *different* struct
> file for the same directory, but ->private_data is not a problem.

OK found the guilty person after git bisect and that's me.

Git bisect points to commit d776fc86b82f ("wlcore: sdio: Populate config
firmware data"), so adding Kalle to Cc.

Looks like update-initramfs does rmmod of wlcore_sdio and that triggers
some issue with the wlcore driver or with SDIO/MMC. Or maybe it's a memory
corruption issue. I don't know yet exactly what's going on here yet but
I plan to find out after some lunch.

Regards,

Tony


Re: Regression in next with ext4 oops

2016-10-04 Thread Tony Lindgren
Hi,

* Al Viro  [161004 08:00]:
> On Tue, Oct 04, 2016 at 10:02:31AM -0400, Theodore Ts'o wrote:
> > On Tue, Oct 04, 2016 at 11:00:41AM +0200, Jan Kara wrote:
> > > Never seen this but I suspect it is a fallout from Al's directory locking
> > > changes. In particular ext4_htree_fill_tree() builds rb-tree of found
> > > directory entries in file->private_data (and generally modifies the
> > > structure stored there) but after Al's changes we don't have exclusive
> > > access to struct file if I'm right so if two processes end up calling
> > > getdents() for the same 'struct file' we are doomed.
> > 
> > I haven't seen it either, and I've been doing a lot of testing on the
> > ext4 test branch.  So I'm guessing Tony has the only reliable repro
> > for the problem at the moment.  That being said, it shouldn't be that
> > hard to create a test case for this and add it to xfstests.
> > 
> > I'm pretty sure Jan is right about this, though, but it would be great
> > to a get a quick confirmation from Tony if at all possible.
> 
> Jan is wrong - we do have per-struct-file serialization for getdents()
> et.al.  It might be a race between getdents() on *different* struct
> file for the same directory, but ->private_data is not a problem.

OK found the guilty person after git bisect and that's me.

Git bisect points to commit d776fc86b82f ("wlcore: sdio: Populate config
firmware data"), so adding Kalle to Cc.

Looks like update-initramfs does rmmod of wlcore_sdio and that triggers
some issue with the wlcore driver or with SDIO/MMC. Or maybe it's a memory
corruption issue. I don't know yet exactly what's going on here yet but
I plan to find out after some lunch.

Regards,

Tony


Re: Regression in next with ext4 oops

2016-10-04 Thread Theodore Ts'o
On Tue, Oct 04, 2016 at 03:59:15PM +0100, Al Viro wrote:
> Jan is wrong - we do have per-struct-file serialization for getdents()
> et.al.  It might be a race between getdents() on *different* struct
> file for the same directory, but ->private_data is not a problem.

So the rb_insert_color() OOPS seems to indicate that the rbtree has
gotten corrupted, and it hangs off of private_data.

And I've just checked the ext4.git tree and I can't see any changes in
the dev branch (which I was just about to push to Linus) that would
impact the readdir path for non-encrypted directories.  So if it's a
race on ->private_data I'm not sure what else might be going wrong.

Hmm... Tony, what version was the last good version where it wouldn't
reproduce for you?  v4.7?   Does it reproduce for you on v4.8?

Also, would you mind seeing if you can reproduce it on the dev
branch of the ext4 git tree?

git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git dev

Thanks,

- Ted




Re: Regression in next with ext4 oops

2016-10-04 Thread Theodore Ts'o
On Tue, Oct 04, 2016 at 03:59:15PM +0100, Al Viro wrote:
> Jan is wrong - we do have per-struct-file serialization for getdents()
> et.al.  It might be a race between getdents() on *different* struct
> file for the same directory, but ->private_data is not a problem.

So the rb_insert_color() OOPS seems to indicate that the rbtree has
gotten corrupted, and it hangs off of private_data.

And I've just checked the ext4.git tree and I can't see any changes in
the dev branch (which I was just about to push to Linus) that would
impact the readdir path for non-encrypted directories.  So if it's a
race on ->private_data I'm not sure what else might be going wrong.

Hmm... Tony, what version was the last good version where it wouldn't
reproduce for you?  v4.7?   Does it reproduce for you on v4.8?

Also, would you mind seeing if you can reproduce it on the dev
branch of the ext4 git tree?

git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git dev

Thanks,

- Ted




Re: Regression in next with ext4 oops

2016-10-04 Thread Al Viro
On Tue, Oct 04, 2016 at 11:00:41AM +0200, Jan Kara wrote:
> Hi!
> 
> On Mon 03-10-16 16:30:55, Tony Lindgren wrote:
> > I'm seeing a repeatable oops with Linux next while running
> > update-initramfs, see below. I tried reverting commit 59aa5a3aeead
> > ("fscrypto: make filename crypto functions return 0 on success")
> > as that's the only commit changing ext4_htree_store_dirent, but
> > that did not help.
> > 
> > Anybody else seeing something like this?
> 
> Never seen this but I suspect it is a fallout from Al's directory locking
> changes. In particular ext4_htree_fill_tree() builds rb-tree of found
> directory entries in file->private_data (and generally modifies the
> structure stored there) but after Al's changes we don't have exclusive
> access to struct file if I'm right so if two processes end up calling
> getdents() for the same 'struct file' we are doomed.

RTFS.  We sure as hell *do* have exclusive access to struct file.  See
/* POSIX.1-2008/SUSv4 Section XSI 2.9.7 */
if (S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode))
f->f_mode |= FMODE_ATOMIC_POS;
in do_dentry_open() as well as
if (file && (file->f_mode & FMODE_ATOMIC_POS)) {
if (file_count(file) > 1) {
v |= FDPUT_POS_UNLOCK;
mutex_lock(>f_pos_lock);
}
}
in __fdget_pos() and
f = fdget_pos(fd);
if (!f.file)
return -EBADF;
in getdents(2).


Re: Regression in next with ext4 oops

2016-10-04 Thread Al Viro
On Tue, Oct 04, 2016 at 11:00:41AM +0200, Jan Kara wrote:
> Hi!
> 
> On Mon 03-10-16 16:30:55, Tony Lindgren wrote:
> > I'm seeing a repeatable oops with Linux next while running
> > update-initramfs, see below. I tried reverting commit 59aa5a3aeead
> > ("fscrypto: make filename crypto functions return 0 on success")
> > as that's the only commit changing ext4_htree_store_dirent, but
> > that did not help.
> > 
> > Anybody else seeing something like this?
> 
> Never seen this but I suspect it is a fallout from Al's directory locking
> changes. In particular ext4_htree_fill_tree() builds rb-tree of found
> directory entries in file->private_data (and generally modifies the
> structure stored there) but after Al's changes we don't have exclusive
> access to struct file if I'm right so if two processes end up calling
> getdents() for the same 'struct file' we are doomed.

RTFS.  We sure as hell *do* have exclusive access to struct file.  See
/* POSIX.1-2008/SUSv4 Section XSI 2.9.7 */
if (S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode))
f->f_mode |= FMODE_ATOMIC_POS;
in do_dentry_open() as well as
if (file && (file->f_mode & FMODE_ATOMIC_POS)) {
if (file_count(file) > 1) {
v |= FDPUT_POS_UNLOCK;
mutex_lock(>f_pos_lock);
}
}
in __fdget_pos() and
f = fdget_pos(fd);
if (!f.file)
return -EBADF;
in getdents(2).


Re: Regression in next with ext4 oops

2016-10-04 Thread Al Viro
On Tue, Oct 04, 2016 at 10:02:31AM -0400, Theodore Ts'o wrote:
> On Tue, Oct 04, 2016 at 11:00:41AM +0200, Jan Kara wrote:
> > Never seen this but I suspect it is a fallout from Al's directory locking
> > changes. In particular ext4_htree_fill_tree() builds rb-tree of found
> > directory entries in file->private_data (and generally modifies the
> > structure stored there) but after Al's changes we don't have exclusive
> > access to struct file if I'm right so if two processes end up calling
> > getdents() for the same 'struct file' we are doomed.
> 
> I haven't seen it either, and I've been doing a lot of testing on the
> ext4 test branch.  So I'm guessing Tony has the only reliable repro
> for the problem at the moment.  That being said, it shouldn't be that
> hard to create a test case for this and add it to xfstests.
> 
> I'm pretty sure Jan is right about this, though, but it would be great
> to a get a quick confirmation from Tony if at all possible.

Jan is wrong - we do have per-struct-file serialization for getdents()
et.al.  It might be a race between getdents() on *different* struct
file for the same directory, but ->private_data is not a problem.


Re: Regression in next with ext4 oops

2016-10-04 Thread Al Viro
On Tue, Oct 04, 2016 at 10:02:31AM -0400, Theodore Ts'o wrote:
> On Tue, Oct 04, 2016 at 11:00:41AM +0200, Jan Kara wrote:
> > Never seen this but I suspect it is a fallout from Al's directory locking
> > changes. In particular ext4_htree_fill_tree() builds rb-tree of found
> > directory entries in file->private_data (and generally modifies the
> > structure stored there) but after Al's changes we don't have exclusive
> > access to struct file if I'm right so if two processes end up calling
> > getdents() for the same 'struct file' we are doomed.
> 
> I haven't seen it either, and I've been doing a lot of testing on the
> ext4 test branch.  So I'm guessing Tony has the only reliable repro
> for the problem at the moment.  That being said, it shouldn't be that
> hard to create a test case for this and add it to xfstests.
> 
> I'm pretty sure Jan is right about this, though, but it would be great
> to a get a quick confirmation from Tony if at all possible.

Jan is wrong - we do have per-struct-file serialization for getdents()
et.al.  It might be a race between getdents() on *different* struct
file for the same directory, but ->private_data is not a problem.


Re: Regression in next with ext4 oops

2016-10-04 Thread Tony Lindgren
* Theodore Ts'o  [161004 07:04]:
> On Tue, Oct 04, 2016 at 11:00:41AM +0200, Jan Kara wrote:
> > Never seen this but I suspect it is a fallout from Al's directory locking
> > changes. In particular ext4_htree_fill_tree() builds rb-tree of found
> > directory entries in file->private_data (and generally modifies the
> > structure stored there) but after Al's changes we don't have exclusive
> > access to struct file if I'm right so if two processes end up calling
> > getdents() for the same 'struct file' we are doomed.
> 
> I haven't seen it either, and I've been doing a lot of testing on the
> ext4 test branch.  So I'm guessing Tony has the only reliable repro
> for the problem at the moment.  That being said, it shouldn't be that
> hard to create a test case for this and add it to xfstests.
> 
> I'm pretty sure Jan is right about this, though, but it would be great
> to a get a quick confirmation from Tony if at all possible.

Looks like Jan's patch somehow makes things worse for me.

Regards,

Tony


Re: Regression in next with ext4 oops

2016-10-04 Thread Tony Lindgren
* Theodore Ts'o  [161004 07:04]:
> On Tue, Oct 04, 2016 at 11:00:41AM +0200, Jan Kara wrote:
> > Never seen this but I suspect it is a fallout from Al's directory locking
> > changes. In particular ext4_htree_fill_tree() builds rb-tree of found
> > directory entries in file->private_data (and generally modifies the
> > structure stored there) but after Al's changes we don't have exclusive
> > access to struct file if I'm right so if two processes end up calling
> > getdents() for the same 'struct file' we are doomed.
> 
> I haven't seen it either, and I've been doing a lot of testing on the
> ext4 test branch.  So I'm guessing Tony has the only reliable repro
> for the problem at the moment.  That being said, it shouldn't be that
> hard to create a test case for this and add it to xfstests.
> 
> I'm pretty sure Jan is right about this, though, but it would be great
> to a get a quick confirmation from Tony if at all possible.

Looks like Jan's patch somehow makes things worse for me.

Regards,

Tony


Re: Regression in next with ext4 oops

2016-10-04 Thread Tony Lindgren
* Jan Kara  [161004 02:01]:
> Hi!
> 
> On Mon 03-10-16 16:30:55, Tony Lindgren wrote:
> > I'm seeing a repeatable oops with Linux next while running
> > update-initramfs, see below. I tried reverting commit 59aa5a3aeead
> > ("fscrypto: make filename crypto functions return 0 on success")
> > as that's the only commit changing ext4_htree_store_dirent, but
> > that did not help.
> > 
> > Anybody else seeing something like this?
> 
> Never seen this but I suspect it is a fallout from Al's directory locking
> changes. In particular ext4_htree_fill_tree() builds rb-tree of found
> directory entries in file->private_data (and generally modifies the
> structure stored there) but after Al's changes we don't have exclusive
> access to struct file if I'm right so if two processes end up calling
> getdents() for the same 'struct file' we are doomed.

OK..

> That being said two getdents() calls for a single fd looks like a stupid
> things to do but I don't see anything that would prevent this. Does the
> attached patch fix the issue for you?

Looks like with your patch I saw the oops few times during kernel boot
already, see below. And the errors seem more random now..

I've verified I can run mkinitramfs just fine with my curent for-next
that's based on 4.8.0-rc6, so this does not seem to be any memory or
MMC related issue.

Trying to bisect this would be a pain as the error does not happen
every time..

Regards,

Tony

8< ---
Unable to handle kernel NULL pointer dereference at virtual address 0004
pgd = c22a46c0
[0004] *pgd=
Internal error: Oops: 205 [#1] SMP ARM
Modules linked in: usbkbd usb_f_rndis usb_f_ecm usb_f_acm u_ether 
usb_f_mass_storage usb]
CPU: 1 PID: 364 Comm: startpar Not tainted 4.8.0-rc8-next-20161003+ #1005
Hardware name: Generic OMAP5 (Flattened Device Tree)
task: c2090ec0 task.stack: c25b2000
PC is at rb_insert_color+0x1c/0x1b8
LR is at ext4_htree_store_dirent+0xe0/0x104
pc : []lr : []psr: 600f0013
sp : c25b3db0  ip : c2297bd8  fp : c0a17618
r10:   r9 : c25b3e00  r8 : c2297bd0
r7 : c2298f80  r6 : 79de0293  r5 : 998623c8  r4 : 
r3 :   r2 : c2297c08  r1 : c2298f80  r0 : c2297bd8
Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 30c5387d  Table: 822a46c0  DAC: 
Process startpar (pid: 364, stack limit = 0xc25b2218)
Stack: (0xc25b3db0 to 0xc25b4000)
3da0:  998623c8 79de0293 c042ade0
3dc0: e92460b4 ed080200 ed084790 c25b3e58 e92460bc e9246ff8  c043ee9c
3de0: c25b3e00 ed080200 e9246000 1000 00b4 c028a6e4 0009 ee48a400
3e00: e92460bc 000d c1025cb4 c2298f80 ed0848c0 ed084790  ee48a400
3e20:  c25b3e58 c2257800 c043fc14   0006 024080c0
3e40: c2298fa0  600f0013 c100253c c10936b4 c02871ac 998623c8 79de0293
3e60: 0004 c22570b8 c2298f80 c0383f40 030d8fb9 69db434f c2298f80 c20aa000
3e80: 7f5c2000 c2298f80 ed0848c0 c25b3f68 ed084790 ed084830 ee48a400 ed084790
3ea0: c2257800 c042aa04   600f0013 c10930b0 0001 c028c230
3ec0: 0001      0800 600f0013
3ee0:  0002  ed084830  0001 c03b4e3c ed084830
3f00: ee48a400  bef92f0c c08d4530 0001 ee48a400  
3f20: c25b3f68 ed084830 ee48a400  bef92f0c c03b4e70 ee48a400 bef90800
3f40: 7f59d01d 7f5ba6f0 8000 0004 00d9 ee48a400 ee48a400 
3f60: bef92f0c c03b54a0 c03b5124    7f5ba6f0 
3f80: 8000  0020 0020 7f5ba6d0 7f5ba6f0 00d9 c02080c4
3fa0: c25b2000 c0207f40 0020 7f5ba6d0 0004 7f5ba6f0 8000 
3fc0: 0020 7f5ba6d0 7f5ba6f0 00d9 b6f15850   bef92f0c
3fe0: 00d9 bef9080c b6e57fa3 b6e006f6 200f0030 0004  00150072
[] (rb_insert_color) from [] 
(ext4_htree_store_dirent+0xe0/0x104)
[] (ext4_htree_store_dirent) from [] 
(htree_dirblock_to_tree+0xb0/0x)
[] (htree_dirblock_to_tree) from [] 
(ext4_htree_fill_tree+0x7c/0x2a0)
[] (ext4_htree_fill_tree) from [] (ext4_readdir+0x62c/0x910)
[] (ext4_readdir) from [] (iterate_dir+0xcc/0x194)
[] (iterate_dir) from [] (SyS_getdents64+0x7c/0x10c)
[] (SyS_getdents64) from [] (ret_fast_syscall+0x0/0x1c)
Code: e5923000 e3130001 1a62 e92d4070 (e593c004)
---[ end trace 8970a95866a25898 ]---


Re: Regression in next with ext4 oops

2016-10-04 Thread Tony Lindgren
* Jan Kara  [161004 02:01]:
> Hi!
> 
> On Mon 03-10-16 16:30:55, Tony Lindgren wrote:
> > I'm seeing a repeatable oops with Linux next while running
> > update-initramfs, see below. I tried reverting commit 59aa5a3aeead
> > ("fscrypto: make filename crypto functions return 0 on success")
> > as that's the only commit changing ext4_htree_store_dirent, but
> > that did not help.
> > 
> > Anybody else seeing something like this?
> 
> Never seen this but I suspect it is a fallout from Al's directory locking
> changes. In particular ext4_htree_fill_tree() builds rb-tree of found
> directory entries in file->private_data (and generally modifies the
> structure stored there) but after Al's changes we don't have exclusive
> access to struct file if I'm right so if two processes end up calling
> getdents() for the same 'struct file' we are doomed.

OK..

> That being said two getdents() calls for a single fd looks like a stupid
> things to do but I don't see anything that would prevent this. Does the
> attached patch fix the issue for you?

Looks like with your patch I saw the oops few times during kernel boot
already, see below. And the errors seem more random now..

I've verified I can run mkinitramfs just fine with my curent for-next
that's based on 4.8.0-rc6, so this does not seem to be any memory or
MMC related issue.

Trying to bisect this would be a pain as the error does not happen
every time..

Regards,

Tony

8< ---
Unable to handle kernel NULL pointer dereference at virtual address 0004
pgd = c22a46c0
[0004] *pgd=
Internal error: Oops: 205 [#1] SMP ARM
Modules linked in: usbkbd usb_f_rndis usb_f_ecm usb_f_acm u_ether 
usb_f_mass_storage usb]
CPU: 1 PID: 364 Comm: startpar Not tainted 4.8.0-rc8-next-20161003+ #1005
Hardware name: Generic OMAP5 (Flattened Device Tree)
task: c2090ec0 task.stack: c25b2000
PC is at rb_insert_color+0x1c/0x1b8
LR is at ext4_htree_store_dirent+0xe0/0x104
pc : []lr : []psr: 600f0013
sp : c25b3db0  ip : c2297bd8  fp : c0a17618
r10:   r9 : c25b3e00  r8 : c2297bd0
r7 : c2298f80  r6 : 79de0293  r5 : 998623c8  r4 : 
r3 :   r2 : c2297c08  r1 : c2298f80  r0 : c2297bd8
Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 30c5387d  Table: 822a46c0  DAC: 
Process startpar (pid: 364, stack limit = 0xc25b2218)
Stack: (0xc25b3db0 to 0xc25b4000)
3da0:  998623c8 79de0293 c042ade0
3dc0: e92460b4 ed080200 ed084790 c25b3e58 e92460bc e9246ff8  c043ee9c
3de0: c25b3e00 ed080200 e9246000 1000 00b4 c028a6e4 0009 ee48a400
3e00: e92460bc 000d c1025cb4 c2298f80 ed0848c0 ed084790  ee48a400
3e20:  c25b3e58 c2257800 c043fc14   0006 024080c0
3e40: c2298fa0  600f0013 c100253c c10936b4 c02871ac 998623c8 79de0293
3e60: 0004 c22570b8 c2298f80 c0383f40 030d8fb9 69db434f c2298f80 c20aa000
3e80: 7f5c2000 c2298f80 ed0848c0 c25b3f68 ed084790 ed084830 ee48a400 ed084790
3ea0: c2257800 c042aa04   600f0013 c10930b0 0001 c028c230
3ec0: 0001      0800 600f0013
3ee0:  0002  ed084830  0001 c03b4e3c ed084830
3f00: ee48a400  bef92f0c c08d4530 0001 ee48a400  
3f20: c25b3f68 ed084830 ee48a400  bef92f0c c03b4e70 ee48a400 bef90800
3f40: 7f59d01d 7f5ba6f0 8000 0004 00d9 ee48a400 ee48a400 
3f60: bef92f0c c03b54a0 c03b5124    7f5ba6f0 
3f80: 8000  0020 0020 7f5ba6d0 7f5ba6f0 00d9 c02080c4
3fa0: c25b2000 c0207f40 0020 7f5ba6d0 0004 7f5ba6f0 8000 
3fc0: 0020 7f5ba6d0 7f5ba6f0 00d9 b6f15850   bef92f0c
3fe0: 00d9 bef9080c b6e57fa3 b6e006f6 200f0030 0004  00150072
[] (rb_insert_color) from [] 
(ext4_htree_store_dirent+0xe0/0x104)
[] (ext4_htree_store_dirent) from [] 
(htree_dirblock_to_tree+0xb0/0x)
[] (htree_dirblock_to_tree) from [] 
(ext4_htree_fill_tree+0x7c/0x2a0)
[] (ext4_htree_fill_tree) from [] (ext4_readdir+0x62c/0x910)
[] (ext4_readdir) from [] (iterate_dir+0xcc/0x194)
[] (iterate_dir) from [] (SyS_getdents64+0x7c/0x10c)
[] (SyS_getdents64) from [] (ret_fast_syscall+0x0/0x1c)
Code: e5923000 e3130001 1a62 e92d4070 (e593c004)
---[ end trace 8970a95866a25898 ]---


Re: Regression in next with ext4 oops

2016-10-04 Thread Theodore Ts'o
On Tue, Oct 04, 2016 at 11:00:41AM +0200, Jan Kara wrote:
> Never seen this but I suspect it is a fallout from Al's directory locking
> changes. In particular ext4_htree_fill_tree() builds rb-tree of found
> directory entries in file->private_data (and generally modifies the
> structure stored there) but after Al's changes we don't have exclusive
> access to struct file if I'm right so if two processes end up calling
> getdents() for the same 'struct file' we are doomed.

I haven't seen it either, and I've been doing a lot of testing on the
ext4 test branch.  So I'm guessing Tony has the only reliable repro
for the problem at the moment.  That being said, it shouldn't be that
hard to create a test case for this and add it to xfstests.

I'm pretty sure Jan is right about this, though, but it would be great
to a get a quick confirmation from Tony if at all possible.

- Ted


Re: Regression in next with ext4 oops

2016-10-04 Thread Theodore Ts'o
On Tue, Oct 04, 2016 at 11:00:41AM +0200, Jan Kara wrote:
> Never seen this but I suspect it is a fallout from Al's directory locking
> changes. In particular ext4_htree_fill_tree() builds rb-tree of found
> directory entries in file->private_data (and generally modifies the
> structure stored there) but after Al's changes we don't have exclusive
> access to struct file if I'm right so if two processes end up calling
> getdents() for the same 'struct file' we are doomed.

I haven't seen it either, and I've been doing a lot of testing on the
ext4 test branch.  So I'm guessing Tony has the only reliable repro
for the problem at the moment.  That being said, it shouldn't be that
hard to create a test case for this and add it to xfstests.

I'm pretty sure Jan is right about this, though, but it would be great
to a get a quick confirmation from Tony if at all possible.

- Ted


Re: Regression in next with ext4 oops

2016-10-04 Thread Jan Kara
Hi!

On Mon 03-10-16 16:30:55, Tony Lindgren wrote:
> I'm seeing a repeatable oops with Linux next while running
> update-initramfs, see below. I tried reverting commit 59aa5a3aeead
> ("fscrypto: make filename crypto functions return 0 on success")
> as that's the only commit changing ext4_htree_store_dirent, but
> that did not help.
> 
> Anybody else seeing something like this?

Never seen this but I suspect it is a fallout from Al's directory locking
changes. In particular ext4_htree_fill_tree() builds rb-tree of found
directory entries in file->private_data (and generally modifies the
structure stored there) but after Al's changes we don't have exclusive
access to struct file if I'm right so if two processes end up calling
getdents() for the same 'struct file' we are doomed.

That being said two getdents() calls for a single fd looks like a stupid
things to do but I don't see anything that would prevent this. Does the
attached patch fix the issue for you?

If I'm right, better fix would be to exclude two ->readdir callbacks for
one fd but that would be slightly more complicated patch...

Honza
> 8< --
> Unable to handle kernel NULL pointer dereference at virtual address 0004
> pgd = ee7e7280
> [0004] *pgd=
> Internal error: Oops: 205 [#1] SMP ARM
> Modules linked in: ledtrig_default_on ledtrig_heartbeat hid_generic usbhid 
> smsc95xx smsc]
> CPU: 1 PID: 2299 Comm: mkinitramfs Not tainted 4.8.0-rc8-next-20160930+ #974
> Hardware name: Generic OMAP5 (Flattened Device Tree)
> task: ed0b8380 task.stack: ec216000
> PC is at rb_insert_color+0x1c/0x1b8
> LR is at ext4_htree_store_dirent+0xe0/0x104
> pc : []lr : []psr: 600e0013
> sp : ec217db8  ip : ed177948  fp : c0a17618
> r10: 0001  r9 : ec217e08  r8 : ed177940
> r7 : ec224b80  r6 : 2d7d188a  r5 : 83aa4108  r4 : 
> r3 :   r2 : ed389e88  r1 : ec224b80  r0 : ed177948
> Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
> Control: 30c5387d  Table: ae7e7280  DAC: 
> Process mkinitramfs (pid: 2299, stack limit = 0xec216218)
> Stack: (0xec217db8 to 0xec218000)
> 7da0:    83aa4108
> 7dc0: 2d7d188a c042aa70 c0f1a628 ed80c040 ed80d6d0 ec217e60 c0f1a630 c0f1aff8
> 7de0: 0001 c043ecc8 ec217e08 ed80c040 c0f1a000 1000 1628 
> 7e00: 005c ebaf9c80 c0f1a630 000b ec224bc2  0002 ed80d6d0
> 7e20:  ebaf9c80  ec217e60 ec217e70 c043fb8c  
> 7e40: 0006 024080c0 ec224ba0 ec216000 600e0013 c100253c c0f19014 0002
> 7e60: 83aa4108 2d7d188a 0004 ee7bb8b8 ed80c080 c0f19020 c0f19020 
> 7e80: ec224b80 c028ac34 ab5552a8 ec224b80 ed80d800 ec217f70 ed80d6d0 ed80d770
> 7ea0: ebaf9c80 ed80d6d0 ee7b8000 c042a694 0055 ed80d7a4  
> 7ec0: 0001  600e0013 c1093070   0001 
> 7ee0:  c03b49e8    600e0013  0002
> 7f00: ed80d6d0 ed80d770  ed80d6d0 ec217f70 ed80d770 ec216000 ebaf9c80
> 7f20:  0001 ec217f70 ed80d770 ec216000  7f5edef4 c03b4ae8
> 7f40: 7f60261c 7f602611 00c5 7f617428 ebaf9c80 8000 008d ebaf9c80
> 7f60: ec216000  7f5edef4 c03b4f80 c03b4b30   
> 7f80: 7f617428  8000  7f617408 7f60261c 7f617428 008d
> 7fa0: c02080c4 c0207f40 7f617408 7f60261c 0003 7f617428 8000 
> 7fc0: 7f617408 7f60261c 7f617428 008d b6f98820 0002 7f6004c8 7f5edef4
> 7fe0: 008d bed19808 b6edaf41 b6e836f6 600e0030 0003 cf30eaa8 d31d966f
> [] (rb_insert_color) from [] 
> (ext4_htree_store_dirent+0xe0/0x104)
> [] (ext4_htree_store_dirent) from [] 
> (htree_dirblock_to_tree+0xb0/0x)
> [] (htree_dirblock_to_tree) from [] 
> (ext4_htree_fill_tree+0x1c8/0x2a)
> [] (ext4_htree_fill_tree) from [] 
> (ext4_readdir+0x62c/0x910)
> [] (ext4_readdir) from [] (iterate_dir+0x14c/0x194)
> [] (iterate_dir) from [] (SyS_getdents+0x7c/0x118)
> [] (SyS_getdents) from [] (ret_fast_syscall+0x0/0x1c)
> Code: e5923000 e3130001 1a62 e92d4070 (e593c004)
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
-- 
Jan Kara 
SUSE Labs, CR
>From d13477d654e60ec4434c266d11828347a135ca32 Mon Sep 17 00:00:00 2001
From: Jan Kara 
Date: Tue, 4 Oct 2016 10:56:00 +0200
Subject: [PATCH] ext4: Use exclusive locking for ext4_readdir()

We use file->private_data to track information for readdir calls. As
such we cannot allow two readdir calls to happen at the same time for
one struct file as they end up corrupting the information. For now just
revert back to the old behavior where readdirs are synchronized.

Signed-off-by: Jan Kara 
---
 fs/ext4/dir.c | 2 +-
 

Re: Regression in next with ext4 oops

2016-10-04 Thread Jan Kara
Hi!

On Mon 03-10-16 16:30:55, Tony Lindgren wrote:
> I'm seeing a repeatable oops with Linux next while running
> update-initramfs, see below. I tried reverting commit 59aa5a3aeead
> ("fscrypto: make filename crypto functions return 0 on success")
> as that's the only commit changing ext4_htree_store_dirent, but
> that did not help.
> 
> Anybody else seeing something like this?

Never seen this but I suspect it is a fallout from Al's directory locking
changes. In particular ext4_htree_fill_tree() builds rb-tree of found
directory entries in file->private_data (and generally modifies the
structure stored there) but after Al's changes we don't have exclusive
access to struct file if I'm right so if two processes end up calling
getdents() for the same 'struct file' we are doomed.

That being said two getdents() calls for a single fd looks like a stupid
things to do but I don't see anything that would prevent this. Does the
attached patch fix the issue for you?

If I'm right, better fix would be to exclude two ->readdir callbacks for
one fd but that would be slightly more complicated patch...

Honza
> 8< --
> Unable to handle kernel NULL pointer dereference at virtual address 0004
> pgd = ee7e7280
> [0004] *pgd=
> Internal error: Oops: 205 [#1] SMP ARM
> Modules linked in: ledtrig_default_on ledtrig_heartbeat hid_generic usbhid 
> smsc95xx smsc]
> CPU: 1 PID: 2299 Comm: mkinitramfs Not tainted 4.8.0-rc8-next-20160930+ #974
> Hardware name: Generic OMAP5 (Flattened Device Tree)
> task: ed0b8380 task.stack: ec216000
> PC is at rb_insert_color+0x1c/0x1b8
> LR is at ext4_htree_store_dirent+0xe0/0x104
> pc : []lr : []psr: 600e0013
> sp : ec217db8  ip : ed177948  fp : c0a17618
> r10: 0001  r9 : ec217e08  r8 : ed177940
> r7 : ec224b80  r6 : 2d7d188a  r5 : 83aa4108  r4 : 
> r3 :   r2 : ed389e88  r1 : ec224b80  r0 : ed177948
> Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
> Control: 30c5387d  Table: ae7e7280  DAC: 
> Process mkinitramfs (pid: 2299, stack limit = 0xec216218)
> Stack: (0xec217db8 to 0xec218000)
> 7da0:    83aa4108
> 7dc0: 2d7d188a c042aa70 c0f1a628 ed80c040 ed80d6d0 ec217e60 c0f1a630 c0f1aff8
> 7de0: 0001 c043ecc8 ec217e08 ed80c040 c0f1a000 1000 1628 
> 7e00: 005c ebaf9c80 c0f1a630 000b ec224bc2  0002 ed80d6d0
> 7e20:  ebaf9c80  ec217e60 ec217e70 c043fb8c  
> 7e40: 0006 024080c0 ec224ba0 ec216000 600e0013 c100253c c0f19014 0002
> 7e60: 83aa4108 2d7d188a 0004 ee7bb8b8 ed80c080 c0f19020 c0f19020 
> 7e80: ec224b80 c028ac34 ab5552a8 ec224b80 ed80d800 ec217f70 ed80d6d0 ed80d770
> 7ea0: ebaf9c80 ed80d6d0 ee7b8000 c042a694 0055 ed80d7a4  
> 7ec0: 0001  600e0013 c1093070   0001 
> 7ee0:  c03b49e8    600e0013  0002
> 7f00: ed80d6d0 ed80d770  ed80d6d0 ec217f70 ed80d770 ec216000 ebaf9c80
> 7f20:  0001 ec217f70 ed80d770 ec216000  7f5edef4 c03b4ae8
> 7f40: 7f60261c 7f602611 00c5 7f617428 ebaf9c80 8000 008d ebaf9c80
> 7f60: ec216000  7f5edef4 c03b4f80 c03b4b30   
> 7f80: 7f617428  8000  7f617408 7f60261c 7f617428 008d
> 7fa0: c02080c4 c0207f40 7f617408 7f60261c 0003 7f617428 8000 
> 7fc0: 7f617408 7f60261c 7f617428 008d b6f98820 0002 7f6004c8 7f5edef4
> 7fe0: 008d bed19808 b6edaf41 b6e836f6 600e0030 0003 cf30eaa8 d31d966f
> [] (rb_insert_color) from [] 
> (ext4_htree_store_dirent+0xe0/0x104)
> [] (ext4_htree_store_dirent) from [] 
> (htree_dirblock_to_tree+0xb0/0x)
> [] (htree_dirblock_to_tree) from [] 
> (ext4_htree_fill_tree+0x1c8/0x2a)
> [] (ext4_htree_fill_tree) from [] 
> (ext4_readdir+0x62c/0x910)
> [] (ext4_readdir) from [] (iterate_dir+0x14c/0x194)
> [] (iterate_dir) from [] (SyS_getdents+0x7c/0x118)
> [] (SyS_getdents) from [] (ret_fast_syscall+0x0/0x1c)
> Code: e5923000 e3130001 1a62 e92d4070 (e593c004)
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
-- 
Jan Kara 
SUSE Labs, CR
>From d13477d654e60ec4434c266d11828347a135ca32 Mon Sep 17 00:00:00 2001
From: Jan Kara 
Date: Tue, 4 Oct 2016 10:56:00 +0200
Subject: [PATCH] ext4: Use exclusive locking for ext4_readdir()

We use file->private_data to track information for readdir calls. As
such we cannot allow two readdir calls to happen at the same time for
one struct file as they end up corrupting the information. For now just
revert back to the old behavior where readdirs are synchronized.

Signed-off-by: Jan Kara 
---
 fs/ext4/dir.c | 2 +-
 1 file changed, 1 insertion(+), 1 

Regression in next with ext4 oops

2016-10-03 Thread Tony Lindgren
Hi,

I'm seeing a repeatable oops with Linux next while running
update-initramfs, see below. I tried reverting commit 59aa5a3aeead
("fscrypto: make filename crypto functions return 0 on success")
as that's the only commit changing ext4_htree_store_dirent, but
that did not help.

Anybody else seeing something like this?

Regards,

Tony

8< --
Unable to handle kernel NULL pointer dereference at virtual address 0004
pgd = ee7e7280
[0004] *pgd=
Internal error: Oops: 205 [#1] SMP ARM
Modules linked in: ledtrig_default_on ledtrig_heartbeat hid_generic usbhid 
smsc95xx smsc]
CPU: 1 PID: 2299 Comm: mkinitramfs Not tainted 4.8.0-rc8-next-20160930+ #974
Hardware name: Generic OMAP5 (Flattened Device Tree)
task: ed0b8380 task.stack: ec216000
PC is at rb_insert_color+0x1c/0x1b8
LR is at ext4_htree_store_dirent+0xe0/0x104
pc : []lr : []psr: 600e0013
sp : ec217db8  ip : ed177948  fp : c0a17618
r10: 0001  r9 : ec217e08  r8 : ed177940
r7 : ec224b80  r6 : 2d7d188a  r5 : 83aa4108  r4 : 
r3 :   r2 : ed389e88  r1 : ec224b80  r0 : ed177948
Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 30c5387d  Table: ae7e7280  DAC: 
Process mkinitramfs (pid: 2299, stack limit = 0xec216218)
Stack: (0xec217db8 to 0xec218000)
7da0:    83aa4108
7dc0: 2d7d188a c042aa70 c0f1a628 ed80c040 ed80d6d0 ec217e60 c0f1a630 c0f1aff8
7de0: 0001 c043ecc8 ec217e08 ed80c040 c0f1a000 1000 1628 
7e00: 005c ebaf9c80 c0f1a630 000b ec224bc2  0002 ed80d6d0
7e20:  ebaf9c80  ec217e60 ec217e70 c043fb8c  
7e40: 0006 024080c0 ec224ba0 ec216000 600e0013 c100253c c0f19014 0002
7e60: 83aa4108 2d7d188a 0004 ee7bb8b8 ed80c080 c0f19020 c0f19020 
7e80: ec224b80 c028ac34 ab5552a8 ec224b80 ed80d800 ec217f70 ed80d6d0 ed80d770
7ea0: ebaf9c80 ed80d6d0 ee7b8000 c042a694 0055 ed80d7a4  
7ec0: 0001  600e0013 c1093070   0001 
7ee0:  c03b49e8    600e0013  0002
7f00: ed80d6d0 ed80d770  ed80d6d0 ec217f70 ed80d770 ec216000 ebaf9c80
7f20:  0001 ec217f70 ed80d770 ec216000  7f5edef4 c03b4ae8
7f40: 7f60261c 7f602611 00c5 7f617428 ebaf9c80 8000 008d ebaf9c80
7f60: ec216000  7f5edef4 c03b4f80 c03b4b30   
7f80: 7f617428  8000  7f617408 7f60261c 7f617428 008d
7fa0: c02080c4 c0207f40 7f617408 7f60261c 0003 7f617428 8000 
7fc0: 7f617408 7f60261c 7f617428 008d b6f98820 0002 7f6004c8 7f5edef4
7fe0: 008d bed19808 b6edaf41 b6e836f6 600e0030 0003 cf30eaa8 d31d966f
[] (rb_insert_color) from [] 
(ext4_htree_store_dirent+0xe0/0x104)
[] (ext4_htree_store_dirent) from [] 
(htree_dirblock_to_tree+0xb0/0x)
[] (htree_dirblock_to_tree) from [] 
(ext4_htree_fill_tree+0x1c8/0x2a)
[] (ext4_htree_fill_tree) from [] (ext4_readdir+0x62c/0x910)
[] (ext4_readdir) from [] (iterate_dir+0x14c/0x194)
[] (iterate_dir) from [] (SyS_getdents+0x7c/0x118)
[] (SyS_getdents) from [] (ret_fast_syscall+0x0/0x1c)
Code: e5923000 e3130001 1a62 e92d4070 (e593c004)


Regression in next with ext4 oops

2016-10-03 Thread Tony Lindgren
Hi,

I'm seeing a repeatable oops with Linux next while running
update-initramfs, see below. I tried reverting commit 59aa5a3aeead
("fscrypto: make filename crypto functions return 0 on success")
as that's the only commit changing ext4_htree_store_dirent, but
that did not help.

Anybody else seeing something like this?

Regards,

Tony

8< --
Unable to handle kernel NULL pointer dereference at virtual address 0004
pgd = ee7e7280
[0004] *pgd=
Internal error: Oops: 205 [#1] SMP ARM
Modules linked in: ledtrig_default_on ledtrig_heartbeat hid_generic usbhid 
smsc95xx smsc]
CPU: 1 PID: 2299 Comm: mkinitramfs Not tainted 4.8.0-rc8-next-20160930+ #974
Hardware name: Generic OMAP5 (Flattened Device Tree)
task: ed0b8380 task.stack: ec216000
PC is at rb_insert_color+0x1c/0x1b8
LR is at ext4_htree_store_dirent+0xe0/0x104
pc : []lr : []psr: 600e0013
sp : ec217db8  ip : ed177948  fp : c0a17618
r10: 0001  r9 : ec217e08  r8 : ed177940
r7 : ec224b80  r6 : 2d7d188a  r5 : 83aa4108  r4 : 
r3 :   r2 : ed389e88  r1 : ec224b80  r0 : ed177948
Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 30c5387d  Table: ae7e7280  DAC: 
Process mkinitramfs (pid: 2299, stack limit = 0xec216218)
Stack: (0xec217db8 to 0xec218000)
7da0:    83aa4108
7dc0: 2d7d188a c042aa70 c0f1a628 ed80c040 ed80d6d0 ec217e60 c0f1a630 c0f1aff8
7de0: 0001 c043ecc8 ec217e08 ed80c040 c0f1a000 1000 1628 
7e00: 005c ebaf9c80 c0f1a630 000b ec224bc2  0002 ed80d6d0
7e20:  ebaf9c80  ec217e60 ec217e70 c043fb8c  
7e40: 0006 024080c0 ec224ba0 ec216000 600e0013 c100253c c0f19014 0002
7e60: 83aa4108 2d7d188a 0004 ee7bb8b8 ed80c080 c0f19020 c0f19020 
7e80: ec224b80 c028ac34 ab5552a8 ec224b80 ed80d800 ec217f70 ed80d6d0 ed80d770
7ea0: ebaf9c80 ed80d6d0 ee7b8000 c042a694 0055 ed80d7a4  
7ec0: 0001  600e0013 c1093070   0001 
7ee0:  c03b49e8    600e0013  0002
7f00: ed80d6d0 ed80d770  ed80d6d0 ec217f70 ed80d770 ec216000 ebaf9c80
7f20:  0001 ec217f70 ed80d770 ec216000  7f5edef4 c03b4ae8
7f40: 7f60261c 7f602611 00c5 7f617428 ebaf9c80 8000 008d ebaf9c80
7f60: ec216000  7f5edef4 c03b4f80 c03b4b30   
7f80: 7f617428  8000  7f617408 7f60261c 7f617428 008d
7fa0: c02080c4 c0207f40 7f617408 7f60261c 0003 7f617428 8000 
7fc0: 7f617408 7f60261c 7f617428 008d b6f98820 0002 7f6004c8 7f5edef4
7fe0: 008d bed19808 b6edaf41 b6e836f6 600e0030 0003 cf30eaa8 d31d966f
[] (rb_insert_color) from [] 
(ext4_htree_store_dirent+0xe0/0x104)
[] (ext4_htree_store_dirent) from [] 
(htree_dirblock_to_tree+0xb0/0x)
[] (htree_dirblock_to_tree) from [] 
(ext4_htree_fill_tree+0x1c8/0x2a)
[] (ext4_htree_fill_tree) from [] (ext4_readdir+0x62c/0x910)
[] (ext4_readdir) from [] (iterate_dir+0x14c/0x194)
[] (iterate_dir) from [] (SyS_getdents+0x7c/0x118)
[] (SyS_getdents) from [] (ret_fast_syscall+0x0/0x1c)
Code: e5923000 e3130001 1a62 e92d4070 (e593c004)