Re: [CFT][PATCH] kernfs: Correct kernfs directory seeks.

2018-06-07 Thread Al Viro
On Tue, Jun 05, 2018 at 12:47:33PM -0500, Eric W. Biederman wrote:

> Sigh,  I have found another issue with kernfs_fop_readdir.
> 
> We are not currently protecting file->private_data with the kernfs_mutex
> or any other kind of serialization.  Which means if two processes are
> calling readdir on the same file descriptor we might get unpredictable
> behavior.
> 
> It doesn't look too bad and easy enough to fix, but definitely something
> to be watchful of.

As discussed off-list - this is not a problem; getdents() et.al. are
serialized on per-struct-file basis by fdget_pos() in relevant syscalls,
since all directories automatically get FMODE_ATOMIC_POS in ->f_mode.


Re: [CFT][PATCH] kernfs: Correct kernfs directory seeks.

2018-06-07 Thread Al Viro
On Tue, Jun 05, 2018 at 12:47:33PM -0500, Eric W. Biederman wrote:

> Sigh,  I have found another issue with kernfs_fop_readdir.
> 
> We are not currently protecting file->private_data with the kernfs_mutex
> or any other kind of serialization.  Which means if two processes are
> calling readdir on the same file descriptor we might get unpredictable
> behavior.
> 
> It doesn't look too bad and easy enough to fix, but definitely something
> to be watchful of.

As discussed off-list - this is not a problem; getdents() et.al. are
serialized on per-struct-file basis by fdget_pos() in relevant syscalls,
since all directories automatically get FMODE_ATOMIC_POS in ->f_mode.


Re: [CFT][PATCH] kernfs: Correct kernfs directory seeks.

2018-06-05 Thread Eric W. Biederman
"'t...@kernel.org'"  writes:

> Hello,
>
> On Tue, Jun 05, 2018 at 10:31:36AM -0500, Eric W. Biederman wrote:
>> What I have above is not the clearest, and in fact the logic could be
>> better.
>> 
>> The fundamental challenge is because hash collisions are possible a file
>> offset does not hold complete position information in a directory.
>> 
>> So the kernfs node that is to be read/displayed next is saved in the
>> struct file.  The it is tested if the saved kernfs node is usable
>> for finding the location in the directory.  Several things may have
>> gone wrong.
>> 
>> - Someone may have called seekdir.
>> - The saved kernfs node may have been renamed.
>> - The saved kernfs node may have been moved to a different directory in
>>   kernfs.
>> - the saved kernfs node may have been deleted.
>> 
>> If any of those are true the code needs to do the rbtree lookup.
>
> So, given that the whole thing is protected by a mutex which protects
> modifications, it could be an option to simply keep track of who's
> iterating what and shift them on removals.  IOW, always keep cursor
> pointing to the next thing to visit and if that gets removed shift the
> cursor to the next one.

Yes.  We could.

The primary case we have to worry about is someone using seekdir,
and for that we always need the rbtree lookup.   For seekdir
we could invalidate the saved entry and make things simpler
that way.

We could add list_head to the kernfs_node and create:
struct kernfs_dir_file {
struct list_head entry;
struct kernfs_node *kn;
}
And point at that from struct file->private_data.

I don't know if it would be worth the trouble to do that over a quick
check to make certain the kernfs_node is what it is expected to be.
But that is an option.

Part of the pain of supporting seekdir is that the offset we expose
to userspace in has to be 32bit to support 32bit userspace applications.
Which unfortunately is small enough that if nothing else a name
collision can be brute forced.  So we can not avoid handling collisions.


Sigh,  I have found another issue with kernfs_fop_readdir.

We are not currently protecting file->private_data with the kernfs_mutex
or any other kind of serialization.  Which means if two processes are
calling readdir on the same file descriptor we might get unpredictable
behavior.

It doesn't look too bad and easy enough to fix, but definitely something
to be watchful of.

Eric


Re: [CFT][PATCH] kernfs: Correct kernfs directory seeks.

2018-06-05 Thread Eric W. Biederman
"'t...@kernel.org'"  writes:

> Hello,
>
> On Tue, Jun 05, 2018 at 10:31:36AM -0500, Eric W. Biederman wrote:
>> What I have above is not the clearest, and in fact the logic could be
>> better.
>> 
>> The fundamental challenge is because hash collisions are possible a file
>> offset does not hold complete position information in a directory.
>> 
>> So the kernfs node that is to be read/displayed next is saved in the
>> struct file.  The it is tested if the saved kernfs node is usable
>> for finding the location in the directory.  Several things may have
>> gone wrong.
>> 
>> - Someone may have called seekdir.
>> - The saved kernfs node may have been renamed.
>> - The saved kernfs node may have been moved to a different directory in
>>   kernfs.
>> - the saved kernfs node may have been deleted.
>> 
>> If any of those are true the code needs to do the rbtree lookup.
>
> So, given that the whole thing is protected by a mutex which protects
> modifications, it could be an option to simply keep track of who's
> iterating what and shift them on removals.  IOW, always keep cursor
> pointing to the next thing to visit and if that gets removed shift the
> cursor to the next one.

Yes.  We could.

The primary case we have to worry about is someone using seekdir,
and for that we always need the rbtree lookup.   For seekdir
we could invalidate the saved entry and make things simpler
that way.

We could add list_head to the kernfs_node and create:
struct kernfs_dir_file {
struct list_head entry;
struct kernfs_node *kn;
}
And point at that from struct file->private_data.

I don't know if it would be worth the trouble to do that over a quick
check to make certain the kernfs_node is what it is expected to be.
But that is an option.

Part of the pain of supporting seekdir is that the offset we expose
to userspace in has to be 32bit to support 32bit userspace applications.
Which unfortunately is small enough that if nothing else a name
collision can be brute forced.  So we can not avoid handling collisions.


Sigh,  I have found another issue with kernfs_fop_readdir.

We are not currently protecting file->private_data with the kernfs_mutex
or any other kind of serialization.  Which means if two processes are
calling readdir on the same file descriptor we might get unpredictable
behavior.

It doesn't look too bad and easy enough to fix, but definitely something
to be watchful of.

Eric


Re: [CFT][PATCH] kernfs: Correct kernfs directory seeks.

2018-06-05 Thread 't...@kernel.org'
Hello,

On Tue, Jun 05, 2018 at 10:31:36AM -0500, Eric W. Biederman wrote:
> What I have above is not the clearest, and in fact the logic could be
> better.
> 
> The fundamental challenge is because hash collisions are possible a file
> offset does not hold complete position information in a directory.
> 
> So the kernfs node that is to be read/displayed next is saved in the
> struct file.  The it is tested if the saved kernfs node is usable
> for finding the location in the directory.  Several things may have
> gone wrong.
> 
> - Someone may have called seekdir.
> - The saved kernfs node may have been renamed.
> - The saved kernfs node may have been moved to a different directory in
>   kernfs.
> - the saved kernfs node may have been deleted.
> 
> If any of those are true the code needs to do the rbtree lookup.

So, given that the whole thing is protected by a mutex which protects
modifications, it could be an option to simply keep track of who's
iterating what and shift them on removals.  IOW, always keep cursor
pointing to the next thing to visit and if that gets removed shift the
cursor to the next one.

Thanks.

-- 
tejun


Re: [CFT][PATCH] kernfs: Correct kernfs directory seeks.

2018-06-05 Thread 't...@kernel.org'
Hello,

On Tue, Jun 05, 2018 at 10:31:36AM -0500, Eric W. Biederman wrote:
> What I have above is not the clearest, and in fact the logic could be
> better.
> 
> The fundamental challenge is because hash collisions are possible a file
> offset does not hold complete position information in a directory.
> 
> So the kernfs node that is to be read/displayed next is saved in the
> struct file.  The it is tested if the saved kernfs node is usable
> for finding the location in the directory.  Several things may have
> gone wrong.
> 
> - Someone may have called seekdir.
> - The saved kernfs node may have been renamed.
> - The saved kernfs node may have been moved to a different directory in
>   kernfs.
> - the saved kernfs node may have been deleted.
> 
> If any of those are true the code needs to do the rbtree lookup.

So, given that the whole thing is protected by a mutex which protects
modifications, it could be an option to simply keep track of who's
iterating what and shift them on removals.  IOW, always keep cursor
pointing to the next thing to visit and if that gets removed shift the
cursor to the next one.

Thanks.

-- 
tejun


Re: [CFT][PATCH] kernfs: Correct kernfs directory seeks.

2018-06-05 Thread Eric W. Biederman
"Hatayama, Daisuke"  writes:

>> >> +
>> >> + /* Is the saved position usable? */
>> >> + if (saved) {
>> >> + /* Proper parent and hash? */
>> >> + if ((parent != saved->parent) || (saved->hash != hash)) {
>> >> + saved = NULL;
>> >
>> > name is uninitialized in this path.
>> 
>> It is.  name is initialized to "" see above.
>> 
>
> Or when either of the conditions is true, it has resulted in some 
> inconsistent state, right?
> So, why not terminating this session of readdir() immediately by
> returning NULL just as when off is turned out to be invalid?

What I have above is not the clearest, and in fact the logic could be
better.

The fundamental challenge is because hash collisions are possible a file
offset does not hold complete position information in a directory.

So the kernfs node that is to be read/displayed next is saved in the
struct file.  The it is tested if the saved kernfs node is usable
for finding the location in the directory.  Several things may have
gone wrong.

- Someone may have called seekdir.
- The saved kernfs node may have been renamed.
- The saved kernfs node may have been moved to a different directory in
  kernfs.
- the saved kernfs node may have been deleted.

If any of those are true the code needs to do the rbtree lookup.

If the kernfs node has been deleted or moved to a different directory we
can safely use it's name while performing the rbtree lookup.  Which in
the event of a hash collision will be more accurate in finding our old
location, and preventing the same directory entry being returned
multiple times.

Which is completely different than if the directory offset is an invalid
value that will never point to any directory entries.

Eric






Re: [CFT][PATCH] kernfs: Correct kernfs directory seeks.

2018-06-05 Thread Eric W. Biederman
"Hatayama, Daisuke"  writes:

>> >> +
>> >> + /* Is the saved position usable? */
>> >> + if (saved) {
>> >> + /* Proper parent and hash? */
>> >> + if ((parent != saved->parent) || (saved->hash != hash)) {
>> >> + saved = NULL;
>> >
>> > name is uninitialized in this path.
>> 
>> It is.  name is initialized to "" see above.
>> 
>
> Or when either of the conditions is true, it has resulted in some 
> inconsistent state, right?
> So, why not terminating this session of readdir() immediately by
> returning NULL just as when off is turned out to be invalid?

What I have above is not the clearest, and in fact the logic could be
better.

The fundamental challenge is because hash collisions are possible a file
offset does not hold complete position information in a directory.

So the kernfs node that is to be read/displayed next is saved in the
struct file.  The it is tested if the saved kernfs node is usable
for finding the location in the directory.  Several things may have
gone wrong.

- Someone may have called seekdir.
- The saved kernfs node may have been renamed.
- The saved kernfs node may have been moved to a different directory in
  kernfs.
- the saved kernfs node may have been deleted.

If any of those are true the code needs to do the rbtree lookup.

If the kernfs node has been deleted or moved to a different directory we
can safely use it's name while performing the rbtree lookup.  Which in
the event of a hash collision will be more accurate in finding our old
location, and preventing the same directory entry being returned
multiple times.

Which is completely different than if the directory offset is an invalid
value that will never point to any directory entries.

Eric






RE: [CFT][PATCH] kernfs: Correct kernfs directory seeks.

2018-06-04 Thread Hatayama, Daisuke



> -Original Message-
> From: Eric W. Biederman [mailto:ebied...@xmission.com]
> Sent: Tuesday, June 5, 2018 11:03 AM
> To: Hatayama, Daisuke 
> Cc: 'gre...@linuxfoundation.org' ;
> 't...@kernel.org' ; Okajima, Toshiyuki 
> ; linux-kernel@vger.kernel.org;
> 'ebied...@aristanetworks.com' 
> Subject: Re: [CFT][PATCH] kernfs: Correct kernfs directory seeks.
> 
> ebied...@xmission.com (Eric W. Biederman) writes:
> 
> > "Hatayama, Daisuke"  writes:
> >
> >>> Can you test this and please verify it fixes your issue?
> >>
> >> I tried this patch on top of v4.17 but the system fails to boot
> >> without detecting root disks by dracut like this:
> [snip]
> 
> >> OTOH, there's no issue on the pure v4.17 kernel.
> >>
> >> As above, ls /sys/module looks apparently good. But I guess any part of
> >> behavior of getdentries() on sysfs must have changed, affecting the disk
> >> detection...
> >
> > I haven't been able to reproduce this yet.  My test system boots fine.
> > Which fedora are you testing on?
> 
> I reproduced something similar and fedora 28.  So I think I have found
> and fixed the issue.  I believe I simply reversed the test at the end of
> kernfs_dir_pos. AKA "<" instead of ">".

Though too late, I used fedora 28 and RHEL7.5.

I applied this fix to your patch and the system boot was successfully done.

I'll start testing your patch from now on.

> 
> I am going to see if I can test my changes more throughly on this side
> and then repost.
> 
> 
> 
> >>>  fs/kernfs/dir.c | 109
> >>> ++--
> >>>  1 file changed, 67 insertions(+), 42 deletions(-)
> >>>
> >>> diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
> >>> index 89d1dc19340b..8148b5fec48d 100644
> >>> --- a/fs/kernfs/dir.c
> >>> +++ b/fs/kernfs/dir.c
> >>> @@ -1584,53 +1584,75 @@ static int kernfs_dir_fop_release(struct inode
> *inode,
> >>> struct file *filp)
> >>>   return 0;
> >>>  }
> >>>
> >>> +static struct kernfs_node *kernfs_dir_next(struct kernfs_node *pos)
> >>> +{
> >>> + struct rb_node *node = rb_next(>rb);
> >>> + return node ? rb_to_kn(node) : NULL;
> >>> +}
> >>> +
> >>>  static struct kernfs_node *kernfs_dir_pos(const void *ns,
> >>> - struct kernfs_node *parent, loff_t hash, struct kernfs_node *pos)
> >>> + struct kernfs_node *parent, loff_t off, struct kernfs_node *saved)
> >>>  {
> >>> - if (pos) {
> >>> - int valid = kernfs_active(pos) &&
> >>> - pos->parent == parent && hash == pos->hash;
> >>> - kernfs_put(pos);
> >>> - if (!valid)
> >>> - pos = NULL;
> >>> - }
> >>> - if (!pos && (hash > 1) && (hash < INT_MAX)) {
> >>> - struct rb_node *node = parent->dir.children.rb_node;
> >>> - while (node) {
> >>> - pos = rb_to_kn(node);
> >>> -
> >>> - if (hash < pos->hash)
> >>> - node = node->rb_left;
> >>> - else if (hash > pos->hash)
> >>> - node = node->rb_right;
> >>> - else
> >>> - break;
> >>> + struct kernfs_node *pos;
> >>> + struct rb_node *node;
> >>> + unsigned int hash;
> >>> + const char *name = "";
> >>> +
> >>> + /* Is off a valid name hash? */
> >>> + if ((off < 2) || (off >= INT_MAX))
> >>> + return NULL;
> >>> + hash = off;
> >>> +
> >>> + /* Is the saved position usable? */
> >>> + if (saved) {
> >>> + /* Proper parent and hash? */
> >>> + if ((parent != saved->parent) || (saved->hash != hash)) {
> >>> + saved = NULL;
> >>
> >> name is uninitialized in this path.
> >
> > It is.  name is initialized to "" see above.
> >
> >>> + } else {
> >>> + if (kernfs_active(saved))
> >>> + return saved;
> >>> + name = saved->name;
> >>>   }
> >>>   }
> >>&

RE: [CFT][PATCH] kernfs: Correct kernfs directory seeks.

2018-06-04 Thread Hatayama, Daisuke



> -Original Message-
> From: Eric W. Biederman [mailto:ebied...@xmission.com]
> Sent: Tuesday, June 5, 2018 11:03 AM
> To: Hatayama, Daisuke 
> Cc: 'gre...@linuxfoundation.org' ;
> 't...@kernel.org' ; Okajima, Toshiyuki 
> ; linux-kernel@vger.kernel.org;
> 'ebied...@aristanetworks.com' 
> Subject: Re: [CFT][PATCH] kernfs: Correct kernfs directory seeks.
> 
> ebied...@xmission.com (Eric W. Biederman) writes:
> 
> > "Hatayama, Daisuke"  writes:
> >
> >>> Can you test this and please verify it fixes your issue?
> >>
> >> I tried this patch on top of v4.17 but the system fails to boot
> >> without detecting root disks by dracut like this:
> [snip]
> 
> >> OTOH, there's no issue on the pure v4.17 kernel.
> >>
> >> As above, ls /sys/module looks apparently good. But I guess any part of
> >> behavior of getdentries() on sysfs must have changed, affecting the disk
> >> detection...
> >
> > I haven't been able to reproduce this yet.  My test system boots fine.
> > Which fedora are you testing on?
> 
> I reproduced something similar and fedora 28.  So I think I have found
> and fixed the issue.  I believe I simply reversed the test at the end of
> kernfs_dir_pos. AKA "<" instead of ">".

Though too late, I used fedora 28 and RHEL7.5.

I applied this fix to your patch and the system boot was successfully done.

I'll start testing your patch from now on.

> 
> I am going to see if I can test my changes more throughly on this side
> and then repost.
> 
> 
> 
> >>>  fs/kernfs/dir.c | 109
> >>> ++--
> >>>  1 file changed, 67 insertions(+), 42 deletions(-)
> >>>
> >>> diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
> >>> index 89d1dc19340b..8148b5fec48d 100644
> >>> --- a/fs/kernfs/dir.c
> >>> +++ b/fs/kernfs/dir.c
> >>> @@ -1584,53 +1584,75 @@ static int kernfs_dir_fop_release(struct inode
> *inode,
> >>> struct file *filp)
> >>>   return 0;
> >>>  }
> >>>
> >>> +static struct kernfs_node *kernfs_dir_next(struct kernfs_node *pos)
> >>> +{
> >>> + struct rb_node *node = rb_next(>rb);
> >>> + return node ? rb_to_kn(node) : NULL;
> >>> +}
> >>> +
> >>>  static struct kernfs_node *kernfs_dir_pos(const void *ns,
> >>> - struct kernfs_node *parent, loff_t hash, struct kernfs_node *pos)
> >>> + struct kernfs_node *parent, loff_t off, struct kernfs_node *saved)
> >>>  {
> >>> - if (pos) {
> >>> - int valid = kernfs_active(pos) &&
> >>> - pos->parent == parent && hash == pos->hash;
> >>> - kernfs_put(pos);
> >>> - if (!valid)
> >>> - pos = NULL;
> >>> - }
> >>> - if (!pos && (hash > 1) && (hash < INT_MAX)) {
> >>> - struct rb_node *node = parent->dir.children.rb_node;
> >>> - while (node) {
> >>> - pos = rb_to_kn(node);
> >>> -
> >>> - if (hash < pos->hash)
> >>> - node = node->rb_left;
> >>> - else if (hash > pos->hash)
> >>> - node = node->rb_right;
> >>> - else
> >>> - break;
> >>> + struct kernfs_node *pos;
> >>> + struct rb_node *node;
> >>> + unsigned int hash;
> >>> + const char *name = "";
> >>> +
> >>> + /* Is off a valid name hash? */
> >>> + if ((off < 2) || (off >= INT_MAX))
> >>> + return NULL;
> >>> + hash = off;
> >>> +
> >>> + /* Is the saved position usable? */
> >>> + if (saved) {
> >>> + /* Proper parent and hash? */
> >>> + if ((parent != saved->parent) || (saved->hash != hash)) {
> >>> + saved = NULL;
> >>
> >> name is uninitialized in this path.
> >
> > It is.  name is initialized to "" see above.
> >
> >>> + } else {
> >>> + if (kernfs_active(saved))
> >>> + return saved;
> >>> + name = saved->name;
> >>>   }
> >>>   }
> >>&

RE: [CFT][PATCH] kernfs: Correct kernfs directory seeks.

2018-06-04 Thread Hatayama, Daisuke



> -Original Message-
> From: Eric W. Biederman [mailto:ebied...@xmission.com]
> Sent: Monday, June 4, 2018 11:45 PM
> To: Hatayama, Daisuke 
> Cc: 'gre...@linuxfoundation.org' ;
> 't...@kernel.org' ; Okajima, Toshiyuki 
> ; linux-kernel@vger.kernel.org;
> 'ebied...@aristanetworks.com' 
> Subject: Re: [CFT][PATCH] kernfs: Correct kernfs directory seeks.
> 
> "Hatayama, Daisuke"  writes:
> 
> > Hello,
> >
> > Thanks for this patch, Eric.
> >
> >> -Original Message-
> >> From: Eric W. Biederman [mailto:ebied...@xmission.com]
> >> Sent: Monday, June 4, 2018 3:51 AM
> >> To: Hatayama, Daisuke 
> >> Cc: 'gre...@linuxfoundation.org' ;
> >> 't...@kernel.org' ; Okajima, Toshiyuki
> >> ; linux-kernel@vger.kernel.org;
> >> 'ebied...@aristanetworks.com' 
> >> Subject: [CFT][PATCH] kernfs: Correct kernfs directory seeks.
> >>
> >>
> >> Over the years two bugs have crept into the code that handled the last
> >> returned kernfs directory being deleted, and handled general seeking
> >> in kernfs directories.  The result is that reading the /sys/module
> >> directory while moduled are loading and unloading will sometimes
> >> skip over a module that was present for the entire duration of
> >> the directory read.
> >>
> >> The first bug is that kernfs_dir_next_pos was advancing the position
> >> in the directory even when kernfs_dir_pos had found the starting
> >> kernfs directory entry was not usable.  In this case kernfs_dir_pos is
> >> guaranteed to return the directory entry past the starting directory
> >> entry, making it so that advancing the directory position is wrong.
> >>
> >> The second bug is that kernfs_dir_pos when the saved kernfs_node
> >> did not validate, was not always returning a directory after
> >> the the target position, but was sometimes returning the directory
> >> entry just before the target position.
> >>
> >> To correct these issues and to make the code easily readable and
> >> comprehensible several cleanups are made.
> >>
> >> - A function kernfs_dir_next is factored out to make it straight-forward
> >>   to find the next node in a kernfs directory.
> >>
> >> - A function kernfs_dir_skip_pos is factored out to skip over directory
> >>   entries that should not be shown to userspace.  This function is called
> >>   from kernfs_fop_readdir directly removing the complication of calling
> >>   it from the other directory advancement functions.
> >>
> >> - The kernfs_put of the saved directory entry is moved from kernfs_dir_pos
> >>   to kernfs_fop_readdir.  Keeping it with the kernfs_get when it is saved
> >>   and ensuring the directory position advancment functions can reference
> >>   the saved node without complications.
> >>
> >> - The function kernfs_dir_next_pos is modified to only advance the 
> >> directory
> >>   position in the common case when kernfs_dir_pos validates and returns
> >>   the starting kernfs node.  For all other cases kernfs_dir_pos is
> guaranteed
> >>   to return a directory position in advance of the starting directory
> position.
> >>
> >> - kernfs_dir_pos is given a significant make over.  It's essense remains
> the
> >>   same but some siginficant changes were made.
> >>
> >>   + The offset is validated to be a valid hash value at the very beginning.
> >> The validation is updated to handle the fact that neither 0 nor 1 are
> >> valid hash values.
> >>
> >>   + An extra test is added at the end of the rbtree lookup to ensure
> >> the returned position is at or beyond the target position.
> >>
> >>   + kernfs_name_compare is used during the rbtree lookup instead of just
> >> comparing
> >> the hash.  This allows the the passed namespace parameter to be used,
> and
> >> if it is available from the saved entry the old saved name as well.
> >>
> >>   + The validation of the saved kernfs node now checks for two cases.
> >> If the saved node is returnable.
> >> If the name of the saved node is usable during lookup.
> >>
> >> In net this should result in a easier to understand, and more correct
> >> version of directory traversal for kernfs directories.
> >>
> >> Reported-by: "Hatayama, Daisuke" 
> >> Fixes: a406f75840e1 ("sysfs: use rb-tree for inode number 

RE: [CFT][PATCH] kernfs: Correct kernfs directory seeks.

2018-06-04 Thread Hatayama, Daisuke



> -Original Message-
> From: Eric W. Biederman [mailto:ebied...@xmission.com]
> Sent: Monday, June 4, 2018 11:45 PM
> To: Hatayama, Daisuke 
> Cc: 'gre...@linuxfoundation.org' ;
> 't...@kernel.org' ; Okajima, Toshiyuki 
> ; linux-kernel@vger.kernel.org;
> 'ebied...@aristanetworks.com' 
> Subject: Re: [CFT][PATCH] kernfs: Correct kernfs directory seeks.
> 
> "Hatayama, Daisuke"  writes:
> 
> > Hello,
> >
> > Thanks for this patch, Eric.
> >
> >> -Original Message-
> >> From: Eric W. Biederman [mailto:ebied...@xmission.com]
> >> Sent: Monday, June 4, 2018 3:51 AM
> >> To: Hatayama, Daisuke 
> >> Cc: 'gre...@linuxfoundation.org' ;
> >> 't...@kernel.org' ; Okajima, Toshiyuki
> >> ; linux-kernel@vger.kernel.org;
> >> 'ebied...@aristanetworks.com' 
> >> Subject: [CFT][PATCH] kernfs: Correct kernfs directory seeks.
> >>
> >>
> >> Over the years two bugs have crept into the code that handled the last
> >> returned kernfs directory being deleted, and handled general seeking
> >> in kernfs directories.  The result is that reading the /sys/module
> >> directory while moduled are loading and unloading will sometimes
> >> skip over a module that was present for the entire duration of
> >> the directory read.
> >>
> >> The first bug is that kernfs_dir_next_pos was advancing the position
> >> in the directory even when kernfs_dir_pos had found the starting
> >> kernfs directory entry was not usable.  In this case kernfs_dir_pos is
> >> guaranteed to return the directory entry past the starting directory
> >> entry, making it so that advancing the directory position is wrong.
> >>
> >> The second bug is that kernfs_dir_pos when the saved kernfs_node
> >> did not validate, was not always returning a directory after
> >> the the target position, but was sometimes returning the directory
> >> entry just before the target position.
> >>
> >> To correct these issues and to make the code easily readable and
> >> comprehensible several cleanups are made.
> >>
> >> - A function kernfs_dir_next is factored out to make it straight-forward
> >>   to find the next node in a kernfs directory.
> >>
> >> - A function kernfs_dir_skip_pos is factored out to skip over directory
> >>   entries that should not be shown to userspace.  This function is called
> >>   from kernfs_fop_readdir directly removing the complication of calling
> >>   it from the other directory advancement functions.
> >>
> >> - The kernfs_put of the saved directory entry is moved from kernfs_dir_pos
> >>   to kernfs_fop_readdir.  Keeping it with the kernfs_get when it is saved
> >>   and ensuring the directory position advancment functions can reference
> >>   the saved node without complications.
> >>
> >> - The function kernfs_dir_next_pos is modified to only advance the 
> >> directory
> >>   position in the common case when kernfs_dir_pos validates and returns
> >>   the starting kernfs node.  For all other cases kernfs_dir_pos is
> guaranteed
> >>   to return a directory position in advance of the starting directory
> position.
> >>
> >> - kernfs_dir_pos is given a significant make over.  It's essense remains
> the
> >>   same but some siginficant changes were made.
> >>
> >>   + The offset is validated to be a valid hash value at the very beginning.
> >> The validation is updated to handle the fact that neither 0 nor 1 are
> >> valid hash values.
> >>
> >>   + An extra test is added at the end of the rbtree lookup to ensure
> >> the returned position is at or beyond the target position.
> >>
> >>   + kernfs_name_compare is used during the rbtree lookup instead of just
> >> comparing
> >> the hash.  This allows the the passed namespace parameter to be used,
> and
> >> if it is available from the saved entry the old saved name as well.
> >>
> >>   + The validation of the saved kernfs node now checks for two cases.
> >> If the saved node is returnable.
> >> If the name of the saved node is usable during lookup.
> >>
> >> In net this should result in a easier to understand, and more correct
> >> version of directory traversal for kernfs directories.
> >>
> >> Reported-by: "Hatayama, Daisuke" 
> >> Fixes: a406f75840e1 ("sysfs: use rb-tree for inode number 

Re: [CFT][PATCH] kernfs: Correct kernfs directory seeks.

2018-06-04 Thread Eric W. Biederman
ebied...@xmission.com (Eric W. Biederman) writes:

> "Hatayama, Daisuke"  writes:
>
>>> Can you test this and please verify it fixes your issue?
>>
>> I tried this patch on top of v4.17 but the system fails to boot
>> without detecting root disks by dracut like this:
[snip]

>> OTOH, there's no issue on the pure v4.17 kernel.
>>
>> As above, ls /sys/module looks apparently good. But I guess any part of
>> behavior of getdentries() on sysfs must have changed, affecting the disk
>> detection...
>
> I haven't been able to reproduce this yet.  My test system boots fine.
> Which fedora are you testing on?

I reproduced something similar and fedora 28.  So I think I have found
and fixed the issue.  I believe I simply reversed the test at the end of
kernfs_dir_pos. AKA "<" instead of ">".

I am going to see if I can test my changes more throughly on this side
and then repost.



>>>  fs/kernfs/dir.c | 109
>>> ++--
>>>  1 file changed, 67 insertions(+), 42 deletions(-)
>>> 
>>> diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
>>> index 89d1dc19340b..8148b5fec48d 100644
>>> --- a/fs/kernfs/dir.c
>>> +++ b/fs/kernfs/dir.c
>>> @@ -1584,53 +1584,75 @@ static int kernfs_dir_fop_release(struct inode 
>>> *inode,
>>> struct file *filp)
>>> return 0;
>>>  }
>>> 
>>> +static struct kernfs_node *kernfs_dir_next(struct kernfs_node *pos)
>>> +{
>>> +   struct rb_node *node = rb_next(>rb);
>>> +   return node ? rb_to_kn(node) : NULL;
>>> +}
>>> +
>>>  static struct kernfs_node *kernfs_dir_pos(const void *ns,
>>> -   struct kernfs_node *parent, loff_t hash, struct kernfs_node *pos)
>>> +   struct kernfs_node *parent, loff_t off, struct kernfs_node *saved)
>>>  {
>>> -   if (pos) {
>>> -   int valid = kernfs_active(pos) &&
>>> -   pos->parent == parent && hash == pos->hash;
>>> -   kernfs_put(pos);
>>> -   if (!valid)
>>> -   pos = NULL;
>>> -   }
>>> -   if (!pos && (hash > 1) && (hash < INT_MAX)) {
>>> -   struct rb_node *node = parent->dir.children.rb_node;
>>> -   while (node) {
>>> -   pos = rb_to_kn(node);
>>> -
>>> -   if (hash < pos->hash)
>>> -   node = node->rb_left;
>>> -   else if (hash > pos->hash)
>>> -   node = node->rb_right;
>>> -   else
>>> -   break;
>>> +   struct kernfs_node *pos;
>>> +   struct rb_node *node;
>>> +   unsigned int hash;
>>> +   const char *name = "";
>>> +
>>> +   /* Is off a valid name hash? */
>>> +   if ((off < 2) || (off >= INT_MAX))
>>> +   return NULL;
>>> +   hash = off;
>>> +
>>> +   /* Is the saved position usable? */
>>> +   if (saved) {
>>> +   /* Proper parent and hash? */
>>> +   if ((parent != saved->parent) || (saved->hash != hash)) {
>>> +   saved = NULL;
>>
>> name is uninitialized in this path.
>
> It is.  name is initialized to "" see above.
>
>>> +   } else {
>>> +   if (kernfs_active(saved))
>>> +   return saved;
>>> +   name = saved->name;
>>> }
>>> }
>>> -   /* Skip over entries which are dying/dead or in the wrong namespace
>>> */
>>> -   while (pos && (!kernfs_active(pos) || pos->ns != ns)) {
>>> -   struct rb_node *node = rb_next(>rb);
>>> -   if (!node)
>>> -   pos = NULL;
>>> +
>>> +   /* Find the closest pos to the hash we are looking for */
>>> +   pos = NULL;
>>> +   node = parent->dir.children.rb_node;
>>> +   while (node) {
>>> +   int result;
>>> +
>>> +   pos = rb_to_kn(node);
>>> +   result = kernfs_name_compare(hash, name, ns, pos);
>>> +   if (result < 0)
>>> +   node = node->rb_left;
>>> +   else if (result > 0)
>>> +   node = node->rb_right;
>>> else
>>> -   pos = rb_to_kn(node);
>>> +   break;
>>> }
>>> +
>>> +   /* Ensure pos is at or beyond the target position */
>>> +   if (pos && (kernfs_name_compare(hash, name, ns, pos) < 0))

  should be > 0
>>> +   pos = kernfs_dir_next(pos);
>>> +

Eric


Re: [CFT][PATCH] kernfs: Correct kernfs directory seeks.

2018-06-04 Thread Eric W. Biederman
ebied...@xmission.com (Eric W. Biederman) writes:

> "Hatayama, Daisuke"  writes:
>
>>> Can you test this and please verify it fixes your issue?
>>
>> I tried this patch on top of v4.17 but the system fails to boot
>> without detecting root disks by dracut like this:
[snip]

>> OTOH, there's no issue on the pure v4.17 kernel.
>>
>> As above, ls /sys/module looks apparently good. But I guess any part of
>> behavior of getdentries() on sysfs must have changed, affecting the disk
>> detection...
>
> I haven't been able to reproduce this yet.  My test system boots fine.
> Which fedora are you testing on?

I reproduced something similar and fedora 28.  So I think I have found
and fixed the issue.  I believe I simply reversed the test at the end of
kernfs_dir_pos. AKA "<" instead of ">".

I am going to see if I can test my changes more throughly on this side
and then repost.



>>>  fs/kernfs/dir.c | 109
>>> ++--
>>>  1 file changed, 67 insertions(+), 42 deletions(-)
>>> 
>>> diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
>>> index 89d1dc19340b..8148b5fec48d 100644
>>> --- a/fs/kernfs/dir.c
>>> +++ b/fs/kernfs/dir.c
>>> @@ -1584,53 +1584,75 @@ static int kernfs_dir_fop_release(struct inode 
>>> *inode,
>>> struct file *filp)
>>> return 0;
>>>  }
>>> 
>>> +static struct kernfs_node *kernfs_dir_next(struct kernfs_node *pos)
>>> +{
>>> +   struct rb_node *node = rb_next(>rb);
>>> +   return node ? rb_to_kn(node) : NULL;
>>> +}
>>> +
>>>  static struct kernfs_node *kernfs_dir_pos(const void *ns,
>>> -   struct kernfs_node *parent, loff_t hash, struct kernfs_node *pos)
>>> +   struct kernfs_node *parent, loff_t off, struct kernfs_node *saved)
>>>  {
>>> -   if (pos) {
>>> -   int valid = kernfs_active(pos) &&
>>> -   pos->parent == parent && hash == pos->hash;
>>> -   kernfs_put(pos);
>>> -   if (!valid)
>>> -   pos = NULL;
>>> -   }
>>> -   if (!pos && (hash > 1) && (hash < INT_MAX)) {
>>> -   struct rb_node *node = parent->dir.children.rb_node;
>>> -   while (node) {
>>> -   pos = rb_to_kn(node);
>>> -
>>> -   if (hash < pos->hash)
>>> -   node = node->rb_left;
>>> -   else if (hash > pos->hash)
>>> -   node = node->rb_right;
>>> -   else
>>> -   break;
>>> +   struct kernfs_node *pos;
>>> +   struct rb_node *node;
>>> +   unsigned int hash;
>>> +   const char *name = "";
>>> +
>>> +   /* Is off a valid name hash? */
>>> +   if ((off < 2) || (off >= INT_MAX))
>>> +   return NULL;
>>> +   hash = off;
>>> +
>>> +   /* Is the saved position usable? */
>>> +   if (saved) {
>>> +   /* Proper parent and hash? */
>>> +   if ((parent != saved->parent) || (saved->hash != hash)) {
>>> +   saved = NULL;
>>
>> name is uninitialized in this path.
>
> It is.  name is initialized to "" see above.
>
>>> +   } else {
>>> +   if (kernfs_active(saved))
>>> +   return saved;
>>> +   name = saved->name;
>>> }
>>> }
>>> -   /* Skip over entries which are dying/dead or in the wrong namespace
>>> */
>>> -   while (pos && (!kernfs_active(pos) || pos->ns != ns)) {
>>> -   struct rb_node *node = rb_next(>rb);
>>> -   if (!node)
>>> -   pos = NULL;
>>> +
>>> +   /* Find the closest pos to the hash we are looking for */
>>> +   pos = NULL;
>>> +   node = parent->dir.children.rb_node;
>>> +   while (node) {
>>> +   int result;
>>> +
>>> +   pos = rb_to_kn(node);
>>> +   result = kernfs_name_compare(hash, name, ns, pos);
>>> +   if (result < 0)
>>> +   node = node->rb_left;
>>> +   else if (result > 0)
>>> +   node = node->rb_right;
>>> else
>>> -   pos = rb_to_kn(node);
>>> +   break;
>>> }
>>> +
>>> +   /* Ensure pos is at or beyond the target position */
>>> +   if (pos && (kernfs_name_compare(hash, name, ns, pos) < 0))

  should be > 0
>>> +   pos = kernfs_dir_next(pos);
>>> +

Eric


Re: [CFT][PATCH] kernfs: Correct kernfs directory seeks.

2018-06-04 Thread Eric W. Biederman
"Hatayama, Daisuke"  writes:

> Hello,
>
> Thanks for this patch, Eric.
>
>> -Original Message-
>> From: Eric W. Biederman [mailto:ebied...@xmission.com]
>> Sent: Monday, June 4, 2018 3:51 AM
>> To: Hatayama, Daisuke 
>> Cc: 'gre...@linuxfoundation.org' ;
>> 't...@kernel.org' ; Okajima, Toshiyuki 
>> ; linux-kernel@vger.kernel.org;
>> 'ebied...@aristanetworks.com' 
>> Subject: [CFT][PATCH] kernfs: Correct kernfs directory seeks.
>> 
>> 
>> Over the years two bugs have crept into the code that handled the last
>> returned kernfs directory being deleted, and handled general seeking
>> in kernfs directories.  The result is that reading the /sys/module
>> directory while moduled are loading and unloading will sometimes
>> skip over a module that was present for the entire duration of
>> the directory read.
>> 
>> The first bug is that kernfs_dir_next_pos was advancing the position
>> in the directory even when kernfs_dir_pos had found the starting
>> kernfs directory entry was not usable.  In this case kernfs_dir_pos is
>> guaranteed to return the directory entry past the starting directory
>> entry, making it so that advancing the directory position is wrong.
>> 
>> The second bug is that kernfs_dir_pos when the saved kernfs_node
>> did not validate, was not always returning a directory after
>> the the target position, but was sometimes returning the directory
>> entry just before the target position.
>> 
>> To correct these issues and to make the code easily readable and
>> comprehensible several cleanups are made.
>> 
>> - A function kernfs_dir_next is factored out to make it straight-forward
>>   to find the next node in a kernfs directory.
>> 
>> - A function kernfs_dir_skip_pos is factored out to skip over directory
>>   entries that should not be shown to userspace.  This function is called
>>   from kernfs_fop_readdir directly removing the complication of calling
>>   it from the other directory advancement functions.
>> 
>> - The kernfs_put of the saved directory entry is moved from kernfs_dir_pos
>>   to kernfs_fop_readdir.  Keeping it with the kernfs_get when it is saved
>>   and ensuring the directory position advancment functions can reference
>>   the saved node without complications.
>> 
>> - The function kernfs_dir_next_pos is modified to only advance the directory
>>   position in the common case when kernfs_dir_pos validates and returns
>>   the starting kernfs node.  For all other cases kernfs_dir_pos is guaranteed
>>   to return a directory position in advance of the starting directory 
>> position.
>> 
>> - kernfs_dir_pos is given a significant make over.  It's essense remains the
>>   same but some siginficant changes were made.
>> 
>>   + The offset is validated to be a valid hash value at the very beginning.
>> The validation is updated to handle the fact that neither 0 nor 1 are
>> valid hash values.
>> 
>>   + An extra test is added at the end of the rbtree lookup to ensure
>> the returned position is at or beyond the target position.
>> 
>>   + kernfs_name_compare is used during the rbtree lookup instead of just
>> comparing
>> the hash.  This allows the the passed namespace parameter to be used, and
>> if it is available from the saved entry the old saved name as well.
>> 
>>   + The validation of the saved kernfs node now checks for two cases.
>> If the saved node is returnable.
>> If the name of the saved node is usable during lookup.
>> 
>> In net this should result in a easier to understand, and more correct
>> version of directory traversal for kernfs directories.
>>
>> Reported-by: "Hatayama, Daisuke" 
>> Fixes: a406f75840e1 ("sysfs: use rb-tree for inode number lookup")
>> Fixes: 1e5289c97bba ("sysfs: Cache the last sysfs_dirent to improve readdir
>> scalability v2")
>> Signed-off-by: "Eric W. Biederman" 
>> ---
>> 
>> Can you test this and please verify it fixes your issue?
>>
>
> I tried this patch on top of v4.17 but the system fails to boot
> without detecting root disks by dracut like this:
>
> [  196.121294] dracut-initqueue[324]: Warning: dracut-initqueue timeout - 
> starting timeout scripts
> [  196.672175] dracut-initqueue[324]: Warning: dracut-initqueue timeout - 
> starting timeout scripts
> [  197.219519] dracut-initqueue[324]: Warning: dracut-initqueue timeout - 
> starting timeout scripts
> [  197.768997] dracut-initqueue[324]: Warning: dracut-initqueue timeout - 
> starting timeout scripts
> [  198.312498] dracut-initqueue[324]: Warning: dracut-initqueue timeout - 
> starting timeout scripts
> [  198.856841] dracut-initqueue[324]: Warning: dracut-initqueue timeout - 
> starting timeout scripts
> [  199.403190] dracut-initqueue[324]: Warning: dracut-initqueue timeout - 
> starting timeout scripts
> [  199.945999] dracut-initqueue[324]: Warning: dracut-initqueue timeout - 
> starting timeout scripts
> [  200.490281] dracut-initqueue[324]: Warning: dracut-initqueue timeout - 
> starting 

Re: [CFT][PATCH] kernfs: Correct kernfs directory seeks.

2018-06-04 Thread Eric W. Biederman
"Hatayama, Daisuke"  writes:

> Hello,
>
> Thanks for this patch, Eric.
>
>> -Original Message-
>> From: Eric W. Biederman [mailto:ebied...@xmission.com]
>> Sent: Monday, June 4, 2018 3:51 AM
>> To: Hatayama, Daisuke 
>> Cc: 'gre...@linuxfoundation.org' ;
>> 't...@kernel.org' ; Okajima, Toshiyuki 
>> ; linux-kernel@vger.kernel.org;
>> 'ebied...@aristanetworks.com' 
>> Subject: [CFT][PATCH] kernfs: Correct kernfs directory seeks.
>> 
>> 
>> Over the years two bugs have crept into the code that handled the last
>> returned kernfs directory being deleted, and handled general seeking
>> in kernfs directories.  The result is that reading the /sys/module
>> directory while moduled are loading and unloading will sometimes
>> skip over a module that was present for the entire duration of
>> the directory read.
>> 
>> The first bug is that kernfs_dir_next_pos was advancing the position
>> in the directory even when kernfs_dir_pos had found the starting
>> kernfs directory entry was not usable.  In this case kernfs_dir_pos is
>> guaranteed to return the directory entry past the starting directory
>> entry, making it so that advancing the directory position is wrong.
>> 
>> The second bug is that kernfs_dir_pos when the saved kernfs_node
>> did not validate, was not always returning a directory after
>> the the target position, but was sometimes returning the directory
>> entry just before the target position.
>> 
>> To correct these issues and to make the code easily readable and
>> comprehensible several cleanups are made.
>> 
>> - A function kernfs_dir_next is factored out to make it straight-forward
>>   to find the next node in a kernfs directory.
>> 
>> - A function kernfs_dir_skip_pos is factored out to skip over directory
>>   entries that should not be shown to userspace.  This function is called
>>   from kernfs_fop_readdir directly removing the complication of calling
>>   it from the other directory advancement functions.
>> 
>> - The kernfs_put of the saved directory entry is moved from kernfs_dir_pos
>>   to kernfs_fop_readdir.  Keeping it with the kernfs_get when it is saved
>>   and ensuring the directory position advancment functions can reference
>>   the saved node without complications.
>> 
>> - The function kernfs_dir_next_pos is modified to only advance the directory
>>   position in the common case when kernfs_dir_pos validates and returns
>>   the starting kernfs node.  For all other cases kernfs_dir_pos is guaranteed
>>   to return a directory position in advance of the starting directory 
>> position.
>> 
>> - kernfs_dir_pos is given a significant make over.  It's essense remains the
>>   same but some siginficant changes were made.
>> 
>>   + The offset is validated to be a valid hash value at the very beginning.
>> The validation is updated to handle the fact that neither 0 nor 1 are
>> valid hash values.
>> 
>>   + An extra test is added at the end of the rbtree lookup to ensure
>> the returned position is at or beyond the target position.
>> 
>>   + kernfs_name_compare is used during the rbtree lookup instead of just
>> comparing
>> the hash.  This allows the the passed namespace parameter to be used, and
>> if it is available from the saved entry the old saved name as well.
>> 
>>   + The validation of the saved kernfs node now checks for two cases.
>> If the saved node is returnable.
>> If the name of the saved node is usable during lookup.
>> 
>> In net this should result in a easier to understand, and more correct
>> version of directory traversal for kernfs directories.
>>
>> Reported-by: "Hatayama, Daisuke" 
>> Fixes: a406f75840e1 ("sysfs: use rb-tree for inode number lookup")
>> Fixes: 1e5289c97bba ("sysfs: Cache the last sysfs_dirent to improve readdir
>> scalability v2")
>> Signed-off-by: "Eric W. Biederman" 
>> ---
>> 
>> Can you test this and please verify it fixes your issue?
>>
>
> I tried this patch on top of v4.17 but the system fails to boot
> without detecting root disks by dracut like this:
>
> [  196.121294] dracut-initqueue[324]: Warning: dracut-initqueue timeout - 
> starting timeout scripts
> [  196.672175] dracut-initqueue[324]: Warning: dracut-initqueue timeout - 
> starting timeout scripts
> [  197.219519] dracut-initqueue[324]: Warning: dracut-initqueue timeout - 
> starting timeout scripts
> [  197.768997] dracut-initqueue[324]: Warning: dracut-initqueue timeout - 
> starting timeout scripts
> [  198.312498] dracut-initqueue[324]: Warning: dracut-initqueue timeout - 
> starting timeout scripts
> [  198.856841] dracut-initqueue[324]: Warning: dracut-initqueue timeout - 
> starting timeout scripts
> [  199.403190] dracut-initqueue[324]: Warning: dracut-initqueue timeout - 
> starting timeout scripts
> [  199.945999] dracut-initqueue[324]: Warning: dracut-initqueue timeout - 
> starting timeout scripts
> [  200.490281] dracut-initqueue[324]: Warning: dracut-initqueue timeout - 
> starting 

RE: [CFT][PATCH] kernfs: Correct kernfs directory seeks.

2018-06-04 Thread Hatayama, Daisuke
Hello,

Thanks for this patch, Eric.

> -Original Message-
> From: Eric W. Biederman [mailto:ebied...@xmission.com]
> Sent: Monday, June 4, 2018 3:51 AM
> To: Hatayama, Daisuke 
> Cc: 'gre...@linuxfoundation.org' ;
> 't...@kernel.org' ; Okajima, Toshiyuki 
> ; linux-kernel@vger.kernel.org;
> 'ebied...@aristanetworks.com' 
> Subject: [CFT][PATCH] kernfs: Correct kernfs directory seeks.
> 
> 
> Over the years two bugs have crept into the code that handled the last
> returned kernfs directory being deleted, and handled general seeking
> in kernfs directories.  The result is that reading the /sys/module
> directory while moduled are loading and unloading will sometimes
> skip over a module that was present for the entire duration of
> the directory read.
> 
> The first bug is that kernfs_dir_next_pos was advancing the position
> in the directory even when kernfs_dir_pos had found the starting
> kernfs directory entry was not usable.  In this case kernfs_dir_pos is
> guaranteed to return the directory entry past the starting directory
> entry, making it so that advancing the directory position is wrong.
> 
> The second bug is that kernfs_dir_pos when the saved kernfs_node
> did not validate, was not always returning a directory after
> the the target position, but was sometimes returning the directory
> entry just before the target position.
> 
> To correct these issues and to make the code easily readable and
> comprehensible several cleanups are made.
> 
> - A function kernfs_dir_next is factored out to make it straight-forward
>   to find the next node in a kernfs directory.
> 
> - A function kernfs_dir_skip_pos is factored out to skip over directory
>   entries that should not be shown to userspace.  This function is called
>   from kernfs_fop_readdir directly removing the complication of calling
>   it from the other directory advancement functions.
> 
> - The kernfs_put of the saved directory entry is moved from kernfs_dir_pos
>   to kernfs_fop_readdir.  Keeping it with the kernfs_get when it is saved
>   and ensuring the directory position advancment functions can reference
>   the saved node without complications.
> 
> - The function kernfs_dir_next_pos is modified to only advance the directory
>   position in the common case when kernfs_dir_pos validates and returns
>   the starting kernfs node.  For all other cases kernfs_dir_pos is guaranteed
>   to return a directory position in advance of the starting directory 
> position.
> 
> - kernfs_dir_pos is given a significant make over.  It's essense remains the
>   same but some siginficant changes were made.
> 
>   + The offset is validated to be a valid hash value at the very beginning.
> The validation is updated to handle the fact that neither 0 nor 1 are
> valid hash values.
> 
>   + An extra test is added at the end of the rbtree lookup to ensure
> the returned position is at or beyond the target position.
> 
>   + kernfs_name_compare is used during the rbtree lookup instead of just
> comparing
> the hash.  This allows the the passed namespace parameter to be used, and
> if it is available from the saved entry the old saved name as well.
> 
>   + The validation of the saved kernfs node now checks for two cases.
> If the saved node is returnable.
> If the name of the saved node is usable during lookup.
> 
> In net this should result in a easier to understand, and more correct
> version of directory traversal for kernfs directories.
>
> Reported-by: "Hatayama, Daisuke" 
> Fixes: a406f75840e1 ("sysfs: use rb-tree for inode number lookup")
> Fixes: 1e5289c97bba ("sysfs: Cache the last sysfs_dirent to improve readdir
> scalability v2")
> Signed-off-by: "Eric W. Biederman" 
> ---
> 
> Can you test this and please verify it fixes your issue?
>

I tried this patch on top of v4.17 but the system fails to boot
without detecting root disks by dracut like this:

[  196.121294] dracut-initqueue[324]: Warning: dracut-initqueue timeout - 
starting timeout scripts
[  196.672175] dracut-initqueue[324]: Warning: dracut-initqueue timeout - 
starting timeout scripts
[  197.219519] dracut-initqueue[324]: Warning: dracut-initqueue timeout - 
starting timeout scripts
[  197.768997] dracut-initqueue[324]: Warning: dracut-initqueue timeout - 
starting timeout scripts
[  198.312498] dracut-initqueue[324]: Warning: dracut-initqueue timeout - 
starting timeout scripts
[  198.856841] dracut-initqueue[324]: Warning: dracut-initqueue timeout - 
starting timeout scripts
[  199.403190] dracut-initqueue[324]: Warning: dracut-initqueue timeout - 
starting timeout scripts
[  199.945999] dracut-initqueue[324]: Warning: dracut-initqueue timeout - 
starting timeout scripts
[  200.490281] dracut-initqueue[324]: Warning: dracut-initqueue timeout - 
starting timeout scripts
[  201.071177] dracut-initqueue[324]: Warning: dracut-initqueue timeout - 
starting timeout scripts
[  201.074958] dracut-initqueue[324]: 

RE: [CFT][PATCH] kernfs: Correct kernfs directory seeks.

2018-06-04 Thread Hatayama, Daisuke
Hello,

Thanks for this patch, Eric.

> -Original Message-
> From: Eric W. Biederman [mailto:ebied...@xmission.com]
> Sent: Monday, June 4, 2018 3:51 AM
> To: Hatayama, Daisuke 
> Cc: 'gre...@linuxfoundation.org' ;
> 't...@kernel.org' ; Okajima, Toshiyuki 
> ; linux-kernel@vger.kernel.org;
> 'ebied...@aristanetworks.com' 
> Subject: [CFT][PATCH] kernfs: Correct kernfs directory seeks.
> 
> 
> Over the years two bugs have crept into the code that handled the last
> returned kernfs directory being deleted, and handled general seeking
> in kernfs directories.  The result is that reading the /sys/module
> directory while moduled are loading and unloading will sometimes
> skip over a module that was present for the entire duration of
> the directory read.
> 
> The first bug is that kernfs_dir_next_pos was advancing the position
> in the directory even when kernfs_dir_pos had found the starting
> kernfs directory entry was not usable.  In this case kernfs_dir_pos is
> guaranteed to return the directory entry past the starting directory
> entry, making it so that advancing the directory position is wrong.
> 
> The second bug is that kernfs_dir_pos when the saved kernfs_node
> did not validate, was not always returning a directory after
> the the target position, but was sometimes returning the directory
> entry just before the target position.
> 
> To correct these issues and to make the code easily readable and
> comprehensible several cleanups are made.
> 
> - A function kernfs_dir_next is factored out to make it straight-forward
>   to find the next node in a kernfs directory.
> 
> - A function kernfs_dir_skip_pos is factored out to skip over directory
>   entries that should not be shown to userspace.  This function is called
>   from kernfs_fop_readdir directly removing the complication of calling
>   it from the other directory advancement functions.
> 
> - The kernfs_put of the saved directory entry is moved from kernfs_dir_pos
>   to kernfs_fop_readdir.  Keeping it with the kernfs_get when it is saved
>   and ensuring the directory position advancment functions can reference
>   the saved node without complications.
> 
> - The function kernfs_dir_next_pos is modified to only advance the directory
>   position in the common case when kernfs_dir_pos validates and returns
>   the starting kernfs node.  For all other cases kernfs_dir_pos is guaranteed
>   to return a directory position in advance of the starting directory 
> position.
> 
> - kernfs_dir_pos is given a significant make over.  It's essense remains the
>   same but some siginficant changes were made.
> 
>   + The offset is validated to be a valid hash value at the very beginning.
> The validation is updated to handle the fact that neither 0 nor 1 are
> valid hash values.
> 
>   + An extra test is added at the end of the rbtree lookup to ensure
> the returned position is at or beyond the target position.
> 
>   + kernfs_name_compare is used during the rbtree lookup instead of just
> comparing
> the hash.  This allows the the passed namespace parameter to be used, and
> if it is available from the saved entry the old saved name as well.
> 
>   + The validation of the saved kernfs node now checks for two cases.
> If the saved node is returnable.
> If the name of the saved node is usable during lookup.
> 
> In net this should result in a easier to understand, and more correct
> version of directory traversal for kernfs directories.
>
> Reported-by: "Hatayama, Daisuke" 
> Fixes: a406f75840e1 ("sysfs: use rb-tree for inode number lookup")
> Fixes: 1e5289c97bba ("sysfs: Cache the last sysfs_dirent to improve readdir
> scalability v2")
> Signed-off-by: "Eric W. Biederman" 
> ---
> 
> Can you test this and please verify it fixes your issue?
>

I tried this patch on top of v4.17 but the system fails to boot
without detecting root disks by dracut like this:

[  196.121294] dracut-initqueue[324]: Warning: dracut-initqueue timeout - 
starting timeout scripts
[  196.672175] dracut-initqueue[324]: Warning: dracut-initqueue timeout - 
starting timeout scripts
[  197.219519] dracut-initqueue[324]: Warning: dracut-initqueue timeout - 
starting timeout scripts
[  197.768997] dracut-initqueue[324]: Warning: dracut-initqueue timeout - 
starting timeout scripts
[  198.312498] dracut-initqueue[324]: Warning: dracut-initqueue timeout - 
starting timeout scripts
[  198.856841] dracut-initqueue[324]: Warning: dracut-initqueue timeout - 
starting timeout scripts
[  199.403190] dracut-initqueue[324]: Warning: dracut-initqueue timeout - 
starting timeout scripts
[  199.945999] dracut-initqueue[324]: Warning: dracut-initqueue timeout - 
starting timeout scripts
[  200.490281] dracut-initqueue[324]: Warning: dracut-initqueue timeout - 
starting timeout scripts
[  201.071177] dracut-initqueue[324]: Warning: dracut-initqueue timeout - 
starting timeout scripts
[  201.074958] dracut-initqueue[324]: