RE: [RESEND PATCH v2] kernfs: fix dentry unexpected skip

2018-06-04 Thread Hatayama, Daisuke



> -Original Message-
> From: Tejun Heo [mailto:hte...@gmail.com] On Behalf Of 't...@kernel.org'
> Sent: Saturday, June 2, 2018 2:07 AM
> To: Hatayama, Daisuke 
> Cc: 'gre...@linuxfoundation.org' ; Okajima,
> Toshiyuki ;
> linux-kernel@vger.kernel.org; 'ebied...@aristanetworks.com'
> 
> Subject: Re: [RESEND PATCH v2] kernfs: fix dentry unexpected skip
> 
> Hello,
> 
> On Fri, Jun 01, 2018 at 09:25:32AM +, Hatayama, Daisuke wrote:
> > kernfs_dir_pos() checks if a kernfs_node object given as one of its
> > arguments is still active and if so returns it, or returns a
> > kernfs_node object that is most equal (possibly smaller and larger) to
> > the given object.
> 
> Sometimes they're duplicate operations tho, which is exactly the bug
> the posted patch is trying to fix.  What I'm suggesting is instead of
> leaving both instances and skipping one conditionally, put them in one
> place and trigger only when necessary.  The sequence of operations
> would be exactly the same.  The only difference is how the code is
> organized.
> 

I see and I think Eric's patch is written as you suggest very well.

> > kernfs_dir_next_pos() returns a kernfs_node object that is next to the
> > object given by kernfs_dir_pos().
> >
> > Two functions does different things and both need to skip inactive
> > nodes. I don't think it natural to remove the skip only from
> > kernfs_dir_pos().
> >
> > OTOH, throughout getdents(), there is no case that the kernfs_node
> > object given to kernfs_dir_pos() is used afterwards in the
> > processing. So, is it enough to provide kernfs_dir_next_pos() only?
> > Then, the skip code is now not duplicated.
> >
> > The patch below is my thought. How do you think?
> >
> > But note that this patch has some bug so that system boot get hang
> > without detecting root filesystem disk :) I'm debugging this now.
> 
> I haven't looked into the code that closely but given that we had
> cases where both skippings were fine and not, the condition is likely
> gonna be a bit tricker?

I agree to this version looks a bit tricker. But I think once the skipping
code is separated as Eric's patch, it would be resolved naturally.

> 
> Thanks.
> 
> --
> tejun
> 




RE: [RESEND PATCH v2] kernfs: fix dentry unexpected skip

2018-06-04 Thread Hatayama, Daisuke



> -Original Message-
> From: Tejun Heo [mailto:hte...@gmail.com] On Behalf Of 't...@kernel.org'
> Sent: Saturday, June 2, 2018 2:07 AM
> To: Hatayama, Daisuke 
> Cc: 'gre...@linuxfoundation.org' ; Okajima,
> Toshiyuki ;
> linux-kernel@vger.kernel.org; 'ebied...@aristanetworks.com'
> 
> Subject: Re: [RESEND PATCH v2] kernfs: fix dentry unexpected skip
> 
> Hello,
> 
> On Fri, Jun 01, 2018 at 09:25:32AM +, Hatayama, Daisuke wrote:
> > kernfs_dir_pos() checks if a kernfs_node object given as one of its
> > arguments is still active and if so returns it, or returns a
> > kernfs_node object that is most equal (possibly smaller and larger) to
> > the given object.
> 
> Sometimes they're duplicate operations tho, which is exactly the bug
> the posted patch is trying to fix.  What I'm suggesting is instead of
> leaving both instances and skipping one conditionally, put them in one
> place and trigger only when necessary.  The sequence of operations
> would be exactly the same.  The only difference is how the code is
> organized.
> 

I see and I think Eric's patch is written as you suggest very well.

> > kernfs_dir_next_pos() returns a kernfs_node object that is next to the
> > object given by kernfs_dir_pos().
> >
> > Two functions does different things and both need to skip inactive
> > nodes. I don't think it natural to remove the skip only from
> > kernfs_dir_pos().
> >
> > OTOH, throughout getdents(), there is no case that the kernfs_node
> > object given to kernfs_dir_pos() is used afterwards in the
> > processing. So, is it enough to provide kernfs_dir_next_pos() only?
> > Then, the skip code is now not duplicated.
> >
> > The patch below is my thought. How do you think?
> >
> > But note that this patch has some bug so that system boot get hang
> > without detecting root filesystem disk :) I'm debugging this now.
> 
> I haven't looked into the code that closely but given that we had
> cases where both skippings were fine and not, the condition is likely
> gonna be a bit tricker?

I agree to this version looks a bit tricker. But I think once the skipping
code is separated as Eric's patch, it would be resolved naturally.

> 
> Thanks.
> 
> --
> tejun
> 




Re: [RESEND PATCH v2] kernfs: fix dentry unexpected skip

2018-06-02 Thread Eric W. Biederman
"Hatayama, Daisuke"  writes:

> kernfs_dir_next_pos() overlooks the situation that the dentry
> corresponding to a given pos object has already been inactive. Hence,
> when kernfs_dir_pos() returns the dentry with a hash value larger than
> the original one, kernfs_dir_next_pos() returns the dentry next to the
> one returned by kernfs_dir_pos(). As a result, the dentry returned by
> kernfs_dir_pos() is skipped.
>
> To fix this issue, try to find a next node only when the returned
> object is less than or equal to the original one.
>
> Note that this implementation focuses on getting guarantee that the
> existing nodes are never skipped, not interested in the other nodes
> that are added or removed during the process.
>
> We found this issue during a stress test that repeatedly reads
> /sys/module directory to get a list of the currently loaded kernel
> modules while repeatedly loading and unloading kernel modules
> simultaneously.
>
> v2: Fix the case where nodes with the same hash but with the name
> larger than the original node could still be skipped. Use
> kernfs_sd_compare() to compare kernfs_node objects. Imporove patch
> description.
>

Semantically this looks like the right fix.

The deep bug is that kernfs_dir_pos can always advance the position,
and the code has never looked for or handled that case.  AKA just the
rbtree walk is enough to advance the position.

That said your implementation is buggy.  It is not safe to call
kernfs_sd_compare on orig as kernfs_put has already been called on orig
and thus orig may already be free.

I suggest moving the kernfs_put for orig into kernfs_fop_readdir,
just before the mutex_unlock calls.  That makes your kernfs_sd_compare
safe.

That would then allow moving the code to skip the next entry to also
be vmoed into kernfs_fop_readir.

Perhaps something like this:

diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index 89d1dc19340b..2d0f71ffb539 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -1584,13 +1584,12 @@ static int kernfs_dir_fop_release(struct inode *inode, 
struct file *filp)
return 0;
 }
 
-static struct kernfs_node *kernfs_dir_pos(const void *ns,
+static struct kernfs_node *kernfs_dir_pos(
struct kernfs_node *parent, loff_t hash, struct kernfs_node *pos)
 {
if (pos) {
int valid = kernfs_active(pos) &&
pos->parent == parent && hash == pos->hash;
-   kernfs_put(pos);
if (!valid)
pos = NULL;
}
@@ -1607,8 +1606,14 @@ static struct kernfs_node *kernfs_dir_pos(const void *ns,
break;
}
}
-   /* Skip over entries which are dying/dead or in the wrong namespace */
-   while (pos && (!kernfs_active(pos) || pos->ns != ns)) {
+   return pos;
+}
+
+static struct kernfs_node *kernfs_dir_next_pos(
+   struct kernfs_node *parent, ino_t ino, struct kernfs_node *start)
+{
+   struct kernfs_node *pos = kernfs_dir_pos(parent, ino, start);
+   if (pos && (kernfs_sd_compare(pos, start) <= 0)) {
struct rb_node *node = rb_next(>rb);
if (!node)
pos = NULL;
@@ -1618,27 +1623,11 @@ static struct kernfs_node *kernfs_dir_pos(const void 
*ns,
return pos;
 }
 
-static struct kernfs_node *kernfs_dir_next_pos(const void *ns,
-   struct kernfs_node *parent, ino_t ino, struct kernfs_node *pos)
-{
-   pos = kernfs_dir_pos(ns, parent, ino, pos);
-   if (pos) {
-   do {
-   struct rb_node *node = rb_next(>rb);
-   if (!node)
-   pos = NULL;
-   else
-   pos = rb_to_kn(node);
-   } while (pos && (!kernfs_active(pos) || pos->ns != ns));
-   }
-   return pos;
-}
-
 static int kernfs_fop_readdir(struct file *file, struct dir_context *ctx)
 {
struct dentry *dentry = file->f_path.dentry;
struct kernfs_node *parent = kernfs_dentry_node(dentry);
-   struct kernfs_node *pos = file->private_data;
+   struct kernfs_node *pos, *saved = file->private_data;
const void *ns = NULL;
 
if (!dir_emit_dots(file, ctx))
@@ -1648,23 +1637,30 @@ static int kernfs_fop_readdir(struct file *file, struct 
dir_context *ctx)
if (kernfs_ns_enabled(parent))
ns = kernfs_info(dentry->d_sb)->ns;
 
-   for (pos = kernfs_dir_pos(ns, parent, ctx->pos, pos);
+   for (pos = kernfs_dir_pos(parent, ctx->pos, saved);
 pos;
-pos = kernfs_dir_next_pos(ns, parent, ctx->pos, pos)) {
+pos = kernfs_dir_next_pos(parent, ctx->pos, pos)) {
const char *name = pos->name;
unsigned int type = dt_type(pos);
int len = strlen(name);
ino_t ino = pos->id.ino;
 
-   ctx->pos = pos->hash;
-   file->private_data = pos;

Re: [RESEND PATCH v2] kernfs: fix dentry unexpected skip

2018-06-02 Thread Eric W. Biederman
"Hatayama, Daisuke"  writes:

> kernfs_dir_next_pos() overlooks the situation that the dentry
> corresponding to a given pos object has already been inactive. Hence,
> when kernfs_dir_pos() returns the dentry with a hash value larger than
> the original one, kernfs_dir_next_pos() returns the dentry next to the
> one returned by kernfs_dir_pos(). As a result, the dentry returned by
> kernfs_dir_pos() is skipped.
>
> To fix this issue, try to find a next node only when the returned
> object is less than or equal to the original one.
>
> Note that this implementation focuses on getting guarantee that the
> existing nodes are never skipped, not interested in the other nodes
> that are added or removed during the process.
>
> We found this issue during a stress test that repeatedly reads
> /sys/module directory to get a list of the currently loaded kernel
> modules while repeatedly loading and unloading kernel modules
> simultaneously.
>
> v2: Fix the case where nodes with the same hash but with the name
> larger than the original node could still be skipped. Use
> kernfs_sd_compare() to compare kernfs_node objects. Imporove patch
> description.
>

Semantically this looks like the right fix.

The deep bug is that kernfs_dir_pos can always advance the position,
and the code has never looked for or handled that case.  AKA just the
rbtree walk is enough to advance the position.

That said your implementation is buggy.  It is not safe to call
kernfs_sd_compare on orig as kernfs_put has already been called on orig
and thus orig may already be free.

I suggest moving the kernfs_put for orig into kernfs_fop_readdir,
just before the mutex_unlock calls.  That makes your kernfs_sd_compare
safe.

That would then allow moving the code to skip the next entry to also
be vmoed into kernfs_fop_readir.

Perhaps something like this:

diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index 89d1dc19340b..2d0f71ffb539 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -1584,13 +1584,12 @@ static int kernfs_dir_fop_release(struct inode *inode, 
struct file *filp)
return 0;
 }
 
-static struct kernfs_node *kernfs_dir_pos(const void *ns,
+static struct kernfs_node *kernfs_dir_pos(
struct kernfs_node *parent, loff_t hash, struct kernfs_node *pos)
 {
if (pos) {
int valid = kernfs_active(pos) &&
pos->parent == parent && hash == pos->hash;
-   kernfs_put(pos);
if (!valid)
pos = NULL;
}
@@ -1607,8 +1606,14 @@ static struct kernfs_node *kernfs_dir_pos(const void *ns,
break;
}
}
-   /* Skip over entries which are dying/dead or in the wrong namespace */
-   while (pos && (!kernfs_active(pos) || pos->ns != ns)) {
+   return pos;
+}
+
+static struct kernfs_node *kernfs_dir_next_pos(
+   struct kernfs_node *parent, ino_t ino, struct kernfs_node *start)
+{
+   struct kernfs_node *pos = kernfs_dir_pos(parent, ino, start);
+   if (pos && (kernfs_sd_compare(pos, start) <= 0)) {
struct rb_node *node = rb_next(>rb);
if (!node)
pos = NULL;
@@ -1618,27 +1623,11 @@ static struct kernfs_node *kernfs_dir_pos(const void 
*ns,
return pos;
 }
 
-static struct kernfs_node *kernfs_dir_next_pos(const void *ns,
-   struct kernfs_node *parent, ino_t ino, struct kernfs_node *pos)
-{
-   pos = kernfs_dir_pos(ns, parent, ino, pos);
-   if (pos) {
-   do {
-   struct rb_node *node = rb_next(>rb);
-   if (!node)
-   pos = NULL;
-   else
-   pos = rb_to_kn(node);
-   } while (pos && (!kernfs_active(pos) || pos->ns != ns));
-   }
-   return pos;
-}
-
 static int kernfs_fop_readdir(struct file *file, struct dir_context *ctx)
 {
struct dentry *dentry = file->f_path.dentry;
struct kernfs_node *parent = kernfs_dentry_node(dentry);
-   struct kernfs_node *pos = file->private_data;
+   struct kernfs_node *pos, *saved = file->private_data;
const void *ns = NULL;
 
if (!dir_emit_dots(file, ctx))
@@ -1648,23 +1637,30 @@ static int kernfs_fop_readdir(struct file *file, struct 
dir_context *ctx)
if (kernfs_ns_enabled(parent))
ns = kernfs_info(dentry->d_sb)->ns;
 
-   for (pos = kernfs_dir_pos(ns, parent, ctx->pos, pos);
+   for (pos = kernfs_dir_pos(parent, ctx->pos, saved);
 pos;
-pos = kernfs_dir_next_pos(ns, parent, ctx->pos, pos)) {
+pos = kernfs_dir_next_pos(parent, ctx->pos, pos)) {
const char *name = pos->name;
unsigned int type = dt_type(pos);
int len = strlen(name);
ino_t ino = pos->id.ino;
 
-   ctx->pos = pos->hash;
-   file->private_data = pos;

Re: [RESEND PATCH v2] kernfs: fix dentry unexpected skip

2018-06-01 Thread 't...@kernel.org'
Hello,

On Fri, Jun 01, 2018 at 09:25:32AM +, Hatayama, Daisuke wrote:
> kernfs_dir_pos() checks if a kernfs_node object given as one of its
> arguments is still active and if so returns it, or returns a
> kernfs_node object that is most equal (possibly smaller and larger) to
> the given object.

Sometimes they're duplicate operations tho, which is exactly the bug
the posted patch is trying to fix.  What I'm suggesting is instead of
leaving both instances and skipping one conditionally, put them in one
place and trigger only when necessary.  The sequence of operations
would be exactly the same.  The only difference is how the code is
organized.

> kernfs_dir_next_pos() returns a kernfs_node object that is next to the
> object given by kernfs_dir_pos().
> 
> Two functions does different things and both need to skip inactive
> nodes. I don't think it natural to remove the skip only from
> kernfs_dir_pos().
> 
> OTOH, throughout getdents(), there is no case that the kernfs_node
> object given to kernfs_dir_pos() is used afterwards in the
> processing. So, is it enough to provide kernfs_dir_next_pos() only?
> Then, the skip code is now not duplicated.
> 
> The patch below is my thought. How do you think?
> 
> But note that this patch has some bug so that system boot get hang
> without detecting root filesystem disk :) I'm debugging this now.

I haven't looked into the code that closely but given that we had
cases where both skippings were fine and not, the condition is likely
gonna be a bit tricker?

Thanks.

-- 
tejun


Re: [RESEND PATCH v2] kernfs: fix dentry unexpected skip

2018-06-01 Thread 't...@kernel.org'
Hello,

On Fri, Jun 01, 2018 at 09:25:32AM +, Hatayama, Daisuke wrote:
> kernfs_dir_pos() checks if a kernfs_node object given as one of its
> arguments is still active and if so returns it, or returns a
> kernfs_node object that is most equal (possibly smaller and larger) to
> the given object.

Sometimes they're duplicate operations tho, which is exactly the bug
the posted patch is trying to fix.  What I'm suggesting is instead of
leaving both instances and skipping one conditionally, put them in one
place and trigger only when necessary.  The sequence of operations
would be exactly the same.  The only difference is how the code is
organized.

> kernfs_dir_next_pos() returns a kernfs_node object that is next to the
> object given by kernfs_dir_pos().
> 
> Two functions does different things and both need to skip inactive
> nodes. I don't think it natural to remove the skip only from
> kernfs_dir_pos().
> 
> OTOH, throughout getdents(), there is no case that the kernfs_node
> object given to kernfs_dir_pos() is used afterwards in the
> processing. So, is it enough to provide kernfs_dir_next_pos() only?
> Then, the skip code is now not duplicated.
> 
> The patch below is my thought. How do you think?
> 
> But note that this patch has some bug so that system boot get hang
> without detecting root filesystem disk :) I'm debugging this now.

I haven't looked into the code that closely but given that we had
cases where both skippings were fine and not, the condition is likely
gonna be a bit tricker?

Thanks.

-- 
tejun


RE: [RESEND PATCH v2] kernfs: fix dentry unexpected skip

2018-06-01 Thread Hatayama, Daisuke
> -Original Message-
> From: Tejun Heo [mailto:hte...@gmail.com] On Behalf Of 't...@kernel.org'
> Sent: Wednesday, May 30, 2018 1:26 AM
> To: Hatayama, Daisuke 
> Cc: 'gre...@linuxfoundation.org' ; Okajima,
> Toshiyuki ;
> linux-kernel@vger.kernel.org; 'ebied...@aristanetworks.com'
> 
> Subject: Re: [RESEND PATCH v2] kernfs: fix dentry unexpected skip
> 
> Hello,
> 
> On Mon, May 28, 2018 at 12:54:03PM +, Hatayama, Daisuke wrote:
> > diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
> > index 89d1dc1..3aeeb7a 100644
> > --- a/fs/kernfs/dir.c
> > +++ b/fs/kernfs/dir.c
> > @@ -1621,8 +1621,10 @@ static int kernfs_dir_fop_release(struct inode 
> > *inode,
> struct file *filp)
> >  static struct kernfs_node *kernfs_dir_next_pos(const void *ns,
> > struct kernfs_node *parent, ino_t ino, struct kernfs_node *pos)
> >  {
> > +   struct kernfs_node *orig = pos;
> > +
> > pos = kernfs_dir_pos(ns, parent, ino, pos);
> > -   if (pos) {
> > +   if (pos && kernfs_sd_compare(pos, orig) <= 0) {
> 
> Hmm... the code seems a bit unintuitive to me and I wonder whether
> it's because there are two identical skipping loops in
> kernfs_dir_pos() and kernfs_dir_next_pos() and we're now trying to
> selectively disable one of them.  Wouldn't it make more sense to get
> rid of it from kernfs_dir_pos() and skip explicitly only when
> necessary?
> 

kernfs_dir_pos() checks if a kernfs_node object given as one of its
arguments is still active and if so returns it, or returns a
kernfs_node object that is most equal (possibly smaller and larger) to
the given object.

kernfs_dir_next_pos() returns a kernfs_node object that is next to the
object given by kernfs_dir_pos().

Two functions does different things and both need to skip inactive
nodes. I don't think it natural to remove the skip only from
kernfs_dir_pos().

OTOH, throughout getdents(), there is no case that the kernfs_node
object given to kernfs_dir_pos() is used afterwards in the
processing. So, is it enough to provide kernfs_dir_next_pos() only?
Then, the skip code is now not duplicated.

The patch below is my thought. How do you think?

But note that this patch has some bug so that system boot get hang
without detecting root filesystem disk :) I'm debugging this now.

diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index 89d1dc1..140706f 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -1584,9 +1584,11 @@ static int kernfs_dir_fop_release(struct inode *inode, 
struct file *filp)
return 0;
 }

-static struct kernfs_node *kernfs_dir_pos(const void *ns,
+static struct kernfs_node *kernfs_dir_next_pos(const void *ns,
struct kernfs_node *parent, loff_t hash, struct kernfs_node *pos)
 {
+   struct kernfs_node *orig = pos;
+
if (pos) {
int valid = kernfs_active(pos) &&
pos->parent == parent && hash == pos->hash;
@@ -1608,7 +1610,9 @@ static struct kernfs_node *kernfs_dir_pos(const void *ns,
}
}
/* Skip over entries which are dying/dead or in the wrong namespace */
-   while (pos && (!kernfs_active(pos) || pos->ns != ns)) {
+   while (pos && (!kernfs_active(pos) ||
+  (!orig || kernfs_sd_compare(pos, orig) <= 0) ||
+  pos->ns != ns)) {
struct rb_node *node = rb_next(>rb);
if (!node)
pos = NULL;
@@ -1618,22 +1622,6 @@ static struct kernfs_node *kernfs_dir_pos(const void *ns,
return pos;
 }

-static struct kernfs_node *kernfs_dir_next_pos(const void *ns,
-   struct kernfs_node *parent, ino_t ino, struct kernfs_node *pos)
-{
-   pos = kernfs_dir_pos(ns, parent, ino, pos);
-   if (pos) {
-   do {
-   struct rb_node *node = rb_next(>rb);
-   if (!node)
-   pos = NULL;
-   else
-   pos = rb_to_kn(node);
-   } while (pos && (!kernfs_active(pos) || pos->ns != ns));
-   }
-   return pos;
-}
-
 static int kernfs_fop_readdir(struct file *file, struct dir_context *ctx)
 {
struct dentry *dentry = file->f_path.dentry;
@@ -1648,7 +1636,7 @@ static int kernfs_fop_readdir(struct file *file, struct 
dir_context *ctx)
if (kernfs_ns_enabled(parent))
ns = kernfs_info(dentry->d_sb)->ns;

-   for (pos = kernfs_dir_pos(ns, parent, ctx->pos, pos);
+   for (pos = kernfs_dir_next_pos(ns, parent, ctx->pos, pos);
 pos;
 pos = kernfs_dir_next_pos(ns, parent, ctx->pos, pos)) {
const char *name = pos->name;

Thanks.
HATAYAMA, Daisuke




RE: [RESEND PATCH v2] kernfs: fix dentry unexpected skip

2018-06-01 Thread Hatayama, Daisuke
> -Original Message-
> From: Tejun Heo [mailto:hte...@gmail.com] On Behalf Of 't...@kernel.org'
> Sent: Wednesday, May 30, 2018 1:26 AM
> To: Hatayama, Daisuke 
> Cc: 'gre...@linuxfoundation.org' ; Okajima,
> Toshiyuki ;
> linux-kernel@vger.kernel.org; 'ebied...@aristanetworks.com'
> 
> Subject: Re: [RESEND PATCH v2] kernfs: fix dentry unexpected skip
> 
> Hello,
> 
> On Mon, May 28, 2018 at 12:54:03PM +, Hatayama, Daisuke wrote:
> > diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
> > index 89d1dc1..3aeeb7a 100644
> > --- a/fs/kernfs/dir.c
> > +++ b/fs/kernfs/dir.c
> > @@ -1621,8 +1621,10 @@ static int kernfs_dir_fop_release(struct inode 
> > *inode,
> struct file *filp)
> >  static struct kernfs_node *kernfs_dir_next_pos(const void *ns,
> > struct kernfs_node *parent, ino_t ino, struct kernfs_node *pos)
> >  {
> > +   struct kernfs_node *orig = pos;
> > +
> > pos = kernfs_dir_pos(ns, parent, ino, pos);
> > -   if (pos) {
> > +   if (pos && kernfs_sd_compare(pos, orig) <= 0) {
> 
> Hmm... the code seems a bit unintuitive to me and I wonder whether
> it's because there are two identical skipping loops in
> kernfs_dir_pos() and kernfs_dir_next_pos() and we're now trying to
> selectively disable one of them.  Wouldn't it make more sense to get
> rid of it from kernfs_dir_pos() and skip explicitly only when
> necessary?
> 

kernfs_dir_pos() checks if a kernfs_node object given as one of its
arguments is still active and if so returns it, or returns a
kernfs_node object that is most equal (possibly smaller and larger) to
the given object.

kernfs_dir_next_pos() returns a kernfs_node object that is next to the
object given by kernfs_dir_pos().

Two functions does different things and both need to skip inactive
nodes. I don't think it natural to remove the skip only from
kernfs_dir_pos().

OTOH, throughout getdents(), there is no case that the kernfs_node
object given to kernfs_dir_pos() is used afterwards in the
processing. So, is it enough to provide kernfs_dir_next_pos() only?
Then, the skip code is now not duplicated.

The patch below is my thought. How do you think?

But note that this patch has some bug so that system boot get hang
without detecting root filesystem disk :) I'm debugging this now.

diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index 89d1dc1..140706f 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -1584,9 +1584,11 @@ static int kernfs_dir_fop_release(struct inode *inode, 
struct file *filp)
return 0;
 }

-static struct kernfs_node *kernfs_dir_pos(const void *ns,
+static struct kernfs_node *kernfs_dir_next_pos(const void *ns,
struct kernfs_node *parent, loff_t hash, struct kernfs_node *pos)
 {
+   struct kernfs_node *orig = pos;
+
if (pos) {
int valid = kernfs_active(pos) &&
pos->parent == parent && hash == pos->hash;
@@ -1608,7 +1610,9 @@ static struct kernfs_node *kernfs_dir_pos(const void *ns,
}
}
/* Skip over entries which are dying/dead or in the wrong namespace */
-   while (pos && (!kernfs_active(pos) || pos->ns != ns)) {
+   while (pos && (!kernfs_active(pos) ||
+  (!orig || kernfs_sd_compare(pos, orig) <= 0) ||
+  pos->ns != ns)) {
struct rb_node *node = rb_next(>rb);
if (!node)
pos = NULL;
@@ -1618,22 +1622,6 @@ static struct kernfs_node *kernfs_dir_pos(const void *ns,
return pos;
 }

-static struct kernfs_node *kernfs_dir_next_pos(const void *ns,
-   struct kernfs_node *parent, ino_t ino, struct kernfs_node *pos)
-{
-   pos = kernfs_dir_pos(ns, parent, ino, pos);
-   if (pos) {
-   do {
-   struct rb_node *node = rb_next(>rb);
-   if (!node)
-   pos = NULL;
-   else
-   pos = rb_to_kn(node);
-   } while (pos && (!kernfs_active(pos) || pos->ns != ns));
-   }
-   return pos;
-}
-
 static int kernfs_fop_readdir(struct file *file, struct dir_context *ctx)
 {
struct dentry *dentry = file->f_path.dentry;
@@ -1648,7 +1636,7 @@ static int kernfs_fop_readdir(struct file *file, struct 
dir_context *ctx)
if (kernfs_ns_enabled(parent))
ns = kernfs_info(dentry->d_sb)->ns;

-   for (pos = kernfs_dir_pos(ns, parent, ctx->pos, pos);
+   for (pos = kernfs_dir_next_pos(ns, parent, ctx->pos, pos);
 pos;
 pos = kernfs_dir_next_pos(ns, parent, ctx->pos, pos)) {
const char *name = pos->name;

Thanks.
HATAYAMA, Daisuke




Re: [RESEND PATCH v2] kernfs: fix dentry unexpected skip

2018-05-29 Thread 't...@kernel.org'
Hello,

On Mon, May 28, 2018 at 12:54:03PM +, Hatayama, Daisuke wrote:
> diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
> index 89d1dc1..3aeeb7a 100644
> --- a/fs/kernfs/dir.c
> +++ b/fs/kernfs/dir.c
> @@ -1621,8 +1621,10 @@ static int kernfs_dir_fop_release(struct inode *inode, 
> struct file *filp)
>  static struct kernfs_node *kernfs_dir_next_pos(const void *ns,
>   struct kernfs_node *parent, ino_t ino, struct kernfs_node *pos)
>  {
> + struct kernfs_node *orig = pos;
> +
>   pos = kernfs_dir_pos(ns, parent, ino, pos);
> - if (pos) {
> + if (pos && kernfs_sd_compare(pos, orig) <= 0) {

Hmm... the code seems a bit unintuitive to me and I wonder whether
it's because there are two identical skipping loops in
kernfs_dir_pos() and kernfs_dir_next_pos() and we're now trying to
selectively disable one of them.  Wouldn't it make more sense to get
rid of it from kernfs_dir_pos() and skip explicitly only when
necessary?

Thanks.

-- 
tejun


Re: [RESEND PATCH v2] kernfs: fix dentry unexpected skip

2018-05-29 Thread 't...@kernel.org'
Hello,

On Mon, May 28, 2018 at 12:54:03PM +, Hatayama, Daisuke wrote:
> diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
> index 89d1dc1..3aeeb7a 100644
> --- a/fs/kernfs/dir.c
> +++ b/fs/kernfs/dir.c
> @@ -1621,8 +1621,10 @@ static int kernfs_dir_fop_release(struct inode *inode, 
> struct file *filp)
>  static struct kernfs_node *kernfs_dir_next_pos(const void *ns,
>   struct kernfs_node *parent, ino_t ino, struct kernfs_node *pos)
>  {
> + struct kernfs_node *orig = pos;
> +
>   pos = kernfs_dir_pos(ns, parent, ino, pos);
> - if (pos) {
> + if (pos && kernfs_sd_compare(pos, orig) <= 0) {

Hmm... the code seems a bit unintuitive to me and I wonder whether
it's because there are two identical skipping loops in
kernfs_dir_pos() and kernfs_dir_next_pos() and we're now trying to
selectively disable one of them.  Wouldn't it make more sense to get
rid of it from kernfs_dir_pos() and skip explicitly only when
necessary?

Thanks.

-- 
tejun


RE: [RESEND PATCH v2] kernfs: fix dentry unexpected skip

2018-05-28 Thread Hatayama, Daisuke
Hi,

I attach a reproducer for this issue.

Expand repro.tar.gz and run two commands:

  # make
  # ./repro.sh

Then, repro.sh will terminate when the issue is reproduced.

In this reproducer, we prepare AtvbAC0jwH.ko and U1cN9ZbARQ.ko having the same 
hash value.
Then, install U1cN9ZbARQ.ko and then repeating installing and removing 
AtvbAC0jwH.ko over and over.
Note that AtvbAC0jwH is smaller than U1cN9ZbARQ in the string comparison.
The issue is that output of ls -1 /sys/module contains NO U1cN9ZbARQ.ko.

I found a pair of AtvbAC0jwH and U1cN9ZbARQ using kernhash.c, retrieved from 
fs/kernfs/dirs.c,
contained in repro.tar.gz like:

# ./kernfshash
AtvbAC0jwH U1cN9ZbARQ 6b2609c5

> -Original Message-
> From: linux-kernel-ow...@vger.kernel.org
> [mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Hatayama, Daisuke
> Sent: Monday, May 28, 2018 9:54 PM
> To: 'gre...@linuxfoundation.org' <gre...@linuxfoundation.org>;
> 't...@kernel.org' <t...@kernel.org>
> Cc: Okajima, Toshiyuki <toshi.okaj...@jp.fujitsu.com>;
> linux-kernel@vger.kernel.org; 'ebied...@aristanetworks.com'
> <ebied...@aristanetworks.com>
> Subject: [RESEND PATCH v2] kernfs: fix dentry unexpected skip
> 
> kernfs_dir_next_pos() overlooks the situation that the dentry
> corresponding to a given pos object has already been inactive. Hence,
> when kernfs_dir_pos() returns the dentry with a hash value larger than
> the original one, kernfs_dir_next_pos() returns the dentry next to the
> one returned by kernfs_dir_pos(). As a result, the dentry returned by
> kernfs_dir_pos() is skipped.
> 
> To fix this issue, try to find a next node only when the returned
> object is less than or equal to the original one.
> 
> Note that this implementation focuses on getting guarantee that the
> existing nodes are never skipped, not interested in the other nodes
> that are added or removed during the process.
> 
> We found this issue during a stress test that repeatedly reads
> /sys/module directory to get a list of the currently loaded kernel
> modules while repeatedly loading and unloading kernel modules
> simultaneously.
> 
> v2: Fix the case where nodes with the same hash but with the name
> larger than the original node could still be skipped. Use
> kernfs_sd_compare() to compare kernfs_node objects. Imporove patch
> description.
> 
> Signed-off-by: HATAYAMA Daisuke <d.hatay...@jp.fujitsu.com>
> Suggested-by: Toshiyuki Okajima <toshi.okaj...@jp.fujitsu.com>
> Cc: Eric W. Biederman <ebied...@aristanetworks.com>
> ---
>  fs/kernfs/dir.c |4 +++-
>  1 files changed, 3 insertions(+), 1 deletions(-)
> 
> diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
> index 89d1dc1..3aeeb7a 100644
> --- a/fs/kernfs/dir.c
> +++ b/fs/kernfs/dir.c
> @@ -1621,8 +1621,10 @@ static int kernfs_dir_fop_release(struct inode *inode,
> struct file *filp)
>  static struct kernfs_node *kernfs_dir_next_pos(const void *ns,
>   struct kernfs_node *parent, ino_t ino, struct kernfs_node *pos)
>  {
> + struct kernfs_node *orig = pos;
> +
>   pos = kernfs_dir_pos(ns, parent, ino, pos);
> - if (pos) {
> + if (pos && kernfs_sd_compare(pos, orig) <= 0) {
>   do {
>   struct rb_node *node = rb_next(>rb);
>   if (!node)
> --
> 1.7.1
> 
> 
> 
> 
> 
> 



repro.tar.gz
Description: repro.tar.gz


RE: [RESEND PATCH v2] kernfs: fix dentry unexpected skip

2018-05-28 Thread Hatayama, Daisuke
Hi,

I attach a reproducer for this issue.

Expand repro.tar.gz and run two commands:

  # make
  # ./repro.sh

Then, repro.sh will terminate when the issue is reproduced.

In this reproducer, we prepare AtvbAC0jwH.ko and U1cN9ZbARQ.ko having the same 
hash value.
Then, install U1cN9ZbARQ.ko and then repeating installing and removing 
AtvbAC0jwH.ko over and over.
Note that AtvbAC0jwH is smaller than U1cN9ZbARQ in the string comparison.
The issue is that output of ls -1 /sys/module contains NO U1cN9ZbARQ.ko.

I found a pair of AtvbAC0jwH and U1cN9ZbARQ using kernhash.c, retrieved from 
fs/kernfs/dirs.c,
contained in repro.tar.gz like:

# ./kernfshash
AtvbAC0jwH U1cN9ZbARQ 6b2609c5

> -Original Message-
> From: linux-kernel-ow...@vger.kernel.org
> [mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Hatayama, Daisuke
> Sent: Monday, May 28, 2018 9:54 PM
> To: 'gre...@linuxfoundation.org' ;
> 't...@kernel.org' 
> Cc: Okajima, Toshiyuki ;
> linux-kernel@vger.kernel.org; 'ebied...@aristanetworks.com'
> 
> Subject: [RESEND PATCH v2] kernfs: fix dentry unexpected skip
> 
> kernfs_dir_next_pos() overlooks the situation that the dentry
> corresponding to a given pos object has already been inactive. Hence,
> when kernfs_dir_pos() returns the dentry with a hash value larger than
> the original one, kernfs_dir_next_pos() returns the dentry next to the
> one returned by kernfs_dir_pos(). As a result, the dentry returned by
> kernfs_dir_pos() is skipped.
> 
> To fix this issue, try to find a next node only when the returned
> object is less than or equal to the original one.
> 
> Note that this implementation focuses on getting guarantee that the
> existing nodes are never skipped, not interested in the other nodes
> that are added or removed during the process.
> 
> We found this issue during a stress test that repeatedly reads
> /sys/module directory to get a list of the currently loaded kernel
> modules while repeatedly loading and unloading kernel modules
> simultaneously.
> 
> v2: Fix the case where nodes with the same hash but with the name
> larger than the original node could still be skipped. Use
> kernfs_sd_compare() to compare kernfs_node objects. Imporove patch
> description.
> 
> Signed-off-by: HATAYAMA Daisuke 
> Suggested-by: Toshiyuki Okajima 
> Cc: Eric W. Biederman 
> ---
>  fs/kernfs/dir.c |4 +++-
>  1 files changed, 3 insertions(+), 1 deletions(-)
> 
> diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
> index 89d1dc1..3aeeb7a 100644
> --- a/fs/kernfs/dir.c
> +++ b/fs/kernfs/dir.c
> @@ -1621,8 +1621,10 @@ static int kernfs_dir_fop_release(struct inode *inode,
> struct file *filp)
>  static struct kernfs_node *kernfs_dir_next_pos(const void *ns,
>   struct kernfs_node *parent, ino_t ino, struct kernfs_node *pos)
>  {
> + struct kernfs_node *orig = pos;
> +
>   pos = kernfs_dir_pos(ns, parent, ino, pos);
> - if (pos) {
> + if (pos && kernfs_sd_compare(pos, orig) <= 0) {
>   do {
>   struct rb_node *node = rb_next(>rb);
>   if (!node)
> --
> 1.7.1
> 
> 
> 
> 
> 
> 



repro.tar.gz
Description: repro.tar.gz


[RESEND PATCH v2] kernfs: fix dentry unexpected skip

2018-05-28 Thread Hatayama, Daisuke
kernfs_dir_next_pos() overlooks the situation that the dentry
corresponding to a given pos object has already been inactive. Hence,
when kernfs_dir_pos() returns the dentry with a hash value larger than
the original one, kernfs_dir_next_pos() returns the dentry next to the
one returned by kernfs_dir_pos(). As a result, the dentry returned by
kernfs_dir_pos() is skipped.

To fix this issue, try to find a next node only when the returned
object is less than or equal to the original one.

Note that this implementation focuses on getting guarantee that the
existing nodes are never skipped, not interested in the other nodes
that are added or removed during the process.

We found this issue during a stress test that repeatedly reads
/sys/module directory to get a list of the currently loaded kernel
modules while repeatedly loading and unloading kernel modules
simultaneously.

v2: Fix the case where nodes with the same hash but with the name
larger than the original node could still be skipped. Use
kernfs_sd_compare() to compare kernfs_node objects. Imporove patch
description.

Signed-off-by: HATAYAMA Daisuke 
Suggested-by: Toshiyuki Okajima 
Cc: Eric W. Biederman 
---
 fs/kernfs/dir.c |4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index 89d1dc1..3aeeb7a 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -1621,8 +1621,10 @@ static int kernfs_dir_fop_release(struct inode *inode, 
struct file *filp)
 static struct kernfs_node *kernfs_dir_next_pos(const void *ns,
struct kernfs_node *parent, ino_t ino, struct kernfs_node *pos)
 {
+   struct kernfs_node *orig = pos;
+
pos = kernfs_dir_pos(ns, parent, ino, pos);
-   if (pos) {
+   if (pos && kernfs_sd_compare(pos, orig) <= 0) {
do {
struct rb_node *node = rb_next(>rb);
if (!node)
-- 
1.7.1







[RESEND PATCH v2] kernfs: fix dentry unexpected skip

2018-05-28 Thread Hatayama, Daisuke
kernfs_dir_next_pos() overlooks the situation that the dentry
corresponding to a given pos object has already been inactive. Hence,
when kernfs_dir_pos() returns the dentry with a hash value larger than
the original one, kernfs_dir_next_pos() returns the dentry next to the
one returned by kernfs_dir_pos(). As a result, the dentry returned by
kernfs_dir_pos() is skipped.

To fix this issue, try to find a next node only when the returned
object is less than or equal to the original one.

Note that this implementation focuses on getting guarantee that the
existing nodes are never skipped, not interested in the other nodes
that are added or removed during the process.

We found this issue during a stress test that repeatedly reads
/sys/module directory to get a list of the currently loaded kernel
modules while repeatedly loading and unloading kernel modules
simultaneously.

v2: Fix the case where nodes with the same hash but with the name
larger than the original node could still be skipped. Use
kernfs_sd_compare() to compare kernfs_node objects. Imporove patch
description.

Signed-off-by: HATAYAMA Daisuke 
Suggested-by: Toshiyuki Okajima 
Cc: Eric W. Biederman 
---
 fs/kernfs/dir.c |4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index 89d1dc1..3aeeb7a 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -1621,8 +1621,10 @@ static int kernfs_dir_fop_release(struct inode *inode, 
struct file *filp)
 static struct kernfs_node *kernfs_dir_next_pos(const void *ns,
struct kernfs_node *parent, ino_t ino, struct kernfs_node *pos)
 {
+   struct kernfs_node *orig = pos;
+
pos = kernfs_dir_pos(ns, parent, ino, pos);
-   if (pos) {
+   if (pos && kernfs_sd_compare(pos, orig) <= 0) {
do {
struct rb_node *node = rb_next(>rb);
if (!node)
-- 
1.7.1







[PATCH v2] kernfs: fix dentry unexpected skip

2018-05-21 Thread Hatayama, Daisuke
kernfs_dir_next_pos() overlooks the situation that the dentry
corresponding to a given pos object has already been inactive. Hence,
when kernfs_dir_pos() returns the dentry with a hash value larger than
the original one, kernfs_dir_next_pos() returns the dentry next to the
one returned by kernfs_dir_pos(). As a result, the dentry returned by
kernfs_dir_pos() is skipped.

To fix this issue, try to find a next node only when the returned
object is less than or equal to the original one.

Note that this implementation focuses on getting guarantee that the
existing nodes are never skipped, not interested in the other nodes
that are added or removed during the process.

We found this issue during a stress test that repeatedly reads
/sys/module directory to get a list of the currently loaded kernel
modules while repeatedly loading and unloading kernel modules
simultaneously.

v2: Fix the case where nodes with the same hash but with the name
larger than the original node could still be skipped. Use
kernfs_sd_compare() to compare kernfs_node objects. Imporove patch
description.

Signed-off-by: HATAYAMA Daisuke 
Suggested-by: Toshiyuki Okajima 
Cc: Eric W. Biederman 
---
 fs/kernfs/dir.c |4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index 89d1dc1..3aeeb7a 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -1621,8 +1621,10 @@ static int kernfs_dir_fop_release(struct inode *inode, 
struct file *filp)
 static struct kernfs_node *kernfs_dir_next_pos(const void *ns,
struct kernfs_node *parent, ino_t ino, struct kernfs_node *pos)
 {
+   struct kernfs_node *orig = pos;
+
pos = kernfs_dir_pos(ns, parent, ino, pos);
-   if (pos) {
+   if (pos && kernfs_sd_compare(pos, orig) <= 0) {
do {
struct rb_node *node = rb_next(>rb);
if (!node)
-- 
1.7.1




[PATCH v2] kernfs: fix dentry unexpected skip

2018-05-21 Thread Hatayama, Daisuke
kernfs_dir_next_pos() overlooks the situation that the dentry
corresponding to a given pos object has already been inactive. Hence,
when kernfs_dir_pos() returns the dentry with a hash value larger than
the original one, kernfs_dir_next_pos() returns the dentry next to the
one returned by kernfs_dir_pos(). As a result, the dentry returned by
kernfs_dir_pos() is skipped.

To fix this issue, try to find a next node only when the returned
object is less than or equal to the original one.

Note that this implementation focuses on getting guarantee that the
existing nodes are never skipped, not interested in the other nodes
that are added or removed during the process.

We found this issue during a stress test that repeatedly reads
/sys/module directory to get a list of the currently loaded kernel
modules while repeatedly loading and unloading kernel modules
simultaneously.

v2: Fix the case where nodes with the same hash but with the name
larger than the original node could still be skipped. Use
kernfs_sd_compare() to compare kernfs_node objects. Imporove patch
description.

Signed-off-by: HATAYAMA Daisuke 
Suggested-by: Toshiyuki Okajima 
Cc: Eric W. Biederman 
---
 fs/kernfs/dir.c |4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index 89d1dc1..3aeeb7a 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -1621,8 +1621,10 @@ static int kernfs_dir_fop_release(struct inode *inode, 
struct file *filp)
 static struct kernfs_node *kernfs_dir_next_pos(const void *ns,
struct kernfs_node *parent, ino_t ino, struct kernfs_node *pos)
 {
+   struct kernfs_node *orig = pos;
+
pos = kernfs_dir_pos(ns, parent, ino, pos);
-   if (pos) {
+   if (pos && kernfs_sd_compare(pos, orig) <= 0) {
do {
struct rb_node *node = rb_next(>rb);
if (!node)
-- 
1.7.1