Re: [PATCH 0/9] Support follow_link in RCU-walk.

2015-03-08 Thread NeilBrown
On Thu, 5 Mar 2015 06:05:20 + Al Viro  wrote:

> On Thu, Mar 05, 2015 at 04:21:21PM +1100, NeilBrown wrote:
> > Hi Al (and others),
> > 
> >  I wonder if you could look over this patchset.
> >  It allows RCU-walk to follow symlinks in many common cases,
> >  thus removing a surprising performance hit caused by using symlinks.
> > 
> >  The last could of patches make changes to XFS and NFS to support
> >  this but I haven't forwarded to the relevant lists yet.
> >  If/when the early code meets with approval I'll do that.
> > 
> >  The first patch almost certainly needs to be changed.  I originally
> >  wrote this code when filesystems could see inside nameidata.
> >  It is now opaque so the simplest solution was to provide an
> >  accessor function.
> >  Maybe I should as a 'flags' arg to ->follow_link?? Or have
> >  ->follow_link and ->follow_link_rcu ??
> >  What do you suggest?
> 
> Umm...  Some observations:
>   * now ->follow_link() can be called in RCU mode, which means
> that it can race with fs shutdown; not a problem, except that now it
> joins ->lookup() et.al. in "if some data structure is needed in RCU
> case of that, make sure it's not destroyed without an RCU delay somewhere
> between the entry into ->kill_sb() and destruction.

So inodes and dentries and associated private data should already be safe.
And s_fs_info can be used if it is freed by e.g. kfree_rcu (like autofs)
but not if just kfree (like ext3).

xfs_fs_put_super() directly frees the 'xfs_mount', which xfs_readlink
accesses.  I guess that needs to be fixed.


>   * highmem pages in symlinks: that BS shouldn't be allowed at
> all.  Just make sure that at least for those filesystems symlink inodes
> get mapping_set_gfp_mask(>i_data, GFP_KERNEL) and be done with that.

page_getlink() already uses kmap(), implying that highmem pages are
supported.   All I'm doing is making sure that my page_getlink_rcu()
doesn't fail horribly if the page is a highmem page.

If a filesystem needs improved follow_link performance on a highmem machine,
then setting the gfp_mask as you suggest is probably a good idea, but I don't
really want to impose that on filesystems if I don't need to.  And at present
I don't.
So I'd rather leave it to the filesystem maintainer, or someone who discovers
a need.


>   * are you sure that security_inode_follow_link() is OK to call in
> RCU mode?

No.
avc_has_perm() doesn't look RCU safe, even without auditing enabled.
At the very least we'll need to pass a "lookup_rcu" flag in there.


>   * what warranties are you giving for the lifetime of strings
> passed to nd_set_link()?  Right now it's "should not be freed until the
> matching ->put_link()"; what happens for RCU mode?

The same

For XFS, we kmalloc a buffer GFP_ATOMIC and copy into that.  Then
put_link() kfrees it.
For filesystems with the symlink in the page cache, we get a reference to
the page (which is a bit heavy-handed for RCU-walk, but much less so than the
current code) and drop the reference in ->put_link.

For filesystems with a short symlink in the inode, we just provide a pointer
to that... How long can we expect that to be around?
I cannot see any provision for keeping those inodes in memory while we
follow the symlink... What am I missing?

In any case, if there is a reference held on the inode for ref-walk, then
presumably complete_walk() will take a reference on that same inode when
dropping out of rcu-walk I hope.


So I think the rules here are unchanged.


>   * really nasty one: creat(2) on a dangling symlink.  What's to
> preserve the last component if you get into that symlink in RCU mode?

As above - we will have a counted reference to whatever holds the text of the
symlink.



> 
> TBH, I'm less than fond of passing nameidata to ->follow_link() at all,
> flags or no flags.  We could kill current->link_count and
> current->total_link_count, replacing them with one void * current->nameidata
> and taking counters into struct nameidata itself.  Have places like e.g.
> kern_path_locked() do
>   struct nameidata nd, *saved = set_nameidata();
>   ...
>   set_nameidata(saved);
> with set_nameidata(p) doing this:
>   old = current->nameidata;
>   current->nameidata = p;
>   if (p) {
>   if (!old) {
>   p->link_count = 0;
>   p->total_link_count = 0;
>   } else {
>   p->link_count = old->link_count;
>   p->total_link_count = old->total_link_count;
>   }
>   }
>   return old;
> 
> Then nd_set_link() et.al. would use current->nameidata instead of an
> explicitly passed pointer and ->follow_link() instances wouldn't need
> that opaque pointer passed to them at all.

Sounds interesting  - I might try it.

Would ->follow_link() than get a 'flags' argument, or would "nd_is_rcu()"
reference current->nameidata->flags ??

Thanks,
NeilBrown



pgpmfWypoc81Z.pgp
Description: OpenPGP 

Re: [PATCH 0/9] Support follow_link in RCU-walk.

2015-03-08 Thread NeilBrown
On Thu, 5 Mar 2015 06:05:20 + Al Viro v...@zeniv.linux.org.uk wrote:

 On Thu, Mar 05, 2015 at 04:21:21PM +1100, NeilBrown wrote:
  Hi Al (and others),
  
   I wonder if you could look over this patchset.
   It allows RCU-walk to follow symlinks in many common cases,
   thus removing a surprising performance hit caused by using symlinks.
  
   The last could of patches make changes to XFS and NFS to support
   this but I haven't forwarded to the relevant lists yet.
   If/when the early code meets with approval I'll do that.
  
   The first patch almost certainly needs to be changed.  I originally
   wrote this code when filesystems could see inside nameidata.
   It is now opaque so the simplest solution was to provide an
   accessor function.
   Maybe I should as a 'flags' arg to -follow_link?? Or have
   -follow_link and -follow_link_rcu ??
   What do you suggest?
 
 Umm...  Some observations:
   * now -follow_link() can be called in RCU mode, which means
 that it can race with fs shutdown; not a problem, except that now it
 joins -lookup() et.al. in if some data structure is needed in RCU
 case of that, make sure it's not destroyed without an RCU delay somewhere
 between the entry into -kill_sb() and destruction.

So inodes and dentries and associated private data should already be safe.
And s_fs_info can be used if it is freed by e.g. kfree_rcu (like autofs)
but not if just kfree (like ext3).

xfs_fs_put_super() directly frees the 'xfs_mount', which xfs_readlink
accesses.  I guess that needs to be fixed.


   * highmem pages in symlinks: that BS shouldn't be allowed at
 all.  Just make sure that at least for those filesystems symlink inodes
 get mapping_set_gfp_mask(inode-i_data, GFP_KERNEL) and be done with that.

page_getlink() already uses kmap(), implying that highmem pages are
supported.   All I'm doing is making sure that my page_getlink_rcu()
doesn't fail horribly if the page is a highmem page.

If a filesystem needs improved follow_link performance on a highmem machine,
then setting the gfp_mask as you suggest is probably a good idea, but I don't
really want to impose that on filesystems if I don't need to.  And at present
I don't.
So I'd rather leave it to the filesystem maintainer, or someone who discovers
a need.


   * are you sure that security_inode_follow_link() is OK to call in
 RCU mode?

No.
avc_has_perm() doesn't look RCU safe, even without auditing enabled.
At the very least we'll need to pass a lookup_rcu flag in there.


   * what warranties are you giving for the lifetime of strings
 passed to nd_set_link()?  Right now it's should not be freed until the
 matching -put_link(); what happens for RCU mode?

The same

For XFS, we kmalloc a buffer GFP_ATOMIC and copy into that.  Then
put_link() kfrees it.
For filesystems with the symlink in the page cache, we get a reference to
the page (which is a bit heavy-handed for RCU-walk, but much less so than the
current code) and drop the reference in -put_link.

For filesystems with a short symlink in the inode, we just provide a pointer
to that... How long can we expect that to be around?
I cannot see any provision for keeping those inodes in memory while we
follow the symlink... What am I missing?

In any case, if there is a reference held on the inode for ref-walk, then
presumably complete_walk() will take a reference on that same inode when
dropping out of rcu-walk I hope.


So I think the rules here are unchanged.


   * really nasty one: creat(2) on a dangling symlink.  What's to
 preserve the last component if you get into that symlink in RCU mode?

As above - we will have a counted reference to whatever holds the text of the
symlink.



 
 TBH, I'm less than fond of passing nameidata to -follow_link() at all,
 flags or no flags.  We could kill current-link_count and
 current-total_link_count, replacing them with one void * current-nameidata
 and taking counters into struct nameidata itself.  Have places like e.g.
 kern_path_locked() do
   struct nameidata nd, *saved = set_nameidata(nd);
   ...
   set_nameidata(saved);
 with set_nameidata(p) doing this:
   old = current-nameidata;
   current-nameidata = p;
   if (p) {
   if (!old) {
   p-link_count = 0;
   p-total_link_count = 0;
   } else {
   p-link_count = old-link_count;
   p-total_link_count = old-total_link_count;
   }
   }
   return old;
 
 Then nd_set_link() et.al. would use current-nameidata instead of an
 explicitly passed pointer and -follow_link() instances wouldn't need
 that opaque pointer passed to them at all.

Sounds interesting  - I might try it.

Would -follow_link() than get a 'flags' argument, or would nd_is_rcu()
reference current-nameidata-flags ??

Thanks,
NeilBrown



pgpmfWypoc81Z.pgp
Description: OpenPGP digital signature


Re: [PATCH 0/9] Support follow_link in RCU-walk.

2015-03-05 Thread NeilBrown
On Thu, 5 Mar 2015 06:05:20 + Al Viro  wrote:

> On Thu, Mar 05, 2015 at 04:21:21PM +1100, NeilBrown wrote:
> > Hi Al (and others),
> > 
> >  I wonder if you could look over this patchset.
> >  It allows RCU-walk to follow symlinks in many common cases,
> >  thus removing a surprising performance hit caused by using symlinks.
> > 
> >  The last could of patches make changes to XFS and NFS to support
> >  this but I haven't forwarded to the relevant lists yet.
> >  If/when the early code meets with approval I'll do that.
> > 
> >  The first patch almost certainly needs to be changed.  I originally
> >  wrote this code when filesystems could see inside nameidata.
> >  It is now opaque so the simplest solution was to provide an
> >  accessor function.
> >  Maybe I should as a 'flags' arg to ->follow_link?? Or have
> >  ->follow_link and ->follow_link_rcu ??
> >  What do you suggest?
> 
> Umm...  Some observations:
>   * now ->follow_link() can be called in RCU mode, which means
> that it can race with fs shutdown; not a problem, except that now it
> joins ->lookup() et.al. in "if some data structure is needed in RCU
> case of that, make sure it's not destroyed without an RCU delay somewhere
> between the entry into ->kill_sb() and destruction.
>   * highmem pages in symlinks: that BS shouldn't be allowed at
> all.  Just make sure that at least for those filesystems symlink inodes
> get mapping_set_gfp_mask(>i_data, GFP_KERNEL) and be done with that.
>   * are you sure that security_inode_follow_link() is OK to call in
> RCU mode?
>   * what warranties are you giving for the lifetime of strings
> passed to nd_set_link()?  Right now it's "should not be freed until the
> matching ->put_link()"; what happens for RCU mode?
>   * really nasty one: creat(2) on a dangling symlink.  What's to
> preserve the last component if you get into that symlink in RCU mode?
> 
> TBH, I'm less than fond of passing nameidata to ->follow_link() at all,
> flags or no flags.  We could kill current->link_count and
> current->total_link_count, replacing them with one void * current->nameidata
> and taking counters into struct nameidata itself.  Have places like e.g.
> kern_path_locked() do
>   struct nameidata nd, *saved = set_nameidata();
>   ...
>   set_nameidata(saved);
> with set_nameidata(p) doing this:
>   old = current->nameidata;
>   current->nameidata = p;
>   if (p) {
>   if (!old) {
>   p->link_count = 0;
>   p->total_link_count = 0;
>   } else {
>   p->link_count = old->link_count;
>   p->total_link_count = old->total_link_count;
>   }
>   }
>   return old;
> 
> Then nd_set_link() et.al. would use current->nameidata instead of an
> explicitly passed pointer and ->follow_link() instances wouldn't need
> that opaque pointer passed to them at all.

Wow, thanks for the quick response!!!

I'll look into all those issues and get back to you when I have something
coherent to say.

NeilBrown


pgp_fcu9BHu14.pgp
Description: OpenPGP digital signature


Re: [PATCH 0/9] Support follow_link in RCU-walk.

2015-03-05 Thread John Stoffel
> "Al" == Al Viro  writes:

Al> On Thu, Mar 05, 2015 at 08:52:20AM -0500, John Stoffel wrote:
>> So what happens if your filesystem is 10Tb in size, and you have 50
>> million files and lots of them are symlinks?  I've got developers who
>> do shit like this and wonder why performance sucks  and I just
>> worry that GPF_KERNEL is a limited resource.  But maybe 64bit systems
>> won't really have any problems?

Al> What would keep all those symlinks' contents pinned in page cache?

Dunno honestly... but your statement nudges my memory that they would
be evicted as memory pressure grows, so it's probably not a problem
afterall.  I just know that my users love tickling strange bugs like
this because they hate to actually cleanup after themselves unless
absolutely necessary.

Thanks for the nudge.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/9] Support follow_link in RCU-walk.

2015-03-05 Thread Al Viro
On Thu, Mar 05, 2015 at 08:52:20AM -0500, John Stoffel wrote:

> So what happens if your filesystem is 10Tb in size, and you have 50
> million files and lots of them are symlinks?  I've got developers who
> do shit like this and wonder why performance sucks  and I just
> worry that GPF_KERNEL is a limited resource.  But maybe 64bit systems
> won't really have any problems?

What would keep all those symlinks' contents pinned in page cache?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/9] Support follow_link in RCU-walk.

2015-03-05 Thread John Stoffel
> "Al" == Al Viro  writes:

Al> On Thu, Mar 05, 2015 at 04:21:21PM +1100, NeilBrown wrote:
>> Hi Al (and others),
>> 
>> I wonder if you could look over this patchset.
>> It allows RCU-walk to follow symlinks in many common cases,
>> thus removing a surprising performance hit caused by using symlinks.
>> 
>> The last could of patches make changes to XFS and NFS to support
>> this but I haven't forwarded to the relevant lists yet.
>> If/when the early code meets with approval I'll do that.
>> 
>> The first patch almost certainly needs to be changed.  I originally
>> wrote this code when filesystems could see inside nameidata.
>> It is now opaque so the simplest solution was to provide an
>> accessor function.
>> Maybe I should as a 'flags' arg to ->follow_link?? Or have
-> follow_link and ->follow_link_rcu ??
>> What do you suggest?

Al> Umm...  Some observations:
Al> * now ->follow_link() can be called in RCU mode, which means
Al> that it can race with fs shutdown; not a problem, except that now it
Al> joins ->lookup() et.al. in "if some data structure is needed in RCU
Al> case of that, make sure it's not destroyed without an RCU delay somewhere
Al> between the entry into ->kill_sb() and destruction.
Al> * highmem pages in symlinks: that BS shouldn't be allowed at
Al> all.  Just make sure that at least for those filesystems symlink inodes
Al> get mapping_set_gfp_mask(>i_data, GFP_KERNEL) and be done with that.


So what happens if your filesystem is 10Tb in size, and you have 50
million files and lots of them are symlinks?  I've got developers who
do shit like this and wonder why performance sucks  and I just
worry that GPF_KERNEL is a limited resource.  But maybe 64bit systems
won't really have any problems?

Rest of this is way outside my pay grade to commment on.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/9] Support follow_link in RCU-walk.

2015-03-05 Thread NeilBrown
On Thu, 5 Mar 2015 06:05:20 + Al Viro v...@zeniv.linux.org.uk wrote:

 On Thu, Mar 05, 2015 at 04:21:21PM +1100, NeilBrown wrote:
  Hi Al (and others),
  
   I wonder if you could look over this patchset.
   It allows RCU-walk to follow symlinks in many common cases,
   thus removing a surprising performance hit caused by using symlinks.
  
   The last could of patches make changes to XFS and NFS to support
   this but I haven't forwarded to the relevant lists yet.
   If/when the early code meets with approval I'll do that.
  
   The first patch almost certainly needs to be changed.  I originally
   wrote this code when filesystems could see inside nameidata.
   It is now opaque so the simplest solution was to provide an
   accessor function.
   Maybe I should as a 'flags' arg to -follow_link?? Or have
   -follow_link and -follow_link_rcu ??
   What do you suggest?
 
 Umm...  Some observations:
   * now -follow_link() can be called in RCU mode, which means
 that it can race with fs shutdown; not a problem, except that now it
 joins -lookup() et.al. in if some data structure is needed in RCU
 case of that, make sure it's not destroyed without an RCU delay somewhere
 between the entry into -kill_sb() and destruction.
   * highmem pages in symlinks: that BS shouldn't be allowed at
 all.  Just make sure that at least for those filesystems symlink inodes
 get mapping_set_gfp_mask(inode-i_data, GFP_KERNEL) and be done with that.
   * are you sure that security_inode_follow_link() is OK to call in
 RCU mode?
   * what warranties are you giving for the lifetime of strings
 passed to nd_set_link()?  Right now it's should not be freed until the
 matching -put_link(); what happens for RCU mode?
   * really nasty one: creat(2) on a dangling symlink.  What's to
 preserve the last component if you get into that symlink in RCU mode?
 
 TBH, I'm less than fond of passing nameidata to -follow_link() at all,
 flags or no flags.  We could kill current-link_count and
 current-total_link_count, replacing them with one void * current-nameidata
 and taking counters into struct nameidata itself.  Have places like e.g.
 kern_path_locked() do
   struct nameidata nd, *saved = set_nameidata(nd);
   ...
   set_nameidata(saved);
 with set_nameidata(p) doing this:
   old = current-nameidata;
   current-nameidata = p;
   if (p) {
   if (!old) {
   p-link_count = 0;
   p-total_link_count = 0;
   } else {
   p-link_count = old-link_count;
   p-total_link_count = old-total_link_count;
   }
   }
   return old;
 
 Then nd_set_link() et.al. would use current-nameidata instead of an
 explicitly passed pointer and -follow_link() instances wouldn't need
 that opaque pointer passed to them at all.

Wow, thanks for the quick response!!!

I'll look into all those issues and get back to you when I have something
coherent to say.

NeilBrown


pgp_fcu9BHu14.pgp
Description: OpenPGP digital signature


Re: [PATCH 0/9] Support follow_link in RCU-walk.

2015-03-05 Thread John Stoffel
 Al == Al Viro v...@zeniv.linux.org.uk writes:

Al On Thu, Mar 05, 2015 at 08:52:20AM -0500, John Stoffel wrote:
 So what happens if your filesystem is 10Tb in size, and you have 50
 million files and lots of them are symlinks?  I've got developers who
 do shit like this and wonder why performance sucks  and I just
 worry that GPF_KERNEL is a limited resource.  But maybe 64bit systems
 won't really have any problems?

Al What would keep all those symlinks' contents pinned in page cache?

Dunno honestly... but your statement nudges my memory that they would
be evicted as memory pressure grows, so it's probably not a problem
afterall.  I just know that my users love tickling strange bugs like
this because they hate to actually cleanup after themselves unless
absolutely necessary.

Thanks for the nudge.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/9] Support follow_link in RCU-walk.

2015-03-05 Thread Al Viro
On Thu, Mar 05, 2015 at 08:52:20AM -0500, John Stoffel wrote:

 So what happens if your filesystem is 10Tb in size, and you have 50
 million files and lots of them are symlinks?  I've got developers who
 do shit like this and wonder why performance sucks  and I just
 worry that GPF_KERNEL is a limited resource.  But maybe 64bit systems
 won't really have any problems?

What would keep all those symlinks' contents pinned in page cache?
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/9] Support follow_link in RCU-walk.

2015-03-05 Thread John Stoffel
 Al == Al Viro v...@zeniv.linux.org.uk writes:

Al On Thu, Mar 05, 2015 at 04:21:21PM +1100, NeilBrown wrote:
 Hi Al (and others),
 
 I wonder if you could look over this patchset.
 It allows RCU-walk to follow symlinks in many common cases,
 thus removing a surprising performance hit caused by using symlinks.
 
 The last could of patches make changes to XFS and NFS to support
 this but I haven't forwarded to the relevant lists yet.
 If/when the early code meets with approval I'll do that.
 
 The first patch almost certainly needs to be changed.  I originally
 wrote this code when filesystems could see inside nameidata.
 It is now opaque so the simplest solution was to provide an
 accessor function.
 Maybe I should as a 'flags' arg to -follow_link?? Or have
- follow_link and -follow_link_rcu ??
 What do you suggest?

Al Umm...  Some observations:
Al * now -follow_link() can be called in RCU mode, which means
Al that it can race with fs shutdown; not a problem, except that now it
Al joins -lookup() et.al. in if some data structure is needed in RCU
Al case of that, make sure it's not destroyed without an RCU delay somewhere
Al between the entry into -kill_sb() and destruction.
Al * highmem pages in symlinks: that BS shouldn't be allowed at
Al all.  Just make sure that at least for those filesystems symlink inodes
Al get mapping_set_gfp_mask(inode-i_data, GFP_KERNEL) and be done with that.


So what happens if your filesystem is 10Tb in size, and you have 50
million files and lots of them are symlinks?  I've got developers who
do shit like this and wonder why performance sucks  and I just
worry that GPF_KERNEL is a limited resource.  But maybe 64bit systems
won't really have any problems?

Rest of this is way outside my pay grade to commment on.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/9] Support follow_link in RCU-walk.

2015-03-04 Thread Al Viro
On Thu, Mar 05, 2015 at 04:21:21PM +1100, NeilBrown wrote:
> Hi Al (and others),
> 
>  I wonder if you could look over this patchset.
>  It allows RCU-walk to follow symlinks in many common cases,
>  thus removing a surprising performance hit caused by using symlinks.
> 
>  The last could of patches make changes to XFS and NFS to support
>  this but I haven't forwarded to the relevant lists yet.
>  If/when the early code meets with approval I'll do that.
> 
>  The first patch almost certainly needs to be changed.  I originally
>  wrote this code when filesystems could see inside nameidata.
>  It is now opaque so the simplest solution was to provide an
>  accessor function.
>  Maybe I should as a 'flags' arg to ->follow_link?? Or have
>  ->follow_link and ->follow_link_rcu ??
>  What do you suggest?

Umm...  Some observations:
* now ->follow_link() can be called in RCU mode, which means
that it can race with fs shutdown; not a problem, except that now it
joins ->lookup() et.al. in "if some data structure is needed in RCU
case of that, make sure it's not destroyed without an RCU delay somewhere
between the entry into ->kill_sb() and destruction.
* highmem pages in symlinks: that BS shouldn't be allowed at
all.  Just make sure that at least for those filesystems symlink inodes
get mapping_set_gfp_mask(>i_data, GFP_KERNEL) and be done with that.
* are you sure that security_inode_follow_link() is OK to call in
RCU mode?
* what warranties are you giving for the lifetime of strings
passed to nd_set_link()?  Right now it's "should not be freed until the
matching ->put_link()"; what happens for RCU mode?
* really nasty one: creat(2) on a dangling symlink.  What's to
preserve the last component if you get into that symlink in RCU mode?

TBH, I'm less than fond of passing nameidata to ->follow_link() at all,
flags or no flags.  We could kill current->link_count and
current->total_link_count, replacing them with one void * current->nameidata
and taking counters into struct nameidata itself.  Have places like e.g.
kern_path_locked() do
struct nameidata nd, *saved = set_nameidata();
...
set_nameidata(saved);
with set_nameidata(p) doing this:
old = current->nameidata;
current->nameidata = p;
if (p) {
if (!old) {
p->link_count = 0;
p->total_link_count = 0;
} else {
p->link_count = old->link_count;
p->total_link_count = old->total_link_count;
}
}
return old;

Then nd_set_link() et.al. would use current->nameidata instead of an
explicitly passed pointer and ->follow_link() instances wouldn't need
that opaque pointer passed to them at all.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/9] Support follow_link in RCU-walk.

2015-03-04 Thread Al Viro
On Thu, Mar 05, 2015 at 04:21:21PM +1100, NeilBrown wrote:
 Hi Al (and others),
 
  I wonder if you could look over this patchset.
  It allows RCU-walk to follow symlinks in many common cases,
  thus removing a surprising performance hit caused by using symlinks.
 
  The last could of patches make changes to XFS and NFS to support
  this but I haven't forwarded to the relevant lists yet.
  If/when the early code meets with approval I'll do that.
 
  The first patch almost certainly needs to be changed.  I originally
  wrote this code when filesystems could see inside nameidata.
  It is now opaque so the simplest solution was to provide an
  accessor function.
  Maybe I should as a 'flags' arg to -follow_link?? Or have
  -follow_link and -follow_link_rcu ??
  What do you suggest?

Umm...  Some observations:
* now -follow_link() can be called in RCU mode, which means
that it can race with fs shutdown; not a problem, except that now it
joins -lookup() et.al. in if some data structure is needed in RCU
case of that, make sure it's not destroyed without an RCU delay somewhere
between the entry into -kill_sb() and destruction.
* highmem pages in symlinks: that BS shouldn't be allowed at
all.  Just make sure that at least for those filesystems symlink inodes
get mapping_set_gfp_mask(inode-i_data, GFP_KERNEL) and be done with that.
* are you sure that security_inode_follow_link() is OK to call in
RCU mode?
* what warranties are you giving for the lifetime of strings
passed to nd_set_link()?  Right now it's should not be freed until the
matching -put_link(); what happens for RCU mode?
* really nasty one: creat(2) on a dangling symlink.  What's to
preserve the last component if you get into that symlink in RCU mode?

TBH, I'm less than fond of passing nameidata to -follow_link() at all,
flags or no flags.  We could kill current-link_count and
current-total_link_count, replacing them with one void * current-nameidata
and taking counters into struct nameidata itself.  Have places like e.g.
kern_path_locked() do
struct nameidata nd, *saved = set_nameidata(nd);
...
set_nameidata(saved);
with set_nameidata(p) doing this:
old = current-nameidata;
current-nameidata = p;
if (p) {
if (!old) {
p-link_count = 0;
p-total_link_count = 0;
} else {
p-link_count = old-link_count;
p-total_link_count = old-total_link_count;
}
}
return old;

Then nd_set_link() et.al. would use current-nameidata instead of an
explicitly passed pointer and -follow_link() instances wouldn't need
that opaque pointer passed to them at all.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/