Re: [PATCH v2 01/12] docs: path-lookup: update follow_managed() part

2021-04-19 Thread Fox Chen
On Tue, Apr 20, 2021 at 3:22 AM Jonathan Corbet  wrote:
>
> Fox Chen  writes:
>
> > On Mon, Apr 19, 2021 at 11:25 AM Matthew Wilcox  wrote:
> >>
> >> On Mon, Apr 19, 2021 at 10:33:00AM +0800, Fox Chen wrote:
> >> > On Mon, Apr 19, 2021 at 10:17 AM Matthew Wilcox  
> >> > wrote:
> >> > > You can drop ``..`` from around function named which are followed with
> >> > > ().  d74b0d31ddde ("Docs: An initial automarkup extension for sphinx")
> >> > > marks them up automatically.
> >> > >
> >> >
> >> > Got it, thanks for letting me know. But I will still use them in this
> >> > patch series to keep consistency with the remaining parts of the
> >> > document.
> >>
> >> Well, you weren't.  For example:
> >>
> >> +As the last step of ``walk_component()``, ``step_into()`` will be called 
> >> either
> >> +directly from walk_component() or from handle_dots().  It calls
> >> +``handle_mount()``, to check and handle mount points, in which a new
> >>
> >> Neither of the functions on the second line were using ``.
> >
> > Oh, That was a mistake,  They should've been wrapped with ``.
> > Thanks for pointing it out. I will go through the whole patch set and
> > fix this type of inconsistency in V3.
>
> Please, if possible, go toward the bare function() form rather than
> using literals...it's easier to read and the docs system will
> automatically create cross references for you.
>
> Thanks,
>
> jon

Ok, If you have no problem with that inconsistency, I will go with the
bare one in v3.


thanks,
fox


RE: [PATCH 5.11 000/122] 5.11.16-rc1 review

2021-04-19 Thread Fox Chen
On Mon, 19 Apr 2021 15:04:40 +0200, Greg Kroah-Hartman 
 wrote:
> This is the start of the stable review cycle for the 5.11.16 release.
> There are 122 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Wed, 21 Apr 2021 13:05:09 +.
> Anything received after that time might be too late.
> 
> The whole patch series can be found in one patch at:
>   
> https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.11.16-rc1.gz
> or in the git tree and branch at:
>   
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
> linux-5.11.y
> and the diffstat can be found below.
> 
> thanks,
> 
> greg k-h
> 

5.11.16-rc1 Successfully Compiled and booted on my Raspberry PI 4b (8g) 
(bcm2711)

Tested-by: Fox Chen 



RE: [PATCH 5.10 000/103] 5.10.32-rc1 review

2021-04-19 Thread Fox Chen
On Mon, 19 Apr 2021 15:05:11 +0200, Greg Kroah-Hartman 
 wrote:
> This is the start of the stable review cycle for the 5.10.32 release.
> There are 103 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Wed, 21 Apr 2021 13:05:09 +.
> Anything received after that time might be too late.
> 
> The whole patch series can be found in one patch at:
>   
> https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.32-rc1.gz
> or in the git tree and branch at:
>   
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
> linux-5.10.y
> and the diffstat can be found below.
> 
> thanks,
> 
> greg k-h
> 

5.10.32-rc1 Successfully Compiled and booted on my Raspberry PI 4b (8g) 
(bcm2711)

Tested-by: Fox Chen 



Re: [PATCH v2 01/12] docs: path-lookup: update follow_managed() part

2021-04-18 Thread Fox Chen
On Mon, Apr 19, 2021 at 11:25 AM Matthew Wilcox  wrote:
>
> On Mon, Apr 19, 2021 at 10:33:00AM +0800, Fox Chen wrote:
> > On Mon, Apr 19, 2021 at 10:17 AM Matthew Wilcox  wrote:
> > >
> > > On Tue, Mar 16, 2021 at 01:47:16PM +0800, Fox Chen wrote:
> > > > -In the absence of symbolic links, ``walk_component()`` creates a new
> > > > +As the last step of ``walk_component()``, ``step_into()`` will be 
> > > > called either
> > >
> > > You can drop ``..`` from around function named which are followed with
> > > ().  d74b0d31ddde ("Docs: An initial automarkup extension for sphinx")
> > > marks them up automatically.
> > >
> >
> > Got it, thanks for letting me know. But I will still use them in this
> > patch series to keep consistency with the remaining parts of the
> > document.
>
> Well, you weren't.  For example:
>
> +As the last step of ``walk_component()``, ``step_into()`` will be called 
> either
> +directly from walk_component() or from handle_dots().  It calls
> +``handle_mount()``, to check and handle mount points, in which a new
>
> Neither of the functions on the second line were using ``.

Oh, That was a mistake,  They should've been wrapped with ``.
Thanks for pointing it out. I will go through the whole patch set and
fix this type of inconsistency in V3.


thanks,
fox


Re: [PATCH v2 12/12] docs: path-lookup: update symlink description

2021-04-18 Thread Fox Chen
On Mon, Apr 19, 2021 at 9:59 AM NeilBrown  wrote:
>
> On Tue, Mar 16 2021, Fox Chen wrote:
>
> > instead of lookup_real()/vfs_create(), i_op->lookup() and
> > i_op->create() will be called directly.
> >
> > update vfs_open() logic
> >
> > should_follow_link is merged into lookup_last() or open_last_lookup()
> > which returns symlink name instead of an integer.
> >
> > Signed-off-by: Fox Chen 
> > ---
> >  Documentation/filesystems/path-lookup.rst | 13 ++---
> >  1 file changed, 6 insertions(+), 7 deletions(-)
> >
> > diff --git a/Documentation/filesystems/path-lookup.rst 
> > b/Documentation/filesystems/path-lookup.rst
> > index eef6e9f68fba..adbc714740c2 100644
> > --- a/Documentation/filesystems/path-lookup.rst
> > +++ b/Documentation/filesystems/path-lookup.rst
> > @@ -1202,16 +1202,15 @@ the code.
> > it.  If the file was found in the dcache, then ``vfs_open()`` is used 
> > for
> > this.  If not, then ``lookup_open()`` will either call 
> > ``atomic_open()`` (if
> > the filesystem provides it) to combine the final lookup with the open, 
> > or
> > -   will perform the separate ``lookup_real()`` and ``vfs_create()`` steps
> > +   will perform the separate ``i_op->lookup()`` and ``i_op->create()`` 
> > steps
> > directly.  In the later case the actual "open" of this newly found or
> > created file will be performed by ``vfs_open()``, just as if the name
> > were found in the dcache.
> >
> >  2. ``vfs_open()`` can fail with ``-EOPENSTALE`` if the cached information
> > -   wasn't quite current enough.  Rather than restarting the lookup from
> > -   the top with ``LOOKUP_REVAL`` set, ``lookup_open()`` is called instead,
> > -   giving the filesystem a chance to resolve small inconsistencies.
> > -   If that doesn't work, only then is the lookup restarted from the top.
> > +   wasn't quite current enough.  If it's in RCU-walk -ECHILD will be 
> > returned
> > +   otherwise will return -ESTALE.  When -ESTALE is returned, the caller may
>
> "otherwise -ESTALE is returned".
> If you don't like repeating "is returned", then maybe:
>   "... -ECHILD will be returned, otherwise the result is -ESTALE".
>
>
> > +   retry with LOOKUP_REVAL flag set.
> >
> >  3. An open with O_CREAT **does** follow a symlink in the final component,
> > unlike other creation system calls (like ``mkdir``).  So the sequence::
> > @@ -1221,8 +1220,8 @@ the code.
> >
> > will create a file called ``/tmp/bar``.  This is not permitted if
> > ``O_EXCL`` is set but otherwise is handled for an O_CREAT open much
> > -   like for a non-creating open: ``should_follow_link()`` returns ``1``, 
> > and
> > -   so does ``do_last()`` so that ``trailing_symlink()`` gets called and the
> > +   like for a non-creating open: ``lookup_last()`` or 
> > ``open_last_lookup()``
> > +   returns a non ``Null`` value, and ``link_path_walk()`` gets called and 
> > the
>
> "NULL", not "Null".
>
> This those changes,
>  Reviewed-by: NeilBrown 
>
> Thanks for a lot of all these improvements!! and apologies for the delay
> in the review.

Thanks for the review, I will fix them and send the next version back.


> Thanks,
> NeilBrown
>
>
> > open process continues on the symlink that was found.
> >
> >  Updating the access time
> > --
> > 2.30.2


thanks,
fox


Re: [PATCH v2 01/12] docs: path-lookup: update follow_managed() part

2021-04-18 Thread Fox Chen
On Mon, Apr 19, 2021 at 10:17 AM Matthew Wilcox  wrote:
>
> On Tue, Mar 16, 2021 at 01:47:16PM +0800, Fox Chen wrote:
> > -In the absence of symbolic links, ``walk_component()`` creates a new
> > +As the last step of ``walk_component()``, ``step_into()`` will be called 
> > either
>
> You can drop ``..`` from around function named which are followed with
> ().  d74b0d31ddde ("Docs: An initial automarkup extension for sphinx")
> marks them up automatically.
>

Got it, thanks for letting me know. But I will still use them in this
patch series to keep consistency with the remaining parts of the
document.


thanks,
fox


RE: [PATCH 5.10 00/25] 5.10.31-rc1 review

2021-04-16 Thread Fox Chen
On Thu, 15 Apr 2021 16:47:54 +0200, Greg Kroah-Hartman 
 wrote:
> This is the start of the stable review cycle for the 5.10.31 release.
> There are 25 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Sat, 17 Apr 2021 14:44:01 +.
> Anything received after that time might be too late.
> 
> The whole patch series can be found in one patch at:
>   
> https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.31-rc1.gz
> or in the git tree and branch at:
>   
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
> linux-5.10.y
> and the diffstat can be found below.
> 
> thanks,
> 
> greg k-h
> 

5.10.31-rc1 Successfully Compiled and booted on my Raspberry PI 4b (8g) 
(bcm2711)

Tested-by: Fox Chen 



RE: [PATCH 5.11 00/23] 5.11.15-rc1 review

2021-04-16 Thread Fox Chen
On Thu, 15 Apr 2021 16:48:07 +0200, Greg Kroah-Hartman 
 wrote:
> This is the start of the stable review cycle for the 5.11.15 release.
> There are 23 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Sat, 17 Apr 2021 14:44:01 +.
> Anything received after that time might be too late.
> 
> The whole patch series can be found in one patch at:
>   
> https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.11.15-rc1.gz
> or in the git tree and branch at:
>   
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
> linux-5.11.y
> and the diffstat can be found below.
> 
> thanks,
> 
> greg k-h
> 

5.11.15-rc1 Successfully Compiled and booted on my Raspberry PI 4b (8g) 
(bcm2711)

Tested-by: Fox Chen 



[PATCH v2 12/12] docs: path-lookup: update symlink description

2021-03-15 Thread Fox Chen
instead of lookup_real()/vfs_create(), i_op->lookup() and
i_op->create() will be called directly.

update vfs_open() logic

should_follow_link is merged into lookup_last() or open_last_lookup()
which returns symlink name instead of an integer.

Signed-off-by: Fox Chen 
---
 Documentation/filesystems/path-lookup.rst | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/Documentation/filesystems/path-lookup.rst 
b/Documentation/filesystems/path-lookup.rst
index eef6e9f68fba..adbc714740c2 100644
--- a/Documentation/filesystems/path-lookup.rst
+++ b/Documentation/filesystems/path-lookup.rst
@@ -1202,16 +1202,15 @@ the code.
it.  If the file was found in the dcache, then ``vfs_open()`` is used for
this.  If not, then ``lookup_open()`` will either call ``atomic_open()`` (if
the filesystem provides it) to combine the final lookup with the open, or
-   will perform the separate ``lookup_real()`` and ``vfs_create()`` steps
+   will perform the separate ``i_op->lookup()`` and ``i_op->create()`` steps
directly.  In the later case the actual "open" of this newly found or
created file will be performed by ``vfs_open()``, just as if the name
were found in the dcache.
 
 2. ``vfs_open()`` can fail with ``-EOPENSTALE`` if the cached information
-   wasn't quite current enough.  Rather than restarting the lookup from
-   the top with ``LOOKUP_REVAL`` set, ``lookup_open()`` is called instead,
-   giving the filesystem a chance to resolve small inconsistencies.
-   If that doesn't work, only then is the lookup restarted from the top.
+   wasn't quite current enough.  If it's in RCU-walk -ECHILD will be returned
+   otherwise will return -ESTALE.  When -ESTALE is returned, the caller may
+   retry with LOOKUP_REVAL flag set.
 
 3. An open with O_CREAT **does** follow a symlink in the final component,
unlike other creation system calls (like ``mkdir``).  So the sequence::
@@ -1221,8 +1220,8 @@ the code.
 
will create a file called ``/tmp/bar``.  This is not permitted if
``O_EXCL`` is set but otherwise is handled for an O_CREAT open much
-   like for a non-creating open: ``should_follow_link()`` returns ``1``, and
-   so does ``do_last()`` so that ``trailing_symlink()`` gets called and the
+   like for a non-creating open: ``lookup_last()`` or ``open_last_lookup()``
+   returns a non ``Null`` value, and ``link_path_walk()`` gets called and the
open process continues on the symlink that was found.
 
 Updating the access time
-- 
2.30.2



[PATCH v2 10/12] docs: path-lookup: update WALK_GET, WALK_PUT desc

2021-03-15 Thread Fox Chen
WALK_GET is changed to WALK_TRAILING with a different meaning.
Here it should be WALK_NOFOLLOW. WALK_PUT dosn't exist, we have
WALK_MORE.

WALK_PUT == !WALK_MORE

And there is not should_follow_link().

Related commits:
commit 8c4efe22e7c4 ("namei: invert the meaning of WALK_FOLLOW")
commit 1c4ff1a87e46 ("namei: invert WALK_PUT logics")

Signed-off-by: Fox Chen 
---
 Documentation/filesystems/path-lookup.rst | 12 +---
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/Documentation/filesystems/path-lookup.rst 
b/Documentation/filesystems/path-lookup.rst
index 0d41c61f7e4f..abd0153e2415 100644
--- a/Documentation/filesystems/path-lookup.rst
+++ b/Documentation/filesystems/path-lookup.rst
@@ -1123,13 +1123,11 @@ stack in ``walk_component()`` immediately when the 
symlink is found;
 old symlink as it walks that last component.  So it is quite
 convenient for ``walk_component()`` to release the old symlink and pop
 the references just before pushing the reference information for the
-new symlink.  It is guided in this by two flags; ``WALK_GET``, which
-gives it permission to follow a symlink if it finds one, and
-``WALK_PUT``, which tells it to release the current symlink after it has been
-followed.  ``WALK_PUT`` is tested first, leading to a call to
-``put_link()``.  ``WALK_GET`` is tested subsequently (by
-``should_follow_link()``) leading to a call to ``pick_link()`` which sets
-up the stack frame.
+new symlink.  It is guided in this by two flags; ``WALK_NOFOLLOW``, which
+suggests whether to follow a symlink if it finds one, and
+``WALK_MORE``, which tells whether to release the current symlink after it has
+been followed.  ``WALK_MORE`` is tested first, leading to a call to
+``put_link()``.
 
 Symlinks with no final component
 
-- 
2.30.2



[PATCH v2 09/12] docs: path-lookup: no get_link()

2021-03-15 Thread Fox Chen
no get_link() anymore. we have step_into() and pick_link().

walk_component() will call step_into(), in turn call pick_link,
and return symlink name.

Signed-off-by: Fox Chen 
---
 Documentation/filesystems/path-lookup.rst | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/Documentation/filesystems/path-lookup.rst 
b/Documentation/filesystems/path-lookup.rst
index 8ab95dd9046e..0d41c61f7e4f 100644
--- a/Documentation/filesystems/path-lookup.rst
+++ b/Documentation/filesystems/path-lookup.rst
@@ -1103,12 +1103,10 @@ doesn't need to notice.  Getting this ``name`` variable 
on and off the
 stack is very straightforward; pushing and popping the references is
 a little more complex.
 
-When a symlink is found, ``walk_component()`` returns the value ``1``
-(``0`` is returned for any other sort of success, and a negative number
-is, as usual, an error indicator).  This causes ``get_link()`` to be
-called; it then gets the link from the filesystem.  Providing that
-operation is successful, the old path ``name`` is placed on the stack,
-and the new value is used as the ``name`` for a while.  When the end of
+When a symlink is found, ``walk_component()`` calls ``pick_link()``,
+it then gets the link from the filesystem returning new path ``name``.
+Providing that operation is successful, the old path ``name`` is placed on the
+stack, and the new value is used as the ``name`` for a while.  When the end of
 the path is found (i.e. ``*name`` is ``'\0'``) the old ``name`` is restored
 off the stack and path walking continues.
 
-- 
2.30.2



[PATCH v2 11/12] docs: path-lookup: update get_link() ->follow_link description

2021-03-15 Thread Fox Chen
get_link() is merged into pick_link(). i_op->follow_link is
replaced with i_op->get_link(). get_link() can return ERR_PTR(0)
which equals NULL.

Signed-off-by: Fox Chen 
---
 Documentation/filesystems/path-lookup.rst | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/Documentation/filesystems/path-lookup.rst 
b/Documentation/filesystems/path-lookup.rst
index abd0153e2415..eef6e9f68fba 100644
--- a/Documentation/filesystems/path-lookup.rst
+++ b/Documentation/filesystems/path-lookup.rst
@@ -1134,10 +1134,10 @@ Symlinks with no final component
 
 A pair of special-case symlinks deserve a little further explanation.
 Both result in a new ``struct path`` (with mount and dentry) being set
-up in the ``nameidata``, and result in ``get_link()`` returning ``NULL``.
+up in the ``nameidata``, and result in ``pick_link()`` returning ``NULL``.
 
 The more obvious case is a symlink to "``/``".  All symlinks starting
-with "``/``" are detected in ``get_link()`` which resets the ``nameidata``
+with "``/``" are detected in ``pick_link()`` which resets the ``nameidata``
 to point to the effective filesystem root.  If the symlink only
 contains "``/``" then there is nothing more to do, no components at all,
 so ``NULL`` is returned to indicate that the symlink can be released and
@@ -1154,12 +1154,11 @@ something that looks like a symlink.  It is really a 
reference to the
 target file, not just the name of it.  When you ``readlink`` these
 objects you get a name that might refer to the same file - unless it
 has been unlinked or mounted over.  When ``walk_component()`` follows
-one of these, the ``->follow_link()`` method in "procfs" doesn't return
+one of these, the ``->get_link()`` method in "procfs" doesn't return
 a string name, but instead calls ``nd_jump_link()`` which updates the
-``nameidata`` in place to point to that target.  ``->follow_link()`` then
-returns ``NULL``.  Again there is no final component and ``get_link()``
-reports this by leaving the ``last_type`` field of ``nameidata`` as
-``LAST_BIND``.
+``nameidata`` in place to point to that target.  ``->get_link()`` then
+returns ``0``.  Again there is no final component and ``pick_link()``
+returns NULL.
 
 Following the symlink in the final component
 
-- 
2.30.2



[PATCH v2 06/12] docs: path-lookup: Add macro name to symlink limit description

2021-03-15 Thread Fox Chen
Add macro name MAXSYMLINKS to the symlink limit description, so
that it is consistent with path name length description above.

Signed-off-by: Fox Chen 
---
 Documentation/filesystems/path-lookup.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/filesystems/path-lookup.rst 
b/Documentation/filesystems/path-lookup.rst
index 66697db74955..af5c20fecfef 100644
--- a/Documentation/filesystems/path-lookup.rst
+++ b/Documentation/filesystems/path-lookup.rst
@@ -992,8 +992,8 @@ is 4096.  There are a number of reasons for this limit; not 
letting the
 kernel spend too much time on just one path is one of them.  With
 symbolic links you can effectively generate much longer paths so some
 sort of limit is needed for the same reason.  Linux imposes a limit of
-at most 40 symlinks in any one path lookup.  It previously imposed a
-further limit of eight on the maximum depth of recursion, but that was
+at most 40 (MAXSYMLINKS) symlinks in any one path lookup.  It previously 
imposed
+a further limit of eight on the maximum depth of recursion, but that was
 raised to 40 when a separate stack was implemented, so there is now
 just the one limit.
 
-- 
2.30.2



[PATCH v2 07/12] docs: path-lookup: i_op->follow_link replaced with i_op->get_link

2021-03-15 Thread Fox Chen
follow_link has been replaced by get_link() which can be
called in RCU mode.

see commit: commit 6b2553918d8b ("replace ->follow_link() with
new method that could stay in RCU mode")

Signed-off-by: Fox Chen 
---
 Documentation/filesystems/path-lookup.rst | 12 +---
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/Documentation/filesystems/path-lookup.rst 
b/Documentation/filesystems/path-lookup.rst
index af5c20fecfef..e6b6c43ff0f6 100644
--- a/Documentation/filesystems/path-lookup.rst
+++ b/Documentation/filesystems/path-lookup.rst
@@ -1060,13 +1060,11 @@ filesystem cannot successfully get a reference in 
RCU-walk mode, it
 must return ``-ECHILD`` and ``unlazy_walk()`` will be called to return to
 REF-walk mode in which the filesystem is allowed to sleep.
 
-The place for all this to happen is the ``i_op->follow_link()`` inode
-method.  In the present mainline code this is never actually called in
-RCU-walk mode as the rewrite is not quite complete.  It is likely that
-in a future release this method will be passed an ``inode`` pointer when
-called in RCU-walk mode so it both (1) knows to be careful, and (2) has the
-validated pointer.  Much like the ``i_op->permission()`` method we
-looked at previously, ``->follow_link()`` would need to be careful that
+The place for all this to happen is the ``i_op->get_link()`` inode
+method. This is called both in RCU-walk and REF-walk. In RCU-walk the
+``dentry*`` argument is NULL, ``->get_link()`` can return -ECHILD to drop out 
of
+RCU-walk.  Much like the ``i_op->permission()`` method we
+looked at previously, ``->get_link()`` would need to be careful that
 all the data structures it references are safe to be accessed while
 holding no counted reference, only the RCU lock.  Though getting a
 reference with ``->follow_link()`` is not yet done in RCU-walk mode, the
-- 
2.30.2



[PATCH v2 08/12] docs: path-lookup: update i_op->put_link and cookie description

2021-03-15 Thread Fox Chen
No inode->put_link operation anymore. We use delayed_call to
deal with link destruction. Cookie has been replaced with
struct delayed_call.

Related commit: commit fceef393a538 ("switch ->get_link() to
delayed_call, kill ->put_link()")

Signed-off-by: Fox Chen 
---
 Documentation/filesystems/path-lookup.rst | 30 ++-
 1 file changed, 8 insertions(+), 22 deletions(-)

diff --git a/Documentation/filesystems/path-lookup.rst 
b/Documentation/filesystems/path-lookup.rst
index e6b6c43ff0f6..8ab95dd9046e 100644
--- a/Documentation/filesystems/path-lookup.rst
+++ b/Documentation/filesystems/path-lookup.rst
@@ -1066,34 +1066,20 @@ method. This is called both in RCU-walk and REF-walk. 
In RCU-walk the
 RCU-walk.  Much like the ``i_op->permission()`` method we
 looked at previously, ``->get_link()`` would need to be careful that
 all the data structures it references are safe to be accessed while
-holding no counted reference, only the RCU lock.  Though getting a
-reference with ``->follow_link()`` is not yet done in RCU-walk mode, the
-code is ready to release the reference when that does happen.
-
-This need to drop the reference to a symlink adds significant
-complexity.  It requires a reference to the inode so that the
-``i_op->put_link()`` inode operation can be called.  In REF-walk, that
-reference is kept implicitly through a reference to the dentry, so
-keeping the ``struct path`` of the symlink is easiest.  For RCU-walk,
-the pointer to the inode is kept separately.  To allow switching from
-RCU-walk back to REF-walk in the middle of processing nested symlinks
-we also need the seq number for the dentry so we can confirm that
-switching back was safe.
-
-Finally, when providing a reference to a symlink, the filesystem also
-provides an opaque "cookie" that must be passed to ``->put_link()`` so that it
-knows what to free.  This might be the allocated memory area, or a
-pointer to the ``struct page`` in the page cache, or something else
-completely.  Only the filesystem knows what it is.
+holding no counted reference, only the RCU lock. A callback
+``struct delayed_called`` will be passed to get_link,
+file systems can set their own put_link function and argument through
+``set_delayed_call``. Later on, when vfs wants to put link, it will call
+``do_delayed_call`` to invoke that callback function with the argument.
 
 In order for the reference to each symlink to be dropped when the walk 
completes,
 whether in RCU-walk or REF-walk, the symlink stack needs to contain,
 along with the path remnants:
 
-- the ``struct path`` to provide a reference to the inode in REF-walk
-- the ``struct inode *`` to provide a reference to the inode in RCU-walk
+- the ``struct path`` to provide a reference to the previous path
+- the ``const char *`` to provide a reference to the to previous name
 - the ``seq`` to allow the path to be safely switched from RCU-walk to REF-walk
-- the ``cookie`` that tells ``->put_path()`` what to put.
+- the ``struct delayed_call`` for later invocation.
 
 This means that each entry in the symlink stack needs to hold five
 pointers and an integer instead of just one pointer (the path
-- 
2.30.2



[PATCH v2 05/12] docs: path-lookup: remove filename_mountpoint

2021-03-15 Thread Fox Chen
No filename_mountpoint any more
see commit: commit 161aff1d93ab ("LOOKUP_MOUNTPOINT:
fold path_mountpointat() into path_lookupat()")

Without filename_mountpoint and path_mountpoint(), the
numbers should be four & three:

"These four correspond roughly to the three path_*() functions"

Signed-off-by: Fox Chen 
---
 Documentation/filesystems/path-lookup.rst | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/Documentation/filesystems/path-lookup.rst 
b/Documentation/filesystems/path-lookup.rst
index a65cb477d524..66697db74955 100644
--- a/Documentation/filesystems/path-lookup.rst
+++ b/Documentation/filesystems/path-lookup.rst
@@ -652,9 +652,9 @@ restarts from the top with REF-walk.
 
 This pattern of "try RCU-walk, if that fails try REF-walk" can be
 clearly seen in functions like ``filename_lookup()``,
-``filename_parentat()``, ``filename_mountpoint()``,
-``do_filp_open()``, and ``do_file_open_root()``.  These five
-correspond roughly to the four ``path_*()`` functions we met earlier,
+``filename_parentat()``,
+``do_filp_open()``, and ``do_file_open_root()``.  These four
+correspond roughly to the three ``path_*()`` functions we met earlier,
 each of which calls ``link_path_walk()``.  The ``path_*()`` functions are
 called using different mode flags until a mode is found which works.
 They are first called with ``LOOKUP_RCU`` set to request "RCU-walk".  If
-- 
2.30.2



[PATCH v2 04/12] docs: path-lookup: update do_last() part

2021-03-15 Thread Fox Chen
traling_symlink() was merged into lookup_last, do_last().

do_last() has later been split into open_last_lookups()
and do_open().

see related commit: commit c5971b8c6354 ("take post-lookup
part of do_last() out of loop")

Signed-off-by: Fox Chen 
---
 Documentation/filesystems/path-lookup.rst | 35 ---
 1 file changed, 18 insertions(+), 17 deletions(-)

diff --git a/Documentation/filesystems/path-lookup.rst 
b/Documentation/filesystems/path-lookup.rst
index b6a301b78121..a65cb477d524 100644
--- a/Documentation/filesystems/path-lookup.rst
+++ b/Documentation/filesystems/path-lookup.rst
@@ -495,11 +495,11 @@ This is important when unmounting a filesystem that is 
inaccessible, such as
 one provided by a dead NFS server.
 
 Finally ``path_openat()`` is used for the ``open()`` system call; it
-contains, in support functions starting with "``do_last()``", all the
+contains, in support functions starting with "``open_last_lookups()``", all the
 complexity needed to handle the different subtleties of O_CREAT (with
 or without O_EXCL), final "``/``" characters, and trailing symbolic
 links.  We will revisit this in the final part of this series, which
-focuses on those symbolic links.  "``do_last()``" will sometimes, but
+focuses on those symbolic links.  "``open_last_lookups()``" will sometimes, but
 not always, take ``i_rwsem``, depending on what it finds.
 
 Each of these, or the functions which call them, need to be alert to
@@ -1199,26 +1199,27 @@ symlink.
 This case is handled by the relevant caller of ``link_path_walk()``, such as
 ``path_lookupat()`` using a loop that calls ``link_path_walk()``, and then
 handles the final component.  If the final component is a symlink
-that needs to be followed, then ``trailing_symlink()`` is called to set
-things up properly and the loop repeats, calling ``link_path_walk()``
-again.  This could loop as many as 40 times if the last component of
-each symlink is another symlink.
+that needs to be followed, then ``open_last_lookups()`` is
+called to set things up properly and the loop repeats, calling
+``link_path_walk()`` again.  This could loop as many as 40 times if the last
+component of each symlink is another symlink.
 
 The various functions that examine the final component and possibly
-report that it is a symlink are ``lookup_last()``, ``mountpoint_last()``
-and ``do_last()``, each of which use the same convention as
-``walk_component()`` of returning ``1`` if a symlink was found that needs
-to be followed.
+report that it is a symlink are ``lookup_last()``, ``open_last_lookups()``
+, each of which use the same convention as
+``walk_component()`` of returning ``char *name`` if a symlink was found that
+needs to be followed.
 
-Of these, ``do_last()`` is the most interesting as it is used for
-opening a file.  Part of ``do_last()`` runs with ``i_rwsem`` held and this
-part is in a separate function: ``lookup_open()``.
+Of these, ``open_last_lookups()`` is the most interesting as it works in tandem
+with ``do_open()`` for opening a file.  Part of ``open_last_lookups()`` runs
+with ``i_rwsem`` held and this part is in a separate function: 
``lookup_open()``.
 
-Explaining ``do_last()`` completely is beyond the scope of this article,
-but a few highlights should help those interested in exploring the
-code.
+Explaining ``open_last_lookups()`` and ``do_open()`` completely is beyond the 
scope
+of this article, but a few highlights should help those interested in exploring
+the code.
 
-1. Rather than just finding the target file, ``do_last()`` needs to open
+1. Rather than just finding the target file, ``do_open()`` is used after
+   ``open_last_lookup()`` to open
it.  If the file was found in the dcache, then ``vfs_open()`` is used for
this.  If not, then ``lookup_open()`` will either call ``atomic_open()`` (if
the filesystem provides it) to combine the final lookup with the open, or
-- 
2.30.2



[PATCH v2 02/12] docs: path-lookup: update path_to_nameidata() part

2021-03-15 Thread Fox Chen
No path_to_namei() anymore, step_into() will be called.
Related commit: commit c99687a03a78 ("fold path_to_nameidata()
into its only remaining caller")

Signed-off-by: Fox Chen 
---
 Documentation/filesystems/path-lookup.rst | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/Documentation/filesystems/path-lookup.rst 
b/Documentation/filesystems/path-lookup.rst
index d07766375e13..a29d714431a3 100644
--- a/Documentation/filesystems/path-lookup.rst
+++ b/Documentation/filesystems/path-lookup.rst
@@ -455,9 +455,10 @@ directly from walk_component() or from handle_dots().  It 
calls
 ``handle_mount()``, to check and handle mount points, in which a new
 ``struct path`` containing a counted reference to the new dentry and a
 reference to the new ``vfsmount`` which is only counted if it is
-different from the previous ``vfsmount``.  It then calls
-``path_to_nameidata()`` to install the new ``struct path`` in the
-``struct nameidata`` and drop the unneeded references.
+different from the previous ``vfsmount`` will be created. Then if there is
+symbolic link, ``step_into()`` calls ``pick_link()`` to deal with it, otherwise
+installs the new ``struct path`` in the ``struct nameidata`` and drop the
+unneeded references.
 
 This "hand-over-hand" sequencing of getting a reference to the new
 dentry before dropping the reference to the previous dentry may
-- 
2.30.2



[PATCH v2 01/12] docs: path-lookup: update follow_managed() part

2021-03-15 Thread Fox Chen
No follow_managed() anymore, handle_mounts(),
traverse_mounts(), will do the job.
see commit 9deed3ebca24 ("new helper: traverse_mounts()")

Signed-off-by: Fox Chen 
---
 Documentation/filesystems/path-lookup.rst | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/Documentation/filesystems/path-lookup.rst 
b/Documentation/filesystems/path-lookup.rst
index c482e1619e77..d07766375e13 100644
--- a/Documentation/filesystems/path-lookup.rst
+++ b/Documentation/filesystems/path-lookup.rst
@@ -448,10 +448,11 @@ described.  If it finds a ``LAST_NORM`` component it 
first calls
 filesystem to revalidate the result if it is that sort of filesystem.
 If that doesn't get a good result, it calls "``lookup_slow()``" which
 takes ``i_rwsem``, rechecks the cache, and then asks the filesystem
-to find a definitive answer.  Each of these will call
-``follow_managed()`` (as described below) to handle any mount points.
+to find a definitive answer.
 
-In the absence of symbolic links, ``walk_component()`` creates a new
+As the last step of ``walk_component()``, ``step_into()`` will be called either
+directly from walk_component() or from handle_dots().  It calls
+``handle_mount()``, to check and handle mount points, in which a new
 ``struct path`` containing a counted reference to the new dentry and a
 reference to the new ``vfsmount`` which is only counted if it is
 different from the previous ``vfsmount``.  It then calls
@@ -535,8 +536,7 @@ covered in greater detail in autofs.txt in the Linux 
documentation
 tree, but a few notes specifically related to path lookup are in order
 here.
 
-The Linux VFS has a concept of "managed" dentries which is reflected
-in function names such as "``follow_managed()``".  There are three
+The Linux VFS has a concept of "managed" dentries.  There are three
 potentially interesting things about these dentries corresponding
 to three different flags that might be set in ``dentry->d_flags``:
 
-- 
2.30.2



[PATCH v2 03/12] docs: path-lookup: update path_mountpoint() part

2021-03-15 Thread Fox Chen
path_mountpoint() doesn't exist anymore. Have been folded
into path_lookup_at when flag is set with LOOKUP_MOUNTPOINT.
Check commit: commit 161aff1d93abf0e ("LOOKUP_MOUNTPOINT: fold
path_mountpointat() into path_lookupat()")

Signed-off-by: Fox Chen 
---
 Documentation/filesystems/path-lookup.rst | 12 +---
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/Documentation/filesystems/path-lookup.rst 
b/Documentation/filesystems/path-lookup.rst
index a29d714431a3..b6a301b78121 100644
--- a/Documentation/filesystems/path-lookup.rst
+++ b/Documentation/filesystems/path-lookup.rst
@@ -472,7 +472,7 @@ Handling the final component
 ``nd->last_type`` to refer to the final component of the path.  It does
 not call ``walk_component()`` that last time.  Handling that final
 component remains for the caller to sort out. Those callers are
-``path_lookupat()``, ``path_parentat()``, ``path_mountpoint()`` and
+``path_lookupat()``, ``path_parentat()`` and
 ``path_openat()`` each of which handles the differing requirements of
 different system calls.
 
@@ -488,12 +488,10 @@ perform their operation.
 object is wanted such as by ``stat()`` or ``chmod()``.  It essentially just
 calls ``walk_component()`` on the final component through a call to
 ``lookup_last()``.  ``path_lookupat()`` returns just the final dentry.
-
-``path_mountpoint()`` handles the special case of unmounting which must
-not try to revalidate the mounted filesystem.  It effectively
-contains, through a call to ``mountpoint_last()``, an alternate
-implementation of ``lookup_slow()`` which skips that step.  This is
-important when unmounting a filesystem that is inaccessible, such as
+It is worth noting that when flag ``LOOKUP_MOUNTPOINT`` is set,
+``path_lookupat()`` will unset LOOKUP_JUMPED in nameidata so that in the 
further
+path traversal ``d_weak_revalidate()`` won't be called.
+This is important when unmounting a filesystem that is inaccessible, such as
 one provided by a dead NFS server.
 
 Finally ``path_openat()`` is used for the ``open()`` system call; it
-- 
2.30.2



[PATCH v2 00/12] docs: path-lookup: Update pathlookup docs

2021-03-15 Thread Fox Chen
The Path lookup is a very complex subject in VFS. The path-lookup
document provides a very detailed guidance to help people understand
how path lookup works in the kernel. This document was originally
written based on three lwn articles five years ago. As times goes by,
some of the content is outdated. This patchset is intended to update
the document to make it more relevant to current codebase.


---
v1: https://lore.kernel.org/lkml/20210126072443.33066-1-foxhlc...@gmail.com/
v2:
 - Fix problems in v1 reviewed by Neil:
   1. In Patch 01 and 02 rewrite a new paragrah to describe step_into()
   2. In Patch 01 instead of changing it to traverse_mounts, remove 
follow_managed()
   3. In Patch 03 re-telling the story rather than adding notes
   4. In Patch 04 do_open() should be outside of loop, fix it and fix other 
problems
  in following paragrah
   5. In Patch 07 use "drop out of RCU-walk"
   6. In Patch 08 "latter" should be "later", fix it and restructure the next 
paragrah
  removing "Finally"

To help review, I've put a compiled html version here: 
http://linux-docs.54fox.com/linux_docs/filesystems/path-lookup-v2.html


Fox Chen (12):
  docs: path-lookup: update follow_managed() part
  docs: path-lookup: update path_to_nameidata() part
  docs: path-lookup: update path_mountpoint() part
  docs: path-lookup: update do_last() part
  docs: path-lookup: remove filename_mountpoint
  docs: path-lookup: Add macro name to symlink limit description
  docs: path-lookup: i_op->follow_link replaced with i_op->get_link
  docs: path-lookup: update i_op->put_link and cookie description
  docs: path-lookup: no get_link()
  docs: path-lookup: update WALK_GET, WALK_PUT desc
  docs: path-lookup: update get_link() ->follow_link description
  docs: path-lookup: update symlink description

 Documentation/filesystems/path-lookup.rst | 164 ++
 1 file changed, 71 insertions(+), 93 deletions(-)

-- 
2.30.2



Re: [PATCH 00/12] docs: path-lookup: Update pathlookup docs

2021-01-28 Thread Fox Chen
On Thu, Jan 28, 2021 at 11:58 AM NeilBrown  wrote:
>
> On Tue, Jan 26 2021, Fox Chen wrote:
>
> > The Path lookup is a very complex subject in VFS. The path-lookup
> > document provides a very detailed guidance to help people understand
> > how path lookup works in the kernel.This document was originally
> > written based on three lwn articles five years ago. As times goes by,
> > some of the content was outdated. This patchset is intended to update
> > the document to make it more relevant to current codebase.
> >
> >
> > Fox Chen (12):
> >   docs: path-lookup: update follow_managed() part
> >   docs: path-lookup: update path_to_nameidata() parth
> >   docs: path-lookup: update path_mountpoint() part
> >   docs: path-lookup: update do_last() part
> >   docs: path-lookup: remove filename_mountpoint
> >   docs: path-lookup: Add macro name to symlink limit description
> >   docs: path-lookup: i_op->follow_link replaced with i_op->get_link
> >   docs: path-lookup: update i_op->put_link and cookie description
> >   docs: path-lookup: no get_link()
> >   docs: path-lookup: update WALK_GET, WALK_PUT desc
> >   docs: path-lookup: update get_link() ->follow_link description
> >   docs: path-lookup: update symlink description
> >
>
> Thanks for doing this.  I've responded individually to several of the
> patches.  As you can see from my comments, there is often more to it
> than just changing function names.  In some places you have capture the
> more general nature of the change fairly well.  In other places the
> result is incoherent or confusion.
> Making small updates to this sort of documentation is not easy.  You
> need to step have and see the "big picture", to overall story-arc.
> Sometimes you can fit changes into that arc, sometimes you might need to
> restructure or re-tell the story.
>
> This is part of why I haven't put much effort into the document -
> re-telling a story can be a lot of work.
>

Thanks for reviewing my patchset.  Yeah, sometimes it's difficult to
fit in the context and match the writing style. Most part of the
article is still valid, adding annotations may be better than rewrite.

Let me revise it and send in the v2, though It may take a while.


thanks,
fox

> NeilBrown


Re: [PATCH 00/12] docs: path-lookup: Update pathlookup docs

2021-01-27 Thread Fox Chen
On Wed, Jan 27, 2021 at 4:31 AM Jonathan Corbet  wrote:
>
> On Tue, 26 Jan 2021 15:24:31 +0800
> Fox Chen  wrote:
>
> > The Path lookup is a very complex subject in VFS. The path-lookup
> > document provides a very detailed guidance to help people understand
> > how path lookup works in the kernel.This document was originally
> > written based on three lwn articles five years ago. As times goes by,
> > some of the content was outdated. This patchset is intended to update
> > the document to make it more relevant to current codebase.
> >
> >
> > Fox Chen (12):
> >   docs: path-lookup: update follow_managed() part
> >   docs: path-lookup: update path_to_nameidata() parth
> >   docs: path-lookup: update path_mountpoint() part
> >   docs: path-lookup: update do_last() part
> >   docs: path-lookup: remove filename_mountpoint
> >   docs: path-lookup: Add macro name to symlink limit description
> >   docs: path-lookup: i_op->follow_link replaced with i_op->get_link
> >   docs: path-lookup: update i_op->put_link and cookie description
> >   docs: path-lookup: no get_link()
> >   docs: path-lookup: update WALK_GET, WALK_PUT desc
> >   docs: path-lookup: update get_link() ->follow_link description
> >   docs: path-lookup: update symlink description
> >
> >  Documentation/filesystems/path-lookup.rst | 146 ++
> >  1 file changed, 63 insertions(+), 83 deletions(-)
>
> Neil Brown (copied) is the original author of this document; I'd really
> like his feedback on these changes.  Neil, the full set is at:
>
> 
> https://lore.kernel.org/lkml/20210126072443.33066-1-foxhlc...@gmail.com/
>
> Thanks,
>
> jon


FYI, To help review the patches I marked & annotated all the places
I've changed.

you can check the link:
https://hyp.is/go?url=https%3A%2F%2Fwww.kernel.org%2Fdoc%2Fhtml%2Flatest%2Ffilesystems%2Fpath-lookup.html=__world__


thanks,
fox


Re: [PATCH 01/12] docs: path-lookup: update follow_managed() part

2021-01-26 Thread Fox Chen
On Tue, Jan 26, 2021 at 10:03 PM Greg KH  wrote:
>
> On Tue, Jan 26, 2021 at 03:24:32PM +0800, Fox Chen wrote:
> > No follow_managed() anymore, handle_mounts(),
> > traverse_mounts(), will do the job.
> > see commit: 9deed3ebca244663530782631834e706a86a8c8f
>
> When referencing commits in changelogs, please use the documented way,
> which for this one would be 9deed3ebca24 ("new helper:
> traverse_mounts()").  That enables us to read and understand them
> better.
>
> thanks,
>
> greg k-h

Got it, I will wait to see any other things that need to be changed,
then send in v2 together.


thanks,
fox


[PATCH 10/12] docs: path-lookup: update WALK_GET, WALK_PUT desc

2021-01-26 Thread Fox Chen
WALK_GET is changed to WALK_TRAILING with a different meaning.
Here it should be WALK_NOFOLLOW. WALK_PUT dosn't exist, we have
WALK_MORE.

WALK_PUT == !WALK_MORE

And there is not should_follow_link().

related commits:
8c4efe22e7c4de1d44f624120a979e31e3a584b8
1c4ff1a87e46a06fc00a83da2fbbc3ac0298f221

Signed-off-by: Fox Chen 
---
 Documentation/filesystems/path-lookup.rst | 12 +---
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/Documentation/filesystems/path-lookup.rst 
b/Documentation/filesystems/path-lookup.rst
index 915c9ffe22c1..921779a4636f 100644
--- a/Documentation/filesystems/path-lookup.rst
+++ b/Documentation/filesystems/path-lookup.rst
@@ -1126,13 +1126,11 @@ stack in ``walk_component()`` immediately when the 
symlink is found;
 old symlink as it walks that last component.  So it is quite
 convenient for ``walk_component()`` to release the old symlink and pop
 the references just before pushing the reference information for the
-new symlink.  It is guided in this by two flags; ``WALK_GET``, which
-gives it permission to follow a symlink if it finds one, and
-``WALK_PUT``, which tells it to release the current symlink after it has been
-followed.  ``WALK_PUT`` is tested first, leading to a call to
-``put_link()``.  ``WALK_GET`` is tested subsequently (by
-``should_follow_link()``) leading to a call to ``pick_link()`` which sets
-up the stack frame.
+new symlink.  It is guided in this by two flags; ``WALK_NOFOLLOW``, which
+suggests whether to follow a symlink if it finds one, and
+``WALK_MORE``, which tells whether to release the current symlink after it has
+been followed.  ``WALK_MORE`` is tested first, leading to a call to
+``put_link()``.
 
 Symlinks with no final component
 
-- 
2.30.0



[PATCH 08/12] docs: path-lookup: update i_op->put_link and cookie description

2021-01-26 Thread Fox Chen
No inode->put_link operation anymore. We use delayed_call to
deal with link destruction. Cookie has been replaced with
struct delayed_call.

Related commit: fceef393a538134f03b778c5d2519e670269342f

Signed-off-by: Fox Chen 
---
 Documentation/filesystems/path-lookup.rst | 31 +++
 1 file changed, 9 insertions(+), 22 deletions(-)

diff --git a/Documentation/filesystems/path-lookup.rst 
b/Documentation/filesystems/path-lookup.rst
index 0a362849b26f..8572300b5405 100644
--- a/Documentation/filesystems/path-lookup.rst
+++ b/Documentation/filesystems/path-lookup.rst
@@ -1068,34 +1068,21 @@ method. This is called both in RCU-walk and REF-walk. 
In RCU-walk the
 RCU-walk.  Much like the ``i_op->permission()`` method we
 looked at previously, ``->get_link()`` would need to be careful that
 all the data structures it references are safe to be accessed while
-holding no counted reference, only the RCU lock.  Though getting a
-reference with ``->follow_link()`` is not yet done in RCU-walk mode, the
-code is ready to release the reference when that does happen.
-
-This need to drop the reference to a symlink adds significant
-complexity.  It requires a reference to the inode so that the
-``i_op->put_link()`` inode operation can be called.  In REF-walk, that
-reference is kept implicitly through a reference to the dentry, so
-keeping the ``struct path`` of the symlink is easiest.  For RCU-walk,
-the pointer to the inode is kept separately.  To allow switching from
-RCU-walk back to REF-walk in the middle of processing nested symlinks
-we also need the seq number for the dentry so we can confirm that
-switching back was safe.
-
-Finally, when providing a reference to a symlink, the filesystem also
-provides an opaque "cookie" that must be passed to ``->put_link()`` so that it
-knows what to free.  This might be the allocated memory area, or a
-pointer to the ``struct page`` in the page cache, or something else
-completely.  Only the filesystem knows what it is.
+holding no counted reference, only the RCU lock.
+
+Finally, a callback ``struct delayed_called`` will be passed to get_link,
+file systems can set their own put_link function and argument through
+``set_delayed_call``. Latter on, when vfs wants to put link, it will call 
+``do_delayed_call`` to invoke that callback function with the argument.
 
 In order for the reference to each symlink to be dropped when the walk 
completes,
 whether in RCU-walk or REF-walk, the symlink stack needs to contain,
 along with the path remnants:
 
-- the ``struct path`` to provide a reference to the inode in REF-walk
-- the ``struct inode *`` to provide a reference to the inode in RCU-walk
+- the ``struct path`` to provide a reference to the previous path
+- the ``const char *`` to provide a reference to the to previous name
 - the ``seq`` to allow the path to be safely switched from RCU-walk to REF-walk
-- the ``cookie`` that tells ``->put_path()`` what to put.
+- the ``struct delayed_call`` for later invocation.
 
 This means that each entry in the symlink stack needs to hold five
 pointers and an integer instead of just one pointer (the path
-- 
2.30.0



[PATCH 09/12] docs: path-lookup: no get_link()

2021-01-26 Thread Fox Chen
no get_link() anymore. we have step_into() and pick_link().

walk_component() will call step_into(), in turn call pick_link,
and return symlink name.

Signed-off-by: Fox Chen 
---
 Documentation/filesystems/path-lookup.rst | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/Documentation/filesystems/path-lookup.rst 
b/Documentation/filesystems/path-lookup.rst
index 8572300b5405..915c9ffe22c1 100644
--- a/Documentation/filesystems/path-lookup.rst
+++ b/Documentation/filesystems/path-lookup.rst
@@ -1106,12 +1106,10 @@ doesn't need to notice.  Getting this ``name`` variable 
on and off the
 stack is very straightforward; pushing and popping the references is
 a little more complex.
 
-When a symlink is found, ``walk_component()`` returns the value ``1``
-(``0`` is returned for any other sort of success, and a negative number
-is, as usual, an error indicator).  This causes ``get_link()`` to be
-called; it then gets the link from the filesystem.  Providing that
-operation is successful, the old path ``name`` is placed on the stack,
-and the new value is used as the ``name`` for a while.  When the end of
+When a symlink is found, ``walk_component()`` calls ``pick_link()``,
+it then gets the link from the filesystem returning new path ``name``.
+Providing that operation is successful, the old path ``name`` is placed on the
+stack, and the new value is used as the ``name`` for a while.  When the end of
 the path is found (i.e. ``*name`` is ``'\0'``) the old ``name`` is restored
 off the stack and path walking continues.
 
-- 
2.30.0



Re: [PATCH 00/12] docs: path-lookup: Update pathlookup docs

2021-01-26 Thread Fox Chen
On Tue, Jan 26, 2021 at 3:25 PM Fox Chen  wrote:
>
> The Path lookup is a very complex subject in VFS. The path-lookup
> document provides a very detailed guidance to help people understand
> how path lookup works in the kernel.This document was originally
> written based on three lwn articles five years ago. As times goes by,
> some of the content was outdated. This patchset is intended to update
> the document to make it more relevant to current codebase.
>
>
> Fox Chen (12):
>   docs: path-lookup: update follow_managed() part
>   docs: path-lookup: update path_to_nameidata() parth
>   docs: path-lookup: update path_mountpoint() part
>   docs: path-lookup: update do_last() part
>   docs: path-lookup: remove filename_mountpoint
>   docs: path-lookup: Add macro name to symlink limit description
>   docs: path-lookup: i_op->follow_link replaced with i_op->get_link
>   docs: path-lookup: update i_op->put_link and cookie description
>   docs: path-lookup: no get_link()
>   docs: path-lookup: update WALK_GET, WALK_PUT desc
>   docs: path-lookup: update get_link() ->follow_link description
>   docs: path-lookup: update symlink description
>
>  Documentation/filesystems/path-lookup.rst | 146 ++
>  1 file changed, 63 insertions(+), 83 deletions(-)
>
> --
> 2.30.0
>


To help review the patches, I annotated and highlighted changes using hypothesis

you can check the current docs:
https://hyp.is/go?url=https%3A%2F%2Fwww.kernel.org%2Fdoc%2Fhtml%2Flatest%2Ffilesystems%2Fpath-lookup.html=__world__

the docs after patches applied:
https://hyp.is/go?url=http%3A%2F%2Flinux-docs.54fox.com%2Flinux_docs%2Ffilesystems%2Fpath-lookup.html=__world__


[PATCH 07/12] docs: path-lookup: i_op->follow_link replaced with i_op->get_link

2021-01-26 Thread Fox Chen
follow_link has been replaced by get_link() which can be
called in RCU mode.

see commit: 6b2553918d8b4e6de9853fd6315bec7271a2e592

Signed-off-by: Fox Chen 
---
 Documentation/filesystems/path-lookup.rst | 12 +---
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/Documentation/filesystems/path-lookup.rst 
b/Documentation/filesystems/path-lookup.rst
index 25d2a5a59f45..0a362849b26f 100644
--- a/Documentation/filesystems/path-lookup.rst
+++ b/Documentation/filesystems/path-lookup.rst
@@ -1062,13 +1062,11 @@ filesystem cannot successfully get a reference in 
RCU-walk mode, it
 must return ``-ECHILD`` and ``unlazy_walk()`` will be called to return to
 REF-walk mode in which the filesystem is allowed to sleep.
 
-The place for all this to happen is the ``i_op->follow_link()`` inode
-method.  In the present mainline code this is never actually called in
-RCU-walk mode as the rewrite is not quite complete.  It is likely that
-in a future release this method will be passed an ``inode`` pointer when
-called in RCU-walk mode so it both (1) knows to be careful, and (2) has the
-validated pointer.  Much like the ``i_op->permission()`` method we
-looked at previously, ``->follow_link()`` would need to be careful that
+The place for all this to happen is the ``i_op->get_link()`` inode
+method. This is called both in RCU-walk and REF-walk. In RCU-walk the
+``dentry*`` argument is NULL, ``->get_link()`` can return -ECHILD to drop
+RCU-walk.  Much like the ``i_op->permission()`` method we
+looked at previously, ``->get_link()`` would need to be careful that
 all the data structures it references are safe to be accessed while
 holding no counted reference, only the RCU lock.  Though getting a
 reference with ``->follow_link()`` is not yet done in RCU-walk mode, the
-- 
2.30.0



[PATCH 12/12] docs: path-lookup: update symlink description

2021-01-26 Thread Fox Chen
instead of lookup_real()/vfs_create(), i_op->lookup() and
i_op->create() will be called directly.

update vfs_open() logic

should_follow_link is merged into lookup_last() or open_last_lookup()
which returns symlink name instead of an integer.

Signed-off-by: Fox Chen 
---
 Documentation/filesystems/path-lookup.rst | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/Documentation/filesystems/path-lookup.rst 
b/Documentation/filesystems/path-lookup.rst
index 2bb3ca486acd..0c6fc296056c 100644
--- a/Documentation/filesystems/path-lookup.rst
+++ b/Documentation/filesystems/path-lookup.rst
@@ -1204,16 +1204,15 @@ the code.
it.  If the file was found in the dcache, then ``vfs_open()`` is used for
this.  If not, then ``lookup_open()`` will either call ``atomic_open()`` (if
the filesystem provides it) to combine the final lookup with the open, or
-   will perform the separate ``lookup_real()`` and ``vfs_create()`` steps
+   will perform the separate ``i_op->lookup()`` and ``i_op->create()`` steps
directly.  In the later case the actual "open" of this newly found or
created file will be performed by ``vfs_open()``, just as if the name
were found in the dcache.
 
 2. ``vfs_open()`` can fail with ``-EOPENSTALE`` if the cached information
-   wasn't quite current enough.  Rather than restarting the lookup from
-   the top with ``LOOKUP_REVAL`` set, ``lookup_open()`` is called instead,
-   giving the filesystem a chance to resolve small inconsistencies.
-   If that doesn't work, only then is the lookup restarted from the top.
+   wasn't quite current enough.  If it's in RCU-walk -ECHILD will be returned
+   otherwise will return -ESTALE.  When -ESTALE is returned, the caller may
+   retry with LOOKUP_REVAL flag set.
 
 3. An open with O_CREAT **does** follow a symlink in the final component,
unlike other creation system calls (like ``mkdir``).  So the sequence::
@@ -1223,8 +1222,8 @@ the code.
 
will create a file called ``/tmp/bar``.  This is not permitted if
``O_EXCL`` is set but otherwise is handled for an O_CREAT open much
-   like for a non-creating open: ``should_follow_link()`` returns ``1``, and
-   so does ``do_last()`` so that ``trailing_symlink()`` gets called and the
+   like for a non-creating open: ``lookup_last()`` or ``open_last_lookup()``
+   returns a non ``Null`` value, and ``link_path_walk()`` gets called and the
open process continues on the symlink that was found.
 
 Updating the access time
-- 
2.30.0



[PATCH 11/12] docs: path-lookup: update get_link() ->follow_link description

2021-01-26 Thread Fox Chen
get_link() is merged into pick_link(). i_op->follow_link is
replaced with i_op->get_link(). get_link() can return ERR_PTR(0)
which equals NULL.

Signed-off-by: Fox Chen 
---
 Documentation/filesystems/path-lookup.rst | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/Documentation/filesystems/path-lookup.rst 
b/Documentation/filesystems/path-lookup.rst
index 921779a4636f..2bb3ca486acd 100644
--- a/Documentation/filesystems/path-lookup.rst
+++ b/Documentation/filesystems/path-lookup.rst
@@ -1137,10 +1137,10 @@ Symlinks with no final component
 
 A pair of special-case symlinks deserve a little further explanation.
 Both result in a new ``struct path`` (with mount and dentry) being set
-up in the ``nameidata``, and result in ``get_link()`` returning ``NULL``.
+up in the ``nameidata``, and result in ``pick_link()`` returning ``NULL``.
 
 The more obvious case is a symlink to "``/``".  All symlinks starting
-with "``/``" are detected in ``get_link()`` which resets the ``nameidata``
+with "``/``" are detected in ``pick_link()`` which resets the ``nameidata``
 to point to the effective filesystem root.  If the symlink only
 contains "``/``" then there is nothing more to do, no components at all,
 so ``NULL`` is returned to indicate that the symlink can be released and
@@ -1157,12 +1157,11 @@ something that looks like a symlink.  It is really a 
reference to the
 target file, not just the name of it.  When you ``readlink`` these
 objects you get a name that might refer to the same file - unless it
 has been unlinked or mounted over.  When ``walk_component()`` follows
-one of these, the ``->follow_link()`` method in "procfs" doesn't return
+one of these, the ``->get_link()`` method in "procfs" doesn't return
 a string name, but instead calls ``nd_jump_link()`` which updates the
-``nameidata`` in place to point to that target.  ``->follow_link()`` then
-returns ``NULL``.  Again there is no final component and ``get_link()``
-reports this by leaving the ``last_type`` field of ``nameidata`` as
-``LAST_BIND``.
+``nameidata`` in place to point to that target.  ``->get_link()`` then
+returns ``0``.  Again there is no final component and ``pick_link()``
+returns NULL.
 
 Following the symlink in the final component
 
-- 
2.30.0



[PATCH 04/12] docs: path-lookup: update do_last() part

2021-01-26 Thread Fox Chen
traling_symlink() was merged into lookup_last, do_last().

do_last() has later been split into open_last_lookups()
and do_open().

see related commit: c5971b8c6354a95c9ee7eb722928af5000bac247

Signed-off-by: Fox Chen 
---
 Documentation/filesystems/path-lookup.rst | 34 +++
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/Documentation/filesystems/path-lookup.rst 
b/Documentation/filesystems/path-lookup.rst
index 4e77c8520fa9..1f05b1417a55 100644
--- a/Documentation/filesystems/path-lookup.rst
+++ b/Documentation/filesystems/path-lookup.rst
@@ -496,11 +496,11 @@ one provided by a dead NFS server.  In the current 
kernel, path_mountpoint
 has been merged into ``path_lookup_at()`` with a new flag LOOKUP_MOUNTPOINT.
 
 Finally ``path_openat()`` is used for the ``open()`` system call; it
-contains, in support functions starting with "``do_last()``", all the
+contains, in support functions starting with "``open_last_lookups()``", all the
 complexity needed to handle the different subtleties of O_CREAT (with
 or without O_EXCL), final "``/``" characters, and trailing symbolic
 links.  We will revisit this in the final part of this series, which
-focuses on those symbolic links.  "``do_last()``" will sometimes, but
+focuses on those symbolic links.  "``open_last_lookups()``" will sometimes, but
 not always, take ``i_rwsem``, depending on what it finds.
 
 Each of these, or the functions which call them, need to be alert to
@@ -1201,26 +1201,26 @@ symlink.
 This case is handled by the relevant caller of ``link_path_walk()``, such as
 ``path_lookupat()`` using a loop that calls ``link_path_walk()``, and then
 handles the final component.  If the final component is a symlink
-that needs to be followed, then ``trailing_symlink()`` is called to set
-things up properly and the loop repeats, calling ``link_path_walk()``
-again.  This could loop as many as 40 times if the last component of
-each symlink is another symlink.
+that needs to be followed, then ``open_last_lookups()`` and ``do_open()`` is
+called to set things up properly and the loop repeats, calling
+``link_path_walk()`` again.  This could loop as many as 40 times if the last
+component of each symlink is another symlink.
 
 The various functions that examine the final component and possibly
-report that it is a symlink are ``lookup_last()``, ``mountpoint_last()``
-and ``do_last()``, each of which use the same convention as
-``walk_component()`` of returning ``1`` if a symlink was found that needs
-to be followed.
+report that it is a symlink are ``lookup_last()``, ``open_last_lookups()``
+, each of which use the same convention as
+``walk_component()`` of returning ``char *name`` if a symlink was found that
+needs to be followed.
 
-Of these, ``do_last()`` is the most interesting as it is used for
-opening a file.  Part of ``do_last()`` runs with ``i_rwsem`` held and this
-part is in a separate function: ``lookup_open()``.
+Of these, ``open_last_lookups()``, ``do_open()`` is the most interesting as it 
is
+used for opening a file.  Part of ``open_last_lookups()`` runs with ``i_rwsem``
+held and this part is in a separate function: ``lookup_open()``.
 
-Explaining ``do_last()`` completely is beyond the scope of this article,
-but a few highlights should help those interested in exploring the
-code.
+Explaining ``open_last_lookups()``, ``do_open()`` completely is beyond the 
scope
+of this article, but a few highlights should help those interested in exploring
+the code.
 
-1. Rather than just finding the target file, ``do_last()`` needs to open
+1. Rather than just finding the target file, ``do_open()`` needs to open
it.  If the file was found in the dcache, then ``vfs_open()`` is used for
this.  If not, then ``lookup_open()`` will either call ``atomic_open()`` (if
the filesystem provides it) to combine the final lookup with the open, or
-- 
2.30.0



[PATCH 06/12] docs: path-lookup: Add macro name to symlink limit description

2021-01-26 Thread Fox Chen
Add macro name MAXSYMLINKS to the symlink limit description, so
that it is consistent with path name length description above.

Signed-off-by: Fox Chen 
---
 Documentation/filesystems/path-lookup.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/filesystems/path-lookup.rst 
b/Documentation/filesystems/path-lookup.rst
index bc450e0864d6..25d2a5a59f45 100644
--- a/Documentation/filesystems/path-lookup.rst
+++ b/Documentation/filesystems/path-lookup.rst
@@ -994,8 +994,8 @@ is 4096.  There are a number of reasons for this limit; not 
letting the
 kernel spend too much time on just one path is one of them.  With
 symbolic links you can effectively generate much longer paths so some
 sort of limit is needed for the same reason.  Linux imposes a limit of
-at most 40 symlinks in any one path lookup.  It previously imposed a
-further limit of eight on the maximum depth of recursion, but that was
+at most 40 (MAXSYMLINKS) symlinks in any one path lookup.  It previously 
imposed
+a further limit of eight on the maximum depth of recursion, but that was
 raised to 40 when a separate stack was implemented, so there is now
 just the one limit.
 
-- 
2.30.0



[PATCH 00/12] docs: path-lookup: Update pathlookup docs

2021-01-26 Thread Fox Chen
The Path lookup is a very complex subject in VFS. The path-lookup
document provides a very detailed guidance to help people understand
how path lookup works in the kernel.This document was originally
written based on three lwn articles five years ago. As times goes by,
some of the content was outdated. This patchset is intended to update
the document to make it more relevant to current codebase.


Fox Chen (12):
  docs: path-lookup: update follow_managed() part
  docs: path-lookup: update path_to_nameidata() parth
  docs: path-lookup: update path_mountpoint() part
  docs: path-lookup: update do_last() part
  docs: path-lookup: remove filename_mountpoint
  docs: path-lookup: Add macro name to symlink limit description
  docs: path-lookup: i_op->follow_link replaced with i_op->get_link
  docs: path-lookup: update i_op->put_link and cookie description
  docs: path-lookup: no get_link()
  docs: path-lookup: update WALK_GET, WALK_PUT desc
  docs: path-lookup: update get_link() ->follow_link description
  docs: path-lookup: update symlink description

 Documentation/filesystems/path-lookup.rst | 146 ++
 1 file changed, 63 insertions(+), 83 deletions(-)

-- 
2.30.0



[PATCH 02/12] docs: path-lookup: update path_to_nameidata() parth

2021-01-26 Thread Fox Chen
No path_to_namei() anymore, step_into() will be called.
Related commit: c99687a03a78775f77d57fe9b07af4c8ec3dd03c

Signed-off-by: Fox Chen 
---
 Documentation/filesystems/path-lookup.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/filesystems/path-lookup.rst 
b/Documentation/filesystems/path-lookup.rst
index e778db767120..2ad96e1e3c49 100644
--- a/Documentation/filesystems/path-lookup.rst
+++ b/Documentation/filesystems/path-lookup.rst
@@ -455,7 +455,7 @@ In the absence of symbolic links, ``walk_component()`` 
creates a new
 ``struct path`` containing a counted reference to the new dentry and a
 reference to the new ``vfsmount`` which is only counted if it is
 different from the previous ``vfsmount``.  It then calls
-``path_to_nameidata()`` to install the new ``struct path`` in the
+``step_into()`` to install the new ``struct path`` in the
 ``struct nameidata`` and drop the unneeded references.
 
 This "hand-over-hand" sequencing of getting a reference to the new
-- 
2.30.0



[PATCH 03/12] docs: path-lookup: update path_mountpoint() part

2021-01-26 Thread Fox Chen
path_mountpoint() doesn't exist anymore. Have been folded
into path_lookup_at when flag is set with LOOKUP_MOUNTPOINT.
check out commit:161aff1d93abf0e5b5e9dbca88928998c155f677

Signed-off-by: Fox Chen 
---
 Documentation/filesystems/path-lookup.rst | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/Documentation/filesystems/path-lookup.rst 
b/Documentation/filesystems/path-lookup.rst
index 2ad96e1e3c49..4e77c8520fa9 100644
--- a/Documentation/filesystems/path-lookup.rst
+++ b/Documentation/filesystems/path-lookup.rst
@@ -492,7 +492,8 @@ not try to revalidate the mounted filesystem.  It 
effectively
 contains, through a call to ``mountpoint_last()``, an alternate
 implementation of ``lookup_slow()`` which skips that step.  This is
 important when unmounting a filesystem that is inaccessible, such as
-one provided by a dead NFS server.
+one provided by a dead NFS server.  In the current kernel, path_mountpoint
+has been merged into ``path_lookup_at()`` with a new flag LOOKUP_MOUNTPOINT.
 
 Finally ``path_openat()`` is used for the ``open()`` system call; it
 contains, in support functions starting with "``do_last()``", all the
-- 
2.30.0



[PATCH 05/12] docs: path-lookup: remove filename_mountpoint

2021-01-26 Thread Fox Chen
No filename_mountpoint any more
see commit: 161aff1d93abf0e5b5e9dbca88928998c155f677

Without filename_mountpoint and path_mountpoint(), the
numbers should be four & three:

"These four correspond roughly to the three path_*() functions"

Signed-off-by: Fox Chen 
---
 Documentation/filesystems/path-lookup.rst | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/Documentation/filesystems/path-lookup.rst 
b/Documentation/filesystems/path-lookup.rst
index 1f05b1417a55..bc450e0864d6 100644
--- a/Documentation/filesystems/path-lookup.rst
+++ b/Documentation/filesystems/path-lookup.rst
@@ -654,9 +654,9 @@ restarts from the top with REF-walk.
 
 This pattern of "try RCU-walk, if that fails try REF-walk" can be
 clearly seen in functions like ``filename_lookup()``,
-``filename_parentat()``, ``filename_mountpoint()``,
-``do_filp_open()``, and ``do_file_open_root()``.  These five
-correspond roughly to the four ``path_*()`` functions we met earlier,
+``filename_parentat()``,
+``do_filp_open()``, and ``do_file_open_root()``.  These four
+correspond roughly to the three ``path_*()`` functions we met earlier,
 each of which calls ``link_path_walk()``.  The ``path_*()`` functions are
 called using different mode flags until a mode is found which works.
 They are first called with ``LOOKUP_RCU`` set to request "RCU-walk".  If
-- 
2.30.0



[PATCH 01/12] docs: path-lookup: update follow_managed() part

2021-01-26 Thread Fox Chen
No follow_managed() anymore, handle_mounts(),
traverse_mounts(), will do the job.
see commit: 9deed3ebca244663530782631834e706a86a8c8f

Signed-off-by: Fox Chen 
---
 Documentation/filesystems/path-lookup.rst | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/Documentation/filesystems/path-lookup.rst 
b/Documentation/filesystems/path-lookup.rst
index c482e1619e77..e778db767120 100644
--- a/Documentation/filesystems/path-lookup.rst
+++ b/Documentation/filesystems/path-lookup.rst
@@ -448,8 +448,8 @@ described.  If it finds a ``LAST_NORM`` component it first 
calls
 filesystem to revalidate the result if it is that sort of filesystem.
 If that doesn't get a good result, it calls "``lookup_slow()``" which
 takes ``i_rwsem``, rechecks the cache, and then asks the filesystem
-to find a definitive answer.  Each of these will call
-``follow_managed()`` (as described below) to handle any mount points.
+to find a definitive answer.  In ``step_into()``, ``handle_mount()`` will be 
+called to handle any mount point.
 
 In the absence of symbolic links, ``walk_component()`` creates a new
 ``struct path`` containing a counted reference to the new dentry and a
@@ -536,7 +536,7 @@ tree, but a few notes specifically related to path lookup 
are in order
 here.
 
 The Linux VFS has a concept of "managed" dentries which is reflected
-in function names such as "``follow_managed()``".  There are three
+in function names such as "``traverse_mounts()``".  There are three
 potentially interesting things about these dentries corresponding
 to three different flags that might be set in ``dentry->d_flags``:
 
-- 
2.30.0



Re: [PATCH v2 0/6] kernfs: proposed locking and concurrency improvement

2020-12-21 Thread Fox Chen
On Sun, Dec 20, 2020 at 7:52 AM Ian Kent  wrote:
>
> On Sat, 2020-12-19 at 11:23 -0500, Tejun Heo wrote:
> > Hello,
> >
> > On Sat, Dec 19, 2020 at 03:08:13PM +0800, Ian Kent wrote:
> > > And looking further I see there's a race that kernfs can't do
> > > anything
> > > about between kernfs_refresh_inode() and fs/inode.c:update_times().
> >
> > Do kernfs files end up calling into that path tho? Doesn't look like
> > it to
> > me but if so yeah we'd need to override the update_time for kernfs.
>
> Sorry, the below was very hastily done and not what I would actually
> propose.
>
> The main point of it was the question
>
> +   /* Which kernfs node attributes should be updated from
> +* time?
> +*/
>
> but looking at it again this morning I think the node iattr fields
> that might need to be updated would be atime, ctime and mtime only,
> maybe not ctime ... not sure.
>
> What do you think?
>
> Also, if kn->attr == NULL it should fall back to what the VFS
> currently does.
>
> The update_times() function is one of the few places where the
> VFS updates the inode times.
>
> The idea is that the reason kernfs needs to overwrite the inode
> attributes is to reset what the VFS might have done but if kernfs
> has this inode operation they won't need to be overwritten since
> they won't have changed.
>
> There may be other places where the attributes (or an attribute)
> are set by the VFS, I haven't finished checking that yet so my
> suggestion might not be entirely valid.
>
> What I need to do is work out what kernfs node attributes, if any,
> should be updated by .update_times(). If I go by what
> kernfs_refresh_inode() does now then that would be none but shouldn't
> atime at least be updated in the node iattr.
>
> > > +static int kernfs_iop_update_time(struct inode *inode, struct
> > > timespec64 *time, int flags)
> > >  {
> > > -   struct inode *inode = d_inode(path->dentry);
> > > struct kernfs_node *kn = inode->i_private;
> > > +   struct kernfs_iattrs *attrs;
> > >
> > > mutex_lock(_mutex);
> > > +   attrs = kernfs_iattrs(kn);
> > > +   if (!attrs) {
> > > +   mutex_unlock(_mutex);
> > > +   return -ENOMEM;
> > > +   }
> > > +
> > > +   /* Which kernfs node attributes should be updated from
> > > +* time?
> > > +*/
> > > +
> > > kernfs_refresh_inode(kn, inode);
> > > mutex_unlock(_mutex);
> >
> > I don't see how this would reflect the changes from kernfs_setattr()
> > into
> > the attached inode. This would actually make the attr updates
> > obviously racy
> > - the userland visible attrs would be stale until the inode gets
> > reclaimed
> > and then when it gets reinstantiated it'd show the latest
> > information.
>
> Right, I will have to think about that, but as I say above this
> isn't really what I would propose.
>
> If .update_times() sticks strictly to what kernfs_refresh_inode()
> does now then it would set the inode attributes from the node iattr
> only.
>
> >
> > That said, if you wanna take the direction where attr updates are
> > reflected
> > to the associated inode when the change occurs, which makes sense,
> > the right
> > thing to do would be making kernfs_setattr() update the associated
> > inode if
> > existent.
>
> Mmm, that's a good point but it looks like the inode isn't available
> there.
>
Is it possible to embed super block somewhere, then we can call
kernfs_get_inode to get inode in kernfs_setattr???


thanks,
fox


Re: [PATCH v2 0/6] kernfs: proposed locking and concurrency improvement

2020-12-18 Thread Fox Chen
On Sat, Dec 19, 2020 at 8:53 AM Ian Kent  wrote:
>
> On Fri, 2020-12-18 at 21:20 +0800, Fox Chen wrote:
> > On Fri, Dec 18, 2020 at 7:21 PM Ian Kent  wrote:
> > > On Fri, 2020-12-18 at 16:01 +0800, Fox Chen wrote:
> > > > On Fri, Dec 18, 2020 at 3:36 PM Ian Kent 
> > > > wrote:
> > > > > On Thu, 2020-12-17 at 10:14 -0500, Tejun Heo wrote:
> > > > > > Hello,
> > > > > >
> > > > > > On Thu, Dec 17, 2020 at 07:48:49PM +0800, Ian Kent wrote:
> > > > > > > > What could be done is to make the kernfs node attr_mutex
> > > > > > > > a pointer and dynamically allocate it but even that is
> > > > > > > > too
> > > > > > > > costly a size addition to the kernfs node structure as
> > > > > > > > Tejun has said.
> > > > > > >
> > > > > > > I guess the question to ask is, is there really a need to
> > > > > > > call kernfs_refresh_inode() from functions that are usually
> > > > > > > reading/checking functions.
> > > > > > >
> > > > > > > Would it be sufficient to refresh the inode in the
> > > > > > > write/set
> > > > > > > operations in (if there's any) places where things like
> > > > > > > setattr_copy() is not already called?
> > > > > > >
> > > > > > > Perhaps GKH or Tejun could comment on this?
> > > > > >
> > > > > > My memory is a bit hazy but invalidations on reads is how
> > > > > > sysfs
> > > > > > namespace is
> > > > > > implemented, so I don't think there's an easy around that.
> > > > > > The
> > > > > > only
> > > > > > thing I
> > > > > > can think of is embedding the lock into attrs and doing xchg
> > > > > > dance
> > > > > > when
> > > > > > attaching it.
> > > > >
> > > > > Sounds like your saying it would be ok to add a lock to the
> > > > > attrs structure, am I correct?
> > > > >
> > > > > Assuming it is then, to keep things simple, use two locks.
> > > > >
> > > > > One global lock for the allocation and an attrs lock for all
> > > > > the
> > > > > attrs field updates including the kernfs_refresh_inode()
> > > > > update.
> > > > >
> > > > > The critical section for the global lock could be reduced and
> > > > > it
> > > > > changed to a spin lock.
> > > > >
> > > > > In __kernfs_iattrs() we would have something like:
> > > > >
> > > > > take the allocation lock
> > > > > do the allocated checks
> > > > >   assign if existing attrs
> > > > >   release the allocation lock
> > > > >   return existing if found
> > > > > othewise
> > > > >   release the allocation lock
> > > > >
> > > > > allocate and initialize attrs
> > > > >
> > > > > take the allocation lock
> > > > > check if someone beat us to it
> > > > >   free and grab exiting attrs
> > > > > otherwise
> > > > >   assign the new attrs
> > > > > release the allocation lock
> > > > > return attrs
> > > > >
> > > > > Add a spinlock to the attrs struct and use it everywhere for
> > > > > field updates.
> > > > >
> > > > > Am I on the right track or can you see problems with this?
> > > > >
> > > > > Ian
> > > > >
> > > >
> > > > umm, we update the inode in kernfs_refresh_inode, right??  So I
> > > > guess
> > > > the problem is how can we protect the inode when
> > > > kernfs_refresh_inode
> > > > is called, not the attrs??
> > >
> > > But the attrs (which is what's copied from) were protected by the
> > > mutex lock (IIUC) so dealing with the inode attributes implies
> > > dealing with the kernfs node attrs too.
> > >
> > > For example in kernfs_iop_setattr() the call to setattr_copy()
> > > copies
> > > the node attrs to the inode under the same mutex lock. So, if a
> > > read
> > > lock is us

Re: [PATCH v2 0/6] kernfs: proposed locking and concurrency improvement

2020-12-18 Thread Fox Chen
On Fri, Dec 18, 2020 at 7:21 PM Ian Kent  wrote:
>
> On Fri, 2020-12-18 at 16:01 +0800, Fox Chen wrote:
> > On Fri, Dec 18, 2020 at 3:36 PM Ian Kent  wrote:
> > > On Thu, 2020-12-17 at 10:14 -0500, Tejun Heo wrote:
> > > > Hello,
> > > >
> > > > On Thu, Dec 17, 2020 at 07:48:49PM +0800, Ian Kent wrote:
> > > > > > What could be done is to make the kernfs node attr_mutex
> > > > > > a pointer and dynamically allocate it but even that is too
> > > > > > costly a size addition to the kernfs node structure as
> > > > > > Tejun has said.
> > > > >
> > > > > I guess the question to ask is, is there really a need to
> > > > > call kernfs_refresh_inode() from functions that are usually
> > > > > reading/checking functions.
> > > > >
> > > > > Would it be sufficient to refresh the inode in the write/set
> > > > > operations in (if there's any) places where things like
> > > > > setattr_copy() is not already called?
> > > > >
> > > > > Perhaps GKH or Tejun could comment on this?
> > > >
> > > > My memory is a bit hazy but invalidations on reads is how sysfs
> > > > namespace is
> > > > implemented, so I don't think there's an easy around that. The
> > > > only
> > > > thing I
> > > > can think of is embedding the lock into attrs and doing xchg
> > > > dance
> > > > when
> > > > attaching it.
> > >
> > > Sounds like your saying it would be ok to add a lock to the
> > > attrs structure, am I correct?
> > >
> > > Assuming it is then, to keep things simple, use two locks.
> > >
> > > One global lock for the allocation and an attrs lock for all the
> > > attrs field updates including the kernfs_refresh_inode() update.
> > >
> > > The critical section for the global lock could be reduced and it
> > > changed to a spin lock.
> > >
> > > In __kernfs_iattrs() we would have something like:
> > >
> > > take the allocation lock
> > > do the allocated checks
> > >   assign if existing attrs
> > >   release the allocation lock
> > >   return existing if found
> > > othewise
> > >   release the allocation lock
> > >
> > > allocate and initialize attrs
> > >
> > > take the allocation lock
> > > check if someone beat us to it
> > >   free and grab exiting attrs
> > > otherwise
> > >   assign the new attrs
> > > release the allocation lock
> > > return attrs
> > >
> > > Add a spinlock to the attrs struct and use it everywhere for
> > > field updates.
> > >
> > > Am I on the right track or can you see problems with this?
> > >
> > > Ian
> > >
> >
> > umm, we update the inode in kernfs_refresh_inode, right??  So I guess
> > the problem is how can we protect the inode when kernfs_refresh_inode
> > is called, not the attrs??
>
> But the attrs (which is what's copied from) were protected by the
> mutex lock (IIUC) so dealing with the inode attributes implies
> dealing with the kernfs node attrs too.
>
> For example in kernfs_iop_setattr() the call to setattr_copy() copies
> the node attrs to the inode under the same mutex lock. So, if a read
> lock is used the copy in kernfs_refresh_inode() is no longer protected,
> it needs to be protected in a different way.
>

Ok, I'm actually wondering why the VFS holds exclusive i_rwsem for .setattr but
 no lock for .getattr (misdocumented?? sometimes they have as you've found out)?
What does it protect against?? Because .permission does a similar thing
here -- updating inode attributes, the goal is to provide the same
protection level
for .permission as for .setattr, am I right???


thanks,
fox


Re: [PATCH v2 0/6] kernfs: proposed locking and concurrency improvement

2020-12-18 Thread Fox Chen
On Fri, Dec 18, 2020 at 3:36 PM Ian Kent  wrote:
>
> On Thu, 2020-12-17 at 10:14 -0500, Tejun Heo wrote:
> > Hello,
> >
> > On Thu, Dec 17, 2020 at 07:48:49PM +0800, Ian Kent wrote:
> > > > What could be done is to make the kernfs node attr_mutex
> > > > a pointer and dynamically allocate it but even that is too
> > > > costly a size addition to the kernfs node structure as
> > > > Tejun has said.
> > >
> > > I guess the question to ask is, is there really a need to
> > > call kernfs_refresh_inode() from functions that are usually
> > > reading/checking functions.
> > >
> > > Would it be sufficient to refresh the inode in the write/set
> > > operations in (if there's any) places where things like
> > > setattr_copy() is not already called?
> > >
> > > Perhaps GKH or Tejun could comment on this?
> >
> > My memory is a bit hazy but invalidations on reads is how sysfs
> > namespace is
> > implemented, so I don't think there's an easy around that. The only
> > thing I
> > can think of is embedding the lock into attrs and doing xchg dance
> > when
> > attaching it.
>
> Sounds like your saying it would be ok to add a lock to the
> attrs structure, am I correct?
>
> Assuming it is then, to keep things simple, use two locks.
>
> One global lock for the allocation and an attrs lock for all the
> attrs field updates including the kernfs_refresh_inode() update.
>
> The critical section for the global lock could be reduced and it
> changed to a spin lock.
>
> In __kernfs_iattrs() we would have something like:
>
> take the allocation lock
> do the allocated checks
>   assign if existing attrs
>   release the allocation lock
>   return existing if found
> othewise
>   release the allocation lock
>
> allocate and initialize attrs
>
> take the allocation lock
> check if someone beat us to it
>   free and grab exiting attrs
> otherwise
>   assign the new attrs
> release the allocation lock
> return attrs
>
> Add a spinlock to the attrs struct and use it everywhere for
> field updates.
>
> Am I on the right track or can you see problems with this?
>
> Ian
>

umm, we update the inode in kernfs_refresh_inode, right??  So I guess
the problem is how can we protect the inode when kernfs_refresh_inode
is called, not the attrs??


thanks,
fox


Re: [PATCH v2 0/6] kernfs: proposed locking and concurrency improvement

2020-12-17 Thread Fox Chen
On Thu, Dec 17, 2020 at 12:46 PM Ian Kent  wrote:
>
> On Tue, 2020-12-15 at 20:59 +0800, Ian Kent wrote:
> > On Tue, 2020-12-15 at 16:33 +0800, Fox Chen wrote:
> > > On Mon, Dec 14, 2020 at 9:30 PM Ian Kent  wrote:
> > > > On Mon, 2020-12-14 at 14:14 +0800, Fox Chen wrote:
> > > > > On Sun, Dec 13, 2020 at 11:46 AM Ian Kent 
> > > > > wrote:
> > > > > > On Fri, 2020-12-11 at 10:17 +0800, Ian Kent wrote:
> > > > > > > On Fri, 2020-12-11 at 10:01 +0800, Ian Kent wrote:
> > > > > > > > > For the patches, there is a mutex_lock in kn-
> > > > > > > > > >attr_mutex,
> > > > > > > > > as
> > > > > > > > > Tejun
> > > > > > > > > mentioned here
> > > > > > > > > (
> > > > > > > > > https://lore.kernel.org/lkml/x8fe0cmu+aq1g...@mtj.duckdns.org/
> > > > > > > > > ),
> > > > > > > > > maybe a global
> > > > > > > > > rwsem for kn->iattr will be better??
> > > > > > > >
> > > > > > > > I wasn't sure about that, IIRC a spin lock could be used
> > > > > > > > around
> > > > > > > > the
> > > > > > > > initial check and checked again at the end which would
> > > > > > > > probably
> > > > > > > > have
> > > > > > > > been much faster but much less conservative and a bit
> > > > > > > > more
> > > > > > > > ugly
> > > > > > > > so
> > > > > > > > I just went the conservative path since there was so much
> > > > > > > > change
> > > > > > > > already.
> > > > > > >
> > > > > > > Sorry, I hadn't looked at Tejun's reply yet and TBH didn't
> > > > > > > remember
> > > > > > > it.
> > > > > > >
> > > > > > > Based on what Tejun said it sounds like that needs work.
> > > > > >
> > > > > > Those attribute handling patches were meant to allow taking
> > > > > > the
> > > > > > rw
> > > > > > sem read lock instead of the write lock for
> > > > > > kernfs_refresh_inode()
> > > > > > updates, with the added locking to protect the inode
> > > > > > attributes
> > > > > > update since it's called from the VFS both with and without
> > > > > > the
> > > > > > inode lock.
> > > > >
> > > > > Oh, understood. I was asking also because lock on kn-
> > > > > >attr_mutex
> > > > > drags
> > > > > concurrent performance.
> > > > >
> > > > > > Looking around it looks like kernfs_iattrs() is called from
> > > > > > multiple
> > > > > > places without a node database lock at all.
> > > > > >
> > > > > > I'm thinking that, to keep my proposed change straight
> > > > > > forward
> > > > > > and on topic, I should just leave kernfs_refresh_inode()
> > > > > > taking
> > > > > > the node db write lock for now and consider the attributes
> > > > > > handling
> > > > > > as a separate change. Once that's done we could reconsider
> > > > > > what's
> > > > > > needed to use the node db read lock in
> > > > > > kernfs_refresh_inode().
> > > > >
> > > > > You meant taking write lock of kernfs_rwsem for
> > > > > kernfs_refresh_inode()??
> > > > > It may be a lot slower in my benchmark, let me test it.
> > > >
> > > > Yes, but make sure the write lock of kernfs_rwsem is being taken
> > > > not the read lock.
> > > >
> > > > That's a mistake I had initially?
> > > >
> > > > Still, that attributes handling is, I think, sufficient to
> > > > warrant
> > > > a separate change since it looks like it might need work, the
> > > > kernfs
> > > > node db probably should be kept stable for those attribute
> > > > updates
> > > > but equally the existence of an instantiated dentry might
&

Re: [PATCH v2 0/6] kernfs: proposed locking and concurrency improvement

2020-12-15 Thread Fox Chen
On Mon, Dec 14, 2020 at 9:30 PM Ian Kent  wrote:
>
> On Mon, 2020-12-14 at 14:14 +0800, Fox Chen wrote:
> > On Sun, Dec 13, 2020 at 11:46 AM Ian Kent  wrote:
> > > On Fri, 2020-12-11 at 10:17 +0800, Ian Kent wrote:
> > > > On Fri, 2020-12-11 at 10:01 +0800, Ian Kent wrote:
> > > > > > For the patches, there is a mutex_lock in kn->attr_mutex, as
> > > > > > Tejun
> > > > > > mentioned here
> > > > > > (
> > > > > > https://lore.kernel.org/lkml/x8fe0cmu+aq1g...@mtj.duckdns.org/
> > > > > > ),
> > > > > > maybe a global
> > > > > > rwsem for kn->iattr will be better??
> > > > >
> > > > > I wasn't sure about that, IIRC a spin lock could be used around
> > > > > the
> > > > > initial check and checked again at the end which would probably
> > > > > have
> > > > > been much faster but much less conservative and a bit more ugly
> > > > > so
> > > > > I just went the conservative path since there was so much
> > > > > change
> > > > > already.
> > > >
> > > > Sorry, I hadn't looked at Tejun's reply yet and TBH didn't
> > > > remember
> > > > it.
> > > >
> > > > Based on what Tejun said it sounds like that needs work.
> > >
> > > Those attribute handling patches were meant to allow taking the rw
> > > sem read lock instead of the write lock for kernfs_refresh_inode()
> > > updates, with the added locking to protect the inode attributes
> > > update since it's called from the VFS both with and without the
> > > inode lock.
> >
> > Oh, understood. I was asking also because lock on kn->attr_mutex
> > drags
> > concurrent performance.
> >
> > > Looking around it looks like kernfs_iattrs() is called from
> > > multiple
> > > places without a node database lock at all.
> > >
> > > I'm thinking that, to keep my proposed change straight forward
> > > and on topic, I should just leave kernfs_refresh_inode() taking
> > > the node db write lock for now and consider the attributes handling
> > > as a separate change. Once that's done we could reconsider what's
> > > needed to use the node db read lock in kernfs_refresh_inode().
> >
> > You meant taking write lock of kernfs_rwsem for
> > kernfs_refresh_inode()??
> > It may be a lot slower in my benchmark, let me test it.
>
> Yes, but make sure the write lock of kernfs_rwsem is being taken
> not the read lock.
>
> That's a mistake I had initially?
>
> Still, that attributes handling is, I think, sufficient to warrant
> a separate change since it looks like it might need work, the kernfs
> node db probably should be kept stable for those attribute updates
> but equally the existence of an instantiated dentry might mitigate
> the it.
>
> Some people might just know whether it's ok or not but I would like
> to check the callers to work out what's going on.
>
> In any case it's academic if GCH isn't willing to consider the series
> for review and possible merge.
>
Hi Ian

I removed kn->attr_mutex and changed read lock to write lock for
kernfs_refresh_inode

down_write(_rwsem);
kernfs_refresh_inode(kn, inode);
up_write(_rwsem);


Unfortunate, changes in this way make things worse,  my benchmark runs
100% slower than upstream sysfs.  :(
open+read+close a sysfs file concurrently took 1000us. (Currently,
sysfs with a big mutex kernfs_mutex only takes ~500us
for one open+read+close operation concurrently)

|--45.93%--kernfs_iop_permission
  ||
  |  |  |  |
  ||
  |  |  |
|--22.55%--down_write
  ||
  |  |  |  |  |
  ||
  |  |  |  |
--20.69%--rwsem_down_write_slowpath
  ||
  |  |  |  |
  |
  ||
  |  |  |  |
  |--8.89%--schedule

perf showed most of the time had been spent on kernfs_iop_permission


thanks,
fox


Re: [PATCH v2 0/6] kernfs: proposed locking and concurrency improvement

2020-12-13 Thread Fox Chen
On Sun, Dec 13, 2020 at 11:46 AM Ian Kent  wrote:
>
> On Fri, 2020-12-11 at 10:17 +0800, Ian Kent wrote:
> > On Fri, 2020-12-11 at 10:01 +0800, Ian Kent wrote:
> > > > For the patches, there is a mutex_lock in kn->attr_mutex, as
> > > > Tejun
> > > > mentioned here
> > > > (https://lore.kernel.org/lkml/x8fe0cmu+aq1g...@mtj.duckdns.org/),
> > > > maybe a global
> > > > rwsem for kn->iattr will be better??
> > >
> > > I wasn't sure about that, IIRC a spin lock could be used around the
> > > initial check and checked again at the end which would probably
> > > have
> > > been much faster but much less conservative and a bit more ugly so
> > > I just went the conservative path since there was so much change
> > > already.
> >
> > Sorry, I hadn't looked at Tejun's reply yet and TBH didn't remember
> > it.
> >
> > Based on what Tejun said it sounds like that needs work.
>
> Those attribute handling patches were meant to allow taking the rw
> sem read lock instead of the write lock for kernfs_refresh_inode()
> updates, with the added locking to protect the inode attributes
> update since it's called from the VFS both with and without the
> inode lock.

Oh, understood. I was asking also because lock on kn->attr_mutex drags
concurrent performance.

> Looking around it looks like kernfs_iattrs() is called from multiple
> places without a node database lock at all.
>
> I'm thinking that, to keep my proposed change straight forward
> and on topic, I should just leave kernfs_refresh_inode() taking
> the node db write lock for now and consider the attributes handling
> as a separate change. Once that's done we could reconsider what's
> needed to use the node db read lock in kernfs_refresh_inode().

You meant taking write lock of kernfs_rwsem for kernfs_refresh_inode()??
It may be a lot slower in my benchmark, let me test it.

> It will reduce the effectiveness of the series but it would make
> this change much more complicated, and is somewhat off-topic, and
> could hamper the chances of reviewers spotting problem with it.
>
> Ian
>


thanks,
fox


[tip: core/rcu] docs/memory-barriers.txt: Fix a typo in CPU MEMORY BARRIERS section

2020-12-13 Thread tip-bot2 for Fox Chen
The following commit has been merged into the core/rcu branch of tip:

Commit-ID: d8566f15da9b1e51fd35f24321ec133095e02d06
Gitweb:
https://git.kernel.org/tip/d8566f15da9b1e51fd35f24321ec133095e02d06
Author:Fox Chen 
AuthorDate:Wed, 09 Sep 2020 14:53:40 +08:00
Committer: Paul E. McKenney 
CommitterDate: Fri, 06 Nov 2020 17:24:51 -08:00

docs/memory-barriers.txt: Fix a typo in CPU MEMORY BARRIERS section

Commit 39323c6 ("smp_mb__{before,after}_atomic(): update Documentation")
has a typo in CPU MEORY BARRIERS section:
"RMW functions that do not imply are memory barrier are ..." should be
"RMW functions that do not imply a memory barrier are ...".

This patch fixes this typo.

Signed-off-by: Fox Chen 
Acked-by: Will Deacon 
Signed-off-by: Paul E. McKenney 
---
 Documentation/memory-barriers.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/memory-barriers.txt 
b/Documentation/memory-barriers.txt
index 17c8e0c..7367ada 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -1870,7 +1870,7 @@ There are some more advanced barrier functions:
 
  These are for use with atomic RMW functions that do not imply memory
  barriers, but where the code needs a memory barrier. Examples for atomic
- RMW functions that do not imply are memory barrier are e.g. add,
+ RMW functions that do not imply a memory barrier are e.g. add,
  subtract, (failed) conditional operations, _relaxed functions,
  but not atomic_read or atomic_set. A common example where a memory
  barrier may be required is when atomic ops are used for reference


RE:[PATCH v2 0/6] kernfs: proposed locking and concurrency improvement

2020-12-10 Thread Fox Chen
Hi,

I found this series of patches solves exact the problem I am trying to solve.
https://lore.kernel.org/lkml/20201202145837.48040-1-foxhlc...@gmail.com/

The problem is reported by Brice Goglin on thread:
Re: [PATCH 1/4] drivers core: Introduce CPU type sysfs interface
https://lore.kernel.org/lkml/x60dvjot4furc...@kroah.com/

I independently comfirmed that on a 96-core AWS c5.metal server.
Do open+read+write on /sys/devices/system/cpu/cpu15/topology/core_id 1000 times.
With a single thread it takes ~2.5 us for each open+read+close.
With one thread per core, 96 threads running simultaneously takes 540 us 
for each of the same operation (without much variation) -- 200x slower than the 
single thread one. 

My Benchmark code is here:
https://github.com/foxhlchen/sysfs_benchmark

The problem can only be observed in large machines (>=16 cores).
The more cores you have the slower it can be.

Perf shows that CPUs spend most of the time (>80%) waiting on mutex locks in 
kernfs_iop_permission and kernfs_dop_revalidate.

After applying this, performance gets huge boost -- with the fastest one at ~30 
us 
to the worst at ~180 us (most of on spin_locks, the delay just stacking up, very
similar to the performance on ext4). 

I hope this problem can justifies this series of patches. A big mutex in kernfs
is really not nice. Due to this BIG LOCK, concurrency in kernfs is almost NONE,
even though you do operations on different files, they are contentious.

As we get more and more cores on normal machines and because sysfs provides such
important information, this problem should be fix. So please reconsider 
accepting
the patches.

For the patches, there is a mutex_lock in kn->attr_mutex, as Tejun mentioned 
here 
(https://lore.kernel.org/lkml/x8fe0cmu+aq1g...@mtj.duckdns.org/), maybe a 
global 
rwsem for kn->iattr will be better??



thanks,
fox



[PATCH V2] kernfs: replace the mutex in kernfs_iop_permission with a rwlock

2020-12-07 Thread Fox Chen
A big global mutex in kernfs_iop_permission will significanly drag
system performance when processes concurrently open files
on kernfs in Big machines(with >= 16 cpu cores).

This patch replace the big mutex with a global rwsem lock.
So that kernfs_iop_permission can perform concurrently.

In a 96-core AMD EPYC ROME server, I can observe 50% boost on
a open+read+close cycle when I call open+read+close one thread per
core concurrently 1000 times after applying the patch.

Signed-off-by: Fox Chen 
---
 fs/kernfs/inode.c | 19 +++
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/fs/kernfs/inode.c b/fs/kernfs/inode.c
index fc2469a20fed..ea65da176cfa 100644
--- a/fs/kernfs/inode.c
+++ b/fs/kernfs/inode.c
@@ -14,9 +14,12 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "kernfs-internal.h"
 
+static DECLARE_RWSEM(kernfs_iattr_rwsem);
+
 static const struct address_space_operations kernfs_aops = {
.readpage   = simple_readpage,
.write_begin= simple_write_begin,
@@ -106,9 +109,9 @@ int kernfs_setattr(struct kernfs_node *kn, const struct 
iattr *iattr)
 {
int ret;
 
-   mutex_lock(_mutex);
+   down_write(_iattr_rwsem);
ret = __kernfs_setattr(kn, iattr);
-   mutex_unlock(_mutex);
+   up_write(_iattr_rwsem);
return ret;
 }
 
@@ -121,7 +124,7 @@ int kernfs_iop_setattr(struct dentry *dentry, struct iattr 
*iattr)
if (!kn)
return -EINVAL;
 
-   mutex_lock(_mutex);
+   down_write(_iattr_rwsem);
error = setattr_prepare(dentry, iattr);
if (error)
goto out;
@@ -134,7 +137,7 @@ int kernfs_iop_setattr(struct dentry *dentry, struct iattr 
*iattr)
setattr_copy(inode, iattr);
 
 out:
-   mutex_unlock(_mutex);
+   up_write(_iattr_rwsem);
return error;
 }
 
@@ -189,9 +192,9 @@ int kernfs_iop_getattr(const struct path *path, struct 
kstat *stat,
struct inode *inode = d_inode(path->dentry);
struct kernfs_node *kn = inode->i_private;
 
-   mutex_lock(_mutex);
+   down_read(_iattr_rwsem);
kernfs_refresh_inode(kn, inode);
-   mutex_unlock(_mutex);
+   up_read(_iattr_rwsem);
 
generic_fillattr(inode, stat);
return 0;
@@ -281,9 +284,9 @@ int kernfs_iop_permission(struct inode *inode, int mask)
 
kn = inode->i_private;
 
-   mutex_lock(_mutex);
+   down_read(_iattr_rwsem);
kernfs_refresh_inode(kn, inode);
-   mutex_unlock(_mutex);
+   up_read(_iattr_rwsem);
 
return generic_permission(inode, mask);
 }
-- 
2.29.2



Differences from V1:

* Use rwsem instead of rwlock so we can sleep when kernfs_iattrs calls
  GFP_KERNEL type memory allocation.

* Use a global lock instead of a per-node lock to reduce memory
  consumption.


It's still slow, a open+read+close cycle spends ~260us compared to ~3us of 
single
thread one. After applying this, the mutex in kernfs_dop_revalidate becomes the 
top time-consuming
operation on concurrent open+read+close. However That's harder to solve than 
this one
and it's near the merge window and holiday season, I don't want to add up work 
load to
you guys during that time so I decided to turn in this separately. Hopefully, I 
can bring in
kernfs_dop_revalidate patch after holiday.

And hope this patch can help.


thanks,
fox


Re: [PATCH 2/2] kernfs: remove mutex in kernfs_dop_revalidate

2020-12-02 Thread Fox Chen
On Thu, Dec 3, 2020 at 2:46 AM Tejun Heo  wrote:
>
> Hello,
>
> On Wed, Dec 02, 2020 at 10:58:37PM +0800, Fox Chen wrote:
> > There is a big mutex in kernfs_dop_revalidate which slows down the
> > concurrent performance of kernfs.
> >
> > Since kernfs_dop_revalidate only does some checks, the lock is
> > largely unnecessary. Also, according to kernel filesystem locking
> > document:
> > https://www.kernel.org/doc/html/latest/filesystems/locking.html
> > locking is not in the protocal for d_revalidate operation.
>
> That's just describing the rules seen from vfs side. It doesn't say anything
> about locking rules internal to each file system implementation.

Oh, Ok, I got it.

> > This patch remove this mutex from
> > kernfs_dop_revalidate, so kernfs_dop_revalidate
> > can run concurrently.
> >
> > Signed-off-by: Fox Chen 
> > ---
> >  fs/kernfs/dir.c | 9 +++--
> >  1 file changed, 3 insertions(+), 6 deletions(-)
> >
> > diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
> > index 9aec80b9d7c6..c2267c93f546 100644
> > --- a/fs/kernfs/dir.c
> > +++ b/fs/kernfs/dir.c
> > @@ -26,7 +26,6 @@ static DEFINE_SPINLOCK(kernfs_idr_lock);/* 
> > root->ino_idr */
> >
> >  static bool kernfs_active(struct kernfs_node *kn)
> >  {
> > - lockdep_assert_held(_mutex);
> >   return atomic_read(>active) >= 0;
> >  }
> >
> > @@ -557,10 +556,9 @@ static int kernfs_dop_revalidate(struct dentry 
> > *dentry, unsigned int flags)
> >
> >   /* Always perform fresh lookup for negatives */
> >   if (d_really_is_negative(dentry))
> > - goto out_bad_unlocked;
> > + goto out_bad;
> >
> >   kn = kernfs_dentry_node(dentry);
> > - mutex_lock(_mutex);
> >
> >   /* The kernfs node has been deactivated */
> >   if (!kernfs_active(kn))
> > @@ -579,11 +577,8 @@ static int kernfs_dop_revalidate(struct dentry 
> > *dentry, unsigned int flags)
> >   kernfs_info(dentry->d_sb)->ns != kn->ns)
> >   goto out_bad;
> >
> > - mutex_unlock(_mutex);
> >   return 1;
> >  out_bad:
> > - mutex_unlock(_mutex);
> > -out_bad_unlocked:
> >   return 0;
> >  }
>
> I don't see how this can be safe. Nothing even protects the dentry from
> turning negative in the middle and it may end up trying to deref NULL. I'm
> sure we can make this not need kernfs_mutex but that'd have to be a lot more
> careful.
>

Sorry Tejun, I don't get it. Even before the patch

if (d_really_is_negative(dentry))
 goto out_bad_unlocked;

kn = kernfs_dentry_node(dentry);
mutex_lock(_mutex);  < we lock here

status of d_really_is_negative is not preserved by the mutex. It could
turn negative between we checked it and we lock kernfs_mutex.
Is it a bug here??


thanks,
fox


Re: [PATCH 0/2] kernfs: speed up concurrency performance

2020-12-02 Thread Fox Chen
On Thu, Dec 3, 2020 at 2:28 AM Greg KH  wrote:
>
> On Wed, Dec 02, 2020 at 10:58:35PM +0800, Fox Chen wrote:
> > Hello,
> >
> > kernfs is an important facillity to support pseudo file systems and cgroup.
> > Currently, with a global mutex, reading files concurrently from kernfs 
> > (e.g. /sys)
> > is very slow.
> >
> > This problem is reported by Brice Goglin on thread:
> > Re: [PATCH 1/4] drivers core: Introduce CPU type sysfs interface
> > https://lore.kernel.org/lkml/x60dvjot4furc...@kroah.com/
> >
> > I independently comfirmed this on a 96-core AWS c5.metal server.
> > Do open+read+write on /sys/devices/system/cpu/cpu15/topology/core_id 1000 
> > times.
> > With a single thread it takes ~2.5 us for each open+read+close.
> > With one thread per core, 96 threads running simultaneously takes 540 us
> > for each of the same operation (without much variation) -- 200x slower than 
> > the
> > single thread one.
> >
> > The problem can only be observed in large machines (>=16 cores).
> > The more cores you have the slower it can be.
> >
> > Perf shows that CPUs spend most of the time (>80%) waiting on mutex locks in
> > kernfs_iop_permission and kernfs_dop_revalidate.
> >
> > This patchset contains the following 2 patches:
> > 0001-kernfs-replace-the-mutex-in-kernfs_iop_permission-wi.patch
> > 0002-kernfs-remove-mutex-in-kernfs_dop_revalidate.patch
> >
> > 0001 replace the mutex lock in kernfs_iop_permission with a new rwlock and
> > 0002 removes the mutex lock in kernfs_dop_revalidate.
> >
> > After applying this patchset, the multi-thread performance becomes linear 
> > with
> > the fastest one at ~30 us to the worst at ~150 us, very similar as I tested 
> > it
> > on a normal ext4 file system with fastest one at ~20 us to slowest at ~100 
> > us.
> > And I believe that is largely due to spin_locks in filesystems which are 
> > normal.
> >
> > Although it's still slower than single thread, users can benefit from this
> > patchset, especially ones working on HPC realm with lots of cpu cores and 
> > want to
> > fetch system information from sysfs.
>
> Does this mean that the changes slow down the single-threaded case?  Or
> that it's just not as good as the speed of a single-threaded access?

No, It won't influence the single-threaded case. I meant
multi-threaded case is still not as good as single-threaded one.

> But anyway, thanks so much for looking into this, it should help the
> crazy systems out today, which means the normal systems in 5 years will
> really appreciate this :)

thanks :)


Re: [PATCH 2/2] kernfs: remove mutex in kernfs_dop_revalidate

2020-12-02 Thread Fox Chen
On Thu, Dec 3, 2020 at 2:26 AM Greg KH  wrote:
>
> On Wed, Dec 02, 2020 at 10:58:37PM +0800, Fox Chen wrote:
> > There is a big mutex in kernfs_dop_revalidate which slows down the
> > concurrent performance of kernfs.
> >
> > Since kernfs_dop_revalidate only does some checks, the lock is
> > largely unnecessary. Also, according to kernel filesystem locking
> > document:
> > https://www.kernel.org/doc/html/latest/filesystems/locking.html
> > locking is not in the protocal for d_revalidate operation.
> >
> > This patch remove this mutex from
> > kernfs_dop_revalidate, so kernfs_dop_revalidate
> > can run concurrently.
> >
> > Signed-off-by: Fox Chen 
> > ---
> >  fs/kernfs/dir.c | 9 +++--
> >  1 file changed, 3 insertions(+), 6 deletions(-)
> >
> > diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
> > index 9aec80b9d7c6..c2267c93f546 100644
> > --- a/fs/kernfs/dir.c
> > +++ b/fs/kernfs/dir.c
> > @@ -26,7 +26,6 @@ static DEFINE_SPINLOCK(kernfs_idr_lock);/* 
> > root->ino_idr */
> >
> >  static bool kernfs_active(struct kernfs_node *kn)
> >  {
> > - lockdep_assert_held(_mutex);
> >   return atomic_read(>active) >= 0;
> >  }
> >
> > @@ -557,10 +556,9 @@ static int kernfs_dop_revalidate(struct dentry 
> > *dentry, unsigned int flags)
> >
> >   /* Always perform fresh lookup for negatives */
> >   if (d_really_is_negative(dentry))
> > - goto out_bad_unlocked;
> > + goto out_bad;
> >
> >   kn = kernfs_dentry_node(dentry);
> > - mutex_lock(_mutex);
> >
> >   /* The kernfs node has been deactivated */
> >   if (!kernfs_active(kn))
> > @@ -579,11 +577,8 @@ static int kernfs_dop_revalidate(struct dentry 
> > *dentry, unsigned int flags)
> >   kernfs_info(dentry->d_sb)->ns != kn->ns)
> >   goto out_bad;
> >
> > - mutex_unlock(_mutex);
> >   return 1;
> >  out_bad:
> > - mutex_unlock(_mutex);
> > -out_bad_unlocked:
> >   return 0;
> >  }
> >
> > @@ -650,6 +645,8 @@ static struct kernfs_node *__kernfs_new_node(struct 
> > kernfs_root *root,
> >   kn->mode = mode;
> >   kn->flags = flags;
> >
> > + rwlock_init(>iattr_rwlock);
>
> Ah, now you initialize this, it should go into patch 1, right? :)
>
Yes, it's my fault. It should be in patch 1. Sorry.


Re: [PATCH 1/2] kernfs: replace the mutex in kernfs_iop_permission with a rwlock

2020-12-02 Thread Fox Chen
Hi,

Thanks for your comments.

> On Wed, Dec 02, 2020 at 10:58:36PM +0800, Fox Chen wrote:
> > @@ -121,7 +121,7 @@ int kernfs_iop_setattr(struct dentry *dentry, struct 
> > iattr *iattr)
> >   if (!kn)
> >   return -EINVAL;
> >
> > - mutex_lock(_mutex);
> > + write_lock(>iattr_rwlock);
> >   error = setattr_prepare(dentry, iattr);
> >   if (error)
> >   goto out;
> > @@ -134,7 +134,7 @@ int kernfs_iop_setattr(struct dentry *dentry, struct 
> > iattr *iattr)
> >   setattr_copy(inode, iattr);
> >
> >  out:
> > - mutex_unlock(_mutex);
> > + write_unlock(>iattr_rwlock);
> >   return error;
> >  }
>
> This is putting GFP_KERNEL allocation inside a rwlock. Can you please test
> with debug options including LOCKDEP and DEBUG_ATOMIC_SLEEP turned on?
>

Ok, I will try that.

Allocation is protected by the write_lock, only one thread can enter
this at a time. It should give the same protection as a mutex, right??
Or am I missing something here?? Any caveat?

On Thu, Dec 3, 2020 at 2:37 AM Tejun Heo  wrote:
>
> On Wed, Dec 02, 2020 at 10:58:36PM +0800, Fox Chen wrote:
> > diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
> > index 89f6a4214a70..545cdb39b34b 100644
> > --- a/include/linux/kernfs.h
> > +++ b/include/linux/kernfs.h
> > @@ -156,6 +156,7 @@ struct kernfs_node {
> >   unsigned short  flags;
> >   umode_t mode;
> >   struct kernfs_iattrs*iattr;
> > + rwlock_tiattr_rwlock;
> >  };
>
> Also, while this might not look like much, kernfs_node is very size
> sensitive. There are systems with huge number of these nodes, so I don't
> think putting a per-node lock like this is a good idea. Either we can use a
> shared iattr protecting lock or play some cmpxchg games when allocating and
> setting ->iattr and put the lock there.
>

Initially, I tried to put rwlock in kn->iattr, but
__kernfs_setattr(kn, iattr) needs lock protection and kn->iattr may
not exist before calling __kernfs_setattr. It's a chicken-egg paradox.
:)
It's hard to solve. cmpxchg can help, but who sets kn->iattr first
should be clearly defined.

What about I used a global shared rwlock to protect all kn->iattr.
It's easier to implement and I think we read sysfs more than write to
it, I guess it won't be that slow compared to one kn per lock?


thanks,
fox


[PATCH 1/2] kernfs: replace the mutex in kernfs_iop_permission with a rwlock

2020-12-02 Thread Fox Chen
A big global mutex in kernfs_iop_permission will significanly drag
system performance when processes concurrently open files
on kernfs in Big machines(with >= 16 cpu cores).

This patch replace the big mutex with a rwlock specifically for
protecting kernfs_node->iattribute. So that kernfs_iop_permission
can perform concurrently.

Signed-off-by: Fox Chen 
---
 fs/kernfs/inode.c  | 16 
 include/linux/kernfs.h |  1 +
 2 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/fs/kernfs/inode.c b/fs/kernfs/inode.c
index fc2469a20fed..c8c2ea669e6d 100644
--- a/fs/kernfs/inode.c
+++ b/fs/kernfs/inode.c
@@ -106,9 +106,9 @@ int kernfs_setattr(struct kernfs_node *kn, const struct 
iattr *iattr)
 {
int ret;
 
-   mutex_lock(_mutex);
+   write_lock(>iattr_rwlock);
ret = __kernfs_setattr(kn, iattr);
-   mutex_unlock(_mutex);
+   write_unlock(>iattr_rwlock);
return ret;
 }
 
@@ -121,7 +121,7 @@ int kernfs_iop_setattr(struct dentry *dentry, struct iattr 
*iattr)
if (!kn)
return -EINVAL;
 
-   mutex_lock(_mutex);
+   write_lock(>iattr_rwlock);
error = setattr_prepare(dentry, iattr);
if (error)
goto out;
@@ -134,7 +134,7 @@ int kernfs_iop_setattr(struct dentry *dentry, struct iattr 
*iattr)
setattr_copy(inode, iattr);
 
 out:
-   mutex_unlock(_mutex);
+   write_unlock(>iattr_rwlock);
return error;
 }
 
@@ -189,9 +189,9 @@ int kernfs_iop_getattr(const struct path *path, struct 
kstat *stat,
struct inode *inode = d_inode(path->dentry);
struct kernfs_node *kn = inode->i_private;
 
-   mutex_lock(_mutex);
+   read_lock(>iattr_rwlock);
kernfs_refresh_inode(kn, inode);
-   mutex_unlock(_mutex);
+   read_unlock(>iattr_rwlock);
 
generic_fillattr(inode, stat);
return 0;
@@ -281,9 +281,9 @@ int kernfs_iop_permission(struct inode *inode, int mask)
 
kn = inode->i_private;
 
-   mutex_lock(_mutex);
+   read_lock(>iattr_rwlock);
kernfs_refresh_inode(kn, inode);
-   mutex_unlock(_mutex);
+   read_unlock(>iattr_rwlock);
 
return generic_permission(inode, mask);
 }
diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
index 89f6a4214a70..545cdb39b34b 100644
--- a/include/linux/kernfs.h
+++ b/include/linux/kernfs.h
@@ -156,6 +156,7 @@ struct kernfs_node {
unsigned short  flags;
umode_t mode;
struct kernfs_iattrs*iattr;
+   rwlock_tiattr_rwlock;
 };
 
 /*
-- 
2.29.2



[PATCH 0/2] kernfs: speed up concurrency performance

2020-12-02 Thread Fox Chen
Hello,

kernfs is an important facillity to support pseudo file systems and cgroup. 
Currently, with a global mutex, reading files concurrently from kernfs (e.g. 
/sys) 
is very slow.

This problem is reported by Brice Goglin on thread:
Re: [PATCH 1/4] drivers core: Introduce CPU type sysfs interface
https://lore.kernel.org/lkml/x60dvjot4furc...@kroah.com/

I independently comfirmed this on a 96-core AWS c5.metal server.
Do open+read+write on /sys/devices/system/cpu/cpu15/topology/core_id 1000 times.
With a single thread it takes ~2.5 us for each open+read+close.
With one thread per core, 96 threads running simultaneously takes 540 us 
for each of the same operation (without much variation) -- 200x slower than the 
single thread one. 

The problem can only be observed in large machines (>=16 cores).
The more cores you have the slower it can be.

Perf shows that CPUs spend most of the time (>80%) waiting on mutex locks in 
kernfs_iop_permission and kernfs_dop_revalidate.

This patchset contains the following 2 patches:
0001-kernfs-replace-the-mutex-in-kernfs_iop_permission-wi.patch
0002-kernfs-remove-mutex-in-kernfs_dop_revalidate.patch

0001 replace the mutex lock in kernfs_iop_permission with a new rwlock and 
0002 removes the mutex lock in kernfs_dop_revalidate.

After applying this patchset, the multi-thread performance becomes linear with 
the fastest one at ~30 us to the worst at ~150 us, very similar as I tested it
on a normal ext4 file system with fastest one at ~20 us to slowest at ~100 us. 
And I believe that is largely due to spin_locks in filesystems which are normal.

Although it's still slower than single thread, users can benefit from this 
patchset, especially ones working on HPC realm with lots of cpu cores and want 
to
fetch system information from sysfs.

I tried my best to solve this problem. If there is stupid mistake, please kindly
point out. I would appreciate it greatly.

Fox

 fs/kernfs/dir.c|  9 +++--
 fs/kernfs/inode.c  | 16 
 include/linux/kernfs.h |  1 +
 3 files changed, 12 insertions(+), 14 deletions(-)

-- 
2.29.2



[PATCH 2/2] kernfs: remove mutex in kernfs_dop_revalidate

2020-12-02 Thread Fox Chen
There is a big mutex in kernfs_dop_revalidate which slows down the
concurrent performance of kernfs.

Since kernfs_dop_revalidate only does some checks, the lock is
largely unnecessary. Also, according to kernel filesystem locking
document:
https://www.kernel.org/doc/html/latest/filesystems/locking.html
locking is not in the protocal for d_revalidate operation.

This patch remove this mutex from
kernfs_dop_revalidate, so kernfs_dop_revalidate
can run concurrently.

Signed-off-by: Fox Chen 
---
 fs/kernfs/dir.c | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index 9aec80b9d7c6..c2267c93f546 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -26,7 +26,6 @@ static DEFINE_SPINLOCK(kernfs_idr_lock);  /* 
root->ino_idr */
 
 static bool kernfs_active(struct kernfs_node *kn)
 {
-   lockdep_assert_held(_mutex);
return atomic_read(>active) >= 0;
 }
 
@@ -557,10 +556,9 @@ static int kernfs_dop_revalidate(struct dentry *dentry, 
unsigned int flags)
 
/* Always perform fresh lookup for negatives */
if (d_really_is_negative(dentry))
-   goto out_bad_unlocked;
+   goto out_bad;
 
kn = kernfs_dentry_node(dentry);
-   mutex_lock(_mutex);
 
/* The kernfs node has been deactivated */
if (!kernfs_active(kn))
@@ -579,11 +577,8 @@ static int kernfs_dop_revalidate(struct dentry *dentry, 
unsigned int flags)
kernfs_info(dentry->d_sb)->ns != kn->ns)
goto out_bad;
 
-   mutex_unlock(_mutex);
return 1;
 out_bad:
-   mutex_unlock(_mutex);
-out_bad_unlocked:
return 0;
 }
 
@@ -650,6 +645,8 @@ static struct kernfs_node *__kernfs_new_node(struct 
kernfs_root *root,
kn->mode = mode;
kn->flags = flags;
 
+   rwlock_init(>iattr_rwlock);
+
if (!uid_eq(uid, GLOBAL_ROOT_UID) || !gid_eq(gid, GLOBAL_ROOT_GID)) {
struct iattr iattr = {
.ia_valid = ATTR_UID | ATTR_GID,
-- 
2.29.2



Re: [PATCH 1/4] drivers core: Introduce CPU type sysfs interface

2020-11-19 Thread Fox Chen
On Fri, Nov 13, 2020 at 2:15 PM Brice Goglin  wrote:
>
>
> Le 12/11/2020 à 11:49, Greg Kroah-Hartman a écrit :
>
> On Thu, Nov 12, 2020 at 10:10:57AM +0100, Brice Goglin wrote:
>
> Le 12/11/2020 à 07:42, Greg Kroah-Hartman a écrit :
>
> On Thu, Nov 12, 2020 at 07:19:48AM +0100, Brice Goglin wrote:
>
> Le 07/10/2020 à 07:15, Greg Kroah-Hartman a écrit :
>
> On Tue, Oct 06, 2020 at 08:14:47PM -0700, Ricardo Neri wrote:
>
> On Tue, Oct 06, 2020 at 09:37:44AM +0200, Greg Kroah-Hartman wrote:
>
> On Mon, Oct 05, 2020 at 05:57:36PM -0700, Ricardo Neri wrote:
>
> On Sat, Oct 03, 2020 at 10:53:45AM +0200, Greg Kroah-Hartman wrote:
>
> On Fri, Oct 02, 2020 at 06:17:42PM -0700, Ricardo Neri wrote:
>
> Hybrid CPU topologies combine CPUs of different microarchitectures in the
> same die. Thus, even though the instruction set is compatible among all
> CPUs, there may still be differences in features (e.g., some CPUs may
> have counters that others CPU do not). There may be applications
> interested in knowing the type of micro-architecture topology of the
> system to make decisions about process affinity.
>
> While the existing sysfs for capacity (/sys/devices/system/cpu/cpuX/
> cpu_capacity) may be used to infer the types of micro-architecture of the
> CPUs in the platform, it may not be entirely accurate. For instance, two
> subsets of CPUs with different types of micro-architecture may have the
> same capacity due to power or thermal constraints.
>
> Create the new directory /sys/devices/system/cpu/types. Under such
> directory, create individual subdirectories for each type of CPU micro-
> architecture. Each subdirectory will have cpulist and cpumap files. This
> makes it convenient for user space to read all the CPUs of the same type
> at once without having to inspect each CPU individually.
>
> Implement a generic interface using weak functions that architectures can
> override to indicate a) support for CPU types, b) the CPU type number, and
> c) a string to identify the CPU vendor and type.
>
> For example, an x86 system with one Intel Core and four Intel Atom CPUs
> would look like this (other architectures have the hooks to use whatever
> directory naming convention below "types" that meets their needs):
>
> user@host:~$: ls /sys/devices/system/cpu/types
> intel_atom_0  intel_core_0
>
> user@host:~$ ls /sys/devices/system/cpu/types/intel_atom_0
> cpulist cpumap
>
> user@host:~$ ls /sys/devices/system/cpu/types/intel_core_0
> cpulist cpumap
>
> user@host:~$ cat /sys/devices/system/cpu/types/intel_atom_0/cpumap
> 0f
>
> user@host:~$ cat /sys/devices/system/cpu/types/intel_atom_0/cpulist
> 0-3
>
> user@ihost:~$ cat /sys/devices/system/cpu/types/intel_core_0/cpumap
> 10
>
> user@host:~$ cat /sys/devices/system/cpu/types/intel_core_0/cpulist
> 4
>
> Thank you for the quick and detailed Greg!
>
> The output of 'tree' sometimes makes it easier to see here, or:
> grep -R . *
> also works well.
>
> Indeed, this would definitely make it more readable.
>
> On non-hybrid systems, the /sys/devices/system/cpu/types directory is not
> created. Add a hook for this purpose.
>
> Why should these not show up if the system is not "hybrid"?
>
> My thinking was that on a non-hybrid system, it does not make sense to
> create this interface, as all the CPUs will be of the same type.
>
> Why not just have this an attribute type in the existing cpuX directory?
> Why do this have to be a totally separate directory and userspace has to
> figure out to look in two different spots for the same cpu to determine
> what it is?
>
> But if the type is located under cpuX, usespace would need to traverse
> all the CPUs and create its own cpu masks. Under the types directory it
> would only need to look once for each type of CPU, IMHO.
>
> What does a "mask" do?  What does userspace care about this?  You would
> have to create it by traversing the directories you are creating anyway,
> so it's not much different, right?
>
> Hello
>
> Sorry for the late reply. As the first userspace consumer of this
> interface [1], I can confirm that reading a single file to get the mask
> would be better, at least for performance reason. On large platforms, we
> already have to read thousands of sysfs files to get CPU topology and
> cache information, I'd be happy not to read one more file per cpu.
>
> Reading these sysfs files is slow, and it does not scale well when
> multiple processes read them in parallel.
>
> Really?  Where is the slowdown?  Would something like readfile() work
> better for you for that?
> https://lore.kernel.org/linux-api/20200704140250.423345-1-gre...@linuxfoundation.org/
>
> I guess readfile would improve the sequential case by avoiding syscalls
> but it would not improve the parallel case since syscalls shouldn't have
> any parallel issue?
>
> syscalls should not have parallel issues at all.
>
> We've been watching the status of readfile() since it was posted on LKML
> 6 months ago, but we were actually wondering if it would end up 

[PATCH] squashfs: Add id_table sanity check to squashfs_get_id

2020-11-02 Thread Fox Chen
When uid/gid info in superblocks or id_index_table is corrupted,
The uid/gid index can be larger than the size of msblk->id_table.
This is reported by syzkaller.

This patch adds a sanity check to squashfs_get_id which calculates
the max available room for uid/gid table by doing
msblk->xattr_table - msblk->id_table[0]
and check if index is larger than this.

While this provides some sort of check, it is
imperfect because id_table can be smaller than that.

Reported-by: syzbot+8e28bba73ed1772a6...@syzkaller.appspotmail.com
Signed-off-by: Fox Chen 
---
 fs/squashfs/id.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/fs/squashfs/id.c b/fs/squashfs/id.c
index 6be5afe7287d..81bd67c0f649 100644
--- a/fs/squashfs/id.c
+++ b/fs/squashfs/id.c
@@ -35,10 +35,16 @@ int squashfs_get_id(struct super_block *sb, unsigned int 
index,
struct squashfs_sb_info *msblk = sb->s_fs_info;
int block = SQUASHFS_ID_BLOCK(index);
int offset = SQUASHFS_ID_BLOCK_OFFSET(index);
-   u64 start_block = le64_to_cpu(msblk->id_table[block]);
+   u64 start_block;
__le32 disk_id;
int err;
 
+   // sanity check
+   if (le64_to_cpu(msblk->id_table[0]) + block >= msblk->xattr_table)
+   return -EINVAL;
+
+   start_block = le64_to_cpu(msblk->id_table[block]);
+
err = squashfs_read_metadata(sb, _id, _block, ,
sizeof(disk_id));
if (err < 0)
-- 
2.25.1



[PATCH] NTFS: Add name sanity check to ntfs_attr_find

2020-11-02 Thread Fox Chen
When mounting, if Attribute data is correupted, doing named attribute
lookup can lead to invalid memory access. This is reported by syzkaller.

This patch adds a sanity check prior to attribute name lookup. If attribute's
name_offset is invalid, It will mark volume error and return -EIO.

Reported-by: syzbot+ecbcf37464c627253...@syzkaller.appspotmail.com
Signed-off-by: Fox Chen 
---
 fs/ntfs/attrib.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/fs/ntfs/attrib.c b/fs/ntfs/attrib.c
index d563abc3e136..e7366f74ff62 100644
--- a/fs/ntfs/attrib.c
+++ b/fs/ntfs/attrib.c
@@ -607,6 +607,16 @@ static int ntfs_attr_find(const ATTR_TYPE type, const 
ntfschar *name,
 * If @name is present, compare the two names.  If @name is
 * missing, assume we want an unnamed attribute.
 */
+
+   /*
+* Sanity check, a->name_offset should be within the range of 
a->lengh,
+*/
+   if (name && ((u8*)a + le16_to_cpu(a->name_offset)) > ((u8*)a + 
le32_to_cpu(a->length))) {
+   ntfs_error(vol->sb, "Invalid Attribute Name. Inode is 
corrupt.  Run chkdsk.");
+   NVolSetErrors(vol);
+   return -EIO;
+   }
+
if (!name) {
/* The search failed if the found attribute is named. */
if (a->name_length)
-- 
2.25.1



Re: [Cluster-devel] [PATCH] fs: gfs2: prevent OOB access in gfs2_read_sb()

2020-10-14 Thread Fox Chen
Hi Andrew,

On Wed, Oct 14, 2020 at 9:04 PM Andrew Price  wrote:
> Just a heads-up to avoid duplication of effort: Fox Chen (CCed) has
> attempted to fix this also[1], but I don't know if they plan to send
> another patch.

Oh, I thought it was solved by someone else as you've pointed out the
correct solution the other day. https://lkml.org/lkml/2020/10/5/538

I'm solving another bug right now, I will leave this to Anant. :)


thanks,
fox


Re: [Cluster-devel] KASAN: slab-out-of-bounds Write in,gfs2_fill_super

2020-10-14 Thread Fox Chen

Yes, it's the same bug,

https://lkml.org/lkml/2020/10/5/538 this may help


thanks,


Re: KASAN: slab-out-of-bounds Read in squashfs_get_id

2020-10-14 Thread Fox Chen

Hi,


I found this bug was caused by either uid/gid info in superblocks or 
id_index_table is corrupted. The uid/gid index is larger than the size 
of msblk->id_table.


Should I add a sanity check to squashfs_get_id??

The complete solution is to record the size of msblk->id_table in msblk 
and check uid/gid index each time in squashfs_get_id. However, this 
requires a change to msblk struct.


A simple solution is to calculate the max available room for uid/gid 
table by doing msblk->xattr_table - msblk->id_table[0] and check if 
index is larger than this. While this provides some sort of check, it is 
imperfect because id_table can be smaller than that.

Both of them work out for this bug.


thanks,
fox

On Friday, September 25, 2020 at 10:48:18 PM UTC+8 syzbot wrote:
Hello,
syzbot found the following issue on:
HEAD commit: 171d4ff7 Merge tag 'mmc-v5.9-rc4-2' of 
git://git.kernel.or.. git tree: upstream console output: 
https://syzkaller.appspot.com/x/log.txt?x=1597ead390 kernel config: 
https://syzkaller.appspot.com/x/.config?x=af502ec9a451c9fc dashboard 
link: https://syzkaller.appspot.com/bug?extid=8e28bba73ed1772a6802 
compiler: clang version 10.0.0 (https://github.com/llvm/llvm-project/ 
c2443155a0fb245c8f17f2c1c72b6ea391e86e81) syz repro: 
https://syzkaller.appspot.com/x/repro.syz?x=172ff48190 C reproducer: 
https://syzkaller.appspot.com/x/repro.c?x=17c3e6c590
IMPORTANT: if you fix the issue, please add the following tag to the 
commit: Reported-by: syzbot+8e28bb...@syzkaller.appspotmail.com
== BUG: 
KASAN: slab-out-of-bounds in squashfs_get_id+0xb9/0x1c0 
fs/squashfs/id.c:38 Read of size 8 at addr 8880a9684b98 by task 
syz-executor329/6836
CPU: 1 PID: 6836 Comm: syz-executor329 Not tainted 5.9.0-rc6-syzkaller 
#0 Hardware name: Google Google Compute Engine/Google Compute Engine, 
BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 
[inline] dump_stack+0x1d6/0x29e lib/dump_stack.c:118 
print_address_description+0x66/0x620 mm/kasan/report.c:383 
__kasan_report mm/kasan/report.c:513 [inline] kasan_report+0x132/0x1d0 
mm/kasan/report.c:530 squashfs_get_id+0xb9/0x1c0 fs/squashfs/id.c:38 
squashfs_new_inode fs/squashfs/inode.c:51 [inline] 
squashfs_read_inode+0x155/0x2170 fs/squashfs/inode.c:120 
squashfs_fill_super+0x1478/0x1790 fs/squashfs/super.c:310 
get_tree_bdev+0x3e9/0x5f0 fs/super.c:1342 vfs_get_tree+0x88/0x270 
fs/super.c:1547 do_new_mount fs/namespace.c:2875 [inline] 
path_mount+0x179d/0x29e0 fs/namespace.c:3192 do_mount 
fs/namespace.c:3205 [inline] __do_sys_mount fs/namespace.c:3413 [inline] 
__se_sys_mount+0x126/0x180 fs/namespace.c:3390 do_syscall_64+0x31/0x70 
arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 
0033:0x446d1a Code: b8 08 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 fd ad 
fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 49 89 ca b8 a5 00 00 00 0f 
05 <48> 3d 01 f0 ff ff 0f 83 da ad fb ff c3 66 0f 1f 84 00 00 00 00 00 
RSP: 002b:7ffd7dd4f8b8 EFLAGS: 0293 ORIG_RAX: 00a5 
RAX: ffda RBX: 7ffd7dd4f910 RCX: 00446d1a RDX: 
2000 RSI: 2100 RDI: 7ffd7dd4f8d0 RBP: 
7ffd7dd4f8d0 R08: 7ffd7dd4f910 R09: 7ffd0015 R10: 
 R11: 0293 R12: 0001 R13: 
0004 R14: 0003 R15: 0003
Allocated by task 3913: kasan_save_stack mm/kasan/common.c:48 [inline] 
kasan_set_track mm/kasan/common.c:56 [inline] 
__kasan_kmalloc+0x100/0x130 mm/kasan/common.c:461 kmalloc_node 
include/linux/slab.h:577 [inline] __vmalloc_area_node mm/vmalloc.c:2429 
[inline] __vmalloc_node_range+0x2c7/0x870 mm/vmalloc.c:2511 
module_alloc+0x7e/0x90 arch/x86/kernel/module.c:75 
bpf_jit_binary_alloc+0x123/0x230 kernel/bpf/core.c:871 
bpf_int_jit_compile+0x7995/0x8920 arch/x86/net/bpf_jit_comp.c:1911 
bpf_prog_select_runtime+0x76d/0xa60 kernel/bpf/core.c:1807 
bpf_migrate_filter net/core/filter.c:1290 [inline] 
bpf_prepare_filter+0xec2/0x1140 net/core/filter.c:1338 
bpf_prog_create_from_user+0x2ad/0x3e0 net/core/filter.c:1432 
seccomp_prepare_filter kernel/seccomp.c:567 [inline] 
seccomp_prepare_user_filter kernel/seccomp.c:604 [inline] 
seccomp_set_mode_filter kernel/seccomp.c:1546 [inline] 
do_seccomp+0x852/0x20b0 kernel/seccomp.c:1661 do_syscall_64+0x31/0x70 
arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9
The buggy address belongs to the object at 8880a9684b80 which 
belongs to the cache kmalloc-32 of size 32 The buggy address is located 
24 bytes inside of 32-byte region [8880a9684b80, 8880a9684ba0) 
The buggy address belongs to the page: page:f697ca3d refcount:1 
mapcount:0 mapping: index:0x8880a9684fc1 pfn:0xa9684 
flags: 0xfffe000200(slab) raw: 00fffe000200 ea0002a5d5c8 
ea0002a98588 8880aa440100 raw: 8880a9684fc1 8880a9684000 
0001003f  page 

Re: [Cluster-devel] [PATCH] gfs2: gfs2_read_sb: put gfs2_assert inside the loop

2020-10-05 Thread Fox Chen
On Mon, Oct 5, 2020 at 9:23 PM Andrew Price  wrote:
>
> On 03/10/2020 07:31, Fox Chen wrote:
> > for (x = 2;; x++) {
> >  ...
> >  gfs2_assert(sdp, x <= GFS2_MAX_META_HEIGHT);  <--- after
> >  ...
> >  if (d != sdp->sd_heightsize[x - 1] || m)
> >  break;
> >  sdp->sd_heightsize[x] = space;
> > }
> >
> > sdp->sd_max_height = x
> > gfs2_assert(sdp, sdp->sd_max_height <= GFS2_MAX_META_HEIGHT) <--- before
> >
> > Before this patch, gfs2_assert is put outside of the loop of
> > sdp->sd_heightsize[x] calculation. When something goes wrong,
>
> So this looks related to one of the recent syzbot reports, where the
> "something goes wrong" is the block size in the on-disk superblock was
> zeroed and that leads eventually to this out-of-bounds write. The
> correct fix in that case would be to add a validity check for the block
> size in gfs2_check_sb().
>

Yes, I saw this bug from the syzbot report and I though instead of
KASAN gfs2_assert should be able to catch it so I proposed this patch.
:)

thank you both for your comments.


fox


[PATCH] gfs2: gfs2_read_sb: put gfs2_assert inside the loop

2020-10-03 Thread Fox Chen
for (x = 2;; x++) {
...
gfs2_assert(sdp, x <= GFS2_MAX_META_HEIGHT);  <--- after
...
if (d != sdp->sd_heightsize[x - 1] || m)
break;
sdp->sd_heightsize[x] = space;
}

sdp->sd_max_height = x
gfs2_assert(sdp, sdp->sd_max_height <= GFS2_MAX_META_HEIGHT) <--- before

Before this patch, gfs2_assert is put outside of the loop of
sdp->sd_heightsize[x] calculation. When something goes wrong,
x exceeds the size of GFS2_MAX_META_HEIGHT, it may already crash inside
the loop when

sdp->sd_heightsize[x] = space

tries to reach the out-of-bound
location, gfs2_assert won't help here.

This patch fixes this by moving gfs2_assert into the loop.
We will check x value each time to see if it exceeds GFS2_MAX_META_HEIGHT.

Signed-off-by: Fox Chen 
---
 fs/gfs2/ops_fstype.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index 6d18d2c91add..6cc32e3010f2 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -333,6 +333,7 @@ static int gfs2_read_sb(struct gfs2_sbd *sdp, int silent)
u64 space, d;
u32 m;
 
+   gfs2_assert(sdp, x <= GFS2_MAX_META_HEIGHT);
space = sdp->sd_heightsize[x - 1] * sdp->sd_inptrs;
d = space;
m = do_div(d, sdp->sd_inptrs);
@@ -343,7 +344,6 @@ static int gfs2_read_sb(struct gfs2_sbd *sdp, int silent)
}
sdp->sd_max_height = x;
sdp->sd_heightsize[x] = ~0;
-   gfs2_assert(sdp, sdp->sd_max_height <= GFS2_MAX_META_HEIGHT);
 
sdp->sd_max_dents_per_leaf = (sdp->sd_sb.sb_bsize -
  sizeof(struct gfs2_leaf)) /
-- 
2.25.1



[PATCH] docs/memory-barriers.txt: Fix a typo in CPU MEMORY BARRIERS section

2020-09-09 Thread Fox Chen
Commit 39323c6 smp_mb__{before,after}_atomic(): update Documentation
has a typo in CPU MEORY BARRIERS section:
"RMW functions that do not imply are memory barrier are ..." should be
"RMW functions that do not imply a memory barrier are ...".

This patch fixes this typo.

Signed-off-by: Fox Chen 
---
 Documentation/memory-barriers.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/memory-barriers.txt 
b/Documentation/memory-barriers.txt
index 96186332e5f4..20b8a7b30320 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -1870,7 +1870,7 @@ There are some more advanced barrier functions:
 
  These are for use with atomic RMW functions that do not imply memory
  barriers, but where the code needs a memory barrier. Examples for atomic
- RMW functions that do not imply are memory barrier are e.g. add,
+ RMW functions that do not imply a memory barrier are e.g. add,
  subtract, (failed) conditional operations, _relaxed functions,
  but not atomic_read or atomic_set. A common example where a memory
  barrier may be required is when atomic ops are used for reference
-- 
2.25.1



HCI_EV_PHY_LINK_COMPLETE in net/bluetooth/hci_event.c

2020-08-16 Thread Fox Chen
Dear all,

I have a question about

static void hci_phy_link_complete_evt(struct hci_dev *hdev,
struct sk_buff *skb)
-- HCI_EV_PHY_LINK_COMPLETE event packet handler in hci_even.c:4940

if (ev->status) {
hci_conn_del(hcon); <--
hci_dev_unlock(hdev);
return;
}

Is it correct to del hcon here?? Because later on, when we close the
socket fd, socket_close will call sco_chan_del which will eventually
call hci_conn_drop. With hcon already deleted by this handler, it will
crash.

This bug is reported by syzbot in
https://syzkaller.appspot.com/bug?id=57e98513afbe427bbd65ac295130bcf5bc860dd8

I'm trying to fix that, but I don't know the design nature of
HCI_EV_PHY_LINK_COMPLETE. Will this scenario happen in real life?? Can
I remove hci_conn_del(hcon) here (I tested it, which fixes this bug)
??


Thanks,
Fox


[PATCH] staging: rtl8723bs: Cleanup open brace issues

2020-07-23 Thread Fox Chen
This cleans up open brace issues reported by checkpatch.pl

Signed-off-by: Fox Chen 
---
 .../staging/rtl8723bs/include/HalPwrSeqCmd.h  |  6 +--
 drivers/staging/rtl8723bs/include/HalVerDef.h | 18 +++
 drivers/staging/rtl8723bs/include/drv_types.h |  6 +--
 .../rtl8723bs/include/drv_types_sdio.h|  3 +-
 .../staging/rtl8723bs/include/hal_btcoex.h|  3 +-
 drivers/staging/rtl8723bs/include/hal_com.h   |  3 +-
 drivers/staging/rtl8723bs/include/ieee80211.h |  3 +-
 .../rtl8723bs/include/ioctl_cfg80211.h|  3 +-
 .../staging/rtl8723bs/include/rtl8192c_recv.h |  3 +-
 .../staging/rtl8723bs/include/rtl8723b_recv.h |  6 +--
 .../staging/rtl8723bs/include/rtl8723b_xmit.h |  3 +-
 drivers/staging/rtl8723bs/include/rtw_cmd.h   | 30 ---
 .../staging/rtl8723bs/include/rtw_eeprom.h|  6 +--
 drivers/staging/rtl8723bs/include/rtw_event.h |  6 +--
 drivers/staging/rtl8723bs/include/rtw_ht.h|  3 +-
 drivers/staging/rtl8723bs/include/rtw_mlme.h  |  6 +--
 .../staging/rtl8723bs/include/rtw_mlme_ext.h  | 51 +++
 drivers/staging/rtl8723bs/include/rtw_mp.h| 15 ++
 .../staging/rtl8723bs/include/rtw_pwrctrl.h   | 12 ++---
 drivers/staging/rtl8723bs/include/rtw_recv.h  | 12 ++---
 .../staging/rtl8723bs/include/rtw_security.h  |  9 ++--
 drivers/staging/rtl8723bs/include/rtw_xmit.h  | 12 ++---
 drivers/staging/rtl8723bs/include/wifi.h  | 24 +++--
 .../staging/rtl8723bs/include/wlan_bssdef.h   |  9 ++--
 24 files changed, 84 insertions(+), 168 deletions(-)

diff --git a/drivers/staging/rtl8723bs/include/HalPwrSeqCmd.h 
b/drivers/staging/rtl8723bs/include/HalPwrSeqCmd.h
index 7040cfc507d8..459f2f9d4bbb 100644
--- a/drivers/staging/rtl8723bs/include/HalPwrSeqCmd.h
+++ b/drivers/staging/rtl8723bs/include/HalPwrSeqCmd.h
@@ -82,14 +82,12 @@
 #definePWR_CUT_ALL_MSK 0xFF
 
 
-typedef enum _PWRSEQ_CMD_DELAY_UNIT_
-{
+typedef enum _PWRSEQ_CMD_DELAY_UNIT_ {
PWRSEQ_DELAY_US,
PWRSEQ_DELAY_MS,
 } PWRSEQ_DELAY_UNIT;
 
-typedef struct _WL_PWR_CFG_
-{
+typedef struct _WL_PWR_CFG_ {
u16 offset;
u8 cut_msk;
u8 fab_msk:4;
diff --git a/drivers/staging/rtl8723bs/include/HalVerDef.h 
b/drivers/staging/rtl8723bs/include/HalVerDef.h
index c548fb126683..b4744be2cbe1 100644
--- a/drivers/staging/rtl8723bs/include/HalVerDef.h
+++ b/drivers/staging/rtl8723bs/include/HalVerDef.h
@@ -8,8 +8,7 @@
 #define __HAL_VERSION_DEF_H__
 
 /*  HAL_IC_TYPE_E */
-typedef enum tag_HAL_IC_Type_Definition
-{
+typedef enum tag_HAL_IC_Type_Definition {
CHIP_8192S  =   0,
CHIP_8188C  =   1,
CHIP_8192C  =   2,
@@ -23,16 +22,14 @@ typedef enum tag_HAL_IC_Type_Definition
 } HAL_IC_TYPE_E;
 
 /* HAL_CHIP_TYPE_E */
-typedef enum tag_HAL_CHIP_Type_Definition
-{
+typedef enum tag_HAL_CHIP_Type_Definition {
TEST_CHIP   =   0,
NORMAL_CHIP =   1,
FPGA=   2,
 } HAL_CHIP_TYPE_E;
 
 /* HAL_CUT_VERSION_E */
-typedef enum tag_HAL_Cut_Version_Definition
-{
+typedef enum tag_HAL_Cut_Version_Definition {
A_CUT_VERSION   =   0,
B_CUT_VERSION   =   1,
C_CUT_VERSION   =   2,
@@ -47,15 +44,13 @@ typedef enum tag_HAL_Cut_Version_Definition
 } HAL_CUT_VERSION_E;
 
 /*  HAL_Manufacturer */
-typedef enum tag_HAL_Manufacturer_Version_Definition
-{
+typedef enum tag_HAL_Manufacturer_Version_Definition {
CHIP_VENDOR_TSMC=   0,
CHIP_VENDOR_UMC =   1,
CHIP_VENDOR_SMIC=   2,
 } HAL_VENDOR_E;
 
-typedef enum tag_HAL_RF_Type_Definition
-{
+typedef enum tag_HAL_RF_Type_Definition {
RF_TYPE_1T1R=   0,
RF_TYPE_1T2R=   1,
RF_TYPE_2T2R=   2,
@@ -66,8 +61,7 @@ typedef enum tag_HAL_RF_Type_Definition
RF_TYPE_4T4R=   7,
 } HAL_RF_TYPE_E;
 
-typedefstruct tag_HAL_VERSION
-{
+typedefstruct tag_HAL_VERSION {
HAL_IC_TYPE_E   ICType;
HAL_CHIP_TYPE_E ChipType;
HAL_CUT_VERSION_E   CUTVersion;
diff --git a/drivers/staging/rtl8723bs/include/drv_types.h 
b/drivers/staging/rtl8723bs/include/drv_types.h
index dba75216cbfe..c73f581aea06 100644
--- a/drivers/staging/rtl8723bs/include/drv_types.h
+++ b/drivers/staging/rtl8723bs/include/drv_types.h
@@ -86,8 +86,7 @@ struct specific_device_id {
 
 };
 
-struct registry_priv
-{
+struct registry_priv {
u8 chip_version;
u8 rfintfs;
u8 lbkmode;
@@ -418,8 +417,7 @@ struct cam_entry_cache {
((u8 *)(x))[6], ((u8 *)(x))[7], ((u8 *)(x))[8], ((u8 *)(x))[9], ((u8 
*)(x))[10], ((u8 *)(x))[11], \
((u8 *)(x))[12], ((u8 *)(x))[13], ((u8 *)(x))[14], ((u8 *)(x))[15]
 
-struct dvobj_priv
-{
+struct dvobj_priv {
/* below is common data */
struct adapter *if1; /* PRIMARY_ADAPTER */
struct adapter *if2; /* SECONDARY_ADAPTER */
diff --git