Re: About the conflict between XFS inode recycle and VFS rcu-walk

2024-05-28 Thread Brian Foster
On Mon, May 27, 2024 at 01:18:23AM +0100, Al Viro wrote:
> On Mon, May 27, 2024 at 07:51:39AM +0800, Ian Kent wrote:
> 
> > Indeed, that's what I found when I had a quick look.
> > 
> > 
> > Maybe a dentry (since that's part of the subject of the path walk and inode
> > is readily
> > 
> > accessible) flag could be used since there's opportunity to set it in vfs
> > callbacks that
> > 
> > are done as a matter of course.
> 
> You might recheck ->d_seq after fetching ->get_link there; with XFS
> ->get_link() unconditionlly failing in RCU mode that would prevent
> this particular problem.  But it would obviously have to be done
> in pick_link() itself (and I refuse to touch that area in 5.4 -
> carrying those changes across the e.g. 5.6 changes in pathwalk
> machinery is too much).
> 

Ian sent a patch along those lines a couple years or so ago:

https://lore.kernel.org/linux-fsdevel/164180589176.86426.501271559065590169.st...@mickey.themaw.net/

I'm still not quite sure why we didn't merge this, at least as a bandaid
fix for the symlink variant of this particular problem..?

Brian

> And it's really just the tip of the iceberg - e.g. I'd expect a massive
> headache in ACL-related part of permission checks, etc.
> 




Re: About the conflict between XFS inode recycle and VFS rcu-walk

2024-05-27 Thread Dave Chinner
On Mon, May 27, 2024 at 09:56:15PM +0800, Jinliang Zheng wrote:
> On Mon, 27 May 2024 at 19:41:18 +1000, Dave Chinner wrote:
> > On Thu, May 16, 2024 at 03:23:40PM +0800, Ian Kent wrote:
> > > On 16/5/24 15:08, Ian Kent wrote:
> > > > On 16/5/24 12:56, Jinliang Zheng wrote:
> > > > In any case what you say is indeed correct, so the comment isn't
> > > > important.
> > > > 
> > > > 
> > > > Fact is it is still a race between the lockless path walk and inode
> > > > eviction
> > > > 
> > > > and xfs recycling. I believe that the xfs recycling code is very hard to
> > > > fix.
> > 
> > Not really for this case. This is simply concurrent pathwalk lookups
> > occurring just after the inode has been evicted from the VFS inode
> > cache. The first lookup hits the XFS inode cache, sees
> > XFS_IRECLAIMABLE, and it then enters xfs_reinit_inode() to
> > reinstantiate the VFS inode to an initial state. The Xfs inode
> > itself is still valid as it hasn't reached the XFS_IRECLAIM state
> > where it will be torn down and freed.
> > 
> > Whilst we are running xfs_reinit_inode(), a second RCU pathwalk has
> > been run and that it is trying to call ->get_link on that same
> > inode. Unfortunately, the first lookup has just set inode->f_ops =
> > &empty_fops as part of the VFS inode reinit, and that then triggers
> > the null pointer deref.
> 
> The RCU pathwalk must occur before xfs_reinit_inode(), and must obtain the
> target inode before xfs_reinit_inode().

I'm not sure I follow - xfs_reinit_inode() typically occurs during a
pathwalk when no dentry for the given path component is found in the
dcache. Hence it has to create the dentry and look up the inode.
i.e.

walk_component()
  lookup_fast() -> doesn't find a valid cached dentry
  lookup_slow()
inode_lock_shared(parent)
parent->i_op->lookup(child)
  xfs_vn_lookup()
xfs_lookup()
  xfs_iget(child)    inode may not exist until here
xfs_iget_recycle(child)
  xfs_reinit_inode(child)
inode_unlock_shared(parent)

The path you are indicating is going wrong is:

link_path_walk()
  walk_component()

step_into(child)
  if (!d_is_symlink(child dentry)) {

return
  }
  pick_link(child)
if (!inode->i_link)
  inode->i_op->get_link()    doesn't exist, not a symlink inode

This implies that lookup_fast() found a symlink dentry with a
d_inode pointer to something that wasn't a symlink. That doesn't
mean that anything has gone wrong with xfs inode recycling within an
RCU grace period.

For example, d_is_symlink() looks purely at the dentry state and
assumes that it matches the dentry->d_inode attached to it:

#define DCACHE_ENTRY_TYPE   (7 << 20) /* bits 20..22 are for 
storing type: */
#define DCACHE_MISS_TYPE(0 << 20) /* Negative dentry */
#define DCACHE_WHITEOUT_TYPE(1 << 20) /* Whiteout dentry (stop 
pathwalk) */
#define DCACHE_DIRECTORY_TYPE   (2 << 20) /* Normal directory */
#define DCACHE_AUTODIR_TYPE (3 << 20) /* Lookupless directory 
(presumed automount) */
#define DCACHE_REGULAR_TYPE (4 << 20) /* Regular file type */
#define DCACHE_SPECIAL_TYPE (5 << 20) /* Other file type */
#define DCACHE_SYMLINK_TYPE (6 << 20) /* Symlink */

static inline unsigned __d_entry_type(const struct dentry *dentry)
{
return dentry->d_flags & DCACHE_ENTRY_TYPE;
}

static inline bool d_is_symlink(const struct dentry *dentry)
{
return __d_entry_type(dentry) == DCACHE_SYMLINK_TYPE;
}

This is a valid optimisation and good for performance, but it does
make it susceptible to memory corruption based failues. i.e.  a
single bit memory corruption can change a DCACHE_DIRECTORY_TYPE
dentry to look like a DCACHE_SYMLINK_TYPE dentry, and then the code
calls pick_link() on a dentry that points to a directory inode and
not a symlink inode.

Such a memory corruption would have an identical crash signature
to the stack trace you posted, hence I'd really like to have solid
confirmation that the crash you are seeing is actually a result of
inode recycling and not something else

> > Once the first lookup has finished the inode_init_always(),
> > xfs_reinit_inode() resets inode->f_ops back to 
> > xfs_symlink_file_ops and get_link calls work again.
> > 
> > Fundamentally, the problem is that we are completely reinitialising
> > the VFS inode within the RCU grace period. i.e. while concurrent RCU
> > pathwalks can still be in progress and find the VFS inode whilst the
> > XFS inode cache is manipulating it.
> > 
> > What we should be doing here is a subset of inode_init_always(),
> > which only reinitialises the bits of the VFS inode we need to
> > initialise rather than the entire inode. The identity of the inode
> > is not changing and so we don't need to go through a transient state
> > where the VFS inode goes xfs symlink -> empty initialised inode ->
> > xfs symlink.
> 
> Sorr

Re: About the conflict between XFS inode recycle and VFS rcu-walk

2024-05-27 Thread Jinliang Zheng
On Mon, 27 May 2024 at 19:41:18 +1000, Dave Chinner wrote:
> On Thu, May 16, 2024 at 03:23:40PM +0800, Ian Kent wrote:
> > On 16/5/24 15:08, Ian Kent wrote:
> > > On 16/5/24 12:56, Jinliang Zheng wrote:
> > > > > I encountered the following calltrace:
> > > > > 
> > > > > [20213.578756] BUG: kernel NULL pointer dereference, address:
> > > > > 
> > > > > [20213.578785] #PF: supervisor instruction fetch in kernel mode
> > > > > [20213.578799] #PF: error_code(0x0010) - not-present page
> > > > > [20213.578812] PGD 3f01d64067 P4D 3f01d64067 PUD 3f01d65067 PMD 0
> > > > > [20213.578828] Oops: 0010 [#1] SMP NOPTI
> > > > > [20213.578839] CPU: 92 PID: 766 Comm: /usr/local/serv Kdump:
> > > > > loaded Not tainted 5.4.241-1-tlinux4-0017.3 #1
> > > > > [20213.578860] Hardware name: New H3C Technologies Co., Ltd.
> > > > > UniServer R4900 G3/RS33M2C9SA, BIOS 2.00.38P02 04/14/2020
> > > > > [20213.578884] RIP: 0010:0x0
> > > > > [20213.578894] Code: Bad RIP value.
> > > > > [20213.578903] RSP: 0018:c90021ebfc38 EFLAGS: 00010246
> > > > > [20213.578916] RAX: 82081f40 RBX: c90021ebfce0 RCX:
> > > > > 
> > > > > [20213.578932] RDX: c90021ebfd48 RSI: 88bfad8d3890 RDI:
> > > > > 
> > > > > [20213.578948] RBP: c90021ebfc70 R08: 0001 R09:
> > > > > 889b9eeae380
> > > > > [20213.578965] R10: 302d34320067 R11: 0001 R12:
> > > > > 
> > > > > [20213.578981] R13: 88bfad8d3890 R14: 889b9eeae380 R15:
> > > > > c90021ebfd48
> > > > > [20213.578998] FS:  7f89c534e740()
> > > > > GS:88c07fd0() knlGS:
> > > > > [20213.579016] CS:  0010 DS:  ES:  CR0: 80050033
> > > > > [20213.579030] CR2: ffd6 CR3: 003f01d90001 CR4:
> > > > > 007706e0
> > > > > [20213.579046] DR0:  DR1:  DR2:
> > > > > 
> > > > > [20213.579062] DR3:  DR6: fffe0ff0 DR7:
> > > > > 0400
> > > > > [20213.579079] PKRU: 5554
> > > > > [20213.579087] Call Trace:
> > > > > [20213.579099]  trailing_symlink+0x1da/0x260
> > > > > [20213.579112]  path_lookupat.isra.53+0x79/0x220
> > > > > [20213.579125]  filename_lookup.part.69+0xa0/0x170
> > > > > [20213.579138]  ? kmem_cache_alloc+0x3f/0x3f0
> > > > > [20213.579151]  ? getname_flags+0x4f/0x1e0
> > > > > [20213.579161]  user_path_at_empty+0x3e/0x50
> > > > > [20213.579172]  vfs_statx+0x76/0xe0
> > > > > [20213.579182]  __do_sys_newstat+0x3d/0x70
> > > > > [20213.579194]  ? fput+0x13/0x20
> > > > > [20213.579203]  ? ksys_ioctl+0xb0/0x300
> > > > > [20213.579213]  ? generic_file_llseek+0x24/0x30
> > > > > [20213.579225]  ? fput+0x13/0x20
> > > > > [20213.579233]  ? ksys_lseek+0x8d/0xb0
> > > > > [20213.579243]  __x64_sys_newstat+0x16/0x20
> > > > > [20213.579256]  do_syscall_64+0x4d/0x140
> > > > > [20213.579268]  entry_SYSCALL_64_after_hwframe+0x5c/0xc1
> > > > > 
> > > > > <<<
> > > > > 
> > > > Please note that the kernel version I use is the one maintained by
> > > > Tencent.Inc,
> > > > and the baseline is v5.4. But in fact, in the latest upstream source
> > > > tree,
> > > > although the trailing_symlink() function has been removed, its logic
> > > > has been
> > > > moved to pick_link(), so the problem still exists.
> > > > 
> > > > Ian Kent pointed out that try_to_unlazy() was introduced in
> > > > pick_link() in the
> > > > latest upstream source tree, but I don't understand why this can
> > > > solve the NULL
> > > > ->get_link pointer dereference problem, because ->get_link pointer
> > > > will be
> > > > dereferenced before try_to_unlazy().
> > > > 
> > > > (I don't understand why Ian Kent's email didn't appear on the
> > > > mailing list.)
> > > 
> > > It was something about html mail and I think my mail client was at fault.
> > > 
> > > In any case what you say is indeed correct, so the comment isn't
> > > important.
> > > 
> > > 
> > > Fact is it is still a race between the lockless path walk and inode
> > > eviction
> > > 
> > > and xfs recycling. I believe that the xfs recycling code is very hard to
> > > fix.
> 
> Not really for this case. This is simply concurrent pathwalk lookups
> occurring just after the inode has been evicted from the VFS inode
> cache. The first lookup hits the XFS inode cache, sees
> XFS_IRECLAIMABLE, and it then enters xfs_reinit_inode() to
> reinstantiate the VFS inode to an initial state. The Xfs inode
> itself is still valid as it hasn't reached the XFS_IRECLAIM state
> where it will be torn down and freed.
> 
> Whilst we are running xfs_reinit_inode(), a second RCU pathwalk has
> been run and that it is trying to call ->get_link on that same
> inode. Unfortunately, the first lookup has just set inode->f_ops =
> &empty_fops as part of the VFS inode reinit, and that then triggers
> the null pointer deref.

The RCU path

Re: About the conflict between XFS inode recycle and VFS rcu-walk

2024-05-27 Thread Dave Chinner
On Thu, May 16, 2024 at 03:23:40PM +0800, Ian Kent wrote:
> On 16/5/24 15:08, Ian Kent wrote:
> > On 16/5/24 12:56, Jinliang Zheng wrote:
> > > > I encountered the following calltrace:
> > > > 
> > > > [20213.578756] BUG: kernel NULL pointer dereference, address:
> > > > 
> > > > [20213.578785] #PF: supervisor instruction fetch in kernel mode
> > > > [20213.578799] #PF: error_code(0x0010) - not-present page
> > > > [20213.578812] PGD 3f01d64067 P4D 3f01d64067 PUD 3f01d65067 PMD 0
> > > > [20213.578828] Oops: 0010 [#1] SMP NOPTI
> > > > [20213.578839] CPU: 92 PID: 766 Comm: /usr/local/serv Kdump:
> > > > loaded Not tainted 5.4.241-1-tlinux4-0017.3 #1
> > > > [20213.578860] Hardware name: New H3C Technologies Co., Ltd.
> > > > UniServer R4900 G3/RS33M2C9SA, BIOS 2.00.38P02 04/14/2020
> > > > [20213.578884] RIP: 0010:0x0
> > > > [20213.578894] Code: Bad RIP value.
> > > > [20213.578903] RSP: 0018:c90021ebfc38 EFLAGS: 00010246
> > > > [20213.578916] RAX: 82081f40 RBX: c90021ebfce0 RCX:
> > > > 
> > > > [20213.578932] RDX: c90021ebfd48 RSI: 88bfad8d3890 RDI:
> > > > 
> > > > [20213.578948] RBP: c90021ebfc70 R08: 0001 R09:
> > > > 889b9eeae380
> > > > [20213.578965] R10: 302d34320067 R11: 0001 R12:
> > > > 
> > > > [20213.578981] R13: 88bfad8d3890 R14: 889b9eeae380 R15:
> > > > c90021ebfd48
> > > > [20213.578998] FS:  7f89c534e740()
> > > > GS:88c07fd0() knlGS:
> > > > [20213.579016] CS:  0010 DS:  ES:  CR0: 80050033
> > > > [20213.579030] CR2: ffd6 CR3: 003f01d90001 CR4:
> > > > 007706e0
> > > > [20213.579046] DR0:  DR1:  DR2:
> > > > 
> > > > [20213.579062] DR3:  DR6: fffe0ff0 DR7:
> > > > 0400
> > > > [20213.579079] PKRU: 5554
> > > > [20213.579087] Call Trace:
> > > > [20213.579099]  trailing_symlink+0x1da/0x260
> > > > [20213.579112]  path_lookupat.isra.53+0x79/0x220
> > > > [20213.579125]  filename_lookup.part.69+0xa0/0x170
> > > > [20213.579138]  ? kmem_cache_alloc+0x3f/0x3f0
> > > > [20213.579151]  ? getname_flags+0x4f/0x1e0
> > > > [20213.579161]  user_path_at_empty+0x3e/0x50
> > > > [20213.579172]  vfs_statx+0x76/0xe0
> > > > [20213.579182]  __do_sys_newstat+0x3d/0x70
> > > > [20213.579194]  ? fput+0x13/0x20
> > > > [20213.579203]  ? ksys_ioctl+0xb0/0x300
> > > > [20213.579213]  ? generic_file_llseek+0x24/0x30
> > > > [20213.579225]  ? fput+0x13/0x20
> > > > [20213.579233]  ? ksys_lseek+0x8d/0xb0
> > > > [20213.579243]  __x64_sys_newstat+0x16/0x20
> > > > [20213.579256]  do_syscall_64+0x4d/0x140
> > > > [20213.579268]  entry_SYSCALL_64_after_hwframe+0x5c/0xc1
> > > > 
> > > > <<<
> > > > 
> > > Please note that the kernel version I use is the one maintained by
> > > Tencent.Inc,
> > > and the baseline is v5.4. But in fact, in the latest upstream source
> > > tree,
> > > although the trailing_symlink() function has been removed, its logic
> > > has been
> > > moved to pick_link(), so the problem still exists.
> > > 
> > > Ian Kent pointed out that try_to_unlazy() was introduced in
> > > pick_link() in the
> > > latest upstream source tree, but I don't understand why this can
> > > solve the NULL
> > > ->get_link pointer dereference problem, because ->get_link pointer
> > > will be
> > > dereferenced before try_to_unlazy().
> > > 
> > > (I don't understand why Ian Kent's email didn't appear on the
> > > mailing list.)
> > 
> > It was something about html mail and I think my mail client was at fault.
> > 
> > In any case what you say is indeed correct, so the comment isn't
> > important.
> > 
> > 
> > Fact is it is still a race between the lockless path walk and inode
> > eviction
> > 
> > and xfs recycling. I believe that the xfs recycling code is very hard to
> > fix.

Not really for this case. This is simply concurrent pathwalk lookups
occurring just after the inode has been evicted from the VFS inode
cache. The first lookup hits the XFS inode cache, sees
XFS_IRECLAIMABLE, and it then enters xfs_reinit_inode() to
reinstantiate the VFS inode to an initial state. The Xfs inode
itself is still valid as it hasn't reached the XFS_IRECLAIM state
where it will be torn down and freed.

Whilst we are running xfs_reinit_inode(), a second RCU pathwalk has
been run and that it is trying to call ->get_link on that same
inode. Unfortunately, the first lookup has just set inode->f_ops =
&empty_fops as part of the VFS inode reinit, and that then triggers
the null pointer deref.

Once the first lookup has finished the inode_init_always(),
xfs_reinit_inode() resets inode->f_ops back to 
xfs_symlink_file_ops and get_link calls work again.

Fundamentally, the problem is that we are completely reinitialising
the VFS inode within the RCU grace period. i.e

Re: About the conflict between XFS inode recycle and VFS rcu-walk

2024-05-26 Thread Al Viro
On Mon, May 27, 2024 at 07:51:39AM +0800, Ian Kent wrote:

> Indeed, that's what I found when I had a quick look.
> 
> 
> Maybe a dentry (since that's part of the subject of the path walk and inode
> is readily
> 
> accessible) flag could be used since there's opportunity to set it in vfs
> callbacks that
> 
> are done as a matter of course.

You might recheck ->d_seq after fetching ->get_link there; with XFS
->get_link() unconditionlly failing in RCU mode that would prevent
this particular problem.  But it would obviously have to be done
in pick_link() itself (and I refuse to touch that area in 5.4 -
carrying those changes across the e.g. 5.6 changes in pathwalk
machinery is too much).

And it's really just the tip of the iceberg - e.g. I'd expect a massive
headache in ACL-related part of permission checks, etc.



Re: About the conflict between XFS inode recycle and VFS rcu-walk

2024-05-26 Thread Ian Kent

On 21/5/24 10:13, Ian Kent wrote:

On 21/5/24 09:35, Ian Kent wrote:

On 21/5/24 01:36, Darrick J. Wong wrote:

On Thu, May 16, 2024 at 03:23:40PM +0800, Ian Kent wrote:

On 16/5/24 15:08, Ian Kent wrote:

On 16/5/24 12:56, Jinliang Zheng wrote:

On Wed, 15 May 2024 at 23:54:41 +0800, Jinliang Zheng wrote:

On Wed, 31 Jan 2024 at 11:30:18 -0800, djw...@kernel.org wrote:

On Wed, Jan 31, 2024 at 02:35:17PM +0800, Jinliang Zheng wrote:

On Fri, 8 Dec 2023 11:14:32 +1100, da...@fromorbit.com wrote:

On Tue, Dec 05, 2023 at 07:38:33PM +0800,
alexjlzh...@gmail.com wrote:

Hi, all

I would like to ask if the conflict between xfs
inode recycle and vfs rcu-walk
which can lead to null pointer references has been resolved?

I browsed through emails about the following
patches and their discussions:
- 
https://lore.kernel.org/linux-xfs/20220217172518.3842951-2-bfos...@redhat.com/
- 
https://lore.kernel.org/linux-xfs/20220121142454.1994916-1-bfos...@redhat.com/
- 
https://lore.kernel.org/linux-xfs/164180589176.86426.501271559065590169.st...@mickey.themaw.net/


And then came to the conclusion that this
problem has not been solved, am I
right? Did I miss some patch that could solve this problem?

We fixed the known problems this caused by turning off the VFS
functionality that the rcu pathwalks kept tripping over. See 
commit

7b7820b83f23 ("xfs: don't expose internal symlink
metadata buffers to
the vfs").

Sorry for the delay.

The problem I encountered in the production environment
was that during the
rcu walk process the ->get_link() pointer was NULL,
which caused a crash.

As far as I know, commit 7b7820b83f23 ("xfs: don't
expose internal symlink
metadata buffers to the vfs") first appeared in:
- 
https://lore.kernel.org/linux-fsdevel/YZvvP9RFXi3%2FjX0q@bfoster/


Does this commit solve the problem of NULL ->get_link()? And how?
I suggest reading the call stack from wherever the VFS enters 
the XFS

readlink code.  If you have a reliable reproducer, then
apply this patch
to your kernel (you haven't mentioned which one it is) and see 
if the

bad dereference goes away.

--D

Sorry for the delay.

I encountered the following calltrace:

[20213.578756] BUG: kernel NULL pointer dereference, address:

[20213.578785] #PF: supervisor instruction fetch in kernel mode
[20213.578799] #PF: error_code(0x0010) - not-present page
[20213.578812] PGD 3f01d64067 P4D 3f01d64067 PUD 3f01d65067 PMD 0
[20213.578828] Oops: 0010 [#1] SMP NOPTI
[20213.578839] CPU: 92 PID: 766 Comm: /usr/local/serv Kdump:
loaded Not tainted 5.4.241-1-tlinux4-0017.3 #1
[20213.578860] Hardware name: New H3C Technologies Co., Ltd.
UniServer R4900 G3/RS33M2C9SA, BIOS 2.00.38P02 04/14/2020
[20213.578884] RIP: 0010:0x0
[20213.578894] Code: Bad RIP value.
[20213.578903] RSP: 0018:c90021ebfc38 EFLAGS: 00010246
[20213.578916] RAX: 82081f40 RBX: c90021ebfce0 RCX:

[20213.578932] RDX: c90021ebfd48 RSI: 88bfad8d3890 RDI:

[20213.578948] RBP: c90021ebfc70 R08: 0001 R09:
889b9eeae380
[20213.578965] R10: 302d34320067 R11: 0001 R12:

[20213.578981] R13: 88bfad8d3890 R14: 889b9eeae380 R15:
c90021ebfd48
[20213.578998] FS:  7f89c534e740()
GS:88c07fd0() knlGS:
[20213.579016] CS:  0010 DS:  ES:  CR0: 80050033
[20213.579030] CR2: ffd6 CR3: 003f01d90001 CR4:
007706e0
[20213.579046] DR0:  DR1:  DR2:

[20213.579062] DR3:  DR6: fffe0ff0 DR7:
0400
[20213.579079] PKRU: 5554
[20213.579087] Call Trace:
[20213.579099]  trailing_symlink+0x1da/0x260
[20213.579112]  path_lookupat.isra.53+0x79/0x220
[20213.579125]  filename_lookup.part.69+0xa0/0x170
[20213.579138]  ? kmem_cache_alloc+0x3f/0x3f0
[20213.579151]  ? getname_flags+0x4f/0x1e0
[20213.579161]  user_path_at_empty+0x3e/0x50
[20213.579172]  vfs_statx+0x76/0xe0
[20213.579182]  __do_sys_newstat+0x3d/0x70
[20213.579194]  ? fput+0x13/0x20
[20213.579203]  ? ksys_ioctl+0xb0/0x300
[20213.579213]  ? generic_file_llseek+0x24/0x30
[20213.579225]  ? fput+0x13/0x20
[20213.579233]  ? ksys_lseek+0x8d/0xb0
[20213.579243]  __x64_sys_newstat+0x16/0x20
[20213.579256]  do_syscall_64+0x4d/0x140
[20213.579268] entry_SYSCALL_64_after_hwframe+0x5c/0xc1

<<< 




Please note that the kernel version I use is the one maintained by
Tencent.Inc,
and the baseline is v5.4. But in fact, in the latest upstream source
tree,
although the trailing_symlink() function has been removed, its logic
has been
moved to pick_link(), so the problem still exists.

Ian Kent pointed out that try_to_unlazy() was introduced in
pick_link() in the
latest upstream source tree, but I don't understand why this can
solve the NULL
->get_link pointer dereference problem, because ->get_link pointer
will be
dereferenced bef

Re: About the conflict between XFS inode recycle and VFS rcu-walk

2024-05-26 Thread Paul E. McKenney
On Sun, May 26, 2024 at 11:04:14PM +0800, Jinliang Zheng wrote:
> On Tue, 21 May 2024 at 10:13:38 +0800, Ian Kent wrote:
> > On 21/5/24 09:35, Ian Kent wrote:
> > > On 21/5/24 01:36, Darrick J. Wong wrote:
> > >> On Thu, May 16, 2024 at 03:23:40PM +0800, Ian Kent wrote:
> > >>> On 16/5/24 15:08, Ian Kent wrote:
> >  On 16/5/24 12:56, Jinliang Zheng wrote:
> > > On Wed, 15 May 2024 at 23:54:41 +0800, Jinliang Zheng wrote:
> > >> On Wed, 31 Jan 2024 at 11:30:18 -0800, djw...@kernel.org wrote:
> > >>> On Wed, Jan 31, 2024 at 02:35:17PM +0800, Jinliang Zheng wrote:
> >  On Fri, 8 Dec 2023 11:14:32 +1100, da...@fromorbit.com wrote:
> > > On Tue, Dec 05, 2023 at 07:38:33PM +0800,
> > > alexjlzh...@gmail.com wrote:
> > >> Hi, all
> > >>
> > >> I would like to ask if the conflict between xfs
> > >> inode recycle and vfs rcu-walk
> > >> which can lead to null pointer references has been resolved?
> > >>
> > >> I browsed through emails about the following
> > >> patches and their discussions:
> > >> - 
> > >> https://lore.kernel.org/linux-xfs/20220217172518.3842951-2-bfos...@redhat.com/
> > >> - 
> > >> https://lore.kernel.org/linux-xfs/20220121142454.1994916-1-bfos...@redhat.com/
> > >> - 
> > >> https://lore.kernel.org/linux-xfs/164180589176.86426.501271559065590169.st...@mickey.themaw.net/
> > >>
> > >> And then came to the conclusion that this
> > >> problem has not been solved, am I
> > >> right? Did I miss some patch that could solve this problem?
> > > We fixed the known problems this caused by turning off the VFS
> > > functionality that the rcu pathwalks kept tripping over. See 
> > > commit
> > > 7b7820b83f23 ("xfs: don't expose internal symlink
> > > metadata buffers to
> > > the vfs").
> >  Sorry for the delay.
> > 
> >  The problem I encountered in the production environment
> >  was that during the
> >  rcu walk process the ->get_link() pointer was NULL,
> >  which caused a crash.
> > 
> >  As far as I know, commit 7b7820b83f23 ("xfs: don't
> >  expose internal symlink
> >  metadata buffers to the vfs") first appeared in:
> >  - 
> >  https://lore.kernel.org/linux-fsdevel/YZvvP9RFXi3%2FjX0q@bfoster/
> > 
> >  Does this commit solve the problem of NULL ->get_link()? And how?
> > >>> I suggest reading the call stack from wherever the VFS enters 
> > >>> the XFS
> > >>> readlink code.  If you have a reliable reproducer, then
> > >>> apply this patch
> > >>> to your kernel (you haven't mentioned which one it is) and see 
> > >>> if the
> > >>> bad dereference goes away.
> > >>>
> > >>> --D
> > >> Sorry for the delay.
> > >>
> > >> I encountered the following calltrace:
> > >>
> > >> [20213.578756] BUG: kernel NULL pointer dereference, address:
> > >> 
> > >> [20213.578785] #PF: supervisor instruction fetch in kernel mode
> > >> [20213.578799] #PF: error_code(0x0010) - not-present page
> > >> [20213.578812] PGD 3f01d64067 P4D 3f01d64067 PUD 3f01d65067 PMD 0
> > >> [20213.578828] Oops: 0010 [#1] SMP NOPTI
> > >> [20213.578839] CPU: 92 PID: 766 Comm: /usr/local/serv Kdump:
> > >> loaded Not tainted 5.4.241-1-tlinux4-0017.3 #1
> > >> [20213.578860] Hardware name: New H3C Technologies Co., Ltd.
> > >> UniServer R4900 G3/RS33M2C9SA, BIOS 2.00.38P02 04/14/2020
> > >> [20213.578884] RIP: 0010:0x0
> > >> [20213.578894] Code: Bad RIP value.
> > >> [20213.578903] RSP: 0018:c90021ebfc38 EFLAGS: 00010246
> > >> [20213.578916] RAX: 82081f40 RBX: c90021ebfce0 RCX:
> > >> 
> > >> [20213.578932] RDX: c90021ebfd48 RSI: 88bfad8d3890 RDI:
> > >> 
> > >> [20213.578948] RBP: c90021ebfc70 R08: 0001 R09:
> > >> 889b9eeae380
> > >> [20213.578965] R10: 302d34320067 R11: 0001 R12:
> > >> 
> > >> [20213.578981] R13: 88bfad8d3890 R14: 889b9eeae380 R15:
> > >> c90021ebfd48
> > >> [20213.578998] FS:  7f89c534e740()
> > >> GS:88c07fd0() knlGS:
> > >> [20213.579016] CS:  0010 DS:  ES:  CR0: 80050033
> > >> [20213.579030] CR2: ffd6 CR3: 003f01d90001 CR4:
> > >> 007706e0
> > >> [20213.579046] DR0:  DR1:  DR2:
> > >> 
> > >> [20213.579062] DR3:  DR6: fffe0ff0 DR7:
> > >> 0400
> > >> [20213.579079] PKRU: 5554
> > >> [20213.579087] Call Trace:
> > >> [20213.579099]  trailing_symlink+0x1da/0x260
> > >> [20213.579112]  pa

Re: About the conflict between XFS inode recycle and VFS rcu-walk

2024-05-26 Thread Jinliang Zheng
On Tue, 21 May 2024 at 10:13:38 +0800, Ian Kent wrote:
> On 21/5/24 09:35, Ian Kent wrote:
> > On 21/5/24 01:36, Darrick J. Wong wrote:
> >> On Thu, May 16, 2024 at 03:23:40PM +0800, Ian Kent wrote:
> >>> On 16/5/24 15:08, Ian Kent wrote:
>  On 16/5/24 12:56, Jinliang Zheng wrote:
> > On Wed, 15 May 2024 at 23:54:41 +0800, Jinliang Zheng wrote:
> >> On Wed, 31 Jan 2024 at 11:30:18 -0800, djw...@kernel.org wrote:
> >>> On Wed, Jan 31, 2024 at 02:35:17PM +0800, Jinliang Zheng wrote:
>  On Fri, 8 Dec 2023 11:14:32 +1100, da...@fromorbit.com wrote:
> > On Tue, Dec 05, 2023 at 07:38:33PM +0800,
> > alexjlzh...@gmail.com wrote:
> >> Hi, all
> >>
> >> I would like to ask if the conflict between xfs
> >> inode recycle and vfs rcu-walk
> >> which can lead to null pointer references has been resolved?
> >>
> >> I browsed through emails about the following
> >> patches and their discussions:
> >> - 
> >> https://lore.kernel.org/linux-xfs/20220217172518.3842951-2-bfos...@redhat.com/
> >> - 
> >> https://lore.kernel.org/linux-xfs/20220121142454.1994916-1-bfos...@redhat.com/
> >> - 
> >> https://lore.kernel.org/linux-xfs/164180589176.86426.501271559065590169.st...@mickey.themaw.net/
> >>
> >> And then came to the conclusion that this
> >> problem has not been solved, am I
> >> right? Did I miss some patch that could solve this problem?
> > We fixed the known problems this caused by turning off the VFS
> > functionality that the rcu pathwalks kept tripping over. See 
> > commit
> > 7b7820b83f23 ("xfs: don't expose internal symlink
> > metadata buffers to
> > the vfs").
>  Sorry for the delay.
> 
>  The problem I encountered in the production environment
>  was that during the
>  rcu walk process the ->get_link() pointer was NULL,
>  which caused a crash.
> 
>  As far as I know, commit 7b7820b83f23 ("xfs: don't
>  expose internal symlink
>  metadata buffers to the vfs") first appeared in:
>  - 
>  https://lore.kernel.org/linux-fsdevel/YZvvP9RFXi3%2FjX0q@bfoster/
> 
>  Does this commit solve the problem of NULL ->get_link()? And how?
> >>> I suggest reading the call stack from wherever the VFS enters 
> >>> the XFS
> >>> readlink code.  If you have a reliable reproducer, then
> >>> apply this patch
> >>> to your kernel (you haven't mentioned which one it is) and see 
> >>> if the
> >>> bad dereference goes away.
> >>>
> >>> --D
> >> Sorry for the delay.
> >>
> >> I encountered the following calltrace:
> >>
> >> [20213.578756] BUG: kernel NULL pointer dereference, address:
> >> 
> >> [20213.578785] #PF: supervisor instruction fetch in kernel mode
> >> [20213.578799] #PF: error_code(0x0010) - not-present page
> >> [20213.578812] PGD 3f01d64067 P4D 3f01d64067 PUD 3f01d65067 PMD 0
> >> [20213.578828] Oops: 0010 [#1] SMP NOPTI
> >> [20213.578839] CPU: 92 PID: 766 Comm: /usr/local/serv Kdump:
> >> loaded Not tainted 5.4.241-1-tlinux4-0017.3 #1
> >> [20213.578860] Hardware name: New H3C Technologies Co., Ltd.
> >> UniServer R4900 G3/RS33M2C9SA, BIOS 2.00.38P02 04/14/2020
> >> [20213.578884] RIP: 0010:0x0
> >> [20213.578894] Code: Bad RIP value.
> >> [20213.578903] RSP: 0018:c90021ebfc38 EFLAGS: 00010246
> >> [20213.578916] RAX: 82081f40 RBX: c90021ebfce0 RCX:
> >> 
> >> [20213.578932] RDX: c90021ebfd48 RSI: 88bfad8d3890 RDI:
> >> 
> >> [20213.578948] RBP: c90021ebfc70 R08: 0001 R09:
> >> 889b9eeae380
> >> [20213.578965] R10: 302d34320067 R11: 0001 R12:
> >> 
> >> [20213.578981] R13: 88bfad8d3890 R14: 889b9eeae380 R15:
> >> c90021ebfd48
> >> [20213.578998] FS:  7f89c534e740()
> >> GS:88c07fd0() knlGS:
> >> [20213.579016] CS:  0010 DS:  ES:  CR0: 80050033
> >> [20213.579030] CR2: ffd6 CR3: 003f01d90001 CR4:
> >> 007706e0
> >> [20213.579046] DR0:  DR1:  DR2:
> >> 
> >> [20213.579062] DR3:  DR6: fffe0ff0 DR7:
> >> 0400
> >> [20213.579079] PKRU: 5554
> >> [20213.579087] Call Trace:
> >> [20213.579099]  trailing_symlink+0x1da/0x260
> >> [20213.579112]  path_lookupat.isra.53+0x79/0x220
> >> [20213.579125]  filename_lookup.part.69+0xa0/0x170
> >> [20213.579138]  ? kmem_cache_alloc+0x3f/0x3f0
> >> [20213.579151]  ? getname_flags+0x4f/0x1e0
> >> [20213.579161]  user_path_at_empty+0x3e/0x50
> >> [

Re: About the conflict between XFS inode recycle and VFS rcu-walk

2024-05-20 Thread Ian Kent

On 21/5/24 09:35, Ian Kent wrote:

On 21/5/24 01:36, Darrick J. Wong wrote:

On Thu, May 16, 2024 at 03:23:40PM +0800, Ian Kent wrote:

On 16/5/24 15:08, Ian Kent wrote:

On 16/5/24 12:56, Jinliang Zheng wrote:

On Wed, 15 May 2024 at 23:54:41 +0800, Jinliang Zheng wrote:

On Wed, 31 Jan 2024 at 11:30:18 -0800, djw...@kernel.org wrote:

On Wed, Jan 31, 2024 at 02:35:17PM +0800, Jinliang Zheng wrote:

On Fri, 8 Dec 2023 11:14:32 +1100, da...@fromorbit.com wrote:

On Tue, Dec 05, 2023 at 07:38:33PM +0800,
alexjlzh...@gmail.com wrote:

Hi, all

I would like to ask if the conflict between xfs
inode recycle and vfs rcu-walk
which can lead to null pointer references has been resolved?

I browsed through emails about the following
patches and their discussions:
- 
https://lore.kernel.org/linux-xfs/20220217172518.3842951-2-bfos...@redhat.com/
- 
https://lore.kernel.org/linux-xfs/20220121142454.1994916-1-bfos...@redhat.com/
- 
https://lore.kernel.org/linux-xfs/164180589176.86426.501271559065590169.st...@mickey.themaw.net/


And then came to the conclusion that this
problem has not been solved, am I
right? Did I miss some patch that could solve this problem?

We fixed the known problems this caused by turning off the VFS
functionality that the rcu pathwalks kept tripping over. See 
commit

7b7820b83f23 ("xfs: don't expose internal symlink
metadata buffers to
the vfs").

Sorry for the delay.

The problem I encountered in the production environment
was that during the
rcu walk process the ->get_link() pointer was NULL,
which caused a crash.

As far as I know, commit 7b7820b83f23 ("xfs: don't
expose internal symlink
metadata buffers to the vfs") first appeared in:
- 
https://lore.kernel.org/linux-fsdevel/YZvvP9RFXi3%2FjX0q@bfoster/


Does this commit solve the problem of NULL ->get_link()? And how?
I suggest reading the call stack from wherever the VFS enters 
the XFS

readlink code.  If you have a reliable reproducer, then
apply this patch
to your kernel (you haven't mentioned which one it is) and see 
if the

bad dereference goes away.

--D

Sorry for the delay.

I encountered the following calltrace:

[20213.578756] BUG: kernel NULL pointer dereference, address:

[20213.578785] #PF: supervisor instruction fetch in kernel mode
[20213.578799] #PF: error_code(0x0010) - not-present page
[20213.578812] PGD 3f01d64067 P4D 3f01d64067 PUD 3f01d65067 PMD 0
[20213.578828] Oops: 0010 [#1] SMP NOPTI
[20213.578839] CPU: 92 PID: 766 Comm: /usr/local/serv Kdump:
loaded Not tainted 5.4.241-1-tlinux4-0017.3 #1
[20213.578860] Hardware name: New H3C Technologies Co., Ltd.
UniServer R4900 G3/RS33M2C9SA, BIOS 2.00.38P02 04/14/2020
[20213.578884] RIP: 0010:0x0
[20213.578894] Code: Bad RIP value.
[20213.578903] RSP: 0018:c90021ebfc38 EFLAGS: 00010246
[20213.578916] RAX: 82081f40 RBX: c90021ebfce0 RCX:

[20213.578932] RDX: c90021ebfd48 RSI: 88bfad8d3890 RDI:

[20213.578948] RBP: c90021ebfc70 R08: 0001 R09:
889b9eeae380
[20213.578965] R10: 302d34320067 R11: 0001 R12:

[20213.578981] R13: 88bfad8d3890 R14: 889b9eeae380 R15:
c90021ebfd48
[20213.578998] FS:  7f89c534e740()
GS:88c07fd0() knlGS:
[20213.579016] CS:  0010 DS:  ES:  CR0: 80050033
[20213.579030] CR2: ffd6 CR3: 003f01d90001 CR4:
007706e0
[20213.579046] DR0:  DR1:  DR2:

[20213.579062] DR3:  DR6: fffe0ff0 DR7:
0400
[20213.579079] PKRU: 5554
[20213.579087] Call Trace:
[20213.579099]  trailing_symlink+0x1da/0x260
[20213.579112]  path_lookupat.isra.53+0x79/0x220
[20213.579125]  filename_lookup.part.69+0xa0/0x170
[20213.579138]  ? kmem_cache_alloc+0x3f/0x3f0
[20213.579151]  ? getname_flags+0x4f/0x1e0
[20213.579161]  user_path_at_empty+0x3e/0x50
[20213.579172]  vfs_statx+0x76/0xe0
[20213.579182]  __do_sys_newstat+0x3d/0x70
[20213.579194]  ? fput+0x13/0x20
[20213.579203]  ? ksys_ioctl+0xb0/0x300
[20213.579213]  ? generic_file_llseek+0x24/0x30
[20213.579225]  ? fput+0x13/0x20
[20213.579233]  ? ksys_lseek+0x8d/0xb0
[20213.579243]  __x64_sys_newstat+0x16/0x20
[20213.579256]  do_syscall_64+0x4d/0x140
[20213.579268]  entry_SYSCALL_64_after_hwframe+0x5c/0xc1

<<< 




Please note that the kernel version I use is the one maintained by
Tencent.Inc,
and the baseline is v5.4. But in fact, in the latest upstream source
tree,
although the trailing_symlink() function has been removed, its logic
has been
moved to pick_link(), so the problem still exists.

Ian Kent pointed out that try_to_unlazy() was introduced in
pick_link() in the
latest upstream source tree, but I don't understand why this can
solve the NULL
->get_link pointer dereference problem, because ->get_link pointer
will be
dereferenced before try_to_unlazy().

(I don't und

Re: About the conflict between XFS inode recycle and VFS rcu-walk

2024-05-20 Thread Ian Kent

On 21/5/24 01:36, Darrick J. Wong wrote:

On Thu, May 16, 2024 at 03:23:40PM +0800, Ian Kent wrote:

On 16/5/24 15:08, Ian Kent wrote:

On 16/5/24 12:56, Jinliang Zheng wrote:

On Wed, 15 May 2024 at 23:54:41 +0800, Jinliang Zheng wrote:

On Wed, 31 Jan 2024 at 11:30:18 -0800, djw...@kernel.org wrote:

On Wed, Jan 31, 2024 at 02:35:17PM +0800, Jinliang Zheng wrote:

On Fri, 8 Dec 2023 11:14:32 +1100, da...@fromorbit.com wrote:

On Tue, Dec 05, 2023 at 07:38:33PM +0800,
alexjlzh...@gmail.com wrote:

Hi, all

I would like to ask if the conflict between xfs
inode recycle and vfs rcu-walk
which can lead to null pointer references has been resolved?

I browsed through emails about the following
patches and their discussions:
- https://lore.kernel.org/linux-xfs/20220217172518.3842951-2-bfos...@redhat.com/
- https://lore.kernel.org/linux-xfs/20220121142454.1994916-1-bfos...@redhat.com/
- 
https://lore.kernel.org/linux-xfs/164180589176.86426.501271559065590169.st...@mickey.themaw.net/

And then came to the conclusion that this
problem has not been solved, am I
right? Did I miss some patch that could solve this problem?

We fixed the known problems this caused by turning off the VFS
functionality that the rcu pathwalks kept tripping over. See commit
7b7820b83f23 ("xfs: don't expose internal symlink
metadata buffers to
the vfs").

Sorry for the delay.

The problem I encountered in the production environment
was that during the
rcu walk process the ->get_link() pointer was NULL,
which caused a crash.

As far as I know, commit 7b7820b83f23 ("xfs: don't
expose internal symlink
metadata buffers to the vfs") first appeared in:
- https://lore.kernel.org/linux-fsdevel/YZvvP9RFXi3%2FjX0q@bfoster/

Does this commit solve the problem of NULL ->get_link()? And how?

I suggest reading the call stack from wherever the VFS enters the XFS
readlink code.  If you have a reliable reproducer, then
apply this patch
to your kernel (you haven't mentioned which one it is) and see if the
bad dereference goes away.

--D

Sorry for the delay.

I encountered the following calltrace:

[20213.578756] BUG: kernel NULL pointer dereference, address:

[20213.578785] #PF: supervisor instruction fetch in kernel mode
[20213.578799] #PF: error_code(0x0010) - not-present page
[20213.578812] PGD 3f01d64067 P4D 3f01d64067 PUD 3f01d65067 PMD 0
[20213.578828] Oops: 0010 [#1] SMP NOPTI
[20213.578839] CPU: 92 PID: 766 Comm: /usr/local/serv Kdump:
loaded Not tainted 5.4.241-1-tlinux4-0017.3 #1
[20213.578860] Hardware name: New H3C Technologies Co., Ltd.
UniServer R4900 G3/RS33M2C9SA, BIOS 2.00.38P02 04/14/2020
[20213.578884] RIP: 0010:0x0
[20213.578894] Code: Bad RIP value.
[20213.578903] RSP: 0018:c90021ebfc38 EFLAGS: 00010246
[20213.578916] RAX: 82081f40 RBX: c90021ebfce0 RCX:

[20213.578932] RDX: c90021ebfd48 RSI: 88bfad8d3890 RDI:

[20213.578948] RBP: c90021ebfc70 R08: 0001 R09:
889b9eeae380
[20213.578965] R10: 302d34320067 R11: 0001 R12:

[20213.578981] R13: 88bfad8d3890 R14: 889b9eeae380 R15:
c90021ebfd48
[20213.578998] FS:  7f89c534e740()
GS:88c07fd0() knlGS:
[20213.579016] CS:  0010 DS:  ES:  CR0: 80050033
[20213.579030] CR2: ffd6 CR3: 003f01d90001 CR4:
007706e0
[20213.579046] DR0:  DR1:  DR2:

[20213.579062] DR3:  DR6: fffe0ff0 DR7:
0400
[20213.579079] PKRU: 5554
[20213.579087] Call Trace:
[20213.579099]  trailing_symlink+0x1da/0x260
[20213.579112]  path_lookupat.isra.53+0x79/0x220
[20213.579125]  filename_lookup.part.69+0xa0/0x170
[20213.579138]  ? kmem_cache_alloc+0x3f/0x3f0
[20213.579151]  ? getname_flags+0x4f/0x1e0
[20213.579161]  user_path_at_empty+0x3e/0x50
[20213.579172]  vfs_statx+0x76/0xe0
[20213.579182]  __do_sys_newstat+0x3d/0x70
[20213.579194]  ? fput+0x13/0x20
[20213.579203]  ? ksys_ioctl+0xb0/0x300
[20213.579213]  ? generic_file_llseek+0x24/0x30
[20213.579225]  ? fput+0x13/0x20
[20213.579233]  ? ksys_lseek+0x8d/0xb0
[20213.579243]  __x64_sys_newstat+0x16/0x20
[20213.579256]  do_syscall_64+0x4d/0x140
[20213.579268]  entry_SYSCALL_64_after_hwframe+0x5c/0xc1

<<<


Please note that the kernel version I use is the one maintained by
Tencent.Inc,
and the baseline is v5.4. But in fact, in the latest upstream source
tree,
although the trailing_symlink() function has been removed, its logic
has been
moved to pick_link(), so the problem still exists.

Ian Kent pointed out that try_to_unlazy() was introduced in
pick_link() in the
latest upstream source tree, but I don't understand why this can
solve the NULL
->get_link pointer dereference problem, because ->get_link pointer
will be
dereferenced before try_to_unlazy().

(I don't understand why Ian Kent's email didn't appear on th

Re: About the conflict between XFS inode recycle and VFS rcu-walk

2024-05-20 Thread Darrick J. Wong
On Thu, May 16, 2024 at 03:23:40PM +0800, Ian Kent wrote:
> 
> On 16/5/24 15:08, Ian Kent wrote:
> > On 16/5/24 12:56, Jinliang Zheng wrote:
> > > On Wed, 15 May 2024 at 23:54:41 +0800, Jinliang Zheng wrote:
> > > > On Wed, 31 Jan 2024 at 11:30:18 -0800, djw...@kernel.org wrote:
> > > > > On Wed, Jan 31, 2024 at 02:35:17PM +0800, Jinliang Zheng wrote:
> > > > > > On Fri, 8 Dec 2023 11:14:32 +1100, da...@fromorbit.com wrote:
> > > > > > > On Tue, Dec 05, 2023 at 07:38:33PM +0800,
> > > > > > > alexjlzh...@gmail.com wrote:
> > > > > > > > Hi, all
> > > > > > > > 
> > > > > > > > I would like to ask if the conflict between xfs
> > > > > > > > inode recycle and vfs rcu-walk
> > > > > > > > which can lead to null pointer references has been resolved?
> > > > > > > > 
> > > > > > > > I browsed through emails about the following
> > > > > > > > patches and their discussions:
> > > > > > > > - 
> > > > > > > > https://lore.kernel.org/linux-xfs/20220217172518.3842951-2-bfos...@redhat.com/
> > > > > > > > - 
> > > > > > > > https://lore.kernel.org/linux-xfs/20220121142454.1994916-1-bfos...@redhat.com/
> > > > > > > > - 
> > > > > > > > https://lore.kernel.org/linux-xfs/164180589176.86426.501271559065590169.st...@mickey.themaw.net/
> > > > > > > > 
> > > > > > > > And then came to the conclusion that this
> > > > > > > > problem has not been solved, am I
> > > > > > > > right? Did I miss some patch that could solve this problem?
> > > > > > > We fixed the known problems this caused by turning off the VFS
> > > > > > > functionality that the rcu pathwalks kept tripping over. See 
> > > > > > > commit
> > > > > > > 7b7820b83f23 ("xfs: don't expose internal symlink
> > > > > > > metadata buffers to
> > > > > > > the vfs").
> > > > > > Sorry for the delay.
> > > > > > 
> > > > > > The problem I encountered in the production environment
> > > > > > was that during the
> > > > > > rcu walk process the ->get_link() pointer was NULL,
> > > > > > which caused a crash.
> > > > > > 
> > > > > > As far as I know, commit 7b7820b83f23 ("xfs: don't
> > > > > > expose internal symlink
> > > > > > metadata buffers to the vfs") first appeared in:
> > > > > > - https://lore.kernel.org/linux-fsdevel/YZvvP9RFXi3%2FjX0q@bfoster/
> > > > > > 
> > > > > > Does this commit solve the problem of NULL ->get_link()? And how?
> > > > > I suggest reading the call stack from wherever the VFS enters the XFS
> > > > > readlink code.  If you have a reliable reproducer, then
> > > > > apply this patch
> > > > > to your kernel (you haven't mentioned which one it is) and see if the
> > > > > bad dereference goes away.
> > > > > 
> > > > > --D
> > > > Sorry for the delay.
> > > > 
> > > > I encountered the following calltrace:
> > > > 
> > > > [20213.578756] BUG: kernel NULL pointer dereference, address:
> > > > 
> > > > [20213.578785] #PF: supervisor instruction fetch in kernel mode
> > > > [20213.578799] #PF: error_code(0x0010) - not-present page
> > > > [20213.578812] PGD 3f01d64067 P4D 3f01d64067 PUD 3f01d65067 PMD 0
> > > > [20213.578828] Oops: 0010 [#1] SMP NOPTI
> > > > [20213.578839] CPU: 92 PID: 766 Comm: /usr/local/serv Kdump:
> > > > loaded Not tainted 5.4.241-1-tlinux4-0017.3 #1
> > > > [20213.578860] Hardware name: New H3C Technologies Co., Ltd.
> > > > UniServer R4900 G3/RS33M2C9SA, BIOS 2.00.38P02 04/14/2020
> > > > [20213.578884] RIP: 0010:0x0
> > > > [20213.578894] Code: Bad RIP value.
> > > > [20213.578903] RSP: 0018:c90021ebfc38 EFLAGS: 00010246
> > > > [20213.578916] RAX: 82081f40 RBX: c90021ebfce0 RCX:
> > > > 
> > > > [20213.578932] RDX: c90021ebfd48 RSI: 88bfad8d3890 RDI:
> > > > 
> > > > [20213.578948] RBP: c90021ebfc70 R08: 0001 R09:
> > > > 889b9eeae380
> > > > [20213.578965] R10: 302d34320067 R11: 0001 R12:
> > > > 
> > > > [20213.578981] R13: 88bfad8d3890 R14: 889b9eeae380 R15:
> > > > c90021ebfd48
> > > > [20213.578998] FS:  7f89c534e740()
> > > > GS:88c07fd0() knlGS:
> > > > [20213.579016] CS:  0010 DS:  ES:  CR0: 80050033
> > > > [20213.579030] CR2: ffd6 CR3: 003f01d90001 CR4:
> > > > 007706e0
> > > > [20213.579046] DR0:  DR1:  DR2:
> > > > 
> > > > [20213.579062] DR3:  DR6: fffe0ff0 DR7:
> > > > 0400
> > > > [20213.579079] PKRU: 5554
> > > > [20213.579087] Call Trace:
> > > > [20213.579099]  trailing_symlink+0x1da/0x260
> > > > [20213.579112]  path_lookupat.isra.53+0x79/0x220
> > > > [20213.579125]  filename_lookup.part.69+0xa0/0x170
> > > > [20213.579138]  ? kmem_cache_alloc+0x3f/0x3f0
> > > > [20213.579151]  ? getname_flags+0x4f/0x1e0
> > > > [20213.579161]  user_path_at_empty+0x3e/0x50
> > > > [20213.579172]  vfs_statx+0x76/0xe0
> > > > [20213.579182]  __do_sys_newstat+0x3d/0x70
> > > > [20213.579194]  ? fput+0x13/0x20
> > > >

Re: About the conflict between XFS inode recycle and VFS rcu-walk

2024-05-16 Thread Ian Kent



On 16/5/24 15:08, Ian Kent wrote:

On 16/5/24 12:56, Jinliang Zheng wrote:

On Wed, 15 May 2024 at 23:54:41 +0800, Jinliang Zheng wrote:

On Wed, 31 Jan 2024 at 11:30:18 -0800, djw...@kernel.org wrote:

On Wed, Jan 31, 2024 at 02:35:17PM +0800, Jinliang Zheng wrote:

On Fri, 8 Dec 2023 11:14:32 +1100, da...@fromorbit.com wrote:
On Tue, Dec 05, 2023 at 07:38:33PM +0800, alexjlzh...@gmail.com 
wrote:

Hi, all

I would like to ask if the conflict between xfs inode recycle 
and vfs rcu-walk

which can lead to null pointer references has been resolved?

I browsed through emails about the following patches and their 
discussions:
- 
https://lore.kernel.org/linux-xfs/20220217172518.3842951-2-bfos...@redhat.com/
- 
https://lore.kernel.org/linux-xfs/20220121142454.1994916-1-bfos...@redhat.com/
- 
https://lore.kernel.org/linux-xfs/164180589176.86426.501271559065590169.st...@mickey.themaw.net/


And then came to the conclusion that this problem has not been 
solved, am I

right? Did I miss some patch that could solve this problem?

We fixed the known problems this caused by turning off the VFS
functionality that the rcu pathwalks kept tripping over. See commit
7b7820b83f23 ("xfs: don't expose internal symlink metadata 
buffers to

the vfs").

Sorry for the delay.

The problem I encountered in the production environment was that 
during the
rcu walk process the ->get_link() pointer was NULL, which caused a 
crash.


As far as I know, commit 7b7820b83f23 ("xfs: don't expose internal 
symlink

metadata buffers to the vfs") first appeared in:
- https://lore.kernel.org/linux-fsdevel/YZvvP9RFXi3%2FjX0q@bfoster/

Does this commit solve the problem of NULL ->get_link()? And how?

I suggest reading the call stack from wherever the VFS enters the XFS
readlink code.  If you have a reliable reproducer, then apply this 
patch

to your kernel (you haven't mentioned which one it is) and see if the
bad dereference goes away.

--D

Sorry for the delay.

I encountered the following calltrace:

[20213.578756] BUG: kernel NULL pointer dereference, address: 


[20213.578785] #PF: supervisor instruction fetch in kernel mode
[20213.578799] #PF: error_code(0x0010) - not-present page
[20213.578812] PGD 3f01d64067 P4D 3f01d64067 PUD 3f01d65067 PMD 0
[20213.578828] Oops: 0010 [#1] SMP NOPTI
[20213.578839] CPU: 92 PID: 766 Comm: /usr/local/serv Kdump: loaded 
Not tainted 5.4.241-1-tlinux4-0017.3 #1
[20213.578860] Hardware name: New H3C Technologies Co., Ltd. 
UniServer R4900 G3/RS33M2C9SA, BIOS 2.00.38P02 04/14/2020

[20213.578884] RIP: 0010:0x0
[20213.578894] Code: Bad RIP value.
[20213.578903] RSP: 0018:c90021ebfc38 EFLAGS: 00010246
[20213.578916] RAX: 82081f40 RBX: c90021ebfce0 RCX: 

[20213.578932] RDX: c90021ebfd48 RSI: 88bfad8d3890 RDI: 

[20213.578948] RBP: c90021ebfc70 R08: 0001 R09: 
889b9eeae380
[20213.578965] R10: 302d34320067 R11: 0001 R12: 

[20213.578981] R13: 88bfad8d3890 R14: 889b9eeae380 R15: 
c90021ebfd48
[20213.578998] FS:  7f89c534e740() GS:88c07fd0() 
knlGS:

[20213.579016] CS:  0010 DS:  ES:  CR0: 80050033
[20213.579030] CR2: ffd6 CR3: 003f01d90001 CR4: 
007706e0
[20213.579046] DR0:  DR1:  DR2: 

[20213.579062] DR3:  DR6: fffe0ff0 DR7: 
0400

[20213.579079] PKRU: 5554
[20213.579087] Call Trace:
[20213.579099]  trailing_symlink+0x1da/0x260
[20213.579112]  path_lookupat.isra.53+0x79/0x220
[20213.579125]  filename_lookup.part.69+0xa0/0x170
[20213.579138]  ? kmem_cache_alloc+0x3f/0x3f0
[20213.579151]  ? getname_flags+0x4f/0x1e0
[20213.579161]  user_path_at_empty+0x3e/0x50
[20213.579172]  vfs_statx+0x76/0xe0
[20213.579182]  __do_sys_newstat+0x3d/0x70
[20213.579194]  ? fput+0x13/0x20
[20213.579203]  ? ksys_ioctl+0xb0/0x300
[20213.579213]  ? generic_file_llseek+0x24/0x30
[20213.579225]  ? fput+0x13/0x20
[20213.579233]  ? ksys_lseek+0x8d/0xb0
[20213.579243]  __x64_sys_newstat+0x16/0x20
[20213.579256]  do_syscall_64+0x4d/0x140
[20213.579268]  entry_SYSCALL_64_after_hwframe+0x5c/0xc1

<<< 

Please note that the kernel version I use is the one maintained by 
Tencent.Inc,
and the baseline is v5.4. But in fact, in the latest upstream source 
tree,
although the trailing_symlink() function has been removed, its logic 
has been

moved to pick_link(), so the problem still exists.

Ian Kent pointed out that try_to_unlazy() was introduced in 
pick_link() in the
latest upstream source tree, but I don't understand why this can 
solve the NULL
->get_link pointer dereference problem, because ->get_link pointer 
will be

dereferenced before try_to_unlazy().

(I don't understand why Ian Kent's email didn't appear on the mailing 
list.)


It was something about html mail and I

Re: About the conflict between XFS inode recycle and VFS rcu-walk

2024-05-16 Thread Ian Kent

On 16/5/24 12:56, Jinliang Zheng wrote:

On Wed, 15 May 2024 at 23:54:41 +0800, Jinliang Zheng wrote:

On Wed, 31 Jan 2024 at 11:30:18 -0800, djw...@kernel.org wrote:

On Wed, Jan 31, 2024 at 02:35:17PM +0800, Jinliang Zheng wrote:

On Fri, 8 Dec 2023 11:14:32 +1100, da...@fromorbit.com wrote:

On Tue, Dec 05, 2023 at 07:38:33PM +0800, alexjlzh...@gmail.com wrote:

Hi, all

I would like to ask if the conflict between xfs inode recycle and vfs rcu-walk
which can lead to null pointer references has been resolved?

I browsed through emails about the following patches and their discussions:
- https://lore.kernel.org/linux-xfs/20220217172518.3842951-2-bfos...@redhat.com/
- https://lore.kernel.org/linux-xfs/20220121142454.1994916-1-bfos...@redhat.com/
- 
https://lore.kernel.org/linux-xfs/164180589176.86426.501271559065590169.st...@mickey.themaw.net/

And then came to the conclusion that this problem has not been solved, am I
right? Did I miss some patch that could solve this problem?

We fixed the known problems this caused by turning off the VFS
functionality that the rcu pathwalks kept tripping over. See commit
7b7820b83f23 ("xfs: don't expose internal symlink metadata buffers to
the vfs").

Sorry for the delay.

The problem I encountered in the production environment was that during the
rcu walk process the ->get_link() pointer was NULL, which caused a crash.

As far as I know, commit 7b7820b83f23 ("xfs: don't expose internal symlink
metadata buffers to the vfs") first appeared in:
- https://lore.kernel.org/linux-fsdevel/YZvvP9RFXi3%2FjX0q@bfoster/

Does this commit solve the problem of NULL ->get_link()? And how?

I suggest reading the call stack from wherever the VFS enters the XFS
readlink code.  If you have a reliable reproducer, then apply this patch
to your kernel (you haven't mentioned which one it is) and see if the
bad dereference goes away.

--D

Sorry for the delay.

I encountered the following calltrace:

[20213.578756] BUG: kernel NULL pointer dereference, address: 
[20213.578785] #PF: supervisor instruction fetch in kernel mode
[20213.578799] #PF: error_code(0x0010) - not-present page
[20213.578812] PGD 3f01d64067 P4D 3f01d64067 PUD 3f01d65067 PMD 0
[20213.578828] Oops: 0010 [#1] SMP NOPTI
[20213.578839] CPU: 92 PID: 766 Comm: /usr/local/serv Kdump: loaded Not tainted 
5.4.241-1-tlinux4-0017.3 #1
[20213.578860] Hardware name: New H3C Technologies Co., Ltd. UniServer R4900 
G3/RS33M2C9SA, BIOS 2.00.38P02 04/14/2020
[20213.578884] RIP: 0010:0x0
[20213.578894] Code: Bad RIP value.
[20213.578903] RSP: 0018:c90021ebfc38 EFLAGS: 00010246
[20213.578916] RAX: 82081f40 RBX: c90021ebfce0 RCX: 
[20213.578932] RDX: c90021ebfd48 RSI: 88bfad8d3890 RDI: 
[20213.578948] RBP: c90021ebfc70 R08: 0001 R09: 889b9eeae380
[20213.578965] R10: 302d34320067 R11: 0001 R12: 
[20213.578981] R13: 88bfad8d3890 R14: 889b9eeae380 R15: c90021ebfd48
[20213.578998] FS:  7f89c534e740() GS:88c07fd0() 
knlGS:
[20213.579016] CS:  0010 DS:  ES:  CR0: 80050033
[20213.579030] CR2: ffd6 CR3: 003f01d90001 CR4: 007706e0
[20213.579046] DR0:  DR1:  DR2: 
[20213.579062] DR3:  DR6: fffe0ff0 DR7: 0400
[20213.579079] PKRU: 5554
[20213.579087] Call Trace:
[20213.579099]  trailing_symlink+0x1da/0x260
[20213.579112]  path_lookupat.isra.53+0x79/0x220
[20213.579125]  filename_lookup.part.69+0xa0/0x170
[20213.579138]  ? kmem_cache_alloc+0x3f/0x3f0
[20213.579151]  ? getname_flags+0x4f/0x1e0
[20213.579161]  user_path_at_empty+0x3e/0x50
[20213.579172]  vfs_statx+0x76/0xe0
[20213.579182]  __do_sys_newstat+0x3d/0x70
[20213.579194]  ? fput+0x13/0x20
[20213.579203]  ? ksys_ioctl+0xb0/0x300
[20213.579213]  ? generic_file_llseek+0x24/0x30
[20213.579225]  ? fput+0x13/0x20
[20213.579233]  ? ksys_lseek+0x8d/0xb0
[20213.579243]  __x64_sys_newstat+0x16/0x20
[20213.579256]  do_syscall_64+0x4d/0x140
[20213.579268]  entry_SYSCALL_64_after_hwframe+0x5c/0xc1

<<<

Please note that the kernel version I use is the one maintained by Tencent.Inc,
and the baseline is v5.4. But in fact, in the latest upstream source tree,
although the trailing_symlink() function has been removed, its logic has been
moved to pick_link(), so the problem still exists.

Ian Kent pointed out that try_to_unlazy() was introduced in pick_link() in the
latest upstream source tree, but I don't understand why this can solve the NULL
->get_link pointer dereference problem, because ->get_link pointer will be
dereferenced before try_to_unlazy().

(I don't understand why Ian Kent's email didn't appear on the mailing list.)


It was something about html mail and I think my mail client was at fault.

In any case what you say is indeed corre

Re: About the conflict between XFS inode recycle and VFS rcu-walk

2024-05-15 Thread Jinliang Zheng
On Wed, 15 May 2024 at 23:54:41 +0800, Jinliang Zheng wrote:
> On Wed, 31 Jan 2024 at 11:30:18 -0800, djw...@kernel.org wrote:
> > On Wed, Jan 31, 2024 at 02:35:17PM +0800, Jinliang Zheng wrote:
> > > On Fri, 8 Dec 2023 11:14:32 +1100, da...@fromorbit.com wrote:
> > > > On Tue, Dec 05, 2023 at 07:38:33PM +0800, alexjlzh...@gmail.com wrote:
> > > > > Hi, all
> > > > > 
> > > > > I would like to ask if the conflict between xfs inode recycle and vfs 
> > > > > rcu-walk
> > > > > which can lead to null pointer references has been resolved?
> > > > > 
> > > > > I browsed through emails about the following patches and their 
> > > > > discussions:
> > > > > - 
> > > > > https://lore.kernel.org/linux-xfs/20220217172518.3842951-2-bfos...@redhat.com/
> > > > > - 
> > > > > https://lore.kernel.org/linux-xfs/20220121142454.1994916-1-bfos...@redhat.com/
> > > > > - 
> > > > > https://lore.kernel.org/linux-xfs/164180589176.86426.501271559065590169.st...@mickey.themaw.net/
> > > > > 
> > > > > And then came to the conclusion that this problem has not been 
> > > > > solved, am I
> > > > > right? Did I miss some patch that could solve this problem?
> > > > 
> > > > We fixed the known problems this caused by turning off the VFS
> > > > functionality that the rcu pathwalks kept tripping over. See commit
> > > > 7b7820b83f23 ("xfs: don't expose internal symlink metadata buffers to
> > > > the vfs").
> > > 
> > > Sorry for the delay.
> > > 
> > > The problem I encountered in the production environment was that during 
> > > the
> > > rcu walk process the ->get_link() pointer was NULL, which caused a crash.
> > > 
> > > As far as I know, commit 7b7820b83f23 ("xfs: don't expose internal symlink
> > > metadata buffers to the vfs") first appeared in:
> > > - https://lore.kernel.org/linux-fsdevel/YZvvP9RFXi3%2FjX0q@bfoster/
> > > 
> > > Does this commit solve the problem of NULL ->get_link()? And how?
> > 
> > I suggest reading the call stack from wherever the VFS enters the XFS
> > readlink code.  If you have a reliable reproducer, then apply this patch
> > to your kernel (you haven't mentioned which one it is) and see if the
> > bad dereference goes away.
> > 
> > --D
> 
> Sorry for the delay.
> 
> I encountered the following calltrace:
> 
>  
> >>
> 
> [20213.578756] BUG: kernel NULL pointer dereference, address: 
> [20213.578785] #PF: supervisor instruction fetch in kernel mode
> [20213.578799] #PF: error_code(0x0010) - not-present page
> [20213.578812] PGD 3f01d64067 P4D 3f01d64067 PUD 3f01d65067 PMD 0
> [20213.578828] Oops: 0010 [#1] SMP NOPTI
> [20213.578839] CPU: 92 PID: 766 Comm: /usr/local/serv Kdump: loaded Not 
> tainted 5.4.241-1-tlinux4-0017.3 #1
> [20213.578860] Hardware name: New H3C Technologies Co., Ltd. UniServer R4900 
> G3/RS33M2C9SA, BIOS 2.00.38P02 04/14/2020
> [20213.578884] RIP: 0010:0x0
> [20213.578894] Code: Bad RIP value.
> [20213.578903] RSP: 0018:c90021ebfc38 EFLAGS: 00010246
> [20213.578916] RAX: 82081f40 RBX: c90021ebfce0 RCX: 
> 
> [20213.578932] RDX: c90021ebfd48 RSI: 88bfad8d3890 RDI: 
> 
> [20213.578948] RBP: c90021ebfc70 R08: 0001 R09: 
> 889b9eeae380
> [20213.578965] R10: 302d34320067 R11: 0001 R12: 
> 
> [20213.578981] R13: 88bfad8d3890 R14: 889b9eeae380 R15: 
> c90021ebfd48
> [20213.578998] FS:  7f89c534e740() GS:88c07fd0() 
> knlGS:
> [20213.579016] CS:  0010 DS:  ES:  CR0: 80050033
> [20213.579030] CR2: ffd6 CR3: 003f01d90001 CR4: 
> 007706e0
> [20213.579046] DR0:  DR1:  DR2: 
> 
> [20213.579062] DR3:  DR6: fffe0ff0 DR7: 
> 0400
> [20213.579079] PKRU: 5554
> [20213.579087] Call Trace:
> [20213.579099]  trailing_symlink+0x1da/0x260
> [20213.579112]  path_lookupat.isra.53+0x79/0x220
> [20213.579125]  filename_lookup.part.69+0xa0/0x170
> [20213.579138]  ? kmem_cache_alloc+0x3f/0x3f0
> [20213.579151]  ? getname_flags+0x4f/0x1e0
> [20213.579161]  user_path_at_empty+0x3e/0x50
> [20213.579172]  vfs_statx+0x76/0xe0
> [20213.579182]  __do_sys_newstat+0x3d/0x70
> [20213.579194]  ? fput+0x13/0x20
> [20213.579203]  ? ksys_ioctl+0xb0/0x300
> [20213.579213]  ? generic_file_llseek+0x24/0x30
> [20213.579225]  ? fput+0x13/0x20
> [20213.579233]  ? ksys_lseek+0x8d/0xb0
> [20213.579243]  __x64_sys_newstat+0x16/0x20
> [20213.579256]  do_syscall_64+0x4d/0x140
> [20213.579268]  entry_SYSCALL_64_after_hwframe+0x5c/0xc1
> 
> <<<

Please note that the kernel version I use is the one maintained by Tencent.Inc,
and 

Re: About the conflict between XFS inode recycle and VFS rcu-walk

2024-05-15 Thread alexjlzheng
On Wed, 31 Jan 2024 at 11:30:18 -0800, djw...@kernel.org wrote:
> On Wed, Jan 31, 2024 at 02:35:17PM +0800, Jinliang Zheng wrote:
> > On Fri, 8 Dec 2023 11:14:32 +1100, da...@fromorbit.com wrote:
> > > On Tue, Dec 05, 2023 at 07:38:33PM +0800, alexjlzh...@gmail.com wrote:
> > > > Hi, all
> > > > 
> > > > I would like to ask if the conflict between xfs inode recycle and vfs 
> > > > rcu-walk
> > > > which can lead to null pointer references has been resolved?
> > > > 
> > > > I browsed through emails about the following patches and their 
> > > > discussions:
> > > > - 
> > > > https://lore.kernel.org/linux-xfs/20220217172518.3842951-2-bfos...@redhat.com/
> > > > - 
> > > > https://lore.kernel.org/linux-xfs/20220121142454.1994916-1-bfos...@redhat.com/
> > > > - 
> > > > https://lore.kernel.org/linux-xfs/164180589176.86426.501271559065590169.st...@mickey.themaw.net/
> > > > 
> > > > And then came to the conclusion that this problem has not been solved, 
> > > > am I
> > > > right? Did I miss some patch that could solve this problem?
> > > 
> > > We fixed the known problems this caused by turning off the VFS
> > > functionality that the rcu pathwalks kept tripping over. See commit
> > > 7b7820b83f23 ("xfs: don't expose internal symlink metadata buffers to
> > > the vfs").
> > 
> > Sorry for the delay.
> > 
> > The problem I encountered in the production environment was that during the
> > rcu walk process the ->get_link() pointer was NULL, which caused a crash.
> > 
> > As far as I know, commit 7b7820b83f23 ("xfs: don't expose internal symlink
> > metadata buffers to the vfs") first appeared in:
> > - https://lore.kernel.org/linux-fsdevel/YZvvP9RFXi3%2FjX0q@bfoster/
> > 
> > Does this commit solve the problem of NULL ->get_link()? And how?
> 
> I suggest reading the call stack from wherever the VFS enters the XFS
> readlink code.  If you have a reliable reproducer, then apply this patch
> to your kernel (you haven't mentioned which one it is) and see if the
> bad dereference goes away.
> 
> --D

Sorry for the delay.

I encountered the following calltrace:

>>>

[20213.578756] BUG: kernel NULL pointer dereference, address: 
[20213.578785] #PF: supervisor instruction fetch in kernel mode
[20213.578799] #PF: error_code(0x0010) - not-present page
[20213.578812] PGD 3f01d64067 P4D 3f01d64067 PUD 3f01d65067 PMD 0
[20213.578828] Oops: 0010 [#1] SMP NOPTI
[20213.578839] CPU: 92 PID: 766 Comm: /usr/local/serv Kdump: loaded Not tainted 
5.4.241-1-tlinux4-0017.3 #1
[20213.578860] Hardware name: New H3C Technologies Co., Ltd. UniServer R4900 
G3/RS33M2C9SA, BIOS 2.00.38P02 04/14/2020
[20213.578884] RIP: 0010:0x0
[20213.578894] Code: Bad RIP value.
[20213.578903] RSP: 0018:c90021ebfc38 EFLAGS: 00010246
[20213.578916] RAX: 82081f40 RBX: c90021ebfce0 RCX: 
[20213.578932] RDX: c90021ebfd48 RSI: 88bfad8d3890 RDI: 
[20213.578948] RBP: c90021ebfc70 R08: 0001 R09: 889b9eeae380
[20213.578965] R10: 302d34320067 R11: 0001 R12: 
[20213.578981] R13: 88bfad8d3890 R14: 889b9eeae380 R15: c90021ebfd48
[20213.578998] FS:  7f89c534e740() GS:88c07fd0() 
knlGS:
[20213.579016] CS:  0010 DS:  ES:  CR0: 80050033
[20213.579030] CR2: ffd6 CR3: 003f01d90001 CR4: 007706e0
[20213.579046] DR0:  DR1:  DR2: 
[20213.579062] DR3:  DR6: fffe0ff0 DR7: 0400
[20213.579079] PKRU: 5554
[20213.579087] Call Trace:
[20213.579099]  trailing_symlink+0x1da/0x260
[20213.579112]  path_lookupat.isra.53+0x79/0x220
[20213.579125]  filename_lookup.part.69+0xa0/0x170
[20213.579138]  ? kmem_cache_alloc+0x3f/0x3f0
[20213.579151]  ? getname_flags+0x4f/0x1e0
[20213.579161]  user_path_at_empty+0x3e/0x50
[20213.579172]  vfs_statx+0x76/0xe0
[20213.579182]  __do_sys_newstat+0x3d/0x70
[20213.579194]  ? fput+0x13/0x20
[20213.579203]  ? ksys_ioctl+0xb0/0x300
[20213.579213]  ? generic_file_llseek+0x24/0x30
[20213.579225]  ? fput+0x13/0x20
[20213.579233]  ? ksys_lseek+0x8d/0xb0
[20213.579243]  __x64_sys_newstat+0x16/0x20
[20213.579256]  do_syscall_64+0x4d/0x140
[20213.579268]  entry_SYSCALL_64_after_hwframe+0x5c/0xc1

<<<

And I analyzed the disassembly of trailing_symlink() and confirmed that a NULL
->get_link() happened here:

>>>

0x812e4850 :  nopl   0x0(%rax,%rax,1) [FTRACE NOP]
0x812e4855 :  push   %rbp
0x812e4856 :  mov%rsp,%rbp
0x812e4859 :  push   %r15
0x812e485b :  push   %r14
0x812e485d :  push   %r13
0x812e485f :  push   %r12
0x812e4861 : push   %rbx
0x8

Re: About the conflict between XFS inode recycle and VFS rcu-walk

2024-01-31 Thread Darrick J. Wong
On Wed, Jan 31, 2024 at 02:35:17PM +0800, Jinliang Zheng wrote:
> On Fri, 8 Dec 2023 11:14:32 +1100, da...@fromorbit.com wrote:
> > On Tue, Dec 05, 2023 at 07:38:33PM +0800, alexjlzh...@gmail.com wrote:
> > > Hi, all
> > > 
> > > I would like to ask if the conflict between xfs inode recycle and vfs 
> > > rcu-walk
> > > which can lead to null pointer references has been resolved?
> > > 
> > > I browsed through emails about the following patches and their 
> > > discussions:
> > > - 
> > > https://lore.kernel.org/linux-xfs/20220217172518.3842951-2-bfos...@redhat.com/
> > > - 
> > > https://lore.kernel.org/linux-xfs/20220121142454.1994916-1-bfos...@redhat.com/
> > > - 
> > > https://lore.kernel.org/linux-xfs/164180589176.86426.501271559065590169.st...@mickey.themaw.net/
> > > 
> > > And then came to the conclusion that this problem has not been solved, am 
> > > I
> > > right? Did I miss some patch that could solve this problem?
> > 
> > We fixed the known problems this caused by turning off the VFS
> > functionality that the rcu pathwalks kept tripping over. See commit
> > 7b7820b83f23 ("xfs: don't expose internal symlink metadata buffers to
> > the vfs").
> 
> Sorry for the delay.
> 
> The problem I encountered in the production environment was that during the
> rcu walk process the ->get_link() pointer was NULL, which caused a crash.
> 
> As far as I know, commit 7b7820b83f23 ("xfs: don't expose internal symlink
> metadata buffers to the vfs") first appeared in:
> - https://lore.kernel.org/linux-fsdevel/YZvvP9RFXi3%2FjX0q@bfoster/
> 
> Does this commit solve the problem of NULL ->get_link()? And how?

I suggest reading the call stack from wherever the VFS enters the XFS
readlink code.  If you have a reliable reproducer, then apply this patch
to your kernel (you haven't mentioned which one it is) and see if the
bad dereference goes away.

--D

> > 
> > Apart from that issue, I'm not aware of any other issues that the
> > XFS inode recycling directly exposes.
> > 
> > > According to my understanding, the essence of this problem is that XFS 
> > > reuses
> > > the inode evicted by VFS, but VFS rcu-walk assumes that this will not 
> > > happen.
> > 
> > It assumes that the inode will not change identity during the RCU
> > grace period after the inode has been evicted from cache. We can
> > safely reinstantiate an evicted inode without waiting for an RCU
> > grace period as long as it is the same inode with the same content
> > and same state.
> > 
> > Problems *may* arise when we unlink the inode, then evict it, then a
> > new file is created and the old slab cache memory address is used
> > for the new inode. I describe the issue here:
> > 
> > https://lore.kernel.org/linux-xfs/20220118232547.gd59...@dread.disaster.area/
> 
> And judging from the relevant emails, the main reason why ->get_link() is set
> to NULL should be the lack of synchronize_rcu() before xfs_reinit_inode() when
> the inode is chosen to be reused.
> 
> However, perhaps due to performance reasons, this solution has not been merged
> for a long time. How is it now? 
> 
> Maybe I am missing something in the threads of mail?
> 
> Thank you very much. :)
> Jinliang Zheng
> 
> > 
> > That said, we have exactly zero evidence that this is actually a
> > problem in production systems. We did get systems tripping over the
> > symlink issue, but there's no evidence that the
> > unlink->close->open(O_CREAT) issues are manifesting in the wild and
> > hence there hasn't been any particular urgency to address it.
> > 
> > > Are there any recommended workarounds until an elegant and efficient 
> > > solution
> > > can be proposed? After all, causing a crash is extremely unacceptable in a
> > > production environment.
> > 
> > What crashes are you seeing in your production environment?
> > 
> > -Dave.
> > -- 
> > Dave Chinner
> > da...@fromorbit.com
> 



Re: About the conflict between XFS inode recycle and VFS rcu-walk

2024-01-30 Thread Jinliang Zheng
On Fri, 8 Dec 2023 11:14:32 +1100, da...@fromorbit.com wrote:
> On Tue, Dec 05, 2023 at 07:38:33PM +0800, alexjlzh...@gmail.com wrote:
> > Hi, all
> > 
> > I would like to ask if the conflict between xfs inode recycle and vfs 
> > rcu-walk
> > which can lead to null pointer references has been resolved?
> > 
> > I browsed through emails about the following patches and their discussions:
> > - 
> > https://lore.kernel.org/linux-xfs/20220217172518.3842951-2-bfos...@redhat.com/
> > - 
> > https://lore.kernel.org/linux-xfs/20220121142454.1994916-1-bfos...@redhat.com/
> > - 
> > https://lore.kernel.org/linux-xfs/164180589176.86426.501271559065590169.st...@mickey.themaw.net/
> > 
> > And then came to the conclusion that this problem has not been solved, am I
> > right? Did I miss some patch that could solve this problem?
> 
> We fixed the known problems this caused by turning off the VFS
> functionality that the rcu pathwalks kept tripping over. See commit
> 7b7820b83f23 ("xfs: don't expose internal symlink metadata buffers to
> the vfs").

Sorry for the delay.

The problem I encountered in the production environment was that during the
rcu walk process the ->get_link() pointer was NULL, which caused a crash.

As far as I know, commit 7b7820b83f23 ("xfs: don't expose internal symlink
metadata buffers to the vfs") first appeared in:
- https://lore.kernel.org/linux-fsdevel/YZvvP9RFXi3%2FjX0q@bfoster/

Does this commit solve the problem of NULL ->get_link()? And how?

> 
> Apart from that issue, I'm not aware of any other issues that the
> XFS inode recycling directly exposes.
> 
> > According to my understanding, the essence of this problem is that XFS 
> > reuses
> > the inode evicted by VFS, but VFS rcu-walk assumes that this will not 
> > happen.
> 
> It assumes that the inode will not change identity during the RCU
> grace period after the inode has been evicted from cache. We can
> safely reinstantiate an evicted inode without waiting for an RCU
> grace period as long as it is the same inode with the same content
> and same state.
> 
> Problems *may* arise when we unlink the inode, then evict it, then a
> new file is created and the old slab cache memory address is used
> for the new inode. I describe the issue here:
> 
> https://lore.kernel.org/linux-xfs/20220118232547.gd59...@dread.disaster.area/

And judging from the relevant emails, the main reason why ->get_link() is set
to NULL should be the lack of synchronize_rcu() before xfs_reinit_inode() when
the inode is chosen to be reused.

However, perhaps due to performance reasons, this solution has not been merged
for a long time. How is it now? 

Maybe I am missing something in the threads of mail?

Thank you very much. :)
Jinliang Zheng

> 
> That said, we have exactly zero evidence that this is actually a
> problem in production systems. We did get systems tripping over the
> symlink issue, but there's no evidence that the
> unlink->close->open(O_CREAT) issues are manifesting in the wild and
> hence there hasn't been any particular urgency to address it.
> 
> > Are there any recommended workarounds until an elegant and efficient 
> > solution
> > can be proposed? After all, causing a crash is extremely unacceptable in a
> > production environment.
> 
> What crashes are you seeing in your production environment?
> 
> -Dave.
> -- 
> Dave Chinner
> da...@fromorbit.com



Re: About the conflict between XFS inode recycle and VFS rcu-walk

2023-12-07 Thread Dave Chinner
On Tue, Dec 05, 2023 at 07:38:33PM +0800, alexjlzh...@gmail.com wrote:
> Hi, all
> 
> I would like to ask if the conflict between xfs inode recycle and vfs rcu-walk
> which can lead to null pointer references has been resolved?
> 
> I browsed through emails about the following patches and their discussions:
> - 
> https://lore.kernel.org/linux-xfs/20220217172518.3842951-2-bfos...@redhat.com/
> - 
> https://lore.kernel.org/linux-xfs/20220121142454.1994916-1-bfos...@redhat.com/
> - 
> https://lore.kernel.org/linux-xfs/164180589176.86426.501271559065590169.st...@mickey.themaw.net/
> 
> And then came to the conclusion that this problem has not been solved, am I
> right? Did I miss some patch that could solve this problem?

We fixed the known problems this caused by turning off the VFS
functionality that the rcu pathwalks kept tripping over. See commit
7b7820b83f23 ("xfs: don't expose internal symlink metadata buffers to
the vfs").

Apart from that issue, I'm not aware of any other issues that the
XFS inode recycling directly exposes.

> According to my understanding, the essence of this problem is that XFS reuses
> the inode evicted by VFS, but VFS rcu-walk assumes that this will not happen.

It assumes that the inode will not change identity during the RCU
grace period after the inode has been evicted from cache. We can
safely reinstantiate an evicted inode without waiting for an RCU
grace period as long as it is the same inode with the same content
and same state.

Problems *may* arise when we unlink the inode, then evict it, then a
new file is created and the old slab cache memory address is used
for the new inode. I describe the issue here:

https://lore.kernel.org/linux-xfs/20220118232547.gd59...@dread.disaster.area/

That said, we have exactly zero evidence that this is actually a
problem in production systems. We did get systems tripping over the
symlink issue, but there's no evidence that the
unlink->close->open(O_CREAT) issues are manifesting in the wild and
hence there hasn't been any particular urgency to address it.

> Are there any recommended workarounds until an elegant and efficient solution
> can be proposed? After all, causing a crash is extremely unacceptable in a
> production environment.

What crashes are you seeing in your production environment?

-Dave.
-- 
Dave Chinner
da...@fromorbit.com



About the conflict between XFS inode recycle and VFS rcu-walk

2023-12-05 Thread alexjlzheng
Hi, all

I would like to ask if the conflict between xfs inode recycle and vfs rcu-walk
which can lead to null pointer references has been resolved?

I browsed through emails about the following patches and their discussions:
- https://lore.kernel.org/linux-xfs/20220217172518.3842951-2-bfos...@redhat.com/
- https://lore.kernel.org/linux-xfs/20220121142454.1994916-1-bfos...@redhat.com/
- 
https://lore.kernel.org/linux-xfs/164180589176.86426.501271559065590169.st...@mickey.themaw.net/

And then came to the conclusion that this problem has not been solved, am I
right? Did I miss some patch that could solve this problem?

According to my understanding, the essence of this problem is that XFS reuses
the inode evicted by VFS, but VFS rcu-walk assumes that this will not happen.
Are there any recommended workarounds until an elegant and efficient solution
can be proposed? After all, causing a crash is extremely unacceptable in a
production environment.

Thank you very much for your advice :)
Jinliang Zheng