Re: [GIT PULL] proc fixes v2 for v5.8-rc1
Linus Torvalds writes: > On Fri, Jun 12, 2020 at 1:06 PM Eric W. Biederman > wrote: >> >> I have a sense that a use after free that anyone can trigger could be a >> bit dangerous, and despite not being the only virtual filesystem in the >> kernel proc is the only virtual filesystem that called new_inode_pseudo. > > So the reason I pulled that change despite my questions was that I do > agree with the whole "there's probably little point to use > new_inode_pseudo() here" argument. > > But at the same time, I ghet the feeling that this partly just is > papering over the problem. If fsnotify causes problems with a > new_inode_pseudo() inode, then fsnotify should be _checking_ for that > case. > > And if fsnotify were to check for it, then the reason for /proc to use > it would largely go away. Maybe the debug check for umount matters, > but honestly it doesn't really seem to be a big deal. > > A pseudo-inode is basically independent of the filesystem it was > mounted as, so the generic_shutdown_super() check not triggering for > them is sensible, I feel. > > But yeah, we could also make the rule go the other way, and simply > make sure that "new_inode_pseudo()" itself checks that the super-block > you give it is something fundamenally unmountable and was created with > 'kern_mount()'. > > That would have also figured out that the /proc case was broken. > > So the main objection I have here is really that this fix feels > incomplete, and doesn't really reflect the underlying issue, just > fixes the symptom. > > Either the underlying issue is that you shouldn't call 'fsnotify' on > /proc, or the underlying issue is that /proc was using a bad inode and > nobody even noticed until the fsnotify issue. > > This is not a huge deal. I think you've fixed the bug, I just have > this itch that the thing that triggered it shouldn't have happened in > the first place. Yeah. I have a similar feeling. Especially since it took about 7 years for someone to notice something was not right. Still right now I am not up to digging and adding warnings and causing things to fail. What energy I have I am going to use to keep sorting out exec. I am not ready to page back in the fine details of how all of the filesystem pieces interact right now. I am wondering now if there are any crazy corner cases of the unix domain sockets where fsnotify could latch onto one, and have problems. It looks like the inode comes from the underlying filesystem so we should be safe. Eric
Re: [GIT PULL] proc fixes v2 for v5.8-rc1
On Fri, Jun 12, 2020 at 1:06 PM Eric W. Biederman wrote: > > I have a sense that a use after free that anyone can trigger could be a > bit dangerous, and despite not being the only virtual filesystem in the > kernel proc is the only virtual filesystem that called new_inode_pseudo. So the reason I pulled that change despite my questions was that I do agree with the whole "there's probably little point to use new_inode_pseudo() here" argument. But at the same time, I ghet the feeling that this partly just is papering over the problem. If fsnotify causes problems with a new_inode_pseudo() inode, then fsnotify should be _checking_ for that case. And if fsnotify were to check for it, then the reason for /proc to use it would largely go away. Maybe the debug check for umount matters, but honestly it doesn't really seem to be a big deal. A pseudo-inode is basically independent of the filesystem it was mounted as, so the generic_shutdown_super() check not triggering for them is sensible, I feel. But yeah, we could also make the rule go the other way, and simply make sure that "new_inode_pseudo()" itself checks that the super-block you give it is something fundamenally unmountable and was created with 'kern_mount()'. That would have also figured out that the /proc case was broken. So the main objection I have here is really that this fix feels incomplete, and doesn't really reflect the underlying issue, just fixes the symptom. Either the underlying issue is that you shouldn't call 'fsnotify' on /proc, or the underlying issue is that /proc was using a bad inode and nobody even noticed until the fsnotify issue. This is not a huge deal. I think you've fixed the bug, I just have this itch that the thing that triggered it shouldn't have happened in the first place. Linus
Re: [GIT PULL] proc fixes v2 for v5.8-rc1
Linus Torvalds writes: > On Fri, Jun 12, 2020 at 12:34 PM Eric W. Biederman > wrote: >> >> ebied...@xmission.com (Eric W. Biederman) writes: > > What happened to that first version of the email? I never got it.. A little distracted I think. I forgot to edit the above line out, and v2 because it is my second pull request for a proc fix into v5.8-rc1. >> Looking at the code the fsnotify watch should have been removed by >> fsnotify_sb_delete in generic_shutdown_super. > > Hmm. Correct. The new_inode_pseudo() is for things that don't have > quotas, fsnotify or writeback. > > That was used somewhat intentionally on /proc, though. /proc certainly > doesn't have quotas or writeback. It also means we break our debugging by not putting inodes on the s_inodes list. AKA this line in generic_shutdown_super is also depent on that behavior. if (!list_empty(>s_inodes)) { printk("VFS: Busy inodes after unmount of %s. " "Self-destruct in 5 seconds. Have a nice day...\n", sb->s_id); } > And fsnotify on /proc seems a bit questionably too. Do people actually > _do_ this and depend on it, or is this just about syzbot doing > something odd and thus showing the problem? > > Anyway, I have pulled your fix, because I think it's reasonable and > safe, but I do wonder if we should have kept the new_inode_pseudo(), > and instead just make fsnotify say "you can't notify on an inode that > isn't on the superblock list". Hmm? > > Is fsnotify on /proc really sensible? Do we actually generate any > useful notifications? I don't know of any proc code that does anything to make fsnotify/inotify work. Since changes to proc are not initialiated through the vfs that probably means fsnotify is pretty much useless. I have a sense that a use after free that anyone can trigger could be a bit dangerous, and despite not being the only virtual filesystem in the kernel proc is the only virtual filesystem that called new_inode_pseudo. Eric
Re: Re: [GIT PULL] proc fixes v2 for v5.8-rc1
The pull request you sent on Fri, 12 Jun 2020 14:29:50 -0500: > git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git > proc-linus has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/44ebe016df3aad96e3be8f95ec52397728dd7701 Thank you! -- Deet-doot-dot, I am a bot. https://korg.wiki.kernel.org/userdoc/prtracker
Re: [GIT PULL] proc fixes v2 for v5.8-rc1
On Fri, Jun 12, 2020 at 12:34 PM Eric W. Biederman wrote: > > ebied...@xmission.com (Eric W. Biederman) writes: What happened to that first version of the email? I never got it.. > Looking at the code the fsnotify watch should have been removed by > fsnotify_sb_delete in generic_shutdown_super. Hmm. Correct. The new_inode_pseudo() is for things that don't have quotas, fsnotify or writeback. That was used somewhat intentionally on /proc, though. /proc certainly doesn't have quotas or writeback. And fsnotify on /proc seems a bit questionably too. Do people actually _do_ this and depend on it, or is this just about syzbot doing something odd and thus showing the problem? Anyway, I have pulled your fix, because I think it's reasonable and safe, but I do wonder if we should have kept the new_inode_pseudo(), and instead just make fsnotify say "you can't notify on an inode that isn't on the superblock list". Hmm? Is fsnotify on /proc really sensible? Do we actually generate any useful notifications? Linus
Re: [GIT PULL] proc fixes v2 for v5.8-rc1
ebied...@xmission.com (Eric W. Biederman) writes: Please pull the proc-linus branch from the git tree: git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git proc-linus HEAD: ef1548adada51a2f32ed7faef50aa465e1b4c5da proc: Use new_inode not new_inode_pseudo Much to my surprise syzbot found a very old bug in proc that the recent changes made easier to reproce. This bug is subtle enough it looks like it fooled everyone who should know better. Eric --- >From ef1548adada51a2f32ed7faef50aa465e1b4c5da Mon Sep 17 00:00:00 2001 From: "Eric W. Biederman" Date: Fri, 12 Jun 2020 09:42:03 -0500 Subject: [PATCH] proc: Use new_inode not new_inode_pseudo Recently syzbot reported that unmounting proc when there is an ongoing inotify watch on the root directory of proc could result in a use after free when the watch is removed after the unmount of proc when the watcher exits. Commit 69879c01a0c3 ("proc: Remove the now unnecessary internal mount of proc") made it easier to unmount proc and allowed syzbot to see the problem, but looking at the code it has been around for a long time. Looking at the code the fsnotify watch should have been removed by fsnotify_sb_delete in generic_shutdown_super. Unfortunately the inode was allocated with new_inode_pseudo instead of new_inode so the inode was not on the sb->s_inodes list. Which prevented fsnotify_unmount_inodes from finding the inode and removing the watch as well as made it so the "VFS: Busy inodes after unmount" warning could not find the inodes to warn about them. Make all of the inodes in proc visible to generic_shutdown_super, and fsnotify_sb_delete by using new_inode instead of new_inode_pseudo. The only functional difference is that new_inode places the inodes on the sb->s_inodes list. I wrote a small test program and I can verify that without changes it can trigger this issue, and by replacing new_inode_pseudo with new_inode the issues goes away. Cc: sta...@vger.kernel.org Link: https://lkml.kernel.org/r/d788c905a7dfa...@google.com Reported-by: syzbot+7d2debdcdb3cb93c1...@syzkaller.appspotmail.com Fixes: 0097875bd415 ("proc: Implement /proc/thread-self to point at the directory of the current thread") Fixes: 021ada7dff22 ("procfs: switch /proc/self away from proc_dir_entry") Fixes: 51f0885e5415 ("vfs,proc: guarantee unique inodes in /proc") Signed-off-by: "Eric W. Biederman" --- fs/proc/inode.c | 2 +- fs/proc/self.c| 2 +- fs/proc/thread_self.c | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/fs/proc/inode.c b/fs/proc/inode.c index f40c2532c057..28d6105e908e 100644 --- a/fs/proc/inode.c +++ b/fs/proc/inode.c @@ -617,7 +617,7 @@ const struct inode_operations proc_link_inode_operations = { struct inode *proc_get_inode(struct super_block *sb, struct proc_dir_entry *de) { - struct inode *inode = new_inode_pseudo(sb); + struct inode *inode = new_inode(sb); if (inode) { inode->i_ino = de->low_ino; diff --git a/fs/proc/self.c b/fs/proc/self.c index ca5158fa561c..72cd69bcaf4a 100644 --- a/fs/proc/self.c +++ b/fs/proc/self.c @@ -43,7 +43,7 @@ int proc_setup_self(struct super_block *s) inode_lock(root_inode); self = d_alloc_name(s->s_root, "self"); if (self) { - struct inode *inode = new_inode_pseudo(s); + struct inode *inode = new_inode(s); if (inode) { inode->i_ino = self_inum; inode->i_mtime = inode->i_atime = inode->i_ctime = current_time(inode); diff --git a/fs/proc/thread_self.c b/fs/proc/thread_self.c index ac284f409568..a553273fbd41 100644 --- a/fs/proc/thread_self.c +++ b/fs/proc/thread_self.c @@ -43,7 +43,7 @@ int proc_setup_thread_self(struct super_block *s) inode_lock(root_inode); thread_self = d_alloc_name(s->s_root, "thread-self"); if (thread_self) { - struct inode *inode = new_inode_pseudo(s); + struct inode *inode = new_inode(s); if (inode) { inode->i_ino = thread_self_inum; inode->i_mtime = inode->i_atime = inode->i_ctime = current_time(inode); -- 2.20.1