Re: [GIT PULL] proc fixes v2 for v5.8-rc1

2020-06-12 Thread Eric W. Biederman
Linus Torvalds  writes:

> On Fri, Jun 12, 2020 at 1:06 PM Eric W. Biederman  
> wrote:
>>
>> I have a sense that a use after free that anyone can trigger could be a
>> bit dangerous, and despite not being the only virtual filesystem in the
>> kernel proc is the only virtual filesystem that called new_inode_pseudo.
>
> So the reason I pulled that change despite my questions was that I do
> agree with the whole "there's probably little point to use
> new_inode_pseudo() here" argument.
>
> But at the same time, I ghet the feeling that this partly just is
> papering over the problem. If fsnotify causes problems with a
> new_inode_pseudo() inode, then fsnotify should be _checking_ for that
> case.
>
> And if fsnotify were to check for it, then the reason for /proc to use
> it would largely go away. Maybe the debug check for umount matters,
> but honestly it doesn't really seem to be a big deal.
>
> A pseudo-inode is basically independent of the filesystem it was
> mounted as, so the generic_shutdown_super() check not triggering for
> them is sensible, I feel.
>
> But yeah, we could also make the rule go the other way, and simply
> make sure that "new_inode_pseudo()" itself checks that the super-block
> you give it is something fundamenally unmountable and was created with
> 'kern_mount()'.
>
> That would have also figured out that the /proc case was broken.
>
> So the main objection I have here is really that this fix feels
> incomplete, and doesn't really reflect the underlying issue, just
> fixes the symptom.
>
> Either the underlying issue is that you shouldn't call 'fsnotify' on
> /proc, or the underlying issue is that /proc was using a bad inode and
> nobody even noticed until the fsnotify issue.
>
> This is not a huge deal. I think you've fixed the bug, I just have
> this itch that the thing that triggered it shouldn't have happened in
> the first place.

Yeah.  I have a similar feeling.

Especially since it took about 7 years for someone to notice something
was not right.

Still right now I am not up to digging and adding warnings and causing
things to fail.  What energy I have I am going to use to keep sorting
out exec.  I am not ready to page back in the fine details of how all of
the filesystem pieces interact right now.

I am wondering now if there are any crazy corner cases of the unix
domain sockets where fsnotify could latch onto one, and have problems.
It looks like the inode comes from the underlying filesystem so we
should be safe.

Eric


Re: [GIT PULL] proc fixes v2 for v5.8-rc1

2020-06-12 Thread Linus Torvalds
On Fri, Jun 12, 2020 at 1:06 PM Eric W. Biederman  wrote:
>
> I have a sense that a use after free that anyone can trigger could be a
> bit dangerous, and despite not being the only virtual filesystem in the
> kernel proc is the only virtual filesystem that called new_inode_pseudo.

So the reason I pulled that change despite my questions was that I do
agree with the whole "there's probably little point to use
new_inode_pseudo() here" argument.

But at the same time, I ghet the feeling that this partly just is
papering over the problem. If fsnotify causes problems with a
new_inode_pseudo() inode, then fsnotify should be _checking_ for that
case.

And if fsnotify were to check for it, then the reason for /proc to use
it would largely go away. Maybe the debug check for umount matters,
but honestly it doesn't really seem to be a big deal.

A pseudo-inode is basically independent of the filesystem it was
mounted as, so the generic_shutdown_super() check not triggering for
them is sensible, I feel.

But yeah, we could also make the rule go the other way, and simply
make sure that "new_inode_pseudo()" itself checks that the super-block
you give it is something fundamenally unmountable and was created with
'kern_mount()'.

That would have also figured out that the /proc case was broken.

So the main objection I have here is really that this fix feels
incomplete, and doesn't really reflect the underlying issue, just
fixes the symptom.

Either the underlying issue is that you shouldn't call 'fsnotify' on
/proc, or the underlying issue is that /proc was using a bad inode and
nobody even noticed until the fsnotify issue.

This is not a huge deal. I think you've fixed the bug, I just have
this itch that the thing that triggered it shouldn't have happened in
the first place.

 Linus


Re: [GIT PULL] proc fixes v2 for v5.8-rc1

2020-06-12 Thread Eric W. Biederman
Linus Torvalds  writes:

> On Fri, Jun 12, 2020 at 12:34 PM Eric W. Biederman
>  wrote:
>>
>> ebied...@xmission.com (Eric W. Biederman) writes:
>
> What happened to that first version of the email? I never got it..

A little distracted I think.  I forgot to edit the above line out,
and v2 because it is my second pull request for a proc fix into
v5.8-rc1.

>> Looking at the code the fsnotify watch should have been removed by
>> fsnotify_sb_delete in generic_shutdown_super.
>
> Hmm. Correct. The new_inode_pseudo() is for things that don't have
> quotas, fsnotify or writeback.
>
> That was used somewhat intentionally on /proc, though. /proc certainly
> doesn't have quotas or writeback.

It also means we break our debugging by not putting inodes on the s_inodes list.

AKA this line in generic_shutdown_super is also depent on that behavior.
if (!list_empty(>s_inodes)) {
printk("VFS: Busy inodes after unmount of %s. "
   "Self-destruct in 5 seconds.  Have a nice day...\n",
   sb->s_id);
}

> And fsnotify on /proc seems a bit questionably too. Do people actually
> _do_ this and depend on it, or is this just about syzbot doing
> something odd and thus showing the problem?
>
> Anyway, I have pulled your fix, because I think it's reasonable and
> safe, but I do wonder if we should have kept the new_inode_pseudo(),
> and instead just make fsnotify say "you can't notify on an inode that
> isn't on the superblock list". Hmm?
>
> Is fsnotify on /proc really sensible? Do we actually generate any
> useful notifications?

I don't know of any proc code that does anything to make
fsnotify/inotify work.  Since changes to proc are not initialiated
through the vfs that probably means fsnotify is pretty much useless.

I have a sense that a use after free that anyone can trigger could be a
bit dangerous, and despite not being the only virtual filesystem in the
kernel proc is the only virtual filesystem that called new_inode_pseudo.

Eric



Re: Re: [GIT PULL] proc fixes v2 for v5.8-rc1

2020-06-12 Thread pr-tracker-bot
The pull request you sent on Fri, 12 Jun 2020 14:29:50 -0500:

> git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git 
> proc-linus

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/44ebe016df3aad96e3be8f95ec52397728dd7701

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker


Re: [GIT PULL] proc fixes v2 for v5.8-rc1

2020-06-12 Thread Linus Torvalds
On Fri, Jun 12, 2020 at 12:34 PM Eric W. Biederman
 wrote:
>
> ebied...@xmission.com (Eric W. Biederman) writes:

What happened to that first version of the email? I never got it..

> Looking at the code the fsnotify watch should have been removed by
> fsnotify_sb_delete in generic_shutdown_super.

Hmm. Correct. The new_inode_pseudo() is for things that don't have
quotas, fsnotify or writeback.

That was used somewhat intentionally on /proc, though. /proc certainly
doesn't have quotas or writeback.

And fsnotify on /proc seems a bit questionably too. Do people actually
_do_ this and depend on it, or is this just about syzbot doing
something odd and thus showing the problem?

Anyway, I have pulled your fix, because I think it's reasonable and
safe, but I do wonder if we should have kept the new_inode_pseudo(),
and instead just make fsnotify say "you can't notify on an inode that
isn't on the superblock list". Hmm?

Is fsnotify on /proc really sensible? Do we actually generate any
useful notifications?

  Linus


Re: [GIT PULL] proc fixes v2 for v5.8-rc1

2020-06-12 Thread Eric W. Biederman
ebied...@xmission.com (Eric W. Biederman) writes:

Please pull the proc-linus branch from the git tree:

git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git 
proc-linus

HEAD: ef1548adada51a2f32ed7faef50aa465e1b4c5da proc: Use new_inode not 
new_inode_pseudo

Much to my surprise syzbot found a very old bug in proc that the recent
changes made easier to reproce.  This bug is subtle enough it looks like
it fooled everyone who should know better.

Eric

---

>From ef1548adada51a2f32ed7faef50aa465e1b4c5da Mon Sep 17 00:00:00 2001
From: "Eric W. Biederman" 
Date: Fri, 12 Jun 2020 09:42:03 -0500
Subject: [PATCH] proc: Use new_inode not new_inode_pseudo

Recently syzbot reported that unmounting proc when there is an ongoing
inotify watch on the root directory of proc could result in a use
after free when the watch is removed after the unmount of proc
when the watcher exits.

Commit 69879c01a0c3 ("proc: Remove the now unnecessary internal mount
of proc") made it easier to unmount proc and allowed syzbot to see the
problem, but looking at the code it has been around for a long time.

Looking at the code the fsnotify watch should have been removed by
fsnotify_sb_delete in generic_shutdown_super.  Unfortunately the inode
was allocated with new_inode_pseudo instead of new_inode so the inode
was not on the sb->s_inodes list.  Which prevented
fsnotify_unmount_inodes from finding the inode and removing the watch
as well as made it so the "VFS: Busy inodes after unmount" warning
could not find the inodes to warn about them.

Make all of the inodes in proc visible to generic_shutdown_super,
and fsnotify_sb_delete by using new_inode instead of new_inode_pseudo.
The only functional difference is that new_inode places the inodes
on the sb->s_inodes list.

I wrote a small test program and I can verify that without changes it
can trigger this issue, and by replacing new_inode_pseudo with
new_inode the issues goes away.

Cc: sta...@vger.kernel.org
Link: https://lkml.kernel.org/r/d788c905a7dfa...@google.com
Reported-by: syzbot+7d2debdcdb3cb93c1...@syzkaller.appspotmail.com
Fixes: 0097875bd415 ("proc: Implement /proc/thread-self to point at the 
directory of the current thread")
Fixes: 021ada7dff22 ("procfs: switch /proc/self away from proc_dir_entry")
Fixes: 51f0885e5415 ("vfs,proc: guarantee unique inodes in /proc")
Signed-off-by: "Eric W. Biederman" 
---
 fs/proc/inode.c   | 2 +-
 fs/proc/self.c| 2 +-
 fs/proc/thread_self.c | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index f40c2532c057..28d6105e908e 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -617,7 +617,7 @@ const struct inode_operations proc_link_inode_operations = {
 
 struct inode *proc_get_inode(struct super_block *sb, struct proc_dir_entry *de)
 {
-   struct inode *inode = new_inode_pseudo(sb);
+   struct inode *inode = new_inode(sb);
 
if (inode) {
inode->i_ino = de->low_ino;
diff --git a/fs/proc/self.c b/fs/proc/self.c
index ca5158fa561c..72cd69bcaf4a 100644
--- a/fs/proc/self.c
+++ b/fs/proc/self.c
@@ -43,7 +43,7 @@ int proc_setup_self(struct super_block *s)
inode_lock(root_inode);
self = d_alloc_name(s->s_root, "self");
if (self) {
-   struct inode *inode = new_inode_pseudo(s);
+   struct inode *inode = new_inode(s);
if (inode) {
inode->i_ino = self_inum;
inode->i_mtime = inode->i_atime = inode->i_ctime = 
current_time(inode);
diff --git a/fs/proc/thread_self.c b/fs/proc/thread_self.c
index ac284f409568..a553273fbd41 100644
--- a/fs/proc/thread_self.c
+++ b/fs/proc/thread_self.c
@@ -43,7 +43,7 @@ int proc_setup_thread_self(struct super_block *s)
inode_lock(root_inode);
thread_self = d_alloc_name(s->s_root, "thread-self");
if (thread_self) {
-   struct inode *inode = new_inode_pseudo(s);
+   struct inode *inode = new_inode(s);
if (inode) {
inode->i_ino = thread_self_inum;
inode->i_mtime = inode->i_atime = inode->i_ctime = 
current_time(inode);
-- 
2.20.1