Bug#949235: ip netns add race condition can wreak havoc in mount points

2020-12-17 Etienne Dechamps
Hi Luca,

On Fri, 27 Nov 2020 at 18:09, Luca Boccassi wrote:
> Can reproduce the issue - sent a patch to use flock():
>
> https://patchwork.ozlabs.org/project/netdev/patch/20201127180651.80283-1-bl...@debian.org/
>
> Please test it as well, cannot reproduce anymore once the fix is in.

Sorry for the delay. I can confirm that I can no longer reproduce the
issue with your patch (commit 975c494 in the iproute2 git repo). Thanks
for the fix!



Bug#949235: ip netns add race condition can wreak havoc in mount points

2020-11-27 Luca Boccassi
Control: forwarded -1 https://patchwork.ozlabs.org/project/netdev/patch/20201127180651.80283-1-bl...@debian.org/
Control: tags -1 patch

On Sat, 18 Jan 2020 17:06:55 + Etienne Dechamps wrote:
> Make sure that the first "netns add" command that runs after boot
> cannot run in parallel with any other "netns add" command. flock(1)
> might be useful here.
Can reproduce the issue - sent a patch to use flock():

https://patchwork.ozlabs.org/project/netdev/patch/20201127180651.80283-1-bl...@debian.org/

Please test it as well, cannot reproduce anymore once the fix is in.

I'll also look into the new context-based mount API that was added
recently, although the flock() fix will be needed anyway for backward
compatibility.

-- 
Kind regards,
Luca Boccassi




Bug#949235: ip netns add race condition can wreak havoc in mount points

2020-01-18 Etienne Dechamps
Package: iproute2
Version: 4.20.0-2
Severity: important
Control: found -1 5.4.0-1

The ipnetns.c:netns_add() function sets up the /var/run/netns mount
point in a way that is vulnerable to a race condition when multiple
processes enter the routine at the same time.
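To make the failure mode easier to follow, here is a rough shell approximation of the one-time setup that netns_add() appears to perform, based on the "mount --bind /run/netns /run/netns" error message in this report and our reading of iproute2. It is a sketch, not the actual code; the function name setup_netns_run_dir and the NETNS_RUN_DIR override are hypothetical. The race sits between step 1 and step 2: the check ("is this already a mount point?") and the fix (bind-mount the directory onto itself) are separate system calls, so several processes can fail step 1 at the same time and each stack another bind mount on the same path.

```shell
#!/bin/sh
# Rough approximation (an assumption, not iproute2's actual code) of the
# one-time setup that "ip netns add" appears to perform on its runtime
# directory. NETNS_RUN_DIR is a hypothetical override for experiments.
NETNS_RUN_DIR=${NETNS_RUN_DIR:-/run/netns}

setup_netns_run_dir() {
    # Create the runtime directory if it does not exist yet.
    mkdir -p "$NETNS_RUN_DIR" || return 1

    # Step 1: try to make the directory a recursively shared mount.
    # The kernel rejects this (EINVAL) while it is not yet a mount point.
    if ! mount --make-rshared "$NETNS_RUN_DIR" 2>/dev/null; then
        # Step 2: turn the directory into a mount point by bind-mounting
        # it onto itself. RACE WINDOW: every process that failed step 1
        # before another process finished step 2 also gets here, so
        # several bind mounts end up stacked on the same path.
        mount --rbind "$NETNS_RUN_DIR" "$NETNS_RUN_DIR" || return 1

        # Step 3: retry step 1 now that the path is a mount point.
        mount --make-rshared "$NETNS_RUN_DIR" || return 1
    fi
}
```

Run for real, this needs root. Calling it once from an early boot unit, before any "netns add" runs, is one (untested) way to try the manual-setup workaround suggested in this report.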

If the race condition is triggered, some kind of recursive explosion of
mount points seems to happen, messing up the entire system in
"interesting" ways. For example, /proc/self/mountinfo ends up with
tons of duplicate entries, and the file itself becomes so large that
the entire system tends to slow down. Also, subsequent "netns add"
commands might fail with the following error message (note that this
doesn't always happen):

  mount --bind /run/netns /run/netns failed: No space left on device

Since it is a race condition, the issue is hard to reproduce on its
own, but it is possible to force it to happen by using strace to
inject an artificial delay in the mount() system call. See below.

I have observed this race condition multiple times on a real
production system during the boot process. This particular system sets
up network namespaces using systemd units. Because systemd starts units
in parallel, and because caches are cold at boot, multiple units running
"netns add" end up reaching the mount point setup at nearly the same
time, making it quite likely that the race condition will be triggered.

STEPS TO REPRODUCE

Do NOT follow this procedure on a system you care about. This
procedure WILL mess up your system and likely require you to reboot!

1. Start from a fresh system that has not run "netns add" since boot (or
just unmount /var/run/netns manually). I can reproduce it on Debian
Buster (iproute2 4.20.0-2) as well as on the latest Sid (5.4.0-1).

2. Run the following bash script:
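---
#!/bin/bash
# Start 10 "ip netns add" processes in parallel. strace delays the return
# of every mount() call by 100 microseconds to widen the race window, and
# each process's output is saved to <n>.log.
for i in {0..9}
do
    strace -e trace=mount -e inject=mount:delay_exit=100 \
        ip netns add "testnetns$i" 2>&1 | tee "$i.log" &
done
wait
---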

3. Look at /proc/self/mountinfo. Hilarity ensues.
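Step 3 can be made a little more concrete with a small shell function (count_mounts_at is a hypothetical name) that counts the mountinfo lines whose mount point, the fifth field per proc(5), equals a given path. On Debian /var/run is normally a symlink to /run, so the path to check is /run/netns. After a single, uneventful "netns add" we would expect it to appear once; after running the reproducer it should appear many times.

```shell
#!/bin/sh
# count_mounts_at PATH [MOUNTINFO]
# Hypothetical helper: prints how many mountinfo lines have PATH as their
# mount point. Field 5 of each line is the mount point (see proc(5)).
# MOUNTINFO defaults to /proc/self/mountinfo.
count_mounts_at() {
    awk -v t="$1" '$5 == t { n++ } END { print n + 0 }' \
        "${2:-/proc/self/mountinfo}"
}
```

For example, `count_mounts_at /run/netns` before and after the reproducer. Entries for individual namespaces such as /run/netns/testnetns0 are not counted, since their mount point field differs.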

If you increase the count in the script, you might even see some
"mount failed: No space left on device" errors.

WORKAROUND

Make sure that the first "netns add" command that runs after boot
cannot run in parallel with any other "netns add" command. flock(1)
might be useful here. I guess setting up the /var/run/netns mount point
manually during boot, before any "netns add" command runs, might also
work.
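A minimal sketch of the flock(1) workaround, assuming every caller can be routed through the same wrapper: run_locked and the lock file path /run/lock/ip-netns-add.lock are hypothetical choices for this example, and any path all callers agree on would do. flock(1) itself is the util-linux tool; invoked as `flock FILE COMMAND...` it takes an exclusive lock on FILE, runs the command, and returns the command's exit status.

```shell
#!/bin/sh
# Sketch of the flock(1) workaround: serialize "ip netns add" across
# concurrent callers. LOCKFILE is a hypothetical path chosen for this
# example; every caller (e.g. every systemd unit) must use the same file.
LOCKFILE=${LOCKFILE:-/run/lock/ip-netns-add.lock}

# run_locked CMD [ARGS...]
# flock(1) opens $LOCKFILE (creating it if needed), blocks until it holds
# an exclusive lock, runs CMD, and releases the lock once CMD has exited.
# flock returns CMD's exit status.
run_locked() {
    flock "$LOCKFILE" "$@"
}

# When run as a script with a namespace name, add it under the lock.
if [ "$#" -gt 0 ]; then
    run_locked ip netns add "$1"
fi
```

Equivalently, a systemd unit could use `flock /run/lock/ip-netns-add.lock ip netns add NAME` directly in its ExecStart= line. Only the first "netns add" after boot strictly needs the lock, but taking it every time is simpler and cheap. The patch discussed in this thread applies the same flock() idea inside ip itself, so a wrapper like this should only matter for iproute2 versions without that fix.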