Re: [systemd-devel] bpfilter blocks root unmount during shutdown
On 24-09-2018 at 13:30, Andrei Borzenkov wrote:
> This process is spawned as a special kernel thread, even though it is
> otherwise a normal user process.

WUT? So how is this new kind of task supposed to be handled by userspace? Looks like a kernel bug to me.

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] bpfilter blocks root unmount during shutdown
On Mon, 24.09.18 20:17, Andrei Borzenkov (arvidj...@gmail.com) wrote:
> > I am sorry, what? Are you saying there's now a third kind of task?
> > Real kernel threads, real userspace processes, and weird shit running
> > kernel code that in turn runs userspace supplied programs, and all
> > that under user control?
>
> No, it is not exactly "user control". It runs an executable embedded
> into a kernel module. So it is not arbitrary code. In this particular
> case at least.

By "user control" I meant that they are kill()-able by users (kernel threads generally are not).

> > Do these processes report PF_KTHREAD in /proc/$PID/stat? i.e. do they
> > pass the recently reworked is_kernel_thread() tests?
>
> No. The flags are 4194560 == 0x400100 == PF_RANDOMIZE|PF_SUPERPRIV.
>
> And sorry, I cannot comment on "these processes"; I have seen only one
> concrete example. I have no idea how widespread the use of this facility is.
>
> > We might want to update killall.c then so that it does not make
> > assumptions on /proc/$PID/cmdline validity anymore, but strictly uses
> > is_kernel_thread(). That should fix things properly for you, no? That
> > way dracut won't even see these new kinds of processes at all...
>
> Well, I suppose there could be corner cases when the executable and
> libraries are from different filesystems, but that had better wait for
> a real-life example.

I prepped this PR:

https://github.com/systemd/systemd/pull/10159

I think this should fix your issue; could you test it?

(Checking PF_KTHREAD is more correct anyway, so regardless of that, this is really the right way and should be merged.)

Lennart

--
Lennart Poettering, Red Hat
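For illustration, a minimal sketch of a PF_KTHREAD-based check in the spirit of the PR above. It is not the code from that PR: the helper names parse_stat_flags() and is_kernel_thread_pid() are hypothetical, the field layout of /proc/PID/stat is the one documented in proc(5), and the PF_KTHREAD value (0x00200000) is taken from 4.18-era include/linux/sched.h and may differ on other kernels.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Assumed value, from 4.18-era include/linux/sched.h; kernel-internal. */
#define PF_KTHREAD 0x00200000u

/* Parse the "flags" field (field 9) out of one line of /proc/PID/stat.
 * Returns 0 on success, -1 if the line cannot be parsed. */
int parse_stat_flags(const char *stat_line, unsigned *ret_flags) {
        /* comm (field 2) may itself contain spaces or ')', so continue
         * from the LAST ')' on the line. */
        const char *p = strrchr(stat_line, ')');
        if (!p)
                return -1;

        /* After ')': state ppid pgrp session tty_nr tpgid flags ... */
        if (sscanf(p + 1, " %*c %*d %*d %*d %*d %*d %u", ret_flags) != 1)
                return -1;
        return 0;
}

/* Hypothetical helper: 1 if PID has PF_KTHREAD set, 0 if not, -1 on error. */
int is_kernel_thread_pid(pid_t pid) {
        char path[64], line[1024];
        unsigned flags;
        FILE *f;

        snprintf(path, sizeof path, "/proc/%d/stat", (int) pid);
        f = fopen(path, "re"); /* 'e' = O_CLOEXEC (glibc extension) */
        if (!f)
                return -1;
        /* The flags field is near the start, so a truncated read is harmless. */
        if (!fgets(line, sizeof line, f)) {
                fclose(f);
                return -1;
        }
        fclose(f);

        if (parse_stat_flags(line, &flags) < 0)
                return -1;
        return (flags & PF_KTHREAD) != 0;
}
```

Because the decision comes from the kernel's own task flags rather than from an empty cmdline, a helper such as the bpfilter one (flags 0x400100, no PF_KTHREAD) would be treated as an ordinary process and killed.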
Re: [systemd-devel] bpfilter blocks root unmount during shutdown
On 24.09.2018 19:52, Lennart Poettering wrote:
> On Mon, 24.09.18 19:30, Andrei Borzenkov (arvidj...@gmail.com) wrote:
>
>> On 24.09.2018 16:20, Lennart Poettering wrote:
>>> On Sun, 23.09.18 10:38, Andrei Borzenkov (arvidj...@gmail.com) wrote:
>>>
>>>> The dracut /shutdown script first tries to kill all processes still
>>>> running off the old root. Unfortunately, this fails for the special
>>>> user process that runs bpfilter, because it does not include a
>>>> reference to /oldroot in the places where dracut looks in
>>>> killall_proc_mountpoint()
>>>
>>> Hmm, when we invoke the /shutdown executable we already executed our
>>> process killing spree as part of systemd-shutdown. How come your
>>> processes even survive that long?
>>
>>
>>     p = procfs_file_alloca(pid, "cmdline");
>>     f = fopen(p, "re");
>>     if (!f)
>>             return true; /* not really, but has the desired effect */
>>
>>     count = fread(, 1, 1, f);
>>
>>     /* Kernel threads have an empty cmdline */
>>     if (count <= 0)
>>             return true;
>>
>>
>> This process is spawned as a special kernel thread, even though it is
>> otherwise a normal user process.
>
> I am sorry, what? Are you saying there's now a third kind of task?
> Real kernel threads, real userspace processes, and weird shit running
> kernel code that in turn runs userspace supplied programs, and all
> that under user control?

No, it is not exactly "user control". It runs an executable embedded into a kernel module. So it is not arbitrary code. In this particular case at least.

> If so, yuck...
>
> Under which parent PID do they show up? kthreadd or somewhere further
> down?

I showed it in the original post.

10:~ # ps -ef | fgrep '[none]'
root 984 2 0 09:46 ? 00:00:00 [none]

Yes, this is kthreadd.

> Do these processes report PF_KTHREAD in /proc/$PID/stat? i.e. do they
> pass the recently reworked is_kernel_thread() tests?

No. The flags are 4194560 == 0x400100 == PF_RANDOMIZE|PF_SUPERPRIV.

And sorry, I cannot comment on "these processes"; I have seen only one concrete example. I have no idea how widespread the use of this facility is.

> We might want to update killall.c then so that it does not make
> assumptions on /proc/$PID/cmdline validity anymore, but strictly uses
> is_kernel_thread(). That should fix things properly for you, no? That
> way dracut won't even see these new kinds of processes at all...

Well, I suppose there could be corner cases when the executable and libraries are from different filesystems, but that had better wait for a real-life example.
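The flag arithmetic above can be checked mechanically. A small sketch, assuming the PF_* values from 4.18-era include/linux/sched.h (these bits are kernel-internal and have been renumbered in later kernels); describe_pf_flags() is a hypothetical helper, and only the three flags discussed in this thread are listed.

```c
#include <stdio.h>
#include <string.h>

struct pf_name {
        unsigned mask;
        const char *name;
};

/* Assumed 4.18-era values, listed from highest bit to lowest. */
static const struct pf_name pf_names[] = {
        { 0x00400000u, "PF_RANDOMIZE" },
        { 0x00200000u, "PF_KTHREAD" },
        { 0x00000100u, "PF_SUPERPRIV" },
};

/* Write the names of the known flags set in FLAGS, joined by '|', into BUF.
 * Bits not in the table are ignored. Returns BUF. */
char *describe_pf_flags(unsigned flags, char *buf, size_t size) {
        size_t len = 0;

        buf[0] = '\0';
        for (size_t i = 0; i < sizeof pf_names / sizeof pf_names[0]; i++) {
                if (!(flags & pf_names[i].mask))
                        continue;
                int n = snprintf(buf + len, size - len, "%s%s",
                                 len > 0 ? "|" : "", pf_names[i].name);
                if (n < 0 || (size_t) n >= size - len)
                        break; /* out of space: output is truncated */
                len += (size_t) n;
        }
        return buf;
}
```

With these values, 4194560 is 0x400100, i.e. 0x00400000 (PF_RANDOMIZE) plus 0x00000100 (PF_SUPERPRIV), and the PF_KTHREAD bit (0x00200000) is clear, which is exactly why an is_kernel_thread() test based on PF_KTHREAD does not match this helper.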
Re: [systemd-devel] bpfilter blocks root unmount during shutdown
On Mon, 24.09.18 19:30, Andrei Borzenkov (arvidj...@gmail.com) wrote:

> On 24.09.2018 16:20, Lennart Poettering wrote:
> > On Sun, 23.09.18 10:38, Andrei Borzenkov (arvidj...@gmail.com) wrote:
> >
> >> The dracut /shutdown script first tries to kill all processes still
> >> running off the old root. Unfortunately, this fails for the special
> >> user process that runs bpfilter, because it does not include a
> >> reference to /oldroot in the places where dracut looks in
> >> killall_proc_mountpoint()
> >
> > Hmm, when we invoke the /shutdown executable we already executed our
> > process killing spree as part of systemd-shutdown. How come your
> > processes even survive that long?
>
>
>     p = procfs_file_alloca(pid, "cmdline");
>     f = fopen(p, "re");
>     if (!f)
>             return true; /* not really, but has the desired effect */
>
>     count = fread(, 1, 1, f);
>
>     /* Kernel threads have an empty cmdline */
>     if (count <= 0)
>             return true;
>
>
> This process is spawned as a special kernel thread, even though it is
> otherwise a normal user process.

I am sorry, what? Are you saying there's now a third kind of task? Real kernel threads, real userspace processes, and weird shit running kernel code that in turn runs userspace supplied programs, and all that under user control?

If so, yuck...

Under which parent PID do they show up? kthreadd or somewhere further down?

Do these processes report PF_KTHREAD in /proc/$PID/stat? i.e. do they pass the recently reworked is_kernel_thread() tests?

We might want to update killall.c then so that it does not make assumptions on /proc/$PID/cmdline validity anymore, but strictly uses is_kernel_thread(). That should fix things properly for you, no? That way dracut won't even see these new kinds of processes at all...

Lennart

--
Lennart Poettering, Red Hat
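The two questions above (parent PID and PF_KTHREAD) can be combined into a rough classification of the three kinds of task being discussed. This is only a sketch: classify_stat_line() and its enum are hypothetical names, kthreadd is assumed to be PID 2 (true on a typical host, not inside a PID namespace), and PF_KTHREAD is the 4.18-era value.

```c
#include <stdio.h>
#include <string.h>

#define PF_KTHREAD   0x00200000u /* assumed, 4.18-era include/linux/sched.h */
#define KTHREADD_PID 2           /* assumed PID of kthreadd */

enum task_kind {
        TASK_UNKNOWN,        /* line could not be parsed */
        TASK_KERNEL_THREAD,  /* PF_KTHREAD set */
        TASK_KTHREADD_CHILD, /* parent is kthreadd, but PF_KTHREAD clear */
        TASK_USERSPACE,      /* everything else */
};

/* Classify one line of /proc/PID/stat using ppid (field 4) and flags (field 9). */
enum task_kind classify_stat_line(const char *line) {
        /* Skip past the LAST ')' so that odd comm values cannot confuse us. */
        const char *p = strrchr(line, ')');
        int ppid;
        unsigned flags;

        /* After ')': state ppid pgrp session tty_nr tpgid flags ... */
        if (!p || sscanf(p + 1, " %*c %d %*d %*d %*d %*d %u", &ppid, &flags) != 2)
                return TASK_UNKNOWN;
        if (flags & PF_KTHREAD)
                return TASK_KERNEL_THREAD;
        if (ppid == KTHREADD_PID)
                return TASK_KTHREADD_CHILD;
        return TASK_USERSPACE;
}
```

Fed the values reported in this thread for PID 984 (parent 2, flags 4194560), this sketch returns TASK_KTHREADD_CHILD: a task that sits under kthreadd like a kernel thread but carries no PF_KTHREAD, which is the "third kind of task" in question.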
Re: [systemd-devel] bpfilter blocks root unmount during shutdown
On Mon, 24 Sep 2018 15:20:47 +0200 Lennart Poettering wrote:

> On Sun, 23.09.18 10:38, Andrei Borzenkov (arvidj...@gmail.com) wrote:
>
> > The dracut /shutdown script first tries to kill all processes still
> > running off the old root. Unfortunately, this fails for the special
> > user process that runs bpfilter, because it does not include a
> > reference to /oldroot in the places where dracut looks in
> > killall_proc_mountpoint()
>
> Hmm, when we invoke the /shutdown executable we already executed our
> process killing spree as part of systemd-shutdown. How come your
> processes even survive that long? What am I missing?

I believe it's because the bpfilter helper process is identified as a kernel thread, since it has an empty command line, and is therefore not killed.

I personally feel this is a bug (in the kernel), but apparently this whole bpfilter thing isn't quite ready yet and shouldn't be used for the moment, so hopefully it'll be improved or fixed in the meantime. You can see this thread[1] about the issue.

Cheers,

[1] https://www.spinics.net/lists/netdev/msg520030.html
Re: [systemd-devel] bpfilter blocks root unmount during shutdown
On 24.09.2018 16:20, Lennart Poettering wrote:
> On Sun, 23.09.18 10:38, Andrei Borzenkov (arvidj...@gmail.com) wrote:
>
>> The dracut /shutdown script first tries to kill all processes still
>> running off the old root. Unfortunately, this fails for the special
>> user process that runs bpfilter, because it does not include a
>> reference to /oldroot in the places where dracut looks in
>> killall_proc_mountpoint()
>
> Hmm, when we invoke the /shutdown executable we already executed our
> process killing spree as part of systemd-shutdown. How come your
> processes even survive that long?

    p = procfs_file_alloca(pid, "cmdline");
    f = fopen(p, "re");
    if (!f)
            return true; /* not really, but has the desired effect */

    count = fread(, 1, 1, f);

    /* Kernel threads have an empty cmdline */
    if (count <= 0)
            return true;

This process is spawned as a special kernel thread, even though it is otherwise a normal user process.

net/bpfilter/bpfilter_kern.c:load_umh():

    /* fork usermode process */
    err = fork_usermode_blob(_umh_start,
                             _umh_end - _umh_start,
                             );
    if (err)
            return err;
    pr_info("Loaded bpfilter_umh pid %d\n", info.pid);
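A stand-alone sketch of the heuristic quoted above makes the failure mode easy to see: any task whose cmdline is empty is assumed to be a kernel thread and is skipped. The function name cmdline_says_kernel_thread() is hypothetical, and unlike the quoted code it takes a file path instead of a PID, so that it can be tried on ordinary files.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: returns true if the cmdline file at CMDLINE_PATH
 * suggests "kernel thread" under the empty-cmdline heuristic. */
bool cmdline_says_kernel_thread(const char *cmdline_path) {
        FILE *f = fopen(cmdline_path, "re"); /* 'e' = O_CLOEXEC (glibc extension) */
        char c;
        size_t count;

        if (!f)
                return true; /* same choice as the quoted code: unreadable => leave it alone */

        count = fread(&c, 1, 1, f); /* read at most one byte */
        fclose(f);

        /* Empty cmdline => treated as a kernel thread. The bpfilter helper
         * also has an empty cmdline, so it is wrongly skipped as well. */
        return count == 0;
}
```

The heuristic can only look at what userspace can see in cmdline, which is why the fix discussed in this thread moves to the kernel-reported PF_KTHREAD flag instead.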
Re: [systemd-devel] bpfilter blocks root unmount during shutdown
On Sun, 23.09.18 10:38, Andrei Borzenkov (arvidj...@gmail.com) wrote:

> The dracut /shutdown script first tries to kill all processes still
> running off the old root. Unfortunately, this fails for the special
> user process that runs bpfilter, because it does not include a
> reference to /oldroot in the places where dracut looks in
> killall_proc_mountpoint()

Hmm, when we invoke the /shutdown executable we already executed our process killing spree as part of systemd-shutdown. How come your processes even survive that long? What am I missing?

Lennart

--
Lennart Poettering, Red Hat
[systemd-devel] bpfilter blocks root unmount during shutdown
The dracut /shutdown script first tries to kill all processes still running off the old root. Unfortunately, this fails for the special user process that runs bpfilter, because it does not include a reference to /oldroot in the places where dracut looks in killall_proc_mountpoint().

10:~ # ps -ef | fgrep '[none]'
root 984 2 0 09:46 ? 00:00:00 [none]

/proc/984:
total 0
dr-xr-xr-x 2 root 0 0 Sep 23 10:11 attr
-r-------- 1 root 0 0 Sep 23 10:11 auxv
-r--r--r-- 1 root 0 0 Sep 23 10:11 cgroup
--w------- 1 root 0 0 Sep 23 10:11 clear_refs
-r--r--r-- 1 root 0 0 Sep 23 10:10 cmdline
-rw-r--r-- 1 root 0 0 Sep 23 10:11 comm
-rw-r--r-- 1 root 0 0 Sep 23 10:11 coredump_filter
-r--r--r-- 1 root 0 0 Sep 23 10:11 cpuset
lrwxrwxrwx 1 root 0 0 Sep 23 10:11 cwd -> /
-r-------- 1 root 0 0 Sep 23 10:11 environ
lrwxrwxrwx 1 root 0 0 Sep 23 10:11 exe -> / (deleted)
-rw-r--r-- 1 root 0 0 Sep 23 10:11 fail-nth
dr-x------ 2 root 0 0 Sep 23 10:11 fd
dr-x------ 2 root 0 0 Sep 23 10:11 fdinfo
-rw-r--r-- 1 root 0 0 Sep 23 10:11 gid_map
-r-------- 1 root 0 0 Sep 23 10:11 io
-r--r--r-- 1 root 0 0 Sep 23 10:11 latency
-r--r--r-- 1 root 0 0 Sep 23 10:11 limits
-rw-r--r-- 1 root 0 0 Sep 23 10:11 loginuid
-rw-r--r-- 1 root 0 0 Sep 23 10:11 make-it-fail
dr-x------ 2 root 0 0 Sep 23 10:11 map_files
-r--r--r-- 1 root 0 0 Sep 23 10:10 maps
-rw------- 1 root 0 0 Sep 23 10:11 mem
-r--r--r-- 1 root 0 0 Sep 23 10:11 mountinfo
-r--r--r-- 1 root 0 0 Sep 23 10:11 mounts
-r-------- 1 root 0 0 Sep 23 10:11 mountstats
dr-xr-xr-x 6 root 0 0 Sep 23 10:11 net
dr-x--x--x 2 root 0 0 Sep 23 10:11 ns
-r--r--r-- 1 root 0 0 Sep 23 10:11 numa_maps
-rw-r--r-- 1 root 0 0 Sep 23 10:11 oom_adj
-r--r--r-- 1 root 0 0 Sep 23 10:11 oom_score
-rw-r--r-- 1 root 0 0 Sep 23 10:11 oom_score_adj
-r-------- 1 root 0 0 Sep 23 10:11 pagemap
-r-------- 1 root 0 0 Sep 23 10:11 patch_state
-r-------- 1 root 0 0 Sep 23 10:11 personality
-rw-r--r-- 1 root 0 0 Sep 23 10:11 projid_map
lrwxrwxrwx 1 root 0 0 Sep 23 10:11 root -> /
-rw-r--r-- 1 root 0 0 Sep 23 10:11 sched
-r--r--r-- 1 root 0 0 Sep 23 10:11 schedstat
-r--r--r-- 1 root 0 0 Sep 23 10:11 sessionid
-rw-r--r-- 1 root 0 0 Sep 23 10:11 setgroups
-r--r--r-- 1 root 0 0 Sep 23 10:11 smaps
-r--r--r-- 1 root 0 0 Sep 23 10:11 smaps_rollup
-r-------- 1 root 0 0 Sep 23 10:11 stack
-r--r--r-- 1 root 0 0 Sep 23 10:10 stat
-r--r--r-- 1 root 0 0 Sep 23 10:11 statm
-r--r--r-- 1 root 0 0 Sep 23 10:10 status
-r-------- 1 root 0 0 Sep 23 10:11 syscall
dr-xr-xr-x 3 root 0 0 Sep 23 10:11 task
-r--r--r-- 1 root 0 0 Sep 23 10:11 timers
-rw-rw-rw- 1 root 0 0 Sep 23 10:11 timerslack_ns
-rw-r--r-- 1 root 0 0 Sep 23 10:11 uid_map
-r--r--r-- 1 root 0 0 Sep 23 10:11 wchan

/proc/984/fd:
total 0
lr-x------ 1 root 0 64 Sep 23 10:11 0 -> pipe:[19409]
l-wx------ 1 root 0 64 Sep 23 10:11 1 -> pipe:[19410]
lrwx------ 1 root 0 64 Sep 23 10:11 2 -> /oldsys/dev/console

But it does contain a reference to /oldroot in its list of mapped libraries (/proc/984/maps):

563b63002000-563b63003000 r--p 00000000 00:05 19404 / (deleted)
563b63003000-563b63004000 r-xp 00001000 00:05 19404 / (deleted)
563b63004000-563b63005000 r--p 00002000 00:05 19404 / (deleted)
563b63005000-563b63006000 r--p 00002000 00:05 19404 / (deleted)
563b63006000-563b63007000 rw-p 00003000 00:05 19404 / (deleted)
563b63fb4000-563b63fd5000 rw-p 00000000 00:00 0 [heap]
7fa3a46cc000-7fa3a4882000 r-xp 00000000 00:2a 7728 /oldroot/lib64/libc-2.27.so
7fa3a4882000-7fa3a4a82000 ---p 001b6000 00:2a 7728 /oldroot/lib64/libc-2.27.so
7fa3a4a82000-7fa3a4a86000 r--p 001b6000 00:2a 7728 /oldroot/lib64/libc-2.27.so
7fa3a4a86000-7fa3a4a88000 rw-p 001ba000 00:2a 7728 /oldroot/lib64/libc-2.27.so
7fa3a4a88000-7fa3a4a8c000 rw-p 00000000 00:00 0
7fa3a4a8c000-7fa3a4ab1000 r-xp 00000000 00:2a 7720 /oldroot/lib64/ld-2.27.so
7fa3a4ca7000-7fa3a4ca9000 rw-p 00000000 00:00 0
7fa3a4cb1000-7fa3a4cb2000 r--p 00025000 00:2a 7720 /oldroot/lib64/ld-2.27.so
7fa3a4cb2000-7fa3a4cb3000 rw-p 00026000 00:2a 7720 /oldroot/lib64/ld-2.27.so
7fa3a4cb3000-7fa3a4cb4000 rw-p 00000000 00:00 0
7ffea03b4000-7ffea03d5000 rw-p 00000000 00:00 0 [stack]
7ffea03df000-7ffea03e2000 r--p 00000000 00:00 0 [vvar]
7ffea03e2000-7ffea03e4000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]

So the quick fix would be to extend the check for root references to also look at /proc/$PID/maps. Something like this (verified):

--- dracut-lib.sh.orig 2018-09-18 13:24:49.000000000 +0300
+++ dracut-lib.sh 2018-09-23 10:31:13.300054544 +0300
@@ -118,7 +118,7 @@ killall_proc_mountpoint() {
 esac
 [ -e "/proc/$_pid/exe" ] || continue
 [ -e "/proc/$_pid/root" ] || continue
-strstr "$(ls -l -- "/proc/$_pid" "/proc/$_pid/fd" 2>/dev/null)" "$1" && kill -9 "$_pid"
+strstr "$(ls -l -- "/proc/$_pid" "/proc/$_pid/fd" 2>/dev/null; cat "/proc/$_pid/maps" 2> /dev/null)" "$1" && kill -9 "$_pid"
 done
 }

Note that there are also other places that use a similar check (the most obvious being the /shutdown script