Re: [systemd-devel] bpfilter blocks root unmount during shutdown

2018-09-25 Thread Cristian Rodríguez



On 24-09-2018 at 13:30, Andrei Borzenkov wrote:


> This process is spawned as a special kernel thread, even though it is
> otherwise a normal user process.


WUT? So how is this new kind of task supposed to be handled by
userspace? Looks like a kernel bug to me.




___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] bpfilter blocks root unmount during shutdown

2018-09-24 Thread Lennart Poettering
On Mon, 24.09.18 20:17, Andrei Borzenkov (arvidj...@gmail.com) wrote:

> > I am sorry, what? Are you saying there's now a third kind of task?
> > real kernel threads, real userspace processes, and weird shit running
> > kernel code that in turn runs userspace supplied programs, and all
> > that under user control?
> > 
> 
> No, it is not exactly "user control". It runs an executable embedded in
> a kernel module. So it is not arbitrary code. In this particular case at
> least.

By "user control" I meant that they are kill()-able by users (kernel
threads generally are not).

> > Do these processes report PF_KTHREAD in /proc/$PID/stat? i.e. do they
> > pass the recently reworked is_kernel_thread() tests?
> 
> No. The flags are 4194560 == 0x400100 == PF_RANDOMIZE|PF_SUPERPRIV.
> 
> And sorry, I cannot comment on "these processes"; I have seen only one
> concrete example. I have no idea how widespread use of this facility is.
> 
> > We might want to update killall.c then so that it no longer makes
> > assumptions about the validity of /proc/$PID/cmdline, but strictly uses
> > is_kernel_thread(). That should fix things properly for you, no? That
> > way dracut won't even see this new kind of process at all...
> 
> Well, I suppose there could be corner cases where the executable and
> the libraries are on different filesystems, but that had better wait
> for a real-life example.

I prepped this PR:

https://github.com/systemd/systemd/pull/10159

I think this should fix your issue; could you test it? (Checking
PF_KTHREAD is more correct anyway, so regardless of the test result this
is the right approach and should be merged.)
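
For readers who want to see what a PF_KTHREAD-based check amounts to, here is
a minimal shell sketch. It is not the code from the PR above; the function
names stat_line_is_kthread and is_kernel_thread_sh are made up for
illustration. It assumes the usual /proc/$PID/stat layout, in which the task
flags are the ninth field, and the kernel's PF_KTHREAD value of 0x00200000.

```shell
# stat_line_is_kthread LINE
# Succeeds if the flags field of a /proc/PID/stat line has PF_KTHREAD set.
stat_line_is_kthread() {
    # Drop "pid (comm) ". comm may itself contain spaces or ')',
    # so cut at the last ") " in the line.
    _rest=${1##*) }
    # Remaining fields: state ppid pgrp session tty_nr tpgid flags ...
    set -- $_rest
    [ "$#" -ge 7 ] || return 1
    [ $(( $7 & 0x00200000 )) -ne 0 ]
}

# is_kernel_thread_sh PID (hypothetical helper name)
# Reads the live stat file; an unreadable file is treated as "not a kthread".
is_kernel_thread_sh() {
    _line=$(cat "/proc/$1/stat" 2>/dev/null) || return 1
    stat_line_is_kthread "$_line"
}
```

With a check like this, the answer no longer depends on whether
/proc/$PID/cmdline happens to be empty.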

Lennart

-- 
Lennart Poettering, Red Hat


Re: [systemd-devel] bpfilter blocks root unmount during shutdown

2018-09-24 Thread Andrei Borzenkov
24.09.2018 19:52, Lennart Poettering wrote:
> On Mon, 24.09.18 19:30, Andrei Borzenkov (arvidj...@gmail.com) wrote:
> 
>> 24.09.2018 16:20, Lennart Poettering wrote:
>>> On Sun, 23.09.18 10:38, Andrei Borzenkov (arvidj...@gmail.com) wrote:
>>>
>>>> The dracut /shutdown script first tries to kill all processes still running
>>>> off the old root. Unfortunately this fails for the special user process that
>>>> runs bpfilter, because it has no reference to /oldroot in the places
>>>> dracut checks in killall_proc_mountpoint()
>>>
>>> Hmm, when we invoke the /shutdown executable we already executed our
>>> process killing spree as part of systemd-shutdown. How come your
>>> processes even survive that long?
>>
>>
>>         p = procfs_file_alloca(pid, "cmdline");
>>         f = fopen(p, "re");
>>         if (!f)
>>                 return true; /* not really, but has the desired effect */
>>
>>         count = fread(&c, 1, 1, f);
>>
>>         /* Kernel threads have an empty cmdline */
>>         if (count <= 0)
>>                 return true;
>>
>>
>> This process is spawned as a special kernel thread, even though it is
>> otherwise a normal user process.
> 
> I am sorry, what? Are you saying there's now a third kind of task?
> real kernel threads, real userspace processes, and weird shit running
> kernel code that in turn runs userspace supplied programs, and all
> that under user control?
> 

No, it is not exactly "user control". It runs an executable embedded in
a kernel module. So it is not arbitrary code. In this particular case at
least.

> If so, yuck...
> 
> Under which parent PID do they show up? kthreadd or somewhere further
> down?
> 

I showed it in the original post.

10:~ # ps -ef | fgrep '[none]'
root       984     2  0 09:46 ?        00:00:00 [none]

Yes, this is kthreadd.

> Do these processes report PF_KTHREAD in /proc/$PID/stat? i.e. do they
> pass the recently reworked is_kernel_thread() tests?
> 


No. The flags are 4194560 == 0x400100 == PF_RANDOMIZE|PF_SUPERPRIV.

And sorry, I cannot comment on "these processes"; I have seen only one
concrete example. I have no idea how widespread use of this facility is.
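
The arithmetic in the reply above can be checked mechanically. A small
sketch, using the flag values from the kernel's include/linux/sched.h; the
helper name decode_task_flags is hypothetical and only these three flags are
decoded:

```shell
# Task flag values as defined in the kernel's include/linux/sched.h.
PF_SUPERPRIV=$((0x00000100))   # used super-user privileges
PF_KTHREAD=$((0x00200000))     # kernel thread
PF_RANDOMIZE=$((0x00400000))   # randomized virtual address space

# decode_task_flags FLAGS (hypothetical helper)
# Prints the names of the flags above that are set in FLAGS.
decode_task_flags() {
    _flags=$1
    _out=""
    if [ $((_flags & PF_SUPERPRIV)) -ne 0 ]; then _out="$_out PF_SUPERPRIV"; fi
    if [ $((_flags & PF_KTHREAD)) -ne 0 ]; then _out="$_out PF_KTHREAD"; fi
    if [ $((_flags & PF_RANDOMIZE)) -ne 0 ]; then _out="$_out PF_RANDOMIZE"; fi
    echo "${_out# }"
}
```

Calling decode_task_flags 4194560 prints "PF_SUPERPRIV PF_RANDOMIZE", which
matches the value reported above: PF_KTHREAD is not set for this process.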

> We might want to update killall.c then so that it no longer makes
> assumptions about the validity of /proc/$PID/cmdline, but strictly uses
> is_kernel_thread(). That should fix things properly for you, no? That
> way dracut won't even see this new kind of process at all...
> 

Well, I suppose there could be corner cases where the executable and
the libraries are on different filesystems, but that had better wait
for a real-life example.


Re: [systemd-devel] bpfilter blocks root unmount during shutdown

2018-09-24 Thread Lennart Poettering
On Mon, 24.09.18 19:30, Andrei Borzenkov (arvidj...@gmail.com) wrote:

> 24.09.2018 16:20, Lennart Poettering wrote:
> > On Sun, 23.09.18 10:38, Andrei Borzenkov (arvidj...@gmail.com) wrote:
> > 
> >> The dracut /shutdown script first tries to kill all processes still running
> >> off the old root. Unfortunately this fails for the special user process that
> >> runs bpfilter, because it has no reference to /oldroot in the places
> >> dracut checks in killall_proc_mountpoint()
> > 
> > Hmm, when we invoke the /shutdown executable we already executed our
> > process killing spree as part of systemd-shutdown. How come your
> > processes even survive that long?
> 
> 
>         p = procfs_file_alloca(pid, "cmdline");
>         f = fopen(p, "re");
>         if (!f)
>                 return true; /* not really, but has the desired effect */
> 
>         count = fread(&c, 1, 1, f);
> 
>         /* Kernel threads have an empty cmdline */
>         if (count <= 0)
>                 return true;
> 
> 
> This process is spawned as a special kernel thread, even though it is
> otherwise a normal user process.

I am sorry, what? Are you saying there's now a third kind of task?
real kernel threads, real userspace processes, and weird shit running
kernel code that in turn runs userspace supplied programs, and all
that under user control?

If so, yuck...

Under which parent PID do they show up? kthreadd or somewhere further
down?

Do these processes report PF_KTHREAD in /proc/$PID/stat? i.e. do they
pass the recently reworked is_kernel_thread() tests?

We might want to update killall.c then so that it no longer makes
assumptions about the validity of /proc/$PID/cmdline, but strictly uses
is_kernel_thread(). That should fix things properly for you, no? That
way dracut won't even see this new kind of process at all...

Lennart

-- 
Lennart Poettering, Red Hat


Re: [systemd-devel] bpfilter blocks root unmount during shutdown

2018-09-24 Thread Olivier Brunel
On Mon, 24 Sep 2018 15:20:47 +0200
Lennart Poettering wrote:

> On Sun, 23.09.18 10:38, Andrei Borzenkov (arvidj...@gmail.com) wrote:
> 
> > The dracut /shutdown script first tries to kill all processes still
> > running off the old root. Unfortunately this fails for the special user
> > process that runs bpfilter, because it has no reference to /oldroot
> > in the places dracut checks in killall_proc_mountpoint()
> 
> Hmm, when we invoke the /shutdown executable we already executed our
> process killing spree as part of systemd-shutdown. How come your
> processes even survive that long? What am I missing?

I believe it's because the bpfilter helper process is identified as a
kernel thread - since it has an empty command line - and therefore not
killed.

I personally feel this is a bug (in the kernel), but apparently
this whole bpfilter thing isn't quite ready yet and shouldn't be
used for the moment -- so hopefully it'll be improved or fixed in the
meantime.
You can see this thread[1] about the issue.

Cheers,



[1] https://www.spinics.net/lists/netdev/msg520030.html


Re: [systemd-devel] bpfilter blocks root unmount during shutdown

2018-09-24 Thread Andrei Borzenkov
24.09.2018 16:20, Lennart Poettering wrote:
> On Sun, 23.09.18 10:38, Andrei Borzenkov (arvidj...@gmail.com) wrote:
> 
>> The dracut /shutdown script first tries to kill all processes still running
>> off the old root. Unfortunately this fails for the special user process that
>> runs bpfilter, because it has no reference to /oldroot in the places
>> dracut checks in killall_proc_mountpoint()
> 
> Hmm, when we invoke the /shutdown executable we already executed our
> process killing spree as part of systemd-shutdown. How come your
> processes even survive that long?


        p = procfs_file_alloca(pid, "cmdline");
        f = fopen(p, "re");
        if (!f)
                return true; /* not really, but has the desired effect */

        count = fread(&c, 1, 1, f);

        /* Kernel threads have an empty cmdline */
        if (count <= 0)
                return true;


This process is spawned as a special kernel thread, even though it is
otherwise a normal user process.
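
To make the heuristic concrete, here is a rough shell equivalent of the
excerpt above. It is a sketch, not systemd code: the function name
cmdline_is_empty and the PROCDIR argument (normally /proc) are assumptions
added so that it can be tried against a scratch directory.

```shell
# cmdline_is_empty PROCDIR PID (hypothetical helper)
# Succeeds when the task would be skipped by the heuristic: either the
# cmdline file cannot be read, or not even one byte can be read from it.
cmdline_is_empty() {
    _f=$1/$2/cmdline
    # Mirrors the "return true" taken when fopen() fails.
    [ -r "$_f" ] || return 0
    # Read at most one byte. od turns it into text so that a NUL byte
    # is not lost in the command substitution.
    _byte=$(dd if="$_f" bs=1 count=1 2>/dev/null | od -An -tx1)
    [ -z "$_byte" ]
}
```

Because the test only looks at cmdline, it gives the same answer for a real
kernel thread and for a user process that was started without arguments,
which is the situation described here.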

net/bpfilter/bpfilter_kern.c:load_umh():


        /* fork usermode process */
        err = fork_usermode_blob(&bpfilter_umh_start,
                                 &bpfilter_umh_end - &bpfilter_umh_start,
                                 &info);
        if (err)
                return err;
        pr_info("Loaded bpfilter_umh pid %d\n", info.pid);


Re: [systemd-devel] bpfilter blocks root unmount during shutdown

2018-09-24 Thread Lennart Poettering
On Sun, 23.09.18 10:38, Andrei Borzenkov (arvidj...@gmail.com) wrote:

> The dracut /shutdown script first tries to kill all processes still running
> off the old root. Unfortunately this fails for the special user process that
> runs bpfilter, because it has no reference to /oldroot in the places
> dracut checks in killall_proc_mountpoint()

Hmm, when we invoke the /shutdown executable we already executed our
process killing spree as part of systemd-shutdown. How come your
processes even survive that long? What am I missing?

Lennart

-- 
Lennart Poettering, Red Hat


[systemd-devel] bpfilter blocks root unmount during shutdown

2018-09-23 Thread Andrei Borzenkov
The dracut /shutdown script first tries to kill all processes still running
off the old root. Unfortunately this fails for the special user process that
runs bpfilter, because it has no reference to /oldroot in the places
dracut checks in killall_proc_mountpoint().

10:~ # ps -ef | fgrep '[none]'
root       984     2  0 09:46 ?        00:00:00 [none]

/proc/984:
total 0
dr-xr-xr-x 2 root 0 0 Sep 23 10:11 attr
-r-------- 1 root 0 0 Sep 23 10:11 auxv
-r--r--r-- 1 root 0 0 Sep 23 10:11 cgroup
--w------- 1 root 0 0 Sep 23 10:11 clear_refs
-r--r--r-- 1 root 0 0 Sep 23 10:10 cmdline
-rw-r--r-- 1 root 0 0 Sep 23 10:11 comm
-rw-r--r-- 1 root 0 0 Sep 23 10:11 coredump_filter
-r--r--r-- 1 root 0 0 Sep 23 10:11 cpuset
lrwxrwxrwx 1 root 0 0 Sep 23 10:11 cwd -> /
-r-------- 1 root 0 0 Sep 23 10:11 environ
lrwxrwxrwx 1 root 0 0 Sep 23 10:11 exe -> / (deleted)
-rw-r--r-- 1 root 0 0 Sep 23 10:11 fail-nth
dr-x------ 2 root 0 0 Sep 23 10:11 fd
dr-x------ 2 root 0 0 Sep 23 10:11 fdinfo
-rw-r--r-- 1 root 0 0 Sep 23 10:11 gid_map
-r-------- 1 root 0 0 Sep 23 10:11 io
-r--r--r-- 1 root 0 0 Sep 23 10:11 latency
-r--r--r-- 1 root 0 0 Sep 23 10:11 limits
-rw-r--r-- 1 root 0 0 Sep 23 10:11 loginuid
-rw-r--r-- 1 root 0 0 Sep 23 10:11 make-it-fail
dr-x------ 2 root 0 0 Sep 23 10:11 map_files
-r--r--r-- 1 root 0 0 Sep 23 10:10 maps
-rw------- 1 root 0 0 Sep 23 10:11 mem
-r--r--r-- 1 root 0 0 Sep 23 10:11 mountinfo
-r--r--r-- 1 root 0 0 Sep 23 10:11 mounts
-r-------- 1 root 0 0 Sep 23 10:11 mountstats
dr-xr-xr-x 6 root 0 0 Sep 23 10:11 net
dr-x--x--x 2 root 0 0 Sep 23 10:11 ns
-r--r--r-- 1 root 0 0 Sep 23 10:11 numa_maps
-rw-r--r-- 1 root 0 0 Sep 23 10:11 oom_adj
-r--r--r-- 1 root 0 0 Sep 23 10:11 oom_score
-rw-r--r-- 1 root 0 0 Sep 23 10:11 oom_score_adj
-r-------- 1 root 0 0 Sep 23 10:11 pagemap
-r-------- 1 root 0 0 Sep 23 10:11 patch_state
-r-------- 1 root 0 0 Sep 23 10:11 personality
-rw-r--r-- 1 root 0 0 Sep 23 10:11 projid_map
lrwxrwxrwx 1 root 0 0 Sep 23 10:11 root -> /
-rw-r--r-- 1 root 0 0 Sep 23 10:11 sched
-r--r--r-- 1 root 0 0 Sep 23 10:11 schedstat
-r--r--r-- 1 root 0 0 Sep 23 10:11 sessionid
-rw-r--r-- 1 root 0 0 Sep 23 10:11 setgroups
-r--r--r-- 1 root 0 0 Sep 23 10:11 smaps
-r--r--r-- 1 root 0 0 Sep 23 10:11 smaps_rollup
-r-------- 1 root 0 0 Sep 23 10:11 stack
-r--r--r-- 1 root 0 0 Sep 23 10:10 stat
-r--r--r-- 1 root 0 0 Sep 23 10:11 statm
-r--r--r-- 1 root 0 0 Sep 23 10:10 status
-r-------- 1 root 0 0 Sep 23 10:11 syscall
dr-xr-xr-x 3 root 0 0 Sep 23 10:11 task
-r--r--r-- 1 root 0 0 Sep 23 10:11 timers
-rw-rw-rw- 1 root 0 0 Sep 23 10:11 timerslack_ns
-rw-r--r-- 1 root 0 0 Sep 23 10:11 uid_map
-r--r--r-- 1 root 0 0 Sep 23 10:11 wchan

/proc/984/fd:
total 0
lr-x------ 1 root 0 64 Sep 23 10:11 0 -> pipe:[19409]
l-wx------ 1 root 0 64 Sep 23 10:11 1 -> pipe:[19410]
lrwx------ 1 root 0 64 Sep 23 10:11 2 -> /oldsys/dev/console


But it does contain a reference to /oldroot in its list of mapped
libraries (/proc/984/maps):

563b63002000-563b63003000 r--p 00000000 00:05 19404                      / (deleted)
563b63003000-563b63004000 r-xp 00001000 00:05 19404                      / (deleted)
563b63004000-563b63005000 r--p 00002000 00:05 19404                      / (deleted)
563b63005000-563b63006000 r--p 00002000 00:05 19404                      / (deleted)
563b63006000-563b63007000 rw-p 00003000 00:05 19404                      / (deleted)
563b63fb4000-563b63fd5000 rw-p 00000000 00:00 0                          [heap]
7fa3a46cc000-7fa3a4882000 r-xp 00000000 00:2a 7728                       /oldroot/lib64/libc-2.27.so
7fa3a4882000-7fa3a4a82000 ---p 001b6000 00:2a 7728                       /oldroot/lib64/libc-2.27.so
7fa3a4a82000-7fa3a4a86000 r--p 001b6000 00:2a 7728                       /oldroot/lib64/libc-2.27.so
7fa3a4a86000-7fa3a4a88000 rw-p 001ba000 00:2a 7728                       /oldroot/lib64/libc-2.27.so
7fa3a4a88000-7fa3a4a8c000 rw-p 00000000 00:00 0
7fa3a4a8c000-7fa3a4ab1000 r-xp 00000000 00:2a 7720                       /oldroot/lib64/ld-2.27.so
7fa3a4ca7000-7fa3a4ca9000 rw-p 00000000 00:00 0
7fa3a4cb1000-7fa3a4cb2000 r--p 00025000 00:2a 7720                       /oldroot/lib64/ld-2.27.so
7fa3a4cb2000-7fa3a4cb3000 rw-p 00026000 00:2a 7720                       /oldroot/lib64/ld-2.27.so
7fa3a4cb3000-7fa3a4cb4000 rw-p 00000000 00:00 0
7ffea03b4000-7ffea03d5000 rw-p 00000000 00:00 0                          [stack]
7ffea03df000-7ffea03e2000 r--p 00000000 00:00 0                          [vvar]
7ffea03e2000-7ffea03e4000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]

So the quick fix would be to extend check for root references to also
look into /proc/$PID/maps. Something like (verified):

--- dracut-lib.sh.orig  2018-09-18 13:24:49.000000000 +0300
+++ dracut-lib.sh   2018-09-23 10:31:13.300054544 +0300
@@ -118,7 +118,7 @@ killall_proc_mountpoint() {
         esac
         [ -e "/proc/$_pid/exe" ] || continue
         [ -e "/proc/$_pid/root" ] || continue
-        strstr "$(ls -l -- "/proc/$_pid" "/proc/$_pid/fd" 2>/dev/null)" "$1" && kill -9 "$_pid"
+        strstr "$(ls -l -- "/proc/$_pid" "/proc/$_pid/fd" 2>/dev/null; cat "/proc/$_pid/maps" 2> /dev/null)" "$1" && kill -9 "$_pid"
     done
 }
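
The same idea as a standalone function, for trying the check outside of
dracut. This is only a sketch: proc_refs_path and its PROCDIR argument
(normally /proc) are hypothetical, and the substring match is done with a
case pattern instead of dracut's strstr helper.

```shell
# proc_refs_path PROCDIR PID PATH (hypothetical helper)
# Succeeds if PATH shows up in the symlink listing of PROCDIR/PID and
# PROCDIR/PID/fd, or anywhere in PROCDIR/PID/maps.
proc_refs_path() {
    _dir=$1/$2
    _seen=$(ls -l -- "$_dir" "$_dir/fd" 2>/dev/null
            cat "$_dir/maps" 2>/dev/null)
    case $_seen in
        *"$3"*) return 0 ;;   # quoted, so PATH is matched literally
    esac
    return 1
}
```

For the process shown above, proc_refs_path /proc 984 /oldroot would succeed
only because of the libc and ld.so mappings in maps; cwd, root, exe and the
file descriptors do not mention /oldroot.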


Note that there are also other places that use similar check (most
obvious being /shutdown script