Re: [systemd-devel] protecting sshd against forkbombs, excessive memory usage by other processes

2020-08-14 Thread Benjamin Berg
Hi,

I would suggest trying the following:

 * Set a MemoryLow allocation
 * Enable the CPU cgroup controller

For the first, it makes sense to set MemoryLow= on system.slice and
also to set DefaultMemoryLow= or MemoryLow= on sshd.service. Otherwise
the behaviour might be somewhat unexpected for now, see
  https://github.com/systemd/systemd/pull/16559
I guess one could also do something similar for user-0.slice.
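
A minimal sketch of the MemoryLow= suggestion as drop-in files. The file
names and the 512M/64M values are illustrative assumptions, not
recommendations from this thread; choose values that fit the machine. On
some distributions the unit is called ssh.service rather than sshd.service.

```ini
# /etc/systemd/system/system.slice.d/50-memory-low.conf  (hypothetical file name)
[Slice]
# Assumed value: memory protected for everything under system.slice.
MemoryLow=512M

# /etc/systemd/system/sshd.service.d/50-memory-low.conf  (hypothetical file name)
[Service]
# Assumed value: part of that protection set aside for sshd itself.
MemoryLow=64M
```

Run `systemctl daemon-reload` afterwards. MemoryLow= is only supported on
the unified (v2) cgroup hierarchy; `systemctl show sshd.service -p MemoryLow`
should report the configured value.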

The second part ensures CPU is allocated to users fairly, meaning that
the user-X.slice units compete against each other, rather than the
individual processes. This will effectively give the root login and SSH
service a higher CPU priority in relation to the fork bomb. You can do
this by setting CPUWeight=100 on user-.slice. It will also result in
system.slice and user.slice competing for CPU on an equal footing.
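
The CPU part could look like the drop-in below. CPUWeight=100 is the value
given in this message; the file name is a hypothetical choice. A drop-in
directory named user-.slice.d applies to every user-UID.slice unit.

```ini
# /etc/systemd/system/user-.slice.d/50-cpu-weight.conf  (hypothetical file name)
[Slice]
# Setting a weight makes systemd enable the CPU controller for these slices,
# so users compete per slice rather than per process.
CPUWeight=100
```

After `systemctl daemon-reload`, and assuming the unified cgroup hierarchy
is mounted at /sys/fs/cgroup, `cpu` should be listed in
/sys/fs/cgroup/user.slice/cgroup.subtree_control once a user slice is active.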

Benjamin

On Wed, 2020-08-12 at 12:57 +0900, Tomasz Chmielewski wrote:
> I've made a mistake and have executed a forkbomb-like task. Almost
> immediately, the system became unresponsive, ssh sessions froze or
> were very slow to output even single characters; some ssh sessions
> timed out and were disconnected.
>
> It was not possible to connect a new ssh session to interrupt the
> runaway task - new connection attempts were simply timing out.
>
> SSH is the only way to access the server. Eventually, after some 30
> mins, the system "unfroze" - but - I wonder - can systemd help
> sysadmins get out of such situations?
>
> I realize it's a bit tricky, as there are two cases here:
>
> 1) misbehaving program is a child process of sshd (i.e. user logged
> in and executed a forkbomb)
>
> 2) misbehaving program is not a child process of sshd (i.e. some
> system service is using a lot of resources)
>
>
> Given that - how can we tune systemd so that a system admin is almost
> always able to log in via a new SSH connection, in both cases
> outlined above? My use case assumes user error rather than malicious
> system resource usage.
> 
> 
> 
> Tomasz Chmielewski


___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] protecting sshd against forkbombs, excessive memory usage by other processes

2020-08-12 Thread Tomasz Chmielewski

On 2020-08-12 22:07, Mantas Mikulėnas wrote:
> On Wed, Aug 12, 2020 at 7:03 AM Tomasz Chmielewski wrote:
>
>> I've made a mistake and have executed a forkbomb-like task. Almost
>> immediately, the system became unresponsive, ssh sessions froze or
>> were very slow to output even single characters; some ssh sessions
>> timed out and were disconnected.
>>
>> It was not possible to connect a new ssh session to interrupt the
>> runaway task - new connection attempts were simply timing out.
>>
>> SSH is the only way to access the server. Eventually, after some 30
>> mins, the system "unfroze" - but - I wonder - can systemd help
>> sysadmins get out of such situations?
>>
>> I realize it's a bit tricky, as there are two cases here:
>>
>> 1) misbehaving program is a child process of sshd (i.e. user logged
>> in and executed a forkbomb)
>
> I don't think "child process of sshd" is the useful part, as logged-in
> user processes are actually moved to a separate cgroup for the session
> – so yes, they're sshd children, but they actually have resource
> limits fully separate from the main sshd daemon process.
>
> Which means that with systemd, each user already has their own limit
> on the number of processes/tasks (the default in user-.slice.d is
> TasksMax=33% of...something, but it could be lowered to e.g. 10% or to
> 4096) without affecting the service itself.
>
> So I'm sure that sshd.service and user-0.slice could be tweaked
> somehow to give root a higher priority at cgroup level, but that
> depends on what your system actually ran out of...


It ran out of memory.


Tomasz


Re: [systemd-devel] protecting sshd against forkbombs, excessive memory usage by other processes

2020-08-12 Thread Mantas Mikulėnas
On Wed, Aug 12, 2020 at 7:03 AM Tomasz Chmielewski wrote:

> I've made a mistake and have executed a forkbomb-like task. Almost
> immediately, the system became unresponsive, ssh sessions froze or were
> very slow to output even single characters; some ssh sessions timed out
> and were disconnected.
>
> It was not possible to connect a new ssh session to interrupt the
> runaway task - new connection attempts were simply timing out.
>
> SSH is the only way to access the server. Eventually, after some 30
> mins, the system "unfroze" - but - I wonder - can systemd help sysadmins
> get out of such situations?
>
> I realize it's a bit tricky, as there are two cases here:
>
> 1) misbehaving program is a child process of sshd (i.e. user logged in
> and executed a forkbomb)
>

I don't think "child process of sshd" is the useful part, as logged-in user
processes are actually moved to a separate cgroup for the session – so yes,
they're sshd children, but they actually have resource limits fully
separate from the main sshd daemon process.

Which means that with systemd, each user already has their own limit on the
number of processes/tasks (the default in user-.slice.d is TasksMax=33%
of...something, but it could be lowered to e.g. 10% or to 4096) without
affecting the service itself.
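
For illustration, lowering the per-user task limit could be done with a
drop-in like the one below. The 4096 figure is the example from this
message; the file name is an assumption.

```ini
# /etc/systemd/system/user-.slice.d/50-tasks-max.conf  (hypothetical file name)
[Slice]
# Caps the number of tasks (processes and threads) in each user-UID.slice.
# A percentage such as 10% is also accepted.
TasksMax=4096
```

The cap applies to each user's slice only; sshd.service keeps its own
separate limit.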

So I'm sure that sshd.service and user-0.slice could be tweaked somehow to
give root a higher priority at cgroup level, but that depends on what your
system actually ran out of...

-- 
Mantas Mikulėnas


[systemd-devel] protecting sshd against forkbombs, excessive memory usage by other processes

2020-08-11 Thread Tomasz Chmielewski
I've made a mistake and have executed a forkbomb-like task. Almost
immediately, the system became unresponsive, ssh sessions froze or were
very slow to output even single characters; some ssh sessions timed out
and were disconnected.

It was not possible to connect a new ssh session to interrupt the
runaway task - new connection attempts were simply timing out.

SSH is the only way to access the server. Eventually, after some 30
mins, the system "unfroze" - but - I wonder - can systemd help sysadmins
get out of such situations?

I realize it's a bit tricky, as there are two cases here:

1) misbehaving program is a child process of sshd (i.e. user logged in
and executed a forkbomb)

2) misbehaving program is not a child process of sshd (i.e. some system
service is using a lot of resources)

Given that - how can we tune systemd so that a system admin is almost
always able to log in via a new SSH connection, in both cases outlined
above? My use case assumes user error rather than malicious system
resource usage.




Tomasz Chmielewski