[systemd-devel] protecting sshd against forkbombs, excessive memory usage by other processes

2020-08-11 Thread Tomasz Chmielewski
I made a mistake and executed a forkbomb-like task. Almost immediately 
the system became unresponsive: ssh sessions froze or were very slow to 
output even single characters, and some sessions timed out and were 
disconnected.


It was not possible to connect a new ssh session to interrupt the 
runaway task - new connection attempts simply timed out.


SSH is the only way to access the server. Eventually, after some 30 
minutes, the system "unfroze" - but I wonder: can systemd help sysadmins 
get out of such situations?


I realize it's a bit tricky, as there are two cases here:

1) the misbehaving program is a child process of sshd (i.e. a user logged 
in and executed a forkbomb)


2) the misbehaving program is not a child process of sshd (i.e. some 
system service is using a lot of resources)



Given that, how can we tune systemd so that the system admin is almost 
always able to log in via a new SSH connection, in both cases outlined 
above? My use case assumes user error rather than malicious system 
resource usage.
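
One set of knobs worth trying (an untested sketch: these are documented
systemd resource-control directives, but the drop-in paths and values are
guesses to adapt, and several of them - MemoryHigh=, MemoryMin=, CPUWeight= -
require cgroup v2):

```
# /etc/systemd/system/user-.slice.d/50-limits.conf (hypothetical drop-in,
# applies to every user-UID.slice, i.e. to all login sessions)
[Slice]
TasksMax=2048      # caps how many processes a forkbomb can spawn
MemoryHigh=70%     # throttle a runaway session before the box swaps to death
MemoryMax=80%      # hard ceiling
```

```
# /etc/systemd/system/sshd.service.d/50-priority.conf (hypothetical drop-in)
[Service]
CPUWeight=1000       # favour sshd under CPU contention (default weight is 100)
MemoryMin=64M        # try to keep some memory resident for sshd
OOMScoreAdjust=-900  # make the kernel OOM killer avoid sshd
```

That addresses case 1 directly; for case 2 the same directives would have
to be set on the offending services (or on system.slice as a whole).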




Tomasz Chmielewski


Re: [systemd-devel] Shutdown order in systemd

2020-08-11 Thread Colin Guthrie
Just FYI and for the sake of cross-referencing, the inhibition logic was
mentioned on the list today in a thread: "systemd-inhibit don't work".

A developer says he will work on the patch for this RFE shortly.

Col
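
For anyone who wants to experiment with the delay-inhibit approach discussed
in the quoted exchange below, a rough, untested sketch. It assumes the delay
ceiling is raised (logind's default InhibitDelayMaxSec= is only 5 seconds)
and a hypothetical /usr/local/bin/drain.sh implementing steps 2-4:

```
# /etc/systemd/logind.conf
[Login]
InhibitDelayMaxSec=70
```

```
#!/bin/bash
# Meant to run from a long-lived service started at boot: hold a shutdown
# delay lock; when logind announces PrepareForShutdown(true), run the drain
# script, then exit so the lock is released and shutdown proceeds.
systemd-inhibit --what=shutdown --mode=delay \
    --who=graceful-shutdown --why="draining LB connections" \
    bash -c '
        busctl monitor \
            --match "type=signal,interface=org.freedesktop.login1.Manager,member=PrepareForShutdown" \
            | grep -q -m1 "BOOLEAN true"
        /usr/local/bin/drain.sh
    '
```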

Zheng SHAO wrote on 04/08/2020 13:39:
> Hello,
>
> First, thanks for your advice.
>
> I didn't know about inhibitors before; today I read the documentation
> and did a few simple tests. Here is one of them.
>
> $ sudo systemd-inhibit --what=shutdown --who=graceful-shutdown
> --why="Keep application working" --mode="block" /bin/sleep 60
>
> Unfortunately, once the ACPI G2 soft-off signal comes, the system begins to
> shut down immediately. I'm still figuring out why systemd-inhibit did
> not block the shutdown process.
>
> At the same time, I found an interesting project which tries to block
> shutdown completely:
> https://github.com/ryran/reboot-guard
>
> Thanks!
>
>> On Aug 4, 2020, at 4:01, Colin Guthrie wrote:
>>
>> Zheng SHAO wrote on 03/08/2020 13:31:
>>> Hello,
>>>
>>> We are looking for a robust way to handle the ACPI G2 soft-off signal to
>>> gracefully shut down our application.
>>> To simplify the problem, consider an instance running
>>> Nginx behind a load balancer.
>>> When the ACPI G2 soft-off signal comes to the Nginx instance, we
>>> want to do these jobs:
>>>
>>> 1. Keep current HTTP connections working.
>>> 2. Fail the health check on the load balancer side.
>>> 3. Make sure no new connections come from the load balancer.
>>> 4. Kill long-lived connections if any connection exceeds 60 seconds.
>>> 5. Continue the shutdown process.
>>>
>>> We are considering two options to achieve this:
>>> 1. Add a custom handler for `HandlePowerKey` in
>>> /etc/systemd/logind.conf.
>>> 2. Add a system service so that when systemd starts shutting down, this
>>> service runs first and blocks other services from being killed.
>>
>> Have you considered writing a service that takes a systemd-inhibit
>> shutdown lock?
>>
>> This might not work but looking very quickly at
>> https://www.freedesktop.org/wiki/Software/systemd/inhibit/ it would
>> appear you get a PrepareForShutdown signal which could kick off your
>> steps.
>>
>> Depending on how things work, you could just introduce a "delay" inhibit
>> rather than a "block"; e.g. a 70s delay could give you a bit of headroom
>> to fail the LB's health check.
>>
>> Or perhaps you could take a block lock, then when the
>> PrepareForShutdown() signal comes in, fail the LB; then, when that is
>> confirmed, add a new 60s delay inhibit (not sure if this works after
>> shutdown has been triggered), then release the block inhibit and just
>> wait for everything else to run its course. Alternatively, just keep the
>> block inhibit right up until you want step 5 to begin.
>>
>> Again, this is pure speculation: the default handler for
>> HandlePowerKey may bypass logind's inhibitors (tho' I suspect not), and
>> others may explain other reasons why this approach may not work.
>>
>> Good luck
>>
>> Col
>>
>>
>>> Method [2] is the preferred way, but we cannot find a correct
>>> implementation for it.
>>>
>>> ```
>>> [Unit]
>>> Description=Delay shutdown
>>> After=network-online.target network.target rsyslog.service
>>> After=google-instance-setup.service google-network-daemon.service
>>> After=systemd-user-sessions.service sshd.service
>>> google-fluentd.service user.slice system.slice
>>> nss-user-lookup.target logind.service
>>> Wants=network-online.target network.target rsyslog.service
>>> google-instance-setup.service google-network-daemon.service
>>> systemd-user-sessions.service sshd.service google-fluentd.service
>>> user.slice system.slice multi-user.target nss-user-lookup.target
>>> logind.service
>>>
>>> [Service]
>>> Type=oneshot
>>> ExecStart=/bin/true
>>> ExecStop=/root/shutdown.sh
>>> RemainAfterExit=yes
>>> KillMode=none
>>> TimeoutStopSec=0
>>> StandardOutput=journal+console
>>>
>>> [Install]
>>> WantedBy=multi-user.target
>>> ```
>>>
>>> /root/shutdown.sh
>>> ```
>>> #!/bin/bash
>>>
>>> echo start shutdown
>>> echo sleep 300
>>> sleep 300
>>> echo end shutdown
>>> ```
>>>
>>> We checked the console output, which shows the following:
>>> ```
>>> CentOS Linux 7 (Core)
>>> Kernel 3.10.0-1127.10.1.el7.x86_64 on an x86_64
>>>
>>> shao-redis-prd-base login: Aug  3 21:19:25 shao-redis-prd-base
>>> chronyd[449]: Selected source 169.254.169.254
>>> Aug  3 21:20:01 shao-redis-prd-base systemd: Created slice User
>>> Slice of root.
>>> Aug  3 21:20:01 shao-redis-prd-base systemd: Started Session 1 of
>>> user root.
>>> Aug  3 21:20:01 shao-redis-prd-base systemd: Removed slice User
>>> Slice of root.
>>> Aug  3 21:20:11 shao-redis-prd-base systemd: Started Unbound
>>> recursive Domain Name Server.
>>> Aug  3 21:20:11 shao-redis-prd-base systemd: Reached target Host and
>>> Network Name Lookups.
>>> Aug  3 21:20:11 shao-redis-prd-base unbound: [1204:0] notice: init
>>> module 0: ipsecmod
>>> Aug  3 21:20:11 shao-redis-prd-base unbound: [1204:0] notice: init
>>> module 1: validator
>>> Aug  3 
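
The method [2] pattern quoted above can be made to work; here is a trimmed,
untested sketch. It relies on units stopping in the reverse of their start
ordering, so the ExecStop= below runs while nginx is still up (unit and
script names are hypothetical):

```
# /etc/systemd/system/drain-before-shutdown.service
[Unit]
Description=Drain LB connections before shutdown
Wants=network-online.target
# Ordered after the things we must outlive: at shutdown this unit is
# stopped before nginx and before the network goes away.
After=network-online.target nginx.service

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/true
ExecStop=/usr/local/bin/drain.sh   # fail health check, wait, kill >60s conns
TimeoutStopSec=90

[Install]
WantedBy=multi-user.target
```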

Re: [systemd-devel] systemd-inhibit don't work

2020-08-11 Thread Reindl Harald



Am 10.08.20 um 15:37 schrieb Lennart Poettering:
> On Mo, 10.08.20 15:05, Reindl Harald (h.rei...@thelounge.net) wrote:
> 
>> well, i would expect that the reboot in the second ssh-session is
>> refused...
>>
>> [root@master:~]$ /usr/bin/systemd-inhibit --what=shutdown --who=root
>> --why="Backup in progress" --mode=block sleep 600
>>
>> [root@master:~]$ /usr/bin/systemd-inhibit; systemctl reboot
>> WHO  UID USER PID COMM            WHAT     WHY                MODE
>> root 0   root 569 systemd-inhibit shutdown Backup in progress block
>>
>> 1 inhibitors listed.
>> [root@master:~]$ Connection to master.thelounge.net closed by remote host.
>> Connection to master.thelounge.net closed.
>> [harry@srv-rhsoft:~]$
> 
> Root is almighty on UNIX. This also means it has the privilege to
ignore inhibitors, and that's what you are seeing here.

tell that to namespaces and cgroups :-)

my root user, when doing "su - root" from a graphical session, is far away
from almighty, given he inherits the drop-ins for display-manager.service

> There is a github issue filed asking for some mechanism to extend
> inhibitors so that root can't trivially override it, but so far this
> hasn't been implemented.
sounds good

a backup server with the only purpose of running backups is the perfect
example for "nice that you installed some updates, but creep away and
reboot later when i am finished, no matter who you are"

dnf doing upgrades in one session while another admin types reboot in a
different one would be another

no question, he should be able to enforce it no matter what, but not by
accident - and i have around 5 things in mind where "you don't reboot now"
would apply
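
Until such a mechanism exists, a crude guard is a wrapper that refuses to
reboot while block inhibitors are registered (hypothetical script; anyone
calling systemctl reboot directly still bypasses it):

```
#!/bin/bash
# hypothetical /usr/local/sbin/safe-reboot
if systemd-inhibit --list | grep -q 'shutdown.*block'; then
    echo "refusing to reboot, active shutdown inhibitors:" >&2
    systemd-inhibit --list >&2
    exit 1
fi
exec systemctl reboot
```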


Re: [systemd-devel] systemd.net-naming-scheme change after update

2020-08-11 Thread Michal Sekletar
On Wed, Aug 5, 2020 at 4:12 PM Thomas HUMMEL wrote:
>
>
> What I understand here in my case is that NAME is not empty (because of
> the biosdevname step), so I don't understand why I don't end up with em1
> instead of the onboard-style name. Would this mean ID_NET_NAME had been
> set in a previous step? What was the use of the biosdevname step then?

On RHEL/CentOS 8, biosdevname naming is not used unless it is explicitly
enabled on the kernel command line using biosdevname=1. Since you didn't
have an interface named by biosdevname to begin with, I'd assume that
this rule is always skipped (which is OK unless you have biosdevname=1 on
the cmdline).

In the case of the updated system, net_id failed to generate a name based on
an onboard index provided by the firmware. Hence naming falls back to the
next naming scheme, which is based on PCI topology. I can't explain the
difference in names between the updated and newly provisioned systems (provided
they are exactly identical in terms of HW, firmware, ...). Maybe due to
some race we assign a PCI-path-based name because the sysfs attribute that
is used to construct an onboard name (enoX) becomes available only later
on. If that is the case, then it would be a driver bug. To prove this
hypothesis you need to modify net_id so that it logs about the missing
attributes. Roughly here,

https://github.com/systemd-rhel/rhel-8/blob/master/src/udev/udev-builtin-net_id.c#L228

you need to call log_error() or something like that, and only then return
-ENOENT.
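
Something along these lines (a rough sketch against the 239-era udev_device
API in that tree; the exact surrounding code may differ from what is at the
link above):

```
/* in the onboard-name path of net_id: log before giving up */
attr = udev_device_get_sysattr_value(names->pcidev, "acpi_index");
if (!attr)
        attr = udev_device_get_sysattr_value(names->pcidev, "index");
if (!attr) {
        log_error("net_id: no acpi_index/index sysfs attribute for %s",
                  udev_device_get_syspath(names->pcidev));
        return -ENOENT;
}
```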

>
>
> finally, what does "If the kernel claims that the name it has set for a
> device is predictable" mean
> (https://www.freedesktop.org/software/systemd/man/systemd.link.html#)?
>
> And what is the kernel name (%k)? Is it always ethX?

The kernel can indicate, via the value of the name_assign_type sysfs
attribute, that the already-assigned name is predictable.
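
You can inspect this directly (assuming an interface named eth0; the values
are the kernel's NET_NAME_* constants, and the read fails with EINVAL when
the type is unknown):

```
$ cat /sys/class/net/eth0/name_assign_type
# 1 = NET_NAME_ENUM         kernel-enumerated ethX, not predictable
# 2 = NET_NAME_PREDICTABLE  kernel says the name is predictable
# 3 = NET_NAME_USER         named by userspace
# 4 = NET_NAME_RENAMED      renamed by userspace after initial registration
```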

More details in commit message,

https://patchwork.kernel.org/patch/3733841/

Cheers,

Michal


Re: [systemd-devel] I/O error on "systemctl kill -s HUP rsyslog.service"

2020-08-11 Thread Lennart Poettering
On Di, 11.08.20 14:14, Harald Dunkel (harald.dun...@aixigo.com) wrote:

> Hi folks,
>
> sending a HUP to rsyslog using the "systemd way" gives me an error:
>
>   # systemctl kill -s HUP rsyslog.service
>   Failed to kill unit rsyslog.service: Input/output error
>
> rsyslog receives the signal, but the exit value of systemctl indicates
> an error, affecting the logrotate service.
>
> This is a LXC container (lxc 4.0.2). Host and container are running
> Debian 10, esp. systemd 241-7~deb10u4. See https://bugs.debian.org/968049
>
>
> Every helpful hint is highly appreciated.

Can you run systemctl with SYSTEMD_LOG_LEVEL=debug? Anything
interesting in the debug output it generates then? I wonder where the
I/O error comes from...

i.e. the question is whether the error is client-side. If it's coming from
the server side, then the next thing to try would be to turn on debug
logging with "systemd-analyze log-level debug", reproduce the
issue, and then check if there's anything interesting in the logs.

Please provide the relevant log excerpts here then.
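
Concretely, something like this (assuming the commands are run against the
systemd instance inside the container):

```
# client side: debug output from systemctl itself
SYSTEMD_LOG_LEVEL=debug systemctl kill -s HUP rsyslog.service

# server side: enable manager debug logging, reproduce, inspect, restore
systemd-analyze log-level debug
systemctl kill -s HUP rsyslog.service
journalctl -b --since=-5min
systemd-analyze log-level info
```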

Lennart

--
Lennart Poettering, Berlin


[systemd-devel] I/O error on "systemctl kill -s HUP rsyslog.service"

2020-08-11 Thread Harald Dunkel

Hi folks,

sending a HUP to rsyslog using the "systemd way" gives me an error:

# systemctl kill -s HUP rsyslog.service
Failed to kill unit rsyslog.service: Input/output error

rsyslog receives the signal, but the exit value of systemctl indicates
an error, affecting the logrotate service.

This is a LXC container (lxc 4.0.2). Host and container are running
Debian 10, esp. systemd 241-7~deb10u4. See https://bugs.debian.org/968049


Every helpful hint is highly appreciated.

Harri



Re: [systemd-devel] systemd unit timer

2020-08-11 Thread Lennart Poettering
On Mo, 10.08.20 20:19, Dave Howorth (syst...@howorth.org.uk) wrote:

> > It kinda makes sense to invoke cronjobs the same way as any other
> > piece of system code in userspace: as a service, so that you can take
> > benefit of deps management, priv handling, logging, sandboxing and so
> > on, so that you can run stuff either manually or by timers or by any
> > other kind of activation, and so on, and it always ends up in exactly
> > one instance. And there's tons of other stuff too.
> >
> > i.e. it unifies how system programs are invoked, and that's a good
> > thing. it turns time-based activation into "just another type of
> > activation".
>
> Most of that has gone over my head so some examples would probably help
> me to understand. Perhaps they're in the git logs?
>
> But I'm not normally running system code in cronjobs. I usually run
> either scripts I have written myself, or backup commands and the
> like.

Well, by "system code" in this context I mean code running in system
code, i.e. not associated with a specific "human" user. i.e. all code
traditionally run from /etc/crontab and related dirs is in "system
context".

> If I wanted to run a service, presumably I could just write a 'manual'
> invocation as a cron or at job? I'm not seeing the big imperative to
> create another new bunch of code to learn and maintain. I expect I'm
> blind.

I mean, you don't have to use systemd timers+services, that's entirely
up to you. cron continues to work after all.

However, there's a ton of stuff in it for you if you do bother running
stuff as a timer. For example, let's say you wrote your own backup
script that streams your whole OS backup to some server. Stuff like
that you want to resource-manage a bit, i.e. use CPUShares=100 or so to
make sure it doesn't take up too many resources. You want to lock it down,
since it's interacting with the network, and it's bad enough it needs
to be able to read all your files, but it sure as hell shouldn't also
be able to change them, so you could lock it down with ProtectSystem=
and so on. And sometimes you want to start a backup manually, outside
of your usual schedule, so there's "systemctl start" of the backup
service to do so in a way that won't conflict if the schedule hits
while the backup is still running. Then, maybe you need some service
to be up while you are doing your backup (or a mount), and it might be
used by something else too, but should go away when not used. You can
pull it in cleanly from your timer's service now, and mark it
StopWhenUnneeded= so that it goes away when no service uses it. And so
on and so on.
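
To make that concrete, an untested sketch of the units described above (all
names, paths and values are hypothetical; CPUShares= is the cgroup-v1-era
knob mentioned, on unified cgroups it would be CPUWeight=):

```
# backup.service
[Unit]
Description=Stream OS backup to the backup server
Wants=backup-staging.mount network-online.target
After=backup-staging.mount network-online.target

[Service]
Type=oneshot
ExecStart=/usr/local/bin/backup.sh
CPUShares=100                    # keep the backup from hogging the CPU
ProtectSystem=strict             # may read everything, may change nothing...
ReadWritePaths=/backup-staging   # ...except its staging area

# backup.timer
[Timer]
OnCalendar=daily
Persistent=true

[Install]
WantedBy=timers.target

# backup-staging.mount - pulled in by the service, goes away when unused
[Unit]
StopWhenUnneeded=yes

[Mount]
What=/dev/backup-vg/staging
Where=/backup-staging
```

"systemctl start backup.service" then also works for a manual,
out-of-schedule run, and it simply joins a run the timer has already
started instead of conflicting with it.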

Lennart

--
Lennart Poettering, Berlin


Re: [systemd-devel] ConditionPathExists vs mount unit

2020-08-11 Thread Andrei Borzenkov
10.08.2020 20:59, Böszörményi Zoltán wrote:
> Hi,
> 
> I have to use the same OS image tarball (created by Yocto)
> on several machines with different specifications.
> 
> Where they differ is in disk size and partitioning. On the smaller
> machine (a Sicom SL20 POS, which boots from a CF card) the disk size
> is too small to have separate partitions for certain purposes that are,
> on the other hand, mandatory on the larger system.
> 
> The shipped disks are mass-produced and are pre-formatted with
> the same UUIDs across all devices so they are interchangeable.
> 
> So, I discovered the mount unit type:
> https://www.freedesktop.org/software/systemd/man/systemd.mount.html
> 
> This page says that the usual [Unit] section options are applicable.
> 
> I was hoping that the missing partitions could be skipped using the
> ConditionPathExists= option, but it seems that's not the case.
> 
> My mount unit looks like this:
> 
> $ cat var.mount
> [Unit]
> Description=Variable Data (/var)
> ConditionPathExists=/dev/disk/by-uuid/e8282db7-dd6d-4231-b2b1-49887648480c
> ConditionPathIsSymbolicLink=!/var
> DefaultDependencies=no
> Conflicts=umount.target
> Before=local-fs.target umount.target
> After=swap.target
> 
> [Mount]
> What=/dev/disk/by-uuid/e8282db7-dd6d-4231-b2b1-49887648480c
> Where=/var
> Options=noatime
> 
> [Install]
> WantedBy=local-fs.target
> 
> 
> This boots properly on the larger system where the extra /var
> partition exists but the smaller system fails to boot.
> 
> systemctl status var.mount says:
> 
> Dependency failed for Variable Data (/var)
> var.mount: Job var.mount/start failed with result 'dependency'
> 
> Is there a way to describe optional mounts via such Condition*= options?
> 

Not the way you are doing it. The device dependency is checked before
Condition*= directives are even looked at.

If your concern is only boot-time behavior, you should consider a generator
that creates correct mount units for the hardware actually present, as
sketched below.
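
An untested sketch of such a generator (the script name is hypothetical; it
probes with blkid because generators run before udev has created the
/dev/disk/by-uuid symlinks):

```
#!/bin/sh
# hypothetical /etc/systemd/system-generators/var-mount-generator
# Emit var.mount only when the partition actually exists on this machine.
GENDIR="$1"   # normal-priority generator output directory
UUID=e8282db7-dd6d-4231-b2b1-49887648480c

# blkid scans the block devices themselves, so it works before udev
# has populated /dev/disk/by-uuid
blkid -U "$UUID" >/dev/null 2>&1 || exit 0

cat > "$GENDIR/var.mount" <<EOF
[Unit]
Description=Variable Data (/var)
DefaultDependencies=no
Conflicts=umount.target
Before=local-fs.target umount.target
After=swap.target

[Mount]
What=/dev/disk/by-uuid/$UUID
Where=/var
Options=noatime
EOF

mkdir -p "$GENDIR/local-fs.target.wants"
ln -sf ../var.mount "$GENDIR/local-fs.target.wants/var.mount"
```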