Re: [systemd-devel] Service output missing from journal?
On Tue, 5 Jul 2022 at 00:29, Lennart Poettering wrote: > On Mo, 04.07.22 23:15, Michael Biebl (mbiebl at gmail.com) wrote: > Am Mo., 4. Juli 2022 um 19:36 Uhr schrieb Lennart Poettering > : > > > > eOn So, 03.07.22 19:29, Uwe Geuder (systemd-devel-ugeuder at snkmail.com) > > wrote: > > > > > Hi! > > > > > > When I run the command given below on a current Fedora CoreOS system > > > (systemd 250 (v250.6-1.fc36)) I get a result I absolute cannot understand. > > > Can anybody help me with what is wrong there? > > > > > > $ systemd-run --user sh -c 'while true; do echo foo; df -h > > > /var/log/journal/; echo $?; sleep 3; done' > > > Running as unit: run-r9a155474889b4d40a1ac119823bdc2bf.service > > > $ journalctl --user -f -u run-r9a155474889b4d40a1ac119823bdc2bf > > > [ ... similar lines redacted ... ] > > > Jul 03 15:25:08 ip-172-31-8-116 sh[366900]: 0 > > > Jul 03 15:25:11 ip-172-31-8-116 sh[366900]: foo > > > Jul 03 15:25:11 ip-172-31-8-116 sh[366900]: 0 > > > Jul 03 15:25:14 ip-172-31-8-116 sh[366900]: foo > > > Jul 03 15:25:14 ip-172-31-8-116 sh[366900]: 0 > > > Jul 03 15:25:17 ip-172-31-8-116 sh[366900]: foo > > > Jul 03 15:25:17 ip-172-31-8-116 sh[366900]: 0 > > > Jul 03 15:25:20 ip-172-31-8-116 sh[366900]: foo > > > Jul 03 15:25:20 ip-172-31-8-116 sh[366941]: Filesystem Size Used > > > Avail Use% Mounted on > > > Jul 03 15:25:20 ip-172-31-8-116 sh[366941]: /dev/nvme0n1p4 9.5G 8.1G > > > 1.5G 86% /var > > > Jul 03 15:25:20 ip-172-31-8-116 sh[366900]: 0 > > > Jul 03 15:25:23 ip-172-31-8-116 sh[366900]: foo > > > Jul 03 15:25:23 ip-172-31-8-116 sh[366900]: 0 > > > Jul 03 15:25:26 ip-172-31-8-116 sh[366900]: foo > > > Jul 03 15:25:26 ip-172-31-8-116 sh[366900]: 0 > > > Jul 03 15:25:29 ip-172-31-8-116 sh[366900]: foo > > > Jul 03 15:25:29 ip-172-31-8-116 sh[366900]: 0 > > > [...] > > > > > > So the output from the df command appears in the journal pretty rarely, > > > appearingly at random intervals. When I run the same loop on the > > > command line the output occurs every time. > > > > > > The problem was originally noted in a somewhat loaded system. However, > > > above reproducer (including the 2 echo commands and a shorter sleep) > > > shows the same problem even on an idling machine. > > > > https://github.com/systemd/systemd/issues/2913 > > I thought about this as well, but in this case the service is still > running. So I'm not sure if #2913 applies here. The service is, but the "df" process exits extremely quickly, before we can figure out what it belongs to. See the PIDs where it works, they are different from your shell script's PID, because they are short-lived child processes. Thanks for your answers! I thought I could quickly prove Lennart's explanataion correct by making the 2 echo commands their own processes, too. They should not be long-running ones either. $ systemd-run --user sh -c 'while true; do /bin/echo foo; df -h /var/log/journal/; /bin/echo $?; sleep 3; done' Interestingly enough the result looks much better: Jul 05 23:14:07 ip-172-31-13-133 sh[2944]: foo Jul 05 23:14:07 ip-172-31-13-133 sh[2945]: Filesystem Size Used Avail Use% Mounted on Jul 05 23:14:07 ip-172-31-13-133 sh[2945]: /dev/nvme0n1p4 9.5G 2.2G 7.4G 23% /var Jul 05 23:14:07 ip-172-31-13-133 sh[2946]: 0 Jul 05 23:14:10 ip-172-31-13-133 sh[2948]: foo Jul 05 23:14:10 ip-172-31-13-133 sh[2949]: Filesystem Size Used Avail Use% Mounted on Jul 05 23:14:10 ip-172-31-13-133 sh[2949]: /dev/nvme0n1p4 9.5G 2.2G 7.4G 23% /var Jul 05 23:14:10 ip-172-31-13-133 sh[2950]: 0 Jul 05 23:14:13 ip-172-31-13-133 sh[2954]: Filesystem Size Used Avail Use% Mounted on Jul 05 23:14:13 ip-172-31-13-133 sh[2954]: /dev/nvme0n1p4 9.5G 2.2G 7.4G 23% /var Jul 05 23:14:13 ip-172-31-13-133 sh[2955]: 0 Jul 05 23:14:16 ip-172-31-13-133 sh[2957]: foo Jul 05 23:14:16 ip-172-31-13-133 sh[2958]: Filesystem Size Used Avail Use% Mounted on Jul 05 23:14:16 ip-172-31-13-133 sh[2958]: /dev/nvme0n1p4 9.5G 2.2G 7.4G 23% /var Jul 05 23:14:16 ip-172-31-13-133 sh[2959]: 0 Jul 05 23:14:19 ip-172-31-13-133 sh[2961]: foo Jul 05 23:14:19 ip-172-31-13-133 sh[2962]: Filesystem Size Used Avail Use% Mounted on Jul 05 23:14:19 ip-172-31-13-133 sh[2962]: /dev/nvme0n1p4 9.5G 2.2G 7.4G 23% /var Jul 05 23:14:19 ip-172-31-13-133 sh[2963]: 0 Jul 05 23:14:22 ip-172-31-13-133 sh[2966]: foo Jul 05 23:14:22 ip-172-31-13-133 sh[2967]: Filesystem Size Used Avail Use% Mounted on Jul 05 23:14:22 ip-172-31-13-133 sh[2967]: /dev/nvme0n1p4 9.5G 2.2G 7.4G 23% /var Jul 05 23:14:22 ip-172-31-13-
[systemd-devel] Service output missing from journal?
Hi! When I run the command given below on a current Fedora CoreOS system (systemd 250 (v250.6-1.fc36)) I get a result I absolute cannot understand. Can anybody help me with what is wrong there? $ systemd-run --user sh -c 'while true; do echo foo; df -h /var/log/journal/; echo $?; sleep 3; done' Running as unit: run-r9a155474889b4d40a1ac119823bdc2bf.service $ journalctl --user -f -u run-r9a155474889b4d40a1ac119823bdc2bf [ ... similar lines redacted ... ] Jul 03 15:25:08 ip-172-31-8-116 sh[366900]: 0 Jul 03 15:25:11 ip-172-31-8-116 sh[366900]: foo Jul 03 15:25:11 ip-172-31-8-116 sh[366900]: 0 Jul 03 15:25:14 ip-172-31-8-116 sh[366900]: foo Jul 03 15:25:14 ip-172-31-8-116 sh[366900]: 0 Jul 03 15:25:17 ip-172-31-8-116 sh[366900]: foo Jul 03 15:25:17 ip-172-31-8-116 sh[366900]: 0 Jul 03 15:25:20 ip-172-31-8-116 sh[366900]: foo Jul 03 15:25:20 ip-172-31-8-116 sh[366941]: Filesystem Size Used Avail Use% Mounted on Jul 03 15:25:20 ip-172-31-8-116 sh[366941]: /dev/nvme0n1p4 9.5G 8.1G 1.5G 86% /var Jul 03 15:25:20 ip-172-31-8-116 sh[366900]: 0 Jul 03 15:25:23 ip-172-31-8-116 sh[366900]: foo Jul 03 15:25:23 ip-172-31-8-116 sh[366900]: 0 Jul 03 15:25:26 ip-172-31-8-116 sh[366900]: foo Jul 03 15:25:26 ip-172-31-8-116 sh[366900]: 0 Jul 03 15:25:29 ip-172-31-8-116 sh[366900]: foo Jul 03 15:25:29 ip-172-31-8-116 sh[366900]: 0 [...] So the output from the df command appears in the journal pretty rarely, appearingly at random intervals. When I run the same loop on the command line the output occurs every time. The problem was originally noted in a somewhat loaded system. However, above reproducer (including the 2 echo commands and a shorter sleep) shows the same problem even on an idling machine. Regards, Uwe Uwe Geuder Neuro Event Labs Oy Tampere, Finland uwe.gxu...@neuroeventlabs.com (bot check: fix 1 obvious typo)
Re: [systemd-devel] Mobile broadband modems support in systemd-networkd
Hi! On Mon, 23 Aug 2021 at 23:41, Bruce A. Johnson wrote: > > I wasn't clear what I meant about processing of the .network > file. With Ethernet or Wi-Fi (using iwd), when the link comes up, > systemd-networkd does the Right Thing and starts network services on > the interface, running a DHCP client or setting up static address(es), > routes, etc., as specified in the .network file. When ModemManager > establishes the connection with the 3G carrier, systemd-networkd seems > to be unaware of the event, and I have to poke it with "systemd > restart systemd-networkd" or "networkctl reload" before it tries to > use the information in the .network file for the interface. It seems > to me like that ought to happen automatically. I suspect that > ModemManager needs to be changed to inform systemd-networkd. > > So far, it's not looking like there's an effort to integrate > ModemManager (or something similar) with systemd-networkd, which is > kind of a shame. If I come up with anything useful toward making that > happen, I'll offer a pull request, but at the moment I'm looking at > cobbling something together with Python. > We have a 500+ line shell script to do the job by just calling iproute2 commands (well, it is not very modular and does a couple of other things, too.) Besides that it is not very generic, license-wise I am not able to share it unfortunately. Of course probably 300 - 400 lines are just for error handling. I was not really involved when the script got written, but I would not mind throwing it out and skipping the maintenance burden. > It seems to me like that ought to happen automatically. I agree. > I suspect that ModemManager needs to be changed to inform > systemd-networkd. Why would that be the case? If you plug in an Ethernet cable there is no user space process that would inform systemd-networkd. I cannot guess anything else than systemd-networkd listening for suitable kernel events. Why would that not work for wwan interfaces? Obviously it doesn't, otherwise you wouldn't post here and my colleagues wouldn't have written a script. One difference seems to be the IPv4 address. When an Ethernet interface gets connected, as you write, in a typical case you need to run DHCP to get an IP address. Obviously systemd-network implements that. Without knowing the details right now I'd guess mobile broadband uses different protocols. Is it even Ethernet and DHCP on top of that? Our script asks the IP address from modem manager. So maybe systemd-networkd would need to do the same unless someone wants to reimplement that part. (Not sure whether modem manager would even be the program of choice, I guess connman can handle WWAN connections, too.) Is it even the same for all modems/networks? In the past PPP was a thing, nowadays QMI seems to be mentioned more frequently. Altogether this seems not to be implemented by systemd-networkd and implementing it might be much more work than a couple of days. Some more or less hacky script can be written in a couple of days. > and I have to poke it with "systemd restart systemd-networkd" or > "networkctl reload" before it tries to use the information in the > .network file for the interface. Did you get a working connection including the correct IPv4 address by doing that? If you did, some of my reasoning above would be wrong. I have not looked very deep (not even run strace or tcpdump, not to mention having a look at the source...); if someone can explain me why my speculation is wrong that would be welcome! Regards, Uwe Uwe Geuder Neuro Event Labs Oy Tampere, Finland uwe.gxu...@neuroeventlabs.com (bot check: fix 1 obvious typo)
Re: [systemd-devel] Using *.path units without races?
Hi! On Fri, 20 Mar 2020 at 00:31, Michael Chapman wrote: > > On Fri, 20 Mar 2020, Uwe Geuder wrote: > [...] > > The manual page is not very specific about how that is supposed to work > > IMHO, but I could imagine the following distinction: > > > > PathExists=, PathExistsGlob=, and DirectoryNotEmpty= are absolute > > predicates. When setting the path unit to waiting one can just check > > whether they are true or not. (After arming inotify of course.) With > > your patch my limited testing was successful. > > > > However, PathChanged= and PathModified= are relative predicates. You > > cannot just check whether they are true or not. Wouldn't the correct > > implementation > > > > 1. need to store the applicable timestamp of the path involved when the > >path unit is set to started and > > > > 2. when the path unit is set to waiting again it would need to compare > >the stored timestamp with the current timestamp (again after arming > >inotify) to catch modifications that happened while the unit was > >running/inotify not armed > > > > I don't think I see the timestamps stored at all. So how was this > > supposed to work? Was the intended semantics different? > > PathChanged= and PathModified= each map down to a set of inotify > events. It's the kernel's inotify system that determines whether the > file changed or modified, not systemd. My understanding is that since https://github.com/systemd/systemd/pull/13509/commits/06582e42de65a61d0238a18720a12b6353edb7cd there are 2 states 1. While the path unit is waiting and the triggered service unit is dead its indead all inotify's business. When a change happens the kernel will notify systemd. 2. However, while the triggered service unit is running also the path unit is running and the inotify fd is closed. So the kernel will not report any changes to systemd at all during that time. When thereafter the path unit enters waiting again it will add the required inotify watches again. However, to find out whether anything has changed, i. e. been missed while inotify was not watching, it would need to do it's own checking in user space. In the case of "absolute" predicates it's a straight-forward check whether such path exists and obviously that's what the code does with your patch. However, for the "relative" predicates I don't see another way than comparing timestamps. How else could it fulfill the promise of the man page of "checking immediately again". Inotify will report only potential future changes, not report anything that has already happened while it was inactive / the path unit running > When a service unit triggered by a path unit terminates (regardless > whether it exited successfully or failed), monitored paths are checked > immediately again, and the service accordingly restarted instantly I did a quick $ strace -e inotify_init,inotify_init1,inotify_add_watch,inotify_rm_watch,close -p and got something like this ... inotify_init1(IN_NONBLOCK|IN_CLOEXEC) = 12 inotify_add_watch(12, "/", IN_ATTRIB|IN_MOVED_TO|IN_CREATE|IN_DELETE_SELF|IN_MOVE_SELF) = 1 inotify_add_watch(12, "/tmp", IN_ATTRIB|IN_MOVED_TO|IN_CREATE|IN_DELETE_SELF|IN_MOVE_SELF) = 2 inotify_add_watch(12, "/", IN_MOVE_SELF) = 1 inotify_add_watch(12, "/tmp/systemd-path-test", IN_MODIFY|IN_ATTRIB|IN_CLOSE_WRITE|IN_MOVED_FROM|IN_MOVED_TO|IN_CREATE|IN_DELETE|IN_DELETE_SELF|IN_MOVE_SELF) = 3 inotify_add_watch(12, "/tmp", IN_MOVE_SELF) = 2 close(12) ... for every "wait period" of the path unit. We see that the inotify fd gets closed when the units enter running state and there is a new inotify_init1() when the path unit enters watching again. Instead of timestamps the code does have a bool previous_exists For the "absolute" predicates that could "nearly" be used to implement different semantics. If the path has triggered once (previous_exists == true), don't trigger again on any extra changes unless the predicate was false (previous_exists == false) in the meantime. But a.) I don't see any such behaviour really described in the man page b.) the code doesn't seem to work like that at all, and c.) it could still not work if the predicate got false for a short period while the notify is not watching, because nobody could update the boolean appropriately. So even that variant would only "nearly" work. And for the "relative" predicates such a boolean would never be enough. Modification can have happened in the meantime even if the file existed before. So what's the purpose of the boolean? Regards, Uwe Uwe Geuder Neuro Event Labs Oy Tampere, Finland uwe.gxu...@neuroeventlabs.com (bot check: fix 1 obvious typo) ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] Using *.path units without races?
Hi again, On Thu, 19 Mar 2020 at 00:20, Uwe Geuder wrote: > > Hi! > > On Tue, 17 Mar 2020 at 23:52, Michael Chapman wrote: > > > > On Wed, 18 Mar 2020, Uwe Geuder wrote: > > > Hi! > > > > > > I have wondered for a while how I can use *.path units without (too bad) > > > races. > > > > > > Since > > > https://github.com/systemd/systemd/pull/13509/commits/06582e42de65a61d0238a18720a12b6353edb7cd > > > the behaviour has been become much clearer, but I must admit I still > > > don't get it. > > > > That commit does look incomplete to me. > > > > As a quick test, are you able to try out the patch below? This makes > > systemd always check the filesystem when the service stops, rather than > > just relying on the (as of that commit nonexistent) inotify event. ... > I built that change and quickly tested it. It seems to work fine! ... > As expected the service gets now invoked 3 times. Without your patch the > second touch command/file is missed and only "handled" together with the > third touch command/file. I tested a bit more. Yesterday I tested using DirectoryNotEmpty= Today I tested PathModified= and there your patch does not improve the situation. The manual page is not very specific about how that is supposed to work IMHO, but I could imagine the following distinction: PathExists=, PathExistsGlob=, and DirectoryNotEmpty= are absolute predicates. When setting the path unit to waiting one can just check whether they are true or not. (After arming inotify of course.) With your patch my limited testing was successful. However, PathChanged= and PathModified= are relative predicates. You cannot just check whether they are true or not. Wouldn't the correct implementation 1. need to store the applicable timestamp of the path involved when the path unit is set to started and 2. when the path unit is set to waiting again it would need to compare the stored timestamp with the current timestamp (again after arming inotify) to catch modifications that happened while the unit was running/inotify not armed I don't think I see the timestamps stored at all. So how was this supposed to work? Was the intended semantics different? Regards, Uwe Uwe Geuder Neuro Event Labs Oy Tampere, Finland uwe.gxu...@neuroeventlabs.com (bot check: fix 1 obvious typo) ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] Using *.path units without races?
Hi! On Tue, 17 Mar 2020 at 23:52, Michael Chapman wrote: > > On Wed, 18 Mar 2020, Uwe Geuder wrote: > > Hi! > > > > I have wondered for a while how I can use *.path units without (too bad) > > races. > > > > Since > > https://github.com/systemd/systemd/pull/13509/commits/06582e42de65a61d0238a18720a12b6353edb7cd > > the behaviour has been become much clearer, but I must admit I still > > don't get it. > > That commit does look incomplete to me. > > As a quick test, are you able to try out the patch below? This makes > systemd always check the filesystem when the service stops, rather than > just relying on the (as of that commit nonexistent) inotify event. ... > diff --git a/src/core/path.c b/src/core/path.c > index cb75d778af..a513df97b2 100644 > --- a/src/core/path.c > +++ b/src/core/path.c > @@ -759,11 +759,7 @@ static void path_trigger_notify(Unit *u, Unit *other) > { > if (p->state == PATH_RUNNING && > UNIT_IS_INACTIVE_OR_FAILED(unit_active_state(other))) { > log_unit_debug(UNIT(p), "Got notified about unit > deactivation."); > - > -/* Hmm, so inotify was triggered since the > - * last activation, so I guess we need to > - * recheck what is going on. */ > -path_enter_waiting(p, false, p->inotify_triggered); > +path_enter_waiting(p, false, true); > } > } > Thanks a lot Michael for this quick patch. The patch seems to have suffered a formatting accident in the mail. However, your intent to change the 3rd agument of that function call is pretty clear. I built that change and quickly tested it. It seems to work fine! Please note that I have not yet done any wider (regression) testing and my understanding of the code base is not good. I can share my test code - test.path - [Unit] Description=%n testing racyness of path [Path] DirectoryNotEmpty=/tmp/systemd-path-test/ MakeDirectory=true - test.service - [Unit] Description=%n testing racyness of path [Service] ExecStart=%h/try/systemd/test.sh - test.sh - #!/bin/sh -eu # Usage: Expects to find .sleep files in ${testdir} echo "${0} starting" testdir=/tmp/systemd-path-test/ exit_handler() { echo "${0} exiting" } trap exit_handler EXIT # assume good faith, don't worry about dotfiles or # pathologic filenames... files=$(ls "${testdir}/") if [ -z "${files}" ] ; then echo "No files found" exit else echo "Found ${files}" fi for f in ${files} ; do if [ -e "${testdir}/${f}" ] ; then rm -f "${testdir}/${f}" else echo "Race: ${f} has disappeared" fi case "${f}" in *.sleep) # don't worry about non-numeric... s=$(basename "${f}" .sleep) echo "sleeping ${s}" sleep "${s}" ;; esac done -- Test case: touch /tmp/systemd-path-test/5.sleep ; sleep 2 ; \ touch /tmp/systemd-path-test/6.sleep ; sleep 5 ; \ touch /tmp/systemd-path-test/0.5.sleep (all 5 commands in one go) As expected the service gets now invoked 3 times. Without your patch the second touch command/file is missed and only "handled" together with the third touch command/file. Regards, Uwe Uwe Geuder Neuro Event Labs Oy Tampere, Finland uwe.gxu...@neuroeventlabs.com (bot check: fix 1 obvious typo) ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/systemd-devel
[systemd-devel] Using *.path units without races?
Hi! I have wondered for a while how I can use *.path units without (too bad) races. Since https://github.com/systemd/systemd/pull/13509/commits/06582e42de65a61d0238a18720a12b6353edb7cd the behaviour has been become much clearer, but I must admit I still don't get it. https://www.freedesktop.org/software/systemd/man/systemd.path.html says > When a service unit triggered by a path unit terminates (regardless > whether it exited successfully or failed), monitored paths are checked > immediately again, and the service accordingly restarted instantly I'm am not sure I understand what that is supposed to mean. So lets take a simple example: DirectoryNotEmpty=/tmp/sd-test/ When I call a touch /tmp/sd-test/1.foo my service gets activated, that's all fine. In order to avoid an endless loop my service moves the files in the watched directory out of the way. (The man page mentions that systemd would limit the loop anyway if I didn't do it.) But now assume that during the service is running (easily tested with a sleep in the service) I call a touch /tmp/sd-test/2.foo and later (after my sleep) the "service unit triggered by" my "path unit terminates". According to the man page "monitored paths are checked immediately again and the service accordingly restarted instantly". However, I have not been able to produce any such "instant restart". It appears to me that the path has to be back in state "waiting" already, before any newly created file will be noticed. I also tested PathExistsGlob=/tmp/sd-test/* PathModified=/tmp/sd-test/ but the behaviour seems to be the same in all cases. When the service terminates the path goes to waiting and only *after* that new events will be noticed. No "instant restart" for anything that happend while path was running. What am I missing here? Are *.path units at all supposed to handle several files appearing at random times in a reasonable racefree manner? What would be the correct pattern to do so? Of course I can implement my service to recheck before it exits. But that will never be racefree, and it will fail completely should the service fail because of an error. The man page suggests to me that systemd would do more for me. Regards, Uwe Uwe Geuder Neuro Event Labs Oy Tampere, Finland uwe.gxu...@neuroeventlabs.com (bot check: fix 1 obvious typo) ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/systemd-devel