[Bug 1954698] [NEW] Cannot read exit status/time for type=forking w/ short-lived command

Brian Goff Mon, 13 Dec 2021 12:01:14 -0800

Public bug reported:

Package: systemd
Version: 245.4-4ubuntu3.13


When running a systemd unit with Type=forking and a PIDFile= setting
(example at the end of the description), if the process referred to by
the PID in the pidfile exits before systemd has read the file, systemd
complains (visible in the journald logs): "New main PID <PID> does not
exist or is a zombie."

The problem is that systemd never records the actual exit status:
querying the unit properties shows an exit status of 0/success even
though the process exited non-zero.
The "Result" property on the unit is "protocol", indicating that we've run 
afoul of the forking protocol with systemd.

In this case we haven't really broken the protocol; we've just exposed a
race in monitoring the forked process.
This can happen with any sort of error in the forked process.
Since systemd should be reaping the process anyway, it seems like we should be 
able to get a correct exit status here.
If there is a small delay between starting the process and the exit then 
systemd has enough time to attach to the process and monitor correctly.
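
To make the timing dependence concrete, here is a minimal sketch of a
variant of the forked process with a configurable delay before the
failure. The function name fork_then_fail and its two arguments are
hypothetical and not part of the original reproducer; how long a delay
is "enough" is not established here and would have to be found by
experiment.

```bash
#!/usr/bin/env bash
# Hypothetical variant of the reproducer: the forked process records its
# PID, lingers for a configurable delay, then fails with exit status 1.
fork_then_fail() {
    local delay=$1 pidfile=$2
    (
        echo "$BASHPID" > "$pidfile"   # PID of the forked subshell
        sleep "$delay"                 # 0 means exit immediately
        exit 1
    ) &
}
```

Calling `fork_then_fail 0 /tmp/systemd-forking-bug.pid` as the last step
of the ExecStart script corresponds to the immediate-exit case described
above, while a larger first argument gives systemd a window to read the
pidfile.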

The properties one would normally check on this process are all zeroed out:
ExecMainStartTimestampMonotonic=0
ExecMainExitTimestampMonotonic=0
ExecMainPID=0
ExecMainCode=0
ExecMainStatus=0

The `EXIT_STATUS` environment variable passed along to any "ExecStop"
commands is zeroed out as well.
In some cases I've seen the "stop_time" set in the ExecStart properties of the 
service, but found this to be unreliable.
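
A check for this state could look roughly like the sketch below. It
reads KEY=VALUE lines in the format printed by `systemctl show` and
succeeds when the unit reports Result=protocol while the main-process
fields are still zero. The function name exit_status_lost is
hypothetical, and the choice of fields to compare is an assumption based
on the properties listed above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `systemctl show` style KEY=VALUE lines on
# stdin and succeed if the unit failed with Result=protocol while the
# main-process bookkeeping (ExecMainPID, ExecMainStatus) is still zero.
exit_status_lost() {
    local result="" pid="" status="" key value
    while IFS='=' read -r key value; do
        case "$key" in
            Result)         result=$value ;;
            ExecMainPID)    pid=$value ;;
            ExecMainStatus) status=$value ;;
        esac
    done
    [ "$result" = "protocol" ] && [ "$pid" = "0" ] && [ "$status" = "0" ]
}

# Usage on a real system (not executed here):
#   systemctl show -p Result,ExecMainPID,ExecMainStatus \
#       systemd-forking-bug.service | exit_status_lost \
#       && echo "exit status was not recorded"
```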

I've tried working around this by keeping the control process alive to
wait and see if the forked process exits quickly and recording the exit
status myself. This is a decent workaround, but it causes some extra
overhead and seems to get into the territory of what I'd expect
systemd to do for me.
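
As a rough illustration of that kind of workaround (not the exact script
used here), the control process can start the real command in the
background, poll it for a short grace period, and return the child's
exit status if it dies within that window. The function name
start_and_watch, the 0.1 second poll interval, and the poll count are
all assumptions; writing the pidfile is left to the caller.

```bash
#!/usr/bin/env bash
# Hypothetical helper: start a command in the background and watch it
# for a short grace period. If it exits within that window, return its
# exit status so the control process fails with the same code.
start_and_watch() {
    local polls=$1; shift        # number of 0.1s polls (assumed tunable)
    "$@" &
    local child=$! i=0
    while [ "$i" -lt "$polls" ]; do
        # kill -0 only tests whether the PID still exists; bash reaps
        # exited background children on its own, so this starts failing
        # shortly after the child exits.
        if ! kill -0 "$child" 2>/dev/null; then
            wait "$child"        # fetch the recorded exit status
            return $?
        fi
        sleep 0.1
        i=$((i + 1))
    done
    return 0                     # child survived the grace period
}
```

The overhead mentioned above is visible here: in the success case the
control process stays alive for the whole grace period before systemd
sees it exit.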

The example below is just simulating what might happen with a real
process that errors out quickly.

Example:

cat << EOF > /tmp/systemd-forking-bug.sh
#!/usr/bin/env bash
(
    echo \$BASHPID > /tmp/systemd-forking-bug.pid
    exit 1
) &

echo control process exiting
EOF

cat << EOF > /tmp/systemd-forking-bug.service
[Service]
Type=forking
PIDFile=/tmp/systemd-forking-bug.pid
ExecStart=/bin/bash /tmp/systemd-forking-bug.sh
EOF

sudo mv /tmp/systemd-forking-bug.service /run/systemd/system/
sudo systemctl daemon-reload
sudo systemctl start systemd-forking-bug.service
sudo systemctl status systemd-forking-bug.service
sudo journalctl --lines=5 -u systemd-forking-bug.service

** Affects: systemd (Ubuntu)
     Importance: Undecided
         Status: New


-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1954698

Title:
  Cannot read exit status/time for type=forking w/ short-lived command
