[Sts-sponsors] [Bug 1999816] Re: Failure to get free disk space breaks "rabbitmqctl status" command

Andreas Hasenack Thu, 16 Mar 2023 06:58:24 -0700

In focal (I haven't yet checked the other releases in this regard),
rabbitmq (the server) seems to call "df" every 10s, regardless of
whether I run rabbitmqctl status or not:


# execsnoop-bpfcc -T -n df
TIME     PCOMM            PID    PPID   RET ARGS
13:17:48 df               7203   7202     0 /usr/bin/df -kP /var/lib/rabbitmq/mnesia/rabbit@f-rabbit
13:17:48 df.orig          7204   7203     0 /usr/bin/df.orig -kP /var/lib/rabbitmq/mnesia/rabbit@f-rabbit
13:17:58 df               7209   7208     0 /usr/bin/df -kP /var/lib/rabbitmq/mnesia/rabbit@f-rabbit
13:17:58 df.orig          7210   7209     0 /usr/bin/df.orig -kP /var/lib/rabbitmq/mnesia/rabbit@f-rabbit
13:18:08 df               7212   7211     0 /usr/bin/df -kP /var/lib/rabbitmq/mnesia/rabbit@f-rabbit
13:18:08 df.orig          7213   7212     0 /usr/bin/df.orig -kP /var/lib/rabbitmq/mnesia/rabbit@f-rabbit
13:18:18 df               7215   7214     0 /usr/bin/df -kP /var/lib/rabbitmq/mnesia/rabbit@f-rabbit
13:18:18 df.orig          7216   7215     0 /usr/bin/df.orig -kP /var/lib/rabbitmq/mnesia/rabbit@f-rabbit
13:18:28 df               7218   7217     0 /usr/bin/df -kP /var/lib/rabbitmq/mnesia/rabbit@f-rabbit
13:18:28 df.orig          7219   7218     0 /usr/bin/df.orig -kP /var/lib/rabbitmq/mnesia/rabbit@f-rabbit
13:18:38 df               7221   7220     0 /usr/bin/df -kP /var/lib/rabbitmq/mnesia/rabbit@f-rabbit
13:18:38 df.orig          7222   7221     0 /usr/bin/df.orig -kP /var/lib/rabbitmq/mnesia/rabbit@f-rabbit


(I used a /usr/bin/df wrapper that calls /usr/bin/df.orig "$@")
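
A minimal sketch of such a pass-through wrapper, built in a scratch
directory rather than /usr/bin so it can be tried without root.
WRAP_DIR and REAL_DF are names assumed here for illustration; in the
setup described above the real binary had been moved to
/usr/bin/df.orig and the wrapper installed as /usr/bin/df.

```shell
#!/bin/sh
# Sketch only: WRAP_DIR and REAL_DF are illustrative names, not from the bug report.
WRAP_DIR=$(mktemp -d)
REAL_DF=$(command -v df)    # stands in for /usr/bin/df.orig

# Unquoted EOF: $REAL_DF is expanded now, while \$@ stays literal in the wrapper.
cat >"$WRAP_DIR/df" <<EOF
#!/bin/sh
exec "$REAL_DF" "\$@"
EOF
chmod +x "$WRAP_DIR/df"

# The wrapper should behave like df for the arguments seen in the trace.
"$WRAP_DIR/df" -kP /
```

Using exec means the wrapper process is replaced by the real df, so
its exit status and output pass through unchanged.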

If you call rabbitmqctl status in between those df calls, you get the
report from the last df run.

If I add the long sleep from the test plan to the wrapper, and call
rabbitmqctl status while that sleep is running, then my status command
hangs until the sleep is over or a timeout is reached.

How about changing the test case to have df exit without printing
anything?

Like:
cat <<'EOF' >"$SH"
#!/bin/sh
exit 0
EOF

I noticed that in this case (focal at least) the server calls df once,
probably notices it isn't working, and backs off, so there are no
repeated calls every 10s. I left it running for a while and the new
interval looks like 2min. Once df is working again (for example, if I
let the wrapper call df.orig), it resumes the 10s interval.
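
One way to confirm these intervals without watching execsnoop is to
have the wrapper append a timestamp on each call and then print the
gaps. The sketch below covers only the gap calculation; DF_LOG, the
sample timestamps, and the assumption that the wrapper runs
`date +%s >>"$DF_LOG"` are all hypothetical.

```shell
#!/bin/sh
# Sketch only: assumes the df wrapper appends `date +%s` to $DF_LOG on every call.
DF_LOG=$(mktemp)

# Sample data standing in for a real log: calls 10s apart, then a 120s gap.
printf '%s\n' 1000 1010 1020 1140 >"$DF_LOG"

# Print the number of seconds between consecutive df invocations.
df_intervals() {
    awk 'NR > 1 { print $1 - prev } { prev = $1 }' "$1"
}

df_intervals "$DF_LOG"
```

With a real log, a run of 10s gaps followed by gaps of about 120s
would match the behaviour described above.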

-- 
You received this bug notification because you are a member of SE
("STS") Sponsors, which is subscribed to the bug report.
https://bugs.launchpad.net/bugs/1999816

Title:
  Failure to get free disk space breaks "rabbitmqctl status" command

Status in rabbitmq-server package in Ubuntu:
  Fix Released
Status in rabbitmq-server source package in Focal:
  Fix Committed
Status in rabbitmq-server source package in Jammy:
  Fix Committed
Status in rabbitmq-server source package in Kinetic:
  Fix Committed

Bug description:
  [Impact]

  When the df command fails to get the free disk space for some reason
  (for example, a timeout on a heavily loaded system), the result is a
  hardcoded value of "unknown". As this is not a valid number, it
  generates arithmetic errors when the "rabbitmqctl status" command is
  run and tries to divide that value to convert it to another unit.

  This has been fixed upstream here:
  https://github.com/rabbitmq/rabbitmq-server/pull/4897

  [Test Plan]

  The df command can be linked to another file that just waits for a few
  minutes to force a timeout, for example [detailed steps in comment
  #5]:

  #!/bin/bash
  sleep 5m

  After the timeout occurs, "rabbitmqctl status" returns an error with
  the unpatched version. With the patch, it shows all the information
  and displays "unknown" in the free space line.

  [Where problems could occur]

  The patch only changes how the information is displayed, so it should
  not break anything in the core operations of the package.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/rabbitmq-server/+bug/1999816/+subscriptions


-- 
Mailing list: https://launchpad.net/~sts-sponsors
Post to     : sts-sponsors@lists.launchpad.net
Unsubscribe : https://launchpad.net/~sts-sponsors
More help   : https://help.launchpad.net/ListHelp
