On Tue Dec 23, 2025 at 1:43 PM CET, Maximiliano Sandoval wrote:
> "Max R. Carrara" <[email protected]> writes:
>
> > Fix #6816: Prevent ceph-exporter Daemon from Crashing on Startup - v2
> > =====================================================================
> >
> > tl;dr: Stop ceph-exporter.service from ending up in a crash loop by
> > handing it a custom keyring file and setting its group to `www-data`,
> > similar to what we did for ceph-crash.service [0] before.
> >
> > This is a refresh of a somewhat older series that has been rebased, with
> > the version guard in `debian/postinst` adapted. The description from the
> > previous version is provided here again for the reader's convenience.
> >
> > Currently, the `ceph-exporter` daemon ends up in a short startup crash
> > loop before ultimately failing to start at all, because it tries to
> > access the keyring file at `/etc/pve/priv/ceph.client.admin.keyring`,
> > which it doesn't have permission to read.
> >
> > Instead of giving it access to the admin keyring, give it its own keyring
> > located at `/etc/pve/ceph/ceph.client.exporter.keyring`. This file and
> > its corresponding section in `/etc/pve/ceph.conf` are created when the
> > first MON is created via the API. If the cluster has already been set
> > up, a postinst hook creates the keyring file and adapts
> > `/etc/pve/ceph.conf` instead.
> >
> > The core logic of all of this was already added for `ceph-crash` a while
> > ago [0] and is reused throughout the series, with some alterations to
> > the original code in order to make it a little more generic.
>
> I tested this series and it works as advertised modulo a race condition:
>
> When the ceph-exporter unit is started before installing this series, it
> fails and systemd retries a handful of times. During this window,
> `systemctl is-failed ceph-exporter.service` returns 'activating' instead
> of 'failed', which might explain why reset-failed is never called. As a
> result, ceph-exporter is restarted as part of the postinst script but
> fails, because reset-failed was never called and there have already been
> too many start attempts.
>
> Otherwise, it works as expected. Thanks!
>
> Tested-by: Maximiliano Sandoval <[email protected]>

Thanks a ton for testing this! That's a really good catch.

As discussed off-list, `debian/postinst` no longer resets and restarts
`ceph-exporter`. See v3 [0] for the update.
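
To illustrate the race reported above, here is a minimal sketch of why a
guard that only looks for 'failed' misses the retry window. It is not the
code from the series: the function name `naive_should_reset` is
hypothetical, and the only state strings used are the two mentioned in the
report ('activating' and 'failed').

```python
# Hypothetical illustration of the reported race; not taken from the series.


def naive_should_reset(is_failed_output: str) -> bool:
    """Return True only if the unit is reported as 'failed'.

    `is_failed_output` stands in for whatever
    `systemctl is-failed ceph-exporter.service` printed.
    """
    return is_failed_output.strip() == "failed"


# While systemd is still retrying, the report says the command prints
# 'activating', so the guard does not fire and reset-failed is skipped.
during_retries = naive_should_reset("activating\n")

# Only once the retries are exhausted does the unit show up as 'failed'.
after_retries = naive_should_reset("failed\n")

print(during_retries, after_retries)
```

If the postinst check happens to run inside the retry window, the guard
returns false, the start counter is never cleared, and the subsequent
restart hits the limit on start attempts.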

[0]: https://lore.proxmox.com/pve-devel/[email protected]/


