On January 31, 2024 3:22 pm, Friedrich Weber wrote:
> Also, looks like every time ceph-crash posts a report, the syslog reads:
>
> Jan 31 15:02:30 ceph1 ceph-crash[110939]: WARNING:ceph-crash:post
> /var/lib/ceph/crash/2024-01-31T13:53:16.419342Z_1b5a078a-f665-4fcd-abd5-9bf602048d1f
> as client.crash.ceph1 failed: 2024-01-31T15:02:30.105+0100 7f10bf7ae6c0
> -1 auth: unable to find a keyring on
> /etc/pve/priv/ceph.client.crash.ceph1.keyring: (13) Permission denied
> Jan 31 15:02:30 ceph1 ceph-crash[110939]: 2024-01-31T15:02:30.105+0100
> 7f10bf7ae6c0 -1 auth: unable to find a keyring on
> /etc/pve/priv/ceph.client.crash.ceph1.keyring: (13) Permission denied
> Jan 31 15:02:30 ceph1 ceph-crash[110939]: 2024-01-31T15:02:30.105+0100
> 7f10bf7ae6c0 -1 auth: unable to find a keyring on
> /etc/pve/priv/ceph.client.crash.ceph1.keyring: (13) Permission denied
> Jan 31 15:02:30 ceph1 ceph-crash[110939]: 2024-01-31T15:02:30.105+0100
> 7f10bf7ae6c0 -1 auth: unable to find a keyring on
> /etc/pve/priv/ceph.client.crash.ceph1.keyring: (13) Permission denied
> Jan 31 15:02:30 ceph1 ceph-crash[110939]: 2024-01-31T15:02:30.105+0100
> 7f10bf7ae6c0 -1 monclient: keyring not found
> Jan 31 15:02:30 ceph1 ceph-crash[110939]: [errno 13] RADOS permission
> denied (error connecting to the cluster)
>
> I remember you mentioned this before. Do I remember correctly there is
> no easy way to prevent these messages? Having them appear only when a
> crash is posted is certainly better than every 10 minutes, but they are
> a bit misleading as they very much look like an error that needs attention.
so I did a few more experiments. ceph-crash does two things:

A) it executes `ceph -s` without specifying a client name, which means
   that part will always try to use the `client.admin` config/keyring
B) it tries to post crashes if they exist, using the keys
   `client.crash.$HOST`, `client.crash`, `client.admin`

A happens at startup to "exercise the key", irrespective of crash files
existing or not. to avoid it, we'd need to patch ceph-crash once we've
settled on which client name to use.

B happens for every crash. once posting has worked, the other keyrings
are not tried again for that particular crash, but they will be tried
again for the next one. this means that to avoid warnings altogether,
we'd need to make the first entry in auth_names work or patch the
`auth_names` part of the ceph-crash binary.

I played around a bit and it seems we could do the following:

- change the [client] section in our config to only affect
  [client.admin] (simple renaming is enough, all `ceph` invocations
  without `-n` or `-i` should continue to work as before, since
  "client.admin" is the default `-n` value)
- generate (on each node) a `client.crash.$HOSTNAME` keyring with crash
  profile and store it in /etc/ceph/ceph.client.crash.$HOSTNAME

ceph-crash will then (at least for crash posting purposes) invoke
`ceph -n client.crash.$HOSTNAME` first, which will pick up that keyring
since `/etc/ceph/$cluster.$name.keyring` is part of the default value(s)
for the client keyring. this doesn't work without modifying our
ceph.conf, since the current global "client.keyring" setting overrides
the built-in defaults for *all* invocations, even for `ceph -n XXX`.

using the current approach with "client.crash" and a key on pmxcfs also
works. to silence the warnings, we could then patch ceph-crash to use
that key (/client name) for `ceph -s` and remove the
`client.crash.$HOSTNAME` from auth_names.
but I assume since that comes first, that upstream actually expects
people to use that keyring, the rest are just fallbacks, so we'd need to
watch for regressions when pulling in updates.
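
for illustration, here is a minimal Python sketch of the fallback order
described in B. it is not the actual ceph-crash code: the function
names, the injectable `run` parameter and the example meta path are
made up for this mail, and only the `ceph -n NAME crash post -i FILE`
invocation and the order of the three client names are taken from the
behaviour discussed above.

```python
import socket
import subprocess


def auth_names(hostname=None):
    """Client names in the order discussed above; the first one is tried first."""
    host = hostname or socket.gethostname()
    return ["client.crash.%s" % host, "client.crash", "client.admin"]


def run_ceph(name, meta_path):
    """Run `ceph -n NAME crash post -i META` and return (returncode, stderr)."""
    proc = subprocess.run(
        ["ceph", "-n", name, "crash", "post", "-i", meta_path],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stderr


def post_crash(meta_path, names, run=run_ceph):
    """Try each client name in order and stop at the first one that works.

    Returns (name_that_worked_or_None, warnings). Every name that fails
    before the working one contributes one warning, which is why a broken
    first entry produces log noise for every posted crash.
    """
    warnings = []
    for name in names:
        rc, stderr = run(name, meta_path)
        if rc == 0:
            return name, warnings
        warnings.append(
            "post %s as %s failed: %s" % (meta_path, name, stderr.strip())
        )
    return None, warnings


if __name__ == "__main__":
    # Hypothetical stand-in for the real ceph call: pretend that only the
    # plain "client.crash" key is readable on this node.
    def fake_run(name, meta_path):
        if name == "client.crash":
            return 0, ""
        return 13, "Permission denied"

    used, warnings = post_crash(
        "/var/lib/ceph/crash/example/meta",  # made-up path
        auth_names("ceph1"),
        run=fake_run,
    )
    print(used, len(warnings))
```

with the fake runner, the first name fails and logs one warning before
`client.crash` succeeds, which matches the syslog excerpt quoted at the
top. making the first entry work (or dropping it from the list) is what
would remove that warning.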