Fiddling with the crush weights sorted this out and I was able to remove
the OSD from the cluster. I set all the big weights down to 1.0:
ceph osd crush reweight osd.7 1.0
etc.
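For anyone hitting the same thing, a rough sketch of that reweighting step (the OSD ids and the 1.0 target are from my cluster; check `ceph osd tree` for yours, and run this from the rook-ceph toolbox pod):

```shell
# reweight_osds WEIGHT ID... -- set the CRUSH weight of each given OSD to WEIGHT.
# Assumes `ceph` is on PATH (e.g. inside the toolbox pod).
reweight_osds() {
  weight="$1"; shift
  for id in "$@"; do
    ceph osd crush reweight "osd.${id}" "${weight}"
  done
}

# e.g. reweight_osds 1.0 7 8 9
```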
Tx for all the help
On Tue, Nov 23, 2021 at 9:35 AM Stefan Kooman wrote:
> On 11/23/21 08:21, David Tinker wrote:
> >
Yes it recovered when I put the OSD back in. The issue is that it fails to
sort itself out when I remove that OSD even though I have loads of space
and 8 other OSDs in 4 different zones to choose from. The weights are very
different (some 3.2, others 0.36) and that post I found suggested that this
imbalance could be the cause.
I just had a look at the balance docs and it says "No adjustments will be
made to the PG distribution if the cluster is degraded (e.g., because an
OSD has failed and the system has not yet healed itself).". That implies
that the balancer won't run until the disruption caused by the removed OSD
has been resolved.
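So the check before expecting the balancer to do anything is roughly (plain ceph CLI, run from the toolbox pod):

```shell
# check_balancer_ready: the balancer only optimizes once recovery is done,
# so confirm health first, then look at the balancer itself.
check_balancer_ready() {
  ceph health            # must report HEALTH_OK before the balancer will act
  ceph balancer status   # look for "active": true and your chosen mode
}
```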
Yes it is on:
# ceph balancer status
{
"active": true,
"last_optimize_duration": "0:00:00.001867",
"last_optimize_started": "Mon Nov 22 13:10:24 2021",
"mode": "upmap",
"optimize_result": "Unable to find further optimization, or pool(s)
pg_num is decreasing, or distribution is
I set osd.7 as "in", uncordoned the node, scaled the OSD deployment back up
and things are recovering with cluster status HEALTH_OK.
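Roughly what that looked like (node name is a placeholder; the deployment name follows Rook's rook-ceph-osd-&lt;id&gt; convention):

```shell
# put_osd_back OSD_ID NODE -- reverse of taking an OSD out in a Rook cluster.
put_osd_back() {
  osd_id="$1"; node="$2"
  ceph osd in "osd.${osd_id}"      # mark the OSD "in" again
  kubectl uncordon "${node}"       # let pods schedule on the node again
  kubectl -n rook-ceph scale deployment "rook-ceph-osd-${osd_id}" --replicas=1
}

# e.g. put_osd_back 7 my-node-1
```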
I found this message from the archives:
https://www.mail-archive.com/ceph-users@lists.ceph.com/msg47071.html
"You have a large difference in the capacities of the
Would it be worth setting the OSD I removed back to "in" (or whatever the
opposite of "out" is) and seeing if things recover?
On Thu, Nov 18, 2021 at 3:44 PM David Tinker wrote:
> Tx. # ceph version
> ceph version 15.2.7 (88e41c6c49beb18add4fdb6b4326ca466d931db8) octopus
> (stable)
Tx. # ceph version
ceph version 15.2.7 (88e41c6c49beb18add4fdb6b4326ca466d931db8) octopus
(stable)
On Thu, Nov 18, 2021 at 3:28 PM Stefan Kooman wrote:
> On 11/18/21 13:20, David Tinker wrote:
> > I just grepped all the OSD pod logs for error and warn and nothing comes
> > up:
> >
> > # k logs
If I ignore the dire warnings about losing data and do:
ceph osd purge 7
will I lose data? There are still 2 copies of everything, right?
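One way to sanity-check that before purging (the pool name "replicapool" here is just an example; substitute your own):

```shell
# check_before_purge POOL -- verify how many copies the pool keeps and which
# PGs are currently below that count.
check_before_purge() {
  pool="$1"
  ceph osd pool get "${pool}" size   # replicas configured, e.g. "size: 3"
  ceph pg dump_stuck undersized      # PGs currently short of that count
}
```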
I need to remove the node with the OSD from the k8s cluster, reinstall it
and have it re-join the cluster. This will bring in some new OSDs and maybe
Ceph will sort itself out.
I just grepped all the OSD pod logs for error and warn and nothing comes up:
# k logs -n rook-ceph rook-ceph-osd-10-659549cd48-nfqgk | grep -i warn
etc
I am assuming that would bring back something if any of them were unhappy.
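That grep over all the OSD pods was along these lines (`k` is my alias for kubectl; the `app=rook-ceph-osd` label selector is the one Rook puts on its OSD pods):

```shell
# grep_osd_logs PATTERN -- search every Rook OSD pod's log for PATTERN.
grep_osd_logs() {
  pattern="$1"
  for pod in $(kubectl -n rook-ceph get pods -l app=rook-ceph-osd -o name); do
    echo "== ${pod} =="
    kubectl -n rook-ceph logs "${pod}" | grep -i "${pattern}"
  done
}

# e.g. grep_osd_logs warn ; grep_osd_logs error
```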
On Thu, Nov 18, 2021 at 1:26 PM Stefan Kooman wrote:
Sure. Tx.
# ceph pg 3.1f query
{
"snap_trimq": "[]",
"snap_trimq_len": 0,
"state": "active+undersized+degraded",
"epoch": 2477,
"up": [
0,
2
],
"acting": [
0,
2
],
"acting_recovery_backfill": [
"0",
"2"
],
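For reference, the commands for finding and inspecting PGs in that state (3.1f is the PG id from the query above; substitute whichever PG `ceph pg ls` reports):

```shell
# inspect_degraded_pgs -- list PGs with fewer replicas than the pool's size,
# then pull the full peering/recovery detail for one of them.
inspect_degraded_pgs() {
  ceph pg ls undersized   # every PG currently below the pool's replica count
  ceph pg 3.1f query      # detailed state for one PG, as pasted above
}
```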