While exploring how to speed up the long tail of backfills that results from
marking a failing OSD out, I began looking at my PGs to see if I could tune
some settings, and noticed the following:

Scenario: on a 12.2.12 cluster, I am alerted to an inconsistent PG and to
SMART failures on one of the OSDs holding it. I inspect that PG and see the
inconsistency is a read_error from the SMART-failing OSD.

Steps I take: set the primary affinity of the failing OSD to 0 (the thought
process being that I don't want a failing drive to be responsible for
backfilling data), wait for peering to complete, then mark the OSD out. At
this point backfill begins.

90% of the PGs complete backfill very quickly. Towards the tail end of the
backfill I have 20 or so PGs in backfill_wait and 1 backfilling (presumably
because of osd_max_backfills = 1).
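
(If it's relevant, this is roughly how that throttle can be checked and
temporarily raised on Luminous; osd.12 is again a stand-in ID, 2 is just an
example value, and injectargs changes do not survive an OSD restart:)

    # confirm the current throttle (run on the host carrying that OSD)
    ceph daemon osd.12 config get osd_max_backfills
    # temporarily raise it cluster-wide
    ceph tell osd.* injectargs '--osd_max_backfills 2'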

I do a `ceph pg ls backfill_wait` and notice that 100% of the tail-end PGs
are such that all OSDs in the up_set are different from those in the
acting_set, and that the acting_primary is the OSD that was set to primary
affinity 0 and marked out.
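
Concretely, what I see looks roughly like this (PG IDs and OSD numbers are
invented for illustration, osd.12 being the out OSD, and the columns are
trimmed down from the real `ceph pg ls` output):

    PG_STAT  STATE                          UP        UP_PRIMARY  ACTING     ACTING_PRIMARY
    3.1a     active+remapped+backfill_wait  [4,7,9]   4           [12,2,5]   12
    3.2c     active+remapped+backfill_wait  [6,8,10]  6           [12,3,1]   12

The per-PG detail, including the up and acting sets, can also be pulled from
`ceph pg 3.1a query`.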

My questions are the following:
- Upon learning a disk has SMART failures and an inconsistent PG, I want to
prevent its potentially corrupt data from being replicated out to other
OSDs, even for PGs which may not have been discovered to be inconsistent
yet, so I set its primary affinity to 0. At that step shouldn't the
acting_primary become another OSD from the acting_set, so that backfill is
copied out of a different OSD?
- Should I additionally be marking the OSD down, which would cause the PGs
to go degraded until backfill finishes, but would presumably finish faster
as more OSDs would become the acting_primary and I wouldn't be throttled by
osd_max_backfills on that one OSD (a rough sketch of what I mean is below)?
My thought here is that it's best to avoid degraded PGs, as I do not want to
drop below min_size.
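
For clarity, that second option would amount to something like the following
(osd.12 again a stand-in; I have not actually done this), the idea being
that other OSDs take over as acting_primary for those PGs while it is down:

    # mark the OSD down; note a running daemon will usually report itself
    # back up shortly afterwards
    ceph osd down 12
    # so, on a systemd deployment, stopping the daemon is what actually
    # keeps it down
    systemctl stop ceph-osd@12
    # then watch the degraded/misplaced counts while backfill proceeds
    ceph -s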

I recognize some of these things may be different in Nautilus, but I am
waiting on the 14.2.6 release, as I am aware of some bugs I do not want to
contend with. Thanks.


Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn <http://www.linkedin.com/in/wesleydillingham>