Hello all, I'm writing to you because I'm trying to find a way to rebuild an OSD disk without impacting the performance of the cluster, since my applications are very latency sensitive.
1_ I found a way to reuse an OSD ID so the cluster does not rebalance every time I lose a disk: my cluster runs with the noout flag set permanently. The point here is to do the disk change as fast as I can.

2_ After reusing the OSD ID, I leave the OSD up and running, but with ZERO weight. For example:

root@DC4-ceph03-dn03:/var/lib/ceph/osd/ceph-352# ceph osd tree | grep 352
352 1.81999 osd.352 up 0 1.00000

At this point everything is good.

3_ I then start reweighting with "ceph osd reweight", which does not touch the crushmap, and I do it very gradually. Example:

ceph osd reweight 352 0.001

But even doing the reweight this way, I sometimes hit latency spikes. The more PGs the cluster is recovering, the worse the impact.

Tunings I have already done:

ceph tell osd.* injectargs '--osd_max_backfills 1'
ceph tell osd.* injectargs '--osd_recovery_max_active 1'
ceph tell osd.* injectargs '--osd-max-recovery-threads 1'
ceph tell osd.* injectargs '--osd-recovery-op-priority 1'
ceph tell osd.* injectargs '--osd-client-op-priority 63'

The question is: are there more parameters I can change to make the OSD rebuild even more gradual?

I really appreciate your help, thanks in advance.

Agustin Trolli
Storage Team
Mercadolibre.com
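The gradual ramp in step 3 can be scripted. This is a minimal sketch that only prints the reweight commands rather than running them (drop the echo to apply them); the OSD id, target reweight, and step size here are assumptions to adjust for your cluster, and in practice you would wait for recovery to settle between steps.

```shell
# Sketch of the gradual ramp from step 3: emit "ceph osd reweight"
# commands from 0 up to TARGET in fixed increments.
# OSD, TARGET and STEP are assumptions -- tune them for your cluster.
OSD=352
TARGET=1.0
STEP=0.25
w=0
while [ "$(awk -v w="$w" -v t="$TARGET" 'BEGIN{print (w < t)}')" = "1" ]; do
  # Bump the weight by STEP, capping at TARGET.
  w=$(awk -v w="$w" -v s="$STEP" -v t="$TARGET" 'BEGIN{n=w+s; if(n>t)n=t; print n}')
  echo "ceph osd reweight $OSD $w"
  # In practice, pause here until recovery settles before the next bump, e.g.:
  # while ceph pg stat | grep -q backfill; do sleep 30; done
done
```

Printing the schedule first lets you sanity-check the step size before touching the cluster; the commented wait loop keeps only one step's worth of backfill in flight at a time.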
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com