On Wed, Mar 11, 2015 at 8:40 AM, Artem Savinov asavi...@asdco.ru wrote:
Hello.
By default, Ceph marks an OSD node down after receiving 3 failure reports
about it. Reports are sent after osd heartbeat grace seconds, but with
mon_osd_adjust_heartbeat_grace = true and
mon_osd_adjust_down_out_interval = true, the timeout before a node is marked
down may vary. Could you please tell me what algorithm changes the timeout
for marking nodes down/out, and which parameters are affected?
Thanks.
The monitors keep track of which detected failures are incorrect
(based on reports from the marked-down/out OSDs) and build up an
expectation about how often the failures are correct based on an
exponential backoff of the data points. You can look at the code in
OSDMonitor.cc if you're interested, but basically they apply that
expectation to modify the down interval and the down-out interval to a
value large enough that they believe the OSD is really down (assuming
these config options are set). It's not terribly interesting. :)
-Greg
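
For readers who want a feel for the idea described in the reply above, here is
a minimal sketch in Python. It is not a transcription of OSDMonitor.cc: the
function names, the weight and halflife parameters, and their default values
are assumptions chosen for illustration. It only shows the general shape of
the technique, which is to keep an exponentially weighted estimate of how
often an OSD was wrongly marked down and to stretch the grace period
accordingly.

```python
import math


def update_laggy_estimate(prob, interval, was_false_alarm,
                          observed_interval=0.0, weight=0.3):
    """Fold one new data point into the running estimates.

    prob      -- current estimate that a failure report is a false alarm
    interval  -- current estimate of how long the OSD stays unresponsive
    weight    -- hypothetical smoothing factor (assumed, not from the source)
    """
    sample = 1.0 if was_false_alarm else 0.0
    # Exponentially weighted moving average: old points fade geometrically.
    new_prob = weight * sample + (1.0 - weight) * prob
    new_interval = interval
    if was_false_alarm:
        new_interval = weight * observed_interval + (1.0 - weight) * interval
    return new_prob, new_interval


def adjusted_grace(base_grace, prob, interval, failed_for, halflife=3600.0):
    """Return a grace period stretched by the OSD's history.

    The extra time is the expected lag (prob * interval), decayed with a
    hypothetical halflife so that it shrinks the longer the OSD has
    already been reported as failed.
    """
    decay = math.exp(failed_for * math.log(0.5) / halflife)
    return base_grace + decay * prob * interval


if __name__ == "__main__":
    prob, interval = 0.0, 0.0
    # The OSD was marked down twice by mistake, each time for about 60 s.
    for _ in range(2):
        prob, interval = update_laggy_estimate(prob, interval, True, 60.0)
    print(adjusted_grace(20.0, prob, interval, failed_for=0.0))
```

An OSD with no history of false alarms keeps the base grace period unchanged,
while one that has repeatedly been marked down by mistake gets more time
before the monitors act on new failure reports. The same kind of adjustment
could be applied to the down-out interval.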
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com