Re: [ceph-users] osd laggy algorithm

2015-03-16 Thread Gregory Farnum
On Wed, Mar 11, 2015 at 8:40 AM, Artem Savinov asavi...@asdco.ru wrote:
 Hello.
 By default, Ceph marks an OSD node down after receiving 3 reports that the
 node has failed. Reports are sent every "osd heartbeat grace" seconds, but
 with mon_osd_adjust_heartbeat_grace = true and
 mon_osd_adjust_down_out_interval = true, the timeout for marking nodes down
 may vary. Could you tell me which algorithm changes the timeout for marking
 nodes down/out, and which parameters are affected?
 Thanks.

The monitors keep track of which detected failures are incorrect
(based on reports from the marked-down/out OSDs) and build up an
expectation about how often the failures are correct based on an
exponential backoff of the data points. You can look at the code in
OSDMonitor.cc if you're interested, but basically they apply that
expectation to modify the down interval and the down-out interval to a
value large enough that they believe the OSD is really down (assuming
these config options are set). It's not terribly interesting. :)
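
For readers who want something concrete, here is a rough sketch of that
kind of adjustment. It is not the code from OSDMonitor.cc. It assumes the
"exponential backoff" amounts to an exponentially weighted average of
past false reports plus a half-life decay, and every function and
parameter name below is hypothetical.

```python
import math


def update_laggy_probability(old_probability, was_false_report, weight):
    """Blend the newest data point into the running estimate.

    Hypothetical helper: 'weight' is how much the newest sample counts,
    so older samples fade out exponentially.
    """
    sample = 1.0 if was_false_report else 0.0
    return weight * sample + (1.0 - weight) * old_probability


def adjusted_grace(base_grace, laggy_probability, laggy_interval,
                   failed_for, halflife):
    """Stretch the base grace period using the OSD's laggy history.

    Assumption: the extra grace is the expected laggy time
    (probability * interval), halved for every 'halflife' seconds the
    OSD has already been reported as failed.
    """
    decay_k = math.log(0.5) / halflife
    decay = math.exp(failed_for * decay_k)
    return base_grace + decay * laggy_interval * laggy_probability


# An OSD that was wrongly marked down once, then correctly once.
p = update_laggy_probability(0.0, True, weight=0.3)
p = update_laggy_probability(p, False, weight=0.3)
grace = adjusted_grace(20.0, p, laggy_interval=60.0,
                       failed_for=0.0, halflife=3600.0)
```

An OSD with no history of false reports keeps the base grace unchanged,
while one that is often reported down by mistake gets extra time before
the monitors act on the reports.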
-Greg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] osd laggy algorithm

2015-03-11 Thread Artem Savinov
Hello.
By default, Ceph marks an OSD node down after receiving 3 reports that the
node has failed. Reports are sent every "osd heartbeat grace" seconds, but
with mon_osd_adjust_heartbeat_grace = true and
mon_osd_adjust_down_out_interval = true, the timeout for marking nodes down
may vary. Could you tell me which algorithm changes the timeout for marking
nodes down/out, and which parameters are affected?
Thanks.

--
Artem