Marco, I think you've answered the nodown/noout question.
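
For what it's worth, here is how I picture the timeline from your example, as a rough Python sketch. It is a toy model of the two timers as they are described in the thread, not Ceph's actual failure-detection logic. The function and field names (`osd_timeline`, `down_at`, `out_at`) are made up for illustration, and treating `nodown` as "the down-to-out timer never starts" is my assumption for the model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Timeline:
    # Seconds after the OSD stops responding; None means "never".
    down_at: Optional[int]
    out_at: Optional[int]


def osd_timeline(report_timeout: int, down_out_interval: int,
                 nodown: bool = False, noout: bool = False) -> Timeline:
    """Toy model of the down/out timers discussed in this thread.

    report_timeout    -- seconds until the unresponsive OSD is marked "down"
                         (the role 'mon osd report timeout' plays in the example)
    down_out_interval -- seconds between "down" and "out"
                         ('mon osd down out interval')
    """
    if nodown:
        # Assumption: an OSD that is never marked down never starts
        # the down -> out timer either.
        return Timeline(down_at=None, out_at=None)
    down_at = report_timeout
    # 'noout' leaves the OSD down but stops it being marked out,
    # so no rebalancing is triggered.
    out_at = None if noout else down_at + down_out_interval
    return Timeline(down_at=down_at, out_at=out_at)


if __name__ == "__main__":
    print(osd_timeline(15, 300))              # Timeline(down_at=15, out_at=315)
    print(osd_timeline(15, 300, noout=True))  # Timeline(down_at=15, out_at=None)
```

With your numbers, the OSD would be marked down at 15 seconds and out 300 seconds after that. With `noout` set, it still goes down, but rebalancing never starts, which matches your conclusion.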

As for the "total unreasonable value" for the default: in my experience, defaults become the defaults in one of two ways. Either the upstream vendor (Ceph in this case) picks a default based on its expected typical use case and the downstream vendor doesn't override it, or the downstream vendor changes it to match its own expected typical use case.

Either way, as with most things in the Unix world, the defaults aren't for everyone, which is why you can tune them. If the defaults aren't suitable for you, feel free to change them in your environment.

On Fri, Dec 16, 2016 at 6:47 AM, Marco Gaiarin <[email protected]> wrote:
> Hello! Alexandre DERUMIER
>   On that day it was said...
>
>> >>mon osd down out interval
>> This is the time between when a monitor marks an OSD "down" (not
>> currently serving data) and "out" (not considered *responsible* for
>> data by the cluster). IO will resume once the OSD is down (assuming
>> the PG has its minimum number of live replicas); it's just that data
>> will be re-replicated to other nodes once an OSD is marked "out".
>
> Seems clear to me. I try to make an example to be sure.
>
> If i set:
>   mon osd report timeout = 15
>   mon osd down out interval = 300
>
> happen:
>
> a) after 15 seconds, irresponsive OSD get 'down', so IO resume
>
> b) after 5 minutes, the OSD get marked 'out', and so rebalancing
>    start.
>
> I've still a doubt. If i set 'ceph osd set nodown', simply i put the
> first timeout to 'never'? Explained as above, could be... and so it is
> my fault that i've set the 'nodown'...
>
> Wait...
>
>   http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-June/002438.html
>
> ok, seems that 'noout' flag is the right thing to do. 'nodown' have to
> be used only in 'bouncing' situation.
>
> If simply i need to stop rebalancing, it suffices to set 'noout'.
>
>
>> osd should go down in around 30s max. (in this time, the cluster will be
>> stale)..
>> but not 5min.
>
> My experience say no. And if the parameter is 'mon osd report timeout',
> also the docs say '300'.
> Seems to me a total unreasonable value...
>
>
>> (in ceph kraken, they have done optimisation for this detection
>> https://github.com/ceph/ceph/pull/8558)
>
> Interesting. This is not my case, anyway, because i've rebooted all the
> server.
>
> --
> dott. Marco Gaiarin                          GNUPG Key ID: 240A3D66
> Associazione ``La Nostra Famiglia''          http://www.lanostrafamiglia.it/
> Polo FVG - Via della Bontà, 7 - 33078 - San Vito al Tagliamento (PN)
> marco.gaiarin(at)lanostrafamiglia.it  t +39-0434-842711  f +39-0434-842797
>
> Donate your 5 PER MILLE to LA NOSTRA FAMIGLIA!
> http://www.lanostrafamiglia.it/25/index.php/component/k2/item/123
> (tax code 00307430132, category ONLUS or RICERCA SANITARIA)
> _______________________________________________
> pve-user mailing list
> [email protected]
> http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user

--
Jeff Palmer
https://PalmerIT.net
