Re: [ceph-users] Ceph monitors overloaded on large cluster restart

2018-12-20 Thread Joachim Kraftmayer
Hello Andreas, we had the following experience in recent years: 1 year ago we also completely shut down a 2500+ OSD Ceph cluster and had no problem starting it again (5 mon nodes, each with 4 x 25 Gbit/s). A few years ago, we increased the number of osds to more than 600 in

Re: [ceph-users] Ceph monitors overloaded on large cluster restart

2018-12-19 Thread Andras Pataki
Hi Dan, 'noup' now makes a lot of sense - that's probably the major help that our cluster start would have needed. Essentially, this way only one map change occurs in the cluster when all the OSDs are marked 'in', and that gets distributed, vs. hundreds or thousands of map changes as various
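The 'noup' approach discussed above can be sketched as a shell runbook. This uses the standard Ceph CLI flags (`noup`, `noout`), but the exact ordering below is an assumption based on the thread, not a procedure the posters spelled out:

```shell
# Sketch of a large-cluster restart using the 'noup' flag, as discussed
# in the thread. Requires a live Ceph cluster and admin keyring; the
# sequencing here is an assumption, not a verbatim runbook from the list.

# Before shutting down, keep CRUSH from rebalancing and keep restarting
# OSDs from being marked up one at a time:
ceph osd set noout
ceph osd set noup

# ... shut down OSD hosts, perform maintenance, power everything back on ...

# OSD daemons boot but stay marked 'down' while 'noup' is set. Once all
# OSDs have booted and registered with the mons, clear the flag so they
# come up together, in far fewer osdmap changes:
ceph osd unset noup

# After the cluster settles, allow normal out-marking again:
ceph osd unset noout
```

The point, per Dan's suggestion, is that the mons publish a handful of osdmap epochs instead of thousands of incremental ones as each OSD flaps up individually, which is what overloads the mons during a full restart.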

Re: [ceph-users] Ceph monitors overloaded on large cluster restart

2018-12-19 Thread Dan van der Ster
Hey Andras, Three mons is possibly too few for such a large cluster. We've had lots of good stable experience with 5-mon clusters. I've never tried 7, so I can't say if that would lead to other problems (e.g. leader/peon sync scalability). That said, our 1-osd bigbang tests managed with only

Re: [ceph-users] Ceph monitors overloaded on large cluster restart

2018-12-19 Thread Andras Pataki
Forgot to mention: all nodes are currently on Luminous 12.2.8, on CentOS 7.5. On 12/19/18 5:34 PM, Andras Pataki wrote: Dear ceph users, We have a large-ish ceph cluster with about 3500 osds. We run 3 mons on dedicated hosts, and the mons typically use a few percent of a core, and generate

[ceph-users] Ceph monitors overloaded on large cluster restart

2018-12-19 Thread Andras Pataki
Dear ceph users, We have a large-ish ceph cluster with about 3500 osds. We run 3 mons on dedicated hosts; the mons typically use a few percent of a core and generate about 50 Mbit/s of network traffic. They are connected at 20 Gbit/s (bonded dual 10 Gbit) and are running on 2x14 core