Re: [ceph-users] Planning all flash cluster
Adding more nodes from the beginning would probably be a good idea.

On Wed, Jun 20, 2018 at 12:58 PM Nick A wrote:
>
> Hello Everyone,
>
> We're planning a small cluster on a budget, and I'd like to request any
> feedback or tips.
>
> 3x Dell R720XD with:
> 2x Xeon E5-2680v2 or very similar

The CPUs look good and sufficiently fast for IOPS.

> 96GB RAM

4GB per OSD looks a bit on the short side. Probably 192GB would help.

> 2x Samsung SM863 240GB boot/OS drives
> 4x Samsung SM863 960GB OSD drives
> Dual 40/56Gbit Infiniband using IPoIB.
>
> 3 replica, MON on OSD nodes, RBD only (no object or CephFS).
>
> We'll probably add another 2 OSD drives per month per node until full (24
> SSD's per node), at which point, more nodes. We've got a few SM863's in
> production on other systems and are seriously impressed with them, so would
> like to use them for Ceph too.
>
> We're hoping this is going to provide a decent amount of IOPS, 20k would be
> ideal. I'd like to avoid NVMe journals unless it's going to make a truly
> massive difference. Same with carving up the SSD's, would rather not, and
> just keep it as simple as possible.

I agree: those SSDs shouldn't really require a journal device. I'm not sure
about the 20k IOPS, though, especially without any further information. Doing
20k IOPS at a 1kB block size is totally different from doing it at 1MB. A quick
fio run against an RBD image, like the sketch at the end of this message, would
tell you where you stand.

> Is there anything that obviously stands out as severely unbalanced? The
> R720XD comes with a H710 - instead of putting them in RAID0, I'm thinking a
> different HBA might be a better idea, any recommendations please?

I don't know that controller. Does it support pass-through or HBA mode?

> Regards,
> Nick
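For reference, a minimal fio job along these lines will show what each block
size actually delivers from the client side. This is only a sketch: the pool
and image names are placeholders, the image has to exist already (e.g. created
with "rbd create"), and queue depths should be adjusted to match your real
workload.

    ; untested sketch - pool/image names are examples, adjust to your setup
    [global]
    ioengine=rbd
    clientname=admin
    pool=rbd
    rbdname=benchimage
    direct=1
    time_based=1
    runtime=60
    group_reporting=1

    [randwrite-4k]
    rw=randwrite
    bs=4k
    iodepth=32

    [randwrite-1m]
    stonewall
    rw=randwrite
    bs=1m
    iodepth=8

The stonewall option makes the 1MB job wait for the 4k job to finish, so you
get separate numbers per block size to compare against the 20k target.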
Re: [ceph-users] Planning all flash cluster
Another great thing about lots of small servers vs. a few big servers is that
you can use erasure coding. You can save a lot of money with erasure coding
(rough numbers at the end of this message), but performance will have to be
evaluated for your use case.

I'm working with several clusters of 8-12 servers with 6-10 SSDs each running
erasure coding for VMs with RBD. They perform surprisingly well: ~6-10k IOPS
with ~30% CPU load and ~30% disk IO load. But that requires at least 7 servers
for a reasonable setup and some good benchmarking to evaluate it for your
scenario. Especially the tail latencies can be prohibitive sometimes.

Paul

2018-06-20 14:09 GMT+02:00 Wido den Hollander:
>
> On 06/20/2018 02:00 PM, Robert Sander wrote:
> > On 20.06.2018 13:58, Nick A wrote:
> >
> >> We'll probably add another 2 OSD drives per month per node until full
> >> (24 SSD's per node), at which point, more nodes.
> >
> > I would add more nodes earlier to achieve better overall performance.
>
> Exactly. Not only performance, but also failure domain.
>
> In a smaller setup I would always choose a 1U node with 8 ~ 10 SSDs per
> node.
>
> Wido

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
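To put rough numbers on the cost argument (back-of-the-envelope only, ignoring
BlueStore overhead and nearfull ratios; 8 nodes with 10x 960GB SSDs each is
just an example layout, not your exact build):

    raw capacity:            8 nodes x 10 x 960 GB  = ~76.8 TB
    3x replication:          76.8 TB / 3            = ~25.6 TB usable
    erasure coding k=4,m=2:  76.8 TB x 4/6          = ~51.2 TB usable

So a 4+2 EC pool gives roughly twice the usable space from the same SSDs while
still tolerating two simultaneous failures; the price is more CPU and network
work per write and the tail latency mentioned above.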
Re: [ceph-users] Planning all flash cluster
This is true, but misses the point that the OP is talking about old hardware
already - you're not going to save much money by removing a second-hand CPU
from a system.

On Wed, 20 Jun 2018 at 22:10, Wido den Hollander wrote:
>
> On 06/20/2018 02:00 PM, Robert Sander wrote:
> > On 20.06.2018 13:58, Nick A wrote:
> >
> >> We'll probably add another 2 OSD drives per month per node until full
> >> (24 SSD's per node), at which point, more nodes.
> >
> > I would add more nodes earlier to achieve better overall performance.
>
> Exactly. Not only performance, but also failure domain.
>
> In a smaller setup I would always choose a 1U node with 8 ~ 10 SSDs per
> node.
>
> Wido

--
Cheers,
~Blairo
Re: [ceph-users] Planning all flash cluster
On 06/20/2018 02:00 PM, Robert Sander wrote:
> On 20.06.2018 13:58, Nick A wrote:
>
>> We'll probably add another 2 OSD drives per month per node until full
>> (24 SSD's per node), at which point, more nodes.
>
> I would add more nodes earlier to achieve better overall performance.

Exactly. Not only performance, but also failure domain.

In a smaller setup I would always choose a 1U node with 8 ~ 10 SSDs per node.

Wido
Re: [ceph-users] Planning all flash cluster
* More small servers give better performance than a few big servers - maybe
  twice the number of servers with half the disks, CPUs and RAM.
* 2x 10 Gbit is usually enough, especially with more servers; the network will
  rarely be the bottleneck unless you have extreme bandwidth requirements.
* Maybe save money by using normal Ethernet unless you already have IB
  infrastructure around.
* You might need to reduce the BlueStore cache size a little (the default is
  3GB per OSD for SSDs), since you are running with 4GB of RAM per OSD. That
  is fine, you just might need to tune the setting a bit - see the example at
  the end of this message.
* The SM863a is a great disk, good choice. NVMe DB disks are not needed here.
* RAID controllers are evil in most cases; configure them as JBOD.

Paul

2018-06-20 13:58 GMT+02:00 Nick A:
> Hello Everyone,
>
> We're planning a small cluster on a budget, and I'd like to request any
> feedback or tips.
>
> 3x Dell R720XD with:
> 2x Xeon E5-2680v2 or very similar
> 96GB RAM
> 2x Samsung SM863 240GB boot/OS drives
> 4x Samsung SM863 960GB OSD drives
> Dual 40/56Gbit Infiniband using IPoIB.
>
> 3 replica, MON on OSD nodes, RBD only (no object or CephFS).
>
> We'll probably add another 2 OSD drives per month per node until full (24
> SSD's per node), at which point, more nodes. We've got a few SM863's in
> production on other systems and are seriously impressed with them, so would
> like to use them for Ceph too.
>
> We're hoping this is going to provide a decent amount of IOPS, 20k would
> be ideal. I'd like to avoid NVMe journals unless it's going to make a truly
> massive difference. Same with carving up the SSD's, would rather not, and
> just keep it as simple as possible.
>
> Is there anything that obviously stands out as severely unbalanced? The
> R720XD comes with a H710 - instead of putting them in RAID0, I'm thinking a
> different HBA might be a better idea, any recommendations please?
>
> Regards,
> Nick

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
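If RAM does turn out to be tight, a conservative starting point could look
something like the ceph.conf fragment below. The 2 GiB figure is only an
illustration to stay comfortably inside ~4GB of RAM per OSD, not a value
measured on this hardware:

    [osd]
    # default BlueStore cache for SSD-backed OSDs is 3 GiB; lower it when
    # RAM per OSD is limited (the value is in bytes, 2147483648 = 2 GiB)
    bluestore_cache_size_ssd = 2147483648

Watch actual OSD memory usage after changing it; the cache setting is not the
only thing an OSD process allocates.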
Re: [ceph-users] Planning all flash cluster
On 20.06.2018 13:58, Nick A wrote:

> We'll probably add another 2 OSD drives per month per node until full
> (24 SSD's per node), at which point, more nodes.

I would add more nodes earlier to achieve better overall performance.

Regards
--
Robert Sander
Heinlein Support GmbH
Schwedter Str. 8/9b, 10119 Berlin

http://www.heinlein-support.de
Tel: 030 / 405051-43
Fax: 030 / 405051-19

Mandatory disclosures per §35a GmbHG: HRB 93818 B / Amtsgericht
Berlin-Charlottenburg, Managing Director: Peer Heinlein - Registered office:
Berlin
[ceph-users] Planning all flash cluster
Hello Everyone,

We're planning a small cluster on a budget, and I'd like to request any
feedback or tips.

3x Dell R720XD with:
2x Xeon E5-2680v2 or very similar
96GB RAM
2x Samsung SM863 240GB boot/OS drives
4x Samsung SM863 960GB OSD drives
Dual 40/56Gbit Infiniband using IPoIB.

3 replica, MON on OSD nodes, RBD only (no object or CephFS).

We'll probably add another 2 OSD drives per month per node until full (24
SSD's per node), at which point, more nodes. We've got a few SM863's in
production on other systems and are seriously impressed with them, so would
like to use them for Ceph too.

We're hoping this is going to provide a decent amount of IOPS, 20k would be
ideal. I'd like to avoid NVMe journals unless it's going to make a truly
massive difference. Same with carving up the SSD's, would rather not, and
just keep it as simple as possible.

Is there anything that obviously stands out as severely unbalanced? The
R720XD comes with a H710 - instead of putting them in RAID0, I'm thinking a
different HBA might be a better idea, any recommendations please?

Regards,
Nick
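As a rough sanity check of the initial layout described above (back-of-the-
envelope only, ignoring BlueStore overhead and the nearfull ratio):

    raw:    3 nodes x 4 x 960 GB       = ~11.5 TB
    usable: ~11.5 TB / 3 (replication) = ~3.8 TB

With size=3 on exactly three hosts, every host holds a copy of every object,
so if one node fails the cluster stays degraded and cannot re-replicate to a
third copy until that node (or a new one) is back - which is the failure-domain
argument for starting with more, smaller nodes made elsewhere in this thread.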