Re: [ceph-users] Planning all flash cluster

2018-06-20 Thread Luis Periquito
Adding more nodes from the beginning would probably be a good idea.

On Wed, Jun 20, 2018 at 12:58 PM Nick A  wrote:
>
> Hello Everyone,
>
> We're planning a small cluster on a budget, and I'd like to request any 
> feedback or tips.
>
> 3x Dell R720XD with:
> 2x Xeon E5-2680v2 or very similar
The CPUs look good and are fast enough for an IOPS-heavy workload.

> 96GB RAM
At the full build-out of 24 OSDs per node that works out to 4GB of RAM
per OSD, which looks a bit on the short side. 192GB per node would
probably help.
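As a rough back-of-the-envelope check (ballpark figures, not measured
on this exact hardware):

  24 OSDs x ~4GB each (BlueStore cache + OSD overhead)  ~= 96GB
  + OS, MON daemon and recovery/backfill headroom       -> nothing left

  192GB gives roughly 8GB per OSD, enough for the default 3GB BlueStore
  SSD cache plus headroom for recovery spikes.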

> 2x Samsung SM863 240GB boot/OS drives
> 4x Samsung SM863 960GB OSD drives
> Dual 40/56Gbit Infiniband using IPoIB.
>
> 3 replica, MON on OSD nodes, RBD only (no object or CephFS).
>
> We'll probably add another 2 OSD drives per month per node until full (24 
> SSD's per node), at which point, more nodes. We've got a few SM863's in 
> production on other system and are seriously impressed with them, so would 
> like to use them for Ceph too.
>
> We're hoping this is going to provide a decent amount of IOPS, 20k would be 
> ideal. I'd like to avoid NVMe Journals unless it's going to make a truly 
> massive difference. Same with carving up the SSD's, would rather not, and 
> just keep it as simple as possible.
I agree: those SSDs shouldn't really require a separate journal device.
I'm not sure about the 20k IOPS though, especially without any further
information. Doing 20k IOPS at a 1kB block size is totally different
from doing it at a 1MB block size...
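If you want to put a number on it before buying more hardware, fio
against a throwaway RBD image shows the difference nicely. A rough
sketch, assuming a scratch image called 'fio-test' in the 'rbd' pool
and fio built with the rbd engine (adjust names, block size and
iodepth for your workload):

  # small random writes - the number that usually matters for VMs
  fio --name=rw-4k --ioengine=rbd --clientname=admin --pool=rbd \
      --rbdname=fio-test --rw=randwrite --bs=4k --iodepth=32 \
      --numjobs=4 --time_based --runtime=60 --group_reporting

  # same test with 1M blocks - far fewer IOPS, much more bandwidth
  fio --name=rw-1m --ioengine=rbd --clientname=admin --pool=rbd \
      --rbdname=fio-test --rw=randwrite --bs=1M --iodepth=16 \
      --numjobs=1 --time_based --runtime=60 --group_reporting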
>
> Is there anything that obviously stands out as severely unbalanced? The 
> R720XD comes with a H710 - instead of putting them in RAID0, I'm thinking a 
> different HBA might be a better idea, any recommendations please?
I don't know that HBA. Does it support pass-through / HBA mode?
>
> Regards,
> Nick


Re: [ceph-users] Planning all flash cluster

2018-06-20 Thread Paul Emmerich
Another great thing about lots of small servers vs. a few big servers
is that you can use erasure coding. Erasure coding can save a lot of
money, but performance will have to be evaluated for your use case.

I'm working with several clusters of 8-12 servers with 6-10 SSDs each
running erasure coding for VMs with RBD. They perform surprisingly
well: ~6-10k IOPS at ~30% CPU load and ~30% disk IO load.

But that requires at least 7 servers for a reasonable setup and some
good benchmarking to evaluate it for your scenario. The tail latencies
in particular can sometimes be prohibitive.
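On the capacity side the saving is easy to quantify: a k=4, m=2 profile
stores data with 1.5x raw overhead versus 3x for replica 3, i.e. double
the usable capacity from the same disks, and with host as the failure
domain it needs k+m=6 hosts plus one spare for recovery - consistent
with the "at least 7 servers" above. If you want to experiment, a
minimal sketch of an EC data pool for RBD on Luminous or later (pool
names, PG counts and k/m values are placeholders to adapt):

  ceph osd erasure-code-profile set ec-4-2 k=4 m=2 crush-failure-domain=host
  ceph osd pool create rbd-ec-data 128 128 erasure ec-4-2
  ceph osd pool set rbd-ec-data allow_ec_overwrites true
  ceph osd pool application enable rbd-ec-data rbd
  # RBD still needs a replicated pool for metadata/omap;
  # only the image data lands on the EC pool
  rbd create --size 100G --data-pool rbd-ec-data rbd/test-image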

Paul

2018-06-20 14:09 GMT+02:00 Wido den Hollander :

>
>
> On 06/20/2018 02:00 PM, Robert Sander wrote:
> > On 20.06.2018 13:58, Nick A wrote:
> >
> >> We'll probably add another 2 OSD drives per month per node until full
> >> (24 SSD's per node), at which point, more nodes.
> >
> > I would add more nodes earlier to achieve better overall performance.
>
> Exactly. Not only performance, but also failure domain.
>
> In a smaller setup I would always choose a 1U node with 8 ~ 10 SSDs per
> node.
>
> Wido
>


-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90


Re: [ceph-users] Planning all flash cluster

2018-06-20 Thread Blair Bethwaite
This is true, but it misses the point that the OP is already talking
about old hardware - you're not going to save much money by removing a
second-hand CPU from a system.

On Wed, 20 Jun 2018 at 22:10, Wido den Hollander  wrote:

>
>
> On 06/20/2018 02:00 PM, Robert Sander wrote:
> > On 20.06.2018 13:58, Nick A wrote:
> >
> >> We'll probably add another 2 OSD drives per month per node until full
> >> (24 SSD's per node), at which point, more nodes.
> >
> > I would add more nodes earlier to achieve better overall performance.
>
> Exactly. Not only performance, but also failure domain.
>
> In a smaller setup I would always choose a 1U node with 8 ~ 10 SSDs per
> node.
>
> Wido
>


-- 
Cheers,
~Blairo


Re: [ceph-users] Planning all flash cluster

2018-06-20 Thread Wido den Hollander


On 06/20/2018 02:00 PM, Robert Sander wrote:
> On 20.06.2018 13:58, Nick A wrote:
> 
>> We'll probably add another 2 OSD drives per month per node until full
>> (24 SSD's per node), at which point, more nodes.
> 
> I would add more nodes earlier to achieve better overall performance.

Exactly. Not only performance, but also failure domain.

In a smaller setup I would always choose a 1U node with 8 ~ 10 SSDs per
node.
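To put rough numbers on the failure-domain point (assuming size=3 with
host as the CRUSH failure domain):

  3 hosts: every host carries one copy of every PG. Lose a host and
           there is nowhere to recover to - the cluster stays degraded
           until that host comes back.
  6 hosts: each host carries roughly 1/6 of all copies. A host failure
           triggers recovery of that 1/6 onto the five survivors and
           full redundancy is restored automatically.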

Wido






Re: [ceph-users] Planning all flash cluster

2018-06-20 Thread Paul Emmerich
* More small servers give better performance than a few big servers -
maybe twice the number of servers with half the disks, CPUs and RAM each
* 2x 10Gbit is usually enough, especially with more servers; the network
will rarely be the bottleneck (unless you have extreme bandwidth
requirements)
* maybe save money by using normal Ethernet unless you already have IB
infrastructure around
* you might need to reduce the BlueStore cache size a little (the
default is 3GB for SSDs) if you end up at 4GB of RAM per OSD - that is
fine, it just needs a small tuning tweak (see the sketch below)
* the SM863a is a great disk, good choice. NVMe DB devices are not
needed here
* RAID controllers are evil in most cases, configure them as JBOD
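A minimal sketch of the cache tweak mentioned above, assuming BlueStore
OSDs on a Luminous-era release where the SSD cache defaults to 3GB; the
2GB value is only an example to fit the actual RAM budget, and the OSDs
need a restart to pick it up:

  # ceph.conf on the OSD hosts
  [osd]
  # bluestore_cache_size_ssd defaults to 3 GiB; value is in bytes
  bluestore_cache_size_ssd = 2147483648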



Paul

2018-06-20 13:58 GMT+02:00 Nick A :

> Hello Everyone,
>
> We're planning a small cluster on a budget, and I'd like to request any
> feedback or tips.
>
> 3x Dell R720XD with:
> 2x Xeon E5-2680v2 or very similar
> 96GB RAM
> 2x Samsung SM863 240GB boot/OS drives
> 4x Samsung SM863 960GB OSD drives
> Dual 40/56Gbit Infiniband using IPoIB.
>
> 3 replica, MON on OSD nodes, RBD only (no object or CephFS).
>
> We'll probably add another 2 OSD drives per month per node until full (24
> SSD's per node), at which point, more nodes. We've got a few SM863's in
> production on other system and are seriously impressed with them, so would
> like to use them for Ceph too.
>
> We're hoping this is going to provide a decent amount of IOPS, 20k would
> be ideal. I'd like to avoid NVMe Journals unless it's going to make a truly
> massive difference. Same with carving up the SSD's, would rather not, and
> just keep it as simple as possible.
>
> Is there anything that obviously stands out as severely unbalanced? The
> R720XD comes with a H710 - instead of putting them in RAID0, I'm thinking a
> different HBA might be a better idea, any recommendations please?
>
> Regards,
> Nick
>


-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90


Re: [ceph-users] Planning all flash cluster

2018-06-20 Thread Robert Sander
On 20.06.2018 13:58, Nick A wrote:

> We'll probably add another 2 OSD drives per month per node until full
> (24 SSD's per node), at which point, more nodes.

I would add more nodes earlier to achieve better overall performance.

Regards
-- 
Robert Sander
Heinlein Support GmbH
Schwedter Str. 8/9b, 10119 Berlin

http://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Zwangsangaben lt. §35a GmbHG:
HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
Geschäftsführer: Peer Heinlein -- Sitz: Berlin





[ceph-users] Planning all flash cluster

2018-06-20 Thread Nick A
Hello Everyone,

We're planning a small cluster on a budget, and I'd like to request any
feedback or tips.

3x Dell R720XD with:
2x Xeon E5-2680v2 or very similar
96GB RAM
2x Samsung SM863 240GB boot/OS drives
4x Samsung SM863 960GB OSD drives
Dual 40/56Gbit Infiniband using IPoIB.

3 replica, MON on OSD nodes, RBD only (no object or CephFS).

We'll probably add another 2 OSD drives per month per node until full (24
SSDs per node), at which point, more nodes. We've got a few SM863s in
production on other systems and are seriously impressed with them, so would
like to use them for Ceph too.

We're hoping this is going to provide a decent amount of IOPS; 20k would be
ideal. I'd like to avoid NVMe journals unless they're going to make a truly
massive difference. Same with carving up the SSDs: we'd rather not, and
would prefer to keep it as simple as possible.

Is there anything that obviously stands out as severely unbalanced? The
R720XD comes with an H710 - instead of putting the drives in single-disk
RAID0 volumes, I'm thinking a different HBA might be a better idea; any
recommendations, please?

Regards,
Nick
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com