Re: [ceph-users] Network redundancy...

2017-05-30 Thread Marco Gaiarin

> The switches you're using, can they stack?
> If so, you could spread the LACP across the two switches.

And:

> Just use balance-alb; this will do the trick with non-stacked switches.

Thanks for the answers, I'll do some tests! ;-)

-- 
dott. Marco Gaiarin GNUPG Key ID: 240A3D66
  Associazione ``La Nostra Famiglia''  http://www.lanostrafamiglia.it/
  Polo FVG   -   Via della Bontà, 7 - 33078   -   San Vito al Tagliamento (PN)
  marco.gaiarin(at)lanostrafamiglia.it   t +39-0434-842711   f +39-0434-842797

Dona il 5 PER MILLE a LA NOSTRA FAMIGLIA!
  http://www.lanostrafamiglia.it/index.php/it/sostienici/5x1000
(cf 00307430132, categoria ONLUS oppure RICERCA SANITARIA)
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Network redundancy...

2017-05-29 Thread Timofey Titovets
2017-05-29 11:37 GMT+03:00 Marco Gaiarin :
>
> I've set up a little Ceph cluster (3 hosts, 12 OSDs), all connected to a
> single switch, using 2x1 Gbit/s LACP links.
>
> Supposing I have two identical switches, is there some way to set up a
> 'redundant' configuration?
> For example, something similar to iSCSI multipath?
>
>
> I'm reading switch manuals and the Ceph documentation, but with no luck.
>
>
> Thanks.

Just use balance-alb; this will do the trick with non-stacked switches.
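
For illustration, a minimal Debian-style ifupdown sketch of such a bond (a
sketch only: the interface names, address and miimon value are placeholder
assumptions, not taken from this thread):

  # /etc/network/interfaces fragment (ifenslave installed)
  auto bond0
  iface bond0 inet static
      address 192.168.10.11
      netmask 255.255.255.0
      # balance-alb needs no special switch support, so each slave
      # can terminate on an independent, non-stacked switch
      bond-slaves eth0 eth1
      bond-mode balance-alb
      bond-miimon 100

balance-alb (mode 6) balances transmit traffic like balance-tlb and also
balances incoming IPv4 traffic via ARP negotiation, which is why no LACP or
stacking support is required on the switches.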

-- 
Have a nice day,
Timofey.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Network redundancy...

2017-05-29 Thread Ashley Merrick
The switches you're using, can they stack?

If so, you could spread the LACP across the two switches.
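
As a rough sketch (Debian-style ifupdown, placeholder interface names and
address, none of it from this thread), the host side of such a multi-chassis
LACP bond could look like:

  auto bond0
  iface bond0 inet static
      address 192.168.10.11
      netmask 255.255.255.0
      bond-slaves eth0 eth1
      # 802.3ad (LACP) needs the two switch ports to form one logical
      # aggregation, i.e. the switches must be stacked or support an
      # MLAG/vPC-style multi-chassis LAG
      bond-mode 802.3ad
      bond-miimon 100
      bond-lacp-rate fast
      bond-xmit-hash-policy layer3+4

The hash policy only affects how flows are spread across the two links;
redundancy comes from LACP dropping a dead link from the aggregate.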

Sent from my iPhone

> On 29 May 2017, at 4:38 PM, Marco Gaiarin  wrote:
> 
> 
> I've set up a little Ceph cluster (3 hosts, 12 OSDs), all connected to a
> single switch, using 2x1 Gbit/s LACP links.
> 
> Supposing I have two identical switches, is there some way to set up a
> 'redundant' configuration?
> For example, something similar to iSCSI multipath?
> 
> 
> I'm reading switch manuals and the Ceph documentation, but with no luck.
> 
> 
> Thanks.
> 
> -- 
> dott. Marco Gaiarin    GNUPG Key ID: 240A3D66
>  Associazione ``La Nostra Famiglia''  http://www.lanostrafamiglia.it/
>  Polo FVG   -   Via della Bontà, 7 - 33078   -   San Vito al Tagliamento (PN)
>  marco.gaiarin(at)lanostrafamiglia.it   t +39-0434-842711   f +39-0434-842797
> 
>Dona il 5 PER MILLE a LA NOSTRA FAMIGLIA!
>  http://www.lanostrafamiglia.it/index.php/it/sostienici/5x1000
>(cf 00307430132, categoria ONLUS oppure RICERCA SANITARIA)
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Network redundancy...

2017-05-29 Thread Marco Gaiarin

I've set up a little Ceph cluster (3 hosts, 12 OSDs), all connected to a
single switch, using 2x1 Gbit/s LACP links.

Supposing I have two identical switches, is there some way to set up a
'redundant' configuration?
For example, something similar to iSCSI multipath?


I'm reading switch manuals and the Ceph documentation, but with no luck.


Thanks.

-- 
dott. Marco Gaiarin GNUPG Key ID: 240A3D66
  Associazione ``La Nostra Famiglia''  http://www.lanostrafamiglia.it/
  Polo FVG   -   Via della Bontà, 7 - 33078   -   San Vito al Tagliamento (PN)
  marco.gaiarin(at)lanostrafamiglia.it   t +39-0434-842711   f +39-0434-842797

Dona il 5 PER MILLE a LA NOSTRA FAMIGLIA!
  http://www.lanostrafamiglia.it/index.php/it/sostienici/5x1000
(cf 00307430132, categoria ONLUS oppure RICERCA SANITARIA)
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Network redundancy pro and cons, best practice, suggestions?

2015-04-21 Thread Götz Reinicke - IT Koordinator
Hi Christian,
On 13.04.15 at 12:54, Christian Balzer wrote:
 
 Hello,
 
 On Mon, 13 Apr 2015 11:03:24 +0200 Götz Reinicke - IT Koordinator wrote:
 
 Dear ceph users,

 we are planning a ceph storage cluster from scratch. Might be up to 1 PB
 within the next 3 years, multiple buildings, new network infrastructure
 for the cluster etc.

...

 So at your storage node density of 12 HDDs (16 HDD chassis are not space
 efficient), 40GbE is overkill with a single link/network, insanely so with
 2 networks.

What would you think if we go with 20 OSDs, maybe 22 OSDs (a 24-HDD
chassis with 2 or 4 SSDs for OS / journal)?

From the 'Questions about an example of ceph infrastructure' topic I got
the calculation

HDDs x expected speed per disk = max performance:

20 x 70 MB/s = 1400 MB/s, i.e. 1.4 GB/s, roughly 14 Gb/s on the wire
(allowing ~10 bits per byte for protocol overhead); doubled for redundancy
= 28 Gb/s.

I like the suggestion from Robert LeBlanc, using two 40Gb ports with VLANs.

Currently we have to extend our LAN anyway; all 10Gb ports are in use.

Adding more 10Gb ports costs more than buying new 10Gb hardware in our
case. (The good old Cisco 6500 vs. modern 4500-X challenge :) )

Furthermore, we will see a lot more traffic and higher speed requirements
within the next year, and then another rise within the next 5 years
(e.g. 4K/8K video real-time playback for some workstations; 4K is about
9 Gb/s per workstation!).

Long story short, we have/will/should/and can start with 40Gb. The
question is how big :)

I'd say, after some more internal discussions too, that redundant switches
are mandatory in our case, and that the 40Gb VLANs are a good balance of
redundancy, cost and performance.

Thumbs up or down, your vote :D. Seriously, what do you think?

Thanks for your feedback and best regards . Götz

-- 
Götz Reinicke
IT-Koordinator

Tel. +49 7141 969 82 420
E-Mail goetz.reini...@filmakademie.de

Filmakademie Baden-Württemberg GmbH
Akademiehof 10
71638 Ludwigsburg
www.filmakademie.de

Eintragung Amtsgericht Stuttgart HRB 205016

Vorsitzender des Aufsichtsrats: Jürgen Walter MdL
Staatssekretär im Ministerium für Wissenschaft,
Forschung und Kunst Baden-Württemberg

Geschäftsführer: Prof. Thomas Schadt



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Network redundancy pro and cons, best practice, suggestions?

2015-04-21 Thread Christian Balzer

Hello,

On Tue, 21 Apr 2015 08:33:21 +0200 Götz Reinicke - IT Koordinator wrote:

 Hi Christian,
 On 13.04.15 at 12:54, Christian Balzer wrote:
  
  Hello,
  
  On Mon, 13 Apr 2015 11:03:24 +0200 Götz Reinicke - IT Koordinator
  wrote:
  
  Dear ceph users,
 
  we are planning a ceph storage cluster from scratch. Might be up to 1
  PB within the next 3 years, multiple buildings, new network
  infrastructure for the cluster etc.
 
 ...
 
  So at your storage node density of 12 HDDs (16 HDD chassis are not
  space efficient), 40GbE is overkill with a single link/network,
  insanely so with 2 networks.
 
 What would you think if we go with 20 OSDs, maybe 22 OSDs (a 24-HDD
 chassis with 2 or 4 SSDs for OS / journal)?
 
Density is nice, as in cost-effective. 
But with Ceph, smaller is better, both in terms of performance and failure
domains.

If you can start with a large enough number of nodes (at least 10) and
remember that you're probably looking for at least 2GHz per OSD with SSD
journals, go for it. 
But you'll need NVMe SSDs to satisfy 11 OSD HDDs, never mind the rather
large failure domain. 
So a 1:5 SSD journal to HDD OSD ratio would be better.

 From the 'Questions about an example of ceph infrastructure' topic I got
 the calculation of
 
 HDDs x expected speed per disk = Max Performance =
 
 20 x 70 MB/s = 1400 MB/s ie 1.4 GB/s = 14 Gb/s redundant = 28 Gb/s
 
 I like the suggestion from Robert LeBlanc, using two 40Gb ports with
 VLANs.
 
 Currently we have to extend our LAN anyway, all 10Gb ports are in use.
 
 Upgrading 10Gb ports costs more than buying new 10Gb hardware in our
 case. (Good old Cisco 6500 vs. modern 4500x challenge :) )
 
 Furthermore, we will see a lot more traffic and higher speed requirements
 within the next year, and then another rise within the next 5 years
 (e.g. 4K/8K video real-time playback for some workstations; 4K is about
 9 Gb/s per workstation!).
 
 Long story short, we have/will/should/and can start with 40Gb. The
 question is how big :)
 
If you can afford it, sure. ^o^

Christian

 I'd say, after some more internal discussions too, that redundant switches
 are mandatory in our case, and that the 40Gb VLANs are a good balance of
 redundancy, cost and performance.
 
 Thumbs up or down, your vote :D. Seriously, what do you think?
 
   Thanks for your feedback and best regards . Götz
 


-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Network redundancy pro and cons, best practice, suggestions?

2015-04-13 Thread Alexandre DERUMIER
> So what would you suggest, what are your experiences?

Hi, you can have a look at the Mellanox SX1012, for example:
http://www.mellanox.com/page/products_dyn?product_family=163

12 ports of 40GbE for around 4000€.

You can use breakout cables to get 4 x 12 = 48 10GbE ports.


They can be stacked with MLAG and LACP.


- Original message -
From: Götz Reinicke - IT Koordinator goetz.reini...@filmakademie.de
To: ceph-users ceph-users@lists.ceph.com
Sent: Monday, 13 April 2015 11:03:24
Subject: [ceph-users] Network redundancy pro and cons, best practice,
suggestions?

Dear ceph users, 

we are planning a ceph storage cluster from scratch. Might be up to 1 PB
within the next 3 years, multiple buildings, new network infrastructure 
for the cluster etc. 

I had some excellent trainings on ceph, so the essential fundamentals 
are familiar to me, and I know our goals/dreams can be reached. :) 

There is just one tiny piece in the design I'm currently unsure about :) 

Ceph follows some sort of 'keep it small and simple' approach, e.g. don't use RAID
controllers, use more boxes and disks, fast network etc.

So from our current design we plan 40Gb Storage and Client LAN. 

Would you suggest to connect the OSD nodes redundant to both networks? 
That would end up with 4 * 40Gb ports in each box, two Switches to 
connect to. 

I'd think of OSD nodes with 12 - 16 * 4TB SATA disks for high io 
pools. (+ currently SSD for journal, but may be until we start, levelDB, 
rocksDB are ready ... ?) 

Later some less io bound pools for data archiving/backup. (bigger and 
more Disks per node) 

We would also do some Cache tiering for some pools. 

From HP, Intel, Supermicro etc. reference documentation, they usually use
non-redundant network connections. (single 10Gb)

I know: redundancy keeps some headaches small, but also adds some more 
complexity and increases the budget. (add network adapters, other 
server, more switches, etc) 

So what would you suggest, what are your experiences? 

Thanks for any suggestion and feedback . Regards . Götz 
-- 
Götz Reinicke 
IT-Koordinator 

Tel. +49 7141 969 82 420 
E-Mail goetz.reini...@filmakademie.de 

Filmakademie Baden-Württemberg GmbH 
Akademiehof 10 
71638 Ludwigsburg 
www.filmakademie.de 

Eintragung Amtsgericht Stuttgart HRB 205016 

Vorsitzender des Aufsichtsrats: Jürgen Walter MdL 
Staatssekretär im Ministerium für Wissenschaft, 
Forschung und Kunst Baden-Württemberg 

Geschäftsführer: Prof. Thomas Schadt 


___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Network redundancy pro and cons, best practice, suggestions?

2015-04-13 Thread Götz Reinicke - IT Koordinator
Dear ceph users,

we are planning a ceph storage cluster from scratch. Might be up to 1 PB
within the next 3 years, multiple buildings, new network infrastructure
for the cluster etc.

I had some excellent training on Ceph, so the essential fundamentals
are familiar to me, and I know our goals/dreams can be reached. :)

There is just one tiny piece in the design I'm currently unsure about :)

Ceph follows some sort of 'keep it small and simple' approach, e.g. don't use RAID
controllers, use more boxes and disks, a fast network, etc.

So from our current design we plan 40Gb storage and client LANs.

Would you suggest connecting the OSD nodes redundantly to both networks?
That would end up with 4 x 40Gb ports in each box and two switches to
connect to.

I'd think of OSD nodes with 12-16 x 4TB SATA disks for high-IO
pools (+ SSDs for the journal for now, but maybe by the time we start,
LevelDB/RocksDB backends are ready ... ?).

Later, some less IO-bound pools for data archiving/backup (bigger and
more disks per node).

We would also do some cache tiering for some pools.

From HP, Intel, Supermicro etc. reference documentation, they usually use
non-redundant network connections (single 10Gb).

I know: redundancy keeps some headaches small, but it also adds more
complexity and increases the budget (additional network adapters, more
servers, more switches, etc.).

So what would you suggest, what are your experiences?

Thanks for any suggestion and feedback . Regards . Götz
-- 
Götz Reinicke
IT-Koordinator

Tel. +49 7141 969 82 420
E-Mail goetz.reini...@filmakademie.de

Filmakademie Baden-Württemberg GmbH
Akademiehof 10
71638 Ludwigsburg
www.filmakademie.de

Eintragung Amtsgericht Stuttgart HRB 205016

Vorsitzender des Aufsichtsrats: Jürgen Walter MdL
Staatssekretär im Ministerium für Wissenschaft,
Forschung und Kunst Baden-Württemberg

Geschäftsführer: Prof. Thomas Schadt



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Network redundancy pro and cons, best practice, suggestions?

2015-04-13 Thread Götz Reinicke - IT Koordinator
Hi Alexandre,

thanks for that suggestion. Mellanox might be on our shopping list
already, but what about the overall redundancy design from your POV?

/Götz
On 13.04.15 at 11:08, Alexandre DERUMIER wrote:
 So what would you suggest, what are your experiences?
 
 Hi, you can have a look at the Mellanox SX1012, for example:
 http://www.mellanox.com/page/products_dyn?product_family=163

 12 ports of 40GbE for around 4000€.

 You can use breakout cables to get 4 x 12 = 48 10GbE ports.


 They can be stacked with MLAG and LACP.
 
 
 - Original message -
 From: Götz Reinicke - IT Koordinator goetz.reini...@filmakademie.de
 To: ceph-users ceph-users@lists.ceph.com
 Sent: Monday, 13 April 2015 11:03:24
 Subject: [ceph-users] Network redundancy pro and cons, best practice,
 suggestions?
 
 Dear ceph users, 
 
 we are planning a ceph storage cluster from scratch. Might be up to 1 PB
 within the next 3 years, multiple buildings, new network infrastructure 
 for the cluster etc. 
 
 I had some excellent trainings on ceph, so the essential fundamentals 
 are familiar to me, and I know our goals/dreams can be reached. :) 
 
 There is just one tiny piece in the design I'm currently unsure about :) 
 
 Ceph follows some sort of 'keep it small and simple' approach, e.g. don't use RAID
 controllers, use more boxes and disks, fast network etc.
 
 So from our current design we plan 40Gb Storage and Client LAN. 
 
 Would you suggest to connect the OSD nodes redundant to both networks? 
 That would end up with 4 * 40Gb ports in each box, two Switches to 
 connect to. 
 
 I'd think of OSD nodes with 12 - 16 * 4TB SATA disks for high io 
 pools. (+ currently SSD for journal, but may be until we start, levelDB, 
 rocksDB are ready ... ?) 
 
 Later some less io bound pools for data archiving/backup. (bigger and 
 more Disks per node) 
 
 We would also do some Cache tiering for some pools. 
 
 From HP, Intel, Supermicro etc. reference documentation, they usually use
 non-redundant network connections. (single 10Gb)
 
 I know: redundancy keeps some headaches small, but also adds some more 
 complexity and increases the budget. (add network adapters, other 
 server, more switches, etc) 
 
 So what would you suggest, what are your experiences? 
 
 Thanks for any suggestion and feedback . Regards . Götz 
 


-- 
Götz Reinicke
IT-Koordinator

Tel. +49 7141 969 82 420
E-Mail goetz.reini...@filmakademie.de

Filmakademie Baden-Württemberg GmbH
Akademiehof 10
71638 Ludwigsburg
www.filmakademie.de

Eintragung Amtsgericht Stuttgart HRB 205016

Vorsitzender des Aufsichtsrats: Jürgen Walter MdL
Staatssekretär im Ministerium für Wissenschaft,
Forschung und Kunst Baden-Württemberg

Geschäftsführer: Prof. Thomas Schadt



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Network redundancy pro and cons, best practice, suggestions?

2015-04-13 Thread Christian Balzer

Hello,

On Mon, 13 Apr 2015 11:03:24 +0200 Götz Reinicke - IT Koordinator wrote:

 Dear ceph users,
 
 we are planning a ceph storage cluster from scratch. Might be up to 1 PB
 within the next 3 years, multiple buildings, new network infrastructure
 for the cluster etc.
 
 I had some excellent trainings on ceph, so the essential fundamentals
 are familiar to me, and I know our goals/dreams can be reached. :)
 
 There is just one tiny piece in the design I'm currently unsure
 about :)
 
 Ceph follows some sort of 'keep it small and simple' approach, e.g. don't use RAID
 controllers, use more boxes and disks, fast network etc.
 
While small and plenty is definitely true, some people actually use RAID
for OSDs (like RAID1) to avoid ever having to deal with a failed OSD,
ending up with 4x replication as a result.
Your needs and budget may of course differ.

 So from our current design we plan 40Gb Storage and Client LAN.
 
 Would you suggest to connect the OSD nodes redundant to both networks?
 That would end up with 4 * 40Gb ports in each box, two Switches to
 connect to.
 
If you can afford it, fabric switches are quite nice, as they allow for
LACP across 2 switches: when everything is working you get twice the speed,
and if not, you still have full redundancy. The Brocade VDX stuff comes to mind.

However if you're not tied into an Ethernet network, you might do better
and cheaper with an Infiniband network on the storage side of things.
This will become even more attractive as RDMA support improves with Ceph.

Separating the public (client) and private (storage, OSD interconnect)
networks with Ceph only makes sense if your storage nodes can actually
utilize all that bandwidth.
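
For reference, the split itself is just a ceph.conf setting; a minimal sketch
(the subnets are placeholders):

  [global]
      # client <-> OSD/MON traffic
      public network  = 10.10.1.0/24
      # OSD <-> OSD replication and recovery traffic
      cluster network = 10.10.2.0/24

Without these settings all traffic simply uses the public network, which is
fine as long as the nodes cannot saturate a single link anyway.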

So at your storage node density of 12 HDDs (16 HDD chassis are not space
efficient), 40GbE is overkill with a single link/network, insanely so with
2 networks.

 I'd think of OSD nodes with 12 - 16 * 4TB SATA disks for high io
 pools. (+ currently SSD for journal, but may be until we start, levelDB,
 rocksDB are ready ... ?)
 
 Later some less io bound pools for data archiving/backup. (bigger and
 more Disks per node)
 
 We would also do some Cache tiering for some pools.
 
 From HP, Intel, Supermicro etc. reference documentation, they usually use
 non-redundant network connections. (single 10Gb)
 
 I know: redundancy keeps some headaches small, but also adds some more
 complexity and increases the budget. (add network adapters, other
 server, more switches, etc)
 
Complexity not so much, cost yes.

 So what would you suggest, what are your experiences?
 
It all depends on how small (or rather, how large) you can start.

I have only small clusters with few nodes, so for me redundancy is a big
deal.
Thus those clusters use InfiniBand, 2 switches and dual-port HCAs on the
nodes in active-standby mode.

If, however, you can start with something like 10 racks (ToR switches),
losing one switch would mean a loss of 10% of your cluster, which is
something it should be able to cope with.
Especially if you configured Ceph to _not_ start re-balancing data
automatically if a rack goes down (so that you have a chance to put a
replacement switch in place, which you of course kept handy on-site for
such a case). ^.-
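
A hedged sketch of that (option and command names as documented by Ceph; the
rack layout itself is only a placeholder):

  # describe the racks in the CRUSH map so Ceph knows the failure domain
  ceph osd crush add-bucket rack1 rack
  ceph osd crush move node01 rack=rack1

  # ceph.conf: do not automatically mark out a failed subtree of this
  # size or larger, so a dead rack does not trigger re-balancing
  [mon]
      mon osd down out subtree limit = rack

  # for planned switch maintenance, suppress marking OSDs out entirely
  ceph osd set noout
  ceph osd unset noout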

Regards,

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Network redundancy pro and cons, best practice, suggestions?

2015-04-13 Thread Robert LeBlanc
For us, using two 40Gb ports with VLANs is redundancy enough. We are
doing LACP over two different switches.
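
As a sketch of that layout (Debian-style ifupdown with placeholder interface
names, VLAN IDs and addresses; this is not Robert's actual config): an LACP
bond over the two 40Gb ports, with one tagged VLAN sub-interface per Ceph
network (the 'vlan' package provides the tagged sub-interfaces).

  auto bond0
  iface bond0 inet manual
      bond-slaves ens1 ens2   # the two 40Gb ports, one to each switch (MLAG/stack)
      bond-mode 802.3ad
      bond-miimon 100

  # public (client) network on VLAN 101
  auto bond0.101
  iface bond0.101 inet static
      address 10.10.1.11
      netmask 255.255.255.0

  # cluster (replication) network on VLAN 102
  auto bond0.102
  iface bond0.102 inet static
      address 10.10.2.11
      netmask 255.255.255.0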

On Mon, Apr 13, 2015 at 3:03 AM, Götz Reinicke - IT Koordinator
goetz.reini...@filmakademie.de wrote:
 Dear ceph users,

 we are planning a ceph storage cluster from scratch. Might be up to 1 PB
 within the next 3 years, multiple buildings, new network infrastructure
 for the cluster etc.

 I had some excellent trainings on ceph, so the essential fundamentals
 are familiar to me, and I know our goals/dreams can be reached. :)

 There is just one tiny piece in the design I'm currently unsure about :)

 Ceph follows some sort of 'keep it small and simple' approach, e.g. don't use RAID
 controllers, use more boxes and disks, fast network etc.

 So from our current design we plan 40Gb Storage and Client LAN.

 Would you suggest to connect the OSD nodes redundant to both networks?
 That would end up with 4 * 40Gb ports in each box, two Switches to
 connect to.

 I'd think of OSD nodes with 12 - 16 * 4TB SATA disks for high io
 pools. (+ currently SSD for journal, but may be until we start, levelDB,
 rocksDB are ready ... ?)

 Later some less io bound pools for data archiving/backup. (bigger and
 more Disks per node)

 We would also do some Cache tiering for some pools.

 From HP, Intel, Supermicro etc. reference documentation, they usually use
 non-redundant network connections. (single 10Gb)

 I know: redundancy keeps some headaches small, but also adds some more
 complexity and increases the budget. (add network adapters, other
 server, more switches, etc)

 So what would you suggest, what are your experiences?

 Thanks for any suggestion and feedback . Regards . Götz
 --
 Götz Reinicke
 IT-Koordinator

 Tel. +49 7141 969 82 420
 E-Mail goetz.reini...@filmakademie.de

 Filmakademie Baden-Württemberg GmbH
 Akademiehof 10
 71638 Ludwigsburg
 www.filmakademie.de

 Eintragung Amtsgericht Stuttgart HRB 205016

 Vorsitzender des Aufsichtsrats: Jürgen Walter MdL
 Staatssekretär im Ministerium für Wissenschaft,
 Forschung und Kunst Baden-Württemberg

 Geschäftsführer: Prof. Thomas Schadt


 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Network redundancy pro and cons, best practice, suggestions?

2015-04-13 Thread Scott Laird
Redundancy is a means to an end, not an end itself.

If you can afford to lose component X, manually replace it, and then return
everything impacted to service, then there's no point in making X redundant.

If you can afford to lose a single disk (which Ceph certainly can), then
there's no point in local RAID.

If you can afford to lose a single machine, then there's no point in
redundant power supplies (although they can make power maintenance work a
lot less complex).

If you can afford to lose everything attached to a switch, then there's no
point in making it redundant.


Doing redundant networking to the host adds a lot of complexity that isn't
really there with single-attached hosts.  For instance, what happens if one
of the switches loses its connection to the outside world?  With LACP,
you'll probably lose connectivity to half of your peers.  Doing something
like OSPF, possibly with ECMP, avoids that problem, but certainly doesn't
make things less complicated.

In most cases, I'd avoid switch redundancy.  If I had more than 10 racks,
there's really no point, because you should be able to lose a rack without
massive disruption.  If I only had a rack or two, then I quite likely
wouldn't bother, simply because it ends up being a bigger part of the overall
cost, and the added complexity and cost aren't worth it in most cases.

It comes down to engineering tradeoffs and money, and the right balance is
different in just about every situation.  It's a function of money,
acceptance of risk, scale, performance, networking experience, and the cost
of outages.


Scott

On Mon, Apr 13, 2015 at 4:02 AM Christian Balzer ch...@gol.com wrote:


 Hello,

 On Mon, 13 Apr 2015 11:03:24 +0200 Götz Reinicke - IT Koordinator wrote:

  Dear ceph users,
 
  we are planning a ceph storage cluster from scratch. Might be up to 1 PB
  within the next 3 years, multiple buildings, new network infrastructure
  for the cluster etc.
 
  I had some excellent trainings on ceph, so the essential fundamentals
  are familiar to me, and I know our goals/dreams can be reached. :)
 
  There is just one tiny piece in the design I'm currently unsure
  about :)
 
  Ceph follows some sort of 'keep it small and simple' approach, e.g. don't use RAID
  controllers, use more boxes and disks, fast network etc.
 
 While small and plenty is definitely true, some people actually use RAID
 for OSDs (like RAID1) to avoid ever having to deal with a failed OSD and
 getting a 4x replication in the end.
 Your needs and budget may of course differ.

  So from our current design we plan 40Gb Storage and Client LAN.
 
  Would you suggest to connect the OSD nodes redundant to both networks?
  That would end up with 4 * 40Gb ports in each box, two Switches to
  connect to.
 
 If you can afford it, fabric switches are quite nice, as they allow for
 LACP over 2 switches, so if everything is working you get twice the speed,
 if not still full redundancy. The Brocade VDX stuff comes to mind.

 However if you're not tied into an Ethernet network, you might do better
 and cheaper with an Infiniband network on the storage side of things.
 This will become even more attractive as RDMA support improves with Ceph.

 Separating public (client) and private (storage, OSD interconnect)
 networks with Ceph makes only sense if your storage node can actually
 utilize all that bandwidth.

 So at your storage node density of 12 HDDs (16 HDD chassis are not space
 efficient), 40GbE is overkill with a single link/network, insanely so with
 2 networks.

  I'd think of OSD nodes with 12 - 16 * 4TB SATA disks for high io
  pools. (+ currently SSD for journal, but may be until we start, levelDB,
  rocksDB are ready ... ?)
 
  Later some less io bound pools for data archiving/backup. (bigger and
  more Disks per node)
 
  We would also do some Cache tiering for some pools.
 
  From HP, Intel, Supermicro etc. reference documentation, they usually use
  non-redundant network connections. (single 10Gb)
 
  I know: redundancy keeps some headaches small, but also adds some more
  complexity and increases the budget. (add network adapters, other
  server, more switches, etc)
 
 Complexity not so much, cost yes.

  So what would you suggest, what are your experiences?
 
 It all depends on how small (large really) you can start.

 I have only small clusters with few nodes, so for me redundancy is a big
 deal.
 Thus those cluster use Infiniband, 2 switches and dual-port HCAs on the
 nodes in an active-standby mode.

 If you however can start with something like 10 racks (ToR switches),
 losing one switch would mean a loss of 10% of your cluster, which is
 something it should be able to cope with.
 Especially if you configured Ceph to _not_ start re-balancing data
 automatically if a rack goes down (so that you have a chance to put a
 replacement switch in place, which you of course kept handy on-site for
 such a case). ^.-

 Regards,

 Christian
 --
 Christian Balzer        Network/Systems Engineer
 ch...@gol.com