Re: [ceph-users] Network redundancy...
> Can the switches you're using stack?
> If so, you could spread the LACP across the two switches.

And:

> Just use balance-alb; this will do the trick with non-stacked switches.

Thanks for the answers, I'll do some tests! ;-)

--
dott. Marco Gaiarin                        GNUPG Key ID: 240A3D66
Associazione ``La Nostra Famiglia''        http://www.lanostrafamiglia.it/
Polo FVG - Via della Bontà, 7 - 33078 - San Vito al Tagliamento (PN)
marco.gaiarin(at)lanostrafamiglia.it       t +39-0434-842711  f +39-0434-842797
Re: [ceph-users] Network redundancy...
2017-05-29 11:37 GMT+03:00 Marco Gaiarin:
> I've set up a little Ceph cluster (3 hosts, 12 OSDs), all attached to a
> single switch, using 2 x 1 Gbit/s LACP links.
>
> Supposing I had two identical switches, is there some way to set up a
> "redundant" configuration? For example, something similar to iSCSI
> multipath?
>
> I'm reading switch manuals and the Ceph documentation, but with no luck.
>
> Thanks.

Just use balance-alb; this will do the trick with non-stacked switches.

--
Have a nice day, Timofey.
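For anyone wanting to try this, a minimal sketch of what a balance-alb
bond could look like in /etc/network/interfaces on a Debian-style node
(interface names and addresses are placeholders, not taken from this
thread). balance-alb needs no configuration on the switch side, so the
two links can go to two independent, non-stacked switches:

    auto bond0
    iface bond0 inet static
        address 192.168.10.11
        netmask 255.255.255.0
        bond-slaves eth0 eth1
        # mode 6 (adaptive load balancing): balances traffic without any
        # switch support, so each link may end on a different switch
        bond-mode balance-alb
        bond-miimon 100

If one switch dies, traffic simply keeps flowing over the link to the
surviving switch, at half the aggregate bandwidth.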
Re: [ceph-users] Network redundancy...
Can the switches you're using stack? If so, you could spread the LACP
across the two switches.

Sent from my iPhone

> On 29 May 2017, at 4:38 PM, Marco Gaiarin wrote:
>
> I've set up a little Ceph cluster (3 hosts, 12 OSDs), all attached to a
> single switch, using 2 x 1 Gbit/s LACP links.
>
> Supposing I had two identical switches, is there some way to set up a
> "redundant" configuration? For example, something similar to iSCSI
> multipath?
>
> I'm reading switch manuals and the Ceph documentation, but with no luck.
>
> Thanks.
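If the switches do stack (or support MLAG), the bond would typically run
in 802.3ad/LACP mode instead. A sketch with the same placeholder names,
with the caveat that the two switch ports must be configured as one link
aggregation group on the stack:

    auto bond0
    iface bond0 inet static
        address 192.168.10.11
        netmask 255.255.255.0
        bond-slaves eth0 eth1
        # mode 4 (802.3ad/LACP): both ports must be members of one LAG
        # on the switch side, i.e. a stack or an MLAG pair
        bond-mode 802.3ad
        bond-miimon 100
        bond-lacp-rate fast
        bond-xmit-hash-policy layer3+4

The layer3+4 hash spreads Ceph's many TCP connections across both links;
any single connection is still limited to one link's speed.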
[ceph-users] Network redundancy...
I've set up a little Ceph cluster (3 hosts, 12 OSDs), all attached to a
single switch, using 2 x 1 Gbit/s LACP links.

Supposing I had two identical switches, is there some way to set up a
"redundant" configuration? For example, something similar to iSCSI
multipath?

I'm reading switch manuals and the Ceph documentation, but with no luck.

Thanks.

--
dott. Marco Gaiarin                        GNUPG Key ID: 240A3D66
Associazione ``La Nostra Famiglia''        http://www.lanostrafamiglia.it/
Polo FVG - Via della Bontà, 7 - 33078 - San Vito al Tagliamento (PN)
marco.gaiarin(at)lanostrafamiglia.it       t +39-0434-842711  f +39-0434-842797
Re: [ceph-users] Network redundancy pro and cons, best practice, suggestions?
Hi Christian,

On 13.04.15 at 12:54, Christian Balzer wrote:
> Hello,
>
> On Mon, 13 Apr 2015 11:03:24 +0200 Götz Reinicke - IT Koordinator wrote:
>
>> Dear ceph users,
>>
>> we are planning a ceph storage cluster from scratch. Might be up to
>> 1 PB within the next 3 years, multiple buildings, new network
>> infrastructure for the cluster etc.
>> ...
>
> So at your storage node density of 12 HDDs (16-HDD chassis are not space
> efficient), 40GbE is overkill with a single link/network, insanely so
> with 2 networks.

What would you think if we go with 20 OSDs, maybe 22 OSDs (24-HDD chassis
with 2 or 4 SSDs for OS / journal)?

From the "Questions about an example of ceph infrastructure" topic I got
the calculation:

    HDDs x expected speed per disk = max performance
    20 x 70 MB/s = 1400 MB/s, i.e. 1.4 GB/s = 14 Gb/s; redundant = 28 Gb/s

I like the suggestion from Robert LeBlanc of using two 40Gb ports with
VLANs.

Currently we have to extend our LAN anyway; all 10Gb ports are in use.
Upgrading 10Gb ports costs more than buying new 10Gb hardware in our case.
(Good old Cisco 6500 vs. modern 4500-X challenge :) )

Furthermore we will see a lot more traffic and higher speed requirements
within the next year, and another rise within the next 5 years (e.g.
4K/8K video realtime playback for some workstations; 4K is about 9 Gb/s
(!) per workstation).

Long story short, we have/will/should/and can start with 40Gb. The
question is how big. :)

I'd say, after some more internal discussions too, that redundant switches
are mandatory in our case, and the 40Gb VLANs are a good balance of
redundancy, cost and performance.

Thumbs up or down, your vote :D. Seriously, what do you think?

Thanks for your feedback and best regards . Götz

--
Götz Reinicke
IT-Koordinator

Tel. +49 7141 969 82 420
E-Mail goetz.reini...@filmakademie.de

Filmakademie Baden-Württemberg GmbH
Akademiehof 10
71638 Ludwigsburg
www.filmakademie.de
Re: [ceph-users] Network redundancy pro and cons, best practice, suggestions?
Hello,

On Tue, 21 Apr 2015 08:33:21 +0200 Götz Reinicke - IT Koordinator wrote:

> Hi Christian,
>
> On 13.04.15 at 12:54, Christian Balzer wrote:
>> Hello,
>>
>> On Mon, 13 Apr 2015 11:03:24 +0200 Götz Reinicke - IT Koordinator wrote:
>>
>>> Dear ceph users,
>>>
>>> we are planning a ceph storage cluster from scratch. Might be up to
>>> 1 PB within the next 3 years, multiple buildings, new network
>>> infrastructure for the cluster etc.
>>> ...
>>
>> So at your storage node density of 12 HDDs (16-HDD chassis are not
>> space efficient), 40GbE is overkill with a single link/network,
>> insanely so with 2 networks.
>
> What would you think if we go with 20 OSDs, maybe 22 OSDs (24-HDD
> chassis with 2 or 4 SSDs for OS / journal)?

Density is nice, as in cost-effective. But with Ceph, smaller is better,
both in terms of performance and failure domains.

If you can start with a large enough number of nodes (at least 10) and
remember that you're probably looking for at least 2 GHz per OSD with SSD
journals, go for it. But you'll need NVMe SSDs to satisfy 11 OSD HDDs,
never mind the rather large failure domain. So a 1:5 SSD journal to HDD
OSD ratio would be better.

> From the "Questions about an example of ceph infrastructure" topic I
> got the calculation:
>
>     HDDs x expected speed per disk = max performance
>     20 x 70 MB/s = 1400 MB/s, i.e. 1.4 GB/s = 14 Gb/s; redundant = 28 Gb/s
>
> I like the suggestion from Robert LeBlanc of using two 40Gb ports with
> VLANs.
>
> Currently we have to extend our LAN anyway; all 10Gb ports are in use.
> Upgrading 10Gb ports costs more than buying new 10Gb hardware in our
> case. (Good old Cisco 6500 vs. modern 4500-X challenge :) )
>
> Furthermore we will see a lot more traffic and higher speed requirements
> within the next year, and another rise within the next 5 years (e.g.
> 4K/8K video realtime playback for some workstations; 4K is about 9 Gb/s
> (!) per workstation).
>
> Long story short, we have/will/should/and can start with 40Gb. The
> question is how big. :)

If you can afford it, sure. ^o^

Christian

> I'd say, after some more internal discussions too, that redundant
> switches are mandatory in our case, and the 40Gb VLANs are a good
> balance of redundancy, cost and performance.
>
> Thumbs up or down, your vote :D. Seriously, what do you think?
>
> Thanks for your feedback and best regards . Götz

--
Christian Balzer        Network/Systems Engineer
ch...@gol.com           Global OnLine Japan/Fusion Communications
http://www.gol.com/
Re: [ceph-users] Network redundancy pro and cons, best practice, suggestions?
> So what would you suggest, what are your experiences?

Hi,

you can have a look at the Mellanox SX1012, for example:
http://www.mellanox.com/page/products_dyn?product_family=163

12 ports of 40Gb for around 4000€. You can use breakout cables to get
4 x 12 = 48 10Gb ports. They can be stacked with MLAG and LACP.

----- Original message -----
From: Götz Reinicke - IT Koordinator goetz.reini...@filmakademie.de
To: ceph-users ceph-users@lists.ceph.com
Sent: Monday, 13 April 2015 11:03:24
Subject: [ceph-users] Network redundancy pro and cons, best practice,
suggestions?

Dear ceph users,

we are planning a ceph storage cluster from scratch. Might be up to 1 PB
within the next 3 years, multiple buildings, new network infrastructure
for the cluster etc.

[...]

So from our current design we plan 40Gb storage and client LANs.

Would you suggest connecting the OSD nodes redundantly to both networks?
That would end up with 4 * 40Gb ports in each box and two switches to
connect to.

[...]

So what would you suggest, what are your experiences?

Thanks for any suggestion and feedback . Regards . Götz
[ceph-users] Network redundancy pro and cons, best practice, suggestions?
Dear ceph users,

we are planning a ceph storage cluster from scratch. Might be up to 1 PB
within the next 3 years, multiple buildings, new network infrastructure
for the cluster etc.

I had some excellent trainings on ceph, so the essential fundamentals are
familiar to me, and I know our goals/dreams can be reached. :)

There is just one tiny piece in the design I'm currently unsure about. :)

Ceph follows some sort of "keep it small and simple", e.g. don't use RAID
controllers, use more boxes and disks, fast network etc.

So from our current design we plan 40Gb storage and client LANs.

Would you suggest connecting the OSD nodes redundantly to both networks?
That would end up with 4 * 40Gb ports in each box and two switches to
connect to.

I'd think of OSD nodes with 12 - 16 * 4TB SATA disks for high-IO pools
(+ currently SSDs for the journal, but maybe by the time we start,
LevelDB / RocksDB are ready ... ?). Later some less IO-bound pools for
data archiving/backup (bigger and more disks per node). We would also do
some cache tiering for some pools.

From HP, Intel, Supermicro etc. reference documentation, they usually use
non-redundant network connections (single 10Gb).

I know: redundancy keeps some headaches small, but it also adds some more
complexity and increases the budget (additional network adapters, other
servers, more switches, etc.).

So what would you suggest, what are your experiences?

Thanks for any suggestion and feedback . Regards . Götz

--
Götz Reinicke
IT-Koordinator

Tel. +49 7141 969 82 420
E-Mail goetz.reini...@filmakademie.de

Filmakademie Baden-Württemberg GmbH
Akademiehof 10
71638 Ludwigsburg
www.filmakademie.de

Eintragung Amtsgericht Stuttgart HRB 205016
Vorsitzender des Aufsichtsrats: Jürgen Walter MdL
Staatssekretär im Ministerium für Wissenschaft, Forschung und Kunst
Baden-Württemberg
Geschäftsführer: Prof. Thomas Schadt
Re: [ceph-users] Network redundancy pro and cons, best practice, suggestions?
Hi Alexandre,

thanks for that suggestion. Mellanox might be on our shopping list
already, but what about the redundancy design overall, from your point of
view?

/Götz

On 13.04.15 at 11:08, Alexandre DERUMIER wrote:
>> So what would you suggest, what are your experiences?
>
> Hi,
>
> you can have a look at the Mellanox SX1012, for example:
> http://www.mellanox.com/page/products_dyn?product_family=163
>
> 12 ports of 40Gb for around 4000€. You can use breakout cables to get
> 4 x 12 = 48 10Gb ports. They can be stacked with MLAG and LACP.
Re: [ceph-users] Network redundancy pro and cons, best practice, suggestions?
Hello,

On Mon, 13 Apr 2015 11:03:24 +0200 Götz Reinicke - IT Koordinator wrote:

> Dear ceph users,
>
> we are planning a ceph storage cluster from scratch. Might be up to 1 PB
> within the next 3 years, multiple buildings, new network infrastructure
> for the cluster etc.
>
> I had some excellent trainings on ceph, so the essential fundamentals
> are familiar to me, and I know our goals/dreams can be reached. :)
>
> There is just one tiny piece in the design I'm currently unsure about. :)
>
> Ceph follows some sort of "keep it small and simple", e.g. don't use
> RAID controllers, use more boxes and disks, fast network etc.

While small and plenty is definitely true, some people actually use RAID
for OSDs (like RAID1) to avoid ever having to deal with a failed OSD,
getting a 4x replication in the end. Your needs and budget may of course
differ.

> So from our current design we plan 40Gb storage and client LANs.
>
> Would you suggest connecting the OSD nodes redundantly to both networks?
> That would end up with 4 * 40Gb ports in each box and two switches to
> connect to.

If you can afford it, fabric switches are quite nice, as they allow for
LACP over 2 switches: if everything is working you get twice the speed,
and if not, still full redundancy. The Brocade VDX stuff comes to mind.

However, if you're not tied to an Ethernet network, you might do better
and cheaper with an Infiniband network on the storage side of things.
This will become even more attractive as RDMA support in Ceph improves.

Separating the public (client) and private (storage, OSD interconnect)
networks with Ceph only makes sense if your storage node can actually
utilize all that bandwidth.

So at your storage node density of 12 HDDs (16-HDD chassis are not space
efficient), 40GbE is overkill with a single link/network, insanely so
with 2 networks.

> I'd think of OSD nodes with 12 - 16 * 4TB SATA disks for high-IO pools
> (+ currently SSDs for the journal, but maybe by the time we start,
> LevelDB / RocksDB are ready ... ?). Later some less IO-bound pools for
> data archiving/backup (bigger and more disks per node). We would also
> do some cache tiering for some pools.
>
> From HP, Intel, Supermicro etc. reference documentation, they usually
> use non-redundant network connections (single 10Gb).
>
> I know: redundancy keeps some headaches small, but it also adds some
> more complexity and increases the budget (additional network adapters,
> other servers, more switches, etc.).

Complexity not so much, cost yes.

> So what would you suggest, what are your experiences?

It all depends on how small (large, really) you can start.

I have only small clusters with few nodes, so for me redundancy is a big
deal. Thus those clusters use Infiniband, 2 switches and dual-port HCAs
on the nodes in an active-standby mode.

If you can however start with something like 10 racks (ToR switches),
losing one switch would mean a loss of 10% of your cluster, which is
something it should be able to cope with. Especially if you configured
Ceph to _not_ start re-balancing data automatically if a rack goes down
(so that you have a chance to put a replacement switch in place, which
you of course kept handy on-site for such a case). ^.-

Regards,

Christian
--
Christian Balzer        Network/Systems Engineer
ch...@gol.com           Global OnLine Japan/Fusion Communications
http://www.gol.com/
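To make the last two points concrete, a rough ceph.conf sketch (the
subnets are placeholders): the public/cluster split only pays off if the
node can actually fill both networks, and the subtree limit stops Ceph
from automatically marking OSDs out and re-balancing when a whole rack
disappears, e.g. because its ToR switch died:

    [global]
        # client-facing traffic
        public network = 10.10.1.0/24
        # OSD replication and recovery traffic
        cluster network = 10.10.2.0/24
        # if an entire rack goes down, do not mark its OSDs out
        # automatically; leave the re-balancing decision to the operator
        mon osd down out subtree limit = rack

For planned switch maintenance, running 'ceph osd set noout' before the
work and 'ceph osd unset noout' afterwards achieves the same effect
temporarily.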
Re: [ceph-users] Network redundancy pro and cons, best practice, suggestions?
For us, using two 40Gb ports with VLANs is redundancy enough. We are doing
LACP over two different switches.

On Mon, Apr 13, 2015 at 3:03 AM, Götz Reinicke - IT Koordinator
goetz.reini...@filmakademie.de wrote:
> Dear ceph users,
>
> we are planning a ceph storage cluster from scratch. Might be up to 1 PB
> within the next 3 years, multiple buildings, new network infrastructure
> for the cluster etc.
>
> [...]
>
> So from our current design we plan 40Gb storage and client LANs.
>
> Would you suggest connecting the OSD nodes redundantly to both networks?
> That would end up with 4 * 40Gb ports in each box and two switches to
> connect to.
>
> [...]
>
> So what would you suggest, what are your experiences?
>
> Thanks for any suggestion and feedback . Regards . Götz
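A sketch of what Robert's approach (two ports in one LACP bond across two
switches, with the Ceph public and cluster networks as VLANs on top)
could look like in /etc/network/interfaces on an OSD node. Interface
names, VLAN IDs and subnets are made up for illustration; the vlan
package is assumed, and the two switch ports have to form one MLAG/LACP
group:

    auto bond0
    iface bond0 inet manual
        bond-slaves eth0 eth1
        # LACP across two MLAG-paired switches
        bond-mode 802.3ad
        bond-miimon 100
        bond-xmit-hash-policy layer3+4

    # Ceph public (client) network on VLAN 100
    auto bond0.100
    iface bond0.100 inet static
        address 10.10.1.11
        netmask 255.255.255.0
        vlan-raw-device bond0

    # Ceph cluster (replication) network on VLAN 200
    auto bond0.200
    iface bond0.200 inet static
        address 10.10.2.11
        netmask 255.255.255.0
        vlan-raw-device bond0

Either switch can fail (or be taken down for maintenance) without the
node losing either Ceph network, at the cost of half the bandwidth while
it is gone.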
Re: [ceph-users] Network redundancy pro and cons, best practice, suggestions?
Redundancy is a means to an end, not an end in itself.

If you can afford to lose component X, manually replace it, and then
return everything impacted to service, then there's no point in making X
redundant. If you can afford to lose a single disk (which Ceph certainly
can), then there's no point in local RAID. If you can afford to lose a
single machine, then there's no point in redundant power supplies
(although they can make power maintenance work a lot less complex). If
you can afford to lose everything attached to a switch, then there's no
point in making it redundant.

Doing redundant networking to the host adds a lot of complexity that
isn't really there with single-attached hosts. For instance, what happens
if one of the switches loses its connection to the outside world? With
LACP, you'll probably lose connectivity to half of your peers. Doing
something like OSPF, possibly with ECMP, avoids that problem, but
certainly doesn't make things less complicated.

In most cases, I'd avoid switch redundancy. If I had more than 10 racks,
there's really no point, because you should be able to lose a rack
without massive disruption. If I only had a rack or two, then I quite
likely wouldn't bother, simply because the switches end up being a bigger
part of the cost, and the added complexity and cost isn't worth it in
most cases.

It comes down to engineering tradeoffs and money, and the right balance
is different in just about every situation. It's a function of money,
acceptance of risk, scale, performance, networking experience, and the
cost of outages.

Scott

On Mon, Apr 13, 2015 at 4:02 AM Christian Balzer ch...@gol.com wrote:
> Hello,
>
> On Mon, 13 Apr 2015 11:03:24 +0200 Götz Reinicke - IT Koordinator wrote:
>
> [...]
>
> It all depends on how small (large, really) you can start.
>
> I have only small clusters with few nodes, so for me redundancy is a big
> deal. Thus those clusters use Infiniband, 2 switches and dual-port HCAs
> on the nodes in an active-standby mode.
>
> If you can however start with something like 10 racks (ToR switches),
> losing one switch would mean a loss of 10% of your cluster, which is
> something it should be able to cope with. Especially if you configured
> Ceph to _not_ start re-balancing data automatically if a rack goes down
> (so that you have a chance to put a replacement switch in place, which
> you of course kept handy on-site for such a case). ^.-
>
> Regards,
>
> Christian
> --
> Christian Balzer        Network/Systems Engineer
> ch...@gol.com           Global OnLine Japan/Fusion Communications
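Being able to "lose a rack without massive disruption" also assumes CRUSH
places replicas in different racks, not just on different hosts. A sketch
of such a rule in a decompiled CRUSH map, assuming the hosts are already
grouped into rack buckets under the default root (names are placeholders):

    rule replicated_racks {
        ruleset 1
        type replicated
        min_size 1
        max_size 10
        step take default
        # put each replica under a different rack bucket, so a dead ToR
        # switch or a whole lost rack never takes out all copies of a PG
        step chooseleaf firstn 0 type rack
        step emit
    }

Compile and load it with crushtool and 'ceph osd setcrushmap', then point
the pools at it (on the Ceph releases current at the time of this thread,
via the pool's crush_ruleset setting).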