Re: [ceph-users] Multicast communication compuverde
On Wed, Feb 06, 2019 at 11:49:28AM +0200, Maged Mokhtar wrote:
> It could be used for sending cluster maps or other configuration in a
> push model, i believe corosync uses this by default. For use in sending
> actual data during write ops, a primary osd can send to its replicas,
> they do not have to process all traffic but can listen on specific group
> address associated with that pg, which could be an increment from a base
> multicast address defined. Some additional erasure codes and
> acknowledgment messages need to be added to account for errors/dropped
> packets.
> i doubt it will give an appreciable boost given most pools use 3
> replicas in total, additionally there could be issues to get multicast
> working correctly like setup igmp, so all in all it could be a
> hassle.

A separate concern is that there are too many combinations of OSDs compared with the multicast limits in switchgear.

As a quick math test case: 3 replicas with 512 OSDs, split over 32 hosts, gives ~30k unique host combinations.

At the IPv4 protocol layer, this does fit into the 232/8 network for SSM scope or the 239/8 LSA scope; each of those holds 16.7M multicast addresses.

On the switchgear side, even on the big Cisco gear, the limits are much lower: 32K.

| Output interface lists are stored in the multicast expansion table
| (MET). The MET has room for up to 32,000 output interface lists. The
| MET resources are shared by both Layer 3 multicast routes and by Layer 2
| multicast entries. The actual number of output interface lists available
| in hardware depends on the specific configuration. If the total number
| of multicast routes exceed 32,000, multicast packets might not be
| switched by the Integrated Switching Engine. They would be forwarded by
| the CPU subsystem at much slower speeds.

Older switchgear was even lower :-(.

This would also be a switch from TCP to UDP, and a redesign of other pieces, including CephX security.
I'm not convinced of the overall gain at this scale for actual data. For heartbeat and other cluster-wide stuff, yes, I do agree that multicast might have benefits.

--
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation Treasurer
E-Mail   : robb...@gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
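The arithmetic in the message above can be checked with a few lines of Python. This is only a sketch under an assumption: the "~30k" figure is read here as the number of ordered host triples (primary first, then two replicas on distinct hosts), which is the interpretation that reproduces the number. The names `HOSTS`, `REPLICAS` and `MET_LIMIT` are illustrative and not taken from any Ceph or Cisco tooling.

```python
import ipaddress
from math import comb, perm

HOSTS = 32          # hosts in the example cluster
REPLICAS = 3        # pool size (primary + 2 replicas)
MET_LIMIT = 32_000  # output interface lists, per the Cisco text quoted above

# Assumption: one multicast group per ordered host triple (primary first).
ordered_triples = perm(HOSTS, REPLICAS)    # 32 * 31 * 30
# For comparison: host sets when the primary's identity is ignored.
unordered_triples = comb(HOSTS, REPLICAS)

# Both candidate address ranges are /8 networks with 2**24 addresses each.
ssm_range = ipaddress.ip_network("232.0.0.0/8")
scoped_range = ipaddress.ip_network("239.0.0.0/8")

print(f"ordered host triples:   {ordered_triples}")
print(f"unordered host triples: {unordered_triples}")
print(f"addresses per /8 range: {ssm_range.num_addresses}")
print(f"share of MET consumed:  {ordered_triples / MET_LIMIT:.0%}")
```

Under that reading, a single 32-host cluster already uses most of a 32,000-entry table, while the address space itself is nowhere near exhausted.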
Re: [ceph-users] Multicast communication compuverde
On 06/02/2019 11:14, Marc Roos wrote:
> Yes indeed, but for OSDs writing the replication or erasure objects you
> get a sort of parallel processing, no?

It could be used for sending cluster maps or other configuration in a push model; I believe corosync uses this by default.

For sending actual data during write ops, a primary OSD could send to its replicas. They would not have to process all traffic, but could listen on a specific group address associated with that PG, which could be an increment from a defined base multicast address. Some additional erasure codes and acknowledgment messages would need to be added to account for errors and dropped packets.
I doubt it will give an appreciable boost, given that most pools use 3 replicas in total. Additionally, there could be issues getting multicast to work correctly, such as setting up IGMP, so all in all it could be a hassle.

/Maged
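The "increment from a base multicast address" idea can be sketched as below. Everything here is hypothetical: the base address `239.1.0.0`, the flat integer `pg_index` (a simplification of how placement groups are actually identified) and the function name `group_for_pg` are chosen purely for illustration and do not correspond to anything in Ceph.

```python
import ipaddress

# Hypothetical base group inside the 239/8 range mentioned in the thread.
BASE_GROUP = ipaddress.IPv4Address("239.1.0.0")
GROUP_RANGE = ipaddress.ip_network("239.0.0.0/8")


def group_for_pg(pg_index: int,
                 base: ipaddress.IPv4Address = BASE_GROUP) -> ipaddress.IPv4Address:
    """Map a flat PG index to a multicast group by offsetting from the base."""
    if pg_index < 0:
        raise ValueError("pg_index must be non-negative")
    group = base + pg_index  # IPv4Address supports integer addition
    if group not in GROUP_RANGE:
        raise ValueError(f"PG {pg_index} falls outside {GROUP_RANGE}")
    return group


if __name__ == "__main__":
    for pg in (0, 1, 300):
        print(pg, group_for_pg(pg))
```

A replica would then join only the groups of the PGs it hosts, which is what keeps it from having to process all traffic; the switch-side state this creates is the concern raised elsewhere in the thread.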
Re: [ceph-users] Multicast communication compuverde
For EC-coded stuff, at 10+4 with 13 others needing data apart from the primary, they are specifically NOT getting the same data. They are getting either 1/10th of the pieces or one of the 4 different checksums, so it would be nasty to send the full data to all OSDs when each expects only one of the 14 pieces.

On Wed, 6 Feb 2019 at 10:14, Marc Roos wrote:
> Yes indeed, but for OSDs writing the replication or erasure objects you
> get a sort of parallel processing, no?

--
May the most significant bit of your life be positive.
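To put rough numbers on the 10+4 case, here is a small sketch. It assumes the primary keeps one chunk itself and ships one chunk to each of the other 13 OSDs, and it ignores striping, padding and protocol overhead; `ec_traffic` and its field names are made up for this illustration.

```python
def ec_traffic(object_bytes: int, k: int = 10, m: int = 4) -> dict:
    """Compare per-OSD unicast of EC chunks with multicasting the whole object."""
    chunk_bytes = -(-object_bytes // k)  # ceiling division: size of one chunk
    receivers = k + m - 1                # every OSD in the set except the primary
    return {
        "chunk_bytes": chunk_bytes,
        "receivers": receivers,
        # Unicast: the primary sends each receiver only its own chunk.
        "unicast_sent_by_primary": chunk_bytes * receivers,
        # Naive multicast: every receiver gets the full object.
        "multicast_received_per_osd": object_bytes,
        # How many times more data each OSD receives than it needs.
        "overdelivery_factor": object_bytes / chunk_bytes,
    }


if __name__ == "__main__":
    for name, value in ec_traffic(4_000_000).items():
        print(f"{name}: {value}")
```

With these assumptions the primary sends about 1.3x the object size in total over unicast, whereas naive multicast would make every one of the 13 receivers take in roughly ten times what it needs.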
Re: [ceph-users] Multicast communication compuverde
Hi,

we have a Compuverde cluster, and AFAIK it uses multicast for node discovery, not for data distribution.

If you need more information, feel free to contact me either by email or via IRC (-> Be-El).

Regards,
Burkhard
Re: [ceph-users] Multicast communication compuverde
Yes indeed, but for OSDs writing the replication or erasure objects you get a sort of parallel processing, no?

> Multicast traffic from storage has a point in things like the old
> Windows provisioning software Ghost, where you could netboot a room full
> of computers, have them listen to a mcast stream of the same data/image
> and all apply it at the same time, and perhaps re-sync potentially
> missing stuff at the end, which would be far less data overall than
> having each client ask the server(s) for the same image over and over.
> In the case of Ceph, I would say it is much less probable that many
> clients would ask for exactly the same data in the same order, so it would
> just mean all clients hear all traffic (or at least more traffic than
> they asked for) and need to skip past a lot of it.
Re: [ceph-users] Multicast communication compuverde
Multicast traffic from storage has a point in things like the old Windows provisioning software Ghost, where you could netboot a room full of computers, have them listen to a mcast stream of the same data/image and all apply it at the same time, and perhaps re-sync potentially missing stuff at the end, which would be far less data overall than having each client ask the server(s) for the same image over and over.

In the case of Ceph, I would say it is much less probable that many clients would ask for exactly the same data in the same order, so it would just mean all clients hear all traffic (or at least more traffic than they asked for) and need to skip past a lot of it.

On Tue, 5 Feb 2019 at 22:07, Marc Roos wrote:
> I am still testing with ceph mostly, so my apologies for bringing up
> something totally useless. But I just had a chat about compuverde
> storage. They seem to implement multicast in a scale out solution.
>
> I was wondering if there is any experience here with compuverde and how
> it compared to ceph. And maybe this multicast approach could be
> interesting to use with ceph?

--
May the most significant bit of your life be positive.
[ceph-users] Multicast communication compuverde
I am still mostly testing with Ceph, so my apologies for bringing up something totally useless. But I just had a chat about Compuverde storage. They seem to implement multicast in a scale-out solution.

I was wondering if anyone here has experience with Compuverde and how it compares to Ceph. And maybe this multicast approach could be interesting to use with Ceph?