Re: [ceph-users] Multicast communication compuverde

2019-02-08 Thread Robin H. Johnson
On Wed, Feb 06, 2019 at 11:49:28AM +0200, Maged Mokhtar wrote:
> It could be used for sending cluster maps or other configuration in a
> push model; I believe corosync uses this by default. For sending actual
> data during write ops, a primary OSD could send to its replicas; they
> would not have to process all traffic, but could listen on a specific
> group address associated with that PG, derived as an offset from a
> defined base multicast address. Some additional erasure codes and
> acknowledgment messages would need to be added to account for
> errors/dropped packets.

> I doubt it will give an appreciable boost given most pools use 3
> replicas in total; additionally there could be issues getting multicast
> working correctly, like setting up IGMP, so all in all it could be a
> hassle.
A separate concern is that there are too many combinations of OSDs
versus the multicast limitations in switchgear. As a quick math test
case: 3 replicas across 512 OSDs, split over 32 hosts, is ~30k unique
ordered host combinations.

At the IPv4 protocol layer, this does fit into the 232/8 network (SSM
scope) or the 239/8 administratively scoped range; each of those offers
~16.7M multicast addresses.

On the switchgear side, even on big Cisco gear, the limits are far
lower: 32K.
| Output interface lists are stored in the multicast expansion table
| (MET). The MET has room for up to 32,000 output interface lists.  The
| MET resources are shared by both Layer 3 multicast routes and by Layer 2
| multicast entries. The actual number of output interface lists available
| in hardware depends on the specific configuration. If the total number
| of multicast routes exceed 32,000, multicast packets might not be
| switched by the Integrated Switching Engine. They would be forwarded by
| the CPU subsystem at much slower speeds.
Older switchgear had even lower limits :-(.
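The back-of-the-envelope numbers above can be sanity-checked in a few
lines (a sketch; the 32-host/3-replica figures come from the example,
and ~30k corresponds to ordered host triples, since the primary is
distinguished from its replicas):

```python
from math import comb, perm

hosts, replicas = 32, 3

# Ordered placements of 3 replicas over 32 hosts (the primary OSD is
# distinct from its replicas, so order matters): 32 * 31 * 30.
ordered = perm(hosts, replicas)    # 29_760, i.e. the ~30k in the text
unordered = comb(hosts, replicas)  # 4_960 if the primary role is ignored

ssm_addresses = 2 ** 24       # 232/8 (or 239/8): ~16.7M group addresses
cisco_met_limit = 32_000      # MET output-interface-list limit quoted above

print(ordered, ordered <= ssm_addresses, ordered <= cisco_met_limit)
# -> 29760 True True (but only barely under the 32K switch limit)
```

So the address space is not the bottleneck; the switch hardware's
multicast route table is, and this example already nearly fills it.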

This would also mean a switch from TCP to UDP, and a redesign of other
pieces, including CephX security.

I'm not convinced of the overall gain at this scale for actual data.
For heartbeat and other cluster-wide stuff, yes, I do agree that
multicast might have benefits.

-- 
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation Treasurer
E-Mail   : robb...@gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Multicast communication compuverde

2019-02-06 Thread Maged Mokhtar



On 06/02/2019 11:14, Marc Roos wrote:

Yes indeed, but for OSDs writing the replication or erasure objects you
get a sort of parallel processing, no?



Multicast traffic from storage has a point in things like the old
Windows provisioning software Ghost, where you could netboot a room full
of computers, have them listen to a multicast stream of the same
data/image and all apply it at the same time, and perhaps re-sync
potentially missing stuff at the end, which would be far less data
overall than having each client ask the server(s) for the same image
over and over.
In the case of Ceph, I would say it is much less probable that many
clients would ask for exactly the same data in the same order, so it
would just mean all clients hear all traffic (or at least more traffic
than they asked for) and need to skip past a lot of it.


On Tue, 5 Feb 2019 at 22:07, Marc Roos wrote:




I am still testing with ceph mostly, so my apologies for bringing up
something totally useless. But I just had a chat about Compuverde
storage. They seem to implement multicast in a scale-out solution.

I was wondering if there is any experience here with Compuverde and how
it compares to ceph. And maybe this multicast approach could be
interesting to use with ceph?








It could be used for sending cluster maps or other configuration in a
push model; I believe corosync uses this by default. For sending actual
data during write ops, a primary OSD could send to its replicas; they
would not have to process all traffic, but could listen on a specific
group address associated with that PG, derived as an offset from a
defined base multicast address. Some additional erasure codes and
acknowledgment messages would need to be added to account for
errors/dropped packets.

I doubt it will give an appreciable boost given most pools use 3
replicas in total; additionally there could be issues getting multicast
working correctly, like setting up IGMP, so all in all it could be a
hassle.
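A minimal sketch of that per-PG group-address idea (hypothetical, not
Ceph code; the 239/8 base address and the simple base+offset mapping are
assumptions):

```python
import ipaddress

# Assumed base in the administratively scoped 239/8 range.
BASE = ipaddress.IPv4Address("239.0.0.0")

def pg_multicast_group(pg_id: int,
                       base: ipaddress.IPv4Address = BASE) -> ipaddress.IPv4Address:
    """Derive a per-PG multicast group address as base + pg_id."""
    group = base + pg_id
    # Stay inside the administratively scoped block.
    if group not in ipaddress.ip_network("239.0.0.0/8"):
        raise ValueError("PG id overflows the 239/8 multicast range")
    return group

print(pg_multicast_group(4096))  # -> 239.0.16.0
```

Replica OSDs for a PG would then join that group via IGMP, which is
exactly the extra moving part (IGMP snooping/querier setup) that makes
this a hassle in practice.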


/Maged

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Multicast communication compuverde

2019-02-06 Thread Janne Johansson
For EC-coded stuff, at 10+4 with 13 OSDs needing data apart from the
primary, they are specifically NOT getting the same data: each gets
either one of the 10 data pieces or one of the 4 different checksum
(parity) pieces, so it would be wasteful to multicast the full data to
all OSDs when each expects only a 14th of it.
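To make the 10+4 point concrete, here is a toy sketch of how one object
splits into 14 distinct shards, each OSD receiving a different one
(placeholder parity bytes, not real Reed-Solomon):

```python
k, m = 10, 4  # 10 data shards + 4 coding shards, as in the example

payload = bytes(range(200))        # toy object: 200 bytes
shard_size = len(payload) // k     # each data shard is 1/10th of the object
data_shards = [payload[i * shard_size:(i + 1) * shard_size] for i in range(k)]

# Real EC would compute parity (e.g. Reed-Solomon); placeholders suffice
# to show the point: 14 different payloads, not 14 copies of one payload.
coding_shards = [b"\x00" * shard_size for _ in range(m)]

shards = data_shards + coding_shards
assert len(set(data_shards)) == k   # every data shard is distinct
print(len(shards), len(shards[0]))  # -> 14 20
```

Multicasting the full object to all 14 OSDs would send each one 14x the
data it actually needs.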



-- 
May the most significant bit of your life be positive.


Re: [ceph-users] Multicast communication compuverde

2019-02-06 Thread Burkhard Linke

Hi,


We have a Compuverde cluster and, AFAIK, it uses multicast for node
discovery, not for data distribution.



If you need more information, feel free to contact me either by email or 
via IRC (-> Be-El).



Regards,

Burkhard




Re: [ceph-users] Multicast communication compuverde

2019-02-06 Thread Marc Roos


Yes indeed, but for OSDs writing the replication or erasure objects you
get a sort of parallel processing, no?














Re: [ceph-users] Multicast communication compuverde

2019-02-06 Thread Janne Johansson
Multicast traffic from storage has a point in things like the old
Windows provisioning software Ghost, where you could netboot a room full
of computers, have them listen to a multicast stream of the same
data/image and all apply it at the same time, and perhaps re-sync
potentially missing stuff at the end, which would be far less data
overall than having each client ask the server(s) for the same image
over and over.
In the case of Ceph, I would say it is much less probable that many
clients would ask for exactly the same data in the same order, so it
would just mean all clients hear all traffic (or at least more traffic
than they asked for) and need to skip past a lot of it.




-- 
May the most significant bit of your life be positive.