Hello folks,

Here are a couple of slides/diagrams I documented for my own understanding back around the Havana release. See slide no. 10 onward in particular:
https://docs.google.com/presentation/d/1ZPWKXN7dzXs9bX3Ref9fPDiia912zsHCHNMh_VSMhJs/edit#slide=id.p

I am also committed to using ZeroMQ, as it is lightweight, fast, and scalable, and I would like to chip in on its further development.

Regards,
Yatin

On Wed, Nov 19, 2014 at 8:05 AM, Li Ma <[email protected]> wrote:
>
> On 2014/11/19 1:49, Eric Windisch wrote:
>
>> I think for this cycle we really do need to focus on consolidating and
>> testing the existing driver design and fixing up the biggest
>> deficiency (1) before we consider moving forward with lots of new
>
> +1
>
>> 1) Outbound messaging connection re-use - right now every outbound
>> messaging creates and consumes a tcp connection - this approach scales
>> badly when neutron does large fanout casts.
>
> I'm glad you are looking at this and, by doing so, will understand the
> system better. I hope the following will give some insight into, at least,
> why I made the decisions I made:
>
> This was an intentional design trade-off. I saw three choices here: build
> a fully decentralized solution, build a fully-connected network, or use
> centralized brokerage. I wrote off centralized brokerage immediately. The
> problem with a fully connected system is that active TCP connections are
> required between all of the nodes. I didn't think that would scale and
> would be brittle against floods (intentional or otherwise).
>
> IMHO, I always felt the right solution for large fanout casts was to use
> multicast. When the driver was written, Neutron didn't exist and there was
> no use-case for large fanout casts, so I didn't implement multicast, but
> knew it was an option if it became necessary. It isn't the right solution
> for everyone, of course.
>
> Using multicast adds complexity to the switch forwarding plane, which has
> to enable and maintain multicast group communication. For large deployment
> scenarios, I prefer to keep forwarding simple and easy to maintain. IMO,
> running a set of fanout-router processes in the cluster can also achieve
> the goal. The data path is:
>
> openstack-daemon ---send the message (with fanout=true)---> fanout-router
> ---read the matchmaker---> send to the destinations
>
> Essentially, it just uses unicast to simulate multicast.
>
> For connection reuse, you could manage a pool of connections and keep
> those connections around for a configurable amount of time, after which
> they'd expire and be re-opened. This would keep the most actively used
> connections alive. One problem is that it would make the service more
> brittle by making it far more susceptible to running out of file
> descriptors by keeping connections around significantly longer. However,
> this wouldn't be as brittle as fully connecting the nodes, nor as poorly
> scalable.
>
> +1. Setting a large number of fds is not a problem: because we use a
> socket pool, we can control and keep a fixed number of fds.
>
> If OpenStack and oslo.messaging were designed specifically around this
> message pattern, I might suggest that the library and its applications be
> aware of high-traffic topics and persist the connections for those topics,
> while keeping others ephemeral. A good example for Nova would be that
> api->scheduler traffic would be persistent, whereas scheduler->compute_node
> would be ephemeral. Perhaps this is something that could still be added to
> the library.
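
To make the fanout-router data path above concrete, here is a rough sketch
of what such a process could look like with pyzmq. This is illustrative
only: get_hosts() stands in for whatever matchmaker lookup is used, and
none of these names are the actual oslo.messaging driver API.

import zmq


def get_hosts(topic):
    # Hypothetical matchmaker lookup: topic -> list of "host:port" targets.
    return {"fanout~compute": ["node1:9501", "node2:9501"]}.get(topic, [])


def fanout_router(bind_addr="tcp://*:9500"):
    ctx = zmq.Context.instance()

    # All local daemons push their fanout casts to this one socket.
    frontend = ctx.socket(zmq.PULL)
    frontend.bind(bind_addr)

    while True:
        topic, payload = frontend.recv_multipart()
        # Re-send the same payload to every destination the matchmaker
        # returns -- plain unicast, which is what "simulating multicast"
        # means here.
        for host in get_hosts(topic.decode()):
            out = ctx.socket(zmq.PUSH)
            out.connect("tcp://%s" % host)
            out.send_multipart([topic, payload])
            out.close()

Opening a socket per destination is exactly the per-cast connection cost
being discussed in (1); a pooled variant is sketched next.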
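
And here is a minimal sketch of the pool-with-expiry idea Eric describes,
again assuming pyzmq; ExpiringConnectionPool and its ttl parameter are
invented names for illustration, not the driver's real code.

import time

import zmq


class ExpiringConnectionPool(object):
    """Reuse one outbound PUSH socket per target, closing idle ones."""

    def __init__(self, ttl=60.0):
        self.ttl = ttl
        self.ctx = zmq.Context.instance()
        self._conns = {}  # address -> (socket, last_used)

    def send(self, address, frames):
        self._expire()
        sock = self._conns.get(address, (None, None))[0]
        if sock is None:
            sock = self.ctx.socket(zmq.PUSH)
            sock.connect(address)
        sock.send_multipart(frames)
        self._conns[address] = (sock, time.time())

    def _expire(self):
        # Close sockets idle longer than ttl so fds do not pile up.
        now = time.time()
        for address, (sock, last_used) in list(self._conns.items()):
            if now - last_used > self.ttl:
                sock.close()
                del self._conns[address]

A busy target such as the scheduler topic would effectively stay connected,
while a rarely-contacted compute node's socket would be closed after ttl
seconds, which lines up with the persistent api->scheduler versus ephemeral
scheduler->compute_node split mentioned above.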
>> 2) PUSH/PULL tcp sockets - Pieter suggested we look at ROUTER/DEALER
>> as an option once 1) is resolved - this socket type pairing has some
>> interesting features which would help with resilience and availability,
>> including heartbeating.
>
> Using PUSH/PULL does not eliminate the possibility of being fully
> connected, nor is it incompatible with persistent connections. If you're
> not going to be fully connected, there isn't much advantage to long-lived
> persistent connections, and without those persistent connections you're
> not benefiting from features such as heartbeating.
>
> How about REQ/REP? I think it is appropriate for long-lived persistent
> connections and also provides reliability because of the reply.
>
> I'm not saying ROUTER/DEALER cannot be used, but use them with care.
> They're designed for long-lived channels between hosts and not for the
> ephemeral-type connections used in a peer-to-peer system. Dealing with how
> to manage timeouts on the client and the server, and the swelling number
> of active file descriptors you'll get by using ROUTER/DEALER, is not
> trivial, assuming you can get past the management of all of those
> synchronous sockets (hidden away by tons of eventlet greenthreads)...
>
> Extra anecdote: During a conversation at the OpenStack summit, someone
> told me about their experiences using ZeroMQ and the pain of using REQ/REP
> sockets, and how they felt it was a mistake to have used them. We discussed
> some other problems, such as the fact that it's impossible to avoid TCP
> fragmentation unless you force all frames to 552 bytes or have a
> well-managed network where you know the MTUs of all the devices you'll
> pass through. Suggestions were made to make ZeroMQ better, until we
> realized we had just described TCP-over-ZeroMQ-over-TCP, finished our
> beers, and quickly changed topics.
>
> Well, it seems I need to take my last question back. In our deployment, I
> always take advantage of jumbo frames to increase throughput. You said
> that REQ/REP would introduce TCP fragmentation unless zeromq frames ==
> 552 bytes? Could you please elaborate?
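
On the REQ/REP question above: much of the pain Eric alludes to is that a
REQ socket which misses its reply is stuck (it cannot send again) until it
is thrown away, so every call needs a timeout and a reconnect path. A rough
sketch of that pattern, illustrative only and assuming pyzmq:

import zmq

REQUEST_TIMEOUT_MS = 2500


def call(endpoint, frames):
    ctx = zmq.Context.instance()
    sock = ctx.socket(zmq.REQ)
    sock.connect(endpoint)
    sock.send_multipart(frames)

    poller = zmq.Poller()
    poller.register(sock, zmq.POLLIN)
    if poller.poll(REQUEST_TIMEOUT_MS):
        reply = sock.recv_multipart()
    else:
        reply = None  # timed out; the caller must retry on a *new* socket

    # Close without lingering so a dead peer does not pin the fd.
    sock.setsockopt(zmq.LINGER, 0)
    sock.close()
    return reply

Multiply this by every RPC call in a large deployment and the bookkeeping
around timeouts, retries, and file descriptors is the non-trivial part,
whichever of REQ/REP or ROUTER/DEALER is chosen.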
_______________________________________________
OpenStack-dev mailing list
[email protected]
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
