Re: [openstack-dev] Zero MQ remove central broker. Architecture change.

Li Ma Tue, 18 Nov 2014 18:37:48 -0800


On 2014/11/19 1:49, Eric Windisch wrote:

    I think for this cycle we really do need to focus on consolidating and
    testing the existing driver design and fixing up the biggest
    deficiency (1) before we consider moving forward with lots of new


+1

    1) Outbound messaging connection re-use - right now every outbound
    messaging creates and consumes a tcp connection - this approach scales
    badly when neutron does large fanout casts.
I'm glad you are looking at this and by doing so, will understand the
system better. I hope the following will give some insight into, at
least, why I made the decisions I made:

This was an intentional design trade-off. I saw three choices here:
build a fully decentralized solution, build a fully-connected network,
or use centralized brokerage. I wrote off centralized brokerage
immediately. The problem with a fully connected system is that active
TCP connections are required between all of the nodes. I didn't think
that would scale and would be brittle against floods (intentional or
otherwise).

IMHO, I always felt the right solution for large fanout casts was to
use multicast. When the driver was written, Neutron didn't exist and
there was no use-case for large fanout casts, so I didn't implement
multicast, but knew it as an option if it became necessary. It isn't
the right solution for everyone, of course.

Using multicast adds complexity to the switch forwarding plane, which
has to enable and maintain multicast group communication. For large
deployments, I prefer to keep forwarding simple and easy to maintain.
IMO, running a set of fanout-router processes in the cluster can also
achieve the goal. The data path is:

openstack-daemon --(send the message with fanout=true)--> fanout-router
    --(read the matchmaker)--> send to the destinations

Actually it just uses unicast to simulate multicast.
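
A minimal sketch of that data path, to make the idea concrete. It is
not the driver's code: the matchmaker is modelled as a plain dict from
topic to hosts, and fanout_route and the send callable are hypothetical
names standing in for the real lookup and the real socket send.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for the matchmaker: topic -> hosts listening on it.
Matchmaker = Dict[str, List[str]]


def fanout_route(
    matchmaker: Matchmaker,
    topic: str,
    message: bytes,
    send: Callable[[str, bytes], None],
) -> List[str]:
    """Deliver one fanout cast as a series of unicast sends.

    Returns the hosts the message was sent to, in delivery order.
    """
    delivered = []
    for host in matchmaker.get(topic, []):
        send(host, message)  # one unicast send per destination
        delivered.append(host)
    return delivered
```

The daemon then only needs one connection to a fanout-router, and the
router pays the cost of the N unicast sends.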

For connection reuse, you could manage a pool of connections and keep
those connections around for a configurable amount of time, after
which they'd expire and be re-opened. This would keep the most
actively used connections alive. One problem is that it would make the
service more brittle by making it far more susceptible to running out
of file descriptors by keeping connections around significantly
longer. However, this wouldn't be as brittle as fully-connecting the
nodes nor as poorly scalable.

+1. Allowing a large number of fds is not a problem. Because we use a
socket pool, we can control the number of fds and keep it fixed.
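
A rough sketch of such a pool, under assumptions: connections expire
after an idle timeout, and the pool never holds more than max_size of
them, so the fd count stays bounded. ConnectionPool, connect, max_size,
ttl and clock are illustrative names, not part of oslo.messaging, and
connect can be anything that returns an object with a close() method.

```python
import time
from collections import OrderedDict


class ConnectionPool:
    """Bounded pool of outbound connections keyed by destination address."""

    def __init__(self, connect, max_size=64, ttl=30.0, clock=time.monotonic):
        if max_size < 1:
            raise ValueError("max_size must be at least 1")
        self._connect = connect    # hypothetical factory: address -> connection
        self._max_size = max_size  # upper bound on open connections (fds)
        self._ttl = ttl            # idle seconds before a connection expires
        self._clock = clock
        self._conns = OrderedDict()  # address -> (connection, last_used)

    def get(self, address):
        """Return a live connection to address, reusing one if possible."""
        now = self._clock()
        self._expire(now)
        entry = self._conns.pop(address, None)
        if entry is None:
            # Make room by closing the least recently used connection.
            while len(self._conns) >= self._max_size:
                _, (oldest, _) = self._conns.popitem(last=False)
                oldest.close()
            conn = self._connect(address)
        else:
            conn = entry[0]
        # Re-inserting moves the address to the most recently used end.
        self._conns[address] = (conn, now)
        return conn

    def _expire(self, now):
        """Close connections that have been idle longer than the TTL."""
        stale = [a for a, (_, used) in self._conns.items()
                 if now - used > self._ttl]
        for address in stale:
            conn, _ = self._conns.pop(address)
            conn.close()

    def __len__(self):
        return len(self._conns)
```

The most actively used destinations keep their connections, and idle
ones are closed either by the timeout or when the pool is full.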

If OpenStack and oslo.messaging were designed specifically around this
message pattern, I might suggest that the library and its applications
be aware of high-traffic topics and persist the connections for those
topics, while keeping others ephemeral. A good example for Nova:
api->scheduler traffic would be persistent, whereas
scheduler->compute_node would be ephemeral. Perhaps this is something
that could still be added to the library.
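
One possible shape for that awareness, as a sketch only: count sends
per topic and treat a topic as persistent once it crosses a threshold.
TopicPolicy and persist_after are made-up names and the threshold is
arbitrary; nothing like this exists in the library today.

```python
from collections import Counter


class TopicPolicy:
    """Decide per topic whether its connection should be kept open."""

    def __init__(self, persist_after=100):
        self._persist_after = persist_after  # hypothetical send-count threshold
        self._sent = Counter()

    def record_send(self, topic):
        """Note that one message was sent on this topic."""
        self._sent[topic] += 1

    def is_persistent(self, topic):
        """True once the topic has seen enough traffic to keep its connection."""
        return self._sent[topic] >= self._persist_after
```

A driver could consult is_persistent() after each send to choose
between returning the connection to a pool and closing it.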
    2) PUSH/PULL tcp sockets - Pieter suggested we look at ROUTER/DEALER
    as an option once 1) is resolved - this socket type pairing has some
    interesting features which would help with resilience and availability
    including heartbeating.
Using PUSH/PULL does not eliminate the possibility of being fully
connected, nor is it incompatible with persistent connections. If
you're not going to be fully-connected, there isn't much advantage to
long-lived persistent connections and without those persistent
connections, you're not benefitting from features such as heartbeating.

How about REQ/REP? I think it is appropriate for long-lived persistent
connections and also provides reliability thanks to the reply.

I'm not saying ROUTER/DEALER cannot be used, but use them with care.
They're designed for long-lived channels between hosts and not for the
ephemeral-type connections used in a peer-to-peer system. Dealing with
how to manage timeouts on the client and the server and the swelling
number of active file descriptors that you'll get by using
ROUTER/DEALER is not trivial, assuming you can get past the management
of all of those synchronous sockets (hidden away by tons of eventlet
greenthreads)...

Extra anecdote: During a conversation at the OpenStack summit, someone
told me about their experiences using ZeroMQ and the pain of using
REQ/REP sockets and how they felt it was a mistake they used them. We
discussed a bit about some other problems such as the fact it's
impossible to avoid TCP fragmentation unless you force all frames to
552 bytes or have a well-managed network where you know the MTUs of
all the devices you'll pass through. Suggestions were made to make
ZeroMQ better, until we realized we had just described
TCP-over-ZeroMQ-over-TCP, finished our beers, and quickly changed topics.

Well, it seems I need to take my last question back. In our deployment,
I always take advantage of jumbo frames to increase throughput. You said
that REQ/REP would introduce TCP fragmentation unless zeromq frames ==
552 bytes? Could you please elaborate?



_______________________________________________
OpenStack-dev mailing list
[email protected]
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev