Hi Yatin,

Thanks for sharing your presentation. It looks great. You're welcome to contribute to the ZeroMQ driver.

Cheers,
Li Ma

On 2014/11/19 12:50, yatin kumbhare wrote:
Hello Folks,

Here are a couple of slides/diagrams I put together for my own understanding back in the Havana release; see slide 10 onward in particular.

https://docs.google.com/presentation/d/1ZPWKXN7dzXs9bX3Ref9fPDiia912zsHCHNMh_VSMhJs/edit#slide=id.p

I am also committed to using ZeroMQ, as it's lightweight, fast, and scalable.

I would like to chip in on further ZeroMQ development.

Regards,
Yatin

On Wed, Nov 19, 2014 at 8:05 AM, Li Ma <[email protected]> wrote:


    On 2014/11/19 1:49, Eric Windisch wrote:

        I think for this cycle we really do need to focus on
        consolidating and testing the existing driver design and
        fixing up the biggest deficiency (1) before we consider moving
        forward with lots of new features.

    +1

        1) Outbound messaging connection re-use - right now every
        outbound message creates and consumes a TCP connection - this
        approach scales badly when Neutron does large fanout casts.



    I'm glad you are looking at this and by doing so, will understand
    the system better. I hope the following will give some insight
    into, at least, why I made the decisions I made:
    This was an intentional design trade-off. I saw three choices
    here: build a fully decentralized solution, build a
    fully-connected network, or use centralized brokerage. I wrote
    off centralized brokerage immediately. The problem with a fully
    connected system is that active TCP connections are required
    between all of the nodes. I didn't think that would scale, and it
    would be brittle against floods (intentional or otherwise).

    IMHO, I always felt the right solution for large fanout casts was
    to use multicast. When the driver was written, Neutron didn't
    exist and there was no use-case for large fanout casts, so I
    didn't implement multicast, but knew it as an option if it became
    necessary. It isn't the right solution for everyone, of course.
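
    For concreteness, a minimal pyzmq sketch of multicast fanout over
    PGM (this assumes libzmq was built with OpenPGM support; the
    interface name, group address, port, and topic are illustrative):

        import zmq

        ctx = zmq.Context.instance()

        # Sender: one PUB socket reaches every subscriber in the group.
        pub = ctx.socket(zmq.PUB)
        pub.connect("epgm://eth0;239.192.0.1:5555")  # iface;group:port
        pub.send_multipart([b"compute", b"fanout payload"])

        # Receiver, on each node (with multicast, both ends connect):
        sub = ctx.socket(zmq.SUB)
        sub.connect("epgm://eth0;239.192.0.1:5555")
        sub.setsockopt(zmq.SUBSCRIBE, b"compute")  # topic prefix filter
        topic, payload = sub.recv_multipart()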

    Using multicast adds complexity to the switch forwarding plane,
    which has to enable and maintain multicast group communication.
    For large deployments, I prefer to keep forwarding simple and easy
    to maintain. IMO, running a set of fanout-router processes in the
    cluster can also achieve the goal.
    The data path is: openstack-daemon --send the message (with
    fanout=true)--> fanout-router --read the matchmaker--> send to the
    destinations. Actually, it just uses unicast to simulate multicast.
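
    A rough sketch of such a fanout-router (pyzmq; the matchmaker is
    reduced here to a plain dict mapping topic -> peer endpoints, a
    hypothetical stand-in for the real matchmaker):

        import zmq

        # Hypothetical stand-in for the matchmaker lookup.
        MATCHMAKER = {"compute": ["tcp://10.0.0.11:9501",
                                  "tcp://10.0.0.12:9501"]}

        def fanout_router(bind_addr="tcp://*:9500"):
            ctx = zmq.Context.instance()
            inbound = ctx.socket(zmq.PULL)  # daemons PUSH casts here
            inbound.bind(bind_addr)
            while True:
                topic, payload = inbound.recv_multipart()
                # Unicast a copy to every peer registered for the topic.
                for endpoint in MATCHMAKER.get(topic.decode(), []):
                    out = ctx.socket(zmq.PUSH)
                    out.connect(endpoint)
                    out.send_multipart([topic, payload])
                    out.close()
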
    For connection reuse, you could manage a pool of connections and
    keep those connections around for a configurable amount of time,
    after which they'd expire and be re-opened. This would keep the
    most actively used connections alive. One problem is that it
    would make the service more brittle by making it far more
    susceptible to running out of file descriptors by keeping
    connections around significantly longer. However, this wouldn't be
    as brittle as fully connecting the nodes, nor would it scale as
    poorly.
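
    A minimal sketch of that pool idea (pyzmq; the TTL value and the
    choice of PUSH sockets are illustrative assumptions):

        import time
        import zmq

        class SocketPool(object):
            """Cache outbound sockets per endpoint; expire idle ones."""

            def __init__(self, context, ttl=60):
                self.context = context
                self.ttl = ttl
                self.pool = {}  # endpoint -> (socket, last_used)

            def get(self, endpoint):
                self.expire()
                entry = self.pool.get(endpoint)
                sock = entry[0] if entry else None
                if sock is None:
                    sock = self.context.socket(zmq.PUSH)
                    sock.setsockopt(zmq.LINGER, 0)
                    sock.connect(endpoint)
                self.pool[endpoint] = (sock, time.time())
                return sock

            def expire(self):
                now = time.time()
                for endpoint, (sock, last_used) in list(self.pool.items()):
                    if now - last_used > self.ttl:
                        sock.close()
                        del self.pool[endpoint]

    Usage would be pool.get("tcp://10.0.0.5:9501").send(b"msg"); the
    most actively used connections never expire.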

    +1. Setting a large fd limit is not a problem. Because we use a
    socket pool, we can control and keep a fixed number of fds.
    If OpenStack and oslo.messaging were designed specifically around
    this message pattern, I might suggest that the library and its
    applications be aware of high-traffic topics and persist the
    connections for those topics, while keeping others ephemeral. A
    good example for Nova: api->scheduler traffic would be persistent,
    whereas scheduler->compute_node would be ephemeral. Perhaps this
    is something that could still be added to the library.
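
    Layered on a pool like the one sketched above, such a policy could
    be as simple as this (topic names hypothetical):

        import zmq

        # Hypothetical policy: keep sockets hot for high-traffic topics.
        PERSISTENT_TOPICS = {"scheduler"}  # e.g. api -> scheduler

        def cast(ctx, pool, topic, endpoint, payload):
            if topic in PERSISTENT_TOPICS:
                pool.get(endpoint).send_multipart([topic.encode(), payload])
            else:
                sock = ctx.socket(zmq.PUSH)  # ephemeral: one cast, one conn
                sock.connect(endpoint)
                sock.send_multipart([topic.encode(), payload])
                sock.close()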

        2) PUSH/PULL tcp sockets - Pieter suggested we look at
        ROUTER/DEALER as an option once 1) is resolved - this socket
        type pairing has some interesting features which would help
        with resilience and availability, including heartbeating.

    Using PUSH/PULL does not eliminate the possibility of being fully
    connected, nor is it incompatible with persistent connections. If
    you're not going to be fully-connected, there isn't much
    advantage to long-lived persistent connections and without those
    persistent connections, you're not benefiting from features such
    as heartbeating.
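
    For illustration, heartbeating over a long-lived DEALER/ROUTER
    pair might look like this (pyzmq; the interval and liveness
    threshold are arbitrary, and the ROUTER peer is assumed to echo
    each PING back as PONG):

        import time
        import zmq

        def monitor(endpoint, interval=5.0):
            ctx = zmq.Context.instance()
            sock = ctx.socket(zmq.DEALER)
            sock.setsockopt(zmq.LINGER, 0)
            sock.connect(endpoint)
            last_pong = time.time()
            while True:
                sock.send(b"PING")
                # Wait up to one interval for the peer's PONG.
                if sock.poll(int(interval * 1000), zmq.POLLIN):
                    if sock.recv() == b"PONG":
                        last_pong = time.time()
                if time.time() - last_pong > 3 * interval:
                    return False  # peer presumed dead; caller reconnects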

    How about REQ/REP? I think it is appropriate for long-lived
    persistent connections, and the mandatory reply also provides some
    reliability.
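
    One catch with REQ is its strict send/recv lockstep: if a reply is
    lost, the socket is stuck. The usual workaround (the "Lazy Pirate"
    pattern from the zguide) is to poll with a timeout and rebuild the
    socket on each retry; a sketch:

        import zmq

        def request(endpoint, payload, timeout_ms=2500, retries=3):
            ctx = zmq.Context.instance()
            for _ in range(retries):
                req = ctx.socket(zmq.REQ)
                req.setsockopt(zmq.LINGER, 0)
                req.connect(endpoint)
                req.send(payload)
                if req.poll(timeout_ms, zmq.POLLIN):
                    reply = req.recv()
                    req.close()
                    return reply
                req.close()  # abandon the stuck REQ socket and retry
            raise RuntimeError("no reply from %s" % endpoint)
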
    I'm not saying ROUTER/DEALER cannot be used, but use them with
    care. They're designed for long-lived channels between hosts and
    not for the ephemeral-type connections used in a peer-to-peer
    system. Dealing with how to manage timeouts on the client and the
    server and the swelling number of active file descriptors that
    you'll get by using ROUTER/DEALER is not trivial, assuming you
    can get past the management of all of those synchronous sockets
    (hidden away by tons of eventlet greenthreads)...

    Extra anecdote: During a conversation at the OpenStack summit,
    someone told me about their experiences using ZeroMQ and the pain
    of using REQ/REP sockets, and how they felt it was a mistake to
    have used them. We discussed some other problems, such as
    the fact it's impossible to avoid TCP fragmentation unless you
    force all frames to 552 bytes or have a well-managed network
    where you know the MTUs of all the devices you'll pass through.
    Suggestions were made to make ZeroMQ better, until we realized we
    had just described TCP-over-ZeroMQ-over-TCP, finished our beers,
    and quickly changed topics.

    Well, it seems I need to take my last question back. In our
    deployment, I always take advantage of jumbo frames to increase
    throughput. You said that REQ/REP would introduce TCP
    fragmentation unless ZeroMQ frames == 552 bytes? Could you please
    elaborate?

