Hi everyone!

Here is a status update on zmq driver upstream development in M and our plans
for N.

What was done:

1. Dedicated socket patterns are finally implemented: DEALER/ROUTER for CALL,
PUSH/PULL for CAST, PUB/SUB for fanout (a small sketch follows this list).
        Previously everything went over DEALER/ROUTER, which was not optimal.
2. Implemented support for Sentinel clustering in the Redis matchmaker (thanks
to Alexey Yelistratov).
3. Smarter (retry-based) conversation between Redis and the services (a Redis
sketch also follows below):
        * dynamic updates
        * record TTLs
4. Transport URL is finally supported.
5. Added a full Tempest gate with Neutron for zmq (thanks to Dmitry Ukhlov).
6. Performed successful multi-node deployment testing (thanks to Alexey
Yelistratov):
        * multi-node devstack
        * Rally nova-boot on 200 nodes + Fuel deployment
7. Performed benchmark testing with the simulator (oslo.messaging/tools/simulator.py)
on a 20-node deployment (thanks to Yulia Portnova):
        * CALL: ~29k msg/sec, compared to ~2k msg/sec for a RabbitMQ cluster
8. Finally got rid of the IPC proxy, which could cause problems in container-based
deployments like Kolla.
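
To make item 1 more concrete, here is a minimal, self-contained pyzmq sketch of
the three patterns. This is not the driver's actual code; the inproc endpoints,
payloads and the small sleep are illustrative only.

    import time
    import zmq

    ctx = zmq.Context.instance()

    # DEALER/ROUTER: request/reply, as used for CALL.
    router = ctx.socket(zmq.ROUTER)
    router.bind("inproc://call")
    dealer = ctx.socket(zmq.DEALER)
    dealer.connect("inproc://call")
    dealer.send(b"call-request")
    identity, request = router.recv_multipart()   # ROUTER sees [identity, payload]
    router.send_multipart([identity, b"call-reply"])
    print(dealer.recv())                           # b'call-reply'

    # PUSH/PULL: one-way delivery, as used for CAST.
    pull = ctx.socket(zmq.PULL)
    pull.bind("inproc://cast")
    push = ctx.socket(zmq.PUSH)
    push.connect("inproc://cast")
    push.send(b"cast-message")
    print(pull.recv())

    # PUB/SUB: fanout to every subscriber of a topic.
    pub = ctx.socket(zmq.PUB)
    pub.bind("inproc://fanout")
    sub = ctx.socket(zmq.SUB)
    sub.connect("inproc://fanout")
    sub.setsockopt(zmq.SUBSCRIBE, b"topic")
    time.sleep(0.1)                                # let the subscription propagate
    pub.send(b"topic fanout-message")
    print(sub.recv())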

And many other smaller bug-fixes.
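
And for item 3, the retry/TTL idea boils down to something like this redis-py
sketch; the key layout, host value and TTL here are purely illustrative and are
not the matchmaker's actual schema.

    import redis

    r = redis.StrictRedis(host="127.0.0.1", port=6379)

    topic_key = "compute"                 # hypothetical per-topic record
    r.sadd(topic_key, "node-1:9501")      # dynamic update: (re)register the host
    r.expire(topic_key, 180)              # record TTL: stale hosts age out
    print(r.smembers(topic_key))

    # A service would refresh its record periodically; readers can retry the
    # lookup a few times if the record is temporarily missing.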


So, we have gotten closer to using zmq in real environments, but more work is
still needed to make this happen.
Here is the list of known issues we received as feedback from testing, plus
other things we would like to fix in the driver.
(For the whole list of known bugs, please follow link [1].)

The most important issues to fix in N:

1. The ZMQ driver eats too many TCP sockets [2].
        We hit this problem with the current architecture of direct client-to-server
        connections. The solution is to use stateless, transparent remote proxies to
        reduce the number of connections (a proxy sketch follows this list); the work
        is in progress [3].

2. Implement retries for unacknowledged messages, plus heartbeats [4], [5], [6],
        in order to have reliable messaging in the face of bad networks and proxy
        failures (a retry sketch follows further below).

3. Fix interaction with the name service and make proper updates on both sides
        (HA-related) [7], so that restarted services reconnect properly.

4. Get Ceilometer working with the driver [8].

5. Support the PGM protocol for multicast as an option [9].

6. Support encryption for messages (libsodium etc.)
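
For issue 1 above, the proxy idea in a nutshell: with direct connections every
client holds a socket to every server (roughly N*M sockets), while with a
stateless forwarding point each peer only connects to the proxy (N+M). A
minimal pyzmq sketch, assuming illustrative ports; this is not the actual
patch in [3].

    import zmq

    ctx = zmq.Context.instance()

    frontend = ctx.socket(zmq.ROUTER)     # clients connect here
    frontend.bind("tcp://*:5570")

    backend = ctx.socket(zmq.DEALER)      # RPC servers connect here
    backend.bind("tcp://*:5571")

    # zmq.proxy() blocks and shuttles messages between the two sockets; it keeps
    # no state of its own, so several such proxies can be run side by side.
    zmq.proxy(frontend, backend)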

All other issues can be found at link [1].
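
And for issue 2, the retry idea roughly follows the "lazy pirate" pattern from
the 0MQ guide. This is only a sketch of that pattern, not the driver's
implementation; the endpoint, timeout and retry count are made-up values.

    import zmq

    ENDPOINT = "tcp://127.0.0.1:5555"     # illustrative server address
    TIMEOUT_MS = 2500                     # how long to wait for an ack
    RETRIES = 3                           # how many times to resend

    def call_with_retries(ctx, payload):
        """Send payload and wait for an ack, resending on timeout."""
        for attempt in range(RETRIES):
            sock = ctx.socket(zmq.REQ)
            sock.setsockopt(zmq.LINGER, 0)   # do not block on close
            sock.connect(ENDPOINT)
            sock.send(payload)
            if sock.poll(TIMEOUT_MS, zmq.POLLIN):
                reply = sock.recv()          # acknowledged
                sock.close()
                return reply
            sock.close()                     # no ack: drop the socket and retry
        raise RuntimeError("no acknowledgement after %d attempts" % RETRIES)

    # Usage (needs a REP server listening on ENDPOINT):
    # print(call_with_retries(zmq.Context.instance(), b"ping"))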


What kinds of testing are planned:

1. HA testing:
        * restarting/adding/removing nodes; testing reconnects, proper recovery
          of the messaging layer, send retries, etc.
        * bad-network emulation, also to test the correctness of send retries

2. Benchmark testing:
        * increase the load and the number of nodes
        * test different kinds of deployment configurations (different numbers
          of proxies, direct connections)

3. Try Rally with at least 500 nodes.

Many thanks to the Oslo and Performance teams for their help with testing and reviews.

Thanks,
Oleksii

Links:
1 - https://bugs.launchpad.net/oslo.messaging/+bugs?field.tag=zmq
2 - https://bugs.launchpad.net/oslo.messaging/+bug/1555007
3 - https://review.openstack.org/#/c/287094/
4 - https://bugs.launchpad.net/oslo.messaging/+bug/1497306
5 - https://bugs.launchpad.net/oslo.messaging/+bug/1503295
6 - https://bugs.launchpad.net/oslo.messaging/+bug/1497302
7 - https://bugs.launchpad.net/oslo.messaging/+bug/1548836
8 - https://bugs.launchpad.net/oslo.messaging/+bug/1539047
9 - https://bugs.launchpad.net/oslo.messaging/+bug/1524100