Re: [openstack-dev] [Zaqar] Comments on the concerns arose during the TC meeting

Zane Bitter Thu, 11 Sep 2014 15:15:08 -0700

On 09/09/14 15:03, Monty Taylor wrote:

On 09/04/2014 01:30 AM, Clint Byrum wrote:

Excerpts from Flavio Percoco's message of 2014-09-04 00:08:47 -0700:

Greetings,


Last Tuesday the TC held the first graduation review for Zaqar. During
the meeting some concerns arose. I've listed those concerns below with
some comments hoping that it will help starting a discussion before the
next meeting. In addition, I've added some comments about the project
stability at the bottom and an etherpad link pointing to a list of use
cases for Zaqar.


Hi Flavio. This was an interesting read. As somebody whose attention has
recently been drawn to Zaqar, I am quite interested in seeing it
graduate.

# Concerns

- Concern on operational burden of requiring NoSQL deploy expertise to
the mix of openstack operational skills

For those of you not familiar with Zaqar, it currently supports 2 nosql
drivers - MongoDB and Redis - and those are the only 2 drivers it
supports for now. This will require operators willing to use Zaqar to
maintain a new (?) NoSQL technology in their system. Before expressing
our thoughts on this matter, let me say that:

     1. By removing the SQLAlchemy driver, we basically removed the
chance for operators to use an already deployed "OpenStack-technology"
     2. Zaqar won't be backed by any AMQP based messaging technology for
now. Here's[0] a summary of the research the team (mostly done by
Victoria) did during Juno
     3. We (OpenStack) used to require Redis for the zmq matchmaker
     4. We (OpenStack) also use memcached for caching and as the oslo
caching lib becomes available - or a wrapper on top of dogpile.cache -
Redis may be used in place of memcached in more and more deployments.
     5. Ceilometer's recommended storage driver is still MongoDB,
although Ceilometer now has support for sqlalchemy. (Please correct me
if I'm wrong).

That being said, it's obvious we already, to some extent, promote some
NoSQL technologies. However, for the sake of the discussion, let's assume
we don't.

I truly believe, with my OpenStack (not Zaqar's) hat on, that we can't
keep avoiding these technologies. NoSQL technologies have been around
for years and we should be prepared - including OpenStack operators - to
support these technologies. Not every tool is good for all tasks - one
of the reasons we removed the sqlalchemy driver in the first place -
therefore it's impossible to keep a homogeneous environment for all
services.


I wholeheartedly agree that non-traditional storage technologies that
are becoming mainstream are good candidates for use cases where SQL
based storage gets in the way. I wish there wasn't so much FUD
(warranted or not) about MongoDB, but that is the reality we live in.

With this, I'm not suggesting to ignore the risks and the extra burden
this adds but, instead of attempting to avoid it completely by not
evolving the stack of services we provide, we should probably work on
defining a reasonable subset of NoSQL services we are OK with
supporting. This will help make the burden smaller and it'll give
operators the option to choose.

[0] http://blog.flaper87.com/post/marconi-amqp-see-you-later/


- Concern on whether we should really reinvent a queue system rather
than piggyback on one

As mentioned in the meeting on Tuesday, Zaqar is not reinventing message
brokers. Zaqar provides a service akin to SQS from AWS with an OpenStack
flavor on top. [0]


I think Zaqar is more like SMTP and IMAP than AMQP. You're not really
trying to connect two processes in real time. You're trying to do fully
asynchronous messaging with fully randomized access to any message.

Perhaps somebody should explore whether the approaches taken by large
scale IMAP providers could be applied to Zaqar.

Anyway, I can't imagine writing a system to intentionally use the
semantics of IMAP and SMTP. I'd be very interested in seeing actual use
cases for it, apologies if those have been posted before.


It seems like you're EITHER describing something called XMPP that has at
least one open source scalable backend called ejabberd. OR, you've
actually hit the nail on the head with bringing up SMTP and IMAP but for
some reason that feels strange.

SMTP and IMAP already implement every feature you've described, as well
as retries/failover/HA and a fully end to end secure transport (if
installed properly). If you don't actually set them up to run as a public
messaging interface but just as a cloud-local exchange, then you could
get by with very low overhead for a massive throughput - it can very
easily be run on a single machine for Sean's simplicity, and could just
as easily be scaled out using well known techniques for public cloud
sized deployments?

So why not use existing daemons that do this? You could still use the
REST API you've got, but instead of writing it to a mongo backend and
trying to implement all of the things that already exist in SMTP/IMAP -
you could just have them front to it. You could even bypass normal
delivery mechanisms and do neat things with local injection.

I don't care about the NoSQL question on its own. Mongo is fine. Redis
is fine. I don't think either has any features for this use case that
make a lick's worth of difference compared to MySQL or Postgres, but I
also don't think they are a PROBLEM in and of themselves.

The main thing I care about here is every description I've heard of what
zaqar wants to do (which does seem to be getting clearer through this
thread) is still well implemented somewhere as an existing scalable
service. Is zaqar actually Rabbit with a REST interface? Is it ejabberd
with a REST interface? Or is it IMAP/SMTP with a REST interface? You'll
note that probably nobody would think a single server that wanted to be
both Rabbit AND IMAP/SMTP is a good idea ... at least this is one of the
reasons why we all think Microsoft Exchange is a pile of garbage, no?

I was intrigued by the idea of an ejabberd backend to Zaqar, so I spent
half a morning yesterday investigating it. (tl;dr - it won't work.)

XMPP does have a sort-of standard for queueing messages when a client is
offline[1], and ejabberd does support it[2]. Amusingly, it does so by
storing the queue in an RDBMS (the very thing that the TC has repeatedly
called an 'anti-pattern'). Unfortunately, ejabberd does _not_ support[2]
the extension that would allow the Zaqar API to request messages one at
a time (in arbitrary order, though that's not important here) out of the
queue[3], so if I understand your proposal correctly every time the API
polled ejabberd it could potentially receive a flood of messages that it
would then have to reliably buffer itself (i.e. duplicating all the work
that ejabberd was supposed to eliminate). In fact, XMPP is not designed
to be reliable at all. There is an XMPP extension that could potentially
offer reliable delivery via acks[4], although it's not entirely clear to
me if that requires the participation of the client (i.e. effectively
becomes synchronous messaging).

So, in summary, not a good fit because it doesn't match the #1
requirement, which is to never lose messages while remaining
asynchronous.


I can't figure out if the suggestion to use dovecot was actually serious.

[1] http://www.xmpp.org/extensions/xep-0160.html
[2] http://www.ejabberd.im/protocols
[3] http://xmpp.org/extensions/xep-0013.html
[4] http://xmpp.org/extensions/xep-0079.html

In any event, I think it's probably unhelpful to come at this from the
angle of "which orange is the best one to compare this to, and please
don't even talk to me about other apples". The thing Zaqar is most
directly comparable to is not email or XMPP, it's SQS.

SQS offers a guarantee of delivering each message *at least* once.[5] It
is optimised for durability rather than latency. It also tries to
minimise multiple deliveries in the case where multiple clients are
polling the same queue (e.g. a work queue).
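
As a rough illustration of the at-least-once behaviour described above,
here is a toy in-memory sketch in Python. It is not how SQS or Zaqar is
implemented; the class name, the claim_ttl parameter and the explicit
"now" argument are all made up for the example. A claimed message is
hidden from other pollers until the claim expires, and it is only
deleted when a worker acknowledges it, so a crashed worker causes a
redelivery rather than a lost message.

```python
import itertools


class AtLeastOnceQueue:
    """Toy in-memory queue (hypothetical): claimed messages are hidden until the claim expires."""

    def __init__(self, claim_ttl):
        self.claim_ttl = claim_ttl
        self._ids = itertools.count(1)
        self._messages = {}  # id -> [body, time at which it becomes visible]

    def post(self, body):
        msg_id = next(self._ids)
        self._messages[msg_id] = [body, 0.0]
        return msg_id

    def claim(self, now):
        """Return (id, body) of the oldest visible message, or None."""
        for msg_id, entry in self._messages.items():
            if entry[1] <= now:
                entry[1] = now + self.claim_ttl  # hide from other workers
                return msg_id, entry[0]
        return None

    def ack(self, msg_id):
        """Delete a message once a worker has finished processing it."""
        self._messages.pop(msg_id, None)


queue = AtLeastOnceQueue(claim_ttl=30)
queue.post("snapshot finished")

first = queue.claim(now=0)     # worker A claims the message, then crashes
hidden = queue.claim(now=10)   # worker B polls during the claim: nothing visible
again = queue.claim(now=31)    # claim expired, so the message is redelivered
queue.ack(again[0])            # worker B acknowledges; the message is deleted
empty = queue.claim(now=100)   # nothing left to deliver
```

The same message is handed out twice here, which is exactly what "at
least once" permits; the claim window is what keeps duplicate deliveries
rare when several workers poll the same queue.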

Zaqar offers somewhat more complicated semantics[6]. I think we should
discuss those semantics and agree on which are essential and which
dispensable, rather than trying to compare it to things like IMAP. Once
we have agreement on what the semantics should be, then we can sensibly
discuss which back ends are capable of satisfying them.

[5] https://en.wikipedia.org/wiki/Amazon_Simple_Queue_Service#Message_delivery
[6] https://wiki.openstack.org/wiki/Zaqar/Frequently_asked_questions#What_messaging_patterns_does_Zaqar_support.3F

Zaqar obviously supports point-to-point queues, with one producer and
one consumer. I assume it also supports many:1 and anycast 1:many &
many:many queues - it could hardly fail to do so, since it doesn't
actually know who the producers and consumers are. Hopefully it takes
steps to ensure that multiple workers rarely receive the same message
before it is acknowledged.

However, Zaqar also supports the Pub-Sub model of messaging. I believe,
but would like Flavio to confirm, that this is what is meant when the
Zaqar team say that Zaqar is about messaging in general and not just
queuing. That is to say, it is possible for multiple consumers to
intentionally consume the same message, with each maintaining its own
pointer in the queue. (Another way to think of this is that messages can
be multicast to multiple virtual queues, with data de-duplication
between them.) To a relative novice in the field like me, the difference
between this and queuing sounds pretty academic :P. Call it what you
will, it seems like a reasonable thing to implement to me.
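
The "each consumer maintains its own pointer" idea can be sketched in a
few lines of Python. This is only an illustration under assumed names
(MessageLog, publish, poll and the subscriber labels are hypothetical),
not Zaqar's API or storage layout: messages are stored once, and every
subscriber advances a private cursor over the same list.

```python
class MessageLog:
    """Toy append-only log (hypothetical): every subscriber keeps its own position."""

    def __init__(self):
        self._messages = []   # each message is stored exactly once
        self._cursors = {}    # subscriber name -> index of next unread message

    def publish(self, body):
        self._messages.append(body)

    def poll(self, subscriber, limit=10):
        """Return up to `limit` unread messages and advance this subscriber only."""
        start = self._cursors.get(subscriber, 0)
        batch = self._messages[start:start + limit]
        self._cursors[subscriber] = start + len(batch)
        return batch


log = MessageLog()
log.publish("stack CREATE_IN_PROGRESS")
log.publish("stack CREATE_COMPLETE")

horizon_first = log.poll("horizon", limit=1)  # horizon reads one message
heat_all = log.poll("heat-agent")             # heat-agent still sees both
horizon_rest = log.poll("horizon")            # horizon resumes where it left off
```

Reading by one subscriber never removes anything for another, which is
the difference from the work-queue case where a message goes to only one
worker.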

What's not clear to me is whether Zaqar supports a model where multiple
different publish queues are somehow multiplexed together into each
subscription queue with the subscribers able to look and determine which
messages to receive and which not to. I do *not* think Zaqar supports
that (but again, would like Flavio to confirm). I definitely think it
would be a mistake if it did. And I think that this is the kind of thing
that Clint is referring to with the IMAP analogy.

The final question is the one of arbitrary access to messages in the
queue (or "queue" if you prefer). Flavio indicated that this effectively
came for free with their implementation of Pub-Sub. IMHO it is
unnecessary and limits the choice of potential back ends in the future.
I would personally be +1 on removing it from the v2 API, and also +1 on
the v2 API shipping in Kilo so that as few new adopters as possible get
stuck with the limited choices of back-end. I hope that would resolve
Clint's concerns that we need a separate, light-weight queue system; I
personally don't believe we need two projects, even though I agree that
all of the use cases I personally care about could probably be satisfied
without Pub-Sub.

As Rob pointed out, one of the more obvious choices of back end for an
API like the one I just described would be Apache Kafka. Unfortunately
it is a massive Java application with Zookeeper dependencies, and we all
know how Monty feels about those ;) (FWIW, I agree with him.) Given
that's a non-starter as the _default_ back-end, the current design of
allowing multiple pluggable storage back ends, starting with MongoDB and
Redis, seems like not a bad one to me.

I also worry about the fact that one description of zaqar was used to
communicate a need for divergent requirements (it needs to be a
high-volume fast message broker/queue - which, btw, sounds more like
Rabbit/oslo.messaging and less like what Clint describes above) ... and
that's why it wants to use falcon and not pecan and why it wants to use
mongo and not SQL. And then what we're doing is reimplementing something
like rabbit except in python (again, given as the justification for
deviating from how other bits of OpenStack work)

The idea of Zaqar is that it'll be the central place for polling stuff
in OpenStack. So it's going to get hit a lot, and it makes sense to do
as little work on each request as possible because work is expensive and
there will be a lot of requests. It doesn't follow that the main aim is
to optimise for latency and throughput (as it is with AMQP).

Last I checked, pretty much every OpenStack API was using a different
web framework already and it hasn't been much more than a minor
annoyance as far as I know.

BUT - if that's not actually what zaqar is - if it isn't a rabbit
replacement and doesn't need to do massive high volume sub-second
queuing because what it's actually modeling is a message subscription
service that's closer to email than to anything else, then there is
nothing about the components that are happily used in the rest of
OpenStack that should be precluded from being used. A REST api written
in pecan should be fine ... as should an SQL backend, because 99% of all
operations are going to be primary key lookups where even a moderately
tuned database should be absolutely fine at keeping up.


I don't think it's either of those things.

cheers,
Zane.

So which is it? Because it sounds to me like it's a thing that actually
does NOT need to diverge in technology in any way, but that I've been
told that it needs to diverge because it's delivering a different set of
features - and I'm pretty sure if it _is_ the thing that needs to
diverge in technology because of its feature set, then it's a thing I
don't think we should be implementing in python in OpenStack because it
already exists and it's called AMQP.

One thing that differentiates Zaqar from SQS is its capability for
supporting different protocols without sacrificing multi-tenancy and
other intrinsic features it provides. Some protocols you may consider
for Zaqar are: STOMP, MQTT.

As far as the backend goes, Zaqar is not re-inventing it either. It sits
on top of existing storage technologies that have proven to be fast and
reliable for this task. The choice of using NoSQL technologies has a lot
to do with this particular thing and the fact that Zaqar needs a storage
capable of scaling, replicating and good support for failover.


What's odd to me is that other systems like Cassandra and Riak are not
being discussed. There are well documented large scale message storage
systems on both, and neither is encumbered by the same licensing FUD
as MongoDB.

Anyway, again if we look at this as a place to store and retrieve
messages, and not as a queue, then talking about databases, instead of
message brokers, makes a lot more sense.


- Concern on the maturity of the NoSQL, non-AGPL backend (Redis)

Redis backend just landed and I've been working on a gate job for it
today. Although it hasn't been tested in production, if Zaqar graduates,
it still has a full development cycle to be tested and improved before
the first integrated release happens.


I'd be quite interested to see how it is expected to scale. From my very
quick reading of the driver, it only supports a single redis server. No
consistent hash ring or anything like that.
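
For readers unfamiliar with the term, a consistent hash ring is one
common way to spread keys (here, queue names) over several servers so
that adding or removing a server only moves the keys that belonged to
it. The Python sketch below is generic and makes no claim about how the
Zaqar Redis driver works or will work; the HashRing class, the replica
count and the server and queue names are all hypothetical.

```python
import bisect
import hashlib


class HashRing:
    """Generic consistent hash ring (hypothetical helper, not part of Zaqar)."""

    def __init__(self, nodes, replicas=100):
        # Each node is placed on the ring at `replicas` pseudo-random points.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(replicas)
        )
        self._keys = [point for point, _ in self._ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)

    def node_for(self, key):
        """Return the node owning the first ring point at or after hash(key)."""
        idx = bisect.bisect(self._keys, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]


servers = ["redis-a:6379", "redis-b:6379", "redis-c:6379"]  # made-up addresses
ring = HashRing(servers)
owner = ring.node_for("tenant-42/demo-queue")

# Dropping a server that does not own the key leaves the key where it was.
other = [s for s in servers if s != owner][0]
smaller_ring = HashRing([s for s in servers if s != other])
still_owner = smaller_ring.node_for("tenant-42/demo-queue")
```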

# Use Cases

In addition to the aforementioned concerns and comments, I also would
like to share an etherpad that contains some use cases that other
integrated projects have for Zaqar[0]. The list is not exhaustive and
it'll contain more information before the next meeting.

[0] https://etherpad.openstack.org/p/zaqar-integrated-projects-use-cases


Just taking a look, there are two basic applications needed:

1) An inbox. Horizon wants to know when snapshots are done. Heat wants
to know what happened during a stack action. Etc.

2) A user-focused message queue. Heat wants to push data to agents.
Swift wants to synchronize processes when things happen.

To me, #1 is Zaqar as it is today. #2 is the one that I worry may not
be served best by bending #1 onto it.

_______________________________________________
OpenStack-dev mailing list
[email protected]
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


