Re: [openstack-dev] [Zaqar] Zaqar and SQS Properties of Distributed Queues

Gordon Sim Tue, 23 Sep 2014 01:59:24 -0700

On 09/22/2014 05:58 PM, Zane Bitter wrote:

On 22/09/14 10:11, Gordon Sim wrote:

As I understand it, pools don't help scaling a given queue since all the
messages for that queue must be in the same pool. At present traffic
through different Zaqar queues are essentially entirely orthogonal
streams. Pooling can help scale the number of such orthogonal streams,
but to be honest, that's the easier part of the problem.


But I think it's also the important part of the problem. When I talk
about scaling, I mean 1 million clients sending 10 messages per second
each, not 10 clients sending 1 million messages per second each.

I wasn't really talking about high throughput per producer (which I
agree is not going to be a good fit), but about e.g. a large number of
subscribers for the same set of messages, e.g. publishing one message
per second to 10,000 subscribers.

Even at much smaller scale, expanding from 10 subscribers to say 100
seems relatively modest but the subscriber related load would increase
by a factor of 10. I think handling these sorts of changes is also an
important part of the problem (though perhaps not a part that Zaqar is
focused on).

When a user gets to the point that individual queues have massive
throughput, it's unlikely that a one-size-fits-all cloud offering like
Zaqar or SQS is _ever_ going to meet their needs. Those users will want
to spin up and configure their own messaging systems on Nova servers,
and at that kind of size they'll be able to afford to. (In fact, they
may not be able to afford _not_ to, assuming per-message-based pricing.)


[...]

If scaling the number of communicants on a given communication channel
is a goal however, then strict ordering may hamper that. If it does, it
seems to me that this is not just a policy tweak on the underlying
datastore to choose the desired balance between ordering and scale, but
a more fundamental question on the internal structure of the queue
implementation built on top of the datastore.


I agree with your analysis, but I don't think this should be a goal.

I think it's worth clarifying that alongside the goals since scaling can
mean different things to different people. The implication then is that
there is some limit in the number of producers and/or consumers on a
queue beyond which the service won't scale and applications need to
design around that.

Note that the user can still implement this themselves using
application-level sharding - if you know that in-order delivery is not
important to you, then randomly assign clients to a queue and then poll
all of the queues in round-robin order. This yields _exactly_ the same
semantics as SQS.
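
The application-level sharding described above can be sketched roughly
as follows. This is only an illustration under stated assumptions: the
ShardedQueue class and its methods are hypothetical names, and plain
in-memory deques stand in for separate Zaqar queues. It is not the
Zaqar client API.

```python
import random
from collections import deque


class ShardedQueue:
    """Hypothetical client-side wrapper: N independent FIFO queues.

    Each deque stands in for one real queue on the service. Order is
    kept within a shard, but not across shards.
    """

    def __init__(self, num_shards, seed=None):
        self.shards = [deque() for _ in range(num_shards)]
        self._rng = random.Random(seed)
        self._next = 0  # shard to try first on the next poll

    def assign_shard(self):
        """Pick a shard at random for a newly connecting producer."""
        return self._rng.randrange(len(self.shards))

    def post(self, shard, message):
        """Append a message to the producer's assigned shard."""
        self.shards[shard].append(message)

    def poll(self):
        """Return the next message, visiting shards round-robin.

        Returns None when every shard is empty.
        """
        n = len(self.shards)
        for offset in range(n):
            idx = (self._next + offset) % n
            if self.shards[idx]:
                self._next = (idx + 1) % n
                return self.shards[idx].popleft()
        return None
```

A producer would call assign_shard() once and keep posting to that
shard; a consumer just calls poll() in a loop. Messages from one
producer stay in order relative to each other, while messages from
different producers interleave arbitrarily.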

You can certainly leave the problem of scaling in this dimension to the
application itself by having them split the traffic into orthogonal
streams or hooking up orthogonal streams to provide an aggregated stream.

A true distributed queue isn't entirely trivial, but it may well be that
most applications can get by with a much simpler approximation.

Distributed (pub-sub) topic semantics are easier to implement, but if
the application is responsible for keeping the partitions connected,
then it also takes on part of the burden for availability and redundancy.

The reverse is true of SQS - if you want FIFO then you have to implement
re-ordering by sequence number in your application. (I'm not certain,
but it also sounds very much like this situation is ripe for losing
messages when your client dies.)
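
For illustration, re-ordering by sequence number on the client side
could look something like the sketch below. It assumes the producer
stamps each message with a consecutive integer sequence number; the
Resequencer class is a hypothetical name, not part of SQS or Zaqar.

```python
class Resequencer:
    """Hypothetical client-side buffer that restores FIFO order.

    Assumes every message carries a consecutive integer sequence
    number assigned by the producer.
    """

    def __init__(self, first_seq=0):
        self._expected = first_seq  # next sequence number to deliver
        self._pending = {}          # out-of-order messages, keyed by seq

    def receive(self, seq, payload):
        """Accept a message in any order.

        Returns the list of payloads that can now be delivered in
        order (possibly empty). Duplicates are dropped.
        """
        if seq < self._expected or seq in self._pending:
            return []
        self._pending[seq] = payload
        ready = []
        while self._expected in self._pending:
            ready.append(self._pending.pop(self._expected))
            self._expected += 1
        return ready
```

In this sketch the buffer lives only in the client's memory, so
anything held back while waiting for a gap to fill is gone if the
client dies, unless messages are acknowledged to the service only
after they have been delivered in order.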

So the question is: in which use case do we want to push additional
complexity into the application? The case where there are truly massive
volumes of messages flowing to a single point?  Or the case where the
application wants the messages in order?

I think the first case is more generally about increasing the number of
communicating parties (publishers or subscribers or both).

For competing consumers ordering isn't usually a concern since you are
processing in parallel anyway (if it is important you need some notion
of message grouping within which order is preserved and some stickiness
between group and consumer).
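
One way to picture message grouping with stickiness between group and
consumer is the sketch below. It assumes each message carries a
group id and that the number of consumers is fixed; consumer_for and
dispatch are hypothetical helper names. Hashing the group id is just
one possible way to get stickiness, not something either service is
known to do.

```python
import hashlib


def consumer_for(group_id, num_consumers):
    """Map a group id to a consumer index, always the same one."""
    digest = hashlib.sha256(group_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_consumers


def dispatch(messages, num_consumers):
    """Split (group_id, payload) pairs into one lane per consumer.

    All messages of a group land in the same lane, in arrival order,
    so order is preserved within a group while different groups are
    processed in parallel.
    """
    lanes = [[] for _ in range(num_consumers)]
    for group_id, payload in messages:
        lanes[consumer_for(group_id, num_consumers)].append((group_id, payload))
    return lanes
```

Changing the number of consumers remaps groups in this simple form,
so a real system would need some care when consumers join or leave.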

For multiple non-competing consumers the choice needn't be as simple as
total ordering or no ordering at all. Many systems quite naturally only
define partial ordering which can be guaranteed more scalably.

That's not to deny that there are indeed cases where total ordering may
be required however.

I'd suggest both that the former applications are better able to handle
that extra complexity and that the latter applications are probably more
common. So it seems that the Zaqar team made a good decision.

If that was a deliberate decision it would be worth clarifying in the
goals. It seems to be a different conclusion from that reached by SQS
and as such is part of the answer to the question that began the thread.

(Aside: it follows that Zaqar probably should have a maximum throughput
quota for each queue; or that it should report usage information in such
a way that the operator could sometimes bill more for a single queue
than they would for the same amount of usage spread across multiple
queues; or both.)

I also get the impression, perhaps wrongly, that providing the strict
ordering guarantee wasn't necessarily an explicit requirement, but was
simply a property of the underlying implementation(?).


I wasn't involved, but I expect it was a bit of both (i.e. it is a
chicken/egg question).

I see it as more of a problem/solution question. There is nothing wrong
with identifying properties of a particular solution that may be
advantageous, but I think there is value in distinguishing these from
requirements that define the problem.


_______________________________________________
OpenStack-dev mailing list
[email protected]
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
