Re: [openstack-dev] [Zaqar] Zaqar and SQS Properties of Distributed Queues

Fox, Kevin M Tue, 23 Sep 2014 12:31:26 -0700

Flavio wrote> "The reasoning, as explained in another
email, is that from a use-case perspective, strict ordering won't hurt
you if you don't need it whereas having to implement it in the client
side because the service doesn't provide it can be a PITA."

The reasoning is flawed, though. If performance is a concern, strict ordering 
costs you even when you don't care about it!

For example, is it better to implement a video streaming service on TCP or UDP 
if firewalls aren't a concern? The latter. Why? Because ordering is a problem 
for these systems! If you have frames 1, 2 and 3..., and frame 2 gets lost on 
the first transmit and needs resending, but 3 gets there, the system has to 
hold back frame 3 until frame 2 arrives. But by the time frame 2 gets 
there, frame 3 doesn't matter because the system needs to move on to frame 5 
now. The human eye doesn't care to wait for retransmits of frames; it only 
cares about the now. So because of the ordering, the eye sees 3 dropped frames 
instead of just one, making the system worse, not better.
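To make the frame example concrete, here is a toy Python model of head-of-line blocking. It is a sketch, not a real video stack: the constants `NET_DELAY`, `RETRANSMIT` and `PLAYOUT_DELAY` and the "skip a frame that misses its slot" rule are assumptions chosen to reproduce the scenario above (frame 2 lost once, frames 1..6 sent one per tick).

```python
"""Toy model of head-of-line blocking: strict in-order delivery vs. unordered.

Assumptions (illustrative only): frame i is sent at tick i, every frame takes
NET_DELAY ticks, a lost frame is resent and arrives RETRANSMIT ticks late, and
frame i must reach the player by tick i + PLAYOUT_DELAY or it is skipped.
"""

NET_DELAY = 1
RETRANSMIT = 3
PLAYOUT_DELAY = 1


def arrival_times(num_frames: int, lost: set) -> dict:
    """Tick at which each frame (1-based) reaches the receiver."""
    return {
        i: i + NET_DELAY + (RETRANSMIT if i in lost else 0)
        for i in range(1, num_frames + 1)
    }


def dropped_frames(num_frames: int, lost: set, in_order: bool) -> list:
    """Frames that miss their display deadline."""
    arrivals = arrival_times(num_frames, lost)
    dropped = []
    released = 0  # with in-order delivery, frame i waits for every earlier frame
    for i in range(1, num_frames + 1):
        if in_order:
            released = max(released, arrivals[i])
            delivered = released
        else:
            delivered = arrivals[i]
        if delivered > i + PLAYOUT_DELAY:
            dropped.append(i)
    return dropped


print(dropped_frames(6, lost={2}, in_order=False))  # [2]
print(dropped_frames(6, lost={2}, in_order=True))   # [2, 3, 4]
```

With unordered delivery only the lost frame misses its slot; with strict ordering, frames 3 and 4 also miss theirs because they are held behind frame 2, and playback only recovers at frame 5.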

Yeah, I know it's a bit of a silly example. No one would implement video 
streaming on top of messaging like that. But it does show that something 
that seemingly only provides good things (order is always better than 
disorder, right?) sometimes has unintended and negative side effects. In 
lossless systems, it can show up as unnecessary latency or higher CPU load.

I think your option 1 will make Zaqar much more palatable to those that don't 
need the strict ordering requirement.

I'm glad you want to make hard things like guaranteed ordering available so 
that users don't have to deal with it themselves if they don't want to. It's a 
great feature. But it is also an anti-feature in some cases. The ramifications 
of its requirements are greater than you think, and a feature to simply disable 
it shouldn't be very costly to implement.

Part of the controversy right now, I think, comes from not understanding the 
use case here. Insisting that FIFO is only ever positive makes others who know 
its downsides question what other assumptions were made in Zaqar, and makes 
them a little gun-shy.

Please do reconsider this stance.

Thanks,
Kevin

________________________________________
From: Flavio Percoco [[email protected]]
Sent: Tuesday, September 23, 2014 5:58 AM
To: [email protected]
Subject: Re: [openstack-dev] [Zaqar] Zaqar and SQS Properties of Distributed Queues

On 09/23/2014 10:58 AM, Gordon Sim wrote:
> On 09/22/2014 05:58 PM, Zane Bitter wrote:
>> On 22/09/14 10:11, Gordon Sim wrote:
>>> As I understand it, pools don't help scaling a given queue since all the
>>> messages for that queue must be in the same pool. At present traffic
>>> through different Zaqar queues forms essentially entirely orthogonal
>>> streams. Pooling can help scale the number of such orthogonal streams,
>>> but to be honest, that's the easier part of the problem.
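Gordon's point about pools can be illustrated with a small Python model. This is a hypothetical sketch, not Zaqar's actual pool catalog: `PoolCatalog` and `load_per_pool` are invented names, each queue is assumed to be pinned to one pool on first use, and a uniform random choice stands in for whatever placement policy the real catalog applies. Adding pools spreads many queues out, but a single queue never spans more than one pool.

```python
"""Toy model of queue-to-pool placement (hypothetical; not Zaqar's code).

Assumption: a catalog pins each queue to exactly one pool the first time the
queue is used, and every message for that queue is stored in that pool.
"""
import random
from collections import Counter


class PoolCatalog:
    def __init__(self, pools: list, seed: int = 0) -> None:
        self.pools = pools
        self.assignments = {}  # queue name -> pool name
        self._rng = random.Random(seed)

    def pool_for(self, queue: str) -> str:
        # First use picks a pool; later lookups always return the same one.
        if queue not in self.assignments:
            self.assignments[queue] = self._rng.choice(self.pools)
        return self.assignments[queue]


def load_per_pool(catalog: PoolCatalog, message_queues: list) -> Counter:
    """Count stored messages per pool, given the queue name of each message."""
    return Counter(catalog.pool_for(queue) for queue in message_queues)


catalog = PoolCatalog(["pool-a", "pool-b", "pool-c"])
# Many independent queues spread across the pools...
spread = load_per_pool(catalog, [f"q{i}" for i in range(300)])
# ...but one busy queue always lands on a single pool, however many pools exist.
hot = load_per_pool(catalog, ["hot-queue"] * 300)
print(len(hot))  # 1
```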
>>
>> But I think it's also the important part of the problem. When I talk
>> about scaling, I mean 1 million clients sending 10 messages per second
>> each, not 10 clients sending 1 million messages per second each.
>
> I wasn't really talking about high throughput per producer (which I
> agree is not going to be a good fit), but about e.g. a large number of
> subscribers for the same set of messages, e.g. publishing one message
> per second to 10,000 subscribers.
>
> Even at much smaller scale, expanding from 10 subscribers to say 100
> seems relatively modest but the subscriber related load would increase
> by a factor of 10. I think handling these sorts of changes is also an
> important part of the problem (though perhaps not a part that Zaqar is
> focused on).
>
>> When a user gets to the point that individual queues have massive
>> throughput, it's unlikely that a one-size-fits-all cloud offering like
>> Zaqar or SQS is _ever_ going to meet their needs. Those users will want
>> to spin up and configure their own messaging systems on Nova servers,
>> and at that kind of size they'll be able to afford to. (In fact, they
>> may not be able to afford _not_ to, assuming per-message-based pricing.)
>
> [...]
>>> If scaling the number of communicants on a given communication channel
>>> is a goal however, then strict ordering may hamper that. If it does, it
>>> seems to me that this is not just a policy tweak on the underlying
>>> datastore to choose the desired balance between ordering and scale, but
>>> a more fundamental question on the internal structure of the queue
>>> implementation built on top of the datastore.
>>
>> I agree with your analysis, but I don't think this should be a goal.
>
> I think it's worth clarifying that alongside the goals since scaling can
> mean different things to different people. The implication then is that
> there is some limit in the number of producers and/or consumers on a
> queue beyond which the service won't scale and applications need to
> design around that.

Agreed. The above is not part of Zaqar's goals. That is to say, each
store knows best how to distribute reads and writes itself. Nonetheless,
drivers can be very smart about this and be implemented in ways that get
the most out of the backend.

>> Note that the user can still implement this themselves using
>> application-level sharding - if you know that in-order delivery is not
>> important to you, then randomly assign clients to a queue and then poll
>> all of the queues round-robin. This yields _exactly_ the same
>> semantics as SQS.
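A minimal sketch of the application-level sharding described above, assuming plain in-memory deques stand in for N separate service queues (this is not a Zaqar client; `ShardedQueue` is a hypothetical name). Each producing client is randomly assigned a shard once, and the consumer polls the shards round-robin. Order holds within a shard (and so per client) but not across shards.

```python
import random
from collections import deque
from itertools import cycle
from typing import Optional


class ShardedQueue:
    """N independent FIFO queues used as one logical queue (a sketch)."""

    def __init__(self, num_shards: int, seed: Optional[int] = None) -> None:
        self.shards = [deque() for _ in range(num_shards)]
        self._rng = random.Random(seed)
        self._client_shard = {}  # client id -> shard index
        self._poll_order = cycle(range(num_shards))

    def post(self, client: str, message: str) -> None:
        # Producer side: each client is randomly assigned a shard once and keeps it.
        if client not in self._client_shard:
            self._client_shard[client] = self._rng.randrange(len(self.shards))
        self.shards[self._client_shard[client]].append(message)

    def get(self) -> Optional[str]:
        # Consumer side: poll shards round-robin, skipping empty ones.
        for _ in range(len(self.shards)):
            shard = self.shards[next(self._poll_order)]
            if shard:
                return shard.popleft()
        return None  # every shard is empty


q = ShardedQueue(num_shards=4, seed=1)
for i in range(8):
    q.post(client=f"client-{i % 3}", message=f"msg-{i}")
received = [q.get() for _ in range(8)]
# Every message comes back exactly once; only per-client order is guaranteed.
```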
>
> You can certainly leave the problem of scaling in this dimension to the
> application itself by having them split the traffic into orthogonal
> streams or hooking up orthogonal streams to provide an aggregated stream.
>
> A true distributed queue isn't entirely trivial, but it may well be that
> most applications can get by with a much simpler approximation.
>
> Distributed (pub-sub) topic semantics are easier to implement, but if
> the application is responsible for keeping the partitions connected,
> then it also takes on part of the burden for availability and redundancy.
>
>> The reverse is true of SQS - if you want FIFO then you have to implement
>> re-ordering by sequence number in your application. (I'm not certain,
>> but it also sounds very much like this situation is ripe for losing
>> messages when your client dies.)
>>
>> So the question is: in which use case do we want to push additional
>> complexity into the application? The case where there are truly massive
>> volumes of messages flowing to a single point?  Or the case where the
>> application wants the messages in order?
>
> I think the first case is more generally about increasing the number of
> communicating parties (publishers or subscribers or both).
>
> For competing consumers ordering isn't usually a concern since you are
> processing in parallel anyway (if it is important you need some notion
> of message grouping within which order is preserved and some stickiness
> between group and consumer).
>
> For multiple non-competing consumers the choice needn't be as simple as
> total ordering or no ordering at all. Many systems quite naturally only
> define partial ordering which can be guaranteed more scalably.
>
> That's not to deny that there are indeed cases where total ordering may
> be required however.
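The message-grouping idea above (order preserved within a group, plus stickiness between group and consumer) can be sketched in Python as follows. The `group` key on each message and the `consumer_for`/`dispatch` helpers are hypothetical; a stable hash routes every message of a group to the same consumer, so the result is a partial order (per group) rather than a total order.

```python
"""Per-group ordering with sticky consumer assignment (a sketch).

Assumptions: each message is a (group, body) pair with a hypothetical group
key; all messages in a group go to the same consumer via a stable hash, so
order is kept within a group while different groups proceed in parallel.
"""
import zlib
from collections import defaultdict


def consumer_for(group: str, num_consumers: int) -> int:
    # zlib.crc32 is stable across processes, unlike Python's salted hash().
    return zlib.crc32(group.encode("utf-8")) % num_consumers


def dispatch(messages: list, num_consumers: int) -> dict:
    """Route (group, body) pairs, preserving arrival order per consumer."""
    inboxes = defaultdict(list)
    for group, body in messages:
        inboxes[consumer_for(group, num_consumers)].append((group, body))
    return dict(inboxes)


stream = [("order-1", "created"), ("order-2", "created"),
          ("order-1", "paid"), ("order-2", "cancelled"), ("order-1", "shipped")]
inboxes = dispatch(stream, num_consumers=3)
# Each group's messages sit in a single inbox, in their original relative order.
```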
>
>> I'd suggest both that the former applications are better able to handle
>> that extra complexity and that the latter applications are probably more
>> common. So it seems that the Zaqar team made a good decision.
>
> If that was a deliberate decision it would be worth clarifying in the
> goals. It seems to be a different conclusion from that reached by SQS
> and as such is part of the answer to the question that began the thread.
>
>> (Aside: it follows that Zaqar probably should have a maximum throughput
>> quota for each queue; or that it should report usage information in such
>> a way that the operator could sometimes bill more for a single queue
>> than they would for the same amount of usage spread across multiple
>> queues; or both.)
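One common way to express a per-queue maximum throughput quota like the one in the aside above is a token bucket. The Python below is a hypothetical sketch, not a Zaqar API: `TokenBucket`, `QueueQuotas`, and their `rate`/`burst` parameters are invented for illustration, and time is passed in explicitly so the behaviour is deterministic.

```python
from typing import Optional


class TokenBucket:
    """Allows `rate` messages per second on average, with bursts up to `burst`."""

    def __init__(self, rate: float, burst: float) -> None:
        self.rate = rate
        self.burst = burst
        self.tokens = burst  # start with a full bucket
        self.last: Optional[float] = None  # time of the previous call

    def allow(self, now: float, cost: float = 1.0) -> bool:
        if self.last is not None:
            # Refill for the time elapsed since the previous call, capped at `burst`.
            self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class QueueQuotas:
    """One independent bucket per queue name."""

    def __init__(self, rate: float, burst: float) -> None:
        self.rate = rate
        self.burst = burst
        self.buckets = {}

    def try_post(self, queue: str, now: float) -> bool:
        if queue not in self.buckets:
            self.buckets[queue] = TokenBucket(self.rate, self.burst)
        return self.buckets[queue].allow(now)


quotas = QueueQuotas(rate=10.0, burst=10.0)
accepted = sum(quotas.try_post("busy", now=0.0) for _ in range(15))
print(accepted)  # 10: the burst is spent and the remaining 5 posts are rejected
```

Because each queue has its own bucket, a single hot queue is throttled without affecting other queues, which is also the kind of per-queue usage signal an operator could bill on.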
>>
>>> I also get the impression, perhaps wrongly, that providing the strict
>>> ordering guarantee wasn't necessarily an explicit requirement, but was
>>> simply a property of the underlying implementation(?).

The team decided to add FIFO based on the feedback from a group of SQS
users. FIFO is one of the things that the team decided to work on to
differentiate Zaqar from SQS. The reasoning, as explained in another
email, is that from a use-case perspective, strict ordering won't hurt
you if you don't need it whereas having to implement it in the client
side because the service doesn't provide it can be a PITA.
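To illustrate what implementing ordering on the client side involves when the service does not provide it, here is a minimal Python sketch of a reorder buffer. It assumes the producer stamps each message with a hypothetical sequence number starting at 0; `ReorderBuffer` is an invented name, not part of any Zaqar or SQS client.

```python
import heapq


class ReorderBuffer:
    """Releases messages strictly in sequence order (a sketch)."""

    def __init__(self) -> None:
        self.next_seq = 0
        self._pending = []  # min-heap of (seq, body)

    def receive(self, seq: int, body: str) -> list:
        """Accept one message; return whatever is now deliverable, in order."""
        if seq < self.next_seq:
            return []  # duplicate of something already delivered
        heapq.heappush(self._pending, (seq, body))
        ready = []
        while self._pending and self._pending[0][0] <= self.next_seq:
            top_seq, top_body = heapq.heappop(self._pending)
            if top_seq == self.next_seq:  # stale duplicates are simply dropped
                ready.append(top_body)
                self.next_seq += 1
        return ready


buf = ReorderBuffer()
out = []
for seq, body in [(1, "b"), (0, "a"), (3, "d"), (2, "c")]:
    out.extend(buf.receive(seq, body))
print(out)  # ['a', 'b', 'c', 'd']
```

Note that messages waiting in `_pending` live only in client memory; if the client dies before the gap is filled, they are lost unless they were left unacknowledged on the service or persisted elsewhere.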

The first feedback the team got about FIFO was back at the Portland
summit. The users attending the un-conference provided that feedback,
which the team then brought to the list[0]. That thread is very
specific to MongoDB's case, though, and it does not represent the
current implementation, but it does show the team's intention to gather
feedback on this feature back then.

I believe the guarantee is still useful and it currently does not
represent an issue for either the service or the user. Two things could
happen to FIFO in the future:

1. It's made optional and we allow users to opt in on a per-flavor
basis. (I personally don't like this one because it makes
interoperability even harder.)
2. It's removed completely. (Again, I personally don't like this one
because I don't think we have strong enough cases to require this to
happen.)

That said, I think only one thing will happen for now: it'll be kept
as-is unless there are strong cases that require (1) or (2). All of
this should be considered in the discussion of API v2, whenever that
happens.

[0]
http://lists.openstack.org/pipermail/openstack-dev/2013-April/007650.html

Cheers,
Flavio

P.S.: Again, sorry for all the late and mixed-up replies; an email
disaster happened and I'm working on recovering from it.

--
@flaper87
Flavio Percoco

_______________________________________________
OpenStack-dev mailing list
[email protected]
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
