On 09/23/2014 09:29 PM, Fox, Kevin M wrote:
> Flavio wrote:
>> "The reasoning, as explained in another email, is that from a use-case
>> perspective, strict ordering won't hurt you if you don't need it whereas
>> having to implement it in the client side because the service doesn't
>> provide it can be a PITA."
>
> The reasoning is flawed though. If performance is a concern, having strict
> ordering costs you when you may not care!
>
> For example, is it better to implement a video streaming service on TCP or
> UDP if firewalls aren't a concern? The latter. Why? Because ordering is a
> problem for these systems! If you have frames 1, 2, and 3, and frame 2
> gets lost on the first transmit and needs resending, but 3 gets there, the
> system has to wait to display frame 3 until frame 2 arrives. But by the
> time frame 2 gets there, frame 3 doesn't matter because the system needs
> to move on to frame 5 now. The human eye doesn't care to wait for
> retransmits of frames; it only cares about the now. So because of the
> ordering, the eye sees 3 dropped frames instead of just one, making the
> system worse, not better.
>
> Yeah, I know it's a bit of a silly example. No one would implement video
> streaming on top of messaging like that. But it does present the point
> that something that seemingly only provides good things (order is always
> better than disorder, right?) sometimes has unintended and negative side
> effects. In lossless systems, it can show up as unnecessary latency or
> higher CPU loads.
>
> I think your option 1 will make Zaqar much more palatable to those that
> don't need the strict ordering requirement.
>
> I'm glad you want to make hard things like guaranteed ordering available
> so that users don't have to deal with it themselves if they don't want
> to. It's a great feature. But it is also an anti-feature in some cases.
> The ramifications of its requirements are higher than you think, and a
> feature to just disable it shouldn't be very costly to implement.
>
> Part of the controversy right now, I think, has been not understanding
> the use case here, and by insisting that FIFO is only ever positive, it
> makes others that know its negatives question what other assumptions were
> made in Zaqar and makes them a little gun-shy.
>
> Please do reconsider this stance.
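The head-of-line blocking Kevin describes can be sketched in a few lines of Python; note that the arrival times and deadlines below are made-up illustrative numbers, not anything from the thread:

```python
def frames_missed(arrival, deadline, ordered):
    """Count frames that miss their display deadline.

    arrival[i]  -- time frame i actually arrives (a retransmitted
                   frame arrives late)
    deadline[i] -- latest time frame i is still worth displaying
    ordered     -- if True, frame i can't be shown until every earlier
                   frame has arrived, so one late frame stalls its
                   successors too (head-of-line blocking)
    """
    missed = 0
    latest = 0.0  # arrival time of the slowest frame seen so far
    for i, t in enumerate(arrival):
        latest = max(latest, t)
        show_at = latest if ordered else t
        if show_at > deadline[i]:
            missed += 1
    return missed

# Frame 2 (index 1) is lost and retransmitted, arriving at t=4.0.
arrival  = [1.0, 4.0, 2.0, 3.0]
deadline = [1.5, 2.5, 3.5, 3.9]

print(frames_missed(arrival, deadline, ordered=False))  # 1: only frame 2
print(frames_missed(arrival, deadline, ordered=True))   # 3: frames 2-4 stall
```

With unordered delivery only the late frame is lost; with ordered delivery its two successors stall past their deadlines as well, which is the "3 dropped frames instead of just one" effect.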
Hey Kevin,

FWIW, I explicitly said "from a use-case perspective", which in the context
of the emails I was replying to referred to the need (or not) for FIFO and
not to the impact it has on other areas like performance. In no way did I
try to insist that FIFO is only ever positive, and I've explicitly said in
several other emails that it *does* have an impact on performance.

That said, I agree that if FIFO's reality in Zaqar changes, it'll likely be
towards option (1).

Thanks for your feedback,
Flavio

> Thanks,
> Kevin
>
> ________________________________________
> From: Flavio Percoco [[email protected]]
> Sent: Tuesday, September 23, 2014 5:58 AM
> To: [email protected]
> Subject: Re: [openstack-dev] [Zaqar] Zaqar and SQS Properties of
> Distributed Queues
>
> On 09/23/2014 10:58 AM, Gordon Sim wrote:
>> On 09/22/2014 05:58 PM, Zane Bitter wrote:
>>> On 22/09/14 10:11, Gordon Sim wrote:
>>>> As I understand it, pools don't help scaling a given queue since all
>>>> the messages for that queue must be in the same pool. At present,
>>>> traffic through different Zaqar queues is essentially entirely
>>>> orthogonal streams. Pooling can help scale the number of such
>>>> orthogonal streams, but to be honest, that's the easier part of the
>>>> problem.
>>>
>>> But I think it's also the important part of the problem. When I talk
>>> about scaling, I mean 1 million clients sending 10 messages per second
>>> each, not 10 clients sending 1 million messages per second each.
>>
>> I wasn't really talking about high throughput per producer (which I
>> agree is not going to be a good fit), but about e.g. a large number of
>> subscribers for the same set of messages, e.g. publishing one message
>> per second to 10,000 subscribers.
>>
>> Even at much smaller scale, expanding from 10 subscribers to say 100
>> seems relatively modest, but the subscriber-related load would increase
>> by a factor of 10.
>> I think handling these sorts of changes is also an important part of
>> the problem (though perhaps not a part that Zaqar is focused on).
>>
>>> When a user gets to the point that individual queues have massive
>>> throughput, it's unlikely that a one-size-fits-all cloud offering like
>>> Zaqar or SQS is _ever_ going to meet their needs. Those users will
>>> want to spin up and configure their own messaging systems on Nova
>>> servers, and at that kind of size they'll be able to afford to. (In
>>> fact, they may not be able to afford _not_ to, assuming
>>> per-message-based pricing.)
>>
>> [...]
>>>> If scaling the number of communicants on a given communication
>>>> channel is a goal however, then strict ordering may hamper that. If
>>>> it does, it seems to me that this is not just a policy tweak on the
>>>> underlying datastore to choose the desired balance between ordering
>>>> and scale, but a more fundamental question on the internal structure
>>>> of the queue implementation built on top of the datastore.
>>>
>>> I agree with your analysis, but I don't think this should be a goal.
>>
>> I think it's worth clarifying that alongside the goals, since scaling
>> can mean different things to different people. The implication then is
>> that there is some limit in the number of producers and/or consumers on
>> a queue beyond which the service won't scale and applications need to
>> design around that.
>
> Agreed. The above is not part of Zaqar's goals. That is to say that each
> store knows best how to distribute reads and writes itself. Nonetheless,
> drivers can be very smart about this and be implemented in ways that get
> the most out of the backend.
>
>>> Note that the user can still implement this themselves using
>>> application-level sharding - if you know that in-order delivery is not
>>> important to you, then randomly assign clients to a queue and then
>>> poll all of the queues in round-robin. This yields _exactly_ the same
>>> semantics as SQS.
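The sharding pattern Zane describes might look roughly like the following sketch, with plain Python lists standing in for queues; `ShardedClient` and its method names are hypothetical, not any real Zaqar or SQS API:

```python
import random

class ShardedClient:
    """Sketch of application-level sharding: producers write to a
    randomly chosen shard (each shard individually FIFO), and consumers
    poll shards round-robin. Global ordering is given up, as with SQS.
    """

    def __init__(self, shards):
        self.shards = shards  # per-shard FIFO queues (here: plain lists)
        self._cursor = 0      # round-robin position for polling

    def post(self, message):
        # Random shard assignment destroys total order across producers.
        random.choice(self.shards).append(message)

    def claim(self):
        # Poll each shard once, round-robin; return the first message found.
        for _ in range(len(self.shards)):
            shard = self.shards[self._cursor]
            self._cursor = (self._cursor + 1) % len(self.shards)
            if shard:
                return shard.pop(0)
        return None  # all shards empty
```

Each shard is still FIFO on its own, so a producer that sticks to one shard keeps its own ordering; only the interleaving across shards is unordered.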
>>
>> You can certainly leave the problem of scaling in this dimension to the
>> application itself by having them split the traffic into orthogonal
>> streams or hooking up orthogonal streams to provide an aggregated
>> stream.
>>
>> A true distributed queue isn't entirely trivial, but it may well be
>> that most applications can get by with a much simpler approximation.
>>
>> Distributed (pub-sub) topic semantics are easier to implement, but if
>> the application is responsible for keeping the partitions connected,
>> then it also takes on part of the burden for availability and
>> redundancy.
>>
>>> The reverse is true of SQS - if you want FIFO, then you have to
>>> implement re-ordering by sequence number in your application. (I'm
>>> not certain, but it also sounds very much like this situation is ripe
>>> for losing messages when your client dies.)
>>>
>>> So the question is: in which use case do we want to push additional
>>> complexity into the application? The case where there are truly
>>> massive volumes of messages flowing to a single point? Or the case
>>> where the application wants the messages in order?
>>
>> I think the first case is more generally about increasing the number
>> of communicating parties (publishers or subscribers or both).
>>
>> For competing consumers, ordering isn't usually a concern since you
>> are processing in parallel anyway (if it is important, you need some
>> notion of message grouping within which order is preserved and some
>> stickiness between group and consumer).
>>
>> For multiple non-competing consumers, the choice needn't be as simple
>> as total ordering or no ordering at all. Many systems quite naturally
>> only define partial ordering, which can be guaranteed more scalably.
>>
>> That's not to deny that there are indeed cases where total ordering
>> may be required, however.
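For comparison, the client-side re-ordering an SQS-style consumer would need might look like this minimal sketch (the class name is illustrative; it deliberately ignores the crash/ack hazard Zane points out):

```python
class SequenceReorderer:
    """Buffer out-of-order messages and release them in sequence order.

    Minimal sketch only: a real consumer would also need timeouts and
    re-delivery so that messages buffered here aren't lost when the
    client dies, which is exactly the hazard noted in the thread.
    """

    def __init__(self, first_seq=0):
        self.expected = first_seq  # next sequence number to release
        self.pending = {}          # seq -> message, held until its turn

    def push(self, seq, message):
        """Accept one message; return whatever is now deliverable in order."""
        self.pending[seq] = message
        ready = []
        while self.expected in self.pending:
            ready.append(self.pending.pop(self.expected))
            self.expected += 1
        return ready
```

Note the buffering cost: a single missing sequence number holds back every later message, which is the same head-of-line trade-off discussed above, just paid in the application instead of the service.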
>>
>>> I'd suggest both that the former applications are better able to
>>> handle that extra complexity and that the latter applications are
>>> probably more common. So it seems that the Zaqar team made a good
>>> decision.
>>
>> If that was a deliberate decision, it would be worth clarifying in the
>> goals. It seems to be a different conclusion from that reached by SQS,
>> and as such is part of the answer to the question that began the
>> thread.
>>
>>> (Aside: it follows that Zaqar probably should have a maximum
>>> throughput quota for each queue; or that it should report usage
>>> information in such a way that the operator could sometimes bill more
>>> for a single queue than they would for the same amount of usage
>>> spread across multiple queues; or both.)
>>>
>>>> I also get the impression, perhaps wrongly, that providing the
>>>> strict ordering guarantee wasn't necessarily an explicit
>>>> requirement, but was simply a property of the underlying
>>>> implementation(?).
>
> The team decided to add FIFO based on the feedback from a group of SQS
> users. FIFO is one of the things the team decided to work on to
> differentiate Zaqar from SQS. The reasoning, as explained in another
> email, is that from a use-case perspective, strict ordering won't hurt
> you if you don't need it, whereas having to implement it on the client
> side because the service doesn't provide it can be a PITA.
>
> The first feedback the team got about FIFO was back at the Portland
> summit. The users attending the un-conference provided such feedback,
> which the team then brought up on the list[0]. The thread is very
> specific to mongodb's case, though, and it does not represent the
> current implementation, but it does show the team's intention to gather
> feedback on this feature back then.
>
> I believe the guarantee is still useful and it currently does not
> represent an issue for the service or the user. Two things could happen
> to FIFO in the future:
>
> 1. It's made optional and we allow users to opt in on a per-flavor
>    basis. (I personally don't like this one because it makes
>    interoperability even harder.)
> 2. It's removed completely. (Again, I personally don't like this one
>    because I don't think we have strong enough cases to require this
>    to happen.)
>
> That said, there's just one thing I think will happen for now: it'll be
> kept as-is unless there are strong cases that'd require (1) or (2). All
> this should be considered in the discussion of the API v2, whenever
> that happens.
>
> [0]
> http://lists.openstack.org/pipermail/openstack-dev/2013-April/007650.html
>
> Cheers,
> Flavio
>
> P.S.: again, sorry for all the late and mixed replies; an email
> disaster happened and I'm working on recovering it.
>
> --
> @flaper87
> Flavio Percoco
>
> _______________________________________________
> OpenStack-dev mailing list
> [email protected]
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

--
@flaper87
Flavio Percoco

_______________________________________________
OpenStack-dev mailing list
[email protected]
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
