[PATCH 0/2] Re: Priority based queuing

2018-05-11 Thread Patrick Hemmer
Ok, so here is the full submission for priority based queuing.

Notes since previous update:

I wasn't really able to optimize the tree search; I tried a few things,
but nothing made a measurable performance difference.

I added a warning message and documentation making clear the issues with
timestamp wrapping.
One thing that might not be completely obvious: even if the user does
not configure `set-priority-offset` at all, they're still susceptible to
the wrapping issue, since the priority is used as the queue key whether
or not it is adjusted.
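To make the hazard concrete, here is a toy sketch (not the patch itself; it assumes the 12/20 class/offset split discussed elsewhere in the thread) showing how a key built from a wrapping millisecond clock misorders entries queued on either side of the wrap:

```c
#include <stdint.h>

#define OFFSET_MASK ((1U << 20) - 1)  /* low 20 bits hold the wrapping date */

/* Build a queue key for a hypothetical 12/20 class/offset split:
 * class in the top 12 bits, millisecond date in the low 20 bits.
 * Lower keys are dequeued first. */
static uint32_t key_1220(uint32_t cls, uint32_t now_ms)
{
    return (cls << 20) | (now_ms & OFFSET_MASK);
}
```

An entry queued at the last millisecond before the 20-bit field wraps gets key 0xFFFFF, while one queued a millisecond later gets key 0, so the newer entry compares as lower and is served first, which is why the warning applies even when `set-priority-offset` is never used.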

The implementation of the %sq (srv_queue) and %bq (backend_queue) log
fields was changed to keep the description accurate. The description is
"number of requests which were processed before this one". The previous
implementation just stored the size of the queue at the time the
connection was queued. Since we can now inject a connection into the
middle of the queue, this no longer works. Instead we keep a count of
dequeued connections, and take the difference between that count when
the connection was queued and when it is dequeued. This also means the
value will be slightly different even for users who don't use priority,
as the previous method would have included connections which closed
without being processed.
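The counter-based approach can be sketched as follows (a minimal illustration of the idea described above, not the actual HAProxy code; the struct and function names are hypothetical):

```c
#include <stdint.h>

/* Each queue keeps a monotonic count of connections dequeued so far.
 * A connection records that count when it is queued; the number of
 * requests processed before it is the difference at dequeue time. */
struct queue_ctr {
    uint64_t dequeued;      /* total connections dequeued so far */
};

struct pend_conn {
    uint64_t queued_at;     /* snapshot of ctr->dequeued at enqueue */
};

static void enqueue(struct queue_ctr *q, struct pend_conn *p)
{
    p->queued_at = q->dequeued;
}

/* Returns the number of requests processed before this one,
 * then counts this dequeue. */
static uint64_t dequeue(struct queue_ctr *q, struct pend_conn *p)
{
    return q->dequeued++ - p->queued_at;
}
```

Connections that close without being processed never increment the counter, which is why the reported values shift slightly even for configurations that never touch priorities.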

I added sample fetches for retrieving the class/offset of the current
transaction. I think it might be beneficial to add some other fetches
for tracking the health of the queue, such as average class/offset, or
an exponential moving average of the class/offset for requests added to
the queue, requests processed, and requests which closed/timed out. But
this is just more stuff the code would have to store, so unsure if
they're worth it.


I wasn't convinced the 64-bit key was a bad idea, so I implemented the
idea with a 12/52 split and an absolute timestamp.  On my system (which
is 64-bit) the performance is about 20% faster. The code is much
simpler. And it also solves the limitations and issues with wrapping.
The patch for this is included in case it's of interest.
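For illustration, packing a 12-bit class and a 52-bit absolute millisecond timestamp into a single 64-bit key could look roughly like this (a sketch, not the submitted patch; the signed-class bias handling is an assumption):

```c
#include <stdint.h>

#define KEY_CLASS_BITS   12
#define KEY_OFFSET_BITS  52

/* Pack a signed priority class and an absolute millisecond timestamp
 * (plus a signed per-request offset) into one 64-bit queue key.
 * Lower keys are served first. The class is biased so that negative
 * classes sort before positive ones. 52 bits of milliseconds covers
 * far beyond any realistic uptime, which is what removes the
 * wrapping concerns. */
static uint64_t make_key(int cls, uint64_t abs_ms, int64_t offset_ms)
{
    uint64_t biased = (uint64_t)(cls + (1 << (KEY_CLASS_BITS - 1)))
                      & ((1ULL << KEY_CLASS_BITS) - 1);
    uint64_t when = (abs_ms + (uint64_t)offset_ms)
                    & ((1ULL << KEY_OFFSET_BITS) - 1);
    return (biased << KEY_OFFSET_BITS) | when;
}
```

A single `eb64` lookup of the smallest key then yields the next connection to serve, with no second wrap-compensating lookup needed.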

-Patrick


Patrick Hemmer (2):
  MEDIUM: add set-priority-class and set-priority-offset
  use a 64-bit int with absolute timestamp for priority-offset

 doc/configuration.txt  |  38 +++
 doc/lua-api/index.rst  |  18 
 include/types/proxy.h  |   3 +-
 include/types/queue.h  |   2 +-
 include/types/server.h |   3 +-
 include/types/stream.h |   7 +-
 src/cfgparse.c |  15 +++
 src/hlua.c |  69 +
 src/log.c  |   4 +-
 src/proto_http.c   |   4 +-
 src/proxy.c|   2 +-
 src/queue.c| 261 +
 src/server.c   |   2 +-
 src/stream.c   |  10 +-
 14 files changed, 366 insertions(+), 72 deletions(-)

-- 
2.16.3



[PATCH 0/2] Re: Priority based queuing

2018-05-09 Thread Patrick Hemmer
On 2018/5/5 13:55, Willy Tarreau wrote:
> On Sat, May 05, 2018 at 01:33:51PM -0400, Patrick Hemmer wrote:
>>> Also I'm thinking that we can even use 32-bit by putting the frontier
>>> between the date and the fixed priority (let's call it class) somewhere
>>> else :
>>>   - 8 bit class + 24 bit offset  => 256 classes and +/- 2.3 hours offsets
>>>   - 12 bit class + 20 bit offset => 4096 classes and +/- 8 minutes offsets
>>>   - 16 bit class + 16 bit offset => 64k classes and +/- 32 second offsets
>> What's the benefit to using 32-bit over 64-bit, and is that benefit
>> worth the cost of the added code complexity and processing time?
>
> Slightly smaller memory footprint (doesn't count much), but more
> importantly much smaller processing time especially on 32-bit systems.
I would have expected the extra processing for wrapping to negate the
benefit.

>
>> If we
>> used 64-bit, we could do a 16/48 bit split and have an absolute
>> timestamp out to year 10889 at millisecond precision.
>
> Not really because that would force us to implement yet another clock
> and then to have two distinct set of clocks for queues and for all other
> internal events and timeouts (including the queue timeout!).
We already have the `now` and `date` variables. The priority is only
used for the queue key, and not to replace anything else. The queue key
is already getting replaced, so it would be no additional work.

> That would
> really be the beginning of a mess. We've been running with 32-bit
> millisecond-precise ticks for over a decade without ever a complaint that
> it was not enough nor that the wrapping at 49.7 days was an issue. If
> one day we'd decide to change this, it would be to support finer-grained
> timestamps (eg: nanoseconds or CPU cycles) which would not necessarily
> protect us from wrapping either.
>
>>> I'm pretty sure that the two first options are way enough to cover
>>> almost everyone's needs.
>> For me a 12-bit class should work. My intention is to implement
>> something similar to email spam filtering, where you generate a score
>> based on the rules that match, and some rules are worth more than
>> others. 8 bits might be a bit difficult to work within.
>
> Makes sense.
>
>> I also suspect the 16-bit option (32 second offset) is too small for
>> date based.
>
> Yes I agree. I'm pretty sure that for some TCP connections certain users
> will want to have a long delay (eg: wait a minute before your FTP transfer
> starts).
>
>> The rest of this all seems fine. I've got no other thoughts to add.
>
> Great, thanks!
>
>> So the question is, which route do we go?
>> 1. 32-bit int (12/20 split)
>> 2. 64-bit int (32/32 split)
>
> I think you can start with the 12/20 split first, knowing that if
> anybody ever requests an extension of either time or classes we can
> switch to 64 without much effort. Until this happens we'll be more
> efficient on small systems. Conversely we can't do it the other way
> around because we could break some setups :-)
>
>> 3. 64-bit int (16/48 split) with absolute timestamps
>
> I don't like it much for the reasons above.
>
>> 4. 64-bit int with a single `set-priority` action. Let the user use a
>> new `ms()` fetch for time based. The user does the bit shifting if they
>> want to use both (using the `mul()` converter, or we can add a `shift()`).
>
> I think this level of complexity ought to be done once in the code and not
> each and every time for all users. It would only be manageable for a number
> of the people on this list, but newcomers would call us words for this :-)
>
> Thanks,
> Willy

Ok, so this patch set includes a fully functional implementation of the
priority class & offset. Unfortunately handling the offset got really
complicated due to the wrapping & locking.
There might be a little room for improvement, but I don't think much.

I'm still working on the code, but I wanted to submit this for review as
the final version probably won't be much different.

The second patch is a test I wrote to try and catch all the different
scenarios. I'm including it in case you were inclined to play with the
tree search algorithm and test it. It's not intended to be merged.

Things to do:
* Try and optimize the macros & tree search.
* Output a warning if `timeout queue` > max offset.
* Add LUA methods & sample fetches.
* General cleanup.
* Docs

-Patrick


Patrick Hemmer (2):
  MEDIUM: add set-priority-class and set-priority-offset
  tests for queue set-priority

 Makefile.test  |  21 
 include/types/proxy.h  |   2 +-
 include/types/queue.h  |   2 +-
 include/types/server.h |   2 +-
 include/types/stream.h |   2 +
 src/proxy.c|   2 +-
 src/queue.c| 280 ++---
 src/server.c   |   2 +-
 src/stream.c   |   2 +
 tests/test-queue.cfg   |   8 ++
 tests/test_queue.c | 117 +
 11 files changed, 396 insertions(+), 44 deletions(-)
 create mode 100644 Makefile.test
 create 

Re: Priority based queuing

2018-05-05 Thread Willy Tarreau
On Sat, May 05, 2018 at 01:33:51PM -0400, Patrick Hemmer wrote:
> > Also I'm thinking that we can even use 32-bit by putting the frontier
> > between the date and the fixed priority (let's call it class) somewhere
> > else :
> >   - 8 bit class + 24 bit offset  => 256 classes and +/- 2.3 hours offsets
> >   - 12 bit class + 20 bit offset => 4096 classes and +/- 8 minutes offsets
> >   - 16 bit class + 16 bit offset => 64k classes and +/- 32 second offsets
> What's the benefit to using 32-bit over 64-bit, and is that benefit
> worth the cost of the added code complexity and processing time?

Slightly smaller memory footprint (doesn't count much), but more
importantly much smaller processing time especially on 32-bit systems.

> If we
> used 64-bit, we could do a 16/48 bit split and have an absolute
> timestamp out to year 10889 at millisecond precision.

Not really because that would force us to implement yet another clock
and then to have two distinct set of clocks for queues and for all other
internal events and timeouts (including the queue timeout!). That would
really be the beginning of a mess. We've been running with 32-bit
millisecond-precise ticks for over a decade without ever a complaint that
it was not enough nor that the wrapping at 49.7 days was an issue. If
one day we'd decide to change this, it would be to support finer-grained
timestamps (eg: nanoseconds or CPU cycles) which would not necessarily
protect us from wrapping either.

> > I'm pretty sure that the two first options are way enough to cover almost
> > everyone's needs.
> For me a 12-bit class should work. My intention is to implement
> something similar to email spam filtering, where you generate a score
> based on the rules that match, and some rules are worth more than
> others. 8 bits might be a bit difficult to work within.

Makes sense.

> I also suspect the 16-bit option (32 second offset) is too small for
> date based.

Yes I agree. I'm pretty sure that for some TCP connections certain users
will want to have a long delay (eg: wait a minute before your FTP transfer
starts).

> The rest of this all seems fine. I've got no other thoughts to add.

Great, thanks!

> So the question is, which route do we go?
> 1. 32-bit int (12/20 split)
> 2. 64-bit int (32/32 split)

I think you can start with the 12/20 split first, knowing that if
anybody ever requests an extension of either time or classes we can
switch to 64 without much effort. Until this happens we'll be more
efficient on small systems. Conversely we can't do it the other way
around because we could break some setups :-)

> 3. 64-bit int (16/48 split) with absolute timestamps

I don't like it much for the reasons above.

> 4. 64-bit int with a single `set-priority` action. Let the user use a
> new `ms()` fetch for time based. The user does the bit shifting if they
> want to use both (using the `mul()` converter, or we can add a `shift()`).

I think this level of complexity ought to be done once in the code and not
each and every time for all users. It would only be manageable for a number
of the people on this list, but newcomers would call us words for this :-)

Thanks,
Willy



Re: Priority based queuing

2018-05-05 Thread Patrick Hemmer


On 2018/5/5 01:29, Willy Tarreau wrote:
> On Fri, May 04, 2018 at 06:49:00PM -0400, Patrick Hemmer wrote:
>> I'm not quite following the need for multiple queues. Why wouldn't you
>> just have one sorted queue, where if multiple pending requests have the
>> same priority, then they're FIFO.
> That's what the time-ordered queue does. I proposed this so that you can
> ensure that lower priorities are always processed first until there is no
> more of the same level.
Ah, I was considering the two solutions as an either-or choice, not
imagining that you'd use both in the same configuration, or at least in
the same backend.

>> The queue was implemented using the existing pendconns code. The only
>> thing that was changed was to insert in sorted order. Obviously this is
>> just a quick and dirty implementation. There are lots of priority queue
>> implementations to choose from, and I don't know which one is best
>> suited to this kind of task. We could keep the existing sorted
>> linked-list, and just do some optimizations for insertion. Or we could
>> switch to some sort of tree. The linked-list has the advantage in that I
>> suspect that the vast majority of additions will append to the end, and
>> pops are always off the front, so well suited to a linked list. A tree
>> might become very unbalanced very fast, and require lots of maintenance.
> No, trees are quite cheap, we use them absolutely everywhere now. Just
> take a look at wake_expired_tasks() to get an idea. In fact every single
> time we've been relying on lists doing linear lookups, we've got several
> bug reports about haproxy eating all the CPU because some people were
> using it in a way that wasn't initially considered as reasonable but
> which made a lot of sense to them.
This is why I liked the idea of a single implementation. It gave the
flexibility to the user to do whatever they wanted (time or class
based). Granted I was only thinking of doing one or the other, but the
user could still do both, they would just have to manage dedicating a
range to the date/class. Maybe someone would want to consider date
before class.

> In this case you'd be better off with the principle consisting in using
> a larger integer with the priority at the top and the date at the bottom.
> But that's where I said that it would require two lookups to deal with
> wrapping date so I'm not sure it's any better than using multiple queues.
> Well, at least it's less maintenance burden. The detailed principle is
> the following :
>
>uint64_t priority :
>
> 63            32 31             0
> [ hard priority  |      date     ]
>
>next = lookup_first(queue);
>if (!next)
>   return;
>  
>=> next is the lowest value of both priority and date
>
> Then you perform a second lookup with the oldest past date for the same
> priority level :
>
>key = (next->key & 0xffffffff00000000ULL) | (now_ms - TIMER_LOOK_BACK);
>next2 = lookup_ge(queue, key);
>if (next2 && !((next2->value ^ next->value) >> 32))
>   next=next2
>
> Then you dequeue next which is always the next one.
>
> That's where I thought that using two distinct levels was simpler, but
> if we factor in the fact that it would limit the number of priority
> classes and would require more maintenance code, it's probably not
> worth the hassle to save the 4 extra lines we have above to respect
> the wrapping date.
>
> Also I'm thinking that we can even use 32-bit by putting the frontier
> between the date and the fixed priority (let's call it class) somewhere
> else :
>   - 8 bit class + 24 bit offset  => 256 classes and +/- 2.3 hours offsets
>   - 12 bit class + 20 bit offset => 4096 classes and +/- 8 minutes offsets
>   - 16 bit class + 16 bit offset => 64k classes and +/- 32 second offsets
What's the benefit to using 32-bit over 64-bit, and is that benefit
worth the cost of the added code complexity and processing time? If we
used 64-bit, we could do a 16/48 bit split and have an absolute
timestamp out to year 10889 at millisecond precision.

> I'm pretty sure that the two first options are way enough to cover almost
> everyone's needs.
For me a 12-bit class should work. My intention is to implement
something similar to email spam filtering, where you generate a score
based on the rules that match, and some rules are worth more than
others. 8 bits might be a bit difficult to work within.
I also suspect the 16-bit option (32 second offset) is too small for
date based.

> Let's say for a moment that we'd use the second option
> (12+20 bits), the queue insertion code becomes this :
>
>    pendconn->node.key = (stream->prio_class << 20) +
>                         ((now_ms + stream->prio_offset) & 0xFFFFF);
>    if (srv)
>        eb32_insert(&srv->pendconns, &pendconn->node);
>    else
>        eb32_insert(&px->pendconns, &pendconn->node);
>
>
> And the dequeuing code in pendconn_process_next_strm() which takes care
> both of the backend's and server's queues while still respecting their
> classes would become 
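The quoted message cuts off here, but the wrap-aware two-lookup dequeue it refers to (described earlier in the same message) can be fleshed out as a toy model. A sorted array stands in for the ebtree, and `TIMER_LOOK_BACK` is defined locally; both are stand-ins, not HAProxy's actual API:

```c
#include <stdint.h>
#include <stddef.h>

#define TIMER_LOOK_BACK 0x7fffffffU  /* half the 32-bit tick range */

/* Stand-in for the ebtree lookup: smallest key >= `key` in a sorted
 * array of 64-bit keys (priority class in the top 32 bits, wrapping
 * millisecond date in the bottom 32). */
static const uint64_t *lookup_ge(const uint64_t *keys, size_t n, uint64_t key)
{
    for (size_t i = 0; i < n; i++)
        if (keys[i] >= key)
            return &keys[i];
    return NULL;
}

/* Wrap-aware dequeue: take the lowest key, then re-look-up within the
 * same priority class starting from the oldest non-wrapped date, so a
 * date queued before the clock wrapped is still served before one
 * queued after it, despite comparing numerically larger. */
static const uint64_t *pick_next(const uint64_t *keys, size_t n,
                                 uint32_t now_ms)
{
    const uint64_t *next = n ? &keys[0] : NULL;  /* lookup_first() */
    if (!next)
        return NULL;
    uint64_t key = (*next & 0xFFFFFFFF00000000ULL)
                   | (uint32_t)(now_ms - TIMER_LOOK_BACK);
    const uint64_t *next2 = lookup_ge(keys, n, key);
    if (next2 && !((*next2 ^ *next) >> 32))      /* same class? */
        next = next2;
    return next;
}
```

The second lookup only replaces the first pick when both entries share the top-32-bit class, which is exactly the "4 extra lines" cost of respecting the wrapping date that the message weighs against maintaining multiple queues.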

Re: Priority based queuing

2018-05-04 Thread Willy Tarreau
On Fri, May 04, 2018 at 06:49:00PM -0400, Patrick Hemmer wrote:
> I'm not quite following the need for multiple queues. Why wouldn't you
> just have one sorted queue, where if multiple pending requests have the
> same priority, then they're FIFO.

That's what the time-ordered queue does. I proposed this so that you can
ensure that lower priorities are always processed first until there is no
more of the same level.

> I had sent another email which proposed an implementation that would
> satisfy both designs
> (https://www.mail-archive.com/haproxy@formilux.org/msg29876.html).
> Unfortunately I was composing it when you sent your most recent reply,
> so I didn't see it before I sent.

Thanks, I missed this one indeed.

> I went ahead and implemented the functionality I had talked about in
> that other email, and have attached a WIP patch. I've done some testing
> on it, and it seems to work fine. The syntax is exactly as proposed:
> 
> # time-based priority
> http-request set-priority date(20) if LOW_PRIORITY
> http-request set-priority date(-20) if HIGH_PRIORITY
> 
> # score-based priority
> http-request set-priority int(20) if LOW_PRIORITY
> http-request set-priority int(-20) if HIGH_PRIORITY
> 
> Some notes on the implementation:
> 
> I used a `long` for tracking the priority. Obviously this has
> limitations when used with date(), as the timestamps are fairly large. I
> can change this to `long long`.

Ah now I understand how you wanted to implement it. I have a big concern
here with wrapping time however, which is what I explained in an earlier
mail where I said that if needed we could mix date and priority in the
same integer but that would still require two lookups. Our date wraps
every 49.7 days (32-bit millisecond resolution) so depending on the
phases of the moon some requests will be processed before the highest
priority ones or after, and you'll see some priority inversion 7 times
a year.

> The queue was implemented using the existing pendconns code. The only
> thing that was changed was to insert in sorted order. Obviously this is
> just a quick and dirty implementation. There are lots of priority queue
> implementations to choose from, and I don't know which one is best
> suited to this kind of task. We could keep the existing sorted
> linked-list, and just do some optimizations for insertion. Or we could
> switch to some sort of tree. The linked-list has the advantage in that I
> suspect that the vast majority of additions will append to the end, and
> pops are always off the front, so well suited to a linked list. A tree
> might become very unbalanced very fast, and require lots of maintenance.

No, trees are quite cheap, we use them absolutely everywhere now. Just
take a look at wake_expired_tasks() to get an idea. In fact every single
time we've been relying on lists doing linear lookups, we've got several
bug reports about haproxy eating all the CPU because some people were
using it in a way that wasn't initially considered as reasonable but
which made a lot of sense to them. In the case of the list above, the
corner case would be server with "maxconn 5000" and half of the requests
having a high priority and the other half having a low one, resulting in
an average of 2500 visits in the optimal case where you can insert from
both sides (which was what we used to do a very long time ago in the
scheduler).

> Using the date() fetch obviously doesn't provide good precision.

Hmmm good point here it's seconds so it will not wrap for the next 20
years but will provide a useless precision.

> We
> could add a `datems()`, `datens()`, or `ticks()` (ms since start) sample
> fetch. But obviously we would need to use `long long` for that, though
> ticks could use a `long`, but would wrap at around 16 months.

No, 49.7 days ;-)

The other problem I'm having here is that it artificially mixes two types
of totally unrelated data and relies on the lack of intersection between
their domains (when time doesn't wrap). Indeed, here you're considering
that values below a certain number cannot be dates so they definitely are
fixed priorities. But that also means that people who want to use ranges
will randomly collide with dates and will observe a changing behaviour
along the year.

In this case you'd be better off with the principle consisting in using
a larger integer with the priority at the top and the date at the bottom.
But that's where I said that it would require two lookups to deal with
wrapping date so I'm not sure it's any better than using multiple queues.
Well, at least it's less maintenance burden. The detailed principle is
the following :

   uint64_t priority :

63            32 31             0
[ hard priority  |      date     ]

   next = lookup_first(queue);
   if (!next)
  return;
 
   => next is the lowest value of both priority and date

Then you perform a second lookup with the oldest past date for the same
priority level :

   key = 

Re: Priority based queuing

2018-05-04 Thread Patrick Hemmer


On 2018/5/2 22:56, Willy Tarreau wrote:
> On Wed, May 02, 2018 at 04:29:33PM -0400, Patrick Hemmer wrote:
>> I think you're misunderstanding my design, as scoring wouldn't work like
>> this at all. If you give the gold class a score of 1000 (where higher
>> number means higher priority), then the only thing that would get
>> processed before gold class would be another class that has a score >
>> 1000. If nothing does, and a gold request comes in, it gets processed
>> first, no matter how big the queue is.
>> Think of scoring as instead having multiple queues. Each queue is only
>> processed if the queue before it is empty.
> (...)
>
> OK you convinced me. Not on everything, but on the fact that we're trying
> to address different points. My goal is to make it possible to improve
> quality of service between good requests, and your goal is to improve the
> classification between good, suspicious, and bad requests. Each of us see
> how to expand a little bit our respective model to address part of the
> other one (though less efficiently).
>
> I agree that for your goal, multi-queue is better, but I still maintain
> that for my goal, the time-based queue gives better control. The good
> news is that the two are orthogonal and 100% compatible.
>
> Basically the queueing system should be redesigned as a list of time-based
> trees that are visited in order of priority (or traffic classes, we'll have
> to find the proper wording to avoid confusion). If you think you can address
> your needs with just a small set of different priorities, probably that we
> can implement this with a small array (2-4 queues). If you think you need
> more, then we'd rather think about building a composite position value in
> the tree made of the priority at the top of the word and the service time
> at the bottom of the word. This way, picking the first value will always
> find the lowest priority value first. There's one subtlety related to
> wrapping time there however, but it can be addressed in two lookups.
>
> Please let me know if you'd be fine with designing and implementing
> something like this.
>
> Cheers,
> Willy

I'm not quite following the need for multiple queues. Why wouldn't you
just have one sorted queue, where if multiple pending requests have the
same priority, then they're FIFO.

I had sent another email which proposed an implementation that would
satisfy both designs
(https://www.mail-archive.com/haproxy@formilux.org/msg29876.html).
Unfortunately I was composing it when you sent your most recent reply,
so I didn't see it before I sent.
I went ahead and implemented the functionality I had talked about in
that other email, and have attached a WIP patch. I've done some testing
on it, and it seems to work fine. The syntax is exactly as proposed:

# time-based priority
http-request set-priority date(20) if LOW_PRIORITY
http-request set-priority date(-20) if HIGH_PRIORITY

# score-based priority
http-request set-priority int(20) if LOW_PRIORITY
http-request set-priority int(-20) if HIGH_PRIORITY

Some notes on the implementation:

I used a `long` for tracking the priority. Obviously this has
limitations when used with date(), as the timestamps are fairly large. I
can change this to `long long`.

The queue was implemented using the existing pendconns code. The only
thing that was changed was to insert in sorted order. Obviously this is
just a quick and dirty implementation. There are lots of priority queue
implementations to choose from, and I don't know which one is best
suited to this kind of task. We could keep the existing sorted
linked-list, and just do some optimizations for insertion. Or we could
switch to some sort of tree. The linked-list has the advantage in that I
suspect that the vast majority of additions will append to the end, and
pops are always off the front, so well suited to a linked list. A tree
might become very unbalanced very fast, and require lots of maintenance.

Using the date() fetch obviously doesn't provide good precision. We
could add a `datems()`, `datens()`, or `ticks()` (ms since start) sample
fetch. But obviously we would need to use `long long` for that, though
ticks could use a `long`, but would wrap at around 16 months.
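For what it's worth, the wrap period of an N-bit millisecond counter is simply 2^N ms; for 32 bits that works out to about 49.7 days (as Willy points out elsewhere in the thread), not 16 months:

```c
#include <stdint.h>

/* Wrap period, in days, of an unsigned millisecond tick counter
 * that is `bits` wide (bits must be < 64): 2^bits ms, in days. */
static double wrap_days(unsigned bits)
{
    return (double)(1ULL << bits) / (1000.0 * 60 * 60 * 24);
}
```

wrap_days(32) evaluates to roughly 49.71, matching HAProxy's well-known 49.7-day tick wrap.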

I still plan on adding a sample fetch for obtaining the current priority
value, and a lua function for setting it. Not sure if we want a lua
function for retrieving, or if using the fetch is good enough.

-Patrick

From 4783dccfc869cfb4219d195ba21f896272211c9e Mon Sep 17 00:00:00 2001
From: Patrick Hemmer 
Date: Fri, 4 May 2018 16:31:16 -0400
Subject: [PATCH] MINOR: add {http,tcp}-request set-priority for queue
 prioritization

This adds the set-priority action to http-request and tcp-request content.
The priority value is used when connections are queued to determine which 
connections should be served first. The lowest integer value is served first 
(akin to the 'nice' parameter).
---
 include/types/stream.h |  

Re: Priority based queuing

2018-05-02 Thread Patrick Hemmer


On 2018/5/2 16:29, Patrick Hemmer wrote:
>
>
> On 2018/5/2 13:22, Willy Tarreau wrote:
>> On Wed, May 02, 2018 at 12:44:06PM -0400, Patrick Hemmer wrote:
>>> Can you elaborate on what you're thinking of for a time-based queue?
>>>
>>> What I'm imagining you mean is that you would write a rule to set the
>>> max queue time, and haproxy would insert it into the queue sorting on
>>> TIME_NOW() + MAX_QUEUE_TIME. The main difference I see to this approach
>>> vs scoring, is that you ensure that an item doesn't sit on the queue
>>> forever (or whatever `timeout queue` is set to) if higher priority stuff
>>> keeps getting inserted before it.
>> In part, but mostly that under contention you can still maintain a
>> good quality of service. I remember having had a discussion a very
>> long time ago with a gaming site operator explaining that he wanted
>> to send profile management requests to a dedicated slow server so
>> that people filling in their profile do not disturb the ones playing.
>> For me it's a good example : "how long do you think these people are
>> willing to wait for the server to respond to each click to update
>> their profile?". Let's say 2 seconds is OK, you just add a 2s offset
>> to the position in the queue for such requests. If in the mean time
>> other higher priority requests come in, the delayed one may face the
>> 2 seconds delay. But it will not face more. And obviously if no other
>> request is pending before it in the queue it will be served earlier.
>>
>> A similar example is for certain shopping sites which consider that
>> once you have something in your shopping cart, you're much more likely
>> to finish the process and to pay because you've found the item you
>> were looking for, so you're more willing to tolerate a slightly higher
>> response time, and as such provide a better response time to newcomers.
>> But while this small delay can easily be expressed in milliseconds
>> (probably 50/100), it's almost impossible to define it using positions.
>> What if 50 new visitors suddenly come in ? You don't want the guy with
>> his item to suddenly experience a timeout. The time-based queue however
>> grants you a control over the service degradation you're accepting to
>> inflict to your users.
> There's a key difference in our designs. You're trying to classify
> traffic based on destination. I'm trying to classify traffic based on
> source.
> In my design, the maxconn is set to a level such that the server is
> healthy and every request can be processed fast. With this, and a
> score based queue, and under attack, the good requests will be drained
> fast, leaving only the bad requests. Thus no normal user should be
> getting a timeout, no matter what resource they're going to.
> Your design works when the backend system can't keep up with good
> legitimate traffic, and you need to prioritize one good request over
> another good request. My design is when the backend system can't keep
> up with bad traffic, and so we want to process the good traffic first.
>
>> Another interesting example is when you want ot provide strong service
>> guarantees to some top-level visitors while running under intense load.
>> By using only a position, it's easy to say "I want the Gold class to
>> advance by 1000 positions". But if the site is highly loaded and you
>> have 10k pending requests, these 1000 positions remain irrelevant,
>> because the requests sits there, waiting for 90% of the pollution to
>> be flushed. 
> I think you're misunderstanding my design, as scoring wouldn't work
> like this at all. If you give the gold class a score of 1000 (where
> higher number means higher priority), then the only thing that would
> get processed before gold class would be another class that has a
> score > 1000. If nothing does, and a gold request comes in, it gets
> processed first, no matter how big the queue is.
> Think of scoring as instead having multiple queues. Each queue is only
> processed if the queue before it is empty.
>> If you say "I want these requests to advance by 10 seconds"
>> then no matter how many other requests you have in the queue, it will
>> really advance by 10 seconds and effectively shrink the response time
>> by 10 seconds.
> It may shrink the response time by 10 seconds, but if the normal
> response time is 60 seconds, that's still 50 seconds of waiting. If
> your users are only willing to put up with 30 seconds of wait, this
> means you end up failing *everyone*.
>
>> Overall for user experience it's important to think in time and not
>> positions. The rationale behind this is simple : the user will never
>> accept as a justification for varying quality of service the number of
>> other visitors. And positions just depend on this, it's an average number
>> of people you're allowing to pass over. If I'm granted a pass ticket
>> allowing me to advance 10 places in the cinema queue and I find a
>> crowded street, I will go back home. If I'm told I won't wait more
>> than 5 

Re: Priority based queuing

2018-05-02 Thread Willy Tarreau
On Wed, May 02, 2018 at 04:29:33PM -0400, Patrick Hemmer wrote:
> I think you're misunderstanding my design, as scoring wouldn't work like
> this at all. If you give the gold class a score of 1000 (where higher
> number means higher priority), then the only thing that would get
> processed before gold class would be another class that has a score >
> 1000. If nothing does, and a gold request comes in, it gets processed
> first, no matter how big the queue is.
> Think of scoring as instead having multiple queues. Each queue is only
> processed if the queue before it is empty.
(...)

OK you convinced me. Not on everything, but on the fact that we're trying
to address different points. My goal is to make it possible to improve
quality of service between good requests, and your goal is to improve the
classification between good, suspicious, and bad requests. Each of us see
how to expand a little bit our respective model to address part of the
other one (though less efficiently).

I agree that for your goal, multi-queue is better, but I still maintain
that for my goal, the time-based queue gives better control. The good
news is that the two are orthogonal and 100% compatible.

Basically the queueing system should be redesigned as a list of time-based
trees that are visited in order of priority (or traffic classes, we'll have
to find the proper wording to avoid confusion). If you think you can address
your needs with just a small set of different priorities, we can probably
implement this with a small array (2-4 queues). If you think you need
more, then we'd rather think about building a composite position value in
the tree made of the priority at the top of the word and the service time
at the bottom of the word. This way, picking the first value will always
find the lowest priority value first. There's one subtlety related to
wrapping time there however, but it can be addressed in two lookups.
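A minimal sketch of such a composite key, using the 12/52 class/time split mentioned at the top of the thread (the function and macro names here are illustrative, not HAProxy's actual code):

```c
#include <stdint.h>

/* Hypothetical composite queue key: traffic class in the top 12 bits,
 * (offset-adjusted) enqueue time in milliseconds in the low 52 bits.
 * Plain integer comparison then orders by class first, and by time
 * within a class, so the smallest key is always the next to serve.
 * A 52-bit millisecond field only wraps after roughly 140,000 years,
 * which is why an absolute timestamp sidesteps the wrapping subtlety. */
#define KEY_TIME_BITS 52
#define KEY_TIME_MASK ((1ULL << KEY_TIME_BITS) - 1)

static uint64_t make_queue_key(uint16_t cls, uint64_t now_ms, int64_t offset_ms)
{
    uint64_t t = (uint64_t)((int64_t)now_ms + offset_ms) & KEY_TIME_MASK;

    return ((uint64_t)cls << KEY_TIME_BITS) | t;
}
```

With this layout a request in a lower class always sorts ahead of any request in a higher class, and the time offset only reorders requests within the same class.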

Please let me know if you'd be fine with designing and implementing
something like this.

Cheers,
Willy



Re: Priority based queuing

2018-05-02 Thread Patrick Hemmer


On 2018/5/2 13:22, Willy Tarreau wrote:
> On Wed, May 02, 2018 at 12:44:06PM -0400, Patrick Hemmer wrote:
>> Can you elaborate on what you're thinking of for a time-based queue?
>>
>> What I'm imagining you mean is that you would write a rule to set the
>> max queue time, and haproxy would insert it into the queue sorting on
>> TIME_NOW() + MAX_QUEUE_TIME. The main difference I see to this approach
>> vs scoring, is that you ensure that an item doesn't sit on the queue
>> forever (or whatever `timeout queue` is set to) if higher priority stuff
>> keeps getting inserted before it.
> In part, but mostly that under contention you can still maintain a
> good quality of service. I remember having had a discussion a very
> long time ago with a gaming site operator explaining that he wanted
> to send profile management requests to a dedicated slow server so
> that people filling in their profile do not disturb the ones playing.
> For me it's a good example : "how long do you think these people are
> willing to wait for the server to respond to each click to update
> their profile?". Let's say 2 seconds is OK, you just add a 2s offset
> to the position in the queue for such requests. If in the meantime
> other higher-priority requests come in, the delayed one may face the
> full 2-second delay. But it will not face more. And obviously if no other
> request is pending before it in the queue it will be served earlier.
>
> A similar example is for certain shopping sites which consider that
> once you have something in your shopping cart, you're much more likely
> to finish the process and to pay because you've found the item you
> were looking for, so you're more willing to tolerate a slightly higher
> response time, which lets the site provide a better response time to newcomers.
> But while this small delay can easily be expressed in milliseconds
> (probably 50/100), it's almost impossible to define it using positions.
> What if 50 new visitors suddenly come in ? You don't want the guy with
> his item to suddenly experience a timeout. The time-based queue however
> grants you control over the service degradation you're willing to
> inflict on your users.
There's a key difference in our designs. You're trying to classify
traffic based on destination. I'm trying to classify traffic based on
source.
In my design, the maxconn is set to a level such that the server is
healthy and every request can be processed fast. With this, and a score
based queue, and under attack, the good requests will be drained fast,
leaving only the bad requests. Thus no normal user should be getting a
timeout, no matter what resource they're going to.
Your design works when the backend system can't keep up with legitimate
traffic, and you need to prioritize one good request over another good
request. My design is for when the backend system can't keep up because
of bad traffic, and so we want to process the good traffic first.

>
> Another interesting example is when you want to provide strong service
> guarantees to some top-level visitors while running under intense load.
> By using only a position, it's easy to say "I want the Gold class to
> advance by 1000 positions". But if the site is highly loaded and you
> have 10k pending requests, these 1000 positions remain irrelevant,
> because the request sits there, waiting for 90% of the pollution to
> be flushed. 
I think you're misunderstanding my design, as scoring wouldn't work like
this at all. If you give the gold class a score of 1000 (where higher
number means higher priority), then the only thing that would get
processed before gold class would be another class that has a score >
1000. If nothing does, and a gold request comes in, it gets processed
first, no matter how big the queue is.
Think of scoring as instead having multiple queues. Each queue is only
processed if the queue before it is empty.
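The multiple-queue model described above can be sketched in a few lines: one FIFO list per class, with the dequeuer always draining the highest-priority non-empty queue first (the `pendconn` name is borrowed loosely from HAProxy, but the code is purely illustrative):

```c
#include <stddef.h>

/* Illustrative strict-priority queuing: a fixed array of FIFO lists,
 * where a lower class is only served once every higher class is empty. */
#define NB_CLASSES 4

struct pendconn {
    struct pendconn *next;
    int id;
};

static struct pendconn *queues[NB_CLASSES]; /* index 0 = highest priority */

static void pendconn_add(int cls, struct pendconn *p)
{
    struct pendconn **q = &queues[cls];

    p->next = NULL;
    while (*q)                /* append at the tail to keep FIFO order */
        q = &(*q)->next;
    *q = p;
}

static struct pendconn *pendconn_next(void)
{
    int cls;

    for (cls = 0; cls < NB_CLASSES; cls++) {
        if (queues[cls]) {
            struct pendconn *p = queues[cls];

            queues[cls] = p->next;
            return p;
        }
    }
    return NULL;              /* all queues empty */
}
```

No matter how many requests pile up in class 2, a newly arrived class-0 request is always dequeued first.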
> If you say "I want these requests to advance by 10 seconds"
> then no matter how many other requests you have in the queue, it will
> really advance by 10 seconds and effectively shrink the response time
> by 10 seconds.
It may shrink the response time by 10 seconds, but if the normal
response time is 60 seconds, that's still 50 seconds of waiting. If your
users are only willing to put up with 30 seconds of wait, this means you
end up failing *everyone*.

>
> Overall for user experience it's important to think in time and not
> positions. The rationale behind this is simple : the user will never
> accept as a justification for varying quality of service the number of
> other visitors. And positions just depend on this, it's an average number
> of people you're allowing to pass over. If I'm granted a pass ticket
> allowing me to advance 10 places in the cinema queue and I find a
> crowded street, I will go back home. If I'm told I won't wait more
> than 5 minutes, I'll come regardless of the number of waiters.
But a time-based queue wouldn't work like this. You can't guarantee
service within 5 

Re: Priority based queuing

2018-05-02 Thread Willy Tarreau
On Wed, May 02, 2018 at 12:44:06PM -0400, Patrick Hemmer wrote:
> Can you elaborate on what you're thinking of for a time-based queue?
> 
> What I'm imagining you mean is that you would write a rule to set the
> max queue time, and haproxy would insert it into the queue sorting on
> TIME_NOW() + MAX_QUEUE_TIME. The main difference I see to this approach
> vs scoring, is that you ensure that an item doesn't sit on the queue
> forever (or whatever `timeout queue` is set to) if higher priority stuff
> keeps getting inserted before it.

In part, but mostly that under contention you can still maintain a
good quality of service. I remember having had a discussion a very
long time ago with a gaming site operator explaining that he wanted
to send profile management requests to a dedicated slow server so
that people filling in their profile do not disturb the ones playing.
For me it's a good example : "how long do you think these people are
willing to wait for the server to respond to each click to update
their profile?". Let's say 2 seconds is OK, you just add a 2s offset
to the position in the queue for such requests. If in the meantime
other higher-priority requests come in, the delayed one may face the
full 2-second delay. But it will not face more. And obviously if no other
request is pending before it in the queue it will be served earlier.
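The bounded-delay property can be sketched directly, assuming the queue is ordered by a millisecond key (names illustrative):

```c
#include <stdint.h>

/* Illustrative time-based queue position: the key is the arrival time
 * plus a per-request offset, so a +2000ms penalty can cost a request at
 * most 2 seconds relative to un-penalized traffic, however deep the
 * queue is. */
static uint64_t queue_key(uint64_t arrival_ms, int64_t offset_ms)
{
    return (uint64_t)((int64_t)arrival_ms + offset_ms);
}
```

A profile update arriving at t=0 with a +2000ms offset sorts after a game click arriving at t=500, but still ahead of anything arriving more than 2 seconds after it.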

A similar example is for certain shopping sites which consider that
once you have something in your shopping cart, you're much more likely
to finish the process and to pay because you've found the item you
were looking for, so you're more willing to tolerate a slightly higher
response time, which lets the site provide a better response time to newcomers.
But while this small delay can easily be expressed in milliseconds
(probably 50/100), it's almost impossible to define it using positions.
What if 50 new visitors suddenly come in ? You don't want the guy with
his item to suddenly experience a timeout. The time-based queue however
grants you control over the service degradation you're willing to
inflict on your users.

Another interesting example is when you want to provide strong service
guarantees to some top-level visitors while running under intense load.
By using only a position, it's easy to say "I want the Gold class to
advance by 1000 positions". But if the site is highly loaded and you
have 10k pending requests, these 1000 positions remain irrelevant,
because the request sits there, waiting for 90% of the pollution to
be flushed. If you say "I want these requests to advance by 10 seconds"
then no matter how many other requests you have in the queue, it will
really advance by 10 seconds and effectively shrink the response time
by 10 seconds.

Overall for user experience it's important to think in time and not
positions. The rationale behind this is simple : the user will never
accept as a justification for varying quality of service the number of
other visitors. And positions just depend on this, it's an average number
of people you're allowing to pass over. If I'm granted a pass ticket
allowing me to advance 10 places in the cinema queue and I find a
crowded street, I will go back home. If I'm told I won't wait more
than 5 minutes, I'll come regardless of the number of waiters.

> I don't think this is necessarily a good thing.

Having considered the two options several times in the field over the
last decade when sites started to crawl, I became thoroughly convinced ;-)

> If you're under a DOS
> attack, the goal is to get the good requests processed before any
> possible malicious requests. With a time-based queue, those malicious
> requests will still get processed and starve out the good requests.

Precisely not that much because the good ones will pass before, and
the malicious ones will then be subject to the queue timeout if there
are too many.

> For
> example let's say you're under attack, a bad request comes in with
> max_queue_time=1000ms, and then after 999ms elapse, a good request comes
> in with max_queue_time=10ms. You have a good request, and a bad request
> on the queue, but HAProxy is going to process the bad request first
> because its timer is expiring first.

Absolutely, but you fell into the trap of not considering that the queue
is supposed to be full: you're in fact comparing only two requests,
while in practice you have many of them, where the effect is much
more visible.

> Essentially if haproxy is receiving
> X good requests per second, and Y bad requests per second, it's still
> going to forward X good per second, and Y bad per second, to the backend
> server. The only difference is that they're time shifted.

Except once subject to the queue timeout. It's a very important aspect we
must not dismiss.

> The other thing I could think you mean by time-based is to insert into
> the queue sorting on MAX_QUEUE_TIME, just like a score-based queue, but
> you would still record TIME_NOW() + MAX_QUEUE_TIME, and would reject
> requests that don't 

Re: Priority based queuing

2018-05-02 Thread Patrick Hemmer


On 2018/5/2 11:04, Willy Tarreau wrote:
> On Tue, May 01, 2018 at 09:34:14PM -0400, Patrick Hemmer wrote:
>> Would it be possible to add priority based queuing to haproxy? By this I
>> mean that when a server/backend is full (maxconn), that incoming
>> requests would be added to the queue in a custom order. The idea here is
>> that when the system is under stress, to make sure the important
>> requests get handled first.
> Hehe it's funny you mention this, as this has been postponed since
> around 1.2 or 1.3! Back then we didn't have the equivalent of HTTP rules
> to add/subtract some priority. Now we have everything to do it, we "just"
> need to replace the lists with priority trees in the queues and that's
> all. It's not a lot of work if someone is interested in working on this.
>
>> In our exact use case, we're looking to use this to help mitigate DOS
>> attacks. The idea is that if a layer 7 attack is saturating the backend
>> servers, we can add logic to prioritize the requests. This logic might
>> be things like requests that have a valid application cookie go to the
>> front of the queue, or requests that come from a cloud provider (e.g.
>> EC2) go to the back of the queue.
> That's exactly why I wanted them to be manipulated via http-request rules,
> so that everyone can construct their own conditions. Also I found that for
> most shopping sites, having time-based priority is more important than
> position-based : you often want this type of request to be processed 100ms
> faster than another type of request. With HTTP/2 it will be even more
> interesting because it will allow sending the important objects used for
> rendering before the other ones, which is very similar to the H2 priority
> but more fine-grained if you can adjust it on the fly.
>
>> DOS mitigation is hard because while you can write rules to identify
>> requests that are suspicious, you don't want to block them outright as
>> it is possible they might be legitimate. With prioritization, the
>> requests still get through, and are only affected when the backend is
>> saturated. If maxconn is not reached, the prioritization has no effect
>> at all (since queue is empty).
> I wholeheartedly agree with you :-)
>
>> I made the change to haproxy and simulated the conditions in a lab, and
>> the strategy appears to work.
>> The change to haproxy was very minor, ~10 lines in queue.c, using
>> `task->nice` as the prioritization key. However my change is a very
>> rough PoC, and not worthy of submission.
> For a rough PoC it's indeed perfectly fine. But for a final design we
> really need a separate offset. I've really been convinced in the field about
> using time rather than position, if you want to experiment with this I
> can give you some insights, it's the same in fact.
Can you elaborate on what you're thinking of for a time-based queue?

What I'm imagining you mean is that you would write a rule to set the
max queue time, and haproxy would insert it into the queue sorting on
TIME_NOW() + MAX_QUEUE_TIME. The main difference I see to this approach
vs scoring, is that you ensure that an item doesn't sit on the queue
forever (or whatever `timeout queue` is set to) if higher priority stuff
keeps getting inserted before it.
I don't think this is necessarily a good thing. If you're under a DOS
attack, the goal is to get the good requests processed before any
possible malicious requests. With a time-based queue, those malicious
requests will still get processed and starve out the good requests. For
example let's say you're under attack, a bad request comes in with
max_queue_time=1000ms, and then after 999ms elapse, a good request comes
in with max_queue_time=10ms. You have a good request, and a bad request
on the queue, but HAProxy is going to process the bad request first
because its timer is expiring first. Essentially if haproxy is receiving
X good requests per second, and Y bad requests per second, it's still
going to forward X good per second, and Y bad per second, to the backend
server. The only difference is that they're time shifted.

The other thing I could think you mean by time-based is to insert into
the queue sorting on MAX_QUEUE_TIME, just like a score-based queue, but
you would still record TIME_NOW() + MAX_QUEUE_TIME, and would reject
requests that don't get processed by their deadline. Essentially a
per-request version of the `timeout queue` setting. But I don't see any
real value in this.
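That second variant could be sketched as follows (a hypothetical per-request deadline recorded at enqueue time; not actual HAProxy code):

```c
#include <stdint.h>

/* Illustrative per-request queue deadline: record an absolute expiry
 * when the request is queued, and drop it if it is still queued past
 * that time. This acts as a per-request version of `timeout queue`. */
struct queued_req {
    uint64_t deadline_ms;     /* TIME_NOW() + MAX_QUEUE_TIME at enqueue */
};

static int queue_expired(const struct queued_req *r, uint64_t now_ms)
{
    return now_ms >= r->deadline_ms;
}
```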

Or do you mean something else?

>
>> So before continuing any further down this route, I wanted to see if
>> this is something that could make it into HAProxy, and what any thoughts
>> on it might be.
> Absolutely! I've dreamed of it for over a decade, so I'm glad someone
> is willing to take care of it! Just checked, the item was added 12
> years ago to the roadmap file in 1.2.13 on 2006/05/13 by commit 814cbc6
> ("[DOC] added (and updated) the ROADMAP file"). The lines were :
>
>  - wait queues replaced for priority-based trees
>  - ability to 

Re: Priority based queuing

2018-05-02 Thread Willy Tarreau
On Tue, May 01, 2018 at 09:34:14PM -0400, Patrick Hemmer wrote:
> Would it be possible to add priority based queuing to haproxy? By this I
> mean that when a server/backend is full (maxconn), that incoming
> requests would be added to the queue in a custom order. The idea here is
> that when the system is under stress, to make sure the important
> requests get handled first.

Hehe it's funny you mention this, as this has been postponed since
around 1.2 or 1.3! Back then we didn't have the equivalent of HTTP rules
to add/subtract some priority. Now we have everything to do it, we "just"
need to replace the lists with priority trees in the queues and that's
all. It's not a lot of work if someone is interested in working on this.

> In our exact use case, we're looking to use this to help mitigate DOS
> attacks. The idea is that if a layer 7 attack is saturating the backend
> servers, we can add logic to prioritize the requests. This logic might
> be things like requests that have a valid application cookie go to the
> front of the queue, or requests that come from a cloud provider (e.g.
> EC2) go to the back of the queue.

That's exactly why I wanted them to be manipulated via http-request rules,
so that everyone can construct their own conditions. Also I found that for
most shopping sites, having time-based priority is more important than
position-based : you often want this type of request to be processed 100ms
faster than another type of request. With HTTP/2 it will be even more
interesting because it will allow sending the important objects used for
rendering before the other ones, which is very similar to the H2 priority
but more fine-grained if you can adjust it on the fly.
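As a sketch, such rules could look like this with the directives from the submitted patch series (`set-priority-class` / `set-priority-offset` per the patch; the ACLs, cookie name, and file path below are purely hypothetical):

```
backend app
    mode http
    server srv1 192.0.2.10:80 maxconn 100

    # hypothetical policy: known-good sessions jump ahead of the queue,
    # traffic from cloud-provider ranges goes to the back of it
    acl good_session req.cook(app_session) -m found
    acl from_cloud   src -f /etc/haproxy/cloud-ranges.lst
    http-request set-priority-class int(-10) if good_session
    http-request set-priority-class int(10)  if from_cloud
```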

> DOS mitigation is hard because while you can write rules to identify
> requests that are suspicious, you don't want to block them outright as
> it is possible they might be legitimate. With prioritization, the
> requests still get through, and are only affected when the backend is
> saturated. If maxconn is not reached, the prioritization has no effect
> at all (since queue is empty).

I wholeheartedly agree with you :-)

> I made the change to haproxy and simulated the conditions in a lab, and
> the strategy appears to work.
> The change to haproxy was very minor, ~10 lines in queue.c, using
> `task->nice` as the prioritization key. However my change is a very
> rough PoC, and not worthy of submission.

For a rough PoC it's indeed perfectly fine. But for a final design we
really need a separate offset. I've really been convinced in the field about
using time rather than position, if you want to experiment with this I
can give you some insights, it's the same in fact.

> So before continuing any further down this route, I wanted to see if
> this is something that could make it into HAProxy, and what any thoughts
> on it might be.

Absolutely! I've dreamed of it for over a decade, so I'm glad someone
is willing to take care of it! Just checked, the item was added 12
years ago to the roadmap file in 1.2.13 on 2006/05/13 by commit 814cbc6
("[DOC] added (and updated) the ROADMAP file"). The lines were :

 - wait queues replaced for priority-based trees
 - ability to assign a prio based on L7 matching

The goal has not changed since, I'm patient :-)

Willy