[PATCH 0/2] Re: Priority based queuing
Ok, so here is the full submission for priority based queuing. Notes since the previous update:

Wasn't really able to optimize the tree search any. Tried a few things, but nothing made a measurable performance difference.

I added a warning message and documentation making clear the issues with timestamp wrapping. One thing that might not be completely obvious is that even if the user does not configure `set-priority-offset` at all, they're still susceptible to the wrapping issue, as the priority is the queue key whether the priority is adjusted or not.

The implementation of %sq (srv_queue) and %bq (backend_queue) was changed to keep the description accurate. The description is "number of requests which were processed before this one". The previous implementation just stored the size of the queue at the time the connection was queued. Since we can now inject a connection into the middle of the queue, this no longer works. Now we keep a count of dequeued connections, and take the difference between the count when the connection was queued and when it was dequeued. This also means the value will be slightly different even for users who don't use priority, as the previous method would have included connections which closed without being processed.

I added sample fetches for retrieving the class/offset of the current transaction. I think it might be beneficial to add some other fetches for tracking the health of the queue, such as an average class/offset, or an exponential moving average of the class/offset for requests added to the queue, requests processed, and requests which closed or timed out. But this is just more stuff the code would have to store, so I'm unsure whether they're worth it.

I wasn't convinced the 64-bit key was a bad idea, so I implemented that idea with a 12/52 split and an absolute timestamp. On my system (which is 64-bit) the performance is about 20% faster, the code is much simpler, and it also solves the limitations and issues with wrapping.
The patch for this is included in case it's of interest.

-Patrick

Patrick Hemmer (2):
  MEDIUM: add set-priority-class and set-priority-offset
  use a 64-bit int with absolute timestamp for priority-offset

 doc/configuration.txt  |  38 +++
 doc/lua-api/index.rst  |  18
 include/types/proxy.h  |   3 +-
 include/types/queue.h  |   2 +-
 include/types/server.h |   3 +-
 include/types/stream.h |   7 +-
 src/cfgparse.c         |  15 +++
 src/hlua.c             |  69 +
 src/log.c              |   4 +-
 src/proto_http.c       |   4 +-
 src/proxy.c            |   2 +-
 src/queue.c            | 261 +
 src/server.c           |   2 +-
 src/stream.c           |  10 +-
 14 files changed, 366 insertions(+), 72 deletions(-)

-- 
2.16.3
[PATCH 0/2] Re: Priority based queuing
On 2018/5/5 13:55, Willy Tarreau wrote:
> On Sat, May 05, 2018 at 01:33:51PM -0400, Patrick Hemmer wrote:
>>> Also I'm thinking that we can even use 32-bit by putting the frontier between the date and the fixed priority (let's call it class) somewhere else :
>>> - 8 bit class + 24 bit offset => 256 classes and +/- 2.3 hours offsets
>>> - 12 bit class + 20 bit offset => 4096 classes and +/- 8 minutes offsets
>>> - 16 bit class + 16 bit offset => 64k classes and +/- 32 second offsets
>> What's the benefit to using 32-bit over 64-bit, and is that benefit worth the cost of the added code complexity and processing time?
> Slightly smaller memory footprint (doesn't count much), but more importantly much smaller processing time, especially on 32-bit systems.

I would have expected the extra processing for wrapping to negate the benefit.

>> If we used 64-bit, we could do a 16/48 bit split and have an absolute timestamp out to year 10889 at millisecond precision.
> Not really, because that would force us to implement yet another clock, and then to have two distinct sets of clocks for queues and for all other internal events and timeouts (including the queue timeout!).

We already have the `now` and `date` variables. The priority is only used for the queue key, and not to replace anything else. The queue key is already getting replaced, so it would be no additional work.

> That would really be the beginning of a mess. We've been running with 32-bit millisecond-precise ticks for over a decade without ever a complaint that it was not enough, nor that the wrapping at 49.7 days was an issue. If one day we'd decide to change this, it would be to support finer-grained timestamps (eg: nanoseconds or CPU cycles), which would not necessarily protect us from wrapping either.
>>> I'm pretty sure that the two first options are way enough to cover almost everyone's needs.
>> For me a 12-bit class should work. My intention is to implement something similar to email spam filtering, where you generate a score based on the rules that match, and some rules are worth more than others. 8 bits might be a bit difficult to work within.
> Makes sense.
>> I also suspect the 16-bit option (32 second offset) is too small for date based.
> Yes, I agree. I'm pretty sure that for some TCP connections certain users will want to have a long delay (eg: wait a minute before your FTP transfer starts).
>> The rest of this all seems fine. I've got no other thoughts to add.
> Great, thanks!
>> So the question is, which route do we go?
>> 1. 32-bit int (12/20 split)
>> 2. 64-bit int (32/32 split)
> I think you can start with the 12/20 split first, knowing that if anybody ever requests an extension of either time or classes we can switch to 64 without much effort. Until this happens we'll be more efficient on small systems. Conversely, we can't do it the other way around because we could break some setups :-)
>> 3. 64-bit int (16/48 split) with absolute timestamps
> I don't like it much for the reasons above.
>> 4. 64-bit int with a single `set-priority` action. Let the user use a new `ms()` fetch for time based. The user does the bit shifting if they want to use both (using the `mul()` converter, or we can add a `shift()`).
> I think this level of complexity ought to be done once in the code and not each and every time for all users. It would only be manageable for a number of the people on this list, but newcomers would call us names for this :-)
> Thanks,
> Willy

Ok, so this patch set includes a fully functional implementation of the priority class & offset. Unfortunately handling the offset got really complicated due to the wrapping & locking. There might be a little room for improvement, but I don't think much. I'm still working on the code, but I wanted to submit this for review as the final version probably won't be much different.
The second patch is a test I wrote to try and catch all the different scenarios. I'm including it in case you were inclined to play with the tree search algorithm and test it. It's not intended to be merged.

Things to do:
* Try and optimize the macros & tree search.
* Output a warning if `timeout queue` > max offset.
* Add LUA methods & sample fetches.
* General cleanup.
* Docs

-Patrick

Patrick Hemmer (2):
  MEDIUM: add set-priority-class and set-priority-offset
  tests for queue set-priority

 Makefile.test          |  21
 include/types/proxy.h  |   2 +-
 include/types/queue.h  |   2 +-
 include/types/server.h |   2 +-
 include/types/stream.h |   2 +
 src/proxy.c            |   2 +-
 src/queue.c            | 280 ++---
 src/server.c           |   2 +-
 src/stream.c           |   2 +
 tests/test-queue.cfg   |   8 ++
 tests/test_queue.c     | 117 +
 11 files changed, 396 insertions(+), 44 deletions(-)
 create mode 100644 Makefile.test
 create
Re: Priority based queuing
On Sat, May 05, 2018 at 01:33:51PM -0400, Patrick Hemmer wrote:
> > Also I'm thinking that we can even use 32-bit by putting the frontier between the date and the fixed priority (let's call it class) somewhere else :
> > - 8 bit class + 24 bit offset => 256 classes and +/- 2.3 hours offsets
> > - 12 bit class + 20 bit offset => 4096 classes and +/- 8 minutes offsets
> > - 16 bit class + 16 bit offset => 64k classes and +/- 32 second offsets
> What's the benefit to using 32-bit over 64-bit, and is that benefit worth the cost of the added code complexity and processing time?

Slightly smaller memory footprint (doesn't count much), but more importantly much smaller processing time, especially on 32-bit systems.

> If we used 64-bit, we could do a 16/48 bit split and have an absolute timestamp out to year 10889 at millisecond precision.

Not really, because that would force us to implement yet another clock, and then to have two distinct sets of clocks for queues and for all other internal events and timeouts (including the queue timeout!). That would really be the beginning of a mess. We've been running with 32-bit millisecond-precise ticks for over a decade without ever a complaint that it was not enough, nor that the wrapping at 49.7 days was an issue. If one day we'd decide to change this, it would be to support finer-grained timestamps (eg: nanoseconds or CPU cycles), which would not necessarily protect us from wrapping either.

> > I'm pretty sure that the two first options are way enough to cover almost everyone's needs.
> For me a 12-bit class should work. My intention is to implement something similar to email spam filtering, where you generate a score based on the rules that match, and some rules are worth more than others. 8 bits might be a bit difficult to work within.

Makes sense.

> I also suspect the 16-bit option (32 second offset) is too small for date based.

Yes, I agree. I'm pretty sure that for some TCP connections certain users will want to have a long delay (eg: wait a minute before your FTP transfer starts).

> The rest of this all seems fine. I've got no other thoughts to add.

Great, thanks!

> So the question is, which route do we go?
> 1. 32-bit int (12/20 split)
> 2. 64-bit int (32/32 split)

I think you can start with the 12/20 split first, knowing that if anybody ever requests an extension of either time or classes we can switch to 64 without much effort. Until this happens we'll be more efficient on small systems. Conversely, we can't do it the other way around because we could break some setups :-)

> 3. 64-bit int (16/48 split) with absolute timestamps

I don't like it much for the reasons above.

> 4. 64-bit int with a single `set-priority` action. Let the user use a new `ms()` fetch for time based. The user does the bit shifting if they want to use both (using the `mul()` converter, or we can add a `shift()`).

I think this level of complexity ought to be done once in the code and not each and every time for all users. It would only be manageable for a number of the people on this list, but newcomers would call us names for this :-)

Thanks,
Willy
Re: Priority based queuing
On 2018/5/5 01:29, Willy Tarreau wrote:
> On Fri, May 04, 2018 at 06:49:00PM -0400, Patrick Hemmer wrote:
>> I'm not quite following the need for multiple queues. Why wouldn't you just have one sorted queue, where if multiple pending requests have the same priority, then they're FIFO.
> That's what the time-ordered queue does. I proposed this so that you can ensure that lower priorities are always processed first until there is no more of the same level.

Ah, I was considering the 2 solutions as an either-or choice. Not that you'd use both in the same configuration, or at least in the same backend.

>> The queue was implemented using the existing pendconns code. The only thing that was changed was to insert in sorted order. Obviously this is just a quick and dirty implementation. There are lots of priority queue implementations to choose from, and I don't know which one is best suited to this kind of task. We could keep the existing sorted linked-list, and just do some optimizations for insertion. Or we could switch to some sort of tree. The linked-list has the advantage in that I suspect that the vast majority of additions will append to the end, and pops are always off the front, so well suited to a linked list. A tree might become very unbalanced very fast, and require lots of maintenance.
> No, trees are quite cheap, we use them absolutely everywhere now. Just take a look at wake_expired_tasks() to get an idea. In fact every single time we've been relying on lists doing linear lookups, we've got several bug reports about haproxy eating all the CPU because some people were using it in a way that wasn't initially considered as reasonable but which made a lot of sense to them.

This is why I liked the idea of a single implementation. It gave the flexibility to the user to do whatever they wanted (time or class based). Granted I was only thinking of doing one or the other, but the user could still do both; they would just have to manage dedicating a range to the date/class. Maybe someone would want to consider date before class.

> In this case you'd be better off with the principle consisting in using a larger integer with the priority at the top and the date at the bottom. But that's where I said that it would require two lookups to deal with the wrapping date, so I'm not sure it's any better than using multiple queues. Well, at least it's less maintenance burden. The detailed principle is the following :
>
>    uint64_t priority :
>
>      63             32 31              0
>      [ hard priority  |      date      ]
>
>    next = lookup_first(queue);
>    if (!next)
>        return;
>
>    => next is the lowest value of both priority and date
>
> Then you perform a second lookup with the oldest past date for the same priority level :
>
>    key = (next->key & 0xFFFFFFFF00000000ULL) | (now_ms - TIMER_LOOK_BACK);
>    next2 = lookup_ge(queue, key);
>    if (next2 && !((next2->value ^ next->value) >> 32))
>        next = next2;
>
> Then you dequeue next which is always the next one.
>
> That's where I thought that using two distinct levels was simpler, but if we factor in the fact that it would limit the number of priority classes and would require more maintenance code, it's probably not worth the hassle to save the 4 extra lines we have above to respect the wrapping date.
>
> Also I'm thinking that we can even use 32-bit by putting the frontier between the date and the fixed priority (let's call it class) somewhere else :
> - 8 bit class + 24 bit offset => 256 classes and +/- 2.3 hours offsets
> - 12 bit class + 20 bit offset => 4096 classes and +/- 8 minutes offsets
> - 16 bit class + 16 bit offset => 64k classes and +/- 32 second offsets

What's the benefit to using 32-bit over 64-bit, and is that benefit worth the cost of the added code complexity and processing time? If we used 64-bit, we could do a 16/48 bit split and have an absolute timestamp out to year 10889 at millisecond precision.

> I'm pretty sure that the two first options are way enough to cover almost everyone's needs.

For me a 12-bit class should work. My intention is to implement something similar to email spam filtering, where you generate a score based on the rules that match, and some rules are worth more than others. 8 bits might be a bit difficult to work within. I also suspect the 16-bit option (32 second offset) is too small for date based.

> Let's say for a moment that we'd use the second option (12+20 bits), the queue insertion code becomes this :
>
>    pendconn->node.key = (stream->prio_class << 20) +
>                         (now_ms + stream->prio_offset) & 0xFFFFF;
>    if (srv)
>        eb32_insert(&srv->pendconns, &pendconn->node);
>    else
>        eb32_insert(&px->pendconns, &pendconn->node);
>
> And the dequeuing code in pendconn_process_next_strm() which takes care both of the backend's and server's queues while still respecting their classes would become
Re: Priority based queuing
On Fri, May 04, 2018 at 06:49:00PM -0400, Patrick Hemmer wrote:
> I'm not quite following the need for multiple queues. Why wouldn't you just have one sorted queue, where if multiple pending requests have the same priority, then they're FIFO.

That's what the time-ordered queue does. I proposed this so that you can ensure that lower priorities are always processed first until there is no more of the same level.

> I had sent another email which proposed an implementation that would satisfy both designs (https://www.mail-archive.com/haproxy@formilux.org/msg29876.html). Unfortunately I was composing it when you sent your most recent reply, so I didn't see it before I sent.

Thanks, I missed this one indeed.

> I went ahead and implemented the functionality I had talked about in that other email, and have attached a WIP patch. I've done some testing on it, and it seems to work fine. The syntax is exactly as proposed:
>
>    # time-based priority
>    http-request set-priority date(20) if LOW_PRIORITY
>    http-request set-priority date(-20) if HIGH_PRIORITY
>
>    # score-based priority
>    http-request set-priority int(20) if LOW_PRIORITY
>    http-request set-priority int(-20) if HIGH_PRIORITY
>
> Some notes on the implementation:
>
> I used a `long` for tracking the priority. Obviously this has limitations when used with date(), as the timestamps are fairly large. I can change this to `long long`.

Ah, now I understand how you wanted to implement it. I have a big concern here with wrapping time however, which is what I explained in an earlier mail where I said that if needed we could mix date and priority in the same integer, but that would still require two lookups. Our date wraps every 49.7 days (32-bit millisecond resolution), so depending on the phases of the moon some requests will be processed before the highest priority ones or after, and you'll see some priority inversion 7 times a year.

> The queue was implemented using the existing pendconns code. The only thing that was changed was to insert in sorted order. Obviously this is just a quick and dirty implementation. There are lots of priority queue implementations to choose from, and I don't know which one is best suited to this kind of task. We could keep the existing sorted linked-list, and just do some optimizations for insertion. Or we could switch to some sort of tree. The linked-list has the advantage in that I suspect that the vast majority of additions will append to the end, and pops are always off the front, so well suited to a linked list. A tree might become very unbalanced very fast, and require lots of maintenance.

No, trees are quite cheap, we use them absolutely everywhere now. Just take a look at wake_expired_tasks() to get an idea. In fact every single time we've been relying on lists doing linear lookups, we've got several bug reports about haproxy eating all the CPU because some people were using it in a way that wasn't initially considered as reasonable but which made a lot of sense to them. In the case of the list above, the corner case would be a server with "maxconn 5000" and half of the requests having a high priority and the other half having a low one, resulting in an average of 2500 visits in the optimal case where you can insert from both sides (which was what we used to do a very long time ago in the scheduler).

> Using the date() fetch obviously doesn't provide good precision.

Hmm, good point. Here it's seconds, so it will not wrap for the next 20 years but will provide a useless precision.

> We could add a `datems()`, `datens()`, or `ticks()` (ms since start) sample fetch. But obviously we would need to use `long long` for that, though ticks could use a `long`, but would wrap at around 16 months.

No, 49.7 days ;-)

The other problem I'm having here is that it artificially mixes two types of totally unrelated data and relies on the lack of intersection between their domains (when time doesn't wrap).
Indeed, here you're considering that values below a certain number cannot be dates, so they definitely are fixed priorities. But that also means that people who want to use ranges will randomly collide with dates and will observe a changing behaviour along the year.

In this case you'd be better off with the principle consisting in using a larger integer with the priority at the top and the date at the bottom. But that's where I said that it would require two lookups to deal with the wrapping date, so I'm not sure it's any better than using multiple queues. Well, at least it's less maintenance burden. The detailed principle is the following :

   uint64_t priority :

     63             32 31              0
     [ hard priority  |      date      ]

   next = lookup_first(queue);
   if (!next)
       return;

   => next is the lowest value of both priority and date

Then you perform a second lookup with the oldest past date for the same priority level :

   key = (next->key & 0xFFFFFFFF00000000ULL) | (now_ms - TIMER_LOOK_BACK);
Re: Priority based queuing
On 2018/5/2 22:56, Willy Tarreau wrote:
> On Wed, May 02, 2018 at 04:29:33PM -0400, Patrick Hemmer wrote:
>> I think you're misunderstanding my design, as scoring wouldn't work like this at all. If you give the gold class a score of 1000 (where higher number means higher priority), then the only thing that would get processed before gold class would be another class that has a score > 1000. If nothing does, and a gold request comes in, it gets processed first, no matter how big the queue is. Think of scoring as instead having multiple queues. Each queue is only processed if the queue before it is empty.
> (...)
>
> OK, you convinced me. Not on everything, but on the fact that we're trying to address different points. My goal is to make it possible to improve quality of service between good requests, and your goal is to improve the classification between good, suspicious, and bad requests. Each of us sees how to expand a little bit our respective model to address part of the other one (though less efficiently).
>
> I agree that for your goal, multi-queue is better, but I still maintain that for my goal, the time-based queue gives better control. The good news is that the two are orthogonal and 100% compatible.
>
> Basically the queueing system should be redesigned as a list of time-based trees that are visited in order of priority (or traffic classes; we'll have to find the proper wording to avoid confusion). If you think you can address your needs with just a small set of different priorities, probably we can implement this with a small array (2-4 queues). If you think you need more, then we'd rather think about building a composite position value in the tree, made of the priority at the top of the word and the service time at the bottom of the word. This way, picking the first value will always find the lowest priority value first. There's one subtlety related to wrapping time there however, but it can be addressed in two lookups.
>
> Please let me know if you'd be fine with designing and implementing something like this.
>
> Cheers,
> Willy

I'm not quite following the need for multiple queues. Why wouldn't you just have one sorted queue, where if multiple pending requests have the same priority, then they're FIFO.

I had sent another email which proposed an implementation that would satisfy both designs (https://www.mail-archive.com/haproxy@formilux.org/msg29876.html). Unfortunately I was composing it when you sent your most recent reply, so I didn't see it before I sent.

I went ahead and implemented the functionality I had talked about in that other email, and have attached a WIP patch. I've done some testing on it, and it seems to work fine. The syntax is exactly as proposed:

   # time-based priority
   http-request set-priority date(20) if LOW_PRIORITY
   http-request set-priority date(-20) if HIGH_PRIORITY

   # score-based priority
   http-request set-priority int(20) if LOW_PRIORITY
   http-request set-priority int(-20) if HIGH_PRIORITY

Some notes on the implementation:

I used a `long` for tracking the priority. Obviously this has limitations when used with date(), as the timestamps are fairly large. I can change this to `long long`.

The queue was implemented using the existing pendconns code. The only thing that was changed was to insert in sorted order. Obviously this is just a quick and dirty implementation. There are lots of priority queue implementations to choose from, and I don't know which one is best suited to this kind of task. We could keep the existing sorted linked-list, and just do some optimizations for insertion. Or we could switch to some sort of tree. The linked-list has the advantage in that I suspect that the vast majority of additions will append to the end, and pops are always off the front, so well suited to a linked list. A tree might become very unbalanced very fast, and require lots of maintenance.

Using the date() fetch obviously doesn't provide good precision. We could add a `datems()`, `datens()`, or `ticks()` (ms since start) sample fetch. But obviously we would need to use `long long` for that, though ticks could use a `long`, but would wrap at around 16 months.

I still plan on adding a sample fetch for obtaining the current priority value, and a lua function for setting it. Not sure if we want a lua function for retrieving, or if using the fetch is good enough.

-Patrick

From 4783dccfc869cfb4219d195ba21f896272211c9e Mon Sep 17 00:00:00 2001
From: Patrick Hemmer
Date: Fri, 4 May 2018 16:31:16 -0400
Subject: [PATCH] MINOR: add {http,tcp}-request set-priority for queue
 prioritization

This adds the set-priority action to http-request and tcp-request content. The priority value is used when connections are queued to determine which connections should be served first. The lowest integer value is served first (akin to the 'nice' parameter).
---
 include/types/stream.h |
Re: Priority based queuing
On 2018/5/2 16:29, Patrick Hemmer wrote:
> On 2018/5/2 13:22, Willy Tarreau wrote:
>> On Wed, May 02, 2018 at 12:44:06PM -0400, Patrick Hemmer wrote:
>>> Can you elaborate on what you're thinking of for a time-based queue?
>>>
>>> What I'm imagining you mean is that you would write a rule to set the max queue time, and haproxy would insert it into the queue sorting on TIME_NOW() + MAX_QUEUE_TIME. The main difference I see to this approach vs scoring is that you ensure that an item doesn't sit in the queue forever (or whatever `timeout queue` is set to) if higher priority stuff keeps getting inserted before it.
>> In part, but mostly that under contention you can still maintain a good quality of service. I remember having had a discussion a very long time ago with a gaming site operator explaining that he wanted to send profile management requests to a dedicated slow server so that people filling in their profile do not disturb the ones playing. For me it's a good example : "how long do you think these people are willing to wait for the server to respond to each click to update their profile?". Let's say 2 seconds is OK, you just add a 2s offset to the position in the queue for such requests. If in the mean time other higher priority requests come in, the delayed one may face the 2 seconds delay. But it will not face more. And obviously if no other request is pending before it in the queue it will be served earlier.
>>
>> A similar example is for certain shopping sites which consider that once you have something in your shopping cart, you're much more likely to finish the process and to pay because you've found the item you were looking for, so you're more willing to tolerate a slightly higher response time, and as such provide a better response time to newcomers. But while this small delay can easily be expressed in milliseconds (probably 50/100), it's almost impossible to define it using positions. What if 50 new visitors suddenly come in ? You don't want the guy with his item to suddenly experience a timeout. The time-based queue however grants you control over the service degradation you're accepting to inflict to your users.
> There's a key difference in our designs. You're trying to classify traffic based on destination. I'm trying to classify traffic based on source. In my design, the maxconn is set to a level such that the server is healthy and every request can be processed fast. With this and a score-based queue, under attack the good requests will be drained fast, leaving only the bad requests. Thus no normal user should be getting a timeout, no matter what resource they're going to. Your design works when the backend system can't keep up with good legitimate traffic, and you need to prioritize one good request over another good request. My design is for when the backend system can't keep up with bad traffic, and so we want to process the good traffic first.
>> Another interesting example is when you want to provide strong service guarantees to some top-level visitors while running under intense load. By using only a position, it's easy to say "I want the Gold class to advance by 1000 positions". But if the site is highly loaded and you have 10k pending requests, these 1000 positions remain irrelevant, because the request sits there, waiting for 90% of the pollution to be flushed.
> I think you're misunderstanding my design, as scoring wouldn't work like this at all. If you give the gold class a score of 1000 (where higher number means higher priority), then the only thing that would get processed before gold class would be another class that has a score > 1000. If nothing does, and a gold request comes in, it gets processed first, no matter how big the queue is. Think of scoring as instead having multiple queues. Each queue is only processed if the queue before it is empty.
>> If you say "I want these requests to advance by 10 seconds" then no matter how many other requests you have in the queue, it will really advance by 10 seconds and effectively shrink the response time by 10 seconds.
> It may shrink the response time by 10 seconds, but if the normal response time is 60 seconds, that's still 50 seconds of waiting. If your users are only willing to put up with 30 seconds of wait, this means you end up failing *everyone*.
>> Overall for user experience it's important to think in time and not positions. The rationale behind this is simple : the user will never accept as a justification for varying quality of service the number of other visitors. And positions just depend on this; it's an average number of people you're allowing to pass over. If I'm granted a pass ticket allowing me to advance 10 places in the cinema queue and I find a crowded street, I will go back home. If I'm told I won't wait more than 5
Re: Priority based queuing
On Wed, May 02, 2018 at 04:29:33PM -0400, Patrick Hemmer wrote: > I think you're misunderstanding my design, as scoring wouldn't work like > this at all. If you give the gold class a score of 1000 (where higher > number means higher priority), then the only thing that would get > processed before gold class would be another class that has a score > > 1000. If nothing does, and a gold request comes in, it gets processed > first, no matter how big the queue is. > Think of scoring as instead having multiple queues. Each queue is only > processed if the queue before it is empty. (...) OK you convinced me. Not on everything, but on the fact that we're trying to address different points. My goal is to make it possible to improve quality of service between good requests, and your goal is to improve the classification between good, suspicious, and bad requests. Each of us see how to expand a little bit our respective model to address part of the other one (though less efficiently). I agree that for your goal, multi-queue is better, but I still maintain that for my goal, the time-based queue gives better control. The good news is that the two are orthogonal and 100% compatible. Basically the queueing system should be redesigned as a list of time-based trees that are visited in order of priority (or traffic classes, we'll have to find the proper wording to avoid confusion). If you think you can address your needs with just a small set of different priorities, probably that we can implement this with a small array (2-4 queues). If you think you need more, then we'd rather think about building a composite position value in the tree made of the priority at the top of the word and the service time at the bottom of the word. This way, picking the first value will always find the lowest priority value first. There's one subtlety related to wrapping time there however, but it can be addressed in two lookups. 
Please let me know if you'd be fine with designing and implementing something like this.

Cheers,
Willy
Re: Priority based queuing
On 2018/5/2 13:22, Willy Tarreau wrote:
> On Wed, May 02, 2018 at 12:44:06PM -0400, Patrick Hemmer wrote:
>> Can you elaborate on what you're thinking of for a time-based queue?
>>
>> What I'm imagining you mean is that you would write a rule to set the
>> max queue time, and haproxy would insert it into the queue sorting on
>> TIME_NOW() + MAX_QUEUE_TIME. The main difference I see to this approach
>> vs scoring, is that you ensure that an item doesn't sit on the queue
>> forever (or whatever `timeout queue` is set to) if higher priority stuff
>> keeps getting inserted before it.

> In part, but mostly that under contention you can still maintain a
> good quality of service. I remember having had a discussion a very
> long time ago with a gaming site operator explaining that he wanted
> to send profile management requests to a dedicated slow server so
> that people filling in their profile do not disturb the ones playing.
> For me it's a good example: "how long do you think these people are
> willing to wait for the server to respond to each click to update
> their profile?". Let's say 2 seconds is OK, you just add a 2s offset
> to the position in the queue for such requests. If in the mean time
> other higher priority requests come in, the delayed one may face the
> 2 seconds delay. But it will not face more. And obviously if no other
> request is pending before it in the queue it will be served earlier.
>
> A similar example is for certain shopping sites which consider that
> once you have something in your shopping cart, you're much more likely
> to finish the process and to pay because you've found the item you
> were looking for, so you're more willing to tolerate a slightly higher
> response time, and as such provide a better response time to newcomers.
> But while this small delay can easily be expressed in milliseconds
> (probably 50/100), it's almost impossible to define it using positions.
> What if 50 new visitors suddenly come in?
> You don't want the guy with
> his item to suddenly experience a timeout. The time-based queue however
> grants you a control over the service degradation you're accepting to
> inflict to your users.

There's a key difference in our designs. You're trying to classify traffic based on destination; I'm trying to classify traffic based on source. In my design, maxconn is set to a level such that the server is healthy and every request can be processed fast. With that plus a score-based queue, under attack the good requests will be drained fast, leaving only the bad requests. Thus no normal user should get a timeout, no matter what resource they're going to. Your design works when the backend system can't keep up with good, legitimate traffic, and you need to prioritize one good request over another good request. My design is for when the backend system can't keep up with bad traffic, and so we want to process the good traffic first.

> Another interesting example is when you want to provide strong service
> guarantees to some top-level visitors while running under intense load.
> By using only a position, it's easy to say "I want the Gold class to
> advance by 1000 positions". But if the site is highly loaded and you
> have 10k pending requests, these 1000 positions remain irrelevant,
> because the request sits there, waiting for 90% of the pollution to
> be flushed.

I think you're misunderstanding my design, as scoring wouldn't work like this at all. If you give the gold class a score of 1000 (where a higher number means higher priority), then the only thing that would get processed before the gold class would be another class that has a score > 1000. If nothing does, and a gold request comes in, it gets processed first, no matter how big the queue is.

Think of scoring as instead having multiple queues. Each queue is only processed if the queue before it is empty.
> If you say "I want these requests to advance by 10 seconds"
> then no matter how many other requests you have in the queue, it will
> really advance by 10 seconds and effectively shrink the response time
> by 10 seconds.

It may shrink the response time by 10 seconds, but if the normal response time is 60 seconds, that's still 50 seconds of waiting. If your users are only willing to put up with 30 seconds of wait, this means you end up failing *everyone*.

> Overall for user experience it's important to think in time and not
> positions. The rationale behind this is simple: the user will never
> accept as a justification for varying quality of service the number of
> other visitors. And positions just depend on this, it's an average number
> of people you're allowing to pass over. If I'm granted a pass ticket
> allowing me to advance 10 places in the cinema queue and I find a
> crowded street, I will go back home. If I'm told I won't wait more
> than 5 minutes, I'll come regardless of the number of waiters.

But a time-based queue wouldn't work like this. You can't guarantee service within 5
Re: Priority based queuing
On Wed, May 02, 2018 at 12:44:06PM -0400, Patrick Hemmer wrote:
> Can you elaborate on what you're thinking of for a time-based queue?
>
> What I'm imagining you mean is that you would write a rule to set the
> max queue time, and haproxy would insert it into the queue sorting on
> TIME_NOW() + MAX_QUEUE_TIME. The main difference I see to this approach
> vs scoring, is that you ensure that an item doesn't sit on the queue
> forever (or whatever `timeout queue` is set to) if higher priority stuff
> keeps getting inserted before it.

In part, but mostly that under contention you can still maintain a good quality of service. I remember having had a discussion a very long time ago with a gaming site operator explaining that he wanted to send profile management requests to a dedicated slow server so that people filling in their profile do not disturb the ones playing. For me it's a good example: "how long do you think these people are willing to wait for the server to respond to each click to update their profile?". Let's say 2 seconds is OK, you just add a 2s offset to the position in the queue for such requests. If in the mean time other higher priority requests come in, the delayed one may face the 2 seconds delay. But it will not face more. And obviously if no other request is pending before it in the queue it will be served earlier.

A similar example is for certain shopping sites which consider that once you have something in your shopping cart, you're much more likely to finish the process and to pay because you've found the item you were looking for, so you're more willing to tolerate a slightly higher response time, and as such provide a better response time to newcomers. But while this small delay can easily be expressed in milliseconds (probably 50/100), it's almost impossible to define it using positions. What if 50 new visitors suddenly come in?
You don't want the guy with his item to suddenly experience a timeout. The time-based queue however grants you a control over the service degradation you're accepting to inflict to your users.

Another interesting example is when you want to provide strong service guarantees to some top-level visitors while running under intense load. By using only a position, it's easy to say "I want the Gold class to advance by 1000 positions". But if the site is highly loaded and you have 10k pending requests, these 1000 positions remain irrelevant, because the request sits there, waiting for 90% of the pollution to be flushed. If you say "I want these requests to advance by 10 seconds" then no matter how many other requests you have in the queue, it will really advance by 10 seconds and effectively shrink the response time by 10 seconds.

Overall for user experience it's important to think in time and not positions. The rationale behind this is simple: the user will never accept as a justification for varying quality of service the number of other visitors. And positions just depend on this, it's an average number of people you're allowing to pass over. If I'm granted a pass ticket allowing me to advance 10 places in the cinema queue and I find a crowded street, I will go back home. If I'm told I won't wait more than 5 minutes, I'll come regardless of the number of waiters.

> I don't think this is necessarily a good thing.

For having considered the two options several times in the field over the last decade when sites started to crawl, I became very well convinced ;-)

> If you're under a DOS
> attack, the goal is to get the good requests processed before any
> possible malicious requests. With a time based queue, those malicious
> requests will still get processed and starve out the good requests.

Precisely not that much, because the good ones will pass before, and the malicious ones will then be subject to the queue timeout if there are too many.
> For
> example let's say you're under attack, a bad request comes in with
> max_queue_time=1000ms, and then after 999ms elapse, a good request comes
> in with max_queue_time=10ms. You have a good request, and a bad request
> on the queue, but HAProxy is going to process the bad request first
> because its timer is expiring first.

Absolutely, but you fell into the trap of not considering that the queue is supposed to be full, so you're in fact comparing only two requests, while in practice you have many of them, and there the effect is much more visible.

> Essentially if haproxy is receiving
> X good requests per second, and Y bad requests per second, it's still
> going to forward X good per second, and Y bad per second, to the backend
> server. The only difference is that they're time shifted.

Except once subject to the queue timeout. It's a very important aspect we must not dismiss.

> The other thing I could think you mean by time-based is to insert into
> the queue sorting on MAX_QUEUE_TIME, just like a score-based queue, but
> you would still record TIME_NOW() + MAX_QUEUE_TIME, and would reject
> requests that don't
Re: Priority based queuing
On 2018/5/2 11:04, Willy Tarreau wrote:
> On Tue, May 01, 2018 at 09:34:14PM -0400, Patrick Hemmer wrote:
>> Would it be possible to add priority based queuing to haproxy? By this I
>> mean that when a server/backend is full (maxconn), incoming
>> requests would be added to the queue in a custom order. The idea here is
>> that when the system is under stress, to make sure the important
>> requests get handled first.

> Hehe, it's fun that you mention this, as it has been postponed since
> around 1.2 or 1.3! Back then we didn't have the equivalent of HTTP rules
> to add/subtract some priority. Now we have everything to do it, we "just"
> need to replace the lists with priority trees in the queues and that's
> all. It's not a lot of work if someone is interested in working on this.

>> In our exact use case, we're looking to use this to help mitigate DOS
>> attacks. The idea is that if a layer 7 attack is saturating the backend
>> servers, we can add logic to prioritize the requests. This logic might
>> be things like requests that have a valid application cookie go to the
>> front of the queue, or requests that come from a cloud provider (e.g.
>> EC2) go to the back of the queue.

> That's exactly why I wanted them to be manipulated via http-request rules,
> so that everyone can construct his own conditions. Also I found that for
> most shopping sites, having time-based priority is more important than
> position-based: you often want this type of request to be processed 100ms
> faster than another type of request. With HTTP/2 it will be even more
> interesting because it will allow sending the important objects used for
> rendering before the other ones, which is very similar to the H2 priority
> but more fine-grained if you can adjust it on the fly.

>> DOS mitigation is hard because while you can write rules to identify
>> requests that are suspicious, you don't want to block them outright as
>> it is possible they might be legitimate.
>> With prioritization, the
>> requests still get through, and are only affected when the backend is
>> saturated. If maxconn is not reached, the prioritization has no effect
>> at all (since the queue is empty).

> I wholeheartedly agree with you :-)

>> I made the change to haproxy and simulated the conditions in a lab, and
>> the strategy appears to work.
>> The change to haproxy was very minor, ~10 lines in queue.c, using
>> `task->nice` as the prioritization key. However my change is a very
>> rough PoC, and not worthy of submission.

> For a rough PoC it's indeed perfectly fine. But for a final design we
> really need a separate offset. I've really been convinced in the field
> about using time rather than position; if you want to experiment with
> this I can give you some insights, it's the same in fact.

Can you elaborate on what you're thinking of for a time-based queue?

What I'm imagining you mean is that you would write a rule to set the max queue time, and haproxy would insert it into the queue sorting on TIME_NOW() + MAX_QUEUE_TIME. The main difference I see in this approach vs scoring is that you ensure an item doesn't sit on the queue forever (or for whatever `timeout queue` is set to) if higher-priority stuff keeps getting inserted before it.

I don't think this is necessarily a good thing. If you're under a DOS attack, the goal is to get the good requests processed before any possible malicious requests. With a time-based queue, those malicious requests will still get processed and starve out the good requests. For example, let's say you're under attack: a bad request comes in with max_queue_time=1000ms, and then after 999ms elapse, a good request comes in with max_queue_time=10ms. You have a good request and a bad request in the queue, but HAProxy is going to process the bad request first because its timer is expiring first.
Essentially if haproxy is receiving X good requests per second, and Y bad requests per second, it's still going to forward X good per second, and Y bad per second, to the backend server. The only difference is that they're time-shifted.

The other thing I could think you mean by time-based is to insert into the queue sorting on MAX_QUEUE_TIME, just like a score-based queue, but you would still record TIME_NOW() + MAX_QUEUE_TIME, and would reject requests that don't get processed by their deadline. Essentially a per-request version of the `timeout queue` setting. But I don't see any real value in this. Or do you mean something else?

>> So before continuing any further down this route, I wanted to see if
>> this is something that could make it into HAProxy, and what any thoughts
>> on it might be.

> Absolutely! I've dreamed of it for over a decade, so I'm glad someone
> is willing to take care of it! Just checked, the item was added 12
> years ago to the roadmap file in 1.2.13 on 2006/05/13 by commit 814cbc6
> ("[DOC] added (and updated) the ROADMAP file"). The lines were:
>
> - wait queues replaced for priority-based trees
> - ability to assign a prio based on L7 matching
Re: Priority based queuing
On Tue, May 01, 2018 at 09:34:14PM -0400, Patrick Hemmer wrote:
> Would it be possible to add priority based queuing to haproxy? By this I
> mean that when a server/backend is full (maxconn), incoming
> requests would be added to the queue in a custom order. The idea here is
> that when the system is under stress, to make sure the important
> requests get handled first.

Hehe, it's fun that you mention this, as it has been postponed since around 1.2 or 1.3! Back then we didn't have the equivalent of HTTP rules to add/subtract some priority. Now we have everything to do it, we "just" need to replace the lists with priority trees in the queues and that's all. It's not a lot of work if someone is interested in working on this.

> In our exact use case, we're looking to use this to help mitigate DOS
> attacks. The idea is that if a layer 7 attack is saturating the backend
> servers, we can add logic to prioritize the requests. This logic might
> be things like requests that have a valid application cookie go to the
> front of the queue, or requests that come from a cloud provider (e.g.
> EC2) go to the back of the queue.

That's exactly why I wanted them to be manipulated via http-request rules, so that everyone can construct his own conditions. Also I found that for most shopping sites, having time-based priority is more important than position-based: you often want this type of request to be processed 100ms faster than another type of request. With HTTP/2 it will be even more interesting because it will allow sending the important objects used for rendering before the other ones, which is very similar to the H2 priority but more fine-grained if you can adjust it on the fly.

> DOS mitigation is hard because while you can write rules to identify
> requests that are suspicious, you don't want to block them outright as
> it is possible they might be legitimate. With prioritization, the
> requests still get through, and are only affected when the backend is
> saturated.
> If maxconn is not reached, the prioritization has no effect
> at all (since the queue is empty).

I wholeheartedly agree with you :-)

> I made the change to haproxy and simulated the conditions in a lab, and
> the strategy appears to work.
> The change to haproxy was very minor, ~10 lines in queue.c, using
> `task->nice` as the prioritization key. However my change is a very
> rough PoC, and not worthy of submission.

For a rough PoC it's indeed perfectly fine. But for a final design we really need a separate offset. I've really been convinced in the field about using time rather than position; if you want to experiment with this I can give you some insights, it's the same in fact.

> So before continuing any further down this route, I wanted to see if
> this is something that could make it into HAProxy, and what any thoughts
> on it might be.

Absolutely! I've dreamed of it for over a decade, so I'm glad someone is willing to take care of it! Just checked, the item was added 12 years ago to the roadmap file in 1.2.13 on 2006/05/13 by commit 814cbc6 ("[DOC] added (and updated) the ROADMAP file"). The lines were:

- wait queues replaced for priority-based trees
- ability to assign a prio based on L7 matching

The goal has not changed since, I'm patient :-)

Willy