[PATCH 0/2] Re: Priority based queuing

2018-05-11 Thread Patrick Hemmer
Ok, so here is the full submission for priority based queuing.

Notes since previous update:

I wasn't able to optimize the tree search any further. I tried a few
things, but nothing made a measurable performance difference.

I added a warning message and documentation making the issues with
timestamp wrapping clear.
One thing that might not be completely obvious is that even if the user
does not configure `set-priority-offset` at all, they are still
susceptible to the wrapping issue, since the priority is used as the
queue key whether or not it has been adjusted.
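
To illustrate what "susceptible to wrapping" means here, a minimal
sketch, assuming a 12/20 split where the low 20 bits of the key carry a
millisecond timestamp (the helper and constant names are illustrative,
not the actual patch code). Ordering can only be decided in a wrap-aware
way, and it only holds while two keys are less than half the 20-bit
window (~8.7 minutes) apart:

    #define KEY_OFFSET_BITS 20

    /* Non-zero if 20-bit timestamp 'a' sorts before 'b'. The difference is
     * interpreted as a signed value within the 20-bit window (same spirit
     * as haproxy's tick comparisons). Once two keys drift more than ~8.7
     * minutes apart the answer flips, which is the wrapping issue the
     * warning covers. */
    static inline int key_ts_is_before(unsigned int a, unsigned int b)
    {
        return ((a - b) & (1U << (KEY_OFFSET_BITS - 1))) != 0;
    }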

The implementation of the %sq (srv_queue) and %bq (backend_queue) log
fields was changed to keep the description accurate. The description is
"number of requests which were processed before this one". The previous
implementation just stored the size of the queue at the time the
connection was queued. Since we can now inject a connection into the
middle of the queue, that no longer works. Instead we keep a count of
dequeued connections, and take the difference between that count when
the connection was queued and when it was dequeued. This also means the
value will be slightly different even for users who don't use priority,
as the previous method would have included connections which closed
without being processed.
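
A rough sketch of that counting scheme (the structures and field names
here are illustrative only, not the actual haproxy ones):

    struct queue_stats {
        unsigned int dequeued;   /* total connections dequeued so far */
    };

    struct queued_conn {
        unsigned int queue_mark; /* snapshot of 'dequeued' at enqueue time */
    };

    static void conn_enqueued(struct queued_conn *c, struct queue_stats *s)
    {
        c->queue_mark = s->dequeued;
    }

    /* Returns the value reported by %sq / %bq: how many connections were
     * dequeued (i.e. processed) while this one was waiting in the queue. */
    static unsigned int conn_dequeued(struct queued_conn *c, struct queue_stats *s)
    {
        unsigned int before_us = s->dequeued - c->queue_mark;

        s->dequeued++;
        return before_us;
    }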

I added sample fetches for retrieving the class/offset of the current
transaction. I think it might be beneficial to add some other fetches
for tracking the health of the queue, such as the average class/offset,
or an exponential moving average of the class/offset for requests added
to the queue, requests processed, and requests which closed or timed
out. But this is more state the code would have to store, so I'm unsure
whether they're worth it.
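
For the moving-average idea, the bookkeeping itself would be cheap; a
purely hypothetical sketch (nothing like this is in the patches):

    /* alpha in (0,1]: larger values weight recent requests more heavily.
     * Called once per event (enqueue, dequeue, close/timeout) with the
     * class or offset of the affected request as the sample. */
    static void ema_update(double *ema, double sample, double alpha)
    {
        *ema += alpha * (sample - *ema);
    }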


I wasn't convinced the 64-bit key was a bad idea, so I implemented it
with a 12/52 split and an absolute timestamp. On my system (which is
64-bit) the performance is about 20% faster, the code is much simpler,
and it also avoids the limitations and issues with wrapping.
The patch for this is included in case it's of interest.
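
Roughly, the 64-bit key looks like this (a sketch of my understanding of
the layout; the names and the exact class range are illustrative):

    #include <stdint.h>

    #define KEY_CLASS_BITS  12
    #define KEY_OFFSET_BITS 52

    /* Class in the top 12 bits (biased so a signed class still sorts
     * correctly in an unsigned key), absolute milliseconds plus the
     * configured offset in the low 52 bits -- enough for roughly
     * 142,000 years without wrapping. */
    static inline uint64_t make_queue_key(int cls, uint64_t now_ms, int64_t offset_ms)
    {
        uint64_t key;

        key  = (uint64_t)(cls + (1 << (KEY_CLASS_BITS - 1))) << KEY_OFFSET_BITS;
        key |= (uint64_t)(now_ms + offset_ms) & ((1ULL << KEY_OFFSET_BITS) - 1);
        return key;
    }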

-Patrick


Patrick Hemmer (2):
  MEDIUM: add set-priority-class and set-priority-offset
  use a 64-bit int with absolute timestamp for priority-offset

 doc/configuration.txt  |  38 +++
 doc/lua-api/index.rst  |  18
 include/types/proxy.h  |   3 +-
 include/types/queue.h  |   2 +-
 include/types/server.h |   3 +-
 include/types/stream.h |   7 +-
 src/cfgparse.c         |  15 +++
 src/hlua.c             |  69 +
 src/log.c              |   4 +-
 src/proto_http.c       |   4 +-
 src/proxy.c            |   2 +-
 src/queue.c            | 261 +
 src/server.c           |   2 +-
 src/stream.c           |  10 +-
 14 files changed, 366 insertions(+), 72 deletions(-)

-- 
2.16.3



[PATCH 0/2] Re: Priority based queuing

2018-05-09 Thread Patrick Hemmer
On 2018/5/5 13:55, Willy Tarreau wrote:
> On Sat, May 05, 2018 at 01:33:51PM -0400, Patrick Hemmer wrote:
>>> Also I'm thinking that we can even use 32-bit by putting the frontier
>>> between the date and the fixed priority (let's call it class) somewhere
>>> else :
>>>   - 8 bit class + 24 bit offset  => 256 classes and +/- 2.3 hours offsets
>>>   - 12 bit class + 20 bit offset => 4096 classes and +/- 8 minutes offsets
>>>   - 16 bit class + 16 bit offset => 64k classes and +/- 32 second offsets
>> What's the benefit to using 32-bit over 64-bit, and is that benefit
>> worth the cost of the added code complexity and processing time?
>
> Slightly smaller memory footprint (doesn't count much), but more
> importantly much smaller processing time especially on 32-bit systems.
I would have expected the extra processing for wrapping to negate the
benefit.

>
>> If we
>> used 64-bit, we could do a 16/48 bit split and have an absolute
>> timestamp out to year 10889 at millisecond precision.
>
> Not really because that would force us to implement yet another clock
> and then to have two distinct set of clocks for queues and for all other
> internal events and timeouts (including the queue timeout!).
We already have the `now` and `date` variables. The priority is only
used for the queue key, and not to replace anything else. The queue key
is already getting replaced, so it would be no additional work.
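
For reference, deriving an absolute 64-bit millisecond value from the
existing timeval-based clock would be a one-liner; a sketch only, with a
made-up helper name:

    #include <stdint.h>
    #include <sys/time.h>

    static inline uint64_t timeval_to_ms(const struct timeval *tv)
    {
        return (uint64_t)tv->tv_sec * 1000ULL + tv->tv_usec / 1000;
    }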

> That would
> really be the beginning of a mess. We've been running with 32-bit
> millisecond-precise ticks for over a decade without ever a complain that
> it was not enough nor that the wrapping at 49.7 days was an issue. If
> one day we'd decide to change this, it would be to support finer-grained
> timestamps (eg: nanoseconds or CPU cycles) which would not necessarily
> protect us from wrapping either.
>
>>> I'm pretty sure that the two first options are way enough to cover
>>> almost everyone's needs.
>> For me a 12-bit class should work. My intention is to implement
>> something similar to email spam filtering, where you generate a score
>> based on the rules that match, and some rules are worth more than
>> others. 8 bits might be a bit difficult to work within.
>
> Makes sense.
>
>> I also suspect the 16-bit option (32 second offset) is too small for
>> date based.
>
> Yes I agree. I'm pretty sure that for some TCP connections certain users
> will want to have a long delay (eg: wait a minute before your FTP transfer
> starts).
>
>> The rest of this all seems fine. I've got no other thoughts to add.
>
> Great, thanks!
>
>> So the question is, which route do we go?
>> 1. 32-bit int (12/20 split)
>> 2. 64-bit int (32/32 split)
>
> I think you can start with the 12/20 split first, knowing that if
> anybody ever requests an extension of either time or classes we can
> switch to 64 without much effort. Until this happens we'll be more
> efficient on small systems. Conversely we can't do it the other way
> around because we could break some setups :-)
>
>> 3. 64-bit int (16/48 split) with absolute timestamps
>
> I don't like it much for the reasons above.
>
>> 4. 64-bit int with a single `set-priority` action. Let the user use a
>> new `ms()` fetch for time based. The user does the bit shifting if they
>> want to use both (using the `mul()` converter, or we can add a `shift()`).
>
> I think this level of complexity ought to be done once in the code and not
> each and every time for all users. It would only be manageable for a number
> of the people on this list, but newcomers would call us words for this :-)
>
> Thanks,
> Willy

Ok, so this patch set includes a fully functional implementation of the
priority class & offset. Unfortunately handling the offset got really
complicated due to the wrapping & locking.
There might be a little room for improvement, but I don't think much.

I'm still working on the code, but I wanted to submit this for review as
the final version probably won't be much different.

The second patch is a test I wrote to try to catch all the different
scenarios. I'm including it in case you're inclined to play with the
tree search algorithm and test it. It's not intended to be merged.

Things to do:
* Try to optimize the macros & tree search.
* Output a warning if `timeout queue` > max offset (a rough sketch of
  the check follows this list).
* Add LUA methods & sample fetches.
* General cleanup.
* Docs
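
The warning check could look something like this (stand-alone sketch;
the constant and the output mechanism are placeholders for whatever
cfgparse.c would actually use):

    #include <stdio.h>

    #define KEY_OFFSET_BITS     20
    #define MAX_QUEUE_OFFSET_MS ((1U << (KEY_OFFSET_BITS - 1)) - 1) /* ~8.7 min */

    static void check_queue_timeout(const char *proxy, unsigned int timeout_queue_ms)
    {
        if (timeout_queue_ms > MAX_QUEUE_OFFSET_MS)
            fprintf(stderr, "Warning: proxy '%s': 'timeout queue' (%u ms) exceeds "
                    "the maximum priority offset (%u ms); queue ordering may "
                    "wrap.\n", proxy, timeout_queue_ms, MAX_QUEUE_OFFSET_MS);
    }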

-Patrick


Patrick Hemmer (2):
  MEDIUM: add set-priority-class and set-priority-offset
  tests for queue set-priority

 Makefile.test          |  21
 include/types/proxy.h  |   2 +-
 include/types/queue.h  |   2 +-
 include/types/server.h |   2 +-
 include/types/stream.h |   2 +
 src/proxy.c            |   2 +-
 src/queue.c            | 280 ++---
 src/server.c           |   2 +-
 src/stream.c           |   2 +
 tests/test-queue.cfg   |   8 ++
 tests/test_queue.c     | 117 +
 11 files changed, 396 insertions(+), 44 deletions(-)
 create mode 100644 Makefile.test
 create