Re: http-response set-header is unreliable

2018-05-02 Thread Willy Tarreau
On Thu, May 03, 2018 at 12:08:37AM +0200, Tim Düsterhus wrote:
> Willy,
> 
> Am 02.05.2018 um 11:47 schrieb Willy Tarreau:
> > Nice one, though I'd argue that sites which do this know that they
> > are manipulating large contents (it's visible in the config file and
> > sometimes they are the ones asking to relax the config parsing rules).
> > So they're also aware of the need to increase maxrewrite.
> 
> Clearly I was not aware that I needed to increase maxrewrite, despite
> adding a ton of headers to my responses.
> Yes, I could have found out if I had read the complete configuration manual.
> But the tunables are defined as "Performance Tuning" in the headline. I
> don't have performance issues, I handle less than 15 r/s in peak
> situations. It did not even occur to me that there could possibly be a
> limit to header rewriting. In fact I found out about this whole issue by
> accident.

Oh I can very well imagine. Long gone are the days where it was possible
to read the whole doc in 10 minutes!

> >> I'd start off with *logging* when the call fails for the short term.
> > 
> > Adding extra logs not related to traffic actually makes debugging much
> > worse because these logs don't end up in the correct file/storage/whatever.
> > It's much easier to filter on 400/502 and find all the request elements to
> > understand what happens than seeing a random error out of any context.
> > Part of the difficulty here is to properly indicate what was attempted
> > and failed, in complex configs where it's hard to even enumerate rulesets.
> 
> To me a message like: "Unable to add-header Content-Security-Policy to
> response. Possibly the amount of headers exceeds tune.maxrewrite." would
> have been more helpful than a random 502 without any further information.

We could possibly emit something like this with a warning level, just like
when a server goes down. However we need to rate-limit it as you don't want
your admin log to grow at 1000/s when you're flooded with large bogus
requests that repeatedly cause this to happen.
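For readers who land here with the same symptom: the tunable under discussion lives in the global section of the configuration. A minimal sketch (the value is illustrative only; size it to the headers you actually add):

```
global
    # headroom reserved in each buffer for header rewrites;
    # example value, not a recommendation
    tune.maxrewrite 2048
```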

> Especially since the issue happens randomly: Sometimes the additional
> headers fit by chance. Sometimes they don't. I would start by
> investigating the connection to the backend services, not investigating
> some random tunables (See my paragraph above.).

Actually when you have an error, the termination flags are the most important
thing to look at as they indicate what was wrong and where/when.

> Actually I'm not sure
> whether a 502 would even be the correct response. The issue is not with
> the backend, but with the proxy. I'd expect a 500 here:
> 
> >The 502 (Bad Gateway) status code indicates that the server, while
> >acting as a gateway or proxy, received an *invalid response from an
> >inbound server* it accessed while attempting to fulfill the request.
> 
> (highlighting mine)

It could as well, but arguably it could also be said that the frontend
never received a valid response from the backend since this one failed
to transform a valid response into another valid one.

> After digging into it I might be able to deduce that the addition of the
> new `http-response add-header` line caused the issues. But still I would
> be none the wiser. I would have to stumble upon the tunable by accident.
> Or ask on the list, like I did.

Just out of curiosity, what do you check more often, in order of priority,
among :
  - stats page
  - show info on CLI
  - traffic logs
  - admin logs
  - other

Because in fact that might help figure where such painful failures would
need to be shown (possibly multiple places).

> > Maybe one solution could be to tag transactions that experienced such a
> > failure and to put a warning on them, that would be reported in the logs
> > with a "W" flag in the termination codes. It would not indicate what
> > failed, obviously, but would very quickly raise awareness of the fact
> > that there is a problem. Usually thanks to the second flag indicating
> > the transaction state and to the rest of the log, it's easier to follow
> > the processing and to figure what went wrong. Later once we add the
> > ability to reject again such failures, we could add an "F" (for "failure"
> > to process).
> > 
> > What do you think ?
> 
> It clearly would be better than silently failing, but it still would
> have wasted a few hours I suspect (again: see above paragraphs).
> 
> I want to note at this point that I'm not running haproxy at scale or
> with serious monitoring. The haproxy instance I'm experiencing this
> issue with is my personal server, not some company or business one. It
> runs my mail and some side / hobby projects. My needs or expectations
> might be different.

That's an important point. It's the same for me on 1wt.eu or haproxy.org,
sometimes I figure late that there are errors. Most often it's the component
we forget about because it works and we don't spend time looking at the logs.
Then probably, like me, you're looking at the stats page once in a while, and

Re: Priority based queuing

2018-05-02 Thread Patrick Hemmer


On 2018/5/2 16:29, Patrick Hemmer wrote:
>
>
> On 2018/5/2 13:22, Willy Tarreau wrote:
>> On Wed, May 02, 2018 at 12:44:06PM -0400, Patrick Hemmer wrote:
>>> Can you elaborate on what you're thinking of for a time-based queue?
>>>
>>> What I'm imagining you mean is that you would write a rule to set the
>>> max queue time, and haproxy would insert it into the queue sorting on
>>> TIME_NOW() + MAX_QUEUE_TIME. The main difference I see to this approach
>>> vs scoring, is that you ensure that an item doesn't sit on the queue
>>> forever (or whatever `timeout queue` is set to) if higher priority stuff
>>> keeps getting inserted before it.
>> In part, but mostly that under contention you can still maintain a
>> good quality of service. I remember having had a discussion a very
>> long time ago with a gaming site operator explaining that he wanted
>> to send profile management requests to a dedicated slow server so
>> that people filling in their profile do not disturb the ones playing.
>> For me it's a good example : "how long do you think these people are
>> willing to wait for the server to respond to each click to update
>> their profile?". Let's say 2 seconds is OK, you just add a 2s offset
>> to the position in the queue for such requests. If in the mean time
>> other higher priority requests come in, the delayed one may face the
>> 2 seconds delay. But it will not face more. And obviously if no other
>> request is pending before it in the queue it will be served earlier.
>>
>> A similar example is for certain shopping sites which consider that
>> once you have something in your shopping cart, you're much more likely
>> to finish the process and to pay because you've found the item you
>> were looking for, so you're more willing to tolerate a slightly higher
>> response time, and as such provide a better response time to newcomers.
>> But while this small delay can easily be expressed in milliseconds
>> (probably 50/100), it's almost impossible to define it using positions.
>> What if 50 new visitors suddenly come in ? You don't want the guy with
>> his item to suddenly experience a timeout. The time-based queue however
>> grants you a control over the service degradation you're accepting to
>> inflict to your users.
> There's a key difference in our designs. You're trying to classify
> traffic based on destination. I'm trying to classify traffic based on
> source.
> In my design, the maxconn is set to a level such that the server is
> healthy and every request can be processed fast. With this, and a
> score based queue, and under attack, the good requests will be drained
> fast, leaving only the bad requests. Thus no normal user should be
> getting a timeout, no matter what resource they're going to.
> Your design works when the backend system can't keep up with good
> legitimate traffic, and you need to prioritize one good request over
> another good request. My design is when the backend system can't keep
> up with bad traffic, and so we want to process the good traffic first.
>
>> Another interesting example is when you want to provide strong service
>> guarantees to some top-level visitors while running under intense load.
>> By using only a position, it's easy to say "I want the Gold class to
>> advance by 1000 positions". But if the site is highly loaded and you
>> have 10k pending requests, these 1000 positions remain irrelevant,
>> because the request sits there, waiting for 90% of the pollution to
>> be flushed. 
> I think you're misunderstanding my design, as scoring wouldn't work
> like this at all. If you give the gold class a score of 1000 (where
> higher number means higher priority), then the only thing that would
> get processed before gold class would be another class that has a
> score > 1000. If nothing does, and a gold request comes in, it gets
> processed first, no matter how big the queue is.
> Think of scoring as instead having multiple queues. Each queue is only
> processed if the queue before it is empty.
>> If you say "I want these requests to advance by 10 seconds"
>> then no matter how many other requests you have in the queue, it will
>> really advance by 10 seconds and effectively shrink the response time
>> by 10 seconds.
> It may shrink the response time by 10 seconds, but if the normal
> response time is 60 seconds, that's still 50 seconds of waiting. If
> your users are only willing to put up with 30 seconds of wait, this
> means you end up failing *everyone*.
>
>> Overall for user experience it's important to think in time and not
>> positions. The rationale behind this is simple : the user will never
>> accept as a justification for varying quality of service the number of
>> other visitors. And positions just depend on this, it's an average number
>> of people you're allowing to pass over. If I'm granted a pass ticket
>> allowing me to advance 10 places in the cinema queue and I find a
>> crowded street, I will go back home. If I'm told I won't wait more
>> than 5 m

Re: [PATCH] BUG/MINOR, lua/sockets, make lua tasks that are waiting for io suspend until woken up by the a corresponding event.

2018-05-02 Thread Willy Tarreau
Hi guys,

On Thu, May 03, 2018 at 12:58:55AM +0200, PiBa-NL wrote:
> Hi Tim,
> 
> Op 3-5-2018 om 0:26 schreef Tim Düsterhus:
> > Pieter,
> > 
> > Am 02.05.2018 um 23:54 schrieb PiBa-NL:
> > > If commit message needs tweaking please feel free to do so :).
> > > 
> > obviously not authoritative for this, but I noticed directly that the
> > first line of your message is very long. It should generally be about 60
> > characters, otherwise it might get truncated. The following lines should
> > be no more than 76 characters to avoid wrapping with the common terminal
> > width of 80.
> > 
> > Also after the severity and the component a colon should be used, not a
> > comma.
> > 
> > I suggest something like this for the first line. It is succinct and
> > gives a good idea of what the bug might be. But please check whether I
> > grasped the issue properly.
> > 
> > BUG/MINOR: lua: Put tasks to sleep when waiting for data
> > 
> > Best regards
> > Tim Düsterhus
> 
> Thanks, valid comments. The colons I should have noticed; could have
> sworn I did those correctly.. but no ;). I thought I carefully
> constructed the commit message looking at a few others, while cramming
> in as much 'info' about the change/issue as possible on a line (better
> than 'fixed #123' as the whole subject & message, as some committers do
> in another project I contribute to, but that's off-topic). As for line
> lengths, it didn't even cross my mind to look at that. (I also didn't
> look up the contributing guide either, sorry, though it doesn't seem to
> specify line lengths?)

Tim is absolutely right. I'm picky about the formatting of the subject line
because it helps us a lot to do backports to stable branches. Just to give
you an idea, here's a typical command run in 1.8 branch :

haproxy-1.8$ ./scripts/git-show-backports -q -m -r tip/master HEAD | grep BUG/

abeaff2d  -   | BUG/MINOR: fd/threads: properly dereference fdcache as volatile
1ff91041  -   | BUG/MINOR: fd/threads: properly lock the FD before adding it to the fd cache.
41ccb194  -   | BUG/MEDIUM: threads: fix the double CAS implementation for ARMv7
f161d0f5  -   | BUG/MINOR: pools/threads: don't ignore DEBUG_UAF on double-word CAS capable archs
6e8a41d8  -   | BUG/MINOR: cli: Ensure all command outputs end with a LF
26fb5d84  -   | BUG/MEDIUM: fd/threads: ensure the fdcache_mask always reflects the cache contents
1a1dd606  -   | BUG/MINOR: h2: remove accidental debug code introduced with show_fd function
b7426d15  -   | BUG/MINOR: spoe: Register the variable to set when an error occurred
(...)

Here you can easily see that it's important to have the most info and context
in the fewest words, and it's very convenient to quickly evict commits we're
not interested in for a given version. Writing optimal commit messages is a
difficult exercise at first, but over time it becomes natural and you quickly
figure that it also helps for other projects because "git log --oneline" and
"git rebase -i" start to give you a lot of information that help do the best
work.
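The subject-line conventions Tim describes can even be checked mechanically. A rough sketch in portable shell (the pattern and the 60-character limit mirror the advice in this thread; this is not an official haproxy tool):

```shell
#!/bin/sh
# Rough check of the "SEVERITY/LEVEL: component: summary" convention
# discussed above: colons after the severity and the component, and a
# subject line of at most ~60 characters.
subject="BUG/MINOR: lua: Put tasks to sleep when waiting for data"

case "$subject" in
    */*": "*": "*) echo "format ok" ;;
    *)             echo "format bad" ;;
esac

if [ "${#subject}" -le 60 ]; then
    echo "length ok (${#subject} chars)"
else
    echo "length bad (${#subject} chars)"
fi
```

Such a check could, for instance, be wired into a local commit-msg hook so that overly long or comma-separated subjects are caught before they reach the list.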

By the way I've just noticed while running this command that your subject
now mistakenly contains "BUG/MINOR, BUG/MINOR" and I didn't notice it before
pushing. Bah I should have looked closer, too late now :-)

Regarding your fix, good job! You're right, it's a mistake. At first I
thought your fix was wrong until I figured that the current code results
in not updating an expired timeout, hence the 100% CPU loop. It's not
surprising we ended up with this given that the scheduler semantics have
changed in 1.8 and this part got broken after the change.

Now merged, thank you!

Willy



Re: Priority based queuing

2018-05-02 Thread Willy Tarreau
On Wed, May 02, 2018 at 04:29:33PM -0400, Patrick Hemmer wrote:
> I think you're misunderstanding my design, as scoring wouldn't work like
> this at all. If you give the gold class a score of 1000 (where higher
> number means higher priority), then the only thing that would get
> processed before gold class would be another class that has a score >
> 1000. If nothing does, and a gold request comes in, it gets processed
> first, no matter how big the queue is.
> Think of scoring as instead having multiple queues. Each queue is only
> processed if the queue before it is empty.
(...)

OK you convinced me. Not on everything, but on the fact that we're trying
to address different points. My goal is to make it possible to improve
quality of service between good requests, and your goal is to improve the
classification between good, suspicious, and bad requests. Each of us sees
how to expand a little bit our respective model to address part of the
other one (though less efficiently).

I agree that for your goal, multi-queue is better, but I still maintain
that for my goal, the time-based queue gives better control. The good
news is that the two are orthogonal and 100% compatible.

Basically the queueing system should be redesigned as a list of time-based
trees that are visited in order of priority (or traffic classes, we'll have
to find the proper wording to avoid confusion). If you think you can address
your needs with just a small set of different priorities, probably that we
can implement this with a small array (2-4 queues). If you think you need
more, then we'd rather think about building a composite position value in
the tree made of the priority at the top of the word and the service time
at the bottom of the word. This way, picking the first value will always
find the lowest priority value first. There's one subtlety related to
wrapping time there however, but it can be addressed in two lookups.

Please let me know if you'd be fine with designing and implementing
something like this.

Cheers,
Willy



1.8.8 lua, xref_get_peer_and_lock hang / 100% cpu usage after restarting haproxy a few times

2018-05-02 Thread PiBa-NL

Hi List,

Sometimes after a few 'restarts' of haproxy 1.8.8 (using the -sf 
parameter) one of the processes seems to get into a 'hanging' state 
consuming 100% cpu.


In this configuration I'm using 'nbthread 1'; not sure if this is related 
to the corrupted task-tree from my other lua issue: 
https://www.mail-archive.com/haproxy@formilux.org/msg29801.html


Also I'm using my new smtpmailqueue and serverhealthchecker lua scripts 
(can be found on GitHub); these probably 'contribute' to triggering 
the condition.


Anything I can check / provide?

(I can't really share the config itself a.t.m. as it's from our production 
env, but it has like 15 backends with 1 server each, a little header 
rewriting/insertion but nothing big.)


GNU gdb (GDB) 8.0.1 [GDB v8.0.1 for FreeBSD]
Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 


This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-portbld-freebsd11.1".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
.
Find the GDB manual and other documentation resources online at:
.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/local/sbin/haproxy...done.
Attaching to program: /usr/local/sbin/haproxy, process 68580
Reading symbols from /lib/libcrypt.so.5...(no debugging symbols 
found)...done.

Reading symbols from /lib/libz.so.6...(no debugging symbols found)...done.
Reading symbols from /lib/libthr.so.3...(no debugging symbols found)...done.
Reading symbols from /usr/lib/libssl.so.8...(no debugging symbols 
found)...done.
Reading symbols from /lib/libcrypto.so.8...(no debugging symbols 
found)...done.
Reading symbols from /usr/local/lib/liblua-5.3.so...(no debugging 
symbols found)...done.

Reading symbols from /lib/libm.so.5...(no debugging symbols found)...done.
Reading symbols from /lib/libc.so.7...(no debugging symbols found)...done.
Reading symbols from /libexec/ld-elf.so.1...(no debugging symbols 
found)...done.

[Switching to LWP 100340 of process 68580]
0x0044890b in xref_get_peer_and_lock (xref=0x80254b680) at 
include/common/xref.h:34

34  include/common/xref.h: No such file or directory.
(gdb) next
37  in include/common/xref.h
(gdb) next
45  in include/common/xref.h
(gdb) next
46  in include/common/xref.h
(gdb) next
34  in include/common/xref.h
(gdb) next
37  in include/common/xref.h
[... the same 34 / 37 / 45 / 46 line cycle inside include/common/xref.h
repeats unchanged through dozens more 'next' commands; trace truncated ...]

Re: [PATCH] BUG/MINOR, lua/sockets, make lua tasks that are waiting for io suspend until woken up by the a corresponding event.

2018-05-02 Thread PiBa-NL

Hi Tim,

Op 3-5-2018 om 0:26 schreef Tim Düsterhus:

Pieter,

Am 02.05.2018 um 23:54 schrieb PiBa-NL:

If commit message needs tweaking please feel free to do so :).


obviously not authoritative for this, but I noticed directly that the
first line of your message is very long. It should generally be about 60
characters, otherwise it might get truncated. The following lines should
be no more than 76 characters to avoid wrapping with the common terminal
width of 80.

Also after the severity and the component a colon should be used, not a
comma.

I suggest something like this for the first line. It is succinct and
gives a good idea of what the bug might be. But please check whether I
grasped the issue properly.

BUG/MINOR: lua: Put tasks to sleep when waiting for data

Best regards
Tim Düsterhus


Thanks, valid comments. The colons I should have noticed; could have 
sworn I did those correctly.. but no ;). I thought I carefully 
constructed the commit message looking at a few others, while cramming 
in as much 'info' about the change/issue as possible on a line (better 
than 'fixed #123' as the whole subject & message, as some committers do 
in another project I contribute to, but that's off-topic). As for line 
lengths, it didn't even cross my mind to look at that. (I also didn't 
look up the contributing guide either, sorry, though it doesn't seem to 
specify line lengths?)


Anyhow, changed the title as suggested; it still covers the change. And 
adjusted the line wrapping in the message to not exceed 76 characters.


Regards,

PiBa-NL (Pieter)

From a0b01cdc8ccc4ae95c5c03bc98bf859b6115d2f9 Mon Sep 17 00:00:00 2001
From: PiBa-NL 
Date: Wed, 2 May 2018 22:27:14 +0200
Subject: [PATCH] BUG/MINOR, BUG/MINOR: lua: Put tasks to sleep when waiting for 
data

If a lua socket is waiting for data it currently spins at 100% cpu usage.
This is because the TICK_ETERNITY returned by the socket is ignored when 
setting the 'expire' time of the task.

Fixed by removing the check for yields that return TICK_ETERNITY.

This should be backported to at least 1.8.
---
 src/hlua.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/src/hlua.c b/src/hlua.c
index 32199c9..4c56409 100644
--- a/src/hlua.c
+++ b/src/hlua.c
@@ -5552,8 +5552,7 @@ static struct task *hlua_process_task(struct task *task)
 
case HLUA_E_AGAIN: /* co process or timeout wake me later. */
notification_gc(&hlua->com);
-   if (hlua->wake_time != TICK_ETERNITY)
-   task->expire = hlua->wake_time;
+   task->expire = hlua->wake_time;
break;
 
/* finished with error. */
-- 
2.10.1.windows.1



Re: [PATCH] BUG/MINOR, lua/sockets, make lua tasks that are waiting for io suspend until woken up by the a corresponding event.

2018-05-02 Thread Tim Düsterhus
Pieter,

Am 02.05.2018 um 23:54 schrieb PiBa-NL:
> If commit message needs tweaking please feel free to do so :).
> 

obviously not authoritative for this, but I noticed directly that the
first line of your message is very long. It should generally be about 60
characters, otherwise it might get truncated. The following lines should
be no more than 76 characters to avoid wrapping with the common terminal
width of 80.

Also after the severity and the component a colon should be used, not a
comma.

I suggest something like this for the first line. It is succinct and
gives a good idea of what the bug might be. But please check whether I
grasped the issue properly.

BUG/MINOR: lua: Put tasks to sleep when waiting for data

Best regards
Tim Düsterhus



Re: http-response set-header is unreliable

2018-05-02 Thread Tim Düsterhus
Willy,

Am 02.05.2018 um 11:47 schrieb Willy Tarreau:
> Nice one, though I'd argue that sites which do this know that they
> are manipulating large contents (it's visible in the config file and
> sometimes they are the ones asking to relax the config parsing rules).
> So they're also aware of the need to increase maxrewrite.

Clearly I was not aware that I needed to increase maxrewrite, despite
adding a ton of headers to my responses.
Yes, I could have found out if I had read the complete configuration manual.
But the tunables are defined as "Performance Tuning" in the headline. I
don't have performance issues, I handle less than 15 r/s in peak
situations. It did not even occur to me that there could possibly be a
limit to header rewriting. In fact I found out about this whole issue by
accident.

>> I'd start off with *logging* when the call fails for the short term.
> 
> Adding extra logs not related to traffic actually makes debugging much
> worse because these logs don't end up in the correct file/storage/whatever.
> It's much easier to filter on 400/502 and find all the request elements to
> understand what happens than seeing a random error out of any context.
> Part of the difficulty here is to properly indicate what was attempted
> and failed, in complex configs where it's hard to even enumerate rulesets.

To me a message like: "Unable to add-header Content-Security-Policy to
response. Possibly the amount of headers exceeds tune.maxrewrite." would
have been more helpful than a random 502 without any further information.
Especially since the issue happens randomly: Sometimes the additional
headers fit by chance. Sometimes they don't. I would start by
investigating the connection to the backend services, not investigating
some random tunables (See my paragraph above.). Actually I'm not sure
whether a 502 would even be the correct response. The issue is not with
the backend, but with the proxy. I'd expect a 500 here:

>The 502 (Bad Gateway) status code indicates that the server, while
>acting as a gateway or proxy, received an *invalid response from an
>inbound server* it accessed while attempting to fulfill the request.

(highlighting mine)

After digging into it I might be able to deduce that the addition of the
new `http-response add-header` line caused the issues. But still I would
be none the wiser. I would have to stumble upon the tunable by accident.
Or ask on the list, like I did.

> Maybe one solution could be to tag transactions that experienced such a
> failure and to put a warning on them, that would be reported in the logs
> with a "W" flag in the termination codes. It would not indicate what
> failed, obviously, but would very quickly raise awareness of the fact
> that there is a problem. Usually thanks to the second flag indicating
> the transaction state and to the rest of the log, it's easier to follow
> the processing and to figure what went wrong. Later once we add the
> ability to reject again such failures, we could add an "F" (for "failure"
> to process).
> 
> What do you think ?

It clearly would be better than silently failing, but it still would
have wasted a few hours I suspect (again: see above paragraphs).

I want to note at this point that I'm not running haproxy at scale or
with serious monitoring. The haproxy instance I'm experiencing this
issue with is my personal server, not some company or business one. It
runs my mail and some side / hobby projects. My needs or expectations
might be different.

Best regards
Tim Düsterhus



[PATCH] BUG/MINOR, lua/sockets, make lua tasks that are waiting for io suspend until woken up by the a corresponding event.

2018-05-02 Thread PiBa-NL

Hi List, Willy, Thierry, Emeric,

Tried a little patch for my 100% cpu usage issue.
https://www.mail-archive.com/haproxy@formilux.org/msg29762.html

It stops the cpu usage reported in the above thread. Just wondering if 
there are any culprits that might now 'hang' a lua applet instead?


I think this issue was introduced here (haven't tried to bisect it..): 
http://git.haproxy.org/?p=haproxy.git;a=commitdiff;h=253e53e661c49fb9723535319cf511152bf09bc7


Possibly due to some TICK_ETERNITY that shouldn't actually wait an 
eternity?


Thoughts are welcome :) Or perhaps an 'all okay' so it can be merged?
If commit message needs tweaking please feel free to do so :).

Regards,
PiBa-NL (Pieter)
From a0b01cdc8ccc4ae95c5c03bc98bf859b6115d2f9 Mon Sep 17 00:00:00 2001
From: PiBa-NL 
Date: Wed, 2 May 2018 22:27:14 +0200
Subject: [PATCH] BUG/MINOR, lua/sockets, make lua tasks that are waiting for
 io suspend until woken up by the a corresponding event.

If a lua socket is waiting for data it currently spins at 100% cpu usage. This 
is because the TICK_ETERNITY returned by the socket is ignored when setting the 
'expire' time of the task.

Fixed by removing the check for yields that return TICK_ETERNITY.

This should be backported to at least 1.8.
---
 src/hlua.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/src/hlua.c b/src/hlua.c
index 32199c9..4c56409 100644
--- a/src/hlua.c
+++ b/src/hlua.c
@@ -5552,8 +5552,7 @@ static struct task *hlua_process_task(struct task *task)
 
case HLUA_E_AGAIN: /* co process or timeout wake me later. */
notification_gc(&hlua->com);
-   if (hlua->wake_time != TICK_ETERNITY)
-   task->expire = hlua->wake_time;
+   task->expire = hlua->wake_time;
break;
 
/* finished with error. */
-- 
2.10.1.windows.1



Re: Priority based queuing

2018-05-02 Thread Patrick Hemmer


On 2018/5/2 13:22, Willy Tarreau wrote:
> On Wed, May 02, 2018 at 12:44:06PM -0400, Patrick Hemmer wrote:
>> Can you elaborate on what you're thinking of for a time-based queue?
>>
>> What I'm imagining you mean is that you would write a rule to set the
>> max queue time, and haproxy would insert it into the queue sorting on
>> TIME_NOW() + MAX_QUEUE_TIME. The main difference I see to this approach
>> vs scoring, is that you ensure that an item doesn't sit on the queue
>> forever (or whatever `timeout queue` is set to) if higher priority stuff
>> keeps getting inserted before it.
> In part, but mostly that under contention you can still maintain a
> good quality of service. I remember having had a discussion a very
> long time ago with a gaming site operator explaining that he wanted
> to send profile management requests to a dedicated slow server so
> that people filling in their profile do not disturb the ones playing.
> For me it's a good example : "how long do you think these people are
> willing to wait for the server to respond to each click to update
> their profile?". Let's say 2 seconds is OK, you just add a 2s offset
> to the position in the queue for such requests. If in the mean time
> other higher priority requests come in, the delayed one may face the
> 2 seconds delay. But it will not face more. And obviously if no other
> request is pending before it in the queue it will be served earlier.
>
> A similar example is for certain shopping sites which consider that
> once you have something in your shopping cart, you're much more likely
> to finish the process and to pay because you've found the item you
> were looking for, so you're more willing to tolerate a slightly higher
> response time, and as such provide a better response time to newcomers.
> But while this small delay can easily be expressed in milliseconds
> (probably 50/100), it's almost impossible to define it using positions.
> What if 50 new visitors suddenly come in ? You don't want the guy with
> his item to suddenly experience a timeout. The time-based queue however
> grants you a control over the service degradation you're accepting to
> inflict to your users.

There's a key difference in our designs: you're trying to classify
traffic based on destination, while I'm trying to classify it based on
source.

In my design, maxconn is set to a level at which the server stays
healthy and every request can be processed quickly. With that and a
score-based queue, under attack the good requests get drained fast,
leaving only the bad ones. Thus no normal user should be getting a
timeout, no matter what resource they're going to.

Your design works when the backend can't keep up with legitimate
traffic and you need to prioritize one good request over another good
request. Mine is for when the backend can't keep up because of bad
traffic, and we want to process the good traffic first.

>
> Another interesting example is when you want to provide strong service
> guarantees to some top-level visitors while running under intense load.
> By using only a position, it's easy to say "I want the Gold class to
> advance by 1000 positions". But if the site is highly loaded and you
> have 10k pending requests, these 1000 positions remain irrelevant,
> because the request sits there, waiting for 90% of the pollution to
> be flushed. 

I think you're misunderstanding my design; scoring wouldn't work like
this at all. If you give the gold class a score of 1000 (where a higher
number means higher priority), then the only thing that would get
processed before gold class is another class with a score > 1000. If
nothing has one, and a gold request comes in, it gets processed first,
no matter how big the queue is.

Think of scoring as having multiple queues instead, where each queue is
only processed if the queues before it are empty.

> If you say "I want these requests to advance by 10 seconds"
> then no matter how many other requests you have in the queue, it will
> really advance by 10 seconds and effectively shrink the response time
> by 10 seconds.

It may shrink the response time by 10 seconds, but if the normal
response time is 60 seconds, that's still 50 seconds of waiting. If
your users are only willing to put up with 30 seconds of wait, this
means you end up failing *everyone*.

>
> Overall for user experience it's important to think in time and not
> positions. The rationale behind this is simple : the user will never
> accept as a justification for varying quality of service the number of
> other visitors. And positions just depend on this, it's an average number
> of people you're allowing to pass over. If I'm granted a pass ticket
> allowing me to advance 10 places in the cinema queue and I find a
> crowded street, I will go back home. If I'm told I won't wait more
> than 5 minutes, I'll come regardless of the number of waiters.

But a time-based queue wouldn't work like this. You can't guarantee
service within 5 minutes.

Re: Priority based queuing

2018-05-02 Thread Willy Tarreau
On Wed, May 02, 2018 at 12:44:06PM -0400, Patrick Hemmer wrote:
> Can you elaborate on what you're thinking of for a time-based queue?
> 
> What I'm imagining you mean is that you would write a rule to set the
> max queue time, and haproxy would insert it into the queue sorting on
> TIME_NOW() + MAX_QUEUE_TIME. The main difference I see to this approach
> vs scoring, is that you ensure that an item doesn't sit on the queue
> forever (or whatever `timeout queue` is set to) if higher priority stuff
> keeps getting inserted before it.

In part, but mostly that under contention you can still maintain a
good quality of service. I remember having had a discussion a very
long time ago with a gaming site operator explaining that he wanted
to send profile management requests to a dedicated slow server so
that people filling in their profile do not disturb the ones playing.
For me it's a good example : "how long do you think these people are
willing to wait for the server to respond to each click to update
their profile?". Let's say 2 seconds is OK, you just add a 2s offset
to the position in the queue for such requests. If in the meantime
other higher priority requests come in, the delayed one may face the
2 seconds delay. But it will not face more. And obviously if no other
request is pending before it in the queue it will be served earlier.

A similar example is for certain shopping sites which consider that
once you have something in your shopping cart, you're much more likely
to finish the process and to pay because you've found the item you
were looking for, so you're more willing to tolerate a slightly higher
response time, and as such provide a better response time to newcomers.
But while this small delay can easily be expressed in milliseconds
(probably 50/100), it's almost impossible to define it using positions.
What if 50 new visitors suddenly come in ? You don't want the guy with
his item to suddenly experience a timeout. The time-based queue however
grants you a control over the service degradation you're accepting to
inflict to your users.

Another interesting example is when you want to provide strong service
guarantees to some top-level visitors while running under intense load.
By using only a position, it's easy to say "I want the Gold class to
advance by 1000 positions". But if the site is highly loaded and you
have 10k pending requests, these 1000 positions remain irrelevant,
because the request sits there, waiting for 90% of the pollution to
be flushed. If you say "I want these requests to advance by 10 seconds"
then no matter how many other requests you have in the queue, it will
really advance by 10 seconds and effectively shrink the response time
by 10 seconds.

Overall for user experience it's important to think in time and not
positions. The rationale behind this is simple : the user will never
accept as a justification for varying quality of service the number of
other visitors. And positions just depend on this, it's an average number
of people you're allowing to pass over. If I'm granted a pass ticket
allowing me to advance 10 places in the cinema queue and I find a
crowded street, I will go back home. If I'm told I won't wait more
than 5 minutes, I'll come regardless of the number of waiters.

> I don't think this is necessarily a good thing.

Having considered the two options several times in the field over the
last decade when sites started to crawl, I became very well convinced ;-)

> If you're under a DOS
> attack, the goal is to get the good requests processed before any
> possible malicious requests. With a time based queue, those malicious
> requests will still get processed and starve out the good requests.

Precisely not that much, because the good ones will pass before them, and
the malicious ones will then be subject to the queue timeout if there
are too many.

> For
> example lets say you're under attack, a bad request comes in with
> max_queue_time=1000ms, and then after 999ms elapse, a good request comes
> in with max_queue_time=10ms. You have a good request, and a bad request
> on the queue, but HAProxy is going to process the bad request first
> because its timer is expiring first.

Absolutely, but you fell into the trap of not considering that the queue
is supposed to be full, so you're in fact comparing only two requests,
while in practice you have many of them and there the effect is much
more visible.

> Essentially if haproxy is receiving
> X good requests per second, and Y bad requests per second, it's still
> going to forward X good per second, and Y bad per second, to the backend
> server. The only difference is that they're time shifted.

Except once subject to the queue timeout. It's a very important aspect we
must not dismiss.

> The other thing I could think you mean by time-based is to insert into
> the queue sorting on MAX_QUEUE_TIME, just like a score-based queue, but
> you would still record TIME_NOW() + MAX_QUEUE_TIME, and would reject
> requests that don't get processed by their deadline.

Re: Priority based queuing

2018-05-02 Thread Patrick Hemmer


On 2018/5/2 11:04, Willy Tarreau wrote:
> On Tue, May 01, 2018 at 09:34:14PM -0400, Patrick Hemmer wrote:
>> Would it be possible to add priority based queuing to haproxy? By this I
>> mean that when a server/backend is full (maxconn), that incoming
>> requests would be added to the queue in a custom order. The idea here is
>> that when the system is under stress, to make sure the important
>> requests get handled first.
> Hehe that's fun that you mention this, as this has been postponed since
> around 1.2 or 1.3! By then we didn't have the equivalent of HTTP rules
> to add/subtract some priority. Now we have everything to do it, we "just"
> need to replace the lists with priority trees in the queues and that's
> all. It's not a big work if someone is interested in working on this.
>
>> In our exact use case, we're looking to use this to help mitigate DOS
>> attacks. The idea is that if a layer 7 attack is saturating the backend
>> servers, we can add logic to prioritize the requests. This logic might
>> be things like requests that have a valid application cookie go to the
>> front of the queue, or requests that come from a cloud provider (e.g.
>> EC2) go to the back of the queue.
> That's exactly why I wanted them to be manipulated via http-request rules,
> so that everyone can construct his own conditions. Also I found that for
> most shopping sites, having time-based priority is more important than
> position-based : you often want this type of request to be processed 100ms
> faster than another type of request. With HTTP/2 it will be even more
> interesting because it will allow to send the important objects used for
> rendering before the other ones, which is very similar to the H2 priority
> but more fine-grained if you can adjust it on the fly.
>
>> DOS mitigation is hard because while you can write rules to identify
>> requests that are suspicious, you don't want to block them outright as
>> it is possible they might be legitimate. With prioritization, the
>> requests still get through, and are only affected when the backend is
>> saturated. If maxconn is not reached, the prioritization has no effect
>> at all (since queue is empty).
> I wholeheartedly agree with you :-)
>
>> I made the change to haproxy and simulated the conditions in a lab, and
>> the strategy appears to work.
>> The change to haproxy was very minor, ~10 lines in queue.c, using
>> `task->nice` as the prioritization key. However my change is a very
>> rough PoC, and not worthy of submission.
> For a rough PoC it's indeed perfectly fine. But for a final design we
> really need a separate offset. I've really been convinced in field about
> using time rather than position, if you want to experiment with this I
> can give you some insights, it's the same in fact.
Can you elaborate on what you're thinking of for a time-based queue?

What I'm imagining you mean is that you would write a rule to set the
max queue time, and haproxy would insert it into the queue sorting on
TIME_NOW() + MAX_QUEUE_TIME. The main difference I see to this approach
vs scoring, is that you ensure that an item doesn't sit on the queue
forever (or whatever `timeout queue` is set to) if higher priority stuff
keeps getting inserted before it.
I don't think this is necessarily a good thing. If you're under a DOS
attack, the goal is to get the good requests processed before any
possible malicious requests. With a time based queue, those malicious
requests will still get processed and starve out the good requests. For
example lets say you're under attack, a bad request comes in with
max_queue_time=1000ms, and then after 999ms elapse, a good request comes
in with max_queue_time=10ms. You have a good request, and a bad request
on the queue, but HAProxy is going to process the bad request first
because its timer is expiring first. Essentially if haproxy is receiving
X good requests per second, and Y bad requests per second, it's still
going to forward X good per second, and Y bad per second, to the backend
server. The only difference is that they're time shifted.

The other thing I could think you mean by time-based is to insert into
the queue sorting on MAX_QUEUE_TIME, just like a score-based queue, but
you would still record TIME_NOW() + MAX_QUEUE_TIME, and would reject
requests that don't get processed by their deadline. Essentially a
per-request version of the `timeout queue` setting. But I don't see any
real value in this.

Or do you mean something else?

>
>> So before continuing any further down this route, I wanted to see if
>> this is something that could make it into HAProxy, and what any thoughts
>> on it might be.
> Absolutely! I've dreamed of it for over a decade, so I'm glad someone
> is willing to take care of it! Just checked, the item was added 12
> years ago to the roadmap file in 1.2.13 on 2006/05/13 by commit 814cbc6
> ("[DOC] added (and updated) the ROADMAP file"). The lines were :
>
>  - wait queues replaced for priority-based trees
>  - ability to assign a prio based on L7 matching

Re: [PATCH] BUG/MEDIUM: threads: Fix the sync point for more than 32 threads

2018-05-02 Thread Willy Tarreau
On Wed, May 02, 2018 at 05:36:24PM +0200, Christopher Faulet wrote:
> Hi Willy,
> 
> Here is a patch to fix the sync point for more than 32 threads. It is an
> obvious bug once found. But I had a hard time debugging it :)

Ah interesting one, indeed! You're lucky to have that many cores in
your laptop to spot such bugs, I'll ask for one as well :-)

> the function thread_no_sync() has also been updated to avoid any ambiguities.

This part is not useful in my opinion as it already had the comparison.
0 has no specific type; it's the representation of the value with all
bits set to zero for the common type. There, "threads_want_sync == 0"
is an integer representing the result of comparing threads_want_sync
with a value of the same type with all bits cleared, so it's not
ambiguous. It's the same for pointers, where 0 always takes the size of
the pointer it's compared against.

That said it doesn't hurt and at least it reminds the reader to be
careful.

I'm merging it now.

Thanks,
Willy



[PATCH] BUG/MEDIUM: threads: Fix the sync point for more than 32 threads

2018-05-02 Thread Christopher Faulet

Hi Willy,

Here is a patch to fix the sync point for more than 32 threads. It is 
an obvious bug once found. But I had a hard time debugging it :)


Thanks,
--
Christopher Faulet
>From 49fe27fda17418b9d8f62295d8f7b066bd2b57b4 Mon Sep 17 00:00:00 2001
From: Christopher Faulet 
Date: Wed, 2 May 2018 16:58:40 +0200
Subject: [PATCH] BUG/MEDIUM: threads: Fix the sync point for more than 32
 threads

In the sync point, to know if a thread has requested a synchronization, we call
the function thread_need_sync(). It should return 1 if yes, and otherwise 0.
It is declared to return a signed integer.

But internally, instead of returning 0 or 1, it returns 0 or tid_bit
(threads_want_sync & tid_bit). So tid_bit is cast to an integer. For the first
32 threads this is OK, because we always check whether thread_need_sync()
returns something other than 0. But it is a problem if HAProxy is started with
more than 32 threads, because for threads 33 to 64 (so for tid 32 to 63),
tid_bit cast to an integer evaluates to 0. So the sync point does not work for
more than 32 threads.

Now the function thread_need_sync() respects its contract, returning 0 or
1. The function thread_no_sync() has also been updated to avoid any ambiguity.

This patch must be backported to HAProxy 1.8.
---
 src/hathreads.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/hathreads.c b/src/hathreads.c
index 839b08696..0d690f383 100644
--- a/src/hathreads.c
+++ b/src/hathreads.c
@@ -82,7 +82,7 @@ void thread_want_sync()
 /* Returns 1 if no thread has requested a sync. Otherwise, it returns 0. */
 int thread_no_sync()
 {
-	return (threads_want_sync == 0);
+	return (threads_want_sync == 0UL);
 }
 
 /* Returns 1 if the current thread has requested a sync. Otherwise, it returns
@@ -90,7 +90,7 @@ int thread_no_sync()
  */
 int thread_need_sync()
 {
-	return (threads_want_sync & tid_bit);
+	return ((threads_want_sync & tid_bit) != 0UL);
 }
 
 /* Thread barrier. Synchronizes all threads at the barrier referenced by
-- 
2.14.3



Re: HAProxy multiple key type support - bug/feature (?) with DH parameters

2018-05-02 Thread Arnaud Gavara
Hello,

I'm following up on this mail from Olivier because I think I'm hitting the same
problem. Like him, I need to use specific DH parameters. For this, I simply use
the ability to add these DH parameters to the certificate file.
These DH parameters are taken into account if I specify the exact path of the
certificate, for example:
bind :443 ssl crt certificate.pem.rsa

Then I tried to use the feature described in the manual
(https://cbonte.github.io/haproxy-dconv/1.7/configuration.html#5.1-crt) which
allows creating a certificate bundle when no explicit suffix is given in the
configuration:
bind :443 ssl crt certificate.pem
In this case, the certificate is used (certificate.pem.rsa, same file) but
not the part of it containing the specific DH parameters. Indeed, if I run an
SSL connection test (with testssl.sh for example), I observe that HAProxy uses
its default DH parameters instead of those from the file.

Of course, the goal is to be able to offer ECDSA certificates, but before
taking that step, I need to get specific DH parameters working.

Regards,
Arnaud.

- Original Message -
> From: "Olivier Doucet" 
> To: "HAProxy" 
> Sent: Friday, March 23, 2018 15:58:27
> Subject: HAProxy multiple key type support - bug/feature (?) with DH parameters

> Hello,
> a few months ago I started using multiple key type support in HAProxy. It
> means I have this in haproxy.cfg :
> bind :443 ssl crt example.pem
> 
> And these files:
> example.pem.rsa
> example.pem.rsa.ocsp
> example.pem.rsa.issuer
> example.pem.ecdsa
> example.pem.ecdsa.ocsp
> example.pem.ecdsa.issuer
> (see https://cbonte.github.io/haproxy-dconv/1.7/configuration.html#5.1-crt)
> 
> It is working very well :)
> 
> I now need to handle specific DH parameters for a customer. Before, I used
> to add a DH block in pem file and it was working ... But here, the block is
> simply ignored, despite what is said in the configuration manual:
> https://cbonte.github.io/haproxy-dconv/1.7/configuration.html#3.2-tune.ssl.default-dh-param
> "This value is not used if static Diffie-Hellman parameters are supplied
> either directly in the certificate file or by using the ssl-dh-param-file
> parameter"
> 
> I can confirm this behaviour happens only when certificate are loaded with
> .rsa / .ecdsa extension : it is working if I rename example.pem.rsa to
> example.pem
> 
> I tried to create a file example.pem.rsa.dh or example.pem.rsa.dhparam with
> no luck (just tried those file names randomly :p).
> 
> Olivier

-- 
Université de Montpellier
Direction du Système d'Information et du Numérique
Service des Moyens Informatiques
Bureau réseaux, sécurité et téléphonie IP



Re: Priority based queuing

2018-05-02 Thread Willy Tarreau
On Tue, May 01, 2018 at 09:34:14PM -0400, Patrick Hemmer wrote:
> Would it be possible to add priority based queuing to haproxy? By this I
> mean that when a server/backend is full (maxconn), that incoming
> requests would be added to the queue in a custom order. The idea here is
> that when the system is under stress, to make sure the important
> requests get handled first.

Hehe that's fun that you mention this, as this has been postponed since
around 1.2 or 1.3! By then we didn't have the equivalent of HTTP rules
to add/subtract some priority. Now we have everything to do it, we "just"
need to replace the lists with priority trees in the queues and that's
all. It's not a big work if someone is interested in working on this.

> In our exact use case, we're looking to use this to help mitigate DOS
> attacks. The idea is that if a layer 7 attack is saturating the backend
> servers, we can add logic to prioritize the requests. This logic might
> be things like requests that have a valid application cookie go to the
> front of the queue, or requests that come from a cloud provider (e.g.
> EC2) go to the back of the queue.

That's exactly why I wanted them to be manipulated via http-request rules,
so that everyone can construct his own conditions. Also I found that for
most shopping sites, having time-based priority is more important than
position-based : you often want this type of request to be processed 100ms
faster than another type of request. With HTTP/2 it will be even more
interesting because it will allow to send the important objects used for
rendering before the other ones, which is very similar to the H2 priority
but more fine-grained if you can adjust it on the fly.

> DOS mitigation is hard because while you can write rules to identify
> requests that are suspicious, you don't want to block them outright as
> it is possible they might be legitimate. With prioritization, the
> requests still get through, and are only affected when the backend is
> saturated. If maxconn is not reached, the prioritization has no effect
> at all (since queue is empty).

I wholeheartedly agree with you :-)

> I made the change to haproxy and simulated the conditions in a lab, and
> the strategy appears to work.
> The change to haproxy was very minor, ~10 lines in queue.c, using
> `task->nice` as the prioritization key. However my change is a very
> rough PoC, and not worthy of submission.

For a rough PoC it's indeed perfectly fine. But for a final design we
really need a separate offset. I've really been convinced in field about
using time rather than position, if you want to experiment with this I
can give you some insights, it's the same in fact.

> So before continuing any further down this route, I wanted to see if
> this is something that could make it into HAProxy, and what any thoughts
> on it might be.

Absolutely! I've dreamed of it for over a decade, so I'm glad someone
is willing to take care of it! Just checked, the item was added 12
years ago to the roadmap file in 1.2.13 on 2006/05/13 by commit 814cbc6
("[DOC] added (and updated) the ROADMAP file"). The lines were :

 - wait queues replaced for priority-based trees
 - ability to assign a prio based on L7 matching

The goal has not changed since, I'm patient :-)

Willy



Fwd: [PATCH] MINOR: Add server name & puid to LUA Server class.

2018-05-02 Thread Willy Tarreau
Hi Thierry,

when you have a moment, could you please give a quick look at these
patches from Patrick so that I know if I can merge them or not ? There
are 2 other ones on the list.

Thanks,
Willy

On Sun, Apr 29, 2018 at 02:23:48PM -0400, Patrick Hemmer wrote:
> ---
>  doc/lua-api/index.rst |  8 
>  src/hlua_fcn.c| 14 ++
>  2 files changed, 22 insertions(+)
> 
> 

> diff --git a/doc/lua-api/index.rst b/doc/lua-api/index.rst
> index 7a77e46ee..cdb285957 100644
> --- a/doc/lua-api/index.rst
> +++ b/doc/lua-api/index.rst
> @@ -925,6 +925,14 @@ Server class
>  
>This class provides a way for manipulating servers and retrieving 
> information.
>  
> +.. js:attribute:: Server.name
> +
> +  Contains the name of the server.
> +
> +.. js:attribute:: Server.puid
> +
> +  Contains the proxy unique identifier of the server.
> +
>  .. js:function:: Server.is_draining(sv)
>  
>Return true if the server is currently draining sticky connections.
> diff --git a/src/hlua_fcn.c b/src/hlua_fcn.c
> index a8d53d45b..280d8e5af 100644
> --- a/src/hlua_fcn.c
> +++ b/src/hlua_fcn.c
> @@ -490,6 +490,8 @@ int hlua_listener_get_stats(lua_State *L)
>  
>  int hlua_fcn_new_server(lua_State *L, struct server *srv)
>  {
> + char buffer[10];
> +
>   lua_newtable(L);
>  
>   /* Pop a class session metatable and assign it to the userdata. */
> @@ -498,6 +500,18 @@ int hlua_fcn_new_server(lua_State *L, struct server *srv)
>  
>   lua_pushlightuserdata(L, srv);
>   lua_rawseti(L, -2, 0);
> +
> + /* Add server name. */
> + lua_pushstring(L, "name");
> + lua_pushstring(L, srv->id);
> + lua_settable(L, -3);
> +
> + /* Add server puid. */
> + lua_pushstring(L, "puid");
> + snprintf(buffer, sizeof(buffer), "%d", srv->puid);
> + lua_pushstring(L, buffer);
> + lua_settable(L, -3);
> +
>   return 1;
>  }
>  
> 




Re: [PATCH] BUG/MINOR: checks: Fix check->health computation for flapping, servers

2018-05-02 Thread Willy Tarreau
Hi Christopher,

On Wed, May 02, 2018 at 02:43:53PM +0200, Christopher Faulet wrote:
> Hi,
> 
> Here is a patch to fix an old bug in health-checks. As stated in the commit
> message, it must be backported to HAProxy 1.5 and newer.

Interesting one! And I'm pretty sure I've already met it (the sticky "down
going up" state in the stats page that I didn't manage to track down).

Now merged, thanks!
Willy



[PATCH] BUG/MINOR: checks: Fix check->health computation for flapping, servers

2018-05-02 Thread Christopher Faulet

Hi,

Here is a patch to fix an old bug in health-checks. As stated in the 
commit message, it must be backported to HAProxy 1.5 and newer.


Thanks,
--
Christopher Faulet
>From 913d6d9adda3420b79b4eadb1c4b847519dc88d9 Mon Sep 17 00:00:00 2001
From: Christopher Faulet 
Date: Wed, 2 May 2018 12:12:45 +0200
Subject: [PATCH] BUG/MINOR: checks: Fix check->health computation for flapping
 servers

This patch fixes an old bug introduced in the commit 7b1d47ce ("MAJOR: checks:
move health checks changes to set_server_check_status()"). When a DOWN server is
flapping, every time a check succeeds, check->health is incremented. But when a
check fails, it is only decremented when it is higher than the rise value. So if
only one check succeeds for a DOWN server, check->health will remain set to 1 for
all subsequent failing checks.

So, at first glance, it seems not that terrible because the server remains
DOWN. But it is reported in the transitional state "DOWN server, going up". And
it will remain in this state until it is UP again. There is also an insidious
side effect. If a DOWN server flaps from time to time, it will end up being
considered UP after a single successful check, regardless of the rise threshold,
because check->health will be increased slowly and never decreased.

To fix the bug, we just need to reset check->health to 0 when a check fails for
a DOWN server. To do so, we just need to relax the condition to handle a failure
in the function set_server_check_status.

This patch must be backported to haproxy 1.5 and newer.
---
 src/checks.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/checks.c b/src/checks.c
index 80a9c70d2..d07a82f84 100644
--- a/src/checks.c
+++ b/src/checks.c
@@ -243,7 +243,7 @@ static void set_server_check_status(struct check *check, short status, const cha
 		 */
 		if ((!(check->state & CHK_ST_AGENT) ||
 		(check->status >= HCHK_STATUS_L57DATA)) &&
-		(check->health >= check->rise)) {
+		(check->health > 0)) {
 			HA_ATOMIC_ADD(&s->counters.failed_checks, 1);
 			report = 1;
 			check->health--;
-- 
2.14.3



Re: http-response set-header is unreliable

2018-05-02 Thread Willy Tarreau
On Tue, May 01, 2018 at 05:10:19PM +0200, Tim Düsterhus wrote:
> Willy,
> 
> Am 01.05.2018 um 06:28 schrieb Willy Tarreau:
> >> It might make sense to enlarge the rewrite buffer reservation by
> >> default.
> > 
> > We used to have this a long time ago, the maxrewrite value used to
> > default to half the buffer size. But it caused too many requests to
> > be rejected and became absurd when users configure large buffers.
> > 
> >> I can absolutely imagine that people put in a ton of
> >> information in the times of Content-Security-Policy, Expect-{CT,Staple}
> >> and whatnot.
> > 
> > Yes but even once you put this, you hardly reach 1 kB. Most requests are
> > way below 1 kB headers, at least for performance reasons.
> 
> Just to make sure that we are on the same page: My failure case is not
> the requests, but rather the responses.

Yes I understood this well, sorry for using the wrong term. When I said
"request" I meant "messages" (in either direction).

> As another data point: GitHub adds a 812 Byte Content-Security-Policy
> header to all their responses. In total they send more than 2kB of
> response headers for a logged-out user:

Nice one, though I'd argue that sites which do this know that they
are manipulating large contents (it's visible in the config file and
sometimes they are the ones asking to relax the config parsing rules).
So they're also aware of the need to increase maxrewrite.

> >> Clearly the body must be able to span multiple buffers already,
> >> otherwise I would not be able to send bodies greater than 16kB.
> >> Will it need to allocate more buffers to do the same work, because every
> >> single one is smaller?
> > 
> > No, the body is simply streamed, not stored. If however you need to analyse
> > it (eg for a WAF) then you need to configure a larger buffer so that more
> > of the body can fit.
> > 
> > To give you an idea about how things currently work, when we perform an
> > recvfrom() call, we decide whether we read up to bufsize or up to
> > (bufsize-maxrewrite) depending on whether we are forwarding data or
> > still analysing the message. Thus when we receive a POST, we're not yet
> > forwarding, so up to 15 kB of headers+body are placed into the buffer,
> > leaving 1 kB available to add headers.
> 
> To make sure I understand that correctly (for the response case):
> 
> tune.bufsize:
> Maximum size of response headers.
> 
> tune.bufsize - tune.maxrewrite:
> Maximum supported size of backend-generated response headers.
> 
> tune.maxrewrite:
> Maximum size of response headers you are guaranteed to be able to add to
> a response. You might or might not be able to add more depending on the
> speed of the backend sending the body.
> 
> Is that correct?

Absolutely (though it's true both for request and response).

> > After the new native internal HTTP representation is implemented, I think
> > that the need for the maxrewrite will disappear, at the expense of using
> > one buffer for the headers and another one for the body, but we'll see.
> 
> Is this planned for 1.9? I'm aware that plans can change :-)

Yes, but it has ramifications everywhere in the lower layers. You just
made me realize that I wanted to announce the plans a few months ago and
that we were so much busy dealing with complex bugs that I completely
forgot to communicate on this :-(

> > Anyway we need to address the lack of error checking, and I really predict
> > some breakage here :-/
> > 
> 
> I'd start of with *logging* when the call fails for the short term.

Adding extra logs not related to traffic actually makes debugging much
worse because these logs don't end up in the correct file/storage/whatever.
It's much easier to filter on 400/502 and find all the request elements to
understand what happens than seeing a random error out of any context.
Part of the difficulty here is to properly indicate what was attempted
and failed, in complex configs where it's hard to even enumerate rulesets.

Maybe one solution could be to tag transactions that experienced such a
failure and to put a warning on them, which would be reported in the logs
with a "W" flag in the termination codes. It would not indicate what
failed, obviously, but would very quickly raise awareness of the fact
that there is a problem. Usually thanks to the second flag indicating
the transaction state and to the rest of the log, it's easier to follow
the processing and to figure out what went wrong. Later, once we add the
ability to reject such failures again, we could add an "F" (for "failure
to process").

What do you think ?

Willy