Re: lua socket settimeout has no effect

2018-08-16 Thread Cyril Bonté

Hi Willy and Sachin,

On 14/08/2018 at 01:27, Cyril Bonté wrote:
Thanks for the feedback. I've made some tests with the lines of code 
you suggested and it seems to work fine. I'll prepare the patches tomorrow.


OK, some updates on the patch.
It was working well with haproxy 1.7, and the same with haproxy 1.8.
But when I prepared the patch for the 1.9 branch, I discovered a strange 
behaviour, where the timeout was approximately delayed by 500ms. I 
spent some time on it tonight, until I discovered the cause was right 
under my eyes a few lines earlier :)


The culprit is commit #56cc12509 [1], where the rounding calculation is 
wrong:

/* round up for inputs that are fractions and convert to millis */
dtmout = (0.5 + MAY_LJMP(luaL_checknumber(L, 2))) * 1000;

Well, now that it's identified, I'll (really) send the patch tomorrow and 
will see in another patch how we can fix #56cc12509.



[1] 
http://git.haproxy.org/?p=haproxy.git;a=blobdiff;f=src/hlua.c;h=60cf8f948437677b72adabc50602ca99d80349bf;hp=633841c6d7aecce3e5cf893e01b5eed1cfe64e0f;hb=56cc12509;hpb=7741c854cd908dd4947325c36a6feb8203748d16



--
Cyril Bonté



Re: [PATCH] MINOR: ssl: Use consistent naming for TLS protocols

2018-08-16 Thread Willy Tarreau
On Tue, Aug 14, 2018 at 12:56:13AM +0100, Bertrand Jacquin wrote:
> In most cases, "TLSv1.x" naming is used across code and documentation; lazy
> people tend to grep too much and may not find what they are looking for.

Thanks Bertrand. I retagged it to DOC since it only touches doc so
that we can backport it.

Willy



Re: [PATCH] DOC: add documentation for prio_class and prio_offset sample fetches.

2018-08-16 Thread Willy Tarreau
On Mon, Aug 13, 2018 at 02:07:57PM -0400, Patrick Hemmer wrote:
> This adds documentation that was missed as part of 268a707.

Oh thank you Patrick, I think I didn't even notice that the sample fetch
functions were already there :-)

Willy



Re: cpu-map and definition of LONGBITS

2018-08-16 Thread Willy Tarreau
Hi J,

On Wed, Aug 15, 2018 at 04:56:10PM +0200, J. Kendzorra wrote:
> Hi,
> 
> I was looking into steering HAProxy to some underutilized cores on a multi
> processor system where both network interrupts as well as HAProxy are
> supposed to run on a specific numa node; the initial idea was to have
> interrupts handled on cores 0-21 and give HAProxy cores 44-65:
> # cat /sys/class/net/em3/device/local_cpulist
> 0-21,44-65
> 
> which could result in a cpu-map setting like this:
> cpu-map all 44-65
> 
> However, that won't work since this setting is restricted to be no higher
> than LONGBITS which is defined as ((unsigned int)sizeof(long) * 8). Looking
> into clues to understand this better I found
> https://marc.info/?l=haproxy&m=144252098801440&w=2 which states:
> 
> ,--
> Let's stay on this version for now. We've been facing a lot of issues in
> the past due to over-engineering, so let's wait for users to complain
> that 64 is not enough and only then we'll address that and we'll know
> why we do it.
> `--
> 
> This isn't really an issue and I'm not asking to change this; I'd just like
> to understand better the reason why LONGBITS being 64 is the maximum for
> cpu-map.

It's done like this because we perform all thread mask operations on
a "long" type. This has a lot of benefits, the most important one being
that all these updates and computations can be performed using atomic
operations, which are very fast.

> Given that multi processor systems (with multi threading enabled)
> nowadays easily have >64 cores I'd imagine others may run into this as well
> and look for further clues.

Quite frankly, at the moment (in 2018) I see little value in using that many
threads or even processes. Systems' scalability is far from being linear.
Worse, using a NUMA system for anything proxy-related is the best way to
destroy your performance due to the huge latency between the CPU sockets
and the important impact of cache line flushing for shared data. The only
real reason for using many cores is for TLS handshake processing. But even
there you will find a lot of wasted computation power, because the more cores
you have, the lower their frequency and the longer it takes to generate a
single key. And when you start to count in multiples of 50K keys/s, you'll
see that passing everything over shared busses and using a shared TCP session
table over NUMA becomes a huge problem again.

Now it should be technically possible to combine nbproc and nbthread.
The code was designed for this though it purposely doesn't offer you the
greatest flexibility regarding the possibilities to configure process+thread
affinity. But using this, you should theoretically be able to scale up to
4096 threads (which is totally ridiculous).

Just to give you some reference numbers, in the past I managed to reach
60 Gbps of forwarded traffic using a 4-core system, and a bit more recently
I reached 520k connections/s using only 8 processes. I'd claim that anyone
running above such numbers should seriously think about spreading the load
over multiple boxes because a single failure starts to cost a lot. So barring
SSL, the number of available CPUs will not bring anything really useful for
what we do. Also, when processing clear text traffic, you'll need roughly as
many CPU cores for the network stack as for haproxy. So even if you had, say,
an 80-core CPU in a single socket that you wanted to use for very high
bandwidth, you'd still configure 40 threads for haproxy and 40 for the
network.

Just my two cents,
willy



Re: talking http to a https port

2018-08-16 Thread Willy Tarreau
Hi Angelo,

On Thu, Aug 16, 2018 at 03:48:22AM +0200, Angelo Höngens wrote:
> Hey guys,
> 
> I'm building a solution where I am ssl-offloading some old plain http
> applications.
> 
> The http ports are listening on 8001, 8002, 8003, etc. I'm using haproxy
> 1.8.12 to listen on ports 18001, 18002, 18003, etc. using ssl. Everything
> works fine, as expected.
> 
> I am trying to make stuff a bit more fool-proof for stupid users, and
> testing stuff like talking http to https ports, etc. These things are not
> supposed to happen, but users sometimes do stupid stuff like that.
> 
> When I talk http to a https port, I would expect something like an error
> message 'ERROR 400: You are talking http to a https port', or something
> like that. This makes it clear to users what they are doing wrong. (Apache
> does this.)
> 
> However, I get no reply whatsoever. Not even a status code, haproxy just
> closes the connection, and curl just says: "* Empty reply from server".
> Haproxy logs a nice "SSL handshake failure" error in its logs, as expected.
> 
> I don't know if haproxy was designed to just keep quiet, but I would love
> to get a nice error message back. Is this something configurable?

We've thought about it already. The thing is, openssl directly reads from
the socket so when it fails to parse the data we have no idea what was
found there. We could decide to waste CPU cycles and memory by performing
a call to recv(MSG_PEEK) before openssl reads the contents so that in case
of handshake failure we can check if it looks like HTTP, but the reality
is that it would impact performance for power users with little benefit.

However along the connection layering rewrite, we have plans to split the
SSL layer and ensure that openssl will not directly manipulate the FD.
Thus it will become much easier over the mid term to do something like
this.

That's all I can propose for now.

Regards,
Willy



Re: Segfault on haproxy 1.7.10 with state file and slowstart

2018-08-16 Thread Willy Tarreau
Hi Baptiste,

could you please have a look at this one, I suspect Raghu's proposal
is reasonable, I just want to be sure we don't break DNS and friends.

Thanks,
Willy

On Mon, Aug 13, 2018 at 08:16:31PM +0530, Raghu Udiyar wrote:
> Hi Willy, Baptiste
> 
> On Fri, 29 Jun 2018 at 08:12 Willy Tarreau  wrote:
> 
> > Hi Raghu,
> > [...]
> >
> >[ALERT] 248/130258 (21960) : parsing [/etc/haproxy/test.cfg:53] :
> > 'server bla' : no method found to resolve address '(null)'
> >[ALERT] 248/130258 (21960) : Failed to initialize server(s) addr.
> >
> > According to Nenad :
> >   "It's not a good way to fix the issue we were experiencing
> >before. It will need a bigger rewrite, because the logic in
> >srv_iterate_initaddr needs to be changed."
> >
> > Could you please try the example above in the commit message with and
> > without
> > your patch ? Maybe the problem is still there, or maybe the logic change
> > mentioned by Nenad has already been done and this is no longer a problem,
> > in which case we could take your patch this time.
> >
> 
> Yes, this is a problem - I missed testing this case. I have a proposal.
> 
> The init-addr option only makes sense when an fqdn is used, but the last
> method is useful when using IP addresses. This patch enables the last
> method to work by default when IP addresses are used. Can you please
> review?
> 
> Thanks
> Raghu





Re: [PATCH] DOC: dns: explain set server ... fqdn requires resolver

2018-08-16 Thread Willy Tarreau
Hi Lukas,

On Tue, Aug 14, 2018 at 11:39:35AM +0200, Lukas Tribus wrote:
> Abhishek Gupta reported on discourse that set server [...] fqdn always
> fails. Further investigation showed that this requires the internal
> DNS resolver to be configured. Add this requirement to the docs.
> 
> Must be backported to 1.8.

Now merged, sorry for the delay.

Thanks!
Willy



Re: [Patch] multiple key type bundles are not loaded correctly in certain cases

2018-08-16 Thread Willy Tarreau
On Thu, Aug 16, 2018 at 03:25:57PM +0200, Emeric Brun wrote:
> Here are two patches which should fix the issues.

Applied, thanks guys.

Willy



Re: 100% cpu usage 1.9-dev0-48d92ee 2018/07/30, task.?. but keeps working.. (nbthread 1)

2018-08-16 Thread Willy Tarreau
Both patches applied, thanks guys!

Olivier, I have a suggestion for this one :
On Thu, Aug 16, 2018 at 07:17:07PM +0200, Olivier Houchard wrote:
From 90fc92f72c6b47d88769bb73680702d7b8e6 Mon Sep 17 00:00:00 2001
> From: Olivier Houchard 
> Date: Thu, 16 Aug 2018 19:03:02 +0200
> Subject: [PATCH 1/2] BUG/MEDIUM: tasks: Don't insert in the global rqueue if
>  nbthread == 1
> 
> Make sure we don't insert a task in the global run queue if nbthread == 1,
> as, as an optimisation, we avoid reading from it if nbthread == 1.
> ---
>  src/task.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/src/task.c b/src/task.c
> index de097baf7..e357bc169 100644
> --- a/src/task.c
> +++ b/src/task.c
> @@ -395,7 +395,8 @@ void process_runnable_tasks()
>   state = HA_ATOMIC_AND(&t->state, ~TASK_RUNNING);
>   if (state)
>  #ifdef USE_THREAD
> - __task_wakeup(t, (t->thread_mask == tid_bit) ?
> + __task_wakeup(t, (t->thread_mask == tid_bit ||
> + global.nbthread == 1) ?
>   &rqueue_local[tid] : &rqueue);
>  #else

I'm pretty sure we'll get caught by this sort of stuff in the future
again. I think it would be safer to proceed like this :

 __task_wakeup(t, ((t->thread_mask & all_threads_mask) == tid_bit) ?
               &rqueue_local[tid] : &rqueue);

It should always be valid regardless of the number of threads. The same
could be done in task_wakeup().

I suspect we may have a similar issue with the fd cache applied to listeners
here :

    static inline void updt_fd_polling(const int fd)
    {
if (fdtab[fd].thread_mask == tid_bit) {
unsigned int oldupdt;

/* note: we don't have a test-and-set yet in hathreads */


In this case the thread_mask might be larger than all_threads_mask and
we might fail this test. Or we may see that when threads exit and the
threads mask changes.

Just my two cents, both patches were applied anyway.

Willy



Re: 100% cpu usage 1.9-dev0-48d92ee 2018/07/30, task.?. but keeps working.. (nbthread 1)

2018-08-16 Thread Olivier Houchard
Hi again,

On Thu, Aug 16, 2018 at 05:50:27PM +0200, Olivier Houchard wrote:
> Hi Pieter,
> 
> On Thu, Aug 16, 2018 at 12:24:04AM +0200, PiBa-NL wrote:
> > Hi List,
> > 
> > Anyone got a idea how to debug this further?
> > Currently its running at 100% again, any pointers to debug the process as
> > its running would be appreciated.
> > 
> > Or should i compile again from current master and 'hope' it doesn't return?
> > 
> > b.t.w. truss output is as follows:
> > kevent(3,0x0,0,{ },200,{ 0.0 })  = 0 (0x0)
> > kevent(3,0x0,0,{ },200,{ 0.0 })  = 0 (0x0)
> > kevent(3,0x0,0,{ },200,{ 0.0 })  = 0 (0x0)
> > kevent(3,0x0,0,{ },200,{ 0.0 })  = 0 (0x0)
> > kevent(3,0x0,0,{ },200,{ 0.0 })  = 0 (0x0)
> > kevent(3,0x0,0,{ },200,{ 0.0 })  = 0 (0x0)
> > kevent(3,0x0,0,{ },200,{ 0.0 })  = 0 (0x0)
> > 
> > Regards,
> > PiBa-NL (Pieter)
> 
> I'm interested in figuring that one out.
> From the look of it, it seems the scheduler thinks there's a task to be run,
> and so won't let the poller sleep in kevent().
> Can you
> - update to the latest master, even though I don't think any relevant fix
> was applied since Jul 30.
> - compile it with -O0, so that we can get meaningful information from gdb.
> - When/if that happens again, getting a core, and send it to me with the
> haproxy binary, assuming there's no confidential information in that core,
> of course.
> 
> Thanks !
> 

So after discussing on IRC, I'm pretty sure I figured it out. The two
attached patches should fix it.

Thanks a lot !

Olivier
From 90fc92f72c6b47d88769bb73680702d7b8e6 Mon Sep 17 00:00:00 2001
From: Olivier Houchard 
Date: Thu, 16 Aug 2018 19:03:02 +0200
Subject: [PATCH 1/2] BUG/MEDIUM: tasks: Don't insert in the global rqueue if
 nbthread == 1

Make sure we don't insert a task in the global run queue if nbthread == 1,
as, as an optimisation, we avoid reading from it if nbthread == 1.
---
 src/task.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/task.c b/src/task.c
index de097baf7..e357bc169 100644
--- a/src/task.c
+++ b/src/task.c
@@ -395,7 +395,8 @@ void process_runnable_tasks()
	state = HA_ATOMIC_AND(&t->state, ~TASK_RUNNING);
if (state)
 #ifdef USE_THREAD
-   __task_wakeup(t, (t->thread_mask == tid_bit) ?
+   __task_wakeup(t, (t->thread_mask == tid_bit ||
+   global.nbthread == 1) ?
	&rqueue_local[tid] : &rqueue);
 #else
	__task_wakeup(t, &rqueue_local[tid]);
-- 
2.14.3

From 7640aa3de3c9ad00fe82cda4a50351e46fc0bf48 Mon Sep 17 00:00:00 2001
From: Olivier Houchard 
Date: Thu, 16 Aug 2018 19:03:50 +0200
Subject: [PATCH 2/2] BUG/MEDIUM: sessions: Don't use t->state.

In session_expire_embryonic(), don't use t->state, use the "state" argument
instead, as t->state has been cleaned before we're being called.
---
 src/session.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/session.c b/src/session.c
index 1d66b739f..d7d8544c7 100644
--- a/src/session.c
+++ b/src/session.c
@@ -388,7 +388,7 @@ static struct task *session_expire_embryonic(struct task 
*t, void *context, unsi
 {
struct session *sess = context;
 
-   if (!(t->state & TASK_WOKEN_TIMER))
+   if (!(state & TASK_WOKEN_TIMER))
return t;
 
session_kill_embryonic(sess);
-- 
2.14.3



Re: 100% cpu usage 1.9-dev0-48d92ee 2018/07/30, task.?. but keeps working.. (nbthread 1)

2018-08-16 Thread Olivier Houchard
Hi Pieter,

On Thu, Aug 16, 2018 at 12:24:04AM +0200, PiBa-NL wrote:
> Hi List,
> 
> Anyone got a idea how to debug this further?
> Currently its running at 100% again, any pointers to debug the process as
> its running would be appreciated.
> 
> Or should i compile again from current master and 'hope' it doesn't return?
> 
> b.t.w. truss output is as follows:
> kevent(3,0x0,0,{ },200,{ 0.0 })  = 0 (0x0)
> kevent(3,0x0,0,{ },200,{ 0.0 })  = 0 (0x0)
> kevent(3,0x0,0,{ },200,{ 0.0 })  = 0 (0x0)
> kevent(3,0x0,0,{ },200,{ 0.0 })  = 0 (0x0)
> kevent(3,0x0,0,{ },200,{ 0.0 })  = 0 (0x0)
> kevent(3,0x0,0,{ },200,{ 0.0 })  = 0 (0x0)
> kevent(3,0x0,0,{ },200,{ 0.0 })  = 0 (0x0)
> 
> Regards,
> PiBa-NL (Pieter)

I'm interested in figuring that one out.
From the look of it, it seems the scheduler thinks there's a task to be run,
and so won't let the poller sleep in kevent().
Can you
- update to the latest master, even though I don't think any relevant fix
was applied since Jul 30.
- compile it with -O0, so that we can get meaningful information from gdb.
- When/if that happens again, getting a core, and send it to me with the
haproxy binary, assuming there's no confidential information in that core,
of course.

Thanks !

Olivier



Re: [Patch] multiple key type bundles are not loaded correctly in certain cases

2018-08-16 Thread Emeric Brun
Hi Willy, Michael,

On 08/02/2018 06:03 PM, Willy Tarreau wrote:
> Hi Michael,
> 
> On Thu, Aug 02, 2018 at 03:48:13PM +0200, Michael Wimmesberger wrote:
>> Hi,
>>
>> while preparing to use multi-keytype bundles for my company's
>> domains, I found the following two issues:
> (...)
> 
> Thanks for reporting these issues. I'm CCing Emeric who's currently
> in vacation and will be back soon.  I prefer that he double-checks
> the implications of these modifications and/or proposes some extra
> solutions. While I'd suspect your first proposed change is right, he
> might have a use case in mind that we don't want to break (and this
> code is quite tricky).
> 
> Thanks!
> Willy
> 

Here are two patches which should fix the issues.


Thank you for the debug scripts and your information, Michael. They were 
very useful.


R,
Emeric
From b7698752256a405ee32f0ac412eec7a25163c459 Mon Sep 17 00:00:00 2001
From: Emeric Brun 
Date: Thu, 16 Aug 2018 15:14:12 +0200
Subject: [PATCH 2/2] BUG/MEDIUM: ssl: loading dh param from certifile causes
 unpredictable error.

If the dh parameter is not found, the openssl's error global
stack was not correctly cleared causing unpredictable error
during the following parsing (chain cert parsing for instance).

This patch should be backported in 1.8 (and perhaps 1.7)
---
 src/ssl_sock.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/ssl_sock.c b/src/ssl_sock.c
index b9c1285..8d0b674 100644
--- a/src/ssl_sock.c
+++ b/src/ssl_sock.c
@@ -2607,6 +2607,8 @@ end:
 if (in)
 BIO_free(in);
 
+	ERR_clear_error();
+
 	return dh;
 }
 
-- 
2.7.4

From 246bad0bee1e02088538d492f801c38e5f9ad535 Mon Sep 17 00:00:00 2001
From: Emeric Brun 
Date: Thu, 16 Aug 2018 15:11:12 +0200
Subject: [PATCH 1/2] BUG/MEDIUM: ssl: fix missing error loading a keytype cert
 from a bundle.

If there was an issue loading a keytype's part of a bundle, the bundle
was implicitly ignored without errors.

This patch should be backported in 1.8 (and perhaps 1.7)
---
 src/ssl_sock.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/ssl_sock.c b/src/ssl_sock.c
index 7e8739a..b9c1285 100644
--- a/src/ssl_sock.c
+++ b/src/ssl_sock.c
@@ -3491,7 +3491,7 @@ int ssl_sock_load_cert(char *path, struct bind_conf *bind_conf, char **err)
 		}
 
 		snprintf(fp, sizeof(fp), "%s/%s", path, dp);
-		ssl_sock_load_multi_cert(fp, bind_conf, NULL, NULL, 0, err);
+		cfgerr += ssl_sock_load_multi_cert(fp, bind_conf, NULL, NULL, 0, err);
 
 		/* Successfully processed the bundle */
 		goto ignore_entry;
-- 
2.7.4



SPOE and modsecurity contrib Failed to decode HELLO frame

2018-08-16 Thread Павел .

Hi all! I compiled modsec as described in the instructions in 
contrib/modsec/README, but I get the following errors:

# /usr/local/bin/modsecurity -n 4 -d -f /etc/haproxy/waf/modsecurity.conf
1534409877.286475 [00] ModSecurity for nginx (STABLE)/2.9.2 
(http://www.modsecurity.org/) configured.
1534409877.286555 [00] ModSecurity: APR compiled version="1.4.8"; loaded 
version="1.4.8"
1534409877.286577 [00] ModSecurity: PCRE compiled version="8.32 "; loaded 
version="8.32 2012-11-30"
1534409877.286593 [00] ModSecurity: YAJL compiled version="2.0.4"
1534409877.286610 [00] ModSecurity: LIBXML compiled version="2.9.1"
1534409877.286723 [00] ModSecurity: StatusEngine call: 
"2.9.2,nginx,1.4.8/1.4.8,8.32/8.32 
2012-11-30,(null),2.9.1,4c3b0f175f079eaa4dd15b6eaef7a8207e809bb8"
1534409877.494433 [00] ModSecurity: StatusEngine call successfully sent. For 
more information visit: http://status.modsecurity.org/
1534409877.495333 [00] Worker 01 initialized
1534409877.495453 [01] Worker ready to process client messages
1534409877.495499 [00] Worker 02 initialized
1534409877.495514 [02] Worker ready to process client messages
1534409877.495767 [00] Worker 03 initialized
1534409877.495817 [03] Worker ready to process client messages
1534409877.495925 [00] Worker 04 initialized
1534409877.495958 [00] Server is ready [fragmentation=false - pipelining=false 
- async=false - debug=true - max-frame-size=16384]
1534409877.495961 [04] Worker ready to process client messages
1534409881.192419 [00] <1> New Client connection accepted and assigned to 
worker 01
1534409881.192511 [01] <1> read_frame_cb
1534409881.192596 [01] <1> New Frame of 129 bytes received
1534409881.192606 [01] <1> Decode HAProxy HELLO frame
1534409881.192613 [01] Failed to decode HELLO frame
1534409881.192617 [01] <1> Encode Agent DISCONNECT frame
1534409881.192626 [01] <1> Disconnect status code : 10
1534409881.192630 [01] <1> Disconnect message : fragmentation not supported
1534409881.192648 [01] <1> write_frame_cb
1534409881.192689 [01] <1> Frame of 58 bytes send
1534409881.192695 [01] <1> Release client
1534409882.497134 [01] 0 clients connected
1534409882.497174 [02] 0 clients connected
1534409882.497197 [04] 0 clients connected
1534409882.497185 [03] 0 clients connected
^C1534409885.480782 [00] Stopping the server

Has anyone encountered a similar issue?

-- 
Павел.