Re: [ANNOUNCE] haproxy-1.7.0

2016-11-25 Thread Jonathan Opperman
On 26/11/2016 11:06, "Willy Tarreau"  wrote:
>
> On Sat, Nov 26, 2016 at 10:14:57AM +1300, Jonathan Opperman wrote:
> > On 26/11/2016 10:11, "Baptiste"  wrote:
> > >
> > > Congrats all 
> > >
> > > Baptiste
> >
> > High 5 guys, haproxy is an awesome product. Congratulations to all
> > involved. What's the best way to get involved with helping with the
> > development of haproxy?
>
> Test it, help others when you can, report issues, read patches posted here,
> test them, comment, review them, and at some point you'll figure you're able
> to propose your own and to improve it yourself. We all started like this :-)
>
> Cheers,
> Willy

Thanks Willy, I will do this. These days it's so easy to fire up test
environments and do some testing, especially with lxc/lxd.

Thanks again to all for haproxy's existence. :)

Cheers
Jono


Re: [ANNOUNCE] haproxy-1.7.0

2016-11-25 Thread Willy Tarreau
On Sat, Nov 26, 2016 at 10:14:57AM +1300, Jonathan Opperman wrote:
> On 26/11/2016 10:11, "Baptiste"  wrote:
> >
> > Congrats all 
> >
> > Baptiste
> 
> High 5 guys, haproxy is an awesome product. Congratulations to all
> involved. What's the best way to get involved with helping with the
> development of haproxy?

Test it, help others when you can, report issues, read patches posted here,
test them, comment, review them, and at some point you'll figure you're able
to propose your own and to improve it yourself. We all started like this :-)

Cheers,
Willy



Re: [ANNOUNCE] haproxy-1.7.0

2016-11-25 Thread Jonathan Opperman
On 26/11/2016 10:11, "Baptiste"  wrote:
>
> Congrats all 
>
> Baptiste

High 5 guys, haproxy is an awesome product. Congratulations to all
involved. What's the best way to get involved with helping with the
development of haproxy?


Re: [ANNOUNCE] haproxy-1.7.0

2016-11-25 Thread Baptiste
Congrats all 

Baptiste


Re: [ANNOUNCE] haproxy-1.7.0

2016-11-25 Thread Willy Tarreau
On Fri, Nov 25, 2016 at 09:06:38PM +0100, Cyril Bonté wrote:
> Le 25/11/2016 à 20:09, Willy Tarreau a écrit :
> > Hi Cyril,
> > 
> > On Fri, Nov 25, 2016 at 07:25:12PM +0100, Cyril Bonté wrote:
> > > I was about to prepare the HTML documentation :-)
> > > But it will come a bit later : it seems that the repository for 
> > > haproxy-1.7
> > > is not cloneable yet (missing info/refs).
> > 
> > I think you tell me this at each release, and that at each release I
> > forget to set the post-update hook... One more reason for releasing
> > more often :-)
> > 
> > It should be OK now.
> 
> It is :) and now the documentation for 1.7.0 and 1.8-dev0 is ready.

Excellent, thanks for taking care of it this fast!

Willy



Re: [ANNOUNCE] haproxy-1.7.0

2016-11-25 Thread Cyril Bonté

Le 25/11/2016 à 20:09, Willy Tarreau a écrit :

Hi Cyril,

On Fri, Nov 25, 2016 at 07:25:12PM +0100, Cyril Bonté wrote:

I was about to prepare the HTML documentation :-)
But it will come a bit later : it seems that the repository for haproxy-1.7
is not cloneable yet (missing info/refs).


I think you tell me this at each release, and that at each release I
forget to set the post-update hook... One more reason for releasing
more often :-)

It should be OK now.


It is :) and now the documentation for 1.7.0 and 1.8-dev0 is ready.

Cheers !

--
Cyril Bonté



Re: [ANNOUNCE] haproxy-1.7.0

2016-11-25 Thread Willy Tarreau
Hi Cyril,

On Fri, Nov 25, 2016 at 07:25:12PM +0100, Cyril Bonté wrote:
> I was about to prepare the HTML documentation :-)
> But it will come a bit later : it seems that the repository for haproxy-1.7
> is not cloneable yet (missing info/refs).

I think you tell me this at each release, and that at each release I
forget to set the post-update hook... One more reason for releasing
more often :-)

It should be OK now.

Cheers,
Willy



Re: [ANNOUNCE] haproxy-1.7.0

2016-11-25 Thread Cyril Bonté

Hi all, hi Willy,

Le 25/11/2016 à 18:51, Willy Tarreau a écrit :

Hi,

HAProxy 1.7.0 was released on 2016/11/25.


Great news !


[...]
Please find the usual URLs below :
   Site index   : http://www.haproxy.org/
   Discourse: http://discourse.haproxy.org/
   Sources  : http://www.haproxy.org/download/1.7/src/
   Git repository   : http://git.haproxy.org/git/haproxy-1.7.git/
   Git Web browsing : http://git.haproxy.org/?p=haproxy-1.7.git
   Changelog: http://www.haproxy.org/download/1.7/src/CHANGELOG
   Cyril's HTML doc : http://cbonte.github.io/haproxy-dconv/


I was about to prepare the HTML documentation :-)
But it will come a bit later : it seems that the repository for 
haproxy-1.7 is not cloneable yet (missing info/refs).



--
Cyril Bonté



[ANNOUNCE] haproxy-1.7.0

2016-11-25 Thread Willy Tarreau
Hi,

HAProxy 1.7.0 was released on 2016/11/25. It added 107 new commits
after version 1.7-dev6. Most of them were late minor bug fixes and code
cleanups. Over the last two weeks we finally managed to clean up a lot of
historical mess, just by splitting some huge code parts into several
files, or moving them into the appropriate file. It's better done
before a release than after, since it will make backports easier for the
maintenance branch. To be honest, there's nothing outstanding compared
to 1.7-dev6 so I won't comment on these very latest changes.

HAProxy 1.7 is now what I would have liked 1.6 to be, and is what I
consider the cleanest version we've ever produced. When 1.6 was released
one year ago, I predicted that we'd face one year worth of bug fixes due
to the important changes that were brought to the connection management,
and it indeed took almost one year to get rid of all of them. Now we
mostly focused on fixes, cleanups and modularity, but not on earth-shaking
changes.

It's interesting to note that among the 706 commits that were produced
between 1.6.0 and 1.7.0, no less than 207 were bug fixes (roughly 1/3),
around 70 were build fixes and code reorganizations, and around 60 were
doc updates, so 1.7 was where the fixes for 1.6 were developed, and that
brings it to its current level of maturity. We have observed almost no
1.7-specific regressions during its development so far, which is a very
good sign of the code becoming more modular and much less tricky than
it used to be. We had to emit 1.6.1 only one week after 1.6.0 due
to a major bug; I bet we'll be able to wait longer before requiring such
an update, but time will tell.

Despite this, it still brings quite a few significant improvements over
1.6 :
  - significant improvements of the CLI : it is now possible to easily
register new commands without causing some inter-dependencies between
the CLI code and the functional code, so we could already improve a
large number of commands with better help and extra arguments. In
addition to this, the Lua code can also register CLI commands, pushing
the limits as far as your imagination goes.

  - typed statistics : these will make it easier to aggregate statistics over
multiple processes. Additionally, all the fields that used to be
available in HTML are now also exported in the CSV output, such as
the server's address and port, cookie, average response times, etc.
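
    As a quick illustration (the stats socket path below is hypothetical),
    the typed output can be pulled from the CLI, one field per line with its
    type information, which makes cross-process aggregation easy to script:

    ```
    $ echo "show stat typed" | socat stdio /var/run/haproxy.sock
    ```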

  - SPOE (stream processing offload engine) : ability to delegate some
slow, unreliable or dangerous processing to external processes,
ensuring it will be much less necessary to touch the core parts to
add new features, and that some parts could possibly work across
multiple versions.

  - filters : these are a new type of internal hooks to many events and
around most analysers in order to plug code that can manipulate data
and headers. The compression was moved to a filter, and it will be
easy to write new code using filters. SPOE was built entirely as a
filter.

  - log-format : the parser now honors error processing. It's been a
    huge source of complaints over the last few years that some log
    fields were left empty because they were improperly typed in the
    config; the much more modular architecture now made this fix possible.

  - support of directories for config files : now if the argument to -f
is a directory, all files found there are loaded in alphabetical
order. Additionally, files can be specified after "--" without having
to repeat "-f".
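
    For instance (paths are hypothetical), both of these invocations load
    several files at once:

    ```
    # loads every file found in the directory, in alphabetical order
    haproxy -f /etc/haproxy/conf.d/

    # same as -f global.cfg -f frontends.cfg -f backends.cfg
    haproxy -f global.cfg -- frontends.cfg backends.cfg
    ```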

  - config : it is now possible to set/unset/preset environment variables
directly in the global section, and even to consult them on the CLI.

  - init-addr : it is now possible to decide in which order the FQDN
    should be resolved on "server" lines, and even to accept starting with
    no address, waiting for a run-time resolution.
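
    A sketch of such a "server" line (the server name, FQDN and resolvers
    section are hypothetical):

    ```
    server app1 app1.example.com:443 check resolvers mydns init-addr last,libc,none
    ```

    Here the address comes from the server-state file when available, then
    from a libc lookup, and otherwise the server starts with no address at
    all and waits for run-time resolution.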

  - server update on the CLI : the CLI makes it possible to change a
server's address, port, maxconn, check address and port so that it
is not required anymore to reload haproxy just to update an address.
    In conjunction with init-addr, it even allows pre-populating some
    server pools that are filled at run time.
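
    On the CLI this looks like the following sketch (the backend/server
    names and socket path are hypothetical):

    ```
    $ echo "set server bk_app/srv1 addr 192.0.2.10 port 8080" | socat stdio /var/run/haproxy.sock
    $ echo "set server bk_app/srv1 maxconn 500" | socat stdio /var/run/haproxy.sock
    ```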

  - state change via the DNS : a valid DNS resolution can now start a
server, and repeated failures can stop it (configurable). This is
another step in the direction of a more dynamic configuration.

  - agent-check : an agent can now change the server's maxconn setting. A
server may now take its own load into consideration when deciding what
its connection limit should be.

  - support for OpenSSL 1.1.0 : this makes this new version future-proof
given that 1.1.0 is about to ship in some future distros. Compatibility
with older versions was validated on 0.9.8, 1.0.1 and 1.0.2.

  - support of multi-certs : different certificates for a same domain so
that the best one can be picked according to browser support. The main
use is to be 

Re: SSL/ECC and nbproc >1

2016-11-25 Thread Christian Ruppert

On 2016-11-25 15:26, Willy Tarreau wrote:

On Fri, Nov 25, 2016 at 02:44:35PM +0100, Christian Ruppert wrote:
I have a default bind for process 1 which is basically the http frontend and
the actual backend, RSA is bound to another, single process and ECC is bound
to all the rest. So in this case SSL (in particular ECC) is the problem. The
connections/handshakes should be *actually* using CPU+2 till NCPU.


That's exactly what I'm talking about, look, you have this :

  frontend ECC
 bind-process 3-36
 bind :65420 ssl crt /etc/haproxy/test.pem-ECC
 mode http
 default_backend bk_ram

It creates a single socket (hence a single queue) and shares it between
all processes. Thus each incoming connection will wake up all processes
not doing anything, and the first one capable of grabbing it will take
it as well as a few following ones if any. You end up with a very
unbalanced load making it hard to scale.

Instead you can do this :

  frontend ECC
 bind :65420 ssl crt /etc/haproxy/test.pem-ECC process 3
 bind :65420 ssl crt /etc/haproxy/test.pem-ECC process 4
 bind :65420 ssl crt /etc/haproxy/test.pem-ECC process 5
 bind :65420 ssl crt /etc/haproxy/test.pem-ECC process 6
 ...
 bind :65420 ssl crt /etc/haproxy/test.pem-ECC process 36
 mode http
 default_backend bk_ram

You'll really have 34 listening sockets all fairly balanced with their
own queue. You can generally achieve higher loads this way and with a
lower average latency.

Also, I tend to bind network IRQs to the same cores as those doing SSL
because you hardly have the two at once. SSL is not able to deal with
traffic capable of saturating a NIC driver, so when SSL saturates the
CPU you have little traffic and when the NIC requires all the CPU for
high traffic, you know there's little SSL.

Cheers,
Willy


Ah! Thanks! I had to remove the default "bind-process 1", or else also set
"bind-process 3-36" in the ECC frontend. I guess it's the same in the end.
Anyway, the IRQ/NIC problem was still the same. I'll set it up that way
anyway if that's better, together with the Intel affinity script or, as you
said, with IRQs bound to the related cores that do SSL. Let's see how well
that performs.


--
Regards,
Christian Ruppert



Re: Backend: Multiple A records

2016-11-25 Thread Baptiste
On Fri, Nov 25, 2016 at 8:08 AM, Willy Tarreau  wrote:

> Hi Tim,
>
> On Fri, Nov 25, 2016 at 02:34:49AM +0100, Tim Düsterhus wrote:
> > Hi
> >
> > On 28.08.2016 19:57, Baptiste wrote:
> > > This should happen soon, for 1.7.
> >
> > I noticed Willy's email "1.7 => almost there" and wanted to test out the
> > feature. Has this feature been implemented for 1.7?
>
> No, unfortunately none of us had the time to complete this. It's sad
> but true. And I definitely refuse to reproduce the 1.5 model where
> we wait for a certain feature to release and where it takes 4.5
> years to produce the expected 6-months release. So this will have to
> wait for 1.8 or later, until someone has time to complete this feature.
>
> At least right now you can update the IP addresses from the CLI, so
> you could very well run a script iterating over the output of a
> "host" command to feed it. It's not as magical but will work.
>
> Regards,
> Willy
>


Hi

This will be my next point of focus, along with support for SRV records,
but I currently have a very limited amount of free time.

Baptiste
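
As a stop-gap, the script-based approach Willy suggests above (feeding the
CLI from a "host" lookup) could be sketched like this — the FQDN, the
backend/server names and the socket path are hypothetical, and it assumes a
single A record:

```
#!/bin/sh
# Resolve the backend FQDN and push the first A record to haproxy's CLI.
ADDR=$(host -t a app1.example.com | awk '/has address/ { print $4; exit }')
[ -n "$ADDR" ] && echo "set server bk_app/srv1 addr $ADDR" | socat stdio /var/run/haproxy.sock
```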


Re: SSL/ECC and nbproc >1

2016-11-25 Thread Willy Tarreau
On Fri, Nov 25, 2016 at 02:44:35PM +0100, Christian Ruppert wrote:
> I have a default bind for process 1 which is basically the http frontend and
> the actual backend, RSA is bound to another, single process and ECC is bound
> to all the rest. So in this case SSL (in particular ECC) is the problem. The
> connections/handshakes should be *actually* using CPU+2 till NCPU.

That's exactly what I'm talking about, look, you have this :

  frontend ECC
 bind-process 3-36
 bind :65420 ssl crt /etc/haproxy/test.pem-ECC
 mode http
 default_backend bk_ram

It creates a single socket (hence a single queue) and shares it between
all processes. Thus each incoming connection will wake up all processes
not doing anything, and the first one capable of grabbing it will take
it as well as a few following ones if any. You end up with a very
unbalanced load making it hard to scale.

Instead you can do this :

  frontend ECC
 bind :65420 ssl crt /etc/haproxy/test.pem-ECC process 3
 bind :65420 ssl crt /etc/haproxy/test.pem-ECC process 4
 bind :65420 ssl crt /etc/haproxy/test.pem-ECC process 5
 bind :65420 ssl crt /etc/haproxy/test.pem-ECC process 6
 ...
 bind :65420 ssl crt /etc/haproxy/test.pem-ECC process 36
 mode http
 default_backend bk_ram

You'll really have 34 listening sockets all fairly balanced with their
own queue. You can generally achieve higher loads this way and with a
lower average latency.

Also, I tend to bind network IRQs to the same cores as those doing SSL
because you hardly have the two at once. SSL is not able to deal with
traffic capable of saturating a NIC driver, so when SSL saturates the
CPU you have little traffic and when the NIC requires all the CPU for
high traffic, you know there's little SSL.
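
For reference, the manual way to pin a NIC queue's IRQ to a given core goes
through /proc (the IRQ number and CPU mask below are hypothetical; vendor
scripts such as Intel's set_irq_affinity automate this per queue):

```
# pin the NIC queue behind IRQ 120 to CPU 3 (mask 0x8 = 1 << 3)
echo 8 > /proc/irq/120/smp_affinity
```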

Cheers,
Willy



Re: SSL/ECC and nbproc >1

2016-11-25 Thread Christian Ruppert

On 2016-11-25 14:44, Christian Ruppert wrote:

Hi Willy,

On 2016-11-25 14:30, Willy Tarreau wrote:

Hi Christian,

On Fri, Nov 25, 2016 at 12:12:06PM +0100, Christian Ruppert wrote:
I'll compare HT/no-HT afterwards. In my first tests it didn't seem to make
much of a difference so far.
I also tried (in this case) to disable HT entirely and set it to max. 36
procs. Basically the same as before.


Also you definitely need to split your bind lines, one per process, to
take advantage of the kernel's ability to load balance between multiple
queues. Otherwise the load is always unequal and many processes are woken
up for nothing.


I have a default bind for process 1 which is basically the http
frontend and the actual backend, RSA is bound to another, single
process and ECC is bound to all the rest. So in this case SSL (in
particular ECC) is the problem. The connections/handshakes should be
*actually* using CPU+2 till NCPU. The only shared part should be the
backend but that should be actually no problem for e.g. 5 parallel
benchmarks as a single HTTP benchmark can make >20k requests/s.

global
nbproc 36

defaults:
bind-process 1

frontend http
bind :65410
mode http
default_backend bk_ram

frontend ECC
bind-process 3-36
bind :65420 ssl crt /etc/haproxy/test.pem-ECC
mode http
default_backend bk_ram

backend bk_ram
mode http
fullconn 75000
errorfile 503 /etc/haproxy/test.error




Regards,
Willy


It seems to be the NIC or rather the driver/kernel. Using Intel's
set_irq_affinity script (set_irq_affinity -x local eth2 eth3) seems to do
the trick, at least at first glance.


--
Regards,
Christian Ruppert



Re: SSL/ECC and nbproc >1

2016-11-25 Thread Christian Ruppert

Hi Willy,

On 2016-11-25 14:30, Willy Tarreau wrote:

Hi Christian,

On Fri, Nov 25, 2016 at 12:12:06PM +0100, Christian Ruppert wrote:
I'll compare HT/no-HT afterwards. In my first tests it didn't seem to make
much of a difference so far.
I also tried (in this case) to disable HT entirely and set it to max. 36
procs. Basically the same as before.


Also you definitely need to split your bind lines, one per process, to
take advantage of the kernel's ability to load balance between multiple
queues. Otherwise the load is always unequal and many processes are woken
up for nothing.


I have a default bind for process 1 which is basically the http frontend 
and the actual backend, RSA is bound to another, single process and ECC 
is bound to all the rest. So in this case SSL (in particular ECC) is the 
problem. The connections/handshakes should be *actually* using CPU+2 
till NCPU. The only shared part should be the backend but that should be 
actually no problem for e.g. 5 parallel benchmarks as a single HTTP 
benchmark can make >20k requests/s.


global
nbproc 36

defaults:
bind-process 1

frontend http
bind :65410
mode http
default_backend bk_ram

frontend ECC
bind-process 3-36
bind :65420 ssl crt /etc/haproxy/test.pem-ECC
mode http
default_backend bk_ram

backend bk_ram
mode http
fullconn 75000
errorfile 503 /etc/haproxy/test.error




Regards,
Willy


--
Regards,
Christian Ruppert



Re: SSL/ECC and nbproc >1

2016-11-25 Thread Willy Tarreau
Hi Christian,

On Fri, Nov 25, 2016 at 12:12:06PM +0100, Christian Ruppert wrote:
> I'll compare HT/no-HT afterwards. In my first tests it didn't seem to make
> much of a difference so far.
> I also tried (in this case) to disable HT entirely and set it to max. 36
> procs. Basically the same as before.

Also you definitely need to split your bind lines, one per process, to
take advantage of the kernel's ability to load balance between multiple
queues. Otherwise the load is always unequal and many processes are woken
up for nothing.

Regards,
Willy



Re: [PATCH] allow higher averages than 16448ms

2016-11-25 Thread Willy Tarreau
Hi again Reinhard,

On Fri, Nov 25, 2016 at 09:17:41AM +0100, Reinhard Vicinus wrote:
> Hi Willy,
> 
> if the cost is too high, then I have no problem keeping the known
> behavior. The only thing I would suggest is to document it, because it
> caused me some headache to figure out why the values were always too low
> and I couldn't find any information that this behavior is a known problem.

I finally found a much more elegant solution by improving the formula to
save one multiply. Not only does this avoid the overflow without changing
the integer size, it's also faster :-)

Now the limit is at 8.4M milliseconds of average time, or around 2h20m;
this should be plenty for most situations! I'm attaching the patch I've
just merged for this.

Best regards,
Willy
From 3758581e197dd7b390fb3da94c08f34e0d319c07 Mon Sep 17 00:00:00 2001
From: Willy Tarreau 
Date: Fri, 25 Nov 2016 11:55:10 +0100
Subject: BUG/MINOR: freq-ctr: make swrate_add() support larger values

Reinhard Vicinus reported that the reported average response times cannot
be larger than 16s due to the double multiply being performed by
swrate_add() which causes an overflow very quickly. Indeed, with N=512,
the highest average value is 16448.

One solution proposed by Reinhard is to turn to long long, but this
involves 64x64 multiplies and 64->32 divides, which are extremely
expensive on 32-bit platforms.

There is in fact another way to avoid the overflow without using larger
integers, it consists in avoiding the multiply using the fact that
x*(n-1)/N = x-(x/N).

Now it becomes possible to store average values as large as 8.4 million,
which is around 2h20mn.

Interestingly, this improvement also makes the code cheaper to execute
both on 32 and on 64 bit platforms :

Before :

0000 <swrate_add>:
   0:   8b 54 24 04             mov    0x4(%esp),%edx
   4:   8b 0a                   mov    (%edx),%ecx
   6:   89 c8                   mov    %ecx,%eax
   8:   c1 e0 09                shl    $0x9,%eax
   b:   29 c8                   sub    %ecx,%eax
   d:   8b 4c 24 0c             mov    0xc(%esp),%ecx
  11:   c1 e8 09                shr    $0x9,%eax
  14:   01 c8                   add    %ecx,%eax
  16:   89 02                   mov    %eax,(%edx)

After :

0020 <swrate_add>:
  20:   8b 4c 24 04             mov    0x4(%esp),%ecx
  24:   8b 44 24 0c             mov    0xc(%esp),%eax
  28:   8b 11                   mov    (%ecx),%edx
  2a:   01 d0                   add    %edx,%eax
  2c:   81 c2 ff 01 00 00       add    $0x1ff,%edx
  32:   c1 ea 09                shr    $0x9,%edx
  35:   29 d0                   sub    %edx,%eax
  37:   89 01                   mov    %eax,(%ecx)

This fix may be backported to 1.6.
---
 include/proto/freq_ctr.h | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/include/proto/freq_ctr.h b/include/proto/freq_ctr.h
index 65388b1..70b295e 100644
--- a/include/proto/freq_ctr.h
+++ b/include/proto/freq_ctr.h
@@ -182,7 +182,19 @@ unsigned int freq_ctr_remain_period(struct freq_ctr_period *ctr, unsigned int pe
  *
  * So basically by summing values and applying the last result an (N-1)/N factor
  * we just get N times the values over the long term, so we can recover the
- * constant value V by dividing by N.
+ * constant value V by dividing by N. In order to limit the impact of integer
+ * overflows, we'll use this equivalence which saves us one multiply :
+ *
+ *                  N - 1              1              x0
+ *      x1 = x0 * ------- = x0 * ( 1 - --- ) = x0 - ----
+ *                    N                  N             N
+ *
+ * And given that x0 is discrete here we'll have to saturate the values before
+ * performing the divide, so the value insertion will become :
+ *
+ *                x0 + N - 1
+ *      x1 = x0 - -----------
+ *                     N
  *
  * A value added at the entry of the sliding window of N values will thus be
  * reduced to 1/e or 36.7% after N terms have been added. After a second batch,
@@ -220,7 +232,7 @@ unsigned int freq_ctr_remain_period(struct freq_ctr_period *ctr, unsigned int pe
  */
 static inline unsigned int swrate_add(unsigned int *sum, unsigned int n, unsigned int v)
 {
-   return *sum = *sum * (n - 1) / n + v;
+   return *sum = *sum - (*sum + n - 1) / n + v;
 }
 
 /* Returns the average sample value for the sum <sum> over a sliding window of
-- 
1.7.12.1



Re: SSL/ECC and nbproc >1

2016-11-25 Thread Christian Ruppert

Hi Conrad,

On 2016-10-21 17:39, Conrad Hoffmann wrote:

Hi,

it's a lot of information, and I don't have time to go into all details
right now, but from a quick read, here are the things I noticed:

- Why nbproc 64? Your CPU has 18 cores (36 w/ HT), so more procs than that
will likely make performance rather worse. HT cores share the cache, so
using 18 might make most sense (see also below). It's best to experiment a
little with that and measure the results, though.


I'll compare HT/no-HT afterwards. In my first tests it didn't seem to make
much of a difference so far.
I also tried (in this case) to disable HT entirely and set it to max. 36
procs. Basically the same as before.




- If you see ksoftirq eating up a lot of one CPU, then your box is most
likely configured to process all IRQs on the first core. Most NICs these
days can be configured to use several IRQs, which you can then distribute
across all cores, smoothing the workload across cores significantly.


I'll try to get a more recent distro (it's still Debian Wheezy) with a
newer driver etc. They seem to have added some IRQ options in more
recent versions of ixgbe. The kernel could also be related.

So disabling HT did not help.
nginx seems to have a similar problem btw., so it's neither HAProxy- nor
nginx-specific, I guess.




- Consider using "bind-process" to lock the processes to a single core (but
make sure to leave out the HT cores, or disable HT altogether). Less
context switching might improve performance.

Hope that helps,
Conrad



On 10/21/2016 04:47 PM, Christian Ruppert wrote:

Hi,

again a performance topic.
I did some further testing/benchmarks with ECC and nbproc >1. I was testing
on an "E5-2697 v4" and the first thing I noticed was that HAProxy has a
fixed limit of 64 for nbproc. So the setup:

HAProxy server with the mentioned E5:
global
user haproxy
group haproxy
maxconn 75000
log 127.0.0.2 local0
ssl-default-bind-ciphers
ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES256-GCM-SHA384:DHE-DSS-AES128-GCM-SHA256:kEDH+AESGCM:ECDHE-RSA-AES128-SHA256:ECDH

ssl-default-bind-options no-sslv3 no-tls-tickets
tune.ssl.default-dh-param 1024

nbproc 64

defaults
timeout client 300s
timeout server 300s
timeout queue 60s
timeout connect 7s
timeout http-request 10s
maxconn 75000

bind-process 1

# HTTP
frontend haproxy_test_http
bind :65410
mode http
option httplog
option httpclose
log global
default_backend bk_ram

# ECC
frontend haproxy_test-ECC
bind-process 3-64
bind :65420 ssl crt /etc/haproxy/test.pem-ECC
mode http
option httplog
option httpclose
log global
default_backend bk_ram

backend bk_ram
mode http
fullconn 75000 # Just in case the lower default limit will be reached...

errorfile 503 /etc/haproxy/test.error



/etc/haproxy/test.error:
HTTP/1.0 200
Cache-Control: no-cache
Connection: close
Content-Type: text/plain

Test123456


The ECC key:
openssl ecparam -genkey -name prime256v1 -out /etc/haproxy/test.pem-ECC.key

openssl req -new -sha256 -key /etc/haproxy/test.pem-ECC.key -days 365
-nodes -x509 -sha256 -subj "/O=ECC Test/CN=test.example.com" -out
/etc/haproxy/test.pem-ECC.crt
cat /etc/haproxy/test.pem-ECC.key /etc/haproxy/test.pem-ECC.crt >
/etc/haproxy/test.pem-ECC


So then I tried a local "ab":
ab -n 5000 -c 250 https://127.0.0.1:65420/
Server Hostname:        127.0.0.1
Server Port:            65420
SSL/TLS Protocol:       TLSv1/SSLv3,ECDHE-ECDSA-AES128-GCM-SHA256,256,128

Document Path:          /
Document Length:        107 bytes

Concurrency Level:      250
Time taken for tests:   3.940 seconds
Complete requests:      5000
Failed requests:        0
Write errors:           0
Non-2xx responses:      5000
Total transferred:      106 bytes
HTML transferred:       535000 bytes
Requests per second:    1268.95 [#/sec] (mean)
Time per request:       197.013 [ms] (mean)
Time per request:       0.788 [ms] (mean, across all concurrent requests)
Transfer rate:          262.71 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:       54  138   34.7    162    193
Processing:     8   51   34.8     24    157
Waiting:        3   40   31.6     18    113
Total:        177  189    7.5    188    333

Percentage of the requests served within a certain time (ms)
  50%    188
  66%    189
  75%    190
  80%    190
  90%    191
  95%    192
  98%    196
  99%    205
 100%    333 (longest request)

The same test with just nbproc 1 was about ~1500 requests/s. So 1.5k *
nbproc would have been what I expected, at least somewhere near that
value.


Then I set up 61 EC2 instances, standard t2-micro setup. They're somewhat
slower with ~1k ECC requests per second, but that's ok for the test.
HTTP (one proc) via localhost was around 27-28k r/s, remote (EC2) ~4500.


So then I started "ab" parallel from each and it 

Re: [PATCH] allow higher averages than 16448ms

2016-11-25 Thread Reinhard Vicinus
Hi Willy,

if the cost is too high, then I have no problem keeping the known
behavior. The only thing I would suggest is to document it, because it
caused me some headache to figure out why the values were always too low
and I couldn't find any information that this behavior is a known problem.

Best regards
Reinhard

On 11/25/2016 08:04 AM, Willy Tarreau wrote:
> Hi Reinhard,
>
> On Thu, Nov 24, 2016 at 10:04:31PM +0100, Reinhard Vicinus wrote:
>> Hi,
>>
>> we use haproxy (1.6.9) to balance very long running POST requests
>> (around 50 seconds) to backend servers. It generally works like a charm,
>> but the average queue time and average total session time statistic
>> values are totally screwed up.
>>
>> The problem is that the average is calculated like this for every request:
>>
>> sum = sum * 511 / 512 + value
>>
>> for a fixed value and enough iterations:
>>
>> sum = value * 511
>>
>> the problem is that at every iteration sum will first be multiplied by
>> 511 and therefore the maximum value during the calculation is:
>>
>> value * 511 * 511
>>
>> An unsigned int can store a maximum value of 4294967295. Divided by
>> 511*511 results in 16448. That means any backend with average times
>> above 16448ms will be affected by integer overflow and have wrong values.
> Yes we do know this limitation.
>
>> The attached patch tries to solve this by storing and calculating sum as
>> unsigned long long instead of a unsigned int. I don't know if the
>> attached patch will work in every case, but during my limited testing it
>> worked.
> It will definitely work, but I didn't want to do it because of the
> very expensive cost of the 64x64 multiply and divide on 32 bit
> platforms which causes a measurable performance impact. However I'll
> do some tests because it is often OK, and doing 32x32/32 with a 64-bit
> intermediary result is OK as well. If I can do it this way, I'll do
> it. Otherwise I'd prefer that we just switch to long so that 64-bit
> platforms can benefit from the large timers and 32-bit ones are
> limited to lower values.
>
> Thanks,
> Willy
>


-- 
Reinhard Vicinus
Metaways Infosystems GmbH
Pickhuben 2, D-20457 Hamburg

E-Mail: r.vici...@metaways.de
Web:http://www.metaways.de
Tel:+49 (0)40 317031-524
Fax:+49 (0)40 317031-10

Metaways Infosystems GmbH - Sitz: D-22967 Tremsbüttel
Handelsregister: Amtsgericht Lübeck HRB 4508 AH
Geschäftsführung: Hermann Thaele, Lüder-H. Thaele