Re: [PATCH] MAJOR: filters: Add filters support

2016-09-19 Thread Willy Tarreau
Hi Bertrand,

On Tue, Sep 20, 2016 at 12:13:32AM +0100, Bertrand Jacquin wrote:
> > And finally, if you can share with me your HAProxy and
> > Nginx configurations, this could help.
> 
> I'm attaching a stripped-down version of the haproxy/nginx/php-fpm setup
> with which I can reproduce this issue.

I think another thing would be extremely useful: a full-packet network
capture between haproxy and nginx, so that we have the full headers, the
chunk sizes (if any) and the response timing, which generally matters a
lot. Ideally, a dump of the response in text (or binary) format in a
distinct file using curl -i would be nice as well.
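
Something like this would do, as a sketch (interface, nginx address and
port are placeholders to adapt to your setup):

  # on the haproxy host: full-size packets between haproxy and nginx
  tcpdump -i any -s 0 -w haproxy-nginx.pcap host <nginx-ip> and port <nginx-port>

  # the response as seen by the client, headers included
  curl -isk -H 'Accept-Encoding: gzip' \
    'https://pants-off.xyz/v1.7-dev1-50-gd7c9196ae56e.js' -o response.bin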

Thanks!
Willy



Re: haproxy - namespace implementation and usage

2016-09-19 Thread Willy Tarreau
Hi Martin,

On Sat, Sep 17, 2016 at 11:16:15PM +0200, Martin Tóth wrote:
> Hi fellow haproxy users,
> 
> i just wanted to ask if the new namespace implementation in haproxy
> (added in v1.6.9) can work like this. I have a Zabbix proxy daemon
> running inside a network namespace in Linux, let's say the namespace is
> named "customer".
> I want to be able to run the haproxy daemon in the default Linux
> namespace and be able to connect with haproxy to the Zabbix proxy daemon
> running inside its own namespace. Is this possible?
> 
> My config :
> 
> namespace_list
>   namespace customer
> 
> frontend customer
>   mode tcp
>   bind 10.0.0.2:10001 accept-proxy # this is the IP and port on the host
> (10.0.0.2 - the linux server IP) to which I should connect when I want to
> reach the customer Zabbix proxy daemon
>   default_backend serverlist
> 
> backend serverlist
>   mode tcp
>   server s1 10.8.1.4:10050 namespace customer # this is the zabbix proxy
> daemon
> 
> I did not find any related configuration example, nor more than one page
> of documentation.

It should work like this. I used it just for a test recently, to see
whether it was possible to isolate a daemon into a network-less namespace
(no NIC except "lo") and have haproxy connect into that namespace. And
yes, it works. I'm not a namespace user at all, so I had to use "ip netns"
with the man page open, and after some trial and error I managed to make
it work.

At the very least, you need to enter your customer namespace and issue
"netstat -ltnp" to ensure that your zabbix server is properly listening
for incoming connections, otherwise it will obviously never work.
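
For example, something like this (assuming the namespace is really named
"customer"):

  # list the configured namespaces
  ip netns list

  # check the listeners inside the customer namespace
  ip netns exec customer netstat -ltnp

If the zabbix proxy shows up there on 10.8.1.4:10050, the "namespace
customer" setting on your server line should let haproxy reach it.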

Hoping this helps,
Willy



Re: envoy LB is now an open source project

2016-09-19 Thread Willy Tarreau
Hi Pavlos,

On Thu, Sep 15, 2016 at 01:09:16AM +0200, Pavlos Parissis wrote:
> Hi,
> 
> It is a very interesting project, https://lyft.github.io/envoy/
> 
> Here is a comparison with HAProxy
> https://lyft.github.io/envoy/docs/intro/comparison.html

Thanks for the link, it sounds interesting. It looks very young however,
and given that the oldest bugs in various load balancers can be around 10
years old, with the most important ones up to 5 years old, it may take a
bit of time to stabilize. But regardless of this, it looks very
interesting.

Cheers,
Willy



Re: [PATCH] MINOR: enable IP_BIND_ADDRESS_NO_PORT on backend connections

2016-09-19 Thread Willy Tarreau
Hi Pavlos,

On Wed, Sep 14, 2016 at 11:01:36PM +0200, Pavlos Parissis wrote:
> in our setup, where we have haproxy in PoPs forwarding traffic to haproxy
> servers in the main data centers, I am planning to address the ephemeral
> port exhaustion symptom by having the frontends in the data centers
> listen on multiple IPs, so I can have the same server multiple times in
> the backend at the PoP.
> 
> backend data_center_haproxies
>   server server1_on_ip1 1.1.1.1
>   server server1_on_ip2 1.1.1.2
> 
> with our system inventory/puppet infra, assigning multiple IPs on servers
> at the PoP isn't that simple; I know it sounds weird.

Note that you can also make your servers listen on multiple ports, or use
multiple addresses on haproxy for this. I tend to prefer having multiple
ports because it multiplies the allocatable port ranges without adding IP
addresses anywhere.

Another point to note is that if you're running out of source ports due
to idle keep-alive connections between haproxy and the servers, you can
enable http-reuse to significantly improve the situation. It will also
remove one round-trip for the connect() and will reduce the memory usage
on the server side, so there are benefits everywhere.
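
As a sketch of the multi-port variant (addresses and ports are
placeholders): the data-center side binds a port range, and the PoP
backend lists the same address once per port, which multiplies the usable
source-port space per destination.

  # data-center side
  frontend from_pops
      mode http
      bind 192.0.2.10:10001-10004

  # PoP side
  backend data_center_haproxies
      mode http
      http-reuse safe
      server dc1_p1 192.0.2.10:10001
      server dc1_p2 192.0.2.10:10002
      server dc1_p3 192.0.2.10:10003
      server dc1_p4 192.0.2.10:10004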

Regards,
Willy



Re: Experimental patch: Consistent Hashing with Bounded Loads

2016-09-19 Thread Willy Tarreau
Hi Andrew,

On Mon, Sep 19, 2016 at 11:32:49AM -0400, Andrew Rodland wrote:
(...)
> I haven't found the cause of this, or been able to pin it down much further 
> than that it happens fairly reliably when doing a "haproxy -sf" restart under 
> load.

OK I'll have to test it.

> Other than that, I think I have things working properly and would
> appreciate a bit of review. My changes are on the "bounded-chash" branch
> of github.com/arodland/haproxy, or would you prefer a patch series sent
> to the list?

It's better and more convenient to send the patch series to the list. It's
easy to respond inline, and it opens the review to everyone, as everyone
may have an opinion on the code or even good suggestions.
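
Something like this usually does the job (assuming your branch sits on top
of master):

  git format-patch -o outgoing/ origin/master
  git send-email --to haproxy@formilux.org outgoing/*.patch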

Thanks,
Willy



Re: [PATCH] MAJOR: filters: Add filters support

2016-09-19 Thread Bertrand Jacquin
On Mon, Sep 19, 2016 at 10:08:32AM +0200, Christopher Faulet wrote:
> On 18/09/2016 04:17, Bertrand Jacquin wrote:
> > Today I noticed data corruption when haproxy is used for compression
> > offloading. I bisected twice, and it led to this specific commit, but
> > I'm not 100% confident this commit is the actual root cause.
> > 
> > The HTTP body coming from the nginx backend is consistent, but the HTTP
> > headers differ depending on the setup I'm enabling. Data corruption
> > only happens with chunked transfer encoding. The HTTP body going from
> > haproxy to curl can then be randomly corrupted; I attached a diff
> > (v1.7-dev1-50-gd7c9196ae56e.Transfer-Encoding-chunked.diff) revealing an
> > unrelated TLS-structure-like blob in the middle of the javascript. For
> > example, you will find my x509 client certificate in there.
> > 
> > I'm also attaching the HTTP headers sent from haproxy to nginx, in case
> > that helps.
> > 
> > Note that I tested with both zlib 1.2.8 and libslz 1.0.0; the result
> > remains the same in both cases.
> 
> I've done some tests. Unfortunately, I'm unable to reproduce the bug, so
> I need more information. First, I need to know how you hit it. Does this
> happen under load, or randomly when you do a single request?

I can reproduce this issue with 100% accuracy on arm or amd64:

  $ for (( i = 0 ; i < 25 ; i++ )) ; do
      curl -s -H 'Accept-Encoding: gzip' \
        'https://pants-off.xyz/v1.7-dev1-50-gd7c9196ae56e.js' \
        | zcat | md5sum
    done
  01a32fcef0a6894caf112c1a9d5c2a5d  -
  b2a109843f4c43fcde3cb051e4fbf8d2  -
  dedc59fb28ae5d91713234e3e5c08dec  -
  3c8f6d8d53c0ab36bb464b8283570355  -
  e1957e16479bc3106adc68fee2019be8  -
  4cc54367717e5adcdf940f619949ea72  -
  bf637a26e62582c35da6888a4928d4ec  -
  3eeecd478f8e6ea4d690c70f9444954a  -
  79ab805209777ab02bdc6fb829048c74  -
  2aaf9577c1fefdd107a5173aee270c83  -
  .. and so on, shrinking the output here

Note that md5sum of the file should be a4d8bb8ba2a76d7caf090ab632708d7d.

> Then, do you still have the bug when you are not using SSL? Let me also
> know how often the bug appears.

I did not run that test, since it was easy for me to spot output
containing details of my x509 client certificate.

Running the same test as before with 100 iterations, counting identical
outputs:

  $ for (( i = 0 ; i < 100 ; i++ )) ; do
      curl -s -H 'Accept-Encoding: gzip' \
        'http://pants-off.xyz/v1.7-dev1-50-gd7c9196ae56e.js' \
        | zcat | md5sum
    done | uniq -c
  1 6c38ef6556efa9e0fa6825803679b2f2  -
 99 a4d8bb8ba2a76d7caf090ab632708d7d  -

Note that 6c38ef6556efa9e0fa6825803679b2f2 appears on the first
iteration. Second test, after a few seconds:

  1 ffaf62147b43f82d587df59c39b48e54  -
 29 a4d8bb8ba2a76d7caf090ab632708d7d  -
  1 ae6e4404422b93c9fe64bffdea87f36d  -
 41 a4d8bb8ba2a76d7caf090ab632708d7d  -
  1 3e8c507e16733af8b728e229c00f21c3  -
  4 a4d8bb8ba2a76d7caf090ab632708d7d  -
  1 f6195005b050edcb5ca682b1cde9777f  -
 22 a4d8bb8ba2a76d7caf090ab632708d7d  -

Third test:

  1 6c38ef6556efa9e0fa6825803679b2f2  -
 80 a4d8bb8ba2a76d7caf090ab632708d7d  -
  1 3e8c507e16733af8b728e229c00f21c3  -
 18 a4d8bb8ba2a76d7caf090ab632708d7d  -

So it looks a bit more stable. Now, if I query HTTP and HTTPS at the
same time, here is what I get:

HTTPS:
  2 17bfe6f7f6296cc5e1d623381afc9e55  -
  1 cbc1779ce5636c31bcf3ea175088da11  -
  1 52ba63995295f5399ddd91b9f9bdf81d  -
  1 5b4115080f35ac5f564b7164a3ada701  -
  1 adfb87fe9efc33e0218a891b2b8b4d42  -
  1 a6f8707556b2f760d20b51dd59b11fb4  -
  .. and so on

HTTP:
  1 3a794f99df4f7a282f822bbaca508852  -
  1 24242f218d9041383c523984d19feddc  -
  2 a4d8bb8ba2a76d7caf090ab632708d7d  -
  1 9987d0621c7fbe4b399e462f421b2157  -
  1 a4d8bb8ba2a76d7caf090ab632708d7d  -
  1 e261d9cdf988c4fd3d75877812fa5028  -
  .. and so on

Here it does not look stable anymore. Let's do a test with HTTP only,
from 2 different hosts:

HTTP client 1:
  1 64cd299604d1f7fac29ef7b2b623b1d0  -
  6 a4d8bb8ba2a76d7caf090ab632708d7d  -
  1 bd0372d30c564925ebd1866cf2476474  -
 11 a4d8bb8ba2a76d7caf090ab632708d7d  -
  1 64cd299604d1f7fac29ef7b2b623b1d0  -
  9 a4d8bb8ba2a76d7caf090ab632708d7d  -

HTTP client 2:
  1 8749926476d446ead3bd8d81523330eb  -
 16 a4d8bb8ba2a76d7caf090ab632708d7d  -
  1 c533c33a3ff469086bdbb6a936e2  -
 14 a4d8bb8ba2a76d7caf090ab632708d7d  -
  1 bd89ab7eab271b2ac13dff42e8e96ba4  -

We are again in a less stable situation.

> And finally, if you can share with me your HAProxy and
> Nginx configurations, this could help.

I'm attaching a stripped-down version of the haproxy/nginx/php-fpm setup
with which I can reproduce this issue.

Cheers,

-- 
Bertrand


v1.7-dev1-50-gd7c9196ae56e.tgz
Description: GNU Unix tar archive




Re: selecting backend based on server's load

2016-09-19 Thread Dmitry Sivachenko
 
> On 19 Sep 2016, at 23:42, Dmitry Sivachenko  wrote:
> 
> Hello,
> 
> imagine the following configuration:
> 
> frontend F1
> use_backend BACKUP_B1 if B1_IS_FULL
> default_backend B1
> 
> backend B1
> server s_i
> ...
> server s_j
> 
> backend BACKUP_B1
> server b_i
> ...
> server b_j
> 
> -
> frontend F2
> use_backend BACKUP_B2 if B2_IS_FULL
> default_backend B2
> 
> backend B2
> server s_k
> ...
> server s_m
> 
> backend BACKUP_B2
> server b_k
> ...
> server b_m
> --
> <...>
> 
> So basically I have a number of backends B1 ... Bn which use different
> subsets of the same server pool s_1 ... s_N.
> Each backend has a "BACKUP_" backend pair, which should be used only when
> each server in the primary backend has more than a defined number of
> active sessions (each server may have active sessions via different
> backends: B1, B2, ..., Bn).
> 
> What is the easiest way to define the Bn_IS_FULL acl?
> 
> So far I came up with the following solution: in each frontend Fn section, write:
> 
> tcp-request content set-var(sess.s_1_conn) srv_conn(B1/s_1)
> tcp-request content set-var(sess.s_1_conn) srv_conn(B2/s_1),add(sess.s_1_conn)
> # <...> repeat the last line for each backend which has s_1; this gives
> # the total number of active connections to s_1
> 
> # Repeat the above block for each server s_2, ..., s_N
> 
> # Then define the acl, assuming the max number of active sessions is 7:
> acl F1_IS_FULL var(sess.s_1_conn) ge 7 var(sess.s_2_conn) ge 7 <...>
> 
> but it looks ugly: we need to replicate the same logic in each frontend
> and use a lot of code to count sessions. There should probably be a
> simpler way to track the total number of active sessions for a server
> which participates in several backends.
> 


BTW, it would be convenient to have the ability to define one
"super"-backend containing all servers:
backend SUPER_B
server s1
...
server sN

and let other backends reference these servers, similar to what we can do
with health checks ("track SUPER_B/s1"):

backend B1
server s_1 SUPER_B/s_1



As another benefit, this would allow the balance algorithm to take into
account the connections each server receives via different backends.
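
As a sketch of what I mean, extending today's health-check tracking
syntax (addresses are placeholders, and the inheritance semantics on the
B1 server line are hypothetical):

backend SUPER_B
server s_1 192.0.2.1:8080 check
server s_2 192.0.2.2:8080 check

backend B1
# hypothetical: inherit the server and its connection accounting
# from SUPER_B/s_1, not just the check status as "track" does today
server s_1 192.0.2.1:8080 track SUPER_B/s_1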




selecting backend based on server's load

2016-09-19 Thread Dmitry Sivachenko
Hello,

imagine the following configuration:

frontend F1
use_backend BACKUP_B1 if B1_IS_FULL
default_backend B1

backend B1
server s_i
...
server s_j

backend BACKUP_B1
server b_i
...
server b_j

-
frontend F2
use_backend BACKUP_B2 if B2_IS_FULL
default_backend B2

backend B2
server s_k
...
server s_m

backend BACKUP_B2
server b_k
...
server b_m
--
<...>

So basically I have a number of backends B1 ... Bn which use different
subsets of the same server pool s_1 ... s_N.
Each backend has a "BACKUP_" backend pair, which should be used only when
each server in the primary backend has more than a defined number of
active sessions (each server may have active sessions via different
backends: B1, B2, ..., Bn).

What is the easiest way to define the Bn_IS_FULL acl?

So far I came up with the following solution: in each frontend Fn section, write:

tcp-request content set-var(sess.s_1_conn) srv_conn(B1/s_1)
tcp-request content set-var(sess.s_1_conn) srv_conn(B2/s_1),add(sess.s_1_conn)
# <...> repeat the last line for each backend which has s_1; this gives
# the total number of active connections to s_1

# Repeat the above block for each server s_2, ..., s_N

# Then define the acl, assuming the max number of active sessions is 7:
acl F1_IS_FULL var(sess.s_1_conn) ge 7 var(sess.s_2_conn) ge 7 <...>

but it looks ugly: we need to replicate the same logic in each frontend
and use a lot of code to count sessions. There should probably be a
simpler way to track the total number of active sessions for a server
which participates in several backends.

Thanks in advance.


Re: Experimental patch: Consistent Hashing with Bounded Loads

2016-09-19 Thread Andrew Rodland
On Thursday, September 15, 2016 4:06:15 AM EDT Willy Tarreau wrote:
> Hi Andrew,
> 
> On Wed, Sep 14, 2016 at 02:44:26PM -0400, Andrew Rodland wrote:
> > On Sunday, September 11, 2016 7:57:41 PM EDT Willy Tarreau wrote:
> > > > > Also I've been thinking about this issue of the infinite loop that
> > > > > you
> > > > > solved already. As long as c > 1 I don't think it can happen at all,
> > > > > because for any server having a load strictly greater than the
> > > > > average
> > > > > load, it means there exists at least one server with a load smaller
> > > > > than
> > > > > or equal to the average. Otherwise it means there's no more server
> > > > > in
> > > > > the ring because all servers are down, and then the initial lookup
> > > > > will
> > > > > simply return NULL. Maybe there's an issue with the current lookup
> > > > > method, we'll have to study this.
> > > > 
> > > > Agreed again, it should be impossible as long as c > 1, but I ran into
> > > > it.
> > > > I assumed it was some problem or misunderstanding in my code.
> > > 
> > > Don't worry, I trust you. I was trying to figure out what exact case
> > > could cause this and couldn't find a single possible case :-/
> > 
> > I've encountered this again in my re-written branch. I think it has to do
> > with the case where all servers are draining for shutdown. What I see is
> > that whenever I do a restart (haproxy -sf oldpid) under load, the new
> > process starts up, but the old process never exits, and perf shows it
> > using 100% CPU in chash_server_is_eligible, so it's got to be looping and
> > deciding nothing is eligible. Can you think of anything special that
> > needs to be done to handle graceful shutdown?
> 
> No, that's very strange. We may have a bug somewhere else which never
> struck till now. When you talk about a shutdown, you in fact mean the
> shutdown of the haproxy process being replaced by another one, that's
> right ? If so, health checks are disabled during that period, so servers
> should not be added to nor removed from the ring.
> 
> However if for any reason there's a graceful shutdown on the servers,
> their weight can be set to zero while they're still active. In this
> case they don't appear in the tree and that may be where the issue
> starts. It would be nice to get a 100% reproducible case to try to
> debug it and dump all weights and capacities, I think it would help.
> 
> Willy

I haven't found the cause of this, or been able to pin it down much further 
than that it happens fairly reliably when doing a "haproxy -sf" restart under 
load. Other than that, I think I have things working properly and would 
appreciate a bit of review. My changes are on the "bounded-chash" branch of 
github.com/arodland/haproxy — or would you prefer a patch series sent to the 
list?
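
For reference, the eligibility rule I'm implementing follows the paper's
bound; here is a simplified sketch (illustrative names, not the actual
haproxy code):

  #include <math.h>

  /* A server may take one more connection as long as that keeps it within
   * c times the average load across the backend (c > 1), i.e. under the
   * per-server capacity ceil(c * (total + 1) / nb_servers) from the paper. */
  static int is_eligible_sketch(unsigned srv_conns, unsigned total_conns,
                                unsigned nb_servers, double c)
  {
      double capacity = ceil(c * (double)(total_conns + 1) / (double)nb_servers);
      return (double)(srv_conns + 1) <= capacity;
  }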

Thanks,

Andrew




Re: How to control traffic like linux TC, instead of reject it?

2016-09-19 Thread Lukas Tribus

Hello JWD,


all your emails are arriving; there is no need to double- or triple-post
your questions. Please stop doing that.



When you say "control traffic like linux TC", do you mean traffic 
shaping? Haproxy doesn't support traffic shaping as far as I know.




Lukas




Re: [PATCH] MAJOR: filters: Add filters support

2016-09-19 Thread Christopher Faulet
On 18/09/2016 04:17, Bertrand Jacquin wrote:
> Today I noticed data corruption when haproxy is used for compression
> offloading. I bisected twice, and it led to this specific commit, but
> I'm not 100% confident this commit is the actual root cause.
> 
> The HTTP body coming from the nginx backend is consistent, but the HTTP
> headers differ depending on the setup I'm enabling. Data corruption
> only happens with chunked transfer encoding. The HTTP body going from
> haproxy to curl can then be randomly corrupted; I attached a diff
> (v1.7-dev1-50-gd7c9196ae56e.Transfer-Encoding-chunked.diff) revealing an
> unrelated TLS-structure-like blob in the middle of the javascript. For
> example, you will find my x509 client certificate in there.
> 
> I'm also attaching the HTTP headers sent from haproxy to nginx, in case
> that helps.
> 
> Note that I tested with both zlib 1.2.8 and libslz 1.0.0; the result
> remains the same in both cases.

Hi Bertrand,

I've done some tests. Unfortunately, I'm unable to reproduce the bug, so
I need more information. First, I need to know how you hit it. Does this
happen under load, or randomly when you do a single request? Then, do
you still have the bug when you are not using SSL? Let me also know how
often the bug appears. And finally, if you can share with me your HAProxy
and Nginx configurations, this could help.

Thanks,
-- 
Christopher