Re: Re: haproxy does not capture the complete request header host sometimes

2017-05-30 Thread Willy Tarreau
On Wed, May 31, 2017 at 10:15:56AM +0800, siclesang wrote:
> i am sorry
> -d output
> 954722 000513c:.accept(0008)=000a from [10.201.10.11:10037]
> 954723 000513c:.clihdr[000a:] POST /coupon/show/a511 HTTP/1.1
> 954724 000513c:.clihdr[000a:] Connection: keep-alive
> 954725 000513c:.clihdr[000a:] ent-Type: application/x-www-form-ur
> 954726 000513c:.clihdr[000a:] ncoded
> 954727 000513c:.clihdr[000a:] gth: 686

Wow, you'll definitely need to run strace over the process as something is
obviously wrong here! Please run "strace -vxttTs500 -o strace.log haproxy -d
-f " and post the strace.log file.

Thanks,
Willy



Re:Re: haproxy does not capture the complete request header host sometimes

2017-05-30 Thread siclesang
i am sorry
-d output
954722 000513c:.accept(0008)=000a from [10.201.10.11:10037]
954723 000513c:.clihdr[000a:] POST /coupon/show/a511 HTTP/1.1
954724 000513c:.clihdr[000a:] Connection: keep-alive
954725 000513c:.clihdr[000a:] ent-Type: application/x-www-form-ur
954726 000513c:.clihdr[000a:] ncoded
954727 000513c:.clihdr[000a:] gth: 686








At 2017-05-27 14:17:26, "Willy Tarreau"  wrote:
>On Tue, May 23, 2017 at 10:13:40AM +0200, Aleksandar Lazic wrote:
>> Hi siclesang.
>> 
>> siclesang have written on Mon, 22 May 2017 11:11:31 +0800 (CST):
>> 
>> > hi
> > > i have a problem: haproxy does not capture the complete request
> > > header Host sometimes
>> 
>> Which header do you miss?
>> How long is the header?
>
>Also as you can see in the image, the output was mangled with the first
>characters of headers being trimmed.
>
>Siclesang, I don't know if it's your capture or whatever you performed on
>it, but please just copy-paste the output from -d instead of converting
>it into an image and editing it. As it is now, it's useless.
>
>An important point to note is that debug mode (-d) shows every single
>line received prior to processing them, so if you don't see Host being
>listed there, it means it was not present in the request.
>
>Regards,
>Willy





 



BUG: Seg fault when reloading from saved state after config change

2017-05-30 Thread Shelley Shostak
BUG:

Extra spaces inserted into the haproxy.cfg file cause haproxy to segfault
when reloading with a saved state file.

WORKAROUND:

Remove the existing state file OR remove save state from config.

REPRODUCE:

  - Enable save state across reloads
  - Reload and save state file
  - Insert extra space before "weight".
  - Attempt to reload or validate the new config and haproxy will segv.
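
The state-saving setup involved can be sketched as a minimal config
(reconstructed from the diff later in this report; the exact file layout is an
assumption):

```
global
    # admin socket used to dump state, and the file it is saved to
    stats socket /var/run/haproxy.sock mode 666 level admin
    server-state-file /var/lib/haproxy/haproxy-server-state

defaults
    # reload each proxy's server state from the global state file
    load-server-state-from-file global
```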

DESCRIPTION

Our haproxy config is generated by a bunch of Puppet magic. Someone made a
change that added an additional space to nearly every server line of our
config, which has almost 1300 servers:

-  server foo3001.xx.box.net 10.16.26.2:8080 weight 16  check port 9086 inter 1
-  server foo3002.xx.box.net 10.16.26.3:8080 weight 16  check port 9086 inter 1
-  server foo3003.xx.box.net 10.16.26.4:8080 weight 32  check port 9086 inter 1
-  server foo3004.xx.box.net 10.16.26.5:8080 weight 16  check port 9086 inter 1
+  server foo3001.xx.box.net 10.16.26.2:8080  weight 16  check port 9086 inter 1
+  server foo3002.xx.box.net 10.16.26.3:8080  weight 16  check port 9086 inter 1
+  server foo3003.xx.box.net 10.16.26.4:8080  weight 32  check port 9086 inter 1
+  server foo3004.xx.box.net 10.16.26.5:8080  weight 16  check port 9086 inter 1

We are testing server state saving on some of our hosts. This change
causes reloads to fail with a segmentation fault.

If I remove the state saving stuff from the config, the segv goes away:

% diff ~/haproxy*cfg
15,16c15,16
< #  stats   socket /var/run/haproxy.sock mode 666 level admin
< #  server-state-file /var/lib/haproxy/haproxy-server-state
---
>   stats   socket /var/run/haproxy.sock mode 666 level admin
>   server-state-file /var/lib/haproxy/haproxy-server-state
20c20
< #  load-server-state-from-file global
---
>   load-server-state-from-file global
% /usr/sbin/haproxy -c -f ~/haproxy-test.cfg
Configuration file is valid
%

If I remove the state file, the config check is valid:

% rm /var/lib/haproxy/haproxy-server-state
% /usr/sbin/haproxy -c -f ~/haproxy.cfg
[WARNING] 149/180256 (140127) : stats socket will not work as expected in
multi-process mode (nbproc > 1), you should force process binding globally
using 'stats bind-process' or per socket using the 'process' attribute.
[WARNING] 149/180256 (140127) : Can't open server state file
'/var/lib/haproxy/haproxy-server-state': No such file or directory
[... the same "Can't open server state file" warning repeated many times;
message truncated in the archive ...]

Re: HAProxy 1.7.5 cookie JSESSIONID prefix not working

2017-05-30 Thread Lukas Tribus
Hello Norman,


Am 31.05.2017 um 00:13 schrieb Norman Branitsky:
>
> You are correct.
>
> I was setting the jvmRoute parameter to be the server id (AWS EC2
> InstanceID) in my regular apps served by HAProxy 1.5.18.
>
> The HAProxy 1.7.5 testing is using a different app that obviously
> doesn't have the jvmRoute defined.
>
> Should I continue with adding "cookie id" to the server statement in
> conjunction with
>
> cookie JSESSIONID prefix nocache
>
> Or, should I follow Lukas' suggestion and insert my own HAPROXYID
> cookie like this:
>
> cookie HAPROXYID insert nocache
>
>  
>

Personally I dislike messing with application cookies on the proxy.
It just feels wrong, unclean and frankly unnecessary.

Using a dedicated cookie seems like the right thing to do, at least in
my opinion.

I don't think there are strong technical arguments for either one of
those configurations; this is just my personal distaste for messing with
application data.



Regards,
Lukas




RE: HAProxy 1.7.5 cookie JSESSIONID prefix not working

2017-05-30 Thread Norman Branitsky
You are correct.

I was setting the jvmRoute parameter to be the server id (AWS EC2 InstanceID) 
in my regular apps served by HAProxy 1.5.18.

The HAProxy 1.7.5 testing is using a different app that obviously doesn't have 
the jvmRoute defined.

Should I continue with adding "cookie id" to the server statement in 
conjunction with

cookie JSESSIONID prefix nocache

Or, should I follow Lukas' suggestion and insert my own HAPROXYID cookie like 
this:

cookie HAPROXYID insert nocache



-Original Message-
From: Cyril Bonté [mailto:cyril.bo...@free.fr]
Sent: May-30-17 5:56 PM
To: Norman Branitsky 
Cc: Lukas Tribus ; haproxy@formilux.org
Subject: Re: HAProxy 1.7.5 cookie JSESSIONID prefix not working



Hi Norman,



Le 30/05/2017 à 23:39, Norman Branitsky a écrit :

> I modified the server line thus:
>
> server id-dv-dcavr-01 10.90.10.53:9001 check cookie id-dv-dcavr-01
>
> Now the server name appears as a prefix with a "~" separator.
> (It used to appear as a suffix with a "." separator.)

No, appsession never did that. It doesn't modify the cookie value. If a suffix
was added, it was done by the application server, I guess the jvmRoute
parameter in your case. I suspect you have also modified the configuration of
your app servers and didn't set this parameter during the switch from haproxy
1.5 to 1.7.

> JSESSIONID=id-dv-dcavr-01~E8C5E4A2; path=/le5;
> domain=cadca-vr.irondatacorp.com; Secure; HttpOnly
>
> I can now successfully login to the 2 different servers.
>
> You ask
> "Also, why not use a dedicated cookie for haproxy, instead of humping
> JSESSIONID?"
>
> Frankly, this never occurred to me.
> When I started with HAProxy 1.5, 4 years ago,
> I looked for example configurations for fronting JBoss and Tomcat.
> The documentation always referred to:
>
> appsession JSESSIONID len 52 timeout 3h
>
> Are you suggesting I do something like this instead?
>
> cookie HAPROXYID insert nocache
>
> -Original Message-
> From: Lukas Tribus [mailto:lu...@gmx.net]
> Sent: May-30-17 5:00 PM
> To: Norman Branitsky ; haproxy@formilux.org
> Subject: Re: HAProxy 1.7.5 cookie JSESSIONID prefix not working
>
> Hello Norman,
>
> Am 30.05.2017 um 18:06 schrieb Norman Branitsky:
>
>> The server's identifier is not added to the cookie.
>
> Did you specify the cookie value on the server line [1], as per [2]:
>
>> The value of the cookie will be the value indicated after the
>> "cookie" keyword in a "server" statement. If no cookie is declared
>> for a given server, the cookie is not set.
>
> Also, why not use a dedicated cookie for haproxy, instead of humping
> JSESSIONID?
>
> Do you have clients so broken they support only one single cookie?
>
> Regards,
> Lukas
>
> [1] https://cbonte.github.io/haproxy-dconv/1.7/configuration.html#4.2-cookie
> [2] https://cbonte.github.io/haproxy-dconv/1.7/configuration.html#5.2-cookie

--
Cyril Bonté


Re: HAProxy 1.7.5 cookie JSESSIONID prefix not working

2017-05-30 Thread Cyril Bonté

Hi Norman,

Le 30/05/2017 à 23:39, Norman Branitsky a écrit :

I modified the server line thus:

server id-dv-dcavr-01 10.90.10.53:9001 check cookie id-dv-dcavr-01



Now the server name appears as a prefix with a "~" separator.

(It used to appear as a suffix with a "." separator.)


No, appsession never did that. It doesn't modify the cookie value. If a 
suffix was added, it was done by the application server, I guess the 
jvmRoute parameter in your case. I suspect you have also modified the 
configuration of your app servers and didn't set this parameter during 
the switch from haproxy 1.5 to 1.7.




JSESSIONID=id-dv-dcavr-01~E8C5E4A2; path=/le5;
domain=cadca-vr.irondatacorp.com; Secure; HttpOnly



I can now successfully login to the 2 different servers.



You ask

“Also, why not use a dedicated cookie for haproxy, instead of humping
JSESSIONID?”

Frankly, this never occurred to me.

When I started with HAProxy 1.5, 4 years ago,

I looked for example configurations for fronting JBoss and Tomcat.

The documentation always referred to:

appsession JSESSIONID len 52 timeout 3h



Are you suggesting I do something like this instead?

cookie HAPROXYID insert nocache



-Original Message-
From: Lukas Tribus [mailto:lu...@gmx.net]
Sent: May-30-17 5:00 PM
To: Norman Branitsky ; haproxy@formilux.org
Subject: Re: HAProxy 1.7.5 cookie JSESSIONID prefix not working



Hello Norman,





Am 30.05.2017 um 18:06 schrieb Norman Branitsky:

> The server's identifier is not added to the cookie.

Did you specify the cookie value on the server line [1], as per [2]:

> The value of the cookie will be the value indicated after the
> "cookie" keyword in a "server" statement. If no cookie is declared
> for a given server, the cookie is not set.





Also, why not use a dedicated cookie for haproxy, instead of humping
JSESSIONID?

Do you have clients so broken they support only one single cookie?







Regards,

Lukas





[1] https://cbonte.github.io/haproxy-dconv/1.7/configuration.html#4.2-cookie

[2] https://cbonte.github.io/haproxy-dconv/1.7/configuration.html#5.2-cookie




--
Cyril Bonté
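
A minimal backend illustrating the fix discussed above: in "prefix" mode,
haproxy only prefixes JSESSIONID for servers that declare a cookie value. The
second server's name and address below are hypothetical; only the first appears
in the thread.

```
backend app
    # prefix mode rewrites JSESSIONID to "<srv-cookie>~<original>" on the
    # way out, and strips the prefix again on the way back in
    cookie JSESSIONID prefix nocache
    server id-dv-dcavr-01 10.90.10.53:9001 check cookie id-dv-dcavr-01
    server id-dv-dcavr-02 10.90.10.54:9001 check cookie id-dv-dcavr-02
```

A server line without its own "cookie <value>" gets no prefix at all, which is
exactly the symptom Norman reported.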



RE: HAProxy 1.7.5 cookie JSESSIONID prefix not working

2017-05-30 Thread Norman Branitsky
I modified the server line thus:

server id-dv-dcavr-01 10.90.10.53:9001 check cookie id-dv-dcavr-01



Now the server name appears as a prefix with a "~" separator.

(It used to appear as a suffix with a "." separator.)

JSESSIONID=id-dv-dcavr-01~E8C5E4A2; path=/le5; 
domain=cadca-vr.irondatacorp.com; Secure; HttpOnly



I can now successfully login to the 2 different servers.



You ask

"Also, why not use a dedicated cookie for haproxy, instead of humping 
JSESSIONID?"

Frankly, this never occurred to me.

When I started with HAProxy 1.5, 4 years ago,

I looked for example configurations for fronting JBoss and Tomcat.

The documentation always referred to:

appsession JSESSIONID len 52 timeout 3h



Are you suggesting I do something like this instead?

cookie HAPROXYID insert nocache



-Original Message-
From: Lukas Tribus [mailto:lu...@gmx.net]
Sent: May-30-17 5:00 PM
To: Norman Branitsky ; haproxy@formilux.org
Subject: Re: HAProxy 1.7.5 cookie JSESSIONID prefix not working



Hello Norman,





Am 30.05.2017 um 18:06 schrieb Norman Branitsky:

>

> The server's identifier is not added to the cookie.

>



Did you specify the cookie value on the server line [1], as per [2]:



> The value of the cookie will be the value indicated after the
> "cookie" keyword in a "server" statement. If no cookie is declared
> for a given server, the cookie is not set.





Also, why not use a dedicated cookie for haproxy, instead of humping JSESSIONID?

Do you have clients so broken they support only one single cookie?







Regards,

Lukas





[1] https://cbonte.github.io/haproxy-dconv/1.7/configuration.html#4.2-cookie

[2] https://cbonte.github.io/haproxy-dconv/1.7/configuration.html#5.2-cookie


Re: HAProxy 1.7.5 cookie JSESSIONID prefix not working

2017-05-30 Thread Aleksandar Lazic
Hi Norman Branitsky.

Norman Branitsky  have written on Tue,
30 May 2017 16:06:18 +:

> With HAProxy 1.5.18, on a RHEL 7.1 server,
> 
> appsession JSESSIONID len 52 timeout 3h
> 
> results in a cookie that looks like this:
> 
> JSESSIONID=51CC2775.i-07035eca525e56235; path=/le5;
> domain=vr.ras.dshs.state.tx.us; Secure; HttpOnly
> 
> With HAProxy 1.7.5, on a CentOS 7 server,
> since appsession is no longer available,
> I tried all the following options in haproxy.cfg instead:
> 
> cookie JSESSIONID prefix
> 
> cookie JSESSIONID prefix nocache
> 
> cookie JSESSIONID rewrite
> 
> The cookie always looks like this:
> 
> JSESSIONID=58FF1FB6; path=/le5;
> domain=cadca-vr.irondatacorp.com; Secure; HttpOnly

Well I have tried to create a replacement sequence for the appsession.
https://www.mail-archive.com/haproxy@formilux.org/msg18181.html

Please can you try it and tell us what's missing, thanks.

The whole thread starts with this mail.
https://www.mail-archive.com/haproxy@formilux.org/msg20421.html

Regards
Aleks

> The server's identifier is not added to the cookie.
> Needless to say, my load balancing doesn't work.
> 
> Norman
> 
> Norman Branitsky
> Cloud Architect
> MicroPact
> (o) 416.916.1752
> (c) 416.843.0670
> (t) 1-888-232-0224 x61752
> www.micropact.com
> Think it > Track it > Done
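
The stick-table based replacement Aleks links to follows roughly this pattern
(a sketch only, with hypothetical server names; the exact sequence is in the
linked thread):

```
backend app
    # learn the JSESSIONID each server sets, then route subsequent
    # requests carrying that cookie back to the same server
    stick-table type string len 52 size 100k expire 3h
    stick store-response res.cook(JSESSIONID)
    stick match req.cook(JSESSIONID)
    server s1 10.0.0.1:8080 check
    server s2 10.0.0.2:8080 check
```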



Re: HAProxy 1.7.5 cookie JSESSIONID prefix not working

2017-05-30 Thread Lukas Tribus
Hello Norman,


Am 30.05.2017 um 18:06 schrieb Norman Branitsky:
>
> The server’s identifier is not added to the cookie.
>

Did you specify the cookie value on the server line [1], as per [2]:

> The value of the cookie will be the value indicated after the
> "cookie" keyword in a "server" statement. If no cookie is
> declared for a given server, the cookie is not set.


Also, why not use a dedicated cookie for haproxy, instead of humping
JSESSIONID?
Do you have clients so broken they support only one single cookie?



Regards,
Lukas


[1] https://cbonte.github.io/haproxy-dconv/1.7/configuration.html#4.2-cookie
[2] https://cbonte.github.io/haproxy-dconv/1.7/configuration.html#5.2-cookie
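
A dedicated-cookie setup along the lines Lukas suggests might look like this
(server names and addresses are hypothetical):

```
backend app
    # haproxy inserts and owns its own persistence cookie;
    # the application's JSESSIONID is left untouched
    cookie HAPROXYID insert nocache
    server app1 192.0.2.10:8080 check cookie app1
    server app2 192.0.2.11:8080 check cookie app2
```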



HAProxy 1.7.5 cookie JSESSIONID prefix not working

2017-05-30 Thread Norman Branitsky
With HAProxy 1.5.18, on a RHEL 7.1 server,

appsession JSESSIONID len 52 timeout 3h

results in a cookie that looks like this:

JSESSIONID=51CC2775.i-07035eca525e56235; path=/le5; 
domain=vr.ras.dshs.state.tx.us; Secure; HttpOnly

With HAProxy 1.7.5, on a CentOS 7 server,
since appsession is no longer available,
I tried all the following options in haproxy.cfg instead:

cookie JSESSIONID prefix

cookie JSESSIONID prefix nocache

cookie JSESSIONID rewrite

The cookie always looks like this:

JSESSIONID=58FF1FB6; path=/le5; domain=cadca-vr.irondatacorp.com; 
Secure; HttpOnly

The server's identifier is not added to the cookie.
Needless to say, my load balancing doesn't work.

Norman

Norman Branitsky
Cloud Architect
MicroPact
(o) 416.916.1752
(c) 416.843.0670
(t) 1-888-232-0224 x61752
www.micropact.com
Think it > Track it > Done



Re: HAProxy won't shut down

2017-05-30 Thread Patrick Hemmer


On 2017/5/29 16:04, Frederic Lecaille wrote:
> On 05/29/2017 06:12 PM, Patrick Hemmer wrote:
>>
>> On 2017/5/29 08:22, Frederic Lecaille wrote:
>>>
>>> Hi Patrick,
>>>
>>> First thank you for this nice and helpful report.
>>>
>>> Would it be possible to have an output of this command the next time
>>> you reproduce such an issue please?
>>>
>>> echo "show sess" | socat stdio 
>>
>> Unfortunately this would not be possible. When the issue occurs, the
>> haproxy process has stopped accepting connections on all sockets. If I
>> were to run this command, it would be sent to the new process, not the
>> one that won't shut down.
>
>
> If you send a SIGHUP to haproxy-systemd-wrapper it asks the old
> process to graceful stop.
Yes, that is what my issue report is about. When sent a SIGHUP, the new
process comes up, but the old process won't shut down.

>
> Please have a look to this documentation:
>
> https://cbonte.github.io/haproxy-dconv/1.7/management.html#4
>
> You are right: if everything goes well, no more connections are
> accept()'ed by the old process (the sockets have been unbound). But in
> your reported case the peers sockets are not closed because they are
> still in CLOSE_WAIT state, so they are still being processed, and stats
> information is still available from the stats socket.
The information might still be tracked within the process, but there is
no way to query the information because the process is no longer
accepting new connections. The new process has taken over control of the
admin socket.

>
> If I have missed something please do not hesitate to yell at me ;) .
>
> I have been told that "show sess *all*" gives more information.
>
>>>
>>> I have only one question (see below).
>>>
>>> On 05/24/2017 10:40 AM, Willy Tarreau wrote:
 Hi Patrick,

 On Tue, May 23, 2017 at 01:49:42PM -0400, Patrick Hemmer wrote:
 (...)
> haproxy 28856 root1u IPv4  420797940  0t0
> TCP 10.0.33.145:35754->10.0.33.147:1029 (CLOSE_WAIT)
> haproxy 28856 root2u IPv4  420266351  0t0
> TCP 10.0.33.145:52898->10.0.33.147:1029 (CLOSE_WAIT)
> haproxy 28856 root3r  REG0,30
> 4026531956 net
> haproxy 28856 root4u IPv4  422150834  0t0
> TCP 10.0.33.145:38874->10.0.33.147:1029 (CLOSE_WAIT)

 These ones are very interesting.
>>>
>>> These traces also seem interesting to me.
>>>
>>> # strace -p 28856
>>> Process 28856 attached
>>> epoll_wait(0, {}, 200, 319) = 0
>>> epoll_wait(0, {}, 200, 0)   = 0
>>> epoll_wait(0, {}, 200, 362) = 0
>>> epoll_wait(0, {}, 200, 0)   = 0
>>> epoll_wait(0, {}, 200, 114) = 0
>>> epoll_wait(0, {}, 200, 0)   = 0
>>> epoll_wait(0, {}, 200, 203) = 0
>>> epoll_wait(0, {}, 200, 0)   = 0
>>> epoll_wait(0, {}, 200, 331) = 0
>>> epoll_wait(0, {}, 200, 0)
>>>
>>>
>>> Were such "epoll_wait(0, {}, 200, 0)" calls displayed indefinitely?
>> Yes
>>
>>>
>>>
>>> In fact I am wondering if it is normal to have so many epoll_wait(0,
>>> {}, 200, 0) calls for a haproxy process which has shut down.
>>>
>>> I suspect they are in relation with peer tasks (obviously which has
>>> expired).
>>>
>>> If this is the case, and with configurations with only peer tasks,
>>> haproxy would definitively hang consuming a lot of CPU resources.
>> HAProxy was not consuming high CPU. Note that in every other call to
>> `epoll_wait`, the 4th value was >0. If every single timeout value were
>> 0, then yes, it would spin consuming CPU.
>>
>
> agreed... but perhaps your configuration does not use only peer tasks,
> contrary to my configuration... this is your traces which lead me to
> check how the peer task expiration is handled with configurations with
> only peers as backends.
>
> In my case with only two peers I see such following traces, after a
> peer has sent a synchronization request:
>
> epoll_wait(0, {}, 200, 1000)
> epoll_wait(0, {}, 200, 1000)
> epoll_wait(0, {}, 200, 1000)
> epoll_wait(0, {}, 200, 1000)
> epoll_wait(0, {}, 200, X)# with X < 1000
>
> followed by a big loop of
>
> epoll_wait(0, {}, 200, 0)# so consuming high CPU resources
> during a fraction of second
>
> then:
>
> shutdown(SHUT_WR)# FIN TCP segment at about 5s after
> the first epoll_wait(0, {}, 200, 1000) above.
>
> then again:
>
> epoll_wait(0, {}, 200, 0)
>
> until the remote peer, which is in CLOSE_WAIT state shuts down its
> socket.
This doesn't make sense. CLOSE_WAIT means that the remote side has
already closed the socket, and the application on the local side needs
to issue a close() on it.

>
> I have not told you that a synchronization request is the first thing
> a peer launches: the peers of the new process try to synchronize with
> old process peers.
>
> With the fix I provided the process epoll_wait(0, {}, 200, *1000*) after
> having shutdown(SHUT_WR) its 

Re: [PATCH 6/9] MEDIUM: mworker: workers exit when the master leaves

2017-05-30 Thread William Lallemand
On Tue, May 30, 2017 at 12:39:32PM +0200, Willy Tarreau wrote:
> [...]
> 
> The master, not intercepting this signal, would die, closing the pipe.
> The worker would be woken up on the detection of this closure, and while
> trying to perform the read() would get the signal in turn, causing the
> read() to return EINTR and to stop polling on this fd instead of exiting.
>

You are absolutely right, I'll fix that.

> 
> Regarding the environment variable names, it's preferable to prepend
> "HAPROXY" in front of them as we've been doing for all other ones to
> avoid namespace conflicts with anything used in other environments (I've
> seen places where you had to run "env|grep" to find your variable). I've
> seen another one called "WAIT_ONLY" in one of the first patches and
> which should equally be renamed.
> 

Will do.

> Otherwise the series looks quite good, it would be nice to get some
> feedback especially from systemd hostages^Wusers (my comments above
> will not affect their experience unless they're really unlucky).
> 
> Thanks!
> Willy
> 

I'll send you another batch with the fixes, and maybe some cleanup. In the
meantime people can still try them.

-- 
William Lallemand



Re: New feature request

2017-05-30 Thread Pavlos Parissis
On 05/30/2017 11:56 AM, Willy Tarreau wrote:
> On Tue, May 30, 2017 at 11:04:35AM +0200, Pavlos Parissis wrote:
>> On 05/29/2017 02:58 PM, John Dison wrote:
>>> Hello,
>>>
>>> in ROADMAP I see:
>>> - spare servers : servers which are used in LB only when a minimum farm
>>> weight threshold is not satisfied anymore. Useful for inter-site LB with
>>> local pref by default.
>>>
>>>
>>> Is it possible to push this item priority to get it done for 1.8 please?  
>>> It looks like it should not require major code refactoring, just another LB 
>>> scheme.
>>>
>>> What I want to achieve is an ability to route requests to a "local" pool
>>> until it reaches some pre-defined maximum load, and route extra requests
>>> to a "remote" pool of servers.
>>>
>>> Thanks in advance.
>>>
>>
>>
>> +1 as I also find it very useful. But I am afraid it is too late for 1.8.
> 
> I'd love to have it as well for the same reasons. I think by now it
> shouldn't be too complicated to implement anymore, but all the usual
> suspects are busy on more important devs. I'm willing to take a look
> at it before 1.8 is released if we're in time with everything planned,
> but not more. However if someone wants to give it a try and doesn't
> need too much code review (which is very time consuming), I think this
> could get merged if the impact on existing code remains low (otherwise
> postponed to 1.9-dev).
> 
> In the mean time it's quite possible to achieve something more or less
> similar using two backends, one with the local servers, one with all
> servers, and to only use the second backend when the first one is full.
> It's not exactly the same, but can sometimes provide comparable results.
> 
> Willy
> 

True. I use the following to achieve it; it also avoids flipping users between
data centers:

# Data center availability logic.
# Based on the destination IP we select the pool.
# NOTE: Destination IP is the public IP of a site and for each data center
# we use different IP address. So, in case we see IP address of dc1
# arriving in dc2 we know that dc is broken
http-request set-header X-Pool %[str(www.foo.bar)]%[dst,map_ip(/etc/haproxy/dst_ip_dc.map,env(DATACENTER))]
use_backend %[hdr(X-Pool)] if { hdr(X-Pool),nbsrv ge 1 }

# Check for the availability of app in a data center.
# NOTE: Two acl's with the same name produces a logical or.
acl www.foo.bardc1_down nbsrv(www.foo.bardc1) lt 1
acl www.foo.bardc1_down queue(www.foo.bardc1) ge 1
acl www.foo.bardc2_down nbsrv(www.foo.bardc2) lt 1
acl www.foo.bardc2_down queue(www.foo.bardc2) ge 1
acl www.foo.bardc3_down nbsrv(www.foo.bardc3) lt 1
acl www.foo.bardc3_down queue(www.foo.bardc3) ge 1

# We end up here if the selected pool of a data center is down.
# We don't want to use the all_dc pool as it would flip users between data
# centers, thus we are going to balance traffic across the two remaining
# data centers using a hash against the client IP. Unfortunately, we will
# check again for the availability of the data center, for which we know
# already is down. I should try to figure out a way to somehow dynamically
# know the remaining two data centers, so if dc1 is down then I should
# only check dc2 and dc3.

http-request set-var(req.selected_dc_backup) src,djb2,mod(2)

#Balance if www.foo.bardc1 is down
use_backend www.foo.bardc2 if www.foo.bardc1_down !www.foo.bardc2_down { var(req.selected_dc_backup) eq 0 }
use_backend www.foo.bardc3 if www.foo.bardc1_down !www.foo.bardc3_down { var(req.selected_dc_backup) eq 1 }

#Balance if www.foo.bardc2 is down
use_backend www.foo.bardc1 if www.foo.bardc2_down !www.foo.bardc1_down { var(req.selected_dc_backup) eq 0 }
use_backend www.foo.bardc3 if www.foo.bardc2_down !www.foo.bardc3_down { var(req.selected_dc_backup) eq 1 }

#Balance if www.foo.bardc3 is down
use_backend www.foo.bardc1 if www.foo.bardc3_down !www.foo.bardc1_down { var(req.selected_dc_backup) eq 0 }
use_backend www.foo.bardc2 if www.foo.bardc3_down !www.foo.bardc2_down { var(req.selected_dc_backup) eq 1 }

# If two data centers are down then for simplicity reasons just use the all_dc pool
default_backend www.foo.barall_dc

Cheers,
Pavlos
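
For comparison, Willy's simpler two-backend fallback mentioned earlier in this
thread might be sketched as follows (all names, addresses and the maxconn value
are hypothetical):

```
frontend fe_app
    bind :80
    # prefer the local pool while it has at least one server up and
    # nothing waiting in its queue
    use_backend be_local if { nbsrv(be_local) ge 1 } !{ queue(be_local) ge 1 }
    default_backend be_all

backend be_local
    server local1 10.0.1.10:8080 check maxconn 100

backend be_all
    server local1 10.0.1.10:8080 check maxconn 100
    server remote1 10.0.2.10:8080 check
```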





Re: [PATCH 6/9] MEDIUM: mworker: workers exit when the master leaves

2017-05-30 Thread Willy Tarreau
So all the series looks quite good, and I must confess I'm impatient to
merge it so that we turn the page of the wrapper, and also because being
able to use nbproc in foreground during development can be nice.

But I have two comments first :

On Mon, May 29, 2017 at 05:42:09PM +0200, William Lallemand wrote:
> +void mworker_pipe_handler(int fd)
> +{
> +	char c;
> +
> +	if (read(fd, &c, 1) > -1) {
> +		fd_delete(fd);
> +		deinit();
> +		exit(EXIT_FAILURE);
> +	} else {
> +		/* should never happen */
> +		fd_delete(fd);
> +	}
> +
> +	return;
> +}

The test above is dangerous, it assumes that there's zero valid reason
for read() to return -1, but in fact there are, even though they are
corner cases.

First, a good rule of thumb to consider is that a poller is not rocket
science and that it always relies on the callee to check for the reality
of event readiness. We have one well-known example in Linux in association
with UDP datagram reception. It's possible for poll() to say "there's
something to read" and when read()/recv() try to read, the packet checksum
is computed on the fly during the user_copy(), found to be bad, the packet
is destroyed and read() returns EAGAIN. Of course it's not what you have
above, but it's to say that whenever you have a poller, EAGAIN must be
tested to be completely safe.

The second (more likely) case is a signal causing read() to return -1 EINTR.
And in fact in the master worker, it could theorically happen like this :

killall -SOMESIG haproxy

The master, not intercepting this signal, would die, closing the pipe.
The worker would be woken up on the detection of this closure, and while
trying to perform the read() would get the signal in turn, causing the
read() to return EINTR and to stop polling on this fd instead of exiting.

A much safer way to deal with this situation is to loop on EINTR and
return without doing anything on EAGAIN, approximately like this :

while (read(fd, &c, 1) == -1) {
if (errno == EINTR)
continue;
if (errno == EAGAIN) {
fd_cant_recv(fd);
return;
}
/* otherwise probably die ? */
break;
}

deinit();
exit();

> @@ -2478,6 +2508,28 @@ int main(int argc, char **argv)
>   exit(0);
>   }
>  
> + if (global.mode & MODE_MWORKER) {
> + if ((getenv(REEXEC_FLAG) == NULL)) {
> + char *msg = NULL;
> + /* master pipe to ensure the master is still alive */
> + ret = pipe(mworker_pipe);
> + if (ret < 0) {
> + Warning("[%s.main()] Cannot create master pipe.\n", argv[0]);
> + } else {
> + memprintf(&msg, "%d", mworker_pipe[0]);
> + setenv("MWORKER_PIPE_RD", msg, 1);
> + memprintf(&msg, "%d", mworker_pipe[1]);
> + setenv("MWORKER_PIPE_WR", msg, 1);

Regarding the environment variable names, it's preferable to prepend
"HAPROXY" in front of them as we've been doing for all other ones to
avoid namespace conflicts with anything used in other environments (I've
seen places where you had to run "env|grep" to find your variable). I've
seen another one called "WAIT_ONLY" in one of the first patches and
which should equally be renamed.

Otherwise the series looks quite good, it would be nice to get some
feedback especially from systemd hostages^Wusers (my comments above
will not affect their experience unless they're really unlucky).

Thanks!
Willy



Re: New feature request

2017-05-30 Thread Willy Tarreau
On Tue, May 30, 2017 at 11:04:35AM +0200, Pavlos Parissis wrote:
> On 05/29/2017 02:58 PM, John Dison wrote:
> > Hello,
> > 
> > in ROADMAP I see:
> > - spare servers : servers which are used in LB only when a minimum farm
> > weight threshold is not satisfied anymore. Useful for inter-site LB with
> > local pref by default.
> > 
> > 
> > Is it possible to push this item priority to get it done for 1.8 please?  
> > It looks like it should not require major code refactoring, just another LB 
> > scheme.
> > 
> > What I want to achieve is the ability to route requests to a "local" pool
> > until it reaches some pre-defined maximum load, and to route extra
> > requests to a "remote" pool of servers.
> > 
> > Thanks in advance.
> > 
> 
> 
> +1 as I also find it very useful. But I am afraid it is too late for 1.8.

I'd love to have it as well for the same reasons. I think by now it
shouldn't be too complicated to implement anymore, but all the usual
suspects are busy on more important devs. I'm willing to take a look
at it before 1.8 is released if we're in time with everything planned,
but not more. However if someone wants to give it a try and doesn't
need too much code review (which is very time consuming), I think this
could get merged if the impact on existing code remains low (otherwise
postponed to 1.9-dev).

In the mean time it's quite possible to achieve something more or less
similar using two backends, one with the local servers, one with all
servers, and to only use the second backend when the first one is full.
It's not exactly the same, but can sometimes provide comparable results.
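A minimal sketch of that two-backend workaround (the server names and the queue-based fullness test are assumptions for illustration, not taken from this thread):

```
frontend fe_main
    bind :80
    # Spill over to the full pool once the local backend starts
    # queuing, i.e. all its servers have reached their maxconn.
    use_backend be_all if { queue(be_local) gt 0 }
    default_backend be_local

backend be_local
    server local1  10.0.0.1:80 maxconn 100
    server local2  10.0.0.2:80 maxconn 100

backend be_all
    server local1  10.0.0.1:80 maxconn 100
    server local2  10.0.0.2:80 maxconn 100
    server remote1 10.1.0.1:80 maxconn 100
```

The remote server only receives traffic while the local servers are saturated, which approximates the "spare servers" behaviour, though without a weight threshold.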

Willy



Re: New feature request

2017-05-30 Thread Pavlos Parissis
On 05/29/2017 02:58 PM, John Dison wrote:
> Hello,
> 
> in ROADMAP I see:
> - spare servers : servers which are used in LB only when a minimum farm
> weight threshold is not satisfied anymore. Useful for inter-site LB with
> local pref by default.
> 
> 
> Is it possible to push this item priority to get it done for 1.8 please?  It 
> looks like it should not require major code refactoring, just another LB 
> scheme.
> 
> What I want to achieve is the ability to route requests to a "local" pool
> until it reaches some pre-defined maximum load, and to route extra requests
> to a "remote" pool of servers.
> 
> Thanks in advance.
> 


+1 as I also find it very useful. But I am afraid it is too late for 1.8.

Cheers,
Pavlos


