Re: Help! HAProxy randomly failing health checks!

2016-03-24 Thread Zachary Punches
Hey guys, just following up. Still running into the issue.


From: Zachary Punches
Date: Friday, March 18, 2016 at 6:07 PM
To: Igor Cicimov
Cc: Baptiste, "haproxy@formilux.org"
Subject: Re: Help! HAProxy randomly failing health checks!

Ok! Here is a bunch of info that might better assist with the issue:


Each of our clients has an HAProxy install that forwards requests for ports 80 and 
443 to 1025 and 1026 respectively. These requests are forwarded over TCP, using 
the proxy protocol, to our HAP instances.
Our HAP instances then terminate SSL and forward the requests to our backend 
on port 80.
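
For reference, here is a minimal sketch of the 443 path in HAProxy config terms 
(names, addresses and cert paths below are made up for illustration; only the 
port layout and the send-proxy/accept-proxy pairing reflect what we run):

# client-side HAProxy: raw TCP, prepends the proxy protocol header
frontend fe_client_https
    mode tcp
    bind :443
    default_backend be_to_hap

backend be_to_hap
    mode tcp
    server hap1 203.0.113.10:1026 send-proxy

# our HAP instance: consumes the proxy protocol header, terminates SSL
frontend fe_ssl_term
    mode http
    bind :1026 ssl crt /etc/ssl/site.pem accept-proxy
    default_backend be_web

backend be_web
    mode http
    server web1 10.0.0.20:80 check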

See attached diagram which should better explain the entire flow.

During an outage caused by the SSL handshakes failing, I was running tcpdump so I 
could look through the capture and find what was causing the failure. I was able 
to discover that we are receiving connection resets on some SSL connections. We 
then tested all the SSL certs from our client side to our side to verify that 
there is no mismatched cert. This test completed with no issues.
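
(For anyone wanting to repeat that kind of check, inspecting what an endpoint 
actually presents can be done along these lines - the hostname is a placeholder:

openssl s_client -connect haproxy.example.com:443 -servername haproxy.example.com \
    </dev/null 2>/dev/null | openssl x509 -noout -subject -dates -fingerprint

Comparing subject and fingerprint on both sides confirms nothing is mismatched.)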

Here is a connection reset packet I found in that tcpdump capture:

29525  158.096217  10.1.4.119  54.239.21.251  TCP  54  38740 → 443 [RST] Seq=3533 Win=0 Len=0
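
(A capture plus display filter along these lines produces this kind of output - 
interface and file names are illustrative, not necessarily what was used here:

tcpdump -ni eth0 -w ssl-fail.pcap 'tcp port 443'
tshark -r ssl-fail.pcap -Y 'tcp.flags.reset == 1'

The frame dissection below is for one of the matching packets.)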

Frame 29523: 54 bytes on wire (432 bits), 54 bytes captured (432 bits)
Encapsulation type: Ethernet (1)
Arrival Time: Mar 17, 2016 14:58:07.34584 PDT
[Time shift for this packet: 0.0 seconds]
Epoch Time: 1458251887.34584 seconds
[Time delta from previous captured frame: 0.2 seconds]
[Time delta from previous displayed frame: 0.021655000 seconds]
[Time since reference or first frame: 158.096184000 seconds]
Frame Number: 29523
Frame Length: 54 bytes (432 bits)
Capture Length: 54 bytes (432 bits)
[Frame is marked: False]
[Frame is ignored: False]
[Protocols in frame: eth:ethertype:ip:tcp]
[Coloring Rule Name: TCP RST]
[Coloring Rule String: tcp.flags.reset eq 1]
Ethernet II, Src: 12:8d:18:05:0f:91 (12:8d:18:05:0f:91), Dst: 1e:8f:a6:6b:52:58 
(1e:8f:a6:6b:52:58)
Destination: 1e:8f:a6:6b:52:58 (1e:8f:a6:6b:52:58)
Address: 1e:8f:a6:6b:52:58 (1e:8f:a6:6b:52:58)
.... ..1. .... .... .... .... = LG bit: Locally administered address (this is NOT the factory default)
.... ...0 .... .... .... .... = IG bit: Individual address (unicast)
Source: 12:8d:18:05:0f:91 (12:8d:18:05:0f:91)
Address: 12:8d:18:05:0f:91 (12:8d:18:05:0f:91)
.... ..1. .... .... .... .... = LG bit: Locally administered address (this is NOT the factory default)
.... ...0 .... .... .... .... = IG bit: Individual address (unicast)
Type: IPv4 (0x0800)
Internet Protocol Version 4, Src: $SRC IP Dst: $DST IP
0100 .... = Version: 4
.... 0101 = Header Length: 20 bytes
Differentiated Services Field: 0x00 (DSCP: CS0, ECN: Not-ECT)
0000 00.. = Differentiated Services Codepoint: Default (0)
.... ..00 = Explicit Congestion Notification: Not ECN-Capable Transport (0)
Total Length: 40
Identification: 0x5f56 (24406)
Flags: 0x02 (Don't Fragment)
0... .... = Reserved bit: Not set
.1.. .... = Don't fragment: Set
..0. .... = More fragments: Not set
Fragment offset: 0
Time to live: 64
Protocol: TCP (6)
Header checksum: 0x8018 [validation disabled]
[Good: False]
[Bad: False]
Source: $SourceIP
Destination: $DestinationIP
[Source GeoIP: Unknown]
[Destination GeoIP: Unknown]
Transmission Control Protocol, Src Port: 38740 (38740), Dst Port: 443 (443), 
Seq: 3533, Len: 0
Source Port: 38740
Destination Port: 443
[Stream index: 2799]
[TCP Segment Len: 0]
Sequence number: 3533    (relative sequence number)
Acknowledgment number: 0
Header Length: 20 bytes
Flags: 0x004 (RST)
000. .... .... = Reserved: Not set
...0 .... .... = Nonce: Not set
.... 0... .... = Congestion Window Reduced (CWR): Not set
.... .0.. .... = ECN-Echo: Not set
.... ..0. .... = Urgent: Not set
.... ...0 .... = Acknowledgment: Not set
.... .... 0... = Push: Not set
.... .... .1.. = Reset: Set
[Expert Info (Warn/Sequence): Connection reset (RST)]
[Connection reset (RST)]
[Severity level: Warn]
[Group: Sequence]
.... .... ..0. = Syn: Not set
.... .... ...0 = Fin: Not set
[TCP Flags: *R**]
Window size value: 0
[Calculated window size: 0]
[Window size scaling factor: 128]
Checksum: 0x5c2f [validation disabled]
[Good Checksum: False]
[Bad Checksum: False]
Urgent pointer: 0





From: Igor Cicimov 

Re: TLS Tickets and CPU usage

2016-03-24 Thread Olivier Doucet
Hi again,


2016-03-24 21:15 GMT+01:00 Lukas Tribus:

> Hi Nenad,
>
>
> >> Well, it's not supposed to look like this, there is clearly something
> >> wrong. The master key fluctuates between the requests with TLS tickets
> >> and the reuse column shows failure.
> >
> > Looks like a haproxy bug, I think I can reproduce it.
> >
> > Can you try with EXACTLY 3 keys in /tmp/tls_ticket_keys?
>

Tried it, and now the behaviour is as expected!
https://gist.github.com/anonymous/779fbc4f1cf8b23e9b1f

And I can confirm that CPU usage is no longer doubled \o/





> There seems to be a bug in the handling of the tls-ticket-keys file.
>
> When there are 5 or more ticket keys in the file, clients using TLS tickets
> can no longer resume the TLS session (and fall back to full negotiation):
>
> https://gist.github.com/anonymous/6ec7c863f497cfd849a4
>
>
> A workaround would be to remove the oldest key from the file, so
> that the number of keys in the file remains below 5.
>
That's what I did: keep the last 2 keys and add a new one.
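
In shell terms the rotation amounts to something like this (a sketch; same key
file path as in my config):

tail -n 2 /tmp/tls_ticket_keys > /tmp/tls_ticket_keys.new
openssl rand -base64 48 >> /tmp/tls_ticket_keys.new
mv /tmp/tls_ticket_keys.new /tmp/tls_ticket_keys

followed by a reload so haproxy picks up the new file.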

 Olivier


Re: TLS Tickets and CPU usage

2016-03-24 Thread Nenad Merdanovic
Hey Lukas,

On 03/24/2016 09:15 PM, Lukas Tribus wrote:
> Hi Nenad,
> 
> 
>>> Well, it's not supposed to look like this, there is clearly something
>>> wrong. The master key fluctuates between the requests with TLS tickets
>>> and the reuse column shows failure.
>>
>> Looks like a haproxy bug, I think I can reproduce it.
>>
>> Can you try with EXACTLY 3 keys in /tmp/tls_ticket_keys?
> 
> 
> There seems to be a bug in the handling of the tls-ticket-keys file.
> 
> When there are 5 or more ticket keys in the file, clients using TLS tickets
> can no longer resume the TLS session (and fall back to full negotiation):
> 
> https://gist.github.com/anonymous/6ec7c863f497cfd849a4
> 

Thanks a lot for the report. I think I have a fix, just need to validate it.

Regards,
Nenad

> 
> A workaround would be to remove the oldest key from the file, so
> that the number of keys in the file remains below 5.
> 
> 
> 
> cheers,
> 
> Lukas
> 
> 
> 



RE: TLS Tickets and CPU usage

2016-03-24 Thread Lukas Tribus
Hi Nenad,


>> Well, it's not supposed to look like this, there is clearly something
>> wrong. The master key fluctuates between the requests with TLS tickets
>> and the reuse column shows failure.
>
> Looks like a haproxy bug, I think I can reproduce it.
>
> Can you try with EXACTLY 3 keys in /tmp/tls_ticket_keys?


There seems to be a bug in the handling of the tls-ticket-keys file.

When there are 5 or more ticket keys in the file, clients using TLS tickets
can no longer resume the TLS session (and fall back to full negotiation):

https://gist.github.com/anonymous/6ec7c863f497cfd849a4


A workaround would be to remove the oldest key from the file, so
that the number of keys in the file remains below 5.



cheers,

Lukas

  


RE: TLS Tickets and CPU usage

2016-03-24 Thread Lukas Tribus
Hi Olivier,


> 2016-03-24 17:12 GMT+01:00 Lukas Tribus:
> > If that's not it, and no old haproxy instances are present after the
> > reload, could you compile Vincent's rfc5077-client from [1]:
> > Output can be found here:
> > https://gist.github.com/anonymous/6ec7c863f497cfd849a4
> > (HTTP 500 error is normal, as you are using HEAD / HTTP/1.0 and our web
> > servers require a Host header)
>
> Well, it's not supposed to look like this, there is clearly something
> wrong. The master key fluctuates between the requests with TLS tickets
> and the reuse column shows failure.


Looks like a haproxy bug, I think I can reproduce it.


Can you try with EXACTLY 3 keys in /tmp/tls_ticket_keys?

Then check with the rfc5077-client and if possible check CPU load in
production.


Thanks,

Lukas

  


Haproxy and FastCGI sockets

2016-03-24 Thread Stojan Rančić

Hello,

we're using Haproxy 1.5.5-1 to load balance traffic between frontends 
running Lighttpd with mod_fastcgi and backends running a custom Perl 
app based on FastCGI. Traffic between the frontends and backends is not HTTP - 
the frontends open a socket to the VIP and communicate with the backends through it.


The problem occurs when we restart haproxy due to config changes: the 
frontend doesn't know the socket has closed and starts throwing errors. 
After restarting Lighty on the frontends, everything works fine again.


The question is - can Haproxy handle such restarts in a better way, or 
is this the only way?


Below is the relevant haproxy config:

listen CI_2001 1.2.3.4:5000
mode tcp
balance leastconn
server xx 192.168.0.100:5000 check inter 2000 rise 2 fall 5
server yy 192.168.0.200:5000 check inter 2000 rise 2 fall 5
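
(For reference, the usual soft-reload invocation - a sketch, with config and
pid file paths assumed - is:

haproxy -f /etc/haproxy/haproxy.cfg -sf $(cat /var/run/haproxy.pid)

With -sf the old process stops listening but keeps serving its established
connections until they close; whether that is enough for long-lived FastCGI
sockets is exactly the question above.)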

thanks, Stojan



Re: TLS Tickets and CPU usage

2016-03-24 Thread Olivier Doucet
2016-03-24 17:12 GMT+01:00 Lukas Tribus:

> > If that's not it, and no old haproxy instances are present after the
> > reload, could you compile Vincent's rfc5077-client from [1]:
> > Output can be found here:
> > https://gist.github.com/anonymous/6ec7c863f497cfd849a4
> > (HTTP 500 error is normal, as you are using HEAD / HTTP/1.0 and our web
> > servers require a Host header)
>
> Well, it's not supposed to look like this, there is clearly something
> wrong. The master key fluctuates between the requests with TLS tickets
> and the reuse column shows failure.
>
> Are there any middleboxes between the server and the client? Can
> you try directly on the server so the traffic doesn't leave the box (specifically,
> so it doesn't cross any firewalls or other SSL/TLS-intercepting MITM)?
>

I'm sure there is no firewall or MITM.
HAProxy is launched with nbproc 7, but the frontend I'm asking about is bound
to a single process.

Olivier


RE: TLS Tickets and CPU usage

2016-03-24 Thread Lukas Tribus
> If that's not it, and no old haproxy instances are present after the
> reload, could you compile Vincent's rfc5077-client from [1]:
> Output can be found here:
> https://gist.github.com/anonymous/6ec7c863f497cfd849a4
> (HTTP 500 error is normal, as you are using HEAD / HTTP/1.0 and our web
> servers require a Host header)

Well, it's not supposed to look like this, there is clearly something
wrong. The master key fluctuates between the requests with TLS tickets
and the reuse column shows failure.

Are there any middleboxes between the server and the client? Can
you try directly on the server so the traffic doesn't leave the box (specifically,
so it doesn't cross any firewalls or other SSL/TLS-intercepting MITM)?
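
A quick on-box sanity check that doesn't need the rfc5077 tool, for what it's
worth (address is a placeholder; -reconnect makes s_client reconnect five
times reusing the cached session):

echo | openssl s_client -connect 127.0.0.1:443 -reconnect 2>/dev/null | grep -c Reused

If that counts zero "Reused" handshakes, it points at the same failure the
gist shows.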


Lukas

  


RE: src_get_gpc0 seems not to work after commit f71f6f6

2016-03-24 Thread Lukas Tribus
Hi,


>> As below, I use a stick-table for a temporary ACL.
>> After commit f71f6f6, src_get_gpc0 seems not to work.
>>
>> So, I reverted commit f71f6f6, and it works!!
>
> That's not a valid commit in the official haproxy repo, can you please
> check the hash again?

It's a valid hash in the haproxy-1.6 repo; this would be be508f158
in 1.7-dev (BUG/MAJOR: samples: check smp->strm before using it).



cheers,
Lukas

  


Re: src_get_gpc0 seems not to work after commit f71f6f6

2016-03-24 Thread Christian Ruppert

Hi Seri,

On 2016-03-23 08:40, Sehoon Kim wrote:

Hi,

As below, I use a stick-table for a temporary ACL.
After commit f71f6f6, src_get_gpc0 seems not to work.

So, I reverted commit f71f6f6, and it works!!


That's not a valid commit in the official haproxy repo, can you please 
check the hash again?




frontend SSL-Offload
bind :443 ssl crt ssl.pem ecdhe prime256v1

tcp-request connection accept if { src_get_gpc0(whitelist) eq 1 }
tcp-request connection reject

backend whitelist
stick-table type ip size 1m expire 1h nopurge store gpc0

Thanks

Seri


--
Regards,
Christian Ruppert



Re: TLS Tickets and CPU usage

2016-03-24 Thread Olivier Doucet
2016-03-24 12:57 GMT+01:00 Lukas Tribus:

> > > Ok, when you say CPU usage doubled do you mean the CPU usage after
> > > a reload/restart, or do you mean CPU usage in general (even after not
> > > reloading haproxy)?
> >
> > CPU is at 100% just after reload for more than 30s (was a few seconds
> > before) and then CPU usage stays doubled all the time.
>
> Ok, so it looks like resumption doesn't work at all with TLS tickets.
>
> Are you sure the haproxy reload works fine - no old haproxy instances
> run in the background serving obsolete TLS keys?
>
Yes, I'm sure.


> There have been some bugs with reloading haproxy, fixed in 1.6.4.
>
I recompiled HAProxy with the latest version and OpenSSL 1.0.2g.

I activated TLS tickets and CPU usage doubled again.
I then tried a reload, and CPU stayed stable.

So at least the reload problem with CPU at 100% seems resolved. But I do
not understand why using TLS tickets uses so much more CPU (I hoped it
would be "slightly" higher, not doubled).

BTW, the servers are 2x Intel Xeon L5630 @ 2.13GHz and the certificates issued are
all SHA256RSA.
I will use ECDSA certificates in the future; I was just waiting for
transparent support of ECDSA/RSA certificates in HAProxy (done in 1.7, just
waiting for the stable release on this).




> If that's not it, and no old haproxy instances are present after the
> reload, could you compile Vincent's rfc5077-client from [1]:
>
Output can be found here:
https://gist.github.com/anonymous/6ec7c863f497cfd849a4
(HTTP 500 error is normal, as you are using HEAD / HTTP/1.0 and our web
servers require a Host header)

Olivier


RE: Weird stick-tables / peers behaviour

2016-03-24 Thread Lukas Tribus
> Hi all,
>
> I've just upgraded some hosts to 1.6.4 (from 1.5) and immediately got a
> bunch of SMS alerts, because we're using stick-tables to track connections
> and monitor http_req_rate. The stick-table data is synced to the
> other peers using the "peers" section.

Possibly related to the ML thread "src_get_gpc0 seems not to work after
commit f71f6f6" (but I didn't look into the details).


cheers,

lukas

  


Weird stick-tables / peers behaviour

2016-03-24 Thread Christian Ruppert

Hi all,

I've just upgraded some hosts to 1.6.4 (from 1.5) and immediately got a 
bunch of SMS alerts, because we're using stick-tables to track connections 
and monitor http_req_rate. The stick-table data is synced to the 
other peers using the "peers" section.

So I set up a test case using two HAProxy instances with e.g.:
global
    user haproxy
    group haproxy
    maxconn 1
    stats socket /var/run/haproxy.stat user haproxy gid haproxy mode 600 level admin


# from the anti-DoS config
defaults
timeout client 60s
timeout server 60s
timeout queue 60s
timeout connect 3s
timeout http-request 10s


frontend test
    bind 0.0.0.0:8080
    mode http

    tcp-request inspect-delay 7s
    tcp-request content track-sc1 src table backend_sourceip

    tcp-request content reject if { sc1_http_req_rate(backend_sourceip) gt 15 }

    http-request deny if { sc1_http_req_rate(backend_sourceip) gt 15 }



peers foo_peers
    peer host1 172.16.0.128:8024
    peer host2 172.16.0.16:8024

backend backend_sourceip
    # 1mio IPs, 8hrs TTL per entry for several stats per IP in 10s
    stick-table type ip size 1m expire 8h store gpc0,conn_cnt,conn_cur,conn_rate(10s),http_req_cnt,http_req_rate(10s),http_err_cnt,http_err_rate(10s) peers foo_peers



I then have 4 terminals, two for running:

watch "echo 'show table backend_sourceip' | socat stdio /var/run/haproxy.stat"

and two for running some "curl -Lvs http://127.0.0.1:8080" by hand.
If you issue some requests on the first host and some on the second, you'll 
notice different values on one side. The counter may e.g. double while the 
other side has the correct/actual value. On our prod systems this shows up as 
several thousand requests, but according to the logs that can't be correct.
Does anybody else see similar weirdness, or can you confirm the false values?
The *_cnt values seem to be OK, but the *_rate ones appear to be wrong in 
some cases.


--
Regards,
Christian Ruppert



Re: Exchange 2013 / NTLM Connections

2016-03-24 Thread Baptiste
Hi Graham,

The http-keep-alive mode is recommended, with "option
prefer-last-server" (which should be implicitly set by HAProxy in your
case).
Hopefully you're not using the http-reuse option.

401s are normal and are part of the NTLM negotiation. You should see a
few of them at the beginning of the connection, then regular traffic
passing through.
Baptiste



RE: TLS Tickets and CPU usage

2016-03-24 Thread Lukas Tribus
>> Ok, when you say CPU usage doubled do you mean the CPU usage after
>> a reload/restart, or do you mean CPU usage in general (even after not
>> reloading haproxy)?
>
> CPU is at 100% just after reload for more than 30s (was a few seconds
> before) and then CPU usage stays doubled all the time.

Ok, so it looks like resumption doesn't work at all with TLS tickets.

Are you sure the haproxy reload works fine - no old haproxy instances
run in the background serving obsolete TLS keys?

There have been some bugs with reloading haproxy, fixed in 1.6.4.


If that's not it, and no old haproxy instances are present after the
reload, could you compile Vincent's rfc5077-client from [1]:

git clone https://github.com/vincentbernat/rfc5077.git
cd rfc5077
make rfc5077-client


./rfc5077-client -4 


Make sure you have the dependencies installed from the
github page (mainly libssl-dev and pkg-config).



cheers,

Lukas


[1] https://github.com/vincentbernat/rfc5077



  


Exchange 2013 / NTLM Connections

2016-03-24 Thread Graham Morley
Hi,

I'm hoping someone can help with an Exchange 2013 / NTLM Authentication 
question.

(My background is more on the networking side, so I'm finding my way a bit with 
HTTP related challenges...)

We're currently running HA Proxy v1.6.4 and trying to use it in-front of an 
Exchange 2013 CAS set-up.

The challenge we're having is that whilst everything appears to be working, 
we're seeing connectivity 'issues' when trying to migrate users to Office 365.

I've been trying to read-up on this and understand what might be happening.

- Our Frontend is running in HTTP mode, terminating the HTTPS connections to 
the outside world.
- Our Backend is also running in HTTP mode, connecting back to the CAS Servers 
over HTTPS.

The migration to Office 365 process connects to:

https://mail.ourcompany.com/EWS/mrsproxy.svc

From the HA Proxy logs, we can see that this works successfully for a number 
of requests like this:

Mar 23 09:12:19 ae-lb01.ourcompany.org haproxy[18151]: 132.245.40.245:56136 
[23/Mar/2016:09:12:19.478] ft_exchange~ bk_exch_2013/ae-exch02 353/0/0/42/+395 
200 +721 - -  289/289/1/1/0 0/0 {mail.ourcompany.com||1504} {715506} "POST 
/EWS/mrsproxy.svc HTTP/1.1"

Mar 23 09:12:19 ae-lb01. ourcompany.org haproxy[18151]: 132.245.40.245:56136 
[23/Mar/2016:09:12:18.726] ft_exchange~ bk_exch_2013/ae-exch02 15/0/0/401/+416 
200 +722 - -  286/286/1/1/0 0/0 {mail.ourcompany.com||2435} {5620718} "POST 
/EWS/mrsproxy.svc HTTP/1.1"

This works fine for about 500 log entries (in this instance; it appears to be 
random). Then we get a few 401 errors:

Mar 23 09:12:19 ae-lb01.ourcompany.org haproxy[18151]: 132.245.40.245:61593 
[23/Mar/2016:09:12:19.179] ft_exchange~ bk_exch_2013/ae-exch02 12/0/2/2/+16 401 
+477 - -  288/288/2/2/0 0/0 {mail.ourcompany.com||2435} {0} "POST 
/EWS/mrsproxy.svc HTTP/1.1"

Mar 23 09:12:19 ae-lb01.ourcompany.org haproxy[18151]: 132.245.40.245:61593 
[23/Mar/2016:09:12:19.207] ft_exchange~ bk_exch_2013/ae-exch02 6/0/0/1/+7 401 
+693 - -  288/288/2/2/0 0/0 {mail.ourcompany.com||0} {0} "POST 
/EWS/mrsproxy.svc HTTP/1.1"

Interestingly, although the source IP is the same, the source TCP port changes. 
This is not unexpected; TCP connections will come to an end and new ones will 
start.

The problem we're having is that as soon as this new connection starts, we 
then just see a series of 401 errors from the same source IP, but a different 
source TCP port.

After doing some reading, as best as I can tell, this may be down to the NTLM / 
Windows Authentication that is used in the process.

To understand this more, I've added some capture lines to my Frontend 
configuration:

  capture request header Host len 50
  capture request header User-Agent len 500
  capture request header Content-Length len 50
  capture request header Authorization len 500

  capture response header Content-Length len 50
  capture response header WWW-Authenticate len 500
  capture response header Authentication-Info len 500

After doing this and re-starting some bits, I can now see some more of the 
authentication information in the logs:

Mar 23 18:13:05 ae-lb01.ourcompany.org haproxy[3443]: 132.245.47.13:20099 
[23/Mar/2016:18:13:05.660] ft_exchange~ bk_exch_2013/ae-exch02 8/0/1/2/+11 401 
+693 - -  192/192/1/1/0 0/0 {mail.ourcompany.com||0|Negotiate 
TlRMTVNTUAABl4II4gAGAvAjDw==} {0|NTLM|} "POST 
/EWS/mrsproxy.svc HTTP/1.1"

Mar 23 18:13:26 ae-lb01.ourcompany.org haproxy[3443]: 132.245.47.13:20099 
[23/Mar/2016:18:13:05.671] ft_exchange~ bk_exch_2013/ae-exch02 
7/0/0/21029/+21036 503 +482 - -  194/194/1/1/0 0/0 
{mail.ourcompany.com||663|Negotiate 
TlRMTVNTUAADGAAYAJYAAABsAWwBrgAAABIAEgBYEgASAGoaABoAfBAAEAAaAgAAFYKI4gYC8CMPZvo1ootKOIM3n52pgdGLYm8AZgBmAGkAYwBlAG8AcgBnAG0AZgBhAGIAYQBkAG0AaQBuAEEATQAzAFAAUgAwADUATQBCAD}
 {0||} "POST /EWS/mrsproxy.svc HTTP/1.1"

This was a useful insight into how Microsoft's NTLM / Windows Authentication 
uses the HTTP headers and 401 responses:

http://www.innovation.ch/personal/ronald/ntlm.html

As was:

https://www.ietf.org/rfc/rfc4559.txt

My conclusion is that because HA Proxy is re-using the backend connection to 
Exchange, the overall conversation between Office 365 and Exchange is no 
longer authenticated.

My next step was to was to do some more reading about HA Proxy's HTTP 
'connection modes'. This section in the documentation was great:

http://cbonte.github.io/haproxy-dconv/configuration-1.6.html#4

As was this reference from ALOHA:

https://www.haproxy.com/static/media/uploads/eng/resources/aloha_load_balancer_http_connection_mode_memo2.pdf

So (finally), I'm down to my real question:

- What HA Proxy HTTP 'connection' mode should I be using with a Backend that is 
providing an NTLM Authenticated Service?
  - My understanding is that it should be 'tunnel mode'.
  - This can be configured on the backend with 'option http-tunnel'.
  - This is needed, because in v1.6 and above, the default connection 

Re: CLEANUP: connection

2016-03-24 Thread Willy TARREAU
On Thu, Mar 24, 2016 at 10:10:06AM +, David CARLIER wrote:
> Sure :) but I just "respected" the original form.

OK I fixed it by hand and applied it.

Willy




Re: TLS Tickets and CPU usage

2016-03-24 Thread Olivier Doucet
Hi Lukas,



2016-03-24 11:15 GMT+01:00 Lukas Tribus:

> > But CPU usage doubled! I disabled it by adding again
> > "ssl-default-bind-options no-tls-tickets" and CPU usage returned to
> > normal.
>
> Ok, when you say CPU usage doubled do you mean the CPU usage after
> a reload/restart, or do you mean CPU usage in general (even after not
> reloading haproxy)?
>
CPU is at 100% just after reload for more than 30s (was a few seconds
before) and then CPU usage stays doubled all the time.



>
> > And /tmp/tls_ticket_keys generated with "openssl rand -base64 48"
> > called 3x + appended at each reload.
>
> By calling it 3 times you are basically destroying the old keys making
> sure that TLS tickets CANNOT be reused. You must only generate
> a new key ONCE per reload.
>

I misspoke. I generate 3 keys at haproxy's first startup, then append only
one key at each reload.
Olivier


RE: TLS Tickets and CPU usage

2016-03-24 Thread Lukas Tribus
Hi Olivier,


> Hello guys, 
> 
> I'm having trouble with HAProxy 1.6.3 and TLS tickets, so let me 
> explain my case here. 
> 
> I'm running HAProxy 1.6.3 (since December) and all was running fine. 
> TLS tickets were explicitly disabled. The only downside of this setup is 
> that after each reload, I have a CPU spike for a few seconds. I thought 
> this was due to session renegotiation (right?) 
> 
> A few days ago, I decided to activate TLS tickets and use the 
> tls-ticket-keys option on bind lines. My hope was to remove this CPU spike, as 
> session renegotiation should be faster. 
> But CPU usage doubled! I disabled it by adding again 
> "ssl-default-bind-options no-tls-tickets" and CPU usage returned to 
> normal. 

Ok, when you say CPU usage doubled do you mean the CPU usage after
a reload/restart, or do you mean CPU usage in general (even when not
reloading haproxy)?



> And /tmp/tls_ticket_keys generated with "openssl rand -base64 48" 
> called 3x + appended at each reload.

By calling it 3 times you are basically destroying the old keys, making
sure that TLS tickets CANNOT be reused. You must only generate
a new key ONCE per reload.


Lukas

  


Re: CLEANUP: connection

2016-03-24 Thread David CARLIER
Sure :) but I just "respected" the original form.

On 24 March 2016 at 10:05, Willy TARREAU  wrote:

> On Thu, Mar 24, 2016 at 09:34:48AM +, David CARLIER wrote:
> > - if (!memcmp(line, "TCP4 ", 5) != 0) {
> > + if (!(memcmp(line, "TCP4 ", 5) != 0)) {
>
> Wow. Scary one. That said, couldn't you avoid the double negation, like
> this? :-)
>
> -   if (!memcmp(line, "TCP4 ", 5) != 0) {
> +   if (memcmp(line, "TCP4 ", 5) == 0) {
>
> thanks,
> Willy
>
>


Re: CLEANUP: connection

2016-03-24 Thread Willy TARREAU
On Thu, Mar 24, 2016 at 09:34:48AM +, David CARLIER wrote:
> - if (!memcmp(line, "TCP4 ", 5) != 0) {
> + if (!(memcmp(line, "TCP4 ", 5) != 0)) {

Wow. Scary one. That said, couldn't you avoid the double negation, like
this? :-)

-   if (!memcmp(line, "TCP4 ", 5) != 0) {
+   if (memcmp(line, "TCP4 ", 5) == 0) {

thanks,
Willy




TLS Tickets and CPU usage

2016-03-24 Thread Olivier Doucet
Hello guys,

I'm having trouble with HAProxy 1.6.3 and TLS tickets, so let me explain
my case here.

I'm running HAProxy 1.6.3 (since December) and all was running fine. TLS
tickets were explicitly disabled. The only downside of this setup is that
after each reload, I have a CPU spike for a few seconds. I thought this was
due to session renegotiation (right?)

A few days ago, I decided to activate TLS tickets and use the
tls-ticket-keys option on bind lines. My hope was to remove this CPU spike, as
session renegotiation should be faster.
But CPU usage doubled! I disabled it by adding again
"ssl-default-bind-options no-tls-tickets" and CPU usage returned to normal.

From the doc, I read that activating TLS tickets may use "slightly" more
CPU, but I hoped that using a ticket keys file could help in this case.
Apparently I'm wrong.

Any detailed explanation and feedback would be really useful here.

Snippet of my config (I know I'm using old syntax for listen/bind):

global
    tune.ssl.default-dh-param 2048
    tune.ssl.lifetime 100800
    tune.ssl.cachesize 100
    #ssl-default-bind-options no-tls-tickets
    ssl-default-bind-ciphers ECDHE-RSA-AES256-GCM-SHA384:ECDHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-SHA384:ECDHE-RSA-AES128-SHA256:ECDHE-RSA-AES256-SHA:ECDHE-RSA-AES128-SHA:DHE-RSA-AES256-SHA256:DHE-RSA-AES128-SHA256:DHE-RSA-AES256-SHA:DHE-RSA-AES128-SHA:ECDHE-RSA-DES-CBC3-SHA:EDH-RSA-DES-CBC3-SHA:AES256-GCM-SHA384:AES128-GCM-SHA256:AES256-SHA256:AES128-SHA256:AES256-SHA:AES128-SHA:DES-CBC3-SHA:HIGH:!aNULL:!eNULL:!EXPORT:!DES:!MD5:!PSK:!RC4

listen :443
    bind xxx:443 ssl crt /etc/ssl/ssl_xxx.pem no-sslv3 tls-ticket-keys /tmp/tls_ticket_keys
    server s107 xxx:80 check weight 5 fall 60

***
HAProxy version :

HA-Proxy version 1.6.3 2015/12/25
Copyright 2000-2015 Willy Tarreau 

Build options :
  TARGET  = linux2628
  CPU = native
  CC  = gcc
  CFLAGS  = -O2 -march=native -g -fno-strict-aliasing
-Wdeclaration-after-statement
  OPTIONS = USE_OPENSSL=1 USE_PCRE=1 USE_TFO=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Encrypted password support via crypt(3): yes
Built without compression support (neither USE_ZLIB nor USE_SLZ are set)
Compression algorithms supported : identity("identity")
Built with OpenSSL version : OpenSSL 1.0.2f  28 Jan 2016
Running on OpenSSL version : OpenSSL 1.0.2f  28 Jan 2016
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports prefer-server-ciphers : yes
Built with PCRE version : 7.2 2007-06-19
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built without Lua support
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT
IP_FREEBIND


*
And /tmp/tls_ticket_keys is generated with "openssl rand -base64 48" called 3x
at first startup, plus a new key appended at each reload.
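
In other words, something like this (a sketch of the procedure, using the same
file path):

# first startup: seed the file with three keys
for i in 1 2 3; do openssl rand -base64 48 >> /tmp/tls_ticket_keys; done

# then at each reload: append exactly one new key
openssl rand -base64 48 >> /tmp/tls_ticket_keys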


Olivier


CLEANUP: connection

2016-03-24 Thread David CARLIER
Hi again,

This is a tiny one that I spotted when I tried a build on FreeBSD
with clang.

clang detected some unused functions as well, but I did not dare to delete them;
they might just be here "on hold" for future purposes (i.e.
init_comp_ctx/deinit_comp_ctx and had_fd_isset in src/ev_poll.c).

Kind regards.
From 377ae8167a051d0209d04b702bcbc5e78d64d53c Mon Sep 17 00:00:00 2001
From: David CARLIER 
Date: Thu, 24 Mar 2016 09:22:36 +
Subject: [PATCH] CLEANUP: connection: adding missing parenthesis

Nothing harmful in here, just clarify that it applies to the whole
expression.
---
 src/connection.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/connection.c b/src/connection.c
index 6eab5e1..1d73e24 100644
--- a/src/connection.c
+++ b/src/connection.c
@@ -385,7 +385,7 @@ int conn_recv_proxy(struct connection *conn, int flag)
 	if (trash.len < 9) /* shortest possible line */
 		goto missing;
 
-	if (!memcmp(line, "TCP4 ", 5) != 0) {
+	if (!(memcmp(line, "TCP4 ", 5) != 0)) {
 		u32 src3, dst3, sport, dport;
 
 		line += 5;
@@ -426,7 +426,7 @@ int conn_recv_proxy(struct connection *conn, int flag)
 		((struct sockaddr_in *)&conn->addr.to)->sin_port = htons(dport);
 		conn->flags |= CO_FL_ADDR_FROM_SET | CO_FL_ADDR_TO_SET;
 	}
-	else if (!memcmp(line, "TCP6 ", 5) != 0) {
+	else if (!(memcmp(line, "TCP6 ", 5) != 0)) {
 		u32 sport, dport;
 		char *src_s;
 		char *dst_s, *sport_s, *dport_s;
-- 
2.7.4



Re: CLEANUP: chunk

2016-03-24 Thread Willy TARREAU
Hi David,

On Thu, Mar 24, 2016 at 09:16:19AM +, David CARLIER wrote:
> Here is a cleanup patch for the chunk_dup function.
> Hope it can be useful.

Good catch, thank you. I've just merged it. That reminds me that there's
still another one from you I need to check.

Cheers,
Willy




CLEANUP: chunk

2016-03-24 Thread David CARLIER
Hi all,

Here is a cleanup patch for the chunk_dup function.
Hope it can be useful.

Regards.
From 3d904193dc041bad266fd04f69b50a66b8429f54 Mon Sep 17 00:00:00 2001
From: David Carlier 
Date: Wed, 23 Mar 2016 17:50:57 +
Subject: [PATCH] CLEANUP: chunk: adding NULL check to chunk_dup allocation.

Avoiding harmful memcpy call if the allocation failed.
Resetting the size which avoids further harmful freeing
invalid pointer. Closer to the comment behavior description.
---
 include/common/chunk.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/include/common/chunk.h b/include/common/chunk.h
index b74c767..aac5282 100644
--- a/include/common/chunk.h
+++ b/include/common/chunk.h
@@ -177,6 +177,12 @@ static inline char *chunk_dup(struct chunk *dst, const struct chunk *src)
 		dst->size++;
 
 	dst->str = (char *)malloc(dst->size);
+	if (!dst->str) {
+		dst->len = 0;
+		dst->size = 0;
+		return NULL;
+	}
+
 	memcpy(dst->str, src->str, dst->len);
 	if (dst->len < dst->size)
 		dst->str[dst->len] = 0;
-- 
2.7.4



SO_REUSEPORT and process load distribution

2016-03-24 Thread Conrad Hoffmann
Hello,

I know SO_REUSEPORT has been discussed here a few times and I am aware that
haproxy uses it to make restarts less disruptive, as a new instance can
bind() to the listen ports without the need to stop the old instance first.

But there is another aspect of SO_REUSEPORT that I have not yet seen
discussed here (my apologies if I missed this, pointers welcome). The
original patch for SO_REUSEPORT [1] explicitly mentions an uneven
distribution across threads/processes for the "accept on a single
listener socket from multiple threads" case, which the author intended to
get rid of with SO_REUSEPORT. What I have so far never seen a detailed
explanation of, though, is what the exact criteria are to get this to work.

We have always seen, and still are seeing, this uneven distribution in our haproxy
processes (haproxy 1.6, linux 3.16). My assumption is that this is because
we use haproxy in daemon mode, which (I think) basically means that when a
reload happens, one process is started, it binds all the sockets (using
SO_REUSEPORT, which lets it take over sockets from the previous instance),
but then fork()s the child processes, which thus inherit the already bound
socket(s).

My gut feeling - but I cannot point to any reliable reference - is that the
even load distribution mechanism only kicks in if all processes make the call
to at least bind(), possibly listen() or even socket(), themselves. Just
setting SO_REUSEPORT but otherwise using the "accept on a single
listener socket from multiple threads" approach might not actually reap the
intended benefits here.
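
To make the distinction concrete, here is a minimal sketch (illustration only,
not haproxy code, and untested as to the distribution behaviour) of the
per-process-bind pattern I suspect is required:

#define _GNU_SOURCE
#include <netinet/in.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
	for (int i = 0; i < 4; i++) {
		if (fork() == 0) {
			int fd = socket(AF_INET, SOCK_STREAM, 0);
			int one = 1;
			struct sockaddr_in sa;

			/* each worker creates, flags and binds its OWN socket,
			 * instead of inheriting one listener from the parent */
			setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one));
			memset(&sa, 0, sizeof(sa));
			sa.sin_family = AF_INET;
			sa.sin_port = htons(8080);
			sa.sin_addr.s_addr = htonl(INADDR_ANY);
			if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0)
				exit(1);
			listen(fd, 128);
			for (;;) {
				/* the kernel now hashes incoming connections
				 * across the four independent listeners */
				int c = accept(fd, NULL, NULL);
				if (c >= 0)
					close(c);
			}
		}
	}
	for (;;)
		pause();
}

The daemon-mode fork-after-bind behaviour described above corresponds to the
opposite pattern: one socket(), one bind(), then fork().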

So my question is basically: do I understand this situation correctly? Can
someone who is more experienced with the kernel networking code comment on
this aspect of SO_REUSEPORT?

[1] https://lwn.net/Articles/542718/

Thanks a lot,
Conrad
-- 
Conrad Hoffmann
Traffic Engineer

SoundCloud Ltd. | Rheinsberger Str. 76/77, 10115 Berlin, Germany

Managing Director: Alexander Ljung | Incorporated in England & Wales
with Company No. 6343600 | Local Branch Office | AG Charlottenburg |
HRB 110657B



Re: DOC Patch: tune.vars.xxx-max-size

2016-03-24 Thread Willy Tarreau
Hi Daniel,

On Mon, Mar 21, 2016 at 09:56:10PM +0100, Daniel Schneller wrote:
> From 29bddd461c30bc850633350ac81e3c9fd7b56cb8 Mon Sep 17 00:00:00 2001
> From: Daniel Schneller 
> Date: Mon, 21 Mar 2016 20:46:57 +0100
> Subject: [PATCH] DOC: Clarify tunes.vars.xxx-max-size settings
> 
> Adds a little more clarity to the description of the maximum sizes of
> the different variable scopes and adds a note about what happens when
> the space allocated for variables is too small.
> 
> Also fixes some typos and grammar/spelling issues re/ variables and
> their naming conventions, copied throughout the document.

Thanks for this, I've merged it. I suspect this part will have to be
factored out in the future to avoid these ugly copy-pastes.

Willy