Re: Achieving Zero Downtime Restarts at Yelp

2015-04-14 Thread Pavlos Parissis
On 13/04/2015 07:24 PM, Joseph Lynch wrote:
 Hello,

 I published an article today on Yelp's engineering blog (
 http://engineeringblog.yelp.com/2015/04/true-zero-downtime-haproxy-reloads.html)
 that shows a technique we use for low latency, zero downtime restarts of
 HAProxy. This solves the "when I restart HAProxy some of my clients get
 RSTs" problems that can occur. We built it to solve the RSTs in our
 internal load balancing, so there is a little more work to be done to
 modify the method to work with external traffic, which I talk about in the
 post.

Thanks for sharing this very detailed article.

You wrote that
'As of version 1.5.11, HAProxy does not support zero downtime restarts
or reloads of configuration. Instead, it supports fast...'

Was zero downtime supported before 1.5.11? I believe not.

Cheers,
Pavlos



Re: Achieving Zero Downtime Restarts at Yelp

2015-04-14 Thread CJ Ess
I think the gold standard for graceful restarts is nginx - it will start a
new instance (could be a new binary), send the accept fd's to the new
instance, then the original instance will stop accepting new requests and
allow the existing connections to drain off. The whole process is
controlled by signals and you can even decide there is a problem with the
new instance and have the old one resume taking traffic. I love it because
I can bounce nginx all day long and no one notices. I could see HAProxy
having the same ability when nbproc = 1, but it's not exactly a two-weekend
project.
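
For illustration, the listening-socket handoff described above can be sketched in Python. This is a minimal sketch under stated assumptions, not nginx's actual mechanism: the LISTEN_FD variable name, the CHILD script, and handoff_demo are all made up for the example. The old process opens the listener, starts a new process that inherits the descriptor, then closes its own copy, so new connections keep landing in the same kernel accept queue.

```python
import os
import socket
import subprocess
import sys

# Code run by the "new instance". LISTEN_FD is a hypothetical variable name.
CHILD = r"""
import os, socket
fd = int(os.environ["LISTEN_FD"])
listener = socket.socket(fileno=fd)   # adopt the inherited listening socket
conn, _ = listener.accept()
conn.sendall(b"served by new process")
conn.close()
"""


def handoff_demo() -> bytes:
    # The "old instance" owns the listening socket at first.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(16)
    port = listener.getsockname()[1]
    fd = listener.fileno()

    # Start the new instance, keeping the listening fd open across exec.
    env = dict(os.environ, LISTEN_FD=str(fd))
    child = subprocess.Popen(
        [sys.executable, "-c", CHILD], env=env, pass_fds=(fd,)
    )

    # The old instance stops accepting. The kernel socket stays open
    # because the child still holds a descriptor for it.
    listener.close()

    # A client connecting now is queued and then served by the new process.
    with socket.create_connection(("127.0.0.1", port), timeout=10) as client:
        reply = client.recv(1024)
    child.wait(timeout=10)
    return reply
```

Because the listening socket is never closed in the kernel, there is no window in which a SYN arrives at a port nobody is listening on. Draining existing connections and rolling back to the old instance are left out of the sketch.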


On Mon, Apr 13, 2015 at 1:24 PM, Joseph Lynch joe.e.ly...@gmail.com wrote:

 Hello,

 I published an article today on Yelp's engineering blog (
 http://engineeringblog.yelp.com/2015/04/true-zero-downtime-haproxy-reloads.html)
 that shows a technique we use for low latency, zero downtime restarts of
 HAProxy. This solves the "when I restart HAProxy some of my clients get
 RSTs" problems that can occur. We built it to solve the RSTs in our
 internal load balancing, so there is a little more work to be done to
 modify the method to work with external traffic, which I talk about in the
 post.

 The solution basically consists of using Linux queuing disciplines to
 delay SYN packets for the duration of the restart. It can definitely be
 improved by further tuning the qdiscs or replacing the iptables mangle with
 a u8/u32 tc filter, but I decided it was better to talk about the idea and
 if the community likes it, then we can optimize it further.

 -Joey



Re: Achieving Zero Downtime Restarts at Yelp

2015-04-13 Thread David Birdsong
Wow, this is a really informative blog post. Thanks for sharing!

I'm curious, did you weigh the costs of simply converting your proxies to
run on one of the BSDs? As I understand it, their implementation of
SO_REUSEPORT would mean zero downtime reloads just work as
hoped-for/expected.
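
For readers unfamiliar with the option, here is a minimal Python sketch of two listening sockets sharing one port via SO_REUSEPORT, as an old and a new proxy process would during a reload. The helper names bind_reuseport and demo_shared_port are made up for the example. It only shows that both binds succeed; what happens to connections already queued on the old socket when it closes depends on the kernel and is not demonstrated here.

```python
import socket


def bind_reuseport(host: str, port: int) -> socket.socket:
    """Return a listening TCP socket with SO_REUSEPORT set before bind()."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind((host, port))
    sock.listen(16)
    return sock


def demo_shared_port() -> tuple:
    # "Old" listener: let the kernel pick a free port.
    old = bind_reuseport("127.0.0.1", 0)
    port = old.getsockname()[1]
    # "New" listener binds the very same port while the old one is still open.
    new = bind_reuseport("127.0.0.1", port)
    ports = (old.getsockname()[1], new.getsockname()[1])
    old.close()
    new.close()
    return ports
```

Without the option set on both sockets, the second bind() would fail with "Address already in use".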

On Mon, Apr 13, 2015 at 10:24 AM, Joseph Lynch joe.e.ly...@gmail.com
wrote:

 Hello,

 I published an article today on Yelp's engineering blog (
 http://engineeringblog.yelp.com/2015/04/true-zero-downtime-haproxy-reloads.html)
 that shows a technique we use for low latency, zero downtime restarts of
 HAProxy. This solves the "when I restart HAProxy some of my clients get
 RSTs" problems that can occur. We built it to solve the RSTs in our
 internal load balancing, so there is a little more work to be done to
 modify the method to work with external traffic, which I talk about in the
 post.

 The solution basically consists of using Linux queuing disciplines to
 delay SYN packets for the duration of the restart. It can definitely be
 improved by further tuning the qdiscs or replacing the iptables mangle with
 a u8/u32 tc filter, but I decided it was better to talk about the idea and
 if the community likes it, then we can optimize it further.

 -Joey



Re: Achieving Zero Downtime Restarts at Yelp

2015-04-13 Thread Nicolas Grilly
Thanks for sharing this. This is a great and useful article!


Re: Achieving Zero Downtime Restarts at Yelp

2015-04-13 Thread Joseph Lynch
Hi David,

On Mon, Apr 13, 2015 at 12:53 PM, David Birdsong
david.birds...@gmail.com wrote:
 I'm curious, did you weigh the costs of simply converting your proxies to
 run on one of the BSDs? As I understand it, their implementation of
 SO_REUSEPORT would mean zero downtime reloads just work as
 hoped-for/expected.

It was considered. Unfortunately, as part of our service oriented
architecture we run HAProxy on every machine and use it for routing
requests to service instances, which means that we have to run on the
same underlying platform that all our services run on, which is Linux.
The sheer number of packages we'd have to port to run on a BSD was
frankly a bit staggering so we decided against it. We may have been
able to work around this with proper containerization but we're not
quite there yet.

-Joey



Achieving Zero Downtime Restarts at Yelp

2015-04-13 Thread Joseph Lynch
Hello,

I published an article today on Yelp's engineering blog (
http://engineeringblog.yelp.com/2015/04/true-zero-downtime-haproxy-reloads.html)
that shows a technique we use for low latency, zero downtime restarts of
HAProxy. This solves the "when I restart HAProxy some of my clients get
RSTs" problems that can occur. We built it to solve the RSTs in our
internal load balancing, so there is a little more work to be done to
modify the method to work with external traffic, which I talk about in the
post.

The solution basically consists of using Linux queuing disciplines to delay
SYN packets for the duration of the restart. It can definitely be improved
by further tuning the qdiscs or replacing the iptables mangle with a u8/u32
tc filter, but I decided it was better to talk about the idea and if the
community likes it, then we can optimize it further.

-Joey
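
As a rough illustration of the "delay SYN packets for the duration of the restart" idea, here is a minimal Python sketch that wraps a reload between a plug and an unplug step. Several things are assumptions and not taken from this thread: that a plug qdisc already exists on lo at parent 1:4 with id 40:, that SYN packets are already being marked and filtered into it, that the nl-qdisc-add flags are as recalled from the linked post (check them against your libnl version), and that "service haproxy reload" is the reload command. The names plug_cmd and reload_with_plug are made up for the example.

```python
import subprocess
from typing import Callable, List, Sequence

Command = List[str]


def plug_cmd(action: str, dev: str = "lo", parent: str = "1:4",
             qid: str = "40:") -> Command:
    """Build an nl-qdisc-add call that updates an existing plug qdisc.

    Assumed flags; 'action' is 'buffer' (hold packets) or
    'release-indefinite' (let packets flow).
    """
    return ["nl-qdisc-add", f"--dev={dev}", f"--parent={parent}",
            f"--id={qid}", "--update", "plug", f"--{action}"]


def reload_with_plug(reload_cmd: Sequence[str],
                     run: Callable[[Command], object] = subprocess.check_call
                     ) -> None:
    """Hold marked SYNs in the plug, reload, then release them."""
    run(plug_cmd("buffer"))
    try:
        run(list(reload_cmd))
    finally:
        # Always release, even if the reload fails, so that new
        # connections are never left queued behind the plug.
        run(plug_cmd("release-indefinite"))
```

A call would look like reload_with_plug(["service", "haproxy", "reload"]). The run parameter exists so the sequence can be dry-run by passing a function that just records the commands.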