Re: Achieving Zero Downtime Restarts at Yelp
On 13/04/2015 07:24 PM, Joseph Lynch wrote:

> Hello,
>
> I published an article today on Yelp's engineering blog (http://engineeringblog.yelp.com/2015/04/true-zero-downtime-haproxy-reloads.html) that shows a technique we use for low latency, zero downtime restarts of HAProxy. This solves the "when I restart HAProxy some of my clients get RSTs" problems that can occur. We built it to solve the RSTs in our internal load balancing, so there is a little more work to be done to modify the method to work with external traffic, which I talk about in the post.

Thanks for sharing this very detailed article.

You wrote that 'As of version 1.5.11, HAProxy does not support zero downtime restarts or reloads of configuration. Instead, it supports fast...'

Was zero downtime supported before 1.5.11? I believe not.

Cheers,
Pavlos
Re: Achieving Zero Downtime Restarts at Yelp
I think the gold standard for graceful restarts is nginx: it will start a new instance (which could be a new binary), send the accept fds to the new instance, and then the original instance will stop accepting new requests and allow the existing connections to drain off. The whole process is controlled by signals, and you can even decide there is a problem with the new instance and have the old one resume taking traffic. I love it because I can bounce nginx all day long and no one notices.

I could see haproxy having the same ability when nbproc = 1, but it is not exactly a two-weekend project.

On Mon, Apr 13, 2015 at 1:24 PM, Joseph Lynch joe.e.ly...@gmail.com wrote:

> Hello,
>
> I published an article today on Yelp's engineering blog (http://engineeringblog.yelp.com/2015/04/true-zero-downtime-haproxy-reloads.html) that shows a technique we use for low latency, zero downtime restarts of HAProxy. This solves the "when I restart HAProxy some of my clients get RSTs" problems that can occur. We built it to solve the RSTs in our internal load balancing, so there is a little more work to be done to modify the method to work with external traffic, which I talk about in the post.
>
> The solution basically consists of using Linux queuing disciplines to delay SYN packets for the duration of the restart. It can definitely be improved by further tuning the qdiscs or replacing the iptables mangle with a u8/u32 tc filter, but I decided it was better to talk about the idea and, if the community likes it, then we can optimize it further.
>
> -Joey
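The descriptor handover described in this reply can be sketched in a few lines. This is only an illustration of the general idea, not nginx's code: it assumes Python 3.9+ on a Unix-like system, passes the descriptor with `socket.send_fds` over a Unix socket pair (one possible mechanism among several), and keeps both "instances" in one process for brevity. The helper names `hand_over_listener` and `receive_listener` are hypothetical.

```python
import socket

def hand_over_listener(listener, channel):
    """Old instance: send the listening socket's fd over a Unix socket."""
    socket.send_fds(channel, [b"listener"], [listener.fileno()])

def receive_listener(channel):
    """New instance: rebuild a socket object from the received fd."""
    _msg, fds, _flags, _addr = socket.recv_fds(channel, 1024, 1)
    return socket.socket(fileno=fds[0])

# Both "instances" live in one process here to keep the sketch short.
old_listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
old_listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
old_listener.listen(16)
port = old_listener.getsockname()[1]

old_end, new_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
hand_over_listener(old_listener, old_end)
new_listener = receive_listener(new_end)

# The old instance stops accepting. The kernel listen queue stays alive
# because the new instance still holds a descriptor for the same socket.
old_listener.close()

client = socket.create_connection(("127.0.0.1", port))
conn, _peer = new_listener.accept()
client.sendall(b"ping")
received = conn.recv(4)

for s in (client, conn, new_listener, old_end, new_end):
    s.close()
```

Because the listening socket is never closed and reopened, there is no window in which a SYN finds nothing listening on the port, which is the property being praised above.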
Re: Achieving Zero Downtime Restarts at Yelp
Wow, this is a really informative blog post. Thanks for sharing!

I'm curious, did you weigh the costs of simply converting your proxies to run on one of the BSDs? As I understand it, their implementation of SO_REUSEPORT would mean zero downtime reloads just work as hoped-for/expected.

On Mon, Apr 13, 2015 at 10:24 AM, Joseph Lynch joe.e.ly...@gmail.com wrote:

> Hello,
>
> I published an article today on Yelp's engineering blog (http://engineeringblog.yelp.com/2015/04/true-zero-downtime-haproxy-reloads.html) that shows a technique we use for low latency, zero downtime restarts of HAProxy. This solves the "when I restart HAProxy some of my clients get RSTs" problems that can occur. We built it to solve the RSTs in our internal load balancing, so there is a little more work to be done to modify the method to work with external traffic, which I talk about in the post.
>
> The solution basically consists of using Linux queuing disciplines to delay SYN packets for the duration of the restart. It can definitely be improved by further tuning the qdiscs or replacing the iptables mangle with a u8/u32 tc filter, but I decided it was better to talk about the idea and, if the community likes it, then we can optimize it further.
>
> -Joey
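For readers unfamiliar with SO_REUSEPORT, the following minimal sketch shows the part that both platforms share: two listeners bound to the same address and port at once, as an old and a new proxy process would be during a reload. It assumes a platform where Python exposes `socket.SO_REUSEPORT` (for example Linux 3.9+ or a BSD); `reuseport_listener` is a hypothetical helper, and both sockets live in one process only to keep the example short.

```python
import socket

def reuseport_listener(host, port):
    """Create a TCP listener with SO_REUSEPORT set before bind()."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind((host, port))
    sock.listen(16)
    return sock

# "Old" proxy binds first; port 0 lets the kernel choose a free port.
old_proxy = reuseport_listener("127.0.0.1", 0)
port = old_proxy.getsockname()[1]

# "New" proxy binds the very same port while the old one is still listening.
new_proxy = reuseport_listener("127.0.0.1", port)
both_bound = old_proxy.getsockname() == new_proxy.getsockname()

old_proxy.close()
new_proxy.close()
```

The sketch demonstrates only the shared bind. How the kernel distributes new connections between the two sockets, and what happens to connections still queued on the old socket when it closes, differs between platforms and is the crux of the question above.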
Re: Achieving Zero Downtime Restarts at Yelp
Thanks for sharing this. This is a great and useful article!
Re: Achieving Zero Downtime Restarts at Yelp
Hi David,

On Mon, Apr 13, 2015 at 12:53 PM, David Birdsong david.birds...@gmail.com wrote:

> I'm curious, did you weigh the costs of simply converting your proxies to run on one of the BSDs? As I understand it, their implementation of SO_REUSEPORT would mean zero downtime reloads just work as hoped-for/expected.

It was considered. Unfortunately, as part of our service-oriented architecture we run HAProxy on every machine and use it for routing requests to service instances, which means that we have to run on the same underlying platform that all our services run on, which is Linux. The sheer number of packages we'd have to port to run on a BSD was frankly a bit staggering, so we decided against it. We may have been able to work around this with proper containerization, but we're not quite there yet.

-Joey
Achieving Zero Downtime Restarts at Yelp
Hello,

I published an article today on Yelp's engineering blog (http://engineeringblog.yelp.com/2015/04/true-zero-downtime-haproxy-reloads.html) that shows a technique we use for low latency, zero downtime restarts of HAProxy. This solves the "when I restart HAProxy some of my clients get RSTs" problems that can occur. We built it to solve the RSTs in our internal load balancing, so there is a little more work to be done to modify the method to work with external traffic, which I talk about in the post.

The solution basically consists of using Linux queuing disciplines to delay SYN packets for the duration of the restart. It can definitely be improved by further tuning the qdiscs or replacing the iptables mangle with a u8/u32 tc filter, but I decided it was better to talk about the idea and, if the community likes it, then we can optimize it further.

-Joey
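The idea of delaying SYN packets during the restart can be illustrated with a toy model. This is not the tc or iptables configuration from the article; it is a pure-Python simulation under the assumption that SYNs are held in a buffer while the proxy restarts and released in order afterwards, while packets on established connections pass straight through. The class `PlugQueue` and the packet dictionaries are hypothetical.

```python
from collections import deque

class PlugQueue:
    """Toy model of a queue that holds SYN packets while 'plugged'."""

    def __init__(self):
        self.plugged = False
        self.held = deque()   # SYNs waiting for the restart to finish
        self.delivered = []   # packets that reached the proxy, in order

    def plug(self):
        """Start buffering SYNs (called just before the restart)."""
        self.plugged = True

    def enqueue(self, packet):
        """Hold SYNs while plugged; deliver everything else immediately."""
        if self.plugged and packet["syn"]:
            self.held.append(packet)
        else:
            self.delivered.append(packet)

    def release(self):
        """Stop buffering and flush held SYNs in arrival order."""
        self.plugged = False
        while self.held:
            self.delivered.append(self.held.popleft())

q = PlugQueue()
q.enqueue({"id": 1, "syn": False})   # traffic before the restart
q.plug()                             # restart begins
q.enqueue({"id": 2, "syn": True})    # new connection attempt: held
q.enqueue({"id": 3, "syn": False})   # established connection: passes
q.enqueue({"id": 4, "syn": True})    # another new connection: held
held_during_restart = [p["id"] for p in q.held]
q.release()                          # restart finished
delivered_order = [p["id"] for p in q.delivered]
```

In this model the new connections see extra latency equal to the restart time instead of a reset, which is the trade-off described above.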