Re: [RFC PATCH] MEDIUM: compression: Add support for brotli compression
On Mon, Mar 04, 2019 at 02:44:38PM +0100, Tim Düsterhus wrote:
> One could limit the overall brotli resource usage by returning NULLs in
> the custom allocator when the *total* (versus the per-stream) brotli
> memory consumption exceeds a certain level. The handling of OOMs in the
> remaining code is not relevant then, because brotli is artificially
> limited to a (way) lower memory limit that leaves space for other parts.

Yes, it could be done this way. But someone still needs to write this
custom allocator ;-)

Cheers,
Willy
Re: [RFC PATCH] MEDIUM: compression: Add support for brotli compression
Willy,

On 04.03.19 at 14:36, Willy Tarreau wrote:
>>> can document such limits and let users decide on their own. We'll
>>> need the equivalent of maxzlibmem though (or better, we can reuse it
>>> to keep a single tunable and indicate it serves for any compression
>>> algo so that there isn't the issue of "what if two frontends use a
>>> different compression algo").
>>
>> I guess one has to plug in a custom allocator to do this.
>
> Quite likely, which is another level of pain :-/
>
>> The library
>> appears to handle the OOM case (but I did not check what happens if the
>> OOM is encountered halfway through compression).
>
> The problem is not so much how the lib handles the OOM situation as
> how 100% of the remaining code (haproxy, openssl, pcre, ...) handles it
> once brotli makes this possibility a reality. We're always extremely
> careful to make sure it still works in this situation by serializing
> what can be, but we've already been hit by bugs in openssl and haproxy
> at least.
>
> For now, until we figure out a way to properly control the resource
> usage of this lib, I'm not a big fan of merging it, as it's clear that
> it *will* cause lots of trouble. Seeing users complain here on the list
> is one thing, but thinking about their crashed or frozen LB in prod is
> another one, and I'd rather not cross this boundary, especially given
> the small gains we've seen, which very few people would take as a
> valuable justification for killing their production :-/

One could limit the overall brotli resource usage by returning NULLs in
the custom allocator when the *total* (versus the per-stream) brotli
memory consumption exceeds a certain level. The handling of OOMs in the
remaining code is not relevant then, because brotli is artificially
limited to a (way) lower memory limit that leaves space for other parts.

Best regards
Tim Düsterhus
Re: [RFC PATCH] MEDIUM: compression: Add support for brotli compression
Hi Tim,

On Wed, Feb 27, 2019 at 01:23:28PM +0100, Tim Düsterhus wrote:
> As mentioned in my reply to Aleks I don't have any numbers, because I
> don't know how to get them. My knowledge of both HAProxy's internals
> and C is not strong enough to get those.
>
> The manpage documents this:
>
> > BROTLI_PARAM_LGBLOCK
> >     Recommended input block size. Encoder may reduce this value,
> >     e.g. if input is much smaller than the input block size.
> >
> >     Range is from BROTLI_MIN_INPUT_BLOCK_BITS to
> >     BROTLI_MAX_INPUT_BLOCK_BITS.
> >
> >     Note:
> >     A bigger input block size allows better compression, but consumes
> >     more memory. The rough formula of memory used for temporary input
> >     storage is 3 << lgBlock.
>
> The default of this value depends on other configuration settings:
> https://github.com/google/brotli/blob/9cd01c0437e8b6010434d3491a348a5645de624b/c/enc/quality.h#L75-L92
>
> It is the only place that talks about memory. There's also this (still
> open) issue: https://github.com/google/brotli/issues/389 "Functions to
> calculate approximate memory usage needed for compression and
> decompression".

This is quite scary: they're discussing 2.6 MB for the Huffman tables.
It's manageable in a browser, sometimes on a server with low traffic,
but on a shared load balancer it's an immediate DoS. They're saying
that it's the worst case but that the majority of the cases are below
0.5 MB, which is still twice as much as zlib, which itself is insane.

> > can document such limits and let users decide on their own. We'll
> > need the equivalent of maxzlibmem though (or better, we can reuse it
> > to keep a single tunable and indicate it serves for any compression
> > algo so that there isn't the issue of "what if two frontends use a
> > different compression algo").
>
> I guess one has to plug in a custom allocator to do this.

Quite likely, which is another level of pain :-/

> The library
> appears to handle the OOM case (but I did not check what happens if the
> OOM is encountered halfway through compression).

The problem is not so much how the lib handles the OOM situation as
how 100% of the remaining code (haproxy, openssl, pcre, ...) handles it
once brotli makes this possibility a reality. We're always extremely
careful to make sure it still works in this situation by serializing
what can be, but we've already been hit by bugs in openssl and haproxy
at least.

For now, until we figure out a way to properly control the resource
usage of this lib, I'm not a big fan of merging it, as it's clear that
it *will* cause lots of trouble. Seeing users complain here on the list
is one thing, but thinking about their crashed or frozen LB in prod is
another one, and I'd rather not cross this boundary, especially given
the small gains we've seen, which very few people would take as a
valuable justification for killing their production :-/

Cheers,
Willy
Re: [RFC PATCH] MEDIUM: compression: Add support for brotli compression
Willy,

On 27.02.19 at 05:12, Willy Tarreau wrote:
> Hi Tim,
>
> On Tue, Feb 26, 2019 at 06:16:12PM +0100, Tim Düsterhus wrote:
>> Willy,
>>
>> On 13.02.19 at 17:57, Tim Duesterhus wrote:
>>> *snip*
>>
>> Are you able to give some (first, basic) feedback on this patch already?
>
> Not yet. In fact I don't quite know what to think about it. The patch
> itself is reasonably small, but what's the real cost of using this?
> We created libslz because zlib was not practically usable due to
> its insane amount of memory per stream (256 kB), which resulted in
> compression being disabled for many streams by lack of memory, and
> hence in a lower overall compression ratio. Here I have no idea how
> much brotli requires, but if it compresses each stream slightly better
> than zlib at roughly similar (or worse) cost, maybe in the end it will
> not be beneficial either. So if you have numbers (CPU cost, pinned
> memory per stream), they would be very useful. Once this is known, we

As mentioned in my reply to Aleks I don't have any numbers, because I
don't know how to get them. My knowledge of both HAProxy's internals
and C is not strong enough to get those.

The manpage documents this:

> BROTLI_PARAM_LGBLOCK
>     Recommended input block size. Encoder may reduce this value,
>     e.g. if input is much smaller than the input block size.
>
>     Range is from BROTLI_MIN_INPUT_BLOCK_BITS to
>     BROTLI_MAX_INPUT_BLOCK_BITS.
>
>     Note:
>     A bigger input block size allows better compression, but consumes
>     more memory. The rough formula of memory used for temporary input
>     storage is 3 << lgBlock.

The default of this value depends on other configuration settings:
https://github.com/google/brotli/blob/9cd01c0437e8b6010434d3491a348a5645de624b/c/enc/quality.h#L75-L92

It is the only place that talks about memory. There's also this (still
open) issue: https://github.com/google/brotli/issues/389 "Functions to
calculate approximate memory usage needed for compression and
decompression".

> can document such limits and let users decide on their own. We'll
> need the equivalent of maxzlibmem though (or better, we can reuse it
> to keep a single tunable and indicate it serves for any compression
> algo so that there isn't the issue of "what if two frontends use a
> different compression algo").

I guess one has to plug in a custom allocator to do this. The library
appears to handle the OOM case (but I did not check what happens if the
OOM is encountered halfway through compression).

Best regards
Tim Düsterhus
Re: [RFC PATCH] MEDIUM: compression: Add support for brotli compression
Hi Tim,

On Tue, Feb 26, 2019 at 06:16:12PM +0100, Tim Düsterhus wrote:
> Willy,
>
> On 13.02.19 at 17:57, Tim Duesterhus wrote:
> > *snip*
>
> Are you able to give some (first, basic) feedback on this patch already?

Not yet. In fact I don't quite know what to think about it. The patch
itself is reasonably small, but what's the real cost of using this?
We created libslz because zlib was not practically usable due to
its insane amount of memory per stream (256 kB), which resulted in
compression being disabled for many streams by lack of memory, and
hence in a lower overall compression ratio. Here I have no idea how
much brotli requires, but if it compresses each stream slightly better
than zlib at roughly similar (or worse) cost, maybe in the end it will
not be beneficial either. So if you have numbers (CPU cost, pinned
memory per stream), they would be very useful. Once this is known, we
can document such limits and let users decide on their own. We'll need
the equivalent of maxzlibmem though (or better, we can reuse it to keep
a single tunable and indicate it serves for any compression algo so
that there isn't the issue of "what if two frontends use a different
compression algo").

Thanks,
Willy
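For reference, maxzlibmem is an existing global tunable (a value in megabytes); the single shared tunable suggested above might look like this in a configuration (the idea of it also covering brotli is exactly the proposal under discussion here, not current behaviour, and the frontend names are illustrative):

```
global
    # cap the total memory usable by the compression library at 50 MB;
    # today this applies to zlib only -- extending it to any configured
    # algo (gzip, br, ...) is the suggestion in this thread
    maxzlibmem 50

frontend fe_web
    bind :8080
    compression algo gzip
    compression type text/html text/css application/javascript
```

Compression is simply skipped for a stream when the budget is exhausted, which degrades the compression ratio rather than the service.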
Re: [RFC PATCH] MEDIUM: compression: Add support for brotli compression
Willy,

On 13.02.19 at 17:57, Tim Duesterhus wrote:
> *snip*

Are you able to give some (first, basic) feedback on this patch already?

Best regards
Tim Düsterhus
Re: [RFC PATCH] MEDIUM: compression: Add support for brotli compression
Aleks,

On 14.02.19 at 12:00, Aleksandar Lazic wrote:
>> I am successfully able to access brotli-compressed URLs with Google
>> Chrome, though this requires me to disable `gzip` (because haproxy
>> prefers to select gzip, I suspect because `br` is last in Chrome's
>> `Accept-Encoding` header).
>
> Does it change when you use `br` as the first entry in
> `compression algo ...`?
>
> https://cbonte.github.io/haproxy-dconv/1.9/configuration.html#4.2-compression%20algo

I tried that. It does not. In the absence of a q-value for the
encoding, HAProxy selects the first value listed in the Accept-Encoding
header. I just checked RFC 7231#section-5.3.4, which does not specify a
priority in this case. Maybe the code should be changed to use the
order of the algorithms in the config to determine the priority when
the client gives equal priorities.

>> I also am able to successfully download and decompress URLs with
>> `curl` and the `brotli` CLI utility. The server I use as the backend
>> for these tests has about 45 ms RTT to my machine. The HTML page I
>> use is some random HTML page on the server, the noise file is 1 MiB
>> of finest /dev/urandom.
>>
>> You'll notice that brotli-compressed requests are both faster and
>> smaller compared to gzip with the hardcoded brotli compression
>> quality of 3. The default is 11, which is *way* slower than gzip.
>
> How much more/less/equal CPU usage does brotli have compared to gzip?

I did not check, because I would have to build something more elaborate
than "look at curl's output" for that. Also I have no idea how I would
do so. I'll leave this up to the experts :-)

> I'm a little bit disappointed from the size point of view: it is only
> ~6K less than gzip. Is it worth the amount of work for such a small
> gain of data reduction?

Even 6 kB adds up over time, especially on cellular networks. Also, I
did not tune all the brotli encoder knobs yet.

As an example, one could specify that the content is UTF-8 encoded
text, which possibly improves the compression ratio further (I guess it
selects a different dictionary):
https://github.com/google/brotli/blob/5805f99a533a8f8118699c0100d8c102f3605f65/docs/encode.h.3#L197-L204

Best regards
Tim Düsterhus
Re: [RFC PATCH] MEDIUM: compression: Add support for brotli compression
Hi Tim.

On 13.02.2019 at 17:57, Tim Duesterhus wrote:
> Willy,
> Aleks,
> List,
>
> this (absolutely non-ready-to-merge) patch adds support for brotli
> compression as suggested in issue #21:
> https://github.com/haproxy/haproxy/issues/21

Cool ;-)

> It is tested on Ubuntu Xenial with libbrotli 1.0.3:
>
> [timwolla@~]apt-cache policy libbrotli-dev
> libbrotli-dev:
>   Installed: 1.0.3-1ubuntu1~16.04.1
>   Candidate: 1.0.3-1ubuntu1~16.04.1
>   Version table:
>  *** 1.0.3-1ubuntu1~16.04.1 500
>         500 http://de.archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages
>         100 /var/lib/dpkg/status
> [timwolla@~]apt-cache policy libbrotli1
> libbrotli1:
>   Installed: 1.0.3-1ubuntu1~16.04.1
>   Candidate: 1.0.3-1ubuntu1~16.04.1
>   Version table:
>  *** 1.0.3-1ubuntu1~16.04.1 500
>         500 http://de.archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages
>         100 /var/lib/dpkg/status
>
> I am successfully able to access brotli-compressed URLs with Google
> Chrome, though this requires me to disable `gzip` (because haproxy
> prefers to select gzip, I suspect because `br` is last in Chrome's
> `Accept-Encoding` header).

Does it change when you use `br` as the first entry in
`compression algo ...`?

https://cbonte.github.io/haproxy-dconv/1.9/configuration.html#4.2-compression%20algo

> I also am able to successfully download and decompress URLs with
> `curl` and the `brotli` CLI utility. The server I use as the backend
> for these tests has about 45 ms RTT to my machine. The HTML page I use
> is some random HTML page on the server, the noise file is 1 MiB of
> finest /dev/urandom.
>
> You'll notice that brotli-compressed requests are both faster and
> smaller compared to gzip with the hardcoded brotli compression quality
> of 3. The default is 11, which is *way* slower than gzip.

How much more/less/equal CPU usage does brotli have compared to gzip?
I'm a little bit disappointed from the size point of view: it is only
~6K less than gzip. Is it worth the amount of work for such a small
gain of data reduction?

Regards
Aleks

> + curl localhost:8080/*snip*.html -H 'Accept-Encoding: gzip'
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                  Dload  Upload   Total   Spent    Left  Speed
> 100 49280    0 49280    0     0   279k      0 --:--:-- --:--:-- --:--:--  279k
> + curl localhost:8080/*snip*.html -H 'Accept-Encoding: br'
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                  Dload  Upload   Total   Spent    Left  Speed
> 100 43401    0 43401    0     0   332k      0 --:--:-- --:--:-- --:--:--  333k
> + curl localhost:8080/*snip*.html -H 'Accept-Encoding: identity'
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                  Dload  Upload   Total   Spent    Left  Speed
> 100  127k  100  127k    0     0   441k      0 --:--:-- --:--:-- --:--:--  441k
> + curl localhost:8080/noise -H 'Accept-Encoding: gzip'
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                  Dload  Upload   Total   Spent    Left  Speed
> 100 1025k    0 1025k    0     0  3330k      0 --:--:-- --:--:-- --:--:-- 3338k
> + curl localhost:8080/noise -H 'Accept-Encoding: br'
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                  Dload  Upload   Total   Spent    Left  Speed
> 100 1024k    0 1024k    0     0  3029k      0 --:--:-- --:--:-- --:--:-- 3030k
> + curl localhost:8080/noise -H 'Accept-Encoding: identity'
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                  Dload  Upload   Total   Spent    Left  Speed
> 100 1024k  100 1024k    0     0  3003k      0 --:--:-- --:--:-- --:--:-- 3002k
> + ls -al
> total 3384
> drwxrwxr-x  2 timwolla timwolla    4096 Feb 13 17:30 .
> drwxrwxrwt 28 root     root       69632 Feb 13 17:25 ..
> -rw-rw-r--  1 timwolla timwolla     598 Feb 13 17:30 download
> -rw-rw-r--  1 timwolla timwolla   43401 Feb 13 17:30 html-br
> -rw-rw-r--  1 timwolla timwolla   49280 Feb 13 17:30 html-gz
> -rw-rw-r--  1 timwolla timwolla  130334 Feb 13 17:30 html-id
> -rw-rw-r--  1 timwolla timwolla 1048949 Feb 13 17:30 noise-br
> -rw-rw-r--  1 timwolla timwolla 1049666 Feb 13 17:30 noise-gz
> -rw-rw-r--  1 timwolla timwolla 1048576 Feb 13 17:30 noise-id
> ++ zcat html-gz
> + sha256sum html-id /dev/fd/63 /dev/fd/62
> ++ brotli --decompress --stdout html-br
> 56f1664241b3dbb750f93b69570be76c6baccb8de4f
[RFC PATCH] MEDIUM: compression: Add support for brotli compression
Willy,
Aleks,
List,

this (absolutely non-ready-to-merge) patch adds support for brotli
compression as suggested in issue #21:
https://github.com/haproxy/haproxy/issues/21

It is tested on Ubuntu Xenial with libbrotli 1.0.3:

[timwolla@~]apt-cache policy libbrotli-dev
libbrotli-dev:
  Installed: 1.0.3-1ubuntu1~16.04.1
  Candidate: 1.0.3-1ubuntu1~16.04.1
  Version table:
 *** 1.0.3-1ubuntu1~16.04.1 500
        500 http://de.archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages
        100 /var/lib/dpkg/status
[timwolla@~]apt-cache policy libbrotli1
libbrotli1:
  Installed: 1.0.3-1ubuntu1~16.04.1
  Candidate: 1.0.3-1ubuntu1~16.04.1
  Version table:
 *** 1.0.3-1ubuntu1~16.04.1 500
        500 http://de.archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages
        100 /var/lib/dpkg/status

I am successfully able to access brotli-compressed URLs with Google
Chrome, though this requires me to disable `gzip` (because haproxy
prefers to select gzip, I suspect because `br` is last in Chrome's
`Accept-Encoding` header).

I also am able to successfully download and decompress URLs with `curl`
and the `brotli` CLI utility. The server I use as the backend for these
tests has about 45 ms RTT to my machine. The HTML page I use is some
random HTML page on the server, the noise file is 1 MiB of finest
/dev/urandom.

You'll notice that brotli-compressed requests are both faster and
smaller compared to gzip with the hardcoded brotli compression quality
of 3. The default is 11, which is *way* slower than gzip.
+ curl localhost:8080/*snip*.html -H 'Accept-Encoding: gzip'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 49280    0 49280    0     0   279k      0 --:--:-- --:--:-- --:--:--  279k
+ curl localhost:8080/*snip*.html -H 'Accept-Encoding: br'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 43401    0 43401    0     0   332k      0 --:--:-- --:--:-- --:--:--  333k
+ curl localhost:8080/*snip*.html -H 'Accept-Encoding: identity'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  127k  100  127k    0     0   441k      0 --:--:-- --:--:-- --:--:--  441k
+ curl localhost:8080/noise -H 'Accept-Encoding: gzip'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 1025k    0 1025k    0     0  3330k      0 --:--:-- --:--:-- --:--:-- 3338k
+ curl localhost:8080/noise -H 'Accept-Encoding: br'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 1024k    0 1024k    0     0  3029k      0 --:--:-- --:--:-- --:--:-- 3030k
+ curl localhost:8080/noise -H 'Accept-Encoding: identity'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 1024k  100 1024k    0     0  3003k      0 --:--:-- --:--:-- --:--:-- 3002k
+ ls -al
total 3384
drwxrwxr-x  2 timwolla timwolla    4096 Feb 13 17:30 .
drwxrwxrwt 28 root     root       69632 Feb 13 17:25 ..
-rw-rw-r--  1 timwolla timwolla     598 Feb 13 17:30 download
-rw-rw-r--  1 timwolla timwolla   43401 Feb 13 17:30 html-br
-rw-rw-r--  1 timwolla timwolla   49280 Feb 13 17:30 html-gz
-rw-rw-r--  1 timwolla timwolla  130334 Feb 13 17:30 html-id
-rw-rw-r--  1 timwolla timwolla 1048949 Feb 13 17:30 noise-br
-rw-rw-r--  1 timwolla timwolla 1049666 Feb 13 17:30 noise-gz
-rw-rw-r--  1 timwolla timwolla 1048576 Feb 13 17:30 noise-id
++ zcat html-gz
+ sha256sum html-id /dev/fd/63 /dev/fd/62
++ brotli --decompress --stdout html-br
56f1664241b3dbb750f93b69570be76c6baccb8de4f3a62fb4fec0ce1bf440b5  html-id
56f1664241b3dbb750f93b69570be76c6baccb8de4f3a62fb4fec0ce1bf440b5  /dev/fd/63
56f1664241b3dbb750f93b69570be76c6baccb8de4f3a62fb4fec0ce1bf440b5  /dev/fd/62
++ zcat noise-gz
+ sha256sum noise-id /dev/fd/63 /dev/fd/62
++ brotli --decompress --stdout noise-br
ab23236d9d4acecec239c3f0f9b59e59dd043267eeed9ed723da8b15f46bbf33  noise-id
ab23236d9d4acecec239c3f0f9b59e59dd043267eeed9ed723da8b15f46bbf33  /dev/fd/63
ab23236d9d4acecec239c3f0f9b59e59dd043267eeed9ed723da8