Re: [PATCH] Add the possibility to compress requests

2023-04-06 Thread Willy Tarreau
On Fri, Apr 07, 2023 at 12:56:54AM +0200, Olivier Houchard wrote:
(...)
> > OK otherwise it looks good to me. I suggest you adjust these cosmetic
> > details and directly push it.
>  
> Done, thanks!

Thanks!

> > I have one question, by the way: how did you manage to test this?
> > Did you find a server that supports decompressing requests, or do you
> > only have some in-house stuff for this? I'm wondering if we could
> > manage to make a regtest for this, at least one that puts some expect
> > rules on the content-encoding header.
> 
> My test server was actually an nginx with custom Lua code to decompress
> stuff. Maybe we can do something similar with haproxy?

Maybe, yes. I'm not yet 100% sure how to do that from Lua, but maybe it
could work by calling a Lua service that tries to decompress the data.
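
As a first step the regtest could probably just check the header on the
server side, without decompressing anything. Something like this maybe
(totally untested sketch, and it assumes the final keyword ends up being
called "compression direction" after the rename):

    varnishtest "Basic request compression test"

    server s1 {
        rxreq
        expect req.http.content-encoding == "gzip"
        txresp
    } -start

    haproxy h1 -conf {
        defaults
            mode http
            timeout client  5s
            timeout server  5s
            timeout connect 5s

        frontend fe1
            bind "fd@${fe1}"
            compression direction request
            compression algo gzip
            default_backend be1

        backend be1
            server srv1 ${s1_addr}:${s1_port}
    } -start

    client c1 -connect ${h1_fe1_sock} {
        txreq -url "/" -body "some long and hopefully compressible body"
        rxresp
        expect resp.status == 200
    } -run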

Thanks,
Willy



Re: [PATCH] Add the possibility to compress requests

2023-04-06 Thread Olivier Houchard
On Thu, Apr 06, 2023 at 09:17:34PM +0200, Willy Tarreau wrote:
> > > > +        /* limit compression rate */
> > > > +        if (global.comp_rate_lim > 0)
> > > > +                if (read_freq_ctr(&comp_bps_in) > global.comp_rate_lim)
> > > > +                        goto fail;
> > > 
> > > I had a doubt regarding this one but I think it's indeed reasonable to
> > > use the same global value for all directions as the goal was precisely
> > > to be able to enforce a total limit on all compressed streams to limit
> > > the resource impacts.
> >  
> > I think it makes sense, unless someone wants to give more priority to
> > request compression vs response compression, or vice versa, but I'll
> > rethink it if somebody gives me a valid use-case.
> 
> We'll see, but I strongly doubt this would ever happen, because those
> who want to enforce speed limits are often hosting providers who don't
> want a single site to eat all their CPU, so they configure maximum
> compression bandwidths. It dates back to the era when it was still
> common to start several daemons, which is no longer the case these
> days. So if anything were to change in the short term, it would rather
> be to have per-frontend and/or per-backend total limits rather than
> splitting req/res, which still concern the same application.
 
Yeah that makes sense.

> > > I have not seen anything updating the stats, so I suspect that request
> > > and response compression stats are accounted as response stats, which
> > > can be a problem, showing compression rates larger than 100%. If you
> > > hover over your frontend's "Bytes / Out" fields, you'll see a yellow
> > > box with numbers such as:
> > > 
> > >   Response bytes in:  468197882863
> > >   Compression in:     117157038990
> > >   Compression out:    93277104103 (79%)
> > >   Compression bypass: 0
> > >   Total bytes saved:  23879934887 (5%)
> > > 
> > > Here I'm figuring that we'll need to add the same for the upload, so
> > > these are extra stats. I can possibly help locate where to place them,
> > > but I'll need you to figure out where they are updated.
> > 
> > Ok so I've had mixed feelings about this, because I thought a global
> > counter could be useful too. But ultimately I think it's better to keep
> > them separated, if only so that the percentages make sense.
> 
> Yes, and another point is that when you start with a global one and
> finally decide to introduce a separate one for one direction so that
> you can subtract it from the total to get the other one, it's not
> uncommon to see negative numbers due to rounding errors or timings,
> and that's extremely confusing for those who deal with that. Thus I
> prefer to let users sum data on their side rather than providing
> totals and separate ones later.
 
That's a good point.

> > We could
> > have had Request + Response bytes there, but I think it's less useful.
> > So I added the counters but do not expose them for requests yet, as I'm
> > unsure where to do that exactly; that can easily be addressed with a
> > separate commit.
> 
> Yes, we can have a look later. I think we'll add new metrics in the
> stats that will appear in a new tooltip on the bytes/in column of the
> stats page, and that will be OK.
 
Yeah I was considering something like this, but wasn't so sure.
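
(If I read the current tooltip right, the "79%" above is Compression out
divided by Compression in, i.e. 93277104103 / 117157038990, and the "5%"
is the bytes saved relative to the total response bytes, i.e.
(117157038990 - 93277104103) / 468197882863 = 23879934887 / 468197882863.
The per-direction counters should keep exactly the same meaning.)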

> Just a few tiny cosmetic comments below:

[Took care of unaligned comments]

> > From: Olivier Houchard 
> > Date: Thu, 6 Apr 2023 00:33:48 +0200
> > Subject: [PATCH 5/5] MEDIUM: compression: Make it so we can compress 
> > requests
> >  as well.
> (...)
> > diff --git a/doc/configuration.txt b/doc/configuration.txt
> > index 3468c78f3..63e8d3440 100644
> > --- a/doc/configuration.txt
> > +++ b/doc/configuration.txt
> > @@ -5099,7 +5111,15 @@ compression offload
> >If this setting is used in a defaults section, a warning is emitted and 
> > the
> >option is ignored.
> >  
> > -  See also : "compression type", "compression algo"
> > +  See also : "compression type", "compression algo", "compression "
>  ^^^
> A word is missing here, I suspect it was "direction" since that one
> references "offload" which is where we are.
 
Ahahah yeah, pretty sure I removed "tocompress" to replace it with
"direction", but I was probably distracted before I did so, and forgot.

> OK otherwise it looks good to me. I suggest you adjust these cosmetic
> details and directly push it.
 
Done, thanks!

> I have one question, by the way: how did you manage to test this?
> Did you find a server that supports decompressing requests, or do you
> only have some in-house stuff for this? I'm wondering if we could
> manage to make a regtest for this, at least one that puts some expect
> rules on the content-encoding header.

My test server was actually an nginx with custom Lua code to decompress
stuff. Maybe we can do something similar with haproxy?
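
For the curious, such a setup can be as simple as something like this
(illustrative, untested sketch, assuming OpenResty with the lua-zlib
binding available; the names are made up):

    # upload sink that inflates gzip-encoded request bodies
    location /upload {
        content_by_lua_block {
            ngx.req.read_body()
            -- may be nil if the body was spooled to a temp file;
            -- good enough for small test payloads
            local body = ngx.req.get_body_data() or ""
            if ngx.req.get_headers()["content-encoding"] == "gzip" then
                local zlib = require("zlib")
                -- lua-zlib's default inflate stream auto-detects
                -- gzip/zlib headers
                body = zlib.inflate()(body)
            end
            ngx.say("got ", #body, " bytes after decompression")
        }
    }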

Regards,

Olivier



Re: [PATCH] Add the possibility to compress requests

2023-04-06 Thread Willy Tarreau
On Thu, Apr 06, 2023 at 01:12:09AM +0200, Olivier Houchard wrote:
> > Also I don't understand how you can have different algos for each
> > direction, since the config language only allows you to define
> > "compression algo" and "compression tocompress", so you cannot
> > (apparently) have a situation where the two algos differ. I think it
> > can make sense to use different algos for each direction (or at least
> > different compression settings, e.g. expensive compression for uploads
> > relayed over expensive links, little compression for downloads that
> > possibly come from a cache).
>  
> The idea was that while only one algo could be used for requests, the
> response algorithm would depend on what was supported by the client.

Ah yes indeed.
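
So with the syntax of this patch series it would look roughly like this
(sketch, assuming "tocompress" is renamed to "direction" as suggested
below):

    frontend www
        bind :8080
        # requests are compressed with the single configured algo;
        # for responses the algo still depends on what the client
        # advertised in Accept-Encoding
        compression direction both
        compression algo gzip
        compression type text/html text/plain application/json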

> > > -static int set_compression_response_header(struct comp_state *st,
> > > -                                           struct stream *s,
> > > -                                           struct http_msg *msg);
> > > +static int set_compression_header(struct comp_state *st,
> > > +                                  struct stream *s,
> > > +                                  struct http_msg *msg,
> > > +                                  int for_response);
> >   
> > For this one as well I would call it "direction" (or "dir"), as we
> > already do in samples and a few other parts that work in both
> > directions.
>  
> Ok I was about to do that, but somebody from the haproxy team whom I
> won't name told me it was even better to just deduce the direction from
> the http_msg, so I did just that.

:-)  I thought the same but suspected that you needed it for some other
reason.

> > > +        /* limit compression rate */
> > > +        if (global.comp_rate_lim > 0)
> > > +                if (read_freq_ctr(&comp_bps_in) > global.comp_rate_lim)
> > > +                        goto fail;
> > 
> > I had a doubt regarding this one but I think it's indeed reasonable to
> > use the same global value for all directions as the goal was precisely
> > to be able to enforce a total limit on all compressed streams to limit
> > the resource impacts.
>  
> I think it makes sense, unless someone wants to give more priority to
> request compression vs response compression, or vice versa, but I'll
> rethink it if somebody gives me a valid use-case.

We'll see, but I strongly doubt this would ever happen, because those
who want to enforce speed limits are often hosting providers who don't
want a single site to eat all their CPU, so they configure maximum
compression bandwidths. It dates back to the era when it was still
common to start several daemons, which is no longer the case these
days. So if anything were to change in the short term, it would rather
be to have per-frontend and/or per-backend total limits rather than
splitting req/res, which still concern the same application.
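
That's the global "maxcomprate" setting, for reference; the value below
is only an example:

    global
        # total compression input budget, in kilobytes per second,
        # now shared by all compressed streams in both directions
        maxcomprate 10240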

> > > +        if (!(msg->chn->flags & CF_ISRESP)) {
> > > +                if (comp_flags & COMP_FL_COMPRESS_REQ) {
> > > +                        comp_prepare_compress_request(st, s, msg);
> > > +                        if (st->comp_algo_req) {
> > > +                                if (!set_compression_header(st, s, msg, 0))
> >   
> > ^^^
> > This zero is anything but explicit about the fact that it's the direction.
> > Maybe you should create an enum for this, or you can decide to pass
> > the compression flags (which could then be reused to pass other info
> > in the future) and you'd have COMP_FL_DIR_REQ here, which is way more
> > explicit.
>  
> Yes, this is fixed: I didn't use the flags but introduced
> COMP_DIR_REQ/COMP_DIR_RES, which nicely match the indexes of the
> corresponding arrays.

Cool, thanks!
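
So for the archives that gives something like this (sketch, not the
exact code from the patch):

    enum {
        COMP_DIR_REQ = 0,   /* request direction  */
        COMP_DIR_RES = 1,   /* response direction */
    };

    /* the constants double as indexes into the per-direction
     * fields, e.g.: st->comp_algo[COMP_DIR_REQ] */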

> > I have not seen anything updating the stats, so I suspect that request
> > and response compression stats are accounted as response stats, which
> > can be a problem, showing compression rates larger than 100%. If you
> > hover over your frontend's "Bytes / Out" fields, you'll see a yellow
> > box with numbers such as:
> > 
> >   Response bytes in:  468197882863
> >   Compression in:     117157038990
> >   Compression out:    93277104103 (79%)
> >   Compression bypass: 0
> >   Total bytes saved:  23879934887 (5%)
> > 
> > Here I'm figuring that we'll need to add the same for the upload, so
> > these are extra stats. I can possibly help locate where to place them,
> > but I'll need you to figure out where they are updated.
> 
> Ok so I've had mixed feelings about this, because I thought a global
> counter could be useful too. But ultimately I think it's better to keep
> them separated, if only so that the percentages make sense.

Yes, and another point is that when you start with a global one and
finally decide to introduce a separate one for one direction so that
you can subtract it from the total to get the other one, it's not
uncommon to see negative numbers due to rounding errors or timings,
and that's extremely confusing for those who deal with that. Thus I
prefer to let users sum data on their side rather than providing
totals and separate ones later.