Re: [PATCH] Add the possibility to compress requests

2023-04-06 Thread Willy Tarreau
On Fri, Apr 07, 2023 at 12:56:54AM +0200, Olivier Houchard wrote:
(...)
> > OK otherwise it looks good to me. I suggest you adjust these cosmetic
> > details and directly push it.
>  
> Done, thanks!

Thanks!

> > I'm having one question by the way: how did you manage to test this ?
> > Did you find a server that supports decompressing requests, or do you
> > only have some in-house stuff for this ? I'm wondering if we could manage
> > to make a regtest for this, at least by putting some expect rules on
> > the content-encoding header probably.
> 
> My test server was actually a nginx with custom LUA code to decompress
> stuff. Maybe we can do something similar with haproxy?

Maybe, yes. I'm not yet 100% sure how to do that from Lua, but maybe
by calling a Lua service that tries to decompress the data it could
work.

Thanks,
Willy



Re: [PATCH] Add the possibility to compress requests

2023-04-06 Thread Olivier Houchard
On Thu, Apr 06, 2023 at 09:17:34PM +0200, Willy Tarreau wrote:
> > > > +/* limit compression rate */
> > > > +   if (global.comp_rate_lim > 0)
> > > > +   if (read_freq_ctr(_bps_in) > 
> > > > global.comp_rate_lim)
> > > > +   goto fail;
> > > 
> > > I had a doubt regarding this one but I think it's indeed reasonable to
> > > use the same global value for all directions as the goal was precisely
> > > to be able to enforce a total limit on all compressed streams to limit
> > > the resource impacts.
> >  
> > I think it makes sense, unless someone wants to give more priority to
> > request compression vs response compression, or vice versa, but I'll
> > rethink it if somebody gives me a valid use-case.
> 
> We'll see, but I strongly doubt this would ever happen because those who
> want to enforce speed limits are often hosting providers who don't want
> that a single site eats all their CPU, so they configure maximum compression
> bandwidths. It dates back from the era where it was still common to start
> several daemons, which is no more the case these days. So actually if
> anything would change in the short term it would rather be to have
> per-frontend and/or per-backend total limits rather than splitting
> req/res which still concern the same application.
 
Yeah that makes sense.

> > > I have not seen anything updating the stats, so I suspect that request
> > > and response compression stats are accounted as response stats, which
> > > can be a problem showing compression rates larger than 100%. If you hover
> > > over your frontend's "Bytes / Out" fields, you'll see a yellow box with
> > > numbers such as:
> > > 
> > >   Response bytes in:  468197882863
> > >   Compression in: 117157038990
> > >   Compression out:93277104103 (79%)
> > >   Compression bypass: 0
> > >   Total bytes saved:  23879934887 (5%)
> > > 
> > > Here I'm figuring that we'll need to add the same for the upload, so
> > > these are extra stats. I can possibly help locate where to place them,
> > > but I'll need that you figure where they are updated.
> > 
> > Ok so I've had mixed feelings about this, because I thought a global
> > counter could be useful too. But ultimately I think it's better to keep
> > them separated, if only so that the percentages make sense.
> 
> Yes, and another point is that when you start with a global one and
> finally decide to introduce a separate one for one direction so that
> you can subtract it from the total to get the other one, it's not
> uncommon to see negative numbers due to rounding errors or timings,
> and that's extremely confusing for those who deal with that. Thus I
> prefer to let users sum data on their side rather than providing
> totals and separate ones later.
 
That's a good point.

> > We could
> > have had Request + Response bytes there, but I think it's less useful.
> > So I added the counters, but do not expose them for requests yet, as I'm
> > unsure where to do that exactly, but that can easily be addressed with a
> > separate commit.
> 
> Yes we can have a look later. I think we'll add new metric in the stats
> that will appear in a new tooltip on the bytes/in column of the stats
> page and that will be OK.
 
Yeah I was considering something like this, but wasn't so sure.

> Just a few tiny cosmetic comments below:

[Took care of unaligned comments]

> > From: Olivier Houchard 
> > Date: Thu, 6 Apr 2023 00:33:48 +0200
> > Subject: [PATCH 5/5] MEDIUM: compression: Make it so we can compress 
> > requests
> >  as well.
> (...)
> > diff --git a/doc/configuration.txt b/doc/configuration.txt
> > index 3468c78f3..63e8d3440 100644
> > --- a/doc/configuration.txt
> > +++ b/doc/configuration.txt
> > @@ -5099,7 +5111,15 @@ compression offload
> >If this setting is used in a defaults section, a warning is emitted and 
> > the
> >option is ignored.
> >  
> > -  See also : "compression type", "compression algo"
> > +  See also : "compression type", "compression algo", "compression "
>  ^^^
> A word is missing here, I suspect it was "direction" since that one
> references "offload" which is where we are.
 
Ahahah yeah, pretty sure I removed "tocompress" to replace it with
direction, but was probably distracted before I did so, and forgot.

> OK otherwise it looks good to me. I suggest you adjust these cosmetic
> details and directly push it.
 
Done, thanks!

> I'm having one question by the way: how did you manage to test this ?
> Did you find a server that supports decompressing requests, or do you
> only have some in-house stuff for this ? I'm wondering if we could manage
> to make a regtest for this, at least by putting some expect rules on
> the content-encoding header probably.

My test server was actually a nginx with custom LUA code to decompress
stuff. Maybe we can do something similar with haproxy?

Regards,

Olivier



Re: [PATCH] Add the possibility to compress requests

2023-04-06 Thread Willy Tarreau
On Thu, Apr 06, 2023 at 01:12:09AM +0200, Olivier Houchard wrote:
> > Also I don't understand how you can have different algos for each direction
> > since the config language allows you to define "compression algo" and
> > "compression tocompress" so you cannot (apparently) have a situation where
> > the two algos differ. I think it can make sense to use different algos for
> > each direction (at least compression settings, e.g. expensive compression
> > for uploads, relayed over expensive links, little compression for downloads
> > that possibly comes from a cache).
>  
> The idea was that while only one algo could be used for requests, the
> response algorithm would depend on what was supported by the client.

Ah yes indeed.

> > > -static int set_compression_response_header(struct comp_state *st,
> > > -struct stream *s,
> > > -struct http_msg *msg);
> > > +static int set_compression_header(struct comp_state *st,
> > > +   struct stream *s,
> > > +   struct http_msg *msg,
> > > +   int for_response);
> >   
> > For this one as well I would call it "direction" (or "dir"), as we
> > already do in samples and a few other parts that work in both
> > directions.
>  
> Ok I was about to do that, but somebody from the haproxy team whom I
> won't name told me it was even better to just deduce the direction from
> the http_msg, so I did just that.

:-)  I thought the same but suspected that you needed it for any other
reason.

> > > +/* limit compression rate */
> > > + if (global.comp_rate_lim > 0)
> > > + if (read_freq_ctr(_bps_in) > global.comp_rate_lim)
> > > + goto fail;
> > 
> > I had a doubt regarding this one but I think it's indeed reasonable to
> > use the same global value for all directions as the goal was precisely
> > to be able to enforce a total limit on all compressed streams to limit
> > the resource impacts.
>  
> I think it makes sense, unless someone wants to give more priority to
> request compression vs response compression, or vice versa, but I'll
> rethink it if somebody gives me a valid use-case.

We'll see, but I strongly doubt this would ever happen because those who
want to enforce speed limits are often hosting providers who don't want
that a single site eats all their CPU, so they configure maximum compression
bandwidths. It dates back from the era where it was still common to start
several daemons, which is no more the case these days. So actually if
anything would change in the short term it would rather be to have
per-frontend and/or per-backend total limits rather than splitting
req/res which still concern the same application.

> > > + if (!(msg->chn->flags & CF_ISRESP)) {
> > > + if (comp_flags & COMP_FL_COMPRESS_REQ) {
> > > + comp_prepare_compress_request(st, s, msg);
> > > + if (st->comp_algo_req) {
> > > + if (!set_compression_header(st, s, msg, 0))
> >   
> > ^^^
> > This zero is all but explicit about the fact that it's the direction.
> > Maybe you should create an enum for this, or you can decide to pass
> > the compression flags (which could then be reused to pass other info
> > in the future) and you'd have COMP_FL_DIR_REQ here, which is way more
> > explicit.
>  
> Yes this is fixed, I don't use the flag but introduced
> COMP_DIR_REQ/COMP_DIR_RES, which nicely matches the index of the
> corresponding arrays.

Cool, thanks!

> > I have not seen anything updating the stats, so I suspect that request
> > and response compression stats are accounted as response stats, which
> > can be a problem showing compression rates larger than 100%. If you hover
> > over your frontend's "Bytes / Out" fields, you'll see a yellow box with
> > numbers such as:
> > 
> >   Response bytes in:468197882863
> >   Compression in:   117157038990
> >   Compression out:  93277104103 (79%)
> >   Compression bypass:   0
> >   Total bytes saved:23879934887 (5%)
> > 
> > Here I'm figuring that we'll need to add the same for the upload, so
> > these are extra stats. I can possibly help locate where to place them,
> > but I'll need that you figure where they are updated.
> 
> Ok so I've had mixed feelings about this, because I thought a global
> counter could be useful too. But ultimately I think it's better to keep
> them separated, if only so that the percentages make sense.

Yes, and another point is that when you start with a global one and
finally decide to introduce a separate one for one direction so that
you can subtract it from the total to get the other one, it's not
uncommon to see negative numbers due to rounding errors or timings,
and that's extremely confusing for those who deal with that. Thus I

Re: [PATCH] Add the possibility to compress requests

2023-04-05 Thread Olivier Houchard
Hi,

On Tue, Apr 04, 2023 at 09:45:24PM +0200, Willy Tarreau wrote:
> Hi Olivier,
> 
> On Tue, Apr 04, 2023 at 12:29:15AM +0200, Olivier Houchard wrote:
> > Hi,
> > 
> > The attached patchset is a first attempt at adding the possibility to
> > compress requests, as well as responses.
> 
> This is pretty cool, I definitely see how this can be super useful :-)
 
:)

> > It adds a new keyword for compression, "tocompress" (any better
> > alternative name would be appreciated).
> 
> I would call it "direction" which I find more natural and understandable.
 
I like that a lot more. Thank you!

> > The valid values are "request",
> > if you want to compress requests, "response" if you want to compress
> > responses or "both", if you want to compress both.
> > The default is to compress responses only.
> 
> I have a few below (besides renaming "tocompress").
 
[...]

> > diff --git a/include/haproxy/compression-t.h 
> > b/include/haproxy/compression-t.h
> > index cdbf0d14e..d15d15ee3 100644
> > --- a/include/haproxy/compression-t.h
> > +++ b/include/haproxy/compression-t.h
> > @@ -37,6 +37,9 @@
> >  /* Compression flags */
> >  
> >  #define COMP_FL_OFFLOAD0x0001 /* Compression offload */
> > +#define COMP_FL_COMPRESS_REQ   0x0002 /* Compress requests */
> > +#define COMP_FL_COMPRESS_RES   0x0004 /* Compress responses */
> 
> So these ones would be COMP_FL_DIR_{REQ,RES}.
 
Done!

> >  struct comp {
> > struct comp_algo *algos;
> > struct comp_type *types;
> > diff --git a/src/flt_http_comp.c b/src/flt_http_comp.c
> > index 507009692..7680e241f 100644
> > --- a/src/flt_http_comp.c
> > +++ b/src/flt_http_comp.c
> > @@ -33,7 +33,9 @@ struct flt_ops comp_ops;
> >  
> >  struct comp_state {
> > struct comp_ctx  *comp_ctx;   /* compression context */
> > +   struct comp_ctx  *comp_ctx_req; /* compression context for request */
> > struct comp_algo *comp_algo;  /* compression algorithm if not NULL */
> > +   struct comp_algo *comp_algo_req; /* Algo to be used for request */
> > unsigned int  flags;  /* COMP_STATE_* */
> 
> I find it confusing to have comp_ctx and comp_ctx_req (and same for algo).
> I'd rather see a preliminary patch renaming comp_ctx to comp_ctx_res and
> comp_algo to comp_algo_res (just perform a mechanical rename). Or you could
> also have an array [2] for each, which will even remove some ifs in the
> code.
 
Ok I made that an array.

> Also I don't understand how you can have different algos for each direction
> since the config language allows you to define "compression algo" and
> "compression tocompress" so you cannot (apparently) have a situation where
> the two algos differ. I think it can make sense to use different algos for
> each direction (at least compression settings, e.g. expensive compression
> for uploads, relayed over expensive links, little compression for downloads
> that possibly comes from a cache).
 
The idea was that while only one algo could be used for requests, the
response algorithm would depend on what was supported by the client.

> Thus maybe an approach could be to use the algo to indicate the direction
> at the same time:
> 
>compression algo foobar   => sets response only, for compatibility
>compression algo-res foobar   => sets response only, explicit syntax
>compression algo-req foobar   => sets request only, explicit syntax
> 
> Then there's probably no need to have a keyword to set it at once in
> both directions since both actions are totally independent and it's
> probably not desired to encourage users to conflate the two directions
> in a single keyword.
> 
> You might need the same for "compression types" I think, so that you
> don't try to compress regular forms or login:password that will confuse
> applications, but only large data.
 
But I did that, so now there's even more reason to have separate
algorithms.

> > -static int set_compression_response_header(struct comp_state *st,
> > -  struct stream *s,
> > -  struct http_msg *msg);
> > +static int set_compression_header(struct comp_state *st,
> > + struct stream *s,
> > + struct http_msg *msg,
> > + int for_response);
>   
> For this one as well I would call it "direction" (or "dir"), as we
> already do in samples and a few other parts that work in both
> directions.
 
Ok I was about to do that, but somebody from the haproxy team whom I
won't name told me it was even better to just deduce the direction from
the http_msg, so I did just that.

> > +static void 
> > +comp_prepare_compress_request(struct comp_state *st, struct stream *s, 
> > struct http_msg *msg)
> > +{
> > +   struct htx *htx = htxbuf(>chn->buf);
> > +   struct http_txn *txn = s->txn;
> > +   struct http_hdr_ctx ctx;
> > +
> > +   ctx.blk = NULL;
> > +   /* 

Re: [PATCH] Add the possibility to compress requests

2023-04-04 Thread Willy Tarreau
Hi Olivier,

On Tue, Apr 04, 2023 at 12:29:15AM +0200, Olivier Houchard wrote:
> Hi,
> 
> The attached patchset is a first attempt at adding the possibility to
> compress requests, as well as responses.

This is pretty cool, I definitely see how this can be super useful :-)

> It adds a new keyword for compression, "tocompress" (any better
> alternative name would be appreciated).

I would call it "direction" which I find more natural and understandable.

> The valid values are "request",
> if you want to compress requests, "response" if you want to compress
> responses or "both", if you want to compress both.
> The default is to compress responses only.

I have a few below (besides renaming "tocompress").

> From 77a5657592338db2ed53189c50828b3c0ed52716 Mon Sep 17 00:00:00 2001
> From: Olivier Houchard 
> Date: Tue, 4 Apr 2023 00:14:36 +0200
> Subject: [PATCH 2/2] MEDIUM: compression: allow to compress requests too.
> 
> Make it so we can compress requests, as well as responses.
> A new compress keyword is added, "tocompress". It takes exactly one
> argument. The valid values for that arguments are "response", if we want
> to compress only responses, "request", if we want to compress only
> requests, or "both", if we want to compress both.
> The dafault value is "response".
(...)
> diff --git a/include/haproxy/compression-t.h b/include/haproxy/compression-t.h
> index cdbf0d14e..d15d15ee3 100644
> --- a/include/haproxy/compression-t.h
> +++ b/include/haproxy/compression-t.h
> @@ -37,6 +37,9 @@
>  /* Compression flags */
>  
>  #define COMP_FL_OFFLOAD  0x0001 /* Compression offload */
> +#define COMP_FL_COMPRESS_REQ 0x0002 /* Compress requests */
> +#define COMP_FL_COMPRESS_RES 0x0004 /* Compress responses */

So these ones would be COMP_FL_DIR_{REQ,RES}.

>  struct comp {
>   struct comp_algo *algos;
>   struct comp_type *types;
> diff --git a/src/flt_http_comp.c b/src/flt_http_comp.c
> index 507009692..7680e241f 100644
> --- a/src/flt_http_comp.c
> +++ b/src/flt_http_comp.c
> @@ -33,7 +33,9 @@ struct flt_ops comp_ops;
>  
>  struct comp_state {
>   struct comp_ctx  *comp_ctx;   /* compression context */
> + struct comp_ctx  *comp_ctx_req; /* compression context for request */
>   struct comp_algo *comp_algo;  /* compression algorithm if not NULL */
> + struct comp_algo *comp_algo_req; /* Algo to be used for request */
>   unsigned int  flags;  /* COMP_STATE_* */

I find it confusing to have comp_ctx and comp_ctx_req (and same for algo).
I'd rather see a preliminary patch renaming comp_ctx to comp_ctx_res and
comp_algo to comp_algo_res (just perform a mechanical rename). Or you could
also have an array [2] for each, which will even remove some ifs in the
code.

Also I don't understand how you can have different algos for each direction
since the config language allows you to define "compression algo" and
"compression tocompress" so you cannot (apparently) have a situation where
the two algos differ. I think it can make sense to use different algos for
each direction (at least compression settings, e.g. expensive compression
for uploads, relayed over expensive links, little compression for downloads
that possibly comes from a cache).

Thus maybe an approach could be to use the algo to indicate the direction
at the same time:

   compression algo foobar   => sets response only, for compatibility
   compression algo-res foobar   => sets response only, explicit syntax
   compression algo-req foobar   => sets request only, explicit syntax

Then there's probably no need to have a keyword to set it at once in
both directions since both actions are totally independent and it's
probably not desired to encourage users to conflate the two directions
in a single keyword.

You might need the same for "compression types" I think, so that you
don't try to compress regular forms or login:password that will confuse
applications, but only large data.

> -static int set_compression_response_header(struct comp_state *st,
> -struct stream *s,
> -struct http_msg *msg);
> +static int set_compression_header(struct comp_state *st,
> +   struct stream *s,
> +   struct http_msg *msg,
> +   int for_response);
  
For this one as well I would call it "direction" (or "dir"), as we
already do in samples and a few other parts that work in both
directions.

> +static void 
> +comp_prepare_compress_request(struct comp_state *st, struct stream *s, 
> struct http_msg *msg)
> +{
> + struct htx *htx = htxbuf(>chn->buf);
> + struct http_txn *txn = s->txn;
> + struct http_hdr_ctx ctx;
> +
> + ctx.blk = NULL;
> + /* Already compressed, don't bother */
> + if (http_find_header(htx, ist("Content-Encoding"), , 1))
> + return;
> + /* HTTP < 

[PATCH] Add the possibility to compress requests

2023-04-03 Thread Olivier Houchard
Hi,

The attached patchset is a first attempt at adding the possibility to
compress requests, as well as responses.
It adds a new keyword for compression, "tocompress" (any better
alternative name would be appreciated). The valid values are "request",
if you want to compress requests, "response" if you want to compress
responses or "both", if you want to compress both.
The default is to compress responses only.

Any comment is more than welcome.

Thanks!

Olivier
>From 6c3e62baa888359521091387ce6ac8376a001259 Mon Sep 17 00:00:00 2001
From: Olivier Houchard 
Date: Mon, 3 Apr 2023 22:22:24 +0200
Subject: [PATCH 1/2] MINOR: compression: Make compression offload a flag

Turn compression offload into a flag in struct comp, instead of using
an int just for it.
---
 include/haproxy/compression-t.h | 5 -
 src/flt_http_comp.c | 6 +++---
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/include/haproxy/compression-t.h b/include/haproxy/compression-t.h
index 062f17f76..cdbf0d14e 100644
--- a/include/haproxy/compression-t.h
+++ b/include/haproxy/compression-t.h
@@ -34,10 +34,13 @@
 
 #include 
 
+/* Compression flags */
+
+#define COMP_FL_OFFLOAD		0x0001 /* Compression offload */
 struct comp {
 	struct comp_algo *algos;
 	struct comp_type *types;
-	unsigned int offload;
+	unsigned int flags;
 };
 
 struct comp_ctx {
diff --git a/src/flt_http_comp.c b/src/flt_http_comp.c
index ceda3fd5e..507009692 100644
--- a/src/flt_http_comp.c
+++ b/src/flt_http_comp.c
@@ -451,8 +451,8 @@ select_compression_request_header(struct comp_state *st, struct stream *s, struc
 
 	/* remove all occurrences of the header when "compression offload" is set */
 	if (st->comp_algo) {
-		if ((s->be->comp && s->be->comp->offload) ||
-		(strm_fe(s)->comp && strm_fe(s)->comp->offload)) {
+		if ((s->be->comp && (s->be->comp->flags & COMP_FL_OFFLOAD)) ||
+		(strm_fe(s)->comp && (strm_fe(s)->comp->flags & COMP_FL_OFFLOAD))) {
 			http_remove_header(htx, );
 			ctx.blk = NULL;
 			while (http_find_header(htx, ist("Accept-Encoding"), , 1))
@@ -690,7 +690,7 @@ parse_compression_options(char **args, int section, struct proxy *proxy,
   args[0], args[1]);
 			ret = 1;
 		}
-		comp->offload = 1;
+		comp->flags |= COMP_FL_OFFLOAD;
 	}
 	else if (strcmp(args[1], "type") == 0) {
 		int cur_arg = 2;
-- 
2.39.1

>From 77a5657592338db2ed53189c50828b3c0ed52716 Mon Sep 17 00:00:00 2001
From: Olivier Houchard 
Date: Tue, 4 Apr 2023 00:14:36 +0200
Subject: [PATCH 2/2] MEDIUM: compression: allow to compress requests too.

Make it so we can compress requests, as well as responses.
A new compress keyword is added, "tocompress". It takes exactly one
argument. The valid values for that arguments are "response", if we want
to compress only responses, "request", if we want to compress only
requests, or "both", if we want to compress both.
The dafault value is "response".
---
 doc/configuration.txt   |  12 ++-
 include/haproxy/compression-t.h |   3 +
 src/flt_http_comp.c | 165 +++-
 3 files changed, 154 insertions(+), 26 deletions(-)

diff --git a/doc/configuration.txt b/doc/configuration.txt
index 586689ca2..8f86b285e 100644
--- a/doc/configuration.txt
+++ b/doc/configuration.txt
@@ -5077,7 +5077,7 @@ compression type  ...
 compression algo gzip
 compression type text/html text/plain
 
-  See also : "compression offload"
+  See also : "compression offload", "compression tocompress"
 
 compression offload
   Makes HAProxy work as a compression offloader only.
@@ -5099,7 +5099,15 @@ compression offload
   If this setting is used in a defaults section, a warning is emitted and the
   option is ignored.
 
-  See also : "compression type", "compression algo"
+  See also : "compression type", "compression algo", "compression tocompress"
+
+compression tocompress 
+  Makes haproxy able to compress both requests and responses.
+  Valid values are "request", to compress only requests, "response", to
+  compress only responses, or "both", when you want to compress both.
+  The default value is "response".
+
+  See also : "compression type", "compression algo", "compression offload"
 
 cookie  [ rewrite | insert | prefix ] [ indirect ] [ nocache ]
   [ postonly ] [ preserve ] [ httponly ] [ secure ]
diff --git a/include/haproxy/compression-t.h b/include/haproxy/compression-t.h
index cdbf0d14e..d15d15ee3 100644
--- a/include/haproxy/compression-t.h
+++ b/include/haproxy/compression-t.h
@@ -37,6 +37,9 @@
 /* Compression flags */
 
 #define COMP_FL_OFFLOAD		0x0001 /* Compression offload */
+#define COMP_FL_COMPRESS_REQ	0x0002 /* Compress requests */
+#define COMP_FL_COMPRESS_RES	0x0004 /* Compress responses */
+
 struct comp {
 	struct comp_algo *algos;
 	struct comp_type *types;
diff --git a/src/flt_http_comp.c b/src/flt_http_comp.c
index 507009692..7680e241f 100644
--- a/src/flt_http_comp.c
+++ b/src/flt_http_comp.c
@@ -33,7 +33,9 @@ struct flt_ops