Re: [PATCH] MINOR: sample: Add a regmatch converter
On Tue, Apr 13, 2021 at 10:52:15PM -0600, Thayne McCombs wrote:
> On 4/13/21 3:44 AM, Willy Tarreau wrote:
> > I'm failing to see how it differs from regsub() which already does the
> > same with the reference (you'd write \1 instead of 1) and also allows to
> > compose something out of multiple matches. Am I missing something, or a
> > case where regsub() is really not convenient compared to this one ?
> >
> > Thanks,
> > Willy
>
> Mostly just convenience. It is possible to accomplish the same thing using
> something like `'regsub(".*(mypattern).*","\1")'`, but you need to remember
> to include the ".*" on both sides.
>
> I also suspect (although I haven't benchmarked it) that regmatch would
> perform better than the equivalent regsub.
>
> That said, I do recognize that this doesn't add any completely new
> functionality, and I will not be offended at all if you don't think it is
> worth merging.

OK thanks for explaining. I'm not particularly against it, I'm fine if
others see some value in it. Does anyone have any opinion ? Lukas, given
you're the one dealing with the most user help requests on Discourse, do
you think it could simplify certain configs or help users spot the right
usage faster maybe ?

Thanks,
Willy
Re: [PATCH] MINOR: sample: Add a regmatch converter
On 4/13/21 3:44 AM, Willy Tarreau wrote:
> I'm failing to see how it differs from regsub() which already does the
> same with the reference (you'd write \1 instead of 1) and also allows to
> compose something out of multiple matches. Am I missing something, or a
> case where regsub() is really not convenient compared to this one ?
>
> Thanks,
> Willy

Mostly just convenience. It is possible to accomplish the same thing using
something like `'regsub(".*(mypattern).*","\1")'`, but you need to remember
to include the ".*" on both sides.

I also suspect (although I haven't benchmarked it) that regmatch would
perform better than the equivalent regsub.

That said, I do recognize that this doesn't add any completely new
functionality, and I will not be offended at all if you don't think it is
worth merging.
Re: [PATCH] MINOR: sample: add json_string
On Wed, Apr 14, 2021 at 03:02:20AM +0200, Aleksandar Lazic wrote:
> > But then, could it make sense to also support "strict integers": values
> > that can accurately be represented as integers and which are within the
> > JSON valid range for integers (-2^53 to 2^53 with no decimal part) ?
> > This would then make the converter return nothing in case of violation
> > (i.e. risk of losing precision). This would also reject NaN and infinite
> > values that the lib supports.
>
> You mean the same check which is done in arith_add().

Not exactly, because arith_add only checks for overflows after addition and
tries to cap the result. I'd rather just say that if the decoded number is
<= -2^53 or >= 2^53 then the converter should return a no-match in case an
integer was requested.

> > Ah, that's not your fault, but now I'm seeing that we already have a
> > converter inappropriately called "json", so we don't even know in which
> > direction it works just by looking at its name :-( Same issue as for
> > base64.
> >
> > May I suggest that you call yours "json_decode" or maybe shorter
> > "json_dec" so that it's more explicit that it's the decode one ? Because
> > for me "json_string" is the one that will emit a json string from some
> > input (which it is not). Then we could later create "json_enc" and warn
> > when "json" alone is used. Or even "jsdec" and "jsenc" which are much
> > shorter and still quite explicit.
>
> How about "json_query", because it's exactly what it does :-)

I'm not familiar with the notion of a "query" to decode and extract
contents, but I'm not the most representative user and I am aware of the
"jq" command-line utility that does this. So if it sounds natural to others
I'm fine with this.

> > I'm seeing that there's a very nice mjson_find() which does *exactly*
> > what you need:
> >
> >   "In a JSON string s, len, find an element by its JSONPATH path. Save
> >   found element in tokptr, toklen. If not found, return
> >   MJSON_TOK_INVALID. If found, return one of: MJSON_TOK_STRING,
> >   MJSON_TOK_NUMBER, MJSON_TOK_TRUE, MJSON_TOK_FALSE, MJSON_TOK_NULL,
> >   MJSON_TOK_ARRAY, MJSON_TOK_OBJECT. If a searched key contains ., [ or
> >   ] characters, they should be escaped by a backslash."
> >
> > So you get the type in return. I think you can then call one of the
> > related functions depending on what is found, which is more reliable
> > than iterating over multiple attempts.
>
> Oh yes, this sounds like a better approach.
> I have now used this suggestion and I hope you can help me to fix the
> double parsing issue, or is it acceptable to parse the input twice?

From what I've seen in the code in the lib you have no other option. I
thought it might be possible to call mjson_get_string() on the resulting
pointer, but you would need to find a way to express that you want to
extract the immediate content, maybe by having an empty key designation or
something like this. This point is not clear to me, and the unit tests in
the project all re-parse the input string after mjson_find(), so probably
this is the way to do it.

> The check function handles the int arg now as suggested.
>
> ```
> /* This function checks the "json_query" converter's arguments. */
> static int sample_check_json_query(struct arg *arg, struct sample_conv *conv,
>                                    const char *file, int line, char **err)
> {
> 	if (arg[0].data.str.data == 0) { /* empty */
> 		memprintf(err, "json_path must not be empty");
> 		return 0;
> 	}
>
> 	if (arg[1].data.str.data != 0) {
> 		if (strncmp(arg[1].data.str.area, "int", sizeof("int")) != 0) {
> 			memprintf(err, "output_type only supports \"int\" as argument");
> 			return 0;
> 		} else {
> 			arg[1].type = ARGT_SINT;
> 			arg[1].data.sint = 0;

OK!

> I use now the token enum but I have some trouble handling the C types
> properly.

OK that's better, but be careful about this:

> 	trash->data = mjson_get_string(smp->data.u.str.area,
> 	                               smp->data.u.str.data,
> 	                               args[0].data.str.area,
> 	                               trash->area, trash->size);
> 	if (trash->data == -1 ) {

You're sending the result into trash->data, whose type you don't know, and
expect it to match against -1. I'd rather store the result into a temporary
variable of the same type as the function's return (int), test it, and only
if it is not -1, assign this value to the return sample's length.

Willy
Re: About the 'Hot Restarts' of haproxy
On Wed, Apr 14, 2021 at 02:03:52AM +, Rmrf99 wrote:
> Thanks Chris, Willy! This makes it clear to me now.
> Glad to see the dynamic add/remove servers feature under development.

If you're interested in this feature, please try the latest 2.4-dev: you
already have the "add server" feature on the command line, you just need to
enable "experimental-mode on" before accessing the command.

Willy
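For reference, the experimental runtime command mentioned above looks roughly like this on 2.4-dev; the stats socket path and the backend/server names are placeholders for whatever your configuration defines:

```
# enable experimental commands for this CLI session, then add a server
$ echo "experimental-mode on; add server mybackend/newsrv 192.0.2.10:8080" | \
      socat stdio /var/run/haproxy.sock
```

This requires a `stats socket` declared in the global section with a sufficient admin level.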
Re: About the 'Hot Restarts' of haproxy
Thanks Chris, Willy! This makes it clear to me now.
Glad to see the dynamic add/remove servers feature under development.

‐‐‐ Original Message ‐‐‐
On Wednesday, April 14, 2021 1:19 AM, Christopher Faulet wrote:

> Le 13/04/2021 à 18:15, John Lauro a écrit :
> > Sounds like the biggest part of hot restarts is the cost of leaving the
> > old process running as they have a lot of long running TCP connections,
> > and if you do a lot of restarts the memory requirements build up. Not
> > much of an issue for short lived http requests (although it would be
> > nice if keep alive wasn't followed on the old haproxy processes so they
> > could die quicker).
>
> Well, AFAIK, Envoy handles hot restarts exactly the same way HAProxy
> does. The old process tries to finish processing in-flight requests. The
> connections are not kept alive. The old process closes all idle
> connections, but it must still wait for the responses to already started
> requests. Both Envoy and HAProxy do it this way. Note there is an option
> to kill the old process after a given time.
>
> The article is a bit biased and inaccurate because it suggests HAProxy
> does not support hot restarts while Envoy does. In fact, the real
> difference here is the ability to dynamically add or remove servers with
> Envoy. Thanks to this feature, most of the time there is no reason to
> restart it. Thus, hot restarts are not an issue anymore. On the HAProxy
> side, as Willy said, this feature is under development.
>
> --
> Christopher Faulet
Re: [PATCH] MINOR: sample: add json_string
On 13.04.21 11:26, Willy Tarreau wrote:
> Hi Aleks,
>
> On Mon, Apr 12, 2021 at 10:09:08PM +0200, Aleksandar Lazic wrote:
> > Hi.
> >
> > Another patch which honors the feedback.
>
> Thank you. FWIW I agree with all the points reported by Tim. I'll add a
> few comments and/or suggestions below. On a general note, please be
> careful about your indenting, as it can very quickly become a total mess.
> Similarly please pay attention not to leave trailing spaces that make Git
> complain:
>
>   Applying: MINOR: sample: converter: add JSON Path handling
>   .git/rebase-apply/patch:39: trailing whitespace.
>    - number : When the JSON value is a number then will the value be
>   .git/rebase-apply/patch:40: trailing whitespace.
>        converted to a string. If you know that the value is a
>   .git/rebase-apply/patch:41: trailing whitespace.
>        integer then can you help haproxy to convert the value
>   .git/rebase-apply/patch:46: trailing whitespace.
>    This converter extracts the value located at from the JSON
>   .git/rebase-apply/patch:47: trailing whitespace.
>    string in the input value.
>   warning: squelched 10 whitespace errors
>   warning: 15 lines add whitespace errors.
>
> All these lines are easily noticed this way:
>
>   $ git show | grep -c '^+.*\s$'
>   15
>
> A good way to avoid this once and for all is to enable colors in Git and
> to always make sure not to leave red areas in "git diff" or "git show":
>
>   $ git config --global color.ui true

Cool tip, I have set it now.

> And even if it's of low importance for the code itself, it's particularly
> important in a review because such cosmetic issues constantly remind the
> reader that the patch is far from being final, so it's possibly not yet
> the moment to focus on not critically important stuff. Thus in the end
> they increase the number of round trips.

Thanks. I will take care about it.

> > The doc will be enhanced, but I have a question about this sequence. It
> > should write the double value to the string, but I think I have some
> > issue here.
> >
> > ```
> > printf("\n>>>DOUBLE rc:%d: double:%f:\n", rc, double_val);
> > trash->size = snprintf(trash->area, trash->data, "%g", double_val);
> > smp->data.u.str = *trash;
> > smp->data.type = SMP_T_STR;
> > ```
>
> Yeah, as Tim mentioned, you mixed size and data. "data" is the amount of
> data bytes used in a chunk. "size" is its allocated size.

Fixed now, you can see it in the snippet below.

From 8cb1bc4aaedd17c7189d4985a57f662ab1b533a4 Mon Sep 17 00:00:00 2001
From: Aleksandar Lazic
Date: Mon, 12 Apr 2021 22:01:04 +0200
Subject: [PATCH] MINOR: sample: converter: add JSON Path handling

> > With json_path can a JSON value be extracted from a header or body
>
> In the final version, please add a few more lines to describe the name of
> the added converter and what it's used for. As a reminder, think that
> you're trying to sell your artwork to me or anyone else who would make you
> proud by backporting your work into their version :-)

Will do it :-)

> > +json_query(<json_path>,[<output_type>])
> > +  The <json_path> is mandatory.
> > +  By default, the following JSON types are recognized.
> > +   - string : This is the default search type and returns a string;
> > +   - number : When the JSON value is a number then the value will be
> > +              converted to a string. If you know that the value is an
> > +              integer then you can help haproxy to convert the value
> > +              to an integer when you add "sint" to the <output_type>;
>
> Just thinking out loud, I looked at the rest of the doc and noticed we
> never mention "sint" anywhere else, so I think it's entirely an internal
> type. However we do mention "int" which is used as the matching method
> for integers, so we could have:
>
>   ... json_query("blah",sint) -m int 12
>
> As such I would find it more natural to call this type "int" so that it
> matches the same as the one used in the match. Maps already use "int" as
> the output type name by the way.
>
> In any case, I too am a bit confused by the need to force an output type.
> As a user, I'd expect the type to be implicit and not to have to know
> about it in the configuration. Of course we can imagine situations where
> we'd want to force the type (like we sometimes do by adding 0 or
> concatenating an empty string for example) but this is still not very
> clear to me if we want it by default. Or maybe when dealing with floats
> where we'd have to decide whether to emit them verbatim as strings or to
> convert them to integers.
>
> But then, could it make sense to also support "strict integers": values
> that can accurately be represented as integers and which are within the
> JSON valid range for integers (-2^53 to 2^53 with no decimal part) ?
> This would then make the converter return nothing in case of violation
> (i.e. risk of losing precision). This would also reject NaN and infinite
> values that the lib supports.

You mean the same check which is done in arith_add().
stable-bot: Bugfixes waiting for a release 2.3 (5), 2.1 (14)
Hi,

This is a friendly bot that watches fixes pending for the next haproxy-stable
release! One such e-mail is sent periodically once patches are waiting in the
last maintenance branch, and an ideal release date is computed based on the
severity of these fixes and their merge date. Responses to this mail must be
sent to the mailing list.

Last release 2.3.9 was issued on 2021-03-30. There are currently 5 patches in
the queue cut down this way:
    - 5 MINOR, first one merged on 2021-03-31

Thus the computed ideal release date for 2.3.10 would be 2021-04-28, which is
in two weeks or less.

Last release 2.1.12 was issued on 2021-03-18. There are currently 14 patches
in the queue cut down this way:
    - 2 MAJOR, first one merged on 2021-04-09
    - 7 MEDIUM, first one merged on 2021-03-31
    - 5 MINOR, first one merged on 2021-03-31

Thus the computed ideal release date for 2.1.13 would be 2021-05-07, which is
in four weeks or less.

The current list of patches in the queue is:
    - 2.1      - MAJOR  : dns: fix null pointer dereference in snr_update_srv_status
    - 2.1      - MAJOR  : dns: disabled servers through SRV records never recover
    - 2.1      - MEDIUM : thread: Fix a deadlock if an isolated thread is marked as harmless
    - 2.1      - MEDIUM : freq_ctr/threads: use the global_now_ms variable
    - 2.1      - MEDIUM : debug/lua: Use internal hlua function to dump the lua traceback
    - 2.1      - MEDIUM : resolvers: Don't release resolution from a requester callbacks
    - 2.1      - MEDIUM : time: make sure to always initialize the global tick
    - 2.1      - MEDIUM : mux-h1: make h1_shutw_conn() idempotent
    - 2.1      - MEDIUM : lua: Always init the lua stack before referencing the context
    - 2.1      - MINOR  : stats: Apply proper styles in HTML status page.
    - 2.1, 2.3 - MINOR  : tcp: fix silent-drop workaround for IPv6
    - 2.1, 2.3 - MINOR  : http_fetch: make hdr_ip() resistant to empty fields
    - 2.3      - MINOR  : ssl: Fix update of default certificate
    - 2.3      - MINOR  : ssl: Add missing free on SSL_CTX in ckch_inst_free
    - 2.1      - MINOR  : http_fetch: make hdr_ip() reject trailing characters
    - 2.1      - MINOR  : resolvers: Unlink DNS resolution to set RMAINT on SRV resolution
    - 2.3      - MINOR  : ssl: Prevent removal of crt-list line if the instance is a default one

--
The haproxy stable-bot is freely provided by HAProxy Technologies to help
improve the quality of each HAProxy release. If you have any issue with these
emails or if you want to suggest some improvements, please post them on the
list so that the solutions suiting the most users can be found.
Re: About the 'Hot Restarts' of haproxy
Le 13/04/2021 à 18:15, John Lauro a écrit :
> Sounds like the biggest part of hot restarts is the cost of leaving the
> old process running as they have a lot of long running TCP connections,
> and if you do a lot of restarts the memory requirements build up. Not
> much of an issue for short lived http requests (although it would be nice
> if keep alive wasn't followed on the old haproxy processes so they could
> die quicker).

Well, AFAIK, Envoy handles hot restarts exactly the same way HAProxy does.
The old process tries to finish processing in-flight requests. The
connections are not kept alive. The old process closes all idle
connections, but it must still wait for the responses to already started
requests. Both Envoy and HAProxy do it this way. Note there is an option to
kill the old process after a given time.

The article is a bit biased and inaccurate because it suggests HAProxy does
not support hot restarts while Envoy does. In fact, the real difference
here is the ability to dynamically add or remove servers with Envoy. Thanks
to this feature, most of the time there is no reason to restart it. Thus,
hot restarts are not an issue anymore. On the HAProxy side, as Willy said,
this feature is under development.

--
Christopher Faulet
Re: About the 'Hot Restarts' of haproxy
On Tue, Apr 13, 2021 at 12:15:33PM -0400, John Lauro wrote:
> Sounds like the biggest part of hot restarts is the cost of leaving the
> old process running as they have a lot of long running TCP connections,
> and if you do a lot of restarts the memory requirements build up. Not
> much of an issue for short lived http requests (although it would be nice
> if keep alive wasn't followed on the old haproxy processes so they could
> die quicker).

Exactly, which is why we've been working towards adding unlimited servers,
as this is essentially the only reason for restarting nowadays, e.g. when
you need to add more servers than you had initially planned and have
already maxed out your server-template reserve :-)

Willy
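The server-template reserve Willy refers to pre-provisions server slots in the configuration so that they can be filled in later without a restart. A minimal example — the backend name and address are placeholders:

```
backend app
    balance roundrobin
    # reserve 10 slots named srv1..srv10; unused slots start disabled and
    # can be populated later (e.g. via DNS service discovery or the
    # runtime API's "set server" commands)
    server-template srv 10 203.0.113.1:80 check disabled
```

Once the reserve is exhausted, the only remedy before dynamic server addition was a reload, which is the restart cost being discussed.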
Re: [RFC PATCH 4/8] MINOR: uri_normalizer: Add a `sort-query` normalizer
Le 13/04/2021 à 18:05, Tim Düsterhus a écrit :
> Christopher,
>
> On 4/13/21 4:59 PM, Christopher Faulet wrote:
> > > +/* Sorts the parameters within the given query string. Returns an ist
> > > + * containing the new path and backed by `trash`, or IST_NULL if `len`
> > > + * is not sufficiently large to store the resulting path.
> > > + */
> > > +struct ist uri_normalizer_query_sort(const struct ist query, const char delim, char *trash, size_t len)
> > > +{
> >
> > I hadn't noticed it till now, but please don't use the "trash" variable
> > name, it is confusing with the trash buffer used almost everywhere.
> > Especially because of my next comment :)
>
> In fact the http-request normalize-uri action will pass a trash buffer
> pointer into the function, that's why I found the name somewhat fitting.
> But the exact method signature is up for discussion anyway (see my other
> email).

It is the caller's point of view. For the normalizers, it is just a
destination buffer, not necessarily a trash buffer. Here, it is just to
avoid confusion with the global variable.

--
Christopher Faulet
Re: [RFC PATCH 3/8] MINOR: uri_normalizer: Add a `dotdot` normalizer to http-request normalize-uri
Le 13/04/2021 à 18:03, Tim Düsterhus a écrit :
> Christopher,
>
> On 4/13/21 4:38 PM, Christopher Faulet wrote:
> > At the end it remains your choice. The function is quite good. I just
> > wonder if it could be valuable to also handle single dot-segments here
> > in addition to double dot-segments. Thus, the normalizer should be
> > renamed "dot-segments" or something similar.
>
> I planned to add a separate normalizer for that. This keeps the functions
> simple and easy to check for correctness. It also allows the
> administrator to cherry-pick exactly the normalizers they desire and that
> do not break their backend. In the old discussion Willy said that not
> everything that might be possible to normalize can actually be normalized
> when combined with legacy software.

Ok, that makes sense.

> > Another point is about the dot encoding. It may be good to handle the
> > encoded dot (%2E), maybe via an option. And IMHO, the way empty
> > segments are handled is a bit counter-intuitive. Calling the
> > "merge-slashes" normalizer first is a solution of course, but this
> > means rewriting the uri twice. We must figure out what is the main
> > expectation for this normalizer. Especially because ignoring empty
> > segments when dot-segments are merged is not exactly the same as
> > merging all slashes.
>
> The percent encoding of the dot will be handled by a 'percent-decode'
> normalizer that decodes percent-encoded unreserved characters (RFC 3986,
> section 2.3). The administrator then first would use the percent-decode
> normalizer, then the merge-slashes normalizer, then the dotdot
> normalizer.

Well, it is a bit different here, because someone could choose not to
decode unreserved characters but still want to handle the encoded dot in
the dotdot normalizer.

> Yes, it means rewriting the URI several times. But it is nice, explicit
> and composes well.

On this point, you're right. It is far cleaner this way.

> We can later figure out whether we want to provide "combined
> normalizers", such as a 'filesystem' normalizer that would combine the
> '.', '..' and '//' normalizers in an efficient way. Adding something like
> that later is easy. Changing the behavior of a normalizer later is hard.

That's true. Depending on feedback, it will be possible to add more
normalizers. I'm fine with that.

> That's why I'd like to keep them simple "Unix style" for now. Make them
> do one thing, make them do it well.
>
> > Note I was first surprised that leading dot-segments were preserved,
> > before reading the 6th patch, because for me it is the most important
> > part. But I'm fine with an option in a way or another.
>
> It's a result of how I approached the development. I wanted to not rebase
> my branch more than necessary. I will probably merge the two patches and
> change the default once the general approach is approved :-)

Well, it is not a problem. You can keep it in two patches if it is easier
for you.

--
Christopher Faulet
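Using the normalizer names from this patch series (they may differ in the final merged version), the composition Tim describes — decode first, then merge, then resolve dot-segments — would look like:

```
# order matters: each rule rewrites the URI before the next one runs
http-request normalize-uri percent-decode
http-request normalize-uri merge-slashes
http-request normalize-uri dotdot
```

The "one thing well" design means each line can be dropped independently if it breaks a legacy backend.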
Re: [RFC PATCH 0/8] URI normalization / Issue #714
Le 13/04/2021 à 17:45, Tim Düsterhus a écrit :
> Christopher,
>
> On 4/13/21 2:41 PM, Christopher Faulet wrote:
> > Sorry for the delay. I'll comment your patches by replying inline when
> > appropriate.
>
> No delay experienced. You said that you'd try this week and it's still
> this week. So this is fine :-)
>
> > The quality of the whole series is pretty good. Thanks for the effort.
> > Just a suggestion to simplify the error handling introduced in the 7th
> > patch. You may use the following prototype for your normalizers:
> >
> >   int uri_normalizer_NAME(const struct ist path, struct ist *dst);
> >
> > returning for instance 0 on success and anything else on error. This
> > way, it is easy to return an enum instead of an integer to be able to
> > handle errors.
>
> Yes, I struggled a bit with the method signature, due to the following
> constraints:
>
> 1. I did not want to allocate the result buffer within the normalizer
>    itself, because this makes the method less flexible with regard to
>    both allocation and freeing.

It is indeed cleaner to allocate it in the caller. We should avoid
responsibility sharing. It is always confusing and leads to bugs.

> 2. I need to somehow pass a size of the target buffer to prevent buffer
>    overflows.

Passing a pointer to an ist is a good way to announce the max size when the
normalizer is called and to update it to set the final size of the new
path/query. By contract, the caller must provide an ist owning an allocated
buffer.

> Thus I can't simply take a `struct ist*` for the destination, as an ist
> cannot communicate the size of the underlying buffer. I could technically
> take a `struct buffer`, but I'd still like the result to reside in an
> ist, because this is what the HTX API expects.

Hum, I don't understand. If you create an ist using the trash buffer this
way:

    struct ist dst = ist2(replace->area, replace->size);

you can pass a pointer to dst. In the normalizer, you can update its size.
It is thus possible to use dst when calling http_replace_req_path() or
http_replace_req_query().

> Your suggested signature would work if I would allocate a trash buffer
> within the normalizer. Should I do this? Is it safe to return a pointer
> to a trash buffer from a function? I remember something about these
> buffers being reused, accidentally destroying the contained information
> if one is not careful.

No, it is indeed a very bad idea. The trash buffer must be allocated in the
caller. You already chose the right way on this point. But you can still
pass a pointer to an ist, locally built from the trash buffer.

--
Christopher Faulet
Re: About the 'Hot Restarts' of haproxy
Sounds like the biggest part of hot restarts is the cost of leaving the old
process running as they have a lot of long running TCP connections, and if
you do a lot of restarts the memory requirements build up. Not much of an
issue for short lived http requests (although it would be nice if keep
alive wasn't followed on the old haproxy processes so they could die
quicker).

On Tue, Apr 13, 2021 at 6:25 AM Willy Tarreau wrote:
> On Tue, Apr 13, 2021 at 01:31:12AM +, Rmrf99 wrote:
> > In this Slack engineering blog post:
> > https://slack.engineering/migrating-millions-of-concurrent-websockets-to-envoy/
> >
> > they replace HAProxy with Envoy for **Hot Restart**, just curious does
> > HAProxy new version will have similar approach? or have better
> > solution (in the future).
>
> So in fact it's not for hot restarts, since we've supported that even
> before envoy was born, it's in order to introduce new servers at run
> time. We do have some ongoing work on this, and some significant parts
> are already available with experimental support:
>
>     https://github.com/haproxy/haproxy/issues/1136
>
> Willy
Re: [RFC PATCH 8/8] MINOR: uri_normalizer: Add a `percent-upper` normalizer
Le 08/04/2021 à 20:59, Tim Duesterhus a écrit :
> Willy, Christopher,
>
> and this final one adds a normalizer to turn the hex digits of percent
> encoding into uppercase. Uppercase is the variant preferred by the URI
> RFC, so this is what we do.

This one looks good.

--
Christopher Faulet
Re: [RFC PATCH 4/8] MINOR: uri_normalizer: Add a `sort-query` normalizer
Christopher,

On 4/13/21 4:59 PM, Christopher Faulet wrote:
> > +/* Sorts the parameters within the given query string. Returns an ist
> > + * containing the new path and backed by `trash`, or IST_NULL if `len`
> > + * is not sufficiently large to store the resulting path.
> > + */
> > +struct ist uri_normalizer_query_sort(const struct ist query, const char delim, char *trash, size_t len)
> > +{
>
> I hadn't noticed it till now, but please don't use the "trash" variable
> name, it is confusing with the trash buffer used almost everywhere.
> Especially because of my next comment :)

In fact the http-request normalize-uri action will pass a trash buffer
pointer into the function, that's why I found the name somewhat fitting.
But the exact method signature is up for discussion anyway (see my other
email).

Best regards
Tim Düsterhus
Re: [RFC PATCH 3/8] MINOR: uri_normalizer: Add a `dotdot` normalizer to http-request normalize-uri
Christopher,

On 4/13/21 4:38 PM, Christopher Faulet wrote:
> At the end it remains your choice. The function is quite good. I just
> wonder if it could be valuable to also handle single dot-segments here in
> addition to double dot-segments. Thus, the normalizer should be renamed
> "dot-segments" or something similar.

I planned to add a separate normalizer for that. This keeps the functions
simple and easy to check for correctness. It also allows the administrator
to cherry-pick exactly the normalizers they desire and that do not break
their backend. In the old discussion Willy said that not everything that
might be possible to normalize can actually be normalized when combined
with legacy software.

> Another point is about the dot encoding. It may be good to handle the
> encoded dot (%2E), maybe via an option. And IMHO, the way empty segments
> are handled is a bit counter-intuitive. Calling the "merge-slashes"
> normalizer first is a solution of course, but this means rewriting the
> uri twice. We must figure out what is the main expectation for this
> normalizer. Especially because ignoring empty segments when dot-segments
> are merged is not exactly the same as merging all slashes.

The percent encoding of the dot will be handled by a 'percent-decode'
normalizer that decodes percent-encoded unreserved characters (RFC 3986,
section 2.3). The administrator then first would use the percent-decode
normalizer, then the merge-slashes normalizer, then the dotdot normalizer.
Yes, it means rewriting the URI several times. But it is nice, explicit and
composes well.

We can later figure out whether we want to provide "combined normalizers",
such as a 'filesystem' normalizer that would combine the '.', '..' and '//'
normalizers in an efficient way. Adding something like that later is easy.
Changing the behavior of a normalizer later is hard. That's why I'd like to
keep them simple "Unix style" for now. Make them do one thing, make them do
it well.

> Note I was first surprised that leading dot-segments were preserved,
> before reading the 6th patch, because for me it is the most important
> part. But I'm fine with an option in a way or another.

It's a result of how I approached the development. I wanted to not rebase
my branch more than necessary. I will probably merge the two patches and
change the default once the general approach is approved :-)

Best regards
Tim Düsterhus
Re: [RFC PATCH 7/8] MINOR: uri_normalizer: Support returning detailed errors from uri normalization
Le 08/04/2021 à 20:59, Tim Duesterhus a écrit :
> Willy, Christopher,
>
> this is in preparation for the next normalizer which normalizes the
> character case of the percent encoding. The resources I found are not
> clear on whether a percent that is not followed by two hex digits is
> valid or not. Most browsers and servers appear to support it, so I opted
> to leave the user a choice whether to reject it or not.

It is probably a good choice.

> To support this I need to differentiate between "normalizing failed for
> internal reasons" and "normalizing failed, because the user sent
> garbage". Overall I am not happy with this patch, because the control
> flow is ugly. I also wasn't sure in which cases I need to increase server
> counters (e.g. failed_rewrites) or not. I'd be happy if you could give
> some advice!

The function may be simplified. I'll propose you a new version, based on my
other comments.

[...]

> +enum uri_normalizer_err {
> +	URI_NORMALIZER_ERR_NONE = 0,
> +	URI_NORMALIZER_ERR_TRASH,
> +	URI_NORMALIZER_ERR_ALLOC,
> +	URI_NORMALIZER_ERR_INVALID_INPUT,
> +	URI_NORMALIZER_ERR_BUG = 0xdead,
> +};

The URI_NORMALIZER_ERR_BUG value is useless. And I guess it could be useful
to have a rewrite error too.

[...]

>   fail_rewrite:
>   	_HA_ATOMIC_ADD(&sess->fe->fe_counters.failed_rewrites, 1);
>   	if (s->flags & SF_BE_ASSIGNED)
> @@ -300,6 +301,24 @@ static enum act_return http_action_normalize_uri(struct act_rule *rule, struct p
>   		s->flags |= SF_ERR_PRXCOND;
>   	}
>   	goto leave;
> +
> +  err:
> +	switch (err) {
> +	case URI_NORMALIZER_ERR_NONE:

You must break here, otherwise an error is returned on success.

> +	case URI_NORMALIZER_ERR_BUG:
> +	case URI_NORMALIZER_ERR_TRASH:
> +		/* These must not happen. */
> +		BUG_ON("Logic error during URI normalization.");
> +		ret = ACT_RET_ERR;
> +		goto leave;
> +	case URI_NORMALIZER_ERR_ALLOC:
> +		goto fail_alloc;
> +	case URI_NORMALIZER_ERR_INVALID_INPUT:
> +		ret = ACT_RET_INV;
> +		goto leave;
> +	}
> +
> +	my_unreachable();
>   }

You may rewrite http_action_normalize_uri() this way (on top of your
series; the normalizers must be adapted). It is untested and not compiled:

```
static enum act_return http_action_normalize_uri(struct act_rule *rule, struct proxy *px,
                                                 struct session *sess, struct stream *s, int flags)
{
	enum act_return ret = ACT_RET_CONT;
	struct htx *htx = htxbuf(&s->req.buf);
	const struct ist uri = htx_sl_req_uri(http_get_stline(htx));
	const struct ist path = http_get_path(uri);
	struct ist dst;
	struct buffer *replace = alloc_trash_chunk();
	enum uri_normalizer_err err;

	if (!replace)
		goto fail_alloc;
	if (!isttest(path))
		goto leave;

	dst = ist2(replace->area, replace->size);

	switch ((enum act_normalize_uri) rule->action) {
	case ACT_NORMALIZE_URI_MERGE_SLASHES:
		err = uri_normalizer_path_merge_slashes(iststop(path, '?'), &dst);
		if (err != URI_NORMALIZER_ERR_NONE)
			break;
		if (!http_replace_req_path(htx, dst, 0))
			goto fail_rewrite;
		break;
	case ACT_NORMALIZE_URI_DOTDOT:
	case ACT_NORMALIZE_URI_DOTDOT_FULL:
		err = uri_normalizer_path_dotdot(iststop(path, '?'), &dst,
		                                 rule->action == ACT_NORMALIZE_URI_DOTDOT_FULL);
		if (err != URI_NORMALIZER_ERR_NONE)
			break;
		if (!http_replace_req_path(htx, dst, 0))
			goto fail_rewrite;
		break;
	case ACT_NORMALIZE_URI_SORT_QUERY:
		err = uri_normalizer_query_sort(istfind(path, '?'), &dst, '&');
		if (err != URI_NORMALIZER_ERR_NONE)
			break;
		if (!http_replace_req_query(htx, dst))
			goto fail_rewrite;
		break;
	case ACT_NORMALIZE_URI_PERCENT_UPPER:
	case ACT_NORMALIZE_URI_PERCENT_UPPER_STRICT:
		err = uri_normalizer_percent_upper(path, &dst,
		                                   rule->action == ACT_NORMALIZE_URI_PERCENT_UPPER_STRICT);
		if (err != URI_NORMALIZER_ERR_NONE)
			break;
		if (!http_replace_req_path(htx, dst, 1))
			goto fail_rewrite;
		break;
	}

	switch (err) {
	case URI_NORMALIZER_ERR_NONE:
		break;
	case URI_NORMALIZER_ERR_BUG:
	case URI_NORMALIZER_ERR_TRASH:
		/* These must not happen. */
		BUG_ON("Logic error during URI normalization.");
		ret = ACT_RET_ERR;
		break;
	case URI_NORMALIZER_ERR_ALLOC:
		goto fail_alloc;
	case URI_NORMALIZER_ERR_INVALID_INPUT:
```
Re: [RFC PATCH 0/8] URI normalization / Issue #714
Christopher, On 4/13/21 2:41 PM, Christopher Faulet wrote: > Sorry for the delay. I'll comment your patches by replying inline when appropriate. No delay experienced. You said that you'd try this week and it's still this week. So this is fine :-) > The quality of the whole series is pretty good. Thanks for the effort. Just a suggestion to simplify the error handling introduced in the 7th patch. You may use the following prototype for your normalizers: int uri_normalizer_NAME(const struct ist path, struct ist *dst); returning for instance 0 on success and anything else on error. This way, it is easy to return an enum instead of an integer to be able to handle errors. Yes, I struggled a bit with the method signature, due to the following constraints: 1. I did not want to allocate the result buffer within the normalizer itself, because this makes the method less flexible with regard to both allocation and freeing. 2. I need to somehow pass a size of the target buffer to prevent buffer overflows. Thus I can't simply take a `struct ist*` for the destination, as an ist cannot communicate the size of the underlying buffer. I could technically take a `struct buffer`, but I'd still like the result to reside in an ist, because this is what the HTX API expects. Your suggested signature would work if I allocated a trash buffer within the normalizer. Should I do this? Is it safe to return a pointer to a trash buffer from a function? I remember something about these buffers being reused, accidentally destroying the contained information if one is not careful. Best regards Tim Düsterhus
Re: [PATCH] JWT payloads break b64dec convertor
On Tue, Apr 13, 2021 at 04:44:38PM +0200, Moemen MHEDHBI wrote: > > But then how about having just *your* functions > > without relying on the other ones ? Now that you've extended the existing > > function, you can declare it yours, remove all the configurable stuff and > > keep the simplified version as the one you need. I'm sure it will be the > > best tradeoff overall. > > > > Yes that makes sense to me too, the attached patch deals with it as > suggested. Yeah much cleaner now, indeed. I've merged it, thank you! Willy
Re: [RFC PATCH 6/8] MINOR: uri_normalizer: Add support for supressing leading `../` for dotdot normalizer
On 08/04/2021 at 20:59, Tim Duesterhus wrote: Willy, Christopher, most of the patch is moving around the config parser to support ingesting the new argument. This one looks good. -- Christopher Faulet
Re: [RFC PATCH 5/8] OPTIMIZE: uri_normalizer: Optimize allocations in uri_normalizer_query_sort
On 08/04/2021 at 20:59, Tim Duesterhus wrote: Willy, Christopher, I did not perform any measurements at all. But not reallocating for every parameter should be better :-) This one may be useless if you use the trash buffer to store the query parameters. -- Christopher Faulet
Re: [RFC PATCH 4/8] MINOR: uri_normalizer: Add a `sort-query` normalizer
On 08/04/2021 at 20:59, Tim Duesterhus wrote: Willy, Christopher, This one comes with dynamic allocation. The next patch will add an optimization for a small number of arguments. However dynamic allocation within the main processing logic is pretty ugly, so this should be looked at further. Dynamic allocation should be avoided here. Definitely. I may propose an alternative by using the trash buffer area to store the ist array, via a cast. It may be considered a bit ugly at first glance, but if you think about it as a trash memory space, it is not so shocking. We've already used this trick for the HTX and regular buffers. If you do it that way, by default, it gives you 1024 slots. And you may return an alloc failure beyond this value. It seems reasonable :) +/* Compares two query parameters by name. Query parameters are ordered + * as with memcmp. Shorter parameter names are ordered lower. Identical + * parameter names are compared by their pointer to maintain a stable + * sort. + */ +static int query_param_cmp(const void *a, const void *b) +{ + const struct ist param_a = *(struct ist*)a; + const struct ist param_b = *(struct ist*)b; + const struct ist param_a_name = iststop(param_a, '='); + const struct ist param_b_name = iststop(param_b, '='); + + int cmp = memcmp(istptr(param_a_name), istptr(param_b_name), MIN(istlen(param_a_name), istlen(param_b_name))); + + if (cmp != 0) + return cmp; + + if (istlen(param_a_name) < istlen(param_b_name)) + return -1; + + if (istlen(param_a_name) > istlen(param_b_name)) + return 1; + + if (istptr(param_a) < istptr(param_b)) + return -1; + + if (istptr(param_a) > istptr(param_b)) + return 1; + + return 0; +} To simplify this, you may use istdiff instead. + +/* Sorts the parameters within the given query string. Returns an ist containing + * the new path and backed by `trash` or IST_NULL if `len` is not sufficiently + * large to store the resulting path. 
+ */ +struct ist uri_normalizer_query_sort(const struct ist query, const char delim, char *trash, size_t len) +{ I hadn't noticed it till now, but please don't use "trash" variable name, it is confusing with the trash buffer used almost everywhere. Especially because of my next comment :) + struct ist scanner = istadv(query, 1); + struct ist *params = NULL; + struct ist newquery = ist2(trash, 0); + size_t param_count = 0; + size_t i; + + if (len < istlen(query)) + return IST_NULL; + + while (istlen(scanner) > 0) { + const struct ist param = istsplit(&scanner, delim); + struct ist *realloc = reallocarray(params, param_count + 1, sizeof(*realloc)); Here, instead of a dynamic array, you may use the trash buffer area (declared from outside the loop). For instance: struct ist *params = (struct ist *)b_orig(&trash); size_t max_param = b_size(&trash) / sizeof(*params); + + if (!realloc) + goto fail; + + params = realloc; + + params[param_count] = param; + param_count++; + } + + qsort(params, param_count, sizeof(*params), query_param_cmp); + + newquery = __istappend(newquery, '?'); + for (i = 0; i < param_count; i++) { + if (i > 0) + newquery = __istappend(newquery, '&'); + + if (istcat(&newquery, params[i], len) < 0) + goto fail; + } + + free(params); + + return newquery; + + fail: + free(params); + + return IST_NULL; +} /* * Local variables: -- Christopher Faulet
Re: [PATCH] JWT payloads break b64dec convertor
On 13/04/2021 11:39, Willy Tarreau wrote: >> You can find attached the patches 0001-bis and 0002-bis modifying the >> existing functions (introducing an url flag) to see how it looks like. >> This solution may be cleaner (no chunk allocation and we don't loop >> twice over input string) but has the drawbacks of being intrusive with >> the rest of the code and less clearer imo regarding how url variant is >> different from standard base64. > > I agree they're not pretty due to the change of logic around the padding, > thanks for having tested! But then how about having just *your* functions > without relying on the other ones ? Now that you've extended the existing > function, you can declare it yours, remove all the configurable stuff and > keep the simplified version as the one you need. I'm sure it will be the > best tradeoff overall. > Yes that makes sense to me too, the attached patch deal with it as suggested. >> diff --git a/src/base64.c b/src/base64.c >> index 53e4d65b2..f2768d980 100644 >> --- a/src/base64.c >> +++ b/src/base64.c >> @@ -1,5 +1,5 @@ >> /* >> - * ASCII <-> Base64 conversion as described in RFC1421. >> + * ASCII <-> Base64 conversion as described in RFC4648. >> * >> * Copyright 2006-2010 Willy Tarreau >> * Copyright 2009-2010 Krzysztof Piotr Oledzki >> @@ -17,50 +17,70 @@ >> #include >> #include >> >> -#define B64BASE '#' /* arbitrary chosen base value */ >> -#define B64CMIN '+' >> -#define B64CMAX 'z' >> -#define B64PADV 64 /* Base64 chosen special pad value */ >> +#define B64BASE '#' /* arbitrary chosen base value */ >> +#define B64CMIN '+' >> +#define UB64CMIN '-' >> +#define B64CMAX 'z' >> +#define B64PADV 64 /* Base64 chosen special pad value */ > > Please do not needlessly reindent code parts for no reason. It seems that > only the "-" was added there, the rest shouldn't change. 
The reason is I was following the doc/coding-style where alignment should use spaces, but since the existing block was aligned via tabs, I thought about fixing that instead of repeating the issue. I understand though that in such a case it's better to have this in a separate commit, so I stuck with the tab alignment. > By the way, contrib/ was moved to dev/ during your changes so if you keep > this comment please update it. Done. On 13/04/2021 08:19, Jarno Huuskonen wrote: > Could you add a cross reference from b64dec/base64 to ub64dec/ub64enc in > configuration.txt. Done thanks. -- Moemen >From b526416364b98afaa2d2b421fbf27f80bc4e8732 Mon Sep 17 00:00:00 2001 From: Moemen MHEDHBI Date: Fri, 2 Apr 2021 01:05:07 +0200 Subject: [PATCH 2/2] CLEANUP: align samples list in sample.c --- src/sample.c | 54 ++-- 1 file changed, 27 insertions(+), 27 deletions(-) diff --git a/src/sample.c b/src/sample.c index 04635a91f..7337ba06a 100644 --- a/src/sample.c +++ b/src/sample.c @@ -4129,33 +4129,33 @@ INITCALL1(STG_REGISTER, sample_register_fetches, &smp_kws); /* Note: must not be declared as its list will be overwritten */ static struct sample_conv_kw_list sample_conv_kws = {ILH, { - { "debug", sample_conv_debug, ARG2(0,STR,STR), smp_check_debug, SMP_T_ANY, SMP_T_ANY }, - { "b64dec", sample_conv_base642bin,0,NULL, SMP_T_STR, SMP_T_BIN }, - { "base64", sample_conv_bin2base64,0,NULL, SMP_T_BIN, SMP_T_STR }, - { "ub64dec", sample_conv_base64url2bin,0,NULL, SMP_T_STR, SMP_T_BIN }, - { "ub64enc", sample_conv_bin2base64url,0,NULL, SMP_T_BIN, SMP_T_STR }, - { "upper", sample_conv_str2upper, 0,NULL, SMP_T_STR, SMP_T_STR }, - { "lower", sample_conv_str2lower, 0,NULL, SMP_T_STR, SMP_T_STR }, - { "length", sample_conv_length,0,NULL, SMP_T_STR, SMP_T_SINT }, - { "hex",sample_conv_bin2hex, 0,NULL, SMP_T_BIN, SMP_T_STR }, - { "hex2i", sample_conv_hex2int, 0,NULL, SMP_T_STR, SMP_T_SINT }, - { "ipmask", sample_conv_ipmask,ARG2(1,MSK4,MSK6), NULL, SMP_T_ADDR, SMP_T_IPV4 }, - { "ltime", 
sample_conv_ltime, ARG2(1,STR,SINT), NULL, SMP_T_SINT, SMP_T_STR }, - { "utime", sample_conv_utime, ARG2(1,STR,SINT), NULL, SMP_T_SINT, SMP_T_STR }, - { "crc32", sample_conv_crc32, ARG1(0,SINT), NULL, SMP_T_BIN, SMP_T_SINT }, - { "crc32c", sample_conv_crc32c,ARG1(0,SINT), NULL, SMP_T_BIN, SMP_T_SINT }, - { "djb2", sample_conv_djb2, ARG1(0,SINT), NULL, SMP_T_BIN, SMP_T_SINT }, - { "sdbm", sample_conv_sdbm, ARG1(0,SINT), NULL, SMP_T_BIN, SMP_T_SINT }, - { "wt6",sample_conv_wt6, ARG1(0,SINT), NULL, SMP_T_BIN, SMP_T_SINT }, - { "xxh3", sample_conv_xxh3, ARG1(0,SINT), NULL, SMP_T_BIN, SMP_T_SINT }, - { "xxh32", sample_conv_xxh32, ARG1(0,SINT), NULL, SMP_T_BIN, SMP_T_SINT }, - { "xxh64", sample_conv_xxh64, ARG1(0,SINT), NULL, SMP_T_BIN, SMP_T_SINT }, - { "json", sample_conv_json, ARG1(1,STR), sample_conv_json_ch
Re: [RFC PATCH 3/8] MINOR: uri_normalizer: Add a `dotdot` normalizer to http-request normalize-uri
On 08/04/2021 at 20:59, Tim Duesterhus wrote: Willy, Christopher, I'm not very happy with the normalization logic, because I am processing the URI in reverse. This requires me to directly access offsets instead of using the `ist` API. However this way I don't need to backtrack once I encounter a `../`, which I consider to be a win. It is not shocking. The function is readable, it is not a real problem. Maybe we can introduce the istoff() function to get the pointer at a given offset. You may also choose to fully rely on pointers with a negative index. I know you want to use the ist api as far as possible, but it is not always the easiest way :) At the end it remains your choice. The function is quite good. I just wonder if it could be valuable to also handle single dot-segments here in addition to double dot-segments. Thus, the normalizer should be renamed "dot-segments" or something similar. Another point is about the dot encoding. It may be good to handle encoded dots (%2E), maybe via an option. And IMHO, the way empty segments are handled is a bit counter-intuitive. Calling the "merge-slashes" normalizer first is a solution of course, but this means rewriting the uri twice. We must figure out what the main expectation for this normalizer is. Especially because ignoring empty segments when dot-segments are merged is not exactly the same as merging all slashes. Note I was first surprised that leading dot-segments were preserved, before reading the 6th patch, because for me it is the most important part. But I'm fine with an option one way or another. -- Christopher Faulet
Re: [RFC PATCH 2/8] MINOR: uri_normalizer: Add `http-request normalize-uri`
On 08/04/2021 at 20:59, Tim Duesterhus wrote: Willy, Christopher, something simple for a start. This one adds the http-request action and a very simple normalizer to test whether it works. Turns out it does :-) You can see the new `ist` helpers in action already. I'm pretty happy that I was able to implement this completely with the new `ist` API. Just small comments but otherwise, this one looks good. [...] +/* Parses the http-request normalize-uri action. It expects a single + * argument, corresponding to a value in `enum act_normalize_uri`. + * + * It returns ACT_RET_PRS_OK on success, ACT_RET_PRS_ERR on error. + */ +static enum act_parse_ret parse_http_normalize_uri(const char **args, int *orig_arg, struct proxy *px, + struct act_rule *rule, char **err) +{ + int cur_arg = *orig_arg; + + rule->action_ptr = http_action_normalize_uri; + rule->release_ptr = NULL; + + if (!*args[cur_arg] || + (*args[cur_arg + 1] && strcmp(args[cur_arg + 1], "if") != 0 && strcmp(args[cur_arg + 1], "unless") != 0)) { Testing "if" or "unless" is useless here. If no normalizer name is provided, this will be caught in the else statement. Especially because this test will be removed by the 6th patch. + memprintf(err, "expects exactly 1 argument "); + return ACT_RET_PRS_ERR; + } + + if (strcmp(args[cur_arg], "merge-slashes") == 0) { + rule->action = ACT_NORMALIZE_URI_MERGE_SLASHES; + } + else { + memprintf(err, "unknown normalizer '%s'", args[cur_arg]); + return ACT_RET_PRS_ERR; The list of supported normalizers may help the user here. + } + cur_arg++; + + *orig_arg = cur_arg; + return ACT_RET_PRS_OK; +} + -- Christopher Faulet
Re: [RFC PATCH 1/8] MINOR: uri_normalizer: Add uri_normalizer module
On 08/04/2021 at 20:59, Tim Duesterhus wrote: Willy, Christopher, I used uri_auth.[ch] as the basis for the source file structure (comments and stuff). Thanks, nothing to say about this one :) -- Christopher Faulet
Re: [RFC PATCH 0/8] URI normalization / Issue #714
On 08/04/2021 at 20:59, Tim Duesterhus wrote: Willy, Christopher, Not sure which of you is better suited to review this series, so I'm adding both of you :-) I'm tagging this as RFC, because it's large and quite a bit outside of my comfort zone. However the patches are as clean as possible. They include full documentation and each normalizer comes with a bunch of reg-tests ensuring they behave like I expect them to behave. So if you have nothing to complain about, then this series is in a mergeable state. I plan to add a few more normalizers after this passes an initial review. I'll add additional remarks to each patch, explaining my decisions in more detail. Best regards Tim Düsterhus (8): MINOR: uri_normalizer: Add uri_normalizer module MINOR: uri_normalizer: Add `http-request normalize-uri` MINOR: uri_normalizer: Add a `dotdot` normalizer to http-request normalize-uri MINOR: uri_normalizer: Add a `sort-query` normalizer OPTIMIZE: uri_normalizer: Optimize allocations in uri_normalizer_query_sort MINOR: uri_normalizer: Add support for supressing leading `../` for dotdot normalizer MINOR: uri_normalizer: Support returning detailed errors from uri normalization MINOR: uri_normalizer: Add a `percent-upper` normalizer Makefile | 3 +- doc/configuration.txt | 58 + include/haproxy/action-t.h | 9 + include/haproxy/uri_normalizer-t.h | 32 +++ include/haproxy/uri_normalizer.h | 33 +++ reg-tests/http-rules/normalize_uri.vtc | 314 + src/http_act.c | 213 + src/uri_normalizer.c | 298 +++ 8 files changed, 959 insertions(+), 1 deletion(-) create mode 100644 include/haproxy/uri_normalizer-t.h create mode 100644 include/haproxy/uri_normalizer.h create mode 100644 reg-tests/http-rules/normalize_uri.vtc create mode 100644 src/uri_normalizer.c Tim, Sorry for the delay. I'll comment your patches by replying inline when appropriate. The quality of the whole series is pretty good. Thanks for the effort. Just a suggestion to simplify the error handling introduced in the 7th patch. 
You may use the following prototype for your normalizers: int uri_normalizer_NAME(const struct ist path, struct ist *dst); returning for instance 0 on success and anything else on error. This way, it is easy to return an enum instead of an integer to be able to handle errors. -- Christopher Faulet
httpd ProxyPass / ProxyPassReverse to haproxy 2.3.9 ?
Hi list, Moving from httpd (reverse proxy) to haproxy 2.3.9. In httpd I have the following: ServerName test.server.dk ProxyPreserveHost On ProxyPass / http://localhost:8080/aaa/bbb/ ProxyPassReverse / http://localhost:8080/aaa/bbb/ What is the correct way to do this in haproxy 2? Something with http-request replace-uri? Thomas
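One common way to express the above in HAProxy 2.x is sketched below. This is an untested illustration, not an authoritative answer from the thread; the hostnames and paths are taken from the question:

```
frontend fe_test
    bind :80
    default_backend be_test

backend be_test
    # ProxyPreserveHost On: HAProxy forwards the original Host header by default
    # ProxyPass / http://localhost:8080/aaa/bbb/ : prepend the path prefix
    http-request set-path /aaa/bbb%[path]
    # ProxyPassReverse: strip the prefix from Location headers in redirects
    http-response replace-header Location ^(https?://[^/]+)?/aaa/bbb(/.*)$ \2
    server app 127.0.0.1:8080
```

`http-request set-path` rewrites only the path portion, and `http-response replace-header` applies a regex with backreferences to the returned header, which together roughly mirror the ProxyPass / ProxyPassReverse pair.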
Re: About the 'Hot Restarts' of haproxy
On Tue, Apr 13, 2021 at 01:31:12AM +, Rmrf99 wrote: > In this Slack engineering blog post: > https://slack.engineering/migrating-millions-of-concurrent-websockets-to-envoy/ > > they replace HAProxy with Envoy for **Hot Restart**, just curious does > HAProxy new version will have similar approach? or have better solution(in > the future). So in fact it's not for hot restarts, since we've supported that even before envoy was born, it's in order to introduce new servers at run time. We do have some ongoing work on this, and some significant parts are already available with experimental support: https://github.com/haproxy/haproxy/issues/1136 Willy
Re: 2Mrps : kudos to the team
On Sat, Apr 10, 2021 at 01:34:16PM +0200, Ionel GARDAIS wrote: > Hi team, list, > > If you haven't already read it, go read this blog article : > https://www.haproxy.com/blog/haproxy-forwards-over-2-million-http-requests-per-second-on-a-single-aws-arm-instance/ > > Such a milestone ! > It's time to take time to thank all the great work achieved by the HAProxy > team over the years. > Thanks Willy for your vision and for sticking with the Unix philosophy: make > each program do one thing well. > Thanks Tim, William, Aleksandar, Lukas, Christopher and all those working > day-to-day to make HAProxy such a nice piece of code. > > *claps claps claps* Thank you Ionel :-) There's still a lot of room for improvement in many areas, but what this test showed is that we're on the right track regarding the thread-level scalability, and that provided that we remain careful and continue to address the remaining bottlenecks as they are met, nothing should stop us from adding more features and more flexibility. Cheers, Willy
Re: [PATCH] BUG/MINOR: sample: Fix adjusting size in field converter
Hi Thayne, On Sun, Apr 11, 2021 at 11:26:59PM -0600, Thayne McCombs wrote: > Adjust the size of the sample buffer before we change the "area" > pointer. The change in size is calculated as the difference between the > original pointer and the new start pointer. But since the > `smp->data.u.str.area` assignment results in `smp->data.u.str.area` and > `start` being the same pointer, we always ended up subtracting zero. > This changes it to adjust the size by the actual amount it changed. > > I'm not entirely sure what the impact of this is, but the previous code > seemed wrong. So I carefully reviewed it, and not only you're totally right, but I could figure out in which case it is harmful. All accesses limit themselves to the amount of data except one, the binary key padding for a stick table. So it is technically possible to use it to write zeroes past the end of the string in such a construct where is of type binary with keys at least as large as your buffers (lots of 'if') : hdr(foo),field(2,:),in_table(table) Thus I tagged it "MEDIUM" in the end. Thank you! Willy
Re: [PATCH] MINOR: sample: Add a regmatch converter
Hi Thayne, On Tue, Apr 13, 2021 at 02:11:25AM -0600, Thayne McCombs wrote: > Add a new sample converter that finds the first regex match and returns > the substring for that match, or a capture group, if an index is > provided. > --- > doc/configuration.txt| 22 +++ > reg-tests/converter/regmatch.vtc | 39 +++ > src/sample.c | 66 > 3 files changed, 127 insertions(+) > create mode 100644 reg-tests/converter/regmatch.vtc > > diff --git a/doc/configuration.txt b/doc/configuration.txt > index f21a29a68..e84395d23 100644 > --- a/doc/configuration.txt > +++ b/doc/configuration.txt > @@ -16238,6 +16238,28 @@ protobuf(,[]) >More information may be found here about the protocol buffers message > field types: >https://developers.google.com/protocol-buffers/docs/encoding > > +regmatch([,[,]]) > + This extracts a substring that matches the regex pattern. It will return > the first > + match in the input string. By default it returns the entire match, but if > > + is supplied, then the capture group for that number will be returned > instead. A > + value of 0 returns the entire match. The regex can be made case > insensitive by > + adding the flag "i" in . > + > + It is highly recommended to enclose the regex part using protected quotes > to > + improve clarity and never have a closing parenthesis from the regex mixed > up with > + the parenthesis from the function. Just like in Bourne shell, the first > level of > + quotes is processed when delimiting word groups on the line, a second > level is > + usable for argument. It is recommended to use single quotes outside since > these > + ones do not try to resolve backslashes nor dollar signs. 
> + > + Examples: > + > + # extract part of content-type > + http-request set-var(txn.imtype) > 'hdr(content-type),regmatch("image/(.*)",1)' > + > + # extract cookie with certain pattern > + http-request set-header x-test-cookie > %[hdr(cookie),'regmatch(test-\w+=\d+)'] > + I'm failing to see how it differs from regsub() which already does the same with the reference (you'd write \1 instead of 1) and also allows to compose something out of multiple matches. Am I missing something, or a case where regsub() is really not convenient compared to this one ? Thanks, Willy
Re: [PATCH] JWT payloads break b64dec convertor
Hi Jarno, On Tue, Apr 13, 2021 at 06:19:47AM +, Jarno Huuskonen wrote: > Hello, > > On Tue, 2021-04-06 at 01:58 +0200, Moemen MHEDHBI wrote: > > Thanks Willy and Tim for your feedback. > > > > You can find attached the updated patches with fixed coding style (now > > set correctly in my editor), updated commit message, entry doc in sorted > > order, size_t instead of int in both enc/dec and corresponding reg-test. > > Could you add a cross reference from b64dec/base64 to ub64dec/ub64enc in > configuration.txt. Something like: > --- a/doc/configuration.txt > +++ b/doc/configuration.txt > @@ -15020,11 +15020,14 @@ and() > b64dec >Converts (decodes) a base64 encoded input string to its binary >representation. It performs the inverse operation of base64(). > + For base64url("URL and Filename Safe Alphabet" (RFC 4648)) variant > + see "ub64dec". > > base64 >Converts a binary input sample to a base64 string. It is used to log or >transfer binary content in a way that can be reliably transferred (e.g. > - an SSL ID can be copied in a header). > + an SSL ID can be copied in a header). For base64url("URL and Filename > Safe > + Alphabet" (RFC 4648)) variant see "ub64enc". > > bool >Returns a boolean TRUE if the input value of type signed integer is Yes very good point indeed, and similarly the ub64 ones should refer to these ones. Willy
Re: [PATCH] JWT payloads break b64dec convertor
Hi Moemen, On Tue, Apr 13, 2021 at 12:41:39AM +0200, Moemen MHEDHBI wrote: > >> in such case should we rather use dynamic allocation ? > > > > No, there are two possible approaches. One of them is to use a trash > > buffer using get_trash_chunk(). The trash buffers are "large enough" > > for anything that comes from outside. A second, cleaner solution > > simply consists in not using a temporary buffer but doing the conversion > > on the fly. Indeed, looking closer, what the function does is to first > > replace a few chars on the whole chain to then call the base64 conversion > > function. So it doubles the work on the string and one side effect of > > this double work is that you need temporary storage. > > The url variant is not only about a different alphabet that needs to be > replaced but is also a non-padding variant. So the straightforward > algorithm for decoding it is to add the padding to the input encoded in > the url variant and then use the standard base64 decoder. Ah OK I wasn't aware of this! > Even doing this on the fly requires extending the input with two more bytes > max. Unless I am missing something, in such a case on-the-fly conversion > will result in an out-of-bounds array access. There is never a reason for an out-of-bounds access but of course it will require to add the checks everywhere while right now it knows that it can read 4 bytes at once. So yeah, that would result in two different paths and that's a pain. > That's why I have copied the input into an "inlen+2" string. Got it, that makes sense. > In the end I have updated the patch to handle extending the input in the decoding > function via get_trash_chunk to make sure a buffer of size input+2 is > available. > > You can find attached the patches 0001 and 0002 for this implementation. We could do with that, I just still find it a bit awkward to write so much code to modify an input and adapt it to a parser instead of writing a parser. That's more operations and more code to inspect in case of trouble. 
> You can find attached the patches 0001-bis and 0002-bis modifying the > existing functions (introducing an url flag) to see how it looks like. > This solution may be cleaner (no chunk allocation and we don't loop > twice over input string) but has the drawbacks of being intrusive with > the rest of the code and less clearer imo regarding how url variant is > different from standard base64. I agree they're not pretty due to the change of logic around the padding, thanks for having tested! But then how about having just *your* functions without relying on the other ones ? Now that you've extended the existing function, you can declare it yours, remove all the configurable stuff and keep the simplified version as the one you need. I'm sure it will be the best tradeoff overall. > diff --git a/src/base64.c b/src/base64.c > index 53e4d65b2..f2768d980 100644 > --- a/src/base64.c > +++ b/src/base64.c > @@ -1,5 +1,5 @@ > /* > - * ASCII <-> Base64 conversion as described in RFC1421. > + * ASCII <-> Base64 conversion as described in RFC4648. > * > * Copyright 2006-2010 Willy Tarreau > * Copyright 2009-2010 Krzysztof Piotr Oledzki > @@ -17,50 +17,70 @@ > #include > #include > > -#define B64BASE '#' /* arbitrary chosen base value */ > -#define B64CMIN '+' > -#define B64CMAX 'z' > -#define B64PADV 64 /* Base64 chosen special pad value */ > +#define B64BASE '#' /* arbitrary chosen base value */ > +#define B64CMIN '+' > +#define UB64CMIN '-' > +#define B64CMAX 'z' > +#define B64PADV 64 /* Base64 chosen special pad value */ Please do not needlessly reindent code parts for no reason. It seems that only the "-" was added there, the rest shouldn't change. > @@ -73,29 +93,59 @@ int a2base64(char *in, int ilen, char *out, int olen) > * or to be NULL. Returns -1 if is invalid or ilen > * has wrong size, -2 if is too short. > * 1 to 3 output bytes are produced for 4 input bytes. 
> + * The reverse tab used to decode base64 is generated via > /contrib/base64/base64rev-gen.c By the way, contrib/ was moved to dev/ during your changes so if you keep this comment please update it. Thanks, Willy
Re: [PATCH] MINOR: sample: add json_string
Hi Aleks, On Mon, Apr 12, 2021 at 10:09:08PM +0200, Aleksandar Lazic wrote: > Hi. > > another patch which honors the feedback. Thank you. FWIW I agree with all the points reported by Tim. I'll add a few comments and/or suggestions below. On a general note, please be careful about your indenting, as it can very quickly become a total mess. Similarly please pay attention not to leave trailing spaces that make Git complain: Applying: MINOR: sample: converter: add JSON Path handling .git/rebase-apply/patch:39: trailing whitespace. - number : When the JSON value is a number then will the value be .git/rebase-apply/patch:40: trailing whitespace. converted to a string. If you know that the value is a .git/rebase-apply/patch:41: trailing whitespace. integer then can you help haproxy to convert the value .git/rebase-apply/patch:46: trailing whitespace. This converter extracts the value located at from the JSON .git/rebase-apply/patch:47: trailing whitespace. string in the input value. warning: squelched 10 whitespace errors warning: 15 lines add whitespace errors. All these lines are easily noticed this way: $ git show | grep -c '^+.*\s$' 15 A good way to avoid this once and for all is to enable colors in Git and to always make sure not to leave red areas in "git diff" or "git show" : $ git config --global color.ui true And even if it's of low importance for the code itself, it's particularly important in a review because such cosmetic issues constantly remind the reader that the patch is far from being final, so it's possibly not yet the moment to focus on not critically important stuff. Thus in the end they increase the number of round trips. 
> > ``` > printf("\n>>>DOUBLE rc:%d: double:%f:\n",rc, > double_val); > trash->size = snprintf(trash->area, > trash->data, > > "%g",double_val); > smp->data.u.str = *trash; > smp->data.type = SMP_T_STR; > ``` Yeah, as Tim mentioned, you mixed size and data. "data" is the amount of data bytes used in a chunk. "size" is its allocated size. > >From 8cb1bc4aaedd17c7189d4985a57f662ab1b533a4 Mon Sep 17 00:00:00 2001 > From: Aleksandar Lazic > Date: Mon, 12 Apr 2021 22:01:04 +0200 > Subject: [PATCH] MINOR: sample: converter: add JSON Path handling > > With json_path can a JSON value be extacted from a Header > or body In the final version, please add a few more lines to describe the name of the added converter and what it's used for. As a reminder, think that you're trying to sell your artwork to me or anyone else who would make you proud by backporting your work into their version :-) > +json_query(,[]) > + The is mandatory. > + By default will the follwing JSON types recognized. > + - string : This is the default search type and returns a string; > + - number : When the JSON value is a number then will the value be > + converted to a string. If you know that the value is a > + integer then can you help haproxy to convert the value > + to a integer when you add "sint" to the ; Just thinking out loud, I looked at the rest of the doc and noticed we never mention "sint" anywhere else, so I think it's entirely an internal type. However we do mention "int" which is used as the matching method for integers, so we could have: ... json_query("blah",sint) -m int 12 As such I would find it more natural to call this type "int" so that it matches the same as the one used in the match. Maps already use "int" as the output type name by the way. In any case, I too am a bit confused by the need to force an output type. As a user, I'd expect the type to be implicit and not to have to know about it in the configuration. 
Of course we can imagine situations where we'd want to force the type (like we
sometimes do by adding 0 or concatenating an empty string, for example), but
it is still not very clear to me whether we want it by default. Or maybe when
dealing with floats, where we'd have to decide whether to emit them verbatim
as strings or to convert them to integers. But then, could it make sense to
also support "strict integers": values that can accurately be represented as
integers and which are within the JSON valid range for integers (-2^52 to 2^52
with no decimal part)? This would then make the converter return nothing in
case of violation (i.e. a risk of losing precision). This would also reject
the NaN and infinite values that the lib supports. A small detail: in general,
prefer a passive form in the text rather than speaking to the reader using
"you". You'll notice that using this more descr
[PATCH] MINOR: sample: Add a regmatch converter
Add a new sample converter that finds the first regex match and returns
the substring for that match, or a capture group, if an index is
provided.
---
 doc/configuration.txt            | 22 +++
 reg-tests/converter/regmatch.vtc | 39 +++
 src/sample.c                     | 66
 3 files changed, 127 insertions(+)
 create mode 100644 reg-tests/converter/regmatch.vtc

diff --git a/doc/configuration.txt b/doc/configuration.txt
index f21a29a68..e84395d23 100644
--- a/doc/configuration.txt
+++ b/doc/configuration.txt
@@ -16238,6 +16238,28 @@ protobuf(<field_number>,[<field_type>])
   More information may be found here about the protocol buffers message field
   types: https://developers.google.com/protocol-buffers/docs/encoding
 
+regmatch(<regex>[,<group>[,<flags>]])
+  This extracts a substring that matches the regex pattern. It will return the
+  first match in the input string. By default it returns the entire match, but
+  if <group> is supplied, then the capture group for that number will be
+  returned instead. A value of 0 returns the entire match. The regex can be
+  made case insensitive by adding the flag "i" in <flags>.
+
+  It is highly recommended to enclose the regex part using protected quotes to
+  improve clarity and never have a closing parenthesis from the regex mixed up
+  with the parenthesis from the function. Just like in Bourne shell, the first
+  level of quotes is processed when delimiting word groups on the line, a
+  second level is usable for arguments. It is recommended to use single quotes
+  outside since these ones do not try to resolve backslashes nor dollar signs.
+
+  Examples:
+
+      # extract part of content-type
+      http-request set-var(txn.imtype) 'hdr(content-type),regmatch("image/(.*)",1)'
+
+      # extract cookie with certain pattern
+      http-request set-header x-test-cookie %[hdr(cookie),'regmatch(test-\w+=\d+)']
+
 regsub(<regex>,<subst>[,<flags>])
   Applies a regex-based substitution to the input string. It does the same
   operation as the well-known "sed" utility with "s/<regex>/<subst>/".
 By default it will replace in the input string the first occurrence of the
 largest part matching the regular expression <regex> with the substitution
 string <subst>.
 
diff --git a/reg-tests/converter/regmatch.vtc b/reg-tests/converter/regmatch.vtc
new file mode 100644
index 0000000..46df78ee0
--- /dev/null
+++ b/reg-tests/converter/regmatch.vtc
@@ -0,0 +1,39 @@
+varnishtest "regmatch converter Test"
+
+feature ignore_unknown_macro
+
+server s1 {
+    rxreq
+    txresp
+} -repeat 3 -start
+
+haproxy h1 -conf {
+    defaults
+        mode http
+        timeout connect 1s
+        timeout client 1s
+        timeout server 1s
+
+    frontend fe
+        bind "fd@${fe}"
+
+        # requests
+        http-request set-var(txn.match) "path,regmatch('test/(\d+)/',1)"
+        http-response set-header Found %[var(txn.match)] if { var(txn.match) -m found }
+
+        default_backend be
+
+    backend be
+        server s1 ${s1_addr}:${s1_port}
+} -start
+
+client c1 -connect ${h1_fe_sock} {
+    txreq -url "/test/123/456"
+    rxresp
+    expect resp.status == 200
+    expect resp.http.found == "123"
+    txreq -url "/foo"
+    rxresp
+    expect resp.status == 200
+    expect resp.http.found == ""
+} -run
diff --git a/src/sample.c b/src/sample.c
index 835a18115..66d674e3f 100644
--- a/src/sample.c
+++ b/src/sample.c
@@ -2671,6 +2671,71 @@ static int sample_conv_word(const struct arg *arg_p, struct sample *smp, void *p
 	return 1;
 }
 
+static int sample_conv_regmatch_check(struct arg *args, struct sample_conv *conv,
+                                      const char *file, int line, char **err)
+{
+	struct arg *arg = args;
+	char *p;
+	int len;
+
+	if (arg[1].type == ARGT_SINT && (arg[1].data.sint < 0 || arg[1].data.sint > MAX_MATCH)) {
+		memprintf(err, "invalid capture group number %lld. must be between 0 and %d",
+		          arg[1].data.sint, MAX_MATCH);
+		return 0;
+	}
+
+	/* arg0 is a regex, it uses type_flag for ICASE */
+	arg[0].type_flags = 0;
+
+	if (arg[2].type != ARGT_STR)
+		return 1;
+
+	p = arg[2].data.str.area;
+	len = arg[2].data.str.data;
+	while (len) {
+		if (*p == 'i') {
+			arg[0].type_flags |= ARGF_REG_ICASE;
+		}
+		else {
+			memprintf(err, "invalid regex flag '%c', only 'i' is supported", *p);
+			return 0;
+		}
+		p++;
+		len--;
+	}
+	return 1;
+}
+
+/* This sample function is designed to find the first match of a regex in the
+ * input string. If arg1 is supplied, that is used as the capture group to
+ * return (or the whole match if 0).
+ */
+static int sample_conv_regmatch(const struct arg *arg_p, struct sample *smp, void *private)
+{
+	struct my_regex *reg = arg_p[0].data.reg;
+	regmatch_t pmatch[MAX_MATCH];
+	regmatch_t capture;
+	int nmatch = (arg_p[1].type == ARGT_SINT) ? arg_p[1].data.sint : 0;
+	int found;
+
+	found = re