Re: [PATCH] MINOR: sample: Add a regmatch converter
On Tue, Apr 13, 2021 at 10:52:15PM -0600, Thayne McCombs wrote:
> On 4/13/21 3:44 AM, Willy Tarreau wrote:
> > I'm failing to see how it differs from regsub() which already does the
> > same with the reference (you'd write \1 instead of 1) and also allows to
> > compose something out of multiple matches. Am I missing something, or a
> > case where regsub() is really not convenient compared to this one ?
> >
> > Thanks,
> > Willy
>
> Mostly just convenience. It is possible to accomplish the same thing using
> something like `'regsub(".*(mypattern).*","\1")'`, but you need to remember
> to include the ".*" on both sides.
>
> I also suspect (although I haven't benchmarked it) that regmatch would
> perform better than the equivalent regsub.
>
> That said, I do recognize that this doesn't add any completely new
> functionality, and I will not be offended at all if you don't think it is
> worth merging.

OK thanks for explaining. I'm not particularly against it, I'm fine if
others see some value in it. Does anyone have any opinion ? Lukas, given
you're the one dealing with the most user help requests on Discourse, do
you think it could simplify certain configs or help users spot the right
usage faster maybe ?

Thanks,
Willy
Re: [PATCH] MINOR: sample: Add a regmatch converter
On 4/13/21 3:44 AM, Willy Tarreau wrote:
> I'm failing to see how it differs from regsub() which already does the
> same with the reference (you'd write \1 instead of 1) and also allows to
> compose something out of multiple matches. Am I missing something, or a
> case where regsub() is really not convenient compared to this one ?
>
> Thanks,
> Willy

Mostly just convenience. It is possible to accomplish the same thing using
something like `'regsub(".*(mypattern).*","\1")'`, but you need to remember
to include the ".*" on both sides.

I also suspect (although I haven't benchmarked it) that regmatch would
perform better than the equivalent regsub.

That said, I do recognize that this doesn't add any completely new
functionality, and I will not be offended at all if you don't think it is
worth merging.
Re: [PATCH] MINOR: sample: add json_string
On Wed, Apr 14, 2021 at 03:02:20AM +0200, Aleksandar Lazic wrote:
> > But then, could it make sense to also support "strict integers": values
> > that can accurately be represented as integers and which are within the
> > JSON valid range for integers (-2^53 to 2^53 with no decimal part) ?
> > This would then make the converter return nothing in case of violation
> > (i.e. risk of losing precision). This would also reject NaN and infinite
> > values that the lib supports.
>
> You mean the same check which is done in arith_add().

Not exactly, because arith_add only checks for overflows after addition and
tries to cap the result. I'd rather just say that if the decoded number is
<= -2^53 or >= 2^53 then the converter should return a no-match in case an
integer was requested.

> > Ah, that's not your fault, but now I'm seeing that we already have a
> > converter inappropriately called "json", so we don't even know in which
> > direction it works just by looking at its name :-( Same issue as for
> > base64.
> >
> > May I suggest that you call yours "json_decode" or maybe shorter
> > "json_dec" so that it's more explicit that it's the decode one ? Because
> > for me "json_string" is the one that will emit a json string from some
> > input (which it is not). Then we could later create "json_enc" and warn
> > when "json" alone is used. Or even "jsdec" and "jsenc" which are much
> > shorter and still quite explicit.
>
> How about "json_query", because it's exactly what it does :-)

I'm not familiar with the notion of a "query" to decode and extract
contents, but I'm not the most representative user and I am aware of the
"jq" command-line utility that does this. So if it sounds natural to others
I'm fine with this.

> > I'm seeing that there's a very nice mjson_find() which does *exactly*
> > what you need:
> >
> >   "In a JSON string s, len, find an element by its JSONPATH path. Save
> >   found element in tokptr, toklen. If not found, return
> >   MJSON_TOK_INVALID. If found, return one of: MJSON_TOK_STRING,
> >   MJSON_TOK_NUMBER, MJSON_TOK_TRUE, MJSON_TOK_FALSE, MJSON_TOK_NULL,
> >   MJSON_TOK_ARRAY, MJSON_TOK_OBJECT. If a searched key contains ., [ or
> >   ] characters, they should be escaped by a backslash."
> >
> > So you get the type in return. I think you can then call one of the
> > related functions depending on what is found, which is more reliable
> > than iterating over multiple attempts.
>
> Oh yes, this sounds like a better approach.
> I have now used this suggestion and I hope you can help me to fix the
> double parsing issue, or is it acceptable to parse the input twice?

From what I've seen in the code in the lib you have no other option. I
thought it might be possible to call mjson_get_string() on the resulting
pointer, but you would need to find a way to express that you want to
extract the immediate content, maybe by having an empty key designation or
something like this. This point is not clear to me, and the unit tests in
the project all re-parse the input string after mjson_find(), so probably
this is the way to do it.

> The check function handles the int arg now as suggested.
>
> ```
> /* This function checks the "json_query" converter's arguments. */
> static int sample_check_json_query(struct arg *arg, struct sample_conv *conv,
>                                    const char *file, int line, char **err)
> {
> 	if (arg[0].data.str.data == 0) { /* empty */
> 		memprintf(err, "json_path must not be empty");
> 		return 0;
> 	}
>
> 	if (arg[1].data.str.data != 0) {
> 		if (strncmp(arg[1].data.str.area, "int", sizeof("int")) != 0) {
> 			memprintf(err, "output_type only supports \"int\" as argument");
> 			return 0;
> 		} else {
> 			arg[1].type = ARGT_SINT;
> 			arg[1].data.sint = 0;

OK!

> I use now the token enum but I have some trouble handling the C types
> properly.

OK that's better, but be careful about this:

> 	trash->data = mjson_get_string(smp->data.u.str.area,
> 	                               smp->data.u.str.data,
> 	                               args[0].data.str.area,
> 	                               trash->area, trash->size);
> 	if (trash->data == -1 ) {

You're sending the result into trash->data, whose type you don't know, and
expect it to match against -1. I'd rather store the result into a temporary
variable of the same type as the function's return (int), test it, and only
if it is not -1, assign this value to the return sample's length.

Willy
Re: About the 'Hot Restarts' of haproxy
On Wed, Apr 14, 2021 at 02:03:52AM +, Rmrf99 wrote:
> Thanks Chris, Willy! This makes it clear to me now.
> Glad to see the dynamic add/remove servers feature under development.

If you're interested in this feature, please try the latest 2.4-dev: you
already have the "add server" feature on the command line, you just need to
enable "experimental-mode on" before accessing the command.

Willy
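For reference, the experimental runtime command mentioned above looks roughly like this on 2.4-dev; the stats socket path and the backend/server names are placeholders for whatever your configuration defines:

```
# enable experimental commands for this CLI session, then add a server
$ echo "experimental-mode on; add server mybackend/newsrv 192.0.2.10:8080" | \
      socat stdio /var/run/haproxy.sock
```

This requires a `stats socket` declared in the global section with a sufficient admin level.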
Re: About the 'Hot Restarts' of haproxy
Thanks Chris, Willy! This makes it clear to me now.
Glad to see the dynamic add/remove servers feature under development.

‐‐‐ Original Message ‐‐‐
On Wednesday, April 14, 2021 1:19 AM, Christopher Faulet wrote:

> Le 13/04/2021 à 18:15, John Lauro a écrit :
> > Sounds like the biggest part of hot restarts is the cost of leaving the
> > old process running as they have a lot of long running TCP connections,
> > and if you do a lot of restarts the memory requirements build up. Not
> > much of an issue for short lived http requests (although it would be
> > nice if keep alive wasn't followed on the old haproxy processes so they
> > could die quicker).
>
> Well, AFAIK, Envoy handles hot restarts exactly the same way HAProxy
> does. The old process tries to finish processing in-flight requests. The
> connections are not kept alive. The old process closes all idle
> connections, but it must still wait for the responses to already started
> requests. Both Envoy and HAProxy do it this way. Note there is an option
> to kill the old process after a given time.
>
> The article is a bit biased and inaccurate because it suggests HAProxy
> does not support hot restarts while Envoy does. In fact, the real
> difference here is the ability to dynamically add or remove servers with
> Envoy. Thanks to this feature, most of the time there is no reason to
> restart it. Thus, hot restarts are not an issue anymore. On the HAProxy
> side, as Willy said, this feature is under development.
>
> --
> Christopher Faulet
Re: [PATCH] MINOR: sample: add json_string
On 13.04.21 11:26, Willy Tarreau wrote:
> Hi Aleks,
>
> On Mon, Apr 12, 2021 at 10:09:08PM +0200, Aleksandar Lazic wrote:
> > Hi.
> >
> > Another patch which honors the feedback.
>
> Thank you. FWIW I agree with all the points reported by Tim. I'll add a
> few comments and/or suggestions below. On a general note, please be
> careful about your indenting, as it can very quickly become a total mess.
> Similarly please pay attention not to leave trailing spaces that make Git
> complain:
>
>   Applying: MINOR: sample: converter: add JSON Path handling
>   .git/rebase-apply/patch:39: trailing whitespace.
>    - number : When the JSON value is a number then will the value be
>   .git/rebase-apply/patch:40: trailing whitespace.
>        converted to a string. If you know that the value is a
>   .git/rebase-apply/patch:41: trailing whitespace.
>        integer then can you help haproxy to convert the value
>   .git/rebase-apply/patch:46: trailing whitespace.
>    This converter extracts the value located at from the JSON
>   .git/rebase-apply/patch:47: trailing whitespace.
>    string in the input value.
>   warning: squelched 10 whitespace errors
>   warning: 15 lines add whitespace errors.
>
> All these lines are easily noticed this way:
>
>   $ git show | grep -c '^+.*\s$'
>   15
>
> A good way to avoid this once and for all is to enable colors in Git and
> to always make sure not to leave red areas in "git diff" or "git show":
>
>   $ git config --global color.ui true

Cool tip, I have set it now.

> And even if it's of low importance for the code itself, it's particularly
> important in a review because such cosmetic issues constantly remind the
> reader that the patch is far from being final, so it's possibly not yet
> the moment to focus on not critically important stuff. Thus in the end
> they increase the number of round trips.

Thanks. I will take care about it.

> > The doc will be enhanced, but I have a question about this sequence. It
> > should write the double value to the string, but I think I have some
> > issue here.
> >
> > ```
> > printf("\n>>>DOUBLE rc:%d: double:%f:\n", rc, double_val);
> > trash->size = snprintf(trash->area, trash->data, "%g", double_val);
> > smp->data.u.str = *trash;
> > smp->data.type = SMP_T_STR;
> > ```
>
> Yeah, as Tim mentioned, you mixed size and data. "data" is the amount of
> data bytes used in a chunk. "size" is its allocated size.

Fixed now, you can see it in the snippet below.

From 8cb1bc4aaedd17c7189d4985a57f662ab1b533a4 Mon Sep 17 00:00:00 2001
From: Aleksandar Lazic
Date: Mon, 12 Apr 2021 22:01:04 +0200
Subject: [PATCH] MINOR: sample: converter: add JSON Path handling

> > With json_path can a JSON value be extracted from a header or body
>
> In the final version, please add a few more lines to describe the name of
> the added converter and what it's used for. As a reminder, think that
> you're trying to sell your artwork to me or anyone else who would make you
> proud by backporting your work into their version :-)

Will do it :-)

> > +json_query(<json_path>,[<output_type>])
> > +  The <json_path> is mandatory.
> > +  By default, the following JSON types are recognized.
> > +   - string : This is the default search type and returns a string;
> > +   - number : When the JSON value is a number then the value will be
> > +              converted to a string. If you know that the value is an
> > +              integer then you can help haproxy to convert the value
> > +              to an integer when you add "sint" to the <output_type>;
>
> Just thinking out loud, I looked at the rest of the doc and noticed we
> never mention "sint" anywhere else, so I think it's entirely an internal
> type. However we do mention "int" which is used as the matching method
> for integers, so we could have:
>
>   ... json_query("blah",sint) -m int 12
>
> As such I would find it more natural to call this type "int" so that it
> matches the same as the one used in the match. Maps already use "int" as
> the output type name by the way.
>
> In any case, I too am a bit confused by the need to force an output type.
> As a user, I'd expect the type to be implicit and not to have to know
> about it in the configuration. Of course we can imagine situations where
> we'd want to force the type (like we sometimes do by adding 0 or
> concatenating an empty string for example) but this is still not very
> clear to me if we want it by default. Or maybe when dealing with floats
> where we'd have to decide whether to emit them verbatim as strings or to
> convert them to integers.
>
> But then, could it make sense to also support "strict integers": values
> that can accurately be represented as integers and which are within the
> JSON valid range for integers (-2^53 to 2^53 with no decimal part) ?
> This would then make the converter return nothing in case of violation
> (i.e. risk of losing precision). This would also reject NaN and infinite
> values that the lib supports.

You mean the same check which is done in arith_add().
stable-bot: Bugfixes waiting for a release 2.3 (5), 2.1 (14)
Hi,

This is a friendly bot that watches fixes pending for the next haproxy-stable
release! One such e-mail is sent periodically once patches are waiting in the
last maintenance branch, and an ideal release date is computed based on the
severity of these fixes and their merge date. Responses to this mail must be
sent to the mailing list.

Last release 2.3.9 was issued on 2021-03-30. There are currently 5 patches in
the queue cut down this way:
    - 5 MINOR, first one merged on 2021-03-31

Thus the computed ideal release date for 2.3.10 would be 2021-04-28, which is
in two weeks or less.

Last release 2.1.12 was issued on 2021-03-18. There are currently 14 patches
in the queue cut down this way:
    - 2 MAJOR, first one merged on 2021-04-09
    - 7 MEDIUM, first one merged on 2021-03-31
    - 5 MINOR, first one merged on 2021-03-31

Thus the computed ideal release date for 2.1.13 would be 2021-05-07, which is
in four weeks or less.

The current list of patches in the queue is:
    - 2.1      - MAJOR  : dns: fix null pointer dereference in snr_update_srv_status
    - 2.1      - MAJOR  : dns: disabled servers through SRV records never recover
    - 2.1      - MEDIUM : thread: Fix a deadlock if an isolated thread is marked as harmless
    - 2.1      - MEDIUM : freq_ctr/threads: use the global_now_ms variable
    - 2.1      - MEDIUM : debug/lua: Use internal hlua function to dump the lua traceback
    - 2.1      - MEDIUM : resolvers: Don't release resolution from a requester callbacks
    - 2.1      - MEDIUM : time: make sure to always initialize the global tick
    - 2.1      - MEDIUM : mux-h1: make h1_shutw_conn() idempotent
    - 2.1      - MEDIUM : lua: Always init the lua stack before referencing the context
    - 2.1      - MINOR  : stats: Apply proper styles in HTML status page.
    - 2.1, 2.3 - MINOR  : tcp: fix silent-drop workaround for IPv6
    - 2.1, 2.3 - MINOR  : http_fetch: make hdr_ip() resistant to empty fields
    - 2.3      - MINOR  : ssl: Fix update of default certificate
    - 2.3      - MINOR  : ssl: Add missing free on SSL_CTX in ckch_inst_free
    - 2.1      - MINOR  : http_fetch: make hdr_ip() reject trailing characters
    - 2.1      - MINOR  : resolvers: Unlink DNS resolution to set RMAINT on SRV resolution
    - 2.3      - MINOR  : ssl: Prevent removal of crt-list line if the instance is a default one

--
The haproxy stable-bot is freely provided by HAProxy Technologies to help
improve the quality of each HAProxy release. If you have any issue with these
emails or if you want to suggest some improvements, please post them on the
list so that the solutions suiting the most users can be found.
Re: About the 'Hot Restarts' of haproxy
Le 13/04/2021 à 18:15, John Lauro a écrit :
> Sounds like the biggest part of hot restarts is the cost of leaving the
> old process running as they have a lot of long running TCP connections,
> and if you do a lot of restarts the memory requirements build up. Not
> much of an issue for short lived http requests (although it would be nice
> if keep alive wasn't followed on the old haproxy processes so they could
> die quicker).

Well, AFAIK, Envoy handles hot restarts exactly the same way HAProxy does.
The old process tries to finish processing in-flight requests. The
connections are not kept alive. The old process closes all idle
connections, but it must still wait for the responses to already started
requests. Both Envoy and HAProxy do it this way. Note there is an option to
kill the old process after a given time.

The article is a bit biased and inaccurate because it suggests HAProxy does
not support hot restarts while Envoy does. In fact, the real difference
here is the ability to dynamically add or remove servers with Envoy. Thanks
to this feature, most of the time there is no reason to restart it. Thus,
hot restarts are not an issue anymore. On the HAProxy side, as Willy said,
this feature is under development.

--
Christopher Faulet
Re: About the 'Hot Restarts' of haproxy
On Tue, Apr 13, 2021 at 12:15:33PM -0400, John Lauro wrote:
> Sounds like the biggest part of hot restarts is the cost of leaving the
> old process running as they have a lot of long running TCP connections,
> and if you do a lot of restarts the memory requirements build up. Not
> much of an issue for short lived http requests (although it would be nice
> if keep alive wasn't followed on the old haproxy processes so they could
> die quicker).

Exactly, which is why we've been working towards adding unlimited servers,
as this is essentially the only reason for restarting nowadays, e.g. when
you need to add more servers than you had initially planned and have
already maxed out your server-template reserve :-)

Willy
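The server-template reserve Willy refers to pre-provisions server slots in the configuration so that they can be filled in later without a restart. A minimal example — the backend name and address are placeholders:

```
backend app
    balance roundrobin
    # reserve 10 slots named srv1..srv10; unused slots start disabled and
    # can be populated later (e.g. via DNS service discovery or the
    # runtime API's "set server" commands)
    server-template srv 10 203.0.113.1:80 check disabled
```

Once the reserve is exhausted, the only remedy before dynamic server addition was a reload, which is the restart cost being discussed.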
Re: [RFC PATCH 4/8] MINOR: uri_normalizer: Add a `sort-query` normalizer
Le 13/04/2021 à 18:05, Tim Düsterhus a écrit :
> Christopher,
>
> On 4/13/21 4:59 PM, Christopher Faulet wrote:
> > > +/* Sorts the parameters within the given query string. Returns an ist
> > > + * containing the new path and backed by `trash`, or IST_NULL if `len`
> > > + * is not sufficiently large to store the resulting path.
> > > + */
> > > +struct ist uri_normalizer_query_sort(const struct ist query, const char delim, char *trash, size_t len)
> > > +{
> >
> > I hadn't noticed it till now, but please don't use the "trash" variable
> > name, it is confusing with the trash buffer used almost everywhere.
> > Especially because of my next comment :)
>
> In fact the http-request normalize-uri action will pass a trash buffer
> pointer into the function, that's why I found the name somewhat fitting.
> But the exact method signature is up for discussion anyway (see my other
> email).

It is the caller's point of view. For the normalizers, it is just a
destination buffer, not necessarily a trash buffer. Here, it is just to
avoid confusion with the global variable.

--
Christopher Faulet
Re: [RFC PATCH 3/8] MINOR: uri_normalizer: Add a `dotdot` normalizer to http-request normalize-uri
Le 13/04/2021 à 18:03, Tim Düsterhus a écrit :
> Christopher,
>
> On 4/13/21 4:38 PM, Christopher Faulet wrote:
> > At the end it remains your choice. The function is quite good. I just
> > wonder if it could be valuable to also handle single dot-segments here
> > in addition to double dot-segments. Thus, the normalizer should be
> > renamed "dot-segments" or something similar.
>
> I planned to add a separate normalizer for that. This keeps the functions
> simple and easy to check for correctness. It also allows the
> administrator to cherry-pick exactly the normalizers they desire and that
> do not break their backend. In the old discussion Willy said that not
> everything that might be possible to normalize can actually be normalized
> when combined with legacy software.

Ok, that makes sense.

> > Another point is about the dot encoding. It may be good to handle the
> > encoded dot (%2E), maybe via an option. And IMHO, the way empty
> > segments are handled is a bit counter-intuitive. Calling the
> > "merge-slashes" normalizer first is a solution of course, but this
> > means rewriting the uri twice. We must figure out what is the main
> > expectation for this normalizer. Especially because ignoring empty
> > segments when dot-segments are merged is not exactly the same as
> > merging all slashes.
>
> The percent encoding of the dot will be handled by a 'percent-decode'
> normalizer that decodes percent-encoded unreserved characters (RFC 3986,
> section 2.3). The administrator then first would use the percent-decode
> normalizer, then the merge-slashes normalizer, then the dotdot
> normalizer.

Well, it is a bit different here, because someone could choose not to
decode unreserved characters but still want to handle the encoded dot in
the dotdot normalizer.

> Yes, it means rewriting the URI several times. But it is nice, explicit
> and composes well.

On this point, you're right. It is far cleaner this way.

> We can later figure out whether we want to provide "combined
> normalizers", such as a 'filesystem' normalizer that would combine the
> '.', '..' and '//' normalizers in an efficient way. Adding something like
> that later is easy. Changing the behavior of a normalizer later is hard.

That's true. Depending on feedback, it will be possible to add more
normalizers. I'm fine with that.

> That's why I'd like to keep them simple "Unix style" for now. Make them
> do one thing, make them do it well.
>
> > Note I was first surprised that leading dot-segments were preserved,
> > before reading the 6th patch, because for me it is the most important
> > part. But I'm fine with an option in a way or another.
>
> It's a result of how I approached the development. I wanted to not rebase
> my branch more than necessary. I will probably merge the two patches and
> change the default once the general approach is approved :-)

Well, it is not a problem. You can keep it in two patches if it is easier
for you.

--
Christopher Faulet
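Using the normalizer names from this patch series (they may differ in the final merged version), the composition Tim describes — decode first, then merge, then resolve dot-segments — would look like:

```
# order matters: each rule rewrites the URI before the next one runs
http-request normalize-uri percent-decode
http-request normalize-uri merge-slashes
http-request normalize-uri dotdot
```

The "one thing well" design means each line can be dropped independently if it breaks a legacy backend.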
Re: [RFC PATCH 0/8] URI normalization / Issue #714
Le 13/04/2021 à 17:45, Tim Düsterhus a écrit :
> Christopher,
>
> On 4/13/21 2:41 PM, Christopher Faulet wrote:
> > Sorry for the delay. I'll comment your patches by replying inline when
> > appropriate.
>
> No delay experienced. You said that you'd try this week and it's still
> this week. So this is fine :-)
>
> > The quality of the whole series is pretty good. Thanks for the effort.
> > Just a suggestion to simplify the error handling introduced in the 7th
> > patch. You may use the following prototype for your normalizers:
> >
> >   int uri_normalizer_NAME(const struct ist path, struct ist *dst);
> >
> > returning for instance 0 on success and anything else on error. This
> > way, it is easy to return an enum instead of an integer to be able to
> > handle errors.
>
> Yes, I struggled a bit with the method signature, due to the following
> constraints:
>
> 1. I did not want to allocate the result buffer within the normalizer
>    itself, because this makes the method less flexible with regard to
>    both allocation and freeing.

It is indeed cleaner to allocate it in the caller. We should avoid
responsibility sharing. It is always confusing and leads to bugs.

> 2. I need to somehow pass a size of the target buffer to prevent buffer
>    overflows.

Passing a pointer to an ist is a good way to announce the max size when the
normalizer is called and to update it to set the final size of the new
path/query. By contract, the caller must provide an ist owning an allocated
buffer.

> Thus I can't simply take a `struct ist*` for the destination, as an ist
> cannot communicate the size of the underlying buffer. I could technically
> take a `struct buffer`, but I'd still like the result to reside in an
> ist, because this is what the HTX API expects.

Hum, I don't understand. If you create an ist using the trash buffer this
way:

    struct ist dst = ist2(replace->area, replace->size);

you can pass a pointer to dst. In the normalizer, you can update its size.
It is thus possible to use dst when calling http_replace_req_path() or
http_replace_req_query().

> Your suggested signature would work if I would allocate a trash buffer
> within the normalizer. Should I do this? Is it safe to return a pointer
> to a trash buffer from a function? I remember something about these
> buffers being reused, accidentally destroying the contained information
> if one is not careful.

No, it is indeed a very bad idea. The trash buffer must be allocated in the
caller. You already chose the right way on this point. But you can still
pass a pointer to an ist, locally built from the trash buffer.

--
Christopher Faulet
Re: About the 'Hot Restarts' of haproxy
Sounds like the biggest part of hot restarts is the cost of leaving the old
process running as they have a lot of long running TCP connections, and if
you do a lot of restarts the memory requirements build up. Not much of an
issue for short lived http requests (although it would be nice if keep
alive wasn't followed on the old haproxy processes so they could die
quicker).

On Tue, Apr 13, 2021 at 6:25 AM Willy Tarreau wrote:
> On Tue, Apr 13, 2021 at 01:31:12AM +, Rmrf99 wrote:
> > In this Slack engineering blog post:
> > https://slack.engineering/migrating-millions-of-concurrent-websockets-to-envoy/
> >
> > they replace HAProxy with Envoy for **Hot Restart**, just curious does
> > HAProxy new version will have similar approach? or have better
> > solution (in the future).
>
> So in fact it's not for hot restarts, since we've supported that even
> before envoy was born, it's in order to introduce new servers at run
> time. We do have some ongoing work on this, and some significant parts
> are already available with experimental support:
>
>     https://github.com/haproxy/haproxy/issues/1136
>
> Willy
Re: [RFC PATCH 8/8] MINOR: uri_normalizer: Add a `percent-upper` normalizer
Le 08/04/2021 à 20:59, Tim Duesterhus a écrit :
> Willy, Christopher,
>
> and this final one adds a normalizer to turn the hex digits of percent
> encoding into uppercase. Uppercase is the variant preferred by the URI
> RFC, so this is what we do.

This one looks good.

--
Christopher Faulet
Re: [RFC PATCH 4/8] MINOR: uri_normalizer: Add a `sort-query` normalizer
Christopher,

On 4/13/21 4:59 PM, Christopher Faulet wrote:
> > +/* Sorts the parameters within the given query string. Returns an ist
> > + * containing the new path and backed by `trash`, or IST_NULL if `len`
> > + * is not sufficiently large to store the resulting path.
> > + */
> > +struct ist uri_normalizer_query_sort(const struct ist query, const char delim, char *trash, size_t len)
> > +{
>
> I hadn't noticed it till now, but please don't use the "trash" variable
> name, it is confusing with the trash buffer used almost everywhere.
> Especially because of my next comment :)

In fact the http-request normalize-uri action will pass a trash buffer
pointer into the function, that's why I found the name somewhat fitting.
But the exact method signature is up for discussion anyway (see my other
email).

Best regards
Tim Düsterhus
Re: [RFC PATCH 3/8] MINOR: uri_normalizer: Add a `dotdot` normalizer to http-request normalize-uri
Christopher,

On 4/13/21 4:38 PM, Christopher Faulet wrote:
> At the end it remains your choice. The function is quite good. I just
> wonder if it could be valuable to also handle single dot-segments here in
> addition to double dot-segments. Thus, the normalizer should be renamed
> "dot-segments" or something similar.

I planned to add a separate normalizer for that. This keeps the functions
simple and easy to check for correctness. It also allows the administrator
to cherry-pick exactly the normalizers they desire and that do not break
their backend. In the old discussion Willy said that not everything that
might be possible to normalize can actually be normalized when combined
with legacy software.

> Another point is about the dot encoding. It may be good to handle the
> encoded dot (%2E), maybe via an option. And IMHO, the way empty segments
> are handled is a bit counter-intuitive. Calling the "merge-slashes"
> normalizer first is a solution of course, but this means rewriting the
> uri twice. We must figure out what is the main expectation for this
> normalizer. Especially because ignoring empty segments when dot-segments
> are merged is not exactly the same as merging all slashes.

The percent encoding of the dot will be handled by a 'percent-decode'
normalizer that decodes percent-encoded unreserved characters (RFC 3986,
section 2.3). The administrator then first would use the percent-decode
normalizer, then the merge-slashes normalizer, then the dotdot normalizer.
Yes, it means rewriting the URI several times. But it is nice, explicit and
composes well.

We can later figure out whether we want to provide "combined normalizers",
such as a 'filesystem' normalizer that would combine the '.', '..' and '//'
normalizers in an efficient way. Adding something like that later is easy.
Changing the behavior of a normalizer later is hard. That's why I'd like to
keep them simple "Unix style" for now. Make them do one thing, make them do
it well.

> Note I was first surprised that leading dot-segments were preserved,
> before reading the 6th patch, because for me it is the most important
> part. But I'm fine with an option in a way or another.

It's a result of how I approached the development. I wanted to not rebase
my branch more than necessary. I will probably merge the two patches and
change the default once the general approach is approved :-)

Best regards
Tim Düsterhus
Re: [RFC PATCH 7/8] MINOR: uri_normalizer: Support returning detailed errors from uri normalization
Le 08/04/2021 à 20:59, Tim Duesterhus a écrit :
> Willy, Christopher,
>
> this is in preparation for the next normalizer which normalizes the
> character case of the percent encoding. The resources I found are not
> clear on whether a percent that is not followed by two hex digits is
> valid or not. Most browsers and servers appear to support it, so I opted
> to leave the user a choice whether to reject it or not.

It is probably a good choice.

> To support this I need to differentiate between "normalizing failed for
> internal reasons" and "normalizing failed, because the user sent
> garbage". Overall I am not happy with this patch, because the control
> flow is ugly. I also wasn't sure in which cases I need to increase server
> counters (e.g. failed_rewrites) or not. I'd be happy if you could give
> some advice!

The function may be simplified. I'll propose you a new version, based on my
other comments.

[...]

> +enum uri_normalizer_err {
> +	URI_NORMALIZER_ERR_NONE = 0,
> +	URI_NORMALIZER_ERR_TRASH,
> +	URI_NORMALIZER_ERR_ALLOC,
> +	URI_NORMALIZER_ERR_INVALID_INPUT,
> +	URI_NORMALIZER_ERR_BUG = 0xdead,
> +};

The URI_NORMALIZER_ERR_BUG value is useless. And I guess it could be useful
to have a rewrite error too.

[...]

>   fail_rewrite:
>   	_HA_ATOMIC_ADD(&sess->fe->fe_counters.failed_rewrites, 1);
>   	if (s->flags & SF_BE_ASSIGNED)
> @@ -300,6 +301,24 @@ static enum act_return http_action_normalize_uri(struct act_rule *rule, struct p
>   		s->flags |= SF_ERR_PRXCOND;
>   	}
>   	goto leave;
> +
> +  err:
> +	switch (err) {
> +	case URI_NORMALIZER_ERR_NONE:

You must break here, otherwise an error is returned on success.

> +	case URI_NORMALIZER_ERR_BUG:
> +	case URI_NORMALIZER_ERR_TRASH:
> +		/* These must not happen. */
> +		BUG_ON("Logic error during URI normalization.");
> +		ret = ACT_RET_ERR;
> +		goto leave;
> +	case URI_NORMALIZER_ERR_ALLOC:
> +		goto fail_alloc;
> +	case URI_NORMALIZER_ERR_INVALID_INPUT:
> +		ret = ACT_RET_INV;
> +		goto leave;
> +	}
> +
> +	my_unreachable();
>   }

You may rewrite http_action_normalize_uri() this way (on top of your
series; the normalizers must be adapted). It is untested and not compiled:

```
static enum act_return http_action_normalize_uri(struct act_rule *rule, struct proxy *px,
                                                 struct session *sess, struct stream *s, int flags)
{
	enum act_return ret = ACT_RET_CONT;
	struct htx *htx = htxbuf(&s->req.buf);
	const struct ist uri = htx_sl_req_uri(http_get_stline(htx));
	const struct ist path = http_get_path(uri);
	struct ist dst;
	struct buffer *replace = alloc_trash_chunk();
	enum uri_normalizer_err err;

	if (!replace)
		goto fail_alloc;
	if (!isttest(path))
		goto leave;

	dst = ist2(replace->area, replace->size);

	switch ((enum act_normalize_uri) rule->action) {
	case ACT_NORMALIZE_URI_MERGE_SLASHES:
		err = uri_normalizer_path_merge_slashes(iststop(path, '?'), &dst);
		if (err != URI_NORMALIZER_ERR_NONE)
			break;
		if (!http_replace_req_path(htx, dst, 0))
			goto fail_rewrite;
		break;
	case ACT_NORMALIZE_URI_DOTDOT:
	case ACT_NORMALIZE_URI_DOTDOT_FULL:
		err = uri_normalizer_path_dotdot(iststop(path, '?'), &dst,
		                                 rule->action == ACT_NORMALIZE_URI_DOTDOT_FULL);
		if (err != URI_NORMALIZER_ERR_NONE)
			break;
		if (!http_replace_req_path(htx, dst, 0))
			goto fail_rewrite;
		break;
	case ACT_NORMALIZE_URI_SORT_QUERY:
		err = uri_normalizer_query_sort(istfind(path, '?'), &dst, '&');
		if (err != URI_NORMALIZER_ERR_NONE)
			break;
		if (!http_replace_req_query(htx, dst))
			goto fail_rewrite;
		break;
	case ACT_NORMALIZE_URI_PERCENT_UPPER:
	case ACT_NORMALIZE_URI_PERCENT_UPPER_STRICT:
		err = uri_normalizer_percent_upper(path, &dst,
		                                   rule->action == ACT_NORMALIZE_URI_PERCENT_UPPER_STRICT);
		if (err != URI_NORMALIZER_ERR_NONE)
			break;
		if (!http_replace_req_path(htx, dst, 1))
			goto fail_rewrite;
		break;
	}

	switch (err) {
	case URI_NORMALIZER_ERR_NONE:
		break;
	case URI_NORMALIZER_ERR_BUG:
	case URI_NORMALIZER_ERR_TRASH:
		/* These must not happen. */
		BUG_ON("Logic error during URI normalization.");
		ret = ACT_RET_ERR;
		break;
	case URI_NORMALIZER_ERR_ALLOC:
		goto fail_alloc;
	case URI_NORMALIZER_ERR_INVALID_INPUT:
```
Re: [RFC PATCH 0/8] URI normalization / Issue #714
Christopher, On 4/13/21 2:41 PM, Christopher Faulet wrote: > Sorry for the delay. I'll comment your patches by replying inline when appropriate. No delay experienced. You said that you'd try this week and it's still this week. So this is fine :-) > The quality of the whole series is pretty good. Thanks for the effort. Just a suggestion to simplify the error handling introduced in the 7th patch. You may use the following prototype for your normalizers: int uri_normalizer_NAME(const struct ist path, struct ist *dst); returning for instance 0 on success and anything else on error. This way, it is easy to return an enum instead of an integer to be able to handle errors. Yes, I struggled a bit with the method signature, due to the following constraints: 1. I did not want to allocate the result buffer within the normalizer itself, because this makes the method less flexible with regard to both allocation and freeing. 2. I need to somehow pass a size of the target buffer to prevent buffer overflows. Thus I can't simply take a `struct ist*` for the destination, as an ist cannot communicate the size of the underlying buffer. I could technically take a `struct buffer`, but I'd still like the result to reside in an ist, because this is what the HTX API expects. Your suggested signature would work if I allocated a trash buffer within the normalizer. Should I do this? Is it safe to return a pointer to a trash buffer from a function? I remember something about these buffers being reused, accidentally destroying the contained information if one is not careful. Best regards Tim Düsterhus
Re: [PATCH] JWT payloads break b64dec convertor
On Tue, Apr 13, 2021 at 04:44:38PM +0200, Moemen MHEDHBI wrote: > > But then how about having just *your* functions > > without relying on the other ones ? Now that you've extended the existing > > function, you can declare it yours, remove all the configurable stuff and > > keep the simplified version as the one you need. I'm sure it will be the > > best tradeoff overall. > > > > Yes that makes sense to me too, the attached patch deals with it as > suggested. Yeah much cleaner now, indeed. I've merged it, thank you! Willy
Re: [RFC PATCH 6/8] MINOR: uri_normalizer: Add support for supressing leading `../` for dotdot normalizer
On 08/04/2021 at 20:59, Tim Duesterhus wrote: Willy, Christopher, most of the patch is moving around the config parser to support ingesting the new argument. This one looks good. -- Christopher Faulet
Re: [RFC PATCH 5/8] OPTIMIZE: uri_normalizer: Optimize allocations in uri_normalizer_query_sort
On 08/04/2021 at 20:59, Tim Duesterhus wrote: Willy, Christopher, I did not perform any measurements at all. But not reallocating for every parameter should be better :-) This one may be useless if you use the trash buffer to store the query parameters. -- Christopher Faulet
Re: [RFC PATCH 4/8] MINOR: uri_normalizer: Add a `sort-query` normalizer
On 08/04/2021 at 20:59, Tim Duesterhus wrote: Willy, Christopher, This one comes with dynamic allocation. The next patch will add an optimization for a small number of arguments. However dynamic allocation within the main processing logic is pretty ugly, so this should be looked at further. Dynamic allocation should be avoided here. Definitely. I may propose an alternative by using the trash buffer area to store the ist array, via a cast. It may be considered a bit ugly at first glance, but if you think about it as a trash memory space, it is not so shocking. We've already used this trick for the HTX and regular buffers. If you do it that way, by default, it gives you 1024 slots. And you may return an alloc failure beyond this value. It seems reasonable :) +/* Compares two query parameters by name. Query parameters are ordered + * as with memcmp. Shorter parameter names are ordered lower. Identical + * parameter names are compared by their pointer to maintain a stable + * sort. + */ +static int query_param_cmp(const void *a, const void *b) +{ + const struct ist param_a = *(struct ist*)a; + const struct ist param_b = *(struct ist*)b; + const struct ist param_a_name = iststop(param_a, '='); + const struct ist param_b_name = iststop(param_b, '='); + + int cmp = memcmp(istptr(param_a_name), istptr(param_b_name), MIN(istlen(param_a_name), istlen(param_b_name))); + + if (cmp != 0) + return cmp; + + if (istlen(param_a_name) < istlen(param_b_name)) + return -1; + + if (istlen(param_a_name) > istlen(param_b_name)) + return 1; + + if (istptr(param_a) < istptr(param_b)) + return -1; + + if (istptr(param_a) > istptr(param_b)) + return 1; + + return 0; +} To simplify this, you may use istdiff instead. + +/* Sorts the parameters within the given query string. Returns an ist containing + * the new path and backed by `trash` or IST_NULL if `len` is not sufficiently + * large to store the resulting path. 
+ */ +struct ist uri_normalizer_query_sort(const struct ist query, const char delim, char *trash, size_t len) +{ I hadn't noticed it till now, but please don't use "trash" variable name, it is confusing with the trash buffer used almost everywhere. Especially because of my next comment :) + struct ist scanner = istadv(query, 1); + struct ist *params = NULL; + struct ist newquery = ist2(trash, 0); + size_t param_count = 0; + size_t i; + + if (len < istlen(query)) + return IST_NULL; + + while (istlen(scanner) > 0) { + const struct ist param = istsplit(&scanner, delim); + struct ist *realloc = reallocarray(params, param_count + 1, sizeof(*realloc)); Here, instead of a dynamic array, you may use the trash buffer area (declared from outside the loop). For instance: struct ist *params = (struct ist *)b_orig(&trash); size_t max_param = b_size(&trash) / sizeof(*params); + + if (!realloc) + goto fail; + + params = realloc; + + params[param_count] = param; + param_count++; + } + + qsort(params, param_count, sizeof(*params), query_param_cmp); + + newquery = __istappend(newquery, '?'); + for (i = 0; i < param_count; i++) { + if (i > 0) + newquery = __istappend(newquery, '&'); + + if (istcat(&newquery, params[i], len) < 0) + goto fail; + } + + free(params); + + return newquery; + + fail: + free(params); + + return IST_NULL; +} /* * Local variables: -- Christopher Faulet
Re: [PATCH] JWT payloads break b64dec convertor
On 13/04/2021 11:39, Willy Tarreau wrote: >> You can find attached the patches 0001-bis and 0002-bis modifying the >> existing functions (introducing an url flag) to see how it looks like. >> This solution may be cleaner (no chunk allocation and we don't loop >> twice over input string) but has the drawbacks of being intrusive with >> the rest of the code and less clearer imo regarding how url variant is >> different from standard base64. > > I agree they're not pretty due to the change of logic around the padding, > thanks for having tested! But then how about having just *your* functions > without relying on the other ones ? Now that you've extended the existing > function, you can declare it yours, remove all the configurable stuff and > keep the simplified version as the one you need. I'm sure it will be the > best tradeoff overall. > Yes that makes sense to me too, the attached patch deal with it as suggested. >> diff --git a/src/base64.c b/src/base64.c >> index 53e4d65b2..f2768d980 100644 >> --- a/src/base64.c >> +++ b/src/base64.c >> @@ -1,5 +1,5 @@ >> /* >> - * ASCII <-> Base64 conversion as described in RFC1421. >> + * ASCII <-> Base64 conversion as described in RFC4648. >> * >> * Copyright 2006-2010 Willy Tarreau >> * Copyright 2009-2010 Krzysztof Piotr Oledzki >> @@ -17,50 +17,70 @@ >> #include >> #include >> >> -#define B64BASE '#' /* arbitrary chosen base value */ >> -#define B64CMIN '+' >> -#define B64CMAX 'z' >> -#define B64PADV 64 /* Base64 chosen special pad value */ >> +#define B64BASE '#' /* arbitrary chosen base value */ >> +#define B64CMIN '+' >> +#define UB64CMIN '-' >> +#define B64CMAX 'z' >> +#define B64PADV 64 /* Base64 chosen special pad value */ > > Please do not needlessly reindent code parts for no reason. It seems that > only the "-" was added there, the rest shouldn't change. 
The reason is I was following the doc/coding-style where alignment should use spaces, but since the existing block was aligned via tabs, I thought about fixing that instead of repeating the issue. I understand though that in such a case it's better to have this in a separate commit, so I stuck with the tab alignment. > By the way, contrib/ was moved to dev/ during your changes so if you keep > this comment please update it. Done. On 13/04/2021 08:19, Jarno Huuskonen wrote: > Could you add a cross reference from b64dec/base64 to ub64dec/ub64enc in > configuration.txt. Done thanks. -- Moemen >From b526416364b98afaa2d2b421fbf27f80bc4e8732 Mon Sep 17 00:00:00 2001 From: Moemen MHEDHBI Date: Fri, 2 Apr 2021 01:05:07 +0200 Subject: [PATCH 2/2] CLEANUP: align samples list in sample.c --- src/sample.c | 54 ++-- 1 file changed, 27 insertions(+), 27 deletions(-) diff --git a/src/sample.c b/src/sample.c index 04635a91f..7337ba06a 100644 --- a/src/sample.c +++ b/src/sample.c @@ -4129,33 +4129,33 @@ INITCALL1(STG_REGISTER, sample_register_fetches, &smp_kws); /* Note: must not be declared as its list will be overwritten */ static struct sample_conv_kw_list sample_conv_kws = {ILH, { - { "debug", sample_conv_debug, ARG2(0,STR,STR), smp_check_debug, SMP_T_ANY, SMP_T_ANY }, - { "b64dec", sample_conv_base642bin,0,NULL, SMP_T_STR, SMP_T_BIN }, - { "base64", sample_conv_bin2base64,0,NULL, SMP_T_BIN, SMP_T_STR }, - { "ub64dec", sample_conv_base64url2bin,0,NULL, SMP_T_STR, SMP_T_BIN }, - { "ub64enc", sample_conv_bin2base64url,0,NULL, SMP_T_BIN, SMP_T_STR }, - { "upper", sample_conv_str2upper, 0,NULL, SMP_T_STR, SMP_T_STR }, - { "lower", sample_conv_str2lower, 0,NULL, SMP_T_STR, SMP_T_STR }, - { "length", sample_conv_length,0,NULL, SMP_T_STR, SMP_T_SINT }, - { "hex",sample_conv_bin2hex, 0,NULL, SMP_T_BIN, SMP_T_STR }, - { "hex2i", sample_conv_hex2int, 0,NULL, SMP_T_STR, SMP_T_SINT }, - { "ipmask", sample_conv_ipmask,ARG2(1,MSK4,MSK6), NULL, SMP_T_ADDR, SMP_T_IPV4 }, - { "ltime", 
sample_conv_ltime, ARG2(1,STR,SINT), NULL, SMP_T_SINT, SMP_T_STR }, - { "utime", sample_conv_utime, ARG2(1,STR,SINT), NULL, SMP_T_SINT, SMP_T_STR }, - { "crc32", sample_conv_crc32, ARG1(0,SINT), NULL, SMP_T_BIN, SMP_T_SINT }, - { "crc32c", sample_conv_crc32c,ARG1(0,SINT), NULL, SMP_T_BIN, SMP_T_SINT }, - { "djb2", sample_conv_djb2, ARG1(0,SINT), NULL, SMP_T_BIN, SMP_T_SINT }, - { "sdbm", sample_conv_sdbm, ARG1(0,SINT), NULL, SMP_T_BIN, SMP_T_SINT }, - { "wt6",sample_conv_wt6, ARG1(0,SINT), NULL, SMP_T_BIN, SMP_T_SINT }, - { "xxh3", sample_conv_xxh3, ARG1(0,SINT), NULL, SMP_T_BIN, SMP_T_SINT }, - { "xxh32", sample_conv_xxh32, ARG1(0,SINT), NULL, SMP_T_BIN, SMP_T_SINT }, - { "xxh64", sample_conv_xxh64, ARG1(0,SINT), NULL, SMP_T_BIN, SMP_T_SINT }, - { "json", sample_conv_json, ARG1(1,STR), sample_conv_json_ch
Re: [RFC PATCH 3/8] MINOR: uri_normalizer: Add a `dotdot` normalizer to http-request normalize-uri
On 08/04/2021 at 20:59, Tim Duesterhus wrote: Willy, Christopher, I'm not very happy with the normalization logic, because I am processing the URI in reverse. This requires me to directly access offsets instead of using the `ist` API. However this way I don't need to backtrack once I encounter a `../`, which I consider to be a win. It is not shocking. The function is readable, it is not a real problem. Maybe we can introduce the istoff() function to get the pointer at a given offset. You may also choose to fully rely on pointers with a negative index. I know you want to use the ist api as far as possible, but it is not always the easiest way :) At the end it remains your choice. The function is quite good. I just wonder if it could be valuable to also handle single dot-segments here in addition to double dot-segments. Thus, the normalizer should be renamed "dot-segments" or something similar. Another point is about the dot encoding. It may be good to handle encoded dots (%2E), maybe via an option. And IMHO, the way empty segments are handled is a bit counter-intuitive. Calling the "merge-slashes" normalizer first is a solution of course, but this means rewriting the uri twice. We must figure out what the main expectation for this normalizer is. Especially because ignoring empty segments when dot-segments are merged is not exactly the same as merging all slashes. Note I was first surprised that leading dot-segments were preserved, before reading the 6th patch, because for me it is the most important part. But I'm fine with an option one way or another. -- Christopher Faulet
Re: [RFC PATCH 2/8] MINOR: uri_normalizer: Add `http-request normalize-uri`
On 08/04/2021 at 20:59, Tim Duesterhus wrote: Willy, Christopher, something simple for a start. This one adds the http-request action and a very simple normalizer to test whether it works. Turns out it does :-) You can see the new `ist` helpers in action already. I'm pretty happy that I was able to implement this completely with the new `ist` API. Just small comments but otherwise, this one looks good. [...] +/* Parses the http-request normalize-uri action. It expects a single + * argument, corresponding to a value in `enum act_normalize_uri`. + * + * It returns ACT_RET_PRS_OK on success, ACT_RET_PRS_ERR on error. + */ +static enum act_parse_ret parse_http_normalize_uri(const char **args, int *orig_arg, struct proxy *px, + struct act_rule *rule, char **err) +{ + int cur_arg = *orig_arg; + + rule->action_ptr = http_action_normalize_uri; + rule->release_ptr = NULL; + + if (!*args[cur_arg] || + (*args[cur_arg + 1] && strcmp(args[cur_arg + 1], "if") != 0 && strcmp(args[cur_arg + 1], "unless") != 0)) { Testing "if" or "unless" is useless here. If no normalizer name is provided, this will be caught in the else statement. Especially because this test will be removed by the 6th patch. + memprintf(err, "expects exactly 1 argument "); + return ACT_RET_PRS_ERR; + } + + if (strcmp(args[cur_arg], "merge-slashes") == 0) { + rule->action = ACT_NORMALIZE_URI_MERGE_SLASHES; + } + else { + memprintf(err, "unknown normalizer '%s'", args[cur_arg]); + return ACT_RET_PRS_ERR; The list of supported normalizers may help the user here. + } + cur_arg++; + + *orig_arg = cur_arg; + return ACT_RET_PRS_OK; +} + -- Christopher Faulet
Re: [RFC PATCH 1/8] MINOR: uri_normalizer: Add uri_normalizer module
On 08/04/2021 at 20:59, Tim Duesterhus wrote: Willy, Christopher, I used uri_auth.[ch] as the basis for the source file structure (comments and stuff). Thanks, nothing to say about this one :) -- Christopher Faulet
Re: [RFC PATCH 0/8] URI normalization / Issue #714
On 08/04/2021 at 20:59, Tim Duesterhus wrote: Willy, Christopher, Not sure which of you is better suited to review this series, so I'm adding both of you :-) I'm tagging this as RFC, because it's large and quite a bit outside of my comfort zone. However the patches are as clean as possible. They include full documentation and each normalizer comes with a bunch of reg-tests ensuring they behave like I expect them to behave. So if you have nothing to complain about, then this series is in a mergeable state. I plan to add a few more normalizers after this passes an initial review. I'll add additional remarks to each patch, explaining my decisions in more detail. Best regards Tim Düsterhus (8): MINOR: uri_normalizer: Add uri_normalizer module MINOR: uri_normalizer: Add `http-request normalize-uri` MINOR: uri_normalizer: Add a `dotdot` normalizer to http-request normalize-uri MINOR: uri_normalizer: Add a `sort-query` normalizer OPTIMIZE: uri_normalizer: Optimize allocations in uri_normalizer_query_sort MINOR: uri_normalizer: Add support for supressing leading `../` for dotdot normalizer MINOR: uri_normalizer: Support returning detailed errors from uri normalization MINOR: uri_normalizer: Add a `percent-upper` normalizer Makefile | 3 +- doc/configuration.txt | 58 + include/haproxy/action-t.h | 9 + include/haproxy/uri_normalizer-t.h | 32 +++ include/haproxy/uri_normalizer.h | 33 +++ reg-tests/http-rules/normalize_uri.vtc | 314 + src/http_act.c | 213 + src/uri_normalizer.c | 298 +++ 8 files changed, 959 insertions(+), 1 deletion(-) create mode 100644 include/haproxy/uri_normalizer-t.h create mode 100644 include/haproxy/uri_normalizer.h create mode 100644 reg-tests/http-rules/normalize_uri.vtc create mode 100644 src/uri_normalizer.c Tim, Sorry for the delay. I'll comment your patches by replying inline when appropriate. The quality of the whole series is pretty good. Thanks for the effort. Just a suggestion to simplify the error handling introduced in the 7th patch. 
You may use the following prototype for your normalizers: int uri_normalizer_NAME(const struct ist path, struct ist *dst); returning for instance 0 on success and anything else on error. This way, it is easy to return an enum instead of an integer to be able to handle errors. -- Christopher Faulet
httpd ProxyPass / ProxyPassReverse to haproxy 2.3.9 ?
Hi list, Moving from httpd (reverse proxy) to haproxy 2.3.9. In httpd I have the following: ServerName test.server.dk ProxyPreserveHost On ProxyPass / http://localhost:8080/aaa/bbb/ ProxyPassReverse / http://localhost:8080/aaa/bbb/ What is the correct way to do this in haproxy 2? Something with http-request replace-uri? Thomas
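One common way to express the above in HAProxy 2.x is sketched below. This is an untested illustration, not an authoritative answer from the thread; the hostnames and paths are taken from the question:

```
frontend fe_test
    bind :80
    default_backend be_test

backend be_test
    # ProxyPreserveHost On: HAProxy forwards the original Host header by default
    # ProxyPass / http://localhost:8080/aaa/bbb/ : prepend the path prefix
    http-request set-path /aaa/bbb%[path]
    # ProxyPassReverse: strip the prefix from Location headers in redirects
    http-response replace-header Location ^(https?://[^/]+)?/aaa/bbb(/.*)$ \2
    server app 127.0.0.1:8080
```

`http-request set-path` rewrites only the path portion, and `http-response replace-header` applies a regex with backreferences to the returned header, which together roughly mirror the ProxyPass / ProxyPassReverse pair.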
Re: About the 'Hot Restarts' of haproxy
On Tue, Apr 13, 2021 at 01:31:12AM +, Rmrf99 wrote: > In this Slack engineering blog post: > https://slack.engineering/migrating-millions-of-concurrent-websockets-to-envoy/ > > they replace HAProxy with Envoy for **Hot Restart**, just curious does > HAProxy new version will have similar approach? or have better solution(in > the future). So in fact it's not for hot restarts, since we've supported that even before envoy was born, it's in order to introduce new servers at run time. We do have some ongoing work on this, and some significant parts are already available with experimental support: https://github.com/haproxy/haproxy/issues/1136 Willy
Re: 2Mrps : kudos to the team
On Sat, Apr 10, 2021 at 01:34:16PM +0200, Ionel GARDAIS wrote: > Hi team, list, > > If you haven't already read it, go read this blog article : > https://www.haproxy.com/blog/haproxy-forwards-over-2-million-http-requests-per-second-on-a-single-aws-arm-instance/ > > Such a milestone ! > It's time to take time to thank all the great work achieved by the HAProxy > team over the years. > Thanks Willy for your vision and for sticking with the Unix philosophy: make > each program do one thing well. > Thanks Tim, William, Aleksandar, Lukas, Christopher and all those working > day-to-day to make HAProxy such a nice piece of code. > > *claps claps claps* Thank you Ionel :-) There's still a lot of room for improvement in many areas, but what this test showed is that we're on the right track regarding the thread-level scalability, and that provided that we remain careful and continue to address the remaining bottlenecks as they are met, nothing should stop us from adding more features and more flexibility. Cheers, Willy
Re: [PATCH] BUG/MINOR: sample: Fix adjusting size in field converter
Hi Thayne, On Sun, Apr 11, 2021 at 11:26:59PM -0600, Thayne McCombs wrote: > Adjust the size of the sample buffer before we change the "area" > pointer. The change in size is calculated as the difference between the > original pointer and the new start pointer. But since the > `smp->data.u.str.area` assignment results in `smp->data.u.str.area` and > `start` being the same pointer, we always ended up subtracting zero. > This changes it to adjust the size by the actual amount it changed. > > I'm not entirely sure what the impact of this is, but the previous code > seemed wrong. So I carefully reviewed it, and not only you're totally right, but I could figure out in which case it is harmful. All accesses limit themselves to the amount of data except one, the binary key padding for a stick table. So it is technically possible to use it to write zeroes past the end of the string in such a construct where is of type binary with keys at least as large as your buffers (lots of 'if') : hdr(foo),field(2,:),in_table(table) Thus I tagged it "MEDIUM" in the end. Thank you! Willy
Re: [PATCH] MINOR: sample: Add a regmatch converter
Hi Thayne, On Tue, Apr 13, 2021 at 02:11:25AM -0600, Thayne McCombs wrote: > Add a new sample converter that finds the first regex match and returns > the substring for that match, or a capture group, if an index is > provided. > --- > doc/configuration.txt| 22 +++ > reg-tests/converter/regmatch.vtc | 39 +++ > src/sample.c | 66 > 3 files changed, 127 insertions(+) > create mode 100644 reg-tests/converter/regmatch.vtc > > diff --git a/doc/configuration.txt b/doc/configuration.txt > index f21a29a68..e84395d23 100644 > --- a/doc/configuration.txt > +++ b/doc/configuration.txt > @@ -16238,6 +16238,28 @@ protobuf(,[]) >More information may be found here about the protocol buffers message > field types: >https://developers.google.com/protocol-buffers/docs/encoding > > +regmatch([,[,]]) > + This extracts a substring that matches the regex pattern. It will return > the first > + match in the input string. By default it returns the entire match, but if > > + is supplied, then the capture group for that number will be returned > instead. A > + value of 0 returns the entire match. The regex can be made case > insensitive by > + adding the flag "i" in . > + > + It is highly recommended to enclose the regex part using protected quotes > to > + improve clarity and never have a closing parenthesis from the regex mixed > up with > + the parenthesis from the function. Just like in Bourne shell, the first > level of > + quotes is processed when delimiting word groups on the line, a second > level is > + usable for argument. It is recommended to use single quotes outside since > these > + ones do not try to resolve backslashes nor dollar signs. 
> + > + Examples: > + > + # extract part of content-type > + http-request set-var(txn.imtype) > 'hdr(content-type),regmatch("image/(.*)",1)' > + > + # extract cookie with certain pattern > + http-request set-header x-test-cookie > %[hdr(cookie),'regmatch(test-\w+=\d+)'] > + I'm failing to see how it differs from regsub() which already does the same with the reference (you'd write \1 instead of 1) and also allows to compose something out of multiple matches. Am I missing something, or a case where regsub() is really not convenient compared to this one ? Thanks, Willy
Re: [PATCH] JWT payloads break b64dec convertor
Hi Jarno, On Tue, Apr 13, 2021 at 06:19:47AM +, Jarno Huuskonen wrote: > Hello, > > On Tue, 2021-04-06 at 01:58 +0200, Moemen MHEDHBI wrote: > > Thanks Willy and Tim for your feedback. > > > > You can find attached the updated patches with fixed coding style (now > > set correctly in my editor), updated commit message, entry doc in sorted > > order, size_t instead of int in both enc/dec and corresponding reg-test. > > Could you add a cross reference from b64dec/base64 to ub64dec/ub64enc in > configuration.txt. Something like: > --- a/doc/configuration.txt > +++ b/doc/configuration.txt > @@ -15020,11 +15020,14 @@ and() > b64dec >Converts (decodes) a base64 encoded input string to its binary >representation. It performs the inverse operation of base64(). > + For base64url("URL and Filename Safe Alphabet" (RFC 4648)) variant > + see "ub64dec". > > base64 >Converts a binary input sample to a base64 string. It is used to log or >transfer binary content in a way that can be reliably transferred (e.g. > - an SSL ID can be copied in a header). > + an SSL ID can be copied in a header). For base64url("URL and Filename > Safe > + Alphabet" (RFC 4648)) variant see "ub64enc". > > bool >Returns a boolean TRUE if the input value of type signed integer is Yes very good point indeed, and similarly the ub64 ones should refer to these ones. Willy
Re: [PATCH] JWT payloads break b64dec convertor
Hi Moemen, On Tue, Apr 13, 2021 at 12:41:39AM +0200, Moemen MHEDHBI wrote: > >> in such case should we rather use dynamic allocation ? > > > > No, there are two possible approaches. One of them is to use a trash > > buffer using get_trash_chunk(). The trash buffers are "large enough" > > for anything that comes from outside. A second, cleaner solution > > simply consists in not using a temporary buffer but doing the conversion > > on the fly. Indeed, looking closer, what the function does is to first > > replace a few chars on the whole chain to then call the base64 conversion > > function. So it doubles the work on the string and one side effect of > > this double work is that you need temporary storage. > > The url variant is not only about a different alphabet that needs to be > replaced but is also a non-padding variant. So the straightforward > algorithm for decoding it is to add the padding to the input encoded in > the url variant and then use the standard base64 decoder. Ah OK I wasn't aware of this! > Even doing this on the fly requires extending the input with two more bytes > max. Unless I am missing something, in such a case on-the-fly conversion > will result in an out-of-bounds array access. There is never a reason for an out-of-bounds access but of course it will require to add the checks everywhere while right now it knows that it can read 4 bytes at once. So yeah, that would result in two different paths and that's a pain. > That's why I have copied the input into an "inlen+2" string. Got it, that makes sense. > In the end I have updated the patch to handle extending the input in the decoding > function via get_trash_chunk to make sure a buffer of size input+2 is > available. > > You can find attached the patches 0001 and 0002 for this implementation. We could do with that, I just still find it a bit awkward to write so much code to modify an input and adapt it to a parser instead of writing a parser. That's more operations and more code to inspect in case of trouble. 
> You can find attached the patches 0001-bis and 0002-bis modifying the > existing functions (introducing an url flag) to see how it looks like. > This solution may be cleaner (no chunk allocation and we don't loop > twice over input string) but has the drawbacks of being intrusive with > the rest of the code and less clearer imo regarding how url variant is > different from standard base64. I agree they're not pretty due to the change of logic around the padding, thanks for having tested! But then how about having just *your* functions without relying on the other ones ? Now that you've extended the existing function, you can declare it yours, remove all the configurable stuff and keep the simplified version as the one you need. I'm sure it will be the best tradeoff overall. > diff --git a/src/base64.c b/src/base64.c > index 53e4d65b2..f2768d980 100644 > --- a/src/base64.c > +++ b/src/base64.c > @@ -1,5 +1,5 @@ > /* > - * ASCII <-> Base64 conversion as described in RFC1421. > + * ASCII <-> Base64 conversion as described in RFC4648. > * > * Copyright 2006-2010 Willy Tarreau > * Copyright 2009-2010 Krzysztof Piotr Oledzki > @@ -17,50 +17,70 @@ > #include > #include > > -#define B64BASE '#' /* arbitrary chosen base value */ > -#define B64CMIN '+' > -#define B64CMAX 'z' > -#define B64PADV 64 /* Base64 chosen special pad value */ > +#define B64BASE '#' /* arbitrary chosen base value */ > +#define B64CMIN '+' > +#define UB64CMIN '-' > +#define B64CMAX 'z' > +#define B64PADV 64 /* Base64 chosen special pad value */ Please do not needlessly reindent code parts for no reason. It seems that only the "-" was added there, the rest shouldn't change. > @@ -73,29 +93,59 @@ int a2base64(char *in, int ilen, char *out, int olen) > * or to be NULL. Returns -1 if is invalid or ilen > * has wrong size, -2 if is too short. > * 1 to 3 output bytes are produced for 4 input bytes. 
> + * The reverse tab used to decode base64 is generated via > /contrib/base64/base64rev-gen.c By the way, contrib/ was moved to dev/ during your changes so if you keep this comment please update it. Thanks, Willy
Re: [PATCH] MINOR: sample: add json_string
Hi Aleks, On Mon, Apr 12, 2021 at 10:09:08PM +0200, Aleksandar Lazic wrote: > Hi. > > another patch which honors the feedback. Thank you. FWIW I agree with all the points reported by Tim. I'll add a few comments and/or suggestions below. On a general note, please be careful about your indenting, as it can very quickly become a total mess. Similarly please pay attention not to leave trailing spaces that make Git complain: Applying: MINOR: sample: converter: add JSON Path handling .git/rebase-apply/patch:39: trailing whitespace. - number : When the JSON value is a number then will the value be .git/rebase-apply/patch:40: trailing whitespace. converted to a string. If you know that the value is a .git/rebase-apply/patch:41: trailing whitespace. integer then can you help haproxy to convert the value .git/rebase-apply/patch:46: trailing whitespace. This converter extracts the value located at from the JSON .git/rebase-apply/patch:47: trailing whitespace. string in the input value. warning: squelched 10 whitespace errors warning: 15 lines add whitespace errors. All these lines are easily noticed this way: $ git show | grep -c '^+.*\s$' 15 A good way to avoid this once and for all is to enable colors in Git and to always make sure not to leave red areas in "git diff" or "git show" : $ git config --global color.ui true And even if it's of low importance for the code itself, it's particularly important in a review because such cosmetic issues constantly remind the reader that the patch is far from being final, so it's possibly not yet the moment to focus on not critically important stuff. Thus in the end they increase the number of round trips. 
> > ``` > printf("\n>>>DOUBLE rc:%d: double:%f:\n",rc, > double_val); > trash->size = snprintf(trash->area, > trash->data, > > "%g",double_val); > smp->data.u.str = *trash; > smp->data.type = SMP_T_STR; > ``` Yeah, as Tim mentioned, you mixed size and data. "data" is the amount of data bytes used in a chunk. "size" is its allocated size. > >From 8cb1bc4aaedd17c7189d4985a57f662ab1b533a4 Mon Sep 17 00:00:00 2001 > From: Aleksandar Lazic > Date: Mon, 12 Apr 2021 22:01:04 +0200 > Subject: [PATCH] MINOR: sample: converter: add JSON Path handling > > With json_path can a JSON value be extacted from a Header > or body In the final version, please add a few more lines to describe the name of the added converter and what it's used for. As a reminder, think that you're trying to sell your artwork to me or anyone else who would make you proud by backporting your work into their version :-) > +json_query(,[]) > + The is mandatory. > + By default will the follwing JSON types recognized. > + - string : This is the default search type and returns a string; > + - number : When the JSON value is a number then will the value be > + converted to a string. If you know that the value is a > + integer then can you help haproxy to convert the value > + to a integer when you add "sint" to the ; Just thinking out loud, I looked at the rest of the doc and noticed we never mention "sint" anywhere else, so I think it's entirely an internal type. However we do mention "int" which is used as the matching method for integers, so we could have: ... json_query("blah",sint) -m int 12 As such I would find it more natural to call this type "int" so that it matches the same as the one used in the match. Maps already use "int" as the output type name by the way. In any case, I too am a bit confused by the need to force an output type. As a user, I'd expect the type to be implicit and not to have to know about it in the configuration. 
Of course we can imagine situations where we'd want to force the type (like we
sometimes do by adding 0 or concatenating an empty string, for example), but
it is still not very clear to me whether we want it by default. Or maybe when
dealing with floats, where we'd have to decide whether to emit them verbatim
as strings or to convert them to integers. But then, could it make sense to
also support "strict integers": values that can accurately be represented as
integers and which are within the JSON valid range for integers (-2^52 to 2^52
with no decimal part)? This would then make the converter return nothing in
case of violation (i.e. a risk of losing precision). This would also reject
the NaN and infinite values that the lib supports. A small detail: in general,
prefer a passive form in the text rather than speaking to the reader using
"you". You'll notice that using this more descr
[PATCH] MINOR: sample: Add a regmatch converter
Add a new sample converter that finds the first regex match and returns
the substring for that match, or a capture group, if an index is
provided.
---
 doc/configuration.txt            | 22 +++
 reg-tests/converter/regmatch.vtc | 39 +++
 src/sample.c                     | 66
 3 files changed, 127 insertions(+)
 create mode 100644 reg-tests/converter/regmatch.vtc

diff --git a/doc/configuration.txt b/doc/configuration.txt
index f21a29a68..e84395d23 100644
--- a/doc/configuration.txt
+++ b/doc/configuration.txt
@@ -16238,6 +16238,28 @@ protobuf(<field_number>,[<field_type>])
   More information may be found here about the protocol buffers message field
   types: https://developers.google.com/protocol-buffers/docs/encoding
 
+regmatch(<regex>[,<group>[,<flags>]])
+  This extracts a substring that matches the regex pattern. It will return the
+  first match in the input string. By default it returns the entire match, but
+  if <group> is supplied, then the capture group for that number will be
+  returned instead. A value of 0 returns the entire match. The regex can be
+  made case insensitive by adding the flag "i" in <flags>.
+
+  It is highly recommended to enclose the regex part using protected quotes to
+  improve clarity and never have a closing parenthesis from the regex mixed up
+  with the parenthesis from the function. Just like in Bourne shell, the first
+  level of quotes is processed when delimiting word groups on the line, a
+  second level is usable for arguments. It is recommended to use single quotes
+  outside since these ones do not try to resolve backslashes nor dollar signs.
+
+  Examples:
+
+      # extract part of content-type
+      http-request set-var(txn.imtype) 'hdr(content-type),regmatch("image/(.*)",1)'
+
+      # extract cookie with certain pattern
+      http-request set-header x-test-cookie %[hdr(cookie),'regmatch(test-\w+=\d+)']
+
 regsub(<regex>,<subst>[,<flags>])
   Applies a regex-based substitution to the input string. It does the same
   operation as the well-known "sed" utility with "s/<regex>/<subst>/".
 By default it will replace in the input string the first occurrence of the
 largest part matching the regular expression <regex> with the substitution
 string <subst>.
 
diff --git a/reg-tests/converter/regmatch.vtc b/reg-tests/converter/regmatch.vtc
new file mode 100644
index 0000000..46df78ee0
--- /dev/null
+++ b/reg-tests/converter/regmatch.vtc
@@ -0,0 +1,39 @@
+varnishtest "regmatch converter Test"
+
+feature ignore_unknown_macro
+
+server s1 {
+    rxreq
+    txresp
+} -repeat 3 -start
+
+haproxy h1 -conf {
+    defaults
+        mode http
+        timeout connect 1s
+        timeout client 1s
+        timeout server 1s
+
+    frontend fe
+        bind "fd@${fe}"
+
+        # requests
+        http-request set-var(txn.match) "path,regmatch('test/(\d+)/',1)"
+        http-response set-header Found %[var(txn.match)] if { var(txn.match) -m found }
+
+        default_backend be
+
+    backend be
+        server s1 ${s1_addr}:${s1_port}
+} -start
+
+client c1 -connect ${h1_fe_sock} {
+    txreq -url "/test/123/456"
+    rxresp
+    expect resp.status == 200
+    expect resp.http.found == "123"
+    txreq -url "/foo"
+    rxresp
+    expect resp.status == 200
+    expect resp.http.found == ""
+} -run
diff --git a/src/sample.c b/src/sample.c
index 835a18115..66d674e3f 100644
--- a/src/sample.c
+++ b/src/sample.c
@@ -2671,6 +2671,71 @@ static int sample_conv_word(const struct arg *arg_p, struct sample *smp, void *p
 	return 1;
 }
 
+static int sample_conv_regmatch_check(struct arg *args, struct sample_conv *conv,
+                                      const char *file, int line, char **err)
+{
+	struct arg *arg = args;
+	char *p;
+	int len;
+
+	if (arg[1].type == ARGT_SINT && (arg[1].data.sint < 0 || arg[1].data.sint > MAX_MATCH)) {
+		memprintf(err, "invalid capture group number %lld. must be between 0 and %d",
+		          arg[1].data.sint, MAX_MATCH);
+		return 0;
+	}
+
+	/* arg0 is a regex, it uses type_flag for ICASE */
+	arg[0].type_flags = 0;
+
+	if (arg[2].type != ARGT_STR)
+		return 1;
+
+	p = arg[2].data.str.area;
+	len = arg[2].data.str.data;
+	while (len) {
+		if (*p == 'i') {
+			arg[0].type_flags |= ARGF_REG_ICASE;
+		}
+		else {
+			memprintf(err, "invalid regex flag '%c', only 'i' is supported", *p);
+			return 0;
+		}
+		p++;
+		len--;
+	}
+	return 1;
+}
+
+/* This sample function is designed to find the first match of a regex in the
+ * input string. If arg1 is supplied, that is used as the capture group to
+ * return (or the whole match if 0).
+ */
+static int sample_conv_regmatch(const struct arg *arg_p, struct sample *smp, void *private)
+{
+	struct my_regex *reg = arg_p[0].data.reg;
+	regmatch_t pmatch[MAX_MATCH];
+	regmatch_t capture;
+	int nmatch = (arg_p[1].type == ARGT_SINT) ? arg_p[1].data.sint : 0;
+	int found;
+
+	found = re