Re: [PATCH v2 0/8] URI normalization / Issue #714
Willy, On 4/20/21 8:40 PM, Willy Tarreau wrote: I perfectly understand that some transformations may require the whole URI for various reasons, but for those that could be expressed on individual parts, I do think that converters will be way more flexible over the long term. Luckily this should be easy enough with the clear split between the action and the actual normalizers that simply take an ist. OK, great! I forgot to issue dev17 last Friday, different work place, change of habits and being busy on the pools mess made me simply forget it. I'll do it for next Friday with some other stuff. If you think there's anything doable by then, just let me know so that I can wait for you, otherwise no stress, it can come at any time. I don't plan on adding sample converters for normalization in 2.4. I think the feature should first get some basic exposure based off the http-request normalize-uri action and then we can look into extensions once feature requests arrive. One bite at a time! The only thing I will do before the final 2.4 release is adding a few more normalizers for use within the http-request normalize-uri action. Most notable the percent decoder for unreserved characters (%61dmin -> admin) still is missing, due to its higher complexity compared to the existing ones. Best regards Tim Düsterhus
Re: [PATCH v2 0/8] URI normalization / Issue #714
On Tue, Apr 20, 2021 at 07:59:15PM +0200, Tim Düsterhus wrote: > Willy, > > On 4/19/21 3:54 PM, Willy Tarreau wrote: > > > I would advice against making them into converters, because it forces the > > > user to think about the appropriate fetch to use. As an example the > > > path-strip-dotdot normalizer probably should not be applied to the query > > > string! The actions hide this type of detail from the user which I > > > consider > > > to be a good thing. > > > > I disagree with this line of reasoning, because developers must never > > decide what users need (and that's something very difficult to do). We > > must only help to figure what users really need (compared to what they > > ask for), and estimate the technical feasibility and the consistency > > with all other parts so that each feature doesn't look like it's been > > developed differently. > > ack. My personal experience is mostly with less experienced end users vs > system administrators and I constantly have to prevent those end users from > shooting themselves into the foot. That's why I try to develop safe > interfaces where one is unable to do something that might hurt them due to > them not knowing all the fine details. Yes I understand this, that's an area where a single mistake can be irreversible (or at a high cost). That's the same reason why I reuse to work as root, or to run scripts requiring "curl | sh". > > I perfectly understand that some transformations may require the whole > > URI for various reasons, but for those that could be expressed on > > individual parts, I do think that converters will be way more flexible > > over the long term. > > Luckily this should be easy enough with the clear split between the action > and the actual normalizers that simply take an ist. OK, great! I forgot to issue dev17 last Friday, different work place, change of habits and being busy on the pools mess made me simply forget it. I'll do it for next Friday with some other stuff. If you think there's anything doable by then, just let me know so that I can wait for you, otherwise no stress, it can come at any time. Thanks, Willy
Re: [PATCH v2 0/8] URI normalization / Issue #714
Willy, On 4/19/21 3:54 PM, Willy Tarreau wrote: I would advice against making them into converters, because it forces the user to think about the appropriate fetch to use. As an example the path-strip-dotdot normalizer probably should not be applied to the query string! The actions hide this type of detail from the user which I consider to be a good thing. I disagree with this line of reasoning, because developers must never decide what users need (and that's something very difficult to do). We must only help to figure what users really need (compared to what they ask for), and estimate the technical feasibility and the consistency with all other parts so that each feature doesn't look like it's been developed differently. ack. My personal experience is mostly with less experienced end users vs system administrators and I constantly have to prevent those end users from shooting themselves into the foot. That's why I try to develop safe interfaces where one is unable to do something that might hurt them due to them not knowing all the fine details. Each time we tried to impose something it resulted in a specific feature being corner-cased over time after we had to implement something cleaner to put an end to horrible workarounds, just like we've seen a lot of dummy headers being used before variables existed, or headers appended after the HTTP version in health checks. I already have counter examples for you here. Imagine I want to normalize the Referer header in a request. What I'll have to do is: http-request set-var(req.uri) url http-request set-uri req.fhdr(referer) http-request normalize-uri path-merge-slashes http-request normalize-uri path-strip-dotdot http-request set-header referer url http-request set-uri var(req.uri) Okay, fair point. This is a good counter example. I perfectly understand that some transformations may require the whole URI for various reasons, but for those that could be expressed on individual parts, I do think that converters will be way more flexible over the long term. Luckily this should be easy enough with the clear split between the action and the actual normalizers that simply take an ist. There's no rush on this, but I think it's something to keep in mind, to see which ones might be separated and be provided as converters as well. We may do that during 2.5-dev and possibly backport them if there is some demand. Best regards Tim Düsterhus
Re: [PATCH v2 0/8] URI normalization / Issue #714
Hi Tim, On Sat, Apr 17, 2021 at 01:23:15PM +0200, Tim Düsterhus wrote: > > I think that some of the actions will probably end up being replicated > > as converters, so maybe in the end the sequence below: > > > > http-request normalize-uri path-merge-slashes > > http-request normalize-uri path-strip-dotdot > > > > could end up like this: > > > > http-request set-path %[path,path-merge-slashes,path-strip-dotdot] > > > > The pre-release period is the right one to evaluate such options, so > > I'm not worried about any outcome. > > I would advice against making them into converters, because it forces the > user to think about the appropriate fetch to use. As an example the > path-strip-dotdot normalizer probably should not be applied to the query > string! The actions hide this type of detail from the user which I consider > to be a good thing. I disagree with this line of reasoning, because developers must never decide what users need (and that's something very difficult to do). We must only help to figure what users really need (compared to what they ask for), and estimate the technical feasibility and the consistency with all other parts so that each feature doesn't look like it's been developed differently. Each time we tried to impose something it resulted in a specific feature being corner-cased over time after we had to implement something cleaner to put an end to horrible workarounds, just like we've seen a lot of dummy headers being used before variables existed, or headers appended after the HTTP version in health checks. I already have counter examples for you here. Imagine I want to normalize the Referer header in a request. What I'll have to do is: http-request set-var(req.uri) url http-request set-uri req.fhdr(referer) http-request normalize-uri path-merge-slashes http-request normalize-uri path-strip-dotdot http-request set-header referer url http-request set-uri var(req.uri) And you can be certain that some users might end up with this one (then they will complain they cannot clean up Location nor Link headers because normalize-uri only works in the request context). And even when you say that some users would apply it to the query-string, yes that's possible. And so what, if they consider they need it ? For example you could have a query string made of an optionally base64-encoded URL to redirect to after a successful login. It could make sense for some users to decide to base64-decode, normalize, and re-encode, to make sure it's not being abused for example. Or in some cases it can make sense to apply the transformation only for a match against certain patterns without applying it to the request itself. I perfectly understand that some transformations may require the whole URI for various reasons, but for those that could be expressed on individual parts, I do think that converters will be way more flexible over the long term. There's no rush on this, but I think it's something to keep in mind, to see which ones might be separated and be provided as converters as well. We may do that during 2.5-dev and possibly backport them if there is some demand. Cheers, Willy
Re: [PATCH v2 0/8] URI normalization / Issue #714
Christopher, On 4/19/21 9:10 AM, Christopher Faulet wrote: All in all: I think that the 8 v2 patches + the 3 patches from this morning together result in something that is appropriate for HAProxy 2.4. I pushed all the series. Thanks ! Awesome, thank you! I'll then look into the last few normalizers that I planned to add and I'm looking forward to feedback from the users :-) Best regards Tim Düsterhus
Re: [PATCH v2 0/8] URI normalization / Issue #714
Le 17/04/2021 à 13:23, Tim Düsterhus a écrit : I added the experimental marking in the 3rd patch this morning. Generally I think that it looks solid enough, though. During development I carefully researched the relevant documentation (e.g. the URI RFC) and tested the behavior of different clients and servers. It also comes with quite a few tests ensuring that the normalizers behave like I expect them to. Nonetheless I might have missed something and correct handling of URIs is a sensitive part of the request handling, so an experimental note still is appropriate. All in all: I think that the 8 v2 patches + the 3 patches from this morning together result in something that is appropriate for HAProxy 2.4. Tim, I pushed all the series. Thanks ! -- Christopher Faulet
Re: [PATCH v2 0/8] URI normalization / Issue #714
On 17.04.21 13:23, Tim Düsterhus wrote: Willy, On 4/17/21 12:09 PM, Willy Tarreau wrote: With the renaming already made I consider the configuration syntax to be stable enough for a 2.4. I'll leave the final decision regarding that up to you, though. Especially since 2.4 is going to be an LTS. What we can possibly do, if you're not completely sure about the naming (it's often a very difficult aspect to deal with), is to merge the series, ask users in the next release announcement to have a look and possibly suggest updates before the release. We can then mark the new actions as With the new names (1st patch this morning) I think that the naming is good. It is descriptive and follows a well-defined naming scheme that allows for future extension. experimental in the doc, and remove the experimental status after a while. Or if the features look solid enough and you're feeling ready to deal with occasionally possible bug reports, we can merge them and even not pass via an experimental status. I added the experimental marking in the 3rd patch this morning. Generally I think that it looks solid enough, though. During development I carefully researched the relevant documentation (e.g. the URI RFC) and tested the behavior of different clients and servers. It also comes with quite a few tests ensuring that the normalizers behave like I expect them to. Nonetheless I might have missed something and correct handling of URIs is a sensitive part of the request handling, so an experimental note still is appropriate. All in all: I think that the 8 v2 patches + the 3 patches from this morning together result in something that is appropriate for HAProxy 2.4. I'm open to various options. Anyway I do think that URI normalization is a useful feature to have. I think that some of the actions will probably end up being replicated as converters, so maybe in the end the sequence below: http-request normalize-uri path-merge-slashes http-request normalize-uri path-strip-dotdot could end up like this: http-request set-path %[path,path-merge-slashes,path-strip-dotdot] The pre-release period is the right one to evaluate such options, so I'm not worried about any outcome. I would advice against making them into converters, because it forces the user to think about the appropriate fetch to use. As an example the path-strip-dotdot normalizer probably should not be applied to the query string! The actions hide this type of detail from the user which I consider to be a good thing. Well I think also that the usage of http-request set-path %[path,path-merge-slashes,path-strip-dotdot] would be more "natural". jm2c Best regards Tim Düsterhus
Re: [PATCH v2 0/8] URI normalization / Issue #714
Willy, On 4/17/21 12:09 PM, Willy Tarreau wrote: With the renaming already made I consider the configuration syntax to be stable enough for a 2.4. I'll leave the final decision regarding that up to you, though. Especially since 2.4 is going to be an LTS. What we can possibly do, if you're not completely sure about the naming (it's often a very difficult aspect to deal with), is to merge the series, ask users in the next release announcement to have a look and possibly suggest updates before the release. We can then mark the new actions as With the new names (1st patch this morning) I think that the naming is good. It is descriptive and follows a well-defined naming scheme that allows for future extension. experimental in the doc, and remove the experimental status after a while. Or if the features look solid enough and you're feeling ready to deal with occasionally possible bug reports, we can merge them and even not pass via an experimental status. I added the experimental marking in the 3rd patch this morning. Generally I think that it looks solid enough, though. During development I carefully researched the relevant documentation (e.g. the URI RFC) and tested the behavior of different clients and servers. It also comes with quite a few tests ensuring that the normalizers behave like I expect them to. Nonetheless I might have missed something and correct handling of URIs is a sensitive part of the request handling, so an experimental note still is appropriate. All in all: I think that the 8 v2 patches + the 3 patches from this morning together result in something that is appropriate for HAProxy 2.4. I'm open to various options. Anyway I do think that URI normalization is a useful feature to have. I think that some of the actions will probably end up being replicated as converters, so maybe in the end the sequence below: http-request normalize-uri path-merge-slashes http-request normalize-uri path-strip-dotdot could end up like this: http-request set-path %[path,path-merge-slashes,path-strip-dotdot] The pre-release period is the right one to evaluate such options, so I'm not worried about any outcome. I would advice against making them into converters, because it forces the user to think about the appropriate fetch to use. As an example the path-strip-dotdot normalizer probably should not be applied to the query string! The actions hide this type of detail from the user which I consider to be a good thing. Best regards Tim Düsterhus
Re: [PATCH v2 0/8] URI normalization / Issue #714
On Sat, Apr 17, 2021 at 11:42:32AM +0200, Tim Düsterhus wrote: > Christopher, > > On 4/17/21 9:16 AM, Christopher Faulet wrote: > > I'm a bit annoyed by the renaming of existing normalizers. Except if you > > plan to do your modification before the release, it is a bad idea to > > change the name of a configuration parameter once introduced in a > > stable release. The feature itself may be experimental because some bugs > > are expected but from the configuration point of view, it should be > > stable. > > To be honest I did not expect that the second round of patches was already > good enough to merge, so I wanted to send a quick notice regarding my > further plans before it was too late and you already applied the patches. > > I definitely planned to do the finalization before the release, but I > understand that we are *very* late in the cycle, so it's understandable that > you are a bit annoyed by my announcement. Last weeks in cycle always attract lots of no-so-important but very useful features, that always make us hesitate betwen merging or leaving them. And that's normal since usually they do not risk to cause regressions. > With the renaming already made I consider the configuration syntax to be > stable enough for a 2.4. I'll leave the final decision regarding that up to > you, though. Especially since 2.4 is going to be an LTS. What we can possibly do, if you're not completely sure about the naming (it's often a very difficult aspect to deal with), is to merge the series, ask users in the next release announcement to have a look and possibly suggest updates before the release. We can then mark the new actions as experimental in the doc, and remove the experimental status after a while. Or if the features look solid enough and you're feeling ready to deal with occasionally possible bug reports, we can merge them and even not pass via an experimental status. I'm open to various options. Anyway I do think that URI normalization is a useful feature to have. I think that some of the actions will probably end up being replicated as converters, so maybe in the end the sequence below: http-request normalize-uri path-merge-slashes http-request normalize-uri path-strip-dotdot could end up like this: http-request set-path %[path,path-merge-slashes,path-strip-dotdot] The pre-release period is the right one to evaluate such options, so I'm not worried about any outcome. Willy
Re: [PATCH v2 0/8] URI normalization / Issue #714
Christopher, On 4/17/21 9:16 AM, Christopher Faulet wrote: I'm a bit annoyed by the renaming of existing normalizers. Except if you plan to do your modification before the release, it is a bad idea to change the name of a configuration parameter once introduced in a stable release. The feature itself may be experimental because some bugs are expected but from the configuration point of view, it should be stable. To be honest I did not expect that the second round of patches was already good enough to merge, so I wanted to send a quick notice regarding my further plans before it was too late and you already applied the patches. I definitely planned to do the finalization before the release, but I understand that we are *very* late in the cycle, so it's understandable that you are a bit annoyed by my announcement. Please find 3 additional patches (to be applied on top of the 8 of v2). 1. One of them performs the renaming into a naming scheme that I consider to be stable enough for future extensions. 2. Expands the documentation a little on best practices regarding use. 3. Marks the action as experimental in the documentation. However, if you want to wait a bit to finish your work, I can push your patches in the next branch, pending for the next 2.5. This way, you'll have all the time to modify it. And because it is a standalone feature, we may plan to backport it to 2.4 if necessary. With the renaming already made I consider the configuration syntax to be stable enough for a 2.4. I'll leave the final decision regarding that up to you, though. Especially since 2.4 is going to be an LTS. Best regards Tim Düsterhus >From 02a0a18f3739dc9b0ed6297c76d6742f8f59c1eb Mon Sep 17 00:00:00 2001 From: Tim Duesterhus Date: Sat, 17 Apr 2021 11:21:10 +0200 Subject: [PATCH 1/3] MEDIUM: http_act: Rename uri-normalizers To: haproxy@formilux.org Cc: cfau...@haproxy.com This patch renames all existing uri-normalizers into a more consistent naming scheme: 1. The part of the URI that is being touched. 2. The modification being performed as an explicit verb. --- doc/configuration.txt | 24 +++ include/haproxy/action-t.h | 12 reg-tests/http-rules/normalize_uri.vtc | 42 +- src/http_act.c | 36 +++--- 4 files changed, 57 insertions(+), 57 deletions(-) diff --git a/doc/configuration.txt b/doc/configuration.txt index 1e2f72b61..a9ed869d7 100644 --- a/doc/configuration.txt +++ b/doc/configuration.txt @@ -6012,18 +6012,18 @@ http-request early-hint [ { if | unless } ] See RFC 8297 for more information. http-request normalize-uri [ { if | unless } ] -http-request normalize-uri dotdot [ full ] [ { if | unless } ] -http-request normalize-uri merge-slashes [ { if | unless } ] -http-request normalize-uri percent-upper [ strict ] [ { if | unless } ] -http-request normalize-uri sort-query [ { if | unless } ] +http-request normalize-uri path-merge-slashes [ { if | unless } ] +http-request normalize-uri path-strip-dotdot [ full ] [ { if | unless } ] +http-request normalize-uri percent-to-uppercase [ strict ] [ { if | unless } ] +http-request normalize-uri query-sort-by-name [ { if | unless } ] Performs normalization of the request's URI. The following normalizers are available: - - dotdot: Normalizes "/../" segments within the "path" component. This merges - segments that attempt to access the parent directory with their preceding - segment. Empty segments do not receive special treatment. Use the - "merge-slashes" normalizer first if this is undesired. + - path-strip-dotdot: Normalizes "/../" segments within the "path" component. + This merges segments that attempt to access the parent directory with + their preceding segment. Empty segments do not receive special treatment. + Use the "path-merge-slashes" normalizer first if this is undesired. Example: - /foo/../ -> / @@ -6040,14 +6040,14 @@ http-request normalize-uri sort-query [ { if | unless } ] - /../bar/ -> /bar/ - /bar/../../ -> / - - merge-slashes: Merges adjacent slashes within the "path" component into a - single slash. + - path-merge-slashes: Merges adjacent slashes within the "path" component + into a single slash. Example: - //-> / - /foo//bar -> /foo/bar - - percent-upper: Uppercases letters within percent-encoded sequences + - percent-to-uppercase: Uppercases letters within percent-encoded sequences (RFC 3986#6.2.21). Example: @@ -6060,7 +6060,7 @@ http-request normalize-uri sort-query [ { if | unless } ] Example: - /%zz -> HTTP 400 - - sort-query: Sorts the query string parameters by parameter name. + - query-sort-by-name: Sorts the query string parameters by parameter name. Parameters are assumed to be delimited by '&'. Shorter names sort
Re: [PATCH v2 0/8] URI normalization / Issue #714
Le 16/04/2021 à 20:14, Tim Düsterhus a écrit : Willy, Christopher, On 4/16/21 6:56 PM, Willy Tarreau wrote: Thanks Tim, The series looks good to me. Except if Willy has any comments, I'll merge it soon. No need for me to double-check, I'll trust you (and you know I like to complain afterwards :-)) I'll probably complain myself afterwards as well :-) As a heads up I'll definitely add a few more converters (most notably the percent decoder is missing) and maybe I'll rename the existing ones in the configuration for consistency once I finish this up. So: It should definitely considered *EXPERIMENTAL* for 2.4. Hi Tim, I'm a bit annoyed by the renaming of existing normalizers. Except if you plan to do your modification before the release, it is a bad idea to change the name of a configuration parameter once introduced in a stable release. The feature itself may be experimental because some bugs are expected but from the configuration point of view, it should be stable. However, if you want to wait a bit to finish your work, I can push your patches in the next branch, pending for the next 2.5. This way, you'll have all the time to modify it. And because it is a standalone feature, we may plan to backport it to 2.4 if necessary. -- Christopher Faulet
Re: [PATCH v2 0/8] URI normalization / Issue #714
Willy, Christopher, On 4/16/21 6:56 PM, Willy Tarreau wrote: Thanks Tim, The series looks good to me. Except if Willy has any comments, I'll merge it soon. No need for me to double-check, I'll trust you (and you know I like to complain afterwards :-)) I'll probably complain myself afterwards as well :-) As a heads up I'll definitely add a few more converters (most notably the percent decoder is missing) and maybe I'll rename the existing ones in the configuration for consistency once I finish this up. So: It should definitely considered *EXPERIMENTAL* for 2.4. Best regards Tim Düsterhus
Re: [PATCH v2 0/8] URI normalization / Issue #714
On Fri, Apr 16, 2021 at 06:51:41PM +0200, Christopher Faulet wrote: > Le 15/04/2021 à 21:45, Tim Duesterhus a écrit : > > Christopher, > > > > again: Thank you for the very helpful review. In this series you can find > > v2 of > > my URI normalization patches. I hope I did not forget to incorporate any of > > your feedback. > > > > Some general notes: > > > > - I completely cleaned up the commit history to group similar patches (e.g. > > the > >two patches for dotdot) and to avoid later commits completely refactoring > >earlier commits (e.g. the error handling). > > - As suggested I am now returning the error code and taking a `struct ist > > *dst`. > > - The values of `enum uri_normalizer_err` are cleaned up. > > - I cleaned up the error handling in `http_action_normalize_uri`. > > - I am now using `istdiff()`. > > - Dynamic allocation is no more. > > - I fixed some parts of the code style (`struct ist* foo` -> `struct ist > > *foo`). > > - I const'ified as much as possible. > > > > Thanks Tim, > > The series looks good to me. Except if Willy has any comments, I'll merge it > soon. No need for me to double-check, I'll trust you (and you know I like to complain afterwards :-)) Thanks! Willy
Re: [PATCH v2 0/8] URI normalization / Issue #714
Le 15/04/2021 à 21:45, Tim Duesterhus a écrit : Christopher, again: Thank you for the very helpful review. In this series you can find v2 of my URI normalization patches. I hope I did not forget to incorporate any of your feedback. Some general notes: - I completely cleaned up the commit history to group similar patches (e.g. the two patches for dotdot) and to avoid later commits completely refactoring earlier commits (e.g. the error handling). - As suggested I am now returning the error code and taking a `struct ist *dst`. - The values of `enum uri_normalizer_err` are cleaned up. - I cleaned up the error handling in `http_action_normalize_uri`. - I am now using `istdiff()`. - Dynamic allocation is no more. - I fixed some parts of the code style (`struct ist* foo` -> `struct ist *foo`). - I const'ified as much as possible. Thanks Tim, The series looks good to me. Except if Willy has any comments, I'll merge it soon. -- Christopher Faulet
[PATCH v2 0/8] URI normalization / Issue #714
Christopher, again: Thank you for the very helpful review. In this series you can find v2 of my URI normalization patches. I hope I did not forget to incorporate any of your feedback. Some general notes: - I completely cleaned up the commit history to group similar patches (e.g. the two patches for dotdot) and to avoid later commits completely refactoring earlier commits (e.g. the error handling). - As suggested I am now returning the error code and taking a `struct ist *dst`. - The values of `enum uri_normalizer_err` are cleaned up. - I cleaned up the error handling in `http_action_normalize_uri`. - I am now using `istdiff()`. - Dynamic allocation is no more. - I fixed some parts of the code style (`struct ist* foo` -> `struct ist *foo`). - I const'ified as much as possible. Tim Düsterhus (8): MINOR: uri_normalizer: Add uri_normalizer module MINOR: uri_normalizer: Add `enum uri_normalizer_err` MINOR: uri_normalizer: Add `http-request normalize-uri` MINOR: uri_normalizer: Add a `merge-slashes` normalizer to http-request normalize-uri MINOR: uri_normalizer: Add a `dotdot` normalizer to http-request normalize-uri MINOR: uri_normalizer: Add support for supressing leading `../` for dotdot normalizer MINOR: uri_normalizer: Add a `sort-query` normalizer MINOR: uri_normalizer: Add a `percent-upper` normalizer Makefile | 2 +- doc/configuration.txt | 58 + include/haproxy/action-t.h | 9 + include/haproxy/uri_normalizer-t.h | 31 +++ include/haproxy/uri_normalizer.h | 33 +++ reg-tests/http-rules/normalize_uri.vtc | 314 + src/http_act.c | 201 src/uri_normalizer.c | 296 +++ 8 files changed, 943 insertions(+), 1 deletion(-) create mode 100644 include/haproxy/uri_normalizer-t.h create mode 100644 include/haproxy/uri_normalizer.h create mode 100644 reg-tests/http-rules/normalize_uri.vtc create mode 100644 src/uri_normalizer.c -- 2.31.1