Re: [RFC PATCH 3/8] MINOR: uri_normalizer: Add a `dotdot` normalizer to http-request normalize-uri

2021-04-13 Thread Christopher Faulet

Le 13/04/2021 à 18:03, Tim Düsterhus a écrit :

Christopher,

On 4/13/21 4:38 PM, Christopher Faulet wrote:

At the end it remains your choice. The function is quite good. I just
wonder if it could be valuable to also handle single dot-segment here in
addition to double dot-segment. Thus, the normalizer should be renamed
"dot-segments" or something similar.


I planned to add a separate normalizer for that. This keeps the
functions simple and easy to check for correctness. It also allows the
administrator to cherry-pick exactly the normalizers they desire and
that do not break their backend. In the old discussion Willy said that
not everything that might be possible to normalize can actually be
normalized when combined with legacy software.


Ok, that make sense.




Another point is about the dot encoding. It may be good to handle
encoded dot (%2E), may be via an option. And IMHO, the way empty
segments are handle is a bit counter intuitive. Calling "merge-slashes"
normalizer first is a solution of course, but this means rewriting twice
the uri. We must figure out what is the main expectation for this
normalizer. Especially because ignoring empty segment when dot-segments
are merged is not exactly the same than merge all slashes.


The percent encoding of the dot will be handled by a 'percent-decode'
normalizer that decodes percent encoded unreserved characters (RFC 3986,
section 2.3). The administrator then first would use the percent-decode
normalizer, then the merge-slashes normalizer, then the dotdot normalizer.


Well, it is a bit different here. Because someone could choose to not decode 
unreserved characters but want to handle encoded dot in dotdot normalizer.




Yes, it means rewriting the URI several times. But it is nice, explicit
and composes well.


On this point, you're right. It is far cleaner this way.



We can later figure out whether we want to provide "combined
normalizers", such as a 'filesystem' normalizer that would combine the
'.', '..' and '//' normalizers in an efficient way. Adding something
like that later is easy. Changing the behavior of a normalizer later is
hard.


That's true. Depending on feebacks, it will be possible to add more normalizers. 
I'm fine with that.




That's why I'd like to keep them simple "Unix style" for now. Make them
do one thing, make them do it well.


Note I was first surprised that leading dot-segments were preserved,
before reading the 6th patch because for me it is the most important
part. But I'm fine with an option in a way or another.



It's a result of how I approached the development. I wanted to not
rebase my branch more than necessary. I will probably merge the two
patches and change the default once the general approach is approved :-)


Well, it is not a problem. You can keep it in two patches if it is easier for 
you.

--
Christopher Faulet



Re: [RFC PATCH 3/8] MINOR: uri_normalizer: Add a `dotdot` normalizer to http-request normalize-uri

2021-04-13 Thread Tim Düsterhus

Christopher,

On 4/13/21 4:38 PM, Christopher Faulet wrote:
At the end it remains your choice. The function is quite good. I just 
wonder if it could be valuable to also handle single dot-segment here in 
addition to double dot-segment. Thus, the normalizer should be renamed 
"dot-segments" or something similar.


I planned to add a separate normalizer for that. This keeps the 
functions simple and easy to check for correctness. It also allows the 
administrator to cherry-pick exactly the normalizers they desire and 
that do not break their backend. In the old discussion Willy said that 
not everything that might be possible to normalize can actually be 
normalized when combined with legacy software.


Another point is about the dot encoding. It may be good to handle 
encoded dot (%2E), may be via an option. And IMHO, the way empty 
segments are handle is a bit counter intuitive. Calling "merge-slashes" 
normalizer first is a solution of course, but this means rewriting twice 
the uri. We must figure out what is the main expectation for this 
normalizer. Especially because ignoring empty segment when dot-segments 
are merged is not exactly the same than merge all slashes.


The percent encoding of the dot will be handled by a 'percent-decode' 
normalizer that decodes percent encoded unreserved characters (RFC 3986, 
section 2.3). The administrator then first would use the percent-decode 
normalizer, then the merge-slashes normalizer, then the dotdot normalizer.


Yes, it means rewriting the URI several times. But it is nice, explicit 
and composes well.


We can later figure out whether we want to provide "combined 
normalizers", such as a 'filesystem' normalizer that would combine the 
'.', '..' and '//' normalizers in an efficient way. Adding something 
like that later is easy. Changing the behavior of a normalizer later is 
hard.


That's why I'd like to keep them simple "Unix style" for now. Make them 
do one thing, make them do it well.


Note I was first surprised that leading dot-segments were preserved, 
before reading the 6th patch because for me it is the most important 
part. But I'm fine with an option in a way or another.




It's a result of how I approached the development. I wanted to not 
rebase my branch more than necessary. I will probably merge the two 
patches and change the default once the general approach is approved :-)


Best regards
Tim Düsterhus



Re: [RFC PATCH 3/8] MINOR: uri_normalizer: Add a `dotdot` normalizer to http-request normalize-uri

2021-04-13 Thread Christopher Faulet

Le 08/04/2021 à 20:59, Tim Duesterhus a écrit :

Willy,
Christopher,

I'm not very happy with the normalization logic, because am processing the URI
in reverse. This requires me to directly access offsets instead of using the
`ist` API. However this way I don't need to backtrack once I encounter a `../`
which I consider to be a win.


It is not shocking. The function is readable, it is not a real problem. Maybe we 
can introduce the istoff() function to get the pointer at a given offset. You 
may also choose to fully rely on pointers with a negative index. I know you want 
to use the ist api as far as possible, but it is not always the easiest way :)


At the end it remains your choice. The function is quite good. I just wonder if 
it could be valuable to also handle single dot-segment here in addition to 
double dot-segment. Thus, the normalizer should be renamed "dot-segments" or 
something similar.


Another point is about the dot encoding. It may be good to handle encoded dot 
(%2E), may be via an option. And IMHO, the way empty segments are handle is a 
bit counter intuitive. Calling "merge-slashes" normalizer first is a solution of 
course, but this means rewriting twice the uri. We must figure out what is the 
main expectation for this normalizer. Especially because ignoring empty segment 
when dot-segments are merged is not exactly the same than merge all slashes.


Note I was first surprised that leading dot-segments were preserved, before 
reading the 6th patch because for me it is the most important part. But I'm fine 
with an option in a way or another.


--
Christopher Faulet



[RFC PATCH 3/8] MINOR: uri_normalizer: Add a `dotdot` normalizer to http-request normalize-uri

2021-04-08 Thread Tim Duesterhus
Willy,
Christopher,

I'm not very happy with the normalization logic, because am processing the URI
in reverse. This requires me to directly access offsets instead of using the
`ist` API. However this way I don't need to backtrack once I encounter a `../`
which I consider to be a win.

Best regards
Tim Düsterhus

Apply with `git am --scissors` to automatically cut the commit message.

-- >8 --
This normalizer merges `../` path segments with the predecing segment, removing
both the preceding segment and the `../`.

Empty segments do not receive special treatment. The `merge-slashes` normalizer
should be executed first.

See GitHub Issue #714.
---
 doc/configuration.txt  | 12 +
 include/haproxy/action-t.h |  1 +
 include/haproxy/uri_normalizer.h   |  1 +
 reg-tests/http-rules/normalize_uri.vtc | 71 +-
 src/http_act.c | 22 
 src/uri_normalizer.c   | 71 ++
 6 files changed, 177 insertions(+), 1 deletion(-)

diff --git a/doc/configuration.txt b/doc/configuration.txt
index d3030b478..99c845ac7 100644
--- a/doc/configuration.txt
+++ b/doc/configuration.txt
@@ -6016,6 +6016,18 @@ http-request normalize-uri  [ { if | unless 
}  ]
   Performs normalization of the request's URI. The following normalizers are
   available:
 
+  - dotdot: Normalizes "/../" segments within the "path" component. This merges
+  segments that attempt to access the parent directory with their preceding
+  segment. Empty segments do not receive special treatment. Use the
+  "merge-slashes" normalizer first if this is undesired.
+
+  Example:
+  - /foo/../ -> /
+  - /foo/../bar/ -> /bar/
+  - /foo/bar/../ -> /foo/
+  - /../bar/ -> /../bar/
+  - /foo//../-> /foo/
+
   - merge-slashes: Merges adjacent slashes within the "path" component into a
   single slash.
 
diff --git a/include/haproxy/action-t.h b/include/haproxy/action-t.h
index 4a3e3f8bd..ac9399a6b 100644
--- a/include/haproxy/action-t.h
+++ b/include/haproxy/action-t.h
@@ -103,6 +103,7 @@ enum act_timeout_name {
 
 enum act_normalize_uri {
ACT_NORMALIZE_URI_MERGE_SLASHES,
+   ACT_NORMALIZE_URI_DOTDOT,
 };
 
 /* NOTE: if <.action_ptr> is defined, the referenced function will always be
diff --git a/include/haproxy/uri_normalizer.h b/include/haproxy/uri_normalizer.h
index b6e15e281..99b27330e 100644
--- a/include/haproxy/uri_normalizer.h
+++ b/include/haproxy/uri_normalizer.h
@@ -16,6 +16,7 @@
 
 #include 
 
+struct ist uri_normalizer_path_dotdot(const struct ist path, char *trash, 
size_t len);
 struct ist uri_normalizer_path_merge_slashes(const struct ist path, char 
*trash, size_t len);
 
 #endif /* _HAPROXY_URI_NORMALIZER_H */
diff --git a/reg-tests/http-rules/normalize_uri.vtc 
b/reg-tests/http-rules/normalize_uri.vtc
index 3303760d4..e66bdc47b 100644
--- a/reg-tests/http-rules/normalize_uri.vtc
+++ b/reg-tests/http-rules/normalize_uri.vtc
@@ -8,7 +8,7 @@ feature ignore_unknown_macro
 server s1 {
 rxreq
 txresp
-} -repeat 10 -start
+} -repeat 21 -start
 
 haproxy h1 -conf {
 defaults
@@ -29,6 +29,18 @@ haproxy h1 -conf {
 
 default_backend be
 
+frontend fe_dotdot
+bind "fd@${fe_dotdot}"
+
+http-request set-var(txn.before) url
+http-request normalize-uri dotdot
+http-request set-var(txn.after) url
+
+http-response add-header before  %[var(txn.before)]
+http-response add-header after  %[var(txn.after)]
+
+default_backend be
+
 backend be
 server s1 ${s1_addr}:${s1_port}
 
@@ -85,3 +97,60 @@ client c1 -connect ${h1_fe_merge_slashes_sock} {
 expect resp.http.before == "*"
 expect resp.http.after == "*"
 } -run
+
+client c2 -connect ${h1_fe_dotdot_sock} {
+txreq -url "/foo/bar"
+rxresp
+expect resp.http.before == "/foo/bar"
+expect resp.http.after == "/foo/bar"
+
+txreq -url "/foo/.."
+rxresp
+expect resp.http.before == "/foo/.."
+expect resp.http.after == "/"
+
+txreq -url "/foo/../"
+rxresp
+expect resp.http.before == "/foo/../"
+expect resp.http.after == "/"
+
+txreq -url "/foo/bar/../"
+rxresp
+expect resp.http.before == "/foo/bar/../"
+expect resp.http.after == "/foo/"
+
+txreq -url "/foo/../bar"
+rxresp
+expect resp.http.before == "/foo/../bar"
+expect resp.http.after == "/bar"
+
+txreq -url "/foo/../bar/"
+rxresp
+expect resp.http.before == "/foo/../bar/"
+expect resp.http.after == "/bar/"
+
+txreq -url "/foo/../../bar/"
+rxresp
+expect resp.http.before == "/foo/../../bar/"
+expect resp.http.after == "/../bar/"
+
+txreq -url "/foo//../../bar/"
+rxresp
+expect resp.http.before == "/foo//../../bar/"
+expect resp.http.after == "/bar/"
+
+txreq -url "/foo/?bar=/foo/../"
+rxresp
+expect resp.http.before == "/foo/?bar=/foo/../"
+expect resp.http.after ==