[PATCH] HTTP: Add new uri_normalization_percent_decode option

2023-02-15 Thread Michael Kourlas via nginx-devel
# HG changeset patch
# User Michael Kourlas 
# Date 1676408746 18000
#  Tue Feb 14 16:05:46 2023 -0500
# Node ID 129437ade41b14a584fb4b7558accc1b8dee7f45
# Parent  cffaf3f2eec8fd33605c2a37814f5ffc30371989
HTTP: Add new uri_normalization_percent_decode option

This patch addresses ticket #2225 by adding a new
uri_normalization_percent_decode configuration option that controls which
characters are percent-decoded by nginx as part of its URI normalization.

The option has two values: "all" and "all-except-reserved". "all" is the
default value and is the current behaviour. When the option is set to
"all-except-reserved", nginx percent-decodes all characters except those in the
reserved set defined by RFC 3986:

  reserved= gen-delims / sub-delims

  gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"

  sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
  / "*" / "+" / "," / ";" / "="

In addition, when "all-except-reserved" is used, nginx will not re-encode "%"
from the request URI when it observes that it is part of a percent-encoded
reserved character.

When nginx percent-decodes reserved characters, this can often change the
request URI's semantics, making it impossible to use a normalized URI for
certain use cases. "uri_normalization_percent_decode" gives the configuration
author the freedom to determine which reserved characters are semantically
relevant and which are not.

For example, consider the following location block, which handles part of a
hypothetical API:

location ~ ^/api/objects/[^/]+/subobjects(/.*)?$ {
...
}

Because nginx always normalizes "%2F" to "/", this location block will not
match a path of /api/objects/sample%2Fname/subobjects, even if the API permits
"/" to appear percent-encoded in the URI as part of object names. nginx will
instead interpret this as /api/objects/sample/name/subobjects, a completely
different path. Setting "uri_normalization_percent_decode" to
"all-except-reserved" will leave "%2F" encoded, resulting in the expected
behaviour.

diff -r cffaf3f2eec8 -r 129437ade41b src/core/ngx_string.c
--- a/src/core/ngx_string.c Thu Feb 02 23:38:48 2023 +0300
+++ b/src/core/ngx_string.c Tue Feb 14 16:05:46 2023 -0500
@@ -1487,7 +1487,8 @@


 uintptr_t
-ngx_escape_uri(u_char *dst, u_char *src, size_t size, ngx_uint_t type)
+ngx_escape_uri(u_char *dst, u_char *src, size_t size, ngx_uint_t type,
+ngx_uint_t skip_preencoded_type)
 {
 ngx_uint_t  n;
 uint32_t   *escape;
@@ -1641,7 +1642,11 @@
 n = 0;

 while (size) {
-if (escape[*src >> 5] & (1U << (*src & 0x1f))) {
+if ((escape[*src >> 5] & (1U << (*src & 0x1f)))
+&& !(*src == '%' && size >= 3
+ && ngx_escape_uri_skip_preencoded_character(
+   src + 1, skip_preencoded_type)))
+{
 n++;
 }
 src++;
@@ -1652,7 +1657,11 @@
 }

 while (size) {
-if (escape[*src >> 5] & (1U << (*src & 0x1f))) {
+if ((escape[*src >> 5] & (1U << (*src & 0x1f)))
+&& !(*src == '%' && size >= 3
+ && ngx_escape_uri_skip_preencoded_character(
+   src + 1, skip_preencoded_type)))
+{
 *dst++ = '%';
 *dst++ = hex[*src >> 4];
 *dst++ = hex[*src & 0xf];
@@ -1668,6 +1677,87 @@
 }


+ngx_uint_t
+ngx_escape_uri_skip_preencoded_character(u_char *hex_component,
+ngx_uint_t skip_preencoded_type)
+{
+u_charch, decoded_ch;
+uint32_t *skip;
+
+static uint32_t   none[] = {
+0x, /*          */
+
+/* ?>=< ;:98 7654 3210  /.-, +*)( '&%$ #"!  */
+0x, /*          */
+
+/* _^]\ [ZYX WVUT SRQP  ONML KJIH GFED CBA@ */
+0x, /*          */
+
+/*  ~}| {zyx wvut srqp  onml kjih gfed cba` */
+0x, /*          */
+
+0x, /*          */
+0x, /*          */
+0x, /*          */
+0x  /*          */
+};
+
+static uint32_t   reserved_only[] = {
+0x, /*          */
+
+/* ?>=< ;:98 7654 3210  /.-, +*)( '&%$ #"!  */
+0xac009fda, /* 1010 1100    1001  1101 1010 */
+
+/* _^]\ [ZYX WVUT SRQP  ONML KJIH GFED CBA@ */
+0x2801, /* 0010 1000       0001 */
+
+/*  ~}| {zyx wvut srqp  onml kjih gfed cba` */
+0x, /*          */
+
+0x, /*      0

Re: [PATCH] HTTP: Add new uri_normalization_percent_decode option

2023-03-27 Thread Michael Kourlas via nginx-devel
Hello,

Thanks for your comments! Sorry for the delay in responding -- I was on
vacation.

> As far as I understand, it will irreversibly corrupt URIs with
> double-encoded reserved characters.  For example, "%252F" will
> become "%2F" when proxying in the following configuration:
>
> location /foo/ {
> proxy_pass http://upstream/foo/;
> }

I believe you are correct. When the "all-except-reserved" option is used, nginx
will not recognize, during the re-encoding step, that the "%" in "%2F" was
originally percent-encoded and not part of a percent-encoded "/".

However, I think this issue can be addressed by renaming "all-except-reserved"
to "all-except-percent-and-reserved" and adding "%" to the corresponding set of
characters that is not decoded or encoded automatically when that option is
used. "%252F" will then be left untouched throughout the entire process.

> Further, requests to static files with (properly escaped) reserved
> characters will simply fail, because nginx won't decode these
> characters.  For example, in the following trivial configuration a
> request to "/foo%3Fbar" won't be decoded to match "/foo?bar" file
> under the document root:
>
> location / {
> # static files
> }

Yes, I agree it doesn't make sense to keep "%" and reserved characters
percent-encoded when performing a filesystem lookup, because at that point we
can be certain they're not being used as delimiters or anything other than
data.

However, I think this issue can be addressed by decoding any remaining
percent-encoded characters at the point of filesystem lookup, as well as any
other place where you think it is appropriate. (Of course, this should only be
done when using the "all-except-percent-and-reserved" option, and only with "%"
and reserved characters, to avoid double-decoding double-encoded characters.)

> Please also note that the configuration directive you've
> introduced in this patch applies to URI parsing from not-yet-final
> server block (see [1] for details), but the configuration from the
> final server block will be used for URI escaping.  These
> configuration can be different, and this might result in various
> additional issues.

I think this issue can be addressed by making the new directive available only
in http blocks. This ensures that all virtual servers share the same
configuration.

If you agree with these suggestions, I'd be happy to submit an updated patch.

Reserved characters are often used as delimiters in APIs, and I believe it is
important that nginx be able to distinguish between usages as delimiters and
usages as data.

Thanks,

Michael Kourlas

 Confidentiality notice

This e-mail message and any attachment hereto contain confidential information 
which may be privileged and which is intended for the exclusive use of its 
addressee(s). If you receive this message in error, please inform sender 
immediately and destroy any copy thereof. Furthermore, any disclosure, 
distribution or copying of this message and/or any attachment hereto without 
the consent of the sender is strictly prohibited. Thank you.
___
nginx-devel mailing list
nginx-devel@nginx.org
https://mailman.nginx.org/mailman/listinfo/nginx-devel


RE: [PATCH] HTTP: Add new uri_normalization_percent_decode option

2023-03-30 Thread Michael Kourlas via nginx-devel
Hello,

Thanks again for your comments.

> This implies, basically, that there are 3 forms of the request
> URI: 1) fully encoded, as in $request_uri, 2) fully decoded, as in
> $uri now, and 3) "all-except-percent-and-reserved". To implement this
> correctly, it needs clear definition when each form is used, and
> it is going to be a non-trivial task to do this safely.

I agree. A simple way to do this would be to make percent-decoding customizable
on a per-directive basis. The core use case I was hoping to support is
preserving encoded reserved characters in location matching (basically what was
proposed in [1]), so that is what I would like to focus on in a reworked
version of this patch.

I propose the following:

(1) The addition of a new variable called $uri_encoded_percent_and_reserved. As
discussed, this variable is a special version of the normalized URI ($uri)
that preserves any percent-encoded "%" or reserved characters.

(2) Every transformation applied to $uri (e.g. from the "rewrite" directive,
internal redirects, etc.) is automatically applied to
$uri_encoded_percent_and_reserved as well.

If this raises performance concerns, a new flag could be added to enable or
disable the availability of $uri_encoded_percent_and_reserved.

(3) The addition of a new optional parameter to the URI form of "location"
blocks called "match-source":

location [ = | ~ | ~* | ^~ ] uri 
[match-source=uri|uri-encoded-percent-and-reserved] {
...
}

For example:

location ~ ^/api/objects/[^/]+/subobjects(/.*)?$ 
match-source=uri-encoded-percent-and-reserved {
...
}

"match-source=uri" is the default and the current behaviour. When
"uri-encoded-percent-and-reserved" is used, the location matching for that
block uses $uri_encoded_percent_and_reserved rather than $uri. Nested location
blocks are not affected (unless they also use
"uri-encoded-percent-and-reserved").

In future it would be possible to use a similar pattern with other directives
that use $uri, such as "proxy_pass", but that can be done as part of a separate
patch.

If you think this is a sensible approach, I will submit a revised patch
implementing it.

Thanks,

Michael Kourlas

[1] https://trac.nginx.org/nginx/ticket/2225

 Confidentiality notice

This e-mail message and any attachment hereto contain confidential information 
which may be privileged and which is intended for the exclusive use of its 
addressee(s). If you receive this message in error, please inform sender 
immediately and destroy any copy thereof. Furthermore, any disclosure, 
distribution or copying of this message and/or any attachment hereto without 
the consent of the sender is strictly prohibited. Thank you.
___
nginx-devel mailing list
nginx-devel@nginx.org
https://mailman.nginx.org/mailman/listinfo/nginx-devel


RE: [PATCH] HTTP: Add new uri_normalization_percent_decode option

2023-04-03 Thread Michael Kourlas via nginx-devel
Hello,

Thanks again for your feedback.

> Consider the following rewrite:
>
> rewrite ^/(.*) /$1 break;
>
> Assuming request to "GET /foo%2fbar/", what
> $uri_encoded_percent_and_reserved do you expect after each of
> these rewrites?

I do not think that rewrite does anything in practice. Following the rewrite,
I would expect $uri to remain unchanged at its current value of "/foo/bar/" and
$uri_encoded_percent_and_reserved to similarly remain unchanged at its current
value of "/foo%2fbar/".

> Similarly, consider the following rewrite:
>
> rewrite ^/foo/(.*) /$1 break;
>
> What $uri_encoded_percent_and_reserved is expected after the
> rewrite?

In this case the regular expression matches $uri but not
$uri_encoded_percent_and_reserved. One could say that this just means that only
$uri is updated, but that has the potential to cause confusion when a flag is
used that changes the control flow (unless the user explicitly opts into this
behaviour).

This could be addressed by adding a "match-source" optional argument to
"rewrite" with three values (and a default of "uri"):
* "uri" - rewrite directive matches and changes $uri only
* "uri_encoded_percent_and_reserved" - rewrite directive matches and changes
  $uri_encoded_percent_and_reserved only
* "all" - rewrite directive matches and changes both (if only one is matched,
  directive is not applied)

It might also be a good idea to add "uri_encoded_percent_and_reserved_regex"
and "uri_encoded_percent_and_reserved_replacement" arguments to be used with
"all", so that it is possible to use the same directive and flag even when
needing to perform slightly different rewrites for $uri versus
$uri_encoded_percent_and_reserved.

> Consider the following configuration:
>
> location /foo%2fbar/ match-source=uri-encoded-percent-and-reserved {
> ...
> }
>
> location /foo/bar/ match-source=uri {
> ...
> }
>
> The question is: which location is expected to be matched for the
> request "GET /foo%2fbar/"?

Both blocks match their respective variables. Since the first block has the
longest matching prefix, I expect it will be selected.

> Which location is expected to be matched for the request "GET
> /foo%2Fbar/" (note that it is exactly equivalent to "GET
> /foo%2fbar/").

Only the second block matches its variable, so I expect it will be selected.

Although paths are generally case sensitive, a percent-encoded character is not
supposed to be, so this behaviour is unfortunate. One possibility is to
automatically use case-insensitive matching for any part of a location prefix
using "match-source=uri-encoded-percent-and-reserved" that is a percent
encoded "%" or reserved character.

Alternatively this behaviour could be documented with an instruction to use
regular expressions instead.

> Assuming static handling in the locations, what happens with the
> request "GET /foo%2fbar/..%2fbazz"?

The first block would be used. However, $uri would be used for static
handling, so the path "/foo/bazz" would be looked up on the filesystem, not
"/foo%2fbar/..%2fbazz".

> Note that the behaviour does not seem to be obvious, and it is an
> open question if it can be clarified to be safe.

Fair enough. I am certainly happy to continue making changes to my proposal to
address the specific concerns you raise. However, are you saying that you have
broader overall concerns about safety, complexity, etc. that make a patch
implementing the proposal unlikely to be accepted, even if all specific
concerns are addressed?

Thanks,

Michael Kourlas

 Confidentiality notice

This e-mail message and any attachment hereto contain confidential information 
which may be privileged and which is intended for the exclusive use of its 
addressee(s). If you receive this message in error, please inform sender 
immediately and destroy any copy thereof. Furthermore, any disclosure, 
distribution or copying of this message and/or any attachment hereto without 
the consent of the sender is strictly prohibited. Thank you.
___
nginx-devel mailing list
nginx-devel@nginx.org
https://mailman.nginx.org/mailman/listinfo/nginx-devel


RE: [PATCH] HTTP: Add new uri_normalization_percent_decode option

2023-04-06 Thread Michael Kourlas via nginx-devel
Hello,

Thanks for your lengthy explanation -- it's much appreciated.

I'll find a way to support my use case in the upstream server instead.

Best,

Michael Kourlas

 Confidentiality notice

This e-mail message and any attachment hereto contain confidential information 
which may be privileged and which is intended for the exclusive use of its 
addressee(s). If you receive this message in error, please inform sender 
immediately and destroy any copy thereof. Furthermore, any disclosure, 
distribution or copying of this message and/or any attachment hereto without 
the consent of the sender is strictly prohibited. Thank you.
___
nginx-devel mailing list
nginx-devel@nginx.org
https://mailman.nginx.org/mailman/listinfo/nginx-devel