Re: [RFC PATCH 4/8] MINOR: uri_normalizer: Add a `sort-query` normalizer

2021-04-13 Thread Christopher Faulet

Le 13/04/2021 à 18:05, Tim Düsterhus a écrit :

Christopher,

On 4/13/21 4:59 PM, Christopher Faulet wrote:

+/* Sorts the parameters within the given query string. Returns an ist
containing
+ * the new path and backed by `trash` or IST_NULL if the `len` not
sufficiently
+ * large to store the resulting path.
+ */
+struct ist uri_normalizer_query_sort(const struct ist query, const
char delim, char *trash, size_t len)
+{


I hadn't noticed it till now, but please don't use "trash" variable
name, it is confusing with the trash buffer used almost everywhere.
Especially because of my next comment :)


In fact the http-request normalize-uri action will pass a trash buffer
pointer into the function, that's why I found the name somewhat fitting.
But the exact method signature is up for discussion anyway (see my other
email).



It is the caller point of view. For the normalizers, it is just a destination 
buffer, not necessarily a trash buffer. Here, it is just to avoid confusion with 
the global variable.


--
Christopher Faulet



Re: [RFC PATCH 4/8] MINOR: uri_normalizer: Add a `sort-query` normalizer

2021-04-13 Thread Tim Düsterhus

Christopher,

On 4/13/21 4:59 PM, Christopher Faulet wrote:
+/* Sorts the parameters within the given query string. Returns an ist 
containing
+ * the new path and backed by `trash` or IST_NULL if the `len` not 
sufficiently

+ * large to store the resulting path.
+ */
+struct ist uri_normalizer_query_sort(const struct ist query, const 
char delim, char *trash, size_t len)

+{


I hadn't noticed it till now, but please don't use "trash" variable 
name, it is confusing with the trash buffer used almost everywhere. 
Especially because of my next comment :)


In fact the http-request normalize-uri action will pass a trash buffer 
pointer into the function, that's why I found the name somewhat fitting. 
But the exact method signature is up for discussion anyway (see my other 
email).


Best regards
Tim Düsterhus



Re: [RFC PATCH 4/8] MINOR: uri_normalizer: Add a `sort-query` normalizer

2021-04-13 Thread Christopher Faulet

Le 08/04/2021 à 20:59, Tim Duesterhus a écrit :

Willy,
Christopher,

This one comes with dynamic allocation. The next patch will add an optimization
for a small number of arguments. However dynamic allocation within the main
processing logic is pretty ugly, so this should be looked at further.



Dynamic allocation should be avoided here. Definitely. I may propose you an 
alternative by using the trash buffer area to store the ist array, via a cast. 
It may be considered a bit ugly at first glance, but if you think about it as a 
trash memory space, it is not so shocking. We've already used this trick for the 
HTX and regular buffers. If you do it that way, by default, it gives you 1024 
slots. And you may return an alloc failure beyond this value. It seems reasonable :)


  
+/* Compares two query parameters by name. Query parameters are ordered

+ * as with memcmp. Shorter parameter names are ordered lower. Identical
+ * parameter names are compared by their pointer to maintain a stable
+ * sort.
+ */
+static int query_param_cmp(const void *a, const void *b)
+{
+   const struct ist param_a = *(struct ist*)a;
+   const struct ist param_b = *(struct ist*)b;
+   const struct ist param_a_name = iststop(param_a, '=');
+   const struct ist param_b_name = iststop(param_b, '=');
+
+   int cmp = memcmp(istptr(param_a_name), istptr(param_b_name), 
MIN(istlen(param_a_name), istlen(param_b_name)));
+
+   if (cmp != 0)
+   return cmp;
+
+   if (istlen(param_a_name) < istlen(param_b_name))
+   return -1;
+
+   if (istlen(param_a_name) > istlen(param_b_name))
+   return 1;
+
+   if (istptr(param_a) < istptr(param_b))
+   return -1;
+
+   if (istptr(param_a) > istptr(param_b))
+   return 1;
+
+   return 0;
+}


To make it more simple, you may use istdiff instead.


+
+/* Sorts the parameters within the given query string. Returns an ist 
containing
+ * the new path and backed by `trash` or IST_NULL if the `len` not sufficiently
+ * large to store the resulting path.
+ */
+struct ist uri_normalizer_query_sort(const struct ist query, const char delim, 
char *trash, size_t len)
+{


I hadn't noticed it till now, but please don't use "trash" variable name, it is 
confusing with the trash buffer used almost everywhere. Especially because of my 
next comment :)



+   struct ist scanner = istadv(query, 1);
+   struct ist *params = NULL;
+   struct ist newquery = ist2(trash, 0);
+   size_t param_count = 0;
+   size_t i;
+
+   if (len < istlen(query))
+   return IST_NULL;
+
+   while (istlen(scanner) > 0) {
+   const struct ist param = istsplit(&scanner, delim);
+   struct ist *realloc = reallocarray(params, param_count + 1, 
sizeof(*realloc));


Here, instead of a dynamic array, you may use the trash buffer area (declared 
from outside the loop). For instance:


struct ist *params = (struct ist *)b_orig(&trash);
size_t max_param = b_size(&trash) / sizeof(*params);


+
+   if (!realloc)
+   goto fail;
+
+   params = realloc;
+
+   params[param_count] = param;
+   param_count++;
+   }
+
+   qsort(params, param_count, sizeof(*params), query_param_cmp);
+
+   newquery = __istappend(newquery, '?');
+   for (i = 0; i < param_count; i++) {
+   if (i > 0)
+   newquery = __istappend(newquery, '&');
+
+   if (istcat(&newquery, params[i], len) < 0)
+   goto fail;
+   }
+
+   free(params);
+
+   return newquery;
+
+  fail:
+   free(params);
+
+   return IST_NULL;
+}
  
  /*

   * Local variables:




--
Christopher Faulet



[RFC PATCH 4/8] MINOR: uri_normalizer: Add a `sort-query` normalizer

2021-04-08 Thread Tim Duesterhus
Willy,
Christopher,

This one comes with dynamic allocation. The next patch will add an optimization
for a small number of arguments. However dynamic allocation within the main
processing logic is pretty ugly, so this should be looked at further.

Best regards
Tim Düsterhus

Apply with `git am --scissors` to automatically cut the commit message.

-- >8 --
This normalizer sorts the `&` delimited query parameters by parameter name.

See GitHub Issue #714.
---
 doc/configuration.txt  |  9 +++
 include/haproxy/action-t.h |  1 +
 include/haproxy/uri_normalizer.h   |  1 +
 reg-tests/http-rules/normalize_uri.vtc | 81 +-
 src/http_act.c | 23 
 src/uri_normalizer.c   | 80 +
 6 files changed, 194 insertions(+), 1 deletion(-)

diff --git a/doc/configuration.txt b/doc/configuration.txt
index 99c845ac7..eacd8ff26 100644
--- a/doc/configuration.txt
+++ b/doc/configuration.txt
@@ -6035,6 +6035,15 @@ http-request normalize-uri  [ { if | unless 
}  ]
   - //-> /
   - /foo//bar -> /foo/bar
 
+  - sort-query: Sorts the query string parameters by parameter name.
+  Parameters are assumed to be delimited by '&'. Shorter names sort before
+  longer names and identical parameter names maintain their relative order.
+
+  Example:
+  - /?c=3&a=1&b=2 -> /?a=1&b=2&c=3
+  - /?aaa=3&a=1&aa=2  -> /?a=1&aa=2&aaa=3
+  - /?a=3&b=4&a=1&b=5&a=2 -> /?a=3&a=1&a=2&b=4&b=5
+
 http-request redirect  [ { if | unless }  ]
 
   This performs an HTTP redirection based on a redirect rule. This is exactly
diff --git a/include/haproxy/action-t.h b/include/haproxy/action-t.h
index ac9399a6b..332be513f 100644
--- a/include/haproxy/action-t.h
+++ b/include/haproxy/action-t.h
@@ -104,6 +104,7 @@ enum act_timeout_name {
 enum act_normalize_uri {
ACT_NORMALIZE_URI_MERGE_SLASHES,
ACT_NORMALIZE_URI_DOTDOT,
+   ACT_NORMALIZE_URI_SORT_QUERY,
 };
 
 /* NOTE: if <.action_ptr> is defined, the referenced function will always be
diff --git a/include/haproxy/uri_normalizer.h b/include/haproxy/uri_normalizer.h
index 99b27330e..5884e5ab9 100644
--- a/include/haproxy/uri_normalizer.h
+++ b/include/haproxy/uri_normalizer.h
@@ -18,6 +18,7 @@
 
 struct ist uri_normalizer_path_dotdot(const struct ist path, char *trash, 
size_t len);
 struct ist uri_normalizer_path_merge_slashes(const struct ist path, char 
*trash, size_t len);
+struct ist uri_normalizer_query_sort(const struct ist query, const char delim, 
char *trash, size_t len);
 
 #endif /* _HAPROXY_URI_NORMALIZER_H */
 
diff --git a/reg-tests/http-rules/normalize_uri.vtc 
b/reg-tests/http-rules/normalize_uri.vtc
index e66bdc47b..8be81c574 100644
--- a/reg-tests/http-rules/normalize_uri.vtc
+++ b/reg-tests/http-rules/normalize_uri.vtc
@@ -8,7 +8,7 @@ feature ignore_unknown_macro
 server s1 {
 rxreq
 txresp
-} -repeat 21 -start
+} -repeat 34 -start
 
 haproxy h1 -conf {
 defaults
@@ -41,6 +41,18 @@ haproxy h1 -conf {
 
 default_backend be
 
+frontend fe_sort_query
+bind "fd@${fe_sort_query}"
+
+http-request set-var(txn.before) url
+http-request normalize-uri sort-query
+http-request set-var(txn.after) url
+
+http-response add-header before  %[var(txn.before)]
+http-response add-header after  %[var(txn.after)]
+
+default_backend be
+
 backend be
 server s1 ${s1_addr}:${s1_port}
 
@@ -154,3 +166,70 @@ client c2 -connect ${h1_fe_dotdot_sock} {
 expect resp.http.before == "*"
 expect resp.http.after == "*"
 } -run
+
+client c3 -connect ${h1_fe_sort_query_sock} {
+txreq -url "/?a=a"
+rxresp
+expect resp.http.before == "/?a=a"
+expect resp.http.after == "/?a=a"
+
+txreq -url "/?a=a&z=z"
+rxresp
+expect resp.http.before == "/?a=a&z=z"
+expect resp.http.after == "/?a=a&z=z"
+
+txreq -url "/?z=z&a=a"
+rxresp
+expect resp.http.before == "/?z=z&a=a"
+expect resp.http.after == "/?a=a&z=z"
+
+txreq -url "/?a=z&z=a"
+rxresp
+expect resp.http.before == "/?a=z&z=a"
+expect resp.http.after == "/?a=z&z=a"
+
+txreq -url "/?z=a&a=z"
+rxresp
+expect resp.http.before == "/?z=a&a=z"
+expect resp.http.after == "/?a=z&z=a"
+
+txreq -url "/?c&b&a&z&x&y"
+rxresp
+expect resp.http.before == "/?c&b&a&z&x&y"
+expect resp.http.after == "/?a&b&c&x&y&z"
+
+txreq -url "/?a=&aa=&aaa=&="
+rxresp
+expect resp.http.before == "/?a=&aa=&aaa=&="
+expect resp.http.after == "/?a=&aa=&aaa=&="
+
+txreq -url "/?=&a=&aa=&aaa="
+rxresp
+expect resp.http.before == "/?=&a=&aa=&aaa="
+expect resp.http.after == "/?a=&aa=&aaa=&="
+
+txreq -url "/?a=5&a=3&a=1&a=2&a=4"
+rxresp
+expect resp.http.before == "/?a=5&a=3&a=1&a=2&a=4"
+expect resp.http.after == "/?a=5&a=3&a=1&a=2&a=4"
+
+txreq -url "/?a=5&b=3&a=1&a=2&b=