Re: mod_proxy_html and special characters
For clarity's sake, the spec defines these two entities as not-equal.

Of course, %41 and 'A' are equivalent, so such a function might not be a bad thing to have in refactoring URI handling.

On Mon, May 28, 2018, 04:10 Nick Kew wrote:

> >> ctx->buf = http://internal/!%22%23$/
> >> m->from.c = http://internal/!"#$/
>
> A further thought arising from that.
>
> Just as strcasecmp is case-independent, the world could no doubt use
> a standard library function that would treat the above as equal.
>
> Something like
> int stringcmp(const char *a, const char *b, unsigned int flags)
> where flags would control behaviour such as case-independence,
> and equivalence over URLencoding, HTML encoding, HTML entities,
> and whatever else someone might like to support (maybe integrate
> with locale too?).
>
> Anyone know of such a thing?
>
> --
> Nick Kew
Re: mod_proxy_html and special characters
>> ctx->buf = http://internal/!%22%23$/
>> m->from.c = http://internal/!"#$/

A further thought arising from that.

Just as strcasecmp is case-independent, the world could no doubt use a standard library function that would treat the above as equal.

Something like

    int stringcmp(const char *a, const char *b, unsigned int flags)

where flags would control behaviour such as case-independence, and equivalence over URLencoding, HTML encoding, HTML entities, and whatever else someone might like to support (maybe integrate with locale too?).

Anyone know of such a thing?

--
Nick Kew
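
A minimal sketch of what such a function could look like, for illustration only. The flag names STRINGCMP_NOCASE and STRINGCMP_URLDECODE and the helpers hex_value and next_byte are hypothetical and not part of any standard library; only case folding and URL-decoding are covered, and HTML entities and locale handling are left out. Note that this sketch decodes every %XX sequence, so it also treats %23 and '#' as equal, which may not be what a URI-aware comparison wants.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical flags; not part of any standard library. */
#define STRINGCMP_NOCASE    0x1u  /* compare case-insensitively */
#define STRINGCMP_URLDECODE 0x2u  /* treat %XX as the byte it encodes */

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(unsigned char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Return the next logical byte of *s and advance *s past it.
 * Returns 0 (without advancing) at the end of the string. */
static unsigned char next_byte(const char **s, unsigned int flags)
{
    const unsigned char *p = (const unsigned char *)*s;
    unsigned char c = *p;
    size_t len = 1;

    if (c == '\0')
        return 0;

    if ((flags & STRINGCMP_URLDECODE) && c == '%') {
        int hi = hex_value(p[1]);
        /* Only look at p[2] if p[1] was a hex digit (and so not NUL). */
        int lo = (hi < 0) ? -1 : hex_value(p[2]);
        /* Leave %00 undecoded so it cannot be mistaken for end of string. */
        if (lo >= 0 && (hi * 16 + lo) != 0) {
            c = (unsigned char)(hi * 16 + lo);
            len = 3;
        }
    }
    *s += len;

    if (flags & STRINGCMP_NOCASE)
        c = (unsigned char)tolower(c);
    return c;
}

/* Compare a and b under the given flags; 0 means "equal". */
int stringcmp(const char *a, const char *b, unsigned int flags)
{
    for (;;) {
        unsigned char ca = next_byte(&a, flags);
        unsigned char cb = next_byte(&b, flags);
        if (ca != cb || ca == '\0')
            return (int)ca - (int)cb;
    }
}
```

With STRINGCMP_URLDECODE set, the two strings quoted at the top of this message compare equal; with flags set to 0 they do not.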
Re: mod_proxy_html and special characters
> On 28 May 2018, at 08:50, Micha Lenk wrote:
>
> The reason I am asking this is, because for Location matching, Apache httpd
> apparently does map a request with a URL encoded path to the non-encoded
> configured path. For example, if I have configured in a virtual host:

Yes of course httpd deals with encoding, as it must, in processing a request URL.

> ProxyPass "http://internal/!\"#$/"
> ProxyHTMLURLMap "http://internal/!\"#$/" "http://external/!\"#$/"
> ...

mod_proxy_html is not processing a request URL, it's processing contents in the response. Contents destined, and encoded, for an HTTP client. The resemblance is entirely coincidental. To align the behaviour on grounds of consistency would seem to me misleading!

--
Nick Kew
Re: mod_proxy_html and special characters
Hi Eric,

On 05/25/2018 06:57 PM, Eric Covener wrote:
>> <a href="http://internal/!%22%23$/">A link with special characters</a>
>> ProxyHTMLURLMap "http://internal/!\"#$/" "http://external/!\"#$/"
>> Is it reasonable to expect mod_proxy_html to rewrite URL encoded URLs as well?
> IMO no, I don't think the literals in the first argument should be expected to match the URL-encoded content

The reason I am asking this is, because for Location matching, Apache httpd apparently does map a request with a URL encoded path to the non-encoded configured path. For example, if I have configured in a virtual host:

    ProxyPass "http://internal/!\"#$/"
    ProxyHTMLURLMap "http://internal/!\"#$/" "http://external/!\"#$/"
    ...

... then for matching the location container it does not matter whether the path of the request is URL encoded or not.

I consider this behavior a bit inconsistent. URL-encoded requests get proxied to the internal resource as if they were not URL-encoded. But URL-encoding a few characters in the path is sufficient to bypass HTML rewriting.

Regards,
Micha
Re: mod_proxy_html and special characters
> On 25 May 2018, at 17:43, Micha Lenk wrote:
>
>     s_from = strlen(m->from.c);
>     if (!strncasecmp(ctx->buf, m->from.c, s_from)) {
>         ...
>         ... do the string replacement ...
>
> ... where ctx->buf is the URL found in the HTML document, and m->from.c is
> the first configured argument of ProxyHTMLURLMap. So, if the latter is a
> prefix of the first, this condition should be true and the string replacement
> should happen. When the expected string replacement doesn't happen, the
> condition is false and the values of the variables are:
>
> ctx->buf = http://internal/!%22%23$/
> m->from.c = http://internal/!"#$/
>
> So, the strings don't match and are not replaced for that reason.

Yep. mod_proxy_html takes what it sees. That's why it relies on another module (mod_xml2enc) for i18n, which is kind-of what I expected to see from your subject line!

> Going forward I am not interested in finding a workaround for this, but more
> how to approach a fix (if this is a bug at all).
>
> Is it reasonable to expect mod_proxy_html to rewrite URL encoded URLs as well?

I think it's reasonable to use the escaped html in your ProxyHTMLURLMap. If we have mod_proxy_html unescape characters, it adds complexity to the code, and (perhaps more to the point) presents a mirror-image of your problem to anyone with the opposite expectations.

> Let's assume this needs to be fixed. To make the strings match, we could
> either URL escape the value from the Apache directive ProxyHTMLURLMap, or
> temporarily URL-decode the string found in the HTML document just for the
> purpose of the string comparison. What is the right thing to do?

I prefer to leave it to server admins to find the match that works for them. I don't recollect this particular question ever arising in 15 years, which kind-of suggests users are not confused by it!

--
Nick Kew
Re: mod_proxy_html and special characters
On Fri, May 25, 2018 at 11:57 AM, Eric Covener wrote:

> > <a href="http://internal/!%22%23$/">A link with special characters</a>
>
> > ProxyHTMLURLMap "http://internal/!\"#$/" "http://external/!\"#$/"
>
> > Is it reasonable to expect mod_proxy_html to rewrite URL encoded URLs as
> > well?
>
> IMO no, I don't think the literals in the first argument should be
> expected to match the URL-encoded content

Agreed that the pattern above should only match and pass (or reflect, in a rewrite case) a literal '#' for a fragment. If you mean %23, don't write it as '#'.

The %-enc should be retained, and matched distinctly, unless their plaintext is equivalent, e.g. meets none of the sub-delim or delim or restricted set. Which must therefore include %25, %-encoded '%' itself. Any %41 or 'A' are equivalent because their definition is an identity. But I don't know that you can use %41 in the match pattern as we would not decode that, and you likely can force any result to contain a %41.

This is not well handled in general, there are ideas floating around, but since there is no committee interest beyond 2.4.x and complete division of opinion on how anything >2.4.x would be managed, it looks most practical to clearly document existing observed behavior.
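
The distinction drawn above can be sketched as a normalization step that decodes a %XX sequence only when it encodes a character with no special meaning in a URI. This is an illustration under assumptions, not httpd code: normalize_unreserved, is_unreserved and hex_value are hypothetical names, and the "equivalent" set is assumed to be the "unreserved" characters of RFC 3986 (letters, digits, '-', '.', '_', '~'). Everything else, including %23 and %25, is retained as written.

```c
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(unsigned char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* RFC 3986 "unreserved" set: ALPHA / DIGIT / "-" / "." / "_" / "~". */
static int is_unreserved(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') ||
           c == '-' || c == '.' || c == '_' || c == '~';
}

/* Hypothetical helper: copy `in` to `out`, decoding %XX only when it
 * encodes an unreserved character. All other bytes are copied verbatim.
 * Returns 0 on success, -1 if `out` (of size outsz) is too small. */
int normalize_unreserved(const char *in, char *out, size_t outsz)
{
    const unsigned char *p = (const unsigned char *)in;
    size_t o = 0;

    if (outsz == 0)
        return -1;

    while (*p != '\0') {
        unsigned char c = *p;
        size_t len = 1;

        if (c == '%') {
            int hi = hex_value(p[1]);
            /* Only look at p[2] if p[1] was a hex digit (and so not NUL). */
            int lo = (hi < 0) ? -1 : hex_value(p[2]);
            if (lo >= 0 && is_unreserved((unsigned char)(hi * 16 + lo))) {
                c = (unsigned char)(hi * 16 + lo);
                len = 3;
            }
        }
        if (o + 1 >= outsz)   /* keep room for the terminating NUL */
            return -1;
        out[o++] = (char)c;
        p += len;
    }
    out[o] = '\0';
    return 0;
}
```

Under these assumptions, %41 normalizes to 'A' while %23 and %25 stay encoded, so two strings normalized this way compare equal only where their encodings are interchangeable.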
Re: mod_proxy_html and special characters
> <a href="http://internal/!%22%23$/">A link with special characters</a>

> ProxyHTMLURLMap "http://internal/!\"#$/" "http://external/!\"#$/"

> Is it reasonable to expect mod_proxy_html to rewrite URL encoded URLs as
> well?

IMO no, I don't think the literals in the first argument should be expected to match the URL-encoded content
mod_proxy_html and special characters
Hi all,

I'm currently facing an issue where the directive ProxyHTMLURLMap does not work. I am not sure whether that is by design or not, and I would appreciate some feedback.

Let's assume an imaginary backend server delivers an HTML page that contains a link like this:

    <a href="http://internal/!%22%23$/">A link with special characters</a>

Please note that %22 is the double quote, which needs to be encoded to not break the HTML, and %23 is the '#' character, which we don't want to get treated as anchor in this case. So, the unencoded URL would look like this:

    http://internal/!"#$/

Now, Apache configured as reverse proxy should rewrite this link to http://external/!"#$/ (or http://external/!%22%23$/), but not any other links outside the sub directory /!"#$/ (nor /!%22%23$/). An imaginary configuration to achieve that and to showcase the issue I am trying to get feedback on looks like this:

    ProxyHTMLURLMap "http://internal/!\"#$/" "http://external/!\"#$/"

Please note that the double quote is only escaped here with a backslash to cater for the Apache configuration syntax requirements.

This does not work, i.e. the URL in the HTML document doesn't get rewritten.

Let's try to better understand what exactly is happening here. Looking into the code of mod_proxy_html.c (trunk, SVN rev. 1832252), this is where the string comparison happens:

    s_from = strlen(m->from.c);
    if (!strncasecmp(ctx->buf, m->from.c, s_from)) {
        ...
        ... do the string replacement ...

... where ctx->buf is the URL found in the HTML document, and m->from.c is the first configured argument of ProxyHTMLURLMap. So, if the latter is a prefix of the first, this condition should be true and the string replacement should happen. When the expected string replacement doesn't happen, the condition is false and the values of the variables are:

    ctx->buf = http://internal/!%22%23$/
    m->from.c = http://internal/!"#$/

So, the strings don't match and are not replaced for that reason.

Going forward I am not interested in finding a workaround for this, but more how to approach a fix (if this is a bug at all).

Is it reasonable to expect mod_proxy_html to rewrite URL encoded URLs as well?

Let's assume this needs to be fixed. To make the strings match, we could either URL escape the value from the Apache directive ProxyHTMLURLMap, or temporarily URL-decode the string found in the HTML document just for the purpose of the string comparison. What is the right thing to do?

If you have managed to read all this down to this line, I am curious about your feedback. :)

Regards,
Micha
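
The comparison described above can be reproduced outside httpd with a few lines of C. This is only a sketch: prefix_matches is a hypothetical helper that mirrors the strncasecmp prefix test quoted from mod_proxy_html.c, and the string arguments stand in for ctx->buf and m->from.c.

```c
#include <string.h>
#include <strings.h>   /* strncasecmp (POSIX) */

/* Hypothetical helper, not part of mod_proxy_html: returns 1 if `from`
 * is a case-insensitive prefix of `url`, 0 otherwise. This mirrors the
 * byte-wise test `!strncasecmp(ctx->buf, m->from.c, s_from)`. */
int prefix_matches(const char *url, const char *from)
{
    size_t s_from = strlen(from);
    return strncasecmp(url, from, s_from) == 0;
}
```

Called with the URL-encoded link from the HTML document and the unencoded directive argument, prefix_matches("http://internal/!%22%23$/", "http://internal/!\"#$/") returns 0, because '%' and '"' differ at the first special character. Called with the URL-encoded form as the second argument, it returns 1, since the test is purely byte-wise apart from letter case.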