On 13/01/2010 22:47, Tero Karttunen wrote:
> Thank you for another reply, Chris! I was secretly hoping that
> somebody would stand up and tell me that I have missed something
> obvious, but the more I look into this issue, the messier it seems.
> But let's not get ahead of things.
> I apologize for the inconsistency in the log lines I posted in my
> earlier message. I had tried to replace (and anonymize) the URL path
> elements with s/ts_core_virtual_repository/contextroot/,
> s/TeamCenterEmulator/subcontext/ and s/geek/localhost/, but I had
> obviously missed some references. Sorry for the confusion. Even though
> the cat's out of the bag, I will stick to the replacements for
> consistency's sake.

Just a thought: mod_jk doesn't always play nicely with other modules if
those modules try to manipulate the URI. Have you tried mod_proxy_http?


>>> Notice that mod_alias has erroneously (considering the use case in
>>> question) re-encoded the URL, causing %2B to change into '+' and %3C
>>> to change into equivalent %3c.
>> Note that mod_jk is not involved, here: mod_alias is performing the
>> redirect and mod_jk does not get involved. Also, the change from %3C to
>> %3c is not really a problem: HTTP allows either upper or lowercase
>> %-encoded URI elements (see section 2.2 of
>> http://www.ietf.org/rfc/rfc1738.txt).
> This is correct. The superseding RFC 3986 states: "If two URIs differ
> only in the case of hexadecimal digits used in percent-encoded octets,
> they are equivalent", but it also continues: "For consistency, URI
> producers and normalizers should use uppercase hexadecimal digits for
> all percent-encodings", making one choice preferable to the other.
> [JkOptions +ForwardUriProxy]
>>> Now, if I manually modify the address bar to access
>>> http://localhost/contextroot/subcontext/sites/one%2Bone%3Cfour,
>>> Apache HTTPD access log now shows:
>>> [11/Jan/2010:12:53:37 +0200] "GET
>>> /ts_core_virtual_repository/TeamCenterEmulator/sites/one%2Bone%3Cfour
>>> HTTP/1.1" +200 worker1(worker1) 399 15625
>> Good.
>>> but Tomcat access log still shows:
>>> - - [11/Jan/2010:12:53:34 +0200] "GET
>>> /ts_core_virtual_repository/TeamCenterEmulator/sites/one+one%3Cfour
>>> HTTP/1.1" 200 399
>> Right: that's wrong.
>>> and my application sees after decoding the URL: sites/one one<four
>> Given that Tomcat saw one+one%3Cfour, this is correct decoding.
>> What does the mod_jk log show for this request?
> I reran the test with JK request logging on, and the log shows:
> [Wed Jan 13 12:14:13 2010] worker1 GET
> /contextroot/subcontext/sites/one%2Bone%3Cfour HTTP/1.1 200 0.000000
> so both Apache HTTPD and mod_jk logs show the correctly encoded URLs
> from the browser request.
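
To illustrate why "one+one%3Cfour" ends up as "one one<four" while the
browser's "one%2Bone%3Cfour" would not, here is a minimal sketch of
form-style decoding, in which '+' stands for a space. It assumes the
application decodes this way (Java's URLDecoder does); form_decode()
and hex_value() are hypothetical names used only for illustration and
are not part of Tomcat or mod_jk.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: value of one hex digit, or -1 if it is not one. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/*
 * Form-style decoding (an assumption about what the application does):
 * '+' becomes a space and %XX becomes the byte it encodes.
 * 'out' must hold at least strlen(in) + 1 bytes; the decoded string is
 * never longer than the input.
 * Returns 0 on success, -1 on a malformed percent escape.
 */
static int form_decode(const char *in, char *out)
{
    size_t j = 0;
    for (size_t i = 0; in[i] != '\0'; i++) {
        if (in[i] == '+') {
            out[j++] = ' ';
        } else if (in[i] == '%') {
            int hi = hex_value(in[i + 1]);
            /* Only look at the second digit if the first one was valid. */
            int lo = (hi < 0) ? -1 : hex_value(in[i + 2]);
            if (hi < 0 || lo < 0)
                return -1;
            out[j++] = (char)(hi * 16 + lo);
            i += 2; /* skip the two hex digits just consumed */
        } else {
            out[j++] = in[i];
        }
    }
    out[j] = '\0';
    return 0;
}
```

Under this assumption, form_decode("one+one%3Cfour", buf) yields
"one one<four", whereas form_decode("one%2Bone%3Cfour", buf) yields
"one+one<four". The literal '+' is only recoverable if %2B reaches the
application intact.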
>>> Quite interesting: No URL rewriting should occur at Apache HTTPD,
>>> because the RedirectMatch rule does not match, but the URLs in HTTPD
>>> and Tomcat access logs are semantically different.
>> Well, the RedirectMatch rule does match for the first request, and it
>> definitely appears that mod_alias is mangling your URL. Have you tried
>> snooping the HTTP conversation to make sure it's not your web browser
>> that is misinterpreting the 302 response from httpd?
> Yes I have. Here is a telnet session for proof.
> ==============================
> <username>@<hostname>:~ $ telnet localhost 80
> Trying
> Connected to <hostname>.
> Escape character is '^]'.
> GET /sites/one%2Bone%3C HTTP/1.0
> HTTP/1.1 302 Found
> Date: Wed, 13 Jan 2010 09:57:32 GMT
> Server: Apache/2.2.14 (Win32) mod_jk/1.2.28
> Location: http://localhost/contextroot/subcontext/sites/one+one%3c
> Content-Length: 274
> Connection: close
> Content-Type: text/html; charset=iso-8859-1
> <html><head>
> <title>302 Found</title>
> </head><body>
> <h1>Found</h1>
> <p>The document has moved <a
> href="http://localhost/contextroot/subcontext/sites/one+one%3c">here</a>.</p>
> </body></html>
> ==============================
> It looks to me like both mod_alias and mod_jk (with +ForwardUriProxy)
> decode the URLs and do not subsequently re-encode the '+' character.
> I do not have any C coding experience, but I attempted to check mod_jk
> source code for this. mod_jk uses int jk_canonenc(const char *x, char
> *y, int maxlen) function for encoding when +ForwardUriProxy is on.
> Here are the juicy bits (slightly reformatted for brevity):
>     /* characters which should not be encoded */
>     char *allowed = "~$-_.+!*'(),;:@&=";
>     /* characters which must not be en/de-coded */
>     char *reserved = "/";
>     for (i = 0, j = 0; ch != '\0' && j < maxlen; i++, j++, ch=x[i]) {
>         if (strchr(reserved, ch)) { /* always handle '/' first */
>             y[j] = ch;
>             continue;
>         }
>         /* recode it, if necessary */
>         if (!JK_ISALNUM(ch) && !strchr(allowed, ch)) {
>             if (j+2<maxlen) {
>                 jk_c2hex(ch, &y[j]);
>                 j += 2;
>             }
>             else {
>                 return JK_FALSE;
>             }
>         }
>         else {
>             y[j] = ch;
>         }
>     }
> mod_alias and mod_rewrite are already reported to suffer from similar
> encoding problems. There are several bug reports; the best I could
> find was https://issues.apache.org/bugzilla/show_bug.cgi?id=32328.
> Link http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=421820 is also
> useful in trying to find out what is going on, as is RFC 3986.
> All this is making my head hurt, but what I guess is going on is that
> the original URL (still available as r->unparsed_uri) is being decoded
> in Apache HTTPD at a very early stage, and once mod_jk or other
> dispatchers activate, the r->uri they handle can already be a result
> of multiple URI manipulations by mod_rewrite and other modules, and
> for that reason it can be considered unsafe to mindlessly re-encode
> some of its reserved characters. But this is only my first guess.
> +ForwardUriCompatUnparsed solves the mod_jk part of the problem _for
> me_, but while HTTPD people are working on bug 32328 (since 2007),
> could it be beneficial for mod_jk to maybe offer a fifth Forwarding
> mode as a workaround for the problem for mod_jk users? Maybe taking a
> list of characters to be encoded as an extra argument?
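
As a rough sketch of what such an extra argument could look like, the
standalone function below mirrors the allowed set from the snippet
quoted above and adds a list of characters that are always
percent-encoded. This is not mod_jk code: canon_encode() and its
force_encode parameter are hypothetical, and the behaviour is only an
approximation of jk_canonenc based on the excerpt.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical encoder, loosely modelled on the quoted excerpt.
 * Characters in 'force_encode' are percent-encoded even if they are in
 * the allowed set; '/' is always copied through unchanged.
 * Returns 0 on success, -1 if 'out' (of size 'outlen') is too small.
 */
static int canon_encode(const char *in, char *out, size_t outlen,
                        const char *force_encode)
{
    static const char hex[] = "0123456789ABCDEF";
    /* Same allowed set as in the quoted snippet; note that it contains '+'. */
    static const char allowed[] = "~$-_.+!*'(),;:@&=";
    size_t j = 0;

    if (outlen == 0)
        return -1;

    for (size_t i = 0; in[i] != '\0'; i++) {
        unsigned char ch = (unsigned char)in[i];
        int keep = (ch == '/');

        if (!keep && (isalnum(ch) || strchr(allowed, ch) != NULL))
            keep = (strchr(force_encode, ch) == NULL);

        if (keep) {
            if (j + 1 >= outlen)   /* need room for ch and the final NUL */
                return -1;
            out[j++] = (char)ch;
        } else {
            if (j + 3 >= outlen)   /* need room for %XX and the final NUL */
                return -1;
            out[j++] = '%';
            out[j++] = hex[ch >> 4];
            out[j++] = hex[ch & 0x0F];
        }
    }
    out[j] = '\0';
    return 0;
}
```

With an empty list, canon_encode("/sites/one+one<four", buf, sizeof buf, "")
leaves the '+' alone; with "+" as the list, the '+' comes out as %2B.
The catch is that once the URI has been decoded, a '+' that the client
sent literally and one that arrived as %2B look identical, so forcing
the encoding changes both.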
> Unfortunately, I still have no ideas on how to configure the URL
> redirection for Apache HTTPD so that the plus-characters are preserved
> in encoded format. Does anyone have any ideas or hints?
> Thanks for help!
> Tero Karttunen
