On Mon, Oct 17, 2016 at 4:29 AM, Mark Thomas <ma...@apache.org> wrote:
> On 17/10/2016 08:30, Mark Thomas wrote:
> > On 16/10/2016 19:09, Mark Juszczec wrote:
> >> Hello
> >> I have Tomcat 8.0.28 running on CentOS Linux 7.2.1511 behind Apache.
> >> I'm using AJP 1.3 for communication between Apache and Tomcat.
> >> It's all powered by Java 1.8.
> >> I'm having a problem with international characters when I send them as
> >> part of the request *URI* (which is what GET requests use, and this is a
> >> GET request).
> >> Let's say I get the string AOËL.
> >> The mod_jk log message
> >> "ajp_connection_tcp_send_message::jk_ajp_common.c (1208): sending to
> >> pos=4 len=1411 max=8192" shows the bytes to be:
> >> 41 4f c3 8b 4c
> >> AFAIK this means the correct bytes are being sent to AJP. Is that correct?
> > That is the correct UTF-8 byte encoding for the characters AOËL.
> A small hint. I'd expect those to be % encoded.
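To make the hint concrete, here is a small Python sketch (Python is used only for illustration; it is not part of the Apache/mod_jk/Tomcat setup in this thread). It prints the UTF-8 bytes of AOËL in the same hex form the mod_jk log uses, then the percent-encoded form a client would normally put in the request URI:

```python
from urllib.parse import quote

text = "AOËL"

# UTF-8 bytes, printed as hex the way the mod_jk debug log shows them
utf8_bytes = text.encode("utf-8")
hex_dump = " ".join(f"{b:02x}" for b in utf8_bytes)
print(hex_dump)  # 41 4f c3 8b 4c

# Percent-encoded form: each byte outside the unreserved ASCII set becomes %XX
encoded = quote(text, encoding="utf-8")
print(encoded)  # AO%C3%8BL
```

quote() leaves unreserved ASCII letters such as A, O and L as they are and escapes each of the two UTF-8 bytes of Ë separately.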
Thank you very much for your reply.
After reading the following, I've been thinking the problem is a lack of % encoding:
*"Default encoding for GET*
The character set for HTTP query strings (that's the technical term for
'GET parameters') can be found in sections 2 and 2.1 of the "URI Syntax"
specification. The character set is defined to be US-ASCII
<http://en.wikipedia.org/wiki/ASCII>. Any character that does not map to
US-ASCII must be encoded in some way. Section 2.1 of the URI Syntax
specification says that characters outside of US-ASCII must be encoded using
% escape sequences: each character is encoded as a literal % followed by
the two hexadecimal codes which indicate its character code. Thus, a (US-ASCII
character code 97 = 0x61) is equivalent to %61. There *is no default
encoding for URIs* specified anywhere, which is why there is a lot of
confusion when it comes to decoding these values. "
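A small Python sketch of the decoding confusion described above (Python is only for illustration, and ISO-8859-1 is an assumed example of a receiver guessing a different charset, not a claim about what Tomcat does here). The same escaped bytes turn into different strings depending on which character set the receiver assumes:

```python
from urllib.parse import unquote

# US-ASCII 'a' is code 97 = 0x61, so the escape %61 decodes to 'a'
ascii_a = unquote("%61")
print(ascii_a)  # a

# AOËL percent-encoded from its UTF-8 bytes (41 4f c3 8b 4c)
escaped = "AO%C3%8BL"

# The escapes only say which bytes were sent, not which charset they are in,
# so the receiver has to choose one when turning bytes back into characters.
as_utf8 = unquote(escaped, encoding="utf-8")
as_latin1 = unquote(escaped, encoding="iso-8859-1")

print(as_utf8)          # AOËL
print(repr(as_latin1))  # 'AOÃ\x8bL'
```

So % encoding by itself does not settle the question; both ends still have to agree on the charset used to decode the bytes.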
Do you know if there's a way to force something (mod_jk, mod_rewrite or
something else) to % encode the data being fed into the AJP port?