On 19.10.2016 00:21, Mark Thomas wrote:
On 18/10/2016 23:10, Mark Juszczec wrote:
On Oct 18, 2016 5:37 PM, "Mark Thomas" <ma...@apache.org> wrote:


Java handles bytes as signed (-128 to 127) but the data in the input
stream is unsigned. The additional Fs are an artefact of whatever those
bytes were cast to.
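
As an illustration of the point about signed bytes, here is a minimal sketch. The class name, method names and sample byte are hypothetical and are not taken from Tomcat or mod_jk. It shows how a byte above 0x7f picks up leading Fs when it is widened to an int, and how masking with 0xFF recovers the unsigned value:

```java
// Hypothetical demo class; the names and the sample byte are illustrative only.
class SignedByteDemo {

    // Widening a negative byte to int sign-extends it, hence the leading Fs.
    static String signExtendedHex(byte b) {
        return Integer.toHexString(b);
    }

    // Masking with 0xFF keeps only the low 8 bits, i.e. the unsigned value.
    static String unsignedHex(byte b) {
        return Integer.toHexString(b & 0xFF);
    }

    public static void main(String[] args) {
        byte b = (byte) 0xC3; // first byte of "Ë" in UTF-8
        System.out.println(b);                  // -61
        System.out.println(signExtendedHex(b)); // ffffffc3
        System.out.println(unsignedHex(b));     // c3
    }
}
```

So a debugger or log line showing ffffffc3 for the byte 0xc3 is most likely just displaying the sign-extended int, not corrupted data.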

It looks normal to me.

That's what I thought, but I didn't think it would hurt to double check.

What's interesting is that at the next level up, in CoyoteAdapter (I'll have
to double check that), the data in HttpServletRequest appears as the String

JOÃ[CTL-CHAR]L

In that String, the Ã[CTL-CHAR] is the byte sequence 0xc3 0x83 0xc2 0x8b,
which is a corruption of Ë (0xc3 0x8b).

I'm not sure how we go from the correct bytes to 0xc3 0x83 0xc2 0x8b.

Nor me. For the record I did test this and it worked as expected - no
corruption.

I wonder if it is worth a clean install of httpd, mod_jk and Tomcat and
then running a simple test.


I was going to suggest the same thing: a standard, simple installation of httpd, mod_jk and Tomcat (without Shibboleth), and a simple test.
Justification:
a) I have been running several international (French, German, Spanish, English) applications with httpd + mod_jk + Tomcat for years, pretty much "out of the box", with mostly the default parameters for mod_jk (e.g. no special JkOptions), across multiple versions of all of the above, and I have never seen such corruption.
b) I do not use Shibboleth
c) having had a look at the httpd configuration that the OP posted, I cannot see anything definitely wrong, but it is certainly not an out-of-the-box, from-the-Apache-website configuration, so it is a bit hard to figure out what is really going on there.
(It looks more like some pre-packaged setup for one particular application or framework.)

d) some of the above byte sequences look very much like a double UTF-8 encoding took place:
- an "Ë" would be encoded in UTF-8 as the 2 bytes 0xc3 0x8b (which, seen as ISO-8859-1 characters, would look like "A tilde" followed by an unprintable control character)
- then, if you considered these 2 bytes again as 2 ISO-8859-1 characters and re-encoded them in UTF-8, you might indeed get something like 0xc3 0x83 0xc2 0x8b
(0xc3 0x83 for the "A tilde", and 0xc2 0x8b for the "control character").
(I have not really checked the exact byte sequences, but at least they look plausible)
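
To make the double-encoding hypothesis easy to check, here is a small sketch. The class and method names are hypothetical, and it only reproduces the suspected mechanism in isolation; it says nothing about which component in the httpd / mod_jk / Tomcat chain actually does it. It encodes "Ë" as UTF-8, misreads those bytes as ISO-8859-1, and encodes the result as UTF-8 again:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it only models the suspected double encoding.
class DoubleUtf8Demo {

    // Format bytes as "0xc3 0x8b ...", masking to avoid sign extension.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("0x%02x", b & 0xFF));
        }
        return sb.toString();
    }

    // Encode as UTF-8, misread the bytes as ISO-8859-1, encode as UTF-8 again.
    static byte[] doubleEncode(String s) {
        byte[] once = s.getBytes(StandardCharsets.UTF_8);
        String misread = new String(once, StandardCharsets.ISO_8859_1);
        return misread.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u00CB"; // "Ë"
        System.out.println(hex(original.getBytes(StandardCharsets.UTF_8))); // 0xc3 0x8b
        System.out.println(hex(doubleEncode(original)));                    // 0xc3 0x83 0xc2 0x8b
    }
}
```

The second line of output matches the bytes reported for the corrupted String, which is consistent with the data being decoded as ISO-8859-1 somewhere and then re-encoded as UTF-8.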

Here is a link to a great tool for that kind of thing :
http://unicode.scarfboy.com/?s=U%2b00cb

