Re: resource encoding troubles

Garret Wilson Fri, 29 Aug 2014 09:16:01 -0700

Hi, all. Thanks, Andrew, for that attempt to reproduce this. I have verified the problem on Wicket 6.16.0 and 7.0.0-M2.

I have checked out the latest code from https://git-wip-us.apache.org/repos/asf/wicket.git and was going to trace this down in the code, but I was stopped in my tracks by an Eclipse m2e bug <https://bugs.eclipse.org/bugs/show_bug.cgi?id=371618> that won't even let me clean/compile the project. Argg!! Always something, huh?

But I did start looking at the code. IsoPropertiesFileLoader looks completely OK: it uses Properties.load(InputStream), and the file even indicates that the input encoding must be ISO-8859-1. Not much could go wrong there. I traced the calls back up the chain to WicketMessageTagHandler.onComponentTag(Component, ComponentTag), and it looks straightforward there, but that code handles message attributes on tags, not the message body.
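To back up the point that the properties side is not the culprit: Properties.load(InputStream) is specified to read its stream as ISO-8859-1, so a raw 0xA9 byte and a \u00A9 escape should load to the same Java string. A minimal sketch using only the JDK; IsoPropertiesDemo is a hypothetical name, and this does not call Wicket's IsoPropertiesFileLoader itself:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

class IsoPropertiesDemo {

    // Loads one key from raw .properties bytes via Properties.load(InputStream),
    // which decodes the bytes as ISO-8859-1 and also resolves Unicode escapes.
    static String loadValue(byte[] fileBytes, String key) {
        Properties props = new Properties();
        try {
            props.load(new ByteArrayInputStream(fileBytes));
        } catch (IOException e) {
            // Cannot happen for an in-memory stream, but load() declares it.
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        // The entry as an ISO-8859-1 file stores it: one 0xA9 byte for the copyright sign.
        byte[] rawLatin1 = "copyright=\u00A9 2014".getBytes(StandardCharsets.ISO_8859_1);
        // The same entry written with a Unicode escape, which is plain ASCII.
        byte[] escaped = "copyright=\\u00A9 2014".getBytes(StandardCharsets.US_ASCII);

        String a = loadValue(rawLatin1, "copyright");
        String b = loadValue(escaped, "copyright");
        System.out.println(a.equals(b));       // true
        System.out.println((int) a.charAt(0)); // 169, i.e. U+00A9
    }
}
```

Either way the loaded value holds the correct character, which is consistent with the bytes going wrong later, on output.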

I investigated downwards from WicketMessageResolver.resolve(...) (which I presume is what is at play here), which has this code:


   MessageContainer label = new MessageContainer(id, messageKey);

MessageContainer.onComponentTagBody(...) simply looks up the value and calls renderMessage(), which in turn does some complicated ${var} replacement using MapVariableInterpolator and then writes out the result using getResponse().write(text). Unless MapVariableInterpolator mangles the value during variable replacement (and there are no variables to replace in this situation), everything looks OK on the surface.
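As far as the description above goes, the interpolation step works on Java strings, so any encoding damage would have to happen later, when the characters are turned into bytes. The following is a hypothetical, simplified ${var} interpolator (SimpleInterpolator is an illustrative name, not Wicket's MapVariableInterpolator or its API) showing that a message without placeholders, copyright sign included, passes through unchanged:

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SimpleInterpolator {

    // Matches ${name} placeholders; the name is captured in group 1.
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]*)\\}");

    // Replaces each ${name} with its value from the map; unknown names are left as-is.
    static String interpolate(String text, Map<String, ?> vars) {
        Matcher m = VAR.matcher(text);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            Object value = vars.get(m.group(1));
            String replacement = value != null ? value.toString() : m.group();
            // quoteReplacement keeps '$' and '\' in values from being treated specially.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // No placeholders: the string comes back unchanged, U+00A9 included.
        System.out.println(interpolate("This a \u00A9 copyright.", Map.of()));
        // With a placeholder: only the ${year} part is substituted.
        System.out.println(interpolate("\u00A9 ${year} Example, Inc.", Map.of("year", 2014)));
    }
}
```

No charset is involved anywhere in this step; characters only become bytes when the text is written to the response.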


So I decided to do an experiment; I changed the HTML to this:

   <p>This a © copyright. <small><wicket:message key="copyright">dummy
   text</wicket:message></small></p>

And I changed the properties to this:

   copyright=This a © copyright.


Here is what was produced:

   This a © copyright. This a ï¿½ copyright.

So something is going wrong in the generation of the included message, because, as you can see, the content from the markup gets produced correctly. As it turns out <http://stackoverflow.com/a/6367675/421049>, ï¿½ is what the UTF-8 byte sequence for U+FFFD (EF BF BD) looks like when read as ISO-8859-1/cp1252. U+FFFD is the Unicode replacement character, substituted when an invalid UTF-8 sequence is encountered. And of course the copyright symbol U+00A9, which ISO-8859-1 encodes as the single byte 0xA9, is not a valid UTF-8 sequence on its own, even though it is fine as ISO-8859-1.
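Both facts can be checked with the JDK alone. This sketch (ReplacementCharDemo is just an illustrative name) decodes a lone 0xA9 byte as UTF-8, then shows what the UTF-8 encoding of U+FFFD looks like when read as ISO-8859-1:

```java
import java.nio.charset.StandardCharsets;

class ReplacementCharDemo {

    // Decodes bytes as UTF-8; this String constructor substitutes U+FFFD for malformed input.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Reinterprets the UTF-8 bytes of a string as ISO-8859-1 text.
    static String utf8BytesAsLatin1(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // 0xA9 is the copyright sign in ISO-8859-1 and cp1252, but in UTF-8 it is a
        // continuation byte with no lead byte, so it decodes to the replacement character.
        System.out.println(decodeUtf8(new byte[] {(byte) 0xA9}).equals("\uFFFD")); // true

        // U+FFFD is EF BF BD in UTF-8; as ISO-8859-1 those bytes are the three
        // characters i-diaeresis, inverted question mark and one-half.
        System.out.println(utf8BytesAsLatin1("\uFFFD").equals("\u00EF\u00BF\u00BD")); // true
    }
}
```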

So here is the problem: something is taking the string generated by the message (which was parsed correctly from the properties file) and writing it to the output stream not in UTF-8, as it should, but in some other encoding. If I were to guess, I would say that the embedded message is being written out in Windows cp1252 (more or less ISO-8859-1), which is my default encoding. That would explain why Andrew didn't see this if, for example, his system is Linux and its default encoding happens to be UTF-8. This seems incorrect to me; the embedded message should know that it is writing into a UTF-8 output stream and should use that encoding instead of the system encoding.
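The suspected mechanism can be modeled in a few lines. This is a hypothetical simulation, not Wicket's actual response code: the page text is encoded as UTF-8, the embedded message with a second charset, and the combined bytes are decoded as UTF-8, the way a browser would for a page declared as UTF-8. ISO-8859-1 stands in for cp1252 here, since both encode the copyright sign as the same single byte 0xA9:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MixedEncodingDemo {

    // Appends text to the shared byte stream, encoded with the given charset.
    static void write(ByteArrayOutputStream out, String text, Charset charset) {
        byte[] bytes = text.getBytes(charset);
        out.write(bytes, 0, bytes.length);
    }

    // Models the suspected bug: page markup in UTF-8, embedded message in messageCharset.
    static String render(Charset messageCharset) {
        ByteArrayOutputStream page = new ByteArrayOutputStream();
        write(page, "This a \u00A9 copyright. ", StandardCharsets.UTF_8);
        write(page, "This a \u00A9 copyright.", messageCharset);
        // The client decodes the whole response as UTF-8, as declared.
        return new String(page.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Wrong charset for the message: only the second sentence is corrupted.
        System.out.println(render(StandardCharsets.ISO_8859_1).contains("\uFFFD")); // true
        // Same charset for both parts: nothing is corrupted.
        System.out.println(render(StandardCharsets.UTF_8).contains("\uFFFD"));      // false
    }
}
```

The model reproduces the pattern of the experiment (first sentence fine, second one broken); it does not by itself explain every display step that turned U+FFFD into ï¿½ on the actual page.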

Remember that I can't even compile the code because of an m2e bug, so all of this is highly conjectural, based only on visually inspecting the code and doing a few experiments. But I have a hunch that if you switch to a machine whose default system encoding isn't UTF-8, you'll reproduce this issue. I further predict that if you trace through the code, you'll find that the embedded <wicket:message> tag is injecting its contents using the system encoding rather than the encoding of the entire output stream (however that is configured in Wicket). Put another way, whatever produces the bytes for the main HTML page is using UTF-8 (as it should), but whatever takes the message tag output is spitting out its bytes using cp1252 or something similar.
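To check whether a given machine should reproduce the issue under this hypothesis, a small diagnostic can compare how the JVM's default charset and UTF-8 encode the copyright sign. DefaultCharsetCheck and differsFromUtf8 are hypothetical names for illustration:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class DefaultCharsetCheck {

    // True if encoding text with the candidate charset yields different bytes than
    // UTF-8, i.e. text written with that charset would be corrupted in a UTF-8 page.
    static boolean differsFromUtf8(String text, Charset candidate) {
        return !Arrays.equals(text.getBytes(candidate), text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        Charset platform = Charset.defaultCharset();
        System.out.println("Default charset: " + platform);
        System.out.println("Copyright sign would be mis-encoded: "
                + differsFromUtf8("\u00A9", platform));
    }
}
```

On a machine whose default is cp1252 this should report true, and where the default is UTF-8 it reports false. Note that since JDK 18 (JEP 400) the default charset is UTF-8 on all platforms unless overridden, so on newer JVMs the OS locale alone would no longer trigger this.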

As soon as I can get Eclipse to be happier with the Wicket build, I'll give you some more exact details. But I'll have to take a break and get back to my main work for a while; we're nearing a big deadline and I have some actual functionality to implement! :)


Thanks again for investigating, Andrew.

Garret

On 8/28/2014 8:22 PM, Andrew Geery wrote:

I created a Wicket quickstart (from
http://wicket.apache.org/start/quickstart.html) [this is Wicket 6.16.0] and
made two simple changes:

1) I created a HomePage.properties file, encoded as ISO-8859-1, with a
single line as per the example above: copyright=© 2014 Example, Inc.

2) I added a line to the HomePage.html file as per the example
above: <p><small><wicket:message key="copyright">©
Example</wicket:message></small></p>

The content is served as UTF-8 and the copyright symbol is rendered
correctly on the page.

It doesn't look like the problem is in Wicket (at least not in 6.16). I
guess your next steps would be to verify that you get the same results and,
assuming that you do, start removing things from the page that has the
problem until you find the element that is causing it.

Thanks
Andrew


On Thu, Aug 28, 2014 at 5:38 PM, Garret Wilson <gar...@globalmentor.com>
wrote:

On 8/28/2014 12:08 PM, Sven Meier wrote:

...

My configuration, as far as I can tell, is correct.

From what you've written, I'd agree.

You should create a quickstart. This will easily allow us to find a
possible bug.

Better than that, I'd like to trace down the bug, fix it, and file a
patch. But currently I'm blocked from working with Wicket in Eclipse
<https://issues.apache.org/jira/browse/WICKET-5649>.

Garret
