Hi, all. Thanks Andrew for that attempt to reproduce this. I have verified this on Wicket 6.16.0 and 7.0.0-M2.

I have checked out the latest code from https://git-wip-us.apache.org/repos/asf/wicket.git . I was going to trace this down in the code, but then I was stopped in my tracks with an Eclipse m2e bug <https://bugs.eclipse.org/bugs/show_bug.cgi?id=371618> that won't even let me clean/compile the project. Argg!! Always something, huh?

But I did start looking in the code. IsoPropertiesFileLoader looks completely OK; it uses Properties.load(InputStream), and the file even indicates that the input encoding must be ISO-8859-1. Not much could go wrong there. I back-referenced the calls up the chain to WicketMessageTagHandler.onComponentTag(Component, ComponentTag), and it looks straightforward there---but that's for message tags, not message body.
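To illustrate why the loader side looks safe, here is a minimal sketch using only the JDK, not Wicket's actual IsoPropertiesFileLoader. The class name IsoPropertiesDemo and the helper loadCopyright are made up for this example. It shows that Properties.load(InputStream) decodes the stream as ISO-8859-1, so a file saved in that encoding yields the right string, while the same text saved as UTF-8 would not.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical demo class; it does not reproduce Wicket's loader.
class IsoPropertiesDemo {

    /** Loads raw properties-file bytes and returns the value stored under "copyright". */
    static String loadCopyright(byte[] fileBytes) {
        Properties properties = new Properties();
        try (InputStream in = new ByteArrayInputStream(fileBytes)) {
            // Properties.load(InputStream) is documented to read ISO-8859-1.
            properties.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return properties.getProperty("copyright");
    }

    public static void main(String[] args) {
        String line = "copyright=This a \u00A9 copyright.";
        String expected = "This a \u00A9 copyright.";

        // Same text, saved in two different file encodings.
        String fromIso = loadCopyright(line.getBytes(StandardCharsets.ISO_8859_1));
        String fromUtf8 = loadCopyright(line.getBytes(StandardCharsets.UTF_8));

        System.out.println("ISO-8859-1 file decoded correctly: " + fromIso.equals(expected));
        System.out.println("UTF-8 file decoded correctly: " + fromUtf8.equals(expected));
    }
}
```

If the properties file really is ISO-8859-1, as in my case, the string in memory should already be correct at this point, which is why I looked further down the chain.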

I investigated downwards from WicketMessageResolver.resolve(...) (which I presume is what is at play here), which has this code:

   MessageContainer label = new MessageContainer(id, messageKey);

MessageContainer.onComponentTagBody(...) simply looks up the value and calls renderMessage(), which in turn does some complicated ${var} replacement using MapVariableInterpolator and then writes out the result using getResponse().write(text). Unless MapVariableInterpolator messes up the value during variable replacement (but there are no variables to replace in this situation), on the surface everything looks OK.

So I decided to do an experiment; I changed the HTML to this:

   <p>This a © copyright. <small><wicket:message key="copyright">dummy
   text</wicket:message></small></p>

And I changed the properties to this:

   copyright=This a © copyright.


Here is what was produced:

   This a © copyright. This a � copyright.


So something is going on here in the generation of the included message, because as you can see the content from XML gets produced correctly. It turns out <http://stackoverflow.com/a/6367675/421049> that � is U+FFFD (encoded in UTF-8 as the bytes EF BF BD), the Unicode replacement character that a decoder substitutes when it encounters an invalid UTF-8 sequence. And of course the lone byte 0xA9, which is how ISO-8859-1 encodes the copyright symbol U+00A9, is not a valid UTF-8 sequence, even though it is fine as part of ISO-8859-1.
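The replacement-character behavior can be checked with a few lines of plain JDK code. This is only a sketch; the class name ReplacementCharDemo and its helper are invented for illustration. It decodes the single ISO-8859-1 byte for the copyright sign as if it were UTF-8, and also shows the byte sequence that U+FFFD itself has in UTF-8.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, JDK only.
class ReplacementCharDemo {

    /** Decodes the given bytes as UTF-8; malformed input becomes U+FFFD. */
    static String decodeAsUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 0xA9 is the copyright sign in ISO-8859-1, but on its own it is
        // a stray continuation byte in UTF-8, so it cannot be decoded.
        String decoded = decodeAsUtf8(new byte[] {(byte) 0xA9});
        System.out.println("Decoded to U+FFFD: " + decoded.equals("\uFFFD"));

        // The UTF-8 encoding of U+FFFD itself is the three bytes EF BF BD.
        for (byte b : "\uFFFD".getBytes(StandardCharsets.UTF_8)) {
            System.out.printf("%02X ", b & 0xFF);
        }
        System.out.println();
    }
}
```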

So here is the problem: something is taking the string generated by the message (which was parsed correctly from the properties file) and writing it to the output stream, not in UTF-8 as it should, but in some other encoding. If I were to guess, I would say that the embedded message is being written out in Windows cp1252 (more or less ISO-8859-1), which is my default encoding. That would explain why Andrew didn't see this, if for example his system is Linux and its default encoding happens to be UTF-8. This seems incorrect to me; the embedded message should know that it is writing into a UTF-8 output stream and should use that encoding instead of the system default.
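The kind of mismatch I am guessing at can be simulated outside Wicket. The sketch below is not Wicket code and does not show where (or whether) Wicket does this; the class EncodingMismatchDemo and the method roundTrip are hypothetical. It encodes the message with one charset and then decodes the bytes as UTF-8, the way a browser would for a page served as UTF-8. ISO-8859-1 stands in for cp1252 here, since both encode the copyright sign as the single byte 0xA9.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; simulates a writer and a UTF-8 reader.
class EncodingMismatchDemo {

    /** Encodes text with writerCharset, then decodes the bytes as UTF-8. */
    static String roundTrip(String text, Charset writerCharset) {
        byte[] wire = text.getBytes(writerCharset);
        return new String(wire, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String message = "This a \u00A9 copyright.";

        // Writer and reader agree on UTF-8: the text survives.
        String good = roundTrip(message, StandardCharsets.UTF_8);
        // Writer uses a single-byte encoding (assumed stand-in for cp1252):
        // the copyright sign arrives as an invalid UTF-8 byte.
        String bad = roundTrip(message, StandardCharsets.ISO_8859_1);

        System.out.println("UTF-8 writer preserved text: " + good.equals(message));
        System.out.println("ISO-8859-1 writer produced U+FFFD: "
                + bad.equals("This a \uFFFD copyright."));
        // The platform default is what an unqualified getBytes() would use.
        System.out.println("Default charset here: " + Charset.defaultCharset());
    }
}
```

The second case matches what I saw in the rendered page, which is why I suspect the platform default encoding is involved somewhere.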

Remember that I can't even compile the code because of an m2e bug, so all of this is highly conjectural, just from visually inspecting the code and doing a few experiments. But I have a hunch that if you switch to a machine that has a default system encoding that isn't UTF-8, you'll reproduce this issue. And I further predict that if you trace through the code, the embedded <wicket:message> tag is incorrectly injecting its contents using the system encoding rather than the entire output stream encoding (however that is configured in Wicket). Put another way, whatever is producing the bytes from the main HTML page is using UTF-8 (as it should), but whatever is taking the message tag output is spitting out its bytes using cp1252 or something similar.

As soon as I can get Eclipse to be happier with the Wicket build, I'll give you some more exact details. But I'll have to take a break and get back to my main work for a while---we're nearing a big deadline and I have some actual functionality to implement! :)

Thanks again for investigating, Andrew.

Garret

On 8/28/2014 8:22 PM, Andrew Geery wrote:
I created a Wicket quickstart (from
http://wicket.apache.org/start/quickstart.html) [this is Wicket 6.16.0] and
made two simple changes:

1) I created a HomePage.properties file, encoded as ISO-8859-1, with a
single line as per the example above: copyright=© 2014 Example, Inc.

2) I added a line to the HomePage.html file as per the example
above: <p><small><wicket:message key="copyright">©
Example</wicket:message></small></p>

The content is served as UTF-8 and the copyright symbol is rendered
correctly on the page.

It doesn't look like the problem is in Wicket (at least not in 6.16).  I
guess your next steps would be to verify that you get the same results and,
assuming that you do, start removing things from your page that has the
problem until you find an element that is causing the problem.

Thanks
Andrew


On Thu, Aug 28, 2014 at 5:38 PM, Garret Wilson <gar...@globalmentor.com>
wrote:

On 8/28/2014 12:08 PM, Sven Meier wrote:

...


My configuration, as far as I can tell, is correct.
 From what you've written, I'd agree.

You should create a quickstart. This will easily allow us to find a
possible bug.

Better than that, I'd like to trace down the bug, fix it, and file a
patch. But currently I'm blocked from working with Wicket on Eclipse <
https://issues.apache.org/jira/browse/WICKET-5649>.

Garret

