Hi Garret,
I'm glad you found the culprit. Thanks for keeping us updated; we all
learn something new each day.
Have fun
Sven
On 09/20/2014 10:28 PM, Garret Wilson wrote:
Hahahaha! I found the problem!
When I looked at the HomePage.properties file in a hex editor, I was
looking at the HomePage.properties file in my source tree. But
remember that this file isn't the one that Wicket loads! After a Maven
build, Wicket will load the HomePage.properties file that Maven copies
to the target directory!! (I should have paid closer attention to the URL
used by URLConnection.) And sure enough, when I open that copied
version of HomePage.properties, it contains the sequence EF BF BD! In
other words, when Maven copied the HomePage.properties file from the
source tree to the target directory, it must have opened it up as
UTF-8, converting the A9 © character (not valid UTF-8) into EF BF BD,
the UTF-8 sequence for U+FFFD, the Unicode replacement character. Thus
when Wicket came along to read the file from the target directory, it
(correctly) loaded it as ISO-8859-1, interpreting EF BF BD as three
characters, ï¿½.
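The chain described above can be reproduced in a few lines of plain Java. This is only a sketch of the byte-level effect, not Maven's or Wicket's actual code; the class and method names (ReplacementCharDemo, copyAsUtf8, hex) are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

class ReplacementCharDemo {
    /** Formats bytes as space-separated upper-case hex, e.g. "3D A9". */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    /** Hypothetical stand-in for a text-mode copy that reads and writes UTF-8. */
    static byte[] copyAsUtf8(byte[] source) {
        // new String(byte[], Charset) replaces malformed input with U+FFFD.
        String decoded = new String(source, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] source = {0x3D, (byte) 0xA9};       // "=©" as ISO-8859-1 bytes
        byte[] copied = copyAsUtf8(source);
        System.out.println(hex(copied));           // prints "3D EF BF BD"

        // Reading the damaged copy as ISO-8859-1 yields one char per byte.
        String loaded = new String(copied, StandardCharsets.ISO_8859_1);
        for (char c : loaded.toCharArray()) {
            System.out.printf("U+%04X%n", (int) c); // U+003D, U+00EF, U+00BF, U+00BD
        }
    }
}
```

The last three code points are ï, ¿ and ½, which is exactly the garbage that showed up on the page.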
But why did Maven use UTF-8 when it copied my HomePage.properties
source file to the target directory? Ummm... because I told it to,
sort of:
<properties>
  <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
</properties>
<build>
  <resources>
    <resource>
      <directory>src/main/resources</directory>
      <filtering>true</filtering>
      <includes>
        <include>**/*.properties</include>
      </includes>
    </resource>
  </resources>
</build>
Apparently when Maven copies resources using filtering, it opens and
parses them using the ${project.build.sourceEncoding} setting, which
of course I had set to UTF-8. I probably need to set the "encoding"
parameter of the maven-resources-plugin
<http://maven.apache.org/plugins/maven-resources-plugin/copy-resources-mojo.html#encoding>.
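Why the charset used for the copy matters can be sketched in Java. This is not how the maven-resources-plugin is implemented; recode is a hypothetical stand-in for any copy that decodes the file to text and encodes it again with the same charset, which is roughly what a filtering copy has to do.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class FilteringCopySketch {
    /** Decodes the bytes to text and encodes them again with the same charset. */
    static byte[] recode(byte[] source, Charset charset) {
        return new String(source, charset).getBytes(charset);
    }

    /** Returns true if the copy is byte-for-byte identical to the source. */
    static boolean survives(byte[] source, Charset charset) {
        return Arrays.equals(source, recode(source, charset));
    }

    public static void main(String[] args) {
        byte[] source = {0x63, 0x3D, (byte) 0xA9}; // "c=©" as ISO-8859-1 bytes

        // The lone A9 byte is malformed UTF-8, so the copy is altered.
        System.out.println(survives(source, StandardCharsets.UTF_8));      // prints "false"

        // ISO-8859-1 maps every byte to one char and back, so nothing changes.
        System.out.println(survives(source, StandardCharsets.ISO_8859_1)); // prints "true"
    }
}
```

Under that assumption, copying the .properties files as ISO-8859-1 (or not filtering them at all) would leave the A9 byte untouched.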
Argg!! So much pain and agony for such a tiny mistake! But I'm glad I
found it. I'll fix it... another day. Right now I'm going to grab some
tequila and celebrate!!
Have a great rest of the weekend, everybody!
Garret
On 9/20/2014 4:14 PM, Garret Wilson wrote:
I'm finally able to trace the code, and this is getting very odd.
I use a hex editor, and the bytes in the properties file are ... 3D
A9 ... (=©), just as I expect.
But when I trace through the Wicket code, the
IsoPropertiesFilePropertiesLoader is using a UrlResourceStream which
uses a URLConnection, which under the hood uses a BufferedInputStream
to a FileInputStream. This in turn is wrapped in another
BufferedInputStream. When the Properties class (from
IsoPropertiesFilePropertiesLoader) parses the file, the internal
Properties.LineReader reads into its inByteBuf variable the sequence
... 3D EF BF BD ...! As mentioned below, EF BF BD is the UTF-8
sequence for U+FFFD, which is the Unicode replacement character.
So it appears that the UrlResourceStream/URLConnection for the
properties file is somewhere trying to open the stream as UTF-8.
Therefore the A9 © character gets converted into the EF BF BD
sequence before it even gets to the parser in
IsoPropertiesFilePropertiesLoader/Properties!
But what would be causing the UrlResourceStream/URLConnection to
default to UTF-8 when opening my properties file? This seems to be
the question that lies at the heart of this problem. Is there some
Wicket or Java setting that is defaulting a URLConnection to use
UTF-8 encoding? (As I mentioned above, the underlying input stream
seems to be a FileInputStream wrapped in two layers of
BufferedInputStream.)
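One thing that can be checked in isolation: an InputStream obtained from a file URL hands back raw bytes, and buffering does not apply any charset. The sketch below writes two bytes to a temporary file and reads them back through URLConnection. The class and method names are invented for this example, and it does not involve Wicket's UrlResourceStream at all.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;

class RawUrlBytes {
    /** Reads every byte from the URL through a buffered URLConnection stream. */
    static byte[] readViaUrl(URL url) throws IOException {
        try (InputStream in = new BufferedInputStream(url.openConnection().getInputStream())) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            int b;
            while ((b = in.read()) != -1) {
                out.write(b);
            }
            return out.toByteArray();
        }
    }

    /** Writes the content to a temporary file and reads it back via a file: URL. */
    static byte[] roundTrip(byte[] content) {
        try {
            Path file = Files.createTempFile("HomePage", ".properties");
            try {
                Files.write(file, content);
                return readViaUrl(file.toUri().toURL());
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (byte b : roundTrip(new byte[] {0x3D, (byte) 0xA9})) {
            System.out.printf("%02X ", b & 0xFF); // prints "3D A9 "
        }
        System.out.println();
    }
}
```

If the bytes arriving in the parser differ from the bytes expected, the file behind the URL is the first thing to compare against.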
Garret
On 8/29/2014 1:15 PM, Garret Wilson wrote:
Hi, all. Thanks Andrew for that attempt to reproduce this. I have
verified this on Wicket 6.16.0 and 7.0.0-M2.
I have checked out the latest code from
https://git-wip-us.apache.org/repos/asf/wicket.git . I was going to
trace this down in the code, but then I was stopped in my tracks
with an Eclipse m2e bug
<https://bugs.eclipse.org/bugs/show_bug.cgi?id=371618> that won't
even let me clean/compile the project. Argg!! Always something, huh?
But I did start looking in the code. IsoPropertiesFilePropertiesLoader looks
completely OK; it uses Properties.load(InputStream), and the file
even indicates that the input encoding must be ISO-8859-1. Not much
could go wrong there. I back-referenced the calls up the chain to
WicketMessageTagHandler.onComponentTag(Component, ComponentTag), and
it looks straightforward there---but that's for message tags, not
message body.
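The behavior of Properties.load(InputStream) is easy to confirm on its own, since the JDK documents that it reads ISO-8859-1. A minimal sketch, with made-up helper names (fileBytes, loadValue) and an in-memory stream standing in for the real file:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Properties;

class IsoPropertiesDemo {
    /** Builds file content from an ASCII prefix followed by raw value bytes. */
    static byte[] fileBytes(String asciiPrefix, int... valueBytes) {
        byte[] prefix = asciiPrefix.getBytes(StandardCharsets.US_ASCII);
        byte[] all = Arrays.copyOf(prefix, prefix.length + valueBytes.length);
        for (int i = 0; i < valueBytes.length; i++) {
            all[prefix.length + i] = (byte) valueBytes[i];
        }
        return all;
    }

    /** Loads the bytes with Properties.load(InputStream) and returns one value. */
    static String loadValue(byte[] content, String key) {
        Properties props = new Properties();
        try {
            props.load(new ByteArrayInputStream(content));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        // A9 is read as the single character U+00A9.
        String ok = loadValue(fileBytes("copyright=", 0xA9), "copyright");
        System.out.println(ok.length());      // prints "1"

        // EF BF BD is read as three separate characters, one per byte.
        String damaged = loadValue(fileBytes("copyright=", 0xEF, 0xBF, 0xBD), "copyright");
        System.out.println(damaged.length()); // prints "3"
    }
}
```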
I investigated downwards from WicketMessageResolver.resolve(...)
(which I presume is what is at play here), which has this code:
MessageContainer label = new MessageContainer(id, messageKey);
The MessageContainer.onComponentTagBody(...) simply looks up the
value and calls renderMessage(), which in turn does some complicated
${var} replacement using MapVariableInterpolator and then writes out
the result using getResponse().write(text). Unless
MapVariableInterpolator messes up the value during variable
replacement (but there are no variables to replace in this
situation), then on the surface everything looks OK.
So I decided to do an experiment; I changed the HTML to this:
<p>This a © copyright. <small><wicket:message key="copyright">dummy
text</wicket:message></small></p>
And I changed the properties to this:
copyright=This a © copyright.
Here is what was produced:
This a © copyright. This a ï¿½ copyright.
So something is going on here in the generation of the included
message, because as you can see the content from XML gets produced
correctly. It turns out <http://stackoverflow.com/a/6367675/421049>
that ï¿½ is how the UTF-8 sequence for U+FFFD (EF BF BD) looks when it
is read as ISO-8859-1. U+FFFD is the Unicode replacement character
that is substituted when an invalid UTF-8 sequence is encountered.
And of course, the lone byte A9 that encodes the copyright symbol
U+00A9 in ISO-8859-1 is not a valid UTF-8 sequence, even though it is
fine as part of ISO-8859-1.
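That claim about validity can be checked with a strict decoder. The following is a small sketch using the standard CharsetDecoder API; the class name StrictUtf8Check and the method isValidUtf8 are invented for the example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictUtf8Check {
    /** Returns true only if the bytes decode as UTF-8 without any substitution. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The ISO-8859-1 encoding of U+00A9 is the single byte A9.
        System.out.println(isValidUtf8(new byte[] {(byte) 0xA9}));              // prints "false"

        // The UTF-8 encoding of U+00A9 is the two bytes C2 A9.
        System.out.println(isValidUtf8(new byte[] {(byte) 0xC2, (byte) 0xA9})); // prints "true"
    }
}
```

A decoder configured to replace rather than report would silently turn the lone A9 into U+FFFD instead of failing.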
So here is the problem: something is taking the string generated by
the message (which was parsed correctly from the properties file)
and writing it to the output stream, not in UTF-8 as it should, but
in some other encoding. If I were to guess here, I would say that
the embedded message is writing out in Windows cp1252 (more or less
ISO-8859-1), which is my default encoding (which would explain why
Andrew didn't see this, if his system is Linux and the default
encoding happens to be UTF-8 for example). This seems incorrect to
me; the embedded message should know that it is writing into a UTF-8
output stream and should use that instead of the system encoding.
Remember that I can't even compile the code because of an m2e bug,
so all of this is highly conjectural, just from visually inspecting
the code and doing a few experiments. But I have a hunch that if you
switch to a machine that has a default system encoding that isn't
UTF-8, you'll reproduce this issue. And I further predict that if
you trace through the code, the embedded <wicket:message> tag is
incorrectly injecting its contents using the system encoding rather
than the entire output stream encoding (however that is configured
in Wicket). Put another way, whatever is producing the bytes from
the main HTML page is using UTF-8 (as it should), but whatever is
taking the message tag output is spitting out its bytes using cp1252
or something similar.
As soon as I can get Eclipse to be happier with the Wicket build,
I'll give you some more exact details. But I'll have to take a break
and get back to my main work for a while---we're nearing a big
deadline and I have some actual functionality to implement! :)
Thanks again for investigating, Andrew.
Garret
On 8/28/2014 8:22 PM, Andrew Geery wrote:
I created a Wicket quickstart (from
http://wicket.apache.org/start/quickstart.html) [this is Wicket
6.16.0] and made two simple changes:
1) I created a HomePage.properties file, encoded as ISO-8859-1, with a
single line as per the example above: copyright=© 2014 Example, Inc.
2) I added a line to the HomePage.html file as per the example
above: <p><small><wicket:message key="copyright">©
Example</wicket:message></small></p>
The content is served as UTF-8 and the copyright symbol is rendered
correctly on the page.
It doesn't look like the problem is in Wicket (at least not in
6.16). I guess your next steps would be to verify that you get the
same results and, assuming that you do, start removing things from
your page that has the problem until you find an element that is
causing the problem.
Thanks
Andrew
On Thu, Aug 28, 2014 at 5:38 PM, Garret Wilson <gar...@globalmentor.com> wrote:
On 8/28/2014 12:08 PM, Sven Meier wrote:
...
My configuration, as far as I can tell, is correct.
From what you've written, I'd agree.
You should create a quickstart. This will easily allow us to find a
possible bug.
Better than that, I'd like to trace down the bug, fix it, and file a
patch. But currently I'm blocked from working with Wicket on Eclipse
<https://issues.apache.org/jira/browse/WICKET-5649>.
Garret
---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@wicket.apache.org
For additional commands, e-mail: users-h...@wicket.apache.org