Re: resource encoding troubles
I'm finally able to trace the code, and this is getting very odd.

I use a hex editor, and the bytes in the properties file are ... 3D A9 ... (=©), just as I expect. But when I trace through the Wicket code, IsoPropertiesFilePropertiesLoader uses a UrlResourceStream, which uses a URLConnection, which under the hood uses a BufferedInputStream over a FileInputStream. This in turn is wrapped in another BufferedInputStream. When the Properties class (called from IsoPropertiesFilePropertiesLoader) parses the file, the internal Properties.LineReader reads into its inByteBuf variable the sequence ... 3D EF BF BD ...!

As mentioned below, EF BF BD is the UTF-8 sequence for U+FFFD, the Unicode replacement character. So it appears that somewhere the UrlResourceStream/URLConnection for the properties file is trying to open the stream as UTF-8. The A9 byte for © therefore gets converted into the EF BF BD sequence before it even reaches the parser in IsoPropertiesFilePropertiesLoader/Properties!

But what would be causing the UrlResourceStream/URLConnection to default to UTF-8 when opening my properties file? This seems to be the question that lies at the heart of this problem. Is there some Wicket or Java setting that is defaulting a URLConnection to use UTF-8 encoding? (As I mentioned above, the underlying input stream seems to be a FileInputStream wrapped in two layers of BufferedInputStream.)

Garret

On 8/29/2014 1:15 PM, Garret Wilson wrote: Hi, all. Thanks Andrew for that attempt to reproduce this. ...
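A check like the one done by hand in the debugger can be automated. The following is a minimal sketch, not Wicket code: the class and method names are hypothetical, and it only assumes that the stream being scanned is the same one the application opens. It reports the offset of the first EF BF BD sequence, which is a strong hint that the file was already rewritten through a UTF-8 decoder before it was read.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical diagnostic helper; the name is illustrative only.
final class ReplacementCharScanner {

    /** Returns the offset of the first EF BF BD byte sequence, or -1 if there is none. */
    static long indexOfReplacementSequence(InputStream in) {
        int prev2 = -1;
        int prev1 = -1;
        long pos = 0;
        try {
            int b;
            while ((b = in.read()) != -1) {
                if (prev2 == 0xEF && prev1 == 0xBF && b == 0xBD) {
                    return pos - 2; // offset of the EF byte
                }
                prev2 = prev1;
                prev1 = b;
                pos++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] damaged = {'c', '=', (byte) 0xEF, (byte) 0xBF, (byte) 0xBD};
        byte[] intact = {'c', '=', (byte) 0xA9};
        System.out.println(indexOfReplacementSequence(new ByteArrayInputStream(damaged))); // 2
        System.out.println(indexOfReplacementSequence(new ByteArrayInputStream(intact)));  // -1
    }
}
```

To scan the file the application really loads, the stream could come from something like SomePage.class.getResource("SomePage.properties").openStream(), where SomePage stands for whichever class sits next to the properties file; printing that URL also shows which copy of the file is in play.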
Re: resource encoding troubles
Hahahaha! I found the problem!

When I looked at the HomePage.properties file in a hex editor, I was looking at the HomePage.properties file in my source tree. But remember that this file isn't the one that Wicket loads! After a Maven build, Wicket will load the HomePage.properties file that Maven copies to the target directory!! (I should have paid closer attention to the URL used by URLConnection.) And sure enough, when I open that copied version of HomePage.properties, it contains the sequence EF BF BD!

In other words, when Maven copied the HomePage.properties file from the source tree to the target directory, it must have opened it up as UTF-8, converting the A9 byte for © (not valid UTF-8) into EF BF BD, the UTF-8 sequence for U+FFFD, the Unicode replacement character. Thus when Wicket came along to read the file from the target directory, it (correctly) loaded it as ISO-8859-1, interpreting EF BF BD as three characters, ï¿½.

But why did Maven use UTF-8 when it copied my HomePage.properties source file to the target directory? Ummm... because I told it to, sort of:

    <properties>
      <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>
    <build>
      <resources>
        <resource>
          <directory>src/main/resources</directory>
          <filtering>true</filtering>
          <includes>
            <include>**/*.properties</include>
          </includes>
        </resource>
      </resources>
    </build>

Apparently when Maven copies resources using filtering, it opens and parses them using the ${project.build.sourceEncoding} setting, which of course I had set to UTF-8. I probably need to set the encoding parameter of the maven-resources-plugin http://maven.apache.org/plugins/maven-resources-plugin/copy-resources-mojo.html#encoding.

Argg!! So much pain and agony for such a tiny mistake! But I'm glad I found it. I'll fix it... another day. Right now I'm going to grab some tequila and celebrate!! Have a great rest of the weekend, everybody!

Garret

On 9/20/2014 4:14 PM, Garret Wilson wrote: I'm finally able to trace the code, and this is getting very odd. ...
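The effect of a copy that decodes and re-encodes text can be reproduced without Maven. This is a minimal sketch under a stated assumption: copyThroughCharset is a hypothetical stand-in for a filtering copy (decode to text, then encode again with the same charset), not the maven-resources-plugin's actual code. It shows that an ISO-8859-1 file survives the round trip through ISO-8859-1 but not through UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical simulation of a text-filtering copy; not Maven's implementation.
final class FilteredCopySimulation {

    /** Decodes the bytes to text and encodes them again, both with the given charset. */
    static byte[] copyThroughCharset(byte[] source, Charset charset) {
        // Malformed input is replaced by the charset's replacement character (U+FFFD for UTF-8).
        return new String(source, charset).getBytes(charset);
    }

    public static void main(String[] args) {
        // A properties line whose copyright sign is the single ISO-8859-1 byte A9.
        byte[] source = "copyright=\u00A9 2014\n".getBytes(StandardCharsets.ISO_8859_1);

        byte[] viaLatin1 = copyThroughCharset(source, StandardCharsets.ISO_8859_1);
        byte[] viaUtf8 = copyThroughCharset(source, StandardCharsets.UTF_8);

        System.out.println(Arrays.equals(source, viaLatin1)); // true: bytes unchanged
        System.out.println(Arrays.equals(source, viaUtf8));   // false: A9 became EF BF BD
        System.out.println(viaUtf8.length - source.length);   // 2
    }
}
```

The point of the comparison is that the charset used by the copy has to match the file's real encoding; for the setup described above, that suggests giving the resources plugin an ISO-8859-1 encoding for properties files, or not filtering them at all.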
Re: resource encoding troubles
Hi Garret,

I'm glad you found the culprit. Thanks for keeping us updated, we all learn something new each day.

Have fun
Sven

On 09/20/2014 10:28 PM, Garret Wilson wrote: Hahahaha! I found the problem! ...
Re: resource encoding troubles
Thanks Andrew!

Sven

On 08/29/2014 05:22 AM, Andrew Geery wrote:

I created a Wicket quickstart (from http://wicket.apache.org/start/quickstart.html) [this is Wicket 6.16.0] and made two simple changes:

1) I created a HomePage.properties file, encoded as ISO-8859-1, with a single line as per the example above: copyright=© 2014 Example, Inc.

2) I added a line to the HomePage.html file as per the example above: <p><small><wicket:message key="copyright">© Example</wicket:message></small></p>

The content is served as UTF-8 and the copyright symbol is rendered correctly on the page. It doesn't look like the problem is in Wicket (at least not in 6.16). I guess your next steps would be to verify that you get the same results and, assuming that you do, start removing things from your page that has the problem until you find an element that is causing the problem.

Thanks
Andrew

On Thu, Aug 28, 2014 at 5:38 PM, Garret Wilson gar...@globalmentor.com wrote:

On 8/28/2014 12:08 PM, Sven Meier wrote: ... My configuration, as far as I can tell, is correct.

From what you've written, I'd agree. You should create a quickstart. This will easily allow us to find a possible bug.

Better than that, I'd like to trace down the bug, fix it, and file a patch. But currently I'm blocked from working with Wicket on Eclipse https://issues.apache.org/jira/browse/WICKET-5649.

Garret

- To unsubscribe, e-mail: users-unsubscr...@wicket.apache.org For additional commands, e-mail: users-h...@wicket.apache.org
Re: resource encoding troubles
Hi, all. Thanks Andrew for that attempt to reproduce this. I have verified this on Wicket 6.16.0 and 7.0.0-M2. I have checked out the latest code from https://git-wip-us.apache.org/repos/asf/wicket.git . I was going to trace this down in the code, but then I was stopped in my tracks with an Eclipse m2e bug https://bugs.eclipse.org/bugs/show_bug.cgi?id=371618 that won't even let me clean/compile the project. Argg!! Always something, huh?

But I did start looking in the code. IsoPropertiesFilePropertiesLoader looks completely OK; it uses Properties.load(InputStream), and the file even indicates that the input encoding must be ISO-8859-1. Not much could go wrong there. I back-referenced the calls up the chain to WicketMessageTagHandler.onComponentTag(Component, ComponentTag), and it looks straightforward there---but that's for message tags, not message body.

I investigated downwards from WicketMessageResolver.resolve(...) (which I presume is what is at play here), which has this code:

    MessageContainer label = new MessageContainer(id, messageKey);

The MessageContainer.onComponentTagBody(...) simply looks up the value and calls renderMessage(), which in turn does some complicated ${var} replacement using MapVariableInterpolator and then writes out the result using getResponse().write(text). Unless MapVariableInterpolator messes up the value during variable replacement (but there are no variables to replace in this situation), then on the surface everything looks OK.

So I decided to do an experiment; I changed the HTML to this:

    <p>This a © copyright. <small><wicket:message key="copyright">dummy text</wicket:message></small></p>

And I changed the properties to this:

    copyright=This a © copyright.

Here is what was produced:

    This a © copyright. This a ï¿½ copyright.

So something is going on here in the generation of the included message, because as you can see the content from XML gets produced correctly. It turns out http://stackoverflow.com/a/6367675/421049 that ï¿½ is what the UTF-8 sequence for U+FFFD (EF BF BD) looks like when it is decoded as ISO-8859-1/cp1252; U+FFFD is the Unicode replacement character, substituted when an invalid UTF-8 sequence is encountered. And of course the single byte A9, which encodes the copyright symbol U+00A9 in ISO-8859-1, is not a valid UTF-8 sequence, even though it is fine as part of ISO-8859-1.

So here is the problem: something is taking the string generated by the message (which was parsed correctly from the properties file) and writing it to the output stream, not in UTF-8 as it should, but in some other encoding. If I were to guess here, I would say that the embedded message is writing out in Windows cp1252 (more or less ISO-8859-1), which is my default encoding (which would explain why Andrew didn't see this, if his system is Linux and the default encoding happens to be UTF-8 for example). This seems incorrect to me; the embedded message should know that it is writing into a UTF-8 output stream and should use that instead of the system encoding.

Remember that I can't even compile the code because of an m2e bug, so all of this is highly conjectural, just from visually inspecting the code and doing a few experiments. But I have a hunch that if you switch to a machine that has a default system encoding that isn't UTF-8, you'll reproduce this issue. And I further predict that if you trace through the code, the embedded wicket:message tag is incorrectly injecting its contents using the system encoding rather than the entire output stream encoding (however that is configured in Wicket). Put another way, whatever is producing the bytes from the main HTML page is using UTF-8 (as it should), but whatever is taking the message tag output is spitting out its bytes using cp1252 or something similar.

As soon as I can get Eclipse to be happier with the Wicket build, I'll give you some more exact details. But I'll have to take a break and get back to my main work for a while---we're nearing a big deadline and I have some actual functionality to implement! :)

Thanks again for investigating, Andrew.

Garret

On 8/28/2014 8:22 PM, Andrew Geery wrote: I created a Wicket quickstart (from http://wicket.apache.org/start/quickstart.html) [this is Wicket 6.16.0] and made two simple changes: ...
Re: resource encoding troubles
On 8/29/2014 9:15 AM, Garret Wilson wrote: ... So here is the problem: something is taking the string generated by the message (which was parsed correctly from the properties file) and writing it to the output stream, not in UTF-8 as it should, but in some other encoding.

Hmmm... the sequence of events would have to be a little more complicated than that. If somehow the properties file were being read as UTF-8 (it shouldn't be), then the A9 byte for U+00A9 would be invalid and would be mapped to the replacement character U+FFFD. Then if /that/ character were written out as UTF-8 (EF BF BD) and those bytes were in turn interpreted as cp1252/ISO-8859-1, it would produce the sequence ï¿½, which I'm seeing. But that would require two levels of errors, it would seem. And the code looks like the properties file is being read correctly in IsoPropertiesFilePropertiesLoader. (Maybe something is being cached in the system encoding, and then being read from the cache using UTF-8.)

So I can sense the problem here, but I don't yet see where it's happening in the code. As soon as I'm able to trace the code, I would imagine I could find it pretty quickly.

Garret
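The two-step chain described in the message above can be checked in isolation with plain JDK calls. This is a minimal sketch; the class and method names are hypothetical and nothing here is Wicket code. It starts from the single ISO-8859-1 byte for ©, pushes it through a UTF-8 decode and re-encode, and then reads the result one byte per character.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo of the "two levels of errors" chain; names are illustrative.
final class DoubleMisreadDemo {

    /** Formats bytes as space-separated uppercase hex, for example "EF BF BD". */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    /** Step 1: decode as UTF-8 (a lone A9 is malformed and becomes U+FFFD), then write UTF-8. */
    static byte[] rewriteAsUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
    }

    /** Step 2: read the bytes as ISO-8859-1, one character per byte. */
    static String readAsLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] original = {(byte) 0xA9}; // the copyright sign in ISO-8859-1
        byte[] rewritten = rewriteAsUtf8(original);
        String displayed = readAsLatin1(rewritten);

        System.out.println(hex(rewritten));     // EF BF BD
        System.out.println(displayed.length()); // 3
    }
}
```

The three resulting characters are U+00EF, U+00BF and U+00BD, which render as ï¿½, so the observed output is consistent with one lossy UTF-8 rewrite followed by a perfectly ordinary ISO-8859-1 read.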
resource encoding troubles
I have Wicket 7.0.0-M2 running on embedded Jetty, which is correctly returning a content type of UTF-8 for my Wicket page:

    Date: Thu, 28 Aug 2014 15:37:52 GMT
    Expires: Thu, 01 Jan 1970 00:00:00 GMT
    Pragma: no-cache
    Cache-Control: no-cache, no-store
    Content-Type: text/html; charset=UTF-8
    Transfer-Encoding: chunked
    Server: Jetty(9.1.0.v20131115)

I have a properties file FooterPanel.properties that contains the following line (encoded in ISO-8859-1, as properties files unfortunately require):

    copyright=© 2014 Example, Inc.

FooterPanel.html is encoded in UTF-8, has the appropriate XML prolog, and contains the following reference to the property resource:

    <?xml version="1.0" encoding="utf-8"?>
    ...
    <p><small><wicket:message key="copyright">© Example</wicket:message></small></p>

When this all is rendered, here is what I see in Firefox 31 and Chrome 37:

    ï¿½ 2014 Example, Inc.

I thought I had all the correct encoding indicators at each stage in the pipeline. But somebody blinked. Where is the problem?

Garret
Re: resource encoding troubles
Look at http://apache-wicket.1842946.n4.nabble.com/How-to-localize-options-in-drop-down-tt4661751.html#a4661768

François Meillet
Wicket Training - Wicket Development

On 28 August 2014 at 17:47, Garret Wilson gar...@globalmentor.com wrote: I have Wicket 7.0.0-M2 running on embedded Jetty, which is correctly returning a content type of UTF-8 for my Wicket page: ...
Re: resource encoding troubles
Please explain explicitly what you are trying to say. I don't see how that link is relevant.

* I am using FooterPanel.properties.
* Java properties files, as per the specification http://docs.oracle.com/javase/8/docs/api/java/util/Properties.html, are (and always have been) encoded in ISO-8859-1. (I don't like this either, but that's how it is.)
* My FooterPanel.properties file is encoded in ISO-8859-1, and there is nothing to indicate otherwise. There is no BOM present. There is no utf in the filename.
* The character © is U+00A9, which takes up exactly one byte in ISO-8859-1. It is correctly encoded in FooterPanel.properties.

So what specifically are you implying by the link? Are you implying that Wicket does not support the Java properties specification? Are you implying I did something incorrectly in my properties file? Please elaborate.

Garret

On 8/28/2014 9:57 AM, Francois Meillet wrote: Look at http://apache-wicket.1842946.n4.nabble.com/How-to-localize-options-in-drop-down-tt4661751.html#a4661768
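The points in the list above about the properties format can be illustrated with java.util.Properties alone. This is a minimal sketch with a hypothetical helper class; it assumes only the documented behavior of Properties.load(InputStream), which reads ISO-8859-1 and accepts Unicode escapes. It loads the same value once from a raw A9 byte and once from a pure-ASCII escape, which is the variant that cannot be damaged by any ASCII-compatible re-encoding along the way.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical demo class; only java.util.Properties behavior is being shown.
final class PropertiesEncodingDemo {

    /** Loads the given file bytes with Properties.load(InputStream) and returns one value. */
    static String loadValue(byte[] fileBytes, String key) {
        Properties props = new Properties();
        try {
            props.load(new ByteArrayInputStream(fileBytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        // Variant 1: the copyright sign stored as the single ISO-8859-1 byte A9.
        byte[] rawLatin1 = "copyright=\u00A9 2014 Example, Inc.\n"
                .getBytes(StandardCharsets.ISO_8859_1);
        // Variant 2: the same character written as a six-character ASCII escape in the file.
        byte[] escapedAscii = "copyright=\\u00A9 2014 Example, Inc.\n"
                .getBytes(StandardCharsets.US_ASCII);

        String fromRaw = loadValue(rawLatin1, "copyright");
        String fromEscape = loadValue(escapedAscii, "copyright");

        System.out.println(fromRaw.equals(fromEscape)); // true
        System.out.println((int) fromRaw.charAt(0));    // 169, i.e. U+00A9
    }
}
```

Both variants are valid under the specification cited above; the escaped form is simply more robust when other tools touch the file between the source tree and the running application.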
Re: resource encoding troubles
use *.utf8.properties

François Meillet
Wicket Training - Wicket Development

On 28 August 2014 at 17:47, Garret Wilson gar...@globalmentor.com wrote: I have Wicket 7.0.0-M2 running on embedded Jetty, which is correctly returning a content type of UTF-8 for my Wicket page: ...
Re: resource encoding troubles
So are you saying that Wicket does not support ISO-8859-1 properties files that adhere to the Java standard? Or are you saying, "I don't know what the problem is, I'm just giving you a workaround"? If so, I appreciate the workaround tip, but that still doesn't explain what the problem is.

I'm the sort of person who doesn't like to "wave my hands", as we say. I like to find the source of the problem. My configuration, as far as I can tell, is correct. Moreover, it is technically more correct than the *.utf8.properties approach, as my approach follows the standard. In fact my approach should be the default.

So does anyone know why my configuration does not work? What am I doing wrong?

Sincerely,
Garret

On 8/28/2014 10:18 AM, Francois Meillet wrote: use *.utf8.properties
Re: resource encoding troubles
http://wicket.apache.org/guide/guide/

François Meillet
Wicket Training - Wicket Development

On 28 August 2014 at 19:24, Garret Wilson gar...@globalmentor.com wrote: So are you saying that Wicket does not support ISO-8859-1 properties files that adhere to the Java standard? ...
Re: resource encoding troubles
Exactly! Quoting from the page you provided: Java uses the standard character set ISO 8859-11 to encode text files like properties files. ... (Note that this is a typo above---the author meant to say ISO 8859-1, not ISO 8859-11. The link to http://en.wikipedia.org/wiki/ISO/IEC_8859-1 in the text is correct, however.) So according to that description, my FooterPanel.properties file is expected to be encoded in ISO-8859-1. And indeed it is, as I have repeatedly explained. So I ask again: what is wrong with my current configuration?

Garret
Re: resource encoding troubles
Have you tried using the Unicode escape directly? i.e.: copyright=\u00A9 2014 Example, Inc. If you don't want to use Unicode escapes, you should use an XML file as the bundle file.
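The two alternatives suggested here can be tried outside Wicket with plain java.util.Properties. The following is only a sketch under that assumption: the class name EscapedPropertiesDemo and its helper methods are made up for illustration, and it does not show how Wicket itself locates or loads bundle files.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical demo class; not part of Wicket.
class EscapedPropertiesDemo {

    // Parses .properties bytes. Properties.load(InputStream) reads the bytes
    // as ISO-8859-1 and expands Unicode escapes while parsing.
    static String loadCopyright(byte[] propertiesBytes) {
        Properties props = new Properties();
        try {
            props.load(new ByteArrayInputStream(propertiesBytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("copyright");
    }

    // Parses the XML properties format; the encoding is taken from the XML declaration.
    static String loadCopyrightFromXml(byte[] xmlBytes) {
        Properties props = new Properties();
        try {
            props.loadFromXML(new ByteArrayInputStream(xmlBytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("copyright");
    }

    // A small UTF-8 encoded XML properties document containing the copyright sign.
    static byte[] sampleXmlBytes() {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<!DOCTYPE properties SYSTEM \"http://java.sun.com/dtd/properties.dtd\">\n"
                + "<properties>\n"
                + "  <entry key=\"copyright\">\u00A9 2014 Example, Inc.</entry>\n"
                + "</properties>\n";
        return xml.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Pure ASCII file content: the value starts with a backslash followed by u00A9.
        byte[] escaped = "copyright=\\u00A9 2014 Example, Inc.\n"
                .getBytes(StandardCharsets.US_ASCII);
        System.out.println((int) loadCopyright(escaped).charAt(0));                 // 169
        System.out.println((int) loadCopyrightFromXml(sampleXmlBytes()).charAt(0)); // 169
    }
}
```

Both variants avoid the question of how the file's bytes are interpreted: the escaped form is pure ASCII, and the XML form declares its own encoding.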
Re: resource encoding troubles
Hi Garret,

Garret Wilson wrote:

Exactly! Quoting from the page you provided: Java uses the standard character set ISO 8859-11 to encode text files like properties files. ... (Note that this is a typo above---the author meant to say ISO 8859-1, not ISO 8859-11. The link to http://en.wikipedia.org/wiki/ISO/IEC_8859-1 in the text is correct, however.) So according to that description, my FooterPanel.properties file is expected to be encoded in ISO-8859-1. And indeed it is, as I have repeatedly explained. So I ask again: what is wrong with my current configuration?

If I read your original post correctly, you have not used ISO-8859-1 encoding in your property file, as I clearly see a (C) symbol. As others already pointed out, you should use the \u...-encoded (c) symbol. However, as Francois stated, you can use a property file called FooterPanel.utf8.properties, in which you can use UTF-8 encoding and nicely write all your labels and/or translations. I don't know about Wicket 7, but in Wicket 6 this works like a charm, wherever the .utf8.properties stuff comes from.

Cheers,
Stefan

-- on behalf of eFonds Solutions AG, +49-89-579494-3417
Re: resource encoding troubles
I appreciate all the workarounds suggested. But no one has addressed the core issue: Is this a Wicket bug, or am I using standard property files incorrectly?

Garret
Re: resource encoding troubles
On 8/28/2014 10:53 AM, Stefan Renz wrote:

... if I read your original post correctly, you have not used ISO-8859-1 encoding in your property file, as I clearly see a (C) symbol.

Since when is © (U+00A9) not part of ISO-8859-1? http://en.wikipedia.org/wiki/ISO/IEC_8859-1

Garret
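The point about U+00A9 can be checked with a few lines of standard Java. This is a minimal sketch with a made-up class name (CopyrightBytes): it prints the bytes of © in both encodings, and what happens when the single ISO-8859-1 byte is misread as UTF-8.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class for inspecting the encoded forms of U+00A9.
class CopyrightBytes {

    // Formats bytes as space-separated upper-case hex, e.g. "C2 A9".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Decodes ISO-8859-1 bytes as if they were UTF-8 (the suspected mistake).
    static String misreadAsUtf8(byte[] latin1Bytes) {
        return new String(latin1Bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String copyright = "\u00A9";
        byte[] latin1 = copyright.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(hex(latin1));                                       // A9
        System.out.println(hex(copyright.getBytes(StandardCharsets.UTF_8)));   // C2 A9

        // A lone A9 is not valid UTF-8, so the decoder substitutes U+FFFD.
        String misread = misreadAsUtf8(latin1);
        System.out.println(misread.equals("\uFFFD"));                          // true
        System.out.println(hex(misread.getBytes(StandardCharsets.UTF_8)));     // EF BF BD
    }
}
```

So © is a perfectly valid ISO-8859-1 character (the single byte A9), and a � in the rendered page is consistent with that byte being decoded as UTF-8 somewhere along the way.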
Re: resource encoding troubles
It's just an encoding conflict: your properties file uses ISO-8859-1, your page UTF-8. The result is bad rendering, as you can see. When the Java designers decided to adopt ISO-8859-1, they didn't consider most of the Asian languages...

PS: just as personal advice, try to be less rude in your answers ;)
Re: resource encoding troubles
Hi Garret,

"I like to find the source of the problem."

Me too :).

"My configuration, as far as I can tell, is correct."

From what you've written, I'd agree. You should create a quickstart. This will easily allow us to find a possible bug.

Regards
Sven
Re: resource encoding troubles
On 8/28/2014 11:14 AM, Andrea Del Bene wrote:

It's just an encoding conflict: your properties uses ISO-8859-1, your page UTF-8. The result is a bad rendering, as you can see. When Java designers decided to adopt ISO-8859-1 they didn't consider most of the Asian languages... PS: just as a personal advice, try to be less rude in your answers ;)

Andrea, I'm sorry, I'll really try. My answers were probably terse (short and to the point), and you probably sense a frustration on my part with the lack of basic understanding in the software development world of the fundamentals of software encoding.

For example, your answer seems to assume that some function simply loads two sets of bytes and merges them together. That's not what happens at all. (Or at least I hope that's not what happens---it would indicate that the coder had no idea how to approach the task.) In fact there are two layers to the encoding stack: the byte-level processing and the character-level processing.

The Java Properties class should correctly take the bytes in the properties file and do the ISO 8859-1 decoding, producing a character stream to be parsed. This is already implemented in Java, and has been for well over a decade, I believe. Similarly, an XML processor will take the bytes in an XML file and transform them based upon the encoding (in this case, UTF-8) and produce a stream of characters. All XML processors are required to be able to perform this transformation, and have been for well over a decade.

Now that both input sources produce data at the character level, the original byte-level encoding is irrelevant. At the character level, there is no encoding conflict, because there is no encoding. (There exists the in-memory encoding used by the JVM, but that's irrelevant to the discussion and will certainly be the same for all strings used.) Thus the two input streams can be mixed together without worry of encoding.

If this is not what happens within Wicket, there is a software bug---but not an encoding conflict. I recommend you start by reading http://www.joelonsoftware.com/articles/Unicode.html . If you have any specific questions, I'll be happy to answer them. I apologize again for being brusque, but I'll do my best to explain things if others honestly have questions.

Garret
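The two-layer argument above can be illustrated with a small sketch. It assumes nothing about Wicket's internals; the class CharacterLevelMerge and its merge method are invented here purely to show that text decoded from different byte encodings combines cleanly once both are Strings.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration of byte-level versus character-level processing.
class CharacterLevelMerge {

    // Byte level: decode each input with its own charset.
    // Character level: concatenate the resulting Strings.
    // Byte level again: encode the merged text once, in the output charset.
    static byte[] merge(byte[] utf8Markup, byte[] latin1Message) {
        String markup = new String(utf8Markup, StandardCharsets.UTF_8);
        String message = new String(latin1Message, StandardCharsets.ISO_8859_1);
        String page = markup + message;
        return page.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] utf8Markup = {(byte) 0xC2, (byte) 0xA9}; // the copyright sign as UTF-8
        byte[] latin1Message = {(byte) 0xA9};           // the copyright sign as ISO-8859-1
        byte[] output = merge(utf8Markup, latin1Message);
        // Both copies of the sign come out as valid UTF-8: C2 A9 C2 A9.
        System.out.println(Arrays.equals(output,
                new byte[] {(byte) 0xC2, (byte) 0xA9, (byte) 0xC2, (byte) 0xA9})); // true
    }
}
```

A replacement character can only appear if one of the decoding steps uses the wrong charset for its input, which is a bug in that step and not a conflict between the two files.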
Re: resource encoding troubles
On 8/28/2014 12:08 PM, Sven Meier wrote: ... My configuration, as far as I can tell, is correct. From what you've written, I'd agree. You should create a quickstart. This will easily allow us to find a possible bug. Better than that, I'd like to trace down the bug, fix it, and file a patch. But currently I'm blocked from working with Wicket on Eclipse https://issues.apache.org/jira/browse/WICKET-5649. Garret
Re: resource encoding troubles
I created a Wicket quickstart (from http://wicket.apache.org/start/quickstart.html) [this is Wicket 6.16.0] and made two simple changes:

1) I created a HomePage.properties file, encoded as ISO-8859-1, with a single line as per the example above:

copyright=© 2014 Example, Inc.

2) I added a line to the HomePage.html file as per the example above:

<p><small><wicket:message key="copyright">© Example</wicket:message></small></p>

The content is served as UTF-8 and the copyright symbol is rendered correctly on the page. It doesn't look like the problem is in Wicket (at least not in 6.16). I guess your next steps would be to verify that you get the same results and, assuming that you do, start removing things from your page that has the problem until you find an element that is causing the problem.

Thanks
Andrew
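When comparing a working quickstart with a failing project, it can help to confirm what the properties file's bytes actually decode to, independent of Wicket. The sketch below is only one way to do that; PropertiesEncodingCheck and codePointsOf are hypothetical names, and the byte arrays stand in for the contents of a real file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical diagnostic; the byte arrays stand in for a real .properties file.
class PropertiesEncodingCheck {

    // Parses raw .properties bytes with Properties.load(InputStream), which
    // decodes them as ISO-8859-1, and lists the code points of one value.
    static String codePointsOf(byte[] fileBytes, String key) {
        Properties props = new Properties();
        try {
            props.load(new ByteArrayInputStream(fileBytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        StringBuilder sb = new StringBuilder();
        props.getProperty(key).codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // "c=" followed by the single byte A9: the file saved as ISO-8859-1.
        byte[] latin1File = {'c', '=', (byte) 0xA9};
        // "c=" followed by C2 A9: the same text saved as UTF-8 by mistake.
        byte[] utf8File = {'c', '=', (byte) 0xC2, (byte) 0xA9};

        System.out.println(codePointsOf(latin1File, "c")); // U+00A9
        System.out.println(codePointsOf(utf8File, "c"));   // U+00C2 U+00A9
    }
}
```

If the file on disk yields U+00A9 here but the rendered page still shows �, the bytes are presumably being altered or decoded differently somewhere between the file and the parser in the failing setup.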