Re: Character encoding problems using jsp:include with jsp:param in Tomcat 8.5 only.
Thorsten,

On 11/27/18 04:48, Thorsten Schöning wrote:
> Hello Christopher Schultz, on Monday, 26 November 2018 at 16:07 you wrote:
>
>> web.xml
>> ----
>> <request-character-encoding>UTF-8</request-character-encoding>
>
> Tested that with Tomcat 9 and this setting fixed my problem the same as using SetCharacterEncodingFilter. It doesn't work in Tomcat 8.5, I guess because that simply doesn't implement Servlet 4.0?

Correct. Tomcat 8.0 and 8.5 implement Servlet 3.1. In Tomcat 8.x, you'll need to use the SetCharacterEncodingFilter.

> Because I still need to support Tomcat 7 and 8.0 for some time, I'll keep SetCharacterEncodingFilter for now and just document the better solution. Thanks!

Sounds good. The SetCharacterEncodingFilter should be entirely forward-compatible.

-chris

To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org
Re: Character encoding problems using jsp:include with jsp:param in Tomcat 8.5 only.
Hello Christopher Schultz, on Monday, 26 November 2018 at 16:07 you wrote:

> web.xml
> ----
> <request-character-encoding>UTF-8</request-character-encoding>

Tested that with Tomcat 9 and this setting fixed my problem the same as using SetCharacterEncodingFilter. It doesn't work in Tomcat 8.5, I guess because that simply doesn't implement Servlet 4.0?

Because I still need to support Tomcat 7 and 8.0 for some time, I'll keep SetCharacterEncodingFilter for now and just document the better solution. Thanks!

P.S.: I sent you a private mail some days ago, unrelated to Tomcat. Did you get that? Just want to make sure that I'm not spam filtered.

Kind regards,

Thorsten Schöning
E-Mail: thorsten.schoen...@am-soft.de
AM-SoFT IT-Systeme, http://www.AM-SoFT.de/
Re: Character encoding problems using jsp:include with jsp:param in Tomcat 8.5 only.
Thorsten,

On 11/26/18 08:45, Thorsten Schöning wrote:
> Hi all,
>
> I'm currently testing migration of a legacy web app from Tomcat 7 to 8 to 8.5 and ran into problems regarding character encoding in 8.5 only. That app uses JSP pages and declares all of those to be stored in UTF-8, does really do so :-), and declares an HTTP Content-Type of "text/html; charset=UTF-8" as well. Textual content at HTML-level is properly encoded using UTF-8 and displays properly in the browser etc.
>
> In Tomcat 8.5 the following is introducing encoding problems, though:
>
>> <jsp:include page="/WEB-INF/jsp/includes/search.jsp">
>>   <jsp:param name="chooseSearchInputTitle" value="Benutzer wählen" />
>> </jsp:include>
>
> "search.jsp" simply outputs the value of the param as the "title" attribute of some HTML link, and the character "ä" is replaced somewhere with the Unicode REPLACEMENT CHARACTER (U+FFFD). But really only in Tomcat 8.5, not in 8 and not in 7.

Have you been able to determine if the problem is on input or output?

> I can fix that problem using either "SetCharacterEncodingFilter" or the following line, which simply results in the same I guess:
>
>> <% request.setCharacterEncoding("UTF-8"); %>

FYI the SetCharacterEncodingFilter only modifies request encoding and not response encoding. Also, it only changes the encoding of the request *body* (e.g. PUT/POST), and not the encoding used to decode the URI. That's configured in <Connector>'s URIEncoding. There is also useBodyEncodingForURI, which inherits the request body's encoding if it's present. I recommend using useBodyEncodingForURI="true".

I recommend *always* using SetCharacterEncodingFilter, since web browsers both habitually refuse to send a correct Content-Type and often use UTF-8 in URLs in violation of the HTTP spec. The result is essentially that everything works the way you *want* it to work, except that you just have to "hope" it works instead of being able to prove that it will.
> Looking at the generated Java code for the JSP I get the following:
>
>> org.apache.jasper.runtime.JspRuntimeLibrary.include(request, response,
>>     "/WEB-INF/jsp/includes/search.jsp" + "?"
>>     + org.apache.jasper.runtime.JspRuntimeLibrary.URLEncode("chooseSearchInputTitle", request.getCharacterEncoding())
>>     + "="
>>     + org.apache.jasper.runtime.JspRuntimeLibrary.URLEncode("Benutzer wählen", request.getCharacterEncoding()),
>>     out, false);
>
> The "ä" is properly encoded using UTF-8 in all versions of Tomcat and the generated code seems to be the same in all versions as well, especially regarding "request.getCharacterEncoding()".
>
> "getCharacterEncoding" in Tomcat 8.5 has changed; the former implementation didn't take the context into account:
>
>> @Override
>> public String getCharacterEncoding() {
>>     String characterEncoding = coyoteRequest.getCharacterEncoding();
>>     if (characterEncoding != null) {
>>         return characterEncoding;
>>     }
>>
>>     Context context = getContext();
>>     if (context != null) {
>>         return context.getRequestCharacterEncoding();
>>     }
>>
>>     return null;
>> }

This is just a fall-back for when there is no character encoding defined in the request (because the browser didn't send one).

> My connector in server.xml is configured to use "URIEncoding" as UTF-8 in all versions of Tomcat, but that doesn't make a difference to 8.5. So I understand that using "setCharacterEncoding", I set the value actually used in the generated Java now, even though the following is documented for the character encoding filter:
>
>> Note that the encoding for GET requests is not set here, but on a Connector
>
> https://tomcat.apache.org/tomcat-8.5-doc/config/filter.html#Set_Character_Encoding_Filter/Introduction
>
> Now I'm wondering about multiple things...
>
> 1. Doesn't "getCharacterEncoding" provide the encoding of the HTTP body?

Yes, but it comes directly from the browser, which often doesn't provide it.
There is no encoding-detection going on, so it's often "null" or ISO-8859-1, which is the spec-defined default.

> My JSP is called using GET and the Java quoted above seems to build a query string as well. So why does it depend on some body encoding instead of e.g. URIEncoding of the connector?

Good question. Might be a bug, here.

> 2. Is my former approach wrong or did changes in Tomcat 8.5 introduce some regression? There is some conversion somewhere which was not present in the past.

Tomcat 8.5 follows the servlet spec, which in v4.0 added <request-character-encoding> to make things even more fun. Actually, this can replace the use of the SetCharacterEncodingFilter. Thanks for pointing this out; I wasn't aware of this feature of the 4.0 spec.

> 3. What is the correct fix I need now? The character encoding filter, even though it only applies to bodies per documentation?

Try setting <request-character-encoding> in your web.xml like this:

web.xml
----
<request-character-encoding>UTF-8</request-character-encoding>

-chris
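For readers of the archive, here is a minimal sketch of how a lone "ä" can turn into U+FFFD in general: the character is written as bytes in one charset and read back in another. It is not a claim about what Tomcat 8.5 does internally; the class and method names (CharsetMismatchDemo, reinterpret) are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetMismatchDemo {

    /** Encodes text to bytes with one charset and decodes those bytes with another. */
    static String reinterpret(String text, Charset writtenAs, Charset readAs) {
        return new String(text.getBytes(writtenAs), readAs);
    }

    public static void main(String[] args) {
        // a-umlaut, written as an escape so the source file encoding does not matter
        String umlaut = "\u00E4";

        // Written as ISO-8859-1 (single byte 0xE4), read as UTF-8: the byte is malformed UTF-8
        String broken = reinterpret(umlaut, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);

        // Written and read with the same charset: round-trips unchanged
        String intact = reinterpret(umlaut, StandardCharsets.UTF_8, StandardCharsets.UTF_8);

        System.out.println("mismatch yields U+FFFD: " + broken.equals("\uFFFD"));
        System.out.println("matching charsets round-trip: " + intact.equals(umlaut));
    }
}
```

If the encoding side and the decoding side of an include disagree like this, pinning the request encoding to UTF-8 (filter or web.xml) makes both sides agree, which is consistent with the fixes reported in the thread.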
Re: Character encoding issue in URL
Justin,

On 1/25/17 12:25 AM, Justin Dang wrote:
> Hi, I have a clean install of an older version of Tomcat (8.0.24). I have noticed when a character is encoded in the URL, Tomcat fails to return the URL requested. I've noted this same request performed in IIS works fine.
>
> Apache Tomcat tests:
>
> Works (No escape) - http://localhost:8080/examples/delme/íj.pdf
> Works (URL encoded) - http://localhost:8080/examples/delme/%C3%ADj.pdf
> Failed (Char encoded) - http://localhost:8080/examples/delme/%EDj.pdf

You are mistaken. While you might think that %ED would map to U+00ED, it does not. You need to use %C3%AD as you have done in your second example, because the standard is UTF-8 and not UTF-16 or anything like that. %ED is not a valid character in UTF-8.

https://en.wikipedia.org/wiki/UTF-8#Description

> IIS tests:
>
> Works (No escape) - http://localhost:8080/examples/delme/íj.pdf
> Works (URL encoded) - http://localhost:8080/examples/delme/%C3%ADj.pdf
> Works (Char encoded) - http://localhost:8080/examples/delme/%EDj.pdf
>
> I've reviewed this wiki page:
>
> https://wiki.apache.org/tomcat/FAQ/CharacterEncoding
>
> And it seems to imply that I shouldn't have to do anything, and the URL request should return properly.
>
> So my question is, what do I need to configure in Apache Tomcat to handle the character encoding request like IIS does?

Only by violating UTF-8.
-chris
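The byte-level difference between %C3%AD and %ED can be checked with a few lines of plain Java. This is only a sketch; the class name and helper methods (Utf8ByteCheck, hex, isValidUtf8) are hypothetical. It shows that "í" is the two bytes C3 AD in UTF-8, that the single byte ED is what ISO-8859-1 would use, and that ED followed by "j" is rejected by a strict UTF-8 decoder.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8ByteCheck {

    /** Formats bytes as upper-case hex pairs separated by spaces. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    /** Returns true only if the bytes form a well-formed UTF-8 sequence. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String iAcute = "\u00ED"; // i with acute accent
        System.out.println("UTF-8 bytes:      " + hex(iAcute.getBytes(StandardCharsets.UTF_8)));
        System.out.println("ISO-8859-1 bytes: " + hex(iAcute.getBytes(StandardCharsets.ISO_8859_1)));
        System.out.println("C3 AD 'j' valid:  " + isValidUtf8(new byte[] {(byte) 0xC3, (byte) 0xAD, (byte) 'j'}));
        System.out.println("ED 'j' valid:     " + isValidUtf8(new byte[] {(byte) 0xED, (byte) 'j'}));
    }
}
```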
Re: Character encoding issues
James,

On 8/24/16 3:46 PM, James H. H. Lampert wrote:
> On 8/24/16, 12:36 PM, Mark Thomas wrote:
>
>> At a guess, something in the web application is using the platform default encoding rather than an explicit encoding. Given that the Linux box is OK, it looks like the app should be explicitly using UTF-8 everywhere.
>
> Based on a response I got on the Midrange Java List, and on what I'd found since I entered the query, I would agree.
>
> What's the best way to accomplish this?

http://wiki.apache.org/tomcat/FAQ/CharacterEncoding#Q8

If you also want to change the default platform encoding, you'll have to do the following (or equivalent on your E4A):

1. Edit CATALINA_BASE/bin/setenv.sh (or create it)
2. Add this line:
   export CATALINA_OPTS="$CATALINA_OPTS -Dfile.encoding=UTF-8"

The above should change the values on the E4A's JVM so that everything reports UTF-8 and nothing reports ISO-8859-1. The "UnicodeBig" is the internal encoding that the JVM uses to represent the Java primitive "char" type and will probably only work with the value you've got there... don't try to mess with that! :)

FWIW, the system properties of each JVM are somewhat interesting, but probably won't help you debug anything. Changing them might not even fix anything.

Every HTTP request/response is defined to have a character encoding:

1. As specified in the Content-Type header
2. To be ISO-8859-1 as the default in case no header exists

"Most" client software these days actually defaults to sending requests without a Content-Type character encoding and instead just uses "whatever character encoding the server sent this page to me using" as the character encoding. That's usually not the case with back-end software, which is likely what your app is.

Presumably, your application uses IMAP or similar to contact the gmail server? In that case, HTTP isn't in use and it's possible that the system properties defining the system character encoding are in use.
It all depends upon how the software works under the hood. If HTTP is in use, here, then the problem exists in some component not following the spec. Tomcat isn't part of the problem, there.

If some other protocol is in use, it's entirely possible that the default "platform" encoding (as defined by system properties) is being used. The only solution there would be to change the file.encoding property as I've described above.

Let us know how it goes.

-chris
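As an illustration of "explicit encoding rather than platform default", here is a small sketch in plain Java. The class and method names (ExplicitCharsetDemo, encodeUtf8, decodeUtf8) are hypothetical and not taken from the application discussed in the thread. Charset.forName is used instead of StandardCharsets so that the sketch would also compile on a Java 6 JVM like the one mentioned for the E4A.

```java
import java.nio.charset.Charset;

public class ExplicitCharsetDemo {

    // Naming the charset explicitly makes the result independent of -Dfile.encoding
    private static final Charset UTF8 = Charset.forName("UTF-8");

    /** Encodes text to bytes using UTF-8, never the platform default. */
    static byte[] encodeUtf8(String text) {
        return text.getBytes(UTF8);
    }

    /** Decodes bytes as UTF-8, never the platform default. */
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, UTF8);
    }

    public static void main(String[] args) {
        // What the JVM would use wherever code omits the charset argument
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset());

        String greeting = "Gr\u00FC\u00DFe"; // German word with u-umlaut and sharp s
        byte[] wire = encodeUtf8(greeting);
        System.out.println("round trip ok   = " + decodeUtf8(wire).equals(greeting));
    }
}
```

Calls such as text.getBytes() or new String(bytes) without a charset argument are the usual places where the platform default sneaks in, so those are the first things to look for in application code.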
Re: Character encoding issues
On 8/24/16, 12:36 PM, Mark Thomas wrote:
> At a guess, something in the web application is using the platform default encoding rather than an explicit encoding. Given that the Linux box is OK, it looks like the app should be explicitly using UTF-8 everywhere.

Based on a response I got on the Midrange Java List, and on what I'd found since I entered the query, I would agree.

What's the best way to accomplish this?

--
JHHL
Re: Character encoding issues
On 24/08/2016 17:43, James H. H. Lampert wrote:
> Ladies and Gentlemen of the Tomcat and Midrange-Java communities:
>
> We're having a weird problem with character encoding in a Tomcat webapp.
>
> We've added an interface to GMail to our webapp, and we've got, just for our own development, testing, and production use, instances of that webapp running in three different Tomcat 7 servers: one running on an IBM Midrange box (an E4A, running V6R1, with Tomcat running on a Java 6 JVM), one running on a Windows box, and the third running on a Linux (CentOS) box.
>
> On the Midrange box, the traffic between us and GMail is getting garbled (Chinese characters appear, seemingly at random), with an apparent character encoding conflict. On the Linux box, it isn't. Not sure about the Windows box.
>
> Now, on the Midrange box, it's a fairly straightforward process to look at the Java System Properties for a JVM. For the JVM Tomcat is running in, "Initial Java System Properties" shows
>> file.encoding 'ISO8859_1'
> and "Current Java System Properties" shows
>> os.encoding 'ISO8859-1'
>> sun.jnu.encoding 'ISO8859-1'
>> sun.io.unicode.encoding 'UnicodeBig'
>> ibm.system.encoding 'ISO8859-1'
>> file.encoding 'ISO8859_1'
>
> I found JConsole and JVisualVM on the Linux box, and while I couldn't find system properties in JConsole, I could in JVisualVM. I have:
>> file.encoding=UTF-8
>> sun.jnu.encoding=UTF-8
>> sun.io.unicode.encoding=UnicodeLittle
>
> Can somebody enlighten me on whether this is the cause of the encoding issue with Google, and what to do about it?

At a guess, something in the web application is using the platform default encoding rather than an explicit encoding. Given that the Linux box is OK, it looks like the app should be explicitly using UTF-8 everywhere.

Mark
Re: Character encoding question
The Microsoft characters are encoded in CP-1252 (http://en.wikipedia.org/wiki/Windows-1252).

However, if the problem is the database-driven content, then you also need to consider the encoding that the DB is using. For example, MySQL might by default use latin1 (ISO-8859-1), so your data might be corrupted before it is even seen by Tomcat.

So be careful: you might have to go deeper than just the Tomcat encoding.

BTW, in our application we solved the same problem by catching the data before it is saved to the DB, converting any CP-1252 characters into reasonable latin-1 characters. It is not the perfect solution, but it is enough for our clients.

Good luck,
--
Dave Cherkassky
VP of Software Development
DJiNN Software Inc.
416.504.1354

On 27/08/2010 1:23 PM, laredotornado wrote:
> Hi,
>
> I'm using Tomcat 6.0.26. I'm noticing that when our JSP pages are served, we frequently have ?s where apostrophes should be. We think this is because the database-driven content contains the Microsoft style apostrophe.
>
> My question is, if I adjust the character encoding on Tomcat, will it serve the MS character instead of a question mark? I read the default encoding is ISO-8859-1, which I thought would include this mystery character, but apparently it doesn't. Do you know what encoding I should use and where I should set it?
>
> Thanks, - Dave
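A rough sketch of the "convert CP-1252 punctuation to latin-1 equivalents before saving" idea follows. This is not the code from Dave's application; the class name, the method names, and the particular set of replaced characters are assumptions chosen for illustration. It also shows why the "?" appears: ISO-8859-1 has no typographic apostrophe, so Java substitutes "?" when encoding it.

```java
import java.nio.charset.StandardCharsets;

public class SmartPunctuationDemo {

    /** Replaces common Windows-1252 "smart" punctuation with plain equivalents (illustrative subset). */
    static String toLatin1Friendly(String text) {
        return text
                .replace('\u2018', '\'')   // left single quotation mark
                .replace('\u2019', '\'')   // right single quotation mark (the "Microsoft apostrophe")
                .replace('\u201C', '"')    // left double quotation mark
                .replace('\u201D', '"')    // right double quotation mark
                .replace('\u2013', '-')    // en dash
                .replace('\u2014', '-')    // em dash
                .replace("\u2026", "..."); // horizontal ellipsis
    }

    /** Simulates a text passing through an ISO-8859-1 channel. */
    static String throughLatin1(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String fromWord = "it\u2019s"; // "its" with a typographic apostrophe in the middle
        System.out.println(throughLatin1(fromWord));                   // apostrophe is lost
        System.out.println(throughLatin1(toLatin1Friendly(fromWord))); // apostrophe survives as ASCII
    }
}
```

The alternative, as the thread suggests, is to make every hop (database, JDBC connection, response) use an encoding that can represent these characters, such as UTF-8, so that no substitution is needed.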
Re: Character encoding question
On 27/08/2010 18:23, laredotornado wrote:
> Hi,
>
> I'm using Tomcat 6.0.26. I'm noticing that when our JSP pages are served, we frequently have ?s where apostrophes should be. We think this is because the database-driven content contains the Microsoft style apostrophe.

[wince]

> My question is, if I adjust the character encoding on Tomcat, will it serve the MS character instead of a question mark? I read the default encoding is ISO-8859-1, which I thought would include this mystery character, but apparently it doesn't. Do you know what encoding I should use and where I should set it?

Depends. What encoding does the DB use? What kind of DB is it?

p
Re: Character encoding for POST x-www-form-urlencoding (a success story)
Very nice work. Thank you for sharing.

On Fri, Feb 12, 2010 at 11:23 PM, Christopher Schultz ch...@christopherschultz.net wrote:
> All,
>
> My company recently decided to alter our password complexity requirements for our webapp, and I got to implement the changes. What fun!
>
> We use a regular expression to enforce our password complexity, and it needed to be changed. Since we are starting to branch out into populations that aren't necessarily using written English everywhere, I chose to change our naive [a-z]- and [A-Z]-type checking to a more enlightened \p{Ll} and \p{Lu}, respectively. (Readers' note: jakarta-oro does not support this notation, so you'll want to use Java's built-in regular expression support to do this.)
>
> Anyhow, when making changes to things security-related, it pays to test /everything/, so I grabbed 4 other people from my group and had them each test 15 sample passwords against our 6 different forms that accept password-change entry. Everything went fine.
>
> Except when I then tried to log in from our home page with the password 1πππππππ (that's a '1' digit followed by 7 Greek Pi characters, in case your email reader can't render that), and I got a failure. I figured I must have fat-fingered something, so I tried again and all was well. My spidey-sense tingling, I logged out and repeated the process: again, my first login attempt was unsuccessful, while the second was successful.
>
> Hmm.
>
> Upon closer inspection, our opening page is a static HTML file served by Apache httpd: no Tomcat involvement. After a failed login, a page that looks exactly like the home page is sent to the user, but it's different: /and/ it's served by Tomcat.
>
> The difference was that the original request's response (for /index.html) had a Content-Type of "text/html", while the failed login had a response Content-Type of "text/html; charset=UTF-8".
>
> It's our old pal "what's the default encoding, again?" coming back to haunt me, and here I am telling people on this list that they just don't understand the history of the web and how to do things properly. Evidently, I wasn't doing them properly, either.
>
> All those complaints about the way that URL-encoded GET parameters can get messed up based upon Content-Type and encoding guesses, etc., and the advice that the solution is to "just use POST", are, well, only half the truth. Yes, POST gets you away from the browser's preference for what encoding to use before URL-encoding the bytes, but with POST the Content-Type is application/x-www-form-urlencoded, which means there's no charset associated with it. :(
>
> So, what's to be done? Well, I immediately thought of two solutions:
>
> <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
>
> and
>
> <form accept-charset="UTF-8">
>
> Knowing that web browsers are notoriously inconsistent with one another regarding certain things, I was sure that I'd have a giant mess when it came to testing, and that I'd have to figure out how to trick each version of each browser into doing my bidding.
>
> First, I had to make sure that they all /failed/ in the same way (that is to say, that the login failed the way I expected it to fail), then I had to see what magical incantations would be necessary to actually get the login to succeed.
>
> I'm happy to report that, for /all/ of the following browsers, */both/* solutions worked!
>
> Mozilla Firefox 2.0
> Mozilla Firefox 3.0
> Mozilla Firefox 3.5
> Mozilla Firefox 3.6
> Opera 9.6
> Opera 10.10
> Apple Safari 3.2
> Apple Safari 4.0
> Google Chrome 4.0
> MSIE 6.0
> MSIE 7.0
> MSIE 8.0
>
> I'm inclined to use the <form accept-charset="UTF-8"> solution, because that does not involve lying to the browser about the encoding of the actual HTML document. Instead, I'd rather advertise that I will only accept UTF-8 encoding and leave it at that.
>
> Sadly, the client still doesn't tell me that the underlying encoding being used to urlencode the POST parameters is UTF-8, but at least they're doing what I want them to do, and they all agree on behavior!
>
> So, score 1 for standards, at least in this instance.
>
> -chris

--
Sincerely yours and Best Regards,
Xie Xiaodong
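The switch from [a-z]/[A-Z] to \p{Ll}/\p{Lu} described above can be sketched with java.util.regex. The class and method names here (PasswordClassDemo, hasAsciiLower, and so on) are hypothetical, and the real password policy in the original message is not shown in the thread; this only demonstrates how the two character classes differ on a password made of a digit and Greek letters.

```java
import java.util.regex.Pattern;

public class PasswordClassDemo {

    private static final Pattern ASCII_LOWER = Pattern.compile("[a-z]");
    private static final Pattern UNICODE_LOWER = Pattern.compile("\\p{Ll}");
    private static final Pattern UNICODE_UPPER = Pattern.compile("\\p{Lu}");

    /** True if the text contains an ASCII lower-case letter. */
    static boolean hasAsciiLower(String text) {
        return ASCII_LOWER.matcher(text).find();
    }

    /** True if the text contains any Unicode lower-case letter (general category Ll). */
    static boolean hasUnicodeLower(String text) {
        return UNICODE_LOWER.matcher(text).find();
    }

    /** True if the text contains any Unicode upper-case letter (general category Lu). */
    static boolean hasUnicodeUpper(String text) {
        return UNICODE_UPPER.matcher(text).find();
    }

    public static void main(String[] args) {
        // Digit one followed by seven Greek small letter pi, written with escapes
        String password = "1\u03C0\u03C0\u03C0\u03C0\u03C0\u03C0\u03C0";

        System.out.println("[a-z] finds a lower-case letter:   " + hasAsciiLower(password));
        System.out.println("\\p{Ll} finds a lower-case letter: " + hasUnicodeLower(password));
        System.out.println("\\p{Lu} finds an upper-case letter: " + hasUnicodeUpper(password));
    }
}
```

None of this helps, of course, if the form parameters are decoded with the wrong charset before the check runs, which is the actual problem the message goes on to describe.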
Re: Character encoding
Chris, I finally found it. My server.xml was not correctly configured. My fault.

Again, thank you all for your help.

----- Original Message -----
From: Christopher Schultz
To: Tomcat Users List users@tomcat.apache.org
Sent: Wednesday, June 18, 2008 11:12:45 PM
Subject: Re: Character encoding

> nch,
>
> nch wrote:
> | You say:
> | Tomcat does not use any environment variables. The only settings that affect the interpretation of the URI are the URIEncoding and useBody... settings on the Connector. Are you using more than one connector? Are you using Apache httpd out in front of Tomcat?
> |
> | Perhaps the JVM does and so tomcat read them indirectly through it??
>
> You can read the code for the connector. Those settings are the only relevant ones.
>
> -chris
Re: Character encoding
nch,

nch wrote:
| Chris, I finally found it.
| My server.xml was not correctly configured. My fault.
|
| Again, thank you all for your help.

No problem. Would you mind explaining for the group what the actual problem was, and what the solution was? Lots of these threads go nowhere because either the people asking questions go away entirely, or they say "works, now!" and nobody reading the archives has any clue where they should look (in spite of the repeated answers they get from folks like me).

Thanks,
-chris
Re: Character encoding
nch,

nch wrote:
| I'm having difficulties trying to decode URI parameters into UTF8. :(
| When I moved the application
| to linux (debian etch) I found out it was not working.

We run on Linux as well. TC 5.5.23, Java 1.5.0_11. We have configured the following:

1. Set URIEncoding=UTF-8 on our Connector (but /not/ useBodyEncoding)
2. Installed a filter similar to the one you mentioned
3. Output encoding on every page is set to UTF-8

This appears to work for us (we tried several Greek characters and they went into our database and came back out correctly).

Try removing the useBodyEncoding setting.

-chris
Re: Character encoding
Thanks, Christopher. This doesn't work either.

I removed the useBodyEncoding property, as you suggested, from the Connector element, but the URI parameter coming in the request is still being decoded into ISO-8859-1 instead of UTF-8.

Pages are displaying correctly; I use pageEncoding="UTF-8" contentType="text/html;charset=utf-8" in every single page.

I also tried changing my system locale to es_ES.UTF-8 (it was en_US.UTF-8) by following http://people.debian.org/~schultmc/locales.html, but I can see no difference after restarting everything.

Remember, I'm having this problem in debian etch (works fine in windows xp).

Many thanks.

----- Original Message -----
From: Christopher Schultz
To: Tomcat Users List users@tomcat.apache.org
Sent: Wednesday, June 18, 2008 3:56:13 PM
Subject: Re: Character encoding

> Try removing the useBodyEncoding setting.
Re: Character encoding
nch wrote:
> Thanks, Christopher. This doesn't work either.

Could you give an example of such a UTF-8 encoded URI? (and tell us what it should be decoded to)

Thanks
Re: Character encoding
There it goes.

I have a form that has an input field named query. I type piraña and submit the form using the GET method. I can see the browser has encoded this parameter into the URI as query=pira%C3%B1a

I set a breakpoint in the filter, so when the request hits the filter I can see getCharacterEncoding() returns null. The filter sets it to UTF-8.

Then the request gets to the controller, where I can see the request parameter query is set to piraÃ±a. The controller tries to perform a text search using that query but, obviously, it doesn't return any results. I can manually modify it while debugging and set it to piraña, so the controller returns several results.

BTW, I'm running Tomcat 6.0.13 on Sun JDK 1.6.0_06.

Kind regards.

----- Original Message -----
From: André Warnier
To: Tomcat Users List users@tomcat.apache.org
Sent: Wednesday, June 18, 2008 4:29:54 PM
Subject: Re: Character encoding

> Could you give an example of such a UTF-8 encoded URI? (and tell us what it should be decoded to)
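The symptom described above can be reproduced outside Tomcat with java.net.URLDecoder. This is only a sketch of the decoding step, with a hypothetical class and helper name (QueryDecodingDemo, decode); it does not show how Tomcat's Connector decodes the query string internally. Decoding pira%C3%B1a as UTF-8 gives the intended word, while decoding it as ISO-8859-1 turns the two bytes C3 B1 into two separate characters.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class QueryDecodingDemo {

    /** Percent-decodes a query value using the named charset. */
    static String decode(String rawValue, String charsetName) {
        try {
            return URLDecoder.decode(rawValue, charsetName);
        } catch (UnsupportedEncodingException e) {
            // Only reachable if the charset name is unknown to this JVM
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        String raw = "pira%C3%B1a"; // what the browser sent in the thread

        String asUtf8 = decode(raw, "UTF-8");
        String asLatin1 = decode(raw, "ISO-8859-1");

        System.out.println("UTF-8 decoding has length      " + asUtf8.length());
        System.out.println("ISO-8859-1 decoding has length " + asLatin1.length());
        System.out.println("UTF-8 result is the n-tilde word: " + asUtf8.equals("pira\u00F1a"));
    }
}
```

In the thread the resolution was a server.xml correction, which fits: for GET parameters it is the Connector's URIEncoding, not the filter, that decides which of these two decodings happens.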
Re: Character encoding
----- Original Message -----
From: André Warnier

> Could you give an example of such a UTF-8 encoded URI? (and tell us what it should be decoded to) Thanks

Andre, have a look here... it's not URL encoding, that's something different. It's about being able to store Japanese, and typically trying to match it all with the DB's encoding.

Here's someone's explanation of encoding history and why UTF-8 is a good thing...
http://www.joelonsoftware.com/articles/Unicode.html

And here is a typical solution on TC's wiki:
http://wiki.apache.org/tomcat/Tomcat/UTF-8

And in the real world it gets hectic ;) Like in Netbeans, if you don't put this in the options:

-Dfile.encoding=UTF-8

you're not seeing Japanese in your editor... and it won't save the files as UTF-8. Then you think cool... until you find out you can stand on your head and a property file will not encode in UTF-8... Then you may have some lib that converts back to ASCII and you can't figure it out... It's a headache... but necessary...

Java actually does a pretty good job of things in String just by default, but if you look at all the options you're going to find the whole encoding thing going on there as well. Then you try just the <%@ page contentType %> directive and it's perfect; next project, for some unknown reason, you've got to do the old-fashioned meta tag in the web pages as well. Fun stuff ;)
Re: Character encoding
----- Original Message -----
From: nch
To: Tomcat Users List users@tomcat.apache.org
Sent: Wednesday, June 18, 2008 5:09 PM
Subject: Re: Character encoding

> I type piraña and submit the form using the GET method. I can see the browser has encoded this parameter into the URI as query=pira%C3%B1a
> Then the request gets to the controller, where I can see the request parameter query is set to piraÃ±a.

nch, I think the HTML page doesn't know its charset... it doesn't look like it's encoded. Have a look at this article... they're doing almost what you're doing:
http://java.sun.com/developer/technicalArticles/Intl/HTTPCharset/

I think your software is reading the request back as UTF-8, but the page that went out to the browser is not telling the browser that the form must come back encoded. It looks just like normal URL encoding; there is no UTF-8 in there... I think.

Good luck...
Re: Character encoding
- Original Message - From: Johnny Kewl [EMAIL PROTECTED] To: Tomcat Users List users@tomcat.apache.org Sent: Wednesday, June 18, 2008 6:26 PM Subject: Re: Character encoding

> - Original Message - From: nch [EMAIL PROTECTED] To: Tomcat Users List users@tomcat.apache.org Sent: Wednesday, June 18, 2008 5:09 PM Subject: Re: Character encoding
>
> > There it goes. I have a form that has an input field named "query". I type "piraña" and submit the form using the GET method. I can see the browser has encoded this parameter into the URI as query=pira%C3%B1a
> > I set a breakpoint in the filter, so when the request hits the filter I can see that getCharacterEncoding() returns null. The filter sets it to UTF-8. Then the request gets to the controller, where I can see the request parameter "query" is set to "piraÃ±a". The controller tries to perform a text search using that query but, obviously, it doesn't return any results. I can manually modify it while debugging and set it to "piraña", so the controller returns several results.
> > BTW, I'm running Tomcat 6.0.13 on Sun JDK 1.6.0_06.
> > Kind regards.
>
> nch, I think the HTML page doesn't know its charset... it doesn't look like it's encoded. Have a look at this article, they are doing almost what you are doing:
> http://java.sun.com/developer/technicalArticles/Intl/HTTPCharset/
> I think your software is bringing the request back UTF-8 encoded, but the page that went out to the browser is not telling the browser that the form must come back encoded. It looks just like normal URL encoding; there is no UTF-8 in there... I think.

nch, I checked it... I was wrong, the browser is returning the right thing, that is, UTF-8. But that display of "piraÃ±a" is still ISO, i.e. ISO trying to display the UTF-8. So it's being read wrong in the server... sorry.

If the IDE is not set up for UTF-8, then the display is right; NB just can't show it to you until it can also read UTF-8. Maybe it's just your eyes that are broken, and TC is working ;) Send it back to the browser... it will probably be right, in which case it's the IDE ;)

Good luck ;)
Re: Character encoding
nch,

nch wrote:
| This doesn't work either. :(
| I removed the useBodyEncoding property, as you suggested, from the Connector element, but the URI parameter coming in the request is still being decoded into ISO-8859-1 instead of UTF-8.

How do you know that ISO-8859-1 is being used to decode it?

| Pages are displaying correctly, I use pageEncoding="UTF-8" contentType="text/html;charset=utf-8" in every single page. I also tried changing my system locale into es_ES.UTF-8 (it was en_US.UTF-8) by following http://people.debian.org/~schultmc/locales.html , but I can see no difference after restarting everything. Remember, I'm having this problem in debian etch (works fine in windows xp).

We don't bother explicitly setting the JVM's locale or anything like that. The standard environment for my production system shows file.encoding=UTF-8 with no additional configuration. That should not affect the interpretation of URI parameters, though.

Are you sure that your configuration is being read properly? That everything is spelled correctly? That you are actually putting server.xml in the right place and that TC is properly reading it?

I'm asking because the change you made should definitely have worked.

-chris
Re: Character encoding
nch,

nch wrote:
| I have a form that has an input field named "query". I type "piraña" and submit the form using the GET method. I can see the browser has encoded this parameter into the URI as query=pira%C3%B1a

Is this a correct UTF-8 encoding of the parameter? I don't have my unicode conversion chart handy right now.

| I set a breakpoint

Stop right there. If you are executing TC through a debugger, are you sure that it is using its standard server.xml configuration?

| into the filter so when the request hits the filter I can see getCharacterEncoding() returns null. The filter sets it to UTF-8.

FYI, this has no bearing on the interpretation of the URI.

| Then the request gets to the controller where I can see the request parameter "query" is set to "piraÃ±a".

Just in case it doesn't go through email very well, I see "pir" followed by an A with a tilde over it, followed by a +/- symbol, followed by an "a". Definitely not right. Is that what you'd expect if you improperly interpreted the UTF-8, URL-encoded "piraña" as if it were ISO-8859-1?

-chris
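Both questions in the message above can be checked with a few lines of plain Java. This is only a sketch using `java.net.URLDecoder` from the JDK, not Tomcat's own decoding code; the class and method names are invented for the example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical helper for experimenting with the value from the thread.
final class UriDecodeDemo {

    /** Percent-decodes raw using the named charset. */
    static String decode(String raw, String charsetName) {
        try {
            return URLDecoder.decode(raw, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        String raw = "pira%C3%B1a";
        // As UTF-8, the bytes C3 B1 form the single character U+00F1.
        System.out.println(decode(raw, "UTF-8").equals("pira\u00f1a"));
        // As ISO-8859-1, the same bytes are read as U+00C3 followed by U+00B1,
        // that is, an A with a tilde and then a plus-minus sign.
        System.out.println(decode(raw, "ISO-8859-1").equals("pira\u00c3\u00b1a"));
    }
}
```

Both lines print `true`, which suggests that `pira%C3%B1a` is the correct UTF-8 form and that the garbled value is what an ISO-8859-1 reading of it looks like.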
Re: Character encoding
More info on this:

- I do remote debugging through Eclipse to both tomcat on windows (same machine as eclipse, though) and tomcat on debian.
- I open a debugging port on tomcat by setting CATALINA_OPTS=-Xmx1024m -Xdebug -Xnoagent -Djava.compiler=NONE -Xrunjdwp:transport=dt_socket,address=4501,server=y,suspend=n
- When I send "piraña" it is always encoded into the URL as "pira%C3%B1a", whether running tomcat on windows, debian or even running my app in Jetty.
- When I send "piraña", if I'm debugging tomcat on windows I can read "piraña".
- If tomcat is running on debian, I read "piraÃ±a".
- If I type "piraña" on http://www.us-webmasters.com/Decode-URLs/ and switch browser encoding display between ISO-8859-1 and UTF-8, I can see that when ISO-8859-1, then it displays "piraÃ±a", when UTF-8, it displays "piraña".
- When I run/debug my app on Jetty I get "piraña" (I've read on the web that Jetty decodes to UTF-8 by default).
- Something could be wrong in my debian environment. How can I find out which env. variables tomcat is using?
- If I try to manually decode the returned parameter in my controller by using URLDecoder.decode(query, "UTF-8") then I can see no difference. That is, when debugging the tomcat on windows the result is "piraña" while debugging the one on debian the result is "piraÃ±a".
- Is URLDecoder#decode environment dependent?

Hope this is useful. Lots of thanks to you all.
Re: Character encoding
nch,

nch wrote:
| - I do remote debugging through Eclipse to both tomcat on windows (same machine as eclipse, though) and tomcat on debian.

Okay, remote debugging should not affect the server, but I'm still wondering if the server.xml you think you are using is the one actually being used. Try setting the Connector port to something crazy like 12345 and restarting. If you can still contact the server, then you are either editing the wrong server.xml (there should only be one!) or your changes are not being picked up.

| - When I send "piraña" it is always encoded into the URL as "pira%C3%B1a", whether running tomcat on windows, debian or even running my app in Jetty.

That's because your browser is encoding it, not the server. So, it doesn't depend on the server configuration (except possibly for the page encoding, which often directs the browser to use utf-8 URI encoding).

| - If I type "piraña" on http://www.us-webmasters.com/Decode-URLs/ and switch browser encoding display between ISO-8859-1 and UTF-8, I can see that when ISO-8859-1, then it displays "piraÃ±a", when UTF-8, it displays "piraña".

I'm not sure what you think you're doing, there. When I paste that word into the box to decode, I get broken output. There is no indication as to what encoding the server expects for URIs.

Switching browser interpretation of the resulting page does not seem to prove anything. The server never advertises any encoding to use, so the browser just chooses whatever it wants. My browser chooses ISO-8859-1. When I switch it to UTF-8, I see the expected interpretation. I'm not sure what I just learned.

| - Something could be wrong in my debian environment. How can I find out which env. variables tomcat is using?

Tomcat does not use any environment variables. The only settings that affect the interpretation of the URI are the URIEncoding and useBody... settings on the Connector. Are you using more than one connector? Are you using Apache httpd out in front of Tomcat?

| - If I try to manually decode the returned parameter in my controller by using URLDecoder.decode(query, "UTF-8") then I can see no difference. That is, when debugging the tomcat on windows the result is "piraña" while debugging the one on debian the result is "piraÃ±a".

So, running this:

URLDecoder.decode(URLEncoder.encode("piraña", "UTF-8"), "UTF-8");

...gives you "piraÃ±a" on your debian system? That doesn't seem right.

| - Is URLDecoder#decode environment dependent?

Nope. As long as you always provide the encoding to be used, you should be fine.

-chris
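The round trip suggested above, and the reason a second `URLDecoder.decode` call on the controller's value changes nothing, can be tried in isolation. This is a sketch with invented names; the `repair` method is an extra assumption that goes beyond the thread: it is a common workaround for a value that was already decoded as ISO-8859-1, not something proposed in the messages.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for the experiment.
final class RoundTripDemo {

    /** Encodes and decodes with the same explicit charset (UTF-8). */
    static String roundTrip(String text) {
        try {
            return URLDecoder.decode(URLEncoder.encode(text, "UTF-8"), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    /** Re-reads a string that was wrongly decoded as ISO-8859-1 as UTF-8. */
    static String repair(String misdecoded) {
        byte[] originalBytes = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String good = "pira\u00f1a";          // the intended word
        String garbled = "pira\u00c3\u00b1a"; // what the controller saw on debian

        // With an explicit charset the round trip returns its input unchanged.
        System.out.println(roundTrip(good).equals(good));
        // An already-decoded value comes back unchanged as well, garbled or not,
        // so this test cannot reveal how the container decoded the URI.
        System.out.println(roundTrip(garbled).equals(garbled));
        // Recovering the original bytes and reading them as UTF-8 fixes the value.
        System.out.println(repair(garbled).equals(good));
    }
}
```

All three lines print `true`. The repair trick only works because ISO-8859-1 maps every byte to a character, so no information was lost in the wrong decoding.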
Re: Character encoding
Chris, thanks for your help. Please see my comments below.
Kind regards.

- Original Message From: Christopher Schultz [EMAIL PROTECTED] To: Tomcat Users List users@tomcat.apache.org Sent: Wednesday, June 18, 2008 9:42:21 PM Subject: Re: Character encoding

| | - I do remote debugging through Eclipse to both tomcat on windows (same machine as eclipse, though) and tomcat on debian.
| Okay, remote debugging should not affect the server, but I'm still wondering if the server.xml you think you are using is the one actually being used. Try setting the Connector port to something crazy like 12345 and restarting. If you can still contact the server, then you are either editing the wrong server.xml (there should only be one!) or your changes are not being picked up.

I'll try.

| | - When I send "piraña" it is always encoded into the URL as "pira%C3%B1a", whether running tomcat on windows, debian or even running my app in Jetty.
| That's because your browser is encoding it, not the server. So, it doesn't depend on the server configuration (except possibly for the page encoding, which often directs the browser to use utf-8 URI encoding).

But if the URL is always encoded in the same way, and tomcat does not receive any other information on what the resulting character encoding should be, why do I get different values from tomcat?

| | - If I type "piraña" on http://www.us-webmasters.com/Decode-URLs/ and switch browser encoding display between ISO-8859-1 and UTF-8, I can see that when ISO-8859-1, then it displays "piraÃ±a", when UTF-8, it displays "piraña".
| I'm not sure what you think you're doing, there. When I paste that word into the box to decode, I get broken output. There is no indication as to what encoding the server expects for URIs.
| Switching browser interpretation of the resulting page does not seem to prove anything. The server never advertises any encoding to use, so the browser just chooses whatever it wants. My browser chooses ISO-8859-1. When I switch it to UTF-8, I see the expected interpretation. I'm not sure what I just learned.

If we take a look at this page's source code we can see the following line:

<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-1">

I assume that this site expects ISO-8859-1 from the browser and so it decodes it into ISO-8859-1. In the case of "piraña" it decodes to "piraÃ±a", which is the same as what tomcat gives to my controller, even though I'm explicitly telling it to decode to UTF-8.

| | - Something could be wrong in my debian environment. How can I find out which env. variables tomcat is using?
| Tomcat does not use any environment variables. The only settings that affect the interpretation of the URI are the URIEncoding and useBody... settings on the Connector. Are you using more than one connector? Are you using Apache httpd out in front of Tomcat?

Ah, I forgot to mention. I do have an apache httpd in front of tomcat, but for testing purposes I'm directly accessing tomcat through port 8080. Anyway, it yields the same results whether directly accessing tomcat or through httpd.

So, if tomcat doesn't read env. variables, why would debian packagers try to set LANG to the system default in their tomcat init script? Does that make sense? BTW, the instance of tomcat I'm running on debian was manually downloaded from tomcat.apache.org

| | - If I try to manually decode the returned parameter in my controller by using URLDecoder.decode(query, "UTF-8") then I can see no difference. That is, when debugging the tomcat on windows the result is "piraña" while debugging the one on debian the result is "piraÃ±a".
| So, running this:
| URLDecoder.decode(URLEncoder.encode("piraña", "UTF-8"), "UTF-8");
| ...gives you "piraÃ±a" on your debian system? That doesn't seem right.

I realise this test is crap :-) because I'm passing URLEncoder.encode an already decoded parameter. I'm tired ... I'll try to get the raw url parameter.

| | - Is URLDecoder#decode environment dependent?
| Nope. As long as you always provide the encoding to be used, you should be fine.
| -chris
Re: Character encoding
You say:

> Tomcat does not use any environment variables. The only settings that affect the interpretation of the URI are the URIEncoding and useBody... settings on the Connector. Are you using more than one connector? Are you using Apache httpd out in front of Tomcat?

Perhaps the JVM does, and so tomcat reads them indirectly through it??

Cheers
Re: Character encoding
nch,

nch wrote:
| But if the URL is always encoded in the same way, and tomcat does not receive any other information on what the resulting character encoding should be, why do I get different values from tomcat?

Because the servers are configured differently (probably in some very small way). The problem is that the HTTP spec is ... hazy when it comes to how URIs should be interpreted. The spec says that most servers expect ISO-8859-1, but many clients are (rightfully so, IMO) switching to UTF-8. This leaves us developers in a limbo where we have to beat our servers into submission and cross our fingers when decoding URIs.

| | Tomcat does not use any environment variables. The only settings that affect the interpretation of the URI are the URIEncoding and useBody... settings on the Connector. Are you using more than one connector? Are you using Apache httpd out in front of Tomcat?
|
| Ah, I forgot to mention. I do have an apache httpd in front of tomcat, but for testing purposes I'm directly accessing tomcat through port 8080. Anyway, it yields the same results whether directly accessing tomcat or through httpd.

If you have multiple Connectors (one for AJP and one for HTTP), are you setting URIEncoding="utf-8" on both of them, or only one of them? It would help if you posted your entire server.xml.

| So, if tomcat doesn't read env. variables, why would debian packagers try to set LANG to the system default in their tomcat init script?

Probably to make it more consistent with the rest of the packages they support. They want you to be able to set LANG=foo and have it change everything for all services.

| Does that make sense?

I think it /does/ make sense, but it often confuses the issue when you're dealing with someone who is NOT using, say, debian.

Note that there are no external factors for URI decoding. The only setting that can change it is the URIEncoding attribute of the Connector. It does not fall back to the system Locale's preferred encoding or file.encoding or anything weird like that. It /always/ falls back to ISO-8859-1, regardless of any other settings.

| BTW, the instance of tomcat I'm running on debian was manually downloaded from tomcat.apache.org

The only reason it would be an issue is if the configuration was not what you expected it to be (for instance, the server.xml you are editing is not the one that TC is actually using).

-chris
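The fall-back rule described above can be modelled in a few lines. This is a toy decoder written for illustration only, not Tomcat's implementation: the class name is invented, and the `uriEncoding` parameter merely stands in for the Connector's URIEncoding attribute (null meaning "not configured").

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical model of "explicit setting, else ISO-8859-1".
final class FallbackUriDecoder {

    /**
     * Turns percent-escapes back into raw bytes, then decodes those bytes
     * with the configured charset, or ISO-8859-1 when none is configured.
     * The JVM default charset and file.encoding are never consulted.
     * Malformed hex digits raise NumberFormatException in this sketch.
     */
    static String decode(String raw, Charset uriEncoding) {
        Charset charset = (uriEncoding != null) ? uriEncoding : StandardCharsets.ISO_8859_1;
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        int i = 0;
        while (i < raw.length()) {
            char c = raw.charAt(i);
            if (c == '%' && i + 2 < raw.length()) {
                // Two hex digits after '%' form one byte.
                bytes.write(Integer.parseInt(raw.substring(i + 1, i + 3), 16));
                i += 3;
            } else if (c == '+') {
                bytes.write(' '); // form encoding uses '+' for a space
                i++;
            } else {
                bytes.write(c);   // raw URIs are expected to be plain ASCII here
                i++;
            }
        }
        return new String(bytes.toByteArray(), charset);
    }

    public static void main(String[] args) {
        // Not configured: the garbled value from the debian machine.
        System.out.println(decode("pira%C3%B1a", null).equals("pira\u00c3\u00b1a"));
        // Configured as UTF-8: the intended word.
        System.out.println(decode("pira%C3%B1a", StandardCharsets.UTF_8).equals("pira\u00f1a"));
    }
}
```

Under this model, two machines that give different results for the same URL differ in the configured value, which matches the advice to check which server.xml is really being read.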
Re: Character encoding
nch,

nch wrote:
| You say:
| | Tomcat does not use any environment variables. The only settings that affect the interpretation of the URI are the URIEncoding and useBody... settings on the Connector. Are you using more than one connector? Are you using Apache httpd out in front of Tomcat?
|
| Perhaps the JVM does, and so tomcat reads them indirectly through it??

You can read the code for the connector. Those settings are the only relevant ones.

-chris
Re: [OT] Re: Character encoding
lightbulb,

lightbulb432 wrote:
| | POST requests always use the request's body encoding, which is specified in the HTTP header (and can be overridden by using request.setCharacterEncoding). Some broken clients don't provide the character encoding of the request, which makes things difficult sometimes.
|
| What determines what's specified in the HTTP header for the value of the encoding?

Well... it's a bit of a chicken-in-an-egg scenario, since the encoding specified in the header must match the encoding actually used in the request. So, you could either decide that the header should match the content or the content should match the header.

| Is it purely up to the user agent, or can Tomcat provide hints based on previous requests how to encode it - or is it something up to the end user to set in their browser (in IE, View - Encoding)?

Typically, the default encoding used by the user-agent will be locale-specific. For instance, most browsers in the US will use ISO-8859-1 as the default encoding, or maybe WINDOWS-1252 if you're unlucky. Ideally, the server should be able to accept all reasonable encodings.

The Accept-Charset header sent by the user-agent to the server indicates the acceptable encodings that should be returned, rated by acceptability. For instance, my en_US Mozilla Firefox on Windows sends this Accept-Charset string to servers:

Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7

This indicates that the browser would prefer ISO-8859-1 encoding, but will also accept UTF-8 as a second choice, but that anything will do ('*') if those two are unavailable.

On HTML form elements, you may override the encoding used to send the data:

<form accept-charset="UTF-8">

The HTML 4 specification says this about the accept-charset attribute:

"The default value for this attribute is the reserved string UNKNOWN. User agents may interpret this value as the character encoding that was used to transmit the document containing this FORM element."
(http://www.w3.org/TR/html4/interact/forms.html#h-17.3)

So, if the server sends a document using UTF-8, it is polite for the user-agent to use that same encoding to respond to the server if the server hasn't indicated any preference by using the accept-charset form attribute.

| In what cases would you call request.setCharacterEncoding to override the value specified by the user agent?

You should only do this when the user-agent does not declare the charset being used in the body of the request through the Content-Type request header. You should also only do this when you are relatively confident that the user-agent is sending the data in the overridden character set.

For instance, if you suspect that most browsers adhere to the W3C's recommendation above that an UNKNOWN accept-charset implies that the browser should respond to the server with the same charset as used in the previous server response (got all that?), and you always use the same charset to send pages (say, UTF-8), then it is reasonable to override any unspecified Content-Type encoding with the charset you use to send pages (UTF-8, in this case).

The HTTP specification has this to say about missing charsets (in Content-Type headers):

"The charset parameter is used with some media types to define the character set (section 3.4) of the data. When no explicit charset parameter is provided by the sender, media subtypes of the text type are defined to have a default charset value of ISO-8859-1 when received via HTTP. Data in character sets other than ISO-8859-1 or its subsets MUST be labeled with an appropriate charset value. See section 3.4.1 for compatibility problems."
(http://www.ietf.org/rfc/rfc2616.txt Section 3.7.1)

Basically, this says that a missing charset within a Content-Type header means that the request should be interpreted as being encoded using ISO-8859-1 encoding. Pretty simple.

| Shouldn't you trust the user agent rather than trying to guess? (Or is this only used in cases where the user agent is broken, like you said - but then how would you know you're dealing with a broken client to begin with...aah, complicated!)

You should /always/ respect the charset sent by the client. In fact, the HTTP spec says so:

"HTTP/1.1 recipients MUST respect the charset label provided by the sender;"
(http://www.ietf.org/rfc/rfc2616.txt Section 3.4.1)

If the client sends the wrong charset, it's their fault that their data will get all screwed up. But, if there's no charset, then you should provide your own. The default charset should be ISO-8859-1. I think Tomcat uses the default encoding of the JVM if no charset is provided, which is a problem for folks who set the JVM encoding to UTF-8 for i18n purposes... because then the default becomes UTF-8 which is incorrect.

Fortunately, UTF-8 and ISO-8859-1 are compatible for most common lower ASCII characters. This has led to a lot of folks thinking that they have their servers configured
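The "respect the charset label, otherwise default to ISO-8859-1" rule discussed in this message can be sketched as a small header parser. The class and method names are invented, the parsing is deliberately simplistic, and applying the default to every media type follows the reading given in the message rather than the letter of RFC 2616, which states it for text subtypes.

```java
import java.util.Locale;

// Hypothetical helper, not part of the Servlet API or Tomcat.
final class ContentTypeCharset {

    static final String DEFAULT_CHARSET = "ISO-8859-1";

    /** Returns the charset parameter of a Content-Type value, or the default. */
    static String charsetOf(String contentType) {
        if (contentType == null) {
            return DEFAULT_CHARSET;
        }
        String[] parts = contentType.split(";");
        // parts[0] is the media type itself; parameters follow.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = param.substring("charset=".length()).trim();
                // The value may be a quoted string.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                if (!value.isEmpty()) {
                    return value; // the sender's label wins
                }
            }
        }
        return DEFAULT_CHARSET;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=utf-8"));          // utf-8
        System.out.println(charsetOf("application/x-www-form-urlencoded")); // ISO-8859-1
    }
}
```

In a servlet, the second case is where a filter might reasonably call `request.setCharacterEncoding` with the charset the application uses for its own pages, as the message argues.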
[OT] Re: Character encoding
That was a really great set of answers, thanks! These follow-ups are somewhat off-topic to Tomcat, but you really know this stuff well so I hope you don't mind addressing them:

> POST requests always use the request's body encoding, which is specified in the HTTP header (and can be overridden by using request.setCharacterEncoding). Some broken clients don't provide the character encoding of the request, which makes things difficult sometimes.

What determines what's specified in the HTTP header for the value of the encoding? Is it purely up to the user agent, or can Tomcat provide hints based on previous requests how to encode it - or is it something up to the end user to set in their browser (in IE, View - Encoding)?

In what cases would you call request.setCharacterEncoding to override the value specified by the user agent? Shouldn't you trust the user agent rather than trying to guess? (Or is this only used in cases where the user agent is broken, like you said - but then how would you know you're dealing with a broken client to begin with...aah, complicated!)

> You shouldn't have to worry about cookie encoding, since you can always call request.getCookies() and get them correctly interpreted for you.

What do you mean by this? Does it mean (pardon the surely messed up use of the API below) that in your response.addCookie(), you add a cookie where the value has cookie.setValue(new String(charByteArray, "UTF-8")), then you read it back using responseCookie.getValue().getBytes("UTF-8")? (Where UTF-8 is whatever encoding you're using internally in your application.)

Finally, what's the default encoding used by the response when response.setCharacterEncoding(myEncoding) isn't called? Am I correct to assume that if that default is not the default Java String encoding of UTF-16, then you MUST convert all the Strings you've outputted to that encoding? (...because the HTTP header expects whatever the default is, but Java is outputting UTF-16 encoded text to the actual response bytes)

Am I speaking rubbish here, or am I thinking about these concepts in the right way? Thanks a lot.

P.S. How did you learn all of that?!

Christopher Schultz-2 wrote:

> Lightbulb,
>
> lightbulb432 wrote:
> > Why is the URIEncoding attribute specified on the connector rather than on a host, for example?
>
> Because the host doesn't handle connections... the connectors do.
>
> > Does this mean that the number of virtual hosts that can listen on the same port on the same box are limited by whether they all use the same encodings in their URIs?
>
> Yes, all virtual hosts listening on the same port will have to have the same encoding. Fortunately, UTF-8 works for all languages that I know of.
>
> > Now that I think about it, wouldn't it be at the context level, not even at the host level?
>
> If you had a connector-per-context, yes, but that's not the case.
>
> > In Tomcat 6, should the useBodyEncodingForURI be used if not needing compatibility with 4.1, as the documentation mentions?
>
> I would highly recommend following that recommendation.
>
> > To see if I have things straight, is HttpServletRequest's get/setCharacterEncoding used for both the request parameters from a GET request AND the contents of the POST?
>
> No. GET requests have request parameters encoded as part of the URL, which is affected by the Connector's URIEncoding parameter. POST requests always use the request's body encoding, which is specified in the HTTP header (and can be overridden by using request.setCharacterEncoding). Some broken clients don't provide the character encoding of the request, which makes things difficult sometimes.
>
> > How are multipart POST requests dealt with?
>
> Typically, each part of a multipart request contains its own character encoding, so a multipart POST would follow the encoding for the part you're reading at the time.
>
> > And HttpServletResponse's get/setCharacterEncoding is used for the contents of the response header and the meta tags?
>
> Only for the header field, not META tags. If you want to emit META tags, you'll have to do them yourself.
>
> > Does it also encode the page content itself?
>
> Nope. If you change the character encoding for a response after the response has already had some data written to it, I think you'll send an incorrect header. For instance:
>
> response.setCharacterEncoding("ISO-8859-1");
> PrintWriter out = response.getWriter();
> response.setCharacterEncoding("Big5");
> out.print("abcdef");
> out.flush();
>
> Your client will not receive a sane response. Setting the character encoding only sets the HTTP response header and configures the response's Writer, if used, but only /before/ calling getWriter the first time.
>
> > What about the encoding of cookies for both incoming requests and outgoing responses?
>
> See the HTTP spec, section 4.2 (Message Headers). It references RFC 822 (ARPA Internet text messages) which does not actually specify a character encoding. From what I can see, low ASCII
Re: Character encoding
Lightbulb,

lightbulb432 wrote:
> Why is the URIEncoding attribute specified on the connector rather than on a host, for example?

Because the host doesn't handle connections... the connectors do.

> Does this mean that the number of virtual hosts that can listen on the same port on the same box are limited by whether they all use the same encodings in their URIs?

Yes, all virtual hosts listening on the same port will have to have the same encoding. Fortunately, UTF-8 works for all languages that I know of.

> Now that I think about it, wouldn't it be at the context level, not even at the host level?

If you had a connector-per-context, yes, but that's not the case.

> In Tomcat 6, should the useBodyEncodingForURI be used if not needing compatibility with 4.1, as the documentation mentions?

I would highly recommend following that recommendation.

> To see if I have things straight, is HttpServletRequest's get/setCharacterEncoding used for both the request parameters from a GET request AND the contents of the POST?

No. GET requests have request parameters encoded as part of the URL, which is affected by the Connector's URIEncoding parameter. POST requests always use the request's body encoding, which is specified in the HTTP header (and can be overridden by using request.setCharacterEncoding). Some broken clients don't provide the character encoding of the request, which makes things difficult sometimes.

> How are multipart POST requests dealt with?

Typically, each part of a multipart request contains its own character encoding, so a multipart POST would follow the encoding for the part you're reading at the time.

> And HttpServletResponse's get/setCharacterEncoding is used for the contents of the response header and the meta tags?

Only for the header field, not META tags. If you want to emit META tags, you'll have to do them yourself.

> Does it also encode the page content itself?

Nope. If you change the character encoding for a response after the response has already had some data written to it, I think you'll send an incorrect header. For instance:

    response.setCharacterEncoding("ISO-8859-1");
    PrintWriter out = response.getWriter();
    response.setCharacterEncoding("Big5");
    out.print("abcdef");
    out.flush();

Your client will not receive a sane response. Setting the character encoding only sets the HTTP response header and configures the response's Writer, if used, but only /before/ calling getWriter the first time.

> What about the encoding of cookies for both incoming requests and outgoing responses?

See the HTTP spec, section 4.2 (Message Headers). It references RFC 822 (ARPA Internet text messages) which does not actually specify a character encoding. From what I can see, low ASCII is the encoding used. You shouldn't have to worry about cookie encoding, since you can always call request.getCookies() and get them correctly interpreted for you.

-chris
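The point about the response's Writer being configured with an encoding can be illustrated outside a servlet container. The following is only a sketch: the class name WriterEncodingDemo is made up, and a ByteArrayOutputStream stands in for the response body, which is an assumption for illustration and not how Tomcat is implemented. It shows that a Java String carries no byte encoding of its own; the Writer converts characters to bytes using the charset it was created with.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class WriterEncodingDemo {
    // Writes text through a Writer bound to the given charset and returns the raw bytes.
    // The ByteArrayOutputStream is a hypothetical stand-in for the response body.
    static byte[] write(String text, Charset charset) {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(body, charset)) {
            out.write(text);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return body.toByteArray();
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // "café", written with an escape so the source file stays ASCII
        System.out.println(write(text, StandardCharsets.ISO_8859_1).length); // 4 bytes
        System.out.println(write(text, StandardCharsets.UTF_8).length);      // 5 bytes
    }
}
```

The same String produces different bytes depending on the Writer's charset, which is why the charset announced in the header and the one used by the Writer need to agree.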
Re: Character encoding
Hello Mark

> Mester József wrote:
>> Ok. Let's see my problem. I have a form with a text input box. I type Árvíztűrő tükörfúrógép and I get ÃrvÃztűrÅ tükörfúrógép

> I have tested this with the latest 5.5.x source and it works correctly (there haven't been any encoding related fixes since 5.5.20). Have you got the request dumper valve enabled? This causes all request parameters to be processed as ISO-8859-1 and enabling it was the only way I could replicate the behaviour you see.

I develop with NetBeans 5.5 and my servlet container is NetBeans' bundled Tomcat (5.5.17). I haven't changed anything in Tomcat's settings. What is the request dumper valve, and where can I set it?

> If you haven't got this valve enabled, check your application for filters, valves etc. that may read request parameters before your request.setCharacterEncoding("UTF-8") is called. Note parameters are only read once so if the encoding is wrong then you can't easily fix it.

There are no filters in my application.

Joe
Re: Character encoding
Mester József wrote:
> Hello Mark
> Ok. Let's see my problem. I have a form with a text input box. I type Árvíztűrő tükörfúrógép and I get ÃrvÃztűrÅ tükörfúrógép
> Beautiful isn't it?

I have tested this with the latest 5.5.x source and it works correctly (there haven't been any encoding related fixes since 5.5.20). Have you got the request dumper valve enabled? This causes all request parameters to be processed as ISO-8859-1 and enabling it was the only way I could replicate the behaviour you see.

If you haven't got this valve enabled, check your application for filters, valves etc. that may read request parameters before your request.setCharacterEncoding("UTF-8") is called. Note parameters are only read once so if the encoding is wrong then you can't easily fix it.

Mark

-
To start a new topic, e-mail: users@tomcat.apache.org
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
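The garbled output reported in this thread is what UTF-8 bytes look like when they are decoded as ISO-8859-1. Here is a minimal sketch of that mechanism using only the JDK; the class and method names (MojibakeDemo, misdecode, repair) are invented for illustration. The repair step only demonstrates that the bytes are still recoverable in isolation; it is not a suggested fix for the servlet, where the right approach is to set the encoding before the parameters are first read.

```java
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    // Simulates a form value sent as UTF-8 bytes but decoded as ISO-8859-1.
    static String misdecode(String original) {
        byte[] sent = original.getBytes(StandardCharsets.UTF_8);
        return new String(sent, StandardCharsets.ISO_8859_1);
    }

    // Reverses the mistake; this works because ISO-8859-1 maps every byte to exactly one char.
    static String repair(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "Árvíztűrő tükörfúrógép", written with escapes so the source file stays ASCII
        String typed = "\u00c1rv\u00edzt\u0171r\u0151 t\u00fck\u00f6rf\u00far\u00f3g\u00e9p";
        String garbled = misdecode(typed);
        System.out.println(garbled.length() > typed.length()); // true: each accented letter became two chars
        System.out.println(repair(garbled).equals(typed));      // true
    }
}
```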
Re: Character encoding
Hello Mark

> This is unlikely to help you and may be read-only on your JVM. You don't say what doesn't work but generally the following is required:
> set URIEncoding="UTF-8" on the connector
> set the correct response encoding on every response (you can do this per page or use a filter to do this for all pages)

Ok. Let's see my problem. I have a form with a text input box. I type Árvíztűrő tükörfúrógép and I get ÃrvÃztűrÅ tükörfúrógép

Beautiful isn't it? The page is:

    <%@page contentType="text/html"%>
    <%@page pageEncoding="UTF-8"%>
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
    <html>
    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    <title>Try encoding</title>
    </head>
    <body>
    <%
    try {
        request.setCharacterEncoding("UTF-8");
    } catch (Exception ex) {
        out.println("Bad something: " + ex.getMessage());
    }
    %>
    Hello <%=request.getParameter("nev")%> <br>
    <form accept-charset="UTF-8" action="index.jsp" method="POST">
    <input type="text" name="nev">
    <input type="submit" value="Send name">
    </form>
    </body>
    </html>

> If you use a database make sure that you persist your data in the correct encoding.

If my text comes from the database, everything is correct.

> If you convert from bytes to characters or characters to bytes make sure you use the correct encoding.

I don't.

Joe
Re: Character encoding
export CATALINA_OPTS=-Dfile.encoding=UTF-8

On 12/12/06, Mester József [EMAIL PROTECTED] wrote:
> Hi
> I have a problem with character encoding. I have found a page ( http://junlu.com/msg/1132.html ) and on this page there is an instruction:
> 2. In the Catalina.bat (windows) catalina.sh (linux) there must be a switch added to the call to java.exe. The switch is: -Dfile.encoding=UTF-8
> But I don't know where I can add this switch in catalina.sh. I use Tomcat 5.5.20 on Debian Sarge.
> Joe

--
Remember that at the moment of your birth everyone was joyful and you were in tears. Live so that at the moment of your death everyone is in tears and you are joyful.
Re: Character encoding, once again....
THANKS, that URIEncoding property of the HTTP connector was the source of the GET problems. I tried to remove the filter after that, but POST requests stopped working, so I've installed the filter back. Now both GET and POST are working. Hallelujah...

d.

Anyway its

On Mon, 14 Aug 2006 22:40:27 +0200, Mark Thomas [EMAIL PROTECTED] wrote:
> dizzi wrote:
>> I'm not sure if this is a problem of Tomcat, but I think that it's most probable.
> Unlikely. I haven't seen a valid bug in this area for quite some time. It is usually a combination of configuration (check the URIEncoding property of your connector) and application errors. For a correctly coded application, the content-encoding filter should be unnecessary. I'd start with a simple application like this one and build up to the form that is causing problems.
> http://marc.theaimsgroup.com/?l=tomcat-user&m=111548442910292&w=2
> Mark
Re: Character Encoding : Unix vs Windows
On 4/3/06, Nigel Blake [EMAIL PROTECTED] wrote:
> Problem: Creating a URL type with parameters that have a space between them causes an IOException in a JavaBean when called from Tomcat 5.0.0.27 on a Unix installation. Using the same bean and JSP code causes no problem when invoked on the same version of Tomcat on a Windows installation.
> Solutions tried:
> 1. Ensured that the server connector encoding is UTF-8 (suggested in the FAQ)
> 2. Have ensured that the JSP page directive is UTF-8
> 3. I could turn the bean into a servlet and try using setContentType or setCharacterEncoding. (I would rather not)
> Any suggestions that would make the Unix implementation work would be gratefully received. I have run out of ideas...

URLEncoder.encode(), URLEncoder.decode()
RE: Character Encoding : Unix vs Windows
java.net.URLEncoder.encode

-Original Message-
From: Nigel Blake [mailto:[EMAIL PROTECTED]
Sent: Monday, April 03, 2006 5:43 PM
To: users@tomcat.apache.org
Subject: Character Encoding : Unix vs Windows

Problem: Creating a URL type with parameters that have a space between them causes an IOException in a JavaBean when called from Tomcat 5.0.0.27 on a Unix installation. Using the same bean and JSP code causes no problem when invoked on the same version of Tomcat on a Windows installation.

Solutions tried:
1. Ensured that the server connector encoding is UTF-8 (suggested in the FAQ)
2. Have ensured that the JSP page directive is UTF-8
3. I could turn the bean into a servlet and try using setContentType or setCharacterEncoding. (I would rather not)

Any suggestions that would make the Unix implementation work would be gratefully received. I have run out of ideas...

Thanks
Nigel

Example code:

    URL birdSite = new URL("http://orientalbirdimages.org/search.php?keyword=black bittern");
    try {
        webPageStream = new BufferedReader(new InputStreamReader(birdSite.openStream()));
    } catch (MalformedURLException ne) {
        System.out.println("Malformed URL Error called from within getPageNumber() " + ne.toString());
    } catch (IOException ie) {
        System.out.println("IOException called from within getPageNumber " + ie.toString());
    }

The IOException is caught under Unix when the variable I pass to the URL query string has a query parameter of more than one word, as in 'black bittern' above.
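The one-line answer above can be sketched as follows, assuming the keyword is the only part of the URL that needs escaping. The class and method names (QueryEncodingDemo, searchUrl) are hypothetical; URLEncoder.encode(String, String) is the standard JDK call. Only the parameter value is encoded, not the whole URL, because encoding the whole string would also escape the ":", "/" and "?" characters.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class QueryEncodingDemo {
    // Builds the search URL from the thread, encoding only the query parameter value.
    static String searchUrl(String keyword) {
        try {
            return "http://orientalbirdimages.org/search.php?keyword="
                    + URLEncoder.encode(keyword, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java runtime is required to support UTF-8, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The space in the keyword becomes '+', so the URL no longer contains a raw space.
        System.out.println(searchUrl("black bittern"));
    }
}
```

The resulting string can then be passed to new URL(...) in place of the literal containing the space.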
RE: Character Encoding -ISo-8859-1 Vs UTF-8 Vs GBK
Please don't send me any more emails. I'm not a Tomcat user.

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Wednesday, October 19, 2005 04:20 a.m.
To: Tomcat Users List
Subject: RE: Character Encoding -ISo-8859-1 Vs UTF-8 Vs GBK

Notice: The information contained in this e-mail message and/or attachments to it may contain confidential or privileged information. If you are not the intended recipient, any dissemination, use, review, distribution, printing or copying of the information contained in this e-mail message and/or attachments to it are strictly prohibited. If you have received this communication in error, please notify us by reply e-mail or telephone and immediately and permanently delete the message and any attachments. Thank you
Re: Character Encoding -ISo-8859-1 Vs UTF-8 Vs GBK
Hi,

In Europe we have lots of languages. I don't think it's true that UTF-8 can handle European characters very well. There is a list on the net (I don't know where) with the other ISO encodings for other languages.

AF

Quoting David Delbecq [EMAIL PROTECTED]:
> Hi, UTF-8 can handle European and Chinese characters very well. If you can't read any of those using UTF-8, this simply means your text file is not saved in UTF-8.
>
> [EMAIL PROTECTED] wrote:
>> Hi, I am trying to read universal characters from a text file into my Java application, which stores them in a database. When I use encoding type GBK I can read all special characters in Chinese; when I use encoding ISO-8859-1 I can read Latin but not Chinese; but when I use UTF-8 as the encoding, I think I am supposed to read both Chinese and Latin correctly, yet I am not able to read any of them. Can anyone give me pointers to a solution? Further, the beta- is converted to ss in Latin-1.
>> Thanks in advance
>> Birendar S Waldiya
Re: Character Encoding -ISo-8859-1 Vs UTF-8 Vs GBK
"UTF-8 (8-bit Unicode Transformation Format) is a lossless, variable-length character encoding for Unicode created by Ken Thompson and Rob Pike. It uses groups of bytes to represent the Unicode standard for the alphabets of many of the world's languages. UTF-8 is especially useful for transmission over 8-bit Electronic Mail systems."
http://en.wikipedia.org/wiki/UTF-8

"In computing, Unicode provides an international standard which has the goal of providing the means to encode the text of every document people want to store on computers. This includes all scripts in active use today, many scripts known only by scholars, and symbols which do not strictly represent scripts, like mathematical, linguistic and APL symbols."
http://en.wikipedia.org/wiki/Unicode

[EMAIL PROTECTED] wrote:
> Hi, In Europe we have lots of languages. I don't think it's true that UTF-8 can handle ALL European characters very well. There is a list on the net (I don't know where) with the other ISO encodings for other languages.
> AF
>
> Quoting David Delbecq [EMAIL PROTECTED]:
>> Hi, UTF-8 can handle European and Chinese characters very well. If you can't read any of those using UTF-8, this simply means your text file is not saved in UTF-8.
>>
>> [EMAIL PROTECTED] wrote:
>>> Hi, I am trying to read universal characters from a text file into my Java application, which stores them in a database. When I use encoding type GBK I can read all special characters in Chinese; when I use encoding ISO-8859-1 I can read Latin but not Chinese; but when I use UTF-8 as the encoding, I think I am supposed to read both Chinese and Latin correctly, yet I am not able to read any of them. Can anyone give me pointers to a solution? Further, the beta- is converted to ss in Latin-1.
>>> Thanks in advance
>>> Birendar S Waldiya
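The claim that UTF-8 covers both European and Chinese text, while ISO-8859-1 does not, can be checked with a small round-trip test. This is a sketch with invented names (CharsetCoverageDemo, survives) and it only uses charsets that every Java runtime provides; GBK is left out because it is not guaranteed to be installed.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetCoverageDemo {
    // Returns true if the text is unchanged after being encoded to bytes and decoded again.
    // String.getBytes replaces characters the charset cannot represent (with '?' for ISO-8859-1),
    // so a failed round trip means the charset does not cover the text.
    static boolean survives(String text, Charset charset) {
        return new String(text.getBytes(charset), charset).equals(text);
    }

    public static void main(String[] args) {
        String german = "Gr\u00fc\u00dfe";   // "Grüße"
        String chinese = "\u4e2d\u6587";     // "中文"
        String mixed = german + " " + chinese;

        System.out.println(survives(mixed, StandardCharsets.UTF_8));       // true
        System.out.println(survives(german, StandardCharsets.ISO_8859_1)); // true
        System.out.println(survives(mixed, StandardCharsets.ISO_8859_1));  // false
    }
}
```

If UTF-8 text still reads incorrectly from a file, the likely cause is the one given in the thread: the file was not saved as UTF-8, or the reader was opened with a different charset.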
RE: Character Encoding -ISo-8859-1 Vs UTF-8 Vs GBK
Sorry, my mistake! I thought we were speaking about something else...

AF

Quoting Peter Crowther [EMAIL PROTECTED]:
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
>> I don't think it's true that UTF-8 can handle ALL european character very well.
> If it can't, the Unicode consortium (http://www.unicode.org/) will be pretty worried, as UTF-8 is an encoding of Unicode...
> - Peter