Re: POST request encoding - Tomcat/JVM configuration?

2009-10-26 Thread Christopher Schultz

Jan,

On 10/24/2009 5:58 AM, Pfeifer Jan wrote:
 String decoded = new String(param.getBytes("iso-8859-1"),"UTF-8");

(I'm all out of breath from replacing those &quot; escapes with "
symbols... I need to get more exercise).

The above line of code is only valid if:

1. The bytes coming from the client were supposed to be UTF-8
2. Your server has been configured to interpret the data coming from
   clients unconditionally as ISO-8859-1
3. The server's decoding did not lose any bytes. Java's ISO-8859-1
   maps every byte value to exactly one character, so a true
   ISO-8859-1 decode is reversible; a charset with unmapped byte
   values is not

Why the third constraint? Because if the client sends UTF-8 and the
server decodes that with a charset that cannot represent every byte,
information is lost in the translation... the bytes are not going to be
magically re-combined into UTF-8 bytes when you call
getBytes("ISO-8859-1") on them. It's only going to make things worse.
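
A minimal sketch of the second condition, assuming plain Java with no servlet API (the class name RecodeCheck and the method recode are invented for illustration): if the container has already decoded the parameter correctly, applying the quoted line damages it. This is the '?�?ek' effect visible in the "correct output" of Jan's original post.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of Tomcat or of the poster's webapp.
final class RecodeCheck {

    // Same operation as the quoted line: re-read the ISO-8859-1 bytes of the string as UTF-8.
    static String recode(String param) {
        return new String(param.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The Czech test word from the thread, written with escapes so that the
        // encoding of this source file does not matter.
        String alreadyCorrect = "\u010D\u00ED\u017Eek";
        String damaged = recode(alreadyCorrect);
        // Characters outside ISO-8859-1 were replaced on the way to bytes,
        // so the original text cannot be recovered.
        System.out.println(damaged.equals(alreadyCorrect)); // false
    }
}
```

Pure ASCII input passes through unchanged, which is why the problem often goes unnoticed until someone types an accented character.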

The only time transcoding bytes is appropriate is when you are decoding
GET parameters, because any POST parameters ought to have been sent with
a correct Content-Type (including a charset) parameter.

It would be better to install a filter to set the character encoding of
the request /before/ any data has been read from it if you were worried
about the client sending an incorrect content type.

As for GET parameters, you're pretty much screwed as Andre points out:
there's just no standard for URL encoding (okay, yes, there is a
standard: use URL/%-encoded ISO-8859-1, unless the browser is modern and
uses UTF-8 instead of ISO-8859-1 as its default URL encoding). It's just
a mess.

 for a start, I know about URIEncoding in server.xml and about using
 Encoding filter, but we use this for decoding GET request for
 historical reasons. Or is there more "correct" way to
 decode String?

If you always want your strings decoded as UTF-8, then set
URIEncoding="UTF-8" on your Connector and be done with it. Don't have
your webapp's code re-coding strings that come from clients.

Again, read the CharacterEncoding page on the Wiki, as previously
suggested. All will become clear.

Well, the solution becomes clear, at least.

-chris

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: POST request encoding - Tomcat/JVM configuration?

2009-10-24 Thread Pfeifer Jan
I apologize for my last post, I accidentally double-encoded it.

Original post:

"That JSP should work on any clean Tomcat installation. "It doesn't 
work" isn't very informative. We need details." 
There really is not much more to say: "it worked, now it does not". 

I also use the MyEclipse IDE. I disabled it, reinstalled Tomcat, tried it, and 
"it does not work". I believe that a clean installation of the whole 
system would work, but that is not an option at the moment. Neither is a 
Tomcat upgrade.

 What else can change the way Tomcat/Java treats the request body? 

"String decoded = new 
String(param.getBytes("iso-8859-1"),"UTF-8"); for a 
start" 
I know about URIEncoding in server.xml and about using an encoding filter, but we 
use this for decoding GET requests for historical reasons. Or is there a more 
"correct" way to decode the String? 

Jan

Re: POST request encoding - Tomcat/JVM configuration?

2009-10-24 Thread André Warnier

Pfeifer Jan wrote:
...

I know about URIEncoding in server.xml and about using Encoding filter, but we use this for decoding GET request for historical reasons. Or is there more "correct" way to decode String? 


Jan,

this whole area of the character set in which HTTP requests come into a 
server, and are decoded by the server, is complicated, confusing, and 
generally not well-defined (or defined in contradictory ways) by the 
Internet RFCs themselves.
In short, there can be many reasons why you are not getting the data in 
the character set that you expect, and finding the specific reason that 
applies in your case can be tedious and involve several levels.
To resolve it, you have to be very systematic, and check every step one 
by one.

Here are some principles :

1) the general default for the HTTP protocol, and for HTML, is 
iso-8859-1.  Anything else, you have to explicitly specify.
iso-8859-1 is at the same time a character set, and an encoding, in 
which each character is represented by one byte.


2) internally, Java represents all character strings as Unicode (which 
is a character set), using 16-bit units for the characters (UTF-16, 
which is an encoding; a few rare characters take two units).


(1) and (2) above mean that somewhere, no matter what, some character 
set translation is going to take place, between the web and your Java 
webapp, and vice-versa between your webapp and the web.  The trick is to 
get the pieces in place so that the /correct/ translations take place in 
each direction.
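
As a small illustration of why the translation has to be the right one, the sketch below (plain Java; DecodeTwice is an invented name) decodes the same bytes once as UTF-8 and once as iso-8859-1. The bytes are what a UTF-8 client would send for the Czech test word used elsewhere in this thread.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: the decoded text depends on the charset the receiver assumes.
final class DecodeTwice {

    static String asUtf8(byte[] wire) {
        return new String(wire, StandardCharsets.UTF_8);
    }

    static String asLatin1(byte[] wire) {
        return new String(wire, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // UTF-8 bytes of the test word: c-caron, i-acute, z-caron, then "ek".
        byte[] wire = {(byte) 0xC4, (byte) 0x8D, (byte) 0xC3, (byte) 0xAD,
                       (byte) 0xC5, (byte) 0xBE, 0x65, 0x6B};
        System.out.println(asUtf8(wire).length());   // 5 characters
        System.out.println(asLatin1(wire).length()); // 8 characters, one per byte
    }
}
```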


3) iso-8859-1 (in fact all iso-8859-x character sets and encodings) can 
each represent only 256 different characters, which is not enough to 
cover all languages used on the WWW nowadays. So if your applications 
have to use Czech and German at the same time, you should not use an 
iso-8859 charset.
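
A quick way to check this limit from Java, as a sketch only (Latin1Coverage and fitsLatin1 are invented names; the calls are the JDK's standard Charset and CharsetEncoder API):

```java
import java.nio.charset.Charset;

// Illustrative helper: asks the JDK whether a text fits into iso-8859-1.
final class Latin1Coverage {

    static boolean fitsLatin1(String text) {
        // A fresh encoder per call, because CharsetEncoder instances are not thread-safe.
        return Charset.forName("ISO-8859-1").newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        // German city name with u-umlaut, which iso-8859-1 contains.
        System.out.println(fitsLatin1("M\u00FCnchen"));           // true
        // Czech test word with c-caron and z-caron, which it does not contain.
        System.out.println(fitsLatin1("\u010D\u00ED\u017Eek"));   // false
    }
}
```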


4) UTF-8 is a popular encoding of Unicode, where each character is 
represented by one or more bytes.
The big advantage of Unicode/UTF-8 is that it can represent all 
characters of all languages used on the WWW.
The drawback of Unicode/UTF-8 at the moment is that, for historical 
reasons, it is /not/ the HTTP/HTML default charset, so you have to 
explicitly specify it in several places.
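
To see the "one or more bytes" part in practice, here is a short sketch (Utf8Width is an invented name) that counts the bytes UTF-8 needs for a few single characters:

```java
import java.nio.charset.StandardCharsets;

// Illustrative helper: counts the bytes UTF-8 needs for a piece of text.
final class Utf8Width {

    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length("e"));       // 1: plain ASCII letter
        System.out.println(utf8Length("\u010D"));  // 2: c-caron, as used in Czech
        System.out.println(utf8Length("\u20AC"));  // 3: euro sign
    }
}
```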


5) despite what is said above about the default for HTTP being 
iso-8859-1, URLs are an exception.  A URL, by definition, is not in any 
specific character set or encoding.  The definition of URLs just says 
that, whatever the character set and encoding used, *any* byte whose 
value does not match one of the printable characters of the US-ASCII 
range (roughly [0-9A-Za-z] + some), must be encoded in %AB notation, 
where %AB is : the % sign, followed by a 2-digit hexadecimal 
representation of the byte value.


In other words it means that, when interpreting data that comes as part 
of a URL (like the query string in an HTTP GET),
- the server first decodes the URI from the %AB encoding above, back 
into a series of bytes
- then the server further decodes this series of bytes into a string of 
characters, using some charset encoding
- but, the only way to know in which character set the data really is, 
is *by convention* between the client and the server.
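
The two decoding steps above can be sketched with the JDK's own java.net.URLDecoder, which undoes the %XX escapes and then applies the charset it is given (it also turns '+' into a space, as form encoding requires). QueryValueDecoder is an invented name; this is an illustration of the principle, not what Tomcat does internally.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Illustrative helper, using only the JDK's own URLDecoder.
final class QueryValueDecoder {

    // Undoes the %XX escapes and turns the resulting bytes into characters using charsetName.
    static String decode(String escaped, String charsetName) {
        try {
            return URLDecoder.decode(escaped, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // What a UTF-8 browser would put in the query string for the Czech test word.
        String escaped = "%C4%8D%C3%AD%C5%BEek";
        System.out.println(decode(escaped, "UTF-8").length());      // 5: the intended word
        System.out.println(decode(escaped, "ISO-8859-1").length()); // 8: one character per byte
    }
}
```

The escaped string is identical in both calls; only the convention assumed by the receiver differs.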


The convention, historically so far, has always been iso-8859-1.
Recently and slowly, it seems that this convention is now shifting 
toward UTF-8.
But note that it is a convention still, and that in order to make sure 
that your application (and Tomcat before it) can consider the parameters 
from a GET URL to be UTF-8, /you/ have to make sure that every URL on 
which a user may click in one of /your/ pages is indeed encoded that 
way.
(And thus basically also, if you receive a request from an unknown 
source, well, you have to guess..)


See in Tomcat 6.0 docs, the following attribute of the HTTP Connector :

URIEncoding :   
This specifies the character encoding used to decode the URI bytes, 
after %xx decoding the URL. If not specified, ISO-8859-1 will be used.


(The above applies to GET requests, because in that case the request 
parameters are passed as part of the URI)


Now about POST requests :

In a POST, the request parameters are not sent as part of a query string 
in a URI, but they are sent in the *body* of the request.

There are 2 ways to format a POST request from the client side :
a) as a url-encoded body (the default).
b) as a multipart/form-data body.
(That is the case if the Form tag contains the attribute :
enctype="multipart/form-data"
)

In (a), the body consists of one long string, which looks like the query 
string of a GET :

param1=value1&param2=value2&...&paramn=valuen

The charset and encoding of that string are supposed to be given by the 
Content-type HTTP header of that POST request.
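
A rough sketch of case (a), assuming plain Java without the servlet API (FormBody, charsetOf and parse are invented names, and a real container does considerably more): take the charset from the Content-type value if there is one, fall back to a default otherwise, then split and decode the body.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: a hand-rolled reader for a url-encoded POST body.
final class FormBody {

    // Pulls "charset=..." out of a Content-type value; returns fallback if none is given.
    static String charsetOf(String contentType, String fallback) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                    return p.substring(8).replace("\"", "").trim();
                }
            }
        }
        return fallback;
    }

    // Splits name=value pairs on '&' and decodes both sides with the given charset.
    static Map<String, String> parse(String body, String charsetName) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String pair : body.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            try {
                params.put(URLDecoder.decode(name, charsetName),
                           URLDecoder.decode(value, charsetName));
            } catch (UnsupportedEncodingException e) {
                throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
            }
        }
        return params;
    }

    public static void main(String[] args) {
        String header = "application/x-www-form-urlencoded; charset=UTF-8";
        String charset = charsetOf(header, "ISO-8859-1");
        Map<String, String> params = parse("param=%C4%8D%C3%AD%C5%BEek&x=a+b", charset);
        System.out.println(charset);                       // UTF-8
        System.out.println(params.get("param").length());  // 5
    }
}
```

If the header carries no charset, the fallback decides the result, which is the situation a request.setCharacterEncoding() call or an encoding filter is meant to cover.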


In (b), it is more complicated :
The body of the request is composed of parts, each part representing 
one parameter.  Each part /should/ have its own Content-type header, 
indicating the type of that part, and if applicable, the character set 
and encoding of that part.


Re: POST request encoding - Tomcat/JVM configuration?

2009-10-23 Thread Pfeifer Jan


Thank you for the reply. 

I've checked that link several times already. I have no problem with the code 
itself. It works. Or rather, it works everywhere except on my computer now. I am 
looking for the reason why, after almost a year, it stopped working properly, and 
for whichever configuration file I missed.

I have tested many samples with no luck so far.

Jan

Re: POST request encoding - Tomcat/JVM configuration?

2009-10-23 Thread Mark Thomas
Pfeifer Jan wrote:
 
 Thank you for the reply. 
 
 I've checked that link several times already. I have no problem with the code 
 itself. It works.

I disagree. I can see a whole bunch of things wrong with that code.

Does the sample JSP in the FAQ work? If not, you have some system config
issues to fix. If it does then you just need to fix your code.

Mark







Re: Re: POST request encoding - Tomcat/JVM configuration?

2009-10-23 Thread Pfeifer Jan
"That JSP should work on any clean Tomcat installation. "It doesn't 
work" isn't very informative. We need details." There really is not 
much more to say: "it worked, now it does not". I also use the MyEclipse 
IDE. I disabled it, reinstalled Tomcat, tried it, and "it does not 
work". I believe that a clean installation of the whole system would work, but 
that is not an option at the moment. Neither is a Tomcat upgrade. What else can 
change the way Tomcat/Java treats the request body? "String decoded = 
new String(param.getBytes("iso-8859-1"),"UTF-8"); for a 
start" I know about URIEncoding in server.xml and about using an encoding 
filter, but we use this for decoding GET requests for historical reasons. Or is 
there a more "correct" way to decode the String? Jan

Re: POST request encoding - Tomcat/JVM configuration?

2009-10-23 Thread André Warnier

Pfeifer Jan wrote:

"That JSP should work on any clean Tomcat installation. "It doesn't work" isn't very informative. We need 
details." There really is not much more to say: "it worked, now it does not". I also use the MyEclipse IDE. I disabled 
it, reinstalled Tomcat, tried it, and "it does not work". I believe that a clean installation of the whole system would work, but 
that is not an option at the moment. Neither is a Tomcat upgrade. What else can change the way Tomcat/Java treats the request body? 
"String decoded = new String(param.getBytes("iso-8859-1"),"UTF-8"); for a start" I know 
about URIEncoding in server.xml and about using an encoding filter, but we use this for decoding GET requests for historical reasons. Or is there 
a more "correct" way to decode the String? Jan


Jan,
I don't know if this affects only my mail reader, but your messages to 
the list, for me, are almost impossible to read because of the apparent 
profusion of html escapes in them.

Can you maybe make sure that you are posting only in plain text ?

Until I am sure that it is not only a problem on my side, I will refrain 
from further comments about posting encoding-related issues in html...





Re: POST request encoding - Tomcat/JVM configuration?

2009-10-23 Thread Pid

On 23/10/2009 15:21, André Warnier wrote:

Pfeifer Jan wrote:

"That JSP should work on any clean Tomcat installation. "It
doesn't work" isn't very informative. We need details."
There really is not much more to say: "it worked, now it does
not". I also use the MyEclipse IDE. I disabled it, reinstalled Tomcat,
tried it, and "it does not work". I believe that a clean
installation of the whole system would work, but that is not an option
at the moment. Neither is a Tomcat upgrade. What else can change the way
Tomcat/Java treats the request body? "String decoded = new
String(param.getBytes("iso-8859-1"),"UTF-8"); for
a start" I know about URIEncoding in server.xml and about using
an encoding filter, but we use this for decoding GET requests for
historical reasons. Or is there a more "correct" way to decode
the String? Jan


Jan,
I don't know if this affects only my mail reader, but your messages to
the list, for me, are almost impossible to read because of the apparent
profusion of html escapes in them.
Can you maybe make sure that you are posting only in plain text ?

Until I am sure that it is not only a problem on my side, I will refrain
from further comments about posting encoding-related issues in html...


snap. encoded ampersands & no spacing: unreadable.

p






Re: POST request encoding - Tomcat/JVM configuration?

2009-10-23 Thread Christopher Schultz

André,

On 10/23/2009 10:21 AM, André Warnier wrote:
 Pfeifer Jan wrote:
 I don't know if this affects only my mail reader, but your messages to
 the list, for me, are almost impossible to read because of the apparent
 profusion of html escapes in them.

+1

Also, quotes are mixed-up with the main body of the message. I would
guess a broken mailer. It seems that Smart4Web 2.0 Mailer isn't so smart. :(

-chris




Re: POST request encoding - Tomcat/JVM configuration?

2009-10-22 Thread Mark Thomas
Pfeifer Jan wrote:
 
 Hi,
 I have been running a webapp for some time with no problems. After some 
 "change" that I am not able to identify, my POST requests (GET works fine) 
 are messed up. Just on my local server.

http://wiki.apache.org/tomcat/FAQ/CharacterEncoding

Mark

 
 Some facts: Tomcat 5.0.28, jdk 1.4.2.07, WinXP (the others run on XP, Windows Server, Linux)
 The following simple code produces a different result on my server than on the others 
 (web.xml, server.xml and the JDK configuration are the same)
 
 Form page:
 
 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" 
 "http://www.w3.org/TR/html4/strict.dtd">
 <html>
 <head>
   <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
   <title>Encoding test</title>
  
   <script type="text/javascript">
   function test(method){
     var frm = document.getElementById("form");
     frm.method = method;
     frm.submit();
   }
   </script>
 </head>
 
 <body> 
  <form method='POST' action='post_process.jsp' id='form'>
   <input type='text' name='param'>
  </form> 
 
  <button onclick='test("get")'>get</button>
  <button onclick='test("post");'>post</button>
 </body>
 </html>
 
 post_process.jsp
 <%@ page language="java" session="true" 
 contentType="text/html; charset=UTF-8"%>
 <% request.setCharacterEncoding("UTF-8");  %>
 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" 
 "http://www.w3.org/TR/html4/strict.dtd">
 <html>
 <head>
  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  <title>Encoding test process</title>
 </head>
 <body>
 
   <% String param   = request.getParameter("param");
      String decoded = new String(param.getBytes("iso-8859-1"),"UTF-8");
   
      System.out.println("Original value --> "+param);
      System.out.println("Decoded  value --> "+decoded);
   %>  
   
   V2<br>
   Method: <%= request.getMethod() %><br>
   Encoding: <%= request.getCharacterEncoding() %><br>
   Locale: <%= request.getLocale() %><br>
   Default System charset: <%= new java.io.OutputStreamWriter(new 
 java.io.ByteArrayOutputStream()).getEncoding() %><br>
   Original value:  '<span style='color:<%= 
 request.getMethod().equals("GET") ? "red" : 
 "green" %>'><%= param %></span>'<br>
   Decoded  value: '<span style='color:<%= 
 request.getMethod().equals("GET") ? "green" : 
 "red" %>'><%= decoded %></span>'<br>
 </body>
 </html>
 
 My output:
 Method: POST
 Encoding: UTF-8
 Locale: cs
 Default System charset: Cp1250
 Original value: '&Auml;&Atilde;&shy;&Aring;&frac34;ek'
 Decoded value: 'čížek'
 
 Correct output (any other server):
 V2
 Method: POST
 Encoding: UTF-8
 Locale: cs
 Default System charset: Cp1250
 Original value: 'čížek'
 Decoded value: '?�?ek'
 
 I spent last two days googling and looking for answer with no luck. Hope that 
 someone can help.
 
 Thanks in advance
 Jan Pfeifer
 
 



