Re: mojk and utf8 charset problem

2011-05-18 Thread Thierry Templier
Hello,

Sorry for my very late answer!

It took me some time to solve the problem, based on what you suggested. In fact, 
there were two different problems:

- I use Tiles and I had not specified the page directive in every element that 
makes up the final page (<%@ page language="java" contentType="text/html; 
charset=UTF-8" %>). After adding it everywhere, UTF-8 characters display correctly.

- I also had a configuration problem at the MySQL JDBC driver level. Although 
the database is configured for utf8, I also needed to specify some parameters in 
the JDBC URL (see 
http://confluence.atlassian.com/display/DOC/Configuring+Database+Character+Encoding).
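
For illustration, the kind of JDBC URL meant here can be sketched as below. This is only a sketch under assumptions: the host, port and database name (localhost, 3306, mydb) are placeholders, and useUnicode / characterEncoding are the MySQL Connector/J properties usually used for this purpose; check the driver documentation for your version before relying on it.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class MysqlUtf8Url {
    // Builds a MySQL JDBC URL that asks the driver to use UTF-8 on the connection.
    // host, port and database are placeholders supplied by the caller.
    static String build(String host, int port, String database) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("useUnicode", "true");
        params.put("characterEncoding", "UTF-8");
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(e.getKey() + "=" + e.getValue());
        }
        return "jdbc:mysql://" + host + ":" + port + "/" + database + "?" + query;
    }

    public static void main(String[] args) {
        // Hypothetical values, for demonstration only.
        System.out.println(build("localhost", 3306, "mydb"));
    }
}
```

If the URL is written inside an XML file (for example a context.xml resource definition), the "&" has to be escaped as "&amp;".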

Thanks very much for your help!
Thierry


-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



RE: mojk and utf8 charset problem

2011-05-02 Thread Thierry Templier
Hello Matteo,

Thanks very much for your answer but I didn't receive the end...

As suggested, I tried both addresses and the result isn't the same. When using 
Tomcat directly, everything works fine, but when going through mod_jk, I have 
problems with non-Latin-1 characters... So I think that it's a mod_jk / connector 
configuration problem.

Thierry




Re: mojk and utf8 charset problem

2011-05-02 Thread Thierry Templier
Hi André,

Thanks very much for your help!

I checked the difference between the two ways of accessing the application:

- Through Apache / mod_jk / Tomcat, which can't display non-Latin-1 characters 
correctly
- Directly through Tomcat, which works fine

Apart from the characters that don't display correctly, the content is the same, 
in particular the meta tag at the beginning:

<meta http-equiv="CONTENT-TYPE" content="text/html; charset=UTF-8"></meta>

As suggested, I also had a look at the request / response content, and there 
are some differences, as described below.

- Response headers when using Apache / Modjk / Tomcat:

Date              Mon, 02 May 2011 08:21:16 GMT
Server            Apache/2.2.14 (Ubuntu)
Pragma            no-cache
Expires           Thu, 01 Jan 1970 00:00:00 GMT
Cache-Control     no-cache, no-store
Content-Language  en-UK
Vary              Accept-Encoding
Content-Encoding  gzip
Content-Length    2494
Keep-Alive        timeout=15, max=93
Connection        Keep-Alive
Content-Type      text/html;charset=UTF-8

- Response headers when directly using Tomcat:

Server             Apache-Coyote/1.1
Pragma             no-cache
Expires            Thu, 01 Jan 1970 00:00:00 GMT
Cache-Control      no-cache, no-store
Content-Type       text/html;charset=UTF-8
Content-Language   en-UK
Transfer-Encoding  chunked
Date               Mon, 02 May 2011 08:19:39 GMT

The Content-Type header is the same and specifies UTF-8 as the encoding. However, 
when going through Apache / mod_jk / Tomcat, the response content is 
compressed using gzip. That is not the case when accessing Tomcat directly. I 
don't know if that could be the cause of the problem...

Thierry




Re: mojk and utf8 charset problem

2011-05-02 Thread André Warnier

Thierry Templier wrote:

The content type header is the same and specifies UTF-8 as encoding... However 
it appears that when using Apache / modjk / Tomcat, the response content is 
compressed using gzip. It's not the case when directly accessing Tomcat. I 
don't know if it could be the reason of the problem...


It seems unlikely that it would be the compression that causes the problem.
Content encoding is only supposed to be used during the transport from the server to the 
browser.  So it is applied last at the server (Apache) side, and removed first at the 
browser side, before interpreting the content.
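
To illustrate why compression by itself should be transparent, here is a minimal sketch using the JDK's own gzip classes (not Apache's mod_deflate): whatever bytes go in come out unchanged, so the character encoding of the page is not touched. The class and method names are made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class GzipRoundTrip {
    // Compresses the given bytes, as a server-side output filter would.
    static byte[] gzip(byte[] plain) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
                gz.write(plain);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Decompresses the given bytes, as the browser does before decoding characters.
    static byte[] gunzip(byte[] compressed) {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = gz.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // An Arabic letter followed by ASCII text, encoded as UTF-8.
        byte[] page = "\u0633 = UTF-8 text".getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(page, gunzip(gzip(page))));
    }
}
```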

But just in case, it should be easy to disable, if even just for a test.

Under Ubuntu, you may try the command a2dismod deflate to disable the filter.
Or if that does not work, have a look here to modify your configuration :
http://httpd.apache.org/docs/2.2/mod/mod_deflate.html

I believe Ubuntu is similar to Debian.  If so, then the setup of the mod_deflate filter 
may be in a file like /etc/apache2/mods-available/deflate.conf







Re: mojk and utf8 charset problem

2011-05-02 Thread André Warnier

Additional question : did you try it with different browsers ?




Re: mojk and utf8 charset problem

2011-05-02 Thread Thierry Templier
Hello André,

I ran tests in two browsers:

- Firefox 3.6.16 (linux)
- Chrome 11.0.696.57 (linux)

and I see the same behavior in both.

Thierry




Re: mojk and utf8 charset problem

2011-05-02 Thread Thierry Templier
Hello André,

After disabling compression at the Apache level, things changed a bit: 
content from the database is now displayed correctly using JSTL (<c:out 
value="(...)" escapeXml="false"/>), but that is still not the case for the content of 
the JSP pages themselves. However, I do have this at the beginning of the JSP pages: <%@ page 
language="java" contentType="text/html; charset=UTF-8" %>.

Thierry




Re: mojk and utf8 charset problem

2011-05-02 Thread Christopher Schultz

Thierry,

On 5/2/2011 4:31 AM, Thierry Templier wrote:
 <meta http-equiv="CONTENT-TYPE" content="text/html; charset=UTF-8"></meta>

Just to be sure, I highly recommend coding your pages like this:

<meta http-equiv="CONTENT-TYPE" content="text/html; charset=<%= response.getCharacterEncoding() %>"></meta>

This will ensure that you aren't sending ISO-8859-1 but claiming that
it's UTF-8.
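
As a small illustration of what such a mismatch does, the sketch below encodes a string with one charset and decodes the bytes with another. It uses plain JDK classes, not Tomcat code; the class and method names are invented for the example. Decoding UTF-8 bytes as ISO-8859-1 turns "café" into "cafÃ©", and decoding ISO-8859-1 bytes as UTF-8 produces the U+FFFD replacement character.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetMismatch {
    // Encodes text with the charset actually used on the wire, then decodes
    // the bytes with the charset that was declared to the client.
    static String reinterpret(String text, Charset actual, Charset declared) {
        byte[] wire = text.getBytes(actual);
        return new String(wire, declared);
    }

    public static void main(String[] args) {
        String text = "caf\u00E9";
        // Matching charsets: the text survives.
        System.out.println(text.equals(
                reinterpret(text, StandardCharsets.UTF_8, StandardCharsets.UTF_8)));
        // Mismatched charsets: the text is garbled.
        System.out.println(text.equals(
                reinterpret(text, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8)));
    }
}
```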

 The content type header is the same and specifies UTF-8 as encoding... 
 However it appears that when using Apache / modjk / Tomcat, the response 
 content is compressed using gzip. It's not the case when directly accessing 
 Tomcat. I don't know if it could be the reason of the problem...

gzip encoding is unlikely to be causing the problem.

Can you post the configuration you have for your Connector elements in
Tomcat's conf/server.xml? Remember to remove any sensitive information
(ip addresses, JK secrets, etc.)

-chris




Re: mojk and utf8 charset problem

2011-05-02 Thread André Warnier

Thierry Templier wrote:

Hello André,

After having disabled compression at Apache level, things change a bit since now content from database is correctly displayed using 
JSTL (<c:out value="(...)" escapeXml="false"/>) but it's still not the case for content of JSP pages. I 
have however that at the beginning of JSP pages: <%@ page language="java" contentType="text/html; 
charset=UTF-8" %>.


Logic would have it that, independently of what the server does,
- if you have the same browser at the client side
- if the HTTP response headers are the same in both cases
- if the response content is the same in both cases
then the browser should display the same thing.

And if it doesn't, then one of the above premises is wrong.

To my knowledge, there is no purpose-built mechanism in either the AJP Connector, or 
mod_jk, to change the response content after it has been produced by the application.


There could be a bug somewhere however, in particular when talking about characters which 
may need more than 2 bytes for a proper UTF-8 representation (and chunked encoding? that 
may be a little-investigated area).

But if the received content is the same, then this also makes no sense.

Another test : what about using wget to retrieve one of your pages directly from tomcat 
and then through Apache/mod_jk, saving the result as 2 files, and then comparing these 
files with diff ?
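
If diff is not at hand, the same comparison can be sketched in a few lines of Java. This is only an illustration: the two file names are whatever you saved with wget and are passed as arguments, and the class name is made up.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

final class ResponseDiff {
    // Returns the offset of the first byte that differs,
    // or -1 if both arrays have the same length and content.
    static int firstDifference(byte[] a, byte[] b) {
        int common = Math.min(a.length, b.length);
        for (int i = 0; i < common; i++) {
            if (a[i] != b[i]) {
                return i;
            }
        }
        return a.length == b.length ? -1 : common;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java ResponseDiff.java <direct-file> <via-apache-file>");
            return;
        }
        byte[] direct = Files.readAllBytes(Paths.get(args[0]));
        byte[] viaApache = Files.readAllBytes(Paths.get(args[1]));
        int offset = firstDifference(direct, viaApache);
        System.out.println(offset < 0
                ? "identical"
                : "first difference at byte offset " + offset);
    }
}
```

Comparing raw bytes rather than decoded text avoids introducing yet another charset conversion into the test.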






mojk and utf8 charset problem

2011-04-29 Thread Thierry Templier
Hello,

I developed an application that uses UTF-8 encoding, since it needs to display 
Arabic characters. When accessing the application directly through Tomcat, 
everything works fine. When I access it through the Apache web server and 
mod_jk, those characters are not displayed correctly. UTF-8 is correctly 
configured within the Apache web server, since I can display them from static pages. 
So it seems the problem comes from mod_jk.

Is there a way to configure mod_jk to use UTF-8 encoding for HTTP requests and 
responses?

Thanks very much for your answers.
Thierry






RE: mojk and utf8 charset problem

2011-04-29 Thread Matteo Turra
In my experience mod_jk has no charset configuration. Only on the connector 
in server.xml can you change charset settings (URIEncoding, 
useBodyEncodingForURI), and those apply only to parsing the URI and parameters of the request, 
not to the output.

Did you try, with the same Tomcat, to get pages from the Tomcat HTTP connector (port 
8080 by default) and from Apache (port 80 by default)?

For example, if you have a dynamic page testPage.html built by Tomcat, try these in your 
browser:
http://localhost:8080/testPage.html
http://localhost:80/testPage.html

If the result is the same, connector and apache are not the origin.

If you 







Re: mojk and utf8 charset problem

2011-04-29 Thread André Warnier

Hi.

I suggest getting one of the browser add-ons which can display the complete HTTP 
response from the webserver to the browser (in other words, the HTTP headers as well as the content).
For Firefox, you can use for example HttpFox; for IE, there is Fiddler2. A quick search in 
Google will lead you to the download page.


Install one of those, re-do your server request, and carefully compare what you 
get back
a) from Tomcat directly
b) from Apache + mod_jk + tomcat

The way that a browser will display a page (in terms of charset) depends on 3 
elements :

1) when the server sends a response, it includes a Content-type HTTP header, which in 
this case should be something like :

Content-type: text/html; charset=UTF-8

2) any <meta> tags included inside the <head> portion of the HTML page.
For example, a tag such as :
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />

3) the way in which the browser (each specific browser, and sometimes even version) 
interprets the above.


According to the HTTP RFCs, the browser SHOULD NOT second-guess what the server says in 
terms of content-type. In other words, if the server says

Content-type: something; charset=somecharset
then the browser should blindly follow this, and not make its own determination.
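
As an illustration of what "following the header" means, here is a minimal sketch that extracts the charset parameter from a Content-Type header value. It is a simplified parser written for this example (the class and method names are made up), not the code any particular browser or server uses.

```java
import java.util.Locale;
import java.util.Optional;

final class ContentTypeCharset {
    // Returns the value of the charset parameter, if the header value has one.
    static Optional<String> charsetOf(String contentType) {
        if (contentType == null) {
            return Optional.empty();
        }
        String[] parts = contentType.split(";");
        // parts[0] is the media type itself; parameters follow.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq < 0) {
                continue;
            }
            String name = param.substring(0, eq).trim().toLowerCase(Locale.ROOT);
            if (name.equals("charset")) {
                String value = param.substring(eq + 1).trim();
                // Parameter values may be quoted.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? Optional.empty() : Optional.of(value);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html;charset=UTF-8").orElse("(none)"));
        System.out.println(charsetOf("text/html").orElse("(none)"));
    }
}
```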

However, IE for one is notorious for not following this aspect of the RFCs, and constantly 
trying to determine by itself what it is receiving, often in contradiction to what the 
server says. And worse, the determination it makes depends on the version of IE, and 
sometimes even on the patches applied to it or to Windows.


Also,
3a) ultimately, it is the user who is in control.  In the browser settings, there is a way 
to override the above, and force the browser to always display the page in a specific 
character set.  It does not sound like this is an issue in your case, but better check anyway.


But first, make sure that what you are receiving in one case or the other is really the 
same, headers and content.

And maybe also try it with different browsers, to see if the result is always 
the same.

Once you know the answer to that, then you can start looking for the issue in a more 
focused way.






Re: [ win xp and win server 2003 ] tomcat utf8 encoding

2011-04-12 Thread Tomislav Brkljačić

Cris,


Christopher Schultz-2 wrote:
 
 
 Tom,
 
 On 4/9/2011 12:53 PM, Tomislav Brkljačić wrote:
 I gave the "add the filter and bunch of Spring jars" method a try and it
 turned out to be a success! 
 
 You don't need a bunch of Spring jars... just use the one that comes
 with Tomcat and be done with it: there's no reason to add unnecessary
 libraries to your webapp.
 
 

Well, there seemed to be more than 3 dependencies. 

I tried with org.springframework.web-3.1.0.M1.jar only, but it didn't work.
Then I added the *-core.jar, but there were still problems. I tried adding
*-beans.jar, 
but there were still problems with loading.

After that I added the whole Spring distro, ran the test scenarios and
didn't find any problems.


Christopher Schultz-2 wrote:
 
 I don't think you want this: you only want to set the encoding when the
 client has provided none. If the client provides an encoding and you
 override it, you are probably making a big mistake.
 
 

I see. 
The app I'm building will be accessible on the intranet only.
Guessing what a client might send as an attachment is not wise, I know.

Could this issue be handled with a smarter custom filter in place of the
generic one?


Christopher Schultz-2 wrote:
 
 Andre & Cris, a beer in your name tonight.
 
 You should send me a Belgian beer and Andre an American one. :)
 
 (PS there actually are decent American beers)
 
 -chris
 

I've heard of Corsendonk beer as a fine one. Don't know any American beers
(beside B) :)




Re: [ win xp and win server 2003 ] tomcat utf8 encoding

2011-04-12 Thread Christopher Schultz

Tom,

On 4/12/2011 4:22 AM, Tomislav Brkljačić wrote:
 After that i added the whole spring distro, ran the test scenarios and
 didn't find any problems.

I guess if that works, fine. I just think it's unnecessary because you can
use a filter from somewhere else (Tomcat, for instance).

 Christopher Schultz-2 wrote:

 I don't think you want this: you only want to set the encoding when the
 client has provided none. If the client provides an encoding and you
 override it, you are probably making a big mistake.
 
 I see. 
 The app i'm building will be accessible on intranet only.
 Guessing what a client might send as an attachment is not wise, I know.
 
 Could this issue be handled with a smarter custom filter in place of the
 generic one ?

Just set forceEncoding=false (or don't set it at all: the default
/should/ be not to force the encoding because it's a bad idea).
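
The rule described here (apply the configured encoding only when the client supplied none, unless forcing is switched on) can be sketched as a small pure function. This is an illustration of the decision only, written without the servlet API so that it runs standalone; the class and method names are invented and it is not the filter's actual source.

```java
final class EncodingChoice {
    // clientEncoding: what the request declared, or null if it declared nothing.
    // configured:     the encoding set in the filter's init-param.
    // forceEncoding:  whether the configured value overrides the client's.
    static String choose(String clientEncoding, String configured, boolean forceEncoding) {
        if (forceEncoding || clientEncoding == null) {
            return configured;
        }
        return clientEncoding;
    }

    public static void main(String[] args) {
        // Client sent nothing: fall back to the configured encoding.
        System.out.println(choose(null, "UTF-8", false));
        // Client declared an encoding and we do not force: respect it.
        System.out.println(choose("ISO-8859-2", "UTF-8", false));
        // Forcing overrides whatever the client declared.
        System.out.println(choose("ISO-8859-2", "UTF-8", true));
    }
}
```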

-chris



Re: [ win xp and win server 2003 ] tomcat utf8 encoding

2011-04-11 Thread Christopher Schultz

Tom,

On 4/9/2011 12:53 PM, Tomislav Brkljačić wrote:
 I gave the "add the filter and bunch of Spring jars" method a try and it
 turned out to be a success! 

You don't need a bunch of Spring jars... just use the one that comes
with Tomcat and be done with it: there's no reason to add unnecessary
libraries to your webapp.

 Fantastic!
 
 Filter code :
 <filter>
   <filter-name>characterEncodingFilter</filter-name>
   <filter-class>org.springframework.web.filter.CharacterEncodingFilter</filter-class>
   <init-param>
     <param-name>forceEncoding</param-name>
     <param-value>true</param-value>

I don't think you want this: you only want to set the encoding when the
client has provided none. If the client provides an encoding and you
override it, you are probably making a big mistake.

 Andre & Cris, a beer in your name tonight.

You should send me a Belgian beer and Andre an American one. :)

(PS there actually are decent American beers)

-chris



Re: [ win xp and win server 2003 ] tomcat utf8 encoding

2011-04-09 Thread Tomislav Brkljačić

I gave the "add the filter and bunch of Spring jars" method a try and it
turned out to be a success! 
Fantastic!

Filter code :
<filter>
  <filter-name>characterEncodingFilter</filter-name>
  <filter-class>org.springframework.web.filter.CharacterEncodingFilter</filter-class>
  <init-param>
    <param-name>forceEncoding</param-name>
    <param-value>true</param-value>
  </init-param>
  <init-param>
    <param-name>encoding</param-name>
    <param-value>UTF-8</param-value>
  </init-param>
</filter>

<filter-mapping>
  <filter-name>characterEncodingFilter</filter-name>
  <url-pattern>/*</url-pattern>
</filter-mapping>

And all the jars from Spring 3.1.0 M distro were copied in the tomcat/lib
folder.

Andre & Cris, a beer in your name tonight.

Cheers! :)




Christopher Schultz-2 wrote:
 
 
 André,
 
 On 4/8/2011 11:50 AM, André Warnier wrote:
 Tomislav Brkljačić wrote:
 The remote machine gives the wrong result.

 I wrote on the mailing list of the BPM software, the discussion is still
 alive.

 Maybe i could try to force a CharacterEncodingFilter filter on tomcat.
 Something like
 http://www.onthoo.com/blog/programming/2005/07/characterencodingfilter.html

 this .
 
 Don't do that.  Your problem is with the file *name*, not with the file
 content.
 Filters work on the content.
 I think you could make a real mess of everything by adding a content
 filter.  I don't think that Tomcat would use it in this case, but if it
 does, it will filter the whole multi-part body (headers and contents),
 which is certainly not what you want here.
 
 If the multipart form-handler uses an InputStream to read the request
 body, it won't matter what the character encoding is, anyway. I suspect
 an InputStream will be used, since that is entirely appropriate in this
 case. On the other hand, setting the character encoding might trigger
 the multipart parsing library to use the preferred encoding to translate
 filename bytes into characters. One can dream.
 
 -chris
 
 
 




Re: [ win xp and win server 2003 ] tomcat utf8 encoding

2011-04-09 Thread Fadil
unsubscribe





Re: [ win xp and win server 2003 ] tomcat utf8 encoding

2011-04-09 Thread André Warnier

Fadil wrote:
unsubscribe 


--
 |
 v

To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org







Re: [ win xp and win server 2003 ] tomcat utf8 encoding

2011-04-08 Thread Tomislav Brkljačić


awarnier wrote:
 
 Tomislav Brkljačić wrote:
 
 awarnier wrote:
 Tomislav Brkljačić wrote:
 Hi to all,

 this is my scenario and problem. 

 Situation 1. - local machine, win xp
 I have a web app deployed to tomcat, and the app has a webform for
 uploading attachments.
 Attachments can have funny letters (š,ć,čćžđ ) in the filename.
 I have set file.encoding=UTF8 and URIEncoding=UTF8 for the jvm and
 inside the server.xml.
 Everything works as expected, no anomalies in displaying the filenames
 of the uploaded files.

 Situation 2. - client machine, win server 2003
 Same webapp as in Situation 1, same tomcat configuration in all
 matters.
 But there is a problem.
 After i upload the files with funny names through the app, the
 filenames are scrambled and garbled.
 I checked the location of the files in the file system, and of course 
 the uploaded filenames are scrambled in the file system too.

 Obviously there is some other setting i need to check and synchronize,
 but it eludes me so far..

 Any help is very appreciated.

 Hi.
 Can you provide the *exact* versions of Java, Tomcat, and whichever file
 uploading mechanism you are using ?
 (meaning : to process the multi-part POST with the file upload, your
 webapp uses some additional mechanism; which is it ?)

 
 1.Situation - local win xp machine
 Java : java version 1.6.0_22
 Tomcat : 6.0.29
 This is the scenario where everything works as expected.
 
 
 2. Situation - customer win server 2003 machine
 Java : java version 1.6.0_20
 Tomcat : 6.0.29
 
 The deployed web application is developed with Bonita open Solution (BPM
 framework).
 I'm not that fluent in the java world but looking at the downloaded
 source code, i guess it would be a basic fileupload servlet. 
 
 
 Right. But that may be the important part.
 Are you familiar with the format in which a browser sends a
 multipart/form-data POST ?
 (MIME multipart, similar to the basic .eml format of an email with
 attachments)
 Briefly : the data is sent by the browser in a format like :
 
 request line (POST)
 header..
 header..
 header Content-type: multipart/form-data; boundary=xyz
 ..
 (blank line)
 --xyz
 header of part 1
 (blank line)
 body of part 1
 --xyz
 header of part 2
 (blank line)
 body of part 2
 --xyz
 etc...
 --xyz--
 
 where each part is one of the inputs of the form.  One of these parts is your
 uploaded file, and it has a special header which specifies the file type,
 encoding, file name etc..
 
 The job of the fileupload servlet (actually, it is a library capable of reading
 such a POST and separating it into parts), is to read these headers and bodies,
 and make sense out of them.  One of the things that it reads is the filename,
 and of course it interprets that according to some character set.
 For that, it uses some kind of java stream, and if it does it right, tells it
 the character set to use to decode the input.
 And it is possible that it does /not/ do it right in some cases (maybe even
 depending on which JVM version it runs under). For example, if it does not
 specify the character set to use to decode the input, Java may use the platform
 default, which may be different on these two systems.  And if that is the case,
 it may wrongly decode the filename header, and produce garbage.
 
 What I am saying is that, since you have the same Tomcat version on both
 systems, the code 
 which works differently is unlikely to be in Tomcat itself.  To my
 recollection (maybe 
 wrong), Tomcat 6.0 does not include any code that can deal with a
 multi-part POST.
 (I think that Tomcat 7.0 does).
 So the code which acts differently on your two servers above, is either
 the file-upload 
 library used by that servlet, or the JVM functions that it itself uses.
 
 In other words again, my first stop for a solution would be whatever
 support list is 
 available for the Bonita open Solution (BPM framework).
 
 (you may further narrow down the problem first by updating your 2003
 server to java 
 version 1.6.0_22).
 
 Now for another more general comment :
 According to your explanation, you upload a file from a browser, and then
 try to write it 
 to the local filesystem using the name which it had on the original
 workstation.
 In my view, this is always a bad idea, in general.
 One reason is the one you already found.
 The other is that if 2 users upload a file with the same name, the second
 one will 
 overwrite the first.
 The third is that you are leaving yourself open for all kinds of nasty
 things, such as a 
 user uploading a file with spaces in the name (always a problem at some
 point), or with 
 characters in the name that may be very dangerous (think of a file named
  /etc/passwd 
 or some.file|rm *).
 So, if you have a chance to do that, give each uploaded file a name that
 you create, and 
 keep the original filename in some separate place if you need it, for
 display only

Re: [ win xp and win server 2003 ] tomcat utf8 encoding

2011-04-08 Thread Christopher Schultz

André,

On 4/7/2011 5:15 PM, André Warnier wrote:
 Christopher Schultz wrote:
 ... (RFC references) ..
 
 Thanks for that post (with the chain of applicable RFCs).  I will keep
 that email preciously as a resource for future file upload debugging
 references.

You could always update the Tomcat Character Encoding page. I have a
headache from reading those specs so I'm not going to do it right now :)

 Also, to add to the potential OP woes, there is also the fact that some
 browsers send the filename, and others send the full path of the file.

I love it when a standard gets followed. Do you happen to know which
browsers send the files in which format? Most OSs these days use either
\ or / as a path delimiter, so you can take everything after the last
one of those... but what if someone is using Firefox 4.0 on VMS (ha ha ha)?
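
A minimal sketch of the "take everything after the last delimiter" idea,
in Java. The class and method names are invented for illustration, and it
assumes \ and / are the only delimiters worth handling (so no VMS):

```java
// Hypothetical helper, not part of Tomcat or of any upload library.
final class UploadNames {

    // Returns the part of the client-supplied name after the last '/' or '\'.
    // If neither delimiter is present, the name is returned unchanged.
    static String baseName(String clientName) {
        int slash = clientName.lastIndexOf('/');
        int backslash = clientName.lastIndexOf('\\');
        int cut = Math.max(slash, backslash); // -1 when no delimiter was found
        return clientName.substring(cut + 1);
    }

    public static void main(String[] args) {
        // \u010D is the letter c with caron, as in the file name from this thread.
        System.out.println(baseName("C:\\Documents\\pri\u010Duva.txt"));
        System.out.println(baseName("/home/tom/pri\u010Duva.txt"));
        System.out.println(baseName("pri\u010Duva.txt"));
    }
}
```

This only normalizes what the browser sent; it says nothing about whether
the resulting name is safe to use on disk.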

 But it /may/ still be a problem if, after uploading the file and duly
 writing it into a directory, that directory is then later scanned by
 some separate (non-Java) program or script (whatever language it may be
 written in, even, God forbid, perl) with the purpose of actually doing
 something with these files.

Good point. Always good to quote your filenames :)

 There may be a lot of potential there :
 
 for ff in /mydir/* ; do
   mv $ff /otherdir/${ff}.new
 done

:)

- -chris



Re: [ win xp and win server 2003 ] tomcat utf8 encoding

2011-04-08 Thread Christopher Schultz

Tom,

On 4/8/2011 4:19 AM, Tomislav Brkljačić wrote:
 Ok, this is what i did.
 
 1. updated the java runtime so they match on both machines

Not a bad idea, but probably didn't affect anything.

 Tried to run the examples, but still the same result.
 
 2. installed livehttpheaders for firefox and ran the examples upload.
 This is the output from livehttp  from my local machine (the same is on the
 server machine) :

So... is the local machine the one that does or does not work? Comparing
the two that DO work would be a good idea.

 Content-Type: multipart/form-data;
 boundary=---55652821543

Note the lack of a character encoding (in the main request header). This
is appropriate for multipart/form-data content.

 Content-Disposition: form-data; name=attach_file; filename=pričuva.txt
 Content-Type: text/plain
 
 asdasdasd
 -55652821543--

A couple of things:

1. I'm surprised that no Content-Length was sent along with the file.

2. Note that the filename has non-US-ASCII characters shown there.
   I wonder if that's LiveHttpHeaders's interpretation of the header
   (and in what encoding) or if that's what's on the wire.


I suspect that ff is just using utf-8 to send the filename. Tomcat may
interpret it as US-ASCII and give you an odd result. Actually... for
multipart, Tomcat shouldn't be involved: this may be a problem with the
library you are using for file uploads. You should definitely ask on the
BPM mailing list.

Here's one thing you can do:

String brokenString = part.getFilename();  // or whatever

String fixedString
   = new String(brokenString.getBytes("US-ASCII"), "UTF-8");

That will re-interpret the bytes sent from the client as UTF-8. This will only
work if:

1. The client actually sent the data in UTF-8

2. Your multipart handler actually assumed that US-ASCII was correct

3. No alteration of the bytes has occurred by the interpretation
   as US-ASCII

If any of the above are NOT true, you are basically stuck.
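
To make the round trip concrete, here is a small self-contained sketch.
Note that it swaps US-ASCII for ISO-8859-1, which is an assumption about
what the multipart handler did: ISO-8859-1 maps every byte to exactly one
char, so condition 3 holds, whereas decoding as US-ASCII in Java replaces
every byte above 127 and the original bytes are gone. The class and
method names are hypothetical, and StandardCharsets needs Java 7 or later.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper; assumes the wrong decode was ISO-8859-1, not US-ASCII.
final class FilenameRepair {

    // Undo a wrong ISO-8859-1 decode of bytes that were really UTF-8.
    static String redecode(String broken) {
        byte[] raw = broken.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String sent = "pri\u010Duva.txt";                  // name as typed by the user
        byte[] wire = sent.getBytes(StandardCharsets.UTF_8); // bytes the browser sends

        // What a handler sees if it decodes those bytes with the wrong charset.
        String broken = new String(wire, StandardCharsets.ISO_8859_1);
        String lossy = new String(wire, StandardCharsets.US_ASCII);

        System.out.println("repaired equals original: " + redecode(broken).equals(sent));
        System.out.println("US-ASCII decode kept all bytes: " + (lossy.indexOf('\uFFFD') < 0));
    }
}
```

If the handler really did decode as US-ASCII, no amount of re-decoding
brings the name back, which is why looking at the bytes on the wire comes
first.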

It would be worth it to look at the bytes as they are traversing the
network -- say, with Wireshark -- to determine whether the filename is
actually encoded in UTF-8 or some other encoding.

Hope that helps,
- -chris



Re: [ win xp and win server 2003 ] tomcat utf8 encoding

2011-04-08 Thread Tomislav Brkljačić

The remote machine gives the wrong result.

I wrote on the mailing list of the BPM software, the discussion is still
alive.

Maybe I could try to force a CharacterEncodingFilter on Tomcat,
something like this:
http://www.onthoo.com/blog/programming/2005/07/characterencodingfilter.html

I will definitely try with Wireshark.

thx


-- 
View this message in context: 
http://old.nabble.com/--win-xp-and-win-server-2003---tomcat-utf8-encoding-tp31342723p31353009.html
Sent from the Tomcat - User mailing list archive at Nabble.com.





Re: [ win xp and win server 2003 ] tomcat utf8 encoding

2011-04-08 Thread Christopher Schultz

Tom,

On 4/8/2011 11:42 AM, Tomislav Brkljačić wrote:
 The remote machine gives the wrong result.

Okay. Could you post a LiveHttpHeaders dump of /that/ interaction, too?

 I wrote on the mailing list of the BPM software, the discussion is still
 alive.
 
 Maybe i could try to force a CharacterEncodingFilter filter on tomcat.
 Something like 
 http://www.onthoo.com/blog/programming/2005/07/characterencodingfilter.html
 this .

Tomcat's examples come with a filter that does exactly this. It's called
SetCharacterEncodingFilter and it can be found in the examples webapp.

We always run with such a filter in place because it solves all kinds of
problems with POST requests. My initial reaction was that the headers
are not part of the request body, so the SetCharacterEncodingFilter
wouldn't have an effect, but then again, the request body contains the
multipart/form-data including the headers of each multipart part. This
may solve all your problems.

Let us know how it goes.

- -chris



Re: [ win xp and win server 2003 ] tomcat utf8 encoding

2011-04-08 Thread André Warnier

Tomislav Brkljačić wrote:

The remote machine gives the wrong result.

I wrote on the mailing list of the BPM software, the discussion is still
alive.

Maybe i could try to force a CharacterEncodingFilter filter on tomcat.
Something like 
http://www.onthoo.com/blog/programming/2005/07/characterencodingfilter.html

this .


Don't do that.  Your problem is with the file *name*, not with the file content.
Filters work on the content.
I think you could make a real mess of everything by adding a content filter.  I don't 
think that Tomcat would use it in this case, but if it does, it will filter the whole 
multi-part body (headers and contents), which is certainly not what you want here.


A question : how exactly is the file name retrieved and used by that BPM upload 
module ?
I mean, can you see if it gets it as a byte array or as a String ?

And what about that locale=default query parameter ?
What is it supposed to mean, in the BPM documentation ?







Re: [ win xp and win server 2003 ] tomcat utf8 encoding

2011-04-08 Thread Christopher Schultz

André,

On 4/8/2011 11:50 AM, André Warnier wrote:
 Tomislav Brkljačić wrote:
 The remote machine gives the wrong result.

 I wrote on the mailing list of the BPM software, the discussion is still
 alive.

 Maybe i could try to force a CharacterEncodingFilter filter on tomcat.
 Something like
 http://www.onthoo.com/blog/programming/2005/07/characterencodingfilter.html

 this .
 
 Don't do that.  Your problem is with the file *name*, not with the file
 content.
 Filters work on the content.
 I think you could make a real mess of everything by adding a content
 filter.  I don't think that Tomcat would use it in this case, but if it
 does, it will filter the whole multi-part body (headers and contents),
 which is certainly not what you want here.

If the multipart form-handler uses an InputStream to read the request
body, it won't matter what the character encoding is, anyway. I suspect
an InputStream will be used, since that is entirely appropriate in this
case. On the other hand, setting the character encoding might trigger
the multipart parsing library to use the preferred encoding to translate
filename bytes into characters. One can dream.

- -chris



[ win xp and win server 2003 ] tomcat utf8 encoding

2011-04-07 Thread Tomislav Brkljačić

Hi to all,

this is my scenario and problem. 

Situation 1. - local machine, win xp
I have a web app deployed to Tomcat, and the app has a web form for uploading
attachments.
Attachments can have funny letters (š, ć, č, ž, đ) in the filename.
I have set file.encoding=UTF8 for the JVM and URIEncoding=UTF8 inside
server.xml.
Everything works as expected, no anomalies in displaying the filenames of
the uploaded files.

Situation 2. - client machine, win server 2003
Same webapp as in Situation 1, same Tomcat configuration in all matters.
But there is a problem.
After I upload files with funny names through the app, the filenames are
scrambled and garbled.
I checked the location of the files in the file system, and of course the
uploaded filenames are scrambled in the file system too.

Obviously there is some other setting I need to check and synchronize, but it
eludes me so far.

Any help is much appreciated.

Tomislav


-- 
View this message in context: 
http://old.nabble.com/--win-xp-and-win-server-2003---tomcat-utf8-encoding-tp31342723p31342723.html
Sent from the Tomcat - User mailing list archive at Nabble.com.





Re: [ win xp and win server 2003 ] tomcat utf8 encoding

2011-04-07 Thread André Warnier

Hi.
Can you provide the *exact* versions of Java, Tomcat, and whichever file uploading 
mechanism you are using ?
(meaning : to process the multi-part POST with the file upload, your webapp uses some 
additional mechanism; which is it ?)






Re: [ win xp and win server 2003 ] tomcat utf8 encoding

2011-04-07 Thread Tomislav Brkljačić


awarnier wrote:
 
 Hi.
 Can you provide the *exact* versions of Java, Tomcat, and whichever file
 uploading 
 mechanism you are using ?
 (meaning : to process the multi-part POST with the file upload, your
 webapp uses some 
 additional mechanism; which is it ?)
 

1.Situation - local win xp machine
Java : java version 1.6.0_22
Tomcat : 6.0.29
This is the scenario where everything works as expected.


2. Situation - customer win server 2003 machine
Java : java version 1.6.0_20
Tomcat : 6.0.29

The deployed web application is developed with Bonita open Solution (BPM
framework).
I'm not that fluent in the java world but looking at the downloaded source
code, i guess it
would be a basic fileupload servlet. 

thx 
-- 
View this message in context: 
http://old.nabble.com/--win-xp-and-win-server-2003---tomcat-utf8-encoding-tp31342723p31343818.html
Sent from the Tomcat - User mailing list archive at Nabble.com.





Re: [ win xp and win server 2003 ] tomcat utf8 encoding

2011-04-07 Thread André Warnier

Tomislav Brkljačić wrote:


1.Situation - local win xp machine
Java : java version 1.6.0_22
Tomcat : 6.0.29
This is the scenario where everything works as expected.


2. Situation - customer win server 2003 machine
Java : java version 1.6.0_20
Tomcat : 6.0.29

The deployed web application is developed with Bonita open Solution (BPM
framework).
I'm not that fluent in the java world but looking at the downloaded source
code, i guess it
would be a basic fileupload servlet. 



Right. But that may be the important part.
Are you familiar with the format in which a browser sends a multipart/form-data 
POST ?
(MIME multipart, similar to the basic .eml format of an email with attachments)
Briefly : the data is sent by the browser in a format like :

request line (POST)
header..
header..
header Content-Type: multipart/form-data; boundary=xyz
..
(blank line)
--xyz
header of part 1
(blank line)
body of part 1
--xyz
header of part 2
(blank line)
body of part 2
--xyz
etc...
--xyz--   (closing delimiter after the last part)

where each part is one of the inputs of the form.  One of these parts is your uploaded 
file, and it has a special header which specifies the file type, encoding, file name etc..
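
For reference, a hedged sketch of that layout with a tiny hand-written
splitter. On the wire, each delimiter is the boundary value prefixed with
two hyphens, and the closing one also ends with two hyphens. This is a
simplification, not how a real upload library works: it operates on a
String instead of bytes, ignores any preamble and nested multiparts, and
all names (MultipartSketch, Part, split, sample) are invented here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class MultipartSketch {

    // One part: its raw header block and its body, both as text.
    static final class Part {
        final String headers;
        final String body;
        Part(String headers, String body) {
            this.headers = headers;
            this.body = body;
        }
    }

    // Split a multipart body on "--" + boundary, skipping the closing marker.
    static List<Part> split(String body, String boundary) {
        List<Part> parts = new ArrayList<Part>();
        for (String chunk : body.split(Pattern.quote("--" + boundary))) {
            if (chunk.isEmpty() || chunk.startsWith("--")) {
                continue; // text before the first delimiter, or the closing "--"
            }
            // Drop the CRLF after the delimiter and the CRLF before the next one.
            String trimmed = chunk.substring(2, chunk.length() - 2);
            // The first blank line separates the part headers from the part body.
            String[] halves = trimmed.split("\r\n\r\n", 2);
            parts.add(new Part(halves[0], halves.length > 1 ? halves[1] : ""));
        }
        return parts;
    }

    // A two-part request body: one text field and one uploaded file.
    static String sample() {
        return "--xyz\r\n"
             + "Content-Disposition: form-data; name=\"comment\"\r\n"
             + "\r\n"
             + "hello\r\n"
             + "--xyz\r\n"
             + "Content-Disposition: form-data; name=\"attach_file\"; filename=\"a.txt\"\r\n"
             + "Content-Type: text/plain\r\n"
             + "\r\n"
             + "asdasdasd\r\n"
             + "--xyz--\r\n";
    }

    public static void main(String[] args) {
        for (Part p : split(sample(), "xyz")) {
            System.out.println("[headers]\n" + p.headers + "\n[body]\n" + p.body + "\n");
        }
    }
}
```

The file name lives in the Content-Disposition header of its own part,
which is why it is decoded by whatever code parses the body.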


The job of the fileupload servlet (actually, it is a library capable of reading such a 
POST and separating it into parts), is to read these headers and bodies, and make sense 
out of them.  One of these things that it reads is the filename, and of course it 
interprets that according to some character set.
For that, it uses some kind of java stream, and if it does it right, tells it the 
character set to use to decode the input.
And it is possible that it does /not/ do it right in some cases (maybe even depending on 
which JVM version it runs under). For example, if it does not specify the character set to 
use to decode the input, Java may use the platform default, which may be different on 
these two systems.  And if that is the case, it may wrongly decode the filename header, 
and produce garbage.
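
The failure mode described above can be reproduced with nothing but the
JDK. This is a sketch, not the code of any real upload library: the class
name and the sample header are invented, and ISO-8859-1 merely stands in
for "some platform default that is not UTF-8". StandardCharsets needs
Java 7 or later.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class HeaderDecoding {

    // Read the first line of a part header, decoding bytes with the given charset.
    static String readHeaderLine(byte[] wire, Charset charset) {
        try {
            BufferedReader reader = new BufferedReader(
                    new InputStreamReader(new ByteArrayInputStream(wire), charset));
            return reader.readLine();
        } catch (IOException e) {
            throw new IllegalStateException(e); // not expected for an in-memory stream
        }
    }

    public static void main(String[] args) {
        String header = "Content-Disposition: form-data; name=\"attach_file\"; "
                      + "filename=\"pri\u010Duva.txt\"";
        byte[] wire = (header + "\r\n").getBytes(StandardCharsets.UTF_8);

        // Explicit charset: same result on every machine.
        System.out.println(readHeaderLine(wire, StandardCharsets.UTF_8).equals(header));
        // Wrong charset: the two UTF-8 bytes of the accented letter become two chars.
        System.out.println(readHeaderLine(wire, StandardCharsets.ISO_8859_1).equals(header));
        // This is what a reader created without a charset argument would use.
        System.out.println("platform default: " + Charset.defaultCharset());
    }
}
```

Running the last line on both servers is a cheap way to check whether
the platform defaults actually differ.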


What I am saying is that, since you have the same Tomcat version on both systems, the code 
which works differently is unlikely to be in Tomcat itself.  To my recollection (maybe 
wrong), Tomcat 6.0 does not include any code that can deal with a multi-part POST.

(I think that Tomcat 7.0 does).
So the code which acts differently on your two servers above, is either the file-upload 
library used by that servlet, or the JVM functions that it itself uses.


In other words again, my first stop for a solution would be whatever support list is 
available for the Bonita open Solution (BPM framework).


(you may further narrow down the problem first by updating your 2003 server to java 
version 1.6.0_22).


Now for another more general comment :
According to your explanation, you upload a file from a browser, and then try to write it 
to the local filesystem using the name which it had on the original workstation.

In my view, this is generally a bad idea.
One reason is the one you already found.
Another is that if 2 users upload a file with the same name, the second one will 
overwrite the first.
The third is that you are leaving yourself open for all kinds of nasty things, such as a 
user uploading a file with spaces in the name (always a problem at some point), or with 
characters in the name that may be very dangerous (think of a file named  /etc/passwd 
or some.file|rm *).
So, if you have a chance to do that, give each uploaded file a name that you create, and 
keep the original filename in some separate place if you need it, for display only.
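
A minimal sketch of that advice, assuming a random UUID is an acceptable
on-disk name. The class, its fields and the ".upload" suffix are all made
up for illustration; where the original name ends up (a database, a
sidecar file) is left open.

```java
import java.util.UUID;

// Hypothetical value object pairing a safe stored name with the client's name.
final class StoredUpload {

    final String storedName;   // generated by the server, used on disk
    final String originalName; // as sent by the browser, for display only

    private StoredUpload(String storedName, String originalName) {
        this.storedName = storedName;
        this.originalName = originalName;
    }

    // Never derives the on-disk name from the untrusted client name.
    static StoredUpload forClientName(String originalName) {
        String generated = UUID.randomUUID().toString() + ".upload";
        return new StoredUpload(generated, originalName);
    }

    public static void main(String[] args) {
        StoredUpload first = forClientName("pri\u010Duva.txt");
        StoredUpload second = forClientName("pri\u010Duva.txt");
        // Same original name, different files on disk: no overwrite.
        System.out.println(first.storedName + " <- " + first.originalName);
        System.out.println(second.storedName + " <- " + second.originalName);
    }
}
```

The generated name contains only hex digits, hyphens and the fixed
suffix, so encoding, spaces and shell metacharacters stop being a
file-system concern.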









Re: [ win xp and win server 2003 ] tomcat utf8 encoding

2011-04-07 Thread Christopher Schultz

André,

On 4/7/2011 12:26 PM, André Warnier wrote:
 What I am saying is that, since you have the same Tomcat version on both
 systems, the code which works differently is unlikely to be in Tomcat
 itself.  To my recollection (maybe wrong), Tomcat 6.0 does not include
 any code that can deal with a multi-part POST.
 (I think that Tomcat 7.0 does).

Correct: Tomcat 6 does not include any multipart-parsing code, while
Tomcat 7 does, since it implements the Servlet 3.0 Multipart upload
features.

Note that the URIEncoding setting on your Connector is not relevant,
since the filename is being read from the request /body/ and not from
the URI.

I would use Fiddler(2?), LiveHttpHeaders, Firebug, etc. to see if there
is a difference on the /client/ side between these two situations. If
the client is sending a different Content-Type, then things may go wrong.

Here's the problem: the ARPA Internet Text Messages standard (from
which HTTP et al descend) doesn't say how to encode message headers that
include non-US-ASCII characters. This includes filenames with
non-US-ASCII characters that are embedded in the

The W3C says this (http://www.w3.org/TR/html401/interact/forms.html) in
section 17.13.4:


The user agent should attempt to supply a file name for each submitted
file. The file name may be specified with the filename parameter of
the 'Content-Disposition: form-data' header, or, in the case of multiple
files, in a 'Content-Disposition: file' header of the subpart. If the
file name of the client's operating system is not in US-ASCII, the file
name might be approximated or encoded using the method of [RFC2045].
This is convenient for those cases where, for example, the uploaded
files might contain references to each other (e.g., a TeX file and its
.sty auxiliary style description).


So, the user agent /might/ do something? Not very encouraging. RFC 2045
says virtually nothing, but there is an RFC specifically covering the
Content-Disposition header: http://www.ietf.org/rfc/rfc2183.txt

If you follow everything, you can piece together the following:

- From http://www.ietf.org/rfc/rfc822.txt:

 CHAR          =  <any ASCII character>        ; (  0-177,  0.-127.)

 quoted-string =  <"> *(qtext/quoted-pair) <"> ; Regular qtext or
                                               ;   quoted chars.

 qtext         =  <any CHAR excepting <">,     ; => may be folded
                   "\" & CR, and including
                   linear-white-space>

 quoted-pair   =  "\" CHAR                     ; may quote any char

- From http://www.ietf.org/rfc/rfc2045.txt:

 value := token / quoted-string

- From http://www.ietf.org/rfc/rfc2183.txt:

 filename-parm := "filename" "=" value

So, the filename value can be a quoted string made up of any ASCII
value. Great. What about non-US-ASCII characters?

RFC 2183 says this in section 2:


   NOTE ON PARAMETER VALUE LENGHTS: A short (length <= 78 characters)
   parameter value containing only non-`tspecials' characters SHOULD be
   represented as a single `token'.  A short parameter value containing
   only ASCII characters, but including `tspecials' characters, SHOULD
   be represented as `quoted-string'.  Parameter values longer than 78
   characters, or which contain non-ASCII characters, MUST be encoded as
   specified in [RFC 2184].


Great: another RFC to read. At least this one deals with the proper way
to communicate the character encoding used for a parameter value.
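
For illustration, the RFC 2184 form of such a parameter looks like
filename*=UTF-8''pri%C4%8Duva.txt: a charset, an optional language, then
percent-encoded bytes, separated by single quotes. Below is a minimal
decoding sketch. It is a toy under stated assumptions: it ignores the
continuation syntax (filename*0*, filename*1*, ...), does no validation
of the hex digits, and the class and method names are invented. It makes
no claim about what any particular browser actually sends.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;

// Hypothetical decoder for a single extended parameter value.
final class ExtendedParam {

    // Decode a value of the form charset'language'percent-encoded-bytes.
    static String decode(String extValue) {
        int first = extValue.indexOf('\'');
        int second = extValue.indexOf('\'', first + 1);
        if (first < 0 || second < 0) {
            throw new IllegalArgumentException("not an extended parameter value");
        }
        Charset charset = Charset.forName(extValue.substring(0, first));
        String encoded = extValue.substring(second + 1);

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        for (int i = 0; i < encoded.length(); i++) {
            char c = encoded.charAt(i);
            if (c == '%') {
                // The two hex digits after '%' encode one byte.
                bytes.write(Integer.parseInt(encoded.substring(i + 1, i + 3), 16));
                i += 2;
            } else {
                bytes.write(c); // unescaped characters are plain ASCII
            }
        }
        // Only now are the collected bytes turned into characters.
        return new String(bytes.toByteArray(), charset);
    }

    public static void main(String[] args) {
        System.out.println(decode("UTF-8''pri%C4%8Duva.txt"));
    }
}
```

The point of the format is that the charset travels with the value, so
the receiver does not have to guess.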

I think this all comes down to two things:

1. How standards-compliant is your user-agent (most aren't very good)

2. How standards-compliant is your file upload library (or servlet
container).

I've forgotten whether or not Tomcat includes RFC2184-style header
decoding logic... I'll have to check. But it doesn't matter if your
user-agent (=browser) sends the information in a non-standard way.

Can you provide some header captures so we can see what's going on?

 Now for another more general comment :
 According to your explanation, you upload a file from a browser, and
 then try to write it to the local filesystem using the name which it had
 on the original workstation.
 In my view, this is always a bad idea, in general.

+1

 One reason is the one you already found.
 The other is that if 2 users upload a file with the same name, the
 second one will overwrite the first.

+1

 The third is that you are leaving yourself open for all kinds of nasty
 things, such as a user uploading a file with spaces in the name (always
 a problem at some point), or with characters in the name that may be
 very dangerous (think of a file named  /etc/passwd or some.file|rm *).

I would hope that the OP was putting these files in some known root, so
that uploading /etc/passwd wouldn't overwrite /etc/passwd, and that file
permissions wouldn't allow this, either. Also, unlike Perl, having a
pipe in a filename isn't a problem in Java :)
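
A sketch of the "known root" check, assuming Java 7's java.nio.file is
available (it was not in the Java 6 setups discussed here). The class
name, the method name and the /srv/uploads path are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical guard that keeps uploaded files below one directory.
final class UploadRoot {

    // Resolve an untrusted name under root; reject anything that escapes it.
    static Path resolveInside(Path root, String untrustedName) {
        Path base = root.toAbsolutePath().normalize();
        // resolve() returns the argument itself when it is absolute, and
        // normalize() collapses ".." segments, so both tricks show up here.
        Path target = base.resolve(untrustedName).normalize();
        if (!target.startsWith(base) || target.equals(base)) {
            throw new IllegalArgumentException("path escapes upload root: " + untrustedName);
        }
        return target;
    }

    public static void main(String[] args) {
        Path root = Paths.get("/srv/uploads");
        System.out.println(resolveInside(root, "report.txt"));
        try {
            resolveInside(root, "../../etc/passwd");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

This is purely a path computation and touches no files; file permissions
remain a second line of defence.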

The user can cause some other problems like uploading a 

Re: [ win xp and win server 2003 ] tomcat utf8 encoding

2011-04-07 Thread André Warnier

Christopher Schultz wrote:
... (RFC references) ..

Thanks for that post (with the chain of applicable RFCs).  I will keep that email 
preciously as a resource for future file upload debugging references.

...

Also, to add to the potential OP woes, there is also the fact that some browsers send the 
filename, and others send the full path of the file.
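
To make the points above concrete, here is a minimal sketch (not from the thread) of one way to handle the client-supplied name: strip any path the browser sent, keep the result only as display metadata, and write the file under a server-generated name inside a known upload root. The class name UploadNames, its method names and the ".upload" suffix are hypothetical and chosen only for illustration.

```java
import java.nio.file.Path;
import java.util.UUID;

/** Hypothetical helper; class and method names are illustrative only. */
final class UploadNames {

    /** Returns the last path segment, whether the client used '/' or '\' as separator. */
    static String baseName(String clientName) {
        int cut = Math.max(clientName.lastIndexOf('/'), clientName.lastIndexOf('\\'));
        return clientName.substring(cut + 1);
    }

    /**
     * Picks a server-generated target below uploadRoot. The client name is never
     * used as a path, so two users uploading "report.xml" cannot overwrite each other.
     */
    static Path newTarget(Path uploadRoot) {
        Path root = uploadRoot.toAbsolutePath().normalize();
        Path target = root.resolve(UUID.randomUUID() + ".upload").normalize();
        // Defensive check: refuse anything that would land outside the upload root.
        if (!target.startsWith(root)) {
            throw new IllegalStateException("target escaped upload root: " + target);
        }
        return target;
    }
}
```

The original name returned by baseName can be stored next to the file (in a database row, for instance) and shown to the user, without ever reaching the filesystem or a later shell script.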




I would hope that the OP was putting these files in some known root, so
that uploading /etc/passwd wouldn't overwrite /etc/passwd,

(I wrote  /etc/passwd as the filename)

 and that file

permissions wouldn't allow this, either. Also, unlike Perl, having a
pipe in a filename isn't a problem in Java :)

But it /may/ still be a problem if, after uploading the file and duly writing it into a 
directory, that directory is then later scanned by some separate (non-Java) program or 
script (whatever language it may be written in, even, God forbid, perl) with the purpose 
of actually doing something with these files.


There may be a lot of potential there :

for ff in /mydir/* ; do
  mv $ff /otherdir/${ff}.new
done


-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



FormAuthenticator exception with non-latin (UTF8) user name

2010-06-30 Thread Chris Rafferty
Hi,

 

I'm getting the following exception when I try to access the list of
deployed webservices (http://localhost:8080/manager/list) with a user who
has non-Latin characters in their user name:

 

java.lang.NullPointerException
    at org.apache.catalina.authenticator.FormAuthenticator.forwardToLoginPage(FormAuthenticator.java:321)
    at org.apache.catalina.authenticator.FormAuthenticator.authenticate(FormAuthenticator.java:245)
    at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:523)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:852)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
    at java.lang.Thread.run(Thread.java:619)

 

I'm running Tomcat and the browser (Firefox) on my development PC with
Windows XP. I've tried the same scenario, with the same results, on the
following Tomcat versions: 5.5.23, 5.5.29 and 6.0.26, all on Java 1.6.0_12.

 

My username and passwords are stored in a Postgres database and I have the
realm configured to access my database and tables in my server.xml, thus:

 

<Realm className="org.apache.catalina.realm.JDBCRealm"
       driverName="org.postgresql.Driver"
       connectionURL="jdbc:postgresql://localhost/uca"
       connectionName="blah" connectionPassword="blah"
       userTable="mg_users" userNameCol="username" userCredCol="password"
       userRoleTable="mg_roles" roleNameCol="role" />

 

This only occurs when I add the following valve to /conf/context.xml

 

<Valve className="org.apache.catalina.authenticator.FormAuthenticator"
       characterEncoding="UTF-8" />

 

If I do not add the valve, Tomcat will not authenticate any attempt to
access the list page with a user name containing non-Latin characters; it
works perfectly if the user name contains only Latin characters. The rationale
behind this approach: my application consists of a number of war files,
an html page and some jsp pages. A user logs in via an html page which is
submitted as a form. This works correctly for non-Latin characters when
I configure the context.xml in the application's META-INF directory with the
same valve, which is why I was trying to set it for all applications, in the
hope that this would allow the manager list to be retrieved. If I
explicitly set the valve just for the manager application I get the same
exception.

 

Any pointers to what I'm doing wrong would be greatly appreciated. 

 

Chris Rafferty

Team Leader, Sidonis

 

e:  chris.raffe...@sidonis.com

w:  www.sidonis.com

 



Re: FormAuthenticator exception with non-latin (UTF8) user name

2010-06-30 Thread Mark Thomas

On 30/06/2010 12:22, Chris Rafferty wrote:


This only occurs when I add the following valve to /conf/context.xml

 <Valve className="org.apache.catalina.authenticator.FormAuthenticator"
        characterEncoding="UTF-8" />


Bad idea on a number of levels.

1. That change then applies to *every* context, and will break any that 
don't use FORM authentication.


2. The Manager app uses BASIC authentication...


If I do not add the valve then Tomcat will not authenticate any attempt to
access the list page with a user name containing non-Latin characters, it
works perfectly if the user name contains only Latin characters.


BASIC auth, browsers and UTF-8 is a combination that I suspect behaves 
differently from browser to browser.
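
A short sketch of why BASIC authentication is fragile here: the header is just Base64 over the bytes of "user:password", so a browser that encodes the credentials as ISO-8859-1 and one that encodes them as UTF-8 send different header values for the same user name. The class and method names below are hypothetical, and java.util.Base64 assumes Java 8 or later (newer than the JVMs discussed in this thread).

```java
import java.nio.charset.Charset;
import java.util.Base64;

/** Hypothetical illustration of how the charset changes a BASIC auth header. */
final class BasicAuthHeader {

    /** Builds the Authorization header value the way a client would. */
    static String headerValue(String user, String password, Charset charset) {
        // The charset chosen here is exactly what differs between browsers.
        byte[] raw = (user + ":" + password).getBytes(charset);
        return "Basic " + Base64.getEncoder().encodeToString(raw);
    }
}
```

Calling headerValue with the same user name and password but with StandardCharsets.ISO_8859_1 and then StandardCharsets.UTF_8 yields two different strings whenever the credentials contain non-ASCII characters, so the server has to guess which decoding to apply.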


Your best bet would be to modify the manager app to use FORM auth, 
making sure your login form correctly supports UTF-8.


Mark




RE: FormAuthenticator exception with non-latin (UTF8) user name

2010-06-30 Thread Chris Rafferty
Thanks Mark,

That has worked a treat.   I changed the manager application's web.xml to
use FORM based authentication, added the valve to its context.xml rather
than specifying it globally and provided the login form. 

Cheers
Chris





RE: FormAuthenticator exception with non-latin (UTF8) user name

2010-06-30 Thread Caldarale, Charles R
 From: Chris Rafferty [mailto:chris.raffe...@sidonis.com]
 Subject: RE: FormAuthenticator exception with non-latin (UTF8) user
 name
 
 I changed the manager application's web.xml to use FORM 
 based authentication, added the valve to its context.xml

You really should not be explicitly adding the Valve; Tomcat will include the 
proper valve as required by the security specifications in the webapp's 
WEB-INF/web.xml file.

 - Chuck





Re: FormAuthenticator exception with non-latin (UTF8) user name

2010-06-30 Thread Mark Thomas

On 30/06/2010 16:20, Caldarale, Charles R wrote:

From: Chris Rafferty [mailto:chris.raffe...@sidonis.com]
Subject: RE: FormAuthenticator exception with non-latin (UTF8) user
name

I changed the manager application's web.xml to use FORM
based authentication, added the valve to its context.xml


You really should not be explicitly adding the Valve; Tomcat will include the 
proper valve as required by the security specifications in the webapp's 
WEB-INF/web.xml file.


You do need to add the valve if you want to change any of the default 
settings. Whilst the request body *should* be in UTF-8 if the login form 
is written correctly, I believe there are issues with some browsers not 
following the spec and failing to declare the character encoding being used.


Mark




Tomcat5.5.27 not processing UTF8 encoded cookies

2009-10-05 Thread realta

I've recently had to upgrade from Tomcat5.5.20 to Tomcat5.5.27. For the main
functionality of the web application to work, it needs to process UTF8-encoded
cookies to retrieve user customizations. There was no issue with the
5.5.20 version, but the 5.5.27 version is not processing the UTF8-encoded
cookie. It looks like the 5.5.27 version is ignoring the UTF8 cookies.

I did find a bug report saying the security around cookie handling has
become stricter from Tomcat5.5.26 onwards. Could anybody point me in the
right direction of a spec that outlines the correct encoding to use on
cookies that Tomcat5.5.26 and greater will accept?



Re: Tomcat5.5.27 not processing UTF8 encoded cookies

2009-10-05 Thread Mark Thomas
realta wrote:
 I've recently had to upgrade from Tomcat5.5.20 to Tomcat5.5.27. For the main
 functionality of the web application to work it needs to process a UTF8
 encoded cookies to retrieve user customizations. There was no issue with the
 5.5.20 version, but the 5.5.27 version is not processing the UTF8 encoded
 cookie. It looks like 5.5.27 version is ignoring the UTF8 cookies.
 
 I did find a bug report saying the security around cookie handling has
 become stricter from Tomcat5.5.26 onwards. Could anybody point me in the
 right direction of a spec that outlines the correct encoding to use on
 cookies that Tomcat5.5.26 and greater will accept?

https://issues.apache.org/bugzilla/show_bug.cgi?id=44679 covers most of
the discussion - just skip over the various rants.

You'll be better off with 5.5.28. I'm fairly sure more cookie handling
patches were ported across.

If the cookie is set and used from within your app, you *should* be
fine. If you have a case where that doesn't appear to be working let us
know.

If the cookie is set by a third party app, then that app may need to
change to be compliant with the cookie specs.
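
The thread does not say which encoding the application should use. One common approach, offered here only as an assumption, is to keep the cookie value pure ASCII by percent-encoding the UTF-8 bytes before setting the cookie and decoding after reading it. The class name CookieValues and its methods are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

/** Hypothetical helper that keeps cookie values ASCII-only. */
final class CookieValues {

    /** Percent-encodes the UTF-8 bytes of the value (spaces become '+'). */
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be present on every JVM.
            throw new IllegalStateException(e);
        }
    }

    /** Reverses encode(); call this on the value read back from the cookie. */
    static String decode(String cookieValue) {
        try {
            return URLDecoder.decode(cookieValue, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The encoded string would be passed to the cookie when it is created, and decode would be applied to whatever the cookie returns on later requests, so no raw non-ASCII bytes reach the Cookie header.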

Mark






Re: UTF8

2007-09-26 Thread Amnon Lahav
I just noticed the reply. I don't understand what you mean; would you mind
explaining?

On 9/23/07, Lucas Galfaso [EMAIL PROTECTED] wrote:

 hi,
   What happens if you escape every char in the XML file? That is, you
 replace character number nnn with "&#nnn;" (quotes for clarity). The
 number has to be the ISO-10646 code point of the character and, lucky
 for you, this matches Java's internal encoding.

 Regards,
   lg





Re: UTF8

2007-09-23 Thread Lucas Galfaso
hi,
  What happens if you escape every char in the XML file? That is, you
replace character number nnn with "&#nnn;" (quotes for clarity). The
number has to be the ISO-10646 code point of the character and, lucky
for you, this matches Java's internal encoding.
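
As a rough sketch of that suggestion: the helper below replaces every non-ASCII character with a decimal numeric character reference and leaves ASCII (including the XML markup itself) untouched. It iterates by code point rather than by char, so characters outside the Basic Multilingual Plane are also handled. The class and method names are hypothetical.

```java
/** Hypothetical helper that turns non-ASCII characters into "&#nnn;" references. */
final class NumericEscaper {

    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);          // full code point, not a UTF-16 half
            if (cp < 0x80) {
                out.append((char) cp);             // plain ASCII passes through unchanged
            } else {
                out.append("&#").append(cp).append(';');
            }
            i += Character.charCount(cp);          // advance by 1 or 2 chars
        }
        return out.toString();
    }
}
```

The escaped document contains only ASCII bytes, so it reads the same whichever of windows-1255, ISO-8859-1 or UTF-8 is used to decode it. This only helps if the text was decoded correctly before escaping.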

Regards,
  lg




UTF8

2007-09-22 Thread Amnon Lahav

Hi,
I'm using Tomcat 5.5 and JDK 5, along with commons.fileupload. When
uploading an XML file that contains Hebrew characters, I can't seem to get it
as UTF-8 in the servlet, though the JSP is configured for UTF-8 with:

<%@ page language="java" contentType="text/html; charset=UTF-8"
pageEncoding="UTF-8" %>

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">

<meta http-equiv="content-type" content="text/html; charset=UTF-8">

I suspect that I might be getting UTF-8, but maybe it differs from Java's
UTF-8 (that's impossible, isn't it?), because when I try to convert using
new String(stringByte, "UTF-8") it returns the same string, while with other
encodings I can see in the debugger that the content changes. I'm really in a
jam here, people: I have a deadline and I can't seem to fix this silly bug.
Any ideas?

When I open the XML with Firefox and check its properties, it says
windows-1255, but when I try using the getBytes method to initialise
stringByte it doesn't matter. I had this problem once with Tomcat, but it was
with a simple textarea input, and then I just converted from ISO-8859-1 to
UTF-8 as I described above; now I can't seem to do that.
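
A sketch of what is probably going on, assuming (as the Firefox hint suggests) that the uploaded file really is windows-1255: the raw bytes must be decoded with the charset they were written in, and only then re-encoded as UTF-8. Decoding non-UTF-8 bytes as UTF-8 silently produces replacement characters. The class and method names are hypothetical, and StandardCharsets assumes Java 7 or later rather than the JDK 5 mentioned above.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for converting uploaded bytes between charsets. */
final class DecodeUpload {

    /** Decodes raw bytes using the charset the file was actually saved in. */
    static String decode(byte[] raw, String sourceCharsetName) {
        return new String(raw, Charset.forName(sourceCharsetName));
    }

    /** Decodes with the source charset, then re-encodes the text as UTF-8 bytes. */
    static byte[] toUtf8(byte[] raw, String sourceCharsetName) {
        return decode(raw, sourceCharsetName).getBytes(StandardCharsets.UTF_8);
    }
}
```

For the file in question the call would be DecodeUpload.decode(bytes, "windows-1255"), provided the JVM ships that charset (Charset.isSupported("windows-1255") reports whether it does). The bytes should be taken from the upload item directly rather than from a String that has already been decoded with the wrong charset.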





-
To start a new topic, e-mail: users@tomcat.apache.org
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: change server default enconding -Where to set JAVA_OPTS in catalina.sh for UTF8?

2006-10-11 Thread David Delbecq
Hi Mark, not at all

1) There are 20 results for Djavax.servlet.request.encoding on Google ^^
(but I am really not sure this parameter actually exists in Tomcat).
2) URIEncoding="UTF-8" sets the encoding used for HTML links; the default
is platform dependent.

I suppose Zis wanted to set the default character encoding of the
request parameters, which is the most problematic part of HTTP
client/server communication: the client is supposed to tell the server
which encoding it used, but most of the time it omits it. The only way I
know to solve it is to write a small servlet filter that calls
request.setCharacterEncoding("UTF-8").

btw Zis, your Catalina is not starting because you need to replace your line

JAVA_OPTS=-Djavax.servlet.request.encoding=UTF-8 -Dfile.encoding=UTF-8

with the following

JAVA_OPTS="-Djavax.servlet.request.encoding=UTF-8 -Dfile.encoding=UTF-8"

(notice the little " , it's not a Tomcat problem, it's a shell rule :) )

This is even better:

JAVA_OPTS="-Djavax.servlet.request.encoding=UTF-8 -Dfile.encoding=UTF-8 ${JAVA_OPTS}"
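
To check whether -Dfile.encoding actually took effect on a given platform, a small diagnostic like the following can be run with the same JAVA_OPTS (or its output logged from inside the webapp). It is only a sketch; the class name EncodingReport is hypothetical, and it says nothing about javax.servlet.request.encoding.

```java
import java.nio.charset.Charset;

/** Hypothetical diagnostic that reports the JVM's effective default encoding. */
final class EncodingReport {

    static String describe() {
        // file.encoding is the requested value; defaultCharset() is what the JVM really uses.
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }
}
```

If the two values disagree after setting the option, the JVM did not honour it.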


regards

Mark Thomas a écrit :
 Java Development Team wrote:

  Hi everyone.
  I am trying to change the server default encoding from ISO8859_1 to UTF8.
  So far I have found 2 different solutions.
  The first one is to use the following in my catalina.sh:
  set JAVA_OPTS=-Djavax.servlet.request.encoding=UTF-8

 Never seen this before and Google returns zero hits.

  -Dfile.encoding=UTF-8

 This is a read-only option on some platforms

  The second one is to use filters, which I will try if I don't have any luck
  with the above, which seems to be the more efficient solution.

 Or 3, code your application for UTF-8 from the start.

 Whichever way you go, you will almost certainly need to set
 URIEncoding="UTF-8" on the connector.

 Mark




Re: change server default enconding -Where to set JAVA_OPTS in catalina.sh for UTF8?

2006-10-11 Thread Mark Thomas
David Delbecq wrote:
 Hi mark, not at all
 
 1) there are 20 results for Djavax.servlet.request.encoding in google ^^
 (but am really not sure this parameter really exists in tomcat)

My bad. I kept the - in front which, of course, suppressed the
results. The option isn't in the spec and isn't in the 5.5.x code
base. Maybe it is an option from an old version since the only
references appear to be Tomcat related.

 2) URIEncoding="UTF-8" set the encoding used for html link, the default
 is platform dependent.

Indeed, which is why I mentioned it. As per the docs and the spec, the
default is always ISO-8859-1 regardless of the platform default
encoding. Any query parameters in the URL will be decoded on this
basis. request.setCharacterEncoding() has no effect on this unless
useBodyEncodingForURI="true" is set on the connector.
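
The effect of that default can be reproduced outside Tomcat. The sketch below decodes the same percent-encoded query value once as UTF-8 and once as ISO-8859-1; it uses java.net.URLDecoder purely for illustration and is not how Tomcat implements its own decoding. The class name QueryDecoding is hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

/** Hypothetical illustration of decoding one query value with two charsets. */
final class QueryDecoding {

    static String decode(String rawValue, String charsetName) {
        try {
            return URLDecoder.decode(rawValue, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unknown charset: " + charsetName, e);
        }
    }
}
```

A browser that sends "åäö" from a UTF-8 page puts %C3%A5%C3%A4%C3%B6 in the URL. Decoded as UTF-8 this gives the three original characters; decoded as ISO-8859-1 it gives six characters of mojibake, which is what an application sees when the connector is left at its default.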

 The only way i know to solve it is to write a small
 servlet filter that calls request.setCharacterEncoding("UTF-8")

You can make the same call within a JSP or servlet. Where makes most
sense will vary from application to application.

The other place where encoding can trip you up is if you include
static resources within a JSP. There is a fileEncoding parameter on
the default servlet that may help.

Mark





Re: change server default enconding -Where to set JAVA_OPTS in catalina.sh for UTF8?

2006-10-11 Thread Java Development Team
I have used a servlet filter, and translating from ISO8859_1 to UTF8 now just
works throughout the whole application.

  1) there are 20 results for Djavax.servlet.request.encoding in google ^^
  (but am really not sure this parameter really exists in tomcat)

 My bad. I kept the - in front which, of course, suppressed the
 results. The option isn't in the spec and isn't in the 5.5.x code
 base. Maybe it is an option from an old version since the only
 references appear to be Tomcat related.

Adding -Djavax.servlet.request.encoding gave me correct UTF8 encoding for
servlets that insert data directly into the database, which had been problematic.
JAVA_OPTS="-Djavax.servlet.request.encoding=UTF-8 -Dfile.encoding=UTF-8 ${JAVA_OPTS}"

This is the correct one, thanks to David.


  2) URIEncoding="UTF-8" set the encoding used for html link, the default
  is platform dependent.

 Indeed, which is why I mentioned it. As per the docs and the spec, the
 default is always ISO-8859-1 regardless of the platform default
 encoding. Any query parameters in the URL will be decoded on this
 basis. request.setCharacterEncoding() has no effect on this unless
 useBodyEncodingForURI="true" is set on the connector.

I didn't know about this parameter; that is probably why URIEncoding="UTF-8"
had no effect while I was testing.

 The other place where encoding can trip you up is if you include
 static resources within a JSP. There is a fileEncoding parameter on
 the default servlet that may help.

I didn't know about this either. How do I get access to the default servlet?

Thank you Mark and David
Zissis






change server default enconding -Where to set JAVA_OPTS in catalina.sh for UTF8?

2006-10-10 Thread Java Development Team
Hi everyone.
I am trying to change the server default encoding from ISO8859_1 to UTF8.
So far I have found 2 different solutions.
The first one is to use the following in my catalina.sh:
set JAVA_OPTS=-Djavax.servlet.request.encoding=UTF-8 -Dfile.encoding=UTF-8

With this line at the start of my catalina.sh, Java does not start up
and Tomcat is not running.
Does anyone know how to fix this, or whether any specific .jars are needed, etc.?

The second one is to use filters, which I will try if I don't have any luck with the
above, which seems to be the more efficient solution.

Thank you
Zis




Re: Form based login with UTF8 and Tomcat

2006-01-10 Thread Mark Thomas
Joacim Turesson wrote:
 I have trouble with UTF-8 and form based login with Tomcat 5.5.12 together
 with Apache 2.0.55 using mod_jk 1.2.15.

See the last section of
http://tomcat.apache.org/tomcat-5.5-doc/config/valve.html

 I have a struts based application that works fine with UTF-8, but the form
 based login using jdbc realm doesn't work with åäö.
 
 I added URIEncoding="UTF-8" to the connectors in server.xml, and the
 application has a filter matching "/" (in web.xml) that encodes to UTF-8 as
 described in
 http://www.javaworld.com/javaworld/jw-05-2004/jw-0524-i18n_p.html
 

Mark


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Form based login with UTF8 and Tomcat

2006-01-09 Thread Joacim Turesson
Hi!

 

First of all, I'm sorry for my empty mail.

 

Now to my question.

 

I have trouble with UTF-8 and form based login with Tomcat 5.5.12 together
with Apache 2.0.55 using mod_jk 1.2.15.

I have a struts based application that works fine with UTF-8, but the form
based login using jdbc realm doesn't work with åäö.

 

I added URIEncoding="UTF-8" to the connectors in server.xml, and the
application has a filter matching "/" (in web.xml) that encodes to UTF-8 as
described in
http://www.javaworld.com/javaworld/jw-05-2004/jw-0524-i18n_p.html

 

Before, when I used ISO-8859-1, form based login in Tomcat worked fine with
åäö.

 

Thanks in advance!

 

Best Regards

 

Joacim Turesson