Bug #31265 has not been fixed yet. Tell me if anybody has an issue with the patch I provided. It's not perfect, but I think it could be very useful, as it allows Slide to support more "standard" clients (indeed, it is very useful for some of my customers :-)
thomas
On 17 Sept. 04, at 15:18, Thomas Draier wrote:
i cannot completely guess the encoding, but from what i've seen it is either utf-8 or a regional encoding, which can be specified in the slide configuration. of course it is impossible to guess whether it is 8859-1 or 8859-5; that's why on this point i rely on the configuration, since the right value depends more on the geographical location of your clients. but it is easy to check whether a value is utf-8 or not, and even though a sequence encoded in iso-8859-x can look like utf-8, that is rare - at least with 8859-1. a multi-byte utf-8 sequence is composed of a lead byte between 0xC0 and 0xFC followed by a precise number of continuation bytes between 0x80 and 0xBF. so if your filename is xxÃ©xx (A with tilde - copyright sign, i.e. bytes 0xC3 0xA9 in 8859-1) and you use a non-utf-8 client, that pair will be translated to é (e with acute). in that case there's no other choice than to either rename your file or use a utf-8 capable client. that's how utf-8 and other encodings are detected in the decodeString method.
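to make the check concrete, here is a minimal sketch. it is not the decodeString code from the patch: the class and method names are hypothetical, and instead of hand-checking byte ranges it relies on the jdk's strict utf-8 decoder to decide whether the bytes are well-formed utf-8, falling back to the configured regional charset otherwise.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of Slide.
final class EncodingGuess {

    /** Returns true if the bytes form a well-formed UTF-8 string (pure ASCII included). */
    static boolean looksLikeUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false; // some byte does not fit the lead/continuation pattern
        }
    }

    /** Decodes as UTF-8 when possible, otherwise with the configured fallback charset. */
    static String decode(byte[] data, Charset fallback) {
        return new String(data, looksLikeUtf8(data) ? StandardCharsets.UTF_8 : fallback);
    }
}
```

with iso-8859-1 as the fallback, the single byte 0xE9 decodes to é, while the pair 0xC3 0xA9 is also read as é even if the client meant two 8859-1 characters. that is the rare ambiguous case mentioned above.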
thomas
On 17 Sept. 04, at 14:07, Jacob Lund wrote:
The HTTP protocol does not include any information about the encoding of the URI - how do you intend to guess/find it? I have been working on this too, and I came to the conclusion that it would be impossible.
All I would like is simply to deny all requests that do not use utf-8 encoding.
/jacob
----- Original Message -----
From: "Thomas Draier" <[EMAIL PROTECTED]>
To: "Slide Developers Mailing List" <[EMAIL PROTECTED]>
Sent: Friday, September 17, 2004 1:16 PM
Subject: Re: slide and encodings
On 17 Sept. 04, at 12:20, Jacob Lund wrote:
All I did was test with utf-8 and it seems to work.
Also be careful about windows explorer. The encoding used by windows explorer differs with the combination of windows version and office version installed on your machine. I have windows XP and office 2003 - and it seems to do a correct utf-8 encoding of the uri.
correct - i've tested the default version without office 2003. i have
also tested a version with office installed, which also works, but i
did not check in depth what was sent by this version. if it is a correct
url-encoded utf-8 that's nice (and that's probably the best way) - but
it is still different from my other clients, except maybe cadaver :-) .
so you may need to change the server configuration depending on the
client you use. but our users often want to use different clients, and
sometimes (no, always) use different versions of the windows explorer
(with or without office). we used to tell them to replace the dll to
get the version that is bundled with office, but that is not always
easy to do.
In general, clients that encode using the client character set do not really impress me :-)
When I use the latest slide and TC 5 with the slide encoding set to utf-8 and the TC connector set to utf-8 encoding, everything works for me. However, I only use clients that encode to utf-8.
yep, that was my problem, i use many different clients :-)
I am not sure I understand your test correctly. Do you want a server configuration that can accept both utf-8 and other encodings?
yes - in fact i'd like the server to understand all client requests,
whatever encoding it uses internally - at least for the latin
extensions and supplement. i do not really like the idea of setting
parameters in the server that restrict what kind of clients you can
use. if the database or filesystem supports only a single encoding, you
can set the useUtf8 parameter to false and all data will be restricted
to the specified charset - but requests encoded in utf-8 should still
work. if the useUtf8 parameter is set to true, the encoding parameter
will only be used as the charset for clients that do not send utf-8 -
that should make the server work for all clients in a specific region.
thomas
/Jacob
----- Original Message -----
From: "Thomas Draier" <[EMAIL PROTECTED]>
To: "Slide Developers Mailing List" <[EMAIL PROTECTED]>
Sent: Friday, September 17, 2004 11:40 AM
Subject: Re: slide and encodings
interesting, i did not know about these parameters. i just tried on
tomcat 5: the result of the getPathInfo() method is clearly different,
url-encoded chars are now interpreted as utf-8. the useURIValidationHack
was already set to false - when setting it to false, some (non-ascii)
bytes are transformed to the '-' char. but the problem is that in
general, clients send iso-8859-1 or windows-1252, except for weird
clients that do not know how to send non-iso characters. for example,
if my filename is \u00E9-\u0153-\u20AC-\u0430.txt (LATIN SMALL LETTER E
WITH ACUTE - LATIN SMALL LIGATURE OE - EURO SIGN - CYRILLIC SMALL
LETTER A), it will be sent by windows explorer as
%E9-[C5][93]-[E2][82][AC]-[D0][B0].txt where [xx] is the byte xx - as
you can see, the first char is url-encoded as iso-8859-1, while the
other chars are not url-encoded and are written as utf-8. just to
compare, webdrive will send for the same file the sequence
%E9-%9C-%80-? - which is completely url-encoded in windows-1252 (LATIN
SMALL LIGATURE OE does not exist in iso-8859-1 but is %9C in
windows-1252; EURO SIGN is %A4 in 8859-15 but %80 in windows-1252). the
strange thing is that the cyrillic letter is passed as a ? - which is
interpreted by tomcat as the query-string separator, which then cuts
off the end of the string - for both getPathInfo and getRequestURI.
whatever the encoding set in tomcat, the request is not properly
decoded to the original string. that's why i had to bypass the tomcat
decoding and write another method that tries to guess the encoding from
the raw data.
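the two requests above can be replayed with a small sketch. it assumes a hypothetical helper (MixedUriDecoder is not the code from the patch) that first turns %XX escapes into raw bytes, then decodes each well-formed utf-8 sequence as utf-8 and every other non-ascii byte with the configured fallback charset. it only checks the lead/continuation structure, so overlong forms are not rejected.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not the actual Slide decodeString.
final class MixedUriDecoder {

    /** Turns %XX escapes into raw bytes; any other char is taken as one raw byte. */
    static byte[] percentDecode(String raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c == '%' && i + 2 < raw.length()
                    && Character.digit(raw.charAt(i + 1), 16) >= 0
                    && Character.digit(raw.charAt(i + 2), 16) >= 0) {
                out.write(Integer.parseInt(raw.substring(i + 1, i + 3), 16));
                i += 2; // skip the two hex digits just consumed
            } else {
                out.write(c & 0xFF);
            }
        }
        return out.toByteArray();
    }

    /** Continuation bytes announced by a lead byte, or -1 if b is not a lead byte. */
    static int continuationCount(int b) {
        if (b >= 0xC0 && b <= 0xDF) return 1;
        if (b >= 0xE0 && b <= 0xEF) return 2;
        if (b >= 0xF0 && b <= 0xF7) return 3;
        return -1;
    }

    /** Decodes UTF-8 sequences as UTF-8 and stray high bytes with the fallback charset. */
    static String decodeMixed(byte[] data, Charset fallback) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < data.length) {
            int b = data[i] & 0xFF;
            if (b < 0x80) { // plain ascii
                sb.append((char) b);
                i++;
                continue;
            }
            int n = continuationCount(b);
            boolean utf8 = n > 0 && i + n < data.length;
            for (int k = 1; utf8 && k <= n; k++) {
                int c = data[i + k] & 0xFF;
                utf8 = c >= 0x80 && c <= 0xBF;
            }
            if (utf8) {
                sb.append(new String(data, i, n + 1, StandardCharsets.UTF_8));
                i += n + 1;
            } else {
                sb.append(new String(data, i, 1, fallback));
                i++;
            }
        }
        return sb.toString();
    }
}
```

with iso-8859-1 as the fallback, the windows explorer form gives back \u00E9-\u0153-\u20AC-\u0430.txt, and with windows-1252 as the fallback, %E9-%9C-%80 gives back \u00E9-\u0153-\u20AC. the cyrillic letter that webdrive replaces with a ? is lost before the request reaches the server, so no decoder can recover it.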
thomas
On 17 Sept. 04, at 09:19, Jacob Lund wrote:
Did you remember to set the encoding in the connector in server.xml?
In TC 5: add URIEncoding="UTF-8" as a parameter to the connector.
In TC 4.1 (Coyote connector): add useURIValidationHack="false" as a parameter to the connector.
/jacob
----- Original Message -----
From: "Thomas Draier" <[EMAIL PROTECTED]>
To: "Slide Developers Mailing List" <[EMAIL PROTECTED]>
Sent: Thursday, September 16, 2004 5:00 PM
Subject: slide and encodings
hi,
back with my encoding problems, when using strange characters in filenames ..
i've made some tests on different servers/platforms/clients - my base config is a tomcat 5.0.27 on mac os; i've also tested with tomcat 4.1.29, and moved the server to a windows xp machine. as clients, i used windows explorer, webdrive (from south river), cadaver on mac os, the mac os finder webdav client (almost unusable because of its specific mac encoding), cadaver on linux, and konqueror on linux. of course, each configuration gave me different results :-)
the getPathInfo() method is supposed to return a decoded path - but the behaviour differs between containers when strange characters are sent. tomcat 4 and 5 do not return the same value. i replaced it with a parsing of getRequestURI(), which returns what the client has sent without decoding - and behaves the same with either tomcat 4 or 5, and hopefully with other servers too.
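as a rough sketch of that parsing (an illustration under assumptions, not the code from the patch): the hypothetical helper below takes the values that getRequestURI(), getContextPath() and getServletPath() would return, and strips the context and servlet prefixes from the raw uri, leaving the still-encoded path for a custom decoder. it assumes both prefixes are plain ascii and appear unencoded in the uri.

```java
// Hypothetical helper for illustration only; not part of Slide or the servlet API.
final class RawPath {

    /**
     * Returns the undecoded path that follows the context and servlet
     * prefixes of the raw request URI, or "/" if nothing follows them.
     */
    static String rawPathInfo(String requestUri, String contextPath, String servletPath) {
        String prefix = contextPath + servletPath;
        String path = requestUri.startsWith(prefix)
                ? requestUri.substring(prefix.length())
                : requestUri; // unexpected layout: keep the raw URI untouched
        return path.isEmpty() ? "/" : path;
    }
}
```

for a request uri of /slide/files/%E9.txt with context path /slide and servlet path /files, this yields /%E9.txt, with the percent escape left intact.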
i've also found some problems with the urlEncoding configuration parameter. even if the system is configured as utf-8, some clients can still send a mix of utf-8 and another encoding - so i added another parameter in Configuration to define whether utf8 should be used or not, and kept the other one as a "secondary" encoding. i changed the decodeString method (the method i previously added), which now decodes either utf-8 or the encoding specified in the configuration, and i updated the fixTomcatURL() method to work with these changes.
i changed the propfind method so that the encoding declared in the xml response matches the encoding being used (it was always returning "utf-8").
the "Destination" header should be decoded the same way as the url - for all the clients i've tested, the same form is used - getHeader() works like getRequestURI(), and does not decode anything. i do not know about the "Label" header - the clients i have, except the slide client, do not support it for now.
finally, i added a transformation for the ? character. that character does not work with most of the clients i've used, but it appears when a character not supported by the encoding specified in the configuration is sent, and it then makes the file unusable.
now that seems to work fine, whatever the server, client, or encoding being used .. for what i've tested .. but i'm sure we can still find some other problems :-)
there are 5 modified files - slide.properties, Configuration, AbstractWebdavMethod, PropFindMethod and WebdavUtils - should i send the patches here or attach them in bugzilla ?
thomas
------------------------------------------------------------------- --
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
