"Pill, Juergen" wrote:
>
> Hello,
>
> We found the problem when we were using German umlauts in WebFolders and
> Office. Both client and server were running on the same machine, which is
> why the fix seems to work.
>
> I have noticed the following (via protocol sniffers):
>
> 1) WebFolders sends a German "ue" as %fc, which is exactly what the
> java.net classes encode the "ue" to and decode it from.
That's what you'd expect if it's just using ISO8859-1. It's wrong, since
there's no indication at that point that it is ISO8859-1. (If the URL is in
the body, the encoding is explicitly declared to be UTF-8; if it's in the
headers, I don't know, but it's probably ambiguous, which probably means we
can't use other character sets.)
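For reference, here's a quick sketch of the difference (this assumes the
two-argument URLEncoder/URLDecoder overloads from JDK 1.4; the older
one-argument versions silently use the platform default encoding):

    import java.net.URLDecoder;
    import java.net.URLEncoder;

    public class UmlautDemo {
        public static void main(String[] args) throws Exception {
            // ISO-8859-1: u-umlaut is the single byte 0xFC
            System.out.println(URLEncoder.encode("\u00FC", "ISO-8859-1")); // %FC
            // UTF-8: the same character becomes the two bytes C3 BC
            System.out.println(URLEncoder.encode("\u00FC", "UTF-8"));      // %C3%BC
            // Decoding what WebFolders sends only works if we assume Latin-1
            System.out.println(URLDecoder.decode("%fc", "ISO-8859-1"));    // "ue"
        }
    }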
> 2) Office can read this %fc and displays it as "ue", but depending on the
> request it sometimes sends %fc and sometimes a differently encoded
> character tuple, which Slide was able to handle.
> Encoding an "ue" as %fc in the XML PROPFIND response (which is encoded via
> UTF-8) is definitely wrong, but it did work (strange). The right UTF-8
> encoding for B"ue"cher would be (hope it goes through the e-mail):
> B"Ã¼"cher (hex: C3 BC)
>
> This whole I18N stuff is really ....
>
> The HTTP standard states that a URL may not contain anything except
> US-ASCII, but (for the integration into the file system) the user can
> specify more than US-ASCII at the user level. The standard would allow us
> to reject URLs containing "special" characters.
>
> What do you think about the following idea:
>
> If the URL is sent/received via the header/command part, we de/encode it
> via the java.net classes ("ue" <--> %fc).
> If the URL is wrapped in XML (e.g. PROPFIND), we encode it via UTF-8.
>
> Best regards
>
> Juergen
>
> We are starting to build up a test suite for I18N; unfortunately this is a
> rather slow process, given all the layers and different clients involved.
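If I read the proposal right, it amounts to something like this (a rough
sketch only; the class and method names are mine, not anything that exists
in Slide):

    import java.io.UnsupportedEncodingException;
    import java.net.URLDecoder;
    import java.net.URLEncoder;

    public class ProposedUrlCodec {
        // URLs on the request line / in headers: Latin-1, as WebFolders sends them
        public static String decodeHeaderUrl(String raw)
                throws UnsupportedEncodingException {
            return URLDecoder.decode(raw, "ISO-8859-1"); // "%fc" -> "ue"
        }

        // URLs inside XML bodies (e.g. PROPFIND responses): UTF-8
        public static String encodeBodyUrl(String url)
                throws UnsupportedEncodingException {
            return URLEncoder.encode(url, "UTF-8"); // "ue" -> "%C3%BC"
        }
    }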
The java.net stuff has other undesirable properties in how it works, so
this is probably not the best idea. This is also guaranteed to be wrong
for anything other than ISO8859-1 (latin-1), which is also bad (err... I
think).
If it's needed for interoperability (and it seems it is), perhaps you
should make it explicit that that's what you're doing, by using the same
URLUtil encoder/decoder but explicitly specifying ISO8859-1. That _should_
behave the same as the java.net stuff, except without causing problems with
things like spaces. And possibly make that overridable by an option
somewhere?
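Something like this, maybe (again only a sketch; the class name and the
option are made up, not the actual URLUtil API; the point is the explicit
charset and the space handling):

    import java.io.UnsupportedEncodingException;

    public class ExplicitUrlEscaper {
        // Default to Latin-1 for WebFolders interoperability;
        // intended to be overridable via a configuration option
        private String charset = "ISO-8859-1";

        public void setCharset(String charset) {
            this.charset = charset;
        }

        public String escape(String path) throws UnsupportedEncodingException {
            byte[] bytes = path.getBytes(charset);
            StringBuffer out = new StringBuffer();
            for (int i = 0; i < bytes.length; i++) {
                int b = bytes[i] & 0xFF;
                // Unlike java.net.URLEncoder: keep '/' as-is and escape
                // space as %20 instead of '+'
                if ((b >= 'a' && b <= 'z') || (b >= 'A' && b <= 'Z')
                        || (b >= '0' && b <= '9')
                        || b == '/' || b == '.' || b == '-' || b == '_') {
                    out.append((char) b);
                } else {
                    out.append('%');
                    if (b < 0x10) {
                        out.append('0');
                    }
                    out.append(Integer.toHexString(b).toUpperCase());
                }
            }
            return out.toString();
        }
    }

With the charset set to UTF-8 the same code produces %C3%BC, so the switch
stays in one place.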
Anyway, some ideas to think about...
Michael