True,
it's the browser that encodes the special chars, I think. I sometimes had problems with
unencoded URLs in Netscape, but IE always translates them correctly.
Birte Glimm
-----Original Message-----
From: Kitching Simon [mailto:[EMAIL PROTECTED]]
Sent: Friday, January 5, 2001 11:58
To: '[EMAIL PROTECTED]'
Subject: off-topic: handling non-ascii characters in URLs
Hi All,
While following a related thread (RE: a simple test to charset),
a question occurred to me about charset encodings in URLs.
This isn't really tomcat-related (more to do with HTTP standards)
but thought someone here might be able to offer an answer.
When a webserver sends content to a browser, it can indicate
the character data format (ASCII, Latin-1, UTF-8, etc.) in an HTTP
header. However, how is the character data type specified for data
sent *by* a browser *to* a webserver (i.e. a GET or POST action)?
Andre Alves had an example where an e-accent character
was part of the URL. I saw that IE4 replaced this character
with %E9 when submitting a form using GET method, but this
really assumes that the receiving webserver is using latin-1.
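
To make the ambiguity concrete, here is a minimal Python sketch, assuming
the standard-library urllib.parse module. The helper name encode_char is
hypothetical and only for illustration; it is not what any browser or
server actually runs. It shows that the same character percent-encodes
differently depending on the charset, and that a receiver guessing the
wrong charset cannot recover it.

```python
from urllib.parse import quote, unquote


def encode_char(ch: str, charset: str) -> str:
    """Percent-encode a string using the given charset (illustrative helper)."""
    return quote(ch, encoding=charset)


E_ACUTE = "\u00e9"  # the e-accent character from the example

# The sender picks a charset; nothing in the URL records which one.
latin1_form = encode_char(E_ACUTE, "latin-1")  # single byte 0xE9
utf8_form = encode_char(E_ACUTE, "utf-8")      # two bytes 0xC3 0xA9

# The receiver has to guess. A Latin-1 guess recovers the character...
decoded_right = unquote(latin1_form, encoding="latin-1")

# ...while a UTF-8 guess sees an invalid byte sequence and substitutes
# U+FFFD (unquote defaults to errors="replace").
decoded_wrong = unquote(latin1_form, encoding="utf-8")

print(latin1_form, utf8_form, repr(decoded_right), repr(decoded_wrong))
```

Either encoded form is pure ASCII on the wire, so the URL itself stays
valid; what is lost is the information about which charset produced the
bytes.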
There is this thing called an "entity-header" defined in the HTTP
specs, which may contain a "content-encoding" entry. This seems
to cover POST requests OK then, as the POSTed data is in an entity-body,
and therefore an entity-header can be used to define its encoding.
But the URLs themselves cannot have their encoding specified by
an entity-header, because they are not in an entity-body. So does
this mean that all URLs should be restricted to ASCII, and forms
should not use the GET method unless their data content is guaranteed
to be all-ASCII? I remember seeing an article recently about domain
names now being available in Asian ideogram characters, which seems
to indicate otherwise...
Any comments?
Cheers,
Simon