On Fri, 4 Jan 2008 09:16:54 -0500, tedd wrote:
> At 10:33 AM +0100 1/4/08, Nisse Engström wrote:
>>On Thu, 3 Jan 2008 12:39:36 -0500, tedd wrote:
>
> Nisse:
>
> I thank you for your most enlightened and informative reply.
>
> I cut/pasted your post into my list of things to remember.
A few more random thoughts on form submission:
How does the browser know which character encoding
to use in the form submission? Well, lacking any other
guidance, I believe most browsers tend to use the
encoding that was used in the document where the form
is located.
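
To see why the choice matters, here is a small Python sketch. Python is used only for illustration, and the field name and value are made up. It percent-encodes the same form field the way an application/x-www-form-urlencoded submission would, once as UTF-8 and once as ISO-8859-1. The server receives different bytes for the same text:

```python
from urllib.parse import urlencode

# A made-up form field containing one non-ASCII character (U+00F6).
field = {"name": "Nisse Engström"}

# Same text, two different submission encodings.
as_utf8 = urlencode(field, encoding="utf-8")
as_latin1 = urlencode(field, encoding="iso-8859-1")

print(as_utf8)    # name=Nisse+Engstr%C3%B6m
print(as_latin1)  # name=Nisse+Engstr%F6m
```
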
So what guidance can you give to the browser? The
<form> element has an attribute `accept-charset´ that
can be used to specify a list of acceptable character
encodings. However, something in the back of my mind
tells me I've read that this is not widely supported
by browsers, but I could easily be wrong about this.
In any case, a browser can choose a different encoding
if none of those specified are supported.
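
The fallback just described can be modelled roughly as below. This is a hypothetical sketch of the decision, not any browser's actual algorithm. `choose_encoding` and its arguments are invented for illustration, and Python's `codecs.lookup` stands in for "encodings the browser supports":

```python
import codecs

def choose_encoding(accept_charset, document_encoding):
    """Hypothetical model: pick the first supported encoding listed in
    accept-charset, otherwise fall back to the page's own encoding."""
    # HTML 4 allows the list to be separated by spaces and/or commas.
    candidates = accept_charset.replace(",", " ").split()
    for name in candidates:
        try:
            codecs.lookup(name)  # raises LookupError for unknown names
            return name
        except LookupError:
            continue
    # Nothing usable in the list (or no list at all): use the page encoding.
    return document_encoding

print(choose_encoding("x-unknown UTF-8", "ISO-8859-1"))  # UTF-8
print(choose_encoding("x-unknown", "ISO-8859-1"))        # ISO-8859-1
```
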
[The name `accept-charset´ is somewhat unfortunate
because it confuses two different concepts: A character
set is a repertoire of characters, while a character
encoding is a way to translate (serialize) the
characters into a byte sequence. UTF-8 and UTF-16 both
cover the same character set (Unicode), but they
encode its characters in very different ways.]
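
A quick Python illustration of the distinction: the character é is the same member of the Unicode character set (code point U+00E9) in both cases, but UTF-8 and UTF-16 serialize it to different bytes. Big-endian UTF-16 without a byte-order mark is used here only to keep the output short:

```python
ch = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE: one character, one code point

print(hex(ord(ch)))            # 0xe9        (the character set part)
print(ch.encode("utf-8"))      # b'\xc3\xa9' (the UTF-8 encoding)
print(ch.encode("utf-16-be"))  # b'\x00\xe9' (the UTF-16 encoding)
```
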
The general rule seems to be: Browsers tend to submit
the form in the same character encoding that the page
was served in.
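
In practice this means the server-side code has to assume that the submission came back in the encoding the page was sent in. The sketch below shows what happens when that assumption is right and when it is wrong. Python's `urllib.parse.parse_qs` stands in for whatever the server actually uses, and the query string is made up:

```python
from urllib.parse import parse_qs

# Made-up submission from a page that was served as UTF-8.
query = "name=Nisse+Engstr%C3%B6m"

right = parse_qs(query, encoding="utf-8")
wrong = parse_qs(query, encoding="iso-8859-1")

print(right["name"][0])  # Nisse Engström
print(wrong["name"][0])  # Nisse EngstrÃ¶m  (classic mojibake)
```
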
This brings up another problem: How do you *know*
which character encoding was actually used? Apparently,
this problem was overlooked when HTTP was
devised. The only way (according to the HTML spec) is
to use a POST request with enctype=multipart/form-data,
but I don't think PHP makes the Content-Type information
available to the script, so this is no help.
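
For reference, this is roughly what such a request looks like on the wire, and how a charset could be read back from it. The sketch uses Python's `email` package as a stand-in MIME parser. The boundary, field name and value are made up. Whether a browser actually sends a `Content-Type` with a `charset` parameter for each text field varies, so treat this as an illustration of what the spec allows rather than what you will reliably receive:

```python
import email

# Made-up multipart/form-data request (headers + body), CRLF line endings.
raw = (
    b"Content-Type: multipart/form-data; boundary=XyZ\r\n"
    b"\r\n"
    b"--XyZ\r\n"
    b'Content-Disposition: form-data; name="name"\r\n'
    b"Content-Type: text/plain; charset=ISO-8859-1\r\n"
    b"\r\n"
    b"Nisse Engstr\xf6m\r\n"
    b"--XyZ--\r\n"
)

def field_charsets(raw_request):
    """Map each form field name to (declared charset, decoded value)."""
    msg = email.message_from_bytes(raw_request)
    fields = {}
    for part in msg.get_payload():  # one sub-message per form field
        name = part.get_param("name", header="content-disposition")
        charset = part.get_content_charset()  # lower-cased, or None if absent
        body = part.get_payload(decode=True)  # the raw bytes of the field
        # Assumption: no declared charset means UTF-8.
        fields[name] = (charset, body.decode(charset or "utf-8"))
    return fields

print(field_charsets(raw))  # {'name': ('iso-8859-1', 'Nisse Engström')}
```
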
Someone (Ian Hickson?) came up with a fix for this:
If you add the following form control:
<input type=hidden name=_charset_>
most modern browsers will fill in the name of the encoding
they used for the form submission.
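
On the receiving side, the `_charset_` value can then be used to decode the rest of the submission. The following is a minimal Python sketch, assuming a urlencoded query string. `decode_form` is a made-up helper. The fallback to UTF-8 when the field is missing or names an unknown encoding is an assumption, not something the browsers promise:

```python
import codecs
from urllib.parse import parse_qs

def decode_form(query, default="utf-8"):
    """Hypothetical helper: decode a urlencoded form using its _charset_ field."""
    # First pass with ISO-8859-1: every byte maps to exactly one character,
    # so the ASCII-only _charset_ value survives whatever the real encoding is.
    first_pass = parse_qs(query, encoding="iso-8859-1")
    declared = first_pass.get("_charset_", [default])[0]
    try:
        codecs.lookup(declared)
    except LookupError:
        declared = default  # assumption: unknown label, fall back to default
    # Second pass with the encoding the browser said it used.
    fields = parse_qs(query, encoding=declared)
    fields.pop("_charset_", None)
    return declared, fields

print(decode_form("_charset_=ISO-8859-1&name=Nisse+Engstr%F6m"))
# ('ISO-8859-1', {'name': ['Nisse Engström']})
print(decode_form("_charset_=UTF-8&name=Nisse+Engstr%C3%B6m"))
# ('UTF-8', {'name': ['Nisse Engström']})
```
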
- - -
More reading:
W3C on Internationalization:
<http://www.w3.org/International/>
(Even experts get it wrong. Spot the bug!)
W3C on character sets and encodings:
<http://www.w3.org/International/getting-started/characters>
Wikipedia on Character encoding:
<http://en.wikipedia.org/wiki/Character_encoding>
"This entire encoding process is more involved
than it looks"
/Nisse
--
PHP General Mailing List (http://www.php.net/)