[PHP] UTF-8/FormMail headaches

2002-05-22 Thread Peter Johansson

Hi all,

I've got a problem with character encoding in combination with a
FormMail-script (coded in PHP). Everything works fine as long as I stick
to ISO-8859-1 as charset, but when I call the script from pages that use
UTF-8 as encoding, special characters (e.g. those special chars with dots
and circles above that we tend to use here in Sweden) end up garbled. The
encoding is set with a meta-tag in case the document is in UTF-8, like so:

<meta http-equiv="content-type" content="text/html; charset=UTF-8"/>

The server hasn't got any special charset configured, so it should deliver
ISO-8859-1 (which is the default if I'm not mistaken) on those pages
which haven't got that meta-tag.

I can understand that the characters are garbled, using different charsets
and all, but how can I make my FormMail script cope with both variants
of encoding? I've played around with phpinfo() to see if the encoding is
available in some environment variable, but I haven't found anything of
interest. There's a function utf8_decode() in PHP that seems to work for
converting the UTF-8 form data to ISO-8859-1 prior to sending the mail;
the problem is that it doesn't work all that well on data already in
ISO-8859-1. I guess I need some way to determine what encoding the posted
data is in?
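
One way to guess the encoding without touching the forms is to check
whether the posted bytes form valid UTF-8, and convert only in that case.
The following is a minimal sketch, not a definitive fix: it assumes the
mbstring extension is loaded, the function name formmail_to_latin1() is
made up for illustration, and the check is a heuristic (ISO-8859-1 text
that happens to be valid UTF-8 is rare but possible).

```php
<?php
/**
 * Return $value as ISO-8859-1, converting only when it looks like UTF-8.
 * Assumption: the mbstring extension is available.
 */
function formmail_to_latin1(string $value): string
{
    // Pure ASCII is byte-identical in both encodings; nothing to do.
    if (!preg_match('/[\x80-\xFF]/', $value)) {
        return $value;
    }
    // High bytes that form valid UTF-8 sequences are treated as UTF-8.
    if (mb_check_encoding($value, 'UTF-8')) {
        return mb_convert_encoding($value, 'ISO-8859-1', 'UTF-8');
    }
    // Otherwise assume the data is already ISO-8859-1 and leave it alone.
    return $value;
}
```

Characters that have no ISO-8859-1 equivalent cannot survive the
conversion, so this only helps when the mail really has to go out as
ISO-8859-1.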

The content-type after the form has been posted to my FormMail-script
always seems to be application/x-www-form-urlencoded no matter what I
try. I've looked at the HTML-specs at www.w3.org and even tried to set the
enctype to application/x-www-form-urlencoded; charset=ISO-8859-1 in an
attempt to make the form-data use a different encoding than the rest of
the page, but without success. I've also tried the accept-charset
attribute, but I couldn't get that to work either.

Does anyone have a clue on this? I guess I could alter the script somehow
and add a new hidden field that indicates which encoding is used, and then
decide whether to call utf8_decode() on the data, but I don't think that's
a very nice solution. And changing all pages that use FormMail is out of
the question (well, almost anyway) since there are _a lot_ of them.
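
For reference, the hidden-field variant described above might look
roughly like this. It is only a sketch under stated assumptions: the
field name "charset" and the function name formmail_normalize() are
hypothetical, mbstring is assumed to be loaded, and forms without the
field are assumed to post ISO-8859-1.

```php
<?php
// Each UTF-8 form would carry, for example:
//   <input type="hidden" name="charset" value="UTF-8">
// (the field name "charset" is an assumption made for this sketch).

/**
 * Convert posted fields to ISO-8859-1 when the form declares itself UTF-8.
 * Forms without the hidden field are assumed to post ISO-8859-1 already.
 */
function formmail_normalize(array $post): array
{
    $declared = (isset($post['charset']) && is_string($post['charset']))
        ? strtoupper($post['charset'])
        : 'ISO-8859-1';
    unset($post['charset']); // keep the marker out of the mail body

    if ($declared !== 'UTF-8') {
        return $post;
    }
    foreach ($post as $name => $value) {
        if (is_string($value)) {
            $post[$name] = mb_convert_encoding($value, 'ISO-8859-1', 'UTF-8');
        }
    }
    return $post;
}
```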

Regards,
Peter


-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php




Re: [PHP] UTF-8/FormMail headaches

2002-05-22 Thread Miguel Cruz

For detection of encoding, perhaps you could include a hidden field with
some shibboleth characters. Research how they are transformed by various
encodings, and then just look at them to figure out what format the rest
of the data is in.

miguel
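
The shibboleth idea above could be sketched as follows. The field name
"probe", the choice of the letter a-with-ring as the probe character, and
the function name formmail_guess_charset() are all assumptions made for
illustration. Writing the probe as the entity &aring; keeps the HTML
source independent of the page's own encoding; the browser submits the
decoded character in whatever encoding it uses for the form.

```php
<?php
// Each form would carry a probe such as:
//   <input type="hidden" name="probe" value="&aring;">
// (field name and probe character are assumptions for this sketch).

/**
 * Guess the encoding of a form post from how the probe character arrived.
 * Returns 'UTF-8', 'ISO-8859-1', or null if the probe is missing or unknown.
 */
function formmail_guess_charset(array $post): ?string
{
    $probe = $post['probe'] ?? null;
    if ($probe === "\xC3\xA5") {   // a-with-ring as two UTF-8 bytes
        return 'UTF-8';
    }
    if ($probe === "\xE5") {       // a-with-ring as one ISO-8859-1 byte
        return 'ISO-8859-1';
    }
    return null;                   // no probe, or an encoding not covered here
}
```

A null result lets the script fall back to its current behaviour for
pages that have not been given the probe field.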





Re: [PHP] UTF-8/FormMail headaches

2002-05-22 Thread Peter Johansson

On Wed, 22 May 2002, Miguel Cruz wrote:

> For detection of encoding, perhaps you could include a hidden field with
> some shibboleth characters. Research how they are transformed by various
> encodings, and then just look at them to figure out what format the rest
> of the data is in.

Yes, that could probably work, but I'd still have to modify all those 
pages with forms and I'd rather not (they're not my own pages, but my 
customer's). I'm sure there must be something I've missed, something 
simple, something elegant?

Regards,
Peter

