On Fri, 2008-05-09 at 12:56 +0900, Darren Cook wrote:
> (This is a reply to a problem in the archives, from March:
> http://marc.info/?l=php-i18n&m=120595161128203&w=2 )
Hi Darren,

Thanks for your answer! I actually planned to write a more detailed answer, but I couldn't find the time to finish it and finally lost everything I had prepared because of a hard disk problem :(

I often get email with mojibake (scrambled subjects etc.) here in Japan and, after looking at other people's code, had the impression that most people handle this kind of problem by trial and error until their personal problem is solved rather than by reading the official coding standards. The code used by ec-cube, for example, seems to suffer from the same problem I encountered. The result is buggy email programs all over, which often do not work with anything other than JIS, SJIS and EUC. And the only way to deal with their "personal" approach to character coding therefore seems to be to fall back on the same strategy: trial and error. Not very satisfying. But I finally did the same thing, as this seemed to be the only way to make my problem vanish. (See my hackish code appended.)

Probably the answers I got from other people when asking them why they do not use UTF-8 reflect this kind of experience and the beliefs caused by it rather than the facts about encoding methods and standards...

Here is a list of some of the answers I got (as I got them):

- Most cell phones don't work with UTF-8.

...this is the single most important answer I got, as lots of people in Japan use the time they spend in the subway to read and write their email on their cell phone. Only some cell phones work with UTF-8, most don't. And I was often told by friends that my email program (I normally use UTF-8) "doesn't work correctly" :) More often I just didn't get any answer at all... Based on this experience it is only natural that people don't switch to UTF-8. And even if more and more of the newer programs also work with UTF-8, it will probably still take a while until this "tradition" in the Japanese software developer community changes.

Continuing with the answers:

- I don't like UTF-8.
  It is too new, everybody is used to JIS, and when using UTF-8 there are always lots of problems. With JIS things work well out of the box.
- The file size becomes bigger, as there are so many different characters which have to be encoded and Japanese characters are encoded with three bytes in UTF-8.
- There are too many different versions of UTF-8, which creates problems. There is only one version of JIS and therefore no version problems arise.
- I only use UTF-8 if absolutely necessary, for example when Chinese and Japanese texts are on the same page.
- When using UTF-8 the characters do not look nice.
- There is no need for UTF-8: Japanese and ASCII is all we need in normal circumstances, why bother about other languages?
- Similar characters are grouped together and differences between similar Japanese characters get lost.
- Doesn't look good.
- When only Chinese or only Japanese it looks good, when mixing languages the page gets ugly.

> <minor rant>
> Later in the thread Tomas suggested using UTF-8 instead of ISO-2022-JP,
> and getting Docomo to change. The problem is all those handsets in
> existence. Not to mention all the other legacy email clients that don't
> work well with UTF-8, but real people still use. Docomo could convert
> from UTF-8 to ISO-2022-JP at the gateway of course, which apparently is
> what softbank and kddi actually do, but Docomo deal with a lot of email
> so care about the cost of the extra CPU cycles, and you're going to need
> better motivation for them than "PHP cannot write proper MIME headers" I
> suspect.
> </minor rant>

Yes, I agree.

> Darren
>
> As you obviously have the mb_string extension installed, have you tried
> using mb_send_mail() instead of mail()? Then you shouldn't need to mess
> around encoding your own mimeheaders.

I do not remember well, I have to admit. Next time I will try again :)

> P.S.
> If still no luck, and you want to try writing your own solution in
> PHP, I seem to have a function called jis_loop() in mail.inc in my
> open-source fclib ( http://dcook.org/software/fclib/ ) that does this.
> It is 5 years since I touched that file, and probably 7-8 years since I
> wrote that function, and just looking at it now I cannot make head nor
> tail of it. So I'd regard that as a last resort :-)

Thanks for your help :)

Dietrich

Here is the code I finally used. It is kind of ugly and I currently don't have the time to write a nicer version, sorry:

------------------------------
<?php // (emacs: -*- mode: php -*-) かな漢字 - save in UTF-8

/**
   Encode and send Japanese emails using ISO-2022-JP and MIME header encoding.

   Emails with header fields encoded with the `mb_encode_mimeheader()'
   function cannot be decoded correctly. Long subjects and strings in other
   header fields (for example the 'From:' or 'Reply-To:' values) are broken
   down into shorter strings by `mb_encode_mimeheader()' which cannot be
   decoded by (at least some) email programs.

   These problems have been encountered when using evolution 2.12.3 as email
   reader, but users of other email programs reported the same errors when
   trying to decode emails encoded with `mb_encode_mimeheader()'.

   The code in this file uses several "work-arounds" found by experimenting
   with several coding strategies. Emails formatted with
   `send_email_iso_2022_JP()' could be decoded correctly by evolution 2.12.3
   and other email programs.
*/

/**
   Encode `$string' by:
   - breaking it into chunks of at most 10 characters
   - base64 encoding each chunk
   - putting every base64 encoded chunk between the prefix `=?ISO-2022-JP?B?'
     and the postfix `?='
   - assembling the encoded chunks separated with `$separator' into one
     result string
*/
function encode_mimeheader_iso_2022_JP($string, $separator)
{
    // Notes:
    //
    // - The encoding seems to work only with "ISO-2022-JP" used to characterize
    //   the encoding in the MIME prefix; when using "JIS" at least my email
    //   reader (evolution 2.12.3) is not able to decode the result.
    //
    // - The separator has to be given as the second argument
    //   - a blank has to be used when encoding a name in the 'From:',
    //     'Reply-To:' etc.
    //   - The separator "\r\n" seems to be the standard for encoding the
    //     subject. I didn't verify
    //     - if a blank also works for the subject field (probably it does)
    //     - how / if other email programs work with the following code
    //       (I tested only with evolution 2.12.3)

    // convert `$string' to ISO-2022-JP (JIS)
    $encoding = "ISO-2022-JP";
    $stringJIS = mb_convert_encoding($string, $encoding, "AUTO");

    // encode `$stringJIS'
    // - subdividing `$stringJIS' into character chunks of length `$chunk_length'
    // - encoding every chunk with base64
    // - putting the result between "=?ISO-2022-JP?B?" and "?="
    // - using `$separator' to separate the base64 encoded chunks
    $chunk_length = 10;
    $encoded = '';
    // pass `$encoding' explicitly so that the length is counted in JIS
    // characters and not in characters of the internal encoding
    while ($length = mb_strlen($stringJIS, $encoding)) {
        // encode the next `$chunk_length' chars
        $chunk = mb_substr($stringJIS, 0, $chunk_length, $encoding);
        $chunk_64encoded = base64_encode($chunk);
        $encoded .= sprintf("=?%s?B?%s?=%s", $encoding, $chunk_64encoded, $separator);
        // continue with the rest of the string
        $stringJIS = mb_substr($stringJIS, $chunk_length, $length, $encoding);
    }

    // return the encoded string
    return $encoded;
}

/**
   Encode the email body using ISO-2022-JP (JIS).
   Note that the coding system has to be specified in the
   Content-Type/charset header:

   Content-Type: text/plain; charset=ISO-2022-JP
*/
function encode_body_iso_2022_JP($body)
{
    // encode body using JIS (ISO-2022-JP)
    return mb_convert_encoding($body, "ISO-2022-JP", "AUTO");
}

/**
   Format and send a Japanese email using ISO-2022-JP (JIS) and MIME header
   encoding.
*/
function send_email_iso_2022_JP($recipientEmailAddress, $subject, $body, $senderName, $senderEmailAddress)
{
    // set current language to Japanese
    mb_language("ja");

    // encode subject
    $subjectMIME = encode_mimeheader_iso_2022_JP($subject, "\r\n");

    // encode body
    $bodyJIS = encode_body_iso_2022_JP($body);

    // format the sender string
    if ($senderName && strlen($senderName) > 0) {
        // encode the name of the sender
        $senderNameMIME = encode_mimeheader_iso_2022_JP($senderName, " ");
        // format email address
        $senderMIME = sprintf("%s <%s>", $senderNameMIME, $senderEmailAddress);
    } else {
        // format email address
        $senderMIME = sprintf("%s", $senderEmailAddress);
    }

    // format the MIME headers
    $headers = "MIME-Version: 1.0\r\n";
    $headers .= sprintf("From: %s\r\n", $senderMIME);
    $headers .= sprintf("Reply-To: %s\r\n", $senderMIME);
    $headers .= "Content-Type: text/plain; charset=ISO-2022-JP\r\n";

    // send encoded mail
    $result = mail($recipientEmailAddress, $subjectMIME, $bodyJIS, $headers);

    // return result
    return $result;
}
?>
------------------------------

-- 
PHP Unicode & I18N Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
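
A possible variation on the chunking in `encode_mimeheader_iso_2022_JP()', offered only as a minimal sketch and not as part of the code above: RFC 2047 limits each encoded-word to 75 characters, asks that every encoded-word be decodable on its own, and lets several encoded-words in one header be separated by CRLF followed by a space. The helper below (its name `fold_mimeheader_iso_2022_jp()' is hypothetical) assumes the input is valid UTF-8, cuts the chunks while the text is still UTF-8, and converts each chunk separately so that every chunk carries its own escape sequences. With 10 purely Japanese characters per chunk, one encoded-word comes to 54 characters; text that switches often between ASCII and Japanese can come out longer, so the chunk length is left as a parameter.

```php
<?php
// Hypothetical helper for illustration; not part of the code above.
// Assumes $utf8 is valid UTF-8 and the mbstring extension is loaded.
function fold_mimeheader_iso_2022_jp(string $utf8, int $chunkLength = 10): string
{
    $words = [];
    $total = mb_strlen($utf8, 'UTF-8');
    for ($offset = 0; $offset < $total; $offset += $chunkLength) {
        // cut the chunk while the text is still UTF-8 ...
        $chunk = mb_substr($utf8, $offset, $chunkLength, 'UTF-8');
        // ... then convert it, so each chunk has its own escape sequences
        $jis = mb_convert_encoding($chunk, 'ISO-2022-JP', 'UTF-8');
        $words[] = '=?ISO-2022-JP?B?' . base64_encode($jis) . '?=';
    }
    // CRLF plus one space folds the header line;
    // implode() leaves no separator after the last encoded-word
    return implode("\r\n ", $words);
}

// usage: a 27 character subject is split into three encoded-words
$header = fold_mimeheader_iso_2022_jp(str_repeat('日本語のメール件名', 3));
echo $header, "\n";
```

I have not tried this against real mail readers, so whether evolution or cell phone clients accept the folded form is an open question.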
