On Sun, Jan 05, 2003 at 03:37:47PM -0000, David Powers wrote:
> > On Sun, Jan 05, 2003 at 12:20:51AM -0000, David Powers wrote:
> >> This is a cut-down version of what I now have in my php.ini. As you
> >> will see, I have commented out the output_handler line. When
> >> enabled, all I got was mojibake.
> >>
> >> output_buffering = On
> >> ;output_handler = mb_output_handler
> >
> > I see no problem in this.
>
> That brings me back to my original query. The PHP documentation says
> SJIS users should set output_handler to mb_output_handler. Doing so
> results in mojibake. Turning it off (by commenting it out with the
> semi-colon) is the only way I can get my pages to display correctly. So,
> either there is a mistake in the documentation or the explanation of
> SJIS users needs to be clarified.

(snip)

> This would seem to add an unnecessary level of complication. I am using
> PHP in combination with MySQL to provide an online database in both
> Japanese and English. All input is done through a browser interface over
> the internet, and most - if not all - users are on Windows. PHP seems to
> do an excellent job of conversion without adding a further layer.

As I said in my previous mail, the mojibake occurs because you are
composing your pages in Shift_JIS, whereas you are actually supposed to
use EUC-JP. In most cases PHP will process Shift_JIS-encoded pages
without problems, but sometimes it produces a buggy result, and it is
hard to tell what went wrong.

This is because several (not many) Shift_JIS kanji characters consist of
a lead byte of the double-byte character set followed by '\' (backslash /
yen sign, byte 0x5C), and '\' is also used to form escape sequences in
string literals enclosed in single or double quotes. The same problem is
known to occur with other East Asian (CJKV) character sets such as CP936
(Microsoft's superset of GB2312, also known as GBK), GB18030 (a huge
character set defined as a Chinese national standard), and BIG5 (used to
represent traditional Chinese text).

If you have never experienced such a "phenomenon", you have definitely
been lucky so far :-)

Unfortunately, I don't seem to be allowed to use Japanese characters on
this list, so I can't give you an example in this mail. I'll come up with
some if you can read Japanese mail with your mail client.

> >> based on PHP 4.2.2 and PostgreSQL. Are there any major differences
> >> between 4.2.2 and 4.3.0 as far as Japanese is concerned?
> >
> > No significant changes have been made between these versions. All that
> > the mbstring developers did is bug fixing.
>
> Again, this is where I get confused - or maybe I'm misunderstanding a
> vital element. The PHP documentation states that as of
> 4.3.0, --enable-mbstr-enc-trans has been eliminated. Under 4.2.2, I
> needed to use mb_convert_encoding($_POST['variable'], "SJIS") to gather
> variables submitted by a form. Now I don't need to.

Sorry for the confusion. When I said "no significant", I meant it from a
technical point of view.

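To make the backslash problem described above concrete, here is a minimal
sketch in Python (not PHP; chosen only because it makes byte inspection
easy). It encodes a few characters in Shift_JIS and in EUC-JP and checks
whether the trailing byte is 0x5C. The helper name
has_backslash_trail_byte and the choice of sample characters are
assumptions made for illustration, and the characters are written as
Unicode escapes so that the text stays ASCII-only.

```python
# Sketch: compare the bytes of a few characters in Shift_JIS and EUC-JP.
BACKSLASH = 0x5C  # ASCII '\' (displayed as a yen sign in many Japanese fonts)


def has_backslash_trail_byte(ch: str, encoding: str) -> bool:
    """Return True if `ch` is multi-byte in `encoding` and ends in 0x5C."""
    encoded = ch.encode(encoding)
    return len(encoded) > 1 and encoded[-1] == BACKSLASH


# Illustrative samples only, written as escapes to keep this ASCII-only.
samples = {
    "U+30BD (katakana SO)": "\u30bd",
    "U+8868 (kanji, 'surface/table')": "\u8868",
    "U+80FD (kanji, 'ability')": "\u80fd",
}

for label, ch in samples.items():
    sjis = ch.encode("shift_jis")
    eucjp = ch.encode("euc_jp")
    print(
        label,
        "| Shift_JIS:", sjis.hex(),
        "| EUC-JP:", eucjp.hex(),
        "| 0x5C trail byte in Shift_JIS:",
        has_backslash_trail_byte(ch, "shift_jis"),
    )
```

For these samples the Shift_JIS form ends in 0x5C, while the EUC-JP form
contains no 0x5C byte at all, which is why a script saved in EUC-JP is not
exposed to this particular problem.
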
As for --enable-mbstr-enc-trans, this compile-time option was removed for
convenience and has been replaced by the "mbstring.encoding_translation"
runtime option. You can also use mb_parse_str() in case it is turned off.

> Since Japanese is not my native language, it's not as easy for me to
> search for information in news groups and websites as it is in English.
> I intend to study the PDF files you recommended, but I see they were
> written before --enable-mbstr-enc-trans was eliminated, so any guidance
> on how this affects the handling of Japanese would be useful.

Hmm... English information about Japanese text handling with PHP is very
limited, since only a small number of developers who speak English
fluently use Japanese or other East Asian languages in their projects,
and since I don't have much time to add more explanation to the manual.
I think all I can do for now is fill this list's archive with (hopefully)
detailed mails.

Moriyoshi

-- 
PHP Internationalization Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php