Re: [PHP-I18N] new to internationalization programming

Moriyoshi Koizumi Fri, 28 Nov 2003 20:24:54 -0800

On 2003/11/28, at 17:45, Rasmus Lerdorf wrote:

Are you sure you will be using UTF-8? The native charset everyone uses in Japan is EUC-JP and in Korea it is EUC-KR. This of course doesn't mean that UTF-8 won't work, but you may want to double-check that.


Just to clarify, there are several standard character sets and
encodings in use in Japan.

The most commonly used encoding is Shift_JIS, because it has been used
in numerous products since a localized CP/M was introduced in Japan
(in fact, it became popular when the first localized MS-DOS came out).

However, the ISO-2022-JP encoding is often used for Internet message
transport because RFC 1468 standardized it.
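To illustrate why it suits mail transport: ISO-2022-JP switches between ASCII and JIS X 0208 with escape sequences, so every byte it produces is 7-bit. Below is a rough sketch using Python's built-in `iso2022_jp` codec. It uses Python rather than PHP purely for illustration, and the helper names are made up.

```python
# Illustration with Python's codecs, not PHP: ISO-2022-JP uses escape
# sequences to switch character sets, so its output is pure 7-bit.
ESC_JIS_X_0208 = b"\x1b$B"  # ESC $ B: switch to JIS X 0208
ESC_ASCII = b"\x1b(B"       # ESC ( B: switch back to ASCII

def encode_for_mail(text: str) -> bytes:
    """Encode text as ISO-2022-JP (hypothetical helper)."""
    return text.encode("iso2022_jp")

def is_7bit(data: bytes) -> bool:
    """True if no byte has its high bit set."""
    return all(b < 0x80 for b in data)

encoded = encode_for_mail("日本語 mail")
print(encoded)           # escape sequences around the kanji, then ASCII
print(is_7bit(encoded))
```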

Unlike on Windows, on *nix (and similar) platforms you may be able to
choose Shift_JIS, EUC-JP or ISO-2022-JP as the system locale charset.
There, EUC-JP has generally been preferred, because the characteristics
of its encoding scheme often make it easier to port a non-multilingual
application to Japanese with EUC-JP than with the others: every byte of
a multibyte EUC-JP character has its high bit set, so none of them can
be mistaken for an ASCII character.
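One quick way to see the difference is to encode a string that contains no ASCII characters and list any bytes that still fall in the ASCII range. This is sketched in Python with its standard `euc_jp` and `shift_jis` codecs, and the helper name is hypothetical. Shift_JIS trail bytes can range from 0x40 to 0x7E, while EUC-JP never produces such bytes inside a multibyte character.

```python
# Illustration with Python's codecs: list the offsets of bytes below 0x80
# in the encoded form of a string that contains no ASCII characters.
# Byte-oriented code looking for "\", "/", quotes, etc. can only be
# confused by such bytes.
def ascii_range_offsets(text: str, encoding: str) -> list:
    data = text.encode(encoding)
    return [i for i, b in enumerate(data) if b < 0x80]

sample = "表示"  # two kanji, no ASCII characters
for enc in ("euc_jp", "shift_jis"):
    print(enc, ascii_range_offsets(sample, enc))
```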

Many open source products used in Japan are not internationalized,
and PHP is no exception. It is actually incapable of handling several
encodings, such as Shift_JIS (CP932), CP936 (often wrongly referred to
as GB2312) or CP949 (a Microsoft variant of EUC-KR), without some
magic [1]. This is because those encodings represent some characters
as a lead octet followed by a trail octet that can coincide with an
ASCII character such as "\", which has a special meaning in PHP
syntax, and this often leads PHP into unexpected behaviour [2].
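The kind of breakage meant here can be sketched with a toy scanner that ends a double-quoted literal at the first unescaped '"' and treats 0x5C as an escape. This is a hypothetical Python illustration of a multibyte-unaware lexer, not PHP's actual lexer. Because "表" is 0x95 0x5C in Shift_JIS, its trail byte escapes the closing quote.

```python
# Toy byte-oriented scanner (hypothetical, not PHP's lexer): it knows
# nothing about multibyte encodings and treats every 0x5C as "\".
def find_closing_quote(src: bytes, start: int) -> int:
    """Return the index of the '"' closing the literal opened at src[start],
    or -1 if the literal never closes."""
    i = start + 1
    while i < len(src):
        if src[i] == 0x5C:      # "\": skip it and the byte it escapes
            i += 2
        elif src[i] == 0x22:    # '"': end of the literal
            return i
        else:
            i += 1
    return -1

for enc in ("euc_jp", "shift_jis"):
    literal = b'"' + "表".encode(enc) + b'"'
    print(enc, find_closing_quote(literal, 0))
```

With EUC-JP the literal closes where expected. With Shift_JIS the scanner runs off the end, which is the sort of unexpected behaviour the zend-multibyte "magic" is meant to avoid.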

On the other hand, many people tend to think Unicode is the perfect
solution for creating a fully internationalised application. That is
hardly the case, because OS vendors use different mapping tables
between their native character sets and Unicode, which has led to
considerable confusion.
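Here is one well-known case, shown with Python's codecs as an illustration. The Shift_JIS byte pair 0x81 0x60 decodes to WAVE DASH (U+301C) under the JIS-based `shift_jis` table but to FULLWIDTH TILDE (U+FF5E) under Microsoft's `cp932` table. As a result, text converted to Unicode on one platform and compared with or converted back on another can end up with a different character.

```python
# Illustration with Python's codecs: one Shift_JIS byte pair, two mapping
# tables, two different Unicode characters.
import unicodedata

raw = b"\x81\x60"
for codec in ("shift_jis", "cp932"):
    ch = raw.decode(codec)
    print(codec, f"U+{ord(ch):04X}", unicodedata.name(ch))
```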

Well, maybe enough said. If you want further information, Ken Lunde's
CJKV Information Processing [3], published by O'Reilly, would
definitely help.

Moriyoshi

[1] The magic is now enabled by passing --enable-zend-multibyte to configure. Let's thank Masaki Fujimoto for his effort :)

[2] A typical example can be seen in
http://news.php.net/article.php?group=php.i18n&article=633

[3] You can reach Ken Lunde's homepage at http://www.praxagora.com/lunde/ .

As far as PHP is concerned, you can use almost any character set you want in the PHP script itself, including UTF-8. User input from the browser can arrive in even more character sets, and output can be produced in any of a long list as well.
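Roughly, the pattern is to decode whatever the browser sent into one internal encoding and to encode again only on output. In PHP, mbstring's mb_convert_encoding() does this conversion. What follows is only a Python analogue: the function name and argument order merely mimic mb_convert_encoding($str, $to, $from), and the EUC-KR form input is an assumption made for the example.

```python
# Python analogue of "convert input to one internal encoding, convert
# again on output". Python's strict error handling raises an exception
# instead of silently producing mojibake.
def convert_encoding(data: bytes, to_charset: str, from_charset: str) -> bytes:
    return data.decode(from_charset).encode(to_charset)

# Hypothetical form input that arrived as EUC-KR, kept internally as UTF-8
form_value = "한국어".encode("euc_kr")
internal = convert_encoding(form_value, "utf-8", "euc_kr")
print(internal.decode("utf-8"))

# Output to a page served as Shift_JIS fails: JIS X 0208 has no Hangul
try:
    convert_encoding(internal, "shift_jis", "utf-8")
except UnicodeEncodeError as exc:
    print("cannot encode as Shift_JIS:", exc.reason)
```

The last step is the practical catch for a Japanese/Korean/English site: a single legacy output charset cannot carry all three, which is one argument for keeping UTF-8 end to end.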

For more info I suggest reading through http://php.net/mbstring

-Rasmus

On Fri, 28 Nov 2003, Ligaya Turmelle wrote:
Hi, I am just starting a project that will have to handle English, Japanese and Korean characters. I have never programmed for an international group before and am unsure how to begin. I will be using an HTML (UTF-8)/PHP front end with a MySQL DB in the back. I have read over the information about the multibyte string functions for PHP on PHP.net and am still very confused about what I have to do to start using these functions. I am also unsure what, if anything, I have to do with the MySQL DB. This seems to be the only place to go to ask questions, so I apologize in advance if I ask silly questions, and I ask for your patience.
Can anyone tell me where to go to get information, view code
snippets, or join a forum? Where is a good place to start?
--
PHP Internationalization Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
