php-i18n Digest 15 Feb 2003 17:40:28 -0000 Issue 151
Topics (messages 448 through 455):
Re: Getting...oriented.
448 by: a.h.s. boy
450 by: Moriyoshi Koizumi
Re: charset conversion experiences
449 by: Moriyoshi Koizumi
451 by: Jan Schneider
452 by: Moriyoshi Koizumi
Re: confirm unsubscribe from [EMAIL PROTECTED]
453 by: Javi Lavandeira
Ooops
454 by: Javi Lavandeira
Indexing Japanese (new to list)
455 by: Gary Ross
Administrivia:
To subscribe to the digest, e-mail:
[EMAIL PROTECTED]
To unsubscribe from the digest, e-mail:
[EMAIL PROTECTED]
To post to the list, e-mail:
[EMAIL PROTECTED]
----------------------------------------------------------------------
--- Begin Message ---
On Wednesday, February 12, 2003, at 02:42 AM, Moriyoshi Koizumi wrote:
> > Well, I've mostly lived through the experience of internationalizing a
> > very large PHP application (using gettext()) to support the majority
> > of Western languages. I'm using UTF-8 as the default encoding for the
> > site (and form input), though MySQL still has Latin1 as its default
> > character set (which doesn't seem to pose any problems). But just when
> > I thought that might be sufficient, of course someone comes along and
> > wants to use the system in English and...Japanese.
> UTF-8 also covers Japanese letters...
I realized, somewhere in the midst of writing my last post, that UTF-8
was capable of displaying Japanese characters, but it doesn't seem to
be a particularly common choice. And as long as I was only expecting
single-byte languages, shouldn't Japanese input still cause potential
problems, especially with str functions?
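A minimal sketch of that concern, written in Python rather than PHP: operations on the encoded bytes stand in for byte-oriented functions such as strlen() and substr(), and the helper is_valid_utf8() is hypothetical. A UTF-8 Japanese string has more bytes than characters, and cutting it at an arbitrary byte offset can leave an incomplete character behind.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


text = "日本語"                  # three Japanese characters
raw = text.encode("utf-8")      # each of these characters takes 3 bytes in UTF-8

# A byte-oriented length (like PHP's strlen()) counts bytes, not characters.
print(len(text), len(raw))      # 3 characters, 9 bytes

# A byte-oriented cut after 4 bytes lands inside the second character.
print(is_valid_utf8(raw[:4]))   # False
```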
I just attempted a cut-n-paste (since I can't input real Japanese
myself) into my app from a random Japanese site, and it appeared to
work, but I can't foresee what problems may occur that I'm not
considering. And I've only tested it using the Safari browser under Mac
OS X (which supports Unicode and various language inputs quite well),
so I'm not sure how well it works with the more common Windows 98/XP
MSIE combination...
There's a development version of the app at http://dev.dadaimc.org/, if
anyone more familiar with Japanese would like to test. Use the "post an
article" link at the top of the page to submit...
Cheers,
spud.
-------------------------------------------------------------------
a.h.s. boy
spud(at)nothingness.org "as yes is to if,love is to yes"
http://www.nothingness.org/
-------------------------------------------------------------------
--- End Message ---
--- Begin Message ---
"a.h.s. boy" <[EMAIL PROTECTED]> wrote:
> I realized, somewhere in the midst of writing my last post, that UTF-8
> was capable of displaying Japanese characters, but it doesn't seem to
> be a particularly common choice. And as long as I was only expecting
> single-byte languages, shouldn't Japanese input still cause potential
> problems, especially with str functions?
That's the case, particularly for mobile Internet services in Japan such
as i-mode, J-SKY, ezweb and the like, although recent PC browsers such as
IE and Netscape (except the old 4.x versions of Netscape Navigator) are
capable enough of handling UTF-8 characters.
On the other hand, there is still a database problem. In most cases you
are unlikely to have trouble executing SQL statements that contain UTF-8
characters under a configuration for latin1 (ISO-8859-1) characters, but
string manipulation functions will then not give correct results. To
solve this, you should use one of the Japanese encodings supported by
MySQL, such as EUC-JP (ujis) or Shift_JIS (sjis), for Japanese text.
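To make the latin1 problem concrete, here is a Python sketch. It is an analogue, not MySQL itself; that MySQL's latin1 string functions misbehave in the same way is an assumption of this illustration. Once UTF-8 bytes are read as latin1, every byte counts as one character, and a case conversion rewrites some of those bytes and corrupts the text.

```python
# Hypothetical scenario: UTF-8 bytes stored in a column that is treated
# as latin1 (ISO-8859-1) by the database or by the code reading it.
original = "日本語"
stored = original.encode("utf-8")       # what actually sits in the column

# Read as latin1, each byte becomes one "character".
misread = stored.decode("latin-1")
print(len(misread))                     # 9 "characters" instead of 3

# A latin1-aware case conversion sees some bytes as accented letters
# (0xE6 is "æ" in latin1) and changes them, so the UTF-8 data is damaged.
mangled = misread.upper().encode("latin-1")
print(mangled == stored)                # False
print(mangled.decode("utf-8", errors="replace"))
```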
In addition, since PHP's standard string functions assume that every
input string consists of one byte per character, you should use the
mbstring functions to manipulate multi-byte strings. The mb_ functions
can handle various single-byte encodings too.
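The difference between byte-oriented and character-oriented handling can be sketched in Python. Working on the decoded string corresponds roughly to mb_strlen()/mb_substr() with an explicit 'UTF-8' encoding, and the hypothetical cut_utf8() helper truncates to a byte budget without splitting a character, similar in spirit to mb_strcut(). The sketch assumes the input is valid UTF-8.

```python
def cut_utf8(data: bytes, max_bytes: int) -> bytes:
    """Truncate valid UTF-8 data to at most max_bytes without splitting a character.

    Illustrative sketch only; not a port of PHP's mb_strcut().
    """
    if len(data) <= max_bytes:
        return data
    end = max_bytes
    # UTF-8 continuation bytes match the bit pattern 10xxxxxx; back up until
    # the cut point sits on the first byte of a character.
    while end > 0 and (data[end] & 0xC0) == 0x80:
        end -= 1
    return data[:end]


text = "日本語です"
print(len(text))                # 5 characters, like mb_strlen($s, 'UTF-8')
print(text[:2])                 # 日本, like mb_substr($s, 0, 2, 'UTF-8')

# 6 bytes hold 日本; a 7th byte would split 語, so the cut backs up to 6.
print(cut_utf8(text.encode("utf-8"), 7).decode("utf-8"))   # 日本
```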
BTW, even if you want to make your websites ready for Japanese quickly,
I would always discourage using mbstring's function overloading feature.
It's likely to cause various unforeseen problems due to slight
differences in the function specs.
Moriyoshi
--- End Message ---
--- Begin Message ---
Hi,
1. libiconv is basically far better at transliteration than glibc iconv.
2. glibc iconv supports yet another external option.
See http://bugs.php.net/20809
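For readers unfamiliar with the term: transliteration replaces a character that has no equivalent in the target charset with an approximation instead of failing. With PHP's iconv() it is requested by appending //TRANSLIT to the output charset, e.g. iconv('UTF-8', 'ASCII//TRANSLIT', $str). The Python sketch below is only a crude stand-in (to_ascii_translit() is a hypothetical name): it decomposes characters with Unicode NFKD and drops whatever is not ASCII, which also shows why the quality of a library's transliteration matters.

```python
import unicodedata


def to_ascii_translit(text: str) -> str:
    """Crude transliteration to ASCII: decompose, then drop non-ASCII parts."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", errors="ignore").decode("ascii")


word = "Müller café"
try:
    word.encode("ascii")                  # strict conversion fails on ü and é
except UnicodeEncodeError:
    print("strict conversion failed")

print(to_ascii_translit(word))            # Muller cafe
# ß has no NFKD decomposition, so this naive approach simply loses it,
# where a proper transliteration table would produce "ss".
print(to_ascii_translit("Straße"))        # Strae
```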
Hope this helps.
Moriyoshi
Jan Schneider <[EMAIL PROTECTED]> wrote:
> Hi,
>
> has anybody had some experience with, or even done some extensive testing
> of, how successful iconv, mbstring and recode are when it comes to charset
> conversions?
>
> I currently try iconv() with transliteration first and fall back to
> mb_convert_encoding() if this fails. The rationale is that iconv
> supports many more charsets than mbstring but fails if it detects an
> invalid character for the input charset.
>
> I have no experience yet with recode (either in PHP or in general);
> does it perhaps even use the same libiconv as iconv?
>
> Anything that throws some light on this is welcome.
>
> Jan.
--- End Message ---
--- Begin Message ---
Moriyoshi Koizumi wrote:
> 1. libiconv is basically far better at transliteration than glibc iconv.
> 2. glibc iconv supports yet another external option.
> See http://bugs.php.net/20809
> Hope this helps.
Not really, but thanks for your answer.
It still isn't clear to me which conversion method uses which library.
If I understand it correctly, mbstring doesn't use an external library
but has its own conversion routines.
iconv probably uses glibc's iconv?
Does recode use the libiconv library?
If so, do you (or others) recommend the following chain to convert
between charsets?
recode - iconv - mbstring
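The ordered-fallback idea (iconv() with transliteration first, mb_convert_encoding() if that fails) can be sketched generically in Python. This is an analogue under stated assumptions, not PHP code: a strict Shift_JIS decoder stands in for iconv(), which fails on invalid input, and a decoder using errors="replace" stands in for mb_convert_encoding(), which substitutes a character instead of failing. The names convert_with_fallback(), strict_sjis() and lenient_sjis() are hypothetical.

```python
from typing import Callable, List, Optional

# A converter takes raw bytes and returns text, or raises on failure.
Converter = Callable[[bytes], str]


def convert_with_fallback(data: bytes, converters: List[Converter]) -> Optional[str]:
    """Try each converter in order and return the first successful result."""
    for convert in converters:
        try:
            return convert(data)
        except (UnicodeError, LookupError):
            continue          # this converter failed; try the next one
    return None               # every converter failed


def strict_sjis(data: bytes) -> str:
    # Stands in for a strict converter that refuses invalid input.
    return data.decode("shift_jis")


def lenient_sjis(data: bytes) -> str:
    # Stands in for a converter that substitutes invalid input.
    return data.decode("shift_jis", errors="replace")


good = "日本語".encode("shift_jis")
bad = good[:-1]               # drop the last byte, breaking the final character

print(convert_with_fallback(good, [strict_sjis, lenient_sjis]))   # 日本語
print(convert_with_fallback(bad, [strict_sjis, lenient_sjis]))    # 日本 plus a replacement character
```

Putting the strict converter first means a clean result is preferred whenever one exists; the lenient one only guarantees that something readable comes back.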
--- End Message ---
--- Begin Message ---
Jan Schneider <[EMAIL PROTECTED]> wrote:
> Moriyoshi Koizumi wrote:
> > 1. libiconv is basically far better at transliteration than glibc iconv.
> >
> > 2. glibc iconv supports yet another external option.
> > See http://bugs.php.net/20809
> >
> > Hope this helps.
>
> Not really, but thanks for your answer.
>
> It still isn't clear to me which conversion method uses which library.
> If I understand it correctly, mbstring doesn't use an external library
> but has its own conversion routines.
As for mbstring, that's virtually true.
> iconv probably uses glibc's iconv?
Depends on your configuration.
On systems where glibc is available, --with-iconv (without any
parameters) makes the iconv extension use glibc's iconv implementation.
If libiconv has been installed under the prefix "/usr/local" and PHP is
configured with --with-iconv=/usr/local, the iconv extension will use
libiconv instead.
Please see the manual for details, as it has been updated since 4.3.0.
> Does recode use the libiconv library?
libiconv (libcharset) is bundled with recode and used by it internally.
> If so, do you (or others) recommend the following chain to convert
> between charsets?
> recode - iconv - mbstring
I don't really recommend using recode, because it's known not to work in
a threaded environment.
Moriyoshi
--- End Message ---
--- Begin Message ---
On 14 Feb 2003 14:55:10 -0000 [EMAIL PROTECTED] wrote:
> Hi! This is the ezmlm program. I'm managing the
> [EMAIL PROTECTED] mailing list.
>
> I'm working for my owner, who can be reached
> at [EMAIL PROTECTED]
>
> To confirm that you would like
>
> [EMAIL PROTECTED]
>
> removed from the php-i18n mailing list, please send an empty reply
> to this address:
>
> [EMAIL PROTECTED]
>
> Usually, this happens when you just hit the "reply" button.
> If this does not work, simply copy the address and paste it into
> the "To:" field of a new message.
>
> or click here:
>
>mailto:[EMAIL PROTECTED]
--
Javi Lavandeira ([EMAIL PROTECTED]) - http://www.ag0ny.com - http://www.aamsx.org
--- End Message ---
--- Begin Message ---
I'm sorry, I didn't check the address in the message when I replied...
Regards,
--
Javi Lavandeira ([EMAIL PROTECTED]) - http://www.ag0ny.com - http://www.aamsx.org
--- End Message ---
--- Begin Message ---
Hello all,
I'm new to the list!
I'm wondering if anyone has information about indexing/splitting
Japanese text strings in a MySQL/PHP environment, so that
korehanihongodesu (obviously Kanji in reality) is indexed as kore ha
nihongo desu. I know there are semantic limitations to a word-by-word
index, but it's better than nothing.
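One simple approach, sketched in Python under the assumption that a word list is available, is greedy longest-match segmentation: at each position take the longest dictionary word that matches, falling back to a single character. The dictionary below is a hypothetical five-entry sample; real systems use full lexicons or a morphological analyzer, and indexing character bigrams is a common dictionary-free alternative.

```python
def segment(text: str, dictionary: set, max_word_len: int = 8) -> list:
    """Greedy longest-match segmentation of text against a word list."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; a single character always matches.
        for length in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in dictionary:
                words.append(candidate)
                i += length
                break
    return words


# Hypothetical mini-dictionary; a real index needs a full lexicon.
WORDS = {"これ", "は", "日本", "日本語", "です"}
print(" ".join(segment("これは日本語です", WORDS)))   # これ は 日本語 です
```

Greedy matching can mis-split ambiguous sequences, which is one source of the semantic limitations of a word-by-word index.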
Thanks,
Gary
--- End Message ---