Subject: Re: [nyphp-talk] PHP + UTF-8 + mb_string issue.
Date: Wednesday 21 March 2007 12:07
From: Anirudh Zala <[EMAIL PROTECTED]>
To: Michael B Allen <[EMAIL PROTECTED]>

On Wednesday 21 March 2007 11:49, you wrote:
> On Wed, 21 Mar 2007 11:48:20 +0530
>
> Anirudh Zala <[EMAIL PROTECTED]> wrote:
> > On Wednesday 21 March 2007 11:36, you wrote:
> > > On Wed, 21 Mar 2007 10:50:26 +0530
> > >
> > > Anirudh Zala <[EMAIL PROTECTED]> wrote:
> > > > Hello Everybody,
> > > >
> > > > While building a truly multilingual project, I am running into an
> > > > interesting problem with php5 + utf-8 + mb_string.
> > >
> > > <snip>
> > >
> > > > ____________ = 1 word; 4 bytes; 2 characters (______, ______); 4
> > > > key-strokes (___, ___, ___, ___); "strlen" should be 2 but is 4.
> > >
> > > Generally the libc-like functions exhibit libc behavior so 4 is the
> > > correct answer.
> > >
> > > Is mb_strlen not suitable for some reason? You have to use mb_*
> > > functions whenever you perform character-wise operations as opposed to
> > > byte-wise (and that assumes you're running in the UTF-8 locale).
> > >
> > > Mike
> >
> > I am using mb_* functions and UTF-8 as locale. Everything is
> > transparently processed in UTF-8 format only. I have tested same thing
> > using "iconv" extension but same results. Looks like it is the behavior
> > of php + mb_*.
>
> I don't understand. You used mb_strlen and got 4? If so, what are the
> 4 bytes that make up the 2 characters exactly?
>
> Mike

Because in real-world use (communication, writing, speaking, etc.) the
length of the string is "2". PHP, however, needs 4 bytes to store it
and therefore reports the length as "4".

As I said in my original mail, Indic languages have primary characters
(like ઝ, લ) and secondary characters (like ા) that combine to create
different meanings. Secondary characters should not be counted when
calculating the length of a word, even though they require additional
storage. This is the issue.

-- 
Anirudh Zala
(30% of Internet resources, used to deliver web-pages, are wasted by
unnecessary tabs and spaces.)

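To make the three different "lengths" in this thread concrete, here is a
minimal sketch. It assumes PHP 7 or later with the mbstring extension and
a PCRE build with Unicode property support. The sample word is an
assumption: it is built only from the characters mentioned above
(ઝ, લ, ા) and is not necessarily the word from the original mail. The
helper count_base_characters() is a hypothetical name, not part of PHP or
mbstring.

```php
<?php
// Illustrative word (an assumption): U+0A9D ઝ, U+0ABE ા, U+0AB2 લ, U+0ABE ા.
// Written with escapes so the example does not depend on file encoding.
$word = "\u{0A9D}\u{0ABE}\u{0AB2}\u{0ABE}";

// Hypothetical helper: counts code points that are NOT combining marks
// (Unicode general category M), so dependent vowel signs such as ા
// are skipped and only the "primary" characters are counted.
function count_base_characters(string $s): int
{
    // \PM matches any code point outside category M; /u enables UTF-8 mode.
    $n = preg_match_all('/\PM/u', $s);
    if ($n === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $n;
}

echo strlen($word), "\n";                 // bytes in the UTF-8 encoding: 12
echo mb_strlen($word, 'UTF-8'), "\n";     // Unicode code points: 4
echo count_base_characters($word), "\n";  // code points excluding marks: 2
```

This is only one possible reading of "primary characters". The virama
(U+0ACD) is itself a mark, so each consonant of a conjunct would be
counted separately. Whether that is the desired length is a design
decision for the project.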
_______________________________________________
New York PHP Community Talk Mailing List
http://lists.nyphp.org/mailman/listinfo/talk

NYPHPCon 2006 Presentations Online
http://www.nyphpcon.com

Show Your Participation in New York PHP
http://www.nyphp.org/show_participation.php
