Re: [nyphp-talk] Charsets are still driving me nuts

Michael B Allen Wed, 05 Mar 2008 19:21:06 -0800

On 3/5/08, John Campbell <[EMAIL PROTECTED]> wrote:
>  Frankly, I rarely find the need to use the multibyte functions (or any
>  string functions for that matter) on user data.


Agreed. The multibyte functions are rarely necessary. The problem
area is iterating over a string and evaluating each character
independently. But in 99% of those scenarios you are just searching
for an ASCII character, and ASCII characters cannot appear within a
UTF-8 multibyte sequence [1], so the standard bytewise iteration is ok
(e.g. strchr is ok if the needle is a single ASCII character).
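
To illustrate the point about ASCII needles, here is a minimal sketch,
assuming the input is valid UTF-8 and the mbstring extension is loaded
(used only for the final check). The sample string and the variable
names are made up for the example.

```php
<?php
// "naïve café=日本語" written as explicit UTF-8 bytes.
$str = "na\xC3\xAFve caf\xC3\xA9=\xE6\x97\xA5\xE6\x9C\xAC\xE8\xAA\x9E";

// Every byte of a multibyte UTF-8 sequence is >= 0x80, so a bytewise
// search for an ASCII needle can only match a real ASCII character.
$pos = strpos($str, '=');        // byte offset, not a character offset

// Because the match is a real character, cutting at it never splits
// a multibyte sequence.
$key   = substr($str, 0, $pos);
$value = substr($str, $pos + 1);

// strchr() is an alias of strstr(): it returns the tail from the needle on.
$tail = strchr($str, '=');

// Both halves are still valid UTF-8.
var_dump(mb_check_encoding($key, 'UTF-8'), mb_check_encoding($value, 'UTF-8'));
```

Note that $pos is a byte offset. It is fine to feed it back into
substr, but it is not the same number mb_strpos would return.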

Some example issues with multibyte encodings are chopping a string to
a fixed length or doing a case-insensitive string comparison. If you
want to display a summary of some search results (e.g. where the
string is chopped off with an ellipsis at the end) you cannot just use
substr, because it counts bytes and may cut the last character in the
middle. In that case you would need to use
mb_substr($str, $start, $length, 'UTF-8'), which counts characters.
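
A rough sketch of both cases, assuming PHP 7 or later with the
mbstring extension and valid UTF-8 input. The summarize() helper and
the sample strings are hypothetical, and the lowercase comparison is
only a simple approximation of case-insensitive matching, not full
Unicode case folding.

```php
<?php
// Hypothetical helper: keep at most $max characters, then add an ellipsis.
function summarize(string $str, int $max): string
{
    if (mb_strlen($str, 'UTF-8') <= $max) {
        return $str;
    }
    return mb_substr($str, 0, $max, 'UTF-8') . '...';
}

$title = "R\xC3\xA9sum\xC3\xA9 tips";          // "Résumé tips" in UTF-8

// substr counts bytes: this keeps "R" plus the first byte of "é".
$broken = substr($title, 0, 2);

// mb_substr counts characters: this keeps "Ré".
$clean = mb_substr($title, 0, 2, 'UTF-8');

var_dump(mb_check_encoding($broken, 'UTF-8'));  // bool(false)
var_dump(mb_check_encoding($clean, 'UTF-8'));   // bool(true)

// Simple case-insensitive comparison: lowercase both sides first.
$a = "\xC3\x89COLE";                            // "ÉCOLE"
$b = "\xC3\xA9cole";                            // "école"
$same = mb_strtolower($a, 'UTF-8') === mb_strtolower($b, 'UTF-8');
```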

Mike

[1] ASCII characters can appear within multibyte sequences of
encodings other than UTF-8. For example, the backslash byte (0x5C) can
appear as the second byte of a two-byte character in a certain
Japanese encoding (Shift_JIS, if I recall correctly).
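
A small sketch of the footnote, assuming the mbstring extension is
available and that its 'SJIS' encoding name is accepted. The example
character is the kanji U+8868, whose Shift_JIS encoding is the two
bytes 0x95 0x5C.

```php
<?php
// One kanji (U+8868) in Shift_JIS; its second byte is 0x5C, the
// same byte value as an ASCII backslash.
$sjis = "\x95\x5C";

// The same character converted to UTF-8 (three bytes, all >= 0x80).
$utf8 = mb_convert_encoding($sjis, 'UTF-8', 'SJIS');

// A bytewise search finds a "backslash" inside the Shift_JIS character...
var_dump(strpos($sjis, "\\"));   // int(1)

// ...but not inside the UTF-8 form of the same character.
var_dump(strpos($utf8, "\\"));   // bool(false)
```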

-- 
Michael B Allen
PHP Active Directory SPNEGO SSO
http://www.ioplex.com/
_______________________________________________
New York PHP Community Talk Mailing List
http://lists.nyphp.org/mailman/listinfo/talk

