On 3/5/08, John Campbell <[EMAIL PROTECTED]> wrote:
> Frankly, I rarely find the need to use the multibyte functions (or any
> string functions for that matter) on user data.

Agreed. The multibyte functions are rarely necessary. The problem area is when you want to iterate over a string and evaluate each character independently. But in 99% of those scenarios you are just searching for an ASCII character, and ASCII characters cannot appear within a UTF-8 multibyte sequence [1], so the standard bytewise iteration is OK (e.g. strchr is OK if the needle is a single ASCII character).

Two cases where multibyte encodings do cause trouble are chopping a string to a fixed length and doing a case-insensitive string comparison. If you want to display a summary of some search results (e.g. where the string is chopped off with an ellipsis at the end), you cannot just use substr, because the cut may fall in the middle of a multibyte character and leave the last character incomplete. In that case you would need to use mb_substr($str, $start, $length, 'UTF-8').

Mike

[1] ASCII characters can appear within multibyte sequences of encodings other than UTF-8. For example, the backslash (\) can appear within a certain Japanese encoding (I don't recall which; I think it was SHIFT-JIS).

-- 
Michael B Allen
PHP Active Directory SPNEGO SSO
http://www.ioplex.com/

_______________________________________________
New York PHP Community Talk Mailing List
http://lists.nyphp.org/mailman/listinfo/talk
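
To make the truncation point above concrete, here is a minimal sketch. It assumes the mbstring extension is loaded. The helper name summarize() and the sample string are made up for illustration and are not from the original post. It shows a bytewise substr cutting a two-byte character in half, mb_substr cutting on a character boundary, and a single-ASCII-byte search working fine with the plain byte functions.

```php
<?php
// Hypothetical helper (illustration only): cut a UTF-8 string to at most
// $maxChars characters and append an ellipsis if anything was removed.
function summarize($str, $maxChars)
{
    if (mb_strlen($str, 'UTF-8') <= $maxChars) {
        return $str;
    }
    return mb_substr($str, 0, $maxChars, 'UTF-8') . '...';
}

// "Café résumé" written with explicit UTF-8 bytes (\xC3\xA9 is "é"),
// so the example does not depend on how this file is saved.
$title = "Caf\xC3\xA9 r\xC3\xA9sum\xC3\xA9";

// Bytewise cut: the 4th byte is only the first half of the two-byte "é",
// so the result is no longer valid UTF-8.
$broken = substr($title, 0, 4);
var_dump(mb_check_encoding($broken, 'UTF-8'));   // bool(false)

// Character-aware cut: keeps the whole "é".
var_dump(summarize($title, 4) === "Caf\xC3\xA9...");   // bool(true)

// Searching for a single ASCII byte is safe in UTF-8. Note that the
// result is a byte offset, not a character offset.
var_dump(strpos($title, ' '));   // int(5)
```

The byte offset from strpos is fine to pass to other byte-oriented functions such as substr, since an ASCII byte always sits on a character boundary in UTF-8. It should not be mixed with character offsets from the mb_* functions.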