Yasuo Ohgaki wrote:
>
>> http://dk.php.net/manual/en/ref.mbstring.php claims that "a multi-byte
>> character string may be destroyed when it is divided and/or counted
>> unless multi-byte character encoding safe method is used". I've just
>> run some tests with Unicode and Japanese characters (copied from
>> http://unicode.org/unicode/standard/translations/japanese.html). I
>> used functions like preg_match(), strlen(), and substr(), and no
>> matter what I can't seem to break the Japanese strings. Which leads to
>> my question:
>
> What encoding are you using?

UTF-8.

> Don't you use func_overload, right?

Not at the moment.

> For instance, UTF-8 can be 6 bytes at most, and inserting newline,
> etc to middle of multibyte sequence breaks multi-byte chars obviously.

Obviously - but the functions I'm going to use will not change the
string. E.g., substr() doesn't change the string, and neither does
strlen().

>> Is it really necessary to use functions like mb_substr() instead of
>> substr(), mb_strlen() instead of strlen(), etc.? Does anyone have any
>> examples of strings that would actually break if you use preg_match(),
>> substr(), strlen() or similar functions on them?
>
> Of course, they need it.
> We'll make all default string functions multibyte aware someday.
>
> If you use PCRE and UTF-8, it works.

I do. So what you're saying is that it's OK to use preg_*, but not
strlen(), substr(), etc. (unless I use function overloading)?

> How did you check if the multibyte sequence is broken
> or not?

Well, I just used my browser (Mozilla 1.1). If the unmodified string
looked exactly like the string that I ran through functions like
strlen(), then I concluded they were the same. But obviously you have
much more experience with Japanese characters, so if you say that
functions like strlen() might break the strings, I'll take your word
for it.

> See also encodings like ISO 2022 or EUC.

Well, the site I'm making is not going to be Japanese - it's going to
be international, and so I want it to work for everybody (including the
Japanese).

-- 
PHP Internationalization Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
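
For anyone looking for the kind of example asked for above, here is a minimal sketch rather than a definitive test. It assumes the mbstring extension is loaded, PCRE is built with UTF-8 support, and the script file itself is saved as UTF-8. The sample string "日本語" is my own choice and does not come from the thread. The point it tries to show: strlen() and substr() never modify their argument, but they count and cut in bytes, so the values they return can be wrong or can end in the middle of a character.

```php
<?php
// Sample string (an assumption for illustration): three characters,
// each encoded as three bytes in UTF-8.
$s = "日本語";

// strlen() counts bytes; mb_strlen() counts characters.
$bytes = strlen($s);             // 9
$chars = mb_strlen($s, 'UTF-8'); // 3

// substr() cuts at byte offsets. Four bytes is one whole character
// plus the first byte of the next one, so the result is not valid UTF-8.
$cut = substr($s, 0, 4);
$cutIsValid = mb_check_encoding($cut, 'UTF-8'); // false

// mb_substr() cuts at character offsets and keeps the sequence intact.
$safe = mb_substr($s, 0, 2, 'UTF-8');
$safeIsValid = mb_check_encoding($safe, 'UTF-8'); // true

// With the /u modifier PCRE treats the subject as UTF-8, so "." matches
// one character. Without /u, "." matches one byte.
$withU = preg_match('/^.{3}$/u', $s);   // 1
$withoutU = preg_match('/^.{3}$/', $s); // 0, the subject is 9 bytes long

var_dump($bytes, $chars, $cutIsValid, $safeIsValid, $withU, $withoutU);
```

This would also explain why a check done by eye in the browser does not catch the problem: the original string is untouched, and only the returned substring or length is off.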