> -----Original Message-----
> From: Ed Batutis [mailto:[EMAIL PROTECTED]
>
> Sounds like a break iterator with a bit of extra info to
> support multiple encodings. Or perhaps you mean to wrap all
> string operations?
>
> =Ed

I mean just for Unicode, not other encodings, and yes, to wrap all string ops, so that the information can be maintained across any operation performed on the string. I would keep some info, such as whether the string is all ASCII, whether there are any graphemes, etc., so that lower-cost functions can be used where possible, plus some information about where each character in the string begins, for fast indexing on short strings. For longer strings, I might remember the beginnings of lines and their character and byte offsets. Also previous and next character info.

It would be something you might use on certain frequently used strings that are actually processed. PHP does a lot of just moving strings around without parsing or modifying them, so it isn't cost-effective for all strings. Just a thought for the future.

--
PHP Unicode & I18N Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
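
A rough sketch of the wrapper idea described in the message, written in Python for brevity rather than PHP. The class name `IndexedUtf8` and its methods are hypothetical, chosen only for illustration, and the input is assumed to be valid UTF-8. It caches an all-ASCII flag and the byte offset of each character, and shows one wrapped operation (concatenation) that updates the cached info without rescanning. Grapheme detection, line offsets, and previous/next character info are left out.

```python
class IndexedUtf8:
    """UTF-8 bytes plus cached metadata (hypothetical name, illustration only)."""

    def __init__(self, data: bytes):
        # Assumption: `data` is valid UTF-8; no validation is done here.
        self.data = data
        # Byte offset where each character (code point) begins: any byte that
        # is not a UTF-8 continuation byte (0b10xxxxxx) starts a character.
        self.starts = [i for i, b in enumerate(data) if (b & 0xC0) != 0x80]
        # For all-ASCII data, byte index == character index, so cheaper
        # byte-oriented code paths can be used.
        self.is_ascii = all(b < 0x80 for b in data)

    def __len__(self) -> int:
        # Length in characters, not bytes.
        return len(self.starts)

    def char_at(self, index: int) -> str:
        """Return the character at a character index, using the cached offsets."""
        if not 0 <= index < len(self.starts):
            raise IndexError(index)
        if self.is_ascii:
            # Low-cost path: one byte per character.
            return chr(self.data[index])
        begin = self.starts[index]
        if index + 1 < len(self.starts):
            end = self.starts[index + 1]
        else:
            end = len(self.data)
        return self.data[begin:end].decode("utf-8")

    def concat(self, other: "IndexedUtf8") -> "IndexedUtf8":
        """Wrapped operation: join two strings and merge their cached info."""
        result = IndexedUtf8.__new__(IndexedUtf8)  # skip the rescan in __init__
        result.data = self.data + other.data
        shift = len(self.data)
        result.starts = self.starts + [shift + s for s in other.starts]
        result.is_ascii = self.is_ascii and other.is_ascii
        return result
```

Building the offset table costs one pass over the bytes and extra memory per character, which is why it would only pay off for strings that are indexed or modified repeatedly, not for strings that are merely passed around.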