> -----Original Message-----
> From: Ed Batutis [mailto:[EMAIL PROTECTED] 
> 
> Sounds like a break iterator with a bit of extra info to 
> support multiple encodings. Or perhaps you mean to wrap all 
> string operations?
> 
> =Ed


I mean just for Unicode, not other encodings, and yes, to wrap all string ops, 
so that the extra info can be kept up to date across any operation performed on 
the string.

I would maintain some info, such as whether the string is all ASCII and whether 
it contains any multi-code-point graphemes, so that lower-cost functions can be 
used where possible, plus some information about where each character in the 
string begins, for fast indexing on short strings.
For longer strings, I might remember the beginning of each line, with its 
character and byte offsets.
Also previous and next character info.
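
A rough sketch of the bookkeeping described above, in Python rather than PHP 
internals, and only under stated assumptions: the class name IndexedString, the 
SHORT_LIMIT cutoff of 64 bytes, and the method names are all made up for 
illustration. It covers the all-ASCII flag, per-character byte offsets for 
short strings, and line-start offsets for longer ones. It does not cover 
grapheme detection or the previous/next character info.

```python
from bisect import bisect_right

SHORT_LIMIT = 64  # assumed cutoff (in bytes) between "short" and "long" strings


class IndexedString:
    """Hypothetical wrapper around UTF-8 bytes that caches indexing metadata."""

    def __init__(self, data: bytes):
        data.decode("utf-8")  # validate; raises UnicodeDecodeError on bad input
        self.data = data
        # A byte starts a new code point unless it is a UTF-8 continuation
        # byte, i.e. unless it has the bit pattern 10xxxxxx.
        starts = [i for i, b in enumerate(data) if b & 0xC0 != 0x80]
        self.length = len(starts)                 # length in code points
        self.is_ascii = self.length == len(data)  # one byte per code point
        # Short non-ASCII strings keep the full table of code point offsets.
        keep_table = len(data) <= SHORT_LIMIT and not self.is_ascii
        self.starts = starts if keep_table else None
        # Character and byte offset of the beginning of each line.
        self.line_chars = [0]
        self.line_bytes = [0]
        for char_index, byte_index in enumerate(starts):
            if data[byte_index] == 0x0A:  # "\n"
                self.line_chars.append(char_index + 1)
                self.line_bytes.append(byte_index + 1)

    def byte_offset(self, index: int) -> int:
        """Return the byte offset of the code point at character `index`."""
        if not 0 <= index < self.length:
            raise IndexError(index)
        if self.is_ascii:
            return index                # low-cost path: offsets coincide
        if self.starts is not None:
            return self.starts[index]   # short string: direct table lookup
        # Long string: jump to the start of the containing line, then scan.
        line = bisect_right(self.line_chars, index) - 1
        pos = self.line_bytes[line]
        for _ in range(index - self.line_chars[line]):
            pos += 1
            while self.data[pos] & 0xC0 == 0x80:
                pos += 1
        return pos

    def char_at(self, index: int) -> str:
        """Return the code point at character `index` as a str."""
        start = self.byte_offset(index)
        end = start + 1
        while end < len(self.data) and self.data[end] & 0xC0 == 0x80:
            end += 1
        return self.data[start:end].decode("utf-8")
```

Any operation that modifies the string would have to update or rebuild these 
tables, which is the cost that makes this worthwhile only for strings that are 
indexed repeatedly.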

It would be something you might use on certain frequently used strings that are 
actually processed.
PHP does a lot of just moving strings around without parsing or modifying them, 
so it isn't cost-effective for all strings.

Just a thought for the future.

--
PHP Unicode & I18N Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
