Tex,

This approach would work only if all access to the string contents went through a regimented API. Unfortunately, many third-party extensions (and many bundled ones) simply change the contents of the string directly via a pointer. I am not sure we could standardize this.
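
A rough sketch of the problem, in Python for brevity rather than the C the engine is written in. `IndexedString` and `set_unit` are made-up names for illustration, not anything in PHP, and the list stands in for the raw string buffer:

```python
class IndexedString:
    """Hypothetical string wrapper that caches the first surrogate index."""

    def __init__(self, units):
        self.units = list(units)      # UTF-16 code units; stands in for the raw buffer
        self.first_sur = self._scan()

    def _scan(self):
        # Index of the first surrogate code unit, or the length if there is none.
        for i, u in enumerate(self.units):
            if 0xD800 <= u <= 0xDFFF:
                return i
        return len(self.units)

    def set_unit(self, i, u):
        """Regimented write: the cached index is refreshed after the change."""
        self.units[i] = u
        self.first_sur = self._scan()


s = IndexedString([0x61, 0x62, 0x63, 0x64])   # "abcd", no surrogates

# An extension grabs the buffer and writes a surrogate pair directly,
# bypassing set_unit(), so the cached index is never updated.
buf = s.units
buf[1], buf[2] = 0xD83D, 0xDE00

stale = s.first_sur   # what the string still claims
fresh = s._scan()     # what the buffer actually contains
```

After the direct write the cached value still says "no surrogates", so any fast-path indexing that trusts it would return half of a surrogate pair as if it were a character.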

-Andrei

On Mar 8, 2006, at 1:35 AM, Tex Texin wrote:

Suggestion for improving the performance of indexing strings:

Associate with the string the index of the first code unit that is a
surrogate.
Since most strings will have no surrogates, these strings will have a value
greater than the length of the string, and this tells you that you can index
directly into the string. When there is a surrogate, you can still index
directly at any position before the surrogate's index.
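
A minimal sketch of the idea, in Python for brevity; the function names are hypothetical, and the string length is used here as the "no surrogates" value, which is already past every valid index:

```python
def first_surrogate_index(units):
    """Return the index of the first surrogate code unit, or len(units) if none."""
    for i, u in enumerate(units):
        if 0xD800 <= u <= 0xDFFF:
            return i
    return len(units)


def code_point_at(units, first_sur, n):
    """Return the n-th code point of a sequence of UTF-16 code units."""
    if n < first_sur:
        # Fast path: every unit before first_sur is a complete code point.
        return units[n]
    # Slow path: walk forward from the first surrogate, pairing high and low units.
    i = cp_index = first_sur
    while i < len(units):
        u = units[i]
        is_pair = (
            0xD800 <= u <= 0xDBFF
            and i + 1 < len(units)
            and 0xDC00 <= units[i + 1] <= 0xDFFF
        )
        if is_pair:
            cp = 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            step = 2
        else:
            cp = u   # BMP character, or an unpaired surrogate passed through as-is
            step = 1
        if cp_index == n:
            return cp
        i += step
        cp_index += 1
    raise IndexError(n)


# "ab", then U+1F600 encoded as a surrogate pair, then "c"
units = [0x61, 0x62, 0xD83D, 0xDE00, 0x63]
first = first_surrogate_index(units)
```

Positions before `first` cost a single array lookup; only positions at or after it pay for the scan.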

If there is a surrogate, you can then consider keeping the metadata that
records which chars used surrogates, to optimize indexing as was proposed.

This is low-cost and very efficient, since most strings won't have surrogates.
tex


--
PHP Unicode & I18N Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
