I thought the proposal in the November minutes was to create a data structure indicating which characters used surrogates. The approach I suggested is cheaper than that one. The same model can also be used in local loops and algorithms to gain performance, so it still has benefits even where no longer-term structure is available.
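To make the idea concrete, here is a rough sketch in C, assuming strings are stored as UTF-16 code units. The names (first_surrogate_index, code_point_at, NO_SURROGATE) are made up for illustration and are not existing PHP or ICU APIs. The sentinel SIZE_MAX is my assumption for "a value greater than the length of the string".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel: larger than any valid length, i.e. "no surrogates". */
#define NO_SURROGATE SIZE_MAX

static int is_surrogate(uint16_t u) { return (u & 0xF800) == 0xD800; }
static int is_high(uint16_t u)      { return (u & 0xFC00) == 0xD800; }
static int is_low(uint16_t u)       { return (u & 0xFC00) == 0xDC00; }

/* Scan once and return the index of the first surrogate code unit,
 * or NO_SURROGATE if the string contains none. */
size_t first_surrogate_index(const uint16_t *s, size_t len)
{
    assert(s != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (is_surrogate(s[i])) {
            return i;
        }
    }
    return NO_SURROGATE;
}

/* Return the code point at code point index cp_index, or -1 if out of range.
 * first_sur is the value computed by first_surrogate_index(). */
int32_t code_point_at(const uint16_t *s, size_t len,
                      size_t first_sur, size_t cp_index)
{
    if (cp_index < first_sur) {
        /* Fast path: before the first surrogate, code point index
         * and code unit index are the same, so index directly. */
        return cp_index < len ? (int32_t)s[cp_index] : -1;
    }

    /* Slow path: walk forward from the first surrogate, counting
     * each valid surrogate pair as one code point. */
    size_t unit = first_sur;
    size_t cp = first_sur;
    while (unit < len) {
        uint16_t u = s[unit];
        int pair = is_high(u) && unit + 1 < len && is_low(s[unit + 1]);
        if (cp == cp_index) {
            if (pair) {
                return 0x10000 + ((int32_t)(u - 0xD800) << 10)
                               + (int32_t)(s[unit + 1] - 0xDC00);
            }
            return (int32_t)u; /* BMP unit, or an unpaired surrogate as is */
        }
        unit += pair ? 2 : 1;
        cp++;
    }
    return -1;
}
```

In a local loop, the index can be computed once up front and then reused for every lookup in that loop, so nothing has to be cached on the string itself for the fast path to pay off.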
Tex Texin
Internationalization Architect, Yahoo! Inc.

> -----Original Message-----
> From: Andrei Zmievski [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, March 08, 2006 8:49 AM
> To: Tex Texin
> Cc: php-i18n@lists.php.net
> Subject: [PHP-I18N] Re: surrogates optimization
>
> Tex,
>
> This approach would work only if we allowed access to the string
> contents always via regimented API. Unfortunately, many third party
> extensions (and many bundled ones) simply change the contents of the
> string directly via a pointer.. I am not sure we could standardize
> this.
>
> -Andrei
>
> On Mar 8, 2006, at 1:35 AM, Tex Texin wrote:
>
> > Suggestion for improving the performance of indexing strings:
> >
> > Associate with the string the index of the first code unit that is a
> > surrogate. Since most strings will have no surrogates, these strings
> > will have a value greater than the length of the string, and this
> > tells you that you can index directly into the string. When there is
> > a surrogate, you can index directly, prior to the surrogate's index.
> >
> > If there is a surrogate then you can consider the meta data for
> > remembering which chars used surrogates, to optimize indexing as was
> > proposed.
> >
> > This is low cost, very efficient... Most strings won't have
> > surrogates.
> > tex
>
> --
> PHP Unicode & I18N Mailing List (http://www.php.net/)
> To unsubscribe, visit: http://www.php.net/unsub.php

--
PHP Unicode & I18N Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php