Hi,
On making $start a reference: I like the idea, especially if we do it
consistently for all functions (eventually, I guess). Otherwise, having $start
change unexpectedly in just one function may cause bugs.

It does also make migration a little harder, as people will need to adjust
code that does not expect $start to change.

A variation of the proposal might be to add an optional argument at the end
of the argument list that returns the end position. That is easy to migrate
to: callers make a conscious change to receive the updated position, and only
need to do so if they will actually use it.
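To make the trailing-argument variant concrete, here is a minimal Python model of the calling convention (not PHP code). PHP's optional by-reference out-argument is modeled as a second return value, and the function name extract_max_bytes is hypothetical. It also assumes a simplified notion of grapheme (a base character plus any following combining marks, via unicodedata.combining), which is much cruder than real Unicode grapheme segmentation. The point is only that start goes in unchanged and the caller opts in to the end position.

```python
import unicodedata
from typing import Iterator, Tuple


def _approx_graphemes(text: str) -> Iterator[str]:
    # Simplified clustering (an assumption, not full Unicode rules): a cluster
    # is one non-combining code point followed by any combining marks.
    cluster = ""
    for ch in text:
        if cluster and unicodedata.combining(ch) == 0:
            yield cluster
            cluster = ch
        else:
            cluster += ch
    if cluster:
        yield cluster


def extract_max_bytes(data: bytes, max_bytes: int, start: int = 0) -> Tuple[str, int]:
    """Hypothetical model: return the whole clusters that fit in max_bytes,
    starting at byte offset `start`, plus the byte offset just past them.

    `start` is never modified; the end position comes back as an extra
    result, standing in for an optional trailing out-argument.
    `start` must lie on a character boundary (e.g. a previous end offset).
    """
    taken = []
    used = 0
    for cluster in _approx_graphemes(data[start:].decode("utf-8")):
        size = len(cluster.encode("utf-8"))
        if used + size > max_bytes:
            break  # never split a cluster
        taken.append(cluster)
        used += size
    return "".join(taken), start + used


# Walk a UTF-8 buffer in chunks of at most 3 bytes.
data = "ae\u0301b".encode("utf-8")  # "a", "e" + COMBINING ACUTE ACCENT, "b"
chunks = []
pos = 0
while pos < len(data):
    chunk, end = extract_max_bytes(data, 3, pos)
    if end == pos:
        break  # the next cluster is larger than the budget
    chunks.append(chunk)
    pos = end
# chunks is now ["a", "e\u0301", "b"]
```

In PHP the extra value would be an optional trailing argument, so existing calls that omit it would keep working; the tuple here is only a stand-in for that.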

All in all, either approach is fine.

(I am out of the office and don't have the specs in front of me; sorry for not
being more precise.)

But neither approach really fixes the fundamental issue: PHP programmers
still have to make a byte vs. character vs. grapheme choice.
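As a small illustration of why that choice matters, the same visible text has a different length in each unit. This Python sketch assumes the same simplified clustering as a stand-in for real grapheme rules (a base character plus following combining marks); the helper name lengths is hypothetical.

```python
import unicodedata


def lengths(text: str) -> dict:
    # Bytes: size of the UTF-8 encoding.
    n_bytes = len(text.encode("utf-8"))
    # Characters: Unicode code points.
    n_chars = len(text)
    # Graphemes (approximation): every non-combining code point starts a new
    # cluster, and so does a combining mark at the very start of the string.
    n_graphemes = sum(
        1 for i, ch in enumerate(text)
        if i == 0 or unicodedata.combining(ch) == 0
    )
    return {"bytes": n_bytes, "chars": n_chars, "graphemes": n_graphemes}


print(lengths("e\u0301"))  # {'bytes': 3, 'chars': 2, 'graphemes': 1}
print(lengths("\u00e9"))   # {'bytes': 2, 'chars': 1, 'graphemes': 1}
```

Both strings render as "é", yet an offset such as $start means something different in each unit, which is why the unit has to be fixed before a position is meaningful.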

The right solution, to my mind, is to maintain some metadata with strings,
storing positions and other information about them, to improve performance
without involving programmers, so people can work with strings without caring
about encoding or architecture. That is an opportunity for PHP 6.
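One possible reading of the metadata idea, sketched in Python: a hypothetical IndexedString wrapper builds a cluster-to-byte-offset table once, on first use, and reuses it, so callers index by grapheme while the byte bookkeeping stays internal. The class, its methods, and the simplified clustering rule (base character plus combining marks) are all assumptions for illustration, not a proposed PHP API.

```python
import unicodedata
from typing import List, Optional


class IndexedString:
    """Hypothetical string wrapper that caches position metadata."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.data = text.encode("utf-8")
        # Byte offset where each cluster starts, plus an end sentinel.
        self._offsets: Optional[List[int]] = None

    def _index(self) -> List[int]:
        # Built lazily on first use, then reused by every later call.
        if self._offsets is None:
            offsets = []
            pos = 0
            for i, ch in enumerate(self.text):
                if i == 0 or unicodedata.combining(ch) == 0:
                    offsets.append(pos)
                pos += len(ch.encode("utf-8"))
            offsets.append(pos)
            self._offsets = offsets
        return self._offsets

    def __len__(self) -> int:
        # Length in (approximate) graphemes.
        return len(self._index()) - 1

    def substring(self, start: int, length: int) -> str:
        # Grapheme-based slice, clamped to the string; length is assumed >= 0.
        offsets = self._index()
        n = len(offsets) - 1
        s = min(max(start, 0), n)
        e = min(s + length, n)
        return self.data[offsets[s]:offsets[e]].decode("utf-8")


s = IndexedString("ae\u0301b")
print(len(s))              # 3
print(s.substring(1, 1))   # the "e" + combining accent cluster
```

The caller never picks a unit; the table is the per-string metadata, and repeated positional calls avoid rescanning the string.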




> -----Original Message-----
> From: Ed Batutis [mailto:[EMAIL PROTECTED] 
> Sent: Tuesday, May 13, 2008 8:25 AM
> To: Texin, Tex; 'Stanislav Malyshev'
> Cc: php-i18n@lists.php.net
> Subject: RE: [PHP-I18N] proposal: unification of the grapheme_extract functions
> 
> 
> > I disagree with case 2 as it is described. You don't want to
> > truncate in the middle of a grapheme, if you in fact have graphemes.
> 
> I didn't intend to say that. The only difference between 1
> and 2 is that in 2 the buffer is a character-length buffer,
> and presumably you'd have a character index that you'd like
> to use in $start. But grapheme_extract always returns whole
> graphemes regardless of any option, or there's no point to it.
> 
> Stas brought up the idea of having $start be a reference so 
> the routine could update it to the next position. I think 
> that might solve some problems in the caller's code. $start 
> could still be defined as any of bytes, characters, or 
> graphemes and it would be updated respecting that. What do 
> you think? If we do that, the user might be perfectly happy 
> with only a "byte flavor" of $start in many simple cases 
> since they don't need to do anything extra to iterate through 
> the original string - they can always get a grapheme count or 
> character count if they need it by making a function call.
> 
> =Ed
> 
> 
> 

--
PHP Unicode & I18N Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
