RE: [PHP-I18N] proposal: unification of the grapheme_extract functions

Ed Batutis Mon, 12 May 2008 13:01:47 -0700

> Maybe I just misunderstand the use case for the extract function - what
> it's supposed to do that substr, mb_substr and grapheme_substr can't or
> do worse?


Tex could probably answer this better than I could, but I'll have a go.

Use case 1: You have a buffer that is a fixed number of bytes long, and you
need to fill it as far as you can with whole graphemes. You are probably
sending that buffer to another API that might not be grapheme-aware, or even
Unicode-aware. You are in a loop, so you are tracking your position in the
original string. This is how the discussion about the definition of the
'start' parameter got started: it isn't clear how the position would be
tracked. I assumed a byte count because the user can simply call strlen on
the returned string to update the position, but Tex thinks this isn't as
handy as it should be. I guess it depends on the details of the algorithm.
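
A minimal sketch of the loop in use case 1, written in Python purely for
illustration. It is not the proposed PHP API: the names clusters,
extract_max_bytes and fill_buffers are hypothetical, and a "grapheme" is
approximated here as a base code point plus any trailing combining marks,
which is much simpler than real Unicode grapheme segmentation. The position
is tracked as a byte offset and advanced by the byte length of each returned
chunk, which is the bookkeeping described above.

```python
import unicodedata


def clusters(text):
    """Split text into approximate graphemes.

    Assumption: a grapheme is a base code point followed by zero or more
    combining marks. Real segmentation rules are more involved.
    """
    out = []
    for ch in text:
        if out and unicodedata.combining(ch):
            out[-1] += ch
        else:
            out.append(ch)
    return out


def extract_max_bytes(data, start, max_bytes):
    """Return the longest run of whole graphemes, starting at byte offset
    `start` of the UTF-8 bytes `data`, that fits in `max_bytes` bytes.

    Assumes `start` falls on a grapheme boundary.
    """
    chunk = b""
    for cluster in clusters(data[start:].decode("utf-8")):
        encoded = cluster.encode("utf-8")
        if len(chunk) + len(encoded) > max_bytes:
            break
        chunk += encoded
    return chunk


def fill_buffers(text, buffer_size):
    """Cut text into byte buffers of at most buffer_size bytes each,
    never splitting a grapheme across two buffers."""
    data = text.encode("utf-8")
    pos = 0  # byte offset into the original string
    buffers = []
    while pos < len(data):
        chunk = extract_max_bytes(data, pos, buffer_size)
        if not chunk:
            raise ValueError("a single grapheme does not fit in the buffer")
        buffers.append(chunk)
        pos += len(chunk)  # the "strlen on the return string" step
    return buffers
```

Tracking the position in bytes keeps the caller's arithmetic trivial. If
'start' were a grapheme index instead, the caller would need a separate
grapheme count of each returned chunk to advance.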

Use case 2: Same as above, except that the buffer is an Oracle database
column defined as being N Unicode characters (not bytes or graphemes) long.
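
For use case 2 the limit is counted in Unicode characters (code points)
rather than bytes. A rough Python sketch under the same simplifying
assumption, that a grapheme is a base code point plus trailing combining
marks; the function name truncate_to_code_points is hypothetical:

```python
import unicodedata


def truncate_to_code_points(text, max_code_points):
    """Return the longest prefix of whole (approximate) graphemes that
    contains at most max_code_points code points."""
    end = 0  # index just past the last complete grapheme that fits
    for i in range(1, min(len(text), max_code_points) + 1):
        # A grapheme boundary falls at index i when the text ends there
        # or when the next code point is not a combining mark.
        if i == len(text) or not unicodedata.combining(text[i]):
            end = i
    return text[:end]
```

With a limit of 2 and the string "a" followed by "e" plus a combining acute
accent, only "a" is kept: cutting after the "e" would strand its accent.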

Use case 3 (a generalization of use case 1 really): You have some code that
knows about bytes or Unicode characters but nothing about graphemes. You
want to update the code so it is grapheme aware. You can't completely
abandon a byte count or character count in the code for some reason, but you
want to easily update the code to process whole graphemes.


=Ed



