> Maybe I just misunderstand the use case for the extract function - what
> it's supposed to do that substr, mb_substr and grapheme_substr can't or
> do worse?

Tex could probably answer this better than I could, but I'll have a go.

Use case 1: You have a buffer that is a fixed number of bytes long. You need to fill it as far as you can with whole graphemes. You are probably sending that buffer to another API that might not be grapheme-aware, or even Unicode-aware. You are in a loop, so you are tracking your position in the original string. This is how the discussion about the definition of the 'start' parameter got started: it isn't clear how the position would be tracked. I assumed a byte count, because the user can simply call strlen on the returned string to update the position, but Tex thinks this isn't as handy as it should be. It depends on the details of the algorithm, I guess.

Use case 2: Same as above, except that in this case the target is an Oracle database buffer whose columns are defined as being N Unicode characters (not bytes or graphemes) long.

Use case 3 (really a generalization of use case 1): You have some code that knows about bytes or Unicode characters but nothing about graphemes. You want to update the code so that it is grapheme-aware. You can't completely abandon the byte count or character count in the code for some reason, but you want an easy way to make it process whole graphemes.

=Ed

--
PHP Unicode & I18N Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
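
To make the loop in use cases 1 and 2 concrete, here is a rough sketch in Python rather than PHP, so it does not depend on the function under discussion. Everything in it is an assumption made for illustration: the names clusters, extract and fill_buffers are hypothetical, and a "grapheme" is approximated as a base code point plus any combining marks that follow it, which is much cruder than real grapheme segmentation. The position is tracked as a byte offset, as in the strlen approach described above.

```python
import unicodedata


def clusters(text):
    """Split text into approximate grapheme clusters.

    Simplification (an assumption of this sketch): a cluster is one base
    code point plus any combining marks that follow it. Full grapheme
    segmentation has more rules (Hangul, emoji sequences, and so on).
    """
    out = []
    for ch in text:
        if out and unicodedata.combining(ch):
            out[-1] += ch  # attach the combining mark to the previous base
        else:
            out.append(ch)
    return out


def extract(data, start, limit, unit="bytes"):
    """Return (chunk, next_start): the whole clusters that fit in `limit`.

    data  -- UTF-8 encoded bytes
    start -- byte offset into data, assumed to sit on a cluster boundary
    unit  -- "bytes" (use case 1) or "chars" (code points, use case 2)
    """
    used = 0
    end = start
    for cluster in clusters(data[start:].decode("utf-8")):
        encoded = cluster.encode("utf-8")
        cost = len(encoded) if unit == "bytes" else len(cluster)
        if used + cost > limit:
            break  # the next cluster would not fit whole
        used += cost
        end += len(encoded)
    return data[start:end], end


def fill_buffers(data, limit, unit="bytes"):
    """Cut data into consecutive chunks, tracking position as a byte offset."""
    pos = 0
    chunks = []
    while pos < len(data):
        chunk, next_pos = extract(data, pos, limit, unit)
        if next_pos == pos:
            raise ValueError("a single cluster exceeds the buffer limit")
        chunks.append(chunk)
        pos = next_pos  # equivalent to pos + len(chunk)
    return chunks
```

For example, fill_buffers("e\u0301a\u0308o".encode("utf-8"), 4) splits the 7-byte input into a 3-byte chunk and a 4-byte chunk without separating an accent from its base letter. Passing unit="chars" applies the same limit in code points instead, which is the Oracle-style constraint from use case 2.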