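mb_substr($str, $i, 1) is always going to be slow in a loop, because with a
variable-width encoding like UTF-8 it has to walk from the beginning of the
string to find the $i-th character. Doing that once per character makes the
loop as a whole O(N^2).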
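To make the quadratic bound concrete, assume (as the argument above does) that
reaching offset i costs about i character steps. Summed over a string of N
characters:

```latex
\sum_{i=0}^{N-1} i \;=\; \frac{N(N-1)}{2} \;\in\; O(N^2)
```

For N = 1000 that is 499,500 character steps, versus 1,000 for a single pass.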

In theory, it should be much faster to peel off just the first character on
each pass, e.g.:

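$rest = $str; // assumes mb_internal_encoding("UTF-8"), as in Rob's example
while ($rest !== '') {
  $char = mb_substr($rest, 0, 1); // take the first character
  $rest = mb_substr($rest, 1);    // drop it and keep the remainder
}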

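This at least avoids rescanning from the start of the string for every
offset. Strictly it is not quite O(N), since mb_substr($rest, 1) also builds
a fresh copy of the remaining string on each pass.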
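If strict single-pass behavior matters, one option (a swapped-in technique,
not something suggested in this thread) is to walk the string by byte offset
and read each character's length from its UTF-8 lead byte. The helper name
utf8_chars is hypothetical, and the sketch assumes the input is valid UTF-8;
malformed bytes are not detected.

```php
<?php
// Hypothetical helper, not part of PHP or mbstring: split a UTF-8 string
// into characters in one pass over its bytes. Each character's byte length
// is read from its lead byte, so the remainder is never rescanned or recopied.
// Assumes valid UTF-8; malformed input is not detected.
function utf8_chars($str)
{
    $chars = array();
    $len = strlen($str); // length in bytes, not characters
    $pos = 0;
    while ($pos < $len) {
        $lead = ord($str[$pos]);
        if ($lead < 0x80) {
            $n = 1; // 0xxxxxxx: single byte (ASCII)
        } elseif ($lead < 0xE0) {
            $n = 2; // 110xxxxx: two-byte sequence
        } elseif ($lead < 0xF0) {
            $n = 3; // 1110xxxx: three-byte sequence
        } else {
            $n = 4; // 11110xxx: four-byte sequence
        }
        $chars[] = substr($str, $pos, $n);
        $pos += $n;
    }
    return $chars;
}

echo implode(' ', utf8_chars("åèö")), "\n"; // prints "å è ö"
```

It uses only strlen, ord, and substr, so it does not depend on the mbstring
extension, at the cost of hard-coding UTF-8.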

I also like Dan's idea of using preg_split.
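Dan's message isn't quoted here, so this is an assumption about what he
meant: the common form of the idea splits on the empty pattern with the /u
(UTF-8) modifier. Note that preg_split returns false if the subject is not
valid UTF-8.

```php
<?php
// Sketch of the preg_split idiom (assumed, not quoted from Dan's message).
// The empty pattern matches at every character boundary; the /u modifier
// makes PCRE treat the subject as UTF-8, so multibyte characters stay whole.
// PREG_SPLIT_NO_EMPTY discards the empty strings at either end.
$str = "string with utf-8 chars åèö";
$chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);

// Example of per-character processing: reversing the string, which a
// byte-wise strrev() would break for the multibyte characters.
$reversed = implode('', array_reverse($chars));
echo $reversed, "\n"; // prints "öèå srahc 8-ftu htiw gnirts"
```

The split is done once up front, after which the loop body just indexes into
an ordinary PHP array.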

Regards,
John Campbell

On Wed, Jan 13, 2010 at 10:02 AM, Rob Marscher
<rmarsc...@beaffinitive.com> wrote:
> Hi all,
>
> I have a need to iterate through a multibyte string to process the string 
> character by character.  Hopefully in php6, this will work without any 
> special work, but as we know we need to use special multibyte string 
> functions in php5 to work with utf-8 characters.  Here's an example that 
> illustrates my dilemma:
>
> <?php
> mb_internal_encoding("UTF-8");
>
> $str = "string with utf-8 chars åèö";
> $length = mb_strlen($str);
> $brokenStr = "";
> $preservedStr = "";
>
> for ($i = 0; $i < $length; $i++) {
>  $brokenStr .= $str[$i];
>  $preservedStr .= mb_substr($str, $i, 1);
> }
> echo "brokenStr = " . $brokenStr . "\n";
> echo "preservedStr = " . $preservedStr . "\n";
> ?>
>
> The array notation for strings is the normal way to do this with regular 
> strings: $str[$i].  I assume this will work for multibyte strings in php6.
>
> -- Is using mb_substr($str, $i, 1) the only way to get this to work in php5?  
> That's my question.
>
> It seems like it's going to be many times slower according to some of the 
> comments I've seen on the multibyte functions in the php manual.
>
> Thanks!!
> -Rob
>
> _______________________________________________
> New York PHP Users Group Community Talk Mailing List
> http://lists.nyphp.org/mailman/listinfo/talk
>
> http://www.nyphp.org/Show-Participation
>