You forgot
mb_internal_encoding("UTF-8");

Without that, mb_substr uses the default internal encoding, which is
single-byte on this setup, so it behaves just like substr.
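
A quick way to see the difference, as a rough sketch rather than anything
taken from the benchmark: pass the encoding explicitly instead of relying on
the internal default. The sample string and variable names here are my own,
and the bytes are written as escapes so the file's encoding doesn't matter.

```php
<?php
// "åèö" as raw UTF-8 bytes: 3 characters, 6 bytes.
$s = "\xC3\xA5\xC3\xA8\xC3\xB6";

// Single-byte encoding: every byte counts as one character,
// so mb_substr returns the same single byte that substr does.
$asBytes = mb_substr($s, 0, 1, 'ISO-8859-1');
var_dump($asBytes === substr($s, 0, 1));   // bool(true)

// UTF-8: the first whole character (2 bytes) comes back.
$asUtf8 = mb_substr($s, 0, 1, 'UTF-8');
var_dump($asUtf8 === "\xC3\xA5");          // bool(true)

// The lengths differ the same way.
var_dump(mb_strlen($s, 'ISO-8859-1'));     // int(6)
var_dump(mb_strlen($s, 'UTF-8'));          // int(3)
```

Calling mb_internal_encoding("UTF-8") once at the top of the script should
have the same effect as passing 'UTF-8' to every call.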

my results look like:

normal iteration took 0.64724087715149
mb_substr method took 16.471849918365
mb_substr method with shortening the string took 21.613878965378
preg_split method took 1.927277803421

Dan is the winner.  preg_split always runs in linear time.  Both of
the mb_substr methods are O(N^2), because the first step in mb_substr
is splitting the string into an array.  It is not as intelligent as I
initially assumed.
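
To spell out the arithmetic: if every mb_substr call costs time proportional
to the string length N, then N calls cost about N * N, while one preg_split
pass plus a plain array walk stays proportional to N. Here is a minimal
sketch of the two approaches side by side. The function names are made up
for illustration, and it assumes the mbstring extension and valid UTF-8
input.

```php
<?php
// Hypothetical helper: split once, then index the array.
// Assumes valid UTF-8; preg_split returns false on invalid input.
function utf8_chars_split(string $s): array
{
    return preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
}

// Hypothetical helper: one mb_substr call per character.
// Same result, but each call has to process the string again.
function utf8_chars_substr(string $s): array
{
    $out = [];
    $n = mb_strlen($s, 'UTF-8');
    for ($i = 0; $i < $n; $i++) {
        $out[] = mb_substr($s, $i, 1, 'UTF-8');
    }
    return $out;
}

// Same kind of test string as the benchmark, with the accented
// characters written as UTF-8 byte escapes.
$s = str_repeat("string with utf-8 chars\n   \xC3\xA5\xC3\xA8\xC3\xB6", 10);

var_dump(utf8_chars_split($s) === utf8_chars_substr($s)); // bool(true)
```

Both give the same array of characters, so the only thing to choose between
them is the running time.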

Regards,
John Campbell

On Wed, Jan 13, 2010 at 11:37 AM, Rob Marscher
<rmarsc...@beaffinitive.com> wrote:
> OK.  Here are the results of my rough benchmark.  Every time I ran it, the 
> results were within about .025 seconds of each other so it seems accurate.  
> Surprisingly, my original mb_substr method won, with preg_split taking just a 
> little bit longer.  John's method of grabbing the first character and then 
> removing it from the string actually seems to take almost exponentially more 
> time based on how long the string is.  I set $strSize to 1000 and had to kill 
> it because I didn't want to wait so long.  There must be something pretty 
> inefficient going on in mb_substr to make that the case.  I suppose we could 
> look at the source to get to the bottom of it... but I think I've already 
> spent as much time on this as I'm willing to.  Thanks again to you guys.
>
> $ php mbtest.php
> normal iteration took 0.8041729927063
> mb_substr method took 1.7228858470917
> mb_substr method with shortening the string took 7.9840841293335
> preg_split method took 2.1547298431396
>
> $ cat mbtest.php
> <?php
>
> $strSize = 100;
> $repeats = 1000;
>
> // make the string somewhat large
> $str = '';
> for ($i = 0; $i < $strSize; $i++) {
>        $str .= "string with utf-8 chars\n   åèö";
> }
>
> // non-multibyte iteration
> $start = microtime(true);
> for ($i = 0; $i < $repeats; $i++) {
>        $length = strlen($str);
>        $newStr = '';
>        for ($j = 0; $j < $length; $j++) {
>                $newStr .= $str[$j]; // byte offset; the {} offset syntax was removed in PHP 8
>        }
> }
> $end = microtime(true);
> echo "normal iteration took " . ($end - $start) . "\n";
>
> // mb_substr method
> $start = microtime(true);
> for ($i = 0; $i < $repeats; $i++) {
>        $length = mb_strlen($str);
>        $newStr = '';
>        $rest = $str;
>        for ($j = 0; $j < $length; $j++) {
>                $newStr .= mb_substr($rest, $j, 1);
>        }
> }
> $end = microtime(true);
> echo "mb_substr method took " . ($end - $start) . "\n";
>
> // mb_substr method, shortening string
> $start = microtime(true);
> for ($i = 0; $i < $repeats; $i++) {
>        $length = mb_strlen($str);
>        $newStr = '';
>        $rest = $str;
>        while (strlen($rest) > 0) { // a bare while ($rest) would stop early on a remainder of "0"
>                $newStr .= mb_substr($rest, 0, 1);
>                $rest = mb_substr($rest, 1);
>        }
> }
> $end = microtime(true);
> echo "mb_substr method with shortening the string took " . ($end - $start) . "\n";
>
> // preg_split method
> $start = microtime(true);
> for ($i = 0; $i < $repeats; $i++) {
>        $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
>        $length = count($chars);
>        $newStr = '';
>        for ($j = 0; $j < $length; $j++) {
>                $newStr .= $chars[$j]; // .= concatenates; += is numeric addition
>        }
> }
> $end = microtime(true);
> echo "preg_split method took " . ($end - $start) . "\n";
>
>
>
> _______________________________________________
> New York PHP Users Group Community Talk Mailing List
> http://lists.nyphp.org/mailman/listinfo/talk
>
> http://www.nyphp.org/Show-Participation
>