ID: 47643
Updated by: [email protected]
Reported By: viper7 at viper-7 dot com
Status: Assigned
Bug Type: Performance problem
Operating System: *
PHP Version: 5.2.6+, 5.3, 6CVS (2009-04-13)
Assigned To: dmitry
New Comment:
The problem occurs because of a "bad" patch for bug #42838.
The diff algorithm sorts the arrays using qsort and then assumes that they
are sorted correctly. But with a user comparison function this can't be
guaranteed. Thus in ext/standard/tests/array/bug42838.phpt
key_compare_func() can't sort the array correctly because the expressions
(0 < 'a') and (0 > 'a') are both false ('a' is interpreted as the number 0).
It should be fixed in some way.
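To illustrate the point about the comparison function, here is a minimal sketch. The function name loose_compare_php5() is hypothetical and is not the key_compare_func() from the test file; it only emulates, with an explicit cast, the PHP 5 rule that a non-numeric string compared with an integer is treated as 0. Under that assumption the comparator reports 0 as equal to both 'a' and 'b' while still ordering 'a' before 'b', so it does not define a consistent ordering and a sort driven by it cannot be relied on.

```php
<?php
// Hypothetical comparator (an assumption for illustration only).
// It emulates PHP 5 loose comparison: when an integer meets a string,
// the string is converted to a number, so 'a' becomes 0.
function loose_compare_php5($a, $b)
{
    if (is_int($a) && is_string($b)) {
        $b = (int) $b;   // non-numeric string becomes 0
    } elseif (is_string($a) && is_int($b)) {
        $a = (int) $a;   // non-numeric string becomes 0
    }
    if ($a < $b) {
        return -1;
    }
    if ($a > $b) {
        return 1;
    }
    return 0;
}

echo loose_compare_php5(0, 'a'), "\n";   // 0: neither (0 < 'a') nor (0 > 'a') holds
echo loose_compare_php5(0, 'b'), "\n";   // 0: same for 'b'
echo loose_compare_php5('a', 'b'), "\n"; // -1: yet 'a' and 'b' are not equal
```

Because "equal to 0" does not imply "equal to each other" here, the position of 0 relative to 'a' and 'b' after sorting depends on the order in which the sort happens to compare the elements.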
Previous Comments:
------------------------------------------------------------------------
[2009-06-30 15:22:24] [email protected]
Dmitry, could you have a look? I have no idea why this occurs.
------------------------------------------------------------------------
[2009-06-30 15:19:43] viper7 at viper-7 dot com
I've tracked down the change that broke things; this is it, but the
exact reason is beyond me, heh. Hopefully this helps.
http://cvs.php.net/viewvc.cgi/php-src/ext/standard/array.c?r1=1.308.2.21.2.51&r2=1.308.2.21.2.52&pathrev=PHP_5_2
------------------------------------------------------------------------
[2009-03-24 21:19:01] cisa at cisa85 dot de
As I described in [1], I use this function to get the performance I
need:
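function array_diff_fast($data1, $data2) {
    // Flip both arrays so that the values become keys (hash lookups).
    $data1 = array_flip($data1);
    $data2 = array_flip($data2);
    // Remove every value of $data2 from $data1.
    foreach ($data2 as $hash => $key) {
        if (isset($data1[$hash])) unset($data1[$hash]);
    }
    // Flip back to restore the original key => value orientation.
    return array_flip($data1);
}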
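One caveat with the workaround above: array_flip() only accepts string and integer values, and flipping collapses duplicate values of $data1 into a single entry. A possible variant that keeps the original keys and duplicates is sketched below; the name array_diff_lookup() is hypothetical, and the sketch assumes every value is a string or an integer.

```php
<?php
// Hypothetical helper, not part of PHP: a hash-lookup diff that keeps the
// keys and duplicate values of $data1.
// Assumption: all values are strings or integers (required by array_flip).
function array_diff_lookup(array $data1, array $data2)
{
    // Values of $data2 become keys, so membership is a single isset() call.
    $seen = array_flip($data2);
    $result = array();
    foreach ($data1 as $key => $value) {
        if (!isset($seen[$value])) {
            $result[$key] = $value;   // keep the original key
        }
    }
    return $result;
}

// Usage: 'b' is removed, both 'a' entries survive with their keys.
print_r(array_diff_lookup(array('a', 'b', 'a', 'c'), array('b')));
```

For inputs like the md5 hashes in this report, the result should match what array_diff() returns, but the loop does one hash lookup per element instead of relying on sorted input.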
Thanks to Viper for his help.
[1]
http://nohostname.de/blog/2009/03/24/bug-gefunden-array_diff-in-php-526-unglaublich-langsam/
------------------------------------------------------------------------
[2009-03-13 11:49:36] viper7 at viper-7 dot com
Description:
------------
This bug was reported in ##php on freenode, and after some thorough
testing on multiple machines we determined it must be an engine bug.
array_diff() on two large arrays of md5 hashes (600,000 elements each)
takes approximately 4 seconds on a fast server in PHP 5.2.4 and below
(confirmed with PHP 5.2.0), but over 4 hours (!) on PHP 5.2.6 and
greater (confirmed with PHP 5.2.9 and PHP 5.3.0 beta2).
Reproduce code:
---------------
<?php
$i = 0; $j = 500000;
while ($i < 600000) {
    $i++; $j++;
    $data1[] = md5($i);
    $data2[] = md5($j);
}
$time = microtime(true);
echo "Starting array_diff\n";
$data_diff1 = array_diff($data1, $data2);
$time = microtime(true) - $time;
echo 'array_diff() took ' . number_format($time, 3) . ' seconds and returned '
    . count($data_diff1) . " entries\n";
?>
Expected result:
----------------
Starting array_diff
array_diff() took 3.778 seconds and returned 500000 entries
Actual result:
--------------
Starting array_diff
array_diff() took 14826.278 seconds and returned 500000 entries
------------------------------------------------------------------------