Edit report at https://bugs.php.net/bug.php?id=64518&edit=1
ID: 64518
User updated by: php at richardneill dot org
Reported by: php at richardneill dot org
Summary: optimisation for "for ($i=0; $i<count($array);$i++)"
when $array is global
Status: Not a bug
Type: Bug
Package: Performance problem
PHP Version: 5.4.13
Block user comment: N
Private report: N
New Comment:
> The issue you are seeing is that `global $data` is a by-reference fetch
> (unlike `$data = $GLOBALS['data']`, which is by-value). The by-ref pass
> prevents us from doing a copy-on-write optimization when passing the array to
> the by-value count() function, so the array has to be copied every single
> time.
Thanks for the explanation; that makes sense. However, it also implies that
using "$GLOBALS['var']" is always faster, sometimes much faster, than
"global $var". If that is true, might I suggest that this is a documentation
issue that should be addressed?
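To make the comparison concrete, here is a minimal sketch of the two access patterns being compared (the array contents and function names are made up for illustration; the copying behaviour described is PHP 5 semantics, per the explanation quoted above):

```php
<?php
// Illustrative comparison only (hypothetical array; PHP 5 semantics as
// described in the quoted explanation). 'global' binds by reference, so
// passing the array by value to count() forces a full copy on every call;
// the $GLOBALS fetch below stays copy-on-write.
$data = range(0, 9999);

function via_global() {
    global $data;               // by-reference fetch
    return count($data);        // by-value pass: PHP 5 copies the array here
}

function via_globals_array() {
    $local = $GLOBALS['data'];  // by-value fetch: no real copy is made
    return count($local);
}
```

Both functions return the same count; on PHP 5, only the first pays an O(n) copy per call.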
Previous Comments:
------------------------------------------------------------------------
[2013-03-26 13:34:40] [email protected]
The issue you are seeing is that `global $data` is a by-reference fetch (unlike
`$data = $GLOBALS['data']`, which is by-value). The by-ref pass prevents us
from doing a copy-on-write optimization when passing the array to the by-value
count() function, so the array has to be copied every single time.
The specific count() case is something we could optimize by adding a read-only
passing mode for function arguments (that never separates zvals). As that is
something that's also useful for a few other things, it might well be worth
doing.
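The by-reference vs by-value distinction described above can also be observed without any timing, via write visibility (a minimal sketch; the variable and function names are made up):

```php
<?php
// Minimal sketch of the semantic difference described above:
// 'global' aliases the global variable (writes are visible outside),
// while a by-value fetch from $GLOBALS takes an independent copy.
$counter = array('n' => 0);

function bump_via_global() {
    global $counter;            // by-reference binding
    $counter['n']++;            // modifies the real global
}

function bump_via_copy() {
    $c = $GLOBALS['counter'];   // by-value fetch (copy-on-write)
    $c['n']++;                  // modifies only the local copy
}

bump_via_global();              // global $counter['n'] is now 1
bump_via_copy();                // global $counter['n'] is still 1
```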
------------------------------------------------------------------------
[2013-03-26 06:34:44] php at richardneill dot org
... and getting closer to the source of the problem...
function a(){          // about 1400 us
    global $data;
    $n = $data;
}

function b(){          // much, much quicker: about 18 us
    $n = $GLOBALS['data'];
}
Something is definitely wrong here that makes "global" so much slower than
$GLOBALS. Also, it isn't a read-only optimisation that makes b() so much
quicker than a(); the difference still applies even when we want to copy back
into the global scope.
Consider:
function x(){
    $n = $GLOBALS['data'];
    $GLOBALS['data'] = count($n);
}
and x() is always about as fast as b().
A secondary problem: if x() is rewritten as:
function y(){
    $GLOBALS['data'] = count($GLOBALS['data']);
}
then it can be either fast or slow depending on whether other function calls
contain "global $data" or not.
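That intermittent behaviour can be sketched as follows (PHP 5 semantics as I understand them; leaves_a_reference() is a made-up name). Once any call has executed "global $data", $GLOBALS['data'] stays flagged as a reference, so later by-value uses of it must copy the whole array:

```php
<?php
// Sketch of the intermittent slowdown described above (PHP 5 semantics;
// the helper name is hypothetical). The reference flag set by 'global'
// persists after the function returns.
$data = range(0, 9999);

function y() {
    $GLOBALS['data'] = count($GLOBALS['data']);
}

function leaves_a_reference() {
    global $data;    // flags $GLOBALS['data'] as part of a reference set
}

leaves_a_reference();
y();                 // on PHP 5 this call now pays the array-copy cost
```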
------------------------------------------------------------------------
[2013-03-26 06:09:30] php at richardneill dot org
Here's a simpler test case. The problem is actually that accessing global
variables from within functions is much slower than accessing local variables
(and I was seeing this exacerbated by looping over them with count()).
Note that passing the data directly as a function argument is much faster than
reaching for it as a global variable - this seems to me to be the root of the
problem. i.e.
function fast($a, $b, $c, $large_array){
    // do stuff - this is fast.
}

function slow($a, $b, $c){
    global $large_array;
    // do stuff - this is SLOW.
}
--------------------------------------------
#!/usr/bin/php -derror_reporting=E_ALL -ddisplay_errors=1
<?php
$NUMBER = 100000; // adjust for your computer.
for ($i = 0; $i < $NUMBER; $i++){
    $data[$i] = "$i";
}

echo "Copy in main()...\n";
$t1 = microtime(true);
$copy = $data;
$t2 = microtime(true);
$time = ($t2 - $t1) * 1E6;
echo "Finished in $time microseconds.\n\n"; # 5.0 us

function f1(){
    global $NUMBER;
    echo "Copy local variable in function...\n";
    for ($i = 0; $i < $NUMBER; $i++){
        $data_loc[$i] = "$i";
    }
    $t1 = microtime(true);
    $copy = $data_loc;
    $t2 = microtime(true);
    $time = ($t2 - $t1) * 1E6;
    echo "Finished in $time microseconds.\n\n"; # 1.9 us
}

function f2(){
    global $data;
    echo "Copy global variable into function...\n";
    $t1 = microtime(true);
    $n = $data;
    $t2 = microtime(true);
    $time = ($t2 - $t1) * 1E6;
    echo "Finished in $time microseconds.\n\n"; # 9557 us
}

function f3($data){
    echo "Pass variable into function...\n";
    $t1 = microtime(true);
    $n = $data;
    $t2 = microtime(true);
    $time = ($t2 - $t1) * 1E6;
    echo "Finished in $time microseconds.\n\n"; # 0.95 us
}

f1();
f2();
f3($data);

echo "Note that as \$NUMBER is changed by factors of 10, the first two times
don't change much, but the time in f2() scales as O(\$NUMBER).\n";
?>
------------------------------------------------------------------------
[2013-03-26 05:17:44] php at richardneill dot org
Description:
------------
If an array is counted inside a for-loop, this is normally only slightly
inefficient:

$n = count($array);                       // [1] optimised, and a bit faster.
for ($i = 0; $i < $n; $i++){
    // do stuff
}

for ($i = 0; $i < count($array); $i++){   // [2] perfectly respectable speed:
    // do stuff                           //     very nearly as fast as [1].
}
BUT, if our for-loop is inside a function AND our $array is a global variable,
then method [2] becomes extremely slow (though method [1] remains unaffected).
Below, the problematic function is add_slow(); the script does the same thing
four ways to demonstrate that three of them work well and one is very slow.
Test script:
---------------
#!/usr/bin/php -derror_reporting=E_ALL -ddisplay_errors=1
<?php
echo "This demonstrates the problem with counting global arrays inside loops
inside functions.\n";

$NUMBER = 10000; // adjust for your computer.
for ($i = 0; $i < $NUMBER; $i++){
    $data[$i] = "$i";
}

// This is the problematic function: $data is global, and the count() is
// inside the for(). It is REALLY slow: 4000 times slower than the others!
function add_slow(){
    global $data;
    $sum = 0;
    for ($i = 0; $i < count($data); $i++){
        $sum += $data[$i];
    }
    echo "Total (slow) is $sum.\n";
}

// This one is fine. The count() is optimised by taking it out of the for()
// ... but we're still using a global array, so that's not the problem.
function add_fast(){
    global $data;
    $sum = 0;
    $n = count($data);
    for ($i = 0; $i < $n; $i++){
        $sum += $data[$i];
    }
    echo "Total (fast) is $sum.\n";
}

// This one is also fine. The count() is NOT optimised out of the loop, but
// it still runs almost as quickly, because $data is a local variable here.
function add_local(){
    global $NUMBER;
    for ($i = 0; $i < $NUMBER; $i++){
        $data[$i] = "$i";
    }
    $sum = 0;
    for ($i = 0; $i < count($data); $i++){
        $sum += $data[$i];
    }
    echo "Total (local) is $sum.\n";
}

echo "Calling add_slow()...\n";
$t1 = microtime(true);
add_slow();
$t2 = microtime(true);
$time = round($t2 - $t1, 3);
echo "Finished in $time seconds.\n\n";

echo "Calling add_fast()...\n";
$t1 = microtime(true);
add_fast();
$t2 = microtime(true);
$time = round($t2 - $t1, 3);
echo "Finished in $time seconds.\n\n";

echo "Calling add_local()...\n";
$t1 = microtime(true);
add_local();
$t2 = microtime(true);
$time = round($t2 - $t1, 3);
echo "Finished in $time seconds.\n\n";

echo "Not using a function...\n";
$t1 = microtime(true);
$sum = 0;
for ($i = 0; $i < count($data); $i++){ // This loop is in the main scope;
    $sum += $data[$i];                 // it's also fast.
}
echo "Total (main loop) is $sum.\n";
$t2 = microtime(true);
$time = round($t2 - $t1, 3);
echo "Finished in $time seconds.\n\n";
?>
Expected result:
----------------
All four ways of summing this array should run in similar amounts of time (a
few milliseconds). In fact, add_slow() is about 3900 times slower.
Actual result:
--------------
This demonstrates the problem with counting global arrays inside loops inside
functions.
Calling add_slow()...
Total (slow) is 49995000.
Finished in 7.86 seconds.
Calling add_fast()...
Total (fast) is 49995000.
Finished in 0.002 seconds.
Calling add_local()...
Total (local) is 49995000.
Finished in 0.006 seconds.
Not using a function...
Total (main loop) is 49995000.
Finished in 0.002 seconds.
------------------------------------------------------------------------