ID: 52832
User updated by: galaxy dot mipt at gmail dot com
Reported by: galaxy dot mipt at gmail dot com
Summary: unserialize() performance
Status: Open
Type: Feature/Change Request
Package: Performance problem
Operating System: Linux
PHP Version: 5.3.3
Block user comment: N
New Comment:
Added a patch against the latest SVN version, done in a way that required
the least code modification. Here is the test script:
<?php
ini_set('memory_limit', '512M');

$sizes = array(100000, 200000, 500000, 1000000);
foreach ($sizes as $N) {
    // Build a plain array of $N random integers.
    $data = array();
    for ($i = 0; $i < $N; $i++) $data[] = mt_rand();

    // Accumulate timings over 10 runs.
    $timeSerialize = 0;
    $timeUnserialize = 0;
    for ($run = 0; $run < 10; $run++) {
        $ts = microtime(true);
        $ser = serialize($data);
        $timeSerialize += microtime(true) - $ts;

        $ts = microtime(true);
        $unser = unserialize($ser);
        $timeUnserialize += microtime(true) - $ts;

        // Verify the round trip.
        if (count($data) != count($unser))
            print "Error: array sizes mismatch\n";
        for ($i = 0; $i < $N; $i++)
            if (!isset($unser[$i]) || $data[$i] != $unser[$i])
                print "Error: array elements mismatch\n";

        unset($ser);
        unset($unser);
    }
    print "Size: $N\t\tSerialize: " . floor(1000 * $timeSerialize) .
        "ms\t\tUnserialize: " . floor(1000 * $timeUnserialize) . "ms\n\n";
}
?>
The script is fairly memory-hungry, so the array sizes may need to be
reduced depending on available memory.
My test results:

Original PHP:
Size: 100000     Serialize: 483ms      Unserialize: 470ms
Size: 200000     Serialize: 1047ms     Unserialize: 1308ms
Size: 500000     Serialize: 2638ms     Unserialize: 14360ms
Size: 1000000    Serialize: 6319ms     Unserialize: 72744ms

Patched PHP:
Size: 100000     Serialize: 500ms      Unserialize: 357ms
Size: 200000     Serialize: 870ms      Unserialize: 703ms
Size: 500000     Serialize: 2212ms     Unserialize: 1315ms
Size: 1000000    Serialize: 4898ms     Unserialize: 2823ms
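
As a sanity check on the scaling (my reading of the numbers above): in the
original build, doubling the array size from 500000 to 1000000 multiplies
the unserialize time by 72744/14360 ~= 5, i.e. clearly superlinear growth,
while in the patched build the same doubling gives 2823/1315 ~= 2.1, i.e.
roughly linear behavior.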
Previous Comments:
------------------------------------------------------------------------
[2010-09-14 02:58:37] [email protected]
> In my tests doing so reduced the unserialize time from 7 secs to ~0.3 sec
> on a 1000000-element array, and the size dependency apparently changed to
> something more like O(n*log(n))
Could you submit a patch with that modification and a test script that
exemplifies the speedup?
------------------------------------------------------------------------
[2010-09-14 02:46:32] galaxy dot mipt at gmail dot com
Description:
------------
Performance of the built-in unserializer degrades at an unexpectedly high
rate as the amount of unserialized data grows (more precisely, with the
number of serialized items). For example, unserializing a plain array of
~1000000 integers can take some 10 seconds on an average P4 machine, and
the worst part is that the time grows quadratically (O(n^2)) with the
array size, i.e. a ~2000000-element array would take 40 seconds or so.

The main performance killer is the var_hash linked list onto which every
extracted variable is pushed. During every push operation (var_push() in
ext/standard/var_unserializer.c) the list is traversed sequentially from
the very beginning up to, in fact, the very end. It appears that appending
from the end (or just storing the last used element elsewhere) would save
a lot of cycles.

In my tests doing so reduced the unserialize time from 7 secs to ~0.3 sec
on a 1000000-element array, and the size dependency apparently changed to
something more like O(n*log(n))
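
To illustrate the idea, here is a minimal standalone C sketch (the node/list
names and both push functions are hypothetical; this is not the actual
var_unserializer.c code or the submitted patch): appending by scanning from
the head costs O(n) per push, while caching the tail makes each push O(1).

#include <stdlib.h>

typedef struct node {
    void *value;
    struct node *next;
} node;

typedef struct {
    node *head;
    node *tail; /* cached last element: the key addition */
} list;

/* Original behavior: walk the list from the head on every push,
   O(n) per call, O(n^2) for n pushes. */
static void push_scan(list *l, void *value)
{
    node *n = malloc(sizeof(*n));
    n->value = value;
    n->next = NULL;
    if (l->head == NULL) {
        l->head = n;
        return;
    }
    node *cur = l->head;
    while (cur->next != NULL)
        cur = cur->next;
    cur->next = n;
}

/* Proposed behavior: append via the cached tail pointer, O(1) per call. */
static void push_tail(list *l, void *value)
{
    node *n = malloc(sizeof(*n));
    n->value = value;
    n->next = NULL;
    if (l->tail != NULL)
        l->tail->next = n;
    else
        l->head = n;
    l->tail = n;
}

int main(void)
{
    list l = { NULL, NULL };
    int values[3] = { 1, 2, 3 };
    for (int i = 0; i < 3; i++)
        push_tail(&l, &values[i]);
    return 0;
}

With the cached tail, n pushes cost O(n) total instead of O(n^2), which is
consistent with the measured speedup above.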
------------------------------------------------------------------------