Edit report at https://bugs.php.net/bug.php?id=53655&edit=1
ID: 53655
Comment by: M dot Slowe at kent dot ac dot uk
Reported by: olav dot morken at uninett dot no
Summary: Improve speed of DOMNode::C14N() on large XML documents
Status: Assigned
Type: Feature/Change Request
Package: DOM XML related
PHP Version: 5.3.4
Assigned To: rrichards
Block user comment: N
Private report: N
New Comment:
This problem can apparently be worked around by patching xmlseclibs.php (in
SimpleSAMLphp, at least).
See
https://groups.google.com/forum/#!msg/simplesamlphp/b6fTf53iq4w/uNhw_NBNzxkJ
for details; the patch is at:
http://pastebin.com/byVmBXHQ
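For readers who cannot reach the links, here is a minimal sketch of one way a
caller-side workaround could look. It is not a copy of the linked patch. It
assumes the node being canonicalized is the document element, and it relies on
the observation in the original report that canonicalizing the whole document
is fast. The helper name c14nPreferDocument() is hypothetical and is not part
of PHP or xmlseclibs.

```php
<?php
// Hypothetical helper (an assumption for illustration, not an existing API).
// When $node is the document element and nothing else at the top level would
// be emitted, canonicalize the owning document instead of the subtree.
function c14nPreferDocument(DOMNode $node, $exclusive = false, $withComments = false)
{
    $doc = $node->ownerDocument;
    if ($doc !== null
        && $doc->documentElement !== null
        && $node->isSameNode($doc->documentElement)
    ) {
        $onlyRoot = true;
        foreach ($doc->childNodes as $child) {
            // Top-level processing instructions (and comments, when comments
            // are requested) appear in the document's canonical form but not
            // in the subtree's, so fall back to the subtree call in that case.
            if ($child->nodeType === XML_PI_NODE
                || ($withComments && $child->nodeType === XML_COMMENT_NODE)
            ) {
                $onlyRoot = false;
                break;
            }
        }
        if ($onlyRoot) {
            return $doc->C14N($exclusive, $withComments);
        }
    }
    // Any other node: use the ordinary subtree canonicalization.
    return $node->C14N($exclusive, $withComments);
}

$doc = new DOMDocument();
$doc->loadXML('<root><a>1</a><b x="2"/></root>');
$viaHelper = c14nPreferDocument($doc->documentElement);
$viaSubtree = $doc->documentElement->C14N();
echo $viaHelper, "\n";
```

This only helps when the signed or digested node is the root element; any
other subtree still goes through the slow path.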
Previous Comments:
------------------------------------------------------------------------
[2011-01-05 08:33:02] olav dot morken at uninett dot no
Description:
------------
The C14N() function appears to have a runtime that is O(N^2) (or possibly
worse) in the input size, which means that it becomes very slow as the
input grows. For example, an input with around 196000 nodes takes about 290
seconds, while an input with around 486000 nodes takes about 2200 seconds.
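As a rough check on the O(N^2) estimate, the two timings above can be turned
into an empirical growth exponent. This is only a sketch: it assumes the
runtime follows t = c * N^k and uses nothing but the two data points quoted
in the report, so the result is indicative at best.

```php
<?php
// Timings quoted in the report: node count and wall-clock seconds.
$small = array('nodes' => 196000, 'seconds' => 290);
$large = array('nodes' => 486000, 'seconds' => 2200);

// If t = c * N^k, then k = log(t2 / t1) / log(n2 / n1).
$exponent = log($large['seconds'] / $small['seconds'])
          / log($large['nodes'] / $small['nodes']);

printf("Estimated growth exponent: %.2f\n", $exponent);
```

The exponent comes out at roughly 2.2, which fits "quadratic or slightly
worse".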
Note that this problem only occurs when canonicalizing a subtree of the
document. If we canonicalize the whole document, it completes almost
immediately.
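The difference between the two cases can be observed with a sketch like the
following, which times both calls on a synthetic document built in memory.
The helper names buildDocument() and timeCall() and the item count are
assumptions made for illustration; the count is kept small so that the slow
path still finishes quickly. Actual timings will depend on the PHP and
libxml2 versions in use.

```php
<?php
// Hypothetical helper: build <root> with $items simple child elements.
function buildDocument($items)
{
    $doc = new DOMDocument();
    $root = $doc->appendChild($doc->createElement('root'));
    for ($i = 0; $i < $items; $i++) {
        $root->appendChild($doc->createElement('item', (string) $i));
    }
    return $doc;
}

// Hypothetical helper: run a callback, return array(elapsed seconds, result).
function timeCall($callback)
{
    $start = microtime(true);
    $result = call_user_func($callback);
    return array(microtime(true) - $start, $result);
}

$doc = buildDocument(2000);

// Whole document: the case described as completing almost immediately.
list($docTime, $docOut) = timeCall(function () use ($doc) {
    return $doc->C14N();
});

// Subtree (here the document element): the case described as slow.
list($subTime, $subOut) = timeCall(function () use ($doc) {
    return $doc->documentElement->C14N();
});

printf("Whole document: %.4f s\n", $docTime);
printf("Subtree:        %.4f s\n", $subTime);
```

For this input both calls produce the same canonical output, so any
difference in the timings is attributable to how the node set is selected.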
The problem is that canonicalization uses an XPath expression to find the
node set that should be canonicalized. Evaluating the XPath expression takes
longer and longer as the input grows, and the libxml2 xmlC14NDocSaveTo()
function also has to do a lookup in the node set returned by the XPath
expression for every node it encounters.
I believe a better solution would be to do this the way it is done in the
xmlsec library. This library uses the xmlC14NExecute() function instead, which
accepts a callback that determines whether a node should be included in the
result. This should make the speed of canonicalization linear in the input size.
Test script:
---------------
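<?php
// Load a large XML document (file name is a placeholder).
$doc = new DOMDocument();
$doc->load('some-large-xml-file.xml');
$start = microtime(TRUE);
// Canonicalize a subtree (the document element), non-exclusive, no comments.
$doc->documentElement->C14N(FALSE, FALSE);
echo "Done in " . (microtime(TRUE) - $start) . " seconds.\n";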
------------------------------------------------------------------------