On Tue, Feb 12, 2008 at 10:32:42PM -0500, Edward Z. Yang wrote:
Hi Edward,
> Over at the PHP documentation project, we use libxml in order to parse
> and then process our documentation. [1] Recently, some optimization work
> was done to make the loading and resolution of entities inside our XML
> documents faster; [2] the LIBXML_COMPACT flag was the primary change,
I assume you mean XML_PARSE_COMPACT.
> and for some people reduced the processing time of 24 MB worth of XML
> documents spread over thirteen thousand files to a mere five seconds.
>
> However, the performance gains have not been uniform; other systems
> (with comparable or even better hardware specs) still take several
> minutes to parse and validate our document, with memory usage breaking
> into gigabytes (for comparison, the optimization only uses 400 MB when
> it's working properly).
>
> These discrepancies don't appear to be tied to libxml version (2.6.26 is
> one of the ones used on the slow machine) or operating system (Windows
> Vista and Ubuntu Linux have been shown to have this problem).
>
> Any thoughts or ideas as to what may be the cause of these problems?
> Even if they're not "fixable", it would be nice to know why libxml is
> much faster on some systems than others. Thank you!
That's very strange. The libxml2 code itself is of course deterministic,
so the difference seems to be machine-related, and hence tied to the
environment. There are three things I can think of which could lead to
such variations:
- memory pressure: you are building trees, which means a lot of
  small allocations, so depending on the available memory you could
  see huge changes; other applications competing for the memory pool
  can also cause serious problems
- threading problems, or DNS problems
- 32- vs 64-bit machines/systems. If you use XML_PARSE_COMPACT, the
  content of some small text nodes gets stored directly in the node
  structure, in an unused pointer. On a 32-bit machine very few nodes or
  attributes are likely to fit in the 4 bytes (including the terminating
  0), while on a 64-bit box you have 8 bytes to store the string, and a
  lot more can be compacted that way.
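
To make the last point concrete, here is a rough sketch of the arithmetic only. It takes the slot sizes quoted above (4 and 8 bytes, including the terminating 0) at face value and does not call libxml2 at all. The helper names and the sample strings are made up for illustration; they are not part of any libxml2 API.

```python
import struct

# Pointer width of the running interpreter: 4 bytes on a 32-bit build,
# 8 bytes on a 64-bit build. Used here only as the default slot size.
POINTER_SIZE = struct.calcsize("P")


def fits_in_slot(text, slot_size=POINTER_SIZE):
    """Return True if the UTF-8 bytes of text plus a terminating 0 fit in slot_size bytes."""
    return len(text.encode("utf-8")) + 1 <= slot_size


def compactable_fraction(texts, slot_size=POINTER_SIZE):
    """Return the fraction of the given strings that fit in a slot of slot_size bytes."""
    texts = list(texts)
    if not texts:
        return 0.0
    return sum(fits_in_slot(t, slot_size) for t in texts) / len(texts)


if __name__ == "__main__":
    # Hypothetical text-node contents; real documents will differ.
    samples = ["\n", "  ", "PHP", "libxml", "entity", "compact"]
    for size in (4, 8):
        print("slot of", size, "bytes:", compactable_fraction(samples, size))
```

Running something like this over the actual text node contents of your documents, once with a slot size of 4 and once with 8, should give a rough idea of how much less there is to compact on a 32-bit build.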
I would say, check the amount of memory and competing applications, and
make sure you have a fully 64-bit stack.
Daniel
--
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard | virtualization library http://libvirt.org/
[EMAIL PROTECTED] | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/