On Fri, Sep 24, 2010 at 01:47:15AM +0200, Max Kisselew wrote:
[...]
> I wanted to extract all the content from the <token> elements. In the XML
> file without the namespace definitions that takes just a moment (less
> than 30 seconds).
> But when I tried to perform the same on the new file with namespaces, it
> took much longer, more than 30 minutes (!). The XML file was about 7 MB.
>
> Since the same problem occurs when one tries to parse the XML file
> with the LibXML2 binding for Perl, I guess the problem comes from
> LibXML2 itself.
>
> It is also strange that the performance problem seems to grow with the
> number of <token> tags to be parsed. So the first 10 000 tags only need
> about a second. But when we parse the first 20 000 tags, it takes
> 21 seconds! Do you have any idea about the cause of this problem
> and how it could be solved?

Please try with libxml2 directly first, and make sure you have a recent
version:

    xmllint --noout your.xml
    xmllint --version

Second, make sure that you're not exhausting your available memory. If the
system begins to swap, there is no way performance is going to be linear.
7 MB is unlikely to result in swapping, but ...

If xmllint --noout really takes too long, then I will investigate. Provide
me with a gzipped version of the file on some server so I can have a look.

Daniel

-- 
Daniel Veillard      | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
[email protected]    | Rpmfind RPM search engine  http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/
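
To check whether the slowdown really is non-linear, one could time the same extraction at 10 000 and 20 000 tokens on a synthetic file. The following is only a rough sketch: the namespace URI, the <corpus> wrapper, and the helper names are assumptions, since the real file layout is not shown in this thread. Note that Python's standard-library parser is built on expat, not libxml2, so it only gives a baseline for comparison; the optional time_xmllint helper runs the xmllint --noout command suggested above when xmllint is installed.

```python
import io
import os
import shutil
import subprocess
import tempfile
import time
import xml.etree.ElementTree as ET

# Hypothetical namespace URI and document layout (not taken from the real file).
NS = "http://example.org/corpus"


def make_doc(n_tokens):
    """Build a namespaced document containing n_tokens <token> elements."""
    parts = ['<corpus xmlns="%s">' % NS]
    parts.extend("<token>w%d</token>" % i for i in range(n_tokens))
    parts.append("</corpus>")
    return "".join(parts).encode("utf-8")


def extract_tokens(data):
    """Stream through the document and collect the text of every <token>."""
    tag = "{%s}token" % NS  # ElementTree spells namespaced tags as {uri}name
    texts = []
    for _event, elem in ET.iterparse(io.BytesIO(data), events=("end",)):
        if elem.tag == tag:
            texts.append(elem.text or "")
            elem.clear()  # drop the element's content once it has been read
    return texts


def time_xmllint(data):
    """Time `xmllint --noout` on data; return None if xmllint is not on PATH."""
    if shutil.which("xmllint") is None:
        return None
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.xml")
        with open(path, "wb") as f:
            f.write(data)
        start = time.perf_counter()
        subprocess.run(["xmllint", "--noout", path], check=True)
        return time.perf_counter() - start


if __name__ == "__main__":
    for n in (10_000, 20_000):
        data = make_doc(n)
        start = time.perf_counter()
        tokens = extract_tokens(data)
        elapsed = time.perf_counter() - start
        print(n, "tokens:", len(tokens), "extracted in %.3fs" % elapsed,
              "| xmllint --noout:", time_xmllint(data))
```

If doubling the token count roughly doubles the time, parsing scales linearly and the problem probably lies elsewhere (for example in how the tokens are queried); a jump like the reported 1 second to 21 seconds would point to something worse than linear.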
