Re: [xml] performance of parsing docbook with xincludes
On 05/14/2018 09:48 PM, Stefan Sauer wrote: > On 05/14/2018 12:19 PM, Nick Wellnhofer wrote: >> On 13/05/2018 20:54, Stefan Sauer wrote: >>> Lets look at some numbers using glib >>> (https://gitlab.gnome.org/GNOME/glib) >>> >>> cd glib/docs/reference/glib >>> xmllint --timing --xinclude --noout glib-docs.xml >>> Parsing took 0 ms >>> Xinclude processing took 4560 ms >>> Freeing took 91 ms >>> >>> Any idea how I can get more breakdown of whats happening in 'Xinclude >>> processing'? >> >> It seems that "XInclude processing" also contains the time needed to >> parse the included documents, so maybe the XIncludes aren't the issue >> at all (glib-docs.xml is a small document including several larger >> ones). Can you save glib-docs.xml after processing XIncludes and >> check whether parsing the consolidated document is considerably faster? >> >>> Running with "perf record -g -- xmllint --timing --xinclude --noout >>> glib-docs.xml" gets me such a report. >>> >>> + 17.15% 16.69% xmllint libc-2.24.so [.] _int_malloc >>> + 11.93% 11.87% xmllint libc-2.24.so [.] >>> malloc_consolidate >>> + 9.01% 8.97% xmllint libxml2.so.2.9.4 [.] xmlDictLookup >>> + 7.15% 0.00% xmllint ld-2.24.so [.] >>> 0x8021a0022010 >>> + 6.25% 6.21% xmllint libxml2.so.2.9.4 [.] xmlHashAddEntry3 >>> + 6.22% 0.00% xmllint libxml2.so.2.9.4 [.] >>> xmlSAX2IsStandalone >>> + 6.22% 0.00% xmllint [unknown] [.] >>> 0x56413c74c0854810 >>> + 3.95% 3.94% xmllint libxml2.so.2.9.4 [.] xmlHashLookup2 >>> 3.72% 3.70% xmllint libc-2.24.so [.] _int_free >>> + 3.28% 0.00% xmllint [unknown] [.] >>> + 3.06% 3.04% xmllint libxml2.so.2.9.4 [.] >>> xmlFreeDocElementContent >>> + 2.96% 2.91% xmllint libc-2.24.so [.] free >> >> The callgraph based reports (perf report -g or -G) are usually more >> helpful. > > This part looks suspicious: >|--22.98%--0xc2160 >| xmlFreeDoc >| | >| --22.42%--xmlFreeDtd >| | >| |--19.62%--xmlHashFree >| | | >| | |--10.03%--_int_free >| | | | >| | | > --9.56%--malloc_consolidate >| | | >| | |--3.69%--0x7e560 >| | | > xmlFreeDocElementContent >| | | | >| | | > --2.19%--xmlFreeDocElementContent >| | | >| | |--0.71%--0x7face >| | | >| | |--0.66%--0x30498 >| | | >| | --0.61%--0x7fae3 >| | xmlUnlinkNode >| | >| --0.89%--xmlFreeNode > > > Can I tell it to not load dtds in the first place? Is it loading the dtd for > each an every xinclude? > > Stefan All my xincluded files have doctype headers like: http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd; [ ]> If I remove them it seems to become faster. I'll see if I can programmatically strip them all to be really sure though. Stefan > >> >>> Any ideas. Is there a know issues with using xincludes here? >> >> It might be quadratic behavior in the XInclude engine or something >> else entirely. How large is glib-docs.xml after processing XIncludes? >> >> Nick > > > > > ___ > xml mailing list, project page http://xmlsoft.org/ > xml@gnome.org > https://mail.gnome.org/mailman/listinfo/xml ___ xml mailing list, project page http://xmlsoft.org/ xml@gnome.org https://mail.gnome.org/mailman/listinfo/xml
Re: [xml] performance of parsing docbook with xincludes
On 05/14/2018 12:19 PM, Nick Wellnhofer wrote: > On 13/05/2018 20:54, Stefan Sauer wrote: >> Lets look at some numbers using glib >> (https://gitlab.gnome.org/GNOME/glib) >> >> cd glib/docs/reference/glib >> xmllint --timing --xinclude --noout glib-docs.xml >> Parsing took 0 ms >> Xinclude processing took 4560 ms >> Freeing took 91 ms >> >> Any idea how I can get more breakdown of whats happening in 'Xinclude >> processing'? > > It seems that "XInclude processing" also contains the time needed to > parse the included documents, so maybe the XIncludes aren't the issue > at all (glib-docs.xml is a small document including several larger > ones). Can you save glib-docs.xml after processing XIncludes and check > whether parsing the consolidated document is considerably faster? > >> Running with "perf record -g -- xmllint --timing --xinclude --noout >> glib-docs.xml" gets me such a report. >> >> + 17.15% 16.69% xmllint libc-2.24.so [.] _int_malloc >> + 11.93% 11.87% xmllint libc-2.24.so [.] >> malloc_consolidate >> + 9.01% 8.97% xmllint libxml2.so.2.9.4 [.] xmlDictLookup >> + 7.15% 0.00% xmllint ld-2.24.so [.] >> 0x8021a0022010 >> + 6.25% 6.21% xmllint libxml2.so.2.9.4 [.] xmlHashAddEntry3 >> + 6.22% 0.00% xmllint libxml2.so.2.9.4 [.] >> xmlSAX2IsStandalone >> + 6.22% 0.00% xmllint [unknown] [.] >> 0x56413c74c0854810 >> + 3.95% 3.94% xmllint libxml2.so.2.9.4 [.] xmlHashLookup2 >> 3.72% 3.70% xmllint libc-2.24.so [.] _int_free >> + 3.28% 0.00% xmllint [unknown] [.] >> + 3.06% 3.04% xmllint libxml2.so.2.9.4 [.] >> xmlFreeDocElementContent >> + 2.96% 2.91% xmllint libc-2.24.so [.] free > > The callgraph based reports (perf report -g or -G) are usually more > helpful. This part looks suspicious: |--22.98%--0xc2160 | xmlFreeDoc | | | --22.42%--xmlFreeDtd | | | |--19.62%--xmlHashFree | | | | | |--10.03%--_int_free | | | | | | | --9.56%--malloc_consolidate | | | | | |--3.69%--0x7e560 | | | xmlFreeDocElementContent | | | | | | | --2.19%--xmlFreeDocElementContent | | | | | |--0.71%--0x7face | | | | | |--0.66%--0x30498 | | | | | --0.61%--0x7fae3 | | xmlUnlinkNode | | | --0.89%--xmlFreeNode Can I tell it to not load dtds in the first place? Is it loading the dtd for each an every xinclude? Stefan > >> Any ideas. Is there a know issues with using xincludes here? > > It might be quadratic behavior in the XInclude engine or something > else entirely. How large is glib-docs.xml after processing XIncludes? > > Nick ___ xml mailing list, project page http://xmlsoft.org/ xml@gnome.org https://mail.gnome.org/mailman/listinfo/xml
Re: [xml] performance of parsing docbook with xincludes
Seems that the list does not like attachements, so resensing with links On 05/14/2018 12:19 PM, Nick Wellnhofer wrote: > On 13/05/2018 20:54, Stefan Sauer wrote: >> Lets look at some numbers using glib >> (https://gitlab.gnome.org/GNOME/glib) >> >> cd glib/docs/reference/glib >> xmllint --timing --xinclude --noout glib-docs.xml >> Parsing took 0 ms >> Xinclude processing took 4560 ms >> Freeing took 91 ms >> >> Any idea how I can get more breakdown of whats happening in 'Xinclude >> processing'? > > It seems that "XInclude processing" also contains the time needed to > parse the included documents, so maybe the XIncludes aren't the issue > at all (glib-docs.xml is a small document including several larger > ones). Can you save glib-docs.xml after processing XIncludes and check > whether parsing the consolidated document is considerably faster? $ xmllint --timing --xinclude glib-docs.xml >glib-full.xml Parsing took 0 ms Xinclude processing took 4726 ms Saving took 85 ms Freeing took 104 ms $ xmllint --timing --noout glib-full.xml Parsing took 151 ms Freeing took 14 ms $ ls -alh glib-full.xml -rw-r--r-- 1 ensonic users 6.8M May 14 12:42 glib-full.xml Parsing the consolidated doc is a magnitude faster. Thanks for suggesting this test. $ xtime sh -c 'find . -name "*.xml" | grep -v version.xml | xargs xmllint --noout' 0.40u 0.04s 0.44r 70296kB sh -c find . -name "*.xml" | grep -v version.xml | xargs xmllint --noout Even parsing all files like this is >10 times faster. > >> Running with "perf record -g -- xmllint --timing --xinclude --noout >> glib-docs.xml" gets me such a report. >> >> + 17.15% 16.69% xmllint libc-2.24.so [.] _int_malloc >> + 11.93% 11.87% xmllint libc-2.24.so [.] >> malloc_consolidate >> + 9.01% 8.97% xmllint libxml2.so.2.9.4 [.] xmlDictLookup >> + 7.15% 0.00% xmllint ld-2.24.so [.] >> 0x8021a0022010 >> + 6.25% 6.21% xmllint libxml2.so.2.9.4 [.] xmlHashAddEntry3 >> + 6.22% 0.00% xmllint libxml2.so.2.9.4 [.] >> xmlSAX2IsStandalone >> + 6.22% 0.00% xmllint [unknown] [.] >> 0x56413c74c0854810 >> + 3.95% 3.94% xmllint libxml2.so.2.9.4 [.] xmlHashLookup2 >> 3.72% 3.70% xmllint libc-2.24.so [.] _int_free >> + 3.28% 0.00% xmllint [unknown] [.] >> + 3.06% 3.04% xmllint libxml2.so.2.9.4 [.] >> xmlFreeDocElementContent >> + 2.96% 2.91% xmllint libc-2.24.so [.] free > > The callgraph based reports (perf report -g or -G) are usually more > helpful. here is a full perf.log with callgraph. https://gist.github.com/ensonic/a73608edd60a995374e0961d3840eb4f > >> Any ideas. Is there a know issues with using xincludes here? > > It might be quadratic behavior in the XInclude engine or something > else entirely. How large is glib-docs.xml after processing XIncludes? See above. Btw. only the toplevel doc is using xincludes, the included docs don't include other docs: $grep "xi:inc" glib-docs.xml | wc -l 116 > > Nick ___ xml mailing list, project page http://xmlsoft.org/ xml@gnome.org https://mail.gnome.org/mailman/listinfo/xml
Re: [xml] performance of parsing docbook with xincludes
On 13/05/2018 20:54, Stefan Sauer wrote: Lets look at some numbers using glib (https://gitlab.gnome.org/GNOME/glib) cd glib/docs/reference/glib xmllint --timing --xinclude --noout glib-docs.xml Parsing took 0 ms Xinclude processing took 4560 ms Freeing took 91 ms Any idea how I can get more breakdown of whats happening in 'Xinclude processing'? It seems that "XInclude processing" also contains the time needed to parse the included documents, so maybe the XIncludes aren't the issue at all (glib-docs.xml is a small document including several larger ones). Can you save glib-docs.xml after processing XIncludes and check whether parsing the consolidated document is considerably faster? Running with "perf record -g -- xmllint --timing --xinclude --noout glib-docs.xml" gets me such a report. + 17.15% 16.69% xmllint libc-2.24.so [.] _int_malloc + 11.93% 11.87% xmllint libc-2.24.so [.] malloc_consolidate + 9.01% 8.97% xmllint libxml2.so.2.9.4 [.] xmlDictLookup + 7.15% 0.00% xmllint ld-2.24.so [.] 0x8021a0022010 + 6.25% 6.21% xmllint libxml2.so.2.9.4 [.] xmlHashAddEntry3 + 6.22% 0.00% xmllint libxml2.so.2.9.4 [.] xmlSAX2IsStandalone + 6.22% 0.00% xmllint [unknown] [.] 0x56413c74c0854810 + 3.95% 3.94% xmllint libxml2.so.2.9.4 [.] xmlHashLookup2 3.72% 3.70% xmllint libc-2.24.so [.] _int_free + 3.28% 0.00% xmllint [unknown] [.] + 3.06% 3.04% xmllint libxml2.so.2.9.4 [.] xmlFreeDocElementContent + 2.96% 2.91% xmllint libc-2.24.so [.] free The callgraph based reports (perf report -g or -G) are usually more helpful. Any ideas. Is there a know issues with using xincludes here? It might be quadratic behavior in the XInclude engine or something else entirely. How large is glib-docs.xml after processing XIncludes? Nick ___ xml mailing list, project page http://xmlsoft.org/ xml@gnome.org https://mail.gnome.org/mailman/listinfo/xml