Re: [xml] performance of parsing docbook with xincludes

2018-05-14 Thread Stefan Sauer
On 05/14/2018 09:48 PM, Stefan Sauer wrote:
> On 05/14/2018 12:19 PM, Nick Wellnhofer wrote:
>> On 13/05/2018 20:54, Stefan Sauer wrote:
>>> Lets look at some numbers using glib
>>> (https://gitlab.gnome.org/GNOME/glib)
>>>
>>> cd glib/docs/reference/glib
>>> xmllint --timing --xinclude --noout glib-docs.xml
>>> Parsing took 0 ms
>>> Xinclude processing took 4560 ms
>>> Freeing took 91 ms
>>>
>>> Any idea how I can get more breakdown of whats happening in  'Xinclude
>>> processing'?
>>
>> It seems that "XInclude processing" also contains the time needed to
>> parse the included documents, so maybe the XIncludes aren't the issue
>> at all (glib-docs.xml is a small document including several larger
>> ones). Can you save glib-docs.xml after processing XIncludes and
>> check whether parsing the consolidated document is considerably faster?
>>
>>> Running with "perf record -g -- xmllint --timing --xinclude --noout
>>> glib-docs.xml" gets me such a report.
>>>
>>> +   17.15%    16.69%  xmllint  libc-2.24.so    [.] _int_malloc
>>> +   11.93%    11.87%  xmllint  libc-2.24.so    [.]
>>> malloc_consolidate
>>> +    9.01% 8.97%  xmllint  libxml2.so.2.9.4    [.] xmlDictLookup
>>> +    7.15% 0.00%  xmllint  ld-2.24.so  [.]
>>> 0x8021a0022010
>>> +    6.25% 6.21%  xmllint  libxml2.so.2.9.4    [.] xmlHashAddEntry3
>>> +    6.22% 0.00%  xmllint  libxml2.so.2.9.4    [.]
>>> xmlSAX2IsStandalone
>>> +    6.22% 0.00%  xmllint  [unknown]   [.]
>>> 0x56413c74c0854810
>>> +    3.95% 3.94%  xmllint  libxml2.so.2.9.4    [.] xmlHashLookup2
>>>   3.72% 3.70%  xmllint  libc-2.24.so    [.] _int_free
>>> +    3.28% 0.00%  xmllint  [unknown]   [.] 
>>> +    3.06% 3.04%  xmllint  libxml2.so.2.9.4    [.]
>>> xmlFreeDocElementContent
>>> +    2.96% 2.91%  xmllint  libc-2.24.so    [.] free
>>
>> The callgraph based reports (perf report -g or -G) are usually more
>> helpful.
>
> This part looks suspicious:
>|--22.98%--0xc2160
>|  xmlFreeDoc
>|  |  
>|   --22.42%--xmlFreeDtd
>| |  
>| |--19.62%--xmlHashFree
>| |  |  
>| |  |--10.03%--_int_free
>| |  |  |  
>| |  |   
> --9.56%--malloc_consolidate
>| |  |  
>| |  |--3.69%--0x7e560
>| |  |  
> xmlFreeDocElementContent
>| |  |  |  
>| |  |   
> --2.19%--xmlFreeDocElementContent
>| |  |  
>| |  |--0.71%--0x7face
>| |  |  
>| |  |--0.66%--0x30498
>| |  |  
>| |   --0.61%--0x7fae3
>| | xmlUnlinkNode
>| |  
>|  --0.89%--xmlFreeNode
>
>
> Can I tell it to not load dtds in the first place? Is it loading the dtd for 
> each an every xinclude?
>
> Stefan
All my xincluded files have doctype headers like:

http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd; [
]>

If I remove them it seems to become faster. I'll see if I can
programmatically strip them all to be really sure though.

Stefan

>
>>
>>> Any ideas. Is there a know issues with using xincludes here?
>>
>> It might be quadratic behavior in the XInclude engine or something
>> else entirely. How large is glib-docs.xml after processing XIncludes?
>>
>> Nick
>
>
>
>
> ___
> xml mailing list, project page  http://xmlsoft.org/
> xml@gnome.org
> https://mail.gnome.org/mailman/listinfo/xml


___
xml mailing list, project page  http://xmlsoft.org/
xml@gnome.org
https://mail.gnome.org/mailman/listinfo/xml


Re: [xml] performance of parsing docbook with xincludes

2018-05-14 Thread Stefan Sauer
On 05/14/2018 12:19 PM, Nick Wellnhofer wrote:
> On 13/05/2018 20:54, Stefan Sauer wrote:
>> Lets look at some numbers using glib
>> (https://gitlab.gnome.org/GNOME/glib)
>>
>> cd glib/docs/reference/glib
>> xmllint --timing --xinclude --noout glib-docs.xml
>> Parsing took 0 ms
>> Xinclude processing took 4560 ms
>> Freeing took 91 ms
>>
>> Any idea how I can get more breakdown of whats happening in  'Xinclude
>> processing'?
>
> It seems that "XInclude processing" also contains the time needed to
> parse the included documents, so maybe the XIncludes aren't the issue
> at all (glib-docs.xml is a small document including several larger
> ones). Can you save glib-docs.xml after processing XIncludes and check
> whether parsing the consolidated document is considerably faster?
>
>> Running with "perf record -g -- xmllint --timing --xinclude --noout
>> glib-docs.xml" gets me such a report.
>>
>> +   17.15%    16.69%  xmllint  libc-2.24.so    [.] _int_malloc
>> +   11.93%    11.87%  xmllint  libc-2.24.so    [.]
>> malloc_consolidate
>> +    9.01% 8.97%  xmllint  libxml2.so.2.9.4    [.] xmlDictLookup
>> +    7.15% 0.00%  xmllint  ld-2.24.so  [.]
>> 0x8021a0022010
>> +    6.25% 6.21%  xmllint  libxml2.so.2.9.4    [.] xmlHashAddEntry3
>> +    6.22% 0.00%  xmllint  libxml2.so.2.9.4    [.]
>> xmlSAX2IsStandalone
>> +    6.22% 0.00%  xmllint  [unknown]   [.]
>> 0x56413c74c0854810
>> +    3.95% 3.94%  xmllint  libxml2.so.2.9.4    [.] xmlHashLookup2
>>   3.72% 3.70%  xmllint  libc-2.24.so    [.] _int_free
>> +    3.28% 0.00%  xmllint  [unknown]   [.] 
>> +    3.06% 3.04%  xmllint  libxml2.so.2.9.4    [.]
>> xmlFreeDocElementContent
>> +    2.96% 2.91%  xmllint  libc-2.24.so    [.] free
>
> The callgraph based reports (perf report -g or -G) are usually more
> helpful.

This part looks suspicious:

   |--22.98%--0xc2160
   |  xmlFreeDoc
   |  |  
   |   --22.42%--xmlFreeDtd
   | |  
   | |--19.62%--xmlHashFree
   | |  |  
   | |  |--10.03%--_int_free
   | |  |  |  
   | |  |   
--9.56%--malloc_consolidate
   | |  |  
   | |  |--3.69%--0x7e560
   | |  |  
xmlFreeDocElementContent
   | |  |  |  
   | |  |   
--2.19%--xmlFreeDocElementContent
   | |  |  
   | |  |--0.71%--0x7face
   | |  |  
   | |  |--0.66%--0x30498
   | |  |  
   | |   --0.61%--0x7fae3
   | | xmlUnlinkNode
   | |  
   |  --0.89%--xmlFreeNode


Can I tell it to not load dtds in the first place? Is it loading the dtd for 
each an every xinclude?

Stefan

>
>> Any ideas. Is there a know issues with using xincludes here?
>
> It might be quadratic behavior in the XInclude engine or something
> else entirely. How large is glib-docs.xml after processing XIncludes?
>
> Nick


___
xml mailing list, project page  http://xmlsoft.org/
xml@gnome.org
https://mail.gnome.org/mailman/listinfo/xml


Re: [xml] performance of parsing docbook with xincludes

2018-05-14 Thread Stefan Sauer
Seems that the list does not like attachements, so resensing with links

On 05/14/2018 12:19 PM, Nick Wellnhofer wrote:
> On 13/05/2018 20:54, Stefan Sauer wrote:
>> Lets look at some numbers using glib
>> (https://gitlab.gnome.org/GNOME/glib)
>>
>> cd glib/docs/reference/glib
>> xmllint --timing --xinclude --noout glib-docs.xml
>> Parsing took 0 ms
>> Xinclude processing took 4560 ms
>> Freeing took 91 ms
>>
>> Any idea how I can get more breakdown of whats happening in  'Xinclude
>> processing'?
>
> It seems that "XInclude processing" also contains the time needed to
> parse the included documents, so maybe the XIncludes aren't the issue
> at all (glib-docs.xml is a small document including several larger
> ones). Can you save glib-docs.xml after processing XIncludes and check
> whether parsing the consolidated document is considerably faster?

$ xmllint --timing --xinclude glib-docs.xml >glib-full.xml

Parsing took 0 ms
Xinclude processing took 4726 ms
Saving took 85 ms
Freeing took 104 ms

$ xmllint --timing --noout glib-full.xml

Parsing took 151 ms
Freeing took 14 ms

$ ls -alh glib-full.xml

-rw-r--r-- 1 ensonic users 6.8M May 14 12:42 glib-full.xml

Parsing the consolidated doc is a magnitude faster. Thanks for
suggesting this test.

$ xtime sh -c 'find . -name "*.xml" | grep -v version.xml | xargs xmllint 
--noout'
0.40u 0.04s 0.44r 70296kB sh -c find . -name "*.xml" | grep -v version.xml | 
xargs xmllint --noout

Even parsing all files like this is >10 times faster.

>
>> Running with "perf record -g -- xmllint --timing --xinclude --noout
>> glib-docs.xml" gets me such a report.
>>
>> +   17.15%    16.69%  xmllint  libc-2.24.so    [.] _int_malloc
>> +   11.93%    11.87%  xmllint  libc-2.24.so    [.]
>> malloc_consolidate
>> +    9.01% 8.97%  xmllint  libxml2.so.2.9.4    [.] xmlDictLookup
>> +    7.15% 0.00%  xmllint  ld-2.24.so  [.]
>> 0x8021a0022010
>> +    6.25% 6.21%  xmllint  libxml2.so.2.9.4    [.] xmlHashAddEntry3
>> +    6.22% 0.00%  xmllint  libxml2.so.2.9.4    [.]
>> xmlSAX2IsStandalone
>> +    6.22% 0.00%  xmllint  [unknown]   [.]
>> 0x56413c74c0854810
>> +    3.95% 3.94%  xmllint  libxml2.so.2.9.4    [.] xmlHashLookup2
>>   3.72% 3.70%  xmllint  libc-2.24.so    [.] _int_free
>> +    3.28% 0.00%  xmllint  [unknown]   [.] 
>> +    3.06% 3.04%  xmllint  libxml2.so.2.9.4    [.]
>> xmlFreeDocElementContent
>> +    2.96% 2.91%  xmllint  libc-2.24.so    [.] free
>
> The callgraph based reports (perf report -g or -G) are usually more
> helpful.
here is a full perf.log with callgraph.
https://gist.github.com/ensonic/a73608edd60a995374e0961d3840eb4f
>
>> Any ideas. Is there a know issues with using xincludes here?
>
> It might be quadratic behavior in the XInclude engine or something
> else entirely. How large is glib-docs.xml after processing XIncludes?

See above. Btw. only the toplevel doc is using xincludes, the included
docs don't include other docs:

$grep "xi:inc" glib-docs.xml | wc -l

116



>
> Nick


___
xml mailing list, project page  http://xmlsoft.org/
xml@gnome.org
https://mail.gnome.org/mailman/listinfo/xml


Re: [xml] performance of parsing docbook with xincludes

2018-05-14 Thread Nick Wellnhofer

On 13/05/2018 20:54, Stefan Sauer wrote:

Lets look at some numbers using glib (https://gitlab.gnome.org/GNOME/glib)

cd glib/docs/reference/glib
xmllint --timing --xinclude --noout glib-docs.xml
Parsing took 0 ms
Xinclude processing took 4560 ms
Freeing took 91 ms

Any idea how I can get more breakdown of whats happening in  'Xinclude
processing'?


It seems that "XInclude processing" also contains the time needed to parse the 
included documents, so maybe the XIncludes aren't the issue at all 
(glib-docs.xml is a small document including several larger ones). Can you 
save glib-docs.xml after processing XIncludes and check whether parsing the 
consolidated document is considerably faster?



Running with "perf record -g -- xmllint --timing --xinclude --noout
glib-docs.xml" gets me such a report.

+   17.15%    16.69%  xmllint  libc-2.24.so    [.] _int_malloc
+   11.93%    11.87%  xmllint  libc-2.24.so    [.] malloc_consolidate
+    9.01% 8.97%  xmllint  libxml2.so.2.9.4    [.] xmlDictLookup
+    7.15% 0.00%  xmllint  ld-2.24.so  [.] 0x8021a0022010
+    6.25% 6.21%  xmllint  libxml2.so.2.9.4    [.] xmlHashAddEntry3
+    6.22% 0.00%  xmllint  libxml2.so.2.9.4    [.] xmlSAX2IsStandalone
+    6.22% 0.00%  xmllint  [unknown]   [.] 0x56413c74c0854810
+    3.95% 3.94%  xmllint  libxml2.so.2.9.4    [.] xmlHashLookup2
  3.72% 3.70%  xmllint  libc-2.24.so    [.] _int_free
+    3.28% 0.00%  xmllint  [unknown]   [.] 
+    3.06% 3.04%  xmllint  libxml2.so.2.9.4    [.]
xmlFreeDocElementContent
+    2.96% 2.91%  xmllint  libc-2.24.so    [.] free


The callgraph based reports (perf report -g or -G) are usually more helpful.


Any ideas. Is there a know issues with using xincludes here?


It might be quadratic behavior in the XInclude engine or something else 
entirely. How large is glib-docs.xml after processing XIncludes?


Nick
___
xml mailing list, project page  http://xmlsoft.org/
xml@gnome.org
https://mail.gnome.org/mailman/listinfo/xml