Re: [xml] performance of parsing docbook with xincludes
On May 15, 2018, at 21:56, Stefan Sauer wrote:
> On 05/15/2018 08:40 PM, Stefan Sauer wrote:
>> On 05/15/2018 12:42 PM, Nick Wellnhofer wrote:
>>> Can you try to change the line to
>>>
>>>     xmlCtxtUseOptions(pctxt, ctxt->parseFlags);
>>>
>>> and see if it helps?
>>
>> It does not help. I'll experiment further. Thanks for the recommendations.

I think you also have to remove the line at

https://git.gnome.org/browse/libxml2/tree/xinclude.c#n463

    pctxt->loadsubset |= XML_DETECT_IDS;

Looks like the idea is to make sure that ID attributes are detected for
XIncludes with XPointers. IMO, it should be the application's
responsibility to set the XML_PARSE_DTDLOAD flag in this case. But changing
the behavior might break code that relies on this feature.

> Is libxml2 doing that for each file over and over?

Yes.

> Wouldn't it make sense to only load each dtd once?

This would make sense.

> And where exactly is it loaded? (I can only see xmlFreeDtd, but can't
> find an xmlLoadDtd or the like.)

Via xmlParseDocument -> xmlSAX2ExternalSubset -> xmlParseExternalSubset.

Nick
___
xml mailing list, project page http://xmlsoft.org/
xml@gnome.org
https://mail.gnome.org/mailman/listinfo/xml
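Editorial sketch: one possible reading of why changing the options line alone did not help. It assumes the external subset gets loaded whenever the sub-parser's loadsubset field ends up nonzero, whether through a forced DTD-load option or through the separate ID-detection line. That condition is an interpretation of the thread, not a quote from the libxml2 source. MiniCtxt, setup_child, would_load_dtd and the bit values are hypothetical stand-ins, not libxml2 names or definitions.

```c
#include <assert.h>

/* Stand-in bits for illustration only; NOT the libxml2 definitions. */
enum {
    OPT_DTDLOAD   = 1 << 0,  /* plays the role of a "load the DTD" parse option */
    SUBSET_LOAD   = 1 << 0,  /* loadsubset bit set as a result of OPT_DTDLOAD   */
    SUBSET_DETECT = 1 << 1   /* loadsubset bit set by the ID-detection line     */
};

/* Hypothetical miniature of a parser context for an included document. */
typedef struct {
    int options;     /* parse options handed to the sub-parser        */
    int loadsubset;  /* nonzero means "load the external subset"      */
} MiniCtxt;

/*
 * Set up the sub-parser for an included document.
 *   force_dtdload:    1 models OR-ing the DTD-load option into the parent flags
 *   force_detect_ids: 1 models the extra "loadsubset |= ..." line
 */
static MiniCtxt setup_child(int parent_flags, int force_dtdload,
                            int force_detect_ids)
{
    MiniCtxt c;

    assert(force_dtdload == 0 || force_dtdload == 1);
    assert(force_detect_ids == 0 || force_detect_ids == 1);

    c.options = force_dtdload ? (parent_flags | OPT_DTDLOAD) : parent_flags;
    c.loadsubset = 0;
    if (c.options & OPT_DTDLOAD)
        c.loadsubset |= SUBSET_LOAD;
    if (force_detect_ids)
        c.loadsubset |= SUBSET_DETECT;
    return c;
}

/* Assumed trigger: any loadsubset bit causes the external subset to load. */
static int would_load_dtd(MiniCtxt c)
{
    return c.loadsubset != 0;
}
```

Under these assumptions, setup_child(0, 0, 1) still loads the DTD, which matches the report that the first change made no difference; only setup_child(0, 0, 0) skips the load when the application did not ask for it.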
Re: [xml] performance of parsing docbook with xincludes
On 05/15/2018 08:40 PM, Stefan Sauer wrote:
> On 05/15/2018 12:42 PM, Nick Wellnhofer wrote:
>> On 14/05/2018 21:48, Stefan Sauer wrote:
>>> This part looks suspicious:
>>>
>>> |--22.98%--0xc2160
>>> |          xmlFreeDoc
>>> |          |
>>> |           --22.42%--xmlFreeDtd
>>>
>>> Can I tell it to not load dtds in the first place? Is it loading the
>>> dtd for each and every xinclude?
>>
>> Good catch. It seems that the XInclude engine always parses included
>> docs with XML_PARSE_DTDLOAD:
>>
>> https://git.gnome.org/browse/libxml2/tree/xinclude.c#n450
>>
>> If you're not using XML catalogs, this will probably cause the DTD to
>> be loaded over the network multiple times which could explain the
>> slowdown.
>>
>> Can you try to change the line to
>>
>>     xmlCtxtUseOptions(pctxt, ctxt->parseFlags);
>>
>> and see if it helps?
>>
>> Nick
>
> It does not help. I'll experiment further. Thanks for the recommendations.

And FYI, a call graph plot:
https://imgur.com/a/d27xxor

As an experiment I dropped the doctype headers for the (generated)
xincluded files. So now it is 20 files with doctype headers + 105
(generated) files without doctype headers. And voila!

    xmllint --timing --xinclude --noout glib-docs.xml
    Parsing took 0 ms
    Xinclude processing took 447 ms
    Freeing took 19 ms

The docbook header looks like this:

    http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd' [
    http://www.w3.org/2003/XInclude'">
    ]>

and gtk-doc will replicate this for the fragments (replacing 'book' with
e.g. 'refentry'). This way one can e.g. inject things like a version.

I do have /usr/share/xml/docbook/schema/dtd/4.5/docbookx.dtd available
locally. I guess there is no way to avoid loading the dtd then. Is libxml2
doing that for each file over and over? Wouldn't it make sense to only load
each dtd once? And where exactly is it loaded? (I can only see xmlFreeDtd,
but can't find an xmlLoadDtd or the like.)

Sorry for all the questions, but it looks like there is low-hanging fruit
to save a lot of CPU time.

Stefan
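Editorial sketch of the "load each dtd once" idea raised above: a small cache keyed by the DTD's system identifier, so that the expensive load runs once no matter how many included documents reference it. This is not how libxml2 works and not a proposed patch. Dtd, DtdCache, load_dtd, cache_get and cache_free are hypothetical names, and load_dtd merely stands in for fetching and parsing the external subset.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define CACHE_MAX 16

/* Hypothetical stand-in for a parsed DTD; not a libxml2 type. */
typedef struct {
    char *system_id;  /* key, e.g. the docbookx.dtd URL            */
    int   parsed;     /* placeholder for the parsed declarations   */
} Dtd;

typedef struct {
    Dtd   *entries[CACHE_MAX];
    size_t count;     /* number of cached DTDs                     */
    size_t loads;     /* how often the expensive loader really ran */
} DtdCache;

/* Hypothetical expensive step: stands in for fetching and parsing. */
static Dtd *load_dtd(const char *system_id)
{
    Dtd *d = malloc(sizeof *d);
    if (d == NULL)
        return NULL;
    d->system_id = malloc(strlen(system_id) + 1);
    if (d->system_id == NULL) {
        free(d);
        return NULL;
    }
    strcpy(d->system_id, system_id);
    d->parsed = 1;
    return d;
}

/* Return the cached DTD for system_id, loading it only on first use.
 * Returns NULL if the cache is full or allocation fails. */
static Dtd *cache_get(DtdCache *c, const char *system_id)
{
    assert(c != NULL && system_id != NULL);

    for (size_t i = 0; i < c->count; i++) {
        if (strcmp(c->entries[i]->system_id, system_id) == 0)
            return c->entries[i];          /* hit: no reload */
    }
    if (c->count == CACHE_MAX)
        return NULL;

    Dtd *d = load_dtd(system_id);          /* miss: load once */
    if (d == NULL)
        return NULL;
    c->loads++;
    c->entries[c->count++] = d;
    return d;
}

/* Release every cached DTD and reset the cache. */
static void cache_free(DtdCache *c)
{
    for (size_t i = 0; i < c->count; i++) {
        free(c->entries[i]->system_id);
        free(c->entries[i]);
    }
    c->count = 0;
}
```

With a zero-initialized DtdCache, calling cache_get 125 times with the same system identifier (one call per file in the glib-docs example) leaves the loads counter at 1. A real implementation would also have to decide whether a shared DTD may be attached to several documents at once, which this sketch leaves open.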
Re: [xml] performance of parsing docbook with xincludes
On 05/15/2018 12:42 PM, Nick Wellnhofer wrote:
> On 14/05/2018 21:48, Stefan Sauer wrote:
>> This part looks suspicious:
>>
>> |--22.98%--0xc2160
>> |          xmlFreeDoc
>> |          |
>> |           --22.42%--xmlFreeDtd
>>
>> Can I tell it to not load dtds in the first place? Is it loading the
>> dtd for each and every xinclude?
>
> Good catch. It seems that the XInclude engine always parses included
> docs with XML_PARSE_DTDLOAD:
>
> https://git.gnome.org/browse/libxml2/tree/xinclude.c#n450
>
> If you're not using XML catalogs, this will probably cause the DTD to
> be loaded over the network multiple times which could explain the
> slowdown.
>
> Can you try to change the line to
>
>     xmlCtxtUseOptions(pctxt, ctxt->parseFlags);
>
> and see if it helps?
>
> Nick

It does not help. I'll experiment further. Thanks for the recommendations.

Stefan
Re: [xml] performance of parsing docbook with xincludes
On 14/05/2018 21:48, Stefan Sauer wrote:
> This part looks suspicious:
>
> |--22.98%--0xc2160
> |          xmlFreeDoc
> |          |
> |           --22.42%--xmlFreeDtd
>
> Can I tell it to not load dtds in the first place? Is it loading the
> dtd for each and every xinclude?

Good catch. It seems that the XInclude engine always parses included
docs with XML_PARSE_DTDLOAD:

https://git.gnome.org/browse/libxml2/tree/xinclude.c#n450

If you're not using XML catalogs, this will probably cause the DTD to
be loaded over the network multiple times which could explain the
slowdown.

Can you try to change the line to

    xmlCtxtUseOptions(pctxt, ctxt->parseFlags);

and see if it helps?

Nick