On 06/07/2018 01:55 PM, Nick Wellnhofer wrote:
> On 07/06/2018 00:00, Stefan Sauer wrote:
>>>> Another idea is to stop loading external DTDs for XIncludes without an
>>>> XPointer expression. This would still change the behavior for some
>>>> users but it's much less likely to cause problems.
>> change the behaviour, as in we would not catch validation errors?
>
> No, nothing related to validation. If you validate a document, the
> DTDs will always be loaded. But parsing with or without
> XML_PARSE_DTDLOAD will obviously produce different results. It's hard
> to tell whether this will cause problems for users. But maybe I'm
> overly cautious. If someone parses a document without DTD flags, why
> would they assume that XIncluded documents are parsed with
> XML_PARSE_DTDLOAD?

Validation is one thing, but applying default attributes, for example, is
another. Basically, what I want to avoid is loading the external subset over
and over again, while the internal subset should still be applied. I am still
looking for where declarations like

<!ENTITY % local.common.attrib "xmlns:xi CDATA #FIXED 'http://www.w3.org/2003/XInclude'">

are applied.

The other problem seems to be that ID refs between the master and the
XIncluded docs are not resolved. Is that what XML_DETECT_IDS controls? I
checked the doc comments in the sources, but it is hard to tell. If I don't
comment out

pctxt->loadsubset |= XML_DETECT_IDS;

I get my links resolved, but the speedup is gone.
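
To make the "default attributes" point concrete, here is a minimal sketch from the lxml side. It is only an illustration under assumptions: the two documents, the file names and the `included_role` helper are made up for this example, and the `load_dtd` / `attribute_defaults` parser options are, as far as I understand, lxml's counterparts of XML_PARSE_DTDLOAD and XML_PARSE_DTDATTR. It shows a default attribute declared in the internal subset of an XIncluded document showing up in the merged tree, with no validation involved.

```python
import os
import tempfile

from lxml import etree

# Hypothetical included document: its internal subset declares a default
# attribute, which is a DTD effect that has nothing to do with validation.
CHILD = b"""<!DOCTYPE chapter [
  <!ATTLIST chapter role CDATA "appendix">
]>
<chapter><title>Included</title></chapter>
"""

# Hypothetical master document that pulls in the child via XInclude.
MASTER = b"""<book xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="child.xml"/>
</book>
"""


def included_role():
    """Parse the master, expand XIncludes, return the defaulted attribute."""
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "child.xml"), "wb") as f:
            f.write(CHILD)
        master_path = os.path.join(tmp, "master.xml")
        with open(master_path, "wb") as f:
            f.write(MASTER)

        # Assumption: these options request DTD loading and attribute
        # defaulting; lxml hands the parser's options on to XInclude.
        parser = etree.XMLParser(load_dtd=True, attribute_defaults=True)
        tree = etree.parse(master_path, parser)
        tree.xinclude()

        # The included <chapter> now lives in the master tree.
        chapter = tree.getroot().find("chapter")
        return chapter.get("role")


if __name__ == "__main__":
    print(included_role())
```

The example deliberately uses only an internal subset, so no external DTD is fetched; it says nothing about how expensive repeated loading of an external subset is.
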
>
>> Too bad that xmlXIncludeParseFile() does not get the parent parserCtx,
>> in that case we could apply the same flags.
>
> I think the original flags are already passed via xmlXIncludeSetFlags.

You are right, I traced it back.

>
>> It seems that xmldict is only handling key and value to be a string,
>> right? So, we'll even need our own cache data structure. I'd say it
>> would need to be on the _xmlXIncludeCtxt level. global is easier, but
>> then we can't free it ever :/
>
> xmlHash should work fine:
>
> http://xmlsoft.org/html/libxml-hash.html
>
> But building a DTD cache would be the least of your problems. The hard
> part is to apply a cached DTD to a document. There are some
> interactions between internal and external subsets (see
> xmlAddElementDecl and xmlAddAttributeDecl in valid.c for example), so
> it looks like you can't just simply set doc->extSubset to the
> cached DTD. You'd probably have to replay the calls to
> xmlAddElementDecl etc, maybe even in the original order which might be
> lost. That's why I wouldn't want to go down this route.

From looking more at the code, I agree. I am now checking whether I can share
the xmlDict between all the DTDs so that we fix the 25% spent in xmlFree. I
don't want to replace the allocators, since I am using libxml2 from Python via
lxml and I won't be able to patch the allocators there.

Thanks for your support in discussing the options.

>
> Nick