On 06/07/2018 01:55 PM, Nick Wellnhofer wrote:
> On 07/06/2018 00:00, Stefan Sauer wrote:
>>>> Another idea is to stop loading external DTDs for XIncludes without an
>>>> XPointer expression. This would still change the behavior for some
>>>> users but it's much less likely to cause problems.
>> change the behaviour, as in we would not catch validation errors?
>
> No, nothing related to validation. If you validate a document, the
> DTDs will always be loaded. But parsing with or without
> XML_PARSE_DTDLOAD will obviously produce different results. It's hard
> to tell whether this will cause problems for users. But maybe I'm
> overly cautious. If someone parses a document without DTD flags, why
> would they assume that XIncluded documents are parsed with
> XML_PARSE_DTDLOAD?
Validation is one thing, but applying default attributes, for example,
is another. Basically, what I want to avoid is loading the external
subset over and over again, while the internal subset should still be
applied. I am still looking for where things like
<!ENTITY % local.common.attrib "xmlns:xi  CDATA  #FIXED
'http://www.w3.org/2003/XInclude'">
are applied. The other problem seems to be that ID references between
the master and the XIncluded docs are not resolved. Is that what
XML_DETECT_IDS controls? I checked the doc comments in the sources, but
it is hard to tell. If I don't comment out
  pctxt->loadsubset |= XML_DETECT_IDS;
I get my links resolved, but the speedup is gone.
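
To illustrate the distinction above (defaults declared in the internal
subset versus an external subset that would have to be fetched), here
is a minimal sketch. It deliberately uses Python's bundled expat
bindings rather than libxml2 or lxml, so it says nothing about how
libxml2 treats XML_DETECT_IDS or XML_PARSE_DTDLOAD. The document, the
file name external.dtd and the helper parse_without_external_subset are
made up for the example, and it assumes expat's default setting, in
which parameter entity parsing is off and the external subset is
therefore never requested.

```python
import xml.parsers.expat as expat

# Hypothetical document: "external.dtd" does not exist and is never opened.
# The default attribute is declared in the internal subset only.
DOC = b"""<?xml version="1.0"?>
<!DOCTYPE book SYSTEM "external.dtd" [
  <!ATTLIST book xmlns:xi CDATA #FIXED "http://www.w3.org/2003/XInclude">
]>
<book/>
"""


def parse_without_external_subset(data):
    """Return (attributes seen per element, external entities requested)."""
    attributes = {}
    requested = []

    def on_start(name, attrs):
        # attrs includes attributes defaulted from the declarations expat read.
        attributes[name] = dict(attrs)

    def on_external_entity(context, base, system_id, public_id):
        # Only record the request; nothing is loaded in this sketch.
        requested.append(system_id)
        return 1

    parser = expat.ParserCreate()  # no namespace processing, defaults otherwise
    parser.StartElementHandler = on_start
    parser.ExternalEntityRefHandler = on_external_entity
    parser.Parse(data, True)
    return attributes, requested


attributes, requested = parse_without_external_subset(DOC)
print(attributes)  # attributes reported for <book/>
print(requested)   # system IDs the parser asked the application to load
```

In this sketch the <book/> element should come back with the xmlns:xi
attribute from the internal subset, while the list of requested
external entities stays empty. That is the behaviour I would like for
XIncluded documents, but whether libxml2 can be made to do the same is
exactly the open question.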

>
>> Too bad that xmlXIncludeParseFile() does not get the parent parserCtx;
>> if it did, we could apply the same flags.
>
> I think the original flags are already passed via xmlXIncludeSetFlags.
You are right, I traced it back.

>
>> It seems that xmlDict only handles strings as keys and values,
>> right? So we'll even need our own cache data structure. I'd say it
>> would need to be on the _xmlXIncludeCtxt level. A global one is
>> easier, but then we can't ever free it :/
>
> xmlHash should work fine:
>
>     http://xmlsoft.org/html/libxml-hash.html
>
> But building a DTD cache would be the least of your problems. The hard
> part is to apply a cached DTD to a document. There are some
> interactions between internal and external subsets (see
> xmlAddElementDecl and xmlAddAttributeDecl in valid.c for example), so
> it looks like you can't simply set doc->extSubset to the
> cached DTD. You'd probably have to replay the calls to
> xmlAddElementDecl etc, maybe even in the original order which might be
> lost. That's why I wouldn't want to go down this route.

Having looked at the code some more, I agree. I am now checking whether
I can share the xmlDict between all the DTDs, so that we fix the 25%
spent in xmlFree. I don't want to replace the allocators, since I am
using libxml2 from Python via lxml and won't be able to patch them.

Thanks for your help in discussing the options.

>
> Nick


