Re: [xml] Runtime parser limit for maximum size of text nodes
On Mon, Jun 26, 2017 at 02:41:46PM +0200, Nick Wellnhofer wrote: > On 22/06/2017 22:29, Daniel Veillard wrote: > >No limit on text node and one can be DoS'ed, there is many kind > > of recursive attacks on XML, and libxml2 uses a combination of "entities > > density" and text node size to try to catch those, it's complex. > >By offloading the choice of the maximum text size to the application > > developper you then put them in charge of doing the checking, i.e. too > > big a size and the app can be DoS'ed in practice, it just depends on the > > computer speed and memory anount. They need to be aware of that. > > Right, this is something I overlooked. xmlParserEntityCheck should not use > the user-provided max text length. Otherwise, xmlSetMaxTextLength(ctxt, > SIZE_MAX) would disable one of the checks which wasn't my intention. We > should keep the hardcoded limit there. > > > The change of the size of the parsing context can bite some apps, that > > happen in the past when I did this, maybe they have all been corrected, > > but I would doubt it's gonna be 100% without side effects. > > OK, I could revert this part of the change. > > But thinking more about it, I come to the conclusion that libxml2 shouldn't > impose a limit on the maximum size of text nodes at all. If there's a > reliable mechanism to catch abusive entity expansions, the size of a text > node is bounded by the size of the input document. In fact, processing a > document containing a single 1 GB text node uses much less resources than a > document with 1 GB of ""s. The former should consume just a bit more > than 1 GB of memory for the text node, the latter will create 250 million > element nodes, consuming around 30 GB of memory on x86-64. I still think that one need to protect users with a default max text node side. The problem is that libxml2 is very often used on untrusted data, and if they can push a 30 GB document (not that hard on current networks) then you're DoS'ed as well. Same for max depth of a document (1 GB of ""), etc ... Daniel -- Daniel Veillard | Red Hat Developers Tools http://developer.redhat.com/ veill...@redhat.com | libxml Gnome XML XSLT toolkit http://xmlsoft.org/ http://veillard.com/ | virtualization library http://libvirt.org/ ___ xml mailing list, project page http://xmlsoft.org/ xml@gnome.org https://mail.gnome.org/mailman/listinfo/xml
Re: [xml] Runtime parser limit for maximum size of text nodes
On 22/06/2017 22:29, Daniel Veillard wrote: No limit on text node and one can be DoS'ed, there is many kind of recursive attacks on XML, and libxml2 uses a combination of "entities density" and text node size to try to catch those, it's complex. By offloading the choice of the maximum text size to the application developper you then put them in charge of doing the checking, i.e. too big a size and the app can be DoS'ed in practice, it just depends on the computer speed and memory anount. They need to be aware of that. Right, this is something I overlooked. xmlParserEntityCheck should not use the user-provided max text length. Otherwise, xmlSetMaxTextLength(ctxt, SIZE_MAX) would disable one of the checks which wasn't my intention. We should keep the hardcoded limit there. The change of the size of the parsing context can bite some apps, that happen in the past when I did this, maybe they have all been corrected, but I would doubt it's gonna be 100% without side effects. OK, I could revert this part of the change. But thinking more about it, I come to the conclusion that libxml2 shouldn't impose a limit on the maximum size of text nodes at all. If there's a reliable mechanism to catch abusive entity expansions, the size of a text node is bounded by the size of the input document. In fact, processing a document containing a single 1 GB text node uses much less resources than a document with 1 GB of ""s. The former should consume just a bit more than 1 GB of memory for the text node, the latter will create 250 million element nodes, consuming around 30 GB of memory on x86-64. Nick ___ xml mailing list, project page http://xmlsoft.org/ xml@gnome.org https://mail.gnome.org/mailman/listinfo/xml
Re: [xml] Runtime parser limit for maximum size of text nodes
On Thu, Jun 22, 2017 at 08:40:23AM -0600, Stacy W. Smith wrote: > > On 21/06/2017 23:41, Daniel Veillard wrote: > >> I see only one person asking for this. Like any change to the > >> structure (even adding at the end) this has > >> potential risks. IMHO not worth the risk. If the person has such a > >> specific need he can simply recompile > >> libxml2 with a different value of the constant for that piece of code. > > If your looking for feedback, I would also like to see this change committed. > I have a project which uses libxml2 indirectly via the Python lxml library. > In this environment, it is not practical/feasible to have users recompile > libxml2. No limit on text node and one can be DoS'ed, there is many kind of recursive attacks on XML, and libxml2 uses a combination of "entities density" and text node size to try to catch those, it's complex. By offloading the choice of the maximum text size to the application developper you then put them in charge of doing the checking, i.e. too big a size and the app can be DoS'ed in practice, it just depends on the computer speed and memory anount. They need to be aware of that. Your change would not change the default behaviour, that's a good point. The change of the size of the parsing context can bite some apps, that happen in the past when I did this, maybe they have all been corrected, but I would doubt it's gonna be 100% without side effects. Except for Inkscape which usually doesn't load its images from random web resources, I think use of XML_PARSE_HUGE to avoid the node text size limit is a serious mistake (at least for web apps and stacks), but I would not be too optimist about the actual testing of the apps after they pick a new one for DoS, too many vectors, too complex... Purely at the code review patch looks fine, I would add a comment that changing the text node max size has an influence on DoS detection by libxml2 and doing so while parsing untrusted source should get some security attention. Since the default is preserved, I'm not too worried, it's mostly the structure size change which may break on users. ACK Daniel -- Daniel Veillard | Red Hat Developers Tools http://developer.redhat.com/ veill...@redhat.com | libxml Gnome XML XSLT toolkit http://xmlsoft.org/ http://veillard.com/ | virtualization library http://libvirt.org/ ___ xml mailing list, project page http://xmlsoft.org/ xml@gnome.org https://mail.gnome.org/mailman/listinfo/xml
Re: [xml] Runtime parser limit for maximum size of text nodes
> On 21/06/2017 23:41, Daniel Veillard wrote: >> I see only one person asking for this. Like any change to the >> structure (even adding at the end) this has >> potential risks. IMHO not worth the risk. If the person has such a >> specific need he can simply recompile >> libxml2 with a different value of the constant for that piece of code. If your looking for feedback, I would also like to see this change committed. I have a project which uses libxml2 indirectly via the Python lxml library. In this environment, it is not practical/feasible to have users recompile libxml2. --Stacy ___ xml mailing list, project page http://xmlsoft.org/ xml@gnome.org https://mail.gnome.org/mailman/listinfo/xml
Re: [xml] Runtime parser limit for maximum size of text nodes
On 21/06/2017 23:41, Daniel Veillard wrote: I see only one person asking for this. Like any change to the structure (even adding at the end) this has potential risks. IMHO not worth the risk. If the person has such a specific need he can simply recompile libxml2 with a different value of the constant for that piece of code. I wouldn't base the decision on the number of complaints. If users knew what XML_PARSE_HUGE really means, there would be many more complaints. The main problem is that enabling XML_PARSE_HUGE makes users vulnerable to the billion laughs attack [1]. But if you want to support text nodes larger than 10 MB, there's no way around it. And no, recompiling libxml2 with a different constant isn't an option for most users or downstream projects. So downstream projects end up enabling the flag blindly: WebKit [2], Blink [3], PHP's SOAP server [4], Inkscape [5], and many others [6]. None of these projects seem to be aware of the consequences. The other main reason why users enable XML_PARSE_HUGE is the xmlParserMaxDepth limit. This limit can be relaxed if we make xmlParseElement/xmlParseContent non-recursive. IIUC, the push parser already uses a non-recursive approach and the infrastructure to store the list of current nodes on the heap is already there. So this shouldn't be too hard. Once these changes are made, we can advise users to disable XML_PARSE_HUGE. I also don't see the risk in appending items to struct xmlParserCtxt. The struct is supposed to only be allocated within libxml2, so the struct size should never be compiled into client code. The last time the struct was changed was in 2013 for the 2.9.1 release [7]. Did this cause any problems? Why was the change deemed safe back then? There's of course another solution: Simply disable the max text length limit. (Actually set it to around 2 GB to avoid integer overflows.) I don't see much value in limiting the size of text nodes. It makes more sense to limit the total size of an XML document which users can do easily. Nick [1] https://en.wikipedia.org/wiki/Billion_laughs [2] http://git.webkit.org/?p=WebKit.git;a=commitdiff;h=c74717d5a176f74b0c77ff8266b272903ad7297d [3] https://chromium.googlesource.com/chromium/blink/+/a939e6184a192e91b0088052269554c8866dacad [4] https://github.com/php/php-src/commit/40c60b8212b8ab18fd5bf9a426f99e42d9908f8e [5] https://gitlab.com/inkscape/inkscape/commit/112f963fb12a941762c828dfd1690a61771516af [6] https://encrypted.google.com/search?num=100&q="commit"+"xml_parse_huge"; [7] https://git.gnome.org/browse/libxml2/commit/?id=23f05e0 ___ xml mailing list, project page http://xmlsoft.org/ xml@gnome.org https://mail.gnome.org/mailman/listinfo/xml
[xml] Runtime parser limit for maximum size of text nodes
Greetings, I'd like to add a runtime parser setting for the maximum size of text nodes. This relieves many users from setting the unsafe XML_PARSE_HUGE option. See for example: https://bugzilla.gnome.org/show_bug.cgi?id=719350 https://bugzilla.gnome.org/show_bug.cgi?id=771799 Here's the commit: https://github.com/nwellnhof/libxml2/commit/7343130 I keep it up for review for a while, then push to Git master unless I hear objections. Nick ___ xml mailing list, project page http://xmlsoft.org/ xml@gnome.org https://mail.gnome.org/mailman/listinfo/xml