Re: [xml] Runtime parser limit for maximum size of text nodes

2017-07-22 Thread Daniel Veillard
On Mon, Jun 26, 2017 at 02:41:46PM +0200, Nick Wellnhofer wrote:
> On 22/06/2017 22:29, Daniel Veillard wrote:
> >No limit on text node and one can be DoS'ed, there is many kind
> > of recursive attacks on XML, and libxml2 uses a combination of "entities
> > density" and text node size to try to catch those, it's complex.
> >By offloading the choice of the maximum text size to the application
> > developper you then put them in charge of doing the checking, i.e. too
> > big a size and the app can be DoS'ed in practice, it just depends on the
> > computer speed and memory anount. They need to be aware of that.
> 
> Right, this is something I overlooked. xmlParserEntityCheck should not use
> the user-provided max text length. Otherwise, xmlSetMaxTextLength(ctxt,
> SIZE_MAX) would disable one of the checks which wasn't my intention. We
> should keep the hardcoded limit there.
> 
> > The change of the size of the parsing context can bite some apps, that
> > happen in the past when I did this, maybe they have all been corrected,
> > but I would doubt it's gonna be 100% without side effects.
> 
> OK, I could revert this part of the change.
> 
> But thinking more about it, I come to the conclusion that libxml2 shouldn't
> impose a limit on the maximum size of text nodes at all. If there's a
> reliable mechanism to catch abusive entity expansions, the size of a text
> node is bounded by the size of the input document. In fact, processing a
> document containing a single 1 GB text node uses much less resources than a
> document with 1 GB of ""s. The former should consume just a bit more
> than 1 GB of memory for the text node, the latter will create 250 million
> element nodes, consuming around 30 GB of memory on x86-64.

  I still think that one need to protect users with a default max
text node side. The problem is that libxml2 is very often used on untrusted
data, and if they can push a 30 GB document (not that hard on current networks)
then you're DoS'ed as well.
  Same for max depth of a document (1 GB of ""), etc ...

Daniel

-- 
Daniel Veillard  | Red Hat Developers Tools http://developer.redhat.com/
veill...@redhat.com  | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | virtualization library  http://libvirt.org/
___
xml mailing list, project page  http://xmlsoft.org/
xml@gnome.org
https://mail.gnome.org/mailman/listinfo/xml


Re: [xml] Runtime parser limit for maximum size of text nodes

2017-06-26 Thread Nick Wellnhofer

On 22/06/2017 22:29, Daniel Veillard wrote:

   No limit on text node and one can be DoS'ed, there is many kind
of recursive attacks on XML, and libxml2 uses a combination of "entities
density" and text node size to try to catch those, it's complex.
   By offloading the choice of the maximum text size to the application
developper you then put them in charge of doing the checking, i.e. too
big a size and the app can be DoS'ed in practice, it just depends on the
computer speed and memory anount. They need to be aware of that.


Right, this is something I overlooked. xmlParserEntityCheck should not use the 
user-provided max text length. Otherwise, xmlSetMaxTextLength(ctxt, SIZE_MAX) 
would disable one of the checks which wasn't my intention. We should keep the 
hardcoded limit there.



The change of the size of the parsing context can bite some apps, that
happen in the past when I did this, maybe they have all been corrected,
but I would doubt it's gonna be 100% without side effects.


OK, I could revert this part of the change.

But thinking more about it, I come to the conclusion that libxml2 shouldn't 
impose a limit on the maximum size of text nodes at all. If there's a reliable 
mechanism to catch abusive entity expansions, the size of a text node is 
bounded by the size of the input document. In fact, processing a document 
containing a single 1 GB text node uses much less resources than a document 
with 1 GB of ""s. The former should consume just a bit more than 1 GB of 
memory for the text node, the latter will create 250 million element nodes, 
consuming around 30 GB of memory on x86-64.


Nick
___
xml mailing list, project page  http://xmlsoft.org/
xml@gnome.org
https://mail.gnome.org/mailman/listinfo/xml


Re: [xml] Runtime parser limit for maximum size of text nodes

2017-06-22 Thread Daniel Veillard
On Thu, Jun 22, 2017 at 08:40:23AM -0600, Stacy W. Smith wrote:
> > On 21/06/2017 23:41, Daniel Veillard wrote:
> >>   I see only one person asking for this. Like any change to the
> >> structure (even adding at the end) this has
> >> potential risks. IMHO not worth the risk. If the person has such a
> >> specific need he can simply recompile
> >> libxml2 with a different value of the constant for that piece of code.
> 
> If your looking for feedback, I would also like to see this change committed. 
> I have a project which uses libxml2 indirectly via the Python lxml library. 
> In this environment, it is not practical/feasible to have users recompile 
> libxml2.

  No limit on text node and one can be DoS'ed, there is many kind
of recursive attacks on XML, and libxml2 uses a combination of "entities
density" and text node size to try to catch those, it's complex.
  By offloading the choice of the maximum text size to the application
developper you then put them in charge of doing the checking, i.e. too
big a size and the app can be DoS'ed in practice, it just depends on the
computer speed and memory anount. They need to be aware of that.
  Your change would not change the default behaviour, that's a good point.
The change of the size of the parsing context can bite some apps, that
happen in the past when I did this, maybe they have all been corrected,
but I would doubt it's gonna be 100% without side effects.
  Except for Inkscape which usually doesn't load its images from random
web resources, I think use of XML_PARSE_HUGE to avoid the node text size
limit is a serious mistake (at least for web apps and stacks), but
I would not be too optimist about the actual testing of the apps after they
pick a new one for DoS, too many vectors, too complex...

  Purely at the code review patch looks fine, I would add a comment that
changing the text node max size has an influence on DoS detection by
libxml2 and doing so while parsing untrusted source should get some security
attention.
  Since the default is preserved, I'm not too worried, it's mostly the
structure size change which may break on users.

  ACK

Daniel

-- 
Daniel Veillard  | Red Hat Developers Tools http://developer.redhat.com/
veill...@redhat.com  | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | virtualization library  http://libvirt.org/
___
xml mailing list, project page  http://xmlsoft.org/
xml@gnome.org
https://mail.gnome.org/mailman/listinfo/xml


Re: [xml] Runtime parser limit for maximum size of text nodes

2017-06-22 Thread Stacy W. Smith
> On 21/06/2017 23:41, Daniel Veillard wrote:
>>   I see only one person asking for this. Like any change to the
>> structure (even adding at the end) this has
>> potential risks. IMHO not worth the risk. If the person has such a
>> specific need he can simply recompile
>> libxml2 with a different value of the constant for that piece of code.

If your looking for feedback, I would also like to see this change committed. I 
have a project which uses libxml2 indirectly via the Python lxml library. In 
this environment, it is not practical/feasible to have users recompile libxml2.

--Stacy


___
xml mailing list, project page  http://xmlsoft.org/
xml@gnome.org
https://mail.gnome.org/mailman/listinfo/xml


Re: [xml] Runtime parser limit for maximum size of text nodes

2017-06-22 Thread Nick Wellnhofer

On 21/06/2017 23:41, Daniel Veillard wrote:

   I see only one person asking for this. Like any change to the
structure (even adding at the end) this has
potential risks. IMHO not worth the risk. If the person has such a
specific need he can simply recompile
libxml2 with a different value of the constant for that piece of code.


I wouldn't base the decision on the number of complaints. If users knew what 
XML_PARSE_HUGE really means, there would be many more complaints. The main 
problem is that enabling XML_PARSE_HUGE makes users vulnerable to the billion 
laughs attack [1]. But if you want to support text nodes larger than 10 MB, 
there's no way around it. And no, recompiling libxml2 with a different 
constant isn't an option for most users or downstream projects.


So downstream projects end up enabling the flag blindly: WebKit [2], Blink 
[3], PHP's SOAP server [4], Inkscape [5], and many others [6]. None of these 
projects seem to be aware of the consequences.


The other main reason why users enable XML_PARSE_HUGE is the xmlParserMaxDepth 
limit. This limit can be relaxed if we make xmlParseElement/xmlParseContent 
non-recursive. IIUC, the push parser already uses a non-recursive approach and 
the infrastructure to store the list of current nodes on the heap is already 
there. So this shouldn't be too hard.


Once these changes are made, we can advise users to disable XML_PARSE_HUGE.

I also don't see the risk in appending items to struct xmlParserCtxt. The 
struct is supposed to only be allocated within libxml2, so the struct size 
should never be compiled into client code. The last time the struct was 
changed was in 2013 for the 2.9.1 release [7]. Did this cause any problems? 
Why was the change deemed safe back then?


There's of course another solution: Simply disable the max text length limit. 
(Actually set it to around 2 GB to avoid integer overflows.) I don't see much 
value in limiting the size of text nodes. It makes more sense to limit the 
total size of an XML document which users can do easily.


Nick


[1] https://en.wikipedia.org/wiki/Billion_laughs
[2] 
http://git.webkit.org/?p=WebKit.git;a=commitdiff;h=c74717d5a176f74b0c77ff8266b272903ad7297d
[3] 
https://chromium.googlesource.com/chromium/blink/+/a939e6184a192e91b0088052269554c8866dacad

[4] 
https://github.com/php/php-src/commit/40c60b8212b8ab18fd5bf9a426f99e42d9908f8e
[5] 
https://gitlab.com/inkscape/inkscape/commit/112f963fb12a941762c828dfd1690a61771516af

[6] https://encrypted.google.com/search?num=100="commit"+"xml_parse_huge;
[7] https://git.gnome.org/browse/libxml2/commit/?id=23f05e0
___
xml mailing list, project page  http://xmlsoft.org/
xml@gnome.org
https://mail.gnome.org/mailman/listinfo/xml