Hello everyone,

As my colleague pointed out in December (http://mail.gnome.org/archives/xml/2009-December/msg00036.html), though not very clearly, there are real-world HTML pages that overflow the stack. We're using libxml through Nokogiri (http://nokogiri.org/), a Ruby library.
For example, in irb:

>> Nokogiri::HTML::SAX::Parser.new(Nokogiri::XML::SAX::Document.new).parse_memory("<b>" * 100_000)
#=> SystemStackError: stack level too deep
In the attached patch, htmlParseElement returns immediately instead of recursing into the element's content, and leaves that work to its caller, htmlParseContent.
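To illustrate the idea (this is a toy model, not libxml's actual code), the sketch below compares a recursive element parser, where every nested element costs a call frame, with a flat loop in the spirit of the patch, where the open elements live in a heap-allocated array. The method names and the tokenizer are hypothetical; it only understands bare `<b>` and `</b>` tags.

```ruby
# Toy model only: the names below are hypothetical, not libxml2's API.
# Only bare <b> and </b> tags are recognised.
TAG = %r{</?b>}

# Recursive style: every nested element costs one call frame, so the
# maximum nesting depth is bounded by the interpreter's stack size.
# Returns [index where parsing stopped, deepest nesting level seen].
def parse_element_recursive(tokens, i = 0, depth = 0)
  max = depth
  while i < tokens.size
    case tokens[i]
    when "<b>"
      i, sub = parse_element_recursive(tokens, i + 1, depth + 1)
      max = sub if sub > max
    when "</b>"
      return [i + 1, max]
    else
      i += 1
    end
  end
  [i, max]
end

# Flat style, in the spirit of the patch: a single loop, with the open
# elements kept in an Array on the heap instead of on the call stack.
# Returns the deepest nesting level seen.
def parse_content_iterative(tokens)
  stack = []
  max = 0
  tokens.each do |tok|
    case tok
    when "<b>"
      stack.push("b")
      max = stack.size if stack.size > max
    when "</b>"
      stack.pop
    end
  end
  max
end

tokens = ("<b>" * 100_000).scan(TAG)
puts parse_content_iterative(tokens) # prints 100000
begin
  parse_element_recursive(tokens)
rescue SystemStackError => e
  puts "recursive version: #{e.message}"
end
```

The flat version's depth limit is bounded only by available memory, which is the property the patch aims for in htmlParseContent.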
Note that htmlParseElement is not a static function, so this changes the behavior of a public API! I searched around (http://google.com/codesearch?q=htmlParseElement&hl=en&btnG=Search+Code) and found no one actually using it. If this is a problem, though, I can make the public htmlParseElement call a private (static) version of htmlParseElement and then call htmlParseContent until the nesting level is back where it started. I'd rather see htmlParseElement made static, though.
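The compatibility fallback can be modelled the same way: a public wrapper records the current depth, lets a start-tag routine (standing in for the private static function) push the element and return, then drives a flat content loop until the depth drops back. ToyParser and all of its methods are hypothetical stand-ins, not libxml2 functions, and the sketch again only handles bare `<b>`/`</b>` tags.

```ruby
# Toy model of the compatibility wrapper; all names are hypothetical.
class ToyParser
  attr_reader :stack, :pos

  def initialize(tokens)
    @tokens = tokens
    @pos = 0
    @stack = [] # open elements, kept on the heap
  end

  # Stand-in for the private (static) element parser:
  # consume one start tag, push it, and return without recursing.
  def parse_start_tag
    tok = @tokens[@pos]
    @pos += 1
    @stack.push("b") if tok == "<b>"
  end

  # Stand-in for htmlParseContent: handle one token per call, no recursion.
  def parse_content_step
    tok = @tokens[@pos]
    @pos += 1
    case tok
    when "<b>" then @stack.push("b")
    when "</b>" then @stack.pop
    end
  end

  # Public wrapper: parse one whole element, returning once the stack
  # is back to the depth it had before the element started.
  def parse_element
    level = @stack.size
    parse_start_tag
    parse_content_step while @stack.size > level && @pos < @tokens.size
  end
end

parser = ToyParser.new("<b><b></b></b><b></b>".scan(%r{</?b>}))
parser.parse_element
puts parser.pos # prints 4 (stops after the first top-level </b>)
```

Callers of the wrapper still see "one element per call", while nesting depth no longer consumes C stack.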
I also attach weirdness.patch, which removes duplicate definitions and sets nameMax to 0 when a memory allocation fails.
Good day, everyone :)
non-recursive-html-parser.patch
weirdness.patch
_______________________________________________
xml mailing list, project page http://xmlsoft.org/
http://mail.gnome.org/mailman/listinfo/xml
