[xml] Non recursive html parser

Eugene Pimenov Mon, 15 Feb 2010 23:00:27 -0800

Hello everyone,

As my colleague pointed out in December 
(http://mail.gnome.org/archives/xml/2009-December/msg00036.html ; although not 
very clearly), there are real-world HTML pages that overflow the stack when 
parsed. We use libxml through nokogiri (http://nokogiri.org/), a Ruby library.


E.g.
        >> Nokogiri::HTML::SAX::Parser.new(Nokogiri::XML::SAX::Document.new).parse_memory("<b>"*100_000)
        #=> SystemStackError: stack level too deep

In the patch, I change htmlParseElement to return immediately instead of 
recursing, and let its caller, htmlParseContent, do the job.
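
To illustrate the idea, here is a toy Ruby model. It is not the patch and not libxml2 code: tokenize, parse_element and parse_content are hypothetical names that only mirror the roles of htmlParseElement and htmlParseContent. The element handler records the start tag and returns at once, and a single loop tracks nesting in an array on the heap, so depth no longer consumes call stack.

```ruby
# Toy model, not libxml2 code: tokens are [:open, name] or [:close, name].
def tokenize(html)
  html.scan(/<(\/?)(\w+)>/).map do |slash, name|
    [slash.empty? ? :open : :close, name]
  end
end

# Handles only the start tag and returns immediately (the role the patched
# htmlParseElement plays); it never descends into the element's children.
def parse_element(open_names, name)
  open_names.push(name)
end

# Plays the role of htmlParseContent: one loop drives the whole input, and
# nesting is kept in an explicit array instead of nested function calls.
def parse_content(tokens)
  open_names = []
  max_depth = 0
  tokens.each do |kind, name|
    if kind == :open
      parse_element(open_names, name)
      max_depth = [max_depth, open_names.size].max
    else
      # Close the nearest matching open element; ignore stray end tags.
      idx = open_names.rindex(name)
      open_names.slice!(idx..-1) if idx
    end
  end
  max_depth
end

# The input from the example above is handled without deep recursion.
puts parse_content(tokenize("<b>" * 100_000))
```

The array can grow as large as memory allows, whereas the recursive version is limited by the thread's stack size.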

htmlParseElement is not a static function, and I changed its behavior! I googled 
around (http://google.com/codesearch?q=htmlParseElement&hl=en&btnG=Search+Code) 
and I don't see anyone actually using it. But if this is an issue, I can make 
the public htmlParseElement call a new internal (static) version of itself and 
then call htmlParseContent until the nesting level matches the one it started 
at. I'd rather see htmlParseElement converted to static, though.
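
A rough sketch of that compatibility option, again as a toy Ruby model rather than real libxml2 code (ToyParser and its methods are hypothetical): the public entry point remembers the starting depth, calls the internal start-tag-only version, then keeps running the content loop until nesting is back at the starting level.

```ruby
# Toy parser state; every name here is hypothetical, not a libxml2 identifier.
class ToyParser
  attr_reader :open_names, :pos

  def initialize(html)
    @tokens = html.scan(/<(\/?)(\w+)>/).map do |slash, name|
      [slash.empty? ? :open : :close, name]
    end
    @pos = 0
    @open_names = []
  end

  # Public entry point keeping the old contract: when it returns, the
  # element it started on has been consumed (or the input has run out).
  def parse_element
    start_depth = @open_names.size
    parse_element_internal
    # Run the content loop until nesting returns to the starting level.
    step while @open_names.size > start_depth && @pos < @tokens.size
  end

  private

  # Internal (would-be static) version: consumes the start tag only.
  def parse_element_internal
    kind, name = @tokens[@pos]
    return unless kind == :open
    @pos += 1
    @open_names.push(name)
  end

  # One iteration of the content loop.
  def step
    kind, name = @tokens[@pos]
    @pos += 1
    if kind == :open
      @open_names.push(name)
    else
      idx = @open_names.rindex(name)
      @open_names.slice!(idx..-1) if idx
    end
  end
end

parser = ToyParser.new("<a><b></b></a><c>")
parser.parse_element
puts parser.pos # index of the first token after </a>
```

Existing callers would see the same behavior as before, while the recursion is still gone internally.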

I also attach weirdness.patch, which deletes duplicate definitions and sets 
nameMax to 0 if memory allocation fails.

Good day, everyone :)

non-recursive-html-parser.patch
Description: Binary data

weirdness.patch
Description: Binary data

_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
[email protected]
http://mail.gnome.org/mailman/listinfo/xml
