On Wed, Feb 17, 2010 at 10:04:08AM +0100, Daniel Veillard wrote: > On Tue, Feb 16, 2010 at 10:00:03AM +0300, Eugene Pimenov wrote: > > Hello everyone, > > > > As my colleague pointed out in December > > (http://mail.gnome.org/archives/xml/2009-December/msg00036.html ; although > > he didn't do it in a clear manner), there're real world examples of HTML > > pages that overflows stack. We're using libxml through nokogiri ( > > http://nokogiri.org/ it's a Ruby library). > > > > E. g. > > >> > > Nokogiri::HTML::SAX::Parser.new(Nokogiri::XML::SAX::Document.new).parse_memory("<b>"*100_000) > > #=> SystemStackError: stack level too deep > > > > In the patch I change htmlParseElement to return immediately and let the > > caller htmlParseContent do the job. > > > > htmlParseElement is not a static function, and I changed it behavior! I > > googled around > > (http://google.com/codesearch?q=htmlParseElement&hl=en&btnG=Search+Code) > > and I don't see everyone actually using it. But if this is an issue, I can > > make htmlParseElement call the secret (static) htmlParseElement and then > > htmlParseContent until level matches. I'd rather see htmlParseElement > > converted to static though. > > The patch as is also breaks the ABI by changing the ordering of > elements in the public structure of a parser context. > Also changing the behaviour of the function to that extend is not > correct and googling is not sufficient to answer if that behaviour would > be in use somewhere. libxml2 is used in many embedded project, you won't > find their source in google ! > So at the minimum the public function must be preserved. The new > elements in the parser context must be added at the end of the > structure (I also need to be convinced this is really needed) and > all the regression tests must still pass with the patch applied.
So I had to look at this more closely. I ended up keeping htmlParseElement and htmlParseContent as-is, that's the old code now there for compatibility. I renamed your modified version htmlParseElementInternal and htmlParseContentInternal and made sure we were calling the new one. I also moved your extensions to the parsing context to the end of the structure to preserve ABI. I gave it a fair bit of testing and this looks solid. So that's now applied but I kept the ABI intact. The new functions are static of course. > > I also attach weirdness.patch that deletes double definitions, and sets > > nameMax to 0 if it fails to allocate some memory. > > That one sounds fine, right ! And I applied that one as-is, thanks a lot ! Daniel -- Daniel Veillard | libxml Gnome XML XSLT toolkit http://xmlsoft.org/ [email protected] | Rpmfind RPM search engine http://rpmfind.net/ http://veillard.com/ | virtualization library http://libvirt.org/ _______________________________________________ xml mailing list, project page http://xmlsoft.org/ [email protected] http://mail.gnome.org/mailman/listinfo/xml
