On Wed, Feb 17, 2010 at 10:04:08AM +0100, Daniel Veillard wrote:
> On Tue, Feb 16, 2010 at 10:00:03AM +0300, Eugene Pimenov wrote:
> > Hello everyone,
> > 
> > As my colleague pointed out in December 
> > (http://mail.gnome.org/archives/xml/2009-December/msg00036.html ; although 
> > he didn't do it in a clear manner), there're real world examples of  HTML 
> > pages that overflows stack. We're using libxml through nokogiri ( 
> > http://nokogiri.org/ it's a Ruby library). 
> > 
> > E. g.
> >     >> 
> > Nokogiri::HTML::SAX::Parser.new(Nokogiri::XML::SAX::Document.new).parse_memory("<b>"*100_000)
> >     #=> SystemStackError: stack level too deep
> > 
> > In the patch I change htmlParseElement to return immediately and let the 
> > caller htmlParseContent do the job.
> > 
> > htmlParseElement is not a static function, and I changed it behavior! I 
> > googled around 
> > (http://google.com/codesearch?q=htmlParseElement&hl=en&btnG=Search+Code) 
> > and I don't see everyone actually using it. But if this is an issue, I can 
> > make htmlParseElement call the secret (static) htmlParseElement and then 
> > htmlParseContent until level matches. I'd rather see htmlParseElement 
> > converted to static though.
> 
>   The patch as is also breaks the ABI by changing the ordering of
> elements in the public structure of a parser context.
>   Also changing the behaviour of the function to that extend is not
> correct and googling is not sufficient to answer if that behaviour would
> be in use somewhere. libxml2 is used in many embedded project, you won't
> find their source in google !
>   So at the minimum the public function must be preserved. The new
> elements in the parser context must be added at the end of the
> structure (I also need to be convinced this is really needed) and
> all the regression tests must still pass with the patch applied.

  So I had to look at this more closely. I ended up keeping
htmlParseElement and htmlParseContent as-is, that's the old code
now there for compatibility. I renamed your modified version
htmlParseElementInternal and htmlParseContentInternal and made
sure we were calling the new one. I also moved your extensions to
the parsing context to the end of the structure to preserve ABI.
I gave it a fair bit of testing and this looks solid. So that's now
applied but I kept the ABI intact. The new functions are static of
course.

> > I also attach weirdness.patch that deletes double definitions, and sets 
> > nameMax to 0 if it fails to allocate some memory.
> 
>   That one sounds fine, right !

  And I applied that one as-is,

   thanks a lot !

Daniel

-- 
Daniel Veillard      | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
[email protected]  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
[email protected]
http://mail.gnome.org/mailman/listinfo/xml

Reply via email to