On Thu, Aug 02, 2007 at 07:02:26AM +0400, Andrey A. Chujko wrote:
> Hello All,
> 
> In recovery mode, a parent 'script' or 'style' section is parsed
> incorrectly if it contains an embedded section of the same kind.
> Say an HTML document contains the following script section:
> ================================Cut here===================================
> <script language=javascript>
> ...
> document.write('<script language=vbscript\>blah</script\>');
> ...
> </script>
> ================================Cut here===================================
> Its content is escaped incorrectly.
> 
> 
> After this document is processed with the HTML SAX parser in RECOVERY mode,
> the original section looks corrupted:
> ================================Cut here===================================
> <script language=javascript>
> ...
> document.write('<script language=vbscript\>blah</script>
> ================================Cut here===================================
> 
> Because the parent tag and the embedded one have the same name, the
> parser ends the parent section prematurely as soon as it meets the end
> of the embedded section
> (see HTMLparser.c, htmlParseScript function, line 2689).
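
A minimal sketch of the failure mode described above, assuming the scanner
simply stops at the first "</script" it finds in the script body. The function
name naive_script_body_len is hypothetical; this is an illustration, not the
code from HTMLparser.c:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not libxml2 code): return the length of the script
 * body, i.e. the offset of the first "</script" in buf (case-insensitive),
 * or strlen(buf) when no end tag is present.
 */
size_t naive_script_body_len(const char *buf)
{
    static const char end_tag[] = "</script";
    const size_t tag_len = sizeof(end_tag) - 1;
    size_t len, i, j;

    assert(buf != NULL);
    len = strlen(buf);

    for (i = 0; i + tag_len <= len; i++) {
        /* Compare case-insensitively against "</script". */
        for (j = 0; j < tag_len; j++) {
            if (tolower((unsigned char)buf[i + j]) != end_tag[j])
                break;
        }
        if (j == tag_len)
            return i;   /* first match wins, even inside a JS string */
    }
    return len;
}
```

Given the body from the example, such a scanner returns the offset of the
embedded "</script\>" inside the document.write() string, so everything after
"blah" is treated as lying outside the parent script section.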

  Well, I'm sure that HTML breaks in a number of places, not just in libxml2.
It looks to me like a case of data broken beyond recovery.

> Possible patch is attached.

  Could you try to explain your patch in English, i.e. what kind of workaround
you suggest? This may help to discuss it.

Daniel

-- 
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard      | virtualization library  http://libvirt.org/
[EMAIL PROTECTED]  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine  http://rpmfind.net/
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
[email protected]
http://mail.gnome.org/mailman/listinfo/xml
