On Thu, Aug 02, 2007 at 07:02:26AM +0400, Andrey A. Chujko wrote:
> Hello All,
>
> In recovery mode, a parent 'script' or 'style' section is parsed
> wrongly if it contains an embedded section of the same type.
> Say an HTML document contains the following script section:
> ================================Cut here===================================
> <script language=javascript>
> ...
> document.write('<script language=vbscript\>blah</script\>');
> ...
> </script>
> ================================Cut here===================================
> Its content is escaped incorrectly.
>
>
> After this document is processed with the HTML SAX parser in RECOVERY
> mode, the original section looks corrupted:
> ================================Cut here===================================
> <script language=javascript>
> ...
> document.write('<script language=vbscript\>blah</script>
> ================================Cut here===================================
>
> Because both the parent tag and the embedded one have the same name,
> the parser stops parsing the parent section prematurely, as soon as it
> meets the end of the embedded section.
> (see HTMLparser.c, htmlParseScript function, line 2689).
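To illustrate the failure mode described above, here is a minimal sketch in C. It is not the code from HTMLparser.c. It only assumes, based on the report, that the scanner treats the first "</" followed by the element name as the end of the section. The names find_raw_end, starts_with_nocase and SAMPLE_BODY are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Script body taken from the report; the "..." lines are literal
 * placeholders, exactly as in the original example. */
static const char SAMPLE_BODY[] =
    "...\n"
    "document.write('<script language=vbscript\\>blah</script\\>');\n"
    "...\n"
    "</script>";

/* Hypothetical helper: returns 1 if s begins with prefix, ignoring case. */
static int starts_with_nocase(const char *s, const char *prefix)
{
    while (*prefix != '\0') {
        if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix))
            return 0;
        s++;
        prefix++;
    }
    return 1;
}

/* Hypothetical helper, NOT a libxml2 function: returns the offset of the
 * first "</name" in buf (case-insensitive), or -1 if there is none.
 * This models the assumed "stop at the first matching end tag" behaviour. */
static long find_raw_end(const char *buf, const char *name)
{
    assert(buf != NULL && name != NULL);
    for (const char *p = buf; *p != '\0'; p++) {
        if (p[0] == '<' && p[1] == '/' && starts_with_nocase(p + 2, name))
            return (long)(p - buf);
    }
    return -1;
}
```

Called as find_raw_end(SAMPLE_BODY, "script"), the helper returns the offset of the embedded `</script\>`, which lies before the real `</script>`. Under the stated assumption, everything after that offset falls outside the script section, which matches the truncated output shown in the report.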
Well, I'm sure that HTML breaks in a number of places, not just in libxml2.
It looks to me like a case of data broken beyond recovery.
> Possible patch is attached.
Could you try to explain your patch in English, i.e. what kind of workaround
you suggest? This may help us discuss it.
Daniel
--
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard | virtualization library http://libvirt.org/
[EMAIL PROTECTED] | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/