On Fri, Feb 19, 2010 at 04:24:38PM +0100, Joachim Zobel wrote:
> Hi.
>
> I am trying to parse HTML generated by MS Word. Although this starts
> with a
>
> <html ... xmlns:o="urn:schemas-microsoft-com:office:office"
>
> The parser complains about
>
> Tag o:p invalid
>
> when it encounters such a tag.
>
> Why is this?

  Because you are using an HTML parser to parse what looks like XHTML,
i.e. the XML version of HTML, with what look like MS extensions. The
HTML parser only accepts the element names it knows and does not
handle namespaces, so a prefixed tag like o:p is reported as invalid.
You could try to use the XML parser instead.

Daniel

-- 
Daniel Veillard      | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
[email protected]  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/
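
A minimal sketch of why the XML route can work, assuming Python's
standard-library xml.etree.ElementTree as a stand-in for a
namespace-aware XML parser. It is not libxml2, and the sample markup
and the helper name find_office_paragraphs are made up for
illustration. Because the o prefix is declared on the root element,
an XML parser treats o:p as an ordinary element in that namespace:

```python
import xml.etree.ElementTree as ET

# Hypothetical Word-style markup: well-formed XML with the "o" prefix declared.
WORD_HTML = (
    '<html xmlns="http://www.w3.org/1999/xhtml" '
    'xmlns:o="urn:schemas-microsoft-com:office:office">'
    '<body><p>Hello<o:p></o:p></p></body></html>'
)

# Namespace URI taken from the xmlns:o declaration quoted in the question.
OFFICE_NS = "urn:schemas-microsoft-com:office:office"


def find_office_paragraphs(markup):
    """Parse markup as XML and return every o:p element found in it."""
    root = ET.fromstring(markup)
    # ElementTree addresses namespaced elements as {namespace-uri}localname.
    return root.findall(".//{%s}p" % OFFICE_NS)


office_paragraphs = find_office_paragraphs(WORD_HTML)
print(len(office_paragraphs))
print(office_paragraphs[0].tag)
```

This only helps if the file really is well-formed XML. If the
Word output has unclosed tags or unquoted attributes, an XML parser
will reject it for that reason instead.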
