Re: [xml] htmlParser questions

Daniel Veillard Tue, 24 Jan 2006 04:33:04 -0800

On Mon, Jan 23, 2006 at 04:03:19PM +0100, Liron wrote:
> 1) Right now I'm simply using htmlParseDoc with "encoding=NULL" to build the 
> tree I need for the xsl engine. This function gives me a well-formed tree but 
> not valid at all, I wanted to know if there's an option to use the htmlParser 
> to build also a valid document.


  Valid in which sense? SGML DTD validity is way too complex.
You could use the XML serialization to get XML well-formedness of the output.
But IMHO since HTML is just the input, the validity concern should be
on the XSLT result, and validity there can be ensured by the stylesheet design.

> 2) Is there any way to speed up the work of htmlParser? I'm not using any 
> options and only calling htmlParseDoc. The thing that worries me is that I've 
> also tested a separate library called HtmlAgilityPack which is managed code 
> and it processes an HTML file faster than libxml's HTML parser AND outputs 
> a well-formed+valid tree. From my tests libxml has an amazing performance on 
> xml and xsl files so I don't understand how a managed and marshalled code can 
> work better and faster. I must be doing something wrong, maybe the htmlParser 
> is not intended for valid trees which is also fine by me but I'd like it 
> at least to be faster.

  The HTML parser should not be that much slower than the XML parser; maybe there
is a problem introduced recently in the code. But it's the first time I 
hear a complaint about the HTML parser speed, strange. Maybe a bit of profiling
could help in understanding what's happening.

Daniel

-- 
Daniel Veillard      | Red Hat http://redhat.com/
[EMAIL PROTECTED]  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/
