Elwin wrote:
No, I don't try to do that. I just use the default parser for the plugin. It
seems that it works well now.
Thx.

I often find that TagSoup performs better than NekoHTML. In the case of some grave HTML errors Neko tends to simply truncate the document, while TagSoup just "keeps on truckin'". This is especially true for pages with multiple <html> elements, where Neko ignores all <html> elements but the first one, while TagSoup just treats any <html> element inside a document like any other nested element.
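
For anyone who wants to compare the two parsers on their own pages: as far as I recall, the parse-html plugin picks its parser from the parser.html.impl property, with "neko" and "tagsoup" as the recognized values. Below is a minimal sketch of the override. It assumes local overrides go in conf/nutch-site.xml and that your Nutch version defines this property in nutch-default.xml, so check there first.

```xml
<!-- Assumed location: conf/nutch-site.xml, inside the existing <configuration> element -->
<property>
  <name>parser.html.impl</name>
  <!-- "neko" selects NekoHTML; "tagsoup" selects TagSoup -->
  <value>tagsoup</value>
</property>
```

Re-parsing a page that has several <html> elements with each setting should make the difference described above easy to see.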

--
Best regards,
Andrzej Bialecki     <><
___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com




_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
