Elwin wrote:
No, I don't try to do that. I just use the default parser for the plugin. It seems to work well now. Thx.
I often find that TagSoup performs better than NekoHTML. On some grave HTML errors Neko tends to simply truncate the document, while TagSoup just "keeps on truckin'". This is especially true for pages with multiple <html> elements: Neko ignores every <html> element but the first one, while TagSoup treats any <html> element inside a document like any other nested element.
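If you want to try TagSoup instead of the default parser, something along these lines in conf/nutch-site.xml should do it. This is only a sketch: it assumes your Nutch version has the parser.html.impl property in nutch-default.xml and that the parse-html plugin recognizes the "neko" and "tagsoup" keywords, so please check your own nutch-default.xml before relying on it.

```xml
<!-- conf/nutch-site.xml: place inside the existing <configuration> element. -->
<!-- Assumption: parser.html.impl exists in your nutch-default.xml and -->
<!-- accepts "neko" (the default) or "tagsoup". -->
<property>
  <name>parser.html.impl</name>
  <value>tagsoup</value>
  <description>HTML parser implementation used by the parse-html plugin.</description>
</property>
```

Pages that were already parsed would need to be parsed again for the change to show up in the results.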
--
Best regards,
Andrzej Bialecki
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
