Andrzej,

Thank you! And here we were going nuts thinking the problem might have been with the plugin. Would it be possible to post a patch file of the changes once you have made them? Our version of Nutch is different from SVN.
Thanks again.
CC

-----Original Message-----
From: Andrzej Bialecki [mailto:[EMAIL PROTECTED]
Sent: Monday, July 04, 2005 6:05 AM
To: nutch-dev@lucene.apache.org
Subject: Re: both html parser have bug with javascript

Chirag Chaman wrote:
> Actually, I think the JavaScript is there because it's part of the HTML
> page, but it should not be part of the summaries. Has anyone found
> a way to avoid showing the "JavaScript" or "text/css" strings that
> show up from time to time?

The summary is generated from the parse_text data, so the problem already occurs during parsing.

Actually, I think the problem is caused by my patch to DOMContentUtils ;-), which adds the script language, stylesheet type and so on to the output text. From your comments I gather that you'd rather not have it there. I'll fix it.

--
Best regards,
Andrzej Bialecki <><
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers
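The fix Andrzej describes amounts to keeping script and stylesheet metadata out of the extracted text. As a rough illustration only (this is not Nutch's DOMContentUtils, and the class and method names below are hypothetical), here is a Java sketch of a DOM text walker that skips `<script>` and `<style>` elements entirely and never emits attribute values such as `language="JavaScript"` or `type="text/css"`. It uses the JDK's standard DOM parser, so it assumes well-formed XHTML input; real-world pages would need an HTML-tolerant parser in front of it.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/**
 * Hypothetical sketch, not Nutch's DOMContentUtils: collects visible text
 * from a DOM tree, skipping <script> and <style> content. Attributes are
 * never visited, so values like "JavaScript" or "text/css" cannot leak out.
 */
public class TextExtractor {

    // Elements whose contents should never reach the extracted text.
    private static boolean isSkipped(Node node) {
        String name = node.getNodeName();
        return "script".equalsIgnoreCase(name) || "style".equalsIgnoreCase(name);
    }

    // Recursively append non-blank text nodes, separated by single spaces.
    static void collectText(Node node, StringBuilder sb) {
        if (node.getNodeType() == Node.ELEMENT_NODE && isSkipped(node)) {
            return; // drop the whole script/style subtree
        }
        if (node.getNodeType() == Node.TEXT_NODE) {
            String text = node.getNodeValue().trim();
            if (!text.isEmpty()) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(text);
            }
            return;
        }
        // getChildNodes() does not include attribute nodes.
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collectText(children.item(i), sb);
        }
    }

    // Assumes well-formed (XHTML-style) markup.
    public static String extractText(String xhtml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xhtml)));
        StringBuilder sb = new StringBuilder();
        collectText(doc.getDocumentElement(), sb);
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><head><script language=\"JavaScript\">var x = 1;</script>"
                + "<style type=\"text/css\">p { color: red; }</style></head>"
                + "<body><p>Hello world</p></body></html>";
        System.out.println(extractText(page)); // prints "Hello world"
    }
}
```

Skipping the element subtree, rather than filtering strings afterwards, means neither the script body nor its attributes ever reach the text that summaries are built from.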