Andrzej,

Thank you -- and here we were going nuts thinking the problem might have
been with the plugin!
Would it be possible to post the patch file of the changes once you have
made them? Our version of Nutch differs from the one in SVN.

Thanks again.

CC-
 

-----Original Message-----
From: Andrzej Bialecki [mailto:[EMAIL PROTECTED] 
Sent: Monday, July 04, 2005 6:05 AM
To: nutch-dev@lucene.apache.org
Subject: Re: both html parser have bug with javascript

Chirag Chaman wrote:
> Actually, I think the JavaScript is there as it's part of the HTML 
> page -- but it should not be part of the summaries.  Has anyone found 
> a solution to not showing the "JavaScript" or "text/css" -- that shows 
> up from time to time?

The summary is generated from the parse_text data, so the problem already
occurs during parsing.

Actually, I think the problem is caused by my patch to DOMContentUtils ;-),
which adds script language, stylesheet type and so on to the output text.

From your comments I gather that you'd rather not have it there - I'll fix
it.
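
As an illustration only (this is not the actual DOMContentUtils patch, and
the class and method names below are hypothetical), a minimal sketch of the
idea: walk the DOM, collect text nodes, and skip script and style subtrees.
Attribute values such as language="JavaScript" or type="text/css" are never
appended, so they cannot end up in the text that summaries are built from.
The sketch assumes well-formed XHTML input so that the JDK's own XML parser
can be used.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class TextOnlyExtractor {

    // Hypothetical helper, not Nutch code: appends the text nodes under
    // `node` to `out`, skipping <script> and <style> subtrees entirely.
    static void collectText(Node node, StringBuilder out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            String name = node.getNodeName();
            if ("script".equalsIgnoreCase(name) || "style".equalsIgnoreCase(name)) {
                return; // neither the element's content nor its attributes are emitted
            }
        }
        if (node.getNodeType() == Node.TEXT_NODE) {
            String text = node.getNodeValue().trim();
            if (!text.isEmpty()) {
                if (out.length() > 0) {
                    out.append(' ');
                }
                out.append(text);
            }
        }
        // Attributes are not child nodes, so only elements and text are visited.
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collectText(children.item(i), out);
        }
    }

    // Assumes the input is well-formed XHTML; real-world HTML would need a
    // tolerant HTML parser instead of the plain XML DocumentBuilder.
    static String extract(String xhtml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        StringBuilder out = new StringBuilder();
        collectText(doc.getDocumentElement(), out);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><head><style type=\"text/css\">p {color: red}</style></head>"
                + "<body><p>Hello</p>"
                + "<script language=\"JavaScript\">var x = 1;</script>"
                + "<p>world</p></body></html>";
        System.out.println(extract(page));
    }
}
```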

--
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com