Thanks. I changed my nutch-default.xml to the following:

<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-regex|parse-(html)|index-basic|query-(basic|site|url)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>

But I still see this error message. I didn't expect it to try to fetch
js files at all.

Error parsing: http://www.cnn.com/exchange/submit/pokkariJavascript.js:
failed(2,200): org.apache.nutch.parse.ParseException: parser not found
for contentType=application/x-javascript
url=http://www.cnn.com/exchange/submit/pokkariJavascript.js


And why does it fetch RSS files too?

fetching http://rss.cnn.com/rss/cnn_ireports.rss
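
My guess is that the parse-* plugins only decide what gets parsed after a
page is fetched, and that which URLs get fetched is decided by the
urlfilter-regex rules instead. Below is a rough Python sketch of how I
understand those rules to behave (a leading '-' rejects, a leading '+'
accepts, and the first matching rule wins). The rule patterns and the
`accepts` helper are my own illustration, not Nutch code, so please correct
me if this is wrong:

```python
import re

# Hypothetical rules written in the style of a Nutch regex URL filter file.
# Assumption: '-' means reject, '+' means accept, first matching rule wins.
RULES = [
    r"-\.(js|css|rss|gif|jpg|png)$",  # reject scripts, stylesheets, feeds, images
    r"+.",                            # accept everything else
]

def accepts(url, rules=RULES):
    """Return True if the first rule whose pattern matches the URL is a '+' rule."""
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if re.search(pattern, url):
            return sign == "+"
    return False  # no rule matched: treat the URL as rejected

if __name__ == "__main__":
    for url in (
        "http://www.cnn.com/exchange/submit/pokkariJavascript.js",
        "http://rss.cnn.com/rss/cnn_ireports.rss",
        "http://www.cnn.com/index.html",
    ):
        print(url, "->", "fetch" if accepts(url) else "skip")
```

If that is right, a suffix rule like the first one would keep the .js and
.rss URLs above from being fetched in the first place.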



Any help is appreciated.



On 4/12/07, Ratnesh,V2Solutions India
<[EMAIL PROTECTED]> wrote:
>
> Hi,
> What you can do is remove parse-js and other related plugins from
> both the nutch-site.xml and nutch-default.xml files.
> Changing nutch-default.xml is not recommended, though sometimes a
> change has no effect unless it is made there as well.
>
> So see which changes you need for your requirements. I am sure that
> once you remove parse-js it won't crawl JavaScript. Also try removing
> other plugins such as parse-msword.
>
> I hope that solves it.
>
> Ratnesh,V2Solutions,India
>
>
>
> Meryl Silverburgh wrote:
> >
> > Hi,
> >
> > How can I configure nutch to crawl just html links (no images, no
> > javascript files, no css files)?
> > I also don't want links to non-html pages recorded in the crawl database.
> >
> > thank you.
> >
> >
>
> --
> View this message in context: 
> http://www.nabble.com/How-to-config-nutch-just-crawl-html-links--tf3562947.html#a9957697
> Sent from the Nutch - User mailing list archive at Nabble.com.
>
>

_______________________________________________
Nutch-general mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-general
