Thanks a lot, Arkadi. I implemented the approach you suggested and it seems to be doing exactly what I want.

Thanks again,
Mark

On Sep 29, 2010, at 6:35 PM, <[email protected]> wrote:

Hi Mark,

I am not sure; maybe there is a simpler way. But if you want something to be fetched and processed but not indexed, you can write an index filter plugin and return null for documents that you don't want in the index. This is relatively easy to do; just use the index-basic filter as an example.
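A minimal, standalone sketch of the decision such a filter would make, assuming the page's URL and content type are available to it. `SkipJavaScriptFilter`, `SimpleDoc` and the list of MIME types are hypothetical names chosen for illustration, and `SimpleDoc` only stands in for Nutch's own document class. In a real plugin the same check would go inside the `filter(...)` method of a class implementing `org.apache.nutch.indexer.IndexingFilter` (the exact signature varies between Nutch versions), and the plugin would need to be listed in `plugin.includes`.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/**
 * Standalone sketch of an "exclude JavaScript from the index" decision.
 * SimpleDoc is a hypothetical stand-in for the document an indexing filter
 * receives; returning null means "do not index this document".
 */
public class SkipJavaScriptFilter {

    /** Hypothetical minimal document: a bag of field name/value pairs. */
    public static class SimpleDoc {
        final Map<String, String> fields = new HashMap<>();

        public SimpleDoc add(String name, String value) {
            fields.put(name, value);
            return this;
        }
    }

    // Common MIME types used for JavaScript (assumption: not exhaustive).
    private static final Set<String> JS_TYPES = Set.of(
        "application/javascript", "application/x-javascript", "text/javascript");

    /**
     * Returns null to drop the document from the index, or the same document
     * to keep it. Fetching and parsing have already happened by this point,
     * so only indexing is affected.
     */
    public static SimpleDoc filter(SimpleDoc doc, String url, String contentType) {
        if (doc == null) {
            return null; // an earlier filter already rejected it
        }
        if (contentType != null) {
            // Drop parameters such as "; charset=UTF-8" before comparing.
            String mime = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
            if (JS_TYPES.contains(mime)) {
                return null;
            }
        }
        // Fall back to the URL path when the content type is missing or generic.
        String path = url.toLowerCase(Locale.ROOT);
        int q = path.indexOf('?');
        if (q >= 0) {
            path = path.substring(0, q); // strip the query string
        }
        int h = path.indexOf('#');
        if (h >= 0) {
            path = path.substring(0, h); // strip the fragment
        }
        return path.endsWith(".js") ? null : doc;
    }
}
```

For example, `filter(doc, "http://example.com/app.js", null)` returns null, while an HTML page comes back unchanged. Checking the content type first and the `.js` extension second is a judgment call: servers often label scripts with a generic type, so the URL check catches those cases.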

Regards,

Arkadi

-----Original Message-----
From: Mark Stephenson [mailto:[email protected]]
Sent: Thursday, September 30, 2010 9:29 AM
To: [email protected]
Subject: Excluding javascript files from indexing and search results.

Hi,

I'm wondering if there's a way to prevent Nutch from indexing
JavaScript files. I would still like to fetch and parse JavaScript
files to find valuable outlinks, but I don't want them to show up in
my search results. Is there a good way to do this?

Thanks a lot,
Mark
