Sorry, I did not read your requirement completely (that you wanted to parse the JS files for outlinks). My bad.
Thanks
Raj

-----Original Message-----
From: Mark Stephenson [mailto:[email protected]]
Sent: Thursday, September 30, 2010 4:49 PM
To: [email protected]
Subject: Re: Excluding javascript files from indexing and search results.

Thanks a lot Arkadi. I implemented the approach you suggested and it seems to be doing exactly what I want.

Thanks again,
Mark

On Sep 29, 2010, at 6:35 PM, <[email protected]> wrote:

> Hi Mark,
>
> I am not sure, maybe there is a simpler way, but if you want
> something to be fetched and processed but not indexed, you can write
> an index filter plugin and return null for documents that you don't
> want in the index. This is relatively easy to do, just use the
> index-basic filter as an example.
>
> Regards,
>
> Arkadi
>
>> -----Original Message-----
>> From: Mark Stephenson [mailto:[email protected]]
>> Sent: Thursday, September 30, 2010 9:29 AM
>> To: [email protected]
>> Subject: Excluding javascript files from indexing and search results.
>>
>> Hi,
>>
>> I'm wondering if there's a way to prevent nutch from indexing
>> javascript files. I still would like to fetch and parse javascript
>> files to find valuable outlinks, but I don't want them to show up in
>> my search results. Is there a good way to do this?
>>
>> Thanks a lot,
>> Mark
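
For anyone reading this thread later, here is a minimal sketch of the decision an index filter like the one Arkadi describes might make. It is not Mark's actual plugin, which was not posted. The class name JsIndexDecision and the method shouldIndex are hypothetical, and the Nutch wiring (the filter class itself and its plugin descriptor) is left out because it depends on the Nutch version. The idea is only that the filter would return null instead of the document whenever shouldIndex is false, so JavaScript files are still fetched and parsed for outlinks but never reach the index.

```java
import java.util.Locale;

// Hypothetical helper, not part of Nutch. An indexing filter could call
// shouldIndex(...) and return null for the document when it yields false.
final class JsIndexDecision {

    private JsIndexDecision() {}

    // Returns false for documents that look like JavaScript, true otherwise.
    // Either argument may be null if that piece of information is unavailable.
    static boolean shouldIndex(String url, String contentType) {
        if (contentType != null) {
            String type = contentType.toLowerCase(Locale.ROOT);
            // Covers text/javascript, application/javascript,
            // application/x-javascript and the ecmascript variants.
            if (type.contains("javascript") || type.contains("ecmascript")) {
                return false;
            }
        }
        if (url != null) {
            String path = url.toLowerCase(Locale.ROOT);
            // Ignore any query string or fragment before checking the extension.
            int cut = firstIndexOf(path, '?', '#');
            if (cut >= 0) {
                path = path.substring(0, cut);
            }
            if (path.endsWith(".js")) {
                return false;
            }
        }
        return true;
    }

    // Position of whichever of the two characters occurs first, or -1 if neither occurs.
    private static int firstIndexOf(String s, char a, char b) {
        int ia = s.indexOf(a);
        int ib = s.indexOf(b);
        if (ia < 0) {
            return ib;
        }
        if (ib < 0) {
            return ia;
        }
        return Math.min(ia, ib);
    }

    public static void main(String[] args) {
        System.out.println(shouldIndex("http://example.com/app.js?v=2", null));
        System.out.println(shouldIndex("http://example.com/index.html", "text/html"));
    }
}
```

Checking both the content type and the URL extension is a design choice made here, not something stated in the thread: servers do not always label scripts consistently, so using either signal alone could let some JavaScript files through.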

