Sorry, I did not read your requirement completely (that you wanted to
parse the JS files for outlinks).  My bad.

Thanks
Raj


-----Original Message-----
From: Mark Stephenson [mailto:[email protected]] 
Sent: Thursday, September 30, 2010 4:49 PM
To: [email protected]
Subject: Re: Excluding javascript files from indexing and search results.

Thanks a lot Arkadi.  I implemented the approach you suggested and it  
seems to be doing exactly what I want.

Thanks again,
Mark

On Sep 29, 2010, at 6:35 PM, <[email protected]> wrote:

> Hi Mark,
>
> I am not sure; maybe there is a simpler way, but if you want
> something to be fetched and processed but not indexed, you can write
> an index filter plugin and return null for documents that you don't
> want in the index. This is relatively easy to do; just use the
> index-basic filter as an example.
>
> Regards,
>
> Arkadi
>
>> -----Original Message-----
>> From: Mark Stephenson [mailto:[email protected]]
>> Sent: Thursday, September 30, 2010 9:29 AM
>> To: [email protected]
>> Subject: Excluding javascript files from indexing and search results.
>>
>> Hi,
>>
>> I'm wondering if there's a way to prevent nutch from indexing
>> javascript files.  I still would like to fetch and parse javascript
>> files to find valuable outlinks, but I don't want them to show up in
>> my search results.  Is there a good way to do this?
>>
>> Thanks a lot,
>> Mark
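
Arkadi's suggestion above (an index filter that returns null for
documents that should stay out of the index) can be sketched roughly as
follows. This is only a stand-alone illustration of the decision logic,
not the Nutch plugin API: the class name JsIndexFilterSketch, the
method signatures, the use of a plain Map as the "document", and the
heuristic for recognising JavaScript (content type or a .js extension)
are all assumptions made for the example. A real plugin would implement
Nutch's indexing filter extension point, using index-basic as a model,
as Arkadi describes.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/**
 * Stand-alone sketch of the "return null to skip indexing" idea.
 * NOT the Nutch API: every name and signature here is hypothetical.
 */
class JsIndexFilterSketch {

    /** Removes the query string and fragment so the extension check sees only the path. */
    private static String stripQueryAndFragment(String url) {
        int end = url.length();
        int q = url.indexOf('?');
        int h = url.indexOf('#');
        if (q >= 0) end = Math.min(end, q);
        if (h >= 0) end = Math.min(end, h);
        return url.substring(0, end);
    }

    /** Assumed heuristic: JavaScript if the content type says so or the path ends in .js. */
    static boolean isJavaScript(String url, String contentType) {
        String type = contentType == null ? "" : contentType.toLowerCase(Locale.ROOT);
        String path = url == null ? "" : stripQueryAndFragment(url).toLowerCase(Locale.ROOT);
        return type.contains("javascript") || path.endsWith(".js");
    }

    /**
     * Mirrors the filter contract described in the thread: return the
     * document to keep it, or null to leave it out of the index.
     * Fetching and parsing (and so outlink extraction) happen earlier
     * and are not affected by this decision.
     */
    static Map<String, String> filter(Map<String, String> doc, String url, String contentType) {
        return isJavaScript(url, contentType) ? null : doc;
    }

    public static void main(String[] args) {
        Map<String, String> doc = new HashMap<>();
        doc.put("title", "Example page");
        System.out.println(filter(doc, "http://example.com/index.html", "text/html"));
        System.out.println(filter(doc, "http://example.com/app.js", "application/javascript"));
    }
}
```

Because the check runs only at indexing time, the JavaScript files are
still fetched and parsed, so any outlinks found in them remain available
to the crawl; the files simply never become searchable documents.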
