RESOLVED ---
I had to add a custom plugin, InvalidUrlIndexFilter, which filters out all
the invalid URLs while indexing the pages/files. Check out this blog post:
http://sujitpal.blogspot.com/2009/07/nutch-getting-my-feet-wet.html

Just follow the standard process for creating and adding a new custom plugin:
http://wiki.apache.org/nutch/WritingPluginExample-0.9

After adding this plugin, I was able to index the files while skipping the
directory index pages. Hope this helps.
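For illustration, here is a minimal, self-contained Java sketch of the kind of check such a filter could perform. It is not the InvalidUrlIndexFilter code from the blog. The class name DirectoryListingFilter and its methods are hypothetical, and the sketch assumes the directory pages can be recognized by a title starting with "Index of" (as in the examples quoted below). In a Nutch 1.x plugin, the same test would go inside the filter(...) method of an IndexingFilter implementation, where returning null tells the indexer to skip the document (check this against your Nutch version).

```java
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: decides whether a parsed page is an auto-generated
 * directory listing ("Index of ...") that should be kept out of the index.
 * Assumption: listing pages have a title beginning with "Index of".
 */
public class DirectoryListingFilter {

    // Matches titles such as "Index of C:\temp\html" or "Index of /docs/",
    // ignoring case and leading whitespace.
    private static final Pattern LISTING_TITLE =
            Pattern.compile("\\s*index of\\s+\\S.*", Pattern.CASE_INSENSITIVE);

    /** Returns true if the page title looks like a directory listing. */
    public static boolean isDirectoryListing(String title) {
        return title != null && LISTING_TITLE.matcher(title).matches();
    }

    /**
     * Mirrors the IndexingFilter convention: returns the URL unchanged to keep
     * the document, or null to tell the indexer to skip it.
     */
    public static String filter(String url, String title) {
        if (isDirectoryListing(title)) {
            return null; // skip directory index pages
        }
        return url;
    }

    public static void main(String[] args) {
        // Directory listing page: dropped
        System.out.println(filter("file:/C:/temp/html/", "Index of C:\\temp\\html"));
        // Regular page: kept
        System.out.println(filter("file:/C:/temp/html/page1.html", "Release notes"));
    }
}
```

Running main() prints "null" for the listing page and the unchanged URL for the regular page. Matching on the title rather than the URL keeps the check independent of how each directory URL happens to be written.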


On Wed, Apr 28, 2010 at 1:54 PM, BK <bk4...@gmail.com> wrote:

> Hello all,
>
> I have indexed a few directories which contain HTML files, and the *index of
> each directory* is showing up as one of the search results. Is there any way
> to skip these directory pages in the search results? E.g. *Index of
> C:\temp\html* and *Index of C:\temp\html\dir2* are showing up in the
> results, displaying the list of all files under a specific directory (end
> users won't need this info).
> Thanks!
>
