RESOLVED: I had to add a custom plugin, InvalidUrlIndexFilter, which filters out all the invalid URLs while the pages/files are being indexed. Check out this blog: http://sujitpal.blogspot.com/2009/07/nutch-getting-my-feet-wet.html
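The check such a filter performs might look roughly like the sketch below. It is not the InvalidUrlIndexFilter from the blog. `DirectoryListingCheck` and `isDirectoryListing` are hypothetical names. The rule itself is an assumption based on the "Index of ..." titles reported in this thread, plus the fact that a `file:` URL ending in "/" points at a directory. As far as I know, in a Nutch 1.x indexing filter you would call something like this from the plugin's `filter(...)` method and return null to keep the page out of the index. Check the IndexingFilter interface for your Nutch version before relying on that.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of Nutch): decides whether a crawled page
 * looks like an auto-generated directory listing that should be skipped.
 */
public class DirectoryListingCheck {

    // Assumption: directory listings carry a title like "Index of C:\temp\html".
    private static final Pattern LISTING_TITLE =
        Pattern.compile("^\\s*Index of\\s+\\S.*", Pattern.CASE_INSENSITIVE);

    /** Returns true if the page looks like a directory listing. */
    public static boolean isDirectoryListing(String url, String title) {
        if (title != null && LISTING_TITLE.matcher(title).matches()) {
            return true;
        }
        // Assumption: a file: URL ending in "/" refers to a directory, not a file.
        return url != null && url.startsWith("file:") && url.endsWith("/");
    }

    public static void main(String[] args) {
        System.out.println(isDirectoryListing("file:/C:/temp/html/", "Index of C:\\temp\\html")); // true
        System.out.println(isDirectoryListing("file:/C:/temp/html/page1.html", "Page One"));      // false
    }
}
```

Matching on the title keeps the rule independent of how the directory URL is written. The URL check is a fallback for listings that have no title.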
Just follow the process for creating/adding a new custom plugin: http://wiki.apache.org/nutch/WritingPluginExample-0.9

After adding this plugin, I was able to index the files while skipping these directory index pages. Hope this helps...

On Wed, Apr 28, 2010 at 1:54 PM, BK <bk4...@gmail.com> wrote:
> Hello all,
>
> I have indexed a few directories which contain HTML files, and the *index of
> each directory* is showing up as one of the search results. Is there any way
> to skip these directory pages in the search results? E.g. *Index of
> C:\temp\html* and *Index of C:\temp\html\dir2* are showing up in the results;
> each one displays the list of all files under a specific directory (end users
> won't need this info).
>
> Thanks!