Hi Bayu

Many thanks for that.  Disabling the directory index page and enabling a
directory listing has fixed the issue.  I now get three documents indexed:
the directory listing, index.html and index1.html.

Is there any way to stop Nutch from indexing (rather than crawling) the
directory listing itself?

Thanks

Paul


On 5 May 2014 18:57, Bayu Widyasanyata <[email protected]> wrote:

> On Tue, May 6, 2014 at 6:05 AM, Paul Rogers <[email protected]>
> wrote:
>
> > By that do you mean using file:// as opposed to http:// crawling?
>
>
> Yupe.
>
> https://wiki.apache.org/nutch/FAQ#Nutch_crawling_parent_directories_for_file_protocol
>
>
> --
> wassalam,
> [bayu]
>
