Hi Bayu,

Many thanks for that. Disabling the directory index page and enabling a directory listing has fixed the issue. I now get three documents indexed: the directory listing, index.html and index1.html.

Is there any way to stop Nutch from indexing (rather than crawling) the directory listing itself?

Thanks

Paul

On 5 May 2014 18:57, Bayu Widyasanyata <[email protected]> wrote:

> On Tue, May 6, 2014 at 6:05 AM, Paul Rogers <[email protected]> wrote:
>
> > By that do you mean using file:// as opposed to http:// crawling?
>
> Yupe.
>
> https://wiki.apache.org/nutch/FAQ#Nutch_crawling_parent_directories_for_file_protocol
>
> --
> wassalam,
> [bayu]

