Hi Bayu,

Many thanks for the response.
> Otherwise you can still also fetch it through "directory crawling" (instead of browser crawling)

By that do you mean using file:// as opposed to http:// crawling?

Thanks
P

On 5 May 2014 17:42, Bayu Widyasanyata <[email protected]> wrote:

> On Mon, May 5, 2014 at 10:34 PM, Paul Rogers <[email protected]>
> wrote:
>
> > My question is how do I get nutch to crawl all the files on a web site,
> > not just the "root" url?
>
> Hi,
>
> nutch acts as a crawler, in the same way we use any Internet browser.
> Neither nutch nor we can browse or crawl pages that have no referring
> page (a page that links to them).
> So, you should have a page that has a link to index1.html.
> The file index.html is crawled automatically since it should be your
> DirectoryIndex page.
>
> Otherwise you can still also fetch it through "directory crawling" (instead
> of browser crawling), or you can disable the directory index page setting
> (such as Apache's DirectoryIndex), so clients (nutch) can browse your entire
> directories.
>
> Thanks.-
>
> --
> wassalam,
> [bayu]
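To illustrate Bayu's point that a link-following crawler only discovers pages reachable from the seed URL, here is a minimal Python sketch. It is not Nutch code: the site is modelled as a hypothetical in-memory dict mapping each page to the pages it links to, and `discover` is a made-up helper doing a breadth-first traversal from index.html. The page about.html is also hypothetical.

```python
from collections import deque


def discover(site: dict[str, list[str]], seed: str) -> list[str]:
    """Return the pages reachable from `seed` by following links, in BFS order.

    `site` is a hypothetical in-memory model of a web site: each key is a
    page and each value lists the pages it links to. A real crawler such as
    Nutch fetches and parses pages over the network; this only models the
    link-following behaviour.
    """
    seen = {seed}
    order = []
    queue = deque([seed])
    while queue:
        page = queue.popleft()
        order.append(page)
        for link in site.get(page, []):
            # A page enters the queue only if some already-visited page links to it.
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order


# Hypothetical site: index1.html exists, but no page links to it.
site = {
    "index.html": ["about.html"],
    "about.html": ["index.html"],
    "index1.html": [],
}

print(discover(site, "index.html"))  # ['index.html', 'about.html']

# Adding a link to index1.html from a reachable page makes it discoverable.
site["about.html"].append("index1.html")
print(discover(site, "index.html"))  # ['index.html', 'about.html', 'index1.html']
```

The orphaned index1.html is never visited until a reachable page links to it, which is why Bayu suggests adding such a link (or exposing a directory listing) rather than expecting the crawler to find unlinked files on its own.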

