Hi Markus,

These are the files I think I should configure:
= prefix-urlfilter.txt: "file://" is already configured here.
= regex-urlfilter.txt: update the following line: -^(file|ftp|mailto) to -^(ftp|mailto):
= urls/seed.txt: add the new URL/file path.

...and then start crawling. Is that enough? CMIIW

Thanks-

On Wed, Jun 4, 2014 at 7:33 PM, Markus Jelsma <[email protected]> wrote:

> Hi Bayu,
>
> You must enable the protocol-file plugin first. Then make sure the file://
> prefix is not filtered via prefix-urlfilter.txt or any other filter. Now
> just inject the new URLs and start the crawl.
>
> Cheers
>
>
> -----Original message-----
> From: Bayu Widyasanyata <[email protected]>
> Sent: Wed 04-06-2014 14:30
> Subject: Crawling web and intranet files into single crawldb
> To: [email protected];
>
> Hi,
>
> I am successfully running Nutch 1.8 and Solr 4.8.1 to fetch and index web
> sources (HTTP protocol).
> Now I want to add file-share data sources (file protocol) to the current
> crawldb.
>
> What is the strategy or common practice for handling this situation?
>
> Thank you.-
>
> --
> wassalam,
> [bayu]

--
wassalam,
[bayu]
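For reference, the "enable protocol-file" step Markus describes would be done in conf/nutch-site.xml via the plugin.includes property. This is a sketch only; the plugin list below is illustrative, so merge protocol-file into your own existing plugin.includes value rather than copying it verbatim:

```xml
<!-- conf/nutch-site.xml: enable the file:// protocol plugin alongside http.
     Illustrative plugin list; adapt to your existing configuration. -->
<property>
  <name>plugin.includes</name>
  <value>protocol-(http|file)|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>
```

After this, the urlfilter edits above (removing `file` from the reject pattern in regex-urlfilter.txt and keeping `file://` in prefix-urlfilter.txt) let injected file:// seeds pass through to the fetcher.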

