RE: Crawling web and intranet files into single crawldb

Markus Jelsma Wed, 04 Jun 2014 05:34:52 -0700

Hi Bayu,

You must enable the protocol-file plugin first. Then make sure the file:// prefix
is not filtered out by prefix-urlfilter.txt or any other URL filter. Now just
inject the new URLs and start the crawl.
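A minimal Python sketch of the filter check, for illustration only. It emulates the filter rules and does not call Nutch. It assumes these rule semantics: prefix-urlfilter.txt accepts a URL only if it starts with one of the listed prefixes, and in regex-urlfilter.txt the first matching +/- rule decides, with an unmatched URL dropped. The share path /mnt/share/docs, the rule texts and the helper names are hypothetical. The stock regex-urlfilter.txt in Nutch 1.x usually contains a rule that skips file:, ftp: and mailto: URLs. That is exactly the kind of "other" filter that silently drops file:// seeds. To enable the plugin itself, add protocol-file to the plugin.includes property in nutch-site.xml. The URLs that pass can then be listed one per line in a seed file and injected with bin/nutch inject <crawldb> <url_dir>.

```python
import re
from pathlib import PurePosixPath


def prefix_filter(url: str, rules_text: str) -> bool:
    """Emulated prefix-urlfilter.txt: pass only if the URL starts with a listed prefix."""
    prefixes = []
    for line in rules_text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):  # skip blank lines and comments
            prefixes.append(line)
    return any(url.startswith(p) for p in prefixes)


def regex_filter(url: str, rules_text: str) -> bool:
    """Emulated regex-urlfilter.txt: first matching +/- rule wins, no match means reject."""
    for line in rules_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            continue  # this sketch simply skips malformed lines
        if re.search(pattern, url):
            return sign == "+"
    return False  # no rule matched: the URL is dropped


def seed_url(path: str) -> str:
    """Hypothetical helper: turn an absolute share path into a file:/// URL."""
    return PurePosixPath(path).as_uri()


if __name__ == "__main__":
    url = seed_url("/mnt/share/docs")  # hypothetical mount point of the file share
    prefix_rules = "# hypothetical prefix-urlfilter.txt\nhttp://\nhttps://\n"
    regex_rules = "# skip file: ftp: and mailto: urls\n-^(file|ftp|mailto):\n+.\n"
    print(url)
    print("prefix filter:", prefix_filter(url, prefix_rules))
    print("prefix filter with file://:", prefix_filter(url, prefix_rules + "file://\n"))
    print("regex filter:", regex_filter(url, regex_rules))
    print("regex filter without the file: skip:", regex_filter(url, "-^(ftp|mailto):\n+.\n"))
```

Running it prints file:///mnt/share/docs followed by False, True, False, True. The file:// seed is rejected until its prefix is listed and the file: skip rule is removed or narrowed.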

Cheers

-----Original message-----
From:Bayu Widyasanyata <[email protected]>
Sent:Wed 04-06-2014 14:30
Subject:Crawling web and intranet files into single crawldb
To:[email protected]; 
Hi,

I am successfully running Nutch 1.8 and Solr 4.8.1 to fetch and index web
sources (http protocol).
Now I want to add file share data sources (file protocol) to the current
crawldb.

What is the strategy, or common practice, for handling this situation?

Thank you.

-- 
wassalam,
[bayu]

