I need to index and search the contents of my local server's file system and, with the same search query, also search content crawled from a select set of web sites.
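To make the requirement concrete, here is a rough sketch in plain Java of the behaviour I am after: one query runs over documents from both sources, and each source gets its own weight. It uses no Lucene or Nutch API at all. The class names, the "local"/"web" tags, the weights, the sample paths and URLs, and the term-count scoring are all made up for illustration; a real search engine would do its own scoring.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class CombinedSearchSketch {

    // Hypothetical document shape: where it came from, how to reach it, its text.
    static final class Doc {
        final String source;
        final String location;
        final String text;

        Doc(String source, String location, String text) {
            this.source = source;
            this.location = location;
            this.text = text;
        }
    }

    // Hypothetical result shape: the matching document and its weighted score.
    static final class Hit {
        final Doc doc;
        final double score;

        Hit(Doc doc, double score) {
            this.doc = doc;
            this.score = score;
        }

        double score() {
            return score;
        }
    }

    // Counts case-insensitive occurrences of the term in each document and
    // multiplies the count by the weight configured for the document's source.
    // Sources without a configured weight default to 1.0.
    static List<Hit> search(List<Doc> docs, String term, Map<String, Double> sourceWeights) {
        String needle = term.toLowerCase(Locale.ROOT);
        List<Hit> hits = new ArrayList<>();
        for (Doc doc : docs) {
            int count = 0;
            for (String token : doc.text.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (token.equals(needle)) {
                    count++;
                }
            }
            if (count > 0) {
                double weight = sourceWeights.getOrDefault(doc.source, 1.0);
                hits.add(new Hit(doc, count * weight));
            }
        }
        // Highest weighted score first.
        hits.sort(Comparator.comparingDouble(Hit::score).reversed());
        return hits;
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
            new Doc("local", "/srv/docs/report.txt", "Quarterly crawler report"),
            new Doc("web", "http://example.com/a", "Crawler tips: tune the crawler politely"),
            new Doc("web", "http://example.com/b", "Unrelated page"));
        Map<String, Double> weights = Map.of("local", 3.0, "web", 1.0);
        for (Hit hit : search(docs, "crawler", weights)) {
            System.out.println(hit.doc.source + " " + hit.doc.location + " " + hit.score);
        }
    }
}
```

With these sample weights the local document ranks above the web page even though the web page mentions the term more often, which is the kind of control over ranking I would like to keep.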
Search over my local file system is already implemented with Lucene. I would like Nutch to only crawl the web sites and produce content, so that my Lucene search application can index and search the web content as well. I want to use standalone Lucene for indexing and searching the web content too, because I want the same analyzer for both sources and more control over the search results, for example applying different boosts to local content and web content. In short, I want to use the Nutch code for crawling and for retrieving the web links for search results, but do the indexing, searching, and analysis in Lucene itself.

Is there a solution where only the crawling part of Nutch is used and integrated with Lucene?

--
View this message in context: http://lucene.472066.n3.nabble.com/Separate-Nutch-crawl-and-Lucene-index-search-tp747841p747841.html
Sent from the Nutch - User mailing list archive at Nabble.com.