Hi All,

I am trying to decide if I could use Nutch for a project I am working on with the following requirements:

1. I need to be able to search a bunch of urls.
2. These urls are given to me and there is no need to crawl links from or to these urls.
3. From time to time new urls will be added to the original set of urls. I need to update the index as soon as I get a new url to add to the set.
4. There is no need to rank these urls based on outside links, etc.

Based on these requirements, it seems that most of the capabilities of Nutch (crawling, Hadoop, etc.) would be overkill for this project. There is no need for a linkdb, etc.

Because of this, I am thinking of using Solr with some other component to feed it the appropriate data. If I use Solr, I would need a mechanism to fetch those urls and convert them into the format Solr expects. Could I use Nutch for this by using just the Fetcher and building something that converts the HTML into the XML format Solr accepts? Is there anything else I could use that anyone here is aware of?
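For a fixed list of urls with no link following, the fetch-and-convert step is small enough to sketch without Nutch. Below is a minimal Python sketch under several assumptions: Solr accepts XML update messages (`<add><doc>...</doc></add>` followed by `<commit/>`) at an update URL such as `http://localhost:8983/solr/update` (check your own install); the field names `id`, `title` and `content` are hypothetical and must match your schema.xml; and HTML-to-text extraction uses Python's standard `html.parser`, not Nutch's parse plugins. The helper names (`html_to_fields`, `to_solr_add_xml`, `fetch`, `post_to_solr`) are illustrative only.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen
import xml.etree.ElementTree as ET


class TextExtractor(HTMLParser):
    """Collects the <title> and the visible text of an HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self._chunks = []
        self._in_title = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    @property
    def text(self):
        return " ".join(self._chunks)


def html_to_fields(url, html):
    """Turns one fetched page into a dict of Solr fields (hypothetical names)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # The url doubles as the unique key, so re-adding a url replaces its document.
    return {"id": url, "title": parser.title.strip(), "content": parser.text}


def to_solr_add_xml(docs):
    """Builds <add><doc><field name="...">...</field></doc></add>; ElementTree escapes text."""
    add = ET.Element("add")
    for fields in docs:
        doc = ET.SubElement(add, "doc")
        for name, value in fields.items():
            field = ET.SubElement(doc, "field", name=name)
            field.text = value
    return ET.tostring(add, encoding="unicode")


def fetch(url):
    """Downloads one url and decodes it using the charset from the response headers."""
    with urlopen(url, timeout=30) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def post_to_solr(update_url, xml_body):
    """POSTs an XML update message; update_url is an assumption about your Solr setup."""
    req = Request(update_url, data=xml_body.encode("utf-8"),
                  headers={"Content-Type": "text/xml; charset=utf-8"})
    with urlopen(req, timeout=30) as resp:
        return resp.status
```

To cover requirement 3, each newly added url could be handled on arrival: `fetch` it, pass the page through `html_to_fields` and `to_solr_add_xml`, `post_to_solr` the result, then post `<commit/>` so the document becomes searchable. Using the url as the `id` field means re-fetching a url overwrites its earlier document instead of duplicating it, assuming `id` is the schema's unique key.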

I am just starting out with Nutch and Solr and any help would be greatly appreciated.

Thanks,
Kumar.
