Hi, I am new to Nutch and Lucene. I have a task to build a framework for webpage caching on the local system (i.e. downloading and storing webpages in the local filesystem), indexing (indexing the pages by keyword), and search (searching the local webpage cache by keyword). The preference is to build the framework using Java APIs available in third-party jars.
At first glance, Nutch + Hadoop + Lucene seems like a good option for building this framework. Do you think it is the right option? Any ideas or links would be appreciated. Regards, Amit