Hi,
I am fairly new to Nutch. I have a search engine set up for local
files using the Lucene API. Many of the Lucene files have been customized for
the application I am working on. Now I want to extend it to fetch files from
the web and index them. I need the following functionality:

(1) Given a root URL, fetch all the pages under that URL.
(2) Index each page using the Lucene API I have customized.
(3) Later on, fetch only the documents that have been modified.

Can someone suggest where I can find more information on these issues,
or give some pointers on where to start?

Thanks,

Rajesh Munavalli
Blog: http://mathsearch.blogspot.com
