Hi,
I would like to perform incremental crawling with Nutch. What I want
to do is configure Nutch so that it checks for expired pages and
issues new fetches for those expired pages only.
Other requirements are:
- The ability to inject new URLs into the crawl database; when an
incremental crawl begins, Nutch should fetch the newly injected URLs.
- After an incremental crawl completes, either a new search index
should be created or the previous search index should be updated.
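For reference, the cycle I have in mind looks roughly like the sketch
below. This is only an illustration: the paths are placeholders, and the
exact command names and arguments may differ between Nutch versions, so
please check the usage output of bin/nutch for your release.

```shell
#!/bin/sh
# Hypothetical incremental crawl cycle. All paths below (crawl/crawldb,
# crawl/segments, urls, crawl/linkdb, crawl/indexes) are assumed layout,
# not something mandated by Nutch.

CRAWLDB=crawl/crawldb      # assumed crawl database location
SEGMENTS=crawl/segments    # assumed segments directory
URLS=urls                  # directory holding newly added seed URLs

# 1. Inject any newly discovered seed URLs into the crawl database.
bin/nutch inject $CRAWLDB $URLS

# 2. Generate a fetch list; pages whose fetch interval has elapsed
#    (db.fetch.interval.default in the configuration) become due again,
#    so only expired and newly injected URLs should be selected.
bin/nutch generate $CRAWLDB $SEGMENTS
SEGMENT=`ls -d $SEGMENTS/* | tail -1`

# 3. Fetch only the pages in the newly generated segment.
bin/nutch fetch $SEGMENT

# 4. Fold the fetch results back into the crawl database.
bin/nutch updatedb $CRAWLDB $SEGMENT

# 5. Re-index: either index the new segment and merge it into the
#    existing index, or rebuild the index from all segments.
bin/nutch invertlinks crawl/linkdb $SEGMENT
bin/nutch index crawl/indexes $CRAWLDB crawl/linkdb $SEGMENT
```

The open question for me is whether steps 2-5 alone are enough for the
incremental case, or whether extra configuration is needed.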
Can anyone suggest how to achieve this?
Thanks,
Kannan
_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers