Hi, our requirement is that Nutch should not recrawl pages that have already been crawled, i.e., crawling should not happen for web pages whose status is '2' in the webpage table. Nutch should neither recrawl these pages nor add their outlinks.
Can you please let me know whether this is possible by changing some configuration parameters in nutch-site.xml?

Thanks and Regards,
Deepa