I don't know of an elegant way, but if you're willing to hack the Nutch sources, you could, for example, set the URL's refetch time to some point very, very far in the future. Or you could introduce an additional status for such URLs.
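To make the first idea concrete, here is a rough, self-contained sketch. It does not use any real Nutch classes: Status, Entry, reschedule and FAR_FUTURE_MILLIS are all hypothetical names made up for illustration, and where exactly such a change would go in the Nutch sources is something you would have to work out yourself.

```java
import java.util.concurrent.TimeUnit;

public class AccessDeniedSketch {
    // Hypothetical stand-in for a protocol status; not Nutch's real constants.
    enum Status { SUCCESS, ACCESS_DENIED }

    // Hypothetical stand-in for a crawl record that stores the next fetch time.
    static final class Entry {
        long nextFetchMillis;

        Entry(long nextFetchMillis) {
            this.nextFetchMillis = nextFetchMillis;
        }
    }

    // Roughly 100 years: an assumed value for "very far in the future".
    static final long FAR_FUTURE_MILLIS = TimeUnit.DAYS.toMillis(365L * 100);

    // Push access-denied URLs far into the future; schedule everything else normally.
    static void reschedule(Entry entry, Status status, long nowMillis, long normalIntervalMillis) {
        if (status == Status.ACCESS_DENIED) {
            entry.nextFetchMillis = nowMillis + FAR_FUTURE_MILLIS;
        } else {
            entry.nextFetchMillis = nowMillis + normalIntervalMillis;
        }
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        long thirtyDays = TimeUnit.DAYS.toMillis(30);

        Entry denied = new Entry(now);
        reschedule(denied, Status.ACCESS_DENIED, now, thirtyDays);

        Entry ok = new Entry(now);
        reschedule(ok, Status.SUCCESS, now, thirtyDays);

        System.out.println("denied URL is due in days: "
                + TimeUnit.MILLISECONDS.toDays(denied.nextFetchMillis - now));
        System.out.println("normal URL is due in days: "
                + TimeUnit.MILLISECONDS.toDays(ok.nextFetchMillis - now));
    }
}
```

The URL stays in the database but is effectively never selected again, which is why this is a hack rather than a clean solution; a dedicated status would express the intent more clearly.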
Otis

----- Original Message ----
> From: Saurabh Suman <[email protected]>
> To: [email protected]
> Sent: Thursday, July 30, 2009 9:59:50 AM
> Subject: Meaning of ProtocolStatus.ACCESS_DENIED
>
> Hi,
> In Fetcher.java, if the protocol status of a URL is
> ProtocolStatus.ACCESS_DENIED, will Nutch try to crawl it again after a
> certain time interval? If yes, how can I prevent Nutch from recrawling
> it when its protocol status is ProtocolStatus.ACCESS_DENIED?
