Hi, thanks for the response. I spent the whole day searching for how to use filters to get CrawlDbMerger to delete unwanted URLs from the crawldb. I'm confused and don't have a clue how to do this. Could you please tell me exactly how?
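For anyone else reading the archive, here is a sketch of the approach Otis describes, assuming a standard Nutch install (the crawldb paths and the URL pattern are placeholders for your own values): add an exclusion rule to the regex URL filter, then run mergedb with the -filter switch so the configured URL filters are applied while the db is rewritten.

```shell
# 1. Add a reject rule to conf/regex-urlfilter.txt. Rules starting with
#    "-" exclude matching URLs; this pattern is a placeholder:
#
#      -^http://example\.com/unwanted
#
# 2. "Merge" the single crawldb into a new one with filtering enabled
#    (usage: bin/nutch mergedb <output_crawldb> <crawldb> [-filter]):
bin/nutch mergedb crawl/crawldb_filtered crawl/crawldb -filter

# 3. Swap the filtered db in place of the old one:
mv crawl/crawldb crawl/crawldb_old
mv crawl/crawldb_filtered crawl/crawldb
```

The filtered URLs are simply never written to the output crawldb, which has the effect of deleting them. Remember to remove or adjust the reject rule afterwards if you want the URL to be crawlable again later.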
ogjunk-nutch wrote:
>
> That's a good question, and one that you could add to the FAQ on the Wiki.
> From the quick scan of the source code, it doesn't look like you can
> delete a URL directly. However, you can filter it out with bin/nutch
> mergedb (uses CrawlDbMerger class, so check its javadocs), effectively
> removing the URL from CrawlDb.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
> ----- Original Message ----
>> From: oddaniel <[EMAIL PROTECTED]>
>> To: [email protected]
>> Sent: Saturday, April 19, 2008 4:20:04 AM
>> Subject: Delete Urls from CrawlsDB
>>
>> Is it possible to remove or delete one of the urls that has been crawled
>> from the crawl database? If this is possible, how can it be done?
>> --
>> View this message in context:
>> http://www.nabble.com/Delete-Urls-from-CrawlsDB-tp16773512p16773512.html
>> Sent from the Nutch - User mailing list archive at Nabble.com.

--
View this message in context: http://www.nabble.com/Delete-Urls-from-CrawlsDB-tp16773512p17010176.html
Sent from the Nutch - User mailing list archive at Nabble.com.
