Crawls more urls than specified

SravanS Fri, 02 Jul 2010 04:26:00 -0700

Hey guys,

So I previously crawled/indexed (nutched?!) two URLs in the same run.
Then I got rid of the crawl file and tried to re-crawl with just one
URL. However, it still seems to crawl both URLs.


I changed my urls file, as well as my crawl-urlfilter.txt, to limit the
domain to that one URL.
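
To make "limit the domain" concrete, here is a rough Python sketch of how I
understand the filter rules to be evaluated: rules are read top to bottom, a
leading '+' accepts, a leading '-' rejects, and the first pattern that matches
decides. The domains example.com and example.org are placeholders, not my
real URLs, and the function name accepts is made up for illustration; this is
not Nutch's own code.

```python
import re

# Hypothetical rules in the style of conf/crawl-urlfilter.txt.
# example.com is a placeholder domain, not a real setting.
RULES = [
    r"+^http://([a-z0-9]*\.)*example\.com/",  # accept anything on example.com
    r"-.",                                    # reject everything else
]

def accepts(url, rules=RULES):
    """Return True if the first rule whose pattern matches is a '+' rule."""
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        # Unanchored search: a pattern may match anywhere unless it uses ^ or $.
        if re.search(pattern, url):
            return sign == "+"
    return False  # no rule matched, so the URL is treated as rejected

if __name__ == "__main__":
    for url in ("http://www.example.com/page.html", "http://www.example.org/"):
        print(url, accepts(url))
```

With rules like these, only the first of the two test URLs should pass, which
is the behaviour I expected from my own filter.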

I tried re-downloading Nutch, resetting all my settings, and using only
that one URL, but it still crawls both URLs regardless.

I know this isn't much information to go on, so I'll just give the specs
of what I'm running.

I've used Nutch 0.9 and Nutch 1.0 on CentOS 5.2. I run the Nutch web app in
Tomcat 6.0. Same results every time.

Sincerely,
Sravan Suryadevara
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Crawls-more-urls-than-specified-tp929785p929785.html
Sent from the Nutch - User mailing list archive at Nabble.com.
