recrawl continuous

payo Mon, 17 Mar 2008 09:17:58 -0700

Hi all,

I am working with nutch-0.8.1 on CentOS.

I was previously working with Google, which let me run a continuous crawl.
That way I no longer had to run a complete crawl every time: it did a
single complete crawl at the beginning, and after that it only updated my
index when a site had changed.

My question is: is it possible to do this kind of continuous crawl with Nutch?

I am trying to index 7 sites, but a crawl takes more than 3 days:

[EMAIL PROTECTED] nutch-0.8]# ./bin/nutch readdb crawl2/crawldb -stats
CrawlDb statistics start: crawl2/crawldb
Statistics for CrawlDb: crawl2/crawldb
TOTAL urls:     286272
retry 0:        284788
retry 1:        856
retry 2:        628
min score:      0.0
avg score:      5.5150344E-5
max score:      1.396
status 1 (DB_unfetched):        23
status 2 (DB_fetched):  284463
status 3 (DB_gone):     1786
CrawlDb statistics: done
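
To make the numbers above easier to read, here is a rough sketch that turns the status counts into percentages of the total. It only parses text in the format shown above; the function name `parse_crawldb_stats` is hypothetical and is not part of Nutch.

```python
import re

# Lines copied from the `readdb -stats` output above.
STATS = """\
TOTAL urls:     286272
status 1 (DB_unfetched):        23
status 2 (DB_fetched):  284463
status 3 (DB_gone):     1786
"""


def parse_crawldb_stats(text):
    """Hypothetical helper: return (total, {status_name: count})."""
    total = None
    statuses = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Match the "TOTAL urls:" line.
        m = re.match(r"TOTAL urls:\s+(\d+)", line)
        if m:
            total = int(m.group(1))
            continue
        # Match lines such as "status 2 (DB_fetched):  284463".
        m = re.match(r"status \d+ \((\w+)\):\s+(\d+)", line)
        if m:
            statuses[m.group(1)] = int(m.group(2))
    return total, statuses


total, statuses = parse_crawldb_stats(STATS)
for name, count in statuses.items():
    # Print each status as a share of all URLs in the CrawlDb.
    print(f"{name}: {count} ({100.0 * count / total:.2f}%)")
```

The three status counts add up to the total, so almost everything is already fetched and the open question is how to keep it up to date.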


I am trying to set up Nutch with Hadoop to reduce the crawl time.

Any ideas that could help me?

Thanks in advance.

-- 
View this message in context: 
http://www.nabble.com/recrawl-continuos-tp16095581p16095581.html
Sent from the Nutch - User mailing list archive at Nabble.com.
