I have two scripts that fetch webpages from different parts of the internet
at various depths but update the same crawldb.

The fetch cycle of one crawler is pretty long, while the other crawler's is
pretty short (it fetches only about 100 links).

While running the two crawlers concurrently I have run into problems: Nutch
sometimes throws an IOException saying that the ".locked" file already
exists in the crawldb, while one of the crawl scripts is trying to generate
and/or update it.
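
From the error message, my understanding is that each generate/update claims
the crawldb by creating a ".locked" file in it and fails if that file already
exists. A sketch of the check as I imagine it (this is not Nutch's actual
code, just my mental model; the -force behaviour in particular is a guess):

    import java.io.File;
    import java.io.IOException;

    /** Sketch of the lock check I believe produces the IOException. */
    public class LockSketch {
        static void lock(File crawlDbDir, boolean force) throws IOException {
            File lockFile = new File(crawlDbDir, ".locked");
            if (lockFile.exists()) {
                if (!force) {
                    // the second concurrent generate/update lands here
                    throw new IOException("lock file " + lockFile + " already exists");
                }
                lockFile.delete();  // guess: -force removes the existing lock
            }
            lockFile.createNewFile();  // claim the crawldb for this job
        }
    }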

Why does this happen, and what can I do to avoid it? What does -force mean?

Any information / wiki links / documentation explaining locking and how it
works would be appreciated. 
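
In the meantime I am considering a stopgap: before each generate/update step,
wait until the lock file is gone. A rough sketch in Java (the ".locked" name
is taken from the error message; this is my own hypothetical helper, and it
is still racy, since two waiters can see the lock disappear and proceed at
the same time):

    import java.io.File;

    /** Stopgap: block until the crawldb's ".locked" marker disappears. */
    public class WaitForCrawlDbLock {
        public static void main(String[] args) throws InterruptedException {
            File lock = new File(args[0], ".locked");  // e.g. crawl/crawldb/.locked
            while (lock.exists()) {
                System.out.println("crawldb is locked, waiting...");
                Thread.sleep(5000);  // poll every five seconds
            }
            System.out.println("lock released, safe to run generate/update");
        }
    }

Since this does not actually remove the race, I would rather learn the
proper approach.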

Thanks.


