I am getting lock file errors all over the place when indexing, and
even when creating crawldbs.  It doesn't happen every time, but
sometimes it happens continuously.  I am not sure how these locks are
getting created in the first place, or why they aren't being removed.

I am not sure where to go from here.

My application is designed to crawl individual domains, so I have
multiple custom crawlers running concurrently.  Each one basically
does the following (a rough sketch of the equivalent invocations is
below the list):

1) fetch
2) invert links
3) segment merge
4) index
5) deduplicate
6) merge indexes
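
In code terms, each crawler does roughly the equivalent of running the
standard tools back to back.  This is only an illustrative sketch, not
my actual driver: the DomainCrawl class, the paths, and the segment
names are placeholders, and I am assuming the 0.9-era main() entry
points for each tool.

    import org.apache.nutch.fetcher.Fetcher;
    import org.apache.nutch.crawl.LinkDb;
    import org.apache.nutch.segment.SegmentMerger;
    import org.apache.nutch.indexer.Indexer;
    import org.apache.nutch.indexer.DeleteDuplicates;
    import org.apache.nutch.indexer.IndexMerger;

    public class DomainCrawl {
      public static void main(String[] args) throws Exception {
        String dir = "/crawloutput/http$~~www.example.com";   // per-domain dir
        String seg = dir + "/segments/20070401000000";        // placeholder,
        String merged = dir + "/MERGEDsegments/20070401000001"; // generated earlier

        Fetcher.main(new String[] { seg });                   // 1) fetch
        LinkDb.main(new String[]                              // 2) invert links
            { dir + "/linkdb", "-dir", dir + "/segments" });
        SegmentMerger.main(new String[]                       // 3) segment merge
            { dir + "/MERGEDsegments", "-dir", dir + "/segments" });
        Indexer.main(new String[]                             // 4) index
            { dir + "/indexes", dir + "/crawldb", dir + "/linkdb", merged });
        DeleteDuplicates.main(new String[]                    // 5) deduplicate
            { dir + "/indexes" });
        IndexMerger.main(new String[]                         // 6) merge indexes
            { dir + "/index", dir + "/indexes" });
      }
    }

Each crawler writes under its own /crawloutput/<domain> directory, so
in theory no two of them should be touching the same index at once.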


Also, I am still not 100% sure what the "indexes" directory is
actually for.

java.io.IOException: Lock obtain timed out:
[EMAIL PROTECTED]:/crawloutput/http$~~www.camlawblog.com/indexes/part-00000/write.lock
        at org.apache.lucene.store.Lock.obtain(Lock.java:69)
        at org.apache.lucene.index.IndexReader.aquireWriteLock(IndexReader.java:526)
        at org.apache.lucene.index.IndexReader.deleteDocument(IndexReader.java:551)
        at org.apache.nutch.indexer.DeleteDuplicates.reduce(DeleteDuplicates.java:414)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:313)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:155)
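
For what it's worth, here is a tiny standalone sketch of what I
suspect is happening: two tasks deleting from the same part-00000 at
once.  The WriteLockDemo class and the document numbers are made up,
and I am assuming the Lucene 1.9/2.0 API that Nutch bundles.

    import org.apache.lucene.index.IndexReader;

    public class WriteLockDemo {
      public static void main(String[] args) throws Exception {
        String idx = "/crawloutput/http$~~www.camlawblog.com/indexes/part-00000";

        // Two readers over the same index, as two concurrent reduce
        // tasks (or two crawlers sharing a directory) would create.
        IndexReader a = IndexReader.open(idx);
        IndexReader b = IndexReader.open(idx);

        a.deleteDocument(0); // first delete acquires Lucene's write.lock
        b.deleteDocument(1); // can't obtain write.lock, so after the timeout:
                             // "java.io.IOException: Lock obtain timed out"

        a.close();           // close() is what releases write.lock; if a task
        b.close();           // dies before close(), the lock file is left behind
      }
    }

If that is the mechanism, it would explain both halves of my question:
concurrency creates the locks, and a task that dies before calling
close() leaves them behind.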


So, has anyone seen this come up in their own implementations?
