I am getting these lock file errors all over the place when indexing, and even when creating crawldbs. It doesn't happen every run, but sometimes it happens continuously, so I am not quite sure how these locks are getting in there or why they aren't being removed.
I am not sure where to go from here. My current application is designed for crawling individual domains, so I have multiple custom crawlers that work concurrently. Each one basically does:

1) fetch
2) invert links
3) segment merge
4) index
5) deduplicate (this is where the error below is thrown; see the sketch after the trace)
6) merge indexes

Though I am still not 100% sure what the "indexes" directory is truly for.

java.io.IOException: Lock obtain timed out: [EMAIL PROTECTED]:/crawloutput/http$~~www.camlawblog.com/indexes/part-00000/write.lock
        at org.apache.lucene.store.Lock.obtain(Lock.java:69)
        at org.apache.lucene.index.IndexReader.aquireWriteLock(IndexReader.java:526)
        at org.apache.lucene.index.IndexReader.deleteDocument(IndexReader.java:551)
        at org.apache.nutch.indexer.DeleteDuplicates.reduce(DeleteDuplicates.java:414)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:313)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:155)

So, has anyone seen this come up on their own implementations?
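For what it's worth, since the trace shows LocalJobRunner, everything here is on the local filesystem, which at least makes the lock easy to find. My working theory is that if a crawler is killed (or crashes) mid-deduplicate, Lucene's write.lock file is left behind, and every later IndexReader.deleteDocument() call on that index then times out against the stale lock. As a stopgap I have been thinking about sweeping stale locks between runs. Here is a minimal sketch; the class name and root path are mine, not anything in Nutch, and it must only run while no job is touching the indexes, because removing write.lock under a live writer will corrupt the index:

import java.io.File;

// Stale-lock sweeper -- a sketch, not part of Nutch. Assumes the crawl
// output lives on the local filesystem (as with LocalJobRunner) and
// that NO job is currently running against these indexes.
public class StaleLockSweeper {

    // Recursively delete any Lucene "write.lock" files under dir.
    public static void sweep(File dir) {
        File[] entries = dir.listFiles();
        if (entries == null) {
            return; // not a directory, or unreadable
        }
        for (File entry : entries) {
            if (entry.isDirectory()) {
                sweep(entry);
            } else if ("write.lock".equals(entry.getName())) {
                System.out.println("removing stale lock: " + entry);
                if (!entry.delete()) {
                    System.err.println("could not delete " + entry);
                }
            }
        }
    }

    public static void main(String[] args) {
        // usage, e.g.: java StaleLockSweeper /crawloutput
        sweep(new File(args[0]));
    }
}

Of course, if two of my crawlers ever run step 5 against the same indexes/part-* directories at the same time, the timeout would be a legitimate conflict rather than a stale lock, so serializing the deduplicate/merge steps across crawlers may be the real fix.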