You should use Hadoop 0.12.3, for example, to dedup. The current 0.14.x versions don't support the lock operation.
2007/10/18, Matei Zaharia <[EMAIL PROTECTED]>:
>
> Hi,
>
> I'm sometimes getting the following error in the dedup 3 job when
> running Nutch 0.9 on top of Hadoop 0.14.2:
>
> java.io.IOException: Lock obtain timed out: [EMAIL PROTECTED]://r37:54310/user/matei/crawl4/indexes/part-00000/write.lock
>         at org.apache.lucene.store.Lock.obtain(Lock.java:69)
>         at org.apache.lucene.index.IndexReader.aquireWriteLock(IndexReader.java:526)
>         at org.apache.lucene.index.IndexReader.deleteDocument(IndexReader.java:551)
>         at org.apache.nutch.indexer.DeleteDuplicates.reduce(DeleteDuplicates.java:378)
>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:322)
>         at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1782)
>
> Other times, it works just fine. Do you know why this is happening?
>
> Thanks,
>
> Matei Zaharia
