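The Generator step fails because the crawldb lock file already exists (see the IOException in your log). Remove the lock files before generating a fetch list:

crawldb/.locked
crawldb/..locked.crc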
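
If you would rather script the cleanup than delete the files by hand, here is a minimal sketch. It assumes the crawldb lives on the local filesystem and that no other Nutch job is running against it at the time. The function name remove_stale_locks is my own, not part of Nutch; the two file names are the ones listed above.

```python
from pathlib import Path

# Lock file names as listed in the reply above.
LOCK_NAMES = (".locked", "..locked.crc")


def remove_stale_locks(crawldb_dir):
    """Delete leftover lock files in crawldb_dir and return the paths removed.

    Hypothetical helper, not a Nutch API. Only run it when you are sure
    no Nutch job is currently using this crawldb.
    """
    removed = []
    for name in LOCK_NAMES:
        lock = Path(crawldb_dir) / name
        # Skip names that are absent so the helper is safe to call repeatedly.
        if lock.is_file():
            lock.unlink()
            removed.append(str(lock))
    return removed
```

With the paths from your log, the call would be remove_stale_locks("/lib/nutch/crawl/crawldb"). If the crawldb is on HDFS rather than local disk, this will not work as written and the files would have to be removed through Hadoop's own filesystem tools.
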
> My nutch crawl is hanging here: (Any ideas why?)
>
> 2010-11-04 13:25:55,616 INFO crawl.Injector - Injector: starting at 2010-11-04 13:25:55
> 2010-11-04 13:25:55,617 INFO crawl.Injector - Injector: crawlDb: /lib/nutch/crawl/crawldb
> 2010-11-04 13:25:55,617 INFO crawl.Injector - Injector: urlDir: /lib/nutch/seed
> 2010-11-04 13:25:55,618 INFO crawl.Injector - Injector: Converting injected urls to crawl db entries.
> 2010-11-04 13:25:56,800 ERROR crawl.Generator - Generator: java.io.IOException: lock file /lib/nutch/crawl/crawldb/.locked already exists.
>     at org.apache.nutch.util.LockUtil.createLockFile(LockUtil.java:44)
>     at org.apache.nutch.crawl.Generator.generate(Generator.java:474)
>     at org.apache.nutch.crawl.Generator.run(Generator.java:692)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>     at org.apache.nutch.crawl.Generator.main(Generator.java:648)
>
> 2010-11-04 13:25:58,540 INFO segment.SegmentMerger - Merging 1 segments to /lib/nutch/crawl/MERGEDsegments/20101104132558
> 2010-11-04 13:25:58,543 WARN segment.SegmentMerger - Input dir /lib/nutch/crawl/segments/* doesn't exist, skipping.
> 2010-11-04 13:25:58,543 INFO segment.SegmentMerger - SegmentMerger: using segment data from: content crawl_generate crawl_fetch crawl_parse parse_data parse_text
> 2010-11-04 13:25:58,575 WARN mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> 2010-11-04 13:25:59,625 INFO crawl.LinkDb - LinkDb: starting at 2010-11-04 13:25:59
> 2010-11-04 13:25:59,626 INFO crawl.LinkDb - LinkDb: linkdb: /lib/nutch/crawl/linkdb
> 2010-11-04 13:25:59,626 INFO crawl.LinkDb - LinkDb: URL normalize: true
> 2010-11-04 13:25:59,626 INFO crawl.LinkDb - LinkDb: URL filter: true
> 2010-11-04 13:25:59,635 INFO crawl.LinkDb - LinkDb: adding segment: /lib/nutch/crawl/segments/*
> 2010-11-04 13:26:00,584 INFO solr.SolrIndexer - SolrIndexer: starting at 2010-11-04 13:26:00
> 2010-11-04 13:26:00,652 INFO indexer.IndexerMapReduce - IndexerMapReduce: crawldb: /lib/nutch/crawl/crawldb
> 2010-11-04 13:26:00,652 INFO indexer.IndexerMapReduce - IndexerMapReduce: linkdb: /lib/nutch/crawl/linkdb
> 2010-11-04 13:26:00,652 INFO indexer.IndexerMapReduce - IndexerMapReduces: adding segment: /lib/nutch/crawl/segments/*

