Removed. And I removed the ones in linkdb as well after a restart and reading the error (thanks, I know what to look for now). I am still hung here, so there must be something else I am missing:
2010-11-04 13:47:57,361 INFO crawl.Injector - Injector: starting at 2010-11-04 13:47:57
2010-11-04 13:47:57,361 INFO crawl.Injector - Injector: crawlDb: /lib/nutch/crawl/crawldb
2010-11-04 13:47:57,362 INFO crawl.Injector - Injector: urlDir: /lib/nutch/seed
2010-11-04 13:47:57,363 INFO crawl.Injector - Injector: Converting injected urls to crawl db entries.
2010-11-04 13:47:58,499 INFO crawl.Generator - Generator: starting at 2010-11-04 13:47:58
2010-11-04 13:47:58,500 INFO crawl.Generator - Generator: Selecting best-scoring urls due for fetch.
2010-11-04 13:47:58,500 INFO crawl.Generator - Generator: filtering: true
2010-11-04 13:47:58,500 INFO crawl.Generator - Generator: normalizing: true
2010-11-04 13:47:58,502 INFO crawl.Generator - Generator: jobtracker is 'local', generating exactly one partition.
2010-11-04 13:48:00,553 INFO segment.SegmentMerger - Merging 1 segments to /lib/nutch/crawl/MERGEDsegments/20101104134800
2010-11-04 13:48:00,555 WARN segment.SegmentMerger - Input dir /lib/nutch/crawl/segments/* doesn't exist, skipping.
2010-11-04 13:48:00,555 INFO segment.SegmentMerger - SegmentMerger: using segment data from: content crawl_generate crawl_fetch crawl_parse parse_data parse_text
2010-11-04 13:48:00,588 WARN mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2010-11-04 13:48:01,585 INFO crawl.LinkDb - LinkDb: starting at 2010-11-04 13:48:01
2010-11-04 13:48:01,586 INFO crawl.LinkDb - LinkDb: linkdb: /lib/nutch/crawl/linkdb
2010-11-04 13:48:01,586 INFO crawl.LinkDb - LinkDb: URL normalize: true
2010-11-04 13:48:01,586 INFO crawl.LinkDb - LinkDb: URL filter: true
2010-11-04 13:48:01,595 INFO crawl.LinkDb - LinkDb: adding segment: /lib/nutch/crawl/segments/*
2010-11-04 13:48:02,549 INFO solr.SolrIndexer - SolrIndexer: starting at 2010-11-04 13:48:02
2010-11-04 13:48:02,617 INFO indexer.IndexerMapReduce - IndexerMapReduce: crawldb: /lib/nutch/crawl/crawldb
2010-11-04 13:48:02,617 INFO indexer.IndexerMapReduce - IndexerMapReduce: linkdb: /lib/nutch/crawl/linkdb
2010-11-04 13:48:02,617 INFO indexer.IndexerMapReduce - IndexerMapReduces: adding segment: /lib/nutch/crawl/segments/*

-----Original Message-----
From: Markus Jelsma [mailto:[email protected]]
Sent: Thursday, November 04, 2010 11:39 AM
To: [email protected]
Subject: Re: False Start

Remove the lock files before generating a fetch list.

crawldb/.locked
crawldb/..locked.crc

> My nutch crawl is hanging here: (Any ideas why?)
>
> 2010-11-04 13:25:55,616 INFO crawl.Injector - Injector: starting at 2010-11-04 13:25:55
> 2010-11-04 13:25:55,617 INFO crawl.Injector - Injector: crawlDb: /lib/nutch/crawl/crawldb
> 2010-11-04 13:25:55,617 INFO crawl.Injector - Injector: urlDir: /lib/nutch/seed
> 2010-11-04 13:25:55,618 INFO crawl.Injector - Injector: Converting injected urls to crawl db entries.
> 2010-11-04 13:25:56,800 ERROR crawl.Generator - Generator: java.io.IOException: lock file /lib/nutch/crawl/crawldb/.locked already exists.
>         at org.apache.nutch.util.LockUtil.createLockFile(LockUtil.java:44)
>         at org.apache.nutch.crawl.Generator.generate(Generator.java:474)
>         at org.apache.nutch.crawl.Generator.run(Generator.java:692)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>         at org.apache.nutch.crawl.Generator.main(Generator.java:648)
>
> 2010-11-04 13:25:58,540 INFO segment.SegmentMerger - Merging 1 segments to /lib/nutch/crawl/MERGEDsegments/20101104132558
> 2010-11-04 13:25:58,543 WARN segment.SegmentMerger - Input dir /lib/nutch/crawl/segments/* doesn't exist, skipping.
> 2010-11-04 13:25:58,543 INFO segment.SegmentMerger - SegmentMerger: using segment data from: content crawl_generate crawl_fetch crawl_parse parse_data parse_text
> 2010-11-04 13:25:58,575 WARN mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> 2010-11-04 13:25:59,625 INFO crawl.LinkDb - LinkDb: starting at 2010-11-04 13:25:59
> 2010-11-04 13:25:59,626 INFO crawl.LinkDb - LinkDb: linkdb: /lib/nutch/crawl/linkdb
> 2010-11-04 13:25:59,626 INFO crawl.LinkDb - LinkDb: URL normalize: true
> 2010-11-04 13:25:59,626 INFO crawl.LinkDb - LinkDb: URL filter: true
> 2010-11-04 13:25:59,635 INFO crawl.LinkDb - LinkDb: adding segment: /lib/nutch/crawl/segments/*
> 2010-11-04 13:26:00,584 INFO solr.SolrIndexer - SolrIndexer: starting at 2010-11-04 13:26:00
> 2010-11-04 13:26:00,652 INFO indexer.IndexerMapReduce - IndexerMapReduce: crawldb: /lib/nutch/crawl/crawldb
> 2010-11-04 13:26:00,652 INFO indexer.IndexerMapReduce - IndexerMapReduce: linkdb: /lib/nutch/crawl/linkdb
> 2010-11-04 13:26:00,652 INFO indexer.IndexerMapReduces - IndexerMapReduces: adding segment: /lib/nutch/crawl/segments/*
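For anyone hitting the same error later: the lock-file cleanup suggested above can be sketched as a couple of shell commands. The paths are taken from the log output in this thread, and plain `rm` is assumed because the job ran with jobtracker 'local' (on a real HDFS deployment you would remove them through the Hadoop filesystem shell instead):

```shell
# Remove the stale crawldb lock left behind by the failed Generator run.
# Paths come from the log above; adjust if your crawl directory differs.
rm -f /lib/nutch/crawl/crawldb/.locked
rm -f /lib/nutch/crawl/crawldb/..locked.crc

# The same stale lock can be left under linkdb by an aborted LinkDb job.
rm -f /lib/nutch/crawl/linkdb/.locked
rm -f /lib/nutch/crawl/linkdb/..locked.crc
```

`rm -f` is used so the commands succeed even when a given lock file is already gone.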

