Hi Paul,

Can you post the error messages from the log file
(file:/Users/ptomblin/nutch-1.0/logs)?
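If it helps, something like this should pull the most recent errors out of it (assuming the default hadoop.log file name, which is where Nutch 1.0 normally writes its log; adjust if yours differs):

    # show the last few errors/exceptions from the Nutch log
    grep -n -E 'ERROR|Exception' logs/hadoop.log | tail -n 40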
On Mon, Jul 27, 2009 at 6:55 PM, Paul Tomblin <[email protected]> wrote:
> Actually, I got that error the first time I used it, and then again when I
> blew away the downloaded nutch and grabbed the latest trunk from Subversion.
>
> On Mon, Jul 27, 2009 at 1:11 AM, xiao yang <[email protected]> wrote:
>
>> You must have crawled several times, and some of the crawls failed
>> before the parse phase, so the parse data was never generated.
>> You'd better delete the whole directory
>> file:/Users/ptomblin/nutch-1.0/crawl.blog and recrawl; then the crawl
>> output will show you the exact reason it failed in the parse phase.
>>
>> Xiao
>>
>> On Fri, Jul 24, 2009 at 10:53 PM, Paul Tomblin <[email protected]> wrote:
>> > I installed nutch 1.0 on my laptop last night and set it running to
>> > crawl my blog with the command: bin/nutch crawl urls -dir crawl.blog -depth 10
>> > It was still running strong when I went to bed several hours later,
>> > and this morning I woke up to this:
>> >
>> > activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
>> > -activeThreads=0
>> > Fetcher: done
>> > CrawlDb update: starting
>> > CrawlDb update: db: crawl.blog/crawldb
>> > CrawlDb update: segments: [crawl.blog/segments/20090724010303]
>> > CrawlDb update: additions allowed: true
>> > CrawlDb update: URL normalizing: true
>> > CrawlDb update: URL filtering: true
>> > CrawlDb update: Merging segment data into db.
>> > CrawlDb update: done
>> > LinkDb: starting
>> > LinkDb: linkdb: crawl.blog/linkdb
>> > LinkDb: URL normalize: true
>> > LinkDb: URL filter: true
>> > LinkDb: adding segment: file:/Users/ptomblin/nutch-1.0/crawl.blog/segments/20090723154530
>> > LinkDb: adding segment: file:/Users/ptomblin/nutch-1.0/crawl.blog/segments/20090723155106
>> > LinkDb: adding segment: file:/Users/ptomblin/nutch-1.0/crawl.blog/segments/20090723155122
>> > LinkDb: adding segment: file:/Users/ptomblin/nutch-1.0/crawl.blog/segments/20090723155303
>> > LinkDb: adding segment: file:/Users/ptomblin/nutch-1.0/crawl.blog/segments/20090723155812
>> > LinkDb: adding segment: file:/Users/ptomblin/nutch-1.0/crawl.blog/segments/20090723161808
>> > LinkDb: adding segment: file:/Users/ptomblin/nutch-1.0/crawl.blog/segments/20090723171215
>> > LinkDb: adding segment: file:/Users/ptomblin/nutch-1.0/crawl.blog/segments/20090723193543
>> > LinkDb: adding segment: file:/Users/ptomblin/nutch-1.0/crawl.blog/segments/20090723224936
>> > LinkDb: adding segment: file:/Users/ptomblin/nutch-1.0/crawl.blog/segments/20090724004250
>> > LinkDb: adding segment: file:/Users/ptomblin/nutch-1.0/crawl.blog/segments/20090724010303
>> > Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException:
>> > Input path does not exist:
>> > file:/Users/ptomblin/nutch-1.0/crawl.blog/segments/20090723154530/parse_data
>> >     at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:179)
>> >     at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:39)
>> >     at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:190)
>> >     at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:797)
>> >     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1142)
>> >     at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:170)
>> >     at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:147)
>> >     at org.apache.nutch.crawl.Crawl.main(Crawl.java:129)
>> >
>> > --
>> > http://www.linkedin.com/in/paultomblin
>>
>
> --
> http://www.linkedin.com/in/paultomblin
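By the way, before you delete everything, you can check which segments are missing their parse output. A quick sketch, assuming the standard Nutch 1.0 segment layout (a fully parsed segment contains crawl_fetch, crawl_parse, parse_data and parse_text):

    # list segments that were fetched but never parsed
    for seg in crawl.blog/segments/*; do
        [ -d "$seg/parse_data" ] || echo "missing parse_data: $seg"
    done

Removing just those incomplete segments (or re-parsing them with 'bin/nutch parse <segment>', if that works in your build) might let the LinkDb step finish without redoing the whole crawl.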
