Actually, I got that error the first time I used it, and then again after I blew away the downloaded Nutch and grabbed the latest trunk from Subversion.
On Mon, Jul 27, 2009 at 1:11 AM, xiao yang <[email protected]> wrote:
> You must have crawled several times, and some of those crawls failed
> before the parse phase, so the parse data was never generated.
> You'd better delete the whole directory
> file:/Users/ptomblin/nutch-1.0/crawl.blog and recrawl it; then the
> output will show you the exact reason it failed in the parse phase.
>
> Xiao
>
> On Fri, Jul 24, 2009 at 10:53 PM, Paul Tomblin<[email protected]> wrote:
> > I installed nutch 1.0 on my laptop last night and set it running to crawl my
> > blog with the command: bin/nutch crawl urls -dir crawl.blog -depth 10
> > It was still running strong when I went to bed several hours later, and this
> > morning I woke up to this:
> >
> > activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
> > -activeThreads=0
> > Fetcher: done
> > CrawlDb update: starting
> > CrawlDb update: db: crawl.blog/crawldb
> > CrawlDb update: segments: [crawl.blog/segments/20090724010303]
> > CrawlDb update: additions allowed: true
> > CrawlDb update: URL normalizing: true
> > CrawlDb update: URL filtering: true
> > CrawlDb update: Merging segment data into db.
> > CrawlDb update: done
> > LinkDb: starting
> > LinkDb: linkdb: crawl.blog/linkdb
> > LinkDb: URL normalize: true
> > LinkDb: URL filter: true
> > LinkDb: adding segment: file:/Users/ptomblin/nutch-1.0/crawl.blog/segments/20090723154530
> > LinkDb: adding segment: file:/Users/ptomblin/nutch-1.0/crawl.blog/segments/20090723155106
> > LinkDb: adding segment: file:/Users/ptomblin/nutch-1.0/crawl.blog/segments/20090723155122
> > LinkDb: adding segment: file:/Users/ptomblin/nutch-1.0/crawl.blog/segments/20090723155303
> > LinkDb: adding segment: file:/Users/ptomblin/nutch-1.0/crawl.blog/segments/20090723155812
> > LinkDb: adding segment: file:/Users/ptomblin/nutch-1.0/crawl.blog/segments/20090723161808
> > LinkDb: adding segment: file:/Users/ptomblin/nutch-1.0/crawl.blog/segments/20090723171215
> > LinkDb: adding segment: file:/Users/ptomblin/nutch-1.0/crawl.blog/segments/20090723193543
> > LinkDb: adding segment: file:/Users/ptomblin/nutch-1.0/crawl.blog/segments/20090723224936
> > LinkDb: adding segment: file:/Users/ptomblin/nutch-1.0/crawl.blog/segments/20090724004250
> > LinkDb: adding segment: file:/Users/ptomblin/nutch-1.0/crawl.blog/segments/20090724010303
> > Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException: Input path does not exist:
> > file:/Users/ptomblin/nutch-1.0/crawl.blog/segments/20090723154530/parse_data
> >         at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:179)
> >         at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:39)
> >         at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:190)
> >         at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:797)
> >         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1142)
> >         at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:170)
> >         at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:147)
> >         at org.apache.nutch.crawl.Crawl.main(Crawl.java:129)
> >
> > --
> > http://www.linkedin.com/in/paultomblin
>

--
http://www.linkedin.com/in/paultomblin
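For reference, a rough sketch of how one could confirm which segments are missing parse_data before wiping everything, and then do the clean recrawl Xiao suggests. The loop and the rm are illustrative only; the crawl command is the one quoted above, and none of this has been tested against this particular setup:

  # list segments that never produced parse_data (these are what break the LinkDb step)
  for seg in crawl.blog/segments/*; do
      [ -d "$seg/parse_data" ] || echo "missing parse_data: $seg"
  done

  # per Xiao's suggestion: start clean and recrawl
  rm -rf crawl.blog
  bin/nutch crawl urls -dir crawl.blog -depth 10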
