I had similar problems caused by a lack of space in the temp directory. To solve it, I edited hadoop-site.xml and set hadoop.tmp.dir to a directory with plenty of space.
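
For reference, the property block looked roughly like this (a minimal sketch; /mnt/bigdisk/hadoop-tmp is just an example path, point it at any volume with enough free space):

  <!-- hadoop-site.xml: move Hadoop's temp/working files off the full volume.
       The value below is an example path, not a required location. -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/mnt/bigdisk/hadoop-tmp</value>
  </property>

After changing it, rerun invertlinks so the job picks up the new directory.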
> -----Original Message-----
> From: kevin chen [mailto:kevinc...@bdsing.com]
> Sent: Friday, March 19, 2010 1:42 PM
> To: nutch-user@lucene.apache.org
> Subject: Re: invertlinks: Input path does not exist
>
> Sounds like the last segment is corrupted.
> Did you try to remove the last segment?
>
> On Wed, 2010-03-17 at 16:10 +0000, Patricio Galeas wrote:
> > Hello all,
> >
> > I am crawling the web using the LanguageIdentifier plugin, but I get
> > an error when running nutch invertlinks. The error always occurs
> > while processing the last segment (20100317010313-81).
> >
> > The problem is the same as the one described in
> > http://www.mail-archive.com/nutch-user@lucene.apache.org/msg14776.html
> > With both syntax variants of invertlinks I get the same error:
> > a) nutch invertlinks crawl/linkdb -dir crawl/segments
> > b) nutch invertlinks crawl/linkdb crawl/segments/*
> >
> > I applied https://issues.apache.org/jira/browse/NUTCH-356 to avoid
> > some Java heap problems when using the LanguageIdentifier, but I got
> > the same error. ;-(
> >
> > I set NUTCH_HEAPSIZE to 6000 (the physical memory) and I merged the
> > segments using slice=50000.
> >
> > Any idea where to look?
> >
> > Thanks
> > Pato
> >
> > -------------------- hadoop.log ----------------------------------
> > ..
> > 2010-03-17 02:33:25,107 INFO crawl.LinkDb - LinkDb: adding segment: file:/mnt/nutch-1.0/crawl_al/segments/20100317010313-47
> > 2010-03-17 02:33:25,107 INFO crawl.LinkDb - LinkDb: adding segment: file:/mnt/nutch-1.0/crawl_al/segments/20100317010313-68
> > 2010-03-17 02:33:25,108 INFO crawl.LinkDb - LinkDb: adding segment: file:/mnt/nutch-1.0/crawl_al/segments/20100317010313-56
> > 2010-03-17 02:33:25,108 INFO crawl.LinkDb - LinkDb: adding segment: file:/mnt/nutch-1.0/crawl_al/segments/20100317010313-12
> > 2010-03-17 02:33:25,108 INFO crawl.LinkDb - LinkDb: adding segment: file:/mnt/nutch-1.0/crawl_al/segments/20100317010313-26
> > 2010-03-17 02:33:25,109 INFO crawl.LinkDb - LinkDb: adding segment: file:/mnt/nutch-1.0/crawl_al/segments/20100317010313-73
> > 2010-03-17 02:33:25,109 INFO crawl.LinkDb - LinkDb: adding segment: file:/mnt/nutch-1.0/crawl_al/segments/20100317010313-59
> > 2010-03-17 02:33:25,109 INFO crawl.LinkDb - LinkDb: adding segment: file:/mnt/nutch-1.0/crawl_al/segments/20100317010313-30
> > 2010-03-17 02:33:25,110 INFO crawl.LinkDb - LinkDb: adding segment: file:/mnt/nutch-1.0/crawl_al/segments/20100317010313-2
> > 2010-03-17 02:33:25,110 INFO crawl.LinkDb - LinkDb: adding segment: file:/mnt/nutch-1.0/crawl_al/segments/20100317010313-34
> > 2010-03-17 02:33:25,111 INFO crawl.LinkDb - LinkDb: adding segment: file:/mnt/nutch-1.0/crawl_al/segments/20100317010313-52
> > 2010-03-17 02:33:25,111 INFO crawl.LinkDb - LinkDb: adding segment: file:/mnt/nutch-1.0/crawl_al/segments/20100317010313-29
> > 2010-03-17 02:33:25,111 INFO crawl.LinkDb - LinkDb: adding segment: file:/mnt/nutch-1.0/crawl_al/segments/20100317010313-24
> > 2010-03-17 02:33:25,610 FATAL crawl.LinkDb - LinkDb: org.apache.hadoop.mapred.InvalidInputException:
> > Input path does not exist: file:/mnt/nutch-1.0/crawl_al/segments/20100317010313-81/parse_data
> >   at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:179)
> >   at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:39)
> >   at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:190)
> >   at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:797)
> >   at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1142)
> >   at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:170)
> >   at org.apache.nutch.crawl.LinkDb.run(LinkDb.java:285)
> >   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> >   at org.apache.nutch.crawl.LinkDb.main(LinkDb.java:248)
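
To make kevin's suggestion above concrete: the FATAL line shows file:/ URIs, so the crawl appears to live on the local filesystem and the failing segment can be inspected directly. A rough sketch (paths are taken from the log above; the sub-directory names are the standard Nutch 1.0 segment layout):

  # A fully fetched and parsed segment contains these sub-directories:
  #   content  crawl_fetch  crawl_generate  crawl_parse  parse_data  parse_text
  ls /mnt/nutch-1.0/crawl_al/segments/20100317010313-81

  # If parse_data is missing, try re-parsing the segment ...
  bin/nutch parse /mnt/nutch-1.0/crawl_al/segments/20100317010313-81

  # ... or drop the corrupted segment before rerunning invertlinks:
  rm -r /mnt/nutch-1.0/crawl_al/segments/20100317010313-81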