I'd also ask about your Hadoop temp dir, since you seem to be hitting disk errors. Have you set hadoop.tmp.dir explicitly?
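For reference, a minimal sketch of pinning the temp dir in conf/nutch-site.xml; the path below is only an example, point it at a disk with enough free space and write permission:

```xml
<!-- Example only: override Hadoop's default temp location -->
<property>
  <name>hadoop.tmp.dir</name>
  <value>/opt/nutch/tmp</value>
</property>
```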
On Tuesday 10 January 2012 17:59:58 Markus Jelsma wrote:
> I haven't followed the entire thread, but this is about the parse_data
> directory disappearing after a merge? We have no issues with merges on
> small crawls.
>
> Do you still store content despite the parsing fetcher? Can you reproduce
> this on a clean Nutch 1.4 build with an example crawl?
>
> On Thursday 05 January 2012 18:28:52 Dean Pullen wrote:
> > Hi all,
> >
> > I'm upgrading from Nutch 1.0 to 1.4 and am having problems running
> > invertlinks.
> >
> > Error:
> >
> > LinkDb: org.apache.hadoop.mapred.InvalidInputException: Input path does
> > not exist: file:/opt/nutch/data/crawl/segments/20120105172548/parse_data
> >     at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:190)
> >     at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:44)
> >     at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:201)
> >     at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
> >     at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
> >     at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
> >     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249)
> >     at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:175)
> >     at org.apache.nutch.crawl.LinkDb.run(LinkDb.java:290)
> >     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> >     at org.apache.nutch.crawl.LinkDb.main(LinkDb.java:255)
> >
> > I notice that the parse_data directories are produced after a fetch
> > (with fetcher.parse set to true), but after the merge the parse_data
> > directory doesn't exist.
> >
> > What behaviour has changed since 1.0, and does anyone have a solution
> > for the above?
> >
> > Thanks in advance,
> >
> > Dean.

-- 
Markus Jelsma - CTO - Openindex
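To narrow this down, it may help to list which segments actually still contain parse_data before running invertlinks. A sketch of that check; it simulates a throwaway segments layout here, so substitute your real path (e.g. /opt/nutch/data/crawl/segments from the trace) for SEGMENTS:

```shell
# Sketch only: simulate a segments directory to demonstrate the check.
SEGMENTS=$(mktemp -d)
mkdir -p "$SEGMENTS/20120105172548"             # merged segment with parse_data gone
mkdir -p "$SEGMENTS/20120105172549/parse_data"  # segment with parse output intact

# Report each segment and whether its parse_data directory survived.
for seg in "$SEGMENTS"/*; do
  if [ -d "$seg/parse_data" ]; then
    echo "OK      $seg"
  else
    echo "MISSING $seg"
  fi
done
```

If only post-merge segments show up as MISSING, that points at the merge step rather than the fetch/parse step.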

