Hi. What is this command supposed to do? Also, do you know if there is any way I can parse and save the text of HTML files while crawling?
On 7 April 2010 14:32, Gareth Gale <gareth.g...@hp.com> wrote:
> Running nutch 0.9 for a long time without problems, but have just started
> to see this error when executing (all from within the nutch 0.9 bin
> directory) :-
>
> ./nutch mergesegs $crawldir/MERGEDsegments $crawldir/segements/*
>
> The error is :-
>
> Exception in thread "main" java.io.IOException: No input paths specified in input
>         at org.apache.hadoop.mapred.InputFormatBase.validateInput(InputFormatBase.java:99)
>         at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:326)
>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:543)
>         at org.apache.nutch.segment.SegmentMerger.merge(SegmentMerger.java:590)
>         at org.apache.nutch.segment.SegmentMerger.main(SegmentMerger.java:638)
>
> I've tried both a 1.5 and 1.6 java vm but get the same result.
>
> I have no idea how this is happening or why, but need to fix it asap - any
> help much appreciated !
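
For reference, a rough sketch of how mergesegs is normally run and how parsed text can be pulled back out of a segment afterwards. mergesegs (org.apache.nutch.segment.SegmentMerger) just merges several crawl segments into one output segment. The paths below reuse $crawldir from the quoted mail, the segment name 20100407123456 is a made-up placeholder, and the readseg flags are from memory of the 0.9-era SegmentReader, so please check the usage that bin/nutch prints before relying on them:

    # mergesegs combines several segments into a single merged segment.
    # "No input paths specified in input" usually means the segment glob
    # expanded to nothing - note the quoted command spells the directory
    # "segements"; if it is actually named "segments", nothing matches.
    ./nutch mergesegs $crawldir/MERGEDsegments $crawldir/segments/*

    # To get the parsed plain text of fetched pages out of a segment,
    # readseg -dump writes a human-readable dump of the segment; the -no*
    # flags below (if I recall them correctly) suppress everything except
    # the parse_text part, i.e. the extracted text of the HTML pages.
    ./nutch readseg -dump $crawldir/segments/20100407123456 $crawldir/textdump \
        -nocontent -nofetch -nogenerate -noparse -noparsedata

This only dumps text after the crawl has fetched and parsed the pages; it is not a hook into the crawl itself, just a way to read what the parser already stored in the segment.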