Re: crawling a subfolder

Nestor Mon, 03 Oct 2016 16:49:02 -0700

I looked at the link you sent and tried it, but it failed.

Thanks,


$ bin/nutch mergesegs crawl/merged crawl/segments/*
Merging 1 segments to crawl/merged/20161003234422
SegmentMerger:   adding crawl/segments/20161003222933
SegmentMerger: using segment data from: content crawl_generate crawl_fetch crawl_parse parse_data parse_text
$ bin/nutch readseg -dump crawl/merged/* dumpedContent
SegmentReader: dump segment: crawl/merged/20161003234422
Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/home/ubuntu/temtomcat/apache-nutch-1.7/runtime/local/crawl/merged/20161003234422/crawl_parse
Input path does not exist: file:/home/ubuntu/temtomcat/apache-nutch-1.7/runtime/local/crawl/merged/20161003234422/content
Input path does not exist: file:/home/ubuntu/temtomcat/apache-nutch-1.7/runtime/local/crawl/merged/20161003234422/parse_data
Input path does not exist: file:/home/ubuntu/temtomcat/apache-nutch-1.7/runtime/local/crawl/merged/20161003234422/parse_text
        at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:197)
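
The exception names parts of the merged segment that readseg could not find. A rough way to check which parts a segment directory actually contains is sketched below. The helper name missing_parts is made up for illustration and is not a Nutch command; the part names are the ones listed in the SegmentMerger output above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nutch): print the names of the
# segment parts that do not exist as subdirectories of a segment.
missing_parts() {
    seg="$1"
    missing=""
    # Part names copied from the "using segment data from:" line above.
    for part in content crawl_generate crawl_fetch crawl_parse parse_data parse_text; do
        if [ ! -d "$seg/$part" ]; then
            missing="$missing $part"
        fi
    done
    # Strip the leading space; prints an empty line if nothing is missing.
    echo "${missing# }"
}

# Example call (path taken from the output above):
# missing_parts crawl/merged/20161003234422
```

Running the same check against the original segment under crawl/segments/ would show whether those parts were already absent before the merge.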




--
View this message in context: 
http://lucene.472066.n3.nabble.com/crawling-a-subfolder-tp4299300p4299375.html
Sent from the Nutch - User mailing list archive at Nabble.com.
