CLASSIFICATION: UNCLASSIFIED

Try using full paths, i.e.:

/home/whatever/nutch/bin/nutch mergesegs /home/whatever/nutch/crawl/merged /home/whatever/nutch/crawl/segments/*
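As a rough sketch of the full-path suggestion: the snippet below only prints the mergesegs command with every path spelled out, so it can be checked before it is run. NUTCH_HOME and merge_cmd are names made up for this example, and /home/whatever/nutch is the same placeholder as above, not a real install location.

```shell
#!/bin/sh
# Hypothetical install location; /home/whatever/nutch is a placeholder.
NUTCH_HOME="${NUTCH_HOME:-/home/whatever/nutch}"

# Print the mergesegs command with every path written out in full.
# $1 = absolute path of the crawl directory, e.g. "$NUTCH_HOME/crawl"
merge_cmd() {
    crawl_dir="$1"
    # The trailing * is quoted here, so it is printed literally and only
    # expanded by the shell when the printed command is actually run.
    printf '%s mergesegs %s %s\n' \
        "$NUTCH_HOME/bin/nutch" \
        "$crawl_dir/merged" \
        "$crawl_dir/segments/*"
}

merge_cmd "$NUTCH_HOME/crawl"
```

Printing first is just a cautious choice; once the paths look right, the printed line can be pasted into the shell as is.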
Thanks,
Kris

~~~~~~~~~~~~~~~~~~~~~~~~~~
Kris T. Musshorn
FileMaker Developer - Contractor - Catapult Technology Inc.
US Army Research Lab
Aberdeen Proving Ground
Application Management & Development Branch
410-278-7251
[email protected]
~~~~~~~~~~~~~~~~~~~~~~~~~~

-----Original Message-----
From: Nestor [mailto:[email protected]]
Sent: Monday, October 03, 2016 7:48 PM
To: [email protected]
Subject: [Non-DoD Source] Re: crawling a subfolder

I looked at the link you sent and tried it, but it failed.

Thanks,

$ bin/nutch mergesegs crawl/merged crawl/segments/*
Merging 1 segments to crawl/merged/20161003234422
SegmentMerger: adding crawl/segments/20161003222933
SegmentMerger: using segment data from: content crawl_generate crawl_fetch crawl_parse parse_data parse_text

$ bin/nutch readseg -dump crawl/merged/* dumpedContent
SegmentReader: dump segment: crawl/merged/20161003234422
Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/home/ubuntu/temtomcat/apache-nutch-1.7/runtime/local/crawl/merged/20161003234422/crawl_parse
Input path does not exist: file:/home/ubuntu/temtomcat/apache-nutch-1.7/runtime/local/crawl/merged/20161003234422/content
Input path does not exist: file:/home/ubuntu/temtomcat/apache-nutch-1.7/runtime/local/crawl/merged/20161003234422/parse_data
Input path does not exist: file:/home/ubuntu/temtomcat/apache-nutch-1.7/runtime/local/crawl/merged/20161003234422/parse_text
        at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:197)

--
View this message in context: http://lucene.472066.n3.nabble.com/crawling-a-subfolder-tp4299300p4299375.html
Sent from the Nutch - User mailing list archive at Nabble.com.
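
The readseg error in the quoted message says that crawl_parse, content, parse_data and parse_text do not exist under the merged segment. One way to see what a segment directory actually contains is a small check like the sketch below. It only tests for the six part names listed in the SegmentMerger output above; missing_parts is a helper name made up for this example, and the check says nothing about why a part is missing.

```shell
#!/bin/sh
# List which of the six segment parts named in the SegmentMerger output
# are absent from a segment directory.
# $1 = segment directory, e.g. crawl/merged/20161003234422
missing_parts() {
    seg="$1"
    for part in content crawl_generate crawl_fetch crawl_parse parse_data parse_text; do
        # Print the part name only when its subdirectory does not exist.
        [ -d "$seg/$part" ] || echo "$part"
    done
}

# Run the check only when a segment directory is given on the command line.
[ $# -eq 0 ] || missing_parts "$1"
```

Running it against both the source segment under crawl/segments and the merged one under crawl/merged would show whether the parts were already absent before the merge.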

