CLASSIFICATION: UNCLASSIFIED

Try using full paths, e.g.:
/home/whatever/nutch/bin/nutch mergesegs /home/whatever/nutch/crawl/merged /home/whatever/nutch/crawl/segments/*
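
If it still fails after that, it may help to check which parts the merged segment actually contains before running readseg. The following is only a minimal sketch, assuming a POSIX shell. `missing_segment_parts` is a hypothetical helper, not a Nutch command, and the list of part names is simply the one SegmentMerger prints in the output quoted below.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nutch): print the names of the segment
# subdirectories that are missing from the given segment directory, one per
# line. Prints nothing if all six parts are present.
missing_segment_parts() {
    seg_dir=$1
    # Part names as listed in the SegmentMerger output in this thread.
    for part in content crawl_generate crawl_fetch crawl_parse parse_data parse_text; do
        [ -d "$seg_dir/$part" ] || echo "$part"
    done
}

# Example usage (the path is a placeholder, adjust to your install):
#   for seg in /home/whatever/nutch/crawl/merged/*; do
#       echo "$seg:"
#       missing_segment_parts "$seg"
#   done
```

Running the same check against the original directory under crawl/segments shows whether the parts were already missing before the merge.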

Thanks,
Kris

~~~~~~~~~~~~~~~~~~~~~~~~~~
Kris T. Musshorn
FileMaker Developer - Contractor - Catapult Technology Inc.      
US Army Research Lab 
Aberdeen Proving Ground 
Application Management & Development Branch 
410-278-7251
[email protected]
~~~~~~~~~~~~~~~~~~~~~~~~~~

-----Original Message-----
From: Nestor [mailto:[email protected]] 
Sent: Monday, October 03, 2016 7:48 PM
To: [email protected]
Subject: [Non-DoD Source] Re: crawling a subfolder

I looked at the link you sent and tried it, but it failed.

Thanks,

$ bin/nutch mergesegs crawl/merged crawl/segments/*
Merging 1 segments to crawl/merged/20161003234422
SegmentMerger:   adding crawl/segments/20161003222933
SegmentMerger: using segment data from: content crawl_generate crawl_fetch crawl_parse parse_data parse_text
$ bin/nutch readseg -dump crawl/merged/* dumpedContent
SegmentReader: dump segment: crawl/merged/20161003234422
Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException:
Input path does not exist: file:/home/ubuntu/temtomcat/apache-nutch-1.7/runtime/local/crawl/merged/20161003234422/crawl_parse
Input path does not exist: file:/home/ubuntu/temtomcat/apache-nutch-1.7/runtime/local/crawl/merged/20161003234422/content
Input path does not exist: file:/home/ubuntu/temtomcat/apache-nutch-1.7/runtime/local/crawl/merged/20161003234422/parse_data
Input path does not exist: file:/home/ubuntu/temtomcat/apache-nutch-1.7/runtime/local/crawl/merged/20161003234422/parse_text
        at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:197)




--
View this message in context: 
http://lucene.472066.n3.nabble.com/crawling-a-subfolder-tp4299300p4299375.html
Sent from the Nutch - User mailing list archive at Nabble.com.

