To read what is in Nutch, try mergesegs and then readseg: http://stackoverflow.com/questions/7968534/dump-all-segments-from-nutch
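As a rough sketch of those two steps, something like the following should work. The paths (bin/nutch, crawl/segments, crawl/merged, crawl/dump) are assumptions for illustration, not taken from your setup, and the run helper is just a hypothetical wrapper that prints each command. By default the script only prints what it would run; set DRY_RUN=0 to actually execute it.

```shell
#!/bin/sh
# Sketch: merge all Nutch segments into one, then dump it as plain text.
# Every path below is an assumption; adjust to your own install and crawl dir.
NUTCH="${NUTCH:-bin/nutch}"             # assumed location of the nutch launcher
SEGMENTS="${SEGMENTS:-crawl/segments}"  # assumed directory holding the crawl segments
MERGED="${MERGED:-crawl/merged}"        # hypothetical output dir for the merged segment
DUMP="${DUMP:-crawl/dump}"              # hypothetical output dir for the text dump
DRY_RUN="${DRY_RUN:-1}"                 # 1 = only print the commands, 0 = run them too

# Hypothetical helper: print the command, and execute it unless in dry-run mode.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

# 1. Merge every segment under $SEGMENTS into a single segment inside $MERGED.
run "$NUTCH" mergesegs "$MERGED" -dir "$SEGMENTS"

# 2. Dump the merged segment to text. The merged segment is named by timestamp,
#    so a glob is used here on the assumption that $MERGED holds only that one.
run "$NUTCH" readseg -dump "$MERGED"/* "$DUMP"
```

The dump should end up as plain text under the output directory, so you can look at it with less or grep without going through Solr.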
----- Original Message -----
From: "Néstor" <[email protected]>
To: [email protected]
Sent: Monday, October 3, 2016 11:49:15 AM
Subject: crawling a subfolder

Hi,

I am using Nutch for the first time. When I crawl www.mysite.com, it crawls for a while. When I try to crawl a subfolder like www.mysite.com/mysubfolder, it crawls for only about 1 second.

My urls/seed.txt is set to:
http://www.mysite.com/mysubfolder

My regex-urlfilter.txt uses the defaults except for the last 2 lines:
#+.
+^http://www.mysite.org/mysubfolder

Also, when I try to access the results at http://mysite.com:8080/CLIPS using Solr, I only see 10 records.

What could I be missing?
How do I get all the records found?
Is there a way to look at the crawled data without Solr?

Thanks,

--
Né§t☼r
*Authority gone to one's head is the greatest enemy of Truth*

