Re: crawling a subfolder

KRIS MUSSHORN Mon, 03 Oct 2016 09:31:39 -0700

To read what is in Nutch, try mergesegs and then readseg:
http://stackoverflow.com/questions/7968534/dump-all-segments-from-nutch
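A minimal Python sketch of that two-step approach. It assumes a Nutch 1.x install run from its home directory, with a crawl directory laid out as crawl/segments. Those paths and the helper names (merge_cmd, dump_cmd, dump_crawl) are hypothetical. The sketch only shells out to bin/nutch mergesegs and bin/nutch readseg -dump, so check the usage messages of your Nutch version before relying on the exact arguments.

```python
import glob
import os
import subprocess


def merge_cmd(nutch, merged_dir, segments_dir):
    """Build: bin/nutch mergesegs <output_dir> -dir <segments_dir>."""
    return [nutch, "mergesegs", merged_dir, "-dir", segments_dir]


def dump_cmd(nutch, segment_dir, out_dir):
    """Build: bin/nutch readseg -dump <segment_dir> <output_dir>."""
    return [nutch, "readseg", "-dump", segment_dir, out_dir]


def dump_crawl(nutch="bin/nutch", segments_dir="crawl/segments",
               merged_dir="crawl/merged", out_dir="crawl/dump"):
    """Merge all segments, then dump the merged one as readable text.

    All default paths are hypothetical; adjust them to your crawl layout.
    """
    # Step 1: merge every segment under segments_dir into one new segment.
    subprocess.run(merge_cmd(nutch, merged_dir, segments_dir), check=True)

    # mergesegs should create a timestamp-named segment directory inside
    # merged_dir; timestamps sort chronologically, so take the newest.
    merged = sorted(d for d in glob.glob(os.path.join(merged_dir, "*"))
                    if os.path.isdir(d))
    if not merged:
        raise RuntimeError(f"no merged segment found under {merged_dir}")

    # Step 2: dump that single segment so it can be read without Solr.
    subprocess.run(dump_cmd(nutch, merged[-1], out_dir), check=True)
    return out_dir
```

Calling dump_crawl() from the Nutch home directory should leave a plain-text dump under crawl/dump. Merging first means a single readseg call covers every segment instead of one call per segment.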

----- Original Message -----

From: "Néstor" <[email protected]> 
To: [email protected] 
Sent: Monday, October 3, 2016 11:49:15 AM 
Subject: crawling a subfolder 

Hi, 

I am using Nutch for the first time. When I crawl www.mysite.com, it
crawls for a while, but when I try to crawl a subfolder like
www.mysite.com/mysubfolder, it crawls for only about 1 second.

My urls/seed.txt is set to
http://www.mysite.com/mysubfolder
My regex-urlfilter.txt uses the default except for the last two lines:

#+. 
+^http://www.mysite.org/mysubfolder 
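The seed above is on mysite.com, while the remaining + rule names mysite.org. It may therefore be worth checking which URLs the filter actually lets through. Below is a rough Python approximation of how regex-urlfilter.txt is documented to behave:

- Blank lines and # comments are skipped.
- Each rule is + or - followed by a regex.
- The first rule whose pattern matches anywhere in the URL decides.
- A URL that matches no rule is dropped.

This is not Nutch's own code (Nutch uses Java regexes), and parse_rules and accepts are hypothetical names.

```python
import re


def parse_rules(text):
    """Parse regex-urlfilter.txt-style text into (include, pattern) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comments such as the disabled "#+." rule.
        if not line or line.startswith("#"):
            continue
        sign, pattern = line[0], line[1:]
        if sign not in ("+", "-"):
            raise ValueError(f"rule must start with + or -: {line!r}")
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules, url):
    """First matching rule wins; a URL matching no rule is rejected."""
    for include, pattern in rules:
        if pattern.search(url):
            return include
    return False
```

With the two lines shown, accepts(rules, "http://www.mysite.com/mysubfolder") comes out False. Under this model the seed itself would be filtered out, which could explain a crawl that stops after about a second. A rule such as +^http://www\.mysite\.com/mysubfolder would accept the seed, and escaping the dots also stops "." from matching any character.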

Also, when I try to access the results on http://mysite.com:8080/CLIPS
using Solr, I only see 10 records.
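Solr returns 10 rows per request unless the rows parameter says otherwise, which could explain seeing only 10 records. Below is a minimal Python sketch that pages through every match using start and rows. It assumes the core's standard select handler and a JSON response (wt=json). The base URL http://mysite.com:8080/solr/CLIPS in the usage line is only a guess at where the CLIPS core lives, and select_url, fetch_json and fetch_all_docs are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request


def select_url(base, query="*:*", start=0, rows=10):
    """Build a Solr select URL for one page of results."""
    params = urllib.parse.urlencode(
        {"q": query, "start": start, "rows": rows, "wt": "json"})
    return f"{base.rstrip('/')}/select?{params}"


def fetch_json(url):
    """GET a URL and decode the JSON body."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def fetch_all_docs(base, query="*:*", page_size=100, fetch=fetch_json):
    """Collect every matching document, page_size rows at a time."""
    docs, start = [], 0
    while True:
        data = fetch(select_url(base, query, start, page_size))
        batch = data["response"]["docs"]
        docs.extend(batch)
        start += len(batch)
        # numFound is the total hit count Solr reports for the query.
        if not batch or start >= data["response"]["numFound"]:
            break
    return docs
```

Usage would be along the lines of docs = fetch_all_docs("http://mysite.com:8080/solr/CLIPS"). Paging keeps each response small. For a one-off check, a single request with a large rows value also works.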

What could I be missing?
How do I get all the records that were found?
Is there a way to look at the crawl data without Solr?

Thanks, 

-- 
Né§t☼r *Authority gone to one's head is the greatest enemy of Truth*
