crawling a subfolder

Néstor Mon, 03 Oct 2016 08:57:41 -0700

Hi,

I am using Nutch for the first time. When I crawl www.mysite.com, it
crawls for a while.
When I try to crawl a subfolder like www.mysite.com/mysubfolder, it crawls
for only about 1 second.


My urls/seed.txt is set to:
http://www.mysite.com/mysubfolder
My regex-urlfilter.txt uses the defaults except for the last 2 lines:

#+.
+^http://www.mysite.org/mysubfolder
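
One way to sanity-check a setup like this is to run the seed URL through the filter lines by hand. The sketch below is a rough Python approximation, not Nutch code. It assumes the usual regex-urlfilter behaviour: lines starting with `#` are comments, the first matching `+` or `-` rule decides, and a URL that matches no rule is rejected. The helper names `parse_rules` and `accepts` are made up for illustration, and only the last two lines quoted above are modelled, not the default rules that precede them. Note that the seed uses `.com` while the filter line uses `.org`.

```python
import re


def parse_rules(text):
    """Turn regex-urlfilter-style lines into (sign, compiled_pattern) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments carry no rule
        if line[0] not in "+-":
            continue  # this sketch skips anything that is not a +/- rule
        rules.append((line[0], re.compile(line[1:])))
    return rules


def accepts(rules, url):
    """First matching rule decides; a URL matching no rule is rejected."""
    for sign, pattern in rules:
        if pattern.search(url):
            return sign == "+"
    return False


# The last two lines of regex-urlfilter.txt as quoted in the message.
LAST_TWO_LINES = """
#+.
+^http://www.mysite.org/mysubfolder
"""

# The seed from urls/seed.txt as quoted in the message.
SEED = "http://www.mysite.com/mysubfolder"

rules = parse_rules(LAST_TWO_LINES)
print(accepts(rules, SEED))  # prints False: the .com seed does not match the .org rule
```

Under these assumptions the seed itself is filtered out, which would be consistent with a crawl that stops almost immediately.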


Also, when I try to access the results at http://mysite.com:8080/CLIPS using
Solr,
I only see 10 records.
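
For context, Solr returns 10 documents per query unless the `rows` parameter says otherwise, and `start` sets the offset for paging. The sketch below only builds such a query URL. It assumes the standard `/select` handler sits directly under the address above, which may not hold for this installation, and `solr_select_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode


def solr_select_url(base_url, query="*:*", start=0, rows=10):
    """Build a Solr select URL with explicit paging parameters."""
    params = urlencode({"q": query, "start": start, "rows": rows, "wt": "json"})
    return f"{base_url}/select?{params}"


# Assumption: the Solr endpoint from the message exposes /select directly.
BASE = "http://mysite.com:8080/CLIPS"

# Ask for the first 100 matches instead of the default 10.
print(solr_select_url(BASE, rows=100))
```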

What could I be missing?
How do I get all the records found?
Is there a way to look at the crawled data without Solr?

Thanks,

-- 
Né§t☼r  *Authority gone to one's head is the greatest enemy of Truth*
