Re: Crawling a specific site only

Wang Yi Tue, 17 Dec 2013 02:55:08 -0800

Hi,
Just set this property in nutch-site.xml:
          <property>
            <name>db.ignore.external.links</name>
            <value>true</value>
          </property>
and run the crawl script several times. By default, each round adds up to
50,000 pages.
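A toy Python model of why several rounds cover the whole site, assuming a simplified generate/fetch/update cycle: each round picks at most `top_n` of the best-scoring unfetched URLs, fetches them, and records only same-host outlinks (a rough stand-in for `db.ignore.external.links=true`). The link graph, the inlink-count score, and the `crawl`/`same_host` helpers are hypothetical; Nutch uses its own CrawlDb and scoring plugins, so this only illustrates the idea, not Nutch's implementation.

```python
from urllib.parse import urlparse

# Hypothetical link graph for illustration only. Apart from the seed
# http://www.uefa.com/ (from the thread), every URL here is invented.
SITE = {
    "http://www.uefa.com/": [
        "http://www.uefa.com/news/",
        "http://www.uefa.com/video/",
        "http://www.uefa.com/photos/",
        "http://www.example.org/",  # external host
    ],
    "http://www.uefa.com/news/": [
        "http://www.uefa.com/news/a.html",
        "http://www.uefa.com/",
    ],
    "http://www.uefa.com/photos/": [
        "http://www.uefa.com/photos/p1.jpg",
        "http://www.uefa.com/news/",
    ],
    "http://www.uefa.com/video/": ["http://www.uefa.com/video/clip1.mp4"],
}


def same_host(src: str, dst: str) -> bool:
    """Rough stand-in for db.ignore.external.links=true: keep same-host outlinks only."""
    return urlparse(src).hostname == urlparse(dst).hostname


def crawl(seeds, links, top_n, rounds, ignore_external=True):
    """Toy generate/fetch/update loop; returns the URLs fetched in each round."""
    status = {url: "unfetched" for url in seeds}  # toy stand-in for the CrawlDb
    score = {url: 0 for url in seeds}  # hypothetical score: inlinks seen so far
    history = []
    for _ in range(rounds):
        unfetched = [u for u, s in status.items() if s == "unfetched"]
        if not unfetched:
            break  # nothing left to generate: the reachable site is covered
        # "generate": take at most top_n of the best-scoring unfetched URLs;
        # the rest stay unfetched and compete again next round.
        unfetched.sort(key=lambda u: (-score[u], u))
        batch = unfetched[:top_n]
        history.append(batch)
        # "fetch" + "updatedb": mark pages fetched, record their outlinks.
        for url in batch:
            status[url] = "fetched"
            for out in links.get(url, []):
                if ignore_external and not same_host(url, out):
                    continue  # dropped, as with db.ignore.external.links=true
                score[out] = score.get(out, 0) + 1
                status.setdefault(out, "unfetched")
    return history


if __name__ == "__main__":
    for i, batch in enumerate(crawl(["http://www.uefa.com/"], SITE, top_n=2, rounds=10), 1):
        print(f"round {i}: {batch}")
```

With `top_n=2`, three rounds leave `/video/` and its clip unfetched; given enough rounds, every same-host page in the toy graph is fetched and `http://www.example.org/` never is. The per-round limit only delays lower-scoring pages, it does not exclude them.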

Is that right?
Wang

-----Original Message-----
From: Vangelis karv <[email protected]>
Reply-to: [email protected]
To: [email protected] <[email protected]>
Subject: Crawling a specific site only
Date: Tue, 17 Dec 2013 12:15:00 +0200

Hi again! My goal is to crawl a specific site: I want to crawl all the links
that exist under that site. For example, if I decide to crawl
http://www.uefa.com/, I want to parse all of its internal links (photos, videos,
HTML pages, etc.), not only the best-scoring URLs for the site (the topN). So my
question is: how can we tell Nutch to crawl everything in a site, not only the
pages with the best score?
