Hi,
Just set
<name>db.ignore.external.links</name>
<value>true</value>
and run the crawl script several times; the default number of pages to
be added each time is 50,000.
Is that right?
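
In case it helps, here is roughly how I think that property would sit in the config file. This is only a sketch: it assumes the usual Nutch 1.x layout, where local overrides go in conf/nutch-site.xml inside <property> elements, so please check it against your own installation.

```xml
<!-- conf/nutch-site.xml (assumed location for local overrides) -->
<configuration>
  <property>
    <!-- ignore outlinks that lead to a different host than the page they are on -->
    <name>db.ignore.external.links</name>
    <value>true</value>
  </property>
</configuration>
```

With this set, repeated runs of the crawl script should keep following links on the seed host only, so each run picks up more of the same site.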
Wang
-----Original Message-----
From: Vangelis karv <[email protected]>
Reply-to: [email protected]
To: [email protected] <[email protected]>
Subject: Crawling a specific site only
Date: Tue, 17 Dec 2013 12:15:00 +0200
Hi again! My goal is to crawl a specific site. I want to crawl all the links
that exist under that site. For example, if I decide to crawl
http://www.uefa.com/, I want to parse all of its internal links (photos,
videos, HTML pages, etc.) and not only the best-scoring URLs for this site
(topN). So my question is: how can we tell Nutch to crawl everything in a
site, and not only the pages that have the best score?