Hi - You can use the DomainURLFilter to restrict URLs to a specific site.
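
To make the idea concrete, here is a minimal Python sketch of what a domain-based URL filter does: it keeps a URL only if its host is an allowed domain or a subdomain of one. This is an illustration, not Nutch's own code. The names `ALLOWED_DOMAINS`, `is_allowed` and `filter_urls` are hypothetical, and the allow-list simply reuses the uefa.com example from the question. In Nutch itself this would be set up through configuration rather than code, as far as I know by enabling the domain URL filter plugin and listing the domain in its filter file.

```python
from urllib.parse import urlparse

# Hypothetical allow-list for illustration; uefa.com is the site from the question.
ALLOWED_DOMAINS = {"uefa.com"}


def is_allowed(url, allowed=ALLOWED_DOMAINS):
    """Return True if the URL's host is an allowed domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    # Match the domain exactly, or as a suffix preceded by a dot, so that
    # www.uefa.com passes but notuefa.com does not.
    return any(host == d or host.endswith("." + d) for d in allowed)


def filter_urls(urls, allowed=ALLOWED_DOMAINS):
    """Keep only the URLs that belong to an allowed domain, preserving order."""
    return [u for u in urls if is_allowed(u, allowed)]
```

With a filter like this, links found on the site that point elsewhere are dropped before they are fetched, so the crawl stays within the one site.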

 
 
-----Original message-----
> From:Vangelis karv <[email protected]>
> Sent: Tuesday 17th December 2013 11:15
> To: [email protected]
> Subject: Crawling a specific site only
> 
> Hi again! My goal is to crawl a specific site. I want to crawl all the links 
> that exist under that site. For example, if I decide to crawl 
> http://www.uefa.com/, I want to parse all its inlinks (photos, videos, HTML 
> pages, etc.) and not only the best-scoring URLs for this site (topN). So, my 
> question here is: how can we tell Nutch to crawl everything in a site and not 
> only the URLs that have the best score?
>                                         
