Hi everyone, I'm using Nutch 1.7 to crawl the contents of a number of sites. I want it to fetch 10 pages from each seed, not counting pages reached through the seed's outlinks to other sites. For example, say I want to crawl www.example1.com, and some pages there have outlinks to www.example2.com. I provide example1.com as a seed and want 10 pages (exactly 10, unless fewer exist) from example1.com only. I have 100+ sites to crawl, so I can't write regexes matching every single URL. I also want pictures and videos to be excluded from the crawl results. Could anyone please tell me which settings I should use? I have read the documentation a couple of times without finding an answer.
Thanks, Arian

