Hi,

I am trying to use Nutch to download an exact number of HTML pages (say 5) from each seed page I provide. I was wondering if this is the right approach:

Seed URLs = 10,000 in total

    bin/nutch crawl urls/<list of domains> -dir <out> -depth 1 -topN 60000

Here depth = 1 because I want pages from the first level only, i.e. if the domain is foo.bar I want to download

    foo.bar/spam.htm
    foo.bar/ham.htm
    foo.bar/eggs.htm

but NOT

    foo.bar/ham/spam.htm

And -topN is 60,000 because there are 10,000 seed URLs: 10,000 home pages plus 5 top pages per URL (10,000 + 5 x 10,000 = 60,000).

Any suggestions?

Thanks,
krish

