It is specified INSIDE the crawl script itself!

-----Original Message-----
From: A Laxmi [mailto:[email protected]]
Sent: Tuesday, October 01, 2013 5:58 PM
To: [email protected]
Subject: Nutch 2.2.1 with HBase crawl command - topN

Hi,

I have Nutch 2.2.1 with HBase as backend. I am using the all-in-one crawl command, which runs fine:

bin/crawl urls 3 http://localhost:8983/solr/ 10

crawl <seedDir> <crawlId> <solrURL> <numberOfRounds>

My question is: where do we specify the "topN" parameter for the above all-in-one crawl command?

topN - maximum number of pages that will be retrieved at each level
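
To locate the setting mentioned in the reply, one option is to search the crawl script for it. The sketch below is only illustrative: find_topn is a hypothetical helper, the bin/crawl path assumes you are in the Nutch runtime directory, and sizeFetchlist is the variable name I believe feeds -topN in the 2.x script, so please verify it against your own copy.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nutch): print, with line numbers, every
# line of the given script that mentions topN or sizeFetchlist.
# "sizeFetchlist" is an assumed variable name; check your own script.
find_topn() {
    grep -n -E 'topN|sizeFetchlist' "$1"
}

# Assumed location of the all-in-one script, relative to the Nutch runtime dir.
if [ -f bin/crawl ]; then
    find_topn bin/crawl
fi
```

If the search turns up an assignment that the generate step then passes as -topN, editing that assignment in the script is presumably how the per-round limit gets changed.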

