Hi Jotta,

From your thread it appears you are running individual commands (or a script) for both injecting and fetching. There are parameters that can be passed to various commands throughout the early steps of the 'crawl' process, namely

topN, depth, threads

I am unsure where to find precise details of these parameters (try browsing the articles on the wiki and searching the mailing lists, as I'm sure this must have been asked before), but I am sure that someone can paint an accurate picture for you :0)

Lewis

________________________________________
From: jotta [[email protected]]
Sent: 28 April 2011 09:20
To: [email protected]
Subject: Crawling process - Fetching

Hi!

I have a question about fetching, one of the stages of the crawling process. When I use commands to inject URLs and fetch content, Nutch gets a different number of records to fetch each time, e.g. during the first fetch it usually gets one record (the main page), then about 50 records, then 100 records, and so on. What does the number of records selected for fetching depend on, and can I change it?

--
View this message in context: http://lucene.472066.n3.nabble.com/Crawling-process-Fetching-tp2873786p2873786.html
Sent from the Nutch - User mailing list archive at Nabble.com.
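
As a rough illustration of why the counts in the question grow from round to round: in each round Nutch can only fetch URLs it already knows about, and topN caps how many of those are selected, while depth is the number of rounds. The toy model below is only a sketch under simplifying assumptions, not Nutch code: `simulate_crawl` and `outlinks_per_page` are hypothetical names, and it assumes every fetched page contributes the same number of new, unique links. threads is left out because it affects how many pages are fetched in parallel, not how many are selected.

```python
def simulate_crawl(seeds, outlinks_per_page, top_n, depth):
    """Toy model of repeated generate / fetch / update rounds.

    Returns a list with the number of URLs fetched in each round.
    Assumption (hypothetical): every fetched page yields exactly
    `outlinks_per_page` new, previously unseen URLs.
    """
    unfetched = seeds  # URLs known to the crawl database but not yet fetched
    fetched_per_round = []
    for _ in range(depth):
        # generate: select at most top_n of the known, unfetched URLs
        batch = min(unfetched, top_n)
        if batch == 0:
            break  # nothing left to fetch
        fetched_per_round.append(batch)
        # update: fetched URLs leave the queue, newly found links join it
        unfetched = unfetched - batch + batch * outlinks_per_page
    return fetched_per_round


# One seed URL, 50 new links per page, topN of 100, four rounds
print(simulate_crawl(seeds=1, outlinks_per_page=50, top_n=100, depth=4))
# -> [1, 50, 100, 100]
```

With a single seed, the first round can only fetch that one page; the second is limited by the links found on it; from the third round on the topN cap is what holds the count at 100. Raising or lowering topN in this model changes that ceiling, which matches the pattern described in the question.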

