RE: Crawling process - Fetching

McGibbney, Lewis John Thu, 28 Apr 2011 08:02:46 -0700

Hi Jotta,

From your thread it appears you are running individual commands (or a 
script) for both injecting and fetching. There are parameters we can pass to 
various commands during the early steps of the 'crawl' process, namely:

topN, depth, threads

I am unsure where to find precise details of these parameters (try browsing 
articles on the wiki and searching the mailing lists, as I'm sure this must 
have been asked before), but I am sure that someone can paint an accurate 
picture for you :0)
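A rough way to picture how these parameters interact is the toy model below. It is a sketch under stated assumptions, not Nutch code. Each round "generates" a fetch list from URLs known to the crawldb that have not been fetched yet, optionally capped by topN. Fetching and updating the crawldb then add the outlinks found, so the next list is usually larger. depth is the number of rounds. threads only changes how many pages are fetched in parallel, not how many are selected. The function `simulate_crawl`, the link graph and its numbers (one home page, 50 links, 2 new links per page) are hypothetical. Real Nutch also ranks candidates by score rather than by discovery order.

```python
# Hypothetical toy model of a Nutch-style generate/fetch/updatedb cycle.
# It does NOT call Nutch; the link graph below is invented for illustration.

def simulate_crawl(seeds, outlinks, depth, top_n=None):
    """Return the number of URLs selected for fetching in each round.

    seeds    -- injected URLs (the initial crawldb contents)
    outlinks -- dict mapping a URL to the URLs it links to (hypothetical graph)
    depth    -- number of generate/fetch/update rounds (like -depth)
    top_n    -- optional cap on URLs selected per round (like -topN)
    """
    known = list(seeds)          # stand-in for the crawldb, in discovery order
    seen = set(seeds)            # fast membership check for known URLs
    fetched = set()
    per_round = []
    for _ in range(depth):
        # "generate": pick known URLs that have not been fetched yet
        candidates = [u for u in known if u not in fetched]
        if top_n is not None:
            candidates = candidates[:top_n]
        if not candidates:
            break
        per_round.append(len(candidates))
        # "fetch" + "updatedb": mark as fetched and add newly discovered outlinks
        for url in candidates:
            fetched.add(url)
            for link in outlinks.get(url, []):
                if link not in seen:
                    seen.add(link)
                    known.append(link)
    return per_round


# Hypothetical site: the home page links to 50 pages, each linking to 2 new pages.
graph = {"home": [f"page{i}" for i in range(50)]}
for i in range(50):
    graph[f"page{i}"] = [f"page{i}-a", f"page{i}-b"]

print(simulate_crawl(["home"], graph, depth=3))            # [1, 50, 100]
print(simulate_crawl(["home"], graph, depth=3, top_n=30))  # [1, 30, 30]
```

In this model the first round selects only the injected page, and later rounds grow with the links discovered, which matches the 1, then about 50, then about 100 pattern described below; a topN cap keeps each round bounded.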

Lewis
________________________________________
From: jotta [[email protected]]
Sent: 28 April 2011 09:20
To: [email protected]
Subject: Crawling process - Fetching

Hi!

I have a question about fetching, one of the stages of the crawling process.
When I use the commands to inject URLs and fetch content, Nutch gets a
different number of records to fetch each time, e.g. during the first fetch it
usually gets one record (the main page), then about 50 records, then 100
records, and so on.

What does the number of records selected for fetching depend on, and can I
change it?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Crawling-process-Fetching-tp2873786p2873786.html
Sent from the Nutch - User mailing list archive at Nabble.com.


