The depth argument is only used by the crawl command; it is basically the number of crawl/fetch/update/index run cycles.
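To make the "number of run cycles" idea concrete, here is a minimal toy sketch in Python. It is not Nutch code: LINKS, crawl() and its parameters are hypothetical names made up for illustration, and max_outlinks only mimics the behaviour that the description of db.max.outlinks.per.page (quoted below) gives for that property.

```python
# Toy model of "depth = number of generate/fetch/update rounds".
# NOT Nutch code: LINKS, crawl() and max_outlinks are hypothetical names.

# A tiny made-up link graph: page -> outlinks found on that page.
LINKS = {
    "seed": ["a", "b"],
    "a": ["c"],
    "b": [],
    "c": ["d"],
    "d": [],
}


def crawl(seeds, depth, max_outlinks=-1):
    """Run `depth` rounds and return the fetched URLs in fetch order.

    max_outlinks follows the quoted property description: a negative
    value keeps all outlinks, a value >= 0 keeps at most that many.
    """
    known = list(seeds)  # stands in for the crawl database after inject
    fetched = []
    for _ in range(depth):
        # generate: pick known URLs that have not been fetched yet
        batch = [url for url in known if url not in fetched]
        if not batch:
            break  # nothing left to fetch, further rounds would be empty
        for url in batch:
            # fetch: record the page as fetched
            fetched.append(url)
            outlinks = LINKS.get(url, [])
            if max_outlinks >= 0:
                outlinks = outlinks[:max_outlinks]
            # update: add newly discovered URLs for the next round
            for link in outlinks:
                if link not in known:
                    known.append(link)
    return fetched


if __name__ == "__main__":
    for d in range(4):
        print(d, crawl(["seed"], d))
    print("max_outlinks=0:", crawl(["seed"], 3, max_outlinks=0))
```

In this toy model a depth of 0 runs no round at all, so nothing is fetched, and with max_outlinks set to 0 only the seed list is ever fetched, however large depth is.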

2010/1/8, Mischa Tuffield <mischa.tuffi...@garlik.com>:
> Hi Kumar,
>
> Am happy that that was of use to you. Sadly I have no feel for what the
> "depth" argument does, I don't tend to ever use it, I tend to use nutch's
> more specific commands: inject, generate, fetch, updatedb, merge, etc ...
>
> Perhaps someone else could shed light on the crawl command.
>
> Regards, and happy new years!
>
> Mischa
> On 8 Jan 2010, at 11:49, Kumar Krishnasami wrote:
>
>> Thanks, Mischa. That worked!!
>>
>> So, it looks like once this config property is set, crawl ignores the
>> 'depth' argument. Even if I set 'depth' to 2, 3 etc., it will never crawl
>> any of the outlinks. Is that correct?
>>
>> Regards,
>> Kumar.
>>
>> Mischa Tuffield wrote:
>>> Hello Kumar,
>>> There is a config property you can set in conf/nutch-site.xml, as follows:
>>> <!--
>>> <property>
>>>   <name>db.max.outlinks.per.page</name>
>>>   <value>0</value>
>>>   <description>The maximum number of outlinks that we'll process for a page.
>>>   If this value is nonnegative (>=0), at most db.max.outlinks.per.page outlinks
>>>   will be processed for a page; otherwise, all outlinks will be processed.
>>>   </description>
>>> </property>
>>> -->
>>> This will force nutch to only fetch items of depth "0", i.e. it wont
>>> attempt to follow any of the outlinks from pages you tell it to go and
>>> fetch.
>>>
>>> Regards,
>>> Mischa
>>> On 8 Jan 2010, at 10:59, Kumar Krishnasami wrote:
>>>
>>>> Hi,
>>>>
>>>> I am a newbie to nutch. Just started looking at. I have a requirement to
>>>> crawl and index only urls that are specified under the urls folder. I do
>>>> not want nutch to crawl to any depth beyond the ones that are listed in
>>>> the urls folder.
>>>>
>>>> Can I accomplish this by setting the depth argument for 'crawl' to "0"?
>>>>
>>>> If I set the depth to 0, I get a message that says "No URLs to fetch -
>>>> check your seed list and URL filters.".
>>>>
>>>> Any help will be greatly appreciated.
>>>>
>>>> Thanks,
>>>> Kumar.
>>>
>>> ___________________________________
>>> Mischa Tuffield
>>> Email: mischa.tuffi...@garlik.com
>>> Homepage - http://mmt.me.uk/
>>> Garlik Limited, 2 Sheen Road, Richmond, TW9 1AE, UK
>>> +44(0)20 8973 2465 http://www.garlik.com/
>>> Registered in England and Wales 535 7233 VAT # 849 0517 11
>>> Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD
>>>
>>
>

--
-MilleBii-