Re: Crawl specific urls and depth argument

2010-01-09 Thread MilleBii
I agree it is misleading at first. 2010/1/9 Kumar Krishnasami kumara...@vembu.com Thanks, MilleBii. That explains it. All the docs I came across mentioned something like -depth /depth/ indicates the link depth from the root page that should be crawled (from

Crawl specific urls and depth argument

2010-01-08 Thread Kumar Krishnasami
Hi, I am a newbie to Nutch and just started looking at it. I have a requirement to crawl and index only the URLs that are specified in the urls folder. I do not want Nutch to crawl to any depth beyond the URLs listed in the urls folder. Can I accomplish this by setting the depth argument

Re: Crawl specific urls and depth argument

2010-01-08 Thread Mischa Tuffield
Hello Kumar, There is a config property you can set in conf/nutch-site.xml, as follows:

  <property>
    <name>db.max.outlinks.per.page</name>
    <value>0</value>
    <description>The maximum number of outlinks that we'll process for a page. If this value is nonnegative (>=0), at most
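If you would rather generate the override than type it by hand, the minimal Python sketch below builds the same property element with the standard library's xml.etree.ElementTree (ET.indent needs Python 3.9 or later). It assumes the printed element is then pasted inside the <configuration> root element of conf/nutch-site.xml; the helper name nutch_site_property is hypothetical, and the description is shortened to the first sentence quoted above.

```python
import xml.etree.ElementTree as ET


def nutch_site_property(name: str, value: str, description: str) -> str:
    """Build one <property> element (hypothetical helper) for conf/nutch-site.xml."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    ET.SubElement(prop, "description").text = description
    ET.indent(prop)  # pretty-print with two-space indentation (Python 3.9+)
    return ET.tostring(prop, encoding="unicode")


# A value of 0 means at most 0 outlinks are processed per page,
# so no new URLs are added beyond the injected seed list.
snippet = nutch_site_property(
    "db.max.outlinks.per.page",
    "0",
    "The maximum number of outlinks that we'll process for a page.",
)
print(snippet)
```

Generating the element this way only avoids typos in the XML; the setting itself is the one-line value change shown in the message.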

Re: Crawl specific urls and depth argument

2010-01-08 Thread Kumar Krishnasami
Thanks, Mischa. That worked!! So, it looks like once this config property is set, crawl ignores the 'depth' argument. Even if I set 'depth' to 2, 3 etc., it will never crawl any of the outlinks. Is that correct? Regards, Kumar. Mischa Tuffield wrote: Hello Kumar, There is a config
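A toy model may help with this question. The sketch below is not Nutch code: it assumes each depth cycle fetches every not-yet-fetched URL in a hypothetical in-memory crawl database, then adds at most max_outlinks outlinks per fetched page, mirroring the description of db.max.outlinks.per.page. Under those assumptions the depth cycles still run, but with the cap at 0 the first cycle discovers nothing new, so extra depth has no visible effect.

```python
from typing import Dict, List, Optional, Set

# Hypothetical link graph: page -> outlinks found on that page.
LINKS: Dict[str, List[str]] = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d", "e"],
    "d": [],
    "e": ["f"],
    "f": [],
}


def simulate_crawl(seeds: List[str], depth: int, max_outlinks: Optional[int]) -> Set[str]:
    """Return the pages fetched after `depth` generate/fetch/update cycles.

    max_outlinks=None processes all outlinks; a nonnegative value caps how
    many outlinks per page are added to the (toy) crawl database.
    """
    crawldb: Set[str] = set(seeds)  # inject the seed URLs
    fetched: Set[str] = set()
    for _ in range(depth):
        to_fetch = crawldb - fetched  # generate: everything not yet fetched
        if not to_fetch:
            break  # nothing new to fetch; later cycles would do nothing
        for page in sorted(to_fetch):  # fetch each page
            fetched.add(page)
            outlinks = LINKS.get(page, [])
            if max_outlinks is not None:
                outlinks = outlinks[:max_outlinks]
            crawldb.update(outlinks)  # update the crawl database with outlinks
    return fetched


print(sorted(simulate_crawl(["a"], depth=3, max_outlinks=0)))     # ['a']
print(sorted(simulate_crawl(["a"], depth=3, max_outlinks=None)))  # ['a', 'b', 'c', 'd', 'e']
```

In this model the depth argument is not ignored; there is simply nothing left to fetch once outlinks are discarded.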

Re: Crawl specific urls and depth argument

2010-01-08 Thread Mischa Tuffield
Hi Kumar, I'm happy that was of use to you. Sadly, I have no feel for what the depth argument does; I don't tend to use it. I tend to use Nutch's more specific commands: inject, generate, fetch, updatedb, merge, etc. Perhaps someone else could shed light on the crawl command.

Re: Crawl specific urls and depth argument

2010-01-08 Thread MilleBii
The depth argument is only used by the crawl command and is basically the number of crawl/fetch/update/index cycles to run. 2010/1/8, Mischa Tuffield mischa.tuffi...@garlik.com: Hi Kumar, Am happy that that was of use to you. Sadly I have no feel for what the depth argument does, I don't tend to ever
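To make the "depth = number of cycles" reading concrete, here is a minimal Python sketch that expands a depth value into step names, using the individual commands Mischa mentions. It is a simplified model, not the actual implementation of the crawl command: the helper crawl_steps is hypothetical, and placing indexing once after the loop is an assumption (the message above lists indexing alongside the per-cycle steps, and the exact order may vary between Nutch versions).

```python
from typing import List


def crawl_steps(depth: int) -> List[str]:
    """Expand a crawl run of the given depth into step names (hypothetical model).

    Assumption: inject the seed URLs once, repeat generate/fetch/updatedb
    `depth` times, then index once at the end.
    """
    steps = ["inject"]
    for _ in range(depth):
        steps += ["generate", "fetch", "updatedb"]
    steps.append("index")
    return steps


print(crawl_steps(2))
# ['inject', 'generate', 'fetch', 'updatedb', 'generate', 'fetch', 'updatedb', 'index']
```

Read this way, depth counts fetch rounds rather than enforcing a link-distance limit directly; each extra round can only reach pages whose links were recorded in the previous round.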

Re: Crawl specific urls and depth argument

2010-01-08 Thread Kumar Krishnasami
Thanks, MilleBii. That explains it. All the docs I came across mentioned something like -depth /depth/ indicates the link depth from the root page that should be crawled (from http://lucene.apache.org/nutch/tutorial8.html). MilleBii wrote: Depth argument is only used for the crawl command