RE: fetch depth

2010-04-19 Thread Arkadi.Kosmynin
Hi Fernando, Crawling is done in iterations. At each iteration, the next portion of URLs selected for fetching is fetched. It is normal that only your seed URLs are fetched in the first iteration. See an example of a crawling script here: http://wiki.apache.org/nutch/Crawl Regards, Arkadi > -Ori
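The wiki script essentially wraps a loop: each pass generates a new segment from the crawldb, fetches it, and folds the discovered links back into the crawldb, so newly found URLs become eligible on the next pass. A minimal sketch of that loop, assuming a Nutch 1.x layout with the crawldb under crawl/ (directory names are illustrative):

    # inject the seed URLs once
    bin/nutch inject crawl/crawldb urls

    # each round of generate/fetch/updatedb adds one level of depth
    for i in 1 2 3; do
      bin/nutch generate crawl/crawldb crawl/segments
      segment=$(ls -d crawl/segments/* | tail -1)   # newest segment
      bin/nutch fetch $segment
      bin/nutch parse $segment                      # only needed if fetcher.parse is false
      bin/nutch updatedb crawl/crawldb $segment
    done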

RE: Hadoop Disk Error

2010-04-19 Thread Arkadi.Kosmynin
Are you sure that you have enough space in the temporary directory used by Hadoop? From: Joshua J Pavel [mailto:jpa...@us.ibm.com] Sent: Tuesday, 20 April 2010 6:42 AM To: nutch-user@lucene.apache.org Subject: Re: Hadoop Disk Error Some more information, if anyone can help: If I turn fetcher.p
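One way to check, assuming the default temp locations (hadoop.tmp.dir defaults to a directory under /tmp, and map output spills go to mapred.local.dir); the paths below are illustrative:

    # see how much space is left on the filesystem holding Hadoop's temp dir
    df -h /tmp

    # if /tmp is too small, point Hadoop at a bigger disk in conf/hadoop-site.xml
    # (or nutch-site.xml), e.g.:
    #   hadoop.tmp.dir   = /data/hadoop-tmp
    #   mapred.local.dir = /data/hadoop-tmp/mapred/local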

Re: Hadoop Disk Error

2010-04-19 Thread Joshua J Pavel
Some more information, if anyone can help: If I turn fetcher.parse to "false", then it successfully fetches and crawls the site, and then bombs out with a larger ID for the job: 2010-04-19 20:34:48,342 WARN mapred.LocalJobRunner - job_local_0010 org.apache.hadoop.util.DiskChecker$DiskErrorExce
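With fetcher.parse set to false the fetched segment holds only raw content, so parsing runs as a separate job before updatedb and indexing; the later job ID suggests one of those follow-on jobs is the one hitting the local-disk check. A sketch of the post-fetch steps, with a hypothetical segment path:

    segment=crawl/segments/20100419xxxxxx    # hypothetical segment name
    bin/nutch parse $segment                 # parse the raw content in its own job
    bin/nutch updatedb crawl/crawldb $segment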

fetch depth

2010-04-19 Thread Fernando Navarro
Hello, when I try to create a new segment and fetch it, only the front page is fetched. Everything else remains unfetched. However, if I execute a bin/nutch crawl, everything runs OK. I don't know how to set the depth value for a segment in an inject > generate > fetch > updatedb > invertlinks > index process.
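For context, with the one-shot command the depth is a flag, whereas in the step-by-step process there is no single depth setting: depth comes from repeating the generate/fetch/updatedb cycle, as explained in the reply above. A minimal sketch, directory names illustrative:

    # one-shot crawl: depth is a command-line option
    bin/nutch crawl urls -dir crawl -depth 3 -topN 1000

    # manual process: a single generate/fetch/updatedb pass only reaches one level;
    # repeat those three steps once per level of depth, then run invertlinks and index.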