Hi Fernando
Crawling is done in iterations. At each iteration, the next batch of URLs
selected for fetching is fetched. It is normal that only your seed URLs are
fetched in the first iteration. See an example crawling script here:
http://wiki.apache.org/nutch/Crawl
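In case it is useful, here is a minimal sketch of one such iteration with the
individual Nutch 1.x commands (crawl/crawldb, crawl/segments and urls are just
example paths):

  # seed the crawldb once, then repeat generate/fetch/updatedb per iteration
  bin/nutch inject crawl/crawldb urls
  bin/nutch generate crawl/crawldb crawl/segments
  SEGMENT=`ls -d crawl/segments/2* | tail -1`
  bin/nutch fetch $SEGMENT
  bin/nutch updatedb crawl/crawldb $SEGMENT

Each generate/fetch/updatedb round fetches the links discovered in the previous
round, which is why only the seed URLs come back on the first pass.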
Regards,
Arkadi
Are you sure that you have enough space in the temporary directory used by
Hadoop?
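For what it's worth, hadoop.tmp.dir defaults to a directory under /tmp
(hadoop-${user.name}), so a quick sanity check is something like this (the
paths are only examples):

  # check free space where Hadoop keeps its temporary files
  df -h /tmp
  # or, if you have overridden hadoop.tmp.dir in your configuration:
  df -h /path/to/your/hadoop.tmp.dir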
From: Joshua J Pavel [mailto:jpa...@us.ibm.com]
Sent: Tuesday, 20 April 2010 6:42 AM
To: nutch-user@lucene.apache.org
Subject: Re: Hadoop Disk Error
Some more information, if anyone can help:
If I turn fetcher.parse to "false", then it successfully fetches and crawls
the site, and then bombs out with a larger ID for the job:
2010-04-19 20:34:48,342 WARN mapred.LocalJobRunner - job_local_0010
org.apache.hadoop.util.DiskChecker$DiskErrorExce
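If the local job runner is simply running out of scratch space while parsing,
one thing to try is pointing Hadoop's temporary directories at a filesystem
with more room. A hedged example of what that could look like inside the
<configuration> element of conf/nutch-site.xml (the property names are standard
Hadoop; /data/hadoop-tmp is only a placeholder):

  <property>
    <name>hadoop.tmp.dir</name>
    <value>/data/hadoop-tmp</value>
    <description>Base directory for Hadoop's temporary files.</description>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/data/hadoop-tmp/mapred/local</value>
    <description>Local scratch space used by map/reduce tasks.</description>
  </property>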
Hello,
When I try to create a new segment and fetch it, only the front page is
fetched; everything else stays unfetched.
However, if I execute bin/nutch crawl, everything runs fine. I don't know how
to set the depth value when using an inject > generate > fetch > updatedb >
invertlinks > index process.
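There is no per-segment depth setting: the depth argument to bin/nutch crawl
just controls how many generate/fetch/updatedb rounds are run, so with the
individual commands you repeat that cycle yourself. A rough sketch, assuming
Nutch 1.x and a crawl/ directory (DEPTH and all paths are illustrative):

  DEPTH=3
  bin/nutch inject crawl/crawldb urls
  for i in `seq 1 $DEPTH`; do
    bin/nutch generate crawl/crawldb crawl/segments
    SEGMENT=`ls -d crawl/segments/2* | tail -1`
    bin/nutch fetch $SEGMENT
    # if fetcher.parse is false, parse the segment before updating the db:
    # bin/nutch parse $SEGMENT
    bin/nutch updatedb crawl/crawldb $SEGMENT
  done
  bin/nutch invertlinks crawl/linkdb -dir crawl/segments
  bin/nutch index crawl/indexes crawl/crawldb crawl/linkdb crawl/segments/*

Each pass through the loop fetches one level deeper, which is what the depth
parameter of bin/nutch crawl does internally.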