Chetan Patel wrote:
Hi Doğacan Güney,
Thanks for the solution.
Is it possible to recrawl without removing files?
Yes and no. Once a segment has been fetched, there is no fetch-again
or update mode. You can copy the segments to a new directory, rename the
old directory, rename the new directory to the old name, do as Doğacan
suggested and remove everything except crawl_generate in the copied
directory, then fetch again.
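
The steps above could be scripted roughly as follows. This is only a sketch, not something taken from the recrawl script: the function name and the ".new" / ".old" suffixes are made up for illustration, and it assumes the segments live in one local directory with one subdirectory per segment.

```python
import shutil
from pathlib import Path


def reset_segments_for_refetch(segments_dir, keep="crawl_generate"):
    """Back up segments_dir, then strip each copied segment down to `keep`.

    Returns the path of the backup that holds the original, untouched data.
    """
    segments = Path(segments_dir)
    work = segments.with_name(segments.name + ".new")    # hypothetical name
    backup = segments.with_name(segments.name + ".old")  # hypothetical name

    shutil.copytree(segments, work)  # 1. copy segments to a new directory
    segments.rename(backup)          # 2. rename the old directory
    work.rename(segments)            # 3. give the copy the old name

    # 4. in the copy, remove everything except crawl_generate
    for segment in segments.iterdir():
        if not segment.is_dir():
            continue
        for entry in segment.iterdir():
            if entry.name == keep:
                continue
            if entry.is_dir():
                shutil.rmtree(entry)
            else:
                entry.unlink()
    return backup
```

After this, the fetch step can be run on the stripped segment again, and the previously fetched data is still available in the backup directory if something goes wrong.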
Dennis
Thank you again.
Regards,
Chetan Patel
Doğacan Güney-3 wrote:
Hi,
On Mon, Sep 15, 2008 at 2:43 PM, Chetan Patel <[EMAIL PROTECTED]> wrote:
Hi,
I tried the recrawl script that is available at
http://wiki.apache.org/nutch/IntranetRecrawl
and got the following error.
2008-09-15 17:04:32,238 INFO fetcher.Fetcher - Fetcher: starting
2008-09-15 17:04:32,254 INFO fetcher.Fetcher - Fetcher: segment: google/segments/20080915170335
2008-09-15 17:04:32,972 FATAL fetcher.Fetcher - Fetcher: java.io.IOException: Segment already fetched!
at org.apache.nutch.fetcher.FetcherOutputFormat.checkOutputSpecs(FetcherOutputFormat.java:46)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:329)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:543)
at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:470)
at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:505)
at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:189)
at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:477)
2008-09-15 17:04:35,144 INFO crawl.CrawlDb - CrawlDb update: starting
Please help me solve this error.
The segment you are trying to fetch has already been fetched. Try removing
everything but crawl_generate under that segment.
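
For example, something along these lines would do the cleanup for a single segment. It is a rough sketch with a made-up function name, it assumes the segment is a plain local directory, and it deletes data in place, so take a copy first if you still need the fetched content.

```python
import shutil
from pathlib import Path


def strip_fetched_data(segment_dir, keep="crawl_generate"):
    """Delete every entry in one segment except `keep`; return removed names."""
    segment = Path(segment_dir)
    # Refuse to touch a directory that does not look like a generated segment.
    if not (segment / keep).is_dir():
        raise FileNotFoundError(f"{segment} has no {keep} directory")
    removed = []
    for entry in sorted(segment.iterdir()):
        if entry.name == keep:
            continue
        if entry.is_dir():
            shutil.rmtree(entry)
        else:
            entry.unlink()
        removed.append(entry.name)
    return removed
```

With the segment from your log, the call would be strip_fetched_data("google/segments/20080915170335"), after which the fetch step should no longer find existing fetch output in that segment.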
Thanks in advance
Regards,
Chetan Patel
--
Doğacan Güney