Hi Doğacan Güney,

Thanks for providing the solution.

Is it possible to recrawl without removing files?
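If removing files is unavoidable, I understand the advice below (keep only crawl_generate under the segment) to mean roughly the following. This is only a sketch using the segment path from my log, and it assumes the crawl data sits on the local filesystem; if it is on DFS, the equivalent would be `hadoop dfs -rmr` on the same paths.

```shell
# Sketch only: assumes the segment from the log lives on the local filesystem.
SEGMENT=google/segments/20080915170335

# Mock layout so this snippet is runnable standalone; a real segment
# already contains subdirectories like these after a fetch and parse.
mkdir -p "$SEGMENT"/crawl_generate "$SEGMENT"/crawl_fetch \
         "$SEGMENT"/content "$SEGMENT"/crawl_parse \
         "$SEGMENT"/parse_data "$SEGMENT"/parse_text

# Delete everything under the segment except crawl_generate, so the
# Fetcher no longer reports "Segment already fetched!".
for d in "$SEGMENT"/*; do
  [ "$(basename "$d")" != "crawl_generate" ] && rm -rf "$d"
done
```

After this cleanup, running the fetcher on that segment again should proceed instead of failing the checkOutputSpecs check.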

Thank you again.

Regards,
Chetan Patel


Doğacan Güney-3 wrote:
> 
> Hi,
> 
> On Mon, Sep 15, 2008 at 2:43 PM, Chetan Patel <[EMAIL PROTECTED]>
> wrote:
>>
>> Hi,
>>
>> I have tried the recrawl script which is available at
>> http://wiki.apache.org/nutch/IntranetRecrawl.
>>
>> I got the following error:
>>
>> 2008-09-15 17:04:32,238 INFO  fetcher.Fetcher - Fetcher: starting
>> 2008-09-15 17:04:32,254 INFO  fetcher.Fetcher - Fetcher: segment:
>> google/segments/20080915170335
>> 2008-09-15 17:04:32,972 FATAL fetcher.Fetcher - Fetcher:
>> java.io.IOException: Segment already fetched!
>>        at
>> org.apache.nutch.fetcher.FetcherOutputFormat.checkOutputSpecs(FetcherOutputFormat.java:46)
>>        at
>> org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:329)
>>        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:543)
>>        at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:470)
>>        at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:505)
>>        at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:189)
>>        at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:477)
>>
>> 2008-09-15 17:04:35,144 INFO  crawl.CrawlDb - CrawlDb update: starting
>>
>> Please help me solve this error.
>>
> 
> The segment you are trying to fetch has already been fetched. Try
> removing everything but crawl_generate under that segment.
> 
>> Thanks in advance
>>
>> Regards,
>> Chetan Patel
>>
>>
>>
> 
> 
> 
> 
> -- 
> Doğacan Güney
> 
> 

-- 
View this message in context: 
http://www.nabble.com/hadoop-dfs--ls-and-nutch-generate-fetch-commands-tp16758617p19491939.html
Sent from the Nutch - User mailing list archive at Nabble.com.
