Hi Doğacan Güney, thanks for the solution.
Is it possible to recrawl without removing files? Thank you again.

Regards,
Chetan Patel


Doğacan Güney-3 wrote:
>
> Hi,
>
> On Mon, Sep 15, 2008 at 2:43 PM, Chetan Patel <[EMAIL PROTECTED]> wrote:
>>
>> Hi,
>>
>> I have tried the recrawl script available at
>> http://wiki.apache.org/nutch/IntranetRecrawl.
>>
>> I got the following error:
>>
>> 2008-09-15 17:04:32,238 INFO fetcher.Fetcher - Fetcher: starting
>> 2008-09-15 17:04:32,254 INFO fetcher.Fetcher - Fetcher: segment: google/segments/20080915170335
>> 2008-09-15 17:04:32,972 FATAL fetcher.Fetcher - Fetcher: java.io.IOException: Segment already fetched!
>>         at org.apache.nutch.fetcher.FetcherOutputFormat.checkOutputSpecs(FetcherOutputFormat.java:46)
>>         at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:329)
>>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:543)
>>         at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:470)
>>         at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:505)
>>         at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:189)
>>         at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:477)
>>
>> 2008-09-15 17:04:35,144 INFO crawl.CrawlDb - CrawlDb update: starting
>>
>> Please help me solve this error.
>>
>
> The segment you are trying to crawl has already been fetched. Try removing
> everything but crawl_generate under that segment.
>
>> Thanks in advance.
>>
>> Regards,
>> Chetan Patel
>
> --
> Doğacan Güney

--
View this message in context: http://www.nabble.com/hadoop-dfs--ls-and-nutch-generate-fetch-commands-tp16758617p19491939.html
Sent from the Nutch - User mailing list archive at Nabble.com.
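For anyone hitting the same "Segment already fetched!" error, Doğacan's suggestion (keep only crawl_generate under the segment) can be sketched as a small shell snippet. This is just an illustration: the segment path here is a stand-in demo directory, and the subdirectory names (crawl_fetch, content, crawl_parse) are the ones a fetched Nutch segment typically contains; point SEGMENT at your real segment (e.g. google/segments/20080915170335) instead.

```shell
#!/bin/sh
# Sketch: reset a fetched segment so the Fetcher will accept it again.
# SEGMENT below defaults to a throwaway demo path -- replace it with
# your actual segment directory before using this for real.
SEGMENT="${SEGMENT:-/tmp/demo_segment}"

# (demo setup only: fake the layout a fetched segment would have)
mkdir -p "$SEGMENT/crawl_generate" "$SEGMENT/crawl_fetch" \
         "$SEGMENT/content" "$SEGMENT/crawl_parse"

# Keep crawl_generate (the fetch list); remove everything else so
# FetcherOutputFormat.checkOutputSpecs no longer rejects the segment.
for d in "$SEGMENT"/*; do
  [ "$(basename "$d")" = crawl_generate ] || rm -rf "$d"
done

ls "$SEGMENT"
```

Note that if the segment lives in HDFS rather than the local filesystem, the equivalent would be done with the hadoop dfs shell commands instead of plain rm.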
