Hi, I have a new problem.
When I run the recrawl script, it works fine the first time. When I try again, it fails. See the attached log file for the error:
http://www.nabble.com/file/p19493344/hadoop.log

Please give me a solution.

Thanks in advance.

Regards,
Chetan Patel


Chetan Patel wrote:
>
> Hi Doğacan Güney,
>
> Thanks for giving the solution.
>
> Is it possible to recrawl without removing files?
>
> Thank you again.
>
> Regards,
> Chetan Patel
>
>
> Doğacan Güney-3 wrote:
>>
>> Hi,
>>
>> On Mon, Sep 15, 2008 at 2:43 PM, Chetan Patel <[EMAIL PROTECTED]> wrote:
>>>
>>> Hi,
>>>
>>> I have tried the recrawl script which is available at
>>> http://wiki.apache.org/nutch/IntranetRecrawl.
>>>
>>> I got the following error:
>>>
>>> 2008-09-15 17:04:32,238 INFO fetcher.Fetcher - Fetcher: starting
>>> 2008-09-15 17:04:32,254 INFO fetcher.Fetcher - Fetcher: segment: google/segments/20080915170335
>>> 2008-09-15 17:04:32,972 FATAL fetcher.Fetcher - Fetcher: java.io.IOException: Segment already fetched!
>>>     at org.apache.nutch.fetcher.FetcherOutputFormat.checkOutputSpecs(FetcherOutputFormat.java:46)
>>>     at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:329)
>>>     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:543)
>>>     at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:470)
>>>     at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:505)
>>>     at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:189)
>>>     at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:477)
>>>
>>> 2008-09-15 17:04:35,144 INFO crawl.CrawlDb - CrawlDb update: starting
>>>
>>> Please help me to solve this error.
>>>
>>
>> The segment you are trying to crawl is already fetched. Try removing
>> everything but crawl_generate under that segment.
>>
>>> Thanks in advance
>>>
>>> Regards,
>>> Chetan Patel
>>>
>>
>> --
>> Doğacan Güney
>>
>

--
View this message in context: http://www.nabble.com/hadoop-dfs--ls-and-nutch-generate-fetch-commands-tp16758617p19493344.html
Sent from the Nutch - User mailing list archive at Nabble.com.
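
For reference, the suggestion quoted above (remove everything but crawl_generate under the already-fetched segment) could be sketched roughly as follows. This is only a minimal sketch, assuming the segment lives on the local filesystem rather than in HDFS; the helper name reset_segment is hypothetical and is not part of Nutch or of the recrawl script.

```python
import shutil
from pathlib import Path


def reset_segment(segment_dir, keep="crawl_generate"):
    """Remove every entry under a segment directory except `keep`.

    Hypothetical helper, not part of Nutch. Assumes a local filesystem path.
    Returns the sorted names of the entries that were removed.
    """
    segment = Path(segment_dir)
    if not (segment / keep).is_dir():
        # Refuse to touch a directory that does not look like a generated segment.
        raise FileNotFoundError(f"{segment / keep} not found; not cleaning {segment}")

    removed = []
    for entry in segment.iterdir():
        if entry.name == keep:
            continue  # leave the generated fetch list in place
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # remove a whole subdirectory
        else:
            entry.unlink()  # remove a stray file or symlink
        removed.append(entry.name)
    return sorted(removed)
```

With the segment from the log above, this would be called as reset_segment("google/segments/20080915170335") before running the fetch step again. It deletes data, so it is worth trying on a copy of the segment first.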
