Yes. You need to either remove the existing index directory or use a different directory when creating the new index.
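Here is a minimal sketch of both options in plain Java. It assumes the crawl directory is on the local filesystem, as the `file:/C:/...` path in the log suggests. The class and method names (`IndexDirPrep`, `deleteRecursively`, `freshSibling`) are hypothetical and are not part of Nutch or Hadoop. On HDFS, Hadoop's `FileSystem.delete(Path, true)` would take the place of `deleteRecursively`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

/**
 * Hypothetical helper (not a Nutch/Hadoop API): gets an output directory
 * ready for a re-index run. Hadoop's FileOutputFormat rejects an output
 * directory that already exists, so either delete it or choose a new name.
 */
class IndexDirPrep {

    /** Option 1: recursively delete an existing directory (local filesystem only). */
    static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return; // nothing to remove
        }
        // Sort in reverse order so children are deleted before their parents.
        try (Stream<Path> walk = Files.walk(dir)) {
            walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }

    /** Option 2: return dir itself if unused, else the first free sibling: dir-1, dir-2, ... */
    static Path freshSibling(Path dir) {
        Path candidate = dir;
        int n = 1;
        while (Files.exists(candidate)) {
            candidate = dir.resolveSibling(dir.getFileName() + "-" + n);
            n++;
        }
        return candidate;
    }

    public static void main(String[] args) throws IOException {
        // Demo on a throwaway temp directory rather than a real crawl.
        Path crawl = Files.createTempDirectory("crawl");
        Path indexes = Files.createDirectories(crawl.resolve("indexes"));
        System.out.println("Fresh alternative: " + freshSibling(indexes)); // .../indexes-1
        deleteRecursively(indexes);
        System.out.println("Still exists? " + Files.exists(indexes));       // false
        deleteRecursively(crawl);
    }
}
```

Deleting discards the previous index. Choosing a fresh name keeps it, but the new directory name then has to be passed to the indexer instead of `crawl/indexes`.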
On Jan 10, 2011, at 9:48 AM, "McGibbney, Lewis John" <[email protected]> wrote:

> Hello List,
>
> The only material I could find on this was a post of my own (some time ago), which addressed a slightly different problem.
>
> During the indexing stage of a recrawl, my Hadoop log reads as follows:
>
> Indexer: starting at 2011-01-10 16:40:42
> Indexer: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory file:/C:/Downloads/Apache/nutch-1.2/crawl/indexes already exists
>     at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:111)
>     at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:772)
>     at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
>     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249)
>     at org.apache.nutch.indexer.Indexer.index(Indexer.java:76)
>     at org.apache.nutch.indexer.Indexer.run(Indexer.java:97)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>     at org.apache.nutch.indexer.Indexer.main(Indexer.java:106)
>
> My quick question is: is it necessary to delete/remove existing indexes before I can index freshly fetched web data?
>
> Thank you
>
> Lewis

