Hi all, I re-ran recrawl.sh on a Linux server. This time new indexes were generated in the merge-output folder. The failure described in my previous query occurs if I run the script under Cygwin.
Since the indexes are Lucene indexes, I assumed the next step is to merge the indexes in merge-output with those in index. I am not sure whether merge-output contains only the incremental indexes, so I wrote a script to do the merge, and the indexes were merged in the end.

Question 1: Is the index in merge-output just incremental, or is it complete?
Question 2: Can I delete the merge-output folder after I have merged the indexes? If I run recrawl again, it complains that merge-output already exists.

Br
Ian

ianwong wrote:
>
> Hi, I used recrawl.sh to do a recrawl. The logs regarding merging the
> index are as follows:
>
> 2008-12-03 09:41:21,118 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
> 2008-12-03 09:41:21,371 INFO indexer.Indexer - Optimizing index.
> 2008-12-03 09:41:22,414 INFO indexer.Indexer - Indexer: done
> 2008-12-03 09:41:25,543 INFO indexer.DeleteDuplicates - Dedup: starting
> 2008-12-03 09:41:25,620 INFO indexer.DeleteDuplicates - Dedup: adding indexes in: c3/newindexes
> 2008-12-03 09:41:31,462 INFO indexer.DeleteDuplicates - Dedup: done
> 2008-12-03 09:41:34,599 INFO indexer.IndexMerger - merging indexes to: c3/index
> 2008-12-03 09:41:34,618 INFO indexer.IndexMerger - Adding c3/newindexes/part-00000
> 2008-12-03 09:41:34,833 INFO indexer.IndexMerger - done merging
>
> The result is that a merge-output folder was added into index, but
> nothing happened to the index data.
> Can anybody tell me what happened? Did I miss something?
>
> thanks!
> Ian
>

--
View this message in context: http://www.nabble.com/question-about-recrawl-and-merging-index-tp20830909p20873815.html
Sent from the Nutch - User mailing list archive at Nabble.com.