Hi all,

I re-ran recrawl.sh on a Linux server. This time, new indexes were generated
in the merge-output folder. The failure described in my previous message
occurs when I run the script under Cygwin.

Since the indexes are Lucene indexes, I assumed the next step was to merge the
indexes in merge-output and index. I was not sure whether merge-output contains
only incremental indexes, so I wrote a script to do the merge, and the indexes
were merged in the end.

Question 1: Is the index in merge-output only incremental, or is it complete?

Question 2: Can I delete the merge-output folder after I have merged the
indexes? If I run recrawl again, it complains that merge-output already exists.
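
Until question 1 is settled, one cautious option is to rename the folder
instead of deleting it. The following is only a sketch: the function name
set_aside and the .bak-<timestamp> suffix are made up for illustration, and
the folder path has to be passed in, since its location depends on the setup.

```shell
#!/bin/sh
# Sketch only: set_aside is a hypothetical helper, not part of recrawl.sh.
# It renames a leftover folder so the next run does not find it in place,
# while keeping its contents in case they are still needed.

# set_aside DIR
# Renames DIR to DIR.bak-<timestamp> and prints the new name.
# Does nothing (and succeeds) if DIR does not exist.
set_aside() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        return 0
    fi
    backup="$dir.bak-$(date +%Y%m%d%H%M%S)"
    # Refuse to overwrite or nest into an existing backup of the same name.
    if [ -e "$backup" ]; then
        echo "backup already exists: $backup" >&2
        return 1
    fi
    mv "$dir" "$backup" && echo "$backup"
}
```

For example, calling set_aside c3/index/merge-output before recrawl.sh would
move the folder out of the way (the c3/index location is an assumption based
on the paths in the log below). The backup can be removed once it is clear
that the merged index is complete.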

Br
Ian


ianwong wrote:
> 
> Hi, I used recrawl.sh to run a recrawl. The log output for the index
> merge is as follows:
> 
> 2008-12-03 09:41:21,118 INFO  indexer.IndexingFilters - Adding
> org.apache.nutch.indexer.basic.BasicIndexingFilter
> 2008-12-03 09:41:21,371 INFO  indexer.Indexer - Optimizing index.
> 2008-12-03 09:41:22,414 INFO  indexer.Indexer - Indexer: done
> 2008-12-03 09:41:25,543 INFO  indexer.DeleteDuplicates - Dedup: starting
> 2008-12-03 09:41:25,620 INFO  indexer.DeleteDuplicates - Dedup: adding
> indexes in: c3/newindexes
> 2008-12-03 09:41:31,462 INFO  indexer.DeleteDuplicates - Dedup: done
> 2008-12-03 09:41:34,599 INFO  indexer.IndexMerger - merging indexes to:
> c3/index
> 2008-12-03 09:41:34,618 INFO  indexer.IndexMerger - Adding
> c3/newindexes/part-00000
> 2008-12-03 09:41:34,833 INFO  indexer.IndexMerger - done merging
> 
> The result is that a merge-output folder was added inside index, but
> nothing happened to the index data itself.
> Can anybody tell me what happened? Did I miss something?
> 
> thanks!
> Ian
> 

-- 
View this message in context: 
http://www.nabble.com/question-about-recrawl-and-merging-index-tp20830909p20873815.html
Sent from the Nutch - User mailing list archive at Nabble.com.
