Hello,

I'm using SegmentMergeTool to merge some large segments, and the final index creation and optimization step (log below) takes a very long time. I believe this step is triggered by the "-i" parameter to SegmentMergeTool. From what I saw in SegmentMergeTool.java, this parameter is optional, so I'm wondering whether I can skip the final indexing and index optimization step altogether. Right now I don't use the resulting Nutch index; I only read from ParseData.DIR_NAME and FetcherOutput.DIR_NAME using ArrayFile.Reader.
...
050721 004958 * Creating new segment index(es)...
050721 004958 * Opening segment 20050721003259
050721 004958 * Indexing segment 20050721003259
050721 004958 found resource common-terms.utf8 at file:/simpy/nutch-nightly/conf/common-terms.utf8
050721 005147 Processed 20000 records (183.43071 rec/s)
050721 005536 Processed 60000 records (87.209045 rec/s)
050721 005923 Processed 100000 records (88.12397 rec/s)
050721 010117 Processed 120000 records (176.20369 rec/s)
050721 010311 Processed 140000 records (174.64503 rec/s)
050721 010509 Processed 160000 records (169.44844 rec/s)
050721 010712 Processed 180000 records (163.1867 rec/s)
050721 010850 Processed 200000 records (204.94743 rec/s)
050721 011110 Processed 220000 records (142.55066 rec/s)
...
...
050721 071234 * Optimizing index...
...
... this takes a long time ...
...

Thanks,
Otis

____________________________________________________________________
Simpy -- simpy.com -- tags, social bookmarks, personal search engine
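
P.S. To put a rough number on "a long time", here is a minimal sketch that computes the time between two timestamps from the log above. It assumes the two leading fields on each log line are a date and time in yyMMdd HHmmss form; the helper name `elapsed` is made up for illustration and is not part of Nutch.

```python
from datetime import datetime

# Assumption: log lines start with a "yyMMdd HHmmss" timestamp.
LOG_TIME_FORMAT = "%y%m%d %H%M%S"


def elapsed(start_stamp, end_stamp):
    """Return the time between two log timestamps as a timedelta."""
    start = datetime.strptime(start_stamp, LOG_TIME_FORMAT)
    end = datetime.strptime(end_stamp, LOG_TIME_FORMAT)
    return end - start


# Timestamps copied from the log excerpt above.
indexing_started = "050721 004958"    # "* Indexing segment 20050721003259"
optimizing_started = "050721 071234"  # "* Optimizing index..."

print(elapsed(indexing_started, optimizing_started))  # prints 6:22:36
```

Under that assumption, indexing alone ran for a bit over six hours before the optimization even started. The excerpt does not show when the optimization finished, so its own duration cannot be computed from it.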
