Hello,

I'm using SegmentMergeTool to merge some large segments, and I see that
the final indexing and index optimization (log below) take a very long
time.  I think this index creation and optimization is triggered by the
"-i" param to SegmentMergeTool.  From what I saw in SegmentMergeTool.java,
this is an optional parameter, but I'm wondering if I can just skip this
final indexing and index optimization step altogether.  Right now I'm
not making use of this final Nutch index, as I'm just reading from
ParseData.DIR_NAME and FetcherOutput.DIR_NAME using ArrayFile.Reader.

...
050721 004958 * Creating new segment index(es)...
050721 004958 * Opening segment 20050721003259
050721 004958 * Indexing segment 20050721003259
050721 004958 found resource common-terms.utf8 at file:/simpy/nutch-nightly/conf/common-terms.utf8
050721 005147  Processed 20000 records (183.43071 rec/s)
050721 005536  Processed 60000 records (87.209045 rec/s)
050721 005923  Processed 100000 records (88.12397 rec/s)
050721 010117  Processed 120000 records (176.20369 rec/s)
050721 010311  Processed 140000 records (174.64503 rec/s)
050721 010509  Processed 160000 records (169.44844 rec/s)
050721 010712  Processed 180000 records (163.1867 rec/s)
050721 010850  Processed 200000 records (204.94743 rec/s)
050721 011110  Processed 220000 records (142.55066 rec/s)
...
...
050721 071234 * Optimizing index...
...
... this takes a long time ...
...
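To put a number on "a long time", the timestamps in the log excerpt can be diffed.  Below is a minimal sketch, assuming (from the lines above) that every log line begins with a "YYMMDD HHMMSS" timestamp; the helper names parse_ts, elapsed and records_per_sec are hypothetical, not part of Nutch.

```python
from datetime import datetime, timedelta

def parse_ts(line: str) -> datetime:
    # Assumption: the first two whitespace-separated fields are the date
    # (YYMMDD) and time (HHMMSS), e.g. "050721 004958".
    date, time = line.split()[:2]
    return datetime.strptime(date + time, "%y%m%d%H%M%S")

def elapsed(start_line: str, end_line: str) -> timedelta:
    # Wall-clock time between two log lines (one-second resolution).
    return parse_ts(end_line) - parse_ts(start_line)

def records_per_sec(start_line: str, end_line: str, records: int) -> float:
    # Average throughput over the whole interval, not the per-chunk rate
    # that the "Processed ... records" lines report.
    return records / elapsed(start_line, end_line).total_seconds()

start = "050721 004958 * Indexing segment 20050721003259"
progress = "050721 011110  Processed 220000 records (142.55066 rec/s)"
optimize = "050721 071234 * Optimizing index..."

print(elapsed(start, optimize))                            # 6:22:36
print(round(records_per_sec(start, progress, 220000), 1))  # 173.0
```

By this reckoning, about 6h22m pass between the start of indexing and the start of optimization, before any optimization time is counted.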

Thanks,
Otis

____________________________________________________________________
Simpy -- simpy.com -- tags, social bookmarks, personal search engine
