[ https://issues.apache.org/jira/browse/LUCENE-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless reopened LUCENE-1737:
----------------------------------------


I realized we should fix a few more cases here so that bulk-copy is used more
often. First, when opening a pre-4.0 index, we should sweep all segments and
union their FieldInfos, so that newly written segments are as congruent as
possible with all past segments. Second, when merging, we should start from the
current FieldInfos.
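A minimal sketch of the "sweep all segments and union the FieldInfos" step,
assuming each segment's field mapping can be modeled as a plain name->number Map
in field-number order. FieldNumberUnion and union() are hypothetical names, not
Lucene's FieldInfos API. The first number seen for a name wins, and a segment's
own number is reused whenever it is still free, so later segments stay
congruent with earlier ones wherever that is possible.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper (not Lucene's API): each segment's field mapping is
// modeled as a Map from field name to field number, iterated in number order.
final class FieldNumberUnion {
    private FieldNumberUnion() {}

    /** Sweeps segments oldest-first and returns one global name->number map. */
    static Map<String, Integer> union(List<Map<String, Integer>> segments) {
        Map<String, Integer> global = new LinkedHashMap<>();
        Set<Integer> used = new HashSet<>();
        int nextFree = 0;
        for (Map<String, Integer> segment : segments) {
            for (Map.Entry<String, Integer> e : segment.entrySet()) {
                String name = e.getKey();
                if (global.containsKey(name)) {
                    continue; // first assignment for a name wins
                }
                int wanted = e.getValue();
                int number;
                if (!used.contains(wanted)) {
                    number = wanted; // keep the segment's own number when it is free
                } else {
                    while (used.contains(nextFree)) {
                        nextFree++; // otherwise take the lowest unused number
                    }
                    number = nextFree;
                }
                global.put(name, number);
                used.add(number);
            }
        }
        return global;
    }
}
```

For example, a segment numbered {title=0, body=1} followed by one numbered
{body=0, date=1} unions to {title=0, body=1, date=2}: the second segment
conflicts on "body", but "date" still gets a stable number for future segments.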

Even with this, addIndexes(Directory[]) simply copies in new segments, so if the
field name->number assignment in those incoming indices doesn't match the
current index, those segments can't be bulk-copied when they are later merged.
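To make the addIndexes case concrete, here is a hedged sketch of the kind of
congruence test a merge has to pass before it can bulk-copy a segment: every
field in the incoming segment must carry the same number as in the current
index. FieldCongruence and isCongruent() are hypothetical names; Lucene's
actual check lives in its merge code and is not reproduced here.

```java
import java.util.Map;

// Hypothetical check (not Lucene's API): a segment can be bulk-copied only if
// each of its field names maps to the same number as in the current index.
final class FieldCongruence {
    private FieldCongruence() {}

    static boolean isCongruent(Map<String, Integer> segment, Map<String, Integer> current) {
        for (Map.Entry<String, Integer> e : segment.entrySet()) {
            // A missing or different number means stored fields / term vectors
            // must be decoded and rewritten document by document.
            if (!e.getValue().equals(current.get(e.getKey()))) {
                return false;
            }
        }
        return true;
    }
}
```

An incoming index that numbered its fields {body=0, title=1} fails this check
against a current index using {title=0, body=1}, even though both contain
exactly the same fields.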

> Always use bulk-copy when merging stored fields and term vectors
> ----------------------------------------------------------------
>
>                 Key: LUCENE-1737
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1737
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-1737.patch
>
>
> Lucene has nice optimizations in place during merging of stored fields
> (LUCENE-1043) and term vectors (LUCENE-1120) whereby the bytes are
> bulk-copied to the new segment.  This is much faster than decoding &
> rewriting one document at a time.
> However the optimization is rather brittle: it relies on the mapping
> of field name to number being the same ("congruent") for each segment
> being merged.
> Unfortunately, the field mapping will be congruent only if the app
> adds the same fields in precisely the same order to each document.
> I think we should fix IndexWriter to assign the same field number for
> a given field that has been assigned in the past.  I.e., when writing a
> new segment, we pre-seed the field numbers based on past segments.
> All other aspects of FieldInfo would remain fully dynamic.
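A minimal sketch of the proposed pre-seeding, assuming a single writer-wide
registry of field numbers. GlobalFieldNumbers and SegmentFieldInfos are
hypothetical classes for illustration only, not IndexWriter or FieldInfos as
implemented in Lucene. Each new segment asks the registry for a field's number
instead of numbering fields in first-seen order, so the number no longer depends
on the order in which the app adds fields to documents.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical writer-wide registry: a name keeps the number it was first given.
final class GlobalFieldNumbers {
    private final Map<String, Integer> numbers = new HashMap<>();

    /** Returns the number previously assigned to name, or assigns the next one. */
    synchronized int numberFor(String name) {
        Integer n = numbers.get(name);
        if (n == null) {
            n = numbers.size(); // next unused number, since numbers are dense
            numbers.put(name, n);
        }
        return n;
    }
}

// Hypothetical per-segment field table, pre-seeded from the global registry.
final class SegmentFieldInfos {
    private final Map<String, Integer> byName = new LinkedHashMap<>();
    private final GlobalFieldNumbers global;

    SegmentFieldInfos(GlobalFieldNumbers global) {
        this.global = global;
    }

    /** Registers a field in this segment, reusing its global number. */
    int add(String name) {
        return byName.computeIfAbsent(name, global::numberFor);
    }

    Map<String, Integer> mapping() {
        return Collections.unmodifiableMap(byName);
    }
}
```

With this scheme, one segment that sees "title" then "body" and another that
sees "body", "title", then "date" both map title=0 and body=1, so they stay
congruent; only the numbering is shared, and everything else about a field can
still vary per segment.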

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

