>
> These lookups are expensive and will be done millions of times (each term,
> each DV field, each .. everything).
Yes, I think you have described the issue correctly. There is no way we can
achieve speed-ups without a DocMap, especially for repeated lookups/merge
IndexWriter relies on this i
That said... if we generate the global DocMap up front, there's no reason
not to execute the merge of the segments more efficiently, i.e. without
wrapping them in a SlowCompositeReaderWrapper.
But that's not a job for SortingMergePolicy; it's either a special
SortingAtomicReader which wraps a group
OK I think I now understand what you're asking :). It's unrelated though to
SortingMergePolicy. You propose to do the "merge" part of a merge-sort,
since we know the indexes are already sorted, right?
This is something we've considered in the past, but it is very tricky (see
below) and we went wit
>
> Therefore the DocMap is initialized only when the
> merge actually executes ... what is there more to postpone?
Agreed. However, what I am asking is: if there is an alternative to DocMap,
would that be better? Please read on.
And besides, if the segments are already sorted, you should return a
n
>
> I am afraid the DocMap still maintains doc-id mappings till merge and I am
> trying to avoid it...
>
What do you mean 'till merge'? The method OneMerge.getMergeReaders() is
called only when the merge is executed, not when the MergePolicy decided to
merge those segments. Therefore the DocMap is
I am afraid the DocMap still maintains doc-id mappings till merge and I am
trying to avoid it...
I think Lucene itself has a MergeIterator in the o.a.l.util package.
A MergePolicy can wrap a simple MergeIterator for iterating docs across
different AtomicReaders in the correct sort order for a given field.
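
To make the idea above concrete, here is a minimal sketch of a k-way merge over
segments that are each already sorted. It is plain Java and does not use any
Lucene class; SortedSegmentMerge, Cursor and merge are hypothetical names, and
each segment is modelled simply as an array of sort keys indexed by local doc
id, which is an assumption made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: merges already-sorted segments without a global DocMap.
public class SortedSegmentMerge {

    // Cursor over one segment; sortKeys[doc] is the sort value of local doc id `doc`.
    static final class Cursor {
        final int segment;
        final long[] sortKeys;
        int doc = 0;

        Cursor(int segment, long[] sortKeys) {
            this.segment = segment;
            this.sortKeys = sortKeys;
        }

        long key() {
            return sortKeys[doc];
        }
    }

    // Returns {segment, localDoc} pairs in global sort order.
    static List<int[]> merge(long[][] segments) {
        // Smallest current key first; ties go to the lower segment number.
        PriorityQueue<Cursor> queue = new PriorityQueue<>((a, b) -> {
            int c = Long.compare(a.key(), b.key());
            return c != 0 ? c : Integer.compare(a.segment, b.segment);
        });
        for (int s = 0; s < segments.length; s++) {
            if (segments[s].length > 0) {
                queue.add(new Cursor(s, segments[s]));
            }
        }
        List<int[]> order = new ArrayList<>();
        while (!queue.isEmpty()) {
            Cursor top = queue.poll();
            order.add(new int[] {top.segment, top.doc});
            // Advance the cursor and re-queue it if the segment has more docs.
            if (++top.doc < top.sortKeys.length) {
                queue.add(top);
            }
        }
        return order;
    }
}
```

Only one cursor per segment is held in memory at a time, which is the property
being asked for here; whether this can be plugged into a MergePolicy is a
separate question that the sketch does not address.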
loadSortTerm is your method, right? In the current Sorter.sort
implementation, I see this code:
    boolean sorted = true;
    for (int i = 1; i < maxDoc; ++i) {
      if (comparator.compare(i-1, i) > 0) {
        sorted = false;
        break;
      }
    }
    if (sorted) {
      return null;
    }
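
The early exit quoted above can be illustrated outside Lucene. The following is
a minimal sketch, assuming the sort values are available as a plain long array
indexed by doc id; SortedCheck and sortOrNull are hypothetical names and this is
not the actual Sorter implementation.

```java
import java.util.Arrays;

// Hypothetical sketch of "return null when already sorted".
public class SortedCheck {

    // Returns null if keys are already in ascending order; otherwise returns
    // an array whose i-th entry is the old doc id that moves to position i.
    static int[] sortOrNull(final long[] keys) {
        boolean sorted = true;
        for (int i = 1; i < keys.length; ++i) {
            if (keys[i - 1] > keys[i]) {
                sorted = false;
                break;
            }
        }
        if (sorted) {
            return null; // nothing to remap, so no mapping is allocated
        }
        Integer[] docs = new Integer[keys.length];
        for (int i = 0; i < docs.length; i++) {
            docs[i] = i;
        }
        // Arrays.sort on objects is stable, so equal keys keep doc-id order.
        Arrays.sort(docs, (a, b) -> Long.compare(keys[a], keys[b]));
        int[] newToOld = new int[docs.length];
        for (int i = 0; i < docs.length; i++) {
            newToOld[i] = docs[i];
        }
        return newToOld;
    }
}
```

In the sorted case the only cost is one linear pass of comparisons, and no
per-document mapping is built.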
Shai,
This is the code snippet I use inside my class...
    public class MySorter extends Sorter {
      @Override
      public DocMap sort(AtomicReader reader) throws IOException {
        final Map docVsId = loadSortTerm(reader);
        final Sorter.DocComparator comparator = new Sorter.DocComparator() {
          @Override
I'm not sure that I follow ... where do you see DocMap being loaded up
front? Specifically, Sorter.sort may return null if the readers are already
sorted ... I think we already optimized for the case where the readers are
sorted.
Shai
On Tue, Jun 17, 2014 at 4:04 AM, Ravikumar Govindarajan <
rav