[jira] [Commented] (LUCENE-1076) Allow MergePolicy to select non-contiguous merges

2011-05-12 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032274#comment-13032274
 ] 

Simon Willnauer commented on LUCENE-1076:
-

bq. This means docIDs may be reordered, since Tiered MP can merge out-of-order 
segments.

I think this is a very hard break, and it should depend on the Version you pass 
to IWC. Stuff like that is really a good use case for Version. I have had 
customers in the past who depended heavily on the Lucene doc ID; that is not 
recommended, but with this change their apps would suddenly stop working. So 
we should make sure that they can upgrade seamlessly!

 Allow MergePolicy to select non-contiguous merges
 -

 Key: LUCENE-1076
 URL: https://issues.apache.org/jira/browse/LUCENE-1076
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: 2.3
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-1076-3x.patch, LUCENE-1076.patch, 
 LUCENE-1076.patch, LUCENE-1076.patch


 I started work on this but with LUCENE-1044 I won't make much progress
 on it for a while, so I want to checkpoint my current state/patch.
 For backwards compatibility we must leave the default MergePolicy as
 selecting contiguous merges.  This is necessary because some
 applications rely on temporal monotonicity of doc IDs, which means
 even though merges can re-number documents, the renumbering will
 always reflect the order in which the documents were added to the
 index.
 Still, for those apps that do not rely on this, we should offer a
 MergePolicy that is free to select the best merges regardless of
 whether they are contiguous.  This requires fixing IndexWriter to
 accept such a merge, and fixing LogMergePolicy to optionally allow
 it the freedom to do so.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-1076) Allow MergePolicy to select non-contiguous merges

2011-05-12 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032282#comment-13032282
 ] 

Uwe Schindler commented on LUCENE-1076:
---

{quote}

bq. This means docIDs may be reordered, since Tiered MP can merge out-of-order 
segments.
I think this is a very hard break and it should depend on the Version you pass 
to IWC. Stuff like that is really a good usecase for Version. I had customers 
in the past that heavily depend on the lucene doc ID while it is not 
recommended but with this change their app will suddenly not work anymore. so 
we should make sure that they can upgrade seamlessly!
{quote}

I think we should also warn people who have this problem about using 
IndexUpgrader, because it has the same issue: segments are reordered (segments 
that were already upgraded before the call to the MP's optimize come first, 
followed by the newly upgraded ones). Maybe we should add this to the JavaDocs 
in 3.x.

I'll reopen.




[jira] [Commented] (LUCENE-1076) Allow MergePolicy to select non-contiguous merges

2011-05-12 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032344#comment-13032344
 ] 

Michael McCandless commented on LUCENE-1076:


bq. I think this is a very hard break and it should depend on the Version you 
pass to IWC.

+1

I'll make it TieredMP if version >= 3.2, else LogByteSizeMP.
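
A minimal sketch of a version-gated default along these lines. The `CompatVersion` enum and the policy names returned as strings are stand-ins defined locally so that the example compiles on its own; they are assumptions for illustration, not Lucene's actual `Version` or `MergePolicy` types, and the real wiring inside IWC may differ.

```java
// Hypothetical stand-in for a compatibility version; not Lucene's Version class.
enum CompatVersion {
    V3_0, V3_1, V3_2, V4_0;

    // True if this version is the same as, or newer than, the other one.
    boolean onOrAfter(CompatVersion other) {
        return compareTo(other) >= 0;
    }
}

final class DefaultMergePolicyChooser {
    // Returns the name of the merge policy picked as the default for a version.
    static String defaultFor(CompatVersion v) {
        // Apps that pass an older version keep contiguous merges,
        // so any docID ordering assumptions they make keep holding.
        return v.onOrAfter(CompatVersion.V3_2)
                ? "TieredMergePolicy"
                : "LogByteSizeMergePolicy";
    }

    public static void main(String[] args) {
        for (CompatVersion v : CompatVersion.values()) {
            System.out.println(v + " -> " + defaultFor(v));
        }
    }
}
```

The point of gating on the version is that an app only gets the reordering behavior after it explicitly opts in by passing a newer version.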




[jira] Commented: (LUCENE-1076) Allow MergePolicy to select non-contiguous merges

2011-01-29 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12988458#action_12988458
 ] 

Michael McCandless commented on LUCENE-1076:


bq. I saw this and thought it was interesting. Why is the gen needed?

So, at first I added it because the pushing of merged delete packets
got too hairy, eg when merges interleave you'd have to handle deletes
being pushed onto each other's internal merged segments.

Also, we really needed a transactional data structure here, because
previously DW could push more deletes into an existing packet (ie the
packet was not write-once), which made tracking problematic if the
merge wanted to record that the first batch of deletes had been
applied but not any subsequent pushes.

But, after making the change, I realized that today (trunk, 3.1) we
are badly inefficient!  We apply deletes to segments being merged, but
then we place the merged segment back in the same position.  This is
inefficient because later when this segment gets merged, we wastefully
re-apply the same deletes (plus, new ones, which do need to be
applied).  This is a total waste.

So, by decoupling tracking of where you are in the deletes packet
stream, from the physical location of your segment in the index, we
fix this waste.  Also, it's quite a bit simpler now -- we no longer
have to merge deletes on completing a merge.
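
A rough sketch of the gen idea, under simplifying assumptions: `DeleteGenLog`, `Packet`, and the method names are hypothetical and exist only for this illustration, delete terms are plain strings, and every packet and segment draws its gen from one shared counter. It is not the code from the patch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of a write-once delete log; not Lucene's actual classes.
final class DeleteGenLog {
    // One immutable packet of delete terms, stamped with its generation.
    static final class Packet {
        final long gen;
        final Set<String> terms;

        Packet(long gen, Set<String> terms) {
            this.gen = gen;
            // Defensive copy: a packet never changes after it is pushed.
            this.terms = Collections.unmodifiableSet(new LinkedHashSet<>(terms));
        }
    }

    private final List<Packet> packets = new ArrayList<>();
    private long nextGen = 1;

    // Pushes a packet of buffered deletes and returns the gen assigned to it.
    long pushDeletes(Set<String> terms) {
        long gen = nextGen++;
        packets.add(new Packet(gen, terms));
        return gen;
    }

    // Assigns the next gen to a newly flushed (or newly merged) segment.
    long nextSegmentGen() {
        return nextGen++;
    }

    // Coalesces the terms of every packet whose gen is >= the segment's gen.
    // Older packets are skipped, wherever the segment sits in the index.
    Set<String> deletesFor(long segmentGen) {
        Set<String> coalesced = new LinkedHashSet<>();
        for (Packet p : packets) {
            if (p.gen >= segmentGen) {
                coalesced.addAll(p.terms);
            }
        }
        return coalesced;
    }

    public static void main(String[] args) {
        DeleteGenLog log = new DeleteGenLog();
        long segA = log.nextSegmentGen();
        log.pushDeletes(Collections.singleton("id:1"));
        long segB = log.nextSegmentGen();
        log.pushDeletes(Collections.singleton("id:2"));

        System.out.println("segA: " + log.deletesFor(segA));
        System.out.println("segB: " + log.deletesFor(segB));
        // A merged segment whose inputs already had deletes applied takes a fresh gen.
        long merged = log.nextSegmentGen();
        System.out.println("merged: " + log.deletesFor(merged));
    }
}
```

In this model a merged segment that takes a fresh gen never sees the older packets again, which is the re-application waste the comment describes avoiding.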





[jira] Commented: (LUCENE-1076) Allow MergePolicy to select non-contiguous merges

2011-01-29 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12988465#action_12988465
 ] 

Michael McCandless commented on LUCENE-1076:


Another benefit of the transaction log for deletes is, because they are 
write-once (ie, after a set of buffered deletes is pushed, they are never 
changed), we can switch to a more efficient data structure than TreeMap on push.

Eg, we can pull the del Terms (sorted) and store them in an array.
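
As an illustration only, freezing a sorted map into an array could look like the sketch below. `FrozenDeleteTerms` is a hypothetical name, terms are modeled as plain strings, and this sketch keeps only the map's keys; the actual patch may store more.

```java
import java.util.Arrays;
import java.util.TreeMap;

// Hypothetical write-once snapshot of buffered delete terms.
final class FrozenDeleteTerms {
    private final String[] sortedTerms;

    // TreeMap iterates its keys in sorted order, so the array comes out sorted.
    FrozenDeleteTerms(TreeMap<String, Integer> buffered) {
        this.sortedTerms = buffered.keySet().toArray(new String[0]);
    }

    // Binary search over the sorted array; valid because the array never changes.
    boolean contains(String term) {
        return Arrays.binarySearch(sortedTerms, term) >= 0;
    }

    int size() {
        return sortedTerms.length;
    }

    public static void main(String[] args) {
        TreeMap<String, Integer> buffered = new TreeMap<>();
        buffered.put("id:7", 3);
        buffered.put("id:2", 1);
        FrozenDeleteTerms frozen = new FrozenDeleteTerms(buffered);
        System.out.println(frozen.size() + " terms, has id:2? " + frozen.contains("id:2"));
    }
}
```

A flat array has no per-entry node objects, which is where the saving over TreeMap would come from once the packet is immutable.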




[jira] Commented: (LUCENE-1076) Allow MergePolicy to select non-contiguous merges

2011-01-29 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12988499#action_12988499
 ] 

Michael McCandless commented on LUCENE-1076:


Committed to trunk; I'll let this age for a while before back porting.

On the backport, I'll leave the default contiguous merges.




[jira] Commented: (LUCENE-1076) Allow MergePolicy to select non-contiguous merges

2011-01-28 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12988356#action_12988356
 ] 

Jason Rutherglen commented on LUCENE-1076:
--

{quote}I also reworked how buffered deletes are managed, so that each
packet of buffered deletes, as well as each flushed segment, is now
assigned an incrementing gen.  This way, when it's time to apply
deletes, the algorithm is easy: only delete packets with gen >= this
segment should coalesce and apply.{quote}

I saw this and thought it was interesting.  Why is the gen needed?




[jira] Commented: (LUCENE-1076) Allow MergePolicy to select non-contiguous merges

2009-07-28 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12735968#action_12735968
 ] 

Shai Erera commented on LUCENE-1076:


Hmm ... I think I found the problem. I added commit() after cms.sync() and it 
never failed again. So I checked the output of infoStream in two cases (failure 
and success) and found this: in the success case, the pending merges occurred 
before the last addDocument calls happened (actually the last 2), therefore 
commit() committed the output of those pending merges, and sync() afterwards 
did nothing.

In the failure case, the last pending merge happened *after* commit() was 
called, whether or not as part of the sync() call, but it was never committed.

So it looks to me that I should add this test case as 
testOptimizeMaxNumSegments3() (even though it has nothing to do w/ optimize()), 
just to cover this aspect and also document CMS.sync() to mention that a 
commit() after it is required if the outcome of the merges should be reflected 
in the index (i.e., committed).

Did I get it right?




[jira] Commented: (LUCENE-1076) Allow MergePolicy to select non-contiguous merges

2009-07-28 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12735971#action_12735971
 ] 

Shai Erera commented on LUCENE-1076:


BTW, the second sync() call comes after optimize(), which is redundant as far 
as I understand, since optimize() or optimize(int) will wait for all merges to 
complete, i.e., the merges that CMS runs.

I wonder then if it won't be useful to have a commit(doWait=true), which won't 
require calling sync() or waitForMerges()?




[jira] Commented: (LUCENE-1076) Allow MergePolicy to select non-contiguous merges

2009-07-28 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736141#action_12736141
 ] 

Michael McCandless commented on LUCENE-1076:


bq. I added commit() after cms.sync() and it never failed again. 

I think you are right!  Since we try to read the dir (sis.read), if we don't 
commit then the changes won't be present since IndexWriter is opened w/ 
autoCommit false.  Another simple check would be to call 
getReader().getSequentialSubReaders() and check how many segments there are, 
instead of having to go through the Directory to check it.

bq. BTW, the second sync() call comes after optimize(), which is redundant as 
far as I understand

I agree.

bq. I wonder then if it won't be useful to have a commit(doWait=true), which 
won't require calling sync() or waitForMerges()?

I think we can leave this separate (ie you should call waitForMerges() if you 
need to), because commit normally has nothing to do w/ merging, since merging 
doesn't change any docs in the index.  Commit only ensures that changes to the 
index are pushed to stable storage.  Whereas eg optimize is all about doing 
merges, so it makes sense for it to have a doWait?




[jira] Commented: (LUCENE-1076) Allow MergePolicy to select non-contiguous merges

2009-07-28 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736206#action_12736206
 ] 

Shai Erera commented on LUCENE-1076:


OK, I agree that commit should not wait for merges. It does seem unrelated to 
segment merging.




[jira] Commented: (LUCENE-1076) Allow MergePolicy to select non-contiguous merges

2009-07-27 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12735905#action_12735905
 ] 

Shai Erera commented on LUCENE-1076:


Can someone please help me understand what's going on here? After I applied the 
patch to trunk, TestIndexWriter.testOptimizeMaxNumSegments2() fails. The 
failure happens only if CMS is used, and doesn't when SMS is used. I dug deeper 
into the test and what happens is that the test asks to 
optimize(maxNumSegments) and expects that either: (1) if the number of segments 
was < maxNumSegments, then the resulting number of segments is exactly as it was 
before, or (2) otherwise it should be exactly maxNumSegments.

First, the javadocs of optimize(maxNumSegments) say that it will result in <= 
maxNumSegments, but I understand that LogMergePolicy ensures that if you ask for 
maxNumSegments, that's the number of segments you'll get.

While trying to debug what's wrong w/ the change so far, I managed to reduce 
the test to this code:

{code}
public void test1() throws Exception {
  MockRAMDirectory dir = new MockRAMDirectory();

  final Document doc = new Document();
  doc.add(new Field("content", "aaa", Field.Store.YES, Field.Index.ANALYZED));

  IndexWriter writer = new IndexWriter(dir, new WhitespaceAnalyzer(), true,
      IndexWriter.MaxFieldLength.LIMITED);
  //writer.setMergeScheduler(new SerialMergeScheduler());
  LogDocMergePolicy ldmp = new LogDocMergePolicy();
  ldmp.setMinMergeDocs(1);
  writer.setMergePolicy(ldmp);
  writer.setMergeFactor(3);
  writer.setMaxBufferedDocs(2);

  MergeScheduler ms = writer.getMergeScheduler();
  //writer.setInfoStream(System.out);

  // Add enough documents to create several segments (uncommitted) and kick off
  // some threads.
  for (int i = 0; i < 20; i++) {
    writer.addDocument(doc);
  }
  writer.commit();

  if (ms instanceof ConcurrentMergeScheduler) {
    // Wait for all merges to complete
    ((ConcurrentMergeScheduler) writer.getMergeScheduler()).sync();
  }

  // Read the segment count from the last commit in the directory
  SegmentInfos sis = new SegmentInfos();
  sis.read(dir);

  System.out.println("numSegments after add + commit == " + sis.size());

  final int segCount = sis.size();

  int maxNumSegments = 3;
  writer.optimize(maxNumSegments);
  writer.commit();

  if (ms instanceof ConcurrentMergeScheduler) {
    // Wait for all merges to complete
    ((ConcurrentMergeScheduler) writer.getMergeScheduler()).sync();
  }

  sis = new SegmentInfos();
  sis.read(dir);
  final int optSegCount = sis.size();

  System.out.println("numSegments after optimize (" + maxNumSegments + ") + "
      + "commit == " + sis.size());

  if (segCount < maxNumSegments)
    Assert.assertEquals(segCount, optSegCount);
  else
    Assert.assertEquals(maxNumSegments, optSegCount);
}
{code}

This fails almost every time that I run it, so if you try it - make sure to run 
it a couple of times. I then switched to trunk, but it fails almost 
consistently on trunk also !?!?

Can someone please have a look and tell me what's wrong (is it the test, or did 
I hit a true bug in the code?)?




[jira] Commented: (LUCENE-1076) Allow MergePolicy to select non-contiguous merges

2009-07-22 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734169#action_12734169
 ] 

Michael McCandless commented on LUCENE-1076:


maxDoc() does reflect the number of docs in the index.  It's simply the sum of 
docCount for all segments.  Shuffling the order of the segments, or allowing 
non-contiguous segments to be merged, won't change how maxDoc() is computed.

New docIDs are allocated by incrementing an integer (starting with 0) for the 
buffered docs.  When a segment gets flushed, we reset that to 0.  Ie, docIDs 
are stored within one segment; they have no context from prior segments.
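
To make this concrete, here is a small sketch under the assumption that an index-wide docID is a segment's starting offset plus the segment-local docID. `SegmentDocIds` and its methods are hypothetical helpers written for this example, not Lucene API.

```java
// Hypothetical helper; models segments only by their doc counts.
final class SegmentDocIds {
    // maxDoc for the whole index: the sum of the per-segment doc counts.
    static int maxDoc(int[] segmentDocCounts) {
        int sum = 0;
        for (int count : segmentDocCounts) {
            sum += count;
        }
        return sum;
    }

    // Index-wide docID of a segment-local docID, given the current segment order.
    static int globalDocId(int[] segmentDocCounts, int segmentIndex, int localDocId) {
        int docBase = 0;
        // The offset of a segment is the total doc count of the segments before it.
        for (int i = 0; i < segmentIndex; i++) {
            docBase += segmentDocCounts[i];
        }
        return docBase + localDocId;
    }

    public static void main(String[] args) {
        int[] original = {5, 3, 2};
        int[] reordered = {2, 5, 3}; // the 2-doc segment moved to the front

        System.out.println("maxDoc: " + maxDoc(original) + " vs " + maxDoc(reordered));
        System.out.println("first doc of the 2-doc segment: "
                + globalDocId(original, 2, 0) + " vs " + globalDocId(reordered, 0, 0));
    }
}
```

Shuffling the segments leaves the sum unchanged, while the index-wide ID of any given document can move; that is why only apps relying on docID order are affected.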




[jira] Commented: (LUCENE-1076) Allow MergePolicy to select non-contiguous merges

2009-07-22 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734174#action_12734174
 ] 

Shai Erera commented on LUCENE-1076:


Oh. Thanks for correcting me. In that case, I take what I said back.

I think this together w/ LUCENE-1750 can really help speed up segment merges in 
certain scenarios. Will wait for you to come back to it :)




[jira] Commented: (LUCENE-1076) Allow MergePolicy to select non-contiguous merges

2009-07-22 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12734190#action_12734190
 ] 

Michael McCandless commented on LUCENE-1076:


bq. Will wait for you to come back to it

Feel free to take it, too :)

I think LUCENE-1737 is also very important for speeding up merging, especially 
because it's so unexpected that just adding different fields to your docs, 
or the same fields in different orders, can so severely impact merge 
performance.




[jira] Commented: (LUCENE-1076) Allow MergePolicy to select non-contiguous merges

2009-07-21 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12733770#action_12733770
 ] 

Shai Erera commented on LUCENE-1076:


So Mike - just to clarify. If I have 3 segments: A (0-52), B (53-124) and C 
(125-145), and you decide to merge A and C, what will be the new doc IDs of all 
segments? Will they start from 53? Or will you shift all the documents so that 
the segments will be B (0-71) and A+C (72-145)?




[jira] Commented: (LUCENE-1076) Allow MergePolicy to select non-contiguous merges

2009-07-21 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12733771#action_12733771
 ] 

Michael McCandless commented on LUCENE-1076:


Well... one option might be the newly merged segment always replaces the 
leftmost segment.  Another option could be to leave it undefined, ie IW makes 
no commitment as to where it will place the newly merged segment so you should 
not rely on it.  Presumably apps that rely on Lucene's internal doc ID to mean 
something would not use a merge policy that selects non-contiguous segments.

Unfortunately, with the current index format, there's a big cost to allowing 
non-contiguous segments to be merged: it means the doc stores will always be 
merged.  Whereas, today, if you build up a large new index, no merging is done 
for the doc stores.

If we someday allowed a single segment to reference multiple original doc 
stores (logically concatenating [possibly many] slices out of them), which 
would presumably be a perf hit when retrieving the stored doc or term vectors, 
then this cost would go away.
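
The "replace the leftmost segment" option mentioned above can be sketched 
roughly as follows. This is only an illustration, not IndexWriter code: the 
Segment class and mergeAtLeftmost method are hypothetical names, deletions are 
ignored, and the doc counts come from the A/B/C example earlier in the thread.

```java
import java.util.ArrayList;
import java.util.List;

public class LeftmostMergePlacement {
    /** Hypothetical stand-in for a segment: a name and a doc count only. */
    public static final class Segment {
        public final String name;
        public final int docCount;

        public Segment(String name, int docCount) {
            this.name = name;
            this.docCount = docCount;
        }
    }

    /**
     * Replaces the segments at the given (ascending, non-empty) indexes with
     * one merged segment placed where the leftmost merged segment used to be.
     * Deleted docs are ignored, so the merged docCount is a plain sum.
     */
    public static List<Segment> mergeAtLeftmost(List<Segment> segments, List<Integer> toMerge) {
        int leftmost = toMerge.get(0);
        StringBuilder mergedName = new StringBuilder();
        int mergedDocCount = 0;
        for (int i : toMerge) {
            if (mergedName.length() > 0) {
                mergedName.append('+');
            }
            mergedName.append(segments.get(i).name);
            mergedDocCount += segments.get(i).docCount;
        }

        List<Segment> result = new ArrayList<>();
        for (int i = 0; i < segments.size(); i++) {
            if (i == leftmost) {
                result.add(new Segment(mergedName.toString(), mergedDocCount));
            } else if (!toMerge.contains(i)) {
                result.add(segments.get(i)); // untouched segment keeps its relative order
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Segment> segments = List.of(
                new Segment("A", 53), new Segment("B", 72), new Segment("C", 21));
        // Merge A and C (indexes 0 and 2); the result takes A's old slot.
        int base = 0;
        for (Segment s : mergeAtLeftmost(segments, List.of(0, 2))) {
            System.out.println(s.name + " (" + base + "-" + (base + s.docCount - 1) + ")");
            base += s.docCount;
        }
    }
}
```

With this placement rule the example index would end up as A+C followed by B, 
so B's docs are the ones that get shifted. The "leave it undefined" option 
would simply make no promise about which of the possible orders you get.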




[jira] Commented: (LUCENE-1076) Allow MergePolicy to select non-contiguous merges

2009-07-21 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12733778#action_12733778
 ] 

Shai Erera commented on LUCENE-1076:


Well ... what I was thinking of is that even if the app does not care about 
internal doc IDs, the Lucene code may very well care. If we don't shift doc IDs 
back, it means maxDoc will continue to grow, and at some point (extreme case 
though), maxDoc will equal 1M, while there will be just 50K docs in the index.

AFAIU, maxDoc is used today to determine array length in FieldCache, and I've 
seen it used in IndexSearcher to sort the sub readers (at least in the past), 
etc. So perhaps alongside maxDoc we'll need to keep a curNumDocs member to 
track the actual number of documents?

But I have a feeling this will also get complicated.




[jira] Commented: (LUCENE-1076) Allow MergePolicy to select non-contiguous merges

2009-07-21 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12733783#action_12733783
 ] 

Shai Erera commented on LUCENE-1076:


Besides, Mike, there's something I don't understand from a previous comment 
you've made: you commented that today, if I build a large index, the doc stores 
are not merged, while if we move to merging non-contiguous segments, they 
will be. I'm afraid I'm not very familiar with this area of Lucene -- if I merge 
two consecutive segments, how come I don't merge their doc stores?




[jira] Commented: (LUCENE-1076) Allow MergePolicy to select non-contiguous merges

2009-07-21 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12733819#action_12733819
 ] 

Michael McCandless commented on LUCENE-1076:


bq. But how is a new doc ID allocated?

Doc IDs are logically assigned by summing docCount of all segments before me, 
as my base, and then adding the index of the doc within my segment.  Ie, 
the base of a given segment is not stored anywhere, so we are always free to 
shuffle the order of segments and nothing in Lucene should care (but the 
app might).
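
To make the base computation above concrete, here is a minimal sketch. It is 
not Lucene code: the Segment class and the bases/globalDocId methods are 
made-up names for illustration, and the doc counts (53, 72, 21) are taken from 
the A/B/C example earlier in the thread.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DocIdBases {
    /** Hypothetical stand-in for a segment: a name and a doc count only. */
    public static final class Segment {
        public final String name;
        public final int docCount;

        public Segment(String name, int docCount) {
            this.name = name;
            this.docCount = docCount;
        }
    }

    /** Base of each segment = sum of docCount of all segments before it. */
    public static Map<String, Integer> bases(List<Segment> segments) {
        Map<String, Integer> result = new LinkedHashMap<>();
        int base = 0;
        for (Segment s : segments) {
            result.put(s.name, base);
            base += s.docCount;
        }
        return result;
    }

    /** Global doc ID = base of the segment + index of the doc within it. */
    public static int globalDocId(List<Segment> segments, String name, int localIndex) {
        return bases(segments).get(name) + localIndex;
    }

    public static void main(String[] args) {
        Segment a = new Segment("A", 53);
        Segment b = new Segment("B", 72);
        Segment c = new Segment("C", 21);

        // Original order: A (0-52), B (53-124), C (125-145).
        System.out.println(bases(List.of(a, b, c)));
        // Same segments, different order: nothing is rewritten on disk,
        // yet every doc in A and B now has a different global doc ID.
        System.out.println(bases(List.of(b, a, c)));
    }
}
```

Because the bases are derived on the fly from segment order, reordering costs 
nothing inside the index; only an app that remembered the old global doc IDs 
would notice.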




[jira] Commented: (LUCENE-1076) Allow MergePolicy to select non-contiguous merges

2009-07-21 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12733822#action_12733822
 ] 

Michael McCandless commented on LUCENE-1076:


bq.  if I merge two consecutive segments, how come I don't merge their doc 
stores

Multiple segments are able to share a single set of doc-store (= stored
fields and term vectors) files, today.  This only happens when
multiple segments are written in a single IndexWriter session with
autoCommit=false.

EG if I open a writer, index all of wikipedia w/ autoCommit false, and
close it, you'll see a single large set of doc store files (eg _0.fdt,
_0.fdx, _0.tvf, _0.tvd, _0.tvx).

Whenever RAM is full (with postings and norms data), a new segment is
flushed, but the doc store files are kept open and shared with further
flushed segments.

A single segment then refers to the shared doc stores, but records its
offset within them.

So, when we merge contiguous segments, because the resulting docs are
also contiguous in the doc stores, we are able to store a single doc
store offset in the merged segment, referencing the original doc
store, and it works fine.

But if we merge non-contiguous segments, we must then pull out and merge
the slices from the doc stores into a new [private to the new
segment] set of doc store files.

For apps that store term vectors w/ positions and offsets, and have many
stored fields, and have heterogeneous field-name-to-number assignments
(see LUCENE-1737 to fix that), the merging of doc stores can easily
dominate the merge cost.
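
As a rough model of the rule described above (not the actual IndexWriter 
logic), the check below asks whether a set of segments covers one unbroken 
slice of a single shared doc store. The field names docStoreSegment and 
docStoreOffset mirror the SegmentInfo fields mentioned in this thread; the 
Segment class itself and canKeepSharedDocStore are hypothetical, and deletions 
are ignored.

```java
import java.util.List;

public class DocStoreSharing {
    /** Hypothetical model of a flushed segment pointing into a shared doc store. */
    public static final class Segment {
        public final String docStoreSegment; // name of the shared doc store files, e.g. "_0"
        public final int docStoreOffset;     // position of this segment's first doc in that store
        public final int docCount;

        public Segment(String docStoreSegment, int docStoreOffset, int docCount) {
            this.docStoreSegment = docStoreSegment;
            this.docStoreOffset = docStoreOffset;
            this.docCount = docCount;
        }
    }

    /**
     * Returns true if the segments, in merge order, all live in the same doc
     * store and sit back to back in it. In that case the merged segment could
     * keep pointing at the shared files with a single offset; otherwise the
     * slices would have to be copied into new doc store files.
     */
    public static boolean canKeepSharedDocStore(List<Segment> toMerge) {
        for (int i = 1; i < toMerge.size(); i++) {
            Segment prev = toMerge.get(i - 1);
            Segment cur = toMerge.get(i);
            if (!prev.docStoreSegment.equals(cur.docStoreSegment)) {
                return false; // different doc stores: must copy
            }
            if (prev.docStoreOffset + prev.docCount != cur.docStoreOffset) {
                return false; // gap between slices: must copy
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Three segments flushed in one session, all sharing doc store "_0".
        Segment a = new Segment("_0", 0, 53);
        Segment b = new Segment("_0", 53, 72);
        Segment c = new Segment("_0", 125, 21);

        System.out.println(canKeepSharedDocStore(List.of(a, b))); // contiguous
        System.out.println(canKeepSharedDocStore(List.of(a, c))); // non-contiguous
    }
}
```

In this model, merging A and B can reuse the shared files, while merging A and 
C cannot, which is the extra copying cost a non-contiguous merge policy would 
pay.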





[jira] Commented: (LUCENE-1076) Allow MergePolicy to select non-contiguous merges

2009-07-21 Thread Tim Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12733831#action_12733831
 ] 

Tim Smith commented on LUCENE-1076:
---

I suppose you could do a preliminary round of merging that would merge together 
segments that share doc store/term vector data.

Once this preliminary round of merging is done, you would no longer need 
to slice the doc stores up, just merge them together. Contiguous or 
non-contiguous wouldn't matter anymore; however, if a segmented session still 
exists higher up, this would prevent you from selecting these segments, or 
newer segments.

It might even be desirable to have a commit() optionally perform this merging 
before the commit finishes, as this will result in each commit producing one 
segment, regardless of the number of flushes that were done under the hood.




[jira] Commented: (LUCENE-1076) Allow MergePolicy to select non-contiguous merges

2009-07-21 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12733844#action_12733844
 ] 

Jason Rutherglen commented on LUCENE-1076:
--

{quote}if I merge two consecutive segments, how come I don't
merge their doc stores?{quote}

You may want to take a look at
SegmentInfo.docStoreOffset/docStoreSegment, which together point
to the doc store file data for that SI.
