Re: How does delete work?

2002-11-23 Thread Clemens Marschner
So what if documents are deleted in the meantime? Then the recursive merge
can't determine the X segments with the same size.



--
To unsubscribe, e-mail:   mailto:[EMAIL PROTECTED]
For additional commands, e-mail: mailto:[EMAIL PROTECTED]




Re: How does delete work?

2002-11-23 Thread Doug Cutting
Clemens Marschner wrote:

So what if documents are deleted in the meantime? Then the recursive merge
can't determine the X segments with the same size.


If you read my previous message you'll find the answer:

Doug Cutting wrote:
 It's actually a little more complicated than that, since (among other
 reasons) after docuuments are deleted a segment's size will no longer
 be exactly a power of the mergeFactor.

If you want the gory details, look at IndexWriter.java.  Or just set 
IndexReader.infoStream to System.out, then add and delete documents and 
watch what happens.

Doug


--
To unsubscribe, e-mail:   mailto:[EMAIL PROTECTED]
For additional commands, e-mail: mailto:[EMAIL PROTECTED]



How does delete work?

2002-11-22 Thread Rob Outar
Hello all,

I used the delete(Term) method, then I looked at the index files, only one
file changed _1tx.del  I found references to the file still in some of the
index files, so my question is how does Lucene handle deletes?

Thanks,

Rob


--
To unsubscribe, e-mail:   mailto:[EMAIL PROTECTED]
For additional commands, e-mail: mailto:[EMAIL PROTECTED]




Re: How does delete work?

2002-11-22 Thread Scott Ganyo
It just marks the record as deleted.  The record isn't actually removed 
until the index is optimized.

Scott

Rob Outar wrote:

Hello all,

	I used the delete(Term) method, then I looked at the index files, 
only one
file changed _1tx.del  I found references to the file still in some 
of the
index files, so my question is how does Lucene handle deletes?

Thanks,

Rob


--
To unsubscribe, e-mail:
For additional commands, e-mail: 


--
Brain: Pinky, are you pondering what I’m pondering?
Pinky: I think so, Brain, but calling it a pu-pu platter? Huh, what were 
they thinking?


--
To unsubscribe, e-mail:   mailto:[EMAIL PROTECTED]
For additional commands, e-mail: mailto:[EMAIL PROTECTED]



Re: How does delete work?

2002-11-22 Thread Otis Gospodnetic
This is via mergeFactor?

--- Doug Cutting [EMAIL PROTECTED] wrote:
 The data is actually removed the next time its segment is merged. 
 Optimizing forces it to happen, but it will also eventually happen as
 
 more documents are added to the index, without optimization.
 
 Scott Ganyo wrote:
  It just marks the record as deleted.  The record isn't actually
 removed 
  until the index is optimized.
  
  Scott
  
  Rob Outar wrote:
  
  Hello all,
 
  I used the delete(Term) method, then I looked at the index
 files, 
  only one
  file changed _1tx.del  I found references to the file still in
 some 
  of the
  index files, so my question is how does Lucene handle deletes?
 
  Thanks,
 
  Rob
 
 
  -- 
  To unsubscribe, e-mail:
  For additional commands, e-mail: 
  
  
  
 
 
 --
 To unsubscribe, e-mail:  
 mailto:[EMAIL PROTECTED]
 For additional commands, e-mail:
 mailto:[EMAIL PROTECTED]
 


__
Do you Yahoo!?
Yahoo! Mail Plus – Powerful. Affordable. Sign up now.
http://mailplus.yahoo.com

--
To unsubscribe, e-mail:   mailto:[EMAIL PROTECTED]
For additional commands, e-mail: mailto:[EMAIL PROTECTED]




Re: How does delete work?

2002-11-22 Thread Doug Cutting
Merging happens constantly as documents are added.  Each document is 
initially added in its own segment, and pushed onto the segment stack. 
Whenever there are mergeFactor segments on the top of the stack that are 
the same size, these are merged together into a new single segment that 
replaces them.  So, if mergeFactor is 10, and you've added 122 
documents, the stack will have five segments, as follows:
  document 121
  document 120
  documents 110-119
  documents 100-109
  documents 0-100
The next merge will happen after document 129 is added, when a new 
segment will replace the segments for document 120 through document 129 
with a new single segment.

It's actually a little more complicated than that, since (among other 
reasons) after docuuments are deleted a segment's size will no longer be 
exactly a power of the mergeFactor.

Doug

Otis Gospodnetic wrote:
This is via mergeFactor?

--- Doug Cutting [EMAIL PROTECTED] wrote:


The data is actually removed the next time its segment is merged. 
Optimizing forces it to happen, but it will also eventually happen as

more documents are added to the index, without optimization.

Scott Ganyo wrote:

It just marks the record as deleted.  The record isn't actually


removed 

until the index is optimized.

Scott

Rob Outar wrote:



Hello all,

   I used the delete(Term) method, then I looked at the index



files, 

only one
file changed _1tx.del  I found references to the file still in



some 

of the
index files, so my question is how does Lucene handle deletes?

Thanks,

Rob


--
To unsubscribe, e-mail:
For additional commands, e-mail: 





--
To unsubscribe, e-mail:  
mailto:[EMAIL PROTECTED]
For additional commands, e-mail:
mailto:[EMAIL PROTECTED]



__
Do you Yahoo!?
Yahoo! Mail Plus – Powerful. Affordable. Sign up now.
http://mailplus.yahoo.com

--
To unsubscribe, e-mail:   mailto:[EMAIL PROTECTED]
For additional commands, e-mail: mailto:[EMAIL PROTECTED]




--
To unsubscribe, e-mail:   mailto:[EMAIL PROTECTED]
For additional commands, e-mail: mailto:[EMAIL PROTECTED]




Re: How does delete work?

2002-11-22 Thread Otis Gospodnetic
I see, so every mergeFactor documents they are compined into a single
new segment in the index, and only when optimize() is called do those
multiple segments get merged into a single segment.
In your example below that would mean that optimize() was called after
document 100 was added, hence a single segment with documents 0-100.
Is this right?

Thanks,
Otis

--- Doug Cutting [EMAIL PROTECTED] wrote:
 Merging happens constantly as documents are added.  Each document is 
 initially added in its own segment, and pushed onto the segment
 stack. 
 Whenever there are mergeFactor segments on the top of the stack that
 are 
 the same size, these are merged together into a new single segment
 that 
 replaces them.  So, if mergeFactor is 10, and you've added 122 
 documents, the stack will have five segments, as follows:
document 121
document 120
documents 110-119
documents 100-109
documents 0-100
 The next merge will happen after document 129 is added, when a new 
 segment will replace the segments for document 120 through document
 129 
 with a new single segment.
 
 It's actually a little more complicated than that, since (among other
 
 reasons) after docuuments are deleted a segment's size will no longer
 be 
 exactly a power of the mergeFactor.
 
 Doug
 
 Otis Gospodnetic wrote:
  This is via mergeFactor?
  
  --- Doug Cutting [EMAIL PROTECTED] wrote:
  
 The data is actually removed the next time its segment is merged. 
 Optimizing forces it to happen, but it will also eventually happen
 as
 
 more documents are added to the index, without optimization.
 
 Scott Ganyo wrote:
 
 It just marks the record as deleted.  The record isn't actually
 
 removed 
 
 until the index is optimized.
 
 Scott
 
 Rob Outar wrote:
 
 
 Hello all,
 
 I used the delete(Term) method, then I looked at the index
 
 files, 
 
 only one
 file changed _1tx.del  I found references to the file still in
 
 some 
 
 of the
 index files, so my question is how does Lucene handle deletes?
 
 Thanks,
 
 Rob
 
 
 -- 
 To unsubscribe, e-mail:
 For additional commands, e-mail: 
 
 
 
 
 --
 To unsubscribe, e-mail:  
 mailto:[EMAIL PROTECTED]
 For additional commands, e-mail:
 mailto:[EMAIL PROTECTED]
 
  
  
  __
  Do you Yahoo!?
  Yahoo! Mail Plus – Powerful. Affordable. Sign up now.
  http://mailplus.yahoo.com
  
  --
  To unsubscribe, e-mail:  
 mailto:[EMAIL PROTECTED]
  For additional commands, e-mail:
 mailto:[EMAIL PROTECTED]
  
 
 
 --
 To unsubscribe, e-mail:  
 mailto:[EMAIL PROTECTED]
 For additional commands, e-mail:
 mailto:[EMAIL PROTECTED]
 


__
Do you Yahoo!?
Yahoo! Mail Plus – Powerful. Affordable. Sign up now.
http://mailplus.yahoo.com

--
To unsubscribe, e-mail:   mailto:[EMAIL PROTECTED]
For additional commands, e-mail: mailto:[EMAIL PROTECTED]




Re: How does delete work?

2002-11-22 Thread Doug Cutting
No, in my example optimize() was never called.  The merge rule operates 
recursively.  So, after 99 documents had been added the segment stack 
contained nine indexes with ten documents and nine with one document. 
When the hundredth document was added, the nine one document segments 
were popped of the stack and merged into a single segment that was 
pushed onto the stack.  So then the top of the stack had ten segments 
each containing ten documents, i.e., mergeFactor segments of the same 
size, and these ten segments were then merged into a single segment of 
100 documents.  So adding the 100th document triggered two merges.

(One error in my previous example: the 100 document segment actually 
contains documents 0-99, not 0-100.)

A corollary of this is, when mergeFactor is 10 and no deletions have 
been made, the segments correspond to the digits in the decimal 
representation of the number of documents in the index.  So, an 85 
document index has eight segments with 10 documents and five segments 
with one documents.  (This is somewhat of a simplification, as Lucene 
automatically merges single document segments before ever writing them 
to disk as an optimization.)

It is most beneficial to call IndexWriter.optimize() only when you know 
you won't be adding documents to an index for a while, but will be 
searching it a lot.  Calling optimize() periodically while indexing 
mostly just slows things down.

Doug

Otis Gospodnetic wrote:
I see, so every mergeFactor documents they are compined into a single
new segment in the index, and only when optimize() is called do those
multiple segments get merged into a single segment.
In your example below that would mean that optimize() was called after
document 100 was added, hence a single segment with documents 0-100.
Is this right?

Thanks,
Otis

--- Doug Cutting [EMAIL PROTECTED] wrote:


Merging happens constantly as documents are added.  Each document is 
initially added in its own segment, and pushed onto the segment
stack. 
Whenever there are mergeFactor segments on the top of the stack that
are 
the same size, these are merged together into a new single segment
that 
replaces them.  So, if mergeFactor is 10, and you've added 122 
documents, the stack will have five segments, as follows:
  document 121
  document 120
  documents 110-119
  documents 100-109
  documents 0-100
The next merge will happen after document 129 is added, when a new 
segment will replace the segments for document 120 through document
129 
with a new single segment.

It's actually a little more complicated than that, since (among other

reasons) after docuuments are deleted a segment's size will no longer
be 
exactly a power of the mergeFactor.

Doug

Otis Gospodnetic wrote:

This is via mergeFactor?

--- Doug Cutting [EMAIL PROTECTED] wrote:



The data is actually removed the next time its segment is merged. 
Optimizing forces it to happen, but it will also eventually happen


as


more documents are added to the index, without optimization.

Scott Ganyo wrote:



It just marks the record as deleted.  The record isn't actually


removed 


until the index is optimized.

Scott

Rob Outar wrote:




Hello all,

  I used the delete(Term) method, then I looked at the index



files, 


only one
file changed _1tx.del  I found references to the file still in



some 


of the
index files, so my question is how does Lucene handle deletes?

Thanks,

Rob


--
To unsubscribe, e-mail:
For additional commands, e-mail: 




--
To unsubscribe, e-mail:  
mailto:[EMAIL PROTECTED]
For additional commands, e-mail:
mailto:[EMAIL PROTECTED]


__
Do you Yahoo!?
Yahoo! Mail Plus – Powerful. Affordable. Sign up now.
http://mailplus.yahoo.com

--
To unsubscribe, e-mail:  

mailto:[EMAIL PROTECTED]


For additional commands, e-mail:


mailto:[EMAIL PROTECTED]


--
To unsubscribe, e-mail:  
mailto:[EMAIL PROTECTED]
For additional commands, e-mail:
mailto:[EMAIL PROTECTED]



__
Do you Yahoo!?
Yahoo! Mail Plus – Powerful. Affordable. Sign up now.
http://mailplus.yahoo.com

--
To unsubscribe, e-mail:   mailto:[EMAIL PROTECTED]
For additional commands, e-mail: mailto:[EMAIL PROTECTED]




--
To unsubscribe, e-mail:   mailto:[EMAIL PROTECTED]
For additional commands, e-mail: mailto:[EMAIL PROTECTED]