Re: How does delete work?
So what if documents are deleted in the meantime? Then the recursive merge can't determine the X segments with the same size. -- To unsubscribe, e-mail: mailto:[EMAIL PROTECTED] For additional commands, e-mail: mailto:[EMAIL PROTECTED]
Re: How does delete work?
Clemens Marschner wrote:
> So what if documents are deleted in the meantime? Then the recursive merge can't determine the X segments with the same size.

If you read my previous message you'll find the answer:

Doug Cutting wrote:
> It's actually a little more complicated than that, since (among other reasons) after documents are deleted a segment's size will no longer be exactly a power of the mergeFactor.

If you want the gory details, look at IndexWriter.java. Or just set IndexWriter.infoStream to System.out, then add and delete documents and watch what happens.

Doug
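One way a merge rule can cope with segments that are no longer exact powers of mergeFactor is to compare sizes against thresholds instead of testing for equality. The Python sketch below illustrates that idea only. It is not a transcription of IndexWriter.java, and the function name maybe_merge and the list-of-sizes representation are assumptions made for the example.

```python
def maybe_merge(sizes, merge_factor=10):
    """Merge segments at the top of the stack, in place, using size thresholds.

    sizes lists segment document counts, bottom of the stack first.
    Hypothetical helper for illustration; not Lucene's actual code.
    """
    target = merge_factor
    while True:
        # Walk down from the top, collecting segments smaller than the target.
        start = len(sizes)
        total = 0
        while start > 0 and sizes[start - 1] < target:
            start -= 1
            total += sizes[start]
        if total < target:
            break  # not enough small segments to justify a merge
        # Replace the collected segments with one merged segment.
        sizes[start:] = [total]
        target *= merge_factor  # then try the next size level
    return sizes


# Without deletions the thresholds reproduce the "same size" rule.
stack = []
for _ in range(100):
    stack.append(1)
    maybe_merge(stack)
print(stack)

# A segment that shrank to 97 documents still takes part in later merges.
print(maybe_merge([97] + [1] * 10))
```

Because the test is "smaller than the target" rather than "equal to each other", a segment that lost documents to deletion is still picked up by a later merge.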
How does delete work?
Hello all,

I used the delete(Term) method, then looked at the index files. Only one file changed: _1tx.del. I still found references to the file in some of the index files, so my question is: how does Lucene handle deletes?

Thanks,
Rob
Re: How does delete work?
It just marks the record as deleted. The record isn't actually removed until the index is optimized.

Scott
Re: How does delete work?
This is via mergeFactor?

--- Doug Cutting wrote:
> The data is actually removed the next time its segment is merged. Optimizing forces it to happen, but it will also eventually happen as more documents are added to the index, without optimization.
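The mark-now, remove-at-merge behaviour described in this exchange can be pictured with a toy model. This is a minimal Python sketch, not Lucene code: the Segment class, its deleted set (standing in for the per-segment .del file) and the merge function are hypothetical names chosen for illustration.

```python
class Segment:
    """A toy segment: documents written once, plus a set of deleted positions."""

    def __init__(self, docs):
        self.docs = list(docs)  # never modified after the segment is created
        self.deleted = set()    # stands in for the per-segment .del file

    def delete(self, position):
        # Deleting only records the position; the document data stays put.
        self.deleted.add(position)

    def live_docs(self):
        return [d for i, d in enumerate(self.docs) if i not in self.deleted]


def merge(segments):
    """Build one new segment from the live documents of the inputs."""
    merged = []
    for seg in segments:
        merged.extend(seg.live_docs())
    return Segment(merged)


a = Segment(["doc0", "doc1", "doc2"])
b = Segment(["doc3", "doc4"])
a.delete(1)

# Before the merge, the deleted document's data is still stored in segment a.
print(len(a.docs), a.live_docs())

# After the merge, the new segment holds only the live documents.
merged = merge([a, b])
print(merged.docs)
```

In this model the merged segment ends up with four documents rather than five, which is also why segment sizes drift away from exact powers of mergeFactor once deletions occur.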
Re: How does delete work?
Merging happens constantly as documents are added. Each document is initially added in its own segment, and pushed onto the segment stack. Whenever there are mergeFactor segments on the top of the stack that are the same size, these are merged together into a new single segment that replaces them. So, if mergeFactor is 10, and you've added 122 documents, the stack will have five segments, as follows:

  document 121
  document 120
  documents 110-119
  documents 100-109
  documents 0-100

The next merge will happen after document 129 is added, when the segments for document 120 through document 129 will be replaced by a new single segment.

It's actually a little more complicated than that, since (among other reasons) after documents are deleted a segment's size will no longer be exactly a power of the mergeFactor.

Doug
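The simplified rule in this message (no deletions, exact size matches) is small enough to simulate. The following Python sketch tracks only segment sizes; the function name add_documents and the list-based stack are assumptions made for the example, not anything taken from Lucene.

```python
def add_documents(num_docs, merge_factor=10):
    """Return the segment stack (bottom first) as a list of segment sizes."""
    stack = []
    for _ in range(num_docs):
        stack.append(1)  # each new document starts as its own segment
        # Merge while the top merge_factor segments all have the same size.
        while (len(stack) >= merge_factor
               and len(set(stack[-merge_factor:])) == 1):
            merged_size = sum(stack[-merge_factor:])
            del stack[-merge_factor:]
            stack.append(merged_size)
    return stack


# 122 documents with mergeFactor 10: one 100-document segment,
# two 10-document segments and two single-document segments.
print(add_documents(122))

# Adding document 129 (the 130th) merges the ten single-document segments.
print(add_documents(130))
```

Under these assumptions 122 documents produce five segments, as in the listing above, and the next merge is triggered when the 130th document arrives.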
Re: How does delete work?
I see, so every mergeFactor documents are combined into a single new segment in the index, and only when optimize() is called do those multiple segments get merged into a single segment. In your example that would mean that optimize() was called after document 100 was added, hence a single segment with documents 0-100. Is this right?

Thanks,
Otis
Re: How does delete work?
No, in my example optimize() was never called. The merge rule operates recursively. So, after 99 documents had been added the segment stack contained nine segments with ten documents and nine with one document. When the hundredth document was added, the nine one-document segments were popped off the stack and merged, together with the new document, into a single ten-document segment that was pushed onto the stack. So then the top of the stack had ten segments each containing ten documents, i.e., mergeFactor segments of the same size, and these ten segments were then merged into a single segment of 100 documents. So adding the 100th document triggered two merges. (One error in my previous example: the 100 document segment actually contains documents 0-99, not 0-100.)

A corollary of this is, when mergeFactor is 10 and no deletions have been made, the segments correspond to the digits in the decimal representation of the number of documents in the index. So, an 85 document index has eight segments with 10 documents and five segments with one document. (This is somewhat of a simplification, as Lucene automatically merges single document segments before ever writing them to disk as an optimization.)

It is most beneficial to call IndexWriter.optimize() only when you know you won't be adding documents to an index for a while, but will be searching it a lot. Calling optimize() periodically while indexing mostly just slows things down.

Doug
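Both claims in this message, the two merges triggered by the 100th document and the correspondence with decimal digits, can be checked against a small simulation of the simplified rule. This is a Python sketch under the same assumptions (no deletions, exact size matches, no in-memory buffering); segment_sizes and digits_match are hypothetical helper names.

```python
def segment_sizes(num_docs, merge_factor=10):
    """Return (stack of segment sizes, merges triggered by the last add)."""
    stack = []
    merges_on_last_add = 0
    for _ in range(num_docs):
        stack.append(1)  # each new document starts as its own segment
        merges_on_last_add = 0
        # Merge recursively while the top merge_factor segments match in size.
        while (len(stack) >= merge_factor
               and len(set(stack[-merge_factor:])) == 1):
            stack[-merge_factor:] = [sum(stack[-merge_factor:])]
            merges_on_last_add += 1
    return stack, merges_on_last_add


def digits_match(num_docs, merge_factor=10):
    """Check that segment counts per size equal the digits of num_docs."""
    stack, _ = segment_sizes(num_docs, merge_factor)
    expected_segments = 0
    size, remaining = 1, num_docs
    while remaining > 0:
        digit = remaining % merge_factor
        if stack.count(size) != digit:
            return False
        expected_segments += digit
        remaining //= merge_factor
        size *= merge_factor
    # No segments of any other size may be left over.
    return len(stack) == expected_segments


stack, merges = segment_sizes(100)
print(stack, merges)      # the 100th document triggers two merges
print(segment_sizes(85))  # eight 10-document and five 1-document segments
print(digits_match(85))
```

The stack behaves like a base-mergeFactor counter: each merge is a carry into the next digit.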