On Sun, Nov 24, 2013 at 8:31 AM, Erick Erickson <erickerick...@gmail.com> wrote:
> bq: Do I understand you correctly that when two segments get merged, the
> docids (of the original segments) remain the same?
>
> The original segments are unchanged; segments are _never_ changed after
> they're closed. But they'll be thrown away. Say you have segment1 and
> segment2 that get merged into segment3. As soon as the last searcher
> that is looking at segment1 and segment2 is closed, those two segments
> will be deleted from your disk.
>
> But for any given doc, the docid in segment3 will very likely be different
> than it was in segment1 or 2.
>

I'm trying to figure this out; I'll have to dig, I suppose. For example, if
the docbase (the docid offset of a segment within a searcher) were stored
together with the index segment, that would be an indication of 'relative
stability of docids'.

>
> I think you're reading too much into LUCENE-2897. I'm pretty sure the
> segment in question is not available to you anyway before this rewrite is
> done, but I freely admit I don't know much about it.
>

I've done tests, committing and overwriting a document, and saw (Solr 4.0)
that docids are being recycled. I deleted 2 docs, then added a new document
and guess what: the new document had the docid of the previously deleted
document (but different fields). That was new to me, so I searched and found
LUCENE-2897, which seemed to explain that behaviour.

>
> You're probably going to get into the whole PerSegment family of
> operations, which is something I'm not all that familiar with, so I'll
> leave explanations to others.
>

Thank you, it is useful to get insights from various sides,

roman

>
> On Sat, Nov 23, 2013 at 8:22 PM, Roman Chyla <roman.ch...@gmail.com>
> wrote:
>
> > Hi Erick,
> > Many thanks for the info. An additional question:
> >
> > Do I understand you correctly that when two segments get merged, the
> > docids (of the original segments) remain the same?
> >
> > (unless, perhaps, in a situation where they were merged using the last
> > index segment, which was open for writing and where the docids could have
> > suddenly changed in a commit just before the merge)
> >
> > Yes, you guessed right that I am putting my code into the custom cache,
> > so it gets notified on index changes. I don't know yet how, but I think I
> > can find my way to the currently active, open (last) index segment, which
> > is actively updated (as opposed to just being merged). So my definition
> > of 'not last ones' is: segments where docids don't change. I'd be
> > grateful if someone could spot any problem with such an assumption.
> >
> > roman
> >
> >
> > On Sat, Nov 23, 2013 at 7:39 PM, Erick Erickson <erickerick...@gmail.com>
> > wrote:
> >
> > > bq: But can I assume
> > > that docids in other segments (other than the last one) will be
> > > relatively stable?
> > >
> > > Kinda. Maybe. Maybe not. It depends on how you define "other than the
> > > last one".
> > >
> > > The key is that the internal doc IDs may change when segments are
> > > merged. And old segments get merged. Doc IDs will _never_ change
> > > in a segment once it's closed (although, as you note, they may be
> > > marked as deleted). But that segment may be written to a new segment
> > > when merging, and the internal ID for a given document in the new
> > > segment bears no relationship to its internal ID in the old segment.
> > >
> > > BTW, I think you only really care when opening new searchers. There is
> > > a UserCache (see solrconfig.xml) that gets notified when a new searcher
> > > is being opened, to give it an opportunity to refresh itself. Is that
> > > useful?
> > >
> > > As long as a searcher is open, it's guaranteed that nothing is changing.
> > > Hard commits with openSearcher=false don't open new searchers, which
> > > is why changes aren't visible until a softCommit or a hard commit with
> > > openSearcher=true, despite the fact that the segments are closed.
> > >
> > > FWIW,
> > > Erick
> > >
> > > Best
> > > Erick
> > >
> > >
> > > On Sat, Nov 23, 2013 at 12:40 AM, Roman Chyla <roman.ch...@gmail.com>
> > > wrote:
> > >
> > > > Hi,
> > > > docids are 'ephemeral', but I'd still like to build a search cache
> > > > with them (they allow for the fastest joins).
> > > >
> > > > I'm seeing docids keep changing with updates (especially in the last
> > > > index segment), as per
> > > > https://issues.apache.org/jira/browse/LUCENE-2897
> > > >
> > > > That would be fine, because I could build the cache from a diff (of
> > > > the index state) plus reading the latest index segment in its entirety.
> > > > But can I assume that docids in other segments (other than the last
> > > > one) will be relatively stable? (i.e. when an old doc is deleted, its
> > > > docid is marked as removed; update doc = delete old & create a new
> > > > docid)?
> > > >
> > > > thanks
> > > >
> > > > roman
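To make the docid behaviour discussed in this thread concrete, here is a toy Java model. It is not Lucene or Solr code: `DocIdModel`, `Segment`, `docBases`, `globalDocId`, `update` and `merge` are hypothetical names. The model assumes what Erick describes: a closed segment never renumbers its docs, a delete only marks a docid as removed, an update is a delete plus an add, and a merge copies only the live docs into a new segment. It also assumes a top-level docid is the segment's docbase (the summed maxDoc of all earlier segments) plus the segment-local docid, and, as a simplification, puts the merged segment where its inputs were.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Toy model (hypothetical, not Lucene code) of segment-local docids, docbases, deletes and merges. */
public class DocIdModel {

    /** A closed segment: keys never move; deletes only flip bits in a bitset. */
    static final class Segment {
        final String name;
        final List<String> keys;               // key at index i has segment-local docid i
        final BitSet deleted = new BitSet();   // set bit = docid marked as deleted

        Segment(String name, List<String> keys) {
            this.name = name;
            this.keys = List.copyOf(keys);
        }

        int maxDoc() { return keys.size(); }   // counts deleted docs too, as they still occupy ids
    }

    /** docbase of segment i = sum of maxDoc() over all segments before it. */
    static int[] docBases(List<Segment> segments) {
        int[] bases = new int[segments.size()];
        int base = 0;
        for (int i = 0; i < segments.size(); i++) {
            bases[i] = base;
            base += segments.get(i).maxDoc();
        }
        return bases;
    }

    /** Global docid of the live copy of a key, or -1 if none exists. */
    static int globalDocId(List<Segment> segments, String key) {
        int[] bases = docBases(segments);
        for (int i = 0; i < segments.size(); i++) {
            Segment s = segments.get(i);
            for (int local = 0; local < s.maxDoc(); local++) {
                if (!s.deleted.get(local) && s.keys.get(local).equals(key)) {
                    return bases[i] + local;
                }
            }
        }
        return -1;
    }

    /** Update = mark every existing copy deleted, then append the new copy as a new segment. */
    static void update(List<Segment> segments, String key, String newSegmentName) {
        for (Segment s : segments) {
            for (int local = 0; local < s.maxDoc(); local++) {
                if (s.keys.get(local).equals(key)) s.deleted.set(local);
            }
        }
        segments.add(new Segment(newSegmentName, List.of(key)));
    }

    /** Merge = copy only live docs, in order, into a new segment, so local ids are compacted. */
    static Segment merge(String name, Segment... inputs) {
        List<String> live = new ArrayList<>();
        for (Segment s : inputs) {
            for (int local = 0; local < s.maxDoc(); local++) {
                if (!s.deleted.get(local)) live.add(s.keys.get(local));
            }
        }
        return new Segment(name, live);
    }

    public static void main(String[] args) {
        Segment seg1 = new Segment("seg1", List.of("a", "b", "c"));
        Segment seg2 = new Segment("seg2", List.of("d", "e"));
        List<Segment> segments = new ArrayList<>(List.of(seg1, seg2));
        System.out.println("d before update: " + globalDocId(segments, "d")); // 3

        update(segments, "b", "seg3");
        System.out.println("b after update: " + globalDocId(segments, "b"));  // 5
        System.out.println("d after update: " + globalDocId(segments, "d"));  // 3

        // Replace seg1 and seg2 with their merge, keeping the merged segment in front.
        Segment merged = merge("seg4", seg1, seg2);
        segments.subList(0, 2).clear();
        segments.add(0, merged);
        System.out.println("d after merge: " + globalDocId(segments, "d"));   // 2
        System.out.println("b after merge: " + globalDocId(segments, "b"));   // 4
    }
}
```

In this model the delete leaves `d` at global docid 3, but once seg1 and seg2 are merged `d` moves to 2 and the updated `b` moves from 5 to 4. No docid is rewritten inside a closed segment, yet the global ids a cache would hold still shift, which is why a cache keyed on docids has to be refreshed whenever a new searcher sees a different set of segments.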