On Sun, Nov 24, 2013 at 8:31 AM, Erick Erickson <erickerick...@gmail.com> wrote:

> bq: Do I understand you correctly that when two segments get merged, the
> docids (of the original segments) remain the same?
>
> The original segments are unchanged, segments are _never_ changed after
> they're closed. But they'll be thrown away. Say you have segment1 and
> segment2 that get merged into segment3. As soon as the last searcher
> that is looking at segment1 and segment2 is closed, those two segments
> will be deleted from your disk.
>
> But for any given doc, the docid in segment3 will very likely be different
> than it was in segment1 or 2.
>

i'm trying to figure this out - i'll have to dig, i suppose. for example,
if the docbase (the docid offset per searcher) was stored together with the
index segment, that would be an indication of 'relative stability of docids'
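
To make the question concrete, here is a toy model (plain Python, not Lucene
code). It assumes that a global docid is docBase + segment-local docid, that
docBase is the running sum of maxDoc over the preceding segments, and that a
merge drops deleted docs and renumbers the survivors in order. The names
Segment, doc_bases, global_ids and merge are made up for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """Hypothetical stand-in for an index segment (not a Lucene class)."""
    keys: list                                  # unique key per doc, indexed by local docid
    deleted: set = field(default_factory=set)   # local docids marked as deleted

    @property
    def max_doc(self):
        # Deleted docs still occupy a docid slot until the segment is merged away.
        return len(self.keys)


def doc_bases(segments):
    """Assumed rule: docBase is the sum of max_doc of all preceding segments."""
    bases, base = [], 0
    for seg in segments:
        bases.append(base)
        base += seg.max_doc
    return bases


def global_ids(segments):
    """Map each live doc's key to its global docid (docBase + local docid)."""
    out = {}
    for base, seg in zip(doc_bases(segments), segments):
        for local, key in enumerate(seg.keys):
            if local not in seg.deleted:
                out[key] = base + local
    return out


def merge(a, b):
    """Assumed merge behaviour: keep live docs in order, renumber from 0."""
    live = [k for seg in (a, b)
            for i, k in enumerate(seg.keys) if i not in seg.deleted]
    return Segment(live)


seg1 = Segment(["a", "b", "c"], deleted={1})    # "b" is marked deleted
seg2 = Segment(["d", "e"])
seg3 = Segment(["f"])

before = global_ids([seg1, seg2, seg3])
after = global_ids([merge(seg1, seg2), seg3])
print(before)
print(after)
```

In this model the doc "f" keeps its segment-local docid (0) because seg3 is
untouched, but its global docid still shifts once the merge removes the
deleted slot in front of it. If the real index behaves like this, a cache
keyed on (segment, local docid) would survive merges of other segments,
while one keyed on global docids would not.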


>
> I think you're reading too much into LUCENE-2897. I'm pretty sure the
> segment in question is not available to you anyway before this rewrite is
> done, but freely admit I don't know much about it.
>

i've done tests, committing and overwriting a document, and saw (Solr 4.0)
that docids are being recycled. I deleted 2 docs, then added a new document
and guess what: the new document had the docid of the previously deleted
document (but different fields).

That was new to me, so I searched and found LUCENE-2897, which seemed to
explain that behaviour.


>
> You're probably going to get into the whole PerSegment family of
> operations, which is something I'm not all that familiar with so I'll
> leave explanations to others.
>

Thank you, it is useful to get insights from various sides,

  roman


>
> On Sat, Nov 23, 2013 at 8:22 PM, Roman Chyla <roman.ch...@gmail.com>
> wrote:
>
> > Hi Erick,
> > Many thanks for the info. An additional question:
> >
> > Do I understand you correctly that when two segments get merged, the
> > docids (of the original segments) remain the same?
> >
> > (unless, perhaps, they were merged using the last index segment, which
> > was opened for writing and where the docids could have suddenly changed
> > in a commit just before the merge)
> >
> > Yes, you guessed right that I am putting my code into the custom cache -
> > so it gets notified on index changes. I don't know yet how, but I think I
> > can find the way to the current active, opened (last) index segment, which
> > is actively updated (as opposed to just being merged) -- so my definition
> > of 'not last ones' is: where docids don't change. I'd be grateful if
> > someone could spot any problem with such an assumption.
> >
> > roman
> >
> >
> >
> >
> > On Sat, Nov 23, 2013 at 7:39 PM, Erick Erickson <erickerick...@gmail.com>
> > wrote:
> >
> > > bq: But can I assume
> > > that docids in other segments (other than the last one) will be
> > > relatively stable?
> > >
> > > Kinda. Maybe. Maybe not. It depends on how you define "other than the
> > > last one".
> > >
> > > The key is that the internal doc IDs may change when segments are
> > > merged. And old segments get merged. Doc IDs will _never_ change
> > > in a segment once it's closed (although as you note they may be
> > > marked as deleted). But that segment may be written to a new segment
> > > when merging and the internal ID for a given document in the new
> > > segment bears no relationship to its internal ID in the old segment.
> > >
> > > BTW, I think you only really care when opening a new searcher. There is
> > > a UserCache (see solrconfig.xml) that gets notified when a new searcher
> > > is being opened to give it an opportunity to refresh itself, is that
> > > useful?
> > >
> > > As long as a searcher is open, it's guaranteed that nothing is changing.
> > > Hard commits with openSearcher=false don't open new searchers, which
> > > is why changes aren't visible until a softCommit or a hard commit with
> > > openSearcher=true despite the fact that the segments are closed.
> > >
> > > FWIW,
> > > Erick
> > >
> > > Best
> > > Erick
> > >
> > >
> > >
> > > On Sat, Nov 23, 2013 at 12:40 AM, Roman Chyla <roman.ch...@gmail.com>
> > > wrote:
> > >
> > > > Hi,
> > > > docids are 'ephemeral', but i'd still like to build a search cache
> > > > with them (they allow for the fastest joins).
> > > >
> > > > i'm seeing docids keep changing with updates (especially in the last
> > > > index segment) - as per
> > > > https://issues.apache.org/jira/browse/LUCENE-2897
> > > >
> > > > That would be fine, because i could build the cache from a diff (of
> > > > index state) + reading the latest index segment in its entirety. But
> > > > can I assume that docids in other segments (other than the last one)
> > > > will be relatively stable? (i.e. when an old doc is deleted, the docid
> > > > is marked as removed; update doc = delete old & create a new docid)?
> > > >
> > > > thanks
> > > >
> > > > roman
> > > >
> > >
> >
>
