Re: Oak Indexing. Was Re: Property index replacement / evolution

Ian Boston Thu, 11 Aug 2016 01:29:29 -0700

Hi,

On 11 August 2016 at 09:14, Michael Marth <[email protected]> wrote:


> Hi Ian,
>
> No worries - good discussion.
>
> I should point out though that my reply to Davide was based on a
> comparison of the current design vs the Jackrabbit 2 design (in which
> indexes were stored locally). Maybe I misunderstood Davide’s comment.
>
> I will split my answer to your mail in 2 parts:
>
>
> >
> >Full text extraction should be separated from indexing, as the DS blobs
> >are immutable, so is the full text. There is code to do this in the Oak
> >indexer, but it's not used to write to the DS at present. It should be
> >done in a Job, distributed to all nodes, run only once per item. Full text
> >extraction is hugely expensive.
>
> My understanding is that Oak currently:
> A) runs full text extraction in a separate thread (separate from the
> “other” indexer)
> B) runs it only once per cluster
> If that is correct then the difference to what you mention above would be
> that you would like the FT indexing not to be pinned to one instance but
> rather be distributed, say round-robin.
> Right?
>


Yes.
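
To make "distributed, run only once per item" concrete, here is a minimal sketch of a round-robin assignment. It assumes blob ids are stable because DS blobs are immutable, so de-duplicating on the id is enough to extract each item once. The class, method, and node names are hypothetical and are not part of Oak.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: none of these names exist in Oak. */
public class RoundRobinExtraction {

    /**
     * Assigns each distinct blob id to exactly one node, cycling through
     * the nodes in order. Duplicate ids are dropped, so a blob is
     * scheduled for extraction only once across the whole cluster.
     */
    public static Map<String, List<String>> assign(List<String> blobIds, List<String> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("at least one node is required");
        }
        Map<String, List<String>> plan = new LinkedHashMap<>();
        for (String node : nodes) {
            plan.put(node, new ArrayList<>());
        }
        // Immutable blobs: the same id always means the same content.
        Set<String> distinct = new LinkedHashSet<>(blobIds);
        int next = 0;
        for (String blobId : distinct) {
            plan.get(nodes.get(next)).add(blobId);
            next = (next + 1) % nodes.size();
        }
        return plan;
    }

    public static void main(String[] args) {
        // Made-up ids and node names, for illustration only.
        List<String> blobs = Arrays.asList("blob-a", "blob-b", "blob-c", "blob-a");
        List<String> nodes = Arrays.asList("node-1", "node-2");
        System.out.println(assign(blobs, nodes));
    }
}
```

A real implementation would also need to cope with nodes joining or leaving and with failed jobs. This only illustrates the difference from pinning extraction to a single instance.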


>
>
> >Building the same index on every node doesn't scale for the reasons you
> >point out, and eventually hits a brick wall.
> >http://lucene.apache.org/core/6_1_0/core/org/apache/lucene/codecs/lucene60/package-summary.html#Limitations
> >(Int32 on Document ID per index). One of the reasons for the Hybrid
> >approach was the number of Oak documents in some repositories will exceed
> >that limit.
>
> I am not sure what you are arguing for with this comment…
> It sounds like an argument in favour of the current design - which is
> probably not what you mean… Could you explain, please?
>

I didn't communicate that very well.

Currently Lucene (6.1) limits the number of documents it can store in an
index to what fits in an Int32, IIUC. There is a long term desire to
increase that by using Int64, but no long term commitment, as it's probably
significant work given that arrays in Java are indexed with Int32.

The Hybrid approach doesn't help with the potential Lucene brick wall, but
one motivation for looking at it was that the number of Oak Documents,
including those under /oak:index, is in some cases approaching that limit.
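
For a sense of scale, a signed Int32 tops out at 2,147,483,647. The sketch below just does that arithmetic. The class and method names are made up, the document count in main is a made-up figure rather than a measurement from any repository, and the practical ceiling Lucene enforces may be slightly lower than the raw Int32 maximum.

```java
/** Hypothetical helper for reasoning about an int32 document id space. */
public class DocIdLimit {

    /** Upper bound implied by a signed 32-bit document id: 2_147_483_647. */
    public static final long INT32_MAX = Integer.MAX_VALUE;

    /** Fraction of the int32 id space that a given document count would use. */
    public static double fractionUsed(long documentCount) {
        return (double) documentCount / INT32_MAX;
    }

    /** True when the count can no longer be addressed by an int document id. */
    public static boolean exceedsLimit(long documentCount) {
        return documentCount > INT32_MAX;
    }

    public static void main(String[] args) {
        long hypotheticalCount = 1_900_000_000L; // made-up figure, not from this thread
        System.out.printf("limit=%d used=%.1f%%%n",
                INT32_MAX, 100 * fractionUsed(hypotheticalCount));
        System.out.println("3e9 docs exceed limit: " + exceedsLimit(3_000_000_000L));
    }
}
```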



>
>
> Thanks!
> Michael
>
