Hi Marcel,
Thanks for the response, that makes sense.

I assume that there are already more than 64 indexes in /oak:index before
any custom ones are added, which makes it impossible to remove /oak:index in
favour of native MongoDB indexes. With that many, it's going to be
impractical for all RDBMSs too.

Would there be any benefit in moving /oak:index out of the main document
collection, so that MongoDB indexes on that collection which have no
relevance to /oak:index don't get bloated?
Or, more generally, is there a different way of storing the data in
/oak:index so that it doesn't result in so many MongoDB documents?
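A rough illustration of why /oak:index can produce so many documents, assuming a property index that stores its entries as a "content mirror": each indexed node's path is copied under a key for its indexed value, and on the DocumentMK every node along that copied path is a separate document in the nodes collection. This is a simplified, hypothetical model; the function name mirror_paths, the index name "status" and the example paths are made up, and real Oak storage may differ in details such as value encoding.

```python
# Simplified, hypothetical model of a "content mirror" property index:
# every indexed node's path is copied under a key for its indexed value,
# and each node on that copied path is stored as its own document.
# Real Oak storage may differ in details (value encoding, extra properties).

from typing import Iterable, Set, Tuple


def mirror_paths(index_name: str,
                 entries: Iterable[Tuple[str, str]]) -> Set[str]:
    """Return the index node paths (one document each) below :index."""
    root = f"/oak:index/{index_name}/:index"
    created: Set[str] = set()
    for node_path, value in entries:
        # Value key first, then the mirrored content path segments.
        segments = [value] + [s for s in node_path.split("/") if s]
        path = root
        for segment in segments:
            path = f"{path}/{segment}"
            created.add(path)  # shared prefixes are only counted once
    return created


if __name__ == "__main__":
    entries = [
        ("/content/site/page1", "published"),
        ("/content/site/page2", "published"),
        ("/content/other/page3", "draft"),
    ]
    docs = mirror_paths("status", entries)
    for p in sorted(docs):
        print(p)
    print(len(docs), "index documents for", len(entries), "indexed nodes")
```

With these three made-up entries the model creates nine index documents for three indexed nodes. In general each entry can add up to one document per path segment, plus one for each new value key, so deep content trees multiply the document count quickly.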


Best Regards
Ian

On 9 July 2015 at 08:15, Marcel Reutegger <[email protected]> wrote:

> Hi Ian,
>
> there are mainly two reasons why we cannot use DocumentStore
> based indexes for this purpose:
>
> - MongoDB only supports a limited number of indexes (64 per
>   collection) and applications usually have a need for more
>   indexes.
>
> - Data in Oak is multi-versioned. It must be possible to query
>   nodes at a specific revision of the tree.
>
> Lucene indexes are more efficient, but are only updated
> asynchronously. Whether this is acceptable usually depends on
> application requirements. Experience so far shows that many
> indexes can be asynchronous, because there was no hard
> requirement for synchronous index updates.
>
> Regards
>  Marcel
>
> On 08/07/15 18:18, "[email protected] on behalf of Ian Boston" wrote:
>
> >Hi,
> >I am confused about how /oak:index works and why it is needed in a MongoDB
> >setting, which has native database indexes that appear to cover the same
> >functionality. Could the Oak Query engine use DB indexes directly for all
> >indexes that are built into Oak, and Lucene indexes for all custom
> >indexes?
> >
> >I am asking this because in MongoDB I observe that 60% of the size of the
> >nodes collection is attributable to /oak:index, and that this 60% inflates
> >every non-sparse MongoDB index by about 3x. An _id + _modified compound
> >index in MongoDB comes out at about 70GB for 100M documents (in part due
> >to the size of _id). Without the duplication caused by /oak:index it could
> >be closer to 25GB. Disk space is cheap, but MongoDB working-set RAM is not,
> >and neither is page-fault I/O.
> >
> >I fully understand why TarMK needs /oak:index, but I can't understand
> >(conceptually) the need to implement an index inside a database table.
> >It's like implementing an inverted index in an RDBMS table: as anyone who
> >has tried (or used) that approach knows, it doesn't scale nearly as far
> >as Lucene bitmaps.
> >
> >Could /oak:index be replaced by something that doesn't generate
> >Documents/db rows as fast as it does ?
> >
> >Best Regards
> >Ian
>
>
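A back-of-envelope check of the figures in the quoted message (a 70GB _id + _modified index over 100M documents, with 60% of the nodes collection attributable to /oak:index). This sketch assumes that index size grows linearly with the number of indexed documents, that the /oak:index share of document count equals its 60% share of size, and decimal gigabytes; the function names are illustrative only.

```python
# Back-of-envelope estimate of how much /oak:index inflates a non-sparse
# MongoDB index. Assumptions (not measured): index size is linear in the
# number of indexed documents, and the /oak:index share of document count
# equals its share of collection size.


def size_without_oak_index(index_gb: float, oak_index_pct: int) -> float:
    """Index size once /oak:index documents are no longer indexed."""
    return index_gb * (100 - oak_index_pct) / 100


def inflation_factor(oak_index_pct: int) -> float:
    """How many times larger the index is because of /oak:index."""
    return 100 / (100 - oak_index_pct)


def bytes_per_entry(index_gb: float, documents: int) -> float:
    """Average index entry size, using decimal gigabytes."""
    return index_gb * 10**9 / documents


if __name__ == "__main__":
    print(f"without /oak:index: {size_without_oak_index(70, 60):.0f} GB")
    print(f"inflation factor:   {inflation_factor(60):.1f}x")
    print(f"bytes per entry:    {bytes_per_entry(70, 100_000_000):.0f}")
```

Under these assumptions the index would shrink to about 28GB (a 2.5x inflation), roughly in line with the "closer to 25GB" and "about 3x" estimates. If /oak:index documents are smaller than average, their share of the document count exceeds 60% and the inflation factor is correspondingly higher.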
