Hi Marcel,

Thanks for the response, that makes sense.

I assume that there are already more than 64 indexes in /oak:index before any custom ones are added, which makes it impossible to remove /oak:index for MongoDB. With that many, it is going to be impractical for all RDBMSs as well.
Would there be any benefit in moving /oak:index out of the main document
collection, so that MongoDB indexes in that collection which have no
relevance to /oak:index don't get bloated?

Or, more generally, is there a different way of storing the data in
/oak:index so that it doesn't result in so many MongoDB documents?

Best Regards
Ian

On 9 July 2015 at 08:15, Marcel Reutegger <[email protected]> wrote:

> Hi Ian,
>
> there are mainly two reasons why we cannot use DocumentStore
> based indexes for this purpose:
>
> - MongoDB only supports a limited number of indexes (64 per
>   collection) and applications usually have a need for more
>   indexes.
>
> - Data in Oak is multi-versioned. It must be possible to query
>   nodes at a specific revision of the tree.
>
> Lucene indexes are more efficient, but are only updated
> asynchronously. Whether this is acceptable usually depends on
> application requirements. Experience so far shows that many indexes
> can be asynchronous, because there was no hard requirement
> for synchronous index updates.
>
> Regards
>  Marcel
>
> On 08/07/15 18:18, "[email protected] on behalf of Ian Boston" wrote:
>
> >Hi,
> >I am confused about how /oak:index works and why it is needed in a
> >MongoDB setting, which has native database indexes that appear to cover
> >the same functionality. Could the Oak Query engine use DB indexes
> >directly for all indexes that are built into Oak, and Lucene indexes
> >for all custom indexes?
> >
> >I am asking this because in MongoDB I observe that 60% of the size of
> >the nodes collection is attributable to /oak:index, and that the 60%
> >increases every non-sparse MongoDB index by about 3x. An _id + _modified
> >compound index in MongoDB comes out at about 70GB for 100M documents
> >(in part due to the size of _id). Without the duplication from
> >/oak:index it could be closer to 25GB. Disk space is cheap, but MongoDB
> >working set RAM is not cheap, and neither is page fault IO.
> >
> >I fully understand why TarMK needs /oak:index, but I can't understand
> >(conceptually) the need to implement an index inside a database table.
> >It's like trying to implement an inverted index in an RDBMS table, which
> >everyone who has ever tried (or used) that approach knows doesn't scale
> >nearly as far as Lucene bitmaps.
> >
> >Could /oak:index be replaced by something that doesn't generate
> >Documents/db rows as fast as it does?
> >
> >Best Regards
> >Ian
> >
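
For reference, the index-size figures quoted above can be sanity-checked with a rough back-of-the-envelope sketch. It assumes that a non-sparse index grows linearly with the number of documents, and that the 60% share of collection size maps directly onto a 60% share of index entries. Both are simplifying assumptions, not measurements, and the function names are made up for illustration.

```python
def inflation_factor(oak_index_fraction):
    """How much larger a non-sparse index becomes when a given fraction
    of the collection consists of /oak:index documents.

    Assumes index size is proportional to document count."""
    return 1.0 / (1.0 - oak_index_fraction)


def size_without_oak_index_gb(observed_gb, oak_index_fraction):
    """Estimated size of the same index if the /oak:index documents
    were not stored in the collection (same linear assumption)."""
    return observed_gb * (1.0 - oak_index_fraction)


# Figures taken from the thread: 60% of the nodes collection is
# /oak:index, and the _id + _modified index is about 70GB.
factor = inflation_factor(0.60)
reduced = size_without_oak_index_gb(70.0, 0.60)

print(f"inflation factor: {factor:.1f}x")
print(f"estimated index size without /oak:index: {reduced:.0f}GB")
```

Under these assumptions the estimate lands in the same range as the "about 3x" and "closer to 25GB" figures above. Real index sizes also depend on key length, so this is only an order-of-magnitude check.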
