Hi Ian,

there are mainly two reasons why we cannot use DocumentStore-based indexes for this purpose:
- MongoDB only supports a limited number of indexes (64 per collection), and
  applications usually need more indexes than that.
- Data in Oak is multi-versioned. It must be possible to query nodes at a
  specific revision of the tree.

Lucene indexes are more efficient, but are only updated asynchronously.
Whether this is acceptable usually depends on application requirements.
Experience so far shows that many indexes can be asynchronous, because there
was no hard requirement for synchronous index updates.

Regards
 Marcel

On 08/07/15 18:18, "ianbos...@gmail.com on behalf of Ian Boston" wrote:

>Hi,
>I am confused at how /oak:index works and why it is needed in a MongoDB
>setting which has native database indexes that appear to cover the same
>functionality. Could the Oak Query engine use DB indexes directly for all
>indexes that are built into Oak, and Lucene indexes for all custom
>indexes ?
>
>I am asking this because in MongoDB I observe that 60% of the size of the
>nodes collection is attributable to /oak:index, and that the 60% increases
>every non sparse MongoDB index by about 3x. An _id + _modified compound
>index in MongoDB comes out at about 70GB for 100M documents (in part due
>to the size of _id). Without the duplication /oak:index it could be closer
>to 25GB. Disk space is cheap, but MongoDB working set RAM is not cheap,
>neither is page fault IO.
>
>I fully understand why TarMK needs /oak:index, but I can't understand
>(conceptually) the need to implement an index inside a database table.
>It's like trying to implement an inverted index in an RDBMS table, which
>everyone who has ever tried (or used) that approach knows doesn't scale
>nearly as far as Lucene bitmaps.
>
>Could /oak:index be replaced by something that doesn't generate
>Documents/db rows as fast as it does ?
>
>Best Regards
>Ian