Hi,

Using MongoDB indexes directly doesn't work because of the MVCC model. What we could do is add special collections (basically one collection per index). This would require quite some work, which would then need to be repeated for RDBMK.
> I observe that 60% of the size of the nodes collection is attributable
> to /oak:index

Could you try to find out which index(es) are responsible for that?

There are multiple ways to reduce the number of nodes:

0) remove unused indexes
1) convert some indexes to Lucene property indexes
2) convert to a unique index if possible (as this uses less space)
3) add a feature to only index a subset of the keys (only index what we need)
4) store the last x levels of the index structure as properties instead of as nodes

3) and 4) would require changes in Oak. For 4), the change should reduce the number of nodes, but might cause merge conflicts (not sure).

Given this content:

    /content/products/a @color=red
    /content/products/b @color=red

with level = 1, the index would be:

    /oak:index/color/red/content
    /oak:index/color/red/content/products @a = true, @b = true

instead of:

    /oak:index/color/red/content
    /oak:index/color/red/content/products
    /oak:index/color/red/content/products/a @match = true
    /oak:index/color/red/content/products/b @match = true

With level > 1, it would require some escaping magic, but we could save some more nodes. Basically it would be:

level = 2:

    /oak:index/color/red/content @products_a = true, @products_b = true

level = 3:

    /oak:index/color/red @content_products_a = true, @content_products_b = true

Regards,
Thomas


On 08/07/15 18:18, "Ian Boston" <[email protected]> wrote:

>Hi,
>I am confused at how /oak:index works and why it is needed in a MongoDB
>setting which has native database indexes that appear to cover the same
>functionality. Could the Oak Query engine use DB indexes directly for all
>indexes that are built into Oak, and Lucene indexes for all custom
>indexes ?
>
>I am asking this because in MongoDB I observe that 60% of the size of the
>nodes collection is attributable to /oak:index, and that the 60% increases
>every non sparse MongoDB index by about 3x. An _id + _modified compound
>index in MongoDB comes out at about 70GB for 100M documents (in part due to
>the size of _id). Without the duplication /oak:index it could be closer to
>25GB. Disk space is cheap, but MongoDB working set RAM is not cheap,
>neither is page fault IO.
>
>I fully understand why TarMK needs /oak:index, but I can't understand
>(conceptually) the need to implement an index inside an database table.
>It's like trying to implement an inverted index in an RDBMS table, which
>everyone who has ever tried (or used) that approach doesn't scale nearly as
>far as Lucene bitmaps.
>
>Could /oak:index be replaced by something that doesn't generate
>Documents/db rows as fast as it does ?
>
>Best Regards
>Ian
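
To make option 4) from the reply more concrete, here is a minimal Python sketch of how folding the last "level" path segments into a property name reduces the number of index nodes. It is only an illustration under stated assumptions, not Oak code: the function name index_entries is hypothetical, and joining segments with "_" is a naive stand-in for the "escaping magic" mentioned in the reply (it would be ambiguous for node names that already contain "_").

```python
def index_entries(index_root, value, paths, level=0):
    """Return {index node path: {property name: True}} for the matching paths.

    level = 0 mirrors the full content path as nodes (one @match per leaf).
    level = n folds the last n path segments into a single property name.
    Hypothetical helper for illustration only; not part of Oak.
    """
    base = f"{index_root}/{value}"
    nodes = {}
    for path in paths:
        segments = path.strip("/").split("/")
        keep = len(segments) - level
        if keep < 0:
            raise ValueError(f"level {level} is deeper than path {path!r}")
        node_segments, folded = segments[:keep], segments[keep:]

        # Create every intermediate index node below the value node.
        current = base
        for segment in node_segments:
            current = f"{current}/{segment}"
            nodes.setdefault(current, {})
        # With keep == 0 the value node itself carries the properties.
        nodes.setdefault(current, {})

        # Naive escaping assumption: join the folded segments with "_".
        prop = "_".join(folded) if folded else "match"
        nodes[current][prop] = True
    return nodes


if __name__ == "__main__":
    matching = ["/content/products/a", "/content/products/b"]
    for level in range(4):
        entries = index_entries("/oak:index/color", "red", matching, level)
        print(f"level = {level}: {len(entries)} index node(s)")
        for node_path, props in sorted(entries.items()):
            print("   ", node_path, " ".join(f"@{p}" for p in sorted(props)))
```

For the two example paths this yields 4, 2, 1 and 1 index nodes for level = 0 to 3. The sketch says nothing about the possible merge conflicts: with level >= 1, concurrent writers adding /content/products/a and /content/products/b would both modify the same index node, which is exactly the open question in the reply.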
