/oak:index (DocumentNodeStore)

Ian Boston Wed, 08 Jul 2015 09:18:42 -0700

Hi,
I am confused about how /oak:index works and why it is needed in a MongoDB
setting, which has native database indexes that appear to cover the same
functionality. Could the Oak Query engine use DB indexes directly for all
indexes that are built into Oak, and Lucene indexes for all custom indexes?


I am asking this because in MongoDB I observe that 60% of the size of the
nodes collection is attributable to /oak:index, and that this 60% inflates
every non-sparse MongoDB index by about 3x. An _id + _modified compound
index in MongoDB comes out at about 70GB for 100M documents (in part due to
the size of _id). Without the duplication from /oak:index it could be closer
to 25GB. Disk space is cheap, but MongoDB working set RAM is not, and
neither is page fault IO.
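
As a rough sanity check of those figures, here is a back-of-the-envelope
sketch. It assumes that a non-sparse index grows linearly with the number of
documents and that the 60% share of collection size also holds for document
count; both are simplifying assumptions rather than measurements, and the
function names are made up for illustration.

```python
# Rough estimate only: assumes index size scales linearly with document
# count and that /oak:index accounts for the same share of documents as
# it does of collection size (an assumption, not a measured value).

def index_inflation(index_fraction: float) -> float:
    """Factor by which a non-sparse index grows when `index_fraction`
    of all documents are /oak:index entries."""
    return 1.0 / (1.0 - index_fraction)


def size_without_duplication(observed_gb: float, index_fraction: float) -> float:
    """Estimated index size if the /oak:index documents were not present."""
    return observed_gb * (1.0 - index_fraction)


if __name__ == "__main__":
    fraction = 0.60      # share attributed to /oak:index
    observed_gb = 70.0   # _id + _modified index for 100M documents

    print(f"inflation factor: {index_inflation(fraction):.1f}x")
    print(f"estimated size without /oak:index: "
          f"{size_without_duplication(observed_gb, fraction):.0f}GB")
```

Under these assumptions the inflation factor is 1 / (1 - 0.6), in the same
range as the "about 3x" above, and the estimated index size lands in the
high 20s of GB, consistent with "closer to 25GB".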

I fully understand why TarMK needs /oak:index, but I can't understand
(conceptually) the need to implement an index inside a database table.
It's like trying to implement an inverted index in an RDBMS table, which
everyone who has ever tried (or used) that approach knows doesn't scale
nearly as far as Lucene bitmaps.

Could /oak:index be replaced by something that doesn't generate
Documents/db rows as fast as it does?

Best Regards
Ian
