Re: /oak:index (DocumentNodeStore)

Thomas Mueller Thu, 09 Jul 2015 02:35:57 -0700

Hi,

Using MongoDB indexes directly doesn't work because of the MVCC model.
What we could do is add special collections (basically one collection per
index). This would require quite some work, which would then need to be
repeated for RDBMK.


> I observe that 60% of the size of the nodes collection is attributable
>to /oak:index

Could you try to find out which index(es) are responsible for that? There
would be multiple ways to reduce the number of nodes:

0) remove unused indexes
1) convert some indexes to Lucene property indexes
2) convert to unique index if possible (as this uses less space)
3) add a feature to only index a subset of the keys (only index what we
need)
4) store the last x levels of the index structure as properties instead
of as nodes


3) and 4) would require changes in Oak. For 4), the change should reduce
the number of nodes, but might cause merge conflicts (not sure). With
level = 1, it would be:

  /content/products/a @color=red
  /content/products/b @color=red

  /oak:index/color/red/content
  /oak:index/color/red/content/products @a = true, @b = true

instead of

  /oak:index/color/red/content
  /oak:index/color/red/content/products
  /oak:index/color/red/content/products/a @match = true
  /oak:index/color/red/content/products/b @match = true

With level > 1, it would require some escaping magic, but we could save
some more nodes, and basically it would be:

level = 2:

  /oak:index/color/red/content @products_a = true, @products_b = true


level = 3:

  /oak:index/color/red @content_products_a = true, @content_products_b = true
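
To make the mapping concrete, here is a rough sketch of how the "level"
setting could fold path segments into property names. It is only an
illustration, not Oak code: the function names index_entry and index_nodes
are made up, segments are joined with "_" exactly as in the examples above,
and the escaping that level > 1 would need is not implemented.

```python
def index_entry(index_name, value, content_path, level):
    """Return (index node path, property name) for one indexed content node.

    Hypothetical helper: the last `level` segments of the content path are
    folded into the property name instead of becoming index nodes.
    No escaping is done, so names containing "_" would be ambiguous.
    """
    segments = [s for s in content_path.split("/") if s]
    if level < 0 or level > len(segments):
        raise ValueError("level must be between 0 and the path depth")
    base = ["oak:index", index_name, value]
    if level == 0:
        # Current layout: one index node per content node, marked with @match
        return "/" + "/".join(base + segments), "match"
    kept, folded = segments[:-level], segments[-level:]
    return "/" + "/".join(base + kept), "_".join(folded)


def index_nodes(index_name, value, content_paths, level):
    """Build the index nodes at and below /oak:index/<name>/<value>.

    Returns a dict mapping node path to its properties, so len() of the
    result is the number of nodes that layout would need.
    """
    nodes = {}
    for content_path in content_paths:
        node, prop = index_entry(index_name, value, content_path, level)
        parts = node.strip("/").split("/")
        # Register the value node and every intermediate node down to `node`
        for i in range(3, len(parts) + 1):
            nodes.setdefault("/" + "/".join(parts[:i]), {})
        nodes[node][prop] = True
    return nodes


if __name__ == "__main__":
    paths = ["/content/products/a", "/content/products/b"]
    for level in range(4):
        print("level =", level, "nodes =", len(index_nodes("color", "red", paths, level)))
```

Counting the nodes for each level shows the saving for the two example
paths. Whether concurrent updates to the same index node then cause merge
conflicts is the open question mentioned above; the sketch does not model
that.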




Regards,
Thomas





On 08/07/15 18:18, "Ian Boston" <[email protected]> wrote:

>Hi,
>I am confused at how /oak:index works and why it is needed in a MongoDB
>setting which has native database indexes that appear to cover the same
>functionality. Could the Oak Query engine use DB indexes directly for all
>indexes that are built into Oak, and Lucene indexes for all custom
>indexes ?
>
>I am asking this because in MongoDB I observe that 60% of the size of the
>nodes collection is attributable to /oak:index, and that the 60% increases
>every non-sparse MongoDB index by about 3x. An _id + _modified compound
>index in MongoDB comes out at about 70GB for 100M documents (in part due to
>the size of _id). Without the duplication in /oak:index it could be closer to
>25GB. Disk space is cheap, but MongoDB working set RAM is not cheap,
>neither is page fault IO.
>
>I fully understand why TarMK needs /oak:index, but I can't understand
>(conceptually) the need to implement an index inside a database table.
>It's like trying to implement an inverted index in an RDBMS table, which
>everyone who has ever tried (or used) that approach knows doesn't scale
>nearly as far as Lucene bitmaps.
>
>Could /oak:index be replaced by something that doesn't generate
>Documents/db rows as fast as it does ?
>
>Best Regards
>Ian
