Re: /oak:index (DocumentNodeStore)

Norberto Leite Thu, 09 Jul 2015 03:33:52 -0700

A collection per index (or a separate one for indexes only), especially for
the asynchronous ones, will translate into a big benefit if the following hold:
- when querying on index nodes we don't need to get all related node
documents (which is happening)
- the write operations are distinct between indexes and nodes (which I
think is also happening)


N.

On Thu, Jul 9, 2015 at 11:33 AM, Thomas Mueller <muel...@adobe.com> wrote:

> Hi,
>
> Using MongoDB indexes directly doesn't work because of the MVCC model.
> What we could do is add special collections (basically one collection per
> index). This would require some work, which then would need to be
> repeated for RDBMK. It would be quite some work.
>
> > I observe that 60% of the size of the nodes collection is attributable
> >to /oak:index
>
> Could you try to find out which index(es) are responsible for that? There
> would be multiple ways to reduce the number of nodes:
>
> 0) remove unused indexes
> 1) convert some indexes to Lucene property indexes
> 2) convert to unique index if possible (as this uses less space)
> 3) add a feature to only index a subset of the keys (only index what we
> need)
> 4) convert the last x levels of the index structure as a property instead
> of as a node
>
>
> 3) and 4) would require changes in Oak. For 4), the change should reduce
> the number of nodes, but might cause merge conflicts (not sure). With
> level = 1, it would be:
>
>   /content/products/a @color=red
>   /content/products/b @color=red
>
>   /oak:index/color/red/content
>   /oak:index/color/red/content/products @a = true, @b = true
>
> instead of
>
>   /oak:index/color/red/content
>   /oak:index/color/red/content/products
>   /oak:index/color/red/content/products/a @match = true
>   /oak:index/color/red/content/products/b @match = true
>
> With level > 1, it would require some escaping magic, but we could save
> some more nodes, and basically it would be:
>
> level = 2:
>
>   /oak:index/color/red/content @products_a = true, @products_b = true
>
>
> level = 3:
>
>   /oak:index/color/red @content_products_a = true, @content_products_b = true
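The level scheme quoted above can be sketched in a few lines of Python. This is only an illustration of the proposed layout, not Oak code: the function names `index_entry` and `count_index_nodes` are hypothetical, and the "escaping magic" is not implemented (path segments are simply joined with `_`, which is ambiguous if a node name itself contains `_`).

```python
def index_entry(index_name, value, content_path, level):
    """Return (index_node_path, property_name) for one indexed content node.

    level = 0 is the current layout (one index node per content node,
    marked with @match); level = N folds the last N path segments into
    the property name instead of creating nodes for them.
    """
    segments = [s for s in content_path.split("/") if s]
    if level < 0 or level > len(segments):
        raise ValueError("level must be between 0 and the path depth")
    if level == 0:
        node_segments, prop = segments, "match"
    else:
        node_segments = segments[:-level]
        # Assumption: plain "_" join, no escaping of "_" inside names.
        prop = "_".join(segments[-level:])
    node_path = "/".join(["", "oak:index", index_name, value] + node_segments)
    return node_path, prop


def count_index_nodes(index_name, value, content_paths, level):
    """Count the distinct nodes created below /oak:index/<name>/<value>."""
    root = "/".join(["", "oak:index", index_name, value])
    nodes = set()
    for path in content_paths:
        node_path, _ = index_entry(index_name, value, path, level)
        # Every ancestor below the value node is a node (document) too.
        while node_path != root:
            nodes.add(node_path)
            node_path = node_path.rsplit("/", 1)[0]
    return len(nodes)
```

For the two example paths `/content/products/a` and `/content/products/b`, this gives 4 nodes below `/oak:index/color/red` at level 0, 2 at level 1, 1 at level 2 and 0 at level 3, matching the listings above.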
>
>
>
>
> Regards,
> Thomas
>
>
>
>
>
> On 08/07/15 18:18, "Ian Boston" <i...@tfd.co.uk> wrote:
>
> >Hi,
> >I am confused at how /oak:index works and why it is needed in a MongoDB
> >setting which has native database indexes that appear to cover the same
> >functionality. Could the Oak Query engine use DB indexes directly for all
> >indexes that are built into Oak, and Lucene indexes for all custom
> >indexes?
> >
> >I am asking this because in MongoDB I observe that 60% of the size of the
> >nodes collection is attributable to /oak:index, and that the 60% increases
> >every non-sparse MongoDB index by about 3x. An _id + _modified compound
> >index in MongoDB comes out at about 70GB for 100M documents (in part due to
> >the size of _id). Without the duplication in /oak:index it could be closer to
> >25GB. Disk space is cheap, but MongoDB working set RAM is not cheap,
> >neither is page fault IO.
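As a rough sanity check of the figures quoted above, here is a small back-of-the-envelope calculation. It assumes (this is not stated in the original) that the size of a non-sparse MongoDB index scales linearly with the number of documents and that the 60% share applies to document count as well as size; the variable names are illustrative only.

```python
# Figures taken from the message above.
index_size_gb = 70.0   # _id + _modified compound index
documents = 100e6      # documents in the nodes collection
oak_index_share = 0.60 # share attributed to /oak:index

# Assumption: index size is proportional to the number of documents.
bytes_per_entry = index_size_gb * 1e9 / documents
size_without_oak_index_gb = index_size_gb * (1 - oak_index_share)
growth_factor = 1 / (1 - oak_index_share)

print(f"bytes per index entry:      about {bytes_per_entry:.0f}")
print(f"index without /oak:index:   about {size_without_oak_index_gb:.0f} GB")
print(f"growth caused by /oak:index: about {growth_factor:.1f}x")
```

Under these assumptions the index would shrink to somewhere under 30GB, and the growth factor is about 2.5x, which is in the same range as the "closer to 25GB" and "about 3x" estimates.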
> >
> >I fully understand why TarMK needs /oak:index, but I can't understand
> >(conceptually) the need to implement an index inside a database table.
> >It's like trying to implement an inverted index in an RDBMS table, which,
> >as everyone who has ever tried (or used) that approach knows, doesn't
> >scale nearly as far as Lucene bitmaps.
> >
> >Could /oak:index be replaced by something that doesn't generate
> >Documents/db rows as fast as it does?
> >
> >Best Regards
> >Ian
>
>
