Hi Ian,

There are two main reasons why we cannot use DocumentStore-based
indexes for this purpose:

- MongoDB only supports a limited number of indexes (64 per
  collection), and applications usually need more indexes than
  that.

- Data in Oak is multi-versioned. It must be possible to query
  nodes at a specific revision of the tree.
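
To illustrate the second point, here is a minimal sketch of a
revision-aware lookup. It is not Oak's actual data model: the
VersionedProperty class, the query function and the integer revisions
are all assumptions made for illustration (real Oak revisions are more
complex). It only shows why an index over the latest value cannot
answer a query at an older revision.

```python
from bisect import bisect_right


class VersionedProperty:
    """Hypothetical model: one property plus its value history by revision."""

    def __init__(self):
        self._revs = []    # revision numbers, in increasing order
        self._values = []  # value written at the matching revision

    def set(self, revision, value):
        # Assumption: revisions are written in strictly increasing order.
        self._revs.append(revision)
        self._values.append(value)

    def get(self, revision):
        # Return the value visible at `revision`, or None if not yet set.
        i = bisect_right(self._revs, revision)
        return self._values[i - 1] if i else None


def query(nodes, value, revision):
    """Return the paths whose property equals `value` as of `revision`."""
    return sorted(p for p, prop in nodes.items() if prop.get(revision) == value)


nodes = {"/a": VersionedProperty(), "/b": VersionedProperty()}
nodes["/a"].set(1, "draft")
nodes["/a"].set(3, "published")
nodes["/b"].set(2, "draft")

at_rev_2 = query(nodes, "draft", 2)  # both nodes are still drafts here
at_rev_3 = query(nodes, "draft", 3)  # /a has been published by now
```

An index that only stored the current value of each node could answer
the query at revision 3, but not the one at revision 2.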

Lucene indexes are more efficient, but are only updated
asynchronously. Whether this is acceptable depends on the
application's requirements. Experience so far shows that many
indexes can be asynchronous, because there is no hard requirement
for synchronous index updates.
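
As a rough sketch of what asynchronous means for a reader of the
index (the AsyncIndex class and its method names are hypothetical, not
Oak API): a committed change only becomes visible to index lookups
after the next background update has run.

```python
class AsyncIndex:
    """Hypothetical index that is brought up to date by a background job."""

    def __init__(self):
        self.index = {}    # value -> set of paths
        self.pending = []  # committed changes not yet indexed

    def on_commit(self, path, value):
        # The commit returns without touching the index.
        self.pending.append((path, value))

    def run_update(self):
        # Assumption: called periodically by a background job.
        for path, value in self.pending:
            self.index.setdefault(value, set()).add(path)
        self.pending.clear()

    def lookup(self, value):
        return sorted(self.index.get(value, ()))


idx = AsyncIndex()
idx.on_commit("/a", "x")
before = idx.lookup("x")  # the change is not indexed yet
idx.run_update()
after = idx.lookup("x")   # visible once the update has run
```

Whether that window between commit and index update is acceptable is
the application-level question mentioned above.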

Regards
 Marcel

On 08/07/15 18:18, "ianbos...@gmail.com on behalf of Ian Boston" wrote:

>Hi,
>I am confused at how /oak:index works and why it is needed in a MongoDB
>setting which has native database indexes that appear to cover the same
>functionality. Could the Oak Query engine use DB indexes directly for all
>indexes that are built into Oak, and Lucene indexes for all custom
>indexes ?
>
>I am asking this because in MongoDB I observe that 60% of the size of the
>nodes collection is attributable to /oak:index, and that the 60% increases
>every non sparse MongoDB index by about 3x. An _id + _modified compound
>index in MongoDB comes out at about 70GB for 100M documents (in part due to
>the size of _id). Without the duplication /oak:index it could be closer to
>25GB. Disk space is cheap, but MongoDB working set RAM is not cheap,
>neither is page fault IO.
>
>I fully understand why TarMK needs /oak:index, but I can't understand
>(conceptually) the need to implement an index inside a database table.
>It's like trying to implement an inverted index in an RDBMS table, which,
>as everyone who has ever tried (or used) that approach knows, doesn't scale
>nearly as far as Lucene bitmaps.
>
>Could /oak:index be replaced by something that doesn't generate
>Documents/db rows as fast as it does ?
>
>Best Regards
>Ian
