Hi Ian,

Your proposal would not be very efficient.
The concurrency control mechanism that MongoDB 2.6 (the currently
supported version) offers, while not negligible, would not benefit the
write load much. And the read side, which we can assume is the bulk of the
workload JCR generates, is not affected by it at all.
Consider that every read from the JCR would then require a complex M/R
operation, which by design spans the full set of documents in a given
collection, and that operation would have to be repeated across every
affected collection. Not very efficient.
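To make the fan-out concrete, here is a minimal Python sketch (plain dicts standing in for MongoDB collections; the paths are made up, and the per-level split is a hypothetical reading of the proposal). Oak's DocumentNodeStore keys node documents as "<depth>:<path>", so in a single collection the children of a path share an `_id` prefix and one ordered range scan answers the query; with documents split into one collection per tree level, a subtree read has to touch every level's collection:

```python
# Illustrative only: dicts stand in for MongoDB collections.
# Oak's DocumentNodeStore keys node documents as "<depth>:<path>",
# e.g. "2:/content/site", so children of a path share an _id prefix.

def make_id(path):
    depth = 0 if path == "/" else path.rstrip("/").count("/")
    return f"{depth}:{path}"

docs = {make_id(p): {"path": p} for p in
        ["/", "/content", "/content/site", "/content/site/en",
         "/var", "/var/audit"]}

# Single collection: children of a parent = one prefix range scan.
def children_single(parent):
    depth = (0 if parent == "/" else parent.rstrip("/").count("/")) + 1
    prefix = f"{depth}:{parent.rstrip('/')}/"
    return sorted(k for k in docs if k.startswith(prefix))

# Hypothetical per-level split: a subtree read must visit every
# level's collection -- the fan-out argued against above.
by_level = {}
for k, d in docs.items():
    by_level.setdefault(int(k.split(":")[0]), {})[k] = d

def subtree_split(root):
    hits = []
    for level, coll in sorted(by_level.items()):  # one query per collection
        hits += [k for k, d in coll.items()
                 if d["path"] == root
                 or d["path"].startswith(root.rstrip("/") + "/")]
    return sorted(hits), len(by_level)  # matches + collections touched

print(children_single("/content/site"))
print(subtree_split("/content"))
```

Even in this toy tree, reading the /content subtree touches four collections in the split layout, versus a single range scan per level in the current one.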

The existing mechanism is far simpler and more efficient.
And with the upcoming support for WiredTiger, concurrency control (the
potential issue here) becomes irrelevant.

Also, don't forget that you cannot predict how many child nodes a given
system will create to define its content tree.
If you end up with a very large number of documents at one specific level,
you would need to treat that collection separately (when scaling, shard
just that collection and not the others), bringing in more operational
complexity.
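A rough Python sketch of that asymmetry (shard names, the hash routing, and the sample ids are all invented for illustration, loosely mimicking a hashed shard key): only the hot collection gets distributed, while everything else stays on a single shard, which is the extra operational complexity being pointed out.

```python
import hashlib

SHARDS = ["shard0", "shard1", "shard2"]

def shard_for(doc_id, sharded):
    """Hash-route ids of a sharded collection; unsharded
    collections stay on the first shard."""
    if not sharded:
        return SHARDS[0]
    h = int(hashlib.md5(doc_id.encode()).hexdigest(), 16)
    return SHARDS[h % len(SHARDS)]

# Only the deeply nested "hot" level is sharded; the rest are not.
placement = {}
for i in range(9):
    doc_id = f"5:/a/b/c/d/node{i}"  # hypothetical hot level
    placement.setdefault(shard_for(doc_id, sharded=True), []).append(i)

print({s: len(ids) for s, ids in sorted(placement.items())})
print(shard_for("1:/content", sharded=False))  # unsharded -> shard0
```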

A better discussion point would be separating the blobs collection into
its own database, given the flexibility JCR offers when treating these
two data types.
Actually, this reminded me that I had been meaning to file a JIRA issue
on this matter <https://issues.apache.org/jira/browse/OAK-2984>.

As Chetan mentions, sharding comes into play once we have to scale the
write throughput of the system.

N.


On Fri, Jun 12, 2015 at 4:15 PM, Chetan Mehrotra <chetan.mehro...@gmail.com>
wrote:

> On Fri, Jun 12, 2015 at 7:32 PM, Ian Boston <i...@tfd.co.uk> wrote:
> > Initially I was thinking about the locking behaviour but I realised 2.6.*
> > is still locking at a database level, and that only changes to a
> > collection level in 3.0 with MMAPv1, and row level if you switch to WiredTiger [1].
>
> I initially thought the same and then we benchmarked the throughput by
> placing the BlobStore in a separate database (OAK-1153). But we did not
> observe any significant gains, so that approach was not pursued
> further. If we have some benchmark which can demonstrate that write
> throughput increases if we _shard_ the node collection into a separate
> database on the same server, then we can look further there.
>
> Chetan Mehrotra
>
