Hi Norberto,

Thank you for the feedback on the questions. I see you work as an
Evangelist for MongoDB, so you will probably know the answers and can save
me time. I agree it's not worth doing anything about concurrency even if
logs indicate there is contention on locks in 2.6, as the added complexity
would make things worse. If an upgrade to 3.0 has been done, anything
collection-based is a waste of time due to the availability of WiredTiger.

Could you confirm that separating one large collection into a number of
smaller collections will not reduce the size of the indexes that have to be
consulted for queries of the form that Chetan shared earlier?

I'll try to clarify that question. DocumentNodeStore has one collection,
"nodes", containing all Documents. Some queries are only interested in a
key space representing a certain part of the "nodes" collection, e.g.
n:/largelystatic/**. If those Documents were stored in nodes_x, and
count(nodes_x) <= 0.001*count(nodes), would there be any performance
advantage, or does MongoDB, under the covers, treat all collections as a
single massive collection from an index and query point of view?
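
To make the query shape concrete, here is a rough sketch of the two
alternatives using the 2.x Java driver (the database name, collection names
and path prefix are all illustrative, and the "depth:path" _id format is my
reading of the DocumentNodeStore convention):

    import com.mongodb.BasicDBObject;
    import com.mongodb.DB;
    import com.mongodb.DBCollection;
    import com.mongodb.DBObject;
    import com.mongodb.MongoClient;

    public class PrefixQuerySketch {
        public static void main(String[] args) throws Exception {
            MongoClient client = new MongoClient("localhost", 27017);
            DB db = client.getDB("oak"); // database name is illustrative

            // Range over _id covering everything under /largelystatic; '0' is
            // the ASCII character just after '/', so it bounds the prefix.
            DBObject prefixRange = new BasicDBObject("_id",
                    new BasicDBObject("$gte", "2:/largelystatic/")
                            .append("$lt", "2:/largelystatic0"));

            // Alternative A: one large "nodes" collection, one large _id index.
            DBCollection nodes = db.getCollection("nodes");
            long inNodes = nodes.find(prefixRange).count();

            // Alternative B: a hypothetical split-out collection whose _id
            // index is ~1000x smaller; does the same query get cheaper?
            DBCollection nodesX = db.getCollection("nodes_x");
            long inNodesX = nodesX.find(prefixRange).count();

            System.out.println(inNodes + " vs " + inNodesX);
            client.close();
        }
    }

Alternative B only helps if the cost of walking the _id index actually
depends on the index size, which is exactly what I am unsure about.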

If you have any pointers to how 2.6 scales relative to collection size,
number of collections and index size, that would help me understand more
about its behaviour.

Best Regards
Ian

On 12 June 2015 at 17:08, Norberto Leite <norbe...@norbertoleite.com> wrote:

> Hi Ian,
>
> Your proposal would not be very efficient.
> The concurrency control mechanism that 2.6 (the currently supported
> version) offers, although not negligible, would not be that beneficial for
> the write load. The reading part, which we can assume is the bulk of the
> workload that JCR will be doing, is not affected by it.
> One needs to consider that any read from the JCR involving a complex M/R
> operation, which is designed to span the full set of documents in a given
> collection, would need to be repeated over all affected collections. Not
> very efficient.
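> To make that concrete, a rough sketch with the 2.x Java driver (the
> map/reduce functions and collection names are purely illustrative): a
> mapReduce call scans the whole collection it is run against, so a split
> key space means running it once per collection and then merging the
> partial outputs.
>
>     import com.mongodb.BasicDBObject;
>     import com.mongodb.DB;
>     import com.mongodb.MongoClient;
>
>     public class MapReduceSketch {
>         public static void main(String[] args) throws Exception {
>             // Illustrative only: count documents per top-level path segment.
>             String map = "function() { emit(this._id.split('/')[1], 1); }";
>             String reduce = "function(key, values) { return Array.sum(values); }";
>
>             MongoClient client = new MongoClient("localhost", 27017);
>             DB db = client.getDB("oak");
>
>             // One collection: a single pass, scanning every document in "nodes".
>             db.getCollection("nodes").mapReduce(map, reduce, "mr_out",
>                     new BasicDBObject());
>
>             // Split collections: the same pass repeated per collection, with
>             // the partial outputs still to be merged afterwards.
>             for (String name : new String[] { "nodes_a", "nodes_b", "nodes_x" }) {
>                 db.getCollection(name).mapReduce(map, reduce, "mr_" + name,
>                         new BasicDBObject());
>             }
>             client.close();
>         }
>     }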
>
> The existing mechanism is far simpler and more efficient.
> With the upcoming support for WiredTiger, the concurrency control
> (potential issue) becomes totally irrelevant.
>
> Also don't forget that you cannot predict the number of child nodes that a
> given system would create to define its content tree.
> If you do have a very large number of documents nested at a specific
> level, you would need to treat that collection separately (when needing to
> scale, shard just that collection and not the others), bringing in more
> operational complexity.
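> For example (a sketch only; the router address, namespace and shard key
> are illustrative), sharding just that one collection adds admin steps that
> the other collections do not need:
>
>     import com.mongodb.BasicDBObject;
>     import com.mongodb.DB;
>     import com.mongodb.MongoClient;
>
>     public class ShardOneCollection {
>         public static void main(String[] args) throws Exception {
>             // Must be run against a mongos router of a sharded cluster.
>             MongoClient client = new MongoClient("mongos-host", 27017);
>             DB admin = client.getDB("admin");
>
>             // Enable sharding for the database, then shard only the hot
>             // collection; "oak.nodes_x" and the _id key are illustrative.
>             admin.command(new BasicDBObject("enableSharding", "oak"));
>             admin.command(new BasicDBObject("shardCollection", "oak.nodes_x")
>                     .append("key", new BasicDBObject("_id", 1)));
>             client.close();
>         }
>     }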
>
> What could be a good discussion point is separating the blobs collection
> into its own database, given the flexibility that JCR offers when treating
> these 2 different data types.
> Actually, this reminded me that I had been meaning to submit a JIRA
> request on this matter <https://issues.apache.org/jira/browse/OAK-2984>.
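> A minimal sketch of that split (the database names are illustrative),
> since the two data types never need to be queried together:
>
>     import com.mongodb.DBCollection;
>     import com.mongodb.MongoClient;
>
>     public class SeparateBlobDatabase {
>         public static void main(String[] args) throws Exception {
>             MongoClient client = new MongoClient("localhost", 27017);
>
>             // Node documents and blobs in different databases, so each can
>             // be moved, backed up or (pre-3.0) locked independently.
>             DBCollection nodes = client.getDB("oak").getCollection("nodes");
>             DBCollection blobs = client.getDB("oak_blobs").getCollection("blobs");
>
>             System.out.println(nodes.getFullName() + " / " + blobs.getFullName());
>             client.close();
>         }
>     }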
>
> As Chetan mentioned, sharding comes into play once we have to scale the
> write throughput of the system.
>
> N.
>
>
> On Fri, Jun 12, 2015 at 4:15 PM, Chetan Mehrotra <
> chetan.mehro...@gmail.com>
> wrote:
>
> > On Fri, Jun 12, 2015 at 7:32 PM, Ian Boston <i...@tfd.co.uk> wrote:
> > > Initially I was thinking about the locking behaviour, but I realise
> > > 2.6.* is still locking at the database level, and that only changes to
> > > the collection level in 3.0 with MMAPv1, and to the document level if
> > > you switch to WiredTiger [1].
> >
> > I initially thought the same, and then we benchmarked the throughput by
> > placing the BlobStore in a separate database (OAK-1153), but did not
> > observe any significant gains, so that approach was not pursued further.
> > If we have a benchmark which can demonstrate that write throughput
> > increases if we _shard_ the node collection into a separate database on
> > the same server, then we can look further there.
> >
> > Chetan Mehrotra
> >
>
