Re: Oak Scalability: Load Distribution

Thomas Mueller Thu, 27 Feb 2014 08:19:34 -0800

Hi,

>The path depth is prepended to
>the path to ensure that the nodes are distributed more equally.


Actually, the reason for the prefix is not that the nodes are distributed
more equally, but so that queries for child nodes are efficient, and so
that siblings are stored next to each other. Queries for child nodes are
range queries of the form "id between '2:/content/' and '2:/content0'".
This is efficient because MongoDB keeps documents sorted by id. For more
details about range queries, see
http://docs.mongodb.org/manual/core/index-single/
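
To illustrate, here is a minimal Python sketch of the id scheme and the
range bounds. It is not the Oak code: the function names are made up, and a
sorted in-memory list stands in for the MongoDB id index.

```python
from bisect import bisect_left


def depth_of(path):
    # "/" has depth 0, "/content" has depth 1, "/content/a" has depth 2.
    return 0 if path == "/" else path.count("/")


def doc_id(path):
    # Prepend the depth, e.g. "/content/a" -> "2:/content/a".
    return f"{depth_of(path)}:{path}"


def child_range(parent):
    # Children live one level below the parent and share the prefix
    # "<depth>:<parent>/". The upper bound replaces the trailing "/"
    # with the next character ("0"), as in the example query above.
    depth = depth_of(parent) + 1
    prefix = parent if parent.endswith("/") else parent + "/"
    lower = f"{depth}:{prefix}"
    upper = lower[:-1] + chr(ord("/") + 1)
    return lower, upper


def children(sorted_ids, parent):
    # One contiguous slice of the sorted ids holds all direct children.
    lower, upper = child_range(parent)
    return sorted_ids[bisect_left(sorted_ids, lower):bisect_left(sorted_ids, upper)]
```

Because the depth is part of the prefix, grandchildren such as
"3:/content/a/x" fall outside the range, so only direct children are
returned.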

>Cards: /content/<tenant>/<board>/<card>
>Comments: /content/<tenant>/<board>/<card>/comments/<comment>
>
>As you can see, all cards and all comments are saved on the same level and
>hence end up on the same cluster node.

In this case, cards are stored next to each other, and comments are stored
next to each other. But not necessarily on the same cluster node.

> If we assume that every card gets
>10 comments, this will cause 10 times more write load on the "comments"
>cluster node than on the "cards" cluster node.

MongoDB will distribute the nodes evenly across shards. In the extreme
case, if there are 10 shards, and if 10% of the data is cards and 90% is
comments, then one cluster node will have all the cards, while the
comments are distributed across the remaining cluster nodes.
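
A small Python simulation of that extreme case. This is an idealized model,
not how MongoDB actually balances chunks: it assumes range-based sharding on
the id, with the sorted ids cut into 10 equally sized contiguous ranges. The
paths and helper names are made up, and only card and comment documents are
counted (intermediate nodes are left out for simplicity).

```python
def doc_id(path):
    # Depth-prefixed id, e.g. "/content/t/b/card000" -> "4:/content/t/b/card000".
    depth = 0 if path == "/" else path.count("/")
    return f"{depth}:{path}"


def split_into_shards(sorted_ids, shard_count):
    # Assumption: each shard gets one contiguous, equally sized id range.
    size = -(-len(sorted_ids) // shard_count)  # ceiling division
    return [sorted_ids[i:i + size] for i in range(0, len(sorted_ids), size)]


# 100 cards with 9 comments each: 10% cards, 90% comments.
cards = [f"/content/t/b/card{c:03d}" for c in range(100)]
comments = [f"{card}/comments/c{n}" for card in cards for n in range(9)]

ids = sorted(doc_id(p) for p in cards + comments)
shards = split_into_shards(ids, 10)

for number, shard in enumerate(shards):
    card_count = sum(1 for i in shard if i.startswith("4:"))
    print(number, "cards:", card_count, "comments:", len(shard) - card_count)
```

In this model the first range holds only cards and the other nine hold only
comments, so the comment writes are spread over nine ranges rather than
landing on a single one.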

>A much better distribution could be achieved if the hash/checksum of the
>parent node path would be used instead of the path depth.

Sure, we can do some experiments and try it out. My fear is that using an
index on randomly distributed data will perform poorly, and we might end
up with problems similar to those we had with Jackrabbit 2.x. But I might be
wrong.
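
For such an experiment, the proposed scheme could look roughly like the
following Python sketch. This is purely hypothetical: Oak does not do this,
and the function name, the choice of SHA-256, and the 16-character prefix
are arbitrary assumptions.

```python
import hashlib


def hashed_id(path):
    # Prefix the path with a hash of the parent path instead of the depth.
    parent = path.rsplit("/", 1)[0] or "/"
    prefix = hashlib.sha256(parent.encode("utf-8")).hexdigest()[:16]
    return f"{prefix}:{path}"


card1 = hashed_id("/content/t/b/card1")
card2 = hashed_id("/content/t/b/card2")
comment = hashed_id("/content/t/b/card1/comments/c1")

# Siblings share a prefix, so they still sort next to each other.
print(card1.split(":")[0] == card2.split(":")[0])
# Children of different parents get unrelated prefixes.
print(card1.split(":")[0] == comment.split(":")[0])
```

Siblings stay adjacent, so a child query is still a single range. However,
the children of two neighbouring parents end up at unrelated positions in
the index, which is the random distribution I am worried about.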

Regards,
Thomas
