Tim has forwarded me the email since I was removed from the thread.

On 27/02/14 17:35, "Timothee Maret" <[email protected]> wrote:

>
>________________________________________
>From: Thomas Mueller
>Sent: Thursday, February 27, 2014 5:12 PM
>To: [email protected]
>Cc: Marcel Reutegger; Ian Boston; Timothee Maret
>Subject: Re: Oak Scalability: Load Distribution
>
>Hi,
>
>>The path depth is prepended to
>>the path to ensure that the nodes are distributed more equally.
>
>Actually, the reason for the prefix is not that the nodes are distributed
>more equally, but so that queries for child nodes are efficient, and so
>that siblings are stored next to each other. Queries for child nodes are
>range queries of the form "id between '2:/content/' and '2:/content0'".
>This is efficient because MongoDB keeps documents sorted by id. For more
>details about range queries, see
>http://docs.mongodb.org/manual/core/index-single/

Ok, I understand; however, this doesn’t change my statement.
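For reference, here is a minimal Python sketch of the depth-prefixed ids and the child-node range query described above (helper names are mine, and this simplifies the real implementation, e.g. it ignores the root path):

```python
def depth_key(path):
    # Depth-prefixed id as described above: number of path
    # elements, a colon, then the path itself.
    return f"{path.count('/')}:{path}"

def child_range(parent):
    # Bounds for "all direct children of parent": children have
    # depth parent_depth + 1, and '0' sorts directly after '/',
    # so [d:parent/, d:parent0) covers exactly the children.
    d = parent.count("/") + 1
    return f"{d}:{parent}/", f"{d}:{parent}0"

print(depth_key("/content/a/1"))  # 3:/content/a/1
print(child_range("/content"))    # ('2:/content/', '2:/content0')
```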

>
>>Cards: /content/<tenant>/<board>/<card>
>>Comments: /content/<tenant>/<board>/<card>/comments/<comment>
>>
>>As you can see, all cards and all comments are saved on the same level
>>and
>>hence end up on the same cluster node.
>
>In this case, cards are stored next to each other, and comments are stored
>next to each other. But not necessarily on the same cluster node.

I agree that this need not always be the case (e.g. if every tenant has a
lot of other data on the same level below /content), but the probability
that they end up on the same cluster node is still far too high.

>
>> If we assume that every card gets
>10 comments, this will cause 10 times more write load on the "comments"
>cluster node than on the "cards" cluster node.
>
>MongoDB will distribute the nodes evenly across shards. In the extreme
>case, if there are 10 shards, and if 10% of the data is cards and 90% is
>comments, then one cluster node will have all the cards, while the
>comments are distributed across the remaining cluster nodes.

Which means that one cluster node is much less busy with writes. In
reality, there is also a lot of other (often read-only) data, saved on -
let’s say - another 10 nodes, which would then see far fewer write
operations as well.
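To illustrate, here is a rough Python sketch with an invented data set (real MongoDB chunk splits are dynamic; fixed, equal-size slices of the sorted id space are a crude stand-in):

```python
def depth_key(path):
    # Depth-prefixed id: number of path elements, colon, path.
    return f"{path.count('/')}:{path}"

# Invented workload: 100 cards with 10 comments each.
cards = [f"/content/t/b/card{i:03d}" for i in range(100)]
comments = [f"{c}/comments/c{j}" for c in cards for j in range(10)]

ids = sorted(depth_key(p) for p in cards + comments)

# Crude stand-in for range sharding: 10 shards, each owning a
# contiguous slice of the sorted id space.
n = len(ids) // 10
shards = [ids[i * n:(i + 1) * n] for i in range(10)]

# Cards all share the "4:" depth prefix, so they sort together
# and land on a single shard; the other nine shards hold only
# comments and absorb ~90% of the write load.
cards_per_shard = [sum(k.startswith("4:") for k in s) for s in shards]
print(cards_per_shard)  # [100, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```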

>
>>A much better distribution could be achieved if the hash/checksum of the
>>parent node path would be used instead of the path depth.
>
>Sure, we can do some experiments and try it out. My fear is that using an
>index on randomly distributed data will perform poorly, and we might end
>up with similar problems to those we had with Jackrabbit 2.x. But I might be
>wrong.

I don’t understand the problem. My suggestion is just to use a hash/short
checksum of the parent node path instead of the node level/path depth:

cc24:/content/a/1
cc24:/content/a/2
0ef7:/content/b/1
0ef7:/content/b/2
09d7:/content/c/1
09d7:/content/c/2

(where the first 4 characters are the first 4 characters of the SHA1 of
the parent node path)

instead of:

3:/content/a/1
3:/content/a/2
3:/content/b/1
3:/content/b/2
3:/content/c/1
3:/content/c/2

The advantage is a much better distribution for all nodes that live in a
bucket (such as a tenant, a board, or an artificial bucket).
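As a Python sketch of the proposed scheme (note that the actual hash values will differ from the illustrative cc24/0ef7/09d7 prefixes above):

```python
import hashlib

def hash_key(path):
    # Proposed id: first 4 hex characters of the SHA1 of the
    # parent path, a colon, then the path. Siblings share a
    # prefix and still sort next to each other, while different
    # parents are scattered across the id space.
    parent = path.rsplit("/", 1)[0]
    prefix = hashlib.sha1(parent.encode("utf-8")).hexdigest()[:4]
    return f"{prefix}:{path}"

k1 = hash_key("/content/a/1")
k2 = hash_key("/content/a/2")
assert k1.split(":")[0] == k2.split(":")[0]  # siblings share a prefix
```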

Regards, Joel

>
>Regards,
>Thomas
>
