Hi,

>The path depth is prepended to
>the path to ensure that the nodes are distributed more equally.

Actually, the reason for the prefix is not that the nodes are distributed more equally, but so that queries for child nodes are efficient, and so that siblings are stored next to each other. Queries for child nodes are range queries of the form "id between '2:/content/' and '2:/content0'". This is efficient because MongoDB keeps documents sorted by id. For more details about range queries, see http://docs.mongodb.org/manual/core/index-single/

>Cards: /content/<tenant>/<board>/<card>
>Comments: /content/<tenant>/<board>/<card>/comments/<comment>
>
>As you can see, all cards and all comments are saved on the same level and
>hence end up on the same cluster node.

In this case, cards are stored next to each other, and comments are stored next to each other, but not necessarily on the same cluster node.

>If we assume that every card gets
>10 comments, this will cause 10 times more write load on the "comments"
>cluster node than on the "cards" cluster node.

MongoDB will distribute the nodes evenly across shards. In the extreme case, if there are 10 shards, and if 10% of the data is cards and 90% is comments, then one cluster node will have all the cards, while the comments are distributed across the remaining cluster nodes.

>A much better distribution could be achieved if the hash/checksum of the
>parent node path would be used instead of the path depth.

Sure, we can do some experiments and try it out. My fear is that using an index on randomly distributed data will perform poorly, and we might end up with problems similar to those we had with Jackrabbit 2.x. But I might be wrong.

Regards,
Thomas
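
For illustration, the child-node range query mentioned above ("id between '2:/content/' and '2:/content0'") can be sketched roughly as follows. This is only a minimal sketch, not the actual implementation: a sorted Python list stands in for MongoDB's sorted id index, and the helper names doc_id, child_range and children are hypothetical. It relies on '0' being the character right after '/' in ASCII, so the two bounds enclose exactly the ids that start with the parent prefix at the child depth.

```python
import bisect


def depth_of(path):
    """Number of path elements; the root "/" has depth 0."""
    return 0 if path == "/" else path.count("/")


def doc_id(path):
    """Build an id of the form '<depth>:<path>' (hypothetical helper)."""
    return f"{depth_of(path)}:{path}"


def child_range(parent_path):
    """Return the (lower, upper) id bounds for direct children of parent_path."""
    child_depth = depth_of(parent_path) + 1
    base = parent_path.rstrip("/")
    lower = f"{child_depth}:{base}/"
    # '0' sorts directly after '/', so this is the first id past the prefix.
    upper = f"{child_depth}:{base}0"
    return lower, upper


def children(sorted_ids, parent_path):
    """Range scan over a sorted list, standing in for the sorted id index."""
    lower, upper = child_range(parent_path)
    lo = bisect.bisect_right(sorted_ids, lower)
    hi = bisect.bisect_left(sorted_ids, upper)
    return sorted_ids[lo:hi]


# Made-up example paths, not taken from a real repository.
paths = ["/", "/content", "/other", "/content/a", "/content/b",
         "/contentx/c", "/content/a/x"]
ids = sorted(doc_id(p) for p in paths)

print(child_range("/content"))
print(children(ids, "/content"))
```

Because siblings share the same depth prefix and parent path, they end up adjacent in the sorted order, so the lookup is two binary searches plus one contiguous slice. With a hash of the parent path as prefix, siblings would still be adjacent to each other, but different parents would be scattered across the index.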
