Hi,

During the TechFair I talked with Marcel about the MongoDB Microkernel and asked some questions about clustering. He explained that the nodes are distributed over the cluster nodes based on a key that consists of the path depth and the node path. The path depth is prepended to the path to ensure that the nodes are distributed more evenly.

However, I fear that this will cause trouble, because often only certain kinds of data are written heavily. In our current project, for example, we will have a lot of card and comment writes, and all cards are on one level and all comments on another (the paths are simplified):
Cards:    /content/<tenant>/<board>/<card>
Comments: /content/<tenant>/<board>/<card>/comments/<comment>

As you can see, all cards share one path depth and all comments share another, so all cards end up on one cluster node and all comments on another. If we assume that every card gets 10 comments, the “comments” cluster node will receive 10 times more write load than the “cards” cluster node. The same holds for cards compared to other content.

A much better distribution could be achieved if a hash/checksum of the parent node path were used instead of the path depth. Children of the same parent would still be saved on the same cluster node, but the cards of different boards and the comments of different cards would be spread over different cluster nodes.

I strongly recommend reconsidering the choice of the path depth as the prefix of the MongoDB key, since this will lead to really bad load distribution.

Regards,
Joel
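
P.S. To make the concern concrete, here is a minimal Python sketch that compares the two prefixes. It is only an illustration under assumptions: the key format "<prefix>:<path>", the function names, the use of a truncated SHA-256 as the hash, and the sample tenant/board/card names are all hypothetical and not taken from the actual MongoDB Microkernel code. It also assumes that documents with the same prefix land on the same cluster node.

```python
import hashlib
import posixpath
from collections import Counter


def depth_key(path: str) -> str:
    """Key with the path depth as prefix (assumed format '<depth>:<path>')."""
    depth = 0 if path == "/" else path.count("/")
    return f"{depth}:{path}"


def parent_hash_key(path: str) -> str:
    """Proposed alternative: a short hash of the parent path as prefix."""
    parent = posixpath.dirname(path)
    digest = hashlib.sha256(parent.encode("utf-8")).hexdigest()[:8]
    return f"{digest}:{path}"


def prefix(key: str) -> str:
    """Return the part of the key before the first ':'."""
    return key.split(":", 1)[0]


# Hypothetical sample data: 2 boards, 3 cards per board, 10 comments per card.
cards = [f"/content/acme/board{b}/card{c}" for b in range(2) for c in range(3)]
comments = [f"{card}/comments/comment{n}" for card in cards for n in range(10)]

# Count how many documents fall under each prefix for both schemes.
depth_prefixes = Counter(prefix(depth_key(p)) for p in cards + comments)
hash_prefixes = Counter(prefix(parent_hash_key(p)) for p in cards + comments)

print("depth prefixes:", dict(depth_prefixes))
print("distinct parent-hash prefixes:", len(hash_prefixes))
```

With the depth prefix, all cards share one prefix and all comments share another, so the write load concentrates on two prefixes. With the parent hash prefix, siblings still share a prefix, but every board and every card contributes its own prefix, so the writes can spread over more cluster nodes.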
