Hi Thomas, Thanks for your advices.
Thomas Müller a écrit : > Hi, > > What about: > > 1) Start with the node of your choice (let's say /data/) > 2) Check how many nodes and child nodes are in that node > 3) If more than 50: pick a child node randomly from the range 00-50 and > continue at step 2) > 4) Create a node with a random name, done > With this method a content will contains its applicative child nodes and eventually other contents. This implies two problems: - if i want to delete a content, i must ensure to not delete contents below. - the versioning of a content implies versioning of the contents below and recursively the others contents below. I guess i can adapt this method with the notion of content holder with one content and multiple sub content holder. But if there are holder, there will be holes, at least without additional management. > I would make sure there are no same name siblings (for multiple reasons). > There are at least two solutions: > > A) Disallow same name siblings, use node names n00-n50, and let the > algorithm re-try if there is a clash. I'm not sure if that works well with > clustering. > I think the transient node creation will always works. But as the retrieval of the next node is not synchronized between cluster node, it's the call to save (which do synchronize and do lock) will failed, but as you said i can retry like 3 times and finally the node will be created and will be unique. > B) Use a cryptographically secure pseudo random number generator (for > example a UUID) as the node name. This works well with clustering, the node > names will be quite long however. > I see, like the storing of version history into /jcr:root/jcr:system/jcr:versionStorage. This implies a fixed hash for limiting child nodes. This can also fails if a node of the hash already exists (especially with the nodes of the first level) but we can retry. In Jackrabbit jcr:versionStorage node there is: - 00 to ff = 256 child nodes for first level. - 00 to ff = 256 child nodes for second level. - 00 to ff = 256 child nodes for third level. - no limitation for the fourth level. Ex: fff0b38e-66... => ff/f0/b3/fff0b38e-66... So, if the number generator distributes UUID with equals frequencies on the 3 first segment (ff, b0, b3 in the example) and IIUC there will be more than 10k nodes in the fourth level if the number of contents exceeds ((256)^3)*10000 = 167 772M. The number generated are rarely distributed equally and 10k is the limit, but with 167 G we can assume it will be working quite well with 500M contents which is far enough in our case. Therefore, i think this algorithm is more simple and much easier to implement and will surely produce less bugs :). I think i am gonna try this by first finding a UUID generator maybe the same as the one used in Jackrabbit from Jakarta Commons-Id project. The only drawback is that the node may have an UUID different than the name of the node which is another UUID ;). Thanks again Thomas. > Disadvantage: > - Maybe performance. To improve that, you could keep the information 'root > is full' in memory and start with a random node /data/00 - /data/50 directly > > Advantages: > - No counter required > - No synchronization required > - Holes are automatically filled > > Regards, > Thomas > > -- Sébastien Launay
