Hi Stefan, comments inline: On 18 October 2012 02:46, Stefan Guggisberg <[email protected]> wrote: > hi ian, > > On Sun, Oct 14, 2012 at 2:49 AM, Ian Boston <[email protected]> wrote: >> Hi, >> IIUC Oak has the potential to handle wide hierarchies where there are >> millions of child nodes to a parent and hence support some of the use >> cases commonly found in Social Media and cloud applications. This >> question is about the implementation to support that. >> >> IIUC from reading the code child nodes are accessed from the parent >> node, and to avoid the situation in JR2 where child nodes were >> represented as an array of objects stored with the parent node, child >> nodes in Oak are represented in a internal tree structure where the >> width of the tree is controlled to ensure concurrency and performance >> is maintained. (did I understand that correctly ?). > > in JR2 the list of child node entries (name-id pairs) is stored within the > parent node. while this is very efficient for small to medium sized child > node lists (up to a couple of k entries) read and write performance suffers > when the list grows very large. > > in the current MicroKernel implementation we use an htree based intermediate > tree structure for storing large child node collections.
phew, thats what I understood. > >> >> In choosing this approach, was a possible alternative considered where >> the child node contains a single pointer to its parent, and the child >> node is retrieved by hashing it's path rather ie hash-f(path) === >> child-node-storage-id, (eg sha1(path) == child-node-storage-id ) >> rather than navigating from the parent. > > the data model of the current MicroKernel implementation is essentially > just a DAG of node objects. the versioning model is git-like, i.e. > every change results in a new immutable tree. > > i don't see how child->parent relationships could be implemented with > this model. IIUC with path-based hashing you'd sacrifice MVCC > or am i missing something? Hmm, good point, your right, you cant MVCC the tree structure, only the data within the tree, since the path -> pointer transformation is immutable, thats a pity. Thanks for explaining. Ian > >> >> In my exposure to use cases for social media and cloud like systems I >> have observed several things. Listing child nodes from a parent >> becomes rarer the closer the application moves towards social media. >> Direct pointer based access derived from a hierarchical url is the >> main use case. For non hierarchal pointer, searching becomes dominant. >> Occasionally listing is required but only as a variant of search. >> Scans or sorted scans as nearly always avoided as being impractical. >> >> In social media content, the cardinality of all parents is low making >> finding children of parents amenable to an inverted index approach. Ie >> the parent becomes a search keyword term. >> >> (the closer to Enterprise Content Management the application is the >> inverse of the above is true) >> >> If I understood all of that correctly, my question: >> >> In circumstances where the use cases are more to the social media end >> of the spectrum, will it be possible to invert the child-parent >> relationship within Oak to be a pointer from the child with an >> optional, additional inverted index should finding children of a >> specific parent be required ? > > WRT the MicroKernel implementation: > > IMO that's unfortunately not possible with the current git-like > approach (content-based identifiers, DAG of immutable versions). > > cheers > stefan > >> >> Apologies if this is already covered and I missed it. The archives at >> Markmail didn't turn anything up. >> >> Thanks >> Ian
