hi ian, On Sun, Oct 14, 2012 at 2:49 AM, Ian Boston <[email protected]> wrote: > Hi, > IIUC Oak has the potential to handle wide hierarchies where there are > millions of child nodes to a parent and hence support some of the use > cases commonly found in Social Media and cloud applications. This > question is about the implementation to support that. > > IIUC from reading the code child nodes are accessed from the parent > node, and to avoid the situation in JR2 where child nodes were > represented as an array of objects stored with the parent node, child > nodes in Oak are represented in a internal tree structure where the > width of the tree is controlled to ensure concurrency and performance > is maintained. (did I understand that correctly ?).
in JR2 the list of child node entries (name-id pairs) is stored within the parent node. while this is very efficient for small to medium sized child node lists (up to a couple of k entries) read and write performance suffers when the list grows very large. in the current MicroKernel implementation we use an htree based intermediate tree structure for storing large child node collections. > > In choosing this approach, was a possible alternative considered where > the child node contains a single pointer to its parent, and the child > node is retrieved by hashing it's path rather ie hash-f(path) === > child-node-storage-id, (eg sha1(path) == child-node-storage-id ) > rather than navigating from the parent. the data model of the current MicroKernel implementation is essentially just a DAG of node objects. the versioning model is git-like, i.e. every change results in a new immutable tree. i don't see how child->parent relationships could be implemented with this model. IIUC with path-based hashing you'd sacrifice MVCC or am i missing something? > > In my exposure to use cases for social media and cloud like systems I > have observed several things. Listing child nodes from a parent > becomes rarer the closer the application moves towards social media. > Direct pointer based access derived from a hierarchical url is the > main use case. For non hierarchal pointer, searching becomes dominant. > Occasionally listing is required but only as a variant of search. > Scans or sorted scans as nearly always avoided as being impractical. > > In social media content, the cardinality of all parents is low making > finding children of parents amenable to an inverted index approach. Ie > the parent becomes a search keyword term. > > (the closer to Enterprise Content Management the application is the > inverse of the above is true) > > If I understood all of that correctly, my question: > > In circumstances where the use cases are more to the social media end > of the spectrum, will it be possible to invert the child-parent > relationship within Oak to be a pointer from the child with an > optional, additional inverted index should finding children of a > specific parent be required ? WRT the MicroKernel implementation: IMO that's unfortunately not possible with the current git-like approach (content-based identifiers, DAG of immutable versions). cheers stefan > > Apologies if this is already covered and I missed it. The archives at > Markmail didn't turn anything up. > > Thanks > Ian
