hi jukka

On Tue, Nov 20, 2012 at 5:24 PM, Jukka Zitting wrote:
> Hi,
>
> A lot of functionality in Oak (node states, the diff and hook
> mechanisms, etc.) is based on walking down the tree hierarchy one
> level at a time. To do this, for example to access changes below
> /a/b/c, oak-core will currently request paths /a, /a/b, /a/b/c and so
> on from the underlying MK implementation.
>
> This would work reasonably well with MK implementations that are
> essentially big hash tables that map the full path (and revision) to
> the content at that location. Even then there's some space overhead as
> even tiny nodes (think of an ACL entry) get paired with the full path
> (and revision) of the node. The current MongoMK with its path keys
> works like this, though even there a secondary index is needed for the
> path lookups.
>
> The approach is less ideal for MK implementations (like the default
> H2-based one) that have to traverse the path when some content is
> accessed. For example, with the above oak-core access pattern, the
> sequence of accessed nodes would be [ a, a, b, a, b, c ], where
> ideally just [ a, b, c ] would suffice. The KernelNodeStore cache in
> oak-core prevents this from being too big an issue, but ideally we'd
> be able to avoid such extra levels of caching.
>
> To solve that mismatch without impacting the overall architecture too
> much I'd like to propose the following:
>
> * When requested using the filter argument, the getNodes() call may
> (but is not required to) return special ":hash" or ":id" properties as
> parts of the (possibly otherwise empty) child node objects included in
> the JSON response.
>
> * When returned by getNodes(), those values can be used by the client
> instead of the normal path argument when requesting the content of
> such child nodes using other getNodes() calls.
> The MK implementation is expected to automatically detect whether a
> given string argument is a path, a hash or an identifier, possibly as
> simply as looking at whether it starts with a slash.
>
> * Both ":hash" and ":id" values are expected to uniquely identify a
> specific immutable state of a node. The only difference is that the
> inequality of two hashes implies the inequality of the referenced
> nodes (which can be used by oak-core to optimize some operations),
> whereas it's possible for two different ids to refer to nodes with the
> exact same content.
>
> Such a solution would allow the following sequence
>
>     getNodes("/") => { "a": {} }
>     getNodes("/a") => { "b": {} }
>     getNodes("/a/b") => { "c": {} }
>     getNodes("/a/b/c") => {}
>
> to become something like
>
>     getNodes("/") => { "a": { ":id": "x" } }
>     getNodes("x") => { "b": { ":id": "y" } }
>     getNodes("y") => { "c": { ":id": "z" } }
>     getNodes("z") => {}
>
> with x, y and z being some implementation-specific identifiers, like
> ObjectIDs in MongoDB.
>
> In any case the MK implementation would still be required to support
> access by full path.
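A minimal, self-contained Java sketch of the proposed access pattern, under stated assumptions: the class `IdLookupSketch`, its single-argument `getNodes(String)` and the in-memory tree are hypothetical stand-ins, not the real MicroKernel API (the real getNodes() also takes revision, filter and other arguments, all omitted here). The toy store records which nodes each call touches, so it reproduces the [ a, a, b, a, b, c ] versus [ a, b, c ] difference described in the proposal, and it tells paths from ids purely by the leading slash.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical toy MicroKernel (NOT the real Oak API): an in-memory tree whose
 * getNodes() returns ":id" values for child nodes and accepts either a path
 * (leading slash) or such an id as its argument.
 */
public class IdLookupSketch {

    /** Node with an implementation-specific identifier. */
    static final class Node {
        final String name;
        final String id;
        final Map<String, Node> children = new LinkedHashMap<>();

        Node(String name, String id) {
            this.name = name;
            this.id = id;
        }
    }

    private final Node root = new Node("", "r");
    private final Map<String, Node> byId = new HashMap<>();
    /** Names of the nodes touched by lookups, to compare access patterns. */
    final List<String> visited = new ArrayList<>();

    IdLookupSketch() {
        // Build /a/b/c with the ids x, y and z used in the mail's example.
        Node a = new Node("a", "x");
        Node b = new Node("b", "y");
        Node c = new Node("c", "z");
        root.children.put("a", a);
        a.children.put("b", b);
        b.children.put("c", c);
        for (Node n : List.of(root, a, b, c)) {
            byId.put(n.id, n);
        }
    }

    /** A path is walked segment by segment from the root; anything else is an id. */
    private Node resolve(String pathOrId) {
        if (!pathOrId.startsWith("/")) {
            Node n = byId.get(pathOrId);
            if (n == null) {
                throw new IllegalArgumentException("unknown id: " + pathOrId);
            }
            visited.add(n.name); // a single direct lookup
            return n;
        }
        Node n = root;
        for (String name : pathOrId.split("/")) {
            if (name.isEmpty()) {
                continue; // skip the empty segment before the leading slash
            }
            n = n.children.get(name);
            if (n == null) {
                throw new IllegalArgumentException("no such path: " + pathOrId);
            }
            visited.add(name); // one node touched per path segment
        }
        return n;
    }

    /** Returns the child nodes as JSON, each (otherwise empty) child carrying its ":id". */
    String getNodes(String pathOrId) {
        Node n = resolve(pathOrId);
        if (n.children.isEmpty()) {
            return "{}";
        }
        StringBuilder json = new StringBuilder("{ ");
        String sep = "";
        for (Node child : n.children.values()) {
            json.append(sep).append('"').append(child.name)
                .append("\": { \":id\": \"").append(child.id).append("\" }");
            sep = ", ";
        }
        return json.append(" }").toString();
    }

    public static void main(String[] args) {
        // Path-based walk, as oak-core does today.
        IdLookupSketch byPath = new IdLookupSketch();
        for (String path : List.of("/", "/a", "/a/b", "/a/b/c")) {
            byPath.getNodes(path);
        }
        System.out.println(byPath.visited); // [a, a, b, a, b, c]

        // Id-based walk: the ids returned by one call feed the next one.
        IdLookupSketch byIdWalk = new IdLookupSketch();
        System.out.println(byIdWalk.getNodes("/")); // { "a": { ":id": "x" } }
        for (String id : List.of("x", "y", "z")) {
            byIdWalk.getNodes(id);
        }
        System.out.println(byIdWalk.visited); // [a, b, c]
    }
}
```

The path walk touches six nodes because every call restarts at the root, while the id walk touches each node once; access by full path keeps working alongside it, as the proposal requires.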
makes sense, +1 in general.

some comments:

- returning an :id and/or :hash should be optional, i.e. we shouldn't
  require an implementation to return an :id or :hash for every path
  (an implementation might e.g. want to persist an entire subtree as
  one single persistence entity)

- i suggest we prefix the id/path getNodes parameter value with ':id:'
  and ':hash:' (or some other scheme) when requesting nodes by hash or
  identifier to avoid a potential ambiguity (an implementation might
  support both access by hash and id).

- do you have a proposal for the suggested MicroKernel API (javadoc)
  changes?

cheers
stefan

>
> BR,
>
> Jukka Zitting
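The prefix suggestion could look roughly like the following Java sketch. `NodeRef`, `parse()` and `knownToDiffer()` are hypothetical names, not part of the MicroKernel API. The sketch classifies a getNodes() argument as a path (leading slash), an id (':id:' prefix) or a hash (':hash:' prefix), so an implementation supporting both ids and hashes never has to guess. It also encodes the rule from the proposal that only two different hashes prove that two nodes differ.

```java
import java.util.Objects;

/**
 * Hypothetical helper (not part of the actual MicroKernel API) that classifies the
 * first getNodes() argument using the ':id:' / ':hash:' prefixes suggested above.
 */
public final class NodeRef {

    /** The three ways a node could be addressed under this scheme. */
    public enum Kind { PATH, ID, HASH }

    private static final String ID_PREFIX = ":id:";
    private static final String HASH_PREFIX = ":hash:";

    public final Kind kind;
    public final String value; // path as given, or id/hash with the prefix stripped

    private NodeRef(Kind kind, String value) {
        this.kind = kind;
        this.value = value;
    }

    /** Classifies the argument; anything unprefixed that is not a path is rejected. */
    public static NodeRef parse(String arg) {
        Objects.requireNonNull(arg, "arg");
        if (arg.startsWith("/")) {
            return new NodeRef(Kind.PATH, arg);
        } else if (arg.startsWith(ID_PREFIX)) {
            return new NodeRef(Kind.ID, arg.substring(ID_PREFIX.length()));
        } else if (arg.startsWith(HASH_PREFIX)) {
            return new NodeRef(Kind.HASH, arg.substring(HASH_PREFIX.length()));
        }
        throw new IllegalArgumentException("not a path, :id: or :hash: reference: " + arg);
    }

    /**
     * Per the proposal, two different hashes imply different content, whereas two
     * different ids (or paths) may still refer to nodes with identical content.
     */
    public static boolean knownToDiffer(NodeRef a, NodeRef b) {
        return a.kind == Kind.HASH && b.kind == Kind.HASH && !a.value.equals(b.value);
    }

    @Override
    public String toString() {
        return kind + "(" + value + ")";
    }

    public static void main(String[] args) {
        System.out.println(parse("/a/b/c"));     // PATH(/a/b/c)
        System.out.println(parse(":id:x"));      // ID(x)
        System.out.println(parse(":hash:9f3e")); // HASH(9f3e)
        System.out.println(knownToDiffer(parse(":id:1"), parse(":id:2"))); // false
    }
}
```

Because an implementation may return neither value for some children (e.g. when a whole subtree is stored as one entity), a client would still fall back to plain paths, which this parser accepts unchanged.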
