Re: SPI: ItemInfo.getParentId()
Hi,

Marcel Reutegger wrote:

major parts of jcr2spi currently rely on a hierarchical caching structure of nodes and properties, which means that if an item is in the cache, its ancestors are cached as well. this simplified the handling of spi ids a lot, because in some cases they can be very volatile. think of same-name siblings and parent nodes that become referenceable.

another issue with a non-hierarchical caching structure is how a save on an item is specified. if you have multiple disconnected item sub-tree fragments (which contain modified items), it will be impossible to find out whether one of the sub-trees is included in a save call.

even though I'd also be in favor of only reading what is really necessary, this constraint seems to demand that an implementation resolves the ancestor hierarchy.

regards
marcel
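The save-scope argument above can be made concrete with a small sketch. It assumes a hypothetical `CachedItem` that only knows its cached parent. None of these names come from jcr2spi, and the real item states carry far more information.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the jcr2spi implementation.
final class SaveScopeSketch {

    // Stand-in for a cached item state that links to its cached parent.
    static final class CachedItem {
        final String name;
        final CachedItem parent; // null for the root, or for a fragment whose ancestors were never loaded
        boolean modified;

        CachedItem(String name, CachedItem parent) {
            this.name = name;
            this.parent = parent;
        }
    }

    // An item is covered by save(target) if target is the item itself
    // or one of its cached ancestors.
    static boolean isCoveredBySave(CachedItem item, CachedItem target) {
        for (CachedItem cur = item; cur != null; cur = cur.parent) {
            if (cur == target) {
                return true;
            }
        }
        return false;
    }

    // Collects the names of modified items that a save on target would include.
    static List<String> modifiedInScope(List<CachedItem> cache, CachedItem target) {
        List<String> result = new ArrayList<>();
        for (CachedItem item : cache) {
            if (item.modified && isCoveredBySave(item, target)) {
                result.add(item.name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        CachedItem root = new CachedItem("", null);
        CachedItem a = new CachedItem("a", root);
        CachedItem b = new CachedItem("b", a);
        b.modified = true;

        // A disconnected fragment: conceptually somewhere below /a,
        // but the chain of ancestors up to /a is not in the cache.
        CachedItem y = new CachedItem("y", null);
        y.modified = true;

        // Only "b" can be recognized as part of save(/a); "y" is silently missed.
        System.out.println(modifiedInScope(List.of(b, y), a));
    }
}
```

With a fully connected cache the ancestor walk is enough to decide the scope of a save. For the disconnected fragment the walk ends early, so the cache alone cannot tell whether the item belongs to the saved sub-tree.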
Re: SPI: ItemInfo.getParentId()
Marcel Reutegger wrote:

Hi,

Marcel Reutegger wrote:

major parts of jcr2spi currently rely on a hierarchical caching structure of nodes and properties, which means that if an item is in the cache, its ancestors are cached as well. this simplified the handling of spi ids a lot, because in some cases they can be very volatile. think of same-name siblings and parent nodes that become referenceable.

another issue with a non-hierarchical caching structure is how a save on an item is specified. if you have multiple disconnected item sub-tree fragments (which contain modified items), it will be impossible to find out whether one of the sub-trees is included in a save call.

even though I'd also be in favor of only reading what is really necessary, this constraint seems to demand that an implementation resolves the ancestor hierarchy.

I understand the problem in general, but it certainly doesn't apply to the specific use case I have (the contents of jcr:versionStorage not being enumerable).

It seems to me that -- independently of SPI -- we need to discuss whether this is legal behavior for JCR implementations. If it is, we need to define how this works with saving changes in general, and with the transient layer in JCR2SPI in particular. If it isn't, that should be spelled out as well, because it may affect other implementors too.

In general, I think the assumption that read access to /a/b/c necessarily implies read access to /a and /a/b is flawed.

Best regards, Julian
Re: SPI: ItemInfo.getParentId()
(sorry for the late reply - had a customer visit for most of the previous week).

Marcel Reutegger wrote:

the current design of the spi demands that the client on top of the spi resolves paths to ids and vice versa. this design was actually just borrowed from the jackrabbit implementation, where the lower layers don't know about paths but the items just have forward and backward references (parent uuid, child node entries and property names).

I'm not so sure if we should move this task to the server. I think in most cases a workspace is accessed in a traversal way. At least that's what most methods in the JCR are about. To get a node or a property you usually start from a node you accessed before.

But you are right that this design will cause problems when there are ancestor nodes that cannot be accessed.

It will also cause problems when collections are really, really big.

in the meantime I realized that the IdFactory can do that for me, assuming it allows

.createNodeId((NodeId)null, path);

...where path would be absolute -- which the one in spi2dav doesn't (why?). (As a matter of fact, a createNodeId(Path) signature would be useful.)

the method createNodeId(NodeId, Path) is meant for ids that are relative to an existing id. createNodeId(String, Path) is what you are looking for. here, the String uuid parameter is optional.

OK, thanks, works for me. So given that the SPI API, at least in theory, is able to do the lookup without having to access the parent collections, shouldn't JCR2SPI use that when circumstances require it?

major parts of jcr2spi currently rely on a hierarchical caching structure of nodes and properties, which means that if an item is in the cache, its ancestors are cached as well. this simplified the handling of spi ids a lot, because in some cases they can be very volatile. think of same-name siblings and parent nodes that become referenceable.

Well, the current design doesn't work for my current back end; and even if I *could* change the back end, looking up nodes that have many children would still be very expensive.

Does it make sense for me to start trying to change JCR2SPI with respect to this?

Best regards, Julian
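The two createNodeId variants discussed above can be sketched as follows. This is a simplified stand-in, not the SPI's IdFactory: it assumes paths are plain strings rather than Path objects, and the `NodeId` class and its string form are made up for illustration.

```java
import java.util.Objects;

// Hypothetical illustration only; the real IdFactory works on Path objects.
final class NodeIdSketch {

    // Simplified id: an optional uuid plus an optional path.
    // With a uuid, the path is relative to that node; without one, it is absolute.
    static final class NodeId {
        final String uuid;
        final String path;

        NodeId(String uuid, String path) {
            if (uuid == null && path == null) {
                throw new IllegalArgumentException("uuid or path required");
            }
            this.uuid = uuid;
            this.path = path;
        }

        @Override
        public String toString() {
            if (uuid == null) {
                return path;
            }
            return path == null ? "[" + uuid + "]" : "[" + uuid + "]/" + path;
        }
    }

    // Counterpart of createNodeId(String, Path): the uuid part is optional.
    static NodeId createNodeId(String uuid, String path) {
        return new NodeId(uuid, path);
    }

    // Counterpart of createNodeId(NodeId, Path): extends an existing id by a relative path.
    static NodeId createNodeId(NodeId parent, String relPath) {
        Objects.requireNonNull(parent, "parent id");
        if (parent.path == null) {
            return new NodeId(parent.uuid, relPath);
        }
        String sep = parent.path.endsWith("/") ? "" : "/";
        return new NodeId(parent.uuid, parent.path + sep + relPath);
    }

    public static void main(String[] args) {
        // Absolute path, no uuid: no parent id is needed to build this id.
        // The cast picks the (String, String) overload, since a bare null is ambiguous.
        NodeId byPath = createNodeId((String) null, "/jcr:system/jcr:versionStorage");
        System.out.println(byPath);

        // Id relative to a referenceable node.
        NodeId parent = createNodeId("u1", null);
        System.out.println(createNodeId(parent, "child"));
    }
}
```

Building such an id does not touch any parent collection. Whether the layer above then resolves it without traversal is the open question in this thread.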
Re: SPI: ItemInfo.getParentId()
Hi Julian,

Julian Reschke wrote:

here's a question on ItemInfo.getParentId(). In my store, all version histories live directly below /jcr:system/jcr:versionStorage. However, getNodeIds() will not return any children. As far as I understand, that is legal in JCR (versioning nodes are exposed below jcr:versionStorage, but you can't navigate to them).

Looking at the relevant sections in the spec, I think the version storage should behave just like any other tree in the workspace:

section 8.2.2.2: Exposing the version storage as content in the workspace allows the stored versions and their associated version meta-data to be searched or traversed just like any other part of the workspace.

With this setup, I'm getting an NPE (see below), as the code seems to rely on the assumption that if getParentId() returns something != null, the item will show up in the child node list of the parent.

A parent-child relation must always resolve, unless it's the root node; then obviously there is no parent.

I can work around this by returning null in this special case for now, but I'd really like to clarify

Hmm, that should be reserved for the root node. The jcr2spi layer will probably get confused later when it has to deal with multiple nodes without a parent. I'm actually not sure what the exact behaviour in that case is right now.

- whether the setup itself is ok, and

How about using an intermediate structure like jackrabbit does and exposing the version histories that way?

- what getParentId() is supposed to return in this case...

well, as mentioned already, every parent id must resolve to a child node entry in the parent.

hmm, the more I think about it, we might have to deal with this issue on other occasions. It may happen that a query returns a node that has never been requested, while its parent node has. assuming that the jcr2spi layer still has the old version of the parent node, it will not see the child node entry for the node returned by the query.

While the jcr2spi layer is technically able to just add the new child node entry, it can be difficult to determine the exact position of the new child node if the parent node supports orderable child nodes.

To get back to the initial problem, I think from a specification standpoint the version storage must be traversable and expose all version histories. That would certainly solve your issue with the current state of the jcr2spi layer.

regards
marcel
Re: SPI: ItemInfo.getParentId()
Marcel Reutegger wrote:

Hi Julian,

Julian Reschke wrote:

How about using an intermediate structure like jackrabbit does and exposing the version histories that way?

I have to confess that I'm not sure how it currently does that (pointer?).

jackrabbit uses the 6 highest digits of the uuid of the versionable node to construct an intermediate structure leading to the version history of that node. the label of the version history node is the full uuid of the versionable node.

Understood. For the record: in a previous project where we exposed version histories in the namespace, we chose a similar approach.

The trouble is that in the system I currently have, I can't enumerate version histories *at all* (well, except by looking at every single node in the system and asking it for its version history).

can you at least search for version history uuids with a certain pattern? That would allow you to group your version histories in a sub node structure under jcr:versionStorage. But I guess when you say you can't enumerate them at all, that means it *is* impossible...

Right. I don't have an API for that.

I think that's indeed a specification issue. Can a client always rely on a node's ability to enumerate all children?

well, according to the spec it can...

I'd rephrase that slightly: the spec ignores the issue :-). JCR clients already have to deal with child nodes not showing up in the parent collection, for instance due to permission problems.

I think requiring this makes many use cases extremely hard, if not impossible, to implement.

what use cases do you have in mind?

Well, this one for instance. Another one would be where the system exposes all referenceable nodes under a second path, consisting of a collection (/jcr:flat/) plus the UUID. That's an approach I've seen in use to provide users with an alternate, stable identifier.

The common pattern here is that the nodes exposed at a certain part of the namespace are just projections from somewhere else, and not something that is persisted under that name.

Back in September, I claimed that implementing simple versioning was trivial; maybe I now have to take that back ;-)

Best regards, Julian
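The intermediate structure described above can be sketched as a path computation. The thread only says that the six highest digits of the uuid are used. The split into three levels of two digits each, the storage root constant and the method name are assumptions made for this example.

```java
// Hypothetical illustration only; the level layout is an assumption, see the lead-in.
final class VersionStoragePathSketch {

    static final String VERSION_STORAGE = "/jcr:system/jcr:versionStorage";

    // Builds <storage>/ab/cd/ef/<uuid> from the first six hex digits of the uuid
    // of the versionable node. The last segment is the full uuid.
    static String historyPath(String uuid) {
        String hex = uuid.replace("-", "");
        if (hex.length() < 6) {
            throw new IllegalArgumentException("uuid too short: " + uuid);
        }
        return VERSION_STORAGE
                + "/" + hex.substring(0, 2)
                + "/" + hex.substring(2, 4)
                + "/" + hex.substring(4, 6)
                + "/" + uuid;
    }

    public static void main(String[] args) {
        System.out.println(historyPath("cafebabe-cafe-babe-cafe-babecafebabe"));
    }
}
```

The path can be computed from the uuid alone, so finding one version history needs no enumeration. Listing the children of the intermediate nodes still requires the back end to enumerate or search version histories, which is exactly what is missing in the store discussed here.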
Re: SPI: ItemInfo.getParentId()
Marcel Reutegger wrote:

...

hmm, the more I think about it, we might have to deal with this issue at other occasions. It may happen that a node is returned by a query that has never been requested, but its parent node has. assuming that the jcr2spi layer still has the old version of the parent node it will not see the new child node entry in there for the node returned in the query.

...

I can resolve my original problem with the change below...:

--- src/main/java/org/apache/jackrabbit/jcr2spi/state/WorkspaceItemStateFactory.java 26 Oct 2006 12:20:08 - 1.2
+++ src/main/java/org/apache/jackrabbit/jcr2spi/state/WorkspaceItemStateFactory.java 3 Nov 2006 14:18:50 -
@@ -96,7 +96,8 @@
     NodeState parent = (parentId != null) ? (NodeState) ism.getItemState(parentId) : null;
     if (parent != null) {
-        return parent.getChildNodeEntry(info.getQName(), info.getIndex()).getNodeState();
+        ChildNodeEntry child = parent.getChildNodeEntry(info.getQName(), info.getIndex());
+        return child != null ? child.getNodeState() : createNodeState(info, parent);
     } else {
         return createNodeState(info, parent);
     }

...however once I do that - as expected - other problems surface.

Looking at JCR2SPI's NodeImpl and HierarchyManagerImpl, it seems that the only way to access a Node by absolute path is to recursively access all parent nodes, visiting their children. This seems to be not only inefficient, but may also cause a problem when the given user doesn't have read access to all parent nodes...

Shouldn't we have something like:

NodeId RepositoryService.getNodeId(QPath path);

Best regards, Julian
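The difference between resolving a path by traversal and the kind of direct lookup proposed above can be shown with a toy back end. Everything here is hypothetical: the class, the string ids and the `listedInParent` flag only model a store in which a node is addressable by path although its parent does not enumerate it.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not the jcr2spi or SPI implementation.
final class PathLookupSketch {

    private final Map<String, String> idByPath = new HashMap<>();
    private final Map<String, Set<String>> listedChildren = new HashMap<>();

    // Registers a node; listedInParent = false models a child the parent does not enumerate.
    void addNode(String path, String id, boolean listedInParent) {
        idByPath.put(path, id);
        listedChildren.putIfAbsent(path, new HashSet<>());
        if (listedInParent && !path.equals("/")) {
            int cut = path.lastIndexOf('/');
            String parent = cut == 0 ? "/" : path.substring(0, cut);
            listedChildren.computeIfAbsent(parent, k -> new HashSet<>())
                    .add(path.substring(cut + 1));
        }
    }

    // Traversal-style resolution: every step must find the next name
    // in the child list of the node reached so far.
    String resolveByTraversal(String absPath) {
        if (absPath.equals("/")) {
            return idByPath.get("/");
        }
        String current = "/";
        for (String name : absPath.substring(1).split("/")) {
            Set<String> children = listedChildren.get(current);
            if (children == null || !children.contains(name)) {
                return null; // the parent does not list this child
            }
            current = current.equals("/") ? "/" + name : current + "/" + name;
        }
        return idByPath.get(current);
    }

    // Direct lookup, in the spirit of the proposed getNodeId(path):
    // the back end is asked for the path without visiting any ancestor.
    String resolveDirect(String absPath) {
        return idByPath.get(absPath);
    }

    public static void main(String[] args) {
        PathLookupSketch store = new PathLookupSketch();
        store.addNode("/", "root", true);
        store.addNode("/jcr:system", "sys", true);
        store.addNode("/jcr:system/jcr:versionStorage", "vs", true);
        store.addNode("/jcr:system/jcr:versionStorage/vh1", "vh-id", false);

        String path = "/jcr:system/jcr:versionStorage/vh1";
        System.out.println(store.resolveByTraversal(path)); // null: vh1 is not enumerated
        System.out.println(store.resolveDirect(path));      // vh-id
    }
}
```

The traversal variant also reads every ancestor's child list, which is the cost concern raised for very large collections.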