Re: SPI: ItemInfo.getParentId()

2006-11-16 Thread Marcel Reutegger

Hi,


Marcel Reutegger wrote:
major parts of the jcr2spi currently rely on hierarchical caching 
structure of nodes and properties. which means if an item is in the 
cache its ancestors are cached as well. this simplified the handling 
of spi ids a lot because in some cases they can be very volatile. 
think of same name siblings and parent nodes that become referenceable.


another issue with a non-hierarchical caching structure is the way a save on 
an item is specified. if you have multiple disconnected item sub-tree fragments 
(which contain modified items) it will be impossible to find out whether one of 
the sub-trees is included in a save call. even though I'd also be in favor of 
only reading what is really necessary, this constraint seems to even demand that 
an implementation resolves the ancestor hierarchy.


regards
 marcel


Re: SPI: ItemInfo.getParentId()

2006-11-16 Thread Julian Reschke

Marcel Reutegger wrote:

Hi,


Marcel Reutegger wrote:
major parts of the jcr2spi currently rely on hierarchical caching 
structure of nodes and properties. which means if an item is in the 
cache its ancestors are cached as well. this simplified the handling 
of spi ids a lot because in some cases they can be very volatile. 
think of same name siblings and parent nodes that become referenceable.


another issue with a non-hierarchical caching structure is the way a 
save on an item is specified. if you have multiple disconnected item 
sub-tree fragments (which contain modified items) it will be impossible 
to find out whether one of the sub-trees is included in a save call. 
even though I'd also be in favor of only reading what is really 
necessary, this constraint seems to even demand that an implementation 
resolves the ancestor hierarchy.


I understand the problem in general, but it certainly doesn't apply to 
the specific use case I have (the contents of jcr:versionStorage 
not being enumerable).


It seems to me that -- independently of SPI -- we need to discuss 
whether this is legal behavior for JCR implementations. If it is, we 
need to define how this works with saving changes in general, and the 
transient layer in JCR2SPI in particular. If it isn't, that should be 
spelled out as well, because it may affect other implementors as well.


In general, I think the assumption that read access to /a/b/c 
necessarily implies read access to /a and /a/b is flawed.


Best regards, Julian




Re: SPI: ItemInfo.getParentId()

2006-11-13 Thread Julian Reschke
(sorry for the late reply - had a customer visit for most of the 
previous week).


Marcel Reutegger wrote:
the current design of the spi demands that the client on top of the spi 
resolves paths to ids and vice versa. this design was actually just 
borrowed from the jackrabbit implementation, where the lower layers 
don't know about paths but the items just have forward and backward 
references (parent uuid, child node entries and property names).


I'm not so sure if we should move this task to the server. I think in 
most cases a workspace is accessed in a traversal way. At least that's 
what most methods in the JCR are about. To get a node or a property you 
usually start from a node you accessed before.


But you are right that this design will cause problems when there are 
ancestor nodes that cannot be accessed.


It will also cause problems when collections are really, really big.

in the meantime I realized that the IdFactory can do that for me, 
assuming it allows


  .createNodeId((NodeId)null, path);

...where path would be absolute -- which the one in spi2dav doesn't allow 
(why?). (As a matter of fact a createNodeId(Path) signature would be 
useful).


the method createNodeId(NodeId, Path) is meant for ids that are relative 
to an existing id. createNodeId(String, Path) is what you are looking 
for. here, the String uuid parameter is optional.
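
As a rough illustration of the two factory variants, the sketch below models an id as an optional uuid plus an optional path. SketchNodeId and its create methods are hypothetical stand-ins, not the real IdFactory API, and the rule that a path without a uuid must be absolute is an assumption made for this sketch:

```java
public class NodeIdSketch {

    /** Hypothetical id: an optional uuid plus an optional path. */
    static final class SketchNodeId {
        final String uuid; // may be null
        final String path; // absolute if uuid is null, otherwise relative to the uuid node

        private SketchNodeId(String uuid, String path) {
            this.uuid = uuid;
            this.path = path;
        }

        /** Mirrors the (String, Path) variant: the uuid is optional. */
        static SketchNodeId create(String uuid, String path) {
            if (uuid == null && path == null) {
                throw new IllegalArgumentException("need a uuid, a path, or both");
            }
            if (uuid == null && !path.startsWith("/")) {
                throw new IllegalArgumentException("path must be absolute without a uuid");
            }
            return new SketchNodeId(uuid, path);
        }

        /** Mirrors the (NodeId, Path) variant: relPath is relative to an existing id. */
        static SketchNodeId create(SketchNodeId parent, String relPath) {
            String combined;
            if (parent.path == null) {
                combined = relPath;
            } else if (parent.path.endsWith("/")) {
                combined = parent.path + relPath;
            } else {
                combined = parent.path + "/" + relPath;
            }
            return new SketchNodeId(parent.uuid, combined);
        }

        @Override
        public String toString() {
            return "[" + uuid + "]" + (path == null ? "" : path);
        }
    }

    public static void main(String[] args) {
        SketchNodeId byPath = SketchNodeId.create((String) null, "/a/b");
        SketchNodeId child = SketchNodeId.create(byPath, "c");
        System.out.println(byPath); // [null]/a/b
        System.out.println(child);  // [null]/a/b/c
    }
}
```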


OK, thanks, works for me.

So given the fact that the SPI API at least in theory has the 
capability to do the lookup without having to access the parent 
collections, shouldn't JCR2SPI use that when circumstances require that?


major parts of the jcr2spi currently rely on hierarchical caching 
structure of nodes and properties. which means if an item is in the 
cache its ancestors are cached as well. this simplified the handling of 
spi ids a lot because in some cases they can be very volatile. think of 
same name siblings and parent nodes that become referenceable.


Well, the current design doesn't work for my current back end; and even 
if I *could* change the back end, looking up nodes that have many 
children will still be very expensive.


Does it make sense that I start trying to change JCR2SPI with respect to 
this?


Best regards, Julian



Re: SPI: ItemInfo.getParentId()

2006-11-03 Thread Marcel Reutegger

Hi Julian,

Julian Reschke wrote:

here's a question on ItemInfo.getParentId().

In my store, all version histories live directly below 
/jcr:system/jcr:versionStorage. However, getNodeIds() will not return 
any children. As far as I understand, that is legal in JCR (versioning 
nodes are exposed below jcr:versionStorage, but you can't navigate to 
them).


Looking at the relevant sections in the spec, I think the version storage should 
behave just like any other tree in the workspace:


section 8.2.2.2:

Exposing the version storage as content in the workspace allows
the stored versions and their associated version meta-data to be
searched or traversed just like any other part of the workspace.

With this setup, I'm getting an NPE (see below), as the code seems to 
rely on the assumption that if getParentId() returns something != null, 
the item will show up in the child node list of the parent.


A parent-child relation must always resolve, unless it's the root node, in 
which case there obviously is no parent.


I can work around this by returning null in this special case for now, but 
I'd really like to clarify


Hmm, that should be reserved for the root node. The jcr2spi layer will probably 
get confused later when it has to deal with multiple nodes without a parent. I'm 
actually not sure what the exact behaviour in that case is right now.



- whether the setup itself is ok, and


How about using an intermediate structure like jackrabbit does and expose the 
version histories that way?



- what getParentId() is supposed to return in this case...


well, as mentioned already, every parent id must resolve to a child node entry 
in the parent.


hmm, the more I think about it, we might have to deal with this issue at other 
occasions. It may happen that a node is returned by a query that has never been 
requested, but its parent node has. assuming that the jcr2spi layer still has 
the old version of the parent node it will not see the new child node entry in 
there for the node returned in the query.


While the jcr2spi layer is technically able to just add the new child node 
entry, it can be difficult to determine the exact sort order of the new child 
node in case the parent node supports orderable child nodes.


To get back to the initial problem, I think from a specification standpoint the 
version storage must be traversable and expose all version histories. That would 
certainly solve your issue with the current state of the jcr2spi layer.


regards
 marcel


Re: SPI: ItemInfo.getParentId()

2006-11-03 Thread Julian Reschke

Marcel Reutegger wrote:

Hi Julian,

Julian Reschke wrote:
How about using an intermediate structure like jackrabbit does and 
expose the version histories that way?


I have to confess that I'm not sure how it currently does that 
(pointer?).


jackrabbit uses the 6 highest digits of the uuid of the versionable node 
to construct an intermediate structure to the version history of that 
node. the label of the version history node is the full uuid of the 
versionable node.
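
The scheme described above could look roughly like the following sketch. The message does not say how the six digits are grouped, so the split into three two-character levels is an assumption, and historyPath is a hypothetical helper rather than jackrabbit code:

```java
public class VersionStoragePathSketch {

    static final String VERSION_STORAGE = "/jcr:system/jcr:versionStorage";

    /**
     * Builds a path to the version history of the node with the given uuid.
     * Assumption: the six leading hex digits form three intermediate levels
     * of two characters each; the final node is labelled with the full uuid.
     */
    static String historyPath(String uuid) {
        String hex = uuid.replace("-", "");
        if (hex.length() < 6) {
            throw new IllegalArgumentException("uuid too short: " + uuid);
        }
        return VERSION_STORAGE
                + "/" + hex.substring(0, 2)
                + "/" + hex.substring(2, 4)
                + "/" + hex.substring(4, 6)
                + "/" + uuid;
    }

    public static void main(String[] args) {
        System.out.println(historyPath("cafebabe-cafe-babe-cafe-babecafebabe"));
        // /jcr:system/jcr:versionStorage/ca/fe/ba/cafebabe-cafe-babe-cafe-babecafebabe
    }
}
```

The intermediate levels keep each child list small, but they only help if the back end can enumerate or search histories by uuid prefix.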


Understood. For the record: in a previous project where we exposed 
version histories in the namespace we chose a similar approach.


The trouble is that in the system I currently have I can't enumerate 
version histories *at all* (well, except by looking at every single 
node in the system and asking it for its version history).


can you at least search for version history uuids with a certain 
pattern? That would allow you to group your version histories in a sub 
node structure under jcr:versionStorage. But I guess when you say you 
can't enumerate them at all, that means it *is* impossible...


Right. I don't have an API for that.

I think that's indeed a specification issue. Can a client always rely 
on a node's ability to enumerate all children?


well, according to the spec it can...


I'd rephrase that slightly as the spec ignores the issue :-). JCR 
clients already have to deal with child nodes not showing up in the 
parent collection, for instance due to permission problems.


I think requiring this makes many use cases extremely hard, if not 
impossible, to implement.


what use cases do you have in mind?


Well, this one for instance.

Another one would be where the system exposes all referenceable nodes 
with a second path, consisting of a collection (/jcr:flat/) plus the 
UUID. That's an approach I've seen in use to provide users with an 
alternate, stable, identifier. The common pattern here is that the nodes 
exposed at a certain part of the namespace are just projections from 
somewhere else, and not something that is persisted under that name.


Back in September, I claimed that implementing simple versioning was 
trivial, maybe I now have to take that back ;-)



Best regards, Julian






Re: SPI: ItemInfo.getParentId()

2006-11-03 Thread Julian Reschke

Marcel Reutegger wrote:

...
hmm, the more I think about it, we might have to deal with this issue at 
other occasions. It may happen that a node is returned by a query that 
has never been requested, but its parent node has. assuming that the 
jcr2spi layer still has the old version of the parent node it will not 
see the new child node entry in there for the node returned in the query.

...


I can resolve my original problem with the change below...:

--- src/main/java/org/apache/jackrabbit/jcr2spi/state/WorkspaceItemStateFactory.java 26 Oct 2006 12:20:08 -	1.2
+++ src/main/java/org/apache/jackrabbit/jcr2spi/state/WorkspaceItemStateFactory.java 3 Nov 2006 14:18:50 -
@@ -96,7 +96,8 @@
         NodeState parent = (parentId != null) ? (NodeState) ism.getItemState(parentId) : null;
 
         if (parent != null) {
-            return parent.getChildNodeEntry(info.getQName(), info.getIndex()).getNodeState();
+            ChildNodeEntry child = parent.getChildNodeEntry(info.getQName(), info.getIndex());
+            return child != null ? child.getNodeState() : createNodeState(info, parent);
         } else {
             return createNodeState(info, parent);
         }

...however once I do that - as expected - other problems surface.

Looking at JCR2SPI's NodeImpl and HierarchyManagerImpl it seems that the 
only way to access a Node by absolute path is to recursively access all 
parent nodes, visiting their children.


This seems to be not only inefficient, but may also cause a problem when 
the given user doesn't have read access to all parent nodes...


Shouldn't we have something like:

  NodeId RepositoryService.getNodeId(QPath path);
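
The difference between the two lookup strategies can be sketched as follows. This is only an illustration under stated assumptions: the in-memory maps, the string ids, and the method names resolveByTraversal and resolveDirectly are all hypothetical and do not exist in JCR2SPI or the SPI:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class PathLookupSketch {

    // Hypothetical back end: absolute path -> node id, plus the readable paths.
    private final Map<String, String> idsByPath = new HashMap<>();
    private final Set<String> readable = new HashSet<>();

    void add(String path, String id, boolean canRead) {
        idsByPath.put(path, id);
        if (canRead) {
            readable.add(path);
        }
    }

    /** Traversal: fails as soon as one node on the way down is not readable. */
    String resolveByTraversal(String absPath) {
        String current = "";
        for (String segment : absPath.substring(1).split("/")) {
            current = current + "/" + segment;
            if (!readable.contains(current)) {
                return null;
            }
        }
        return idsByPath.get(absPath);
    }

    /** Direct lookup: only the target itself has to be readable. */
    String resolveDirectly(String absPath) {
        return readable.contains(absPath) ? idsByPath.get(absPath) : null;
    }

    public static void main(String[] args) {
        PathLookupSketch store = new PathLookupSketch();
        store.add("/a", "id-a", false);
        store.add("/a/b", "id-b", false);
        store.add("/a/b/c", "id-c", true);

        System.out.println(store.resolveByTraversal("/a/b/c")); // null
        System.out.println(store.resolveDirectly("/a/b/c"));    // id-c
    }
}
```

The traversal variant also has to touch every ancestor's child list, which is where the cost for very large collections comes from.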

Best regards, Julian