Re: Recursive child versioning
Hi Tobias and all,

thank you very much for your support. Your help has been invaluable: everything is clear now on my side, and I see that Jackrabbit 2.0 implements exactly what is described in JCR 2.0. I will structure my repository accordingly.

Regards,
Davide

On Fri, Mar 26, 2010 at 9:47 PM, Tobias Bocanegra tri...@day.com wrote:
> On Fri, Mar 26, 2010 at 3:23 PM, Davide Maestroni davide.maestr...@gmail.com wrote:
>> Hi Tobias,
>>
>> what you wrote is perfectly right; however, if you look at http://www.day.com/specs/jcr/1.0/8.2.11.2_VERSION.html (I have just copy/pasted from there), the versioning process described is a bit different. I really don't know if there is a bug or not, I'm just trying to clearly understand how the versioning works in Jackrabbit and what the expected behavior is in each case. Since it looks to me that JCR 1.0 and JCR 2.0 say slightly different things (but with huge impact) and I have observed the same behavior using Jackrabbit 1.0 and Jackrabbit 2.0, I am a bit confused about how things should work. If you say that the versioning of a whole sub-tree depends only on the child node, and not on all its descendants, I am perfectly fine with that; I just want to be sure that what I see is correct, and that my understanding of the specifications is correct as well.
>
> yes. jcr2.0 is far more precise about the versioning. if i remember correctly, in jackrabbit 1.0 the behavior was different, indeed. jackrabbit 2.0 implements the versioning as described in jcr2.0
>
> regards, toby
Re: Recursive child versioning
On Thu, Mar 25, 2010 at 18:36, Davide Maestroni davide.maestr...@gmail.com wrote:
> Hi Alex, sorry for the late reply, I've been out during the past weeks. Actually I'm not sure how to write a unit test for Jackrabbit; besides, I am using it through Sling and Jetty, so I cannot provide you the whole code since the architecture I have implemented is quite complex and articulated. Anyway, I will try to write down a small test in the next few days.

That would be cool, because then the chance of it getting fixed is an order of magnitude higher, since people could actually reproduce the issue repeatedly :-) It doesn't have to be fancy: a code snippet that starts with a session on an empty repository, creates the necessary content and then performs the problematic operations is enough.

Regards,
Alex

--
Alexander Klimetschek
alexander.klimetsc...@day.com
Re: Recursive child versioning
hi,

that is not entirely true. if you checkin A, then B (and its entire subtree) is copied, since B is not versionable (irrespective of the OPV of B).

  3.13.9 Versionable State
  [...]
  5. For each child node C of N where
     • C has an OPV of COPY, a copy of the entire subgraph rooted at C (regardless of the OPV values of the sub-items) is added to the frozen node, preserving the name of C and the names and values of all its sub-items.
  [...]

if you checkin B, then all C's should not be copied if they have an OPV of VERSION:

  3.13.9 Versionable State
  [...]
  6. For each child node C of N where:
     • C has an OPV of VERSION
     Under simple versioning, the same behavior as COPY.
     Under full versioning, if C is not mix:versionable, the same behavior as COPY.
     Under full versioning, if C is mix:versionable, then a special nt:versionedChild node with a reference to the version history of C is substituted in place of C as a child of the frozen node.
  [...]

So your statement:

  ... however the recursive copy terminates at each versionable node encountered further below in the subtree,

is not correct. where did you read this?

regards, toby

On Wed, Mar 3, 2010 at 10:02 PM, Davide Maestroni davide.maestr...@gmail.com wrote:
> Hi Alex,
>
> thank you for your suggestions, now I have a clearer idea about the versioning of child nodes. Though I still have one doubt. In my repository I have a parent-child structure like this:
>
> A (mix:versionable, nt:unstructured)
>   ^
>   |
> B (nt:unstructured)
>   ^
>   |
> C (mix:versionable, nt:unstructured)
>   ^
>   |
> D (nt:file)
>   ^
>   |
> E (nt:resource)
>
> where the OnParentVersion attribute for child nodes, in the node type definition of A, B and C, is always VERSION. According to paragraph 8.2.11.2 of the JCR 1.0 specification (If *C* is not versionable then the behavior of *COPY* applies on *checkin*, however the recursive copy terminates at each versionable node encountered further below in the subtree, at which points the standard *VERSION* behavior is again followed.), what I expect when versioning the node A is that D and E are not copied, but that is not what I observe. I also looked in JCR 2.0 but couldn't find anything explaining a different behavior. What am I missing here?
>
> Thanks again for your patience,
> Davide
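Editorial sketch: the two rules quoted above from JCR 2.0 section 3.13.9 can be mimicked with a small toy model, which may help to see why the tree with B in the middle is copied in full while the tree without B is not. This is plain Java with no JCR dependency; `OpvSketch`, `Item`, `treeWithB` and `treeWithoutB` are hypothetical names invented for illustration, only the COPY and VERSION rules under full versioning are modelled, and the OPV values given to D and E are placeholders that do not affect the result.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of rules 5 and 6 of JCR 2.0 section 3.13.9 (full versioning only). */
final class OpvSketch {
    enum Opv { COPY, VERSION }

    /** Hypothetical stand-in for a repository node; this is not javax.jcr.Node. */
    static final class Item {
        final String name;
        final boolean versionable;            // true if the node is mix:versionable
        final Opv opv;                        // OPV of this node as a child of its parent
        final List<Item> children = new ArrayList<>();

        Item(String name, boolean versionable, Opv opv) {
            this.name = name;
            this.versionable = versionable;
            this.opv = opv;
        }

        Item add(Item child) {
            children.add(child);
            return this;
        }
    }

    /** Lists what ends up below the frozen node when n is checked in. */
    static List<String> checkin(Item n) {
        List<String> frozen = new ArrayList<>();
        for (Item c : n.children) {
            if (c.opv == Opv.VERSION && c.versionable) {
                // Rule 6, versionable child: only a reference to its version history.
                frozen.add("nt:versionedChild -> " + c.name);
            } else {
                // Rule 5, or rule 6 with a non-versionable child: copy the whole
                // subgraph, regardless of what the sub-items look like.
                copySubgraph(c, frozen);
            }
        }
        return frozen;
    }

    private static void copySubgraph(Item c, List<String> frozen) {
        frozen.add("copy of " + c.name);
        for (Item child : c.children) {
            copySubgraph(child, frozen);
        }
    }

    /** A -> B -> C -> D -> E, as in the structure described in the thread. */
    static Item treeWithB() {
        Item b = new Item("B", false, Opv.VERSION).add(subtreeC());
        return new Item("A", true, Opv.VERSION).add(b);
    }

    /** A -> C -> D -> E, the second structure tried in the thread. */
    static Item treeWithoutB() {
        return new Item("A", true, Opv.VERSION).add(subtreeC());
    }

    private static Item subtreeC() {
        Item e = new Item("E", false, Opv.COPY);
        Item d = new Item("D", false, Opv.VERSION).add(e);
        return new Item("C", true, Opv.VERSION).add(d);
    }
}
```

In this model `OpvSketch.checkin(OpvSketch.treeWithB())` lists copies of B, C, D and E, because the copy started at the non-versionable B never looks at versionability further down. `OpvSketch.checkin(OpvSketch.treeWithoutB())` yields a single `nt:versionedChild` entry for C.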
Re: Recursive child versioning
On Fri, Mar 26, 2010 at 3:23 PM, Davide Maestroni davide.maestr...@gmail.com wrote:
> Hi Tobias,
>
> what you wrote is perfectly right; however, if you look at http://www.day.com/specs/jcr/1.0/8.2.11.2_VERSION.html (I have just copy/pasted from there), the versioning process described is a bit different. I really don't know if there is a bug or not, I'm just trying to clearly understand how the versioning works in Jackrabbit and what the expected behavior is in each case. Since it looks to me that JCR 1.0 and JCR 2.0 say slightly different things (but with huge impact) and I have observed the same behavior using Jackrabbit 1.0 and Jackrabbit 2.0, I am a bit confused about how things should work. If you say that the versioning of a whole sub-tree depends only on the child node, and not on all its descendants, I am perfectly fine with that; I just want to be sure that what I see is correct, and that my understanding of the specifications is correct as well.

yes. jcr2.0 is far more precise about the versioning. if i remember correctly, in jackrabbit 1.0 the behavior was different, indeed. jackrabbit 2.0 implements the versioning as described in jcr2.0

regards, toby

> Thanks,
> Davide
Re: Recursive child versioning
Hi Alex,

sorry for the late reply, I've been out during the past weeks. Actually I'm not sure how to write a unit test for Jackrabbit; besides, I am using it through Sling and Jetty, so I cannot provide you the whole code since the architecture I have implemented is quite complex and articulated. Anyway, I will try to write down a small test in the next few days.

Thanks,
Davide

On Thu, Mar 4, 2010 at 11:58 AM, Alexander Klimetschek aklim...@day.com wrote:
> On Thu, Mar 4, 2010 at 10:52, Davide Maestroni davide.maestr...@gmail.com wrote:
>> Hi Alex, I did another test removing the node B, so the structure became like this:
>>
>> A (mix:versionable, nt:unstructured)
>>   ^
>>   |
>> C (mix:versionable, nt:unstructured)
>>   ^
>>   |
>> D (nt:file)
>>   ^
>>   |
>> E (nt:resource)
>>
>> When I created a version of A, with the same number of files under C, it took only a few seconds and the repository size did not increase. I'm even more confused now about the expected behavior of VERSION opv. Can you shed some light on it?
>
> I am not enough of a versioning expert to say whether this is a bug. Could you provide a short test case (for the case with node B)? Maybe the clue is in the api calls you make.
>
> Regards,
> Alex
>
> --
> Alexander Klimetschek
> alexander.klimetsc...@day.com
Re: Recursive child versioning
Hi Alex,

what I see is that each time the file D (along with its data) is copied into the new version of node A. I understand that no diff mechanism is employed when applying COPY opv, so it is ok that the node B and all its properties are copied into the new version of A (since B is not versionable and the COPY behavior applies instead of VERSION), but I expected only the version ID of node C to be copied, not the entire sub-tree. What happens instead is that on calling VersionManager.checkin(A) a new copy of the whole tree from A to E is made.

In my use case I have hundreds of files under C, and changing a property of A, and then creating a new version of it, may take several minutes. Moreover, I can observe the size of the repository (i.e. the number of bytes in the home directory) grow quickly, each time by roughly the sum of the bytes of all the files under C. And that is undesired behavior.

So please tell me if I'm doing something wrong or my understanding of the specifications is not correct.

Regards,
Davide

On Thu, Mar 4, 2010 at 1:03 AM, Alexander Klimetschek aklim...@day.com wrote:
> On Wed, Mar 3, 2010 at 22:02, Davide Maestroni davide.maestr...@gmail.com wrote:
>> ...what I expect when versioning the node A is that D and E are not copied, but it is not what I observe.
>
> What exactly do you observe?
>
> Regards,
> Alex
>
> --
> Alexander Klimetschek
> alexander.klimetsc...@day.com
Re: Recursive child versioning
Hi Alex,

I did another test removing the node B, so the structure became like this:

A (mix:versionable, nt:unstructured)
  ^
  |
C (mix:versionable, nt:unstructured)
  ^
  |
D (nt:file)
  ^
  |
E (nt:resource)

When I created a version of A, with the same number of files under C, it took only a few seconds and the repository size did not increase. I'm even more confused now about the expected behavior of VERSION opv. Can you shed some light on it?

Thanks a lot,
Davide

On Thu, Mar 4, 2010 at 10:45 AM, Davide Maestroni davide.maestr...@gmail.com wrote:
> Hi Alex,
>
> what I see is that each time the file D (along with its data) is copied into the new version of node A. I understand that no diff mechanism is employed when applying COPY opv, so it is ok that the node B and all its properties are copied into the new version of A (since B is not versionable and the COPY behavior applies instead of VERSION), but I expected only the version ID of node C to be copied, not the entire sub-tree. What happens instead is that on calling VersionManager.checkin(A) a new copy of the whole tree from A to E is made.
>
> In my use case I have hundreds of files under C, and changing a property of A, and then creating a new version of it, may take several minutes. Moreover, I can observe the size of the repository (i.e. the number of bytes in the home directory) grow quickly, each time by roughly the sum of the bytes of all the files under C. And that is undesired behavior.
>
> So please tell me if I'm doing something wrong or my understanding of the specifications is not correct.
>
> Regards,
> Davide
Re: Recursive child versioning
On Thu, Mar 4, 2010 at 10:52, Davide Maestroni davide.maestr...@gmail.com wrote:
> Hi Alex, I did another test removing the node B, so the structure became like this:
>
> A (mix:versionable, nt:unstructured)
>   ^
>   |
> C (mix:versionable, nt:unstructured)
>   ^
>   |
> D (nt:file)
>   ^
>   |
> E (nt:resource)
>
> When I created a version of A, with the same number of files under C, it took only a few seconds and the repository size did not increase. I'm even more confused now about the expected behavior of VERSION opv. Can you shed some light on it?

I am not enough of a versioning expert to say whether this is a bug. Could you provide a short test case (for the case with node B)? Maybe the clue is in the api calls you make.

Regards,
Alex

--
Alexander Klimetschek
alexander.klimetsc...@day.com
Re: Recursive child versioning
Hi Alex,

thank you for your suggestions, now I have a clearer idea about the versioning of child nodes. Though I still have one doubt. In my repository I have a parent-child structure like this:

A (mix:versionable, nt:unstructured)
  ^
  |
B (nt:unstructured)
  ^
  |
C (mix:versionable, nt:unstructured)
  ^
  |
D (nt:file)
  ^
  |
E (nt:resource)

where the OnParentVersion attribute for child nodes, in the node type definition of A, B and C, is always VERSION. According to paragraph 8.2.11.2 of the JCR 1.0 specification (If *C* is not versionable then the behavior of *COPY* applies on *checkin*, however the recursive copy terminates at each versionable node encountered further below in the subtree, at which points the standard *VERSION* behavior is again followed.), what I expect when versioning the node A is that D and E are not copied, but that is not what I observe. I also looked in JCR 2.0 but couldn't find anything explaining a different behavior. What am I missing here?

Thanks again for your patience,
Davide

On Mon, Mar 1, 2010 at 10:31 PM, Alexander Klimetschek aklim...@day.com wrote:
> On Mon, Mar 1, 2010 at 22:28, Alexander Klimetschek aklim...@day.com wrote:
>> Also it is optimized for trees with fine-granular content (eg. a page in a CMS), not for arbitrary sized subfolders with lots of binary content.
>
> Ah, forgot: have you tried your scenario with a FileDataStore? Using a datastore avoids duplicate binaries in the storage, so there should only be some overhead in hashing the files upon versioning. Not sure, it might be that the versioning implementation internally temporarily copies the files, which might be slow.
>
> Regards,
> Alex
>
> --
> Alexander Klimetschek
> alexander.klimetsc...@day.com
Re: Recursive child versioning
On Wed, Mar 3, 2010 at 22:02, Davide Maestroni davide.maestr...@gmail.com wrote:
> ...what I expect when versioning the node A is that D and E are not copied, but it is not what I observe.

What exactly do you observe?

Regards,
Alex

--
Alexander Klimetschek
alexander.klimetsc...@day.com
Re: Recursive child versioning
On Mon, Mar 1, 2010 at 19:31, Davide Maestroni davide.maestr...@gmail.com wrote:
> In my repository I have a versionable root node with a few child nodes, each containing a lot of files (let's say more than 200). I set the VERSION rule for each child of the root node, which are in turn versionable nodes, while the file nodes are of type 'nt:file'.

Versioning entire folders with many files might not be efficient. What is your use case?

> The problem is that each time I create a new version of the root, all the files are copied even if not changed. That may take minutes, which is simply not acceptable.

Yes, everything is copied in Jackrabbit. There is no optimization in the version storage format (eg. using diffs), because unlike in revision control systems, it is quite likely that versions get deleted, so they are kept independent. Also it is optimized for trees with fine-granular content (eg. a page in a CMS), not for arbitrary sized subfolders with lots of binary content.

> As I understand the JCR specs, there is actually no way of stopping the versioning at one (or any specific number) level of depth, and even if I set a rule in the child nodes, so as to ignore any further level, the VERSION rule is applied to the whole root subtree.

Have you used OnParentVersion (OPV) with a value of IGNORE?

See 3.13.9 ff of the JCR 2.0 spec for more:
http://www.day.com/specs/jcr/2.0/3_Repository_Model.html#VersionableState

Or this tutorial:
http://jtoee.com/jsr-170/the_jcr_primer/5/

CND docs:
http://jackrabbit.apache.org/node-type-notation.html

Regards,
Alex

--
Alexander Klimetschek
alexander.klimetsc...@day.com
Re: Recursive child versioning
On Mon, Mar 1, 2010 at 22:28, Alexander Klimetschek aklim...@day.com wrote:
> Also it is optimized for trees with fine-granular content (eg. a page in a CMS), not for arbitrary sized subfolders with lots of binary content.

Ah, forgot: have you tried your scenario with a FileDataStore? Using a datastore avoids duplicate binaries in the storage, so there should only be some overhead in hashing the files upon versioning. Not sure, it might be that the versioning implementation internally temporarily copies the files, which might be slow.

Regards,
Alex

--
Alexander Klimetschek
alexander.klimetsc...@day.com
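Editorial sketch: the idea behind "avoids duplicate binaries" is content addressing, where a binary is stored under an identifier derived from a hash of its bytes, so storing the same bytes again adds no new record. The following is a minimal in-memory illustration of that idea only, not Jackrabbit's FileDataStore implementation; the class name `ContentAddressedStoreSketch`, its methods and the choice of SHA-256 are assumptions made for the example.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

/** In-memory illustration of a content-addressed binary store (hypothetical). */
final class ContentAddressedStoreSketch {
    // Maps the hex digest of a binary to its single stored copy.
    private final Map<String, byte[]> records = new HashMap<>();

    /** Stores the content if it is new and returns its identifier in either case. */
    String put(byte[] content) {
        String id = hex(digest(content));     // hashing is the only per-call overhead
        records.putIfAbsent(id, content.clone());
        return id;
    }

    /** Number of distinct binaries actually held. */
    int recordCount() {
        return records.size();
    }

    private static byte[] digest(byte[] content) {
        try {
            // SHA-256 is an assumption for this sketch; any stable digest would do.
            return MessageDigest.getInstance("SHA-256").digest(content);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    private static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

Putting the same bytes twice returns the same identifier and leaves `recordCount()` at 1, which corresponds to the case where a version copies an unchanged file: the hash is recomputed, but no second copy of the bytes is stored.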