Re: Recursive child versioning

2010-03-29 Thread Davide Maestroni
Hi Tobias and all,

thank you very much for your support. Your help has been invaluable,
everything is clear now on my side, and I see that Jackrabbit 2.0 implements
exactly what is described in JCR 2.0. I will structure my repository
accordingly.

Regards,

Davide




Re: Recursive child versioning

2010-03-26 Thread Alexander Klimetschek
On Thu, Mar 25, 2010 at 18:36, Davide Maestroni
davide.maestr...@gmail.com wrote:
 Hi Alex,

 sorry for the late reply, I've been out during the past weeks.
 Actually I'm not sure how to write a unit test for Jackrabbit; besides, I am
 using it through Sling and Jetty, so I cannot provide you the whole code
 since the architecture I have implemented is quite complex.
 Anyway, I will try to write a small test in the next few days.

That would be cool, because then the chance of it getting fixed is an
order of magnitude higher, since people could actually reproduce the
issue repeatedly :-) It doesn't have to be fancy, some code snippet
starting with a session on an empty repository, then creating the
necessary content and doing the problematic operations is enough.

Regards,
Alex

-- 
Alexander Klimetschek
alexander.klimetsc...@day.com


Re: Recursive child versioning

2010-03-26 Thread Tobias Bocanegra
hi,
that is not entirely true.

if you checkin A, then B (and its entire subtree) is copied, since B is
not versionable (irrespective of the OPV of B).

3.13.9 Versionable State
[...]
5. For each child node C of N where
•   C has an OPV of COPY,
a copy of the entire subgraph rooted at C (regardless of the OPV
values of the sub-items) is added to the frozen node, preserving the
name of C and the names and values of all its sub-items.
[...]

if you checkin B, then all C's should not be copied if they have an OPV=VERSION:

3.13.9 Versionable State
[...]
6. For each child node C of N where:
•   C has an OPV of VERSION
Under simple versioning, the same behavior as COPY. Under full
versioning, if C is not mix:versionable, the same behavior as
COPY.
Under full versioning, if C is mix:versionable, then a special
nt:versionedChild node with a reference to the version history of C is
substituted in place of C as a child of the frozen node.
[...]

So your statement:
... however the recursive copy terminates at each versionable node
encountered further below in the subtree, 

is not correct. where did you read this?
regards, toby
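
The two quoted rules can be sketched as a small toy model. This is only an illustration of JCR 2.0 section 3.13.9, rules 5 and 6, under full versioning; it is not Jackrabbit code and does not use the javax.jcr API. The class name OpvCheckinSketch, the Node type and the "ref(...)" notation for an nt:versionedChild are all made up for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of JCR 2.0 section 3.13.9, rules 5 and 6 (hypothetical names, not Jackrabbit code). */
class OpvCheckinSketch {
    enum Opv { COPY, VERSION }

    static final class Node {
        final String name;
        final boolean versionable; // true if the node carries mix:versionable
        final Opv opv;             // OnParentVersion of this node as seen by its parent
        final List<Node> children = new ArrayList<>();

        Node(String name, boolean versionable, Opv opv) {
            this.name = name;
            this.versionable = versionable;
            this.opv = opv;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    /** Renders the frozen node that a checkin of n would produce. */
    static String checkin(Node n) {
        List<String> parts = new ArrayList<>();
        for (Node c : n.children) {
            if (c.opv == Opv.VERSION && c.versionable) {
                // Rule 6, versionable child: only a reference to its version history is stored.
                parts.add("ref(" + c.name + ")");
            } else {
                // Rule 5 (COPY), or rule 6 with a non-versionable child: same behavior as COPY.
                parts.add(deepCopy(c));
            }
        }
        return render(n.name, parts);
    }

    /** Copies the whole subgraph, regardless of the OPV or versionability of sub-items. */
    static String deepCopy(Node n) {
        List<String> parts = new ArrayList<>();
        for (Node c : n.children) {
            parts.add(deepCopy(c));
        }
        return render(n.name, parts);
    }

    private static String render(String name, List<String> parts) {
        return parts.isEmpty() ? name : name + "{" + String.join(",", parts) + "}";
    }

    public static void main(String[] args) {
        Node e = new Node("E", false, Opv.VERSION);
        Node d = new Node("D", false, Opv.VERSION).add(e);
        Node c = new Node("C", true, Opv.VERSION).add(d);
        Node b = new Node("B", false, Opv.VERSION).add(c);

        // A -> B -> C -> D -> E: B is not versionable, so everything below A is copied.
        System.out.println(checkin(new Node("A", true, Opv.VERSION).add(b)));
        // A -> C -> D -> E: C is versionable, so only a reference is stored.
        System.out.println(checkin(new Node("A", true, Opv.VERSION).add(c)));
    }
}
```

In this model the copy started at the non-versionable B never stops at C, which matches the full copy observed with node B in the tree and the cheap checkin observed without it.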






Re: Recursive child versioning

2010-03-26 Thread Tobias Bocanegra
On Fri, Mar 26, 2010 at 3:23 PM, Davide Maestroni
davide.maestr...@gmail.com wrote:
 Hi Tobias,

 what you wrote is perfectly right, however if you look at
 http://www.day.com/specs/jcr/1.0/8.2.11.2_VERSION.html (I have just
 copy/pasted from there), the versioning process described is a bit
 different.
 I really don't know if there is a bug or not, I'm just trying to clearly
 understand how the versioning works in Jackrabbit and what the expected
 behavior is in each case.
 Since it looks to me that JCR 1.0 and JCR 2.0 say slightly different
 things (but with huge impact) and I have observed the same behavior using
 Jackrabbit 1.0 and Jackrabbit 2.0, I am a bit confused about how things should
 work.
 If you say that the versioning of a whole sub-tree depends only on the child
 node, and not on all its descendants, I am perfectly fine with that, I just
 want to be sure that what I see is correct, and my understanding of the
 specifications is correct as well.
yes. jcr2.0 is far more precise about the versioning. if i remember
correctly, in jackrabbit 1.0 the behavior was different, indeed.
jackrabbit 2.0 implements the versioning as described in jcr2.0

regards, toby


 Thanks,

 Davide




Re: Recursive child versioning

2010-03-25 Thread Davide Maestroni
Hi Alex,

sorry for the late reply, I've been out during the past weeks.
Actually I'm not sure how to write a unit test for Jackrabbit; besides, I am
using it through Sling and Jetty, so I cannot provide you the whole code
since the architecture I have implemented is quite complex.
Anyway, I will try to write a small test in the next few days.

Thanks,

Davide




Re: Recursive child versioning

2010-03-04 Thread Davide Maestroni
Hi Alex,

what I see is that each time the file D (along with its data) is copied in
the new version of node A. I understand that no diff mechanism is employed
when applying COPY opv, so it is ok that the node B and all its properties
are copied in the new version of A (since B is not versionable and the COPY
behavior applies instead of VERSION), but I expected only the version ID of
node C to be copied, not the entire sub-tree. What happens instead, is that
on calling VersionManager.checkin(A) a new copy of the whole tree from A to
E is made.

In my use case I have hundreds of files under C, and changing a property of
A and then creating a new version of it may take several minutes.
Moreover, I can see the size of the repository (i.e. the number of bytes
in the home directory) increase quickly, each time by roughly the sum of
the bytes of all the files under C. That is undesired behavior, so
please tell me if I'm doing something wrong or if my understanding of the
specifications is not correct.

Regards,

Davide





Re: Recursive child versioning

2010-03-04 Thread Davide Maestroni
Hi Alex,

I did another test removing the node B, so the structure became like this:

A (mix:versionable, nt:unstructured)
  ^
  |
C (mix:versionable, nt:unstructured)
  ^
  |
D (nt:file)
  ^
  |
E (nt:resource)

When I created a version of A, with the same number of files under C, it
took only a few seconds and the repository size did not increase.
I'm even more confused now about the expected behavior of the VERSION OPV. Can
you shed some light on it?

Thanks a lot,

Davide







Re: Recursive child versioning

2010-03-04 Thread Alexander Klimetschek
On Thu, Mar 4, 2010 at 10:52, Davide Maestroni
davide.maestr...@gmail.com wrote:
 Hi Alex,

 I did another test removing the node B, so the structure became like this:

 A (mix:versionable, nt:unstructured)
      ^
      |
 C (mix:versionable, nt:unstructured)
      ^
      |
 D (nt:file)
      ^
      |
 E (nt:resource)

 When I created a version of A, with the same number of files under C, it
 took only a few seconds and the repository size did not increase.
 I'm even more confused now about the expected behavior of the VERSION OPV. Can
 you shed some light on it?

I am not enough of a versioning expert to say whether this is a bug. Could you
provide a short test case (for the case with node B)? Maybe the clue
is in the API calls you make.

Regards,
Alex

-- 
Alexander Klimetschek
alexander.klimetsc...@day.com


Re: Recursive child versioning

2010-03-03 Thread Davide Maestroni
Hi Alex,

thank you for your suggestions, now I have a clearer idea about the
versioning of child nodes.
Though I still have one doubt. In my repository I have a parent-child
structure like this:

A (mix:versionable, nt:unstructured)
  ^
  |
B (nt:unstructured)
  ^
  |
C (mix:versionable, nt:unstructured)
  ^
  |
D (nt:file)
  ^
  |
E (nt:resource)

Where the OnParentVersion attribute for child nodes, in the node type
definition of A, B and C, is always VERSION.
According to paragraph 8.2.11.2 of the JCR 1.0 specification (If *C* is
not versionable then the behavior of *COPY* applies on *checkin*, however the
recursive copy terminates at each versionable node encountered further below
in the subtree, at which points the standard *VERSION* behavior is again
followed.), what I expect when versioning the node A is that D and E are
not copied, but that is not what I observe. I also looked in JCR 2.0 but
couldn't find anything explaining a different behavior.
What am I missing here?

Thanks again for your patience,

Davide






Re: Recursive child versioning

2010-03-03 Thread Alexander Klimetschek
On Wed, Mar 3, 2010 at 22:02, Davide Maestroni
davide.maestr...@gmail.com wrote:
 ...what I expect when versioning the node A is that D and E are
 not copied, but it is not what I observe.

What exactly do you observe?

Regards,
Alex

-- 
Alexander Klimetschek
alexander.klimetsc...@day.com


Re: Recursive child versioning

2010-03-01 Thread Alexander Klimetschek
On Mon, Mar 1, 2010 at 19:31, Davide Maestroni
davide.maestr...@gmail.com wrote:
 In my repository I have a versionable root node with a few child nodes, each
 containing a lot of files (let's say more than 200). I set the VERSION rule
 for each child of the root node, which are in turn versionable nodes, while
 the file nodes are of type 'nt:file'.

Versioning entire folders with many files might not be efficient.
What is your use case?

 The problem is that each time I create a new version of the root, all the
 files are copied even if not changed. That may take minutes, which is simply
 not acceptable.

Yes, everything is copied in Jackrabbit. There is no optimization in
the version storage format (eg. using diffs), because unlike in
revision control systems, it is quite likely that versions get
deleted, so they are kept independent. Also it is optimized for trees
with fine-granular content (eg. a page in a CMS), not for arbitrary
sized subfolders with lots of binary content.

 In my understanding of what is written in the JCR specs, there is actually no way of
 stopping the versioning at one (or any specific number of) level(s) of depth, and
 even if I set a rule in the child nodes so as to ignore any further level, the
 VERSION rule is applied to the whole root subtree.

Have you used OnParentVersion (OPV) with a value of IGNORE?

See 3.13.9 ff of the JCR 2.0 spec for more:
http://www.day.com/specs/jcr/2.0/3_Repository_Model.html#VersionableState

Or this tutorial:
http://jtoee.com/jsr-170/the_jcr_primer/5/

CND docs:
http://jackrabbit.apache.org/node-type-notation.html

Regards,
Alex

-- 
Alexander Klimetschek
alexander.klimetsc...@day.com


Re: Recursive child versioning

2010-03-01 Thread Alexander Klimetschek
On Mon, Mar 1, 2010 at 22:28, Alexander Klimetschek aklim...@day.com wrote:
 Also it is optimized for trees
 with fine-granular content (eg. a page in a CMS), not for arbitrary
 sized subfolders with lots of binary content.

Ah, forgot: have you tried your scenario with a FileDataStore? Using a
datastore avoids duplicate binaries in the storage, so there should
only be some overhead in hashing the files upon versioning. Not sure,
it might be that the versioning implementation internally temporarily
copies the files, which might be slow.

Regards,
Alex

-- 
Alexander Klimetschek
alexander.klimetsc...@day.com
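
The idea that a datastore avoids duplicate binaries at the cost of hashing can be illustrated with a minimal content-addressed store. This is a sketch only, assuming the stored key is a digest of the bytes; the class DedupStoreSketch and its methods are hypothetical, the choice of SHA-256 is an assumption, and none of this is the actual FileDataStore implementation.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory content-addressed store; not the Jackrabbit FileDataStore. */
class DedupStoreSketch {
    private final Map<String, byte[]> blobs = new HashMap<>();

    /** Stores the bytes under their digest and returns that identifier. */
    String put(byte[] data) {
        String id = sha256Hex(data);          // hashing is the per-call overhead
        blobs.putIfAbsent(id, data.clone());  // identical content is kept only once
        return id;
    }

    /** Number of distinct binaries actually held by the store. */
    int storedBlobs() {
        return blobs.size();
    }

    static String sha256Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        DedupStoreSketch store = new DedupStoreSketch();
        byte[] file = "same file content".getBytes(StandardCharsets.UTF_8);

        // Simulates two versions that both contain the unchanged file.
        String first = store.put(file);
        String second = store.put(file);

        System.out.println(first.equals(second)); // same identifier for the same content
        System.out.println(store.storedBlobs());  // number of distinct binaries stored
    }
}
```

With such a scheme, copying an unchanged file into a new version adds another reference to an existing identifier rather than another copy of the bytes.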