Re: [jr3] Orderable child nodes: required (to be the default)?

2012-01-23 Thread Julian Reschke

On 2012-01-23 11:43, Felix Meschberger wrote:

...
I am not sure, whether this proposal does not open a can of worms: Consider 
using a node for child nodes whose retrieval should be ordered by insertion 
order. This is currently guaranteed by switching on ordered child nodes on the 
parent node, right ?

So, applications using this will always use ordered nodes and thus suffer from 
performance again ... not good.
...


How is "ordered by insertion order" and "allowing manual re-ordering" 
different from an application point of view?



...


Best regards, Julia


Re: [jr3] Orderable child nodes: required (to be the default)?

2012-01-23 Thread Michael Dürig

On Sat, Jan 21, 2012 at 11:37 PM, Tobias Bocanegra  wrote:

so for large childnode lists, a stable but uncontrollable order
would be ok, although violating the spec?


I wouldn't violate the spec for this. If you have an orderable node
(like nt:unstructured), then the repository should keep track of the
order of child nodes regardless of how many they are. IMHO it's OK for
the repository *not* to scale up or perform well if someone dumps a
million child nodes to an orderable parent.

However, it would be great if we could implement efficient storage for
nodes with lots (i.e. millions) of unorderable children. In such a
case I'd say we can expect the client to explicitly use a parent node
with a custom node type that doesn't require orderability.


I am not sure, whether this proposal does not open a can of worms: Consider 
using a node for child nodes whose retrieval should be ordered by insertion 
order. This is currently guaranteed by switching on ordered child nodes on the 
parent node, right ?

So, applications using this will always use ordered nodes and thus suffer from 
performance again ... not good.


Well, there is no free lunch...

If we want to go 'big data' and distributed, we will have to make trade 
offs. What we need to do IMO is to carefully identify the areas where we 
can/want to trade off and make sure these are well understood.


Michael


Re: [jr3] Orderable child nodes: required (to be the default)?

2012-01-23 Thread Alexander Klimetschek
On 23.01.12 11:43, "Felix Meschberger"  wrote:
>Wouldn't it be such, that "unordered" might mean "no defined but stable
>ordering" while ordered means "insertion order unless eplicitly changed" ?

+1

Alex

-- 
Alexander Klimetschek
Developer // Adobe (Day) // Berlin - Basel






Re: [jr3] Orderable child nodes: required (to be the default)?

2012-01-23 Thread Felix Meschberger
Hi,

Am 23.01.2012 um 10:43 schrieb Jukka Zitting:

> Hi,
> 
> On Sat, Jan 21, 2012 at 11:37 PM, Tobias Bocanegra  wrote:
>> so for large childnode lists, a stable but uncontrollable order
>> would be ok, although violating the spec?
> 
> I wouldn't violate the spec for this. If you have an orderable node
> (like nt:unstructured), then the repository should keep track of the
> order of child nodes regardless of how many they are. IMHO it's OK for
> the repository *not* to scale up or perform well if someone dumps a
> million child nodes to an orderable parent.
> 
> However, it would be great if we could implement efficient storage for
> nodes with lots (i.e. millions) of unorderable children. In such a
> case I'd say we can expect the client to explicitly use a parent node
> with a custom node type that doesn't require orderability.

I am not sure, whether this proposal does not open a can of worms: Consider 
using a node for child nodes whose retrieval should be ordered by insertion 
order. This is currently guaranteed by switching on ordered child nodes on the 
parent node, right ?

So, applications using this will always use ordered nodes and thus suffer from 
performance again ... not good.


> 
> I wouldn't even promise a stable iteration order for such cases. The
> repository should be free to for example reorder the internal data
> structure based on frequent access patterns.

+1

Though paging to the list will then not be possible at all !

Wouldn't it be such, that "unordered" might mean "no defined but stable 
ordering" while ordered means "insertion order unless eplicitly changed" ?

Both must really perform well or we get bad press again

Regards
Felix

Re: [jr3] Orderable child nodes: required (to be the default)?

2012-01-23 Thread Jukka Zitting
Hi,

On Sat, Jan 21, 2012 at 11:37 PM, Tobias Bocanegra  wrote:
> so for large childnode lists, a stable but uncontrollable order
> would be ok, although violating the spec?

I wouldn't violate the spec for this. If you have an orderable node
(like nt:unstructured), then the repository should keep track of the
order of child nodes regardless of how many they are. IMHO it's OK for
the repository *not* to scale up or perform well if someone dumps a
million child nodes to an orderable parent.

However, it would be great if we could implement efficient storage for
nodes with lots (i.e. millions) of unorderable children. In such a
case I'd say we can expect the client to explicitly use a parent node
with a custom node type that doesn't require orderability.

I wouldn't even promise a stable iteration order for such cases. The
repository should be free to for example reorder the internal data
structure based on frequent access patterns.

BR,

Jukka Zitting


Re: [jr3] Orderable child nodes: required (to be the default)?

2012-01-23 Thread Alexander Klimetschek
I think there is a use case for large flat unordered lists (using a node
type with orderable = false). For example, dictionaries, where the key and
value are in the properties and the node name is not really relevant, and
where you don't want to create nested structures just for the sake of
scalability - which also increases the total number of nodes.

Cheers,
Alex

On 21.01.12 23:37, "Tobias Bocanegra"  wrote:

>I agree,
>so for large childnode lists, a stable but uncontrollable order would
>be ok, although violating the spec?
>
>--
>
>toby
>On Sat, Jan 21, 2012 at 5:04 AM, Stefan Guggisberg
> wrote:
>> On Sat, Jan 21, 2012 at 3:00 AM, Tobias Bocanegra 
>>wrote:
>>> nt:unstructured has orderable childnodes, and we also preach to use
>>> nt:unstructured whereever you can. Also, i assume that a lot of
>>> applications benefit from node ordering (like a CMS). So I think that
>>> the majority of the repositories have a lot of nodes with ordered
>>> childnodes.
>>
>>
>> agreed, at least for small child node lists (e.g. < 1k child nodes).
>> but i doubt there's a need for large (e.g. 1M entries)  orderable
>> child node lists.
>>
>>> thus i would not put too much effort in having an
>>> efficient non-ordered format, but making the ordered storage as fast
>>> as possible.
>>
>> i am thinking of large child node collections with stable order.
>> however, they wouldn't be insertion-ordered nor client-orderable.
>> that's the compromise.
>>
>>> using lexicographical ordering for unordered child nodes
>>> would certainly be a good option,
>>
>> that would be an implementation detail. as long as the order
>> is stable (repeated reads of the same node revision should
>> provide the same order of the child nodes) we shouldn't make
>> any promises about how the order is defined.
>>
>> cheers
>> stefan
>>
>>> [...] as i can reduce the lookup (but not
>>> insert). also, i think that re-orderings happen relatively rarely.
>>>
>>> regards, toby
>>>
>>> On Fri, Jan 20, 2012 at 9:13 AM, Stefan Guggisberg
>>>  wrote:
 On Fri, Jan 20, 2012 at 5:43 PM, Jukka Zitting
 wrote:
> Hi,
>
> Thinking about this more generally, it would definitely be useful to
> be able to use a more efficient underlying storage for unorderable
> nodes. However, there still are hard use cases that require us to
> support orderable nodes as indicated by the type of the parent node.
> Thus the implementation basically has two options:
>
> 1) Use a single data structure for all nodes, like Jackrabbit
> currently does. This simplifies the implementation but prevents us
> from enjoying the performance and scalability benefits of unorderable
> nodes.
>
> 2) Use two data structures, one for orderable and another for
> unorderable nodes. This is more complex (not least because it links
> node types to the underlying storage model), but is probably required
> if we want to efficiently support up to millions of child nodes per a
> single parent.

 i agree that it's probably required for efficiently handling both
large
 and small child node lists.

 i am not sure that it necessarily means linking node types to the
 underlying storage modal. the microkernel prototype currently has
 no notion of node types and that's IMO a good thing.

 node types have a way to dominant role in the current jackrabbit
 core implementation. jackrabbit started out as the official reference
 implementation for jsr-170, the focus was on supporting every
 feature of the spec.

 in jr3 i guess we can and should trade some 'note type correctness'
 for improved efficiency.

>
> It might also be possible to have the underlying storage model
> unorderable, and just include extra ordering metadata at a higher
> layer where also node types are handled. Option 2b, if you like, with
> probably fairly significant overhead when iterating over nodes (the
> implementation would probably need to pre-fetch all child nodes in
> advance to access their ordering metadata).
>
> If we do have native support for unorderable nodes, then I wouldn't
> promise any particular ordering (alphabetic, insertion order, etc.)
> but rather leave it undefined like in Java's Map interface. The
> underlying implementation is then free to use whatever ordering it
> thinks is best.

 i agree.

 cheers
 stefan

>
> PS. Note that ordering affects not just node traversal, but also the
> document ordering used in search results. Though in practice we
> already now disable document ordering of search results by default.
>
> BR,
>
> Jukka Zitting



-- 
Alexander Klimetschek
Developer // Adobe (Day) // Berlin - Basel






Re: [jr3] Orderable child nodes: required (to be the default)?

2012-01-21 Thread Tobias Bocanegra
I agree,
so for large childnode lists, a stable but uncontrollable order would
be ok, although violating the spec?

--

toby
On Sat, Jan 21, 2012 at 5:04 AM, Stefan Guggisberg
 wrote:
> On Sat, Jan 21, 2012 at 3:00 AM, Tobias Bocanegra  wrote:
>> nt:unstructured has orderable childnodes, and we also preach to use
>> nt:unstructured whereever you can. Also, i assume that a lot of
>> applications benefit from node ordering (like a CMS). So I think that
>> the majority of the repositories have a lot of nodes with ordered
>> childnodes.
>
>
> agreed, at least for small child node lists (e.g. < 1k child nodes).
> but i doubt there's a need for large (e.g. 1M entries)  orderable
> child node lists.
>
>> thus i would not put too much effort in having an
>> efficient non-ordered format, but making the ordered storage as fast
>> as possible.
>
> i am thinking of large child node collections with stable order.
> however, they wouldn't be insertion-ordered nor client-orderable.
> that's the compromise.
>
>> using lexicographical ordering for unordered child nodes
>> would certainly be a good option,
>
> that would be an implementation detail. as long as the order
> is stable (repeated reads of the same node revision should
> provide the same order of the child nodes) we shouldn't make
> any promises about how the order is defined.
>
> cheers
> stefan
>
>> [...] as i can reduce the lookup (but not
>> insert). also, i think that re-orderings happen relatively rarely.
>>
>> regards, toby
>>
>> On Fri, Jan 20, 2012 at 9:13 AM, Stefan Guggisberg
>>  wrote:
>>> On Fri, Jan 20, 2012 at 5:43 PM, Jukka Zitting  
>>> wrote:
 Hi,

 Thinking about this more generally, it would definitely be useful to
 be able to use a more efficient underlying storage for unorderable
 nodes. However, there still are hard use cases that require us to
 support orderable nodes as indicated by the type of the parent node.
 Thus the implementation basically has two options:

 1) Use a single data structure for all nodes, like Jackrabbit
 currently does. This simplifies the implementation but prevents us
 from enjoying the performance and scalability benefits of unorderable
 nodes.

 2) Use two data structures, one for orderable and another for
 unorderable nodes. This is more complex (not least because it links
 node types to the underlying storage model), but is probably required
 if we want to efficiently support up to millions of child nodes per a
 single parent.
>>>
>>> i agree that it's probably required for efficiently handling both large
>>> and small child node lists.
>>>
>>> i am not sure that it necessarily means linking node types to the
>>> underlying storage modal. the microkernel prototype currently has
>>> no notion of node types and that's IMO a good thing.
>>>
>>> node types have a way to dominant role in the current jackrabbit
>>> core implementation. jackrabbit started out as the official reference
>>> implementation for jsr-170, the focus was on supporting every
>>> feature of the spec.
>>>
>>> in jr3 i guess we can and should trade some 'note type correctness'
>>> for improved efficiency.
>>>

 It might also be possible to have the underlying storage model
 unorderable, and just include extra ordering metadata at a higher
 layer where also node types are handled. Option 2b, if you like, with
 probably fairly significant overhead when iterating over nodes (the
 implementation would probably need to pre-fetch all child nodes in
 advance to access their ordering metadata).

 If we do have native support for unorderable nodes, then I wouldn't
 promise any particular ordering (alphabetic, insertion order, etc.)
 but rather leave it undefined like in Java's Map interface. The
 underlying implementation is then free to use whatever ordering it
 thinks is best.
>>>
>>> i agree.
>>>
>>> cheers
>>> stefan
>>>

 PS. Note that ordering affects not just node traversal, but also the
 document ordering used in search results. Though in practice we
 already now disable document ordering of search results by default.

 BR,

 Jukka Zitting

Re: [jr3] Orderable child nodes: required (to be the default)?

2012-01-21 Thread Stefan Guggisberg
On Sat, Jan 21, 2012 at 3:00 AM, Tobias Bocanegra  wrote:
> nt:unstructured has orderable childnodes, and we also preach to use
> nt:unstructured whereever you can. Also, i assume that a lot of
> applications benefit from node ordering (like a CMS). So I think that
> the majority of the repositories have a lot of nodes with ordered
> childnodes.


agreed, at least for small child node lists (e.g. < 1k child nodes).
but i doubt there's a need for large (e.g. 1M entries)  orderable
child node lists.

> thus i would not put too much effort in having an
> efficient non-ordered format, but making the ordered storage as fast
> as possible.

i am thinking of large child node collections with stable order.
however, they wouldn't be insertion-ordered nor client-orderable.
that's the compromise.

> using lexicographical ordering for unordered child nodes
> would certainly be a good option,

that would be an implementation detail. as long as the order
is stable (repeated reads of the same node revision should
provide the same order of the child nodes) we shouldn't make
any promises about how the order is defined.

cheers
stefan

> [...] as i can reduce the lookup (but not
> insert). also, i think that re-orderings happen relatively rarely.
>
> regards, toby
>
> On Fri, Jan 20, 2012 at 9:13 AM, Stefan Guggisberg
>  wrote:
>> On Fri, Jan 20, 2012 at 5:43 PM, Jukka Zitting  
>> wrote:
>>> Hi,
>>>
>>> Thinking about this more generally, it would definitely be useful to
>>> be able to use a more efficient underlying storage for unorderable
>>> nodes. However, there still are hard use cases that require us to
>>> support orderable nodes as indicated by the type of the parent node.
>>> Thus the implementation basically has two options:
>>>
>>> 1) Use a single data structure for all nodes, like Jackrabbit
>>> currently does. This simplifies the implementation but prevents us
>>> from enjoying the performance and scalability benefits of unorderable
>>> nodes.
>>>
>>> 2) Use two data structures, one for orderable and another for
>>> unorderable nodes. This is more complex (not least because it links
>>> node types to the underlying storage model), but is probably required
>>> if we want to efficiently support up to millions of child nodes per a
>>> single parent.
>>
>> i agree that it's probably required for efficiently handling both large
>> and small child node lists.
>>
>> i am not sure that it necessarily means linking node types to the
>> underlying storage modal. the microkernel prototype currently has
>> no notion of node types and that's IMO a good thing.
>>
>> node types have a way to dominant role in the current jackrabbit
>> core implementation. jackrabbit started out as the official reference
>> implementation for jsr-170, the focus was on supporting every
>> feature of the spec.
>>
>> in jr3 i guess we can and should trade some 'note type correctness'
>> for improved efficiency.
>>
>>>
>>> It might also be possible to have the underlying storage model
>>> unorderable, and just include extra ordering metadata at a higher
>>> layer where also node types are handled. Option 2b, if you like, with
>>> probably fairly significant overhead when iterating over nodes (the
>>> implementation would probably need to pre-fetch all child nodes in
>>> advance to access their ordering metadata).
>>>
>>> If we do have native support for unorderable nodes, then I wouldn't
>>> promise any particular ordering (alphabetic, insertion order, etc.)
>>> but rather leave it undefined like in Java's Map interface. The
>>> underlying implementation is then free to use whatever ordering it
>>> thinks is best.
>>
>> i agree.
>>
>> cheers
>> stefan
>>
>>>
>>> PS. Note that ordering affects not just node traversal, but also the
>>> document ordering used in search results. Though in practice we
>>> already now disable document ordering of search results by default.
>>>
>>> BR,
>>>
>>> Jukka Zitting


Re: [jr3] Orderable child nodes: required (to be the default)?

2012-01-20 Thread Tobias Bocanegra
nt:unstructured has orderable childnodes, and we also preach to use
nt:unstructured whereever you can. Also, i assume that a lot of
applications benefit from node ordering (like a CMS). So I think that
the majority of the repositories have a lot of nodes with ordered
childnodes. thus i would not put too much effort in having an
efficient non-ordered format, but making the ordered storage as fast
as possible. using lexicographical ordering for unordered child nodes
would certainly be a good option, as i can reduce the lookup (but not
insert). also, i think that re-orderings happen relatively rarely.

regards, toby

On Fri, Jan 20, 2012 at 9:13 AM, Stefan Guggisberg
 wrote:
> On Fri, Jan 20, 2012 at 5:43 PM, Jukka Zitting  
> wrote:
>> Hi,
>>
>> Thinking about this more generally, it would definitely be useful to
>> be able to use a more efficient underlying storage for unorderable
>> nodes. However, there still are hard use cases that require us to
>> support orderable nodes as indicated by the type of the parent node.
>> Thus the implementation basically has two options:
>>
>> 1) Use a single data structure for all nodes, like Jackrabbit
>> currently does. This simplifies the implementation but prevents us
>> from enjoying the performance and scalability benefits of unorderable
>> nodes.
>>
>> 2) Use two data structures, one for orderable and another for
>> unorderable nodes. This is more complex (not least because it links
>> node types to the underlying storage model), but is probably required
>> if we want to efficiently support up to millions of child nodes per a
>> single parent.
>
> i agree that it's probably required for efficiently handling both large
> and small child node lists.
>
> i am not sure that it necessarily means linking node types to the
> underlying storage modal. the microkernel prototype currently has
> no notion of node types and that's IMO a good thing.
>
> node types have a way to dominant role in the current jackrabbit
> core implementation. jackrabbit started out as the official reference
> implementation for jsr-170, the focus was on supporting every
> feature of the spec.
>
> in jr3 i guess we can and should trade some 'note type correctness'
> for improved efficiency.
>
>>
>> It might also be possible to have the underlying storage model
>> unorderable, and just include extra ordering metadata at a higher
>> layer where also node types are handled. Option 2b, if you like, with
>> probably fairly significant overhead when iterating over nodes (the
>> implementation would probably need to pre-fetch all child nodes in
>> advance to access their ordering metadata).
>>
>> If we do have native support for unorderable nodes, then I wouldn't
>> promise any particular ordering (alphabetic, insertion order, etc.)
>> but rather leave it undefined like in Java's Map interface. The
>> underlying implementation is then free to use whatever ordering it
>> thinks is best.
>
> i agree.
>
> cheers
> stefan
>
>>
>> PS. Note that ordering affects not just node traversal, but also the
>> document ordering used in search results. Though in practice we
>> already now disable document ordering of search results by default.
>>
>> BR,
>>
>> Jukka Zitting

Re: [jr3] Orderable child nodes: required (to be the default)?

2012-01-20 Thread Stefan Guggisberg
On Fri, Jan 20, 2012 at 5:43 PM, Jukka Zitting  wrote:
> Hi,
>
> Thinking about this more generally, it would definitely be useful to
> be able to use a more efficient underlying storage for unorderable
> nodes. However, there still are hard use cases that require us to
> support orderable nodes as indicated by the type of the parent node.
> Thus the implementation basically has two options:
>
> 1) Use a single data structure for all nodes, like Jackrabbit
> currently does. This simplifies the implementation but prevents us
> from enjoying the performance and scalability benefits of unorderable
> nodes.
>
> 2) Use two data structures, one for orderable and another for
> unorderable nodes. This is more complex (not least because it links
> node types to the underlying storage model), but is probably required
> if we want to efficiently support up to millions of child nodes per a
> single parent.

i agree that it's probably required for efficiently handling both large
and small child node lists.

i am not sure that it necessarily means linking node types to the
underlying storage modal. the microkernel prototype currently has
no notion of node types and that's IMO a good thing.

node types have a way to dominant role in the current jackrabbit
core implementation. jackrabbit started out as the official reference
implementation for jsr-170, the focus was on supporting every
feature of the spec.

in jr3 i guess we can and should trade some 'note type correctness'
for improved efficiency.

>
> It might also be possible to have the underlying storage model
> unorderable, and just include extra ordering metadata at a higher
> layer where also node types are handled. Option 2b, if you like, with
> probably fairly significant overhead when iterating over nodes (the
> implementation would probably need to pre-fetch all child nodes in
> advance to access their ordering metadata).
>
> If we do have native support for unorderable nodes, then I wouldn't
> promise any particular ordering (alphabetic, insertion order, etc.)
> but rather leave it undefined like in Java's Map interface. The
> underlying implementation is then free to use whatever ordering it
> thinks is best.

i agree.

cheers
stefan

>
> PS. Note that ordering affects not just node traversal, but also the
> document ordering used in search results. Though in practice we
> already now disable document ordering of search results by default.
>
> BR,
>
> Jukka Zitting


Re: [jr3] Orderable child nodes: required (to be the default)?

2012-01-20 Thread Stefan Guggisberg
On Fri, Jan 20, 2012 at 4:56 PM, Jukka Zitting  wrote:
> Hi,
>
> On Fri, Jan 20, 2012 at 4:35 PM, Thomas Mueller  wrote:
>> If we use alphabetical child node lists by default for Jackrabbit 3, then
>> we would need to use a different default node type?
>
> Right, though I suppose there are quite a few applications that use
> nt:unstructured either as-is or as a base type for many purposes.
>
>> Is it even possible to use a different node type than nt:unstructured
>> in Jackrabbit 3?
>
> It is. The default use of nt:unstructured in Jackrabbit 1.x and 2.x
> boils down to the use of nt:unstructured as the base type of rep:root.
> It should be fairly straightforward to change the root type to specify
> an unorderable type as the default for any child nodes.

yes, i agree.

cheers
stefan

>
> Doing so may cause some breakage in existing clients that don't
> explicitly specify the types they want, but that breakage should be
> manageable with proper release notes and workaround instructions.
>
> BR,
>
> Jukka Zitting


Re: [jr3] Orderable child nodes: required (to be the default)?

2012-01-20 Thread Jukka Zitting
Hi,

Thinking about this more generally, it would definitely be useful to
be able to use a more efficient underlying storage for unorderable
nodes. However, there still are hard use cases that require us to
support orderable nodes as indicated by the type of the parent node.
Thus the implementation basically has two options:

1) Use a single data structure for all nodes, like Jackrabbit
currently does. This simplifies the implementation but prevents us
from enjoying the performance and scalability benefits of unorderable
nodes.

2) Use two data structures, one for orderable and another for
unorderable nodes. This is more complex (not least because it links
node types to the underlying storage model), but is probably required
if we want to efficiently support up to millions of child nodes per a
single parent.

It might also be possible to have the underlying storage model
unorderable, and just include extra ordering metadata at a higher
layer where also node types are handled. Option 2b, if you like, with
probably fairly significant overhead when iterating over nodes (the
implementation would probably need to pre-fetch all child nodes in
advance to access their ordering metadata).

If we do have native support for unorderable nodes, then I wouldn't
promise any particular ordering (alphabetic, insertion order, etc.)
but rather leave it undefined like in Java's Map interface. The
underlying implementation is then free to use whatever ordering it
thinks is best.

PS. Note that ordering affects not just node traversal, but also the
document ordering used in search results. Though in practice we
already now disable document ordering of search results by default.

BR,

Jukka Zitting


Re: [jr3] Orderable child nodes: required (to be the default)?

2012-01-20 Thread Stefan Guggisberg
On Fri, Jan 20, 2012 at 4:09 PM, Thomas Mueller  wrote:
> Hi,
>
> The current Jackrabbit implementation uses orderable child nodes by default,
> meaning nodes are returned in the same order as they are created. As an
> example, if I create the nodes /test/c, then /test/b, and then /test/a, and
> then read the node list, I will get them back in that same order. I'm
> wondering if this is really required (at all, and to be the default
> behavior). Specially if there are a lot of child nodes.
>
> Instead of using the insertion order, I propose to sort the child node list
> alphabetically: /test/a, /test/b, /test/c - no matter in what order the
> nodes where created. This will allow an efficient lookup (using binary
> search), and for large child node lists (more than one thousand child nodes
> per node) modifications would be about twice as fast as using insertion
> order (plus it would need less disk space).
>
> Would it be a problem if Jackrabbit 3 sorts child nodes by name (always, or
> at least by default)?

i guess no. unless that this behavior is expected/mandated.

>
> Another option would be to use (insertion) order until the child node list
> gets too large (for example 1000 elements), and from then on sort the child
> nodes by name (the previous ordering would then be lost).

i am currently working on a similar approach in the microkernel prototype
(sandbox). i am thinking about switching from 'insertion ordered' to
'unordered' (the order depends on the implementation, in my case it's
based on the hashcode of the child node name) once a certain threshold
is reached.

cheers
stefan
>
> Regards,
> Thomas
>


Re: [jr3] Orderable child nodes: required (to be the default)?

2012-01-20 Thread Jukka Zitting
Hi,

On Fri, Jan 20, 2012 at 4:35 PM, Thomas Mueller  wrote:
> If we use alphabetical child node lists by default for Jackrabbit 3, then
> we would need to use a different default node type?

Right, though I suppose there are quite a few applications that use
nt:unstructured either as-is or as a base type for many purposes.

> Is it even possible to use a different node type than nt:unstructured
> in Jackrabbit 3?

It is. The default use of nt:unstructured in Jackrabbit 1.x and 2.x
boils down to the use of nt:unstructured as the base type of rep:root.
It should be fairly straightforward to change the root type to specify
an unorderable type as the default for any child nodes.

Doing so may cause some breakage in existing clients that don't
explicitly specify the types they want, but that breakage should be
manageable with proper release notes and workaround instructions.

BR,

Jukka Zitting


Re: [jr3] Orderable child nodes: required (to be the default)?

2012-01-20 Thread Alexander Klimetschek
On 20.01.12 16:50, "Thomas Mueller"  wrote:
>>Isn't the current appending the simplest and fastest?
>
>No. It requires to keep two indexes: one by number, and one by name.
>Having only one index (by name) is about twice as fast, and needs about
>half the disk space / memory.

Ah, I guess this is because both cases (orderable & non-orderable) share
the same data structure underneath. Wouldn't it make sense for J3 to
support both cases naturally in the persistence layer?

Cheers,
Alex

-- 
Alexander Klimetschek
Developer // Adobe (Day) // Berlin - Basel






Re: [jr3] Orderable child nodes: required (to be the default)?

2012-01-20 Thread Thomas Mueller
Hi,

> Isn't the current appending the simplest and fastest?

No. It requires to keep two indexes: one by number, and one by name.
Having only one index (by name) is about twice as fast, and needs about
half the disk space / memory.

Regards,
Thomas



Re: [jr3] Orderable child nodes: required (to be the default)?

2012-01-20 Thread Alexander Klimetschek
On 20.01.12 16:09, "Thomas Mueller" 
mailto:[email protected]>> wrote:

The current Jackrabbit implementation uses orderable child nodes by default, 
meaning nodes are returned in the same order as they are created.

I think "same order as they are created" is not really "orderable child nodes" 
- if a node does not have the orderable flag, you cannot re-order them (through 
the API). I'd call this default behavior simply "appending". And I think for 
nodes with the orderable flag, this is a good default behaviour and shouldn't 
be changed

Instead of using the insertion order, I propose to sort the child node list 
alphabetically: /test/a, /test/b, /test/c - no matter in what order the nodes 
where created.

...but for non-orderable nodes (i.e. a big "bunch") I agree that alphabetical 
order by default is good.

Based on experience, all the UIs for non-orderable nodes (e.g 
nt:hierarchyNodes) end up doing alphabetical ordering anyway (for the default 
view and if they don't allow for custom display ordering), as that's what makes 
it easy for humans to find the way around.

This will allow an efficient lookup (using binary search), and for large child 
node lists (more than one thousand child nodes per node) modifications would be 
about twice as fast as using insertion order (plus it would need less disk 
space).

Isn't the current appending the simplest and fastest?

Would it be a problem if Jackrabbit 3 sorts child nodes by name (always, or at 
least by default)?

Yes, but for non-orderable nodes only, of course.

For orderable nodes I think a goal should be to keep the list ordered the whole 
stack down (API, in-memory, in-between formats (like json) and persistence 
layer), to handle it efficiently. I.e. without any "ordering number" column and 
thus requiring re-orderings at some level.

Another option would be to use (insertion) order until the child node list gets 
too large (for example 1000 elements), and from then on sort the child nodes by 
name (the previous ordering would then be lost).

Hmm, that would take away the benefit of the alphabetical default order, as 
applications would still need to order it alphabetically in UIs (for the lists 
shorter than 1000 elements).

Cheers,
Alex

--
Alexander Klimetschek
Developer // Adobe (Day) // Berlin - Basel


Re: [jr3] Orderable child nodes: required (to be the default)?

2012-01-20 Thread Thomas Mueller
Hi,

I guess one problem is that the default node type, nt:unstructured,
supports orderable child nodes:
http://www.day.com/specs/jcr/2.0/3_Repository_Model.html#3.7.13.1%20nt:unst
ructured "This node type also supports client-orderable child nodes." I
guess "client-orderable" also means "insertion order".

If we use alphabetical child node lists by default for Jackrabbit 3, then
we would need to use a different default node type? Is it even possible to
use a different node type than nt:unstructured in Jackrabbit 3?

Regards,
Thomas



On 1/20/12 4:26 PM, "[email protected]"  wrote:

>On 2012-01-20 16:09, Thomas Mueller wrote:
>> Hi,
>>
>> The current Jackrabbit implementation uses orderable child nodes by
>> default, meaning nodes are returned in the same order as they are
>> created. As an example, if I create the nodes /test/c, then /test/b, and
>> then /test/a, and then read the node list, I will get them back in that
>> same order. I'm wondering if this is really required (at all, and to be
>> the default behavior). Specially if there are a lot of child nodes.
>>
>> Instead of using the insertion order, I propose to sort the child node
>> list alphabetically: /test/a, /test/b, /test/c - no matter in what order
>> the nodes where created. This will allow an efficient lookup (using
>> binary search), and for large child node lists (more than one thousand
>> child nodes per node) modifications would be about twice as fast as
>> using insertion order (plus it would need less disk space).
>>
>> Would it be a problem if Jackrabbit 3 sorts child nodes by name (always,
>> or at least by default)?
>> ...
>
>Spec-wise the returned ordering shouldn't matter unless the node type
>indicates that the child nodes are indeed ordered.
>
>If we suspect that there's code out there making other assumptions, we
>may want to change that in an earlier jackrabbit release to find out for
>sure :-)



Re: [jr3] Orderable child nodes: required (to be the default)?

2012-01-20 Thread Julian Reschke

On 2012-01-20 16:09, Thomas Mueller wrote:

Hi,

The current Jackrabbit implementation uses orderable child nodes by
default, meaning nodes are returned in the same order as they are
created. As an example, if I create the nodes /test/c, then /test/b, and
then /test/a, and then read the node list, I will get them back in that
same order. I'm wondering if this is really required (at all, and to be
the default behavior). Specially if there are a lot of child nodes.

Instead of using the insertion order, I propose to sort the child node
list alphabetically: /test/a, /test/b, /test/c - no matter in what order
the nodes where created. This will allow an efficient lookup (using
binary search), and for large child node lists (more than one thousand
child nodes per node) modifications would be about twice as fast as
using insertion order (plus it would need less disk space).

Would it be a problem if Jackrabbit 3 sorts child nodes by name (always,
or at least by default)?
...


Spec-wise the returned ordering shouldn't matter unless the node type 
indicates that the child nodes are indeed ordered.


If we suspect that there's code out there making other assumptions, we 
may want to change that in an earlier jackrabbit release to find out for 
sure :-)