Re: Implementing virtual, per session subtree

2007-11-19 Thread Marcel Reutegger

(Berry) A.W. van Halderen wrote:

You can in fact already see that the tree through which you
can browse is exponential in size compared to the actual size of the
stored nodes.
It can however be generated from the stored data itself, so there is no
need to actually store it, just refresh. Event listeners are out, I think,
because of the sheer potential size of the data, and the risk of keeping
things in memory once a path has been traversed.


I see. That's just too much data that needs to be kept up to date with each
change.


How about implementing faceted search features in Jackrabbit that allow you
to efficiently query for this kind of information?


regards
 marcel


Re: Status of proposed JCR 2.0 changes

2007-11-19 Thread Thomas Mueller
Hi,

Those changes are part of JSR 283. What will be implemented depends on
what the expert group decides. See also
http://jcp.org/en/jsr/detail?id=283. I'm not in this expert group by
the way.

Regards,
Thomas

On Nov 18, 2007 11:26 PM, Michael Wechner [EMAIL PROTECTED] wrote:
 Hi

 Which of the proposed changes have been accepted resp. will be implemented?

 http://wiki.apache.org/jackrabbit/Proposed_JCR_2.0_API_Changes

 Thanks

 Michael

 --
 Michael Wechner
 Wyona  -   Open Source Content Management   -Apache Lenya
 http://www.wyona.com  http://lenya.apache.org
 [EMAIL PROTECTED][EMAIL PROTECTED]
 +41 44 272 91 61




Re: Status of proposed JCR 2.0 changes

2007-11-19 Thread David Nuescheler
Hi Michael,

 Which of the proposed changes have been accepted resp. will be implemented?
 http://wiki.apache.org/jackrabbit/Proposed_JCR_2.0_API_Changes
 Thanks

Thomas extracted the changes for the jackrabbit wiki from the public review
document, so these come directly from the expert group. So I think the
consensus could be considered reasonably solid.

That said, things may still change quite a bit before the final release,
so these cannot be considered final yet.

The code for the reference implementation will be developed inside the
Jackrabbit project again, so you will see these features implemented
sooner rather than later.

regards,
david


[jira] Commented: (JCR-1213) UUIDDocId cache does not work properly because of weakReferences in combination with new instance for combined indexreader

2007-11-19 Thread Marcel Reutegger (JIRA)

[ 
https://issues.apache.org/jira/browse/JCR-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12543513
 ] 

Marcel Reutegger commented on JCR-1213:
---

I think whatever UUIDDocId calculates should be independent of the multi index 
reader. That is, it should only hold the document number as retrieved from the 
index segment. Then in a second step an offset should be applied, as with the
PlainDocId to accommodate the multi index reader wrapping. This probably means 
we have to change some of the signatures, but that's OK.
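
The two-step scheme described above can be sketched roughly as follows. This is only an illustration under assumptions: the names SegmentDocId and SegmentOffsets are hypothetical and are not Jackrabbit classes. The id keeps only the document number relative to its own index segment, and the offset of that segment within the combined reader is added in a separate step.

```java
// Hypothetical sketch, not Jackrabbit code: all names are made up for illustration.

/** Step 1: a doc id that only knows its number within one index segment. */
final class SegmentDocId {
    final int segmentIndex;   // position of the segment in the combined reader
    final int localDocNumber; // document number inside that segment

    SegmentDocId(int segmentIndex, int localDocNumber) {
        this.segmentIndex = segmentIndex;
        this.localDocNumber = localDocNumber;
    }
}

/** Step 2: translates segment-local numbers into combined-reader numbers. */
final class SegmentOffsets {
    private final int[] starts; // starts[i] = first global doc number of segment i

    SegmentOffsets(int[] segmentSizes) {
        starts = new int[segmentSizes.length];
        int next = 0;
        for (int i = 0; i < segmentSizes.length; i++) {
            starts[i] = next;
            next += segmentSizes[i];
        }
    }

    int toGlobal(SegmentDocId id) {
        return starts[id.segmentIndex] + id.localDocNumber;
    }
}
```

Because only the second step depends on how the segments are combined, a cached SegmentDocId in this sketch would stay usable when the combined reader is rebuilt around the same segments.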

 UUIDDocId cache does not work properly because of weakReferences in 
 combination with new instance for combined indexreader 
 ---

 Key: JCR-1213
 URL: https://issues.apache.org/jira/browse/JCR-1213
 Project: Jackrabbit
  Issue Type: Improvement
  Components: query
Affects Versions: 1.3.3
Reporter: Ard Schrijvers
 Fix For: 1.4


 Queries that use ChildAxisQuery or DescendantSelfAxisQuery call
 getParent() to check whether the parents are correct and whether the
 result is allowed. getParent() is called recursively for every hit, and
 can become very expensive. Hence, in DocId.UUIDDocId, the parents are cached.
 Currently, DocId.UUIDDocId instances are cached by holding a WeakReference to
 the CombinedIndexReader, but this CombinedIndexReader is recreated all the
 time, implying that a gc() is allowed to remove the 'expensive' cache.
 A much better solution is to hold a WeakReference not to the
 CombinedIndexReader, but to each index reader segment. This
 means that in getParent(int n) in SearchIndex the statement
 return id.getDocumentNumber(this) needs to be replaced by return
 id.getDocumentNumber(subReaders[i]); and something similar in
 CachingMultiReader.
 That is all. Obviously, when a node/property is added/removed/changed, some
 parts of the cached DocId.UUIDDocId will be invalid, but it is mainly the
 small indexes that are updated frequently, and those are less expensive to
 recompute.
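
To illustrate why the choice of reader instance matters for the cache described above, here is a minimal sketch. It assumes a hypothetical class CachedDocNumber and a type parameter R that merely stands in for an index reader; it is not the actual UUIDDocId code. The cached value is reused only when the caller passes the very same reader instance it was computed for.

```java
import java.lang.ref.WeakReference;
import java.util.function.ToIntFunction;

// Hypothetical sketch, not Jackrabbit code: R stands in for an index reader type.
final class CachedDocNumber<R> {
    private WeakReference<R> owner; // reader the cached number belongs to
    private int docNumber = -1;

    /** Returns the cached number if it was computed for this exact reader instance. */
    int get(R reader, ToIntFunction<R> expensiveLookup) {
        if (owner == null || owner.get() != reader) {
            // Different (or collected) reader: the cached value cannot be trusted.
            docNumber = expensiveLookup.applyAsInt(reader);
            owner = new WeakReference<>(reader);
        }
        return docNumber;
    }
}
```

If the key is a combined reader that is recreated for every query, each call takes the expensive path. If the key is a long-lived segment reader, repeated calls hit the cache until that segment itself is replaced.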

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (JCR-1213) UUIDDocId cache does not work properly because of weakReferences in combination with new instance for combined indexreader

2007-11-19 Thread Ard Schrijvers (JIRA)

[ 
https://issues.apache.org/jira/browse/JCR-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12543520
 ] 

Ard Schrijvers commented on JCR-1213:
-

 I think whatever UUIDDocId calculates should be independent of the multi index
 reader. That is, it should only hold the document number as retrieved from the
 index segment. Then in a second step an offset should be applied

Yes, this is probably the cleanest way. I now also understand why we had the
discussion about how to solve the issue. You were already thinking about
computing the docNumber in a second step, hence all that matters is the
segment instance.

So the latter part, tracking the segment instance, I did build, though we might
discuss whether it is a good approach. I added a method to the MultiIndexReader
interface,

public boolean hasIndexReaderInstance(IndexReader indexReader);

and in CachingMultiReader and CombinedIndexReader I keep track of sub-reader
instances with an IdentityHashMap.
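
A rough sketch of that bookkeeping, assuming a hypothetical helper class SubReaderRegistry with a type parameter R standing in for IndexReader (the actual patch is not shown in this thread and may look different):

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical sketch, not the actual patch: R stands in for IndexReader.
final class SubReaderRegistry<R> {
    // Identity-based set: membership is decided with ==, never with equals().
    private final Set<R> subReaders =
            Collections.newSetFromMap(new IdentityHashMap<R, Boolean>());

    SubReaderRegistry(Iterable<R> readers) {
        for (R reader : readers) {
            subReaders.add(reader);
        }
    }

    /** True only if this exact reader instance is one of the tracked sub-readers. */
    boolean hasIndexReaderInstance(R reader) {
        return subReaders.contains(reader);
    }
}
```

An identity-based map fits here because the question is whether this particular reader object is still part of the multi reader, not whether some equal-looking reader is.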

In UUIDDocId I can find the reader instance the doc was found in by changing
SingleTermDocs to hold a reference to its segment reader. Obviously, I now
have to cast reader.termDocs(id) to SingleTermDocs, which we might not like.

Anyway, I'll try to add the second-step offset when calculating the docNumber,
as you suggested, sometime this week, and create a patch (might be easier than
talking about a solution).






[jira] Assigned: (JCR-1214) DocId.UUIDDocId should not have a string attr uuid, but two long's lsb and msb

2007-11-19 Thread Marcel Reutegger (JIRA)

 [ 
https://issues.apache.org/jira/browse/JCR-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcel Reutegger reassigned JCR-1214:
-

Assignee: Marcel Reutegger

 DocId.UUIDDocId should not have a string attr uuid, but two long's lsb and 
 msb 
 ---

 Key: JCR-1214
 URL: https://issues.apache.org/jira/browse/JCR-1214
 Project: Jackrabbit
  Issue Type: Improvement
  Components: query
Affects Versions: 1.3.3
Reporter: Ard Schrijvers
Assignee: Marcel Reutegger
 Fix For: 1.4


 Once JCR-1213 is solved, lots of DocId.UUIDDocId instances can be cached
 without being cleaned up after every gc(). The number of cached UUIDDocId
 instances can grow very large, depending on the size of the repository.
 Therefore, instead of storing the private String uuid; we can make it more
 memory efficient by storing 2 longs, the lsb and msb of the uuid. For
 1,000,000 parent UUIDDocId instances this might make a difference of about
 100 MB of memory.
 I even tested removing the uuid string entirely, without using msb or lsb,
 because, when everything works properly (with references to index reader
 segments, see JCR-1213), the uuid is never needed again: in
 UUIDDocId getDocumentNumber(IndexReader reader) throws IOException {
 we could set uuid = null just before the return. It works perfectly well,
 because when an index reader is recreated, the CachingIndexReader will be
 recreated, hence DocId[] parents will be recreated.
 So, IMO, we might be able to remove the uuid entirely once the
 docNumber is found in DocId.UUIDDocId (obviously after JCR-1213).
 WDYT?
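
As a sketch of the two-longs idea proposed above: java.util.UUID can split a textual UUID into its most and least significant 64 bits and rebuild the string from them. The class name CompactUuid is hypothetical and only serves the illustration.

```java
import java.util.UUID;

// Hypothetical sketch: stores a UUID as two primitive longs instead of a String.
final class CompactUuid {
    final long msb; // most significant 64 bits
    final long lsb; // least significant 64 bits

    private CompactUuid(long msb, long lsb) {
        this.msb = msb;
        this.lsb = lsb;
    }

    /** Parses the usual 36-character textual form. */
    static CompactUuid parse(String uuid) {
        UUID parsed = UUID.fromString(uuid);
        return new CompactUuid(parsed.getMostSignificantBits(),
                               parsed.getLeastSignificantBits());
    }

    /** Rebuilds the textual form only when it is actually needed. */
    @Override
    public String toString() {
        return new UUID(msb, lsb).toString();
    }
}
```

Two longs take 16 bytes of field data, whereas the 36 characters of the textual form alone take 72 bytes as Java chars, before any object and array overhead is counted.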

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (JCR-1214) DocId.UUIDDocId should not have a string attr uuid

2007-11-19 Thread Marcel Reutegger (JIRA)

 [ 
https://issues.apache.org/jira/browse/JCR-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcel Reutegger updated JCR-1214:
--

Summary: DocId.UUIDDocId should not have a string attr uuid  (was: 
DocId.UUIDDocId should not have a string attr uuid, but two long's lsb and msb )




[jira] Resolved: (JCR-1214) DocId.UUIDDocId should not have a string attr uuid

2007-11-19 Thread Marcel Reutegger (JIRA)

 [ 
https://issues.apache.org/jira/browse/JCR-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcel Reutegger resolved JCR-1214.
---

Resolution: Fixed

Replaced the uuid String with a UUID instance.

svn revision: 596274.




[jira] Commented: (JCR-1214) DocId.UUIDDocId should not have a string attr uuid

2007-11-19 Thread Ard Schrijvers (JIRA)

[ 
https://issues.apache.org/jira/browse/JCR-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12543530
 ] 

Ard Schrijvers commented on JCR-1214:
-

For simplicity I would rather use an instance of UUID instead of two longs.  

By the way, the overhead of a UUID instance (12-16 bytes for an Object)
seems redundant to me if storing only 2 longs is enough. My intention was to
reduce memory consumption from the string uuid (~120 bytes) to two longs.
Of course, a UUID instance is still smaller than the original 120 bytes, but it
still uses redundant memory (though I admit UUID instances are probably small
enough :-) )

Thx for solving the issue!




[jira] Commented: (JCR-1214) DocId.UUIDDocId should not have a string attr uuid

2007-11-19 Thread Ard Schrijvers (JIRA)

[ 
https://issues.apache.org/jira/browse/JCR-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12543554
 ] 

Ard Schrijvers commented on JCR-1214:
-

:-) thanks again (by the way, after filing the jira issue I of course did not
intend to leave it unsolved. I wanted to handle JCR-1213 together with
this one, to avoid creating an extra patch because this one is so basic.)
Anyway, thanks for solving it, and I'll try to get the patch for
JCR-1213 done sometime this week/weekend.




Re: Status of proposed JCR 2.0 changes

2007-11-19 Thread Michael Wechner

David Nuescheler wrote:

 Hi Michael,

  Which of the proposed changes have been accepted resp. will be implemented?
  http://wiki.apache.org/jackrabbit/Proposed_JCR_2.0_API_Changes
  Thanks

 Thomas extracted the changes for the jackrabbit wiki from the public review
 document, so these come directly from the expert group. So I think the
 consensus could be considered reasonably solid.

Thomas and David, thanks very much for your quick feedback.

 That said, things may still change quite a bit before the final release,
 so these cannot be considered final yet.

 The code for the reference implementation will be developed inside the
 Jackrabbit project again, so you will see these features implemented
 sooner rather than later.

What timeframe do you consider?

Cheers

Michael

 regards,
 david




--
Michael Wechner
Wyona  -   Open Source Content Management - Yanel, Yulup
http://www.wyona.com
[EMAIL PROTECTED], [EMAIL PROTECTED]
+41 44 272 91 61



[jira] Commented: (JCR-1043) Package names for spring project do not match update ocm packages

2007-11-19 Thread Christophe Lombart (JIRA)

[ 
https://issues.apache.org/jira/browse/JCR-1043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12543674
 ] 

Christophe Lombart commented on JCR-1043:
-

Patch applied, thanks!
We have to continue working on it because the unit test fails.
It seems that the JCR support in the Spring module uses an old version of
Jackrabbit which is not compatible with the OCM stuff.

 Package names for spring project do not match update ocm packages
 -

 Key: JCR-1043
 URL: https://issues.apache.org/jira/browse/JCR-1043
 Project: Jackrabbit
  Issue Type: Bug
  Components: jcr-mapping
Affects Versions: 1.3
 Environment: All environments
Reporter: Padraic Hannon
Assignee: Christophe Lombart
Priority: Blocker
 Attachments: spring-mvn2.patch, spring.patch


 The spring package and tests reference the old Graffito package naming
 scheme.




Re: Configuration of derby.log file path

2007-11-19 Thread Jukka Zitting
Hi,

On Nov 18, 2007 1:17 AM, Michael Wechner [EMAIL PROTECTED] wrote:
 I am using the TransientRepository with the default repo config and it
 works fine so far. Using System properties I have been able to use
 a custom path for my repo config and repo itself, but I haven't found
 out how to configure the path of the logfile derby.log.

The derby.log file is written by the embedded Derby database. You can
control the log file name by setting the derby.stream.error.file
system property.
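
For example, the property could be set programmatically along these lines. This is a minimal sketch: the helper class DerbyLogLocation and the chosen directory are assumptions for illustration. Since Derby reads its system properties when the embedded engine starts, the property presumably needs to be set before the repository is first accessed.

```java
import java.io.File;

// Hypothetical helper: points the embedded Derby error log at a chosen directory.
final class DerbyLogLocation {
    /** Sets derby.stream.error.file and returns the absolute log file path. */
    static String configure(File logDirectory) {
        File logFile = new File(logDirectory, "derby.log");
        // Assumption: this runs before the embedded Derby engine has booted.
        System.setProperty("derby.stream.error.file", logFile.getAbsolutePath());
        return logFile.getAbsolutePath();
    }
}
```

The same effect can be had without code by passing -Dderby.stream.error.file=... on the java command line.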

BR,

Jukka Zitting


Re: Configuration of derby.log file path

2007-11-19 Thread Michael Wechner

Jukka Zitting wrote:

 Hi,

 On Nov 18, 2007 1:17 AM, Michael Wechner [EMAIL PROTECTED] wrote:

  I am using the TransientRepository with the default repo config and it
  works fine so far. Using System properties I have been able to use
  a custom path for my repo config and repo itself, but I haven't found
  out how to configure the path of the logfile derby.log.

 The derby.log file is written by the embedded Derby database. You can
 control the log file name by setting the derby.stream.error.file
 system property.

Thanks very much

Michael

 BR,

 Jukka Zitting







Re: Realtime datastore garbage collector

2007-11-19 Thread Thomas Mueller
Hi,

 dataStore.removeTransientIdentifiers(addedProps);

There is a problem with this approach: an identifier can be added to
multiple properties. Also, it may be used at other places. So you
would need to keep a reference count as well. Also, you would need to
be sure the reference counts are updated correctly ('transactional').
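
To make the reference-count requirement concrete, here is a small sketch. The class IdentifierRefCounts is hypothetical and deliberately ignores the hard part mentioned above, namely keeping the counts transactionally correct across change logs and caches.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: counts how many places use a data store identifier.
final class IdentifierRefCounts {
    private final Map<String, Integer> counts = new HashMap<>();

    /** Records one more use of the identifier. */
    void acquire(String identifier) {
        counts.merge(identifier, 1, Integer::sum);
    }

    /**
     * Drops one use of the identifier.
     * Returns true when no uses remain, i.e. the record may be removed.
     */
    boolean release(String identifier) {
        Integer remaining = counts.computeIfPresent(
                identifier, (id, count) -> count > 1 ? count - 1 : null);
        return remaining == null;
    }
}
```

With an identifier used by two properties, removing it when the first property goes away would be wrong; in this sketch only the second release reports that the record is unused.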

It would be a good idea to implement this, however I think with the
current architecture of Jackrabbit (having multiple change logs,
multiple caches, and multiple places where values are used), it is
beyond my ability to verify that the implementation is correct. I just
don't know enough about the Jackrabbit core, and there are not enough
test cases in the Jackrabbit core that would allow automatic
verification.

A simpler mechanism would be to store back-references: each data
record / identifier would know who references it. The garbage
collection could then follow the back-references and check if they are
still valid (and if not, remove them). Items without valid
back-references could be deleted. This allows very large objects to be
deleted quickly (if they are not used, of course).
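
The back-reference idea could look roughly like this. It is a sketch under assumptions: BackReferenceStore is a hypothetical class, referrers are represented as plain path strings, and the validity check is supplied by the caller because the thread does not say how it would be implemented.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.BiPredicate;

// Hypothetical sketch: each data record remembers which items reference it.
final class BackReferenceStore {
    private final Map<String, Set<String>> backRefs = new LinkedHashMap<>();

    /** Notes that the item at referrerPath uses the given record. */
    void addReference(String recordId, String referrerPath) {
        backRefs.computeIfAbsent(recordId, id -> new HashSet<>()).add(referrerPath);
    }

    boolean contains(String recordId) {
        return backRefs.containsKey(recordId);
    }

    /**
     * Drops back-references that are no longer valid and deletes records
     * that have none left. Returns the ids of the deleted records.
     */
    List<String> collectGarbage(BiPredicate<String, String> stillValid) {
        List<String> deleted = new ArrayList<>();
        Iterator<Map.Entry<String, Set<String>>> it = backRefs.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Set<String>> entry = it.next();
            entry.getValue().removeIf(ref -> !stillValid.test(entry.getKey(), ref));
            if (entry.getValue().isEmpty()) {
                deleted.add(entry.getKey());
                it.remove();
            }
        }
        return deleted;
    }
}
```

Here the collector only has to check the referrers of each record, instead of scanning the whole repository for uses.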

When we change the architecture of Jackrabbit (see also NGP) we should
think about the data store.

But at this time, I would argue it is safer to keep the data store
mechanism as is, without trying to add more features (adding more data
store implementations is not a problem of course), unless we really
fix a bug. I think it makes more sense to spend the time improving the
architecture of Jackrabbit before trying to add more complex
algorithms to the data store (which would not be required afterwards).

Thomas