Re: Implementing virtual, per session subtree
(Berry) A.W. van Halderen wrote:
> You can already see that the tree through which you can browse is exponential in size compared to the actual size of the stored nodes. It can, however, be generated from the stored data itself, so there is no need to actually store it, just refresh it. Event listeners are out, I think, because of the sheer potential size of the data and the risk of keeping things in memory once a path has been traversed.

I see. That's just too much data that needs to be kept up to date with each change. How about implementing faceted search features in Jackrabbit that allow you to efficiently query for this kind of information?

regards
marcel
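Here is a minimal Java sketch of the faceted-search idea. It shows how the children of one level of such a virtual browse tree could be computed on demand from the stored data instead of being materialized. It assumes each stored node is reduced to a flat map of property name to a single string value. FacetBrowser and facetCounts are hypothetical names, not part of the Jackrabbit API.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: children of a virtual browse node are computed on
// demand from the stored nodes, so nothing has to be materialized or kept
// up to date by event listeners.
final class FacetBrowser {

    private FacetBrowser() {
    }

    /**
     * Counts, for one facet property, how many stored nodes carry each value,
     * restricted to the nodes that match every facet value selected so far.
     * Each node is simplified to a map of property name -> single value.
     */
    static Map<String, Integer> facetCounts(List<Map<String, String>> nodes,
                                            Map<String, String> selected,
                                            String facetProperty) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted by facet value
        for (Map<String, String> node : nodes) {
            if (!matchesAll(node, selected)) {
                continue;
            }
            String value = node.get(facetProperty);
            if (value != null) {
                counts.merge(value, 1, Integer::sum);
            }
        }
        return counts;
    }

    // True if the node has every (property, value) pair chosen so far.
    private static boolean matchesAll(Map<String, String> node, Map<String, String> selected) {
        for (Map.Entry<String, String> s : selected.entrySet()) {
            if (!s.getValue().equals(node.get(s.getKey()))) {
                return false;
            }
        }
        return true;
    }
}
```

Take three nodes: {author=anna, year=2007}, {author=bob, year=2007} and {author=anna, year=2006}. Calling facetCounts(nodes, Map.of(), "year") returns {2006=1, 2007=2}. Selecting year=2007 and then asking for "author" returns {anna=1, bob=1}. In practice such counts would presumably be answered from the query index rather than by scanning nodes as this sketch does.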
Re: Status of proposed JCR 2.0 changes
Hi,

Those changes are part of JSR 283. What will be implemented depends on what the expert group decides. See also http://jcp.org/en/jsr/detail?id=283. I'm not in this expert group, by the way.

Regards,
Thomas

On Nov 18, 2007 11:26 PM, Michael Wechner [EMAIL PROTECTED] wrote:
> Hi
>
> Which of the proposed changes have been accepted or will be implemented?
>
> http://wiki.apache.org/jackrabbit/Proposed_JCR_2.0_API_Changes
>
> Thanks
>
> Michael
> --
> Michael Wechner
> Wyona - Open Source Content Management - Apache Lenya
> http://www.wyona.com http://lenya.apache.org
> [EMAIL PROTECTED] [EMAIL PROTECTED]
> +41 44 272 91 61
Re: Status of proposed JCR 2.0 changes
Hi Michael,

> Which of the proposed changes have been accepted or will be implemented?
> http://wiki.apache.org/jackrabbit/Proposed_JCR_2.0_API_Changes
> Thanks

Thomas extracted the changes for the Jackrabbit wiki from the public review document, so these come directly from the expert group. So I think the consensus can be considered reasonably solid. However, since things may still change quite a bit before the final release, these changes cannot be considered final yet.

The code for the reference implementation will again be developed inside the Jackrabbit project, so you will see these features implemented sooner rather than later.

regards,
david
[jira] Commented: (JCR-1213) UUIDDocId cache does not work properly because of weakReferences in combination with new instance for combined indexreader
[ https://issues.apache.org/jira/browse/JCR-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12543513 ]

Marcel Reutegger commented on JCR-1213:
---

I think whatever UUIDDocId calculates should be independent of the multi index reader. That is, it should only hold the document number as retrieved from the index segment. Then, in a second step, an offset should be applied, as with PlainDocId, to accommodate the multi index reader wrapping. This probably means we have to change some of the signatures, but that's OK.

> UUIDDocId cache does not work properly because of weakReferences in combination with new instance for combined indexreader
> ---
>
> Key: JCR-1213
> URL: https://issues.apache.org/jira/browse/JCR-1213
> Project: Jackrabbit
> Issue Type: Improvement
> Components: query
> Affects Versions: 1.3.3
> Reporter: Ard Schrijvers
> Fix For: 1.4
>
> Queries that use ChildAxisQuery or DescendantSelfAxisQuery use getParent() to determine whether the parents are correct and whether the result is allowed. getParent() is called recursively for every hit and can become very expensive. Hence, the parents are cached in DocId.UUIDDocId. Currently, DocId.UUIDDocId instances are cached through a WeakReference to the CombinedIndexReader, but this CombinedIndexReader is recreated all the time, which means a gc() may remove the "expensive" cache.
>
> A much better solution is to hold a reference not to the CombinedIndexReader but to each index reader segment. This means that in getParent(int n) in SearchIndex,
>
>   return id.getDocumentNumber(this);
>
> needs to be replaced by
>
>   return id.getDocumentNumber(subReaders[i]);
>
> and something similar in CachingMultiReader. That is all.
>
> Obviously, when a node or property is added, removed or changed, some parts of the cached DocId.UUIDDocId will become invalid, but it is mainly the small indexes that are updated frequently, and those are less expensive to recompute.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
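Here is a minimal sketch of the two-step idea. It assumes a segment is identified by its instance, and that a multi reader numbers documents by shifting each segment's numbers by the total maxDoc of all preceding segments (the usual Lucene multi-reader layout). IndexSegment, MultiSegmentView and SegmentLocalDocId are hypothetical stand-ins, not Jackrabbit or Lucene classes.

```java
import java.util.List;

// Hypothetical stand-in for an index segment reader: only maxDoc matters here.
final class IndexSegment {
    final int maxDoc;

    IndexSegment(int maxDoc) {
        this.maxDoc = maxDoc;
    }
}

// Hypothetical stand-in for a multi index reader: an ordered list of segments.
final class MultiSegmentView {
    private final List<IndexSegment> segments;

    MultiSegmentView(List<IndexSegment> segments) {
        this.segments = segments;
    }

    /** Offset of the given segment instance in this view, or -1 if absent. */
    int offsetOf(IndexSegment segment) {
        int offset = 0;
        for (IndexSegment s : segments) {
            if (s == segment) { // identity: a cached number is valid only for this instance
                return offset;
            }
            offset += s.maxDoc;
        }
        return -1;
    }
}

// Cached id that stores only the document number within its own segment.
final class SegmentLocalDocId {
    private final IndexSegment segment;
    private final int localDocNumber;

    SegmentLocalDocId(IndexSegment segment, int localDocNumber) {
        this.segment = segment;
        this.localDocNumber = localDocNumber;
    }

    /**
     * Second step: translate to a document number of the current multi reader.
     * Returns -1 if the segment is no longer wrapped, i.e. the id must be recomputed.
     */
    int getDocumentNumber(MultiSegmentView reader) {
        int offset = reader.offsetOf(segment);
        return offset < 0 ? -1 : offset + localDocNumber;
    }
}
```

Take segments a (maxDoc 100) and b (maxDoc 50). An id cached for local document 7 in b resolves to 107. Now recreate the multi reader with a replaced by a new 120-document segment. The same cached id resolves to 127 without recomputation, and it returns -1 only once b itself is gone.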
[jira] Commented: (JCR-1213) UUIDDocId cache does not work properly because of weakReferences in combination with new instance for combined indexreader
[ https://issues.apache.org/jira/browse/JCR-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12543520 ]

Ard Schrijvers commented on JCR-1213:
---

> I think whatever UUIDDocId calculates should be independent of the multi index reader. That is, it should only hold the document number as retrieved from the index segment. Then in a second step an offset should be applied

Yes, this is probably the cleanest way. I now also understand why we had the discussion about how to solve the issue: you were already thinking about computing the docNumber in a second step, so all that matters is the segment instance.

I did build the latter part, about the segment instance, though we might discuss whether it is a good way. I added a method to the MultiIndexReader interface,

  public boolean hasIndexReaderInstance(IndexReader indexReader);

and in CachingMultiReader and CombinedIndexReader I keep track of the subreader instances with an IdentityHashMap. In UUIDDocId I can find the reader instance the doc was found in by changing SingleTermDocs to hold a reference to its segment reader. Obviously, I now have to cast reader.termDocs(id) to SingleTermDocs, which we might not like.

Anyway, I'll try to add the second-step offset for calculating the docNumber, as you suggested, sometime this week, and create a patch (that might be easier than talking about a solution).
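The identity-tracking part of the proposal could look roughly like the following sketch. It is only an illustration under assumptions. The type parameter R stands in for Lucene's IndexReader so the example has no dependencies, and SubReaderTracking and TrackingMultiReader are hypothetical names. Only the hasIndexReaderInstance signature comes from the comment.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical interface mirroring the proposed hasIndexReaderInstance() method;
// R stands in for IndexReader.
interface SubReaderTracking<R> {
    boolean hasIndexReaderInstance(R indexReader);
}

// Hypothetical multi reader that remembers exactly which sub reader instances it wraps.
final class TrackingMultiReader<R> implements SubReaderTracking<R> {
    // Identity-based set: two distinct reader instances are never treated as
    // the same reader, even if the reader class overrides equals().
    private final Set<R> subReaders =
            Collections.newSetFromMap(new IdentityHashMap<R, Boolean>());

    TrackingMultiReader(List<R> readers) {
        subReaders.addAll(readers);
    }

    @Override
    public boolean hasIndexReaderInstance(R indexReader) {
        return subReaders.contains(indexReader);
    }
}
```

An identity-based set matters here because the question is whether this exact segment instance is still wrapped. Whether some equal-looking reader is present does not matter.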
[jira] Assigned: (JCR-1214) DocId.UUIDDocId should not have a string attr uuid, but two long's lsb and msb
[ https://issues.apache.org/jira/browse/JCR-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcel Reutegger reassigned JCR-1214:
-

Assignee: Marcel Reutegger

> DocId.UUIDDocId should not have a string attr uuid, but two long's lsb and msb
> ---
>
> Key: JCR-1214
> URL: https://issues.apache.org/jira/browse/JCR-1214
> Project: Jackrabbit
> Issue Type: Improvement
> Components: query
> Affects Versions: 1.3.3
> Reporter: Ard Schrijvers
> Assignee: Marcel Reutegger
> Fix For: 1.4
>
> Once JCR-1213 is solved, lots of DocId.UUIDDocId instances can be cached without being cleaned up after every gc(). The number of cached UUIDDocIds can grow very large, depending on the size of the repository. Therefore, instead of storing
>
>   private String uuid;
>
> we can make it more memory efficient by storing two longs, the lsb and msb of the uuid. For 1,000,000 cached parent UUIDDocIds this might save about 100 MB of memory.
>
> I even did a test in which I removed the uuid string entirely, without using msb or lsb. When everything works properly (with references to index reader segments, see JCR-1213), the uuid is never needed again: in UUIDDocId
>
>   getDocumentNumber(IndexReader reader) throws IOException {
>
> we could set uuid = null just before the return. This works perfectly well, because when an index reader is recreated, the CachingIndexReader is recreated as well, and hence DocId[] parents is recreated too.
>
> So, IMO, we might be able to remove the uuid entirely once the docNumber is found in DocId.UUIDDocId (obviously after JCR-1213). WDYT?
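The layout proposed in the original description, two longs instead of a String, might look like the following minimal sketch. TwoLongUuidDocId is a hypothetical class, not the committed change; according to the resolution, the committed change uses a UUID instance. The sketch relies only on the standard java.util.UUID constructor and accessors.

```java
import java.util.UUID;

// Hypothetical sketch for JCR-1214 (not the committed fix): the uuid is kept
// as two primitive longs instead of a String object and its character data.
final class TwoLongUuidDocId {
    private final long msb; // most significant 64 bits of the uuid
    private final long lsb; // least significant 64 bits of the uuid

    TwoLongUuidDocId(String uuid) {
        UUID u = UUID.fromString(uuid);
        this.msb = u.getMostSignificantBits();
        this.lsb = u.getLeastSignificantBits();
    }

    /** Rebuilds a UUID object only when it is actually needed. */
    UUID getUUID() {
        return new UUID(msb, lsb);
    }

    /** Canonical lower-case string form, e.g. for looking the node up again. */
    String getUUIDString() {
        return getUUID().toString();
    }
}
```

The String form is reconstructed on demand, so the only per-instance cost of the uuid is the 16 bytes of the two fields.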
[jira] Updated: (JCR-1214) DocId.UUIDDocId should not have a string attr uuid
[ https://issues.apache.org/jira/browse/JCR-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcel Reutegger updated JCR-1214:
--

Summary: DocId.UUIDDocId should not have a string attr uuid (was: DocId.UUIDDocId should not have a string attr uuid, but two long's lsb and msb)
[jira] Resolved: (JCR-1214) DocId.UUIDDocId should not have a string attr uuid
[ https://issues.apache.org/jira/browse/JCR-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcel Reutegger resolved JCR-1214.
---

Resolution: Fixed

Replaced the uuid String with a UUID instance. svn revision: 596274.
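The description's further idea was to drop the uuid once the document number is known. It could be combined with the UUID field from the resolution roughly as follows. This sketch makes two assumptions. First, the index lookup is abstracted as a hypothetical ToIntFunction<UUID> that returns -1 when nothing is found. Second, as the description notes, the cached id is discarded whenever the index reader it was resolved against is recreated, since otherwise the stored number could go stale. LazyUuidDocId is a hypothetical name.

```java
import java.util.UUID;
import java.util.function.ToIntFunction;

// Hypothetical sketch: keep a UUID only until the document number is resolved.
final class LazyUuidDocId {
    private UUID uuid;          // needed only until the document number is found
    private int docNumber = -1; // -1 means "not resolved yet"

    LazyUuidDocId(UUID uuid) {
        this.uuid = uuid;
    }

    /** Resolves once through the lookup, then releases the UUID object. */
    int getDocumentNumber(ToIntFunction<UUID> lookup) {
        if (docNumber < 0 && uuid != null) {
            int n = lookup.applyAsInt(uuid);
            if (n >= 0) {
                docNumber = n;
                uuid = null; // the UUID object can now be garbage collected
            }
        }
        return docNumber;
    }

    boolean holdsUuid() {
        return uuid != null;
    }
}
```

After the first successful lookup, the id no longer references the UUID object. Later calls return the cached number without another lookup.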
[jira] Commented: (JCR-1214) DocId.UUIDDocId should not have a string attr uuid
[ https://issues.apache.org/jira/browse/JCR-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12543530 ]

Ard Schrijvers commented on JCR-1214:
-

> For simplicity I would rather use an instance of UUID instead of two longs.

By the way, the overhead of a UUID instance (12-16 bytes for an Object) seems redundant to me if storing two longs is enough. My intention was to reduce the memory consumption of the string uuid (~120 bytes) to that of two longs. Of course, a UUID instance is still smaller than the original 120 bytes, but it still uses redundant memory (though I admit UUID instances are probably small enough :-) ).

Thx for solving the issue!
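A rough estimate makes this trade-off concrete. It uses only the figures quoted in this thread: ~120 bytes per uuid String, 16 bytes for two longs, and h = 12-16 bytes of object overhead for a UUID instance. It assumes one million cached parent ids, counts 10^6 bytes as 1 MB, and ignores reference fields and alignment padding, so the numbers are indicative only.

```latex
\begin{aligned}
\text{String} \to \text{two longs:}\quad & (120 - 16)\,\text{B} \times 10^{6} \approx 104\,\text{MB saved}\\
\text{String} \to \text{UUID instance:}\quad & (120 - 16 - h)\,\text{B} \times 10^{6},\quad h \in [12, 16] \;\Rightarrow\; \approx 88 \text{ to } 92\,\text{MB saved}\\
\text{UUID instance vs.\ two longs:}\quad & h \times 10^{6}\,\text{B} \approx 12 \text{ to } 16\,\text{MB extra}
\end{aligned}
```

Under these assumptions, both variants recover most of the roughly 100 MB mentioned in the issue. The remaining 12 to 16 MB per million ids is the "redundant" memory referred to above.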
[jira] Commented: (JCR-1214) DocId.UUIDDocId should not have a string attr uuid
[ https://issues.apache.org/jira/browse/JCR-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12543554 ]

Ard Schrijvers commented on JCR-1214:
-

:-) Thanks again. (By the way, after filing the JIRA issue I of course intended to work on it myself. I thought of combining it with JCR-1213 so as not to create an extra patch, because this one is such a basic change.) Anyway, thanks for solving it, and I'll try to get the JCR-1213 patch done sometime this week or weekend.
Re: Status of proposed JCR 2.0 changes
David Nuescheler wrote:
> Hi Michael,
>
>> Which of the proposed changes have been accepted or will be implemented?
>> http://wiki.apache.org/jackrabbit/Proposed_JCR_2.0_API_Changes
>> Thanks
>
> Thomas extracted the changes for the jackrabbit wiki from the public review document, so these come directly from the expert group. So I think the consensus could be considered reasonably solid.

Thomas and David, thanks very much for your quick feedback.

> Now given the fact that things may still change quite a bit until final release these cannot be considered final yet. The code for the reference implementation will be developed inside the Jackrabbit project again, so you will see these features implemented rather sooner than later.

What timeframe do you have in mind?

Cheers

Michael

> regards,
> david

--
Michael Wechner
Wyona - Open Source Content Management - Yanel, Yulup
http://www.wyona.com
[EMAIL PROTECTED], [EMAIL PROTECTED]
+41 44 272 91 61
[jira] Commented: (JCR-1043) Package names for spring project do not match update ocm packages
[ https://issues.apache.org/jira/browse/JCR-1043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12543674 ]

Christophe Lombart commented on JCR-1043:
-

Patch applied, thanks! We have to continue working on it because the unit test fails. It seems that the JCR support in the Spring module uses an old version of Jackrabbit that is not compatible with the OCM code.

> Package names for spring project do not match update ocm packages
> -
>
> Key: JCR-1043
> URL: https://issues.apache.org/jira/browse/JCR-1043
> Project: Jackrabbit
> Issue Type: Bug
> Components: jcr-mapping
> Affects Versions: 1.3
> Environment: All environments
> Reporter: Padraic Hannon
> Assignee: Christophe Lombart
> Priority: Blocker
> Attachments: spring-mvn2.patch, spring.patch
>
> The spring package and tests reference the old Graffito package naming scheme.
Re: Configuration of derby.log file path
Hi,

On Nov 18, 2007 1:17 AM, Michael Wechner [EMAIL PROTECTED] wrote:
> I am using the TransientRepository with the default repo config and it works very well so far. Using system properties I have been able to use a custom path for my repo config and the repo itself, but I haven't found out how to configure the path of the log file derby.log.

The derby.log file is written by the embedded Derby database. You can control the log file name by setting the derby.stream.error.file system property.

BR,

Jukka Zitting
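Here is a minimal sketch of applying this in code. It assumes the property is set before the embedded Derby engine boots, that is, before the TransientRepository is first used, because a value set after Derby has started may not take effect. DerbyLogConfig and configureDerbyLog are hypothetical helper names. derby.stream.error.file is the Derby property named above.

```java
import java.io.File;

// Hypothetical helper: redirects the embedded Derby database's log file.
final class DerbyLogConfig {

    private DerbyLogConfig() {
    }

    /** Call before the repository (and thus Derby) is started for the first time. */
    static void configureDerbyLog(File logFile) {
        File absolute = logFile.getAbsoluteFile();
        File dir = absolute.getParentFile();
        if (dir != null) {
            dir.mkdirs(); // make sure the target directory exists
        }
        System.setProperty("derby.stream.error.file", absolute.getPath());
    }
}
```

For example, call DerbyLogConfig.configureDerbyLog(new File("logs", "derby.log")) at application start-up, then create the TransientRepository. The same effect can be had without code by passing -Dderby.stream.error.file=/path/to/derby.log to the JVM.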
Re: Configuration of derby.log file path
Jukka Zitting wrote:
> Hi,
>
> On Nov 18, 2007 1:17 AM, Michael Wechner [EMAIL PROTECTED] wrote:
>> I am using the TransientRepository with the default repo config and it works very fine so far. Using System properties I have been able to use a custom path for my repo config and repo itself, but I haven't found out how to configure the path of the logfile derby.log.
>
> The derby.log file is written by the embedded Derby database. You can control the log file name by setting the derby.stream.error.file system property.

Thanks very much

Michael

> BR,
>
> Jukka Zitting
Re: Realtime datastore garbage collector
Hi,

> dataStore.removeTransientIdentifiers(addedProps);

There is a problem with this approach: an identifier can be added to multiple properties, and it may also be used in other places. So you would need to keep a reference count as well, and you would need to be sure that the reference counts are updated correctly (transactionally).

It would be a good idea to implement this. However, with the current architecture of Jackrabbit (multiple change logs, multiple caches, and multiple places where values are used), verifying that such an implementation is correct is beyond my ability. I just don't know enough about the Jackrabbit core, and there are not enough test cases in the Jackrabbit core to allow automatic verification.

A simpler mechanism would be to store back-references: each data record / identifier would know who references it. The garbage collector could then follow the back-references and check whether they are still valid, removing them if not. Items without valid back-references could be deleted. This would allow very large objects to be deleted quickly (if they are no longer used, of course).

When we change the architecture of Jackrabbit (see also NGP), we should think about the data store. But at this time, I would argue it is safer to keep the data store mechanism as it is, without trying to add more features (adding more data store implementations is not a problem, of course), unless we are really fixing a bug. I think it makes more sense to spend the time improving the architecture of Jackrabbit before adding more complex algorithms to the data store (which would not be required afterwards).

Thomas
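The back-reference mechanism can be sketched as follows. This is a minimal, single-threaded illustration under stated assumptions, not a proposal for the actual data store. BackRefDataStore is a hypothetical class, properties are identified by plain strings, and the check whether a property still references a record is passed in as a predicate. Because each record keeps a set of referencing properties, an identifier shared by several properties is handled without a separate reference count. Concurrency and records that are stored but not yet referenced by a saved property are deliberately left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.BiPredicate;

// Hypothetical sketch: every data record remembers which properties referenced it.
final class BackRefDataStore {
    // record identifier -> ids of properties that (at some point) referenced it
    private final Map<String, Set<String>> backRefs = new HashMap<>();

    /** Called when a property is set to a binary value stored under recordId. */
    void addReference(String recordId, String propertyId) {
        backRefs.computeIfAbsent(recordId, k -> new HashSet<>()).add(propertyId);
    }

    boolean contains(String recordId) {
        return backRefs.containsKey(recordId);
    }

    /**
     * Follows the back-references of every record, drops the ones that are no
     * longer valid, and deletes records left without any valid back-reference.
     *
     * @param stillReferences (propertyId, recordId) -> true if the property
     *                        still exists and still points at the record
     * @return identifiers of the deleted records
     */
    List<String> collectGarbage(BiPredicate<String, String> stillReferences) {
        List<String> deleted = new ArrayList<>();
        Iterator<Map.Entry<String, Set<String>>> it = backRefs.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Set<String>> entry = it.next();
            String recordId = entry.getKey();
            entry.getValue().removeIf(propertyId -> !stillReferences.test(propertyId, recordId));
            if (entry.getValue().isEmpty()) {
                it.remove(); // no valid back-reference left: delete the record
                deleted.add(recordId);
            }
        }
        return deleted;
    }
}
```

Here a record survives as long as any one of its properties still points at it. A record whose back-references are all stale is deleted directly, without traversing the whole repository to find live references.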