[jira] Commented: (JCR-688) Improve name resolution
[ https://issues.apache.org/jira/browse/JCR-688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12462987 ] Marcel Reutegger commented on JCR-688: -- I think the patch is very good, I just have one issue: name resolution through the CachingNameResolver is serialized. The call resolver.getQName(name) within CachingNameResolver.getQName() is in a synchronized(this). Can this be change in a way that multiple thread can resolve names concurrently? Improve name resolution --- Key: JCR-688 URL: https://issues.apache.org/jira/browse/JCR-688 Project: Jackrabbit Issue Type: Improvement Components: core Reporter: Jukka Zitting Priority: Minor Fix For: 1.3 Attachments: JCR-688.LocalCache.patch, JCR-688.NameResolver.patch As discussed in JCR-685, the current CachingNamespaceResolver class contains excessive synchronization causing monitor contention that reduces performance. In JCR-685 there's a proposed patch that replaces synchronization with a read-write lock that would allow concurrent read access to the name cache. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (JCR-688) Improve name resolution
[ https://issues.apache.org/jira/browse/JCR-688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12462989 ] Jukka Zitting commented on JCR-688: --- The call resolver.getQName(name) within CachingNameResolver.getQName() is in a synchronized(this).Can this be change in a way that multiple thread can resolve names concurrently? I think so. The reason for the synchronization is to guard access to the young generation maps that gets modified. I think it would be enough to make just the young map and the increaseGenerationAge() method synchronized. I'll take a look at this later today. Improve name resolution --- Key: JCR-688 URL: https://issues.apache.org/jira/browse/JCR-688 Project: Jackrabbit Issue Type: Improvement Components: core Reporter: Jukka Zitting Priority: Minor Fix For: 1.3 Attachments: JCR-688.LocalCache.patch, JCR-688.NameResolver.patch As discussed in JCR-685, the current CachingNamespaceResolver class contains excessive synchronization causing monitor contention that reduces performance. In JCR-685 there's a proposed patch that replaces synchronization with a read-write lock that would allow concurrent read access to the name cache. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (JCR-688) Improve name resolution
[ https://issues.apache.org/jira/browse/JCR-688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12463005 ] Stefan Guggisberg commented on JCR-688: --- i agree with jukka's rationale, +1 for the redesign/patch (assuming the synchronization issue mentioned by marcel gets resolved). it would be very interesting to see some 'before/after' benckmark/profiling results. Improve name resolution --- Key: JCR-688 URL: https://issues.apache.org/jira/browse/JCR-688 Project: Jackrabbit Issue Type: Improvement Components: core Reporter: Jukka Zitting Assigned To: Jukka Zitting Priority: Minor Fix For: 1.3 Attachments: JCR-688.LocalCache.patch, JCR-688.NameResolver.patch As discussed in JCR-685, the current CachingNamespaceResolver class contains excessive synchronization causing monitor contention that reduces performance. In JCR-685 there's a proposed patch that replaces synchronization with a read-write lock that would allow concurrent read access to the name cache. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (JCR-688) Improve name resolution
[ https://issues.apache.org/jira/browse/JCR-688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12463021 ] Jukka Zitting commented on JCR-688: --- Attached JCR-688.NameResolver.v2.patch with reduced synchronization and a slightly improved increaseGenerationAge() method. A quick ad-hoc test with the above testPropertyAccess method running in 20 concurrent threads is about 6% faster using this implementation than with the current CachingNamespaceResolver. A few more percents can probably be achieved by getting rid of the current NameFormat indirections obsoleted by the new NameResolver interface. Careful analysis of the performance tradeoffs could improve the cache even more. For example it might make sense to make the young generation write-only (i.e. only add mappings to it within the synchronized increaseGenerationAge()) to avoid extra monitor contention. Improve name resolution --- Key: JCR-688 URL: https://issues.apache.org/jira/browse/JCR-688 Project: Jackrabbit Issue Type: Improvement Components: core Reporter: Jukka Zitting Assigned To: Jukka Zitting Priority: Minor Fix For: 1.3 Attachments: JCR-688.LocalCache.patch, JCR-688.NameResolver.patch, JCR-688.NameResolver.v2.patch As discussed in JCR-685, the current CachingNamespaceResolver class contains excessive synchronization causing monitor contention that reduces performance. In JCR-685 there's a proposed patch that replaces synchronization with a read-write lock that would allow concurrent read access to the name cache. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (JCR-688) Improve name resolution
[ https://issues.apache.org/jira/browse/JCR-688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12463150 ] Jukka Zitting commented on JCR-688: --- Committed version 3 of the NameResolver classes in revision 494219. The committed classes have some minor differences to the above proposals, most notably: a) ParsingNameResolver was further optimized and is now most likely the fastest validating JCR name parser there is. b) CachingNameResolver now caches both parsed and formatted names, as benchmarking showed noticeable gain from doing so. c) CachingNameResolver was simplified so that only a single synchronized method is required for controlling concurrency. Improve name resolution --- Key: JCR-688 URL: https://issues.apache.org/jira/browse/JCR-688 Project: Jackrabbit Issue Type: Improvement Components: core Reporter: Jukka Zitting Assigned To: Jukka Zitting Priority: Minor Fix For: 1.3 Attachments: JCR-688.LocalCache.patch, JCR-688.NameResolver.patch, JCR-688.NameResolver.v2.patch As discussed in JCR-685, the current CachingNamespaceResolver class contains excessive synchronization causing monitor contention that reduces performance. In JCR-685 there's a proposed patch that replaces synchronization with a read-write lock that would allow concurrent read access to the name cache. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (JCR-688) Improve name resolution
[ https://issues.apache.org/jira/browse/JCR-688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12463206 ] Tobias Bocanegra commented on JCR-688: -- what about caching of paths? Improve name resolution --- Key: JCR-688 URL: https://issues.apache.org/jira/browse/JCR-688 Project: Jackrabbit Issue Type: Improvement Components: core Reporter: Jukka Zitting Assigned To: Jukka Zitting Priority: Minor Fix For: 1.3 Attachments: JCR-688.LocalCache.patch, JCR-688.NameResolver.patch, JCR-688.NameResolver.v2.patch As discussed in JCR-685, the current CachingNamespaceResolver class contains excessive synchronization causing monitor contention that reduces performance. In JCR-685 there's a proposed patch that replaces synchronization with a read-write lock that would allow concurrent read access to the name cache. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (JCR-688) Improve name resolution
[ https://issues.apache.org/jira/browse/JCR-688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12462760 ] Jukka Zitting commented on JCR-688: --- Attached a JCR-688.NameResolver.patch that includes 1) a proposed new NameResolver interface, and 2) a generational name cache. For now the patch simply replaces the CachingNamespaceResolver class, but if the NameResolver proposal gets accepted, it has potential to simplify quite a lot of the current name resolution code. 1) NameResolver interface The previous and current mechanisms of putting JCR name parsing and formatting operations respectively in the NamespaceResolver interface and the NameFormat static class are both not very clean. The methods in NamespaceResolver imply extra responsibilities even to classes that would only need to handle namespace mappings and not individual names, and the static NameFormat methods are very inflexible, for example support for name caching requires explicit instanceof operations. The NameResolver interface is a proposal to simplify this. The interface implies only the responsibilities of parsing and formatting JCR names and leaves the handling of namespace mappings or name caching as implementation details. I've included two basic implementations of the NameResolver interface in the patch: the ParsingNameResolver class contains the basic name parsing and formatting routines and uses a NamespaceResolver to access the prefix mappings, while the CachingNameResolver class can be used to decorate another NameResolver (most likely a ParsingNameResolver) with name caching. Those two implementations should be enough to cover almost all of the current name resolution needs. 2) Generational name cache Both the LRUMap and ConcurrentReaderHashMap alternatives cause the entire name cache to get flushed when a client generates a long sequence of different names, for example itemN where N is a sequence number. To better handle this situation and to get rid of all synchronization for most cases, the CachingNameResolver class implements a generational name cache that borrows ideas from modern garbage collectors. The basic idea of a generational name cache is that there are three cache maps: one is a long term cache and two represent generations of cached entries, one young and one old generation. Name entries in the long term cache are always looked up and returned directly without any synchronization or counting taking place. If the long term cache is missed, the entry is looked up in either of the generations or explicitly resolved using the decorated NameResolver. Finally the resolved name entry added to the young generation to mark it as recently requested. After a configurable number of such long term cache misses (the age of the generation) have been made, the union of the two generations is calculated and added to the long term cache. I.e. the names that have been accessed continuously over two cache generations will probably be used frequently in the long term. Then the previous young generation is becomes the new old generation, and a new empty young generation is started. Once the long term cache reaches a specified maximum size, it gets replaced with just the union of the last two generations. The CachingNameResolver is also implemented so that no synchronization is required for accessing the long term cache. Synchronization is only used when a long term cache miss occurs and even then concurrent reads to the long term cache can continue. Improve name resolution --- Key: JCR-688 URL: https://issues.apache.org/jira/browse/JCR-688 Project: Jackrabbit Issue Type: Improvement Components: core Reporter: Jukka Zitting Priority: Minor Fix For: 1.3 Attachments: JCR-688.LocalCache.patch, JCR-688.NameResolver.patch As discussed in JCR-685, the current CachingNamespaceResolver class contains excessive synchronization causing monitor contention that reduces performance. In JCR-685 there's a proposed patch that replaces synchronization with a read-write lock that would allow concurrent read access to the name cache. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (JCR-688) Improve name resolution
[ https://issues.apache.org/jira/browse/JCR-688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12462183 ] Tobias Bocanegra commented on JCR-688: -- i think using a simple ConcurrentReaderHashMap improves the performance quite a lot since NO synchronization needs to be done (if the element is found). And instead of maintaining a LRU mechanism, i would just clear the map if it reaches a certain size (e.g. 10'000 names). i think an application either uses few similar names or many different. and in the later case, an LRU map does not help either. the costs of rebuilding the cache after it was cleared are not higher than if the elements get shifted out of the LRU it this one is too small. Improve name resolution --- Key: JCR-688 URL: https://issues.apache.org/jira/browse/JCR-688 Project: Jackrabbit Issue Type: Improvement Components: core Reporter: Jukka Zitting Priority: Minor Fix For: 1.3 Attachments: JCR-688.LocalCache.patch As discussed in JCR-685, the current CachingNamespaceResolver class contains excessive synchronization causing monitor contention that reduces performance. In JCR-685 there's a proposed patch that replaces synchronization with a read-write lock that would allow concurrent read access to the name cache. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (JCR-688) Improve name resolution
[ https://issues.apache.org/jira/browse/JCR-688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12461946 ] Marcel Reutegger commented on JCR-688: -- * The performance requirements and expected data sets for the QName-String and String-QName conversions seem quite different. For example the above test case is almost entirely governed by the String-QName conversion speed. It would probably make sense to consider using separate mechanisms for the two types of conversions. I agree with you. The cost for converting a QName into its resolved String is probably not worth caching the result. We should probably remove the cache for this conversion. * Moving the name cache from the global namespace registry to the session level would probably make sense. That would avoid the synchronization requirements at the expense of extra memory use. An extra benefit would be that the name cache could also be used when session-local namespace remappings are in effect. While it certainly helps to avoid the concurrency issue it introduces a cache per session which is in most cases filled with mappings that are just duplicates from other sessions. I think the current location of the cache is the better choice because it uses the occupied memory more effectively (shared with other sessions), though at the price of increased concurrency on the cache. But that's IMO just a technical detail of the cache implementation. Improve name resolution --- Key: JCR-688 URL: https://issues.apache.org/jira/browse/JCR-688 Project: Jackrabbit Issue Type: Improvement Components: core Reporter: Jukka Zitting Priority: Minor Fix For: 1.3 Attachments: JCR-688.LocalCache.patch As discussed in JCR-685, the current CachingNamespaceResolver class contains excessive synchronization causing monitor contention that reduces performance. In JCR-685 there's a proposed patch that replaces synchronization with a read-write lock that would allow concurrent read access to the name cache. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (JCR-688) Improve name resolution
[ http://issues.apache.org/jira/browse/JCR-688?page=comments#action_12461041 ] Julian Reschke commented on JCR-688: If we know that name prefix resolution causes this amount of pain, shouldn't we consider adding QName-based addressing in JSR-283? Improve name resolution --- Key: JCR-688 URL: http://issues.apache.org/jira/browse/JCR-688 Project: Jackrabbit Issue Type: Improvement Components: core Reporter: Jukka Zitting Priority: Minor Fix For: 1.3 Attachments: JCR-688.LocalCache.patch As discussed in JCR-685, the current CachingNamespaceResolver class contains excessive synchronization causing monitor contention that reduces performance. In JCR-685 there's a proposed patch that replaces synchronization with a read-write lock that would allow concurrent read access to the name cache. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (JCR-688) Improve name resolution
[ http://issues.apache.org/jira/browse/JCR-688?page=comments#action_12461047 ] Jukka Zitting commented on JCR-688: --- If we know that name prefix resolution causes this amount of pain, shouldn't we consider adding QName-based addressing in JSR-283? This issue is just a minor implementation detail. There is a price in performance of doing the String-QName conversions, but the performance hit is probably only noticeable for heavily cached workloads like the example above. The cost of the string operations is likely insignificant as soon as the repository starts hitting the disk. The reason I'm passionate about this is not the performance, but the creeping complexity introduced by the concurrency control. More generally I think the whole name resolution infrastructure in Jackrabbit is much more complex than it really needs to be. Improve name resolution --- Key: JCR-688 URL: http://issues.apache.org/jira/browse/JCR-688 Project: Jackrabbit Issue Type: Improvement Components: core Reporter: Jukka Zitting Priority: Minor Fix For: 1.3 Attachments: JCR-688.LocalCache.patch As discussed in JCR-685, the current CachingNamespaceResolver class contains excessive synchronization causing monitor contention that reduces performance. In JCR-685 there's a proposed patch that replaces synchronization with a read-write lock that would allow concurrent read access to the name cache. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (JCR-688) Improve name resolution
[ http://issues.apache.org/jira/browse/JCR-688?page=comments#action_12460221 ] Marcel Reutegger commented on JCR-688: -- I have to apologize, my post in JCR-685 was not precise enough. What acutally happens when multiple threads access the CachingNamespaceResolver is the following: because even a get() call on the LRUMap modifies the internal data structures those calls must be synchronized. This basically serializes access on the LRUMap even though only a small critical section should be synchronized: the method moveToMRU(). Calculating the hash code, doing the lookup in the hashtable and checking whether the keys are equal does not require synchronization and multiple threads can do this concurrently. I agree that this does not necessarily have to go into the 1.2 release, but it certainly is a problem, because the CachingNamespaceResolver in the NamespaceRegistry is a single instance repository wide and used by all threads that access the repository. Since name resolution is involved in nearly every operation I think it should allow maximum concurrent threads. It is correct that the proposed change will not lower the number of synchronized sections but the sections will be significantly smaller. So to be precise this improvement is not really to avoid monitor contention but to allow more concurrent access to the CachingNamespaceResolver. Improve name resolution --- Key: JCR-688 URL: http://issues.apache.org/jira/browse/JCR-688 Project: Jackrabbit Issue Type: Improvement Components: core Reporter: Jukka Zitting Priority: Minor Fix For: 1.3 As discussed in JCR-685, the current CachingNamespaceResolver class contains excessive synchronization causing monitor contention that reduces performance. In JCR-685 there's a proposed patch that replaces synchronization with a read-write lock that would allow concurrent read access to the name cache. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (JCR-688) Improve name resolution
[ http://issues.apache.org/jira/browse/JCR-688?page=comments#action_12460243 ] Jukka Zitting commented on JCR-688: --- Is the cache even required? It seems to me that the operations that we try to speed up using the cache are very small and fast in any case, i.e. simple string parsing and concatenation. Sure, it adds up to a large number of short-lived object allocations, but a generational garbage collector shouldn't have much trouble handling that. Do we have some performance numbers to validate the need for caching in this case? Improve name resolution --- Key: JCR-688 URL: http://issues.apache.org/jira/browse/JCR-688 Project: Jackrabbit Issue Type: Improvement Components: core Reporter: Jukka Zitting Priority: Minor Fix For: 1.3 As discussed in JCR-685, the current CachingNamespaceResolver class contains excessive synchronization causing monitor contention that reduces performance. In JCR-685 there's a proposed patch that replaces synchronization with a read-write lock that would allow concurrent read access to the name cache. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (JCR-688) Improve name resolution
[ http://issues.apache.org/jira/browse/JCR-688?page=comments#action_12460272 ] Marcel Reutegger commented on JCR-688: -- One example is regularly accessing properties with the same name. In an application you will usually have a fixed set of property names (unless you use residual properties). The following test case executes 5 times faster with name caching: public void testPropertyAccess() throws RepositoryException { for (int i = 0; i 100; i++) { testRootNode.addNode(node + i); } testRootNode.save(); long time = System.currentTimeMillis(); // loop 1000 times over child nodes and access two properties for (int i = 0; i 1000; i++) { for (NodeIterator it = testRootNode.getNodes(); it.hasNext(); ) { Node n = it.nextNode(); n.getProperty(jcrPrimaryType); n.hasProperty(jcrMixinTypes); } } time = System.currentTimeMillis() - time; System.out.println(test took: + time + ms.); } Improve name resolution --- Key: JCR-688 URL: http://issues.apache.org/jira/browse/JCR-688 Project: Jackrabbit Issue Type: Improvement Components: core Reporter: Jukka Zitting Priority: Minor Fix For: 1.3 As discussed in JCR-685, the current CachingNamespaceResolver class contains excessive synchronization causing monitor contention that reduces performance. In JCR-685 there's a proposed patch that replaces synchronization with a read-write lock that would allow concurrent read access to the name cache. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (JCR-688) Improve name resolution
[ http://issues.apache.org/jira/browse/JCR-688?page=comments#action_12460349 ] Jukka Zitting commented on JCR-688: --- The following test case executes 5 times faster with name caching Good point. :-) Digging deeper in the performance figures I noticed that most of this performance hit is caused by the heavy use of regexp matching in NameFormat. I was able to get a major performance gain simply by replacing the regexp's with custom parsing code (I'll clean it up attach as an example patch). The performance is still clearly worse than with caching, but not nearly as bad as before. Some other observations: * The performance requirements and expected data sets for the QName-String and String-QName conversions seem quite different. For example the above test case is almost entirely governed by the String-QName conversion speed. It would probably make sense to consider using separate mechanisms for the two types of conversions. * Moving the name cache from the global namespace registry to the session level would probably make sense. That would avoid the synchronization requirements at the expense of extra memory use. An extra benefit would be that the name cache could also be used when session-local namespace remappings are in effect. * The individual resolution operations are very fast in any case, so the performance is only relevant when doing a large number of operations. Thus the cost of occasional heavier optimization operations (if available) could probably be easily amortized. Improve name resolution --- Key: JCR-688 URL: http://issues.apache.org/jira/browse/JCR-688 Project: Jackrabbit Issue Type: Improvement Components: core Reporter: Jukka Zitting Priority: Minor Fix For: 1.3 As discussed in JCR-685, the current CachingNamespaceResolver class contains excessive synchronization causing monitor contention that reduces performance. In JCR-685 there's a proposed patch that replaces synchronization with a read-write lock that would allow concurrent read access to the name cache. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira