Re: [jira] Created: (JCR-1050) Remove synchronization from JNDI data sources
I completely agree with that approach (TDD). I think examining how systems like Hibernate and TopLink handle session/connection relationships would point us in the right direction. Since Jackrabbit is seen as something embedded within the servlet container using an embedded DB, I see the reason for doing a connection per workspace. However, once data grows to a certain point or the DB becomes remote, I have a feeling that it would be advantageous to allow each worker thread independent access to the repository. This would mean, however, leveraging or creating a more complex transactional framework and dealing with pooling, etc.

-paddy
Re: [jira] Created: (JCR-1050) Remove synchronization from JNDI data sources
Looking briefly at the code, it doesn't look like having a single instance of a persistence manager would cause too many issues, given the DataSource persistence manager as I implemented it. I do not have any class-level variables that would be used to do work in multiple threads (i.e. no connection or prepared statement caches, etc.). I'll do some testing over the weekend or next week.

-paddy

Padraic Hannon wrote:
> I completely agree with that approach (TDD). I think examining how systems like hibernate and toplink handle session/connection relationships would point us in the right direction. Since jackrabbit is seen as something embedded within the servlet container using an embedded db I see the reason for doing a connection per workspace, however, once data grows to a certain point or the db becomes remote I have a feeling that it would be advantageous to allow each worker thread independent access to the repository. This would mean, however, leveraging or creating a more complex transactional framework and dealing with pooling, etc.
>
> -paddy
Re: [jira] Created: (JCR-1050) Remove synchronization from JNDI data sources
Hi,

Currently Jackrabbit uses one persistence manager per workspace, and one for versioning. That means the same persistence manager is used for all sessions (in a workspace).

> there should be a new manager per session, ie per usage thread.

While the current architecture has advantages, the approach 'one database connection per session' also has advantages. I don't think it will be easy to implement, and there would be additional problems (transaction isolation, for example).

We should try to find out how much faster / more scalable this solution would be. What about defining some use cases and then writing a small 'benchmark type' application, to find out whether using multiple connections really would help, and by how much? Test driven development. What do you think?

Thomas
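A 'benchmark type' application of the kind proposed above could start out as small as the following sketch. Everything in it is an assumption made for illustration: `FakeConnection` is a stand-in that sleeps for 1 ms per call instead of talking to a database, and the class and method names are hypothetical, not Jackrabbit code. It only contrasts one shared, synchronized connection with one connection per worker thread; a real measurement would have to run against an actual persistence manager and database.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical micro-benchmark; names and workload are illustrative only.
final class ConnectionModelBenchmark {

    /** Stand-in for a JDBC connection: each call simulates a short round trip. */
    static final class FakeConnection {
        void execute() {
            try {
                Thread.sleep(1);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    /** All threads share one connection and serialize on it. */
    static long runShared(int threads, int opsPerThread) {
        FakeConnection shared = new FakeConnection();
        return time(threads, opsPerThread, () -> {
            synchronized (shared) {
                shared.execute();
            }
        });
    }

    /** Each worker thread gets its own connection; no locking is needed. */
    static long runPerThread(int threads, int opsPerThread) {
        ThreadLocal<FakeConnection> local = ThreadLocal.withInitial(FakeConnection::new);
        return time(threads, opsPerThread, () -> local.get().execute());
    }

    /** Runs the operation on a fixed pool and returns elapsed milliseconds. */
    private static long time(int threads, int opsPerThread, Runnable op) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < opsPerThread; i++) {
                    op.run();
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("benchmark timed out");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("benchmark interrupted", e);
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }
}
```

Calling `ConnectionModelBenchmark.runShared(8, 100)` and `ConnectionModelBenchmark.runPerThread(8, 100)` and comparing the two elapsed times gives a first, very rough impression. The numbers say nothing about transaction isolation, which is the harder part of the question.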
Re: [jira] Created: (JCR-1050) Remove synchronization from JNDI data sources
Hi,

> I am not suggesting this be the only driver, just that the JNDI driver should be built in such a way as to make use of the facilities provided by JEE containers (datasources, jta, etc).

I think using JNDI as an alternative way to get the connection is fine.

>> Do you suggest to create a new PreparedStatement for each request?
> Yes, let the datasource or DB handle caching the PreparedStatements rather than holding them in an internal map.

I don't think there are advantages in using prepared statements from a data source compared to using your own prepared statements.

> pre-creating ... should not be needed.

I agree, it's not required to create all prepared statements when connecting. It would be OK if they are created when required (and then put in a hash map or so).

> holding onto the connection for long periods ... should not be needed.

Except for MySQL (where the connection drops after a few hours) I don't see a problem doing that. There is a risk (for all remote databases) that the connection drops temporarily (network cable disconnected or so), but if you want to solve that you need to add some reconnect functionality - even when using data sources.

>> advantages of 'not holding onto the connection'?
> Why hold onto resources one is not using? Let other threads take them.

You mean other threads inside Jackrabbit? As far as I know, the persistence engine of Jackrabbit doesn't require multiple connections. Or do you mean other threads inside other applications? I suggest not to access Jackrabbit databases directly.

> Less code in jackrabbit for managing transactions

I don't think it would be less code. You need to maintain the current behavior anyway (using DriverManager to get the connection). So adding separate persistence managers (which would be required for all databases) would double the maintenance work? I think there are already too many persistence managers. But I agree, getting the connection from a data source would make sense. This could be integrated into the current persistence manager(s).

> and less synchronization leading to less potential threading conflicts.

You probably mean higher concurrency. However, I don't think that this would be possible just because data sources are used.

> synchronization has serious performance penalties in high traffic situations. In general I would think that the fewer synchronized parts the better.

When using one connection: some JDBC drivers are not thread-safe, which means there is a risk in accessing the same connection from multiple threads at the same time. Others are thread-safe, but synchronize internally, so there would be no benefit.

When using multiple connections, there are new problems. Are you suggesting to use multiple connections inside one persistence manager? The connection defines the scope of the transaction, so using multiple connections would mean multiple concurrent transactions. As far as I know, the current Jackrabbit engine does not support this. Actually, I think Jackrabbit _should_ use one database connection per session. The problem is, the architecture is currently not like that.

> the purpose of synchronized blocks was to handle the fact that statements and connections were held open for long periods by the driver.

I don't think this is the reason why synchronization is used (but I might be wrong). In my view, synchronization is used to make sure the JDBC objects (statements, result sets) are not accessed concurrently.

> that allowing multiple threads to read would have serious performance implications

With the current architecture, I don't think removing synchronization would improve the performance. But if it does improve performance, of course this should be implemented.

Thomas
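To make the 'new PreparedStatement for each request' idea concrete, here is a minimal sketch. It assumes a container-managed `DataSource` that pools connections (and possibly statements) behind the scenes; the class name `PerRequestStore` and the two-parameter SQL shape are hypothetical and not taken from the code attached to the ticket. No JDBC object is stored in a field, so nothing is shared between threads and no `synchronized` block is needed around it.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import javax.sql.DataSource;

// Hypothetical per-request access pattern; not the attached persistence manager.
final class PerRequestStore {

    // The only shared state is the DataSource itself, which the container
    // is assumed to make safe for concurrent use.
    private final DataSource dataSource;

    PerRequestStore(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    /**
     * Borrows a connection, prepares the statement, executes it, and hands
     * both back to the pool when the try block exits.
     */
    int store(String sql, String id, String data) throws SQLException {
        try (Connection con = dataSource.getConnection();
             PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setString(1, id);
            ps.setString(2, data);
            return ps.executeUpdate();
        }
    }
}
```

Note that each call here runs on whichever connection the pool hands out, so two calls are not part of the same transaction unless something else (JTA in a container, for instance) ties them together. That is the transaction-scope concern raised in the message above, and this sketch does not solve it.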
Re: [jira] Created: (JCR-1050) Remove synchronization from JNDI data sources
See reply threaded below. Perhaps this should be moved into the Jira ticket?

-paddy

Thomas Mueller-6 wrote:
> Hi,
> I'm not sure if I understand this request for improvement.
>
>> Using datasources
>
> So you suggest to use DataSource.getConnection(..) instead of DriverManager.getConnection(..)? How do you get / create the datasource object, using JNDI? What about embedded applications where JNDI is not available?

I attached code to the ticket. Basically, this assumes that one is running inside an application server container. I am not suggesting this be the only driver, just that the JNDI driver should be built in such a way as to make use of the facilities provided by JEE containers (datasources, JTA, etc).

>> one should be able to rely on the application server to manage PreparedStatement caches
>
> Do you suggest to create a new PreparedStatement for each request?

Yes, let the datasource or DB handle caching the PreparedStatements rather than holding them in an internal map.

>> therefore pre-creating and holding onto the connection for long periods of time should not be needed.
>
> Could you explain the advantages of 'not holding onto the connection'? I know that MySQL closes connections after 8 hours idle time, are there any other advantages?

Why hold onto resources one is not using? Let other threads take them.

>> This relates to improvement JCR-313, however, that change did not address the benefits one could see in using an application server controlled datasource.
>
> What are those benefits?

Less code in Jackrabbit for managing transactions, and less synchronization leading to less potential threading conflicts.

>> Even if jackrabbit does aim to use an embedded database such a system could be configured to use datasources and could benefit from the removal of the synchronization.
>
> In what way would removal of the synchronization be a benefit? Do you think it would be faster without synchronization? How would you make sure statements are executed in the right order?

Our experience over the last year or so of using CQ and CRX has led us to believe that synchronization has serious performance penalties in high traffic situations. In general I would think that the fewer synchronized parts the better. This is not a request to entirely do away with synchronized blocks. However, looking at the DB drivers, it seemed that the sole purpose of such blocks was to handle the fact that statements and connections were held open for long periods by the driver. I would assume that allowing multiple threads to read would have serious performance implications, and that by allowing the container and DB to manage transactions one could decide on the transaction isolation level outside of the core code to deal with dirty reads etc.

> Thanks,
> Thomas

--
View this message in context: http://www.nabble.com/-jira--Created%3A-%28JCR-1050%29-Remove-synchronization-from-JNDI-data-sources-tf4203578.html#a12044986
Sent from the Jackrabbit - Dev mailing list archive at Nabble.com.
Re: [jira] Created: (JCR-1050) Remove synchronization from JNDI data sources
Hi,

I'm not sure if I understand this request for improvement.

> Using datasources

So you suggest to use DataSource.getConnection(..) instead of DriverManager.getConnection(..)? How do you get / create the datasource object, using JNDI? What about embedded applications where JNDI is not available?

> one should be able to rely on the application server to manage PreparedStatement caches

Do you suggest to create a new PreparedStatement for each request?

> therefore pre-creating and holding onto the connection for long periods of time should not be needed.

Could you explain the advantages of 'not holding onto the connection'? I know that MySQL closes connections after 8 hours idle time, are there any other advantages?

> This relates to improvement JCR-313, however, that change did not address the benefits one could see in using an application server controlled datasource.

What are those benefits?

> Even if jackrabbit does aim to use an embedded database such a system could be configured to use datasources and could benefit from the removal of the synchronization.

In what way would removal of the synchronization be a benefit? Do you think it would be faster without synchronization? How would you make sure statements are executed in the right order?

Thanks,
Thomas
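The question of how to obtain the datasource, and what to do when JNDI is not available, could be handled along the lines of the sketch below. It is only an illustration under stated assumptions: the class name `DataSourceLocator` is hypothetical, the JNDI name and JDBC URL are whatever the deployment configures, and the fallback to `DriverManager` simply mirrors the current behavior described in this thread rather than any agreed design.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Optional;
import javax.naming.InitialContext;
import javax.naming.NamingException;
import javax.sql.DataSource;

// Hypothetical helper; not part of Jackrabbit.
final class DataSourceLocator {

    private DataSourceLocator() {
    }

    /**
     * Looks up a DataSource under the given JNDI name. Returns an empty
     * Optional when no JNDI provider is configured (embedded use), when the
     * name is not bound, or when the bound object is not a DataSource.
     */
    static Optional<DataSource> lookup(String jndiName) {
        try {
            Object bound = new InitialContext().lookup(jndiName);
            return bound instanceof DataSource
                    ? Optional.of((DataSource) bound)
                    : Optional.empty();
        } catch (NamingException e) {
            return Optional.empty();
        }
    }

    /**
     * Prefers the container-managed DataSource and falls back to
     * DriverManager with the given JDBC URL when JNDI yields nothing.
     */
    static Connection open(String jndiName, String jdbcUrl) throws SQLException {
        Optional<DataSource> ds = lookup(jndiName);
        return ds.isPresent()
                ? ds.get().getConnection()
                : DriverManager.getConnection(jdbcUrl);
    }
}
```

Inside an application server, a call such as `DataSourceLocator.open("java:comp/env/jdbc/repository", url)` would use the container's pool, with the JNDI name shown here being a made-up example. In an embedded setup without JNDI, the same call would go through `DriverManager`.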
[jira] Created: (JCR-1050) Remove synchronization from JNDI data sources
Remove synchronization from JNDI data sources

Key: JCR-1050
URL: https://issues.apache.org/jira/browse/JCR-1050
Project: Jackrabbit
Issue Type: Improvement
Components: core
Affects Versions: 1.3, 1.2.3, 1.2.2, 1.2.1, 1.1.1, 1.1, 1.0.1, 1.0, 0.9, 1.3.1, 1.4, 2.0
Reporter: Padraic Hannon

Using datasources one should be able to rely on the application server to manage PreparedStatement caches therefore pre-creating and holding onto the connection for long periods of time should not be needed. This relates to improvement JCR-313, however, that change did not address the benefits one could see in using an application server controlled datasource. Even if jackrabbit does aim to use an embedded database such a system could be configured to use datasources and could benefit from the removal of the synchronization.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.