[RT] JCR observation: adding cargo data to events?
Hi,

(This might be more a question for the JSR-283 people, but I'd like to have this community's opinion.)

I recently implemented a JCR-based audit trail module, and making application-level sense of the Events that an EventListener receives required quite some effort (and laziness is a virtue, no? ;-)

In a real-life app with both human users and automated processes generating JCR data, an EventListener is bombarded with Events that sometimes make little sense at the application level, and sorting them out to create a meaningful audit trail can be tricky. See also the recent "How to figure out if there was a rename operation on Node" thread on users@ (http://tinyurl.com/2qfbt3) for a similar problem.

This event analysis would be much easier if the save operations could be enhanced with some "cargo" Object that is opaque to JCR but passed on to Events to give more info about what's happening at the application level.

Here's my suggestion (which would need changes to the JCR spec): maybe session.save() and the other save() methods could take an optional Object parameter that is made available in the observation Event through a new getCargo() method?

This object can be used, for example, to indicate that the nodes being saved are autogenerated by some metadata extractor, so that the Events can be marked as such in an audit trail, separating them from Events that indicate human user actions.

I'm wondering if this might be a valid suggestion for JSR-283. What do people think?

I haven't seriously evaluated the implications at the implementation level; this might be tricky to implement in clustered settings (although the cargo could probably be saved in the journal).

-Bertrand, nostalgic about the cargo concept in Clipper code circa 1987 ;-)
Re: [RT] JCR observation: adding cargo data to events?
Hi,

This would probably really be worth it. I am also thinking of somehow tagging the operations, for example to provide more information in the case of item removal, where very little is actually available in the event, which leads to guessing or having to keep caches.

On the other hand, a save operation may encompass a whole number of possibly unrelated tasks, but this might be something for the user to handle.

To get around the clustering issue, it could be specified that the cargo must be serializable.

Regards
Felix
Re: [RT] JCR observation: adding cargo data to events?
hi bertrand,

i agree that this would be interesting, and could get us to a certain extent out of the method events issue.

if you don't mind, and others feel that this would be a valuable addition, you could send this as a public review comment to [EMAIL PROTECTED] so i can include it in our digest.

regards,
david
Re: [RT] JCR observation: adding cargo data to events?
Hi,

Clustering: the Object needs to be Serializable. You need to define what happens if the class does not exist on the 'receiving' end.

Maybe you also want to think about operations that don't require save(), for example Workspace.move(..).

Thomas
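Editorial illustration of the clustering point above: a minimal sketch, assuming the cargo is shipped between cluster nodes with standard Java serialization. The class and method names (CargoSerialization, AuditCargo, toBytes, fromBytes) are hypothetical and not part of JCR or Jackrabbit. Returning null when the class is missing on the receiving side is only one possible policy; the thread leaves that question open.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical sketch; nothing here is an existing JCR or Jackrabbit API.
class CargoSerialization {

    /** Example application cargo; it must be Serializable to cross cluster nodes. */
    static final class AuditCargo implements Serializable {
        private static final long serialVersionUID = 1L;
        final String origin;

        AuditCargo(String origin) {
            this.origin = origin;
        }
    }

    /** What a sending node might write to the journal. */
    static byte[] toBytes(Serializable cargo) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(cargo);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /**
     * What a receiving node might do. Returns null when the cargo class is
     * not available to the receiver (an assumed policy, not a spec rule).
     */
    static Object fromBytes(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (ClassNotFoundException e) {
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Object restored = fromBytes(toBytes(new AuditCargo("metadata-extractor")));
        System.out.println(((AuditCargo) restored).origin);
    }
}
```

A listener on the receiving node would then have to cope with a missing cargo, for instance by recording the event as "origin unknown" in the audit trail.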
[jira] Commented: (JCR-1049) DatabaseFileSystem: mysql.ddl works for mysql5 but not mysql 4.1.20
[ https://issues.apache.org/jira/browse/JCR-1049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12520187 ]

Stefan Guggisberg commented on JCR-1049:

> mysql4.dll and mysql.ddl should have:
> create unique index JCR_FSENTRY_IDX on JCR_FSENTRY (FSENTRY_PATH(745), FSENTRY_NAME);

that's not a good idea since the max key limit is storage engine and mysql version dependent.

quote from http://dev.mysql.com/doc/refman/4.1/en/create-index.html:

> Prefix lengths are storage engine-dependent (for example, a prefix can be up to 1000 bytes long for MyISAM tables, 767 bytes for InnoDB tables). (Before MySQL 4.1.2, the limit is 255 bytes for all tables.) Note that prefix limits are measured in bytes, whereas the prefix length in CREATE INDEX statements is interpreted as number of characters for non-binary data types (CHAR, VARCHAR, TEXT). Take this into account when specifying a prefix length for a column that uses a multi-byte character set.

the current 'create unique index' stmt in mysql.ddl is IMO a good compromise that will work with most mysql servers out there.

DatabaseFileSystem: mysql.ddl works for mysql5 but not mysql 4.1.20
---
Key: JCR-1049
URL: https://issues.apache.org/jira/browse/JCR-1049
Project: Jackrabbit
Issue Type: Bug
Components: core
Affects Versions: 1.3
Environment: MySQL 4.1.20
ERROR 1071 (42000): Specified key was too long; max key length is 1000 bytes
Reporter: Stephen More
Assignee: Stefan Guggisberg

Perhaps a new column (primary key) called uid could be added to the table, which is actually an MD5 checksum of FSENTRY_PATH and FSENTRY_NAME.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
Re: [RT] JCR observation: adding cargo data to events?
On 8/16/07, David Nuescheler [EMAIL PROTECTED] wrote:

> ...if you dont mind and others feel like this would be a valuable addition you could eventually send this as a public review comment to [EMAIL PROTECTED] so i can include it in our digest

Ok, I'll send this as a public review comment tomorrow, to leave time for others to comment here.

Thanks for the feedback,
-Bertrand
[jira] Resolved: (JCR-1049) DatabaseFileSystem: mysql.ddl works for mysql5 but not mysql 4.1.20
[ https://issues.apache.org/jira/browse/JCR-1049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stefan Guggisberg resolved JCR-1049.

Resolution: Fixed

Added 'character set latin1' to the 'create table' statement; fixed in svn r566639.
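Editorial illustration of why the character set matters for this error: index lengths are declared in characters, but the limit in the error message is counted in bytes. The sketch below only does that arithmetic. The figure of 500 indexed characters is illustrative and not taken from mysql.ddl; the 1000-byte limit comes from the reported error, and the bytes-per-character values are the maxima for MySQL's latin1 (1) and utf8 (3) character sets.

```java
// Illustrative arithmetic only; the class and method names are hypothetical.
class IndexKeyLength {

    /** Limit quoted in the reported error message (ERROR 1071). */
    static final int MAX_KEY_BYTES = 1000;

    /** Worst-case key size: indexed characters times maximum bytes per character. */
    static int keyBytes(int indexedChars, int maxBytesPerChar) {
        return indexedChars * maxBytesPerChar;
    }

    /** True if an index over that many characters stays within the byte limit. */
    static boolean fits(int indexedChars, int maxBytesPerChar) {
        return keyBytes(indexedChars, maxBytesPerChar) <= MAX_KEY_BYTES;
    }

    public static void main(String[] args) {
        int indexedChars = 500; // assumed total over all indexed columns
        System.out.println("latin1: " + keyBytes(indexedChars, 1) + " bytes, fits=" + fits(indexedChars, 1));
        System.out.println("utf8:   " + keyBytes(indexedChars, 3) + " bytes, fits=" + fits(indexedChars, 3));
    }
}
```

Under these assumptions the same index definition fits with a single-byte character set and exceeds the limit with a three-byte one, which is consistent with pinning the table to latin1 as the fix.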
Re: [RT] JCR observation: adding cargo data to events?
Hi,

On 8/16/07, Bertrand Delacretaz [EMAIL PROTECTED] wrote:

> Maybe session.save() and other save() methods could take an optional Object parameter, that is made available in the observation Event with a new getCargo() method?

Sounds interesting. Have you considered how something like that could be implemented already now as a Jackrabbit-specific extension? It would be nice if we had a patch that implements this in current Jackrabbit so we could see how useful it is in practice before including it in the spec.

BR,
Jukka Zitting
Re: [RT] JCR observation: adding cargo data to events?
On 8/16/07, Jukka Zitting [EMAIL PROTECTED] wrote:

> Have you considered how something like that could be implemented already now as a Jackrabbit-specific extension?...

One idea (that might also be better than my former proposal for JSR-283) would be to add an additional setEventCargo(Object) method to the JCR Session, instead of having to modify all the save() (and move(), as Thomas indicates) methods.

All operations that generate Events would then attach this cargo to them, and the application can call setEventCargo when needed to indicate what it's doing.

We could implement this in Jackrabbit by creating two cargo-specific interfaces, one for the Session and one for the Event, and having the Jackrabbit Session and Event implement these in addition to the official JCR interfaces.

Is that in line with how Jackrabbit-specific stuff is done now, or do you have another suggestion?

-Bertrand
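Editorial illustration of the setEventCargo idea: a minimal, self-contained sketch of how a session-level cargo could end up on events and be used by an audit listener. All types here (CargoSession, CargoEvent, CargoListener) are hypothetical stand-ins written for this example; they are not javax.jcr or Jackrabbit classes, and the sketch assumes that events are tagged with whatever cargo is set at save time.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins so the example runs on its own; not a real JCR API.
class CargoSketch {

    /** Stand-in for an observation event that also carries the cargo. */
    static final class CargoEvent {
        private final String path;
        private final Object cargo;

        CargoEvent(String path, Object cargo) {
            this.path = path;
            this.cargo = cargo;
        }

        String getPath() {
            return path;
        }

        /** The proposed accessor: opaque to the repository, meaningful to the app. */
        Object getCargo() {
            return cargo;
        }
    }

    /** Stand-in for a listener, loosely modelled on an EventListener. */
    interface CargoListener {
        void onEvent(CargoEvent event);
    }

    /** Stand-in for a session offering the proposed setEventCargo(Object). */
    static final class CargoSession {
        private final List<CargoListener> listeners = new ArrayList<>();
        private final List<String> pending = new ArrayList<>();
        private Object cargo;

        void addListener(CargoListener listener) {
            listeners.add(listener);
        }

        /** Events dispatched from now on carry this cargo. */
        void setEventCargo(Object cargo) {
            this.cargo = cargo;
        }

        void addNode(String path) {
            pending.add(path);
        }

        /** Dispatches one event per pending change, tagged with the cargo set at save time. */
        void save() {
            for (String path : pending) {
                CargoEvent event = new CargoEvent(path, cargo);
                for (CargoListener listener : listeners) {
                    listener.onEvent(event);
                }
            }
            pending.clear();
        }
    }

    /** Runs a small scenario and returns the audit trail lines it produced. */
    static List<String> demo() {
        List<String> audit = new ArrayList<>();
        CargoSession session = new CargoSession();
        session.addListener(e -> audit.add(e.getCargo() + " " + e.getPath()));

        session.setEventCargo("user:alice");
        session.addNode("/docs/report");
        session.save();

        session.setEventCargo("auto:metadata-extractor");
        session.addNode("/docs/report/meta");
        session.save();
        return audit;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

In this sketch the listener can separate human actions from automated ones just by looking at the cargo, which is the audit-trail use case described in the original proposal.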
Re: [RT] JCR observation: adding cargo data to events?
Hi,

On 8/16/07, Bertrand Delacretaz [EMAIL PROTECTED] wrote:

> One idea (that might also be better than my former proposal for JSR283) would be to add an additional setEventCargo(Object) method to the JCR Session, instead of having to modify all the save() (and move() as Thomas indicates) methods.

Sounds good.

> We could implement this in Jackrabbit by creating two cargo-specific interfaces, one for the Session and one for the Event, and having the Jackrabbit Session and Event implement these in addition to the official JCR interfaces.
>
> Is that in line with how Jackrabbit-specific stuff is done now, or do you have another suggestion?

For now I'd just put the extra methods directly on the SessionImpl and EventImpl classes. We can formalize them in a jackrabbit-api (or jackrabbit-jsr283) extension interface if they seem useful to a big enough audience.

BR,
Jukka Zitting
Re: [RT] JCR observation: adding cargo data to events?
Hi,

On Thursday, 16.08.2007 at 15:01 +0300, Jukka Zitting wrote:

> For now I'd just put the extra methods directly on the SessionImpl and EventImpl classes. We can formalize them in a jackrabbit-api (or jackrabbit-jsr283) extension interface if they seem useful to a big enough audience.

The problem with not having it in the API someplace is that the audience might not grow enough :-) Reason: generally I will not have access to the impl classes, just to the API classes...

Therefore, if consensus is reached that this might be useful, I would suggest adding it to the API, even if only in the form of some kind of tentative API.

Regards
Felix
search index exceptions while clustering jackrabbit
Hi devs,

I am using Magnolia CMS (with Jackrabbit 1.3.1) and have set up a clustered instance of JR using the Oracle bundle PM. After publishing a page I get the following exceptions:

INFO openwfe.org.embed.impl.engine.AbstractEmbeddedParticipant MgnlParticipant.java(consume:88) 16.08.2007 08:08:07 consume command command-activate...
INFO openwfe.org.embed.impl.engine.AbstractEmbeddedParticipant MgnlParticipant.java(consume:99) 16.08.2007 08:08:07 Command has been found through the magnolia catalog: info.magnolia.module.admininterface.commands.ActivationCommand
INFO info.magnolia.module.exchangesimple.ReceiveFilter ReceiveFilter.java(receive:114) 16.08.2007 08:08:16 Activation succeeded
WARN org.apache.jackrabbit.core.query.lucene.SearchIndex SearchIndex.java(next:370) 16.08.2007 08:08:16 Exception while creating document for node: 4f91b3c9-9a99-4386-aef9-3374560a5dad: javax.jcr.RepositoryException: Missing child node entry for node with id: 4f91b3c9-9a99-4386-aef9-3374560a5dad
WARN org.apache.jackrabbit.core.query.lucene.SearchIndex SearchIndex.java(next:370) 16.08.2007 08:08:16 Exception while creating document for node: ab3c3bab-758f-4e34-a92c-a851ff313b64: javax.jcr.RepositoryException: Missing child node entry for node with id: ab3c3bab-758f-4e34-a92c-a851ff313b64
WARN org.apache.jackrabbit.core.query.lucene.SearchIndex SearchIndex.java(next:370) 16.08.2007 08:08:16 Exception while creating document for node: a39bf54d-daab-4fdf-aa84-ed9f17ea6cfe: javax.jcr.RepositoryException: Missing child node entry for node with id: a39bf54d-daab-4fdf-aa84-ed9f17ea6cfe
WARN org.apache.jackrabbit.core.query.lucene.SearchIndex SearchIndex.java(next:370) 16.08.2007 08:08:16 Exception while creating document for node: c75820e5-bda9-4557-b162-02ee73a4c76d: javax.jcr.RepositoryException: Missing child node entry for node with id: c75820e5-bda9-4557-b162-02ee73a4c76d
WARN org.apache.jackrabbit.core.query.lucene.SearchIndex SearchIndex.java(next:370) 16.08.2007 08:08:16 Exception while creating document for node: 02b1294d-6ba4-45ba-ab60-415265e53190: javax.jcr.RepositoryException: Missing child node entry for node with id: 02b1294d-6ba4-45ba-ab60-415265e53190
WARN org.apache.jackrabbit.core.query.lucene.SearchIndex SearchIndex.java(next:370) 16.08.2007 08:08:16 Exception while creating document for node: 74396dcc-b680-4cfd-880a-67fb4681316b: javax.jcr.RepositoryException: Missing child node entry for node with id: 74396dcc-b680-4cfd-880a-67fb4681316b
INFO info.magnolia.module.exchangesimple.SimpleSyndicator BaseSyndicatorImpl.java(activate:247) 16.08.2007 08:08:16 Exchange: activation succeeded [/features]

After this, the page on the second cluster node is no longer accessible.

Can someone perhaps shed some light on this or point me in the right direction?

Thanks
Amir
Re: [RT] JCR observation: adding cargo data to events?
Hi,

On 8/16/07, Felix Meschberger [EMAIL PROTECTED] wrote:

>> For now I'd just put the extra methods directly on the SessionImpl and EventImpl classes. We can formalize them in a jackrabbit-api (or jackrabbit-jsr283) extension interface if they seem useful to a big enough audience.
>
> The problem of not having it in the API someplace, the audience might not grow enough :-) Reason: Generally I will not have access to the impl classes but just to the api classes ...
>
> Therefore, if consensus would be reached, that this might be usefull, I would suggest to add it to the API - and be it in the form of some kind of tentative API.

Yeah, I'm fine with adding stuff to the API as long as the consensus is broad enough. I'm just concerned that we shouldn't start putting things into the API just because they seem like a good idea, and then find out that the interface needs to be modified in some way or that nobody's really using it in the end.

From that perspective it's a better idea to have such extensions first just as extra methods in the implementation classes, and to promote them to API interfaces once we have at least two or three independent users reporting that they are happy with the additions. The only problem would be model 2 and 3 deployments, where the implementation classes aren't available to the client.

I guess we could also put such tentative interfaces into snapshot versions of jackrabbit-api, as long as we are ready to take them out before the next release in case we don't yet have a broad enough consensus.

BR,
Jukka Zitting
[jira] Commented: (JCR-1050) Remove synchronization from JNDI data sources
[ https://issues.apache.org/jira/browse/JCR-1050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12520257 ]

Stefan Guggisberg commented on JCR-1050:

discussion on the dev list:

-- Forwarded message --
From: Thomas Mueller [EMAIL PROTECTED]
Date: Aug 2, 2007 9:33 AM
Subject: Re: [jira] Created: (JCR-1050) Remove synchronization from JNDI data sources
To: dev@jackrabbit.apache.org

Hi,

I'm not sure if I understand this request for improvement.

> Using datasources

So you suggest to use DataSource.getConnection(..) instead of DriverManager.getConnection(..)? How do you get / create the datasource object, using JNDI? What about embedded applications where JNDI is not available?

> one should be able to rely on the application server to manage PreparedStatement caches

Do you suggest to create a new PreparedStatement for each request?

> therefore pre-creating and holding onto the connection for long periods of time should not be needed.

Could you explain the advantages of 'not holding onto the connection'? I know that MySQL closes connections after 8 hours idle time, are there any other advantages?

> This relates to improvement JCR-313, however, that change did not address the benefits one could see in using an application server controlled datasource.

What are those benefits?

> Even if jackrabbit does aim to use an embedded database such a system could be configured to use datasources and could benefit from the removal of the synchronization.

In what way would removal of the synchronization be a benefit? Do you think it would be faster without synchronization? How would you make sure statements are executed in the right order?

Thanks,
Thomas

Remove synchronization from JNDI data sources
-
Key: JCR-1050
URL: https://issues.apache.org/jira/browse/JCR-1050
Project: Jackrabbit
Issue Type: Improvement
Components: core
Reporter: Padraic Hannon
Attachments: JNDI_Datasource_Changes.diff

Using datasources one should be able to rely on the application server to manage PreparedStatement caches therefore pre-creating and holding onto the connection for long periods of time should not be needed. This relates to improvement JCR-313, however, that change did not address the benefits one could see in using an application server controlled datasource. Even if jackrabbit does aim to use an embedded database such a system could be configured to use datasources and could benefit from the removal of the synchronization.
[jira] Commented: (JCR-1050) Remove synchronization from JNDI data sources
[ https://issues.apache.org/jira/browse/JCR-1050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12520258 ]

Stefan Guggisberg commented on JCR-1050:

discussion on the dev list:

-- Forwarded message --
From: hannonpi [EMAIL PROTECTED]
Date: Aug 8, 2007 2:20 AM
Subject: Re: [jira] Created: (JCR-1050) Remove synchronization from JNDI data sources
To: dev@jackrabbit.apache.org

See reply threaded below. Perhaps this should be moved into the Jira ticket?

-paddy

Thomas Mueller-6 wrote:

> Hi,
>
> I'm not sure if I understand this request for improvement.
>
>> Using datasources
>
> So you suggest to use DataSource.getConnection(..) instead of DriverManager.getConnection(..)? How do you get / create the datasource object, using JNDI? What about embedded applications where JNDI is not available?

I attached code to the ticket. Basically, this assumes that one is running inside an application server container. I am not suggesting this be the only driver, just that the JNDI driver should be built in such a way as to make use of the facilities provided by JEE containers (datasources, JTA, etc.).

>> one should be able to rely on the application server to manage PreparedStatement caches
>
> Do you suggest to create a new PreparedStatement for each request?

Yes, let the datasource or DB handle caching the PreparedStatements rather than holding them in an internal map.

>> therefore pre-creating and holding onto the connection for long periods of time should not be needed.
>
> Could you explain the advantages of 'not holding onto the connection'? I know that MySQL closes connections after 8 hours idle time, are there any other advantages?

Why hold onto resources one is not using? Let other threads take them.

>> This relates to improvement JCR-313, however, that change did not address the benefits one could see in using an application server controlled datasource.
>
> What are those benefits?

Less code in jackrabbit for managing transactions and less synchronization leading to less potential threading conflicts.

>> Even if jackrabbit does aim to use an embedded database such a system could be configured to use datasources and could benefit from the removal of the synchronization.
>
> In what way would removal of the synchronization be a benefit? Do you think it would be faster without synchronization? How would you make sure statements are executed in the right order?

Our experience over the last year or so of using CQ and CRX has led us to believe that synchronization has serious performance penalties in high traffic situations. In general I would think that the fewer synchronized parts the better. This is not a request to entirely do away with synchronized blocks. However, looking at the DB drivers it seemed that the sole purpose of such blocks was to handle the fact that statements and connections were held open for long periods by the driver. I would assume that allowing multiple threads to read would have serious performance implications and that, by allowing the container and DB to manage transactions, one could decide on the transaction isolation level outside of the core code to deal with dirty reads etc.

> Thanks,
> Thomas

--
View this message in context: http://www.nabble.com/-jira--Created%3A-%28JCR-1050%29-Remove-synchronization-from-JNDI-data-sources-tf4203578.html#a12044986
Sent from the Jackrabbit - Dev mailing list archive at Nabble.com.
[jira] Commented: (JCR-1050) Remove synchronization from JNDI data sources
[ https://issues.apache.org/jira/browse/JCR-1050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12520259 ]

Stefan Guggisberg commented on JCR-1050:

discussion on the dev list:

-- Forwarded message --
From: Thomas Mueller [EMAIL PROTECTED]
Date: Aug 10, 2007 8:56 AM
Subject: Re: [jira] Created: (JCR-1050) Remove synchronization from JNDI data sources
To: dev@jackrabbit.apache.org

Hi,

> I am not suggesting this be the only driver, just that the JNDI driver should be built in such a way as to make use of the facilities provided by JEE containers (datasources, jta, etc).

I think using JNDI as an alternative way to get the connection is fine.

>> Do you suggest to create a new PreparedStatement for each request?
>
> Yes, let the datasource or DB handle caching the PreparedStatements rather than holding them in an internal map.

I don't think there are advantages in using prepared statements from a data source compared to using your own prepared statements.

> pre-creating ... should not be needed.

I agree, it's not required to create all prepared statements when connecting. It would be OK if they are created when required (and then put in a hash map or so).

> holding onto the connection for long periods ... should not be needed.

Except for MySQL (where the connection drops after a few hours) I don't see a problem doing that. There is a risk (for all remote databases) that the connection drops temporarily (network cable disconnected or so), but if you want to solve that you need to add some reconnect functionality, even when using data sources.

>> advantages of 'not holding onto the connection'?
>
> Why hold onto resources one is not using? Let other threads take them.

You mean other threads inside Jackrabbit? As far as I know, the persistence engine of Jackrabbit doesn't require multiple connections. Or do you mean other threads inside other applications? I suggest not to access Jackrabbit databases directly.

> Less code in jackrabbit for managing transactions

I don't think it would be less code. You need to maintain the current behavior anyway (using DriverManager to get the connection). So adding separate persistence managers (which would be required for all databases) would double the maintenance work? I think there are already too many persistence managers. But I agree, getting the connection from a data source would make sense. This could be integrated into the current persistence manager(s).

> and less synchronization leading to less potential threading conflicts.

You probably mean higher concurrency. However I don't think that this would be possible just because data sources are used.

> synchronization has serious performance penalties in high traffic situations. In general I would think that the fewer synchronized parts the better.

When using one connection: some JDBC drivers are not thread-safe, which means there is a risk in accessing the same connection from multiple threads at the same time. Others are thread-safe, but synchronize internally, so there would be no benefit.

When using multiple connections, there are new problems. Are you suggesting to use multiple connections inside one persistence manager? The connection defines the scope of the transaction, so using multiple connections would mean multiple concurrent transactions. As far as I know, the current Jackrabbit engine does not support this. Actually, I think Jackrabbit _should_ use one database connection per session. The problem is, the architecture is currently not like that.

> the purpose of synchronized blocks was to handle the fact that statements and connections were held open for long periods by the driver.

I don't think this is the reason why synchronization is used (but I might be wrong). In my view, synchronization is used to make sure the JDBC objects (statements, result sets) are not accessed concurrently.

> that allowing multiple threads to read would have serious performance implications

With the current architecture, I don't think removing synchronization would improve the performance. But if it does improve performance, of course this should be implemented.

Thomas
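Editorial illustration of the "created when required (and then put in a hash map or so)" remark: a minimal sketch of a lazily filled, synchronized statement cache. It is generic so that it runs without a database; in a real persistence manager the cached type would presumably be java.sql.PreparedStatement and the preparer would wrap Connection.prepareStatement. The class name LazyStatementCache is hypothetical and this is not Jackrabbit's actual code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch; not taken from any Jackrabbit persistence manager.
class LazyStatementCache<S> {
    private final Map<String, S> statements = new HashMap<>();
    private final Function<String, S> preparer;

    LazyStatementCache(Function<String, S> preparer) {
        this.preparer = preparer;
    }

    /**
     * Returns the statement for the given SQL, preparing it on first use.
     * Synchronized so that concurrent callers never prepare the same SQL
     * twice or corrupt the map; it does not guard later use of the statement.
     */
    synchronized S get(String sql) {
        S statement = statements.get(sql);
        if (statement == null) {
            statement = preparer.apply(sql);
            statements.put(sql, statement);
        }
        return statement;
    }

    /** Number of distinct statements prepared so far. */
    synchronized int size() {
        return statements.size();
    }

    public static void main(String[] args) {
        int[] prepared = {0};
        LazyStatementCache<String> cache = new LazyStatementCache<>(sql -> {
            prepared[0]++;
            return "prepared(" + sql + ")";
        });
        cache.get("select * from NODE where ID = ?");
        cache.get("select * from NODE where ID = ?");
        System.out.println("statements prepared: " + prepared[0]);
    }
}
```

Executing a cached statement would still need its own synchronization (or one connection per session, as suggested above), since JDBC statements and result sets should not be used concurrently.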
[jira] Created: (JCR-1065) Workspace{Copy|Move}VersionableTest assumptions on versioning
Workspace{Copy|Move}VersionableTest assumptions on versioning
-
Key: JCR-1065
URL: https://issues.apache.org/jira/browse/JCR-1065
Project: Jackrabbit
Issue Type: Bug
Components: JCR TCK
Reporter: Julian Reschke

These test cases assume that an ancestor of a versioned node can be made versioned. This may not be true for all JCR-compliant stores.

There should be a way to skip the test when it cannot be executed. One obvious approach would be to throw a NotExecutableException when the attempt to enable versioning on the parent fails. However, this has the drawback that it can mask configuration errors.

Thoughts?
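Editorial illustration of the approach described in the issue: a minimal sketch in which a failed attempt to enable versioning on the parent is turned into a "not executable" outcome while the original cause is kept, so that a configuration error remains visible in the skip message. The NotExecutableException below is a local stand-in for the TCK class of the same name, and ParentVersioning, prepare and canRun are hypothetical names.

```java
// Hypothetical sketch; the types below are local stand-ins, not TCK classes.
class VersionableSetupSketch {

    /** Local stand-in for the TCK's NotExecutableException (assumed shape). */
    static final class NotExecutableException extends Exception {
        private static final long serialVersionUID = 1L;

        NotExecutableException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Hypothetical hook for "make the parent of the versioned node versionable". */
    interface ParentVersioning {
        void enable() throws Exception;
    }

    /** Test setup: converts a failure to enable versioning into a skip, keeping the cause. */
    static void prepare(ParentVersioning parent) throws NotExecutableException {
        try {
            parent.enable();
        } catch (Exception e) {
            throw new NotExecutableException(
                    "Cannot enable versioning on parent: " + e.getMessage(), e);
        }
    }

    /** Returns true if the test body may run, false if it is skipped. */
    static boolean canRun(ParentVersioning parent) {
        try {
            prepare(parent);
            return true;
        } catch (NotExecutableException e) {
            System.out.println("SKIPPED: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(canRun(() -> { }));
        System.out.println(canRun(() -> {
            throw new UnsupportedOperationException("parent cannot be versioned");
        }));
    }
}
```

Logging the cause does not remove the drawback raised in the issue, but it at least makes a masked configuration error easier to spot in the test output.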
[jira] Created: (JCR-1066) Exclude system index for queries that restrict the result set to nodetypes not available in the jcr:system subtree
Exclude system index for queries that restrict the result set to nodetypes not available in the jcr:system subtree
---
Key: JCR-1066
URL: https://issues.apache.org/jira/browse/JCR-1066
Project: Jackrabbit
Issue Type: Improvement
Components: query
Reporter: Christoph Kiehl
Assignee: Christoph Kiehl
Priority: Minor
Fix For: 1.4

We already have code that is able to decide whether the system index needs to be included in a search or not (see JCR-967). If I execute a query like my:app//element(*, my:doc), this will only search the workspace index. Unfortunately this is slower than //element(*, my:doc), since the first query cannot be optimized the way the second can.

In our case both queries return the same result set because we use application-specific node types. Even though the second query includes the system index, it is still faster than the first one. But it could be even faster, because it doesn't need to search the system index: nodes with the application-specific node type can't be added to the jcr:system tree and are therefore never added to the system index (am I right?).
[jira] Updated: (JCR-1066) Exclude system index for queries that restrict the result set to nodetypes not available in the jcr:system subtree
[ https://issues.apache.org/jira/browse/JCR-1066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christoph Kiehl updated JCR-1066:
Attachment: patch.txt

This is an initial patch. I'm not fully satisfied with it because PathQueryNode shouldn't know about NodeTypeQueryNode or node types. Maybe someone has a better idea how to implement this?
[jira] Commented: (JCR-1050) Remove synchronization from JNDI data sources
[ https://issues.apache.org/jira/browse/JCR-1050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12520275 ]

Stefan Guggisberg commented on JCR-1050:

paddy, thanks for the patch, i appreciate your efforts. i have a few general comments regarding the patch:
- the patch is *huge* (2k lines) and incorporates massive refactoring-related and other changes in several jackrabbit classes; this makes tracking and understanding the actual changes very difficult at best.
- the subject of this issue suggests that only JNDI datasource related classes would be affected ('Remove synchronization from JNDI data sources'). the scope of the patch is much broader as far as i can tell from browsing through the diff.
- the patch is incomplete; i wasn't able to apply it because of some missing files.

rather than refactoring the current implementations i'd like to encourage you to write a separate, independent persistence manager (accepting some code redundancy). that would enable us to perform one-to-one performance, functional and scalability tests. the test results would provide a better basis for decision-making. without such tests we can only guess and make assumptions.

cheers
stefan

> Remove synchronization from JNDI data sources
> Key: JCR-1050
> URL: https://issues.apache.org/jira/browse/JCR-1050
> Project: Jackrabbit
> Issue Type: Improvement
> Components: core
> Reporter: Padraic Hannon
> Attachments: JNDI_Datasource_Changes.diff
>
> Using datasources one should be able to rely on the application server to manage PreparedStatement caches therefore pre-creating and holding onto the connection for long periods of time should not be needed. This relates to improvement JCR-313, however, that change did not address the benefits one could see in using an application server controlled datasource. Even if jackrabbit does aim to use an embedded database such a system could be configured to use datasources and could benefit from the removal of the synchronization.
Re: [jira] Commented: (JCR-1050) Remove synchronization from JNDI data sources
Stefan Guggisberg (JIRA) wrote:
> [ https://issues.apache.org/jira/browse/JCR-1050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12520257 ]
>
> Stefan Guggisberg commented on JCR-1050:
>
> discussion on the dev list:
> -- Forwarded message --
> From: Thomas Mueller [EMAIL PROTECTED]
> Date: Aug 2, 2007 9:33 AM
> Subject: Re: [jira] Created: (JCR-1050) Remove synchronization from JNDI data sources
> To: dev@jackrabbit.apache.org
>
> Hi,
> I'm not sure if I understand this request for improvement.
>
>> Using datasources
>
> So you suggest to use DataSource.getConnection(..) instead of DriverManager.getConnection(..)? How do you get / create the datasource object, using JNDI? What about embedded applications where JNDI is not available?

I would really like to see Jackrabbit support DataSource and JNDI. This simplifies usage in application servers and corporate environments (corporate = the AS admins configure the datasource in the AS and will ask why you have a JEE app which cannot use the JDBC pool for connections ... no chance that in your role as an 'application provider' you will get the production DB password!).

How about:
- Use commons-dbcp for creating and managing the datasource.
- All DB-backed PM/FS only use an 'injected' DataSource to get a single connection for now. This greatly reduces the redundant create-connection-from-driver-manager logic in FS, PM and for all implementation types (bundled, simple, ...). Reconnects fetch a fresh connection from the data source.
- Create a JNDI PM/FS wrapper for datasource-based PM/FS which would fetch the data source from JNDI and inject it into the wrapped PM/FS.

>> one should be able to rely on the application server to manage PreparedStatement caches
>
> Do you suggest to create a new PreparedStatement for each request?

As already mentioned before in this thread: a JEE datasource pool handles PreparedStatement caching nicely (nice article: http://www.theserverside.com/tt/articles/article.tss?l=Prepared-Statments). I'm not sure if commons-dbcp would do that, too ... ???

>> therefore pre-creating and holding onto the connection for long periods of time should not be needed.
>
> Could you explain the advantages of 'not holding onto the connection'? I know that MySQL closes connections after 8 hours idle time, are there any other advantages?

The MySQL idle timeout can be configured on the server side. Also, some firewalls close idle connections. Connection pools can 'health' check the connections before handing one to the application (e.g. JR). Most DB vendors provide optimized health checking utils (e.g. for MySQL when configuring a datasource on JBoss).

>> This relates to improvement JCR-313, however, that change did not address the benefits one could see in using an application server controlled datasource.
>
> What are those benefits?
>
>> Even if jackrabbit does aim to use an embedded database such a system could be configured to use datasources and could benefit from the removal of the synchronization.
>
> In what way would removal of the synchronization be a benefit? Do you think it would be faster without synchronization? How would you make sure statements are executed in the right order?
>
> Thanks,
> Thomas
>
>> Remove synchronization from JNDI data sources
>> Key: JCR-1050
>> URL: https://issues.apache.org/jira/browse/JCR-1050
>> Project: Jackrabbit
>> Issue Type: Improvement
>> Components: core
>> Reporter: Padraic Hannon
>> Attachments: JNDI_Datasource_Changes.diff
>>
>> Using datasources one should be able to rely on the application server to manage PreparedStatement caches therefore pre-creating and holding onto the connection for long periods of time should not be needed. This relates to improvement JCR-313, however, that change did not address the benefits one could see in using an application server controlled datasource. Even if jackrabbit does aim to use an embedded database such a system could be configured to use datasources and could benefit from the removal of the synchronization.
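The 'injected DataSource plus JNDI wrapper' idea from this reply might be sketched in Java as follows. This is not Jackrabbit code: `DataSourceBackedStore` and `JndiStoreFactory` are hypothetical stand-ins for a DB-backed PM/FS and its JNDI wrapper, and only standard JDBC and JNDI calls (`DataSource.getConnection()`, `Connection.isClosed()`, `InitialContext.lookup()`) are used.

```java
import java.sql.Connection;
import java.sql.SQLException;
import javax.naming.InitialContext;
import javax.naming.NamingException;
import javax.sql.DataSource;

// Hypothetical stand-in for a DB-backed persistence manager or file system
// that no longer creates connections through DriverManager itself.
class DataSourceBackedStore {
    private final DataSource dataSource; // injected, e.g. by commons-dbcp or an app server
    private Connection connection;       // the single connection used "for now"

    DataSourceBackedStore(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    // Returns the current connection; a reconnect simply fetches a
    // fresh connection from the data source.
    synchronized Connection getConnection() throws SQLException {
        if (connection == null || connection.isClosed()) {
            connection = dataSource.getConnection();
        }
        return connection;
    }
}

// Hypothetical wrapper: looks the data source up in JNDI and injects it.
class JndiStoreFactory {
    static DataSourceBackedStore fromJndi(String jndiName) throws NamingException {
        InitialContext context = new InitialContext();
        try {
            DataSource dataSource = (DataSource) context.lookup(jndiName);
            return new DataSourceBackedStore(dataSource);
        } finally {
            context.close();
        }
    }
}
```

With this split, an embedded setup without JNDI could construct `DataSourceBackedStore` directly from any `DataSource`, while an application server deployment would go through the JNDI lookup.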
[jira] Updated: (JCR-1066) Exclude system index for queries that restrict the result set to nodetypes not available in the jcr:system subtree
[ https://issues.apache.org/jira/browse/JCR-1066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christoph Kiehl updated JCR-1066:
Attachment: (was: patch.txt)
[jira] Commented: (JCR-1050) Remove synchronization from JNDI data sources
[ https://issues.apache.org/jira/browse/JCR-1050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12520309 ]

Padraic Hannon commented on JCR-1050:

That makes sense, I was trying to eliminate duplication of code and ensure that there was a common code base. I will do a more coarse implementation first so we can get a better idea of what the changes are.

-Paddy
[jira] Updated: (JCR-1067) Referenced beans in an object graph should be persisted by the ocm automatically
[ https://issues.apache.org/jira/browse/JCR-1067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Padraic Hannon updated JCR-1067:
Comment: was deleted

> Referenced beans in an object graph should be persisted by the ocm automatically
> Key: JCR-1067
> URL: https://issues.apache.org/jira/browse/JCR-1067
> Project: Jackrabbit
> Issue Type: Improvement
> Components: jcr-mapping
> Affects Versions: 1.3.1
> Reporter: Padraic Hannon
> Attachments: BeanReferenceCollectionConverterImpl.diff, ReferenceBeanConverterImpl.diff
>
> Currently the BeanReferenceCollectionConverter and ReferenceBeanConverter classes only persist the UUID of the referenced object. There should either be new converter classes that cascade down the object graph to ensure all referenced items are created or updated, or the existing ones should be updated to cascade.
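The cascading behaviour requested in JCR-1067 can be illustrated with a small graph walk. This sketch does not use the real OCM converter API: `Bean`, `CascadingPersister` and the in-memory map that plays the role of the repository are all hypothetical, and the cycle guard is an addition made for this sketch.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical bean that references other beans by object, not only by UUID.
class Bean {
    final String uuid;
    final List<Bean> references = new ArrayList<>();

    Bean(String uuid) {
        this.uuid = uuid;
    }
}

class CascadingPersister {
    // Stand-in for the repository: UUID -> stored bean.
    private final Map<String, Bean> store = new LinkedHashMap<>();

    // Persists the root bean and everything reachable through its references.
    void persist(Bean root) {
        persist(root, new HashSet<>());
    }

    private void persist(Bean bean, Set<String> visited) {
        if (!visited.add(bean.uuid)) {
            return; // already handled in this pass; also guards against cycles
        }
        store.put(bean.uuid, bean); // create or update
        for (Bean referenced : bean.references) {
            persist(referenced, visited); // cascade down the object graph
        }
    }

    Set<String> storedUuids() {
        return store.keySet();
    }
}
```

A non-cascading converter would correspond to storing only `bean.uuid` for each reference and leaving the referenced beans untouched.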
Re: [jira] Created: (JCR-1050) Remove synchronization from JNDI data sources
Hi,

Currently Jackrabbit uses one persistence manager per workspace, and one for versioning. That means the same persistence manager is used for all sessions (in a workspace).

> there should be a new manager per session, ie per usage thread.

While the current architecture has advantages, the approach 'one database connection per session' also has advantages. I don't think it will be easy to implement, and there would be additional problems (transaction isolation for example). We should try to find out how much faster / more scalable this solution would be. What about defining a use case and then writing a small 'benchmark type' application, to find out whether using multiple connections really would help, and how much? Test driven development. What do you think?

Thomas
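A 'benchmark type' application of the kind suggested here could start from a harness like the one below. It is a rough sketch under stated assumptions: `FakeConnection` is a hypothetical stand-in whose `execute()` only increments a counter, so the numbers it prints say nothing about a real database; the point is only the shape of the comparison between one shared, synchronized connection and one connection per session.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Hypothetical stand-in for a connection whose calls are serialized.
class FakeConnection {
    private long executed;

    synchronized void execute() {
        executed++; // placeholder for real statement execution
    }

    synchronized long executed() {
        return executed;
    }
}

class ConnectionBenchmark {
    // Runs one thread per session, each performing opsPerSession calls on the
    // connection handed out by the supplier. Returns the elapsed time in nanoseconds.
    static long run(int sessions, int opsPerSession,
                    Supplier<FakeConnection> connectionForSession)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(sessions);
        CountDownLatch done = new CountDownLatch(sessions);
        long start = System.nanoTime();
        for (int s = 0; s < sessions; s++) {
            FakeConnection connection = connectionForSession.get();
            pool.execute(() -> {
                for (int i = 0; i < opsPerSession; i++) {
                    connection.execute();
                }
                done.countDown();
            });
        }
        done.await();
        long elapsed = System.nanoTime() - start;
        pool.shutdown();
        return elapsed;
    }

    public static void main(String[] args) throws InterruptedException {
        FakeConnection shared = new FakeConnection();
        long sharedNanos = run(8, 100_000, () -> shared);            // current architecture
        long perSessionNanos = run(8, 100_000, FakeConnection::new); // one connection per session
        System.out.println("shared connection:      " + sharedNanos / 1_000_000 + " ms");
        System.out.println("connection per session: " + perSessionNanos / 1_000_000 + " ms");
    }
}
```

To turn this into a meaningful test, `FakeConnection` would have to be replaced by real persistence manager calls against a real database, and the use case (read/write mix, number of sessions) would need to be agreed on first.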