[jira] Commented: (JCR-2900) DbClusterTest failure due to network configuration
[ https://issues.apache.org/jira/browse/JCR-2900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12996308#comment-12996308 ]

Serge Huber commented on JCR-2900:
----------------------------------

Thanks Jukka for creating this bug. I can confirm the fix works fine.

DbClusterTest failure due to network configuration
--------------------------------------------------

Key: JCR-2900
URL: https://issues.apache.org/jira/browse/JCR-2900
Project: Jackrabbit Content Repository
Issue Type: Bug
Components: jackrabbit-core
Affects Versions: 2.2.4
Reporter: Jukka Zitting
Assignee: Thomas Mueller
Priority: Minor
Fix For: 2.2.5

As reported by Serge, the DbClusterTest case fails when run with certain network configurations. Thomas already suggested a fix:

    ### Eclipse Workspace Patch 1.0
    #P jackrabbit-core
    Index: src/test/java/org/apache/jackrabbit/core/cluster/DbClusterTest.java
    ===================================================================
    --- src/test/java/org/apache/jackrabbit/core/cluster/DbClusterTest.java (revision 1067983)
    +++ src/test/java/org/apache/jackrabbit/core/cluster/DbClusterTest.java (working copy)
    @@ -37,9 +37,9 @@
         public void setUp() throws Exception {
             deleteAll();
             server1 = Server.createTcpServer("-tcpPort", "9001", "-baseDir",
    -                "./target/dbClusterTest/db1").start();
    +                "./target/dbClusterTest/db1", "-tcpAllowOthers").start();
             server2 = Server.createTcpServer("-tcpPort", "9002", "-baseDir",
    -                "./target/dbClusterTest/db2").start();
    +                "./target/dbClusterTest/db2", "-tcpAllowOthers").start();
             FileUtils.copyFile(
                     new File("./src/test/resources/org/apache/jackrabbit/core/cluster/repository-h2.xml"),
                     new File("./target/dbClusterTest/node1/repository.xml"));

--
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: [VOTE] Release Apache Jackrabbit 2.2.4
Thank you very much for your detailed answer. I did indeed have something strange in my /etc/hosts file, but removing it didn't change anything.

Here is the information you requested:

    localhost:h2 loom$ cat /etc/hosts
    ##
    # Host Database
    #
    # localhost is used to configure the loopback interface
    # when the system is booting. Do not change this entry.
    ##
    127.0.0.1 localhost
    255.255.255.255 broadcasthost
    ::1 localhost
    fe80::1%lo0 localhost

    ./build.sh testNetwork
    localhost:localhost/10.X.37.XXX
    localhost/10.X.37.XXX
    localhost/10.X.129.2
    localhost/10.X.55.2
    localhost/192.168.74.1
    localhost/192.168.223.1
    localhost/fe80:0:0:0:225:4bff:fea6:8410%4
    localhost/fe80:0:0:0:21c:42ff:fe00:0%7
    localhost/fe80:0:0:0:21c:42ff:fe00:1%8
    getLocalHost:localhost/127.0.0.1
    /127.0.0.1
    byName:/127.0.0.1
    ServerSocket[addr=0.0.0.0/0.0.0.0,port=0,localport=59901]
    server accepting
    client:/127.0.0.1:59901
    server accepted:Socket[addr=/127.0.0.1,port=59902,localport=59901]
    client:Socket[addr=/127.0.0.1,port=59901,localport=59902]
    server read:123
    client read:234
    done
    Done in 1430 ms
    server closing
    server done

I have VMWare and ShareTool installed, which is probably why all these interfaces are present. I also obscured some addresses just to be on the safe side :)

I would prefer not to disable the test case, but rather improve it so that it handles this configuration.

cheers,
Serge...

On 14 Feb 2011, at 09:48, Thomas Mueller wrote:

> Hi,
>
>> Remote connections to this server are not allowed, see -tcpAllowOthers [90117-149]
>
> This looks more like a network problem with the H2 database than a problem with Jackrabbit. It might be a network configuration problem, or another service is already running on port 9001 or 9002. This test case starts two H2 servers on ports 9001 and 9002, and then tries to connect to them over the local IP address (not localhost).
>
> To find out if it's a network config problem, could you check your /etc/hosts file for weird entries? I saw similar problems before, but I don't know the root cause. One option, of course, is to disable this test case, but that would be a bit sad, because it's the only clustering test case.
>
> To find out if it's really a network config problem, please download the H2 database (h2database.com), run
>
>     ./build.sh testNetwork
>
> and send me the result (only the last part; the system properties are not relevant). See below for what I get on my machine (Mac OS X as well).
>
> It might be possible to work around the problem by setting the system property h2.bindAddress to localhost (but I didn't actually test this).
>
>     cat /etc/hosts
>     127.0.0.1 localhost
>     255.255.255.255 broadcasthost
>     ::1 localhost
>     fe80::1%lo0 localhost
>
>     ./build.sh testNetwork
>     localhost:localhost/127.0.0.1
>     localhost/127.0.0.1
>     localhost/0:0:0:0:0:0:0:1
>     localhost/fe80:0:0:0:0:0:0:1%1
>     getLocalHost:Thomas-Muellers-MacBook-Pro.local/10.131.197.10
>     /10.131.197.10
>     byName:/10.131.197.10
>     ServerSocket[addr=0.0.0.0/0.0.0.0,port=0,localport=63643]
>     server accepting
>     client:/10.131.197.10:63643
>     server accepted:Socket[addr=/10.131.197.10,port=63644,localport=63643]
>     client:Socket[addr=/10.131.197.10,port=63643,localport=63644]
>     server read:123
>     client read:234
>     server closing
>     server done
>     done
>     Done in 1742 ms
>
> Regards,
> Thomas
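The interesting part of the testNetwork output is what `localhost` resolves to and which address the JVM reports for the local host. A minimal Java sketch of the same kind of lookup, using only `java.net` from the JDK, might look like the following. It is not H2's testNetwork code; the class `NetworkReport` and its methods are hypothetical names chosen for illustration.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical diagnostic helper; not part of H2 or Jackrabbit. */
final class NetworkReport {

    /** Returns the textual IP addresses a host name resolves to (empty if unknown). */
    static List<String> resolve(String host) {
        List<String> result = new ArrayList<>();
        try {
            for (InetAddress address : InetAddress.getAllByName(host)) {
                result.add(address.getHostAddress());
            }
        } catch (UnknownHostException e) {
            // An unresolvable name is reported as "no addresses".
        }
        return result;
    }

    /** Returns the address the JVM reports for the local host, or null if unknown. */
    static String localHostAddress() {
        try {
            return InetAddress.getLocalHost().getHostAddress();
        } catch (UnknownHostException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println("localhost    -> " + resolve("localhost"));
        System.out.println("getLocalHost -> " + localHostAddress());
    }
}
```

Comparing the two printed lines is the point: in the outputs quoted above, one machine resolves `localhost` to 127.0.0.1 and reports a 10.x address for the local host, while the other does the opposite.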
Re: [VOTE] Release Apache Jackrabbit 2.2.4
Hello Thomas,

I just tested the patch and it works fine here. Thanks a lot. I suggest you also commit it on the 2.2 branch, if that's ok?

Best regards,
Serge Huber.

On 14 Feb 2011, at 10:50, Thomas Mueller wrote:

> Hi,
>
> The H2 database server thinks that the localhost connection is actually coming from another machine, and therefore rejects it (for security reasons). To disable this check, apply the following patch:
>
>     ### Eclipse Workspace Patch 1.0
>     #P jackrabbit-core
>     Index: src/test/java/org/apache/jackrabbit/core/cluster/DbClusterTest.java
>     ===================================================================
>     --- src/test/java/org/apache/jackrabbit/core/cluster/DbClusterTest.java (revision 1067983)
>     +++ src/test/java/org/apache/jackrabbit/core/cluster/DbClusterTest.java (working copy)
>     @@ -37,9 +37,9 @@
>          public void setUp() throws Exception {
>              deleteAll();
>              server1 = Server.createTcpServer("-tcpPort", "9001", "-baseDir",
>     -                "./target/dbClusterTest/db1").start();
>     +                "./target/dbClusterTest/db1", "-tcpAllowOthers").start();
>              server2 = Server.createTcpServer("-tcpPort", "9002", "-baseDir",
>     -                "./target/dbClusterTest/db2").start();
>     +                "./target/dbClusterTest/db2", "-tcpAllowOthers").start();
>              FileUtils.copyFile(
>                      new File("./src/test/resources/org/apache/jackrabbit/core/cluster/repository-h2.xml"),
>                      new File("./target/dbClusterTest/node1/repository.xml"));
>
> I will apply it in the trunk if this solves the problem for you.
>
> Regards,
> Thomas
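The rejection Thomas describes comes down to a server deciding whether a peer address belongs to the local machine. As a rough sketch of how such a check can be written with the JDK alone (this is not H2's implementation, and `LocalAddressCheck` and `isLocalAddress` are hypothetical names):

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;

/** Hypothetical illustration; H2's real check may differ. */
final class LocalAddressCheck {

    /**
     * Returns true if the peer address is a loopback address or is bound
     * to one of this machine's network interfaces.
     */
    static boolean isLocalAddress(InetAddress peer) {
        if (peer.isLoopbackAddress()) {
            return true;
        }
        try {
            // getByInetAddress returns null when no local interface has this address.
            return NetworkInterface.getByInetAddress(peer) != null;
        } catch (SocketException e) {
            // If the interfaces cannot be inspected, treat the peer as remote.
            return false;
        }
    }
}
```

A check of this kind can give surprising answers on machines with many virtual interfaces or unusual host name resolution, which would be consistent with the failure only appearing in some environments. Passing -tcpAllowOthers, as the patch does, sidesteps the question for the test servers.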
Re: [VOTE] Release Apache Jackrabbit 2.2.4
Hello Jukka,

I must say I'm not very familiar with the process. Is it possible to release a 2.2.4 that includes this fix, or is it too late in the process? Otherwise I would say let's cut 2.2.4 as it is and prepare this for 2.2.5.

Regards,
Serge Huber.

On 14 Feb 2011, at 13:34, Jukka Zitting wrote:

> Hi,
>
> On 02/14/2011 11:19 AM, Serge Huber wrote:
>> I just tested the patch and this works fine here.
>
> With that sorted out, do you think we should cut a 2.2.5 release with this fix included, or can we go forward with 2.2.4 as-is (I still need one +1)?
>
> It's just a test issue that occurs in a specific environment, so I think we should be fine to release 2.2.4 like this.
>
> --
> Jukka Zitting
Re: [VOTE] Release Apache Jackrabbit 2.2.4
I agree, let's cut 2.2.4 as it is. Here's my vote: +1 :)

And I hope, for my sake and that of others with non-standard configs, that 2.2.5 will be right behind :)

Regards,
Serge...

On 14 Feb 2011, at 14:18, Jukka Zitting wrote:

> Hi,
>
> On 02/14/2011 01:45 PM, Serge Huber wrote:
>> I must say I'm not very familiar with the process. Is it possible to release a 2.2.4 that includes this fix or is it too late in the process? Otherwise I would say let's cut 2.2.4 as it is and prepare this for 2.2.5.
>
> I like to upgrade the version number whenever a new release candidate is made, to avoid confusion. For example, we cancelled the 2.2.3 release vote and cut the 2.2.4 candidate instead to get the somewhat critical deadlock and database journal fixes included.
>
> Version numbers are cheap and it's not a big deal to re-cut a release, but in this case I think the benefit is not worth the trouble of a separate vote.
>
> --
> Jukka Zitting
Re: [VOTE] Release Apache Jackrabbit 2.2.4
I'm having a strange issue when compiling rev 1068882 (which should be the same as 2.2.4). One of the cluster tests fails:

    -------------------------------------------------------------------------------
    Test set: org.apache.jackrabbit.core.cluster.TestAll
    -------------------------------------------------------------------------------
    Tests run: 10, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.266 sec FAILURE!
    test(org.apache.jackrabbit.core.cluster.DbClusterTest) Time elapsed: 0.159 sec ERROR!
    javax.jcr.RepositoryException: File system initialization failure.
        at org.apache.jackrabbit.core.config.RepositoryConfigurationParser$6.getFileSystem(RepositoryConfigurationParser.java:1060)
        at org.apache.jackrabbit.core.config.RepositoryConfig.getFileSystem(RepositoryConfig.java:911)
        at org.apache.jackrabbit.core.RepositoryImpl.init(RepositoryImpl.java:285)
        at org.apache.jackrabbit.core.RepositoryImpl.create(RepositoryImpl.java:605)
        at org.apache.jackrabbit.core.cluster.DbClusterTest.test(DbClusterTest.java:62)
    Caused by: org.apache.jackrabbit.core.fs.FileSystemException: failed to initialize file system
        at org.apache.jackrabbit.core.fs.db.DatabaseFileSystem.init(DatabaseFileSystem.java:210)
        at org.apache.jackrabbit.core.config.RepositoryConfigurationParser$6.getFileSystem(RepositoryConfigurationParser.java:1057)
        ... 31 more
    Caused by: org.apache.commons.dbcp.SQLNestedException: Cannot create PoolableConnectionFactory (Remote connections to this server are not allowed, see -tcpAllowOthers [90117-149])
        at org.apache.commons.dbcp.BasicDataSource.createDataSource(BasicDataSource.java:1225)
        at org.apache.commons.dbcp.BasicDataSource.getConnection(BasicDataSource.java:880)
        at org.apache.jackrabbit.core.util.db.ConnectionHelper.getExtraNameCharacters(ConnectionHelper.java:163)
        at org.apache.jackrabbit.core.util.db.ConnectionHelper.prepareDbIdentifier(ConnectionHelper.java:118)
        at org.apache.jackrabbit.core.fs.db.DatabaseFileSystem.init(DatabaseFileSystem.java:193)
        ... 32 more
    Caused by: org.h2.jdbc.JdbcSQLException: Remote connections to this server are not allowed, see -tcpAllowOthers [90117-149]
        at org.h2.message.DbException.getJdbcSQLException(DbException.java:327)
        at org.h2.message.DbException.get(DbException.java:167)
        at org.h2.message.DbException.get(DbException.java:144)
        at org.h2.message.DbException.get(DbException.java:133)
        at org.h2.server.TcpServerThread.run(TcpServerThread.java:69)
        at java.lang.Thread.run(Thread.java:680)
        at org.h2.engine.SessionRemote.done(SessionRemote.java:543)
        at org.h2.engine.SessionRemote.initTransfer(SessionRemote.java:109)
        at org.h2.engine.SessionRemote.connectServer(SessionRemote.java:376)
        at org.h2.engine.SessionRemote.connectEmbeddedOrServer(SessionRemote.java:271)
        at org.h2.engine.SessionRemote.createSession(SessionRemote.java:265)
        at org.h2.jdbc.JdbcConnection.init(JdbcConnection.java:110)
        at org.h2.jdbc.JdbcConnection.init(JdbcConnection.java:94)
        at org.h2.Driver.connect(Driver.java:62)
        at org.apache.commons.dbcp.DriverConnectionFactory.createConnection(DriverConnectionFactory.java:38)
        at org.apache.commons.dbcp.PoolableConnectionFactory.makeObject(PoolableConnectionFactory.java:294)
        at org.apache.commons.dbcp.BasicDataSource.validateConnectionFactory(BasicDataSource.java:1247)
        at org.apache.commons.dbcp.BasicDataSource.createDataSource(BasicDataSource.java:1221)
        ... 36 more

I should mention that I am testing on Mac OS X. Is anyone else seeing this issue?

Best regards,
Serge Huber.

On 9 Feb 2011, at 16:03, Jukka Zitting wrote:

> Hi,
>
> A candidate for the Jackrabbit 2.2.4 release is available at:
>
>     http://people.apache.org/~jukka/jackrabbit/2.2.4/
>
> The release candidate is a zip archive of the sources in:
>
>     http://svn.apache.org/repos/asf/jackrabbit/tags/2.2.4/
>
> The SHA1 checksum of the archive is 210a5056a4ceb711e1e66995aafefbab51753c5a.
>
> A staged Maven repository is available for review at:
>
>     https://repository.apache.org/content/repositories/orgapachejackrabbit-048/
>
> Please vote on releasing this package as Apache Jackrabbit 2.2.4. The vote is open for the next 72 hours and passes if a majority of at least three +1 Jackrabbit PMC votes are cast.
>
>     [ ] +1 Release this package as Apache Jackrabbit 2.2.4
>     [ ] -1 Do not release this package because...
>
> My vote is +1.
>
> --
> Jukka Zitting
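The vote mail publishes a SHA1 checksum so that reviewers can verify the archive they downloaded. As a small sketch of that verification step in Java (the class `Sha1Check` is a hypothetical helper, not part of the Jackrabbit release tooling):

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper for comparing data against a published SHA-1 checksum. */
final class Sha1Check {

    /** Returns the SHA-1 digest of the data as a lowercase hex string. */
    static String sha1Hex(byte[] data) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-1");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(data)) {
                // Mask to 0..255 so negative bytes format as two hex digits.
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    /** Returns true if the data hashes to the expected hex checksum. */
    static boolean matches(byte[] data, String expectedHex) {
        return sha1Hex(data).equalsIgnoreCase(expectedHex.trim());
    }
}
```

To check a downloaded archive, the bytes could be read with `java.nio.file.Files.readAllBytes` and passed to `matches` together with the checksum from the vote mail; the local file name is whatever the reviewer saved the archive as.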
[jira] Commented: (JCR-2415) Update Lucene to 3.0
[ https://issues.apache.org/jira/browse/JCR-2415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984599#action_12984599 ]

Serge Huber commented on JCR-2415:
----------------------------------

Thanks for the work. Do you know how this change should affect query performance? Should it improve, or will it degrade?

Best regards,
Serge Huber.

Update Lucene to 3.0
--------------------

Key: JCR-2415
URL: https://issues.apache.org/jira/browse/JCR-2415
Project: Jackrabbit Content Repository
Issue Type: Improvement
Components: query
Affects Versions: 2.0-beta3
Reporter: Attila Király

Lucene 3.0 was released on 2009/11/25. They migrated to Java 1.5 as Jackrabbit is doing with 2.0. Also they added some new optimizations. It would be nice if Jackrabbit could switch to the new lucene version too.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
Re: Jackrabbit 2.2.1 release plan
Hello Jukka,

Thanks for the clarification. In the future I will always mark the issues for the proper patch release, and brush up on svn merge, because I'm used to doing merges by hand :)

As for JCR-2622: yes, I wrote my question too soon. I noticed afterwards that no fix was required.

Best regards,
Serge Huber.

On 6 Jan 2011, at 16:55, Jukka Zitting wrote:

> Hi,
>
> On Thu, Jan 6, 2011 at 12:04 PM, Serge Huber shub...@jahia.com wrote:
>> First of all thanks for porting the ISDESCENDANTNODE fix to the 2.2 branch.
>
> No problem. To answer your previous question about it, anyone can merge fixes back to maintenance branches (just remember to use svn merge so we have proper merge tracking records in place), but it's the responsibility of the release manager to take care of any pending merges before cutting the release candidate.
>
> I already have the merging process pretty streamlined, so usually it's easiest to simply tag a Jira issue for inclusion in a patch release, and I'll take care of the merging as a part of the release preparation phase.
>
>> I noticed you didn't include 2622? Do you think it should not be included for 2.2.1?
>
> I looked at the issue, but agreed with Fabrizio's conclusion that we actually don't need to fix that.
>
> BR,
>
> Jukka Zitting
Re: Jackrabbit 2.2.1 release plan
Hello Jukka,

First of all, thanks for porting the ISDESCENDANTNODE fix to the 2.2 branch.

I noticed you didn't include 2622? Do you think it should not be included for 2.2.1?

Best regards,
Serge Huber.

On 4 Jan 2011, at 21:05, Jukka Zitting wrote:

> Hi,
>
> It’s a few weeks since we did 2.2.0 and we have a few good fixes lined up for the first 2.2.x patch release, so I’m planning to cut a 2.2.1 release candidate tomorrow afternoon.
>
> I’ll make sure that all the issues tagged for 2.2.1 have their commits merged to the 2.2 branch before cutting the release. See https://issues.apache.org/jira/browse/JCR/fixforversion/12315965 for the current list of issues to be included. Please use Jira or reply here if you’d like to see other fixes included.
>
> BR,
>
> Jukka Zitting
Re: [VOTE] Release Apache Jackrabbit 2.2.1
Looks good to me, so +1 :)

Best regards,
Serge Huber.

On 5 Jan 2011, at 19:46, Jukka Zitting wrote:

> Hi,
>
> A candidate for the Jackrabbit 2.2.1 release is available at:
>
>     http://people.apache.org/~jukka/jackrabbit/2.2.1/
>
> The release candidate is a zip archive of the sources in:
>
>     http://svn.apache.org/repos/asf/jackrabbit/tags/2.2.1/
>
> The SHA1 checksum of the archive is dfe0a621f91ae765a36b3f892f01531e6a054834.
>
> A staged Maven repository is available for review at:
>
>     https://repository.apache.org/content/repositories/orgapachejackrabbit-005/
>
> Please vote on releasing this package as Apache Jackrabbit 2.2.1. The vote is open for the next 72 hours and passes if a majority of at least three +1 Jackrabbit PMC votes are cast.
>
>     [ ] +1 Release this package as Apache Jackrabbit 2.2.1
>     [ ] -1 Do not release this package because...
>
> My vote is +1.
>
> BR,
>
> Jukka Zitting
[jira] Commented: (JCR-2415) Update Lucene to 3.0
[ https://issues.apache.org/jira/browse/JCR-2415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12978270#action_12978270 ]

Serge Huber commented on JCR-2415:
----------------------------------

I wish I could help out. That isn't possible in the immediate future, but I'm very interested in this as well.
Re: Jackrabbit 2.3 performance test sub project
On 17 Dec 2010, at 11:42, Jukka Zitting wrote:

> Hi,
>
>> From: Serge Huber [mailto:shub...@jahia.com]
>> I have taken the liberty of committing a sub-project for Jackrabbit 2.3 because I needed it to validate a patch that we've been working on.
>
> Excellent, thanks! We should probably also set the jackrabbit22 performance test to always use version 2.2.0 instead of a snapshot now that the release is out.

Ok, I've just committed that.

Btw, did you see my comment about porting the patch to Jackrabbit 2.2 for JCR-2835? I realize I asked the question before others had commented, so it wasn't very clear :)

> Serge Huber commented on JCR-2835:
> ----------------------------------
> Thanks Jukka, will you take care of the backport or should I do it?
> Regards, Serge...

Best regards,
Serge Huber.
[jira] Commented: (JCR-2835) Poor performance of ISDESCENDANTNODE on SQL 2 queries
[ https://issues.apache.org/jira/browse/JCR-2835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12972384#action_12972384 ]

Serge Huber commented on JCR-2835:
----------------------------------

Also, maybe we should port this to 2.2.1?

Regards,
Serge Huber.

Poor performance of ISDESCENDANTNODE on SQL 2 queries
-----------------------------------------------------

Key: JCR-2835
URL: https://issues.apache.org/jira/browse/JCR-2835
Project: Jackrabbit Content Repository
Issue Type: Improvement
Components: jackrabbit-core, query
Affects Versions: 2.2.0, 2.2.1, 2.3.0
Reporter: Serge Huber
Fix For: 2.3.0
Attachments: DescendantSearchTest.png, JCR-2835-use-DescendantSelfAxisQuery.patch, JCR-2835_PerformanceTests.patch, JCR-2835_Poor_performance_on_ISDESCENDANTNODE_constraint_v1.patch, SQL2DescendantSearchTest.png

Using the latest source code, I have noticed very bad performance on SQL-2 queries that use the ISDESCENDANTNODE constraint on a large sub-tree. For example, the query:

    select * from [jnt:news] as news where ISDESCENDANTNODE(news,'/root/site') order by news.[date] desc

executes in 600ms, while

    select * from [jnt:news] as news order by news.[date] desc

executes in 4ms.

From looking at the problem in the Yourkit profiler, it seems that the culprit is the constraint building, which uses recursive Lucene searches to build the list of descendant node IDs:

    private Query getDescendantNodeQuery(
            DescendantNode dn, JackrabbitIndexSearcher searcher)
            throws RepositoryException, IOException {
        BooleanQuery query = new BooleanQuery();

        try {
            LinkedList<NodeId> ids = new LinkedList<NodeId>();
            NodeImpl ancestor = (NodeImpl) session.getNode(dn.getAncestorPath());
            ids.add(ancestor.getNodeId());
            while (!ids.isEmpty()) {
                String id = ids.removeFirst().toString();
                Query q = new JackrabbitTermQuery(new Term(FieldNames.PARENT, id));
                QueryHits hits = searcher.evaluate(q);
                ScoreNode sn = hits.nextScoreNode();
                if (sn != null) {
                    query.add(q, SHOULD);
                    do {
                        ids.add(sn.getNodeId());
                        sn = hits.nextScoreNode();
                    } while (sn != null);
                }
            }
        } catch (PathNotFoundException e) {
            query.add(new JackrabbitTermQuery(new Term(
                    FieldNames.UUID, "invalid-node-id")), // never matches
                    SHOULD);
        }

        return query;
    }

In the above example this generates over 2800 Lucene queries, which is the culprit. I wonder if it wouldn't be faster to retrieve the IDs by using the JCR to retrieve the list of child IDs?

This was probably also missed because I couldn't find any performance tests on this constraint.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
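To make the cost of the quoted method concrete, the traversal can be simulated without Lucene: every node taken off the queue triggers one "children of this parent" lookup, so the number of lookups equals the number of nodes in the subtree. The sketch below replaces the index with an in-memory map; `DescendantLookupCost` and its members are hypothetical names, and the map lookup merely stands in for `searcher.evaluate(q)`.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical simulation of the breadth-first descendant collection. */
final class DescendantLookupCost {

    /** Parent id -> child ids; stands in for the PARENT field of the index. */
    private final Map<String, List<String>> childrenByParent = new HashMap<>();

    /** Number of simulated per-parent queries issued so far. */
    int lookups;

    /** Number of clauses the combined boolean query would contain. */
    int clauses;

    void addChildren(String parent, String... children) {
        childrenByParent.put(parent, Arrays.asList(children));
    }

    /** Walks the subtree the same way the quoted method does. */
    void collectDescendants(String ancestor) {
        Deque<String> ids = new ArrayDeque<>();
        ids.add(ancestor);
        while (!ids.isEmpty()) {
            String id = ids.removeFirst();
            lookups++; // one query per visited node, leaves included
            List<String> hits =
                    childrenByParent.getOrDefault(id, Collections.<String>emptyList());
            if (!hits.isEmpty()) {
                clauses++; // only parents with children add a clause
                ids.addAll(hits);
            }
        }
    }
}
```

For a root with two children that each have two children, this performs seven lookups and produces three clauses. By the same logic, a subtree of a few thousand nodes would be consistent with the "over 2800 Lucene queries" reported above, since even leaf nodes cost one query each.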
[jira] Commented: (JCR-2835) Poor performance of ISDESCENDANTNODE on SQL 2 queries
[ https://issues.apache.org/jira/browse/JCR-2835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12972440#action_12972440 ]

Serge Huber commented on JCR-2835:
----------------------------------

Thanks Jukka, will you take care of the backport or should I do it?

Regards,
Serge...
[jira] Updated: (JCR-2835) Poor performance of ISDESCENDANTNODE on SQL 2 queries
[ https://issues.apache.org/jira/browse/JCR-2835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Serge Huber updated JCR-2835:
-----------------------------

Fix Version/s: 2.2.1
[jira] Commented: (JCR-2835) Poor performance of ISDESCENDANTNODE on SQL 2 queries
[ https://issues.apache.org/jira/browse/JCR-2835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12972444#action_12972444 ]

Serge Huber commented on JCR-2835:
----------------------------------

Ok, I have committed this in the trunk as revision 1050346.

Regards,
Serge Huber.
[jira] Updated: (JCR-2835) Poor performance of ISDESCENDANTNODE on SQL 2 queries
[ https://issues.apache.org/jira/browse/JCR-2835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Serge Huber updated JCR-2835:
-----------------------------

Attachment: SQL2DescendantSearchTest.png
            DescendantSearchTest.png

I have generated the performance graphs and indeed this patch looks really good!

Btw, I had a lot of trouble generating the graphs under Mac OS X. It took me a while to understand that I needed to install the following packages from fink:

    imagemagick2-svg
    gnuplot

I think we might want to add that to the README.txt in case others try to use Mac OS X to generate the graphs.

If nobody has any objections, I'd like to commit this patch, since the results are really much better.

Best regards,
Serge Huber.
[jira] Issue Comment Edited: (JCR-2835) Poor performance of ISDESCENDANTNODE on SQL 2 queries
[ https://issues.apache.org/jira/browse/JCR-2835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972379#action_12972379 ] Serge Huber edited comment on JCR-2835 at 12/17/10 2:50 AM: I have generated the performance graphs and indeed this patch looks really good ! Btw I had a lot of trouble generating the graphs under Mac OS X. It took me a while to understand that I needed to install the following packages from fink : imagemagick2-svg gnuplot I have added to the README.txt instructions on how to use the script under Mac OS X to generate the graphs. If nobody has any objections, I'd like to commit this patch since the results are really much better ? Best regards, Serge Huber. was (Author: bhillou): I have generated the performance graphs and indeed this patch looks really good ! Btw I had a lot of trouble generating the graphs under Mac OS X. It took me a while to understand that I needed to install the following packages from fink : imagemagick2-svg gnuplot I think we might want to add that to the README.txt if there are others trying to use Mac OS X to generate the graphs. If nobody has any objections, I'd like to commit this patch since the results are really much better ? Best regards, Serge Huber. Poor performance of ISDESCENDANTNODE on SQL 2 queries - Key: JCR-2835 URL: https://issues.apache.org/jira/browse/JCR-2835 Project: Jackrabbit Content Repository Issue Type: Improvement Components: jackrabbit-core, query Affects Versions: 2.2.0, 2.2.1, 2.3.0 Reporter: Serge Huber Fix For: 2.3.0 Attachments: DescendantSearchTest.png, JCR-2835-use-DescendantSelfAxisQuery.patch, JCR-2835_PerformanceTests.patch, JCR-2835_Poor_performance_on_ISDESCENDANTNODE_constraint_v1.patch, SQL2DescendantSearchTest.png Using the latest source code, I have noticed very bad performance on SQL-2 queries that use the ISDESCENDANTNODE constraint on a large sub-tree. 
For example, the query :

    select * from [jnt:news] as news where ISDESCENDANTNODE(news,'/root/site') order by news.[date] desc

executes in 600ms

    select * from [jnt:news] as news order by news.[date] desc

executes in 4ms

From looking at the problem in the Yourkit profiler, it seems that the culprit is the constraint building, that uses recursive Lucene searches to build the list of descendant node IDs :

    private Query getDescendantNodeQuery(
            DescendantNode dn, JackrabbitIndexSearcher searcher)
            throws RepositoryException, IOException {
        BooleanQuery query = new BooleanQuery();

        try {
            LinkedList<NodeId> ids = new LinkedList<NodeId>();
            NodeImpl ancestor = (NodeImpl) session.getNode(dn.getAncestorPath());
            ids.add(ancestor.getNodeId());
            while (!ids.isEmpty()) {
                String id = ids.removeFirst().toString();
                Query q = new JackrabbitTermQuery(new Term(FieldNames.PARENT, id));
                QueryHits hits = searcher.evaluate(q);
                ScoreNode sn = hits.nextScoreNode();
                if (sn != null) {
                    query.add(q, SHOULD);
                    do {
                        ids.add(sn.getNodeId());
                        sn = hits.nextScoreNode();
                    } while (sn != null);
                }
            }
        } catch (PathNotFoundException e) {
            query.add(new JackrabbitTermQuery(new Term(
                    FieldNames.UUID, "invalid-node-id")), // never matches
                    SHOULD);
        }

        return query;
    }

In the above example this generates over 2800 Lucene queries, which is the culprit. I wonder if it wouldn't be faster to retrieve the IDs by using the JCR to retrieve the list of child IDs ? This was probably also missed because I didn't seem to find any performance tests on this constraint.

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
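To make the cost of the getDescendantNodeQuery loop quoted in the issue description easier to see, here is a minimal, self-contained sketch. It is a toy model and not Jackrabbit code: the class PerNodeLookupModel, the in-memory map that stands in for the Lucene PARENT field, and the sample tree are all assumptions made up for illustration. It only shows that a queue-driven walk issuing one search per dequeued node needs as many searches as there are nodes in the sub-tree, which is consistent with the "over 2800 Lucene queries" reported for a large sub-tree.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: a map stands in for the Lucene PARENT field lookup.
final class PerNodeLookupModel {

    // Hypothetical stand-in for the index: parent id -> ids of its direct children.
    private final Map<String, List<String>> childrenByParent = new HashMap<>();
    private int lookups = 0;

    void addChild(String parentId, String childId) {
        childrenByParent.computeIfAbsent(parentId, k -> new ArrayList<>()).add(childId);
    }

    // Counts as one "index search", like one searcher.evaluate(q) call in the issue.
    private List<String> findChildren(String parentId) {
        lookups++;
        return childrenByParent.getOrDefault(parentId, Collections.emptyList());
    }

    // Same queue-driven walk as the quoted method: one lookup per dequeued node.
    List<String> collectDescendants(String ancestorId) {
        List<String> descendants = new ArrayList<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.add(ancestorId);
        while (!pending.isEmpty()) {
            String id = pending.removeFirst();
            for (String child : findChildren(id)) {
                descendants.add(child);
                pending.add(child);
            }
        }
        return descendants;
    }

    int lookupCount() {
        return lookups;
    }

    // Builds a small made-up tree: 1 ancestor, 3 children, 2 leaves per child.
    static PerNodeLookupModel sampleTree() {
        PerNodeLookupModel model = new PerNodeLookupModel();
        for (int c = 0; c < 3; c++) {
            String child = "child" + c;
            model.addChild("site", child);
            for (int g = 0; g < 2; g++) {
                model.addChild(child, child + "-leaf" + g);
            }
        }
        return model;
    }

    public static void main(String[] args) {
        PerNodeLookupModel model = sampleTree();
        int found = model.collectDescendants("site").size();
        // The sample tree has 10 nodes in total, so this prints "9 descendants, 10 lookups".
        System.out.println(found + " descendants, " + model.lookupCount() + " lookups");
    }
}
```

In this model the lookup count grows linearly with the size of the sub-tree, regardless of how many of those nodes actually match the selector, which is why the constraint gets slower as the sub-tree grows.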
Jackrabbit 2.3 performance test sub project
Hi guys, I have taken the liberty of committing a sub-project for Jackrabbit 2.3 because I needed it to validate a patch that we've been working on. I have also updated the README.txt file to add instructions on how to run the plot.sh script under Mac OS X. I haven't added the compatibility sub-project yet though, as I didn't need it immediately and I saw that there isn't one for 2.2, is that normal ? Best regards, Serge Huber.
[jira] Commented: (JCR-2835) Poor performance of ISDESCENDANTNODE on SQL 2 queries
[ https://issues.apache.org/jira/browse/JCR-2835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12970715#action_12970715 ] Serge Huber commented on JCR-2835: -- Sorry about that Jukka, my bad. Didn't know this could cause Hudson to fail. Btw unfortunately I didn't have the time to test your proposal. I was working on comparing the Lucene queries between the XPath and SQL-2 tests, and saw that the DescendantChildNodeQuery is being used in the case of XPath but not in the case of SQL-2. I'm not (yet) an expert at Lucene, but maybe that's a place to start ? I also notice that the SimpleQueryResult does not support result fetch size as the other SingleColumnQueryResult and MultipleColumnQueryResult do. I realize this is because of the join merging, but maybe we should look at being able to do progressive merging alongside with merges in order to reduce the number of results being loaded systematically. Again I haven't thought this through completely and maybe there is some limitation on doing so. These query problems are difficult because we are basically rewriting a full-fledged SQL optimizer, and maybe we should look at how databases perform these ? Regards, Serge Huber. Poor performance of ISDESCENDANTNODE on SQL 2 queries - Key: JCR-2835 URL: https://issues.apache.org/jira/browse/JCR-2835 Project: Jackrabbit Content Repository Issue Type: Improvement Components: jackrabbit-core, query Affects Versions: 2.2.0, 2.2.1, 2.3.0 Reporter: Serge Huber Fix For: 2.3.0 Attachments: JCR-2835_PerformanceTests.patch, JCR-2835_Poor_performance_on_ISDESCENDANTNODE_constraint_v1.patch
[jira] Commented: (JCR-2835) Poor performance of ISDESCENDANTNODE on SQL 2 queries
[ https://issues.apache.org/jira/browse/JCR-2835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12970100#action_12970100 ] Serge Huber commented on JCR-2835: -- Ok I have committed the perf tests in revision 1043897. I will be working on trying out your suggestion today. Best regards, Serge Huber. Poor performance of ISDESCENDANTNODE on SQL 2 queries - Key: JCR-2835 URL: https://issues.apache.org/jira/browse/JCR-2835 Project: Jackrabbit Content Repository Issue Type: Improvement Components: jackrabbit-core, query Affects Versions: 2.2.0, 2.2.1, 2.3.0 Reporter: Serge Huber Fix For: 2.3.0 Attachments: JCR-2835_PerformanceTests.patch, JCR-2835_Poor_performance_on_ISDESCENDANTNODE_constraint_v1.patch
[jira] Issue Comment Edited: (JCR-2835) Poor performance of ISDESCENDANTNODE on SQL 2 queries
[ https://issues.apache.org/jira/browse/JCR-2835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12970100#action_12970100 ] Serge Huber edited comment on JCR-2835 at 12/10/10 3:33 AM: Ok I have committed the perf tests in revision 1044239. I will be working on trying out your suggestion today. Best regards, Serge Huber. was (Author: bhillou): Ok I have committed the perf tests in revision 1043897. I will be working on trying out your suggestion today. Best regards, Serge Huber. Poor performance of ISDESCENDANTNODE on SQL 2 queries - Key: JCR-2835 URL: https://issues.apache.org/jira/browse/JCR-2835 Project: Jackrabbit Content Repository Issue Type: Improvement Components: jackrabbit-core, query Affects Versions: 2.2.0, 2.2.1, 2.3.0 Reporter: Serge Huber Fix For: 2.3.0 Attachments: JCR-2835_PerformanceTests.patch, JCR-2835_Poor_performance_on_ISDESCENDANTNODE_constraint_v1.patch
Re: [VOTE] Release Apache Jackrabbit 2.2.0
Hi Jukka,

Didn't receive it either, but I'm +1.

Regards,
Serge Huber.

On 10 Dec 2010, at 14:41, Jukka Zitting wrote:

Hi,

On 08/12/10 22:25, Jukka Zitting wrote:
Please vote on releasing this package as Apache Jackrabbit 2.2.0.

I just found that at least Michael Dürig never received this vote (we had some problems with Apache mails reaching Adobe mailboxes earlier), so I'm resending. See below.

BR,
Jukka Zitting

Hi,

A candidate for the Jackrabbit 2.2.0 release is available at:
http://people.apache.org/~jukka/jackrabbit/2.2.0/

The release candidate is a zip archive of the sources in:
http://svn.apache.org/repos/asf/jackrabbit/tags/2.2.0/

The SHA1 checksum of the archive is c7736f13454ca69a2a024182938fbd3f417e7e3d.

A staged Maven repository is available for review at:
https://repository.apache.org/content/repositories/orgapachejackrabbit-005/

A report of the latest performance figures is available at:
http://people.apache.org/~jukka/jackrabbit/2.2-20101208/report.html

Please vote on releasing this package as Apache Jackrabbit 2.2.0. The vote is open for the next 72 hours and passes if a majority of at least three +1 Jackrabbit PMC votes are cast.

[ ] +1 Release this package as Apache Jackrabbit 2.2.0
[ ] -1 Do not release this package because...

Here's my +1.

BR,
Jukka Zitting
[jira] Assigned: (JCR-2835) Poor performance of ISDESCENDANTNODE on SQL 2 queries
[ https://issues.apache.org/jira/browse/JCR-2835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Serge Huber reassigned JCR-2835: Assignee: Serge Huber Poor performance of ISDESCENDANTNODE on SQL 2 queries - Key: JCR-2835 URL: https://issues.apache.org/jira/browse/JCR-2835 Project: Jackrabbit Content Repository Issue Type: Improvement Components: jackrabbit-core, query Affects Versions: 2.2.0 Reporter: Serge Huber Assignee: Serge Huber Fix For: 2.3.0 Attachments: JCR-2835_Poor_performance_on_ISDESCENDANTNODE_constraint_v1.patch
[jira] Updated: (JCR-2835) Poor performance of ISDESCENDANTNODE on SQL 2 queries
[ https://issues.apache.org/jira/browse/JCR-2835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Serge Huber updated JCR-2835: - Affects Version/s: 2.3.0 2.2.1 Status: Patch Available (was: Open) I am attaching a first pass at the descendant search tests. These tests were performed on the trunk WITHOUT the proposed patch. I will work on implementing Jukka's proposal now that I have the tests. Please review the XPath one as I am not that fluent in those queries. The current difference is huge (provided my tests are correct) :

XPath :
# DescendantSearchTest           min     10%     50%     90%     max
2.2                               25      34      43      59     265

SQL-2 :
# SQL2DescendantSearchTest       min     10%     50%     90%     max
2.2                           395318  395318  395318  395318  395318

If the test implementations look ok, I can commit them once reviewed. Best regards, Serge Huber. Poor performance of ISDESCENDANTNODE on SQL 2 queries - Key: JCR-2835 URL: https://issues.apache.org/jira/browse/JCR-2835 Project: Jackrabbit Content Repository Issue Type: Improvement Components: jackrabbit-core, query Affects Versions: 2.2.0, 2.2.1, 2.3.0 Reporter: Serge Huber Assignee: Serge Huber Fix For: 2.3.0 Attachments: JCR-2835_Poor_performance_on_ISDESCENDANTNODE_constraint_v1.patch
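For readers not used to the report format, the min / 10% / 50% / 90% / max columns in the descendant search results are summary statistics over the measured iterations. The sketch below shows one way such a row could be derived. It assumes a nearest-rank percentile definition, and the class TimingSummary and the sample values are made up for illustration; the actual performance test harness may compute its percentiles differently.

```java
import java.util.Arrays;

// Illustrative only: summarises timing samples into "min 10% 50% 90% max".
final class TimingSummary {

    // Nearest-rank percentile over a sorted copy of the samples; p is in (0, 100].
    static long percentile(long[] samples, double p) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    // Formats one report row in the same column order as the tables in this thread.
    static String summarize(long[] samples) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        return sorted[0] + " " + percentile(samples, 10) + " " + percentile(samples, 50)
                + " " + percentile(samples, 90) + " " + sorted[sorted.length - 1];
    }

    public static void main(String[] args) {
        // Made-up samples, not measurements from the issue.
        System.out.println(summarize(new long[] {12, 15, 11, 40, 13}));
        // With a single sample every column collapses to the same value.
        System.out.println(summarize(new long[] {395318}));
    }
}
```

Under this definition a row whose five columns are all identical, like the SQL-2 row, is what a single measured iteration would produce, although the thread itself does not say how many iterations were run.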
[jira] Updated: (JCR-2835) Poor performance of ISDESCENDANTNODE on SQL 2 queries
[ https://issues.apache.org/jira/browse/JCR-2835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Serge Huber updated JCR-2835: - Attachment: JCR-2835_PerformanceTests.patch Poor performance of ISDESCENDANTNODE on SQL 2 queries - Key: JCR-2835 URL: https://issues.apache.org/jira/browse/JCR-2835 Project: Jackrabbit Content Repository Issue Type: Improvement Components: jackrabbit-core, query Affects Versions: 2.2.0, 2.2.1, 2.3.0 Reporter: Serge Huber Assignee: Serge Huber Fix For: 2.3.0 Attachments: JCR-2835_PerformanceTests.patch, JCR-2835_Poor_performance_on_ISDESCENDANTNODE_constraint_v1.patch
[jira] Assigned: (JCR-2835) Poor performance of ISDESCENDANTNODE on SQL 2 queries
[ https://issues.apache.org/jira/browse/JCR-2835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Serge Huber reassigned JCR-2835: Assignee: (was: Serge Huber) Poor performance of ISDESCENDANTNODE on SQL 2 queries - Key: JCR-2835 URL: https://issues.apache.org/jira/browse/JCR-2835 Project: Jackrabbit Content Repository Issue Type: Improvement Components: jackrabbit-core, query Affects Versions: 2.2.0, 2.2.1, 2.3.0 Reporter: Serge Huber Fix For: 2.3.0 Attachments: JCR-2835_PerformanceTests.patch, JCR-2835_Poor_performance_on_ISDESCENDANTNODE_constraint_v1.patch
[jira] Commented: (JCR-2835) Poor performance of ISDESCENDANTNODE on SQL 2 queries
[ https://issues.apache.org/jira/browse/JCR-2835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12969838#action_12969838 ] Serge Huber commented on JCR-2835: -- I just tested with the patch I proposed here, the results are slightly better, but still very far from the XPath implementation :

# SQL2DescendantSearchTest       min     10%     50%     90%     max
2.2                           224662  224662  224662  224662  224662

Best regards, Serge Huber. Poor performance of ISDESCENDANTNODE on SQL 2 queries - Key: JCR-2835 URL: https://issues.apache.org/jira/browse/JCR-2835 Project: Jackrabbit Content Repository Issue Type: Improvement Components: jackrabbit-core, query Affects Versions: 2.2.0, 2.2.1, 2.3.0 Reporter: Serge Huber Fix For: 2.3.0 Attachments: JCR-2835_PerformanceTests.patch, JCR-2835_Poor_performance_on_ISDESCENDANTNODE_constraint_v1.patch
[jira] Updated: (JCR-2835) Poor performance of ISDESCENDANTNODE on SQL 2 queries
[ https://issues.apache.org/jira/browse/JCR-2835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Serge Huber updated JCR-2835: - Fix Version/s: 2.3.0 2.2.0 Added fix version, please correct if needed. Poor performance of ISDESCENDANTNODE on SQL 2 queries - Key: JCR-2835 URL: https://issues.apache.org/jira/browse/JCR-2835 Project: Jackrabbit Content Repository Issue Type: Bug Affects Versions: 2.2.0 Reporter: Serge Huber Fix For: 2.2.0, 2.3.0
[jira] Updated: (JCR-2835) Poor performance of ISDESCENDANTNODE on SQL 2 queries
[ https://issues.apache.org/jira/browse/JCR-2835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Serge Huber updated JCR-2835: - Attachment: JCR-2835_Poor_performance_on_ISDESCENDANTNODE_constraint_v1.patch I am attaching a patch to the LuceneQueryFactory that replaces the recursive Lucene queries with JCR sub-tree traversing. This seems to yield a little bit better performance (x3) in my tests, but this is still slow if the sub-tree has a lot of nodes. I welcome any feedback you may have. I am also ready to commit this if you'd like. Best regards, Serge Huber. Poor performance of ISDESCENDANTNODE on SQL 2 queries - Key: JCR-2835 URL: https://issues.apache.org/jira/browse/JCR-2835 Project: Jackrabbit Content Repository Issue Type: Bug Affects Versions: 2.2.0 Reporter: Serge Huber Fix For: 2.2.0, 2.3.0 Attachments: JCR-2835_Poor_performance_on_ISDESCENDANTNODE_constraint_v1.patch
Re: Jackrabbit 2.2.0 release plan
Hi Marcel,

I know I could use XPath :) I was just trying to help out on the performance issue :) From looking at the code, it really looks like there was a lot of effort put into optimizing SQL-1 and XPath, but that we are only at the beginning of the optimizations for SQL-2.

Regards,
Serge Huber.

On 8 Dec 2010, at 09:49, Marcel Reutegger wrote:

Hi serge,

you could use XPath instead:

root/site//element(*, jnt:news) order by @date descending

regards
marcel

From: Serge Huber [mailto:shub...@jahia.com]
Sent: 08 December 2010 08:51
To: dev@jackrabbit.apache.org
Subject: Re: Jackrabbit 2.2.0 release plan

Hello Jukka,

I have noticed a performance issue on the ISDESCENDANTNODE constraint which I have reported here : https://issues.apache.org/jira/browse/JCR-2835

I am currently investigating this, but apart from the alternative I proposed in the ticket, I don't see another way of doing this apart from indexing the path at node creation/moving time. Do you have any ideas ?

Best regards,
Serge Huber.

On 7 Dec 2010, at 13:54, Jukka Zitting wrote:

Hi,

On 30/11/10 10:33, Jukka Zitting wrote:
Looks like we're on good track for cutting the release candidate next week, as there's only a bit of tweaking and some minor updates left to be done.

There are still some clustering tests and other minor things pending, so I'll postpone cutting the release candidate to tomorrow. That'll also give us a chance to do a last performance test run tonight before the release.

BR,
Jukka Zitting
[jira] Commented: (JCR-2835) Poor performance of ISDESCENDANTNODE on SQL 2 queries
[ https://issues.apache.org/jira/browse/JCR-2835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12969649#action_12969649 ]

Serge Huber commented on JCR-2835:
----------------------------------

Hello Jukka,

Thanks again for your quick answer. Yes, agreed, we should provide a test case. Where should this be included? I might have the opportunity to help develop this, but I don't really know the best place to add such a case.

Interesting approach for the levels. This would indeed reduce the number of queries, although the clauses could get quite large; I am not sure if that's an issue for Lucene.

Best regards,
Serge Huber.
[jira] Created: (JCR-2835) Poor performance of ISDESCENDANTNODE on SQL 2 queries
Poor performance of ISDESCENDANTNODE on SQL 2 queries
-----------------------------------------------------

Key: JCR-2835
URL: https://issues.apache.org/jira/browse/JCR-2835
Project: Jackrabbit Content Repository
Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Serge Huber

Using the latest source code, I have noticed very bad performance on SQL-2 queries that use the ISDESCENDANTNODE constraint on a large sub-tree. For example, the query:

    select * from [jnt:news] as news where ISDESCENDANTNODE(news,'/root/site') order by news.[date] desc

executes in 600ms, whereas

    select * from [jnt:news] as news order by news.[date] desc

executes in 4ms.

From looking at the problem in the YourKit profiler, it seems that the culprit is the constraint building, which uses recursive Lucene searches to build the list of descendant node IDs:

    private Query getDescendantNodeQuery(
            DescendantNode dn, JackrabbitIndexSearcher searcher)
            throws RepositoryException, IOException {
        BooleanQuery query = new BooleanQuery();
        try {
            LinkedList<NodeId> ids = new LinkedList<NodeId>();
            NodeImpl ancestor = (NodeImpl) session.getNode(dn.getAncestorPath());
            ids.add(ancestor.getNodeId());
            while (!ids.isEmpty()) {
                String id = ids.removeFirst().toString();
                Query q = new JackrabbitTermQuery(new Term(FieldNames.PARENT, id));
                QueryHits hits = searcher.evaluate(q);
                ScoreNode sn = hits.nextScoreNode();
                if (sn != null) {
                    query.add(q, SHOULD);
                    do {
                        ids.add(sn.getNodeId());
                        sn = hits.nextScoreNode();
                    } while (sn != null);
                }
            }
        } catch (PathNotFoundException e) {
            query.add(new JackrabbitTermQuery(new Term(
                    FieldNames.UUID, "invalid-node-id")), // never matches
                    SHOULD);
        }
        return query;
    }

In the above example this generates over 2800 Lucene queries, which is the culprit. I wonder if it wouldn't be faster to retrieve the list of child IDs through the JCR instead?

This was probably also missed because I could not find any performance tests on this constraint.
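The lookup pattern described in this issue, and the per-level alternative raised in the comments on the ticket, can be compared with a small self-contained sketch. It does not use Jackrabbit or Lucene: the in-memory `childrenByParent` map, the `DescendantLookupSketch` class and its method names are all hypothetical, and each counted lookup merely stands in for one Lucene query. It only illustrates how the number of queries scales with the tree shape, not how the issue was eventually fixed.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;

final class DescendantLookupSketch {

    /** Descendant IDs found by a traversal, plus the number of lookups it issued. */
    static final class Result {
        final List<String> descendants = new ArrayList<>();
        int lookups;
    }

    /** One lookup per visited node, mirroring the while loop quoted in the issue. */
    static Result perNode(Map<String, List<String>> childrenByParent, String ancestor) {
        Result result = new Result();
        Deque<String> pending = new ArrayDeque<>();
        pending.add(ancestor);
        while (!pending.isEmpty()) {
            String id = pending.removeFirst();
            result.lookups++; // stands in for one searcher.evaluate(q) call
            for (String child : childrenByParent.getOrDefault(id, List.of())) {
                result.descendants.add(child);
                pending.add(child);
            }
        }
        return result;
    }

    /** One batched lookup per tree level (assumed variant: one query covering all parents of a level). */
    static Result perLevel(Map<String, List<String>> childrenByParent, String ancestor) {
        Result result = new Result();
        List<String> level = List.of(ancestor);
        while (!level.isEmpty()) {
            result.lookups++; // one query whose clauses cover every parent on this level
            List<String> next = new ArrayList<>();
            for (String id : level) {
                next.addAll(childrenByParent.getOrDefault(id, List.of()));
            }
            result.descendants.addAll(next);
            level = next;
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical tree: site -> {news, events}, news -> {n1, n2, n3}, events -> {e1}
        Map<String, List<String>> tree = Map.of(
                "site", List.of("news", "events"),
                "news", List.of("n1", "n2", "n3"),
                "events", List.of("e1"));
        Result a = perNode(tree, "site");
        Result b = perLevel(tree, "site");
        System.out.println(a.descendants.size() + " descendants, " + a.lookups + " lookups (per node)");
        System.out.println(b.descendants.size() + " descendants, " + b.lookups + " lookups (per level)");
    }
}
```

On this toy tree the per-node variant issues 7 lookups and the per-level variant 3 for the same 6 descendants. The trade-off mentioned in the comments still applies: fewer queries, but each per-level query carries one clause per parent on that level.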
Re: Jackrabbit 2.2.0 release plan
Hello Jukka,

I have noticed a performance issue on the ISDESCENDANTNODE constraint which I have reported here: https://issues.apache.org/jira/browse/JCR-2835

I am currently investigating this, but apart from the alternative I proposed in the ticket, I don't see another way of doing this other than indexing the path at node creation/moving time. Do you have any ideas?

Best regards,
Serge Huber.

On 7 Dec 2010, at 13:54, Jukka Zitting wrote:

Hi,

On 30/11/10 10:33, Jukka Zitting wrote:
Looks like we're on good track for cutting the release candidate next week, as there's only a bit of tweaking and some minor updates left to be done.

There are still some clustering tests and other minor things pending, so I'll postpone cutting the release candidate to tomorrow. That'll also give us a chance to do a last performance test run tonight before the release.

BR,
Jukka Zitting
[jira] Commented: (JCR-2715) Improved join query performance
[ https://issues.apache.org/jira/browse/JCR-2715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12966965#action_12966965 ]

Serge Huber commented on JCR-2715:
----------------------------------

Thanks Clemens for the JRat data. From looking at it, it seems that the sorting of the results is taking up the most time, so I wonder how the sorting is done in the case of the SQL-1 implementation?

Jukka, is there a reason why we load all the data in the SQL-2/QOM implementations at query execution time?

Best regards,
Serge...

Improved join query performance
-------------------------------

Key: JCR-2715
URL: https://issues.apache.org/jira/browse/JCR-2715
Project: Jackrabbit Content Repository
Issue Type: Improvement
Components: jackrabbit-core, query
Reporter: Jukka Zitting
Assignee: Jukka Zitting
Fix For: 2.2.0
Attachments: 2010-11-30_PM-04-55-23.zip, SQL2SearchTest.png, ThreeWayJoinTest.png, TwoWayJoinTest.png

Our current implementation of SQL2 join queries does not perform very well on pretty much any non-trivial data set.
[jira] Commented: (JCR-2715) Improved join query performance
[ https://issues.apache.org/jira/browse/JCR-2715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12967111#action_12967111 ]

Serge Huber commented on JCR-2715:
----------------------------------

Hello Jukka,

Thanks for clarifying this. But it seems to me that a lot of developers will generate such queries. I'll continue on the other ticket anyway, so that we can close this one.

Best regards,
Serge Huber.
[jira] Commented: (JCR-2830) JCR-SQL2 : Query on large node-set is (too) slow even when offset and limit is used
[ https://issues.apache.org/jira/browse/JCR-2830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12967112#action_12967112 ]

Serge Huber commented on JCR-2830:
----------------------------------

Does anyone know how SQL-1 does it? It seems from the code that it lazy loads the results, but how can it do so if sorting is needed?

Best regards,
Serge Huber.

JCR-SQL2 : Query on large node-set is (too) slow even when offset and limit is used
-----------------------------------------------------------------------------------

Key: JCR-2830
URL: https://issues.apache.org/jira/browse/JCR-2830
Project: Jackrabbit Content Repository
Issue Type: Improvement
Affects Versions: 2.3.0
Environment:
+ Win7 (64bit)
+ JR built from latest greatest sources
+ repo with many nodes of same node type (77'000)
Reporter: Clemens Wyss

Given a node-set of approx. 77'000 entries, a SQL2 query limited to 10 nodes takes approx. 37 to 59 sec (!), whereas the corresponding SQL query returns in less than 1 sec.

    Query q = session.getWorkspace().getQueryManager().createQuery(
            "select * from [task]", Query.JCR_SQL2 );
    q.setOffset( 0 ); // or any other offset
    q.setLimit( 10 );
    returnValue = q.execute();
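On the question of how results can be loaded lazily when sorting is required, one general technique (offered here as an illustration only, not as a description of what the SQL-1 code path actually does) is to sort lightweight hits that carry just an identifier and the sort value, and to load full nodes only for the requested page. The `LazyPageSketch` class, the `Hit` type and the `loader` callback below are hypothetical names invented for this sketch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Function;

final class LazyPageSketch {

    /** Lightweight hit: an identifier plus the value used for ordering. */
    static final class Hit {
        final String id;
        final long sortKey;

        Hit(String id, long sortKey) {
            this.id = id;
            this.sortKey = sortKey;
        }
    }

    /** Sorts every hit by key (descending) but invokes the loader only for the requested page. */
    static <T> List<T> page(List<Hit> hits, int offset, int limit, Function<String, T> loader) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingLong((Hit h) -> h.sortKey).reversed());
        List<T> page = new ArrayList<>();
        int end = Math.min(sorted.size(), offset + limit);
        for (int i = offset; i < end; i++) {
            page.add(loader.apply(sorted.get(i).id)); // the expensive step, done at most `limit` times
        }
        return page;
    }

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>();
        for (int i = 0; i < 77_000; i++) {
            hits.add(new Hit("task-" + i, i));
        }
        int[] loads = {0};
        List<String> firstPage = page(hits, 0, 10, id -> {
            loads[0]++;
            return "node:" + id;
        });
        System.out.println(firstPage.size() + " results, " + loads[0] + " node loads");
    }
}
```

With 77'000 hits and a limit of 10, the sort still touches every hit, but only 10 nodes are loaded. Whether the sort keys can be obtained without loading the nodes depends on what the index stores, which this sketch simply assumes.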
Re: Jackrabbit 2.2.0 release plan
Hello Jukka,

Thanks for the graphs. Indeed, performance looks a lot better on 2.2 than on 2.1 or 2.0, but in some cases we're still considerably slower than previous versions. The login/logout test in particular is performing poorly; do you have any idea why?

Regards,
Serge Huber.

On 30 Nov 2010, at 10:33, Jukka Zitting wrote:

Hi,

On 24/11/10 10:49, Jukka Zitting wrote:
On 10/11/10 00:13, Jukka Zitting wrote:
After that we'll allow two weeks for testing and fine-tuning in the branch. Unless any major issues come up, I will then cut the 2.2.0 release candidate on Tuesday, Nov 30th. If all goes well, the release will be out by the end of that week, at the beginning of December.

Let's push that date also ahead, to Tuesday, Dec 7th.

Looks like we're on good track for cutting the release candidate next week, as there's only a bit of tweaking and some minor updates left to be done.

To avoid any unexpected performance regressions, I ran the performance test suite last night on the latest 2.2 branch after syncing it with recent changes in trunk. The report looks pretty good, and can be seen at:

http://people.apache.org/~jukka/jackrabbit/2.2-20101129/report.html

BR,
Jukka Zitting
[jira] Commented: (JCR-2715) Improved join query performance
[ https://issues.apache.org/jira/browse/JCR-2715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12965187#action_12965187 ]

Serge Huber commented on JCR-2715:
----------------------------------

Great improvements, thanks a lot. Did you also test simple (non-join) SQL-2 performance, as you also improved this part? I wanted to run these tests, but as I am traveling I probably won't be able to do so until next week.

Best Regards,
Serge Huber
[jira] Commented: (JCR-2715) Improved join query performance
[ https://issues.apache.org/jira/browse/JCR-2715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12965201#action_12965201 ]

Serge Huber commented on JCR-2715:
----------------------------------

Very impressive indeed! Thanks for answering so fast :)

Best Regards,
Serge Huber
[jira] Commented: (JCR-2715) Improved join query performance
[ https://issues.apache.org/jira/browse/JCR-2715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12965256#action_12965256 ]

Serge Huber commented on JCR-2715:
----------------------------------

Hello Clemens,

Thank you for the feedback. Could you provide more information about your tests? Maybe there is something that can be put into a unit test so that the performance analysis is easier to reproduce? Or at the minimum the CND?

Best Regards,
Serge Huber
[jira] Commented: (JCR-2715) Improved join query performance
[ https://issues.apache.org/jira/browse/JCR-2715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12965259#action_12965259 ]

Serge Huber commented on JCR-2715:
----------------------------------

Ok thanks. It is plausible that the number of results is affecting performance, because from previous traces I saw that some constraints seemed to be evaluated on the results. But I didn't get around to testing the latest code yet. Would it be possible for you to capture some snapshots with YourKit or an equivalent?

Best regards,
Serge Huber
[jira] Assigned: (JCR-2793) Typo in NodeTypeRegistry
[ https://issues.apache.org/jira/browse/JCR-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Serge Huber reassigned JCR-2793:
-------------------------------

Assignee: Serge Huber

Typo in NodeTypeRegistry
------------------------

Key: JCR-2793
URL: https://issues.apache.org/jira/browse/JCR-2793
Project: Jackrabbit Content Repository
Issue Type: Bug
Components: jackrabbit-core
Affects Versions: 2.2.0
Reporter: Serge Huber
Assignee: Serge Huber
Fix For: 2.2.0
Attachments: Fix_typo_on_custom_node_types_constant.patch
Original Estimate: 0.17h
Remaining Estimate: 0.17h

It seems a little typo has been introduced in the NodeTypeRegistry, as illustrated in this stack trace:

Caused by: javax.jcr.RepositoryException: internal error: invalid resource: nodetypes/custom_nodetypes.xml
    at org.apache.jackrabbit.core.nodetype.NodeTypeRegistry.<init>(NodeTypeRegistry.java:703) ~[jackrabbit-core-2.2-SNAPSHOT.jar:2.2-SNAPSHOT]
    at org.apache.jackrabbit.core.RepositoryImpl.createNodeTypeRegistry(RepositoryImpl.java:422) ~[jackrabbit-core-2.2-SNAPSHOT.jar:na]
    at org.apache.jackrabbit.core.RepositoryImpl.<init>(RepositoryImpl.java:294) ~[jackrabbit-core-2.2-SNAPSHOT.jar:na]

This happens when using a DbFileSystem for the root filesystem. This didn't cause a problem in 2.1.1.

The patch attached to this ticket corrects the issue.
[jira] Commented: (JCR-2793) Typo in NodeTypeRegistry
[ https://issues.apache.org/jira/browse/JCR-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12964654#action_12964654 ]

Serge Huber commented on JCR-2793:
----------------------------------

Ok, I have committed this in revision 1040033. This was committed in the trunk; I'm assuming you will merge it to the 2.2 branch?
[jira] Commented: (JCR-2715) Improved join query performance
[ https://issues.apache.org/jira/browse/JCR-2715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12935379#action_12935379 ]

Serge Huber commented on JCR-2715:
----------------------------------

Great! Thanks a lot. I'll be away in the next few days but I'll test it as soon as possible.
Re: AW: jcr-sql2 queries with or without where-clause :-(
Hello Clemens,

Yes, that is similar to what I am seeing here. I have commented on the ticket https://issues.apache.org/jira/browse/JCR-2715 that we don't seem to be going through Jukka's new optimized code for simple queries that don't use joins, which is a shame.

What happens if you use SQL-1 instead of SQL-2 for your queries (just as a basis of comparison)?

Best regards,
Serge Huber.

On 18 Nov 2010, at 18:10, Clemens Wyss wrote:

addendum:

select * from [task] where @employeeinchargeid = 3

returns after 90s to 120s

-----Original Message-----
From: Clemens Wyss [mailto:clemens...@mysign.ch]
Sent: Thursday, 18 November 2010 17:56
To: dev@jackrabbit.apache.org
Subject: jcr-sql2 queries with or without where-clause :-(

I know that jcr-sql2 is in progress. Still, I would like to know if my observations are at all possible. I have approx. 76'000 [task] nodes in my repo.

select * from [task]
takes approx. 870ms :-)

select * from [task] order by [jcr:score]
takes between 2s and 5s (still somehow acceptable)

select * from [task] where employeeinchargeid = 3
seems to be never-ending... I stopped my test after 10 minutes!

employeeinchargeid is a simple int property. The XPath equivalent only takes 311ms, but XPath is deprecated.

Regards
Clemens
[jira] Commented: (JCR-2793) Typo in NodeTypeRegistry
[ https://issues.apache.org/jira/browse/JCR-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12932506#action_12932506 ]

Serge Huber commented on JCR-2793:
----------------------------------

So who should do this? Can this be included in 2.2.0? I'm a bit rusty in my Apache committer skills but I'd be willing to do it :)

Regards,
Serge...
[jira] Commented: (JCR-2715) Improved join query performance
[ https://issues.apache.org/jira/browse/JCR-2715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12923846#action_12923846 ]

Serge Huber commented on JCR-2715:
----------------------------------

Hi Jukka,

I just tested the latest commits, and it's looking quite good. I only saw 2 tests that don't seem to work yet, but I'm assuming you're already aware of this?

EquiJoinConditionTest (org.apache.jackrabbit.test.api.query.qom.EquiJoinConditionTest)

    testInnerJoin1         0.692
    testInnerJoin2         0.608
    testRightOuterJoin1    0.138    |/testroot/node1| is not part of the result set
    testRightOuterJoin2    0.09
    testLeftOuterJoin1     0.103
    testLeftOuterJoin2     0.071    /testroot/node1|| is not part of the result set

Best regards,
Serge Huber.
[jira] Commented: (JCR-2793) Typo in NodeTypeRegistry
[ https://issues.apache.org/jira/browse/JCR-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12923414#action_12923414 ]

Serge Huber commented on JCR-2793:
----------------------------------

Thanks for your answer Jukka,

Actually, in my mail on the list ( http://markmail.org/thread/l3rrvrasnweb4bsn ), I mentioned that it might have been caused by the removal of the BasedFileSystem around the NodeTypeRegistry instance, which was removed for some reason in the RepositoryImpl.

So the old code in 2.1.1 looked like this:

RepositoryImpl constructor:

    ...
    // create registries
    nsReg = createNamespaceRegistry(new BasedFileSystem(repStore, "/namespaces"));
    ntReg = createNodeTypeRegistry(nsReg, new BasedFileSystem(repStore, "/nodetypes"));
    ...

Possibly because we moved this into the constructor, for example of the NamespaceRegistryImpl, we now have the following:

RepositoryImpl:

    // create registries
    context.setNamespaceRegistry(createNamespaceRegistry());
    context.setNodeTypeRegistry(createNodeTypeRegistry());
    ...
    protected NamespaceRegistryImpl createNamespaceRegistry() throws RepositoryException {
        return new NamespaceRegistryImpl(context.getFileSystem());
    }
    ...
    protected NodeTypeRegistry createNodeTypeRegistry() throws RepositoryException {
        return new NodeTypeRegistry(
                context.getNamespaceRegistry(), context.getFileSystem());
    }

In NamespaceRegistryImpl:

    public NamespaceRegistryImpl(FileSystem fs) throws RepositoryException {
        this.nsRegStore = new BasedFileSystem(fs, "/namespaces");
        load();
    }

And for NodeTypeRegistry:

    private static final String CUSTOM_NODETYPES_RESOURCE_NAME = "nodetypes/custom_nodetypes.xml";

    @SuppressWarnings("unchecked")
    public NodeTypeRegistry(NamespaceRegistry nsReg, FileSystem fs) throws RepositoryException {
        this.nsReg = nsReg;
        customNodeTypesResource = new FileSystemResource(fs, CUSTOM_NODETYPES_RESOURCE_NAME);

So in comparing the two I assumed that if we removed the BasedFileSystem for a reason, we might need the / back at the beginning of the constant. Because the code right after that in the NodeTypeRegistry is:

    try {
        // make sure path to resource exists
        if (!customNodeTypesResource.exists()) {
            customNodeTypesResource.makeParentDirs();
        }
    } catch (FileSystemException fse) {
        String error = "internal error: invalid resource: " + customNodeTypesResource.getPath();
        log.debug(error);
        throw new RepositoryException(error, fse);
    }

which fails because of the exists() call, which calls the following in DatabaseFileSystem:

    /**
     * {@inheritDoc}
     */
    public boolean exists(String path) throws FileSystemException {
        if (!initialized) {
            throw new IllegalStateException("not initialized");
        }
        FileSystemPathUtil.checkFormat(path);
        String parentDir = FileSystemPathUtil.getParentDir(path);
        String name = FileSystemPathUtil.getName(path);

The failure is in the checkFormat call, which checks whether the path starts with a /. This code hasn't changed since 2.1.1, so I was assuming it wasn't the source of the problem, but maybe it needs to change?

Again, I am not very familiar with the codebase yet, so I didn't know what the best solution was; I was just basing my analysis on what had been modified.
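The failure mode discussed in this comment can be reproduced in isolation with a minimal sketch. It does not use any Jackrabbit class: `BasedPathSketch`, its simplified `checkFormat` and the `resolve` helper are hypothetical stand-ins, and the real check throws a FileSystemException rather than the IllegalArgumentException used here. The sketch only shows why a resource name without a leading / passes when something resolves it against a base directory, and fails when handed directly to a file system that insists on absolute paths.

```java
final class BasedPathSketch {

    /** Simplified stand-in for an absolute-path check: rejects anything not starting with "/". */
    static void checkFormat(String path) {
        if (path == null || !path.startsWith("/")) {
            throw new IllegalArgumentException("not an absolute path: " + path);
        }
    }

    /** Hypothetical helper: resolves a relative resource name against an absolute base directory. */
    static String resolve(String base, String relative) {
        checkFormat(base);
        String resolved = base.endsWith("/") ? base + relative : base + "/" + relative;
        checkFormat(resolved);
        return resolved;
    }

    public static void main(String[] args) {
        String name = "nodetypes/custom_nodetypes.xml";
        try {
            checkFormat(name); // relative name handed straight to the check
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
        // Same resource, once resolved against the root and once against a /nodetypes base
        System.out.println(resolve("/", name));
        System.out.println(resolve("/nodetypes", "custom_nodetypes.xml"));
    }
}
```

Both resolved forms yield /nodetypes/custom_nodetypes.xml, which matches the two options raised above: prefixing the constant with a /, or wrapping the file system in a base directory again.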
[jira] Commented: (JCR-2715) Improved join query performance
[ https://issues.apache.org/jira/browse/JCR-2715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12922872#action_12922872 ]

Serge Huber commented on JCR-2715:
----------------------------------

Thanks for all the work! I've tested the QueryEngine and results do indeed seem to be faster, although I ran into an issue with reading results twice that may be due to my own code.

Also, concerning limits, do they currently work with sorting? If I limit to 100 results, will I get the first 100 results of the sorted set? Or is sorting done after limiting?

Let me know how I can help.

Best regards,
Serge Huber.
Update on 2.2.0 release plan ?
Hello Jukka,

Last September [0], you mentioned establishing a release plan for the 2.2 release and planning a release in November. Do you have an update on this?

Best regards,
Serge Huber.

[0] http://jackrabbit.markmail.org/thread/yyei3gjxh5ylyjwz
Re: Update on 2.2.0 release plan ?
Thanks for the quick reply!

Ok, that looks perfect for our project. I think we should be able to assist with testing, since we will be undergoing internal testing at pretty much the same time.

Best regards,
Serge Huber.

On Wed, Oct 20, 2010 at 12:11 PM, Jukka Zitting jukka.zitt...@gmail.com wrote:

Hi,

On Wed, Oct 20, 2010 at 10:56 AM, Serge Huber shub...@jahia.com wrote:
In last september [0] , you mentioned establishing a release plan for the 2.2 release and planning a release in November, do you have an update on this?

I think we should be well on track for a November release, though there's still some work to be done on things like JCR-2573 and JCR-2715. I think we should target at branching 2.2 in early November and cutting the 2.2.0 release at the end of the month after a few weeks of testing and stabilization.

BR,
Jukka Zitting
[jira] Created: (JCR-2793) Typo in NodeTypeRegistry
Typo in NodeTypeRegistry
------------------------

Key: JCR-2793
URL: https://issues.apache.org/jira/browse/JCR-2793
Project: Jackrabbit Content Repository
Issue Type: Bug
Components: jackrabbit-core
Affects Versions: 2.2.0
Reporter: Serge Huber
Fix For: 2.2.0
Attachments: Fix_typo_on_custom_node_types_constant.patch

It seems a little typo has been introduced in the NodeTypeRegistry, as illustrated in this stack trace:

Caused by: javax.jcr.RepositoryException: internal error: invalid resource: nodetypes/custom_nodetypes.xml
    at org.apache.jackrabbit.core.nodetype.NodeTypeRegistry.<init>(NodeTypeRegistry.java:703) ~[jackrabbit-core-2.2-SNAPSHOT.jar:2.2-SNAPSHOT]
    at org.apache.jackrabbit.core.RepositoryImpl.createNodeTypeRegistry(RepositoryImpl.java:422) ~[jackrabbit-core-2.2-SNAPSHOT.jar:na]
    at org.apache.jackrabbit.core.RepositoryImpl.<init>(RepositoryImpl.java:294) ~[jackrabbit-core-2.2-SNAPSHOT.jar:na]

This happens when using a DbFileSystem for the root filesystem. This didn't cause a problem in 2.1.1.

The patch attached to this ticket corrects the issue.
[jira] Updated: (JCR-2793) Typo in NodeTypeRegistry
[ https://issues.apache.org/jira/browse/JCR-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Serge Huber updated JCR-2793:
----------------------------

Attachment: Fix_typo_on_custom_node_types_constant.patch
[jira] Commented: (JCR-2715) Improved join query performance
[ https://issues.apache.org/jira/browse/JCR-2715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12922058#action_12922058 ] Serge Huber commented on JCR-2715: -- Thank you for your reply. I will pull the changes from SVN, test them and give you feedback. I am using another unit test that does a lot of concurrent reads, writes and searches. Maybe this is something I could contribute, but it is not yet generic to Jackrabbit and currently has dependencies on our product. Basically we are testing with larger loads than the currently available tests do. Regards, Serge Huber. Improved join query performance --- Key: JCR-2715 URL: https://issues.apache.org/jira/browse/JCR-2715 Project: Jackrabbit Content Repository Issue Type: Improvement Components: jackrabbit-core, query Reporter: Jukka Zitting Assignee: Jukka Zitting Fix For: 2.2.0 Our current implementation of SQL2 join queries does not perform very well on pretty much any non-trivial data set. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: Possibly typo in NodeTypeRegistry ?
Any feedback on this ? Does this fix look ok ? Should I create a JIRA ? (I have a patch for this that I could attach to a ticket, but I just want to validate the correction first). Best regards, Serge Huber. On Thu, Oct 14, 2010 at 3:32 PM, Serge Huber shub...@jahia.com wrote: Hello, I am no expert in Jackrabbit source code (yet), so this might be wrong :) I am currently building the trunk and I was wondering if there was a possible typo in : jackrabbit-core/src/main/java/org/apache/jackrabbit/core/nodetype/NodeTypeRegistry.java private static final String CUSTOM_NODETYPES_RESOURCE_NAME = "nodetypes/custom_nodetypes.xml"; Isn't there a / missing at the beginning ? I was getting the following error when starting with a DbFileSystem : Caused by: org.apache.jackrabbit.core.fs.FileSystemException: not an absolute path: nodetypes/custom_nodetypes.xml at org.apache.jackrabbit.core.fs.FileSystemPathUtil.checkFormat(FileSystemPathUtil.java:178) at org.apache.jackrabbit.core.fs.db.DatabaseFileSystem.exists(DatabaseFileSystem.java:347) at org.apache.jackrabbit.core.fs.FileSystemResource.exists(FileSystemResource.java:142) at org.apache.jackrabbit.core.nodetype.NodeTypeRegistry.init(NodeTypeRegistry.java:696) It seems to start up ok with the / added. I wasn't sure whether we should instead wrap the file system in a BasedFileSystem instance again, as is done in the constructor of the NamespaceRegistryImpl, because in 2.1.1 we had the following code : nsReg = createNamespaceRegistry(new BasedFileSystem(repStore, "/namespaces")); ntReg = createNodeTypeRegistry(nsReg, new BasedFileSystem(repStore, "/nodetypes")); Best regards, Serge Huber.
[jira] Commented: (JCR-2715) Improved join query performance
[ https://issues.apache.org/jira/browse/JCR-2715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12921338#action_12921338 ] Serge Huber commented on JCR-2715: -- If I understand this ticket properly, this doesn't only happen for join queries but for all SQL-2 queries, no ? In the first solution, do you mean you intend to map the single selector queries along with using BooleanQuery objects to map constraints directly to the underlying Lucene query ? In any case, I'd be willing to help in any way possible, as this has become the biggest performance issue we are seeing in testing Jackrabbit with non-trivial data sets and loads. Regards, Serge Huber. Improved join query performance --- Key: JCR-2715 URL: https://issues.apache.org/jira/browse/JCR-2715 Project: Jackrabbit Content Repository Issue Type: Improvement Components: jackrabbit-core, query Reporter: Jukka Zitting Assignee: Jukka Zitting Fix For: 2.2.0 Our current implementation of SQL2 join queries does not perform very well on pretty much any non-trivial data set. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Possibly typo in NodeTypeRegistry ?
Hello, I am no expert in Jackrabbit source code (yet), so this might be wrong :) I am currently building the trunk and I was wondering if there was a possible typo in : jackrabbit-core/src/main/java/org/apache/jackrabbit/core/nodetype/NodeTypeRegistry.java private static final String CUSTOM_NODETYPES_RESOURCE_NAME = "nodetypes/custom_nodetypes.xml"; Isn't there a / missing at the beginning ? I was getting the following error when starting with a DbFileSystem : Caused by: org.apache.jackrabbit.core.fs.FileSystemException: not an absolute path: nodetypes/custom_nodetypes.xml at org.apache.jackrabbit.core.fs.FileSystemPathUtil.checkFormat(FileSystemPathUtil.java:178) at org.apache.jackrabbit.core.fs.db.DatabaseFileSystem.exists(DatabaseFileSystem.java:347) at org.apache.jackrabbit.core.fs.FileSystemResource.exists(FileSystemResource.java:142) at org.apache.jackrabbit.core.nodetype.NodeTypeRegistry.init(NodeTypeRegistry.java:696) It seems to start up ok with the / added. I wasn't sure whether we should instead wrap the file system in a BasedFileSystem instance again, as is done in the constructor of the NamespaceRegistryImpl, because in 2.1.1 we had the following code : nsReg = createNamespaceRegistry(new BasedFileSystem(repStore, "/namespaces")); ntReg = createNodeTypeRegistry(nsReg, new BasedFileSystem(repStore, "/nodetypes")); Best regards, Serge Huber.
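The two candidate fixes discussed in this thread (adding the leading / to the constant, or wrapping the file system so that paths are resolved under a base directory) can be compared with a minimal sketch. It is not Jackrabbit code: the class PathCheckSketch and its methods are hypothetical stand-ins, and the assumption is only that file system paths must start with "/" and that a based wrapper prepends its base path.

```java
// Hypothetical illustration only; not the Jackrabbit implementation.
final class PathCheckSketch {
    static final String SEPARATOR = "/";

    // Mirrors the kind of check that produced "not an absolute path: ..." above.
    static void checkAbsolute(String path) {
        if (path == null || !path.startsWith(SEPARATOR)) {
            throw new IllegalArgumentException("not an absolute path: " + path);
        }
    }

    // Assumed behaviour of a based wrapper: an absolute path is resolved under the base path.
    static String underBase(String basePath, String path) {
        checkAbsolute(basePath);
        checkAbsolute(path);
        return basePath + path;
    }
}
```

With these assumptions, "/nodetypes/custom_nodetypes.xml" passes the check directly, while a wrapper based at "/nodetypes" would expect the resource name "/custom_nodetypes.xml"; the relative name from the stack trace is rejected in both cases.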
[jira] Commented: (JCR-2715) Improved join query performance
[ https://issues.apache.org/jira/browse/JCR-2715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12920495#action_12920495 ] Serge Huber commented on JCR-2715: -- We are seeing this issue also, so we would be very interested in the work you are doing. Is there anything already available to test ? Regards, Serge Huber. Improved join query performance --- Key: JCR-2715 URL: https://issues.apache.org/jira/browse/JCR-2715 Project: Jackrabbit Content Repository Issue Type: Improvement Components: jackrabbit-core, query Reporter: Jukka Zitting Assignee: Jukka Zitting Fix For: 2.2.0 Our current implementation of SQL2 join queries does not perform very well on pretty much any non-trivial data set. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: Jackrabbit 2.0 press release
Hello Jukka, Thank you for including us in the request for comments :) We (Jahia) will try to put something together and get it to you soon, based on the Tuscany example. Best Regards, Serge Huber. On 16 sept. 09, at 11:28, Jukka Zitting wrote: Hi, I would like to start preparing an official Apache press release to go with the upcoming Jackrabbit 2.0 release. This will be our biggest release since Jackrabbit 1.0 was released 3.5 years ago [1], and I'd like to get also the press involved in spreading the word. See [2] for a good example press release from Apache Tuscany. I'll write a basic draft over the next week or so, and we can work together on improving it. Ideally we should have a final draft of the press release available in mid-October so that we can have it released at ApacheCon US. I would also like to get a few quotes from key people included in the press release. David as the JSR 283 spec lead and Day representative would be a good candidate, and people like Arje from Hippo, Julian from greenbytes, or someone from GX, Cognifide, Jahia or Anyware could give good industry perspective. [1] http://markmail.org/message/jlk4qe336lne7v4m [2] http://apache.org/foundation/press/pr_2008_10_14.html BR, Jukka Zitting
[jira] Commented: (JCR-1525) Jackrabbit depends on Oracle driver for BLOB support in Oracle versions previous than 10.2
[ https://issues.apache.org/jira/browse/JCR-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12709752#action_12709752 ] Serge Huber commented on JCR-1525: -- Actually in the most recent versions of the Oracle driver (11) we no longer need the Oracle BLOB specific code. And the new driver solves this even for older versions of Oracle. If you're interested I have a patch for that also. But it's not clear to me if it's acceptable to Jackrabbit to have a dependency on the latest version of the driver ? Here is the extract from the README for the new driver : New Factory Methods The JDBC 4.0 spec for java.sql.Connection includes factory methods for creating instances of the standard JDBC types, Array, Blob, Clob, etc. Building on this concept Oracle JDBC 11R1 oracle.jdbc.OracleConnection provides factory methods for creating instances of the Oracle specific types. Best practice is to use the standard JDBC types and the new factory methods. When it is necessary to use the Oracle specific types best practice is to create them via the new factory methods. Direct customer access to the constructors for these types will be deprecated and later desupported. The supported types are all those in oracle.sql, including ARRAY, BFILE, DATE, INTERVALDS, NUMBER, STRUCT, TIME, TIMESTAMP, etc. Regards, Serge Jackrabbit depends on Oracle driver for BLOB support in Oracle versions previous than 10.2 -- Key: JCR-1525 URL: https://issues.apache.org/jira/browse/JCR-1525 Project: Jackrabbit Content Repository Issue Type: Improvement Components: jackrabbit-core Reporter: Esteban Franqueiro Attachments: JCR-1525.patch In Oracle versions previous to 10.2, Jackrabbit explicitly uses a class from the Oracle driver to provide BLOB support (see OracleFileSystem.init()). This special handling is no longer necessary for Oracle 10.2+, so we should provide a new implementation. 
As discussed on the list, we can create a new class for Oracle 10.2+, make it inherit from DbFileSystem, and override createSchema() and the table space related methods, which are the ones that need special handling. Furthermore, we could refactor the current OracleFileSystem and break it into two classes: one to keep the current behavior, and a new one to hold the common code (which we could name OracleBaseFileSystem or similar, to maintain compatibility with code that uses OracleFileSystem for versions previous to 10.2). Then we make the Oracle10FileSystem inherit from the latter. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
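The refactoring proposed in this issue might be laid out roughly as follows. This is only a structural sketch: the class names come from the description above, but DbFileSystem here is a local stand-in rather than the Jackrabbit class, and the method bodies, return values and the blobHandling() method are hypothetical placeholders.

```java
// Stand-in for Jackrabbit's DbFileSystem; only the hook mentioned in the issue is modelled.
class DbFileSystem {
    String createSchema() { return "generic schema"; }
}

// Hypothetical home for the code common to all Oracle versions.
abstract class OracleBaseFileSystem extends DbFileSystem {
    @Override
    String createSchema() { return "oracle schema"; } // placeholder for table space handling

    // Placeholder for the one part that differs between driver generations.
    abstract String blobHandling();
}

// Keeps the current behavior for Oracle versions previous to 10.2.
class OracleFileSystem extends OracleBaseFileSystem {
    @Override
    String blobHandling() { return "oracle driver class"; }
}

// New implementation for Oracle 10.2+ without the driver-specific BLOB code.
class Oracle10FileSystem extends OracleBaseFileSystem {
    @Override
    String blobHandling() { return "standard JDBC"; }
}
```

Existing configurations that name OracleFileSystem would keep working under this layout, which appears to be the compatibility concern raised in the description.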
Re: Removing old persistence managers in Jackrabbit 2.0
+1 On 5 mai 09, at 17:33, Jukka Zitting wrote: Hi, I would like to drop all our non-bundle persistence managers before we release Jackrabbit 2.0. Besides forcing people to upgrade to the recommended setup, getting rid of the old PMs would also allow us to evolve the PersistenceManager interface to better match the architectural changes (bundle storage, data store, clustering, etc.) that we've gone through during the Jackrabbit 1.x cycle. WDYT? BR, Jukka Zitting
Re: Thanks Issue-Report from PlugFest in Basel.
Thanks a lot for hosting and organizing the event. I was thinking that it might be a good idea to schedule another plugfest later in the year to see how things have progressed and possibly have an idea of interoperability close to the final release of the specification ? Best Regards, Serge... On 3 mai 09, at 12:44, David Nuescheler wrote: Dear TC members Jackrabbit-devs, I would like to thank everybody who attended the CMIS PlugFest in Basel. I think it was very successful and we uncovered a lot of issues while having a lot of fun achieving 31 (!) client / server connections. http://liip.to/cmismatrix I think we should be able to use the above matrix to track ongoing CMIS interop testing. I am sure this can be an evolving base for everybody to contribute their test results to. Also find write-ups about the PlugFest here: http://dev.day.com/microsling/content/blogs/main/cmisplugfest2.html I also reported the Issues that were logged [1] throughout the PlugFest as issue 161 - issue 170 in the CMIS jira. regards, david [1] http://www.day.com/o.file/cmis-issues.jpg?get=f2f7b2e3176fc1deb1d610ac0ad06ec9 -- http://dev.day.com
Question about Chemistry client
Hi all, I'm wondering how one would go about using Chemistry as a CMIS client ? Is there code already related to this ? Best regards, Serge Huber.
Re: Question about Chemistry client
Thanks a lot Florent. I look forward to meeting you tomorrow! Best regards, Serge Huber. On 28 avr. 09, at 15:00, Florent Guillaume wrote: On 28 Apr 2009, at 14:24, Serge Huber wrote: I'm wondering how one would go about using Chemistry as a CMIS client ? Is there code already related to this ? We have code lying around for a partial AtomPub client (done by Bogdan here), I have to commit it today/tomorrow. (http://hg.nuxeo.org/sandbox/nuxeo-chemistry/file/bs/nuxeo-chemistry-client/ if you're curious -- it still has Nuxeo dependencies at the moment which I'll clean up when moving to chemistry). Florent -- Florent Guillaume, Head of RD, Nuxeo Open Source, Java EE based, Enterprise Content Management (ECM) http://www.nuxeo.com http://www.nuxeo.org +33 1 40 33 79 87
Re: Removing ORM persistence manager ?
I'd just leave it in sandbox/inactive where I already placed it some while ago [1]. Perhaps someone still is interested in the code, or at least wants to browse the project history and the various contributions that never made it to the trunk. By definition none of the stuff in the sandbox is actively supported or maintained so no worries about that. I just resolved the last related open issue (JCR-730) with the resolution "Won't Fix". Thank you Jukka ! Best Regards, Serge Huber.
Re: Next steps for Chemistry
Hi Jukka all, Thanks for the heads up. I was looking at the code for Chemistry this morning, and it looks indeed like it's ready to be played with, but it's really lacking an integration with Jackrabbit :) I was wondering what people thought of David's suggestion that we have the Jackrabbit bindings in the Chemistry project ? Or is it too early to discuss this ? Also what is planned for the jcr-cmis code in the sandbox ? Will we be able to reuse some of that for the binding ? Anyone already working on this ? Best regards, Serge Huber. On 27 avr. 09, at 12:00, Jukka Zitting wrote: Hi, To keep everyone on track of what's happening with Chemistry, here's a list of things that are going to happen: * Based on the successful vote, I've asked the Incubator to accept Chemistry for incubation. Incubation starts on Thursday this week unless someone from the Incubator PMC challenges the proposal. * On Thursday (or once the IPMC is happy) I will request or set up all the project infrastructure listed on the Chemistry proposal. I will also move all the related code that we currently have in the Jackrabbit sandbox to the new Chemistry project. All the JCRCMIS Jira issues will be migrated to the new CMIS project in Jira. * I will request Apache committer accounts for the initial committers who don't already have such accounts. Please let me know what your preferred user account name is. The account will be associated with all the commits you make and will also become your @apache.org email address. Please check http://people.apache.org/~jim/committers.html for potential account name conflicts or provide a few acceptable alternatives. There are some delays related to setting up mailing lists and user accounts, so it'll probably take until next week before we're fully set up to start working on the project. BR, Jukka Zitting
Re: Next steps for Chemistry
Hi Dominique, Is this code already accessible somewhere ? I'd love to have a look at it before I come on Wednesday. Even a snapshot of your working version would be fine :) Is the RMI deployment a requirement of your implementation, or just an option ? Regards, Serge Huber. On 27 avr. 09, at 15:06, Dominique Pfister wrote: Hi Serge, I'm currently working on a JCR bridge for Chemistry, that will implement the interfaces in the chemistry-api module and translate them to calls on a generic JCR repository. I was able to plug a JCR repository implementation (actually Jackrabbit running in Tomcat accessed via RMI) into Florian's TestAtomPubServer test case, and the tests passed through. Of course, there is still a lot of functionality missing... Kind regards Dominique On Mon, Apr 27, 2009 at 2:58 PM, Serge Huber shub...@jahia.com wrote: Hi Jukka all, Thanks for the heads up. I was looking at the code for Chemistry this morning, and it looks indeed like it's ready to be played with, but it's really lacking an integration with Jackrabbit :) I was wondering what people thought of David's suggestion that we have the Jackrabbit bindings in the Chemistry project ? Or it is too early to discuss this ? Also what is planned for the jcr-cmis code in the sandbox ? Will we be able to reuse some of that for the binding ? Anyone already working on this ? Best regards, Serge Huber. On 27 avr. 09, at 12:00, Jukka Zitting wrote: Hi, To keep everyone on track of what's happening with Chemistry, here's a list of things that are going to happen: * Based on the successful vote, I've asked the Incubator to accept Chemistry for incubation. Incubation starts on Thursday this week unless someone from the Incubator PMC challenges the proposal. * On Thursday (or once the IPMC is happy) I will request or set up all the project infrastructure listed on the Chemistry proposal. I will also move all the related code that we currently have in the Jackrabbit sandbox to the new Chemistry project. 
All the JCRCMIS Jira issues will be migrated to the new CMIS project in Jira. * I will request Apache committer accounts for the initial committers who don't already have such accounts. Please let me know what your preferred user account name is. The account will be associated with all the commits you make and will also become your @apache.org email address. Please check http://people.apache.org/~jim/committers.html for potential account name conflicts or provide a few acceptable alternatives. There are some delays related to setting up mailing lists and user accounts, so it'll probably take until next week before we're fully set up to start working on the project. BR, Jukka Zitting
Re: Next steps for Chemistry
Thanks a lot David :) Best regards, Serge Huber. On 27 avr. 09, at 15:56, David Nuescheler wrote: hi serge, Is this code already accessible somewhere ? I'd love to have a look at it before I come on Wednesday. Even a snapshot of your working version would be fine :) i think our focus is on getting as much as possible done for wednesday, so we will check stuff in whenever it makes sense... Is the RMI deployment a requirement of your implementation, or just an option ? it is an option. i would even go as far as saying that it is an undesirable option ;) regards, david
Removing ORM persistence manager ?
Hi guys, It's been a long time since I have been involved with Jackrabbit, as I have not had the opportunity to work on it as much as I had hoped. Anyway, I am proposing that we remove the ORM persistence manager that I initially committed, since I believe there is little interest in it, and even I will no longer be using it now that much better implementations such as the bundle persistence managers have become available. Otherwise we could just leave it in the sandbox, but should we maintain it ? Regards, Serge Huber.
Re: Incubating Chemistry (Was: IP clearance for the Chemistry contribution)
I am very interested in this project also, but I think it would be much too early for me to get committership on this, as I need to first prove myself again, as an alumnus of Jackrabbit :) Regards, Serge
Re: Incubating Chemistry (Was: IP clearance for the Chemistry contribution)
I might be jumping the gun a little, but if this can help, here are a few name suggestions : Apache Plug Apache Content Plug Apache Jackplug Apache Sea Miss (ok it's an awful word game :)) (or could be written Seamis, or Seemis) Apache Startle (Jackrabbit synonym) But I must admit the name Chemistry is quite good, even if potentially misleading. Regards, Serge Huber. On 21 avr. 09, at 16:04, Jukka Zitting wrote: Hi, On Tue, Apr 21, 2009 at 12:12 AM, Jukka Zitting jukka.zitt...@gmail.com wrote: I'll now ping the gene...@incubator.apache.org list for some early comments on the proposal. See http://thread.gmane.org/gmane.comp.apache.incubator.general/21727 for the Incubator thread. There are some concerns over the name Chemistry and I actually found an existing Java project called Chemistry Development Kit (http://apps.sourceforge.net/mediawiki/cdk/), so as painful as it is we may still need to reconsider the project name. BR, Jukka Zitting
Re: Incubating Chemistry (Was: IP clearance for the Chemistry contribution)
Thank you Torgeir, I remember this being mentioned a while ago, but I was mostly hoping to help if need be. But I would miss the current name, as I agree it is good :) Regards, Serge Huber. On 21 avr. 09, at 16:36, Torgeir Veimo wrote: It is a play on the letters; CheMIStry. 2009/4/22 Serge Huber shub...@jahia.com I might be jumping the gun a little, but if this can help, here are a few name suggestions : Apache Plug Apache Content Plug Apache Jackplug Apache Sea Miss (ok it's an awful word game :)) (or could be written Seamis, or Seemis) Apache Startle (Jackrabbit synonym) But I must admit the name Chemistry is quite good, even if potentially misleading. Regards, Serge Huber. On 21 avr. 09, at 16:04, Jukka Zitting wrote: Hi, On Tue, Apr 21, 2009 at 12:12 AM, Jukka Zitting jukka.zitt...@gmail.com wrote: I'll now ping the gene...@incubator.apache.org list for some early comments on the proposal. See http://thread.gmane.org/gmane.comp.apache.incubator.general/21727 for the Incubator thread. There are some concerns over the name Chemistry and I actually found an existing Java project called Chemistry Development Kit (http://apps.sourceforge.net/mediawiki/cdk/), so as painful as it is we may still need to reconsider the project name. BR, Jukka Zitting -- -Tor
Re: Incubating Chemistry (Was: IP clearance for the Chemistry contribution)
or Chemis ? Regards, Serge Huber. On 21 avr. 09, at 16:43, Jukka Zitting wrote: Hi, On Tue, Apr 21, 2009 at 4:40 PM, Serge Huber shub...@jahia.com wrote: But I would miss the current name, as I agree it is good :) Me too. How about if we just slightly modified it, say to Apache Chemi or something similar? BR, Jukka Zitting
Testing with Websphere 6.1 Oracle 11g
Hi Jackrabbit developers, I have been quite passive on this list for a long time, but I have finally been able to work with Jackrabbit a bit more, and I really need your help at this point :) I have been trying to deploy Jackrabbit on one of the most unforgiving platforms : Websphere 6.1 and Oracle 9/10/11g. I have struggled quite a lot, and I had to revert to webapp-based deployment because the JCA deployment model just didn't fit well with the software I'm trying to integrate with (I was having issues with the authentication/authorization, which was not possible to integrate into the connector). I managed to make most of it work, but I am now facing an issue that looks like an architectural issue, which is why I am writing here. Basically the problem I am seeing is that Websphere is complaining about accessing JNDI resources outside of container-managed threads. In the specific case I am investigating, this happens in the ObservationDispatcher class, which creates a thread for handling the notifications. Websphere refuses to serve the managed datasource because the threads are not managed by it. From my point of view I have the following options : - directly connecting Jackrabbit to the database, without using the container datasource and using JDBC connections - modifying the thread creation in Jackrabbit to maybe use something like the CommonJ WorkManager interface that allows the creation of container-managed threads I keep the first solution as a last resort, because I think that clients using Websphere will not like the idea of having JDBC connections that they have no control over. So I'd prefer to work on the thread creation part, but here I have the problem that the ObservationDispatcher class, from what I could understand of the source code, is not pluggable, so I couldn't replace it without really patching the code. Does this analysis look right ? Did I miss something ? 
I should point out that I am more than willing to do whatever work could help this issue out, as I am under a rather aggressive timeline to make all of this work. Best regards, Serge Huber.
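The second option above (making thread creation pluggable) could look roughly like the sketch below. It is not Jackrabbit's ObservationDispatcher: DispatcherSketch is a hypothetical class that only illustrates the idea of handing the dispatcher a java.util.concurrent.Executor, so that a container could supply one backed by its own managed threads (for example via a CommonJ WorkManager adapter, which is not shown here).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executor;

// Hypothetical dispatcher that never creates threads itself.
final class DispatcherSketch {
    private final Executor executor;
    private final List<String> delivered = new ArrayList<>();

    // The caller decides where the work runs: a plain thread, a pool, or a container-managed thread.
    DispatcherSketch(Executor executor) {
        this.executor = executor;
    }

    // Hands the notification to the injected executor instead of a privately created thread.
    void dispatch(String event) {
        executor.execute(() -> {
            synchronized (delivered) {
                delivered.add(event);
            }
        });
    }

    // Returns a snapshot of the events delivered so far.
    List<String> delivered() {
        synchronized (delivered) {
            return new ArrayList<>(delivered);
        }
    }
}
```

Outside a container the same class could be constructed with `command -> new Thread(command).start()`, which would keep the current unmanaged-thread behavior as the default.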
Re: Testing with Websphere 6.1 Oracle 11g
Hi Jukka, Thank you for the quick reply. I will take a stronger look at the first option then :) Regards, Serge Huber. On Tue, Apr 14, 2009 at 10:33 AM, Jukka Zitting jukka.zitt...@gmail.com wrote: Hi, On Tue, Apr 14, 2009 at 9:13 AM, Serge Huber shub...@jahia.com wrote: From my point of view I have the following options : - directly connect Jackrabbit to the database, without using the container datasource and using JDBC connections - modifying the thread creation in Jackrabbit to maybe use something like the CommonJ Workmanager interface that allows the creation of contained managed threads The first solution I keep as a last solution, because I think that clients using Websphere will not like the idea of having JDBC connections that they have no control over. On the other hand Jackrabbit doesn't like JDBC connections that it doesn't have full control over. For example, mixing a Jackrabbit transaction and a container-managed database transaction is a bad idea. Also, Jackrabbit likes to keep long-lived database connections and switch them back and forth from auto-commit mode, which are both operations that some container-managed data sources don't like that much. So, from a technical perspective your path of least resistance would clearly be to go with the first option. BR, Jukka Zitting
Re: Alfresco + Jackrabbit
David Nuescheler wrote: Hi Robert, Actually one thing that I find really interesting about Alfresco - in case anyone wants to implement it as an add-on to Jackrabbit - is the CIFS layer which supposedly allows good access to the server (as a document server) from Windows clients. I would imagine that using the jCIFS library it would be possible to write something similar for a more generic JSR-170 provider... I think this would be a good idea too, as a matter of fact we already looked into the feasibility of something like that and it seems to work just fine. Some random access performance drawbacks if we want to keep it strictly bound to the JCR API. jCIFS though is a CIFS client, right? At least I have not found a CIFS server other than Starlasoft's / Alfresco's? Am I looking in the wrong place? Well the other question is the relationship between Starlasoft and Alfresco, as the developer that is behind Starlasoft is on Alfresco's payroll. What is not entirely clear is if the product on Starlasoft will continue to evolve or if JLAN will become an integral part of Alfresco. The license of JLAN inside Alfresco is Mozilla-based, with a strong advertisement clause. But the advertisement clause concerns mostly GUIs, and how does it apply to server-side libraries ? If JLAN is used as a CIFS server for a Jackrabbit repository, where do you (and do you need to ?) advertise the Alfresco copyrights, etc ? Having tried mapping a WebDAV location as a network drive I can say that it really doesn't work in a usable fashion. Really? So far I experienced a generally suboptimal performance but it works just as well as CIFS for me, both on MacOSX and Windows. What issues did you encounter? Our experience with the WebDAV client on Windows has been nothing short of horrible : too many buggy DLLs out there (see http://www.greenbytes.de/tech/webdav/webfolder-client-list.html for a detailed list of the bugs and implementations) and they all behave differently. 
Caching is erratic, internationalisation is very poor (some operations like renaming a directory using non-ISO-8859-1 chars are not even sent out to the server depending on the version of the DLL !). My impression is that Microsoft does not view the WebDAV client as important, and therefore doesn't really make a big effort to maintain it. CIFS on the other hand is the main file-sharing system on Windows systems, and is actively developed and maintained by Microsoft. Of course this also means that they can change the protocol and implementation as they see fit, which is a problem for libraries like JLAN that must constantly keep up. One idea that would be interesting would be to develop a custom open source file-system driver for Windows, that could use a transport such as WebDAV. There are closed-source solutions like Xythos Drive that already offer this, but having an open source one might be interesting. The good side about having such an implementation is that it could evolve separately from the Windows implementations of WebDAV. The bad side is that it's a lot of Windows-specific work and might be tedious. But there are success stories such as TortoiseCVS and TortoiseSVN, they just lack the mapping possibility. The ideal open source configuration as I see it for file repositories : - Strongly integrated open source client on Windows (offering the possibility to control versions, searches, check-in/check-out, locking, etc.). CIFS and the default WebDAV Windows client don't offer these features. - JCR backends such as Jackrabbit. Anyway, just some food for thought :) cheers, Serge...
Re: Alfresco + Jackrabbit
Well I looked at Alfresco a while back but for me the main difference was at the time : - Nodes in Alfresco seem more file-oriented (basically it's mostly configured for that type of usage) - Nodes in Jackrabbit are quite general But the two implementations are quite similar, except that Jackrabbit has decoupled the persistence implementation in a way that makes it easy to choose a back-end fitting your deployment. On the other hand Alfresco is database oriented, which will help with some issues such as transaction management, clustering, etc. Of course this is a very summarized view of the two technologies. There is a lot more to both, but it is not clear to me that one or the other would be better fitted for large hierarchical data. One thing I have noted is that Alfresco is in full buzz mode right now :) So it would be nice to have a real-world comparison of the two techs. It seems to me that Alfresco is more EDM oriented than Jackrabbit though in terms of a product. And last time I did performance comparisons, nothing could beat Jackrabbit in terms of indexing speed, and the possibility to use file-based persistence was an interesting choice for lighter configurations that still need speed. For me the big issue with Jackrabbit is to scale it to really large datasets. I'd love to be able to say that Jackrabbit can scale to a cluster of 10-20 machines and manage hierarchical data of 20 million nodes amounting to 100TB of data :) Regards, Serge... Jukka Zitting wrote: Hi, On 10/9/06, Alexandru Popescu [EMAIL PROTECTED] wrote: Other than this, I guess it may be oke to have different solutions for this spec implementation (in case you are referring to this). +1 In fact I'd be very interested in seeing some comparisons on the various aspects of the different JCR implementations. There's a lot to be learned from different approaches to the same problem. BR, Jukka Zitting