[jira] [Resolved] (CONNECTORS-453) ManifoldCF running with Derby 10.8.1.1 has problems pausing and aborting jobs
[ https://issues.apache.org/jira/browse/CONNECTORS-453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright resolved CONNECTORS-453.

Resolution: Fixed

> ManifoldCF running with Derby 10.8.1.1 has problems pausing and aborting jobs
>
> Key: CONNECTORS-453
> URL: https://issues.apache.org/jira/browse/CONNECTORS-453
> Project: ManifoldCF
> Issue Type: Bug
> Components: Framework core
> Affects Versions: ManifoldCF 0.5
> Reporter: Karl Wright
> Assignee: Karl Wright
> Priority: Critical
> Fix For: ManifoldCF 0.5.1, ManifoldCF 0.6
>
> Since upgrading to Derby 10.8.x.x, it takes minutes to crawl just 20 documents. Clearly the Derby contention/locking bugs are back with a vengeance in 10.8.x.x. Either we use 10.7.x.x or we get the Derby team to look at them again.
> In the interim, maybe it is time to use HSQLDB as the default embedded database for the single-process example instead of Derby.

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CONNECTORS-453) ManifoldCF running with Derby 10.8.1.1 has problems pausing and aborting jobs
[ https://issues.apache.org/jira/browse/CONNECTORS-453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13273872#comment-13273872 ] Karl Wright commented on CONNECTORS-453:

r1337457 (release branch)
[jira] [Reopened] (CONNECTORS-453) ManifoldCF running with Derby 10.8.1.1 has problems pausing and aborting jobs
[ https://issues.apache.org/jira/browse/CONNECTORS-453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright reopened CONNECTORS-453:

Reopening for inclusion in 0.5.1
[jira] [Updated] (CONNECTORS-453) ManifoldCF running with Derby 10.8.1.1 has problems pausing and aborting jobs
[ https://issues.apache.org/jira/browse/CONNECTORS-453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright updated CONNECTORS-453:

Priority: Critical (was: Major)
Fix Version/s: ManifoldCF 0.5.1
[jira] [Updated] (CONNECTORS-453) ManifoldCF running with Derby 10.8.1.1 has problems pausing and aborting jobs
[ https://issues.apache.org/jira/browse/CONNECTORS-453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright updated CONNECTORS-453:

Summary: ManifoldCF running with Derby 10.8.1.1 has problems pausing and aborting jobs (was: ManifoldCF running with Derby 10.8.1.1 has severe performance problems)
[jira] [Commented] (CONNECTORS-453) ManifoldCF running with Derby 10.8.1.1 has severe performance problems
[ https://issues.apache.org/jira/browse/CONNECTORS-453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13263148#comment-13263148 ] Karl Wright commented on CONNECTORS-453:

r1331102
[jira] [Resolved] (CONNECTORS-453) ManifoldCF running with Derby 10.8.1.1 has severe performance problems
[ https://issues.apache.org/jira/browse/CONNECTORS-453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright resolved CONNECTORS-453.

Resolution: Fixed
[jira] [Commented] (CONNECTORS-453) ManifoldCF running with Derby 10.8.1.1 has severe performance problems
[ https://issues.apache.org/jira/browse/CONNECTORS-453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13263092#comment-13263092 ] Karl Wright commented on CONNECTORS-453:

Here's another example:

Error! A lock could not be obtained due to a deadlock, cycle of locks and waiters is:
Lock : ROW, JOBS, (1,7)
  Waiting XID : {157800, X} , APP, UPDATE jobs SET status=? WHERE id=?
  Granted XID : {157521, S} , {157653, S}
Lock : ROW, JOBQUEUE, (503,86)
  Waiting XID : {157653, S} , APP, SELECT t0.id,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount,t0.priorityset FROM jobqueue t0 WHERE t0.status IN (?,?) AND t0.checkaction=? AND t0.checktime<=? AND EXISTS(SELECT 'x' FROM jobs t1 WHERE t1.status IN (?,?) AND t1.id=t0.jobid AND t1.priority=?) AND NOT EXISTS(SELECT 'x' FROM jobqueue t2 WHERE t2.dochash=t0.dochash AND t2.status IN (?,?,?,?,?,?) AND t2.jobid!=t0.jobid) AND NOT EXISTS(SELECT 'x' FROM prereqevents t3,events t4 WHERE t0.id=t3.owner AND t3.eventname=t4.name) ORDER BY t0.docpriority ASC,t0.status ASC,t0.checkaction ASC,t0.checktime ASC FETCH NEXT 120 ROWS ONLY
  Granted XID : {157557, X}
Lock : ROW, JOBS, (1,7)
  Waiting XID : {157557, S} , APP, INSERT INTO hopcount (deathmark,parentidhash,id,distance,jobid,linktype) VALUES (?,?,?,?,?,?)

The selected victim is XID : 157800.
[jira] [Commented] (CONNECTORS-453) ManifoldCF running with Derby 10.8.1.1 has severe performance problems
[ https://issues.apache.org/jira/browse/CONNECTORS-453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13263090#comment-13263090 ] Karl Wright commented on CONNECTORS-453:

Clicking pause during the job run causes the following to be displayed in the UI:

A lock could not be obtained due to a deadlock, cycle of locks and waiters is:
Lock : ROW, JOBS, (1,7)
  Waiting XID : {147028, X} , APP, UPDATE jobs SET status=? WHERE id=?
  Granted XID : {146703, S} , {146941, S}
Lock : ROW, JOBQUEUE, (481,10)
  Waiting XID : {146941, S} , APP, SELECT jobid,CAST(COUNT(dochash) AS bigint) AS doccount FROM jobqueue t1 WHERE EXISTS(SELECT 'x' FROM jobs t0 WHERE t0.id=t1.jobid AND id=?) GROUP BY jobid
  Granted XID : {146612, X}
Lock : ROW, HOPCOUNT, (1734,27)
  Waiting XID : {146612, S} , APP, SELECT parentidhash,linktype,distance FROM hopcount WHERE jobid=? AND parentidhash IN (?,?,?,?,?,?,?,?,?,?) AND linktype=?
  Granted XID : {14, X}
Lock : ROW, JOBS, (1,7)
  Waiting XID : {14, S} , APP, INSERT INTO hopcount (deathmark,parentidhash,id,distance,jobid,linktype) VALUES (?,?,?,?,?,?)

The selected victim is XID : 147028.
[jira] [Commented] (CONNECTORS-453) ManifoldCF running with Derby 10.8.1.1 has severe performance problems
[ https://issues.apache.org/jira/browse/CONNECTORS-453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13263086#comment-13263086 ] Karl Wright commented on CONNECTORS-453:

I see stalls only at the very beginning of a crawl. Long crawls with lots of documents don't appear to stall, however. Still trying to figure out if this is an actual problem or something more innocuous.
[jira] [Created] (CONNECTORS-453) ManifoldCF running with Derby 10.8.1.1 has severe performance problems
ManifoldCF running with Derby 10.8.1.1 has severe performance problems

Key: CONNECTORS-453
URL: https://issues.apache.org/jira/browse/CONNECTORS-453
Project: ManifoldCF
Issue Type: Bug
Components: Framework core
Affects Versions: ManifoldCF 0.5
Reporter: Karl Wright
Assignee: Karl Wright
Fix For: ManifoldCF 0.6

Since upgrading to Derby 10.8.x.x, it takes minutes to crawl just 20 documents. Clearly the Derby contention/locking bugs are back with a vengeance in 10.8.x.x. Either we use 10.7.x.x or we get the Derby team to look at them again.

In the interim, maybe it is time to use HSQLDB as the default embedded database for the single-process example instead of Derby.
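As a rough illustration of the interim suggestion above (making HSQLDB the default embedded database), the swap would presumably come down to pointing the framework's configuration at a different database implementation class. The property name and class names below follow ManifoldCF's properties.xml convention but are assumptions, not details stated in this thread:

```
<!-- Hypothetical properties.xml fragment: select HSQLDB instead of Derby.
     Property name and class names are assumptions, not taken from this issue. -->
<property name="org.apache.manifoldcf.databaseimplementationclass"
          value="org.apache.manifoldcf.core.database.DBInterfaceHSQLDB"/>

<!-- Derby equivalent, for comparison:
<property name="org.apache.manifoldcf.databaseimplementationclass"
          value="org.apache.manifoldcf.core.database.DBInterfaceDerby"/>
-->
```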
[jira] [Updated] (CONNECTORS-178) Implement ability to run ManifoldCF with Derby in multiprocess mode
[ https://issues.apache.org/jira/browse/CONNECTORS-178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright updated CONNECTORS-178:

Affects Version/s: ManifoldCF 0.1, ManifoldCF 0.2
Fix Version/s: ManifoldCF next

> Implement ability to run ManifoldCF with Derby in multiprocess mode
>
> Key: CONNECTORS-178
> URL: https://issues.apache.org/jira/browse/CONNECTORS-178
> Project: ManifoldCF
> Issue Type: Bug
> Components: Documentation, Framework core
> Affects Versions: ManifoldCF 0.1, ManifoldCF 0.2
> Reporter: Karl Wright
> Priority: Minor
> Fix For: ManifoldCF next
>
> Derby has a standalone server mode, which we can no doubt use if we modify the Derby driver to accept a configuration parameter which allows you to choose between the embedded driver and the client driver. It might be useful to be able to run ManifoldCF with Derby in this manner.
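The embedded-vs-client switch described above could, as a minimal sketch, be a configuration flag that selects between Derby's two standard JDBC driver classes and URL forms. The class and parameter names here are illustrative, not ManifoldCF's actual code; only the Derby driver class names and URL syntax are standard:

```java
// Sketch of the proposed configuration switch: choose between Derby's
// embedded driver (single-process) and network client driver (multiprocess,
// against a standalone Derby server). Illustrative only.
public class DerbyUrlChooser {
    public static String jdbcUrl(boolean multiprocess, String dbName,
                                 String host, int port) {
        if (multiprocess) {
            // Standalone Derby network server: client driver URL form
            return "jdbc:derby://" + host + ":" + port + "/" + dbName;
        }
        // Single-process: embedded driver URL form
        return "jdbc:derby:" + dbName + ";create=true";
    }

    public static String driverClass(boolean multiprocess) {
        return multiprocess
            ? "org.apache.derby.jdbc.ClientDriver"
            : "org.apache.derby.jdbc.EmbeddedDriver";
    }
}
```

The same configuration parameter would then drive both which driver class gets loaded and which URL form is handed to the JDBC layer.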
[jira] [Updated] (CONNECTORS-110) Max activity and Max bandwidth reports don't work properly under Derby or HSQLDB
[ https://issues.apache.org/jira/browse/CONNECTORS-110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright updated CONNECTORS-110:

Affects Version/s: ManifoldCF 0.1, ManifoldCF 0.2
Fix Version/s: ManifoldCF next

> Max activity and Max bandwidth reports don't work properly under Derby or HSQLDB
>
> Key: CONNECTORS-110
> URL: https://issues.apache.org/jira/browse/CONNECTORS-110
> Project: ManifoldCF
> Issue Type: Bug
> Components: Framework crawler agent
> Affects Versions: ManifoldCF 0.1, ManifoldCF 0.2
> Reporter: Karl Wright
> Fix For: ManifoldCF next
>
> The reports fail because the queries use the Postgresql DISTINCT ON (xxx) syntax, which Derby does not support. Unfortunately, there does not seem to be a way in Derby at present to do anything similar to DISTINCT ON (xxx), and the queries really can't be done without that.
> One option is to introduce a getCapabilities() method into the database implementation, which would allow ACF to query the database capabilities before even presenting the report in the navigation menu in the UI. Another alternative is to do a sizable chunk of resultset processing within ACF, which would require not only the DISTINCT ON() implementation, but also the enclosing sort and limit stuff. It's the latter that would be most challenging, because of the difficulties with i18n etc.
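The second alternative mentioned above (doing the result-set processing inside ACF) can be sketched as follows. This is not ManifoldCF code; it is a hypothetical illustration of emulating Postgresql's DISTINCT ON (key) ... ORDER BY key, value DESC in memory: keep only the highest-valued row per key, then apply the enclosing sort and limit client-side:

```java
import java.util.*;

// Illustrative client-side emulation of DISTINCT ON: one row per key,
// keeping the maximum value, then sorted descending and truncated.
public class DistinctOnEmulator {
    public static List<Map.Entry<String, Long>> distinctOnMax(
            List<Map.Entry<String, Long>> rows, int limit) {
        // Step 1: collapse rows to the per-key maximum (the DISTINCT ON part)
        Map<String, Long> best = new HashMap<>();
        for (Map.Entry<String, Long> row : rows) {
            best.merge(row.getKey(), row.getValue(), Math::max);
        }
        // Step 2: the enclosing ORDER BY ... DESC and LIMIT, done in memory
        List<Map.Entry<String, Long>> out = new ArrayList<>(best.entrySet());
        out.sort((a, b) -> Long.compare(b.getValue(), a.getValue()));
        return out.subList(0, Math.min(limit, out.size()));
    }
}
```

As the issue notes, the hard part in practice is not this collapse step but reproducing the enclosing sort/limit semantics faithfully (collation, i18n) for arbitrary report columns.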
[jira] [Resolved] (CONNECTORS-244) Derby deadlocks in a new way on the IngestStatus table, which isn't caught and retried
[ https://issues.apache.org/jira/browse/CONNECTORS-244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright resolved CONNECTORS-244.

Resolution: Fixed
Fix Version/s: ManifoldCF 0.3

r1163260.

> Derby deadlocks in a new way on the IngestStatus table, which isn't caught and retried
>
> Key: CONNECTORS-244
> URL: https://issues.apache.org/jira/browse/CONNECTORS-244
> Project: ManifoldCF
> Issue Type: Bug
> Components: Framework agents process
> Affects Versions: ManifoldCF 0.3
> Reporter: Karl Wright
> Assignee: Karl Wright
> Fix For: ManifoldCF 0.3
>
> Derby deadlocks when a file system job is run, as follows:
> Irrecoverable Derby deadlock at:
> at org.apache.manifoldcf.core.database.DBInterfaceDerby.reinterpretException(DBInterfaceDerby.java:803)
> at org.apache.manifoldcf.core.database.DBInterfaceDerby.performQuery(DBInterfaceDerby.java:961)
> at org.apache.manifoldcf.core.database.BaseTable.performQuery(BaseTable.java:229)
> at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.performIngestion(IncrementalIngester.java:388)
> at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:364)
> at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocument(WorkerThread.java:1555)
> at org.apache.manifoldcf.crawler.connectors.filesystem.FileConnector.processDocuments(FileConnector.java:283)
> at org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector.processDocuments(BaseRepositoryConnector.java:423)
> at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:561)
> The deadlock needs to be caught, backed off, and retried.
[jira] [Created] (CONNECTORS-244) Derby deadlocks in a new way on the IngestStatus table, which isn't caught and retried
Derby deadlocks in a new way on the IngestStatus table, which isn't caught and retried

Key: CONNECTORS-244
URL: https://issues.apache.org/jira/browse/CONNECTORS-244
Project: ManifoldCF
Issue Type: Bug
Components: Framework agents process
Affects Versions: ManifoldCF 0.3
Reporter: Karl Wright
Assignee: Karl Wright

Derby deadlocks when a file system job is run, as follows:

Irrecoverable Derby deadlock at:
at org.apache.manifoldcf.core.database.DBInterfaceDerby.reinterpretException(DBInterfaceDerby.java:803)
at org.apache.manifoldcf.core.database.DBInterfaceDerby.performQuery(DBInterfaceDerby.java:961)
at org.apache.manifoldcf.core.database.BaseTable.performQuery(BaseTable.java:229)
at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.performIngestion(IncrementalIngester.java:388)
at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:364)
at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocument(WorkerThread.java:1555)
at org.apache.manifoldcf.crawler.connectors.filesystem.FileConnector.processDocuments(FileConnector.java:283)
at org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector.processDocuments(BaseRepositoryConnector.java:423)
at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:561)

The deadlock needs to be caught, backed off, and retried.
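The catch-back-off-retry pattern the issue calls for can be sketched like this. Method and class names are hypothetical, not ManifoldCF's actual DBInterfaceDerby code; the one Derby-specific fact relied on is that Derby reports a deadlock victim with SQLState 40001:

```java
import java.sql.SQLException;
import java.util.concurrent.Callable;

// Minimal sketch of catching a Derby deadlock, backing off, and retrying.
// Derby signals the chosen deadlock victim with SQLState 40001; anything
// else is rethrown immediately.
public class DeadlockRetry {
    public static <T> T withRetry(Callable<T> query, int maxRetries)
            throws Exception {
        long backoffMs = 100L;
        for (int attempt = 0; ; attempt++) {
            try {
                return query.call();
            } catch (SQLException e) {
                if (!"40001".equals(e.getSQLState()) || attempt >= maxRetries)
                    throw e;  // not a deadlock, or retries exhausted
                Thread.sleep(backoffMs);
                backoffMs *= 2;  // exponential backoff before retrying
            }
        }
    }
}
```

A worker thread would wrap each transactional unit of work in such a helper so that being chosen as the deadlock victim no longer surfaces as an "irrecoverable" error.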
[jira] [Resolved] (CONNECTORS-225) When using Derby, ManifoldCF incremental indexer sometimes gets a deadlock exception
[ https://issues.apache.org/jira/browse/CONNECTORS-225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright resolved CONNECTORS-225.

Resolution: Fixed
Fix Version/s: ManifoldCF 0.3

r1150502

> When using Derby, ManifoldCF incremental indexer sometimes gets a deadlock exception
>
> Key: CONNECTORS-225
> URL: https://issues.apache.org/jira/browse/CONNECTORS-225
> Project: ManifoldCF
> Issue Type: Bug
> Components: Framework agents process
> Affects Versions: ManifoldCF 0.1, ManifoldCF 0.2, ManifoldCF 0.3
> Reporter: Karl Wright
> Assignee: Karl Wright
> Fix For: ManifoldCF 0.3
>
> When working with Derby and indexing documents rapidly, sometimes the following deadlock stack trace is thrown:
> at org.apache.manifoldcf.core.database.DBInterfaceDerby.reinterpretException(DBInterfaceDerby.java:803)
> at org.apache.manifoldcf.core.database.DBInterfaceDerby.performQuery(DBInterfaceDerby.java:961)
> at org.apache.manifoldcf.core.database.BaseTable.performQuery(BaseTable.java:229)
> at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.noteDocumentIngest(IncrementalIngester.java:1372)
> at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.performIngestion(IncrementalIngester.java:469)
> at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:365)
> at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocument(WorkerThread.java:1587)
> at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.processDocuments(WebcrawlerConnector.java:1222)
> at org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector.processDocuments(BaseRepositoryConnector.java:423)
> at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.ja
[jira] [Assigned] (CONNECTORS-225) When using Derby, ManifoldCF incremental indexer sometimes gets a deadlock exception
[ https://issues.apache.org/jira/browse/CONNECTORS-225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright reassigned CONNECTORS-225:

Assignee: Karl Wright
[jira] [Created] (CONNECTORS-225) When using Derby, ManifoldCF incremental indexer sometimes gets a deadlock exception
When using Derby, ManifoldCF incremental indexer sometimes gets a deadlock exception

Key: CONNECTORS-225
URL: https://issues.apache.org/jira/browse/CONNECTORS-225
Project: ManifoldCF
Issue Type: Bug
Components: Framework agents process
Affects Versions: ManifoldCF 0.1, ManifoldCF 0.2, ManifoldCF 0.3
Reporter: Karl Wright

When working with Derby and indexing documents rapidly, sometimes the following deadlock stack trace is thrown:

at org.apache.manifoldcf.core.database.DBInterfaceDerby.reinterpretException(DBInterfaceDerby.java:803)
at org.apache.manifoldcf.core.database.DBInterfaceDerby.performQuery(DBInterfaceDerby.java:961)
at org.apache.manifoldcf.core.database.BaseTable.performQuery(BaseTable.java:229)
at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.noteDocumentIngest(IncrementalIngester.java:1372)
at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.performIngestion(IncrementalIngester.java:469)
at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:365)
at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocument(WorkerThread.java:1587)
at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.processDocuments(WebcrawlerConnector.java:1222)
at org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector.processDocuments(BaseRepositoryConnector.java:423)
at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.ja
[jira] [Resolved] (CONNECTORS-114) Derby seems too unstable in multithreaded situations to be a good database for ManifoldCF, so try to add support for HSQLDB
[ https://issues.apache.org/jira/browse/CONNECTORS-114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright resolved CONNECTORS-114.

Resolution: Fixed
Fix Version/s: ManifoldCF 0.3
Assignee: Karl Wright

I have not yet made HSQLDB the official Derby replacement, but it is currently a better embedded option for many situations than Derby is.

> Derby seems too unstable in multithreaded situations to be a good database for ManifoldCF, so try to add support for HSQLDB
>
> Key: CONNECTORS-114
> URL: https://issues.apache.org/jira/browse/CONNECTORS-114
> Project: ManifoldCF
> Issue Type: Bug
> Components: Framework core
> Reporter: Karl Wright
> Assignee: Karl Wright
> Fix For: ManifoldCF 0.3
>
> Derby seems to have multiple problems:
> (1) It has internal deadlocks, which even if caught cause poor performance due to stalling (CONNECTORS-111);
> (2) It has no support for certain SQL constructs (CONNECTORS-109 and CONNECTORS-110);
> (3) It locks up entirely for some people (CONNECTORS-100).
> HSQLDB has been recommended as another potential embedded database that might work better.
[jira] [Commented] (CONNECTORS-114) Derby seems too unstable in multithreaded situations to be a good database for ManifoldCF, so try to add support for HSQLDB
[ https://issues.apache.org/jira/browse/CONNECTORS-114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13043369#comment-13043369 ] Karl Wright commented on CONNECTORS-114:

Remaining issues with HSQLDB have been resolved, so I'm closing this ticket. r1131056.
[jira] [Commented] (CONNECTORS-110) Max activity and Max bandwidth reports don't work properly under Derby or HSQLDB
[ https://issues.apache.org/jira/browse/CONNECTORS-110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042861#comment-13042861 ] Karl Wright commented on CONNECTORS-110:

r1130644 implements this for HSQLDB. Unfortunately, performance is extremely slow, even when the number of rows in the temporary table is only a few dozen.
[jira] [Commented] (CONNECTORS-110) Max activity and Max bandwidth reports don't work properly under Derby or HSQLDB
[ https://issues.apache.org/jira/browse/CONNECTORS-110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042669#comment-13042669 ] Karl Wright commented on CONNECTORS-110: Updated suggestion from Fred pertaining to HSQLDB: Use WITH statement, as follows: WITH invoice ( customerid, id, total) AS ( complex select statement) SELECT * FROM (SELECT DISTINCT customerid FROM invoice) AS i_one, LATERAL ( SELECT id, total FROM invoice WHERE customerid = i_one.customerid ORDER BY total DESC LIMIT 1) AS i_two I believe this can actually be generated in a manner that fits the current abstraction. > Max activity and Max bandwidth reports don't work properly under Derby or > HSQLDB > > > Key: CONNECTORS-110 > URL: https://issues.apache.org/jira/browse/CONNECTORS-110 > Project: ManifoldCF > Issue Type: Bug > Components: Framework crawler agent >Reporter: Karl Wright > > The reason for the failure is because the queries used are doing the > Postgresql DISTINCT ON (xxx) syntax, which Derby does not support. > Unfortunately, there does not seem to be a way in Derby at present to do > anything similar to DISTINCT ON (xxx), and the queries really can't be done > without that. > One option is to introduce a getCapabilities() method into the database > implementation, which would allow ACF to query the database capabilities > before even presenting the report in the navigation menu in the UI. Another > alternative is to do a sizable chunk of resultset processing within ACF, > which would require not only the DISTINCT ON() implementation, but also the > enclosing sort and limit stuff. It's the latter that would be most > challenging, because of the difficulties with i18n etc. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CONNECTORS-110) Max activity and Max bandwidth reports don't work properly under Derby or HSQLDB
[ https://issues.apache.org/jira/browse/CONNECTORS-110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright updated CONNECTORS-110: --- Summary: Max activity and Max bandwidth reports don't work properly under Derby or HSQLDB (was: Max activity and Max bandwidth reports don't work properly under Derby) > Max activity and Max bandwidth reports don't work properly under Derby or > HSQLDB > > > Key: CONNECTORS-110 > URL: https://issues.apache.org/jira/browse/CONNECTORS-110 > Project: ManifoldCF > Issue Type: Bug > Components: Framework crawler agent >Reporter: Karl Wright > > The reason for the failure is because the queries used are doing the > Postgresql DISTINCT ON (xxx) syntax, which Derby does not support. > Unfortunately, there does not seem to be a way in Derby at present to do > anything similar to DISTINCT ON (xxx), and the queries really can't be done > without that. > One option is to introduce a getCapabilities() method into the database > implementation, which would allow ACF to query the database capabilities > before even presenting the report in the navigation menu in the UI. Another > alternative is to do a sizable chunk of resultset processing within ACF, > which would require not only the DISTINCT ON() implementation, but also the > enclosing sort and limit stuff. It's the latter that would be most > challenging, because of the difficulties with i18n etc. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CONNECTORS-110) Max activity and Max bandwidth reports don't work properly under Derby
[ https://issues.apache.org/jira/browse/CONNECTORS-110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042655#comment-13042655 ] Karl Wright commented on CONNECTORS-110: HSQLDB is now also in roughly the same situation, although I've gotten a rough outline of a way to make this work involving temporary tables. This is as follows: SELECT * FROM (SELECT DISTINCT customerid FROM invoice) AS i_one, LATERAL ( SELECT id, total FROM invoice WHERE customerid = i_one.customerid ORDER BY total DESC LIMIT 1) AS i_two ... where "invoice" would be a temporary table created on the fly, as follows: DECLARE LOCAL TEMPORARY TABLE T AS (SELECT statement) [ON COMMIT { PRESERVE | DELETE } ROWS] For example: DECLARE LOCAL TEMPORARY TABLE invoice AS (SELECT * FROM whatever) ON COMMIT DELETE ROWS WITH DATA then perform the kind of query I suggested. The issue is that this does not fit into our single-query abstraction metaphor at all. Maybe a (different but identically named) stored procedure could be generated on all three databases that would do the trick. Alternatively, all databases could go the temporary table route, but then PostgreSQL would be unnecessarily crippled. > Max activity and Max bandwidth reports don't work properly under Derby > -- > > Key: CONNECTORS-110 > URL: https://issues.apache.org/jira/browse/CONNECTORS-110 > Project: ManifoldCF > Issue Type: Bug > Components: Framework crawler agent >Reporter: Karl Wright > > The reason for the failure is because the queries used are doing the > Postgresql DISTINCT ON (xxx) syntax, which Derby does not support. > Unfortunately, there does not seem to be a way in Derby at present to do > anything similar to DISTINCT ON (xxx), and the queries really can't be done > without that. 
> One option is to introduce a getCapabilities() method into the database > implementation, which would allow ACF to query the database capabilities > before even presenting the report in the navigation menu in the UI. Another > alternative is to do a sizable chunk of resultset processing within ACF, > which would require not only the DISTINCT ON() implementation, but also the > enclosing sort and limit stuff. It's the latter that would be most > challenging, because of the difficulties with i18n etc. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
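[Editor's sketch] The temporary-table route described above (materialize the complex SELECT, then take the top row per distinct key, as HSQLDB's LATERAL would) can be outlined as follows. SQLite is used as a stand-in, its CREATE TEMPORARY TABLE replacing HSQLDB's DECLARE LOCAL TEMPORARY TABLE, and LATERAL is emulated with a per-key loop; table and column names are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE activity (customerid TEXT, id INTEGER, total REAL)")
conn.executemany("INSERT INTO activity VALUES (?, ?, ?)",
                 [("a", 1, 10.0), ("a", 2, 30.0), ("b", 3, 20.0)])

# Step 1: materialize the "complex select statement" into a temporary table
# (HSQLDB: DECLARE LOCAL TEMPORARY TABLE invoice AS (...) ... WITH DATA).
conn.execute("CREATE TEMPORARY TABLE invoice AS SELECT * FROM activity")

# Step 2: for each distinct customerid, pull the single top row by total,
# mirroring the LATERAL (... ORDER BY total DESC LIMIT 1) subquery.
best = []
for (cid,) in conn.execute(
        "SELECT DISTINCT customerid FROM invoice ORDER BY customerid"):
    best += conn.execute(
        "SELECT id, total FROM invoice WHERE customerid = ? "
        "ORDER BY total DESC LIMIT 1", (cid,)).fetchall()
print(best)  # [(2, 30.0), (3, 20.0)]
```

The loop makes visible why this "does not fit the single-query abstraction": two statements plus client-side iteration replace what PostgreSQL does in one query.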
[jira] [Commented] (CONNECTORS-114) Derby seems too unstable in multithreaded situations to be a good database for ManifoldCF, so try to add support for HSQLDB
[ https://issues.apache.org/jira/browse/CONNECTORS-114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13041339#comment-13041339 ] Karl Wright commented on CONNECTORS-114: Just got email from the HSQLDB team, and confirmed that the deadlock issue was resolved in hsqldb 2.2.2. So it looks like we have a third database that ManifoldCF can work with. I've checked in the updated database jar, and am planning on writing a test series that uses hsqldb, much like the series that uses PostgreSQL. We've still got to settle on how precisely to do the equivalent of PostgreSQL's DISTINCT ON functionality, but that's all that is left. Also, FWIW, HSQLDB doesn't (as yet) seem to fail so spectacularly dealing with hopcounts as Derby does. > Derby seems too unstable in multithreaded situations to be a good database > for ManifoldCF, so try to add support for HSQLDB > --- > > Key: CONNECTORS-114 > URL: https://issues.apache.org/jira/browse/CONNECTORS-114 > Project: ManifoldCF > Issue Type: Bug > Components: Framework core >Reporter: Karl Wright > > Derby seems to have multiple problems: > (1) It has internal deadlocks, which even if caught cause poor performance > due to stalling (CONNECTORS-111); > (2) It has no support for certain SQL constructs (CONNECTORS-109 and > CONNECTORS-110); > (3) It locks up entirely for some people (CONNECTORS-100). > HSQLDB has been recommended as another potential embedded database that might > work better. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (CONNECTORS-175) The site documentation property list does not include the PostgreSQL-specific parameters, and may be missing some of the Derby ones too
[ https://issues.apache.org/jira/browse/CONNECTORS-175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright resolved CONNECTORS-175. Resolution: Fixed Fix Version/s: ManifoldCF next r1089704. > The site documentation property list does not include the PostgreSQL-specific > parameters, and may be missing some of the Derby ones too > --- > > Key: CONNECTORS-175 > URL: https://issues.apache.org/jira/browse/CONNECTORS-175 > Project: ManifoldCF > Issue Type: Improvement > Components: Documentation >Affects Versions: ManifoldCF next >Reporter: Karl Wright >Assignee: Karl Wright >Priority: Minor > Fix For: ManifoldCF next > > > The table that documents all the properties in properties.xml seems to be > missing the PostgreSQL-specific ones. This is the > how-to-build-and-deploy.html page. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (CONNECTORS-175) The site documentation property list does not include the PostgreSQL-specific parameters, and may be missing some of the Derby ones too
[ https://issues.apache.org/jira/browse/CONNECTORS-175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright reassigned CONNECTORS-175: -- Assignee: Karl Wright > The site documentation property list does not include the PostgreSQL-specific > parameters, and may be missing some of the Derby ones too > --- > > Key: CONNECTORS-175 > URL: https://issues.apache.org/jira/browse/CONNECTORS-175 > Project: ManifoldCF > Issue Type: Improvement > Components: Documentation >Affects Versions: ManifoldCF next >Reporter: Karl Wright >Assignee: Karl Wright >Priority: Minor > > The table that documents all the properties in properties.xml seems to be > missing the PostgreSQL-specific ones. This is the > how-to-build-and-deploy.html page. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CONNECTORS-175) The site documentation property list does not include the PostgreSQL-specific parameters, and may be missing some of the Derby ones too
[ https://issues.apache.org/jira/browse/CONNECTORS-175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016641#comment-13016641 ] Karl Wright commented on CONNECTORS-175: The QuickStart parameters org.apache.manifoldcf.dbsuperusername and org.apache.manifoldcf.dbsuperuserpassword are definitely missing. > The site documentation property list does not include the PostgreSQL-specific > parameters, and may be missing some of the Derby ones too > --- > > Key: CONNECTORS-175 > URL: https://issues.apache.org/jira/browse/CONNECTORS-175 > Project: ManifoldCF > Issue Type: Improvement > Components: Documentation >Affects Versions: ManifoldCF next >Reporter: Karl Wright >Priority: Minor > > The table that documents all the properties in properties.xml seems to be > missing the PostgreSQL-specific ones. This is the > how-to-build-and-deploy.html page. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (CONNECTORS-178) Implement ability to run ManifoldCF with Derby in multiprocess mode
Implement ability to run ManifoldCF with Derby in multiprocess mode --- Key: CONNECTORS-178 URL: https://issues.apache.org/jira/browse/CONNECTORS-178 Project: ManifoldCF Issue Type: Bug Components: Documentation, Framework core Reporter: Karl Wright Priority: Minor Derby has a standalone server mode, which we can no doubt use if we modify the Derby driver to accept a configuration parameter which allows you to choose between the embedded driver and the client driver. It might be useful to be able to run ManifoldCF with Derby in this manner. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
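[Editor's sketch] The configuration-parameter idea above amounts to choosing between Derby's two standard JDBC URL forms. A small sketch of that selection logic (the parameter name and function are hypothetical; the URL shapes are Derby's documented embedded and network-client forms):

```python
def derby_jdbc_url(dbname, use_client_driver, host="localhost", port=1527):
    """Build a Derby JDBC URL for either the embedded or the client driver."""
    if use_client_driver:
        # Network server mode, served via org.apache.derby.jdbc.ClientDriver.
        return "jdbc:derby://%s:%d/%s" % (host, port, dbname)
    # Single-process mode, via org.apache.derby.jdbc.EmbeddedDriver.
    return "jdbc:derby:%s" % dbname

print(derby_jdbc_url("mcf", False))  # jdbc:derby:mcf
print(derby_jdbc_url("mcf", True))   # jdbc:derby://localhost:1527/mcf
```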
[jira] [Commented] (CONNECTORS-110) Max activity and Max bandwidth reports don't work properly under Derby
[ https://issues.apache.org/jira/browse/CONNECTORS-110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015015#comment-13015015 ] Karl Wright commented on CONNECTORS-110: This ticket is stalled because it requires a new Derby feature to resolve. The resolution will be to assess the current version of Derby and find out whether the required feature has been added, and barring that, opening a Derby ticket for the feature. > Max activity and Max bandwidth reports don't work properly under Derby > -- > > Key: CONNECTORS-110 > URL: https://issues.apache.org/jira/browse/CONNECTORS-110 > Project: ManifoldCF > Issue Type: Bug > Components: Framework crawler agent >Reporter: Karl Wright > > The reason for the failure is because the queries used are doing the > Postgresql DISTINCT ON (xxx) syntax, which Derby does not support. > Unfortunately, there does not seem to be a way in Derby at present to do > anything similar to DISTINCT ON (xxx), and the queries really can't be done > without that. > One option is to introduce a getCapabilities() method into the database > implementation, which would allow ACF to query the database capabilities > before even presenting the report in the navigation menu in the UI. Another > alternative is to do a sizable chunk of resultset processing within ACF, > which would require not only the DISTINCT ON() implementation, but also the > enclosing sort and limit stuff. It's the latter that would be most > challenging, because of the difficulties with i18n etc. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CONNECTORS-110) Max activity and Max bandwidth reports don't work properly under Derby
[ https://issues.apache.org/jira/browse/CONNECTORS-110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright updated CONNECTORS-110: --- Summary: Max activity and Max bandwidth reports don't work properly under Derby (was: Max activity and Max bandwidth reports fail under Derby with a stack trace) > Max activity and Max bandwidth reports don't work properly under Derby > -- > > Key: CONNECTORS-110 > URL: https://issues.apache.org/jira/browse/CONNECTORS-110 > Project: ManifoldCF > Issue Type: Bug > Components: Framework crawler agent >Reporter: Karl Wright > > The reason for the failure is because the queries used are doing the > Postgresql DISTINCT ON (xxx) syntax, which Derby does not support. > Unfortunately, there does not seem to be a way in Derby at present to do > anything similar to DISTINCT ON (xxx), and the queries really can't be done > without that. > One option is to introduce a getCapabilities() method into the database > implementation, which would allow ACF to query the database capabilities > before even presenting the report in the navigation menu in the UI. Another > alternative is to do a sizable chunk of resultset processing within ACF, > which would require not only the DISTINCT ON() implementation, but also the > enclosing sort and limit stuff. It's the latter that would be most > challenging, because of the difficulties with i18n etc. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CONNECTORS-114) Derby seems too unstable in multithreaded situations to be a good database for ManifoldCF, so try to add support for HSQLDB
[ https://issues.apache.org/jira/browse/CONNECTORS-114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015013#comment-13015013 ] Karl Wright commented on CONNECTORS-114: This is stalled, because HSQLDB is not yet ready for the kinds of demands that ManifoldCF will put on it. Working with Derby seems more appropriate since they've been able to respond to bugs. > Derby seems too unstable in multithreaded situations to be a good database > for ManifoldCF, so try to add support for HSQLDB > --- > > Key: CONNECTORS-114 > URL: https://issues.apache.org/jira/browse/CONNECTORS-114 > Project: ManifoldCF > Issue Type: Bug > Components: Framework core >Reporter: Karl Wright > > Derby seems to have multiple problems: > (1) It has internal deadlocks, which even if caught cause poor performance > due to stalling (CONNECTORS-111); > (2) It has no support for certain SQL constructs (CONNECTORS-109 and > CONNECTORS-110); > (3) It locks up entirely for some people (CONNECTORS-100). > HSQLDB has been recommended as another potential embedded database that might > work better. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (CONNECTORS-175) The site documentation property list does not include the PostgreSQL-specific parameters, and may be missing some of the Derby ones too
The site documentation property list does not include the PostgreSQL-specific parameters, and may be missing some of the Derby ones too --- Key: CONNECTORS-175 URL: https://issues.apache.org/jira/browse/CONNECTORS-175 Project: ManifoldCF Issue Type: Improvement Components: Documentation Affects Versions: ManifoldCF next Reporter: Karl Wright Priority: Minor The table that documents all the properties in properties.xml seems to be missing the PostgreSQL-specific ones. This is the how-to-build-and-deploy.html page. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Resolved: (CONNECTORS-170) Derby database driver needs to periodically update statistics
[ https://issues.apache.org/jira/browse/CONNECTORS-170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright resolved CONNECTORS-170. Resolution: Fixed Fix Version/s: ManifoldCF 0.2 r1082598. > Derby database driver needs to periodically update statistics > - > > Key: CONNECTORS-170 > URL: https://issues.apache.org/jira/browse/CONNECTORS-170 > Project: ManifoldCF > Issue Type: Bug > Components: Framework core >Affects Versions: ManifoldCF 0.2 >Reporter: Karl Wright >Assignee: Karl Wright > Fix For: ManifoldCF 0.2 > > > The Derby database driver needs to update statistics periodically, using > logic similar to that developed for PostgreSQL. The way that's done is > through calling SYSCS_UTIL.SYSCS_UPDATE_STATISTICS on the table in question. > http://db.apache.org/derby/docs/10.7/ref/rrefupdatestatsproc.html. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Created: (CONNECTORS-170) Derby database driver needs to periodically update statistics
Derby database driver needs to periodically update statistics - Key: CONNECTORS-170 URL: https://issues.apache.org/jira/browse/CONNECTORS-170 Project: ManifoldCF Issue Type: Bug Components: Framework core Affects Versions: ManifoldCF 0.2 Reporter: Karl Wright The Derby database driver needs to update statistics periodically, using logic similar to that developed for PostgreSQL. The way that's done is through calling SYSCS_UTIL.SYSCS_UPDATE_STATISTICS on the table in question. http://db.apache.org/derby/docs/10.7/ref/rrefupdatestatsproc.html. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
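[Editor's sketch] The "update statistics periodically, using logic similar to PostgreSQL's" idea can be outlined as a modification counter that triggers the database's statistics command past a threshold. SQLite's ANALYZE stands in for Derby's SYSCS_UTIL.SYSCS_UPDATE_STATISTICS call (Derby itself needs its own jar); the class, threshold, and table names are hypothetical:

```python
import sqlite3

class StatsTracker:
    """Re-runs the database's statistics command after enough modifications,
    loosely mirroring the PostgreSQL-style logic the ticket refers to."""

    def __init__(self, conn, threshold=1000):
        self.conn = conn
        self.threshold = threshold
        self.modifications = 0

    def note_modification(self, count=1):
        self.modifications += count
        if self.modifications >= self.threshold:
            # Derby equivalent (hypothetical schema/table names):
            #   CALL SYSCS_UTIL.SYSCS_UPDATE_STATISTICS('APP', 'CARRYDOWN', null)
            self.conn.execute("ANALYZE")
            self.modifications = 0

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE carrydown (id INTEGER PRIMARY KEY, data TEXT)")
tracker = StatsTracker(conn, threshold=10)
for i in range(25):
    conn.execute("INSERT INTO carrydown (data) VALUES (?)", (str(i),))
    tracker.note_modification()
print(tracker.modifications)  # 5 inserts remain uncounted since the last ANALYZE
```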
[jira] Assigned: (CONNECTORS-170) Derby database driver needs to periodically update statistics
[ https://issues.apache.org/jira/browse/CONNECTORS-170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright reassigned CONNECTORS-170: -- Assignee: Karl Wright > Derby database driver needs to periodically update statistics > - > > Key: CONNECTORS-170 > URL: https://issues.apache.org/jira/browse/CONNECTORS-170 > Project: ManifoldCF > Issue Type: Bug > Components: Framework core >Affects Versions: ManifoldCF 0.2 >Reporter: Karl Wright > Assignee: Karl Wright > > The Derby database driver needs to update statistics periodically, using > logic similar to that developed for PostgreSQL. The way that's done is > through calling SYSCS_UTIL.SYSCS_UPDATE_STATISTICS on the table in question. > http://db.apache.org/derby/docs/10.7/ref/rrefupdatestatsproc.html. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Resolved: (CONNECTORS-166) Crawl seizes up when running Derby
[ https://issues.apache.org/jira/browse/CONNECTORS-166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright resolved CONNECTORS-166. Resolution: Fixed Fix Version/s: ManifoldCF 0.2 r1082140. > Crawl seizes up when running Derby > -- > > Key: CONNECTORS-166 > URL: https://issues.apache.org/jira/browse/CONNECTORS-166 > Project: ManifoldCF > Issue Type: Bug > Components: Framework crawler agent >Affects Versions: ManifoldCF 0.1, ManifoldCF 0.2 >Reporter: Karl Wright >Assignee: Karl Wright > Fix For: ManifoldCF 0.2 > > > A crawl using multiple worker threads with Derby eventually hangs, because > threads get deadlocked dealing with carrydown information. At the time of > hang, a thread dump yields: > "Worker thread '5'" daemon prio=6 tid=0x02fc7800 nid=0xd78 in Object.wait() > [0x0465f000] >java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > - waiting on <0x2858b720> (a > org.apache.manifoldcf.core.database.Database$ExecuteQueryThread) > at java.lang.Thread.join(Unknown Source) > - locked <0x2858b720> (a > org.apache.manifoldcf.core.database.Database$ExecuteQueryThread) > at java.lang.Thread.join(Unknown Source) > at > org.apache.manifoldcf.core.database.Database.executeViaThread(Database.java:453) > at > org.apache.manifoldcf.core.database.Database.executeUncachedQuery(Database.java:489) > at > org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1131) > at > org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:144) > at > org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:168) > at > org.apache.manifoldcf.core.database.DBInterfaceDerby.performQuery(DBInterfaceDerby.java:785) > at > org.apache.manifoldcf.crawler.jobs.JobManager.processDeleteHashSet(JobManager.java:2592) > at > org.apache.manifoldcf.crawler.jobs.JobManager.calculateAffectedDeleteCarrydownChildren(JobManager.java:2565) > at > 
org.apache.manifoldcf.crawler.jobs.JobManager.markDocumentDeletedMultiple(JobManager.java:2494) > at > org.apache.manifoldcf.crawler.system.WorkerThread.processDeleteLists(WorkerThread.java:1077) > at > org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:544) > ... for at least two threads. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (CONNECTORS-166) Crawl seizes up when running Derby
[ https://issues.apache.org/jira/browse/CONNECTORS-166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007401#comment-13007401 ] Karl Wright commented on CONNECTORS-166: Oleg reports that the test seems to pass. The only remaining issue is that the version of Derby built from trunk has upgrade blocked. I will therefore need to build a version of Derby based on the latest release plus the patch instead. > Crawl seizes up when running Derby > -- > > Key: CONNECTORS-166 > URL: https://issues.apache.org/jira/browse/CONNECTORS-166 > Project: ManifoldCF > Issue Type: Bug > Components: Framework crawler agent >Affects Versions: ManifoldCF 0.1, ManifoldCF 0.2 >Reporter: Karl Wright >Assignee: Karl Wright > > A crawl using multiple worker threads with Derby eventually hangs, because > threads get deadlocked dealing with carrydown information. At the time of > hang, a thread dump yields: > "Worker thread '5'" daemon prio=6 tid=0x02fc7800 nid=0xd78 in Object.wait() > [0x0465f000] >java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > - waiting on <0x2858b720> (a > org.apache.manifoldcf.core.database.Database$ExecuteQueryThread) > at java.lang.Thread.join(Unknown Source) > - locked <0x2858b720> (a > org.apache.manifoldcf.core.database.Database$ExecuteQueryThread) > at java.lang.Thread.join(Unknown Source) > at > org.apache.manifoldcf.core.database.Database.executeViaThread(Database.java:453) > at > org.apache.manifoldcf.core.database.Database.executeUncachedQuery(Database.java:489) > at > org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1131) > at > org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:144) > at > org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:168) > at > org.apache.manifoldcf.core.database.DBInterfaceDerby.performQuery(DBInterfaceDerby.java:785) > at > 
org.apache.manifoldcf.crawler.jobs.JobManager.processDeleteHashSet(JobManager.java:2592) > at > org.apache.manifoldcf.crawler.jobs.JobManager.calculateAffectedDeleteCarrydownChildren(JobManager.java:2565) > at > org.apache.manifoldcf.crawler.jobs.JobManager.markDocumentDeletedMultiple(JobManager.java:2494) > at > org.apache.manifoldcf.crawler.system.WorkerThread.processDeleteLists(WorkerThread.java:1077) > at > org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:544) > ... for at least two threads. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (CONNECTORS-166) Crawl seizes up when running Derby
[ https://issues.apache.org/jira/browse/CONNECTORS-166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006583#comment-13006583 ] Karl Wright commented on CONNECTORS-166: According to the Derby team, Derby trunk fixes this problem. I've therefore built trunk and checked it in. r1081520. > Crawl seizes up when running Derby > -- > > Key: CONNECTORS-166 > URL: https://issues.apache.org/jira/browse/CONNECTORS-166 > Project: ManifoldCF > Issue Type: Bug > Components: Framework crawler agent >Affects Versions: ManifoldCF 0.1, ManifoldCF 0.2 >Reporter: Karl Wright >Assignee: Karl Wright > > A crawl using multiple worker threads with Derby eventually hangs, because > threads get deadlocked dealing with carrydown information. At the time of > hang, a thread dump yields: > "Worker thread '5'" daemon prio=6 tid=0x02fc7800 nid=0xd78 in Object.wait() > [0x0465f000] >java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > - waiting on <0x2858b720> (a > org.apache.manifoldcf.core.database.Database$ExecuteQueryThread) > at java.lang.Thread.join(Unknown Source) > - locked <0x2858b720> (a > org.apache.manifoldcf.core.database.Database$ExecuteQueryThread) > at java.lang.Thread.join(Unknown Source) > at > org.apache.manifoldcf.core.database.Database.executeViaThread(Database.java:453) > at > org.apache.manifoldcf.core.database.Database.executeUncachedQuery(Database.java:489) > at > org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1131) > at > org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:144) > at > org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:168) > at > org.apache.manifoldcf.core.database.DBInterfaceDerby.performQuery(DBInterfaceDerby.java:785) > at > org.apache.manifoldcf.crawler.jobs.JobManager.processDeleteHashSet(JobManager.java:2592) > at > 
org.apache.manifoldcf.crawler.jobs.JobManager.calculateAffectedDeleteCarrydownChildren(JobManager.java:2565) > at > org.apache.manifoldcf.crawler.jobs.JobManager.markDocumentDeletedMultiple(JobManager.java:2494) > at > org.apache.manifoldcf.crawler.system.WorkerThread.processDeleteLists(WorkerThread.java:1077) > at > org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:544) > ... for at least two threads. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Created: (CONNECTORS-166) Crawl seizes up when running Derby
Crawl seizes up when running Derby -- Key: CONNECTORS-166 URL: https://issues.apache.org/jira/browse/CONNECTORS-166 Project: ManifoldCF Issue Type: Bug Components: Framework crawler agent Affects Versions: ManifoldCF 0.1, ManifoldCF next Reporter: Karl Wright Assignee: Karl Wright A crawl using multiple worker threads with Derby eventually hangs, because threads get deadlocked dealing with carrydown information. At the time of hang, a thread dump yields: "Worker thread '5'" daemon prio=6 tid=0x02fc7800 nid=0xd78 in Object.wait() [0x0465f000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x2858b720> (a org.apache.manifoldcf.core.database.Database$ExecuteQueryThread) at java.lang.Thread.join(Unknown Source) - locked <0x2858b720> (a org.apache.manifoldcf.core.database.Database$ExecuteQueryThread) at java.lang.Thread.join(Unknown Source) at org.apache.manifoldcf.core.database.Database.executeViaThread(Database.java:453) at org.apache.manifoldcf.core.database.Database.executeUncachedQuery(Database.java:489) at org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1131) at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:144) at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:168) at org.apache.manifoldcf.core.database.DBInterfaceDerby.performQuery(DBInterfaceDerby.java:785) at org.apache.manifoldcf.crawler.jobs.JobManager.processDeleteHashSet(JobManager.java:2592) at org.apache.manifoldcf.crawler.jobs.JobManager.calculateAffectedDeleteCarrydownChildren(JobManager.java:2565) at org.apache.manifoldcf.crawler.jobs.JobManager.markDocumentDeletedMultiple(JobManager.java:2494) at org.apache.manifoldcf.crawler.system.WorkerThread.processDeleteLists(WorkerThread.java:1077) at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:544) ... for at least two threads. 
-- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Resolved: (CONNECTORS-163) Go to current version of Derby, to try and avoid internal deadlocks
[ https://issues.apache.org/jira/browse/CONNECTORS-163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright resolved CONNECTORS-163. Resolution: Fixed Fix Version/s: ManifoldCF next r1074064. > Go to current version of Derby, to try and avoid internal deadlocks > --- > > Key: CONNECTORS-163 > URL: https://issues.apache.org/jira/browse/CONNECTORS-163 > Project: ManifoldCF > Issue Type: Improvement > Components: Framework core >Affects Versions: ManifoldCF next >Reporter: Karl Wright >Assignee: Karl Wright > Fix For: ManifoldCF next > > > Derby 10.5.3.0 irrecoverably deadlocks on the straightforward correlated > subqueries involving the carrydown table. The source of the problem is not > clear. However, there's a newer version of Derby available. If it passes > the tests, I recommend trying that to see if the problem is fixed. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (CONNECTORS-163) Go to current version of Derby, to try and avoid internal deadlocks
[ https://issues.apache.org/jira/browse/CONNECTORS-163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright updated CONNECTORS-163: --- Description: Derby 10.5.3.0 irrecoverably deadlocks on the straightforward correlated subqueries involving the carrydown table. The source of the problem is not clear. However, there's a newer version of Derby available. If it passes the tests, I recommend trying that to see if the problem is fixed. (was: Derby 10.5.3.0 internally deadlocks on the straightforward correlated subqueries involving the carrydown table. The source of the problem is not clear. However, there's a newer version of Derby available. If it passes the tests, I recommend trying that to see if the problem is fixed.) > Go to current version of Derby, to try and avoid internal deadlocks > --- > > Key: CONNECTORS-163 > URL: https://issues.apache.org/jira/browse/CONNECTORS-163 > Project: ManifoldCF > Issue Type: Improvement > Components: Framework core >Affects Versions: ManifoldCF next >Reporter: Karl Wright >Assignee: Karl Wright > > Derby 10.5.3.0 irrecoverably deadlocks on the straightforward correlated > subqueries involving the carrydown table. The source of the problem is not > clear. However, there's a newer version of Derby available. If it passes > the tests, I recommend trying that to see if the problem is fixed. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Created: (CONNECTORS-163) Go to current version of Derby, to try and avoid internal deadlocks
Go to current version of Derby, to try and avoid internal deadlocks --- Key: CONNECTORS-163 URL: https://issues.apache.org/jira/browse/CONNECTORS-163 Project: ManifoldCF Issue Type: Improvement Components: Framework core Affects Versions: ManifoldCF next Reporter: Karl Wright Derby 10.5.3.0 internally deadlocks on the straightforward correlated subqueries involving the carrydown table. The source of the problem is not clear. However, there's a newer version of Derby available. If it passes the tests, I recommend trying that to see if the problem is fixed. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Assigned: (CONNECTORS-163) Go to current version of Derby, to try and avoid internal deadlocks
[ https://issues.apache.org/jira/browse/CONNECTORS-163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright reassigned CONNECTORS-163: -- Assignee: Karl Wright > Go to current version of Derby, to try and avoid internal deadlocks > --- > > Key: CONNECTORS-163 > URL: https://issues.apache.org/jira/browse/CONNECTORS-163 > Project: ManifoldCF > Issue Type: Improvement > Components: Framework core >Affects Versions: ManifoldCF next >Reporter: Karl Wright > Assignee: Karl Wright > > Derby 10.5.3.0 internally deadlocks on the straightforward correlated > subqueries involving the carrydown table. The source of the problem is not > clear. However, there's a newer version of Derby available. If it passes > the tests, I recommend trying that to see if the problem is fixed. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Resolved: (CONNECTORS-123) Document status report does not display the correct status under Derby
[ https://issues.apache.org/jira/browse/CONNECTORS-123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright resolved CONNECTORS-123. Resolution: Fixed It looks like this is a Derby bug, but it can be worked around by rearranging certain expressions. r1029937. > Document status report does not display the correct status under Derby > -- > > Key: CONNECTORS-123 > URL: https://issues.apache.org/jira/browse/CONNECTORS-123 > Project: ManifoldCF > Issue Type: Bug > Components: Framework agents process >Reporter: Karl Wright >Assignee: Karl Wright >Priority: Minor > > The document status report displays a status of "Unknown" for documents that > are in the PENDING_PURGATORY state where the action time is greater than the > current time, and the action is RESCAN. The status that should be displayed > is "Waiting for processing". This only happens if the database is Derby. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (CONNECTORS-123) Document status report does not display the correct status under Derby
[ https://issues.apache.org/jira/browse/CONNECTORS-123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright reassigned CONNECTORS-123: -- Assignee: Karl Wright > Document status report does not display the correct status under Derby > -- > > Key: CONNECTORS-123 > URL: https://issues.apache.org/jira/browse/CONNECTORS-123 > Project: ManifoldCF > Issue Type: Bug > Components: Framework agents process >Reporter: Karl Wright >Assignee: Karl Wright >Priority: Minor > > The document status report displays a status of "Unknown" for documents that > are in the PENDING_PURGATORY state where the action time is greater than the > current time, and the action is RESCAN. The status that should be displayed > is "Waiting for processing". This only happens if the database is Derby. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (CONNECTORS-123) Document status report does not display the correct status under Derby
Document status report does not display the correct status under Derby -- Key: CONNECTORS-123 URL: https://issues.apache.org/jira/browse/CONNECTORS-123 Project: ManifoldCF Issue Type: Bug Components: Framework agents process Reporter: Karl Wright Priority: Minor The document status report displays a status of "Unknown" for documents that are in the PENDING_PURGATORY state where the action time is greater than the current time, and the action is RESCAN. The status that should be displayed is "Waiting for processing". This only happens if the database is Derby. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (CONNECTORS-109) Queue status report fails under Derby
[ https://issues.apache.org/jira/browse/CONNECTORS-109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright resolved CONNECTORS-109. Resolution: Fixed Fix Version/s: LCF Release 0.5 Hooked up user-defined functions to perform regular expression matching in Derby. r1029455. > Queue status report fails under Derby > - > > Key: CONNECTORS-109 > URL: https://issues.apache.org/jira/browse/CONNECTORS-109 > Project: ManifoldCF > Issue Type: Bug > Components: Framework crawler agent >Reporter: Karl Wright >Assignee: Karl Wright > Fix For: LCF Release 0.5 > > > If you try to use the queue status report with Derby as the database, you get > the following error: > 2010-09-17 18:03:21.558:WARN::Nested in javax.servlet.ServletException: > org.apache.acf.core.interfaces.ACFException: Database exception: Exception doing query: > Syntax error: Encountered "SUBSTRING" at line 1, column 8.: > org.apache.acf.core.interfaces.ACFException: Database exception: Exception doing query: Syntax error: Encountered "SUBSTRING" at line 1, column 8.
> at org.apache.acf.core.database.Database.executeViaThread(Database.java:421)
> at org.apache.acf.core.database.Database.executeUncachedQuery(Database.java:465)
> at org.apache.acf.core.database.Database$QueryCacheExecutor.create(Database.java:1072)
> at org.apache.acf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:144)
> at org.apache.acf.core.database.Database.executeQuery(Database.java:167)
> at org.apache.acf.core.database.DBInterfaceDerby.performQuery(DBInterfaceDerby.java:751)
> at org.apache.acf.crawler.jobs.JobManager.genQueueStatus(JobManager.java:5981)
> at org.apache.jsp.queuestatus_jsp._jspService(queuestatus_jsp.java:769)
> at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
> at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:377)
> at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:313)
> at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:260)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
> at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
> at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:390)
> at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
> at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765)
> at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418)
> at org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:327)
> at org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:126)
> at org.apache.jasper.runtime.PageContextImpl.doForward(PageContextImpl.java:706)
> at org.apache.jasper.runtime.PageContextImpl.forward(PageContextImpl.java:677)
> at org.apache.jsp.execute_jsp._jspService(execute_jsp.java:1291)
> at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
> at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:377)
> at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:313)
> at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:260)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
> at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
> at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:390)
> at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
> at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765)
> at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418)
> at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerColl
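The fix referenced above (r1029455) hooks regular-expression matching into Derby via user-defined functions, since Derby's SQL dialect lacks the constructs the report queries needed. A minimal sketch of how such a UDF can be wired up (the class, method, and function names here are illustrative, not the actual ManifoldCF code):

```java
// Illustrative sketch: exposing Java regex matching to Derby as a UDF.
// Derby binds a public static method to a SQL function via
// CREATE FUNCTION ... EXTERNAL NAME, e.g. (hypothetical DDL):
//
//   CREATE FUNCTION MATCHES(value VARCHAR(255), pattern VARCHAR(255))
//   RETURNS INTEGER PARAMETER STYLE JAVA NO SQL LANGUAGE JAVA
//   EXTERNAL NAME 'RegexpFunctions.regexpMatches'
//
// after which queries can call MATCHES(...) instead of unsupported syntax.
public class RegexpFunctions {
  // Returns 1 when the value contains a match for the pattern, else 0.
  public static int regexpMatches(String value, String pattern) {
    if (value == null || pattern == null)
      return 0;
    return java.util.regex.Pattern.compile(pattern).matcher(value).find() ? 1 : 0;
  }
}
```

Returning an int rather than a boolean keeps the mapping to Derby's SQL INTEGER type straightforward.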
[jira] Assigned: (CONNECTORS-109) Queue status report fails under Derby
[ https://issues.apache.org/jira/browse/CONNECTORS-109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright reassigned CONNECTORS-109: -- Assignee: Karl Wright > Queue status report fails under Derby > - > > Key: CONNECTORS-109 > URL: https://issues.apache.org/jira/browse/CONNECTORS-109 > Project: ManifoldCF > Issue Type: Bug > Components: Framework crawler agent >Reporter: Karl Wright >Assignee: Karl Wright > > If you try to use the queue status report with Derby as the database, you get > the following error: > 2010-09-17 18:03:21.558:WARN::Nested in javax.servlet.ServletException: > org.apache.acf.core.interfaces.ACFException: Database exception: Exception doing query: > Syntax error: Encountered "SUBSTRING" at line 1, column 8.: > org.apache.acf.core.interfaces.ACFException: Database exception: Exception doing query: Syntax error: Encountered "SUBSTRING" at line 1, column 8.
> at org.apache.acf.core.database.Database.executeViaThread(Database.java:421)
> at org.apache.acf.core.database.Database.executeUncachedQuery(Database.java:465)
> at org.apache.acf.core.database.Database$QueryCacheExecutor.create(Database.java:1072)
> at org.apache.acf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:144)
> at org.apache.acf.core.database.Database.executeQuery(Database.java:167)
> at org.apache.acf.core.database.DBInterfaceDerby.performQuery(DBInterfaceDerby.java:751)
> at org.apache.acf.crawler.jobs.JobManager.genQueueStatus(JobManager.java:5981)
> at org.apache.jsp.queuestatus_jsp._jspService(queuestatus_jsp.java:769)
> at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
> at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:377)
> at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:313)
> at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:260)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
> at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
> at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:390)
> at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
> at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765)
> at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418)
> at org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:327)
> at org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:126)
> at org.apache.jasper.runtime.PageContextImpl.doForward(PageContextImpl.java:706)
> at org.apache.jasper.runtime.PageContextImpl.forward(PageContextImpl.java:677)
> at org.apache.jsp.execute_jsp._jspService(execute_jsp.java:1291)
> at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
> at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:377)
> at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:313)
> at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:260)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
> at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
> at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:390)
> at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
> at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765)
> at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418)
> at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
> at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
> at org.mortbay.jetty.Server.handl
[jira] Commented: (CONNECTORS-114) Derby seems too unstable in multithreaded situations to be a good database for ManifoldCF, so try to add support for HSQLDB
at org.hsqldb.jdbc.JDBCPreparedStatement.performPreExecute(Unknown Source)
at org.hsqldb.jdbc.JDBCPreparedStatement.fetchResult(Unknown Source)
at org.hsqldb.jdbc.JDBCPreparedStatement.executeUpdate(Unknown Source)
- locked <0x29a65798> (a org.hsqldb.jdbc.JDBCPreparedStatement)
at org.apache.manifoldcf.core.database.Database.execute(Database.java:566)
at org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:381)
Found 1 deadlock.
> Derby seems too unstable in multithreaded situations to be a good database > for ManifoldCF, so try to add support for HSQLDB > --- > > Key: CONNECTORS-114 > URL: https://issues.apache.org/jira/browse/CONNECTORS-114 > Project: ManifoldCF > Issue Type: Bug > Components: Framework core >Reporter: Karl Wright > > Derby seems to have multiple problems: > (1) It has internal deadlocks, which even if caught cause poor performance > due to stalling (CONNECTORS-111); > (2) It has no support for certain SQL constructs (CONNECTORS-109 and > CONNECTORS-110); > (3) It locks up entirely for some people (CONNECTORS-100). > HSQLDB has been recommended as another potential embedded database that might > work better. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (CONNECTORS-114) Derby seems too unstable in multithreaded situations to be a good database for ManifoldCF, so try to add support for HSQLDB
[ https://issues.apache.org/jira/browse/CONNECTORS-114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920227#action_12920227 ] Karl Wright commented on CONNECTORS-114: Support added and checked in. However, when I try to use hsqldb for an actual crawl, in less than 10 seconds I wind up with a java-level thread deadlock. I've posted the thread dump to connectors-dev. All the locks seem to be deep inside hsqldb, FWIW, which leads me to believe that perhaps hsqldb is even less stable than Derby in a multithread environment. > Derby seems too unstable in multithreaded situations to be a good database > for ManifoldCF, so try to add support for HSQLDB > --- > > Key: CONNECTORS-114 > URL: https://issues.apache.org/jira/browse/CONNECTORS-114 > Project: ManifoldCF > Issue Type: Bug > Components: Framework core >Reporter: Karl Wright > > Derby seems to have multiple problems: > (1) It has internal deadlocks, which even if caught cause poor performance > due to stalling (CONNECTORS-111); > (2) It has no support for certain SQL constructs (CONNECTORS-109 and > CONNECTORS-110); > (3) It locks up entirely for some people (CONNECTORS-100). > HSQLDB has been recommended as another potential embedded database that might > work better. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (CONNECTORS-114) Derby seems too unstable in multithreaded situations to be a good database for ManifoldCF, so try to add support for HSQLDB
Derby seems too unstable in multithreaded situations to be a good database for ManifoldCF, so try to add support for HSQLDB --- Key: CONNECTORS-114 URL: https://issues.apache.org/jira/browse/CONNECTORS-114 Project: ManifoldCF Issue Type: Bug Components: Framework core Reporter: Karl Wright Derby seems to have multiple problems: (1) It has internal deadlocks, which even if caught cause poor performance due to stalling (CONNECTORS-111); (2) It has no support for certain SQL constructs (CONNECTORS-109 and CONNECTORS-110); (3) It locks up entirely for some people (CONNECTORS-100). HSQLDB has been recommended as another potential embedded database that might work better. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (CONNECTORS-111) Encountering deadlock using quick-start & derby
[ https://issues.apache.org/jira/browse/CONNECTORS-111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright resolved CONNECTORS-111. Resolution: Fixed Fix Version/s: LCF Release 0.5 Retry seems to have fixed things. > Encountering deadlock using quick-start & derby > --- > > Key: CONNECTORS-111 > URL: https://issues.apache.org/jira/browse/CONNECTORS-111 > Project: ManifoldCF > Issue Type: Bug > Components: Examples > Environment: Windows XP Professional SR3, Intel Core 2, 2 GB Ram >Reporter: Farzad >Assignee: Karl Wright > Fix For: LCF Release 0.5 > > > Ran into problem with quick-start and thought I might have better luck if I > manually setup the system. Maybe you can shed a light on the quick-start > problem. Here is what happened, after running start.jar, I went to the > crawler UI, configured a null output and a file system repo connector. > Created a job pointing to a file share \\host\share and started the job. > After a few seconds I ran into the error message below in the job status > panel. It said 60 docs found, 9 active, and 52 processed. Any ideas as to why > I'm seeing this? > Error: A lock could not be obtained due to a deadlock, cycle of locks and > waiters is: Lock : ROW, INGESTSTATUS, (1,57) Waiting XID : > Unknown macro: {6293, X} > , APP, DELETE FROM ingeststatus WHERE urihash=? AND dockey!=? AND > connectionname=? Granted XID : > Unknown macro: {6305, X} > Lock : ROW, INGESTSTATUS, (1,55) Waiting XID : > , APP, INSERT INTO ingeststatus > (id,changecount,dockey,lastversion,firstingest,connectionname,authorityname,urihash,lastoutputversion,lastingest,docuri) > VALUES (?,?,?,?,?,?,?,?,?,?,?) Granted XID : > . The selected victim is XID : 6293. > Thanks! -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (CONNECTORS-111) Encountering deadlock using quick-start & derby
[ https://issues.apache.org/jira/browse/CONNECTORS-111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright reassigned CONNECTORS-111: -- Assignee: Karl Wright > Encountering deadlock using quick-start & derby > --- > > Key: CONNECTORS-111 > URL: https://issues.apache.org/jira/browse/CONNECTORS-111 > Project: ManifoldCF > Issue Type: Bug > Components: Examples > Environment: Windows XP Professional SR3, Intel Core 2, 2 GB Ram >Reporter: Farzad >Assignee: Karl Wright > Fix For: LCF Release 0.5 > > > Ran into problem with quick-start and thought I might have better luck if I > manually setup the system. Maybe you can shed a light on the quick-start > problem. Here is what happened, after running start.jar, I went to the > crawler UI, configured a null output and a file system repo connector. > Created a job pointing to a file share \\host\share and started the job. > After a few seconds I ran into the error message below in the job status > panel. It said 60 docs found, 9 active, and 52 processed. Any ideas as to why > I'm seeing this? > Error: A lock could not be obtained due to a deadlock, cycle of locks and > waiters is: Lock : ROW, INGESTSTATUS, (1,57) Waiting XID : > Unknown macro: {6293, X} > , APP, DELETE FROM ingeststatus WHERE urihash=? AND dockey!=? AND > connectionname=? Granted XID : > Unknown macro: {6305, X} > Lock : ROW, INGESTSTATUS, (1,55) Waiting XID : > , APP, INSERT INTO ingeststatus > (id,changecount,dockey,lastversion,firstingest,connectionname,authorityname,urihash,lastoutputversion,lastingest,docuri) > VALUES (?,?,?,?,?,?,?,?,?,?,?) Granted XID : > . The selected victim is XID : 6293. > Thanks! -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (CONNECTORS-111) Encountering deadlock using quick-start & derby
[ https://issues.apache.org/jira/browse/CONNECTORS-111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12918507#action_12918507 ] Farzad commented on CONNECTORS-111: --- I tried your fix this morning and I was able to run the job successfully the first time after setup. Seems to be resolved. I ran a few other experiments and they were fine too. Thanks! > Encountering deadlock using quick-start & derby > --- > > Key: CONNECTORS-111 > URL: https://issues.apache.org/jira/browse/CONNECTORS-111 > Project: ManifoldCF > Issue Type: Bug > Components: Examples > Environment: Windows XP Professional SR3, Intel Core 2, 2 GB Ram >Reporter: Farzad > > Ran into problem with quick-start and thought I might have better luck if I > manually setup the system. Maybe you can shed a light on the quick-start > problem. Here is what happened, after running start.jar, I went to the > crawler UI, configured a null output and a file system repo connector. > Created a job pointing to a file share \\host\share and started the job. > After a few seconds I ran into the error message below in the job status > panel. It said 60 docs found, 9 active, and 52 processed. Any ideas as to why > I'm seeing this? > Error: A lock could not be obtained due to a deadlock, cycle of locks and > waiters is: Lock : ROW, INGESTSTATUS, (1,57) Waiting XID : > Unknown macro: {6293, X} > , APP, DELETE FROM ingeststatus WHERE urihash=? AND dockey!=? AND > connectionname=? Granted XID : > Unknown macro: {6305, X} > Lock : ROW, INGESTSTATUS, (1,55) Waiting XID : > , APP, INSERT INTO ingeststatus > (id,changecount,dockey,lastversion,firstingest,connectionname,authorityname,urihash,lastoutputversion,lastingest,docuri) > VALUES (?,?,?,?,?,?,?,?,?,?,?) Granted XID : > . The selected victim is XID : 6293. > Thanks! -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (CONNECTORS-111) Encountering deadlock using quick-start & derby
[ https://issues.apache.org/jira/browse/CONNECTORS-111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12918460#action_12918460 ] Karl Wright commented on CONNECTORS-111: Looking at the complaint again: Error: A lock could not be obtained due to a deadlock, cycle of locks and waiters is: Lock : ROW, INGESTSTATUS, (1,57) Waiting XID :6293, APP, DELETE FROM ingeststatus WHERE urihash=? AND dockey\!=? AND connectionname=? Granted XID :6305 Lock : ROW, INGESTSTATUS, (1,55) Waiting XID :6305, APP, INSERT INTO ingeststatus (id,changecount,dockey,lastversion,firstingest,connectionname,authorityname,urihash,lastoutputversion,lastingest,docuri) VALUES (?,?,?,?,?,?,?,?,?,?,?) Granted XID :6293 The selected victim is XID : 6293. ... it seems more like this has nothing to do with transactions, and more to do with an internal lock-ordering problem in Derby itself. So, each database modification has a potential of throwing one of these exceptions. I tried to fix that issue by retrying the operation should I be outside of a transaction. r1004915. > Encountering deadlock using quick-start & derby > --- > > Key: CONNECTORS-111 > URL: https://issues.apache.org/jira/browse/CONNECTORS-111 > Project: ManifoldCF > Issue Type: Bug > Components: Examples > Environment: Windows XP Professional SR3, Intel Core 2, 2 GB Ram >Reporter: Farzad > > Ran into problem with quick-start and thought I might have better luck if I > manually setup the system. Maybe you can shed a light on the quick-start > problem. Here is what happened, after running start.jar, I went to the > crawler UI, configured a null output and a file system repo connector. > Created a job pointing to a file share \\host\share and started the job. > After a few seconds I ran into the error message below in the job status > panel. It said 60 docs found, 9 active, and 52 processed. Any ideas as to why > I'm seeing this? 
> Error: A lock could not be obtained due to a deadlock, cycle of locks and > waiters is: Lock : ROW, INGESTSTATUS, (1,57) Waiting XID : > Unknown macro: {6293, X} > , APP, DELETE FROM ingeststatus WHERE urihash=? AND dockey!=? AND > connectionname=? Granted XID : > Unknown macro: {6305, X} > Lock : ROW, INGESTSTATUS, (1,55) Waiting XID : > , APP, INSERT INTO ingeststatus > (id,changecount,dockey,lastversion,firstingest,connectionname,authorityname,urihash,lastoutputversion,lastingest,docuri) > VALUES (?,?,?,?,?,?,?,?,?,?,?) Granted XID : > . The selected victim is XID : 6293. > Thanks! -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
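The retry approach described in the comment above (r1004915) — re-issuing a database modification when Derby aborts it as a deadlock victim, provided the caller is not inside an application-level transaction — can be sketched roughly as follows. The class and method names are illustrative, not the actual ManifoldCF code; Derby reports a deadlock victim with the standard serialization-failure SQLState "40001":

```java
import java.sql.SQLException;

// Illustrative sketch of retrying an operation that Derby may abort as a
// deadlock victim. The real fix would only retry when the operation is
// standalone; inside a transaction the whole transaction must be replayed.
public class DeadlockRetry {
  public interface DbOperation { void run() throws SQLException; }

  public static void executeWithRetry(DbOperation op, int maxRetries)
      throws SQLException {
    int attempt = 0;
    while (true) {
      try {
        op.run();
        return;  // success
      } catch (SQLException e) {
        // Not a deadlock, or retries exhausted: propagate the exception.
        if (!"40001".equals(e.getSQLState()) || ++attempt > maxRetries)
          throw e;
        // Otherwise fall through and re-issue the operation.
      }
    }
  }
}
```

Since each Derby modification can independently be chosen as a deadlock victim, wrapping every standalone statement this way makes the symptom Farzad saw self-healing.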
[jira] Commented: (CONNECTORS-111) Encountering deadlock using quick-start & derby
[ https://issues.apache.org/jira/browse/CONNECTORS-111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12918134#action_12918134 ] Karl Wright commented on CONNECTORS-111: I made a change that should make the derby ManifoldCF database implementation more robust against exceptions thrown during commit or rollback. r1004786. See if this makes any difference in your setup. > Encountering deadlock using quick-start & derby > --- > > Key: CONNECTORS-111 > URL: https://issues.apache.org/jira/browse/CONNECTORS-111 > Project: ManifoldCF > Issue Type: Bug > Components: Examples > Environment: Windows XP Professional SR3, Intel Core 2, 2 GB Ram >Reporter: Farzad > > Ran into problem with quick-start and thought I might have better luck if I > manually setup the system. Maybe you can shed a light on the quick-start > problem. Here is what happened, after running start.jar, I went to the > crawler UI, configured a null output and a file system repo connector. > Created a job pointing to a file share \\host\share and started the job. > After a few seconds I ran into the error message below in the job status > panel. It said 60 docs found, 9 active, and 52 processed. Any ideas as to why > I'm seeing this? > Error: A lock could not be obtained due to a deadlock, cycle of locks and > waiters is: Lock : ROW, INGESTSTATUS, (1,57) Waiting XID : > Unknown macro: {6293, X} > , APP, DELETE FROM ingeststatus WHERE urihash=? AND dockey!=? AND > connectionname=? Granted XID : > Unknown macro: {6305, X} > Lock : ROW, INGESTSTATUS, (1,55) Waiting XID : > , APP, INSERT INTO ingeststatus > (id,changecount,dockey,lastversion,firstingest,connectionname,authorityname,urihash,lastoutputversion,lastingest,docuri) > VALUES (?,?,?,?,?,?,?,?,?,?,?) Granted XID : > . The selected victim is XID : 6293. > Thanks! -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (CONNECTORS-111) Encountering deadlock using quick-start & derby
[ https://issues.apache.org/jira/browse/CONNECTORS-111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12918121#action_12918121 ] Karl Wright commented on CONNECTORS-111: As discussed in Confluence, the code in question is not apparently within a database transaction, so it's puzzling to me how a deadlock could develop. The possibilities are: (1) Derby detects deadlocks in part by timeout. Perhaps the derby timeout time is too short. (2) It could be a plain old Derby bug. (3) There could be an error occurring at some point earlier during connection.commit(), which is confusing the ManifoldCF derby implementation. The semantics of such errors are not clear. I suppose I can presume that a rollback took place in that case. > Encountering deadlock using quick-start & derby > --- > > Key: CONNECTORS-111 > URL: https://issues.apache.org/jira/browse/CONNECTORS-111 > Project: ManifoldCF > Issue Type: Bug > Components: Examples > Environment: Windows XP Professional SR3, Intel Core 2, 2 GB Ram >Reporter: Farzad > > Ran into problem with quick-start and thought I might have better luck if I > manually setup the system. Maybe you can shed a light on the quick-start > problem. Here is what happened, after running start.jar, I went to the > crawler UI, configured a null output and a file system repo connector. > Created a job pointing to a file share \\host\share and started the job. > After a few seconds I ran into the error message below in the job status > panel. It said 60 docs found, 9 active, and 52 processed. Any ideas as to why > I'm seeing this? > Error: A lock could not be obtained due to a deadlock, cycle of locks and > waiters is: Lock : ROW, INGESTSTATUS, (1,57) Waiting XID : > Unknown macro: {6293, X} > , APP, DELETE FROM ingeststatus WHERE urihash=? AND dockey!=? AND > connectionname=? 
Granted XID : > Unknown macro: {6305, X} > Lock : ROW, INGESTSTATUS, (1,55) Waiting XID : > , APP, INSERT INTO ingeststatus > (id,changecount,dockey,lastversion,firstingest,connectionname,authorityname,urihash,lastoutputversion,lastingest,docuri) > VALUES (?,?,?,?,?,?,?,?,?,?,?) Granted XID : > . The selected victim is XID : 6293. > Thanks! -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
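Possibility (1) above — deadlock detection by timeout — is tunable in Derby. A `derby.properties` fragment along these lines (the values shown are only examples, not ManifoldCF's shipped configuration) controls how long Derby waits before running deadlock detection, how long before a lock wait times out, and whether lock contention details are logged:

```properties
# Seconds a transaction waits for a lock before Derby checks
# the waiters for a deadlock cycle.
derby.locks.deadlockTimeout=20
# Seconds a transaction waits for a lock before failing with a
# lock-timeout error; -1 means wait indefinitely.
derby.locks.waitTimeout=60
# Log deadlock and lock-timeout details to derby.log for diagnosis.
derby.locks.monitor=true
derby.locks.deadlockTrace=true
```

Raising `derby.locks.deadlockTimeout` would rule possibility (1) in or out: if the "deadlocks" are really just slow lock waits, they should disappear.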
[jira] Commented: (CONNECTORS-111) Encountering deadlock using quick-start & derby
[ https://issues.apache.org/jira/browse/CONNECTORS-111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12918119#action_12918119 ] Farzad commented on CONNECTORS-111: --- I experimented more and it seems to be happening the first time after a clean setup of ManifoldCF. Subsequent jobs including the same job ran successfully. > Encountering deadlock using quick-start & derby > --- > > Key: CONNECTORS-111 > URL: https://issues.apache.org/jira/browse/CONNECTORS-111 > Project: ManifoldCF > Issue Type: Bug > Components: Examples > Environment: Windows XP Professional SR3, Intel Core 2, 2 GB Ram >Reporter: Farzad > > Ran into problem with quick-start and thought I might have better luck if I > manually setup the system. Maybe you can shed a light on the quick-start > problem. Here is what happened, after running start.jar, I went to the > crawler UI, configured a null output and a file system repo connector. > Created a job pointing to a file share \\host\share and started the job. > After a few seconds I ran into the error message below in the job status > panel. It said 60 docs found, 9 active, and 52 processed. Any ideas as to why > I'm seeing this? > Error: A lock could not be obtained due to a deadlock, cycle of locks and > waiters is: Lock : ROW, INGESTSTATUS, (1,57) Waiting XID : > Unknown macro: {6293, X} > , APP, DELETE FROM ingeststatus WHERE urihash=? AND dockey!=? AND > connectionname=? Granted XID : > Unknown macro: {6305, X} > Lock : ROW, INGESTSTATUS, (1,55) Waiting XID : > , APP, INSERT INTO ingeststatus > (id,changecount,dockey,lastversion,firstingest,connectionname,authorityname,urihash,lastoutputversion,lastingest,docuri) > VALUES (?,?,?,?,?,?,?,?,?,?,?) Granted XID : > . The selected victim is XID : 6293. > Thanks! -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (CONNECTORS-111) Encountering deadlock using quick-start & derby
Encountering deadlock using quick-start & derby --- Key: CONNECTORS-111 URL: https://issues.apache.org/jira/browse/CONNECTORS-111 Project: ManifoldCF Issue Type: Bug Components: Examples Environment: Windows XP Professional SR3, Intel Core 2, 2 GB Ram Reporter: Farzad Ran into a problem with quick-start and thought I might have better luck if I manually set up the system. Maybe you can shed some light on the quick-start problem. Here is what happened: after running start.jar, I went to the crawler UI, configured a null output and a file system repo connector. Created a job pointing to a file share \\host\share and started the job. After a few seconds I ran into the error message below in the job status panel. It said 60 docs found, 9 active, and 52 processed. Any ideas as to why I'm seeing this? Error: A lock could not be obtained due to a deadlock, cycle of locks and waiters is: Lock : ROW, INGESTSTATUS, (1,57) Waiting XID : {6293, X} , APP, DELETE FROM ingeststatus WHERE urihash=? AND dockey!=? AND connectionname=? Granted XID : {6305, X} Lock : ROW, INGESTSTATUS, (1,55) Waiting XID : , APP, INSERT INTO ingeststatus (id,changecount,dockey,lastversion,firstingest,connectionname,authorityname,urihash,lastoutputversion,lastingest,docuri) VALUES (?,?,?,?,?,?,?,?,?,?,?) Granted XID : . The selected victim is XID : 6293. Thanks!
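The deadlock report above ends with Derby choosing a victim transaction (XID 6293) and rolling it back. The standard remediation for this class of failure is for the caller to catch the deadlock abort and re-issue the whole transaction after a backoff. The sketch below illustrates that retry pattern in Python; `DeadlockError` stands in for a JDBC SQLException with SQLState 40001 (the names and retry policy here are illustrative, not ManifoldCF's actual framework code).

```python
import random
import time

class DeadlockError(Exception):
    """Stands in for a JDBC SQLException with SQLState 40001 (deadlock victim)."""

def run_with_retry(op, attempts=5, base_delay=0.001):
    """Run op(); if it is aborted as a deadlock victim, back off and retry.

    Derby rolls the victim transaction back entirely, so the caller must
    re-issue the whole unit of work, not just the failing statement.
    """
    for attempt in range(attempts):
        try:
            return op()
        except DeadlockError:
            if attempt == attempts - 1:
                raise
            # exponential backoff with jitter so competing threads desynchronize
            time.sleep(base_delay * (2 ** attempt) * (1 + random.random()))

# demo: the operation is chosen the deadlock victim twice, then succeeds
calls = {"n": 0}
def flaky_ingest():
    calls["n"] += 1
    if calls["n"] < 3:
        raise DeadlockError("40001")
    return "ok"

print(run_with_retry(flaky_ingest))  # prints "ok" after two retries
```

Retrying masks the symptom but not the contention itself, which is why the later CONNECTORS-453 discussion leans toward swapping the embedded database rather than retrying harder.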
Re: Derby SQL ideas needed
Yes. This is for the Max Activity and Max Bandwidth reports. Karl On Sun, Sep 19, 2010 at 2:13 PM, Alexey Serba wrote: > And all of this is only with single table repohistory, right? Is this > some kind of complex analytics/stats? > > On Sun, Sep 19, 2010 at 8:48 PM, Karl Wright wrote: >> Here you go: >> >> // The query we will generate here looks like this: >> // SELECT * >> // FROM >> // (SELECT DISTINCT ON (idbucket) t3.bucket AS idbucket, >> t3.bytecount AS bytecount, >> // t3.windowstart AS starttime, >> t3.windowend AS endtime >> // FROM (SELECT * FROM (SELECT t0.bucket AS bucket, >> t0.starttime AS windowstart, t0.starttime + AS windowend, >> // SUM(t1.datasize * ((case when t0.starttime + >> < t1.endtime then t0.starttime + else t1.endtime >> end) - >> // (case when t0.starttime>t1.starttime then >> t0.starttime else t1.starttime end)) >> // / (t1.endtime - t1.starttime)) AS bytecount >> // FROM (SELECT DISTINCT substring(entityid from >> '') AS bucket, starttime FROM repohistory WHERE >> ) t0, repohistory t1 >> // WHERE t0.bucket=substring(t1.entityid from >> '') >> // AND t1.starttime < t0.starttime + >> AND t1.endtime > t0.starttime >> // AND >> // GROUP BY bucket,windowstart,windowend >> // UNION SELECT t0a.bucket AS bucket, t0a.endtime - >> AS windowstart, t0a.endtime AS windowend, >> // SUM(t1a.datasize * ((case when t0a.endtime < >> t1a.endtime then t0a.endtime else t1a.endtime end) - >> // (case when t0a.endtime - > >> t1a.starttime then t0a.endtime - else t1a.starttime end)) >> // / (t1a.endtime - t1a.starttime)) AS bytecount >> // FROM (SELECT DISTINCT substring(entityid from >> '') AS bucket, endtime FROM repohistory WHERE >> ) t0a, repohistory t1a >> // WHERE t0a.bucket=substring(t1a.entityid from >> '') >> // AND (t1a.starttime < t0a.endtime AND >> t1a.endtime > t0a.endtime - >> // AND >> // GROUP BY bucket,windowstart,windowend) t2 >> // ORDER BY bucket ASC,bytecount >> DESC) t3) t4 ORDER BY xxx LIMIT yyy OFFSET zzz; >> >> I have low 
confidence that ANY planner would be able to locate the >> common part of a 2x larger query and not do it twice. >> >> Karl >> >> >> >> On Sun, Sep 19, 2010 at 12:05 PM, Alexey Serba wrote: >>>> The other thing is that we cannot afford to use the same "table" >>>> twice, as it is actually an extremely expensive query in its own >>>> right, with multiple joins, select distinct's, etc. under the covers. >>> Even if you create indexes on bucket and activitycount columns? It >>> might be that the query plans for these two queries (with "distinct >>> on" hack and subquery max/subquery order limit/join) would be the >>> same. >>> >>>> I'd be happy to post it but it may shock you. ;-) >>> The way I indent SQL queries should say that I'm not afraid of >>> multipage queries :) >>> >>>> >>>> Karl >>>> >>>> >>>> >>>> >>>> >>>> On Sun, Sep 19, 2010 at 11:32 AM, Alexey Serba wrote: >>>>>> SELECT DISTINCT ON (idbucket) t3.bucket AS idbucket, t3.activitycount AS >>>>>> activitycount, t3.windowstart AS starttime, t3.windowend AS endtime FROM >>>>>> (...) t3 >>>>> Do you have primary key in your t3 table? >>>>> >>>>>> In Postgresql, what this does is to return the FIRST entire row matching >>>>>> each distinct idbucket result. >>>>> FIRST based on which sort? >>>>> >>>>> Lets say you want to return FIRST row based on t3.windowstart column >>>>> and you have primary key in t3 table. Then I believe your query can be >>>>> rewritten in the following ways: >>>>> >>>>> 1. Using subqueries >>>>> SELECT >>>>> bucket, primary_key, windowstart, etc >>>>> FROM >>>>>
Re: Derby SQL ideas needed
And all of this is only with single table repohistory, right? Is this some kind of complex analytics/stats? On Sun, Sep 19, 2010 at 8:48 PM, Karl Wright wrote: > Here you go: > > // The query we will generate here looks like this: > // SELECT * > // FROM > // (SELECT DISTINCT ON (idbucket) t3.bucket AS idbucket, > t3.bytecount AS bytecount, > // t3.windowstart AS starttime, > t3.windowend AS endtime > // FROM (SELECT * FROM (SELECT t0.bucket AS bucket, > t0.starttime AS windowstart, t0.starttime + AS windowend, > // SUM(t1.datasize * ((case when t0.starttime + > < t1.endtime then t0.starttime + else t1.endtime > end) - > // (case when t0.starttime>t1.starttime then > t0.starttime else t1.starttime end)) > // / (t1.endtime - t1.starttime)) AS bytecount > // FROM (SELECT DISTINCT substring(entityid from > '') AS bucket, starttime FROM repohistory WHERE > ) t0, repohistory t1 > // WHERE t0.bucket=substring(t1.entityid from > '') > // AND t1.starttime < t0.starttime + > AND t1.endtime > t0.starttime > // AND > // GROUP BY bucket,windowstart,windowend > // UNION SELECT t0a.bucket AS bucket, t0a.endtime - > AS windowstart, t0a.endtime AS windowend, > // SUM(t1a.datasize * ((case when t0a.endtime < > t1a.endtime then t0a.endtime else t1a.endtime end) - > // (case when t0a.endtime - > > t1a.starttime then t0a.endtime - else t1a.starttime end)) > // / (t1a.endtime - t1a.starttime)) AS bytecount > // FROM (SELECT DISTINCT substring(entityid from > '') AS bucket, endtime FROM repohistory WHERE > ) t0a, repohistory t1a > // WHERE t0a.bucket=substring(t1a.entityid from > '') > // AND (t1a.starttime < t0a.endtime AND > t1a.endtime > t0a.endtime - > // AND > // GROUP BY bucket,windowstart,windowend) t2 > // ORDER BY bucket ASC,bytecount > DESC) t3) t4 ORDER BY xxx LIMIT yyy OFFSET zzz; > > I have low confidence that ANY planner would be able to locate the > common part of a 2x larger query and not do it twice. 
> > Karl > > > > On Sun, Sep 19, 2010 at 12:05 PM, Alexey Serba wrote: >>> The other thing is that we cannot afford to use the same "table" >>> twice, as it is actually an extremely expensive query in its own >>> right, with multiple joins, select distinct's, etc. under the covers. >> Even if you create indexes on bucket and activitycount columns? It >> might be that the query plans for these two queries (with "distinct >> on" hack and subquery max/subquery order limit/join) would be the >> same. >> >>> I'd be happy to post it but it may shock you. ;-) >> The way I indent SQL queries should say that I'm not afraid of >> multipage queries :) >> >>> >>> Karl >>> >>> >>> >>> >>> >>> On Sun, Sep 19, 2010 at 11:32 AM, Alexey Serba wrote: >>>>> SELECT DISTINCT ON (idbucket) t3.bucket AS idbucket, t3.activitycount AS >>>>> activitycount, t3.windowstart AS starttime, t3.windowend AS endtime FROM >>>>> (...) t3 >>>> Do you have primary key in your t3 table? >>>> >>>>> In Postgresql, what this does is to return the FIRST entire row matching >>>>> each distinct idbucket result. >>>> FIRST based on which sort? >>>> >>>> Lets say you want to return FIRST row based on t3.windowstart column >>>> and you have primary key in t3 table. Then I believe your query can be >>>> rewritten in the following ways: >>>> >>>> 1. Using subqueries >>>> SELECT >>>> bucket, primary_key, windowstart, etc >>>> FROM >>>> table AS t1 >>>> WHERE >>>> windowstart=( SELECT max(windowstart) FROM table AS t2 WHERE >>>> bucket = t1.bucket ) >>>> >>>> 2. Using joins instead of subqueries ( in case Derby doesn't support >>>> subqueries - not sure about that ) >>>> SELECT >>>> t1.bucket, t1.primary_key, windowstart, etc >>>> FROM >>>> table AS t1 >>>> LEFT OUTER JOIN table AS t2 ON ( t
Re: Derby SQL ideas needed
You can also try ORDER BY bytecount DESC LIMIT 1 instead of aggregate function max, i.e. SELECT t1.bucket, t1.bytecount, t1.windowstart, t1.windowend FROM (xxx) t1 WHERE t1.bytecount=( SELECT t2.bytecount FROM (xxx) t2 WHERE t2.bucket = t1.bucket ORDER BY t2.bytecount DESC LIMIT 1 ) On Sun, Sep 19, 2010 at 9:07 PM, Karl Wright wrote: > Looking at your proposal: > > SELECT > bucket, primary_key, windowstart, etc > FROM > table AS t1 > WHERE > windowstart=( SELECT max(windowstart) FROM table AS t2 WHERE > bucket = t1.bucket ) > > ... we'd be looking actually for something more like this: > > > SELECT > t1.bucket, t1.bytecount, t1.windowstart, t1.windowend > FROM > (xxx) t1 > WHERE > t1.bytecount=( SELECT max(t2.bytecount) FROM (xxx) t2 WHERE > t2.bucket = t1.bucket ) > > ... although I've never seen the =(SELECT...) structure before. > > Karl > > > On Sun, Sep 19, 2010 at 12:48 PM, Karl Wright wrote: >> Here you go: >> >> // The query we will generate here looks like this: >> // SELECT * >> // FROM >> // (SELECT DISTINCT ON (idbucket) t3.bucket AS idbucket, >> t3.bytecount AS bytecount, >> // t3.windowstart AS starttime, >> t3.windowend AS endtime >> // FROM (SELECT * FROM (SELECT t0.bucket AS bucket, >> t0.starttime AS windowstart, t0.starttime + AS windowend, >> // SUM(t1.datasize * ((case when t0.starttime + >> < t1.endtime then t0.starttime + else t1.endtime >> end) - >> // (case when t0.starttime>t1.starttime then >> t0.starttime else t1.starttime end)) >> // / (t1.endtime - t1.starttime)) AS bytecount >> // FROM (SELECT DISTINCT substring(entityid from >> '') AS bucket, starttime FROM repohistory WHERE >> ) t0, repohistory t1 >> // WHERE t0.bucket=substring(t1.entityid from >> '') >> // AND t1.starttime < t0.starttime + >> AND t1.endtime > t0.starttime >> // AND >> // GROUP BY bucket,windowstart,windowend >> // UNION SELECT t0a.bucket AS bucket, t0a.endtime - >> AS windowstart, t0a.endtime AS windowend, >> // SUM(t1a.datasize * ((case when t0a.endtime < >> 
t1a.endtime then t0a.endtime else t1a.endtime end) - >> // (case when t0a.endtime - > >> t1a.starttime then t0a.endtime - else t1a.starttime end)) >> // / (t1a.endtime - t1a.starttime)) AS bytecount >> // FROM (SELECT DISTINCT substring(entityid from >> '') AS bucket, endtime FROM repohistory WHERE >> ) t0a, repohistory t1a >> // WHERE t0a.bucket=substring(t1a.entityid from >> '') >> // AND (t1a.starttime < t0a.endtime AND >> t1a.endtime > t0a.endtime - >> // AND >> // GROUP BY bucket,windowstart,windowend) t2 >> // ORDER BY bucket ASC,bytecount >> DESC) t3) t4 ORDER BY xxx LIMIT yyy OFFSET zzz; >> >> I have low confidence that ANY planner would be able to locate the >> common part of a 2x larger query and not do it twice. >> >> Karl >> >> >> >> On Sun, Sep 19, 2010 at 12:05 PM, Alexey Serba wrote: >>>> The other thing is that we cannot afford to use the same "table" >>>> twice, as it is actually an extremely expensive query in its own >>>> right, with multiple joins, select distinct's, etc. under the covers. >>> Even if you create indexes on bucket and activitycount columns? It >>> might be that the query plans for these two queries (with "distinct >>> on" hack and subquery max/subquery order limit/join) would be the >>> same. >>> >>>> I'd be happy to post it but it may shock you. ;-) >>> The way I indent SQL queries should say that I'm not afraid of >>> multipage queries :) >>> >>>> >>>> Karl >>>> >>>> >>>> >>>> >>>> >>>> On Sun, Sep 19, 2010 at 11:32 AM, Alexey Serba wrote: >>>>>> SELECT DISTINCT ON (idbucket) t3.bucket AS idbucket, t3.activitycount AS >>>>>> activitycount, t3.windowstart AS starttime, t3.windowend AS endtime FROM >>>>>> (...) t3 >>>>
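Alexey's `ORDER BY ... LIMIT 1` variant above can be demonstrated on a small dataset. The snippet below uses Python's sqlite3 as a convenient stand-in for Derby (note the caveat: Derby itself would need `FETCH FIRST 1 ROWS ONLY` or `max()` rather than `LIMIT 1`, and the `windows` table here is a toy substitute for the expensive derived table `(xxx)` in the thread):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE windows (bucket TEXT, bytecount INTEGER, windowstart INTEGER, windowend INTEGER);
INSERT INTO windows VALUES
  ('a', 10, 0, 60), ('a', 35, 60, 120),
  ('b', 99, 0, 60), ('b', 12, 60, 120);
""")

# Per bucket, keep the row with the highest bytecount -- the effect of
# Postgresql's SELECT DISTINCT ON (bucket) ... ORDER BY bucket, bytecount DESC
rows = con.execute("""
SELECT t1.bucket, t1.bytecount, t1.windowstart, t1.windowend
FROM windows t1
WHERE t1.bytecount = (SELECT t2.bytecount FROM windows t2
                      WHERE t2.bucket = t1.bucket
                      ORDER BY t2.bytecount DESC LIMIT 1)
ORDER BY t1.bucket
""").fetchall()
print(rows)  # [('a', 35, 60, 120), ('b', 99, 0, 60)]
```

The pattern evaluates the inner table once per outer row, which is exactly Karl's objection when the "table" is itself a multi-join aggregate query.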
Re: Derby SQL ideas needed
Looking at your proposal: SELECT bucket, primary_key, windowstart, etc FROM table AS t1 WHERE windowstart=( SELECT max(windowstart) FROM table AS t2 WHERE bucket = t1.bucket ) ... we'd be looking actually for something more like this: SELECT t1.bucket, t1.bytecount, t1.windowstart, t1.windowend FROM (xxx) t1 WHERE t1.bytecount=( SELECT max(t2.bytecount) FROM (xxx) t2 WHERE t2.bucket = t1.bucket ) ... although I've never seen the =(SELECT...) structure before. Karl On Sun, Sep 19, 2010 at 12:48 PM, Karl Wright wrote: > Here you go: > > // The query we will generate here looks like this: > // SELECT * > // FROM > // (SELECT DISTINCT ON (idbucket) t3.bucket AS idbucket, > t3.bytecount AS bytecount, > // t3.windowstart AS starttime, > t3.windowend AS endtime > // FROM (SELECT * FROM (SELECT t0.bucket AS bucket, > t0.starttime AS windowstart, t0.starttime + AS windowend, > // SUM(t1.datasize * ((case when t0.starttime + > < t1.endtime then t0.starttime + else t1.endtime > end) - > // (case when t0.starttime>t1.starttime then > t0.starttime else t1.starttime end)) > // / (t1.endtime - t1.starttime)) AS bytecount > // FROM (SELECT DISTINCT substring(entityid from > '') AS bucket, starttime FROM repohistory WHERE > ) t0, repohistory t1 > // WHERE t0.bucket=substring(t1.entityid from > '') > // AND t1.starttime < t0.starttime + > AND t1.endtime > t0.starttime > // AND > // GROUP BY bucket,windowstart,windowend > // UNION SELECT t0a.bucket AS bucket, t0a.endtime - > AS windowstart, t0a.endtime AS windowend, > // SUM(t1a.datasize * ((case when t0a.endtime < > t1a.endtime then t0a.endtime else t1a.endtime end) - > // (case when t0a.endtime - > > t1a.starttime then t0a.endtime - else t1a.starttime end)) > // / (t1a.endtime - t1a.starttime)) AS bytecount > // FROM (SELECT DISTINCT substring(entityid from > '') AS bucket, endtime FROM repohistory WHERE > ) t0a, repohistory t1a > // WHERE t0a.bucket=substring(t1a.entityid from > '') > // AND (t1a.starttime < t0a.endtime AND > 
t1a.endtime > t0a.endtime - > // AND > // GROUP BY bucket,windowstart,windowend) t2 > // ORDER BY bucket ASC,bytecount > DESC) t3) t4 ORDER BY xxx LIMIT yyy OFFSET zzz; > > I have low confidence that ANY planner would be able to locate the > common part of a 2x larger query and not do it twice. > > Karl > > > > On Sun, Sep 19, 2010 at 12:05 PM, Alexey Serba wrote: >>> The other thing is that we cannot afford to use the same "table" >>> twice, as it is actually an extremely expensive query in its own >>> right, with multiple joins, select distinct's, etc. under the covers. >> Even if you create indexes on bucket and activitycount columns? It >> might be that the query plans for these two queries (with "distinct >> on" hack and subquery max/subquery order limit/join) would be the >> same. >> >>> I'd be happy to post it but it may shock you. ;-) >> The way I indent SQL queries should say that I'm not afraid of >> multipage queries :) >> >>> >>> Karl >>> >>> >>> >>> >>> >>> On Sun, Sep 19, 2010 at 11:32 AM, Alexey Serba wrote: >>>>> SELECT DISTINCT ON (idbucket) t3.bucket AS idbucket, t3.activitycount AS >>>>> activitycount, t3.windowstart AS starttime, t3.windowend AS endtime FROM >>>>> (...) t3 >>>> Do you have primary key in your t3 table? >>>> >>>>> In Postgresql, what this does is to return the FIRST entire row matching >>>>> each distinct idbucket result. >>>> FIRST based on which sort? >>>> >>>> Lets say you want to return FIRST row based on t3.windowstart column >>>> and you have primary key in t3 table. Then I believe your query can be >>>> rewritten in the following ways: >>>> >>>> 1. Using subqueries >>>> SELECT >>>> bucket, primary_key, windowstart, etc >>>> FROM >>>> table AS t1 >>>> WHERE >>>> windowstart=( SELECT max(windowstart) FROM table AS t2 WHE
Re: Derby SQL ideas needed
Here you go: // The query we will generate here looks like this: // SELECT * // FROM // (SELECT DISTINCT ON (idbucket) t3.bucket AS idbucket, t3.bytecount AS bytecount, // t3.windowstart AS starttime, t3.windowend AS endtime //FROM (SELECT * FROM (SELECT t0.bucket AS bucket, t0.starttime AS windowstart, t0.starttime + AS windowend, // SUM(t1.datasize * ((case when t0.starttime + < t1.endtime then t0.starttime + else t1.endtime end) - // (case when t0.starttime>t1.starttime then t0.starttime else t1.starttime end)) // / (t1.endtime - t1.starttime)) AS bytecount // FROM (SELECT DISTINCT substring(entityid from '') AS bucket, starttime FROM repohistory WHERE ) t0, repohistory t1 // WHERE t0.bucket=substring(t1.entityid from '') // AND t1.starttime < t0.starttime + AND t1.endtime > t0.starttime // AND // GROUP BY bucket,windowstart,windowend // UNION SELECT t0a.bucket AS bucket, t0a.endtime - AS windowstart, t0a.endtime AS windowend, // SUM(t1a.datasize * ((case when t0a.endtime < t1a.endtime then t0a.endtime else t1a.endtime end) - // (case when t0a.endtime - > t1a.starttime then t0a.endtime - else t1a.starttime end)) // / (t1a.endtime - t1a.starttime)) AS bytecount // FROM (SELECT DISTINCT substring(entityid from '') AS bucket, endtime FROM repohistory WHERE ) t0a, repohistory t1a // WHERE t0a.bucket=substring(t1a.entityid from '') // AND (t1a.starttime < t0a.endtime AND t1a.endtime > t0a.endtime - // AND // GROUP BY bucket,windowstart,windowend) t2 // ORDER BY bucket ASC,bytecount DESC) t3) t4 ORDER BY xxx LIMIT yyy OFFSET zzz; I have low confidence that ANY planner would be able to locate the common part of a 2x larger query and not do it twice. Karl On Sun, Sep 19, 2010 at 12:05 PM, Alexey Serba wrote: >> The other thing is that we cannot afford to use the same "table" >> twice, as it is actually an extremely expensive query in its own >> right, with multiple joins, select distinct's, etc. under the covers. 
> Even if you create indexes on bucket and activitycount columns? It > might be that the query plans for these two queries (with "distinct > on" hack and subquery max/subquery order limit/join) would be the > same. > >> I'd be happy to post it but it may shock you. ;-) > The way I indent SQL queries should say that I'm not afraid of > multipage queries :) > >> >> Karl >> >> >> >> >> >> On Sun, Sep 19, 2010 at 11:32 AM, Alexey Serba wrote: >>>> SELECT DISTINCT ON (idbucket) t3.bucket AS idbucket, t3.activitycount AS >>>> activitycount, t3.windowstart AS starttime, t3.windowend AS endtime FROM >>>> (...) t3 >>> Do you have primary key in your t3 table? >>> >>>> In Postgresql, what this does is to return the FIRST entire row matching >>>> each distinct idbucket result. >>> FIRST based on which sort? >>> >>> Lets say you want to return FIRST row based on t3.windowstart column >>> and you have primary key in t3 table. Then I believe your query can be >>> rewritten in the following ways: >>> >>> 1. Using subqueries >>> SELECT >>> bucket, primary_key, windowstart, etc >>> FROM >>> table AS t1 >>> WHERE >>> windowstart=( SELECT max(windowstart) FROM table AS t2 WHERE >>> bucket = t1.bucket ) >>> >>> 2. Using joins instead of subqueries ( in case Derby doesn't support >>> subqueries - not sure about that ) >>> SELECT >>> t1.bucket, t1.primary_key, windowstart, etc >>> FROM >>> table AS t1 >>> LEFT OUTER JOIN table AS t2 ON ( t1.bucket = t2.bucket AND >>> t2.windowstart > t1.windowstart ) >>> WHERE >>> t2.primary_key IS NULL >>> >>> HTH, >>> Alex >>> >>> On Sat, Sep 18, 2010 at 2:28 PM, Karl Wright wrote: >>>> Hi Folks, >>>> >>>> For two of the report queries, ACF uses the following Postgresql >>>> construct, which sadly seems to have no Derby equivalent: >>>> >>>> SELECT DISTINCT ON (idbucket) t3.bucket AS idbucket, t3.activitycount >>>> AS activi
Re: Derby SQL ideas needed
> The other thing is that we cannot afford to use the same "table" > twice, as it is actually an extremely expensive query in its own > right, with multiple joins, select distinct's, etc. under the covers. Even if you create indexes on bucket and activitycount columns? It might be that the query plans for these two queries (with "distinct on" hack and subquery max/subquery order limit/join) would be the same. > I'd be happy to post it but it may shock you. ;-) The way I indent SQL queries should say that I'm not afraid of multipage queries :) > > Karl > > > > > > On Sun, Sep 19, 2010 at 11:32 AM, Alexey Serba wrote: >>> SELECT DISTINCT ON (idbucket) t3.bucket AS idbucket, t3.activitycount AS >>> activitycount, t3.windowstart AS starttime, t3.windowend AS endtime FROM >>> (...) t3 >> Do you have primary key in your t3 table? >> >>> In Postgresql, what this does is to return the FIRST entire row matching >>> each distinct idbucket result. >> FIRST based on which sort? >> >> Lets say you want to return FIRST row based on t3.windowstart column >> and you have primary key in t3 table. Then I believe your query can be >> rewritten in the following ways: >> >> 1. Using subqueries >> SELECT >> bucket, primary_key, windowstart, etc >> FROM >> table AS t1 >> WHERE >> windowstart=( SELECT max(windowstart) FROM table AS t2 WHERE >> bucket = t1.bucket ) >> >> 2. 
Using joins instead of subqueries ( in case Derby doesn't support >> subqueries - not sure about that ) >> SELECT >> t1.bucket, t1.primary_key, windowstart, etc >> FROM >> table AS t1 >> LEFT OUTER JOIN table AS t2 ON ( t1.bucket = t2.bucket AND >> t2.windowstart > t1.windowstart ) >> WHERE >> t2.primary_key IS NULL >> >> HTH, >> Alex >> >> On Sat, Sep 18, 2010 at 2:28 PM, Karl Wright wrote: >>> Hi Folks, >>> >>> For two of the report queries, ACF uses the following Postgresql >>> construct, which sadly seems to have no Derby equivalent: >>> >>> SELECT DISTINCT ON (idbucket) t3.bucket AS idbucket, t3.activitycount >>> AS activitycount, t3.windowstart AS starttime, t3.windowend AS endtime >>> FROM (...) t3 >>> >>> In Postgresql, what this does is to return the FIRST entire row >>> matching each distinct idbucket result. If Derby had a "FIRST()" >>> aggregate function, it would be the equivalent of: >>> >>> SELECT t3.bucket AS idbucket, FIRST(t3.activitycount) AS >>> activitycount, FIRST(t3.windowstart) AS starttime, FIRST(t3.windowend) >>> AS endtime FROM (...) t3 GROUP BY t3.bucket >>> >>> Unfortunately, Derby has no such aggregate function. Furthermore, it >>> would not be ideal if I were to do the work myself in ACF, because >>> this is a resultset that needs to be paged through with offset and >>> length, for presentation to the user and sorting, so it gets wrapped >>> in another SELECT ... FROM (...) ORDER BY ... OFFSET ... LIMIT ... >>> that does that part. >>> >>> Does anyone have any ideas and/or Derby contacts? I'd really like the >>> quick-start example to have a functional set of reports. >>> >>> Karl >>> >> >
Re: Derby SQL ideas needed
"FIRST based on which sort?"? First based on the existing sort, which is crucial, because the sort is by bucket ASC, activitycount DESC. I'm looking for the row with the highest activitycount, per bucket. The other thing is that we cannot afford to use the same "table" twice, as it is actually an extremely expensive query in its own right, with multiple joins, select distinct's, etc. under the covers. I'd be happy to post it but it may shock you. ;-) Karl On Sun, Sep 19, 2010 at 11:32 AM, Alexey Serba wrote: >> SELECT DISTINCT ON (idbucket) t3.bucket AS idbucket, t3.activitycount AS >> activitycount, t3.windowstart AS starttime, t3.windowend AS endtime FROM >> (...) t3 > Do you have primary key in your t3 table? > >> In Postgresql, what this does is to return the FIRST entire row matching >> each distinct idbucket result. > FIRST based on which sort? > > Lets say you want to return FIRST row based on t3.windowstart column > and you have primary key in t3 table. Then I believe your query can be > rewritten in the following ways: > > 1. Using subqueries > SELECT > bucket, primary_key, windowstart, etc > FROM > table AS t1 > WHERE > windowstart=( SELECT max(windowstart) FROM table AS t2 WHERE > bucket = t1.bucket ) > > 2. Using joins instead of subqueries ( in case Derby doesn't support > subqueries - not sure about that ) > SELECT > t1.bucket, t1.primary_key, windowstart, etc > FROM > table AS t1 > LEFT OUTER JOIN table AS t2 ON ( t1.bucket = t2.bucket AND > t2.windowstart > t1.windowstart ) > WHERE > t2.primary_key IS NULL > > HTH, > Alex > > On Sat, Sep 18, 2010 at 2:28 PM, Karl Wright wrote: >> Hi Folks, >> >> For two of the report queries, ACF uses the following Postgresql >> construct, which sadly seems to have no Derby equivalent: >> >> SELECT DISTINCT ON (idbucket) t3.bucket AS idbucket, t3.activitycount >> AS activitycount, t3.windowstart AS starttime, t3.windowend AS endtime >> FROM (...) 
t3 >> >> In Postgresql, what this does is to return the FIRST entire row >> matching each distinct idbucket result. If Derby had a "FIRST()" >> aggregate function, it would be the equivalent of: >> >> SELECT t3.bucket AS idbucket, FIRST(t3.activitycount) AS >> activitycount, FIRST(t3.windowstart) AS starttime, FIRST(t3.windowend) >> AS endtime FROM (...) t3 GROUP BY t3.bucket >> >> Unfortunately, Derby has no such aggregate function. Furthermore, it >> would not be ideal if I were to do the work myself in ACF, because >> this is a resultset that needs to be paged through with offset and >> length, for presentation to the user and sorting, so it gets wrapped >> in another SELECT ... FROM (...) ORDER BY ... OFFSET ... LIMIT ... >> that does that part. >> >> Does anyone have any ideas and/or Derby contacts? I'd really like the >> quick-start example to have a functional set of reports. >> >> Karl >> >
Re: Derby SQL ideas needed
> SELECT DISTINCT ON (idbucket) t3.bucket AS idbucket, t3.activitycount AS > activitycount, t3.windowstart AS starttime, t3.windowend AS endtime FROM > (...) t3 Do you have primary key in your t3 table? > In Postgresql, what this does is to return the FIRST entire row matching each > distinct idbucket result. FIRST based on which sort? Lets say you want to return FIRST row based on t3.windowstart column and you have primary key in t3 table. Then I believe your query can be rewritten in the following ways: 1. Using subqueries SELECT bucket, primary_key, windowstart, etc FROM table AS t1 WHERE windowstart=( SELECT max(windowstart) FROM table AS t2 WHERE bucket = t1.bucket ) 2. Using joins instead of subqueries ( in case Derby doesn't support subqueries - not sure about that ) SELECT t1.bucket, t1.primary_key, windowstart, etc FROM table AS t1 LEFT OUTER JOIN table AS t2 ON ( t1.bucket = t2.bucket AND t2.windowstart > t1.windowstart ) WHERE t2.primary_key IS NULL HTH, Alex On Sat, Sep 18, 2010 at 2:28 PM, Karl Wright wrote: > Hi Folks, > > For two of the report queries, ACF uses the following Postgresql > construct, which sadly seems to have no Derby equivalent: > > SELECT DISTINCT ON (idbucket) t3.bucket AS idbucket, t3.activitycount > AS activitycount, t3.windowstart AS starttime, t3.windowend AS endtime > FROM (...) t3 > > In Postgresql, what this does is to return the FIRST entire row > matching each distinct idbucket result. If Derby had a "FIRST()" > aggregate function, it would be the equivalent of: > > SELECT t3.bucket AS idbucket, FIRST(t3.activitycount) AS > activitycount, FIRST(t3.windowstart) AS starttime, FIRST(t3.windowend) > AS endtime FROM (...) t3 GROUP BY t3.bucket > > Unfortunately, Derby has no such aggregate function. 
Furthermore, it > would not be ideal if I were to do the work myself in ACF, because > this is a resultset that needs to be paged through with offset and > length, for presentation to the user and sorting, so it gets wrapped > in another SELECT ... FROM (...) ORDER BY ... OFFSET ... LIMIT ... > that does that part. > > Does anyone have any ideas and/or Derby contacts? I'd really like the > quick-start example to have a functional set of reports. > > Karl >
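The two rewrites Alexey proposes in the thread — the correlated `max()` subquery and the `LEFT OUTER JOIN` anti-join — can be checked against each other on a toy table. The sketch below uses Python's sqlite3 as a stand-in (the schema and data are invented for illustration; whether Derby's planner handles either form efficiently on the real multi-join derived table was the open question):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE t (pk INTEGER PRIMARY KEY, bucket TEXT, windowstart INTEGER);
INSERT INTO t VALUES (1,'a',10),(2,'a',30),(3,'b',5),(4,'b',40),(5,'b',20);
""")

# Option 1: correlated max() subquery -- keep the row whose windowstart
# equals the per-bucket maximum
sub = con.execute("""
SELECT bucket, pk, windowstart FROM t AS t1
WHERE windowstart = (SELECT max(windowstart) FROM t AS t2
                     WHERE t2.bucket = t1.bucket)
ORDER BY bucket
""").fetchall()

# Option 2: self LEFT OUTER JOIN anti-join -- keep rows that have no
# sibling in the same bucket with a larger windowstart
join = con.execute("""
SELECT t1.bucket, t1.pk, t1.windowstart
FROM t AS t1
LEFT OUTER JOIN t AS t2
  ON t2.bucket = t1.bucket AND t2.windowstart > t1.windowstart
WHERE t2.pk IS NULL
ORDER BY t1.bucket
""").fetchall()

print(sub)   # [('a', 2, 30), ('b', 4, 40)]
assert sub == join  # both rewrites pick the same greatest-per-group rows
```

Both forms reference the source table twice, which is why they are only acceptable if the planner (or an index on the bucket column) avoids recomputing the expensive derived table.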
[jira] Commented: (CONNECTORS-110) Max activity and Max bandwidth reports fail under Derby with a stack trace
[ https://issues.apache.org/jira/browse/CONNECTORS-110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12912212#action_12912212 ] Karl Wright commented on CONNECTORS-110: Checked in a partial solution to this issue. At least the reports don't fail with an exception now, but they also list all time intervals on Derby instead of collapsing and reporting just the maximum, which will make these reports far less useful. r998635. > Max activity and Max bandwidth reports fail under Derby with a stack trace > -- > > Key: CONNECTORS-110 > URL: https://issues.apache.org/jira/browse/CONNECTORS-110 > Project: Apache Connectors Framework > Issue Type: Bug > Components: Framework crawler agent >Reporter: Karl Wright > > The reason for the failure is because the queries used are doing the > Postgresql DISTINCT ON (xxx) syntax, which Derby does not support. > Unfortunately, there does not seem to be a way in Derby at present to do > anything similar to DISTINCT ON (xxx), and the queries really can't be done > without that. > One option is to introduce a getCapabilities() method into the database > implementation, which would allow ACF to query the database capabilities > before even presenting the report in the navigation menu in the UI. Another > alternative is to do a sizable chunk of resultset processing within ACF, > which would require not only the DISTINCT ON() implementation, but also the > enclosing sort and limit stuff. It's the latter that would be most > challenging, because of the difficulties with i18n etc.
[jira] Created: (CONNECTORS-110) Max activity and Max bandwidth reports fail under Derby with a stack trace
Max activity and Max bandwidth reports fail under Derby with a stack trace -- Key: CONNECTORS-110 URL: https://issues.apache.org/jira/browse/CONNECTORS-110 Project: Apache Connectors Framework Issue Type: Bug Components: Framework crawler agent Reporter: Karl Wright The reason for the failure is because the queries used are doing the Postgresql DISTINCT ON (xxx) syntax, which Derby does not support. Unfortunately, there does not seem to be a way in Derby at present to do anything similar to DISTINCT ON (xxx), and the queries really can't be done without that. One option is to introduce a getCapabilities() method into the database implementation, which would allow ACF to query the database capabilities before even presenting the report in the navigation menu in the UI. Another alternative is to do a sizable chunk of resultset processing within ACF, which would require not only the DISTINCT ON() implementation, but also the enclosing sort and limit stuff. It's the latter that would be most challenging, because of the difficulties with i18n etc.
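The getCapabilities() option mentioned in the issue — asking the database layer what it supports before offering a report in the navigation menu — can be sketched as a simple capability check. All names here (`CAP_DISTINCT_ON`, `reports_for`, the report titles) are illustrative, not ManifoldCF's actual API:

```python
# Capability flag for the one feature these reports depend on
CAP_DISTINCT_ON = "distinct_on"

# What each backend can do (Derby has no DISTINCT ON equivalent)
CAPABILITIES = {
    "postgresql": {CAP_DISTINCT_ON},
    "derby": set(),
}

# What each report needs before it can be rendered correctly
REPORT_REQUIREMENTS = {
    "Simple history": set(),
    "Max activity": {CAP_DISTINCT_ON},
    "Max bandwidth": {CAP_DISTINCT_ON},
}

def reports_for(db):
    """Return only the reports whose requirements the backend satisfies."""
    caps = CAPABILITIES[db]
    return [name for name, needs in REPORT_REQUIREMENTS.items() if needs <= caps]

print(reports_for("postgresql"))  # all three reports
print(reports_for("derby"))       # ['Simple history']
```

Gating the menu this way avoids the partial-solution behavior the comment describes, where the report runs on Derby but returns uncollapsed, far-less-useful rows.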
[jira] Commented: (CONNECTORS-109) Queue status report fails under Derby
[ https://issues.apache.org/jira/browse/CONNECTORS-109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12911146#action_12911146 ] Karl Wright commented on CONNECTORS-109:

Committed the set of changes necessary to use the DERBY-4066 fix properly when it becomes available. r998576.

> Queue status report fails under Derby
[jira] Commented: (CONNECTORS-109) Queue status report fails under Derby
[ https://issues.apache.org/jira/browse/CONNECTORS-109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1293#action_1293 ] Karl Wright commented on CONNECTORS-109:

Made most of the necessary code changes to correct this problem locally, but can't commit them yet because Derby's functions are limited in the current release to not allow CLOB arguments. This issue is going to be addressed in the next release of Derby, see DERBY-4066. The alternative is to build a trunk version of Derby and use that instead.

> Queue status report fails under Derby
Re: Derby SQL ideas needed
The Derby table-result function syntax requires all output columns to be declared as part of the function definition, and, more importantly, it does not seem to allow calls back into Derby itself to get results, so this does not seem to be a viable option. Back to square 1, I guess.

Derby doesn't seem to allow any way to declare aggregate functions either, so I couldn't declare a FIRST() aggregate function as proposed below. Simple arithmetic functions seem like they would work, but that's not helpful here.

Karl

On Sat, Sep 18, 2010 at 6:45 AM, Karl Wright wrote:
> For what it's worth, defining a Derby function seems like the only way to do it.
> [...]
Re: Derby SQL ideas needed
For what it's worth, defining a Derby function seems like the only way to do it. These seem to call arbitrary java that can accept a query as an argument and return a resultset as the result. But in order to write such a thing I will need the ability to call Derby at a java level, I think, rather than through JDBC. Still looking for a good example from somebody who has done something similar.

Karl

On Sat, Sep 18, 2010 at 6:28 AM, Karl Wright wrote:
> [...]
Derby SQL ideas needed
Hi Folks,

For two of the report queries, ACF uses the following Postgresql construct, which sadly seems to have no Derby equivalent:

SELECT DISTINCT ON (idbucket) t3.bucket AS idbucket, t3.activitycount AS activitycount,
  t3.windowstart AS starttime, t3.windowend AS endtime
FROM (...) t3

In Postgresql, what this does is to return the FIRST entire row matching each distinct idbucket result. If Derby had a "FIRST()" aggregate function, it would be the equivalent of:

SELECT t3.bucket AS idbucket, FIRST(t3.activitycount) AS activitycount,
  FIRST(t3.windowstart) AS starttime, FIRST(t3.windowend) AS endtime
FROM (...) t3 GROUP BY t3.bucket

Unfortunately, Derby has no such aggregate function. Furthermore, it would not be ideal if I were to do the work myself in ACF, because this is a resultset that needs to be paged through with offset and length, for presentation to the user and sorting, so it gets wrapped in another SELECT ... FROM (...) ORDER BY ... OFFSET ... LIMIT ... that does that part.

Does anyone have any ideas and/or Derby contacts? I'd really like the quick-start example to have a functional set of reports.

Karl
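For readers hitting the same limitation: one portable way to approximate DISTINCT ON is a grouped subquery joined back to the base resultset. This is only a sketch, not what ACF ended up doing, and it assumes the "first" row per bucket is the one with the highest activitycount (DISTINCT ON actually takes whichever row the enclosing ORDER BY puts first). Demonstrated with Python's sqlite3 and the column names from the query above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t3 (bucket TEXT, activitycount INTEGER, windowstart INTEGER, windowend INTEGER);
INSERT INTO t3 VALUES
  ('a', 5, 0, 10), ('a', 9, 10, 20),
  ('b', 2, 0, 10), ('b', 7, 10, 20);
""")

# Stand-in for Postgresql's
#   SELECT DISTINCT ON (idbucket) ... ORDER BY idbucket, activitycount DESC
# assumed here to mean: the row with the highest activitycount per bucket.
rows = conn.execute("""
SELECT t3.bucket AS idbucket, t3.activitycount, t3.windowstart, t3.windowend
FROM t3
JOIN (SELECT bucket, MAX(activitycount) AS maxcount
      FROM t3 GROUP BY bucket) m
  ON t3.bucket = m.bucket AND t3.activitycount = m.maxcount
ORDER BY idbucket
""").fetchall()
print(rows)  # [('a', 9, 10, 20), ('b', 7, 10, 20)]
```

Note the remaining gap this thread is wrestling with: ties within a bucket still return multiple rows, and the inner (...) subquery would have to be evaluated twice, which is why a true FIRST() aggregate or DISTINCT ON is preferable.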
[jira] Commented: (CONNECTORS-109) Queue status report fails under Derby
[ https://issues.apache.org/jira/browse/CONNECTORS-109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910822#action_12910822 ] Karl Wright commented on CONNECTORS-109:

The same is true of the maximum activity report, maximum bandwidth report, and result code report as well.

> Queue status report fails under Derby
[jira] Created: (CONNECTORS-109) Queue status report fails under Derby
Queue status report fails under Derby
-
Key: CONNECTORS-109
URL: https://issues.apache.org/jira/browse/CONNECTORS-109
Project: Apache Connectors Framework
Issue Type: Bug
Components: Framework crawler agent
Reporter: Karl Wright

If you try to use the queue status report with Derby as the database, you get the following error:

2010-09-17 18:03:21.558:WARN::Nested in javax.servlet.ServletException: org.apache.acf.core.interfaces.ACFException: Database exception: Exception doing query: Syntax error: Encountered "SUBSTRING" at line 1, column 8.:
org.apache.acf.core.interfaces.ACFException: Database exception: Exception doing query: Syntax error: Encountered "SUBSTRING" at line 1, column 8.
	at org.apache.acf.core.database.Database.executeViaThread(Database.java:421)
	at org.apache.acf.core.database.Database.executeUncachedQuery(Database.java:465)
	at org.apache.acf.core.database.Database$QueryCacheExecutor.create(Database.java:1072)
	at org.apache.acf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:144)
	at org.apache.acf.core.database.Database.executeQuery(Database.java:167)
	at org.apache.acf.core.database.DBInterfaceDerby.performQuery(DBInterfaceDerby.java:751)
	at org.apache.acf.crawler.jobs.JobManager.genQueueStatus(JobManager.java:5981)
	at org.apache.jsp.queuestatus_jsp._jspService(queuestatus_jsp.java:769)
	at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
	at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:377)
	at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:313)
	at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:260)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
	at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
	at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:390)
	at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
	at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
	at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765)
	at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418)
	at org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:327)
	at org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:126)
	at org.apache.jasper.runtime.PageContextImpl.doForward(PageContextImpl.java:706)
	at org.apache.jasper.runtime.PageContextImpl.forward(PageContextImpl.java:677)
	at org.apache.jsp.execute_jsp._jspService(execute_jsp.java:1291)
	at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
	at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:377)
	at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:313)
	at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:260)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
	at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
	at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:390)
	at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
	at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
	at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765)
	at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418)
	at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
	at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
	at org.mortbay.jetty.Server.handle(Server.java:326)
	at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
	at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:938)
	at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:755)
	at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
	at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
	at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
	at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:502)

The reason for the error is that Derby does not recognize the SUBSTRING(...) operation, which extracts parts of a string based on a regular expression. In other places in Derby where regular expressions were required, I've been success
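For context on what the failing query expects here: Postgresql's SUBSTRING(value FROM 'pattern') returns the portion of the string matched by the regular expression, or the first capture group when the pattern has one, and standard Derby has no counterpart. A rough Python illustration of those semantics (a sketch only, ignoring Postgresql's SQL-regex escaping differences, and not ACF code; `regexp_substring` is a hypothetical name):

```python
import re

def regexp_substring(value, pattern):
    """Approximate Postgresql SUBSTRING(value FROM pattern):
    return the first capture group if the pattern has one,
    otherwise the whole first match, or None when nothing matches."""
    m = re.search(pattern, value)
    if m is None:
        return None
    return m.group(1) if m.groups() else m.group(0)

print(regexp_substring("http://example.com/path", "//([^/]+)/"))  # example.com
print(regexp_substring("abc123", r"\d+"))                          # 123
print(regexp_substring("abc", r"\d+"))                             # None
```

Doing this extraction outside the database is exactly the "resultset processing within ACF" fallback mentioned in CONNECTORS-110, with the same downside: the enclosing sort/offset/limit would then have to be reimplemented too.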
RE: Derby/JUnit bad interaction - any ideas?
This actually did work, oddly enough. I wonder how Derby is undoing the read-only attribute on those directories? But in any case, I'm revamping the core setup/shutdown code again so that there's a decent hook in place to do the derby shutdown.

Karl

-Original Message-
From: ext Mark Miller [mailto:markrmil...@gmail.com]
Sent: Wednesday, June 09, 2010 4:26 PM
To: connectors-dev@incubator.apache.org
Subject: Re: Derby/JUnit bad interaction - any ideas?
[...]
Re: Derby/JUnit bad interaction - any ideas?
On 6/8/10 6:35 AM, karl.wri...@nokia.com wrote:
> I've been trying to get some basic tests working under JUnit. Unfortunately, I've run into a Derby problem which prevents these tests from working.
>
> What happens is this. Derby, when it creates a database, forces a number of directories within the database to "read-only". Unfortunately, unless we stipulate Java 1.6 or up, there is no native Java way to make these directories become non-read-only. So database cleanup always fails to actually remove the old database, and then new database creation subsequently fails.
>
> So there are two possibilities. First, we can change things so we never actually try to clean up the Derby DB. Second, we can mandate that Java 1.6 is used for LCF. That's all there really is.
>
> The first possibility is tricky but doable - I think. The second would probably be unacceptable in many ways.
>
> Thoughts?
>
> Karl

So I've been thinking about this - I still have trouble believing this is a real problem. I had a large suite of tests that used embedded derby in a system I worked on a few years back - and I never had any trouble removing the db dir after shutting down derby.

Looking at the code, have you actually tried shutting down derby? Currently you have:

// Cause database to shut down
new Database(context,_url+databaseName+";shutdown=true",_driver,databaseName,"","");
// DO NOT delete user or shutdown database, since this is in fact impossible under java 1.5 (since Derby makes its directories read-only, and
// there's no way to undo that...
// rm -rf
//File f = new File(databaseName);
//recursiveDelete(f);

But that is not going to do the shutdown? On a quick look, doing new Database(context, url ...) does not actually contact the db - so it's not going to cause it to shut down? Is this just cruft code, and have you actually tried shutting down as well? Something makes me think the delete is going to work if you actually attempt to connect with a '...;shutdown=true' jdbc URL.
-- - Mark http://www.lucidimagination.com
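The cleanup problem under discussion (pre-1.6 Java has no File.setWritable, so a recursive delete dies on Derby's read-only directories) maps onto a well-known pattern in other languages: clear the read-only bits on failure and retry the delete. A Python sketch of that chmod-then-retry idea, offered only as an illustration of the approach, not as ACF code (`force_rmtree` and the `seg0`/`c1.dat` layout are made up for the demo):

```python
import os
import shutil
import stat
import tempfile

def force_rmtree(path):
    """Recursively delete `path`, clearing read-only permission bits
    on failure and retrying -- the fix that is impossible in pure Java 1.5."""
    def make_writable(func, p, exc_info):
        # The delete usually fails because the *parent* directory is
        # read-only, so make both it and the entry writable, then retry.
        os.chmod(os.path.dirname(p), stat.S_IRWXU)
        os.chmod(p, stat.S_IRWXU)
        func(p)
    shutil.rmtree(path, onerror=make_writable)

# Demo: a Derby-style database directory with a read-only subdirectory.
dbdir = tempfile.mkdtemp()
sub = os.path.join(dbdir, "seg0")
os.mkdir(sub)
open(os.path.join(sub, "c1.dat"), "w").close()
os.chmod(sub, stat.S_IRUSR | stat.S_IXUSR)  # directory readable but not writable

force_rmtree(dbdir)
print(os.path.exists(dbdir))  # False
```

As Mark points out above, the chmod workaround is moot if the real bug is that Derby was never actually shut down: an embedded Derby database is only released once a connection with the ';shutdown=true' jdbc URL is actually attempted.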
RE: Derby/JUnit bad interaction - any ideas?
I take this partially back. The gcj jvm is the one that doesn't work with ant. At any rate, going to a different JVM is something I can only influence but can't control, so that's probably not going to happen for a while.

Karl

From: Wright Karl (Nokia-S/Cambridge)
Sent: Wednesday, June 09, 2010 5:24 AM
To: connectors-dev@incubator.apache.org
Subject: RE: Derby/JUnit bad interaction - any ideas?
[...]
RE: Derby/JUnit bad interaction - any ideas?
Open jdk does not seem to work properly with most java applications at this time, although it has continued to improve. Its switch incompatibilities stop it from working with ant at this time, so one cannot even build LCF with it.

Karl

From: ext Olivier Bourgeat [olivier.bourg...@polyspot.com]
Sent: Wednesday, June 09, 2010 4:03 AM
To: connectors-dev@incubator.apache.org
Subject: RE: Derby/JUnit bad interaction - any ideas?
[...]
RE: Derby/JUnit bad interaction - any ideas?
Debian Lenny has openjdk-6: http://packages.debian.org/fr/source/lenny/openjdk-6

Olivier

On Tuesday, June 8, 2010 at 22:37 +0200, karl.wri...@nokia.com wrote:
> MetaCarta is running Debian Lenny, which does not have a 1.6 version of Java available at this time.
>
> Karl
[...]
RE: Derby/JUnit bad interaction - any ideas?
MetaCarta is running Debian Lenny, which does not have a 1.6 version of Java available at this time.

Karl
Re: Derby/JUnit bad interaction - any ideas?
If we need to require Java 1.6, that is probably okay. I am fine with that. Does anybody have a serious objection to requiring Java 1.6 for LCF?

-- Jack Krupansky
RE: Derby/JUnit bad interaction - any ideas?
I just had a look at the sources. Ant's chmod task queries what kind of OS it is, and if it is the right kind, it actually attempts to fire off the chmod utility. ;-) That's pretty hacky. Nice to avoid that if possible.

Now, I was able to get my current set of brain-dead tests to work OK (and the ant cleanup too!) by making sure that the database was properly cleaned after every use, and leaving it around for later. It turns out that ant can delete the testing directory even though the directory underneath it has read-only stuff in it, even without the chmod. This seems to be because when it fails any deletion, it simply calls f.deleteOnExit() and lets the JVM do it later - and apparently the JVM *can* do this, because it's implemented to just do an unlink at that time, which bypasses the need to actually delete any read-only subdirectories. Oh my. What a strange mess.

Still, things are currently working, so I guess I'll leave them as they are, for now.

Karl
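The fallback behavior described above can be sketched as follows. This is illustrative, not ant's actual source: try an immediate delete, and if it fails (for example, on a read-only Derby subdirectory), register the file for deletion at JVM exit, where the eventual unlink can succeed. The class and method names are hypothetical.

```java
import java.io.File;

// Sketch of the fallback ant's delete effectively relies on, as described
// above: delete children first, then the node; anything that cannot be
// deleted now is handed to the JVM to unlink at exit via deleteOnExit().
public class DeleteWithFallback {
    public static void deleteOrDefer(File f) {
        if (f.isDirectory()) {
            File[] children = f.listFiles();
            if (children != null) {
                for (File child : children) {
                    deleteOrDefer(child);
                }
            }
        }
        if (!f.delete()) {
            // Could not delete now; let the JVM unlink it on exit.
            f.deleteOnExit();
        }
    }
}
```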
Re: Derby/JUnit bad interaction - any ideas?
If so, luckily the Ant hack can solve the problem on Linux.

Koji
--
http://www.rondhuit.com/en/
RE: Derby/JUnit bad interaction - any ideas?
Yeah, I was pretty surprised too. But on Windows it is likely that File.setReadOnly() (which is what Derby must be using) doesn't actually do anything to directories, which would explain the discrepancy.

Karl
RE: Derby/JUnit bad interaction - any ideas?
Huh. I wonder how ant is doing it? Using the ant task directly makes it impossible to do this from within JUnit, of course, but maybe the same hack can be done inside the test stuff.

Karl
Re: Derby/JUnit bad interaction - any ideas?
Hi Karl,

If it is possible, the Ant chmod task can be used, or you can consult its implementation. But the Ant manual says for the task: "Right now it has effect only under Unix or NonStop Kernel (Tandem)." http://ant.apache.org/manual/Tasks/chmod.html

Koji
--
http://www.rondhuit.com/en/
Re: Derby/JUnit bad interaction - any ideas?
Interesting - when I worked with derby in the past, I never had any trouble deleting a database after shutting it down on windows using Java 5. It worked great with my unit tests. You could always run each test in a new system tmp dir every time...

I find it hard to believe you cannot delete the database somehow though - like I said, I never had any problems with it using embedded derby in the past after shutting down the db.

--
- Mark

http://www.lucidimagination.com
Derby/JUnit bad interaction - any ideas?
I've been trying to get some basic tests working under JUnit. Unfortunately, I've run into a Derby problem which prevents these tests from working.

What happens is this. Derby, when it creates a database, forces a number of directories within the database to "read-only". Unfortunately, unless we stipulate Java 1.6 or up, there is no native Java way to make these directories non-read-only again. So database cleanup always fails to actually remove the old database, and then new database creation subsequently fails.

So there are two possibilities. First, we can change things so we never actually try to clean up the Derby DB. Second, we can mandate that Java 1.6 is used for LCF. That's all there really is.

The first possibility is tricky but doable - I think. The second would probably be unacceptable in many ways.

Thoughts?

Karl
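The cleanup approach that Java 1.6 would enable can be sketched as follows: clear the read-only bit with File.setWritable(boolean), which only exists from 1.6 on, then delete bottom-up. This is a minimal sketch of the problem and fix, not ManifoldCF code; the directory and file names simulating Derby's layout are illustrative.

```java
import java.io.File;
import java.io.IOException;

// Sketch of the Derby cleanup problem: Derby marks directories read-only,
// and File.delete() can fail on them unless the read-only bit is cleared
// first. File.setWritable() is the Java 1.6+ API at the heart of the
// version debate above.
public class DerbyDbCleanup {
    // Recursively clear the read-only bit, then delete bottom-up.
    public static boolean deleteRecursively(File f) {
        f.setWritable(true);          // Java 1.6+ only
        if (f.isDirectory()) {
            File[] children = f.listFiles();
            if (children != null) {
                for (File child : children) {
                    deleteRecursively(child);
                }
            }
        }
        return f.delete();
    }

    public static void main(String[] args) throws IOException {
        // Simulate a Derby database tree with a read-only subdirectory.
        File root = new File(System.getProperty("java.io.tmpdir"), "fake-derby-db");
        File seg = new File(root, "seg0");   // Derby-style subdirectory name
        seg.mkdirs();
        new File(seg, "c1.dat").createNewFile();
        seg.setReadOnly();                   // what Derby effectively does
        boolean ok = deleteRecursively(root);
        System.out.println(ok && !root.exists());
    }
}
```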
RE: Derby
Yup.

Karl
RE: Derby
This occurs because I am using Derby in embedded mode, and the restriction appears to be a limitation of that mode of operation. However, this mode is necessary to meet the testing goal, which was the prime motivator behind doing a Derby implementation. I am sure that if we were to use Derby as a service, the restriction would no longer apply, but then there would be no conceivable benefit either.

Karl
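The distinction behind the single-process restriction can be sketched as follows. In embedded mode the Derby engine runs inside the JVM that opens the database, and the on-disk database is locked to that one JVM; sharing across processes would require the Derby network server instead, which defeats the purpose of a zero-setup embedded test database. The database path below is illustrative.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Sketch: embedded Derby ties the database to a single JVM. A second
// process opening the same path fails. The path here is illustrative.
public class EmbeddedDerbyNote {
    // Standard embedded-mode URL form; engine runs in this JVM.
    public static final String EMBEDDED_URL = "jdbc:derby:/path/to/lcfdb;create=true";

    // Only one JVM at a time may open an embedded database. Sharing would
    // require the network server form (e.g. jdbc:derby://host:1527/lcfdb).
    public static Connection open() throws SQLException {
        return DriverManager.getConnection(EMBEDDED_URL);
    }
}
```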
Re: Derby
What is the nature of the single LCF process issue? Is it because the database is being used in single-user mode, or some other issue? Is it a permanent issue, or is there a solution or workaround anticipated at some stage?

Thanks.

-- Jack Krupansky
Re: Derby
Just to be clear, the full sequence would be:

1) Start UI app. Agent process should not be running.
2) "Start" LCF job in UI.
3) Shutdown UI app. Not just close the browser window.
4) AgentRun.
5) Wait long enough for crawl to have finished. Maybe watch to see that Solr has become idle.
6) Possibly commit to Solr.
7) AgentStop.
8) Back to step 1 for additional jobs.

Correct?

-- Jack Krupansky
RE: Derby
The daemon does not need to interact with the UI directly, only with the database. So, you stop the UI, start the daemon, and after a while, shut down the daemon and restart the UI.

Karl
Re: Derby
> (1) You can't run more than one LCF process at a time. That means you need to either run the daemon or the crawler-ui web application, but you can't run both at the same time.

How do you "Start" a crawl then, if not in the web app which then starts the agent process crawling?

Thanks for all of this effort!

-- Jack Krupansky
Derby
For what it's worth, after some 5 days of work, and a couple of schema changes to boot, LCF now runs with Derby. Some caveats:

(1) You can't run more than one LCF process at a time. That means you need to either run the daemon or the crawler-ui web application, but you can't run both at the same time.
(2) I haven't tested every query, so I'm sure there are probably some that are still broken.
(3) It's slow. Count yourself as fortunate if it runs at 1/5 the rate of Postgresql for you.
(4) Transactional integrity hasn't been evaluated.
(5) Deadlock detection and unique constraint violation detection are probably not right, because I'd need to cause these errors to occur before being able to key off their exception messages.
(6) I had to turn off the ability to sort on certain columns in the reports - basically, any column that was represented as a large character field.

Nevertheless, this represents an important milestone on the path to being able to write some kind of unit tests that have at least some meaning.

If you have an existing LCF Postgresql database, you will need to force an upgrade after going to the new trunk code. To do this, repeat the "org.apache.lcf.agents.Install" command and the "org.apache.lcf.agents.Register org.apache.lcf.crawler.system.CrawlerAgent" command after deploying the new code. And, please, let me know of any kind of errors you notice that could be related to the schema change.

Thanks,
Karl