[jira] [Resolved] (CONNECTORS-453) ManifoldCF running with Derby 10.8.1.1 has problems pausing and aborting jobs

2012-05-12 Thread Karl Wright (JIRA)

 [ 
https://issues.apache.org/jira/browse/CONNECTORS-453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright resolved CONNECTORS-453.


Resolution: Fixed

> ManifoldCF running with Derby 10.8.1.1 has problems pausing and aborting jobs
> -----------------------------------------------------------------------------
>
> Key: CONNECTORS-453
> URL: https://issues.apache.org/jira/browse/CONNECTORS-453
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Framework core
>Affects Versions: ManifoldCF 0.5
>Reporter: Karl Wright
>Assignee: Karl Wright
>Priority: Critical
> Fix For: ManifoldCF 0.5.1, ManifoldCF 0.6
>
>
> Since upgrading to Derby 10.8.x.x, it takes minutes to crawl just 20 
> documents.  Clearly the Derby contention/locking bugs are back with a 
> vengeance in 10.8.x.x.  Either we use 10.7.x.x or we get the Derby team to 
> look at them again.
> In the interim, maybe it is time to use hsqldb as the default embedded 
> database for the single-process example instead of Derby.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CONNECTORS-453) ManifoldCF running with Derby 10.8.1.1 has problems pausing and aborting jobs

2012-05-12 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13273872#comment-13273872
 ] 

Karl Wright commented on CONNECTORS-453:


r1337457 (release branch)


> ManifoldCF running with Derby 10.8.1.1 has problems pausing and aborting jobs
> -----------------------------------------------------------------------------
>
> Key: CONNECTORS-453
> URL: https://issues.apache.org/jira/browse/CONNECTORS-453
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Framework core
>Affects Versions: ManifoldCF 0.5
>Reporter: Karl Wright
>Assignee: Karl Wright
>Priority: Critical
> Fix For: ManifoldCF 0.5.1, ManifoldCF 0.6
>
>
> Since upgrading to Derby 10.8.x.x, it takes minutes to crawl just 20 
> documents.  Clearly the Derby contention/locking bugs are back with a 
> vengeance in 10.8.x.x.  Either we use 10.7.x.x or we get the Derby team to 
> look at them again.
> In the interim, maybe it is time to use hsqldb as the default embedded 
> database for the single-process example instead of Derby.





[jira] [Reopened] (CONNECTORS-453) ManifoldCF running with Derby 10.8.1.1 has problems pausing and aborting jobs

2012-05-12 Thread Karl Wright (JIRA)

 [ 
https://issues.apache.org/jira/browse/CONNECTORS-453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright reopened CONNECTORS-453:



Reopening for inclusion in 0.5.1

> ManifoldCF running with Derby 10.8.1.1 has problems pausing and aborting jobs
> -----------------------------------------------------------------------------
>
> Key: CONNECTORS-453
> URL: https://issues.apache.org/jira/browse/CONNECTORS-453
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Framework core
>Affects Versions: ManifoldCF 0.5
>Reporter: Karl Wright
>Assignee: Karl Wright
>Priority: Critical
> Fix For: ManifoldCF 0.5.1, ManifoldCF 0.6
>
>
> Since upgrading to Derby 10.8.x.x, it takes minutes to crawl just 20 
> documents.  Clearly the Derby contention/locking bugs are back with a 
> vengeance in 10.8.x.x.  Either we use 10.7.x.x or we get the Derby team to 
> look at them again.
> In the interim, maybe it is time to use hsqldb as the default embedded 
> database for the single-process example instead of Derby.





[jira] [Updated] (CONNECTORS-453) ManifoldCF running with Derby 10.8.1.1 has problems pausing and aborting jobs

2012-05-12 Thread Karl Wright (JIRA)

 [ 
https://issues.apache.org/jira/browse/CONNECTORS-453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright updated CONNECTORS-453:
---

 Priority: Critical  (was: Major)
Fix Version/s: ManifoldCF 0.5.1

> ManifoldCF running with Derby 10.8.1.1 has problems pausing and aborting jobs
> -----------------------------------------------------------------------------
>
> Key: CONNECTORS-453
> URL: https://issues.apache.org/jira/browse/CONNECTORS-453
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Framework core
>Affects Versions: ManifoldCF 0.5
>Reporter: Karl Wright
>Assignee: Karl Wright
>Priority: Critical
> Fix For: ManifoldCF 0.5.1, ManifoldCF 0.6
>
>
> Since upgrading to Derby 10.8.x.x, it takes minutes to crawl just 20 
> documents.  Clearly the Derby contention/locking bugs are back with a 
> vengeance in 10.8.x.x.  Either we use 10.7.x.x or we get the Derby team to 
> look at them again.
> In the interim, maybe it is time to use hsqldb as the default embedded 
> database for the single-process example instead of Derby.





[jira] [Updated] (CONNECTORS-453) ManifoldCF running with Derby 10.8.1.1 has problems pausing and aborting jobs

2012-04-26 Thread Karl Wright (JIRA)

 [ 
https://issues.apache.org/jira/browse/CONNECTORS-453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright updated CONNECTORS-453:
---

Summary: ManifoldCF running with Derby 10.8.1.1 has problems pausing and 
aborting jobs  (was: ManifoldCF running with Derby 10.8.1.1 has severe 
performance problems)

> ManifoldCF running with Derby 10.8.1.1 has problems pausing and aborting jobs
> -----------------------------------------------------------------------------
>
> Key: CONNECTORS-453
> URL: https://issues.apache.org/jira/browse/CONNECTORS-453
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Framework core
>Affects Versions: ManifoldCF 0.5
>Reporter: Karl Wright
>Assignee: Karl Wright
> Fix For: ManifoldCF 0.6
>
>
> Since upgrading to Derby 10.8.x.x, it takes minutes to crawl just 20 
> documents.  Clearly the Derby contention/locking bugs are back with a 
> vengeance in 10.8.x.x.  Either we use 10.7.x.x or we get the Derby team to 
> look at them again.
> In the interim, maybe it is time to use hsqldb as the default embedded 
> database for the single-process example instead of Derby.





[jira] [Commented] (CONNECTORS-453) ManifoldCF running with Derby 10.8.1.1 has severe performance problems

2012-04-26 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13263148#comment-13263148
 ] 

Karl Wright commented on CONNECTORS-453:


r1331102


> ManifoldCF running with Derby 10.8.1.1 has severe performance problems
> ----------------------------------------------------------------------
>
> Key: CONNECTORS-453
> URL: https://issues.apache.org/jira/browse/CONNECTORS-453
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Framework core
>Affects Versions: ManifoldCF 0.5
>Reporter: Karl Wright
>Assignee: Karl Wright
> Fix For: ManifoldCF 0.6
>
>
> Since upgrading to Derby 10.8.x.x, it takes minutes to crawl just 20 
> documents.  Clearly the Derby contention/locking bugs are back with a 
> vengeance in 10.8.x.x.  Either we use 10.7.x.x or we get the Derby team to 
> look at them again.
> In the interim, maybe it is time to use hsqldb as the default embedded 
> database for the single-process example instead of Derby.





[jira] [Resolved] (CONNECTORS-453) ManifoldCF running with Derby 10.8.1.1 has severe performance problems

2012-04-26 Thread Karl Wright (JIRA)

 [ 
https://issues.apache.org/jira/browse/CONNECTORS-453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright resolved CONNECTORS-453.


Resolution: Fixed

> ManifoldCF running with Derby 10.8.1.1 has severe performance problems
> ----------------------------------------------------------------------
>
> Key: CONNECTORS-453
> URL: https://issues.apache.org/jira/browse/CONNECTORS-453
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Framework core
>Affects Versions: ManifoldCF 0.5
>Reporter: Karl Wright
>Assignee: Karl Wright
> Fix For: ManifoldCF 0.6
>
>
> Since upgrading to Derby 10.8.x.x, it takes minutes to crawl just 20 
> documents.  Clearly the Derby contention/locking bugs are back with a 
> vengeance in 10.8.x.x.  Either we use 10.7.x.x or we get the Derby team to 
> look at them again.
> In the interim, maybe it is time to use hsqldb as the default embedded 
> database for the single-process example instead of Derby.





[jira] [Commented] (CONNECTORS-453) ManifoldCF running with Derby 10.8.1.1 has severe performance problems

2012-04-26 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13263092#comment-13263092
 ] 

Karl Wright commented on CONNECTORS-453:


Here's another example:

Error!
A lock could not be obtained due to a deadlock; the cycle of locks and waiters is:

Lock : ROW, JOBS, (1,7)
  Waiting XID : {157800, X}, APP, UPDATE jobs SET status=? WHERE id=?
  Granted XID : {157521, S}, {157653, S}
Lock : ROW, JOBQUEUE, (503,86)
  Waiting XID : {157653, S}, APP, SELECT t0.id,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount,t0.priorityset FROM jobqueue t0 WHERE t0.status IN (?,?) AND t0.checkaction=? AND t0.checktime<=? AND EXISTS(SELECT 'x' FROM jobs t1 WHERE t1.status IN (?,?) AND t1.id=t0.jobid AND t1.priority=?) AND NOT EXISTS(SELECT 'x' FROM jobqueue t2 WHERE t2.dochash=t0.dochash AND t2.status IN (?,?,?,?,?,?) AND t2.jobid!=t0.jobid) AND NOT EXISTS(SELECT 'x' FROM prereqevents t3,events t4 WHERE t0.id=t3.owner AND t3.eventname=t4.name) ORDER BY t0.docpriority ASC,t0.status ASC,t0.checkaction ASC,t0.checktime ASC FETCH NEXT 120 ROWS ONLY
  Granted XID : {157557, X}
Lock : ROW, JOBS, (1,7)
  Waiting XID : {157557, S}, APP, INSERT INTO hopcount (deathmark,parentidhash,id,distance,jobid,linktype) VALUES (?,?,?,?,?,?)

The selected victim is XID : 157800.


    
> ManifoldCF running with Derby 10.8.1.1 has severe performance problems
> ----------------------------------------------------------------------
>
> Key: CONNECTORS-453
> URL: https://issues.apache.org/jira/browse/CONNECTORS-453
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Framework core
>Affects Versions: ManifoldCF 0.5
>Reporter: Karl Wright
>Assignee: Karl Wright
> Fix For: ManifoldCF 0.6
>
>
> Since upgrading to Derby 10.8.x.x, it takes minutes to crawl just 20 
> documents.  Clearly the Derby contention/locking bugs are back with a 
> vengeance in 10.8.x.x.  Either we use 10.7.x.x or we get the Derby team to 
> look at them again.
> In the interim, maybe it is time to use hsqldb as the default embedded 
> database for the single-process example instead of Derby.





[jira] [Commented] (CONNECTORS-453) ManifoldCF running with Derby 10.8.1.1 has severe performance problems

2012-04-26 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13263090#comment-13263090
 ] 

Karl Wright commented on CONNECTORS-453:


Clicking pause during the job run causes the following to be displayed in the 
UI:



A lock could not be obtained due to a deadlock; the cycle of locks and waiters is:

Lock : ROW, JOBS, (1,7)
  Waiting XID : {147028, X}, APP, UPDATE jobs SET status=? WHERE id=?
  Granted XID : {146703, S}, {146941, S}
Lock : ROW, JOBQUEUE, (481,10)
  Waiting XID : {146941, S}, APP, SELECT jobid,CAST(COUNT(dochash) AS bigint) AS doccount FROM jobqueue t1 WHERE EXISTS(SELECT 'x' FROM jobs t0 WHERE t0.id=t1.jobid AND id=?) GROUP BY jobid
  Granted XID : {146612, X}
Lock : ROW, HOPCOUNT, (1734,27)
  Waiting XID : {146612, S}, APP, SELECT parentidhash,linktype,distance FROM hopcount WHERE jobid=? AND parentidhash IN (?,?,?,?,?,?,?,?,?,?) AND linktype=?
  Granted XID : {14, X}
Lock : ROW, JOBS, (1,7)
  Waiting XID : {14, S}, APP, INSERT INTO hopcount (deathmark,parentidhash,id,distance,jobid,linktype) VALUES (?,?,?,?,?,?)

The selected victim is XID : 147028.


> ManifoldCF running with Derby 10.8.1.1 has severe performance problems
> ----------------------------------------------------------------------
>
> Key: CONNECTORS-453
> URL: https://issues.apache.org/jira/browse/CONNECTORS-453
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Framework core
>Affects Versions: ManifoldCF 0.5
>Reporter: Karl Wright
>Assignee: Karl Wright
> Fix For: ManifoldCF 0.6
>
>
> Since upgrading to Derby 10.8.x.x, it takes minutes to crawl just 20 
> documents.  Clearly the Derby contention/locking bugs are back with a 
> vengeance in 10.8.x.x.  Either we use 10.7.x.x or we get the Derby team to 
> look at them again.
> In the interim, maybe it is time to use hsqldb as the default embedded 
> database for the single-process example instead of Derby.





[jira] [Commented] (CONNECTORS-453) ManifoldCF running with Derby 10.8.1.1 has severe performance problems

2012-04-26 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13263086#comment-13263086
 ] 

Karl Wright commented on CONNECTORS-453:


I see stalls only at the very beginning of a crawl.  Long crawls with lots of 
documents don't appear to stall, however.  Still trying to figure out if this 
is an actual problem or something more innocuous.


> ManifoldCF running with Derby 10.8.1.1 has severe performance problems
> ----------------------------------------------------------------------
>
> Key: CONNECTORS-453
> URL: https://issues.apache.org/jira/browse/CONNECTORS-453
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Framework core
>Affects Versions: ManifoldCF 0.5
>Reporter: Karl Wright
>Assignee: Karl Wright
> Fix For: ManifoldCF 0.6
>
>
> Since upgrading to Derby 10.8.x.x, it takes minutes to crawl just 20 
> documents.  Clearly the Derby contention/locking bugs are back with a 
> vengeance in 10.8.x.x.  Either we use 10.7.x.x or we get the Derby team to 
> look at them again.
> In the interim, maybe it is time to use hsqldb as the default embedded 
> database for the single-process example instead of Derby.





[jira] [Created] (CONNECTORS-453) ManifoldCF running with Derby 10.8.1.1 has severe performance problems

2012-04-05 Thread Karl Wright (Created) (JIRA)
ManifoldCF running with Derby 10.8.1.1 has severe performance problems
----------------------------------------------------------------------

 Key: CONNECTORS-453
 URL: https://issues.apache.org/jira/browse/CONNECTORS-453
 Project: ManifoldCF
  Issue Type: Bug
  Components: Framework core
Affects Versions: ManifoldCF 0.5
Reporter: Karl Wright
Assignee: Karl Wright
 Fix For: ManifoldCF 0.6


Since upgrading to Derby 10.8.x.x, it takes minutes to crawl just 20 documents. 
 Clearly the Derby contention/locking bugs are back with a vengeance in 
10.8.x.x.  Either we use 10.7.x.x or we get the Derby team to look at them 
again.

In the interim, maybe it is time to use hsqldb as the default embedded database 
for the single-process example instead of Derby.






[jira] [Updated] (CONNECTORS-178) Implement ability to run ManifoldCF with Derby in multiprocess mode

2011-09-01 Thread Karl Wright (JIRA)

 [ 
https://issues.apache.org/jira/browse/CONNECTORS-178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright updated CONNECTORS-178:
---

Affects Version/s: ManifoldCF 0.1
   ManifoldCF 0.2
Fix Version/s: ManifoldCF next

> Implement ability to run ManifoldCF with Derby in multiprocess mode
> -------------------------------------------------------------------
>
> Key: CONNECTORS-178
> URL: https://issues.apache.org/jira/browse/CONNECTORS-178
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Documentation, Framework core
>Affects Versions: ManifoldCF 0.1, ManifoldCF 0.2
>Reporter: Karl Wright
>Priority: Minor
> Fix For: ManifoldCF next
>
>
> Derby has a standalone server mode, which we can no doubt use if we modify 
> the Derby driver to accept a configuration parameter that lets you 
> choose between the embedded driver and the client driver.  It might be useful 
> to be able to run ManifoldCF with Derby in this manner.
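The configuration-parameter idea above can be sketched as follows. This is a hypothetical illustration (the `mode` parameter and helper function are invented for this sketch, not ManifoldCF's actual configuration), though the two JDBC URL shapes follow Derby's documented embedded and network-client conventions:

```python
def derby_jdbc_url(database, mode="embedded", host="localhost", port=1527):
    """Build a Derby JDBC URL for either the embedded or the client driver.

    'mode' stands in for the configuration parameter proposed above; the
    property name ManifoldCF would actually read is not specified here.
    """
    if mode == "embedded":
        # Embedded driver: Derby runs inside the same JVM process.
        return f"jdbc:derby:{database};create=true"
    if mode == "client":
        # Client driver: connect to a standalone Derby network server.
        return f"jdbc:derby://{host}:{port}/{database};create=true"
    raise ValueError(f"unknown Derby mode: {mode!r}")

print(derby_jdbc_url("mcf"))                  # jdbc:derby:mcf;create=true
print(derby_jdbc_url("mcf", mode="client"))   # jdbc:derby://localhost:1527/mcf;create=true
```

In the Java code the same switch would also have to select the driver class, org.apache.derby.jdbc.EmbeddedDriver versus org.apache.derby.jdbc.ClientDriver.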





[jira] [Updated] (CONNECTORS-110) Max activity and Max bandwidth reports don't work properly under Derby or HSQLDB

2011-09-01 Thread Karl Wright (JIRA)

 [ 
https://issues.apache.org/jira/browse/CONNECTORS-110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright updated CONNECTORS-110:
---

Affects Version/s: ManifoldCF 0.1
   ManifoldCF 0.2
Fix Version/s: ManifoldCF next

> Max activity and Max bandwidth reports don't work properly under Derby or 
> HSQLDB
> ----------------------------------------------------------------------------
>
> Key: CONNECTORS-110
> URL: https://issues.apache.org/jira/browse/CONNECTORS-110
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Framework crawler agent
>Affects Versions: ManifoldCF 0.1, ManifoldCF 0.2
>Reporter: Karl Wright
> Fix For: ManifoldCF next
>
>
> The failure occurs because the queries use the PostgreSQL 
> DISTINCT ON (xxx) syntax, which Derby does not support.  
> Unfortunately, there does not seem to be a way in Derby at present to do 
> anything similar to DISTINCT ON (xxx), and the queries really can't be done 
> without that.
> One option is to introduce a getCapabilities() method into the database 
> implementation, which would allow ACF to query the database capabilities 
> before even presenting the report in the navigation menu in the UI.  Another 
> alternative is to do a sizable chunk of resultset processing within ACF, 
> which would require not only the DISTINCT ON() implementation, but also the 
> enclosing sort and limit stuff.  It's the latter that would be most 
> challenging, because of the difficulties with i18n etc.
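For background on the missing construct: PostgreSQL's DISTINCT ON keeps one row per group, and on engines with full window-function support the same result can usually be obtained with ROW_NUMBER(). This is a minimal sketch with a toy table, not ManifoldCF's real report queries, and it is demonstrated on SQLite only because it runs inline; Derby's own window-function support is itself limited, which is part of the problem described above:

```python
import sqlite3

# Emulate PostgreSQL's
#   SELECT DISTINCT ON (job) job, t, bytes FROM events ORDER BY job, t DESC
# (one row per job, the one with the largest t) using ROW_NUMBER().
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (job TEXT, t INTEGER, bytes INTEGER);
    INSERT INTO events VALUES
        ('a', 1, 10), ('a', 2, 30), ('b', 1, 5), ('b', 3, 7);
""")
rows = conn.execute("""
    SELECT job, t, bytes FROM (
        SELECT job, t, bytes,
               ROW_NUMBER() OVER (PARTITION BY job ORDER BY t DESC) AS rn
        FROM events
    ) WHERE rn = 1
    ORDER BY job
""").fetchall()
print(rows)  # [('a', 2, 30), ('b', 3, 7)]
```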





[jira] [Resolved] (CONNECTORS-244) Derby deadlocks in a new way on the IngestStatus table, which isn't caught and retried

2011-08-30 Thread Karl Wright (JIRA)

 [ 
https://issues.apache.org/jira/browse/CONNECTORS-244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright resolved CONNECTORS-244.


   Resolution: Fixed
Fix Version/s: ManifoldCF 0.3

r1163260.


> Derby deadlocks in a new way on the IngestStatus table, which isn't caught 
> and retried
> --------------------------------------------------------------------------
>
> Key: CONNECTORS-244
> URL: https://issues.apache.org/jira/browse/CONNECTORS-244
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Framework agents process
>Affects Versions: ManifoldCF 0.3
>Reporter: Karl Wright
>Assignee: Karl Wright
> Fix For: ManifoldCF 0.3
>
>
> Derby deadlocks when a file system job is run, as follows:
> Irrecoverable Derby deadlock at:
>   at org.apache.manifoldcf.core.database.DBInterfaceDerby.reinterpretException(DBInterfaceDerby.java:803)
>   at org.apache.manifoldcf.core.database.DBInterfaceDerby.performQuery(DBInterfaceDerby.java:961)
>   at org.apache.manifoldcf.core.database.BaseTable.performQuery(BaseTable.java:229)
>   at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.performIngestion(IncrementalIngester.java:388)
>   at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:364)
>   at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocument(WorkerThread.java:1555)
>   at org.apache.manifoldcf.crawler.connectors.filesystem.FileConnector.processDocuments(FileConnector.java:283)
>   at org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector.processDocuments(BaseRepositoryConnector.java:423)
>   at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:561)
> The deadlock needs to be caught, backed off, and retried.





[jira] [Created] (CONNECTORS-244) Derby deadlocks in a new way on the IngestStatus table, which isn't caught and retried

2011-08-30 Thread Karl Wright (JIRA)
Derby deadlocks in a new way on the IngestStatus table, which isn't caught and 
retried
--------------------------------------------------------------------------

 Key: CONNECTORS-244
 URL: https://issues.apache.org/jira/browse/CONNECTORS-244
 Project: ManifoldCF
  Issue Type: Bug
  Components: Framework agents process
Affects Versions: ManifoldCF 0.3
Reporter: Karl Wright
Assignee: Karl Wright


Derby deadlocks when a file system job is run, as follows:

Irrecoverable Derby deadlock at:
at org.apache.manifoldcf.core.database.DBInterfaceDerby.reinterpretException(DBInterfaceDerby.java:803)
at org.apache.manifoldcf.core.database.DBInterfaceDerby.performQuery(DBInterfaceDerby.java:961)
at org.apache.manifoldcf.core.database.BaseTable.performQuery(BaseTable.java:229)
at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.performIngestion(IncrementalIngester.java:388)
at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:364)
at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocument(WorkerThread.java:1555)
at org.apache.manifoldcf.crawler.connectors.filesystem.FileConnector.processDocuments(FileConnector.java:283)
at org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector.processDocuments(BaseRepositoryConnector.java:423)
at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:561)


The deadlock needs to be caught, backed off, and retried.
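The catch, back-off, and retry pattern called for above can be sketched generically. This is an illustrative Python sketch (ManifoldCF itself is Java, and DeadlockError here is a stand-in for however the database layer surfaces a deadlock), not the project's actual code:

```python
import random
import time

class DeadlockError(Exception):
    """Stand-in for the deadlock condition the database layer detects."""

def retry_on_deadlock(operation, max_attempts=5, base_delay=0.1):
    """Run 'operation', retrying with jittered exponential backoff on deadlock.

    A generic sketch of the catch/back-off/retry pattern; on the final
    attempt the deadlock is re-raised so the caller sees the failure.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except DeadlockError:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            # Exponential backoff with jitter, so that competing worker
            # threads do not retry in lock step and collide again.
            time.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))
```

The jitter matters: if every worker thread backs off by the same fixed amount, they tend to re-collide on the same rows.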






[jira] [Resolved] (CONNECTORS-225) When using Derby, ManifoldCF incremental indexer sometimes gets a deadlock exception

2011-07-24 Thread Karl Wright (JIRA)

 [ 
https://issues.apache.org/jira/browse/CONNECTORS-225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright resolved CONNECTORS-225.


   Resolution: Fixed
Fix Version/s: ManifoldCF 0.3

r1150502

> When using Derby, ManifoldCF incremental indexer sometimes gets a deadlock 
> exception
> --------------------------------------------------------------------------
>
> Key: CONNECTORS-225
> URL: https://issues.apache.org/jira/browse/CONNECTORS-225
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Framework agents process
>Affects Versions: ManifoldCF 0.1, ManifoldCF 0.2, ManifoldCF 0.3
>Reporter: Karl Wright
>Assignee: Karl Wright
> Fix For: ManifoldCF 0.3
>
>
> When working with Derby and indexing documents rapidly, sometimes the 
> following deadlock stack trace is thrown:
> at org.apache.manifoldcf.core.database.DBInterfaceDerby.reinterpretException(DBInterfaceDerby.java:803)
> at org.apache.manifoldcf.core.database.DBInterfaceDerby.performQuery(DBInterfaceDerby.java:961)
> at org.apache.manifoldcf.core.database.BaseTable.performQuery(BaseTable.java:229)
> at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.noteDocumentIngest(IncrementalIngester.java:1372)
> at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.performIngestion(IncrementalIngester.java:469)
> at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:365)
> at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocument(WorkerThread.java:1587)
> at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.processDocuments(WebcrawlerConnector.java:1222)
> at org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector.processDocuments(BaseRepositoryConnector.java:423)
> at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.ja




[jira] [Assigned] (CONNECTORS-225) When using Derby, ManifoldCF incremental indexer sometimes gets a deadlock exception

2011-07-24 Thread Karl Wright (JIRA)

 [ 
https://issues.apache.org/jira/browse/CONNECTORS-225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright reassigned CONNECTORS-225:
--

Assignee: Karl Wright

> When using Derby, ManifoldCF incremental indexer sometimes gets a deadlock 
> exception
> --------------------------------------------------------------------------
>
> Key: CONNECTORS-225
> URL: https://issues.apache.org/jira/browse/CONNECTORS-225
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Framework agents process
>Affects Versions: ManifoldCF 0.1, ManifoldCF 0.2, ManifoldCF 0.3
>Reporter: Karl Wright
>Assignee: Karl Wright
>
> When working with Derby and indexing documents rapidly, sometimes the 
> following deadlock stack trace is thrown:
> at org.apache.manifoldcf.core.database.DBInterfaceDerby.reinterpretException(DBInterfaceDerby.java:803)
> at org.apache.manifoldcf.core.database.DBInterfaceDerby.performQuery(DBInterfaceDerby.java:961)
> at org.apache.manifoldcf.core.database.BaseTable.performQuery(BaseTable.java:229)
> at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.noteDocumentIngest(IncrementalIngester.java:1372)
> at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.performIngestion(IncrementalIngester.java:469)
> at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:365)
> at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocument(WorkerThread.java:1587)
> at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.processDocuments(WebcrawlerConnector.java:1222)
> at org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector.processDocuments(BaseRepositoryConnector.java:423)
> at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.ja





[jira] [Created] (CONNECTORS-225) When using Derby, ManifoldCF incremental indexer sometimes gets a deadlock exception

2011-07-24 Thread Karl Wright (JIRA)
When using Derby, ManifoldCF incremental indexer sometimes gets a deadlock 
exception


 Key: CONNECTORS-225
 URL: https://issues.apache.org/jira/browse/CONNECTORS-225
 Project: ManifoldCF
  Issue Type: Bug
  Components: Framework agents process
Affects Versions: ManifoldCF 0.2, ManifoldCF 0.1, ManifoldCF 0.3
Reporter: Karl Wright


When working with Derby and indexing documents rapidly, sometimes the following 
deadlock stack trace is thrown:

at org.apache.manifoldcf.core.database.DBInterfaceDerby.reinterpretException(DBInterfaceDerby.java:803)
at org.apache.manifoldcf.core.database.DBInterfaceDerby.performQuery(DBInterfaceDerby.java:961)
at org.apache.manifoldcf.core.database.BaseTable.performQuery(BaseTable.java:229)
at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.noteDocumentIngest(IncrementalIngester.java:1372)
at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.performIngestion(IncrementalIngester.java:469)
at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:365)
at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocument(WorkerThread.java:1587)
at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.processDocuments(WebcrawlerConnector.java:1222)
at org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector.processDocuments(BaseRepositoryConnector.java:423)
at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.ja







[jira] [Resolved] (CONNECTORS-114) Derby seems too unstable in multithreaded situations to be a good database for ManifoldCF, so try to add support for HSQLDB

2011-06-03 Thread Karl Wright (JIRA)

 [ 
https://issues.apache.org/jira/browse/CONNECTORS-114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright resolved CONNECTORS-114.


   Resolution: Fixed
Fix Version/s: ManifoldCF 0.3
 Assignee: Karl Wright

I have not yet made HSQLDB the official Derby replacement, but it is currently 
a better embedded option for many situations than Derby is.

> Derby seems too unstable in multithreaded situations to be a good database 
> for ManifoldCF, so try to add support for HSQLDB
> ---------------------------------------------------------------------------
>
> Key: CONNECTORS-114
> URL: https://issues.apache.org/jira/browse/CONNECTORS-114
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Framework core
>Reporter: Karl Wright
>Assignee: Karl Wright
>     Fix For: ManifoldCF 0.3
>
>
> Derby seems to have multiple problems:
> (1) It has internal deadlocks, which even if caught cause poor performance 
> due to stalling (CONNECTORS-111);
> (2) It has no support for certain SQL constructs (CONNECTORS-109 and 
> CONNECTORS-110);
> (3) It locks up entirely for some people (CONNECTORS-100).
> HSQLDB has been recommended as another potential embedded database that might 
> work better.



[jira] [Commented] (CONNECTORS-114) Derby seems too unstable in multithreaded situations to be a good database for ManifoldCF, so try to add support for HSQLDB

2011-06-03 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13043369#comment-13043369
 ] 

Karl Wright commented on CONNECTORS-114:


Remaining issues with HSQLDB have been resolved, so I'm closing this ticket.
r1131056.


> Derby seems too unstable in multithreaded situations to be a good database 
> for ManifoldCF, so try to add support for HSQLDB
> ---
>
> Key: CONNECTORS-114
> URL: https://issues.apache.org/jira/browse/CONNECTORS-114
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Framework core
>Reporter: Karl Wright
>     Fix For: ManifoldCF 0.3
>



[jira] [Commented] (CONNECTORS-110) Max activity and Max bandwidth reports don't work properly under Derby or HSQLDB

2011-06-02 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042861#comment-13042861
 ] 

Karl Wright commented on CONNECTORS-110:


r1130644 implements this for HSQLDB.  Unfortunately, performance is extremely 
slow, even when the number of rows in the temporary table is only a few dozen.


> Max activity and Max bandwidth reports don't work properly under Derby or 
> HSQLDB
> 
>
> Key: CONNECTORS-110
> URL: https://issues.apache.org/jira/browse/CONNECTORS-110
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Framework crawler agent
>Reporter: Karl Wright
>
> The failure occurs because the queries use the PostgreSQL DISTINCT ON (xxx) 
> syntax, which Derby does not support.  
> Unfortunately, there does not seem to be a way in Derby at present to do 
> anything similar to DISTINCT ON (xxx), and the queries really can't be done 
> without that.
> One option is to introduce a getCapabilities() method into the database 
> implementation, which would allow ACF to query the database capabilities 
> before even presenting the report in the navigation menu in the UI.  Another 
> alternative is to do a sizable chunk of resultset processing within ACF, 
> which would require not only the DISTINCT ON() implementation, but also the 
> enclosing sort and limit stuff.  It's the latter that would be most 
> challenging, because of the difficulties with i18n etc.
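The DISTINCT ON behaviour these report queries need (keep only the top row per group under some ordering) is also what the "resultset processing within ACF" alternative would have to reproduce. A minimal Python sketch of that semantics; the function and field names are illustrative, not from the ManifoldCF code base:

```python
# Emulate PostgreSQL's
#   SELECT DISTINCT ON (customerid) ... ORDER BY customerid, total DESC
# i.e. keep, for each customerid, the row with the highest total.
def distinct_on_top(rows, group_field, order_field):
    best = {}
    for row in rows:
        key = row[group_field]
        if key not in best or row[order_field] > best[key][order_field]:
            best[key] = row
    return sorted(best.values(), key=lambda r: r[group_field])

invoices = [
    {"customerid": 1, "id": 10, "total": 250},
    {"customerid": 1, "id": 11, "total": 900},
    {"customerid": 2, "id": 12, "total": 400},
]
top = distinct_on_top(invoices, "customerid", "total")
```

Doing this inside ACF would additionally require applying the enclosing sort and limit to the reduced rows, which is the part the description flags as challenging.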



[jira] [Commented] (CONNECTORS-110) Max activity and Max bandwidth reports don't work properly under Derby or HSQLDB

2011-06-02 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042669#comment-13042669
 ] 

Karl Wright commented on CONNECTORS-110:


Updated suggestion from Fred pertaining to HSQLDB:  Use WITH statement, as 
follows:

WITH invoice (customerid, id, total) AS (complex select statement)
SELECT * FROM (SELECT DISTINCT customerid FROM invoice)  AS  i_one,
LATERAL ( SELECT id, total FROM invoice WHERE customerid =
i_one.customerid ORDER BY total DESC LIMIT 1) AS i_two

I believe this can actually be generated in a manner that fits the current 
abstraction.
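Where LATERAL is unavailable, the same per-group top row can be obtained from the WITH clause with a correlated subquery instead. A runnable sketch using Python's sqlite3; the table, column names, and data are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE raw_invoice (customerid INTEGER, id INTEGER, total INTEGER);
    INSERT INTO raw_invoice VALUES (1, 10, 250), (1, 11, 900), (2, 12, 400);
""")

# WITH invoice AS (complex select), then pick, per customerid, the row with
# the highest total: the same result the LATERAL ... LIMIT 1 query produces.
rows = conn.execute("""
    WITH invoice(customerid, id, total) AS
        (SELECT customerid, id, total FROM raw_invoice)
    SELECT i_one.customerid, i_two.id, i_two.total
    FROM (SELECT DISTINCT customerid FROM invoice) AS i_one
    JOIN invoice AS i_two ON i_two.customerid = i_one.customerid
    WHERE i_two.total = (SELECT MAX(total) FROM invoice
                         WHERE customerid = i_one.customerid)
    ORDER BY i_one.customerid
""").fetchall()
```

One caveat: with ties on total this form returns every tying row, whereas DISTINCT ON and the LIMIT 1 LATERAL form return exactly one row per group.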

> Max activity and Max bandwidth reports don't work properly under Derby or 
> HSQLDB
> 
>
> Key: CONNECTORS-110
> URL: https://issues.apache.org/jira/browse/CONNECTORS-110
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Framework crawler agent
>Reporter: Karl Wright



[jira] [Updated] (CONNECTORS-110) Max activity and Max bandwidth reports don't work properly under Derby or HSQLDB

2011-06-02 Thread Karl Wright (JIRA)

 [ 
https://issues.apache.org/jira/browse/CONNECTORS-110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright updated CONNECTORS-110:
---

Summary: Max activity and Max bandwidth reports don't work properly under 
Derby or HSQLDB  (was: Max activity and Max bandwidth reports don't work 
properly under Derby)

> Max activity and Max bandwidth reports don't work properly under Derby or 
> HSQLDB
> 
>
> Key: CONNECTORS-110
> URL: https://issues.apache.org/jira/browse/CONNECTORS-110
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Framework crawler agent
>Reporter: Karl Wright



[jira] [Commented] (CONNECTORS-110) Max activity and Max bandwidth reports don't work properly under Derby

2011-06-02 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042655#comment-13042655
 ] 

Karl Wright commented on CONNECTORS-110:


HSQLDB is now also in roughly the same situation, although I've gotten a rough 
outline of a way to make this work involving temporary tables. This is as 
follows:

SELECT * FROM (SELECT DISTINCT customerid FROM invoice)  AS  i_one,
LATERAL ( SELECT id, total FROM invoice WHERE customerid =
i_one.customerid ORDER BY total DESC LIMIT 1) AS i_two

... where "invoice" would be a temporary table created on the fly, as follows:


DECLARE LOCAL TEMPORARY TABLE T AS (SELECT statement) [ON COMMIT {
PRESERVE | DELETE } ROWS]

For example:

DECLARE LOCAL TEMPORARY TABLE invoice AS (SELECT * FROM whatever) ON COMMIT 
DELETE ROWS WITH DATA

then perform the kind of query I suggested.

The issue is that this does not fit into our single-query abstraction metaphor 
at all.  Maybe a (different but identically named) stored procedure could be 
generated on all three databases that would do the trick.  Alternatively, all 
databases could go the temporary table route, but then PostgreSQL would be 
unnecessarily crippled.
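The temporary-table route can be demonstrated end to end with Python's sqlite3, whose CREATE TEMPORARY TABLE ... AS SELECT plays the role of HSQLDB's DECLARE LOCAL TEMPORARY TABLE here. Names and data are invented, and since sqlite has no LATERAL, a correlated subquery stands in for the per-group SELECT:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE whatever (customerid INTEGER, id INTEGER, total INTEGER);
    INSERT INTO whatever VALUES (1, 10, 250), (1, 11, 900), (2, 12, 400);
    -- stand-in for: DECLARE LOCAL TEMPORARY TABLE invoice AS (SELECT * FROM whatever)
    CREATE TEMPORARY TABLE invoice AS SELECT * FROM whatever;
""")

# Highest-total row per customerid: the result the LATERAL query would compute.
top_rows = conn.execute("""
    SELECT customerid, id, total
    FROM invoice AS i
    WHERE total = (SELECT MAX(total) FROM invoice
                   WHERE customerid = i.customerid)
    ORDER BY customerid
""").fetchall()
```

On PostgreSQL the whole report stays a single DISTINCT ON query, which is the asymmetry the comment is concerned about.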




> Max activity and Max bandwidth reports don't work properly under Derby
> --
>
> Key: CONNECTORS-110
> URL: https://issues.apache.org/jira/browse/CONNECTORS-110
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Framework crawler agent
>Reporter: Karl Wright



[jira] [Commented] (CONNECTORS-114) Derby seems too unstable in multithreaded situations to be a good database for ManifoldCF, so try to add support for HSQLDB

2011-05-30 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13041339#comment-13041339
 ] 

Karl Wright commented on CONNECTORS-114:


Just got email from the HSQLDB team, and confirmed that the deadlock issue was 
resolved in hsqldb 2.2.2.  So it looks like we have a third database that 
ManifoldCF can work with.  I've checked in the updated database jar, and am 
planning on writing a test series that uses hsqldb, much like the series that 
uses PostgreSQL.

We've still got to settle on how precisely to do the equivalent of PostgreSQL's 
DISTINCT ON functionality, but that's all that is left.  Also, FWIW, HSQLDB 
doesn't (as yet) seem to fail so spectacularly dealing with hopcounts as Derby 
does.


> Derby seems too unstable in multithreaded situations to be a good database 
> for ManifoldCF, so try to add support for HSQLDB
> ---
>
> Key: CONNECTORS-114
> URL: https://issues.apache.org/jira/browse/CONNECTORS-114
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Framework core
>Reporter: Karl Wright



[jira] [Resolved] (CONNECTORS-175) The site documentation property list does not include the PostgreSQL-specific parameters, and may be missing some of the Derby ones too

2011-04-06 Thread Karl Wright (JIRA)

 [ 
https://issues.apache.org/jira/browse/CONNECTORS-175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright resolved CONNECTORS-175.


   Resolution: Fixed
Fix Version/s: ManifoldCF next

r1089704.


> The site documentation property list does not include the PostgreSQL-specific 
> parameters, and may be missing some of the Derby ones too
> ---
>
> Key: CONNECTORS-175
> URL: https://issues.apache.org/jira/browse/CONNECTORS-175
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: ManifoldCF next
>Reporter: Karl Wright
>Assignee: Karl Wright
>Priority: Minor
> Fix For: ManifoldCF next
>
>
> The table that documents all the properties in properties.xml seems to be 
> missing the PostgreSQL-specific ones.  This is the 
> how-to-build-and-deploy.html page.



[jira] [Assigned] (CONNECTORS-175) The site documentation property list does not include the PostgreSQL-specific parameters, and may be missing some of the Derby ones too

2011-04-06 Thread Karl Wright (JIRA)

 [ 
https://issues.apache.org/jira/browse/CONNECTORS-175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright reassigned CONNECTORS-175:
--

Assignee: Karl Wright

> The site documentation property list does not include the PostgreSQL-specific 
> parameters, and may be missing some of the Derby ones too
> ---
>
> Key: CONNECTORS-175
> URL: https://issues.apache.org/jira/browse/CONNECTORS-175
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: ManifoldCF next
>Reporter: Karl Wright
>Assignee: Karl Wright
>Priority: Minor



[jira] [Commented] (CONNECTORS-175) The site documentation property list does not include the PostgreSQL-specific parameters, and may be missing some of the Derby ones too

2011-04-06 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016641#comment-13016641
 ] 

Karl Wright commented on CONNECTORS-175:


The QuickStart parameters org.apache.manifoldcf.dbsuperusername and 
org.apache.manifoldcf.dbsuperuserpassword are definitely missing.


> The site documentation property list does not include the PostgreSQL-specific 
> parameters, and may be missing some of the Derby ones too
> ---
>
> Key: CONNECTORS-175
> URL: https://issues.apache.org/jira/browse/CONNECTORS-175
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: ManifoldCF next
>Reporter: Karl Wright
>Priority: Minor




[jira] [Created] (CONNECTORS-178) Implement ability to run ManifoldCF with Derby in multiprocess mode

2011-04-02 Thread Karl Wright (JIRA)
Implement ability to run ManifoldCF with Derby in multiprocess mode
---

 Key: CONNECTORS-178
 URL: https://issues.apache.org/jira/browse/CONNECTORS-178
 Project: ManifoldCF
  Issue Type: Bug
  Components: Documentation, Framework core
Reporter: Karl Wright
Priority: Minor


Derby has a standalone server mode, which we can no doubt use if we modify the 
Derby driver to accept a configuration parameter which allows you to choose 
between the embedded driver and the client driver.  It might be useful to be 
able to run ManifoldCF with Derby in this manner.
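The configuration parameter described would essentially select between the two Derby JDBC URL forms. A sketch of that selection logic; the helper name, parameter name, and defaults are hypothetical, while the two URL formats and the default network port 1527 are Derby's documented ones:

```python
def derby_jdbc_url(dbname, mode="embedded", host="localhost", port=1527):
    """Build a Derby JDBC URL.

    'mode' is a stand-in for the proposed configuration parameter:
    'embedded' selects the in-process driver URL, anything else the
    network client (standalone server) URL.
    """
    if mode == "embedded":
        return f"jdbc:derby:{dbname};create=true"
    return f"jdbc:derby://{host}:{port}/{dbname};create=true"
```

For example, derby_jdbc_url("mcf") yields the embedded form, while derby_jdbc_url("mcf", mode="client") yields the client form pointing at a standalone Derby server.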




[jira] [Commented] (CONNECTORS-110) Max activity and Max bandwidth reports don't work properly under Derby

2011-04-02 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015015#comment-13015015
 ] 

Karl Wright commented on CONNECTORS-110:


This ticket is stalled because it requires a new Derby feature to resolve.  The 
resolution will be to assess the current version of Derby to find out whether 
the required feature has been added and, barring that, to open a Derby ticket 
for the feature.


> Max activity and Max bandwidth reports don't work properly under Derby
> --
>
> Key: CONNECTORS-110
> URL: https://issues.apache.org/jira/browse/CONNECTORS-110
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Framework crawler agent
>Reporter: Karl Wright



[jira] [Updated] (CONNECTORS-110) Max activity and Max bandwidth reports don't work properly under Derby

2011-04-02 Thread Karl Wright (JIRA)

 [ 
https://issues.apache.org/jira/browse/CONNECTORS-110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright updated CONNECTORS-110:
---

Summary: Max activity and Max bandwidth reports don't work properly under 
Derby  (was: Max activity and Max bandwidth reports fail under Derby with a 
stack trace)

> Max activity and Max bandwidth reports don't work properly under Derby
> --
>
> Key: CONNECTORS-110
> URL: https://issues.apache.org/jira/browse/CONNECTORS-110
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Framework crawler agent
>Reporter: Karl Wright



[jira] [Commented] (CONNECTORS-114) Derby seems too unstable in multithreaded situations to be a good database for ManifoldCF, so try to add support for HSQLDB

2011-04-02 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015013#comment-13015013
 ] 

Karl Wright commented on CONNECTORS-114:


This is stalled, because HSQLDB is not yet ready for the kinds of demands that 
ManifoldCF will put on it.  Working with Derby seems more appropriate since 
they've been able to respond to bugs.


> Derby seems too unstable in multithreaded situations to be a good database 
> for ManifoldCF, so try to add support for HSQLDB
> ---
>
> Key: CONNECTORS-114
> URL: https://issues.apache.org/jira/browse/CONNECTORS-114
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Framework core
>Reporter: Karl Wright



[jira] [Created] (CONNECTORS-175) The site documentation property list does not include the PostgreSQL-specific parameters, and may be missing some of the Derby ones too

2011-04-02 Thread Karl Wright (JIRA)
The site documentation property list does not include the PostgreSQL-specific 
parameters, and may be missing some of the Derby ones too
---

 Key: CONNECTORS-175
 URL: https://issues.apache.org/jira/browse/CONNECTORS-175
 Project: ManifoldCF
  Issue Type: Improvement
  Components: Documentation
Affects Versions: ManifoldCF next
Reporter: Karl Wright
Priority: Minor


The table that documents all the properties in properties.xml seems to be 
missing the PostgreSQL-specific ones.  This is the how-to-build-and-deploy.html 
page.



[jira] Resolved: (CONNECTORS-170) Derby database driver needs to periodically update statistics

2011-03-17 Thread Karl Wright (JIRA)

 [ 
https://issues.apache.org/jira/browse/CONNECTORS-170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright resolved CONNECTORS-170.


   Resolution: Fixed
Fix Version/s: ManifoldCF 0.2

r1082598.


> Derby database driver needs to periodically update statistics
> -
>
> Key: CONNECTORS-170
> URL: https://issues.apache.org/jira/browse/CONNECTORS-170
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Framework core
>Affects Versions: ManifoldCF 0.2
>Reporter: Karl Wright
>Assignee: Karl Wright
>     Fix For: ManifoldCF 0.2
>
>
> The Derby database driver needs to update statistics periodically, using 
> logic similar to that developed for PostgreSQL.  The way that's done is 
> through calling SYSCS_UTIL.SYSCS_UPDATE_STATISTICS on the table in question.
> http://db.apache.org/derby/docs/10.7/ref/rrefupdatestatsproc.html.



[jira] Created: (CONNECTORS-170) Derby database driver needs to periodically update statistics

2011-03-17 Thread Karl Wright (JIRA)
Derby database driver needs to periodically update statistics
-

 Key: CONNECTORS-170
 URL: https://issues.apache.org/jira/browse/CONNECTORS-170
 Project: ManifoldCF
  Issue Type: Bug
  Components: Framework core
Affects Versions: ManifoldCF 0.2
Reporter: Karl Wright


The Derby database driver needs to update statistics periodically, using logic 
similar to that developed for PostgreSQL.  The way that's done is through 
calling SYSCS_UTIL.SYSCS_UPDATE_STATISTICS on the table in question.

http://db.apache.org/derby/docs/10.7/ref/rrefupdatestatsproc.html.
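The pattern is the same as the PostgreSQL one: count modifications per table and refresh statistics once a threshold is crossed. A runnable sketch using sqlite3's ANALYZE as a stand-in for Derby's CALL SYSCS_UTIL.SYSCS_UPDATE_STATISTICS('APP', 'DOCS', NULL); the threshold, table name, and trigger logic here are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, status TEXT)")

ANALYZE_THRESHOLD = 100  # illustrative; the real trigger logic is driver-specific
changes_since_analyze = 0

def record_change(n=1):
    """Count modifications; refresh planner statistics when the threshold is hit.

    On Derby the refresh call would be:
      CALL SYSCS_UTIL.SYSCS_UPDATE_STATISTICS('APP', 'DOCS', NULL)
    """
    global changes_since_analyze
    changes_since_analyze += n
    if changes_since_analyze >= ANALYZE_THRESHOLD:
        conn.execute("ANALYZE docs")  # sqlite stand-in for the Derby procedure
        changes_since_analyze = 0

for _ in range(250):
    conn.execute("INSERT INTO docs (status) VALUES ('P')")
    record_change()
```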



[jira] Assigned: (CONNECTORS-170) Derby database driver needs to periodically update statistics

2011-03-17 Thread Karl Wright (JIRA)

 [ 
https://issues.apache.org/jira/browse/CONNECTORS-170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright reassigned CONNECTORS-170:
--

Assignee: Karl Wright

> Derby database driver needs to periodically update statistics
> -
>
> Key: CONNECTORS-170
> URL: https://issues.apache.org/jira/browse/CONNECTORS-170
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Framework core
>Affects Versions: ManifoldCF 0.2
>Reporter: Karl Wright
>    Assignee: Karl Wright



[jira] Resolved: (CONNECTORS-166) Crawl seizes up when running Derby

2011-03-16 Thread Karl Wright (JIRA)

 [ 
https://issues.apache.org/jira/browse/CONNECTORS-166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright resolved CONNECTORS-166.


   Resolution: Fixed
Fix Version/s: ManifoldCF 0.2

r1082140.


> Crawl seizes up when running Derby
> --
>
> Key: CONNECTORS-166
> URL: https://issues.apache.org/jira/browse/CONNECTORS-166
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Framework crawler agent
>Affects Versions: ManifoldCF 0.1, ManifoldCF 0.2
>Reporter: Karl Wright
>Assignee: Karl Wright
> Fix For: ManifoldCF 0.2
>
>
> A crawl using multiple worker threads with Derby eventually hangs, because 
> threads get deadlocked dealing with carrydown information.  At the time of 
> hang, a thread dump yields:
> "Worker thread '5'" daemon prio=6 tid=0x02fc7800 nid=0xd78 in Object.wait() 
> [0x0465f000]
>java.lang.Thread.State: WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> - waiting on <0x2858b720> (a 
> org.apache.manifoldcf.core.database.Database$ExecuteQueryThread)
> at java.lang.Thread.join(Unknown Source)
> - locked <0x2858b720> (a 
> org.apache.manifoldcf.core.database.Database$ExecuteQueryThread)
> at java.lang.Thread.join(Unknown Source)
> at 
> org.apache.manifoldcf.core.database.Database.executeViaThread(Database.java:453)
> at 
> org.apache.manifoldcf.core.database.Database.executeUncachedQuery(Database.java:489)
> at 
> org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1131)
> at 
> org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:144)
> at 
> org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:168)
> at 
> org.apache.manifoldcf.core.database.DBInterfaceDerby.performQuery(DBInterfaceDerby.java:785)
> at 
> org.apache.manifoldcf.crawler.jobs.JobManager.processDeleteHashSet(JobManager.java:2592)
> at 
> org.apache.manifoldcf.crawler.jobs.JobManager.calculateAffectedDeleteCarrydownChildren(JobManager.java:2565)
> at 
> org.apache.manifoldcf.crawler.jobs.JobManager.markDocumentDeletedMultiple(JobManager.java:2494)
> at 
> org.apache.manifoldcf.crawler.system.WorkerThread.processDeleteLists(WorkerThread.java:1077)
> at 
> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:544)
> ... for at least two threads.



[jira] Commented: (CONNECTORS-166) Crawl seizes up when running Derby

2011-03-16 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007401#comment-13007401
 ] 

Karl Wright commented on CONNECTORS-166:


Oleg reports that the test seems to pass.  The only remaining issue is that the 
version of Derby built from trunk blocks upgrading existing databases.  I will 
therefore need to build a version of Derby based on the latest release plus the 
patch instead.


> Crawl seizes up when running Derby
> --
>
> Key: CONNECTORS-166
> URL: https://issues.apache.org/jira/browse/CONNECTORS-166
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Framework crawler agent
>Affects Versions: ManifoldCF 0.1, ManifoldCF 0.2
>Reporter: Karl Wright
>Assignee: Karl Wright



[jira] Commented: (CONNECTORS-166) Crawl seizes up when running Derby

2011-03-14 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006583#comment-13006583
 ] 

Karl Wright commented on CONNECTORS-166:


According to the Derby team, Derby trunk fixes this problem.  I've therefore 
built trunk and checked it in.
r1081520.


> Crawl seizes up when running Derby
> --
>
> Key: CONNECTORS-166
> URL: https://issues.apache.org/jira/browse/CONNECTORS-166
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Framework crawler agent
>Affects Versions: ManifoldCF 0.1, ManifoldCF 0.2
>Reporter: Karl Wright
>Assignee: Karl Wright

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Created: (CONNECTORS-166) Crawl seizes up when running Derby

2011-02-27 Thread Karl Wright (JIRA)
Crawl seizes up when running Derby
--

 Key: CONNECTORS-166
 URL: https://issues.apache.org/jira/browse/CONNECTORS-166
 Project: ManifoldCF
  Issue Type: Bug
  Components: Framework crawler agent
Affects Versions: ManifoldCF 0.1, ManifoldCF next
Reporter: Karl Wright
Assignee: Karl Wright


A crawl using multiple worker threads with Derby eventually hangs, because 
threads get deadlocked dealing with carrydown information.  At the time of 
hang, a thread dump yields:

"Worker thread '5'" daemon prio=6 tid=0x02fc7800 nid=0xd78 in Object.wait() 
[0x0465f000]
   java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x2858b720> (a 
org.apache.manifoldcf.core.database.Database$ExecuteQueryThread)
at java.lang.Thread.join(Unknown Source)
- locked <0x2858b720> (a 
org.apache.manifoldcf.core.database.Database$ExecuteQueryThread)
at java.lang.Thread.join(Unknown Source)
at 
org.apache.manifoldcf.core.database.Database.executeViaThread(Database.java:453)
at 
org.apache.manifoldcf.core.database.Database.executeUncachedQuery(Database.java:489)
at 
org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1131)
at 
org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:144)
at 
org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:168)
at 
org.apache.manifoldcf.core.database.DBInterfaceDerby.performQuery(DBInterfaceDerby.java:785)
at 
org.apache.manifoldcf.crawler.jobs.JobManager.processDeleteHashSet(JobManager.java:2592)
at 
org.apache.manifoldcf.crawler.jobs.JobManager.calculateAffectedDeleteCarrydownChildren(JobManager.java:2565)
at 
org.apache.manifoldcf.crawler.jobs.JobManager.markDocumentDeletedMultiple(JobManager.java:2494)
at 
org.apache.manifoldcf.crawler.system.WorkerThread.processDeleteLists(WorkerThread.java:1077)
at 
org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:544)

... for at least two threads.





[jira] Resolved: (CONNECTORS-163) Go to current version of Derby, to try and avoid internal deadlocks

2011-02-24 Thread Karl Wright (JIRA)

 [ 
https://issues.apache.org/jira/browse/CONNECTORS-163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright resolved CONNECTORS-163.


   Resolution: Fixed
Fix Version/s: ManifoldCF next

r1074064.

> Go to current version of Derby, to try and avoid internal deadlocks
> ---
>
> Key: CONNECTORS-163
> URL: https://issues.apache.org/jira/browse/CONNECTORS-163
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Framework core
>Affects Versions: ManifoldCF next
>Reporter: Karl Wright
>Assignee: Karl Wright
> Fix For: ManifoldCF next
>
>
> Derby 10.5.3.0 irrecoverably deadlocks on the straightforward correlated 
> subqueries involving the carrydown table.  The source of the problem is not 
> clear.  However, there's a newer version of Derby available.  If it passes 
> the tests, I recommend trying that to see if the problem is fixed.





[jira] Updated: (CONNECTORS-163) Go to current version of Derby, to try and avoid internal deadlocks

2011-02-24 Thread Karl Wright (JIRA)

 [ 
https://issues.apache.org/jira/browse/CONNECTORS-163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright updated CONNECTORS-163:
---

Description: Derby 10.5.3.0 irrecoverably deadlocks on the straightforward 
correlated subqueries involving the carrydown table.  The source of the problem 
is not clear.  However, there's a newer version of Derby available.  If it 
passes the tests, I recommend trying that to see if the problem is fixed.  
(was: Derby 10.5.3.0 internally deadlocks on the straightforward correlated 
subqueries involving the carrydown table.  The source of the problem is not 
clear.  However, there's a newer version of Derby available.  If it passes the 
tests, I recommend trying that to see if the problem is fixed.)

> Go to current version of Derby, to try and avoid internal deadlocks
> ---
>
> Key: CONNECTORS-163
> URL: https://issues.apache.org/jira/browse/CONNECTORS-163
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Framework core
>Affects Versions: ManifoldCF next
>Reporter: Karl Wright
>Assignee: Karl Wright
>
> Derby 10.5.3.0 irrecoverably deadlocks on the straightforward correlated 
> subqueries involving the carrydown table.  The source of the problem is not 
> clear.  However, there's a newer version of Derby available.  If it passes 
> the tests, I recommend trying that to see if the problem is fixed.





[jira] Created: (CONNECTORS-163) Go to current version of Derby, to try and avoid internal deadlocks

2011-02-24 Thread Karl Wright (JIRA)
Go to current version of Derby, to try and avoid internal deadlocks
---

 Key: CONNECTORS-163
 URL: https://issues.apache.org/jira/browse/CONNECTORS-163
 Project: ManifoldCF
  Issue Type: Improvement
  Components: Framework core
Affects Versions: ManifoldCF next
Reporter: Karl Wright


Derby 10.5.3.0 internally deadlocks on the straightforward correlated 
subqueries involving the carrydown table.  The source of the problem is not 
clear.  However, there's a newer version of Derby available.  If it passes the 
tests, I recommend trying that to see if the problem is fixed.





[jira] Assigned: (CONNECTORS-163) Go to current version of Derby, to try and avoid internal deadlocks

2011-02-24 Thread Karl Wright (JIRA)

 [ 
https://issues.apache.org/jira/browse/CONNECTORS-163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright reassigned CONNECTORS-163:
--

Assignee: Karl Wright

> Go to current version of Derby, to try and avoid internal deadlocks
> ---
>
> Key: CONNECTORS-163
> URL: https://issues.apache.org/jira/browse/CONNECTORS-163
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Framework core
>Affects Versions: ManifoldCF next
>Reporter: Karl Wright
>    Assignee: Karl Wright
>
> Derby 10.5.3.0 internally deadlocks on the straightforward correlated 
> subqueries involving the carrydown table.  The source of the problem is not 
> clear.  However, there's a newer version of Derby available.  If it passes 
> the tests, I recommend trying that to see if the problem is fixed.





[jira] Resolved: (CONNECTORS-123) Document status report does not display the correct status under Derby

2010-11-01 Thread Karl Wright (JIRA)

 [ 
https://issues.apache.org/jira/browse/CONNECTORS-123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright resolved CONNECTORS-123.


Resolution: Fixed

It looks like this is a Derby bug, but it can be worked around by rearranging 
certain expressions.  r1029937.

> Document status report does not display the correct status under Derby
> --
>
> Key: CONNECTORS-123
> URL: https://issues.apache.org/jira/browse/CONNECTORS-123
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Framework agents process
>Reporter: Karl Wright
>Assignee: Karl Wright
>Priority: Minor
>
> The document status report displays a status of "Unknown" for documents that 
> are in the PENDING_PURGATORY state where the action time is greater than the 
> current time, and the action is RESCAN.  The status that should be displayed 
> is "Waiting for processing".  This only happens if the database is Derby.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (CONNECTORS-123) Document status report does not display the correct status under Derby

2010-11-01 Thread Karl Wright (JIRA)

 [ 
https://issues.apache.org/jira/browse/CONNECTORS-123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright reassigned CONNECTORS-123:
--

Assignee: Karl Wright

> Document status report does not display the correct status under Derby
> --
>
> Key: CONNECTORS-123
> URL: https://issues.apache.org/jira/browse/CONNECTORS-123
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Framework agents process
>Reporter: Karl Wright
>Assignee: Karl Wright
>Priority: Minor
>
> The document status report displays a status of "Unknown" for documents that 
> are in the PENDING_PURGATORY state where the action time is greater than the 
> current time, and the action is RESCAN.  The status that should be displayed 
> is "Waiting for processing".  This only happens if the database is Derby.




[jira] Created: (CONNECTORS-123) Document status report does not display the correct status under Derby

2010-11-01 Thread Karl Wright (JIRA)
Document status report does not display the correct status under Derby
--

 Key: CONNECTORS-123
 URL: https://issues.apache.org/jira/browse/CONNECTORS-123
 Project: ManifoldCF
  Issue Type: Bug
  Components: Framework agents process
Reporter: Karl Wright
Priority: Minor


The document status report displays a status of "Unknown" for documents that 
are in the PENDING_PURGATORY state where the action time is greater than the 
current time, and the action is RESCAN.  The status that should be displayed is 
"Waiting for processing".  This only happens if the database is Derby.





[jira] Resolved: (CONNECTORS-109) Queue status report fails under Derby

2010-10-31 Thread Karl Wright (JIRA)

 [ 
https://issues.apache.org/jira/browse/CONNECTORS-109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright resolved CONNECTORS-109.


   Resolution: Fixed
Fix Version/s: LCF Release 0.5

Hooked up user-defined functions to perform regular expression matching in 
Derby.  r1029455.
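
The fix can be sketched as follows (the class, method, and SQL function names here are illustrative assumptions, not the actual ManifoldCF code): Derby lets a SQL function be declared over a public static Java method, so regular-expression matching can be supplied to queries even though Derby's SQL has no built-in regexp operator.

```java
import java.util.regex.Pattern;

// Illustrative Derby user-defined function for regular-expression matching.
// Class and method names are hypothetical, not ManifoldCF's actual code.
public class RegexpFunctions
{
  /** Returns 1 if the regular expression is found within the value, else 0. */
  public static int matchesRegexp(String value, String regexp)
  {
    if (value == null || regexp == null)
      return 0;  // treat SQL NULLs as non-matching
    return Pattern.compile(regexp).matcher(value).find() ? 1 : 0;
  }
}
```

Such a method would then be registered with something along the lines of CREATE FUNCTION MATCHES_REGEXP(VALUE VARCHAR(32672), REGEXP VARCHAR(255)) RETURNS INTEGER LANGUAGE JAVA PARAMETER STYLE JAVA NO SQL EXTERNAL NAME 'RegexpFunctions.matchesRegexp', and invoked from the report queries in place of a regexp operator.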


> Queue status report fails under Derby
> -
>
> Key: CONNECTORS-109
> URL: https://issues.apache.org/jira/browse/CONNECTORS-109
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Framework crawler agent
>Reporter: Karl Wright
>Assignee: Karl Wright
> Fix For: LCF Release 0.5
>
>
> If you try to use the queue status report with Derby as the database, you get 
> the following error:
> 2010-09-17 18:03:21.558:WARN::Nested in javax.servlet.ServletException: org.apache.acf.core.interfaces.ACFException: Database exception: Exception doing query: Syntax error: Encountered "SUBSTRING" at line 1, column 8.:
> org.apache.acf.core.interfaces.ACFException: Database exception: Exception doing query: Syntax error: Encountered "SUBSTRING" at line 1, column 8.
> at org.apache.acf.core.database.Database.executeViaThread(Database.java:421)
> at org.apache.acf.core.database.Database.executeUncachedQuery(Database.java:465)
> at org.apache.acf.core.database.Database$QueryCacheExecutor.create(Database.java:1072)
> at org.apache.acf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:144)
> at org.apache.acf.core.database.Database.executeQuery(Database.java:167)
> at org.apache.acf.core.database.DBInterfaceDerby.performQuery(DBInterfaceDerby.java:751)
> at org.apache.acf.crawler.jobs.JobManager.genQueueStatus(JobManager.java:5981)
> at org.apache.jsp.queuestatus_jsp._jspService(queuestatus_jsp.java:769)
> at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
> at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:377)
> at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:313)
> at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:260)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
> at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
> at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:390)
> at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
> at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765)
> at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418)
> at org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:327)
> at org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:126)
> at org.apache.jasper.runtime.PageContextImpl.doForward(PageContextImpl.java:706)
> at org.apache.jasper.runtime.PageContextImpl.forward(PageContextImpl.java:677)
> at org.apache.jsp.execute_jsp._jspService(execute_jsp.java:1291)
> at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
> at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:377)
> at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:313)
> at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:260)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
> at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
> at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:390)
> at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
> at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765)
> at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418)
> at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerColl

[jira] Assigned: (CONNECTORS-109) Queue status report fails under Derby

2010-10-31 Thread Karl Wright (JIRA)

 [ 
https://issues.apache.org/jira/browse/CONNECTORS-109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright reassigned CONNECTORS-109:
--

Assignee: Karl Wright

> Queue status report fails under Derby
> -
>
> Key: CONNECTORS-109
> URL: https://issues.apache.org/jira/browse/CONNECTORS-109
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Framework crawler agent
>Reporter: Karl Wright
>Assignee: Karl Wright
>
> If you try to use the queue status report with Derby as the database, you get 
> the following error:
> 2010-09-17 18:03:21.558:WARN::Nested in javax.servlet.ServletException: org.apache.acf.core.interfaces.ACFException: Database exception: Exception doing query: Syntax error: Encountered "SUBSTRING" at line 1, column 8.:
> org.apache.acf.core.interfaces.ACFException: Database exception: Exception doing query: Syntax error: Encountered "SUBSTRING" at line 1, column 8.
> at org.apache.acf.core.database.Database.executeViaThread(Database.java:421)
> at org.apache.acf.core.database.Database.executeUncachedQuery(Database.java:465)
> at org.apache.acf.core.database.Database$QueryCacheExecutor.create(Database.java:1072)
> at org.apache.acf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:144)
> at org.apache.acf.core.database.Database.executeQuery(Database.java:167)
> at org.apache.acf.core.database.DBInterfaceDerby.performQuery(DBInterfaceDerby.java:751)
> at org.apache.acf.crawler.jobs.JobManager.genQueueStatus(JobManager.java:5981)
> at org.apache.jsp.queuestatus_jsp._jspService(queuestatus_jsp.java:769)
> at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
> at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:377)
> at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:313)
> at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:260)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
> at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
> at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:390)
> at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
> at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765)
> at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418)
> at org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:327)
> at org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:126)
> at org.apache.jasper.runtime.PageContextImpl.doForward(PageContextImpl.java:706)
> at org.apache.jasper.runtime.PageContextImpl.forward(PageContextImpl.java:677)
> at org.apache.jsp.execute_jsp._jspService(execute_jsp.java:1291)
> at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
> at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:377)
> at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:313)
> at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:260)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
> at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
> at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:390)
> at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
> at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765)
> at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418)
> at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
> at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handl

[jira] Commented: (CONNECTORS-114) Derby seems too unstable in multithreaded situations to be a good database for ManifoldCF, so try to add support for HSQLDB

2010-10-12 Thread Karl Wright (JIRA)
at org.hsqldb.jdbc.JDBCPreparedStatement.performPreExecute(Unknown Source)
at org.hsqldb.jdbc.JDBCPreparedStatement.fetchResult(Unknown Source)
at org.hsqldb.jdbc.JDBCPreparedStatement.executeUpdate(Unknown Source)
- locked <0x29a65798> (a org.hsqldb.jdbc.JDBCPreparedStatement)
at org.apache.manifoldcf.core.database.Database.execute(Database.java:566)
at org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:381)

Found 1 deadlock.
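
For what it's worth, a Java-level monitor deadlock like the one reported above doesn't require a full jstack dump to detect; the JVM exposes the same check through ThreadMXBean. A minimal sketch (the class and method names are illustrative, not ManifoldCF code):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Sketch: query the JVM for monitor deadlocks programmatically; this is the
// same information the "Found 1 deadlock." line in a jstack dump reports.
public class DeadlockProbe
{
  /** Returns the IDs of deadlocked threads, or an empty array if none. */
  public static long[] findDeadlocks()
  {
    ThreadMXBean bean = ManagementFactory.getThreadMXBean();
    long[] ids = bean.findDeadlockedThreads();  // null means no deadlock found
    return ids == null ? new long[0] : ids;
  }
}
```

A watchdog thread could poll this periodically and log the offending stack traces via ThreadMXBean.getThreadInfo.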


> Derby seems too unstable in multithreaded situations to be a good database 
> for ManifoldCF, so try to add support for HSQLDB
> ---
>
> Key: CONNECTORS-114
> URL: https://issues.apache.org/jira/browse/CONNECTORS-114
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Framework core
>Reporter: Karl Wright
>
> Derby seems to have multiple problems:
> (1) It has internal deadlocks, which even if caught cause poor performance 
> due to stalling (CONNECTORS-111);
> (2) It has no support for certain SQL constructs (CONNECTORS-109 and 
> CONNECTORS-110);
> (3) It locks up entirely for some people (CONNECTORS-100).
> HSQLDB has been recommended as another potential embedded database that might 
> work better.




[jira] Commented: (CONNECTORS-114) Derby seems too unstable in multithreaded situations to be a good database for ManifoldCF, so try to add support for HSQLDB

2010-10-12 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920227#action_12920227
 ] 

Karl Wright commented on CONNECTORS-114:


Support added and checked in.

However, when I try to use hsqldb for an actual crawl, in less than 10 seconds I wind up with a Java-level thread deadlock.  I've posted the thread dump to connectors-dev.  All the locks seem to be deep inside hsqldb, FWIW, which leads me to believe that perhaps hsqldb is even less stable than Derby in a multithreaded environment.


> Derby seems too unstable in multithreaded situations to be a good database 
> for ManifoldCF, so try to add support for HSQLDB
> ---
>
> Key: CONNECTORS-114
> URL: https://issues.apache.org/jira/browse/CONNECTORS-114
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Framework core
>Reporter: Karl Wright
>
> Derby seems to have multiple problems:
> (1) It has internal deadlocks, which even if caught cause poor performance 
> due to stalling (CONNECTORS-111);
> (2) It has no support for certain SQL constructs (CONNECTORS-109 and 
> CONNECTORS-110);
> (3) It locks up entirely for some people (CONNECTORS-100).
> HSQLDB has been recommended as another potential embedded database that might 
> work better.




[jira] Created: (CONNECTORS-114) Derby seems too unstable in multithreaded situations to be a good database for ManifoldCF, so try to add support for HSQLDB

2010-10-09 Thread Karl Wright (JIRA)
Derby seems too unstable in multithreaded situations to be a good database for 
ManifoldCF, so try to add support for HSQLDB
---

 Key: CONNECTORS-114
 URL: https://issues.apache.org/jira/browse/CONNECTORS-114
 Project: ManifoldCF
  Issue Type: Bug
  Components: Framework core
Reporter: Karl Wright


Derby seems to have multiple problems:
(1) It has internal deadlocks, which even if caught cause poor performance due 
to stalling (CONNECTORS-111);
(2) It has no support for certain SQL constructs (CONNECTORS-109 and 
CONNECTORS-110);
(3) It locks up entirely for some people (CONNECTORS-100).

HSQLDB has been recommended as another potential embedded database that might 
work better.




[jira] Resolved: (CONNECTORS-111) Encountering deadlock using quick-start & derby

2010-10-06 Thread Karl Wright (JIRA)

 [ 
https://issues.apache.org/jira/browse/CONNECTORS-111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright resolved CONNECTORS-111.


   Resolution: Fixed
Fix Version/s: LCF Release 0.5

Retry seems to have fixed things.


> Encountering deadlock using quick-start & derby
> ---
>
> Key: CONNECTORS-111
> URL: https://issues.apache.org/jira/browse/CONNECTORS-111
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Examples
> Environment: Windows XP Professional SR3, Intel Core 2, 2 GB Ram
>Reporter: Farzad
>Assignee: Karl Wright
> Fix For: LCF Release 0.5
>
>
> Ran into problem with quick-start and thought I might have better luck if I 
> manually setup the system. Maybe you can shed a light on the quick-start 
> problem. Here is what happened, after running start.jar, I went to the 
> crawler UI, configured a null output and a file system repo connector. 
> Created a job pointing to a file share \\host\share and started the job. 
> After a few seconds I ran into the error message below in the job status 
> panel. It said 60 docs found, 9 active, and 52 processed. Any ideas as to why 
> I'm seeing this?
> Error: A lock could not be obtained due to a deadlock, cycle of locks and waiters is:
> Lock : ROW, INGESTSTATUS, (1,57) Waiting XID : {6293, X}, APP, DELETE FROM ingeststatus WHERE urihash=? AND dockey!=? AND connectionname=? Granted XID : {6305, X}
> Lock : ROW, INGESTSTATUS, (1,55) Waiting XID : {6305, X}, APP, INSERT INTO ingeststatus (id,changecount,dockey,lastversion,firstingest,connectionname,authorityname,urihash,lastoutputversion,lastingest,docuri) VALUES (?,?,?,?,?,?,?,?,?,?,?) Granted XID : {6293, X}
> The selected victim is XID : 6293.
> Thanks!




[jira] Assigned: (CONNECTORS-111) Encountering deadlock using quick-start & derby

2010-10-06 Thread Karl Wright (JIRA)

 [ 
https://issues.apache.org/jira/browse/CONNECTORS-111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright reassigned CONNECTORS-111:
--

Assignee: Karl Wright

> Encountering deadlock using quick-start & derby
> ---
>
> Key: CONNECTORS-111
> URL: https://issues.apache.org/jira/browse/CONNECTORS-111
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Examples
> Environment: Windows XP Professional SR3, Intel Core 2, 2 GB Ram
>Reporter: Farzad
>Assignee: Karl Wright
> Fix For: LCF Release 0.5
>
>
> Ran into problem with quick-start and thought I might have better luck if I 
> manually setup the system. Maybe you can shed a light on the quick-start 
> problem. Here is what happened, after running start.jar, I went to the 
> crawler UI, configured a null output and a file system repo connector. 
> Created a job pointing to a file share \\host\share and started the job. 
> After a few seconds I ran into the error message below in the job status 
> panel. It said 60 docs found, 9 active, and 52 processed. Any ideas as to why 
> I'm seeing this?
> Error: A lock could not be obtained due to a deadlock, cycle of locks and waiters is:
> Lock : ROW, INGESTSTATUS, (1,57) Waiting XID : {6293, X}, APP, DELETE FROM ingeststatus WHERE urihash=? AND dockey!=? AND connectionname=? Granted XID : {6305, X}
> Lock : ROW, INGESTSTATUS, (1,55) Waiting XID : {6305, X}, APP, INSERT INTO ingeststatus (id,changecount,dockey,lastversion,firstingest,connectionname,authorityname,urihash,lastoutputversion,lastingest,docuri) VALUES (?,?,?,?,?,?,?,?,?,?,?) Granted XID : {6293, X}
> The selected victim is XID : 6293.
> Thanks!




[jira] Commented: (CONNECTORS-111) Encountering deadlock using quick-start & derby

2010-10-06 Thread Farzad (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12918507#action_12918507
 ] 

Farzad commented on CONNECTORS-111:
---

I tried your fix this morning and I was able to run the job successfully the 
first time after setup.  Seems to be resolved.  I ran a few other experiments 
and they were fine too. Thanks!

> Encountering deadlock using quick-start & derby
> ---
>
> Key: CONNECTORS-111
> URL: https://issues.apache.org/jira/browse/CONNECTORS-111
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Examples
> Environment: Windows XP Professional SR3, Intel Core 2, 2 GB Ram
>Reporter: Farzad
>
> Ran into problem with quick-start and thought I might have better luck if I 
> manually setup the system. Maybe you can shed a light on the quick-start 
> problem. Here is what happened, after running start.jar, I went to the 
> crawler UI, configured a null output and a file system repo connector. 
> Created a job pointing to a file share \\host\share and started the job. 
> After a few seconds I ran into the error message below in the job status 
> panel. It said 60 docs found, 9 active, and 52 processed. Any ideas as to why 
> I'm seeing this?
> Error: A lock could not be obtained due to a deadlock, cycle of locks and waiters is:
> Lock : ROW, INGESTSTATUS, (1,57) Waiting XID : {6293, X}, APP, DELETE FROM ingeststatus WHERE urihash=? AND dockey!=? AND connectionname=? Granted XID : {6305, X}
> Lock : ROW, INGESTSTATUS, (1,55) Waiting XID : {6305, X}, APP, INSERT INTO ingeststatus (id,changecount,dockey,lastversion,firstingest,connectionname,authorityname,urihash,lastoutputversion,lastingest,docuri) VALUES (?,?,?,?,?,?,?,?,?,?,?) Granted XID : {6293, X}
> The selected victim is XID : 6293.
> Thanks!




[jira] Commented: (CONNECTORS-111) Encountering deadlock using quick-start & derby

2010-10-06 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12918460#action_12918460
 ] 

Karl Wright commented on CONNECTORS-111:


Looking at the complaint again:

Error: A lock could not be obtained due to a deadlock, cycle of locks and waiters is:
Lock : ROW, INGESTSTATUS, (1,57) Waiting XID :6293, APP, DELETE FROM ingeststatus WHERE urihash=? AND dockey!=? AND connectionname=? Granted XID :6305
Lock : ROW, INGESTSTATUS, (1,55) Waiting XID :6305, APP, INSERT INTO ingeststatus (id,changecount,dockey,lastversion,firstingest,connectionname,authorityname,urihash,lastoutputversion,lastingest,docuri) VALUES (?,?,?,?,?,?,?,?,?,?,?) Granted XID :6293

The selected victim is XID : 6293.

... it seems more like this has nothing to do with transactions, and more to do 
with an internal lock-ordering problem in Derby itself.  So, each database 
modification has a potential of throwing one of these exceptions.

I tried to fix that issue by retrying the operation should I be outside of a 
transaction.  r1004915.
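
The retry approach can be sketched roughly as follows (the class and method names are illustrative assumptions, not the actual ManifoldCF code): outside a transaction each statement is atomic on its own, so when Derby chooses it as the deadlock victim (SQLState 40001) it is safe to simply run it again after a short backoff.

```java
import java.sql.SQLException;
import java.util.concurrent.Callable;

// Illustrative retry wrapper for operations executed outside a transaction.
// Names are hypothetical; ManifoldCF's actual retry logic differs in detail.
public class DeadlockRetry
{
  // SQLState Derby reports when a statement is chosen as a deadlock victim.
  static final String DEADLOCK_STATE = "40001";

  /** Runs op, retrying up to maxRetries times when a deadlock is signaled. */
  public static <T> T run(Callable<T> op, int maxRetries) throws Exception
  {
    int attempts = 0;
    while (true)
    {
      try
      {
        return op.call();
      }
      catch (SQLException e)
      {
        if (!DEADLOCK_STATE.equals(e.getSQLState()) || ++attempts > maxRetries)
          throw e;  // not a deadlock, or retries exhausted
        Thread.sleep(100L * attempts);  // brief backoff before trying again
      }
    }
  }
}
```

Inside a multi-statement transaction this is not safe, because Derby has already rolled back the transaction's earlier statements; there the whole transaction has to be replayed.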


> Encountering deadlock using quick-start & derby
> ---
>
> Key: CONNECTORS-111
> URL: https://issues.apache.org/jira/browse/CONNECTORS-111
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Examples
> Environment: Windows XP Professional SR3, Intel Core 2, 2 GB Ram
>Reporter: Farzad
>
> Ran into problem with quick-start and thought I might have better luck if I 
> manually setup the system. Maybe you can shed a light on the quick-start 
> problem. Here is what happened, after running start.jar, I went to the 
> crawler UI, configured a null output and a file system repo connector. 
> Created a job pointing to a file share \\host\share and started the job. 
> After a few seconds I ran into the error message below in the job status 
> panel. It said 60 docs found, 9 active, and 52 processed. Any ideas as to why 
> I'm seeing this?
> Error: A lock could not be obtained due to a deadlock, cycle of locks and waiters is:
> Lock : ROW, INGESTSTATUS, (1,57) Waiting XID : {6293, X}, APP, DELETE FROM ingeststatus WHERE urihash=? AND dockey!=? AND connectionname=? Granted XID : {6305, X}
> Lock : ROW, INGESTSTATUS, (1,55) Waiting XID : {6305, X}, APP, INSERT INTO ingeststatus (id,changecount,dockey,lastversion,firstingest,connectionname,authorityname,urihash,lastoutputversion,lastingest,docuri) VALUES (?,?,?,?,?,?,?,?,?,?,?) Granted XID : {6293, X}
> The selected victim is XID : 6293.
> Thanks!




[jira] Commented: (CONNECTORS-111) Encountering deadlock using quick-start & derby

2010-10-05 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12918134#action_12918134
 ] 

Karl Wright commented on CONNECTORS-111:


I made a change that should make the Derby ManifoldCF database implementation 
more robust against exceptions thrown during commit or rollback (r1004786).  See 
if this makes any difference in your setup.







[jira] Commented: (CONNECTORS-111) Encountering deadlock using quick-start & derby

2010-10-05 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12918121#action_12918121
 ] 

Karl Wright commented on CONNECTORS-111:


As discussed in Confluence, the code in question does not appear to be within a 
database transaction, so it's puzzling to me how a deadlock could develop.  The 
possibilities are:

(1) Derby detects deadlocks in part by timeout.  Perhaps the Derby timeout 
is too short.
(2) It could be a plain old Derby bug.
(3) There could be an error occurring at some point earlier, during 
connection.commit(), which is confusing the ManifoldCF Derby implementation.  
The semantics of such errors are not clear; I suppose I can presume that a 
rollback took place in that case.
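Regarding possibility (1): Derby's deadlock-detection window is tunable through derby.properties. The property names below are Derby's standard lock-tuning properties; the values are illustrative examples, not recommendations:

```properties
# Seconds a connection waits on a lock before Derby runs deadlock detection
# (default 20).
derby.locks.deadlockTimeout=30
# Seconds a connection waits on a lock before giving up entirely (default 60);
# must be >= deadlockTimeout or deadlock detection never runs.
derby.locks.waitTimeout=90
# Log lock timeouts and deadlocks (with stack traces) to derby.log.
derby.locks.monitor=true
derby.locks.deadlockTrace=true
```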






[jira] Commented: (CONNECTORS-111) Encountering deadlock using quick-start & derby

2010-10-05 Thread Farzad (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12918119#action_12918119
 ] 

Farzad commented on CONNECTORS-111:
---

I experimented more, and it seems to happen only the first time after a clean 
setup of ManifoldCF.  Subsequent jobs, including the same job, ran successfully.





[jira] Created: (CONNECTORS-111) Encountering deadlock using quick-start & derby

2010-10-05 Thread Farzad (JIRA)
Encountering deadlock using quick-start & derby
---

 Key: CONNECTORS-111
 URL: https://issues.apache.org/jira/browse/CONNECTORS-111
 Project: ManifoldCF
  Issue Type: Bug
  Components: Examples
 Environment: Windows XP Professional SR3, Intel Core 2, 2 GB Ram
Reporter: Farzad


Ran into a problem with quick-start and thought I might have better luck if I 
manually set up the system. Maybe you can shed some light on the quick-start 
problem. Here is what happened: after running start.jar, I went to the crawler 
UI, configured a null output and a file system repo connector, created a job 
pointing to a file share \\host\share, and started the job. After a few seconds 
I ran into the error message below in the job status panel. It said 60 docs 
found, 9 active, and 52 processed. Any ideas as to why I'm seeing this?

Error: A lock could not be obtained due to a deadlock, cycle of locks and 
waiters is: Lock : ROW, INGESTSTATUS, (1,57) Waiting XID : {6293, X}, APP, 
DELETE FROM ingeststatus WHERE urihash=? AND dockey!=? AND 
connectionname=? Granted XID : {6305, X} 
Lock : ROW, INGESTSTATUS, (1,55) Waiting XID : 
, APP, INSERT INTO ingeststatus 
(id,changecount,dockey,lastversion,firstingest,connectionname,authorityname,urihash,lastoutputversion,lastingest,docuri)
 VALUES (?,?,?,?,?,?,?,?,?,?,?) Granted XID : 
. The selected victim is XID : 6293.

Thanks!





Re: Derby SQL ideas needed

2010-09-19 Thread Karl Wright
Yes.  This is for the Max Activity and Max Bandwidth reports.
Karl

On Sun, Sep 19, 2010 at 2:13 PM, Alexey Serba  wrote:
> And all of this is only with single table repohistory, right? Is this
> some kind of complex analytics/stats?

Re: Derby SQL ideas needed

2010-09-19 Thread Alexey Serba
And all of this is only with single table repohistory, right? Is this
some kind of complex analytics/stats?


Re: Derby SQL ideas needed

2010-09-19 Thread Alexey Serba
You can also try ORDER BY bytecount DESC LIMIT 1 instead of the aggregate
function max, i.e.

SELECT
    t1.bucket, t1.bytecount, t1.windowstart, t1.windowend
FROM
    (xxx) t1
WHERE
    t1.bytecount=( SELECT t2.bytecount FROM (xxx) t2 WHERE t2.bucket =
t1.bucket ORDER BY t2.bytecount DESC LIMIT 1 )

On Sun, Sep 19, 2010 at 9:07 PM, Karl Wright  wrote:
> Looking at your proposal:
>
> SELECT
>   bucket, primary_key, windowstart, etc
> FROM
>   table AS t1
> WHERE
>   windowstart=( SELECT max(windowstart) FROM table AS t2 WHERE
> bucket = t1.bucket )
>
> ... we'd be looking actually for something more like this:
>
>
> SELECT
>   t1.bucket, t1.bytecount, t1.windowstart, t1.windowend
> FROM
>   (xxx) t1
> WHERE
>   t1.bytecount=( SELECT max(t2.bytecount) FROM (xxx) t2 WHERE
> t2.bucket = t1.bucket )
>
> ... although I've never seen the =(SELECT...) structure before.
>
> Karl

Re: Derby SQL ideas needed

2010-09-19 Thread Karl Wright
Looking at your proposal:

SELECT
   bucket, primary_key, windowstart, etc
FROM
   table AS t1
WHERE
   windowstart=( SELECT max(windowstart) FROM table AS t2 WHERE
bucket = t1.bucket )

... we'd be looking actually for something more like this:


SELECT
   t1.bucket, t1.bytecount, t1.windowstart, t1.windowend
FROM
   (xxx) t1
WHERE
   t1.bytecount=( SELECT max(t2.bytecount) FROM (xxx) t2 WHERE
t2.bucket = t1.bucket )

... although I've never seen the =(SELECT...) structure before.

Karl



Re: Derby SQL ideas needed

2010-09-19 Thread Karl Wright
Here you go:

    // The query we will generate here looks like this:
    // SELECT *
    //   FROM
    //     (SELECT DISTINCT ON (idbucket) t3.bucket AS idbucket, t3.bytecount AS bytecount,
    //                    t3.windowstart AS starttime, t3.windowend AS endtime
    //        FROM (SELECT * FROM (SELECT t0.bucket AS bucket, t0.starttime AS windowstart, t0.starttime +  AS windowend,
    //                   SUM(t1.datasize * ((case when t0.starttime +  < t1.endtime then t0.starttime +  else t1.endtime end) -
    //                     (case when t0.starttime>t1.starttime then t0.starttime else t1.starttime end))
    //                      / (t1.endtime - t1.starttime)) AS bytecount
    //                   FROM (SELECT DISTINCT substring(entityid from '') AS bucket, starttime FROM repohistory WHERE ) t0, repohistory t1
    //                   WHERE t0.bucket=substring(t1.entityid from '')
    //                      AND t1.starttime < t0.starttime +  AND t1.endtime > t0.starttime
    //                      AND 
    //                          GROUP BY bucket,windowstart,windowend
    //              UNION SELECT t0a.bucket AS bucket, t0a.endtime -  AS windowstart, t0a.endtime AS windowend,
    //                   SUM(t1a.datasize * ((case when t0a.endtime < t1a.endtime then t0a.endtime else t1a.endtime end) -
    //                     (case when t0a.endtime -  > t1a.starttime then t0a.endtime -  else t1a.starttime end))
    //                      / (t1a.endtime - t1a.starttime)) AS bytecount
    //                   FROM (SELECT DISTINCT substring(entityid from '') AS bucket, endtime FROM repohistory WHERE ) t0a, repohistory t1a
    //                   WHERE t0a.bucket=substring(t1a.entityid from '')
    //                      AND (t1a.starttime < t0a.endtime AND t1a.endtime > t0a.endtime - 
    //                      AND 
    //                          GROUP BY bucket,windowstart,windowend) t2
    //                              ORDER BY bucket ASC,bytecount DESC) t3) t4 ORDER BY xxx LIMIT yyy OFFSET zzz;

I have low confidence that ANY planner would be able to locate the
common part of a 2x larger query and not do it twice.

Karl




Re: Derby SQL ideas needed

2010-09-19 Thread Alexey Serba
> The other thing is that we cannot afford to use the same "table"
> twice, as it is actually an extremely expensive query in its own
> right, with multiple joins, select distinct's, etc. under the covers.
Even if you create indexes on bucket and activitycount columns? It
might be that the query plans for these two queries (with "distinct
on" hack and subquery max/subquery order limit/join) would be the
same.

> I'd be happy to post it but it may shock you. ;-)
The way I indent SQL queries should say that I'm not afraid of
multipage queries :)

>
> Karl
>
>
>
>
>
> On Sun, Sep 19, 2010 at 11:32 AM, Alexey Serba  wrote:
>>> SELECT DISTINCT ON (idbucket) t3.bucket AS idbucket, t3.activitycount AS 
>>> activitycount, t3.windowstart AS starttime, t3.windowend AS endtime FROM 
>>> (...) t3
>> Do you have primary key in your t3 table?
>>
>>> In Postgresql, what this does is to return the FIRST entire row matching 
>>> each distinct idbucket result.
>> FIRST based on which sort?
>>
>> Let's say you want to return the FIRST row based on the t3.windowstart column
>> and you have primary key in t3 table. Then I believe your query can be
>> rewritten in the following ways:
>>
>> 1. Using subqueries
>> SELECT
>>    bucket, primary_key, windowstart, etc
>> FROM
>>    table AS t1
>> WHERE
>>    windowstart=( SELECT max(windowstart) FROM table AS t2 WHERE
>> bucket = t1.bucket )
>>
>> 2. Using joins instead of subqueries ( in case Derby doesn't support
>> subqueries - not sure about that )
>> SELECT
>>    t1.bucket, t1.primary_key, windowstart, etc
>> FROM
>>    table AS t1
>>    LEFT OUTER JOIN table AS t2 ON ( t1.bucket = t2.bucket AND
>> t2.windowstart > t1.windowstart )
>> WHERE
>>    t2.primary_key IS NULL
>>
>> HTH,
>> Alex
>>
>> On Sat, Sep 18, 2010 at 2:28 PM, Karl Wright  wrote:
>>> Hi Folks,
>>>
>>> For two of the report queries, ACF uses the following Postgresql
>>> construct, which sadly seems to have no Derby equivalent:
>>>
>>> SELECT DISTINCT ON (idbucket) t3.bucket AS idbucket, t3.activitycount
>>> AS activitycount, t3.windowstart AS starttime, t3.windowend AS endtime
>>> FROM (...) t3
>>>
>>> In Postgresql, what this does is to return the FIRST entire row
>>> matching each distinct idbucket result.  If Derby had a "FIRST()"
>>> aggregate function, it would be the equivalent of:
>>>
>>> SELECT t3.bucket AS idbucket, FIRST(t3.activitycount) AS
>>> activitycount, FIRST(t3.windowstart) AS starttime, FIRST(t3.windowend)
>>> AS endtime FROM (...) t3 GROUP BY t3.bucket
>>>
>>> Unfortunately, Derby has no such aggregate function.  Furthermore, it
>>> would not be ideal if I were to do the work myself in ACF, because
>>> this is a resultset that needs to be paged through with offset and
>>> length, for presentation to the user and sorting, so it gets wrapped
>>> in another SELECT ... FROM (...) ORDER BY ... OFFSET ... LIMIT ...
>>> that does that part.
>>>
>>> Does anyone have any ideas and/or Derby contacts?  I'd really like the
>>> quick-start example to have a functional set of reports.
>>>
>>> Karl
>>>
>>
>


Re: Derby SQL ideas needed

2010-09-19 Thread Karl Wright
"FIRST based on which sort?"

First based on the existing sort, which is crucial, because the sort
is by bucket ASC, activitycount DESC.  I'm looking for the row with
the highest activitycount, per bucket.

The other thing is that we cannot afford to use the same "table"
twice, as it is actually an extremely expensive query in its own
right, with multiple joins, SELECT DISTINCTs, etc. under the covers.
I'd be happy to post it but it may shock you. ;-)

Karl
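The requirement above (return the row with the highest activitycount per bucket, without evaluating the expensive inner query twice) can be sketched by materializing the inner result once and then applying a correlated subquery to it. The sketch below uses Python's sqlite3 purely as a stand-in engine, since SQLite, like Derby at the time, has no DISTINCT ON; the table and column names are illustrative, not ManifoldCF's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical stand-in for the expensive report subquery: materialize
# its result once into a temp table so the correlated lookup does not
# have to re-run the multi-join inner query.
conn.execute(
    "CREATE TEMP TABLE t3 (bucket TEXT, activitycount INTEGER, windowstart INTEGER)"
)
conn.executemany(
    "INSERT INTO t3 VALUES (?, ?, ?)",
    [("a", 3, 10), ("a", 9, 20), ("b", 4, 5)],
)

# Highest activitycount per bucket via a correlated subquery against
# the materialized table.
rows = conn.execute("""
    SELECT bucket, activitycount, windowstart
    FROM t3 AS t1
    WHERE activitycount = (SELECT max(activitycount)
                           FROM t3 AS t2
                           WHERE t2.bucket = t1.bucket)
    ORDER BY bucket
""").fetchall()
print(rows)  # -> [('a', 9, 20), ('b', 4, 5)]
```

One caveat: if two rows in the same bucket tie on activitycount, both survive; picking exactly one would need a secondary condition on a unique key.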





On Sun, Sep 19, 2010 at 11:32 AM, Alexey Serba  wrote:
>> SELECT DISTINCT ON (idbucket) t3.bucket AS idbucket, t3.activitycount AS 
>> activitycount, t3.windowstart AS starttime, t3.windowend AS endtime FROM 
>> (...) t3
> Do you have a primary key in your t3 table?
>
>> In Postgresql, what this does is to return the FIRST entire row matching 
>> each distinct idbucket result.
> FIRST based on which sort?
>
> Let's say you want to return the FIRST row based on the t3.windowstart column
> and you have a primary key in the t3 table. Then I believe your query can be
> rewritten in the following ways:
>
> 1. Using subqueries
> SELECT
>    bucket, primary_key, windowstart, etc
> FROM
>    table AS t1
> WHERE
>    windowstart=( SELECT max(windowstart) FROM table AS t2 WHERE
> bucket = t1.bucket )
>
> 2. Using joins instead of subqueries ( in case Derby doesn't support
> subqueries - not sure about that )
> SELECT
>    t1.bucket, t1.primary_key, windowstart, etc
> FROM
>    table AS t1
>    LEFT OUTER JOIN table AS t2 ON ( t1.bucket = t2.bucket AND
> t2.windowstart > t1.windowstart )
> WHERE
>    t2.primary_key IS NULL
>
> HTH,
> Alex
>
> On Sat, Sep 18, 2010 at 2:28 PM, Karl Wright  wrote:
>> Hi Folks,
>>
>> For two of the report queries, ACF uses the following Postgresql
>> construct, which sadly seems to have no Derby equivalent:
>>
>> SELECT DISTINCT ON (idbucket) t3.bucket AS idbucket, t3.activitycount
>> AS activitycount, t3.windowstart AS starttime, t3.windowend AS endtime
>> FROM (...) t3
>>
>> In Postgresql, what this does is to return the FIRST entire row
>> matching each distinct idbucket result.  If Derby had a "FIRST()"
>> aggregate function, it would be the equivalent of:
>>
>> SELECT t3.bucket AS idbucket, FIRST(t3.activitycount) AS
>> activitycount, FIRST(t3.windowstart) AS starttime, FIRST(t3.windowend)
>> AS endtime FROM (...) t3 GROUP BY t3.bucket
>>
>> Unfortunately, Derby has no such aggregate function.  Furthermore, it
>> would not be ideal if I were to do the work myself in ACF, because
>> this is a resultset that needs to be paged through with offset and
>> length, for presentation to the user and sorting, so it gets wrapped
>> in another SELECT ... FROM (...) ORDER BY ... OFFSET ... LIMIT ...
>> that does that part.
>>
>> Does anyone have any ideas and/or Derby contacts?  I'd really like the
>> quick-start example to have a functional set of reports.
>>
>> Karl
>>
>


Re: Derby SQL ideas needed

2010-09-19 Thread Alexey Serba
> SELECT DISTINCT ON (idbucket) t3.bucket AS idbucket, t3.activitycount AS 
> activitycount, t3.windowstart AS starttime, t3.windowend AS endtime FROM 
> (...) t3
Do you have a primary key in your t3 table?

> In Postgresql, what this does is to return the FIRST entire row matching each 
> distinct idbucket result.
FIRST based on which sort?

Let's say you want to return the FIRST row based on the t3.windowstart column
and you have a primary key in the t3 table. Then I believe your query can be
rewritten in the following ways:

1. Using subqueries
SELECT
   bucket, primary_key, windowstart, etc
FROM
   table AS t1
WHERE
   windowstart = ( SELECT max(windowstart) FROM table AS t2 WHERE
bucket = t1.bucket )

2. Using joins instead of subqueries ( in case Derby doesn't support
subqueries - not sure about that )
SELECT
   t1.bucket, t1.primary_key, windowstart, etc
FROM
   table AS t1
   LEFT OUTER JOIN table AS t2 ON ( t1.bucket = t2.bucket AND
t2.windowstart > t1.windowstart )
WHERE
   t2.primary_key IS NULL

HTH,
Alex

On Sat, Sep 18, 2010 at 2:28 PM, Karl Wright  wrote:
> Hi Folks,
>
> For two of the report queries, ACF uses the following Postgresql
> construct, which sadly seems to have no Derby equivalent:
>
> SELECT DISTINCT ON (idbucket) t3.bucket AS idbucket, t3.activitycount
> AS activitycount, t3.windowstart AS starttime, t3.windowend AS endtime
> FROM (...) t3
>
> In Postgresql, what this does is to return the FIRST entire row
> matching each distinct idbucket result.  If Derby had a "FIRST()"
> aggregate function, it would be the equivalent of:
>
> SELECT t3.bucket AS idbucket, FIRST(t3.activitycount) AS
> activitycount, FIRST(t3.windowstart) AS starttime, FIRST(t3.windowend)
> AS endtime FROM (...) t3 GROUP BY t3.bucket
>
> Unfortunately, Derby has no such aggregate function.  Furthermore, it
> would not be ideal if I were to do the work myself in ACF, because
> this is a resultset that needs to be paged through with offset and
> length, for presentation to the user and sorting, so it gets wrapped
> in another SELECT ... FROM (...) ORDER BY ... OFFSET ... LIMIT ...
> that does that part.
>
> Does anyone have any ideas and/or Derby contacts?  I'd really like the
> quick-start example to have a functional set of reports.
>
> Karl
>
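For reference, Alexey's anti-join rewrite can be exercised end-to-end with an embedded database. The sketch below uses Python's sqlite3 purely as a stand-in engine (SQLite, like Derby at the time, lacks Postgresql's DISTINCT ON); the table and column names are illustrative, not ManifoldCF's actual schema.

```python
import sqlite3

# Hypothetical sample data standing in for the report subquery "t3".
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t3 (pk INTEGER PRIMARY KEY, bucket TEXT, windowstart INTEGER)"
)
conn.executemany(
    "INSERT INTO t3 (bucket, windowstart) VALUES (?, ?)",
    [("a", 10), ("a", 30), ("b", 5), ("b", 7)],
)

# Anti-join rewrite: a row survives only if no other row in the same
# bucket has a larger windowstart (i.e. the LEFT JOIN finds no match,
# so t2.pk IS NULL).
rows = conn.execute("""
    SELECT t1.bucket, t1.windowstart
    FROM t3 AS t1
    LEFT OUTER JOIN t3 AS t2
      ON t1.bucket = t2.bucket AND t2.windowstart > t1.windowstart
    WHERE t2.pk IS NULL
    ORDER BY t1.bucket
""").fetchall()
print(rows)  # -> [('a', 30), ('b', 7)]
```

The unique primary key matters: without it, the `IS NULL` test has no column that is guaranteed non-null on a real match.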


[jira] Commented: (CONNECTORS-110) Max activity and Max bandwidth reports fail under Derby with a stack trace

2010-09-19 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12912212#action_12912212
 ] 

Karl Wright commented on CONNECTORS-110:


Checked in a partial solution to this issue.  At least the reports don't fail 
with an exception now, but they also list all time intervals on Derby instead 
of collapsing and reporting just the maximum, which will make these reports far 
less useful.  r998635.

> Max activity and Max bandwidth reports fail under Derby with a stack trace
> --
>
> Key: CONNECTORS-110
> URL: https://issues.apache.org/jira/browse/CONNECTORS-110
> Project: Apache Connectors Framework
>  Issue Type: Bug
>  Components: Framework crawler agent
>Reporter: Karl Wright
>
> The reason for the failure is that the queries use the 
> Postgresql DISTINCT ON (xxx) syntax, which Derby does not support.  
> Unfortunately, there does not seem to be a way in Derby at present to do 
> anything similar to DISTINCT ON (xxx), and the queries really can't be done 
> without that.
> One option is to introduce a getCapabilities() method into the database 
> implementation, which would allow ACF to query the database capabilities 
> before even presenting the report in the navigation menu in the UI.  Another 
> alternative is to do a sizable chunk of resultset processing within ACF, 
> which would require not only the DISTINCT ON() implementation, but also the 
> enclosing sort and limit stuff.  It's the latter that would be most 
> challenging, because of the difficulties with i18n etc.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (CONNECTORS-110) Max activity and Max bandwidth reports fail under Derby with a stack trace

2010-09-18 Thread Karl Wright (JIRA)
Max activity and Max bandwidth reports fail under Derby with a stack trace
--

 Key: CONNECTORS-110
 URL: https://issues.apache.org/jira/browse/CONNECTORS-110
 Project: Apache Connectors Framework
  Issue Type: Bug
  Components: Framework crawler agent
Reporter: Karl Wright


The reason for the failure is that the queries use the Postgresql 
DISTINCT ON (xxx) syntax, which Derby does not support.  Unfortunately, there 
does not seem to be a way in Derby at present to do anything similar to 
DISTINCT ON (xxx), and the queries really can't be done without that.

One option is to introduce a getCapabilities() method into the database 
implementation, which would allow ACF to query the database capabilities before 
even presenting the report in the navigation menu in the UI.  Another 
alternative is to do a sizable chunk of resultset processing within ACF, which 
would require not only the DISTINCT ON() implementation, but also the enclosing 
sort and limit stuff.  It's the latter that would be most challenging, because 
of the difficulties with i18n etc.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (CONNECTORS-109) Queue status report fails under Derby

2010-09-18 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12911146#action_12911146
 ] 

Karl Wright commented on CONNECTORS-109:


Committed the set of changes necessary to use the DERBY-4066 fix properly when 
it becomes available.  r998576.


> Queue status report fails under Derby
> -
>
> Key: CONNECTORS-109
> URL: https://issues.apache.org/jira/browse/CONNECTORS-109
> Project: Apache Connectors Framework
>  Issue Type: Bug
>  Components: Framework crawler agent
>Reporter: Karl Wright
>
> If you try to use the queue status report with Derby as the database, you get 
> the following error:
> 2010-09-17 18:03:21.558:WARN::Nested in javax.servlet.ServletException: 
> org.apac
> he.acf.core.interfaces.ACFException: Database exception: Exception doing 
> query:
> Syntax error: Encountered "SUBSTRING" at line 1, column 8.:
> org.apache.acf.core.interfaces.ACFException: Database exception: Exception 
> doing
>  query: Syntax error: Encountered "SUBSTRING" at line 1, column 8.
> at 
> org.apache.acf.core.database.Database.executeViaThread(Database.java:
> 421)
> at 
> org.apache.acf.core.database.Database.executeUncachedQuery(Database.j
> ava:465)
> at 
> org.apache.acf.core.database.Database$QueryCacheExecutor.create(Datab
> ase.java:1072)
> at 
> org.apache.acf.core.cachemanager.CacheManager.findObjectsAndExecute(C
> acheManager.java:144)
> at 
> org.apache.acf.core.database.Database.executeQuery(Database.java:167)
> at 
> org.apache.acf.core.database.DBInterfaceDerby.performQuery(DBInterfac
> eDerby.java:751)
> at 
> org.apache.acf.crawler.jobs.JobManager.genQueueStatus(JobManager.java
> :5981)
> at 
> org.apache.jsp.queuestatus_jsp._jspService(queuestatus_jsp.java:769)
> at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
> at 
> org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper
> .java:377)
> at 
> org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:3
> 13)
> at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:260)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
> at 
> org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511
> )
> at 
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:3
> 90)
> at 
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.jav
> a:216)
> at 
> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:1
> 82)
> at 
> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:7
> 65)
> at 
> org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418)
> at org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:327)
> at org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:126)
> at 
> org.apache.jasper.runtime.PageContextImpl.doForward(PageContextImpl.j
> ava:706)
> at 
> org.apache.jasper.runtime.PageContextImpl.forward(PageContextImpl.jav
> a:677)
> at org.apache.jsp.execute_jsp._jspService(execute_jsp.java:1291)
> at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
> at 
> org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper
> .java:377)
> at 
> org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:3
> 13)
> at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:260)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
> at 
> org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511
> )
> at 
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:3
> 90)
> at 
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.jav
> a:216)
> at 
> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:1
> 82)
> at 
> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:7
> 65)
> at 
> org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418)
> at 
> org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.
> java:114)
> 

[jira] Commented: (CONNECTORS-109) Queue status report fails under Derby

2010-09-18 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1293#action_1293
 ] 

Karl Wright commented on CONNECTORS-109:


Made most of the necessary code changes to correct this problem locally, but 
can't commit them yet because Derby's functions are limited in the current 
release to not allow CLOB arguments.  This issue is going to be addressed in 
the next release of Derby, see DERBY-4066.  The alternative is to build a trunk 
version of Derby and use that instead.


> Queue status report fails under Derby
> -
>
> Key: CONNECTORS-109
> URL: https://issues.apache.org/jira/browse/CONNECTORS-109
> Project: Apache Connectors Framework
>  Issue Type: Bug
>  Components: Framework crawler agent
>Reporter: Karl Wright
>
> If you try to use the queue status report with Derby as the database, you get 
> the following error:
> 2010-09-17 18:03:21.558:WARN::Nested in javax.servlet.ServletException: 
> org.apac
> he.acf.core.interfaces.ACFException: Database exception: Exception doing 
> query:
> Syntax error: Encountered "SUBSTRING" at line 1, column 8.:
> org.apache.acf.core.interfaces.ACFException: Database exception: Exception 
> doing
>  query: Syntax error: Encountered "SUBSTRING" at line 1, column 8.
> at 
> org.apache.acf.core.database.Database.executeViaThread(Database.java:
> 421)
> at 
> org.apache.acf.core.database.Database.executeUncachedQuery(Database.j
> ava:465)
> at 
> org.apache.acf.core.database.Database$QueryCacheExecutor.create(Datab
> ase.java:1072)
> at 
> org.apache.acf.core.cachemanager.CacheManager.findObjectsAndExecute(C
> acheManager.java:144)
> at 
> org.apache.acf.core.database.Database.executeQuery(Database.java:167)
> at 
> org.apache.acf.core.database.DBInterfaceDerby.performQuery(DBInterfac
> eDerby.java:751)
> at 
> org.apache.acf.crawler.jobs.JobManager.genQueueStatus(JobManager.java
> :5981)
> at 
> org.apache.jsp.queuestatus_jsp._jspService(queuestatus_jsp.java:769)
> at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
> at 
> org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper
> .java:377)
> at 
> org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:3
> 13)
> at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:260)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
> at 
> org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511
> )
> at 
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:3
> 90)
> at 
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.jav
> a:216)
> at 
> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:1
> 82)
> at 
> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:7
> 65)
> at 
> org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418)
> at org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:327)
> at org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:126)
> at 
> org.apache.jasper.runtime.PageContextImpl.doForward(PageContextImpl.j
> ava:706)
> at 
> org.apache.jasper.runtime.PageContextImpl.forward(PageContextImpl.jav
> a:677)
> at org.apache.jsp.execute_jsp._jspService(execute_jsp.java:1291)
> at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
> at 
> org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper
> .java:377)
> at 
> org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:3
> 13)
> at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:260)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
> at 
> org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511
> )
> at 
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:3
> 90)
> at 
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.jav
> a:216)
> at 
> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:1
> 82)
> at 
> org.mortbay.jetty.handler.ContextHandler.handle(ContextH

Re: Derby SQL ideas needed

2010-09-18 Thread Karl Wright
The Derby table-result function syntax requires all output columns to
be declared as part of the function definition, and more importantly
it does not seem to allow calls into Derby itself to get results.  So
this would not seem to be a viable option for that reason.

Back to square 1, I guess.  Derby doesn't seem to allow any way to
declare aggregate functions either, so I couldn't declare a FIRST()
aggregate method as proposed below.  Simple arithmetic functions seem
like they would work, but that's not helpful here.

Karl



On Sat, Sep 18, 2010 at 6:45 AM, Karl Wright  wrote:
> For what it's worth, defining a Derby function seems like the only way
> to do it.  These seem to call arbitrary java that can accept a query
> as an argument and return a resultset as the result.  But in order to
> write such a thing I will need the ability to call Derby at a java
> level, I think, rather than through JDBC.  Still looking for a good
> example from somebody who has done something similar.
>
> Karl
>
> On Sat, Sep 18, 2010 at 6:28 AM, Karl Wright  wrote:
>> Hi Folks,
>>
>> For two of the report queries, ACF uses the following Postgresql
>> construct, which sadly seems to have no Derby equivalent:
>>
>> SELECT DISTINCT ON (idbucket) t3.bucket AS idbucket, t3.activitycount
>> AS activitycount, t3.windowstart AS starttime, t3.windowend AS endtime
>> FROM (...) t3
>>
>> In Postgresql, what this does is to return the FIRST entire row
>> matching each distinct idbucket result.  If Derby had a "FIRST()"
>> aggregate function, it would be the equivalent of:
>>
>> SELECT t3.bucket AS idbucket, FIRST(t3.activitycount) AS
>> activitycount, FIRST(t3.windowstart) AS starttime, FIRST(t3.windowend)
>> AS endtime FROM (...) t3 GROUP BY t3.bucket
>>
>> Unfortunately, Derby has no such aggregate function.  Furthermore, it
>> would not be ideal if I were to do the work myself in ACF, because
>> this is a resultset that needs to be paged through with offset and
>> length, for presentation to the user and sorting, so it gets wrapped
>> in another SELECT ... FROM (...) ORDER BY ... OFFSET ... LIMIT ...
>> that does that part.
>>
>> Does anyone have any ideas and/or Derby contacts?  I'd really like the
>> quick-start example to have a functional set of reports.
>>
>> Karl
>>
>


Re: Derby SQL ideas needed

2010-09-18 Thread Karl Wright
For what it's worth, defining a Derby function seems like the only way
to do it.  These seem to call arbitrary Java that can accept a query
as an argument and return a resultset as the result.  But in order to
write such a thing I will need the ability to call Derby at a Java
level, I think, rather than through JDBC.  Still looking for a good
example from somebody who has done something similar.

Karl

On Sat, Sep 18, 2010 at 6:28 AM, Karl Wright  wrote:
> Hi Folks,
>
> For two of the report queries, ACF uses the following Postgresql
> construct, which sadly seems to have no Derby equivalent:
>
> SELECT DISTINCT ON (idbucket) t3.bucket AS idbucket, t3.activitycount
> AS activitycount, t3.windowstart AS starttime, t3.windowend AS endtime
> FROM (...) t3
>
> In Postgresql, what this does is to return the FIRST entire row
> matching each distinct idbucket result.  If Derby had a "FIRST()"
> aggregate function, it would be the equivalent of:
>
> SELECT t3.bucket AS idbucket, FIRST(t3.activitycount) AS
> activitycount, FIRST(t3.windowstart) AS starttime, FIRST(t3.windowend)
> AS endtime FROM (...) t3 GROUP BY t3.bucket
>
> Unfortunately, Derby has no such aggregate function.  Furthermore, it
> would not be ideal if I were to do the work myself in ACF, because
> this is a resultset that needs to be paged through with offset and
> length, for presentation to the user and sorting, so it gets wrapped
> in another SELECT ... FROM (...) ORDER BY ... OFFSET ... LIMIT ...
> that does that part.
>
> Does anyone have any ideas and/or Derby contacts?  I'd really like the
> quick-start example to have a functional set of reports.
>
> Karl
>


Derby SQL ideas needed

2010-09-18 Thread Karl Wright
Hi Folks,

For two of the report queries, ACF uses the following Postgresql
construct, which sadly seems to have no Derby equivalent:

SELECT DISTINCT ON (idbucket) t3.bucket AS idbucket, t3.activitycount
AS activitycount, t3.windowstart AS starttime, t3.windowend AS endtime
FROM (...) t3

In Postgresql, what this does is to return the FIRST entire row
matching each distinct idbucket result.  If Derby had a "FIRST()"
aggregate function, it would be the equivalent of:

SELECT t3.bucket AS idbucket, FIRST(t3.activitycount) AS
activitycount, FIRST(t3.windowstart) AS starttime, FIRST(t3.windowend)
AS endtime FROM (...) t3 GROUP BY t3.bucket

Unfortunately, Derby has no such aggregate function.  Furthermore, it
would not be ideal if I were to do the work myself in ACF, because
this is a resultset that needs to be paged through with offset and
length, for presentation to the user and sorting, so it gets wrapped
in another SELECT ... FROM (...) ORDER BY ... OFFSET ... LIMIT ...
that does that part.

Does anyone have any ideas and/or Derby contacts?  I'd really like the
quick-start example to have a functional set of reports.

Karl


[jira] Commented: (CONNECTORS-109) Queue status report fails under Derby

2010-09-17 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910822#action_12910822
 ] 

Karl Wright commented on CONNECTORS-109:


The same is true of the maximum activity report, maximum bandwidth report, and 
result code report as well.


> Queue status report fails under Derby
> -
>
> Key: CONNECTORS-109
> URL: https://issues.apache.org/jira/browse/CONNECTORS-109
> Project: Apache Connectors Framework
>  Issue Type: Bug
>  Components: Framework crawler agent
>Reporter: Karl Wright
>
> If you try to use the queue status report with Derby as the database, you get 
> the following error:
> 2010-09-17 18:03:21.558:WARN::Nested in javax.servlet.ServletException: 
> org.apac
> he.acf.core.interfaces.ACFException: Database exception: Exception doing 
> query:
> Syntax error: Encountered "SUBSTRING" at line 1, column 8.:
> org.apache.acf.core.interfaces.ACFException: Database exception: Exception 
> doing
>  query: Syntax error: Encountered "SUBSTRING" at line 1, column 8.
> at 
> org.apache.acf.core.database.Database.executeViaThread(Database.java:
> 421)
> at 
> org.apache.acf.core.database.Database.executeUncachedQuery(Database.j
> ava:465)
> at 
> org.apache.acf.core.database.Database$QueryCacheExecutor.create(Datab
> ase.java:1072)
> at 
> org.apache.acf.core.cachemanager.CacheManager.findObjectsAndExecute(C
> acheManager.java:144)
> at 
> org.apache.acf.core.database.Database.executeQuery(Database.java:167)
> at 
> org.apache.acf.core.database.DBInterfaceDerby.performQuery(DBInterfac
> eDerby.java:751)
> at 
> org.apache.acf.crawler.jobs.JobManager.genQueueStatus(JobManager.java
> :5981)
> at 
> org.apache.jsp.queuestatus_jsp._jspService(queuestatus_jsp.java:769)
> at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
> at 
> org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper
> .java:377)
> at 
> org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:3
> 13)
> at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:260)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
> at 
> org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511
> )
> at 
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:3
> 90)
> at 
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.jav
> a:216)
> at 
> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:1
> 82)
> at 
> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:7
> 65)
> at 
> org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418)
> at org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:327)
> at org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:126)
> at 
> org.apache.jasper.runtime.PageContextImpl.doForward(PageContextImpl.j
> ava:706)
> at 
> org.apache.jasper.runtime.PageContextImpl.forward(PageContextImpl.jav
> a:677)
> at org.apache.jsp.execute_jsp._jspService(execute_jsp.java:1291)
> at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
> at 
> org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper
> .java:377)
> at 
> org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:3
> 13)
> at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:260)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
> at 
> org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511
> )
> at 
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:3
> 90)
> at 
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.jav
> a:216)
> at 
> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:1
> 82)
> at 
> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:7
> 65)
> at 
> org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418)
> at 
> org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.
> java:114)
> 

[jira] Created: (CONNECTORS-109) Queue status report fails under Derby

2010-09-17 Thread Karl Wright (JIRA)
Queue status report fails under Derby
-

 Key: CONNECTORS-109
 URL: https://issues.apache.org/jira/browse/CONNECTORS-109
 Project: Apache Connectors Framework
  Issue Type: Bug
  Components: Framework crawler agent
Reporter: Karl Wright


If you try to use the queue status report with Derby as the database, you get 
the following error:

2010-09-17 18:03:21.558:WARN::Nested in javax.servlet.ServletException: org.apac
he.acf.core.interfaces.ACFException: Database exception: Exception doing query:
Syntax error: Encountered "SUBSTRING" at line 1, column 8.:
org.apache.acf.core.interfaces.ACFException: Database exception: Exception doing
 query: Syntax error: Encountered "SUBSTRING" at line 1, column 8.
at org.apache.acf.core.database.Database.executeViaThread(Database.java:
421)
at org.apache.acf.core.database.Database.executeUncachedQuery(Database.j
ava:465)
at org.apache.acf.core.database.Database$QueryCacheExecutor.create(Datab
ase.java:1072)
at org.apache.acf.core.cachemanager.CacheManager.findObjectsAndExecute(C
acheManager.java:144)
at org.apache.acf.core.database.Database.executeQuery(Database.java:167)

at org.apache.acf.core.database.DBInterfaceDerby.performQuery(DBInterfac
eDerby.java:751)
at org.apache.acf.crawler.jobs.JobManager.genQueueStatus(JobManager.java
:5981)
at org.apache.jsp.queuestatus_jsp._jspService(queuestatus_jsp.java:769)
at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper
.java:377)
at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:3
13)
at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:260)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511
)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:3
90)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.jav
a:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:1
82)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:7
65)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418)

at org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:327)
at org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:126)
at org.apache.jasper.runtime.PageContextImpl.doForward(PageContextImpl.j
ava:706)
at org.apache.jasper.runtime.PageContextImpl.forward(PageContextImpl.jav
a:677)
at org.apache.jsp.execute_jsp._jspService(execute_jsp.java:1291)
at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper
.java:377)
at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:3
13)
at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:260)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511
)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:3
90)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.jav
a:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:1
82)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:7
65)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418)

at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.
java:114)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:1
52)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:54
2)
at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnectio
n.java:938)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:755)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.
java:228)
at 
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:502)

The reason for the error is that Derby does not recognize the SUBSTRING(...) 
operation, which extracts parts of a string based on a regular expression.  In 
other places in Derby where regular expressions were required, I've been 
success

RE: Derby/JUnit bad interaction - any ideas?

2010-06-09 Thread karl.wright
This actually did work, oddly enough.  I wonder how Derby is undoing the 
read-only attribute on those directories?  But in any case, I'm revamping the 
core setup/shutdown code again so that there's a decent hook in place to do the 
derby shutdown.

Karl


-Original Message-
From: ext Mark Miller [mailto:markrmil...@gmail.com] 
Sent: Wednesday, June 09, 2010 4:26 PM
To: connectors-dev@incubator.apache.org
Subject: Re: Derby/JUnit bad interaction - any ideas?

On 6/8/10 6:35 AM, karl.wri...@nokia.com wrote:
> I've been trying to get some basic tests working under Junit.  Unfortunately, 
> I've run into a Derby problem which prevents these tests from working.
>
> What happens is this.  Derby, when it creates a database, forces a number of 
> directories within the database to "read-only".  Unfortunately, unless we 
> stipulate Java 1.6 or up, there is no native Java way to make these 
> directories become non-read-only.  So database cleanup always fails to 
> actually remove the old database, and then new database creation subsequently 
> fails.
>
> So there are two possibilities.  First, we can change things so we never 
> actually try to clean up the Derby DB.  Second, we can mandate the java 1.6 
> is used for LCF.  That's all there really is.
>
> The first possibility is tricky but doable - I think.  The second would 
> probably be unacceptable in many ways.
>
> Thoughts?
>
> Karl
>
>
>
>

So I've been thinking about this - I still have trouble believing this 
is a real problem. I had a large suite of tests that used embedded derby 
in a system I worked on a few years back - and I never had any trouble 
removing the db dir after shutting down derby.

Looking at the code, have you actually tried shutting down derby?

Currently you have:

 // Cause database to shut down
 new Database(context,_url+databaseName+";shutdown=true",_driver,databaseName,"","");
 // DO NOT delete user or shutdown database, since this is in fact impossible
 // under java 1.5 (since Derby makes its directories read-only, and
 // there's no way to undo that...)
 // rm -rf
 //File f = new File(databaseName);
 //recursiveDelete(f);

But that is not going to do the shutdown?
On a quick look, doing new Database(context, url ...
does not actually contact the db - so it's not going to cause it to shut down?

Is this just cruft code and you have actually tried shutting down as well?

Something makes me think the delete is going to work if you actually 
attempt to connect with '...;shutdown=true' jdbc URL.
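Mark's point can be sketched concretely. Connecting (rather than merely constructing a URL) with ;shutdown=true triggers the shutdown, and Derby reports a *successful* shutdown by throwing an SQLException (SQLState 08006 for a single database, XJ015 for the whole engine). The class below is an illustrative sketch, not LCF's actual Database class:

```java
import java.sql.DriverManager;
import java.sql.SQLException;

// Sketch of shutting down an embedded Derby database before cleanup.
// Derby signals a clean single-database shutdown by *throwing* SQLException
// with SQLState 08006 (XJ015 for a full engine shutdown), so the exception
// path is the success path here.
public class DerbyShutdown {
    /** Returns true if the SQLState indicates a clean Derby shutdown. */
    public static boolean isCleanShutdown(SQLException e) {
        String state = e.getSQLState();
        return "08006".equals(state) || "XJ015".equals(state);
    }

    public static void shutdown(String databaseName) {
        try {
            // getConnection actually contacts the engine, which is what
            // triggers the shutdown; new Database(...) alone does not.
            DriverManager.getConnection("jdbc:derby:" + databaseName + ";shutdown=true");
        } catch (SQLException e) {
            if (!isCleanShutdown(e)) {
                throw new IllegalStateException("Derby shutdown failed", e);
            }
        }
    }
}
```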

-- 
- Mark

http://www.lucidimagination.com


Re: Derby/JUnit bad interaction - any ideas?

2010-06-09 Thread Mark Miller

On 6/8/10 6:35 AM, karl.wri...@nokia.com wrote:

I've been trying to get some basic tests working under Junit.  Unfortunately, 
I've run into a Derby problem which prevents these tests from working.

What happens is this.  Derby, when it creates a database, forces a number of directories 
within the database to "read-only".  Unfortunately, unless we stipulate Java 
1.6 or up, there is no native Java way to make these directories become non-read-only.  
So database cleanup always fails to actually remove the old database, and then new 
database creation subsequently fails.

So there are two possibilities.  First, we can change things so we never 
actually try to clean up the Derby DB.  Second, we can mandate the java 1.6 is 
used for LCF.  That's all there really is.

The first possibility is tricky but doable - I think.  The second would 
probably be unacceptable in many ways.

Thoughts?

Karl






So I've been thinking about this - I still have trouble believing this 
is a real problem. I had a large suite of tests that used embedded derby 
in a system I worked on a few years back - and I never had any trouble 
removing the db dir after shutting down derby.


Looking at the code, have you actually tried shutting down derby?

Currently you have:

// Cause database to shut down
new Database(context,_url+databaseName+";shutdown=true",_driver,databaseName,"","");
// DO NOT delete user or shutdown database, since this is in fact impossible
// under java 1.5 (since Derby makes its directories read-only, and
// there's no way to undo that...)
// rm -rf
//File f = new File(databaseName);
//recursiveDelete(f);

But that is not going to do the shutdown?
On a quick look, doing new Database(context, url ...
does not actually contact the db - so it's not going to cause it to shut down?

Is this just cruft code and you have actually tried shutting down as well?

Something makes me think the delete is going to work if you actually 
attempt to connect with '...;shutdown=true' jdbc URL.


--
- Mark

http://www.lucidimagination.com


RE: Derby/JUnit bad interaction - any ideas?

2010-06-09 Thread karl.wright
I take this partially back.  The gcj JVM is the one that doesn't work with Ant.
At any rate, going to a different JVM is something I can only influence but 
can't control, so that's probably not going to happen for a while.

Karl


From: Wright Karl (Nokia-S/Cambridge)
Sent: Wednesday, June 09, 2010 5:24 AM
To: connectors-dev@incubator.apache.org
Subject: RE: Derby/JUnit bad interaction - any ideas?

OpenJDK does not seem to work properly with most Java applications at this 
time, although it has continued to improve.  Its switch incompatibilities stop 
it from working with Ant at this time, so one cannot even build LCF with it.

Karl


From: ext Olivier Bourgeat [olivier.bourg...@polyspot.com]
Sent: Wednesday, June 09, 2010 4:03 AM
To: connectors-dev@incubator.apache.org
Subject: RE: Derby/JUnit bad interaction - any ideas?

Debian Lenny has openjdk-6:
http://packages.debian.org/fr/source/lenny/openjdk-6

Olivier

On Tuesday, June 8, 2010 at 22:37 +0200, karl.wri...@nokia.com wrote:
> MetaCarta is running Debian Lenny, which does not have a 1.6 version of Java 
> available at this time.
>
> Karl
>
>
> -Original Message-
> From: ext Jack Krupansky [mailto:jack.krupan...@lucidimagination.com]
> Sent: Tuesday, June 08, 2010 4:36 PM
> To: connectors-dev@incubator.apache.org
> Subject: Re: Derby/JUnit bad interaction - any ideas?
>
> If we need to require Java 1.6, that is probably okay. I am fine with that.
> Does anybody have a serious objection to requiring Java 1.6 for LCF?
>
> -- Jack Krupansky
>
> --
> From: 
> Sent: Tuesday, June 08, 2010 6:35 AM
> To: 
> Subject: Derby/JUnit bad interaction - any ideas?
>
> > I've been trying to get some basic tests working under Junit.
> > Unfortunately, I've run into a Derby problem which prevents these tests
> > from working.
> >
> > What happens is this.  Derby, when it creates a database, forces a number
> > of directories within the database to "read-only".  Unfortunately, unless
> > we stipulate Java 1.6 or up, there is no native Java way to make these
> > directories become non-read-only.  So database cleanup always fails to
> > actually remove the old database, and then new database creation
> > subsequently fails.
> >
> > So there are two possibilities.  First, we can change things so we never
> > actually try to clean up the Derby DB.  Second, we can mandate the java
> > 1.6 is used for LCF.  That's all there really is.
> >
> > The first possibility is tricky but doable - I think.  The second would
> > probably be unacceptable in many ways.
> >
> > Thoughts?
> >
> > Karl
> >
> >
> >
> >






RE: Derby/JUnit bad interaction - any ideas?

2010-06-09 Thread karl.wright
OpenJDK does not seem to work properly with most Java applications at this 
time, although it has continued to improve.  Its switch incompatibilities stop 
it from working with Ant at this time, so one cannot even build LCF with it.

Karl


From: ext Olivier Bourgeat [olivier.bourg...@polyspot.com]
Sent: Wednesday, June 09, 2010 4:03 AM
To: connectors-dev@incubator.apache.org
Subject: RE: Derby/JUnit bad interaction - any ideas?

Debian Lenny has openjdk-6:
http://packages.debian.org/fr/source/lenny/openjdk-6

Olivier

On Tuesday, June 8, 2010 at 22:37 +0200, karl.wri...@nokia.com wrote:
> MetaCarta is running Debian Lenny, which does not have a 1.6 version of Java 
> available at this time.
>
> Karl
>
>
> -Original Message-
> From: ext Jack Krupansky [mailto:jack.krupan...@lucidimagination.com]
> Sent: Tuesday, June 08, 2010 4:36 PM
> To: connectors-dev@incubator.apache.org
> Subject: Re: Derby/JUnit bad interaction - any ideas?
>
> If we need to require Java 1.6, that is probably okay. I am fine with that.
> Does anybody have a serious objection to requiring Java 1.6 for LCF?
>
> -- Jack Krupansky
>
> --
> From: 
> Sent: Tuesday, June 08, 2010 6:35 AM
> To: 
> Subject: Derby/JUnit bad interaction - any ideas?
>
> > I've been trying to get some basic tests working under Junit.
> > Unfortunately, I've run into a Derby problem which prevents these tests
> > from working.
> >
> > What happens is this.  Derby, when it creates a database, forces a number
> > of directories within the database to "read-only".  Unfortunately, unless
> > we stipulate Java 1.6 or up, there is no native Java way to make these
> > directories become non-read-only.  So database cleanup always fails to
> > actually remove the old database, and then new database creation
> > subsequently fails.
> >
> > So there are two possibilities.  First, we can change things so we never
> > actually try to clean up the Derby DB.  Second, we can mandate the java
> > 1.6 is used for LCF.  That's all there really is.
> >
> > The first possibility is tricky but doable - I think.  The second would
> > probably be unacceptable in many ways.
> >
> > Thoughts?
> >
> > Karl
> >
> >
> >
> >






RE: Derby/JUnit bad interaction - any ideas?

2010-06-09 Thread Olivier Bourgeat
Debian Lenny has openjdk-6:
http://packages.debian.org/fr/source/lenny/openjdk-6 

Olivier

On Tuesday, June 8, 2010 at 22:37 +0200, karl.wri...@nokia.com wrote:
> MetaCarta is running Debian Lenny, which does not have a 1.6 version of Java 
> available at this time.
> 
> Karl
> 
> 
> -Original Message-
> From: ext Jack Krupansky [mailto:jack.krupan...@lucidimagination.com] 
> Sent: Tuesday, June 08, 2010 4:36 PM
> To: connectors-dev@incubator.apache.org
> Subject: Re: Derby/JUnit bad interaction - any ideas?
> 
> If we need to require Java 1.6, that is probably okay. I am fine with that. 
> Does anybody have a serious objection to requiring Java 1.6 for LCF?
> 
> -- Jack Krupansky
> 
> --
> From: 
> Sent: Tuesday, June 08, 2010 6:35 AM
> To: 
> Subject: Derby/JUnit bad interaction - any ideas?
> 
> > I've been trying to get some basic tests working under Junit. 
> > Unfortunately, I've run into a Derby problem which prevents these tests 
> > from working.
> >
> > What happens is this.  Derby, when it creates a database, forces a number 
> > of directories within the database to "read-only".  Unfortunately, unless 
> > we stipulate Java 1.6 or up, there is no native Java way to make these 
> > directories become non-read-only.  So database cleanup always fails to 
> > actually remove the old database, and then new database creation 
> > subsequently fails.
> >
> > So there are two possibilities.  First, we can change things so we never 
> > actually try to clean up the Derby DB.  Second, we can mandate the java 
> > 1.6 is used for LCF.  That's all there really is.
> >
> > The first possibility is tricky but doable - I think.  The second would 
> > probably be unacceptable in many ways.
> >
> > Thoughts?
> >
> > Karl
> >
> >
> >
> > 






RE: Derby/JUnit bad interaction - any ideas?

2010-06-08 Thread karl.wright
MetaCarta is running Debian Lenny, which does not have a 1.6 version of Java 
available at this time.

Karl


-Original Message-
From: ext Jack Krupansky [mailto:jack.krupan...@lucidimagination.com] 
Sent: Tuesday, June 08, 2010 4:36 PM
To: connectors-dev@incubator.apache.org
Subject: Re: Derby/JUnit bad interaction - any ideas?

If we need to require Java 1.6, that is probably okay. I am fine with that. 
Does anybody have a serious objection to requiring Java 1.6 for LCF?

-- Jack Krupansky

--
From: 
Sent: Tuesday, June 08, 2010 6:35 AM
To: 
Subject: Derby/JUnit bad interaction - any ideas?

> I've been trying to get some basic tests working under Junit. 
> Unfortunately, I've run into a Derby problem which prevents these tests 
> from working.
>
> What happens is this.  Derby, when it creates a database, forces a number 
> of directories within the database to "read-only".  Unfortunately, unless 
> we stipulate Java 1.6 or up, there is no native Java way to make these 
> directories become non-read-only.  So database cleanup always fails to 
> actually remove the old database, and then new database creation 
> subsequently fails.
>
> So there are two possibilities.  First, we can change things so we never 
> actually try to clean up the Derby DB.  Second, we can mandate the java 
> 1.6 is used for LCF.  That's all there really is.
>
> The first possibility is tricky but doable - I think.  The second would 
> probably be unacceptable in many ways.
>
> Thoughts?
>
> Karl
>
>
>
> 


Re: Derby/JUnit bad interaction - any ideas?

2010-06-08 Thread Jack Krupansky
If we need to require Java 1.6, that is probably okay. I am fine with that. 
Does anybody have a serious objection to requiring Java 1.6 for LCF?


-- Jack Krupansky

--
From: 
Sent: Tuesday, June 08, 2010 6:35 AM
To: 
Subject: Derby/JUnit bad interaction - any ideas?

I've been trying to get some basic tests working under Junit. 
Unfortunately, I've run into a Derby problem which prevents these tests 
from working.


What happens is this.  Derby, when it creates a database, forces a number 
of directories within the database to "read-only".  Unfortunately, unless 
we stipulate Java 1.6 or up, there is no native Java way to make these 
directories become non-read-only.  So database cleanup always fails to 
actually remove the old database, and then new database creation 
subsequently fails.


So there are two possibilities.  First, we can change things so we never 
actually try to clean up the Derby DB.  Second, we can mandate the java 
1.6 is used for LCF.  That's all there really is.


The first possibility is tricky but doable - I think.  The second would 
probably be unacceptable in many ways.


Thoughts?

Karl






RE: Derby/JUnit bad interaction - any ideas?

2010-06-08 Thread karl.wright
I just had a look at the sources.  Ant's chmod task queries what kind of OS it 
is, and if it is the right kind, it actually attempts to fire off the chmod 
utility. ;-)

That's pretty hacky.  Nice to avoid that if possible.
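For what it's worth, the same Unix-only trick can be done from test code without Ant. This is a sketch of shelling out to chmod(1) the way Ant's task does, not Ant's actual implementation; it is only meaningful on Unix-like systems:

```java
import java.io.File;
import java.io.IOException;

// Sketch of the Unix-only workaround Ant's chmod task uses: shelling out to
// the chmod(1) utility to make a directory tree writable again before
// deleting it. Hypothetical helper, not part of LCF or Ant.
public class ChmodHack {
    /** Runs "chmod -R u+w dir"; returns the chmod exit code (0 on success). */
    public static int makeWritable(File dir) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(
            "chmod", "-R", "u+w", dir.getAbsolutePath());
        pb.inheritIO();
        return pb.start().waitFor();
    }
}
```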

Now, I was able to get my current set of brain-dead tests to work OK (and the 
ant cleanup too!) by making sure that the database was properly cleaned after 
every use, and leaving it around for later.  It turns out that ant can delete 
the testing directory even though the directory underneath it has read-only 
stuff in it, even without the chmod.  This seems to be because when it fails 
any deletion, it simply calls f.deleteOnExit() and lets the JVM do it later - 
and apparently the JVM *can* do this, because it's implemented to just do an 
unlink at that time, which bypasses the need to actually delete any read-only 
subdirectories.

Oh my.  What a strange mess.

Still, things are currently working, so I guess I'll leave them as they are, 
for now.

Karl


-Original Message-
From: ext Koji Sekiguchi [mailto:k...@r.email.ne.jp] 
Sent: Tuesday, June 08, 2010 10:30 AM
To: connectors-dev@incubator.apache.org
Subject: Re: Derby/JUnit bad interaction - any ideas?

(10/06/08 23:14), karl.wri...@nokia.com wrote:
> Yeah, I was pretty surprised too.  But on windows it is likely that 
> File.makeReadOnly() (which is what Derby must be using) doesn't actually do 
> anything to directories, which would explain the discrepancy.
>
> Karl
>
>
If so, luckily the Ant hack can solve the problem on Linux.

Koji

-- 
http://www.rondhuit.com/en/



Re: Derby/JUnit bad interaction - any ideas?

2010-06-08 Thread Koji Sekiguchi

(10/06/08 23:14), karl.wri...@nokia.com wrote:

Yeah, I was pretty surprised too.  But on windows it is likely that 
File.makeReadOnly() (which is what Derby must be using) doesn't actually do 
anything to directories, which would explain the discrepancy.

Karl

   

If so, luckily the Ant hack can solve the problem on Linux.

Koji

--
http://www.rondhuit.com/en/



RE: Derby/JUnit bad interaction - any ideas?

2010-06-08 Thread karl.wright
Yeah, I was pretty surprised too.  But on windows it is likely that 
File.makeReadOnly() (which is what Derby must be using) doesn't actually do 
anything to directories, which would explain the discrepancy.

Karl


-Original Message-
From: ext Mark Miller [mailto:markrmil...@gmail.com] 
Sent: Tuesday, June 08, 2010 9:45 AM
To: connectors-dev@incubator.apache.org
Subject: Re: Derby/JUnit bad interaction - any ideas?

On 6/8/10 6:35 AM, karl.wri...@nokia.com wrote:
> I've been trying to get some basic tests working under Junit.  Unfortunately, 
> I've run into a Derby problem which prevents these tests from working.
>
> What happens is this.  Derby, when it creates a database, forces a number of 
> directories within the database to "read-only".  Unfortunately, unless we 
> stipulate Java 1.6 or up, there is no native Java way to make these 
> directories become non-read-only.  So database cleanup always fails to 
> actually remove the old database, and then new database creation subsequently 
> fails.
>
> So there are two possibilities.  First, we can change things so we never 
> actually try to clean up the Derby DB.  Second, we can mandate the java 1.6 
> is used for LCF.  That's all there really is.
>
> The first possibility is tricky but doable - I think.  The second would 
> probably be unacceptable in many ways.
>
> Thoughts?
>
> Karl
>
>
>
>

Interesting - when I worked with derby in the past, I never had any 
trouble deleting a database after shutting it down on windows using Java 
5. It worked great with my unit tests.

You could always run each test in a new system tmp dir every time...

I find it hard to believe you cannot delete the database somehow though 
- like I said, I never had any problems with it using embedded derby in 
the past after shutting down the db.

-- 
- Mark

http://www.lucidimagination.com


RE: Derby/JUnit bad interaction - any ideas?

2010-06-08 Thread karl.wright
Huh.  I wonder how ant is doing it?

Using the ant task directly makes it impossible to do this from within JUnit, 
of course, but maybe the same hack can be done inside the test stuff.

Karl

-Original Message-
From: ext Koji Sekiguchi [mailto:k...@r.email.ne.jp] 
Sent: Tuesday, June 08, 2010 10:08 AM
To: connectors-dev@incubator.apache.org
Subject: Re: Derby/JUnit bad interaction - any ideas?

(10/06/08 22:35), karl.wri...@nokia.com wrote:
> I've been trying to get some basic tests working under Junit.  Unfortunately, 
> I've run into a Derby problem which prevents these tests from working.
>
> What happens is this.  Derby, when it creates a database, forces a number of 
> directories within the database to "read-only".  Unfortunately, unless we 
> stipulate Java 1.6 or up, there is no native Java way to make these 
> directories become non-read-only.  So database cleanup always fails to 
> actually remove the old database, and then new database creation subsequently 
> fails.
>
> So there are two possibilities.  First, we can change things so we never 
> actually try to clean up the Derby DB.  Second, we can mandate the java 1.6 
> is used for LCF.  That's all there really is.
>
> The first possibility is tricky but doable - I think.  The second would 
> probably be unacceptable in many ways.
>
> Thoughts?
>
> Karl
>
Hi Karl,

If it is possible, Ant chmod task can be used, or
you can consult the implementation. But Ant manual
says for the task:

" Right now it has effect only under Unix or NonStop Kernel (Tandem)."
http://ant.apache.org/manual/Tasks/chmod.html

Koji

-- 
http://www.rondhuit.com/en/



Re: Derby/JUnit bad interaction - any ideas?

2010-06-08 Thread Koji Sekiguchi

(10/06/08 22:35), karl.wri...@nokia.com wrote:

I've been trying to get some basic tests working under Junit.  Unfortunately, 
I've run into a Derby problem which prevents these tests from working.

What happens is this.  Derby, when it creates a database, forces a number of directories 
within the database to "read-only".  Unfortunately, unless we stipulate Java 
1.6 or up, there is no native Java way to make these directories become non-read-only.  
So database cleanup always fails to actually remove the old database, and then new 
database creation subsequently fails.

So there are two possibilities.  First, we can change things so we never 
actually try to clean up the Derby DB.  Second, we can mandate the java 1.6 is 
used for LCF.  That's all there really is.

The first possibility is tricky but doable - I think.  The second would 
probably be unacceptable in many ways.

Thoughts?

Karl
   

Hi Karl,

If it is possible, Ant chmod task can be used, or
you can consult the implementation. But Ant manual
says for the task:

" Right now it has effect only under Unix or NonStop Kernel (Tandem)."
http://ant.apache.org/manual/Tasks/chmod.html

Koji

--
http://www.rondhuit.com/en/



Re: Derby/JUnit bad interaction - any ideas?

2010-06-08 Thread Mark Miller

On 6/8/10 6:35 AM, karl.wri...@nokia.com wrote:

I've been trying to get some basic tests working under Junit.  Unfortunately, 
I've run into a Derby problem which prevents these tests from working.

What happens is this.  Derby, when it creates a database, forces a number of directories 
within the database to "read-only".  Unfortunately, unless we stipulate Java 
1.6 or up, there is no native Java way to make these directories become non-read-only.  
So database cleanup always fails to actually remove the old database, and then new 
database creation subsequently fails.

So there are two possibilities.  First, we can change things so we never 
actually try to clean up the Derby DB.  Second, we can mandate the java 1.6 is 
used for LCF.  That's all there really is.

The first possibility is tricky but doable - I think.  The second would 
probably be unacceptable in many ways.

Thoughts?

Karl






Interesting - when I worked with derby in the past, I never had any 
trouble deleting a database after shutting it down on windows using Java 
5. It worked great with my unit tests.


You could always run each test in a new system tmp dir every time...

I find it hard to believe you cannot delete the database somehow though 
- like I said, I never had any problems with it using embedded derby in 
the past after shutting down the db.


--
- Mark

http://www.lucidimagination.com


Derby/JUnit bad interaction - any ideas?

2010-06-08 Thread karl.wright
I've been trying to get some basic tests working under Junit.  Unfortunately, 
I've run into a Derby problem which prevents these tests from working.

What happens is this.  Derby, when it creates a database, forces a number of 
directories within the database to "read-only".  Unfortunately, unless we 
stipulate Java 1.6 or up, there is no native Java way to make these directories 
become non-read-only.  So database cleanup always fails to actually remove the 
old database, and then new database creation subsequently fails.
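The Java 1.6 API in question is File.setWritable, which undoes the read-only marking before each delete. A minimal sketch (recursiveDelete matches the helper name in the LCF code quoted later in this thread, but this body is illustrative):

```java
import java.io.File;

// Sketch of a recursive delete that first clears Derby's read-only bit.
// File.setWritable exists only on Java 1.6+, which is exactly the
// version requirement under discussion; there is no portable 1.5 equivalent.
public class DerbyCleanup {
    /** Deletes f and everything under it; returns true on success. */
    public static boolean recursiveDelete(File f) {
        f.setWritable(true);  // undo Derby's read-only marking (1.6+ only)
        if (f.isDirectory()) {
            File[] children = f.listFiles();
            if (children != null) {
                for (File child : children) {
                    recursiveDelete(child);
                }
            }
        }
        return f.delete();
    }
}
```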

So there are two possibilities.  First, we can change things so we never 
actually try to clean up the Derby DB.  Second, we can mandate the java 1.6 is 
used for LCF.  That's all there really is.

The first possibility is tricky but doable - I think.  The second would 
probably be unacceptable in many ways.

Thoughts?

Karl





RE: Derby

2010-06-04 Thread karl.wright
Yup.

Karl

-Original Message-
From: ext Jack Krupansky [mailto:jack.krupan...@lucidimagination.com] 
Sent: Friday, June 04, 2010 12:27 AM
To: connectors-dev@incubator.apache.org
Subject: Re: Derby

Just to be clear, the full sequence would be:

1) Start UI app. Agent process should not be running.
2) "Start" LCF job in UI.
3) Shutdown UI app. Not just close the browser window.
4) AgentRun.
5) Wait long enough for crawl to have finished. Maybe watch to see that Solr 
has become idle.
6) Possibly commit to Solr.
7) AgentStop.
8) Back to step 1 for additional jobs.

Correct?

-- Jack Krupansky

--
From: 
Sent: Thursday, June 03, 2010 7:24 PM
To: 
Subject: RE: Derby

> The daemon does not need to interact with the UI directly, only with the 
> database.  So, you stop the UI, start the daemon, and after a while, shut 
> down the daemon and restart the UI.
>
> Karl
>
> -Original Message-
> From: ext Jack Krupansky [mailto:jack.krupan...@lucidimagination.com]
> Sent: Thursday, June 03, 2010 5:51 PM
> To: connectors-dev@incubator.apache.org
> Subject: Re: Derby
>
>> (1) You can't run more than one LCF process at a time.  That means 
>> you
>> need to either run the daemon or the crawler-ui web application, but you
>> can't run both at the same time.
>
> How do you "Start" a crawl then if not in the web app which then starts 
> the
> agent process crawling?
>
> Thanks for all of this effort!
>
> -- Jack Krupansky
>
> ------
> From: 
> Sent: Thursday, June 03, 2010 5:34 PM
> To: 
> Subject: Derby
>
>> For what it's worth, after some 5 days of work, and a couple of schema
>> changes to boot, LCF now runs with Derby.
>> Some caveats:
>>
>> (1) You can't run more than one LCF process at a time.  That means 
>> you
>> need to either run the daemon or the crawler-ui web application, but you
>> can't run both at the same time.
>> (2) I haven't tested every query, so I'm sure there are probably some
>> that are still broken.
>> (3) It's slow.  Count yourself as fortunate if it runs 1/5 the rate 
>> of
>> Postgresql for you.
>> (4) Transactional integrity hasn't been evaluated.
>> (5) Deadlock detection and unique constraint violation detection is
>> probably not right, because I'd need to cause these errors to occur 
>> before
>> being able to key off their exception messages.
>> (6) I had to turn off the ability to sort on certain columns in the
>> reports - basically, any column that was represented as a large character
>> field.
>>
>> Nevertheless, this represents an important milestone on the path to being
>> able to write some kind of unit tests that have at least some meaning.
>>
>> If you have an existing LCF Postgresql database, you will need to force 
>> an
>> upgrade after going to the new trunk code.  To do this, repeat the
>> "org.apache.lcf.agents.Install" command, and the
>> "org.apache.lcf.agents.Register
>> org.apache.lcf.crawler.system.CrawlerAgent" command after deploying the
>> new code.  And, please, let me know of any kind of errors you notice that
>> could be related to the schema change.
>>
>> Thanks,
>> Karl
>>
>>
>> 


RE: Derby

2010-06-04 Thread karl.wright
The reason this occurs is because I am using Derby in embedded mode, and the 
restriction appears to be a limitation of that mode of operation.  However, 
this mode is necessary to meet the testing goal, which was the prime motivator 
behind doing a Derby implementation.  I am sure that if we were to use Derby as 
a service, the restriction would no longer apply, but then there would be no 
conceivable benefit either.

Karl
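For reference, the two Derby modes differ only in driver class and JDBC URL. The driver and URL strings below are the standard Derby ones; the database name "lcfdb" and port 1527 (Derby's default) are illustrative, not LCF configuration:

```java
// Embedded mode runs the engine inside the one JVM that opens the database,
// which is what imposes the single-process restriction; network-server mode
// allows many client processes but requires running a separate Derby server,
// which defeats the purpose of a self-contained test setup.
public class DerbyModes {
    public static final String EMBEDDED_DRIVER = "org.apache.derby.jdbc.EmbeddedDriver";
    public static final String EMBEDDED_URL    = "jdbc:derby:lcfdb;create=true";

    public static final String CLIENT_DRIVER = "org.apache.derby.jdbc.ClientDriver";
    public static final String CLIENT_URL    = "jdbc:derby://localhost:1527/lcfdb;create=true";
}
```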

-Original Message-
From: ext Jack Krupansky [mailto:jack.krupan...@lucidimagination.com] 
Sent: Friday, June 04, 2010 12:41 AM
To: connectors-dev@incubator.apache.org
Subject: Re: Derby

What is the nature of the single LCF process issue? Is it because the 
database is being used in single-user mode, or some other issue? Is it a 
permanent issue, or is there a solution or workaround anticipated at some 
stage?

Thanks.

-- Jack Krupansky

--
From: 
Sent: Thursday, June 03, 2010 5:34 PM
To: 
Subject: Derby

> For what it's worth, after some 5 days of work, and a couple of schema 
> changes to boot, LCF now runs with Derby.
> Some caveats:
>
> (1) You can't run more than one LCF process at a time.  That means you 
> need to either run the daemon or the crawler-ui web application, but you 
> can't run both at the same time.
> (2) I haven't tested every query, so I'm sure there are probably some 
> that are still broken.
> (3) It's slow.  Count yourself as fortunate if it runs 1/5 the rate of 
> Postgresql for you.
> (4) Transactional integrity hasn't been evaluated.
> (5) Deadlock detection and unique constraint violation detection is 
> probably not right, because I'd need to cause these errors to occur before 
> being able to key off their exception messages.
> (6) I had to turn off the ability to sort on certain columns in the 
> reports - basically, any column that was represented as a large character 
> field.
>
> Nevertheless, this represents an important milestone on the path to being 
> able to write some kind of unit tests that have at least some meaning.
>
> If you have an existing LCF Postgresql database, you will need to force an 
> upgrade after going to the new trunk code.  To do this, repeat the 
> "org.apache.lcf.agents.Install" command, and the 
> "org.apache.lcf.agents.Register 
> org.apache.lcf.crawler.system.CrawlerAgent" command after deploying the 
> new code.  And, please, let me know of any kind of errors you notice that 
> could be related to the schema change.
>
> Thanks,
> Karl
>
>
> 


Re: Derby

2010-06-03 Thread Jack Krupansky
What is the nature of the single LCF process issue? Is it because the 
database is being used in single-user mode, or some other issue? Is it a 
permanent issue, or is there a solution or workaround anticipated at some 
stage?


Thanks.

-- Jack Krupansky

--
From: 
Sent: Thursday, June 03, 2010 5:34 PM
To: 
Subject: Derby

For what it's worth, after some 5 days of work, and a couple of schema 
changes to boot, LCF now runs with Derby.

Some caveats:

(1) You can't run more than one LCF process at a time.  That means you 
need to either run the daemon or the crawler-ui web application, but you 
can't run both at the same time.
(2) I haven't tested every query, so I'm sure there are probably some 
that are still broken.
(3) It's slow.  Count yourself as fortunate if it runs 1/5 the rate of 
Postgresql for you.

(4) Transactional integrity hasn't been evaluated.
(5) Deadlock detection and unique constraint violation detection is 
probably not right, because I'd need to cause these errors to occur before 
being able to key off their exception messages.
(6) I had to turn off the ability to sort on certain columns in the 
reports - basically, any column that was represented as a large character 
field.


Nevertheless, this represents an important milestone on the path to being 
able to write some kind of unit tests that have at least some meaning.


If you have an existing LCF Postgresql database, you will need to force an 
upgrade after going to the new trunk code.  To do this, repeat the 
"org.apache.lcf.agents.Install" command, and the 
"org.apache.lcf.agents.Register 
org.apache.lcf.crawler.system.CrawlerAgent" command after deploying the 
new code.  And, please, let me know of any kind of errors you notice that 
could be related to the schema change.


Thanks,
Karl





Re: Derby

2010-06-03 Thread Jack Krupansky

Just to be clear, the full sequence would be:

1) Start UI app. Agent process should not be running.
2) "Start" LCF job in UI.
3) Shutdown UI app. Not just close the browser window.
4) AgentRun.
5) Wait long enough for crawl to have finished. Maybe watch to see that Solr 
has become idle.

6) Possibly commit to Solr.
7) AgentStop.
8) Back to step 1 for additional jobs.

Correct?

-- Jack Krupansky

--
From: 
Sent: Thursday, June 03, 2010 7:24 PM
To: 
Subject: RE: Derby

The daemon does not need to interact with the UI directly, only with the 
database.  So, you stop the UI, start the daemon, and after a while, shut 
down the daemon and restart the UI.


Karl

-Original Message-
From: ext Jack Krupansky [mailto:jack.krupan...@lucidimagination.com]
Sent: Thursday, June 03, 2010 5:51 PM
To: connectors-dev@incubator.apache.org
Subject: Re: Derby

(1) You can't run more than one LCF process at a time.  That means 
you

need to either run the daemon or the crawler-ui web application, but you
can't run both at the same time.


How do you "Start" a crawl then if not in the web app which then starts 
the

agent process crawling?

Thanks for all of this effort!

-- Jack Krupansky

--
From: 
Sent: Thursday, June 03, 2010 5:34 PM
To: 
Subject: Derby


> For what it's worth, after some 5 days of work, and a couple of schema 
> changes to boot, LCF now runs with Derby.
> Some caveats:
>
> (1) You can't run more than one LCF process at a time.  That means you 
> need to either run the daemon or the crawler-ui web application, but you 
> can't run both at the same time.
> (2) I haven't tested every query, so I'm sure there are probably some 
> that are still broken.
> (3) It's slow.  Count yourself as fortunate if it runs 1/5 the rate of 
> Postgresql for you.
> (4) Transactional integrity hasn't been evaluated.
> (5) Deadlock detection and unique constraint violation detection is 
> probably not right, because I'd need to cause these errors to occur before 
> being able to key off their exception messages.
> (6) I had to turn off the ability to sort on certain columns in the 
> reports - basically, any column that was represented as a large character 
> field.
>
> Nevertheless, this represents an important milestone on the path to being 
> able to write some kind of unit tests that have at least some meaning.
>
> If you have an existing LCF Postgresql database, you will need to force an 
> upgrade after going to the new trunk code.  To do this, repeat the 
> "org.apache.lcf.agents.Install" command, and the 
> "org.apache.lcf.agents.Register 
> org.apache.lcf.crawler.system.CrawlerAgent" command after deploying the 
> new code.  And, please, let me know of any kind of errors you notice that 
> could be related to the schema change.
>
> Thanks,
> Karl





RE: Derby

2010-06-03 Thread karl.wright
The daemon does not need to interact with the UI directly, only with the 
database.  So, you stop the UI, start the daemon, and after a while, shut down 
the daemon and restart the UI.

Karl

-Original Message-
From: ext Jack Krupansky [mailto:jack.krupan...@lucidimagination.com] 
Sent: Thursday, June 03, 2010 5:51 PM
To: connectors-dev@incubator.apache.org
Subject: Re: Derby

> (1) You can't run more than one LCF process at a time.  That means you 
> need to either run the daemon or the crawler-ui web application, but you 
> can't run both at the same time.

How do you "Start" a crawl then if not in the web app which then starts the 
agent process crawling?

Thanks for all of this effort!

-- Jack Krupansky

--
From: 
Sent: Thursday, June 03, 2010 5:34 PM
To: 
Subject: Derby

> For what it's worth, after some 5 days of work, and a couple of schema 
> changes to boot, LCF now runs with Derby.
> Some caveats:
>
> (1) You can't run more than one LCF process at a time.  That means you 
> need to either run the daemon or the crawler-ui web application, but you 
> can't run both at the same time.
> (2) I haven't tested every query, so I'm sure there are probably some 
> that are still broken.
> (3) It's slow.  Count yourself as fortunate if it runs 1/5 the rate of 
> Postgresql for you.
> (4) Transactional integrity hasn't been evaluated.
> (5) Deadlock detection and unique constraint violation detection is 
> probably not right, because I'd need to cause these errors to occur before 
> being able to key off their exception messages.
> (6) I had to turn off the ability to sort on certain columns in the 
> reports - basically, any column that was represented as a large character 
> field.
>
> Nevertheless, this represents an important milestone on the path to being 
> able to write some kind of unit tests that have at least some meaning.
>
> If you have an existing LCF Postgresql database, you will need to force an 
> upgrade after going to the new trunk code.  To do this, repeat the 
> "org.apache.lcf.agents.Install" command, and the 
> "org.apache.lcf.agents.Register 
> org.apache.lcf.crawler.system.CrawlerAgent" command after deploying the 
> new code.  And, please, let me know of any kind of errors you notice that 
> could be related to the schema change.
>
> Thanks,
> Karl
>
>
> 


Re: Derby

2010-06-03 Thread Jack Krupansky
> (1) You can't run more than one LCF process at a time.  That means you 
> need to either run the daemon or the crawler-ui web application, but you 
> can't run both at the same time.

How do you "Start" a crawl then if not in the web app which then starts the 
agent process crawling?


Thanks for all of this effort!

-- Jack Krupansky

--
From: 
Sent: Thursday, June 03, 2010 5:34 PM
To: 
Subject: Derby

> For what it's worth, after some 5 days of work, and a couple of schema 
> changes to boot, LCF now runs with Derby.
> Some caveats:
>
> (1) You can't run more than one LCF process at a time.  That means you 
> need to either run the daemon or the crawler-ui web application, but you 
> can't run both at the same time.
> (2) I haven't tested every query, so I'm sure there are probably some 
> that are still broken.
> (3) It's slow.  Count yourself as fortunate if it runs 1/5 the rate of 
> Postgresql for you.
> (4) Transactional integrity hasn't been evaluated.
> (5) Deadlock detection and unique constraint violation detection is 
> probably not right, because I'd need to cause these errors to occur before 
> being able to key off their exception messages.
> (6) I had to turn off the ability to sort on certain columns in the 
> reports - basically, any column that was represented as a large character 
> field.
>
> Nevertheless, this represents an important milestone on the path to being 
> able to write some kind of unit tests that have at least some meaning.
>
> If you have an existing LCF Postgresql database, you will need to force an 
> upgrade after going to the new trunk code.  To do this, repeat the 
> "org.apache.lcf.agents.Install" command, and the 
> "org.apache.lcf.agents.Register 
> org.apache.lcf.crawler.system.CrawlerAgent" command after deploying the 
> new code.  And, please, let me know of any kind of errors you notice that 
> could be related to the schema change.
>
> Thanks,
> Karl





Derby

2010-06-03 Thread karl.wright
For what it's worth, after some 5 days of work, and a couple of schema changes 
to boot, LCF now runs with Derby.
Some caveats:

(1) You can't run more than one LCF process at a time.  That means you need 
to either run the daemon or the crawler-ui web application, but you can't run 
both at the same time.
(2) I haven't tested every query, so I'm sure there are probably some that 
are still broken.
(3) It's slow.  Count yourself as fortunate if it runs 1/5 the rate of 
Postgresql for you.
(4) Transactional integrity hasn't been evaluated.
(5) Deadlock detection and unique constraint violation detection is 
probably not right, because I'd need to cause these errors to occur before 
being able to key off their exception messages.
(6) I had to turn off the ability to sort on certain columns in the reports 
- basically, any column that was represented as a large character field.
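
On caveat (5), one way to avoid keying off exception message text is to key off SQLState codes instead: Derby documents 40001 for a deadlock, 40XL1 for a lock wait timeout, and 23505 for a duplicate-key (unique constraint) violation. A hedged sketch of that classification; this is not how LCF actually does it, just one option:

```shell
#!/bin/sh
# Sketch: classify the Derby error classes caveat (5) is about by
# SQLState instead of by exception message text.  The codes are Derby's
# documented ones: 40001 deadlock, 40XL1 lock wait timeout, 23505
# duplicate key (unique constraint violation).
classify_sqlstate() {
  case "$1" in
    40001) echo "deadlock" ;;
    40XL1) echo "lock-timeout" ;;
    23505) echo "unique-constraint" ;;
    *)     echo "other" ;;
  esac
}

# Example inputs, as SQLException.getSQLState() would return them:
classify_sqlstate 40001   # -> deadlock
classify_sqlstate 40XL1   # -> lock-timeout
classify_sqlstate 23505   # -> unique-constraint
```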

Nevertheless, this represents an important milestone on the path to being able 
to write some kind of unit tests that have at least some meaning.

If you have an existing LCF Postgresql database, you will need to force an 
upgrade after going to the new trunk code.  To do this, repeat the 
"org.apache.lcf.agents.Install" command, and the 
"org.apache.lcf.agents.Register org.apache.lcf.crawler.system.CrawlerAgent" 
command after deploying the new code.  And, please, let me know of any kind of 
errors you notice that could be related to the schema change.

Thanks,
Karl