Nice catch, Karl!
I applied that patch, but I'm still getting the same error.
I think the problem is in JobManager.noteTransformationConnectionRegistration.
If jobs.findJobsMatchingTransformations(list) returns a large list of ids
(as it does in our case: 39,941 ids), the generated query string
still has a large OR clause in it. I don't see getMaxOrClause applied to
the query being built inside noteTransformationConnectionRegistration:
>>>>>>
protected void noteTransformationConnectionRegistration(List<String> list)
  throws ManifoldCFException
{
  // Query for the matching jobs, and then for each job potentially adjust the state
  Long[] jobIDs = jobs.findJobsMatchingTransformations(list);
  if (jobIDs.length == 0)
    return;
  StringBuilder query = new StringBuilder();
  ArrayList newList = new ArrayList();
  query.append("SELECT ").append(jobs.idField).append(",").append(jobs.statusField)
    .append(" FROM ").append(jobs.getTableName()).append(" WHERE ")
    // Emphasized in the original message: the clause built from all jobIDs at once
    .append(database.buildConjunctionClause(newList,new ClauseDescription[]{
      new MultiClause(jobs.idField,jobIDs)}))
    .append(" FOR UPDATE");
  IResultSet set = database.performQuery(query.toString(),newList,null,null);
  int i = 0;
  while (i < set.getRowCount())
  {
    IResultRow row = set.getRow(i++);
    Long jobID = (Long)row.getValue(jobs.idField);
    int statusValue = jobs.stringToStatus((String)row.getValue(jobs.statusField));
    jobs.noteTransformationConnectorRegistration(jobID,statusValue);
  }
}
<<<<<<
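
To illustrate what I mean by applying the limit, here is a minimal, standalone sketch (not the actual ManifoldCF code and not the CONNECTORS-1520 patch). It assumes the limit of 25 from the getMaxOrClause() quoted below; the class OrClauseBatcher and its methods partition and buildOrClause are hypothetical names made up for this example. It splits the id list into batches and builds one bounded OR clause per batch:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of ManifoldCF.
public class OrClauseBatcher {

  /** Split ids into consecutive batches of at most maxOrClause entries. */
  public static List<Long[]> partition(Long[] ids, int maxOrClause) {
    if (maxOrClause <= 0)
      throw new IllegalArgumentException("maxOrClause must be positive");
    List<Long[]> batches = new ArrayList<>();
    for (int start = 0; start < ids.length; start += maxOrClause) {
      int end = Math.min(start + maxOrClause, ids.length);
      batches.add(Arrays.copyOfRange(ids, start, end));
    }
    return batches;
  }

  /** Build the parameterized fragment for one batch, e.g. (id=? OR id=?). */
  public static String buildOrClause(String field, int count) {
    StringBuilder sb = new StringBuilder("(");
    for (int i = 0; i < count; i++) {
      if (i > 0)
        sb.append(" OR ");
      sb.append(field).append("=?");
    }
    return sb.append(")").toString();
  }

  public static void main(String[] args) {
    // Stand-in for the ids returned by findJobsMatchingTransformations
    Long[] jobIDs = new Long[39941];
    for (int i = 0; i < jobIDs.length; i++)
      jobIDs[i] = (long) i;

    // 25 is the value returned by the getMaxOrClause() shown in the quoted message
    List<Long[]> batches = partition(jobIDs, 25);
    System.out.println(batches.size() + " bounded queries instead of one huge one");
    System.out.println("SELECT id,status FROM jobs WHERE "
      + buildOrClause("id", batches.get(0).length) + " FOR UPDATE");
  }
}
```

With 39,941 ids and a limit of 25, this gives 1,598 queries (1,597 full batches plus one of 16) rather than a single statement with 39,941 parameters.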
On Mon, Jul 30, 2018 at 1:55 PM, Karl Wright <[email protected]> wrote:
> The Postgresql driver supposedly limits this to 25 clauses at a pop:
>
> >>>>>>
> @Override
> public int getMaxOrClause()
> {
> return 25;
> }
>
> /* Calculate the number of values a particular clause can have, given the values for all the other clauses.
> * For example, if in the expression x AND y AND z, x has 2 values and z has 1, find out how many values x can legally have
> * when using the buildConjunctionClause() method below.
> */
> @Override
> public int findConjunctionClauseMax(ClauseDescription[] otherClauseDescriptions)
> {
> // This implementation uses "OR"
> return getMaxOrClause();
> }
> <<<<<<
>
> The problem is that there was a cut-and-paste error, with just
> transformation connections, that defeated the limit. I'll create a ticket
> and attach a patch. CONNECTORS-1520.
>
> Karl
>
>
>
>
>
> On Mon, Jul 30, 2018 at 2:29 PM Karl Wright <[email protected]> wrote:
>
>> Hi Mike,
>>
>> This might be the issue indeed. I'll look into it.
>>
>> Karl
>>
>>
>> On Mon, Jul 30, 2018 at 2:26 PM Mike Hugo <[email protected]> wrote:
>>
>>> I'm not sure what the solution is yet, but I think I may have found the
>>> culprit:
>>>
>>> JobManager.noteTransformationConnectionRegistration(List<String> list)
>>> is creating a pretty big query:
>>>
>>> SELECT id,status FROM jobs WHERE (id=? OR id=? OR id=? OR id=? ........
>>> OR id=?) FOR UPDATE
>>>
>>> Replace the ellipsis with a list of 39,941 ids (it's a huge query when
>>> it prints out).
>>>
>>> It seems that the database doesn't like that query and closes the
>>> connection before returning with a response.
>>>
>>> As I mentioned, this instance of Manifold has nearly 40,000 web
>>> crawlers. Is that a high number for Manifold to handle?
>>>
>>> On Mon, Jul 30, 2018 at 10:58 AM, Karl Wright <[email protected]>
>>> wrote:
>>>
>>>> Well, I have absolutely no idea what is wrong and I've never seen
>>>> anything like that before. But postgres is complaining because the
>>>> communication with the JDBC client is being interrupted by something.
>>>>
>>>> Karl
>>>>
>>>>
>>>> On Mon, Jul 30, 2018 at 10:39 AM Mike Hugo <[email protected]> wrote:
>>>>
>>>>> No, and manifold and postgres run on the same host.
>>>>>
>>>>> On Mon, Jul 30, 2018 at 9:35 AM, Karl Wright <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> ' LOG: incomplete message from client'
>>>>>>
>>>>>> This shows a network issue. Did your network configuration change
>>>>>> recently?
>>>>>>
>>>>>> Karl
>>>>>>
>>>>>>
>>>>>> On Mon, Jul 30, 2018 at 9:59 AM Mike Hugo <[email protected]> wrote:
>>>>>>
>>>>>>> Tried a postgres vacuum and also a restart, but the problem
>>>>>>> persists. Here's the log again with some additional logging details
>>>>>>> added (below).
>>>>>>>
>>>>>>> I tried running the last query from the logs against the database
>>>>>>> and it works fine - I modified it to return a count and that also works.
>>>>>>>
>>>>>>> SELECT count(*) FROM jobs t1 WHERE EXISTS(SELECT 'x' FROM
>>>>>>> jobpipelines WHERE t1.id=ownerid AND transformationname='Tika');
>>>>>>> count
>>>>>>> -------
>>>>>>> 39941
>>>>>>> (1 row)
>>>>>>>
>>>>>>>
>>>>>>> Is 39k jobs a high number? I've run some other instances of
>>>>>>> Manifold with more like 1,000 jobs and those seem to be working fine.
>>>>>>> That's the only thing I can think of that's different between this
>>>>>>> instance that won't start and the others. Any ideas?
>>>>>>>
>>>>>>> Thanks for your help!
>>>>>>>
>>>>>>> Mike
>>>>>>>
>>>>>>> LOG: duration: 0.079 ms parse <unnamed>: SELECT connectionname
>>>>>>> FROM transformationconnections WHERE classname=$1
>>>>>>> LOG: duration: 0.079 ms bind <unnamed>: SELECT connectionname FROM
>>>>>>> transformationconnections WHERE classname=$1
>>>>>>> DETAIL: parameters: $1 = 'org.apache.manifoldcf.agents.transformation.tika.TikaExtractor'
>>>>>>> LOG: duration: 0.017 ms execute <unnamed>: SELECT connectionname
>>>>>>> FROM transformationconnections WHERE classname=$1
>>>>>>> DETAIL: parameters: $1 = 'org.apache.manifoldcf.agents.transformation.tika.TikaExtractor'
>>>>>>> LOG: duration: 0.039 ms parse <unnamed>: SELECT * FROM agents
>>>>>>> LOG: duration: 0.040 ms bind <unnamed>: SELECT * FROM agents
>>>>>>> LOG: duration: 0.010 ms execute <unnamed>: SELECT * FROM agents
>>>>>>> LOG: duration: 0.084 ms parse <unnamed>: SELECT id FROM jobs t1
>>>>>>> WHERE EXISTS(SELECT 'x' FROM jobpipelines WHERE t1.id=ownerid AND
>>>>>>> transformationname=$1)
>>>>>>> LOG: duration: 0.359 ms bind <unnamed>: SELECT id FROM jobs t1
>>>>>>> WHERE EXISTS(SELECT 'x' FROM jobpipelines WHERE t1.id=ownerid AND
>>>>>>> transformationname=$1)
>>>>>>> DETAIL: parameters: $1 = 'Tika'
>>>>>>> LOG: duration: 77.622 ms execute <unnamed>: SELECT id FROM jobs t1
>>>>>>> WHERE EXISTS(SELECT 'x' FROM jobpipelines WHERE t1.id=ownerid AND
>>>>>>> transformationname=$1)
>>>>>>> DETAIL: parameters: $1 = 'Tika'
>>>>>>> LOG: incomplete message from client
>>>>>>> LOG: disconnection: session time: 0:00:06.574 user=REMOVED
>>>>>>> database=REMOVED host=127.0.0.1 port=45356
>>>>>>> >2018-07-30 12:36:09,415 [main] ERROR org.apache.manifoldcf.root - Exception: This connection has been closed.
>>>>>>> org.apache.manifoldcf.core.interfaces.ManifoldCFException: This connection has been closed.
>>>>>>> at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.reinterpretException(DBInterfacePostgreSQL.java:627) ~[mcf-core.jar:?]
>>>>>>> at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.rollbackCurrentTransaction(DBInterfacePostgreSQL.java:1296) ~[mcf-core.jar:?]
>>>>>>> at org.apache.manifoldcf.core.database.Database.endTransaction(Database.java:368) ~[mcf-core.jar:?]
>>>>>>> at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.endTransaction(DBInterfacePostgreSQL.java:1236) ~[mcf-core.jar:?]
>>>>>>> at org.apache.manifoldcf.crawler.system.ManifoldCF.registerConnectors(ManifoldCF.java:605) ~[mcf-pull-agent.jar:?]
>>>>>>> at org.apache.manifoldcf.crawler.system.ManifoldCF.reregisterAllConnectors(ManifoldCF.java:160) ~[mcf-pull-agent.jar:?]
>>>>>>> at org.apache.manifoldcf.jettyrunner.ManifoldCFJettyRunner.main(ManifoldCFJettyRunner.java:239) [mcf-jetty-runner.jar:?]
>>>>>>> Caused by: org.postgresql.util.PSQLException: This connection has been closed.
>>>>>>> at org.postgresql.jdbc.PgConnection.checkClosed(PgConnection.java:766) ~[postgresql-42.1.3.jar:42.1.3]
>>>>>>> at org.postgresql.jdbc.PgConnection.createStatement(PgConnection.java:1576) ~[postgresql-42.1.3.jar:42.1.3]
>>>>>>> at org.postgresql.jdbc.PgConnection.createStatement(PgConnection.java:367) ~[postgresql-42.1.3.jar:42.1.3]
>>>>>>> at org.apache.manifoldcf.core.database.Database.execute(Database.java:873) ~[mcf-core.jar:?]
>>>>>>> at org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:696) ~[mcf-core.jar:?]
>>>>>>> org.apache.manifoldcf.core.interfaces.ManifoldCFException: This connection has been closed.
>>>>>>> at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.reinterpretException(DBInterfacePostgreSQL.java:627)
>>>>>>> at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.rollbackCurrentTransaction(DBInterfacePostgreSQL.java:1296)
>>>>>>> at org.apache.manifoldcf.core.database.Database.endTransaction(Database.java:368)
>>>>>>> at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.endTransaction(DBInterfacePostgreSQL.java:1236)
>>>>>>> at org.apache.manifoldcf.crawler.system.ManifoldCF.registerConnectors(ManifoldCF.java:605)
>>>>>>> at org.apache.manifoldcf.crawler.system.ManifoldCF.reregisterAllConnectors(ManifoldCF.java:160)
>>>>>>> at org.apache.manifoldcf.jettyrunner.ManifoldCFJettyRunner.main(ManifoldCFJettyRunner.java:239)
>>>>>>> Caused by: org.postgresql.util.PSQLException: This connection has been closed.
>>>>>>> at org.postgresql.jdbc.PgConnection.checkClosed(PgConnection.java:766)
>>>>>>> at org.postgresql.jdbc.PgConnection.createStatement(PgConnection.java:1576)
>>>>>>> at org.postgresql.jdbc.PgConnection.createStatement(PgConnection.java:367)
>>>>>>> at org.apache.manifoldcf.core.database.Database.execute(Database.java:873)
>>>>>>> at org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:696)
>>>>>>> LOG: disconnection: session time: 0:00:10.677 user=postgres
>>>>>>> database=template1 host=127.0.0.1 port=45354
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Sun, Jul 29, 2018 at 8:09 AM, Karl Wright <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> It looks to me like your database server is not happy. Maybe it's
>>>>>>>> out of resources? Not sure but a restart may be in order.
>>>>>>>>
>>>>>>>> Karl
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sun, Jul 29, 2018 at 9:06 AM Mike Hugo <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Recently we started seeing this error when Manifold CF starts up.
>>>>>>>>> We had been running Manifold CF with many web connectors and a few RSS
>>>>>>>>> feeds for a while and it had been working fine. The server got
>>>>>>>>> rebooted, and since then we have been seeing this error. I'm not sure
>>>>>>>>> exactly what changed. Any ideas as to where to start looking and how
>>>>>>>>> to fix this?
>>>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>>>
>>>>>>>>> Mike
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Initial repository connections already created.
>>>>>>>>> Configuration file successfully read
>>>>>>>>> Successfully unregistered all domains
>>>>>>>>> Successfully unregistered all output connectors
>>>>>>>>> Successfully unregistered all transformation connectors
>>>>>>>>> Successfully unregistered all mapping connectors
>>>>>>>>> Successfully unregistered all authority connectors
>>>>>>>>> Successfully unregistered all repository connectors
>>>>>>>>> WARNING: there is already a transaction in progress
>>>>>>>>> WARNING: there is no transaction in progress
>>>>>>>>> Successfully registered output connector
>>>>>>>>> 'org.apache.manifoldcf.agents.output.solr.SolrConnector'
>>>>>>>>> WARNING: there is already a transaction in progress
>>>>>>>>> WARNING: there is no transaction in progress
>>>>>>>>> Successfully registered output connector
>>>>>>>>> 'org.apache.manifoldcf.agents.output.searchblox.SearchBloxConnector'
>>>>>>>>> WARNING: there is already a transaction in progress
>>>>>>>>> WARNING: there is no transaction in progress
>>>>>>>>> Successfully registered output connector
>>>>>>>>> 'org.apache.manifoldcf.agents.output.opensearchserver.OpenSearchServerConnector'
>>>>>>>>> WARNING: there is already a transaction in progress
>>>>>>>>> WARNING: there is no transaction in progress
>>>>>>>>> Successfully registered output connector
>>>>>>>>> 'org.apache.manifoldcf.agents.output.nullconnector.NullConnector'
>>>>>>>>> WARNING: there is already a transaction in progress
>>>>>>>>> WARNING: there is no transaction in progress
>>>>>>>>> Successfully registered output connector
>>>>>>>>> 'org.apache.manifoldcf.agents.output.kafka.KafkaOutputConnector'
>>>>>>>>> WARNING: there is already a transaction in progress
>>>>>>>>> WARNING: there is no transaction in progress
>>>>>>>>> Successfully registered output connector
>>>>>>>>> 'org.apache.manifoldcf.agents.output.hdfs.HDFSOutputConnector'
>>>>>>>>> WARNING: there is already a transaction in progress
>>>>>>>>> WARNING: there is no transaction in progress
>>>>>>>>> Successfully registered output connector
>>>>>>>>> 'org.apache.manifoldcf.agents.output.gts.GTSConnector'
>>>>>>>>> WARNING: there is already a transaction in progress
>>>>>>>>> WARNING: there is no transaction in progress
>>>>>>>>> Successfully registered output connector
>>>>>>>>> 'org.apache.manifoldcf.agents.output.filesystem.FileOutputConnector'
>>>>>>>>> WARNING: there is already a transaction in progress
>>>>>>>>> WARNING: there is no transaction in progress
>>>>>>>>> Successfully registered output connector
>>>>>>>>> 'org.apache.manifoldcf.agents.output.elasticsearch.ElasticSearchConnector'
>>>>>>>>> WARNING: there is already a transaction in progress
>>>>>>>>> WARNING: there is no transaction in progress
>>>>>>>>> Successfully registered output connector
>>>>>>>>> 'org.apache.manifoldcf.agents.output.amazoncloudsearch.AmazonCloudSearchConnector'
>>>>>>>>> WARNING: there is already a transaction in progress
>>>>>>>>> WARNING: there is no transaction in progress
>>>>>>>>> Successfully registered transformation connector
>>>>>>>>> 'org.apache.manifoldcf.agents.transformation.tikaservice.TikaExtractor'
>>>>>>>>> WARNING: there is already a transaction in progress
>>>>>>>>> LOG: incomplete message from client
>>>>>>>>> >2018-07-29 13:02:06,659 [main] ERROR org.apache.manifoldcf.root - Exception: This connection has been closed.
>>>>>>>>> org.apache.manifoldcf.core.interfaces.ManifoldCFException: This connection has been closed.
>>>>>>>>> at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.reinterpretException(DBInterfacePostgreSQL.java:627) ~[mcf-core.jar:?]
>>>>>>>>> at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.rollbackCurrentTransaction(DBInterfacePostgreSQL.java:1296) ~[mcf-core.jar:?]
>>>>>>>>> at org.apache.manifoldcf.core.database.Database.endTransaction(Database.java:368) ~[mcf-core.jar:?]
>>>>>>>>> at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.endTransaction(DBInterfacePostgreSQL.java:1236) ~[mcf-core.jar:?]
>>>>>>>>> at org.apache.manifoldcf.crawler.system.ManifoldCF.registerConnectors(ManifoldCF.java:605) ~[mcf-pull-agent.jar:?]
>>>>>>>>> at org.apache.manifoldcf.crawler.system.ManifoldCF.reregisterAllConnectors(ManifoldCF.java:160) ~[mcf-pull-agent.jar:?]
>>>>>>>>> at org.apache.manifoldcf.jettyrunner.ManifoldCFJettyRunner.main(ManifoldCFJettyRunner.java:239) [mcf-jetty-runner.jar:?]
>>>>>>>>> Caused by: org.postgresql.util.PSQLException: This connection has been closed.
>>>>>>>>> at org.postgresql.jdbc.PgConnection.checkClosed(PgConnection.java:766) ~[postgresql-42.1.3.jar:42.1.3]
>>>>>>>>> at org.postgresql.jdbc.PgConnection.createStatement(PgConnection.java:1576) ~[postgresql-42.1.3.jar:42.1.3]
>>>>>>>>> at org.postgresql.jdbc.PgConnection.createStatement(PgConnection.java:367) ~[postgresql-42.1.3.jar:42.1.3]
>>>>>>>>> at org.apache.manifoldcf.core.database.Database.execute(Database.java:873) ~[mcf-core.jar:?]
>>>>>>>>> at org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:696) ~[mcf-core.jar:?]
>>>>>>>>> org.apache.manifoldcf.core.interfaces.ManifoldCFException: This connection has been closed.
>>>>>>>>> at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.reinterpretException(DBInterfacePostgreSQL.java:627)
>>>>>>>>> at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.rollbackCurrentTransaction(DBInterfacePostgreSQL.java:1296)
>>>>>>>>> at org.apache.manifoldcf.core.database.Database.endTransaction(Database.java:368)
>>>>>>>>> at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.endTransaction(DBInterfacePostgreSQL.java:1236)
>>>>>>>>> at org.apache.manifoldcf.crawler.system.ManifoldCF.registerConnectors(ManifoldCF.java:605)
>>>>>>>>> at org.apache.manifoldcf.crawler.system.ManifoldCF.reregisterAllConnectors(ManifoldCF.java:160)
>>>>>>>>> at org.apache.manifoldcf.jettyrunner.ManifoldCFJettyRunner.main(ManifoldCFJettyRunner.java:239)
>>>>>>>>> Caused by: org.postgresql.util.PSQLException: This connection has been closed.
>>>>>>>>> at org.postgresql.jdbc.PgConnection.checkClosed(PgConnection.java:766)
>>>>>>>>> at org.postgresql.jdbc.PgConnection.createStatement(PgConnection.java:1576)
>>>>>>>>> at org.postgresql.jdbc.PgConnection.createStatement(PgConnection.java:367)
>>>>>>>>> at org.apache.manifoldcf.core.database.Database.execute(Database.java:873)
>>>>>>>>> at org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:696)
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>
>>>