[ https://issues.apache.org/jira/browse/CONNECTORS-279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13138221#comment-13138221 ]
Karl Wright commented on CONNECTORS-279: ---------------------------------------- Postgresql team responded and gave debugging tips, which show the following: {code} postgres=# select * from pg_locks; locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | g ranted ---------------+----------+----------+------+-------+------------+-------------- -+---------+-------+----------+--------------------+------+------------------+-- ------- relation | 133853 | 140809 | | | | | | | | 2/204727 | 7260 | AccessShareLock | t relation | 133853 | 140809 | | | | | | | | 2/204727 | 7260 | RowExclusiveLock | t relation | 133853 | 140807 | | | | | | | | 2/204727 | 7260 | AccessShareLock | t relation | 133853 | 140807 | | | | | | | | 2/204727 | 7260 | RowExclusiveLock | t virtualxid | | | | | 2/204727 | | | | | 2/204727 | 7260 | ExclusiveLock | t virtualxid | | | | | 3/185434 | | | | | 3/185434 | 2384 | ExclusiveLock | t virtualxid | | | | | 1/216105 | | | | | 1/216105 | 1800 | ExclusiveLock | t relation | 133853 | 140751 | | | | | | | | 2/204727 | 7260 | AccessShareLock | t relation | 133853 | 140745 | | | | | | | | 2/204727 | 7260 | AccessShareLock | t relation | 133853 | 140810 | | | | | | | | 2/204727 | 7260 | AccessShareLock | t relation | 133853 | 140810 | | | | | | | | 2/204727 | 7260 | RowExclusiveLock | t relation | 133853 | 140808 | | | | | | | | 2/204727 | 7260 | AccessShareLock | t relation | 133853 | 140808 | | | | | | | | 2/204727 | 7260 | RowExclusiveLock | t relation | 133853 | 140812 | | | | | | | | 2/204727 | 7260 | AccessShareLock | t relation | 133853 | 140812 | | | | | | | | 2/204727 | 7260 | RowExclusiveLock | t relation | 133853 | 140784 | | | | | | | | 2/204727 | 7260 | AccessShareLock | t relation | 133853 | 140785 | | | | | | | | 1/216105 | 1800 | ShareLock | f relation | 133853 | 140791 | | | | | | | | 2/204727 | 7260 | AccessShareLock | t relation | 133853 | 140791 | | | | | | | | 2/204727 | 7260 | RowExclusiveLock | t relation | 133853 | 140785 | | | | | | | | 2/204727 | 7260 | AccessShareLock | t relation | 133853 | 140785 | | | | | | | | 2/204727 | 7260 | RowExclusiveLock | t transactionid | | | | | | 6252988 | | | | 2/204727 | 7260 | ExclusiveLock | t relation | 133853 | 140811 | | | | | | | | 2/204727 | 7260 | AccessShareLock | t relation | 133853 | 140811 | | | | | | | | 2/204727 | 7260 | RowExclusiveLock | t relation | 133853 | 140813 | | | | | | | | 2/204727 | 7260 | AccessShareLock | t relation | 133853 | 140813 | | | | | | | | 2/204727 | 7260 | RowExclusiveLock | t relation | 11564 | 10969 | | | | | | | | 3/185434 | 2384 | AccessShareLock | t (27 rows) postgres=# select * from pg_stat_activity; datid | datname | procpid | usesysid | usename | current_query | waiting | xact_start | query_start | backend_start | client_addr | client_port --------+----------+---------+----------+----------+---------------------------- -----+---------+----------------------------+----------------------------+------ ----------------------+-------------+------------- 133853 | testdb | 1800 | 68231 | testuser | REINDEX TABLE jobqueue | t | 2011-10-28 06:53:59.562-04 | 2011-10-28 06:53:59.562-04 | 2011- 10-28 06:53:57.789-04 | 127.0.0.1 | 54835 133853 | testdb | 7260 | 68231 | testuser | <IDLE> in transaction | f | 2011-10-28 06:53:33.481-04 | 2011-10-28 06:53:34.895-04 | 2011- 10-28 06:53:33.404-04 | 127.0.0.1 | 54828 11564 | postgres | 2384 | 10 | postgres | select * from pg_stat_activ ity; | f | 2011-10-28 06:57:14.212-04 | 2011-10-28 06:57:14.212-04 | 2011- 10-28 06:55:12.74-04 | ::1 | 54852 133853 | testdb | 7664 | 68231 | testuser | <IDLE> | f | | 2011-10-28 06:57:11.029-04 | 2011- 10-28 06:57:11.017-04 | 127.0.0.1 | 54876 133853 | testdb | 3128 | 68231 | testuser | <IDLE> | f | | 2011-10-28 06:57:11.059-04 | 2011- 10-28 06:57:11.052-04 | 127.0.0.1 | 54877 (5 rows) "Thread-4398639" daemon prio=6 tid=0x087f9400 nid=0x9f0 runnable [0x04b4f000] java.lang.Thread.State: RUNNABLE at java.net.SocketInputStream.socketRead0(Native Method) at java.net.SocketInputStream.read(SocketInputStream.java:129) at org.postgresql.core.VisibleBufferedInputStream.readMore(VisibleBufferedInputStream.java:135) at org.postgresql.core.VisibleBufferedInputStream.ensureBytes(VisibleBufferedInputStream.java:104) at org.postgresql.core.VisibleBufferedInputStream.read(VisibleBufferedInputStream.java:73) at org.postgresql.core.PGStream.ReceiveChar(PGStream.java:255) at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1165) at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:191) - locked <0x260f2220> (a org.postgresql.core.v3.QueryExecutorImpl) at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:452) at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:337) at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:329) at org.apache.manifoldcf.core.database.Database.execute(Database.java:566) at org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:421) "Document delete thread '2'" daemon prio=6 tid=0x05660c00 nid=0xc88 in Object.wait() [0x06d8f000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) at java.lang.Thread.join(Thread.java:1143) - locked <0x260f0118> (a org.apache.manifoldcf.core.database.Database$ExecuteQueryThread) at java.lang.Thread.join(Thread.java:1196) at org.apache.manifoldcf.core.database.Database.executeViaThread(Database.java:453) at org.apache.manifoldcf.core.database.Database.executeUncachedQuery(Database.java:505) at org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1152) at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:144) at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:168) at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performModification(DBInterfacePostgreSQL.java:639) at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.reindexTableInternal(DBInterfacePostgreSQL.java:1284) at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.noteModifications(DBInterfacePostgreSQL.java:1356) at org.apache.manifoldcf.core.database.BaseTable.noteModifications(BaseTable.java:299) at org.apache.manifoldcf.crawler.jobs.JobQueue.deleteIngestedDocumentIdentifiers(JobQueue.java:565) at org.apache.manifoldcf.crawler.jobs.JobManager.deleteIngestedDocumentIdentifiers(JobManager.java:789) at org.apache.manifoldcf.crawler.system.DocumentDeleteThread.run(DocumentDeleteThread.java:176) "Document delete stuffer thread" daemon prio=6 tid=0x0565fc00 nid=0x16b0 in Object.wait() [0x06c9f000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) at java.lang.Object.wait(Object.java:485) at org.apache.manifoldcf.core.lockmanager.LockObject.enterWriteLock(LockObject.java:111) - locked <0x260f26f8> (a org.apache.manifoldcf.core.lockmanager.LockObject) at org.apache.manifoldcf.core.lockmanager.LockManager.enterWriteCriticalSection(LockManager.java:1459) at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.noteModifications(DBInterfacePostgreSQL.java:1322) at org.apache.manifoldcf.core.database.BaseTable.noteModifications(BaseTable.java:299) at org.apache.manifoldcf.crawler.jobs.JobQueue.setDeletingStatus(JobQueue.java:718) at org.apache.manifoldcf.crawler.jobs.JobManager.getNextDeletableDocuments(JobManager.java:1227) at org.apache.manifoldcf.crawler.system.DocumentDeleteStufferThread.run(DocumentDeleteStufferThread.java:105) {code} So it looks like our reindexing locks are what are hanging the system - at least in this case. > Postgresql load test job delete document cleanup fails sometimes and leaves > orphaned documents > ---------------------------------------------------------------------------------------------- > > Key: CONNECTORS-279 > URL: https://issues.apache.org/jira/browse/CONNECTORS-279 > Project: ManifoldCF > Issue Type: Bug > Components: Framework crawler agent > Affects Versions: ManifoldCF 0.4 > Reporter: Karl Wright > Assignee: Karl Wright > Fix For: ManifoldCF 0.4 > > > Running the postgresql load test on my laptop, I was surprised when the test > did not finish. The UI indicated that the job was being deleted, but there > were 49,000 documents and that number was not moving. Further inspection > yielded the following: > - Job was in the "DELETING" state > - Documents were in the "BEINGDELETED" state > - No activity of any kind ongoing > The log had no errors. > It was impossible to get a thread dump, but a cursory inspection of the code > indicated that either the documents were being marked as "BEINGDELETED" but > were not actually being placed on the in-memory queue, or the delete threads > were picking up the documents and somehow avoiding marking them as being > processed. > Also, probably unrelated, the Document Status report listed these documents > as having a status of "Being removed" and a state of "Unknown". The > "Unknown" should have been a "Deleting". Since the extended WHEN... ELSE > clause has a reasonable condition for the "Deleting" answer, it's hard to see > how this could have occurred either. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira