Hi Karl,
I hope you might be able to provide some advice and guidance about a job lock 
up and performance issue we are seeing with ManifoldCF 2.0.1 in our environment.
Environment:
We are running ManifoldCF version 2.0.1 on Windows 2008 R2   using a PostgreSQL 
9.3 database
We have used the ManifoldCF multiprocess-file-example as the basis of the 
installation. (As an aside for a production environment would it be acceptable 
to use the file-based synchronisation model?)
We are calling the "start-agents.bat" file from a NSSM service wrapper (could 
there be an issue if we do not call the stop-agents.bat file when the service 
wrapper shuts down.  NSSM shuts down the start-agents .bat by sending it the 
equivalent of Ctrl-C).

The database has been set up using the configuration parameter values in the 
postgresql.conf parameters identified in 
https://manifoldcf.apache.org/release/release-2.5/en_US/how-to-build-and-deploy.html
We crawl various repositories including:
-Website with 1.8 million documents (crawled using 16 different jobs - each run 
manually)
-SharePoint instance with only a few hundred documents (2 or 3 crawl Jobs)   
run hourly
-Website crawled every 5 mins ( upto 4 documents returned - 60kb each)
The documents are output to multiple collections in SOLR Cloud 4.10.3

We have been running the PostgreSQL Vacuum Full and Re-index manually (each 
take an hour to run) on a semi regular basis and before and after large jobs.
Looking at the PostgreSQL statistics it appears that the Jobqueue table is 
being analysed every couple of minutes.

Issues:
Issue 1: Warnings about locks
Regularly we see lock errors in the logs which sometimes result in all the jobs 
halting until a lock-clean is performed.
We have checked that all the manifoldcf processes are running as the same user 
(SYSTEM),   and that AV and Windows indexing are not scanning the Syncharea.

The kinds of lock errors we are seeing include:

"WARN 2017-01-19 04:33:40,964 (Idle cleanup thread) - Attempt to set file lock 
'D:\Apps\ManifoldCF\apache-manifoldcf-2.0.1\multiprocess-file-example\.\.\syncharea\602\12\lock-_Cache_OUTPUTCONNECTION_Solr
 COLL1 osp_unstruct.lock' failed: Access is denied java.io.IOException: Access 
is denied"

and
WARN 2017-01-19 03:22:33,312 (Idle cleanup thread) - Attempt to set file lock 
'D:\Apps\ManifoldCF\apache-manifoldcf-2.0.1\multiprocess-file-example\.\.\syncharea\386\230\lock-_POOLTARGET__TRANSFORMATIONCONNECTORPOOL_Metadata
 Adjuster.lock' failed: Access is denied
java.io.IOException: Access is denied

and
WARN 2017-01-19 03:35:39,864 (Idle cleanup thread) - Attempt to set file lock 
'D:\Apps\ManifoldCF\apache-manifoldcf-2.0.1\multiprocess-file-example\.\.\syncharea\475\708\lock-_POOLTARGET__OUTPUTCONNECTORPOOL_Solr
 COLL1 osp_unstruct.lock' failed: Access is denied
java.io.IOException: Access is denied







Issue 2:  Log contains  Error : Unexpected jobqueue status

"WARN 2017-01-16 13:57:20,954 (Worker thread '7') - Service interruption 
reported for job 1442819500587 connection WebSource1: IO exception reading 
header: Connection reset
WARN 2017-01-16 13:59:51,042 (Worker thread '72') - Service interruption 
reported for job 1442819500587 connection WebSource1: IO exception reading 
header: Connection reset
WARN 2017-01-16 13:59:51,042 (Worker thread '93') - Service interruption 
reported for job 1442819500587 connection WebSource1: IO exception reading 
header: Connection reset
WARN 2017-01-16 13:59:51,057 (Worker thread '82') - Service interruption 
reported for job 1442819500587 connection WebSource1: IO exception reading 
header: Connection reset
WARN 2017-01-16 13:59:51,057 (Worker thread '103') - Service interruption 
reported for job 1442819500587 connection WebSource1: IO exception reading 
header: Connection reset
WARN 2017-01-16 13:59:51,057 (Worker thread '47') - Service interruption 
reported for job 1442819500587 connection WebSource1: IO exception reading 
header: Connection reset
WARN 2017-01-16 14:00:04,037 (Worker thread '74') - Service interruption 
reported for job 1442819500587 connection WebSource1: IO exception reading 
header: Connection reset
WARN 2017-01-16 14:00:04,037 (Worker thread '43') - Service interruption 
reported for job 1442819500587 connection WebSource1: IO exception reading 
header: Connection reset
WARN 2017-01-16 14:00:04,037 (Worker thread '101') - Service interruption 
reported for job 1442819500587 connection WebSource1: IO exception reading 
header: Connection reset
WARN 2017-01-16 14:00:04,037 (Worker thread '109') - Service interruption 
reported for job 1442819500587 connection WebSource1: IO exception reading 
header: Connection reset
WARN 2017-01-16 14:00:04,037 (Worker thread '77') - Service interruption 
reported for job 1442819500587 connection WebSource1: IO exception reading 
header: Connection reset
WARN 2017-01-16 14:00:04,037 (Worker thread '15') - Service interruption 
reported for job 1442819500587 connection WebSource1: IO exception reading 
header: Connection reset
WARN 2017-01-16 14:00:04,037 (Worker thread '30') - Service interruption 
reported for job 1442819500587 connection WebSource1: IO exception reading 
header: Connection reset
ERROR 2017-01-16 14:01:03,566 (Worker thread '43') - Exception tossed: 
Unexpected jobqueue status - record id 1463482497412, expecting active status, 
saw 4
org.apache.manifoldcf.core.interfaces.ManifoldCFException: Unexpected jobqueue 
status - record id 1463482497412, expecting active status, saw 4
at 
org.apache.manifoldcf.crawler.jobs.JobQueue.updateCompletedRecord(JobQueue.java:1019)
                at 
org.apache.manifoldcf.crawler.jobs.JobManager.markDocumentCompletedMultiple(JobManager.java:3118)
at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:661)


Issue 3: Various warnings about long running queries
Example 1
WARN 2017-01-16 14:08:35,670 (Startup thread) - Found a long-running query 
(98326 ms): [UPDATE hopcount SET distance=?,deathmark=? WHERE id IN(SELECT 
t0.ownerid FROM hopdeletedeps t0,jobqueue t99,intrinsiclink t1 WHERE t0.jobid=? 
AND t1.jobid=? AND t1.parentidhash=t0.parentidhash AND t1.linktype=t0.linktype 
AND t1.childidhash=t0.childidhash AND t99.jobid=? AND 
t99.dochash=t0.childidhash AND  (t99.status=? OR t99.status=?))]
WARN 2017-01-16 14:08:35,670 (Startup thread) -   Parameter 0: '-1'
WARN 2017-01-16 14:08:35,670 (Startup thread) -   Parameter 1: 'D'
WARN 2017-01-16 14:08:35,670 (Startup thread) -   Parameter 2: '1442819500587'
WARN 2017-01-16 14:08:35,670 (Startup thread) -   Parameter 3: '1442819500587'
WARN 2017-01-16 14:08:35,670 (Startup thread) -   Parameter 4: '1442819500587'
WARN 2017-01-16 14:08:35,670 (Startup thread) -   Parameter 5: 'P'
WARN 2017-01-16 14:08:35,670 (Startup thread) -   Parameter 6: 'H'
WARN 2017-01-16 14:08:35,702 (Startup thread) -  Plan: Update on hopcount  
(cost=7229.16..7236.97 rows=1 width=87)
WARN 2017-01-16 14:08:35,702 (Startup thread) -  Plan:   ->  Nested Loop  
(cost=7229.16..7236.97 rows=1 width=87)
WARN 2017-01-16 14:08:35,702 (Startup thread) -  Plan:         ->  
HashAggregate  (cost=7228.73..7228.74 rows=1 width=26)
WARN 2017-01-16 14:08:35,702 (Startup thread) -  Plan:               ->  Nested 
Loop  (cost=9.99..7228.72 rows=1 width=26)
WARN 2017-01-16 14:08:35,702 (Startup thread) -  Plan:                     ->  
Nested Loop  (cost=9.43..7220.35 rows=1 width=148)
WARN 2017-01-16 14:08:35,702 (Startup thread) -  Plan:                          
 ->  Bitmap Heap Scan on jobqueue t99  (cost=8.87..12.89 rows=1 width=47)
WARN 2017-01-16 14:08:35,702 (Startup thread) -  Plan:                          
       Recheck Cond: ((status = 'P'::bpchar) OR (status = 'H'::bpchar))
WARN 2017-01-16 14:08:35,702 (Startup thread) -  Plan:                          
       Filter: (jobid = 1442819500587::bigint)
WARN 2017-01-16 14:08:35,702 (Startup thread) -  Plan:                          
       ->  BitmapOr  (cost=8.87..8.87 rows=1 width=0)
WARN 2017-01-16 14:08:35,702 (Startup thread) -  Plan:                          
             ->  Bitmap Index Scan on i1437569161257  (cost=0.00..4.44 rows=1 
width=0)
WARN 2017-01-16 14:08:35,702 (Startup thread) -  Plan:                          
                   Index Cond: (status = 'P'::bpchar)
WARN 2017-01-16 14:08:35,702 (Startup thread) -  Plan:                          
             ->  Bitmap Index Scan on i1437569161257  (cost=0.00..4.44 rows=1 
width=0)
WARN 2017-01-16 14:08:35,702 (Startup thread) -  Plan:                          
                   Index Cond: (status = 'H'::bpchar)
WARN 2017-01-16 14:08:35,702 (Startup thread) -  Plan:                          
 ->  Index Scan using i1437569161250 on hopdeletedeps t0  (cost=0.56..7188.01 
rows=1945 width=101)
WARN 2017-01-16 14:08:35,702 (Startup thread) -  Plan:                          
       Index Cond: ((jobid = 1442819500587::bigint) AND ((childidhash)::text = 
(t99.dochash)::text))
WARN 2017-01-16 14:08:35,702 (Startup thread) -  Plan:                     ->  
Index Scan using i1437569161252 on intrinsiclink t1  (cost=0.55..8.36 rows=1 
width=93)
WARN 2017-01-16 14:08:35,702 (Startup thread) -  Plan:                          
 Index Cond: ((jobid = 1442819500587::bigint) AND ((parentidhash)::text = 
(t0.parentidhash)::text) AND ((linktype)::text = (t0.linktype)::text) AND 
((childidhash)::text = (t0.childidhash)::text))
WARN 2017-01-16 14:08:35,702 (Startup thread) -  Plan:         ->  Index Scan 
using hopcount_pkey on hopcount  (cost=0.43..8.22 rows=1 width=69)
WARN 2017-01-16 14:08:35,702 (Startup thread) -  Plan:               Index 
Cond: (id = t0.ownerid)
WARN 2017-01-16 14:08:35,702 (Startup thread) -
 WARN 2017-01-16 14:08:36,279 (Startup thread) -  Stats: n_distinct=3.0 
most_common_vals={C,G,Z} 
most_common_freqs={0.97649997,0.023066666,0.00043333333}
WARN 2017-01-16 14:08:36,279 (Startup thread) -

Example 2
WARN 2017-01-18 03:49:17,802 (Worker thread '73') - Found a long-running query 
(157622 ms): [SELECT t0.id,t0.dochash,t0.docid FROM carrydown t1, jobqueue t0 
WHERE t1.jobid=? AND t1.parentidhash=? AND t0.dochash=t1.childidhash AND 
t0.jobid=t1.jobid AND t1.isnew=?]
WARN 2017-01-18 03:49:17,802 (Worker thread '73') -   Parameter 0: 
'1443087309509'
WARN 2017-01-18 03:49:17,802 (Worker thread '73') -   Parameter 1: 
'7397E59F70C5FA473E178890A7A96C871833F621'
WARN 2017-01-18 03:49:17,802 (Worker thread '73') -   Parameter 2: 'B'
WARN 2017-01-18 03:49:17,865 (Worker thread '73') -  Plan: Nested Loop  
(cost=0.82..14.62 rows=1 width=126) (actual time=0.020..0.020 rows=0 loops=1)
WARN 2017-01-18 03:49:17,896 (Worker thread '73') -  Plan:   ->  Index Scan 
using i1437569161245 on carrydown t1  (cost=0.27..6.04 rows=1 width=49) (actual 
time=0.007..0.007 rows=0 loops=1)
WARN 2017-01-18 03:49:17,896 (Worker thread '73') -  Plan:         Index Cond: 
((jobid = 1443087309509::bigint) AND ((parentidhash)::text = 
'7397E59F70C5FA473E178890A7A96C871833F621'::text))
WARN 2017-01-18 03:49:17,896 (Worker thread '73') -  Plan:         Filter: 
(isnew = 'B'::bpchar)
WARN 2017-01-18 03:49:17,896 (Worker thread '73') -  Plan:   ->  Index Scan 
using i1437569161255 on jobqueue t0  (cost=0.55..8.57 rows=1 width=134) (never 
executed)


Example 3:
WARN 2017-01-18 04:37:26,435 (Expire stuffer thread) - Found a long-running 
query (157638 ms): [SELECT 
t0.id,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount FROM 
jobqueue t0 WHERE  (t0.status=? OR t0.status=?) AND t0.checkaction=? AND 
t0.checktime<=? AND EXISTS(SELECT 'x' FROM jobs t1 WHERE  (t1.status=? OR 
t1.status=?) AND t1.id=t0.jobid) AND NOT EXISTS(SELECT 'x' FROM jobqueue t2 
WHERE t2.dochash=t0.dochash AND  (t2.status=? OR t2.status=? OR t2.status=? OR 
t2.status=? OR t2.status=? OR t2.status=?) AND t2.jobid!=t0.jobid) LIMIT 100]
WARN 2017-01-18 04:37:26,451 (Expire stuffer thread) -   Parameter 0: 'P'
WARN 2017-01-18 04:37:26,451 (Expire stuffer thread) -   Parameter 1: 'G'
WARN 2017-01-18 04:37:26,451 (Expire stuffer thread) -   Parameter 2: 'D'
WARN 2017-01-18 04:37:26,451 (Expire stuffer thread) -   Parameter 3: 
'1484714088797'
WARN 2017-01-18 04:37:26,451 (Expire stuffer thread) -   Parameter 4: 'A'
WARN 2017-01-18 04:37:26,451 (Expire stuffer thread) -   Parameter 5: 'a'
WARN 2017-01-18 04:37:26,451 (Expire stuffer thread) -   Parameter 6: 'A'
WARN 2017-01-18 04:37:26,451 (Expire stuffer thread) -   Parameter 7: 'F'
WARN 2017-01-18 04:37:26,451 (Expire stuffer thread) -   Parameter 8: 'a'
WARN 2017-01-18 04:37:26,451 (Expire stuffer thread) -   Parameter 9: 'f'
WARN 2017-01-18 04:37:26,466 (Expire stuffer thread) -   Parameter 10: 'D'
WARN 2017-01-18 04:37:26,466 (Expire stuffer thread) -   Parameter 11: 'd'
WARN 2017-01-18 04:37:26,466 (Expire stuffer thread) -  Plan: Limit  
(cost=17.72..45.87 rows=1 width=152) (actual time=0.043..0.043 rows=0 loops=1)
WARN 2017-01-18 04:37:26,466 (Expire stuffer thread) -  Plan:   ->  Nested Loop 
Anti Join  (cost=17.72..45.87 rows=1 width=152) (actual time=0.040..0.040 
rows=0 loops=1)
WARN 2017-01-18 04:37:26,466 (Expire stuffer thread) -  Plan:         ->  
Nested Loop Semi Join  (cost=13.16..25.21 rows=1 width=152) (actual 
time=0.038..0.038 rows=0 loops=1)
WARN 2017-01-18 04:37:26,466 (Expire stuffer thread) -  Plan:               ->  
Bitmap Heap Scan on jobqueue t0  (cost=8.88..12.90 rows=1 width=152) (actual 
time=0.036..0.036 rows=0 loops=1)
WARN 2017-01-18 04:37:26,466 (Expire stuffer thread) -  Plan:                   
  Recheck Cond: (((status = 'P'::bpchar) AND (checkaction = 'D'::bpchar) AND 
(checktime <= 1484714088797::bigint)) OR ((status = 'G'::bpchar) AND 
(checkaction = 'D'::bpchar) AND (checktime <= 1484714088797::bigint)))
WARN 2017-01-18 04:37:26,466 (Expire stuffer thread) -  Plan:                   
  ->  BitmapOr  (cost=8.88..8.88 rows=1 width=0) (actual time=0.033..0.033 
rows=0 loops=1)
WARN 2017-01-18 04:37:26,466 (Expire stuffer thread) -  Plan:                   
        ->  Bitmap Index Scan on i1437569161257  (cost=0.00..4.44 rows=1 
width=0) (actual time=0.017..0.017 rows=0 loops=1)
WARN 2017-01-18 04:37:26,466 (Expire stuffer thread) -  Plan:                   
              Index Cond: ((status = 'P'::bpchar) AND (checkaction = 
'D'::bpchar) AND (checktime <= 1484714088797::bigint))
WARN 2017-01-18 04:37:26,466 (Expire stuffer thread) -  Plan:                   
        ->  Bitmap Index Scan on i1437569161257  (cost=0.00..4.44 rows=1 
width=0) (actual time=0.012..0.012 rows=0 loops=1)
WARN 2017-01-18 04:37:26,466 (Expire stuffer thread) -  Plan:                   
              Index Cond: ((status = 'G'::bpchar) AND (checkaction = 
'D'::bpchar) AND (checktime <= 1484714088797::bigint))
WARN 2017-01-18 04:37:26,466 (Expire stuffer thread) -  Plan:               ->  
Bitmap Heap Scan on jobs t1  (cost=4.28..8.29 rows=1 width=8) (never executed)
WARN 2017-01-18 04:37:26,466 (Expire stuffer thread) -  Plan:                   
  Recheck Cond: (id = t0.jobid)
WARN 2017-01-18 04:37:26,466 (Expire stuffer thread) -  Plan:                   
  Filter: ((status = 'A'::bpchar) OR (status = 'a'::bpchar))
WARN 2017-01-18 04:37:26,466 (Expire stuffer thread) -  Plan:                   
  ->  Bitmap Index Scan on jobs_pkey  (cost=0.00..4.28 rows=1 width=0) (never 
executed)
WARN 2017-01-18 04:37:26,466 (Expire stuffer thread) -  Plan:                   
        Index Cond: (id = t0.jobid)
WARN 2017-01-18 04:37:26,466 (Expire stuffer thread) -  Plan:         ->  
Bitmap Heap Scan on jobqueue t2  (cost=4.57..12.61 rows=1 width=49) (never 
executed)
WARN 2017-01-18 04:37:26,466 (Expire stuffer thread) -  Plan:               
Recheck Cond: ((dochash)::text = (t0.dochash)::text)
WARN 2017-01-18 04:37:26,466 (Expire stuffer thread) -  Plan:               
Filter: ((jobid <> t0.jobid) AND ((status = 'A'::bpchar) OR (status = 
'F'::bpchar) OR (status = 'a'::bpchar) OR (status = 'f'::bpchar) OR (status = 
'D'::bpchar) OR (status = 'd'::bpchar)))
WARN 2017-01-18 04:37:26,466 (Expire stuffer thread) -  Plan:               ->  
Bitmap Index Scan on i1437569161255  (cost=0.00..4.57 rows=2 width=0) (never 
executed)
WARN 2017-01-18 04:37:26,466 (Expire stuffer thread) -  Plan:                   
  Index Cond: ((dochash)::text = (t0.dochash)::text)
WARN 2017-01-18 04:37:26,466 (Expire stuffer thread) -  Plan: Total runtime: 
0.151 ms

Best Regards,

Guy

Guy Standen



Unless otherwise stated, this email has been sent from Fujitsu Services Limited 
(registered in England No 96056); Fujitsu EMEA PLC (registered in England No 
2216100) both with registered offices at: 22 Baker Street, London W1U 3BW;  PFU 
(EMEA) Limited, (registered in England No 1578652) and Fujitsu Laboratories of 
Europe Limited (registered in England No. 4153469) both with registered offices 
at: Hayes Park Central, Hayes End Road, Hayes, Middlesex, UB4 8FE. 
This email is only for the use of its intended recipient. Its contents are 
subject to a duty of confidence and may be privileged. Fujitsu does not 
guarantee that this email has not been intercepted and amended or that it is 
virus-free.

Reply via email to