I wrote a custom ManifoldCF repository connector for an internal document 
system, and it shows some strange behaviour that I am not able to explain.

1. When I schedule the job to run on a specific day and at a specific time, the 
job runs, but after it shuts down it decides that it is still within the run 
window and starts again. This repeats many times; in the end the job runs 15 
times or more. The job history shows no 'job end' event, but it does show a 
'job start' event for every restart that took place after the schedule window 
opened. Starting the job manually works fine, i.e. it runs only once. Also, 
because I set a maximum run time of 300 minutes, the job ends up in a waiting 
state once that interval expires.

Below you can find some of the logs of this particular job.

 (Finisher thread) - Marked job 1470044524072 for shutdown
 INFO 2017-01-13 03:53:46,848 (Job notification thread) - Found job 
1470044524072 in need of notification
 INFO 2017-01-13 03:53:51,349 (Job start thread) - Job '1470044524072' is 
within run window at 1484276031338 ms. (which starts at 1484258400000 ms. and 
goes for 18000000 ms.)
 INFO 2017-01-13 03:53:51,356 (Job start thread) - Signalled for job start for 
job 1470044524072
 INFO 2017-01-13 03:53:55,479 (Startup thread) - Marked job 1470044524072 for 

Why does it have this behaviour and how can I correct it?
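
To double-check the numbers in the 'within run window' log line, I reproduced 
the comparison with a small sketch. This is my own arithmetic, not ManifoldCF 
code: the class and method names are hypothetical, and treating the end of the 
window as exclusive is an assumption on my part.

```java
public class RunWindowCheck {
    /**
     * Hypothetical helper: true if 'nowMs' falls inside the window that
     * starts at 'startMs' and lasts 'durationMs'. The exclusive end bound
     * is an assumption, not taken from ManifoldCF.
     */
    static boolean isWithinWindow(long nowMs, long startMs, long durationMs) {
        return nowMs >= startMs && nowMs < startMs + durationMs;
    }

    /** Milliseconds left before the window closes (negative if already closed). */
    static long remainingMs(long nowMs, long startMs, long durationMs) {
        return startMs + durationMs - nowMs;
    }

    public static void main(String[] args) {
        // Values copied from the log line above.
        long now = 1484276031338L;
        long start = 1484258400000L;
        long duration = 18000000L; // 300 minutes

        System.out.println("within window: " + isWithinWindow(now, start, duration));
        System.out.println("remaining ms:  " + remainingMs(now, start, duration));
    }
}
```

With the values from the log the check is true and 368662 ms (roughly six 
minutes) of the 300-minute window remain, so the restart at 03:53:51 is at 
least consistent with the window still being open.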

2. In the second scenario I had indexed some documents and wanted to simulate 
our internal repository being unavailable. In the current implementation, if 
any error occurs while seeding the documents, I do not throw an exception but 
instead provide an empty list of documents to be seeded. ManifoldCF then 
processes the already indexed documents, and in this case the connector throws 
ServiceInterruption exceptions, which stop the job after 3 unsuccessful 
retries. However, the ManifoldCF clean-up thread then decides that all the 
documents need to be deleted from the index. I would like to keep or update 
the documents, not delete them, which is why I chose the ADD_CHANGE connector 
model. There is only one place where I explicitly invoke 
activities.deleteDocument, and that happens only when our document repository 
is available and the document cannot be found. This case is exceptional and 
will almost never occur in practice, because the document repository never 
deletes files.
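
To make the seeding behaviour concrete, here is a simplified sketch of what 
the connector does today when the repository cannot be reached. It uses 
stand-in types only: RepositoryClient, listDocumentIds and seedIds are 
hypothetical names for illustration and are not ManifoldCF interfaces or my 
actual connector code.

```java
import java.io.IOException;
import java.util.Collections;
import java.util.List;

public class SeedingSketch {
    /** Hypothetical stand-in for the client of our internal document system. */
    interface RepositoryClient {
        List<String> listDocumentIds() throws Exception;
    }

    /**
     * Current behaviour: any error during seeding is swallowed and an empty
     * list is returned, so the framework sees "nothing to seed" rather than
     * a failure.
     */
    static List<String> seedIds(RepositoryClient client) {
        try {
            return client.listDocumentIds();
        } catch (Exception e) {
            // Repository unavailable: report no documents instead of failing.
            return Collections.emptyList();
        }
    }

    public static void main(String[] args) {
        // Simulate the repository being down.
        RepositoryClient down = () -> {
            throw new IOException("repository unavailable");
        };
        System.out.println("seeded documents: " + seedIds(down).size());
    }
}
```

From the framework's point of view, an outage during seeding is therefore 
indistinguishable from a repository that simply has no documents.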

Why does the ManifoldCF clean-up thread mark the documents for deletion when 
my connector does not request it?

Thank you,
