It is a multiprocess setup with file-based synchronisation. I can see reprioritisation in the logs, and after a while all I can see are these log entries cycling:

DEBUG 2015-08-17 20:27:19,980 (Expire stuffer thread) - org.apache.manifoldcf.crawlerthreads - Expiration stuffer thread woke up
DEBUG 2015-08-17 20:27:19,981 (Expire stuffer thread) - org.apache.manifoldcf.perf - Beginning query to look for documents to expire
DEBUG 2015-08-17 20:27:19,981 (Expire stuffer thread) - org.apache.manifoldcf.perf - Attempt 1 to expire documents, after 0 ms
DEBUG 2015-08-17 20:27:19,983 (Expire stuffer thread) - org.apache.manifoldcf.perf - Expiring 0 documents
DEBUG 2015-08-17 20:27:19,984 (Expire stuffer thread) - org.apache.manifoldcf.crawlerthreads - Expiration stuffer thread: Found 0 documents to expire
DEBUG 2015-08-17 20:27:19,996 (Expire stuffer thread) - org.apache.manifoldcf.crawlerthreads - Expiration stuffer thread woke up
DEBUG 2015-08-17 20:27:19,996 (Expire stuffer thread) - org.apache.manifoldcf.perf - Beginning query to look for documents to expire
DEBUG 2015-08-17 20:27:19,997 (Expire stuffer thread) - org.apache.manifoldcf.perf - Attempt 1 to expire documents, after 1 ms
DEBUG 2015-08-17 20:27:19,999 (Expire stuffer thread) - org.apache.manifoldcf.perf - Expiring 0 documents
DEBUG 2015-08-17 20:27:19,999 (Expire stuffer thread) - org.apache.manifoldcf.crawlerthreads - Expiration stuffer thread: Found 0 documents to expire
DEBUG 2015-08-17 20:27:20,077 (Document cleanup stuffer thread) - org.apache.manifoldcf.crawlerthreads - Document cleanup stuffer thread woke up
DEBUG 2015-08-17 20:27:20,077 (Document delete stuffer thread) - org.apache.manifoldcf.crawlerthreads - Document delete stuffer thread woke up
DEBUG 2015-08-17 20:27:20,078 (Document cleanup stuffer thread) - org.apache.manifoldcf.crawlerthreads - Document cleanup stuffer thread found nothing to do
DEBUG 2015-08-17 20:27:20,078 (Document delete stuffer thread) - org.apache.manifoldcf.crawlerthreads - Document delete stuffer thread found nothing to do
DEBUG 2015-08-17 20:27:20,083 (Document delete stuffer thread) - org.apache.manifoldcf.crawlerthreads - Document delete stuffer thread woke up
DEBUG 2015-08-17 20:27:20,083 (Document cleanup stuffer thread) - org.apache.manifoldcf.crawlerthreads - Document cleanup stuffer thread woke up
DEBUG 2015-08-17 20:27:20,084 (Document delete stuffer thread) - org.apache.manifoldcf.crawlerthreads - Document delete stuffer thread found nothing to do
DEBUG 2015-08-17 20:27:20,084 (Document cleanup stuffer thread) - org.apache.manifoldcf.crawlerthreads - Document cleanup stuffer thread found nothing to do
DEBUG 2015-08-17 20:27:21,078 (Document cleanup stuffer thread) - org.apache.manifoldcf.crawlerthreads - Document cleanup stuffer thread woke up

On 17 August 2015 at 21:29, Karl Wright <[email protected]> wrote:
> 2.1 does do background reprioritization. If you want to see that occurring
> in the log, you would need to add the following in your properties.xml file:
>
> <property name="org.apache.manifoldcf.scheduling" value="DEBUG"/>
>
> Can I have more information? Specifically, is this a multiprocess setup?
> and if so, is this zookeeper or file system synchronization?
>
> Karl
>
>
> On Mon, Aug 17, 2015 at 2:57 PM, Roman Šitina <[email protected]> wrote:
>>
>> Hello Karl,
>>
>> thanks for you quick reply!
>>
>> The version is 2.1. I tried to get detailed logging by setting
>> log4j.rootLogger=INFO, MAIN in logging.ini but that did not help -
>> only WARN level was still logging after restart.
>>
>> Roman
>>
>> On 17 August 2015 at 20:35, Karl Wright <[email protected]> wrote:
>> > Hi Roman,
>> >
>> > ManifoldCF needs to reprioritize documents whenever you pause or restart
>> > jobs. For jobs with large numbers of documents, the total amount of work
>> > involved in this is significant. But, depending on the precise ManifoldCF
>> > version you are using, the reprioritization typically continues in
>> > background while MCF runs your job.
>> >
>> > Can you tell me more about what version of MCF you are trying here?
>> >
>> > Karl
>> >
>> >
>> > On Mon, Aug 17, 2015 at 2:13 PM, Roman Šitina <[email protected]> wrote:
>> >>
>> >> Hello,
>> >>
>> >> I have a ManifoldCF setup based on multiprocess-file-example which is
>> >> backed by PostgreSQL.
>> >>
>> >> I have created a connection from Documentum to ElasticSearch with
>> >> about 300 000 documents. I was able to crawl several thousand
>> >> documents so the connection is working properly.
>> >>
>> >> What I'm not sure about is that when I pause or stop the job and then
>> >> run it again it takes a while and it looks like ManifoldCF is doing
>> >> nothing (30 minutes). After that time I usually try to restart all
>> >> processes.
>> >>
>> >> I looked at all logs - manifoldcf.log, documentum-registry,
>> >> documentum-server and DFC itself but I can't find any relevant
>> >> information.
>> >>
>> >> Can you help me figuring out what is the best way to monitor progress
>> >> of jobs that look to be not progressing?
>> >>
>> >> Thank you very much
>> >> Roman
>> >
>> >
>
>
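
One rough way to check whether anything besides the stuffer threads is writing to the log would be to tally entries per thread and logger. The following is only a sketch: the regular expression is an assumption based on the entries pasted above, and `summarise` and `summarise_file` are hypothetical helper names, not part of ManifoldCF.

```python
import re
from collections import Counter

# Assumed entry layout, taken from the log lines quoted in this thread:
# LEVEL yyyy-mm-dd HH:MM:SS,mmm (thread name) - logger - message
ENTRY = re.compile(
    r"^(?P<level>[A-Z]+) "
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"\((?P<thread>[^)]*)\) - (?P<logger>\S+) - (?P<message>.*)$"
)


def summarise(lines):
    """Count log entries per (thread name, logger); skip lines that do not match."""
    counts = Counter()
    for line in lines:
        match = ENTRY.match(line.strip())
        if match:
            counts[(match.group("thread"), match.group("logger"))] += 1
    return counts


def summarise_file(path):
    """Same tally, reading the entries from a log file at the given path."""
    with open(path, encoding="utf-8") as handle:
        return summarise(handle)
```

Calling `summarise_file("manifoldcf.log").most_common()` would list the busiest thread/logger pairs first. If only the stuffer threads show up, that would match the cycling described above; entries from other threads or loggers would suggest the job is still doing work.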
