I would try executing the lock clean procedure. Shut down all ManifoldCF processes and web applications, then run the LockClean script, then start them back up again. If you have shut any processes down with kill -9, then you may have locks hanging around.
Karl

On Mon, Aug 17, 2015 at 4:34 PM, Roman Šitina <[email protected]> wrote:

> It is a multiprocess setup with file synchronisation.
>
> I can see reprioritisation in the logs, and after a while all I can see are
> these log entries cycling:
>
> DEBUG 2015-08-17 20:27:19,980 (Expire stuffer thread) - org.apache.manifoldcf.crawlerthreads - Expiration stuffer thread woke up
> DEBUG 2015-08-17 20:27:19,981 (Expire stuffer thread) - org.apache.manifoldcf.perf - Beginning query to look for documents to expire
> DEBUG 2015-08-17 20:27:19,981 (Expire stuffer thread) - org.apache.manifoldcf.perf - Attempt 1 to expire documents, after 0 ms
> DEBUG 2015-08-17 20:27:19,983 (Expire stuffer thread) - org.apache.manifoldcf.perf - Expiring 0 documents
> DEBUG 2015-08-17 20:27:19,984 (Expire stuffer thread) - org.apache.manifoldcf.crawlerthreads - Expiration stuffer thread: Found 0 documents to expire
> DEBUG 2015-08-17 20:27:19,996 (Expire stuffer thread) - org.apache.manifoldcf.crawlerthreads - Expiration stuffer thread woke up
> DEBUG 2015-08-17 20:27:19,996 (Expire stuffer thread) - org.apache.manifoldcf.perf - Beginning query to look for documents to expire
> DEBUG 2015-08-17 20:27:19,997 (Expire stuffer thread) - org.apache.manifoldcf.perf - Attempt 1 to expire documents, after 1 ms
> DEBUG 2015-08-17 20:27:19,999 (Expire stuffer thread) - org.apache.manifoldcf.perf - Expiring 0 documents
> DEBUG 2015-08-17 20:27:19,999 (Expire stuffer thread) - org.apache.manifoldcf.crawlerthreads - Expiration stuffer thread: Found 0 documents to expire
> DEBUG 2015-08-17 20:27:20,077 (Document cleanup stuffer thread) - org.apache.manifoldcf.crawlerthreads - Document cleanup stuffer thread woke up
> DEBUG 2015-08-17 20:27:20,077 (Document delete stuffer thread) - org.apache.manifoldcf.crawlerthreads - Document delete stuffer thread woke up
> DEBUG 2015-08-17 20:27:20,078 (Document cleanup stuffer thread) - org.apache.manifoldcf.crawlerthreads - Document cleanup stuffer thread found nothing to do
> DEBUG 2015-08-17 20:27:20,078 (Document delete stuffer thread) - org.apache.manifoldcf.crawlerthreads - Document delete stuffer thread found nothing to do
> DEBUG 2015-08-17 20:27:20,083 (Document delete stuffer thread) - org.apache.manifoldcf.crawlerthreads - Document delete stuffer thread woke up
> DEBUG 2015-08-17 20:27:20,083 (Document cleanup stuffer thread) - org.apache.manifoldcf.crawlerthreads - Document cleanup stuffer thread woke up
> DEBUG 2015-08-17 20:27:20,084 (Document delete stuffer thread) - org.apache.manifoldcf.crawlerthreads - Document delete stuffer thread found nothing to do
> DEBUG 2015-08-17 20:27:20,084 (Document cleanup stuffer thread) - org.apache.manifoldcf.crawlerthreads - Document cleanup stuffer thread found nothing to do
> DEBUG 2015-08-17 20:27:21,078 (Document cleanup stuffer thread) - org.apache.manifoldcf.crawlerthreads - Document cleanup stuffer thread woke up
>
> On 17 August 2015 at 21:29, Karl Wright <[email protected]> wrote:
> > 2.1 does do background reprioritization. If you want to see that occurring
> > in the log, you would need to add the following in your properties.xml file:
> >
> > <property name="org.apache.manifoldcf.scheduling" value="DEBUG"/>
> >
> > Can I have more information? Specifically, is this a multiprocess setup,
> > and if so, is this ZooKeeper or file system synchronization?
> >
> > Karl
> >
> > On Mon, Aug 17, 2015 at 2:57 PM, Roman Šitina <[email protected]> wrote:
> >>
> >> Hello Karl,
> >>
> >> thanks for your quick reply!
> >>
> >> The version is 2.1. I tried to get detailed logging by setting
> >> log4j.rootLogger=INFO, MAIN in logging.ini but that did not help -
> >> only WARN level was still being logged after restart.
> >>
> >> Roman
> >>
> >> On 17 August 2015 at 20:35, Karl Wright <[email protected]> wrote:
> >> > Hi Roman,
> >> >
> >> > ManifoldCF needs to reprioritize documents whenever you pause or restart
> >> > jobs. For jobs with large numbers of documents, the total amount of work
> >> > involved in this is significant. But, depending on the precise ManifoldCF
> >> > version you are using, the reprioritization typically continues in the
> >> > background while MCF runs your job.
> >> >
> >> > Can you tell me more about what version of MCF you are trying here?
> >> >
> >> > Karl
> >> >
> >> > On Mon, Aug 17, 2015 at 2:13 PM, Roman Šitina <[email protected]> wrote:
> >> >>
> >> >> Hello,
> >> >>
> >> >> I have a ManifoldCF setup based on multiprocess-file-example which is
> >> >> backed by PostgreSQL.
> >> >>
> >> >> I have created a connection from Documentum to ElasticSearch with
> >> >> about 300 000 documents. I was able to crawl several thousand
> >> >> documents, so the connection is working properly.
> >> >>
> >> >> What I'm not sure about is that when I pause or stop the job and then
> >> >> run it again, it takes a while and it looks like ManifoldCF is doing
> >> >> nothing (30 minutes). After that time I usually try to restart all
> >> >> processes.
> >> >>
> >> >> I looked at all logs - manifoldcf.log, documentum-registry,
> >> >> documentum-server and DFC itself - but I can't find any relevant
> >> >> information.
> >> >>
> >> >> Can you help me figure out the best way to monitor the progress
> >> >> of jobs that appear not to be progressing?
> >> >>
> >> >> Thank you very much
> >> >> Roman
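
On the monitoring question that opened the thread, one rough way to see what the log is cycling through is to tally entries per thread. The sketch below is only an illustration: it assumes the line layout seen in the quoted DEBUG excerpt (level, timestamp, thread name in parentheses, logger name, message), and the names `parse_line` and `count_by_thread` are hypothetical helpers, not part of ManifoldCF.

```python
import re
from collections import Counter

# Assumed layout, taken from the DEBUG lines quoted in this thread:
#   LEVEL yyyy-mm-dd HH:MM:SS,mmm (thread name) - logger.name - message
LINE_RE = re.compile(
    r"^(?P<level>[A-Z]+)\s+"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"\((?P<thread>[^)]*)\)\s+-\s+"
    r"(?P<logger>\S+)\s+-\s+"
    r"(?P<message>.*)$"
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


def count_by_thread(lines):
    """Count parsed log entries per thread name, skipping unparseable lines."""
    counts = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry is not None:
            counts[entry["thread"]] += 1
    return counts


# Three lines copied from the excerpt in this thread.
SAMPLE = """\
DEBUG 2015-08-17 20:27:19,980 (Expire stuffer thread) - org.apache.manifoldcf.crawlerthreads - Expiration stuffer thread woke up
DEBUG 2015-08-17 20:27:19,984 (Expire stuffer thread) - org.apache.manifoldcf.crawlerthreads - Expiration stuffer thread: Found 0 documents to expire
DEBUG 2015-08-17 20:27:20,078 (Document delete stuffer thread) - org.apache.manifoldcf.crawlerthreads - Document delete stuffer thread found nothing to do
"""

if __name__ == "__main__":
    for thread, n in count_by_thread(SAMPLE.splitlines()).most_common():
        print(f"{n:5d}  {thread}")
```

Run against a longer slice of manifoldcf.log (for example by passing the lines of an open file to `count_by_thread`), a tally in which only the stuffer threads appear would match the "nothing to do" pattern described in the thread. It says nothing by itself about why no documents are being queued.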
