If your startup script starts *all* the mcf processes, you can do that. Otherwise, it would be a bad idea.
Zookeeper is resilient against this problem, so you can also switch to that.

Karl

On Mon, Aug 17, 2015 at 5:09 PM, Roman Šitina <[email protected]> wrote:

> Thank you very much, that helped!
>
> Is it ok to put a lockclean call in our startup script just to make
> sure? And is it worth going for the Zookeeper version?
>
> Thanks again
> Roman
>
> On 17 August 2015 at 22:55, Karl Wright <[email protected]> wrote:
> > I would try executing the lock clean procedure. Shut down all ManifoldCF
> > processes and web applications, then run the LockClean script, then start
> > them back up again. If you have shut any processes down with kill -9, then
> > you may have locks hanging around.
> >
> > Karl
> >
> >
> > On Mon, Aug 17, 2015 at 4:34 PM, Roman Šitina <[email protected]> wrote:
> >>
> >> It is a multiprocess setup with file synchronisation.
> >>
> >> I can see reprioritisation in logs and after a while all I can see are
> >> these logs cycling:
> >>
> >> DEBUG 2015-08-17 20:27:19,980 (Expire stuffer thread) - org.apache.manifoldcf.crawlerthreads - Expiration stuffer thread woke up
> >> DEBUG 2015-08-17 20:27:19,981 (Expire stuffer thread) - org.apache.manifoldcf.perf - Beginning query to look for documents to expire
> >> DEBUG 2015-08-17 20:27:19,981 (Expire stuffer thread) - org.apache.manifoldcf.perf - Attempt 1 to expire documents, after 0 ms
> >> DEBUG 2015-08-17 20:27:19,983 (Expire stuffer thread) - org.apache.manifoldcf.perf - Expiring 0 documents
> >> DEBUG 2015-08-17 20:27:19,984 (Expire stuffer thread) - org.apache.manifoldcf.crawlerthreads - Expiration stuffer thread: Found 0 documents to expire
> >> DEBUG 2015-08-17 20:27:19,996 (Expire stuffer thread) - org.apache.manifoldcf.crawlerthreads - Expiration stuffer thread woke up
> >> DEBUG 2015-08-17 20:27:19,996 (Expire stuffer thread) - org.apache.manifoldcf.perf - Beginning query to look for documents to expire
> >> DEBUG 2015-08-17 20:27:19,997 (Expire stuffer thread) - org.apache.manifoldcf.perf - Attempt 1 to expire documents, after 1 ms
> >> DEBUG 2015-08-17 20:27:19,999 (Expire stuffer thread) - org.apache.manifoldcf.perf - Expiring 0 documents
> >> DEBUG 2015-08-17 20:27:19,999 (Expire stuffer thread) - org.apache.manifoldcf.crawlerthreads - Expiration stuffer thread: Found 0 documents to expire
> >> DEBUG 2015-08-17 20:27:20,077 (Document cleanup stuffer thread) - org.apache.manifoldcf.crawlerthreads - Document cleanup stuffer thread woke up
> >> DEBUG 2015-08-17 20:27:20,077 (Document delete stuffer thread) - org.apache.manifoldcf.crawlerthreads - Document delete stuffer thread woke up
> >> DEBUG 2015-08-17 20:27:20,078 (Document cleanup stuffer thread) - org.apache.manifoldcf.crawlerthreads - Document cleanup stuffer thread found nothing to do
> >> DEBUG 2015-08-17 20:27:20,078 (Document delete stuffer thread) - org.apache.manifoldcf.crawlerthreads - Document delete stuffer thread found nothing to do
> >> DEBUG 2015-08-17 20:27:20,083 (Document delete stuffer thread) - org.apache.manifoldcf.crawlerthreads - Document delete stuffer thread woke up
> >> DEBUG 2015-08-17 20:27:20,083 (Document cleanup stuffer thread) - org.apache.manifoldcf.crawlerthreads - Document cleanup stuffer thread woke up
> >> DEBUG 2015-08-17 20:27:20,084 (Document delete stuffer thread) - org.apache.manifoldcf.crawlerthreads - Document delete stuffer thread found nothing to do
> >> DEBUG 2015-08-17 20:27:20,084 (Document cleanup stuffer thread) - org.apache.manifoldcf.crawlerthreads - Document cleanup stuffer thread found nothing to do
> >> DEBUG 2015-08-17 20:27:21,078 (Document cleanup stuffer thread) - org.apache.manifoldcf.crawlerthreads - Document cleanup stuffer thread woke up
> >>
> >>
> >> On 17 August 2015 at 21:29, Karl Wright <[email protected]> wrote:
> >> > 2.1 does do background reprioritization. If you want to see that
> >> > occurring in the log, you would need to add the following in your
> >> > properties.xml file:
> >> >
> >> > <property name="org.apache.manifoldcf.scheduling" value="DEBUG"/>
> >> >
> >> > Can I have more information? Specifically, is this a multiprocess
> >> > setup? And if so, is this zookeeper or file system synchronization?
> >> >
> >> > Karl
> >> >
> >> >
> >> > On Mon, Aug 17, 2015 at 2:57 PM, Roman Šitina <[email protected]> wrote:
> >> >>
> >> >> Hello Karl,
> >> >>
> >> >> thanks for your quick reply!
> >> >>
> >> >> The version is 2.1. I tried to get detailed logging by setting
> >> >> log4j.rootLogger=INFO, MAIN in logging.ini but that did not help -
> >> >> only WARN level was still logging after restart.
> >> >>
> >> >> Roman
> >> >>
> >> >> On 17 August 2015 at 20:35, Karl Wright <[email protected]> wrote:
> >> >> > Hi Roman,
> >> >> >
> >> >> > ManifoldCF needs to reprioritize documents whenever you pause or
> >> >> > restart jobs. For jobs with large numbers of documents, the total
> >> >> > amount of work involved in this is significant. But, depending on
> >> >> > the precise ManifoldCF version you are using, the reprioritization
> >> >> > typically continues in background while MCF runs your job.
> >> >> >
> >> >> > Can you tell me more about what version of MCF you are trying here?
> >> >> >
> >> >> > Karl
> >> >> >
> >> >> >
> >> >> > On Mon, Aug 17, 2015 at 2:13 PM, Roman Šitina <[email protected]> wrote:
> >> >> >>
> >> >> >> Hello,
> >> >> >>
> >> >> >> I have a ManifoldCF setup based on multiprocess-file-example which
> >> >> >> is backed by PostgreSQL.
> >> >> >>
> >> >> >> I have created a connection from Documentum to ElasticSearch with
> >> >> >> about 300 000 documents. I was able to crawl several thousand
> >> >> >> documents so the connection is working properly.
> >> >> >>
> >> >> >> What I'm not sure about is that when I pause or stop the job and
> >> >> >> then run it again it takes a while and it looks like ManifoldCF is
> >> >> >> doing nothing (30 minutes). After that time I usually try to
> >> >> >> restart all processes.
> >> >> >>
> >> >> >> I looked at all logs - manifoldcf.log, documentum-registry,
> >> >> >> documentum-server and DFC itself but I can't find any relevant
> >> >> >> information.
> >> >> >>
> >> >> >> Can you help me figure out what is the best way to monitor
> >> >> >> progress of jobs that look to be not progressing?
> >> >> >>
> >> >> >> Thank you very much
> >> >> >> Roman
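
For anyone wiring the lock clean step into a startup script, here is a rough Python sketch of the ordering discussed in this thread: confirm that no ManifoldCF process is running, clean the locks, and only then start every process. It is an illustration under assumptions, not something shipped with ManifoldCF. The names clean_locks_then_start and shell_step are hypothetical, and the check for running processes and the script paths are left to the caller because they depend on the installation.

```python
import subprocess
from typing import Callable, Iterable


def clean_locks_then_start(
    any_mcf_running: Callable[[], bool],
    run_lock_clean: Callable[[], None],
    starters: Iterable[Callable[[], None]],
) -> None:
    """Clean stale locks, then start every MCF process, in that order.

    All three arguments are hypothetical hooks supplied by the caller.
    """
    # Cleaning locks while any MCF process is alive is the "bad idea" case,
    # so refuse before touching anything.
    if any_mcf_running():
        raise RuntimeError(
            "ManifoldCF processes are still running; not cleaning locks"
        )
    run_lock_clean()
    for start in starters:
        start()


def shell_step(script_path: str) -> Callable[[], None]:
    """Wrap a script path (an assumption about your install) as a step.

    The returned step runs the script and waits for it to exit.
    """
    def step() -> None:
        subprocess.run([script_path], check=True)
    return step
```

A step built with shell_step waits for the script to exit, which suits a lock clean script. A start script that stays in the foreground would need a starter that launches it in the background instead, for example with subprocess.Popen.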
