It is a multiprocess setup with file synchronisation.

I can see reprioritisation in the logs, and after a while all I can see are
these entries cycling:

DEBUG 2015-08-17 20:27:19,980 (Expire stuffer thread) - org.apache.manifoldcf.crawlerthreads - Expiration stuffer thread woke up
DEBUG 2015-08-17 20:27:19,981 (Expire stuffer thread) - org.apache.manifoldcf.perf - Beginning query to look for documents to expire
DEBUG 2015-08-17 20:27:19,981 (Expire stuffer thread) - org.apache.manifoldcf.perf -  Attempt 1 to expire documents, after 0 ms
DEBUG 2015-08-17 20:27:19,983 (Expire stuffer thread) - org.apache.manifoldcf.perf -  Expiring 0 documents
DEBUG 2015-08-17 20:27:19,984 (Expire stuffer thread) - org.apache.manifoldcf.crawlerthreads - Expiration stuffer thread: Found 0 documents to expire
DEBUG 2015-08-17 20:27:19,996 (Expire stuffer thread) - org.apache.manifoldcf.crawlerthreads - Expiration stuffer thread woke up
DEBUG 2015-08-17 20:27:19,996 (Expire stuffer thread) - org.apache.manifoldcf.perf - Beginning query to look for documents to expire
DEBUG 2015-08-17 20:27:19,997 (Expire stuffer thread) - org.apache.manifoldcf.perf -  Attempt 1 to expire documents, after 1 ms
DEBUG 2015-08-17 20:27:19,999 (Expire stuffer thread) - org.apache.manifoldcf.perf -  Expiring 0 documents
DEBUG 2015-08-17 20:27:19,999 (Expire stuffer thread) - org.apache.manifoldcf.crawlerthreads - Expiration stuffer thread: Found 0 documents to expire
DEBUG 2015-08-17 20:27:20,077 (Document cleanup stuffer thread) - org.apache.manifoldcf.crawlerthreads - Document cleanup stuffer thread woke up
DEBUG 2015-08-17 20:27:20,077 (Document delete stuffer thread) - org.apache.manifoldcf.crawlerthreads - Document delete stuffer thread woke up
DEBUG 2015-08-17 20:27:20,078 (Document cleanup stuffer thread) - org.apache.manifoldcf.crawlerthreads - Document cleanup stuffer thread found nothing to do
DEBUG 2015-08-17 20:27:20,078 (Document delete stuffer thread) - org.apache.manifoldcf.crawlerthreads - Document delete stuffer thread found nothing to do
DEBUG 2015-08-17 20:27:20,083 (Document delete stuffer thread) - org.apache.manifoldcf.crawlerthreads - Document delete stuffer thread woke up
DEBUG 2015-08-17 20:27:20,083 (Document cleanup stuffer thread) - org.apache.manifoldcf.crawlerthreads - Document cleanup stuffer thread woke up
DEBUG 2015-08-17 20:27:20,084 (Document delete stuffer thread) - org.apache.manifoldcf.crawlerthreads - Document delete stuffer thread found nothing to do
DEBUG 2015-08-17 20:27:20,084 (Document cleanup stuffer thread) - org.apache.manifoldcf.crawlerthreads - Document cleanup stuffer thread found nothing to do
DEBUG 2015-08-17 20:27:21,078 (Document cleanup stuffer thread) - org.apache.manifoldcf.crawlerthreads - Document cleanup stuffer thread woke up



On 17 August 2015 at 21:29, Karl Wright <[email protected]> wrote:
> 2.1 does do background reprioritization.  If you want to see that occurring
> in the log, you would need to add the following in your properties.xml file:
>
> <property name="org.apache.manifoldcf.scheduling" value="DEBUG"/>
>
> Can I have more information? Specifically, is this a multiprocess setup,
> and if so, does it use ZooKeeper or file-system synchronization?
>
> Karl
>
>
> On Mon, Aug 17, 2015 at 2:57 PM, Roman Šitina <[email protected]> wrote:
>>
>> Hello Karl,
>>
>> thanks for your quick reply!
>>
>> The version is 2.1. I tried to get more detailed logging by setting
>> log4j.rootLogger=INFO, MAIN in logging.ini, but that did not help:
>> only WARN-level messages were still logged after the restart.
>>
>> Roman
>>
>> On 17 August 2015 at 20:35, Karl Wright <[email protected]> wrote:
>> > Hi Roman,
>> >
>> > ManifoldCF needs to reprioritize documents whenever you pause or restart
>> > jobs. For jobs with large numbers of documents, the total amount of work
>> > involved is significant. However, depending on the precise ManifoldCF
>> > version you are using, the reprioritization typically continues in the
>> > background while MCF runs your job.
>> >
>> > Can you tell me more about what version of MCF you are trying here?
>> >
>> > Karl
>> >
>> >
>> > On Mon, Aug 17, 2015 at 2:13 PM, Roman Šitina <[email protected]> wrote:
>> >>
>> >> Hello,
>> >>
>> >> I have a ManifoldCF setup based on the multiprocess-file-example,
>> >> backed by PostgreSQL.
>> >>
>> >> I have created a connection from Documentum to ElasticSearch with
>> >> about 300 000 documents. I was able to crawl several thousand
>> >> documents, so the connection is working properly.
>> >>
>> >> What puzzles me is that when I pause or stop the job and then run it
>> >> again, it takes a long time (30 minutes) during which ManifoldCF
>> >> appears to be doing nothing. After that I usually try restarting all
>> >> processes.
>> >>
>> >> I looked at all the logs (manifoldcf.log, documentum-registry,
>> >> documentum-server and DFC itself), but I can't find any relevant
>> >> information.
>> >>
>> >> Can you help me figure out the best way to monitor the progress of
>> >> jobs that appear not to be progressing?
>> >>
>> >> Thank you very much
>> >> Roman
>> >
>> >
>
>
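Karl's suggestion earlier in the thread, adding `<property name="org.apache.manifoldcf.scheduling" value="DEBUG"/>` to properties.xml, can be scripted. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`. It assumes that `property` elements sit directly under the file's root element, as the one-line snippet suggests. The helper name `ensure_property` is hypothetical. The helper adds the property once, or updates its value if the property is already there.

```python
import xml.etree.ElementTree as ET


def ensure_property(xml_text, name, value):
    """Return the XML text with exactly one <property name=... value=...>.

    Assumes <property> elements are direct children of the root element.
    Note: ElementTree's default parser drops XML comments and the
    XML declaration, so write the result to a copy, not over the original.
    """
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.get("name") == name:
            prop.set("value", value)  # already present: only update the value
            break
    else:
        # Not present yet: append a new <property .../> under the root.
        ET.SubElement(root, "property", {"name": name, "value": value})
    return ET.tostring(root, encoding="unicode")
```

In practice you would read properties.xml into a string, call `ensure_property(text, "org.apache.manifoldcf.scheduling", "DEBUG")`, and save the result as a new file for review. Because the helper replaces an existing entry instead of appending a duplicate, running it twice is harmless.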
