Hi Erlend,

How many worker threads are you using?  About how many documents do you
crawl before things hang?

You may also want to try increasing the maxClientCnxns parameter in
zookeeper.cfg if you have a lot of worker threads.  I'm thinking 1000 or
so.  See if it makes a difference for you.
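
As a rough sketch, the relevant line in zookeeper.cfg would look something
like the following. The value 1000 is only the ballpark figure mentioned
above, not a tuned recommendation.

```properties
# zookeeper.cfg (fragment)
# Limit on concurrent connections from a single client IP address to one
# Zookeeper server. The value here is illustrative; 0 removes the limit.
maxClientCnxns=1000
```

Zookeeper needs to be restarted for the change to take effect.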

I'll try a large crawl here using Zookeeper also, but it would be good to
know your parameters before I begin.

Karl


On Tue, Sep 16, 2014 at 7:21 AM, lalit jangra <[email protected]>
wrote:

> Hello,
>
> To keep Zookeeper from taking too much disk space, use the parameters
> below. They purge old snapshots and transaction logs that are no longer
> needed.
>
> autopurge.snapRetainCount=3 (the default value)
> autopurge.purgeInterval=1 (in hours; the default of 0 disables purging)
>
> Feel free to adjust these as needed.
>
> Regards.
>
> On Tue, Sep 16, 2014 at 3:46 PM, Karl Wright <[email protected]> wrote:
>
>> Hi Erlend,
>>
>> The zookeeper configuration supplied will likely fill up your disk with
>> zookeeper synch data, because the parameters that control the cleanup of
>> that data are not properly set up for long-term execution.
>>
>> Graeme Seaton would be the best resource for using Zookeeper properly;
>> he's on this list and I've cc'd him directly as well.
>>
>> Karl
>>
>>
>> On Tue, Sep 16, 2014 at 5:06 AM, Erlend Garåsen <[email protected]>
>> wrote:
>>
>>> On 16.09.14 10:53, lalit jangra wrote:
>>>
>>>> Hi Erlend,
>>>>
>>>> Can you please elaborate on how you have configured Zookeeper-based
>>>> synchronization? Is it in standalone mode or clustered mode? How many
>>>> Zookeeper nodes and how many agents are you running?
>>>>
>>>
>>> I'm not very familiar with Zookeeper, so I have just followed the
>>> examples inside the multiprocess-zk-example folder, i.e.:
>>>         $MCF_HOME/../runzookeeper.sh > /dev/null 2>&1 &
>>>         # Reading global properties:
>>>         $MCF_HOME/../setglobalproperties.sh > /dev/null 2>&1 &
>>>         # Starting Agent process:
>>>         $MCF_HOME/processes/executecommand.sh org.apache.manifoldcf.agents.AgentRun \
>>>         1>>$LOGDIR/mcf_agent.stdout.log 2>>$LOGDIR/mcf_agent.stderr.log & pid=$!
>>>
>>> The above lines are from my startup script. I see now that I haven't
>>> specified "-Dorg.apache.manifoldcf.processid=A". I'm not sure this is
>>> important, but I can of course try to include it in my script and
>>> restart everything.
>>>
>>> So to the question about how many zookeeper nodes I'm using, the answer
>>> is one. The same applies to the number of running agents.
>>>
>>> Erlend
>>>
>>
>>
>
>
> --
> Regards,
> Lalit.
>
