Hi Erlend,

How many worker threads are you using? About how many documents do you crawl before things hang?
You may also want to try increasing the maxClientCnxns parameter in zookeeper.cfg to something bigger if you have a lot of worker threads. I'm thinking 1000 or some such. See if it makes a difference for you.

I'll try a large crawl here using Zookeeper also, but it would be good to know your parameters before I begin.

Karl

On Tue, Sep 16, 2014 at 7:21 AM, lalit jangra <[email protected]> wrote:

> Hello,
>
> To restrain zookeeper from taking too much disk space, use below
> parameters. These will help to purge extra data one may not need.
>
> autopurge.snapRetainCount=3 : default value
> autopurge.purgeInterval=1: default value
>
> Feel free to update as per needs.
>
> Regards.
>
> On Tue, Sep 16, 2014 at 3:46 PM, Karl Wright <[email protected]> wrote:
>
>> Hi Erlend,
>>
>> The zookeeper configuration supplied will likely fill up your disk with
>> zookeeper synch data, because the parameters that control the cleanup of
>> that data are not properly set up for long-term execution.
>>
>> Graeme Seaton would be the best resource for using Zookeeper properly;
>> he's on this list and I've cc'd him directly as well.
>>
>> Karl
>>
>>
>> On Tue, Sep 16, 2014 at 5:06 AM, Erlend Garåsen <[email protected]>
>> wrote:
>>
>>> On 16.09.14 10:53, lalit jangra wrote:
>>>
>>>> Hi Erlend,
>>>>
>>>> Can you please elaborate on how you have configured zookeeper based
>>>> synchronization, is it in stand alone mode or clustered mode? How many
>>>> zookeeper nodes are you running for each of node and how many agents are
>>>> you running?
>>>>
>>>
>>> I'm not very familiar with Zookeeper, so I have just followed the
>>> examples inside the multiprocess-zk-example folder, i.e.:
>>> $MCF_HOME/../runzookeeper.sh > /dev/null 2>&1 &
>>> # Reading global properties:
>>> $MCF_HOME/../setglobalproperties.sh > /dev/null 2>&1 &
>>> # Starting Agent process:
>>> $MCF_HOME/processes/executecommand.sh
>>> org.apache.manifoldcf.agents.AgentRun \
>>> 1>>$LOGDIR/mcf_agent.stdout.log 2>>$LOGDIR/mcf_agent.stderr.log
>>> & pid=$!
>>>
>>> The above lines are from my startup script. I see now that I haven't
>>> specified "-Dorg.apache.manifoldcf.processid=A", I'm not sure this is
>>> important, but I can of course try to include that into my script and
>>> restart everything.
>>>
>>> So to the question about how many zookeeper nodes I'm using, the answer
>>> is one. The same applies to the number of running agents.
>>>
>>> Erlend
>>>
>>
>>
>
>
> --
> Regards,
> Lalit.
>
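
For reference, the ZooKeeper settings mentioned in this thread could be collected in zookeeper.cfg roughly as follows. This is only a sketch: 1000 for maxClientCnxns is the ballpark figure suggested above rather than a tuned value, the autopurge values are the ones Lalit quoted, and none of this has been tested against this particular crawl. Any other settings already in the file are assumed to stay as they are.

```
# Sketch of additions to zookeeper.cfg; values come from this thread and are untested.

# Allow more concurrent connections per client host. Only worth raising
# if there are a lot of worker threads; 1000 is a rough guess.
maxClientCnxns=1000

# Keep only the 3 most recent snapshots (and their transaction logs)
# when the purge task runs.
autopurge.snapRetainCount=3

# Run the purge task every 1 hour. Set explicitly here so that cleanup
# does not depend on whatever default the installed version uses.
autopurge.purgeInterval=1
```

ZooKeeper needs a restart to pick up changes to this file.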
