Thanks Karl,

The crawls do keep running, but they are very slow and take a long time, which is frustrating. As I have multiple high-volume jobs running in parallel, this is not workable for me.
I have also raised it on the ZooKeeper forums at
http://zookeeper-user.578899.n2.nabble.com/Getting-errors-in-zookeeper-logs-td7580260.html
and am waiting for a reply.

Regards.

On Mon, Sep 15, 2014 at 6:51 PM, Karl Wright <[email protected]> wrote:

> Hi Lalit,
>
> When MCF cannot reach zookeeper, MCF crawls will pause until the zookeeper
> connections are reestablished. Then the crawls should resume. This should
> *not* abort your crawls, but it will make them very slow.
>
> I am not a zookeeper expert, so I would post on their message boards to
> see if there is any adjustment that can be made to zookeeper parameters
> that would improve zookeeper behavior when you have a flaky network.
> However, since the obvious solution is to fix your network, they may not
> have a code solution for you.
>
> Thanks,
> Karl
>
>
> On Mon, Sep 15, 2014 at 9:15 AM, lalit jangra <[email protected]>
> wrote:
>
>> Thanks Karl,
>>
>> Ideally resetting connections should be taken care by zookeeper itself as
>> i could see re-establishment of connections later in logs.
>>
>> Can you suggest any way to overcome this in addition to network issue
>> resolution as my crawls are not working again and again? Anything in config
>> files etc.?
>>
>> Regards.
>>
>>
>> On Mon, Sep 15, 2014 at 6:39 PM, Karl Wright <[email protected]> wrote:
>>
>>> Hi Lalit,
>>>
>>> Zookeeper will keep working, but you should understand that you are
>>> dropping connections to your zookeeper members for unknown reasons, which
>>> is causing your crawl to stall when it happens. This argues that perhaps
>>> you have some network flakiness of some kind.
>>>
>>> Karl
>>>
>>>
>>> On Mon, Sep 15, 2014 at 8:59 AM, lalit jangra <[email protected]>
>>> wrote:
>>>
>>>>
>>>> Hi,
>>>>
>>>> I am running cluster of two Apache ManifoldCF nodes on two separate
>>>> machines each of which having 3 zookeeper instances (total 6 instances in
>>>> cluster). When i am running up manifoldCF agents, i see below warning
>>>> during startup.
>>>>
>>>> [http-bio-80-exec-2-SendThread(iwdc1preecma03.iwater.ie:2181)] INFO
>>>> org.apache.zookeeper.ClientCnxn - Unable to read additional data from
>>>> server sessionid 0x0, likely server has closed socket, closing socket
>>>> connection and attempting reconnect
>>>>
>>>> [http-bio-80-exec-2-SendThread(iwdc2preecma04.iwater.ie:2182)] INFO
>>>> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
>>>> iwdc2preecma04.iwater.ie/10.231.72.25:2182. Will not attempt to
>>>> authenticate using SASL (unknown error)
>>>>
>>>>
>>>> Also i could see below error in logs in while agents are running.
>>>>
>>>> [http-bio-80-exec-2] INFO org.apache.zookeeper.ZooKeeper - Initiating
>>>> client connection,
>>>> connectString=iwdc1preecma03:2181,iwdc1preecma03:2182,iwdc1preecma03:2183,iwdc2preecma04:2181,iwdc2preecma04:2182,iwdc2preecma04:2183
>>>> sessionTimeout=4000
>>>> watcher=org.apache.manifoldcf.core.lockmanager.ZooKeeperConnection$ZooKeeperWatcher@51d83fd7
>>>>
>>>> [http-bio-80-exec-2-SendThread(iwdc2preecma04.iwater.ie:2182)] INFO
>>>> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
>>>> iwdc2preecma04.iwater.ie/10.231.72.25:2182. Will not attempt to
>>>> authenticate using SASL (unknown error)
>>>>
>>>> [http-bio-80-exec-2-SendThread(iwdc2preecma04.iwater.ie:2182)] INFO
>>>> org.apache.zookeeper.ClientCnxn - Socket connection established to
>>>> iwdc2preecma04.iwater.ie/10.231.72.25:2182, initiating session
>>>>
>>>> [http-bio-80-exec-2-SendThread(iwdc2preecma04.iwater.ie:2182)] WARN
>>>> org.apache.zookeeper.ClientCnxn - Session 0x0 for server
>>>> iwdc2preecma04.iwater.ie/10.231.72.25:2182, unexpected error, closing
>>>> socket connection and attempting reconnect
>>>>
>>>> java.io.IOException: Connection reset by peer
>>>>     at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>>>>     at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>>>>     at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:225)
>>>>     at sun.nio.ch.IOUtil.read(IOUtil.java:193)
>>>>     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:375)
>>>>     at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:68)
>>>>     at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:355)
>>>>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
>>>>
>>>> [http-bio-80-exec-2-SendThread(iwdc2preecma04.iwater.ie:2183)] INFO
>>>> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
>>>> iwdc2preecma04.iwater.ie/10.231.72.25:2183. Will not attempt to
>>>> authenticate using SASL (unknown error)
>>>>
>>>> [http-bio-80-exec-2-SendThread(iwdc2preecma04.iwater.ie:2183)] INFO
>>>> org.apache.zookeeper.ClientCnxn - Socket connection established to
>>>> iwdc2preecma04.iwater.ie/10.231.72.25:2183, initiating session
>>>>
>>>> [http-bio-80-exec-2-SendThread(iwdc2preecma04.iwater.ie:2183)] INFO
>>>> org.apache.zookeeper.ClientCnxn - Session establishment complete on server
>>>> iwdc2preecma04.iwater.ie/10.231.72.25:2183, sessionid =
>>>> 0x6487851bd330078, negotiated timeout = 4000
>>>>
>>>>
>>>> Below are configurations for 1. zookeeper nodes & 2. MCF nodes for
>>>> zookeeper.
>>>>
>>>>
>>>> *zoo.cfg : Same for all six zookeeper nodes.*
>>>>
>>>>
>>>> # The number of milliseconds of each tick
>>>> tickTime=2000
>>>> dataDir=/app/IW/zookeeper/data/data.1
>>>> dataLogDir=/app/IW/zookeeper/logs/log.1
>>>> clientPort=2181
>>>> server.1=iwdc1preecma03:2888:3888
>>>> server.2=iwdc1preecma03:2889:3889
>>>> server.3=iwdc1preecma03:2890:3890
>>>> server.4=iwdc2preecma04:2891:3891
>>>> server.5=iwdc2preecma04:2892:3892
>>>> server.6=iwdc2preecma04:2893:3893
>>>> # The number of ticks that the initial
>>>> # synchronization phase can take
>>>> initLimit=10
>>>> # The number of ticks that can pass between
>>>> # sending a request and getting an acknowledgement
>>>> syncLimit=5
>>>> # the directory where the snapshot is stored.
>>>> # do not use /tmp for storage, /tmp here is just
>>>> # example sakes.
>>>> #dataDir=/tmp/zookeeper
>>>> # the port at which the clients will connect
>>>> #clientPort=2181
>>>> # the maximum number of client connections.
>>>> # increase this if you need to handle more clients
>>>> #maxClientCnxns=60
>>>> #
>>>> # Be sure to read the maintenance section of the
>>>> # administrator guide before turning on autopurge.
>>>> #
>>>> # http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
>>>> #
>>>> # The number of snapshots to retain in dataDir
>>>> autopurge.snapRetainCount=3
>>>> # Purge task interval in hours
>>>> # Set to "0" to disable auto purge feature
>>>> autopurge.purgeInterval=1
>>>>
>>>>
>>>> *ManifoldCF configurations : same for both ManifoldCF nodes.*
>>>>
>>>>
>>>> <property name="org.apache.manifoldcf.lockmanagerclass"
>>>> value="org.apache.manifoldcf.core.lockmanager.ZooKeeperLockManager"/>
>>>>
>>>> <property name="org.apache.manifoldcf.zookeeper.connectstring"
>>>> value="iwdc1preecma03:2181,iwdc1preecma03:2182,iwdc1preecma03:2183,iwdc2preecma04:2181,iwdc2preecma04:2182,iwdc2preecma04:2183"/>
>>>>
>>>> <property name="org.apache.manifoldcf.zookeeper.sessiontimeout"
>>>> value="4000"/>
>>>>
>>>>
>>>> *I want to know if due to above warnings/errors, will zookeeper stop
>>>> working or will zookeeper will work and these are non-failing messages,
>>>> because ManifoldCF jobs are stuck while i can see these errors.*
>>>>
>>>> Please suggest.
>>>>
>>>> Regards,
>>>> Lalit.
>>>>
>>>>
>>>
>>
>>
>> --
>> Regards,
>> Lalit.
>>
>
>

--
Regards,
Lalit.
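For reference, here is a rough Python sketch of two sanity checks on the settings quoted above. It is not taken from ZooKeeper or ManifoldCF, and all function names are made up for illustration. It assumes ZooKeeper's documented defaults (minSessionTimeout = 2 x tickTime, maxSessionTimeout = 20 x tickTime, neither overridden in the zoo.cfg shown) and a plain majority quorum.

```python
# Hypothetical helpers for checking the quoted zoo.cfg / ManifoldCF values.
# Assumes default minSessionTimeout (2 * tickTime) and maxSessionTimeout
# (20 * tickTime), and a simple majority quorum.


def session_timeout_bounds(tick_time_ms):
    """Return the default (min, max) session timeout in milliseconds."""
    return 2 * tick_time_ms, 20 * tick_time_ms


def negotiated_timeout(requested_ms, tick_time_ms):
    """Clamp the client's requested timeout into the default server bounds."""
    low, high = session_timeout_bounds(tick_time_ms)
    return max(low, min(requested_ms, high))


def quorum_size(ensemble_size):
    """Smallest strict majority of the ensemble."""
    return ensemble_size // 2 + 1


def survives_host_loss(servers_per_host, lost_host):
    """True if the servers on the remaining hosts still form a quorum."""
    total = sum(servers_per_host.values())
    remaining = total - servers_per_host[lost_host]
    return remaining >= quorum_size(total)


if __name__ == "__main__":
    # Values taken from the thread: tickTime=2000, sessiontimeout=4000,
    # three ZooKeeper instances on each of two machines.
    layout = {"iwdc1preecma03": 3, "iwdc2preecma04": 3}
    print("timeout bounds (ms):", session_timeout_bounds(2000))
    print("negotiated timeout (ms):", negotiated_timeout(4000, 2000))
    print("quorum size:", quorum_size(sum(layout.values())))
    print("survives loss of one machine:",
          survives_host_loss(layout, "iwdc2preecma04"))
```

Under those assumptions, the requested 4000 ms is the lowest session timeout the servers would grant with tickTime=2000, which matches the "negotiated timeout = 4000" in the log, so a network interruption of a few seconds may be enough to expire a session. A larger sessiontimeout value within the computed bounds might be worth trying, though that has not been tested here. The sketch also suggests that a six-member ensemble split three and three across two machines cannot keep a quorum if either machine becomes unreachable.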
