Hi Lalit,

When MCF cannot reach zookeeper, MCF crawls will pause until the zookeeper connections are reestablished. Then the crawls should resume. This should *not* abort your crawls, but it will make them very slow.

I am not a zookeeper expert, so I would post on their message boards to see if there is any adjustment that can be made to zookeeper parameters that would improve zookeeper behavior when you have a flaky network. However, since the obvious solution is to fix your network, they may not have a code solution for you.

Thanks,
Karl

On Mon, Sep 15, 2014 at 9:15 AM, lalit jangra <[email protected]> wrote:

> Thanks Karl,
>
> Ideally resetting connections should be taken care by zookeeper itself as
> i could see re-establishment of connections later in logs.
>
> Can you suggest any way to overcome this in addition to network issue
> resolution as my crawls are not working again and again? Anything in config
> files etc.?
>
> Regards.
>
>
> On Mon, Sep 15, 2014 at 6:39 PM, Karl Wright <[email protected]> wrote:
>
>> Hi Lalit,
>>
>> Zookeeper will keep working, but you should understand that you are
>> dropping connections to your zookeeper members for unknown reasons, which
>> is causing your crawl to stall when it happens. This argues that perhaps
>> you have some network flakiness of some kind.
>>
>> Karl
>>
>>
>> On Mon, Sep 15, 2014 at 8:59 AM, lalit jangra <[email protected]>
>> wrote:
>>
>>> Hi,
>>>
>>> I am running cluster of two Apache ManifoldCF nodes on two separate
>>> machines each of which having 3 zookeeper instances (total 6 instances in
>>> cluster). When i am running up manifoldCF agents, i see below warning
>>> during startup.
>>>
>>> [http-bio-80-exec-2-SendThread(iwdc1preecma03.iwater.ie:2181)] INFO
>>> org.apache.zookeeper.ClientCnxn - Unable to read additional data from
>>> server sessionid 0x0, likely server has closed socket, closing socket
>>> connection and attempting reconnect
>>>
>>> [http-bio-80-exec-2-SendThread(iwdc2preecma04.iwater.ie:2182)] INFO
>>> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
>>> iwdc2preecma04.iwater.ie/10.231.72.25:2182. Will not attempt to
>>> authenticate using SASL (unknown error)
>>>
>>>
>>> Also i could see below error in logs in while agents are running.
>>>
>>> [http-bio-80-exec-2] INFO org.apache.zookeeper.ZooKeeper - Initiating
>>> client connection,
>>> connectString=iwdc1preecma03:2181,iwdc1preecma03:2182,iwdc1preecma03:2183,iwdc2preecma04:2181,iwdc2preecma04:2182,iwdc2preecma04:2183
>>> sessionTimeout=4000
>>> watcher=org.apache.manifoldcf.core.lockmanager.ZooKeeperConnection$ZooKeeperWatcher@51d83fd7
>>>
>>> [http-bio-80-exec-2-SendThread(iwdc2preecma04.iwater.ie:2182)] INFO
>>> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
>>> iwdc2preecma04.iwater.ie/10.231.72.25:2182. Will not attempt to
>>> authenticate using SASL (unknown error)
>>>
>>> [http-bio-80-exec-2-SendThread(iwdc2preecma04.iwater.ie:2182)] INFO
>>> org.apache.zookeeper.ClientCnxn - Socket connection established to
>>> iwdc2preecma04.iwater.ie/10.231.72.25:2182, initiating session
>>>
>>> [http-bio-80-exec-2-SendThread(iwdc2preecma04.iwater.ie:2182)] WARN
>>> org.apache.zookeeper.ClientCnxn - Session 0x0 for server
>>> iwdc2preecma04.iwater.ie/10.231.72.25:2182, unexpected error, closing
>>> socket connection and attempting reconnect
>>>
>>> java.io.IOException: Connection reset by peer
>>> at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>>> at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>>> at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:225)
>>> at sun.nio.ch.IOUtil.read(IOUtil.java:193)
>>> at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:375)
>>> at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:68)
>>> at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:355)
>>> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
>>>
>>> [http-bio-80-exec-2-SendThread(iwdc2preecma04.iwater.ie:2183)] INFO
>>> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
>>> iwdc2preecma04.iwater.ie/10.231.72.25:2183. Will not attempt to
>>> authenticate using SASL (unknown error)
>>>
>>> [http-bio-80-exec-2-SendThread(iwdc2preecma04.iwater.ie:2183)] INFO
>>> org.apache.zookeeper.ClientCnxn - Socket connection established to
>>> iwdc2preecma04.iwater.ie/10.231.72.25:2183, initiating session
>>>
>>> [http-bio-80-exec-2-SendThread(iwdc2preecma04.iwater.ie:2183)] INFO
>>> org.apache.zookeeper.ClientCnxn - Session establishment complete on server
>>> iwdc2preecma04.iwater.ie/10.231.72.25:2183, sessionid =
>>> 0x6487851bd330078, negotiated timeout = 4000
>>>
>>>
>>> Below are configurations for 1. zookeeper nodes & 2. MCF nodes for
>>> zookeeper.
>>>
>>>
>>> *zoo.cfg : Same for all six zookeeper nodes.*
>>>
>>> # The number of milliseconds of each tick
>>> tickTime=2000
>>> dataDir=/app/IW/zookeeper/data/data.1
>>> dataLogDir=/app/IW/zookeeper/logs/log.1
>>> clientPort=2181
>>> server.1=iwdc1preecma03:2888:3888
>>> server.2=iwdc1preecma03:2889:3889
>>> server.3=iwdc1preecma03:2890:3890
>>> server.4=iwdc2preecma04:2891:3891
>>> server.5=iwdc2preecma04:2892:3892
>>> server.6=iwdc2preecma04:2893:3893
>>> # The number of ticks that the initial
>>> # synchronization phase can take
>>> initLimit=10
>>> # The number of ticks that can pass between
>>> # sending a request and getting an acknowledgement
>>> syncLimit=5
>>> # the directory where the snapshot is stored.
>>> # do not use /tmp for storage, /tmp here is just
>>> # example sakes.
>>> #dataDir=/tmp/zookeeper
>>> # the port at which the clients will connect
>>> #clientPort=2181
>>> # the maximum number of client connections.
>>> # increase this if you need to handle more clients
>>> #maxClientCnxns=60
>>> #
>>> # Be sure to read the maintenance section of the
>>> # administrator guide before turning on autopurge.
>>> #
>>> # http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
>>> #
>>> # The number of snapshots to retain in dataDir
>>> autopurge.snapRetainCount=3
>>> # Purge task interval in hours
>>> # Set to "0" to disable auto purge feature
>>> autopurge.purgeInterval=1
>>>
>>>
>>> *ManifoldCF configurations : same for both ManifoldCF nodes.*
>>>
>>> <property name="org.apache.manifoldcf.lockmanagerclass"
>>> value="org.apache.manifoldcf.core.lockmanager.ZooKeeperLockManager"/>
>>>
>>> <property name="org.apache.manifoldcf.zookeeper.connectstring"
>>> value="iwdc1preecma03:2181,iwdc1preecma03:2182,iwdc1preecma03:2183,iwdc2preecma04:2181,iwdc2preecma04:2182,iwdc2preecma04:2183"/>
>>>
>>> <property name="org.apache.manifoldcf.zookeeper.sessiontimeout"
>>> value="4000"/>
>>>
>>>
>>> *I want to know if due to above warnings/errors, will zookeeper stop
>>> working or will zookeeper will work and these are non-failing messages,
>>> because ManifoldCF jobs are stuck while i can see these errors.*
>>>
>>> Please suggest.
>>>
>>> Regards,
>>> Lalit.
>>>
>>
>
>
> --
> Regards,
> Lalit.
>
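
For reference, the session timeout in the configuration quoted in this thread can be checked against ZooKeeper's default bounds. By default a server only negotiates session timeouts between 2 x tickTime and 20 x tickTime, so with tickTime=2000 the configured 4000 ms sits exactly at the minimum. The sketch below only illustrates that arithmetic. The function names are hypothetical, it assumes minSessionTimeout and maxSessionTimeout are left at their defaults, and whether a larger timeout would actually help on a flaky network is an untested assumption; fixing the network remains the obvious solution.

```python
# Sketch: clamp a requested ZooKeeper session timeout to the server's
# default bounds (2 x tickTime .. 20 x tickTime).
# Function names are hypothetical, not part of ZooKeeper or ManifoldCF.

def session_timeout_bounds(tick_time_ms, min_ms=None, max_ms=None):
    """Return (min, max) session timeout in ms; defaults derive from tickTime."""
    lo = 2 * tick_time_ms if min_ms is None else min_ms
    hi = 20 * tick_time_ms if max_ms is None else max_ms
    return lo, hi


def negotiated_timeout(requested_ms, tick_time_ms):
    """Clamp the client's requested timeout into the default server bounds."""
    lo, hi = session_timeout_bounds(tick_time_ms)
    return max(lo, min(requested_ms, hi))


# Values from the thread: tickTime=2000, sessiontimeout=4000.
print(session_timeout_bounds(2000))     # (4000, 40000)
print(negotiated_timeout(4000, 2000))   # 4000, matching "negotiated timeout = 4000" in the log
print(negotiated_timeout(60000, 2000))  # 40000, capped at 20 x tickTime
```

Under these assumptions, any value set in org.apache.manifoldcf.zookeeper.sessiontimeout above 40000 would be capped by the servers unless maxSessionTimeout is also raised in zoo.cfg.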
