Thanks Karl,

After updating the configuration, I am still hitting the same ZooKeeper connection reset issue.
I tried to assign memory to the ZooKeeper instances but could not find a way to do so. Can you suggest one? What else can I do?

Regards.

On Mon, Sep 15, 2014 at 10:39 PM, Karl Wright <[email protected]> wrote:

> Hi Lalit,
>
> If you have more than one unspecified Java process, EACH ONE will allocate
> 25% of available memory by default.  So you will have to do more than just
> free up some MCF memory to get this to work.
>
> Karl
>
>
> On Mon, Sep 15, 2014 at 12:29 PM, lalit jangra <[email protected]>
> wrote:
>
>> Thanks Karl,
>>
>> I think this is the reason why my ZooKeeper nodes are resetting
>> connections due to instability. What I will try in the meantime is to reduce
>> MCF memory to 1.5G and leave the rest unassigned, so that will leave 5.5G for
>> Java itself, more than the 25% rule, and see if it works.
>>
>> I also checked the ZooKeeper documentation but could not take any specific
>> inputs from it.
>>
>> Regards.
>>
>> On Mon, Sep 15, 2014 at 9:52 PM, Karl Wright <[email protected]> wrote:
>>
>>> Hi Lalit,
>>>
>>> I can't speak for Solr's memory consumption, but you absolutely need to
>>> give Solr enough memory to avoid OOM errors or things will not work
>>> properly.
>>>
>>> As for MCF, 3G is more than enough; probably you could give it 1G and be
>>> fine.
>>>
>>> For Zookeeper, remember that it is a Java process.  On 64-bit unix
>>> machines, Java by default takes 25% of the total system memory.  I would
>>> look at their documentation to figure out what they need, and assign
>>> precisely that amount, otherwise zk will obviously not be stable.
>>>
>>> Thanks,
>>> Karl
>>>
>>>
>>> On Mon, Sep 15, 2014 at 12:17 PM, lalit jangra <[email protected]>
>>> wrote:
>>>
>>>> Hi Karl,
>>>>
>>>> Out of 12G, I have assigned 5G to Solr, as I could see a lot of Out of
>>>> Memory errors/Java heap space issues while crawling large jobs, after which
>>>> it seems to be OK. I have also assigned 3G to MCF, where it is quite
>>>> comfortable. The remaining 4G, I am assuming, is enough for the OS and the
>>>> ZooKeeper nodes. I am currently running a job for 35K documents and I can
>>>> see more than 500MB of memory free.
>>>>
>>>> Any thoughts?
>>>>
>>>> Regards.
>>>>
>>>> On Mon, Sep 15, 2014 at 8:45 PM, Karl Wright <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi Lalit,
>>>>>
>>>>> The best way in Java to assess memory usage is to turn on JVM garbage
>>>>> collection verbose output.  Then you can see how often the system garbage
>>>>> collects etc, and whether post-GC usage grows over time.
>>>>>
>>>>> 12G should be more than enough, so if you find you are running into
>>>>> memory limits with that configuration, it would be worth trying to figure
>>>>> out why that is happening.
>>>>>
>>>>> Karl
>>>>>
>>>>>
>>>>> On Mon, Sep 15, 2014 at 11:04 AM, lalit jangra <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Hi Karl,
>>>>>>
>>>>>> Could I be seeing ZooKeeper connection reset messages because the system
>>>>>> is running close to its memory limit? I have 12G of RAM and can see it
>>>>>> using 11.5G while the job is running.
>>>>>>
>>>>>> Is there any way I should assign memory to the ZooKeeper nodes and, if
>>>>>> so, is there any yardstick?
>>>>>>
>>>>>> Regards.
>>>>>>
>>>>>> On Mon, Sep 15, 2014 at 7:16 PM, Karl Wright <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Lalit,
>>>>>>>
>>>>>>> Looks like this is the result of a tomcat shutdown, and is a
>>>>>>> probable race condition bug in Zookeeper:
>>>>>>>
>>>>>>> http://mail-archives.apache.org/mod_mbox/tomcat-users/201306.mbox/%[email protected]%3E
>>>>>>>
>>>>>>> Karl
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Sep 15, 2014 at 9:41 AM, lalit jangra <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Karl,
>>>>>>>>
>>>>>>>> Along with this, I can see the errors below in tomcat catalina.out.
>>>>>>>>
>>>>>>>> Sep 15, 2014 1:06:14 PM org.apache.catalina.loader.WebappClassLoader loadClass
>>>>>>>> INFO: Illegal access: this web application instance has been stopped already.  Could not load org.apache.zookeeper.server.ZooTrace.  The eventual following stack trace is caused by an error thrown for debugging purposes as well as to attempt to terminate the thread which caused the illegal access, and has no functional impact.
>>>>>>>> java.lang.IllegalStateException
>>>>>>>>         at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1612)
>>>>>>>>         at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1571)
>>>>>>>>         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1115)
>>>>>>>>
>>>>>>>> [http-bio-80-exec-1-SendThread(iwdc2preecma04.iwater.ie:2183)] ERROR org.apache.zookeeper.ClientCnxn - from http-bio-80-exec-1-SendThread(iwdc2preecma04.iwater.ie:2183)
>>>>>>>> java.lang.NoClassDefFoundError: org/apache/zookeeper/server/ZooTrace
>>>>>>>>         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1115)
>>>>>>>> Caused by: java.lang.ClassNotFoundException: org.apache.zookeeper.server.ZooTrace
>>>>>>>>         at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1720)
>>>>>>>>         at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1571)
>>>>>>>>         ... 1 more
>>>>>>>>
>>>>>>>> [http-bio-80-exec-1-SendThread(iwdc2preecma04.iwater.ie:2182)] ERROR org.apache.zookeeper.ClientCnxn - from http-bio-80-exec-1-SendThread(iwdc2preecma04.iwater.ie:2182)
>>>>>>>>
>>>>>>>> Sep 15, 2014 1:06:14 PM org.apache.coyote.AbstractProtocol destroy
>>>>>>>> INFO: Destroying ProtocolHandler ["http-bio-80"]
>>>>>>>>
>>>>>>>> java.lang.NoClassDefFoundError: org/apache/zookeeper/server/ZooTrace
>>>>>>>>
>>>>>>>> Regards.
>>>>>>>>
>>>>>>>> On Mon, Sep 15, 2014 at 7:05 PM, lalit jangra <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>> Thanks Karl,
>>>>>>>>>
>>>>>>>>> Crawling is very slow and taking a long time, which is a bit
>>>>>>>>> frustrating; as I have multiple high-volume jobs, running in parallel
>>>>>>>>> at that, this does not seem to be a good thing.
>>>>>>>>>
>>>>>>>>> I have also raised it on the ZooKeeper forums @
>>>>>>>>> http://zookeeper-user.578899.n2.nabble.com/Getting-errors-in-zookeeper-logs-td7580260.html
>>>>>>>>> but am still waiting for a reply.
>>>>>>>>>
>>>>>>>>> Regards.
>>>>>>>>>
>>>>>>>>> On Mon, Sep 15, 2014 at 6:51 PM, Karl Wright <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Lalit,
>>>>>>>>>>
>>>>>>>>>> When MCF cannot reach zookeeper, MCF crawls will pause until the
>>>>>>>>>> zookeeper connections are reestablished.  Then the crawls should
>>>>>>>>>> resume.  This should *not* abort your crawls, but it will make them
>>>>>>>>>> very slow.
>>>>>>>>>>
>>>>>>>>>> I am not a zookeeper expert, so I would post on their message
>>>>>>>>>> boards to see if there is any adjustment that can be made to
>>>>>>>>>> zookeeper parameters that would improve zookeeper behavior when you
>>>>>>>>>> have a flaky network.  However, since the obvious solution is to fix
>>>>>>>>>> your network, they may not have a code solution for you.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Karl
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Sep 15, 2014 at 9:15 AM, lalit jangra <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks Karl,
>>>>>>>>>>>
>>>>>>>>>>> Ideally, connection resets should be taken care of by ZooKeeper
>>>>>>>>>>> itself, as I can see connections being re-established later in the
>>>>>>>>>>> logs.
>>>>>>>>>>>
>>>>>>>>>>> Can you suggest any way to overcome this, in addition to resolving
>>>>>>>>>>> the network issue, as my crawls keep failing again and again?
>>>>>>>>>>> Anything in config files etc.?
>>>>>>>>>>>
>>>>>>>>>>> Regards.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Sep 15, 2014 at 6:39 PM, Karl Wright <[email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Lalit,
>>>>>>>>>>>>
>>>>>>>>>>>> Zookeeper will keep working, but you should understand that you
>>>>>>>>>>>> are dropping connections to your zookeeper members for unknown
>>>>>>>>>>>> reasons, which is causing your crawl to stall when it happens.
>>>>>>>>>>>> This argues that perhaps you have some network flakiness of some
>>>>>>>>>>>> kind.
>>>>>>>>>>>>
>>>>>>>>>>>> Karl
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Sep 15, 2014 at 8:59 AM, lalit jangra <
>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I am running a cluster of two Apache ManifoldCF nodes on two
>>>>>>>>>>>>> separate machines, each of which has 3 ZooKeeper instances
>>>>>>>>>>>>> (6 instances in total in the cluster). When I start up the
>>>>>>>>>>>>> ManifoldCF agents, I see the warning below during startup.
>>>>>>>>>>>>>
>>>>>>>>>>>>> [http-bio-80-exec-2-SendThread(iwdc1preecma03.iwater.ie:2181)] INFO org.apache.zookeeper.ClientCnxn - Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
>>>>>>>>>>>>>
>>>>>>>>>>>>> [http-bio-80-exec-2-SendThread(iwdc2preecma04.iwater.ie:2182)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server iwdc2preecma04.iwater.ie/10.231.72.25:2182. Will not attempt to authenticate using SASL (unknown error)
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> I can also see the error below in the logs while the agents are
>>>>>>>>>>>>> running.
>>>>>>>>>>>>>
>>>>>>>>>>>>> [http-bio-80-exec-2] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=iwdc1preecma03:2181,iwdc1preecma03:2182,iwdc1preecma03:2183,iwdc2preecma04:2181,iwdc2preecma04:2182,iwdc2preecma04:2183 sessionTimeout=4000 watcher=org.apache.manifoldcf.core.lockmanager.ZooKeeperConnection$ZooKeeperWatcher@51d83fd7
>>>>>>>>>>>>>
>>>>>>>>>>>>> [http-bio-80-exec-2-SendThread(iwdc2preecma04.iwater.ie:2182)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server iwdc2preecma04.iwater.ie/10.231.72.25:2182. Will not attempt to authenticate using SASL (unknown error)
>>>>>>>>>>>>>
>>>>>>>>>>>>> [http-bio-80-exec-2-SendThread(iwdc2preecma04.iwater.ie:2182)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to iwdc2preecma04.iwater.ie/10.231.72.25:2182, initiating session
>>>>>>>>>>>>>
>>>>>>>>>>>>> [http-bio-80-exec-2-SendThread(iwdc2preecma04.iwater.ie:2182)] WARN org.apache.zookeeper.ClientCnxn - Session 0x0 for server iwdc2preecma04.iwater.ie/10.231.72.25:2182, unexpected error, closing socket connection and attempting reconnect
>>>>>>>>>>>>> java.io.IOException: Connection reset by peer
>>>>>>>>>>>>>         at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>>>>>>>>>>>>>         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>>>>>>>>>>>>>         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:225)
>>>>>>>>>>>>>         at sun.nio.ch.IOUtil.read(IOUtil.java:193)
>>>>>>>>>>>>>         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:375)
>>>>>>>>>>>>>         at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:68)
>>>>>>>>>>>>>         at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:355)
>>>>>>>>>>>>>         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
>>>>>>>>>>>>>
>>>>>>>>>>>>> [http-bio-80-exec-2-SendThread(iwdc2preecma04.iwater.ie:2183)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server iwdc2preecma04.iwater.ie/10.231.72.25:2183. Will not attempt to authenticate using SASL (unknown error)
>>>>>>>>>>>>>
>>>>>>>>>>>>> [http-bio-80-exec-2-SendThread(iwdc2preecma04.iwater.ie:2183)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to iwdc2preecma04.iwater.ie/10.231.72.25:2183, initiating session
>>>>>>>>>>>>>
>>>>>>>>>>>>> [http-bio-80-exec-2-SendThread(iwdc2preecma04.iwater.ie:2183)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server iwdc2preecma04.iwater.ie/10.231.72.25:2183, sessionid = 0x6487851bd330078, negotiated timeout = 4000
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Below are the configurations for (1) the ZooKeeper nodes and
>>>>>>>>>>>>> (2) the MCF nodes' ZooKeeper settings.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> *zoo.cfg : Same for all six zookeeper nodes.*
>>>>>>>>>>>>>
>>>>>>>>>>>>> # The number of milliseconds of each tick
>>>>>>>>>>>>> tickTime=2000
>>>>>>>>>>>>> dataDir=/app/IW/zookeeper/data/data.1
>>>>>>>>>>>>> dataLogDir=/app/IW/zookeeper/logs/log.1
>>>>>>>>>>>>> clientPort=2181
>>>>>>>>>>>>> server.1=iwdc1preecma03:2888:3888
>>>>>>>>>>>>> server.2=iwdc1preecma03:2889:3889
>>>>>>>>>>>>> server.3=iwdc1preecma03:2890:3890
>>>>>>>>>>>>> server.4=iwdc2preecma04:2891:3891
>>>>>>>>>>>>> server.5=iwdc2preecma04:2892:3892
>>>>>>>>>>>>> server.6=iwdc2preecma04:2893:3893
>>>>>>>>>>>>> # The number of ticks that the initial
>>>>>>>>>>>>> # synchronization phase can take
>>>>>>>>>>>>> initLimit=10
>>>>>>>>>>>>> # The number of ticks that can pass between
>>>>>>>>>>>>> # sending a request and getting an acknowledgement
>>>>>>>>>>>>> syncLimit=5
>>>>>>>>>>>>> # the directory where the snapshot is stored.
>>>>>>>>>>>>> # do not use /tmp for storage, /tmp here is just
>>>>>>>>>>>>> # example sakes.
>>>>>>>>>>>>> #dataDir=/tmp/zookeeper
>>>>>>>>>>>>> # the port at which the clients will connect
>>>>>>>>>>>>> #clientPort=2181
>>>>>>>>>>>>> # the maximum number of client connections.
>>>>>>>>>>>>> # increase this if you need to handle more clients
>>>>>>>>>>>>> #maxClientCnxns=60
>>>>>>>>>>>>> #
>>>>>>>>>>>>> # Be sure to read the maintenance section of the
>>>>>>>>>>>>> # administrator guide before turning on autopurge.
>>>>>>>>>>>>> #
>>>>>>>>>>>>> # http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
>>>>>>>>>>>>> #
>>>>>>>>>>>>> # The number of snapshots to retain in dataDir
>>>>>>>>>>>>> autopurge.snapRetainCount=3
>>>>>>>>>>>>> # Purge task interval in hours
>>>>>>>>>>>>> # Set to "0" to disable auto purge feature
>>>>>>>>>>>>> autopurge.purgeInterval=1
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> *ManifoldCF configurations : same for both ManifoldCF nodes.*
>>>>>>>>>>>>>
>>>>>>>>>>>>> <property name="org.apache.manifoldcf.lockmanagerclass" value="org.apache.manifoldcf.core.lockmanager.ZooKeeperLockManager"/>
>>>>>>>>>>>>> <property name="org.apache.manifoldcf.zookeeper.connectstring" value="iwdc1preecma03:2181,iwdc1preecma03:2182,iwdc1preecma03:2183,iwdc2preecma04:2181,iwdc2preecma04:2182,iwdc2preecma04:2183"/>
>>>>>>>>>>>>> <property name="org.apache.manifoldcf.zookeeper.sessiontimeout" value="4000"/>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> *I want to know whether, given the above warnings/errors,
>>>>>>>>>>>>> ZooKeeper will stop working, or whether it will keep working and
>>>>>>>>>>>>> these are non-fatal messages, because ManifoldCF jobs are stuck
>>>>>>>>>>>>> while I see these errors.*
>>>>>>>>>>>>>
>>>>>>>>>>>>> Please suggest.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>> Lalit.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Regards,
>>>>>>>>>>> Lalit.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Regards,
>>>>>>>>> Lalit.
>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Regards,
>>>>>>>> Lalit.
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Regards,
>>>>>> Lalit.
>>>>>>
>>>>>
>>>>
>>>> --
>>>> Regards,
>>>> Lalit.
>>>>
>>>
>>
>> --
>> Regards,
>> Lalit.
>>
>

--
Regards,
Lalit.
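
Note on the memory arithmetic discussed in this thread: the sketch below is only an illustration, not something taken from ManifoldCF or ZooKeeper. It prints the maximum heap the current JVM would use (when no explicit cap such as the standard `-Xmx` option is given, HotSpot commonly picks about a quarter of physical RAM as the ceiling) and adds up a per-process heap budget for the 12G machine. The class and method names (`HeapBudget`, `toMiB`, `remainingMiB`) are hypothetical, and the 512M figure per ZooKeeper instance is an assumed placeholder, not a recommendation from this thread. Heap caps also understate real process size, since each JVM uses non-heap memory on top.

```java
public class HeapBudget {

    /** Converts a byte count to whole mebibytes. */
    public static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    /**
     * Subtracts explicit per-process heap caps (in MiB) from total RAM (in MiB)
     * and returns what would be left for the OS and non-heap JVM memory.
     */
    public static long remainingMiB(long totalMiB, long... heapCapsMiB) {
        long used = 0;
        for (long cap : heapCapsMiB) {
            used += cap;
        }
        return totalMiB - used;
    }

    public static void main(String[] args) {
        // Maximum heap this JVM will attempt to use; reflects -Xmx if given,
        // otherwise the JVM's own default.
        long maxHeapBytes = Runtime.getRuntime().maxMemory();
        System.out.println("Max heap for this JVM: " + toMiB(maxHeapBytes) + " MiB");

        // Hypothetical budget for a 12G host: Solr 5G, MCF 1G,
        // and three ZooKeeper JVMs at an ASSUMED 512M each.
        long left = remainingMiB(12 * 1024L, 5 * 1024L, 1024L, 512L, 512L, 512L);
        System.out.println("Left for OS and non-heap memory: " + left + " MiB");
    }
}
```

Running the class once with no options and once with an explicit `-Xmx` value shows whether a given process is relying on the default ceiling; how to pass that option to the ZooKeeper start script depends on the ZooKeeper version and is not shown here.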
