Thanks Karl, I think this is why my ZooKeeper nodes are resetting connections: the machine is short on memory and the nodes become unstable. In the meantime I will reduce MCF memory to 1.5G and leave the rest unassigned, which leaves 5.5G for Java itself, more than the 25% rule requires, and see if it works.
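To spell out the arithmetic behind that plan, here is a minimal sketch in Python. The 12G total and the 5G Solr heap come from earlier in this thread; the function name is just for illustration, and reading the 25% rule as "a JVM started without an explicit heap size may grow to a quarter of physical RAM" is my assumption.

```python
def unassigned_memory_gb(total_gb, assigned_gb):
    """Return the memory left over after the explicitly sized processes."""
    return total_gb - sum(assigned_gb.values())


TOTAL_GB = 12.0                       # physical RAM on the machine (from this thread)
assigned = {"solr": 5.0, "mcf": 1.5}  # planned explicit heap sizes

left_gb = unassigned_memory_gb(TOTAL_GB, assigned)

# Assumption: one JVM without an explicit heap size is capped at 25% of RAM.
default_heap_cap_gb = TOTAL_GB * 0.25

print(f"unassigned: {left_gb} GB")
print(f"default heap cap for one unsized JVM: {default_heap_cap_gb} GB")
print("room for one unsized JVM:", left_gb >= default_heap_cap_gb)
```

Each machine runs three ZooKeeper instances, so three unsized JVMs could together ask for more than what is left. That is why giving each instance an explicit size, as Karl suggested, still looks worth doing.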
I also checked the ZooKeeper documentation but could not find any specific guidance in it.

Regards.

On Mon, Sep 15, 2014 at 9:52 PM, Karl Wright <[email protected]> wrote:

> Hi Lalit,
>
> I can't speak for Solr's memory consumption, but you absolutely need to
> give Solr enough memory to avoid OOM errors or things will not work
> properly.
>
> As for MCF, 3G is more than enough; probably you could give it 1G and be
> fine.
>
> For Zookeeper, remember that it is a Java process. On 64-bit unix
> machines, Java by default takes 25% of the total system memory. I would
> look at their documentation to figure out what they need, and assign
> precisely that amount, otherwise zk will obviously not be stable.
>
> Thanks,
> Karl
>
> On Mon, Sep 15, 2014 at 12:17 PM, lalit jangra <[email protected]> wrote:
>
>> Hi Karl,
>>
>> Out of 12G, i have assigned 5G to solr as i could see a lot of Out of
>> Memory errors/Java heap space issues while crawling large jobs,after which
>> it seems to be OK. Also i have assigned 3G to MCF where it is quire
>> comfortable. In rest of 4G, i am assuming is enough for OS & zookeeper
>> nodes. I am currently running job for 35K documents & i could see more than
>> 500MB memory free.
>>
>> Any thoughts?
>>
>> Regards.
>>
>> On Mon, Sep 15, 2014 at 8:45 PM, Karl Wright <[email protected]> wrote:
>>
>>> HI Lalit,
>>>
>>> The best way in Java to assess memory usage is to turn on JVM garbage
>>> collection verbose output. Then you can see how often the system garbage
>>> collects etc, and whether post-GC usage grows over time.
>>>
>>> 12G should be more than enough, so if you find you are running into
>>> memory limits with that configuration, it would be worth trying to figure
>>> out why that is happening.
>>>
>>> Karl
>>>
>>> On Mon, Sep 15, 2014 at 11:04 AM, lalit jangra <[email protected]> wrote:
>>>
>>>> Hi Karl,
>>>>
>>>> Can i see zookeeper connection reset messages due to system running on
>>>> top of memory limits as i have 12G of RAM and can see its using 11.5G while
>>>> job is running?
>>>>
>>>> Is there any way i should ascertain memory to zookeeper nodes & if so,
>>>> is there any yardstick?
>>>>
>>>> Regards.
>>>>
>>>> On Mon, Sep 15, 2014 at 7:16 PM, Karl Wright <[email protected]> wrote:
>>>>
>>>>> Hi Lalit,
>>>>>
>>>>> Looks like this is the result of a tomcat shutdown, and is a probable
>>>>> race condition bug in Zookeeper:
>>>>>
>>>>> http://mail-archives.apache.org/mod_mbox/tomcat-users/201306.mbox/%[email protected]%3E
>>>>>
>>>>> Karl
>>>>>
>>>>> On Mon, Sep 15, 2014 at 9:41 AM, lalit jangra <[email protected]> wrote:
>>>>>
>>>>>> Hi Karl,
>>>>>>
>>>>>> Along with this, i could see below errors in tomcat catalina.out.
>>>>>>
>>>>>> Sep 15, 2014 1:06:14 PM org.apache.catalina.loader.WebappClassLoader loadClass
>>>>>>
>>>>>> INFO: Illegal access: this web application instance has been stopped
>>>>>> already. Could not load org.apache.zookeeper.server.ZooTrace. The
>>>>>> eventual following stack trace is caused by an error thrown for debugging
>>>>>> purposes as well as to attempt to terminate the thread which caused the
>>>>>> illegal access, and has no functional impact.
>>>>>>
>>>>>> java.lang.IllegalStateException
>>>>>>     at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1612)
>>>>>>     at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1571)
>>>>>>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1115)
>>>>>>
>>>>>> [http-bio-80-exec-1-SendThread(iwdc2preecma04.iwater.ie:2183)] ERROR org.apache.zookeeper.ClientCnxn - from http-bio-80-exec-1-SendThread(iwdc2preecma04.iwater.ie:2183)
>>>>>>
>>>>>> java.lang.NoClassDefFoundError: org/apache/zookeeper/server/ZooTrace
>>>>>>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1115)
>>>>>> Caused by: java.lang.ClassNotFoundException: org.apache.zookeeper.server.ZooTrace
>>>>>>     at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1720)
>>>>>>     at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1571)
>>>>>>     ... 1 more
>>>>>>
>>>>>> [http-bio-80-exec-1-SendThread(iwdc2preecma04.iwater.ie:2182)] ERROR org.apache.zookeeper.ClientCnxn - from http-bio-80-exec-1-SendThread(iwdc2preecma04.iwater.ie:2182)
>>>>>>
>>>>>> Sep 15, 2014 1:06:14 PM org.apache.coyote.AbstractProtocol destroy
>>>>>>
>>>>>> INFO: Destroying ProtocolHandler ["http-bio-80"]
>>>>>>
>>>>>> java.lang.NoClassDefFoundError: org/apache/zookeeper/server/ZooTrace
>>>>>>
>>>>>> Regards.
>>>>>>
>>>>>> On Mon, Sep 15, 2014 at 7:05 PM, lalit jangra <[email protected]> wrote:
>>>>>>
>>>>>>> Thanks Karl,
>>>>>>>
>>>>>>> While crawling is very slow, its taking long so a bit of frustrating
>>>>>>> and as i have multiple high volume jobs that too in parallel, it does not
>>>>>>> seem to be a good thing.
>>>>>>>
>>>>>>> I have also raised it on Zookeeper forums @
>>>>>>> http://zookeeper-user.578899.n2.nabble.com/Getting-errors-in-zookeeper-logs-td7580260.html
>>>>>>> but waiting for reply.
>>>>>>>
>>>>>>> Regards.
>>>>>>>
>>>>>>> On Mon, Sep 15, 2014 at 6:51 PM, Karl Wright <[email protected]> wrote:
>>>>>>>
>>>>>>>> HI Lalit,
>>>>>>>>
>>>>>>>> When MCF cannot reach zookeeper, MCF crawls will pause until the
>>>>>>>> zookeeper connections are reestablished. Then the crawls should resume.
>>>>>>>> This should *not* abort your crawls, but it will make them very slow.
>>>>>>>>
>>>>>>>> I am not a zookeeper expert, so I would post on their message
>>>>>>>> boards to see if there is any adjustment that can be made to zookeeper
>>>>>>>> parameters that would improve zookeeper behavior when you have a flaky
>>>>>>>> network. However, since the obvious solution is to fix your network, they
>>>>>>>> may not have a code solution for you.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Karl
>>>>>>>>
>>>>>>>> On Mon, Sep 15, 2014 at 9:15 AM, lalit jangra <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Thanks Karl,
>>>>>>>>>
>>>>>>>>> Ideally resetting connections should be taken care by zookeeper
>>>>>>>>> itself as i could see re-establishment of connections later in logs.
>>>>>>>>>
>>>>>>>>> Can you suggest any way to overcome this in addition to network
>>>>>>>>> issue resolution as my crawls are not working again and again? Anything in
>>>>>>>>> config files etc.?
>>>>>>>>>
>>>>>>>>> Regards.
>>>>>>>>>
>>>>>>>>> On Mon, Sep 15, 2014 at 6:39 PM, Karl Wright <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Lalit,
>>>>>>>>>>
>>>>>>>>>> Zookeeper will keep working, but you should understand that you
>>>>>>>>>> are dropping connections to your zookeeper members for unknown reasons,
>>>>>>>>>> which is causing your crawl to stall when it happens.
>>>>>>>>>> This argues that perhaps you have some network flakiness of some kind.
>>>>>>>>>>
>>>>>>>>>> Karl
>>>>>>>>>>
>>>>>>>>>> On Mon, Sep 15, 2014 at 8:59 AM, lalit jangra <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I am running cluster of two Apache ManifoldCF nodes on two
>>>>>>>>>>> separate machines each of which having 3 zookeeper instances (total 6
>>>>>>>>>>> instances in cluster). When i am running up manifoldCF agents, i see below
>>>>>>>>>>> warning during startup.
>>>>>>>>>>>
>>>>>>>>>>> [http-bio-80-exec-2-SendThread(iwdc1preecma03.iwater.ie:2181)] INFO org.apache.zookeeper.ClientCnxn - Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
>>>>>>>>>>>
>>>>>>>>>>> [http-bio-80-exec-2-SendThread(iwdc2preecma04.iwater.ie:2182)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server iwdc2preecma04.iwater.ie/10.231.72.25:2182. Will not attempt to authenticate using SASL (unknown error)
>>>>>>>>>>>
>>>>>>>>>>> Also i could see below error in logs in while agents are running.
>>>>>>>>>>>
>>>>>>>>>>> [http-bio-80-exec-2] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=iwdc1preecma03:2181,iwdc1preecma03:2182,iwdc1preecma03:2183,iwdc2preecma04:2181,iwdc2preecma04:2182,iwdc2preecma04:2183 sessionTimeout=4000 watcher=org.apache.manifoldcf.core.lockmanager.ZooKeeperConnection$ZooKeeperWatcher@51d83fd7
>>>>>>>>>>>
>>>>>>>>>>> [http-bio-80-exec-2-SendThread(iwdc2preecma04.iwater.ie:2182)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server iwdc2preecma04.iwater.ie/10.231.72.25:2182.
>>>>>>>>>>> Will not attempt to authenticate using SASL (unknown error)
>>>>>>>>>>>
>>>>>>>>>>> [http-bio-80-exec-2-SendThread(iwdc2preecma04.iwater.ie:2182)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to iwdc2preecma04.iwater.ie/10.231.72.25:2182, initiating session
>>>>>>>>>>>
>>>>>>>>>>> [http-bio-80-exec-2-SendThread(iwdc2preecma04.iwater.ie:2182)] WARN org.apache.zookeeper.ClientCnxn - Session 0x0 for server iwdc2preecma04.iwater.ie/10.231.72.25:2182, unexpected error, closing socket connection and attempting reconnect
>>>>>>>>>>>
>>>>>>>>>>> java.io.IOException: Connection reset by peer
>>>>>>>>>>>     at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>>>>>>>>>>>     at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>>>>>>>>>>>     at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:225)
>>>>>>>>>>>     at sun.nio.ch.IOUtil.read(IOUtil.java:193)
>>>>>>>>>>>     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:375)
>>>>>>>>>>>     at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:68)
>>>>>>>>>>>     at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:355)
>>>>>>>>>>>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
>>>>>>>>>>>
>>>>>>>>>>> [http-bio-80-exec-2-SendThread(iwdc2preecma04.iwater.ie:2183)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server iwdc2preecma04.iwater.ie/10.231.72.25:2183.
>>>>>>>>>>> Will not attempt to authenticate using SASL (unknown error)
>>>>>>>>>>>
>>>>>>>>>>> [http-bio-80-exec-2-SendThread(iwdc2preecma04.iwater.ie:2183)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to iwdc2preecma04.iwater.ie/10.231.72.25:2183, initiating session
>>>>>>>>>>>
>>>>>>>>>>> [http-bio-80-exec-2-SendThread(iwdc2preecma04.iwater.ie:2183)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server iwdc2preecma04.iwater.ie/10.231.72.25:2183, sessionid = 0x6487851bd330078, negotiated timeout = 4000
>>>>>>>>>>>
>>>>>>>>>>> Below are configurations for 1. zookeeper nodes & 2. MCF nodes
>>>>>>>>>>> for zookeeper.
>>>>>>>>>>>
>>>>>>>>>>> *zoo.cfg : Same for all six zookeeper nodes.*
>>>>>>>>>>>
>>>>>>>>>>> # The number of milliseconds of each tick
>>>>>>>>>>> tickTime=2000
>>>>>>>>>>> dataDir=/app/IW/zookeeper/data/data.1
>>>>>>>>>>> dataLogDir=/app/IW/zookeeper/logs/log.1
>>>>>>>>>>> clientPort=2181
>>>>>>>>>>> server.1=iwdc1preecma03:2888:3888
>>>>>>>>>>> server.2=iwdc1preecma03:2889:3889
>>>>>>>>>>> server.3=iwdc1preecma03:2890:3890
>>>>>>>>>>> server.4=iwdc2preecma04:2891:3891
>>>>>>>>>>> server.5=iwdc2preecma04:2892:3892
>>>>>>>>>>> server.6=iwdc2preecma04:2893:3893
>>>>>>>>>>> # The number of ticks that the initial
>>>>>>>>>>> # synchronization phase can take
>>>>>>>>>>> initLimit=10
>>>>>>>>>>> # The number of ticks that can pass between
>>>>>>>>>>> # sending a request and getting an acknowledgement
>>>>>>>>>>> syncLimit=5
>>>>>>>>>>> # the directory where the snapshot is stored.
>>>>>>>>>>> # do not use /tmp for storage, /tmp here is just
>>>>>>>>>>> # example sakes.
>>>>>>>>>>> #dataDir=/tmp/zookeeper
>>>>>>>>>>> # the port at which the clients will connect
>>>>>>>>>>> #clientPort=2181
>>>>>>>>>>> # the maximum number of client connections.
>>>>>>>>>>> # increase this if you need to handle more clients
>>>>>>>>>>> #maxClientCnxns=60
>>>>>>>>>>> #
>>>>>>>>>>> # Be sure to read the maintenance section of the
>>>>>>>>>>> # administrator guide before turning on autopurge.
>>>>>>>>>>> #
>>>>>>>>>>> # http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
>>>>>>>>>>> #
>>>>>>>>>>> # The number of snapshots to retain in dataDir
>>>>>>>>>>> autopurge.snapRetainCount=3
>>>>>>>>>>> # Purge task interval in hours
>>>>>>>>>>> # Set to "0" to disable auto purge feature
>>>>>>>>>>> autopurge.purgeInterval=1
>>>>>>>>>>>
>>>>>>>>>>> *ManifoldCF configurations : same for both ManifoldCF nodes.*
>>>>>>>>>>>
>>>>>>>>>>> <property name="org.apache.manifoldcf.lockmanagerclass" value="org.apache.manifoldcf.core.lockmanager.ZooKeeperLockManager"/>
>>>>>>>>>>> <property name="org.apache.manifoldcf.zookeeper.connectstring" value="iwdc1preecma03:2181,iwdc1preecma03:2182,iwdc1preecma03:2183,iwdc2preecma04:2181,iwdc2preecma04:2182,iwdc2preecma04:2183"/>
>>>>>>>>>>> <property name="org.apache.manifoldcf.zookeeper.sessiontimeout" value="4000"/>
>>>>>>>>>>>
>>>>>>>>>>> *I want to know if due to above warnings/errors, will zookeeper
>>>>>>>>>>> stop working or will zookeeper will work and these are non-failing
>>>>>>>>>>> messages, because ManifoldCF jobs are stuck while i can see these
>>>>>>>>>>> errors.*
>>>>>>>>>>>
>>>>>>>>>>> Please suggest.
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Lalit.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Regards,
>>>>>>>>> Lalit.
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Regards,
>>>>>>> Lalit.
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Regards,
>>>>>> Lalit.
>>>>>>
>>>>>
>>>>
>>>> --
>>>> Regards,
>>>> Lalit.
>>>>
>>>
>>
>> --
>> Regards,
>> Lalit.
>>
>

--
Regards,
Lalit.
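
P.S. One more observation on the sessiontimeout value of 4000 quoted above. As far as I understand the ZooKeeper admin guide, a server by default clamps the timeout a client requests to between 2 and 20 times tickTime, so with tickTime=2000 a request of 4000 sits exactly at the minimum. That matches the "negotiated timeout = 4000" in the log. Below is a minimal sketch in Python of that clamping; the function name and parameters are mine for illustration, not part of any ZooKeeper API.

```python
def negotiated_timeout_ms(requested_ms, tick_time_ms, min_ticks=2, max_ticks=20):
    """Clamp a requested session timeout to the server's allowed range.

    Assumes the default bounds of 2*tickTime and 20*tickTime; a server can
    override them with minSessionTimeout / maxSessionTimeout in zoo.cfg.
    """
    lower = min_ticks * tick_time_ms
    upper = max_ticks * tick_time_ms
    return max(lower, min(requested_ms, upper))


TICK_TIME_MS = 2000  # tickTime from the zoo.cfg quoted in this thread

print(negotiated_timeout_ms(4000, TICK_TIME_MS))   # the value currently configured
print(negotiated_timeout_ms(30000, TICK_TIME_MS))  # a larger, hypothetical request
print(negotiated_timeout_ms(1000, TICK_TIME_MS))   # below the minimum, gets raised
```

A larger value in org.apache.manifoldcf.zookeeper.sessiontimeout might make sessions more tolerant of short pauses, but that is an untested guess on my side, not something confirmed in this thread.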
