Hi Karl,

Out of 12G, I have assigned 5G to Solr, because I was seeing a lot of out-of-memory errors (Java heap space) while crawling large jobs; since the change it seems to be OK. I have also assigned 3G to MCF, which is quite comfortable with that. I am assuming the remaining 4G is enough for the OS and the ZooKeeper nodes. I am currently running a job for 35K documents and I can see more than 500MB of memory free.
Any thoughts?

Regards.

On Mon, Sep 15, 2014 at 8:45 PM, Karl Wright <[email protected]> wrote:

> HI Lalit,
>
> The best way in Java to assess memory usage is to turn on JVM garbage
> collection verbose output. Then you can see how often the system garbage
> collects etc, and whether post-GC usage grows over time.
>
> 12G should be more than enough, so if you find you are running into memory
> limits with that configuration, it would be worth trying to figure out why
> that is happening.
>
> Karl
>
> On Mon, Sep 15, 2014 at 11:04 AM, lalit jangra <[email protected]>
> wrote:
>
>> Hi Karl,
>>
>> Can i see zookeeper connection reset messages due to system running on
>> top of memory limits as i have 12G of RAM and can see its using 11.5G while
>> job is running?
>>
>> Is there any way i should ascertain memory to zookeeper nodes & if so, is
>> there any yardstick?
>>
>> Regards.
>>
>> On Mon, Sep 15, 2014 at 7:16 PM, Karl Wright <[email protected]> wrote:
>>
>>> Hi Lalit,
>>>
>>> Looks like this is the result of a tomcat shutdown, and is a probable
>>> race condition bug in Zookeeper:
>>>
>>> http://mail-archives.apache.org/mod_mbox/tomcat-users/201306.mbox/%[email protected]%3E
>>>
>>> Karl
>>>
>>> On Mon, Sep 15, 2014 at 9:41 AM, lalit jangra <[email protected]>
>>> wrote:
>>>
>>>> Hi Karl,
>>>>
>>>> Along with this, i could see below errors in tomcat catalina.out.
>>>>
>>>> Sep 15, 2014 1:06:14 PM org.apache.catalina.loader.WebappClassLoader loadClass
>>>> INFO: Illegal access: this web application instance has been stopped
>>>> already. Could not load org.apache.zookeeper.server.ZooTrace. The
>>>> eventual following stack trace is caused by an error thrown for debugging
>>>> purposes as well as to attempt to terminate the thread which caused the
>>>> illegal access, and has no functional impact.
>>>> java.lang.IllegalStateException
>>>>     at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1612)
>>>>     at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1571)
>>>>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1115)
>>>>
>>>> [http-bio-80-exec-1-SendThread(iwdc2preecma04.iwater.ie:2183)] ERROR org.apache.zookeeper.ClientCnxn - from http-bio-80-exec-1-SendThread(iwdc2preecma04.iwater.ie:2183)
>>>> java.lang.NoClassDefFoundError: org/apache/zookeeper/server/ZooTrace
>>>>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1115)
>>>> Caused by: java.lang.ClassNotFoundException: org.apache.zookeeper.server.ZooTrace
>>>>     at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1720)
>>>>     at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1571)
>>>>     ... 1 more
>>>>
>>>> [http-bio-80-exec-1-SendThread(iwdc2preecma04.iwater.ie:2182)] ERROR org.apache.zookeeper.ClientCnxn - from http-bio-80-exec-1-SendThread(iwdc2preecma04.iwater.ie:2182)
>>>> Sep 15, 2014 1:06:14 PM org.apache.coyote.AbstractProtocol destroy
>>>> INFO: Destroying ProtocolHandler ["http-bio-80"]
>>>> java.lang.NoClassDefFoundError: org/apache/zookeeper/server/ZooTrace
>>>>
>>>> Regards.
>>>>
>>>> On Mon, Sep 15, 2014 at 7:05 PM, lalit jangra <[email protected]> wrote:
>>>>
>>>>> Thanks Karl,
>>>>>
>>>>> While crawling is very slow, its taking long so a bit of frustrating
>>>>> and as i have multiple high volume jobs that too in parallel, it does not
>>>>> seem to be a good thing.
>>>>>
>>>>> I have also raised it on Zookeeper forums @
>>>>> http://zookeeper-user.578899.n2.nabble.com/Getting-errors-in-zookeeper-logs-td7580260.html
>>>>> but waiting for reply.
>>>>>
>>>>> Regards.
>>>>>
>>>>> On Mon, Sep 15, 2014 at 6:51 PM, Karl Wright <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> HI Lalit,
>>>>>>
>>>>>> When MCF cannot reach zookeeper, MCF crawls will pause until the
>>>>>> zookeeper connections are reestablished. Then the crawls should resume.
>>>>>> This should *not* abort your crawls, but it will make them very slow.
>>>>>>
>>>>>> I am not a zookeeper expert, so I would post on their message boards
>>>>>> to see if there is any adjustment that can be made to zookeeper parameters
>>>>>> that would improve zookeeper behavior when you have a flaky network.
>>>>>> However, since the obvious solution is to fix your network, they may not
>>>>>> have a code solution for you.
>>>>>>
>>>>>> Thanks,
>>>>>> Karl
>>>>>>
>>>>>> On Mon, Sep 15, 2014 at 9:15 AM, lalit jangra <[email protected]> wrote:
>>>>>>
>>>>>>> Thanks Karl,
>>>>>>>
>>>>>>> Ideally resetting connections should be taken care by zookeeper
>>>>>>> itself as i could see re-establishment of connections later in logs.
>>>>>>>
>>>>>>> Can you suggest any way to overcome this in addition to network
>>>>>>> issue resolution as my crawls are not working again and again? Anything in
>>>>>>> config files etc.?
>>>>>>>
>>>>>>> Regards.
>>>>>>>
>>>>>>> On Mon, Sep 15, 2014 at 6:39 PM, Karl Wright <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Lalit,
>>>>>>>>
>>>>>>>> Zookeeper will keep working, but you should understand that you are
>>>>>>>> dropping connections to your zookeeper members for unknown reasons, which
>>>>>>>> is causing your crawl to stall when it happens. This argues that perhaps
>>>>>>>> you have some network flakiness of some kind.
>>>>>>>>
>>>>>>>> Karl
>>>>>>>>
>>>>>>>> On Mon, Sep 15, 2014 at 8:59 AM, lalit jangra <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I am running cluster of two Apache ManifoldCF nodes on two
>>>>>>>>> separate machines each of which having 3 zookeeper instances (total 6
>>>>>>>>> instances in cluster). When i am running up manifoldCF agents, i see below
>>>>>>>>> warning during startup.
>>>>>>>>>
>>>>>>>>> [http-bio-80-exec-2-SendThread(iwdc1preecma03.iwater.ie:2181)] INFO org.apache.zookeeper.ClientCnxn - Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
>>>>>>>>> [http-bio-80-exec-2-SendThread(iwdc2preecma04.iwater.ie:2182)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server iwdc2preecma04.iwater.ie/10.231.72.25:2182. Will not attempt to authenticate using SASL (unknown error)
>>>>>>>>>
>>>>>>>>> Also i could see below error in logs in while agents are running.
>>>>>>>>>
>>>>>>>>> [http-bio-80-exec-2] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=iwdc1preecma03:2181,iwdc1preecma03:2182,iwdc1preecma03:2183,iwdc2preecma04:2181,iwdc2preecma04:2182,iwdc2preecma04:2183 sessionTimeout=4000 watcher=org.apache.manifoldcf.core.lockmanager.ZooKeeperConnection$ZooKeeperWatcher@51d83fd7
>>>>>>>>> [http-bio-80-exec-2-SendThread(iwdc2preecma04.iwater.ie:2182)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server iwdc2preecma04.iwater.ie/10.231.72.25:2182. Will not attempt to authenticate using SASL (unknown error)
>>>>>>>>> [http-bio-80-exec-2-SendThread(iwdc2preecma04.iwater.ie:2182)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to iwdc2preecma04.iwater.ie/10.231.72.25:2182, initiating session
>>>>>>>>> [http-bio-80-exec-2-SendThread(iwdc2preecma04.iwater.ie:2182)] WARN org.apache.zookeeper.ClientCnxn - Session 0x0 for server iwdc2preecma04.iwater.ie/10.231.72.25:2182, unexpected error, closing socket connection and attempting reconnect
>>>>>>>>> java.io.IOException: Connection reset by peer
>>>>>>>>>     at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>>>>>>>>>     at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>>>>>>>>>     at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:225)
>>>>>>>>>     at sun.nio.ch.IOUtil.read(IOUtil.java:193)
>>>>>>>>>     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:375)
>>>>>>>>>     at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:68)
>>>>>>>>>     at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:355)
>>>>>>>>>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
>>>>>>>>> [http-bio-80-exec-2-SendThread(iwdc2preecma04.iwater.ie:2183)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server iwdc2preecma04.iwater.ie/10.231.72.25:2183. Will not attempt to authenticate using SASL (unknown error)
>>>>>>>>> [http-bio-80-exec-2-SendThread(iwdc2preecma04.iwater.ie:2183)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to iwdc2preecma04.iwater.ie/10.231.72.25:2183, initiating session
>>>>>>>>> [http-bio-80-exec-2-SendThread(iwdc2preecma04.iwater.ie:2183)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server iwdc2preecma04.iwater.ie/10.231.72.25:2183, sessionid = 0x6487851bd330078, negotiated timeout = 4000
>>>>>>>>>
>>>>>>>>> Below are configurations for 1. zookeeper nodes & 2. MCF nodes for
>>>>>>>>> zookeeper.
>>>>>>>>>
>>>>>>>>> *zoo.cfg : Same for all six zookeeper nodes.*
>>>>>>>>>
>>>>>>>>> # The number of milliseconds of each tick
>>>>>>>>> tickTime=2000
>>>>>>>>> dataDir=/app/IW/zookeeper/data/data.1
>>>>>>>>> dataLogDir=/app/IW/zookeeper/logs/log.1
>>>>>>>>> clientPort=2181
>>>>>>>>> server.1=iwdc1preecma03:2888:3888
>>>>>>>>> server.2=iwdc1preecma03:2889:3889
>>>>>>>>> server.3=iwdc1preecma03:2890:3890
>>>>>>>>> server.4=iwdc2preecma04:2891:3891
>>>>>>>>> server.5=iwdc2preecma04:2892:3892
>>>>>>>>> server.6=iwdc2preecma04:2893:3893
>>>>>>>>> # The number of ticks that the initial
>>>>>>>>> # synchronization phase can take
>>>>>>>>> initLimit=10
>>>>>>>>> # The number of ticks that can pass between
>>>>>>>>> # sending a request and getting an acknowledgement
>>>>>>>>> syncLimit=5
>>>>>>>>> # the directory where the snapshot is stored.
>>>>>>>>> # do not use /tmp for storage, /tmp here is just
>>>>>>>>> # example sakes.
>>>>>>>>> #dataDir=/tmp/zookeeper
>>>>>>>>> # the port at which the clients will connect
>>>>>>>>> #clientPort=2181
>>>>>>>>> # the maximum number of client connections.
>>>>>>>>> # increase this if you need to handle more clients
>>>>>>>>> #maxClientCnxns=60
>>>>>>>>> #
>>>>>>>>> # Be sure to read the maintenance section of the
>>>>>>>>> # administrator guide before turning on autopurge.
>>>>>>>>> #
>>>>>>>>> # http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
>>>>>>>>> #
>>>>>>>>> # The number of snapshots to retain in dataDir
>>>>>>>>> autopurge.snapRetainCount=3
>>>>>>>>> # Purge task interval in hours
>>>>>>>>> # Set to "0" to disable auto purge feature
>>>>>>>>> autopurge.purgeInterval=1
>>>>>>>>>
>>>>>>>>> *ManifoldCF configurations : same for both ManifoldCF nodes.*
>>>>>>>>>
>>>>>>>>> <property name="org.apache.manifoldcf.lockmanagerclass" value="org.apache.manifoldcf.core.lockmanager.ZooKeeperLockManager"/>
>>>>>>>>> <property name="org.apache.manifoldcf.zookeeper.connectstring" value="iwdc1preecma03:2181,iwdc1preecma03:2182,iwdc1preecma03:2183,iwdc2preecma04:2181,iwdc2preecma04:2182,iwdc2preecma04:2183"/>
>>>>>>>>> <property name="org.apache.manifoldcf.zookeeper.sessiontimeout" value="4000"/>
>>>>>>>>>
>>>>>>>>> *I want to know if due to above warnings/errors, will zookeeper
>>>>>>>>> stop working or will zookeeper will work and these are non-failing
>>>>>>>>> messages, because ManifoldCF jobs are stuck while i can see these
>>>>>>>>> errors.*
>>>>>>>>>
>>>>>>>>> Please suggest.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Lalit.
>>>>>>>
>>>>>>> --
>>>>>>> Regards,
>>>>>>> Lalit.
>>>>>
>>>>> --
>>>>> Regards,
>>>>> Lalit.
>>>>
>>>> --
>>>> Regards,
>>>> Lalit.
>>
>> --
>> Regards,
>> Lalit.

--
Regards,
Lalit.
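
On Karl's suggestion in the quoted thread about watching garbage collection: verbose GC output is normally switched on with the standard `-verbose:gc` JVM flag in the startup options of each JVM (Solr, MCF). As a rough complement, the same kind of numbers can be sampled from inside a JVM through the standard `java.lang.management` API. The sketch below is only an illustration; the class name `HeapSampler`, the output format, and the one-second interval are made up for this example, and it reports on the JVM it runs in, so it would have to be run inside (or adapted to attach to) the process of interest.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper, not part of ManifoldCF, Solr or ZooKeeper.
public class HeapSampler {

    /** Returns a one-line summary of current heap usage and cumulative GC activity. */
    static String sample() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();

        long gcCount = 0;
        long gcMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 when the collector does not report the value.
            gcCount += Math.max(0, gc.getCollectionCount());
            gcMillis += Math.max(0, gc.getCollectionTime());
        }

        long mb = 1024L * 1024L;
        // getMax() may be -1 if no maximum is defined; it is printed as-is in bytes/MB.
        return String.format("heapUsedMB=%d heapMaxMB=%d gcCount=%d gcTimeMs=%d",
                heap.getUsed() / mb, heap.getMax() / mb, gcCount, gcMillis);
    }

    public static void main(String[] args) throws InterruptedException {
        // Take a few samples; over a long crawl one would watch whether
        // heapUsedMB keeps climbing even as gcCount increases.
        for (int i = 0; i < 3; i++) {
            System.out.println(sample());
            Thread.sleep(1000);
        }
    }
}
```

If used heap keeps rising across samples while the GC count also rises, that would point at memory genuinely being retained rather than just garbage waiting to be collected, which is the pattern Karl describes looking for.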
