It seems fine to limit the amount of memory, but I'd rather get the error that is preventing the servers from starting/making progress first.
-Flavio

On Tuesday, September 16, 2014 9:48 AM, lalit jangra <[email protected]> wrote:
>
>Thanks Flavio,
>
>I will try and update. Can you confirm if i add java.env under conf folder with JVM settings as "-Xms 1024m -Xmx1024m" , it will help to limit memory size of zookeeper till 1 G only?
>
>Regards.
>
>On Tue, Sep 16, 2014 at 2:05 PM, Flavio Junqueira <[email protected]> wrote:
>
>> What if you use 'zkServer.sh start-foreground' to debug?
>>
>> -Flavio
>>
>> On Tuesday, September 16, 2014 5:20 AM, lalit jangra <[email protected]> wrote:
>>
>> >Hello Flavio,
>> >
>> >I am using 'zkServer.sh start' command to start zookeeper nodes. I also could see logs in log folders in have specified but these logs are in a form which is difficult to understand.
>> >
>> >Also regarding to using 6 zookeeper nodes (3+3), is it fine to handle failures as per 50% rule as if 3 are down my cluster should work or should i move to having odd numbers such as 5 or 7 here?
>> >
>> >Regards.
>> >
>> >On Tue, Sep 16, 2014 at 4:26 AM, Flavio Junqueira <[email protected]> wrote:
>> >
>> >> Instead of guessing, I think it is best if we understand what's going wrong with the servers, you need to look at the server logs. If you don't know how to get it, could you please share the command you're using to start servers?
>> >>
>> >> -Flavio
>> >>
>> >> On Monday, September 15, 2014 3:30 PM, lalit jangra <[email protected]> wrote:
>> >>
>> >> >Hello Flavio,
>> >> >
>> >> >Can this issue arise from system not having enough RAM for Java Heap as i could see my system is running on top of its RAM?
>> >> >
>> >> >Also is there any way to assign memory to zookeeper nodes?
>> >> >
>> >> >Regards.
>> >> >
>> >> >On Mon, Sep 15, 2014 at 7:37 PM, lalit jangra <[email protected]> wrote:
>> >> >
>> >> >> Thanks Flavio,
>> >> >>
>> >> >> I am having 3+3 zookeeper nodes on two servers MCF1 & MCF2. Also i could see same error on both nodes. For logs into servers, i am not able to read anything from these, how can i read and interpret from zookeeper servers what is wrong?
>> >> >>
>> >> >> I have put different log & data directories for each of zookeeper, may be i should elaborate a bit more. I am deciding on names of logs & data directory as per myid (ranging from 1 to 6).
>> >> >>
>> >> >> ZK1 -> Data.1 -> Logs.1
>> >> >> ZK2 -> Data.2 -> Logs.2
>> >> >> ZK3 -> Data.3 -> Logs.3
>> >> >> ZK4 -> Data.4 -> Logs.4
>> >> >> ZK5 -> Data.5 -> Logs.5
>> >> >> ZK6 -> Data.6 -> Logs.6
>> >> >>
>> >> >> As i have two servers only and i need to make it running on these two only so i chose this architecture. Also i am trying to make even for scenario where one node is down, i have only 3 zookeepers down so still second is working. If i have odd numbers say 5 or 7, if server with more numbers of zookeeper is down, its gone.
>> >> >>
>> >> >> Regards.
>> >> >>
>> >> >> On Mon, Sep 15, 2014 at 7:29 PM, Flavio Junqueira <[email protected]> wrote:
>> >> >>
>> >> >>> I believe you have shared just the client-side errors, and I was wondering what's going on with the servers. One problem I could spot with the configuration is with the values of dataDir and dataLogDir. It looks like the processes on the same node are writing to the same directory, which should be confusing the servers.
>> >> >>>
>> >> >>> A couple of things about your setting. I'm not sure what your motivation is to put multiple servers on the same node.
>> >> >>> It will induce correlated crashes for the servers on the same node. Also, we in general recommend to use an odd number of servers (5 or 7 for your case).
>> >> >>>
>> >> >>> -Flavio
>> >> >>>
>> >> >>> On Wednesday, September 10, 2014 6:29 AM, lalit jangra <[email protected]> wrote:
>> >> >>>
>> >> >>> >Hi,
>> >> >>> >
>> >> >>> >I am running cluster of two Apache ManifoldCF nodes on two separate machines each of which having 3 zookeeper instances (total 6 instances in cluster). When i am running up manifoldCF agents, i see below warning during startup.
>> >> >>> >
>> >> >>> >[http-bio-80-exec-2-SendThread(iwdc1preecma03.iwater.ie:2181)] INFO org.apache.zookeeper.ClientCnxn - Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
>> >> >>> >
>> >> >>> >[http-bio-80-exec-2-SendThread(iwdc2preecma04.iwater.ie:2182)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server iwdc2preecma04.iwater.ie/10.231.72.25:2182. Will not attempt to authenticate using SASL (unknown error)
>> >> >>> >
>> >> >>> >Also i could see below error in logs in while agents are running.
>> >> >>> >
>> >> >>> >[localhost-startStop-1-SendThread(iwdc1preecma03.iwater.ie:2183)] WARN org.apache.zookeeper.ClientCnxn - Session 0x6485a8006060079 for server iwdc1preecma03.iwater.ie/10.231.72.24:2183, unexpected error, closing socket connection and attempting reconnect
>> >> >>> >java.io.IOException: Connection reset by peer
>> >> >>> >    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>> >> >>> >    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>> >> >>> >    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:225)
>> >> >>> >    at sun.nio.ch.IOUtil.read(IOUtil.java:193)
>> >> >>> >    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:375)
>> >> >>> >    at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:68)
>> >> >>> >    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:355)
>> >> >>> >    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
>> >> >>> >
>> >> >>> >Below are configurations for 1. zookeeper nodes & 2. MCF nodes for zookeeper.
>> >> >>> >
>> >> >>> >*zoo.cfg : Same for all six zookeeper nodes.*
>> >> >>> >
>> >> >>> ># The number of milliseconds of each tick
>> >> >>> >tickTime=2000
>> >> >>> >dataDir=/app/IW/zookeeper/data/data.1
>> >> >>> >dataLogDir=/app/IW/zookeeper/logs/log.1
>> >> >>> >clientPort=2181
>> >> >>> >server.1=iwdc1preecma03:2888:3888
>> >> >>> >server.2=iwdc1preecma03:2889:3889
>> >> >>> >server.3=iwdc1preecma03:2890:3890
>> >> >>> >server.4=iwdc2preecma04:2891:3891
>> >> >>> >server.5=iwdc2preecma04:2892:3892
>> >> >>> >server.6=iwdc2preecma04:2893:3893
>> >> >>> ># The number of ticks that the initial
>> >> >>> ># synchronization phase can take
>> >> >>> >initLimit=10
>> >> >>> ># The number of ticks that can pass between
>> >> >>> ># sending a request and getting an acknowledgement
>> >> >>> >syncLimit=5
>> >> >>> ># the directory where the snapshot is stored.
>> >> >>> ># do not use /tmp for storage, /tmp here is just
>> >> >>> ># example sakes.
>> >> >>> >#dataDir=/tmp/zookeeper
>> >> >>> ># the port at which the clients will connect
>> >> >>> >#clientPort=2181
>> >> >>> ># the maximum number of client connections.
>> >> >>> ># increase this if you need to handle more clients
>> >> >>> >#maxClientCnxns=60
>> >> >>> >#
>> >> >>> ># Be sure to read the maintenance section of the
>> >> >>> ># administrator guide before turning on autopurge.
>> >> >>> >#
>> >> >>> ># http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
>> >> >>> >#
>> >> >>> ># The number of snapshots to retain in dataDir
>> >> >>> >autopurge.snapRetainCount=3
>> >> >>> ># Purge task interval in hours
>> >> >>> ># Set to "0" to disable auto purge feature
>> >> >>> >autopurge.purgeInterval=1
>> >> >>> >
>> >> >>> >*ManifoldCF configurations : same for both ManifoldCF nodes.*
>> >> >>> >
>> >> >>> ><property name="org.apache.manifoldcf.lockmanagerclass" value="org.apache.manifoldcf.core.lockmanager.ZooKeeperLockManager"/>
>> >> >>> ><property name="org.apache.manifoldcf.zookeeper.connectstring" value="iwdc1preecma03:2181,iwdc1preecma03:2182,iwdc1preecma03:2183,iwdc2preecma04:2181,iwdc2preecma04:2182,iwdc2preecma04:2183"/>
>> >> >>> ><property name="org.apache.manifoldcf.zookeeper.sessiontimeout" value="4000"/>
>> >> >>> >
>> >> >>> >*I want to know if due to above warnings/errors, will zookeeper stop working or will zookeeper will work and these are non-failing messages, because ManifoldCF jobs are stuck while i can see these errors.*
>> >> >>> >
>> >> >>> >Please suggest.
>> >> >>> >
>> >> >>> >Regards,
>> >> >>> >Lalit.
>> >> >>
>> >> >> --
>> >> >> Regards,
>> >> >> Lalit.
>> >> >
>> >> >--
>> >> >Regards,
>> >> >Lalit.
>> >
>> >--
>> >Regards,
>> >Lalit.
>
>--
>Regards,
>Lalit.
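
On the 3+3 versus odd-number question raised in the thread: ZooKeeper needs a strict majority of the configured servers to be up before it can make progress. The sketch below only works through that arithmetic; it is not taken from ZooKeeper itself. The function names are hypothetical, the machine names MCF1 and MCF2 are borrowed from the thread, and the 3+2 layout is an assumed alternative used purely for comparison.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of an ensemble of the given size."""
    return ensemble_size // 2 + 1


def survives_loss_of(layout: dict, machine: str) -> bool:
    """True if the servers left after `machine` goes down still form a majority."""
    total = sum(layout.values())
    remaining = total - layout[machine]
    return remaining >= quorum_size(total)


# Layout described in the thread: 3 ZooKeeper servers on each of two machines.
three_plus_three = {"MCF1": 3, "MCF2": 3}
# Assumed alternative for comparison: 5 servers split 3 + 2 over the same machines.
three_plus_two = {"MCF1": 3, "MCF2": 2}

for layout in (three_plus_three, three_plus_two):
    total = sum(layout.values())
    print(f"{total} servers, quorum {quorum_size(total)}")
    for machine in layout:
        kept = survives_loss_of(layout, machine)
        print(f"  lose {machine}: quorum kept = {kept}")
```

With six servers the majority is four, so the three survivors of a machine failure are one short and the ensemble stalls whichever machine is lost. Six servers tolerate two failures, the same as five. With only two machines, at least one of them always holds enough servers to break the majority when it goes down, so surviving the loss of any single machine would take a third machine.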
