You shouldn't use ZooKeeper to keep that data around; it isn't designed to store large amounts of historical information. Thousands of live jobs is no big deal, but thousands of jobs plus their entire history back through time is not what the system is designed for.
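Back-of-envelope arithmetic makes the point: every znode (its path, Stat, and payload) lives on the JVM heap, so retained history grows without bound. A rough sketch of the math, with an assumed (not measured) per-znode overhead and illustrative job counts:

```python
# Rough estimate of ZooKeeper heap usage if completed-job history is kept
# as znodes. PER_ZNODE_OVERHEAD_BYTES is an assumed bookkeeping cost
# (path, Stat, parent/child references), not a measured figure.

PER_ZNODE_OVERHEAD_BYTES = 300

def estimated_heap_bytes(num_znodes: int, avg_payload_bytes: int) -> int:
    """Approximate in-memory footprint of num_znodes znodes."""
    return num_znodes * (PER_ZNODE_OVERHEAD_BYTES + avg_payload_bytes)

# 5,000 live jobs with 1 KB payloads: a few MB -- harmless.
live = estimated_heap_bytes(5_000, 1024)

# A year of history at 5,000 jobs/day with the same payloads: gigabytes.
history = estimated_heap_bytes(5_000 * 365, 1024)

print(live // (1024 * 1024), "MB for live jobs")
print(history // (1024 * 1024), "MB for a year of history")
```

The usual pattern is to delete the job znode on completion and append the completed record to an external store (a database, a log file) that the monitoring tools read instead.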
C

On Fri, Sep 5, 2014 at 11:29 AM, Brian C. Huffman <[email protected]> wrote:

> We use zookeeper to keep track of the jobs we run, and we run thousands of
> jobs. When a job is finished it is no longer needed except for web
> monitoring tools. Is that considered state? We want to keep that around so
> we have a history of completed jobs. Will these stay in memory?
>
> Thanks,
> Brian
>
>
> On 09/05/2014 11:26 AM, Camille Fournier wrote:
>
>> All state is stored in memory in ZK for performance reasons. It sounds
>> like you're putting more data into it than the heap will accommodate.
>> ZK is useful for references to data, but not for large amounts of actual
>> data. It's not designed to be a large data store.
>>
>> Thanks,
>> C
>>
>>
>> On Fri, Sep 5, 2014 at 10:33 AM, Brian C. Huffman <[email protected]> wrote:
>>
>>> Flavio,
>>>
>>> I was having the same problems on 3.4.5, so I upgraded to 3.4.6. So it
>>> doesn't seem to be related to the version.
>>>
>>> You might be right about the storing of state. I'm curious: does the
>>> "state" consist of the entire node listing? Is there any way to tell
>>> ZooKeeper to keep a node around, but only on disk?
>>>
>>> Thanks,
>>> Brian
>>>
>>>
>>> On 09/05/2014 09:47 AM, Flavio Junqueira wrote:
>>>
>>>> Brian,
>>>>
>>>> How much state are you storing in ZK? Can you check the size of the
>>>> snapshots?
>>>>
>>>> One common problem when folks are testing is that they forget to delete
>>>> the data from previous tests, so the state keeps accumulating and the
>>>> server keeps crashing because the state is too large.
>>>>
>>>> Also, consider trying 3.4.5 just to see if it is a problem with 3.4.6
>>>> alone.
>>>>
>>>> -Flavio
>>>>
>>>>
>>>> On Friday, September 5, 2014 2:23 PM, Brian C. Huffman <[email protected]> wrote:
>>>>
>>>>> We're running the latest version of the stable 3.4 branch (3.4.6) and
>>>>> have been consistently having problems running out of heap space.
>>>>>
>>>>> We're running a single server (redundancy isn't a concern at this
>>>>> point) and I've tried the defaults (which seems to use Java's default
>>>>> heap of 8GB) as well as limiting to 3GB. Either way the Zookeeper
>>>>> server eventually dies. With larger heap size it seems to take longer
>>>>> to die.
>>>>>
>>>>> Here's the latest trace:
>>>>>
>>>>> 2014-09-05 00:51:11,419 [myid:] - ERROR [SyncThread:0:SyncRequestProcessor@183] - Severe unrecoverable error, exiting
>>>>> java.lang.OutOfMemoryError: Java heap space
>>>>>     at java.util.Arrays.copyOf(Arrays.java:2271)
>>>>>     at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
>>>>>     at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
>>>>>     at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
>>>>>     at java.io.DataOutputStream.write(DataOutputStream.java:107)
>>>>>     at java.io.FilterOutputStream.write(FilterOutputStream.java:97)
>>>>>     at org.apache.jute.BinaryOutputArchive.writeBuffer(BinaryOutputArchive.java:119)
>>>>>     at org.apache.zookeeper.txn.Txn.serialize(Txn.java:49)
>>>>>     at org.apache.jute.BinaryOutputArchive.writeRecord(BinaryOutputArchive.java:123)
>>>>>     at org.apache.zookeeper.txn.MultiTxn.serialize(MultiTxn.java:44)
>>>>>     at org.apache.zookeeper.server.persistence.Util.marshallTxnEntry(Util.java:263)
>>>>>     at org.apache.zookeeper.server.persistence.FileTxnLog.append(FileTxnLog.java:216)
>>>>>     at org.apache.zookeeper.server.persistence.FileTxnSnapLog.append(FileTxnSnapLog.java:314)
>>>>>     at org.apache.zookeeper.server.ZKDatabase.append(ZKDatabase.java:476)
>>>>>     at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:140)
>>>>> 2014-09-05 00:51:07,866 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception
>>>>> EndOfStreamException: Unable to read additional data from client sessionid 0x14837ac98960071, likely client has closed socket
>>>>>     at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
>>>>>     at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
>>>>>     at java.lang.Thread.run(Thread.java:745)
>>>>>
>>>>> Here's my configuration:
>>>>>
>>>>> [user@xyz conf]$ grep -v '^#' zoo.cfg
>>>>> tickTime=2000
>>>>> initLimit=10
>>>>> syncLimit=5
>>>>> dataDir=/usr/local/var/zookeeper
>>>>> clientPort=2181
>>>>> autopurge.snapRetainCount=3
>>>>> autopurge.purgeInterval=1
>>>>>
>>>>> Can anyone suggest what the issue could be?
>>>>>
>>>>> Thanks,
>>>>> Brian
