Thanks. Very helpful information.
-b
On 09/05/2014 11:31 AM, Camille Fournier wrote:
You shouldn't use ZK to keep that data around. It's not designed to store a
ton of historical information. Thousands of jobs is no big deal, but
thousands of jobs and their history back through time is not what the
system is designed for.
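For example, a rough sketch of the pattern I mean (the /jobs path and the archive step are made up - adapt to however you track jobs):

import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;

// Sketch: when a job finishes, copy its result somewhere durable and
// delete the znode, so ZK only ever holds the jobs that are in flight.
public class JobArchiver {
    private final ZooKeeper zk;

    public JobArchiver(ZooKeeper zk) {
        this.zk = zk;
    }

    public void archiveAndRemove(String jobId) throws KeeperException, InterruptedException {
        String path = "/jobs/" + jobId;                // assumed layout
        byte[] result = zk.getData(path, false, null); // read the final job state
        archive(jobId, result);                        // placeholder: DB, file, etc. - not ZK
        zk.delete(path, -1);                           // -1 = any version; frees the memory in ZK
    }

    private void archive(String jobId, byte[] data) {
        // write to whatever store your monitoring tools read from
    }
}

Your monitoring tools can then read job history from that store instead of from ZK.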
C
On Fri, Sep 5, 2014 at 11:29 AM, Brian C. Huffman <
[email protected]> wrote:
We use ZooKeeper to keep track of the jobs we run, and we run thousands of
jobs. When a job is finished, it is no longer needed except by our web
monitoring tools. Is that considered state? We want to keep it around so
we have a history of completed jobs. Will those stay in memory?
Thanks,
Brian
On 09/05/2014 11:26 AM, Camille Fournier wrote:
All state is stored in memory in ZK for performance reasons. It sounds
like you're putting more data into it than the heap will accommodate.
ZK is useful for references to data, but not for large amounts of actual
data. It's not designed to be a large data store.
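To sketch what I mean - keep the payload elsewhere and store only a small
pointer in the znode (the "store://" location and the external-store call
are illustrative):

import java.nio.charset.StandardCharsets;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs.Ids;
import org.apache.zookeeper.ZooKeeper;

// Sketch: the znode holds a few bytes of reference, not the data itself.
public class PointerExample {
    public static String publish(ZooKeeper zk, String jobId, byte[] bigPayload)
            throws KeeperException, InterruptedException {
        String location = writeToExternalStore(jobId, bigPayload);
        return zk.create("/jobs/" + jobId,
                location.getBytes(StandardCharsets.UTF_8), // pointer only
                Ids.OPEN_ACL_UNSAFE,
                CreateMode.PERSISTENT);
    }

    private static String writeToExternalStore(String jobId, byte[] payload) {
        // placeholder for your actual large-data store
        return "store://jobs/" + jobId;
    }
}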
Thanks,
C
On Fri, Sep 5, 2014 at 10:33 AM, Brian C. Huffman <
[email protected]> wrote:
Flavio,
I was having the same problems on 3.4.5, so I upgraded to 3.4.6; it
doesn't seem to be related to the version.
You might be right about the storing of state. I'm curious - does the
"state" consist of the entire node listing? Is there any way to tell
ZooKeeper to keep a node around, but only on disk?
Thanks,
Brian
On 09/05/2014 09:47 AM, Flavio Junqueira wrote:
Brian,
How much state are you storing in ZK? Can you check the size of the
snapshots?
One common problem when folks are testing is that they forget to delete
the data from previous tests, so the state keeps accumulating and the
server keeps crashing because the state is too large.
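If that's what's happening here, wiping the leftover subtree between runs
keeps the snapshot small. A minimal sketch (the "/test" root is an
assumption - use whatever path your tests write under):

import java.util.List;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;

// Sketch: delete a subtree bottom-up, e.g. deleteRecursive(zk, "/test").
public class Cleanup {
    public static void deleteRecursive(ZooKeeper zk, String path)
            throws KeeperException, InterruptedException {
        List<String> children = zk.getChildren(path, false);
        for (String child : children) {
            deleteRecursive(zk, path + "/" + child); // children first
        }
        zk.delete(path, -1); // then the node itself; -1 matches any version
    }
}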
Also, consider trying 3.4.5 just to see if it is a problem with 3.4.6
alone.
-Flavio
On Friday, September 5, 2014 2:23 PM, Brian C. Huffman <
[email protected]> wrote:
We're running the latest version of the stable 3.4 branch (3.4.6) and
have been consistently having problems running out of heap space.
We're running a single server (redundancy isn't a concern at this point),
and I've tried the defaults (which seem to use Java's default heap of
8GB) as well as limiting it to 3GB. Either way the ZooKeeper server
eventually dies; with a larger heap size it just takes longer to die.
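(For reference, a standard way to cap the heap with the stock zkServer.sh
scripts is JVMFLAGS in conf/java.env, which gets sourced at startup - e.g.:

# conf/java.env
export JVMFLAGS="-Xmx3g"
)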
Here's the latest trace:
2014-09-05 00:51:11,419 [myid:] - ERROR [SyncThread:0:SyncRequestProcessor@183] - Severe unrecoverable error, exiting
java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2271)
    at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
    at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
    at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
    at java.io.DataOutputStream.write(DataOutputStream.java:107)
    at java.io.FilterOutputStream.write(FilterOutputStream.java:97)
    at org.apache.jute.BinaryOutputArchive.writeBuffer(BinaryOutputArchive.java:119)
    at org.apache.zookeeper.txn.Txn.serialize(Txn.java:49)
    at org.apache.jute.BinaryOutputArchive.writeRecord(BinaryOutputArchive.java:123)
    at org.apache.zookeeper.txn.MultiTxn.serialize(MultiTxn.java:44)
    at org.apache.zookeeper.server.persistence.Util.marshallTxnEntry(Util.java:263)
    at org.apache.zookeeper.server.persistence.FileTxnLog.append(FileTxnLog.java:216)
    at org.apache.zookeeper.server.persistence.FileTxnSnapLog.append(FileTxnSnapLog.java:314)
    at org.apache.zookeeper.server.ZKDatabase.append(ZKDatabase.java:476)
    at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:140)

2014-09-05 00:51:07,866 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x14837ac98960071, likely client has closed socket
    at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
    at java.lang.Thread.run(Thread.java:745)
Here's my configuration:
[user@xyz conf]$ grep -v '^#' zoo.cfg
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/usr/local/var/zookeeper
clientPort=2181
autopurge.snapRetainCount=3
autopurge.purgeInterval=1
Can anyone suggest what the issue could be?
Thanks,
Brian