Hi Dejan,

I would try putting both logs and snapshots on disk before
experimenting with ramdisk. Could you try:

- disable forceSync
- increase snapCount

and see how many writes/sec you get?
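
A minimal sketch of how that might look in zoo.cfg, assuming the servers pick up forceSync and snapCount from the config file as in your setup (both are documented as the Java system properties zookeeper.forceSync and zookeeper.snapCount). The dataLogDir path and the snapCount value are placeholders for illustration, not recommendations:

```properties
# Snapshots stay on regular disk (same dataDir as in the quoted config).
dataDir=/var/lib/zookeeper
# Hypothetical disk-backed directory for the transaction logs (not a ramdisk).
dataLogDir=/var/lib/zookeeper-txnlog
# Unsafe option: do not require updates to be synced to media
# before processing of the update finishes.
forceSync=no
# Illustrative value only; the default is 100000. A larger value means
# snapshots are taken less often.
snapCount=1000000
```

All servers in the ensemble would need the same change and a restart before comparing write rates.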


On Tue, Apr 21, 2015 at 9:46 PM, Dejan Markic
<[email protected]> wrote:
> Hello!
>
> Thank you Michi and Flavio for all your valuable information!
> I'm testing a setup where I have logs on ramdisk and snapshots directly on
> disk. It's working rather poorly.
>
> My setup is like this currently:
>
> tickTime=2000
> initLimit=10
> syncLimit=5
> dataDir=/var/lib/zookeeper
> dataLogDir=/var/lib/zookeeper/logs
> clientPort=2181
> server.1=...
> server.2=...
> server.3=...
> forceSync=no
> skipACL=yes
>
> ZooKeeper dies every hour now.
> I have one setup with 3 servers running version 3.3.5 and one setup with 3 
> servers running version 3.4.5. Both versions are doing pretty much the same 
> stuff and both receive the same error now.
>
> This happens:
> Server#1:
> 2015-04-22 06:25:53,653 - WARN  
> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when 
> following the leader
> java.net.SocketTimeoutException: Read timed out
>         at java.net.SocketInputStream.socketRead0(Native Method)
>         at java.net.SocketInputStream.read(SocketInputStream.java:146)
>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
>         at java.io.DataInputStream.readInt(DataInputStream.java:387)
>         at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
>         at 
> org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
>         at 
> org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
>         at 
> org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:152)
>         at 
> org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
>         at 
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:740)
> 2015-04-22 06:25:53,654 - INFO  
> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@166] - shutdown called
> java.lang.Exception: shutdown Follower
>         at 
> org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
>         at 
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:744)
> ... AFTER THAT I SEE ...
> 2015-04-22 06:25:56,076 - ERROR 
> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FileTxnSnapLog@210] - Parent 
> /BOSS/ICS_O_182008125_580916217_180121007/_xlock missing
> for /BOSS/ICS_O_182008125_580916217_180121007/_xlock/lock-0000000003
> 2015-04-22 06:25:56,076 - ERROR 
> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumPeer@453] - Unable to load 
> database on disk
> java.io.IOException: Failed to process transaction type: 1 error: 
> KeeperErrorCode = NoNode for /BOSS/ICS_O_182008125_580916217_180121007/_xlock
>         at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:153)
>         at 
> org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
>         at 
> org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:417)
>         at 
> org.apache.zookeeper.server.quorum.QuorumPeer.getLastLoggedZxid(QuorumPeer.java:546)
>         at 
> org.apache.zookeeper.server.quorum.FastLeaderElection.getInitLastLoggedZxid(FastLeaderElection.java:690)
>         at 
> org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:737)
>         at 
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:716)
> Caused by: org.apache.zookeeper.KeeperException$NoNodeException: 
> KeeperErrorCode = NoNode for /BOSS/ICS_O_182008125_580916217_180121007/_xlock
>         at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.processTransaction(FileTxnSnapLog.java:211)
>         at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:151)
>         ... 6 more
> 2015-04-22 06:25:56,077 - WARN  
> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumPeer@718] - Unexpected 
> exception
> java.lang.RuntimeException: Unable to run quorum server
>         at 
> org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:454)
>         at 
> org.apache.zookeeper.server.quorum.QuorumPeer.getLastLoggedZxid(QuorumPeer.java:546)
>         at 
> org.apache.zookeeper.server.quorum.FastLeaderElection.getInitLastLoggedZxid(FastLeaderElection.java:690)
>         at 
> org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:737)
>         at 
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:716)
> Caused by: java.io.IOException: Failed to process transaction type: 1 error: 
> KeeperErrorCode = NoNode for /BOSS/ICS_O_182008125_580916217_180121007/_xlock
>         at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:153)
>         at 
> org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
>         at 
> org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:417)
>         ... 4 more
> Caused by: org.apache.zookeeper.KeeperException$NoNodeException: 
> KeeperErrorCode = NoNode for /BOSS/ICS_O_182008125_580916217_180121007/_xlock
>         at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.processTransaction(FileTxnSnapLog.java:211)
>         at org.apache
>
> Server#2:
> 2015-04-22 06:25:53,675 - WARN  
> [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when 
> following the leader
> java.net.SocketTimeoutException: Read timed out
>         at java.net.SocketInputStream.socketRead0(Native Method)
>         at java.net.SocketInputStream.read(SocketInputStream.java:146)
>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
>         at java.io.DataInputStream.readInt(DataInputStream.java:387)
>         at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
>         at 
> org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
>         at 
> org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
>         at 
> org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:152)
>         at 
> org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
>         at 
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:740)
> 2015-04-22 06:25:53,823 - INFO  
> [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:Follower@166] - shutdown called
> java.lang.Exception: shutdown Follower
>         at 
> org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
>         at 
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:744)
> ... LATER ...
> 2015-04-22 06:25:53,829 - INFO  
> [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FollowerZooKeeperServer@139] - 
> Shutting down
> 2015-04-22 06:25:53,829 - INFO  
> [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@419] - shutting down
> 2015-04-22 06:25:53,829 - INFO  
> [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FollowerRequestProcessor@105] - 
> Shutting down
> 2015-04-22 06:25:53,829 - INFO  
> [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:CommitProcessor@181] - Shutting down
> 2015-04-22 06:25:53,829 - INFO  
> [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FinalRequestProcessor@415] - 
> shutdown of request processor complete
> 2015-04-22 06:25:53,830 - INFO  [CommitProcessor:2:CommitProcessor@150] - 
> CommitProcessor exited loop!
> 2015-04-22 06:25:53,830 - INFO  
> [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:SyncRequestProcessor@175] - Shutting 
> down
> 2015-04-22 06:25:53,830 - WARN  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@218] - 
> Ignoring unexpected runtime exception
> java.nio.channels.CancelledKeyException
>         at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
>         at sun.nio.ch.SelectionKeyImpl.readyOps(SelectionKeyImpl.java:87)
>         at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:187)
>         at java.lang.Thread.run(Thread.java:701)
> 2015-04-22 06:25:53,830 - INFO  [SyncThread:2:SyncRequestProcessor@155] - 
> SyncRequestProcessor exited!
> 2015-04-22 06:25:53,830 - INFO  
> [FollowerRequestProcessor:2:FollowerRequestProcessor@95] - 
> FollowerRequestProcessor exited loop!
> 2015-04-22 06:25:53,831 - INFO  
> [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:QuorumPeer@670] - LOOKING
> 2015-04-22 06:25:53,835 - INFO  
> [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FileSnap@83] - Reading snapshot 
> /var/lib/zookeeper/version-2/snapshot.301015ebb
> ... LATER...
> 2015-04-22 06:25:56,970 - ERROR 
> [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FileTxnSnapLog@210] - Parent 
> /BOSS/ICS_O_182041092_581017710_180121007/_xlock missing
> for /BOSS/ICS_O_182041092_581017710_180121007/_xlock/lock-0000000003
> 2015-04-22 06:25:56,971 - ERROR 
> [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:QuorumPeer@453] - Unable to load 
> database on disk
> java.io.IOException: Failed to process transaction type: 1 error: 
> KeeperErrorCode = NoNode for /BOSS/ICS_O_182041092_581017710_180121007/_xlock
>         at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:153)
>         at 
> org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
>         at 
> org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:417)
>         at 
> org.apache.zookeeper.server.quorum.QuorumPeer.getLastLoggedZxid(QuorumPeer.java:546)
>         at 
> org.apache.zookeeper.server.quorum.FastLeaderElection.getInitLastLoggedZxid(FastLeaderElection.java:690)
>         at 
> org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:737)
>         at 
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:716)
> Caused by: org.apache.zookeeper.KeeperException$NoNodeException: 
> KeeperErrorCode = NoNode for /BOSS/ICS_O_182041092_581017710_180121007/_xlock
>         at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.processTransaction(FileTxnSnapLog.java:211)
>         at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:151)
>         ... 6 more
> 2015-04-22 06:25:56,971 - WARN  
> [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:QuorumPeer@718] - Unexpected 
> exception
> java.lang.RuntimeException: Unable to run quorum server
>         at 
> org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:454)
>         at 
> org.apache.zookeeper.server.quorum.QuorumPeer.getLastLoggedZxid(QuorumPeer.java:546)
>         at 
> org.apache.zookeeper.server.quorum.FastLeaderElection.getInitLastLoggedZxid(FastLeaderElection.java:690)
>         at 
> org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:737)
>         at 
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:716)
> Caused by: java.io.IOException: Failed to process transaction type: 1 error: 
> KeeperErrorCode = NoNode for /BOSS/ICS_O_182041092_581017710_180121007/_xlock
>         at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:153)
>         at 
> org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
>         at 
> org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:417)
>         ... 4 more
> Caused by: org.apache.zookeeper.KeeperException$NoNodeException: 
> KeeperErrorCode = NoNode for /BOSS/ICS_O_182041092_581017710_180121007/_xlock
>         at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.processTransaction(FileTxnSnapLog.java:211)
>         at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:151)
>         ... 6 more
> 2015-04-22 06:25:56,972 - INFO  
> [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:QuorumPeer@670] - LOOKING
> 2015-04-22 06:25:56,973 - INFO  
> [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FileSnap@83] - Reading snapshot 
> /var/lib/zookeeper/version-2/snapshot.301015ebb
> ....
>
>
> Server#3:
>
> 2015-04-22 06:25:53,850 - INFO  [ProcessThread(sid:3 
> cport:-1)::PrepRequestProcessor@476] - Processed session termination for 
> sessionid: 0x24cdd77c03d0355
> 2015-04-22 06:25:53,850 - ERROR 
> [LearnerHandler-/192.168.10.114:60067:LearnerHandler@562] - Unexpected 
> exception causing shutdown while sock still open
> java.io.EOFException
>         at java.io.DataInputStream.readInt(DataInputStream.java:392)
>         at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
>         at 
> org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
>         at 
> org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
>         at 
> org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:476)
> 2015-04-22 06:25:53,851 - WARN  
> [Sender-/192.168.10.113:45055:LearnerHandler@153] - Unexpected exception at 
> LearnerHandler Socket[addr=/192.168.10.113,port=45055,localport=2888]
> tickOfLastAck:15970 synced?:true queuedPacketLength:6
> java.net.SocketException: Broken pipe
>         at java.net.SocketOutputStream.socketWrite0(Native Method)
>         at 
> java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109)
>         at java.net.SocketOutputStream.write(SocketOutputStream.java:153)
>         at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
>         at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
>         at 
> org.apache.zookeeper.server.quorum.LearnerHandler.sendPackets(LearnerHandler.java:136)
>         at 
> org.apache.zookeeper.server.quorum.LearnerHandler.access$000(LearnerHandler.java:56)
>         at 
> org.apache.zookeeper.server.quorum.LearnerHandler$1.run(LearnerHandler.java:437)
> 2015-04-22 06:25:53,851 - ERROR 
> [LearnerHandler-/192.168.10.113:45055:LearnerHandler@562] - Unexpected 
> exception causing shutdown while sock still open
> java.io.EOFException
>         at java.io.DataInputStream.readInt(DataInputStream.java:392)
>         at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
>         at 
> org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
>         at 
> org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
>         at 
> org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:476)
> 2015-04-22 06:25:53,852 - WARN  
> [LearnerHandler-/192.168.10.113:45055:LearnerHandler@575] - ******* GOODBYE 
> /192.168.10.113:45055 ********
> 2015-04-22 06:25:53,851 - WARN  
> [Sender-/192.168.10.114:60067:LearnerHandler@153] - Unexpected exception at 
> LearnerHandler Socket[addr=/192.168.10.114,port=60067,localport=2888]
> tickOfLastAck:15970 synced?:true queuedPacketLength:1
> java.net.SocketException: Broken pipe
>         at java.net.SocketOutputStream.socketWrite0(Native Method)
>         at 
> java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109)
>         at java.net.SocketOutputStream.write(SocketOutputStream.java:153)
>         at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
>         at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
>         at 
> org.apache.zookeeper.server.quorum.LearnerHandler.sendPackets(LearnerHandler.java:136)
>         at 
> org.apache.zookeeper.server.quorum.LearnerHandler.access$000(LearnerHandler.java:56)
>         at 
> org.apache.zookeeper.server.quorum.LearnerHandler$1.run(LearnerHandler.java:437)
> 2015-04-22 06:25:53,851 - INFO  [ProcessThread(sid:3 
> cport:-1)::PrepRequestProcessor@476] - Processed session termination for 
> sessionid: 0x34cdd77c0b30322
> 2015-04-22 06:25:53,851 - WARN  
> [LearnerHandler-/192.168.10.114:60067:LearnerHandler@575] - ******* GOODBYE 
> /192.168.10.114:60067 ********
> 2015-04-22 06:25:53,853 - INFO  [ProcessThread(sid:3 
> cport:-1)::PrepRequestProcessor@476] - Processed session termination for 
> sessionid: 0x24cdd77c03d0352
> 2015-04-22 06:25:53,851 - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - 
> Accepted socket connection from /192.168.10.161:46385
> 2015-04-22 06:25:53,854 - INFO  [ProcessThread(sid:3 
> cport:-1)::PrepRequestProcessor@476] - Processed session termination for 
> sessionid: 0x24cdd77c03d034e
> 2015-04-22 06:25:53,854 - WARN  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@349] - caught end of 
> stream exception
> EndOfStreamException: Unable to read additional data from client sessionid 
> 0x34cdd77c0b3031d, likely client has closed socket
>         at 
> org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
>         at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
>         at java.lang.Thread.run(Thread.java:701)
>
> Why would "broken pipe" occur? It happens every 30-60 minutes. And why can't
> ZK read the snapshot? It's probably due to forceSync=no?
>
> Thank you for all your suggestions.
>
> Kind regards,
> Dejan Markic
> ________________________________________
> From: Michi Mutsuzaki [[email protected]]
> Sent: Friday, April 17, 2015 9:51 PM
> To: [email protected]
> Cc: Flavio Junqueira
> Subject: Re: Transaction logs and snapshots
>
> Hi Dejan,
>
> I had a similar use case: no durability requirement / virtualized (esx)
> environment. We saw intermittent session expiry, so we ended up
> setting forceSync to false. It's been working well since then.
>
> http://zookeeper.apache.org/doc/trunk/zookeeperAdmin.html#Unsafe+Options
>
> On Thu, Apr 16, 2015 at 10:08 PM, Dejan Markic
> <[email protected]> wrote:
>> Hello Flavio!
>>
>> When we were testing ZooKeeper, we saw high IOPS, and since we don't care
>> about data durability, we simply moved it to ramdisk. All ZK servers are
>> running on virtual machines (some Hyper-V, some VMware). So yes, in the end,
>> any high IOPS can be problematic.
>> So I guess my only solution at the moment is to increase the ramdisk to
>> accommodate the logs/snapshots.
>> I've just had another idea ... ZK uses only the log file while running,
>> right? That's where all the IOPS are happening? Is there a way to put the
>> active log on ramdisk, and snapshots and old logs in another directory?
>> I don't know why I put snapshots on ramdisk ... if I understand correctly,
>> snapshots are simply written when needed, right? I know I can put snapshots
>> in another directory (e.g. directly on disk) and it will not cause constant
>> IOPS, right?
>>
>> Thank you and kind regards,
>> Dejan Markic
>> ________________________________________
>> From: Flavio Junqueira [[email protected]]
>> Sent: Thursday, April 16, 2015 11:26 PM
>> To: Dejan Markic
>> Cc: [email protected]
>> Subject: Re: Transaction logs and snapshots
>>
>> Distributed locks are indeed part of our bread and butter. Why don't you want
>> to write to disk? Your workload doesn't seem to be heavy. Does the IO traffic
>> compete with some other traffic you have?
>>
>> -Flavio
>>
>>> On 16 Apr 2015, at 22:15, Dejan Markic <[email protected]> wrote:
>>>
>>> Hello Flavio!
>>>
>>> Yes, indeed, ZK might not be the best option, but I could not find anything
>>> better. What we need is a rather fast, distributed locking "system". ZK was
>>> the best option at the moment, and after testing it seemed to be the thing
>>> we are looking for. Other than snapshots/transaction logs, we have no
>>> problems. It easily handles our current load. It has a C library, which makes
>>> it fairly easy to port to other software.
>>> What we need (but cannot find) is a distributed in-memory locking system
>>> where we can store some small pieces of information.
>>> For instance, we use ZK nodes as /SESSION_ID ... we lock it there, and
>>> then we use e.g. /SESSION_ID/my_var to store something. After the session is
>>> gone, we remove this node and all information about it.
>>>
>>> If you have any idea about what kind of software we should try, please let 
>>> me know. You've helped me enough already!
>>>
>>> Thank you and kind regards,
>>> Dejan Markic
>>> ________________________________________
>>> From: Flavio Junqueira [[email protected]]
>>> Sent: Thursday, April 16, 2015 10:29 PM
>>> To: Dejan Markic
>>> Cc: [email protected]
>>> Subject: Re: Transaction logs and snapshots
>>>
>>> Another thing you could do is to make snapCount very large so that
>>> snapshots are created infrequently. But, let me step back and ask you why 
>>> you think ZK is a good fit for your project. It isn't clear to me that your 
>>> case is a good one for ZK.
>>>
>>> -Flavio
>>>
>>>
>>>> On 16 Apr 2015, at 11:01, Dejan Markic <[email protected]> wrote:
>>>>
>>>> Hello!
>>>>
>>>> Log seems to be always 67.108.880 bytes.
>>>> Snapshots are currently between 30-40MB. Snapshot is created almost every 
>>>> minute.
>>>> Yes, data durability is not important at all. Once the session ends (it
>>>> may last between 0 and a few minutes, on average maybe 1-2 minutes), I
>>>> don't need it anymore. I regularly remove nodes that have not changed for
>>>> more than 10 minutes.
>>>> I even receive updates for sessions, so even if ZK loses data, I would
>>>> get it back after a few minutes.
>>>>
>>>> Thanks!
>>>>
>>>> Kind regards,
>>>> Dejan
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Flavio Junqueira [mailto:[email protected]]
>>>> Sent: Thursday, April 16, 2015 11:49 AM
>>>> To: [email protected]
>>>> Subject: Re: Transaction logs and snapshots
>>>>
>>>> Hi Dejan,
>>>> For a typical ZK application, granularity of hours is more than enough, 
>>>> since it is supposed to be an infrequent background task. In your case, it 
>>>> sounds like durability isn't an important property because if it is you 
>>>> shouldn't be getting rid of disk data this fast. I'm also wondering about 
>>>> the amount of data you're generating. What's the size of your snapshots 
>>>> and txn logs?
>>>> -Flavio
>>>>
>>>>
>>>>    On Thursday, April 16, 2015 10:26 AM, Dejan Markic 
>>>> <[email protected]> wrote:
>>>>
>>>>
>>>>
>>>> Hello Flavio!
>>>>
>>>> Would that mean, that zkCleanup.sh would not be needed?
>>>> PurgeInterval is minimum 1 hour? Why is it so high?
>>>>
>>>> Thanks!
>>>>
>>>> Kind regards,
>>>> Dejan Markic
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Flavio Junqueira [mailto:[email protected]]
>>>> Sent: Thursday, April 16, 2015 11:15 AM
>>>> To: [email protected]
>>>> Subject: Re: Transaction logs and snapshots
>>>>
>>>> Hi Dejan,
>>>> Check if the autopurge feature solves your problem:
>>>> http://zookeeper.apache.org/doc/r3.4.6/zookeeperAdmin.html#sc_advancedConfiguration
>>>>
>>>> -Flavio
>>>>
>>>>
>>>>    On Thursday, April 16, 2015 9:17 AM, Dejan Markic 
>>>> <[email protected]> wrote:
>>>>
>>>>
>>>>
>>>> Hello all!
>>>>
>>>> We are running 3 ZK servers in an ensemble, and ZK is processing a lot of
>>>> commands per second. There are probably around 300 nodes
>>>> created/checked/set/get per second.
>>>> Since we only keep information about live sessions in ZK, we
>>>> don't need any data persistence, e.g. we can stop all nodes, clean all
>>>> transaction logs/snapshots, and start them up again without any issues.
>>>> Since we have a lot of requests/changes, we have moved dataDir onto
>>>> ramdisk, so we have no problems with disk IOPS, etc.
>>>> Is there a way to minimize the usage of snapshots/logs so the ramdisk would
>>>> not fill up? It happens that transaction logs/snapshots grow so
>>>> large that we run out of space on the ramdisk.
>>>> We issue >/usr/share/zookeeper/bin/zkCleanup.sh -n 3< every 2 minutes, so
>>>> this should clean up the dataDir quite often. Why is >count number of
>>>> snapshots/logs to keep< limited to 3 and not below?
>>>> I assume that in my setup I don't even need snapshots/logs to be stored
>>>> after they are no longer actively needed?
>>>> So my basic questions are:
>>>> - Can I somehow get rid of snapshots/logs sooner or more often?
>>>> - When is a snapshot created? Can it be created sooner, so it would be
>>>> smaller?
>>>> - Is it possible to get rid of snapshots/logs altogether?
>>>>
>>>> Thank you for all your inputs and kind regards, Dejan Markic
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
