Re: Transaction logs and snapshots

Flavio Junqueira Sat, 18 Apr 2015 07:13:05 -0700

Disabling forceSync will only make the writes to the txn log asynchronous, but 
the same volume of data will be written. I still think you could try to reduce 
the number of snapshots generated by increasing snapCount.
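
A rough sketch of what raising snapCount buys, under stated assumptions: the
default snapCount is 100,000, and (as far as I understand the server code) a
snapshot is started after a randomized number of transactions between
snapCount/2 and snapCount. The 300/s figure is the total request rate quoted
further down in this thread, used here as a stand-in for the write rate; only
writes count as transactions, so the real intervals may differ. The function
name is made up for illustration.

```python
def snapshot_interval_bounds(writes_per_sec, snap_count=100_000):
    """Return (shortest, longest) expected seconds between snapshots.

    Assumption: a snapshot is triggered after a randomized number of
    transactions between snap_count / 2 and snap_count.
    """
    if writes_per_sec <= 0:
        raise ValueError("writes_per_sec must be positive")
    shortest = (snap_count / 2) / writes_per_sec
    longest = snap_count / writes_per_sec
    return shortest, longest


if __name__ == "__main__":
    # 300 writes/s is an assumed rate taken from the thread, not a measurement.
    for snap_count in (100_000, 1_000_000, 10_000_000):
        low, high = snapshot_interval_bounds(300, snap_count)
        print(f"snapCount={snap_count}: one snapshot every "
              f"{low / 60:.1f} to {high / 60:.1f} minutes")
```

The interval scales linearly with snapCount, so a 10x larger value means
roughly 10x fewer snapshots, at the cost of a longer txn log to replay on
restart.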


-Flavio

> On 17 Apr 2015, at 20:51, Michi Mutsuzaki <[email protected]> wrote:
> 
> Hi Dejan,
> 
> I had a similar use case: no durability requirement / virtualized (ESX)
> environment. We saw intermittent session expiry, so we ended up
> setting forceSync to false. It's been working well since then.
> 
> http://zookeeper.apache.org/doc/trunk/zookeeperAdmin.html#Unsafe+Options
> 
> On Thu, Apr 16, 2015 at 10:08 PM, Dejan Markic
> <[email protected]> wrote:
>> Hello Flavio!
>> 
>> When we were testing ZooKeeper, we saw high IOPS - and since we don't care 
>> about data durability, we simply moved it to ramdisk. All ZK servers are 
>> running on virtual machines (some Hyper-V, some VMware). So yes, in the end, 
>> any high IOPS can be problematic.
>> So I guess my only solution at the moment is to increase the ramdisk to 
>> accommodate the logs/snapshots.
>> I've just had another idea ... ZK uses only the log file while running, 
>> right? That's where all the IOPS are happening? Is there a way to put the 
>> active log on ramdisk, and snapshots and old logs in another directory?
>> I don't know why I put snapshots on ramdisk ... if I understand correctly, 
>> snapshots are simply written when needed, right? I know I can put snapshots 
>> in another directory (e.g. directly on disk) and it will not cause constant 
>> IOPS, right?
>> 
>> Thank you and kind regards,
>> Dejan Markic
>> ________________________________________
>> From: Flavio Junqueira [[email protected]]
>> Sent: Thursday, April 16, 2015 11:26 PM
>> To: Dejan Markic
>> Cc: [email protected]
>> Subject: Re: Transaction logs and snapshots
>> 
>> Distributed locks are indeed part of our bread and butter. Why don't you want 
>> to write to disk? Your workload doesn't seem to be heavy. Does the IO traffic 
>> compete with some other traffic you have?
>> 
>> -Flavio
>> 
>>> On 16 Apr 2015, at 22:15, Dejan Markic <[email protected]> wrote:
>>> 
>>> Hello Flavio!
>>> 
>>> Yes, indeed, ZK might not be the best option - but I could not find any 
>>> better. What we need is a rather fast, distributed locking "system". ZK was 
>>> at the moment the best option, and after testing it seemed to be the thing 
>>> we are looking for. Other than snapshots/transaction logs, we have no 
>>> problems. It easily handles our current load. It has a C library, which 
>>> makes it fairly easy to port to other software.
>>> What we need (but cannot find) is a distributed in-memory locking system 
>>> where we can store some small information.
>>> For instance, we use ZK's nodes as /SESSION_ID ... we lock it here, and 
>>> then we use e.g. /SESSION_ID/my_var to store something. After the session is 
>>> gone, we remove this node and all information about it.
>>> 
>>> If you have any idea about what kind of software we should try, please let 
>>> me know. You've helped me enough already!
>>> 
>>> Thank you and kind regards,
>>> Dejan Markic
>>> ________________________________________
>>> From: Flavio Junqueira [[email protected]]
>>> Sent: Thursday, April 16, 2015 10:29 PM
>>> To: Dejan Markic
>>> Cc: [email protected]
>>> Subject: Re: Transaction logs and snapshots
>>> 
>>> Another thing you could do is to make snapCount very large so that 
>>> snapshots are created infrequently. But, let me step back and ask you why 
>>> you think ZK is a good fit for your project. It isn't clear to me that your 
>>> case is a good one for ZK.
>>> 
>>> -Flavio
>>> 
>>> 
>>>> On 16 Apr 2015, at 11:01, Dejan Markic <[email protected]> wrote:
>>>> 
>>>> Hello!
>>>> 
>>>> The log always seems to be 67,108,880 bytes.
>>>> Snapshots are currently between 30 and 40 MB. A snapshot is created almost 
>>>> every minute.
>>>> Yes, data durability is not important at all. Once the session ends (it 
>>>> may last between 0 and a few minutes, on average around 1-2 minutes), I 
>>>> don't need it anymore. I regularly remove nodes that have not changed for 
>>>> more than 10 minutes.
>>>> I even receive updates for sessions, so even if ZK loses data, I would 
>>>> get it back after a few minutes.
>>>> 
>>>> Thanks!
>>>> 
>>>> Kind regards,
>>>> Dejan
>>>> 
>>>> 
>>>> -----Original Message-----
>>>> From: Flavio Junqueira [mailto:[email protected]]
>>>> Sent: Thursday, April 16, 2015 11:49 AM
>>>> To: [email protected]
>>>> Subject: Re: Transaction logs and snapshots
>>>> 
>>>> Hi Dejan,
>>>> For a typical ZK application, granularity of hours is more than enough, 
>>>> since it is supposed to be an infrequent background task. In your case, it 
>>>> sounds like durability isn't an important property, because if it were, you 
>>>> shouldn't be getting rid of disk data this fast. I'm also wondering about 
>>>> the amount of data you're generating. What's the size of your snapshots 
>>>> and txn logs?
>>>> -Flavio
>>>> 
>>>> 
>>>>   On Thursday, April 16, 2015 10:26 AM, Dejan Markic 
>>>> <[email protected]> wrote:
>>>> 
>>>> 
>>>> 
>>>> Hello Flavio!
>>>> 
>>>> Would that mean that zkCleanup.sh would not be needed?
>>>> The minimum purgeInterval is 1 hour? Why is it so high?
>>>> 
>>>> Thanks!
>>>> 
>>>> Kind regards,
>>>> Dejan Markic
>>>> 
>>>> 
>>>> -----Original Message-----
>>>> From: Flavio Junqueira [mailto:[email protected]]
>>>> Sent: Thursday, April 16, 2015 11:15 AM
>>>> To: [email protected]
>>>> Subject: Re: Transaction logs and snapshots
>>>> 
>>>> Hi Dejan,
>>>> Check if the autopurge feature solves your problem:
>>>> http://zookeeper.apache.org/doc/r3.4.6/zookeeperAdmin.html#sc_advancedConfiguration
>>>> 
>>>> -Flavio
>>>> 
>>>> 
>>>>   On Thursday, April 16, 2015 9:17 AM, Dejan Markic 
>>>> <[email protected]> wrote:
>>>> 
>>>> 
>>>> 
>>>> Hello all!
>>>> 
>>>> We are running 3 ZK servers in an ensemble, and ZK is processing a lot of 
>>>> commands per second. There are probably around 300 nodes 
>>>> created/checked/set/get per second.
>>>> Since ZK only holds information about the live sessions we handle, we 
>>>> don't need any data persistence - e.g. we can stop all nodes, clean all 
>>>> transaction logs/snapshots, and start them up again without any issues.
>>>> Since we have a lot of requests/changes, we have moved dataDir onto 
>>>> ramdisk, so we have no problems with disk IOPS, etc.
>>>> Is there a way to minimize the usage of snapshots/logs so the ramdisk would 
>>>> not get filled up? It happens that transaction logs/snapshots grow so 
>>>> large that we run out of space on ramdisk.
>>>> We issue >/usr/share/zookeeper/bin/zkCleanup.sh -n 3< every 2 minutes, so 
>>>> this should clean up the dataDir quite often. Why is >count number of 
>>>> snapshots/logs to keep< limited to 3 and not below?
>>>> I assume that, in my setup, I don't even need snapshots/logs to be stored 
>>>> once they are no longer actively needed?
>>>> So my basic questions are:
>>>> - Can I somehow get rid of snapshots/logs sooner, more often ... ?
>>>> - When is a snapshot created? Can it be created sooner, so it would be 
>>>> smaller?
>>>> - Is it possible to get rid of snapshots/logs altogether?
>>>> 
>>>> Thank you for all your inputs and kind regards, Dejan Markic
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>>
