Re: "Dead" files in bitcask or something

Dmitry Demeshchuk Tue, 17 Aug 2010 23:39:01 -0700

Thanks, I've read the conversation.

Turning on timed sync for bitcask kinda solved the problem for now.
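
For reference, a minimal sketch of what a timed sync looks like in
app.config, assuming bitcask's sync_strategy setting. The 60-second
interval is only an illustrative value, not necessarily the one used
here, and only the bitcask section is shown:

```erlang
%% app.config (fragment): only the bitcask section is shown.
{bitcask, [
    %% Assumed setting: force a sync to disk on a timer instead of
    %% leaving flushing to the OS. 60 is an illustrative number of seconds.
    {sync_strategy, {seconds, 60}}
]}
```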


On Tue, Aug 17, 2010 at 11:52 PM, Alexander Sicular <sicul...@gmail.com> wrote:
> I'm having a discussion with dizzyd on IRC about this. The compaction is
> not triggered by time or by any particular number of records overwritten,
> but rather by the total number of dead bytes and the fragmentation
> percentage, as listed in the default config here,
> http://github.com/basho/bitcask/blob/master/ebin/bitcask.app.
>
> check http://irclogger.com/riak/2010-08-17 for the latest.
>
> -alexander
>
>
> On Aug 17, 2010, at 11:59 AM, Dmitry Demeshchuk wrote:
>
>> We've been running Riak in production for 3 weeks and the database has
>> just kept growing. Our test server has been running even longer. Well,
>> that was an older Riak, 0.12.0.
>>
>> I've been running 0.12.1 for several hours and still no compaction though...
>>
>> On Tue, Aug 17, 2010 at 7:54 PM, Alexander Sicular <sicul...@gmail.com> 
>> wrote:
>>> Bitcask is a write only log (wol) that eats disk (by keeping all updates)
>>> until a compaction phase that reclaims disk at some defined interval.
>>>
>>> -Alexander
>>>
>>>
>>> @siculars on twitter
>>> http://siculars.posterous.com
>>>
>>> Sent from my iPhone
>>>
>>> On Aug 17, 2010, at 11:27, Dmitry Demeshchuk <demeshc...@gmail.com> wrote:
>>>
>>>> Greetings.
>>>>
>>>> This problem has already been discussed in IRC a bit.
>>>>
>>>> I use Riak 0.12.1 (have been using 0.12.0 but then updated to the
>>>> latest version and got the same problem) with bitcask storage.
>>>>
>>>> All Riak settings are default, i.e., all buckets are
>>>> default-configured (allow_mult=false), replication is 3x. Currently
>>>> Riak runs on a single machine. The problem reproduces on
>>>> different machines with different Riak clusters brought up.
>>>>
>>>> Though the total size of the database records doesn't grow, update
>>>> operations (I'll describe them in detail later) make the
>>>> "data/bitcask" folder grow. For example, I made a database backup on
>>>> our test server and the backup size was 2.5MB. But the size of the
>>>> "data/bitcask" folder was 17GB!
>>>>
>>>> Careful investigation showed that the database size on disk
>>>> increases whenever a Riak update operation is performed, even when
>>>> the value written by the update is exactly the same.
>>>>
>>>> The update operation is like this:
>>>>
>>>> %% Read the current object (R = 1); the local client returns {ok, Obj}.
>>>> {ok, RiakObject} = RiakClient:get(Bucket, Key, 1),
>>>> OldValue = riak_object:get_value(RiakObject),
>>>> NewValue = do_something(),
>>>> NewRiakObject = riak_object:update_value(RiakObject, NewValue),
>>>> %% Write it back (W = 1).
>>>> RiakClient:put(NewRiakObject, 1).
>>>>
>>>> And it appeared that even if I make NewValue exactly the same as
>>>> OldValue, this update operation increases the database size on
>>>> disk. Still, the total size of this Riak object stays the same.
>>>>
>>>> I thought that maybe I was doing something wrong when handling the
>>>> data, and there was some data I was missing. But, again, the backup
>>>> file is very small, much smaller than the disk space occupied by the
>>>> database.
>>>>
>>>> If I do list_buckets or list_keys, these operations are extremely
>>>> slow but they eventually return the right values, without any garbage.
>>>> Values of the Riak objects are okay as well.
>>>>
>>>> When I had a look at the data files, it turned out that the
>>>> *.bitcask.data files are the ones that keep growing.
>>>>
>>>> That's all I found for now.
>>>>
>>>> Any clues?
>>>>
>>>> --
>>>> Best regards,
>>>> Dmitry Demeshchuk
>>>>
>>>> _______________________________________________
>>>> riak-users mailing list
>>>> riak-users@lists.basho.com
>>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>>
>>
>>
>>
>> --
>> Best regards,
>> Dmitry Demeshchuk
>
>



-- 
Best regards,
Dmitry Demeshchuk

_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
