At any point in time Bitcask may have data spread across a number of data
files. Bitcask occasionally runs a merge process which reads the data from
those files and writes a merged set of data to a new file. Once the merge
completes, the old files can be removed and the new file is used for future
read operations.

The merge process chooses which files to merge based on a number of
thresholds. The thresholds are:

    {frag_threshold, 40},              % >= 40% fragmentation
    {dead_bytes_threshold, 134217728}, % Dead bytes > 128 MB
    {small_file_threshold, 10485760},  % File is < 10 MB

If a data file meets any of these thresholds it will be included in the
merge process. The small_file_threshold means that any inactive data file
smaller than 10 MB will be included in the merge process.
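
For reference, here is a rough sketch of how these settings might sit in the
bitcask section of app.config. The three threshold values are the defaults
listed in this message and frag_merge_trigger is the default of 60 mentioned
earlier in the thread. The max_file_size of 100 MB is purely an illustrative
assumption, not a recommendation, so pick a value that suits your write
volume.

```erlang
%% app.config (fragment): bitcask section only, other sections omitted
{bitcask, [
    %% ASSUMPTION: illustrative value (100 MB, in bytes); the default is 2 GB.
    %% A smaller value closes the active file sooner, so its data becomes
    %% eligible for merging sooner.
    {max_file_size, 104857600},

    %% Default quoted earlier in the thread
    {frag_merge_trigger, 60},

    %% Thresholds that decide which inactive files join a merge
    {frag_threshold, 40},               %% >= 40% fragmentation
    {dead_bytes_threshold, 134217728},  %% dead bytes > 128 MB
    {small_file_threshold, 10485760}    %% file is < 10 MB
]}
```

The active data file is never considered, whatever its fragmentation, so
these thresholds only apply once a file has been closed.
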
Thanks,
Dan
Daniel Reverri
Developer Advocate
Basho Technologies, Inc.
[email protected]
On Wed, Sep 14, 2011 at 9:50 AM, Jeremy Raymond <[email protected]> wrote:
> Ok, thanks I'll give that a try.
>
> What does small_file_threshold do then?
>
> - Jeremy
>
>
>
> On Wed, Sep 14, 2011 at 12:48 PM, Dan Reverri <[email protected]> wrote:
>
>> Hi Jeremy,
>>
>> The max_file_size parameter controls when Bitcask will close the currently
>> active data file and start a new data file. The active data file will not be
>> considered when determining if a merge should occur. The default
>> max_file_size is 2GB. This means that each partition in the system can grow
>> to 2GB before the data files are considered for merging. This is likely
>> what you are seeing in your situation.
>>
>> You can lower the max_file_size in the app.config file under the bitcask
>> section. This parameter should be specified in bytes.
>>
>> This article is related to your issue:
>>
>> https://help.basho.com/entries/20141178-why-does-it-seem-that-bitcask-merging-is-only-triggered-when-a-riak-node-is-restarted
>>
>> Thanks,
>> Dan
>>
>> Daniel Reverri
>> Developer Advocate
>> Basho Technologies, Inc.
>> [email protected]
>>
>>
>>
>> On Wed, Sep 14, 2011 at 8:49 AM, Jeremy Raymond <[email protected]> wrote:
>>
>>> If I'm reading the docs correctly, only files smaller
>>> than small_file_threshold will be included in a merge. So
>>> must small_file_threshold be bigger than max_file_size for a merge to
>>> happen?
>>>
>>> - Jeremy
>>>
>>>
>>>
>>> On Wed, Sep 14, 2011 at 10:23 AM, Jeremy Raymond <[email protected]> wrote:
>>>
>>>> Maybe I just need to tweak the Bitcask parameters to merge more often?
>>>>
>>>> I have approx 17000 keys which get overwritten once an hour. After each
>>>> update the /var/lib/riak/bitcask folder grows by 20 MB (so about 1200 bytes
>>>> per key). With the default frag_merge_trigger at 60 I should get a merge
>>>> every 3 hours as I would have > 60% of the keys being dead? This would also
>>>> meet the default frag_threshold of 40 since > 40% of the keys are dead? I'm
>>>> not seeing the merging happening.
>>>>
>>>> - Jeremy
>>>>
>>>>
>>>>
>>>> On Wed, Sep 14, 2011 at 9:28 AM, Jeremiah Peschka <[email protected]> wrote:
>>>>
>>>>> I would think that the InnoDB backend would be a better backend for the
>>>>> use case you're describing.
>>>>> ---
>>>>> Jeremiah Peschka - Founder, Brent Ozar PLF, LLC
>>>>> Microsoft SQL Server MVP
>>>>>
>>>>> On Sep 14, 2011, at 8:09 AM, Jeremy Raymond wrote:
>>>>>
>>>>> > Hi,
>>>>> >
>>>>> > I store data in Riak whose keys constantly get overwritten with new
>>>>> > data. I'm currently using Bitcask as the back-end and recently noticed the
>>>>> > Bitcask data folder grow to 24GB. After restarting the nodes, which I think
>>>>> > triggered Bitcask merge, the data went down to 96MB. Today the data dirs are
>>>>> > back up to around 500MB. Would an alternate backend better suit this type of
>>>>> > use case where keys are constantly being overwritten?
>>>>> >
>>>>> > - Jeremy
>>>>> > _______________________________________________
>>>>> > riak-users mailing list
>>>>> > [email protected]
>>>>> > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>>>>
>>>>>
>>>>
>>>
>>
>