Ahh, OK, I get it now... if _any_ of these thresholds are met _and_ the files
are not active (i.e. they have reached max_file_size and been closed),
they'll be merged. Thanks!

- Jeremy


On Wed, Sep 14, 2011 at 3:16 PM, Dan Reverri <[email protected]> wrote:

> At any point in time Bitcask may have data spread across a number of data
> files. Bitcask occasionally runs a merge process which reads the data from
> those files and writes a merged set of data to a new file. Once completed
> the old files can be removed and the new file is used for future read
> operations.
>
> The merge process chooses which files to merge based on a number of
> thresholds. The thresholds are:
>
> {frag_threshold, 40}, % >= 40% fragmentation
> {dead_bytes_threshold, 134217728}, % Dead bytes > 128 MB
> {small_file_threshold, 10485760}, % File is < 10 MB
>
> If an inactive data file crosses any of these thresholds it will be
> included in the merge process. The small_file_threshold means that any
> inactive data file smaller than 10 MB will be included in the merge.
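>
> For illustration only, here is roughly how those thresholds (together with
> frag_merge_trigger, which came up earlier in this thread) would sit in the
> bitcask section of app.config; the values shown are just the defaults and
> the rest of the file is omitted:
>
> {bitcask, [
>     {data_root, "/var/lib/riak/bitcask"},
>     {frag_merge_trigger, 60},          %% start a merge at >= 60% fragmentation
>     {frag_threshold, 40},              %% include files with >= 40% fragmentation
>     {dead_bytes_threshold, 134217728}, %% include files with > 128 MB of dead bytes
>     {small_file_threshold, 10485760}   %% include inactive files smaller than 10 MB
> ]},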
>
> Thanks,
> Dan
>
>
> Daniel Reverri
> Developer Advocate
> Basho Technologies, Inc.
> [email protected]
>
>
>> On Wed, Sep 14, 2011 at 9:50 AM, Jeremy Raymond <[email protected]> wrote:
>
>> Ok, thanks I'll give that a try.
>>
>> What does small_file_threshold do then?
>>
>> - Jeremy
>>
>>
>>
>> On Wed, Sep 14, 2011 at 12:48 PM, Dan Reverri <[email protected]> wrote:
>>
>>> Hi Jeremy,
>>>
>>> The max_file_size parameter controls when Bitcask will close the
>>> currently active data file and start a new one. The active data file is
>>> not considered when determining whether a merge should occur. The default
>>> max_file_size is 2 GB, which means each partition's active file can grow
>>> to 2 GB before it is closed and becomes eligible for merging. This is
>>> likely what you are seeing in your situation.
>>>
>>> You can lower the max_file_size in the app.config file under the bitcask
>>> section. This parameter should be specified in bytes.
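>>>
>>> For example (the 100 MB figure below is only an illustration, not a
>>> recommended value), the change in the bitcask section might look like:
>>>
>>> {bitcask, [
>>>     {data_root, "/var/lib/riak/bitcask"},
>>>     {max_file_size, 104857600}  %% 100 MB instead of the 2 GB default
>>> ]},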
>>>
>>> This article is related to your issue:
>>>
>>> https://help.basho.com/entries/20141178-why-does-it-seem-that-bitcask-merging-is-only-triggered-when-a-riak-node-is-restarted
>>>
>>> Thanks,
>>> Dan
>>>
>>> Daniel Reverri
>>> Developer Advocate
>>> Basho Technologies, Inc.
>>> [email protected]
>>>
>>>
>>>
>>> On Wed, Sep 14, 2011 at 8:49 AM, Jeremy Raymond <[email protected]> wrote:
>>>
>>>> If I'm reading the docs correctly, only files smaller than
>>>> small_file_threshold will be included in a merge. So does
>>>> small_file_threshold need to be bigger than max_file_size for a merge to
>>>> happen?
>>>>
>>>>  - Jeremy
>>>>
>>>>
>>>>
>>>> On Wed, Sep 14, 2011 at 10:23 AM, Jeremy Raymond <[email protected]> wrote:
>>>>
>>>>> Maybe I just need to tweak the Bitcask parameters to merge more often?
>>>>>
>>>>> I have approx 17000 keys which get overwritten once an hour. After each
>>>>> update the /var/lib/riak/bitcask folder grows by 20 MB (so about 1200
>>>>> bytes per key). With the default frag_merge_trigger at 60 I should get a
>>>>> merge every 3 hours, as I would have > 60% of the keys dead? That would
>>>>> also meet the default frag_threshold of 40, since > 40% of the keys are
>>>>> dead? I'm not seeing the merges happen.
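>>>>>
>>>>> As a quick back-of-the-envelope check of those figures (the numbers are
>>>>> the ones above; the variable names are just for illustration), in an
>>>>> Erlang shell:
>>>>>
>>>>> Keys = 17000, BytesPerKey = 1200,
>>>>> GrowthPerUpdate = Keys * BytesPerKey,  %% 20,400,000 bytes, ~20 MB per update
>>>>> %% after N hourly overwrites of the same keys, roughly (N-1)/N of the
>>>>> %% entries on disk are dead, so after 3 updates:
>>>>> DeadFraction = (3 - 1) / 3.            %% ~0.67, i.e. > 60% fragmentation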
>>>>>
>>>>> - Jeremy
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Sep 14, 2011 at 9:28 AM, Jeremiah Peschka <[email protected]> wrote:
>>>>>
>>>>>> I would think that the InnoDB backend would be a better fit for the
>>>>>> use case you're describing.
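>>>>>>
>>>>>> If you want to experiment with that, my understanding is that it comes
>>>>>> down to pointing storage_backend at innostore in the riak_kv section of
>>>>>> app.config, along these lines (check the innostore docs for the exact
>>>>>> settings it needs):
>>>>>>
>>>>>> {riak_kv, [
>>>>>>     %% the default is riak_kv_bitcask_backend
>>>>>>     {storage_backend, riak_kv_innostore_backend}
>>>>>> ]},
>>>>>>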
>>>>>> ---
>>>>>> Jeremiah Peschka - Founder, Brent Ozar PLF, LLC
>>>>>> Microsoft SQL Server MVP
>>>>>>
>>>>>> On Sep 14, 2011, at 8:09 AM, Jeremy Raymond wrote:
>>>>>>
>>>>>> > Hi,
>>>>>> >
>>>>>> > I store data in Riak whose keys constantly get overwritten with new
>>>>>> > data. I'm currently using Bitcask as the back-end and recently noticed
>>>>>> > the Bitcask data folder grow to 24 GB. After restarting the nodes,
>>>>>> > which I think triggered a Bitcask merge, the data went down to 96 MB.
>>>>>> > Today the data dirs are back up to around 500 MB. Would an alternate
>>>>>> > backend better suit this type of use case where keys are constantly
>>>>>> > being overwritten?
>>>>>> >
>>>>>> > - Jeremy
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>
>
_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
