Dan -

I've got dead_bytes_threshold=5242880 (5M) and dead_bytes_merge_trigger=10242880. My bitcask *.data files are 250-ish MB in size:

root@ha2:/data/riaksearch/bitcask/1027618338748291114361965898003636498195577569280# ls -lah
total 771M
drwxr-xr-x  2 riak riak 4.0K 2011-06-12 01:08 .
drwxr-xr-x 34 riak riak 4.0K 2011-06-12 01:10 ..
-rw-------  1 riak riak 229M 2011-06-08 13:11 1307415077.bitcask.data
-rw-r--r--  1 riak riak 4.3M 2011-06-08 13:11 1307415077.bitcask.hint
-rw-------  1 riak riak 276M 2011-06-10 13:30 1307562153.bitcask.data
-rw-r--r--  1 riak riak 5.1M 2011-06-10 13:30 1307562153.bitcask.hint
-rw-------  1 riak riak 1.4M 2011-06-08 13:45 1307562333.bitcask.data
-rw-r--r--  1 riak riak  27K 2011-06-08 13:45 1307562333.bitcask.hint
-rw-------  1 riak riak 246M 2011-06-13 15:34 1307862506.bitcask.data
-rw-r--r--  1 riak riak 9.4M 2011-06-13 15:34 1307862506.bitcask.hint
-rw-------  1 riak riak  107 2011-06-12 01:08 bitcask.write.lock

I'm pretty sure that 50% or more of the data in these files should've aged-off by now and the merge trigger should've happened. The article shows why merges happen when a restart is done, but it doesn't really explain why merges don't happen at normal runtime.

I really don't want to restart riak every day to merge files.

Q: What are some good trigger settings for my use case?

I want to collect and store 1 day's worth of tweets from the twitter spritzer feed and have the data files auto-merge once in a while (once a day or more frequently) when 10% of the data in them is 'dead' (i.e., the tweets expire after 1 day).

- Steve

--
Steve Webb - Senior System Administrator for gnip.com
http://twitter.com/GnipWebb

On Mon, 13 Jun 2011, Dan Reverri wrote:

Hi Steve,

This Knowledge Base article may be related:
https://help.basho.com/entries/20141178-why-does-it-seem-that-bitcask-merging-is-only-triggered-when-a-riak-node-is-restarted

Thanks,
Dan

Daniel Reverri
Developer Advocate
Basho Technologies, Inc.
[email protected]


On Mon, Jun 13, 2011 at 10:25 AM, Steve Webb <[email protected]> wrote:

Justin -

My current bitcask settings are:

 %% Bitcask Config
 {bitcask, [
            {data_root, "/var/lib/riaksearch/bitcask" },
            {dead_bytes_merge_trigger, 10242880 },
            {dead_bytes_threshold, 5242880 },
            {expiry_secs, 86400}
          ]},

My understanding of these settings is that the data should auto-expire
after one day.  Also, once any bitcask file in
.../riaksearch/bitcask/xxx/*.data has 10M of "dead" or expired data
in it, it should be merged, right?
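
As a quick sanity check on the two byte values in the config above, here is a minimal Python sketch. It assumes "M" means mebibytes (1024 * 1024 bytes); the helper name to_mib is made up for illustration.

```python
MIB = 1024 * 1024  # assumption: "M" in this thread means mebibytes

def to_mib(n_bytes):
    """Convert a byte count to mebibytes."""
    return n_bytes / MIB

dead_bytes_threshold = 5242880       # value from the config above
dead_bytes_merge_trigger = 10242880  # value from the config above

print(to_mib(dead_bytes_threshold))                # 5.0
print(round(to_mib(dead_bytes_merge_trigger), 2))  # 9.77
print(10 * MIB)                                    # 10485760
```

Under that assumption the threshold is exactly 5M, while the trigger is a little under 10M; an exact 10M would be 10485760.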

I'm collecting the spritzer twitter stream and loading it into two buckets
(one non-indexed bucket holds the full tweet, one indexed bucket holds the
tweet string, id, date and username).  I used to see about 10 GB of data
total, but it's growing and currently at 26GB of data total.

I'm seeing these in the logs:

=INFO REPORT==== 13-Jun-2011::08:28:19 ===
Pid <0.6844.0> compacted 3 segments for 942232 bytes in 4.900694 seconds,
0.18 MB/sec

=INFO REPORT==== 13-Jun-2011::08:29:01 ===
Pid <0.6267.0> compacted 3 segments for 1721790 bytes in 9.690511 seconds,
0.17 MB/sec

=INFO REPORT==== 13-Jun-2011::08:31:23 ===
Pid <0.6924.0> compacted 3 segments for 6988416 bytes in 44.659753 seconds,
0.15 MB/sec

... but I'm not seeing any "merging" related entries.


- Steve

--
Steve Webb - Senior System Administrator for gnip.com
http://twitter.com/GnipWebb

On Wed, 8 Jun 2011, Justin Sheehy wrote:

 Hi, Steve.

Check out this page:
http://wiki.basho.com/Bitcask-Configuration.html#Disk-Usage-and-Merging-Settings

Basically, a "merge trigger" must be met in order to have the merge
process occur.  When it does occur, it will affect all existing files that
meet a "merge threshold."

One note that is relevant for your specific use: the expiry_secs parameter
will cause a given item to disappear from the client API immediately after
expiry, and to be cleaned if it is in a file already being merged, but will
not currently contribute toward merge triggers or thresholds on its own if
not otherwise "dead".
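
The trigger/threshold distinction and the expiry caveat above can be sketched as a toy model in Python. This is not Bitcask's actual code: DataFile, plan_merge, the file names and the byte counts are all hypothetical, the "greater than" comparison is an assumption, and only the two dead-bytes settings from this thread are modelled (the fragmentation-based settings such as frag_merge_trigger and frag_threshold are left out).

```python
from dataclasses import dataclass

MIB = 1024 * 1024

@dataclass
class DataFile:
    name: str
    dead_bytes: int     # entries that were overwritten or deleted
    expired_bytes: int  # entries past expiry_secs but never overwritten or deleted

def plan_merge(files, dead_bytes_merge_trigger, dead_bytes_threshold):
    """Return the names of the files a merge would include, or [] if no merge runs."""
    # Step 1: a merge only starts if at least one file meets the trigger.
    # Expired bytes are deliberately ignored here, per the caveat above.
    triggered = any(f.dead_bytes > dead_bytes_merge_trigger for f in files)
    if not triggered:
        return []
    # Step 2: once triggered, every file that meets the threshold is included.
    return [f.name for f in files if f.dead_bytes > dead_bytes_threshold]

trigger = 10242880   # dead_bytes_merge_trigger from this thread
threshold = 5242880  # dead_bytes_threshold from this thread

# Lots of expired data, nothing overwritten or deleted: no merge is triggered.
only_expired = [DataFile("a.data", 0, 150 * MIB), DataFile("b.data", 0, 200 * MIB)]
print(plan_merge(only_expired, trigger, threshold))  # []

# One file exceeds the trigger, so every file over the threshold is merged.
mixed = [
    DataFile("a.data", 12 * MIB, 0),
    DataFile("b.data", 6 * MIB, 0),
    DataFile("c.data", 1 * MIB, 0),
]
print(plan_merge(mixed, trigger, threshold))  # ['a.data', 'b.data']
```

In this model, a write-once workload that relies only on expiry_secs never accumulates dead bytes, so neither the trigger nor the threshold is ever met, no matter how low they are set.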

-Justin


On Jun 7, 2011, at 4:29 PM, Steve Webb wrote:

 Hello there.


I'm curious - I'm up to about 10GB of storage and I'm guessing that I'll
be full in 3-4 more days of ingesting data.  I have no idea if/when a merge
will run to expire the older data.

I'm loading a 2-node (1GB mem, 20GB storage, vmware VMs) riaksearch
cluster with the spritzer twitter feed.  I used the bitcask 'expiry_secs' to
expire data after 3 days. Q: Is there a method or command to force a merge
at any time? Q: Is there a way to run a merge when the storage size reaches
a specific threshold?


- Steve

--
Steve Webb - Senior System Administrator for gnip.com
http://twitter.com/GnipWebb

_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


