Hi Gordon,
I have limited knowledge of configuring Innostore but can help answer some
of your merge_index questions.
The most important merge_index setting in terms of memory usage is
'buffer_rollover_size'. This sets how large the in-memory buffer is allowed
to grow, in bytes, before it is converted to an on-disk segment. Each
partition maintains a separate buffer, so any increase to this number is
multiplied by the number of partitions in your system. The higher this
number, the less frequently merge_index will need to perform compactions.
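To make the multiplication concrete, here is a rough sketch in Python. The partition count and buffer size are made-up example values, not recommendations, and the function name is just something I picked for illustration. It treats every buffer as completely full, so read the result as an approximate upper bound on buffer memory.

```python
def total_buffer_bytes(num_partitions: int, buffer_rollover_size: int) -> int:
    """Approximate upper bound on memory held by merge_index buffers.

    Assumes one buffer per partition, each allowed to grow to
    buffer_rollover_size bytes before rolling over to a segment.
    """
    return num_partitions * buffer_rollover_size


# Hypothetical example: 64 partitions, 1 MiB rollover size.
example = total_buffer_bytes(num_partitions=64, buffer_rollover_size=1024 * 1024)
print(example)  # 67108864 bytes, i.e. 64 MiB
```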
The next most important settings for memory usage are
'segment_full_read_size' and 'max_compact_segments', taken together. During
compaction, the system reads any segment smaller than the
'segment_full_read_size' value entirely into memory. This should generally
be as large as or larger than 'buffer_rollover_size'. The higher this
number, the quicker each compaction will be. 'max_compact_segments' is the
maximum number of segments to compact at one time. The higher this number,
the more segments merge_index can involve in each compaction. In the worst
case, a compaction could take ('segment_full_read_size' *
'max_compact_segments') bytes of RAM.
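As a quick illustration of that worst-case figure, the arithmetic looks like this in Python. The values below are hypothetical, chosen only to show the calculation, and the function name is my own.

```python
def worst_case_compaction_bytes(segment_full_read_size: int,
                                max_compact_segments: int) -> int:
    """Worst-case RAM for one compaction.

    Assumes every segment in the compaction is just under
    segment_full_read_size and is therefore read fully into memory.
    """
    return segment_full_read_size * max_compact_segments


# Hypothetical example: 5 MiB full-read size, 20 segments per compaction.
worst = worst_case_compaction_bytes(5 * 1024 * 1024, 20)
print(worst)  # 104857600 bytes, i.e. 100 MiB
```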
The rest of the settings have a much smaller impact on performance and
memory usage, and exist mainly for tweaking and special cases.
This is a completely unscientific estimate based on observing other Riak
Search applications, but I'd set buffer_rollover_size so that (# Partitions
* buffer_rollover_size) is about one-half the memory you want merge_index
to consume, with buffer_rollover_size itself hopefully landing somewhere
between 1M and 10M. The rest of the memory will be used by in-memory offset
tables, by compaction processes, and during query operations.
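Turning that rule of thumb around, a small Python sketch can back out a starting value for buffer_rollover_size from a memory budget. The budget and partition count here are invented for the example, the helper name is hypothetical, and the result is only a starting point for the same unscientific estimate.

```python
def suggested_buffer_rollover_size(merge_index_budget_bytes: int,
                                   num_partitions: int) -> int:
    """Pick buffer_rollover_size so that all buffers together use
    roughly half of the memory budgeted for merge_index."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    half_budget = merge_index_budget_bytes // 2
    return half_budget // num_partitions


# Hypothetical example: 512 MiB budget for merge_index, 64 partitions.
size = suggested_buffer_rollover_size(512 * 1024 * 1024, 64)
print(size)  # 4194304 bytes, i.e. 4 MiB per partition buffer
```

If the result lands well outside the 1M to 10M range, that is probably a sign to revisit the budget rather than to use the number as is.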
Hope that helps.
Best,
Rusty
On Mon, May 23, 2011 at 2:05 PM, Gordon Tillman wrote:
> Greetings!
>
> We are working with a riaksearch cluster that uses innostore as the primary
> backend in tandem with merge_index that is required by search. From reading
> the Basho wiki it looks like the following are the most important factors
> affecting memory and performance:
>
> • innostore
>   • put data_home_dir and log_group_home_dir on different spindles
>   • noatime
>   • buffer_pool_size
>   • flush_method
> • merge_index
>   • data_root
>   • buffer_rollover_size
>   • max_compact_segments
>   • segment_file_buffer_size
>   • segment_full_read_size
>   • segment_block_size
>
> Ideally, data_home_dir, log_group_home_dir, and data_root would all be on
> different spindles, but if you had just 2 disks available what would you
> recommend? Would it be best to have data_home_dir and data_root on one and
> then log_group_home_dir on the other?
>
> In calculating the proper setting for buffer_pool_size you are directed to
> allocate 60-80 percent of available RAM. So let's assume you want to take
> the remaining 20-40% of available RAM and split it up between innostore and
> merge_index?
>
> Would it be best to give each of them half of that value?
>
> Determining the approximate memory requirements for merge_index isn't (to
> me) real obvious. It looks like the following all have an effect:
>
> * buffer_rollover_size
> * buffer_delayed_write_size
> * max_compact_segments
> * segment_query_read_ahead_size
> * segment_compaction_read_ahead_size
> * segment_full_read_size
> * segment_block_size
> * segment_values_staging_size
>
> Is there a formula for determining the (approximate) proper values to use
> given a certain amount of available RAM?
>
> Thanks in advance for any advice. Sorry for all the questions!
>
> --gordon
>
>
>
> _______________________________________________
> riak-users mailing list
> [email protected]
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>