I'll second Edward's comment. Cassandra is designed to scale horizontally,
so if disk I/O is slowing you down, then you need to scale out.


On Tue, Jan 8, 2013 at 7:10 AM, Jim Cistaro <jcist...@netflix.com> wrote:

>  One metric to watch is pending compactions (via nodetool
> compactionstats).  This count will give you some idea of whether you are
> falling behind with compactions.  The other measure is how long you are
> compacting after your inserts have stopped.
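One way to watch this over a run is to sample `nodetool compactionstats` at a fixed interval and compare the counts. This is a minimal sketch that assumes the output contains a line like `pending tasks: N` (the exact wording can differ between Cassandra versions). `pending_tasks` and `is_falling_behind` are hypothetical helper names, and collecting the samples, for example by running nodetool once a minute, is left to you.

```python
import re

# Assumption: compactionstats prints a line such as "pending tasks: 7";
# the wording may vary between Cassandra versions.
PENDING_RE = re.compile(r"pending tasks:\s*(\d+)", re.IGNORECASE)


def pending_tasks(compactionstats_output: str) -> int:
    """Extract the pending-compaction count from compactionstats text."""
    match = PENDING_RE.search(compactionstats_output)
    if match is None:
        raise ValueError("no 'pending tasks' line found")
    return int(match.group(1))


def is_falling_behind(samples: list[int]) -> bool:
    """True if the pending count grew over the sampling window.

    `samples` are pending-task counts taken at regular intervals
    while the insert load is running.
    """
    return len(samples) >= 2 and samples[-1] > samples[0]
```

A count that keeps climbing during the load, and keeps compactions running long after inserts stop, suggests compaction is not keeping up.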
>
>  If I understand correctly, since you never update the data, that would
> explain why the compaction logging shows 100% of orig.  With size-tiered,
> you are flushing small files, compacting when you get 4 of like size, etc.
>  Since you have no updates, the compaction will not shrink the data.
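Why insert-only data never shrinks can be seen with a toy model: treat each sstable as a dict of key -> (timestamp, value), and let compaction keep only the newest version of each key. This is a simplification (real compaction also handles tombstones, TTLs and column-level merging), and the function names are illustrative only.

```python
def compact(sstables):
    """Merge sstables modelled as dicts of key -> (timestamp, value).

    For each key the version with the highest timestamp wins;
    tombstones and TTLs are ignored in this toy model.
    """
    merged = {}
    for table in sstables:
        for key, (ts, value) in table.items():
            if key not in merged or ts > merged[key][0]:
                merged[key] = (ts, value)
    return merged


def size_ratio(sstables):
    """Fraction of input entries that survive compaction (1.0 means 100%)."""
    total_in = sum(len(t) for t in sstables)
    return len(compact(sstables)) / total_in
```

With disjoint keys (insert once, never update) nothing is discarded, so `size_ratio([{"a": (1, "x")}, {"b": (2, "y")}])` is 1.0; only overwrites of the same key, as in `size_ratio([{"a": (1, "x")}, {"a": (2, "y")}])`, bring it below 1.0 (here 0.5).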
>
>  As Aaron said, use iostat -x (or dstat) to see if you are taxing the
> disks.  If so, then leveled compaction may be your option (for reasons
> already stated).  If not taxing the disks, then you might want to increase
> your compaction throughput, as you suggested.
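The decision above can be sketched as reading `%util` per device from one `iostat -x` report and branching on it. This assumes a header line starting with "Device" that contains a `%util` column (column order differs between sysstat versions, so the index is looked up rather than hard-coded). The 80% cutoff is an arbitrary assumption, not a recommended value, and both function names are hypothetical.

```python
def disk_utilisation(iostat_text: str) -> dict[str, float]:
    """Map device name -> %util from one `iostat -x` report."""
    util = {}
    util_col = None
    for line in iostat_text.splitlines():
        fields = line.split()
        if not fields:
            util_col = None  # a blank line ends the device block
            continue
        if fields[0].startswith("Device"):
            util_col = fields.index("%util")  # raises if the column is missing
            continue
        if util_col is not None and len(fields) > util_col:
            util[fields[0]] = float(fields[util_col])
    return util


def compaction_advice(util_by_device: dict[str, float], busy_pct: float = 80.0) -> str:
    """Apply the rule of thumb from the message above (busy_pct is assumed)."""
    busiest = max(util_by_device.values())  # raises on an empty dict
    if busiest >= busy_pct:
        return "disks saturated: consider leveled compaction (or more nodes)"
    return "disks have headroom: consider raising compaction throughput"
```

Lines from the `avg-cpu` block that precede the device table are skipped because no `%util` column has been seen yet.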
>
>  Depending on what version you are using, another thing to possibly tune
> is the size of sstables when flushed to disk.  In your case of insert only,
> the smaller the flush size, the more times each row is going to be
> rewritten by successive compactions (hence increased I/O).
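The effect of flush size can be estimated with an idealised size-tiered model: every flush produces one sstable of `flush_mb`, and whenever `min_threshold` (4, as described above) sstables sit in the same tier they are merged into one sstable in the next tier. Real size-tiered compaction buckets sstables of *similar* rather than identical size, so treat this hypothetical function as a rough illustration only.

```python
def compaction_bytes_written(total_mb: int, flush_mb: int, min_threshold: int = 4) -> int:
    """MB written by compaction while loading total_mb of insert-only data."""
    tiers = {}  # tier index -> number of sstables currently in that tier
    written = 0
    for _ in range(total_mb // flush_mb):
        tier = 0
        tiers[tier] = tiers.get(tier, 0) + 1  # a new flushed sstable
        # Cascade: merging one tier may fill up the next one.
        while tiers.get(tier, 0) >= min_threshold:
            tiers[tier] -= min_threshold
            written += flush_mb * min_threshold ** (tier + 1)  # merged output size
            tier += 1
            tiers[tier] = tiers.get(tier, 0) + 1
    return written
```

Loading 64 MB with 1 MB flushes gives `compaction_bytes_written(64, 1) == 192` (every byte rewritten three times), while 4 MB flushes give `compaction_bytes_written(64, 4) == 128` (twice): one fewer tier means one fewer rewrite of each row.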
>
>  jc
>
>   From: Edward Capriolo <edlinuxg...@gmail.com>
> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Date: Monday, January 7, 2013 2:33 PM
>
> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Subject: Re: help turning compaction..hours of run to get 0%
> compaction....
>
>  There is some point where you simply need more machines.
>
> On Mon, Jan 7, 2013 at 5:02 PM, Michael Kjellman <mkjell...@barracuda.com> wrote:
>
>>  Right, I guess I'm saying that you should try loading your data with
>> leveled compaction and see how your compaction load is.
>>
>>  Your workload sounds like it will fit leveled much better than size
>> tiered.
>>
>>   From: Brian Tarbox <tar...@cabotresearch.com>
>> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>> Date: Monday, January 7, 2013 1:58 PM
>> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>> Subject: Re: help turning compaction..hours of run to get 0%
>> compaction....
>>
>>  The problem I see is that it already takes me more than 24 hours just
>> to load my data, during which time the logs say I'm spending tons of time
>> doing compaction.  For example, in the last 72 hours I've consumed *20
>> hours* per machine on compaction.
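For scale, those figures work out to a bit over a quarter of the window. This assumes the 20 hours are not double-counted from compactions running concurrently on the same machine.

```python
window_hours = 72      # observation window reported above
compaction_hours = 20  # compaction time per machine in that window
share = compaction_hours / window_hours
print(f"{share:.1%} of the window spent compacting")  # prints "27.8% of the window spent compacting"
```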
>>
>>  Can I conclude from that that I should be (perhaps drastically)
>> increasing my compaction_throughput_mb_per_sec on the theory that I'm
>> falling behind?
>>
>>  The fact that it takes me 3 days or more to run a test means it's hard
>> to just play with values and see what works best, so I'm trying to
>> understand the behavior in detail.
>>
>>  Thanks.
>>
>>  Brian
>>
>>
>> On Mon, Jan 7, 2013 at 4:13 PM, Michael Kjellman <mkjell...@barracuda.com> wrote:
>>
>>>  http://www.datastax.com/dev/blog/when-to-use-leveled-compaction
>>>
>>>  "If you perform at least twice as many reads as you do writes, leveled
>>> compaction may actually save you disk I/O, despite consuming more I/O for
>>> compaction. This is especially true if your reads are fairly random and
>>> don’t focus on a single, hot dataset."
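The rule of thumb in that quote can be written as a tiny check. It is only a heuristic (the post itself says it holds best when reads are fairly random), and the function name is hypothetical; the returned strings are Cassandra's compaction strategy class names.

```python
def suggest_strategy(reads_per_sec: float, writes_per_sec: float) -> str:
    """Suggest leveled when reads are at least twice writes, else size-tiered."""
    if reads_per_sec >= 2 * writes_per_sec:
        return "LeveledCompactionStrategy"
    return "SizeTieredCompactionStrategy"
```

For example, `suggest_strategy(1000, 500)` suggests leveled, while `suggest_strategy(600, 500)` stays with size-tiered.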
>>>
>>>   From: Brian Tarbox <tar...@cabotresearch.com>
>>> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>>>  Date: Monday, January 7, 2013 12:56 PM
>>> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>>> Subject: Re: help turning compaction..hours of run to get 0%
>>> compaction....
>>>
>>>  I have not specified leveled compaction, so I guess I'm defaulting to
>>> size-tiered?  My data (in the column family causing the trouble) is
>>> insert-once, read-many, update-never.
>>>
>>>  Brian
>>>
>>>
>>> On Mon, Jan 7, 2013 at 3:13 PM, Michael Kjellman <mkjell...@barracuda.com> wrote:
>>>
>>>>  Size tiered or leveled compaction?
>>>>
>>>>   From: Brian Tarbox <tar...@cabotresearch.com>
>>>> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>>>> Date: Monday, January 7, 2013 12:03 PM
>>>> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>>>> Subject: help turning compaction..hours of run to get 0% compaction....
>>>>
>>>>  I have a column family where I'm doing 500 inserts/sec for 12 hours
>>>> or so at a time.  At some point my performance falls off a cliff due to
>>>> time spent doing compactions.
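To put that load in numbers, a single 12-hour run at that rate is about 21.6 million inserts:

```python
inserts_per_sec = 500
hours = 12
total_inserts = inserts_per_sec * hours * 3600
print(f"{total_inserts:,} inserts per run")  # prints "21,600,000 inserts per run"
```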
>>>>
>>>>  I'm seeing row after row of logs saying that after 1 or 2 hours of
>>>> compacting, the data was reduced only to 100% or 99% of the original.
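Rather than eyeballing the log, the ratios can be pulled out and counted. This sketch assumes each compaction log line contains a fragment like `(~99% of original)`; the surrounding wording differs between versions, the sample lines in the test are made up, and the 95% threshold is an arbitrary assumption.

```python
import re

# Assumption: compaction log lines include "(~NN% of original)".
RATIO_RE = re.compile(r"\(~(\d+)% of original\)")


def compaction_ratios(log_text: str) -> list[int]:
    """Return the '% of original' figure from each compaction log line."""
    return [int(m.group(1)) for m in RATIO_RE.finditer(log_text)]


def share_not_shrinking(ratios: list[int], threshold: int = 95) -> float:
    """Fraction of compactions whose output was >= threshold% of the input."""
    if not ratios:
        return 0.0
    return sum(r >= threshold for r in ratios) / len(ratios)
```

For insert-only data a share close to 1.0 is expected and is not by itself a sign of misconfiguration.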
>>>>
>>>>  I'm trying to understand what direction this data points me in,
>>>> in terms of configuration changes.
>>>>
>>>>     a) increase my compaction_throughput_mb_per_sec because I'm
>>>> falling behind (am I falling behind?)
>>>>
>>>>     b) enable multi-threaded compaction?
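On option (a): since compaction_throughput_mb_per_sec caps compaction I/O, it puts a lower bound on how long compaction takes to rewrite a given amount of data once. The sketch below is a back-of-the-envelope estimate; the 100 GB figure is hypothetical, 16 MB/s is the shipped default, and a setting of 0 disables throttling, so no bound applies there.

```python
def min_compaction_hours(data_gb: float, throughput_mb_per_sec: float) -> float:
    """Lower bound on hours to rewrite data_gb once under the throttle."""
    if throughput_mb_per_sec <= 0:
        raise ValueError("0 disables throttling; no bound applies")
    return data_gb * 1024 / throughput_mb_per_sec / 3600
```

`min_compaction_hours(100, 16)` is about 1.78 hours per full rewrite, and doubling the throttle to 32 halves that bound. Under size-tiered, each row may be rewritten several times, so the real total is a multiple of this, provided the disks can actually sustain the higher rate.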
>>>>
>>>>  Any help is appreciated.
>>>>
>>>>  Brian
>>>>
>>>> ----------------------------------
>>>> Join Barracuda Networks in the fight against hunger.
>>>> To learn how you can help in your community, please visit:
>>>> http://on.fb.me/UAdL4f
>>>>
>>>
>>>
>>>
>>
>>
>>
>
>
