I'll second Edward's comment. Cassandra is designed to scale horizontally, so if disk I/O is slowing you down then you must scale
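
To put a rough number on Jim's point (quoted below) that smaller flushes mean more rewrites for insert-only data, here is a toy model, not Cassandra's actual code. The function name, the 1024 MB total, and the flush sizes are made up for illustration. The threshold of 4 comes from Jim's "4 of like size", and the model assumes a merged sstable is exactly the sum of its inputs because nothing is ever updated.

```python
def simulate_size_tiered(total_mb, flush_mb, min_threshold=4):
    """Toy size-tiered model for an insert-only workload.

    Returns (flushed_mb, compacted_mb). Assumes every flush has the same
    size and that merging never shrinks the data (no updates, no deletes).
    """
    tiers = [0]          # tiers[i] = sstables currently waiting at tier i
    compacted_mb = 0     # total MB rewritten by compaction
    flushes = total_mb // flush_mb
    for _ in range(flushes):
        tiers[0] += 1
        level = 0
        # Merge whenever a tier holds min_threshold sstables of like size.
        while tiers[level] >= min_threshold:
            tiers[level] -= min_threshold
            compacted_mb += flush_mb * min_threshold ** (level + 1)
            if level + 1 == len(tiers):
                tiers.append(0)
            tiers[level + 1] += 1
            level += 1
    return flushes * flush_mb, compacted_mb


if __name__ == "__main__":
    for flush_mb in (64, 16, 4):  # hypothetical flush sizes
        flushed, compacted = simulate_size_tiered(1024, flush_mb)
        print(f"flush={flush_mb} MB: rewrote {compacted} MB "
              f"({compacted / flushed:.1f}x the data loaded)")
```

In this model, loading 1024 MB with 64 MB flushes rewrites the data 2.0x, with 16 MB flushes 3.0x, and with 4 MB flushes 4.0x. The real strategy buckets sstables by similar size rather than exact tiers, so treat this as a trend only.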
On Tue, Jan 8, 2013 at 7:10 AM, Jim Cistaro <jcist...@netflix.com> wrote:

> One metric to watch is pending compactions (via nodetool
> compactionstats). This count will give you some idea of whether you are
> falling behind with compactions. The other measure is how long you are
> compacting after your inserts have stopped.
>
> If I understand correctly, since you never update the data, that would
> explain why the compaction logging shows 100% of orig. With size-tiered,
> you are flushing small files, compacting when you get 4 of like size, etc.
> Since you have no updates, the compaction will not shrink the data.
>
> As Aaron said, use iostat -x (or dstat) to see if you are taxing the
> disks. If so, then leveled compaction may be your option (for reasons
> already stated). If not taxing the disks, then you might want to increase
> your compaction throughput, as you suggested.
>
> Depending on what version you are using, another thing to possibly tune
> is the size of sstables when flushed to disk. In your case of insert only,
> the smaller the flush size, the more times that row is going to be
> rewritten during a compaction (hence increase I/O).
>
> jc
>
> From: Edward Capriolo <edlinuxg...@gmail.com>
> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Date: Monday, January 7, 2013 2:33 PM
> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Subject: Re: help turning compaction..hours of run to get 0% compaction....
>
> There is some point where you simply need more machines.
>
> On Mon, Jan 7, 2013 at 5:02 PM, Michael Kjellman <mkjell...@barracuda.com> wrote:
>
>> Right, I guess I'm saying that you should try loading your data with
>> leveled compaction and see how your compaction load is.
>>
>> Your work load sounds like leveled will fit much better than size
>> tiered.
>>
>> From: Brian Tarbox <tar...@cabotresearch.com>
>> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>> Date: Monday, January 7, 2013 1:58 PM
>> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>> Subject: Re: help turning compaction..hours of run to get 0% compaction....
>>
>> The problem I see is that it already takes me more than 24 hours just
>> to load my data, during which time the logs say I'm spending tons of time
>> doing compaction. For example, in the last 72 hours I've consumed *20
>> hours* per machine on compaction.
>>
>> Can I conclude from that that I should be (perhaps drastically)
>> increasing my compaction_throughput_mb_per_sec on the theory that I'm
>> getting behind?
>>
>> The fact that it takes me 3 days or more to run a test means it's hard
>> to just play with values and see what works best, so I'm trying to
>> understand the behavior in detail.
>>
>> Thanks.
>>
>> Brian
>>
>>
>> On Mon, Jan 7, 2013 at 4:13 PM, Michael Kjellman <mkjell...@barracuda.com> wrote:
>>
>>> http://www.datastax.com/dev/blog/when-to-use-leveled-compaction
>>>
>>> "If you perform at least twice as many reads as you do writes, leveled
>>> compaction may actually save you disk I/O, despite consuming more I/O for
>>> compaction. This is especially true if your reads are fairly random and
>>> don’t focus on a single, hot dataset."
>>>
>>> From: Brian Tarbox <tar...@cabotresearch.com>
>>> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>>> Date: Monday, January 7, 2013 12:56 PM
>>> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>>> Subject: Re: help turning compaction..hours of run to get 0% compaction....
>>>
>>> I have not specified leveled compaction so I guess I'm defaulting to
>>> size tiered? My data (in the column family causing the trouble) is
>>> insert-once, read-many, update-never.
>>>
>>> Brian
>>>
>>>
>>> On Mon, Jan 7, 2013 at 3:13 PM, Michael Kjellman <mkjell...@barracuda.com> wrote:
>>>
>>>> Size tiered or leveled compaction?
>>>>
>>>> From: Brian Tarbox <tar...@cabotresearch.com>
>>>> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>>>> Date: Monday, January 7, 2013 12:03 PM
>>>> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>>>> Subject: help turning compaction..hours of run to get 0% compaction....
>>>>
>>>> I have a column family where I'm doing 500 inserts/sec for 12 hours
>>>> or so at a time. At some point my performance falls off a cliff due to
>>>> time spent doing compactions.
>>>>
>>>> I'm seeing row after row of logs saying that after 1 or 2 hours of
>>>> compacting, the data was reduced to 100% or 99% of the original.
>>>>
>>>> I'm trying to understand what direction this data points me to in
>>>> terms of configuration changes.
>>>>
>>>> a) increase my compaction_throughput_mb_per_sec because I'm
>>>> falling behind (am I falling behind?)
>>>>
>>>> b) enable multi-threaded compaction?
>>>>
>>>> Any help is appreciated.
>>>>
>>>> Brian
>>>>
>>>> ----------------------------------
>>>> Join Barracuda Networks in the fight against hunger.
>>>> To learn how you can help in your community, please visit:
>>>> http://on.fb.me/UAdL4f
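
On the "am I falling behind?" question in the original post, one rough way to track Jim's suggested metric is to sample the pending count over time and see whether it keeps climbing. The sketch below only parses text you hand it; the helper names are hypothetical, and the "pending tasks: N" line format is an assumption about what nodetool compactionstats prints, so check it against the output of your version.

```python
import re


def pending_compactions(compactionstats_output):
    """Extract the pending-task count from captured compactionstats text.

    Assumes a line of the form 'pending tasks: N'. Returns None if no
    such line is found.
    """
    match = re.search(r"pending tasks:\s*(\d+)", compactionstats_output)
    return int(match.group(1)) if match else None


def is_falling_behind(samples, min_samples=3):
    """True if the last min_samples pending counts are strictly increasing."""
    recent = samples[-min_samples:]
    if len(recent) < min_samples:
        return False
    return all(a < b for a, b in zip(recent, recent[1:]))


if __name__ == "__main__":
    # Hypothetical captured outputs, e.g. one sample every few minutes.
    captured = ["pending tasks: 4\n", "pending tasks: 9\n", "pending tasks: 17\n"]
    counts = [pending_compactions(text) for text in captured]
    print(counts, "falling behind:", is_falling_behind(counts))
```

A count that keeps rising during the load, together with iostat showing the disks are not saturated, would point toward raising compaction throughput. A rising count with saturated disks points back to more machines or a different compaction strategy.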