Very even. Will answer Aaron's email...
Will upgrade to 0.8 too!

On 7 June 2011 at 13:09, "Terje Marthinussen" <tmarthinus...@gmail.com> wrote:

> If you run iostat with output every few seconds, is the I/O stable, or do
> you see very uneven I/O?
>
> Regards,
> Terje
>
> On Tue, Jun 7, 2011 at 11:12 AM, aaron morton <aa...@thelastpickle.com> wrote:
>
>> There is a big IO queue, and reads are spending a lot of time in the queue.
>>
>> Some more questions:
>> - what version are you on?
>> - what is the concurrent_reads config setting?
>> - what is nodetool tpstats showing during the slowdown?
>> - exactly how much data are you asking for? how many rows, and what sort
>>   of slice?
>> - has there been a lot of deletes or TTL columns used?
>>
>> Hope that helps.
>> Aaron
>>
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 7 Jun 2011, at 10:09, Philippe wrote:
>>
>> Ok, here it goes again... No swapping at all...
>>
>> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>>  r  b   swpd   free   buff   cache   si   so     bi    bo    in    cs us sy id wa
>>  1 63  32044  88736  37996 7116524    0    0 227156     0 18314  5607 30  5 11 53
>>  1 63  32044  90844  37996 7103904    0    0 233524   202 17418  4977 29  4  9 58
>>  0 42  32044  91304  37996 7123884    0    0 249736     0 16197  5433 19  6  3 72
>>  3 25  32044  89864  37996 7135980    0    0 223140    16 18135  7567 32  5 11 52
>>  1  1  32044  88664  37996 7150728    0    0 229416   128 19168  7554 36  4 10 51
>>  4  0  32044  89464  37996 7149428    0    0 213852    18 21041  8819 45  5 12 38
>>  4  0  32044  90372  37996 7149432    0    0 233086   142 19909  7041 43  5 10 41
>>  7  1  32044  89752  37996 7149520    0    0 206906     0 19350  6875 50  4 11 35
>>
>> Lots and lots of disk activity:
>>
>> iostat -dmx 2
>> Device:  rrqm/s  wrqm/s      r/s   w/s   rMB/s  wMB/s avgrq-sz avgqu-sz  await r_await w_await svctm %util
>> sda       52.50    0.00  7813.00  0.00  108.01   0.00    28.31   117.15  14.89   14.89    0.00  0.11 83.00
>> sdb       56.00    0.00  7755.50  0.00  108.51   0.00    28.66   118.67  15.18   15.18    0.00  0.11 82.80
>> md1        0.00    0.00     0.00  0.00    0.00   0.00     0.00     0.00   0.00    0.00    0.00  0.00  0.00
>> md5        0.00    0.00 15796.50  0.00  219.21   0.00    28.42     0.00   0.00    0.00    0.00  0.00  0.00
>> dm-0       0.00    0.00 15796.50  0.00  219.21   0.00    28.42   273.42  17.03   17.03    0.00  0.05 83.40
>> dm-1       0.00    0.00     0.00  0.00    0.00   0.00     0.00     0.00   0.00    0.00    0.00  0.00  0.00
>>
>> More info:
>> - the data directory containing the data I'm querying is 9.7GB, and this
>>   is a server with 16GB
>> - I'm hitting the server with 6 concurrent multigetsuperslicequeries on
>>   multiple keys; some of them can bring back quite a lot of data
>> - I'm reading all the keys for one column, pretty much sequentially
>>
>> This is a query on a rollup table that was originally in MySQL, and
>> querying by key doesn't look any faster than it was there. So I'm betting
>> I'm doing something wrong here... but what?
>>
>> Any ideas?
>> Thanks
>>
>> 2011/6/6 Philippe <watche...@gmail.com>
>>
>>> Hum... no, it wasn't swapping. Cassandra was the only thing running on
>>> that server, and I was querying the same keys over and over.
>>>
>>> I restarted Cassandra and, doing the same thing, IO is now down to zero
>>> while CPU is up, which doesn't surprise me as much.
>>>
>>> I'll report if it happens again.
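Aaron's tpstats question is worth pinning down with numbers: nodetool tpstats is just a JMX client, so the ReadStage queue can be polled programmatically during a slowdown. A minimal sketch, assuming the 0.7/0.8-era MBean name org.apache.cassandra.concurrent:type=ReadStage and the default JMX port (both are assumptions; check cassandra-env.sh on your install):

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class ReadStageWatch {
        public static void main(String[] args) throws Exception {
            // Default JMX port is an assumption: 8080 on 0.7, 7199 on 0.8
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi");
            JMXConnector jmxc = JMXConnectorFactory.connect(url);
            MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
            // MBean name as registered in the 0.7/0.8 line (assumption)
            ObjectName readStage = new ObjectName(
                    "org.apache.cassandra.concurrent:type=ReadStage");
            // Poll every two seconds, like iostat -dmx 2; a pending count
            // that keeps growing means reads are queuing behind the disks
            while (true) {
                System.out.printf("ReadStage active=%s pending=%s completed=%s%n",
                        mbs.getAttribute(readStage, "ActiveCount"),
                        mbs.getAttribute(readStage, "PendingTasks"),
                        mbs.getAttribute(readStage, "CompletedTasks"));
                Thread.sleep(2000);
            }
        }
    }

If PendingTasks sits in the hundreds while concurrent_reads is at its default, the reads are spending their time in the queue exactly as Aaron describes.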
>>> On 5 June 2011 at 16:55, "Jonathan Ellis" <jbel...@gmail.com> wrote:
>>>
>>>> You may be swapping.
>>>>
>>>> http://spyced.blogspot.com/2010/01/linux-performance-basics.html
>>>> explains how to check this, as well as how to see what threads are
>>>> busy in the Java process.
>>>>
>>>> On Sat, Jun 4, 2011 at 5:34 PM, Philippe <watche...@gmail.com> wrote:
>>>>
>>>>> Hello,
>>>>> I am evaluating cassandra and I'm running into some strange IO
>>>>> behavior that I can't explain; I'd like some help/ideas to
>>>>> troubleshoot it.
>>>>> I am running a 1-node cluster with a keyspace consisting of two
>>>>> column families, one of which has dozens of supercolumns, each itself
>>>>> containing dozens of columns.
>>>>> All in all, this is a couple gigabytes of data, 12GB on the hard drive.
>>>>> The hardware is pretty good: 16GB memory + RAID-0 SSD drives with LVM
>>>>> and an i5 processor (4 cores).
>>>>>
>>>>> Keyspace: xxxxxxxxxxxxxxxxxxx
>>>>>   Read Count: 460754852
>>>>>   Read Latency: 1.108205793092766 ms.
>>>>>   Write Count: 30620665
>>>>>   Write Latency: 0.01411020877567486 ms.
>>>>>   Pending Tasks: 0
>>>>>     Column Family: xxxxxxxxxxxxxxxxxxxxxxxxxx
>>>>>     SSTable count: 5
>>>>>     Space used (live): 548700725
>>>>>     Space used (total): 548700725
>>>>>     Memtable Columns Count: 0
>>>>>     Memtable Data Size: 0
>>>>>     Memtable Switch Count: 11
>>>>>     Read Count: 2891192
>>>>>     Read Latency: NaN ms.
>>>>>     Write Count: 3157547
>>>>>     Write Latency: NaN ms.
>>>>>     Pending Tasks: 0
>>>>>     Key cache capacity: 367396
>>>>>     Key cache size: 367396
>>>>>     Key cache hit rate: NaN
>>>>>     Row cache capacity: 112683
>>>>>     Row cache size: 112683
>>>>>     Row cache hit rate: NaN
>>>>>     Compacted row minimum size: 125
>>>>>     Compacted row maximum size: 924
>>>>>     Compacted row mean size: 172
>>>>>
>>>>>     Column Family: yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
>>>>>     SSTable count: 7
>>>>>     Space used (live): 8707538781
>>>>>     Space used (total): 8707538781
>>>>>     Memtable Columns Count: 0
>>>>>     Memtable Data Size: 0
>>>>>     Memtable Switch Count: 30
>>>>>     Read Count: 457863660
>>>>>     Read Latency: 2.381 ms.
>>>>>     Write Count: 27463118
>>>>>     Write Latency: NaN ms.
>>>>>     Pending Tasks: 0
>>>>>     Key cache capacity: 4518387
>>>>>     Key cache size: 4518387
>>>>>     Key cache hit rate: 0.9247881700850826
>>>>>     Row cache capacity: 1349682
>>>>>     Row cache size: 1349682
>>>>>     Row cache hit rate: 0.39400533823415573
>>>>>     Compacted row minimum size: 125
>>>>>     Compacted row maximum size: 6866
>>>>>     Compacted row mean size: 165
>>>>>
>>>>> My app makes a bunch of requests using a MultigetSuperSliceQuery for
>>>>> a set of keys, typically a couple dozen at most. It also selects a
>>>>> subset of the supercolumns. I am running 8 requests in parallel at most.
>>>>>
>>>>> Two days ago, I ran a 1.5 hour process that basically read every key.
>>>>> The server had no IO waits and everything was humming along. However,
>>>>> right at the end of the process, there was a huge spike in IOs. I
>>>>> didn't think much of it.
>>>>> Today, after two days of inactivity, any query I run raises the IOs
>>>>> to 80% utilization of the SSD drives, even though I'm running the
>>>>> same query over and over (no cache??).
>>>>> Any ideas on how to troubleshoot this, or better, how to solve this?
>>>>> Thanks
>>>>> Philippe
>>>>
>>>> --
>>>> Jonathan Ellis
>>>> Project Chair, Apache Cassandra
>>>> co-founder of DataStax, the source for professional Cassandra support
>>>> http://www.datastax.com
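For reference, the MultigetSuperSliceQuery pattern described in the original post would, with the Hector client of that era, look roughly like the sketch below. The cluster, keyspace, column family, key, and supercolumn names are placeholders rather than Philippe's actual schema, and the builder methods should be checked against the Hector version in use:

    import me.prettyprint.cassandra.serializers.StringSerializer;
    import me.prettyprint.hector.api.Cluster;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.beans.SuperRow;
    import me.prettyprint.hector.api.beans.SuperRows;
    import me.prettyprint.hector.api.factory.HFactory;
    import me.prettyprint.hector.api.query.MultigetSuperSliceQuery;
    import me.prettyprint.hector.api.query.QueryResult;

    public class RollupQuery {
        public static void main(String[] args) {
            StringSerializer s = StringSerializer.get();
            Cluster cluster = HFactory.getOrCreateCluster("eval", "localhost:9160");
            Keyspace keyspace = HFactory.createKeyspace("MyKeyspace", cluster);

            // One multiget over a couple dozen keys, selecting a subset of
            // the supercolumns by name rather than slicing the whole row
            MultigetSuperSliceQuery<String, String, String, String> query =
                    HFactory.createMultigetSuperSliceQuery(keyspace, s, s, s, s);
            query.setColumnFamily("Rollups");
            query.setKeys("key1", "key2", "key3");
            query.setColumnNames("sc-2011-06-01", "sc-2011-06-02");

            QueryResult<SuperRows<String, String, String, String>> result =
                    query.execute();
            for (SuperRow<String, String, String, String> row : result.get()) {
                System.out.println(row.getKey() + " -> "
                        + row.getSuperSlice().getSuperColumns().size()
                        + " supercolumns");
            }
        }
    }

Naming the wanted supercolumns, as here, keeps each multiget bounded; whether 6 to 8 such queries in flight overrun the node's concurrent_reads setting is exactly what Aaron's questions probe.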