$ ./nodetool -h localhost ring
Address    DC           Rack   Status  State   Load      Owns    Token
                                                                 136112946768375385385349842972707284580
10.0.0.57  datacenter1  rack1  Up      Normal  8.31 GB   20.00%  0
10.0.0.56  datacenter1  rack1  Up      Normal  13.7 GB   20.00%  34028236692093846346337460743176821145
10.0.0.55  datacenter1  rack1  Up      Normal  13.87 GB  20.00%  68056473384187692692674921486353642290
10.0.0.54  datacenter1  rack1  Up      Normal  8.03 GB   20.00%  102084710076281539039012382229530463435
10.0.0.72  datacenter1  rack1  Up      Normal  1.77 GB   20.00%  136112946768375385385349842972707284580
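(Side note: the tokens above are already evenly spaced at i * (2^127 / 5), which is why every node owns exactly 20% of the ring. A minimal sketch, assuming RandomPartitioner's [0, 2^127) token space, that reproduces them:)

import java.math.BigInteger;

public class InitialTokens {
    public static void main(String[] args) {
        int nodes = 5;
        // RandomPartitioner's token space is [0, 2^127); divide it evenly.
        BigInteger step = BigInteger.valueOf(2).pow(127)
                                    .divide(BigInteger.valueOf(nodes));
        for (int i = 0; i < nodes; i++) {
            // Prints 0, 34028236692093846346337460743176821145, ... --
            // exactly the tokens in the ring output above.
            System.out.println(step.multiply(BigInteger.valueOf(i)));
        }
    }
}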
This is a brand new cluster that we brought up and started loading data into a few days ago. It uses RandomPartitioner, RF=3 on everything, and we do QUORUM writes. All keyspaces and column families hold counter supercolumns. All keys are moderately sized ASCII strings with good variation between them, all supercolumn names are longs, and all column names are ASCII strings. No decrements are done, no rows or columns are deleted, and read load is almost nonexistent. Column values do get overwritten, since they are counters being incremented; this is expected to happen quite a bit. Not all rows are the same length.

Insert latency from my Hector client box to the cluster averages 70-200 ms, which is really high. Inserts/sec from Hector's perspective peaks at 750/sec and consistently drops down to (and stays at) 120/sec. This is not due to compactions, based on the output of nodetool compactionstats. I wiped the cluster this afternoon and started from scratch, and I'm seeing the same distribution on a smaller scale, with the same latencies.

Inserts

Going by statistics from Cassandra via JMX, all hosts are completing about the same number of MutationStage tasks per second. However, one host consistently has pending MutationStage and ReplicateOnWriteStage (ROW) tasks, ranging from 50/30 to 211/42 respectively throughout the day. Now, I know that ReplicateOnWrite can be very slow if you have large supercolumns, but I believe I don't; I'm working on proving that at the moment, pending a couple of code pushes.

This same box typically runs CPU at around 600-700%, and it's all user-space CPU, not I/O wait. We monitor these boxes closely, and we've tweaked a few things to rule them out (enabling mmap'd I/O, disabling swap, mounting ext4 with noatime), none of which has made any difference. If I kill Cassandra on that one box, the load moves to the box before it in the ring: mutations and ROWs back up there, and its CPU jumps to 600%. That rules out bad hardware on the original box.

Heap usage sits at 600 MB-2 GB, and the heap size is 4 GB on all 5 boxes. CPU usage and mutations/ROWs are not affected by Hector client connections: if I remove this single host from the Hector configuration and confirm there are 0 connections from my client to it, I still see high mutations, ROWs, and CPU usage. If I increase the number of client connections in the Hector pool, performance does not change.

concurrent_writes is set to 48 and concurrent_reads to 32; each box has 8 cores. The memtable flush thresholds are 28 MB or 131k operations. Our memtables flush every 3 minutes (based on graphs, and this aligns exactly with 131k / (mutations/sec each box is doing)). The commitlog and data directories are on the same disk, but our disks seem bored. The key cache is enabled and I see an almost perfect 100% hit rate; the row cache is disabled.

My questions are:

1. Is it normal to see load spread out this unevenly when using RandomPartitioner? How do I fix it? Do I need to assign token ranges manually even with RandomPartitioner?
2. Is there a way to see the total row count assigned to each box?
3. Why is this one host running at 600% CPU while the rest are sitting at 0%?
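(In case anyone wants to reproduce the pending-task numbers I'm quoting: they come from the stage MBeans over JMX. A rough sketch along these lines reads them remotely; the host list is mine, and the port is an assumption (7199 is the default on recent Cassandra, older builds used 8080), so adjust for your setup.)

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class StagePending {
    public static void main(String[] args) throws Exception {
        String[] hosts = {"10.0.0.57", "10.0.0.56", "10.0.0.55",
                          "10.0.0.54", "10.0.0.72"};
        for (String host : hosts) {
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://" + host + ":7199/jmxrmi");
            JMXConnector jmxc = JMXConnectorFactory.connect(url);
            try {
                MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
                // Cassandra exposes its request stages as thread-pool MBeans.
                for (String stage : new String[]{"MutationStage",
                                                 "ReplicateOnWriteStage"}) {
                    ObjectName name = new ObjectName(
                            "org.apache.cassandra.request:type=" + stage);
                    System.out.println(host + " " + stage + " pending="
                            + mbs.getAttribute(name, "PendingTasks"));
                }
            } finally {
                jmxc.close();
            }
        }
    }
}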
For reference, here's cfstats taken from the host with the high CPU usage:

Keyspace: STATS_TEST
    Read Count: 18744838
    Read Latency: 2.568355930309987 ms.
    Write Count: 18744845
    Write Latency: 0.020453476835898085 ms.
    Pending Tasks: 0
        Column Family: rollup1h
        SSTable count: 4
        Space used (live): 194724367
        Space used (total): 260574143
        Number of Keys (estimate): 11904
        Memtable Columns Count: 34708
        Memtable Data Size: 27280700
        Memtable Switch Count: 67
        Read Count: 9255646
        Read Latency: 2.498 ms.
        Write Count: 9255658
        Write Latency: 0.021 ms.
        Pending Tasks: 0
        Key cache capacity: 200000
        Key cache size: 91254
        Key cache hit rate: 0.9950598390225411
        Row cache: disabled
        Compacted row minimum size: 150
        Compacted row maximum size: 52066354
        Compacted row mean size: 17404

        Column Family: rollup5m
        SSTable count: 4
        Space used (live): 296161119
        Space used (total): 402687415
        Number of Keys (estimate): 10496
        Memtable Columns Count: 34742
        Memtable Data Size: 34607575
        Memtable Switch Count: 67
        Read Count: 9255681
        Read Latency: 2.700 ms.
        Write Count: 9255687
        Write Latency: 0.020 ms.
        Pending Tasks: 0
        Key cache capacity: 200000
        Key cache size: 88629
        Key cache hit rate: 0.9956045403129263
        Row cache: disabled
        Compacted row minimum size: 150
        Compacted row maximum size: 129557750
        Compacted row mean size: 25562
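(One thing I plan to check, tying into the ReplicateOnWrite suspicion above: the compacted row maximum sizes here (~52 MB and ~130 MB) show that a few rows are enormous. As far as I understand, RandomPartitioner places a row at abs(MD5(key)) in [0, 2^127), so I can hash a suspected hot key and compare the result against the ring tokens to see whether those big rows all land on the busy replica set. A sketch, with made-up key names:)

import java.math.BigInteger;
import java.security.MessageDigest;

public class KeyToToken {
    public static void main(String[] args) throws Exception {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        // Hypothetical keys -- substitute the rows you suspect are hot.
        String[] keys = {"metric:app1:2011-06-20", "metric:app2:2011-06-20"};
        for (String key : keys) {
            // RandomPartitioner's token for a key: abs(MD5(key)) as a BigInteger.
            // The owning node is the first ring token >= this value (wrapping),
            // plus the next RF-1 nodes as replicas.
            BigInteger token =
                    new BigInteger(md5.digest(key.getBytes("UTF-8"))).abs();
            System.out.println(key + " -> " + token);
        }
    }
}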