Hello,
We have a 5-node cluster running Cassandra 1.2.16, with a significant amount of
data:
Address        Rack   Status  State   Load     Owns     Token
                                                        6783174585269344219
10.198.xx.xx1  rack1  Up      Normal  2.59 TB  60.00%   -9223372036854775808
10.198.xx.xx2  rack1  Up      Normal  1.49 TB  40.00%   -5534023222112865485
10.198.xx.xx3  rack1  Up      Normal  2.18 TB  53.23%   -1844674407370955162
10.198.xx.xx4  rack1  Up      Normal  2.86 TB  80.00%   5534023222112865484
10.198.xx.xx5  rack1  Up      Moving  2.32 TB  66.77%   6783174585269344219
The first three nodes (.xx1 - .xx3 above) were at the desired tokens, so I
issued a move on .xx4:
nodetool move 1844674407370955161
That was about 40 hours ago!
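For reference, the move target above is consistent with evenly spaced tokens for a 5-node ring. The following is a minimal sketch of that arithmetic, assuming the cluster uses Murmur3Partitioner (token range -2^63 to 2^63 - 1), which the negative tokens in the ring output suggest but which is not confirmed here. The function name balanced_tokens is hypothetical.

```python
# Sketch: evenly spaced tokens for an N-node ring.
# Assumption: Murmur3Partitioner, whose tokens span [-2**63, 2**63 - 1].

def balanced_tokens(node_count):
    """Return node_count tokens spaced evenly around the 2**64-wide ring."""
    ring_size = 2**64
    ring_start = -(2**63)
    # Integer floor division keeps every token an exact integer.
    return [ring_start + (i * ring_size) // node_count for i in range(node_count)]

if __name__ == "__main__":
    for index, token in enumerate(balanced_tokens(5)):
        print(index, token)
```

Under that assumption, the fourth value (index 3) is 1844674407370955161, the token passed to nodetool move, and the other four match the tokens already held by .xx1 through .xx4.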
When I do nodetool netstats, I do see apparent progress:
jatyler@xx4:~$ nodetool netstats
Mode: MOVING
Not sending any streams.
Streaming from: /10.198.xx.xx2
SyncCore: /var/cassandra/data/SyncCore/file-ic-31475-Data.db sections=1 progress=0/77699597 - 0%
…
SyncCore: /var/cassandra/data/SyncCore/anotherFile-ic-32252-Data.db sections=1 progress=0/1254063427 - 0%
Read Repair Statistics:
Attempted: 8047367
Mismatch (Blocking): 97327
Mismatch (Background): 74369
Pool Name  Active  Pending  Completed
Commands   n/a     0        472255111
Responses  n/a     1        749751322
I wrote 'apparent progress' because it reports “MOVING” and the Pending
Commands/Responses counts are changing over time. However, I haven’t seen the
progress of any individual .db file go above 0%.
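To track this over time instead of eyeballing each file, the per-file progress lines can be summed. This is a rough sketch that assumes the netstats output keeps the progress=<done>/<total> form shown above; the function name total_stream_progress is hypothetical, and the sample text is just the two lines quoted in this message.

```python
import re

# Matches the "progress=<bytes done>/<bytes total>" field seen in netstats output.
PROGRESS_RE = re.compile(r"progress=(\d+)/(\d+)")

def total_stream_progress(netstats_text):
    """Sum bytes done and bytes total over every progress= field in the text."""
    done = 0
    total = 0
    for match in PROGRESS_RE.finditer(netstats_text):
        done += int(match.group(1))
        total += int(match.group(2))
    return done, total

if __name__ == "__main__":
    sample = (
        "SyncCore: /var/cassandra/data/SyncCore/file-ic-31475-Data.db "
        "sections=1 progress=0/77699597 - 0%\n"
        "SyncCore: /var/cassandra/data/SyncCore/anotherFile-ic-32252-Data.db "
        "sections=1 progress=0/1254063427 - 0%\n"
    )
    done, total = total_stream_progress(sample)
    print(done, total)
```

Running this against saved netstats output every few minutes would show whether the byte counts move at all, even if the rounded percentages stay at 0%.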
Meanwhile, the system appears to have plenty of unused bandwidth, from 'iostat
-x -m 1':
Device:  rrqm/s  wrqm/s      r/s     w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  svctm  %util
sda        0.00   56.00  1338.00  171.00  57.59   0.89     79.36      0.57   0.38   0.17  25.30
avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
          22.77   1.82     2.35     0.20    0.00  72.86
Device:  rrqm/s  wrqm/s      r/s     w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  svctm  %util
sda        0.00    0.00   785.00    0.00  33.80   0.00     88.17      0.27   0.35   0.18  14.10
avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
          20.16   2.05     2.22     0.20    0.00  75.37
Is 40 hours too long for this move? Should I be seeing individual .db files
report more progress? Should I start with the first box (even though the token
appears correct)?
Any thoughts would be greatly appreciated.
THX
Cheers,
~Jason