What errors are you seeing in the log files of the nodes marked down? Did you run nodetool upgradesstables? You need to run upgradesstables when moving from a version earlier than 1.1.7 to 1.1.9.
On Apr 4, 2013, at 6:11 PM, S C <as...@outlook.com> wrote:

> I was in the middle of an upgrade to 1.1.9. I brought one node up on 1.1.9 while the others were still running 1.1.5. Once that node was on 1.1.9, it no longer recognized the other nodes in the ring.
>
> On 192.168.56.10 and 192.168.56.11:
>
> 192.168.56.10  DC1-Cass  RAC1  Up    Normal  28.06 GB  50.00%  0
> 192.168.56.11  DC1-Cass  RAC1  Up    Normal  31.59 GB  25.00%  42535295865117307932921825928971026432
> 192.168.56.12  DC1-Cass  RAC1  Down  Normal  29.02 GB  25.00%  85070591730234615865843651857942052864
>
> On 192.168.56.12:
>
> 192.168.56.10  DC1-Cass  RAC1  Down  Normal  28.06 GB  50.00%  0
> 192.168.56.11  DC1-Cass  RAC1  Down  Normal  31.59 GB  25.00%  42535295865117307932921825928971026432
> 192.168.56.12  DC1-Cass  RAC1  Up    Normal  29.02 GB  25.00%  85070591730234615865843651857942052864
>
> I do not see anything in the logs indicating a gossip issue.
>
> nodetool info
> Token            : 85070591730234615865843651857942052864
> Gossip active    : true
> Thrift active    : true
> Load             : 29.05 GB
> Generation No    : 1365114563
> Uptime (seconds) : 2127
> Heap Memory (MB) : 848.71 / 7945.94
> Exceptions       : 0
> Key Cache        : size 2208 (bytes), capacity 104857584 (bytes), 1056 hits, 1099 requests, 0.961 recent hit rate, 14400 save period in seconds
> Row Cache        : size 0 (bytes), capacity 0 (bytes), 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds
>
> nodetool info
> Token            : 42535295865117307932921825928971026432
> Gossip active    : true
> Thrift active    : true
> Load             : 31.59 GB
> Generation No    : 1364413038
> Uptime (seconds) : 703904
> Heap Memory (MB) : 733.02 / 7945.94
> Exceptions       : 1
> Key Cache        : size 3693312 (bytes), capacity 104857584 (bytes), 26071678 hits, 26616282 requests, 0.980 recent hit rate, 14400 save period in seconds
> Row Cache        : size 0 (bytes), capacity 0 (bytes), 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds
>
> There is no firewall between the nodes, and each node can reach the others on the storage port.
> What else should I look at to find the root cause? I'd appreciate your input.
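The two `nodetool ring` listings above disagree in a symmetric way: the 1.1.9 node marks both 1.1.5 nodes Down, and the 1.1.5 nodes mark the 1.1.9 node Down. When comparing ring output captured on many nodes, it can help to reduce it to a list of "observer sees target as Down" pairs. The following is a minimal Python sketch of that comparison. `parse_ring` and `down_edges` are hypothetical helper names, not part of Cassandra or nodetool, and the parser assumes the whitespace-separated column layout quoted above (address, DC, rack, status, state, load value, load unit, owns, token); header or token-only lines are skipped.

```python
from typing import Dict, List, Tuple

# Sample `nodetool ring` data lines copied from the message above.
RING_ON_10 = """\
192.168.56.10 DC1-Cass RAC1 Up Normal 28.06 GB 50.00% 0
192.168.56.11 DC1-Cass RAC1 Up Normal 31.59 GB 25.00% 42535295865117307932921825928971026432
192.168.56.12 DC1-Cass RAC1 Down Normal 29.02 GB 25.00% 85070591730234615865843651857942052864
"""

RING_ON_12 = """\
192.168.56.10 DC1-Cass RAC1 Down Normal 28.06 GB 50.00% 0
192.168.56.11 DC1-Cass RAC1 Down Normal 31.59 GB 25.00% 42535295865117307932921825928971026432
192.168.56.12 DC1-Cass RAC1 Up Normal 29.02 GB 25.00% 85070591730234615865843651857942052864
"""


def parse_ring(text: str) -> Dict[str, str]:
    """Map each node address to the status ("Up" or "Down") shown in `text`.

    Assumes the 9-column layout above; any line that does not have at
    least 9 fields with Up/Down in the status column is ignored.
    """
    statuses: Dict[str, str] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 9 and fields[3] in ("Up", "Down"):
            statuses[fields[0]] = fields[3]
    return statuses


def down_edges(views: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return sorted (observer, target) pairs where observer reports target Down.

    `views` maps the address of the node where `nodetool ring` was run
    to the raw text it printed.
    """
    edges: List[Tuple[str, str]] = []
    for observer, text in sorted(views.items()):
        for target, status in sorted(parse_ring(text).items()):
            if status == "Down":
                edges.append((observer, target))
    return edges


if __name__ == "__main__":
    views = {"192.168.56.10": RING_ON_10, "192.168.56.12": RING_ON_12}
    for observer, target in down_edges(views):
        print(f"{observer} sees {target} as Down")
```

Run against the two listings from the message, it prints three pairs: 192.168.56.10 sees 192.168.56.12 as Down, and 192.168.56.12 sees both 192.168.56.10 and 192.168.56.11 as Down. This only restates what each node reports. It does not explain why the mixed-version nodes stopped marking each other Up, so the log files on each side are still the place to look.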