[jira] [Comment Edited] (CASSANDRA-6053) system.peers table not updated after decommissioning nodes in C* 2.0
[ https://issues.apache.org/jira/browse/CASSANDRA-6053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14987349#comment-14987349 ]

Kenneth Failbus edited comment on CASSANDRA-6053 at 11/3/15 2:30 PM:
---------------------------------------------------------------------

[~brandon.williams] FYI - even though this was fixed, it resurfaced in the 2.0.14 release that we run in production. We are going to follow the workaround mentioned above.

> system.peers table not updated after decommissioning nodes in C* 2.0
> --------------------------------------------------------------------
>
>                 Key: CASSANDRA-6053
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6053
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: Datastax AMI running EC2 m1.xlarge instances
>            Reporter: Guyon Moree
>            Assignee: Tyler Hobbs
>             Fix For: 1.2.14, 2.0.5
>
>         Attachments: 6053-v1.patch, peers
>
>
> After decommissioning my cluster from 20 to 9 nodes using OpsCenter, I found
> all but one of the nodes had incorrect system.peers tables.
> This became a problem (afaik) when using the python-driver, since it queries
> the peers table to set up its connection pool, resulting in very slow startup
> times because of timeouts.
> The output of nodetool didn't seem to be affected. After removing the
> incorrect entries from the peers tables, the connection issues seem to have
> disappeared for us.
> I would like some feedback on whether this was the right way to handle the
> issue or if I'm still left with a broken cluster.
> Attached is the output of nodetool status, which shows the correct 9 nodes.
> Below that is the output of the system.peers tables on the individual nodes.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
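[Editor's sketch of the workaround referenced above, i.e. removing the stale rows by hand. The address 10.0.0.42 is a hypothetical example; substitute the departed node's IP, and repeat on every node whose peers table still lists it:]

{code}
-- Run on each node whose system.peers still lists a decommissioned host.
-- First confirm which rows are stale by comparing against nodetool status:
SELECT peer, host_id FROM system.peers;

-- Then remove the stale entry (10.0.0.42 is a hypothetical example):
DELETE FROM system.peers WHERE peer = '10.0.0.42';
{code}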
[jira] [Comment Edited] (CASSANDRA-6053) system.peers table not updated after decommissioning nodes in C* 2.0
[ https://issues.apache.org/jira/browse/CASSANDRA-6053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13851971#comment-13851971 ]

Ryan McGuire edited comment on CASSANDRA-6053 at 12/18/13 6:16 PM:
--------------------------------------------------------------------

First attempt appears to work correctly on cassandra-2.0 HEAD and 1.2.9:

{code}
12:53 PM:~$ ccm create -v git:cassandra-1.2.9 t
Fetching Cassandra updates...
Current cluster is now: t
12:53 PM:~$ ccm populate -n 5
12:54 PM:~$ ccm start
12:54 PM:~$ ccm node1 stress
Created keyspaces. Sleeping 1s for propagation.
total,interval_op_rate,interval_key_rate,latency,95th,99th,elapsed_time
24994,2499,2499,9.5,55.2,179.0,10
103123,7812,7812,2.8,27.2,134.7,20
236358,13323,13323,1.7,15.4,134.7,30
329477,9311,9311,1.7,9.8,109.8,40
405667,7619,7619,1.8,9.2,6591.9,50
558989,15332,15332,1.5,6.6,6591.1,60
^C
12:55 PM:~$ ccm node1 cqlsh
Connected to t at 127.0.0.1:9160.
[cqlsh 3.1.7 | Cassandra 1.2.9-SNAPSHOT | CQL spec 3.0.0 | Thrift protocol 19.36.0]
Use HELP for help.
cqlsh> select peer from system.peers;

 peer
-----------
 127.0.0.3
 127.0.0.2
 127.0.0.5
 127.0.0.4

cqlsh>
12:55 PM:~$ ccm node2 decommission
12:57 PM:~$ ccm node1 cqlsh
Connected to t at 127.0.0.1:9160.
[cqlsh 3.1.7 | Cassandra 1.2.9-SNAPSHOT | CQL spec 3.0.0 | Thrift protocol 19.36.0]
Use HELP for help.
cqlsh> select peer from system.peers;

 peer
-----------
 127.0.0.3
 127.0.0.5
 127.0.0.4

cqlsh>
12:58 PM:~$
{code}

All nodes show equivalent peers tables.
[jira] [Comment Edited] (CASSANDRA-6053) system.peers table not updated after decommissioning nodes in C* 2.0
[ https://issues.apache.org/jira/browse/CASSANDRA-6053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13851994#comment-13851994 ]

Ryan McGuire edited comment on CASSANDRA-6053 at 12/18/13 6:52 PM:
--------------------------------------------------------------------

OK, reproduced this by killing -9 one of the nodes and then doing a 'nodetool removenode':

{code}
01:20 PM:~$ kill -9 18961   # PID of node1
01:21 PM:~$ ccm node1 status
Failed to connect to '127.0.0.1:7100': Connection refused
01:21 PM:~$ ccm node2 status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load      Owns   Host ID                               Token                 Rack
DN  127.0.0.1  62.93 KB  20.0%  896644af-8640-4be6-a3ff-e8ed559d851c  -9223372036854775808  rack1
UN  127.0.0.2  51.17 KB  20.0%  d3801466-d36d-428c-b4e5-05ff69fe36c0  -5534023222112865485  rack1
UN  127.0.0.3  62.78 KB  20.0%  cb36c3ad-df45-4f77-bff5-ca93c504ec08  -1844674407370955162  rack1
UN  127.0.0.4  51.17 KB  20.0%  89031a05-a3f6-4ac7-9d29-6caa0c609dbc  1844674407370955161   rack1
UN  127.0.0.5  51.27 KB  20.0%  4909d856-a86e-493a-a7d0-7570d71eb9d8  5534023222112865484   rack1

# Issue removenode on node3:
01:21 PM:~$ ~/.ccm/t/node1/bin/nodetool -p 7300 removenode 896644af-8640-4be6-a3ff-e8ed559d851c
01:22 PM:~$ ccm node3 cqlsh
Connected to t at 127.0.0.3:9160.
[cqlsh 4.1.0 | Cassandra 2.0.3-SNAPSHOT | CQL spec 3.1.1 | Thrift protocol 19.39.0]
Use HELP for help.
cqlsh> select * from system.peers;

 peer      | data_center | host_id                              | preferred_ip | rack  | release_version | rpc_address | schema_version                       | tokens
-----------+-------------+--------------------------------------+--------------+-------+-----------------+-------------+--------------------------------------+--------------------------
 127.0.0.2 | datacenter1 | d3801466-d36d-428c-b4e5-05ff69fe36c0 |         null | rack1 |  2.0.3-SNAPSHOT |   127.0.0.2 | d133398f-f287-3674-83af-a1b04ee29f1f | {'-5534023222112865485'}
 127.0.0.5 | datacenter1 | 4909d856-a86e-493a-a7d0-7570d71eb9d8 |         null | rack1 |  2.0.3-SNAPSHOT |   127.0.0.5 | d133398f-f287-3674-83af-a1b04ee29f1f |  {'5534023222112865484'}
 127.0.0.4 | datacenter1 | 89031a05-a3f6-4ac7-9d29-6caa0c609dbc |         null | rack1 |  2.0.3-SNAPSHOT |   127.0.0.4 | d133398f-f287-3674-83af-a1b04ee29f1f |  {'1844674407370955161'}

(3 rows)

# Check node2's peers table:
01:23 PM:~$ ccm node2 cqlsh
Connected to t at 127.0.0.2:9160.
[cqlsh 4.1.0 | Cassandra 2.0.3-SNAPSHOT | CQL spec 3.1.1 | Thrift protocol 19.39.0]
Use HELP for help.
cqlsh> select * from system.peers;

 peer      | data_center | host_id                              | preferred_ip | rack  | release_version | rpc_address | schema_version                       | tokens
-----------+-------------+--------------------------------------+--------------+-------+-----------------+-------------+--------------------------------------+--------------------------
 127.0.0.3 | datacenter1 | cb36c3ad-df45-4f77-bff5-ca93c504ec08 |         null | rack1 |  2.0.3-SNAPSHOT |   127.0.0.3 | d133398f-f287-3674-83af-a1b04ee29f1f | {'-1844674407370955162'}
 127.0.0.1 |        null | 896644af-8640-4be6-a3ff-e8ed559d851c |         null |  null |            null |   127.0.0.1 |                                 null |                     null
 127.0.0.5 | datacenter1 | 4909d856-a86e-493a-a7d0-7570d71eb9d8 |         null | rack1 |  2.0.3-SNAPSHOT |   127.0.0.5 | d133398f-f287-3674-83af-a1b04ee29f1f |  {'5534023222112865484'}
 127.0.0.4 | datacenter1 | 89031a05-a3f6-4ac7-9d29-6caa0c609dbc |         null | rack1 |  2.0.3-SNAPSHOT |   127.0.0.4 | d133398f-f287-3674-83af-a1b04ee29f1f |  {'1844674407370955161'}

(4 rows)

# oh noes!... node2 still has an entry for node1 in its peers table.

01:23 PM:~$ ccm node2 status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load      Owns   Host ID                               Token                 Rack
UN  127.0.0.2  51.17 KB  40.0%  d3801466-d36d-428c-b4e5-05ff69fe36c0  -5534023222112865485  rack1
UN  127.0.0.3  62.78 KB  20.0%  cb36c3ad-df45-4f77-bff5-ca93c504ec08  -1844674407370955162  rack1
UN  127.0.0.4  51.17 KB  20.0%  89031a05-a3f6-4ac7-9d29-6caa0c609dbc  1844674407370955161   rack1
UN  127.0.0.5  51.27 KB  20.0%  4909d856-a86e-493a-a7d0-7570d71eb9d8  5534023222112865484   rack1
{code}

Because the removenode was issued on node3, node3 knows about the node being removed and its peers table is correct. node2's status output shows node1 going away, but its peers table has not been updated.
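[Editor's sketch: a quick way for operators to audit whether any node still carries a removed host, per the repro above, is to compare each node's peers table against the cluster view:]

{code}
# On each node, list what it believes its peers are...
cqlsh> SELECT peer, host_id FROM system.peers;

# ...and compare against the host IDs reported by the cluster:
nodetool status
{code}

Any peer row whose host_id no longer appears in nodetool status is a stale entry of the kind shown above.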
[jira] [Comment Edited] (CASSANDRA-6053) system.peers table not updated after decommissioning nodes in C* 2.0
[ https://issues.apache.org/jira/browse/CASSANDRA-6053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785003#comment-13785003 ]

Jeremy Hanna edited comment on CASSANDRA-6053 at 10/3/13 11:27 AM:
--------------------------------------------------------------------

The load_ring_state=false directive should probably also clear out the peers table, because otherwise the state that you're trying to get rid of is still persisted there.

was (Author: jeromatron):
The load_ring_state=false directive should probably also clear out the peers table, because otherwise polluted gossip is still persisted there.
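[Editor's sketch: the directive Jeremy refers to is the cassandra.load_ring_state JVM system property, set at startup. One way to restart a node with it; exact service and config-file paths vary by installation:]

{code}
# Stop the node cleanly, then restart it ignoring the persisted ring state.
nodetool drain
sudo service cassandra stop

# Add to conf/cassandra-env.sh (remove again after one clean restart):
JVM_OPTS="$JVM_OPTS -Dcassandra.load_ring_state=false"

sudo service cassandra start
{code}

Note that, as the comment points out, this only rebuilds in-memory ring state from gossip; it does not by itself clear stale rows already persisted in system.peers.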