[jira] [Commented] (CASSANDRA-14848) When upgrading 3.11.3->4.0 using SSL 4.0 nodes does not connect to old non seed nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-14848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17129873#comment-17129873 ]

Jon Meredith commented on CASSANDRA-14848:
------------------------------------------

I'm glad it's fixed for you. Sorry I missed your original patch when working on the fix; I didn't even think to look for it, as I thought the problem was due to the internode message refactor.

Closing this ticket.

> When upgrading 3.11.3->4.0 using SSL 4.0 nodes does not connect to old non seed nodes
> -------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-14848
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14848
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Messaging/Internode
>            Reporter: Tommy Stendahl
>            Assignee: Tommy Stendahl
>            Priority: Urgent
>              Labels: security
>             Fix For: 4.0-beta
>
>
> When upgrading from 3.11.3 to 4.0 with server encryption enabled, the new 4.0 node only connects to the 3.11.3 seed node; no connections are established to non-seed nodes on the old version.
> I have four nodes: *.242 is upgraded to 4.0, *.243 and *.244 are 3.11.3 non-seed, and *.246 is the 3.11.3 seed. After starting the 4.0 node I get this nodetool status on the different nodes:
> {noformat}
> *.242
> --  Address         Load         Tokens  Owns (effective)  Host ID                               Rack
> UN  10.216.193.242  1017.77 KiB  256     75,1%             7d278e14-d549-42f3-840d-77cfd852fbf4  RAC1
> DN  10.216.193.243  743.32 KiB   256     74,8%             5586243a-ca74-4125-8e7e-09e82e23c4e5  RAC1
> DN  10.216.193.244  711.54 KiB   256     75,2%             c155e262-b898-4e86-9e1d-d4d0f97e88f6  RAC1
> UN  10.216.193.246  659.81 KiB   256     74,9%             502dd00f-fc02-4024-b65f-b98ba3808291  RAC1
> *.243 and *.244
> --  Address         Load         Tokens  Owns (effective)  Host ID                               Rack
> DN  10.216.193.242  657.4 KiB    256     75,1%             7d278e14-d549-42f3-840d-77cfd852fbf4  RAC1
> UN  10.216.193.243  471 KiB      256     74,8%             5586243a-ca74-4125-8e7e-09e82e23c4e5  RAC1
> UN  10.216.193.244  471.71 KiB   256     75,2%             c155e262-b898-4e86-9e1d-d4d0f97e88f6  RAC1
> UN  10.216.193.246  388.54 KiB   256     74,9%             502dd00f-fc02-4024-b65f-b98ba3808291  RAC1
> *.246
> --  Address         Load         Tokens  Owns (effective)  Host ID                               Rack
> UN  10.216.193.242  657.4 KiB    256     75,1%             7d278e14-d549-42f3-840d-77cfd852fbf4  RAC1
> UN  10.216.193.243  471 KiB      256     74,8%             5586243a-ca74-4125-8e7e-09e82e23c4e5  RAC1
> UN  10.216.193.244  471.71 KiB   256     75,2%             c155e262-b898-4e86-9e1d-d4d0f97e88f6  RAC1
> UN  10.216.193.246  388.54 KiB   256     74,9%             502dd00f-fc02-4024-b65f-b98ba3808291  RAC1
> {noformat}
>
> I have built 4.0 with wire tracing activated, and in my config storage_port=12700 and ssl_storage_port=12701. In the log I can see that the 4.0 node starts to connect to the 3.11.3 seed node on the storage_port but quickly switches to the ssl_storage_port. When connecting to the non-seed nodes, however, it never switches to the ssl_storage_port.
> {noformat}
> >grep 193.246 system.log | grep Outbound
> 2018-10-25T10:57:36.799+0200 [MessagingService-NettyOutbound-Thread-4-1] INFO i.n.u.internal.logging.Slf4JLogger:101 info [id: 0x2f0e5e55] CONNECT: /10.216.193.246:12700
> 2018-10-25T10:57:36.902+0200 [MessagingService-NettyOutbound-Thread-4-2] INFO i.n.u.internal.logging.Slf4JLogger:101 info [id: 0x9e81f62c] CONNECT: /10.216.193.246:12701
> 2018-10-25T10:57:36.905+0200 [MessagingService-NettyOutbound-Thread-4-2] INFO i.n.u.internal.logging.Slf4JLogger:101 info [id: 0x9e81f62c, L:/10.216.193.242:37252 - R:10.216.193.246/10.216.193.246:12701] ACTIVE
> 2018-10-25T10:57:36.906+0200 [MessagingService-NettyOutbound-Thread-4-2] INFO i.n.u.internal.logging.Slf4JLogger:101 info [id: 0x9e81f62c, L:/10.216.193.242:37252 - R:10.216.193.246/10.216.193.246:12701] WRITE: 8B
> >grep 193.243 system.log | grep Outbound
> 2018-10-25T10:57:38.438+0200 [MessagingService-NettyOutbound-Thread-4-3] INFO i.n.u.internal.logging.Slf4JLogger:101 info [id: 0xd8f1d6c4] CONNECT: /10.216.193.243:12700
> 2018-10-25T10:57:38.540+0200 [MessagingService-NettyOutbound-Thread-4-4] INFO i.n.u.internal.logging.Slf4JLogger:101 info [id: 0xfde6cc9f] CONNECT: /10.216.193.243:12700
> 2018-10-25T10:57:38.694+0200 [MessagingService-NettyOutbound-Thread-4-5] INFO i.n.u.internal.logging.Slf4JLogger:101 info [id: 0x7e87fc4e] CONNECT: /10.216.193.243:12700
> 2018-10-25T10:57:38.741+0200 [MessagingService-NettyOutbound-Thread-4-7] INFO i.n.u.internal.logging.Slf4JLogger:101 info [id: 0x39395296] CONNECT: /10.216.193.243:12700{noformat}
>
> With the debug log activated, I can see when starting the 4.0 node that it switches port for *.246 but not for *.243 and *.244.
> {noformat}
> >grep DEBUG system.log| grep OutboundMessagingConnection | grep maybeUpdateConnectionId
> 2018-10-25T13:12:56.095+0200 [ScheduledFastTasks:1] DEBUG o.a.c.n.a.OutboundMessagingConnection:314 maybeUpdateConnectionId changing connectionId to 10.216.193.246:12701 (GOSSIP),
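The two manual checks in the issue description (which nodes are reported DN, and which peers were only ever contacted on the plain storage_port) can be scripted. The following is a minimal sketch, not part of Cassandra or nodetool: the function names `down_nodes` and `peers_without_ssl_port` are hypothetical, and it assumes that each log entry sits on a single line in the real system.log (the wrapping seen above comes from the email quoting) and that wire tracing is enabled so that the Netty CONNECT lines are present.

```shell
#!/bin/sh
# Sketch only: these helper names are hypothetical, not Cassandra tooling.

# Print the address of every node that `nodetool status` output
# (read from stdin) reports in state DN (down/normal).
down_nodes() {
  awk '$1 == "DN" { print $2 }'
}

# Print each peer that shows up in Netty "CONNECT: /ip:port" log lines
# (read from stdin) but was never contacted on the given SSL storage port.
peers_without_ssl_port() {
  ssl_port="$1"
  # Reduce every CONNECT line to "ip port", then group by ip in awk.
  sed -n 's|.*CONNECT: /\([0-9.]*\):\([0-9]*\).*|\1 \2|p' \
    | awk -v ssl="$ssl_port" '
        { seen[$1] = 1; if ($2 == ssl) ok[$1] = 1 }
        END { for (p in seen) if (!(p in ok)) print p }' \
    | sort
}
```

With the ports from the report, this would be called as `nodetool status | down_nodes` and `peers_without_ssl_port 12701 < system.log` on the upgraded node. Against the excerpts quoted above, the second call would be expected to list 10.216.193.243 and not 10.216.193.246, since only the seed was ever contacted on port 12701.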
[jira] [Commented] (CASSANDRA-14848) When upgrading 3.11.3->4.0 using SSL 4.0 nodes does not connect to old non seed nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-14848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112204#comment-17112204 ]

Tommy Stendahl commented on CASSANDRA-14848:
--------------------------------------------

I retested this on the latest trunk and the problem is solved; I think it was solved in CASSANDRA-15727. I verified this with [~eperott]'s procedure above, using ccm. I also tested starting the last node with enable_legacy_ssl_storage_port set to false, and that worked as well.

> When upgrading 3.11.3->4.0 using SSL 4.0 nodes does not connect to old non seed nodes
> -------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-14848
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14848
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Messaging/Internode
>            Reporter: Tommy Stendahl
>            Assignee: Tommy Stendahl
>            Priority: Urgent
>              Labels: security
>             Fix For: 4.0-beta
>
>
> When upgrading from 3.11.3 to 4.0 with server encryption enabled, the new 4.0 node only connects to the 3.11.3 seed node; no connections are established to non-seed nodes on the old version.
> I have four nodes: *.242 is upgraded to 4.0, *.243 and *.244 are 3.11.3 non-seed, and *.246 is the 3.11.3 seed.
[jira] [Commented] (CASSANDRA-14848) When upgrading 3.11.3->4.0 using SSL 4.0 nodes does not connect to old non seed nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-14848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16829276#comment-16829276 ]

Per Otterström commented on CASSANDRA-14848:
--------------------------------------------

Tested this in a local ccm cluster. Without the patch I'm able to reproduce the reported issue, and I can also verify that the patch solves the issue.

However, I get this error when I set {{enable_legacy_ssl_storage_port: false}} _on the last node_, which I didn't expect:
{code:java}
ERROR [main] 2019-04-29 11:39:12,995 CassandraDaemon.java:743 - Exception encountered during startup
java.lang.RuntimeException: Unable to gossip with any peers
	at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1546)
	at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:553)
	at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:841)
	at org.apache.cassandra.service.StorageService.initServer(StorageService.java:699)
	at org.apache.cassandra.service.StorageService.initServer(StorageService.java:650)
	at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:379)
	at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:609)
	at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:721)
{code}
Procedure to reproduce:
{code}
# Create and start cluster
ccm create -v 3.11.4 -n 4 --node-ssl=~/cert c14848
ccm start

# Upgrade node1
ccm node1 stop
ccm node1 setdir
# Make node1 and node2 the seeds
sed -i 's/seeds.*/seeds: 127.0.0.1,127.0.0.2/' ~/.ccm/c14848/node1/conf/cassandra.yaml
# Enable the legacy SSL storage port
sed -i 's/enable_legacy_ssl_storage_port: false/enable_legacy_ssl_storage_port: true/' ~/.ccm/c14848/node1/conf/cassandra.yaml
# Also flip false to true on the line that follows the enable_legacy_ssl_storage_port setting
sed -i '/enable_legacy_ssl_storage_port/{n;s/false/true/}' ~/.ccm/c14848/node1/conf/cassandra.yaml
ccm node1 start

# Repeat steps above on node2 and node3 as well
# Repeat steps on node4 but let enable_legacy_ssl_storage_port be false.
{code}
Note that if I perform the upgrade with the legacy SSL storage port enabled on the final node as well, the upgrade completes without errors. Once that is done, it is possible to do a rolling restart while disabling the legacy port.

> When upgrading 3.11.3->4.0 using SSL 4.0 nodes does not connect to old non seed nodes
> -------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-14848
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14848
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Messaging/Internode
>            Reporter: Tommy Stendahl
>            Assignee: Tommy Stendahl
>            Priority: Urgent
>              Labels: security
>             Fix For: 4.0
>
>
> When upgrading from 3.11.3 to 4.0 with server encryption enabled, the new 4.0 node only connects to the 3.11.3 seed node; no connections are established to non-seed nodes on the old version.
> I have four nodes: *.242 is upgraded to 4.0, *.243 and *.244 are 3.11.3 non-seed, and *.246 is the 3.11.3 seed.
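The final rolling restart that Per Otterström mentions (disabling the legacy port once every node has been upgraded) might be scripted roughly as follows. This is a sketch under assumptions: the function names `disable_legacy_ssl_port` and `rolling_disable` are hypothetical, the cluster name c14848 and the four-node layout are taken from the procedure above, and the setting is assumed to appear uncommented on its own line in cassandra.yaml.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical; paths follow the ccm procedure above.

# Rewrite the enable_legacy_ssl_storage_port line of one cassandra.yaml to "false",
# keeping its indentation. Assumes the setting is present and not commented out.
disable_legacy_ssl_port() {
  yaml="$1"
  tmp="$(mktemp)" || return 1
  sed 's/^\( *enable_legacy_ssl_storage_port:\).*/\1 false/' "$yaml" > "$tmp" \
    && cat "$tmp" > "$yaml"   # write back in place so file permissions are kept
  status=$?
  rm -f "$tmp"
  return "$status"
}

# Restart the four ccm nodes one at a time with the legacy port disabled.
# Defined but not invoked here; call it by hand once the upgrade has completed.
rolling_disable() {
  for n in 1 2 3 4; do
    ccm "node$n" stop
    disable_legacy_ssl_port "$HOME/.ccm/c14848/node$n/conf/cassandra.yaml"
    ccm "node$n" start
  done
}
```

Restarting one node at a time keeps the other three serving while each node comes back without the legacy listener. Whether a pause or a health check is needed between nodes depends on the cluster and is left out of this sketch.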
[jira] [Commented] (CASSANDRA-14848) When upgrading 3.11.3->4.0 using SSL 4.0 nodes does not connect to old non seed nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-14848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16773739#comment-16773739 ]

Tommy Stendahl commented on CASSANDRA-14848:
--------------------------------------------

Patch is available here: [cassandra-14848|https://github.com/tommystendahl/cassandra/commit/d2a9cfe87c0e41e20ea43a75bab76f2cba8e293c]

> When upgrading 3.11.3->4.0 using SSL 4.0 nodes does not connect to old non seed nodes
> -------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-14848
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14848
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Messaging/Internode
>            Reporter: Tommy Stendahl
>            Assignee: Tommy Stendahl
>            Priority: Blocker
>              Labels: security
>             Fix For: 4.0
>
>
> When upgrading from 3.11.3 to 4.0 with server encryption enabled the new 4.0 node only connects to 3.11.3 seed node, there are no connection established to non-seed nodes on the old version.
> I have four nodes, *.242 is upgraded to 4.0, *.243 and *.244 are 3.11.3 non-seed and *.246 are 3.11.3 seed.
[jira] [Commented] (CASSANDRA-14848) When upgrading 3.11.3->4.0 using SSL 4.0 nodes does not connect to old non seed nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-14848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16773152#comment-16773152 ]

C. Scott Andreas commented on CASSANDRA-14848:
----------------------------------------------

It looks like an anonymous user modified the state of this ticket to "Ready to Commit" yesterday. Moving back to "Patch Available" because I don't see a reviewer assigned.

> When upgrading 3.11.3->4.0 using SSL 4.0 nodes does not connect to old non seed nodes
> -------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-14848
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14848
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Messaging/Internode
>            Reporter: Tommy Stendahl
>            Assignee: Tommy Stendahl
>            Priority: Blocker
>              Labels: security
>             Fix For: 4.0
>
>
> When upgrading from 3.11.3 to 4.0 with server encryption enabled the new 4.0 node only connects to 3.11.3 seed node, there are no connection established to non-seed nodes on the old version.
> I have four nodes, *.242 is upgraded to 4.0, *.243 and *.244 are 3.11.3 non-seed and *.246 are 3.11.3 seed.
[jira] [Commented] (CASSANDRA-14848) When upgrading 3.11.3->4.0 using SSL 4.0 nodes does not connect to old non seed nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-14848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16703919#comment-16703919 ]

Ariel Weisberg commented on CASSANDRA-14848:
--------------------------------------------

The cause was CASSANDRA-14896: InetAddressAndPort was being serialized in a format that 3.0 nodes can't understand, which led to infinite reconnects without progress.

> When upgrading 3.11.3->4.0 using SSL 4.0 nodes does not connect to old non seed nodes
> -------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-14848
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14848
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Streaming and Messaging
>            Reporter: Tommy Stendahl
>            Assignee: Tommy Stendahl
>            Priority: Blocker
>              Labels: security
>
> When upgrading from 3.11.3 to 4.0 with server encryption enabled, the new 4.0 node only connects to the 3.11.3 seed node; no connections are established to non-seed nodes on the old version.
> I have four nodes: *.242 is upgraded to 4.0, *.243 and *.244 are 3.11.3 non-seed, and *.246 is the 3.11.3 seed.
[jira] [Commented] (CASSANDRA-14848) When upgrading 3.11.3->4.0 using SSL 4.0 nodes does not connect to old non seed nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-14848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16703672#comment-16703672 ]

Ariel Weisberg commented on CASSANDRA-14848:
--------------------------------------------

Huh, I think I am reproducing this issue without SSL. The upgraded 4.0 node keeps reconnecting to the 3.0 node, and I don't know why yet.

> When upgrading 3.11.3->4.0 using SSL 4.0 nodes does not connect to old non seed nodes
> -------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-14848
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14848
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Streaming and Messaging
>            Reporter: Tommy Stendahl
>            Assignee: Tommy Stendahl
>            Priority: Blocker
>              Labels: security
>
> When upgrading from 3.11.3 to 4.0 with server encryption enabled, the new 4.0 node only connects to the 3.11.3 seed node; no connections are established to non-seed nodes on the old version.
> I have four nodes: *.242 is upgraded to 4.0, *.243 and *.244 are 3.11.3 non-seed, and *.246 is the 3.11.3 seed.
After starting the 4.0 node I get this > nodetool status on the different nodes: > {noformat} > *.242 > -- Address Load Tokens Owns (effective) Host ID Rack > UN 10.216.193.242 1017.77 KiB 256 75,1% 7d278e14-d549-42f3-840d-77cfd852fbf4 > RAC1 > DN 10.216.193.243 743.32 KiB 256 74,8% 5586243a-ca74-4125-8e7e-09e82e23c4e5 > RAC1 > DN 10.216.193.244 711.54 KiB 256 75,2% c155e262-b898-4e86-9e1d-d4d0f97e88f6 > RAC1 > UN 10.216.193.246 659.81 KiB 256 74,9% 502dd00f-fc02-4024-b65f-b98ba3808291 > RAC1 > *.243 and *.244 > -- Address Load Tokens Owns (effective) Host ID Rack > DN 10.216.193.242 657.4 KiB 256 75,1% 7d278e14-d549-42f3-840d-77cfd852fbf4 > RAC1 > UN 10.216.193.243 471 KiB 256 74,8% 5586243a-ca74-4125-8e7e-09e82e23c4e5 RAC1 > UN 10.216.193.244 471.71 KiB 256 75,2% c155e262-b898-4e86-9e1d-d4d0f97e88f6 > RAC1 > UN 10.216.193.246 388.54 KiB 256 74,9% 502dd00f-fc02-4024-b65f-b98ba3808291 > RAC1 > *.246 > -- Address Load Tokens Owns (effective) Host ID Rack > UN 10.216.193.242 657.4 KiB 256 75,1% 7d278e14-d549-42f3-840d-77cfd852fbf4 > RAC1 > UN 10.216.193.243 471 KiB 256 74,8% 5586243a-ca74-4125-8e7e-09e82e23c4e5 RAC1 > UN 10.216.193.244 471.71 KiB 256 75,2% c155e262-b898-4e86-9e1d-d4d0f97e88f6 > RAC1 > UN 10.216.193.246 388.54 KiB 256 74,9% 502dd00f-fc02-4024-b65f-b98ba3808291 > RAC1 > {noformat} > > I have built 4.0 with wire tracing activated and in my config the > storage_port=12700 and ssl_storage_port=12701. In the log I can see that the > 4.0 node start to connect to the 3.11.3 seed node on the storage_port but > quickly switch to the ssl_storage_port, but when connecting to the non-seed > nodes it never switch to the ssl_storage_port. 
> {noformat} > >grep 193.246 system.log | grep Outbound > 2018-10-25T10:57:36.799+0200 [MessagingService-NettyOutbound-Thread-4-1] INFO > i.n.u.internal.logging.Slf4JLogger:101 info [id: 0x2f0e5e55] CONNECT: > /10.216.193.246:12700 > 2018-10-25T10:57:36.902+0200 [MessagingService-NettyOutbound-Thread-4-2] INFO > i.n.u.internal.logging.Slf4JLogger:101 info [id: 0x9e81f62c] CONNECT: > /10.216.193.246:12701 > 2018-10-25T10:57:36.905+0200 [MessagingService-NettyOutbound-Thread-4-2] INFO > i.n.u.internal.logging.Slf4JLogger:101 info [id: 0x9e81f62c, > L:/10.216.193.242:37252 - R:10.216.193.246/10.216.193.246:12701] ACTIVE > 2018-10-25T10:57:36.906+0200 [MessagingService-NettyOutbound-Thread-4-2] INFO > i.n.u.internal.logging.Slf4JLogger:101 info [id: 0x9e81f62c, > L:/10.216.193.242:37252 - R:10.216.193.246/10.216.193.246:12701] WRITE: 8B > >grep 193.243 system.log | grep Outbound > 2018-10-25T10:57:38.438+0200 [MessagingService-NettyOutbound-Thread-4-3] INFO > i.n.u.internal.logging.Slf4JLogger:101 info [id: 0xd8f1d6c4] CONNECT: > /10.216.193.243:12700 > 2018-10-25T10:57:38.540+0200 [MessagingService-NettyOutbound-Thread-4-4] INFO > i.n.u.internal.logging.Slf4JLogger:101 info [id: 0xfde6cc9f] CONNECT: > /10.216.193.243:12700 > 2018-10-25T10:57:38.694+0200 [MessagingService-NettyOutbound-Thread-4-5] INFO > i.n.u.internal.logging.Slf4JLogger:101 info [id: 0x7e87fc4e] CONNECT: > /10.216.193.243:12700 > 2018-10-25T10:57:38.741+0200 [MessagingService-NettyOutbound-Thread-4-7] INFO > i.n.u.internal.logging.Slf4JLogger:101 info [id: 0x39395296] CONNECT: > /10.216.193.243:12700{noformat} > > When I had the dbug log activated and started the 4.0 node I can see that it > switch port for *.246 but not for *.243 and *.244. 
> {noformat}
> >grep DEBUG system.log| grep OutboundMessagingConnection | grep maybeUpdateConnectionId
> 2018-10-25T13:12:56.095+0200 [ScheduledFastTasks:1] DEBUG o.a.c.n.a.OutboundMessagingConnection:314 maybeUpdateConnectionId changing connectionId to 10.216.193.246:12701 (GOSSIP),
[jira] [Commented] (CASSANDRA-14848) When upgrading 3.11.3->4.0 using SSL 4.0 nodes does not connect to old non seed nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-14848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16702232#comment-16702232 ]

Ariel Weisberg commented on CASSANDRA-14848:

[~tommy_s] there is an issue with 4.0 nodes advertising the wrong max version when they connect. Once that is fixed I think the migration issues might go away.

> When upgrading 3.11.3->4.0 using SSL 4.0 nodes does not connect to old non seed nodes
> -
>
> Key: CASSANDRA-14848
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14848
> Project: Cassandra
> Issue Type: Bug
> Components: Streaming and Messaging
> Reporter: Tommy Stendahl
> Assignee: Tommy Stendahl
> Priority: Major
> Labels: security
>
> When upgrading from 3.11.3 to 4.0 with server encryption enabled, the new 4.0 node only connects to the 3.11.3 seed node; no connections are established to non-seed nodes on the old version.
> I have four nodes: *.242 is upgraded to 4.0, *.243 and *.244 are 3.11.3 non-seed nodes and *.246 is the 3.11.3 seed.
[jira] [Commented] (CASSANDRA-14848) When upgrading 3.11.3->4.0 using SSL 4.0 nodes does not connect to old non seed nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-14848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16693242#comment-16693242 ]

Tommy Stendahl commented on CASSANDRA-14848:

I have done some more troubleshooting on the new exception I got and I don't think it's related to my patch; I get the same issue without my patch as well. It's probably related to CASSANDRA-14896. My patch does solve the issue I reported in this jira, and with it the new node selects the correct port for all old nodes. Patch is here: [cassandra-14848|https://github.com/tommystendahl/cassandra/tree/cassandra-14848]

> When upgrading 3.11.3->4.0 using SSL 4.0 nodes does not connect to old non seed nodes
> -
>
> Key: CASSANDRA-14848
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14848
> Project: Cassandra
> Issue Type: Bug
> Components: Streaming and Messaging
> Reporter: Tommy Stendahl
> Priority: Major
> Labels: security
>
> When upgrading from 3.11.3 to 4.0 with server encryption enabled, the new 4.0 node only connects to the 3.11.3 seed node; no connections are established to non-seed nodes on the old version.
> I have four nodes: *.242 is upgraded to 4.0, *.243 and *.244 are 3.11.3 non-seed nodes and *.246 is the 3.11.3 seed.
[jira] [Commented] (CASSANDRA-14848) When upgrading 3.11.3->4.0 using SSL 4.0 nodes does not connect to old non seed nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-14848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16691804#comment-16691804 ]

Tommy Stendahl commented on CASSANDRA-14848:

I thought the new exception I got might be the same issue as CASSANDRA-14896 reported by [~aweisberg], since we both upgraded from 3.0->4.0 and got the same exception. So I tried to upgrade from 3.11.3->4.0 and expected not to get this exception, but I still get the same exception one minute after the old node detects the new node as UP.
{noformat}
2018-11-19T15:13:52.061+0100 [GossipStage:1] INFO o.a.cassandra.service.StorageService:2289 handleStateNormal Node /10.216.193.242 state jump to NORMAL
2018-11-19T15:13:52.062+0100 [RequestResponseStage-1] INFO org.apache.cassandra.gms.Gossiper:1019 realMarkAlive InetAddress /10.216.193.242 is now UP
2018-11-19T15:14:52.072+0100 [MessagingService-Incoming-/10.216.193.242] ERROR o.a.c.service.CassandraDaemon$2:228 uncaughtException Exception in thread Thread[MessagingService-Incoming-/10.216.193.242,5,main]
java.lang.RuntimeException: Unknown column additional_write_policy during deserialization
 at org.apache.cassandra.db.Columns$Serializer.deserialize(Columns.java:452) ~[apache-cassandra-3.11.3.jar:3.11.3]
 at org.apache.cassandra.db.SerializationHeader$Serializer.deserializeForMessaging(SerializationHeader.java:412) ~[apache-cassandra-3.11.3.jar:3.11.3]
 at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.deserializeHeader(UnfilteredRowIteratorSerializer.java:195) ~[apache-cassandra-3.11.3.jar:3.11.3]
 at org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize30(PartitionUpdate.java:851) ~[apache-cassandra-3.11.3.jar:3.11.3]
 at org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize(PartitionUpdate.java:839) ~[apache-cassandra-3.11.3.jar:3.11.3]
 at org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:425) ~[apache-cassandra-3.11.3.jar:3.11.3]
 at org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:434) ~[apache-cassandra-3.11.3.jar:3.11.3]
 at org.apache.cassandra.service.MigrationManager$MigrationsSerializer.deserialize(MigrationManager.java:669) ~[apache-cassandra-3.11.3.jar:3.11.3]
 at org.apache.cassandra.service.MigrationManager$MigrationsSerializer.deserialize(MigrationManager.java:652) ~[apache-cassandra-3.11.3.jar:3.11.3]
 at org.apache.cassandra.net.MessageIn.read(MessageIn.java:123) ~[apache-cassandra-3.11.3.jar:3.11.3]
 at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:192) ~[apache-cassandra-3.11.3.jar:3.11.3]
 at org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:180) ~[apache-cassandra-3.11.3.jar:3.11.3]
 at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:94) ~[apache-cassandra-3.11.3.jar:3.11.3]
{noformat}

> When upgrading 3.11.3->4.0 using SSL 4.0 nodes does not connect to old non seed nodes
> -
>
> Key: CASSANDRA-14848
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14848
> Project: Cassandra
> Issue Type: Bug
> Components: Streaming and Messaging
> Reporter: Tommy Stendahl
> Priority: Major
> Labels: security
>
> When upgrading from 3.11.3 to 4.0 with server encryption enabled, the new 4.0 node only connects to the 3.11.3 seed node; no connections are established to non-seed nodes on the old version.
> I have four nodes: *.242 is upgraded to 4.0, *.243 and *.244 are 3.11.3 non-seed nodes and *.246 is the 3.11.3 seed.
[jira] [Commented] (CASSANDRA-14848) When upgrading 3.11.3->4.0 using SSL 4.0 nodes does not connect to old non seed nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-14848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16689456#comment-16689456 ]

Tommy Stendahl commented on CASSANDRA-14848:

I have created a patch that allows 4.0 nodes to connect to all 3.x nodes. It's available here: [cassandra-14848|https://github.com/tommystendahl/cassandra/tree/cassandra-14848].

Unfortunately I got another exception in the log of the old nodes:
{noformat}
2018-11-16T13:48:15.165+0100 [MessagingService-Incoming-/10.216.193.242] ERROR o.a.c.service.CassandraDaemon$2:223 uncaughtException Exception in thread Thread[MessagingService-Incoming-/10.216.193.242,5,main]
java.lang.RuntimeException: Unknown column additional_write_policy during deserialization
 at org.apache.cassandra.db.Columns$Serializer.deserialize(Columns.java:433) ~[apache-cassandra-3.0.17.jar:3.0.17]
 at org.apache.cassandra.db.SerializationHeader$Serializer.deserializeForMessaging(SerializationHeader.java:440) ~[apache-cassandra-3.0.17.jar:3.0.17]
 at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.deserializeHeader(UnfilteredRowIteratorSerializer.java:190) ~[apache-cassandra-3.0.17.jar:3.0.17]
 at org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize30(PartitionUpdate.java:686) ~[apache-cassandra-3.0.17.jar:3.0.17]
 at org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize(PartitionUpdate.java:674) ~[apache-cassandra-3.0.17.jar:3.0.17]
 at org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:337) ~[apache-cassandra-3.0.17.jar:3.0.17]
 at org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:346) ~[apache-cassandra-3.0.17.jar:3.0.17]
 at org.apache.cassandra.service.MigrationManager$MigrationsSerializer.deserialize(MigrationManager.java:641) ~[apache-cassandra-3.0.17.jar:3.0.17]
 at org.apache.cassandra.service.MigrationManager$MigrationsSerializer.deserialize(MigrationManager.java:624) ~[apache-cassandra-3.0.17.jar:3.0.17]
 at org.apache.cassandra.net.MessageIn.read(MessageIn.java:98) ~[apache-cassandra-3.0.17.jar:3.0.17]
 at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:201) ~[apache-cassandra-3.0.17.jar:3.0.17]
 at org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:178) ~[apache-cassandra-3.0.17.jar:3.0.17]
 at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:92) ~[apache-cassandra-3.0.17.jar:3.0.17]
{noformat}
It appears once or twice, about one minute after the old node has detected the new node as being UP:
{noformat}
2018-11-16T13:47:15.148+0100 [GossipStage:1] INFO org.apache.cassandra.gms.Gossiper:1040 handleMajorStateChange Node /10.216.193.242 has restarted, now UP
2018-11-16T13:47:15.149+0100 [GossipStage:1] INFO o.a.cassandra.service.StorageService:2024 handleStateNormal Node /10.216.193.242 state jump to NORMAL
2018-11-16T13:48:15.165+0100 [MessagingService-Incoming-/10.216.193.242] ERROR o.a.c.service.CassandraDaemon$2:223 uncaughtException Exception in thread Thread[MessagingService-Incoming-/10.216.193.242,5,main]
java.lang.RuntimeException: Unknown column additional_write_policy during deserialization
{noformat}
So far I have not found this to cause any problems besides printing an unexpected exception in the log. Also, I'm not sure if we should consider this a new issue or if my patch is wrong (or missing something).

> When upgrading 3.11.3->4.0 using SSL 4.0 nodes does not connect to old non seed nodes
> -
>
> Key: CASSANDRA-14848
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14848
> Project: Cassandra
> Issue Type: Bug
> Components: Streaming and Messaging
> Reporter: Tommy Stendahl
> Priority: Major
> Labels: security
>
> When upgrading from 3.11.3 to 4.0 with server encryption enabled, the new 4.0 node only connects to the 3.11.3 seed node; no connections are established to non-seed nodes on the old version.
[jira] [Commented] (CASSANDRA-14848) When upgrading 3.11.3->4.0 using SSL 4.0 nodes does not connect to old non seed nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-14848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16663735#comment-16663735 ]

Tommy Stendahl commented on CASSANDRA-14848:

I think the problem is in {{OutboundMessagingConnection.maybeUpdateConnectionId()}} in combination with line 186 in that class:
{code:java}
targetVersion = MessagingService.instance().getVersion(connectionId.remote());
{code}
What happens is that when the {{OutboundMessagingConnection}} is created for the seed node, {{targetVersion}} is set to 12 since we don't know the version of that node yet. When we get incoming messages from the old seed node we detect that it has a lower version, and the if statement in {{maybeUpdateConnectionId()}} will be true:
{code:java}
if (version < targetVersion)
{code}
and we will change the port. But when creating the {{OutboundMessagingConnection}} for the non-seed nodes we already know their versions (from gossiping with the old seed), so on line 186 {{targetVersion}} will be set to 11 and the if statement in {{maybeUpdateConnectionId()}} will never be true, so we will continue using the wrong port. I verified this by hard-coding {{targetVersion=12}} on line 186, after which everything worked, but I don't think that's the proper fix.

> When upgrading 3.11.3->4.0 using SSL 4.0 nodes does not connect to old non seed nodes
> -
>
> Key: CASSANDRA-14848
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14848
> Project: Cassandra
> Issue Type: Bug
> Reporter: Tommy Stendahl
> Priority: Major
>
> When upgrading from 3.11.3 to 4.0 with server encryption enabled, the new 4.0 node only connects to the 3.11.3 seed node; no connections are established to non-seed nodes on the old version.
> I have four nodes: *.242 is upgraded to 4.0, *.243 and *.244 are 3.11.3 non-seed nodes and *.246 is the 3.11.3 seed.
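
The version check described in Tommy Stendahl's comment in this message can be illustrated with a small standalone model. This is only a sketch and not Cassandra's actual code: the class name PortSwitchSketch, the method portAfterFirstMessage and its parameters are hypothetical. The version numbers 12 and 11 come from the comment, and the ports 12700 and 12701 from the reporter's configuration. It assumes a connection always starts on the storage_port and moves to the ssl_storage_port only when the check fires.

```java
/** Toy model of the check described in the comment; not Cassandra's real classes. */
class PortSwitchSketch {
    // Values quoted in the ticket: 12 is used when the peer version is unknown,
    // 11 is the version of the 3.11.3 peers.
    static final int CURRENT_VERSION = 12;
    static final int LEGACY_VERSION = 11;
    // Ports from the reporter's configuration.
    static final int STORAGE_PORT = 12700;
    static final int SSL_STORAGE_PORT = 12701;

    /**
     * Hypothetical helper: returns the port a connection ends up on after the
     * first incoming message from the peer.
     *
     * @param versionKnownAtCreation peer version already learned (for example
     *                               via gossip), or null if not yet known
     * @param observedVersion        version seen on the incoming message
     */
    static int portAfterFirstMessage(Integer versionKnownAtCreation, int observedVersion) {
        // Stands in for the assignment quoted from line 186: an unknown peer
        // gets the current version, a known peer gets its recorded version.
        int targetVersion = (versionKnownAtCreation != null)
                ? versionKnownAtCreation
                : CURRENT_VERSION;

        int port = STORAGE_PORT; // assumption: connections start on storage_port

        // Stands in for the quoted check "if (version < targetVersion)".
        if (observedVersion < targetVersion) {
            port = SSL_STORAGE_PORT;
        }
        return port;
    }

    public static void main(String[] args) {
        // Seed node: version unknown at creation, so the check fires.
        System.out.println("seed:     " + portAfterFirstMessage(null, LEGACY_VERSION));
        // Non-seed node: version already known as 11, so the check never fires.
        System.out.println("non-seed: " + portAfterFirstMessage(LEGACY_VERSION, LEGACY_VERSION));
    }
}
```

In this model the seed case ends on 12701 and the non-seed case stays on 12700, which is consistent with the CONNECT lines in the wire trace quoted in the issue description.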