Alexander Dejanovski created CASSANDRA-15878:
------------------------------------------------

             Summary: Ec2Snitch fails on upgrade in legacy mode
                 Key: CASSANDRA-15878
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15878
             Project: Cassandra
          Issue Type: Bug
            Reporter: Alexander Dejanovski


CASSANDRA-7839 changed the way the EC2 DC/Rack naming was handled in the 
Ec2Snitch to match AWS conventions.

The "legacy" mode was introduced to allow upgrades from Cassandra 3.0/3.x and 
keep the same naming as before (while the "standard" mode uses the new naming 
convention).

When performing an upgrade in the us-west-2 region, the second node failed to 
start with the following exception:

 
{code:java}
ERROR [main] 2020-06-16 09:14:42,218 Ec2Snitch.java:210 - This ec2-enabled snitch appears to be using the legacy naming scheme for regions, but existing nodes in cluster are using the opposite: region(s) = [us-west-2], availability zone(s) = [2a]. Please check the ec2_naming_scheme property in the cassandra-rackdc.properties configuration file for more details.
ERROR [main] 2020-06-16 09:14:42,219 CassandraDaemon.java:789 - Exception encountered during startup
java.lang.IllegalStateException: null
        at org.apache.cassandra.service.StorageService.validateEndpointSnitch(StorageService.java:573)
        at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:530)
        at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:800)
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:659)
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:610)
        at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:373)
        at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:650)
        at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:767)
{code}
 

The exception leads back to [this piece of code|https://github.com/apache/cassandra/blob/cassandra-4.0-alpha4/src/java/org/apache/cassandra/locator/Ec2Snitch.java#L183-L185].

After adding some logging, it turned out the DC name of the first upgraded node 
was considered invalid as a legacy one:
{code:java}
INFO  [main] 2020-06-16 09:14:42,216 Ec2Snitch.java:183 - Detected DC us-west-2
INFO  [main] 2020-06-16 09:14:42,217 Ec2Snitch.java:185 - dcUsesLegacyFormat=false / usingLegacyNaming=true
ERROR [main] 2020-06-16 09:14:42,217 Ec2Snitch.java:188 - Invalid DC name us-west-2
{code}
 

The problem is that the regex used to identify legacy DC names matches both old and new names:
{code:java}
boolean dcUsesLegacyFormat = !dc.matches("[a-z]+-[a-z].+-[\\d].*");
{code}
Since some DC names did not change between the two modes (us-west-2, for 
example), I don't see how we can use DC names to detect whether other nodes 
in the cluster are using the legacy mode.
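
To illustrate the ambiguity, here is a minimal sketch that applies the regex quoted above to a few DC names. The class name {{DcRegexDemo}} and its helper method are hypothetical, and {{us-east}} is assumed to be the legacy DC name for the us-east-1 region; only the pattern itself is taken from the snitch code.

```java
import java.util.regex.Pattern;

public class DcRegexDemo {
    // Pattern copied from the Ec2Snitch check quoted above.
    private static final Pattern DC_PATTERN = Pattern.compile("[a-z]+-[a-z].+-[\\d].*");

    // Mirrors the snitch logic: a DC counts as "legacy" when it does NOT match.
    static boolean dcUsesLegacyFormat(String dc) {
        return !DC_PATTERN.matcher(dc).matches();
    }

    public static void main(String[] args) {
        // "us-east" is an assumed legacy name; the other two are plain region names.
        String[] dcs = {"us-west-2", "us-east-1", "us-east"};
        for (String dc : dcs) {
            System.out.println(dc + " -> dcUsesLegacyFormat=" + dcUsesLegacyFormat(dc));
        }
    }
}
```

With this pattern, us-west-2 is always reported as non-legacy, which is consistent with the {{dcUsesLegacyFormat=false / usingLegacyNaming=true}} log line above even though the cluster was running in legacy mode.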
  
The rack names, on the other hand, are completely different in the legacy and 
standard modes and can be used to detect mismatched settings.
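
A rack-only check could look roughly like the sketch below. It is not the actual Ec2Snitch implementation: the class, the method name and the {{\d[a-z]}} pattern are assumptions, based on the legacy rack {{2a}} shown in the error message and on standard-mode racks being assumed to carry the full availability zone name (for example {{us-west-2a}}).

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.regex.Pattern;

public class RackNamingCheck {
    // Assumed legacy rack shape: one digit followed by one letter, e.g. "2a".
    private static final Pattern LEGACY_RACK = Pattern.compile("\\d[a-z]");

    // Returns true when every known rack name agrees with the local naming scheme.
    static boolean racksMatchScheme(Collection<String> racks, boolean usingLegacyNaming) {
        for (String rack : racks) {
            boolean rackUsesLegacyFormat = LEGACY_RACK.matcher(rack).matches();
            if (rackUsesLegacyFormat != usingLegacyNaming) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Legacy-style racks checked against a node configured for legacy naming.
        System.out.println(racksMatchScheme(Arrays.asList("2a", "2b"), true));
        // An assumed standard-style rack checked against the same legacy node.
        System.out.println(racksMatchScheme(Arrays.asList("us-west-2a"), true));
    }
}
```

Because the two rack formats do not overlap under these assumptions, the check does not depend on DC names that are identical in both modes.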
  
My go-to fix would be to drop the check on datacenters by removing the 
following lines: 
[https://github.com/apache/cassandra/blob/cassandra-4.0-alpha4/src/java/org/apache/cassandra/locator/Ec2Snitch.java#L172-L186]



