It could be an issue with the keystore/truststore. You may want to run
keytool -list to validate the files and passwords; also run md5sum on the
files from one node in west and one node in east. Then check SSL port 7001:
from one node in west, run telnet <node in east> 7001 (or the custom port if
you are not using the default port).
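The file and port checks suggested above could be scripted roughly as follows. This is a minimal Python sketch rather than anything from the thread: file_md5 and port_open are hypothetical helper names, and any paths or hosts passed to them are placeholders to fill in.

```python
import hashlib
import socket


def file_md5(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, comparable with md5sum output."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large keystores are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def port_open(host, port, timeout=3.0):
    """Return True if a plain TCP connection to host:port succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unreachable hosts alike.
        return False
```

Running file_md5 on the same keystore and truststore on one node per region and comparing the digests mirrors the md5sum check; calling port_open with an east node's address and 7001 from a west node mirrors the telnet check.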
On Monday, August 26, 2019, 05:46:19 PM PDT, Michael Carlise
<[email protected]> wrote:
Subroto -
Both tools error out; openssl fails with errno 111 (connection refused), which
made me check the bound ports on the C* node with encryption flipped on. Port
9042 is not open (determined by netstat -ant). I then compared the logs from
starting a node with and without encryption. Without encryption, I get a bunch
of lines like:
OutboundTcpConnection.java:561 - Handshaking version w/ IP
And this happens after a line like
Gossiper.java - Waiting for gossip to settle...
With encryption toggled to 'dc', I don't see any of those lines, presumably
because the gossiper is trying to start but doesn't.
On Mon, Aug 26, 2019 at 6:51 PM Subroto Barua <[email protected]>
wrote:
Michael,
Are you able to connect to any c* node via OpenSSL?
openssl s_client -connect <ip address>:9042
cqlsh <ip address> --ssl
Subroto
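A scripted equivalent of the openssl s_client check might look like the sketch below. It is only an illustration: probe_tls is a hypothetical helper, certificate verification is deliberately disabled because the goal is just to see whether a handshake completes, and it assumes the port under test actually speaks TLS. With the configuration quoted further down this thread (client_encryption_options disabled, encrypted messaging on SSL port 7001), that would be port 7001 rather than 9042.

```python
import socket
import ssl


def probe_tls(host, port, timeout=5.0):
    """Return a short status string describing how far a TLS handshake gets."""
    context = ssl.create_default_context()
    # Diagnostic only: accept any certificate so a handshake can be observed.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with context.wrap_socket(raw) as tls:
                return "tls ok: " + str(tls.version())
    except ConnectionRefusedError:
        # Same condition as errno 111: nothing is listening on that port.
        return "connection refused"
    except ssl.SSLError as exc:
        return "tls handshake failed: " + str(exc)
    except OSError as exc:
        # Timeouts, resets and unreachable hosts end up here.
        return "network error: " + str(exc)
```

A "connection refused" result points at the port not being bound at all, while a handshake failure points at the certificates or protocol settings.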
On Aug 26, 2019, at 2:47 PM, Marc Selwan <[email protected]> wrote:
Which exact version of OpenJDK are you using? Is it possible you don't have JCE
on those nodes? (I believe more recent versions of Java 8 have this baked in,
so that might not be it.)
Marc Selwan | DataStax | PM, Server Team | (925) 413-7079 | Twitter
On Mon, Aug 26, 2019 at 1:56 PM Michael Carlise
<[email protected]> wrote:
I originally opened this issue on stackoverflow
(https://stackoverflow.com/questions/57516660/cassandra-node-to-node-encryption-throws-unable-to-gossip-with-peers-exception).
However, I haven't gotten any responses in over a week. I'm going to post it
here and maybe someone will have an idea on where I can look.
We currently run a multi-region Cassandra cluster in AWS. It runs in four
regions, 12 nodes per region, without node-to-node encryption (or client
encryption). We are trying to enable inter-datacenter node-to-node encryption.
However, when we flip encryption on, we get an exception saying that nodes are
unable to gossip with any peers.
It is possible that we didn't build our JKS keystores/truststores correctly
(more on how we built these files below). However, we also do not see
intra-datacenter communication working, which should remain unencrypted with
this setting. In addition, cqlsh cannot connect to the node, even though
client_auth_required is left at its default of false.
ERROR [main] 2019-08-15 18:46:32,241 CassandraDaemon.java:749 - Exception encountered during startup
java.lang.RuntimeException: Unable to gossip with any peers
    at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1435) ~[apache-cassandra-3.11.4.jar:3.11.4]
    at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:566) ~[apache-cassandra-3.11.4.jar:3.11.4]
    at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:823) ~[apache-cassandra-3.11.4.jar:3.11.4]
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:683) ~[apache-cassandra-3.11.4.jar:3.11.4]
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:632) ~[apache-cassandra-3.11.4.jar:3.11.4]
    at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:388) [apache-cassandra-3.11.4.jar:3.11.4]
    at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:620) [apache-cassandra-3.11.4.jar:3.11.4]
    at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:732) [apache-cassandra-3.11.4.jar:3.11.4]
INFO [main] 2019-08-15 18:47:07,384 YamlConfigurationLoader.java:89 - Configuration location: file:/etc/cassandra/cassandra.yaml
Something to note is that this error message occurs after a few minutes of the
node being up. (i.e. there is a delay between start up before this exception is
thrown).
Information about our cassandra setup
cassandra version: 3.11.4
JDK version: openjdk-8.
Linux: Ubuntu 18.04 (bionic).
cassandra.yaml
endpoint_snitch: Ec2MultiRegionSnitch
server_encryption_options:
    internode_encryption: dc
    keystore: <omitted>
    keystore_password: <omitted>
    truststore: <omitted>
    truststore_password: <omitted>
client_encryption_options:
    enabled: false
cassandra-rackdc.properties
prefer_local=true
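For reference, the documented meaning of the internode_encryption values (none, all, dc, rack) can be sketched as a small decision function. This illustrates the expected behaviour under the configuration above and is not Cassandra's actual code; the function name and its arguments are hypothetical.

```python
def is_encrypted_channel(mode, local_dc, local_rack, peer_dc, peer_rack):
    """Return True if traffic between the two nodes is expected to be encrypted."""
    if mode == "none":
        return False
    if mode == "all":
        return True
    if mode == "dc":
        # Only traffic that crosses a datacenter boundary is encrypted.
        return local_dc != peer_dc
    if mode == "rack":
        # Traffic that crosses a rack (or datacenter) boundary is encrypted.
        return local_dc != peer_dc or local_rack != peer_rack
    raise ValueError("unknown internode_encryption mode: " + repr(mode))
```

Under this reading, with internode_encryption set to dc, two nodes in us-west-2 would still talk in the clear, while a us-west-2 node talking to a us-east node would use the SSL port.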
No obvious errors with SSL output
When starting cassandra with JVM_OPTS="$JVM_OPTS -Djavax.net.debug=ssl" added
to cassandra-env.sh we see SSL logs printed to stdout (Note: Subject and Issuer
were omitted on purpose).
found key for : cassy-us-west-2
adding as trusted cert:
Subject: ...
Issuer: ...
Algorithm: RSA; Serial number: 0xdad28d843fc73325d4c1a75207d4e74
Valid from Fri May 27 00:00:00 UTC 2016 until Tue May 26 23:59:59 UTC 2026
...
trigger seeding of SecureRandom
done seeding SecureRandom
Looking at Java SE SSL/TLS connection debugging, this looks correct. But to
note, we see this series of messages (along with the RSA key signature output)
repeated several times in rapid fire. We never observe any messages about the
trust store being added; however that might be something that occurs only on
client initiation (?)
Additionally, we do see cassandra report that the Encrypted Messaging service
has been started.
INFO [main] 2019-08-15 18:45:31,022 MessagingService.java:704 - Starting Encrypted Messaging Service on SSL port 7001
Doesn't appear to be a cassandra.yaml configuration problem
We can bring the node back online by simply configuring internode_encryption:
none. This action seems to rule out a broadcast_address or rpc_address
configuration problem.
How we built our keystore/truststores
We followed the basic template in the DataStax docs for preparing SSL
certificates. One minor difference is that our private keys and CSRs were
generated using openssl, one per region (we plan to share the key and signed
cert across the nodes in a region). They were created using a command template
like:
openssl req -new -newkey rsa:2048 -out cassy-<region>.csr \
    -keyout cassy-<region>.key -config cassy-<region>.conf \
    -subj "..." -nodes -sha256
The generated CSR was then signed by an internal root CA. Because we generated
our files using openssl, we had to build our jks files by importing our certs
into them.
Commands to generate truststore
We distribute this one file to all nodes.
keytool -importcert \
    -keystore generic-server-truststore.jks \
    -alias rootCa \
    -file rootCa.crt \
    -noprompt \
    -keypass omitted \
    -storepass omitted
Commands to generate keystore
This was done one per region; but essentially we created a keystore with
keytool, then deleted the key entry and then imported our key entry using
keytool from a pkcs12 file.
keytool -genkeypair -keyalg RSA -alias cassy-${region} \
    -keystore cassy-${region}.jks -storepass omitted -keypass omitted \
    -validity 365 -keysize 2048 -dname "..."
keytool -delete -alias cassy-${region} -keystore cassy-${region}.jks \
    -storepass omitted
openssl pkcs12 -export -in signed_certs/${region}.pem \
    -inkey keys/cassandra.${region}.key -name cassy-${region} \
    -out ${region}.p12
keytool -importkeystore -deststorepass omitted \
    -destkeystore cassy-${region}.jks -srckeystore ${region}.p12 \
    -srcstoretype PKCS12
keytool -importcert -keystore cassy-${region}.jks -alias rootCa \
    -file ca.crt -noprompt -keypass omitted -storepass omitted
Looking back at this, I don't remember why we used keytool to generate a
keypair/keystore, then deleted and imported. I think it was because the keytool
importkeystore command refused to run if the keystore didn't already exist.
ca.crt and pem file
The ca.crt file contains the root certificate and the intermediate certificate
that was used to sign the CSR. The pem file contains the signed CSR returned to
us, the intermediate cert, and the root CA (in that order).
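One way to double-check the contents and order of such a bundle is to split it into its individual certificates and fingerprint each one. The sketch below uses only the Python standard library; pem_fingerprints is a hypothetical helper, and it only hashes the raw certificate bytes without parsing subjects or issuers. The colon-separated uppercase hex format is meant to resemble the SHA-256 fingerprints that keytool -list -v prints, so the values should be comparable with the entries in the JKS files, though that comparison is an assumption to verify locally.

```python
import base64
import hashlib
import re

# Matches the base64 body of each certificate block in a PEM bundle.
PEM_BLOCK = re.compile(
    r"-----BEGIN CERTIFICATE-----(.*?)-----END CERTIFICATE-----", re.DOTALL
)


def pem_fingerprints(pem_text):
    """Return the SHA-256 fingerprint of each certificate, in file order."""
    fingerprints = []
    for body in PEM_BLOCK.findall(pem_text):
        # Strip line breaks and spaces, then decode to the DER bytes.
        der = base64.b64decode("".join(body.split()))
        digest = hashlib.sha256(der).hexdigest().upper()
        pairs = [digest[i:i + 2] for i in range(0, len(digest), 2)]
        fingerprints.append(":".join(pairs))
    return fingerprints
```

For the pem file described above, the function should return three fingerprints (signed cert, intermediate, root), and the last two should also appear when it is run on ca.crt.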
openssl verify ca.crt and pem
openssl verify -CAfile ca.crt us-west-2.pem
signed_certs/us-west-2.pem: OK
Command output after enabling encryption
nodetool status (output truncated)
Datacenter: us-east
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load        Tokens  Owns (effective)  Host ID  Rack
?N  52.44.11.221    ?           256     25.4%             null     1c
...
?N  52.204.232.195  ?           256     23.2%             null     1d
Datacenter: us-west-2
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load        Tokens  Owns (effective)  Host ID  Rack
?N  34.209.2.144    ?           256     26.5%             null     2c
UN  52.40.32.177    105.99 GiB  256     23.7%             null     2c
?N  34.210.109.203  ?           256     24.7%             null     2a
...
With the online node being the node with encryption set.
cqlsh to localhost
cassy-node6:~$ cqlsh
Connection error: ('Unable to connect to any servers', {'127.0.0.1': error(111,
"Tried connecting to [('127.0.0.1', 9042)]. Last error: Connection refused")})
cqlsh to remote node
The remote node is a node with encryption enabled.
cassy-node6:~$ cqlsh 10.0.2.7
Connection error: ('Unable to connect to any servers', {'10.0.2.7': error(111,
"Tried connecting to [('10.0.2.7', 9042)]. Last error: Connection refused")})