Which exact version of OpenJDK are you using? Is it possible that you don't have the JCE unlimited-strength policy on those nodes? (I believe more recent versions of Java 8 have it baked in, so that might not be it.)
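One quick way to answer the JCE question on a node is to ask the JVM which AES key length its installed policy allows. The sketch below calls the standard `javax.crypto.Cipher.getMaxAllowedKeyLength` method. Under the limited policy it reports 128, and under the unlimited policy it reports `Integer.MAX_VALUE`. As far as I know, Java 8 updates from 8u161 onward default to the unlimited policy. The class name `JceCheck` is only illustrative. The check should be run with the same `java` binary that starts Cassandra.

```java
import java.security.NoSuchAlgorithmException;
import javax.crypto.Cipher;

final class JceCheck {
    /** Largest AES key size, in bits, that the installed JCE policy permits. */
    static int maxAesKeyLength() throws NoSuchAlgorithmException {
        return Cipher.getMaxAllowedKeyLength("AES");
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        int max = maxAesKeyLength();
        // java.runtime.version includes the exact update/build number.
        System.out.println("java.runtime.version: " + System.getProperty("java.runtime.version"));
        System.out.println("max AES key length: " + max);
        // The limited policy caps AES at 128 bits; the unlimited policy reports Integer.MAX_VALUE.
        if (max >= 256) {
            System.out.println("unlimited-strength policy is active");
        } else {
            System.out.println("limited policy: TLS suites using 256-bit AES cannot be used");
        }
    }
}
```

Compile with `javac JceCheck.java` and run `java JceCheck` on each node. Differing output across nodes would be one thing to rule out before looking at the certificates.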
Marc Selwan | DataStax | PM, Server Team | (925) 413-7079 | Twitter <https://twitter.com/MarcSelwan>

On Mon, Aug 26, 2019 at 1:56 PM Michael Carlise <mcarl...@salesforce.com.invalid> wrote:

> I originally opened this issue on Stack Overflow
> (https://stackoverflow.com/questions/57516660/cassandra-node-to-node-encryption-throws-unable-to-gossip-with-peers-exception).
>
> However, I haven't gotten any responses in over a week, so I'm posting it
> here in case someone has an idea of where I can look.
>
> We currently run a multi-region Cassandra cluster in AWS: four regions,
> 12 nodes per region. It runs without node-to-node encryption (and without
> client encryption). We are trying to enable inter-datacenter node-to-node
> encryption. However, when we switch encryption on, we get an exception
> saying that nodes are unable to gossip with any peers.
>
> It could be that we didn't build our JKS keystores/truststores correctly
> (more on how we built these files below). However, we also do not see
> intra-datacenter communication working, even though it should remain
> unencrypted. Additionally, cqlsh cannot connect to the node either, even
> though require_client_auth is set to false (the default).
>
> ERROR [main] 2019-08-15 18:46:32,241 CassandraDaemon.java:749 - Exception encountered during startup
> java.lang.RuntimeException: Unable to gossip with any peers
>     at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1435) ~[apache-cassandra-3.11.4.jar:3.11.4]
>     at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:566) ~[apache-cassandra-3.11.4.jar:3.11.4]
>     at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:823) ~[apache-cassandra-3.11.4.jar:3.11.4]
>     at org.apache.cassandra.service.StorageService.initServer(StorageService.java:683) ~[apache-cassandra-3.11.4.jar:3.11.4]
>     at org.apache.cassandra.service.StorageService.initServer(StorageService.java:632) ~[apache-cassandra-3.11.4.jar:3.11.4]
>     at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:388) [apache-cassandra-3.11.4.jar:3.11.4]
>     at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:620) [apache-cassandra-3.11.4.jar:3.11.4]
>     at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:732) [apache-cassandra-3.11.4.jar:3.11.4]
> INFO [main] 2019-08-15 18:47:07,384 YamlConfigurationLoader.java:89 - Configuration location: file:/etc/cassandra/cassandra.yaml
>
> Note that this error appears only after the node has been up for a few
> minutes (i.e., there is a delay between startup and this exception).
>
> *Information about our Cassandra setup*
>
> Cassandra version: 3.11.4
> JDK version: openjdk-8
> Linux: Ubuntu 18.04 (bionic)
>
> *cassandra.yaml*
>
>     endpoint_snitch: Ec2MultiRegionSnitch
>
>     server_encryption_options:
>         internode_encryption: dc
>         keystore: <omitted>
>         keystore_password: <omitted>
>         truststore: <omitted>
>         truststore_password: <omitted>
>
>     client_encryption_options:
>         enabled: false
>
> *cassandra-rackdc.properties*
>
>     prefer_local=true
>
> *No obvious errors in the SSL output*
>
> When starting Cassandra with JVM_OPTS="$JVM_OPTS -Djavax.net.debug=ssl"
> added to cassandra-env.sh, we see SSL logs printed to stdout (note: Subject
> and Issuer were omitted on purpose):
>
>     found key for : cassy-us-west-2
>     adding as trusted cert:
>     Subject: ...
>     Issuer: ...
>     Algorithm: RSA; Serial number: 0xdad28d843fc73325d4c1a75207d4e74
>     Valid from Fri May 27 00:00:00 UTC 2016 until Tue May 26 23:59:59 UTC 2026
>
>     ...
>
>     trigger seeding of SecureRandom
>     done seeding SecureRandom
>
> Based on Java SE SSL/TLS connection debugging
> <https://docs.oracle.com/javase/7/docs/technotes/guides/security/jsse/ReadDebug.html>,
> this looks correct. Note, however, that this series of messages (along with
> the RSA key signature output) is repeated several times in rapid succession.
> We never observe any messages about the truststore being added; however,
> that might happen only when a client initiates a connection (?).
>
> Additionally, Cassandra does report that the Encrypted Messaging Service
> has started:
>
>     INFO [main] 2019-08-15 18:45:31,022 MessagingService.java:704 - Starting Encrypted Messaging Service on SSL port 7001
>
> *Doesn't appear to be a cassandra.yaml configuration problem*
>
> We can bring the node back online simply by setting
> internode_encryption: none. This seems to rule out a broadcast_address or
> rpc_address configuration problem.
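To separate a certificate/truststore problem from a Cassandra startup problem, one option is to attempt a TLS handshake against a peer's SSL port (7001 here) directly from a JVM, outside Cassandra. This is a minimal sketch under stated assumptions. The class name `InternodeTlsProbe` is hypothetical. The probe uses the JVM's default `SSLSocketFactory`, so the truststore is supplied through the standard `javax.net.ssl.trustStore` and `javax.net.ssl.trustStorePassword` system properties. It presents no client certificate, which should be fine while `require_client_auth` is false. It only reports whether the handshake succeeds and what chain the peer sends. It does not speak Cassandra's messaging protocol.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.security.cert.Certificate;
import java.security.cert.X509Certificate;
import javax.net.ssl.SSLSession;
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

final class InternodeTlsProbe {
    /**
     * Connects to host:port and performs a TLS handshake.
     * Returns a line starting with "OK" on success, or "FAILED" plus the exception otherwise.
     */
    static String probe(String host, int port, SSLSocketFactory factory, int timeoutMillis) {
        try (SSLSocket socket = (SSLSocket) factory.createSocket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            socket.setSoTimeout(timeoutMillis); // bound the handshake reads as well as the connect
            socket.startHandshake();
            SSLSession session = socket.getSession();
            Certificate[] chain = session.getPeerCertificates();
            X509Certificate leaf = (X509Certificate) chain[0];
            return "OK " + session.getProtocol() + " " + session.getCipherSuite()
                    + " chainLength=" + chain.length
                    + " peer=" + leaf.getSubjectX500Principal();
        } catch (IOException e) {
            // Covers refused connections, timeouts and handshake failures (SSLHandshakeException).
            return "FAILED " + e;
        }
    }

    public static void main(String[] args) {
        String host = args[0];
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 7001;
        SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
        System.out.println(probe(host, port, factory, 5000));
    }
}
```

After `javac InternodeTlsProbe.java`, a run might look like `java -Djavax.net.ssl.trustStore=generic-server-truststore.jks -Djavax.net.ssl.trustStorePassword=... InternodeTlsProbe <peer-address> 7001`. Adding `-Djavax.net.debug=ssl` prints the same handshake trace as in the Cassandra logs. A "Connection refused" result points at nothing listening on the SSL port. An `SSLHandshakeException` points at the certificates or cipher suites.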
>
> *How we built our keystores/truststores*
>
> We followed the basic template in the DataStax docs for preparing SSL
> certificates
> <https://docs.datastax.com/en/archived/cassandra/3.0/cassandra/configuration/secureSSLCertWithCA.html>.
> One minor difference is that our private keys and CSRs were generated with
> openssl, one per region (we plan to share each key/signed cert across the
> nodes in a region). They were created using this command template:
>
>     openssl req -new -newkey rsa:2048 -out cassy-<region>.csr -keyout cassy-<region>.key \
>         -config cassy-<region>.conf -subj "..." -nodes -sha256
>
> The generated CSR was then signed by an internal root CA. Because we
> generated our files with openssl, we had to build our JKS files by
> importing our certs into them.
>
> *Commands to generate truststore*
>
> We distribute this one file to all nodes.
>
>     keytool -importcert \
>         -keystore generic-server-truststore.jks \
>         -alias rootCa \
>         -file rootCa.crt \
>         -noprompt \
>         -keypass omitted \
>         -storepass omitted
>
> *Commands to generate keystore*
>
> This was done once per region: we created a keystore with keytool, deleted
> its key entry, and then imported our own key entry from a PKCS12 file
> using keytool.
>
>     keytool -genkeypair -keyalg RSA -alias cassy-${region} -keystore cassy-${region}.jks \
>         -storepass omitted -keypass omitted -validity 365 -keysize 2048 -dname "..."
>
>     keytool -delete -alias cassy-${region} -keystore cassy-${region}.jks -storepass omitted
>
>     openssl pkcs12 -export -in signed_certs/${region}.pem -inkey keys/cassandra.${region}.key \
>         -name cassy-${region} -out ${region}.p12
>
>     keytool -importkeystore -deststorepass omitted -destkeystore cassy-${region}.jks \
>         -srckeystore ${region}.p12 -srcstoretype PKCS12
>
>     keytool -importcert -keystore cassy-${region}.jks -alias rootCa -file ca.crt \
>         -noprompt -keypass omitted -storepass omitted
>
> Looking back at this, I don't remember why we used keytool to generate a
> keypair/keystore and then deleted the key before importing ours. I think it
> was because the keytool -importkeystore command refused to run if the
> keystore didn't already exist.
>
> *ca.crt and pem file*
>
> The ca.crt file contains the root certificate and the intermediate
> certificate that was used to sign the CSR. The pem file contains the signed
> certificate returned to us, the intermediate cert, and the root CA (in that
> order).
>
> *openssl verify ca.crt and pem*
>
>     openssl verify -CAfile ca.crt us-west-2.pem
>     signed_certs/us-west-2.pem: OK
>
> *Command output after enabling encryption*
>
> *nodetool status (output truncated)*
>
>     Datacenter: us-east
>     ===================
>     Status=Up/Down
>     |/ State=Normal/Leaving/Joining/Moving
>     --  Address         Load        Tokens  Owns (effective)  Host ID  Rack
>     ?N  52.44.11.221    ?           256     25.4%             null     1c
>     ...
>     ?N  52.204.232.195  ?           256     23.2%             null     1d
>     Datacenter: us-west-2
>     =====================
>     Status=Up/Down
>     |/ State=Normal/Leaving/Joining/Moving
>     --  Address         Load        Tokens  Owns (effective)  Host ID  Rack
>     ?N  34.209.2.144    ?           256     26.5%             null     2c
>     UN  52.40.32.177    105.99 GiB  256     23.7%             null     2c
>     ?N  34.210.109.203  ?           256     24.7%             null     2a
>     ...
>
> The only node reported as up (UN) is the node with encryption enabled.
>
> *cqlsh to localhost*
>
>     cassy-node6:~$ cqlsh
>     Connection error: ('Unable to connect to any servers', {'127.0.0.1': error(111, "Tried connecting to [('127.0.0.1', 9042)]. Last error: Connection refused")})
>
> *cqlsh to remote node* (the remote node is a node with encryption enabled)
>
>     cassy-node6:~$ cqlsh 10.0.2.7
>     Connection error: ('Unable to connect to any servers', {'10.0.2.7': error(111, "Tried connecting to [('10.0.2.7', 9042)]. Last error: Connection refused")})
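The keystore was assembled in several steps: generate a key pair, delete it, import the PKCS12 file, then import the root. Given that history, it may be worth confirming what the keystore actually ends up containing. Based on the PEM described above, the `cassy-${region}` key entry would be expected to carry the full chain: signed cert, intermediate, then root. `keytool -list -v -keystore cassy-${region}.jks` shows this. The sketch below does the same through the standard `java.security.KeyStore` API, so the check runs in the same JVM that Cassandra uses. It makes two assumptions. First, `KeystoreInspector` is a hypothetical class name. Second, the file is in JKS format and the store password is passed as the second argument. A password on the command line is visible in the process list, so this is only suitable for a quick manual check.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import java.security.KeyStoreException;
import java.security.cert.Certificate;
import java.security.cert.X509Certificate;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class KeystoreInspector {
    /** One line per entry; each key entry is followed by one line per certificate in its chain. */
    static List<String> describe(KeyStore ks) throws KeyStoreException {
        List<String> lines = new ArrayList<>();
        for (String alias : Collections.list(ks.aliases())) {
            if (ks.isKeyEntry(alias)) {
                Certificate[] chain = ks.getCertificateChain(alias);
                int length = (chain == null) ? 0 : chain.length;
                lines.add("key entry '" + alias + "', chain length " + length);
                for (int i = 0; i < length; i++) {
                    X509Certificate cert = (X509Certificate) chain[i];
                    lines.add("  [" + i + "] subject=" + cert.getSubjectX500Principal()
                            + " issuer=" + cert.getIssuerX500Principal()
                            + " notAfter=" + cert.getNotAfter());
                }
            } else if (ks.isCertificateEntry(alias)) {
                X509Certificate cert = (X509Certificate) ks.getCertificate(alias);
                lines.add("trusted cert '" + alias + "' subject=" + cert.getSubjectX500Principal());
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException, GeneralSecurityException {
        // args[0] = path to the .jks file, args[1] = store password
        KeyStore ks = KeyStore.getInstance("JKS");
        try (InputStream in = new FileInputStream(args[0])) {
            ks.load(in, args[1].toCharArray());
        }
        for (String line : describe(ks)) {
            System.out.println(line);
        }
    }
}
```

Running it against both `cassy-${region}.jks` and `generic-server-truststore.jks` shows, side by side, which chain each node presents and which root its peers trust.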