Re: unable to gossip with peers exception when internode encryption is set to any setting other than 'none'

2019-08-28 Thread Michael Carlise
For clarity, for anybody who comes to this thread in the archive: this
might be an issue with Ec2MultiRegionSnitch altogether; I'm not sure. If
I create a local 3-node cluster using ccm (Cassandra v3.11.4), I can drop
the keystore/truststore JKS files in and flip encryption on, and everything
works as expected. Tomorrow I'll reach out on the Slack channel to see if
anybody can suggest ways to test it, or if anybody is aware of an
ongoing issue.

On Wed, Aug 28, 2019 at 2:49 PM Michael Carlise 
wrote:

> telnet from node 1 -> node2 7001 (and 7000) works.
>
> However, I can't rule out a JKS keystore/truststore issue.  I have tried a
> number of configurations and none of them have seemed to help (or emit any
> further error logging).   We have a root and intermediate CA cert, and a
> private key + signed CSR.  Our keystore has a single privateKeyentry of
> length 2: consisting of the signed CSR and the intermediate cert (in that
> order).  The truststore has a single entry of length one: consisting of the
> root cert used to issue the intermediate.  Does anybody know if that is the
> correct setup for JKS.  This setup was given to us by another team in our
> company that uses java much more than us.
>
> Some other points to note: Cassandra-9386 issue points out that 'dc'
> internode_encryption when using Ec2MultiRegionSnitch doesn't work correctly
> (always uses encrypted connections).  But I still can't get 'all' to work.
> The way I'm trying to get it to work is by just simply flipping encryption
> on in two non-seed nodes in the same datacenter.  I notice that in
> system.log I can see them both output the message 'Handshaking with
> /private IP'.  But then a few minutes later the unable to gossip exception
> is thrown.  No other information/logs are given; so I assume the handshake
> failed? presumably b/c incorrect truststore/keystore?
>
> I can't seem to find any concrete information about how to setup the
> keystore cert chain and/or the truststore. Does anybody know of any good
> sources on this topic, or know at the top of the minds how this setup is
> supposed to be?
>
>
> On Mon, Aug 26, 2019 at 10:01 PM Subroto Barua 
> wrote:
>
>> Could be an issue with the keystore/truststore. You may want to run
>> keytool -list to validate the files/password; also run md5sum on the
>> files from 1 node in west and 1 node in east.
>> check ssl port 7001 --- from 1 node in west --> telnet > east>:7001 (or custom port if you are not using default port)
>>
>> On Monday, August 26, 2019, 05:46:19 PM PDT, Michael Carlise
>>  wrote:
>>
>>
>> Subroto -
>>
>> both tools error; openssl errno 111 - which made me check bound ports on
>> the c* node with encryption flipped.  Port 9042 is not open (determined by
>> netstat -ant).  Looking at the log differences for when a node is started
>> with/without encryption.  Without encryption, I get a bunch of lines like:
>>
>> OutboundTcpConnection.java:561 - Handshaking version w/ IP
>>
>> And this happens after a line like
>>
>> Gossiper.java - Waiting for gossip to settle...
>>
>> with encryption toggled to 'dc', I don't see any of those lines;
>> presumable b/c the gossiper is trying to start but doesn't.
>>
>> On Mon, Aug 26, 2019 at 6:51 PM Subroto Barua 
>> wrote:
>>
>> Michael,
>>
>> Are you able to connect to any c* node via OpenSSL?
>>
>> openssl s_client -connect :9042
>>
>> cqlsh  --ssl
>>
>> Subroto
>>
>> On Aug 26, 2019, at 2:47 PM, Marc Selwan 
>> wrote:
>>
>> which exact version of OpenJDK are you using? Is it possible you don't
>> have JCE on those nodes? (I believe more recent versions of Java 8 has this
>> baked in so that might not be it)
>>
>>
>> *Marc Selwan | *DataStax *| *PM, Server Team *|* *(925) 413-7079* *|*
>> Twitter <https://twitter.com/MarcSelwan>
>>
>> *  Quick links | *DataStax <http://www.datastax.com> *| *Training
>> <http://www.academy.datastax.com> *| *Documentation
>> <http://www.datastax.com/documentation/getting_started/doc/getting_started/gettingStartedIntro_r.html>
>>  *| *Downloads <http://www.datastax.com/download>
>>
>>
>>
>> On Mon, Aug 26, 2019 at 1:56 PM Michael Carlise <
>> mcarl...@salesforce.com.invalid> wrote:
>>
>>
>> I originally opened this issue on stackoverflow (
>> https://stackoverflow.com/questions/57516660/cassandra-node-to-node-encryption-throws-unable-to-gossip-with-peers-exception

Re: unable to gossip with peers exception when internode encryption is set to any setting other than 'none'

2019-08-28 Thread Michael Carlise
telnet from node 1 -> node2 7001 (and 7000) works.

However, I can't rule out a JKS keystore/truststore issue. I have tried a
number of configurations, and none of them has helped (or produced any
further error logging). We have a root and an intermediate CA cert, and a
private key plus signed CSR. Our keystore has a single PrivateKeyEntry with
a chain of length 2, consisting of the signed CSR and the intermediate cert
(in that order). The truststore has a single entry of length 1, consisting
of the root cert used to issue the intermediate. Does anybody know if that
is the correct setup for JKS? This setup was given to us by another team in
our company that uses Java much more than we do.
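
For readers following along, the layout described above can be modeled with
a minimal sketch. It is not how Java validates certificates (real validation
also checks signatures, validity dates and extensions); it only illustrates
the ordering rule that each certificate in the keystore chain is issued by
the next one, and that the last one is issued by something in the
truststore. The Cert type, the function name and the CN values are made up
for illustration.

```python
from typing import List, NamedTuple, Set


class Cert(NamedTuple):
    """Only the two fields needed to reason about chain order."""
    subject: str
    issuer: str


def chain_links_to_trust(chain: List[Cert], trusted_subjects: Set[str]) -> bool:
    """Return True if the chain is leaf-first and ends at a trusted issuer.

    Each certificate must be issued by the next one in the list, and the
    last certificate's issuer must be present in the truststore.
    """
    if not chain:
        return False
    for cert, parent in zip(chain, chain[1:]):
        if cert.issuer != parent.subject:
            return False
    return chain[-1].issuer in trusted_subjects


# Hypothetical names standing in for the real (omitted) subjects.
leaf = Cert(subject="CN=node", issuer="CN=intermediate")
intermediate = Cert(subject="CN=intermediate", issuer="CN=root")
truststore = {"CN=root"}

# Keystore chain in the order described: signed cert, then intermediate.
print(chain_links_to_trust([leaf, intermediate], truststore))
# The same two certificates in the wrong order.
print(chain_links_to_trust([intermediate, leaf], truststore))
```

Under this simplified model, the leaf-first chain with the root in the
truststore links up, and the reversed chain does not.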

Some other points to note: the CASSANDRA-9386 issue points out that the 'dc'
internode_encryption setting doesn't work correctly with Ec2MultiRegionSnitch
(it always uses encrypted connections). But I still can't get 'all' to work.
I'm testing by simply flipping encryption on for two non-seed nodes in the
same datacenter. In system.log I can see them both output the message
'Handshaking with /private IP', but a few minutes later the "unable to
gossip" exception is thrown. No other information or logs are given, so I
assume the handshake failed, presumably because of an incorrect
truststore/keystore?
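
As a rough aid for comparing runs, here is a small sketch that counts
handshake lines and flags the gossip failure in a log. The two marker
strings are taken from the messages quoted in this thread; the function
name and the returned dictionary shape are assumptions for illustration,
and the sample lines are abbreviated.

```python
from typing import Dict, Iterable

# Substrings taken from the log messages quoted in this thread.
HANDSHAKE_MARKER = "Handshaking"
GOSSIP_FAILURE = "Unable to gossip with any peers"


def summarize_startup(lines: Iterable[str]) -> Dict[str, object]:
    """Count handshake lines and note whether the gossip failure appears."""
    handshakes = 0
    gossip_failed = False
    for line in lines:
        if HANDSHAKE_MARKER in line:
            handshakes += 1
        if GOSSIP_FAILURE in line:
            gossip_failed = True
    return {"handshakes": handshakes, "gossip_failed": gossip_failed}


# Abbreviated sample lines in the shape shown in the thread.
sample = [
    "OutboundTcpConnection.java:561 - Handshaking version w/ IP",
    "Gossiper.java - Waiting for gossip to settle...",
    "java.lang.RuntimeException: Unable to gossip with any peers",
]
print(summarize_startup(sample))
```

To run it against a real log, pass an open file object (for example the
node's system.log) instead of the sample list.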

I can't seem to find any concrete information about how to set up the
keystore cert chain and/or the truststore. Does anybody know of any good
sources on this topic, or know off the top of their head how this setup is
supposed to look?


On Mon, Aug 26, 2019 at 10:01 PM Subroto Barua 
wrote:

> Could be an issue with the keystore/truststore. You may want to run
> keytool -list to validate the files/password; also run md5sum on the
> files from 1 node in west and 1 node in east.
> check ssl port 7001 --- from 1 node in west --> telnet :7001
> (or custom port if you are not using default port)
>
> On Monday, August 26, 2019, 05:46:19 PM PDT, Michael Carlise
>  wrote:
>
>
> Subroto -
>
> both tools error; openssl errno 111 - which made me check bound ports on
> the c* node with encryption flipped.  Port 9042 is not open (determined by
> netstat -ant).  Looking at the log differences for when a node is started
> with/without encryption.  Without encryption, I get a bunch of lines like:
>
> OutboundTcpConnection.java:561 - Handshaking version w/ IP
>
> And this happens after a line like
>
> Gossiper.java - Waiting for gossip to settle...
>
> with encryption toggled to 'dc', I don't see any of those lines;
> presumable b/c the gossiper is trying to start but doesn't.
>
> On Mon, Aug 26, 2019 at 6:51 PM Subroto Barua 
> wrote:
>
> Michael,
>
> Are you able to connect to any c* node via OpenSSL?
>
> openssl s_client -connect :9042
>
> cqlsh  --ssl
>
> Subroto
>
> On Aug 26, 2019, at 2:47 PM, Marc Selwan  wrote:
>
> which exact version of OpenJDK are you using? Is it possible you don't
> have JCE on those nodes? (I believe more recent versions of Java 8 has this
> baked in so that might not be it)
>
>
>
>
>
> On Mon, Aug 26, 2019 at 1:56 PM Michael Carlise <
> mcarl...@salesforce.com.invalid> wrote:
>
>
> I originally opened this issue on stackoverflow (
> https://stackoverflow.com/questions/57516660/cassandra-node-to-node-encryption-throws-unable-to-gossip-with-peers-exception
> ).
>
> However, I haven't gotten any responses in over a week.  I'm going to post
> it here and maybe someone will have an idea on where I can look.
>
> We currently run a multi region cassandra cluster in AWS. It runs in four
> regions, 12 nodes per region. It runs without node to node encryption (or
> client encryption either). We are trying to enable inter datacenter node to
> node encryption. However, when we flip encryption over we get an exception
> that nodes are unable to gossip with any peers.
>
> It could possibly

Re: unable to gossip with peers exception when internode encryption is set to any setting other than 'none'

2019-08-26 Thread Michael Carlise
Subroto -

Both tools error; openssl reports errno 111, which made me check the bound
ports on the C* node with encryption flipped. Port 9042 is not open
(determined by netstat -ant). I then compared the logs for a node started
with and without encryption. Without encryption, I get a bunch of lines like:

OutboundTcpConnection.java:561 - Handshaking version w/ IP

And this happens after a line like

Gossiper.java - Waiting for gossip to settle...

With encryption toggled to 'dc', I don't see any of those lines, presumably
because the gossiper is trying to start but doesn't.
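
For context, errno 111 on Linux is ECONNREFUSED, meaning nothing accepted
the connection on that port, which matches netstat showing 9042 unbound.
The following is a minimal sketch of the same check in Python, using only
the standard library. The function name and return strings are made up for
illustration; it tests plain TCP reachability only and says nothing about
whether a TLS handshake would succeed.

```python
import errno
import socket


def probe_port(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify a TCP connect attempt as 'open', 'refused' or 'error:<name>'.

    Note: an unresolvable hostname raises socket.gaierror instead of
    returning a value, and only IPv4 is attempted.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        code = sock.connect_ex((host, port))
    if code == 0:
        return "open"
    if code == errno.ECONNREFUSED:
        return "refused"
    return "error:" + errno.errorcode.get(code, str(code))


if __name__ == "__main__":
    # 9042 is the client port and 7001 the SSL storage port mentioned in
    # this thread; the loopback address is just an example target.
    for port in (9042, 7001):
        print(port, probe_port("127.0.0.1", port))
```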

On Mon, Aug 26, 2019 at 6:51 PM Subroto Barua 
wrote:

> Michael,
>
> Are you able to connect to any c* node via OpenSSL?
>
> openssl s_client -connect :9042
>
> cqlsh  --ssl
>
> Subroto
>
> On Aug 26, 2019, at 2:47 PM, Marc Selwan  wrote:
>
> which exact version of OpenJDK are you using? Is it possible you don't
> have JCE on those nodes? (I believe more recent versions of Java 8 has this
> baked in so that might not be it)
>
>
>
>
>
> On Mon, Aug 26, 2019 at 1:56 PM Michael Carlise <
> mcarl...@salesforce.com.invalid> wrote:
>
>>
>> I originally opened this issue on stackoverflow (
>> https://stackoverflow.com/questions/57516660/cassandra-node-to-node-encryption-throws-unable-to-gossip-with-peers-exception
>> ).
>>
>> However, I haven't gotten any responses in over a week.  I'm going to
>> post it here and maybe someone will have an idea on where I can look.
>>
>> We currently run a multi region cassandra cluster in AWS. It runs in four
>> regions, 12 nodes per region. It runs without node to node encryption (or
>> client encryption either). We are trying to enable inter datacenter node to
>> node encryption. However, when we flip encryption over we get an exception
>> that nodes are unable to gossip with any peers.
>>
>> It could possibly be that we didn't build our jks keystore/truststores
>> correctly (more on how we built these files below). But, we additionally do
>> not see intra datacenter communication working (which should be set to
>> unencrypted communication). Additionally, cqlsh cannot connect to the node
>> either; even though we have (by default) client_auth_required set to
>> false.
>>
>> ERROR [main] 2019-08-15 18:46:32,241 CassandraDaemon.java:749 - Exception 
>> encountered during startup
>> java.lang.RuntimeException: Unable to gossip with any peers
>> at 
>> org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1435) 
>> ~[apache-cassandra-3.11.4.jar:3.11.4]
>> at 
>> org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:566)
>>  ~[apache-cassandra-3.11.4.jar:3.11.4]
>> at 
>> org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:823)
>>  ~[apache-cassandra-3.11.4.jar:3.11.4]
>> at 
>> org.apache.cassandra.service.StorageService.initServer(StorageService.java:683)
>>  ~[apache-cassandra-3.11.4.jar:3.11.4]
>> at 
>> org.apache.cassandra.service.StorageService.initServer(StorageService.java:632)
>>  ~[apache-cassandra-3.11.4.jar:3.11.4]
>> at 
>> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:388) 
>> [apache-cassandra-3.11.4.jar:3.11.4]
>> at 
>> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:620)
>>  [apache-cassandra-3.11.4.jar:3.11.4]
>> at 
>> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:732) 
>> [apache-cassandra-3.11.4.jar:3.11.4]
>> INFO  [main] 2019-08-15 18:47:07,384 YamlConfigurationLoader.java:89 - 
>> Configuration location: file:/etc/cassandra/cassandra.yaml
>>
>>
>> Something to note is that this error message occurs after a 

Re: unable to gossip with peers exception when internode encryption is set to any setting other than 'none'

2019-08-26 Thread Michael Carlise
The version given by apt is 8u162-b12-1, which I think corresponds to
OpenJDK 8u162. When I run

jrunscript -e 'print (javax.crypto.Cipher.getMaxAllowedKeyLength("RC5") >= 256);'

the command returns true. I'm not sure whether that is the best way to
verify that JCE is installed.


Michael Carlise

On Mon, Aug 26, 2019 at 5:47 PM Marc Selwan 
wrote:

> which exact version of OpenJDK are you using? Is it possible you don't
> have JCE on those nodes? (I believe more recent versions of Java 8 has this
> baked in so that might not be it)
>
>
>
>
>
> On Mon, Aug 26, 2019 at 1:56 PM Michael Carlise
>  wrote:
>
>>
>> I originally opened this issue on stackoverflow (
>> https://stackoverflow.com/questions/57516660/cassandra-node-to-node-encryption-throws-unable-to-gossip-with-peers-exception
>> ).
>>
>> However, I haven't gotten any responses in over a week.  I'm going to
>> post it here and maybe someone will have an idea on where I can look.
>>
>> We currently run a multi region cassandra cluster in AWS. It runs in four
>> regions, 12 nodes per region. It runs without node to node encryption (or
>> client encryption either). We are trying to enable inter datacenter node to
>> node encryption. However, when we flip encryption over we get an exception
>> that nodes are unable to gossip with any peers.
>>
>> It could possibly be that we didn't build our jks keystore/truststores
>> correctly (more on how we built these files below). But, we additionally do
>> not see intra datacenter communication working (which should be set to
>> unencrypted communication). Additionally, cqlsh cannot connect to the node
>> either; even though we have (by default) client_auth_required set to
>> false.
>>
>> ERROR [main] 2019-08-15 18:46:32,241 CassandraDaemon.java:749 - Exception 
>> encountered during startup
>> java.lang.RuntimeException: Unable to gossip with any peers
>> at 
>> org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1435) 
>> ~[apache-cassandra-3.11.4.jar:3.11.4]
>> at 
>> org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:566)
>>  ~[apache-cassandra-3.11.4.jar:3.11.4]
>> at 
>> org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:823)
>>  ~[apache-cassandra-3.11.4.jar:3.11.4]
>> at 
>> org.apache.cassandra.service.StorageService.initServer(StorageService.java:683)
>>  ~[apache-cassandra-3.11.4.jar:3.11.4]
>> at 
>> org.apache.cassandra.service.StorageService.initServer(StorageService.java:632)
>>  ~[apache-cassandra-3.11.4.jar:3.11.4]
>> at 
>> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:388) 
>> [apache-cassandra-3.11.4.jar:3.11.4]
>> at 
>> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:620)
>>  [apache-cassandra-3.11.4.jar:3.11.4]
>> at 
>> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:732) 
>> [apache-cassandra-3.11.4.jar:3.11.4]
>> INFO  [main] 2019-08-15 18:47:07,384 YamlConfigurationLoader.java:89 - 
>> Configuration location: file:/etc/cassandra/cassandra.yaml
>>
>>
>> Something to note is that this error message occurs after a few minutes
>> of the node being up. (i.e. there is a delay between start up before this
>> exception is thrown).
>>
>> *Information about our cassandra setup*
>>
>> cassandra version: 3.11.4
>> JDK version: openjdk-8.
>> Linux: Ubuntu 18.04 (bionic).
>>
>> *cassandra.yaml*
>>
>> endpoint_snitch: Ec2MultiRegionSnitch
>>
>> server_encryption_options:
>>   internode_encryption: dc
>>   keystore: 
>>   keystore_password: 
>>   truststore: 
>>   trust

unable to gossip with peers exception when internode encryption is set to any setting other than 'none'

2019-08-26 Thread Michael Carlise
I originally opened this issue on stackoverflow (
https://stackoverflow.com/questions/57516660/cassandra-node-to-node-encryption-throws-unable-to-gossip-with-peers-exception
).

However, I haven't gotten any responses in over a week.  I'm going to post
it here and maybe someone will have an idea on where I can look.

We currently run a multi region cassandra cluster in AWS. It runs in four
regions, 12 nodes per region. It runs without node to node encryption (or
client encryption either). We are trying to enable inter datacenter node to
node encryption. However, when we flip encryption over we get an exception
that nodes are unable to gossip with any peers.

It could possibly be that we didn't build our jks keystore/truststores
correctly (more on how we built these files below). But, we additionally do
not see intra datacenter communication working (which should be set to
unencrypted communication). Additionally, cqlsh cannot connect to the node
either; even though we have (by default) client_auth_required set to false.

ERROR [main] 2019-08-15 18:46:32,241 CassandraDaemon.java:749 - Exception encountered during startup
java.lang.RuntimeException: Unable to gossip with any peers
    at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1435) ~[apache-cassandra-3.11.4.jar:3.11.4]
    at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:566) ~[apache-cassandra-3.11.4.jar:3.11.4]
    at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:823) ~[apache-cassandra-3.11.4.jar:3.11.4]
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:683) ~[apache-cassandra-3.11.4.jar:3.11.4]
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:632) ~[apache-cassandra-3.11.4.jar:3.11.4]
    at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:388) [apache-cassandra-3.11.4.jar:3.11.4]
    at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:620) [apache-cassandra-3.11.4.jar:3.11.4]
    at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:732) [apache-cassandra-3.11.4.jar:3.11.4]
INFO  [main] 2019-08-15 18:47:07,384 YamlConfigurationLoader.java:89 - Configuration location: file:/etc/cassandra/cassandra.yaml


Something to note is that this error message occurs only after the node has
been up for a few minutes (i.e. there is a delay between startup and this
exception being thrown).

*Information about our cassandra setup*

cassandra version: 3.11.4
JDK version: openjdk-8.
Linux: Ubuntu 18.04 (bionic).

*cassandra.yaml*

endpoint_snitch: Ec2MultiRegionSnitch

server_encryption_options:
  internode_encryption: dc
  keystore: 
  keystore_password: 
  truststore: 
  truststore_password: 

client_encryption_options:
  enabled: false
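
For reference, a minimal sketch of what the internode_encryption values are
documented to mean, written as a simplified model rather than Cassandra's
actual code. Actual behaviour can differ, for instance depending on the
snitch. The function name and the datacenter/rack names are hypothetical.

```python
def should_encrypt(mode: str, local_dc: str, local_rack: str,
                   peer_dc: str, peer_rack: str) -> bool:
    """Simplified model of the documented internode_encryption modes."""
    if mode == "none":
        return False  # never encrypt node-to-node traffic
    if mode == "all":
        return True   # always encrypt node-to-node traffic
    if mode == "dc":
        # encrypt only traffic that crosses datacenters
        return local_dc != peer_dc
    if mode == "rack":
        # encrypt unless the peer is in the same datacenter and same rack
        return local_dc != peer_dc or local_rack != peer_rack
    raise ValueError("unknown internode_encryption mode: " + mode)


# Hypothetical datacenter and rack names.
print(should_encrypt("dc", "us-west-2", "a", "us-west-2", "b"))  # same DC
print(should_encrypt("dc", "us-west-2", "a", "us-east-1", "a"))  # cross DC
```

Under this model, the 'dc' setting above would leave traffic between nodes
in the same datacenter unencrypted.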

*cassandra-rackdc.properties*

prefer_local=true

*No obvious errors with SSL output*

When starting Cassandra with JVM_OPTS="$JVM_OPTS -Djavax.net.debug=ssl" added
to cassandra-env.sh, we see SSL logs printed to stdout (*Note: Subject and
Issuer were omitted on purpose*).

found key for : cassy-us-west-2
adding as trusted cert:
  Subject: ...
  Issuer:  ...
  Algorithm: RSA; Serial number: 0xdad28d843fc73325d4c1a75207d4e74
  Valid from Fri May 27 00:00:00 UTC 2016 until Tue May 26 23:59:59 UTC 2026

...

trigger seeding of SecureRandom
done seeding SecureRandom

Looking at Java SE SSL/TLS connection debugging, this looks correct. Note,
though, that we see this series of messages (along with the RSA key
signature output) repeated several times in rapid succession. We never
observe any messages about the trust store being added; however, that
might be something that occurs only on client initiation (?)

Additionally, we do see cassandra report that the Encrypted Messaging
service has been started.

INFO  [main] 2019-08-15 18:45:31,022 MessagingService.java:704 -
Starting Encrypted Messaging Service on SSL port 7001

*Doesn't appear to be a cassandra.yaml configuration problem*

We can bring the node back online by simply configuring internode_encryption:
none. This action seems to rule out a broadcast_address or rpc_address
configuration problem.

*How we built our keystore/truststores*

We followed the basic template in the DataStax docs for preparing SSL
certificates. One minor difference was that our private keys and CSRs were
generated using openssl, one per region (we plan to share the key/signed
cert across nodes within a region). These were created using a command
template like:

openssl req -new -newkey rsa:2048 -out cassy-.csr -keyout
cassy-.key -config cassy-.conf -subj "..." -nodes
-sha256

The generated CSR was then signed by an internal root CA. Because we
generated our files using openssl, we had to build