Re: [Discuss] CASSANDRA-16999 introduction of a column in system.peers_v2

2024-02-13 Thread Jon Haddad
+1 to deprecating dual ports and removing in 5.0

On Tue, Feb 13, 2024 at 4:29 AM Štefan Miklošovič <
[email protected]> wrote:

> Alright ...  so how I am interpreting this, even more so after Sam's and
> Brandon's mail, is that we should just get rid of that completely in trunk
> and deprecate in 5.0.
>
> There are already patches for 3.x and 4.x branches of the driver so the
> way I was looking at that was that we might resurrect this feature but if
> there is actually no need for this then the complete removal in trunk is
> probably unavoidable.
>
> On Tue, Feb 13, 2024 at 1:27 PM Brandon Williams  wrote:
>
>> On Tue, Feb 13, 2024 at 6:17 AM Sam Tunnicliffe  wrote:
>> > Also, if CASSANDRA-16999 is only going to trunk, why can't we just
>> deprecate dual ports in 5.0 (as it isn't at -rc stage yet) and remove it
>> from trunk? That seems preferable to shoehorning something into the new
>> system_views.peers table, which isn't going to help any existing drivers
>> anyway as none of them will be using it.
>>
>> I agree and I think it will be a mess having the port in 3.x, then not
>> in 4.0, 4.1, or 5.0, then resurrected again after that.
>>
>> Kind Regards,
>> Brandon
>>
>


Re: [Discuss] CASSANDRA-16999 introduction of a column in system.peers_v2

2024-02-13 Thread Štefan Miklošovič
Alright ...  so how I am interpreting this, even more so after Sam's and
Brandon's mail, is that we should just get rid of that completely in trunk
and deprecate in 5.0.

There are already patches for 3.x and 4.x branches of the driver so the way
I was looking at that was that we might resurrect this feature but if there
is actually no need for this then the complete removal in trunk is probably
unavoidable.

On Tue, Feb 13, 2024 at 1:27 PM Brandon Williams  wrote:

> On Tue, Feb 13, 2024 at 6:17 AM Sam Tunnicliffe  wrote:
> > Also, if CASSANDRA-16999 is only going to trunk, why can't we just
> deprecate dual ports in 5.0 (as it isn't at -rc stage yet) and remove it
> from trunk? That seems preferable to shoehorning something into the new
> system_views.peers table, which isn't going to help any existing drivers
> anyway as none of them will be using it.
>
> I agree and I think it will be a mess having the port in 3.x, then not
> in 4.0, 4.1, or 5.0, then resurrected again after that.
>
> Kind Regards,
> Brandon
>


Re: [Discuss] CASSANDRA-16999 introduction of a column in system.peers_v2

2024-02-13 Thread Brandon Williams
On Tue, Feb 13, 2024 at 6:17 AM Sam Tunnicliffe  wrote:
> Also, if CASSANDRA-16999 is only going to trunk, why can't we just deprecate 
> dual ports in 5.0 (as it isn't at -rc stage yet) and remove it from trunk? 
> That seems preferable to shoehorning something into the new 
> system_views.peers table, which isn't going to help any existing drivers 
> anyway as none of them will be using it.

I agree and I think it will be a mess having the port in 3.x, then not
in 4.0, 4.1, or 5.0, then resurrected again after that.

Kind Regards,
Brandon


Re: [Discuss] CASSANDRA-16999 introduction of a column in system.peers_v2

2024-02-13 Thread Sam Tunnicliffe
Late to the party I'm afraid, but I'd agree with Abe's proposal to deprecate 
the dual port approach given that CASSANDRA-10559 makes it pretty much 
redundant. Adding further yaml options like "default ssl/no ssl" feels like 
another nasty band-aid that we'll have to live with for the foreseeable future.

Also, if CASSANDRA-16999 is only going to trunk, why can't we just deprecate 
dual ports in 5.0 (as it isn't at -rc stage yet) and remove it from trunk? That 
seems preferable to shoehorning something into the new system_views.peers 
table, which isn't going to help any existing drivers anyway as none of them 
will be using it.

 
> On 12 Feb 2024, at 07:58, Štefan Miklošovič  
> wrote:
> 
> I think that the situation like
> 
> client_encryption_options.enabled = true
> client_encryption_options.optional=true
> native_transport_port != native_transport_port_ssl
> 
> is a legit bug and should be fixed. If we look here (1), when these ports are 
> not equal, the normal port is explicitly set to be unencrypted but it is 
> encrypted on _ssl port. This is not always true for _ssl port, because in (2) 
> we throw only if client_encryption_options' encryption_policy is UNENCRYPTED. 
> We do not throw if it is OPTIONAL. If we say that the normal port is always 
> unencrypted, why don't we also say that _ssl port is always encrypted? This 
> asymmetry should be fixed.
> 
> However, I think it is too late to fix anything but trunk. Adding a column 
> might break clients and fixing the logic around ports might potentially break 
> the deployments so our best shot seems to be:
> 
> 1) from 4.0 to trunk - apply a patch which would inform a user that it is 
> preferable to use single port instead of dual ports
> 2) for trunk - apply a patch which adds native_port_ssl column to 
> system_views.peers so Cassandra Java Driver can connect to such a deployment
> 3) optionally fix the bug I was describing above.
> 
> I am not sure how to evaluate the severity of 3). I would like to see it from 
> 4.0 to trunk but I also understand that if it is too disruptive, we can leave 
> it just to trunk.
> 
> The problems you described are the result of us using one 
> client_encryption_options for both ssl and non-ssl ports. I would say that 
> the first problem you described, even if it looks weird, is less serious, 
> because a user knowingly uses two ports and one of them is said to be 
> unencrypted and another one encrypted. So the fact that a non-ssl port still 
> enables unencrypted traffic is somehow expected. What is not expected is that 
> _ssl port might still accept unencrypted traffic.
> 
> To sum it up for the driver, I do not think this has a nice solution for dual 
> ports deployments already out there so it would work just for the trunk. 
> Actually, because there is missing native_port_ssl in a system table, there 
> is currently no way to successfully use dual ports because Java driver just 
> can not connect to SSL-enabled nodes reliably using ssl and non-ssl ports.
> 
> (1) 
> https://github.com/apache/cassandra/blob/097c1231e2466163fe3f8b36b12cdc5235eb1403/src/java/org/apache/cassandra/service/NativeTransportService.java#L94
> 
> (2) 
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/config/DatabaseDescriptor.java#L905-L915
> 
> On Thu, Feb 8, 2024 at 9:58 PM Abe Ratnofsky  > wrote:
>> > Deprecating helps nothing for existing releases. We can’t/shouldn’t remove 
>> > the feature in existing releases.
>> 
>> The deprecation I'm proposing is intended to push people to configure their 
>> servers in a way that is more secure and maximizes compatibility with 
>> clients. Deprecating can help for existing releases - the better 
>> configuration already exists, and it's likely that users of dual-native-port 
>> optional SSL can use it. At the very least, users should be made aware of 
>> the risks of dual-native-port operation.
>> 
>> Currently, if a user specifies the following server configuration:
>> - client_encryption_options.enabled=true
>> - client_encryption_options.optional=false
>> - native_transport_port != native_transport_port_ssl
>> 
>> then the server will still handle unencrypted traffic on 
>> native_transport_port. This feels like a security risk: it would be 
>> reasonable to interpret that this configuration requires all traffic to be 
>> encrypted.
>> 
>> And if a user specifies this server configuration:
>> - client_encryption_options.enabled=true
>> - client_encryption_options.optional=true
>> - native_transport_port != native_transport_port_ssl
>> 
>> then clients can still send unencrypted traffic to 
>> native_transport_port_ssl, since the server handles optional encryption on 
>> this port. In this case, there are two ports that accept unencrypted 
>> traffic, one of which also accepts encrypted traffic.
>> 
>> In both cases:
>> - Clients configured to use SSL will discover non-SSL ports from 
>> system.peers_v2 and fail to connect to those 

Re: [Discuss] CASSANDRA-16999 introduction of a column in system.peers_v2

2024-02-11 Thread Štefan Miklošovič
I think that the situation like

client_encryption_options.enabled = true
client_encryption_options.optional=true
native_transport_port != native_transport_port_ssl

is a legit bug and should be fixed. If we look here (1), when these ports
are not equal, the normal port is explicitly set to be unencrypted but it
is encrypted on _ssl port. This is not always true for _ssl port, because
in (2) we throw only if client_encryption_options' encryption_policy is
UNENCRYPTED. We do not throw if it is OPTIONAL. If we say that the normal
port is always unencrypted, why don't we also say that _ssl port is always
encrypted? This asymmetry should be fixed.

However, I think it is too late to fix anything but trunk. Adding a column
might break clients and fixing the logic around ports might potentially
break the deployments so our best shot seems to be:

1) from 4.0 to trunk - apply a patch which would inform a user that it is
preferable to use single port instead of dual ports
2) for trunk - apply a patch which adds native_port_ssl column to
system_views.peers so Cassandra Java Driver can connect to such a deployment
3) optionally fix the bug I was describing above.

I am not sure how to evaluate the severity of 3). I would like to see it
from 4.0 to trunk but I also understand that if it is too disruptive, we
can leave it just to trunk.

The problems you described are the result of us using one
client_encryption_options for both ssl and non-ssl ports. I would say that
the first problem you described, even if it looks weird, is less serious,
because a user knowingly uses two ports and one of them is said to be
unencrypted and another one encrypted. So the fact that a non-ssl port
still enables unencrypted traffic is somehow expected. What is not expected
is that _ssl port might still accept unencrypted traffic.

To sum it up for the driver, I do not think this has a nice solution for
dual ports deployments already out there so it would work just for the
trunk. Actually, because there is missing native_port_ssl in a system
table, there is currently no way to successfully use dual ports because
Java driver just can not connect to SSL-enabled nodes reliably using ssl
and non-ssl ports.

(1)
https://github.com/apache/cassandra/blob/097c1231e2466163fe3f8b36b12cdc5235eb1403/src/java/org/apache/cassandra/service/NativeTransportService.java#L94

(2)
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/config/DatabaseDescriptor.java#L905-L915

On Thu, Feb 8, 2024 at 9:58 PM Abe Ratnofsky  wrote:

> > Deprecating helps nothing for existing releases. We can’t/shouldn’t
> remove the feature in existing releases.
>
> The deprecation I'm proposing is intended to push people to configure
> their servers in a way that is more secure and maximizes compatibility with
> clients. Deprecating can help for existing releases - the better
> configuration already exists, and it's likely that users of
> dual-native-port optional SSL can use it. At the very least, users should
> be made aware of the risks of dual-native-port operation.
>
> Currently, if a user specifies the following server configuration:
> - client_encryption_options.enabled=true
> - client_encryption_options.optional=false
> - native_transport_port != native_transport_port_ssl
>
> then the server will still handle unencrypted traffic on
> native_transport_port. This feels like a security risk: it would be
> reasonable to interpret that this configuration requires all traffic to be
> encrypted.
>
> And if a user specifies this server configuration:
> - client_encryption_options.enabled=true
> - client_encryption_options.optional=true
> - native_transport_port != native_transport_port_ssl
>
> then clients can still send unencrypted traffic to
> native_transport_port_ssl, since the server handles optional encryption on
> this port. In this case, there are two ports that accept unencrypted
> traffic, one of which also accepts encrypted traffic.
>
> In both cases:
> - Clients configured to use SSL will discover non-SSL ports from
> system.peers_v2 and fail to connect to those hosts, causing
> single-connection sessions and no load balancing
> - Clients configured to use SSL will fail to interpret server-reported
> topology and status events because those events currently include non-SSL
> ports, causing user connection pools to incorrectly track cluster changes
>
> I'm proposing that in the dual-native-port case, we log a warning to
> advise clients to move to a single port, with the same values for enabled
> and optional. With a single port, users won't need to worry about
> native_transport_port always accepting unencrypted traffic. The peers_v2
> system table and EVENT response messages will include ports that clients
> will be able to connect to regardless of their SSL configuration.
>
> I'm open to discussing other ways we could handle this, but I think the
> requirements are:
> - Maintain compatibility with existing clients (no new tables for
> discove

Re: [Discuss] CASSANDRA-16999 introduction of a column in system.peers_v2

2024-02-08 Thread Abe Ratnofsky
> Deprecating helps nothing for existing releases. We can’t/shouldn’t remove 
> the feature in existing releases.

The deprecation I'm proposing is intended to push people to configure their 
servers in a way that is more secure and maximizes compatibility with clients. 
Deprecating can help for existing releases - the better configuration already 
exists, and it's likely that users of dual-native-port optional SSL can use it. 
At the very least, users should be made aware of the risks of dual-native-port 
operation.

Currently, if a user specifies the following server configuration:
- client_encryption_options.enabled=true
- client_encryption_options.optional=false
- native_transport_port != native_transport_port_ssl

then the server will still handle unencrypted traffic on native_transport_port. 
This feels like a security risk: it would be reasonable to interpret that this 
configuration requires all traffic to be encrypted.

And if a user specifies this server configuration:
- client_encryption_options.enabled=true
- client_encryption_options.optional=true
- native_transport_port != native_transport_port_ssl

then clients can still send unencrypted traffic to native_transport_port_ssl, 
since the server handles optional encryption on this port. In this case, there 
are two ports that accept unencrypted traffic, one of which also accepts 
encrypted traffic.

In both cases:
- Clients configured to use SSL will discover non-SSL ports from 
system.peers_v2 and fail to connect to those hosts, causing single-connection 
sessions and no load balancing
- Clients configured to use SSL will fail to interpret server-reported topology 
and status events because those events currently include non-SSL ports, causing 
user connection pools to incorrectly track cluster changes

I'm proposing that in the dual-native-port case, we log a warning to advise 
clients to move to a single port, with the same values for enabled and 
optional. With a single port, users won't need to worry about 
native_transport_port always accepting unencrypted traffic. The peers_v2 system 
table and EVENT response messages will include ports that clients will be able 
to connect to regardless of their SSL configuration.

I'm open to discussing other ways we could handle this, but I think the 
requirements are:
- Maintain compatibility with existing clients (no new tables for discovering 
peers, etc.)
- Ensure SSL and non-SSL sessions can continue to operate (with >1 pooled 
connections) without disruption

Thanks to Stefan for his efforts looking into this more closely with me 
yesterday. We found out how this came about as well - when the project moved 
support dual-native-port in CASSANDRA-9590 
, driver configurations 
were expected to include hard-coded ports for encrypted and unencrypted 
traffic. Then, when customizable per-host native ports came out in 
CASSANDRA-7554 , drivers 
were expected to discover the right native protocol port from the system table 
instead of hard-coding it. So this has been a problem since 4.0 for users 
running dual-native-port and clients requiring SSL.

--
Abe

Re: [Discuss] CASSANDRA-16999 introduction of a column in system.peers_v2

2024-02-07 Thread J. D. Jordan
Correct. But that initial connection will work and the client will work, it 
just won’t have connections to multiple nodes.

I didn’t say it’s optimal, but this is the best way I can see that doesn’t 
break things more than they are now, and does give an improvement because you 
can pick which ports shows up in peers.

Deprecating helps nothing for existing releases. We can’t/shouldn’t remove the 
feature in existing releases.

> On Feb 7, 2024, at 8:02 AM, Abe Ratnofsky  wrote:
> 
> If dual-native-port is enabled, a client is connecting unencrypted to the 
> non-SSL port, and "advertise-native-port=ssl" (name pending) is enabled, then 
> when that client fetches peers it will get the SSL ports, right? If the 
> client doesn't support SSL, then those subsequent connections will fail. An 
> operator would have to set "advertise-native-port=ssl" and override the port 
> options in all clients, which isn't feasible.


Re: [Discuss] CASSANDRA-16999 introduction of a column in system.peers_v2

2024-02-07 Thread Abe Ratnofsky
> For existing versions what about having a “default ssl” or “default no SSL” 
> yaml setting which decides what port is advertised?  Then someone could still 
> connect on the other port manually specifying.  Then new column can be added 
> with the new table in trunk.

I'm assuming "advertisement" here includes which port is in system.peers(_v2) 
and is included in status and topology events.

If dual-native-port is enabled, a client is connecting unencrypted to the 
non-SSL port, and "advertise-native-port=ssl" (name pending) is enabled, then 
when that client fetches peers it will get the SSL ports, right? If the client 
doesn't support SSL, then those subsequent connections will fail. An operator 
would have to set "advertise-native-port=ssl" and override the port options in 
all clients, which isn't feasible.

It might be possible to move to a system view, and return the right native port 
depending on whether the connection the query originates from is SSL or not. So 
if a control connection is over SSL, then ports in system_views.peers_ssl_aware 
(name pending) would be SSL ports. But having query responses dependent on 
connection state seems like a recipe for a very weird bug in the future.

I still think deprecating dual-native-port is the cleanest and most compatible 
solution, especially since it would address a few other related bugs as well. 
I'm considering a separate thread to get consensus on that.

Abe

Re: [Discuss] CASSANDRA-16999 introduction of a column in system.peers_v2

2024-02-07 Thread J. D. Jordan
We should not introduce a new column in a patch release. From what I have seen many drivers “select * from peers”, yes it’s not a good idea, but we can’t control what all clients do, and an extra column coming back may break the processing of that.For existing versions what about having a “default ssl” or “default no SSL” yaml setting which decides what port is advertised?  Then someone could still connect on the other port manually specifying.  Then new column can be added with the new table in trunk.-JeremiahOn Feb 7, 2024, at 5:56 AM, Štefan Miklošovič  wrote:Honest to god, I do not know, Abe. If I see a feedback where we reach consensus to deprecate dual port support, I will deprecate that. On Wed, Feb 7, 2024 at 12:42 PM Abe Ratnofsky  wrote:CASSANDRA-9590 (Support for both encrypted and unencrypted native transport connections) was implemented before CASSANDRA-10559 (Support encrypted and plain traffic on the same port), but both been available since 3.0.On 9590, STARTTLS was considered, but rejected due to the changes that would be required to support it from all drivers. But the current server implementation doesn't require STARTTLS: the client is expected to send the first message over the connection, so the server can just check if that message is encrypted, and then enable the Netty pipeline's SslHandler.The implementation in 10559 is compatible with existing clients, and is already used widely. Are there any reasons for users to stick with dual-native-port rather than a single port that supports both encrypted and unencrypted traffic?


Re: [Discuss] CASSANDRA-16999 introduction of a column in system.peers_v2

2024-02-07 Thread Štefan Miklošovič
Honest to god, I do not know, Abe. If I see a feedback where we reach
consensus to deprecate dual port support, I will deprecate that.

On Wed, Feb 7, 2024 at 12:42 PM Abe Ratnofsky  wrote:

> CASSANDRA-9590 (Support for both encrypted and unencrypted native
> transport connections) was implemented before CASSANDRA-10559 (Support
> encrypted and plain traffic on the same port), but both been available
> since 3.0.
>
> On 9590, STARTTLS was considered, but rejected due to the changes that
> would be required to support it from all drivers. But the current server
> implementation doesn't require STARTTLS: the client is expected to send the
> first message over the connection, so the server can just check if that
> message is encrypted
> ,
> and then enable the Netty pipeline's SslHandler
> 
> .
>
> The implementation in 10559 is compatible with existing clients, and is
> already used widely. Are there any reasons for users to stick with
> dual-native-port rather than a single port that supports both encrypted and
> unencrypted traffic?
>


Re: [Discuss] CASSANDRA-16999 introduction of a column in system.peers_v2

2024-02-07 Thread Abe Ratnofsky
CASSANDRA-9590 (Support for both encrypted and unencrypted native transport 
connections) was implemented before CASSANDRA-10559 (Support encrypted and 
plain traffic on the same port), but both been available since 3.0.

On 9590, STARTTLS was considered, but rejected due to the changes that would be 
required to support it from all drivers. But the current server implementation 
doesn't require STARTTLS: the client is expected to send the first message over 
the connection, so the server can just check if that message is encrypted 
,
 and then enable the Netty pipeline's SslHandler 
.

The implementation in 10559 is compatible with existing clients, and is already 
used widely. Are there any reasons for users to stick with dual-native-port 
rather than a single port that supports both encrypted and unencrypted traffic?

Re: [Discuss] CASSANDRA-16999 introduction of a column in system.peers_v2

2024-02-07 Thread Štefan Miklošovič
More context behind dual native port support might be found here
https://issues.apache.org/jira/browse/CASSANDRA-9590 where it was
implemented.

On Wed, Feb 7, 2024 at 12:07 PM Abe Ratnofsky  wrote:

> What is the audience for dual-native-port operation? My understanding is
> that most users can use a single port for optional SSL, ever since
> CASSANDRA-10559 .
> Using a single port for both encrypted and unencrypted traffic also makes
> clients more likely to behave correctly, since status and topology events
> (which identify hosts by their native address and port) will correctly
> identify host+port pairs that exist in user load balancing policies and
> connection pools. Dual-port operation appears to behave incorrectly against
> apache/cassandra-java-driver 4.x, as discussed in the #cassandra-drivers
> Slack channel
> . If
> this does not behave correctly in the official driver, it's likely there
> are bugs in other drivers' handling of dual-native-port clusters as well.
>
> Would there be any appetite for deprecating dual-native-port operation?
>
> > Also, if there is currently a user who e.g. reads from peers_v2 table
> and retrieves the value from a column by some index, then this would be
> "shifted" and it might break her reading.
>
> Given that peers_v2 is intended to be used by the internals of client
> implementations, we should be extra cautious about making changes. We
> shouldn't change the column index of any existing columns - if we're going
> to add a new column, it should be in the end position.
>
> Abe
>


Re: [Discuss] CASSANDRA-16999 introduction of a column in system.peers_v2

2024-02-07 Thread Abe Ratnofsky
What is the audience for dual-native-port operation? My understanding is that 
most users can use a single port for optional SSL, ever since CASSANDRA-10559 
. Using a single port 
for both encrypted and unencrypted traffic also makes clients more likely to 
behave correctly, since status and topology events (which identify hosts by 
their native address and port) will correctly identify host+port pairs that 
exist in user load balancing policies and connection pools. Dual-port operation 
appears to behave incorrectly against apache/cassandra-java-driver 4.x, as 
discussed in the #cassandra-drivers Slack channel 
. If this 
does not behave correctly in the official driver, it's likely there are bugs in 
other drivers' handling of dual-native-port clusters as well.

Would there be any appetite for deprecating dual-native-port operation?

> Also, if there is currently a user who e.g. reads from peers_v2 table and 
> retrieves the value from a column by some index, then this would be "shifted" 
> and it might break her reading.

Given that peers_v2 is intended to be used by the internals of client 
implementations, we should be extra cautious about making changes. We shouldn't 
change the column index of any existing columns - if we're going to add a new 
column, it should be in the end position.

Abe