[jira] [Commented] (CASSANDRA-15823) Support for networking via identity instead of IP

2021-02-24 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17290081#comment-17290081
 ] 

Brandon Williams commented on CASSANDRA-15823:
--

bq. DNS is not strongly consistent and hard to reason about when making DNS a 
part of your system.

Also it, and any other kind of name resolution, has never been required at all, 
nor should it be.

> Support for networking via identity instead of IP
> -
>
> Key: CASSANDRA-15823
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15823
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Christopher Bradford
>Priority: Normal
>  Labels: kubernetes
> Attachments: consul-mesh-gateways.png, 
> istio-multicluster-with-gateways.svg, linkerd-service-mirroring.svg
>
>
> TL;DR: Instead of mapping host ids to IPs, use hostnames. This allows 
> resolution to different IP addresses per DC that may then be forwarded to 
> nodes on remote networks without requiring node to node IP connectivity for 
> cross-dc links.
>  
> This approach should not affect existing deployments as those could continue 
> to use IPs as the hostname and skip resolution.
> 
> With orchestration platforms like Kubernetes and the usage of ephemeral 
> containers in environments today we should consider some changes to how we 
> handle the tracking of nodes and their network location. Currently we 
> maintain a mapping between host ids and IP addresses.
>  
> With traditional infrastructure, if a node goes down it, usually, comes back 
> up with the same IP. In some environments this contract may be explicit with 
> virtual IPs that may move between hosts. In newer deployments, like on 
> Kubernetes, this contract is not possible. Pods (analogous to nodes) are 
> assigned an IP address at start time. Should the pod be restarted or 
> scheduled on a different host there is no guarantee we would have the same 
> IP. Cassandra is protected here as we already have logic in place to update 
> peers when we come up with the same host id, but a different IP address.
>  
> There are ways to get Kubernetes to assign a specific IP per Pod. Most 
> recommendations involve the use of a service per pod. Communication with the 
> fixed service IP would automatically forward to the associated pod, 
> regardless of address. We _could_ use this approach, but it seems like this 
> would needlessly create a number of extra resources in our k8s cluster to get 
> around the problem. Which, to be fair, doesn't seem like much of a problem 
> with the aforementioned mitigations built into C*.
>  
> So what is the _actual_ problem? *Cross-region, cross-cloud, 
> hybrid-deployment connectivity between pods is a pain.* This can be solved 
> with significant investment by those who want to deploy these types of 
> topologies. You can definitely configure connectivity between clouds over 
> dedicated connections, or VPN tunnels. With a big chunk of time insuring that 
> pod to pod connectivity just works even if those pods are managed by separate 
> control planes, but that again requires time and talent. There are a number 
> of edge cases to support between the ever so slight, but very important, 
> differences in cloud vendor networks.
>  
> Recently there have been a number of innovations that aid in the deployment 
> and operation of these types of applications on Kubernetes. Service meshes 
> support distributed microservices running across multiple k8s cluster control 
> planes in disparate networks. Instead of directly connecting to IP addresses 
> of remote services instead they use a hostname. With this approach, hostname 
> traffic may then be routed to a proxy that sends traffic over the WAN 
> (sometimes with mTLS) to another proxy pod in the remote cluster which then 
> forwards the data along to the correct pod in that network. (See attached 
> diagrams)
>  
> Which brings us to the point of this ticket. Instead of mapping host ids to 
> IPs, use hostnames (and update the underlying address periodically instead of 
> caching indefinitely). This allows resolution to different IP addresses per 
> DC (k8s cluster) that may then be forwarded to nodes (pods) on remote 
> networks (k8s clusters) without requiring node to node (pod to pod) IP 
> connectivity between them. Traditional deployments can still function like 
> they do today (even if operators opt to keep using IPs as identifiers instead 
> of hostnames). This proxy approach is then enabled like those we see in 
> service meshes.
>  
> _Notes_
> C* already has the concept of broadcast addresses vs those which are bound on 
> the node. This approach _could_ be leveraged to provide the behavior we're 
> looking for, but then the broadcast values would need to be pre-computed 
> _*and match*_ 

[jira] [Commented] (CASSANDRA-15823) Support for networking via identity instead of IP

2020-10-27 Thread vincent royer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17221344#comment-17221344
 ] 

vincent royer commented on CASSANDRA-15823:
---

Hi [~jjirsa], would you have the scenario that leads to inconsistency when 
replacing a node with the same IP as the dead node (A scenario also possible 
under kubernetes) ?

> Support for networking via identity instead of IP
> -
>
> Key: CASSANDRA-15823
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15823
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Christopher Bradford
>Priority: Normal
>  Labels: kubernetes
> Attachments: consul-mesh-gateways.png, 
> istio-multicluster-with-gateways.svg, linkerd-service-mirroring.svg
>
>
> TL;DR: Instead of mapping host ids to IPs, use hostnames. This allows 
> resolution to different IP addresses per DC that may then be forwarded to 
> nodes on remote networks without requiring node to node IP connectivity for 
> cross-dc links.
>  
> This approach should not affect existing deployments as those could continue 
> to use IPs as the hostname and skip resolution.
> 
> With orchestration platforms like Kubernetes and the usage of ephemeral 
> containers in environments today we should consider some changes to how we 
> handle the tracking of nodes and their network location. Currently we 
> maintain a mapping between host ids and IP addresses.
>  
> With traditional infrastructure, if a node goes down it, usually, comes back 
> up with the same IP. In some environments this contract may be explicit with 
> virtual IPs that may move between hosts. In newer deployments, like on 
> Kubernetes, this contract is not possible. Pods (analogous to nodes) are 
> assigned an IP address at start time. Should the pod be restarted or 
> scheduled on a different host there is no guarantee we would have the same 
> IP. Cassandra is protected here as we already have logic in place to update 
> peers when we come up with the same host id, but a different IP address.
>  
> There are ways to get Kubernetes to assign a specific IP per Pod. Most 
> recommendations involve the use of a service per pod. Communication with the 
> fixed service IP would automatically forward to the associated pod, 
> regardless of address. We _could_ use this approach, but it seems like this 
> would needlessly create a number of extra resources in our k8s cluster to get 
> around the problem. Which, to be fair, doesn't seem like much of a problem 
> with the aforementioned mitigations built into C*.
>  
> So what is the _actual_ problem? *Cross-region, cross-cloud, 
> hybrid-deployment connectivity between pods is a pain.* This can be solved 
> with significant investment by those who want to deploy these types of 
> topologies. You can definitely configure connectivity between clouds over 
> dedicated connections, or VPN tunnels. With a big chunk of time insuring that 
> pod to pod connectivity just works even if those pods are managed by separate 
> control planes, but that again requires time and talent. There are a number 
> of edge cases to support between the ever so slight, but very important, 
> differences in cloud vendor networks.
>  
> Recently there have been a number of innovations that aid in the deployment 
> and operation of these types of applications on Kubernetes. Service meshes 
> support distributed microservices running across multiple k8s cluster control 
> planes in disparate networks. Instead of directly connecting to IP addresses 
> of remote services instead they use a hostname. With this approach, hostname 
> traffic may then be routed to a proxy that sends traffic over the WAN 
> (sometimes with mTLS) to another proxy pod in the remote cluster which then 
> forwards the data along to the correct pod in that network. (See attached 
> diagrams)
>  
> Which brings us to the point of this ticket. Instead of mapping host ids to 
> IPs, use hostnames (and update the underlying address periodically instead of 
> caching indefinitely). This allows resolution to different IP addresses per 
> DC (k8s cluster) that may then be forwarded to nodes (pods) on remote 
> networks (k8s clusters) without requiring node to node (pod to pod) IP 
> connectivity between them. Traditional deployments can still function like 
> they do today (even if operators opt to keep using IPs as identifiers instead 
> of hostnames). This proxy approach is then enabled like those we see in 
> service meshes.
>  
> _Notes_
> C* already has the concept of broadcast addresses vs those which are bound on 
> the node. This approach _could_ be leveraged to provide the behavior we're 
> looking for, but then the broadcast values would need to be pre-computed 
> _*and match*_ across all k8s control planes. By 

[jira] [Commented] (CASSANDRA-15823) Support for networking via identity instead of IP

2020-09-23 Thread Jeff Jirsa (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17200879#comment-17200879
 ] 

Jeff Jirsa commented on CASSANDRA-15823:


Will note that CASSANDRA-10134 likely solves some of the things I describe, and 
my knowledge was out of date because CASSANDRA-10134 landed in a version I 
don't pay attention to.


> Support for networking via identity instead of IP
> -
>
> Key: CASSANDRA-15823
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15823
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Christopher Bradford
>Priority: Normal
>  Labels: kubernetes
> Attachments: consul-mesh-gateways.png, 
> istio-multicluster-with-gateways.svg, linkerd-service-mirroring.svg
>
>
> TL;DR: Instead of mapping host ids to IPs, use hostnames. This allows 
> resolution to different IP addresses per DC that may then be forwarded to 
> nodes on remote networks without requiring node to node IP connectivity for 
> cross-dc links.
>  
> This approach should not affect existing deployments as those could continue 
> to use IPs as the hostname and skip resolution.
> 
> With orchestration platforms like Kubernetes and the usage of ephemeral 
> containers in environments today we should consider some changes to how we 
> handle the tracking of nodes and their network location. Currently we 
> maintain a mapping between host ids and IP addresses.
>  
> With traditional infrastructure, if a node goes down it, usually, comes back 
> up with the same IP. In some environments this contract may be explicit with 
> virtual IPs that may move between hosts. In newer deployments, like on 
> Kubernetes, this contract is not possible. Pods (analogous to nodes) are 
> assigned an IP address at start time. Should the pod be restarted or 
> scheduled on a different host there is no guarantee we would have the same 
> IP. Cassandra is protected here as we already have logic in place to update 
> peers when we come up with the same host id, but a different IP address.
>  
> There are ways to get Kubernetes to assign a specific IP per Pod. Most 
> recommendations involve the use of a service per pod. Communication with the 
> fixed service IP would automatically forward to the associated pod, 
> regardless of address. We _could_ use this approach, but it seems like this 
> would needlessly create a number of extra resources in our k8s cluster to get 
> around the problem. Which, to be fair, doesn't seem like much of a problem 
> with the aforementioned mitigations built into C*.
>  
> So what is the _actual_ problem? *Cross-region, cross-cloud, 
> hybrid-deployment connectivity between pods is a pain.* This can be solved 
> with significant investment by those who want to deploy these types of 
> topologies. You can definitely configure connectivity between clouds over 
> dedicated connections, or VPN tunnels. With a big chunk of time insuring that 
> pod to pod connectivity just works even if those pods are managed by separate 
> control planes, but that again requires time and talent. There are a number 
> of edge cases to support between the ever so slight, but very important, 
> differences in cloud vendor networks.
>  
> Recently there have been a number of innovations that aid in the deployment 
> and operation of these types of applications on Kubernetes. Service meshes 
> support distributed microservices running across multiple k8s cluster control 
> planes in disparate networks. Instead of directly connecting to IP addresses 
> of remote services instead they use a hostname. With this approach, hostname 
> traffic may then be routed to a proxy that sends traffic over the WAN 
> (sometimes with mTLS) to another proxy pod in the remote cluster which then 
> forwards the data along to the correct pod in that network. (See attached 
> diagrams)
>  
> Which brings us to the point of this ticket. Instead of mapping host ids to 
> IPs, use hostnames (and update the underlying address periodically instead of 
> caching indefinitely). This allows resolution to different IP addresses per 
> DC (k8s cluster) that may then be forwarded to nodes (pods) on remote 
> networks (k8s clusters) without requiring node to node (pod to pod) IP 
> connectivity between them. Traditional deployments can still function like 
> they do today (even if operators opt to keep using IPs as identifiers instead 
> of hostnames). This proxy approach is then enabled like those we see in 
> service meshes.
>  
> _Notes_
> C* already has the concept of broadcast addresses vs those which are bound on 
> the node. This approach _could_ be leveraged to provide the behavior we're 
> looking for, but then the broadcast values would need to be pre-computed 
> _*and match*_ across all k8s control planes. By 

[jira] [Commented] (CASSANDRA-15823) Support for networking via identity instead of IP

2020-09-23 Thread Stefan Podkowinski (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17200662#comment-17200662
 ] 

Stefan Podkowinski commented on CASSANDRA-15823:


DNS is not strongly consistent and hard to reason about when making DNS a part 
of your system. We should not rely on address translation when thinking about 
consistency invariants.

The title of the issue says "Support for networking via identity instead of 
IP". We should already be able to add some support for identities without using 
hostnames, in case we enable internode encryption. X509 certificates are issued 
with a unique identity. If you create an outgoing connection to a host, you 
should be able to compare the hostname of the peers certificate you're 
connecting to, with the hostname you've seen before for that IP and fail in 
case they wouldn't match. But this is really just useful for failing hard on IP 
swapping situations and not a basis for enabling a more advanced service mesh 
integration you've been mentioning.

> Support for networking via identity instead of IP
> -
>
> Key: CASSANDRA-15823
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15823
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Christopher Bradford
>Priority: Normal
>  Labels: kubernetes
> Attachments: consul-mesh-gateways.png, 
> istio-multicluster-with-gateways.svg, linkerd-service-mirroring.svg
>
>
> TL;DR: Instead of mapping host ids to IPs, use hostnames. This allows 
> resolution to different IP addresses per DC that may then be forwarded to 
> nodes on remote networks without requiring node to node IP connectivity for 
> cross-dc links.
>  
> This approach should not affect existing deployments as those could continue 
> to use IPs as the hostname and skip resolution.
> 
> With orchestration platforms like Kubernetes and the usage of ephemeral 
> containers in environments today we should consider some changes to how we 
> handle the tracking of nodes and their network location. Currently we 
> maintain a mapping between host ids and IP addresses.
>  
> With traditional infrastructure, if a node goes down it, usually, comes back 
> up with the same IP. In some environments this contract may be explicit with 
> virtual IPs that may move between hosts. In newer deployments, like on 
> Kubernetes, this contract is not possible. Pods (analogous to nodes) are 
> assigned an IP address at start time. Should the pod be restarted or 
> scheduled on a different host there is no guarantee we would have the same 
> IP. Cassandra is protected here as we already have logic in place to update 
> peers when we come up with the same host id, but a different IP address.
>  
> There are ways to get Kubernetes to assign a specific IP per Pod. Most 
> recommendations involve the use of a service per pod. Communication with the 
> fixed service IP would automatically forward to the associated pod, 
> regardless of address. We _could_ use this approach, but it seems like this 
> would needlessly create a number of extra resources in our k8s cluster to get 
> around the problem. Which, to be fair, doesn't seem like much of a problem 
> with the aforementioned mitigations built into C*.
>  
> So what is the _actual_ problem? *Cross-region, cross-cloud, 
> hybrid-deployment connectivity between pods is a pain.* This can be solved 
> with significant investment by those who want to deploy these types of 
> topologies. You can definitely configure connectivity between clouds over 
> dedicated connections, or VPN tunnels. With a big chunk of time insuring that 
> pod to pod connectivity just works even if those pods are managed by separate 
> control planes, but that again requires time and talent. There are a number 
> of edge cases to support between the ever so slight, but very important, 
> differences in cloud vendor networks.
>  
> Recently there have been a number of innovations that aid in the deployment 
> and operation of these types of applications on Kubernetes. Service meshes 
> support distributed microservices running across multiple k8s cluster control 
> planes in disparate networks. Instead of directly connecting to IP addresses 
> of remote services instead they use a hostname. With this approach, hostname 
> traffic may then be routed to a proxy that sends traffic over the WAN 
> (sometimes with mTLS) to another proxy pod in the remote cluster which then 
> forwards the data along to the correct pod in that network. (See attached 
> diagrams)
>  
> Which brings us to the point of this ticket. Instead of mapping host ids to 
> IPs, use hostnames (and update the underlying address periodically instead of 
> caching indefinitely). This allows resolution to different IP addresses per 
> DC (k8s cluster) 

[jira] [Commented] (CASSANDRA-15823) Support for networking via identity instead of IP

2020-09-22 Thread Jeff Jirsa (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17200519#comment-17200519
 ] 

Jeff Jirsa commented on CASSANDRA-15823:


{quote}
If a node comes up with a completely brand new IP that was never part of the 
cluster before, then we do not have any issues that I know of. Do you know of 
any problems that can happen for that case Jeff Jirsa ?
{quote}

Think this is true, yes

{quote}
While I agree it would be best to have all membership based on UUID's, I think 
we need to allow people to have hostnames be the contact point, and have those 
re-resolve on every "connect". While I agree "ips are best, dns is the devil", 
I have seen bad DNS take down clusters, there are many systems being created 
right now where hostnames are the invariant, not ips, and we need Cassandra to 
be able to play in those environments.
{quote}

If someone decides to completely redesign all of the internals to make it work 
by dns name instead of hostID, I think that would be a pretty horrible mistake. 
Yes, working by DNS should work, but things like ownership need to get to the 
point that we're referring to the (known unique) host IDs instead of just 
slapping a bandaid on top of the IP problem.



> Support for networking via identity instead of IP
> -
>
> Key: CASSANDRA-15823
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15823
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Christopher Bradford
>Priority: Normal
>  Labels: kubernetes
> Attachments: consul-mesh-gateways.png, 
> istio-multicluster-with-gateways.svg, linkerd-service-mirroring.svg
>
>
> TL;DR: Instead of mapping host ids to IPs, use hostnames. This allows 
> resolution to different IP addresses per DC that may then be forwarded to 
> nodes on remote networks without requiring node to node IP connectivity for 
> cross-dc links.
>  
> This approach should not affect existing deployments as those could continue 
> to use IPs as the hostname and skip resolution.
> 
> With orchestration platforms like Kubernetes and the usage of ephemeral 
> containers in environments today we should consider some changes to how we 
> handle the tracking of nodes and their network location. Currently we 
> maintain a mapping between host ids and IP addresses.
>  
> With traditional infrastructure, if a node goes down it, usually, comes back 
> up with the same IP. In some environments this contract may be explicit with 
> virtual IPs that may move between hosts. In newer deployments, like on 
> Kubernetes, this contract is not possible. Pods (analogous to nodes) are 
> assigned an IP address at start time. Should the pod be restarted or 
> scheduled on a different host there is no guarantee we would have the same 
> IP. Cassandra is protected here as we already have logic in place to update 
> peers when we come up with the same host id, but a different IP address.
>  
> There are ways to get Kubernetes to assign a specific IP per Pod. Most 
> recommendations involve the use of a service per pod. Communication with the 
> fixed service IP would automatically forward to the associated pod, 
> regardless of address. We _could_ use this approach, but it seems like this 
> would needlessly create a number of extra resources in our k8s cluster to get 
> around the problem. Which, to be fair, doesn't seem like much of a problem 
> with the aforementioned mitigations built into C*.
>  
> So what is the _actual_ problem? *Cross-region, cross-cloud, 
> hybrid-deployment connectivity between pods is a pain.* This can be solved 
> with significant investment by those who want to deploy these types of 
> topologies. You can definitely configure connectivity between clouds over 
> dedicated connections, or VPN tunnels. With a big chunk of time insuring that 
> pod to pod connectivity just works even if those pods are managed by separate 
> control planes, but that again requires time and talent. There are a number 
> of edge cases to support between the ever so slight, but very important, 
> differences in cloud vendor networks.
>  
> Recently there have been a number of innovations that aid in the deployment 
> and operation of these types of applications on Kubernetes. Service meshes 
> support distributed microservices running across multiple k8s cluster control 
> planes in disparate networks. Instead of directly connecting to IP addresses 
> of remote services instead they use a hostname. With this approach, hostname 
> traffic may then be routed to a proxy that sends traffic over the WAN 
> (sometimes with mTLS) to another proxy pod in the remote cluster which then 
> forwards the data along to the correct pod in that network. (See attached 
> diagrams)
>  
> Which brings us to the point of this ticket. 

[jira] [Commented] (CASSANDRA-15823) Support for networking via identity instead of IP

2020-09-22 Thread Roman Chernobelskiy (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17200199#comment-17200199
 ] 

Roman Chernobelskiy commented on CASSANDRA-15823:
-

One way to address this in kubernetes is with a sidecar that encodes the pod 
names into virtual IP addresses, thereby giving each node a stable ip and then 
looking up the ip based on the encoded hostname on outgoing connections.

> Support for networking via identity instead of IP
> -
>
> Key: CASSANDRA-15823
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15823
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Christopher Bradford
>Priority: Normal
>  Labels: kubernetes
> Attachments: consul-mesh-gateways.png, 
> istio-multicluster-with-gateways.svg, linkerd-service-mirroring.svg
>
>
> TL;DR: Instead of mapping host ids to IPs, use hostnames. This allows 
> resolution to different IP addresses per DC that may then be forwarded to 
> nodes on remote networks without requiring node to node IP connectivity for 
> cross-dc links.
>  
> This approach should not affect existing deployments as those could continue 
> to use IPs as the hostname and skip resolution.
> 
> With orchestration platforms like Kubernetes and the usage of ephemeral 
> containers in environments today we should consider some changes to how we 
> handle the tracking of nodes and their network location. Currently we 
> maintain a mapping between host ids and IP addresses.
>  
> With traditional infrastructure, if a node goes down it, usually, comes back 
> up with the same IP. In some environments this contract may be explicit with 
> virtual IPs that may move between hosts. In newer deployments, like on 
> Kubernetes, this contract is not possible. Pods (analogous to nodes) are 
> assigned an IP address at start time. Should the pod be restarted or 
> scheduled on a different host there is no guarantee we would have the same 
> IP. Cassandra is protected here as we already have logic in place to update 
> peers when we come up with the same host id, but a different IP address.
>  
> There are ways to get Kubernetes to assign a specific IP per Pod. Most 
> recommendations involve the use of a service per pod. Communication with the 
> fixed service IP would automatically forward to the associated pod, 
> regardless of address. We _could_ use this approach, but it seems like this 
> would needlessly create a number of extra resources in our k8s cluster to get 
> around the problem. Which, to be fair, doesn't seem like much of a problem 
> with the aforementioned mitigations built into C*.
>  
> So what is the _actual_ problem? *Cross-region, cross-cloud, 
> hybrid-deployment connectivity between pods is a pain.* This can be solved 
> with significant investment by those who want to deploy these types of 
> topologies. You can definitely configure connectivity between clouds over 
> dedicated connections, or VPN tunnels. With a big chunk of time insuring that 
> pod to pod connectivity just works even if those pods are managed by separate 
> control planes, but that again requires time and talent. There are a number 
> of edge cases to support between the ever so slight, but very important, 
> differences in cloud vendor networks.
>  
> Recently there have been a number of innovations that aid in the deployment 
> and operation of these types of applications on Kubernetes. Service meshes 
> support distributed microservices running across multiple k8s cluster control 
> planes in disparate networks. Instead of directly connecting to IP addresses 
> of remote services instead they use a hostname. With this approach, hostname 
> traffic may then be routed to a proxy that sends traffic over the WAN 
> (sometimes with mTLS) to another proxy pod in the remote cluster which then 
> forwards the data along to the correct pod in that network. (See attached 
> diagrams)
>  
> Which brings us to the point of this ticket. Instead of mapping host ids to 
> IPs, use hostnames (and update the underlying address periodically instead of 
> caching indefinitely). This allows resolution to different IP addresses per 
> DC (k8s cluster) that may then be forwarded to nodes (pods) on remote 
> networks (k8s clusters) without requiring node to node (pod to pod) IP 
> connectivity between them. Traditional deployments can still function like 
> they do today (even if operators opt to keep using IPs as identifiers instead 
> of hostnames). This proxy approach is then enabled like those we see in 
> service meshes.
>  
> _Notes_
> C* already has the concept of broadcast addresses vs those which are bound on 
> the node. This approach _could_ be leveraged to provide the behavior we're 
> looking for, but then the broadcast values would need to 

[jira] [Commented] (CASSANDRA-15823) Support for networking via identity instead of IP

2020-05-26 Thread Jeremy Hanna (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17116489#comment-17116489
 ] 

Jeremy Hanna commented on CASSANDRA-15823:
--

Adding in related issues where host id (for vnodes in 1.2) and previously token 
(for 0.7) were put in place of IP address to identify nodes - for historical 
purposes.

> Support for networking via identity instead of IP
> -
>
> Key: CASSANDRA-15823
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15823
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Christopher Bradford
>Priority: Normal
> Attachments: consul-mesh-gateways.png, 
> istio-multicluster-with-gateways.svg, linkerd-service-mirroring.svg
>
>
> TL;DR: Instead of mapping host ids to IPs, use hostnames. This allows 
> resolution to different IP addresses per DC that may then be forwarded to 
> nodes on remote networks without requiring node to node IP connectivity for 
> cross-dc links.
>  
> This approach should not affect existing deployments as those could continue 
> to use IPs as the hostname and skip resolution.
> 
> With orchestration platforms like Kubernetes and the usage of ephemeral 
> containers in environments today we should consider some changes to how we 
> handle the tracking of nodes and their network location. Currently we 
> maintain a mapping between host ids and IP addresses.
>  
> With traditional infrastructure, if a node goes down it, usually, comes back 
> up with the same IP. In some environments this contract may be explicit with 
> virtual IPs that may move between hosts. In newer deployments, like on 
> Kubernetes, this contract is not possible. Pods (analogous to nodes) are 
> assigned an IP address at start time. Should the pod be restarted or 
> scheduled on a different host there is no guarantee we would have the same 
> IP. Cassandra is protected here as we already have logic in place to update 
> peers when we come up with the same host id, but a different IP address.
>  
> There are ways to get Kubernetes to assign a specific IP per Pod. Most 
> recommendations involve the use of a service per pod. Communication with the 
> fixed service IP would automatically forward to the associated pod, 
> regardless of address. We _could_ use this approach, but it seems like this 
> would needlessly create a number of extra resources in our k8s cluster to get 
> around the problem. Which, to be fair, doesn't seem like much of a problem 
> with the aforementioned mitigations built into C*.
>  
> So what is the _actual_ problem? *Cross-region, cross-cloud, 
> hybrid-deployment connectivity between pods is a pain.* This can be solved 
> with significant investment by those who want to deploy these types of 
> topologies. You can definitely configure connectivity between clouds over 
> dedicated connections, or VPN tunnels. With a big chunk of time insuring that 
> pod to pod connectivity just works even if those pods are managed by separate 
> control planes, but that again requires time and talent. There are a number 
> of edge cases to support between the ever so slight, but very important, 
> differences in cloud vendor networks.
>  
> Recently there have been a number of innovations that aid in the deployment 
> and operation of these types of applications on Kubernetes. Service meshes 
> support distributed microservices running across multiple k8s cluster control 
> planes in disparate networks. Instead of directly connecting to IP addresses 
> of remote services instead they use a hostname. With this approach, hostname 
> traffic may then be routed to a proxy that sends traffic over the WAN 
> (sometimes with mTLS) to another proxy pod in the remote cluster which then 
> forwards the data along to the correct pod in that network. (See attached 
> diagrams)
>  
> Which brings us to the point of this ticket. Instead of mapping host ids to 
> IPs, use hostnames (and update the underlying address periodically instead of 
> caching indefinitely). This allows resolution to different IP addresses per 
> DC (k8s cluster) that may then be forwarded to nodes (pods) on remote 
> networks (k8s clusters) without requiring node to node (pod to pod) IP 
> connectivity between them. Traditional deployments can still function like 
> they do today (even if operators opt to keep using IPs as identifiers instead 
> of hostnames). This proxy approach is then enabled like those we see in 
> service meshes.
>  
> _Notes_
> C* already has the concept of broadcast addresses vs those which are bound on 
> the node. This approach _could_ be leveraged to provide the behavior we're 
> looking for, but then the broadcast values would need to be pre-computed 
> _*and match*_ across all k8s control planes. By using hostnames the 
> underlying IP 

[jira] [Commented] (CASSANDRA-15823) Support for networking via identity instead of IP

2020-05-20 Thread Jeremiah Jordan (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112584#comment-17112584
 ] 

Jeremiah Jordan commented on CASSANDRA-15823:
-

bq. Cassandra is protected here as we already have logic in place to update 
peers when we come up with the same host id, but a different IP address.
bq. This definitely isn’t true / strictly safe. In fact it’s trivial to violate 
consistency / lose data by swapping the IP of two Pods/instances on the same 
host.

Right.  We are OK right now as long as you don't actually "swap" its with 
another node.  Aka Node A goes down loses its IP, Node B goes down loses its 
IP, Node A comes back up with the IP node B previously had.  "bad things" will 
happen in this case as there are a bunch of race conditions for the change in 
token ownership that just occurred.

If a node comes up with a completely brand new IP that was never part of the 
cluster before, then we do not have any issues that I know of.  Do you know of 
any problems that can happen for that case [~jjirsa] ?

bq. We really need everything to be based on UUIDs, not ip or port or host 
name. And we really really really shouldn’t assume that dns is universally 
available or correct (because that’s just not always true, even in 2020).

While I agree it would be best to have all membership based on UUID's, I think 
we need to allow people to have hostnames be the contact point, and have those 
re-resolve on every "connect".  While I agree "ips are best, dns is the devil", 
 I have seen bad DNS take down clusters, there are many systems being created 
right now where hostnames are the invariant, not ips, and we need Cassandra to 
be able to play in those environments.

> Support for networking via identity instead of IP
> -
>
> Key: CASSANDRA-15823
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15823
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Christopher Bradford
>Priority: Normal
> Attachments: consul-mesh-gateways.png, 
> istio-multicluster-with-gateways.svg, linkerd-service-mirroring.svg
>
>
> TL;DR: Instead of mapping host ids to IPs, use hostnames. This allows 
> resolution to different IP addresses per DC that may then be forwarded to 
> nodes on remote networks without requiring node to node IP connectivity for 
> cross-dc links.
>  
> This approach should not affect existing deployments as those could continue 
> to use IPs as the hostname and skip resolution.
> 
> With orchestration platforms like Kubernetes and the usage of ephemeral 
> containers in environments today we should consider some changes to how we 
> handle the tracking of nodes and their network location. Currently we 
> maintain a mapping between host ids and IP addresses.
>  
> With traditional infrastructure, if a node goes down it, usually, comes back 
> up with the same IP. In some environments this contract may be explicit with 
> virtual IPs that may move between hosts. In newer deployments, like on 
> Kubernetes, this contract is not possible. Pods (analogous to nodes) are 
> assigned an IP address at start time. Should the pod be restarted or 
> scheduled on a different host there is no guarantee we would have the same 
> IP. Cassandra is protected here as we already have logic in place to update 
> peers when we come up with the same host id, but a different IP address.
>  
> There are ways to get Kubernetes to assign a specific IP per Pod. Most 
> recommendations involve the use of a service per pod. Communication with the 
> fixed service IP would automatically forward to the associated pod, 
> regardless of address. We _could_ use this approach, but it seems like this 
> would needlessly create a number of extra resources in our k8s cluster to get 
> around the problem. Which, to be fair, doesn't seem like much of a problem 
> with the aforementioned mitigations built into C*.
>  
> So what is the _actual_ problem? *Cross-region, cross-cloud, 
> hybrid-deployment connectivity between pods is a pain.* This can be solved 
> with significant investment by those who want to deploy these types of 
> topologies. You can definitely configure connectivity between clouds over 
> dedicated connections, or VPN tunnels. With a big chunk of time insuring that 
> pod to pod connectivity just works even if those pods are managed by separate 
> control planes, but that again requires time and talent. There are a number 
> of edge cases to support between the ever so slight, but very important, 
> differences in cloud vendor networks.
>  
> Recently there have been a number of innovations that aid in the deployment 
> and operation of these types of applications on Kubernetes. Service meshes 
> support distributed microservices running across multiple k8s cluster control 
> 

[jira] [Commented] (CASSANDRA-15823) Support for networking via identity instead of IP

2020-05-20 Thread Christopher Bradford (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112377#comment-17112377
 ] 

Christopher Bradford commented on CASSANDRA-15823:
--

Isn’t host id a uuid? I’m not suggesting we remove that part of the equation. 
The goal of this ticket is to support address identifiers other than IP. 

DNS availability and resilience should be weighed by the users on a per 
environment basis. If the user prefers configuration by hostname and believe 
their DNS is HA and resilient enough they should be able to opt in here. 

Now if we’re talking about validating a request was received and processed by a 
specific host id. IMHO that’s a different change / ticket. 

Do we have anything in our protocols / RPC that validates a request that’s 
received? IE if a coordinator sends a request to an IP that’s been swapped, 
does the receiving node validate that the request belongs to it? Maybe more 
succinctly, are host ids sent along with each request for validation purposes?

> Support for networking via identity instead of IP
> -
>
> Key: CASSANDRA-15823
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15823
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Christopher Bradford
>Priority: Normal
> Attachments: consul-mesh-gateways.png, 
> istio-multicluster-with-gateways.svg, linkerd-service-mirroring.svg
>
>
> TL;DR: Instead of mapping host ids to IPs, use hostnames. This allows 
> resolution to different IP addresses per DC that may then be forwarded to 
> nodes on remote networks without requiring node to node IP connectivity for 
> cross-dc links.
>  
> This approach should not affect existing deployments as those could continue 
> to use IPs as the hostname and skip resolution.
> 
> With orchestration platforms like Kubernetes and the usage of ephemeral 
> containers in environments today we should consider some changes to how we 
> handle the tracking of nodes and their network location. Currently we 
> maintain a mapping between host ids and IP addresses.
>  
> With traditional infrastructure, if a node goes down it, usually, comes back 
> up with the same IP. In some environments this contract may be explicit with 
> virtual IPs that may move between hosts. In newer deployments, like on 
> Kubernetes, this contract is not possible. Pods (analogous to nodes) are 
> assigned an IP address at start time. Should the pod be restarted or 
> scheduled on a different host there is no guarantee we would have the same 
> IP. Cassandra is protected here as we already have logic in place to update 
> peers when we come up with the same host id, but a different IP address.
>  
> There are ways to get Kubernetes to assign a specific IP per Pod. Most 
> recommendations involve the use of a service per pod. Communication with the 
> fixed service IP would automatically forward to the associated pod, 
> regardless of address. We _could_ use this approach, but it seems like this 
> would needlessly create a number of extra resources in our k8s cluster to get 
> around the problem. Which, to be fair, doesn't seem like much of a problem 
> with the aforementioned mitigations built into C*.
>  
> So what is the _actual_ problem? *Cross-region, cross-cloud, 
> hybrid-deployment connectivity between pods is a pain.* This can be solved 
> with significant investment by those who want to deploy these types of 
> topologies. You can definitely configure connectivity between clouds over 
> dedicated connections, or VPN tunnels. With a big chunk of time insuring that 
> pod to pod connectivity just works even if those pods are managed by separate 
> control planes, but that again requires time and talent. There are a number 
> of edge cases to support between the ever so slight, but very important, 
> differences in cloud vendor networks.
>  
> Recently there have been a number of innovations that aid in the deployment 
> and operation of these types of applications on Kubernetes. Service meshes 
> support distributed microservices running across multiple k8s cluster control 
> planes in disparate networks. Instead of directly connecting to IP addresses 
> of remote services instead they use a hostname. With this approach, hostname 
> traffic may then be routed to a proxy that sends traffic over the WAN 
> (sometimes with mTLS) to another proxy pod in the remote cluster which then 
> forwards the data along to the correct pod in that network. (See attached 
> diagrams)
>  
> Which brings us to the point of this ticket. Instead of mapping host ids to 
> IPs, use hostnames (and update the underlying address periodically instead of 
> caching indefinitely). This allows resolution to different IP addresses per 
> DC (k8s cluster) that may then be forwarded to nodes (pods) 

[jira] [Commented] (CASSANDRA-15823) Support for networking via identity instead of IP

2020-05-19 Thread Jeff Jirsa (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17111724#comment-17111724
 ] 

Jeff Jirsa commented on CASSANDRA-15823:


> Cassandra is protected here as we already have logic in place to update peers 
> when we come up with the same host id, but a different IP address.


This definitely isn’t true / strictly safe. In fact it’s trivial to violate 
consistency / lose data by swapping the IP of two Pods/instances on the same 
host. 

We really need everything to be based on UUIDs, not ip or port or host name. 
And we really really really shouldn’t assume that dns is universally available 
or correct (because that’s just not always true, even in 2020).


> Support for networking via identity instead of IP
> -
>
> Key: CASSANDRA-15823
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15823
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Christopher Bradford
>Priority: Normal
> Attachments: consul-mesh-gateways.png, 
> istio-multicluster-with-gateways.svg, linkerd-service-mirroring.svg
>
>
> TL;DR: Instead of mapping host ids to IPs, use hostnames. This allows 
> resolution to different IP addresses per DC that may then be forwarded to 
> nodes on remote networks without requiring node to node IP connectivity for 
> cross-dc links.
>  
> This approach should not affect existing deployments as those could continue 
> to use IPs as the hostname and skip resolution.
> 
> With orchestration platforms like Kubernetes and the usage of ephemeral 
> containers in environments today we should consider some changes to how we 
> handle the tracking of nodes and their network location. Currently we 
> maintain a mapping between host ids and IP addresses.
>  
> With traditional infrastructure, if a node goes down it, usually, comes back 
> up with the same IP. In some environments this contract may be explicit with 
> virtual IPs that may move between hosts. In newer deployments, like on 
> Kubernetes, this contract is not possible. Pods (analogous to nodes) are 
> assigned an IP address at start time. Should the pod be restarted or 
> scheduled on a different host there is no guarantee we would have the same 
> IP. Cassandra is protected here as we already have logic in place to update 
> peers when we come up with the same host id, but a different IP address.
>  
> There are ways to get Kubernetes to assign a specific IP per Pod. Most 
> recommendations involve the use of a service per pod. Communication with the 
> fixed service IP would automatically forward to the associated pod, 
> regardless of address. We _could_ use this approach, but it seems like this 
> would needlessly create a number of extra resources in our k8s cluster to get 
> around the problem. Which, to be fair, doesn't seem like much of a problem 
> with the aforementioned mitigations built into C*.
>  
> So what is the _actual_ problem? *Cross-region, cross-cloud, 
> hybrid-deployment connectivity between pods is a pain.* This can be solved 
> with significant investment by those who want to deploy these types of 
> topologies. You can definitely configure connectivity between clouds over 
> dedicated connections, or VPN tunnels. With a big chunk of time insuring that 
> pod to pod connectivity just works even if those pods are managed by separate 
> control planes, but that again requires time and talent. There are a number 
> of edge cases to support between the ever so slight, but very important, 
> differences in cloud vendor networks.
>  
> Recently there have been a number of innovations that aid in the deployment 
> and operation of these types of applications on Kubernetes. Service meshes 
> support distributed microservices running across multiple k8s cluster control 
> planes in disparate networks. Instead of directly connecting to IP addresses 
> of remote services instead they use a hostname. With this approach, hostname 
> traffic may then be routed to a proxy that sends traffic over the WAN 
> (sometimes with mTLS) to another proxy pod in the remote cluster which then 
> forwards the data along to the correct pod in that network. (See attached 
> diagrams)
>  
> Which brings us to the point of this ticket. Instead of mapping host ids to 
> IPs, use hostnames (and update the underlying address periodically instead of 
> caching indefinitely). This allows resolution to different IP addresses per 
> DC (k8s cluster) that may then be forwarded to nodes (pods) on remote 
> networks (k8s clusters) without requiring node to node (pod to pod) IP 
> connectivity between them. Traditional deployments can still function like 
> they do today (even if operators opt to keep using IPs as identifiers instead 
> of hostnames). This proxy approach is then enabled like those we see 

[jira] [Commented] (CASSANDRA-15823) Support for networking via identity instead of IP

2020-05-19 Thread Dor Laor (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17111693#comment-17111693
 ] 

Dor Laor commented on CASSANDRA-15823:
--

In Scylla we see a similar need and there is a suggestion to use UUID

identification instead of IPs: [https://github.com/scylladb/scylla/issues/6403]

> Support for networking via identity instead of IP
> -
>
> Key: CASSANDRA-15823
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15823
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Christopher Bradford
>Priority: Normal
> Attachments: consul-mesh-gateways.png, 
> istio-multicluster-with-gateways.svg, linkerd-service-mirroring.svg
>
>
> TL;DR: Instead of mapping host ids to IPs, use hostnames. This allows 
> resolution to different IP addresses per DC that may then be forwarded to 
> nodes on remote networks without requiring node to node IP connectivity for 
> cross-dc links.
>  
> This approach should not affect existing deployments as those could continue 
> to use IPs as the hostname and skip resolution.
> 
> With orchestration platforms like Kubernetes and the usage of ephemeral 
> containers in environments today we should consider some changes to how we 
> handle the tracking of nodes and their network location. Currently we 
> maintain a mapping between host ids and IP addresses.
>  
> With traditional infrastructure, if a node goes down it, usually, comes back 
> up with the same IP. In some environments this contract may be explicit with 
> virtual IPs that may move between hosts. In newer deployments, like on 
> Kubernetes, this contract is not possible. Pods (analogous to nodes) are 
> assigned an IP address at start time. Should the pod be restarted or 
> scheduled on a different host there is no guarantee we would have the same 
> IP. Cassandra is protected here as we already have logic in place to update 
> peers when we come up with the same host id, but a different IP address.
>  
> There are ways to get Kubernetes to assign a specific IP per Pod. Most 
> recommendations involve the use of a service per pod. Communication with the 
> fixed service IP would automatically forward to the associated pod, 
> regardless of address. We _could_ use this approach, but it seems like this 
> would needlessly create a number of extra resources in our k8s cluster to get 
> around the problem. Which, to be fair, doesn't seem like much of a problem 
> with the aforementioned mitigations built into C*.
>  
> So what is the _actual_ problem? *Cross-region, cross-cloud, 
> hybrid-deployment connectivity between pods is a pain.* This can be solved 
> with significant investment by those who want to deploy these types of 
> topologies. You can definitely configure connectivity between clouds over 
> dedicated connections, or VPN tunnels. With a big chunk of time insuring that 
> pod to pod connectivity just works even if those pods are managed by separate 
> control planes, but that again requires time and talent. There are a number 
> of edge cases to support between the ever so slight, but very important, 
> differences in cloud vendor networks.
>  
> Recently there have been a number of innovations that aid in the deployment 
> and operation of these types of applications on Kubernetes. Service meshes 
> support distributed microservices running across multiple k8s cluster control 
> planes in disparate networks. Instead of directly connecting to IP addresses 
> of remote services instead they use a hostname. With this approach, hostname 
> traffic may then be routed to a proxy that sends traffic over the WAN 
> (sometimes with mTLS) to another proxy pod in the remote cluster which then 
> forwards the data along to the correct pod in that network. (See attached 
> diagrams)
>  
> Which brings us to the point of this ticket. Instead of mapping host ids to 
> IPs, use hostnames (and update the underlying address periodically instead of 
> caching indefinitely). This allows resolution to different IP addresses per 
> DC (k8s cluster) that may then be forwarded to nodes (pods) on remote 
> networks (k8s clusters) without requiring node to node (pod to pod) IP 
> connectivity between them. Traditional deployments can still function like 
> they do today (even if operators opt to keep using IPs as identifiers instead 
> of hostnames). This proxy approach is then enabled like those we see in 
> service meshes.
>  
> _Notes_
> C* already has the concept of broadcast addresses vs those which are bound on 
> the node. This approach _could_ be leveraged to provide the behavior we're 
> looking for, but then the broadcast values would need to be pre-computed 
> _*and match*_ across all k8s control planes. By using hostnames the 
> underlying IP address does not matter and will