GitHub user HeinzM created a discussion: data-server. is not reachable in some vpc isolated guest networks

Hello,

Let me describe the setup first:
We are trying to deploy a Kubernetes cluster in a VPC setup with isolated networks for the control plane and the worker nodes.
CloudStack version: 4.21

**Cloudstack VPC and isolated networks**

We created a vpc first.

```hcl
resource "cloudstack_vpc" "k8s_vpc01" {
  name         = var.k8s_vpc01_name
  cidr         = var.k8s_vpc01_cidr
  vpc_offering = var.k8s_vpc01_offering
  zone         = var.zone
  project      = var.project_id
}
```

k8s_vpc01_cidr = 10.0.0.0/19
Then we create three networks: one for the control plane and two for the worker nodes.

```hcl
resource "cloudstack_network" "k8s_cp_01" {
  name             = var.k8s_nw_cp01
  cidr             = var.k8s_nw_cp01_cidr
  network_offering = var.vpc_network_offering
  zone             = var.zone
  vpc_id           = cloudstack_vpc.k8s_vpc01.id
  acl_id           = cloudstack_network_acl.k8s_acl_cp.id
  project          = var.project_id
}
```

```hcl
resource "cloudstack_network" "k8s_wn" {
  count            = var.k8s_nw_wn_count
  project          = var.project_id
  name             = local.wn_nws[count.index].network_name
  cidr             = local.wn_nws[count.index].cidr
  network_offering = var.vpc_network_offering
  zone             = var.zone
  vpc_id           = cloudstack_vpc.k8s_vpc01.id
  acl_id           = cloudstack_network_acl.k8s_acl_wn[count.index].id
}
```

k8s_nw_cp01_cidr = 10.0.1.0/28
k8s_nw_wn01_cidr = 10.0.2.0/28
k8s_nw_wn02_cidr = 10.0.3.0/28
vpc_network_offering = DefaultIsolatedNetworkOfferingForVpcNetworks
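
For completeness, the ACLs and the worker-network map referenced above are defined roughly like this (simplified sketch; the actual ACL rules are omitted and the names/values are illustrative):

```hcl
# Simplified sketch -- the real ACL lists carry our actual rule sets.
resource "cloudstack_network_acl" "k8s_acl_cp" {
  name    = "k8s-acl-cp"
  vpc_id  = cloudstack_vpc.k8s_vpc01.id
  project = var.project_id
}

resource "cloudstack_network_acl" "k8s_acl_wn" {
  count   = var.k8s_nw_wn_count
  name    = "k8s-acl-wn-${count.index + 1}"
  vpc_id  = cloudstack_vpc.k8s_vpc01.id
  project = var.project_id
}

locals {
  # Worker network definitions consumed by cloudstack_network.k8s_wn above.
  wn_nws = [
    { network_name = "k8s-wn-01", cidr = "10.0.2.0/28" },
    { network_name = "k8s-wn-02", cidr = "10.0.3.0/28" },
  ]
}
```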

**Cloudstack instances**

We deploy the control plane nodes as cloudstack_instance resources with the following configuration:

```hcl
resource "cloudstack_instance" "controller" {
  depends_on        = [cloudstack_network.k8s_cp_01]
  count             = var.controller_count
  project           = var.project_id
  service_offering  = var.compute_offering_cp
  template          = var.talos_image
  name              = local.controller_nodes[count.index].name
  ip_address        = local.controller_nodes[count.index].ip
  zone              = var.zone
  cluster_id        = var.cluster_ids[0]
  network_id        = cloudstack_network.k8s_cp_01.id
  user_data         = base64encode(data.talos_machine_configuration.controller[count.index].machine_configuration)
  expunge           = true
}
```

and for the workers:

```hcl
resource "cloudstack_instance" "worker" {
  depends_on        = [
    cloudstack_instance.controller,
    cloudstack_network.k8s_wn
  ]
  for_each          = { for worker in local.worker_nodes : worker.name => worker }
  project           = var.project_id
  service_offering  = var.compute_offering_worker
  template          = var.talos_image
  name              = each.value.name
  ip_address        = each.value.ip
  zone              = var.zone
  cluster_id        = var.cluster_ids[0]
  network_id        = each.value.network
  user_data         = base64encode(data.talos_machine_configuration.worker.machine_configuration)
  expunge           = true
  root_disk_size    = 16
}
```
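
The controller_nodes and worker_nodes locals that the instance blocks consume are built elsewhere in our configuration; as a hypothetical example of their shape (names and addresses are illustrative only):

```hcl
# Hypothetical example of the locals the instance blocks reference; the real
# values are derived from the node count and the network CIDRs above.
locals {
  controller_nodes = [
    { name = "k8s-cp-01", ip = "10.0.1.10", gateway = "10.0.1.1" },
  ]

  worker_nodes = [
    { name = "k8s-wn-01", ip = "10.0.2.10", network = cloudstack_network.k8s_wn[0].id },
    { name = "k8s-wn-02", ip = "10.0.3.10", network = cloudstack_network.k8s_wn[1].id },
  ]
}
```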

**The userdata**

controller:

```hcl
data "talos_machine_configuration" "controller" {
  count            = var.controller_count
  cluster_name     = var.k8s_cluster_name
  cluster_endpoint = local.cluster_endpoint
  machine_secrets  = talos_machine_secrets.talos.machine_secrets
  machine_type     = "controlplane"
  talos_version    = "1.11.3"
  config_patches = [
    yamlencode({
      machine = {
        install = {
          disk = "/dev/vda"
          extraKernelArgs = ["talos.platform=cloudstack"]
        }
        env = {
          http_proxy = var.proxy_server
          https_proxy = var.proxy_server
          no_proxy = var.no_proxy
        }
        time = {
          servers = var.ntp_servers
        }
        kubelet = {
          extraArgs = {
            rotate-server-certificates = true
          }
        }
        network = {
          hostname = local.controller_nodes[count.index].name
          interfaces = [
            {
              deviceSelector = {
                physical = true
              }
              addresses: [ "${local.controller_nodes[count.index].ip}/${local.cidr_mask[1]}" ]
              routes: [ {
                network = "0.0.0.0/0"
                gateway = "${local.controller_nodes[count.index].gateway}"
              } ]
            }
          ]
          nameservers = var.dns_servers
        }
      }
      cluster = {
        network = {
          cni = {
            name = "none"
          }
        }
        proxy = {
          disabled = true
        }
        apiServer = {
          certSANs = [ cloudstack_ipaddress.k8s_cp_staticnat_ip01.ip_address ]
        }
        extraManifests = [
          "https://raw.githubusercontent.com/alex1989hu/kubelet-serving-cert-approver/main/deploy/standalone-install.yaml",
          "https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml"
        ]
      }
    })
  ]
}
```
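
The public IP referenced in certSANs is acquired on the VPC and static-NATed to the first controller, roughly like this (sketch only; the arguments may differ slightly from our actual code):

```hcl
# Sketch of the cluster endpoint IP used in certSANs above.
resource "cloudstack_ipaddress" "k8s_cp_staticnat_ip01" {
  vpc_id  = cloudstack_vpc.k8s_vpc01.id
  zone    = var.zone
  project = var.project_id
}

resource "cloudstack_static_nat" "k8s_cp01" {
  ip_address_id      = cloudstack_ipaddress.k8s_cp_staticnat_ip01.id
  virtual_machine_id = cloudstack_instance.controller[0].id
  project            = var.project_id
}
```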

worker:

```hcl
data "talos_machine_configuration" "worker" {
  cluster_name     = var.k8s_cluster_name
  cluster_endpoint = local.cluster_endpoint
  machine_secrets  = talos_machine_secrets.talos.machine_secrets
  talos_version    = "1.11.3"
  machine_type     = "worker"
  config_patches = [
    yamlencode({
      machine = {
        install = {
          disk = "/dev/vda"
          extraKernelArgs = ["talos.platform=cloudstack"]
        }
        env = {
          http_proxy = var.proxy_server
          https_proxy = var.proxy_server
          no_proxy = var.no_proxy
        }
        time = {
          servers = var.ntp_servers
        }
      }
      cluster = {
        network = {
          cni = {
            name = "none"
          }
        }
        proxy = {
          disabled = true
        }
      }
    })
  ]
}
```

proxy_server = http://server.ip:port
no_proxy = "10.0.0.0/8, data-server."

ntp_server = "/dev/ptp0"

What happens next:
The machines come up.
So far, the control plane machines have never been able to resolve data-server.
The worker networks alternate: sometimes data-server. can be resolved in one network and sometimes in the other.
On the virtual router VM I can see that data-server points to the respective IP addresses of the control plane and worker networks.
I can also see that in the worker network where data-server can be reached, the DNS points to the IP of the virtual router.
In the other networks, the DNS servers from the local network are used instead.
Without the network separation, i.e. with just a simple guest network, the configuration works perfectly.
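
The only workaround we have considered so far would be to pin the nameserver in the Talos patch to the virtual router of the respective network instead of relying on what DHCP hands out, along these lines (sketch only; the address is assumed, and hard-coding it is exactly what we would like to avoid):

```hcl
# Sketch of a possible workaround (not in our current config): pin the
# nameserver in the worker patch to the network's virtual router.
config_patches = [
  yamlencode({
    machine = {
      network = {
        nameservers = ["10.0.2.1"] # assumed virtual router IP of the worker network
      }
    }
  })
]
```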
I can't tell right now whether this is a bug or user error.
Does anyone have any advice for me?

GitHub link: https://github.com/apache/cloudstack/discussions/11879
