-------- Original Message --------
From: Stefan Lendl <s.le...@proxmox.com>
To: "DERUMIER, Alexandre" <alexandre.derum...@groupe-cyllene.com>
Cc: pve-devel@lists.proxmox.com
Subject: Re: [pve-devel] [WIP v2 cluster/network/manager/qemu-server/container 00/10] Add support for DHCP servers to SDN
Date: 27/10/2023 14:53:25
Hi Alexandre, I am proposing a slightly different view.

>> I think it's better to keep all IPs managed by the IPAM in the IPAM,
>> and the VM only configures itself via DHCP.

Yes, I'm thinking exactly the same! I tried two years ago to implement
static IPs in the VM configuration (+ IPAM), and there are a lot of
corner cases.

>> I would implement the 4 mentioned events (vNIC create, destroy,
>> start, stop) in the SDN module and limit interactions between VM
>> configs and the SDN module to these events.
>>
>> On NIC create: it calls the SDN::nic_join_vnet($bridge, $mac)
>> function that handles IPAM registration if necessary, triggers
>> regenerating the DHCP config, and so on. Same approach for the other
>> SDN related events.
>>
>> All the logic is implemented in the SDN module. This reduces coupling
>> between VM logic and SDN logic.

Sounds great :)

"DERUMIER, Alexandre" <alexandre.derum...@groupe-cyllene.com> writes:

> Hi Stefan (Lendl),
>
> I totally agree with you: we should have persistent reservations,
> made at VM create/NIC plug and released at NIC delete/VM delete.
>
> At least for my usage, with multiple clusters in different
> datacenters, I really can't call the IPAM API on each VM start (for
> scalability, and for safety if the IPAM is down).
>
> This also allows us to simply write the reservations into the dnsmasq
> config file without any need to restart it. (AFAIK, OpenStack uses
> dnsmasq like this too.)
>
> I'm not sure a truly dynamic ephemeral IP, changing on each VM
> stop/start, is interesting for server VM usage. (Maybe for desktop
> VMs where you share a small pool of IPs, but I personally don't know
> of any users running Proxmox VE like this.)
>
> See my proposal here (it handles both ephemeral && reserved IPs, but
> it's even easier with reserved only):
>
> https://lists.proxmox.com/pipermail/pve-devel/2023-September/059169.html
>
> "
> I think we could implement the ipam calls like this:
>
> create vm or add a new nic
> --------------------------
> qm create ... -net0 bridge=vnet,...,ip=(auto|192.168.0.1|dynamic),ip6=(..)
>
> auto: search for a free IP in the IPAM and write it into the net0
> ip field.
>
> 192.168.0.1: check that the IP is free in the IPAM && register it
> there, then write it into the ip field.
>
> dynamic: write "ephemeral" into net0 (...,ip=ephemeral). This is a
> dynamic IP, registered at VM start and released at VM stop.
>
> vm start
> --------
> - if ip=ephemeral, find && register a free IP in the IPAM and write
>   it into the VM config as net0: ...,ip=192.168.0.10[E] (maybe with
>   a special flag [E] to indicate it's ephemeral)
> - read the IP from the VM config && inject it into DHCP
>
> vm stop
> -------
> if the IP is ephemeral (netX: ip=192.168.0.10[E]), delete the IP
> from the IPAM and set ip=ephemeral again in the VM config
>
> vm destroy or nic remove/unplug
> -------------------------------
> if netX: ...,ip=192.168.0.10, remove the IP from the IPAM
>
> nic update while the vm is running
> ----------------------------------
> if an IP is defined (netX: ip=192.168.0.10), we don't allow bridge
> or IP changes, as the VM is not notified about these changes and
> would still use the old IP.
>
> We can allow NIC hot-unplug && hotplug. (The guest OS will remove
> the IP on NIC removal, and will call DHCP again on NIC hotplug.)
>
> nic hotplug with ip=auto
> ------------------------
> --> add the NIC in pending state --> find an IP in the IPAM && write
> it in pending --> do the hotplug in QEMU.
>
> We need to handle the config revert, to remove the IP from the IPAM
> if the NIC hotplug stays blocked in pending state. (I have only ever
> seen this when the guest OS doesn't have the pci_hotplug module
> loaded, but it's better to be careful.)
> "
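> A rough sketch of the vm start/stop part in Perl (this assumes the
> proposed ip= property on netX; the IPAM helpers find_next_free_ip,
> release_ip and add_dhcp_mapping are purely illustrative, not an
> existing API):
>
>     # vm start: resolve an ephemeral IP, then feed the mapping to DHCP
>     sub nic_on_vm_start {
>         my ($conf, $netid) = @_;
>         my $net = PVE::QemuServer::parse_net($conf->{$netid});
>         if (($net->{ip} // '') eq 'ephemeral') {
>             # hypothetical IPAM helper: register && return a free IP
>             my $new = find_next_free_ip($net->{bridge}, $net->{macaddr});
>             $net->{ip} = "${new}[E]"; # [E] flags the IP as ephemeral
>             $conf->{$netid} = PVE::QemuServer::print_net($net);
>         }
>         my $ip = ($net->{ip} // '') =~ s/\[E\]$//r;
>         add_dhcp_mapping($net->{bridge}, $net->{macaddr}, $ip) if $ip;
>     }
>
>     # vm stop: release ephemeral IPs, keep reserved ones
>     sub nic_on_vm_stop {
>         my ($conf, $netid) = @_;
>         my $net = PVE::QemuServer::parse_net($conf->{$netid});
>         if (($net->{ip} // '') =~ /^(.*)\[E\]$/) {
>             release_ip($net->{bridge}, $1); # hypothetical IPAM helper
>             $net->{ip} = 'ephemeral';
>             $conf->{$netid} = PVE::QemuServer::print_net($net);
>         }
>     }
>
> With reserved-only IPs, both hooks disappear entirely: the mapping is
> written once at NIC create and never touched at start/stop.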
> I am currently working on the SDN feature. This is an initial review
> of the patch series, and I am trying to make a strong case against
> ephemeral DHCP IP reservation.
>
> The current state of the patch series invokes the IPAM on every
> VM/CT start/stop to add or remove the IP from the IPAM. This
> triggers the dnsmasq config generation on the specific host, with
> only the MAC/IP mapping of that particular host.
>
> From reading the discussion of the v1 patch series, I understand
> this approach tries to implement the ephemeral IP reservation
> strategy. From off-list conversations with Stefan Hanreich, I agree
> that having ephemeral IP reservation coordinated by the IPAM
> requires us to re-implement DHCP functionality in the IPAM and rely
> heavily on syncing between the different services.
>
> To maintain reliable sync we need to hook into many different places
> where the IPAM needs to be queried. Any issue with the
> implementation may lead to the IPAM and the local DHCP config state
> running out of sync, causing network issues such as duplicate IPs.
>
> Furthermore, every interaction with the IPAM requires a cluster-wide
> lock on the IPAM. Having a central cluster-wide lock on every VM
> start/stop/migrate will significantly limit parallel operations.
> Even starting two VMs in parallel will be limited by this central
> lock. At boot, trying to start many VMs (ideally as many in parallel
> as possible) is limited by the central IPAM lock even further.
>
> I argue that we should not support ephemeral IPs at all.
> The alternative is to make all IPAM reservations persistent.
>
> Using persistent IPs only reduces the interactions of VM/CTs with
> the IPAM to a minimum: a NIC joining a subnet and a NIC leaving a
> subnet. I am deliberately not referring to VMs, because a VM may be
> part of multiple VNets, or even be in the same VNet multiple times
> (regardless of whether that is sensible).
>
> Cases where the IPAM needs to be involved:
>
> - a NIC with a DHCP-enabled VNet is added to a VM config
> - a NIC with a DHCP-enabled VNet is removed from a VM config
> - a NIC is assigned to another bridge
>   (can be treated as individual leave + join events)
>
> Cases that are explicitly not covered, but may be added if desired:
>
> - manually assigning an IP address on a NIC:
>   will not be automatically visible in the IPAM
> - manually changing the MAC on a NIC:
>   don't do that, you are on your own; not handled, change it in the
>   IPAM manually
>
> Once an IP is reserved via the IPAM, the dnsmasq config can be
> generated stateless and idempotent from the PVE IPAM, and is
> identical on all nodes, regardless of whether a VM/CT actually
> resides on that node or is running or stopped. This is especially
> useful for VM migration, because the IP stays consistent without
> special considerations.
>
> Snapshot/revert, backup/restore and suspend/hibernate/resume cases
> are automatically covered, because the IP will already be reserved
> for that MAC.
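> To illustrate how stateless that generation can be, a minimal sketch
> (the ipam.db layout used here is assumed for the example, not the
> actual on-disk format):
>
>     use JSON qw(decode_json);
>     use PVE::Tools;
>
>     # regenerate the ethers file for one subnet purely from IPAM
>     # state; running this on any node, at any time, yields the same
>     # file, whether the guests are running there or not
>     sub write_ethers {
>         my ($dhcpid, $subnet) = @_;
>         my $raw = PVE::Tools::file_get_contents('/etc/pve/priv/ipam.db');
>         my $ips = decode_json($raw)->{subnets}->{$subnet}->{ips} // {};
>         my $content = '';
>         for my $ip (sort keys %$ips) {
>             my $mac = $ips->{$ip}->{mac} or next;
>             $content .= "$mac,$ip\n"; # dnsmasq dhcp-host syntax: MAC,IP
>         }
>         PVE::Tools::file_set_contents("/etc/dnsmasq.d/$dhcpid/ethers", $content);
>     }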
> If the admin wants to change the IP of a VM, this can be done via
> the IPAM API/UI, which will have to be implemented separately.
>
> A limitation of this approach vs. dynamic IP reservation is that the
> IP range on the subnet needs to be large enough to hold the IPs of
> all VMs in that subnet, even stopped ones. This is in contrast to
> default DHCP functionality, where only the number of actively
> running VMs is limited. It should be enough to mention this in the
> docs.
>
> I will further review the code and try to implement the
> aforementioned approach.
>
> Best regards,
> Stefan Lendl
>
> Stefan Hanreich <s.hanre...@proxmox.com> writes:
>
> > This is a WIP patch series, since I will be gone for 3 weeks and
> > wanted to share my current progress with the DHCP support for SDN.
> >
> > This patch series adds support for automatically deploying dnsmasq
> > as a DHCP server to a simple SDN zone.
> >
> > While certainly not 100% polished on some ends (looking at
> > restarting systemd services in particular), the general idea
> > behind the mechanism shows. I wanted to gather some feedback on
> > how I approached designing the plugins and the config regeneration
> > process before committing to this design by creating an API and UI
> > around it.
> >
> > You need to install dnsmasq (and disable it afterwards):
> >
> >   apt install dnsmasq && systemctl disable --now dnsmasq
> >
> > You can use the following example configuration for deploying a
> > DHCP server in an SDN subnet:
> >
> > /etc/pve/sdn/dhcp.cfg:
> >
> >   dnsmasq: nat
> >
> > /etc/pve/sdn/zones.cfg:
> >
> >   simple: DHCPNAT
> >       ipam pve
> >
> > /etc/pve/sdn/vnets.cfg:
> >
> >   vnet: dhcpnat
> >       zone DHCPNAT
> >
> > /etc/pve/sdn/subnets.cfg:
> >
> >   subnet: DHCPNAT-10.1.0.0-16
> >       vnet dhcpnat
> >       dhcp-dns-server 10.1.0.1
> >       dhcp-range server=nat,start-address=10.1.0.100,end-address=10.1.0.200
> >       gateway 10.1.0.1
> >       snat 1
> >
> > Then apply the SDN configuration:
> >
> >   pvesh set /cluster/sdn
> >
> > You need to apply the SDN configuration once after adding the
> > dhcp-range lines to the configuration, since the running
> > configuration is used for managing DHCP. It will not work
> > otherwise!
> >
> > For testing, it can be helpful to monitor the following files
> > (e.g. with watch) to find out what is happening:
> > * /etc/dnsmasq.d/<dhcp_id>/ethers (on each node)
> > * /etc/pve/priv/ipam.db
> >
> > Changes from v1 -> v2:
> > * added hooks for handling DHCP when starting / stopping / .. VMs
> >   and CTs
> > * get an IP from the IPAM and register that IP in the DHCP server
> >   (pve only for now)
> > * removed lease-time, since it is now infinite and managed by the
> >   VM lifecycle
> > * added hooks for setting & deleting DHCP mappings to DHCP plugins
> > * modified the interface of the abstract class to reflect the new
> >   requirements
> > * added helpers in existing SDN classes
> > * simplified the DHCP configuration settings
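> > With the example configuration above, the settings handed to the
> > dnsmasq instance "nat" conceptually map to dnsmasq options like
> > this (illustrative only, not the literal generated file):
> >
> >   # dhcp-range start/end addresses; the lease is infinite, since
> >   # reservations are managed by the VM lifecycle (see changelog)
> >   dhcp-range=10.1.0.100,10.1.0.200,infinite
> >   # dhcp-dns-server and gateway from subnets.cfg
> >   dhcp-option=option:dns-server,10.1.0.1
> >   dhcp-option=option:router,10.1.0.1
> >   # static MAC/IP reservations generated from the IPAM
> >   dhcp-hostsfile=/etc/dnsmasq.d/nat/ethers
> >
> > Because the reservations live in a hostsfile, dnsmasq only needs a
> > SIGHUP to pick up changes, not a restart.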
> > pve-cluster:
> >
> > Stefan Hanreich (1):
> >   cluster files: add dhcp.cfg
> >
> >  src/PVE/Cluster.pm  | 1 +
> >  src/pmxcfs/status.c | 1 +
> >  2 files changed, 2 insertions(+)
> >
> > pve-network:
> >
> > Stefan Hanreich (6):
> >   subnets: vnets: preparations for DHCP plugins
> >   dhcp: add abstract class for DHCP plugins
> >   dhcp: subnet: add DHCP options to subnet configuration
> >   dhcp: add DHCP plugin for dnsmasq
> >   ipam: Add helper methods for DHCP to PVE IPAM
> >   dhcp: regenerate config for DHCP servers on reload
> >
> >  debian/control                         |   1 +
> >  src/PVE/Network/SDN.pm                 |  11 +-
> >  src/PVE/Network/SDN/Dhcp.pm            | 192 +++++++++++++++++++++++++
> >  src/PVE/Network/SDN/Dhcp/Dnsmasq.pm    | 186 ++++++++++++++++++++++++
> >  src/PVE/Network/SDN/Dhcp/Makefile      |   8 ++
> >  src/PVE/Network/SDN/Dhcp/Plugin.pm     |  83 +++++++++++
> >  src/PVE/Network/SDN/Ipams/PVEPlugin.pm |  64 +++++++++
> >  src/PVE/Network/SDN/Makefile           |   3 +-
> >  src/PVE/Network/SDN/SubnetPlugin.pm    |  32 +++++
> >  src/PVE/Network/SDN/Subnets.pm         |  43 ++++--
> >  src/PVE/Network/SDN/Vnets.pm           |  27 ++--
> >  11 files changed, 622 insertions(+), 28 deletions(-)
> >  create mode 100644 src/PVE/Network/SDN/Dhcp.pm
> >  create mode 100644 src/PVE/Network/SDN/Dhcp/Dnsmasq.pm
> >  create mode 100644 src/PVE/Network/SDN/Dhcp/Makefile
> >  create mode 100644 src/PVE/Network/SDN/Dhcp/Plugin.pm
> >
> > pve-manager:
> >
> > Stefan Hanreich (1):
> >   sdn: regenerate DHCP config on reload
> >
> >  PVE/API2/Network.pm | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > qemu-server:
> >
> > Stefan Hanreich (1):
> >   sdn: dhcp: add DHCP setup to vm-network-scripts
> >
> >  PVE/QemuServer.pm                 | 14 ++++++++++++++
> >  vm-network-scripts/pve-bridge     |  3 +++
> >  vm-network-scripts/pve-bridgedown | 19 +++++++++++++++++++
> >  3 files changed, 36 insertions(+)
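> > On the qemu-server side, the pve-bridge script resolves the NIC
> > from the tap device name and hands the MAC over to the SDN DHCP
> > module; a minimal sketch (the add_mapping entry point is
> > illustrative, not necessarily the actual name in the patch):
> >
> >   use PVE::QemuConfig;
> >   use PVE::QemuServer;
> >   use PVE::Network::SDN::Dhcp; # module added by this series
> >
> >   my $iface = shift;           # e.g. tap100i0
> >   my ($vmid, $index) = $iface =~ /^tap(\d+)i(\d+)$/
> >       or die "unexpected interface name '$iface'\n";
> >   my $conf = PVE::QemuConfig->load_config($vmid);
> >   my $net = PVE::QemuServer::parse_net($conf->{"net$index"});
> >
> >   # register the MAC of this vNIC for its vnet, so dnsmasq hands
> >   # out the IP reserved in the IPAM
> >   PVE::Network::SDN::Dhcp::add_mapping($net->{bridge}, $net->{macaddr});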
> > pve-container:
> >
> > Stefan Hanreich (1):
> >   sdn: dhcp: setup DHCP mappings in LXC hooks
> >
> >  src/PVE/LXC.pm            | 10 ++++++++++
> >  src/lxc-pve-poststop-hook |  1 +
> >  src/lxc-pve-prestart-hook |  9 +++++++++
> >  3 files changed, 20 insertions(+)
> >
> > Summary over all repositories:
> >   20 files changed, 681 insertions(+), 28 deletions(-)
> >
> > --
> > murpp v0.4.0

_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel