On 2/6/26 12:21 PM, Maurice Klein wrote:

[snip]

> I think I didn't explain that properly.
> Basically the whole idea is to have a gateway IP like 192.0.2.1/32 on
> the pve host on that bridge and then not have a /24 or similar route.

Those are just local to the node for routing, the /24 wouldn't get
announced - only the /32 routes for the additional IPs. But I guess with
that setup you could do without it as well. It shouldn't be an issue to
create a 'subnet' as a /32, assign its single IP to the PVE host as the
gateway IP and configure it that way. Layer-2 zones for instance (VLAN,
QinQ, VXLAN) don't even need a subnet configured at all - so I don't see
a blocker there.
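
Just to sketch what I mean for the node-local part (the bridge name
'vnet0' and the guest address 203.0.113.10 are only placeholders, not
what the plugin would generate):

    # on the PVE host: gateway IP on the vnet bridge, node-local only
    ip address add 192.0.2.1/32 dev vnet0
    # per-guest /32 host route pointing at the bridge
    ip route add 203.0.113.10/32 dev vnet0

    # inside the guest: static /32 plus an onlink default route, since
    # the gateway is not part of any on-link subnet
    ip address add 203.0.113.10/32 dev eth0
    ip route add default via 192.0.2.1 dev eth0 onlink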

> Guests then also have addresses, whatever they might look like.
> For example a guest could have 1.1.1.1/32 - usually always a /32,
> although I guess for some use cases it could be beneficial to be able to
> have a guest that gets more than a /32, but let's put that aside for now.

That would be quite interesting for IPv6, actually.

> Now there is no need/reason to define which subnet a guest is on, and no
> need for it to be in the same subnet as the host.
> 
> The guest would configure its IP statically inside, and it would
> usually be a /32.

Yeah, most implementations I've seen have a 'cluster IP' in RFC 1918
range for local cluster communication. With containers that is a lot
easier, since you can control their network configuration, whereas with
VMs you cannot and would need to update the configuration on every move
- or use the same subnet across every node instead of having a dedicated
subnet per node/rack.

[snip]

> Now the biggest thing this enables us to do in pve clusters: if we
> build, for example, an iBGP full mesh, the routes get shared.
> There could be any topology now, and routing would adapt.
> Just as an example - while this is a shitty topology, it can illustrate
> the point:
> 
>       GW-1        GW-2
>         | \        / |
>         |  \      /  |
>         |   \    /   |
>        pve1--pve3
>            \      /
>             \    /
>              pve2
> 
> Any pve can fail and everything would still be reachable.
> The shortest path is always chosen.
> Any link can fail.
> Any gateway can fail.
> Even multiple links failing is ok.
> No chance for loops, because every link is p2p.
> Much like the full-mesh Ceph setup with OSPF or OpenFabric.
> 
> That can be achieved with EVPN/VXLAN, anycast gateways and multiple
> exit nodes.
> The problem is the complexity, and because the gateways only get bigger
> aggregate routes (a /24 or larger) instead of /32s, they will not always
> use the optimal path, which increases latency and puts unnecessary
> routing load on hosts where the VM isn't currently living.
> And all that just to have one L2 domain, which often brings more
> disadvantages than advantages.
> 
> I hope I explained it well now; if not, feel free to ask anything. I
> could also provide some more detailed documentation with screenshots of
> everything.

Yes, that makes sense. The way I described it in my previous mail should
work like that, since it decouples the IP configuration + route creation
(which would then be handled by the zone / vnet) from the announcement
of that route (which would be handled by fabrics). As a start we could
just utilize the default routing table. I'm planning on adding VRF +
route redistribution + route map support mid-term, so the new zone could
then profit from those without having to implement anything of the sort
for now. The timing is a bit awkward, since I'm still working on
implementing several features that this plugin would benefit quite
heavily from. I don't want to do any duplicate work or code ourselves
into a corner by implementing all that functionality specific to this
plugin only, and then having to migrate everything over while
maintaining backwards compatibility.
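
Just to make the split concrete: once the zone/vnet has installed the
/32 host routes, the fabrics side only has to pick them up and announce
them over the iBGP mesh. A rough FRR sketch of that announcement part
(ASN, router-id and neighbor addresses are made up; a route map would
later restrict what actually gets redistributed):

    router bgp 65000
     bgp router-id 10.0.0.1
     neighbor 10.0.0.2 remote-as 65000
     neighbor 10.0.0.3 remote-as 65000
     !
     address-family ipv4 unicast
      ! pick up the /32 host routes installed by pve-network
      redistribute kernel
     exit-address-family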

[snip]

> I also feel like it would make sense in the network device, since it is
> part of the specific configuration for that VM, but I get why you are
> reluctant about that.
> This honestly makes me reconsider the sdn approach a little bit.
> I have an idea here that could be workable.
> What if we add a field, not calling it guest IP, but instead calling it
> 'routes'?
> Essentially that is what it is, and it might have extra use cases apart
> from what I'm trying to achieve.
> That way, for this use case, you can use those fields to add the
> needed /32 host routes.
> It wouldn't be specific to the SDN feature we build.
> The SDN feature could then be more about configuring the bridge with the
> right addresses and features, and enable us to later distribute the
> routes via BGP and other ways.
> I looked into the hotplug scenarios as well, and that way those would be
> solved.

Yeah, I think the VM configuration is the best bet. It should be tied to
the network device imo, so I guess adding a property there that allows
configuring a CIDR should be fine for starting out. The route itself
would then be added by the respective tap_plug / veth_create functions
in pve-network and the new zone plugin.
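
Purely as an illustration (the property name 'routes' and its syntax are
not final, just placeholders, same for the bridge name and addresses),
the network device entry and the resulting host route could end up
looking roughly like this:

    # hypothetical network device entry in the VM config of VM 100
    net0: virtio=BC:24:11:AA:BB:CC,bridge=vnet0,routes=203.0.113.10/32

    # what tap_plug would then roughly install on the host
    ip route add 203.0.113.10/32 dev vnet0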

[snip]


