Bug#1042842: network interface names wrong in domU (>10 interfaces)
On 08 Aug 2023 16:59, Hans van Kranenburg wrote:

I didn't read the other mailthread on the xen list fully yet.

You gave me the idea to post the IRC digest, so the report here is more complete, and people not tracking the xen-devel ML can read it nicely. For those who do, the mail is dated "02 Aug 2023 18:19", and titled "Network interfaces naming changes in domUs with >10 vifs (Debian bug 1042842)". The ML post got no answer yet.

[--- IRC ---]
- AFAIK, there is no sorting in Xenstored. And you should not expect that even if libxl sorted properly it will be seen in the same order on the other end.
- is the ethN number in domU related to vif number in xenstore, or to device detection order?
- there's no order to eth names at all. they're allocated first-come-first-serve, so it entirely depends on how parallel the probing of nic drivers are. even if netfront is serialised around xenstore accesses, it probably allocates in the order that XS_DIRECTORY comes back with
- from simple tests, it looks like VIFs are created in Xenstore in the order of the config file, but if you "xenstore-ls /[...]/vif", you can see vifs are ordered like vif1,vif10,vif11,vif2,etc
- the order is different between Xen 4.14 and 4.17 (ie. the "expected" order works on 4.14, not 4.17)
- But really, Debian should have never relied on how the nodes are ordered. This is not something we guarantee in the Xenstored API
- the last big batch of XSA content for the xenstoreds did some major rearranging of oxenstored. We dropped a NIH second garbage collector, and a NIH weakref system IIRC. I could entirely believe that the apparent sort order changed as a result
- generally, I think Linux world established quite some time ago that ethN names are not stable
- It's definitely a complicated issue. Perhaps best to post to xen-devel so we can have a discussion. I expect the answer is not-a-Xen bug, but I don't think we have a clear understanding of the problem yet
[--- /IRC ---]

I'll report back when having tested the 111 vifs domU ... if my system agrees o_O
As it requires a script to populate the cfg, one could also enhance it to try how dynamically adding/removing vifs is handled.

(BTW, before this report I thought Xen had a hard limit of 8 vifs per domU. Or was that only on FreeBSD domUs ? Can't remember).
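The order reported in the IRC digest (vif1, vif10, vif11, vif2, ...) is exactly what a plain string sort of such names produces. The following is a minimal Python sketch of that effect only; the function names are hypothetical, the "vifN" naming is taken from the digest, and nothing here claims to reproduce how xenstored or netfront actually enumerate devices.

```python
import re


def lexicographic_order(names):
    """Sort names as plain strings, character by character."""
    return sorted(names)


def numeric_order(names):
    """Sort names by the integer found at the end of each name."""
    def trailing_number(name):
        match = re.search(r"(\d+)$", name)
        if match is None:
            raise ValueError(f"no trailing number in {name!r}")
        return int(match.group(1))

    return sorted(names, key=trailing_number)


# Twelve hypothetical device names: vif1 .. vif12
names = [f"vif{i}" for i in range(1, 13)]
print("string sort: ", lexicographic_order(names))
print("numeric sort:", numeric_order(names))
```

With ten or fewer devices numbered from a single digit, both sorts agree, which would be consistent with the problem only showing up on domUs with more than 10 interfaces.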
Bug#1042842: [Pkg-xen-devel] Bug#1042842: network interface names wrong in domU (>10 interfaces)
On 08 Aug 2023 12:08, Valentin Kleibel wrote:

I posted on xen-devel, you can follow from : https://lists.xenproject.org/archives/html/xen-devel/2023-08/msg00244.html (Unfortunately, the formatting is weird via html, split the IRC part on "- ").

Thank you for posting upstream.

No prob, although if that very answer does not answer your question, I guess you'd be better off replying on xen-devel ML (reply to my post or at least reference it).

All documentation I found on the Xen wiki suggests that interfaces are connected vifX.Y <-> ethY. [0] [1] The only other way I know of for identifying the interfaces is MAC addresses, which can be randomly assigned if you don't configure them.

On [0], you can read "In both cases the device naming is subject to the usual guest or backend domain facilities for renaming network devices". It says "naming/renaming", but you can assume "detecting".

I also checked which net_ids udev knows about and the only things that pop up are:
ID_NET_NAMING_SCHEME=v247
ID_NET_NAME_MAC=enx00163efd832b
ID_OUI_FROM_DATABASE=Xensource, Inc.

Is it from dom0 or domU ? Are you using "net.ifnames=0" on the domU kernel command line ? "v247" looks like systemd "predictive naming scheme" (eth -> enX). From bookworm on, domUs vifs get named enXN (enX0, enX1, ...). Read on : https://www.debian.org/releases/stable/i386/release-notes/ch-information.en.html#xen-network

Either I am missing the way you're supposed to do this, or there is a bug somewhere in the toolchain. Unfortunately I'm not able to pinpoint the source of the issue, any help would be appreciated.

I made some tests with a domU using many interfaces, like :
[...]
vif = [
    'bridge=xbr-tst,mac=00:16:3e:de:bd:00,type=vif,vifname=domu-a',
    'bridge=xbr-tst,mac=00:16:3e:de:bd:01,type=vif,vifname=domu-b',
    'bridge=xbr-tst,mac=00:16:3e:de:bd:02,type=vif,vifname=domu-c',
    'bridge=xbr-tst,mac=00:16:3e:de:bd:03,type=vif,vifname=domu-d',
    'bridge=xbr-tst,mac=00:16:3e:de:bd:04,type=vif,vifname=domu-e',
    'bridge=xbr-tst,mac=00:16:3e:de:bd:05,type=vif,vifname=domu-f',
    'bridge=xbr-tst,mac=00:16:3e:de:bd:06,type=vif,vifname=domu-g',
    'bridge=xbr-tst,mac=00:16:3e:de:bd:07,type=vif,vifname=domu-h',
    'bridge=xbr-tst,mac=00:16:3e:de:bd:08,type=vif,vifname=domu-i',
    'bridge=xbr-tst,mac=00:16:3e:de:bd:09,type=vif,vifname=domu-j',
    'bridge=xbr-tst,mac=00:16:3e:de:bd:10,type=vif,vifname=domu-k',
]
[...]

- This is dom0's corresponding dmesg:

[...]
xbr-tst: port 3(domu-b) entered blocking state
xbr-tst: port 3(domu-b) entered disabled state
device domu-b entered promiscuous mode
xbr-tst: port 4(domu-i) entered blocking state
xbr-tst: port 4(domu-i) entered disabled state
device domu-i entered promiscuous mode
[...]

Here you can see :
port 3 <-> domu-b
port 4 <-> domu-i
We learn here that dom0 did not detect vifs serially.

- In the domU, "ip link" shows :

[...]
eth0 link/ether 00:16:3e:de:bd:00 altname enX0
eth1 link/ether 00:16:3e:de:bd:01 altname enX1
eth2 link/ether 00:16:3e:de:bd:10 altname enX10
eth3 link/ether 00:16:3e:de:bd:02 altname enX2
[...]

See how ethN interfaces get messed up, like in your setup, but predictable names would work, as you can see in "altname enXN" :
eth1 (:01) -> enX1
eth2 (:10) -> enX10
eth3 (:02) -> enX2

So, my answer does not tell you if something changed in Xen itself, only in Debian. But I guess it relates to what Xen devs told us : vifs detection order cannot be relied upon, that's why "predictable names" were invented. The vif detection part is related to the domains kernels, not Xen itself (at least that's what I understood).
Using eth0 nowadays is a bit like using /dev/sda for hard drives, it's considered legacy as it may create problems in some setups, like yours (ie. for disks, it's recommended to use UUIDs or /dev/disk/by-*). I hope this answers your question.
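Since the test configuration quoted above follows a regular pattern, a small script can generate it for any number of vifs. This is only a sketch under stated assumptions: the bridge name and MAC prefix are copied from the example, the function names are hypothetical, the last MAC octet is written in hexadecimal (the example used 00..10), and vifnames are numbered ("domu-0", "domu-1", ...) instead of lettered so that more than 26 interfaces are possible.

```python
def make_vif_specs(count, bridge="xbr-tst", mac_prefix="00:16:3e:de:bd"):
    """Return one xl-style vif spec string per interface."""
    if not 0 < count <= 256:
        raise ValueError("count must be between 1 and 256 (one MAC octet)")
    specs = []
    for index in range(count):
        mac = f"{mac_prefix}:{index:02x}"  # last octet in hex: 00, 01, ..., ff
        specs.append(
            f"bridge={bridge},mac={mac},type=vif,vifname=domu-{index}"
        )
    return specs


def render_vif_config(specs):
    """Render the specs as a 'vif = [ ... ]' block for a domU cfg file."""
    lines = ["vif = ["]
    lines += [f"    '{spec}'," for spec in specs]
    lines.append("]")
    return "\n".join(lines)


# Print an 11-interface block, the size used in the test above
print(render_vif_config(make_vif_specs(11)))
```

Because every MAC is derived from the index, the resulting "ip link" output in the domU can be checked against the expected index, whatever ethN name the kernel ends up assigning.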
Bug#1042842: [Pkg-xen-devel] Bug#1042842: network interface names wrong in domU (>10 interfaces)
On 02 Aug 2023 18:09, Valentin Kleibel wrote:

#xen-devel is the IRC Xen channel. I just pinged them, I'll wait. Depending on their answers, I'll post on the xen-devel mailing list.

thanks for the clarification, looking forward to an answer.

I posted on xen-devel, you can follow from : https://lists.xenproject.org/archives/html/xen-devel/2023-08/msg00244.html (Unfortunately, the formatting is weird via html, split the IRC part on "- "). Note that, at first sight, I was told this seems "not-a-Xen" bug (read the IRC excerpts).

Our current workaround is to edit the interface names in the domUs config to match the wrong sorting. And be extra careful that the domUs MACs match the ones we expect on that network.

Via udev (MAC matching) or /etc/network/interfaces ? I ask because it may help others, while this gets resolved.

We just edited /etc/network/interfaces, as it only affects a few of our domUs. i think udev rules matching the MAC would be a better solution. I just didn't take the time to write them and went for the quick and dirty solution.

Till it works, "whatever the bottle, till we have the poison" ;) This link may be useful: https://wiki.debian.org/NetworkInterfaceNames
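For the MAC-matching udev rules mentioned above, the rule text can be generated from a MAC-to-name table. This is a hedged sketch, not a tested setup: the interface names ("lan0", "dmz0") and the helper names are hypothetical, the MAC addresses are the ones from the earlier example in this report, and the script only prints the rules; where to install them in the domU (typically a file under /etc/udev/rules.d/) is left to the admin.

```python
import re

MAC_PATTERN = re.compile(r"^[0-9a-f]{2}(:[0-9a-f]{2}){5}$")


def udev_rule(mac, name):
    """Build one udev rule that names the NIC with the given MAC address."""
    mac = mac.lower()  # sysfs reports MAC addresses in lowercase
    if not MAC_PATTERN.match(mac):
        raise ValueError(f"not a MAC address: {mac!r}")
    return (
        'SUBSYSTEM=="net", ACTION=="add", '
        f'ATTR{{address}}=="{mac}", NAME="{name}"'
    )


# Hypothetical mapping: MAC from the domU cfg -> wanted interface name
wanted_names = {
    "00:16:3e:fd:83:2f": "lan0",
    "00:16:3e:fd:83:30": "dmz0",
}

for mac, name in wanted_names.items():
    print(udev_rule(mac, name))
```

Names chosen outside the kernel's own ethN namespace avoid clashes with interfaces that have not been renamed yet.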
Bug#1042842: [Pkg-xen-devel] Bug#1042842: Acknowledgement (network interface names wrong in domU (>10 interfaces))
On 02 Aug 2023 10:22, Valentin Kleibel wrote:

Hi, the bug has been mentioned on #xen-devel, will keep you posted.

Thanks. I wasn't able to find such a report, could you link the archive or post the thread's subject so I can find it?

#xen-devel is the IRC Xen channel. I just pinged them, I'll wait. Depending on their answers, I'll post on the xen-devel mailing list.

Meanwhile, you may try to force the domU vif names with a letter

The sorting with letters doesn't work out as renaming the interface is a secondary step.
...
[53408.899507] vif vif-5-0 sort-a: renamed from vif5.0

Yeah I just tried with vifnames, no more luck, sorry. Running "xenstore-ls /local/domain/DOMU_ID/device/vif" shows that vif10 and above are sorted before vif2 to vif9 (ie. vif1, vif10, vif11, vif2, ...).

Our current workaround is to edit the interface names in the domUs config to match the wrong sorting. And be extra careful that the domUs MACs match the ones we expect on that network.

Via udev (MAC matching) or /etc/network/interfaces ? I ask because it may help others, while this gets resolved.

-- zithro / Cyril
Bug#1042842: network interface names wrong in domU (>10 interfaces)
Hello,

the bug has been mentioned on #xen-devel, will keep you posted.

Meanwhile, you may try to force the domU vif names with a letter, like :

vif = [
    'mac=00:16:3e:fd:83:2f,bridge=lanbr,vifname=domu-a',
    'mac=00:16:3e:fd:83:30,bridge=lanbr,vifname=domu-b',
    'mac=00:16:3e:fd:83:31,bridge=lanbr,vifname=domu-c',
    ...
]

Note it's just a workaround, and I've not tested it. I only guess letters would be sorted correctly. If you test this, can you report back please ?

-- zithro / Cyril
Bug#452721: [Pkg-xen-devel] Bug#452721: irt: Bug#452721 notes from explorations
On 31 Jul 2023 03:39, Elliott Mitchell wrote:

Presently I hope to convince the Xen core to allow full Python in domain configuration files, but no news on that front so far. This would mean /etc/default/xendomains would need to change to match Python syntax.

There was an answer today on xen-devel: the ability to use scripts in domU cfg files has been explicitly removed for various reasons. This does not prevent you from "source"-ing the cfg files in your script(s) if they are proper Python syntax. Or you could simply parse/regex the values you want. And as Marek suggested in his answer, you can also put any arbitrary settings in the comments. Although ...

My thinking for adding to domain configuration files would be something along these lines:

init = {
    'tool': 'xendomains-ng',
    'version': 0,
    'order': 9,
    'startwait': 60,
    'stopaction': 'save',
}

The problem with adding this to a domU config file is that it could cause problems for (live) migrations. The start/stop order is "per dom0", and may be different on another one. Imagine two dom0s, one storing the domain files "locally", while the other uses NFS. Only in the second case should the domU wait for the NFS server/domain to be available. To me, the start/stop logic should be in a dom0 config file.

'startwait' would tell the script to wait that long before starting subsequent domains.

A time-based wait may be useful for when everything goes well, but what about when there are problems ? If you want to be sure a domain is up (ie. ready to serve), you would need to peek at the related "service". For example, to be sure a DNS domU is up, you would have to try a DNS request, as a ping or "xl list" would not be enough. Also, domains in xen/auto are started with a mix of serialization AND parallelization, as "xl create" returns once the domain has started (ie. from Xen's point of view, not the user's).

'stopaction' would allow different actions if the machine was to stop.
The 3 options which come to mind are 'stop' (shutdown), 'save' (save to specified storage location), and 'migrate'.

Then, each time you do NOT want to follow the usual action, you'd have to edit -each- domU cfg file ?

If full Python doesn't become available, this might take the format:

init = 'tool=xendomains-ng,version=0,order=9,startwait=60,stopaction=save'

Not needing to parse the string though does make one's life simpler.

Well, it makes -your- life easier, not the maintainers' one ;)

I'm basically certain writing a new xendomains script in Python is the way to go. Now to get an answer as to whether full Python in domain configuration files could be reenabled.

I'm not sure a Python script would solve anything, as (ba)sh variables are imported from other files. (see for example https://salsa.debian.org/xen-team/debian-xen/-/blob/master/tools/hotplug/Linux/xendomains.in)

Everything considered, I'm not sure why Xen should provide such functionality. I think custom scripts can handle all the various use cases, don't you think ?

PS: as mentioned by diederik, the "dependency" logic has already been handled by Qubes for years, and it never made it to Xen (I don't know the reasons though).

But I agree the shutdown sequence could be adapted to :
1. first shutdown the domains NOT in xen/auto
2. then shutdown the domains in xen/auto, in reverse order

For fine grained start/stop order, maybe having a dom0 config file handling this could be added, like:

# START/STOP ORDER
# domains not in these lists will be started after and stopped
# before the ones here
start-order=(list of domU names)
stop-order=(list of domU names)

But then again, this only ensures "domains" start order, not "services availability" in said domains.

-- zithro / Cyril
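The dom0-side ordering idea above (listed domains start first; unlisted domains start after and stop before them) can be sketched in a few lines of Python. This is purely illustrative: the function names and domU names are hypothetical, no such option exists in xendomains today, and falling back to the reversed start order when no stop order is given is an assumption taken from the "reverse order" suggestion.

```python
def start_sequence(domains, start_order):
    """Listed domains first, in list order, then all remaining domains."""
    known = set(domains)
    listed = [d for d in start_order if d in known]
    others = [d for d in domains if d not in set(start_order)]
    return listed + others


def stop_sequence(domains, stop_order=None, start_order=()):
    """Unlisted domains first, then the listed ones.

    Without an explicit stop_order, the reversed start_order is used.
    """
    if stop_order is None:
        stop_order = list(reversed(start_order))
    known = set(domains)
    listed = [d for d in stop_order if d in known]
    others = [d for d in domains if d not in set(stop_order)]
    return others + listed


if __name__ == "__main__":
    domains = ["web", "dns", "test", "nfs"]  # hypothetical domU names
    order = ["nfs", "dns"]                   # hypothetical start-order list
    print("start:", start_sequence(domains, order))
    print("stop: ", stop_sequence(domains, start_order=order))
```

As noted in the message, this only orders "xl create" and shutdown calls; it says nothing about whether the services inside each domain are actually ready.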
Bug#1041533: xen-system-amd64: Xen fails to start hvm type VMs when a vncpasswd is set
Hello,

I -think- VNC auth has been removed from the latest QEMU versions. Also maybe related, QEMU in Debian is not configured with VNC_SASL (there was a discussion about it in #debian-xen).

Wait for confirmations, meanwhile there is another option: SSH (maybe even more secure ?). The workaround is to make the VNC servers only accessible from dom0, then to create SSH tunnels to connect to them :

1. in the domU config file, select "127.0.0.1" as the IP address to listen to, and remove everything about authentication
2. from your management host, create a tunnel, something like "ssh -nN -L localhost:12345:localhost:59xx user@dom0"
3. from your management host, use VNC_APP:12345 to connect to the display

The "xx" for the tunnel represents the "VNC display id" you've chosen in your domU config file, so if you have "vnclisten = 127.0.0.1:12", the actual listening address is "127.0.0.1:5912" (in your case, you'd pick 5901).

Hope it helps.

PS: as for documentation it will be in the new Debian Xen wiki page (which I'm rewriting, for now it's still an offline draft).

-- Cyril Rébert / zithro
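The display-to-port arithmetic behind the "59xx" above (TCP port = 5900 + VNC display id) can be written down as a small Python sketch. The helper names are hypothetical, and the local port 12345 and the "user@dom0" target are the placeholders from the steps above, not real values.

```python
VNC_BASE_PORT = 5900  # VNC display N listens on TCP port 5900 + N


def vnc_port(display):
    """Return the TCP port for a VNC display id."""
    if display < 0:
        raise ValueError("display id must not be negative")
    return VNC_BASE_PORT + display


def parse_vnclisten(value):
    """Split a 'host:display' string into (host, tcp_port)."""
    host, separator, display = value.rpartition(":")
    if not separator:
        raise ValueError(f"expected 'host:display', got {value!r}")
    return host, vnc_port(int(display))


def tunnel_command(display, local_port=12345, ssh_target="user@dom0"):
    """Build the ssh command that forwards local_port to the VNC server."""
    return (
        f"ssh -nN -L localhost:{local_port}:localhost:{vnc_port(display)} "
        f"{ssh_target}"
    )


print(parse_vnclisten("127.0.0.1:12"))
print(tunnel_command(1))
```

The VNC client on the management host then connects to localhost on the chosen local port, and only SSH is exposed on dom0.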
Bug#1038901: xen dom0 erroneous detected as 'xen' virtualization by systemd-detect-virt
Hello,

I reported the bug upstream, just added there some comments to reflect that the output is different on AMD and Intel platforms. I also added the commit link, thanks for that.

So, "systemd-detect-virt" on non-nested dom0s reports :
- "xen" on Intel (like this bug report)
- "vm-other" on AMD (like my bug report upstream)
Bug#983357: How are other distros handling this ?
Hi all,

in an effort to solve this bug the quickest way possible, I want to share some pointers (and some time !).

First, it can affect not only "upstream" Debian but also Debian-based distros. I personally had the problem trying to use Kali as a HVM domU. Although Chuck's fix worked on Kali, I dunno which distros may fail without it. But from memory, looking for solutions to the bug, I only found posts about Debian.

From what I read, understood, and guessed :
- it appears to be a kernel+udev+Xen bug,
- it has nothing to do with Debian itself,
- all distros using Xen are also using upstream Xen virtual KB

So I'm wondering :
- how can this bug only affect Debian and Debian-based distros but not others like SUSE, Fedora, etc ?
- If I'm right, how are other distros handling this bug ?
- How is the "Debian cloud installer" handling this bug ?
- If the problem can't be solved "at its roots", can't we just apply the (harmless) workaround in the installer ?

We should really speed up the fix before the Full Freeze. I can help for testing and reporting stuff, but I'm a noob in packaging.

Thanks all, have a nice day,
zithro / Cyril REBERT

PS: off-topic remarks.
I've been a regular Debian user for a dozen years, and a Debian+Xen user for 5 years, on dom0s and domUs. I recently joined the Debian Xen team. I want to say this is one of the most critical and obscure bugs -I- encountered on a Debian stable installer. It's obvious but I'll state it nonetheless : it's really not good "publicity" for Debian. Note I'm pointing no finger at anyone as I really don't care for it, I only care for a resolution. We don't really know how many users turned their backs on Debian because of this bug : "Deb does not work, but distro X works, let's just use the one that works OOTB". But maybe this bug is only affecting a handful of users ?