Bug#1042842: network interface names wrong in domU (>10 interfaces)

2023-08-08 Thread zithro

On 08 Aug 2023 16:59, Hans van Kranenburg wrote:

I haven't fully read the other mail thread on the xen list yet.


You gave me the idea to post the IRC digest, so the report here is more 
complete, and people not tracking the xen-devel ML can read it nicely.
For those who do, the mail is dated "02 Aug 2023 18:19", and titled 
"Network interfaces naming changes in domUs with >10 vifs (Debian bug 
1042842)".

The ML post has received no answer yet.

[--- IRC ---]
- AFAIK, there is no sorting in Xenstored. And you should not expect 
that, even if libxl sorted properly, it would be seen in the same order 
on the other end.
- is the ethN number in domU related to vif number in xenstore, or to 
device detection order?
- there's no order to eth names at all. they're allocated 
first-come, first-served, so it entirely depends on how parallel the 
probing of NIC drivers is. even if netfront is serialised around 
xenstore accesses, it probably allocates in the order that XS_DIRECTORY 
comes back with
- from simple tests, it looks like VIFs are created in Xenstore in the 
order of the config file, but if you "xenstore-ls /[...]/vif", you can 
see vifs are ordered like vif1, vif10, vif11, vif2, etc.
- the order is different between Xen 4.14 and 4.17 (i.e. the "expected" 
order works on 4.14, not on 4.17)
- But really, Debian should have never relied on how the nodes are 
ordered. This is not something we guarantee in the Xenstored API
- the last big batch of XSA content for the xenstoreds did some major 
rearranging of oxenstored. We dropped a NIH second garbage collector, 
and a NIH weakref system IIRC. I could entirely believe that the 
apparent sort order changed as a result
- generally, I think the Linux world established quite some time ago 
that ethN names are not stable
- It's definitely a complicated issue.  Perhaps best to post to 
xen-devel so we can have a discussion. I expect the answer is not-a-Xen 
bug, but I don't think we have a clear understanding of the problem yet

[--- /IRC ---]

I'll report back once I have tested the 111-vif domU ... if my system 
agrees o_O
As populating the cfg requires a script, one could also extend it to 
test how dynamically adding/removing vifs is handled.


(BTW, before this report I thought Xen had a hard limit of 8 vifs per 
domU. Or was that only for FreeBSD domUs? Can't remember.)




Bug#1042842: [Pkg-xen-devel] Bug#1042842: network interface names wrong in domU (>10 interfaces)

2023-08-08 Thread zithro

On 08 Aug 2023 12:08, Valentin Kleibel wrote:

I posted on xen-devel; you can follow from:
https://lists.xenproject.org/archives/html/xen-devel/2023-08/msg00244.html
(Unfortunately, the formatting is weird via HTML; split the IRC part 
on "- ".)


Thank you for posting upstream.


No problem, although if that answer does not resolve your question, I 
guess you'd be better off replying on the xen-devel ML (reply to my post 
or at least reference it).


All documentation I found on the Xen wiki suggests that interfaces 
are connected vifX.Y <-> ethY. [0] [1]
The only other way I know of for identifying the interfaces is MAC 
addresses, which can be randomly assigned if you don't configure them.


On [0], you can read: "In both cases the device naming is subject to the 
usual guest or backend domain facilities for renaming network devices".

It says "naming/renaming", but you can assume it covers "detecting" as well.

I also checked which net_ids udev knows about and the only things that 
pop up are:

ID_NET_NAMING_SCHEME=v247
ID_NET_NAME_MAC=enx00163efd832b
ID_OUI_FROM_DATABASE=Xensource, Inc.


Is it from dom0 or the domU?
Are you using "net.ifnames=0" on the domU kernel command line?
"v247" looks like the systemd "predictable naming scheme" (eth -> enX).
From bookworm on, domU vifs get named enXN (enX0, enX1, ...).
See:
https://www.debian.org/releases/stable/i386/release-notes/ch-information.en.html#xen-network
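
If it helps, you can also dump the properties udev attaches to a given 
interface and see which net_id values it derived. A quick check, run 
inside the domU ("eth0" here is just an example, use whichever interface 
you are looking at):

udevadm info -q property /sys/class/net/eth0 | grep '^ID_NET'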

Either I am missing the way you're supposed to do this, or there is a 
bug somewhere in the toolchain.
Unfortunately I'm not able to pinpoint the source of the issue; any help 
would be appreciated.


I ran some tests with a domU using many interfaces, like:

[...]
vif = [ 'bridge=xbr-tst,mac=00:16:3e:de:bd:00,type=vif,vifname=domu-a' ,
'bridge=xbr-tst,mac=00:16:3e:de:bd:01,type=vif,vifname=domu-b' ,
'bridge=xbr-tst,mac=00:16:3e:de:bd:02,type=vif,vifname=domu-c' ,
'bridge=xbr-tst,mac=00:16:3e:de:bd:03,type=vif,vifname=domu-d' ,
'bridge=xbr-tst,mac=00:16:3e:de:bd:04,type=vif,vifname=domu-e' ,
'bridge=xbr-tst,mac=00:16:3e:de:bd:05,type=vif,vifname=domu-f' ,
'bridge=xbr-tst,mac=00:16:3e:de:bd:06,type=vif,vifname=domu-g' ,
'bridge=xbr-tst,mac=00:16:3e:de:bd:07,type=vif,vifname=domu-h' ,
'bridge=xbr-tst,mac=00:16:3e:de:bd:08,type=vif,vifname=domu-i' ,
'bridge=xbr-tst,mac=00:16:3e:de:bd:09,type=vif,vifname=domu-j' ,
'bridge=xbr-tst,mac=00:16:3e:de:bd:10,type=vif,vifname=domu-k' ,
]
[...]

-
This is dom0's corresponding dmesg:

[...]
xbr-tst: port 3(domu-b) entered blocking state
xbr-tst: port 3(domu-b) entered disabled state
device domu-b entered promiscuous mode
xbr-tst: port 4(domu-i) entered blocking state
xbr-tst: port 4(domu-i) entered disabled state
device domu-i entered promiscuous mode
[...]

Here you can see:
port 3 <-> domu-b
port 4 <-> domu-i

This shows that dom0 did not detect the vifs serially (in config-file order).

-
In the domU, "ip link" shows:

[...]
eth0
link/ether 00:16:3e:de:bd:00
altname enX0
eth1
link/ether 00:16:3e:de:bd:01
altname enX1
eth2
link/ether 00:16:3e:de:bd:10
altname enX10
eth3
link/ether 00:16:3e:de:bd:02
altname enX2
[...]

See how the ethN names get mixed up, like in your setup, but the 
predictable names would work, as you can see from the "altname enXN" entries:

eth1 (:01) -> enX1
eth2 (:10) -> enX10
eth3 (:02) -> enX2

So, my answer does not tell you whether something changed in Xen itself, 
only in Debian.
But I guess it relates to what the Xen devs told us: vif detection order 
cannot be relied upon, which is why "predictable names" were invented.
The vif detection part is handled by the domains' kernels, not by Xen 
itself (at least that's what I understood).


Using eth0 nowadays is a bit like using /dev/sda for hard drives: it's 
considered legacy, as it may create problems in some setups like yours 
(i.e. for disks, it's recommended to use UUIDs or /dev/disk/by-*).
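
For example, in a bookworm domU managed with ifupdown, the stanza could 
reference the stable name instead of eth0 (a minimal sketch; adapt the 
interface name and addressing to your setup):

# /etc/network/interfaces (excerpt)
auto enX0
iface enX0 inet dhcp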


I hope this answers your question.



Bug#1042842: [Pkg-xen-devel] Bug#1042842: network interface names wrong in domU (>10 interfaces)

2023-08-02 Thread zithro

On 02 Aug 2023 18:09, Valentin Kleibel wrote:

#xen-devel is the IRC Xen channel. I just pinged them; I'll wait.
Depending on their answers, I'll post on the xen-devel mailing list.


thanks for the clarification, looking forward to an answer.


I posted on xen-devel; you can follow from:
https://lists.xenproject.org/archives/html/xen-devel/2023-08/msg00244.html
(Unfortunately, the formatting is weird via HTML; split the IRC part on 
"- ".)


Note that, at first sight, I was told this seems to be a "not-a-Xen" bug 
(read the IRC excerpts).


Our current workaround is to edit the interface names in the domUs' 
config to match the wrong sorting, and to be extra careful that the 
domUs' MACs match the ones we expect on that network.


Via udev (MAC matching) or /etc/network/interfaces?
I ask because it may help others while this gets resolved.


We just edited /etc/network/interfaces, as it only affects a few of our 
domUs.
I think udev rules matching the MAC would be a better solution; I just 
didn't take the time to write them and went for the quick and dirty 
solution.


Till it works, "whatever the bottle, till we have the poison" ;)

This link may be useful: https://wiki.debian.org/NetworkInterfaceNames
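
For reference, a systemd .link file pinning a name to a MAC could look 
like this (an untested sketch; the MAC is taken from my test config above 
and the name is arbitrary):

# /etc/systemd/network/10-domu-lan2.link
[Match]
MACAddress=00:16:3e:de:bd:02

[Link]
Name=lan2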



Bug#1042842: [Pkg-xen-devel] Bug#1042842: Acknowledgement (network interface names wrong in domU (>10 interfaces))

2023-08-02 Thread zithro

On 02 Aug 2023 10:22, Valentin Kleibel wrote:

Hi,


the bug has been mentioned on #xen-devel, will keep you posted.


Thanks. I wasn't able to find such a report; could you link the archive 
or post the thread's subject so I can find it?


#xen-devel is the IRC Xen channel. I just pinged them; I'll wait.
Depending on their answers, I'll post on the xen-devel mailing list.


Meanwhile, you may try to force the domU vif names with a letter


The sorting with letters doesn't work out, as renaming the interface is a 
secondary step.

...
[53408.899507] vif vif-5-0 sort-a: renamed from vif5.0


Yeah, I just tried with vifnames, no more luck, sorry.
Running "xenstore-ls /local/domain/DOMU_ID/device/vif" shows that the vif 
nodes are sorted lexicographically rather than numerically (i.e. vif1, 
vif10, vif11, vif2, ...).
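
That ordering is easy to reproduce outside of Xen; it's just plain 
lexicographic sorting of the node names:

printf 'vif%s\n' 1 2 10 11 | sort
# output: vif1, vif10, vif11, vif2 (one per line)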


Our current workaround is to edit the interface names in the domUs' 
config to match the wrong sorting, and to be extra careful that the 
domUs' MACs match the ones we expect on that network.


Via udev (MAC matching) or /etc/network/interfaces?
I ask because it may help others while this gets resolved.


--
zithro / Cyril



Bug#1042842: network interface names wrong in domU (>10 interfaces)

2023-08-01 Thread zithro

Hello,

the bug has been mentioned on #xen-devel, will keep you posted.

Meanwhile, you may try to force the domU vif names with a letter, like:

vif = [
'mac=00:16:3e:fd:83:2f,bridge=lanbr,vifname=domu-a',
'mac=00:16:3e:fd:83:30,bridge=lanbr,vifname=domu-b',
'mac=00:16:3e:fd:83:31,bridge=lanbr,vifname=domu-c',
...
  ]

Note it's just a workaround, and I've not tested it.
I'm only guessing that letters would be sorted correctly.
If you test this, can you report back please?


--
zithro / Cyril



Bug#452721: [Pkg-xen-devel] Bug#452721: irt: Bug#452721 notes from explorations

2023-07-31 Thread zithro

On 31 Jul 2023 03:39, Elliott Mitchell wrote:


Presently I hope to convince the Xen core to allow full Python in domain
configuration files, but no news on that front so far.  This would mean
/etc/default/xendomains would need to change to match Python syntax.


There was an answer today on xen-devel: the ability to use scripts in 
domU cfg files has been explicitly removed, for various reasons.
This does not prevent you from "source"-ing the cfg files in your 
script(s) if they are proper Python syntax. Or you could simply 
parse/regex the values you want.
And as Marek suggested in his answer, you can also put any arbitrary 
settings in the comments.
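
As a purely hypothetical sketch of the parse/regex approach: if a cfg 
file carried a comment like "# xendomains-order=9" (both the key name and 
the file path below are made up), a shell script could extract the value 
with:

sed -n 's/^#[[:space:]]*xendomains-order=//p' /etc/xen/example.cfg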


Although ...


My thinking for adding to domain configuration files would be something
along these lines:

init = {
'tool': 'xendomains-ng',
'version': 0,
'order': 9,
'startwait': 60,
'stopaction': 'save',
}


The problem with adding this to a domU config file is that it could 
cause problems for (live) migrations. The start/stop order is "per 
dom0", and may be different on another one.
Imagine two dom0s, one storing the domain files "locally" while the 
other uses NFS. Only in the second case should the domU wait for the NFS 
server/domain to be available.


To me, the start/stop logic should be in a dom0 config file.


'startwait' would tell the script to wait that long before starting
subsequent domains.


A time-based wait may be useful when everything goes well, but what 
about when there are problems?
If you want to be sure a domain is up (i.e. ready to serve), you would 
need to check the related "service".
For example, to be sure a DNS domU is up, you would have to try a DNS 
request, as a ping or "xl list" would not be enough (see the sketch below).
Also, domains in xen/auto are started with a mix of serialization AND 
parallelization, as "xl create" returns once the domain has started (i.e. 
from Xen's point of view, not the user's).
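
To illustrate the DNS example (a rough sketch only; the 192.0.2.53 
address of the hypothetical DNS domU is made up):

# keep polling until the DNS domU actually answers queries
until dig +time=1 +tries=1 @192.0.2.53 debian.org > /dev/null 2>&1; do
    sleep 1
done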



'stopaction' would allow different actions if the machine was to stop.
The 3 options which come to mind are 'stop' (shutdown), 'save' (save to
specified storage location), and 'migrate'.


Then, each time you do NOT want to follow the usual action, you'd have 
to edit -each- domU cfg file?



If full Python doesn't become available, this might take the format:
init = 'tool=xendomains-ng,version=0,order=9,startwait=60,stopaction=save'
Not needing to parse the string though does make one's life simpler.


Well, it makes -your- life easier, not the maintainers' one ;)


I'm basically certain writing a new xendomains script in Python is the
way to go.  Now to get an answer as to whether full Python in domain
configuration files could be reenabled.


I'm not sure a Python script would solve anything, as (ba)sh variables 
are imported from other files.
(see for example 
https://salsa.debian.org/xen-team/debian-xen/-/blob/master/tools/hotplug/Linux/xendomains.in)


All things considered, I'm not sure why Xen should provide such 
functionality.
I think custom scripts can handle all the various use cases, don't you 
think?
PS: as mentioned by diederik, the "dependency" logic has already been 
handled by Qubes for years, and it never made it into Xen (I don't know 
the reasons, though).


But I agree the shutdown sequence could be adapted to:
1. first shut down the domains NOT in xen/auto
2. then shut down the domains in xen/auto, in reverse order

For fine-grained start/stop order, maybe a dom0 config file handling 
this could be added, like:


# START/STOP ORDER
# domains not in these lists will be started after and stopped
# before the ones here
start-order=(list of domU names)
stop-order=(list of domU names)

But then again, this only ensures the domains' start order, not "service 
availability" within said domains.


--
zithro / Cyril



Bug#1041533: xen-system-amd64: Xen fails to start hvm type VMs when a vncpasswd is set

2023-07-20 Thread zithro

Hello,

I -think- VNC auth has been removed from recent QEMU versions.
Also, maybe related: QEMU in Debian is not configured with VNC_SASL 
(there was a discussion about it in #debian-xen).


Wait for confirmations; meanwhile, there is another option: SSH (maybe 
even more secure?).


The workaround is to make the VNC servers only accessible from dom0, 
then to create SSH tunnels to connect to them:


1. in the domU config file, set "127.0.0.1" as the IP address to 
listen on, and remove everything about authentication
2. from your management host, create a tunnel, something like "ssh -nN 
-L localhost:12345:localhost:59xx user@dom0"

3. from your management host, point your VNC client at localhost:12345 to connect to the display

The "xx" for the tunnel represent the "VNC display id" you've chosen in 
your domU config file, so if you have "vnclisten = 127.0.0.1:12", the 
real IP address is "127.0.0.1:5912" (in your case, you'd pick 5901).
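
Putting it together for your case (display :1), it could look something 
like this (just a sketch; the local port, user and host names are 
examples):

# in the domU config file: bind VNC to dom0's loopback, display :1 -> port 5901
vnc = 1
vnclisten = "127.0.0.1:1"

# on the management host: forward a local port to dom0's 5901,
# then point your VNC client at localhost:12345
ssh -nN -L 12345:localhost:5901 user@dom0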


Hope it helps.

PS: as for documentation, it will be in the new Debian Xen wiki page 
(which I'm rewriting; for now it's still an offline draft).


--
Cyril RĂ©bert / zithro



Bug#1038901: xen dom0 erroneous detected as 'xen' virtualization by systemd-detect-virt

2023-06-23 Thread zithro

Hello,

I reported the bug upstream, and added some comments there to reflect 
that the output is different on AMD and Intel platforms.

I also added the commit link, thanks for that.

So, "systemd-detect-virt" on non-nested dom0s reports :
- "xen" on Intel (like this bug report)
- "vm-other" on AMD (like my bug report upstream)



Bug#983357: How are other distros handling this ?

2023-05-08 Thread zithro

Hi all,

in an effort to solve this bug the quickest way possible, I want to 
share some pointers (and some time!).


First, it can affect not only "upstream" Debian but also Debian-based 
distros. I personally had the problem trying to use Kali as an HVM domU. 
Although Chuck's fix worked on Kali, I don't know which distros may fail 
without it.
But from memory, while looking for solutions to the bug, I only found 
posts about Debian.


From what I read, understood, and guessed:
- it appears to be a kernel+udev+Xen bug,
- it has nothing to do with Debian itself,
- all distros using Xen are also using the upstream Xen virtual KB

So I'm wondering:
- How can this bug only affect Debian and Debian-based distros, but not 
others like SUSE, Fedora, etc.?
- If I'm right, how are other distros handling this bug?
- How is the "Debian cloud installer" handling this bug?
- If the problem can't be solved "at its roots", can't we just apply the 
(harmless) workaround in the installer?


We should really speed up the fix before the Full Freeze.
I can help with testing and reporting, but I'm a noob at packaging.


Thanks all, have a nice day,

zithro / Cyril REBERT


PS: off-topic remarks.
I've been a regular Debian user for a dozen years, and a Debian+Xen user 
(dom0s and domUs) for 5 years. I recently joined the Debian Xen team.
I want to say this is one of the most critical and obscure bugs -I- have 
encountered with a Debian stable installer.


It's obvious but I'll state it nonetheless: it's really not good 
"publicity" for Debian. Note I'm not pointing fingers at anyone, as I 
really don't care about that; I only care about a resolution.
We don't really know how many users turned their backs on Debian because 
of this bug: "Deb does not work, but distro X works, let's just use the 
one that works OOTB".
But maybe this bug only affects a handful of users?
But maybe this bug is only affecting a handful of users ?