Re: issue when not using acpi indices in libvirt 7.4.0 and qemu 6.0.0

2021-06-23 Thread Laine Stump




On 6/23/21 7:37 PM, Riccardo Ravaioli wrote:
On Wed, 23 Jun 2021 at 18:59, Daniel P. Berrangé wrote:


[...]
So your config here does NOT list any ACPI indexes


Exactly, I don't list any ACPI indices.

 > After upgrading to libvirt 7.4.0 and qemu 6.0.0, the XML snippet above
 > yielded:
 > - ens1 for the first virtio interface => OK
 > - rename4 for the second virtio interface => **KO**


(This is reminiscent of what would sometimes happen back in the "bad old 
days" of ethN NIC naming.)



 > - ens3 for the PCI passthrough interface  => OK


With the older libvirt + qemu, the guest (Debian) was setting the device 
names to ens1, ens2, and ens3 (through some sort of renaming, apparently 
by udev). The names that these interfaces would normally get (e.g. in 
Fedora or RHEL8) would be enp1s1, enp1s2, and enp1s3.


With the newer libvirt + qemu, the guest still has the names set by 
systemd (?) to enp1s1, enp1s2, and enp1s3.



So from libvirt's POV, nothing should have changed upon upgrade,
as we wouldn't be setting any ACPI indexes by default.


Right. If ACPI indexes had been turned on, I would have expected the 
names to be, e.g., eno1, eno2, eno3. But that would require explicitly 
adding the option to the qemu command line, and it isn't there (see below).
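
For reference - and this is me sketching the syntax from memory rather than 
quoting your logs, so double-check it - an ACPI index would show up as an 
extra property on the virtio-net -device lines, something like:

-device virtio-net-pci,csum=off,netdev=hostnet0,id=net0,mac=...,bus=pci.1,addr=0x1,acpi-index=1 \

and nothing of that sort appears in either of your pastes.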




Can you show the QEMU command line from /var/log/libvirt/qemu/$GUEST.log
both before and after the libvirt upgrade.


Sure, here it is before the upgrade: https://pastebin.com/ZzKd2uRJ 



-netdev tap,fd=50,id=hostnet0 \
-device virtio-net-pci,csum=off,netdev=hostnet0,id=net0,mac=52:54:00:aa:cc:05,bus=pci.1,addr=0x1 \
-netdev tap,fd=51,id=hostnet1 \
-device virtio-net-pci,csum=off,netdev=hostnet1,id=net1,mac=52:54:00:aa:bb:81,bus=pci.1,addr=0x2 \
[...]
-device vfio-pci,host=0000:0d:00.0,id=hostdev0,bus=pci.1,addr=0x3 \

And here after the upgrade: https://pastebin.com/EMu6Jgat 



-netdev tap,fd=55,id=hostnet0 \
-device virtio-net-pci,csum=off,netdev=hostnet0,id=net0,mac=52:54:00:aa:cc:a0,bus=pci.1,addr=0x1 \
-netdev tap,fd=56,id=hostnet1 \
-device virtio-net-pci,csum=off,netdev=hostnet1,id=net1,mac=52:54:00:aa:bb:a1,bus=pci.1,addr=0x2 \
[...]


So there is no change in the qemu commandline for the virtio-net 
devices, nor for the hostdev.


(BTW, you say that your vfio-assigned device is SRIOV, but it isn't - 
it's a standard ethernet device - "Intel Corporation I210 Gigabit 
Network Connection" - this has no effect on the current conversation, 
just FYI).


Since the names of the devices haven't changed to "enoBLAH", I think the 
whole ACPI index thing is a red herring - no ACPI indexes are being set, 
so the device names can't be based on them. Something else is going on 
(seemingly tied to the renaming that udev (?) is doing from enpXsY to ensZ).


I notice that you're apparently redefining this domain from scratch each 
time it is started.


1) The machinetype changes from pc-i440fx-5.2 to pc-i440fx-6.0, implying 
that each time the domain is started, it is being told to use the 
generic machinetype "pc", which is then canonicalized to "the newest 
pc-i440fx-based machinetype" before starting the guest.


2) The MAC address has been changed for the two virtio-net cards, but 
not to some random number as would happen if you were letting libvirt 
generate them.


It's common for OSes to notice a new MAC address and attempt to give the 
interface a new name. Perhaps this is happening and whoever/whatever is 
doing that is screwing things up. Or it's possible there is some minor 
change in the machinetype from pc-i440fx-5.2 to pc-i440fx-6.0 that is 
causing this renaming to behave differently.
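
If in the end all you need is stable names inside the guest, one way to take 
the slot/path naming policy out of the picture entirely is a systemd .link 
file in the guest that pins each name to a MAC address. A sketch (untested 
here; "lan0" and the file name are arbitrary, and the MAC is just the one 
from your pre-upgrade log - this of course only helps if the MACs themselves 
stay fixed across starts, see below), e.g. /etc/systemd/network/10-lan0.link:

[Match]
MACAddress=52:54:00:aa:cc:05

[Link]
Name=lan0

Your udevadm output shows the stock 99-default.link being applied; a more 
specific file like this sorts earlier and takes precedence.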


If you really need the hardware presented to your guests to be stable, 
you shouldn't just use "pc" as the machinetype every time the guest is 
started. Instead, save the canonicalized machinetype listed in the XML 
when you initially define the domain, and use that canonicalized 
machinetype on all future starts of the domain. Likewise, retain the 
exact MAC addresses that are used for all the NICs when the domain is 
originally defined and started for the first time, and use those same 
MAC addresses in subsequent starts. That way you are guaranteed (modulo 
any bugs) that the guest is presented with the exact same hardware each 
time it boots. If you use "virsh define" and "virsh start" (rather than 
"virsh create" - I can't be certain this is what you're doing, but there 
are clues indicating it might be the case), then all these details are 
automatically preserved for you within libvirt's persistent domain 
configuration.
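
To make that concrete (a sketch - "guest.xml" and "mydomain" are placeholders, 
not taken from your setup):

   virsh define guest.xml     # done once; "pc" is canonicalized and any
                              # missing MAC addresses are generated here
   virsh start mydomain       # every later boot reuses that persistent,
                              # canonicalized configuration
   virsh dumpxml mydomain     # shows exactly what was stored

as opposed to running "virsh create guest.xml" on every start, which builds a 
brand new transient domain from your raw input XML each time.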


One other comment - I don't remember the exact location, but I recall 
from a long long time ago that udev saves the information about names 
that it gives to NICs "somewhere". You may want to find and clear out 
that cache of info in the guest to get a clean slate.
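
(While you're poking around in the guest, running something like 
"udevadm test-builtin net_id /sys/class/net/rename4" should print the 
candidate names udev computed for that NIC without changing anything - 
I'm assuming your Debian image ships that builtin, which recent udev does.)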

Re: issue when not using acpi indices in libvirt 7.4.0 and qemu 6.0.0

2021-06-23 Thread Riccardo Ravaioli
On Wed, 23 Jun 2021 at 18:59, Daniel P. Berrangé 
wrote:

> [...]
> So your config here does NOT list any ACPI indexes
>

Exactly, I don't list any ACPI indices.


> > After upgrading to libvirt 7.4.0 and qemu 6.0.0, the XML snippet above
> > yielded:
> > - ens1 for the first virtio interface => OK
> > - rename4 for the second virtio interface => **KO**
> > - ens3 for the PCI passthrough interface  => OK
>
> So from libvirt's POV, nothing should have changed upon upgrade,
> as we wouldn't be setting any ACPI indexes by default.
>
> Can you show the QEMU command line from /var/log/libvirt/qemu/$GUEST.log
> both before and after the libvirt upgrade.
>

Sure, here it is before the upgrade: https://pastebin.com/ZzKd2uRJ
And here after the upgrade: https://pastebin.com/EMu6Jgat
(there is a minor difference in the disks which shouldn't be related to
this issue)

Thanks!

Riccardo


Re: issue when not using acpi indices in libvirt 7.4.0 and qemu 6.0.0

2021-06-23 Thread Daniel P. Berrangé
On Wed, Jun 23, 2021 at 06:49:12PM +0200, Riccardo Ravaioli wrote:
> Hi everyone,
> 
> We have an issue with how network interfaces are presented in the VM with
> the latest libvirt 7.4.0 and qemu 6.0.0.
> 
> Previously, we were on libvirt 7.0.0 and qemu 5.2.0, and we used increasing
> virtual PCI addresses for any type of network interface (virtio, PCI
> passthrough, SRIOV) in order to decide the interface order inside the VM.
> For instance the following snippet yields ens1, ens2 and ens3 in a Debian
> Buster VM:
> 
>   <interface ...>
>      ...
>      <model type="virtio"/>
>      <address domain="0x0000" bus="0x01" slot="0x01" function="0x0" type="pci"/>
>   </interface>
>   <interface ...>
>      ...
>      <model type="virtio"/>
>      <address domain="0x0000" bus="0x01" slot="0x02" function="0x0" type="pci"/>
>   </interface>
>   <hostdev mode="subsystem" type="pci" ...>
>      <source>
>         <address domain="0x0000" bus="0x0d" slot="0x00" function="0x0"/>
>      </source>
>      <address domain="0x0000" bus="0x01" slot="0x03" function="0x0" type="pci"/>
>   </hostdev>

So your config here does NOT list any ACPI indexes

> After upgrading to libvirt 7.4.0 and qemu 6.0.0, the XML snippet above
> yielded:
> - ens1 for the first virtio interface => OK
> - rename4 for the second virtio interface => **KO**
> - ens3 for the PCI passthrough interface  => OK

So from libvirt's POV, nothing should have changed upon upgrade,
as we wouldn't be setting any ACPI indexes by default.

Can you show the QEMU command line from /var/log/libvirt/qemu/$GUEST.log
both before and after the libvirt upgrade.


Regards,
Daniel
-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|



issue when not using acpi indices in libvirt 7.4.0 and qemu 6.0.0

2021-06-23 Thread Riccardo Ravaioli
Hi everyone,

We have an issue with how network interfaces are presented in the VM with
the latest libvirt 7.4.0 and qemu 6.0.0.

Previously, we were on libvirt 7.0.0 and qemu 5.2.0, and we used increasing
virtual PCI addresses for any type of network interface (virtio, PCI
passthrough, SRIOV) in order to decide the interface order inside the VM.
For instance the following snippet yields ens1, ens2 and ens3 in a Debian
Buster VM:

  <interface ...>
     ...
     <model type="virtio"/>
     <address domain="0x0000" bus="0x01" slot="0x01" function="0x0" type="pci"/>
  </interface>
  <interface ...>
     ...
     <model type="virtio"/>
     <address domain="0x0000" bus="0x01" slot="0x02" function="0x0" type="pci"/>
  </interface>
  <hostdev mode="subsystem" type="pci" ...>
     <source>
        <address domain="0x0000" bus="0x0d" slot="0x00" function="0x0"/>
     </source>
     <address domain="0x0000" bus="0x01" slot="0x03" function="0x0" type="pci"/>
  </hostdev>

After upgrading to libvirt 7.4.0 and qemu 6.0.0, the XML snippet above
yielded:
- ens1 for the first virtio interface => OK
- rename4 for the second virtio interface => **KO**
- ens3 for the PCI passthrough interface  => OK

Argh! What happened to ens2? By querying udev inside the VM, I see that
"rename4" is the result of a conflict between the ID_NET_NAME_SLOT of the
second and the third interface, both appearing as ID_NET_NAME_SLOT=ens3. In
theory rename4 should show ID_NET_NAME_SLOT=ens2. What happened?

#  udevadm info -q all /sys/class/net/rename4
P: /devices/pci0000:00/0000:00:03.0/0000:01:02.0/virtio4/net/rename4
L: 0
E: DEVPATH=/devices/pci0000:00/0000:00:03.0/0000:01:02.0/virtio4/net/rename4
E: INTERFACE=rename4
E: IFINDEX=4
E: SUBSYSTEM=net
E: USEC_INITIALIZED=94191911
E: ID_NET_NAMING_SCHEME=v240
E: ID_NET_NAME_MAC=enx525400aabba1
E: ID_NET_NAME_PATH=enp1s2
E: ID_NET_NAME_SLOT=ens3
E: ID_BUS=pci
E: ID_VENDOR_ID=0x1af4
E: ID_MODEL_ID=0x1000
E: ID_PCI_CLASS_FROM_DATABASE=Network controller
E: ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller
E: ID_VENDOR_FROM_DATABASE=Red Hat, Inc.
E: ID_MODEL_FROM_DATABASE=Virtio network device
E: ID_PATH=pci-0000:01:02.0
E: ID_PATH_TAG=pci-0000_01_02_0
E: ID_NET_DRIVER=virtio_net
E: ID_NET_LINK_FILE=/usr/lib/systemd/network/99-default.link
E: SYSTEMD_ALIAS=/sys/subsystem/net/devices/rename4
E: TAGS=:systemd:

#  udevadm info -q all /sys/class/net/ens3
P: /devices/pci0000:00/0000:00:03.0/0000:01:03.0/net/ens3
L: 0
E: DEVPATH=/devices/pci0000:00/0000:00:03.0/0000:01:03.0/net/ens3
E: INTERFACE=ens3
E: IFINDEX=2
E: SUBSYSTEM=net
E: USEC_INITIALIZED=3600940
E: ID_NET_NAMING_SCHEME=v240
E: ID_NET_NAME_MAC=enx00900b621235
E: ID_OUI_FROM_DATABASE=LANNER ELECTRONICS, INC.
E: ID_NET_NAME_PATH=enp1s3
E: ID_NET_NAME_SLOT=ens3
E: ID_BUS=pci
E: ID_VENDOR_ID=0x8086
E: ID_MODEL_ID=0x1533
E: ID_PCI_CLASS_FROM_DATABASE=Network controller
E: ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller
E: ID_VENDOR_FROM_DATABASE=Intel Corporation
E: ID_MODEL_FROM_DATABASE=I210 Gigabit Network Connection
E: ID_PATH=pci-0000:01:03.0
E: ID_PATH_TAG=pci-0000_01_03_0
E: ID_NET_DRIVER=igb
E: ID_NET_LINK_FILE=/usr/lib/systemd/network/99-default.link
E: SYSTEMD_ALIAS=/sys/subsystem/net/devices/ens3
E: TAGS=:systemd:


Is there anything we can do in the XML definition of the VM to fix this?

The PCI tree from within the VM is the following, if it helps:
(with libvirt 7.0.0 and qemu 5.2.0 it was the same)

# lspci -tv
-[0000:00]-+-00.0  Intel Corporation 440FX - 82441FX PMC [Natoma]
           +-01.0  Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
           +-01.1  Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
           +-01.2  Intel Corporation 82371SB PIIX3 USB [Natoma/Triton II]
           +-01.3  Intel Corporation 82371AB/EB/MB PIIX4 ACPI
           +-02.0  Cirrus Logic GD 5446
           +-03.0-[01]--+-01.0  Red Hat, Inc. Virtio network device
           |            +-02.0  Red Hat, Inc. Virtio network device
           |            \-03.0  Intel Corporation I210 Gigabit Network Connection
           +-04.0-[02]--
           +-05.0-[03]--
           +-06.0-[04]--
           +-07.0-[05]--
           +-08.0-[06]----01.0  Red Hat, Inc. Virtio block device
           +-09.0  Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA Controller [AHCI mode]
           +-0a.0  Red Hat, Inc. Virtio console
           +-0b.0  Red Hat, Inc. Virtio memory balloon
           \-0c.0  Red Hat, Inc. Virtio RNG


I see that a new feature in qemu and libvirt is to add ACPI indices in
order to have network interfaces appear as *onboard* and sort them through
this index as opposed to virtual PCI addresses. This is great. I see that
in this case, interfaces appear as eno1, eno2, etc.
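
From the documentation this seems to be spelled roughly like the following in
the domain XML (untested on my side, and the index value is just an example):

   <interface ...>
      ...
      <model type="virtio"/>
      <acpi index="1"/>
   </interface>

which libvirt then translates into acpi-index=1 on the corresponding -device
line, and the guest names that NIC eno1.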

However, for the sake of backward compatibility, is there a way to have the
previous behaviour where interfaces were called by their PCI slot number
(ens1, ens2, etc.)?

If I move to the new naming yielded by ACPI indices, I am mostly worried
about any possible change in interface names that might occur across VMs
running different OS's, with respect to what we had before with libvirt
7.0.0 and qemu 5.2.0.

Thanks!

Best,
Riccardo Ravaioli


Re: Libvirt hit a issue when do VM migration

2021-06-23 Thread liang chaojun
Thanks Peter for your help as well as your suggestion. I have got the clues to 
this issue.

Currently there still exists another issue that confuses me: we randomly hit a VM 
that can’t boot and is stuck on a black screen showing “Guest has not initialized 
the display (yet).”

The attachment is the libvirt debug log, but I’m not able to find any useful 
clues. Any comments regarding this issue would be appreciated.



[inline image attachment]


> On 17 Jun 2021, at 20:21, Peter Krempa wrote:
>
> Firstly please don't CC random people who are subscribed to the list.
>
> Additionally this is a user-question so it's not entirely appropriate
> for libvir-l...@redhat.com only for libvirt-users@redhat.com.
>
> On Thu, Jun 17, 2021 at 16:52:42 +0800, 梁朝军 wrote:
>> Hi All,
>>
>> Who can help me? I hit another issue when doing VM migration. Libvirt 
>> throws out the error below:
>>
>> "libvirt: Domain Config error : unsupported configuration: vcpu enable order 
>> of vCPU '0' differs between source and destination definitions”
>
> This error happens when the 'order' attribute ...
>
>>
>>
>> The cpu information defined in the VM domain on the source host is like:
>> 2
>> 
>> 
>> 
>
> ... such as you can see here, doesn't match for a vcpu of a particular ID.
>
>> 
>> 
>> 
>> 
>> 
>>
>> The XML generated on the destination host for the VM is like:
>>
>> 2
>> 
>> 
>> 
>
> Yours do match though. Did you post the correct XMLs?
>
>> 
>> 
>> 
>> 
>> 
>>
>> What’s wrong with it? Or am I missing some configuration steps regarding 
>> the migration?
>
> It's unfortunately impossible to tell; your report doesn't describe what
> steps you took to do the migration, nor does it contain debug logs.
>
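
For context, the attribute Peter refers to lives in the <vcpus> section of the
domain XML and looks roughly like this (a generic example, not my real config):

  <vcpu current="1">2</vcpu>
  <vcpus>
    <vcpu id="0" enabled="yes" hotpluggable="no" order="1"/>
    <vcpu id="1" enabled="no" hotpluggable="yes"/>
  </vcpus>

and going by the error message, migration requires the order value of each
vcpu id to be identical in the source and destination definitions.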



testvm_cuhv7a54.log
Description: testvm_cuhv7a54.log




Re: unaccessible directory in lxcContainerResolveSymlinks

2021-06-23 Thread Priyanka Gupta
Hi,

If I try to add only /flash (without the dir subdirectory) into the XML, it
works fine for me.
Does it matter what sort of permissions (user/group) /flash/dir has?

Thanks
Priyanka

On Wed, Jun 23, 2021 at 2:46 PM Michal Prívozník 
wrote:

> On 6/23/21 11:08 AM, Priyanka Gupta wrote:
> > Hi Michal,
> >
> > This is what the snippets from my XML look like. Full XML at the end of the
> > mail.
> > /usr/sbin/libvirt_lxc
> > 
> >   
> >   
> > 
> > 
> >   
> >   
> > 
> >
> > The issue I am facing is that my container doesn't start. It fails at mounting
> > this /flash/dir with the message below.
> >
> > 2021-06-09 06:52:55.548+0000: 1: error : lxcContainerMountFSBind:1223 :
> > Failed to bind mount directory /.oldroot/flash/dir to /flash/dir: No
> > such file or directory
>
> Yeah, the lxc driver thinks that /flash/dir doesn't exist.
> Unfortunately, I am unable to reproduce - when I mkdir -p /flash/dir
> and add a corresponding <filesystem/> to one of my containers it boots up
> just fine. What's your libvirt version?
>
> Michal
>
>


Re: unaccessible directory in lxcContainerResolveSymlinks

2021-06-23 Thread Michal Prívozník
On 6/23/21 11:28 AM, Priyanka Gupta wrote:
> Hi,
> 
> If I try to add only /flash (without the dir subdirectory) into the XML, it
> works fine for me.
> Does it matter what sort of permissions (user/group) /flash/dir has?

A-ha! It can happen that root can't access some directories even though
a regular user can. Typically, fuse mounts (unless explicitly allowed
when mounting). So it does matter, yeah.
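
For instance (just an illustration, not necessarily your setup): a FUSE
filesystem mounted by a regular user is by default not accessible even to
root:

  sudo -u someuser sshfs somehost:/export /flash/dir
  ls /flash/dir    # run as root: "Permission denied"

unless it was mounted with -o allow_other (or allow_root), which for an
unprivileged user also needs user_allow_other enabled in /etc/fuse.conf.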

Michal



Re: unaccessible directory in lxcContainerResolveSymlinks

2021-06-23 Thread Michal Prívozník
On 6/23/21 11:08 AM, Priyanka Gupta wrote:
> Hi Michal,
> 
> This is what the snippets from my XML look like. Full XML at the end of the
> mail.
>     /usr/sbin/libvirt_lxc
>     
>       
>       
>     
>     
>       
>       
>     
> 
> The issue I am facing is that my container doesn't start. It fails at mounting
> this /flash/dir with the message below.
>
> 2021-06-09 06:52:55.548+0000: 1: error : lxcContainerMountFSBind:1223 :
> Failed to bind mount directory /.oldroot/flash/dir to /flash/dir: No
> such file or directory

Yeah, the lxc driver thinks that /flash/dir doesn't exist.
Unfortunately, I am unable to reproduce - when I mkdir -p /flash/dir
and add a corresponding <filesystem/> to one of my containers it boots up
just fine. What's your libvirt version?

Michal



Re: unaccessible directory in lxcContainerResolveSymlinks

2021-06-23 Thread Priyanka Gupta
Hi Michal,

This is what the snippets from my XML look like. Full XML at the end of the
mail.
/usr/sbin/libvirt_lxc

  
  


  
  

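
Spelled out, the entry for the directory in question is of this shape (the
/flash/dir paths below match the ones in the error message; the rest is the
standard form):

  <filesystem type="mount">
    <source dir="/flash/dir"/>
    <target dir="/flash/dir"/>
  </filesystem>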

The issue I am facing is that my container doesn't start. It fails at mounting
this /flash/dir with the message below.

2021-06-09 06:52:55.548+0000: 1: error : lxcContainerMountFSBind:1223 :
Failed to bind mount directory /.oldroot/flash/dir to /flash/dir: No such
file or directory

I see that .oldroot is added to this path, and I figured out that
lxcContainerPivotRoot is creating and mounting all folders under rootfs to
rootfs/.oldroot. But during pivot root, this path /flash/dir is inaccessible,
and hence we don't seem to create .oldroot/flash/dir. Later on, during
lxcContainerMountFSBind(), the /.oldroot/flash/dir bind mount fails.
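
(Roughly the same failure can be seen with a plain bind mount whenever the
source path is missing, e.g.:

  mount --bind /.oldroot/flash/dir /flash/dir

fails with "No such file or directory" (ENOENT) if /.oldroot/flash/dir was
never created.)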

Any thoughts?

Thanks
Priyanka


Full XML

<domain type="lxc">
  <name>try</name>
  <memory>1048576</memory>
  <currentMemory>1048576</currentMemory>
  <vcpu>1</vcpu>
  <os>
    <type>exe</type>
    <init>/sbin/init</init>
  </os>
  <devices>
    <emulator>/usr/sbin/libvirt_lxc</emulator>
    <filesystem type="mount">
      <source dir="..."/>
      <target dir="..."/>
    </filesystem>
    <filesystem type="mount">
      <source dir="..."/>
      <target dir="..."/>
    </filesystem>
    ...
  </devices>
</domain>


On Wed, Jun 23, 2021 at 1:46 PM Michal Prívozník 
wrote:

> On 6/22/21 6:17 PM, Priyanka Gupta wrote:
> > Hi,
> >
> > Could someone please let me know when this condition could possibly arise?
> > lxcContainerResolveSymlinks:621 : Skipped unaccessible '/flash/dir'
> >
> > The code seems to call access('/flash/dir', F_OK), which should only check
> > for the existence of the directory '/flash/dir'. I have this directory
> > created on my host. Is there anything that I am missing?
> >
>
> Hey, looking into the code, the function is run from a mount namespace,
> thus the path may not exist there. But is there a problem you are seeing? If
> so, can you share your LXC XML and describe the problem? I mean, the
> message you mention is just a debug print; it's harmless.
>
> Michal
>
>