Re: Network driver domain broken
On 3/7/2022 5:07 PM, Jason Andryuk wrote:
> On Mon, Mar 7, 2022 at 10:00 AM Andrea Stevanato wrote:
>> (XEN) XSM Framework v1.0.0 initialized
>> (XEN) Initialising XSM SILO mode
>
> Yes, SILO mode is running.
>
>> # cat /boot/xen-4.14.3-pre.config | grep XSM
>> CONFIG_XSM=y
>> CONFIG_XSM_FLASK=y
>> CONFIG_XSM_FLASK_AVC_STATS=y
>> # CONFIG_XSM_FLASK_POLICY is not set
>> CONFIG_XSM_SILO=y
>> # CONFIG_XSM_DUMMY_DEFAULT is not set
>> # CONFIG_XSM_FLASK_DEFAULT is not set
>> CONFIG_XSM_SILO_DEFAULT=y
>>
>> This is the default configuration shipped with petalinux. From the
>> menuconfig help, it seems that XSM SILO denies communication
>> between unprivileged VMs.
>
> You could try adding xsm=dummy to your hypervisor command line to turn
> off SILO and allow the guests to communicate.

I changed it to FLASK and added flask=late to the hypervisor command line.
Which one should I choose: SILO + xsm=dummy, or FLASK + flask=late/disabled?
What are the differences?

Cheers,
Andrea
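To see which XSM module a given hypervisor build selects by default, the .config excerpt quoted above can be checked mechanically. The following is only a minimal sketch, not an official Xen tool: xsm_default is a hypothetical helper name, and it assumes option names of the form CONFIG_XSM_<MODULE>_DEFAULT=y, as they appear in the quoted excerpt.

```shell
#!/bin/sh
# Hypothetical helper (not part of Xen): print, in lower case, the XSM
# module that a hypervisor .config selects as its default.
# Assumption: the option is spelled CONFIG_XSM_<MODULE>_DEFAULT=y,
# as in the excerpt quoted in this message.
xsm_default() {
    sed -n 's/^CONFIG_XSM_\([A-Z]*\)_DEFAULT=y$/\1/p' "$1" |
        tr '[:upper:]' '[:lower:]'
}

# Example use on the file named in this thread:
#   xsm_default /boot/xen-4.14.3-pre.config
```

On the configuration quoted above this prints silo, which matches the "Initialising XSM SILO mode" line in the hypervisor log.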
Re: Network driver domain broken
On 3/7/2022 3:56 PM, Jan Beulich wrote:
> On 07.03.2022 15:52, Roger Pau Monné wrote:
>> On Mon, Mar 07, 2022 at 03:20:22PM +0100, Andrea Stevanato wrote:
>>> On 3/7/2022 12:46 PM, Roger Pau Monné wrote:
>>>> On Mon, Mar 07, 2022 at 12:39:22PM +0100, Andrea Stevanato wrote:
>>>>> /local/domain/2 = "" (n0,r2)
>>>>> /local/domain/2/vm = "/vm/f6dca20a-54bb-43af-9a62-67c55cb75708" (n0,r2)
>>>>> /local/domain/2/name = "guest1" (n0,r2)
>>>>> /local/domain/2/cpu = "" (n0,r2)
>>>>> /local/domain/2/cpu/0 = "" (n0,r2)
>>>>> /local/domain/2/cpu/0/availability = "online" (n0,r2)
>>>>> /local/domain/2/cpu/1 = "" (n0,r2)
>>>>> /local/domain/2/cpu/1/availability = "online" (n0,r2)
>>>>> /local/domain/2/memory = "" (n0,r2)
>>>>> /local/domain/2/memory/static-max = "1048576" (n0,r2)
>>>>> /local/domain/2/memory/target = "1048577" (n0,r2)
>>>>> /local/domain/2/memory/videoram = "-1" (n0,r2)
>>>>> /local/domain/2/device = "" (n0,r2)
>>>>> /local/domain/2/device/suspend = "" (n0,r2)
>>>>> /local/domain/2/device/suspend/event-channel = "" (n2)
>>>>> /local/domain/2/device/vif = "" (n0,r2)
>>>>> /local/domain/2/device/vif/0 = "" (n2,r1)
>>>>> /local/domain/2/device/vif/0/backend = "/local/domain/1/backend/vif/2/0" (n2,r1)
>>>>> /local/domain/2/device/vif/0/backend-id = "1" (n2,r1)
>>>>> /local/domain/2/device/vif/0/state = "6" (n2,r1)
>>>>> /local/domain/2/device/vif/0/handle = "0" (n2,r1)
>>>>> /local/domain/2/device/vif/0/mac = "00:16:3e:07:df:91" (n2,r1)
>>>>> /local/domain/2/device/vif/0/xdp-headroom = "0" (n2,r1)
>>>>> /local/domain/2/control = "" (n0,r2)
>>>>> /local/domain/2/control/shutdown = "" (n2)
>>>>> /local/domain/2/control/feature-poweroff = "1" (n2)
>>>>> /local/domain/2/control/feature-reboot = "1" (n2)
>>>>> /local/domain/2/control/feature-suspend = "" (n2)
>>>>> /local/domain/2/control/sysrq = "" (n2)
>>>>> /local/domain/2/control/platform-feature-multiprocessor-suspend = "1" (n0,r2)
>>>>> /local/domain/2/control/platform-feature-xs_reset_watches = "1" (n0,r2)
>>>>> /local/domain/2/data = "" (n2)
>>>>> /local/domain/2/drivers = "" (n2)
>>>>> /local/domain/2/feature = "" (n2)
>>>>> /local/domain/2/attr = "" (n2)
>>>>> /local/domain/2/error = "" (n2)
>>>>> /local/domain/2/error/device = "" (n2)
>>>>> /local/domain/2/error/device/vif = "" (n2)
>>>>> /local/domain/2/error/device/vif/0 = "" (n2)
>>>>> /local/domain/2/error/device/vif/0/error = "1 allocating event channel" (n2)
>>>>
>>>> That's the real error. Your guest netfront fails to allocate the event
>>>> channel. Do you get any messages in the guest dmesg after trying to
>>>> attach the network interface?
>>>
>>> Just these two lines:
>>>
>>> [ 389.453390] vif vif-0: 1 allocating event channel
>>> [ 389.804135] vif vif-0: 1 allocating event channel
>>
>> Are you perhaps using some kind of flask/xsm policy different from the
>> defaults?
>
> Or SILO mode.

It turns out that this was the problem. I changed it to FLASK, added
flask=late to the bootloader command line, and now it works fine (at least
for now).

> Jan

Forgive me for bothering you so much. As soon as I can, I will update the
wiki with all the information that I have discovered!

Thank you all!

Cheers,
Andrea
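The frontend node /local/domain/2/device/vif/0/state = "6" in the quoted dump is a numeric xenbus state. As a reading aid, here is a small sketch that maps such numbers to names. The values follow enum xenbus_state in Xen's public io/xenbus.h header; xenbus_state_name itself is a hypothetical helper name, not a Xen tool.

```shell
#!/bin/sh
# Hypothetical helper (not part of Xen): map a numeric xenbus state to its
# symbolic name. The numbering follows enum xenbus_state in Xen's public
# io/xenbus.h header.
xenbus_state_name() {
    case "$1" in
        0) echo Unknown ;;
        1) echo Initialising ;;
        2) echo InitWait ;;
        3) echo Initialised ;;
        4) echo Connected ;;
        5) echo Closing ;;
        6) echo Closed ;;
        7) echo Reconfiguring ;;
        8) echo Reconfigured ;;
        *) echo "invalid state: $1" >&2; return 1 ;;
    esac
}

# Example: xenbus_state_name 6
```

Read this way, the frontend in the dump was in state Closed (6) rather than Connected (4).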
Re: Network driver domain broken
On 3/7/2022 3:50 PM, Andrew Cooper wrote:
> On 07/03/2022 14:43, Andrea Stevanato wrote:
>> On 3/7/2022 3:36 PM, Jan Beulich wrote:
>>> On 07.03.2022 15:20, Andrea Stevanato wrote:
>>>> On 3/7/2022 12:46 PM, Roger Pau Monné wrote:
>>>>> On Mon, Mar 07, 2022 at 12:39:22PM +0100, Andrea Stevanato wrote:
>>>>>> /local/domain/2 = "" (n0,r2)
>>>>>> /local/domain/2/vm = "/vm/f6dca20a-54bb-43af-9a62-67c55cb75708" (n0,r2)
>>>>>> /local/domain/2/name = "guest1" (n0,r2)
>>>>>> /local/domain/2/cpu = "" (n0,r2)
>>>>>> /local/domain/2/cpu/0 = "" (n0,r2)
>>>>>> /local/domain/2/cpu/0/availability = "online" (n0,r2)
>>>>>> /local/domain/2/cpu/1 = "" (n0,r2)
>>>>>> /local/domain/2/cpu/1/availability = "online" (n0,r2)
>>>>>> /local/domain/2/memory = "" (n0,r2)
>>>>>> /local/domain/2/memory/static-max = "1048576" (n0,r2)
>>>>>> /local/domain/2/memory/target = "1048577" (n0,r2)
>>>>>> /local/domain/2/memory/videoram = "-1" (n0,r2)
>>>>>> /local/domain/2/device = "" (n0,r2)
>>>>>> /local/domain/2/device/suspend = "" (n0,r2)
>>>>>> /local/domain/2/device/suspend/event-channel = "" (n2)
>>>>>> /local/domain/2/device/vif = "" (n0,r2)
>>>>>> /local/domain/2/device/vif/0 = "" (n2,r1)
>>>>>> /local/domain/2/device/vif/0/backend = "/local/domain/1/backend/vif/2/0" (n2,r1)
>>>>>> /local/domain/2/device/vif/0/backend-id = "1" (n2,r1)
>>>>>> /local/domain/2/device/vif/0/state = "6" (n2,r1)
>>>>>> /local/domain/2/device/vif/0/handle = "0" (n2,r1)
>>>>>> /local/domain/2/device/vif/0/mac = "00:16:3e:07:df:91" (n2,r1)
>>>>>> /local/domain/2/device/vif/0/xdp-headroom = "0" (n2,r1)
>>>>>> /local/domain/2/control = "" (n0,r2)
>>>>>> /local/domain/2/control/shutdown = "" (n2)
>>>>>> /local/domain/2/control/feature-poweroff = "1" (n2)
>>>>>> /local/domain/2/control/feature-reboot = "1" (n2)
>>>>>> /local/domain/2/control/feature-suspend = "" (n2)
>>>>>> /local/domain/2/control/sysrq = "" (n2)
>>>>>> /local/domain/2/control/platform-feature-multiprocessor-suspend = "1" (n0,r2)
>>>>>> /local/domain/2/control/platform-feature-xs_reset_watches = "1" (n0,r2)
>>>>>> /local/domain/2/data = "" (n2)
>>>>>> /local/domain/2/drivers = "" (n2)
>>>>>> /local/domain/2/feature = "" (n2)
>>>>>> /local/domain/2/attr = "" (n2)
>>>>>> /local/domain/2/error = "" (n2)
>>>>>> /local/domain/2/error/device = "" (n2)
>>>>>> /local/domain/2/error/device/vif = "" (n2)
>>>>>> /local/domain/2/error/device/vif/0 = "" (n2)
>>>>>> /local/domain/2/error/device/vif/0/error = "1 allocating event channel" (n2)
>>>>> That's the real error. Your guest netfront fails to allocate the event
>>>>> channel. Do you get any messages in the guest dmesg after trying to
>>>>> attach the network interface?
>>>> Just these two lines:
>>>>
>>>> [ 389.453390] vif vif-0: 1 allocating event channel
>>>> [ 389.804135] vif vif-0: 1 allocating event channel
>>> Well, these are the error messages, from xenbus_alloc_evtchn().
>>> What's a little odd is that the error code is positive, but that's
>>> how -EPERM is logged. Is there perhaps a strange or broken XSM
>>> policy in use? I ask because evtchn_alloc_unbound() itself
>>> wouldn't return -EPERM afaics.
>> As you can see I'm pretty new to Xen. Furthermore, it is the first
>> time that I heard about XSM, so since I did not change anything I
>> do not know what to answer!
>
> Please can you attach the full output of `xl dmesg`, which will help
> answer this question.

# xl dmesg
(XEN) Checking for initrd in /chosen
(XEN) RAM: - 7fef
(XEN) RAM: 0008
Re: Network driver domain broken
On 3/7/2022 3:36 PM, Jan Beulich wrote:
> On 07.03.2022 15:20, Andrea Stevanato wrote:
>> On 3/7/2022 12:46 PM, Roger Pau Monné wrote:
>>> On Mon, Mar 07, 2022 at 12:39:22PM +0100, Andrea Stevanato wrote:
>>>> /local/domain/2 = "" (n0,r2)
>>>> /local/domain/2/vm = "/vm/f6dca20a-54bb-43af-9a62-67c55cb75708" (n0,r2)
>>>> /local/domain/2/name = "guest1" (n0,r2)
>>>> /local/domain/2/cpu = "" (n0,r2)
>>>> /local/domain/2/cpu/0 = "" (n0,r2)
>>>> /local/domain/2/cpu/0/availability = "online" (n0,r2)
>>>> /local/domain/2/cpu/1 = "" (n0,r2)
>>>> /local/domain/2/cpu/1/availability = "online" (n0,r2)
>>>> /local/domain/2/memory = "" (n0,r2)
>>>> /local/domain/2/memory/static-max = "1048576" (n0,r2)
>>>> /local/domain/2/memory/target = "1048577" (n0,r2)
>>>> /local/domain/2/memory/videoram = "-1" (n0,r2)
>>>> /local/domain/2/device = "" (n0,r2)
>>>> /local/domain/2/device/suspend = "" (n0,r2)
>>>> /local/domain/2/device/suspend/event-channel = "" (n2)
>>>> /local/domain/2/device/vif = "" (n0,r2)
>>>> /local/domain/2/device/vif/0 = "" (n2,r1)
>>>> /local/domain/2/device/vif/0/backend = "/local/domain/1/backend/vif/2/0" (n2,r1)
>>>> /local/domain/2/device/vif/0/backend-id = "1" (n2,r1)
>>>> /local/domain/2/device/vif/0/state = "6" (n2,r1)
>>>> /local/domain/2/device/vif/0/handle = "0" (n2,r1)
>>>> /local/domain/2/device/vif/0/mac = "00:16:3e:07:df:91" (n2,r1)
>>>> /local/domain/2/device/vif/0/xdp-headroom = "0" (n2,r1)
>>>> /local/domain/2/control = "" (n0,r2)
>>>> /local/domain/2/control/shutdown = "" (n2)
>>>> /local/domain/2/control/feature-poweroff = "1" (n2)
>>>> /local/domain/2/control/feature-reboot = "1" (n2)
>>>> /local/domain/2/control/feature-suspend = "" (n2)
>>>> /local/domain/2/control/sysrq = "" (n2)
>>>> /local/domain/2/control/platform-feature-multiprocessor-suspend = "1" (n0,r2)
>>>> /local/domain/2/control/platform-feature-xs_reset_watches = "1" (n0,r2)
>>>> /local/domain/2/data = "" (n2)
>>>> /local/domain/2/drivers = "" (n2)
>>>> /local/domain/2/feature = "" (n2)
>>>> /local/domain/2/attr = "" (n2)
>>>> /local/domain/2/error = "" (n2)
>>>> /local/domain/2/error/device = "" (n2)
>>>> /local/domain/2/error/device/vif = "" (n2)
>>>> /local/domain/2/error/device/vif/0 = "" (n2)
>>>> /local/domain/2/error/device/vif/0/error = "1 allocating event channel" (n2)
>>>
>>> That's the real error. Your guest netfront fails to allocate the event
>>> channel. Do you get any messages in the guest dmesg after trying to
>>> attach the network interface?
>>
>> Just these two lines:
>>
>> [ 389.453390] vif vif-0: 1 allocating event channel
>> [ 389.804135] vif vif-0: 1 allocating event channel
>
> Well, these are the error messages, from xenbus_alloc_evtchn().
> What's a little odd is that the error code is positive, but that's
> how -EPERM is logged. Is there perhaps a strange or broken XSM
> policy in use? I ask because evtchn_alloc_unbound() itself
> wouldn't return -EPERM afaics.

As you can see, I'm pretty new to Xen. Furthermore, this is the first time
that I have heard about XSM, so, since I did not change anything, I do not
know what to answer!

The only thing that I can tell you is that for both dom0 and the guests I'm
using the exact same kernel and rootfs.

> Jan

Cheers,
Andrea
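Jan's point that the leading "1" in "1 allocating event channel" is how -EPERM gets logged can be made concrete with a small decoder. This is only a sketch under the assumption of Linux errno numbering (where EPERM is 1); decode_xenbus_error is a hypothetical helper name, and only a handful of common errno values are mapped.

```shell
#!/bin/sh
# Hypothetical helper (not part of Xen or Linux): rewrite an error string
# such as "1 allocating event channel" as "EPERM: allocating event channel".
# Assumption: the leading number is a positive Linux errno value.
# Only a few common values are mapped; anything else is passed through.
decode_xenbus_error() {
    code=${1%% *}   # text before the first space: the numeric code
    msg=${1#* }     # text after the first space: the message
    case "$code" in
        1)  name=EPERM ;;
        2)  name=ENOENT ;;
        12) name=ENOMEM ;;
        13) name=EACCES ;;
        22) name=EINVAL ;;
        28) name=ENOSPC ;;
        *)  name="errno $code" ;;
    esac
    printf '%s: %s\n' "$name" "$msg"
}

# Example: decode_xenbus_error "1 allocating event channel"
```

Applied to the message from the guest's dmesg, this yields "EPERM: allocating event channel", that is, a permission failure rather than a resource shortage.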
Re: Network driver domain broken
On 3/7/2022 12:46 PM, Roger Pau Monné wrote:
> On Mon, Mar 07, 2022 at 12:39:22PM +0100, Andrea Stevanato wrote:
>> /local/domain/2 = "" (n0,r2)
>> /local/domain/2/vm = "/vm/f6dca20a-54bb-43af-9a62-67c55cb75708" (n0,r2)
>> /local/domain/2/name = "guest1" (n0,r2)
>> /local/domain/2/cpu = "" (n0,r2)
>> /local/domain/2/cpu/0 = "" (n0,r2)
>> /local/domain/2/cpu/0/availability = "online" (n0,r2)
>> /local/domain/2/cpu/1 = "" (n0,r2)
>> /local/domain/2/cpu/1/availability = "online" (n0,r2)
>> /local/domain/2/memory = "" (n0,r2)
>> /local/domain/2/memory/static-max = "1048576" (n0,r2)
>> /local/domain/2/memory/target = "1048577" (n0,r2)
>> /local/domain/2/memory/videoram = "-1" (n0,r2)
>> /local/domain/2/device = "" (n0,r2)
>> /local/domain/2/device/suspend = "" (n0,r2)
>> /local/domain/2/device/suspend/event-channel = "" (n2)
>> /local/domain/2/device/vif = "" (n0,r2)
>> /local/domain/2/device/vif/0 = "" (n2,r1)
>> /local/domain/2/device/vif/0/backend = "/local/domain/1/backend/vif/2/0" (n2,r1)
>> /local/domain/2/device/vif/0/backend-id = "1" (n2,r1)
>> /local/domain/2/device/vif/0/state = "6" (n2,r1)
>> /local/domain/2/device/vif/0/handle = "0" (n2,r1)
>> /local/domain/2/device/vif/0/mac = "00:16:3e:07:df:91" (n2,r1)
>> /local/domain/2/device/vif/0/xdp-headroom = "0" (n2,r1)
>> /local/domain/2/control = "" (n0,r2)
>> /local/domain/2/control/shutdown = "" (n2)
>> /local/domain/2/control/feature-poweroff = "1" (n2)
>> /local/domain/2/control/feature-reboot = "1" (n2)
>> /local/domain/2/control/feature-suspend = "" (n2)
>> /local/domain/2/control/sysrq = "" (n2)
>> /local/domain/2/control/platform-feature-multiprocessor-suspend = "1" (n0,r2)
>> /local/domain/2/control/platform-feature-xs_reset_watches = "1" (n0,r2)
>> /local/domain/2/data = "" (n2)
>> /local/domain/2/drivers = "" (n2)
>> /local/domain/2/feature = "" (n2)
>> /local/domain/2/attr = "" (n2)
>> /local/domain/2/error = "" (n2)
>> /local/domain/2/error/device = "" (n2)
>> /local/domain/2/error/device/vif = "" (n2)
>> /local/domain/2/error/device/vif/0 = "" (n2)
>> /local/domain/2/error/device/vif/0/error = "1 allocating event channel" (n2)
>
> That's the real error. Your guest netfront fails to allocate the event
> channel. Do you get any messages in the guest dmesg after trying to
> attach the network interface?

Just these two lines:

[ 389.453390] vif vif-0: 1 allocating event channel
[ 389.804135] vif vif-0: 1 allocating event channel

> Does the same happen if you don't use a driver domain and run the
> backend in dom0?

No, it does not. On dom0 everything is set up correctly. Here is the final
part of the output of xl -vvv devd -F executed on dom0, which differs from
the run on guest0:

libxl: debug: libxl_event.c:1052:devstate_callback: backend /local/domain/0/backend/vif/1/0/state wanted state 2 ok
libxl: debug: libxl_event.c:850:libxl__ev_xswatch_deregister: watch w=0xca342470 wpath=/local/domain/0/backend/vif/1/0/state token=1/2: deregister slotnum=1
libxl: debug: libxl_device.c:1090:device_backend_callback: Domain 1:calling device_backend_cleanup
libxl: debug: libxl_event.c:864:libxl__ev_xswatch_deregister: watch w=0xca342470: deregister unregistered
libxl: debug: libxl_device.c:1191:device_hotplug: Domain 1:calling hotplug script: /etc/xen/scripts/vif-bridge online
libxl: debug: libxl_device.c:1192:device_hotplug: Domain 1:extra args:
libxl: debug: libxl_device.c:1198:device_hotplug: Domain 1: type_if=vif
libxl: debug: libxl_device.c:1200:device_hotplug: Domain 1:env:
libxl: debug: libxl_device.c:1207:device_hotplug: Domain 1: script: /etc/xen/scripts/vif-bridge
libxl: debug: libxl_device.c:1207:device_hotplug: Domain 1: XENBUS_TYPE: vif
libxl: debug: libxl_device.c:1207:device_hotplug: Domain 1: XENBUS_PATH: backend/vif/1/0
libxl: debug: libxl_device.c:1207:device_hotplug: Domain 1: XENBUS_BASE_PATH: backend
libxl: debug: libxl_device.c:1207:device_hotplug: Domain 1: netdev:
libxl: debug: libxl_device.c:1207:device_hotplug: Domain 1: vif: vif1.0
libxl: debug: libxl_aoutils.c:593:libxl__async_exec_start: forking to execute: /etc/xen/scripts/vif-bridge online

>
> Regards, Roger.

Cheers,
Andrea.
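In a long xenstore listing like the one quoted in this message, the only line that matters for diagnosis is the non-empty node under error/. The sketch below filters a listing down to such nodes. xs_errors is a hypothetical helper name, and it assumes one `path = "value" (perms)` entry per line, which is what a command along the lines of `xenstore-ls -fp` produces (the thread does not say which command generated the dump).

```shell
#!/bin/sh
# Hypothetical helper (not part of Xen): read a xenstore listing on stdin
# and print only the nodes below an error/ directory that hold a message.
# Assumption: one entry per line, formatted as
#   /path/to/node = "value" (perms)
xs_errors() {
    # Keep entries under .../error/..., then drop those with an empty value.
    # "|| true" keeps the exit status at 0 when nothing is found.
    grep '/error/' | grep -v ' = "" ' || true
}

# Example use in dom0 (assumed invocation):
#   xenstore-ls -fp /local/domain/2 | xs_errors
```

For the dump in this message, the filter leaves just the "1 allocating event channel" entry for vif 0.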
Re: Network driver domain broken
On 3/7/22 12:22, Roger Pau Monné wrote: On Fri, Mar 04, 2022 at 02:46:37PM +0100, Andrea Stevanato wrote: On 3/4/2022 1:27 PM, Roger Pau Monné wrote: On Fri, Mar 04, 2022 at 01:05:55PM +0100, Andrea Stevanato wrote: On 3/4/2022 12:52 PM, Roger Pau Monné wrote: On Thu, Mar 03, 2022 at 01:08:31PM -0500, Jason Andryuk wrote: On Thu, Mar 3, 2022 at 11:34 AM Roger Pau Monné wrote: On Thu, Mar 03, 2022 at 05:01:23PM +0100, Andrea Stevanato wrote: On 03/03/2022 15:54, Andrea Stevanato wrote: Hi all, according to the conversation that I had with royger, aa67b97ed34 broke the driver domain support. What I'm trying to do is to setup networking between guests using driver domain. Therefore, the guest (driver) has been started with the following cfg. name= "guest0" kernel = "/media/sd-mmcblk0p1/Image" ramdisk = "/media/sd-mmcblk0p1/rootfs.cpio.gz" extra = "console=hvc0 rdinit=/sbin/init root=/dev/ram0" memory = 1024 vcpus = 2 driver_domain = 1 On guest0 I created the bridge, assigned a static IP and started the udhcpd on xenbr0 interface. 
While the second guest has been started with the following cfg: name= "guest1" kernel = "/media/sd-mmcblk0p1/Image" ramdisk = "/media/sd-mmcblk0p1/rootfs.cpio.gz" extra = "console=hvc0 rdinit=/sbin/init root=/dev/ram0" memory = 1024 vcpus = 2 vcpus = 2 vif = [ 'bridge=xenbr0, backend=guest0' ] Follows the result of strace xl devd: # strace xl devd execve("/usr/sbin/xl", ["xl", "devd"], 0xdf0420c8 /* 13 vars */) = 0 ioctl(5, _IOC(_IOC_NONE, 0x50, 0, 0x30), 0xe6e41b40) = -1 EPERM (Operation not permitted) write(2, "libxl: ", 7libxl: ) = 7 write(2, "error: ", 7error: ) = 7 write(2, "libxl_utils.c:820:libxl_cpu_bitm"..., 87libxl_utils.c:820:libxl_cpu_bitmap_alloc: failed to retrieve the maximum number of cpus) = 87 write(2, "\n", 1 ) = 1 clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x9ee7a0e0) = 814 wait4(814, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 814 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=814, si_uid=0, si_status=0, si_utime=2, si_stime=2} --- xl devd is daemonizing, but strace is only following the first process. Use `strace xl devd -F` to prevent the daemonizing (or `strace -f xl devd` to follow children). Or as a first step try to see what kind of messages you get from `xl devd -F` when trying to attach a device using the driver domain. Nothing has changed. On guest0 (the driver domain): # xl devd -F libxl: error: libxl_utils.c:820:libxl_cpu_bitmap_alloc: failed to retrieve the maximum number of cpus libxl: error: libxl_utils.c:820:libxl_cpu_bitmap_alloc: failed to retrieve the maximum number of cpus libxl: error: libxl_utils.c:820:libxl_cpu_bitmap_alloc: failed to retrieve the maximum number of cpus [ 696.805619] xenbr0: port 1(vif2.0) entered blocking state [ 696.810334] xenbr0: port 1(vif2.0) entered disabled state [ 696.824518] device vif2.0 entered promiscuous mode Can you use `xl -vvv devd -F` here? 
# xl -vvv devd -F libxl: error: libxl_utils.c:820:libxl_cpu_bitmap_alloc: failed to retrieve the maximum number of cpus libxl: error: libxl_utils.c:820:libxl_cpu_bitmap_alloc: failed to retrieve the maximum number of cpus libxl: error: libxl_utils.c:820:libxl_cpu_bitmap_alloc: failed to retrieve the maximum number of cpus libxl: debug: libxl_device.c:1749:libxl_device_events_handler: ao 0xece52130: create: how=(nil) callback=(nil) poller=0xece52430 libxl: debug: libxl_event.c:813:libxl__ev_xswatch_register: watch w=0xe628caf8 wpath=/local/domain/1/backend token=3/0: register slotnum=3 libxl: debug: libxl_device.c:1806:libxl_device_events_handler: ao 0xece52130: inprogress: poller=0xece52430, flags=i libxl: debug: libxl_event.c:750:watchfd_callback: watch w=0xe628caf8 wpath=/local/domain/1/backend token=3/0: event epath=/local/domain/1/backend libxl: debug: libxl_event.c:2445:libxl__nested_ao_create: ao 0xece51b90: nested ao, parent 0xece52130 libxl: debug: libxl_event.c:2035:libxl__ao__destroy: ao 0xece51b90: destroy libxl: debug: libxl_event.c:750:watchfd_callback: watch w=0xe628caf8 wpath=/local/domain/1/backend token=3/0: event epath=/local/domain/1/backend/vif/2/0 libxl: debug: libxl_event.c:2445:libxl__nested_ao_create: ao 0xece4e7b0: nested ao, parent 0xece52130 libxl: debug: libxl_event.c:2035:libxl__ao__destroy: ao 0xece4e7b0: destroy libxl: debug: libxl_event.c:750:watchfd_callback: watch w=0xe628caf8 wpath=/local/domain/1/backend token=3/0: event epath=/local/domain/1/backend/vif/2 libxl: debug: libxl_event.c:2445:libxl__nested_ao_create: ao 0xece4e990: nested ao, parent 0xece52130 libxl: debug: libxl_event.c:2035:libxl__ao__destroy: ao 0xaaa
Re: Network driver domain broken
On 04/03/2022 14:46, Andrea Stevanato wrote: > On 3/4/2022 1:27 PM, Roger Pau Monné wrote: >> On Fri, Mar 04, 2022 at 01:05:55PM +0100, Andrea Stevanato wrote: >>> On 3/4/2022 12:52 PM, Roger Pau Monné wrote: >>>> On Thu, Mar 03, 2022 at 01:08:31PM -0500, Jason Andryuk wrote: >>>>> On Thu, Mar 3, 2022 at 11:34 AM Roger Pau Monné >>>>> wrote: >>>>>> >>>>>> On Thu, Mar 03, 2022 at 05:01:23PM +0100, Andrea Stevanato wrote: >>>>>>> On 03/03/2022 15:54, Andrea Stevanato wrote: >>>>>>>> Hi all, >>>>>>>> >>>>>>>> according to the conversation that I had with royger, aa67b97ed34 >>>>>>>> broke the driver domain support. >>>>>>>> >>>>>>>> What I'm trying to do is to setup networking between guests using >>>>>>>> driver domain. Therefore, the guest (driver) has been started with the >>>>>>>> following cfg. >>>>>>>> >>>>>>>> name = "guest0" >>>>>>>> kernel = "/media/sd-mmcblk0p1/Image" >>>>>>>> ramdisk = "/media/sd-mmcblk0p1/rootfs.cpio.gz" >>>>>>>> extra = "console=hvc0 rdinit=/sbin/init root=/dev/ram0" >>>>>>>> memory = 1024 vcpus = 2 >>>>>>>> driver_domain = 1 >>>>>>>> >>>>>>>> On guest0 I created the bridge, assigned a static IP and started the >>>>>>>> udhcpd on xenbr0 interface. 
>>>>>>>> While the second guest has been started with the following cfg: >>>>>>>> >>>>>>>> name = "guest1" >>>>>>>> kernel = "/media/sd-mmcblk0p1/Image" >>>>>>>> ramdisk = "/media/sd-mmcblk0p1/rootfs.cpio.gz" >>>>>>>> extra = "console=hvc0 rdinit=/sbin/init root=/dev/ram0" >>>>>>>> memory = 1024 vcpus = 2 >>>>>>>> vcpus = 2 >>>>>>>> vif = [ 'bridge=xenbr0, backend=guest0' ] >>>>>>>> >>>>>>>> Follows the result of strace xl devd: >>>>>>>> >>>>>>>> # strace xl devd >>>>>>>> execve("/usr/sbin/xl", ["xl", "devd"], 0xdf0420c8 /* 13 vars */) = >>>>>>>> 0 >>>>> >>>>>>>> ioctl(5, _IOC(_IOC_NONE, 0x50, 0, 0x30), 0xe6e41b40) = -1 EPERM >>>>>>>> (Operation not permitted) >>>>>>>> write(2, "libxl: ", 7libxl: ) = 7 >>>>>>>> write(2, "error: ", 7error: ) = 7 >>>>>>>> write(2, "libxl_utils.c:820:libxl_cpu_bitm"..., >>>>>>>> 87libxl_utils.c:820:libxl_cpu_bitmap_alloc: failed to retrieve the >>>>>>>> maximum number of cpus) = 87 >>>>>>>> write(2, "\n", 1 >>>>>>>> ) = 1 >>>>>>>> clone(child_stack=NULL, >>>>>>>> flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, >>>>>>>> child_tidptr=0x9ee7a0e0) = 814 >>>>>>>> wait4(814, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 814 >>>>>>>> --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=814, >>>>>>>> si_uid=0, si_status=0, si_utime=2, si_stime=2} --- >>>>> >>>>> xl devd is daemonizing, but strace is only following the first >>>>> process. Use `strace xl devd -F` to prevent the daemonizing (or >>>>> `strace -f xl devd` to follow children). >>>> >>>> Or as a first step try to see what kind of messages you get from `xl >>>> devd -F` when trying to attach a device using the driver domain. >>> >>> Nothing has changed. 
On guest0 (the driver domain): >>> >>> # xl devd -F >>> libxl: error: libxl_utils.c:820:libxl_cpu_bitmap_alloc: failed to retrieve >>> the maximum number of cpus >>> libxl: error: libxl_utils.c:820:libxl_cpu_bitmap_alloc: failed to retrieve >>> the maximum number of cpus >>> libxl: error: libxl_utils.c:820:libxl_cpu_bitmap_alloc: failed to retrieve >>> the maximum number of cpus >>&
Re: Network driver domain broken
On 3/4/2022 1:27 PM, Roger Pau Monné wrote: On Fri, Mar 04, 2022 at 01:05:55PM +0100, Andrea Stevanato wrote: On 3/4/2022 12:52 PM, Roger Pau Monné wrote: On Thu, Mar 03, 2022 at 01:08:31PM -0500, Jason Andryuk wrote: On Thu, Mar 3, 2022 at 11:34 AM Roger Pau Monné wrote: On Thu, Mar 03, 2022 at 05:01:23PM +0100, Andrea Stevanato wrote: On 03/03/2022 15:54, Andrea Stevanato wrote: Hi all, according to the conversation that I had with royger, aa67b97ed34 broke the driver domain support. What I'm trying to do is to setup networking between guests using driver domain. Therefore, the guest (driver) has been started with the following cfg. name= "guest0" kernel = "/media/sd-mmcblk0p1/Image" ramdisk = "/media/sd-mmcblk0p1/rootfs.cpio.gz" extra = "console=hvc0 rdinit=/sbin/init root=/dev/ram0" memory = 1024 vcpus = 2 driver_domain = 1 On guest0 I created the bridge, assigned a static IP and started the udhcpd on xenbr0 interface. While the second guest has been started with the following cfg: name= "guest1" kernel = "/media/sd-mmcblk0p1/Image" ramdisk = "/media/sd-mmcblk0p1/rootfs.cpio.gz" extra = "console=hvc0 rdinit=/sbin/init root=/dev/ram0" memory = 1024 vcpus = 2 vcpus = 2 vif = [ 'bridge=xenbr0, backend=guest0' ] Follows the result of strace xl devd: # strace xl devd execve("/usr/sbin/xl", ["xl", "devd"], 0xdf0420c8 /* 13 vars */) = 0 ioctl(5, _IOC(_IOC_NONE, 0x50, 0, 0x30), 0xe6e41b40) = -1 EPERM (Operation not permitted) write(2, "libxl: ", 7libxl: ) = 7 write(2, "error: ", 7error: ) = 7 write(2, "libxl_utils.c:820:libxl_cpu_bitm"..., 87libxl_utils.c:820:libxl_cpu_bitmap_alloc: failed to retrieve the maximum number of cpus) = 87 write(2, "\n", 1 ) = 1 clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x9ee7a0e0) = 814 wait4(814, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 814 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=814, si_uid=0, si_status=0, si_utime=2, si_stime=2} --- xl devd is daemonizing, but 
strace is only following the first process. Use `strace xl devd -F` to prevent the daemonizing (or `strace -f xl devd` to follow children). Or as a first step try to see what kind of messages you get from `xl devd -F` when trying to attach a device using the driver domain. Nothing has changed. On guest0 (the driver domain): # xl devd -F libxl: error: libxl_utils.c:820:libxl_cpu_bitmap_alloc: failed to retrieve the maximum number of cpus libxl: error: libxl_utils.c:820:libxl_cpu_bitmap_alloc: failed to retrieve the maximum number of cpus libxl: error: libxl_utils.c:820:libxl_cpu_bitmap_alloc: failed to retrieve the maximum number of cpus [ 696.805619] xenbr0: port 1(vif2.0) entered blocking state [ 696.810334] xenbr0: port 1(vif2.0) entered disabled state [ 696.824518] device vif2.0 entered promiscuous mode Can you use `xl -vvv devd -F` here? # xl -vvv devd -F libxl: error: libxl_utils.c:820:libxl_cpu_bitmap_alloc: failed to retrieve the maximum number of cpus libxl: error: libxl_utils.c:820:libxl_cpu_bitmap_alloc: failed to retrieve the maximum number of cpus libxl: error: libxl_utils.c:820:libxl_cpu_bitmap_alloc: failed to retrieve the maximum number of cpus libxl: debug: libxl_device.c:1749:libxl_device_events_handler: ao 0xece52130: create: how=(nil) callback=(nil) poller=0xece52430 libxl: debug: libxl_event.c:813:libxl__ev_xswatch_register: watch w=0xe628caf8 wpath=/local/domain/1/backend token=3/0: register slotnum=3 libxl: debug: libxl_device.c:1806:libxl_device_events_handler: ao 0xece52130: inprogress: poller=0xece52430, flags=i libxl: debug: libxl_event.c:750:watchfd_callback: watch w=0xe628caf8 wpath=/local/domain/1/backend token=3/0: event epath=/local/domain/1/backend libxl: debug: libxl_event.c:2445:libxl__nested_ao_create: ao 0xece51b90: nested ao, parent 0xece52130 libxl: debug: libxl_event.c:2035:libxl__ao__destroy: ao 0xece51b90: destroy libxl: debug: libxl_event.c:750:watchfd_callback: watch w=0xe628caf8 wpath=/local/domain/1/backend token=3/0: 
event epath=/local/domain/1/backend/vif/2/0 libxl: debug: libxl_event.c:2445:libxl__nested_ao_create: ao 0xece4e7b0: nested ao, parent 0xece52130 libxl: debug: libxl_event.c:2035:libxl__ao__destroy: ao 0xece4e7b0: destroy libxl: debug: libxl_event.c:750:watchfd_callback: watch w=0xe628caf8 wpath=/local/domain/1/backend token=3/0: event epath=/local/domain/1/backend/vif/2 libxl: debug: libxl_event.c:2445:libxl__nested_ao_create: ao 0xece4e990: nested ao, parent 0xece52130 libxl: debug: libxl_event.c:2035:libxl__ao__destroy: ao 0xece4e990: destroy libxl: debug: libxl_event.c:750:watchfd_callback: watch w=0xe628caf
Re: Network driver domain broken
On 3/4/2022 12:52 PM, Roger Pau Monné wrote: On Thu, Mar 03, 2022 at 01:08:31PM -0500, Jason Andryuk wrote: On Thu, Mar 3, 2022 at 11:34 AM Roger Pau Monné wrote: On Thu, Mar 03, 2022 at 05:01:23PM +0100, Andrea Stevanato wrote: On 03/03/2022 15:54, Andrea Stevanato wrote: Hi all, according to the conversation that I had with royger, aa67b97ed34 broke the driver domain support. What I'm trying to do is to setup networking between guests using driver domain. Therefore, the guest (driver) has been started with the following cfg. name= "guest0" kernel = "/media/sd-mmcblk0p1/Image" ramdisk = "/media/sd-mmcblk0p1/rootfs.cpio.gz" extra = "console=hvc0 rdinit=/sbin/init root=/dev/ram0" memory = 1024 vcpus = 2 driver_domain = 1 On guest0 I created the bridge, assigned a static IP and started the udhcpd on xenbr0 interface. While the second guest has been started with the following cfg: name= "guest1" kernel = "/media/sd-mmcblk0p1/Image" ramdisk = "/media/sd-mmcblk0p1/rootfs.cpio.gz" extra = "console=hvc0 rdinit=/sbin/init root=/dev/ram0" memory = 1024 vcpus = 2 vcpus = 2 vif = [ 'bridge=xenbr0, backend=guest0' ] Follows the result of strace xl devd: # strace xl devd execve("/usr/sbin/xl", ["xl", "devd"], 0xdf0420c8 /* 13 vars */) = 0 ioctl(5, _IOC(_IOC_NONE, 0x50, 0, 0x30), 0xe6e41b40) = -1 EPERM (Operation not permitted) write(2, "libxl: ", 7libxl: ) = 7 write(2, "error: ", 7error: ) = 7 write(2, "libxl_utils.c:820:libxl_cpu_bitm"..., 87libxl_utils.c:820:libxl_cpu_bitmap_alloc: failed to retrieve the maximum number of cpus) = 87 write(2, "\n", 1 ) = 1 clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x9ee7a0e0) = 814 wait4(814, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 814 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=814, si_uid=0, si_status=0, si_utime=2, si_stime=2} --- xl devd is daemonizing, but strace is only following the first process. 
Use `strace xl devd -F` to prevent the daemonizing (or `strace -f xl devd` to follow children). Or as a first step try to see what kind of messages you get from `xl devd -F` when trying to attach a device using the driver domain. Nothing has changed. On guest0 (the driver domain): # xl devd -F libxl: error: libxl_utils.c:820:libxl_cpu_bitmap_alloc: failed to retrieve the maximum number of cpus libxl: error: libxl_utils.c:820:libxl_cpu_bitmap_alloc: failed to retrieve the maximum number of cpus libxl: error: libxl_utils.c:820:libxl_cpu_bitmap_alloc: failed to retrieve the maximum number of cpus [ 696.805619] xenbr0: port 1(vif2.0) entered blocking state [ 696.810334] xenbr0: port 1(vif2.0) entered disabled state [ 696.824518] device vif2.0 entered promiscuous mode While on dom0: # xl network-list guest1 Idx BE Mac Addr. handle state evt-ch tx-/rx-ring-ref BE-path 0 1 00:16:3e:18:52:ac 0 6 -1-1/-1 /local/domain/1/backend/vif/2/0 The same with using strace gives the following output: # strace xl devd -F execve("/usr/sbin/xl", ["xl", "devd", "-F"], 0xeed242a0 /* 13 vars */) = 0 brk(NULL) = 0xaaab092a8000 faccessat(AT_FDCWD, "/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory) openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=7840, ...}) = 0 mmap(NULL, 7840, PROT_READ, MAP_PRIVATE, 3, 0) = 0x986e2000 close(3)= 0 openat(AT_FDCWD, "/usr/lib/libxlutil.so.4.14", O_RDONLY|O_CLOEXEC) = 3 read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0\267\0\1\0\0\0\0200\0\0\0\0\0\0"..., 832) = 832 fstat(3, {st_mode=S_IFREG|0755, st_size=68168, ...}) = 0 mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x986e mmap(NULL, 131784, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x98694000 mprotect(0x986a3000, 65536, PROT_NONE) = 0 mmap(0x986b3000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xf000) = 0x986b3000 close(3)= 0 openat(AT_FDCWD, "/usr/lib/libxenlight.so.4.14", 
O_RDONLY|O_CLOEXEC) = 3 read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0\267\0\1\0\0\0`\16\2\0\0\0\0\0"..., 832) = 832 fstat(3, {st_mode=S_IFREG|0755, st_size=861848, ...}) = 0 mmap(NULL, 925752, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x985b1000 mprotect(0x9867e000, 61440, PROT_NONE) = 0 mmap(0x9868d000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xcc000) = 0x9868d000 mmap(0x98693000, 56, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXE
Re: Network driver domain broken
On 3/3/2022 7:08 PM, Jason Andryuk wrote:
On Thu, Mar 3, 2022 at 11:34 AM Roger Pau Monné wrote:
On Thu, Mar 03, 2022 at 05:01:23PM +0100, Andrea Stevanato wrote:
On 03/03/2022 15:54, Andrea Stevanato wrote:

Hi all,

according to the conversation that I had with royger, aa67b97ed34 broke the driver domain support.

What I'm trying to do is to set up networking between guests using a driver domain. Therefore, the guest (driver) has been started with the following cfg:

name = "guest0"
kernel = "/media/sd-mmcblk0p1/Image"
ramdisk = "/media/sd-mmcblk0p1/rootfs.cpio.gz"
extra = "console=hvc0 rdinit=/sbin/init root=/dev/ram0"
memory = 1024
vcpus = 2
driver_domain = 1

On guest0 I created the bridge, assigned a static IP and started udhcpd on the xenbr0 interface. The second guest has been started with the following cfg:

name = "guest1"
kernel = "/media/sd-mmcblk0p1/Image"
ramdisk = "/media/sd-mmcblk0p1/rootfs.cpio.gz"
extra = "console=hvc0 rdinit=/sbin/init root=/dev/ram0"
memory = 1024
vcpus = 2
vcpus = 2
vif = [ 'bridge=xenbr0, backend=guest0' ]

The result of strace xl devd follows:

# strace xl devd
execve("/usr/sbin/xl", ["xl", "devd"], 0xdf0420c8 /* 13 vars */) = 0
ioctl(5, _IOC(_IOC_NONE, 0x50, 0, 0x30), 0xe6e41b40) = -1 EPERM (Operation not permitted)
write(2, "libxl: ", 7libxl: ) = 7
write(2, "error: ", 7error: ) = 7
write(2, "libxl_utils.c:820:libxl_cpu_bitm"..., 87libxl_utils.c:820:libxl_cpu_bitmap_alloc: failed to retrieve the maximum number of cpus) = 87
write(2, "\n", 1 ) = 1
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x9ee7a0e0) = 814
wait4(814, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 814
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=814, si_uid=0, si_status=0, si_utime=2, si_stime=2} ---

xl devd is daemonizing, but strace is only following the first process. Use `strace xl devd -F` to prevent the daemonizing (or `strace -f xl devd` to follow children).

Sorry, I have not read this part.

# strace xl devd -F
execve("/usr/sbin/xl", ["xl", "devd", "-F"], 0xc53b6e50 /* 13 vars */) = 0
brk(NULL) = 0xaaab058a
faccessat(AT_FDCWD, "/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=7840, ...}) = 0
mmap(NULL, 7840, PROT_READ, MAP_PRIVATE, 3, 0) = 0x833c7000
close(3) = 0
openat(AT_FDCWD, "/usr/lib/libxlutil.so.4.14", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0\267\0\1\0\0\0\0200\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=68168, ...}) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x833c5000
mmap(NULL, 131784, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x83379000
mprotect(0x83388000, 65536, PROT_NONE) = 0
mmap(0x83398000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xf000) = 0x83398000
close(3) = 0
openat(AT_FDCWD, "/usr/lib/libxenlight.so.4.14", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0\267\0\1\0\0\0`\16\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=861848, ...}) = 0
mmap(NULL, 925752, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x83296000
mprotect(0x83363000, 61440, PROT_NONE) = 0
mmap(0x83372000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xcc000) = 0x83372000
mmap(0x83378000, 56, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x83378000
close(3) = 0
openat(AT_FDCWD, "/usr/lib/libxentoollog.so.1", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0\267\0\1\0\0\0P\r\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=10368, ...}) = 0
mmap(NULL, 73904, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x83283000
mprotect(0x83285000, 61440, PROT_NONE) = 0
mmap(0x83294000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1000) = 0x83294000
close(3) = 0
openat(AT_FDCWD, "/usr/lib/libyajl.so.2", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0\267\0\1\0\0\0\320\22\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=38728, ...}) = 0
mmap(NULL, 102416, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x83269000
mprotect(0x83272000, 61
Re: Network driver domain broken
On 03/03/2022 19:08, Jason Andryuk wrote:
On Thu, Mar 3, 2022 at 11:34 AM Roger Pau Monné wrote:
On Thu, Mar 03, 2022 at 05:01:23PM +0100, Andrea Stevanato wrote:
On 03/03/2022 15:54, Andrea Stevanato wrote:

Hi all,

according to the conversation that I had with royger, aa67b97ed34 broke the driver domain support.

What I'm trying to do is to set up networking between guests using a driver domain. Therefore, the guest (driver) has been started with the following cfg:

name = "guest0"
kernel = "/media/sd-mmcblk0p1/Image"
ramdisk = "/media/sd-mmcblk0p1/rootfs.cpio.gz"
extra = "console=hvc0 rdinit=/sbin/init root=/dev/ram0"
memory = 1024
vcpus = 2
driver_domain = 1

On guest0 I created the bridge, assigned a static IP and started udhcpd on the xenbr0 interface. The second guest has been started with the following cfg:

name = "guest1"
kernel = "/media/sd-mmcblk0p1/Image"
ramdisk = "/media/sd-mmcblk0p1/rootfs.cpio.gz"
extra = "console=hvc0 rdinit=/sbin/init root=/dev/ram0"
memory = 1024
vcpus = 2
vcpus = 2
vif = [ 'bridge=xenbr0, backend=guest0' ]

The result of strace xl devd follows:

# strace xl devd
execve("/usr/sbin/xl", ["xl", "devd"], 0xdf0420c8 /* 13 vars */) = 0
ioctl(5, _IOC(_IOC_NONE, 0x50, 0, 0x30), 0xe6e41b40) = -1 EPERM (Operation not permitted)
write(2, "libxl: ", 7libxl: ) = 7
write(2, "error: ", 7error: ) = 7
write(2, "libxl_utils.c:820:libxl_cpu_bitm"..., 87libxl_utils.c:820:libxl_cpu_bitmap_alloc: failed to retrieve the maximum number of cpus) = 87
write(2, "\n", 1 ) = 1
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x9ee7a0e0) = 814
wait4(814, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 814
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=814, si_uid=0, si_status=0, si_utime=2, si_stime=2} ---

xl devd is daemonizing, but strace is only following the first process. Use `strace xl devd -F` to prevent the daemonizing (or `strace -f xl devd` to follow children).

close(6) = 0
close(5) = 0
munmap(0x9f45f000, 4096) = 0
close(7) = 0
close(10) = 0
close(9) = 0
close(8) = 0
close(11) = 0
close(3) = 0
close(4) = 0
exit_group(0) = ?
+++ exited with 0 +++

royger told me that it is a BUG and not an issue with my setup. Therefore here I am.

Just a bit more context: AFAICT the calls to libxl_cpu_bitmap_alloc in parse_global_config will prevent xl from being usable on anything other than the control domain (because the sysctl is only available to privileged domains). This is an issue for 'xl devd', as it won't start anymore.

These look non-fatal at first glance?

Regards,
Jason

Well, actually, this prevents me from creating network driver domains for inter-guest networking (no passthrough is required since the guests do not need to reach outside).

Cheers,
Andrea
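
As an aside for anyone reading a long trace like the ones in this thread: the interesting line is the `ioctl` that returns EPERM, and it is easy to lose among the library-loading calls. The following is a minimal sketch of a filter that lists failed syscalls from strace text. The function name `failed_syscalls` and the default choice to ignore ENOENT are assumptions made for this illustration, not part of strace or xl.

```python
import re

# Matches strace lines of the form:  name(args...) = -1 ERRNO (description)
FAILED_CALL = re.compile(
    r"^(?P<call>\w+)\(.*\)\s*=\s*-1\s+(?P<errno>E[A-Z0-9]+)\s+\((?P<desc>[^)]*)\)\s*$"
)


def failed_syscalls(trace_lines, ignore=("ENOENT",)):
    """Return (syscall, errno, description) for each failed call.

    Errors listed in `ignore` are skipped; ENOENT is ignored by default
    because the dynamic loader routinely probes for files that are absent.
    """
    hits = []
    for line in trace_lines:
        match = FAILED_CALL.match(line.strip())
        if match and match.group("errno") not in ignore:
            hits.append(
                (match.group("call"), match.group("errno"), match.group("desc"))
            )
    return hits


# Three lines taken from the traces quoted in this thread.
trace = [
    'faccessat(AT_FDCWD, "/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)',
    "ioctl(5, _IOC(_IOC_NONE, 0x50, 0, 0x30), 0xe6e41b40) = -1 EPERM (Operation not permitted)",
    "close(3) = 0",
]
print(failed_syscalls(trace))
```

Run against a full log (for example one captured with `strace -f -o`), this narrows the output to the permission failure that precedes the libxl_cpu_bitmap_alloc error message.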
Re: Network driver domain broken
On 03/03/2022 15:54, Andrea Stevanato wrote:

Hi all,

according to the conversation that I had with royger, aa67b97ed34 broke the driver domain support.

What I'm trying to do is to set up networking between guests using a driver domain. Therefore, the guest (driver) has been started with the following cfg:

name = "guest0"
kernel = "/media/sd-mmcblk0p1/Image"
ramdisk = "/media/sd-mmcblk0p1/rootfs.cpio.gz"
extra = "console=hvc0 rdinit=/sbin/init root=/dev/ram0"
memory = 1024
vcpus = 2
driver_domain = 1

On guest0 I created the bridge, assigned a static IP and started udhcpd on the xenbr0 interface. The second guest has been started with the following cfg:

name = "guest1"
kernel = "/media/sd-mmcblk0p1/Image"
ramdisk = "/media/sd-mmcblk0p1/rootfs.cpio.gz"
extra = "console=hvc0 rdinit=/sbin/init root=/dev/ram0"
memory = 1024
vcpus = 2
vcpus = 2
vif = [ 'bridge=xenbr0, backend=guest0' ]

The result of strace xl devd follows:

# strace xl devd
execve("/usr/sbin/xl", ["xl", "devd"], 0xdf0420c8 /* 13 vars */) = 0
brk(NULL) = 0xeaf3b000
faccessat(AT_FDCWD, "/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=7840, ...}) = 0
mmap(NULL, 7840, PROT_READ, MAP_PRIVATE, 3, 0) = 0x9f45e000
close(3) = 0
openat(AT_FDCWD, "/usr/lib/libxlutil.so.4.14", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0\267\0\1\0\0\0\0200\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=68168, ...}) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x9f45c000
mmap(NULL, 131784, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x9f41
mprotect(0x9f41f000, 65536, PROT_NONE) = 0
mmap(0x9f42f000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xf000) = 0x9f42f000
close(3) = 0
openat(AT_FDCWD, "/usr/lib/libxenlight.so.4.14", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0\267\0\1\0\0\0`\16\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=861848, ...}) = 0
mmap(NULL, 925752, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x9f32d000
mprotect(0x9f3fa000, 61440, PROT_NONE) = 0
mmap(0x9f409000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xcc000) = 0x9f409000
mmap(0x9f40f000, 56, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x9f40f000
close(3) = 0
openat(AT_FDCWD, "/usr/lib/libxentoollog.so.1", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0\267\0\1\0\0\0P\r\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=10368, ...}) = 0
mmap(NULL, 73904, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x9f31a000
mprotect(0x9f31c000, 61440, PROT_NONE) = 0
mmap(0x9f32b000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1000) = 0x9f32b000
close(3) = 0
openat(AT_FDCWD, "/usr/lib/libyajl.so.2", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0\267\0\1\0\0\0\320\22\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=38728, ...}) = 0
mmap(NULL, 102416, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x9f30
mprotect(0x9f309000, 61440, PROT_NONE) = 0
mmap(0x9f318000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x8000) = 0x9f318000
close(3) = 0
openat(AT_FDCWD, "/lib/libpthread.so.0", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0\267\0\1\0\0\0\300j\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=113184, ...}) = 0
mmap(NULL, 192872, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x9f2d
mprotect(0x9f2ea000, 65536, PROT_NONE) = 0
mmap(0x9f2fa000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1a000) = 0x9f2fa000
mmap(0x9f2fc000, 12648, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x9f2fc000
close(3) = 0
openat(AT_FDCWD, "/lib/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0\267\0\1\0\0\0\320I\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1428872, ...}) = 0
mmap(NULL, 1502000, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x9f161000
mprotect(0x9f2b8000, 61440, PROT_NONE) = 0
mmap(0x9f2c7000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_F
Network driver domain broken
Hi all,

according to the conversation that I had with royger, aa67b97ed34 broke the driver domain support.

What I'm trying to do is to set up networking between guests using a driver domain. Therefore, the guest (driver) has been started with the following cfg:

name = "guest0"
kernel = "/media/sd-mmcblk0p1/Image"
ramdisk = "/media/sd-mmcblk0p1/rootfs.cpio.gz"
extra = "console=hvc0 rdinit=/sbin/init root=/dev/ram0"
memory = 1024
vcpus = 2
driver_domain = 1

On guest0 I created the bridge, assigned a static IP and started udhcpd on the xenbr0 interface. The second guest has been started with the following cfg:

name = "guest1"
kernel = "/media/sd-mmcblk0p1/Image"
ramdisk = "/media/sd-mmcblk0p1/rootfs.cpio.gz"
extra = "console=hvc0 rdinit=/sbin/init root=/dev/ram0"
memory = 1024
vcpus = 2
vcpus = 2
vif = [ 'bridge=xenbr0, backend=guest0' ]

The result of strace xl devd follows:

# strace xl devd
execve("/usr/sbin/xl", ["xl", "devd"], 0xdf0420c8 /* 13 vars */) = 0
brk(NULL) = 0xeaf3b000
faccessat(AT_FDCWD, "/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=7840, ...}) = 0
mmap(NULL, 7840, PROT_READ, MAP_PRIVATE, 3, 0) = 0x9f45e000
close(3) = 0
openat(AT_FDCWD, "/usr/lib/libxlutil.so.4.14", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0\267\0\1\0\0\0\0200\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=68168, ...}) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x9f45c000
mmap(NULL, 131784, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x9f41
mprotect(0x9f41f000, 65536, PROT_NONE) = 0
mmap(0x9f42f000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xf000) = 0x9f42f000
close(3) = 0
openat(AT_FDCWD, "/usr/lib/libxenlight.so.4.14", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0\267\0\1\0\0\0`\16\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=861848, ...}) = 0
mmap(NULL, 925752, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x9f32d000
mprotect(0x9f3fa000, 61440, PROT_NONE) = 0
mmap(0x9f409000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xcc000) = 0x9f409000
mmap(0x9f40f000, 56, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x9f40f000
close(3) = 0
openat(AT_FDCWD, "/usr/lib/libxentoollog.so.1", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0\267\0\1\0\0\0P\r\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=10368, ...}) = 0
mmap(NULL, 73904, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x9f31a000
mprotect(0x9f31c000, 61440, PROT_NONE) = 0
mmap(0x9f32b000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1000) = 0x9f32b000
close(3) = 0
openat(AT_FDCWD, "/usr/lib/libyajl.so.2", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0\267\0\1\0\0\0\320\22\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=38728, ...}) = 0
mmap(NULL, 102416, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x9f30
mprotect(0x9f309000, 61440, PROT_NONE) = 0
mmap(0x9f318000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x8000) = 0x9f318000
close(3) = 0
openat(AT_FDCWD, "/lib/libpthread.so.0", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0\267\0\1\0\0\0\300j\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=113184, ...}) = 0
mmap(NULL, 192872, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x9f2d
mprotect(0x9f2ea000, 65536, PROT_NONE) = 0
mmap(0x9f2fa000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1a000) = 0x9f2fa000
mmap(0x9f2fc000, 12648, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x9f2fc000
close(3) = 0
openat(AT_FDCWD, "/lib/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0\267\0\1\0\0\0\320I\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1428872, ...}) = 0
mmap(NULL, 1502000, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x9f161000
mprotect(0x9f2b8000, 61440, PROT_NONE) = 0
mmap(0x9f2c7000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x156000) = 0x9f2c7000
mmap(0x9f2cd000, 11056, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x9f2cd000
close(3) = 0
openat(AT_FDCWD, "/usr/lib/libxenevtchn.so.1", O_RDONLY|O_CLOEXEC) = 3