Re: [Xen-devel] reboot driver domain, vifX.Y = NO-CARRIER?
Hi Jason,

On Sun, May 6, 2018 at 11:45 AM, Jason Cooper wrote:
> Hi Marek,
>
> On Sat, May 05, 2018 at 01:03:15AM +0200, Marek Marczykowski-Górecki wrote:
>> On Fri, May 04, 2018 at 06:13:25PM -0400, Rich Persaud wrote:
>> > > On May 1, 2018, at 08:53, Jason Cooper wrote:
>> > >
>> > > add the link to xen-users thread of me talking to myself. :-))
>> > >
>> > >> On Tue, May 01, 2018 at 12:37:51PM +, Jason Cooper wrote:
>> > >> When I was first digging into this, I started a thread on xen-users [1],
>> > >> I've attached my xl-reboot.sh script here so you can see exactly what
>> > >> I'm attempting to do:
>> > >
>> > > [1] https://marc.info/?l=xen-users&m=152389443206023&w=2
>> >
>> > You may want to look at the code (toolstack and/or frontend-backend
>> > drivers) for Qubes and OpenXT, both of which use network driver
>> > domains and support wired/wireless networks.
>> >
>> > Operational restart of a measured, non-persistent driver domain
>> > (instead of host) is a benefit of Xen disaggregation architectures.
>>
>> In Qubes, on backend restart, we do equivalent of xl network-detach &&
>> xl network-attach (as you do in xl-reboot.sh). xl itself doesn't provide
>> any place to plug such script, but we use libvirt which provides events.
>> Also, we have full control over domain config (libvirt XML), so don't
>> need to extract vif list from xenstore...

OpenXT does the xl network-detach && xl network-attach in its own daemon:
https://github.com/OpenXT/network/blob/master/nwd/Main.hs#L767

>> The problem you describe looks related to
>> https://lkml.org/lkml/2018/2/28/289, but fix is included in 4.16...
>> There was also related libxl patch:
>> https://xen.markmail.org/thread/6qbgmwyjqsshjus7
>> (but it applies to the case where you first shutdown backend and only
>> then do xl network-detach)
>>
>> Do you have xl devd running in your driver domain? Without that xl
>> network-attach won't work (AFAIR udev isn't used here anymore).
>
> Yes, I've now modified the init script (xendomains in Gentoo) to create
> a key /tool/vmstatus/$domname/status, start the domU, loop until it gets
> its domid, and -chmod the key. It then does a -watch on that key. In
> the domU, *after* xl devd is started, it writes "online" to that key.
>
> This allows me to automatically bring up the driver domains, and make
> sure they're ready for connections before proceeding to booting the next
> VM. This only occurs when the host boots.
>
> After the driver domains are up, the rest of the domains are started in
> parallel.
>
>> Also note that backend shutdown/restart/crash was a source of many
>> problems in frontend kernel and toolstack in the past. Even simple
>> dynamic network-attach/detach sometimes is problematic for the frontend.
>> Links:
>> https://github.com/QubesOS/qubes-issues/issues/3657 (frontend kernel
>> problem)
>> https://github.com/QubesOS/qubes-issues/issues/1426 (toolstack problem,
>> + libvirt)
>> https://github.com/QubesOS/qubes-issues/issues/975 (frontend kernel
>> problem)
>
> Mmm, clearly the state machine and its implementation need some
> review. I'm building v4.16.7 and we'll see how it goes for my use case.

OpenXT has some patches for reconnecting netfront after the netback
domain is rebooted to a new domid:
https://github.com/OpenXT/xenclient-oe/blob/master/recipes-kernel/linux/4.14/patches/netfront-support-backend-relocate.patch
https://github.com/OpenXT/xenclient-oe/blob/master/recipes-kernel/linux/4.14/patches/xenbus-move-otherend-watches-on-relocate.patch

I'm not too familiar with those, so they may be specific to the OpenXT
networking code.

Jason, when you see the vif NO-CARRIER, how do the frontend and backend
XenStore entries look? Do the domids match up and is the pair in state 4
-> XenbusStateConnected?

Regards,
Jason

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
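[Editor's sketch] The question above (do the domids match, and are both ends in state 4?) can be checked from dom0 with something like the following. It is a minimal sketch, not code from the thread: it assumes the usual xenstore layout for vif devices (/local/domain/<domid>/device/vif/<devid> for the frontend, /local/domain/<domid>/backend/vif/<frontend domid>/<devid> for the backend), and XS_READ is a hypothetical override hook added only so the functions can be exercised without a Xen host.

```shell
#!/bin/bash
# Map a numeric XenbusState value to its name (enum xenbus_state).
xenbus_state_name() {
    case "$1" in
        0) echo "Unknown" ;;
        1) echo "Initialising" ;;
        2) echo "InitWait" ;;
        3) echo "Initialised" ;;
        4) echo "Connected" ;;
        5) echo "Closing" ;;
        6) echo "Closed" ;;
        7) echo "Reconfiguring" ;;
        8) echo "Reconfigured" ;;
        *) echo "invalid($1)" ;;
    esac
}

# Print the state of both ends of one vif.
# Arguments: frontend domid, device id, backend (driver domain) domid.
# XS_READ is a hypothetical hook; it defaults to the real xenstore-read.
vif_states() {
    local front_domid="$1" devid="$2" back_domid="$3"
    local xs_read="${XS_READ:-xenstore-read}"
    local fe="/local/domain/$front_domid/device/vif/$devid"
    local be="/local/domain/$back_domid/backend/vif/$front_domid/$devid"
    local fe_state be_state
    fe_state="$($xs_read "$fe/state")" || return 1
    be_state="$($xs_read "$be/state")" || return 1
    echo "frontend backend-id=$($xs_read "$fe/backend-id") state=$fe_state ($(xenbus_state_name "$fe_state"))"
    echo "backend  frontend-id=$($xs_read "$be/frontend-id") state=$be_state ($(xenbus_state_name "$be_state"))"
}
```

For the interface quoted later in the thread (vif20.1 inside the driver domain, i.e. frontend domid 20, device 1), the call would be `vif_states 20 1 <driver domain domid>`. A healthy pair reports Connected on both lines, and the frontend's backend-id equals the driver domain's current domid.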
Re: [Xen-devel] reboot driver domain, vifX.Y = NO-CARRIER?
Hi Marek,

On Sat, May 05, 2018 at 01:03:15AM +0200, Marek Marczykowski-Górecki wrote:
> On Fri, May 04, 2018 at 06:13:25PM -0400, Rich Persaud wrote:
> > > On May 1, 2018, at 08:53, Jason Cooper wrote:
> > >
> > > add the link to xen-users thread of me talking to myself. :-))
> > >
> > >> On Tue, May 01, 2018 at 12:37:51PM +, Jason Cooper wrote:
> > >> When I was first digging into this, I started a thread on xen-users [1],
> > >> I've attached my xl-reboot.sh script here so you can see exactly what
> > >> I'm attempting to do:
> > >
> > > [1] https://marc.info/?l=xen-users&m=152389443206023&w=2
> >
> > You may want to look at the code (toolstack and/or frontend-backend
> > drivers) for Qubes and OpenXT, both of which use network driver
> > domains and support wired/wireless networks.
> >
> > Operational restart of a measured, non-persistent driver domain
> > (instead of host) is a benefit of Xen disaggregation architectures.
>
> In Qubes, on backend restart, we do equivalent of xl network-detach &&
> xl network-attach (as you do in xl-reboot.sh). xl itself doesn't provide
> any place to plug such script, but we use libvirt which provides events.
> Also, we have full control over domain config (libvirt XML), so don't
> need to extract vif list from xenstore...
>
> The problem you describe looks related to
> https://lkml.org/lkml/2018/2/28/289, but fix is included in 4.16...
> There was also related libxl patch:
> https://xen.markmail.org/thread/6qbgmwyjqsshjus7
> (but it applies to the case where you first shutdown backend and only
> then do xl network-detach)
>
> Do you have xl devd running in your driver domain? Without that xl
> network-attach won't work (AFAIR udev isn't used here anymore).

Yes, I've now modified the init script (xendomains in Gentoo) to create
a key /tool/vmstatus/$domname/status, start the domU, loop until it gets
its domid, and -chmod the key. It then does a -watch on that key. In
the domU, *after* xl devd is started, it writes "online" to that key.

This allows me to automatically bring up the driver domains, and make
sure they're ready for connections before proceeding to booting the next
VM. This only occurs when the host boots.

After the driver domains are up, the rest of the domains are started in
parallel.

> Also note that backend shutdown/restart/crash was a source of many
> problems in frontend kernel and toolstack in the past. Even simple
> dynamic network-attach/detach sometimes is problematic for the frontend.
> Links:
> https://github.com/QubesOS/qubes-issues/issues/3657 (frontend kernel
> problem)
> https://github.com/QubesOS/qubes-issues/issues/1426 (toolstack problem,
> + libvirt)
> https://github.com/QubesOS/qubes-issues/issues/975 (frontend kernel
> problem)

Mmm, clearly the state machine and its implementation need some
review. I'm building v4.16.7 and we'll see how it goes for my use case.

Thanks for all the pointers!

thx,

Jason.
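[Editor's sketch] The readiness handshake described above could look roughly like this on the dom0 side. This is not the author's init script. It polls with xenstore-read instead of using a watch, the permission string passed to xenstore-chmod is an assumption, and XS_READ / POLL_INTERVAL are hypothetical hooks added so the loop can be tested without Xen.

```shell
#!/bin/bash
# Poll a xenstore key until it reads "online".
# Arguments: key, maximum number of polls (default 60).
# XS_READ (default: xenstore-read) and POLL_INTERVAL (default: 1 second)
# are hypothetical hooks, not part of any Xen tool.
wait_for_online() {
    local key="$1" max_polls="${2:-60}" polls=0
    local xs_read="${XS_READ:-xenstore-read}"
    while [ "$polls" -lt "$max_polls" ]; do
        if [ "$($xs_read "$key" 2>/dev/null)" = "online" ]; then
            return 0
        fi
        sleep "${POLL_INTERVAL:-1}"
        polls=$((polls + 1))
    done
    return 1
}

# Intended use (commented out; needs a Xen host):
#   key="/tool/vmstatus/$domname/status"
#   xenstore-write "$key" ""
#   xl create "/etc/xen/$domname.cfg"
#   domid="$(xl domid "$domname")"
#   xenstore-chmod "$key" n0 "b$domid"   # assumed: dom0 owns, domU may read/write
#   wait_for_online "$key" 120 || echo >&2 "WARN: $domname not ready"
# Inside the driver domain, after xl devd has started:
#   xenstore-write "/tool/vmstatus/$domname/status" online
```

Polling keeps the sketch simple; a watch, as in the original description, avoids the fixed delay between checks.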
Re: [Xen-devel] reboot driver domain, vifX.Y = NO-CARRIER?
On Fri, May 04, 2018 at 06:13:25PM -0400, Rich Persaud wrote:
> > On May 1, 2018, at 08:53, Jason Cooper wrote:
> >
> > add the link to xen-users thread of me talking to myself. :-))
> >
> >> On Tue, May 01, 2018 at 12:37:51PM +, Jason Cooper wrote:
> >> When I was first digging into this, I started a thread on xen-users [1],
> >> I've attached my xl-reboot.sh script here so you can see exactly what
> >> I'm attempting to do:
> >
> > [1] https://marc.info/?l=xen-users&m=152389443206023&w=2
>
> You may want to look at the code (toolstack and/or frontend-backend drivers)
> for Qubes and OpenXT, both of which use network driver domains and support
> wired/wireless networks.
>
> Operational restart of a measured, non-persistent driver domain (instead of
> host) is a benefit of Xen disaggregation architectures.

In Qubes, on backend restart, we do equivalent of xl network-detach &&
xl network-attach (as you do in xl-reboot.sh). xl itself doesn't provide
any place to plug such script, but we use libvirt which provides events.
Also, we have full control over domain config (libvirt XML), so don't
need to extract vif list from xenstore...

The problem you describe looks related to
https://lkml.org/lkml/2018/2/28/289, but fix is included in 4.16...
There was also related libxl patch:
https://xen.markmail.org/thread/6qbgmwyjqsshjus7
(but it applies to the case where you first shutdown backend and only
then do xl network-detach)

Do you have xl devd running in your driver domain? Without that xl
network-attach won't work (AFAIR udev isn't used here anymore).

Also note that backend shutdown/restart/crash was a source of many
problems in frontend kernel and toolstack in the past. Even simple
dynamic network-attach/detach sometimes is problematic for the frontend.
Links:
https://github.com/QubesOS/qubes-issues/issues/3657 (frontend kernel
problem)
https://github.com/QubesOS/qubes-issues/issues/1426 (toolstack problem,
+ libvirt)
https://github.com/QubesOS/qubes-issues/issues/975 (frontend kernel
problem)

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
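[Editor's sketch] The "xl network-detach && xl network-attach" equivalent mentioned above, written as a small helper. It is only an illustration of the sequence, not Qubes or OpenXT code; the argument order is an arbitrary choice and RUN is a hypothetical hook (set RUN=echo for a dry run that prints the commands instead of executing them).

```shell
#!/bin/bash
# Re-plug one vif after the backend (driver domain) has come back up:
# detach the stale device, then attach a fresh one against the backend name.
# Arguments: frontend domain, device id, MAC, bridge, backend domain name.
# RUN is a hypothetical hook; leave it unset to really call xl.
replug_vif() {
    local front="$1" devid="$2" mac="$3" bridge="$4" backend="$5"
    ${RUN:-} xl network-detach "$front" "$devid" &&
    ${RUN:-} xl network-attach "$front" "mac=$mac" "bridge=$bridge" "backend=$backend"
}

# Dry run:
#   RUN=echo replug_vif A 0 00:16:3e:00:00:01 br10 drv_dom
```

As noted above, xl network-attach only works once xl devd is running in the driver domain, so the helper should be called after the backend reports ready.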
Re: [Xen-devel] reboot driver domain, vifX.Y = NO-CARRIER?
> On May 1, 2018, at 08:53, Jason Cooper wrote:
>
> add the link to xen-users thread of me talking to myself. :-))
>
>> On Tue, May 01, 2018 at 12:37:51PM +, Jason Cooper wrote:
>> When I was first digging into this, I started a thread on xen-users [1],
>> I've attached my xl-reboot.sh script here so you can see exactly what
>> I'm attempting to do:
>
> [1] https://marc.info/?l=xen-users&m=152389443206023&w=2

You may want to look at the code (toolstack and/or frontend-backend drivers)
for Qubes and OpenXT, both of which use network driver domains and support
wired/wireless networks.

Operational restart of a measured, non-persistent driver domain (instead of
host) is a benefit of Xen disaggregation architectures.

Rich
Re: [Xen-devel] reboot driver domain, vifX.Y = NO-CARRIER?
add the link to xen-users thread of me talking to myself. :-))

On Tue, May 01, 2018 at 12:37:51PM +, Jason Cooper wrote:
> When I was first digging into this, I started a thread on xen-users [1],
> I've attached my xl-reboot.sh script here so you can see exactly what
> I'm attempting to do:

[1] https://marc.info/?l=xen-users&m=152389443206023&w=2
Re: [Xen-devel] reboot driver domain, vifX.Y = NO-CARRIER?
Morning George, On Tue, May 01, 2018 at 11:25:06AM +0100, George Dunlap wrote: > On Mon, Apr 30, 2018 at 7:17 PM, Jason Cooper <x...@lakedaemon.net> wrote: > > On Mon, Apr 30, 2018 at 05:38:55PM +0100, George Dunlap wrote: > >> On Mon, Apr 30, 2018 at 5:16 PM, Jason Cooper <x...@lakedaemon.net> wrote: > >> > On Mon, Apr 30, 2018 at 04:22:30PM +0100, Ian Jackson wrote: > >> >> Wei Liu writes ("Re: [Xen-devel] reboot driver domain, vifX.Y = > >> >> NO-CARRIER?"): > >> >> > To implement reuse_domid in a sane way, either the toolstack needs to > >> >> > manage all domids and always sets domid when creating domain or the > >> >> > hypervisor needs to cooperate -- to have interface to reserve / > >> >> > pre-allocate domids. > >> >> > >> >> I think this is entirely the wrong approach. > >> > > >> > Whew. Glad I didn't start hacking yet... > >> > > >> >> I think the right answer is that this is simply a bug in the > >> >> frontends. frontends should cope if the backend path pointer in the > >> >> frontend directory is updated, and should start reading the new > >> >> backend instead. > >> > > >> > Ok, so I'm new to the guts of Xen. The bug, at a high level, is that > >> > "When a driver domain is rebooted (domid changed), previously connected > >> > client domUs can't gain network connectivity to/through the driver > >> > domain via 'xl network-attach client_domu mac=... bridge=... > >> > backend=drv_dom'" > >> > >> Hang on -- just to clarify, something like the following doesn't work > >> (or wouldn't, you suspect, work)? > >> > >> * Start driver domain > >> * Start domU A with no network > > > > My setup is different here. I include the vif = [... backend=...] > > declaration in my domain config. > > > >> * xl network-attach A backend=drv_dom > > > > So I don't do this step manually. 
>
> Right, but you do the detach manually (as well as the subsequent
> attach after the driver domain
>
> >
> >> * [do some stuff]
> >> * xl network-detach A [network devid]
> >> * Restart driver domain
> >> * xl network-attach A backend=drv_dom
> [snip]
> > Sorry, I get NO-CARRIER in the just rebooted driver domain. And the
> > interface is still UP in domU A.
>
> Wait, that sounds like a different problem than the one we thought you
> were talking about. You're saying that the driver domain is losing
> connection to the *physical* network after reboot?

No, this has nothing to do with the physical nic that is pci-passthrough'd.
It's as my subject line says: vifX.Y gets NO-CARRIER. Here's a snippet
from 'ip link':

12: vif20.1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq master br10 qlen 32
    link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff

> So what happens if you do the following:
>
> * Boot your driver domain (but don't connect any guests)
> * From your driver domain, ping an off-host IP
> * Reboot the driver domain
> * Try pinging an off-host IP again
>
> It sounds like maybe the second ping will fail?

I assume this is for debugging the (hopefully clarified) non-existent
problem with pci-passthrough. fwiw, this particular driver domain is in
the middle of the diagram I did earlier in the thread. It's a netfront
client to a driver domain which does have the pci-passthrough.

When I was first digging into this, I started a thread on xen-users [1],
I've attached my xl-reboot.sh script here so you can see exactly what
I'm attempting to do:

--->8
#!/bin/bash

if [ $# -ne 1 ]; then
        echo >&2 "Usage: ${0##*/} domain"
        exit 1
fi

DOM="$1"

# get the domain id
DOMID="`xl domid $DOM`"
[[ "$DOMID" =~ (^[0-9]+$) ]] || exit 1

tmp="`mktemp`"

# loop through frontends
while read frontend <&4; do
        while read vif <&5; do
                if [ "x$vif" = "x" ]; then
                        # stale frontend
                        echo >&2 "WARN: stale frontend ($frontend), removing"
                        xenstore-rm /local/domain/$DOMID/backend/vif/$frontend
                        continue
                fi

                # store info for afterwards
                front="`xl domname $frontend`"
                bridge="`xenstore-read /local/domain/$DOMID/backend/vif/$frontend/$vif/bridge`"
                if [ "x$front"
[...]
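[Editor's sketch] To spot the symptom shown in the 'ip link' snippet above across all backend interfaces, a filter along these lines could be run inside the driver domain. It is a minimal sketch that only inspects the flag list of interfaces named vifX.Y; the function name is hypothetical.

```shell
#!/bin/bash
# List vifX.Y interfaces whose flags include NO-CARRIER.
# Reads `ip link` (or `ip -o link`) output on stdin.
no_carrier_vifs() {
    awk -F': ' '
        $2 ~ /^vif[0-9]+\.[0-9]+/ && $3 ~ /^<[^>]*NO-CARRIER/ {
            sub(/@.*/, "", $2)   # drop any "@peer" suffix from the name
            print $2
        }'
}

# Usage inside the driver domain:
#   ip -o link | no_carrier_vifs
```

In vifX.Y, X is the frontend's domid and Y the device id, so each name printed identifies which guest device needs to be re-plugged.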
Re: [Xen-devel] reboot driver domain, vifX.Y = NO-CARRIER?
On Mon, Apr 30, 2018 at 04:16:09PM +, Jason Cooper wrote:
> Hi Ian,
>
> On Mon, Apr 30, 2018 at 04:22:30PM +0100, Ian Jackson wrote:
> > Wei Liu writes ("Re: [Xen-devel] reboot driver domain, vifX.Y =
> > NO-CARRIER?"):
> > > To implement reuse_domid in a sane way, either the toolstack needs to
> > > manage all domids and always sets domid when creating domain or the
> > > hypervisor needs to cooperate -- to have interface to reserve /
> > > pre-allocate domids.
> >
> > I think this is entirely the wrong approach.
>
> Whew. Glad I didn't start hacking yet...
>
> > I think the right answer is that this is simply a bug in the
> > frontends. frontends should cope if the backend path pointer in the
> > frontend directory is updated, and should start reading the new
> > backend instead.
>
> Ok, so I'm new to the guts of Xen. The bug, at a high level, is that
> "When a driver domain is rebooted (domid changed), previously connected
> client domUs can't gain network connectivity to/through the driver
> domain via 'xl network-attach client_domu mac=... bridge=...
> backend=drv_dom'"

This seems to be different from what I originally understood. I thought
you were just expecting the frontend to reconnect automatically.

At the risk of asking the obvious question: drv_dom is the name, not the
numeric domid, right?

>
> This is due to the fact that the frontend net driver doesn't / can't
> follow the backend driver to the new domid in xenstore.
>

This is strange. A new udev event should be initiated in DomU. It will
then scan xenstore for a _new_ network device. There should be a new
device from DomU's PoV, which means it doesn't need to know what the
backend domid is. This should already be handled by the core xenbus
driver.

Also "backend-id" is already in a device's xenstore tree.

Wei.
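[Editor's sketch] Since "backend-id" is stored in the frontend's device tree, one way to see whether a frontend still points at the driver domain's old domid is to compare that value with what `xl domid` currently reports for the backend's name. The helper below is an illustration only; the xenstore path follows the usual vif layout, and XS_READ / XL are hypothetical hooks for testing without Xen.

```shell
#!/bin/bash
# Return 0 (true) if the frontend device's backend-id differs from the
# current domid of the named driver domain, 1 if they match, 2 on lookup
# failure.
# Arguments: frontend domid, device id, driver domain name.
# XS_READ (default: xenstore-read) and XL (default: xl) are hypothetical hooks.
backend_is_stale() {
    local front_domid="$1" devid="$2" backend_name="$3"
    local xs_read="${XS_READ:-xenstore-read}" xl="${XL:-xl}"
    local recorded current
    recorded="$($xs_read "/local/domain/$front_domid/device/vif/$devid/backend-id")" || return 2
    current="$($xl domid "$backend_name")" || return 2
    [ "$recorded" != "$current" ]
}

# Usage in dom0:
#   if backend_is_stale 7 0 drv_dom; then
#       echo "vif 0 of domain 7 still points at the old backend domid"
#   fi
```

After a clean network-detach and network-attach the check should report a match; a mismatch suggests the old device entry was never torn down.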
Re: [Xen-devel] reboot driver domain, vifX.Y = NO-CARRIER?
On Fri, Apr 27, 2018 at 05:27:29PM +, Jason Cooper wrote:
> Hi Wei Liu,
>
> On Fri, Apr 27, 2018 at 05:58:17PM +0100, Wei Liu wrote:
> > On Fri, Apr 27, 2018 at 04:14:16PM +, Jason Cooper wrote:
> > > On Fri, Apr 27, 2018 at 04:52:57PM +0100, Andrew Cooper wrote:
> ...
> > > > xc_domain_create() takes a domid value by pointer. Passing a value
> > > > other than zero will cause Xen to use that domid, rather than
> > > > searching for the next free domid.
> > > >
> > > > diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
> > > > index b5e27a7..7866092 100644
> > > > --- a/tools/libxl/libxl_create.c
> > > > +++ b/tools/libxl/libxl_create.c
> > > > @@ -583,6 +583,7 @@ int libxl__domain_make(libxl__gc *gc, libxl_domain_config *d_config,
> > > >              goto out;
> > > >          }
> > > >  
> > > > +        *domid = atoi(getenv("OVERRIDE_DOMID") ?: "0");
> > > >          ret = xc_domain_create(ctx->xch, info->ssidref, handle, flags, domid,
> > > >                                 &xc_config);
> > > >          if (ret < 0) {
> > > >
> > > > This gross hack may get you somewhere (Entirely untested).
> > >
> > > Gah! Yep, that's just what I needed, thanks! I don't suppose a patch
> > > series adding a 'domid' field to the domain config file would be
> > > rejected outright? That would allow callers of xl to use key=value for
> > > reboot scripts like mine, and also allow for a static domid setup of the
> > > driver domains if folks want that.
> >
> > Seems a bit hacky to me. You also need to reserve a set of domids
> > beforehand?
>
> My thought of creating a domid config file variable was to do just as
> you say, reserve specific domids for specific guests. I could even
> trigger an error if domid is set when driver_domain isn't.
>
> Actually, I could slightly overload driver_domain, changing from a bool
> to a 'static domid'. 0 = not a driver domain, >0 is its static domid
> assignment.
>
> For backwards compatibility, 1 = next domid available, and >1 would be
> the static domid. I'm not sure if I like that though.
>
> The racy part is when a driver domain is shut down, how does a create
> thread know that that domid is reserved?

If a driver domain shuts down and another domain gets allocated that
domain id, your whole system is hosed. It is even worse if you consider
the security implication: a potentially malicious guest can impersonate
the driver domain and see other guests' data.

>
> third option, tri-state:
>
> driver_domain = 0 # not a driver domain
> driver_domain = 1 # is a driver domain, use next avail domid
> driver_domain = 2 # is a driver domain, re-use domid
>

Let's shelve this UI discussion for now. I will have a look at the other
subthread.

Wei.

> Honestly, I'm not really liking any of these. Perhaps 'xl
> network-detach ...' should be doing a better job of cleaning up? Or,
> 'xl network-attach ...' should do a better job of re-attaching?
>
> thx,
>
> Jason.
Re: [Xen-devel] reboot driver domain, vifX.Y = NO-CARRIER?
On Mon, Apr 30, 2018 at 06:14:15PM +, Jason Cooper wrote: > On Mon, Apr 30, 2018 at 05:26:38PM +0100, Ian Jackson wrote: > > Jason Cooper writes ("Re: [Xen-devel] reboot driver domain, vifX.Y = > > NO-CARRIER?"): > > > On Mon, Apr 30, 2018 at 04:22:30PM +0100, Ian Jackson wrote: > ... > > > Ok, so I'm new to the guts of Xen. The bug, at a high level, is that > > > "When a driver domain is rebooted (domid changed), previously connected > > > client domUs can't gain network connectivity to/through the driver > > > domain via 'xl network-attach client_domu mac=... bridge=... > > > backend=drv_dom'" > > > > > > This is due to the fact that the frontend net driver doesn't / can't > > > follow the backend driver to the new domid in xenstore. > > > > Yes. > > > > > > I'm a bit surprised that this doesn't already work. > > > > > > I'm currently running Xen 4.9.1 as patched in the standard Gentoo > > > ebuild. I've been putting off upgrading to 4.9.2, now marked stable in > > > portage, until I nail this down. I'm happy to move to 4.10 if needed. > > > > > > Do you think this is something that is definitely fixed in a more recent > > > version of Xen? I'm happy to test if so. Is there a commit id I can > > > look for? > > > > I think that in my view (which others may disagree with) this is not a > > bug in Xen but in the Linux kernel frontend. So changing the Xen > > version won't help. > > I'm running vanilla v4.16.4 based on allnoconfig in all of these > mini-domu's. It doesn't look there's been any pertinent recent changes > in drivers/net/xen-netfront.c since v4.16. > > Based on an initial scan of the code, it looks like xen-netback watches > for hotplug events on the frontend (xen-netback/xenbus.c:1041-1046 in > connect()). xen-netfront.c:1995-2036, netback_changed(), is the > registered callback for netfront. > > Is the xenbus netback/netfront state machine documented anywhere? 
> include/xen/interface/io/netif.h has a great description of tx/rx queue
> setup and teardown, but doesn't seem to have anything specific to the
> high-level signalling that 'xl network-attach' would cause.
>

Netback state machine is in
drivers/net/xen-netback/xenbus.c:set_backend_state.

But honestly I don't think that solves the general issue. It is a bit
unfortunate that Xen drivers don't have a unified state machine.

Wei.
Re: [Xen-devel] reboot driver domain, vifX.Y = NO-CARRIER?
correction: On Mon, Apr 30, 2018 at 06:17:54PM +, Jason Cooper wrote: > On Mon, Apr 30, 2018 at 05:38:55PM +0100, George Dunlap wrote: > > On Mon, Apr 30, 2018 at 5:16 PM, Jason Cooper <x...@lakedaemon.net> wrote: > > > Hi Ian, > > > > > > On Mon, Apr 30, 2018 at 04:22:30PM +0100, Ian Jackson wrote: > > >> Wei Liu writes ("Re: [Xen-devel] reboot driver domain, vifX.Y = > > >> NO-CARRIER?"): > > >> > To implement reuse_domid in a sane way, either the toolstack needs to > > >> > manage all domids and always sets domid when creating domain or the > > >> > hypervisor needs to cooperate -- to have interface to reserve / > > >> > pre-allocate domids. > > >> > > >> I think this is entirely the wrong approach. > > > > > > Whew. Glad I didn't start hacking yet... > > > > > >> I think the right answer is that this is simply a bug in the > > >> frontends. frontends should cope if the backend path pointer in the > > >> frontend directory is updated, and should start reading the new > > >> backend instead. > > > > > > Ok, so I'm new to the guts of Xen. The bug, at a high level, is that > > > "When a driver domain is rebooted (domid changed), previously connected > > > client domUs can't gain network connectivity to/through the driver > > > domain via 'xl network-attach client_domu mac=... bridge=... > > > backend=drv_dom'" > > > > Hang on -- just to clarify, something like the following doesn't work > > (or wouldn't, you suspect, work)? > > > > * Start driver domain > > * Start domU A with no network > > My setup is different here. I include the vif = [... backend=...] > declaration in my domain config. > > > * xl network-attach A backend=drv_dom > > So I don't do this step manually. > > > * [do some stuff] > > * xl network-detach A [network devid] > > * Restart driver domain > > * xl network-attach A backend=drv_dom > > Otherwise, this is all correct. Then I get the NO-CARRIER in domU A. Sorry, I get NO-CARRIER in the just rebooted driver domain. 
And the interface is still UP in domU A.

thx,

Jason.
Re: [Xen-devel] reboot driver domain, vifX.Y = NO-CARRIER?
Hi George,

On Mon, Apr 30, 2018 at 05:38:55PM +0100, George Dunlap wrote:
> On Mon, Apr 30, 2018 at 5:16 PM, Jason Cooper <x...@lakedaemon.net> wrote:
> > Hi Ian,
> >
> > On Mon, Apr 30, 2018 at 04:22:30PM +0100, Ian Jackson wrote:
> >> Wei Liu writes ("Re: [Xen-devel] reboot driver domain, vifX.Y =
> >> NO-CARRIER?"):
> >> > To implement reuse_domid in a sane way, either the toolstack needs to
> >> > manage all domids and always sets domid when creating domain or the
> >> > hypervisor needs to cooperate -- to have interface to reserve /
> >> > pre-allocate domids.
> >>
> >> I think this is entirely the wrong approach.
> >
> > Whew. Glad I didn't start hacking yet...
> >
> >> I think the right answer is that this is simply a bug in the
> >> frontends. frontends should cope if the backend path pointer in the
> >> frontend directory is updated, and should start reading the new
> >> backend instead.
> >
> > Ok, so I'm new to the guts of Xen. The bug, at a high level, is that
> > "When a driver domain is rebooted (domid changed), previously connected
> > client domUs can't gain network connectivity to/through the driver
> > domain via 'xl network-attach client_domu mac=... bridge=...
> > backend=drv_dom'"
>
> Hang on -- just to clarify, something like the following doesn't work
> (or wouldn't, you suspect, work)?
>
> * Start driver domain
> * Start domU A with no network

My setup is different here. I include the vif = [... backend=...]
declaration in my domain config.

> * xl network-attach A backend=drv_dom

So I don't do this step manually.

> * [do some stuff]
> * xl network-detach A [network devid]
> * Restart driver domain
> * xl network-attach A backend=drv_dom

Otherwise, this is all correct. Then I get the NO-CARRIER in domU A.

thx,

Jason.
Re: [Xen-devel] reboot driver domain, vifX.Y = NO-CARRIER?
On Mon, Apr 30, 2018 at 05:26:38PM +0100, Ian Jackson wrote:
> Jason Cooper writes ("Re: [Xen-devel] reboot driver domain, vifX.Y =
> NO-CARRIER?"):
> > On Mon, Apr 30, 2018 at 04:22:30PM +0100, Ian Jackson wrote:
...
> > Ok, so I'm new to the guts of Xen. The bug, at a high level, is that
> > "When a driver domain is rebooted (domid changed), previously connected
> > client domUs can't gain network connectivity to/through the driver
> > domain via 'xl network-attach client_domu mac=... bridge=...
> > backend=drv_dom'"
> >
> > This is due to the fact that the frontend net driver doesn't / can't
> > follow the backend driver to the new domid in xenstore.
>
> Yes.
>
> > > I'm a bit surprised that this doesn't already work.
> >
> > I'm currently running Xen 4.9.1 as patched in the standard Gentoo
> > ebuild. I've been putting off upgrading to 4.9.2, now marked stable in
> > portage, until I nail this down. I'm happy to move to 4.10 if needed.
> >
> > Do you think this is something that is definitely fixed in a more recent
> > version of Xen? I'm happy to test if so. Is there a commit id I can
> > look for?
>
> I think that in my view (which others may disagree with) this is not a
> bug in Xen but in the Linux kernel frontend. So changing the Xen
> version won't help.

I'm running vanilla v4.16.4 based on allnoconfig in all of these
mini-domUs. It doesn't look like there have been any pertinent recent
changes in drivers/net/xen-netfront.c since v4.16.

Based on an initial scan of the code, it looks like xen-netback watches
for hotplug events on the frontend (xen-netback/xenbus.c:1041-1046 in
connect()). xen-netfront.c:1995-2036, netback_changed(), is the
registered callback for netfront.

Is the xenbus netback/netfront state machine documented anywhere?
include/xen/interface/io/netif.h has a great description of tx/rx queue
setup and teardown, but doesn't seem to have anything specific to the
high-level signalling that 'xl network-attach' would cause.

Any pointers?

thx,

Jason.
Re: [Xen-devel] reboot driver domain, vifX.Y = NO-CARRIER?
On Mon, Apr 30, 2018 at 5:16 PM, Jason Cooper <x...@lakedaemon.net> wrote:
> Hi Ian,
>
> On Mon, Apr 30, 2018 at 04:22:30PM +0100, Ian Jackson wrote:
>> Wei Liu writes ("Re: [Xen-devel] reboot driver domain, vifX.Y =
>> NO-CARRIER?"):
>> > To implement reuse_domid in a sane way, either the toolstack needs to
>> > manage all domids and always sets domid when creating domain or the
>> > hypervisor needs to cooperate -- to have interface to reserve /
>> > pre-allocate domids.
>>
>> I think this is entirely the wrong approach.
>
> Whew. Glad I didn't start hacking yet...
>
>> I think the right answer is that this is simply a bug in the
>> frontends. frontends should cope if the backend path pointer in the
>> frontend directory is updated, and should start reading the new
>> backend instead.
>
> Ok, so I'm new to the guts of Xen. The bug, at a high level, is that
> "When a driver domain is rebooted (domid changed), previously connected
> client domUs can't gain network connectivity to/through the driver
> domain via 'xl network-attach client_domu mac=... bridge=...
> backend=drv_dom'"

Hang on -- just to clarify, something like the following doesn't work
(or wouldn't, you suspect, work)?

* Start driver domain
* Start domU A with no network
* xl network-attach A backend=drv_dom
* [do some stuff]
* xl network-detach A [network devid]
* Restart driver domain
* xl network-attach A backend=drv_dom

 -George
Re: [Xen-devel] reboot driver domain, vifX.Y = NO-CARRIER?
Jason Cooper writes ("Re: [Xen-devel] reboot driver domain, vifX.Y = NO-CARRIER?"):
> On Mon, Apr 30, 2018 at 04:22:30PM +0100, Ian Jackson wrote:
> > Wei Liu writes ("Re: [Xen-devel] reboot driver domain, vifX.Y =
> > NO-CARRIER?"):
> > > To implement reuse_domid in a sane way, either the toolstack needs to
> > > manage all domids and always sets domid when creating domain or the
> > > hypervisor needs to cooperate -- to have interface to reserve /
> > > pre-allocate domids.
> >
> > I think this is entirely the wrong approach.
>
> Whew. Glad I didn't start hacking yet...

Well, it might be that you end up having to use this fixed-domid thing
as a workaround :-/.

> > I think the right answer is that this is simply a bug in the
> > frontends. frontends should cope if the backend path pointer in the
> > frontend directory is updated, and should start reading the new
> > backend instead.
>
> Ok, so I'm new to the guts of Xen. The bug, at a high level, is that
> "When a driver domain is rebooted (domid changed), previously connected
> client domUs can't gain network connectivity to/through the driver
> domain via 'xl network-attach client_domu mac=... bridge=...
> backend=drv_dom'"
>
> This is due to the fact that the frontend net driver doesn't / can't
> follow the backend driver to the new domid in xenstore.

Yes.

> > I'm a bit surprised that this doesn't already work.
>
> I'm currently running Xen 4.9.1 as patched in the standard Gentoo
> ebuild. I've been putting off upgrading to 4.9.2, now marked stable in
> portage, until I nail this down. I'm happy to move to 4.10 if needed.
>
> Do you think this is something that is definitely fixed in a more recent
> version of Xen? I'm happy to test if so. Is there a commit id I can
> look for?

I think that in my view (which others may disagree with) this is not a
bug in Xen but in the Linux kernel frontend. So changing the Xen
version won't help.

Ian.
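To make the "backend path pointer in the frontend directory" concrete: the frontend's vif directory in xenstore carries a backend-id node holding the backend's domid. A rough sketch, run from dom0, that checks whether that recorded domid still matches the driver domain's current one. The function names and the XS_READ / XL variables are made up for illustration (they only exist so the commands can be swapped out); the xenstore layout is the usual /local/domain/<domid>/device/vif/<devid> one.

```shell
#!/bin/sh
# Commands are held in variables only so they can be replaced when experimenting.
XS_READ=${XS_READ:-xenstore-read}
XL=${XL:-xl}

# Frontend vif directory for guest domid $1, device id $2.
frontend_dir() { echo "/local/domain/$1/device/vif/$2"; }

# Exit status 0 if the backend-id recorded in the frontend directory differs
# from the driver domain's current domid (stale pointer), 1 if they match,
# 2 if either lookup failed.
# $1 = guest domid, $2 = vif devid, $3 = driver domain name
backend_is_stale() {
    fe_domid=$1 devid=$2 drv_name=$3
    recorded=$($XS_READ "$(frontend_dir "$fe_domid" "$devid")/backend-id") || return 2
    current=$($XL domid "$drv_name") || return 2
    [ "$recorded" != "$current" ]
}
```

For example, `backend_is_stale 20 1 drv_dom && echo stale` would report a guest whose vif still points at the driver domain's old domid. This only observes the state; it does not make the frontend follow the new backend.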
Re: [Xen-devel] reboot driver domain, vifX.Y = NO-CARRIER?
Hi Ian,

On Mon, Apr 30, 2018 at 04:22:30PM +0100, Ian Jackson wrote:
> Wei Liu writes ("Re: [Xen-devel] reboot driver domain, vifX.Y = NO-CARRIER?"):
> > To implement reuse_domid in a sane way, either the toolstack needs to
> > manage all domids and always sets domid when creating domain or the
> > hypervisor needs to cooperate -- to have interface to reserve /
> > pre-allocate domids.
>
> I think this is entirely the wrong approach.

Whew.  Glad I didn't start hacking yet...

> I think the right answer is that this is simply a bug in the
> frontends.  frontends should cope if the backend path pointer in the
> frontend directory is updated, and should start reading the new
> backend instead.

Ok, so I'm new to the guts of Xen.  The bug, at a high level, is that
"When a driver domain is rebooted (domid changed), previously connected
client domUs can't gain network connectivity to/through the driver
domain via 'xl network-attach client_domu mac=... bridge=...
backend=drv_dom'"

This is due to the fact that the frontend net driver doesn't / can't
follow the backend driver to the new domid in xenstore.  Does that sound
right?

> I'm a bit surprised that this doesn't already work.

I'm currently running Xen 4.9.1 as patched in the standard Gentoo
ebuild.  I've been putting off upgrading to 4.9.2, now marked stable in
portage, until I nail this down.  I'm happy to move to 4.10 if needed.

Do you think this is something that is definitely fixed in a more recent
version of Xen?  I'm happy to test if so.  Is there a commit id I can
look for?

thx,

Jason.
Re: [Xen-devel] reboot driver domain, vifX.Y = NO-CARRIER?
Wei Liu writes ("Re: [Xen-devel] reboot driver domain, vifX.Y = NO-CARRIER?"):
> To implement reuse_domid in a sane way, either the toolstack needs to
> manage all domids and always sets domid when creating domain or the
> hypervisor needs to cooperate -- to have interface to reserve /
> pre-allocate domids.

I think this is entirely the wrong approach.

I think the right answer is that this is simply a bug in the
frontends.  frontends should cope if the backend path pointer in the
frontend directory is updated, and should start reading the new
backend instead.

I'm a bit surprised that this doesn't already work.

Ian.
Re: [Xen-devel] reboot driver domain, vifX.Y = NO-CARRIER?
Hi Wei Liu,

On Fri, Apr 27, 2018 at 05:58:17PM +0100, Wei Liu wrote:
> On Fri, Apr 27, 2018 at 04:14:16PM +, Jason Cooper wrote:
> > On Fri, Apr 27, 2018 at 04:52:57PM +0100, Andrew Cooper wrote:
...
> > > xc_domain_create() takes a domid value by pointer.  Passing a value
> > > other than zero will cause Xen to use that domid, rather than by
> > > searching for the next free domid.
> > >
> > > diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
> > > index b5e27a7..7866092 100644
> > > --- a/tools/libxl/libxl_create.c
> > > +++ b/tools/libxl/libxl_create.c
> > > @@ -583,6 +583,7 @@ int libxl__domain_make(libxl__gc *gc, libxl_domain_config *d_config,
> > >          goto out;
> > >      }
> > >
> > > +    *domid = atoi(getenv("OVERRIDE_DOMID") ?: "0");
> > >      ret = xc_domain_create(ctx->xch, info->ssidref, handle, flags, domid,
> > >                             _config);
> > >      if (ret < 0) {
> > >
> > > This gross hack may get you somewhere (Entirely untested).
> >
> > Gah!  Yep, that's just what I needed, thanks!  I don't suppose a patch
> > series adding a 'domid' field to the domain config file would be
> > rejected outright?  That would allow callers of xl to use key=value for
> > reboot scripts like mine, and also allow for a static domid setup of the
> > driver domains if folks want that.
>
> Seems a bit hacky to me. You also need to reserve a set of domids
> beforehand?

My thought of creating a domid config file variable was to do just as
you say, reserve specific domids for specific guests.  I could even
trigger an error if domid is set when driver_domain isn't.

Actually, I could slightly overload driver_domain, changing from a bool
to a 'static domid'.  0 = not a driver domain, >0 is its static domid
assignment.  For backwards compatibility, 1 = next domid available, and
>1 would be the static domid.

I'm not sure if I like that though.  The racy part is when a driver
domain is shut down, how does a create thread know that that domid is
reserved?

Third option, tri-state:

  driver_domain = 0 # not a driver domain
  driver_domain = 1 # is a driver domain, use next avail domid
  driver_domain = 2 # is a driver domain, re-use domid

Honestly, I'm not really liking any of these.  Perhaps 'xl
network-detach ...' should be doing a better job of cleaning up?  Or,
'xl network-attach ...' should do a better job of re-attaching?

thx,

Jason.
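The tri-state proposal above could be read roughly as follows. This is only a sketch of the proposed semantics: no such driver_domain values exist in xl today, and the function name and the policy strings it prints are invented for illustration.

```shell
#!/bin/sh
# Hypothetical: map the proposed tri-state driver_domain value to a policy.
# This mirrors the proposal in this message, not any existing xl behaviour.
driver_domain_policy() {
    case $1 in
        0) echo "not-driver-domain" ;;
        1) echo "driver-domain next-free-domid" ;;
        2) echo "driver-domain reuse-domid" ;;
        *) echo "invalid driver_domain value: $1" >&2; return 1 ;;
    esac
}
```

Note that existing configs using driver_domain = 1 keep their current meaning under this reading, which is the backwards-compatibility point made above.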
Re: [Xen-devel] reboot driver domain, vifX.Y = NO-CARRIER?
On Fri, Apr 27, 2018 at 06:02:46PM +0100, Andrew Cooper wrote:
> On 27/04/18 17:14, Jason Cooper wrote:
> > On Fri, Apr 27, 2018 at 04:52:57PM +0100, Andrew Cooper wrote:
> >> On 27/04/18 16:35, Jason Cooper wrote:
> >>> On Fri, Apr 27, 2018 at 04:11:39PM +0100, Andrew Cooper wrote:
> >>>> On 27/04/18 16:03, Jason Cooper wrote:
> >>>>> The problem occurs when I reboot a driver domain.  Regardless of the
> >>>>> type of guest attached to it, I'm unable to re-establish connectivity
> >>>>> between the driver domain and the re-attached guest.  e.g. I reboot
> >>>>> GW/FW, then re-attach VM1, VM2 and the rest.  No matter how I do it, I
> >>>>> get:
> >>>>>
> >>>>> $ ip link
> >>>>> ...
> >>>>> 11: vif20.1:mtu 1500 qdisc mq master br10 qlen 32
> >>>>>     link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
> >>>>>
> >>>>> In the driver domain.  At this point, absolutely no packets flow between
> >>>>> the two VMs.  Not even ARP.  The only solution, so far, is to unnecessarily
> >>>>> reboot the PV guests.  After that, networking is fine.
> >>>>>
> >>>>> Any thoughts?
> >>>> The underlying problem is that the frontend/backend setup in xenstore
> >>>> encodes the domid in path, and changing that isn't transparent to the
> >>>> guest at all.
> >>> Oh joy.  Would seem to make more sense to use the domain name or the
> >>> uuid...
> >> domids are also used in the grant and event hypercall interfaces with Xen.
> >>
> >> There is no way this horse is being put back in its stable...
> > :-(
> >
> >>>> The best idea we came up with was to reboot the driver domain and reuse
> >>>> its old domid, at which point all the xenstore paths would remain
> >>>> valid.  There is support in Xen for explicitly choosing the domid of a
> >>>> domain, but I don't think that it is wired up sensibly in xl.
> >>> hmmm, yes.  It's not wired up at all afaict.  Mind giving me a hint on
> >>> how to reuse the domid?
> >> xc_domain_create() takes a domid value by pointer.  Passing a value
> >> other than zero will cause Xen to use that domid, rather than by
> >> searching for the next free domid.
> >>
> >> diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
> >> index b5e27a7..7866092 100644
> >> --- a/tools/libxl/libxl_create.c
> >> +++ b/tools/libxl/libxl_create.c
> >> @@ -583,6 +583,7 @@ int libxl__domain_make(libxl__gc *gc, libxl_domain_config *d_config,
> >>          goto out;
> >>      }
> >>
> >> +    *domid = atoi(getenv("OVERRIDE_DOMID") ?: "0");
> >>      ret = xc_domain_create(ctx->xch, info->ssidref, handle, flags, domid,
> >>                             _config);
> >>      if (ret < 0) {
> >>
> >> This gross hack may get you somewhere (Entirely untested).
> > Gah!  Yep, that's just what I needed, thanks!  I don't suppose a patch
> > series adding a 'domid' field to the domain config file would be
> > rejected outright?  That would allow callers of xl to use key=value for
> > reboot scripts like mine, and also allow for a static domid setup of the
> > driver domains if folks want that.
>
> That question would have to be deferred to the toolstack maintainers,
> but some ability to manage exact domid's would be a very good thing.
>
> Having a domid= field would allow for very fine grain control, but
> probably more control than most people want.  Alternatively, having some
> kind of "reuse_domid" field which booted the domain normally once,
> recorded its domid, and reused that on reboot might be rather more useful.
>

To implement reuse_domid in a sane way, either the toolstack needs to
manage all domids and always sets domid when creating domain or the
hypervisor needs to cooperate -- to have interface to reserve /
pre-allocate domids.

Either should be doable. We should think a bit more which approach is
better.

Wei.
Re: [Xen-devel] reboot driver domain, vifX.Y = NO-CARRIER?
On 27/04/18 17:14, Jason Cooper wrote:
> On Fri, Apr 27, 2018 at 04:52:57PM +0100, Andrew Cooper wrote:
>> On 27/04/18 16:35, Jason Cooper wrote:
>>> On Fri, Apr 27, 2018 at 04:11:39PM +0100, Andrew Cooper wrote:
>>>> On 27/04/18 16:03, Jason Cooper wrote:
>>>>> The problem occurs when I reboot a driver domain.  Regardless of the
>>>>> type of guest attached to it, I'm unable to re-establish connectivity
>>>>> between the driver domain and the re-attached guest.  e.g. I reboot
>>>>> GW/FW, then re-attach VM1, VM2 and the rest.  No matter how I do it, I
>>>>> get:
>>>>>
>>>>> $ ip link
>>>>> ...
>>>>> 11: vif20.1:mtu 1500 qdisc mq master br10 qlen 32
>>>>>     link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
>>>>>
>>>>> In the driver domain.  At this point, absolutely no packets flow between
>>>>> the two VMs.  Not even ARP.  The only solution, so far, is to unnecessarily
>>>>> reboot the PV guests.  After that, networking is fine.
>>>>>
>>>>> Any thoughts?
>>>> The underlying problem is that the frontend/backend setup in xenstore
>>>> encodes the domid in path, and changing that isn't transparent to the
>>>> guest at all.
>>> Oh joy.  Would seem to make more sense to use the domain name or the
>>> uuid...
>> domids are also used in the grant and event hypercall interfaces with Xen.
>>
>> There is no way this horse is being put back in its stable...
> :-(
>
>>>> The best idea we came up with was to reboot the driver domain and reuse
>>>> its old domid, at which point all the xenstore paths would remain
>>>> valid.  There is support in Xen for explicitly choosing the domid of a
>>>> domain, but I don't think that it is wired up sensibly in xl.
>>> hmmm, yes.  It's not wired up at all afaict.  Mind giving me a hint on
>>> how to reuse the domid?
>> xc_domain_create() takes a domid value by pointer.  Passing a value
>> other than zero will cause Xen to use that domid, rather than by
>> searching for the next free domid.
>>
>> diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
>> index b5e27a7..7866092 100644
>> --- a/tools/libxl/libxl_create.c
>> +++ b/tools/libxl/libxl_create.c
>> @@ -583,6 +583,7 @@ int libxl__domain_make(libxl__gc *gc, libxl_domain_config *d_config,
>>          goto out;
>>      }
>>
>> +    *domid = atoi(getenv("OVERRIDE_DOMID") ?: "0");
>>      ret = xc_domain_create(ctx->xch, info->ssidref, handle, flags, domid,
>>                             _config);
>>      if (ret < 0) {
>>
>> This gross hack may get you somewhere (Entirely untested).
> Gah!  Yep, that's just what I needed, thanks!  I don't suppose a patch
> series adding a 'domid' field to the domain config file would be
> rejected outright?  That would allow callers of xl to use key=value for
> reboot scripts like mine, and also allow for a static domid setup of the
> driver domains if folks want that.

That question would have to be deferred to the toolstack maintainers,
but some ability to manage exact domid's would be a very good thing.

Having a domid= field would allow for very fine grain control, but
probably more control than most people want.  Alternatively, having some
kind of "reuse_domid" field which booted the domain normally once,
recorded its domid, and reused that on reboot might be rather more useful.

~Andrew
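Until something like "reuse_domid" exists in the toolstack, the record-and-reuse idea can be approximated from a wrapper script. The sketch below assumes a libxl built with the untested OVERRIDE_DOMID hack quoted above; without that patch the environment variable has no effect. The function names, the XL variable and the STATE_DIR location are all hypothetical, and nothing here reserves the domid while the driver domain is down.

```shell
#!/bin/sh
XL=${XL:-xl}
STATE_DIR=${STATE_DIR:-/var/run/domid-reuse}   # hypothetical location

# Remember the current domid of domain $1 (call this before shutting it down).
record_domid() {
    mkdir -p "$STATE_DIR" && $XL domid "$1" > "$STATE_DIR/$1"
}

# Print the create command for domain $1 with config $2, asking for the
# recorded domid via OVERRIDE_DOMID when one is on file.
recreate_cmd() {
    name=$1 cfg=$2
    if [ -s "$STATE_DIR/$name" ]; then
        echo "OVERRIDE_DOMID=$(cat "$STATE_DIR/$name") $XL create $cfg"
    else
        echo "$XL create $cfg"
    fi
}
```

A reboot script would call `record_domid drv_dom`, shut the domain down, then run the line printed by `recreate_cmd drv_dom drv_dom.cfg`. Another domain created in between could still take the recorded domid, which is the race discussed elsewhere in this thread.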
Re: [Xen-devel] reboot driver domain, vifX.Y = NO-CARRIER?
On Fri, Apr 27, 2018 at 04:14:16PM +, Jason Cooper wrote:
> On Fri, Apr 27, 2018 at 04:52:57PM +0100, Andrew Cooper wrote:
> > On 27/04/18 16:35, Jason Cooper wrote:
> > > On Fri, Apr 27, 2018 at 04:11:39PM +0100, Andrew Cooper wrote:
> > >> On 27/04/18 16:03, Jason Cooper wrote:
> > >>> The problem occurs when I reboot a driver domain.  Regardless of the
> > >>> type of guest attached to it, I'm unable to re-establish connectivity
> > >>> between the driver domain and the re-attached guest.  e.g. I reboot
> > >>> GW/FW, then re-attach VM1, VM2 and the rest.  No matter how I do it, I
> > >>> get:
> > >>>
> > >>> $ ip link
> > >>> ...
> > >>> 11: vif20.1:mtu 1500 qdisc mq master br10 qlen 32
> > >>>     link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
> > >>>
> > >>> In the driver domain.  At this point, absolutely no packets flow between
> > >>> the two VMs.  Not even ARP.  The only solution, so far, is to unnecessarily
> > >>> reboot the PV guests.  After that, networking is fine.
> > >>>
> > >>> Any thoughts?
> > >> The underlying problem is that the frontend/backend setup in xenstore
> > >> encodes the domid in path, and changing that isn't transparent to the
> > >> guest at all.
> > > Oh joy.  Would seem to make more sense to use the domain name or the
> > > uuid...
> >
> > domids are also used in the grant and event hypercall interfaces with Xen.
> >
> > There is no way this horse is being put back in its stable...
>
> :-(
>
> > >> The best idea we came up with was to reboot the driver domain and reuse
> > >> its old domid, at which point all the xenstore paths would remain
> > >> valid.  There is support in Xen for explicitly choosing the domid of a
> > >> domain, but I don't think that it is wired up sensibly in xl.
> > > hmmm, yes.  It's not wired up at all afaict.  Mind giving me a hint on
> > > how to reuse the domid?
> >
> > xc_domain_create() takes a domid value by pointer.  Passing a value
> > other than zero will cause Xen to use that domid, rather than by
> > searching for the next free domid.
> >
> > diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
> > index b5e27a7..7866092 100644
> > --- a/tools/libxl/libxl_create.c
> > +++ b/tools/libxl/libxl_create.c
> > @@ -583,6 +583,7 @@ int libxl__domain_make(libxl__gc *gc, libxl_domain_config *d_config,
> >          goto out;
> >      }
> >
> > +    *domid = atoi(getenv("OVERRIDE_DOMID") ?: "0");
> >      ret = xc_domain_create(ctx->xch, info->ssidref, handle, flags, domid,
> >                             _config);
> >      if (ret < 0) {
> >
> > This gross hack may get you somewhere (Entirely untested).
>
> Gah!  Yep, that's just what I needed, thanks!  I don't suppose a patch
> series adding a 'domid' field to the domain config file would be
> rejected outright?  That would allow callers of xl to use key=value for
> reboot scripts like mine, and also allow for a static domid setup of the
> driver domains if folks want that.

Seems a bit hacky to me. You also need to reserve a set of domids
beforehand?

Wei.
Re: [Xen-devel] reboot driver domain, vifX.Y = NO-CARRIER?
On Fri, Apr 27, 2018 at 04:52:57PM +0100, Andrew Cooper wrote:
> On 27/04/18 16:35, Jason Cooper wrote:
> > Hi Andrew,
> >
> > On Fri, Apr 27, 2018 at 04:11:39PM +0100, Andrew Cooper wrote:
> >> On 27/04/18 16:03, Jason Cooper wrote:
> >>> The problem occurs when I reboot a driver domain.  Regardless of the
> >>> type of guest attached to it, I'm unable to re-establish connectivity
> >>> between the driver domain and the re-attached guest.  e.g. I reboot
> >>> GW/FW, then re-attach VM1, VM2 and the rest.  No matter how I do it, I
> >>> get:
> >>>
> >>> $ ip link
> >>> ...
> >>> 11: vif20.1:mtu 1500 qdisc mq master br10 qlen 32
> >>>     link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
> >>>
> >>> In the driver domain.  At this point, absolutely no packets flow between
> >>> the two VMs.  Not even ARP.  The only solution, so far, is to unnecessarily
> >>> reboot the PV guests.  After that, networking is fine.
> >>>
> >>> Any thoughts?
> >> The underlying problem is that the frontend/backend setup in xenstore
> >> encodes the domid in path, and changing that isn't transparent to the
> >> guest at all.
> > Oh joy.  Would seem to make more sense to use the domain name or the
> > uuid...
>
> domids are also used in the grant and event hypercall interfaces with Xen.

If the frontend manages to go through a disconnect/reconnect cycle, grant
table and event channel aren't going to be a problem?

Wei.
Re: [Xen-devel] reboot driver domain, vifX.Y = NO-CARRIER?
On Fri, Apr 27, 2018 at 04:52:57PM +0100, Andrew Cooper wrote:
> On 27/04/18 16:35, Jason Cooper wrote:
> > On Fri, Apr 27, 2018 at 04:11:39PM +0100, Andrew Cooper wrote:
> >> On 27/04/18 16:03, Jason Cooper wrote:
> >>> The problem occurs when I reboot a driver domain.  Regardless of the
> >>> type of guest attached to it, I'm unable to re-establish connectivity
> >>> between the driver domain and the re-attached guest.  e.g. I reboot
> >>> GW/FW, then re-attach VM1, VM2 and the rest.  No matter how I do it, I
> >>> get:
> >>>
> >>> $ ip link
> >>> ...
> >>> 11: vif20.1:mtu 1500 qdisc mq master br10 qlen 32
> >>>     link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
> >>>
> >>> In the driver domain.  At this point, absolutely no packets flow between
> >>> the two VMs.  Not even ARP.  The only solution, so far, is to unnecessarily
> >>> reboot the PV guests.  After that, networking is fine.
> >>>
> >>> Any thoughts?
> >> The underlying problem is that the frontend/backend setup in xenstore
> >> encodes the domid in path, and changing that isn't transparent to the
> >> guest at all.
> > Oh joy.  Would seem to make more sense to use the domain name or the
> > uuid...
>
> domids are also used in the grant and event hypercall interfaces with Xen.
>
> There is no way this horse is being put back in its stable...

:-(

> >> The best idea we came up with was to reboot the driver domain and reuse
> >> its old domid, at which point all the xenstore paths would remain
> >> valid.  There is support in Xen for explicitly choosing the domid of a
> >> domain, but I don't think that it is wired up sensibly in xl.
> > hmmm, yes.  It's not wired up at all afaict.  Mind giving me a hint on
> > how to reuse the domid?
>
> xc_domain_create() takes a domid value by pointer.  Passing a value
> other than zero will cause Xen to use that domid, rather than by
> searching for the next free domid.
>
> diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
> index b5e27a7..7866092 100644
> --- a/tools/libxl/libxl_create.c
> +++ b/tools/libxl/libxl_create.c
> @@ -583,6 +583,7 @@ int libxl__domain_make(libxl__gc *gc, libxl_domain_config *d_config,
>          goto out;
>      }
>
> +    *domid = atoi(getenv("OVERRIDE_DOMID") ?: "0");
>      ret = xc_domain_create(ctx->xch, info->ssidref, handle, flags, domid,
>                             _config);
>      if (ret < 0) {
>
> This gross hack may get you somewhere (Entirely untested).

Gah!  Yep, that's just what I needed, thanks!  I don't suppose a patch
series adding a 'domid' field to the domain config file would be
rejected outright?  That would allow callers of xl to use key=value for
reboot scripts like mine, and also allow for a static domid setup of the
driver domains if folks want that.

thx,

Jason.
Re: [Xen-devel] reboot driver domain, vifX.Y = NO-CARRIER?
Hi Andrew,

On Fri, Apr 27, 2018 at 04:11:39PM +0100, Andrew Cooper wrote:
> On 27/04/18 16:03, Jason Cooper wrote:
> > The problem occurs when I reboot a driver domain.  Regardless of the
> > type of guest attached to it, I'm unable to re-establish connectivity
> > between the driver domain and the re-attached guest.  e.g. I reboot
> > GW/FW, then re-attach VM1, VM2 and the rest.  No matter how I do it, I
> > get:
> >
> > $ ip link
> > ...
> > 11: vif20.1:mtu 1500 qdisc mq master br10 qlen 32
> >     link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
> >
> > In the driver domain.  At this point, absolutely no packets flow between
> > the two VMs.  Not even ARP.  The only solution, so far, is to unnecessarily
> > reboot the PV guests.  After that, networking is fine.
> >
> > Any thoughts?
>
> The underlying problem is that the frontend/backend setup in xenstore
> encodes the domid in path, and changing that isn't transparent to the
> guest at all.

Oh joy.  Would seem to make more sense to use the domain name or the
uuid...

> The best idea we came up with was to reboot the driver domain and reuse
> its old domid, at which point all the xenstore paths would remain
> valid.  There is support in Xen for explicitly choosing the domid of a
> domain, but I don't think that it is wired up sensibly in xl.

hmmm, yes.  It's not wired up at all afaict.  Mind giving me a hint on
how to reuse the domid?

The solution I see with my current, limited understanding could be to
change the path for the guest via xenstore-write.  Although I suspect
there's more going on underneath the hood than I'm currently aware of.

thx,

Jason.
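For reference, this is how the backend's domid ends up inside the paths that would have to change. A small sketch that just builds the path strings for a vif; the helper names are made up, and the domids 5 and 7 in the usage note are invented examples (vif20.1 above would be guest domid 20, devid 1).

```shell
#!/bin/sh
# Backend directory of a vif in xenstore.
# $1 = backend (driver domain) domid, $2 = frontend (guest) domid, $3 = devid
backend_dir() { echo "/local/domain/$1/backend/vif/$2/$3"; }

# Print the backend path before and after the driver domain changed domid.
# $1 = old backend domid, $2 = new backend domid, $3 = guest domid, $4 = devid
show_backend_move() {
    echo "old: $(backend_dir "$1" "$3" "$4")"
    echo "new: $(backend_dir "$2" "$3" "$4")"
}
```

`show_backend_move 5 7 20 1` prints /local/domain/5/backend/vif/20/1 and /local/domain/7/backend/vif/20/1. The frontend's "backend" node holds the first of these as a string, so rewriting it alone would not touch the grant references or event channels that were also set up against the old domid.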
Re: [Xen-devel] reboot driver domain, vifX.Y = NO-CARRIER?
On 27/04/18 16:03, Jason Cooper wrote:
> All,
>
> On Gentoo Xen 4.9.1, I've been creating minimal Linux DomU's to create a
> virtual, segregated network infrastructure.  This has been going really
> well, and I'm slowly progressing toward a self-updating system.
>
> My main snag has to do with re-attaching VMs to a driver domain after
> rebooting the driver domain.  e.g.
>
>
>                               +-----+
>                             /-| VM1 |
>                            /  +-----+
>        +----+   +-------+ /   +-----+
> ISP ---| SW |---| GW/FW |-----| VM2 |
>        +----+   +-------+ \   +-----+
>          DD        DD      \  +-----+
>                             \-| VMN |
>                               +-----+
>
> So, in this diagram, SW, GW/FW, and VM1 are mini-VMs.  VM2, and the rest
> are full fledged Linux PV VMs.
>
> Only SW, and GW/FW are driver domains.  SW has the physical nic via
> pci-passthrough.  There are actually 7 GW/FW mini-VMs (for 7 public IPs,
> and 7 different networks), and a trunk mini-VM that aren't shown.
>
> The problem occurs when I reboot a driver domain.  Regardless of the
> type of guest attached to it, I'm unable to re-establish connectivity
> between the driver domain and the re-attached guest.  e.g. I reboot
> GW/FW, then re-attach VM1, VM2 and the rest.  No matter how I do it, I
> get:
>
> $ ip link
> ...
> 11: vif20.1:mtu 1500 qdisc mq master br10 qlen 32
>     link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
>
> In the driver domain.  At this point, absolutely no packets flow between
> the two VMs.  Not even ARP.  The only solution, so far, is to unnecessarily
> reboot the PV guests.  After that, networking is fine.
>
> Any thoughts?

XenServer found this when we investigated using device driver domains in
a similar way.

The underlying problem is that the frontend/backend setup in xenstore
encodes the domid in path, and changing that isn't transparent to the
guest at all.

The best idea we came up with was to reboot the driver domain and reuse
its old domid, at which point all the xenstore paths would remain
valid.  There is support in Xen for explicitly choosing the domid of a
domain, but I don't think that it is wired up sensibly in xl.

~Andrew
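A quick way to spot the symptom from inside the driver domain is to filter the one-line form of the ip output for vif interfaces flagged NO-CARRIER. This is a sketch only: the function name is made up, and it assumes the interface flag list contains NO-CARRIER as in the subject line (the flag list itself did not survive in the archived ip link output above).

```shell
#!/bin/sh
# Reads `ip -o link` output on stdin and prints the names of vifX.Y
# interfaces whose flag list contains NO-CARRIER.
vifs_without_carrier() {
    awk -F': ' '$2 ~ /^vif[0-9]+\.[0-9]+/ && $3 ~ /NO-CARRIER/ { print $2 }'
}
```

Typical use would be `ip -o link | vifs_without_carrier` after the guests have been re-attached; any name printed is a backend interface whose frontend has not connected.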