Re: [Xen-devel] reboot driver domain, vifX.Y = NO-CARRIER?

2018-05-07 Thread Jason Andryuk
Hi Jason,

On Sun, May 6, 2018 at 11:45 AM, Jason Cooper  wrote:
> Hi Marek,
>
> On Sat, May 05, 2018 at 01:03:15AM +0200, Marek Marczykowski-Górecki wrote:
>> On Fri, May 04, 2018 at 06:13:25PM -0400, Rich Persaud wrote:
>> > > On May 1, 2018, at 08:53, Jason Cooper  wrote:
>> > >
>> > > add the link to xen-users thread of me talking to myself.  :-))
>> > >
>> > >> On Tue, May 01, 2018 at 12:37:51PM +, Jason Cooper wrote:
>> > >> When I was first digging into this, I started a thread on xen-users [1],
>> > >> I've attached my xl-reboot.sh script here so you can see exactly what
>> > >> I'm attempting to do:
>> > >
>> > > [1] https://marc.info/?l=xen-users&m=152389443206023&w=2
>> >
>> > You may want to look at the code (toolstack and/or frontend-backend
>> > drivers) for Qubes and OpenXT, both of which use network driver
>> > domains and support wired/wireless networks.
>> >
>> > Operational restart of a measured, non-persistent driver domain
>> > (instead of host) is a benefit of Xen disaggregation architectures.
>>
>> In Qubes, on backend restart, we do the equivalent of xl network-detach &&
>> xl network-attach (as you do in xl-reboot.sh). xl itself doesn't provide
>> any place to plug in such a script, but we use libvirt, which provides events.
>> Also, we have full control over domain config (libvirt XML), so don't
>> need to extract vif list from xenstore...

OpenXT does the xl network-detach && xl network-attach in its own
daemon: https://github.com/OpenXT/network/blob/master/nwd/Main.hs#L767

>> The problem you describe looks related to
>> https://lkml.org/lkml/2018/2/28/289, but fix is included in 4.16...
>> There was also related libxl patch:
>> https://xen.markmail.org/thread/6qbgmwyjqsshjus7
>> (but it applies to the case where you first shutdown backend and only
>> then do xl network-detach)
>>
>> Do you have xl devd running in your driver domain? Without that xl
>> network-attach won't work (AFAIR udev isn't used here anymore).
>
> Yes, I've now modified the init script (xendomains in Gentoo) to create
> a key /tool/vmstatus/$domname/status, start the domU, loop until it gets
> its domid, and -chmod the key.  It then does a -watch on that key.  In
> the domU, *after* xl devd is started, it writes "online" to that key.
>
> This allows me to automatically bring up the driver domains, and make
> sure they're ready for connections before proceeding to booting the next
> VM.  This only occurs when the host boots.
>
> After the driver domains are up, the rest of the domains are started in
> parallel.
>
>> Also note that backend shutdown/restart/crash was a source of many
>> problems in frontend kernel and toolstack in the past. Even simple
>> dynamic network-attach/detach sometimes is problematic for the frontend.
>> Links:
>> https://github.com/QubesOS/qubes-issues/issues/3657 (frontend kernel
>> problem)
>> https://github.com/QubesOS/qubes-issues/issues/1426 (toolstack problem,
>> + libvirt)
>> https://github.com/QubesOS/qubes-issues/issues/975 (frontend kernel
>> problem)
>
> Mmm, clearly the state machine and its implementation need some
> review.  I'm building v4.16.7 and we'll see how it goes for my usecase.

OpenXT has some patches for reconnecting netfront after the netback
domain is rebooted to a new domid:
https://github.com/OpenXT/xenclient-oe/blob/master/recipes-kernel/linux/4.14/patches/netfront-support-backend-relocate.patch
https://github.com/OpenXT/xenclient-oe/blob/master/recipes-kernel/linux/4.14/patches/xenbus-move-otherend-watches-on-relocate.patch

I'm not too familiar with those, so they may be specific to the OpenXT
networking code.

Jason, when you see the vif NO-CARRIER, how do the frontend and
backend XenStore entries look?  Do the domids match up, and is the pair
in state 4 -> XenbusStateConnected?
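
For anyone following along, a rough way to collect that from dom0 is
sketched below.  The xenstore paths follow the usual vif layout and the
state numbers are the XenbusState values from xenbus.h; the helper names
and the XS_READ override are made up for this example, and the domids and
devid in the usage comment are placeholders.

```shell
#!/bin/sh
# Sketch only: helper names and XS_READ are illustrative, not part of xl.
XS_READ="${XS_READ:-xenstore-read}"

# Map a numeric XenbusState (xen/include/public/io/xenbus.h) to its name.
xenbus_state_name() {
    case "$1" in
        0) echo Unknown ;;
        1) echo Initialising ;;
        2) echo InitWait ;;
        3) echo Initialised ;;
        4) echo Connected ;;
        5) echo Closing ;;
        6) echo Closed ;;
        7) echo Reconfiguring ;;
        8) echo Reconfigured ;;
        *) echo "invalid($1)" ;;
    esac
}

# vif_pair_state <frontend-domid> <backend-domid> <devid>
# Prints the state and peer domid recorded on each end of one vif.
vif_pair_state() {
    fe="/local/domain/$1/device/vif/$3"
    be="/local/domain/$2/backend/vif/$1/$3"
    fs=$($XS_READ "$fe/state" 2>/dev/null)
    bs=$($XS_READ "$be/state" 2>/dev/null)
    echo "frontend $fe: state=$(xenbus_state_name "$fs") backend-id=$($XS_READ "$fe/backend-id" 2>/dev/null)"
    echo "backend  $be: state=$(xenbus_state_name "$bs") frontend-id=$($XS_READ "$be/frontend-id" 2>/dev/null)"
}

# Example (placeholder ids): vif_pair_state 7 20 1
```

If the frontend's backend-id still names the old domid, or either side is
not Connected, that narrows down which end is stuck.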

Regards,
Jason

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] reboot driver domain, vifX.Y = NO-CARRIER?

2018-05-06 Thread Jason Cooper
Hi Marek,

On Sat, May 05, 2018 at 01:03:15AM +0200, Marek Marczykowski-Górecki wrote:
> On Fri, May 04, 2018 at 06:13:25PM -0400, Rich Persaud wrote:
> > > On May 1, 2018, at 08:53, Jason Cooper  wrote:
> > > 
> > > add the link to xen-users thread of me talking to myself.  :-))
> > > 
> > >> On Tue, May 01, 2018 at 12:37:51PM +, Jason Cooper wrote:
> > >> When I was first digging into this, I started a thread on xen-users [1],
> > >> I've attached my xl-reboot.sh script here so you can see exactly what
> > >> I'm attempting to do:
> > > 
> > > [1] https://marc.info/?l=xen-users&m=152389443206023&w=2
> > 
> > You may want to look at the code (toolstack and/or frontend-backend
> > drivers) for Qubes and OpenXT, both of which use network driver
> > domains and support wired/wireless networks.  
> > 
> > Operational restart of a measured, non-persistent driver domain
> > (instead of host) is a benefit of Xen disaggregation architectures.
> 
> In Qubes, on backend restart, we do the equivalent of xl network-detach &&
> xl network-attach (as you do in xl-reboot.sh). xl itself doesn't provide
> any place to plug in such a script, but we use libvirt, which provides events.
> Also, we have full control over domain config (libvirt XML), so don't
> need to extract vif list from xenstore...
> 
> The problem you describe looks related to
> https://lkml.org/lkml/2018/2/28/289, but fix is included in 4.16...
> There was also related libxl patch:
> https://xen.markmail.org/thread/6qbgmwyjqsshjus7
> (but it applies to the case where you first shutdown backend and only
> then do xl network-detach)
> 
> Do you have xl devd running in your driver domain? Without that xl
> network-attach won't work (AFAIR udev isn't used here anymore).

Yes, I've now modified the init script (xendomains in Gentoo) to create
a key /tool/vmstatus/$domname/status, start the domU, loop until it gets
its domid, and -chmod the key.  It then does a -watch on that key.  In
the domU, *after* xl devd is started, it writes "online" to that key.
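
Roughly, the dom0 side looks like the sketch below.  This is not the
actual init script: it polls with xenstore-read instead of using a
watch, the initial "booting" value is just a placeholder, and the helper
names and XS_* overrides are invented for the example.

```shell
#!/bin/sh
# Sketch of the dom0 side; helper names and the XS_* hooks are illustrative.
XS_WRITE="${XS_WRITE:-xenstore-write}"
XS_CHMOD="${XS_CHMOD:-xenstore-chmod}"
XS_READ="${XS_READ:-xenstore-read}"

# Path of the per-domain readiness key described above.
status_key() { echo "/tool/vmstatus/$1/status"; }

# prepare_status_key <domname> <domid>
# Create the key, then keep dom0 as owner (n0) and give the guest
# read/write access (b<domid>).
prepare_status_key() {
    key=$(status_key "$1")
    $XS_WRITE "$key" "booting" &&
    $XS_CHMOD "$key" n0 "b$2"
}

# wait_online <domname> [tries]
# Poll once per second until the guest has written "online".
wait_online() {
    key=$(status_key "$1")
    tries="${2:-60}"
    while [ "$tries" -gt 0 ]; do
        [ "$($XS_READ "$key" 2>/dev/null)" = "online" ] && return 0
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}
```

In the domU, the matching step after xl devd has started would be
something like `xenstore-write /tool/vmstatus/<domname>/status online`.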

This allows me to automatically bring up the driver domains, and make
sure they're ready for connections before proceeding to booting the next
VM.  This only occurs when the host boots.

After the driver domains are up, the rest of the domains are started in
parallel.

> Also note that backend shutdown/restart/crash was a source of many
> problems in frontend kernel and toolstack in the past. Even simple
> dynamic network-attach/detach sometimes is problematic for the frontend.
> Links:
> https://github.com/QubesOS/qubes-issues/issues/3657 (frontend kernel
> problem)
> https://github.com/QubesOS/qubes-issues/issues/1426 (toolstack problem,
> + libvirt)
> https://github.com/QubesOS/qubes-issues/issues/975 (frontend kernel
> problem)

Mmm, clearly the state machine and its implementation need some
review.  I'm building v4.16.7 and we'll see how it goes for my usecase.

Thanks for all the pointers!

thx,

Jason.


Re: [Xen-devel] reboot driver domain, vifX.Y = NO-CARRIER?

2018-05-04 Thread Marek Marczykowski-Górecki
On Fri, May 04, 2018 at 06:13:25PM -0400, Rich Persaud wrote:
> > On May 1, 2018, at 08:53, Jason Cooper  wrote:
> > 
> > add the link to xen-users thread of me talking to myself.  :-))
> > 
> >> On Tue, May 01, 2018 at 12:37:51PM +, Jason Cooper wrote:
> >> When I was first digging into this, I started a thread on xen-users [1],
> >> I've attached my xl-reboot.sh script here so you can see exactly what
> >> I'm attempting to do:
> > 
> > [1] https://marc.info/?l=xen-users&m=152389443206023&w=2
> 
> You may want to look at the code (toolstack and/or frontend-backend drivers) 
> for Qubes and OpenXT, both of which use network driver domains and support 
> wired/wireless networks.  
> 
> Operational restart of a measured, non-persistent driver domain (instead of 
> host) is a benefit of Xen disaggregation architectures.

In Qubes, on backend restart, we do the equivalent of xl network-detach &&
xl network-attach (as you do in xl-reboot.sh). xl itself doesn't provide
any place to plug in such a script, but we use libvirt, which provides events.
Also, we have full control over domain config (libvirt XML), so don't
need to extract vif list from xenstore...
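
With plain xl, the per-vif step amounts to something like the sketch
below.  This is not the Qubes code, only an illustration: the function
name and the RUN hook are invented, and the mac, bridge and domain names
are whatever your config uses.

```shell
#!/bin/sh
# Sketch: re-plug one vif after its backend (driver) domain has restarted.
# RUN is a hypothetical hook: set RUN=echo to print the commands instead
# of executing them.
RUN="${RUN:-}"

# replug_vif <frontend-domain> <devid> <mac> <bridge> <backend-domain>
replug_vif() {
    $RUN xl network-detach "$1" "$2" &&
    $RUN xl network-attach "$1" "mac=$3" "bridge=$4" "backend=$5"
}

# Example (placeholders):
#   RUN=echo replug_vif client_domu 0 00:16:3e:00:00:01 br10 drv_dom
```

Whether the detach happens before or after the backend goes away matters
for the libxl behaviour discussed in this thread, so treat the ordering
as something to test on your own setup.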

The problem you describe looks related to
https://lkml.org/lkml/2018/2/28/289, but fix is included in 4.16...
There was also related libxl patch:
https://xen.markmail.org/thread/6qbgmwyjqsshjus7
(but it applies to the case where you first shutdown backend and only
then do xl network-detach)

Do you have xl devd running in your driver domain? Without that xl
network-attach won't work (AFAIR udev isn't used here anymore).

Also note that backend shutdown/restart/crash was a source of many
problems in frontend kernel and toolstack in the past. Even simple
dynamic network-attach/detach sometimes is problematic for the frontend.
Links:
https://github.com/QubesOS/qubes-issues/issues/3657 (frontend kernel
problem)
https://github.com/QubesOS/qubes-issues/issues/1426 (toolstack problem,
+ libvirt)
https://github.com/QubesOS/qubes-issues/issues/975 (frontend kernel
problem)

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?



Re: [Xen-devel] reboot driver domain, vifX.Y = NO-CARRIER?

2018-05-04 Thread Rich Persaud
> On May 1, 2018, at 08:53, Jason Cooper  wrote:
> 
> add the link to xen-users thread of me talking to myself.  :-))
> 
>> On Tue, May 01, 2018 at 12:37:51PM +, Jason Cooper wrote:
>> When I was first digging into this, I started a thread on xen-users [1],
>> I've attached my xl-reboot.sh script here so you can see exactly what
>> I'm attempting to do:
> 
> [1] https://marc.info/?l=xen-users&m=152389443206023&w=2

You may want to look at the code (toolstack and/or frontend-backend drivers) 
for Qubes and OpenXT, both of which use network driver domains and support 
wired/wireless networks.  

Operational restart of a measured, non-persistent driver domain (instead of 
host) is a benefit of Xen disaggregation architectures.

Rich

Re: [Xen-devel] reboot driver domain, vifX.Y = NO-CARRIER?

2018-05-01 Thread Jason Cooper
add the link to xen-users thread of me talking to myself.  :-))

On Tue, May 01, 2018 at 12:37:51PM +, Jason Cooper wrote:
> When I was first digging into this, I started a thread on xen-users [1],
> I've attached my xl-reboot.sh script here so you can see exactly what
> I'm attempting to do:

[1] https://marc.info/?l=xen-users&m=152389443206023&w=2



Re: [Xen-devel] reboot driver domain, vifX.Y = NO-CARRIER?

2018-05-01 Thread Jason Cooper
Morning George,

On Tue, May 01, 2018 at 11:25:06AM +0100, George Dunlap wrote:
> On Mon, Apr 30, 2018 at 7:17 PM, Jason Cooper <x...@lakedaemon.net> wrote:
> > On Mon, Apr 30, 2018 at 05:38:55PM +0100, George Dunlap wrote:
> >> On Mon, Apr 30, 2018 at 5:16 PM, Jason Cooper <x...@lakedaemon.net> wrote:
> >> > On Mon, Apr 30, 2018 at 04:22:30PM +0100, Ian Jackson wrote:
> >> >> Wei Liu writes ("Re: [Xen-devel] reboot driver domain, vifX.Y = 
> >> >> NO-CARRIER?"):
> >> >> > To implement reuse_domid in a sane way, either the toolstack needs to
> >> >> > manage all domids and always sets domid when creating domain or the
> >> >> > hypervisor needs to cooperate -- to have interface to reserve /
> >> >> > pre-allocate domids.
> >> >>
> >> >> I think this is entirely the wrong approach.
> >> >
> >> > Whew.  Glad I didn't start hacking yet...
> >> >
> >> >> I think the right answer is that this is simply a bug in the
> >> >> frontends.  frontends should cope if the backend path pointer in the
> >> >> frontend directory is updated, and should start reading the new
> >> >> backend instead.
> >> >
> >> > Ok, so I'm new to the guts of Xen.  The bug, at a high level, is that
> >> > "When a driver domain is rebooted (domid changed), previously connected
> >> > client domUs can't gain network connectivity to/through the driver
> >> > domain via 'xl network-attach client_domu mac=... bridge=...
> >> > backend=drv_dom'"
> >>
> >> Hang on -- just to clarify, something like the following doesn't work
> >> (or wouldn't, you suspect, work)?
> >>
> >> * Start driver domain
> >> * Start domU A with no network
> >
> > My setup is different here.  I include the vif = [... backend=...]
> > declaration in my domain config.
> >
> >> * xl network-attach A backend=drv_dom
> >
> > So I don't do this step manually.
> 
> Right, but you do the detach manually (as well as the subsequent
> attach after the driver domain
> 
> >
> >> * [do some stuff]
> >> * xl network-detach A [network devid]
> >> * Restart driver domain
> >> * xl network-attach A backend=drv_dom
> [snip]
> > Sorry, I get NO-CARRIER in the just rebooted driver domain.  And the
> > interface is still UP in domU A.
> 
> Wait, that sounds like a different problem than the one we thought you
> were talking about.  You're saying that the driver domain is losing
> connection to the *physical* network after reboot?

No, this has nothing to do with the physical nic that is
pci-passthrough'd.  It's as my subject line says: vifX.Y gets
NO-CARRIER.  Here's a snippet from 'ip link'

12: vif20.1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq master br10 qlen 32
    link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
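
To spot affected interfaces quickly inside the driver domain, a small
filter like the following can help.  It is only a sketch: the function
name is made up, and it assumes the one-line-per-interface format that
`ip -o link` prints.

```shell
#!/bin/sh
# Sketch: print the names of vif interfaces whose flags include NO-CARRIER.
# Reads `ip -o link`-style lines on stdin, for example:
#   ip -o link show | no_carrier_vifs
no_carrier_vifs() {
    # Fields split on ": " are: index, interface name, flags and the rest.
    awk -F': ' '$2 ~ /^vif/ && $3 ~ /<[^>]*NO-CARRIER/ { print $2 }'
}
```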

> So what happens if you do the following:
> 
> * Boot your driver domain (but don't connect any guests)
> * From your driver domain, ping an off-host IP
> * Reboot the driver domain
> * Try pinging an off-host IP again
> 
> It sounds like maybe the second ping will fail?

I assume this is for debugging the (hopefully clarified) non-existent
problem with pci-passthrough.  fwiw, this particular driver domain is in
the middle of the diagram I did earlier in the thread.  It's a netfront
client to a driver domain which does have the pci-passthrough.

When I was first digging into this, I started a thread on xen-users [1],
I've attached my xl-reboot.sh script here so you can see exactly what
I'm attempting to do:

--->8
#!/bin/bash

if [ $# -ne 1 ]; then
    echo >&2 "Usage: ${0##*/} domain"
    exit 1
fi

DOM="$1"

# get the domain id
DOMID="`xl domid $DOM`"
[[ "$DOMID" =~ (^[0-9]+$) ]] || exit 1

tmp="`mktemp`"

# loop through frontends
while read frontend <&4; do
    while read vif <&5; do
        if [ "x$vif" = "x" ]; then
            # stale frontend
            echo >&2 "WARN: stale frontend ($frontend), removing"
            xenstore-rm /local/domain/$DOMID/backend/vif/$frontend
            continue
        fi

        # store info for afterwards
        front="`xl domname $frontend`"
        bridge="`xenstore-read /local/domain/$DOMID/backend/vif/$frontend/$vif/bridge`"
        if [ "x$front&quo

Re: [Xen-devel] reboot driver domain, vifX.Y = NO-CARRIER?

2018-05-01 Thread Wei Liu
On Mon, Apr 30, 2018 at 04:16:09PM +, Jason Cooper wrote:
> Hi Ian,
> 
> On Mon, Apr 30, 2018 at 04:22:30PM +0100, Ian Jackson wrote:
> > Wei Liu writes ("Re: [Xen-devel] reboot driver domain, vifX.Y = 
> > NO-CARRIER?"):
> > > To implement reuse_domid in a sane way, either the toolstack needs to
> > > manage all domids and always sets domid when creating domain or the
> > > hypervisor needs to cooperate -- to have interface to reserve /
> > > pre-allocate domids.
> > 
> > I think this is entirely the wrong approach.
> 
> Whew.  Glad I didn't start hacking yet...
> 
> > I think the right answer is that this is simply a bug in the
> > frontends.  frontends should cope if the backend path pointer in the
> > frontend directory is updated, and should start reading the new
> > backend instead.
> 
> Ok, so I'm new to the guts of Xen.  The bug, at a high level, is that
> "When a driver domain is rebooted (domid changed), previously connected
> client domUs can't gain network connectivity to/through the driver
> domain via 'xl network-attach client_domu mac=... bridge=...
> backend=drv_dom'"

This seems to be different from what I originally understood. I thought
you were just expecting the frontend to reconnect automatically.

At the risk of asking the obvious question: drv_dom is the name, not the
numeric domid, right?

> 
> This is due to the fact that the frontend net driver doesn't / can't
> follow the backend driver to the new domid in xenstore.
> 

This is strange. A new udev event should be initiated in DomU. It will
then scan xenstore for a _new_ network device. There should be a new
device from DomU's PoV, which means it doesn't need to know what the
backend domid is. This should already be handled by the core xenbus driver.

Also "backend-id" is already in a device's xenstore tree.
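
A quick way to check that from dom0 is to compare the backend-id recorded
in the frontend's directory with the driver domain's current domid.  The
sketch below does only that; the function name and the XS_READ / XL
overrides are invented for illustration.

```shell
#!/bin/sh
# Sketch: does a frontend's backend-id still match the driver domain's
# current domid?  XS_READ and XL are hypothetical hooks so the logic can
# be exercised without a Xen host.
XS_READ="${XS_READ:-xenstore-read}"
XL="${XL:-xl}"

# backend_is_current <frontend-domid> <devid> <driver-domain-name>
# Returns 0 if the recorded backend-id equals the domain's current domid.
backend_is_current() {
    recorded=$($XS_READ "/local/domain/$1/device/vif/$2/backend-id" 2>/dev/null)
    current=$($XL domid "$3" 2>/dev/null)
    [ -n "$recorded" ] && [ "$recorded" = "$current" ]
}

# Example (placeholders): backend_is_current 7 1 drv_dom || echo "stale backend-id"
```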

Wei.


Re: [Xen-devel] reboot driver domain, vifX.Y = NO-CARRIER?

2018-05-01 Thread Wei Liu
On Fri, Apr 27, 2018 at 05:27:29PM +, Jason Cooper wrote:
> Hi Wei Liu,
> 
> On Fri, Apr 27, 2018 at 05:58:17PM +0100, Wei Liu wrote:
> > On Fri, Apr 27, 2018 at 04:14:16PM +, Jason Cooper wrote:
> > > On Fri, Apr 27, 2018 at 04:52:57PM +0100, Andrew Cooper wrote:
> ...
> > > > xc_domain_create() takes a domid value by pointer.  Passing a value
> > > > other than zero will cause Xen to use that domid, rather than by
> > > > searching for the next free domid.
> > > > 
> > > > diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
> > > > index b5e27a7..7866092 100644
> > > > --- a/tools/libxl/libxl_create.c
> > > > +++ b/tools/libxl/libxl_create.c
> > > > @@ -583,6 +583,7 @@ int libxl__domain_make(libxl__gc *gc, libxl_domain_config *d_config,
> > > >  goto out;
> > > >  }
> > > >  
> > > > +    *domid = atoi(getenv("OVERRIDE_DOMID") ?: "0");
> > > >  ret = xc_domain_create(ctx->xch, info->ssidref, handle, flags, domid,
> > > >     &xc_config);
> > > >  if (ret < 0) {
> > > > 
> > > > This gross hack may get you somewhere (Entirely untested).
> > > 
> > > Gah!  Yep, that's just what I needed, thanks!  I don't suppose a patch
> > > series adding a 'domid' field to the domain config file would be
> > > rejected outright?  That would allow callers of xl to use key=value for
> > > reboot scripts like mine, and also allow for a static domid setup of the
> > > driver domains if folks want that.
> > 
> > Seems a bit hacky to me. You also need to reserve a set of domids
> > beforehand?
> 
> My thought of creating a domid config file variable was to do just as
> you say, reserve specific domids for specific guests.  I could even
> trigger an error if domid is set when driver_domain isn't.
> 
> Actually, I could slightly overload driver_domain, changing from a bool
> to a 'static domid'.  0 = not a driver domain, >0 is its static domid
> assignment.
> 
> For backwards compatibility, 1 = next domid available, and >1 would be
> the static domid.  I'm not sure if I like that though.
> 
> The racy part is when a driver domain is shut down, how does a create
> thread know that that domid is reserved?

If a driver domain shuts down and another domain gets allocated that
domain id, your whole system is hosed.

It is even worse if you consider the security implications: a
potentially malicious guest can impersonate the driver domain and see
other guests' data.

> 
> third option, tri-state:
> 
> driver_domain = 0   # not a driver domain
> driver_domain = 1   # is a driver domain, use next avail domid
> driver_domain = 2   # is a driver domain, re-use domid
> 

Let's shelve this UI discussion for now. I will have a look at the other
subthread.

Wei.

> Honestly, I'm not really liking any of these.  Perhaps 'xl
> network-detach ...' should be doing a better job of cleaning up?  Or,
> 'xl network-attach ...' should do a better job of re-attaching?
> 
> thx,
> 
> Jason.


Re: [Xen-devel] reboot driver domain, vifX.Y = NO-CARRIER?

2018-05-01 Thread Wei Liu
On Mon, Apr 30, 2018 at 06:14:15PM +, Jason Cooper wrote:
> On Mon, Apr 30, 2018 at 05:26:38PM +0100, Ian Jackson wrote:
> > Jason Cooper writes ("Re: [Xen-devel] reboot driver domain, vifX.Y = 
> > NO-CARRIER?"):
> > > On Mon, Apr 30, 2018 at 04:22:30PM +0100, Ian Jackson wrote:
> ...
> > > Ok, so I'm new to the guts of Xen.  The bug, at a high level, is that
> > > "When a driver domain is rebooted (domid changed), previously connected
> > > client domUs can't gain network connectivity to/through the driver
> > > domain via 'xl network-attach client_domu mac=... bridge=...
> > > backend=drv_dom'"
> > > 
> > > This is due to the fact that the frontend net driver doesn't / can't
> > > follow the backend driver to the new domid in xenstore.
> > 
> > Yes.
> > 
> > > > I'm a bit surprised that this doesn't already work.
> > > 
> > > I'm currently running Xen 4.9.1 as patched in the standard Gentoo
> > > ebuild.  I've been putting off upgrading to 4.9.2, now marked stable in
> > > portage, until I nail this down.  I'm happy to move to 4.10 if needed.
> > > 
> > > Do you think this is something that is definitely fixed in a more recent
> > > version of Xen?  I'm happy to test if so.  Is there a commit id I can
> > > look for?
> > 
> > I think that in my view (which others may disagree with) this is not a
> > bug in Xen but in the Linux kernel frontend.  So changing the Xen
> > version won't help.
> 
> I'm running vanilla v4.16.4 based on allnoconfig in all of these
> mini-domu's.  It doesn't look like there have been any pertinent recent
> changes in drivers/net/xen-netfront.c since v4.16.
> 
> Based on an initial scan of the code, it looks like xen-netback watches
> for hotplug events on the frontend (xen-netback/xenbus.c:1041-1046 in
> connect()).  xen-netfront.c:1995-2036, netback_changed(), is the
> registered callback for netfront.
> 
> Is the xenbus netback/netfront state machine documented anywhere?
> include/xen/interface/io/netif.h has a great description of tx/rx queue
> setup and teardown, but doesn't seem to have anything specific to the
> high-level signalling that 'xl network-attach' would cause.
> 

Netback state machine is in
drivers/net/xen-netback/xenbus.c:set_backend_state.

But honestly I don't think that solves the general issue. It is a bit
unfortunate that Xen drivers don't have a unified state machine.

Wei.


Re: [Xen-devel] reboot driver domain, vifX.Y = NO-CARRIER?

2018-04-30 Thread Jason Cooper
correction:

On Mon, Apr 30, 2018 at 06:17:54PM +, Jason Cooper wrote:
> On Mon, Apr 30, 2018 at 05:38:55PM +0100, George Dunlap wrote:
> > On Mon, Apr 30, 2018 at 5:16 PM, Jason Cooper <x...@lakedaemon.net> wrote:
> > > Hi Ian,
> > >
> > > On Mon, Apr 30, 2018 at 04:22:30PM +0100, Ian Jackson wrote:
> > >> Wei Liu writes ("Re: [Xen-devel] reboot driver domain, vifX.Y = 
> > >> NO-CARRIER?"):
> > >> > To implement reuse_domid in a sane way, either the toolstack needs to
> > >> > manage all domids and always sets domid when creating domain or the
> > >> > hypervisor needs to cooperate -- to have interface to reserve /
> > >> > pre-allocate domids.
> > >>
> > >> I think this is entirely the wrong approach.
> > >
> > > Whew.  Glad I didn't start hacking yet...
> > >
> > >> I think the right answer is that this is simply a bug in the
> > >> frontends.  frontends should cope if the backend path pointer in the
> > >> frontend directory is updated, and should start reading the new
> > >> backend instead.
> > >
> > > Ok, so I'm new to the guts of Xen.  The bug, at a high level, is that
> > > "When a driver domain is rebooted (domid changed), previously connected
> > > client domUs can't gain network connectivity to/through the driver
> > > domain via 'xl network-attach client_domu mac=... bridge=...
> > > backend=drv_dom'"
> > 
> > Hang on -- just to clarify, something like the following doesn't work
> > (or wouldn't, you suspect, work)?
> > 
> > * Start driver domain
> > * Start domU A with no network
> 
> My setup is different here.  I include the vif = [... backend=...]
> declaration in my domain config.
> 
> > * xl network-attach A backend=drv_dom
> 
> So I don't do this step manually.
> 
> > * [do some stuff]
> > * xl network-detach A [network devid]
> > * Restart driver domain
> > * xl network-attach A backend=drv_dom
> 
> Otherwise, this is all correct.  Then I get the NO-CARRIER in domU A.

Sorry, I get NO-CARRIER in the just rebooted driver domain.  And the
interface is still UP in domU A.

thx,

Jason.


Re: [Xen-devel] reboot driver domain, vifX.Y = NO-CARRIER?

2018-04-30 Thread Jason Cooper
Hi George,

On Mon, Apr 30, 2018 at 05:38:55PM +0100, George Dunlap wrote:
> On Mon, Apr 30, 2018 at 5:16 PM, Jason Cooper <x...@lakedaemon.net> wrote:
> > Hi Ian,
> >
> > On Mon, Apr 30, 2018 at 04:22:30PM +0100, Ian Jackson wrote:
> >> Wei Liu writes ("Re: [Xen-devel] reboot driver domain, vifX.Y = 
> >> NO-CARRIER?"):
> >> > To implement reuse_domid in a sane way, either the toolstack needs to
> >> > manage all domids and always sets domid when creating domain or the
> >> > hypervisor needs to cooperate -- to have interface to reserve /
> >> > pre-allocate domids.
> >>
> >> I think this is entirely the wrong approach.
> >
> > Whew.  Glad I didn't start hacking yet...
> >
> >> I think the right answer is that this is simply a bug in the
> >> frontends.  frontends should cope if the backend path pointer in the
> >> frontend directory is updated, and should start reading the new
> >> backend instead.
> >
> > Ok, so I'm new to the guts of Xen.  The bug, at a high level, is that
> > "When a driver domain is rebooted (domid changed), previously connected
> > client domUs can't gain network connectivity to/through the driver
> > domain via 'xl network-attach client_domu mac=... bridge=...
> > backend=drv_dom'"
> 
> Hang on -- just to clarify, something like the following doesn't work
> (or wouldn't, you suspect, work)?
> 
> * Start driver domain
> * Start domU A with no network

My setup is different here.  I include the vif = [... backend=...]
declaration in my domain config.

> * xl network-attach A backend=drv_dom

So I don't do this step manually.

> * [do some stuff]
> * xl network-detach A [network devid]
> * Restart driver domain
> * xl network-attach A backend=drv_dom

Otherwise, this is all correct.  Then I get the NO-CARRIER in domU A.

thx,

Jason.


Re: [Xen-devel] reboot driver domain, vifX.Y = NO-CARRIER?

2018-04-30 Thread Jason Cooper
On Mon, Apr 30, 2018 at 05:26:38PM +0100, Ian Jackson wrote:
> Jason Cooper writes ("Re: [Xen-devel] reboot driver domain, vifX.Y = 
> NO-CARRIER?"):
> > On Mon, Apr 30, 2018 at 04:22:30PM +0100, Ian Jackson wrote:
...
> > Ok, so I'm new to the guts of Xen.  The bug, at a high level, is that
> > "When a driver domain is rebooted (domid changed), previously connected
> > client domUs can't gain network connectivity to/through the driver
> > domain via 'xl network-attach client_domu mac=... bridge=...
> > backend=drv_dom'"
> > 
> > This is due to the fact that the frontend net driver doesn't / can't
> > follow the backend driver to the new domid in xenstore.
> 
> Yes.
> 
> > > I'm a bit surprised that this doesn't already work.
> > 
> > I'm currently running Xen 4.9.1 as patched in the standard Gentoo
> > ebuild.  I've been putting off upgrading to 4.9.2, now marked stable in
> > portage, until I nail this down.  I'm happy to move to 4.10 if needed.
> > 
> > Do you think this is something that is definitely fixed in a more recent
> > version of Xen?  I'm happy to test if so.  Is there a commit id I can
> > look for?
> 
> I think that in my view (which others may disagree with) this is not a
> bug in Xen but in the Linux kernel frontend.  So changing the Xen
> version won't help.

I'm running vanilla v4.16.4 based on allnoconfig in all of these
mini-domu's.  It doesn't look like there have been any pertinent recent
changes in drivers/net/xen-netfront.c since v4.16.

Based on an initial scan of the code, it looks like xen-netback watches
for hotplug events on the frontend (xen-netback/xenbus.c:1041-1046 in
connect()).  xen-netfront.c:1995-2036, netback_changed(), is the
registered callback for netfront.

Is the xenbus netback/netfront state machine documented anywhere?
include/xen/interface/io/netif.h has a great description of tx/rx queue
setup and teardown, but doesn't seem to have anything specific to the
high-level signalling that 'xl network-attach' would cause.

Any pointers?

thx,

Jason.


Re: [Xen-devel] reboot driver domain, vifX.Y = NO-CARRIER?

2018-04-30 Thread George Dunlap
On Mon, Apr 30, 2018 at 5:16 PM, Jason Cooper <x...@lakedaemon.net> wrote:
> Hi Ian,
>
> On Mon, Apr 30, 2018 at 04:22:30PM +0100, Ian Jackson wrote:
>> Wei Liu writes ("Re: [Xen-devel] reboot driver domain, vifX.Y = 
>> NO-CARRIER?"):
>> > To implement reuse_domid in a sane way, either the toolstack needs to
>> > manage all domids and always sets domid when creating domain or the
>> > hypervisor needs to cooperate -- to have interface to reserve /
>> > pre-allocate domids.
>>
>> I think this is entirely the wrong approach.
>
> Whew.  Glad I didn't start hacking yet...
>
>> I think the right answer is that this is simply a bug in the
>> frontends.  frontends should cope if the backend path pointer in the
>> frontend directory is updated, and should start reading the new
>> backend instead.
>
> Ok, so I'm new to the guts of Xen.  The bug, at a high level, is that
> "When a driver domain is rebooted (domid changed), previously connected
> client domUs can't gain network connectivity to/through the driver
> domain via 'xl network-attach client_domu mac=... bridge=...
> backend=drv_dom'"

Hang on -- just to clarify, something like the following doesn't work
(or wouldn't, you suspect, work)?

* Start driver domain
* Start domU A with no network
* xl network-attach A backend=drv_dom
* [do some stuff]
* xl network-detach A [network devid]
* Restart driver domain
* xl network-attach A backend=drv_dom
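
Spelled out as commands, that sequence is roughly the sketch below.  The
config paths and domain names are placeholders, "restart" is assumed to
mean a shutdown followed by a fresh create, and RUN is a made-up hook so
the steps can be printed rather than executed.

```shell
#!/bin/sh
# Sketch of the sequence above.  RUN is a hypothetical hook: RUN=echo
# prints the commands instead of running them.
RUN="${RUN:-}"

# repro <drv-cfg> <drv-name> <guest-cfg> <guest-name> <devid>
repro() {
    $RUN xl create "$1" &&                          # start driver domain
    $RUN xl create "$3" &&                          # start domU A with no network
    $RUN xl network-attach "$4" "backend=$2" &&
    # [do some stuff]
    $RUN xl network-detach "$4" "$5" &&
    $RUN xl shutdown -w "$2" &&                     # restart driver domain:
    $RUN xl create "$1" &&                          #   wait for shutdown, then recreate
    $RUN xl network-attach "$4" "backend=$2"
}

# Example (placeholders):
#   RUN=echo repro /etc/xen/drv_dom.cfg drv_dom /etc/xen/A.cfg A 0
```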

 -George


Re: [Xen-devel] reboot driver domain, vifX.Y = NO-CARRIER?

2018-04-30 Thread Ian Jackson
Jason Cooper writes ("Re: [Xen-devel] reboot driver domain, vifX.Y = 
NO-CARRIER?"):
> On Mon, Apr 30, 2018 at 04:22:30PM +0100, Ian Jackson wrote:
> > Wei Liu writes ("Re: [Xen-devel] reboot driver domain, vifX.Y = 
> > NO-CARRIER?"):
> > > To implement reuse_domid in a sane way, either the toolstack needs to
> > > manage all domids and always sets domid when creating domain or the
> > > hypervisor needs to cooperate -- to have interface to reserve /
> > > pre-allocate domids.
> > 
> > I think this is entirely the wrong approach.
> 
> Whew.  Glad I didn't start hacking yet...

Well, it might be that you end up having to use this fixed-domid thing
as a workaround :-/.

> > I think the right answer is that this is simply a bug in the
> > frontends.  frontends should cope if the backend path pointer in the
> > frontend directory is updated, and should start reading the new
> > backend instead.
> 
> Ok, so I'm new to the guts of Xen.  The bug, at a high level, is that
> "When a driver domain is rebooted (domid changed), previously connected
> client domUs can't gain network connectivity to/through the driver
> domain via 'xl network-attach client_domu mac=... bridge=...
> backend=drv_dom'"
> 
> This is due to the fact that the frontend net driver doesn't / can't
> follow the backend driver to the new domid in xenstore.

Yes.

> > I'm a bit surprised that this doesn't already work.
> 
> I'm currently running Xen 4.9.1 as patched in the standard Gentoo
> ebuild.  I've been putting off upgrading to 4.9.2, now marked stable in
> portage, until I nail this down.  I'm happy to move to 4.10 if needed.
> 
> Do you think this is something that is definitely fixed in a more recent
> version of Xen?  I'm happy to test if so.  Is there a commit id I can
> look for?

I think that in my view (which others may disagree with) this is not a
bug in Xen but in the Linux kernel frontend.  So changing the Xen
version won't help.

Ian.

Re: [Xen-devel] reboot driver domain, vifX.Y = NO-CARRIER?

2018-04-30 Thread Jason Cooper
Hi Ian,

On Mon, Apr 30, 2018 at 04:22:30PM +0100, Ian Jackson wrote:
> Wei Liu writes ("Re: [Xen-devel] reboot driver domain, vifX.Y = NO-CARRIER?"):
> > To implement reuse_domid in a sane way, either the toolstack needs to
> > manage all domids and always sets domid when creating domain or the
> > hypervisor needs to cooperate -- to have interface to reserve /
> > pre-allocate domids.
> 
> I think this is entirely the wrong approach.

Whew.  Glad I didn't start hacking yet...

> I think the right answer is that this is simply a bug in the
> frontends.  frontends should cope if the backend path pointer in the
> frontend directory is updated, and should start reading the new
> backend instead.

Ok, so I'm new to the guts of Xen.  The bug, at a high level, is that
"When a driver domain is rebooted (domid changed), previously connected
client domUs can't gain network connectivity to/through the driver
domain via 'xl network-attach client_domu mac=... bridge=...
backend=drv_dom'"

This is due to the fact that the frontend net driver doesn't / can't
follow the backend driver to the new domid in xenstore.

Does that sound right?

> I'm a bit surprised that this doesn't already work.

I'm currently running Xen 4.9.1 as patched in the standard Gentoo
ebuild.  I've been putting off upgrading to 4.9.2, now marked stable in
portage, until I nail this down.  I'm happy to move to 4.10 if needed.

Do you think this is something that is definitely fixed in a more recent
version of Xen?  I'm happy to test if so.  Is there a commit id I can
look for?


thx,

Jason.

Re: [Xen-devel] reboot driver domain, vifX.Y = NO-CARRIER?

2018-04-30 Thread Ian Jackson
Wei Liu writes ("Re: [Xen-devel] reboot driver domain, vifX.Y = NO-CARRIER?"):
> To implement reuse_domid in a sane way, either the toolstack needs to
> manage all domids and always sets domid when creating domain or the
> hypervisor needs to cooperate -- to have interface to reserve /
> pre-allocate domids.

I think this is entirely the wrong approach.

I think the right answer is that this is simply a bug in the
frontends.  frontends should cope if the backend path pointer in the
frontend directory is updated, and should start reading the new
backend instead.

I'm a bit surprised that this doesn't already work.
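
A minimal sketch of the behaviour described above, assuming a frontend that caches the value of its "backend" key and is notified when that key is rewritten. The struct and function names are hypothetical and are not taken from the Linux netfront driver; the point is only that a changed pointer should trigger a reconnect instead of being ignored.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical frontend state: the backend path it last connected to. */
struct vif_frontend {
    char backend[128];  /* cached value of the frontend's "backend" key */
    int connected;      /* 1 while rings/event channels are set up */
};

/*
 * Called when the "backend" key in the frontend directory is (re)written.
 * Returns 0 if the pointer is unchanged, 1 if the frontend dropped its old
 * connection and must reconnect to the new backend, -1 if the path is too
 * long to cache.
 */
static int frontend_backend_changed(struct vif_frontend *fe,
                                    const char *new_backend)
{
    assert(fe != NULL && new_backend != NULL);

    if (strcmp(fe->backend, new_backend) == 0)
        return 0;                       /* same backend, nothing to do */
    if (strlen(new_backend) >= sizeof fe->backend)
        return -1;

    strcpy(fe->backend, new_backend);   /* follow the new pointer */
    fe->connected = 0;                  /* old connection targets the old domid */
    return 1;
}
```

A caller would re-run its normal connect sequence whenever this returns 1.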

Ian.

Re: [Xen-devel] reboot driver domain, vifX.Y = NO-CARRIER?

2018-04-27 Thread Jason Cooper
Hi Wei Liu,

On Fri, Apr 27, 2018 at 05:58:17PM +0100, Wei Liu wrote:
> On Fri, Apr 27, 2018 at 04:14:16PM +, Jason Cooper wrote:
> > On Fri, Apr 27, 2018 at 04:52:57PM +0100, Andrew Cooper wrote:
...
> > > xc_domain_create() takes a domid value by pointer.  Passing a value
> > > other than zero will cause Xen to use that domid, rather than by
> > > searching for the next free domid.
> > > 
> > > diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
> > > index b5e27a7..7866092 100644
> > > --- a/tools/libxl/libxl_create.c
> > > +++ b/tools/libxl/libxl_create.c
> > > @@ -583,6 +583,7 @@ int libxl__domain_make(libxl__gc *gc,
> > > libxl_domain_config *d_config,
> > >  goto out;
> > >  }
> > >  
> > > +    *domid = atoi(getenv("OVERRIDE_DOMID") ?: "0");
> > >  ret = xc_domain_create(ctx->xch, info->ssidref, handle, flags, 
> > > domid,
> > >     _config);
> > >  if (ret < 0) {
> > > 
> > > This gross hack may get you somewhere (Entirely untested).
> > 
> > Gah!  Yep, that's just what I needed, thanks!  I don't suppose a patch
> > series adding a 'domid' field to the domain config file would be
> > rejected outright?  That would allow callers of xl to use key=value for
> > reboot scripts like mine, and also allow for a static domid setup of the
> > driver domains if folks want that.
> 
> Seems a bit hacky to me. You also need to reserve a set of domids
> beforehand?

My thought of creating a domid config file variable was to do just as
you say, reserve specific domids for specific guests.  I could even
trigger an error if domid is set when driver_domain isn't.

Actually, I could slightly overload driver_domain, changing from a bool
to a 'static domid'.  0 = not a driver domain, >0 is its static domid
assignment.

For backwards compatibility, 1 = next domid available, and >1 would be
the static domid.  I'm not sure if I like that though.

The racy part is when a driver domain is shut down, how does a create
thread know that that domid is reserved?

Third option, tri-state:

driver_domain = 0   # not a driver domain
driver_domain = 1   # is a driver domain, use next avail domid
driver_domain = 2   # is a driver domain, re-use domid

Honestly, I'm not really liking any of these.  Perhaps 'xl
network-detach ...' should be doing a better job of cleaning up?  Or,
'xl network-attach ...' should do a better job of re-attaching?

thx,

Jason.

Re: [Xen-devel] reboot driver domain, vifX.Y = NO-CARRIER?

2018-04-27 Thread Wei Liu
On Fri, Apr 27, 2018 at 06:02:46PM +0100, Andrew Cooper wrote:
> On 27/04/18 17:14, Jason Cooper wrote:
> > On Fri, Apr 27, 2018 at 04:52:57PM +0100, Andrew Cooper wrote:
> >> On 27/04/18 16:35, Jason Cooper wrote:
> >>> On Fri, Apr 27, 2018 at 04:11:39PM +0100, Andrew Cooper wrote:
> >>>> On 27/04/18 16:03, Jason Cooper wrote:
> >>>>> The problem occurs when I reboot a driver domain.  Regardless of the
> >>>>> type of guest attached to it, I'm unable to re-establish connectivity
> >>>>> between the driver domain and the re-attached guest.  e.g. I reboot
> >>>>> GW/FW, then re-attach VM1, VM2 and the rest.  No matter how I do it, I
> >>>>> get:
> >>>>>
> >>>>> $ ip link
> >>>>> ...
> >>>>> 11: vif20.1:  mtu 1500 qdisc mq master br10 qlen 32
> >>>>> link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
> >>>>>
> >>>>> In the driver domain.  At this point, absolutely no packets flow between
> >>>>> the two VMs.  Not even ARP.  The only solution, so far, is to unnecessarily
> >>>>> reboot the PV guests.  After that, networking is fine.
> >>>>>
> >>>>> Any thoughts?
> >>>> The underlying problem is that the frontend/backend setup in xenstore
> >>>> encodes the domid in path, and changing that isn't transparent to the
> >>>> guest at all.
> >>> Oh joy.  Would seem to make more sense to use the domain name or the
> >>> uuid...
> >> domids are also used in the grant and event hypercall interfaces with Xen.
> >>
> >> There is no way this horse is being put back in its stable...
> > :-(
> >
> >>>> The best idea we came up with was to reboot the driver domain and reuse
> >>>> its old domid, at which point all the xenstore paths would remain
> >>>> valid.  There is support in Xen for explicitly choosing the domid of a
> >>>> domain, but I don't think that it is wired up sensibly in xl.
> >>> hmmm, yes.  It's not wired up at all afaict.  Mind giving me a hint on
> >>> how to reuse the domid?
> >> xc_domain_create() takes a domid value by pointer.  Passing a value
> >> other than zero will cause Xen to use that domid, rather than by
> >> searching for the next free domid.
> >>
> >> diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
> >> index b5e27a7..7866092 100644
> >> --- a/tools/libxl/libxl_create.c
> >> +++ b/tools/libxl/libxl_create.c
> >> @@ -583,6 +583,7 @@ int libxl__domain_make(libxl__gc *gc,
> >> libxl_domain_config *d_config,
> >>  goto out;
> >>  }
> >>  
> >> +    *domid = atoi(getenv("OVERRIDE_DOMID") ?: "0");
> >>  ret = xc_domain_create(ctx->xch, info->ssidref, handle, flags, 
> >> domid,
> >>     _config);
> >>  if (ret < 0) {
> >>
> >> This gross hack may get you somewhere (Entirely untested).
> > Gah!  Yep, that's just what I needed, thanks!  I don't suppose a patch
> > series adding a 'domid' field to the domain config file would be
> > rejected outright?  That would allow callers of xl to use key=value for
> > reboot scripts like mine, and also allow for a static domid setup of the
> > driver domains if folks want that.
> 
> That question would have to be deferred to the toolstack maintainers,
> but some ability to manage exact domids would be a very good thing.
> 
> Having a domid= field would allow for very fine-grained control, but
> probably more control than most people want.  Alternatively, having some
> kind of "reuse_domid" field which booted the domain normally once,
> recorded its domid, and reused that on reboot might be rather more useful.
> 

To implement reuse_domid in a sane way, either the toolstack needs to
manage all domids and always set the domid when creating a domain, or the
hypervisor needs to cooperate -- to have an interface to reserve /
pre-allocate domids.

Either should be doable. We should think a bit more which approach is
better.
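
As a strawman for the toolstack-managed variant, something like the table below.  Everything here is hypothetical: the names, the tiny table size, and the lowest-free allocation policy are only for illustration and do not reflect how Xen or libxl hand out domids today.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_DOMID 64u   /* tiny table, for illustration only */

struct domid_table {
    uint8_t in_use[SKETCH_MAX_DOMID];    /* 1 = a running domain holds it */
    uint8_t reserved[SKETCH_MAX_DOMID];  /* 1 = held for one driver domain */
};

static void domid_table_init(struct domid_table *t)
{
    assert(t != NULL);
    memset(t, 0, sizeof *t);
    t->in_use[0] = 1;                    /* domid 0 is dom0 */
}

/* Mark a domid as reserved so ordinary creates never receive it. */
static int domid_reserve(struct domid_table *t, uint32_t domid)
{
    if (domid == 0 || domid >= SKETCH_MAX_DOMID || t->reserved[domid])
        return -1;
    t->reserved[domid] = 1;
    return 0;
}

/* Ordinary create: lowest free, non-reserved domid, or 0 if none is left. */
static uint32_t domid_alloc(struct domid_table *t)
{
    for (uint32_t d = 1; d < SKETCH_MAX_DOMID; d++) {
        if (!t->in_use[d] && !t->reserved[d]) {
            t->in_use[d] = 1;
            return d;
        }
    }
    return 0;
}

/* Driver domain (re)start: take the reserved domid if it is not running. */
static int domid_claim_reserved(struct domid_table *t, uint32_t domid)
{
    if (domid >= SKETCH_MAX_DOMID || !t->reserved[domid] || t->in_use[domid])
        return -1;
    t->in_use[domid] = 1;
    return 0;
}

/* Domain destroyed: the domid is free again, but stays reserved if it was. */
static void domid_release(struct domid_table *t, uint32_t domid)
{
    if (domid > 0 && domid < SKETCH_MAX_DOMID)
        t->in_use[domid] = 0;
}
```

The reservation outlives the domain, so a create that runs while the driver domain is down skips its domid.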

Wei.

Re: [Xen-devel] reboot driver domain, vifX.Y = NO-CARRIER?

2018-04-27 Thread Andrew Cooper
On 27/04/18 17:14, Jason Cooper wrote:
> On Fri, Apr 27, 2018 at 04:52:57PM +0100, Andrew Cooper wrote:
>> On 27/04/18 16:35, Jason Cooper wrote:
>>> On Fri, Apr 27, 2018 at 04:11:39PM +0100, Andrew Cooper wrote:
>>>> On 27/04/18 16:03, Jason Cooper wrote:
>>>>> The problem occurs when I reboot a driver domain.  Regardless of the
>>>>> type of guest attached to it, I'm unable to re-establish connectivity
>>>>> between the driver domain and the re-attached guest.  e.g. I reboot
>>>>> GW/FW, then re-attach VM1, VM2 and the rest.  No matter how I do it, I
>>>>> get:
>>>>>
>>>>> $ ip link
>>>>> ...
>>>>> 11: vif20.1:  mtu 1500 qdisc mq master br10 qlen 32
>>>>> link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
>>>>>
>>>>> In the driver domain.  At this point, absolutely no packets flow between
>>>>> the two VMs.  Not even ARP.  The only solution, so far, is to unnecessarily
>>>>> reboot the PV guests.  After that, networking is fine.
>>>>>
>>>>> Any thoughts?
>>>> The underlying problem is that the frontend/backend setup in xenstore
>>>> encodes the domid in path, and changing that isn't transparent to the
>>>> guest at all.
>>> Oh joy.  Would seem to make more sense to use the domain name or the
>>> uuid...
>> domids are also used in the grant and event hypercall interfaces with Xen.
>>
>> There is no way this horse is being put back in its stable...
> :-(
>
>>>> The best idea we came up with was to reboot the driver domain and reuse
>>>> its old domid, at which point all the xenstore paths would remain
>>>> valid.  There is support in Xen for explicitly choosing the domid of a
>>>> domain, but I don't think that it is wired up sensibly in xl.
>>> hmmm, yes.  It's not wired up at all afaict.  Mind giving me a hint on
>>> how to reuse the domid?
>> xc_domain_create() takes a domid value by pointer.  Passing a value
>> other than zero will cause Xen to use that domid, rather than by
>> searching for the next free domid.
>>
>> diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
>> index b5e27a7..7866092 100644
>> --- a/tools/libxl/libxl_create.c
>> +++ b/tools/libxl/libxl_create.c
>> @@ -583,6 +583,7 @@ int libxl__domain_make(libxl__gc *gc,
>> libxl_domain_config *d_config,
>>  goto out;
>>  }
>>  
>> +    *domid = atoi(getenv("OVERRIDE_DOMID") ?: "0");
>>  ret = xc_domain_create(ctx->xch, info->ssidref, handle, flags, 
>> domid,
>>     _config);
>>  if (ret < 0) {
>>
>> This gross hack may get you somewhere (Entirely untested).
> Gah!  Yep, that's just what I needed, thanks!  I don't suppose a patch
> series adding a 'domid' field to the domain config file would be
> rejected outright?  That would allow callers of xl to use key=value for
> reboot scripts like mine, and also allow for a static domid setup of the
> driver domains if folks want that.

That question would have to be deferred to the toolstack maintainers,
but some ability to manage exact domids would be a very good thing.

Having a domid= field would allow for very fine-grained control, but
probably more control than most people want.  Alternatively, having some
kind of "reuse_domid" field which booted the domain normally once,
recorded its domid, and reused that on reboot might be rather more useful.

~Andrew

Re: [Xen-devel] reboot driver domain, vifX.Y = NO-CARRIER?

2018-04-27 Thread Wei Liu
On Fri, Apr 27, 2018 at 04:14:16PM +, Jason Cooper wrote:
> On Fri, Apr 27, 2018 at 04:52:57PM +0100, Andrew Cooper wrote:
> > On 27/04/18 16:35, Jason Cooper wrote:
> > > On Fri, Apr 27, 2018 at 04:11:39PM +0100, Andrew Cooper wrote:
> > >> On 27/04/18 16:03, Jason Cooper wrote:
> > >>> The problem occurs when I reboot a driver domain.  Regardless of the
> > >>> type of guest attached to it, I'm unable to re-establish connectivity
> > >>> between the driver domain and the re-attached guest.  e.g. I reboot
> > >>> GW/FW, then re-attach VM1, VM2 and the rest.  No matter how I do it, I
> > >>> get:
> > >>>
> > >>> $ ip link
> > >>> ...
> > >>> 11: vif20.1:  mtu 1500 qdisc mq master br10 qlen 32
> > >>> link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
> > >>>
> > >>> In the driver domain.  At this point, absolutely no packets flow between
> > >>> the two VMs.  Not even ARP.  The only solution, so far, is to unnecessarily
> > >>> reboot the PV guests.  After that, networking is fine.
> > >>>
> > >>> Any thoughts?
> > >> The underlying problem is that the frontend/backend setup in xenstore
> > >> encodes the domid in path, and changing that isn't transparent to the
> > >> guest at all.
> > > Oh joy.  Would seem to make more sense to use the domain name or the
> > > uuid...
> > 
> > domids are also used in the grant and event hypercall interfaces with Xen.
> > 
> > There is no way this horse is being put back in its stable...
> 
> :-(
> 
> > >> The best idea we came up with was to reboot the driver domain and reuse
> > >> its old domid, at which point all the xenstore paths would remain
> > >> valid.  There is support in Xen for explicitly choosing the domid of a
> > >> domain, but I don't think that it is wired up sensibly in xl.
> > > hmmm, yes.  It's not wired up at all afaict.  Mind giving me a hint on
> > > how to reuse the domid?
> > 
> > xc_domain_create() takes a domid value by pointer.  Passing a value
> > other than zero will cause Xen to use that domid, rather than by
> > searching for the next free domid.
> > 
> > diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
> > index b5e27a7..7866092 100644
> > --- a/tools/libxl/libxl_create.c
> > +++ b/tools/libxl/libxl_create.c
> > @@ -583,6 +583,7 @@ int libxl__domain_make(libxl__gc *gc,
> > libxl_domain_config *d_config,
> >  goto out;
> >  }
> >  
> > +    *domid = atoi(getenv("OVERRIDE_DOMID") ?: "0");
> >  ret = xc_domain_create(ctx->xch, info->ssidref, handle, flags, 
> > domid,
> >     _config);
> >  if (ret < 0) {
> > 
> > This gross hack may get you somewhere (Entirely untested).
> 
> Gah!  Yep, that's just what I needed, thanks!  I don't suppose a patch
> series adding a 'domid' field to the domain config file would be
> rejected outright?  That would allow callers of xl to use key=value for
> reboot scripts like mine, and also allow for a static domid setup of the
> driver domains if folks want that.

Seems a bit hacky to me. You also need to reserve a set of domids
beforehand?

Wei.

Re: [Xen-devel] reboot driver domain, vifX.Y = NO-CARRIER?

2018-04-27 Thread Wei Liu
On Fri, Apr 27, 2018 at 04:52:57PM +0100, Andrew Cooper wrote:
> On 27/04/18 16:35, Jason Cooper wrote:
> > Hi Andrew,
> >
> > On Fri, Apr 27, 2018 at 04:11:39PM +0100, Andrew Cooper wrote:
> >> On 27/04/18 16:03, Jason Cooper wrote:
> >>> The problem occurs when I reboot a driver domain.  Regardless of the
> >>> type of guest attached to it, I'm unable to re-establish connectivity
> >>> between the driver domain and the re-attached guest.  e.g. I reboot
> >>> GW/FW, then re-attach VM1, VM2 and the rest.  No matter how I do it, I
> >>> get:
> >>>
> >>> $ ip link
> >>> ...
> >>> 11: vif20.1:  mtu 1500 qdisc mq master br10 qlen 32
> >>> link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
> >>>
> >>> In the driver domain.  At this point, absolutely no packets flow between
> >>> the two VMs.  Not even ARP.  The only solution, so far, is to unnecessarily
> >>> reboot the PV guests.  After that, networking is fine.
> >>>
> >>> Any thoughts?
> >> The underlying problem is that the frontend/backend setup in xenstore
> >> encodes the domid in path, and changing that isn't transparent to the
> >> guest at all.
> > Oh joy.  Would seem to make more sense to use the domain name or the
> > uuid...
> 
> domids are also used in the grant and event hypercall interfaces with Xen.

If the frontend manages to go through disconnect/reconnect cycle, grant
table and event channel aren't going to be a problem?

Wei.

Re: [Xen-devel] reboot driver domain, vifX.Y = NO-CARRIER?

2018-04-27 Thread Jason Cooper
On Fri, Apr 27, 2018 at 04:52:57PM +0100, Andrew Cooper wrote:
> On 27/04/18 16:35, Jason Cooper wrote:
> > On Fri, Apr 27, 2018 at 04:11:39PM +0100, Andrew Cooper wrote:
> >> On 27/04/18 16:03, Jason Cooper wrote:
> >>> The problem occurs when I reboot a driver domain.  Regardless of the
> >>> type of guest attached to it, I'm unable to re-establish connectivity
> >>> between the driver domain and the re-attached guest.  e.g. I reboot
> >>> GW/FW, then re-attach VM1, VM2 and the rest.  No matter how I do it, I
> >>> get:
> >>>
> >>> $ ip link
> >>> ...
> >>> 11: vif20.1:  mtu 1500 qdisc mq master br10 qlen 32
> >>> link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
> >>>
> >>> In the driver domain.  At this point, absolutely no packets flow between
> >>> the two VMs.  Not even ARP.  The only solution, so far, is to unnecessarily
> >>> reboot the PV guests.  After that, networking is fine.
> >>>
> >>> Any thoughts?
> >> The underlying problem is that the frontend/backend setup in xenstore
> >> encodes the domid in path, and changing that isn't transparent to the
> >> guest at all.
> > Oh joy.  Would seem to make more sense to use the domain name or the
> > uuid...
> 
> domids are also used in the grant and event hypercall interfaces with Xen.
> 
> There is no way this horse is being put back in its stable...

:-(

> >> The best idea we came up with was to reboot the driver domain and reuse
> >> its old domid, at which point all the xenstore paths would remain
> >> valid.  There is support in Xen for explicitly choosing the domid of a
> >> domain, but I don't think that it is wired up sensibly in xl.
> > hmmm, yes.  It's not wired up at all afaict.  Mind giving me a hint on
> > how to reuse the domid?
> 
> xc_domain_create() takes a domid value by pointer.  Passing a value
> other than zero will cause Xen to use that domid, rather than by
> searching for the next free domid.
> 
> diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
> index b5e27a7..7866092 100644
> --- a/tools/libxl/libxl_create.c
> +++ b/tools/libxl/libxl_create.c
> @@ -583,6 +583,7 @@ int libxl__domain_make(libxl__gc *gc,
> libxl_domain_config *d_config,
>  goto out;
>  }
>  
> +    *domid = atoi(getenv("OVERRIDE_DOMID") ?: "0");
>  ret = xc_domain_create(ctx->xch, info->ssidref, handle, flags, domid,
>     _config);
>  if (ret < 0) {
> 
> This gross hack may get you somewhere (Entirely untested).

Gah!  Yep, that's just what I needed, thanks!  I don't suppose a patch
series adding a 'domid' field to the domain config file would be
rejected outright?  That would allow callers of xl to use key=value for
reboot scripts like mine, and also allow for a static domid setup of the
driver domains if folks want that.
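
If a domid value ever came from a config file or the environment, it would want stricter parsing than the atoi() in the hack above.  A minimal sketch, assuming 0 keeps meaning "let Xen pick" and that guest domids must stay below 0x7FF0 (my understanding of DOMID_FIRST_RESERVED in the public headers; treat the constant as an assumption).  The function name is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed upper bound for guest domids (exclusive). */
#define SKETCH_DOMID_FIRST_RESERVED 0x7FF0ul

/*
 * Parse a domid override string.
 * NULL or "" -> *out = 0 (let Xen choose), returns 0.
 * Decimal domid below the reserved range -> stored in *out, returns 0.
 * Anything else (sign, trailing junk, out of range) -> returns -1.
 */
static int parse_domid_override(const char *s, uint32_t *out)
{
    char *end;
    unsigned long v;

    assert(out != NULL);
    if (s == NULL || *s == '\0') {
        *out = 0;
        return 0;
    }
    if (*s < '0' || *s > '9')            /* reject signs and whitespace */
        return -1;

    errno = 0;
    v = strtoul(s, &end, 10);
    if (errno != 0 || *end != '\0' || v >= SKETCH_DOMID_FIRST_RESERVED)
        return -1;

    *out = (uint32_t)v;
    return 0;
}
```

The caller would pass getenv("OVERRIDE_DOMID") or the config value and fail the create on -1 instead of silently falling back to 0.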

thx,

Jason.

Re: [Xen-devel] reboot driver domain, vifX.Y = NO-CARRIER?

2018-04-27 Thread Jason Cooper
Hi Andrew,

On Fri, Apr 27, 2018 at 04:11:39PM +0100, Andrew Cooper wrote:
> On 27/04/18 16:03, Jason Cooper wrote:
> > The problem occurs when I reboot a driver domain.  Regardless of the
> > type of guest attached to it, I'm unable to re-establish connectivity
> > between the driver domain and the re-attached guest.  e.g. I reboot
> > GW/FW, then re-attach VM1, VM2 and the rest.  No matter how I do it, I
> > get:
> >
> > $ ip link
> > ...
> > 11: vif20.1:  mtu 1500 qdisc mq master br10 qlen 32
> > link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
> >
> > In the driver domain.  At this point, absolutely no packets flow between
> > the two VMs.  Not even ARP.  The only solution, so far, is to unnecessarily
> > reboot the PV guests.  After that, networking is fine.
> >
> > Any thoughts?
> 
> The underlying problem is that the frontend/backend setup in xenstore
> encodes the domid in path, and changing that isn't transparent to the
> guest at all.

Oh joy.  Would seem to make more sense to use the domain name or the
uuid...

> The best idea we came up with was to reboot the driver domain and reuse
> its old domid, at which point all the xenstore paths would remain
> valid.  There is support in Xen for explicitly choosing the domid of a
> domain, but I don't think that it is wired up sensibly in xl.

hmmm, yes.  It's not wired up at all afaict.  Mind giving me a hint on
how to reuse the domid?

The solution I see with my current, limited understanding could be to
change the path for the guest via xenstore-write.  Although I suspect
there's more going on underneath the hood than I'm currently aware of.

thx,

Jason.

Re: [Xen-devel] reboot driver domain, vifX.Y = NO-CARRIER?

2018-04-27 Thread Andrew Cooper
On 27/04/18 16:03, Jason Cooper wrote:
> All,
>
> On Gentoo Xen 4.9.1, I've been creating minimal Linux DomU's to create a
> virtual, segregated network infrastructure.  This has been going really
> well, and I'm slowly progressing toward a self-updating system.
>
> My main snag has to do with re-attaching VMs to a driver domain after
> rebooting the driver domain.  e.g.
>
>
>                               +-----+
>                             /-| VM1 |
>                            /  +-----+
>        +----+   +-------+ /   +-----+
> ISP ---| SW |---| GW/FW |-----| VM2 |
>        +----+   +-------+ \   +-----+
>         DD        DD       \  +-----+
>                             \-| VMN |
>                               +-----+
>
> So, in this diagram, SW, GW/FW, and VM1 are mini-VMs.  VM2, and the rest
> are full fledged Linux PV VMs.
>
> Only SW, and GW/FW are driver domains.  SW has the physical nic via
> pci-passthrough.  There are actually 7 GW/FW mini-VMs (for 7 public IPs,
> and 7 different networks), and a trunk mini-VM that aren't shown.
>
> The problem occurs when I reboot a driver domain.  Regardless of the
> type of guest attached to it, I'm unable to re-establish connectivity
> between the driver domain and the re-attached guest.  e.g. I reboot
> GW/FW, then re-attach VM1, VM2 and the rest.  No matter how I do it, I
> get:
>
> $ ip link
> ...
> 11: vif20.1:  mtu 1500 qdisc mq master br10 qlen 32
> link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
>
> In the driver domain.  At this point, absolutely no packets flow between
> the two VMs.  Not even ARP.  The only solution, so far, is to unnecessarily
> reboot the PV guests.  After that, networking is fine.
>
> Any thoughts?

XenServer found this when we investigated using device driver domains in
a similar way.

The underlying problem is that the frontend/backend setup in xenstore
encodes the domid in path, and changing that isn't transparent to the
guest at all.

The best idea we came up with was to reboot the driver domain and reuse
its old domid, at which point all the xenstore paths would remain
valid.  There is support in Xen for explicitly choosing the domid of a
domain, but I don't think that it is wired up sensibly in xl.
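
To make the path dependency concrete, here is a small sketch that builds the backend directory for a vif, assuming the usual /local/domain/<backend domid>/backend/vif/<frontend domid>/<devid> layout.  The helper names are hypothetical; the point is only that the string differs as soon as the backend domid does.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the xenstore directory a vif backend uses for one frontend device.
 * Returns the number of characters written, or -1 if buf is too small.
 */
static int vif_backend_path(char *buf, size_t len, unsigned backend_domid,
                            unsigned frontend_domid, unsigned devid)
{
    int n;

    assert(buf != NULL);
    n = snprintf(buf, len, "/local/domain/%u/backend/vif/%u/%u",
                 backend_domid, frontend_domid, devid);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/*
 * Returns 1 if the backend path a guest saw before the driver domain
 * rebooted is still the right one afterwards, 0 otherwise.
 */
static int backend_path_survives_reboot(unsigned old_domid, unsigned new_domid,
                                        unsigned frontend_domid, unsigned devid)
{
    char before[64], after[64];

    if (vif_backend_path(before, sizeof before, old_domid,
                         frontend_domid, devid) < 0 ||
        vif_backend_path(after, sizeof after, new_domid,
                         frontend_domid, devid) < 0)
        return 0;
    return strcmp(before, after) == 0;
}
```

With a reused domid the two strings match; with a fresh one they do not.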

~Andrew
