Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-03-21 Thread Thiago Jung Bauermann


Michael S. Tsirkin  writes:

> On Wed, Mar 20, 2019 at 01:13:41PM -0300, Thiago Jung Bauermann wrote:
>> >> Another way of looking at this issue which also explains our reluctance
>> >> is that the only difference between a secure guest and a regular guest
>> >> (at least regarding virtio) is that the former uses swiotlb while the
>> >> latter doesn't.
>> >
>> > But swiotlb is just one implementation. It's a guest internal thing. The
>> > issue is that memory isn't host accessible.
>>
>> From what I understand of the ACCESS_PLATFORM definition, the host will
>> only ever try to access memory addresses that are supplied to it by the
>> guest, so all of the secure guest memory that the host cares about is
>> accessible:
>>
>> If this feature bit is set to 0, then the device has same access to
>> memory addresses supplied to it as the driver has. In particular,
>> the device will always use physical addresses matching addresses
>> used by the driver (typically meaning physical addresses used by the
>> CPU) and not translated further, and can access any address supplied
>> to it by the driver. When clear, this overrides any
>> platform-specific description of whether device access is limited or
>> translated in any way, e.g. whether an IOMMU may be present.
>>
>> All of the above is true for POWER guests, whether they are secure
>> guests or not.
>>
>> Or are you saying that a virtio device may want to access memory
>> addresses that weren't supplied to it by the driver?
>
> Your logic would apply to IOMMUs as well.  For your mode, there are
> specific encrypted memory regions that the driver has access to but the device
> does not. That seems to violate the constraint.

Right, if there's a pre-configured 1:1 mapping in the IOMMU such that
the device can ignore the IOMMU for all practical purposes I would
indeed say that the logic would apply to IOMMUs as well. :-)

I guess I'm still struggling with the purpose of signalling to the
driver that the host may not have access to memory addresses that it
will never try to access.

>> >> And from the device's point of view they're
>> >> indistinguishable. It can't tell one guest that is using swiotlb from
>> >> one that isn't. And that implies that secure guest vs regular guest
>> >> isn't a virtio interface issue, it's "guest internal affairs". So
>> >> there's no reason to reflect that in the feature flags.
>> >
>> > So don't. The way not to reflect that in the feature flags is
>> > to set ACCESS_PLATFORM.  Then you say *I don't care let platform device*.
>> >
>> >
>> > Without ACCESS_PLATFORM
>> > virtio has a very specific opinion about the security of the
>> > device, and that opinion is that device is part of the guest
>> > supervisor security domain.
>>
>> Sorry for being a bit dense, but not sure what "the device is part of
>> the guest supervisor security domain" means. In powerpc-speak,
>> "supervisor" is the operating system so perhaps that explains my
>> confusion. Are you saying that without ACCESS_PLATFORM, the guest
>> considers the host to be part of the guest operating system's security
>> domain?
>
> I think so. The spec says "device has same access as driver".

Ok, makes sense.

>> If so, does that have any other implication besides "the host
>> can access any address supplied to it by the driver"? If that is the
>> case, perhaps the definition of ACCESS_PLATFORM needs to be amended to
>> include that information because it's not part of the current
>> definition.
>>
>> >> > But the name "sev_active" makes me scared because at least AMD guys who
>> >> > were doing the sensible thing and setting ACCESS_PLATFORM
>> >>
>> >> My understanding is that the AMD guest platform knows in advance that its
>> >> guest will run in secure mode and hence sets the flag at the time of VM
>> >> instantiation. Unfortunately we don't have that luxury on our platforms.
>> >
>> > Well you do have that luxury. It looks like there are existing
>> > guests that already acknowledge ACCESS_PLATFORM and you are not happy
>> > with how slow that path is. So you are trying to optimize for
>> > them by clearing ACCESS_PLATFORM, and then you have lost the ability
>> > to invoke the DMA API.
>> >
>> > For example if there was another flag just like ACCESS_PLATFORM
>> > just not yet used by anyone, you would be all fine using that right?
>>
>> Yes, a new flag sounds like a great idea. What about the definition
>> below?
>>
>> VIRTIO_F_ACCESS_PLATFORM_NO_IOMMU This feature has the same meaning as
>> VIRTIO_F_ACCESS_PLATFORM both when set and when not set, with the
>> exception that the IOMMU is explicitly defined to be off or bypassed
>> when accessing memory addresses supplied to the device by the
>> driver. This flag should be set by the guest if offered, but to
>> allow for backward compatibility, device implementations also allow it
>> to be left unset by the guest. It is an error to set both this flag
>> and VIRTIO_F_ACCESS_PLATFORM.

Re: [summary] virtio network device failover writeup

2019-03-21 Thread Michael S. Tsirkin
On Thu, Mar 21, 2019 at 06:31:35PM +0200, Liran Alon wrote:
> 
> 
> > On 21 Mar 2019, at 17:50, Michael S. Tsirkin  wrote:
> > 
> > On Thu, Mar 21, 2019 at 08:45:17AM -0700, Stephen Hemminger wrote:
> >> On Thu, 21 Mar 2019 15:04:37 +0200
> >> Liran Alon  wrote:
> >> 
>  
>  OK. Now what happens if master is moved to another namespace? Do we need
>  to move the slaves too?  
> >>> 
> >>> No. Why would we move the slaves? The whole point is to make most 
> >>> customers ignore the net-failover slaves and keep them “hidden” in their 
> >>> dedicated netns.
> >>> We won’t prevent customers from explicitly moving the net-failover slaves 
> >>> out of this netns, but we will not move them out of there automatically.
> >> 
> >> 
> >> The 2-device netvsc already handles case where master changes namespace.
> > 
> > Is it by moving slave with it?
> 
> See c0a41b887ce6 ("hv_netvsc: move VF to same namespace as netvsc device”).
> It seems that when NetVSC master netdev changes netns, the VF is moved to the 
> same netns by the NetVSC driver.
> Kinda the opposite of what we are suggesting here, to make sure that the 
> net-failover master netdev is in a separate
> netns from its slaves...
> 
> -Liran
> 
> > 
> > -- 
> > MST

Not exactly the opposite, I'd say.

If failover is in host ns, slaves in /primary and /standby, then moving
failover to /container should move slaves to /container/primary and
/container/standby.


-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: virtio-blk: should num_vqs be limited by num_possible_cpus()?

2019-03-21 Thread Stefan Hajnoczi
On Wed, Mar 20, 2019 at 08:53:33PM +0800, Jason Wang wrote:
> 
> On 2019/3/19 10:22 AM, Dongli Zhang wrote:
> > Hi Jason,
> > 
> > On 3/18/19 3:47 PM, Jason Wang wrote:
> > > On 2019/3/15 8:41 PM, Cornelia Huck wrote:
> > > > On Fri, 15 Mar 2019 12:50:11 +0800
> > > > Jason Wang  wrote:
> > option 3:
> > We should allow more vectors even though the block layer would support at
> > most nr_cpu_ids queues.
> > 
> > 
> > I understand a new policy for queue-vector mapping is very helpful. I am 
> > just
> > asking the question from the block layer's point of view.
> > 
> > Thank you very much!
> > 
> > Dongli Zhang
> 
> 
> I don't know much about the block side; cc Stefan for more ideas.

Thanks for CCing me.  I don't have much input at this stage.

Stefan



Re: [summary] virtio network device failover writeup

2019-03-21 Thread Michael S. Tsirkin
On Thu, Mar 21, 2019 at 08:45:17AM -0700, Stephen Hemminger wrote:
> On Thu, 21 Mar 2019 15:04:37 +0200
> Liran Alon  wrote:
> 
> > > 
> > > OK. Now what happens if master is moved to another namespace? Do we need
> > > to move the slaves too?  
> > 
> > No. Why would we move the slaves? The whole point is to make most customers 
> > ignore the net-failover slaves and keep them “hidden” in their dedicated 
> > netns.
> > We won’t prevent customers from explicitly moving the net-failover slaves 
> > out of this netns, but we will not move them out of there automatically.
> 
> 
> The 2-device netvsc already handles case where master changes namespace.

Is it by moving slave with it?

-- 
MST

Re: [summary] virtio network device failover writeup

2019-03-21 Thread Stephen Hemminger
On Thu, 21 Mar 2019 15:04:37 +0200
Liran Alon  wrote:

> > 
> > OK. Now what happens if master is moved to another namespace? Do we need
> > to move the slaves too?  
> 
> No. Why would we move the slaves? The whole point is to make most customers 
> ignore the net-failover slaves and keep them “hidden” in their dedicated 
> netns.
> We won’t prevent customers from explicitly moving the net-failover slaves out 
> of this netns, but we will not move them out of there automatically.


The 2-device netvsc already handles case where master changes namespace.

Re: [summary] virtio network device failover writeup

2019-03-21 Thread Stephen Hemminger
On Thu, 21 Mar 2019 08:57:03 -0400
"Michael S. Tsirkin"  wrote:

> On Thu, Mar 21, 2019 at 02:47:50PM +0200, Liran Alon wrote:
> > 
> >   
> > > On 21 Mar 2019, at 14:37, Michael S. Tsirkin  wrote:
> > > 
> > > On Thu, Mar 21, 2019 at 12:07:57PM +0200, Liran Alon wrote:  
> > >> 2) It brings a non-intuitive customer experience. For example, a 
> > >> customer may attempt to analyse a connectivity issue by checking the 
> > >> connectivity
> > >> on a net-failover slave (e.g. the VF) but will see no connectivity, 
> > >> when in fact checking the connectivity on the net-failover master 
> > >> netdev shows correct connectivity.
> > >> 
> > >> The set of changes I envision to fix our issues is:
> > >> 1) Hide net-failover slaves in a different netns created and managed 
> > >> by the kernel. But a user can enter it and manage the netdevs 
> > >> there if they wish to do so explicitly.
> > >> (E.g. Configure the net-failover VF slave in some special way).
> > >> 2) Match the virtio-net and the VF based on a PV attribute instead 
> > >> of MAC. (Similar to as done in NetVSC). E.g. Provide a virtio-net 
> > >> interface to get PCI slot where the matching VF will be hot-plugged 
> > >> by hypervisor.
> > >> 3) Have an explicit virtio-net control message to command hypervisor 
> > >> to switch data-path from virtio-net to VF and vice-versa. Instead of 
> > >> relying on intercepting the PCI master enable-bit
> > >> as an indicator on when VF is about to be set up. (Similar to as 
> > >> done in NetVSC).
> > >> 
> > >> Is there any clear issue we see regarding the above suggestion?
> > >> 
> > >> -Liran  
> > > 
> > > The issue would be this: how do we avoid conflicting with namespaces
> > > created by users?  
> >  
> >  This is kinda controversial, but maybe separate netns names into 2 
> >  groups: hidden and normal.
> >  To reference a hidden netns, you need to do it explicitly. 
> >  Hidden and normal netns names can collide as they will be maintained 
> >  in different namespaces (Yes I’m overloading the term namespace 
> >  here…).  
> > >>> 
> > >>> Maybe it's an unnamed namespace. Hidden until userspace gives it a 
> > >>> name?  
> > >> 
> > >> This is also a good idea that will solve the issue. Yes.
> > >>   
> > >>>   
> >  Does this seems reasonable?
> >  
> >  -Liran  
> > >>> 
> > >>> Reasonable I'd say yes, easy to implement probably no. But maybe I
> > >>> missed a trick or two.  
> > >> 
> > >> BTW, from a practical point of view, I think that even until we figure 
> > >> out a solution on how to implement this,
> > >> it would be better to create a kernel auto-generated name (e.g. 
> > >> “kernel_net_failover_slaves")
> > >> that will break only userspace workloads that by a very rare chance have 
> > >> a netns that collides with this, rather than
> > >> the breakage we have today for the various userspace components.
> > >> 
> > >> -Liran  
> > > 
> > > It seems quite easy to supply that as a module parameter. Do we need two
> > > namespaces though? Won't some userspace still be confused by the two
> > > slaves sharing the MAC address?  
> > 
> > That’s one reasonable option.
> > Another one is that we will indeed change the mechanism by which we 
> > determine a VF should be bonded with a virtio-net device.
> > i.e. Expose a new virtio-net property that specifies the PCI slot of the VF 
> > to be bonded with.
> > 
> > The second seems cleaner but I don’t have a strong opinion on this. Both 
> > seem reasonable to me and your suggestion is faster to implement from 
> > current state of things.
> > 
> > -Liran  
> 
> OK. Now what happens if master is moved to another namespace? Do we need
> to move the slaves too?
> 
> Also Si-Wei's patch is then kind of extraneous, right?
> Attempts to rename a slave will now fail as it's in a namespace...

I did try moving slave device into a namespace at one point.
The problem is that introduces all sorts of locking problems in the code
because you can't do it directly in the context of when the callback
happens that a new slave device is discovered.

Since you can't safely change device namespace in the notifier,
it requires a work queue. Then you add more complexity and error cases
because the slave is exposed for a short period, and handling all the
state race unwinds...

Good idea, but hard to implement.

Re: [summary] virtio network device failover writeup

2019-03-21 Thread Michael S. Tsirkin
On Thu, Mar 21, 2019 at 04:16:14PM +0200, Liran Alon wrote:
> 
> 
> > On 21 Mar 2019, at 15:51, Michael S. Tsirkin  wrote:
> > 
> > On Thu, Mar 21, 2019 at 03:24:39PM +0200, Liran Alon wrote:
> >> 
> >> 
> >>> On 21 Mar 2019, at 15:12, Michael S. Tsirkin  wrote:
> >>> 
> >>> On Thu, Mar 21, 2019 at 03:04:37PM +0200, Liran Alon wrote:
>  
>  
> > On 21 Mar 2019, at 14:57, Michael S. Tsirkin  wrote:
> > 
> > On Thu, Mar 21, 2019 at 02:47:50PM +0200, Liran Alon wrote:
> >> 
> >> 
> >>> On 21 Mar 2019, at 14:37, Michael S. Tsirkin  wrote:
> >>> 
> >>> On Thu, Mar 21, 2019 at 12:07:57PM +0200, Liran Alon wrote:
>  2) It brings a non-intuitive customer experience. For example, a 
>  customer may attempt to analyse a connectivity issue by checking 
>  the connectivity
>  on a net-failover slave (e.g. the VF) but will see no 
>  connectivity, when in fact checking the connectivity on the 
>  net-failover master netdev shows correct connectivity.
>  
>  The set of changes I envision to fix our issues is:
>  1) Hide net-failover slaves in a different netns created and 
>  managed by the kernel. But a user can enter it and manage 
>  the netdevs there if they wish to do so explicitly.
>  (E.g. Configure the net-failover VF slave in some special way).
>  2) Match the virtio-net and the VF based on a PV attribute 
>  instead of MAC. (Similar to as done in NetVSC). E.g. Provide a 
>  virtio-net interface to get PCI slot where the matching VF will 
>  be hot-plugged by hypervisor.
>  3) Have an explicit virtio-net control message to command 
>  hypervisor to switch data-path from virtio-net to VF and 
>  vice-versa. Instead of relying on intercepting the PCI master 
>  enable-bit
>  as an indicator on when VF is about to be set up. (Similar to as 
>  done in NetVSC).
>  
>  Is there any clear issue we see regarding the above suggestion?
>  
>  -Liran
> >>> 
> >>> The issue would be this: how do we avoid conflicting with 
> >>> namespaces
> >>> created by users?
> >> 
> >> This is kinda controversial, but maybe separate netns names into 2 
> >> groups: hidden and normal.
> >> To reference a hidden netns, you need to do it explicitly. 
> >> Hidden and normal netns names can collide as they will be 
> >> maintained in different namespaces (Yes I’m overloading the term 
> >> namespace here…).
> > 
> > Maybe it's an unnamed namespace. Hidden until userspace gives it a 
> > name?
>  
>  This is also a good idea that will solve the issue. Yes.
>  
> > 
> >> Does this seems reasonable?
> >> 
> >> -Liran
> > 
> > Reasonable I'd say yes, easy to implement probably no. But maybe I
> > missed a trick or two.
>  
>  BTW, from a practical point of view, I think that even until we 
>  figure out a solution on how to implement this,
>  it would be better to create a kernel auto-generated name (e.g. 
>  “kernel_net_failover_slaves")
>  that will break only userspace workloads that by a very rare chance 
>  have a netns that collides with this, rather than
>  the breakage we have today for the various userspace components.
>  
>  -Liran
> >>> 
> >>> It seems quite easy to supply that as a module parameter. Do we need 
> >>> two
> >>> namespaces though? Won't some userspace still be confused by the two
> >>> slaves sharing the MAC address?
> >> 
> >> That’s one reasonable option.
> >> Another one is that we will indeed change the mechanism by which we 
> >> determine a VF should be bonded with a virtio-net device.
> >> i.e. Expose a new virtio-net property that specifies the PCI slot of the 
> >> VF to be bonded with.
> >> 
> >> The second seems cleaner but I don’t have a strong opinion on this. 
> >> Both seem reasonable to me and your suggestion is faster to implement 
> >> from current state of things.
> >> 
> >> -Liran
> > 
> > OK. Now what happens if master is moved to another namespace? Do we need
> > to move the slaves too?
>  
>  No. Why would we move the slaves?
> >>> 
> >>> 
> > >>> The reason we have a 3-device model at all is so users can fine-tune the
> > >>> slaves.
> >> 
> > >> I agree.
> >> 
> >>> I don't see why this applies to the root namespace but not
> >>> a container. If it has access to failover it should have access
> >>> to slaves.
> >> 
> >> Oh now I see your point. I haven’t thought about the containers usage.
> >> My thinking was that 

Re: [PATCH net v2] failover: allow name change on IFF_UP slave interfaces

2019-03-21 Thread Michael S. Tsirkin
On Thu, Mar 21, 2019 at 04:20:40PM +0200, Liran Alon wrote:
> 
> 
> > On 21 Mar 2019, at 16:04, Michael S. Tsirkin  wrote:
> > 
> > On Wed, Mar 06, 2019 at 10:08:32PM -0500, Si-Wei Liu wrote:
> >> When a netdev appears through hot plug then gets enslaved by a failover
> >> master that is already up and running, the slave will be opened
> >> right away after getting enslaved. Today there's a race that userspace
> >> (udev) may fail to rename the slave if the kernel (net_failover)
> >> opens the slave earlier than when the userspace rename happens.
> >> Unlike bond or team, the primary slave of failover can't be renamed by
> >> userspace ahead of time, since the kernel initiated auto-enslavement is
> >> unable to, or rather, is never meant to be synchronized with the rename
> >> request from userspace.
> >> 
> >> As the failover slave interfaces are not designed to be operated
> >> directly by userspace apps: IP configuration, filter rules with
> >> regard to network traffic passing and etc., should all be done on master
> >> interface. In general, userspace apps only care about the
> >> name of master interface, while slave names are less important as long
> >> as admin users can see reliable names that may carry
> >> other information describing the netdev. For e.g., they can infer that
> >> "ens3nsby" is a standby slave of "ens3", while for a
> >> name like "eth0" they can't tell which master it belongs to.
> >> 
> >> Historically the name of IFF_UP interface can't be changed because
> >> there might be admin script or management software that is already
> >> relying on such behavior and assumes that the slave name can't be
> >> changed once UP. But failover is special: with the in-kernel
> >> auto-enslavement mechanism, the userspace expectation for device
> >> enumeration and bring-up order is already broken. Previously initramfs
> >> and various userspace config tools were modified to bypass failover
> >> slaves because of auto-enslavement and duplicate MAC address. Similarly,
> >> in case that users care about seeing reliable slave name, the new type
> >> of failover slaves needs to be taken care of specifically in userspace
> >> anyway.
> >> 
> >> It's less risky to lift up the rename restriction on failover slave
> >> which is already UP. Although it's possible this change may potentially
> >> break userspace component (most likely configuration scripts or
> >> management software) that assumes slave name can't be changed while
> >> UP, it's relatively a limited and controllable set among all userspace
> >> components, which can be fixed specifically to work with the new naming
> >> behavior of failover slaves. Userspace component interacting with
> >> slaves should be changed to operate on failover master instead, as the
> >> failover slave is dynamic in nature which may come and go at any point.
> >> The goal is to make the role of failover slaves less relevant, and
> >> all userspace should only deal with master in the long run.
> >> 
> >> Fixes: 30c8bd5aa8b2 ("net: Introduce generic failover module")
> >> Signed-off-by: Si-Wei Liu 
> >> Reviewed-by: Liran Alon 
> >> Acked-by: Michael S. Tsirkin 
> > 
> > I worry that userspace might have made a bunch of assumptions
> > that names never change as long as interface is up.
> > So listening for up events ensures that interface
> > is not renamed.
> 
> That’s true. This is exactly what is described in 3rd paragraph of commit 
> message.
> However, as the commit message claims, net-failover slaves can be treated 
> specially
> because userspace handling of them is already broken and userspace needs to be 
> modified
> to behave specially with regard to those slaves. Therefore, it’s less risky to 
> lift the
> rename restriction on a failover slave which is already UP.
> 
> > 
> > How about sending down and up events around such renames?
> 
> You mean that dev_change_name() will behave as proposed in this patch but 
> in addition
> send fake DOWN and UP uevents to userspace?
> 
> -Liran

That was what I was trying to say.

> > 
> > 
> > 
> >> ---
> >> v1 -> v2:
> >> - Drop configurable module parameter (Sridhar)
> >> 
> >> 
> >> include/linux/netdevice.h | 3 +++
> >> net/core/dev.c| 3 ++-
> >> net/core/failover.c   | 6 +++---
> >> 3 files changed, 8 insertions(+), 4 deletions(-)
> >> 
> >> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> >> index 857f8ab..6d9e4e0 100644
> >> --- a/include/linux/netdevice.h
> >> +++ b/include/linux/netdevice.h
> >> @@ -1487,6 +1487,7 @@ struct net_device_ops {
> >>  * @IFF_NO_RX_HANDLER: device doesn't support the rx_handler hook
> >>  * @IFF_FAILOVER: device is a failover master device
> >>  * @IFF_FAILOVER_SLAVE: device is lower dev of a failover master device
> >> + * @IFF_SLAVE_RENAME_OK: rename is allowed while slave device is running
> >>  */
> >> enum netdev_priv_flags {
> >>IFF_802_1Q_VLAN = 1<<0,
> >> @@ -1518,6 +1519,7 @@ enum netdev_priv_flags {
> >>

[PATCH] VMCI/VSOCK: Add maintainers for VMCI, AF_VSOCK and VMCI transport

2019-03-21 Thread Jorgen Hansen via Virtualization
Update the maintainers file to include maintainers for the VMware
vmci driver, af_vsock, and the vsock vmci transport.

Signed-off-by: Jorgen Hansen 
---
 MAINTAINERS | 20 
 1 file changed, 20 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index e17ebf7..b9714fc 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -16630,6 +16630,14 @@ S: Maintained
 F: drivers/scsi/vmw_pvscsi.c
 F: drivers/scsi/vmw_pvscsi.h
 
+VMWARE VMCI DRIVER
+M: Jorgen Hansen 
+M: Vishnu Dasa 
+M: "VMware, Inc." 
+L: linux-ker...@vger.kernel.org
+S: Maintained
+F: drivers/misc/vmw_vmci/
+
 VMWARE VMMOUSE SUBDRIVER
 M: "VMware Graphics" 
 M: "VMware, Inc." 
@@ -16638,6 +16646,18 @@ S: Maintained
 F: drivers/input/mouse/vmmouse.c
 F: drivers/input/mouse/vmmouse.h
 
+VMWARE VSOCK DRIVER (AF_VSOCK) AND VMCI TRANSPORT
+M: Jorgen Hansen 
+M: Vishnu Dasa 
+M: "VMware, Inc." 
+L: net...@vger.kernel.org
+S: Maintained
+F: net/vmw_vsock/af_vsock.c
+F: net/vmw_vsock/vmci_transport*
+F: net/vmw_vsock/vsock_addr.c
+F: include/linux/vm_sockets.h
+F: include/uapi/linux/vm_sockets.h
+
 VMWARE VMXNET3 ETHERNET DRIVER
 M: Ronak Doshi 
 M: "VMware, Inc." 
-- 
2.6.2



Re: [PATCH net v2] failover: allow name change on IFF_UP slave interfaces

2019-03-21 Thread Michael S. Tsirkin
On Wed, Mar 06, 2019 at 10:08:32PM -0500, Si-Wei Liu wrote:
> When a netdev appears through hot plug then gets enslaved by a failover
> master that is already up and running, the slave will be opened
> right away after getting enslaved. Today there's a race that userspace
> (udev) may fail to rename the slave if the kernel (net_failover)
> opens the slave earlier than when the userspace rename happens.
> Unlike bond or team, the primary slave of failover can't be renamed by
> userspace ahead of time, since the kernel initiated auto-enslavement is
> unable to, or rather, is never meant to be synchronized with the rename
> request from userspace.
> 
> As the failover slave interfaces are not designed to be operated
> directly by userspace apps: IP configuration, filter rules with
> regard to network traffic passing, etc., should all be done on master
> interface. In general, userspace apps only care about the
> name of master interface, while slave names are less important as long
> as admin users can see reliable names that may carry
> other information describing the netdev. For e.g., they can infer that
> "ens3nsby" is a standby slave of "ens3", while for a
> name like "eth0" they can't tell which master it belongs to.
> 
> Historically the name of IFF_UP interface can't be changed because
> there might be admin script or management software that is already
> relying on such behavior and assumes that the slave name can't be
> changed once UP. But failover is special: with the in-kernel
> auto-enslavement mechanism, the userspace expectation for device
> enumeration and bring-up order is already broken. Previously initramfs
> and various userspace config tools were modified to bypass failover
> slaves because of auto-enslavement and duplicate MAC address. Similarly,
> in case that users care about seeing reliable slave name, the new type
> of failover slaves needs to be taken care of specifically in userspace
> anyway.
> 
> It's less risky to lift up the rename restriction on failover slave
> which is already UP. Although it's possible this change may potentially
> break userspace component (most likely configuration scripts or
> management software) that assumes slave name can't be changed while
> UP, it's relatively a limited and controllable set among all userspace
> components, which can be fixed specifically to work with the new naming
> behavior of failover slaves. Userspace component interacting with
> slaves should be changed to operate on failover master instead, as the
> failover slave is dynamic in nature which may come and go at any point.
> The goal is to make the role of failover slaves less relevant, and
> all userspace should only deal with master in the long run.
> 
> Fixes: 30c8bd5aa8b2 ("net: Introduce generic failover module")
> Signed-off-by: Si-Wei Liu 
> Reviewed-by: Liran Alon 
> Acked-by: Michael S. Tsirkin 

I worry that userspace might have made a bunch of assumptions
that names never change as long as interface is up.
So listening for up events ensures that interface
is not renamed.

How about sending down and up events around such renames?



> ---
> v1 -> v2:
> - Drop configurable module parameter (Sridhar)
> 
> 
>  include/linux/netdevice.h | 3 +++
>  net/core/dev.c| 3 ++-
>  net/core/failover.c   | 6 +++---
>  3 files changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 857f8ab..6d9e4e0 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -1487,6 +1487,7 @@ struct net_device_ops {
>   * @IFF_NO_RX_HANDLER: device doesn't support the rx_handler hook
>   * @IFF_FAILOVER: device is a failover master device
>   * @IFF_FAILOVER_SLAVE: device is lower dev of a failover master device
> + * @IFF_SLAVE_RENAME_OK: rename is allowed while slave device is running
>   */
>  enum netdev_priv_flags {
>   IFF_802_1Q_VLAN = 1<<0,
> @@ -1518,6 +1519,7 @@ enum netdev_priv_flags {
>   IFF_NO_RX_HANDLER   = 1<<26,
>   IFF_FAILOVER= 1<<27,
>   IFF_FAILOVER_SLAVE  = 1<<28,
> + IFF_SLAVE_RENAME_OK = 1<<29,
>  };
>  
>  #define IFF_802_1Q_VLAN  IFF_802_1Q_VLAN
> @@ -1548,6 +1550,7 @@ enum netdev_priv_flags {
>  #define IFF_NO_RX_HANDLERIFF_NO_RX_HANDLER
>  #define IFF_FAILOVER IFF_FAILOVER
>  #define IFF_FAILOVER_SLAVE   IFF_FAILOVER_SLAVE
> +#define IFF_SLAVE_RENAME_OK  IFF_SLAVE_RENAME_OK
>  
>  /**
>   *   struct net_device - The DEVICE structure.
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 722d50d..ae070de 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -1180,7 +1180,8 @@ int dev_change_name(struct net_device *dev, const char 
> *newname)
>   BUG_ON(!dev_net(dev));
>  
>   net = dev_net(dev);
> - if (dev->flags & IFF_UP)
> + if (dev->flags & IFF_UP &&
> + !(dev->priv_flags & IFF_SLAVE_RENAME_OK))

Re: [summary] virtio network device failover writeup

2019-03-21 Thread Michael S. Tsirkin
On Thu, Mar 21, 2019 at 03:24:39PM +0200, Liran Alon wrote:
> 
> 
> > On 21 Mar 2019, at 15:12, Michael S. Tsirkin  wrote:
> > 
> > On Thu, Mar 21, 2019 at 03:04:37PM +0200, Liran Alon wrote:
> >> 
> >> 
> >>> On 21 Mar 2019, at 14:57, Michael S. Tsirkin  wrote:
> >>> 
> >>> On Thu, Mar 21, 2019 at 02:47:50PM +0200, Liran Alon wrote:
>  
>  
> > On 21 Mar 2019, at 14:37, Michael S. Tsirkin  wrote:
> > 
> > On Thu, Mar 21, 2019 at 12:07:57PM +0200, Liran Alon wrote:
> >> 2) It brings a non-intuitive customer experience. For example, a 
> >> customer may attempt to analyse a connectivity issue by checking the 
> >> connectivity
> >> on a net-failover slave (e.g. the VF) but will see no connectivity, 
> >> when in fact checking the connectivity on the net-failover master 
> >> netdev shows correct connectivity.
> >> 
> >> The set of changes I envision to fix our issues is:
> >> 1) Hide net-failover slaves in a different netns created and 
> >> managed by the kernel. But a user can enter it and manage 
> >> the netdevs there if they wish to do so explicitly.
> >> (E.g. Configure the net-failover VF slave in some special way).
> >> 2) Match the virtio-net and the VF based on a PV attribute instead 
> >> of MAC. (Similar to as done in NetVSC). E.g. Provide a virtio-net 
> >> interface to get PCI slot where the matching VF will be 
> >> hot-plugged by hypervisor.
> >> 3) Have an explicit virtio-net control message to command 
> >> hypervisor to switch data-path from virtio-net to VF and 
> >> vice-versa. Instead of relying on intercepting the PCI master 
> >> enable-bit
> >> as an indicator on when VF is about to be set up. (Similar to as 
> >> done in NetVSC).
> >> 
> >> Is there any clear issue we see regarding the above suggestion?
> >> 
> >> -Liran
> > 
> > The issue would be this: how do we avoid conflicting with namespaces
> > created by users?
>  
>  This is kinda controversial, but maybe we could separate netns names 
>  into two groups: hidden and normal.
>  To reference a hidden netns, you need to do it explicitly. 
>  Hidden and normal netns names can collide as they will be maintained 
>  in different namespaces (yes, I’m overloading the term namespace 
>  here…).
> >>> 
> >>> Maybe it's an unnamed namespace. Hidden until userspace gives it a 
> >>> name?
> >> 
> >> This is also a good idea that will solve the issue. Yes.
> >> 
> >>> 
>  Does this seem reasonable?
>  
>  -Liran
> >>> 
> >>> Reasonable I'd say yes, easy to implement probably no. But maybe I
> >>> missed a trick or two.
> >> 
> >> BTW, from a practical point of view, I think that even until we 
> >> figure out a solution on how to implement this, it would be better 
> >> to create a kernel auto-generated name (e.g. 
> >> “kernel_net_failover_slaves”)
> >> that will break only those userspace workloads that by a very rare 
> >> chance have a netns that collides with this name, rather than the 
> >> breakage we have today for the various userspace components.
> > 
> > It seems quite easy to supply that as a module parameter. Do we need two
> > namespaces though? Won't some userspace still be confused by the two
> > slaves sharing the MAC address?
>  
>  That’s one reasonable option.
>  Another one is that we will indeed change the mechanism by which we 
>  determine that a VF should be bonded with a virtio-net device.
>  i.e. Expose a new virtio-net property that specifies the PCI slot of 
>  the VF to be bonded with.
>  
>  The second seems cleaner but I don’t have a strong opinion on this. 
>  Both seem reasonable to me and your suggestion is faster to implement 
>  from the current state of things.
>  
>  -Liran
> >>> 
> >>> OK. Now what happens if master is moved to another namespace? Do we need
> >>> to move the slaves too?
> >> 
> >> No. Why would we move the slaves?
> > 
> > 
> > The reason we have a 3-device model at all is so users can fine-tune 
> > the slaves.
> 
> I agree.
> 
> > I don't see why this applies to the root namespace but not
> > a container. If it has access to failover it should have access
> > to slaves.
> 
> Oh, now I see your point. I haven’t thought about the container usage.
> My thinking was that the customer can always just enter the “hidden” 
> netns and configure whatever they want there.
> 
> Do you have a suggestion how to handle this?
> 
> One option can be that every “visible” netns on the system will have a 
> “hidden” unnamed netns where the net-failover slaves reside.
> If the customer wishes to be able to enter that netns and manage the 
> net-failover slaves 

Re: [summary] virtio network device failover writeup

2019-03-21 Thread Michael S. Tsirkin
On Thu, Mar 21, 2019 at 03:04:37PM +0200, Liran Alon wrote:
> [...]
> >> 
> >> That’s one reasonable option.
> >> Another one is that we will indeed change the mechanism by which we 
> >> determine that a VF should be bonded with a virtio-net device.
> >> i.e. Expose a new virtio-net property that specifies the PCI slot of 
> >> the VF to be bonded with.
> >> 
> >> The second seems cleaner but I don’t have a strong opinion on this. 
> >> Both seem reasonable to me and your suggestion is faster to 
> >> implement from the current state of things.
> >> 
> >> -Liran
> > 
> > OK. Now what happens if master is moved to another namespace? Do we need
> > to move the slaves too?
> 
> No. Why would we move the slaves?


The reason we have a 3-device model at all is so users can fine-tune the
slaves. I don't see why this applies to the root namespace but not
a container. If it has access to failover it should have access
to slaves.

> The whole point is to make most customers ignore the net-failover 
> slaves and keep them “hidden” in their dedicated netns.

So that makes the common case easy. That is good. My worry is it might
make some uncommon cases impossible.

> We won’t prevent the customer from explicitly moving the net-failover 
> slaves out of this netns, but we will not move them out of there 
> automatically.
> 
> > 
> > Also siwei's patch is then kind of extraneous right?
> > Attempts to rename a slave will now fail as it's in a namespace…
> 
> I’m not sure actually. Isn't udev/systemd netns-aware?
> I would expect it to be able to provide names also to netdevs in a 
> netns different from the default netns.

I think most people move devices after they are renamed.

> If that’s the case, Si-Wei’s patch to be able to rename a net-failover 
> slave when it is already open is still required. As 

Re: [summary] virtio network device failover writeup

2019-03-21 Thread Michael S. Tsirkin
On Thu, Mar 21, 2019 at 02:47:50PM +0200, Liran Alon wrote:
> [...]
> 
> That’s one reasonable option.
> Another one is that we will indeed change the mechanism by which we 
> determine that a VF should be bonded with a virtio-net device.
> i.e. Expose a new virtio-net property that specifies the PCI slot of 
> the VF to be bonded with.
> 
> The second seems cleaner but I don’t have a strong opinion on this. 
> Both seem reasonable to me and your suggestion is faster to implement 
> from the current state of things.
> 
> -Liran

OK. Now what happens if master is moved to another namespace? Do we need
to move the slaves too?

Also siwei's patch is then kind of extraneous right?
Attempts to rename a slave will now fail as it's in a namespace...

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [summary] virtio network device failover writeup

2019-03-21 Thread Michael S. Tsirkin
On Thu, Mar 21, 2019 at 12:07:57PM +0200, Liran Alon wrote:
> [...]
> 
> BTW, from a practical point of view, I think that even until we figure 
> out a solution on how to implement this, it would be better to create 
> a kernel auto-generated name (e.g. 
> “kernel_net_failover_slaves”)
> that will break only those userspace workloads that by a very rare 
> chance have a netns that collides with this name, rather than the 
> breakage we have today for the various userspace components.
> 
> -Liran

It seems quite easy to supply that as a module parameter. Do we need two
namespaces though? Won't some userspace still be confused by the two
slaves sharing the MAC address?

-- 
MST

Re: [summary] virtio network device failover writeup

2019-03-21 Thread Michael S. Tsirkin
On Thu, Mar 21, 2019 at 12:19:22AM +0200, Liran Alon wrote:
> 
> 
> > On 21 Mar 2019, at 0:10, Michael S. Tsirkin  wrote:
> > 
> > On Wed, Mar 20, 2019 at 11:43:41PM +0200, Liran Alon wrote:
> >> 
> >> 
> >>> On 20 Mar 2019, at 16:09, Michael S. Tsirkin  wrote:
> >>> 
> >>> On Wed, Mar 20, 2019 at 02:23:36PM +0200, Liran Alon wrote:
>  
>  
> > On 20 Mar 2019, at 12:25, Michael S. Tsirkin  wrote:
> > 
> > On Wed, Mar 20, 2019 at 01:25:58AM +0200, Liran Alon wrote:
> >> 
> >> 
> >>> On 19 Mar 2019, at 23:19, Michael S. Tsirkin  wrote:
> >>> 
> >>> On Tue, Mar 19, 2019 at 08:46:47AM -0700, Stephen Hemminger wrote:
>  On Tue, 19 Mar 2019 14:38:06 +0200
>  Liran Alon  wrote:
>  
> > b.3) cloud-init: If configured to perform network configuration, it 
> > attempts to configure all available netdevs. It should, however, 
> > avoid doing so on net-failover slaves.
> > (Microsoft has handled this by adding a mechanism in cloud-init to 
> > blacklist a netdev from being configured in case it is owned by a 
> > specific PCI driver. Specifically, they blacklist the Mellanox VF 
> > driver. However, this technique doesn’t work for the net-failover 
> > mechanism because both the net-failover netdev and the virtio-net 
> > netdev are owned by the virtio-net PCI driver.)
>  
>  Cloud-init should really just ignore all devices that have a master 
>  device.
>  That would have been more general, and safer for other use cases.
> >>> 
> >>> Given lots of userspace doesn't do this, I wonder whether it would be
> >>> safer to just somehow pretend to userspace that the slave links are
> >>> down? And add a special attribute for the actual link state.
> >> 
> >> I think this may be problematic as it would also break the legitimate 
> >> use case of userspace attempting to set various config on a VF slave.
> >> In general, lying to userspace usually leads to problems.
> > 
> > I hear you on this. So how about instead of lying,
> > we basically just fail some accesses to slaves
> > unless a flag is set e.g. in ethtool.
> > 
> > Some userspace will need to change to set it but in a minor way.
> > Arguably/hopefully failure to set config would generally be a safer
> > failure.
>  
>  Once userspace sets this new flag via ethtool, all operations done by 
>  other userspace components will still work.
> >>> 
> >>> Sorry about being unclear, the idea would be to require the flag on each 
> >>> ethtool operation.
> >> 
> >> Oh. I have indeed misunderstood your previous email then. :)
> >> Thanks for clarifying.
> >> 
> >>> 
>  E.g. dhclient run without parameters, after this flag was set, will 
>  still attempt to perform DHCP on it and will now succeed.
> >>> 
> >>> I think sending/receiving should probably just fail unconditionally.
> >> 
> >> You mean that you wish that somehow the kernel will prevent Tx on a 
> >> net-failover slave netdev unless the skb is marked with some flag to 
> >> indicate it has been sent via the net-failover master?
> > 
> > We can maybe avoid binding a protocol socket to the device?
> 
> That is indeed another possibility that would work to avoid the DHCP issues.
> And will still allow checking connectivity. So it is better.
> However, I still think it provides a non-intuitive customer experience.
> In addition, I also want to take into account that most customers 
> expect a 1:1 mapping between a vNIC and a netdev.
> i.e. A cloud instance should show one netdev if it has one vNIC 
> attached to it.
> Customers usually don’t care how they get accelerated networking. They 
> just care that they do.
> 
> > 
> >> This indeed resolves the group of userspace issues around performing 
> >> DHCP on net-failover slaves directly (by dracut/initramfs, dhclient, 
> >> etc.).
> >> 
> >> However, I see a couple of down-sides to it:
> >> 1) It doesn’t resolve all userspace issues listed in this email thread. 
> >> For example, cloud-init will still attempt to perform network config on 
> >> net-failover slaves.
> >> It also doesn’t help with regard to Ubuntu’s netplan issue that creates 
> >> udev rules that match only by MAC.
> > 
> > 
> > How about we fail to retrieve mac from the slave?
> 
> That would work but I think it is cleaner to just not bind PV and VF based on 
> having the same MAC.

There's a reference to that under "Non-MAC based pairing".

I'll look into making it more explicit.

> > 
> >> 2) It brings a non-intuitive customer experience. For example, a 
> >> customer may attempt to analyse a connectivity issue by checking the 
> >> connectivity on a net-failover slave (e.g. the VF) and see no 
> >> connectivity, when in fact checking the connectivity on the 
> >> net-failover master netdev shows correct connectivity.
> >> 
> >> The set of changes I envision to fix our