[virtio-dev] Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available

2018-03-13 Thread Siwei Liu
On Tue, Mar 13, 2018 at 5:28 PM, Samudrala, Sridhar
 wrote:
> On 3/12/2018 3:44 PM, Siwei Liu wrote:
>>
>> Apologies, still some comments going. Please see inline.
>>
>> On Thu, Mar 1, 2018 at 12:08 PM, Sridhar Samudrala
>>  wrote:
>>>
>>> This patch enables virtio_net to switch over to a VF datapath when a VF
>>> netdev is present with the same MAC address. It allows live migration
>>> of a VM with a direct attached VF without the need to setup a bond/team
>>> between a VF and virtio net device in the guest.
>>>
>>> The hypervisor needs to enable only one datapath at any time so that
>>> packets don't get looped back to the VM over the other datapath. When a
>>> VF
>>> is plugged, the virtio datapath link state can be marked as down. The
>>> hypervisor needs to unplug the VF device from the guest on the source
>>> host
>>> and reset the MAC filter of the VF to initiate failover of datapath to
>>> virtio before starting the migration. After the migration is completed,
>>> the destination hypervisor sets the MAC filter on the VF and plugs it
>>> back
>>> to the guest to switch over to VF datapath.
>>>
>>> When BACKUP feature is enabled, an additional netdev(bypass netdev) is
>>> created that acts as a master device and tracks the state of the 2 lower
>>> netdevs. The original virtio_net netdev is marked as 'backup' netdev and
>>> a
>>> passthru device with the same MAC is registered as 'active' netdev.
>>>
>>> This patch is based on the discussion initiated by Jesse on this thread.
>>> https://marc.info/?l=linux-virtualization=151189725224231=2
>>>
>>> Signed-off-by: Sridhar Samudrala 
>>> Signed-off-by: Alexander Duyck 
>>> Reviewed-by: Jesse Brandeburg 
>>> ---
>>>   drivers/net/virtio_net.c | 683
>>> ++-
>>>   1 file changed, 682 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>> index bcd13fe906ca..f2860d86c952 100644
>>> --- a/drivers/net/virtio_net.c
>>> +++ b/drivers/net/virtio_net.c
>>> @@ -30,6 +30,8 @@
>>>   #include 
>>>   #include 
>>>   #include 
>>> +#include 
>>> +#include 
>>>   #include 
>>>   #include 
>>>
>>> @@ -206,6 +208,9 @@ struct virtnet_info {
>>>  u32 speed;
>>>
>>>  unsigned long guest_offloads;
>>> +
>>> +   /* upper netdev created when BACKUP feature enabled */
>>> +   struct net_device *bypass_netdev;
>>>   };
>>>
>>>   struct padded_vnet_hdr {
>>> @@ -2236,6 +2241,22 @@ static int virtnet_xdp(struct net_device *dev,
>>> struct netdev_bpf *xdp)
>>>  }
>>>   }
>>>
>>> +static int virtnet_get_phys_port_name(struct net_device *dev, char *buf,
>>> + size_t len)
>>> +{
>>> +   struct virtnet_info *vi = netdev_priv(dev);
>>> +   int ret;
>>> +
>>> +   if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_BACKUP))
>>> +   return -EOPNOTSUPP;
>>> +
>>> +   ret = snprintf(buf, len, "_bkup");
>>> +   if (ret >= len)
>>> +   return -EOPNOTSUPP;
>>> +
>>> +   return 0;
>>> +}
>>> +
>>>   static const struct net_device_ops virtnet_netdev = {
>>>  .ndo_open= virtnet_open,
>>>  .ndo_stop= virtnet_close,
>>> @@ -2253,6 +2274,7 @@ static const struct net_device_ops virtnet_netdev =
>>> {
>>>  .ndo_xdp_xmit   = virtnet_xdp_xmit,
>>>  .ndo_xdp_flush  = virtnet_xdp_flush,
>>>  .ndo_features_check = passthru_features_check,
>>> +   .ndo_get_phys_port_name = virtnet_get_phys_port_name,
>>>   };
>>>
>>>   static void virtnet_config_changed_work(struct work_struct *work)
>>> @@ -2647,6 +2669,653 @@ static int virtnet_validate(struct virtio_device
>>> *vdev)
>>>  return 0;
>>>   }
>>>
>>> +/* START of functions supporting VIRTIO_NET_F_BACKUP feature.
>>> + * When BACKUP feature is enabled, an additional netdev(bypass netdev)
>>> + * is created that acts as a master device and tracks the state of the
>>> + * 2 lower netdevs. The original virtio_net netdev is registered as
>>> + * 'backup' netdev and a passthru device with the same MAC is registered
>>> + * as 'active' netdev.
>>> + */
>>> +
>>> +/* bypass state maintained when BACKUP feature is enabled */
>>> +struct virtnet_bypass_info {
>>> +   /* passthru netdev with same MAC */
>>> +   struct net_device __rcu *active_netdev;
>>> +
>>> +   /* virtio_net netdev */
>>> +   struct net_device __rcu *backup_netdev;
>>> +
>>> +   /* active netdev stats */
>>> +   struct rtnl_link_stats64 active_stats;
>>> +
>>> +   /* backup netdev stats */
>>> +   struct rtnl_link_stats64 backup_stats;
>>> +
>>> +   /* aggregated stats */
>>> +   struct rtnl_link_stats64 bypass_stats;
>>> +
>>> +   /* spinlock while updating stats */
>>> +   spinlock_t stats_lock;
>>> +};
>>> +
>>> 

[virtio-dev] Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available

2018-03-13 Thread Michael S. Tsirkin
On Tue, Mar 13, 2018 at 05:28:07PM -0700, Samudrala, Sridhar wrote:
> > I am not sure if it's a good idea to leave the
> > virtio_bypass around if running into failure: the guest is not
> > migratable as the VF doesn't have a backup path,
> 
> Are you talking about a failure when registering backup netdev?  This should 
> not
> happen, but i guess we can improve error handling in such scenario.

A nice way to do this would be to clear the backup feature bit.

> 
> > And perhaps the worse
> > part is that, it now has two interfaces with identical MAC address but
> > one of them is invalid (user cannot use the virtio interface as it has
> > a dampened datapath). IMHO the virtio_bypass and its lower netdev
> > should be destroyed at all when it fails to bind the VF, and
> > technically, there should be some way to propogate the failure status
> > to the hypervisor/backend, indicating that the VM is not migratable
> > because of guest software errors (maybe by clearing out the backup
> > feature from the guest virtio driver so host can see/learn it).
> 
> In BACKUP mode, user can only use the upper virtio_bypass netdev and that will
> always be there. Any failure to enslave VF netdev is not fatal, but i will see
> if we can improve the error handling of failure to enslave backup netdev.
> Also, i don't think the BACKUP feature bit is negotiable with the host.
> 
> Thanks
> Sridhar

All bits are negotiable.  It's up to the host whether to support
a device with this bit clear, or to fail negotiation.

-- 
MST

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available

2018-03-13 Thread Samudrala, Sridhar

On 3/12/2018 2:08 PM, Jiri Pirko wrote:

Mon, Mar 12, 2018 at 09:58:06PM CET, sridhar.samudr...@intel.com wrote:


On 3/12/2018 1:12 PM, Jiri Pirko wrote:

Thu, Mar 01, 2018 at 09:08:43PM CET, sridhar.samudr...@intel.com wrote:

This patch enables virtio_net to switch over to a VF datapath when a VF
netdev is present with the same MAC address. It allows live migration
of a VM with a direct attached VF without the need to setup a bond/team
between a VF and virtio net device in the guest.

The hypervisor needs to enable only one datapath at any time so that
packets don't get looped back to the VM over the other datapath. When a VF
is plugged, the virtio datapath link state can be marked as down. The
hypervisor needs to unplug the VF device from the guest on the source host
and reset the MAC filter of the VF to initiate failover of datapath to
virtio before starting the migration. After the migration is completed,
the destination hypervisor sets the MAC filter on the VF and plugs it back
to the guest to switch over to VF datapath.

When BACKUP feature is enabled, an additional netdev(bypass netdev) is
created that acts as a master device and tracks the state of the 2 lower
netdevs. The original virtio_net netdev is marked as 'backup' netdev and a
passthru device with the same MAC is registered as 'active' netdev.

This patch is based on the discussion initiated by Jesse on this thread.
https://marc.info/?l=linux-virtualization=151189725224231=2

Signed-off-by: Sridhar Samudrala 
Signed-off-by: Alexander Duyck 
Reviewed-by: Jesse Brandeburg 

[...]



+static int virtnet_bypass_register_child(struct net_device *child_netdev)
+{
+   struct virtnet_bypass_info *vbi;
+   struct net_device *dev;
+   bool backup;
+   int ret;
+
+   if (child_netdev->addr_len != ETH_ALEN)
+   return NOTIFY_DONE;
+
+   /* We will use the MAC address to locate the virtnet_bypass netdev
+* to associate with the child netdev. If we don't find a matching
+* bypass netdev, move on.
+*/
+   dev = get_virtnet_bypass_bymac(dev_net(child_netdev),
+  child_netdev->perm_addr);
+   if (!dev)
+   return NOTIFY_DONE;
+
+   vbi = netdev_priv(dev);
+   backup = (child_netdev->dev.parent == dev->dev.parent);
+   if (backup ? rtnl_dereference(vbi->backup_netdev) :
+   rtnl_dereference(vbi->active_netdev)) {
+   netdev_info(dev,
+   "%s attempting to join bypass dev when %s already 
present\n",
+   child_netdev->name, backup ? "backup" : "active");
+   return NOTIFY_DONE;
+   }
+
+   /* Avoid non pci devices as active netdev */
+   if (!backup && (!child_netdev->dev.parent ||
+   !dev_is_pci(child_netdev->dev.parent)))
+   return NOTIFY_DONE;
+
+   ret = netdev_rx_handler_register(child_netdev,
+virtnet_bypass_handle_frame, dev);
+   if (ret != 0) {
+   netdev_err(child_netdev,
+  "can not register bypass receive handler (err = 
%d)\n",
+  ret);
+   goto rx_handler_failed;
+   }
+
+   ret = netdev_upper_dev_link(child_netdev, dev, NULL);
+   if (ret != 0) {
+   netdev_err(child_netdev,
+  "can not set master device %s (err = %d)\n",
+  dev->name, ret);
+   goto upper_link_failed;
+   }
+
+   child_netdev->flags |= IFF_SLAVE;
+
+   if (netif_running(dev)) {
+   ret = dev_open(child_netdev);
+   if (ret && (ret != -EBUSY)) {
+   netdev_err(dev, "Opening child %s failed ret:%d\n",
+  child_netdev->name, ret);
+   goto err_interface_up;
+   }
+   }

Much of this function is copy of netvsc_vf_join, should be shared with
netvsc.

Yes. This is based on netvsc_register_vf() and netvsc_vf_join(). But modified
to handle registration of 2 child netdevs(backup virtio and active vf).  In case
of netvsc, they only register VF. There is no backup netdev.
I think trying to make this code shareable with netvsc will introduce additional
checks and make it convoluted.

No problem.


Not clear what you meant here?
Want to confirm that you are agreeing that it is OK to not share this function
with netvsc.










+
+   /* Align MTU of child with master */
+   ret = dev_set_mtu(child_netdev, dev->mtu);
+   if (ret) {
+   netdev_err(dev,
+  "unable to change mtu of %s to %u register failed\n",
+  child_netdev->name, dev->mtu);
+   goto err_set_mtu;
+   }
+
+   call_netdevice_notifiers(NETDEV_JOIN, child_netdev);
+
+ 

[virtio-dev] Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available

2018-03-13 Thread Samudrala, Sridhar

On 3/12/2018 3:44 PM, Siwei Liu wrote:

Apologies, still some comments going. Please see inline.

On Thu, Mar 1, 2018 at 12:08 PM, Sridhar Samudrala
 wrote:

This patch enables virtio_net to switch over to a VF datapath when a VF
netdev is present with the same MAC address. It allows live migration
of a VM with a direct attached VF without the need to setup a bond/team
between a VF and virtio net device in the guest.

The hypervisor needs to enable only one datapath at any time so that
packets don't get looped back to the VM over the other datapath. When a VF
is plugged, the virtio datapath link state can be marked as down. The
hypervisor needs to unplug the VF device from the guest on the source host
and reset the MAC filter of the VF to initiate failover of datapath to
virtio before starting the migration. After the migration is completed,
the destination hypervisor sets the MAC filter on the VF and plugs it back
to the guest to switch over to VF datapath.

When BACKUP feature is enabled, an additional netdev(bypass netdev) is
created that acts as a master device and tracks the state of the 2 lower
netdevs. The original virtio_net netdev is marked as 'backup' netdev and a
passthru device with the same MAC is registered as 'active' netdev.

This patch is based on the discussion initiated by Jesse on this thread.
https://marc.info/?l=linux-virtualization=151189725224231=2

Signed-off-by: Sridhar Samudrala 
Signed-off-by: Alexander Duyck 
Reviewed-by: Jesse Brandeburg 
---
  drivers/net/virtio_net.c | 683 ++-
  1 file changed, 682 insertions(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index bcd13fe906ca..f2860d86c952 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -30,6 +30,8 @@
  #include 
  #include 
  #include 
+#include 
+#include 
  #include 
  #include 

@@ -206,6 +208,9 @@ struct virtnet_info {
 u32 speed;

 unsigned long guest_offloads;
+
+   /* upper netdev created when BACKUP feature enabled */
+   struct net_device *bypass_netdev;
  };

  struct padded_vnet_hdr {
@@ -2236,6 +2241,22 @@ static int virtnet_xdp(struct net_device *dev, struct 
netdev_bpf *xdp)
 }
  }

+static int virtnet_get_phys_port_name(struct net_device *dev, char *buf,
+ size_t len)
+{
+   struct virtnet_info *vi = netdev_priv(dev);
+   int ret;
+
+   if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_BACKUP))
+   return -EOPNOTSUPP;
+
+   ret = snprintf(buf, len, "_bkup");
+   if (ret >= len)
+   return -EOPNOTSUPP;
+
+   return 0;
+}
+
  static const struct net_device_ops virtnet_netdev = {
 .ndo_open= virtnet_open,
 .ndo_stop= virtnet_close,
@@ -2253,6 +2274,7 @@ static const struct net_device_ops virtnet_netdev = {
 .ndo_xdp_xmit   = virtnet_xdp_xmit,
 .ndo_xdp_flush  = virtnet_xdp_flush,
 .ndo_features_check = passthru_features_check,
+   .ndo_get_phys_port_name = virtnet_get_phys_port_name,
  };

  static void virtnet_config_changed_work(struct work_struct *work)
@@ -2647,6 +2669,653 @@ static int virtnet_validate(struct virtio_device *vdev)
 return 0;
  }

+/* START of functions supporting VIRTIO_NET_F_BACKUP feature.
+ * When BACKUP feature is enabled, an additional netdev(bypass netdev)
+ * is created that acts as a master device and tracks the state of the
+ * 2 lower netdevs. The original virtio_net netdev is registered as
+ * 'backup' netdev and a passthru device with the same MAC is registered
+ * as 'active' netdev.
+ */
+
+/* bypass state maintained when BACKUP feature is enabled */
+struct virtnet_bypass_info {
+   /* passthru netdev with same MAC */
+   struct net_device __rcu *active_netdev;
+
+   /* virtio_net netdev */
+   struct net_device __rcu *backup_netdev;
+
+   /* active netdev stats */
+   struct rtnl_link_stats64 active_stats;
+
+   /* backup netdev stats */
+   struct rtnl_link_stats64 backup_stats;
+
+   /* aggregated stats */
+   struct rtnl_link_stats64 bypass_stats;
+
+   /* spinlock while updating stats */
+   spinlock_t stats_lock;
+};
+
+static void virtnet_bypass_child_open(struct net_device *dev,
+ struct net_device *child_netdev)
+{
+   int err = dev_open(child_netdev);
+
+   if (err)
+   netdev_warn(dev, "unable to open slave: %s: %d\n",
+   child_netdev->name, err);
+}

Why ignoring the error?? i.e. virtnet_bypass_child_open should have
return value. I don't believe the caller is in a safe context to
guarantee the dev_open always succeeds.


OK.  Will handle this in the next revision.





+
+static int virtnet_bypass_open(struct 

[virtio-dev] Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available

2018-03-12 Thread Siwei Liu
Apologies, still some comments going. Please see inline.

On Thu, Mar 1, 2018 at 12:08 PM, Sridhar Samudrala
 wrote:
> This patch enables virtio_net to switch over to a VF datapath when a VF
> netdev is present with the same MAC address. It allows live migration
> of a VM with a direct attached VF without the need to setup a bond/team
> between a VF and virtio net device in the guest.
>
> The hypervisor needs to enable only one datapath at any time so that
> packets don't get looped back to the VM over the other datapath. When a VF
> is plugged, the virtio datapath link state can be marked as down. The
> hypervisor needs to unplug the VF device from the guest on the source host
> and reset the MAC filter of the VF to initiate failover of datapath to
> virtio before starting the migration. After the migration is completed,
> the destination hypervisor sets the MAC filter on the VF and plugs it back
> to the guest to switch over to VF datapath.
>
> When BACKUP feature is enabled, an additional netdev(bypass netdev) is
> created that acts as a master device and tracks the state of the 2 lower
> netdevs. The original virtio_net netdev is marked as 'backup' netdev and a
> passthru device with the same MAC is registered as 'active' netdev.
>
> This patch is based on the discussion initiated by Jesse on this thread.
> https://marc.info/?l=linux-virtualization=151189725224231=2
>
> Signed-off-by: Sridhar Samudrala 
> Signed-off-by: Alexander Duyck 
> Reviewed-by: Jesse Brandeburg 
> ---
>  drivers/net/virtio_net.c | 683 
> ++-
>  1 file changed, 682 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index bcd13fe906ca..f2860d86c952 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -30,6 +30,8 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>  #include 
>  #include 
>
> @@ -206,6 +208,9 @@ struct virtnet_info {
> u32 speed;
>
> unsigned long guest_offloads;
> +
> +   /* upper netdev created when BACKUP feature enabled */
> +   struct net_device *bypass_netdev;
>  };
>
>  struct padded_vnet_hdr {
> @@ -2236,6 +2241,22 @@ static int virtnet_xdp(struct net_device *dev, struct 
> netdev_bpf *xdp)
> }
>  }
>
> +static int virtnet_get_phys_port_name(struct net_device *dev, char *buf,
> + size_t len)
> +{
> +   struct virtnet_info *vi = netdev_priv(dev);
> +   int ret;
> +
> +   if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_BACKUP))
> +   return -EOPNOTSUPP;
> +
> +   ret = snprintf(buf, len, "_bkup");
> +   if (ret >= len)
> +   return -EOPNOTSUPP;
> +
> +   return 0;
> +}
> +
>  static const struct net_device_ops virtnet_netdev = {
> .ndo_open= virtnet_open,
> .ndo_stop= virtnet_close,
> @@ -2253,6 +2274,7 @@ static const struct net_device_ops virtnet_netdev = {
> .ndo_xdp_xmit   = virtnet_xdp_xmit,
> .ndo_xdp_flush  = virtnet_xdp_flush,
> .ndo_features_check = passthru_features_check,
> +   .ndo_get_phys_port_name = virtnet_get_phys_port_name,
>  };
>
>  static void virtnet_config_changed_work(struct work_struct *work)
> @@ -2647,6 +2669,653 @@ static int virtnet_validate(struct virtio_device 
> *vdev)
> return 0;
>  }
>
> +/* START of functions supporting VIRTIO_NET_F_BACKUP feature.
> + * When BACKUP feature is enabled, an additional netdev(bypass netdev)
> + * is created that acts as a master device and tracks the state of the
> + * 2 lower netdevs. The original virtio_net netdev is registered as
> + * 'backup' netdev and a passthru device with the same MAC is registered
> + * as 'active' netdev.
> + */
> +
> +/* bypass state maintained when BACKUP feature is enabled */
> +struct virtnet_bypass_info {
> +   /* passthru netdev with same MAC */
> +   struct net_device __rcu *active_netdev;
> +
> +   /* virtio_net netdev */
> +   struct net_device __rcu *backup_netdev;
> +
> +   /* active netdev stats */
> +   struct rtnl_link_stats64 active_stats;
> +
> +   /* backup netdev stats */
> +   struct rtnl_link_stats64 backup_stats;
> +
> +   /* aggregated stats */
> +   struct rtnl_link_stats64 bypass_stats;
> +
> +   /* spinlock while updating stats */
> +   spinlock_t stats_lock;
> +};
> +
> +static void virtnet_bypass_child_open(struct net_device *dev,
> + struct net_device *child_netdev)
> +{
> +   int err = dev_open(child_netdev);
> +
> +   if (err)
> +   netdev_warn(dev, "unable to open slave: %s: %d\n",
> +   child_netdev->name, err);
> +}

Why ignoring the error?? i.e. virtnet_bypass_child_open should have
return value. I don't 

[virtio-dev] Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available

2018-03-12 Thread Siwei Liu
On Sat, Mar 3, 2018 at 8:04 PM, Michael S. Tsirkin  wrote:
> On Fri, Mar 02, 2018 at 03:56:31PM -0800, Siwei Liu wrote:
>> On Fri, Mar 2, 2018 at 1:36 PM, Michael S. Tsirkin  wrote:
>> > On Fri, Mar 02, 2018 at 01:11:56PM -0800, Siwei Liu wrote:
>> >> On Thu, Mar 1, 2018 at 12:08 PM, Sridhar Samudrala
>> >>  wrote:
>> >> > This patch enables virtio_net to switch over to a VF datapath when a VF
>> >> > netdev is present with the same MAC address. It allows live migration
>> >> > of a VM with a direct attached VF without the need to setup a bond/team
>> >> > between a VF and virtio net device in the guest.
>> >> >
>> >> > The hypervisor needs to enable only one datapath at any time so that
>> >> > packets don't get looped back to the VM over the other datapath. When a 
>> >> > VF
>> >> > is plugged, the virtio datapath link state can be marked as down. The
>> >> > hypervisor needs to unplug the VF device from the guest on the source 
>> >> > host
>> >> > and reset the MAC filter of the VF to initiate failover of datapath to
>> >> > virtio before starting the migration. After the migration is completed,
>> >> > the destination hypervisor sets the MAC filter on the VF and plugs it 
>> >> > back
>> >> > to the guest to switch over to VF datapath.
>> >> >
>> >> > When BACKUP feature is enabled, an additional netdev(bypass netdev) is
>> >> > created that acts as a master device and tracks the state of the 2 lower
>> >> > netdevs. The original virtio_net netdev is marked as 'backup' netdev 
>> >> > and a
>> >> > passthru device with the same MAC is registered as 'active' netdev.
>> >> >
>> >> > This patch is based on the discussion initiated by Jesse on this thread.
>> >> > https://marc.info/?l=linux-virtualization=151189725224231=2
>> >> >
>> >> > Signed-off-by: Sridhar Samudrala 
>> >> > Signed-off-by: Alexander Duyck 
>> >> > Reviewed-by: Jesse Brandeburg 
>> >> > ---
>> >> >  drivers/net/virtio_net.c | 683 
>> >> > ++-
>> >> >  1 file changed, 682 insertions(+), 1 deletion(-)
>> >> >
>> >> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> >> > index bcd13fe906ca..f2860d86c952 100644
>> >> > --- a/drivers/net/virtio_net.c
>> >> > +++ b/drivers/net/virtio_net.c
>> >> > @@ -30,6 +30,8 @@
>> >> >  #include 
>> >> >  #include 
>> >> >  #include 
>> >> > +#include 
>> >> > +#include 
>> >> >  #include 
>> >> >  #include 
>> >> >
>> >> > @@ -206,6 +208,9 @@ struct virtnet_info {
>> >> > u32 speed;
>> >> >
>> >> > unsigned long guest_offloads;
>> >> > +
>> >> > +   /* upper netdev created when BACKUP feature enabled */
>> >> > +   struct net_device *bypass_netdev;
>> >> >  };
>> >> >
>> >> >  struct padded_vnet_hdr {
>> >> > @@ -2236,6 +2241,22 @@ static int virtnet_xdp(struct net_device *dev, 
>> >> > struct netdev_bpf *xdp)
>> >> > }
>> >> >  }
>> >> >
>> >> > +static int virtnet_get_phys_port_name(struct net_device *dev, char 
>> >> > *buf,
>> >> > + size_t len)
>> >> > +{
>> >> > +   struct virtnet_info *vi = netdev_priv(dev);
>> >> > +   int ret;
>> >> > +
>> >> > +   if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_BACKUP))
>> >> > +   return -EOPNOTSUPP;
>> >> > +
>> >> > +   ret = snprintf(buf, len, "_bkup");
>> >> > +   if (ret >= len)
>> >> > +   return -EOPNOTSUPP;
>> >> > +
>> >> > +   return 0;
>> >> > +}
>> >> > +
>> >>
>> >> What if the systemd/udevd is not new enough to enforce the
>> >> n naming? Would virtio_bypass get a different name
>> >> than the original virtio_net?
>> >
>> > You mean people using ethX names? Any hardware config change breaks
>> > these, I don't think that can be helped.
>>
>> I don't like the way to rely on .ndo_get_phys_port_name - it's fragile
>> and it does not completely solve the problem it tries to address.
>> Imagine what can end up with if getting an old udevd, or users already
>> have exsiting explicit udev rules around phys_port_name. It does not
>> give you the an ack in saying "yes, I know you're the bypass and
>> you're the backup, please continue and I will give you both correct
>> names", or an unacknowlegment saying "no, I don't know what these
>> extra interfaces are, please go back and leave the VF device alone".
>> We need new udev API for both feature negotiation and naming, or may
>> even completely hide the lower interfaces.
>
> Go ahead and try to make this happen, but I won't hold my
> breath.

Heads-up: I have some working code to hide the lower devices, upon
which the new udev/sysfs interace for feature negotiation and naming
will be built. I will get it posted for review in the next few days
once ready. Hopefully this could end the fight between the 3-netdev
and 2-netdev driver model, as I see most of the problems being argued
in 

[virtio-dev] Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available

2018-03-12 Thread Samudrala, Sridhar



On 3/12/2018 1:12 PM, Jiri Pirko wrote:

Thu, Mar 01, 2018 at 09:08:43PM CET, sridhar.samudr...@intel.com wrote:

This patch enables virtio_net to switch over to a VF datapath when a VF
netdev is present with the same MAC address. It allows live migration
of a VM with a direct attached VF without the need to setup a bond/team
between a VF and virtio net device in the guest.

The hypervisor needs to enable only one datapath at any time so that
packets don't get looped back to the VM over the other datapath. When a VF
is plugged, the virtio datapath link state can be marked as down. The
hypervisor needs to unplug the VF device from the guest on the source host
and reset the MAC filter of the VF to initiate failover of datapath to
virtio before starting the migration. After the migration is completed,
the destination hypervisor sets the MAC filter on the VF and plugs it back
to the guest to switch over to VF datapath.

When BACKUP feature is enabled, an additional netdev(bypass netdev) is
created that acts as a master device and tracks the state of the 2 lower
netdevs. The original virtio_net netdev is marked as 'backup' netdev and a
passthru device with the same MAC is registered as 'active' netdev.

This patch is based on the discussion initiated by Jesse on this thread.
https://marc.info/?l=linux-virtualization=151189725224231=2

Signed-off-by: Sridhar Samudrala 
Signed-off-by: Alexander Duyck 
Reviewed-by: Jesse Brandeburg 

[...]



+static int virtnet_bypass_register_child(struct net_device *child_netdev)
+{
+   struct virtnet_bypass_info *vbi;
+   struct net_device *dev;
+   bool backup;
+   int ret;
+
+   if (child_netdev->addr_len != ETH_ALEN)
+   return NOTIFY_DONE;
+
+   /* We will use the MAC address to locate the virtnet_bypass netdev
+* to associate with the child netdev. If we don't find a matching
+* bypass netdev, move on.
+*/
+   dev = get_virtnet_bypass_bymac(dev_net(child_netdev),
+  child_netdev->perm_addr);
+   if (!dev)
+   return NOTIFY_DONE;
+
+   vbi = netdev_priv(dev);
+   backup = (child_netdev->dev.parent == dev->dev.parent);
+   if (backup ? rtnl_dereference(vbi->backup_netdev) :
+   rtnl_dereference(vbi->active_netdev)) {
+   netdev_info(dev,
+   "%s attempting to join bypass dev when %s already 
present\n",
+   child_netdev->name, backup ? "backup" : "active");
+   return NOTIFY_DONE;
+   }
+
+   /* Avoid non pci devices as active netdev */
+   if (!backup && (!child_netdev->dev.parent ||
+   !dev_is_pci(child_netdev->dev.parent)))
+   return NOTIFY_DONE;
+
+   ret = netdev_rx_handler_register(child_netdev,
+virtnet_bypass_handle_frame, dev);
+   if (ret != 0) {
+   netdev_err(child_netdev,
+  "can not register bypass receive handler (err = 
%d)\n",
+  ret);
+   goto rx_handler_failed;
+   }
+
+   ret = netdev_upper_dev_link(child_netdev, dev, NULL);
+   if (ret != 0) {
+   netdev_err(child_netdev,
+  "can not set master device %s (err = %d)\n",
+  dev->name, ret);
+   goto upper_link_failed;
+   }
+
+   child_netdev->flags |= IFF_SLAVE;
+
+   if (netif_running(dev)) {
+   ret = dev_open(child_netdev);
+   if (ret && (ret != -EBUSY)) {
+   netdev_err(dev, "Opening child %s failed ret:%d\n",
+  child_netdev->name, ret);
+   goto err_interface_up;
+   }
+   }

Much of this function is copy of netvsc_vf_join, should be shared with
netvsc.


Yes. This is based on netvsc_register_vf() and netvsc_vf_join(). But modified
to handle registration of 2 child netdevs(backup virtio and active vf).  In case
of netvsc, they only register VF. There is no backup netdev.
I think trying to make this code shareable with netvsc will introduce additional
checks and make it convoluted.






+
+   /* Align MTU of child with master */
+   ret = dev_set_mtu(child_netdev, dev->mtu);
+   if (ret) {
+   netdev_err(dev,
+  "unable to change mtu of %s to %u register failed\n",
+  child_netdev->name, dev->mtu);
+   goto err_set_mtu;
+   }
+
+   call_netdevice_notifiers(NETDEV_JOIN, child_netdev);
+
+   netdev_info(dev, "registering %s\n", child_netdev->name);
+
+   dev_hold(child_netdev);
+   if (backup) {
+   rcu_assign_pointer(vbi->backup_netdev, child_netdev);
+   dev_get_stats(vbi->backup_netdev, 

Re: [virtio-dev] Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available

2018-03-12 Thread Samudrala, Sridhar

On 3/7/2018 12:11 PM, Michael S. Tsirkin wrote:

On Wed, Mar 07, 2018 at 10:06:30AM -0800, Stephen Hemminger wrote:

On Wed, 7 Mar 2018 09:50:50 -0800
Alexander Duyck  wrote:


On Tue, Mar 6, 2018 at 6:38 PM, Michael S. Tsirkin  wrote:

On Tue, Mar 06, 2018 at 03:27:46PM -0800, Alexander Duyck wrote:

I definitelly vote for a separate common shared code for both netvsc and
virtio_net - even if you use 2 and 3 netdev model, you could share the
common code. Strict checks and limitation should be in place.

Noted. But as I also mentioned there isn't that much "common" code
between the two models. I think if anything we could probably look at
peeling out a few bits such as "get__bymac" which really would
become dev_get_by_mac_and_ops in order to find the device for the
notifiers. I probably wouldn't even put that in our driver and would
instead put it in the core code since it almost makes more sense
there. Beyond that sharing becomes much more challenging due to the
differences in the Rx and Tx paths that build out of the difference
between the 2 driver and 3 driver models.

At this point it might be worth it to articulate the advantages
of the 3 netdev model.

The main advantages are performance on the lower devices. Specifically
we can run LLTX and IFF_NOQUEUE on the upper device, meaning the VF
doesn't take a performance hit when we start transmitting through it.
In addition the paravirtual device is able to fully expose it's
features, so for example virtio_net can maintain in-driver XDP, and we
can provide generic XDP on the upper device. We can also have an
asymmetric configuration where the number of queues or feature sets
can be different between the ports.

Basically what it comes down to is in the 3 netdev model we never have
to deal with the issues of going from being a standard netdev to being
a master netdev. The role of each netdev is clearly defined.

As far as doing a generic solution it is the preferred way to go as
well since we don't have to rewrite functionality of the lower device.
Currently the only changes really needed are to add a phys_port_name
operation so that you can distinguish between the bypass interface and
the original paravirtual one. As such you have no confusion about what
you are running.


If they are compelling, why wouldn't netvsc users want them?

I believe part of the logic for Stephen's choice is that netvsc
doesn't have a "BACKUP" bit like what we have virtio_net. As a result
they have no way of knowing if a VF will ever show up or not. That
makes the 3 netdev solution less appealing as they always end up in
the bonding mode. In addition they have intentionally limited
themselves to avoid features such as XDP that would cause issues with
the 2 netdev model.


Alex, I think you were one of the strongest proponents of this model,
you should be well placed to provide a summary.

Hopefully the information provided helps. In my mind the issue is that
netvsc was not designed correctly, and as such it is using the 2
netdev model as a bit of a crutch to handle the case where they wanted
to add a VF bypass. At this point it has become a part of their
architecture so it would be confusing for their user base to deal with
having 2 netdevs spawn every time their driver sees a new device. In
the case of virtio we only spawn 2 netdevs if the backup bit is set
which implies a specific use case. When the bit isn't set we are only
spawning one device, and as a result we can get much better
performance out of the resultant combination of devices, and can
expose functionality such as in-driver XDP in virtio.


I have experimented with toggling IFF_NOQUEUE and/or LLTX on the netvsc
device when doing passthrough. It didn't help performance much and there
are corner cases that make it painful, like what if qdisc was placed
by user on the netvsc device?  Should the qdisc then be moved back
and forth to the VF device when it arrives or is removed?

As far as XDP support, it is on the plan to support XDP on the netvsc
device since the receive buffers do have to be copied. But there has
been no customer demand for it; and the VF model on Azure has limitations
which make a transparent XDP model pretty much impossible.


Jiri, does that answer your question? Even though there is some
similarity, the needs of netvsc and virtio appear significantly
different.

We do not yet know whether other PV devices will be virtio-like
or netvsc-like.

Do you agree?


Michael,

At this point can we agree to go with 3 netdev model for virtio and as 
this is

not compatible with netvsc's 2 netdev model,  the scope for any common code
between virtio and netvsc is very limited.

Should i submit a v5 version of this patch series with some minor 
fixes?  Or can

you pull in this series and i can submit patches on top of that?

Thanks
Sridhar

-
To unsubscribe, e-mail: 

[virtio-dev] Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available

2018-03-07 Thread Michael S. Tsirkin
On Wed, Mar 07, 2018 at 10:06:30AM -0800, Stephen Hemminger wrote:
> On Wed, 7 Mar 2018 09:50:50 -0800
> Alexander Duyck  wrote:
> 
> > On Tue, Mar 6, 2018 at 6:38 PM, Michael S. Tsirkin  wrote:
> > > On Tue, Mar 06, 2018 at 03:27:46PM -0800, Alexander Duyck wrote:  
> > >> > I definitelly vote for a separate common shared code for both netvsc 
> > >> > and
> > >> > virtio_net - even if you use 2 and 3 netdev model, you could share the
> > >> > common code. Strict checks and limitation should be in place.  
> > >>
> > >> Noted. But as I also mentioned there isn't that much "common" code
> > >> between the two models. I think if anything we could probably look at
> > >> peeling out a few bits such as "get__bymac" which really would
> > >> become dev_get_by_mac_and_ops in order to find the device for the
> > >> notifiers. I probably wouldn't even put that in our driver and would
> > >> instead put it in the core code since it almost makes more sense
> > >> there. Beyond that sharing becomes much more challenging due to the
> > >> differences in the Rx and Tx paths that build out of the difference
> > >> between the 2 driver and 3 driver models.  
> > >
> > > At this point it might be worth it to articulate the advantages
> > > of the 3 netdev model.  
> > 
> > The main advantages are performance on the lower devices. Specifically
> > we can run LLTX and IFF_NOQUEUE on the upper device, meaning the VF
> > doesn't take a performance hit when we start transmitting through it.
> > In addition the paravirtual device is able to fully expose it's
> > features, so for example virtio_net can maintain in-driver XDP, and we
> > can provide generic XDP on the upper device. We can also have an
> > asymmetric configuration where the number of queues or feature sets
> > can be different between the ports.
> > 
> > Basically what it comes down to is in the 3 netdev model we never have
> > to deal with the issues of going from being a standard netdev to being
> > a master netdev. The role of each netdev is clearly defined.
> > 
> > As far as doing a generic solution it is the preferred way to go as
> > well since we don't have to rewrite functionality of the lower device.
> > Currently the only changes really needed are to add a phys_port_name
> > operation so that you can distinguish between the bypass interface and
> > the original paravirtual one. As such you have no confusion about what
> > you are running.
> > 
> > > If they are compelling, why wouldn't netvsc users want them?  
> > 
> > I believe part of the logic for Stephen's choice is that netvsc
> > doesn't have a "BACKUP" bit like what we have virtio_net. As a result
> > they have no way of knowing if a VF will ever show up or not. That
> > makes the 3 netdev solution less appealing as they always end up in
> > the bonding mode. In addition they have intentionally limited
> > themselves to avoid features such as XDP that would cause issues with
> > the 2 netdev model.
> > 
> > > Alex, I think you were one of the strongest proponents of this model,
> > > you should be well placed to provide a summary.  
> > 
> > Hopefully the information provided helps. In my mind the issue is that
> > netvsc was not designed correctly, and as such it is using the 2
> > netdev model as a bit of a crutch to handle the case where they wanted
> > to add a VF bypass. At this point it has become a part of their
> > architecture so it would be confusing for their user base to deal with
> > having 2 netdevs spawn every time their driver sees a new device. In
> > the case of virtio we only spawn 2 netdevs if the backup bit is set
> > which implies a specific use case. When the bit isn't set we are only
> > spawning one device, and as a result we can get much better
> > performance out of the resultant combination of devices, and can
> > expose functionality such as in-driver XDP in virtio.
> 
> 
> I have experimented with toggling IFF_NOQUEUE and/or LLTX on the netvsc
> device when doing passthrough. It didn't help performance much and there
> are corner cases that make it painful, like what if qdisc was placed
> by user on the netvsc device?  Should the qdisc then be moved back
> and forth to the VF device when it arrives or is removed?
> 
> As far as XDP support, it is on the plan to support XDP on the netvsc
> device since the receive buffers do have to be copied. But there has
> been no customer demand for it; and the VF model on Azure has limitations
> which make a transparent XDP model pretty much impossible.


Jiri, does that answer your question? Even though there is some
similarity, the needs of netvsc and virtio appear significantly
different.

We do not yet know whether other PV devices will be virtio-like
or netvsc-like.

Do you agree?

-- 
MST

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: 

[virtio-dev] Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available

2018-03-07 Thread Alexander Duyck
On Wed, Mar 7, 2018 at 10:06 AM, Stephen Hemminger
 wrote:
> On Wed, 7 Mar 2018 09:50:50 -0800
> Alexander Duyck  wrote:
>
>> On Tue, Mar 6, 2018 at 6:38 PM, Michael S. Tsirkin  wrote:
>> > On Tue, Mar 06, 2018 at 03:27:46PM -0800, Alexander Duyck wrote:
>> >> > I definitelly vote for a separate common shared code for both netvsc and
>> >> > virtio_net - even if you use 2 and 3 netdev model, you could share the
>> >> > common code. Strict checks and limitation should be in place.
>> >>
>> >> Noted. But as I also mentioned there isn't that much "common" code
>> >> between the two models. I think if anything we could probably look at
>> >> peeling out a few bits such as "get__bymac" which really would
>> >> become dev_get_by_mac_and_ops in order to find the device for the
>> >> notifiers. I probably wouldn't even put that in our driver and would
>> >> instead put it in the core code since it almost makes more sense
>> >> there. Beyond that sharing becomes much more challenging due to the
>> >> differences in the Rx and Tx paths that build out of the difference
>> >> between the 2 driver and 3 driver models.
>> >
>> > At this point it might be worth it to articulate the advantages
>> > of the 3 netdev model.
>>
>> The main advantages are performance on the lower devices. Specifically
>> we can run LLTX and IFF_NOQUEUE on the upper device, meaning the VF
>> doesn't take a performance hit when we start transmitting through it.
>> In addition the paravirtual device is able to fully expose it's
>> features, so for example virtio_net can maintain in-driver XDP, and we
>> can provide generic XDP on the upper device. We can also have an
>> asymmetric configuration where the number of queues or feature sets
>> can be different between the ports.
>>
>> Basically what it comes down to is in the 3 netdev model we never have
>> to deal with the issues of going from being a standard netdev to being
>> a master netdev. The role of each netdev is clearly defined.
>>
>> As far as doing a generic solution it is the preferred way to go as
>> well since we don't have to rewrite functionality of the lower device.
>> Currently the only changes really needed are to add a phys_port_name
>> operation so that you can distinguish between the bypass interface and
>> the original paravirtual one. As such you have no confusion about what
>> you are running.
>>
>> > If they are compelling, why wouldn't netvsc users want them?
>>
>> I believe part of the logic for Stephen's choice is that netvsc
>> doesn't have a "BACKUP" bit like what we have virtio_net. As a result
>> they have no way of knowing if a VF will ever show up or not. That
>> makes the 3 netdev solution less appealing as they always end up in
>> the bonding mode. In addition they have intentionally limited
>> themselves to avoid features such as XDP that would cause issues with
>> the 2 netdev model.
>>
>> > Alex, I think you were one of the strongest proponents of this model,
>> > you should be well placed to provide a summary.
>>
>> Hopefully the information provided helps. In my mind the issue is that
>> netvsc was not designed correctly, and as such it is using the 2
>> netdev model as a bit of a crutch to handle the case where they wanted
>> to add a VF bypass. At this point it has become a part of their
>> architecture so it would be confusing for their user base to deal with
>> having 2 netdevs spawn every time their driver sees a new device. In
>> the case of virtio we only spawn 2 netdevs if the backup bit is set
>> which implies a specific use case. When the bit isn't set we are only
>> spawning one device, and as a result we can get much better
>> performance out of the resultant combination of devices, and can
>> expose functionality such as in-driver XDP in virtio.
>
>
> I have experimented with toggling IFF_NOQUEUE and/or LLTX on the netvsc
> device when doing passthrough. It didn't help performance much and there
> are corner cases that make it painful, like what if qdisc was placed
> by user on the netvsc device?  Should the qdisc then be moved back
> and forth to the VF device when it arrives or is removed?

The advantages of IFF_NOQUEUE and LLTX would mostly be seen on small
packets, and I wouldn't risk toggling them on/off as doing so with
packets in flight might lead to odd results. That was one of the
reasons why I wanted to just stick with the 3 netdev model for us, and
leave you guys running with the 2 netdev model.

> As far as XDP support, it is on the plan to support XDP on the netvsc
> device since the receive buffers do have to be copied. But there has
> been no customer demand for it; and the VF model on Azure has limitations
> which make a transparent XDP model pretty much impossible.

I wasn't saying you needed to implement XDP in netvsc. My point was
that in virtio we already have it and it would be confusing for us to
have a virtio driver that had to drop support for it in 

[virtio-dev] Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available

2018-03-07 Thread Alexander Duyck
On Tue, Mar 6, 2018 at 6:38 PM, Michael S. Tsirkin  wrote:
> On Tue, Mar 06, 2018 at 03:27:46PM -0800, Alexander Duyck wrote:
>> > I definitelly vote for a separate common shared code for both netvsc and
>> > virtio_net - even if you use 2 and 3 netdev model, you could share the
>> > common code. Strict checks and limitation should be in place.
>>
>> Noted. But as I also mentioned there isn't that much "common" code
>> between the two models. I think if anything we could probably look at
>> peeling out a few bits such as "get__bymac" which really would
>> become dev_get_by_mac_and_ops in order to find the device for the
>> notifiers. I probably wouldn't even put that in our driver and would
>> instead put it in the core code since it almost makes more sense
>> there. Beyond that sharing becomes much more challenging due to the
>> differences in the Rx and Tx paths that build out of the difference
>> between the 2 driver and 3 driver models.
>
> At this point it might be worth it to articulate the advantages
> of the 3 netdev model.

The main advantages are performance on the lower devices. Specifically
we can run LLTX and IFF_NOQUEUE on the upper device, meaning the VF
doesn't take a performance hit when we start transmitting through it.
In addition the paravirtual device is able to fully expose it's
features, so for example virtio_net can maintain in-driver XDP, and we
can provide generic XDP on the upper device. We can also have an
asymmetric configuration where the number of queues or feature sets
can be different between the ports.

Basically what it comes down to is in the 3 netdev model we never have
to deal with the issues of going from being a standard netdev to being
a master netdev. The role of each netdev is clearly defined.

As far as doing a generic solution it is the preferred way to go as
well since we don't have to rewrite functionality of the lower device.
Currently the only changes really needed are to add a phys_port_name
operation so that you can distinguish between the bypass interface and
the original paravirtual one. As such you have no confusion about what
you are running.

> If they are compelling, why wouldn't netvsc users want them?

I believe part of the logic for Stephen's choice is that netvsc
doesn't have a "BACKUP" bit like what we have virtio_net. As a result
they have no way of knowing if a VF will ever show up or not. That
makes the 3 netdev solution less appealing as they always end up in
the bonding mode. In addition they have intentionally limited
themselves to avoid features such as XDP that would cause issues with
the 2 netdev model.

> Alex, I think you were one of the strongest proponents of this model,
> you should be well placed to provide a summary.

Hopefully the information provided helps. In my mind the issue is that
netvsc was not designed correctly, and as such it is using the 2
netdev model as a bit of a crutch to handle the case where they wanted
to add a VF bypass. At this point it has become a part of their
architecture so it would be confusing for their user base to deal with
having 2 netdevs spawn every time their driver sees a new device. In
the case of virtio we only spawn 2 netdevs if the backup bit is set
which implies a specific use case. When the bit isn't set we are only
spawning one device, and as a result we can get much better
performance out of the resultant combination of devices, and can
expose functionality such as in-driver XDP in virtio.

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available

2018-03-06 Thread Michael S. Tsirkin
On Tue, Mar 06, 2018 at 03:27:46PM -0800, Alexander Duyck wrote:
> > I definitelly vote for a separate common shared code for both netvsc and
> > virtio_net - even if you use 2 and 3 netdev model, you could share the
> > common code. Strict checks and limitation should be in place.
> 
> Noted. But as I also mentioned there isn't that much "common" code
> between the two models. I think if anything we could probably look at
> peeling out a few bits such as "get__bymac" which really would
> become dev_get_by_mac_and_ops in order to find the device for the
> notifiers. I probably wouldn't even put that in our driver and would
> instead put it in the core code since it almost makes more sense
> there. Beyond that sharing becomes much more challenging due to the
> differences in the Rx and Tx paths that build out of the difference
> between the 2 driver and 3 driver models.

At this point it might be worth it to articulate the advantages
of the 3 netdev model.

If they are compelling, why wouldn't netvsc users want them?

Alex, I think you were one of the strongest proponents of this model,
you should be well placed to provide a summary.

-- 
MST

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available

2018-03-06 Thread Alexander Duyck
On Tue, Mar 6, 2018 at 2:59 PM, Jiri Pirko  wrote:
> Tue, Mar 06, 2018 at 08:08:21PM CET, alexander.du...@gmail.com wrote:
>>On Mon, Mar 5, 2018 at 7:15 PM, Stephen Hemminger
>> wrote:
>>> On Mon, 5 Mar 2018 14:47:20 -0800
>>> Alexander Duyck  wrote:
>>>
 On Mon, Mar 5, 2018 at 2:30 PM, Jiri Pirko  wrote:
 > Mon, Mar 05, 2018 at 05:11:32PM CET, step...@networkplumber.org wrote:
 >>On Mon, 5 Mar 2018 10:21:18 +0100
 >>Jiri Pirko  wrote:
 >>
 >>> Sun, Mar 04, 2018 at 10:58:34PM CET, alexander.du...@gmail.com wrote:
 >>> >On Sun, Mar 4, 2018 at 10:50 AM, Jiri Pirko  wrote:
 >>> >> Sun, Mar 04, 2018 at 07:24:12PM CET, alexander.du...@gmail.com 
 >>> >> wrote:
 >>> >>>On Sat, Mar 3, 2018 at 11:13 PM, Jiri Pirko  
 >>> >>>wrote:
 >>>
 >>> [...]
 >>>
 >>> >
 >>> >>>Currently we only have agreement from Michael on taking this code, 
 >>> >>>as
 >>> >>>such we are working with virtio only for now. When the time comes 
 >>> >>>that
 >>> >>
 >>> >> If you do duplication of netvsc in-driver bonding in virtio_net, it 
 >>> >> will
 >>> >> stay there forever. So what you say is: "We will do it halfway now
 >>> >> and promise to fix it later". That later will never happen, I'm 
 >>> >> pretty
 >>> >> sure. That is why I push for in-driver bonding shared code as a 
 >>> >> part of
 >>> >> this patchset.
 >>> >
 >>> >You want this new approach and a copy of netvsc moved into either core
 >>> >or some module of its own. I say pick an architecture. We are looking
 >>> >at either 2 netdevs or 3. We are not going to support both because
 >>> >that will ultimately lead to a terrible user experience and make
 >>> >things quite confusing.
 >>> >
 >>> >> + if you would be pushing first driver to do this, I would 
 >>> >> understand.
 >>> >> But the first driver is already in. You are pushing second. This is 
 >>> >> the
 >>> >> time to do the sharing, unification of behaviour. Next time is too 
 >>> >> late.
 >>> >
 >>> >That is great, if we want to share then lets share. But what you are
 >>> >essentially telling us is that we need to fork this solution and
 >>> >maintain two code paths, one for 2 netdevs, and another for 3. At that
 >>> >point what is the point in merging them together?
 >>>
 >>> Of course, I vote for the same behaviour for netvsc and virtio_net. 
 >>> That
 >>> is my point from the very beginning.
 >>>
 >>> Stephen, what do you think? Could we please make virtio_net and netvsc
 >>> behave the same and to use a single code with well-defined checks and
 >>> restrictions for this feature?
 >>
 >>Eventually, yes both could share common code routines. In reality,
 >>the failover stuff is only a very small part of either driver so
 >>it is not worth stretching to try and cover too much. If you look,
 >>the failover code is just using routines that already exist for
 >>use by bonding, teaming, etc.
 >
 > Yeah, we consern was also about the code that processes the netdev
 > notifications and does auto-enslave and all related stuff.

 The concern was the driver model. If we expose 3 netdevs or 2 with the
 VF driver present. Somehow this is turning into a "merge netvsc into
 virtio" think and that isn't the subject that was being asked.

 Ideally we want one model for this. Either 3 netdevs or 2. The problem
 is 2 causes issues in terms of performance and will limit features of
 virtio, but 2 is the precedent set by netvsc. We need to figure out
 the path forward for this. There is talk about "sharing" but it is
 hard to make these two approaches share code when they are doing two
 very different setups and end up presenting themselves as two very
 different driver models.
>>>
>>> I appreciate this discussion, and it has helped a lot.
>>>
>>> Netvsc is stuck with 2 netdev model for the foreseeable future.
>>> We already failed once with the bonding model, and that created a lot of
>>> pain. The current model is working well and have convinced the major distros
>>> to support the two netdev model and don't want to back.
>>>
>>> Very open to optimizations and ways to smooth out the rough edges.
>>
>>Thank you for clarifying this Stephen.
>>
>>Okay. So with things defined such that we are doing a 2 netdev model
>>for netvsc, and a 3 netdev model for virtio, is it still in our
>>interest for us to try making a shared library between the two? In my
>>mind, the virtnet_bypass becomes the way we go forward for any future
>>solutions. I say we treat the netvsc approach as a "legacy" approach
>>and avoid creating any new libraries or drivers to support it, and
>>instead just focus on the 3 

[virtio-dev] Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available

2018-03-06 Thread Alexander Duyck
On Mon, Mar 5, 2018 at 7:15 PM, Stephen Hemminger
 wrote:
> On Mon, 5 Mar 2018 14:47:20 -0800
> Alexander Duyck  wrote:
>
>> On Mon, Mar 5, 2018 at 2:30 PM, Jiri Pirko  wrote:
>> > Mon, Mar 05, 2018 at 05:11:32PM CET, step...@networkplumber.org wrote:
>> >>On Mon, 5 Mar 2018 10:21:18 +0100
>> >>Jiri Pirko  wrote:
>> >>
>> >>> Sun, Mar 04, 2018 at 10:58:34PM CET, alexander.du...@gmail.com wrote:
>> >>> >On Sun, Mar 4, 2018 at 10:50 AM, Jiri Pirko  wrote:
>> >>> >> Sun, Mar 04, 2018 at 07:24:12PM CET, alexander.du...@gmail.com wrote:
>> >>> >>>On Sat, Mar 3, 2018 at 11:13 PM, Jiri Pirko  wrote:
>> >>>
>> >>> [...]
>> >>>
>> >>> >
>> >>> >>>Currently we only have agreement from Michael on taking this code, as
>> >>> >>>such we are working with virtio only for now. When the time comes that
>> >>> >>
>> >>> >> If you do duplication of netvsc in-driver bonding in virtio_net, it 
>> >>> >> will
>> >>> >> stay there forever. So what you say is: "We will do it halfway now
>> >>> >> and promise to fix it later". That later will never happen, I'm pretty
>> >>> >> sure. That is why I push for in-driver bonding shared code as a part 
>> >>> >> of
>> >>> >> this patchset.
>> >>> >
>> >>> >You want this new approach and a copy of netvsc moved into either core
>> >>> >or some module of its own. I say pick an architecture. We are looking
>> >>> >at either 2 netdevs or 3. We are not going to support both because
>> >>> >that will ultimately lead to a terrible user experience and make
>> >>> >things quite confusing.
>> >>> >
>> >>> >> + if you would be pushing first driver to do this, I would understand.
>> >>> >> But the first driver is already in. You are pushing second. This is 
>> >>> >> the
>> >>> >> time to do the sharing, unification of behaviour. Next time is too 
>> >>> >> late.
>> >>> >
>> >>> >That is great, if we want to share then lets share. But what you are
>> >>> >essentially telling us is that we need to fork this solution and
>> >>> >maintain two code paths, one for 2 netdevs, and another for 3. At that
>> >>> >point what is the point in merging them together?
>> >>>
>> >>> Of course, I vote for the same behaviour for netvsc and virtio_net. That
>> >>> is my point from the very beginning.
>> >>>
>> >>> Stephen, what do you think? Could we please make virtio_net and netvsc
>> >>> behave the same and to use a single code with well-defined checks and
>> >>> restrictions for this feature?
>> >>
>> >>Eventually, yes both could share common code routines. In reality,
>> >>the failover stuff is only a very small part of either driver so
>> >>it is not worth stretching to try and cover too much. If you look,
>> >>the failover code is just using routines that already exist for
>> >>use by bonding, teaming, etc.
>> >
>> > Yeah, we consern was also about the code that processes the netdev
>> > notifications and does auto-enslave and all related stuff.
>>
>> The concern was the driver model. If we expose 3 netdevs or 2 with the
>> VF driver present. Somehow this is turning into a "merge netvsc into
>> virtio" think and that isn't the subject that was being asked.
>>
>> Ideally we want one model for this. Either 3 netdevs or 2. The problem
>> is 2 causes issues in terms of performance and will limit features of
>> virtio, but 2 is the precedent set by netvsc. We need to figure out
>> the path forward for this. There is talk about "sharing" but it is
>> hard to make these two approaches share code when they are doing two
>> very different setups and end up presenting themselves as two very
>> different driver models.
>
> I appreciate this discussion, and it has helped a lot.
>
> Netvsc is stuck with 2 netdev model for the foreseeable future.
> We already failed once with the bonding model, and that created a lot of
> pain. The current model is working well and have convinced the major distros
> to support the two netdev model and don't want to back.
>
> Very open to optimizations and ways to smooth out the rough edges.

Thank you for clarifying this Stephen.

Okay. So with things defined such that we are doing a 2 netdev model
for netvsc, and a 3 netdev model for virtio, is it still in our
interest for us to try making a shared library between the two? In my
mind, the virtnet_bypass becomes the way we go forward for any future
solutions. I say we treat the netvsc approach as a "legacy" approach
and avoid creating any new libraries or drivers to support it, and
instead just focus on the 3 netdev approach as the way this is to be
done going forward. That way we avoid anyone else trying to implement
something like the 2 netdev solution in the future.

So getting back to the code here. Should we split the virtnet_bypass
code out into a separate module? My preference would be to let this
incubate as a part of virtio_net until either there is another user,
or it becomes big enough that it needs to be 

[virtio-dev] Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available

2018-03-05 Thread Alexander Duyck
On Mon, Mar 5, 2018 at 2:30 PM, Jiri Pirko  wrote:
> Mon, Mar 05, 2018 at 05:11:32PM CET, step...@networkplumber.org wrote:
>>On Mon, 5 Mar 2018 10:21:18 +0100
>>Jiri Pirko  wrote:
>>
>>> Sun, Mar 04, 2018 at 10:58:34PM CET, alexander.du...@gmail.com wrote:
>>> >On Sun, Mar 4, 2018 at 10:50 AM, Jiri Pirko  wrote:
>>> >> Sun, Mar 04, 2018 at 07:24:12PM CET, alexander.du...@gmail.com wrote:
>>> >>>On Sat, Mar 3, 2018 at 11:13 PM, Jiri Pirko  wrote:
>>>
>>> [...]
>>>
>>> >
>>> >>>Currently we only have agreement from Michael on taking this code, as
>>> >>>such we are working with virtio only for now. When the time comes that
>>> >>
>>> >> If you do duplication of netvsc in-driver bonding in virtio_net, it will
>>> >> stay there forever. So what you say is: "We will do it halfway now
>>> >> and promise to fix it later". That later will never happen, I'm pretty
>>> >> sure. That is why I push for in-driver bonding shared code as a part of
>>> >> this patchset.
>>> >
>>> >You want this new approach and a copy of netvsc moved into either core
>>> >or some module of its own. I say pick an architecture. We are looking
>>> >at either 2 netdevs or 3. We are not going to support both because
>>> >that will ultimately lead to a terrible user experience and make
>>> >things quite confusing.
>>> >
>>> >> + if you would be pushing first driver to do this, I would understand.
>>> >> But the first driver is already in. You are pushing second. This is the
>>> >> time to do the sharing, unification of behaviour. Next time is too late.
>>> >
>>> >That is great, if we want to share then lets share. But what you are
>>> >essentially telling us is that we need to fork this solution and
>>> >maintain two code paths, one for 2 netdevs, and another for 3. At that
>>> >point what is the point in merging them together?
>>>
>>> Of course, I vote for the same behaviour for netvsc and virtio_net. That
>>> is my point from the very beginning.
>>>
>>> Stephen, what do you think? Could we please make virtio_net and netvsc
>>> behave the same and to use a single code with well-defined checks and
>>> restrictions for this feature?
>>
>>Eventually, yes both could share common code routines. In reality,
>>the failover stuff is only a very small part of either driver so
>>it is not worth stretching to try and cover too much. If you look,
>>the failover code is just using routines that already exist for
>>use by bonding, teaming, etc.
>
> Yeah, we consern was also about the code that processes the netdev
> notifications and does auto-enslave and all related stuff.

The concern was the driver model. If we expose 3 netdevs or 2 with the
VF driver present. Somehow this is turning into a "merge netvsc into
virtio" think and that isn't the subject that was being asked.

Ideally we want one model for this. Either 3 netdevs or 2. The problem
is 2 causes issues in terms of performance and will limit features of
virtio, but 2 is the precedent set by netvsc. We need to figure out
the path forward for this. There is talk about "sharing" but it is
hard to make these two approaches share code when they are doing two
very different setups and end up presenting themselves as two very
different driver models.

>>
>>There will always be two drivers, the ring buffers and buffering
>>are very different between vmbus and virtio. It would help to address
>>some of the awkward stuff like queue selection and offload handling
>>in a common way.
>
> Agreed.

There are going to end up being three drivers by the time we are done.
We will end up with netvsc, virtio, and some shared block of
functionality that is used between the two of them. At least that is
the assumption if the two are going to share code. I don't know if
everyone will want to take on the extra overhead for the code shared
between these two drivers being a part of the core net code.

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available

2018-03-04 Thread Alexander Duyck
On Sun, Mar 4, 2018 at 10:50 AM, Jiri Pirko  wrote:
> Sun, Mar 04, 2018 at 07:24:12PM CET, alexander.du...@gmail.com wrote:
>>On Sat, Mar 3, 2018 at 11:13 PM, Jiri Pirko  wrote:
>>> Sun, Mar 04, 2018 at 01:26:53AM CET, alexander.du...@gmail.com wrote:
On Sat, Mar 3, 2018 at 1:25 PM, Jiri Pirko  wrote:
> Sat, Mar 03, 2018 at 07:04:57PM CET, alexander.du...@gmail.com wrote:
>>On Sat, Mar 3, 2018 at 3:31 AM, Jiri Pirko  wrote:
>>> Fri, Mar 02, 2018 at 08:42:47PM CET, m...@redhat.com wrote:
On Fri, Mar 02, 2018 at 05:20:17PM +0100, Jiri Pirko wrote:
> >Yeah, this code essentially calls out the "shareable" code with a
> >comment at the start and end of the section what defines the
> >virtio_bypass functionality. It would just be a matter of mostly
> >cutting and pasting to put it into a separate driver module.
>
> Please put it there and unite the use of it with netvsc.

Surely, adding this to other drivers (e.g. might this be handy for xen
too?) can be left for a separate patchset. Let's get one device merged
first.
>>>
>>> Why? Let's do the generic infra alongside with the driver. I see no good
>>> reason to rush into merging driver and only later, if ever, to convert
>>> it to generic solution. On contrary. That would lead into multiple
>>> approaches and different behavious in multiple drivers. That is plain
>>> wrong.
>>
>>If nothing else it doesn't hurt to do this in one driver in a generic
>>way, and once it has been proven to address all the needs of that one
>>driver we can then start moving other drivers to it. The current
>>solution is quite generic, that was my contribution to this patch set
>>as I didn't like how invasive it was being to virtio and thought it
>>would be best to keep this as minimally invasive as possible.
>>
>>My preference would be to give this a release or two in virtio to
>>mature before we start pushing it onto other drivers. It shouldn't
>>take much to cut/paste this into a new driver file once we decide it
>>is time to start extending it out to other drivers.
>
> I'm not talking about cut/paste and in fact that is what I'm worried
> about. I'm talking about common code in net/core/ or somewhere that
> would take care of this in-driver bonding. Each driver, like virtio_net,
> netvsc would just register some ops to it and the core would do all
> logic. I believe it is essential take this approach from the start.

Sorry, I didn't mean cut/paste into another driver, I meant to make it
a driver of its own. My thought was to eventually create a shared/core
driver module that is then used by the other drivers.

My concern right now is that Stephen has indicated he doesn't want
this approach taken with netvsc, and most of the community doesn't
>>>
>>> IIUC, he only does not like the extra netdev. Is there anything else?
>>
>>Nope that is pretty much it. It doesn't seem like a big deal for
>>virtio, but for netvsc it is significant since they don't have any
>>"backup" bit feature differentiation, so they would likely be stuck
>>with 2 netdevs even in their basic setup.
>
> Okay. If that is a strict "no-go" for netvsc, this should be
> just a flag passed down to the in-driver bond code.

Are you serious? We might as well just do a per-driver bond then if
that is what you want. Once you go back to the "2 netdev" model for
this the bond becomes tightly woven into the driver and becomes a
separate beast entirely. At that point sharing kind of goes out the
window since you have to be tightly coupled into all of the per-driver
ops. I would argue there is no way to do the "2 netdev" model
generically. It is kind of inherent to the "2 netdev" model in the
first place since you can't have a third driver pop up so now
everything is pulled into the paravirtual interface unless you invert
everything and require the netvsc driver to provide the driver with a
set of function pointers allowing it to call back into it. In addition
you suddenly have to deal with all the qdisc and Tx queue locking
mess. So the 3 netdev model let the driver be lockless and run with no
queue disc. Are you telling us you expect our solution to run in both
modes or are you pushing the qdisc overhead and Tx queue locking into
the 3 netdev model?

What it ultimately comes down to is how do you create a new netdev
without exposing a new netdev? In the 3 netdev model this all makes
sense as we can leave the paravirtual interface in tact. Now you are
telling us that based on a flag we either have to embed ourselves into
the paravirtual interface without exposing our operations, or we have
to embed the paravirtual interface into our device without letting it
be visible. The sheer overhead of that will end up more then doubling
the 

[virtio-dev] Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available

2018-03-04 Thread Samudrala, Sridhar



On 3/4/2018 10:50 AM, Jiri Pirko wrote:

Sun, Mar 04, 2018 at 07:24:12PM CET, alexander.du...@gmail.com wrote:

On Sat, Mar 3, 2018 at 11:13 PM, Jiri Pirko  wrote:

Sun, Mar 04, 2018 at 01:26:53AM CET, alexander.du...@gmail.com wrote:

On Sat, Mar 3, 2018 at 1:25 PM, Jiri Pirko  wrote:

Sat, Mar 03, 2018 at 07:04:57PM CET, alexander.du...@gmail.com wrote:

On Sat, Mar 3, 2018 at 3:31 AM, Jiri Pirko  wrote:

Fri, Mar 02, 2018 at 08:42:47PM CET, m...@redhat.com wrote:

On Fri, Mar 02, 2018 at 05:20:17PM +0100, Jiri Pirko wrote:

Yeah, this code essentially calls out the "shareable" code with a
comment at the start and end of the section what defines the
virtio_bypass functionality. It would just be a matter of mostly
cutting and pasting to put it into a separate driver module.

Please put it there and unite the use of it with netvsc.

Surely, adding this to other drivers (e.g. might this be handy for xen
too?) can be left for a separate patchset. Let's get one device merged
first.

Why? Let's do the generic infra alongside with the driver. I see no good
reason to rush into merging driver and only later, if ever, to convert
it to generic solution. On contrary. That would lead into multiple
approaches and different behavious in multiple drivers. That is plain
wrong.

If nothing else it doesn't hurt to do this in one driver in a generic
way, and once it has been proven to address all the needs of that one
driver we can then start moving other drivers to it. The current
solution is quite generic, that was my contribution to this patch set
as I didn't like how invasive it was being to virtio and thought it
would be best to keep this as minimally invasive as possible.

My preference would be to give this a release or two in virtio to
mature before we start pushing it onto other drivers. It shouldn't
take much to cut/paste this into a new driver file once we decide it
is time to start extending it out to other drivers.

I'm not talking about cut/paste and in fact that is what I'm worried
about. I'm talking about common code in net/core/ or somewhere that
would take care of this in-driver bonding. Each driver, like virtio_net,
netvsc would just register some ops to it and the core would do all
logic. I believe it is essential take this approach from the start.

Sorry, I didn't mean cut/paste into another driver, I meant to make it
a driver of its own. My thought was to eventually create a shared/core
driver module that is then used by the other drivers.

My concern right now is that Stephen has indicated he doesn't want
this approach taken with netvsc, and most of the community doesn't

IIUC, he only does not like the extra netdev. Is there anything else?

Nope that is pretty much it. It doesn't seem like a big deal for
virtio, but for netvsc it is significant since they don't have any
"backup" bit feature differentiation, so they would likely be stuck
with 2 netdevs even in their basic setup.

Okay. If that is a strict "no-go" for netvsc, this should be
just a flag passed down to the in-driver bond code.


This results in a 3 driver model (virtio/netvsc, vf & bypass) with 2 netdevs
created when bypass is based on netvsc and 3 netdevs created when the bypass
is based on virtio_net.

Unless we agree on a common netdev model between netvsc and virtio_net,
i am not sure if it is useful to commonize the code into a separate driver.

-Sridhar

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available

2018-03-04 Thread Alexander Duyck
On Sat, Mar 3, 2018 at 11:13 PM, Jiri Pirko  wrote:
> Sun, Mar 04, 2018 at 01:26:53AM CET, alexander.du...@gmail.com wrote:
>>On Sat, Mar 3, 2018 at 1:25 PM, Jiri Pirko  wrote:
>>> Sat, Mar 03, 2018 at 07:04:57PM CET, alexander.du...@gmail.com wrote:
On Sat, Mar 3, 2018 at 3:31 AM, Jiri Pirko  wrote:
> Fri, Mar 02, 2018 at 08:42:47PM CET, m...@redhat.com wrote:
>>On Fri, Mar 02, 2018 at 05:20:17PM +0100, Jiri Pirko wrote:
>>> >Yeah, this code essentially calls out the "shareable" code with a
>>> >comment at the start and end of the section what defines the
>>> >virtio_bypass functionality. It would just be a matter of mostly
>>> >cutting and pasting to put it into a separate driver module.
>>>
>>> Please put it there and unite the use of it with netvsc.
>>
>>Surely, adding this to other drivers (e.g. might this be handy for xen
>>too?) can be left for a separate patchset. Let's get one device merged
>>first.
>
> Why? Let's do the generic infra alongside with the driver. I see no good
> reason to rush into merging driver and only later, if ever, to convert
> it to generic solution. On contrary. That would lead into multiple
> approaches and different behavious in multiple drivers. That is plain
> wrong.

If nothing else it doesn't hurt to do this in one driver in a generic
way, and once it has been proven to address all the needs of that one
driver we can then start moving other drivers to it. The current
solution is quite generic, that was my contribution to this patch set
as I didn't like how invasive it was being to virtio and thought it
would be best to keep this as minimally invasive as possible.

My preference would be to give this a release or two in virtio to
mature before we start pushing it onto other drivers. It shouldn't
take much to cut/paste this into a new driver file once we decide it
is time to start extending it out to other drivers.
>>>
>>> I'm not talking about cut/paste and in fact that is what I'm worried
>>> about. I'm talking about common code in net/core/ or somewhere that
>>> would take care of this in-driver bonding. Each driver, like virtio_net,
>>> netvsc would just register some ops to it and the core would do all
>>> logic. I believe it is essential take this approach from the start.
>>
>>Sorry, I didn't mean cut/paste into another driver, I meant to make it
>>a driver of its own. My thought was to eventually create a shared/core
>>driver module that is then used by the other drivers.
>>
>>My concern right now is that Stephen has indicated he doesn't want
>>this approach taken with netvsc, and most of the community doesn't
>
> IIUC, he only does not like the extra netdev. Is there anything else?

Nope that is pretty much it. It doesn't seem like a big deal for
virtio, but for netvsc it is significant since they don't have any
"backup" bit feature differentiation, so they would likely be stuck
with 2 netdevs even in their basic setup.

>>want the netvsc approach applied to virtio. Until that impasse can be
>>resolved there isn't much value in trying to split this up so it is
>>available to other drivers. In addition I would imagine it would make
>>it a pain for others to back-port into distros since it would break
>>legacy netvsc driver behavior. Patches are always welcome. Once this
>>is in you are free to try fighting to get this made into a generic
>>module and applied to both drivers, but we have already spent close to
>>3 months on this and it seems like there has been significantly more
>
> Alex, time is never a good argument for poor design and shortcuts.

I'm not saying we should go with a poor design due to time. But
expecting us to implement something where the maintainer of said
driver has not agreed to is pointless, and I don't see it as a design
shortcut to implement something in one driver with the expectation
that we will then make it core later once it has proven itself and has
use elsewhere. In the meantime I would imagine it also makes it easier
for things like backports and such for us to do it this way since we
are only impacting one driver.

You are telling us to do something that not everyone has agreed to.
Currently we only have agreement from Michael on taking this code, as
such we are working with virtio only for now. When the time comes that
we can get other maintainers, specifically Stephen, to agree to it
then we can cut/paste this code into a core file or into a module of
its own. Alternatively I suppose we could take this up to Dave if you
can't get Stephen to agree. If you can get Dave to say we need to
change netvsc then we will go ahead with it, but generally I prefer to
respect when the maintainer of something says they don't want us
modifying their code in some way.

-
To unsubscribe, e-mail: 

[virtio-dev] Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available

2018-03-03 Thread Michael S. Tsirkin
On Fri, Mar 02, 2018 at 03:56:31PM -0800, Siwei Liu wrote:
> On Fri, Mar 2, 2018 at 1:36 PM, Michael S. Tsirkin  wrote:
> > On Fri, Mar 02, 2018 at 01:11:56PM -0800, Siwei Liu wrote:
> >> On Thu, Mar 1, 2018 at 12:08 PM, Sridhar Samudrala
> >>  wrote:
> >> > This patch enables virtio_net to switch over to a VF datapath when a VF
> >> > netdev is present with the same MAC address. It allows live migration
> >> > of a VM with a direct attached VF without the need to setup a bond/team
> >> > between a VF and virtio net device in the guest.
> >> >
> >> > The hypervisor needs to enable only one datapath at any time so that
> >> > packets don't get looped back to the VM over the other datapath. When a 
> >> > VF
> >> > is plugged, the virtio datapath link state can be marked as down. The
> >> > hypervisor needs to unplug the VF device from the guest on the source 
> >> > host
> >> > and reset the MAC filter of the VF to initiate failover of datapath to
> >> > virtio before starting the migration. After the migration is completed,
> >> > the destination hypervisor sets the MAC filter on the VF and plugs it 
> >> > back
> >> > to the guest to switch over to VF datapath.
> >> >
> >> > When BACKUP feature is enabled, an additional netdev(bypass netdev) is
> >> > created that acts as a master device and tracks the state of the 2 lower
> >> > netdevs. The original virtio_net netdev is marked as 'backup' netdev and 
> >> > a
> >> > passthru device with the same MAC is registered as 'active' netdev.
> >> >
> >> > This patch is based on the discussion initiated by Jesse on this thread.
> >> > https://marc.info/?l=linux-virtualization=151189725224231=2
> >> >
> >> > Signed-off-by: Sridhar Samudrala 
> >> > Signed-off-by: Alexander Duyck 
> >> > Reviewed-by: Jesse Brandeburg 
> >> > ---
> >> >  drivers/net/virtio_net.c | 683 
> >> > ++-
> >> >  1 file changed, 682 insertions(+), 1 deletion(-)
> >> >
> >> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> >> > index bcd13fe906ca..f2860d86c952 100644
> >> > --- a/drivers/net/virtio_net.c
> >> > +++ b/drivers/net/virtio_net.c
> >> > @@ -30,6 +30,8 @@
> >> >  #include 
> >> >  #include 
> >> >  #include 
> >> > +#include 
> >> > +#include 
> >> >  #include 
> >> >  #include 
> >> >
> >> > @@ -206,6 +208,9 @@ struct virtnet_info {
> >> > u32 speed;
> >> >
> >> > unsigned long guest_offloads;
> >> > +
> >> > +   /* upper netdev created when BACKUP feature enabled */
> >> > +   struct net_device *bypass_netdev;
> >> >  };
> >> >
> >> >  struct padded_vnet_hdr {
> >> > @@ -2236,6 +2241,22 @@ static int virtnet_xdp(struct net_device *dev, 
> >> > struct netdev_bpf *xdp)
> >> > }
> >> >  }
> >> >
> >> > +static int virtnet_get_phys_port_name(struct net_device *dev, char *buf,
> >> > + size_t len)
> >> > +{
> >> > +   struct virtnet_info *vi = netdev_priv(dev);
> >> > +   int ret;
> >> > +
> >> > +   if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_BACKUP))
> >> > +   return -EOPNOTSUPP;
> >> > +
> >> > +   ret = snprintf(buf, len, "_bkup");
> >> > +   if (ret >= len)
> >> > +   return -EOPNOTSUPP;
> >> > +
> >> > +   return 0;
> >> > +}
> >> > +
> >>
> >> What if the systemd/udevd is not new enough to enforce the
> >> n naming? Would virtio_bypass get a different name
> >> than the original virtio_net?
> >
> > You mean people using ethX names? Any hardware config change breaks
> > these, I don't think that can be helped.
> 
> I don't like the way to rely on .ndo_get_phys_port_name - it's fragile
> and it does not completely solve the problem it tries to address.
> Imagine what can end up with if getting an old udevd, or users already
> have exsiting explicit udev rules around phys_port_name. It does not
> give you the an ack in saying "yes, I know you're the bypass and
> you're the backup, please continue and I will give you both correct
> names", or an unacknowlegment saying "no, I don't know what these
> extra interfaces are, please go back and leave the VF device alone".
> We need new udev API for both feature negotiation and naming, or may
> even completely hide the lower interfaces.

Go ahead and try to make this happen, but I won't hold my
breath.

> >
> >> Should we detect this earlier and fall
> >> back to legacy mode without creating the bypass netdev and ensalving
> >> the VF?
> >
> > I don't think we can do this with existing kernel/userspace APIs.
> 
> That's why I ever said to make udev aware of this new type of combined
> device instead of doing hacks here and there around.
> 
> Regards,
> -Siwei

We can add new interfaces on top but the main purpose here is to
make old userspace do new tricks.

> >
> > --
> > MST


Re: [virtio-dev] Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available

2018-03-03 Thread Michael S. Tsirkin
On Fri, Mar 02, 2018 at 02:26:48PM -0800, Siwei Liu wrote:
> On Fri, Mar 2, 2018 at 1:31 PM, Michael S. Tsirkin  wrote:
> > On Fri, Mar 02, 2018 at 12:44:56PM -0800, Siwei Liu wrote:
> >> On Fri, Mar 2, 2018 at 12:10 PM, Michael S. Tsirkin  
> >> wrote:
> >> > On Fri, Mar 02, 2018 at 11:52:27AM -0800, Samudrala, Sridhar wrote:
> >> >>
> >> >>
> >> >> On 3/2/2018 11:41 AM, Michael S. Tsirkin wrote:
> >> >> > On Fri, Mar 02, 2018 at 07:26:25AM -0800, Alexander Duyck wrote:
> >> >> > > The design limits things to a 1:1 relationship since we just have 
> >> >> > > the
> >> >> > > child and backup pointers, but I don't think I am seeing exception
> >> >> > > handling to prevent us from overwriting the child pointers so there
> >> >> > > may be a leak there.
> >> >> > >
> >> >> > > Thanks.
> >> >> > >
> >> >> > > - Alex
> >> >> > In fact maintaining a list in that case would be nicer, and
> >> >> > just use an arbitrary one.
> >> >> > E.g. one can see how a user wanting to swap device 1 for device 2
> >> >> > might first add device 2 with same MAC then drop device 1.
> >> >>
> >> >> It should be possible to swap VF1 with VF2 by
> >> >> 1.- enabling virtio link
> >> >> 2.- unplugging VF1
> >> >> 3.- plugging VF2
> >> >> 4.- disabling virtio link
> >> >>
> >> >
> >> > True, but it isn't hard to avoid breakage if user
> >> > swapped steps 2 and 3. No need to make it more
> >> > fragile that it has to be.
> >>
> >> The migration case, VF2 is associated with another PF on another
> >> machine (destination), I wonder how it is possible.
> >
> > E.g. you want to remove the PF so you unplug the VF
> > then add another VF of the same PF.
> >
> >> Even with local plugging of VF2 on the same PF, the MAC address
> >> requirement (VF1's == VF2's) would fail the MAC address assignment on
> >> VF2.
> >>
> >> -Siwei
> >
> > Why would it fail? These are separate cards.
> 
> OK. I realized that you may talk about assigning a VF on a diffferent
> PF (VF1 on PF1 while VF2 on PF2). And we might assign a pass-through
> device rather than a VF. Yes, it's indeed possible that may happen but
> I take it as a further step down (another patch maybe) as it would
> involve changes to notify the network with gratuituious ARP and/or
> unsolicited ND advertisement of the MAC address association with the
> new port.
> 
> -Siwei

Interesting point. I guess that's a limitation in the curent patch then:
virtio and PT device must be connected to the same physical NIC.
Worth documenting.

> >
> >> >
> >> > --
> >> > MST
> >> >
> >> > -
> >> > To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
> >> > For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
> >> >

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available

2018-03-03 Thread Alexander Duyck
On Sat, Mar 3, 2018 at 1:25 PM, Jiri Pirko  wrote:
> Sat, Mar 03, 2018 at 07:04:57PM CET, alexander.du...@gmail.com wrote:
>>On Sat, Mar 3, 2018 at 3:31 AM, Jiri Pirko  wrote:
>>> Fri, Mar 02, 2018 at 08:42:47PM CET, m...@redhat.com wrote:
On Fri, Mar 02, 2018 at 05:20:17PM +0100, Jiri Pirko wrote:
> >Yeah, this code essentially calls out the "shareable" code with a
> >comment at the start and end of the section what defines the
> >virtio_bypass functionality. It would just be a matter of mostly
> >cutting and pasting to put it into a separate driver module.
>
> Please put it there and unite the use of it with netvsc.

Surely, adding this to other drivers (e.g. might this be handy for xen
too?) can be left for a separate patchset. Let's get one device merged
first.
>>>
>>> Why? Let's do the generic infra alongside with the driver. I see no good
>>> reason to rush into merging driver and only later, if ever, to convert
>>> it to generic solution. On contrary. That would lead into multiple
>>> approaches and different behavious in multiple drivers. That is plain
>>> wrong.
>>
>>If nothing else it doesn't hurt to do this in one driver in a generic
>>way, and once it has been proven to address all the needs of that one
>>driver we can then start moving other drivers to it. The current
>>solution is quite generic, that was my contribution to this patch set
>>as I didn't like how invasive it was being to virtio and thought it
>>would be best to keep this as minimally invasive as possible.
>>
>>My preference would be to give this a release or two in virtio to
>>mature before we start pushing it onto other drivers. It shouldn't
>>take much to cut/paste this into a new driver file once we decide it
>>is time to start extending it out to other drivers.
>
> I'm not talking about cut/paste and in fact that is what I'm worried
> about. I'm talking about common code in net/core/ or somewhere that
> would take care of this in-driver bonding. Each driver, like virtio_net,
> netvsc would just register some ops to it and the core would do all
> logic. I believe it is essential take this approach from the start.

Sorry, I didn't mean cut/paste into another driver, I meant to make it
a driver of its own. My thought was to eventually create a shared/core
driver module that is then used by the other drivers.

My concern right now is that Stephen has indicated he doesn't want
this approach taken with netvsc, and most of the community doesn't
want the netvsc approach applied to virtio. Until that impasse can be
resolved there isn't much value in trying to split this up so it is
available to other drivers. In addition I would imagine it would make
it a pain for others to back-port into distros since it would break
legacy netvsc driver behavior. Patches are always welcome. Once this
is in you are free to try fighting to get this made into a generic
module and applied to both drivers, but we have already spent close to
3 months on this and it seems like there has been significantly more
time spent arguing over the number of interfaces and/or drivers than
spent writing/reviewing actual code.

- Alex

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available

2018-03-03 Thread Alexander Duyck
On Sat, Mar 3, 2018 at 3:31 AM, Jiri Pirko  wrote:
> Fri, Mar 02, 2018 at 08:42:47PM CET, m...@redhat.com wrote:
>>On Fri, Mar 02, 2018 at 05:20:17PM +0100, Jiri Pirko wrote:
>>> >Yeah, this code essentially calls out the "shareable" code with a
>>> >comment at the start and end of the section what defines the
>>> >virtio_bypass functionality. It would just be a matter of mostly
>>> >cutting and pasting to put it into a separate driver module.
>>>
>>> Please put it there and unite the use of it with netvsc.
>>
>>Surely, adding this to other drivers (e.g. might this be handy for xen
>>too?) can be left for a separate patchset. Let's get one device merged
>>first.
>
> Why? Let's do the generic infra alongside with the driver. I see no good
> reason to rush into merging driver and only later, if ever, to convert
> it to generic solution. On contrary. That would lead into multiple
> approaches and different behavious in multiple drivers. That is plain
> wrong.

If nothing else it doesn't hurt to do this in one driver in a generic
way, and once it has been proven to address all the needs of that one
driver we can then start moving other drivers to it. The current
solution is quite generic, that was my contribution to this patch set
as I didn't like how invasive it was being to virtio and thought it
would be best to keep this as minimally invasive as possible.

My preference would be to give this a release or two in virtio to
mature before we start pushing it onto other drivers. It shouldn't
take much to cut/paste this into a new driver file once we decide it
is time to start extending it out to other drivers.

- Alex

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



Re: [virtio-dev] Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available

2018-03-02 Thread Siwei Liu
On Fri, Mar 2, 2018 at 3:12 PM, Samudrala, Sridhar
 wrote:
> On 3/2/2018 1:11 PM, Siwei Liu wrote:
>>
>> On Thu, Mar 1, 2018 at 12:08 PM, Sridhar Samudrala
>>  wrote:
>>>
>>> This patch enables virtio_net to switch over to a VF datapath when a VF
>>> netdev is present with the same MAC address. It allows live migration
>>> of a VM with a direct attached VF without the need to setup a bond/team
>>> between a VF and virtio net device in the guest.
>>>
>>> The hypervisor needs to enable only one datapath at any time so that
>>> packets don't get looped back to the VM over the other datapath. When a
>>> VF
>>> is plugged, the virtio datapath link state can be marked as down. The
>>> hypervisor needs to unplug the VF device from the guest on the source
>>> host
>>> and reset the MAC filter of the VF to initiate failover of datapath to
>>> virtio before starting the migration. After the migration is completed,
>>> the destination hypervisor sets the MAC filter on the VF and plugs it
>>> back
>>> to the guest to switch over to VF datapath.
>>>
>>> When BACKUP feature is enabled, an additional netdev(bypass netdev) is
>>> created that acts as a master device and tracks the state of the 2 lower
>>> netdevs. The original virtio_net netdev is marked as 'backup' netdev and
>>> a
>>> passthru device with the same MAC is registered as 'active' netdev.
>>>
>>> This patch is based on the discussion initiated by Jesse on this thread.
>>> https://marc.info/?l=linux-virtualization=151189725224231=2
>>>
>>> Signed-off-by: Sridhar Samudrala 
>>> Signed-off-by: Alexander Duyck 
>>> Reviewed-by: Jesse Brandeburg 
>>> ---
>>>   drivers/net/virtio_net.c | 683
>>> ++-
>>>   1 file changed, 682 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>> index bcd13fe906ca..f2860d86c952 100644
>>> --- a/drivers/net/virtio_net.c
>>> +++ b/drivers/net/virtio_net.c
>>> @@ -30,6 +30,8 @@
>>>   #include 
>>>   #include 
>>>   #include 
>>> +#include 
>>> +#include 
>>>   #include 
>>>   #include 
>>>
>>> @@ -206,6 +208,9 @@ struct virtnet_info {
>>>  u32 speed;
>>>
>>>  unsigned long guest_offloads;
>>> +
>>> +   /* upper netdev created when BACKUP feature enabled */
>>> +   struct net_device *bypass_netdev;
>>>   };
>>>
>>>   struct padded_vnet_hdr {
>>> @@ -2236,6 +2241,22 @@ static int virtnet_xdp(struct net_device *dev,
>>> struct netdev_bpf *xdp)
>>>  }
>>>   }
>>>
>>> +static int virtnet_get_phys_port_name(struct net_device *dev, char *buf,
>>> + size_t len)
>>> +{
>>> +   struct virtnet_info *vi = netdev_priv(dev);
>>> +   int ret;
>>> +
>>> +   if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_BACKUP))
>>> +   return -EOPNOTSUPP;
>>> +
>>> +   ret = snprintf(buf, len, "_bkup");
>>> +   if (ret >= len)
>>> +   return -EOPNOTSUPP;
>>> +
>>> +   return 0;
>>> +}
>>> +
>>
>> What if the systemd/udevd is not new enough to enforce the
>> n naming? Would virtio_bypass get a different name
>> than the original virtio_net? Should we detect this earlier and fall
>> back to legacy mode without creating the bypass netdev and ensalving
>> the VF?
>
>
> If udev doesn't support renaming of the devices,  then the upper bypass
> device
> should get the original name and the lower virtio netdev will get the next
> name.

If you got two virtio-net's (say e.g. eth0 and eth1) before the
update, the virtio-bypass interface on the first virtio gets eth0 and
the backup netdev would get eth1? Then the IP address originally on
eth1 gets configurd on the backup interface?

> Hopefully the distros updating the kernel will also move to the new
> systemd/udev.

This is not reliable. I'd opt for a new udev API for this, and fall
back to what it was if don't see fit.

>
>
>
>>
>>>   static const struct net_device_ops virtnet_netdev = {
>>>  .ndo_open= virtnet_open,
>>>  .ndo_stop= virtnet_close,
>>> @@ -2253,6 +2274,7 @@ static const struct net_device_ops virtnet_netdev =
>>> {
>>>  .ndo_xdp_xmit   = virtnet_xdp_xmit,
>>>  .ndo_xdp_flush  = virtnet_xdp_flush,
>>>  .ndo_features_check = passthru_features_check,
>>> +   .ndo_get_phys_port_name = virtnet_get_phys_port_name,
>>>   };
>>>
>>>   static void virtnet_config_changed_work(struct work_struct *work)
>>> @@ -2647,6 +2669,653 @@ static int virtnet_validate(struct virtio_device
>>> *vdev)
>>>  return 0;
>>>   }
>>>
>>> +/* START of functions supporting VIRTIO_NET_F_BACKUP feature.
>>> + * When BACKUP feature is enabled, an additional netdev(bypass netdev)
>>> + * is created that acts as a master device and tracks the state of the
>>> + * 2 lower netdevs. The original 

[virtio-dev] Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available

2018-03-02 Thread Siwei Liu
On Fri, Mar 2, 2018 at 1:36 PM, Michael S. Tsirkin  wrote:
> On Fri, Mar 02, 2018 at 01:11:56PM -0800, Siwei Liu wrote:
>> On Thu, Mar 1, 2018 at 12:08 PM, Sridhar Samudrala
>>  wrote:
>> > This patch enables virtio_net to switch over to a VF datapath when a VF
>> > netdev is present with the same MAC address. It allows live migration
>> > of a VM with a direct attached VF without the need to setup a bond/team
>> > between a VF and virtio net device in the guest.
>> >
>> > The hypervisor needs to enable only one datapath at any time so that
>> > packets don't get looped back to the VM over the other datapath. When a VF
>> > is plugged, the virtio datapath link state can be marked as down. The
>> > hypervisor needs to unplug the VF device from the guest on the source host
>> > and reset the MAC filter of the VF to initiate failover of datapath to
>> > virtio before starting the migration. After the migration is completed,
>> > the destination hypervisor sets the MAC filter on the VF and plugs it back
>> > to the guest to switch over to VF datapath.
>> >
>> > When BACKUP feature is enabled, an additional netdev(bypass netdev) is
>> > created that acts as a master device and tracks the state of the 2 lower
>> > netdevs. The original virtio_net netdev is marked as 'backup' netdev and a
>> > passthru device with the same MAC is registered as 'active' netdev.
>> >
>> > This patch is based on the discussion initiated by Jesse on this thread.
>> > https://marc.info/?l=linux-virtualization=151189725224231=2
>> >
>> > Signed-off-by: Sridhar Samudrala 
>> > Signed-off-by: Alexander Duyck 
>> > Reviewed-by: Jesse Brandeburg 
>> > ---
>> >  drivers/net/virtio_net.c | 683 
>> > ++-
>> >  1 file changed, 682 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> > index bcd13fe906ca..f2860d86c952 100644
>> > --- a/drivers/net/virtio_net.c
>> > +++ b/drivers/net/virtio_net.c
>> > @@ -30,6 +30,8 @@
>> >  #include 
>> >  #include 
>> >  #include 
>> > +#include 
>> > +#include 
>> >  #include 
>> >  #include 
>> >
>> > @@ -206,6 +208,9 @@ struct virtnet_info {
>> > u32 speed;
>> >
>> > unsigned long guest_offloads;
>> > +
>> > +   /* upper netdev created when BACKUP feature enabled */
>> > +   struct net_device *bypass_netdev;
>> >  };
>> >
>> >  struct padded_vnet_hdr {
>> > @@ -2236,6 +2241,22 @@ static int virtnet_xdp(struct net_device *dev, 
>> > struct netdev_bpf *xdp)
>> > }
>> >  }
>> >
>> > +static int virtnet_get_phys_port_name(struct net_device *dev, char *buf,
>> > + size_t len)
>> > +{
>> > +   struct virtnet_info *vi = netdev_priv(dev);
>> > +   int ret;
>> > +
>> > +   if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_BACKUP))
>> > +   return -EOPNOTSUPP;
>> > +
>> > +   ret = snprintf(buf, len, "_bkup");
>> > +   if (ret >= len)
>> > +   return -EOPNOTSUPP;
>> > +
>> > +   return 0;
>> > +}
>> > +
>>
>> What if the systemd/udevd is not new enough to enforce the
>> n naming? Would virtio_bypass get a different name
>> than the original virtio_net?
>
> You mean people using ethX names? Any hardware config change breaks
> these, I don't think that can be helped.

I don't like the way to rely on .ndo_get_phys_port_name - it's fragile
and it does not completely solve the problem it tries to address.
Imagine what can end up with if getting an old udevd, or users already
have exsiting explicit udev rules around phys_port_name. It does not
give you the an ack in saying "yes, I know you're the bypass and
you're the backup, please continue and I will give you both correct
names", or an unacknowlegment saying "no, I don't know what these
extra interfaces are, please go back and leave the VF device alone".
We need new udev API for both feature negotiation and naming, or may
even completely hide the lower interfaces.

>
>> Should we detect this earlier and fall
>> back to legacy mode without creating the bypass netdev and ensalving
>> the VF?
>
> I don't think we can do this with existing kernel/userspace APIs.

That's why I ever said to make udev aware of this new type of combined
device instead of doing hacks here and there around.

Regards,
-Siwei

>
> --
> MST

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



Re: [virtio-dev] Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available

2018-03-02 Thread Samudrala, Sridhar

On 3/2/2018 1:11 PM, Siwei Liu wrote:

On Thu, Mar 1, 2018 at 12:08 PM, Sridhar Samudrala
 wrote:

This patch enables virtio_net to switch over to a VF datapath when a VF
netdev is present with the same MAC address. It allows live migration
of a VM with a direct attached VF without the need to setup a bond/team
between a VF and virtio net device in the guest.

The hypervisor needs to enable only one datapath at any time so that
packets don't get looped back to the VM over the other datapath. When a VF
is plugged, the virtio datapath link state can be marked as down. The
hypervisor needs to unplug the VF device from the guest on the source host
and reset the MAC filter of the VF to initiate failover of datapath to
virtio before starting the migration. After the migration is completed,
the destination hypervisor sets the MAC filter on the VF and plugs it back
to the guest to switch over to VF datapath.

When BACKUP feature is enabled, an additional netdev(bypass netdev) is
created that acts as a master device and tracks the state of the 2 lower
netdevs. The original virtio_net netdev is marked as 'backup' netdev and a
passthru device with the same MAC is registered as 'active' netdev.

This patch is based on the discussion initiated by Jesse on this thread.
https://marc.info/?l=linux-virtualization=151189725224231=2

Signed-off-by: Sridhar Samudrala 
Signed-off-by: Alexander Duyck 
Reviewed-by: Jesse Brandeburg 
---
  drivers/net/virtio_net.c | 683 ++-
  1 file changed, 682 insertions(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index bcd13fe906ca..f2860d86c952 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -30,6 +30,8 @@
  #include 
  #include 
  #include 
+#include 
+#include 
  #include 
  #include 

@@ -206,6 +208,9 @@ struct virtnet_info {
 u32 speed;

 unsigned long guest_offloads;
+
+   /* upper netdev created when BACKUP feature enabled */
+   struct net_device *bypass_netdev;
  };

  struct padded_vnet_hdr {
@@ -2236,6 +2241,22 @@ static int virtnet_xdp(struct net_device *dev, struct 
netdev_bpf *xdp)
 }
  }

+static int virtnet_get_phys_port_name(struct net_device *dev, char *buf,
+ size_t len)
+{
+   struct virtnet_info *vi = netdev_priv(dev);
+   int ret;
+
+   if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_BACKUP))
+   return -EOPNOTSUPP;
+
+   ret = snprintf(buf, len, "_bkup");
+   if (ret >= len)
+   return -EOPNOTSUPP;
+
+   return 0;
+}
+

What if the systemd/udevd is not new enough to enforce the
n naming? Would virtio_bypass get a different name
than the original virtio_net? Should we detect this earlier and fall
back to legacy mode without creating the bypass netdev and ensalving
the VF?


If udev doesn't support renaming of the devices,  then the upper bypass device
should get the original name and the lower virtio netdev will get the next name.
Hopefully the distros updating the kernel will also move to the new 
systemd/udev.





  static const struct net_device_ops virtnet_netdev = {
 .ndo_open= virtnet_open,
 .ndo_stop= virtnet_close,
@@ -2253,6 +2274,7 @@ static const struct net_device_ops virtnet_netdev = {
 .ndo_xdp_xmit   = virtnet_xdp_xmit,
 .ndo_xdp_flush  = virtnet_xdp_flush,
 .ndo_features_check = passthru_features_check,
+   .ndo_get_phys_port_name = virtnet_get_phys_port_name,
  };

  static void virtnet_config_changed_work(struct work_struct *work)
@@ -2647,6 +2669,653 @@ static int virtnet_validate(struct virtio_device *vdev)
 return 0;
  }

+/* START of functions supporting VIRTIO_NET_F_BACKUP feature.
+ * When BACKUP feature is enabled, an additional netdev(bypass netdev)
+ * is created that acts as a master device and tracks the state of the
+ * 2 lower netdevs. The original virtio_net netdev is registered as
+ * 'backup' netdev and a passthru device with the same MAC is registered
+ * as 'active' netdev.
+ */
+
+/* bypass state maintained when BACKUP feature is enabled */
+struct virtnet_bypass_info {
+   /* passthru netdev with same MAC */
+   struct net_device __rcu *active_netdev;
+
+   /* virtio_net netdev */
+   struct net_device __rcu *backup_netdev;
+
+   /* active netdev stats */
+   struct rtnl_link_stats64 active_stats;
+
+   /* backup netdev stats */
+   struct rtnl_link_stats64 backup_stats;
+
+   /* aggregated stats */
+   struct rtnl_link_stats64 bypass_stats;
+
+   /* spinlock while updating stats */
+   spinlock_t stats_lock;
+};
+
+static void virtnet_bypass_child_open(struct net_device *dev,
+ struct net_device *child_netdev)
+{
+   

Re: [virtio-dev] Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available

2018-03-02 Thread Siwei Liu
On Fri, Mar 2, 2018 at 1:31 PM, Michael S. Tsirkin  wrote:
> On Fri, Mar 02, 2018 at 12:44:56PM -0800, Siwei Liu wrote:
>> On Fri, Mar 2, 2018 at 12:10 PM, Michael S. Tsirkin  wrote:
>> > On Fri, Mar 02, 2018 at 11:52:27AM -0800, Samudrala, Sridhar wrote:
>> >>
>> >>
>> >> On 3/2/2018 11:41 AM, Michael S. Tsirkin wrote:
>> >> > On Fri, Mar 02, 2018 at 07:26:25AM -0800, Alexander Duyck wrote:
>> >> > > The design limits things to a 1:1 relationship since we just have the
>> >> > > child and backup pointers, but I don't think I am seeing exception
>> >> > > handling to prevent us from overwriting the child pointers so there
>> >> > > may be a leak there.
>> >> > >
>> >> > > Thanks.
>> >> > >
>> >> > > - Alex
>> >> > In fact maintaining a list in that case would be nicer, and
>> >> > just use an arbitrary one.
>> >> > E.g. one can see how a user wanting to swap device 1 for device 2
>> >> > might first add device 2 with same MAC then drop device 1.
>> >>
>> >> It should be possible to swap VF1 with VF2 by
>> >> 1.- enabling virtio link
>> >> 2.- unplugging VF1
>> >> 3.- plugging VF2
>> >> 4.- disabling virtio link
>> >>
>> >
>> > True, but it isn't hard to avoid breakage if user
>> > swapped steps 2 and 3. No need to make it more
>> > fragile that it has to be.
>>
>> The migration case, VF2 is associated with another PF on another
>> machine (destination), I wonder how it is possible.
>
> E.g. you want to remove the PF so you unplug the VF
> then add another VF of the same PF.
>
>> Even with local plugging of VF2 on the same PF, the MAC address
>> requirement (VF1's == VF2's) would fail the MAC address assignment on
>> VF2.
>>
>> -Siwei
>
> Why would it fail? These are separate cards.

OK. I realized that you may talk about assigning a VF on a diffferent
PF (VF1 on PF1 while VF2 on PF2). And we might assign a pass-through
device rather than a VF. Yes, it's indeed possible that may happen but
I take it as a further step down (another patch maybe) as it would
involve changes to notify the network with gratuituious ARP and/or
unsolicited ND advertisement of the MAC address association with the
new port.

-Siwei

>
>> >
>> > --
>> > MST
>> >
>> > -
>> > To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
>> > For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
>> >

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available

2018-03-02 Thread Michael S. Tsirkin
On Fri, Mar 02, 2018 at 01:11:56PM -0800, Siwei Liu wrote:
> On Thu, Mar 1, 2018 at 12:08 PM, Sridhar Samudrala
>  wrote:
> > This patch enables virtio_net to switch over to a VF datapath when a VF
> > netdev is present with the same MAC address. It allows live migration
> > of a VM with a direct attached VF without the need to setup a bond/team
> > between a VF and virtio net device in the guest.
> >
> > The hypervisor needs to enable only one datapath at any time so that
> > packets don't get looped back to the VM over the other datapath. When a VF
> > is plugged, the virtio datapath link state can be marked as down. The
> > hypervisor needs to unplug the VF device from the guest on the source host
> > and reset the MAC filter of the VF to initiate failover of datapath to
> > virtio before starting the migration. After the migration is completed,
> > the destination hypervisor sets the MAC filter on the VF and plugs it back
> > to the guest to switch over to VF datapath.
> >
> > When BACKUP feature is enabled, an additional netdev(bypass netdev) is
> > created that acts as a master device and tracks the state of the 2 lower
> > netdevs. The original virtio_net netdev is marked as 'backup' netdev and a
> > passthru device with the same MAC is registered as 'active' netdev.
> >
> > This patch is based on the discussion initiated by Jesse on this thread.
> > https://marc.info/?l=linux-virtualization=151189725224231=2
> >
> > Signed-off-by: Sridhar Samudrala 
> > Signed-off-by: Alexander Duyck 
> > Reviewed-by: Jesse Brandeburg 
> > ---
> >  drivers/net/virtio_net.c | 683 
> > ++-
> >  1 file changed, 682 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index bcd13fe906ca..f2860d86c952 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -30,6 +30,8 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> > +#include 
> >  #include 
> >  #include 
> >
> > @@ -206,6 +208,9 @@ struct virtnet_info {
> > u32 speed;
> >
> > unsigned long guest_offloads;
> > +
> > +   /* upper netdev created when BACKUP feature enabled */
> > +   struct net_device *bypass_netdev;
> >  };
> >
> >  struct padded_vnet_hdr {
> > @@ -2236,6 +2241,22 @@ static int virtnet_xdp(struct net_device *dev, 
> > struct netdev_bpf *xdp)
> > }
> >  }
> >
> > +static int virtnet_get_phys_port_name(struct net_device *dev, char *buf,
> > + size_t len)
> > +{
> > +   struct virtnet_info *vi = netdev_priv(dev);
> > +   int ret;
> > +
> > +   if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_BACKUP))
> > +   return -EOPNOTSUPP;
> > +
> > +   ret = snprintf(buf, len, "_bkup");
> > +   if (ret >= len)
> > +   return -EOPNOTSUPP;
> > +
> > +   return 0;
> > +}
> > +
> 
> What if the systemd/udevd is not new enough to enforce the
> n naming? Would virtio_bypass get a different name
> than the original virtio_net?

You mean people using ethX names? Any hardware config change breaks
these, I don't think that can be helped.

> Should we detect this earlier and fall
> back to legacy mode without creating the bypass netdev and ensalving
> the VF?

I don't think we can do this with existing kernel/userspace APIs.

-- 
MST

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



Re: [virtio-dev] Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available

2018-03-02 Thread Michael S. Tsirkin
On Fri, Mar 02, 2018 at 12:56:21PM -0800, Samudrala, Sridhar wrote:
> 
> 
> On 3/2/2018 12:44 PM, Siwei Liu wrote:
> > On Fri, Mar 2, 2018 at 12:10 PM, Michael S. Tsirkin  wrote:
> > > On Fri, Mar 02, 2018 at 11:52:27AM -0800, Samudrala, Sridhar wrote:
> > > > 
> > > > On 3/2/2018 11:41 AM, Michael S. Tsirkin wrote:
> > > > > On Fri, Mar 02, 2018 at 07:26:25AM -0800, Alexander Duyck wrote:
> > > > > > The design limits things to a 1:1 relationship since we just have 
> > > > > > the
> > > > > > child and backup pointers, but I don't think I am seeing exception
> > > > > > handling to prevent us from overwriting the child pointers so there
> > > > > > may be a leak there.
> > > > > > 
> > > > > > Thanks.
> > > > > > 
> > > > > > - Alex
> > > > > In fact maintaining a list in that case would be nicer, and
> > > > > just use an arbitrary one.
> > > > > E.g. one can see how a user wanting to swap device 1 for device 2
> > > > > might first add device 2 with same MAC then drop device 1.
> > > > It should be possible to swap VF1 with VF2 by
> > > > 1.- enabling virtio link
> > > > 2.- unplugging VF1
> > > > 3.- plugging VF2
> > > > 4.- disabling virtio link
> > > > 
> > > True, but it isn't hard to avoid breakage if user
> > > swapped steps 2 and 3. No need to make it more
> > > fragile that it has to be.
> > The migration case, VF2 is associated with another PF on another
> > machine (destination), I wonder how it is possible.
> > 
> > Even with local plugging of VF2 on the same PF, the MAC address
> > requirement (VF1's == VF2's) would fail the MAC address assignment on
> > VF2.
> > 
> > 
> I didn't include updating the MAC filter step in the above sequence.
> So definitely plugging 2 VFs with the same MAC address will be an issue.

If these are two separate PFs then I don't see why -
each has its own MAC filter.

> Here is the more complete sequence of steps that are required to
> enable live migration.

Replacing VF1 by VF2 is not about migration. It's to remove PF
from host e.g. for service.

-- 
MST

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



Re: [virtio-dev] Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available

2018-03-02 Thread Michael S. Tsirkin
On Fri, Mar 02, 2018 at 12:44:56PM -0800, Siwei Liu wrote:
> On Fri, Mar 2, 2018 at 12:10 PM, Michael S. Tsirkin  wrote:
> > On Fri, Mar 02, 2018 at 11:52:27AM -0800, Samudrala, Sridhar wrote:
> >>
> >>
> >> On 3/2/2018 11:41 AM, Michael S. Tsirkin wrote:
> >> > On Fri, Mar 02, 2018 at 07:26:25AM -0800, Alexander Duyck wrote:
> >> > > The design limits things to a 1:1 relationship since we just have the
> >> > > child and backup pointers, but I don't think I am seeing exception
> >> > > handling to prevent us from overwriting the child pointers so there
> >> > > may be a leak there.
> >> > >
> >> > > Thanks.
> >> > >
> >> > > - Alex
> >> > In fact maintaining a list in that case would be nicer, and
> >> > just use an arbitrary one.
> >> > E.g. one can see how a user wanting to swap device 1 for device 2
> >> > might first add device 2 with same MAC then drop device 1.
> >>
> >> It should be possible to swap VF1 with VF2 by
> >> 1.- enabling virtio link
> >> 2.- unplugging VF1
> >> 3.- plugging VF2
> >> 4.- disabling virtio link
> >>
> >
> > True, but it isn't hard to avoid breakage if user
> > swapped steps 2 and 3. No need to make it more
> > fragile that it has to be.
> 
> The migration case, VF2 is associated with another PF on another
> machine (destination), I wonder how it is possible.

E.g. you want to remove the PF so you unplug the VF
then add another VF of the same PF.

> Even with local plugging of VF2 on the same PF, the MAC address
> requirement (VF1's == VF2's) would fail the MAC address assignment on
> VF2.
> 
> -Siwei

Why would it fail? These are separate cards.

> >
> > --
> > MST
> >
> > -
> > To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
> > For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
> >

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available

2018-03-02 Thread Siwei Liu
On Thu, Mar 1, 2018 at 12:08 PM, Sridhar Samudrala
 wrote:
> This patch enables virtio_net to switch over to a VF datapath when a VF
> netdev is present with the same MAC address. It allows live migration
> of a VM with a direct attached VF without the need to setup a bond/team
> between a VF and virtio net device in the guest.
>
> The hypervisor needs to enable only one datapath at any time so that
> packets don't get looped back to the VM over the other datapath. When a VF
> is plugged, the virtio datapath link state can be marked as down. The
> hypervisor needs to unplug the VF device from the guest on the source host
> and reset the MAC filter of the VF to initiate failover of datapath to
> virtio before starting the migration. After the migration is completed,
> the destination hypervisor sets the MAC filter on the VF and plugs it back
> to the guest to switch over to VF datapath.
>
> When BACKUP feature is enabled, an additional netdev(bypass netdev) is
> created that acts as a master device and tracks the state of the 2 lower
> netdevs. The original virtio_net netdev is marked as 'backup' netdev and a
> passthru device with the same MAC is registered as 'active' netdev.
>
> This patch is based on the discussion initiated by Jesse on this thread.
> https://marc.info/?l=linux-virtualization=151189725224231=2
>
> Signed-off-by: Sridhar Samudrala 
> Signed-off-by: Alexander Duyck 
> Reviewed-by: Jesse Brandeburg 
> ---
>  drivers/net/virtio_net.c | 683 
> ++-
>  1 file changed, 682 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index bcd13fe906ca..f2860d86c952 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -30,6 +30,8 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>  #include 
>  #include 
>
> @@ -206,6 +208,9 @@ struct virtnet_info {
> u32 speed;
>
> unsigned long guest_offloads;
> +
> +   /* upper netdev created when BACKUP feature enabled */
> +   struct net_device *bypass_netdev;
>  };
>
>  struct padded_vnet_hdr {
> @@ -2236,6 +2241,22 @@ static int virtnet_xdp(struct net_device *dev, struct 
> netdev_bpf *xdp)
> }
>  }
>
> +static int virtnet_get_phys_port_name(struct net_device *dev, char *buf,
> + size_t len)
> +{
> +   struct virtnet_info *vi = netdev_priv(dev);
> +   int ret;
> +
> +   if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_BACKUP))
> +   return -EOPNOTSUPP;
> +
> +   ret = snprintf(buf, len, "_bkup");
> +   if (ret >= len)
> +   return -EOPNOTSUPP;
> +
> +   return 0;
> +}
> +

What if the systemd/udevd is not new enough to enforce the
n naming? Would virtio_bypass get a different name
than the original virtio_net? Should we detect this earlier and fall
back to legacy mode without creating the bypass netdev and ensalving
the VF?

>  static const struct net_device_ops virtnet_netdev = {
> .ndo_open= virtnet_open,
> .ndo_stop= virtnet_close,
> @@ -2253,6 +2274,7 @@ static const struct net_device_ops virtnet_netdev = {
> .ndo_xdp_xmit   = virtnet_xdp_xmit,
> .ndo_xdp_flush  = virtnet_xdp_flush,
> .ndo_features_check = passthru_features_check,
> +   .ndo_get_phys_port_name = virtnet_get_phys_port_name,
>  };
>
>  static void virtnet_config_changed_work(struct work_struct *work)
> @@ -2647,6 +2669,653 @@ static int virtnet_validate(struct virtio_device 
> *vdev)
> return 0;
>  }
>
> +/* START of functions supporting VIRTIO_NET_F_BACKUP feature.
> + * When BACKUP feature is enabled, an additional netdev(bypass netdev)
> + * is created that acts as a master device and tracks the state of the
> + * 2 lower netdevs. The original virtio_net netdev is registered as
> + * 'backup' netdev and a passthru device with the same MAC is registered
> + * as 'active' netdev.
> + */
> +
> +/* bypass state maintained when BACKUP feature is enabled */
> +struct virtnet_bypass_info {
> +   /* passthru netdev with same MAC */
> +   struct net_device __rcu *active_netdev;
> +
> +   /* virtio_net netdev */
> +   struct net_device __rcu *backup_netdev;
> +
> +   /* active netdev stats */
> +   struct rtnl_link_stats64 active_stats;
> +
> +   /* backup netdev stats */
> +   struct rtnl_link_stats64 backup_stats;
> +
> +   /* aggregated stats */
> +   struct rtnl_link_stats64 bypass_stats;
> +
> +   /* spinlock while updating stats */
> +   spinlock_t stats_lock;
> +};
> +
> +static void virtnet_bypass_child_open(struct net_device *dev,
> + struct net_device *child_netdev)
> +{
> +   int err = dev_open(child_netdev);
> +
> +   if (err)
> +   

Re: [virtio-dev] Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available

2018-03-02 Thread Samudrala, Sridhar



On 3/2/2018 12:44 PM, Siwei Liu wrote:

On Fri, Mar 2, 2018 at 12:10 PM, Michael S. Tsirkin  wrote:

On Fri, Mar 02, 2018 at 11:52:27AM -0800, Samudrala, Sridhar wrote:


On 3/2/2018 11:41 AM, Michael S. Tsirkin wrote:

On Fri, Mar 02, 2018 at 07:26:25AM -0800, Alexander Duyck wrote:

The design limits things to a 1:1 relationship since we just have the
child and backup pointers, but I don't think I am seeing exception
handling to prevent us from overwriting the child pointers so there
may be a leak there.

Thanks.

- Alex

In fact maintaining a list in that case would be nicer, and
just use an arbitrary one.
E.g. one can see how a user wanting to swap device 1 for device 2
might first add device 2 with same MAC then drop device 1.

It should be possible to swap VF1 with VF2 by
1.- enabling virtio link
2.- unplugging VF1
3.- plugging VF2
4.- disabling virtio link


True, but it isn't hard to avoid breakage if user
swapped steps 2 and 3. No need to make it more
fragile that it has to be.

The migration case, VF2 is associated with another PF on another
machine (destination), I wonder how it is possible.

Even with local plugging of VF2 on the same PF, the MAC address
requirement (VF1's == VF2's) would fail the MAC address assignment on
VF2.



I didn't include updating the MAC filter step in the above sequence.
So definitely plugging 2 VFs with the same MAC address will be an issue.

Here is the more complete sequence of steps that are required to
enable live migration.

Source Hypervisor
- Bring up the virtio link
- Hot Unplug VF from the VM
- Delete FDB entry for the MAC on the Bridge associated with virtio/tap
- Remove the MAC filter associated with the VF
- Migrate VM to destination

Destination Hypervisor (after migration is completed)
- Set MAC filter for the VF
- Hot Plug VF to the VM
- Bring down the virtio link

Thanks
Sridhar




-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



Re: [virtio-dev] Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available

2018-03-02 Thread Siwei Liu
On Fri, Mar 2, 2018 at 11:42 AM, Michael S. Tsirkin  wrote:
> On Fri, Mar 02, 2018 at 05:20:17PM +0100, Jiri Pirko wrote:
>> >Yeah, this code essentially calls out the "shareable" code with a
>> >comment at the start and end of the section what defines the
>> >virtio_bypass functionality. It would just be a matter of mostly
>> >cutting and pasting to put it into a separate driver module.
>>
>> Please put it there and unite the use of it with netvsc.
>
> Surely, adding this to other drivers (e.g. might this be handy for xen
> too?) can be left for a separate patchset. Let's get one device merged
> first.
Agreed.

-Siwei

>
> --
> MST
>
> -
> To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
> For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
>

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available

2018-03-02 Thread Michael S. Tsirkin
On Fri, Mar 02, 2018 at 11:52:27AM -0800, Samudrala, Sridhar wrote:
> 
> 
> On 3/2/2018 11:41 AM, Michael S. Tsirkin wrote:
> > On Fri, Mar 02, 2018 at 07:26:25AM -0800, Alexander Duyck wrote:
> > > The design limits things to a 1:1 relationship since we just have the
> > > child and backup pointers, but I don't think I am seeing exception
> > > handling to prevent us from overwriting the child pointers so there
> > > may be a leak there.
> > > 
> > > Thanks.
> > > 
> > > - Alex
> > In fact maintaining a list in that case would be nicer, and
> > just use an arbitrary one.
> > E.g. one can see how a user wanting to swap device 1 for device 2
> > might first add device 2 with same MAC then drop device 1.
> 
> It should be possible to swap VF1 with VF2 by
> 1.- enabling virtio link
> 2.- unplugging VF1
> 3.- plugging VF2
> 4.- disabling virtio link
> 

True, but it isn't hard to avoid breakage if user
swapped steps 2 and 3. No need to make it more
fragile that it has to be.

-- 
MST

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available

2018-03-02 Thread Michael S. Tsirkin
On Fri, Mar 02, 2018 at 05:20:17PM +0100, Jiri Pirko wrote:
> >Yeah, this code essentially calls out the "shareable" code with a
> >comment at the start and end of the section what defines the
> >virtio_bypass functionality. It would just be a matter of mostly
> >cutting and pasting to put it into a separate driver module.
> 
> Please put it there and unite the use of it with netvsc.

Surely, adding this to other drivers (e.g. might this be handy for xen
too?) can be left for a separate patchset. Let's get one device merged
first.

-- 
MST

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available

2018-03-02 Thread Michael S. Tsirkin
On Fri, Mar 02, 2018 at 07:26:25AM -0800, Alexander Duyck wrote:
> The design limits things to a 1:1 relationship since we just have the
> child and backup pointers, but I don't think I am seeing exception
> handling to prevent us from overwriting the child pointers so there
> may be a leak there.
> 
> Thanks.
> 
> - Alex

In fact maintaining a list in that case would be nicer, and
just use an arbitrary one.
E.g. one can see how a user wanting to swap device 1 for device 2
might first add device 2 with same MAC then drop device 1.

-- 
MST

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available

2018-03-02 Thread Alexander Duyck
On Fri, Mar 2, 2018 at 8:37 AM, Samudrala, Sridhar
 wrote:
>
>
> On 3/2/2018 8:20 AM, Jiri Pirko wrote:
>
> Fri, Mar 02, 2018 at 04:26:25PM CET, alexander.du...@gmail.com wrote:
>
> On Fri, Mar 2, 2018 at 12:36 AM, Jiri Pirko  wrote:
>
> Thu, Mar 01, 2018 at 09:08:43PM CET, sridhar.samudr...@intel.com wrote:
>
> This patch enables virtio_net to switch over to a VF datapath when a VF
> netdev is present with the same MAC address. It allows live migration
> of a VM with a direct attached VF without the need to setup a bond/team
> between a VF and virtio net device in the guest.
>
> The hypervisor needs to enable only one datapath at any time so that
> packets don't get looped back to the VM over the other datapath. When a VF
> is plugged, the virtio datapath link state can be marked as down. The
> hypervisor needs to unplug the VF device from the guest on the source host
> and reset the MAC filter of the VF to initiate failover of datapath to
> virtio before starting the migration. After the migration is completed,
> the destination hypervisor sets the MAC filter on the VF and plugs it back
> to the guest to switch over to VF datapath.
>
> When BACKUP feature is enabled, an additional netdev(bypass netdev) is
> created that acts as a master device and tracks the state of the 2 lower
> netdevs. The original virtio_net netdev is marked as 'backup' netdev and a
> passthru device with the same MAC is registered as 'active' netdev.
>
> This patch is based on the discussion initiated by Jesse on this thread.
> https://marc.info/?l=linux-virtualization=151189725224231=2
>
> Signed-off-by: Sridhar Samudrala 
> Signed-off-by: Alexander Duyck 
> Reviewed-by: Jesse Brandeburg 
> ---
> drivers/net/virtio_net.c | 683
> ++-
>
> As I wrote to the discussion of the other version of this patchset,
> I strongly believe you need to have a common part in net/core/ that will
> be shared by both virtio_net and netvsc. There there should be explicit
> limits for this in-driver bonding solution, like max 1 vf slave, etc.
>
> Yeah, this code essentially calls out the "shareable" code with a
> comment at the start and end of the section what defines the
> virtio_bypass functionality. It would just be a matter of mostly
> cutting and pasting to put it into a separate driver module.
>
> Please put it there and unite the use of it with netvsc.
>
> Based on Stephen's responses, it is not clear if netvsc would be OK to move
> to
> the 3 netdev solution at this time. When another paravirtual driver would
> like
> to use this solution, we can do refactoring at that time.
>
> The design limits things to a 1:1 relationship since we just have the
> child and backup pointers, but I don't think I am seeing exception
> handling to prevent us from overwriting the child pointers so there
> may be a leak there.
>
>
> We only allow one active netdev registered as a child. There is a check to
> make sure that if an active netdev is registered, another one with same MAC
> is not allowed. I don't think there is a leak.
>
> vbi = netdev_priv(dev);
> backup = (child_netdev->dev.parent == dev->dev.parent);
> if (backup ? rtnl_dereference(vbi->backup_netdev) :
> rtnl_dereference(vbi->active_netdev)) {
> netdev_info(dev,
> "%s attempting to join bypass dev when %s
> already present\n",
> child_netdev->name, backup ? "backup" :
> "active");
> return NOTIFY_DONE;
> }
>
> /* Avoid non pci devices as active netdev */
> if (!backup && (!child_netdev->dev.parent ||
> !dev_is_pci(child_netdev->dev.parent)))
> return NOTIFY_DONE;
>
> Thanks
> Sridhar

I overlooked that part.

Thanks.

- Alex

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available

2018-03-02 Thread Samudrala, Sridhar



On 3/2/2018 8:20 AM, Jiri Pirko wrote:

Fri, Mar 02, 2018 at 04:26:25PM CET, alexander.du...@gmail.com wrote:

On Fri, Mar 2, 2018 at 12:36 AM, Jiri Pirko  wrote:

Thu, Mar 01, 2018 at 09:08:43PM CET, sridhar.samudr...@intel.com wrote:

This patch enables virtio_net to switch over to a VF datapath when a VF
netdev is present with the same MAC address. It allows live migration
of a VM with a direct attached VF without the need to setup a bond/team
between a VF and virtio net device in the guest.

The hypervisor needs to enable only one datapath at any time so that
packets don't get looped back to the VM over the other datapath. When a VF
is plugged, the virtio datapath link state can be marked as down. The
hypervisor needs to unplug the VF device from the guest on the source host
and reset the MAC filter of the VF to initiate failover of datapath to
virtio before starting the migration. After the migration is completed,
the destination hypervisor sets the MAC filter on the VF and plugs it back
to the guest to switch over to VF datapath.

When BACKUP feature is enabled, an additional netdev(bypass netdev) is
created that acts as a master device and tracks the state of the 2 lower
netdevs. The original virtio_net netdev is marked as 'backup' netdev and a
passthru device with the same MAC is registered as 'active' netdev.

This patch is based on the discussion initiated by Jesse on this thread.
https://marc.info/?l=linux-virtualization=151189725224231=2

Signed-off-by: Sridhar Samudrala 
Signed-off-by: Alexander Duyck 
Reviewed-by: Jesse Brandeburg 
---
drivers/net/virtio_net.c | 683 ++-

As I wrote to the discussion of the other version of this patchset,
I strongly believe you need to have a common part in net/core/ that will
be shared by both virtio_net and netvsc. There there should be explicit
limits for this in-driver bonding solution, like max 1 vf slave, etc.

Yeah, this code essentially calls out the "shareable" code with a
comment at the start and end of the section what defines the
virtio_bypass functionality. It would just be a matter of mostly
cutting and pasting to put it into a separate driver module.

Please put it there and unite the use of it with netvsc.


Based on Stephen's responses, it is not clear if netvsc would be OK to move to
the 3 netdev solution at this time. When another paravirtual driver would like
to use this solution, we can do refactoring at that time.




The design limits things to a 1:1 relationship since we just have the
child and backup pointers, but I don't think I am seeing exception
handling to prevent us from overwriting the child pointers so there
may be a leak there.



We only allow one active netdev registered as a child. There is a check to
make sure that if an active netdev is registered, another one with same MAC
is not allowed. I don't think there is a leak.

vbi = netdev_priv(dev);
backup = (child_netdev->dev.parent == dev->dev.parent);
if (backup ? rtnl_dereference(vbi->backup_netdev) :
rtnl_dereference(vbi->active_netdev)) {
netdev_info(dev,
"%s attempting to join bypass dev when %s already 
present\n",
child_netdev->name, backup ? "backup" : "active");
return NOTIFY_DONE;
}

/* Avoid non pci devices as active netdev */
if (!backup && (!child_netdev->dev.parent ||
!dev_is_pci(child_netdev->dev.parent)))
return NOTIFY_DONE;

Thanks
Sridhar



[virtio-dev] Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available

2018-03-02 Thread Alexander Duyck
On Fri, Mar 2, 2018 at 12:36 AM, Jiri Pirko  wrote:
> Thu, Mar 01, 2018 at 09:08:43PM CET, sridhar.samudr...@intel.com wrote:
>>This patch enables virtio_net to switch over to a VF datapath when a VF
>>netdev is present with the same MAC address. It allows live migration
>>of a VM with a direct attached VF without the need to setup a bond/team
>>between a VF and virtio net device in the guest.
>>
>>The hypervisor needs to enable only one datapath at any time so that
>>packets don't get looped back to the VM over the other datapath. When a VF
>>is plugged, the virtio datapath link state can be marked as down. The
>>hypervisor needs to unplug the VF device from the guest on the source host
>>and reset the MAC filter of the VF to initiate failover of datapath to
>>virtio before starting the migration. After the migration is completed,
>>the destination hypervisor sets the MAC filter on the VF and plugs it back
>>to the guest to switch over to VF datapath.
>>
>>When BACKUP feature is enabled, an additional netdev(bypass netdev) is
>>created that acts as a master device and tracks the state of the 2 lower
>>netdevs. The original virtio_net netdev is marked as 'backup' netdev and a
>>passthru device with the same MAC is registered as 'active' netdev.
>>
>>This patch is based on the discussion initiated by Jesse on this thread.
>>https://marc.info/?l=linux-virtualization=151189725224231=2
>>
>>Signed-off-by: Sridhar Samudrala 
>>Signed-off-by: Alexander Duyck 
>>Reviewed-by: Jesse Brandeburg 
>>---
>> drivers/net/virtio_net.c | 683 
>> ++-
>
> As I wrote to the discussion of the other version of this patchset,
> I strongly believe you need to have a common part in net/core/ that will
> be shared by both virtio_net and netvsc. There there should be explicit
> limits for this in-driver bonding solution, like max 1 vf slave, etc.

Yeah, this code essentially calls out the "shareable" code with a
comment at the start and end of the section what defines the
virtio_bypass functionality. It would just be a matter of mostly
cutting and pasting to put it into a separate driver module.

The design limits things to a 1:1 relationship since we just have the
child and backup pointers, but I don't think I am seeing exception
handling to prevent us from overwriting the child pointers so there
may be a leak there.

Thanks.

- Alex

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org