Re: [RFC PATCH v3 0/3] Enable virtio_net to act as a backup for a passthru device

2018-02-21 Thread Samudrala, Sridhar

On 2/21/2018 6:35 PM, Samudrala, Sridhar wrote:

On 2/21/2018 5:59 PM, Siwei Liu wrote:

On Wed, Feb 21, 2018 at 4:17 PM, Alexander Duyck wrote:

On Wed, Feb 21, 2018 at 3:50 PM, Siwei Liu  wrote:

I haven't checked emails for days and did not realize the new revision
had already come out. And thank you for the effort, this revision
really looks to be a step forward towards our use case and is close to
what we wanted to do. A few questions in line.

On Sat, Feb 17, 2018 at 9:12 AM, Alexander Duyck wrote:
On Fri, Feb 16, 2018 at 6:38 PM, Jakub Kicinski wrote:

On Fri, 16 Feb 2018 10:11:19 -0800, Sridhar Samudrala wrote:

Patch 2 is in response to the community request for a 3 netdev
solution.  However, it creates some issues we'll get into in a moment.

It extends virtio_net to use alternate datapath when available and
registered. When BACKUP feature is enabled, virtio_net driver creates
an additional 'bypass' netdev that acts as a master device and controls
2 slave devices.  The original virtio_net netdev is registered as
'backup' netdev and a passthru/vf device with the same MAC gets
registered as 'active' netdev. Both 'bypass' and 'backup' netdevs are
associated with the same 'pci' device.  The user accesses the network
interface via 'bypass' netdev. The 'bypass' netdev chooses 'active' netdev
as default for transmits when it is available with link up and running.

Thank you for doing this.


We noticed a couple of issues with this approach during testing.
- As both 'bypass' and 'backup' netdevs are associated with the same
  virtio pci device, udev tries to rename both of them with the same name
  and the 2nd rename will fail. This would be OK as long as the first netdev
  to be renamed is the 'bypass' netdev, but the order in which udev gets
  to rename the 2 netdevs is not reliable.

Out of curiosity - why do you link the master netdev to the virtio
struct device?

The basic idea of all this is that we wanted this to work with an
existing VM image that was using virtio. As such we were trying to
make it so that the bypass interface takes the place of the original
virtio and get udev to rename the bypass to what the original
virtio_net was.

Could it also be made possible to take over the config from the VF
instead of virtio on an existing VM image? And get udev to rename the
bypass netdev to what the original VF was. I'm not saying to tightly
bind the bypass master to only virtio or VF, but I think we should
provide both options to support different upgrade paths. Possibly we
could tweak the device tree layout to reuse the same PCI slot for the
master bypass netdev, such that udev would not get confused when
renaming the device. The VF needs to use a different function slot
afterwards. Perhaps we might need a special multiseat-like QEMU device
for that purpose?

In our case we'll upgrade the config from VF to virtio-bypass directly.

So if I am understanding what you are saying you are wanting to flip
the backup interface from the virtio to a VF. The problem is that
becomes a bit of a vendor lock-in solution since it would rely on a
specific VF driver. I would agree with Jiri that we don't want to go
down that path. We don't want every VF out there firing up its own
separate bond. Ideally you want the hypervisor to be able to manage
all of this which is why it makes sense to have virtio manage this and
why this is associated with the virtio_net interface.

No, that's not what I was talking about of course. I thought you
mentioned the upgrade scenario this patch would like to address is to
use the bypass interface "to take the place of the original virtio,
and get udev to rename the bypass to what the original virtio_net
was". That is one of the possible upgrade paths for sure. However the
upgrade path I was seeking is to use the bypass interface to take the
place of the original VF interface while retaining the name and network
configs, which generally can be done simply with a kernel upgrade. It
would become limiting as this patch makes the bypass interface share
the same virtio pci device with the virtio backup. Can this bypass
interface be made general to take the place of any pci device other
than virtio-net? This would be more helpful as cloud users who have an
existing setup on a VF interface don't have to recreate it on
virtio-net and VF separately again.


Yes. This sounds interesting. Looks like you want an existing VM image with
VF only configuration to get transparent live migration support by adding
virtio_net with BACKUP feature.  We may need another feature bit to switch
between these 2 options.


After thinking some more, this may be more involved than adding a new
feature bit.  This requires a netdev created by virtio to take over the
name of a VF netdev associated with a PCI device that may not be plugged
in when the virtio driver is coming up. This definitely requires some new
messages

Re: [RFC PATCH v3 0/3] Enable virtio_net to act as a backup for a passthru device

2018-02-21 Thread Samudrala, Sridhar

On 2/21/2018 5:59 PM, Siwei Liu wrote:

On Wed, Feb 21, 2018 at 4:17 PM, Alexander Duyck wrote:

On Wed, Feb 21, 2018 at 3:50 PM, Siwei Liu  wrote:

I haven't checked emails for days and did not realize the new revision
had already come out. And thank you for the effort, this revision
really looks to be a step forward towards our use case and is close to
what we wanted to do. A few questions in line.

On Sat, Feb 17, 2018 at 9:12 AM, Alexander Duyck wrote:

On Fri, Feb 16, 2018 at 6:38 PM, Jakub Kicinski  wrote:

On Fri, 16 Feb 2018 10:11:19 -0800, Sridhar Samudrala wrote:

Patch 2 is in response to the community request for a 3 netdev
solution.  However, it creates some issues we'll get into in a moment.
It extends virtio_net to use alternate datapath when available and
registered. When BACKUP feature is enabled, virtio_net driver creates
an additional 'bypass' netdev that acts as a master device and controls
2 slave devices.  The original virtio_net netdev is registered as
'backup' netdev and a passthru/vf device with the same MAC gets
registered as 'active' netdev. Both 'bypass' and 'backup' netdevs are
associated with the same 'pci' device.  The user accesses the network
interface via 'bypass' netdev. The 'bypass' netdev chooses 'active' netdev
as default for transmits when it is available with link up and running.
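
As a rough sketch of the transmit dispatch just described (all names
here are illustrative, not necessarily the patch's own identifiers):

    /* Sketch: the 'bypass' master prefers the 'active' VF slave when
     * it is present and running, and falls back to the 'backup'
     * virtio_net slave otherwise. */
    struct bypass_priv {
            struct net_device __rcu *active_netdev;   /* VF datapath */
            struct net_device __rcu *backup_netdev;   /* virtio_net  */
    };

    static netdev_tx_t bypass_start_xmit(struct sk_buff *skb,
                                         struct net_device *dev)
    {
            struct bypass_priv *priv = netdev_priv(dev);
            struct net_device *slave;

            slave = rcu_dereference_bh(priv->active_netdev);
            if (!slave || !netif_running(slave) || !netif_carrier_ok(slave))
                    slave = rcu_dereference_bh(priv->backup_netdev);

            if (unlikely(!slave)) {
                    dev_kfree_skb_any(skb);
                    return NETDEV_TX_OK;
            }

            skb->dev = slave;
            return dev_queue_xmit(skb);
    }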

Thank you for doing this.


We noticed a couple of issues with this approach during testing.
- As both 'bypass' and 'backup' netdevs are associated with the same
   virtio pci device, udev tries to rename both of them with the same name
   and the 2nd rename will fail. This would be OK as long as the first netdev
   to be renamed is the 'bypass' netdev, but the order in which udev gets
   to rename the 2 netdevs is not reliable.

Out of curiosity - why do you link the master netdev to the virtio
struct device?

The basic idea of all this is that we wanted this to work with an
existing VM image that was using virtio. As such we were trying to
make it so that the bypass interface takes the place of the original
virtio and get udev to rename the bypass to what the original
virtio_net was.

Could it also be made possible to take over the config from the VF
instead of virtio on an existing VM image? And get udev to rename the
bypass netdev to what the original VF was. I'm not saying to tightly
bind the bypass master to only virtio or VF, but I think we should
provide both options to support different upgrade paths. Possibly we
could tweak the device tree layout to reuse the same PCI slot for the
master bypass netdev, such that udev would not get confused when
renaming the device. The VF needs to use a different function slot
afterwards. Perhaps we might need a special multiseat-like QEMU device
for that purpose?

In our case we'll upgrade the config from VF to virtio-bypass directly.

So if I am understanding what you are saying you are wanting to flip
the backup interface from the virtio to a VF. The problem is that
becomes a bit of a vendor lock-in solution since it would rely on a
specific VF driver. I would agree with Jiri that we don't want to go
down that path. We don't want every VF out there firing up its own
separate bond. Ideally you want the hypervisor to be able to manage
all of this which is why it makes sense to have virtio manage this and
why this is associated with the virtio_net interface.

No, that's not what I was talking about of course. I thought you
mentioned the upgrade scenario this patch would like to address is to
use the bypass interface "to take the place of the original virtio,
and get udev to rename the bypass to what the original virtio_net
was". That is one of the possible upgrade paths for sure. However the
upgrade path I was seeking is to use the bypass interface to take the
place of the original VF interface while retaining the name and network
configs, which generally can be done simply with a kernel upgrade. It
would become limiting as this patch makes the bypass interface share
the same virtio pci device with the virtio backup. Can this bypass
interface be made general to take the place of any pci device other
than virtio-net? This would be more helpful as cloud users who have an
existing setup on a VF interface don't have to recreate it on
virtio-net and VF separately again.


Yes. This sounds interesting. Looks like you want an existing VM image with
VF only configuration to get transparent live migration support by adding
virtio_net with BACKUP feature.  We may need another feature bit to switch
between these 2 options.
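
As a rough illustration of such a switch (VIRTIO_NET_F_BACKUP and its
bit value follow the RFC posting; the second bit's name and number are
invented here purely for the sketch):

    /* Hypothetical companion feature bit: tell the guest the bypass
     * master should adopt the VF's identity instead of virtio_net's.
     * VIRTIO_NET_F_TAKEOVER_VF does not exist; it is a sketch only. */
    #define VIRTIO_NET_F_BACKUP        62   /* as in the RFC */
    #define VIRTIO_NET_F_TAKEOVER_VF   63   /* hypothetical */

    static bool bypass_adopts_vf_identity(struct virtio_device *vdev)
    {
            return virtio_has_feature(vdev, VIRTIO_NET_F_BACKUP) &&
                   virtio_has_feature(vdev, VIRTIO_NET_F_TAKEOVER_VF);
    }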





The other bits get into more complexity than we are ready to handle
for now. I think I might have talked about something similar that I
was referring to as a "virtio-bond" where you would have a PCI/PCIe
tree topology that makes this easier to sort out, and the "virtio-bond"
would be used to handle coordination/configuration of a much more
complex interface.

That was 

Re: [RFC PATCH v3 0/3] Enable virtio_net to act as a backup for a passthru device

2018-02-21 Thread Samudrala, Sridhar

On 2/21/2018 6:02 PM, Jakub Kicinski wrote:

On Wed, 21 Feb 2018 12:57:09 -0800, Alexander Duyck wrote:

I don't see why the team cannot be there always.

It is more of a logistical nightmare. Part of the goal here was to work
with the cloud base images that are out there such as
https://alt.fedoraproject.org/cloud/. With just the kernel changes the
overhead for this stays fairly small and would be pulled in as just a
standard part of the kernel update process. The virtio bypass only
pops up if the backup bit is present. With the team solution it
requires that the base image use the team driver on virtio_net when it
sees one. I doubt the OSVs would want to do that just because SR-IOV
isn't that popular of a case.

IIUC we need to monitor for a "backup hint", spawn the master, rename it
to maintain backwards compatibility with no-VF setups and enslave the VF
if it appears.

All those sound possible from user space; the advantage of the kernel
solution right now is that it has more complete code.

Am I misunderstanding?


I think there is some misunderstanding about the exact requirement and
the usecase we are trying to solve.  If the Guest is allowed to do this
configuration, we already have a solution with either bond/team based
user space configuration.

This is to enable cloud service providers to provide an accelerated
datapath by simply letting tenants get their own images, with the only
requirement being to enable their kernels with the newer virtio_net
driver with BACKUP support and the VF driver.

To recap from an earlier thread, here is a response from Stephen that
talks about the requirement for the netvsc solution; we would like to
provide a similar solution for KVM based cloud deployments.

> The requirement with Azure accelerated network was that a stock
> distribution image from the store must be able to run unmodified and
> get accelerated networking.  Not sure if other environments need to
> work the same, but it would be nice.
>
> That meant no additional setup scripts (aka no bonding) and also it must
> work transparently with hot-plug. Also there are a diverse set of
> environments: openstack, cloudinit, network manager and systemd. The
> solution had to not depend on any one of them, but also not break any
> of them.

Thanks
Sridhar

Re: [RFC PATCH v3 0/3] Enable virtio_net to act as a backup for a passthru device

2018-02-21 Thread Jakub Kicinski
On Wed, 21 Feb 2018 12:57:09 -0800, Alexander Duyck wrote:
> > I don't see why the team cannot be there always.  
> 
> It is more of a logistical nightmare. Part of the goal here was to work
> with the cloud base images that are out there such as
> https://alt.fedoraproject.org/cloud/. With just the kernel changes the
> overhead for this stays fairly small and would be pulled in as just a
> standard part of the kernel update process. The virtio bypass only
> pops up if the backup bit is present. With the team solution it
> requires that the base image use the team driver on virtio_net when it
> sees one. I doubt the OSVs would want to do that just because SR-IOV
> isn't that popular of a case.

IIUC we need to monitor for a "backup hint", spawn the master, rename it
to maintain backwards compatibility with no-VF setups and enslave the VF
if it appears.

All those sound possible from user space; the advantage of the kernel
solution right now is that it has more complete code.

Am I misunderstanding?


Re: [RFC PATCH v3 0/3] Enable virtio_net to act as a backup for a passthru device

2018-02-21 Thread Siwei Liu
On Wed, Feb 21, 2018 at 4:17 PM, Alexander Duyck wrote:
> On Wed, Feb 21, 2018 at 3:50 PM, Siwei Liu  wrote:
>> I haven't checked emails for days and did not realize the new revision
>> had already come out. And thank you for the effort, this revision
>> really looks to be a step forward towards our use case and is close to
>> what we wanted to do. A few questions in line.
>>
>> On Sat, Feb 17, 2018 at 9:12 AM, Alexander Duyck wrote:
>>> On Fri, Feb 16, 2018 at 6:38 PM, Jakub Kicinski  wrote:
 On Fri, 16 Feb 2018 10:11:19 -0800, Sridhar Samudrala wrote:
> Patch 2 is in response to the community request for a 3 netdev
> solution.  However, it creates some issues we'll get into in a moment.
> It extends virtio_net to use alternate datapath when available and
> registered. When BACKUP feature is enabled, virtio_net driver creates
> an additional 'bypass' netdev that acts as a master device and controls
> 2 slave devices.  The original virtio_net netdev is registered as
> 'backup' netdev and a passthru/vf device with the same MAC gets
> registered as 'active' netdev. Both 'bypass' and 'backup' netdevs are
> associated with the same 'pci' device.  The user accesses the network
> interface via 'bypass' netdev. The 'bypass' netdev chooses 'active' netdev
> as default for transmits when it is available with link up and running.

 Thank you for doing this.

> We noticed a couple of issues with this approach during testing.
> - As both 'bypass' and 'backup' netdevs are associated with the same
>   virtio pci device, udev tries to rename both of them with the same name
>   and the 2nd rename will fail. This would be OK as long as the first
>   netdev to be renamed is the 'bypass' netdev, but the order in which
>   udev gets to rename the 2 netdevs is not reliable.

 Out of curiosity - why do you link the master netdev to the virtio
 struct device?
>>>
>>> The basic idea of all this is that we wanted this to work with an
>>> existing VM image that was using virtio. As such we were trying to
>>> make it so that the bypass interface takes the place of the original
>>> virtio and get udev to rename the bypass to what the original
>>> virtio_net was.
>>
>> Could it also be made possible to take over the config from the VF
>> instead of virtio on an existing VM image? And get udev to rename the
>> bypass netdev to what the original VF was. I'm not saying to tightly
>> bind the bypass master to only virtio or VF, but I think we should
>> provide both options to support different upgrade paths. Possibly we
>> could tweak the device tree layout to reuse the same PCI slot for the
>> master bypass netdev, such that udev would not get confused when
>> renaming the device. The VF needs to use a different function slot
>> afterwards. Perhaps we might need a special multiseat-like QEMU device
>> for that purpose?
>>
>> In our case we'll upgrade the config from VF to virtio-bypass directly.
>
> So if I am understanding what you are saying you are wanting to flip
> the backup interface from the virtio to a VF. The problem is that
> becomes a bit of a vendor lock-in solution since it would rely on a
> specific VF driver. I would agree with Jiri that we don't want to go
> down that path. We don't want every VF out there firing up its own
> separate bond. Ideally you want the hypervisor to be able to manage
> all of this which is why it makes sense to have virtio manage this and
> why this is associated with the virtio_net interface.

No, that's not what I was talking about of course. I thought you
mentioned the upgrade scenario this patch would like to address is to
use the bypass interface "to take the place of the original virtio,
and get udev to rename the bypass to what the original virtio_net
was". That is one of the possible upgrade paths for sure. However the
upgrade path I was seeking is to use the bypass interface to take the
place of the original VF interface while retaining the name and network
configs, which generally can be done simply with a kernel upgrade. It
would become limiting as this patch makes the bypass interface share
the same virtio pci device with the virtio backup. Can this bypass
interface be made general to take the place of any pci device other
than virtio-net? This would be more helpful as cloud users who have an
existing setup on a VF interface don't have to recreate it on
virtio-net and VF separately again.

>
> The other bits get into more complexity than we are ready to handle
> for now. I think I might have talked about something similar that I
> was referring to as a "virtio-bond" where you would have a PCI/PCIe
> tree topology that makes this easier to sort out, and the "virtio-bond"
> would be used to handle coordination/configuration of a much more
> complex interface.

That was one way to solve this problem but I'd like to see 

Re: [PATCH 3/4] iommu/virtio: Add event queue

2018-02-21 Thread kbuild test robot
Hi Jean-Philippe,

I love your patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.16-rc2 next-20180221]
[cannot apply to iommu/next]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:
https://github.com/0day-ci/linux/commits/Jean-Philippe-Brucker/Add-virtio-iommu-driver/20180217-075417
config: parisc-allmodconfig (attached as .config)
compiler: hppa-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=parisc

All errors (new ones prefixed by >>):

   drivers/iommu/virtio-iommu.o: In function `virtio_iommu_drv_init':
   (.init.text+0x24): undefined reference to `register_virtio_driver'
   drivers/iommu/virtio-iommu.o: In function `viommu_send_reqs_sync':
   (.text.viommu_send_reqs_sync+0xdc): undefined reference to `virtqueue_add_sgs'
   (.text.viommu_send_reqs_sync+0x12c): undefined reference to `virtqueue_kick'
   (.text.viommu_send_reqs_sync+0x29c): undefined reference to `virtqueue_get_buf'
   drivers/iommu/virtio-iommu.o: In function `viommu_event_handler':
>> (.text.viommu_event_handler+0x288): undefined reference to `virtqueue_add_inbuf'
>> (.text.viommu_event_handler+0x2a8): undefined reference to `virtqueue_get_buf'
>> (.text.viommu_event_handler+0x2b8): undefined reference to `virtqueue_kick'
   drivers/iommu/virtio-iommu.o: In function `viommu_probe':
   (.text.viommu_probe+0x1a0): undefined reference to `virtio_check_driver_offered_feature'
   (.text.viommu_probe+0x248): undefined reference to `virtio_check_driver_offered_feature'
   (.text.viommu_probe+0x2ec): undefined reference to `virtio_check_driver_offered_feature'
   (.text.viommu_probe+0x328): undefined reference to `virtio_check_driver_offered_feature'
>> (.text.viommu_probe+0x428): undefined reference to `virtqueue_add_inbuf'
>> (.text.viommu_probe+0x440): undefined reference to `virtqueue_kick'
   drivers/iommu/virtio-iommu.o: In function `virtio_iommu_drv_exit':
   (.exit.text+0x18): undefined reference to `unregister_virtio_driver'

---
0-DAY kernel test infrastructure            Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation



Re: [RFC PATCH v3 0/3] Enable virtio_net to act as a backup for a passthru device

2018-02-21 Thread Alexander Duyck
On Wed, Feb 21, 2018 at 3:50 PM, Siwei Liu  wrote:
> I haven't checked emails for days and did not realize the new revision
> had already come out. And thank you for the effort, this revision
> really looks to be a step forward towards our use case and is close to
> what we wanted to do. A few questions in line.
>
> On Sat, Feb 17, 2018 at 9:12 AM, Alexander Duyck wrote:
>> On Fri, Feb 16, 2018 at 6:38 PM, Jakub Kicinski  wrote:
>>> On Fri, 16 Feb 2018 10:11:19 -0800, Sridhar Samudrala wrote:
 Patch 2 is in response to the community request for a 3 netdev
 solution.  However, it creates some issues we'll get into in a moment.
 It extends virtio_net to use alternate datapath when available and
 registered. When BACKUP feature is enabled, virtio_net driver creates
 an additional 'bypass' netdev that acts as a master device and controls
 2 slave devices.  The original virtio_net netdev is registered as
 'backup' netdev and a passthru/vf device with the same MAC gets
 registered as 'active' netdev. Both 'bypass' and 'backup' netdevs are
 associated with the same 'pci' device.  The user accesses the network
 interface via 'bypass' netdev. The 'bypass' netdev chooses 'active' netdev
 as default for transmits when it is available with link up and running.
>>>
>>> Thank you for doing this.
>>>
 We noticed a couple of issues with this approach during testing.
 - As both 'bypass' and 'backup' netdevs are associated with the same
   virtio pci device, udev tries to rename both of them with the same name
   and the 2nd rename will fail. This would be OK as long as the first
   netdev to be renamed is the 'bypass' netdev, but the order in which
   udev gets to rename the 2 netdevs is not reliable.
>>>
>>> Out of curiosity - why do you link the master netdev to the virtio
>>> struct device?
>>
>> The basic idea of all this is that we wanted this to work with an
>> existing VM image that was using virtio. As such we were trying to
>> make it so that the bypass interface takes the place of the original
>> virtio and get udev to rename the bypass to what the original
>> virtio_net was.
>
> Could it also be made possible to take over the config from the VF
> instead of virtio on an existing VM image? And get udev to rename the
> bypass netdev to what the original VF was. I'm not saying to tightly
> bind the bypass master to only virtio or VF, but I think we should
> provide both options to support different upgrade paths. Possibly we
> could tweak the device tree layout to reuse the same PCI slot for the
> master bypass netdev, such that udev would not get confused when
> renaming the device. The VF needs to use a different function slot
> afterwards. Perhaps we might need a special multiseat-like QEMU device
> for that purpose?
>
> In our case we'll upgrade the config from VF to virtio-bypass directly.

So if I am understanding what you are saying you are wanting to flip
the backup interface from the virtio to a VF. The problem is that
becomes a bit of a vendor lock-in solution since it would rely on a
specific VF driver. I would agree with Jiri that we don't want to go
down that path. We don't want every VF out there firing up its own
separate bond. Ideally you want the hypervisor to be able to manage
all of this which is why it makes sense to have virtio manage this and
why this is associated with the virtio_net interface.

The other bits get into more complexity than we are ready to handle
for now. I think I might have talked about something similar that I
was referring to as a "virtio-bond" where you would have a PCI/PCIe
tree topology that makes this easier to sort out, and the "virtio-bond"
would be used to handle coordination/configuration of a much more
complex interface.

>>
>>> FWIW two solutions that immediately come to mind is to export "backup"
>>> as phys_port_name of the backup virtio link and/or assign a name to the
>>> master like you are doing already.  I think team uses team%d and bond
>>> uses bond%d, soft naming of master devices seems quite natural in this
>>> case.
>>
>> I figured I had overlooked something like that.. Thanks for pointing
>> this out. Okay so I think the phys_port_name approach might resolve
>> the original issue. If I am reading things correctly what we end up
>> with is the master showing up as "ens1" for example and the backup
>> showing up as "ens1nbackup". Am I understanding that right?
>>
>> The problem with the team/bond%d approach is that it creates a new
>> netdevice and so it would require guest configuration changes.
>>
>>> IMHO phys_port_name == "backup" if BACKUP bit is set on slave virtio
>>> link is quite neat.
>>
>> I agree. For non-"backup" virtio_net devices would it be okay for us to
>> just return -EOPNOTSUPP? I assume it would be and that way the legacy
>> behavior could be maintained although the function still exists.
>>
 - When 

Re: [RFC PATCH v3 0/3] Enable virtio_net to act as a backup for a passthru device

2018-02-21 Thread Siwei Liu
I haven't checked emails for days and did not realize the new revision
had already come out. And thank you for the effort, this revision
really looks to be a step forward towards our use case and is close to
what we wanted to do. A few questions in line.

On Sat, Feb 17, 2018 at 9:12 AM, Alexander Duyck wrote:
> On Fri, Feb 16, 2018 at 6:38 PM, Jakub Kicinski  wrote:
>> On Fri, 16 Feb 2018 10:11:19 -0800, Sridhar Samudrala wrote:
>>> Patch 2 is in response to the community request for a 3 netdev
>>> solution.  However, it creates some issues we'll get into in a moment.
>>> It extends virtio_net to use alternate datapath when available and
>>> registered. When BACKUP feature is enabled, virtio_net driver creates
>>> an additional 'bypass' netdev that acts as a master device and controls
>>> 2 slave devices.  The original virtio_net netdev is registered as
>>> 'backup' netdev and a passthru/vf device with the same MAC gets
>>> registered as 'active' netdev. Both 'bypass' and 'backup' netdevs are
>>> associated with the same 'pci' device.  The user accesses the network
>>> interface via 'bypass' netdev. The 'bypass' netdev chooses 'active' netdev
>>> as default for transmits when it is available with link up and running.
>>
>> Thank you for doing this.
>>
>>> We noticed a couple of issues with this approach during testing.
>>> - As both 'bypass' and 'backup' netdevs are associated with the same
>>>   virtio pci device, udev tries to rename both of them with the same name
>>>   and the 2nd rename will fail. This would be OK as long as the first netdev
>>>   to be renamed is the 'bypass' netdev, but the order in which udev gets
>>>   to rename the 2 netdevs is not reliable.
>>
>> Out of curiosity - why do you link the master netdev to the virtio
>> struct device?
>
> The basic idea of all this is that we wanted this to work with an
> existing VM image that was using virtio. As such we were trying to
> make it so that the bypass interface takes the place of the original
> virtio and get udev to rename the bypass to what the original
> virtio_net was.

Could it also be made possible to take over the config from the VF
instead of virtio on an existing VM image? And get udev to rename the
bypass netdev to what the original VF was. I'm not saying to tightly
bind the bypass master to only virtio or VF, but I think we should
provide both options to support different upgrade paths. Possibly we
could tweak the device tree layout to reuse the same PCI slot for the
master bypass netdev, such that udev would not get confused when
renaming the device. The VF needs to use a different function slot
afterwards. Perhaps we might need a special multiseat-like QEMU device
for that purpose?

In our case we'll upgrade the config from VF to virtio-bypass directly.

>
>> FWIW two solutions that immediately come to mind is to export "backup"
>> as phys_port_name of the backup virtio link and/or assign a name to the
>> master like you are doing already.  I think team uses team%d and bond
>> uses bond%d, soft naming of master devices seems quite natural in this
>> case.
>
> I figured I had overlooked something like that.. Thanks for pointing
> this out. Okay so I think the phys_port_name approach might resolve
> the original issue. If I am reading things correctly what we end up
> with is the master showing up as "ens1" for example and the backup
> showing up as "ens1nbackup". Am I understanding that right?
>
> The problem with the team/bond%d approach is that it creates a new
> netdevice and so it would require guest configuration changes.
>
>> IMHO phys_port_name == "backup" if BACKUP bit is set on slave virtio
>> link is quite neat.
>
> I agree. For non-"backup" virtio_net devices would it be okay for us to
> just return -EOPNOTSUPP? I assume it would be and that way the legacy
> behavior could be maintained although the function still exists.
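
A sketch of that approach (hedged: this is not the merged
implementation, and virtnet_get_phys_port_name here is written for
illustration on top of the RFC's BACKUP feature bit):

    /* Sketch: expose phys_port_name "backup" only when the BACKUP
     * feature was negotiated; everything else keeps the legacy
     * -EOPNOTSUPP behavior, as if the ndo were not implemented. */
    static int virtnet_get_phys_port_name(struct net_device *dev,
                                          char *buf, size_t len)
    {
            struct virtnet_info *vi = netdev_priv(dev);

            if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_BACKUP))
                    return -EOPNOTSUPP;
            if (snprintf(buf, len, "backup") >= len)
                    return -EOPNOTSUPP;
            return 0;
    }

With something like this, udev's predictable naming would yield e.g.
"ens1" for the master and "ens1nbackup" for the slave, as discussed
above.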
>
>>> - When the 'active' netdev is unplugged OR not present on a destination
>>>   system after live migration, the user will see 2 virtio_net netdevs.
>>
>> That's necessary and expected, all configuration applies to the master
>> so master must exist.
>
> With the naming issue resolved this is the only item left outstanding.
> This becomes a matter of form vs function.
>
> The main complaint about the "3 netdev" solution is that it is a bit
> confusing to have the 2 netdevs present if the VF isn't there. The
> idea is that having the extra "master" netdev there if there isn't
> really a bond is a bit ugly.

Is this uglier in terms of user experience rather than functionality?
I don't want it dynamically changing between 2-netdev and 3-netdev
depending on the presence of the VF. That gets back to my original
question and suggestion earlier: why not just hide the lower netdevs
from udev renaming and such? What important observability benefits
would users get from exposing the lower netdevs?

Thanks,
-Siwei

>
> The downside of the "2 netdev" solution is that you have to deal 

Re: [PATCH 1/4] iommu: Add virtio-iommu driver

2018-02-21 Thread kbuild test robot
Hi Jean-Philippe,

I love your patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.16-rc2 next-20180221]
[cannot apply to iommu/next]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:
https://github.com/0day-ci/linux/commits/Jean-Philippe-Brucker/Add-virtio-iommu-driver/20180217-075417
config: ia64-allmodconfig (attached as .config)
compiler: ia64-linux-gcc (GCC) 7.2.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=ia64

All errors (new ones prefixed by >>):

   drivers/iommu/virtio-iommu.o: In function `viommu_send_reqs_sync':
>> virtio-iommu.c:(.text+0xa82): undefined reference to `virtqueue_add_sgs'
>> virtio-iommu.c:(.text+0xb52): undefined reference to `virtqueue_kick'
>> virtio-iommu.c:(.text+0xd82): undefined reference to `virtqueue_get_buf'
   drivers/iommu/virtio-iommu.o: In function `viommu_probe':
>> virtio-iommu.c:(.text+0x23f2): undefined reference to `virtio_check_driver_offered_feature'
   virtio-iommu.c:(.text+0x2572): undefined reference to `virtio_check_driver_offered_feature'
   virtio-iommu.c:(.text+0x26f2): undefined reference to `virtio_check_driver_offered_feature'
   drivers/iommu/virtio-iommu.o: In function `virtio_iommu_drv_init':
>> virtio-iommu.c:(.init.text+0x22): undefined reference to `register_virtio_driver'

---
0-DAY kernel test infrastructure            Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation



Re: [RFC PATCH v3 0/3] Enable virtio_net to act as a backup for a passthru device

2018-02-21 Thread Alexander Duyck
On Wed, Feb 21, 2018 at 11:38 AM, Jiri Pirko  wrote:
> Wed, Feb 21, 2018 at 06:56:35PM CET, alexander.du...@gmail.com wrote:
>>On Wed, Feb 21, 2018 at 8:58 AM, Jiri Pirko  wrote:
>>> Wed, Feb 21, 2018 at 05:49:49PM CET, alexander.du...@gmail.com wrote:
On Wed, Feb 21, 2018 at 8:11 AM, Jiri Pirko  wrote:
> Wed, Feb 21, 2018 at 04:56:48PM CET, alexander.du...@gmail.com wrote:
>>On Wed, Feb 21, 2018 at 1:51 AM, Jiri Pirko  wrote:
>>> Tue, Feb 20, 2018 at 11:33:56PM CET, kubak...@wp.pl wrote:
On Tue, 20 Feb 2018 21:14:10 +0100, Jiri Pirko wrote:
> Yeah, I can see it now :( I guess that the ship has sailed and we are
> stuck with this ugly thing forever...
>
> Could you at least make some common code that is shared in between
> netvsc and virtio_net so this is handled in exactly the same way in
> both?

IMHO netvsc is a vendor specific driver which made a mistake on what
behaviour it provides (or tried to align itself with Windows SR-IOV).
Let's not make a far, far more commonly deployed and important driver
(virtio) bug-compatible with netvsc.
>>>
>>> Yeah. netvsc solution is a dangerous precedent here and in my opinion
>>> it was a huge mistake to merge it. I personally would vote to unmerge it
>>> and make the solution based on team/bond.
>>>
>>>

To Jiri's initial comments, I feel the same way, in fact I've talked to
the NetworkManager guys to get auto-bonding based on MACs handled in
user space.  I think it may very well get done in next versions of NM,
but isn't done yet.  Stephen also raised the point that not everybody is
using NM.
>>>
>>> Can be done in NM, networkd or other network management tools.
>>> Even easier to do this in teamd and let them all benefit.
>>>
>>> Actually, I took a stab to implement this in teamd. Took me like an hour
>>> and half.
>>>
>>> You can just run teamd with config option "kidnap" like this:
>>> # teamd/teamd -c '{"kidnap": true }'
>>>
>>> Whenever teamd sees another netdev appear with the same mac as its
>>> own, or whenever teamd sees another netdev change its mac to match,
>>> it enslaves it.
>>>
>>> Here's the patch (quick and dirty):
>>>
>>> Subject: [patch teamd] teamd: introduce kidnap feature
>>>
>>> Signed-off-by: Jiri Pirko 
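
The patch body is elided in this archive. For illustration only (this
is not Jiri's teamd patch), the core "kidnap" action amounts to the
IFLA_MASTER enslave that "ip link set DEV master MASTER" performs; a
minimal user-space sketch:

    /* Hedged sketch: enslave a netdev to a team/bypass master via one
     * RTM_NEWLINK request carrying IFLA_MASTER. */
    #include <linux/netlink.h>
    #include <linux/rtnetlink.h>
    #include <net/if.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    static int enslave(const char *slave, const char *master)
    {
            struct {
                    struct nlmsghdr nh;
                    struct ifinfomsg ifi;
                    char attrbuf[RTA_SPACE(sizeof(unsigned int))];
            } req;
            struct sockaddr_nl sa = { .nl_family = AF_NETLINK };
            unsigned int midx = if_nametoindex(master);
            struct rtattr *rta;
            int fd, err;

            fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
            if (fd < 0)
                    return -1;
            if (!midx) {
                    close(fd);
                    return -1;
            }

            memset(&req, 0, sizeof(req));
            req.nh.nlmsg_len = NLMSG_LENGTH(sizeof(req.ifi));
            req.nh.nlmsg_type = RTM_NEWLINK;
            req.nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
            req.ifi.ifi_family = AF_UNSPEC;
            req.ifi.ifi_index = if_nametoindex(slave);

            /* append IFLA_MASTER = ifindex of the master device */
            rta = (struct rtattr *)((char *)&req +
                                    NLMSG_ALIGN(req.nh.nlmsg_len));
            rta->rta_type = IFLA_MASTER;
            rta->rta_len = RTA_LENGTH(sizeof(midx));
            memcpy(RTA_DATA(rta), &midx, sizeof(midx));
            req.nh.nlmsg_len = NLMSG_ALIGN(req.nh.nlmsg_len) + rta->rta_len;

            err = sendto(fd, &req, req.nh.nlmsg_len, 0,
                         (struct sockaddr *)&sa, sizeof(sa));
            close(fd);
            return err < 0 ? -1 : 0;
    }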
>>
>>So this doesn't really address the original problem we were trying to
>>solve. You asked earlier why the netdev name mattered and it mostly
>>has to do with configuration. Specifically what our patch is
>>attempting to resolve is the issue of how to allow a cloud provider to
>>upgrade their customer to SR-IOV support and live migration without
>>requiring them to reconfigure their guest. So the general idea with
>>our patch is to take a VM that is running with virtio_net only and
>>allow it to instead spawn a virtio_bypass master using the same netdev
>>name as the original virtio, and then have the virtio_net and VF come
>>up and be enslaved by the bypass interface. Doing it this way we can
>>allow for multi-vendor SR-IOV live migration support using a guest
>>that was originally configured for virtio only.
>>
>>The problem with your solution is we already have teaming and bonding
>>as you said. There is already a write-up from Red Hat on how to do it
>>(https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.1/html/virtual_machine_management_guide/sect-migrating_virtual_machines_between_hosts).
>>That is all well and good as long as you are willing to keep around
>>two VM images, one for virtio, and one for SR-IOV with live migration.
>
> You don't need 2 images. You need only one. The one with the team setup.
> That's it. If another netdev with the same mac appears, teamd will
> enslave it and run traffic on it. If not, ok, you'll go only through
> virtio_net.

Isn't that going to cause the routing table to get messed up when we
rearrange the netdevs? We don't want to have a significant disruption
in traffic when we are adding/removing the VF. It seems like we would
need to invalidate any entries that were configured for the virtio_net
and reestablish them on the new team interface. Part of the criteria
we have been working with is that we should be able to transition from
having a VF to not or vice versa without seeing any significant
disruption in the traffic.
>>>
>>> What? You have routes on the team netdev. virtio_net and VF are only
>>> slaves. What are you talking about? I don't get it :/
>>
>>So lets walk though this by example. The general idea of the base case
>>for all this is somebody starting with virtio_net, we will call the

Re: [RFC PATCH v3 0/3] Enable virtio_net to act as a backup for a passthru device

2018-02-21 Thread Jiri Pirko
Wed, Feb 21, 2018 at 06:56:35PM CET, alexander.du...@gmail.com wrote:
>On Wed, Feb 21, 2018 at 8:58 AM, Jiri Pirko  wrote:
>> Wed, Feb 21, 2018 at 05:49:49PM CET, alexander.du...@gmail.com wrote:
>>>On Wed, Feb 21, 2018 at 8:11 AM, Jiri Pirko  wrote:
 Wed, Feb 21, 2018 at 04:56:48PM CET, alexander.du...@gmail.com wrote:
>On Wed, Feb 21, 2018 at 1:51 AM, Jiri Pirko  wrote:
>> Tue, Feb 20, 2018 at 11:33:56PM CET, kubak...@wp.pl wrote:
>>>On Tue, 20 Feb 2018 21:14:10 +0100, Jiri Pirko wrote:
 Yeah, I can see it now :( I guess that the ship has sailed and we are
 stuck with this ugly thing forever...

 Could you at least make some common code that is shared in between
 netvsc and virtio_net so this is handled in exactly the same way in
 both?
>>>
>>>IMHO netvsc is a vendor specific driver which made a mistake on what
>>>behaviour it provides (or tried to align itself with Windows SR-IOV).
>>>Let's not make a far, far more commonly deployed and important driver
>>>(virtio) bug-compatible with netvsc.
>>
>> Yeah. netvsc solution is a dangerous precedent here and in my opinion
>> it was a huge mistake to merge it. I personally would vote to unmerge it
>> and make the solution based on team/bond.
>>
>>
>>>
>>>To Jiri's initial comments, I feel the same way, in fact I've talked to
>>>the NetworkManager guys to get auto-bonding based on MACs handled in
>>>user space.  I think it may very well get done in next versions of NM,
>>>but isn't done yet.  Stephen also raised the point that not everybody is
>>>using NM.
>>
>> Can be done in NM, networkd or other network management tools.
>> Even easier to do this in teamd and let them all benefit.
>>
>> Actually, I took a stab to implement this in teamd. Took me like an hour
>> and half.
>>
>> You can just run teamd with config option "kidnap" like this:
>> # teamd/teamd -c '{"kidnap": true }'
>>
>> Whenever teamd sees another netdev appear with the same mac as its
>> own, or whenever teamd sees another netdev change its mac to match,
>> it enslaves it.
>>
>> Here's the patch (quick and dirty):
>>
>> Subject: [patch teamd] teamd: introduce kidnap feature
>>
>> Signed-off-by: Jiri Pirko 
>
>So this doesn't really address the original problem we were trying to
>solve. You asked earlier why the netdev name mattered and it mostly
>has to do with configuration. Specifically what our patch is
>attempting to resolve is the issue of how to allow a cloud provider to
>upgrade their customer to SR-IOV support and live migration without
>requiring them to reconfigure their guest. So the general idea with
>our patch is to take a VM that is running with virtio_net only and
>allow it to instead spawn a virtio_bypass master using the same netdev
>name as the original virtio, and then have the virtio_net and VF come
>up and be enslaved by the bypass interface. Doing it this way we can
>allow for multi-vendor SR-IOV live migration support using a guest
>that was originally configured for virtio only.
>
>The problem with your solution is we already have teaming and bonding
>as you said. There is already a write-up from Red Hat on how to do it
>(https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.1/html/virtual_machine_management_guide/sect-migrating_virtual_machines_between_hosts).
>That is all well and good as long as you are willing to keep around
>two VM images, one for virtio, and one for SR-IOV with live migration.

 You don't need 2 images. You need only one. The one with the team setup.
 That's it. If another netdev with the same mac appears, teamd will
 enslave it and run traffic on it. If not, ok, you'll go only through
 virtio_net.
>>>
>>>Isn't that going to cause the routing table to get messed up when we
>>>rearrange the netdevs? We don't want to have a significant disruption
>>>in traffic when we are adding/removing the VF. It seems like we would
>>>need to invalidate any entries that were configured for the virtio_net
>>>and reestablish them on the new team interface. Part of the criteria
>>>we have been working with is that we should be able to transition from
>>>having a VF to not or vice versa without seeing any significant
>>>disruption in the traffic.
>>
>> What? You have routes on the team netdev. virtio_net and VF are only
>> slaves. What are you talking about? I don't get it :/
>
>So lets walk though this by example. The general idea of the base case
>for all this is somebody starting with virtio_net, we will call the
>interface "ens1" for now. It comes up and is assigned a dhcp address
>and everything works as expected. Now in order to get better
>performance we want to add a VF 

Re: [PATCH 1/4] iommu: Add virtio-iommu driver

2018-02-21 Thread kbuild test robot
Hi Jean-Philippe,

I love your patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.16-rc2 next-20180221]
[cannot apply to iommu/next]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:
https://github.com/0day-ci/linux/commits/Jean-Philippe-Brucker/Add-virtio-iommu-driver/20180217-075417
config: arm64-allmodconfig (attached as .config)
compiler: aarch64-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=arm64

All errors (new ones prefixed by >>):

   aarch64-linux-gnu-ld: arch/arm64/kernel/head.o: relocation R_AARCH64_ABS32 against `_kernel_offset_le_lo32' can not be used when making a shared object
   arch/arm64/kernel/head.o: In function `kimage_vaddr':
   (.idmap.text+0x0): dangerous relocation: unsupported relocation
   arch/arm64/kernel/head.o: In function `__primary_switch':
   (.idmap.text+0x340): dangerous relocation: unsupported relocation
   (.idmap.text+0x348): dangerous relocation: unsupported relocation
   drivers/iommu/virtio-iommu.o: In function `viommu_probe':
   virtio-iommu.c:(.text+0xbdc): undefined reference to `virtio_check_driver_offered_feature'
   virtio-iommu.c:(.text+0xcfc): undefined reference to `virtio_check_driver_offered_feature'
   virtio-iommu.c:(.text+0xe10): undefined reference to `virtio_check_driver_offered_feature'
   drivers/iommu/virtio-iommu.o: In function `viommu_send_reqs_sync':
   virtio-iommu.c:(.text+0x130c): undefined reference to `virtqueue_add_sgs'
   virtio-iommu.c:(.text+0x1398): undefined reference to `virtqueue_kick'
   virtio-iommu.c:(.text+0x14d4): undefined reference to `virtqueue_get_buf'
   drivers/iommu/virtio-iommu.o: In function `virtio_iommu_drv_init':
   virtio-iommu.c:(.init.text+0x1c): undefined reference to `register_virtio_driver'
   drivers/iommu/virtio-iommu.o: In function `virtio_iommu_drv_exit':
>> virtio-iommu.c:(.exit.text+0x14): undefined reference to `unregister_virtio_driver'

---
0-DAY kernel test infrastructure            Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation



Re: [RFC PATCH v3 0/3] Enable virtio_net to act as a backup for a passthru device

2018-02-21 Thread Alexander Duyck
On Wed, Feb 21, 2018 at 8:58 AM, Jiri Pirko  wrote:
> Wed, Feb 21, 2018 at 05:49:49PM CET, alexander.du...@gmail.com wrote:
>>On Wed, Feb 21, 2018 at 8:11 AM, Jiri Pirko  wrote:
>>> Wed, Feb 21, 2018 at 04:56:48PM CET, alexander.du...@gmail.com wrote:
On Wed, Feb 21, 2018 at 1:51 AM, Jiri Pirko  wrote:
> Tue, Feb 20, 2018 at 11:33:56PM CET, kubak...@wp.pl wrote:
>>On Tue, 20 Feb 2018 21:14:10 +0100, Jiri Pirko wrote:
>>> Yeah, I can see it now :( I guess that the ship has sailed and we are
>>> stuck with this ugly thing forever...
>>>
>>> Could you at least make some common code that is shared in between
>>> netvsc and virtio_net so this is handled in exactly the same way in both?
>>
>>IMHO netvsc is a vendor specific driver which made a mistake on what
>>behaviour it provides (or tried to align itself with Windows SR-IOV).
>>Let's not make a far, far more commonly deployed and important driver
>>(virtio) bug-compatible with netvsc.
>
> Yeah. netvsc solution is a dangerous precedent here and in my opinion
> it was a huge mistake to merge it. I personally would vote to unmerge it
> and make the solution based on team/bond.
>
>
>>
>>To Jiri's initial comments, I feel the same way, in fact I've talked to
>>the NetworkManager guys to get auto-bonding based on MACs handled in
>>user space.  I think it may very well get done in next versions of NM,
>>but isn't done yet.  Stephen also raised the point that not everybody is
>>using NM.
>
> Can be done in NM, networkd or other network management tools.
> Even easier to do this in teamd and let them all benefit.
>
> Actually, I took a stab to implement this in teamd. Took me like an hour
> and half.
>
> You can just run teamd with config option "kidnap" like this:
> # teamd/teamd -c '{"kidnap": true }'
>
> Whenever teamd sees another netdev appear with the same mac as its
> own, or whenever teamd sees another netdev change its mac to match,
> it enslaves it.
>
> Here's the patch (quick and dirty):
>
> Subject: [patch teamd] teamd: introduce kidnap feature
>
> Signed-off-by: Jiri Pirko 

So this doesn't really address the original problem we were trying to
solve. You asked earlier why the netdev name mattered and it mostly
has to do with configuration. Specifically what our patch is
attempting to resolve is the issue of how to allow a cloud provider to
upgrade their customer to SR-IOV support and live migration without
requiring them to reconfigure their guest. So the general idea with
our patch is to take a VM that is running with virtio_net only and
allow it to instead spawn a virtio_bypass master using the same netdev
name as the original virtio, and then have the virtio_net and VF come
up and be enslaved by the bypass interface. Doing it this way we can
allow for multi-vendor SR-IOV live migration support using a guest
that was originally configured for virtio only.

The problem with your solution is we already have teaming and bonding
as you said. There is already a write-up from Red Hat on how to do it
(https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.1/html/virtual_machine_management_guide/sect-migrating_virtual_machines_between_hosts).
That is all well and good as long as you are willing to keep around
two VM images, one for virtio, and one for SR-IOV with live migration.
>>>
>>> You don't need 2 images. You need only one. The one with the team setup.
>>> That's it. If another netdev with the same mac appears, teamd will
>>> enslave it and run traffic on it. If not, ok, you'll go only through
>>> virtio_net.
>>
>>Isn't that going to cause the routing table to get messed up when we
>>rearrange the netdevs? We don't want to have a significant disruption
>>in traffic when we are adding/removing the VF. It seems like we would
>>need to invalidate any entries that were configured for the virtio_net
>>and reestablish them on the new team interface. Part of the criteria
>>we have been working with is that we should be able to transition from
>>having a VF to not or vice versa without seeing any significant
>>disruption in the traffic.
>
> What? You have routes on the team netdev. virtio_net and VF are only
> slaves. What are you talking about? I don't get it :/

So lets walk though this by example. The general idea of the base case
for all this is somebody starting with virtio_net, we will call the
interface "ens1" for now. It comes up and is assigned a dhcp address
and everything works as expected. Now in order to get better
performance we want to add a VF "ens2", but we don't want a new IP
address. Now if I understand correctly what will happen is that when
"ens2" appears on the system teamd will then create a new team

Re: [RFC PATCH v3 0/3] Enable virtio_net to act as a backup for a passthru device

2018-02-21 Thread Jiri Pirko
Wed, Feb 21, 2018 at 05:49:49PM CET, alexander.du...@gmail.com wrote:
>On Wed, Feb 21, 2018 at 8:11 AM, Jiri Pirko  wrote:
>> Wed, Feb 21, 2018 at 04:56:48PM CET, alexander.du...@gmail.com wrote:
>>>On Wed, Feb 21, 2018 at 1:51 AM, Jiri Pirko  wrote:
 Tue, Feb 20, 2018 at 11:33:56PM CET, kubak...@wp.pl wrote:
>On Tue, 20 Feb 2018 21:14:10 +0100, Jiri Pirko wrote:
>> Yeah, I can see it now :( I guess that the ship has sailed and we are
>> stuck with this ugly thing forever...
>>
>> Could you at least make some common code that is shared in between
>> netvsc and virtio_net so this is handled in exactly the same way in both?
>
>IMHO netvsc is a vendor specific driver which made a mistake on what
>behaviour it provides (or tried to align itself with Windows SR-IOV).
>Let's not make a far, far more commonly deployed and important driver
>(virtio) bug-compatible with netvsc.

 Yeah. netvsc solution is a dangerous precedent here and in my opinion
 it was a huge mistake to merge it. I personally would vote to unmerge it
 and make the solution based on team/bond.


>
>To Jiri's initial comments, I feel the same way, in fact I've talked to
>the NetworkManager guys to get auto-bonding based on MACs handled in
>user space.  I think it may very well get done in next versions of NM,
>but isn't done yet.  Stephen also raised the point that not everybody is
>using NM.

 Can be done in NM, networkd or other network management tools.
 Even easier to do this in teamd and let them all benefit.

 Actually, I took a stab to implement this in teamd. Took me like an hour
 and half.

 You can just run teamd with config option "kidnap" like this:
 # teamd/teamd -c '{"kidnap": true }'

 Whenever teamd sees another netdev appear with the same mac as its
 own, or whenever teamd sees another netdev change its mac to match,
 it enslaves it.

 Here's the patch (quick and dirty):

 Subject: [patch teamd] teamd: introduce kidnap feature

 Signed-off-by: Jiri Pirko 
>>>
>>>So this doesn't really address the original problem we were trying to
>>>solve. You asked earlier why the netdev name mattered and it mostly
>>>has to do with configuration. Specifically what our patch is
>>>attempting to resolve is the issue of how to allow a cloud provider to
>>>upgrade their customer to SR-IOV support and live migration without
>>>requiring them to reconfigure their guest. So the general idea with
>>>our patch is to take a VM that is running with virtio_net only and
>>>allow it to instead spawn a virtio_bypass master using the same netdev
>>>name as the original virtio, and then have the virtio_net and VF come
>>>up and be enslaved by the bypass interface. Doing it this way we can
>>>allow for multi-vendor SR-IOV live migration support using a guest
>>>that was originally configured for virtio only.
>>>
>>>The problem with your solution is we already have teaming and bonding
>>>as you said. There is already a write-up from Red Hat on how to do it
>>>(https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.1/html/virtual_machine_management_guide/sect-migrating_virtual_machines_between_hosts).
>>>That is all well and good as long as you are willing to keep around
>>>two VM images, one for virtio, and one for SR-IOV with live migration.
>>
>> You don't need 2 images. You need only one. The one with the team setup.
>> That's it. If another netdev with the same mac appears, teamd will
>> enslave it and run traffic on it. If not, ok, you'll go only through
>> virtio_net.
>
>Isn't that going to cause the routing table to get messed up when we
>rearrange the netdevs? We don't want to have a significant disruption
>in traffic when we are adding/removing the VF. It seems like we would
>need to invalidate any entries that were configured for the virtio_net
>and reestablish them on the new team interface. Part of the criteria
>we have been working with is that we should be able to transition from
>having a VF to not or vice versa without seeing any significant
>disruption in the traffic.

What? You have routes on the team netdev. virtio_net and VF are only
slaves. What are you talking about? I don't get it :/


>
>Also how does this handle any static configuration? I am assuming that
>everything here assumes the team will be brought up as soon as it is
>seen and assigned a DHCP address.

Again. You configure whatever you need on the team netdev.


>
>The solution as you have proposed seems problematic at best. I don't
>see how the team solution works without introducing some sort of
>traffic disruption to either add/remove the VF and bring up/tear down
>the team interface. At that point we might as well just give up on
>this piece of live migration support entirely since the disruption was
>what we were trying to avoid. We 

Re: [RFC PATCH v3 0/3] Enable virtio_net to act as a backup for a passthru device

2018-02-21 Thread Alexander Duyck
On Wed, Feb 21, 2018 at 8:11 AM, Jiri Pirko  wrote:
> Wed, Feb 21, 2018 at 04:56:48PM CET, alexander.du...@gmail.com wrote:
>>On Wed, Feb 21, 2018 at 1:51 AM, Jiri Pirko  wrote:
>>> Tue, Feb 20, 2018 at 11:33:56PM CET, kubak...@wp.pl wrote:
On Tue, 20 Feb 2018 21:14:10 +0100, Jiri Pirko wrote:
> Yeah, I can see it now :( I guess that the ship has sailed and we are
> stuck with this ugly thing forever...
>
> Could you at least make some common code that is shared in between
> netvsc and virtio_net so this is handled in exactly the same way in both?

IMHO netvsc is a vendor specific driver which made a mistake on what
behaviour it provides (or tried to align itself with Windows SR-IOV).
Let's not make a far, far more commonly deployed and important driver
(virtio) bug-compatible with netvsc.
>>>
>>> Yeah. netvsc solution is a dangerous precedent here and in my opinion
>>> it was a huge mistake to merge it. I personally would vote to unmerge it
>>> and make the solution based on team/bond.
>>>
>>>

To Jiri's initial comments, I feel the same way, in fact I've talked to
the NetworkManager guys to get auto-bonding based on MACs handled in
user space.  I think it may very well get done in next versions of NM,
but isn't done yet.  Stephen also raised the point that not everybody is
using NM.
>>>
>>> Can be done in NM, networkd or other network management tools.
>>> Even easier to do this in teamd and let them all benefit.
>>>
>>> Actually, I took a stab to implement this in teamd. Took me like an hour
>>> and half.
>>>
>>> You can just run teamd with config option "kidnap" like this:
>>> # teamd/teamd -c '{"kidnap": true }'
>>>
>>> Whenever teamd sees another netdev appear with the same mac as its
>>> own, or whenever teamd sees another netdev change its mac to match,
>>> it enslaves it.
>>>
>>> Here's the patch (quick and dirty):
>>>
>>> Subject: [patch teamd] teamd: introduce kidnap feature
>>>
>>> Signed-off-by: Jiri Pirko 
>>
>>So this doesn't really address the original problem we were trying to
>>solve. You asked earlier why the netdev name mattered and it mostly
>>has to do with configuration. Specifically what our patch is
>>attempting to resolve is the issue of how to allow a cloud provider to
>>upgrade their customer to SR-IOV support and live migration without
>>requiring them to reconfigure their guest. So the general idea with
>>our patch is to take a VM that is running with virtio_net only and
>>allow it to instead spawn a virtio_bypass master using the same netdev
>>name as the original virtio, and then have the virtio_net and VF come
>>up and be enslaved by the bypass interface. Doing it this way we can
>>allow for multi-vendor SR-IOV live migration support using a guest
>>that was originally configured for virtio only.
>>
>>The problem with your solution is we already have teaming and bonding
>>as you said. There is already a write-up from Red Hat on how to do it
>>(https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.1/html/virtual_machine_management_guide/sect-migrating_virtual_machines_between_hosts).
>>That is all well and good as long as you are willing to keep around
>>two VM images, one for virtio, and one for SR-IOV with live migration.
>
> You don't need 2 images. You need only one. The one with the team setup.
> That's it. If another netdev with the same mac appears, teamd will
> enslave it and run traffic on it. If not, ok, you'll go only through
> virtio_net.

Isn't that going to cause the routing table to get messed up when we
rearrange the netdevs? We don't want to have a significant disruption
in traffic when we are adding/removing the VF. It seems like we would
need to invalidate any entries that were configured for the virtio_net
and reestablish them on the new team interface. Part of the criteria
we have been working with is that we should be able to transition from
having a VF to not having one, or vice versa, without seeing any
significant disruption in the traffic.

Also how does this handle any static configuration? I assume
everything here expects the team to be brought up as soon as it is
seen and assigned a DHCP address.

The solution as you have proposed seems problematic at best. I don't
see how the team solution works without introducing some sort of
traffic disruption to either add/remove the VF and bring up/tear down
the team interface. At that point we might as well just give up on
this piece of live migration support entirely since the disruption was
what we were trying to avoid. We might as well just hotplug out the VF
and hotplug in a virtio at the same bus, device, and function number and
just let udev take care of renaming it for us. The idea was supposed
to be a seamless transition between the two interfaces.
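
(For reference, that fallback would look something like this on the QEMU
monitor, with purely illustrative device IDs, netdev names and PCI
addresses, and assuming a host netdev "hostnet0" was already created
with netdev_add:

(qemu) device_del vf0
(qemu) device_add virtio-net-pci,netdev=hostnet0,mac=52:54:00:ae:b3:10,bus=pci.0,addr=0x3,id=nic0

...which is exactly the kind of teardown/bring-up churn we are trying to
avoid in the first place.)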

Re: [RFC PATCH v3 0/3] Enable virtio_net to act as a backup for a passthru device

2018-02-21 Thread Jiri Pirko
Wed, Feb 21, 2018 at 04:56:48PM CET, alexander.du...@gmail.com wrote:
>On Wed, Feb 21, 2018 at 1:51 AM, Jiri Pirko  wrote:
>> Tue, Feb 20, 2018 at 11:33:56PM CET, kubak...@wp.pl wrote:
>>>On Tue, 20 Feb 2018 21:14:10 +0100, Jiri Pirko wrote:
>>>> Yeah, I can see it now :( I guess that the ship has sailed and we are
>>>> stuck with this ugly thing forever...
>>>>
>>>> Could you at least make some common code that is shared between
>>>> netvsc and virtio_net so this is handled in exactly the same way in both?
>>>
>>>IMHO netvsc is a vendor-specific driver which made a mistake in the
>>>behaviour it provides (or tried to align itself with Windows SR-IOV).
>>>Let's not make a far, far more commonly deployed and important driver
>>>(virtio) bug-compatible with netvsc.
>>
>> Yeah. The netvsc solution is a dangerous precedent here, and in my opinion
>> it was a huge mistake to merge it. I personally would vote to unmerge it
>> and make the solution based on team/bond.
>>
>>
>>>
>>>To Jiri's initial comments, I feel the same way, in fact I've talked to
>>>the NetworkManager guys to get auto-bonding based on MACs handled in
>>>user space.  I think it may very well get done in next versions of NM,
>>>but isn't done yet.  Stephen also raised the point that not everybody is
>>>using NM.
>>
>> Can be done in NM, networkd, or other network management tools.
>> Even easier to do this in teamd and let them all benefit.
>>
>> Actually, I took a stab at implementing this in teamd. Took me about
>> an hour and a half.
>>
>> You can just run teamd with the "kidnap" config option, like this:
>> # teamd/teamd -c '{"kidnap": true }'
>>
>> Whenever teamd sees another netdev appear with the same mac as its own,
>> or sees an existing netdev change its mac to match,
>> it enslaves it.
>>
>> Here's the patch (quick and dirty):
>>
>> Subject: [patch teamd] teamd: introduce kidnap feature
>>
>> Signed-off-by: Jiri Pirko 
>
>So this doesn't really address the original problem we were trying to
>solve. You asked earlier why the netdev name mattered and it mostly
>has to do with configuration. Specifically what our patch is
>attempting to resolve is the issue of how to allow a cloud provider to
>upgrade their customer to SR-IOV support and live migration without
>requiring them to reconfigure their guest. So the general idea with
>our patch is to take a VM that is running with virtio_net only and
>allow it to instead spawn a virtio_bypass master using the same netdev
>name as the original virtio, and then have the virtio_net and VF come
>up and be enslaved by the bypass interface. Doing it this way we can
>allow for multi-vendor SR-IOV live migration support using a guest
>that was originally configured for virtio only.
>
>The problem with your solution is we already have teaming and bonding
>as you said. There is already a write-up from Red Hat on how to do it
>(https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.1/html/virtual_machine_management_guide/sect-migrating_virtual_machines_between_hosts).
>That is all well and good as long as you are willing to keep around
>two VM images, one for virtio, and one for SR-IOV with live migration.

You don't need 2 images. You need only one. The one with the team setup.
That's it. If another netdev with the same mac appears, teamd will
enslave it and run traffic on it. If not, ok, you'll go only through
virtio_net.
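
To be concrete, that one image could ship a teamd config along these
lines ("kidnap" being the new option from the patch below; the file
path and device name are just examples, the rest is standard teamd
config):

# cat /etc/teamd/team0.conf
{
    "device": "team0",
    "runner": {"name": "activebackup"},
    "kidnap": true
}
# teamd -d -f /etc/teamd/team0.conf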


>The problem is nobody wants to do that. What they want is to maintain
>one guest image; if they decide to upgrade to SR-IOV they still
>want their live migration, and they don't want to have to reconfigure
>the guest.
>
>That said it does seem to make the existing Red Hat solution easier to
>manage since you wouldn't be guessing at the ifname, so I have provided
>some feedback below.
>
>> ---
>>  include/team.h             |  7 +++++++
>>  libteam/ifinfo.c           | 20 ++++++++++++++++++++
>>  teamd/teamd.c              | 17 +++++++++++++++++
>>  teamd/teamd.h              |  5 +++++
>>  teamd/teamd_events.c       | 17 +++++++++++++++++
>>  teamd/teamd_ifinfo_watch.c |  9 +++++++++
>>  teamd/teamd_per_port.c     |  7 ++++++-
>>  7 files changed, 81 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/team.h b/include/team.h
>> index 9ae517d..b0c19c8 100644
>> --- a/include/team.h
>> +++ b/include/team.h
>> @@ -137,6 +137,13 @@ struct team_ifinfo *team_get_next_ifinfo(struct team_handle *th,
>>  #define team_for_each_ifinfo(ifinfo, th)   \
>> for (ifinfo = team_get_next_ifinfo(th, NULL); ifinfo;   \
>>  ifinfo = team_get_next_ifinfo(th, ifinfo))
>> +
>> +struct team_ifinfo *team_get_next_unlinked_ifinfo(struct team_handle *th,
>> + struct team_ifinfo *ifinfo);
>> +#define team_for_each_unlinked_ifinfo(ifinfo, th)  \
>> +   for (ifinfo = team_get_next_unlinked_ifinfo(th, NULL); ifinfo;  \
>> +ifinfo = team_get_next_unlinked_ifinfo(th, ifinfo))

Re: [RFC PATCH v3 0/3] Enable virtio_net to act as a backup for a passthru device

2018-02-21 Thread Alexander Duyck
On Wed, Feb 21, 2018 at 1:51 AM, Jiri Pirko  wrote:
> Tue, Feb 20, 2018 at 11:33:56PM CET, kubak...@wp.pl wrote:
>>On Tue, 20 Feb 2018 21:14:10 +0100, Jiri Pirko wrote:
>>> Yeah, I can see it now :( I guess that the ship has sailed and we are
>>> stuck with this ugly thing forever...
>>>
>>> Could you at least make some common code that is shared between
>>> netvsc and virtio_net so this is handled in exactly the same way in both?
>>
>>IMHO netvsc is a vendor-specific driver which made a mistake in the
>>behaviour it provides (or tried to align itself with Windows SR-IOV).
>>Let's not make a far, far more commonly deployed and important driver
>>(virtio) bug-compatible with netvsc.
>
> Yeah. The netvsc solution is a dangerous precedent here, and in my opinion
> it was a huge mistake to merge it. I personally would vote to unmerge it
> and make the solution based on team/bond.
>
>
>>
>>To Jiri's initial comments, I feel the same way, in fact I've talked to
>>the NetworkManager guys to get auto-bonding based on MACs handled in
>>user space.  I think it may very well get done in next versions of NM,
>>but isn't done yet.  Stephen also raised the point that not everybody is
>>using NM.
>
> Can be done in NM, networkd, or other network management tools.
> Even easier to do this in teamd and let them all benefit.
>
> Actually, I took a stab at implementing this in teamd. Took me about
> an hour and a half.
>
> You can just run teamd with the "kidnap" config option, like this:
> # teamd/teamd -c '{"kidnap": true }'
>
> Whenever teamd sees another netdev appear with the same mac as its own,
> or sees an existing netdev change its mac to match,
> it enslaves it.
>
> Here's the patch (quick and dirty):
>
> Subject: [patch teamd] teamd: introduce kidnap feature
>
> Signed-off-by: Jiri Pirko 

So this doesn't really address the original problem we were trying to
solve. You asked earlier why the netdev name mattered and it mostly
has to do with configuration. Specifically what our patch is
attempting to resolve is the issue of how to allow a cloud provider to
upgrade their customer to SR-IOV support and live migration without
requiring them to reconfigure their guest. So the general idea with
our patch is to take a VM that is running with virtio_net only and
allow it to instead spawn a virtio_bypass master using the same netdev
name as the original virtio, and then have the virtio_net and VF come
up and be enslaved by the bypass interface. Doing it this way we can
allow for multi-vendor SR-IOV live migration support using a guest
that was originally configured for virtio only.
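
In other words (names purely illustrative), the guest would end up with:

eth0 (virtio_bypass master: keeps the original name, IP config and routes)
  |- virtio_net ("backup" slave, always present)
  '- VF ("active" slave, hotplugged out before migration and back in after)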

The problem with your solution is we already have teaming and bonding
as you said. There is already a write-up from Red Hat on how to do it
(https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.1/html/virtual_machine_management_guide/sect-migrating_virtual_machines_between_hosts).
That is all well and good as long as you are willing to keep around
two VM images, one for virtio, and one for SR-IOV with live migration.
The problem is nobody wants to do that. What they want is to maintain
one guest image; if they decide to upgrade to SR-IOV they still
want their live migration, and they don't want to have to reconfigure
the guest.

That said it does seem to make the existing Red Hat solution easier to
manage since you wouldn't be guessing at the ifname, so I have provided
some feedback below.

> ---
>  include/team.h             |  7 +++++++
>  libteam/ifinfo.c           | 20 ++++++++++++++++++++
>  teamd/teamd.c              | 17 +++++++++++++++++
>  teamd/teamd.h              |  5 +++++
>  teamd/teamd_events.c       | 17 +++++++++++++++++
>  teamd/teamd_ifinfo_watch.c |  9 +++++++++
>  teamd/teamd_per_port.c     |  7 ++++++-
>  7 files changed, 81 insertions(+), 1 deletion(-)
>
> diff --git a/include/team.h b/include/team.h
> index 9ae517d..b0c19c8 100644
> --- a/include/team.h
> +++ b/include/team.h
> @@ -137,6 +137,13 @@ struct team_ifinfo *team_get_next_ifinfo(struct team_handle *th,
>  #define team_for_each_ifinfo(ifinfo, th)   \
> for (ifinfo = team_get_next_ifinfo(th, NULL); ifinfo;   \
>  ifinfo = team_get_next_ifinfo(th, ifinfo))
> +
> +struct team_ifinfo *team_get_next_unlinked_ifinfo(struct team_handle *th,
> + struct team_ifinfo *ifinfo);
> +#define team_for_each_unlinked_ifinfo(ifinfo, th)  \
> +   for (ifinfo = team_get_next_unlinked_ifinfo(th, NULL); ifinfo;  \
> +ifinfo = team_get_next_unlinked_ifinfo(th, ifinfo))
> +
>  /* ifinfo getters */
>  bool team_is_ifinfo_removed(struct team_ifinfo *ifinfo);
>  uint32_t team_get_ifinfo_ifindex(struct team_ifinfo *ifinfo);
> diff --git a/libteam/ifinfo.c b/libteam/ifinfo.c
> index 5c32a9c..8f9548e 100644
> --- a/libteam/ifinfo.c
> +++ b/libteam/ifinfo.c
> @@ -494,6 +494,26 @@ struct team_ifinfo *team_get_next_ifinfo(struct team_handle *th,

Re: [RFC PATCH v3 0/3] Enable virtio_net to act as a backup for a passthru device

2018-02-21 Thread Jiri Pirko
Tue, Feb 20, 2018 at 11:33:56PM CET, kubak...@wp.pl wrote:
>On Tue, 20 Feb 2018 21:14:10 +0100, Jiri Pirko wrote:
>> Yeah, I can see it now :( I guess that the ship has sailed and we are
>> stuck with this ugly thing forever...
>> 
>> Could you at least make some common code that is shared between
>> netvsc and virtio_net so this is handled in exactly the same way in both?
>
>IMHO netvsc is a vendor-specific driver which made a mistake in the
>behaviour it provides (or tried to align itself with Windows SR-IOV).
>Let's not make a far, far more commonly deployed and important driver
>(virtio) bug-compatible with netvsc.

Yeah. The netvsc solution is a dangerous precedent here, and in my opinion
it was a huge mistake to merge it. I personally would vote to unmerge it
and make the solution based on team/bond.


>
>To Jiri's initial comments, I feel the same way, in fact I've talked to
>the NetworkManager guys to get auto-bonding based on MACs handled in
>user space.  I think it may very well get done in next versions of NM,
>but isn't done yet.  Stephen also raised the point that not everybody is
>using NM.

Can be done in NM, networkd, or other network management tools.
Even easier to do this in teamd and let them all benefit.

Actually, I took a stab at implementing this in teamd. Took me about
an hour and a half.

You can just run teamd with the "kidnap" config option, like this:
# teamd/teamd -c '{"kidnap": true }'

Whenever teamd sees another netdev appear with the same mac as its own,
or sees an existing netdev change its mac to match,
it enslaves it.

Here's the patch (quick and dirty):

Subject: [patch teamd] teamd: introduce kidnap feature

Signed-off-by: Jiri Pirko 
---
 include/team.h             |  7 +++++++
 libteam/ifinfo.c           | 20 ++++++++++++++++++++
 teamd/teamd.c              | 17 +++++++++++++++++
 teamd/teamd.h              |  5 +++++
 teamd/teamd_events.c       | 17 +++++++++++++++++
 teamd/teamd_ifinfo_watch.c |  9 +++++++++
 teamd/teamd_per_port.c     |  7 ++++++-
 7 files changed, 81 insertions(+), 1 deletion(-)

diff --git a/include/team.h b/include/team.h
index 9ae517d..b0c19c8 100644
--- a/include/team.h
+++ b/include/team.h
@@ -137,6 +137,13 @@ struct team_ifinfo *team_get_next_ifinfo(struct team_handle *th,
 #define team_for_each_ifinfo(ifinfo, th)   \
for (ifinfo = team_get_next_ifinfo(th, NULL); ifinfo;   \
 ifinfo = team_get_next_ifinfo(th, ifinfo))
+
+struct team_ifinfo *team_get_next_unlinked_ifinfo(struct team_handle *th,
+ struct team_ifinfo *ifinfo);
+#define team_for_each_unlinked_ifinfo(ifinfo, th)  \
+   for (ifinfo = team_get_next_unlinked_ifinfo(th, NULL); ifinfo;  \
+ifinfo = team_get_next_unlinked_ifinfo(th, ifinfo))
+
 /* ifinfo getters */
 bool team_is_ifinfo_removed(struct team_ifinfo *ifinfo);
 uint32_t team_get_ifinfo_ifindex(struct team_ifinfo *ifinfo);
diff --git a/libteam/ifinfo.c b/libteam/ifinfo.c
index 5c32a9c..8f9548e 100644
--- a/libteam/ifinfo.c
+++ b/libteam/ifinfo.c
@@ -494,6 +494,26 @@ struct team_ifinfo *team_get_next_ifinfo(struct team_handle *th,
return NULL;
 }
 
+/**
+ * @param th   libteam library context
+ * @param ifinfo   ifinfo structure
+ *
+ * @details Get next unlinked ifinfo in list.
+ *
+ * @return Ifinfo next to ifinfo passed.
+ **/
+TEAM_EXPORT
+struct team_ifinfo *team_get_next_unlinked_ifinfo(struct team_handle *th,
+ struct team_ifinfo *ifinfo)
+{
+   do {
+   ifinfo = list_get_next_node_entry(&th->ifinfo_list, ifinfo, list);
+   if (ifinfo && !ifinfo->linked)
+   return ifinfo;
+   } while (ifinfo);
+   return NULL;
+}
+
 /**
  * @param ifinfo   ifinfo structure
  *
diff --git a/teamd/teamd.c b/teamd/teamd.c
index aac2511..069c7f0 100644
--- a/teamd/teamd.c
+++ b/teamd/teamd.c
@@ -926,8 +926,25 @@ static int teamd_event_watch_port_added(struct teamd_context *ctx,
return 0;
 }
 
+static int teamd_event_watch_unlinked_hwaddr_changed(struct teamd_context *ctx,
+struct team_ifinfo *ifinfo,
+void *priv)
+{
+   int err;
+   bool kidnap;
+
+   err = teamd_config_bool_get(ctx, &kidnap, "$.kidnap");
+   if (err || !kidnap ||
+   ctx->hwaddr_len != team_get_ifinfo_hwaddr_len(ifinfo) ||
+   memcmp(team_get_ifinfo_hwaddr(ifinfo),
+  ctx->hwaddr, ctx->hwaddr_len))
+   return 0;
+   return teamd_port_add(ctx, team_get_ifinfo_ifindex(ifinfo));
+}
+
 static const struct teamd_event_watch_ops teamd_port_watch_ops = {
.port_added = teamd_event_watch_port_added,
+   .unlinked_hwaddr_changed = teamd_event_watch_unlinked_hwaddr_changed,
 };
 
 static int teamd_port_watch_init(struct teamd_context *ctx)