Re: [virtio-dev] Re: [RFC PATCH net-next v2 2/2] virtio_net: Extend virtio to use VF datapath when available

2018-01-29 Thread Alexander Duyck
On Mon, Jan 29, 2018 at 10:24 AM, Michael S. Tsirkin  wrote:
> On Sun, Jan 28, 2018 at 08:26:53PM -0800, Alexander Duyck wrote:
>> >> > For live migration with advanced usecases that Siwei is suggesting, i
>> >> > think we need a new driver with a new device type that can track the
>> >> > VF specific feature settings even when the VF driver is unloaded.
>> >
>> > I see no added value of the 3 netdev model, there is no need for a bond
>> > device.
>>
>> I agree a full-blown bond isn't what is needed. However, just forking
>> traffic out from virtio to a VF doesn't really solve things either.
>>
>> One of the issues as I see it is the fact that the qdisc model with
>> the merged device gets to be pretty ugly. There is the fact that every
>> packet that goes to the VF has to pass through the qdisc code twice.
>> There is the dual nature of the 2 netdev solution that also introduces
>> the potential for head-of-line blocking since the virtio could put
>> back pressure on the upper qdisc layer which could stall the VF
>> traffic when switching over. I hope we could avoid issues like that by
>> maintaining qdiscs per device queue instead of on an upper device that
>> is half software interface and half not. Ideally the virtio-bond
>> device could operate without a qdisc and without needing any
>> additional locks so there shouldn't be head of line blocking occurring
>> between the two interfaces and overhead could be kept minimal.
>>
>> Also in the case of virtio there is support for in-driver XDP. As
>> Sridhar stated, when using the 2 netdev model "we cannot support XDP
>> in this model and it needs to be disabled". That sounds like a step
>> backwards instead of forwards. I would much rather leave the XDP
>> enabled at the lower dev level, and then if we want we can use the
>> generic XDP at the virtio-bond level to capture traffic on both
>> interfaces at the same time.
>
> I agree dropping XDP makes everything very iffy.
>
>> In the case of netvsc you have control of both sides of a given link
>> so you can match up the RSS tables, queue configuration and everything
>> is somewhat symmetric since you are running the PF and all the HyperV
>> subchannels. Most of the complexity is pushed down into the host and
>> your subchannel management is synchronized there if I am not mistaken.
>> We don't have this in the case of this virtio-bond setup. Instead a
>> single bit is set indicating "backup" without indicating what that
>> means to topology other than the fact that this virtio interface is
>> the backup for some other interface. We are essentially blind other
>> than having the link status for the VF and virtio and knowing that the
>> virtio is intended to be the backup.
>
> Would you be interested in posting at least a proof of concept
> patch for this approach? It's OK if there are some TODO stubs.
> It would be much easier to compare code to code rather than
> a high level description to code.

That is the general idea. I was hoping to go the bonding route last
week but I got too snarled up trying to untangle the features we
didn't need. I have some code I am working on but don't have an ETA
for when it will be done.

At this point I am hoping we can build something based on Sridhar's
original patch that can addresses the items I brought up and shifts to
more of a 3 netdev model. If I am not mistaken we might be able to do
it as a separate driver that has a pair of calls that allow for
adding/removing a virt-bond that is provided with a MAC address and a
device pointer so that we can do the bits necessary to get ourselves
swapped with the original virtio device and identify the virtio as the
backup channel.

Thanks.

- Alex

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



Re: [virtio-dev] Re: [RFC PATCH net-next v2 2/2] virtio_net: Extend virtio to use VF datapath when available

2018-01-29 Thread Michael S. Tsirkin
On Sun, Jan 28, 2018 at 08:26:53PM -0800, Alexander Duyck wrote:
> >> > For live migration with advanced usecases that Siwei is suggesting, i
> >> > think we need a new driver with a new device type that can track the
> >> > VF specific feature settings even when the VF driver is unloaded.
> >
> > I see no added value of the 3 netdev model, there is no need for a bond
> > device.
> 
> I agree a full-blown bond isn't what is needed. However, just forking
> traffic out from virtio to a VF doesn't really solve things either.
> 
> One of the issues as I see it is the fact that the qdisc model with
> the merged device gets to be pretty ugly. There is the fact that every
> packet that goes to the VF has to pass through the qdisc code twice.
> There is the dual nature of the 2 netdev solution that also introduces
> the potential for head-of-line blocking since the virtio could put
> back pressure on the upper qdisc layer which could stall the VF
> traffic when switching over. I hope we could avoid issues like that by
> maintaining qdiscs per device queue instead of on an upper device that
> is half software interface and half not. Ideally the virtio-bond
> device could operate without a qdisc and without needing any
> additional locks so there shouldn't be head of line blocking occurring
> between the two interfaces and overhead could be kept minimal.
> 
> Also in the case of virtio there is support for in-driver XDP. As
> Sridhar stated, when using the 2 netdev model "we cannot support XDP
> in this model and it needs to be disabled". That sounds like a step
> backwards instead of forwards. I would much rather leave the XDP
> enabled at the lower dev level, and then if we want we can use the
> generic XDP at the virtio-bond level to capture traffic on both
> interfaces at the same time.

I agree dropping XDP makes everything very iffy.

> In the case of netvsc you have control of both sides of a given link
> so you can match up the RSS tables, queue configuration and everything
> is somewhat symmetric since you are running the PF and all the HyperV
> subchannels. Most of the complexity is pushed down into the host and
> your subchannel management is synchronized there if I am not mistaken.
> We don't have this in the case of this virtio-bond setup. Instead a
> single bit is set indicating "backup" without indicating what that
> means to topology other than the fact that this virtio interface is
> the backup for some other interface. We are essentially blind other
> than having the link status for the VF and virtio and knowing that the
> virtio is intended to be the backup.

Would you be interested in posting at least a proof of concept
patch for this approach? It's OK if there are some TODO stubs.
It would be much easier to compare code to code rather than
a high level description to code.

> We might be able to get to a 2 or maybe even a 1 netdev solution at
> some point in the future, but  I don't think that time is now. For now
> a virtio-bond type solution would allow us to address most of the use
> cases with minimal modification to the existing virtio and can deal
> with feature and/or resource asymmetry.
> 
> - Alex

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



Re: [virtio-dev] Re: [RFC PATCH net-next v2 2/2] virtio_net: Extend virtio to use VF datapath when available

2018-01-28 Thread Alexander Duyck
On Sun, Jan 28, 2018 at 3:02 PM, Stephen Hemminger
 wrote:
> On Fri, 26 Jan 2018 18:30:03 -0800
> Jakub Kicinski  wrote:
>
>> On Fri, 26 Jan 2018 15:30:35 -0800, Samudrala, Sridhar wrote:
>> > On 1/26/2018 2:47 PM, Jakub Kicinski wrote:
>> > > On Sat, 27 Jan 2018 00:14:20 +0200, Michael S. Tsirkin wrote:
>> > >> On Fri, Jan 26, 2018 at 01:46:42PM -0800, Siwei Liu wrote:
>> >  and the VM is not expected to do any tuning/optimizations on the VF 
>> >  driver
>> >  directly,
>> >  i think the current patch that follows the netvsc model of 2 
>> >  netdevs(virtio
>> >  and vf) should
>> >  work fine.
>> > >>> OK. For your use case that's fine. But that's too specific scenario
>> > >>> with lots of restrictions IMHO, perhaps very few users will benefit
>> > >>> from it, I'm not sure. If you're unwilling to move towards it, we'd
>> > >>> take this one and come back with a generic solution that is able to
>> > >>> address general use cases for VF/PT live migration .
>> > >> I think that's a fine approach. Scratch your own itch!  I imagine a very
>> > >> generic virtio-switchdev providing host routing info to guests could
>> > >> address lots of usecases. A driver could bind to that one and enslave
>> > >> arbitrary other devices.  Sounds reasonable.
>> > >>
>> > >> But given the fundamental idea of a failover was floated at least as
>> > >> early as 2013, and made 0 progress since precisely because it kept
>> > >> trying to address more and more features, and given netvsc is already
>> > >> using the basic solution with some success, I'm not inclined to block
>> > >> this specific effort waiting for the generic one.
>> > > I think there is an agreement that the extra netdev will be useful for
>> > > more advanced use cases, and is generally preferable.  What is the
>> > > argument for not doing that from the start?  If it was made I must have
>> > > missed it.  Is it just unwillingness to write the extra 300 lines of
>> > > code?  Sounds like a pretty weak argument when adding kernel ABI is at
>> > > stake...
>> >
>> > I am still not clear on the need for the extra netdev created by
>> > virtio_net. The only advantage i can see is that the stats can be
>> > broken between VF and virtio datapaths compared to the aggregrated
>> > stats on virtio netdev as seen with the 2 netdev approach.
>>
>> Maybe you're not convinced but multiple arguments were made.
>>
>> > With 2 netdev model, any VM image that has a working network
>> > configuration will transparently get VF based acceleration without
>> > any changes.
>>
>> Nothing happens transparently.  Things may happen automatically.  The
>> VF netdev doesn't disappear with netvsc.  The PV netdev transforms into
>> something it did not use to be.  And configures and reports some
>> information from the PV (e.g. speed) but PV doesn't pass traffic any
>> longer.
>>
>> > 3 netdev model breaks this configuration starting with the creation
>> > and naming of the 2 devices to udev needing to be aware of master and
>> > slave virtio-net devices.
>>
>> I don't understand this comment.  There is one virtio-net device and
>> one "virtio-bond" netdev.  And user space has to be aware of the special
>> automatic arrangement anyway, because it can't touch the VF.  It
>> doesn't make any difference whether it ignores the VF or PV and VF.
>> It simply can't touch the slaves, no matter how many there are.
>>
>> > Also, from a user experience point of view, loading a virtio-net with
>> > BACKUP feature enabled will now show 2 virtio-net netdevs.
>>
>> One virtio-net and one virtio-bond, which represents what's happening.
>>
>> > For live migration with advanced usecases that Siwei is suggesting, i
>> > think we need a new driver with a new device type that can track the
>> > VF specific feature settings even when the VF driver is unloaded.
>
> I see no added value of the 3 netdev model, there is no need for a bond
> device.

I agree a full-blown bond isn't what is needed. However, just forking
traffic out from virtio to a VF doesn't really solve things either.

One of the issues as I see it is the fact that the qdisc model with
the merged device gets to be pretty ugly. There is the fact that every
packet that goes to the VF has to pass through the qdisc code twice.
There is the dual nature of the 2 netdev solution that also introduces
the potential for head-of-line blocking since the virtio could put
back pressure on the upper qdisc layer which could stall the VF
traffic when switching over. I hope we could avoid issues like that by
maintaining qdiscs per device queue instead of on an upper device that
is half software interface and half not. Ideally the virtio-bond
device could operate without a qdisc and without needing any
additional locks so there shouldn't be head of line blocking occurring
between the two interfaces and overhead could be kept minimal.

Also in the case of virtio there is support for in-driver 

Re: [virtio-dev] Re: [RFC PATCH net-next v2 2/2] virtio_net: Extend virtio to use VF datapath when available

2018-01-28 Thread Alexander Duyck
On Sun, Jan 28, 2018 at 1:01 PM, Samudrala, Sridhar
 wrote:
>
>
> On 1/28/2018 12:18 PM, Alexander Duyck wrote:
>>
>> On Sun, Jan 28, 2018 at 11:18 AM, Samudrala, Sridhar
>>  wrote:
>>>
>>> On 1/28/2018 9:35 AM, Alexander Duyck wrote:

 On Fri, Jan 26, 2018 at 9:58 PM, Jakub Kicinski  wrote:
>
> On Fri, 26 Jan 2018 21:33:01 -0800, Samudrala, Sridhar wrote:

 3 netdev model breaks this configuration starting with the creation
 and naming of the 2 devices to udev needing to be aware of master
 and
 slave virtio-net devices.
>>>
>>> I don't understand this comment.  There is one virtio-net device and
>>> one "virtio-bond" netdev.  And user space has to be aware of the
>>> special
>>> automatic arrangement anyway, because it can't touch the VF.  It
>>> doesn't make any difference whether it ignores the VF or PV and VF.
>>> It simply can't touch the slaves, no matter how many there are.
>>
>> If the userspace is not expected to touch the slaves, then why do we
>> need to
>> take extra effort to expose a netdev that is just not really useful.
>
> You said:
> "[user space] needs to be aware of master and slave virtio-net
> devices."
>
> I'm saying:
> It has to be aware of the special arrangement whether there is an
> explicit bond netdev or not.

 To clarify here the kernel should be aware that there is a device that
 is an aggregate of 2 other devices. It isn't as if we need to insert
 the new device "above" the virtio.

 I have been doing a bit of messing around with a few ideas and it
 seems like it would be better if we could replace the virtio interface
 with the virtio-bond, renaming my virt-bond concept to this since it
 is now supposedly going to live in the virtio driver, interface, and
 then push the original virtio down one layer and call it a
 virtio-backup. If I am not mistaken we could assign the same dev
 pointer used by the virtio netdev to the virtio-bond, and if we
 register it first with the "eth%d" name then udev will assume that the
 virtio-bond device is the original virtio and all existing scripts
 should just work with that. We then would want to change the name of
 the virtio interface with the backup feature bit set, maybe call it
 something like bkup-00:00:00 where the 00:00:00 would be the last 3
 octets of the MAC address. It should solve the issue of inserting an
 interface "above" the virtio by making the virtio-bond become the
 virtio. The only limitation is that we will probably need to remove
 the back-up if the virtio device is removed, however that was already
 a limitation of this solution and others like the netvsc solution
 anyway.
>>>
>>>
>>> With 3 netdev model, if we make the the master virtio-net associated with
>>> the
>>> real virtio pci device,  i think  the userspace scripts would not break.
>>> If we go this route, i am still not clear on the purpose of exposing the
>>> bkup netdev.
>>> Can we start with the 2 netdev model and move to 3 netdev model later if
>>> we
>>> find out that there are limitiations with the 2 netdev model? I don't
>>> think
>>> this will
>>> break any user API as the userspace is not expected to use the bkup
>>> netdev.
>>
>> The 2 netdev model breaks a large number of expectations of user
>> space. Among them is XDP since we cannot guarantee a symmetric setup
>> between any entity and the virtio. How does it make sense that
>> enabling XDP on virtio shows zero Rx packets, and in the meantime you
>> are getting all of the packets coming in off of the VF?
>
> Sure we cannot support XDP in this model and it needs to be disabled.
>>
>>
>> In addition we would need to rewrite the VLAN and MAC address
>> filtering ndo operations since we likely cannot add any VLANs since in
>> most cases VFs are VLAN locked due to things like port VLAN and we
>> cannot change the MAC address since the whole bonding concept is built
>> around it.
>>
>> The last bit is the netpoll packet routing which the current code
>> assumes is using the virtio only, but I don't know if that is a valid
>> assumption since the VF is expected to be the default route for
>> everything else when it is available.
>>
>> The point is by the time you are done you will have rewritten pretty
>> much all the network device ops. With that being the case why add all
>> the code to virtio itself instead of just coming up with a brand new
>> set of ndo_ops that belong to this new device, and you could leave the
>> existing virtio code in place and save yourself a bunch of time by
>> just accessing it as an existing call as a separate netdev.
>
>
> When the BACKUP feature is enabled, we can simply disable most of these ndo
> ops
> that cannot be supported. Not sure we need an additional netdev and 

Re: [virtio-dev] Re: [RFC PATCH net-next v2 2/2] virtio_net: Extend virtio to use VF datapath when available

2018-01-28 Thread Samudrala, Sridhar



On 1/28/2018 12:18 PM, Alexander Duyck wrote:

On Sun, Jan 28, 2018 at 11:18 AM, Samudrala, Sridhar
 wrote:

On 1/28/2018 9:35 AM, Alexander Duyck wrote:

On Fri, Jan 26, 2018 at 9:58 PM, Jakub Kicinski  wrote:

On Fri, 26 Jan 2018 21:33:01 -0800, Samudrala, Sridhar wrote:

3 netdev model breaks this configuration starting with the creation
and naming of the 2 devices to udev needing to be aware of master and
slave virtio-net devices.

I don't understand this comment.  There is one virtio-net device and
one "virtio-bond" netdev.  And user space has to be aware of the
special
automatic arrangement anyway, because it can't touch the VF.  It
doesn't make any difference whether it ignores the VF or PV and VF.
It simply can't touch the slaves, no matter how many there are.

If the userspace is not expected to touch the slaves, then why do we
need to
take extra effort to expose a netdev that is just not really useful.

You said:
"[user space] needs to be aware of master and slave virtio-net devices."

I'm saying:
It has to be aware of the special arrangement whether there is an
explicit bond netdev or not.

To clarify here the kernel should be aware that there is a device that
is an aggregate of 2 other devices. It isn't as if we need to insert
the new device "above" the virtio.

I have been doing a bit of messing around with a few ideas and it
seems like it would be better if we could replace the virtio interface
with the virtio-bond, renaming my virt-bond concept to this since it
is now supposedly going to live in the virtio driver, interface, and
then push the original virtio down one layer and call it a
virtio-backup. If I am not mistaken we could assign the same dev
pointer used by the virtio netdev to the virtio-bond, and if we
register it first with the "eth%d" name then udev will assume that the
virtio-bond device is the original virtio and all existing scripts
should just work with that. We then would want to change the name of
the virtio interface with the backup feature bit set, maybe call it
something like bkup-00:00:00 where the 00:00:00 would be the last 3
octets of the MAC address. It should solve the issue of inserting an
interface "above" the virtio by making the virtio-bond become the
virtio. The only limitation is that we will probably need to remove
the back-up if the virtio device is removed, however that was already
a limitation of this solution and others like the netvsc solution
anyway.


With 3 netdev model, if we make the the master virtio-net associated with
the
real virtio pci device,  i think  the userspace scripts would not break.
If we go this route, i am still not clear on the purpose of exposing the
bkup netdev.
Can we start with the 2 netdev model and move to 3 netdev model later if we
find out that there are limitiations with the 2 netdev model? I don't think
this will
break any user API as the userspace is not expected to use the bkup netdev.

The 2 netdev model breaks a large number of expectations of user
space. Among them is XDP since we cannot guarantee a symmetric setup
between any entity and the virtio. How does it make sense that
enabling XDP on virtio shows zero Rx packets, and in the meantime you
are getting all of the packets coming in off of the VF?

Sure we cannot support XDP in this model and it needs to be disabled.


In addition we would need to rewrite the VLAN and MAC address
filtering ndo operations since we likely cannot add any VLANs since in
most cases VFs are VLAN locked due to things like port VLAN and we
cannot change the MAC address since the whole bonding concept is built
around it.

The last bit is the netpoll packet routing which the current code
assumes is using the virtio only, but I don't know if that is a valid
assumption since the VF is expected to be the default route for
everything else when it is available.

The point is by the time you are done you will have rewritten pretty
much all the network device ops. With that being the case why add all
the code to virtio itself instead of just coming up with a brand new
set of ndo_ops that belong to this new device, and you could leave the
existing virtio code in place and save yourself a bunch of time by
just accessing it as an existing call as a separate netdev.


When the BACKUP feature is enabled, we can simply disable most of these 
ndo ops

that cannot be supported. Not sure we need an additional netdev and ndo_ops.

When we can support all these usecases along with live migration we can move
to the 3 netdev model and i think we will need a new feature bit so that the
hypervisor can allow VM to use both datapaths and configure PF accordingly.

Thanks
Sridhar

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



Re: [virtio-dev] Re: [RFC PATCH net-next v2 2/2] virtio_net: Extend virtio to use VF datapath when available

2018-01-28 Thread Samudrala, Sridhar

On 1/28/2018 9:35 AM, Alexander Duyck wrote:

On Fri, Jan 26, 2018 at 9:58 PM, Jakub Kicinski  wrote:

On Fri, 26 Jan 2018 21:33:01 -0800, Samudrala, Sridhar wrote:

3 netdev model breaks this configuration starting with the creation
and naming of the 2 devices to udev needing to be aware of master and
slave virtio-net devices.

I don't understand this comment.  There is one virtio-net device and
one "virtio-bond" netdev.  And user space has to be aware of the special
automatic arrangement anyway, because it can't touch the VF.  It
doesn't make any difference whether it ignores the VF or PV and VF.
It simply can't touch the slaves, no matter how many there are.

If the userspace is not expected to touch the slaves, then why do we need to
take extra effort to expose a netdev that is just not really useful.

You said:
"[user space] needs to be aware of master and slave virtio-net devices."

I'm saying:
It has to be aware of the special arrangement whether there is an
explicit bond netdev or not.

To clarify here the kernel should be aware that there is a device that
is an aggregate of 2 other devices. It isn't as if we need to insert
the new device "above" the virtio.

I have been doing a bit of messing around with a few ideas and it
seems like it would be better if we could replace the virtio interface
with the virtio-bond, renaming my virt-bond concept to this since it
is now supposedly going to live in the virtio driver, interface, and
then push the original virtio down one layer and call it a
virtio-backup. If I am not mistaken we could assign the same dev
pointer used by the virtio netdev to the virtio-bond, and if we
register it first with the "eth%d" name then udev will assume that the
virtio-bond device is the original virtio and all existing scripts
should just work with that. We then would want to change the name of
the virtio interface with the backup feature bit set, maybe call it
something like bkup-00:00:00 where the 00:00:00 would be the last 3
octets of the MAC address. It should solve the issue of inserting an
interface "above" the virtio by making the virtio-bond become the
virtio. The only limitation is that we will probably need to remove
the back-up if the virtio device is removed, however that was already
a limitation of this solution and others like the netvsc solution
anyway.


With 3 netdev model, if we make the the master virtio-net associated 
with the

real virtio pci device,  i think  the userspace scripts would not break.
If we go this route, i am still not clear on the purpose of exposing the 
bkup netdev.

Can we start with the 2 netdev model and move to 3 netdev model later if we
find out that there are limitiations with the 2 netdev model? I don't 
think this will

break any user API as the userspace is not expected to use the bkup netdev.




Also, from a user experience point of view, loading a virtio-net with
BACKUP feature enabled will now show 2 virtio-net netdevs.

One virtio-net and one virtio-bond, which represents what's happening.

This again assumes that we want to represent a bond setup. Can't we
treat this
as virtio-net providing an alternate low-latency datapath by taking over
VF datapath?

Bond is just a familiar name, we can call it something else if you
prefer.  The point is there are two data paths which can have
independent low-level settings and a higher level entity with
global settings which represents any path to the outside world.

Hiding low-level netdevs from a lay user requires a generic solution.

The last thing I think we should be doing is hiding the low level
netdevs. It doesn't solve anythying. We are already trusting the owner
of the VM enough to let them have root access of the VM. That means
they can load/unload any driver they want. They don't have to use the
kernel provided virtio driver, they could load their own. They could
even do something like run DPDK on top of it, or they could run DPDK
on top of the VF. In either case there is no way the bond would ever
be created and all hiding devices does is make it easier to fix
problems when something gets broken. Unless I am mistaken, and
"security through obscurity" has somehow become a valid security
model.

As I mentioned to Sridhar on an off-list thread I think the goal
should be to make it so that the user wants to use the virtio-bond,
not make it so that they have no choice but to use it. The idea is we
should be making things easier for the administrator of the VM, not
harder.



When the hypervisor has enabled BACKUP bit along with a VF with the same
MAC address, the VM can use either VF only OR extended virtio with 
attached  VF.
Although there are 2 datapaths to the device, the hypervisor will enable 
only
one at any time.  The virtio via PF datapath is only enabled during live 
migration
when the VF is unplugged. If not, VF broadcasts/multicasts will get 
looped back
to the VM via the PF datapath. In the RFC path i was assuming that the 
VM can
use both datapaths 

Re: [virtio-dev] Re: [RFC PATCH net-next v2 2/2] virtio_net: Extend virtio to use VF datapath when available

2018-01-28 Thread Alexander Duyck
On Fri, Jan 26, 2018 at 9:58 PM, Jakub Kicinski  wrote:
> On Fri, 26 Jan 2018 21:33:01 -0800, Samudrala, Sridhar wrote:
>> >> 3 netdev model breaks this configuration starting with the creation
>> >> and naming of the 2 devices to udev needing to be aware of master and
>> >> slave virtio-net devices.
>> > I don't understand this comment.  There is one virtio-net device and
>> > one "virtio-bond" netdev.  And user space has to be aware of the special
>> > automatic arrangement anyway, because it can't touch the VF.  It
>> > doesn't make any difference whether it ignores the VF or PV and VF.
>> > It simply can't touch the slaves, no matter how many there are.
>>
>> If the userspace is not expected to touch the slaves, then why do we need to
>> take extra effort to expose a netdev that is just not really useful.
>
> You said:
> "[user space] needs to be aware of master and slave virtio-net devices."
>
> I'm saying:
> It has to be aware of the special arrangement whether there is an
> explicit bond netdev or not.

To clarify here the kernel should be aware that there is a device that
is an aggregate of 2 other devices. It isn't as if we need to insert
the new device "above" the virtio.

I have been doing a bit of messing around with a few ideas and it
seems like it would be better if we could replace the virtio interface
with the virtio-bond, renaming my virt-bond concept to this since it
is now supposedly going to live in the virtio driver, interface, and
then push the original virtio down one layer and call it a
virtio-backup. If I am not mistaken we could assign the same dev
pointer used by the virtio netdev to the virtio-bond, and if we
register it first with the "eth%d" name then udev will assume that the
virtio-bond device is the original virtio and all existing scripts
should just work with that. We then would want to change the name of
the virtio interface with the backup feature bit set, maybe call it
something like bkup-00:00:00 where the 00:00:00 would be the last 3
octets of the MAC address. It should solve the issue of inserting an
interface "above" the virtio by making the virtio-bond become the
virtio. The only limitation is that we will probably need to remove
the back-up if the virtio device is removed, however that was already
a limitation of this solution and others like the netvsc solution
anyway.

>> >> Also, from a user experience point of view, loading a virtio-net with
>> >> BACKUP feature enabled will now show 2 virtio-net netdevs.
>> > One virtio-net and one virtio-bond, which represents what's happening.
>> This again assumes that we want to represent a bond setup. Can't we
>> treat this
>> as virtio-net providing an alternate low-latency datapath by taking over
>> VF datapath?
>
> Bond is just a familiar name, we can call it something else if you
> prefer.  The point is there are two data paths which can have
> independent low-level settings and a higher level entity with
> global settings which represents any path to the outside world.
>
> Hiding low-level netdevs from a lay user requires a generic solution.

The last thing I think we should be doing is hiding the low level
netdevs. It doesn't solve anythying. We are already trusting the owner
of the VM enough to let them have root access of the VM. That means
they can load/unload any driver they want. They don't have to use the
kernel provided virtio driver, they could load their own. They could
even do something like run DPDK on top of it, or they could run DPDK
on top of the VF. In either case there is no way the bond would ever
be created and all hiding devices does is make it easier to fix
problems when something gets broken. Unless I am mistaken, and
"security through obscurity" has somehow become a valid security
model.

As I mentioned to Sridhar on an off-list thread I think the goal
should be to make it so that the user wants to use the virtio-bond,
not make it so that they have no choice but to use it. The idea is we
should be making things easier for the administrator of the VM, not
harder.

- Alex

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



Re: [virtio-dev] Re: [RFC PATCH net-next v2 2/2] virtio_net: Extend virtio to use VF datapath when available

2018-01-26 Thread Samudrala, Sridhar


On 1/26/2018 2:47 PM, Jakub Kicinski wrote:

On Sat, 27 Jan 2018 00:14:20 +0200, Michael S. Tsirkin wrote:

On Fri, Jan 26, 2018 at 01:46:42PM -0800, Siwei Liu wrote:

and the VM is not expected to do any tuning/optimizations on the VF driver
directly,
i think the current patch that follows the netvsc model of 2 netdevs(virtio
and vf) should
work fine.

OK. For your use case that's fine. But that's too specific scenario
with lots of restrictions IMHO, perhaps very few users will benefit
from it, I'm not sure. If you're unwilling to move towards it, we'd
take this one and come back with a generic solution that is able to
address general use cases for VF/PT live migration .

I think that's a fine approach. Scratch your own itch!  I imagine a very
generic virtio-switchdev providing host routing info to guests could
address lots of usecases. A driver could bind to that one and enslave
arbitrary other devices.  Sounds reasonable.

But given the fundamental idea of a failover was floated at least as
early as 2013, and made 0 progress since precisely because it kept
trying to address more and more features, and given netvsc is already
using the basic solution with some success, I'm not inclined to block
this specific effort waiting for the generic one.

I think there is an agreement that the extra netdev will be useful for
more advanced use cases, and is generally preferable.  What is the
argument for not doing that from the start?  If it was made I must have
missed it.  Is it just unwillingness to write the extra 300 lines of
code?  Sounds like a pretty weak argument when adding kernel ABI is at
stake...


I am still not clear on the need for the extra netdev created by 
virtio_net. The only advantage
i can see is that the stats can be broken between VF and virtio 
datapaths compared
to the aggregrated stats on virtio netdev as seen with the 2 netdev 
approach.


With 2 netdev model, any VM image that has a working network 
configuration will transparently get
VF based acceleration without any changes. 3 netdev model breaks this 
configuration starting with the
creation and naming of the 2 devices to udev needing to be aware of 
master and slave virtio-net devices.
Also, from a user experience point of view, loading a virtio-net with 
BACKUP feature

enabled will  now show 2 virtio-net netdevs.

For live migration with advanced usecases that Siwei is suggesting, i 
think we need a new driver
with a new device type that can track the VF specific feature settings 
even when the VF driver is unloaded.


Thanks
Sridhar



-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



Re: [virtio-dev] Re: [RFC PATCH net-next v2 2/2] virtio_net: Extend virtio to use VF datapath when available

2018-01-26 Thread Michael S. Tsirkin
On Fri, Jan 26, 2018 at 01:46:42PM -0800, Siwei Liu wrote:
> > and the VM is not expected to do any tuning/optimizations on the VF driver
> > directly,
> > i think the current patch that follows the netvsc model of 2 netdevs(virtio
> > and vf) should
> > work fine.
> 
> OK. For your use case that's fine. But that's too specific scenario
> with lots of restrictions IMHO, perhaps very few users will benefit
> from it, I'm not sure. If you're unwilling to move towards it, we'd
> take this one and come back with a generic solution that is able to
> address general use cases for VF/PT live migration .

I think that's a fine approach. Scratch your own itch!  I imagine a very
generic virtio-switchdev providing host routing info to guests could
address lots of usecases. A driver could bind to that one and enslave
arbitrary other devices.  Sounds reasonable.

But given the fundamental idea of a failover was floated at least as
early as 2013, and made 0 progress since precisely because it kept
trying to address more and more features, and given netvsc is already
using the basic solution with some success, I'm not inclined to block
this specific effort waiting for the generic one.

-- 
MST

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



Re: [virtio-dev] Re: [RFC PATCH net-next v2 2/2] virtio_net: Extend virtio to use VF datapath when available

2018-01-26 Thread Siwei Liu
On Fri, Jan 26, 2018 at 8:51 AM, Samudrala, Sridhar
 wrote:
>
>
> On 1/26/2018 12:14 AM, Siwei Liu wrote:
>>
>> On Tue, Jan 23, 2018 at 2:58 PM, Michael S. Tsirkin 
>> wrote:
>>>
>>> On Tue, Jan 23, 2018 at 12:24:47PM -0800, Siwei Liu wrote:

 On Mon, Jan 22, 2018 at 1:41 PM, Michael S. Tsirkin 
 wrote:
>
> On Mon, Jan 22, 2018 at 12:27:14PM -0800, Siwei Liu wrote:
>>
>> First off, as mentioned in another thread, the model of stacking up
>> virt-bond functionality over virtio seems a wrong direction to me.
>> Essentially the migration process would need to carry over all guest
>> side configurations previously done on the VF/PT and get them moved to
>> the new device being it virtio or VF/PT.
>
> I might be wrong but I don't see why we should worry about this
> usecase.
> Whoever has a bond configured already has working config for migration.
> We are trying to help people who don't, not convert existig users.

 That has been placed in the view of cloud providers that the imported
 images from the store must be able to run unmodified thus no
 additional setup script is allowed (just as Stephen mentioned in
 another mail). Cloud users don't care about live migration themselves
 but the providers are required to implement such automation mechanism
 to make this process transparent if at all possible. The user does not
 care about the device underneath being VF or not, but they do care
 about consistency all across and the resulting performance
 acceleration in making VF the prefered datapath. It is not quite
 peculiar user cases but IMHO *any* approach proposed for live
 migration should be able to persist the state including network config
 e.g. as simple as MTU. Actually this requirement has nothing to do
 with virtio but our target users are live migration agnostic, being it
 tracking DMA through dirty pages, using virtio as the helper, or
 whatsoever, the goal of persisting configs across remains same.
>>>
>>> So the patching being discussed here will mostly do exactly that if your
>>> original config was simply a single virtio net device.
>>>
>> True, but I don't see the patch being discussed starts with good
>> foundation of supporting the same for VF/PT device. That is the core
>> of the issue.
>
>
>>> What kind of configs do your users have right now?
>>
>> Any configs be it generic or driver specific that the VF/PT device
>> supports and have been enabled/configured. General network configs
>> (MAC, IP address, VLAN, MTU, iptables rules), ethtool settings
>> (hardware offload, # of queues and ring entris, RSC options, rss
>> rxfh-indir table, rx-flow-hash, et al) , bpf/XDP program being run, tc
>> flower offload, just to name a few. As cloud providers we don't limit
>> users from applying driver specific tuning to the NIC/VF, and
>> sometimes this is essential to achieving best performance for their
>> workload. We've seen cases like tuning coalescing parameters for
>> getting low latency, changing rx-flow-hash function for better VXLAN
>> throughput, or even adopting quite advanced NIC features such as flow
>> director or cloud filter. We don't expect users to compromise even a
>> little bit on these. That is once we turn on live migration for the VF
>> or pass through devices in the VM, it all takes place under the hood,
>> users (guest admins, applications) don't have to react upon it or even
>> notice the change. I should note that the majority of live migrations
>> take place between machines with completely identical hardware, it's
>> more critical than necessary to keep the config as-is across the move,
>> stealth while quiet.
>
>
> This usecase is much more complicated and different than what this patch is
> trying
> to address.

Yep, it is technically difficult, but as cloud providers we would like
to take actions to address use case for our own if no one else is
willing to do so. However we're not seeking complicated design or
messing up the others such as your use case. As this is the first time
a real patch of the PV failover approach, although having be discussed
for years, posted to the mailing list. All voices suddenly came over,
various parties wish their specific needs added to the todo list, it's
indeed hard to accommodate all at once in the first place. I went
through same tough period of time while I was doing similar work so I
completely understand that. The task is not easy for sure. :)

The attempts I made was trying to consolidate all potential use cases
into one single solution rather than diverge from the very beginning.
It's in the phase of RFC and I don't want to wait expressing our
interest until very late.

>  Also your usecase seems to be assuming that source and
> destination
> hosts are identical and have the same HW.

Not exactly, this will be positioned as an optimization, but