Re: [virtio-dev] Re: net_failover slave udev renaming (was Re: [RFC PATCH net-next v6 4/4] netvsc: refactor notifier/event handling code to use the bypass framework)

2019-04-19 Thread Liran Alon


> On 28 Feb 2019, at 1:50, Michael S. Tsirkin  wrote:
> 
> On Wed, Feb 27, 2019 at 03:34:56PM -0800, si-wei liu wrote:
>> 
>> 
>> On 2/27/2019 2:38 PM, Michael S. Tsirkin wrote:
>>> On Tue, Feb 26, 2019 at 04:17:21PM -0800, si-wei liu wrote:
 
 On 2/25/2019 6:08 PM, Michael S. Tsirkin wrote:
> On Mon, Feb 25, 2019 at 04:58:07PM -0800, si-wei liu wrote:
>> On 2/22/2019 7:14 AM, Michael S. Tsirkin wrote:
>>> On Thu, Feb 21, 2019 at 11:55:11PM -0800, si-wei liu wrote:
 On 2/21/2019 11:00 PM, Samudrala, Sridhar wrote:
> On 2/21/2019 7:33 PM, si-wei liu wrote:
>> On 2/21/2019 5:39 PM, Michael S. Tsirkin wrote:
>>> On Thu, Feb 21, 2019 at 05:14:44PM -0800, Siwei Liu wrote:
 Sorry for replying to this ancient thread. There was a remaining
 issue that I don't think the initial net_failover patch addressed
 cleanly, see:
 
 https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1815268
 
 The renaming of 'eth0' to 'ens4' fails because the udev userspace was
 not specifically written for such kernel automatic enslavement.
 Specifically, if it is a bond or team, the slave would typically get
 renamed *before* the virtual device gets created; that's what udev can
 control (without getting the netdev opened early by another part of
 the kernel), and other userspace components, e.g. initramfs and
 init-scripts, can coordinate well in between. The in-kernel
 auto-enslavement of net_failover breaks this userspace convention
 and doesn't provide a solution if users care about consistent naming
 on the slave netdevs specifically.
 
 Previously this issue had been specifically called out when IFF_HIDDEN
 and the 1-netdev model were proposed, but no one has offered a solution
 to this problem ever since. Please share your thoughts on how to proceed
 and solve this userspace issue if netdev does not welcome a 1-netdev model.
>>> Above says:
>>> 
>>>   there's no motivation in the systemd/udevd community at
>>>   this point to refactor the rename logic and make it work well 
>>> with
>>>   3-netdev.
>>> 
>>> What would the fix be? Skip slave devices?
>>> 
>> There's nothing user can get if just skipping slave devices - the
>> name is still unchanged and unpredictable e.g. eth0, or eth1 the
>> next reboot, while the rest may conform to the naming scheme (ens3
>> and such). There's no way one can fix this in userspace alone - when
>> the failover is created the enslaved netdev was opened by the kernel
>> earlier than the userspace is made aware of, and there's no
>> negotiation protocol for kernel to know when userspace has done
>> initial renaming of the interface. I would expect netdev list should
>> at least provide the direction in general for how this can be
>> solved...
>>> I was just wondering what did you mean when you said
>>> "refactor the rename logic and make it work well with 3-netdev" -
>>> was there a proposal udev rejected?
>> No. I never believed this particular issue can be fixed in userspace 
>> alone.
>> Previously someone had said it could be, but I never see any work or
>> relevant discussion ever happened in various userspace communities (for 
>> e.g.
>> dracut, initramfs-tools, systemd, udev, and NetworkManager). IMHO the 
>> root
>> of the issue derives from the kernel, it makes more sense to start from
>> netdev, work out and decide on a solution: see what can be done in the
>> kernel in order to fix it, then after that engage userspace community for
>> the feasibility...
>> 
>>> Anyway, can we write a time diagram for what happens in which order that
>>> leads to failure?  That would help look for triggers that we can tie
>>> into, or add new ones.
>>> 
>> See attached diagram.
>> 
>>> 
>>> 
> Is there an issue if slave device names are not predictable? The 
> user/admin scripts are expected
> to only work with the master failover device.
 Where does this expectation come from?
 
 Admin users may have ethtool or tc configurations that need to deal 
 with
 predictable interface name. Third-party app which was built upon 
 specifying
 certain interface name can't be modifie

Re: [virtio-dev] Re: net_failover slave udev renaming (was Re: [RFC PATCH net-next v6 4/4] netvsc: refactor notifier/event handling code to use the bypass framework)

2019-03-01 Thread Siwei Liu
On Thu, Feb 28, 2019 at 5:05 PM Jakub Kicinski  wrote:
>
> On Thu, 28 Feb 2019 16:20:28 -0800, Siwei Liu wrote:
> > On Thu, Feb 28, 2019 at 11:56 AM Jakub Kicinski wrote:
> > > On Thu, 28 Feb 2019 14:36:56 -0500, Michael S. Tsirkin wrote:
> > > > > It is a bit of a the chicken or the egg situation ;)  But users can
> > > > > just blacklist, too.  Anyway, I think this is far better than module
> > > > > parameters
> > > >
> > > > Sorry I'm a bit confused. What is better than what?
> > >
> > > I mean that blacklist net_failover or module param to disable
> > > net_failover and handle in user space are better than trying to solve
> > > the renaming at kernel level (either by adding module params that make
> > > the kernel rename devices or letting user space change names of running
> > > devices if they are slaves).
> >
> > Before I was asked to revive this old mail thread, I knew the
> > discussion could end up with something like this. Yes, theoretically
> > there's a point - basically you don't believe kernel should take risk
> > in fixing the issue, so you push back the hope to something in
> > hypothesis that actually wasn't done and hard to get done in reality.
> > It's not too different than saying "hey, what you're asking for is
> > simply wrong, don't do it! Go back to modify userspace to create a
> > bond or team instead!" FWIW I want to emphasize that the debate for
> > what should be the right place to implement this failover facility:
> > userspace versus kernel, had been around for almost a decade, and no
> > real work ever happened in userspace to "standardize" this in the
> > Linux world.
>
> Let me offer you my very subjective opinion of why "no real work ever
> happened in user space".  The actors who have primary interest to get
> the auto-bonding working are HW vendors trying to either convince
> customers to use SR-IOV, or being pressured by customers to make SR-IOV
> easier to consume.  HW vendors hire driver developers, not user space
> developers.  So the solution we arrive at is in the kernel for a non
> technical reason (Conway's law, sort of).
>
> $ cd NetworkManager/
> $ git log --pretty=format:"%ae" | \
> grep '\(mellanox\|intel\|broadcom\|netronome\)' | sort | uniq -c
>  81 andrew.zaborow...@intel.com
>   2 david.woodho...@intel.com
>   2 ismo.puusti...@intel.com
>   1 michael.i.dohe...@intel.com
>
> Andrew works on WiFi.
>

I'm sorry, but we don't use NetworkManager in our cloud images at all.
We suffered from lots of problems when booting from a remote iSCSI disk
with NetworkManager enabled, and it looks like those issues are still
there. My subjective impression is that this is not something a network
config tool mainly targeting desktop and WiFi users ever cares about.
At the least, it is a sign that testing there was insufficient.

From a cloud service provider's perspective, we always prefer a single
central solution over talking to various distro vendors, each with their
own network daemons/config tools and thus different solutions. It's hard
to coordinate all efforts in one place. From my personal perspective, the
in-kernel auto-slave solution is in no way technically inferior to any
userspace implementation, and every major OS/cloud provider chose to
implement this in-kernel model for the same reason. I don't want to
argue further about whether there's value in net_failover being in the
Linux kernel; given that it's already there, I think it's better to move on.

We have done extensive work in reporting issues (actually, fixing them
internally before posting) to the dracut, udev, initramfs-tools, and
cloud-init communities. Although the 3-netdev model was claimed to be
transparent to userspace in general, the reality is the opposite: the
effort is no different from bringing up a new type of virtual bond,
rather than what any existing userspace tool would otherwise expect for
a regular physical netdev. If anyone is concerned about breaking
userspace, I bet they have never tried to start using it. If they had,
they would know what I am saying. The duplicate MAC address setting and
the plugging order are so new to userspace that none of the userspace
tools know how to plumb the failover interface properly without being
fixed one way or another.

-Siwei

> I have asked the NetworkManager folks to implement this feature last
> year when net_failover got dangerously close to getting merged, and
> they said they were never approached with this request before, much less
> offered code that solve it.  Unfortunately before they got around to it
> net_failover was merged already, and they didn't proceed.
>
> So to my knowledge nobody ever tried to solve this in user space.
> I don't think net_failover is particularly terrible, or that renaming
> of primary in the kernel is the end of the world, but I'd appreciate if
> you could point me to efforts to solve it upstream in user space
> components, or acknowledge that nobody actually tried that.
___
Virtualization mailing list
Virtualization@li

Re: [virtio-dev] Re: net_failover slave udev renaming (was Re: [RFC PATCH net-next v6 4/4] netvsc: refactor notifier/event handling code to use the bypass framework)

2019-03-01 Thread Michael S. Tsirkin
On Thu, Feb 28, 2019 at 05:30:56PM -0800, si-wei liu wrote:
> 
> 
> On 2/28/2019 6:26 AM, Michael S. Tsirkin wrote:
> > On Thu, Feb 28, 2019 at 01:32:12AM -0800, si-wei liu wrote:
> > > > > Will the
> > > > > change break userspace further?
> > > > > 
> > > > > -Siwei
> > > > Didn't you show userspace is already broken. You can't "further
> > > > break it", rename already fails.
> > > It's a race: userspace tends to give the slave a user(space)-desired name
> > > but sometimes may fail due to this race. Today, if the failover master is
> > > not up, rename would succeed anyway. What you proposed, however, prohibits
> > > the user from providing a name in all circumstances, if I understand you
> > > correctly. That's what I meant by breaking userspace further. On the other
> > > hand, you seem to tie the kernel default naming to udev predictable names,
> > > which are derived only from recent systemd-udevd, while many other
> > > possible userspace naming schemes exist. Users today who deliberately
> > > choose to disable predictable naming (net.ifnames=0 biosdevname=0) and fall
> > > back to kernel-provided names would expect the ethX pattern; with this
> > > change, admin/user scripts which match the ethX pattern could potentially
> > > break.
> > Whatever crashes with a name not matching ethX will crash on the
> > standby interface *anyway*.
> With udev predictable naming disabled they should not. It's not hard for
> a user to look up a device attribute to persist the name well, in a
> consistent and reliable way.

Well that's special code for failover already. So far we just
taught userspace to skip renaming slave interfaces.
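
For illustration only, the skip-or-rename decision could be sketched as a small
shell helper. It assumes that a failover slave exposes a `master` link under
/sys/class/net/<ifname>/ the way bond and team slaves do, and that the kernel
refuses to rename a device whose IFF_UP flag (bit 0x1 of the sysfs `flags`
attribute) is set, which is the restriction debated in this thread. The
function name `rename_status` and its optional root argument are made up for
this sketch; this is not a description of what udev itself does.

```shell
# rename_status IFNAME [ROOT]
# ROOT defaults to /sys/class/net; a different root can be passed for testing.
# Prints one of:
#   skip-slave  the device has a master (bond/team/failover upper device)
#   busy-up     the device is administratively up (IFF_UP set)
#   ok          neither condition applies
rename_status() {
    dev=${2:-/sys/class/net}/$1

    # An enslaved netdev carries a "master" symlink to its upper device.
    if [ -e "$dev/master" ] || [ -L "$dev/master" ]; then
        echo skip-slave
        return 0
    fi

    # The flags attribute is a hex string such as 0x1003; bit 0 is IFF_UP.
    flags=$(cat "$dev/flags")
    if [ $(( $flags & 1 )) -ne 0 ]; then
        echo busy-up
    else
        echo ok
    fi
}
```

On a live system, `rename_status eth0` would report which of the two
conditions, if any, currently applies to eth0. Because the result can change
between the check and the actual rename, a check like this narrows the race
but does not remove it.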

> > 
> > So I think what you are saying is that someone might have already
> > written scripts and gotten them to work on v4.17 when STANDBY was
> > included and these scripts rely on ethX. Now these scripts
> > will break.
> The controversial part is the new kernel naming pattern. Initially I thought
> there shouldn't be such crazy scripts relying on the pattern, but when I
> worked on cloud-init I realized that there's already a lot of software
> making assumptions around the 'eth0' name. In the past I've seen random
> scripts that parse the ethX name and assume (incorrectly) that the name ends
> with digits, or even that the digits and names are mapped 1:1. Of course, you
> can say these are bugs in the scripts themselves.

No, what I am saying is that they will crash on rename of the standby too.
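
As a hedged illustration of the script assumption being discussed, a helper
that wants the trailing index of an interface name can at least report failure
when there are no trailing digits, rather than assuming an `ethX` shape. The
function name `split_ifname` is hypothetical and only meant as a sketch.

```shell
# split_ifname NAME
# Prints "<prefix> <index>" when NAME ends in decimal digits.
# Returns non-zero and prints nothing when NAME has no trailing digits.
split_ifname() {
    name=$1
    # Strip everything up to the last non-digit to get the trailing digits,
    # then remove those digits from the end to get the prefix.
    prefix=${name%"${name##*[!0-9]}"}
    index=${name#"$prefix"}
    [ -n "$index" ] || return 1
    printf '%s %s\n' "$prefix" "$index"
}
```

For example, `split_ifname eth0` prints `eth 0` and `split_ifname enp0s3`
prints `enp0s 3`, while `split_ifname lo` fails. Even when the split succeeds,
the index says nothing about which physical or virtual device the name maps to.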

> Anyway, I'll let others in the netdev to comment on this new scheme, maybe
> that's the concern of merely myself. The good part of your proposal is that
> we can get consistent slave name, which still plays its role until we move
> towards making slave names less relevant, i.e. ideally a 1-netdev model. I
> think we both agree that the master matters more than the slave names.
> > 
> > Maybe it is still early enough (just half a year passed) that the
> > number of these users would be small.  So how about a kernel config
> > option and maybe a module parameter to rename the primary?  People can
> > then opt in to the old broken behaviour.
> If I could, I would ask why a similar opt-in (kernel config or module
> parameter) couldn't be implemented to lift the rename restriction on
> slaves, net_failover slaves in particular. My feeling is that this rename
> restriction exists more for historical reasons than anything else, while
> net_failover is a comparatively new type of link for which we are now
> designing the use cases it should support, and it can be shaped to whatever
> fits. My personal view is that a slave not being renamable while the master
> is running is just an implementation detail that has been incorrectly
> exposed to userspace apps for many years. It's old behavior with historical
> reasons for sure, but I don't think this applies to net_failover.
> 
> (FWIW, as a previous bond maintainer for another OS: we lifted the rename
> restriction on slaves 13 years ago, and not a single complaint or issue was
> ever raised because of this change over the years, neither from customers
> with an installed base of tens of millions, nor from the FOSS software
> running atop. Of course, Linux is different so that experience doesn't count.)
> 
> Thanks,
> -Siwei
> 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [virtio-dev] Re: net_failover slave udev renaming (was Re: [RFC PATCH net-next v6 4/4] netvsc: refactor notifier/event handling code to use the bypass framework)

2019-02-28 Thread Jakub Kicinski
On Thu, 28 Feb 2019 16:20:28 -0800, Siwei Liu wrote:
> On Thu, Feb 28, 2019 at 11:56 AM Jakub Kicinski wrote:
> > On Thu, 28 Feb 2019 14:36:56 -0500, Michael S. Tsirkin wrote:  
> > > > It is a bit of a the chicken or the egg situation ;)  But users can
> > > > just blacklist, too.  Anyway, I think this is far better than module
> > > > parameters  
> > >
> > > Sorry I'm a bit confused. What is better than what?  
> >
> > I mean that blacklist net_failover or module param to disable
> > net_failover and handle in user space are better than trying to solve
> > the renaming at kernel level (either by adding module params that make
> > the kernel rename devices or letting user space change names of running
> > devices if they are slaves).  
> 
> Before I was asked to revive this old mail thread, I knew the
> discussion could end up with something like this. Yes, theoretically
> there's a point - basically you don't believe kernel should take risk
> in fixing the issue, so you push back the hope to something in
> hypothesis that actually wasn't done and hard to get done in reality.
> It's not too different than saying "hey, what you're asking for is
> simply wrong, don't do it! Go back to modify userspace to create a
> bond or team instead!" FWIW I want to emphasize that the debate for
> what should be the right place to implement this failover facility:
> userspace versus kernel, had been around for almost a decade, and no
> real work ever happened in userspace to "standardize" this in the
> Linux world.

Let me offer you my very subjective opinion of why "no real work ever
happened in user space".  The actors who have primary interest to get
the auto-bonding working are HW vendors trying to either convince
customers to use SR-IOV, or being pressured by customers to make SR-IOV
easier to consume.  HW vendors hire driver developers, not user space
developers.  So the solution we arrive at is in the kernel for a non
technical reason (Conway's law, sort of).

$ cd NetworkManager/
$ git log --pretty=format:"%ae" | \
grep '\(mellanox\|intel\|broadcom\|netronome\)' | sort | uniq -c
 81 andrew.zaborow...@intel.com
  2 david.woodho...@intel.com
  2 ismo.puusti...@intel.com
  1 michael.i.dohe...@intel.com

Andrew works on WiFi.

I asked the NetworkManager folks to implement this feature last
year when net_failover got dangerously close to getting merged, and
they said they were never approached with this request before, much less
offered code that solves it.  Unfortunately before they got around to it
net_failover was merged already, and they didn't proceed.

So to my knowledge nobody ever tried to solve this in user space.
I don't think net_failover is particularly terrible, or that renaming
of primary in the kernel is the end of the world, but I'd appreciate it if
you could point me to efforts to solve it upstream in user space
components, or acknowledge that nobody actually tried that.


Re: [virtio-dev] Re: net_failover slave udev renaming (was Re: [RFC PATCH net-next v6 4/4] netvsc: refactor notifier/event handling code to use the bypass framework)

2019-02-28 Thread Siwei Liu
On Thu, Feb 28, 2019 at 11:56 AM Jakub Kicinski  wrote:
>
> On Thu, 28 Feb 2019 14:36:56 -0500, Michael S. Tsirkin wrote:
> > > It is a bit of a the chicken or the egg situation ;)  But users can
> > > just blacklist, too.  Anyway, I think this is far better than module
> > > parameters
> >
> > Sorry I'm a bit confused. What is better than what?
>
> I mean that blacklist net_failover or module param to disable
> net_failover and handle in user space are better than trying to solve
> the renaming at kernel level (either by adding module params that make
> the kernel rename devices or letting user space change names of running
> devices if they are slaves).

Before I was asked to revive this old mail thread, I knew the
discussion could end up with something like this. Yes, theoretically
there's a point: basically you don't believe the kernel should take the
risk of fixing the issue, so you push the hope back onto something
hypothetical that actually wasn't done and is hard to get done in reality.
It's not too different from saying "hey, what you're asking for is
simply wrong, don't do it! Go back and modify userspace to create a
bond or team instead!" FWIW I want to emphasize that the debate over
the right place to implement this failover facility, userspace versus
kernel, has been around for almost a decade, and no real work ever
happened in userspace to "standardize" this in the Linux world. The
truth is that it takes quite a lot of complex work to get it implemented
right in userspace: what Michael mentions about making dracut
auto-bonding aware is just the tip of the iceberg. Basically one would
need to modify all the existing network config tools to work well with
this new auto-bonding concept: handle duplicate MACs, differentiate it
from a regular bond/team, fix the boot-time dependencies of network
boot, etc. Moreover, from a cloud provider's perspective it's not a
single distro's effort, and at least not as simple as saying "just move
it to a daemon in systemd/NM and the work is done". We (Oracle) have
done extensive work in the past year to help align various userspace
components and to work with distro vendors to patch shipped packages so
that they work with the failover 3-netdev model. The work that would
need to be done for userspace auto-bonding is more involved than that,
in return for quite trivial value (just naming?), so I doubt any
userspace developer could be motivated to do it.
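
To make the "handle duplicate MACs" item concrete, here is a minimal sketch of
how a tool might notice that several netdevs share one MAC address, as the
netdevs of the failover 3-netdev model do. The helper names `list_macs` and
`dup_macs` are hypothetical; the only real interface relied on is the
per-device `address` attribute under /sys/class/net. It only detects the
situation and does not decide which device is the master, standby or primary.

```shell
# list_macs [ROOT]
# Emits "<ifname> <mac>" for every netdev found under ROOT
# (defaults to /sys/class/net).
list_macs() {
    root=${1:-/sys/class/net}
    for dev in "$root"/*; do
        [ -r "$dev/address" ] || continue
        printf '%s %s\n' "${dev##*/}" "$(cat "$dev/address")"
    done
}

# dup_macs
# Reads "<ifname> <mac>" lines on stdin and prints one line per MAC that is
# used by more than one interface: "<mac> <ifname> <ifname> ...".
# Interface names keep their input order; output lines are sorted by MAC.
dup_macs() {
    awk '
        {
            if (count[$2]++)
                names[$2] = names[$2] " " $1
            else
                names[$2] = $1
        }
        END {
            for (mac in count)
                if (count[mac] > 1)
                    print mac, names[mac]
        }
    ' | sort
}
```

Typical use would be `list_macs | dup_macs`. A tool that does not expect
shared MACs could use a check like this to avoid treating the slaves as
independent interfaces.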

So, simply put, no, we have zero interest in this direction. If
upstream believes this is the final conclusion, I think we can stop
discussing.

Thanks,
-Siwei
>
> > > for twiddling kernel-based interface naming policy.. :S
> >
> > I see your point. But my point is slave names don't really matter, only
> > master name matters.  So I am not sure there's any policy worth talking
> > about here.
>
> Oh yes, I don't disagree with you, but others seems to want to rename
> the auto-bonded lower devices.  Which can be done trivially if it was
> a daemon in user space instantiating the auto-bond.  We are just
> providing a basic version of auto-bonding in the kernel.  If there are
> extra requirements on policy, or naming - the whole thing is better
> solved in user space.


Re: [virtio-dev] Re: net_failover slave udev renaming (was Re: [RFC PATCH net-next v6 4/4] netvsc: refactor notifier/event handling code to use the bypass framework)

2019-02-28 Thread Jakub Kicinski
On Thu, 28 Feb 2019 15:14:55 -0500, Michael S. Tsirkin wrote:
> On Thu, Feb 28, 2019 at 11:56:41AM -0800, Jakub Kicinski wrote:
> > On Thu, 28 Feb 2019 14:36:56 -0500, Michael S. Tsirkin wrote:  
> > > > It is a bit of a the chicken or the egg situation ;)  But users can
> > > > just blacklist, too.  Anyway, I think this is far better than module
> > > > parameters
> > > 
> > > Sorry I'm a bit confused. What is better than what?  
> > 
> > I mean that blacklist net_failover or module param to disable
> > net_failover and handle in user space are better than trying to solve
> > the renaming at kernel level (either by adding module params that make
> > the kernel rename devices or letting user space change names of running
> > devices if they are slaves).
> >   
> > > > for twiddling kernel-based interface naming policy.. :S
> > > 
> > > I see your point. But my point is slave names don't really matter, only
> > > master name matters.  So I am not sure there's any policy worth talking
> > > about here.  
> > 
> > Oh yes, I don't disagree with you, but others seems to want to rename
> > the auto-bonded lower devices.  Which can be done trivially if it was 
> > a daemon in user space instantiating the auto-bond.  We are just
> > providing a basic version of auto-bonding in the kernel.  If there are
> > extra requirements on policy, or naming - the whole thing is better
> > solved in user space.  
> 
> OK so it seems that you would be happy with a combination of the module
> parameter disabling failover completely and renaming primary in kernel?
> Did I get it right?

Not 100%, I'm personally not convinced that renaming primary in the
kernel is okay.


Re: [virtio-dev] Re: net_failover slave udev renaming (was Re: [RFC PATCH net-next v6 4/4] netvsc: refactor notifier/event handling code to use the bypass framework)

2019-02-28 Thread Michael S. Tsirkin
On Thu, Feb 28, 2019 at 11:56:41AM -0800, Jakub Kicinski wrote:
> On Thu, 28 Feb 2019 14:36:56 -0500, Michael S. Tsirkin wrote:
> > > It is a bit of a the chicken or the egg situation ;)  But users can
> > > just blacklist, too.  Anyway, I think this is far better than module
> > > parameters  
> > 
> > Sorry I'm a bit confused. What is better than what?
> 
> I mean that blacklist net_failover or module param to disable
> net_failover and handle in user space are better than trying to solve
> the renaming at kernel level (either by adding module params that make
> the kernel rename devices or letting user space change names of running
> devices if they are slaves).
> 
> > > for twiddling kernel-based interface naming policy.. :S  
> > 
> > I see your point. But my point is slave names don't really matter, only
> > master name matters.  So I am not sure there's any policy worth talking
> > about here.
> 
> Oh yes, I don't disagree with you, but others seems to want to rename
> the auto-bonded lower devices.  Which can be done trivially if it was 
> a daemon in user space instantiating the auto-bond.  We are just
> providing a basic version of auto-bonding in the kernel.  If there are
> extra requirements on policy, or naming - the whole thing is better
> solved in user space.

OK so it seems that you would be happy with a combination of the module
parameter disabling failover completely and renaming primary in kernel?
Did I get it right?

-- 
MST


Re: [virtio-dev] Re: net_failover slave udev renaming (was Re: [RFC PATCH net-next v6 4/4] netvsc: refactor notifier/event handling code to use the bypass framework)

2019-02-28 Thread Jakub Kicinski
On Thu, 28 Feb 2019 14:36:56 -0500, Michael S. Tsirkin wrote:
> > It is a bit of a the chicken or the egg situation ;)  But users can
> > just blacklist, too.  Anyway, I think this is far better than module
> > parameters  
> 
> Sorry I'm a bit confused. What is better than what?

I mean that blacklisting net_failover, or a module param to disable
net_failover, and handling it in user space is better than trying to solve
the renaming at kernel level (either by adding module params that make
the kernel rename devices or by letting user space change names of running
devices if they are slaves).

> > for twiddling kernel-based interface naming policy.. :S  
> 
> I see your point. But my point is slave names don't really matter, only
> master name matters.  So I am not sure there's any policy worth talking
> about here.

Oh yes, I don't disagree with you, but others seem to want to rename
the auto-bonded lower devices, which could be done trivially if it was
a daemon in user space instantiating the auto-bond.  We are just
providing a basic version of auto-bonding in the kernel.  If there are
extra requirements on policy, or naming, the whole thing is better
solved in user space.


Re: [virtio-dev] Re: net_failover slave udev renaming (was Re: [RFC PATCH net-next v6 4/4] netvsc: refactor notifier/event handling code to use the bypass framework)

2019-02-28 Thread Michael S. Tsirkin
On Thu, Feb 28, 2019 at 10:13:56AM -0800, Jakub Kicinski wrote:
> On Wed, 27 Feb 2019 23:47:33 -0500, Michael S. Tsirkin wrote:
> > On Wed, Feb 27, 2019 at 05:52:18PM -0800, Jakub Kicinski wrote:
> > > > > Can the users who care about the naming put net_failover into
> > > > > "user space will do the bond enslavement" mode, and do the bond
> > > > > creation/management themselves from user space (in systemd/ 
> > > > > Network Manager) based on the failover flag?
> > > > 
> > > > Putting issues of compatibility aside (userspace tends to be confused if
> > > > you give it two devices with same MAC), how would you have it work in
> > > > practice? Timer based hacks like netvsc where if userspace didn't
> > > > respond within X seconds we assume it won't and do everything 
> > > > ourselves?  
> > > 
> > > Well, what I'm saying is basically if user space knows how to deal with
> > > the auto-bonding, we can put aside net_failover for the most part.  It
> > > can either be blacklisted or it can have some knob which will
> > > effectively disable the auto-enslavement.  
> > 
> > OK I guess we could add a module parameter to skip this.
> > Is this what you mean?
> 
> Yup.
> 
> > > Auto-bonding capable user space can do the renames, spawn the bond,
> > > etc. all by itself.  I'm basically going back to my initial proposal
> > > here :)  There is a RedHat bugzilla for the NetworkManager team to do
> > > this, but we merged net_failover before those folks got around to
> > > implementing it.  
> > 
> > In particular because there's no policy involved whatsoever
> > here so it's just mechanism being pushed up to userspace.
> > 
> > > IOW if NM/systemd is capable of doing the auto-bonding itself it can
> > > disable the kernel mechanism and take care of it all.  If kernel is
> > > booted with an old user space which doesn't have capable NM/systemd -
> > > net_failover will kick in and do its best.  
> > 
> > Sure - it's just 2 lines of code, see below.
> > 
> > Signed-off-by: Michael S. Tsirkin 
> > 
> > But I don't intend to bother until there's actual interest from
> > userspace developers to bother. In particular it is not just NM/systemd
> > even on Fedora - e.g. you will need to teach dracut to somehow detect
> > and handle this - right now it gets confused if there are two devices
> > with same MAC addresses.
> 
> It is a bit of a the chicken or the egg situation ;)  But users can
> just blacklist, too.  Anyway, I think this is far better than module
> parameters

Sorry I'm a bit confused. What is better than what?

> for twiddling kernel-based interface naming policy.. :S

I see your point. But my point is slave names don't really matter, only
master name matters.  So I am not sure there's any policy worth talking
about here.

> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 955b3e76eb8d..dd2b2c370003 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -43,6 +43,7 @@ static bool csum = true, gso = true, napi_tx;
> >  module_param(csum, bool, 0444);
> >  module_param(gso, bool, 0444);
> >  module_param(napi_tx, bool, 0644);
> > +module_param(disable_failover, bool, 0644);
> >  
> >  /* FIXME: MTU in config. */
> >  #define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN)
> > @@ -3163,6 +3164,7 @@ static int virtnet_probe(struct virtio_device *vdev)
> >     virtnet_init_settings(dev);
> >  
> > -   if (virtio_has_feature(vdev, VIRTIO_NET_F_STANDBY)) {
> > +   if (virtio_has_feature(vdev, VIRTIO_NET_F_STANDBY) &&
> > +       !disable_failover) {
> >             vi->failover = net_failover_create(vi->dev);
> >             if (IS_ERR(vi->failover)) {
> >                     err = PTR_ERR(vi->failover);
> > 


Re: [virtio-dev] Re: net_failover slave udev renaming (was Re: [RFC PATCH net-next v6 4/4] netvsc: refactor notifier/event handling code to use the bypass framework)

2019-02-28 Thread Jakub Kicinski
On Wed, 27 Feb 2019 23:47:33 -0500, Michael S. Tsirkin wrote:
> On Wed, Feb 27, 2019 at 05:52:18PM -0800, Jakub Kicinski wrote:
> > > > Can the users who care about the naming put net_failover into
> > > > "user space will do the bond enslavement" mode, and do the bond
> > > > creation/management themselves from user space (in systemd/ 
> > > > Network Manager) based on the failover flag?
> > > 
> > > Putting issues of compatibility aside (userspace tends to be confused if
> > > you give it two devices with same MAC), how would you have it work in
> > > practice? Timer based hacks like netvsc where if userspace didn't
> > > respond within X seconds we assume it won't and do everything ourselves?  
> > 
> > Well, what I'm saying is basically if user space knows how to deal with
> > the auto-bonding, we can put aside net_failover for the most part.  It
> > can either be blacklisted or it can have some knob which will
> > effectively disable the auto-enslavement.  
> 
> OK I guess we could add a module parameter to skip this.
> Is this what you mean?

Yup.

> > Auto-bonding capable user space can do the renames, spawn the bond,
> > etc. all by itself.  I'm basically going back to my initial proposal
> > here :)  There is a RedHat bugzilla for the NetworkManager team to do
> > this, but we merged net_failover before those folks got around to
> > implementing it.  
> 
> In particular because there's no policy involved whatsoever
> here so it's just mechanism being pushed up to userspace.
> 
> > IOW if NM/systemd is capable of doing the auto-bonding itself it can
> > disable the kernel mechanism and take care of it all.  If kernel is
> > booted with an old user space which doesn't have capable NM/systemd -
> > net_failover will kick in and do its best.  
> 
> Sure - it's just 2 lines of code, see below.
> 
> Signed-off-by: Michael S. Tsirkin 
> 
> But I don't intend to bother until there's actual interest from
> userspace developers to bother. In particular it is not just NM/systemd
> even on Fedora - e.g. you will need to teach dracut to somehow detect
> and handle this - right now it gets confused if there are two devices
> with same MAC addresses.

It is a bit of a chicken-and-egg situation ;)  But users can
just blacklist, too.  Anyway, I think this is far better than module
parameters for twiddling kernel-based interface naming policy. :S

> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 955b3e76eb8d..dd2b2c370003 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -43,6 +43,7 @@ static bool csum = true, gso = true, napi_tx;
>  module_param(csum, bool, 0444);
>  module_param(gso, bool, 0444);
>  module_param(napi_tx, bool, 0644);
> +module_param(disable_failover, bool, 0644);
>  
>  /* FIXME: MTU in config. */
>  #define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN)
> @@ -3163,6 +3164,7 @@ static int virtnet_probe(struct virtio_device *vdev)
>   virtnet_init_settings(dev);
>  
> - if (virtio_has_feature(vdev, VIRTIO_NET_F_STANDBY)) {
> + if (virtio_has_feature(vdev, VIRTIO_NET_F_STANDBY) &&
> + !disable_failover) {
>   vi->failover = net_failover_create(vi->dev);
>   if (IS_ERR(vi->failover)) {
>   err = PTR_ERR(vi->failover);
> 


Re: [virtio-dev] Re: net_failover slave udev renaming (was Re: [RFC PATCH net-next v6 4/4] netvsc: refactor notifier/event handling code to use the bypass framework)

2019-02-28 Thread Michael S. Tsirkin
On Thu, Feb 28, 2019 at 01:32:12AM -0800, si-wei liu wrote:
> > > Will the
> > > change break userspace further?
> > > 
> > > -Siwei
> > Didn't you show userspace is already broken? You can't "further
> > break it", rename already fails.
> It's a race: userspace tends to give the slave a user(space)-desired name, but
> sometimes fails due to this race. Today, if the failover master is not up,
> rename would succeed anyway. What you proposed, if I understand you correctly,
> prohibits the user from providing a name in all circumstances. That's
> what I meant by breaking userspace further. On the other hand, you seem to
> tighten the kernel default naming to udev predictable names, which are
> derived only from recent systemd-udevd, while many other possible
> userspace naming schemes exist. Users today who deliberately choose
> to disable predictable naming (net.ifnames=0 biosdevname=0) and fall back to
> kernel-provided names would expect the ethX pattern; with this change,
> admin/user scripts which match the ethX pattern could potentially break.
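
The race described in the quoted text can be modeled in a few lines of
userspace C. This is only a sketch under stated assumptions: struct
netdev_model, model_rename() and model_failover_enslave() are hypothetical
names, not kernel APIs, and the one kernel behaviour assumed is that a rename
request on an interface that is already up is refused with EBUSY, which is
how the rename failure shows up when the slave has been opened early.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_IFNAMSIZ 16	/* mirrors the kernel's IFNAMSIZ */

/* Hypothetical, simplified stand-in for a network device. */
struct netdev_model {
	char name[MODEL_IFNAMSIZ];
	bool up;
};

/* Models a rename request from udev: refused while the device is up. */
static int model_rename(struct netdev_model *dev, const char *newname)
{
	if (dev->up)
		return -EBUSY;
	if (strlen(newname) >= MODEL_IFNAMSIZ)
		return -EINVAL;
	strcpy(dev->name, newname);
	return 0;
}

/*
 * Models in-kernel auto-enslavement: the slave is opened right away,
 * but only if the failover master is already up.
 */
static void model_failover_enslave(struct netdev_model *slave, bool master_up)
{
	if (master_up)
		slave->up = true;
}
```

If the enslavement runs with the master up before udev gets to the device,
model_rename() returns -EBUSY and the name stays eth0; with the master down,
the same rename succeeds. That is the "sometimes fails" behaviour the quoted
text describes.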

Whatever crashes with a name not matching ethX will crash on the
standby interface *anyway*.

So I think what you are saying is that someone might have already
written scripts and gotten them to work on v4.17 when STANDBY was
included and these scripts rely on ethX. Now these scripts
will break.

Maybe it is still early enough (just half a year passed) that the
number of these users would be small.  So how about a kernel config
option and maybe a module parameter to rename the primary?  People can
then opt in to the old broken behaviour.
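
To make the "scripts rely on ethX" concern concrete, here is a rough sketch
of the kind of check such a script might perform. The function name
matches_ethx() is made up for illustration, and "eth0nsby" is only a
hypothetical name following the XXXnsby scheme discussed elsewhere in this
thread.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Returns true if name is "eth" followed by one or more decimal digits. */
static bool matches_ethx(const char *name)
{
	const char *p;

	if (strncmp(name, "eth", 3) != 0)
		return false;
	p = name + 3;
	if (*p == '\0')
		return false;	/* bare "eth" has no index */
	for (; *p != '\0'; p++) {
		if (!isdigit((unsigned char)*p))
			return false;
	}
	return true;
}
```

A strict matcher like this accepts eth0 and eth12 but rejects ens4 and
eth0nsby, so a script built around it would stop seeing a renamed primary.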

-- 
MST


Re: [virtio-dev] Re: net_failover slave udev renaming (was Re: [RFC PATCH net-next v6 4/4] netvsc: refactor notifier/event handling code to use the bypass framework)

2019-02-27 Thread Michael S. Tsirkin
On Wed, Feb 27, 2019 at 05:52:18PM -0800, Jakub Kicinski wrote:
> On Wed, 27 Feb 2019 20:26:02 -0500, Michael S. Tsirkin wrote:
> > On Wed, Feb 27, 2019 at 04:52:05PM -0800, Jakub Kicinski wrote:
> > > On Wed, 27 Feb 2019 19:41:32 -0500, Michael S. Tsirkin wrote:  
> > > > > As this scheme adds much complexity to the kernel naming convention
> > > > > (currently it's just ethX names) that no userspace can understand.
> > > > 
> > > > Anything that pokes at slaves needs to be specially designed anyway.
> > > > Naming seems like a minor issue.  
> > > 
> > > Can the users who care about the naming put net_failover into
> > > "user space will do the bond enslavement" mode, and do the bond
> > > creation/management themselves from user space (in systemd/ 
> > > Network Manager) based on the failover flag?  
> > 
> > Putting issues of compatibility aside (userspace tends to be confused if
> > you give it two devices with same MAC), how would you have it work in
> > practice? Timer based hacks like netvsc where if userspace didn't
> > respond within X seconds we assume it won't and do everything ourselves?
> 
> Well, what I'm saying is basically if user space knows how to deal with
> the auto-bonding, we can put aside net_failover for the most part.  It
> can either be blacklisted or it can have some knob which will
> effectively disable the auto-enslavement.

OK I guess we could add a module parameter to skip this.
Is this what you mean?

> Auto-bonding capable user space can do the renames, spawn the bond,
> etc. all by itself.  I'm basically going back to my initial proposal
> here :)  There is a RedHat bugzilla for the NetworkManager team to do
> this, but we merged net_failover before those folks got around to
> implementing it.

In particular because there's no policy involved whatsoever
here so it's just mechanism being pushed up to userspace.

> IOW if NM/systemd is capable of doing the auto-bonding itself it can
> disable the kernel mechanism and take care of it all.  If kernel is
> booted with an old user space which doesn't have capable NM/systemd -
> net_failover will kick in and do its best.

Sure - it's just 2 lines of code, see below.

Signed-off-by: Michael S. Tsirkin 

But I don't intend to bother until userspace developers show actual
interest. In particular it is not just NM/systemd
even on Fedora - e.g. you will need to teach dracut to somehow detect
and handle this - right now it gets confused if there are two devices
with the same MAC address.

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 955b3e76eb8d..dd2b2c370003 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -43,6 +43,7 @@ static bool csum = true, gso = true, napi_tx;
 module_param(csum, bool, 0444);
 module_param(gso, bool, 0444);
 module_param(napi_tx, bool, 0644);
+module_param(disable_failover, bool, 0644);
 
 /* FIXME: MTU in config. */
 #define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN)
@@ -3163,6 +3164,7 @@ static int virtnet_probe(struct virtio_device *vdev)
 	virtnet_init_settings(dev);
 
-	if (virtio_has_feature(vdev, VIRTIO_NET_F_STANDBY)) {
+	if (virtio_has_feature(vdev, VIRTIO_NET_F_STANDBY) &&
+	    !disable_failover) {
 		vi->failover = net_failover_create(vi->dev);
 		if (IS_ERR(vi->failover)) {
 			err = PTR_ERR(vi->failover);
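
The gating condition in the diff above can be exercised outside the kernel
with a small model. This is a sketch, not driver code:
should_create_failover() is a made-up helper, and the bool argument stands in
for virtio_has_feature(vdev, VIRTIO_NET_F_STANDBY). Note that the diff as
posted does not show a declaration for disable_failover; module_param() with
type bool needs a matching bool variable, so a complete patch would
presumably add something like the static declaration below.

```c
#include <stdbool.h>

/*
 * Stand-in for the proposed module parameter. Assumption: the real
 * patch would declare it as a static bool next to csum/gso/napi_tx.
 */
static bool disable_failover;

/*
 * Mirrors the condition in the patched virtnet_probe(): the failover
 * master is created only when the device offers STANDBY and the
 * administrator has not opted out.
 */
static bool should_create_failover(bool has_standby_feature)
{
	return has_standby_feature && !disable_failover;
}
```

With the parameter left at its default (false), behaviour is unchanged: a
device offering STANDBY still gets a failover master. Setting it to true
leaves the standby virtio-net device as a plain netdev that user space can
rename and bond on its own.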



Re: [virtio-dev] Re: net_failover slave udev renaming (was Re: [RFC PATCH net-next v6 4/4] netvsc: refactor notifier/event handling code to use the bypass framework)

2019-02-27 Thread Jakub Kicinski
On Wed, 27 Feb 2019 20:26:02 -0500, Michael S. Tsirkin wrote:
> On Wed, Feb 27, 2019 at 04:52:05PM -0800, Jakub Kicinski wrote:
> > On Wed, 27 Feb 2019 19:41:32 -0500, Michael S. Tsirkin wrote:  
> > > > As this scheme adds much complexity to the kernel naming convention
> > > > (currently it's just ethX names) that no userspace can understand.
> > > 
> > > Anything that pokes at slaves needs to be specially designed anyway.
> > > Naming seems like a minor issue.  
> > 
> > Can the users who care about the naming put net_failover into
> > "user space will do the bond enslavement" mode, and do the bond
> > creation/management themselves from user space (in systemd/ 
> > Network Manager) based on the failover flag?  
> 
> Putting issues of compatibility aside (userspace tends to be confused if
> you give it two devices with same MAC), how would you have it work in
> practice? Timer based hacks like netvsc where if userspace didn't
> respond within X seconds we assume it won't and do everything ourselves?

Well, what I'm saying is basically if user space knows how to deal with
the auto-bonding, we can put aside net_failover for the most part.  It
can either be blacklisted or it can have some knob which will
effectively disable the auto-enslavement.

Auto-bonding capable user space can do the renames, spawn the bond,
etc. all by itself.  I'm basically going back to my initial proposal
here :)  There is a RedHat bugzilla for the NetworkManager team to do
this, but we merged net_failover before those folks got around to
implementing it.

IOW if NM/systemd is capable of doing the auto-bonding itself it can
disable the kernel mechanism and take care of it all.  If kernel is
booted with an old user space which doesn't have capable NM/systemd -
net_failover will kick in and do its best.


Re: [virtio-dev] Re: net_failover slave udev renaming (was Re: [RFC PATCH net-next v6 4/4] netvsc: refactor notifier/event handling code to use the bypass framework)

2019-02-27 Thread Michael S. Tsirkin
On Wed, Feb 27, 2019 at 04:52:05PM -0800, Jakub Kicinski wrote:
> On Wed, 27 Feb 2019 19:41:32 -0500, Michael S. Tsirkin wrote:
> > > As this scheme adds much complexity to the kernel naming convention
> > > (currently it's just ethX names) that no userspace can understand.  
> > 
> > Anything that pokes at slaves needs to be specially designed anyway.
> > Naming seems like a minor issue.
> 
> Can the users who care about the naming put net_failover into
> "user space will do the bond enslavement" mode, and do the bond
> creation/management themselves from user space (in systemd/ 
> Network Manager) based on the failover flag?

Putting issues of compatibility aside (userspace tends to be confused if
you give it two devices with same MAC), how would you have it work in
practice? Timer based hacks like netvsc where if userspace didn't
respond within X seconds we assume it won't and do everything ourselves?
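
For illustration, a user-space tool that wanted to cope with two devices
sharing a MAC would first have to detect the pair. A minimal sketch, assuming
a hypothetical struct iface_model and helper find_mac_twin() (neither comes
from any real tool):

```c
#include <stddef.h>
#include <string.h>

#define MAC_LEN 6

/* Hypothetical, simplified view of an interface as seen by user space. */
struct iface_model {
	const char *name;
	unsigned char mac[MAC_LEN];
};

/*
 * Returns the index of the first other interface that has the same MAC
 * as ifs[idx], or -1 if the MAC is unique in the list.
 */
static int find_mac_twin(const struct iface_model *ifs, size_t n, size_t idx)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (i != idx && memcmp(ifs[i].mac, ifs[idx].mac, MAC_LEN) == 0)
			return (int)i;
	}
	return -1;
}
```

Finding the twin is the easy part; a tool that assumes a MAC identifies
exactly one device still has to decide which of the two to configure, which
is the confusion referred to above.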

-- 
MST


Re: [virtio-dev] Re: net_failover slave udev renaming (was Re: [RFC PATCH net-next v6 4/4] netvsc: refactor notifier/event handling code to use the bypass framework)

2019-02-27 Thread Jakub Kicinski
On Wed, 27 Feb 2019 19:41:32 -0500, Michael S. Tsirkin wrote:
> > As this scheme adds much complexity to the kernel naming convention
> > (currently it's just ethX names) that no userspace can understand.  
> 
> Anything that pokes at slaves needs to be specially designed anyway.
> Naming seems like a minor issue.

Can the users who care about the naming put net_failover into
"user space will do the bond enslavement" mode, and do the bond
creation/management themselves from user space (in systemd/ 
Network Manager) based on the failover flag?


Re: [virtio-dev] Re: net_failover slave udev renaming (was Re: [RFC PATCH net-next v6 4/4] netvsc: refactor notifier/event handling code to use the bypass framework)

2019-02-27 Thread Michael S. Tsirkin
On Wed, Feb 27, 2019 at 04:38:00PM -0800, si-wei liu wrote:
> 
> 
> On 2/27/2019 3:50 PM, Michael S. Tsirkin wrote:
> > On Wed, Feb 27, 2019 at 03:34:56PM -0800, si-wei liu wrote:
> > > 
> > > On 2/27/2019 2:38 PM, Michael S. Tsirkin wrote:
> > > > On Tue, Feb 26, 2019 at 04:17:21PM -0800, si-wei liu wrote:
> > > > > On 2/25/2019 6:08 PM, Michael S. Tsirkin wrote:
> > > > > > On Mon, Feb 25, 2019 at 04:58:07PM -0800, si-wei liu wrote:
> > > > > > > On 2/22/2019 7:14 AM, Michael S. Tsirkin wrote:
> > > > > > > > On Thu, Feb 21, 2019 at 11:55:11PM -0800, si-wei liu wrote:
> > > > > > > > > On 2/21/2019 11:00 PM, Samudrala, Sridhar wrote:
> > > > > > > > > > On 2/21/2019 7:33 PM, si-wei liu wrote:
> > > > > > > > > > > On 2/21/2019 5:39 PM, Michael S. Tsirkin wrote:
> > > > > > > > > > > > On Thu, Feb 21, 2019 at 05:14:44PM -0800, Siwei Liu 
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > > Sorry for replying to this ancient thread. There was 
> > > > > > > > > > > > > some remaining
> > > > > > > > > > > > > issue that I don't think the initial net_failover 
> > > > > > > > > > > > > patch got addressed
> > > > > > > > > > > > > cleanly, see:
> > > > > > > > > > > > > 
> > > > > > > > > > > > > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1815268
> > > > > > > > > > > > > 
> > > > > > > > > > > > > The renaming of 'eth0' to 'ens4' fails because the 
> > > > > > > > > > > > > udev userspace was
> > > > > > > > > > > > > not specifically writtten for such kernel automatic 
> > > > > > > > > > > > > enslavement.
> > > > > > > > > > > > > Specifically, if it is a bond or team, the slave 
> > > > > > > > > > > > > would typically get
> > > > > > > > > > > > > renamed *before* virtual device gets created, that's 
> > > > > > > > > > > > > what udev can
> > > > > > > > > > > > > control (without getting netdev opened early by the 
> > > > > > > > > > > > > other part of
> > > > > > > > > > > > > kernel) and other userspace components for e.g. 
> > > > > > > > > > > > > initramfs,
> > > > > > > > > > > > > init-scripts can coordinate well in between. The 
> > > > > > > > > > > > > in-kernel
> > > > > > > > > > > > > auto-enslavement of net_failover breaks this 
> > > > > > > > > > > > > userspace convention,
> > > > > > > > > > > > > which don't provides a solution if user care about 
> > > > > > > > > > > > > consistent naming
> > > > > > > > > > > > > on the slave netdevs specifically.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Previously this issue had been specifically called 
> > > > > > > > > > > > > out when IFF_HIDDEN
> > > > > > > > > > > > > and the 1-netdev was proposed, but no one gives out a 
> > > > > > > > > > > > > solution to this
> > > > > > > > > > > > > problem ever since. Please share your mind how to 
> > > > > > > > > > > > > proceed and solve
> > > > > > > > > > > > > this userspace issue if netdev does not welcome a 
> > > > > > > > > > > > > 1-netdev model.
> > > > > > > > > > > > Above says:
> > > > > > > > > > > > 
> > > > > > > > > > > > there's no motivation in the systemd/udevd 
> > > > > > > > > > > > community at
> > > > > > > > > > > > this point to refactor the rename logic and 
> > > > > > > > > > > > make it work well with
> > > > > > > > > > > > 3-netdev.
> > > > > > > > > > > > 
> > > > > > > > > > > > What would the fix be? Skip slave devices?
> > > > > > > > > > > > 
> > > > > > > > > > > There's nothing user can get if just skipping slave 
> > > > > > > > > > > devices - the
> > > > > > > > > > > name is still unchanged and unpredictable e.g. eth0, or 
> > > > > > > > > > > eth1 the
> > > > > > > > > > > next reboot, while the rest may conform to the naming 
> > > > > > > > > > > scheme (ens3
> > > > > > > > > > > and such). There's no way one can fix this in userspace 
> > > > > > > > > > > alone - when
> > > > > > > > > > > the failover is created the enslaved netdev was opened by 
> > > > > > > > > > > the kernel
> > > > > > > > > > > earlier than the userspace is made aware of, and there's 
> > > > > > > > > > > no
> > > > > > > > > > > negotiation protocol for kernel to know when userspace 
> > > > > > > > > > > has done
> > > > > > > > > > > initial renaming of the interface. I would expect netdev 
> > > > > > > > > > > list should
> > > > > > > > > > > at least provide the direction in general for how this 
> > > > > > > > > > > can be
> > > > > > > > > > > solved...
> > > > > > > > I was just wondering what did you mean when you said
> > > > > > > > "refactor the rename logic and make it work well with 3-netdev" 
> > > > > > > > -
> > > > > > > > was there a proposal udev rejected?
> > > > > > > No. I never believed this particular issue can be fixed in 
> > > > > > > userspace alone.
> > > > > > > Previously someone had said it could be, but I never see any work 
> > > > > > > or
> > > > > > > relevant discussion ever happened in various userspace 
> > > > > > > communities (for e.g.
> > > > > > > dracut, initramfs-to

Re: [virtio-dev] Re: net_failover slave udev renaming (was Re: [RFC PATCH net-next v6 4/4] netvsc: refactor notifier/event handling code to use the bypass framework)

2019-02-27 Thread Michael S. Tsirkin
On Wed, Feb 27, 2019 at 04:03:42PM -0800, Stephen Hemminger wrote:
> > With this approach kernel will deny attempts by userspace to rename
> > slaves.  Slaves will always be named XXXnsby and XXXnpry. Master renames
> > will rename both slaves.
> > 
> > It seems pretty solid to me, the only issue is that in theory userspace
> > can use a name like XXXnsby for something else. But this seems unlikely.
> 
> Similar schemes (with kernel providing naming) were also previously rejected
> upstream.

Links?
I'm inclined to try and see what happens.

> It has been a consistent theme that the kernel should not be in
> the renaming business.

In this case it's not in the renaming business per se. The only reason
we even have the original name is due to the way internal APIs
work. You can look at it as simply having the slave names be
part of the master.

> It will certainly break userspace.

That's a strong claim. What is it based on?  It so happens that
userspace renaming slaves is already broken on virtio. So we can fix it
any way we like :)

And yes, it won't help netvsc, because netvsc wants compatibility with old
scripts, but then netvsc uses a 2-device model anyway.
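
A rough userspace sketch of the naming scheme under discussion, where slave
names are derived from the master name plus a fixed suffix. The helper
derive_slave_name() is hypothetical, and the suffixes "nsby" and "npry" are
taken from the quoted proposal. The one hard fact relied on is that Linux
interface names are limited to IFNAMSIZ (16) bytes including the terminating
NUL.

```c
#include <stddef.h>
#include <stdio.h>

#define MODEL_IFNAMSIZ 16	/* mirrors the kernel's IFNAMSIZ */

/*
 * Builds "<master><suffix>" into out. Returns 0 on success, or -1 if
 * the result would not fit in outlen bytes including the NUL.
 */
static int derive_slave_name(char *out, size_t outlen,
			     const char *master, const char *suffix)
{
	int n = snprintf(out, outlen, "%s%s", master, suffix);

	if (n < 0 || (size_t)n >= outlen)
		return -1;
	return 0;
}
```

With a MODEL_IFNAMSIZ-sized buffer, master "ens4" and suffix "nsby" give
"ens4nsby". A master name longer than 11 characters leaves no room for a
4-character suffix, so the sketch reports an error rather than truncating,
since truncated names could collide.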

-- 
MST


Re: [virtio-dev] Re: net_failover slave udev renaming (was Re: [RFC PATCH net-next v6 4/4] netvsc: refactor notifier/event handling code to use the bypass framework)

2019-02-27 Thread Stephen Hemminger
On Wed, 27 Feb 2019 18:50:44 -0500
"Michael S. Tsirkin"  wrote:

> On Wed, Feb 27, 2019 at 03:34:56PM -0800, si-wei liu wrote:
> > 
> > 
> > On 2/27/2019 2:38 PM, Michael S. Tsirkin wrote:  
> > > On Tue, Feb 26, 2019 at 04:17:21PM -0800, si-wei liu wrote:  
> > > > 
> > > > On 2/25/2019 6:08 PM, Michael S. Tsirkin wrote:  
> > > > > On Mon, Feb 25, 2019 at 04:58:07PM -0800, si-wei liu wrote:  
> > > > > > On 2/22/2019 7:14 AM, Michael S. Tsirkin wrote:  
> > > > > > > On Thu, Feb 21, 2019 at 11:55:11PM -0800, si-wei liu wrote:  
> > > > > > > > On 2/21/2019 11:00 PM, Samudrala, Sridhar wrote:  
> > > > > > > > > On 2/21/2019 7:33 PM, si-wei liu wrote:  
> > > > > > > > > > On 2/21/2019 5:39 PM, Michael S. Tsirkin wrote:  
> > > > > > > > > > > On Thu, Feb 21, 2019 at 05:14:44PM -0800, Siwei Liu 
> > > > > > > > > > > wrote:  
> > > > > > > > > > > > Sorry for replying to this ancient thread. There was 
> > > > > > > > > > > > some remaining
> > > > > > > > > > > > issue that I don't think the initial net_failover patch 
> > > > > > > > > > > > got addressed
> > > > > > > > > > > > cleanly, see:
> > > > > > > > > > > > 
> > > > > > > > > > > > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1815268
> > > > > > > > > > > > 
> > > > > > > > > > > > The renaming of 'eth0' to 'ens4' fails because the udev 
> > > > > > > > > > > > userspace was
> > > > > > > > > > > > not specifically writtten for such kernel automatic 
> > > > > > > > > > > > enslavement.
> > > > > > > > > > > > Specifically, if it is a bond or team, the slave would 
> > > > > > > > > > > > typically get
> > > > > > > > > > > > renamed *before* virtual device gets created, that's 
> > > > > > > > > > > > what udev can
> > > > > > > > > > > > control (without getting netdev opened early by the 
> > > > > > > > > > > > other part of
> > > > > > > > > > > > kernel) and other userspace components for e.g. 
> > > > > > > > > > > > initramfs,
> > > > > > > > > > > > init-scripts can coordinate well in between. The 
> > > > > > > > > > > > in-kernel
> > > > > > > > > > > > auto-enslavement of net_failover breaks this userspace 
> > > > > > > > > > > > convention,
> > > > > > > > > > > > which don't provides a solution if user care about 
> > > > > > > > > > > > consistent naming
> > > > > > > > > > > > on the slave netdevs specifically.
> > > > > > > > > > > > 
> > > > > > > > > > > > Previously this issue had been specifically called out 
> > > > > > > > > > > > when IFF_HIDDEN
> > > > > > > > > > > > and the 1-netdev was proposed, but no one gives out a 
> > > > > > > > > > > > solution to this
> > > > > > > > > > > > problem ever since. Please share your mind how to 
> > > > > > > > > > > > proceed and solve
> > > > > > > > > > > > this userspace issue if netdev does not welcome a 
> > > > > > > > > > > > 1-netdev model.  
> > > > > > > > > > > Above says:
> > > > > > > > > > > 
> > > > > > > > > > >there's no motivation in the systemd/udevd 
> > > > > > > > > > > community at
> > > > > > > > > > >this point to refactor the rename logic and make 
> > > > > > > > > > > it work well with
> > > > > > > > > > >3-netdev.
> > > > > > > > > > > 
> > > > > > > > > > > What would the fix be? Skip slave devices?
> > > > > > > > > > >   
> > > > > > > > > > There's nothing user can get if just skipping slave devices 
> > > > > > > > > > - the
> > > > > > > > > > name is still unchanged and unpredictable e.g. eth0, or 
> > > > > > > > > > eth1 the
> > > > > > > > > > next reboot, while the rest may conform to the naming 
> > > > > > > > > > scheme (ens3
> > > > > > > > > > and such). There's no way one can fix this in userspace 
> > > > > > > > > > alone - when
> > > > > > > > > > the failover is created the enslaved netdev was opened by 
> > > > > > > > > > the kernel
> > > > > > > > > > earlier than the userspace is made aware of, and there's no
> > > > > > > > > > negotiation protocol for kernel to know when userspace has 
> > > > > > > > > > done
> > > > > > > > > > initial renaming of the interface. I would expect netdev 
> > > > > > > > > > list should
> > > > > > > > > > at least provide the direction in general for how this can 
> > > > > > > > > > be
> > > > > > > > > > solved...  
> > > > > > > I was just wondering what did you mean when you said
> > > > > > > "refactor the rename logic and make it work well with 3-netdev" -
> > > > > > > was there a proposal udev rejected?  
> > > > > > No. I never believed this particular issue can be fixed in 
> > > > > > userspace alone.
> > > > > > Previously someone had said it could be, but I never see any work or
> > > > > > relevant discussion ever happened in various userspace communities 
> > > > > > (for e.g.
> > > > > > dracut, initramfs-tools, systemd, udev, and NetworkManager). IMHO 
> > > > > > the root
> > > > > > of the issue derives from the kernel, it makes more sense to start 
> > > > > > from
> > > > > > netdev, work out and decide on a solution: see what can be don

Re: [virtio-dev] Re: net_failover slave udev renaming (was Re: [RFC PATCH net-next v6 4/4] netvsc: refactor notifier/event handling code to use the bypass framework)

2019-02-27 Thread Michael S. Tsirkin
On Wed, Feb 27, 2019 at 03:34:56PM -0800, si-wei liu wrote:
> 
> 
> On 2/27/2019 2:38 PM, Michael S. Tsirkin wrote:
> > On Tue, Feb 26, 2019 at 04:17:21PM -0800, si-wei liu wrote:
> > > 
> > > On 2/25/2019 6:08 PM, Michael S. Tsirkin wrote:
> > > > On Mon, Feb 25, 2019 at 04:58:07PM -0800, si-wei liu wrote:
> > > > > On 2/22/2019 7:14 AM, Michael S. Tsirkin wrote:
> > > > > > On Thu, Feb 21, 2019 at 11:55:11PM -0800, si-wei liu wrote:
> > > > > > > On 2/21/2019 11:00 PM, Samudrala, Sridhar wrote:
> > > > > > > > On 2/21/2019 7:33 PM, si-wei liu wrote:
> > > > > > > > > On 2/21/2019 5:39 PM, Michael S. Tsirkin wrote:
> > > > > > > > > > On Thu, Feb 21, 2019 at 05:14:44PM -0800, Siwei Liu wrote:
> > > > > > > > > > > Sorry for replying to this ancient thread. There was some 
> > > > > > > > > > > remaining
> > > > > > > > > > > issue that I don't think the initial net_failover patch 
> > > > > > > > > > > got addressed
> > > > > > > > > > > cleanly, see:
> > > > > > > > > > > 
> > > > > > > > > > > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1815268
> > > > > > > > > > > 
> > > > > > > > > > > The renaming of 'eth0' to 'ens4' fails because the udev 
> > > > > > > > > > > userspace was
> > > > > > > > > > > not specifically writtten for such kernel automatic 
> > > > > > > > > > > enslavement.
> > > > > > > > > > > Specifically, if it is a bond or team, the slave would 
> > > > > > > > > > > typically get
> > > > > > > > > > > renamed *before* virtual device gets created, that's what 
> > > > > > > > > > > udev can
> > > > > > > > > > > control (without getting netdev opened early by the other 
> > > > > > > > > > > part of
> > > > > > > > > > > kernel) and other userspace components for e.g. initramfs,
> > > > > > > > > > > init-scripts can coordinate well in between. The in-kernel
> > > > > > > > > > > auto-enslavement of net_failover breaks this userspace 
> > > > > > > > > > > convention,
> > > > > > > > > > > which don't provides a solution if user care about 
> > > > > > > > > > > consistent naming
> > > > > > > > > > > on the slave netdevs specifically.
> > > > > > > > > > > 
> > > > > > > > > > > Previously this issue had been specifically called out 
> > > > > > > > > > > when IFF_HIDDEN
> > > > > > > > > > > and the 1-netdev was proposed, but no one gives out a 
> > > > > > > > > > > solution to this
> > > > > > > > > > > problem ever since. Please share your mind how to proceed 
> > > > > > > > > > > and solve
> > > > > > > > > > > this userspace issue if netdev does not welcome a 
> > > > > > > > > > > 1-netdev model.
> > > > > > > > > > Above says:
> > > > > > > > > > 
> > > > > > > > > >there's no motivation in the systemd/udevd community 
> > > > > > > > > > at
> > > > > > > > > >this point to refactor the rename logic and make it 
> > > > > > > > > > work well with
> > > > > > > > > >3-netdev.
> > > > > > > > > > 
> > > > > > > > > > What would the fix be? Skip slave devices?
> > > > > > > > > > 
> > > > > > > > > There's nothing user can get if just skipping slave devices - 
> > > > > > > > > the
> > > > > > > > > name is still unchanged and unpredictable e.g. eth0, or eth1 
> > > > > > > > > the
> > > > > > > > > next reboot, while the rest may conform to the naming scheme 
> > > > > > > > > (ens3
> > > > > > > > > and such). There's no way one can fix this in userspace alone 
> > > > > > > > > - when
> > > > > > > > > the failover is created the enslaved netdev was opened by the 
> > > > > > > > > kernel
> > > > > > > > > earlier than the userspace is made aware of, and there's no
> > > > > > > > > negotiation protocol for kernel to know when userspace has 
> > > > > > > > > done
> > > > > > > > > initial renaming of the interface. I would expect netdev list 
> > > > > > > > > should
> > > > > > > > > at least provide the direction in general for how this can be
> > > > > > > > > solved...
> > > > > > I was just wondering what did you mean when you said
> > > > > > "refactor the rename logic and make it work well with 3-netdev" -
> > > > > > was there a proposal udev rejected?
> > > > > No. I never believed this particular issue can be fixed in userspace 
> > > > > alone.
> > > > > Previously someone had said it could be, but I never see any work or
> > > > > relevant discussion ever happened in various userspace communities 
> > > > > (for e.g.
> > > > > dracut, initramfs-tools, systemd, udev, and NetworkManager). IMHO the 
> > > > > root
> > > > > of the issue derives from the kernel, it makes more sense to start 
> > > > > from
> > > > > netdev, work out and decide on a solution: see what can be done in the
> > > > > kernel in order to fix it, then after that engage userspace community 
> > > > > for
> > > > > the feasibility...
> > > > > 
> > > > > > Anyway, can we write a time diagram for what happens in which order 
> > > > > > that
> > > > > > leads to failure?  That would help look for triggers that we can tie
> > > > > > into, or add new ones.
> > >

Re: [virtio-dev] Re: net_failover slave udev renaming (was Re: [RFC PATCH net-next v6 4/4] netvsc: refactor notifier/event handling code to use the bypass framework)

2019-02-27 Thread Michael S. Tsirkin
On Tue, Feb 26, 2019 at 04:17:21PM -0800, si-wei liu wrote:
> 
> 
> On 2/25/2019 6:08 PM, Michael S. Tsirkin wrote:
> > On Mon, Feb 25, 2019 at 04:58:07PM -0800, si-wei liu wrote:
> > > 
> > > On 2/22/2019 7:14 AM, Michael S. Tsirkin wrote:
> > > > On Thu, Feb 21, 2019 at 11:55:11PM -0800, si-wei liu wrote:
> > > > > On 2/21/2019 11:00 PM, Samudrala, Sridhar wrote:
> > > > > > On 2/21/2019 7:33 PM, si-wei liu wrote:
> > > > > > > On 2/21/2019 5:39 PM, Michael S. Tsirkin wrote:
> > > > > > > > On Thu, Feb 21, 2019 at 05:14:44PM -0800, Siwei Liu wrote:
> > > > > > > > > Sorry for replying to this ancient thread. There was some 
> > > > > > > > > remaining
> > > > > > > > > issue that I don't think the initial net_failover patch got 
> > > > > > > > > addressed
> > > > > > > > > cleanly, see:
> > > > > > > > > 
> > > > > > > > > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1815268
> > > > > > > > > 
> > > > > > > > > The renaming of 'eth0' to 'ens4' fails because the udev 
> > > > > > > > > userspace was
> > > > > > > > > not specifically writtten for such kernel automatic 
> > > > > > > > > enslavement.
> > > > > > > > > Specifically, if it is a bond or team, the slave would 
> > > > > > > > > typically get
> > > > > > > > > renamed *before* virtual device gets created, that's what 
> > > > > > > > > udev can
> > > > > > > > > control (without getting netdev opened early by the other 
> > > > > > > > > part of
> > > > > > > > > kernel) and other userspace components for e.g. initramfs,
> > > > > > > > > init-scripts can coordinate well in between. The in-kernel
> > > > > > > > > auto-enslavement of net_failover breaks this userspace 
> > > > > > > > > convention,
> > > > > > > > > which don't provides a solution if user care about consistent 
> > > > > > > > > naming
> > > > > > > > > on the slave netdevs specifically.
> > > > > > > > > 
> > > > > > > > > Previously this issue had been specifically called out when 
> > > > > > > > > IFF_HIDDEN
> > > > > > > > > and the 1-netdev was proposed, but no one gives out a 
> > > > > > > > > solution to this
> > > > > > > > > problem ever since. Please share your mind how to proceed and 
> > > > > > > > > solve
> > > > > > > > > this userspace issue if netdev does not welcome a 1-netdev 
> > > > > > > > > model.
> > > > > > > > Above says:
> > > > > > > > 
> > > > > > > >   there's no motivation in the systemd/udevd community at
> > > > > > > >   this point to refactor the rename logic and make it work 
> > > > > > > > well with
> > > > > > > >   3-netdev.
> > > > > > > > 
> > > > > > > > What would the fix be? Skip slave devices?
> > > > > > > > 
> > > > > > > There's nothing user can get if just skipping slave devices - the
> > > > > > > name is still unchanged and unpredictable e.g. eth0, or eth1 the
> > > > > > > next reboot, while the rest may conform to the naming scheme (ens3
> > > > > > > > and such). There's no way one can fix this in userspace alone - when
> > > > > > > > the failover is created the enslaved netdev was opened by the kernel
> > > > > > > > earlier than the userspace is made aware of, and there's no
> > > > > > > > negotiation protocol for kernel to know when userspace has done
> > > > > > > > initial renaming of the interface. I would expect netdev list should
> > > > > > > > at least provide the direction in general for how this can be
> > > > > > > > solved...
> > > > I was just wondering what did you mean when you said
> > > > "refactor the rename logic and make it work well with 3-netdev" -
> > > > was there a proposal udev rejected?
> > > > No. I never believed this particular issue can be fixed in userspace alone.
> > > > Previously someone had said it could be, but I never see any work or
> > > > relevant discussion ever happened in various userspace communities (for e.g.
> > > > dracut, initramfs-tools, systemd, udev, and NetworkManager). IMHO the root
> > > of the issue derives from the kernel, it makes more sense to start from
> > > netdev, work out and decide on a solution: see what can be done in the
> > > kernel in order to fix it, then after that engage userspace community for
> > > the feasibility...
> > > 
> > > > Anyway, can we write a time diagram for what happens in which order that
> > > > leads to failure?  That would help look for triggers that we can tie
> > > > into, or add new ones.
> > > > 
> > > See attached diagram.
> > > 
> > > > 
> > > > 
> > > > 
> > > > > > > Is there an issue if slave device names are not predictable? The user/admin
> > > > > > > scripts are expected to only work with the master failover device.
> > > > > > Where does this expectation come from?
> > > > > > 
> > > > > > Admin users may have ethtool or tc configurations that need to deal with
> > > > > > predictable interface name. Third-party app which was built upon specifying
> > > > > > certain interface name can't be modified to chase dynamic names.
> > > > > 
>

Re: [virtio-dev] Re: net_failover slave udev renaming (was Re: [RFC PATCH net-next v6 4/4] netvsc: refactor notifier/event handling code to use the bypass framework)

2019-02-27 Thread Stephen Hemminger
On Tue, 26 Feb 2019 16:17:21 -0800
si-wei liu  wrote:

> On 2/25/2019 6:08 PM, Michael S. Tsirkin wrote:
> > On Mon, Feb 25, 2019 at 04:58:07PM -0800, si-wei liu wrote:  
> >>
> >> On 2/22/2019 7:14 AM, Michael S. Tsirkin wrote:  
> >>> On Thu, Feb 21, 2019 at 11:55:11PM -0800, si-wei liu wrote:  
> >>>> On 2/21/2019 11:00 PM, Samudrala, Sridhar wrote:
> >>>>> On 2/21/2019 7:33 PM, si-wei liu wrote:
> >>>>>> On 2/21/2019 5:39 PM, Michael S. Tsirkin wrote:
> >>>>>>> On Thu, Feb 21, 2019 at 05:14:44PM -0800, Siwei Liu wrote:
> >>>>>>>> Sorry for replying to this ancient thread. There was some remaining
> >>>>>>>> issue that I don't think the initial net_failover patch got addressed
> >>>>>>>> cleanly, see:
> >>>>>>>>
> >>>>>>>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1815268
> >>>>>>>>
> >>>>>>>> The renaming of 'eth0' to 'ens4' fails because the udev userspace was
> >>>>>>>> not specifically written for such kernel automatic enslavement.
> >>>>>>>> Specifically, if it is a bond or team, the slave would typically get
> >>>>>>>> renamed *before* virtual device gets created, that's what udev can
> >>>>>>>> control (without getting netdev opened early by the other part of
> >>>>>>>> kernel) and other userspace components for e.g. initramfs,
> >>>>>>>> init-scripts can coordinate well in between. The in-kernel
> >>>>>>>> auto-enslavement of net_failover breaks this userspace convention,
> >>>>>>>> which doesn't provide a solution if the user cares about consistent naming
> >>>>>>>> on the slave netdevs specifically.
> >>>>>>>>
> >>>>>>>> Previously this issue had been specifically called out when IFF_HIDDEN
> >>>>>>>> and the 1-netdev was proposed, but no one gives out a solution to this
> >>>>>>>> problem ever since. Please share your mind how to proceed and solve
> >>>>>>>> this userspace issue if netdev does not welcome a 1-netdev model.
> >>>>>>> Above says:
> >>>>>>>
> >>>>>>>   there's no motivation in the systemd/udevd community at
> >>>>>>>   this point to refactor the rename logic and make it work well with
> >>>>>>>   3-netdev.
> >>>>>>>
> >>>>>>> What would the fix be? Skip slave devices?
> >>>>>>>
> >>>>>> There's nothing user can get if just skipping slave devices - the
> >>>>>> name is still unchanged and unpredictable e.g. eth0, or eth1 the
> >>>>>> next reboot, while the rest may conform to the naming scheme (ens3
> >>>>>> and such). There's no way one can fix this in userspace alone - when
> >>>>>> the failover is created the enslaved netdev was opened by the kernel
> >>>>>> earlier than the userspace is made aware of, and there's no
> >>>>>> negotiation protocol for kernel to know when userspace has done
> >>>>>> initial renaming of the interface. I would expect netdev list should
> >>>>>> at least provide the direction in general for how this can be
> >>>>>> solved...
> >>> I was just wondering what did you mean when you said
> >>> "refactor the rename logic and make it work well with 3-netdev" -
> >>> was there a proposal udev rejected?  
> >> No. I never believed this particular issue can be fixed in userspace alone.
> >> Previously someone had said it could be, but I never see any work or
> >> relevant discussion ever happened in various userspace communities (for 
> >> e.g.
> >> dracut, initramfs-tools, systemd, udev, and NetworkManager). IMHO the root
> >> of the issue derives from the kernel, it makes more sense to start from
> >> netdev, work out and decide on a solution: see what can be done in the
> >> kernel in order to fix it, then after that engage userspace community for
> >> the feasibility...
> >>  
> >>> Anyway, can we write a time diagram for what happens in which order that
> >>> leads to failure?  That would help look for triggers that we can tie
> >>> into, or add new ones.
> >>>  
> >> See attached diagram.
> >>  
> >>>
> >>>
> >>>  
> >>>>> Is there an issue if slave device names are not predictable? The user/admin
> >>>>> scripts are expected to only work with the master failover device.
> >>>> Where does this expectation come from?
> >>>>
> >>>> Admin users may have ethtool or tc configurations that need to deal with
> >>>> predictable interface name. Third-party app which was built upon specifying
> >>>> certain interface name can't be modified to chase dynamic names.
> >>>>
> >>>> Specifically, we have pre-canned image that uses ethtool to fine tune VF
> >>>> offload settings post boot for specific workload. Those images won't work
> >>>> well if the name is constantly changing just after couple rounds of live
> >>>> migration.
> >>> It should be possible to specify the ethtool configuration on the
> >>> master and have it automatically propagated to the slave.
> >>>
> >>> BTW this is something we should look at IMHO.  
> >> I was elaborating a few examples that the expectation and assumption that
> >> user/admin scripts only deal with master failover device is incorrect. It
> >>

Re: [virtio-dev] Re: net_failover slave udev renaming (was Re: [RFC PATCH net-next v6 4/4] netvsc: refactor notifier/event handling code to use the bypass framework)

2019-02-25 Thread Michael S. Tsirkin
On Mon, Feb 25, 2019 at 04:58:07PM -0800, si-wei liu wrote:
> 
> 
> On 2/22/2019 7:14 AM, Michael S. Tsirkin wrote:
> > On Thu, Feb 21, 2019 at 11:55:11PM -0800, si-wei liu wrote:
> > > 
> > > On 2/21/2019 11:00 PM, Samudrala, Sridhar wrote:
> > > > 
> > > > On 2/21/2019 7:33 PM, si-wei liu wrote:
> > > > > 
> > > > > On 2/21/2019 5:39 PM, Michael S. Tsirkin wrote:
> > > > > > On Thu, Feb 21, 2019 at 05:14:44PM -0800, Siwei Liu wrote:
> > > > > > > > Sorry for replying to this ancient thread. There was some remaining
> > > > > > > > issue that I don't think the initial net_failover patch got addressed
> > > > > > > > cleanly, see:
> > > > > > > > 
> > > > > > > > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1815268
> > > > > > > > 
> > > > > > > > The renaming of 'eth0' to 'ens4' fails because the udev userspace was
> > > > > > > > not specifically written for such kernel automatic enslavement.
> > > > > > > > Specifically, if it is a bond or team, the slave would typically get
> > > > > > > > renamed *before* virtual device gets created, that's what udev can
> > > > > > > > control (without getting netdev opened early by the other part of
> > > > > > > > kernel) and other userspace components for e.g. initramfs,
> > > > > > > > init-scripts can coordinate well in between. The in-kernel
> > > > > > > > auto-enslavement of net_failover breaks this userspace convention,
> > > > > > > > which doesn't provide a solution if the user cares about consistent naming
> > > > > > > > on the slave netdevs specifically.
> > > > > > > > 
> > > > > > > > Previously this issue had been specifically called out when IFF_HIDDEN
> > > > > > > > and the 1-netdev was proposed, but no one gives out a solution to this
> > > > > > > > problem ever since. Please share your mind how to proceed and solve
> > > > > > > > this userspace issue if netdev does not welcome a 1-netdev model.
> > > > > > Above says:
> > > > > > 
> > > > > >  there's no motivation in the systemd/udevd community at
> > > > > > >  this point to refactor the rename logic and make it work well with
> > > > > > >  3-netdev.
> > > > > > 
> > > > > > What would the fix be? Skip slave devices?
> > > > > > 
> > > > > There's nothing user can get if just skipping slave devices - the
> > > > > name is still unchanged and unpredictable e.g. eth0, or eth1 the
> > > > > next reboot, while the rest may conform to the naming scheme (ens3
> > > > > and such). There's no way one can fix this in userspace alone - when
> > > > > the failover is created the enslaved netdev was opened by the kernel
> > > > > earlier than the userspace is made aware of, and there's no
> > > > > negotiation protocol for kernel to know when userspace has done
> > > > > initial renaming of the interface. I would expect netdev list should
> > > > > at least provide the direction in general for how this can be
> > > > > solved...
> > 
> > I was just wondering what did you mean when you said
> > "refactor the rename logic and make it work well with 3-netdev" -
> > was there a proposal udev rejected?
> No. I never believed this particular issue can be fixed in userspace alone.
> Previously someone had said it could be, but I never see any work or
> relevant discussion ever happened in various userspace communities (for e.g.
> dracut, initramfs-tools, systemd, udev, and NetworkManager). IMHO the root
> of the issue derives from the kernel, it makes more sense to start from
> netdev, work out and decide on a solution: see what can be done in the
> kernel in order to fix it, then after that engage userspace community for
> the feasibility...
> 
> > Anyway, can we write a time diagram for what happens in which order that
> > leads to failure?  That would help look for triggers that we can tie
> > into, or add new ones.
> > 
> 
> See attached diagram.
> 
> > 
> > 
> > 
> > 
> > > > > Is there an issue if slave device names are not predictable? The user/admin
> > > > > scripts are expected to only work with the master failover device.
> > > > Where does this expectation come from?
> > > > 
> > > > Admin users may have ethtool or tc configurations that need to deal with
> > > > predictable interface name. Third-party app which was built upon specifying
> > > > certain interface name can't be modified to chase dynamic names.
> > > 
> > > Specifically, we have pre-canned image that uses ethtool to fine tune VF
> > > offload settings post boot for specific workload. Those images won't work
> > > well if the name is constantly changing just after couple rounds of live
> > > migration.
> > It should be possible to specify the ethtool configuration on the
> > master and have it automatically propagated to the slave.
> > 
> > BTW this is something we should look at IMHO.
> I was elaborating a few examples that the expectation and assumption that
> user/admin scripts only deal with master failover device is incorrect.

Re: [virtio-dev] Re: net_failover slave udev renaming (was Re: [RFC PATCH net-next v6 4/4] netvsc: refactor notifier/event handling code to use the bypass framework)

2019-02-25 Thread Michael S. Tsirkin
On Mon, Feb 25, 2019 at 05:39:12PM -0800, Stephen Hemminger wrote:
> > >>> Moreover, you were suggesting hiding the lower slave devices anyway.
> > >>> There was some discussion about moving them to a hidden network
> > >>> namespace so that they are not visible from the default namespace.
> > >>> I looked into this sometime back, but did not find the right kernel
> > >>> api to create a network namespace within kernel. If so, we could use
> > >>> this mechanism to simulate a 1-netdev model.
> > >> Yes, that's one possible implementation (IMHO the key is to make 1-netdev
> > >> model as much transparent to a real NIC as possible, while a hidden netns is
> > >> just the vehicle). However, I recall there was resistance around this
> > >> discussion that even the concept of hiding itself is a taboo for Linux
> > >> netdev. I would like to summon potential alternatives before concluding
> > >> 1-netdev is the only solution too soon.
> > >>
> > >> Thanks,
> > >> -Siwei  
> > > Your scripts would not work at all then, right?  
> > At this point we don't claim images with such usage as SR-IOV live 
> > migrate-able. We would flag it as live migrate-able until this ethtool 
> > config issue is fully addressed and a transparent live migration 
> > solution emerges in upstream eventually.
> 
> The hyper-v netvsc with 1-dev model uses a timeout to allow udev to do its rename.
> I proposed a patch to key state change off of the udev rename, but that patch was
> rejected.

Of course that would mean nothing works without udev - was
that the objection? Could you help me find that discussion pls?
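
To make the two policies in question concrete, here is a rough toy model,
not the netvsc code. It assumes the device cannot be renamed once it is up,
and everything in it (the boot() helper, the 2.0 second delay, the
eth0/ens4 names, the policy strings) is made up for illustration. The
"timeout" policy opens the VF after a fixed delay whether or not udev has
renamed it; the "on_rename" policy opens it only once a rename is seen.

```python
def boot(policy, rename_at=None, open_delay=2.0):
    """Return (final_name, is_up) for a VF under a given open policy.

    policy:     "timeout"   opens the VF open_delay seconds after it appears;
                "on_rename" opens it only once a rename has happened.
    rename_at:  time at which udev attempts the rename, or None when no
                udev is running at all.
    All values are hypothetical; this is a model, not kernel behaviour.
    """
    name, up = "eth0", False
    events = []
    if rename_at is not None:
        events.append((rename_at, "rename"))
    if policy == "timeout":
        events.append((open_delay, "open"))

    # Process events in time order.
    for _, kind in sorted(events):
        if kind == "rename" and not up:
            # Assumption: a rename only succeeds while the device is down.
            name = "ens4"
            if policy == "on_rename":
                up = True
        elif kind == "open":
            up = True
    return name, up
```

With the timeout policy a late udev loses the race and the name stays
eth0; with the rename-keyed policy the name is always right, but a system
without udev never brings the device up, which is the concern raised above.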

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [virtio-dev] Re: net_failover slave udev renaming (was Re: [RFC PATCH net-next v6 4/4] netvsc: refactor notifier/event handling code to use the bypass framework)

2019-02-25 Thread Stephen Hemminger
On Mon, 25 Feb 2019 16:58:07 -0800
si-wei liu  wrote:

> On 2/22/2019 7:14 AM, Michael S. Tsirkin wrote:
> > On Thu, Feb 21, 2019 at 11:55:11PM -0800, si-wei liu wrote:  
> >>
> >> On 2/21/2019 11:00 PM, Samudrala, Sridhar wrote:  
> >>>
> >>> On 2/21/2019 7:33 PM, si-wei liu wrote:  
> >>>>
> >>>> On 2/21/2019 5:39 PM, Michael S. Tsirkin wrote:
> >>>>> On Thu, Feb 21, 2019 at 05:14:44PM -0800, Siwei Liu wrote:
> >>>>>> Sorry for replying to this ancient thread. There was some remaining
> >>>>>> issue that I don't think the initial net_failover patch got addressed
> >>>>>> cleanly, see:
> >>>>>>
> >>>>>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1815268
> >>>>>>
> >>>>>> The renaming of 'eth0' to 'ens4' fails because the udev userspace was
> >>>>>> not specifically written for such kernel automatic enslavement.
> >>>>>> Specifically, if it is a bond or team, the slave would typically get
> >>>>>> renamed *before* virtual device gets created, that's what udev can
> >>>>>> control (without getting netdev opened early by the other part of
> >>>>>> kernel) and other userspace components for e.g. initramfs,
> >>>>>> init-scripts can coordinate well in between. The in-kernel
> >>>>>> auto-enslavement of net_failover breaks this userspace convention,
> >>>>>> which doesn't provide a solution if the user cares about consistent naming
> >>>>>> on the slave netdevs specifically.
> >>>>>>
> >>>>>> Previously this issue had been specifically called out when IFF_HIDDEN
> >>>>>> and the 1-netdev was proposed, but no one gives out a solution to this
> >>>>>> problem ever since. Please share your mind how to proceed and solve
> >>>>>> this userspace issue if netdev does not welcome a 1-netdev model.
> >>>>> Above says:
> >>>>>
> >>>>>  there's no motivation in the systemd/udevd community at
> >>>>>  this point to refactor the rename logic and make it work well with
> >>>>>  3-netdev.
> >>>>>
> >>>>> What would the fix be? Skip slave devices?
> >>>>>
> >>>> There's nothing user can get if just skipping slave devices - the
> >>>> name is still unchanged and unpredictable e.g. eth0, or eth1 the
> >>>> next reboot, while the rest may conform to the naming scheme (ens3
> >>>> and such). There's no way one can fix this in userspace alone - when
> >>>> the failover is created the enslaved netdev was opened by the kernel
> >>>> earlier than the userspace is made aware of, and there's no
> >>>> negotiation protocol for kernel to know when userspace has done
> >>>> initial renaming of the interface. I would expect netdev list should
> >>>> at least provide the direction in general for how this can be
> >>>> solved...
> >
> > I was just wondering what did you mean when you said
> > "refactor the rename logic and make it work well with 3-netdev" -
> > was there a proposal udev rejected?  
> No. I never believed this particular issue can be fixed in userspace 
> alone. Previously someone had said it could be, but I never see any work 
> or relevant discussion ever happened in various userspace communities 
> (for e.g. dracut, initramfs-tools, systemd, udev, and NetworkManager). 
> IMHO the root of the issue derives from the kernel, it makes more sense 
> to start from netdev, work out and decide on a solution: see what can be 
> done in the kernel in order to fix it, then after that engage userspace 
> community for the feasibility...
> 
> > Anyway, can we write a time diagram for what happens in which order that
> > leads to failure?  That would help look for triggers that we can tie
> > into, or add new ones.
> >  
> 
> See attached diagram.
> 
> >
> >
> >
> >  
> >>> Is there an issue if slave device names are not predictable? The 
> >>> user/admin scripts are expected
> >>> to only work with the master failover device.  
> >> Where does this expectation come from?
> >>
> >> Admin users may have ethtool or tc configurations that need to deal with
> >> predictable interface name. Third-party app which was built upon specifying
> >> certain interface name can't be modified to chase dynamic names.
> >>
> >> Specifically, we have pre-canned image that uses ethtool to fine tune VF
> >> offload settings post boot for specific workload. Those images won't work
> >> well if the name is constantly changing just after couple rounds of live
> >> migration.  
> > It should be possible to specify the ethtool configuration on the
> > master and have it automatically propagated to the slave.
> >
> > BTW this is something we should look at IMHO.  
> I was elaborating a few examples that the expectation and assumption 
> that user/admin scripts only deal with master failover device is 
> incorrect. It had never been taken good care of, although I did try to 
> emphasize it from the very beginning.
> 
> Basically what you said about propagating the ethtool configuration down 
> to the slave is the key pursuance of 1-netdev model. However, what I am 
> seeking now is any alternative that can also fix the specific udev 
> r

Re: [virtio-dev] Re: net_failover slave udev renaming (was Re: [RFC PATCH net-next v6 4/4] netvsc: refactor notifier/event handling code to use the bypass framework)

2019-02-22 Thread Michael S. Tsirkin
On Thu, Feb 21, 2019 at 11:55:11PM -0800, si-wei liu wrote:
> 
> 
> On 2/21/2019 11:00 PM, Samudrala, Sridhar wrote:
> > 
> > 
> > On 2/21/2019 7:33 PM, si-wei liu wrote:
> > > 
> > > 
> > > On 2/21/2019 5:39 PM, Michael S. Tsirkin wrote:
> > > > On Thu, Feb 21, 2019 at 05:14:44PM -0800, Siwei Liu wrote:
> > > > > Sorry for replying to this ancient thread. There was some remaining
> > > > > issue that I don't think the initial net_failover patch got addressed
> > > > > cleanly, see:
> > > > > 
> > > > > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1815268
> > > > > 
> > > > > The renaming of 'eth0' to 'ens4' fails because the udev userspace was
> > > > > > not specifically written for such kernel automatic enslavement.
> > > > > Specifically, if it is a bond or team, the slave would typically get
> > > > > renamed *before* virtual device gets created, that's what udev can
> > > > > control (without getting netdev opened early by the other part of
> > > > > kernel) and other userspace components for e.g. initramfs,
> > > > > init-scripts can coordinate well in between. The in-kernel
> > > > > auto-enslavement of net_failover breaks this userspace convention,
> > > > > > which doesn't provide a solution if the user cares about consistent naming
> > > > > on the slave netdevs specifically.
> > > > > 
> > > > > Previously this issue had been specifically called out when IFF_HIDDEN
> > > > > and the 1-netdev was proposed, but no one gives out a solution to this
> > > > > problem ever since. Please share your mind how to proceed and solve
> > > > > this userspace issue if netdev does not welcome a 1-netdev model.
> > > > Above says:
> > > > 
> > > > there's no motivation in the systemd/udevd community at
> > > > this point to refactor the rename logic and make it work well with
> > > > 3-netdev.
> > > > 
> > > > What would the fix be? Skip slave devices?
> > > > 
> > > There's nothing user can get if just skipping slave devices - the
> > > name is still unchanged and unpredictable e.g. eth0, or eth1 the
> > > next reboot, while the rest may conform to the naming scheme (ens3
> > > and such). There's no way one can fix this in userspace alone - when
> > > the failover is created the enslaved netdev was opened by the kernel
> > > earlier than the userspace is made aware of, and there's no
> > > negotiation protocol for kernel to know when userspace has done
> > > initial renaming of the interface. I would expect netdev list should
> > > at least provide the direction in general for how this can be
> > > solved...


I was just wondering what did you mean when you said
"refactor the rename logic and make it work well with 3-netdev" -
was there a proposal udev rejected?

Anyway, can we write a time diagram for what happens in which order that
leads to failure?  That would help look for triggers that we can tie
into, or add new ones.
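
As a rough sketch of the ordering being asked about (this is not the
attached diagram), the toy model below contrasts the two boot sequences
described in the thread. It assumes that renaming an interface which is
already up is refused with EBUSY; the Netdev class and both helper
functions are hypothetical and exist only for illustration.

```python
import errno


class Netdev:
    """Toy stand-in for a kernel net device (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.up = False

    def open(self):
        self.up = True

    def rename(self, new_name):
        # Assumption: a rename is refused with EBUSY while the device is up.
        if self.up:
            raise OSError(errno.EBUSY, f"cannot rename {self.name}: device is up")
        self.name = new_name


def bond_style_boot():
    """Userspace renames first, then enslaves and opens the slave."""
    vf = Netdev("eth0")
    vf.rename("ens4")  # udev runs while the device is still down
    vf.open()          # enslavement brings it up afterwards
    return vf.name


def failover_style_boot():
    """Kernel auto-enslaves (and opens) the slave before udev runs."""
    vf = Netdev("eth0")
    vf.open()          # in-kernel enslavement opens the device first
    try:
        vf.rename("ens4")
    except OSError as exc:
        return vf.name, exc.errno
    return vf.name, 0
```

In this model bond_style_boot() ends with the predictable name, while
failover_style_boot() is left with eth0 and an EBUSY error, because the
open happens before userspace gets a chance to rename.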






> > > 
> > Is there an issue if slave device names are not predictable? The user/admin
> > scripts are expected to only work with the master failover device.
> Where does this expectation come from?
> 
> Admin users may have ethtool or tc configurations that need to deal with
> predictable interface name. Third-party app which was built upon specifying
> certain interface name can't be modified to chase dynamic names.
> 
> Specifically, we have pre-canned image that uses ethtool to fine tune VF
> offload settings post boot for specific workload. Those images won't work
> well if the name is constantly changing just after couple rounds of live
> migration.

It should be possible to specify the ethtool configuration on the
master and have it automatically propagated to the slave.

BTW this is something we should look at IMHO.
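
A minimal sketch of what such propagation could look like, purely as a
model and not as anything net_failover does today: the master remembers
the settings applied to it and replays them onto whichever slave
registers, so the slave's name never has to be known by the admin script.
The class names, the feature keys and the register/unregister methods are
all hypothetical.

```python
class Slave:
    """Toy slave device; its name may differ after every hot-plug."""

    def __init__(self, name):
        self.name = name
        self.features = {}


class FailoverMaster:
    """Toy master that keeps settings and pushes them down to its slaves."""

    def __init__(self, name):
        self.name = name
        self.features = {}
        self.slaves = []

    def set_feature(self, key, value):
        # Record the setting on the master and apply it to current slaves.
        self.features[key] = value
        for slave in self.slaves:
            slave.features[key] = value

    def register_slave(self, slave):
        # Replay everything configured so far onto the new slave.
        slave.features.update(self.features)
        self.slaves.append(slave)

    def unregister_slave(self, slave):
        self.slaves.remove(slave)
```

A script would then call set_feature() on the master only; a VF that
comes back as eth1 after live migration would pick up the same settings
on register_slave() without the script knowing its name.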

> > Moreover, you were suggesting hiding the lower slave devices anyway.
> > There was some discussion about moving them to a hidden network
> > namespace so that they are not visible from the default namespace.
> > I looked into this sometime back, but did not find the right kernel
> > api to create a network namespace within kernel. If so, we could use
> > this mechanism to simulate a 1-netdev model.
> Yes, that's one possible implementation (IMHO the key is to make 1-netdev
> model as much transparent to a real NIC as possible, while a hidden netns is
> just the vehicle). However, I recall there was resistance around this
> discussion that even the concept of hiding itself is a taboo for Linux
> netdev. I would like to summon potential alternatives before concluding
> 1-netdev is the only solution too soon.
> 
> Thanks,
> -Siwei

Your scripts would not work at all then, right?


> > 
> > > -Siwei
> > > 
> > > 


Re: [virtio-dev] Re: net_failover slave udev renaming (was Re: [RFC PATCH net-next v6 4/4] netvsc: refactor notifier/event handling code to use the bypass framework)

2019-02-21 Thread Samudrala, Sridhar


On 2/21/2019 7:33 PM, si-wei liu wrote:



On 2/21/2019 5:39 PM, Michael S. Tsirkin wrote:

On Thu, Feb 21, 2019 at 05:14:44PM -0800, Siwei Liu wrote:

Sorry for replying to this ancient thread. There was some remaining
issue that I don't think the initial net_failover patch got addressed
cleanly, see:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1815268

The renaming of 'eth0' to 'ens4' fails because the udev userspace was
not specifically written for such kernel automatic enslavement.
Specifically, if it is a bond or team, the slave would typically get
renamed *before* virtual device gets created, that's what udev can
control (without getting netdev opened early by the other part of
kernel) and other userspace components for e.g. initramfs,
init-scripts can coordinate well in between. The in-kernel
auto-enslavement of net_failover breaks this userspace convention,
which doesn't provide a solution if the user cares about consistent naming
on the slave netdevs specifically.

Previously this issue had been specifically called out when IFF_HIDDEN
and the 1-netdev was proposed, but no one gives out a solution to this
problem ever since. Please share your mind how to proceed and solve
this userspace issue if netdev does not welcome a 1-netdev model.

Above says:

there's no motivation in the systemd/udevd community at
this point to refactor the rename logic and make it work well with
3-netdev.

What would the fix be? Skip slave devices?

There's nothing user can get if just skipping slave devices - the name 
is still unchanged and unpredictable e.g. eth0, or eth1 the next 
reboot, while the rest may conform to the naming scheme (ens3 and 
such). There's no way one can fix this in userspace alone - when the 
failover is created the enslaved netdev was opened by the kernel 
earlier than the userspace is made aware of, and there's no 
negotiation protocol for kernel to know when userspace has done 
initial renaming of the interface. I would expect netdev list should 
at least provide the direction in general for how this can be solved...



Is there an issue if slave device names are not predictable? The user/admin
scripts are expected to only work with the master failover device.
Moreover, you were suggesting hiding the lower slave devices anyway.
There was some discussion about moving them to a hidden network
namespace so that they are not visible from the default namespace.
I looked into this sometime back, but did not find the right kernel
api to create a network namespace within kernel. If so, we could use
this mechanism to simulate a 1-netdev model.



-Siwei

