On Tue, Jun 19, 2018 at 04:14:40PM -0700, Stephen Hemminger wrote:
> On Wed, 20 Jun 2018 08:01:50 +0900
> Greg KH wrote:
>
> > On Tue, Jun 19, 2018 at 03:19:41PM -0400, Thomas Walker wrote:
> > > Upon updating some internal kernels from 4.14.18 to 4.14.49, our hyper-v
> > > guests failed to bring up network interfaces on boot, logging "A link
> > > change request failed with some changes committed already. Interface eth0
> > > may have been left with an inconsistent configuration, please check."
> > > Running 'ifconfig eth0 up' appears to fix the problem temporarily so I
> > > went about bisecting which landed on:
> > >
> > > commit be9c798d0d13ae609a91177323ac816545c39d28
> > > Author: Stephen Hemminger
> > > Date: Mon May 14 15:32:18 2018 -0700
> > >
> > > hv_netvsc: common detach logic
> > >
> > > [ Commit 7b2ee50c0cd513a176a26a71f2989facdd75bfea upstream. ]
> > >
> > > Make common function for detaching internals of device
> > > during changes to MTU and RSS. Make sure no more packets
> > > are transmitted and all packets have been received before
> > > doing device teardown.
> > >
> > > Change the wait logic to be common and use usleep_range().
> > >
> > > Changes transmit enabling logic so that transmit queues are disabled
> > > during the period when lower device is being changed. And enabled
> > > only after sub channels are setup. This avoids issue where it could
> > > be that a packet was being sent while subchannel was not initialized.
> > >
> > > Fixes: 8195b1396ec8 ("hv_netvsc: fix deadlock on hotplug")
> > > Signed-off-by: Stephen Hemminger
> > > Signed-off-by: David S. Miller
> > > Signed-off-by: Greg Kroah-Hartman
> > >
> > >
> > > The removal of which does indeed fix the problem. That led me to:
> > >
> > > commit 52acf73b6e9a6962045feb2ba5a8921da2201915
> > > Author: Dexuan Cui
> > > Date: Wed Jun 6 21:32:51 2018 +
> > >
> > > hv_netvsc: Fix a network regression after ifdown/ifup
> > >
> > > Recently people reported the NIC stops working after
> > > "ifdown eth0; ifup eth0". It turns out in this case the TX queues are
> > > not
> > > enabled, after the refactoring of the common detach logic: when the
> > > NIC
> > > has sub-channels, usually we enable all the TX queues after all
> > > sub-channels are set up: see rndis_set_subchannel() ->
> > > netif_device_attach(), but in the case of "ifdown eth0; ifup eth0"
> > > where
> > > the number of channels doesn't change, we also must make sure the TX
> > > queues
> > > are enabled. The patch fixes the regression.
> > >
> > > Fixes: 7b2ee50c0cd5 ("hv_netvsc: common detach logic")
> > > Signed-off-by: Dexuan Cui
> > > Cc: Stephen Hemminger
> > > Cc: K. Y. Srinivasan
> > > Cc: Haiyang Zhang
> > > Signed-off-by: David S. Miller
> > >
> > > Which sounded very promising, but does not seem to fully fix the issue.
> > > Doing some more digging, I was able to determine that the message
> > > coincides with 'ip link set dev eth0 mtu 1300 up' very shortly (>~1
> > > second) after the hv_netvsc driver loads. If I delay the mtu change
> > > until well after the driver loads, everything works fine. If I unload
> > > hv_netvsc and then reload it and apply the mtu change immediately, the
> > > failure re-occurs. So something is racy here, and the above doesn't
> > > entirely address it.
> > >
> > > I'm happy to test out any suggested patches and/or do additional
> > > debugging if anyone has any suggestions.
> > >
> > > (oh, and I did also try 4.18-rc1 and the problem still persists)
> >
> > Always cc: the authors of the patch you are having problems with, along
> > with the mailing list for the networking subsystem, so that the right
> > people know to look at this.
>
> How are you changing the MTU? and starting the device.
> When MTU changes the device has to reconnect with the host. This takes a
> small amount of time
> and no changes to device state are possible then.
>
> If MTU change happens and at the same time some other script tries to bring
> up the connection,
> then that scri