date:20170726

qmi_wwan: Null pointer dereference when removing driver

2017-07-26 Thread Nathaniel Roach


Unsure at which point was added, but issue not present in stock debian 4.11 
kernel.

Running on a Thinkpad X220 with coreboot.

I'm building from upstream. When I attempt to remove the qmi_wwan module (which 
also happens pre-suspend) the rmmod process gets killed, and the following 
shows in dmesg:

[   59.979791] usb 2-1.4: USB disconnect, device number 4
[   59.980102] qmi_wwan 2-1.4:1.6 wwp0s29u1u4i6: unregister 'qmi_wwan' 
usb-:00:1d.0-1.4, WWAN/QMI device
[   60.006821] BUG: unable to handle kernel NULL pointer dereference at 
00e0
[   60.006879] IP: qmi_wwan_disconnect+0x25/0xc0 [qmi_wwan]
[   60.006911] PGD 0
[   60.006911] P4D 0
[   60.006957] Oops:  [#1] SMP
[   60.006978] Modules linked in: fuse(E) ccm(E) rfcomm(E) cmac(E) bnep(E) 
qmi_wwan(E) cdc_wdm(E) cdc_ether(E) usbnet(E) mii(E) btusb(E) btrtl(E) btbcm(E) 
btintel(E) bluetooth(E) joydev(E) xpad(E) ecdh_generic(E) ff_memless(E) 
binfmt_misc(E) snd_hda_codec_hdmi(E) snd_hda_codec_conexant(E) 
snd_hda_codec_generic(E) arc4(E) iTCO_wdt(E) iTCO_vendor_support(E) 
intel_rapl(E) x86_pkg_temp_thermal(E) kvm_intel(E) kvm(E) irqbypass(E) 
crct10dif_pclmul(E) crc32_pclmul(E) crc32c_intel(E) ghash_clmulni_intel(E) 
aesni_intel(E) iwlmvm(E) aes_x86_64(E) crypto_simd(E) mac80211(E) cryptd(E) 
glue_helper(E) snd_hda_intel(E) snd_hda_codec(E) iwlwifi(E) snd_hwdep(E) 
psmouse(E) snd_hda_core(E) snd_pcm(E) serio_raw(E) sdhci_pci(E) pcspkr(E) 
snd_timer(E) ehci_pci(E) e1000e(E) i2c_i801(E) ehci_hcd(E) snd(E) sg(E) i915(E) 
lpc_ich(E)
[   60.007366]  ptp(E) usbcore(E) cfg80211(E) mfd_core(E) pps_core(E) shpchp(E) 
ac(E) battery(E) tpm_tis(E) tpm_tis_core(E) evdev(E) tpm(E) parport_pc(E) 
ppdev(E) lp(E) parport(E) ip_tables(E) x_tables(E) autofs4(E)
[   60.007474] CPU: 2 PID: 33 Comm: kworker/2:1 Tainted: GE   
4.12.3-nr44-normandy-r1500619820+ #1
[   60.007524] Hardware name: LENOVO 4291LR7/4291LR7, BIOS CBET4000 
4.6-810-g50522254fb 07/21/2017
[   60.007580] Workqueue: usb_hub_wq hub_event [usbcore]
[   60.007609] task: 8c882b716040 task.stack: b8e800d84000
[   60.007644] RIP: 0010:qmi_wwan_disconnect+0x25/0xc0 [qmi_wwan]
[   60.007678] RSP: 0018:b8e800d87b38 EFLAGS: 00010246
[   60.007711] RAX:  RBX:  RCX: 
[   60.007752] RDX: 0001 RSI: 8c8824f3f1d0 RDI: 8c8824ef6400
[   60.007792] RBP: 8c8824ef6400 R08:  R09: 
[   60.007833] R10: b8e800d87780 R11: 0011 R12: c07ea0e8
[   60.007874] R13: 8c8824e2e000 R14: 8c8824e2e098 R15: 
[   60.007915] FS:  () GS:8c883530() 
knlGS:
[   60.007960] CS:  0010 DS:  ES:  CR0: 80050033
[   60.007994] CR2: 00e0 CR3: 000229ca5000 CR4: 000406e0
[   60.008035] Call Trace:
[   60.008065]  ? usb_unbind_interface+0x71/0x270 [usbcore]
[   60.008101]  ? device_release_driver_internal+0x154/0x210
[   60.008135]  ? qmi_wwan_unbind+0x6d/0xc0 [qmi_wwan]
[   60.008168]  ? usbnet_disconnect+0x6c/0xf0 [usbnet]
[   60.008194]  ? qmi_wwan_disconnect+0x87/0xc0 [qmi_wwan]
[   60.008232]  ? usb_unbind_interface+0x71/0x270 [usbcore]
[   60.008264]  ? device_release_driver_internal+0x154/0x210
[   60.008296]  ? bus_remove_device+0xf5/0x160
[   60.008324]  ? device_del+0x1dc/0x310
[   60.008355]  ? usb_remove_ep_devs+0x1b/0x30 [usbcore]
[   60.008393]  ? usb_disable_device+0x93/0x250 [usbcore]
[   60.008430]  ? usb_disconnect+0x90/0x260 [usbcore]
[   60.008468]  ? hub_event+0x1d9/0x14a0 [usbcore]
[   60.008500]  ? process_one_work+0x175/0x370
[   60.008528]  ? worker_thread+0x4a/0x380
[   60.008555]  ? kthread+0xfc/0x130
[   60.008579]  ? process_one_work+0x370/0x370
[   60.008606]  ? kthread_park+0x60/0x60
[   60.008631]  ? ret_from_fork+0x22/0x30
[   60.008656] Code: 66 0f 1f 44 00 00 66 66 66 66 90 55 48 89 fd 53 48 83 ec 10 48 
8b 9f c8 00 00 00 65 48 8b 04 25 28 00 00 00 48 89 44 24 08 31 c0  83 e0 00 
00 00 02 74 51 e8 0d b3 2b cd 85 c0 74 67 48 8b bb
[   60.011925] RIP: qmi_wwan_disconnect+0x25/0xc0 [qmi_wwan] RSP: 
b8e800d87b38
[   60.013564] CR2: 00e0
[   60.022125] ---[ end trace e536b59f45bc0f25 ]---
[   60.025385] IPv6: ADDRCONF(NETDEV_UP): wlp2s0: link is not ready

If I attempt a second rmmod, the process hangs. If I attempt it on 4.11.x it 
works as expected:

[   16.897783] fuse init (API version 7.26)
[   68.073552] usbcore: deregistering interface driver qmi_wwan
[   68.075808] qmi_wwan 2-1.4:1.6 wwp0s29u1u4i6: unregister 'qmi_wwan' 
usb-:00:1d.0-1.4, WWAN/QMI device
[   72.431403] e1000e: enp0s25 NIC Link is Down

So I'm pretty certain it's not coreboot causing the issue.

[PATCH V2 net] Revert "vhost: cache used event for better performance"

2017-07-26 Thread Jason Wang

This reverts commit 809ecb9bca6a9424ccd392d67e368160f8b76c92. Since it
was reported to break vhost_net. We want to cache used event and use
it to check for notification. The assumption was that guest won't move
the event idx back, but this could happen in fact when 16 bit index
wraps around after 64K entries.

Signed-off-by: Jason Wang 
---
- Changes from V1: tweak commit log
- The patch is needed for -stable.
---
 drivers/vhost/vhost.c | 28 ++--
 drivers/vhost/vhost.h |  3 ---
 2 files changed, 6 insertions(+), 25 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index e4613a3..9cb3f72 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -308,7 +308,6 @@ static void vhost_vq_reset(struct vhost_dev *dev,
vq->avail = NULL;
vq->used = NULL;
vq->last_avail_idx = 0;
-   vq->last_used_event = 0;
vq->avail_idx = 0;
vq->last_used_idx = 0;
vq->signalled_used = 0;
@@ -1402,7 +1401,7 @@ long vhost_vring_ioctl(struct vhost_dev *d, int ioctl, 
void __user *argp)
r = -EINVAL;
break;
}
-   vq->last_avail_idx = vq->last_used_event = s.num;
+   vq->last_avail_idx = s.num;
/* Forget the cached index value. */
vq->avail_idx = vq->last_avail_idx;
break;
@@ -2241,6 +2240,10 @@ static bool vhost_notify(struct vhost_dev *dev, struct 
vhost_virtqueue *vq)
__u16 old, new;
__virtio16 event;
bool v;
+   /* Flush out used index updates. This is paired
+* with the barrier that the Guest executes when enabling
+* interrupts. */
+   smp_mb();
 
if (vhost_has_feature(vq, VIRTIO_F_NOTIFY_ON_EMPTY) &&
unlikely(vq->avail_idx == vq->last_avail_idx))
@@ -2248,10 +2251,6 @@ static bool vhost_notify(struct vhost_dev *dev, struct 
vhost_virtqueue *vq)
 
if (!vhost_has_feature(vq, VIRTIO_RING_F_EVENT_IDX)) {
__virtio16 flags;
-   /* Flush out used index updates. This is paired
-* with the barrier that the Guest executes when enabling
-* interrupts. */
-   smp_mb();
if (vhost_get_avail(vq, flags, >avail->flags)) {
vq_err(vq, "Failed to get flags");
return true;
@@ -2266,26 +2265,11 @@ static bool vhost_notify(struct vhost_dev *dev, struct 
vhost_virtqueue *vq)
if (unlikely(!v))
return true;
 
-   /* We're sure if the following conditions are met, there's no
-* need to notify guest:
-* 1) cached used event is ahead of new
-* 2) old to new updating does not cross cached used event. */
-   if (vring_need_event(vq->last_used_event, new + vq->num, new) &&
-   !vring_need_event(vq->last_used_event, new, old))
-   return false;
-
-   /* Flush out used index updates. This is paired
-* with the barrier that the Guest executes when enabling
-* interrupts. */
-   smp_mb();
-
if (vhost_get_avail(vq, event, vhost_used_event(vq))) {
vq_err(vq, "Failed to get used event idx");
return true;
}
-   vq->last_used_event = vhost16_to_cpu(vq, event);
-
-   return vring_need_event(vq->last_used_event, new, old);
+   return vring_need_event(vhost16_to_cpu(vq, event), new, old);
 }
 
 /* This actually signals the guest, using eventfd. */
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index f720958..bb7c29b 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -115,9 +115,6 @@ struct vhost_virtqueue {
/* Last index we used. */
u16 last_used_idx;
 
-   /* Last used evet we've seen */
-   u16 last_used_event;
-
/* Used flags */
u16 used_flags;
 
-- 
2.7.4

Re: [PATCH 1/4] can: dev: Add support for limiting configured bitrate

2017-07-26 Thread Franklin S Cooper Jr

Hi Kurt,
On 07/26/2017 03:04 PM, Kurt Van Dijck wrote:
> Hi,
> 
> I know my response is late ...
> 
>> Hi Oliver
>> On 07/20/2017 02:43 AM, Oliver Hartkopp wrote:
>>> Hi Franklin,
>>>
>>> On 07/20/2017 01:36 AM, Franklin S Cooper Jr wrote:
>>>
 +#ifdef CONFIG_OF
 +void of_transceiver_is_fixed(struct net_device *dev)
 +{
>>>
>>> (..)
>>>
 +}
 +EXPORT_SYMBOL(of_transceiver_is_fixed);
 +#endif
>>>
>>> I'm not sure about the naming here.
>>>
>>> As this is a CAN transceiver related option it should be named accordingly:
> 
> I contest the the name too:
> 1) the can transceiver isn't fixed at all, it limited to the higher
> bitrates.

Its "possible" that this subnode may have additional properties beyond
bitrates in the future. But your right as of now it is specifically
addressing max bit rates. The naming of this function and subnode is
based on "fixed-link". So "fixed" is just implying that certain
properties can't be changed.

> 
> 2) of_can_transceiver_is_fixed suggests to test if a transceiver is
> fixed, it does not suggest to load some properties from the device tree.
> of_can_load_transceiver looks way more clear to me.

I address this partially in my rev 2 that I've already sent. I'm now
using of_can_transceiver_fixed. The fact its of_ already implies it is
loading properties from device tree. So I don't think
of_can_load_transceiver really makes things clearer.

> 
> That's my opinion.
> The important things, like the contents of the functions, look good.

You mind throwing your two cents in the thread for my latest patch?
Specifically the conversation regarding naming the properties. A couple
of people prefer to not use "arbitration" in one of the property names.
Currently I believe there are two options on property names that can be
used and I'm open to a majority vote on which one to go with.

> 
> Kind regards,
> Kurt Van Dijck
>

RE: [PATCH V2 net-next] TLP: Don't reschedule PTO when there's one outstanding TLP retransmission

2017-07-26 Thread maowenan



> -Original Message-
> From: Sergei Shtylyov [mailto:sergei.shtyl...@cogentembedded.com]
> Sent: Wednesday, July 26, 2017 6:03 PM
> To: maowenan; netdev@vger.kernel.org; da...@davemloft.net;
> ncardw...@google.com; ych...@google.com; nandi...@google.com;
> weiyongjun (A); Chenweilong; Wangkefeng (Kevin)
> Subject: Re: [PATCH V2 net-next] TLP: Don't reschedule PTO when there's one
> outstanding TLP retransmission
> 
> On 7/26/2017 12:44 PM, Mao Wenan wrote:
> 
> > If there is one TLP probe went out(TLP use the write_queue_tail packet
> > as TLP probe, we assume this first TLP probe named A), and this TLP
> > probe was not acked by receive side.
> >
> > Then the transmit side sent the next two packetes out(named B,C), but
> > unfortunately these two packets are also not acked by receive side.
> >
> > And then there is one data packet with ack_seq A arrive at transmit
> > side, in tcp_ack() will call tcp_schedule_loss_probe() to rearm PTO,
> > the handler tcp_send_loss_probe() is to check
> > if(tp->tlp_high_seq) then go to rearm_timer(because there is one
> > outstanding TLP named A), so the new TLP probe can't be sent out and
> > it needs to rearm the RTO timer(timeout is relative to the transmit
> > time of the write queue head).
> >
> > After that, there is another data packet with ack_seq A is received,
> > if the tlp_time_stamp is greater than rto_time_stamp, it will reset
> > the TLP timeout, which is before previous RTO timeout, so PTO is rearm
> > and previous RTO is cleared. Because there is no retransmission packet
> > was sent or no TLP sack receive, tp->tlp_high_seq can't be reset to
> > zero and the next TLP probe also can't be sent out, so there is no
> > way(or very long time) to retransmit the lost packet.
> >
> > This fix is to check(tp->tlp_high_seq) in tcp_schedule_loss_probe()
> > when TLP PTO is after RTO, It is not needed to reschedule PTO when
> > there is one outstanding TLP retransmission, so if the TLP A is lost
> > RTO can retransmit lost packet, then tp->tlp_high_seq will be set to
> > 0, and TLP will go to the normal work process.
> >
> > v1->v2
> > refine some words of code and patch comments.
> >
> > Signed-off-by: Mao Wenan 
> > ---
> >   net/ipv4/tcp_output.c | 6 ++
> >   1 file changed, 6 insertions(+)
> >
> > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index
> > 886d874..f85c7ef 100644
> > --- a/net/ipv4/tcp_output.c
> > +++ b/net/ipv4/tcp_output.c
> > @@ -2423,6 +2423,12 @@ bool tcp_schedule_loss_probe(struct sock *sk)
> > tlp_time_stamp = tcp_jiffies32 + timeout;
> > rto_time_stamp = (u32)inet_csk(sk)->icsk_timeout;
> > if ((s32)(tlp_time_stamp - rto_time_stamp) > 0) {
> > +   /* It is not needed to reschedule PTO when there
> > +* is one outstanding TLP retransmission.
> > +*/
> > +   if (tp->tlp_high_seq) {
> > +   return false;
> > +   }
> 
> I have already told you to remove the needless {}... :-/
Thanks, sorry for this.
> 
> [...]
> 
> MBR, Sergei

RE: [PATCH net-next] TLP: Don't reschedule PTO when there's one outstanding TLP retransmission.

2017-07-26 Thread maowenan



> -Original Message-
> From: Yuchung Cheng [mailto:ych...@google.com]
> Sent: Thursday, July 27, 2017 12:45 AM
> To: maowenan
> Cc: Neal Cardwell; Netdev; David Miller; weiyongjun (A); Chenweilong; Nandita
> Dukkipati; Wangkefeng (Kevin)
> Subject: Re: [PATCH net-next] TLP: Don't reschedule PTO when there's one
> outstanding TLP retransmission.
> 
> On Tue, Jul 25, 2017 at 7:12 PM, maowenan 
> wrote:
> >
> >
> >
> > > -Original Message-
> > > From: Neal Cardwell [mailto:ncardw...@google.com]
> > > Sent: Tuesday, July 25, 2017 9:30 PM
> > > To: maowenan
> > > Cc: Netdev; David Miller; weiyongjun (A); Chenweilong; Yuchung
> > > Cheng; Nandita Dukkipati
> > > Subject: Re: [PATCH net-next] TLP: Don't reschedule PTO when there's
> > > one outstanding TLP retransmission.
> > >
> > > On Tue, Jul 25, 2017 at 4:35 AM, Mao Wenan 
> > > wrote:
> > > > If there is one TLP probe went out(TLP use the write_queue_tail
> > > > packet as TLP probe, we assume this first TLP probe named A), and
> > > > this TLP probe was not acked by receive side.
> > > >
> > > > Then the transmit side sent the next two packetes out(named B,C),
> > > > but unfortunately these two packets are also not acked by receive side.
> > > >
> > > > And then there is one data packet with ack_seq A arrive, in
> > > > tcp_ack() will call tcp_schedule_loss_probe() to rearm PTO, the
> > > > handler
> > > > tcp_send_loss_probe() pass if(tp->tlp_high_seq)(because there is
> > > > one outstanding TLP named A,tp->tlp_high_seq is not zero), so the
> > > > new TLP probe can't be went out and need to rearm the RTO
> > > > timer(timeout is relative to the transmit time of the write queue head).
> > > >
> > > > After this, another data packet with ack_seq A is received, if the
> > > > tlp_time_stamp is after rto_time_stamp, it will reset the TLP
> > > > timeout with delta value, which is before previous RTO timeout, so
> > > > PTO is rearm and previous RTO is cleared. This TLP probe also
> > > > can't be sent out because of tp->tlp_high_seq != 0, so there is no
> > > > way(or need very long time)to retransmit the packet because of TLP A is
> lost.
> > > >
> > > > This fix is not to pass the if(tp->tlp_high_seq) in
> > > > tcp_schedule_loss_probe() when TLP PTO is after RTO, It is no need
> > > > to reschedule PTO when there is one outstanding TLP
> > > > retransmission, so if the TLP A is lost then RTO can retransmit
> > > > that packet, and
> > > > tp->tlp_high_seq will be set to 0. After this TLP will go to the
> > > > tp->normal work
> > > process.
> > > >
> > > > Signed-off-by: Mao Wenan 
> > >
> > > Thanks for posting this. This is a pretty involved scenario. To help
> > > document/test precisely what the behavior is before and after your
> > > patch, would you be able to post a packetdrill (
> > > https://github.com/google/packetdrill ) test case for this scenario?
> > >
> > > Can I ask if you saw this scenario in an actual trace, or noticed
> > > this by inspection?
> > [Mao Wenan] It is my actual product scenario, from some debug trace
> > info I found that PTO is always rescheduled before RTO, this is PTO is
> > trigged by receiving one tcp_ack packets. After this patch, the
> > product works well and there is no previous issue happen. The packet
> > can retransmit with RTO if TLP probe is lost.
> > I will say sorry that my local test can't reproduce because i can't
> > build successfully the same complex scenario, I don't have mature test
> > case to reproduce now but I will do my best to try in local test 
> > environment.
> tcpdump output of an example trace also helps. Feel free to remove IP address,
> ports, or payload information. We only need the timing and TCP header info.
Ok, please check as below. Server: 192.169.0.18, client: 192.169.0.17
..
20:36:12.239562 IP 192.169.0.18.1234 > 192.169.0.17.56246: P 
501216544:501216616(72) ack 2308880351 win 92
20:36:12.240034 IP 192.169.0.18.1234 > 192.169.0.17.56246: P 
501216616:501216688(72) ack 2308880351 win 92
20:36:12.260764 IP 192.169.0.18.1234 > 192.169.0.17.56246: P 
501216616:501216688(72) ack 2308880351 win 92 /* first TLP probe A, seq: 
501216616 */ 
20:36:13.018020 IP 192.169.0.17.56246 > 192.169.0.18.1234: . ack 501216616 win 
1970 
20:36:13.018386 IP 192.169.0.18.1234 > 192.169.0.17.56246: P 
501216688:501216760(72) ack 2308880351 win 92 /* server send packet B */
20:36:13.022448 IP 192.169.0.18.1234 > 192.169.0.17.56246: P 
501216760:501216832(72) ack 2308880351 win 92 /* server send packet C */
20:36:13.033813 IP 192.169.0.17.56246 > 192.169.0.18.1234: P 
2308880351:2308880437(86) ack 501216616 win 1970  /* receive client data packet 
ack 501216616 */
20:36:13.034213 IP 192.169.0.17.56246 > 192.169.0.18.1234: P 
2308880437:2308880523(86) ack 501216616 win 1970  /* receive client data packet 
ack 501216616 */
20:36:13.034633 IP 192.169.0.18.1234 > 192.169.0.17.56246: .

Re: [PATCH v7 2/3] PCI: Enable PCIe Relaxed Ordering if supported

2017-07-26 Thread Ding Tianhong



On 2017/7/27 2:26, Casey Leedom wrote:
>   By the way Ding, two issues:
> 
>  1. Did we ever get any acknowledgement from either Intel or AMD
> on this patch?  I know that we can't ensure that, but it sure would
> be nice since the PCI Quirks that we're putting in affect their
> products.
> 

Still no Intel and AMD guys has ack this, this is what I am worried about, 
should I
ping some man again ?

Thanks
Ding
> 
> Casey
> .
>

Re: [PATCH v7 2/3] PCI: Enable PCIe Relaxed Ordering if supported

2017-07-26 Thread Ding Tianhong



On 2017/7/27 3:05, Casey Leedom wrote:
> | From: Alexander Duyck 
> | Sent: Wednesday, July 26, 2017 11:44 AM
> | 
> | On Jul 26, 2017 11:26 AM, "Casey Leedom"  wrote:
> | |
> | | I think that the patch will need to be extended to modify
> | | drivers/pci.c/iov.c:sriov_enable() to explicitly turn off
> | | Relaxed Ordering Enable if the Root Complex is marked
> | for no RO TLPs.
> | 
> | I'm not sure that would be an issue. Wouldn't most VFs inherit the PF's 
> settings?
> 
> Ah yes, you're right.  This is covered in section 3.5.4 of the Single Root I/O
> Virtualization and Sharing Specification, Revision 1.0 (September 11, 2007),
> governing the PCIe Capability Device Control register.  It states that the VF
> version of that register shall follow the setting of the corresponding PF.
> 
> So we should enhance the cxgb4vf/sge.c:t4vf_sge_alloc_rxq() in the same
> way we did for the cxgb4 driver, but that's not critical since the Relaxed
> Ordering Enable supersedes the internal chip's desire to use the Relaxed
> Ordering Attribute.
> 
> Ding, send me a note if you'd like me to work that up for you.
> 

Ok, you could send the change log and I could put it in the v8 version together,
will you base on the patch 3/3 or build a independence patch?

Ding

> | Also I thought most of the VF configuration space is read only.
> 
> Yes, but not all of it.  And when a VF is exported to a Virtual Machine,
> then the Hypervisor captures and interprets all accesses to the VF's
> PCIe Configuration Space from the VM.
> 
> Thanks again for reminding me of the subtle aspect of the SR_IOV
> specification that I forgot.
> 
> Casey
> .
>

Re: [PATCH v3 net-next] net: systemport: Support 64bit statistics

2017-07-26 Thread Florian Fainelli

On 07/26/2017 05:40 PM, Jianming.qiao wrote:
> When using Broadcom Systemport device in 32bit Platform, ifconfig can
> only report up to 4G tx,rx status, which will be wrapped to 0 when the
> number of incoming or outgoing packets exceeds 4G, only taking
> around 2 hours in busy network environment (such as streaming).
> Therefore, it makes hard for network diagnostic tool to get reliable
> statistical result, so the patch is used to add 64bit support for
> Broadcom Systemport device in 32bit Platform.
> 
> Signed-off-by: Jianming.qiao 
> ---
>  drivers/net/ethernet/broadcom/bcmsysport.c | 71 
> --
>  drivers/net/ethernet/broadcom/bcmsysport.h |  9 +++-
>  2 files changed, 55 insertions(+), 25 deletions(-)
> 
> diff --git a/drivers/net/ethernet/broadcom/bcmsysport.c 
> b/drivers/net/ethernet/broadcom/bcmsysport.c
> index 5274501..967221f 100644
> --- a/drivers/net/ethernet/broadcom/bcmsysport.c
> +++ b/drivers/net/ethernet/broadcom/bcmsysport.c
> @@ -662,6 +662,7 @@ static int bcm_sysport_alloc_rx_bufs(struct 
> bcm_sysport_priv *priv)
>  static unsigned int bcm_sysport_desc_rx(struct bcm_sysport_priv *priv,
>   unsigned int budget)
>  {
> + struct bcm_sysport_stats *stats64 = >stats64;
>   struct net_device *ndev = priv->netdev;
>   unsigned int processed = 0, to_process;
>   struct bcm_sysport_cb *cb;
> @@ -765,6 +766,10 @@ static unsigned int bcm_sysport_desc_rx(struct 
> bcm_sysport_priv *priv,
>   skb->protocol = eth_type_trans(skb, ndev);
>   ndev->stats.rx_packets++;
>   ndev->stats.rx_bytes += len;
> + u64_stats_update_begin(>syncp);
> + stats64->rx_packets++;
> + stats64->rx_bytes += len;
> + u64_stats_update_end(>syncp);
>  
>   napi_gro_receive(>napi, skb);
>  next:
> @@ -784,24 +789,31 @@ static void bcm_sysport_tx_reclaim_one(struct 
> bcm_sysport_tx_ring *ring,
>  unsigned int *pkts_compl)
>  {
>   struct bcm_sysport_priv *priv = ring->priv;
> + struct bcm_sysport_stats *stats64 = >stats64;
>   struct device *kdev = >pdev->dev;
> + unsigned int len = 0;
>  
>   if (cb->skb) {
> - ring->bytes += cb->skb->len;
> - *bytes_compl += cb->skb->len;
> + len = cb->skb->len;
> + *bytes_compl += len;
>   dma_unmap_single(kdev, dma_unmap_addr(cb, dma_addr),
>dma_unmap_len(cb, dma_len),
>DMA_TO_DEVICE);
> - ring->packets++;
>   (*pkts_compl)++;
>   bcm_sysport_free_cb(cb);
>   /* SKB fragment */
>   } else if (dma_unmap_addr(cb, dma_addr)) {
> - ring->bytes += dma_unmap_len(cb, dma_len);
> + len = dma_unmap_len(cb, dma_len);
>   dma_unmap_page(kdev, dma_unmap_addr(cb, dma_addr),
>  dma_unmap_len(cb, dma_len), DMA_TO_DEVICE);
>   dma_unmap_addr_set(cb, dma_addr, 0);
>   }
> +
> + u64_stats_update_begin(>syncp);
> + ring->bytes += len;
> + if (cb->skb)
> + ring->packets++;

How can this be safe now with bcm_sysport_free_cb() doing cb->skb =
NULL? I know I recommended moving the 64-bit statistics update out of
the if () else () clause before but still, it sounds like we are no
longer able to count the ring packets here.

> + u64_stats_update_end(>syncp);
>  }
>  
>  /* Reclaim queued SKBs for transmission completion, lockless version */
> @@ -1671,24 +1683,6 @@ static int bcm_sysport_change_mac(struct net_device 
> *dev, void *p)
>   return 0;
>  }
>  
> -static struct net_device_stats *bcm_sysport_get_nstats(struct net_device 
> *dev)
> -{
> - struct bcm_sysport_priv *priv = netdev_priv(dev);
> - unsigned long tx_bytes = 0, tx_packets = 0;
> - struct bcm_sysport_tx_ring *ring;
> - unsigned int q;
> -
> - for (q = 0; q < dev->num_tx_queues; q++) {
> - ring = >tx_rings[q];
> - tx_bytes += ring->bytes;
> - tx_packets += ring->packets;
> - }
> -
> - dev->stats.tx_bytes = tx_bytes;
> - dev->stats.tx_packets = tx_packets;
> - return >stats;
> -}
> -
>  static void bcm_sysport_netif_start(struct net_device *dev)
>  {
>   struct bcm_sysport_priv *priv = netdev_priv(dev);
> @@ -1923,6 +1917,37 @@ static int bcm_sysport_stop(struct net_device *dev)
>   return 0;
>  }
>  
> +static void bcm_sysport_get_stats64(struct net_device *dev,
> + struct rtnl_link_stats64 *stats)
> +{
> + struct bcm_sysport_priv *priv = netdev_priv(dev);
> + struct bcm_sysport_stats *stats64 = >stats64;
> + struct bcm_sysport_tx_ring *ring;
> + u64 tx_packets = 0, tx_bytes = 0;
> + unsigned int start;
> + unsigned int q;
> +
> + netdev_stats_to_stats64(stats, >stats);
> +
> +

[PATCH v3 net-next] net: systemport: Support 64bit statistics

2017-07-26 Thread Jianming.qiao

When using Broadcom Systemport device in 32bit Platform, ifconfig can
only report up to 4G tx,rx status, which will be wrapped to 0 when the
number of incoming or outgoing packets exceeds 4G, only taking
around 2 hours in busy network environment (such as streaming).
Therefore, it makes hard for network diagnostic tool to get reliable
statistical result, so the patch is used to add 64bit support for
Broadcom Systemport device in 32bit Platform.

Signed-off-by: Jianming.qiao 
---
 drivers/net/ethernet/broadcom/bcmsysport.c | 71 --
 drivers/net/ethernet/broadcom/bcmsysport.h |  9 +++-
 2 files changed, 55 insertions(+), 25 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bcmsysport.c 
b/drivers/net/ethernet/broadcom/bcmsysport.c
index 5274501..967221f 100644
--- a/drivers/net/ethernet/broadcom/bcmsysport.c
+++ b/drivers/net/ethernet/broadcom/bcmsysport.c
@@ -662,6 +662,7 @@ static int bcm_sysport_alloc_rx_bufs(struct 
bcm_sysport_priv *priv)
 static unsigned int bcm_sysport_desc_rx(struct bcm_sysport_priv *priv,
unsigned int budget)
 {
+   struct bcm_sysport_stats *stats64 = >stats64;
struct net_device *ndev = priv->netdev;
unsigned int processed = 0, to_process;
struct bcm_sysport_cb *cb;
@@ -765,6 +766,10 @@ static unsigned int bcm_sysport_desc_rx(struct 
bcm_sysport_priv *priv,
skb->protocol = eth_type_trans(skb, ndev);
ndev->stats.rx_packets++;
ndev->stats.rx_bytes += len;
+   u64_stats_update_begin(>syncp);
+   stats64->rx_packets++;
+   stats64->rx_bytes += len;
+   u64_stats_update_end(>syncp);
 
napi_gro_receive(>napi, skb);
 next:
@@ -784,24 +789,31 @@ static void bcm_sysport_tx_reclaim_one(struct 
bcm_sysport_tx_ring *ring,
   unsigned int *pkts_compl)
 {
struct bcm_sysport_priv *priv = ring->priv;
+   struct bcm_sysport_stats *stats64 = >stats64;
struct device *kdev = >pdev->dev;
+   unsigned int len = 0;
 
if (cb->skb) {
-   ring->bytes += cb->skb->len;
-   *bytes_compl += cb->skb->len;
+   len = cb->skb->len;
+   *bytes_compl += len;
dma_unmap_single(kdev, dma_unmap_addr(cb, dma_addr),
 dma_unmap_len(cb, dma_len),
 DMA_TO_DEVICE);
-   ring->packets++;
(*pkts_compl)++;
bcm_sysport_free_cb(cb);
/* SKB fragment */
} else if (dma_unmap_addr(cb, dma_addr)) {
-   ring->bytes += dma_unmap_len(cb, dma_len);
+   len = dma_unmap_len(cb, dma_len);
dma_unmap_page(kdev, dma_unmap_addr(cb, dma_addr),
   dma_unmap_len(cb, dma_len), DMA_TO_DEVICE);
dma_unmap_addr_set(cb, dma_addr, 0);
}
+
+   u64_stats_update_begin(>syncp);
+   ring->bytes += len;
+   if (cb->skb)
+   ring->packets++;
+   u64_stats_update_end(>syncp);
 }
 
 /* Reclaim queued SKBs for transmission completion, lockless version */
@@ -1671,24 +1683,6 @@ static int bcm_sysport_change_mac(struct net_device 
*dev, void *p)
return 0;
 }
 
-static struct net_device_stats *bcm_sysport_get_nstats(struct net_device *dev)
-{
-   struct bcm_sysport_priv *priv = netdev_priv(dev);
-   unsigned long tx_bytes = 0, tx_packets = 0;
-   struct bcm_sysport_tx_ring *ring;
-   unsigned int q;
-
-   for (q = 0; q < dev->num_tx_queues; q++) {
-   ring = >tx_rings[q];
-   tx_bytes += ring->bytes;
-   tx_packets += ring->packets;
-   }
-
-   dev->stats.tx_bytes = tx_bytes;
-   dev->stats.tx_packets = tx_packets;
-   return >stats;
-}
-
 static void bcm_sysport_netif_start(struct net_device *dev)
 {
struct bcm_sysport_priv *priv = netdev_priv(dev);
@@ -1923,6 +1917,37 @@ static int bcm_sysport_stop(struct net_device *dev)
return 0;
 }
 
+static void bcm_sysport_get_stats64(struct net_device *dev,
+   struct rtnl_link_stats64 *stats)
+{
+   struct bcm_sysport_priv *priv = netdev_priv(dev);
+   struct bcm_sysport_stats *stats64 = >stats64;
+   struct bcm_sysport_tx_ring *ring;
+   u64 tx_packets = 0, tx_bytes = 0;
+   unsigned int start;
+   unsigned int q;
+
+   netdev_stats_to_stats64(stats, >stats);
+
+   for (q = 0; q < dev->num_tx_queues; q++) {
+   ring = >tx_rings[q];
+   do {
+   start = u64_stats_fetch_begin_irq(>syncp);
+   tx_bytes += ring->bytes;
+   tx_packets += ring->packets;
+   } while (u64_stats_fetch_retry_irq(>syncp, start));
+   }
+
+   stats->tx_packets = tx_packets;
+

Re: [PATCH] bpf: testing: fix devmap tests

2017-07-26 Thread Daniel Borkmann


On 07/27/2017 02:32 AM, John Fastabend wrote:

Apparently through one of my revisions of the initial patches
series I lost the devmap test. We can add more testing later but
for now lets fix the simple one we have.

Fixes: 546ac1ffb70d "bpf: add devmap, a map for storing net device references"
Reported-by: Jakub Kicinski 
Signed-off-by: John Fastabend 


Looks good, thanks!

Acked-by: Daniel Borkmann

RE: [Intel-wired-lan] [PATCH] igb: support BCM54616 PHY

2017-07-26 Thread Brown, Aaron F

> From: Intel-wired-lan [mailto:intel-wired-lan-boun...@osuosl.org] On Behalf
> Of John W. Linville
> Sent: Friday, July 21, 2017 11:12 AM
> To: netdev@vger.kernel.org
> Cc: intel-wired-...@lists.osuosl.org; John W. Linville
> 
> Subject: [Intel-wired-lan] [PATCH] igb: support BCM54616 PHY
> 
> The management port on an Edgecore AS7712-32 switch uses an igb MAC,
> but
> it uses a BCM54616 PHY. Without a patch like this, loading the igb
> module produces dmesg output like this:
> 
> [3.439125] igb: Copyright (c) 2007-2014 Intel Corporation.
> [3.439866] igb: probe of :00:14.0 failed with error -2
> 
> Signed-off-by: John W. Linville 
> Cc: Jeff Kirsher 
> ---
>  drivers/net/ethernet/intel/igb/e1000_82575.c   | 6 ++
>  drivers/net/ethernet/intel/igb/e1000_defines.h | 1 +
>  drivers/net/ethernet/intel/igb/e1000_hw.h  | 1 +
>  3 files changed, 8 insertions(+)

I do not have the specific hardware (Edgecore switch) but as far as regression 
tests go this works fine.
Tested-by: Aaron Brown

Re: [PATCH] bpf: testing: fix devmap tests

2017-07-26 Thread John Fastabend

On 07/26/2017 05:32 PM, John Fastabend wrote:
> Apparently through one of my revisions of the initial patches
> series I lost the devmap test. We can add more testing later but
> for now lets fix the simple one we have.
> 
> Fixes: 546ac1ffb70d "bpf: add devmap, a map for storing net device references"
> Reported-by: Jakub Kicinski 
> Signed-off-by: John Fastabend 
> ---

Forgot prefix, this is for net-next. Sorry about that.

Thanks,
John

[PATCH] bpf: testing: fix devmap tests

2017-07-26 Thread John Fastabend

Apparently through one of my revisions of the initial patches
series I lost the devmap test. We can add more testing later but
for now lets fix the simple one we have.

Fixes: 546ac1ffb70d "bpf: add devmap, a map for storing net device references"
Reported-by: Jakub Kicinski 
Signed-off-by: John Fastabend 
---
 tools/testing/selftests/bpf/test_maps.c |4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index ce2988b..1579cab 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -104,6 +104,7 @@ enum bpf_map_type {
BPF_MAP_TYPE_LPM_TRIE,
BPF_MAP_TYPE_ARRAY_OF_MAPS,
BPF_MAP_TYPE_HASH_OF_MAPS,
+   BPF_MAP_TYPE_DEVMAP,
 };
 
 enum bpf_prog_type {
diff --git a/tools/testing/selftests/bpf/test_maps.c 
b/tools/testing/selftests/bpf/test_maps.c
index 36d6ac3..c991ab6 100644
--- a/tools/testing/selftests/bpf/test_maps.c
+++ b/tools/testing/selftests/bpf/test_maps.c
@@ -440,7 +440,7 @@ static void test_arraymap_percpu_many_keys(void)
 
 static void test_devmap(int task, void *data)
 {
-   int next_key, fd;
+   int fd;
__u32 key, value;
 
fd = bpf_create_map(BPF_MAP_TYPE_DEVMAP, sizeof(key), sizeof(value),
@@ -620,6 +620,8 @@ static void run_all_tests(void)
 
test_arraymap_percpu_many_keys();
 
+   test_devmap(0, NULL);
+
test_map_large();
test_map_parallel();
test_map_stress();

Re: [PATCH net-next v2 07/10] net: dsa: lan9303: Added basic offloading of unicast traffic

2017-07-26 Thread kbuild test robot

Hi Egil,

[auto build test WARNING on net-next/master]

url:
https://github.com/0day-ci/linux/commits/Egil-Hjelmeland/net-dsa-lan9303-unicast-offload-fdb-mdb-STP/20170727-074246
config: x86_64-randconfig-x000-201730 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

Note: it may well be a FALSE warning. FWIW you are at least aware of it now.
http://gcc.gnu.org/wiki/Better_Uninitialized_Warnings

All warnings (new ones prefixed by >>):

   drivers/net//dsa/lan9303-core.c: In function 'lan9303_port_stp_state_set':
>> drivers/net//dsa/lan9303-core.c:945:16: warning: 'portstate' may be used 
>> uninitialized in this function [-Wmaybe-uninitialized]
 int portmask, portstate;
   ^

vim +/portstate +945 drivers/net//dsa/lan9303-core.c

   941  
   942  static void lan9303_port_stp_state_set(struct dsa_switch *ds, int port,
   943 u8 state)
   944  {
 > 945  int portmask, portstate;
   946  struct lan9303 *chip = ds->priv;
   947  
   948  dev_dbg(chip->dev, "%s(port %d, state %d)\n",
   949  __func__, port, state);
   950  if (!chip->is_bridged)
   951  return;
   952  
   953  switch (state) {
   954  case BR_STATE_DISABLED:
   955  portstate = LAN9303_SWE_PORT_STATE_DISABLED_PORT0;
   956  break;
   957  case BR_STATE_BLOCKING:
   958  case BR_STATE_LISTENING:
   959  portstate = LAN9303_SWE_PORT_STATE_BLOCKING_PORT0;
   960  break;
   961  case BR_STATE_LEARNING:
   962  portstate = LAN9303_SWE_PORT_STATE_LEARNING_PORT0;
   963  break;
   964  case BR_STATE_FORWARDING:
   965  portstate = LAN9303_SWE_PORT_STATE_FORWARDING_PORT0;
   966  break;
   967  default:
   968  dev_err(chip->dev, "%s(port %d, state %d)\n",
   969  __func__, port, state);
   970  }
   971  portmask = 0x3 << (port * 2);
   972  portstate <<= (port * 2);
   973  lan9303_write_switch_reg_mask(chip, LAN9303_SWE_PORT_STATE,
   974portstate, portmask);
   975  }
   976  

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip

Re: [PATCH net] bpf: don't zero out the info struct in bpf_obj_get_info_by_fd()

2017-07-26 Thread David Miller

From: Jakub Kicinski 
Date: Tue, 25 Jul 2017 15:16:12 -0700

> The buffer passed to bpf_obj_get_info_by_fd() should be initialized
> to zeros.  Kernel will enforce that to guarantee we can safely extend
> info structures in the future.
> 
> Making the bpf_obj_get_info_by_fd() call in libbpf perform the zeroing
> is problematic, however, since some members of the info structures
> may need to be initialized by the callers (for instance pointers
> to buffers to which kernel is to dump translated and jited images).
> 
> Remove the zeroing and fix up the in-tree callers before any kernel
> has been released with this code.
> 
> As Daniel points out this seems to be the intended operation anyway,
> since commit 95b9afd3987f ("bpf: Test for bpf ID") is itself setting
> the buffer pointers before calling bpf_obj_get_info_by_fd().
> 
> Signed-off-by: Jakub Kicinski 

Applied, thanks Jakub.

Re: [PATCH] netpoll: Fix device name check in netpoll_setup()

2017-07-26 Thread David Miller

From: Matthias Kaehlcke 
Date: Tue, 25 Jul 2017 11:36:25 -0700

> Apparently netpoll_setup() assumes that netpoll.dev_name is a pointer
> when checking if the device name is set:
> 
> if (np->dev_name) {
>   ...
> 
> However the field is a character array, therefore the condition always
> yields true. Check instead whether the first byte of the array has a
> non-zero value.
> 
> Signed-off-by: Matthias Kaehlcke 

Applied, thanks a lot.

Re: [net-next 00/14][pull request] 40GbE Intel Wired LAN Driver Updates 2017-07-25

2017-07-26 Thread David Miller

From: Jeff Kirsher 
Date: Wed, 26 Jul 2017 03:31:09 -0700

> This series contains updates to i40e and i40evf only.
> 
> Gustavo Silva fixes a variable assignment, where the incorrect variable
> was being used to store the error parameter.
> 
> Carolyn provides a fix for a problem found in systems when entering S4
> state, by ensuring that the misc vector's IRQ is disabled as well.
> 
> Jake removes the single-threaded restriction on the module workqueue,
> which was causing issues with events such as CORER.  Does some future
> proofing, by changing how the driver displays the UDP tunnel type.
> 
> Paul adds a retry in releasing resources if the admin queue times out
> during the first attempt to release the resources.
> 
> Jesse fixes up references to 32bit timspec, since there are a small set
> of errors on 32 bit, so we need to be using the right calls for dealing
> with timespec64 variables.  Cleaned up code indentation and corrected
> an "if" conditional check, as well as making the code flow more clear.
> Cast or changed the types to remove warnings for comparing signed and
> unsigned types.  Adds missing includes in i40evf, which were being used
> but were not being directly included.
> 
> Daniel Borkmann fixes i40e to fill the XDP prog_id with the id just like
> other XDP enabled drivers, so that on dump we can retrieve the attached
> program based on the id and dump BPF insns, opcodes, etc back to user
> space.
> 
> Tushar Dave adds le32_to_cpu while evaluating the hardware descriptor
> fields, since they are in little-endian format.  Also removed
> unnecessary "__packed" to a couple of i40evf structures.
> 
> Stefan Assmann fixes an issue when an administratively set MAC was set
> and should now be switched back to 00:00:00:00:00:00, the pf_set_mac
> flag is not being toggled back to false.
 ...
>   git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue 40GbE

Also pulled, thanks Jeff.

[PATCH net-next v2 00/10] netvsc fixes and new features

2017-07-26 Thread Stephen Hemminger

This series has updates for Hyper-V network driver.
The first four patches are just minor fixes.

The next group of patches aims to reduce the driver memory footprint
by reducing the size of the per-device receive and send buffers, and
the per-channel receive completion queue.

The biggest change is in the last three patches that implement transparent
management of SR-IOV VF device. A little more explaination is required.

What does it do?

With the new netvsc VF logic, the Linux hyper-V network
virtual driver will directly manage the link to SR-IOV VF device.
When VF device is detected (hot plug) it is automatically made a
slave device of the netvsc device. The VF device state reflects
the state of the netvsc device; i.e. if netvsc is set down, then
VF is set down. If netvsc is set up, then VF is brought up.
 
Packet flow is independent of VF status; all packets are sent and
received as if they were associated with the netvsc device. If VF is
removed or link is down then the synthetic VMBUS path is used.
 
What was wrong with using bonding script?

A lot of work went into getting the bonding script to work on all
distributions, but it was a major struggle. Linux network devices
can be configured many, many ways and there is no one solution from
userspace to make it all work. What is really hard is when
configuration is attached to synthetic device during boot (eth0) and
then the same addresses and firewall rules needs to also work later if
doing bonding. The new code gets around all of this.
 
How does VF work during initialization?

Since all packets are sent and received through the logical netvsc
device, initialization is much easier. Just configure the regular
netvsc Ethernet device; when/if SR-IOV is enabled it just
works. Provisioning and cloud init only need to worry about setting up
netvsc device (eth0). If SR-IOV is enabled (even as a later step), the
address and rules stay the same.
 
What devices show up?

Both netvsc and PCI devices are visible in the system. The netvsc
device is active and named in usual manner (eth0). The PCI device is
visible to Linux and gets renamed by udev to a persistent name
(enP2p3s0). The PCI device name is now irrelevant now.

The logica also sets the PCI VF device SLAVE flag on the network
device so network tools can see the relationship if they are smart
enough to understand how layered devices work.
 
This is a lot like how I see Windows working.
The VF device is visible in Device Manager, but is not configured.
 
Is there any performance impact?
There is no visible change in performance. The bonding
and netvsc driver both have equivalent steps.
 
Is it compatible with old bonding script?

It turns out that if you use the old bonding script, then everything
still works but in a sub-optimum manner. What happens is that bonding
is unable to steal the VF from the netvsc device so it creates a one
legged bond.  Packet flow then is:
bond0 <--> eth0 <- -> VF (enP2p3s0).
In other words, if you get it wrong it still works, just
awkward and slower.
 
What if I add address or firewall rule onto the VF?

Same problems occur with now as already occur with bonding, bridging,
teaming on Linux if user incorrectly does configuration onto
an underlying slave device. It will sort of work, packets will come in
and out but the Linux kernel gets confused and things like ARP don’t
work right.  There is no way to block manipulation of the slave
device, and I am sure someone will find some special use case where
they want it.

Stephen Hemminger (10):
  netvsc: fix return value for set_channels
  netvsc: fix warnings reported by lockdep
  netvsc: don't print pointer value in error message
  netvsc: remove unnecessary indirection of page_buffer
  netvsc: optimize receive completions
  netvsc: signal host if receive ring is emptied
  netvsc: allow smaller send/recv buffer size
  netvsc: transparent VF management
  netvsc: add documentation
  netvsc: remove bonding setup script

 Documentation/networking/netvsc.txt |  50 
 MAINTAINERS |   1 +
 drivers/net/hyperv/hyperv_net.h |  33 ++-
 drivers/net/hyperv/netvsc.c | 317 +++-
 drivers/net/hyperv/netvsc_drv.c | 476 
 drivers/net/hyperv/rndis_filter.c   | 108 
 tools/hv/bondvf.sh  | 255 ---
 7 files changed, 642 insertions(+), 598 deletions(-)
 create mode 100644 Documentation/networking/netvsc.txt
 delete mode 100755 tools/hv/bondvf.sh

-- 
2.11.0

[PATCH net-next v2 09/10] netvsc: add documentation

2017-07-26 Thread Stephen Hemminger

Add some background documentation on netvsc device options
and limitations.

Signed-off-by: Stephen Hemminger 
---
 Documentation/networking/netvsc.txt | 50 +
 MAINTAINERS |  1 +
 2 files changed, 51 insertions(+)
 create mode 100644 Documentation/networking/netvsc.txt

diff --git a/Documentation/networking/netvsc.txt 
b/Documentation/networking/netvsc.txt
new file mode 100644
index ..86f72dced4da
--- /dev/null
+++ b/Documentation/networking/netvsc.txt
@@ -0,0 +1,50 @@
+Hyper-V network driver
+==
+
+Compatibility
+=
+
+This driver is compatible and tested on Windows Server 2012 R2, 2016 and
+Windows 10.
+
+Features
+
+
+  Checksum offload
+  
+  The netvsc driver supports checksum offload as long as the
+  Hyper-V host version does. Windows Server 2016 and Azure
+  support checksum offload for TCP and UDP for both IPv4 and
+  IPv6. Windows Server 2012 only supports checksum offload for TCP.
+
+  Receive Side Scaling
+  
+  Hyper-V supports receive side scaling. For TCP, packets are
+  distributed among available queues based on IP address and port
+  number. Current versions of Hyper-V host, only distribute UDP
+  packets based on the IP source and destination address.
+  The port number is not used as part of the hash value for UDP.
+  Fragmented IP packets are not distributed between queues;
+  all fragmented packets arrive on the first channel.
+
+  Generic Receive Offload, aka GRO
+  
+  The driver supports GRO and it is enabled by default. GRO coalesces
+  like packets and significantly reduces CPU usage under heavy Rx
+  load.
+
+  SR-IOV support
+  --
+  Hyper-V supports SR-IOV as a hardware acceleration option. If SR-IOV
+  is enabled in both the vSwitch and the guest configuration, then the
+  Virtual Function (VF) device is passed to the guest as a PCI
+  device. In this case, both a synthetic (netvsc) and VF device are
+  visible in the guest OS and both NIC's have the same MAC address.
+
+  The VF is enslaved by netvsc device.  The netvsc driver will transparently
+  switch the data path to the VF when it is available and up.
+  Network state (addresses, firewall, etc) should be applied only to the
+  netvsc device; the slave device should not be accessed directly in
+  most cases.  The exceptions are if some special queue discipline or
+  flow direction is desired, these should be applied directly to the
+  VF slave device.
diff --git a/MAINTAINERS b/MAINTAINERS
index 297e610c9163..d30c17df1deb 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6294,6 +6294,7 @@ M:Haiyang Zhang 
 M: Stephen Hemminger 
 L: de...@linuxdriverproject.org
 S: Maintained
+F: Documentation/networking/netvsc.txt
 F: arch/x86/include/asm/mshyperv.h
 F: arch/x86/include/uapi/asm/hyperv.h
 F: arch/x86/kernel/cpu/mshyperv.c
-- 
2.11.0

[PATCH net-next v2 10/10] netvsc: remove bonding setup script

2017-07-26 Thread Stephen Hemminger

No longer needed, now all managed by transparent VF logic.

Signed-off-by: Stephen Hemminger 
---
 tools/hv/bondvf.sh | 255 -
 1 file changed, 255 deletions(-)
 delete mode 100755 tools/hv/bondvf.sh

diff --git a/tools/hv/bondvf.sh b/tools/hv/bondvf.sh
deleted file mode 100755
index 80f102860cf8..
--- a/tools/hv/bondvf.sh
+++ /dev/null
@@ -1,255 +0,0 @@
-#!/bin/bash
-
-# This example script creates bonding network devices based on synthetic NIC
-# (the virtual network adapter usually provided by Hyper-V) and the matching
-# VF NIC (SRIOV virtual function). So the synthetic NIC and VF NIC can
-# function as one network device, and fail over to the synthetic NIC if VF is
-# down.
-#
-# Usage:
-# - After configured vSwitch and vNIC with SRIOV, start Linux virtual
-#   machine (VM)
-# - Run this scripts on the VM. It will create configuration files in
-#   distro specific directory.
-# - Reboot the VM, so that the bonding config are enabled.
-#
-# The config files are DHCP by default. You may edit them if you need to change
-# to Static IP or change other settings.
-#
-
-sysdir=/sys/class/net
-netvsc_cls={f8615163-df3e-46c5-913f-f2d2f965ed0e}
-bondcnt=0
-
-# Detect Distro
-if [ -f /etc/redhat-release ];
-then
-   cfgdir=/etc/sysconfig/network-scripts
-   distro=redhat
-elif grep -q 'Ubuntu' /etc/issue
-then
-   cfgdir=/etc/network
-   distro=ubuntu
-elif grep -q 'SUSE' /etc/issue
-then
-   cfgdir=/etc/sysconfig/network
-   distro=suse
-else
-   echo "Unsupported Distro"
-   exit 1
-fi
-
-echo Detected Distro: $distro, or compatible
-
-# Get a list of ethernet names
-list_eth=(`cd $sysdir && ls -d */ | cut -d/ -f1 | grep -v bond`)
-eth_cnt=${#list_eth[@]}
-
-echo List of net devices:
-
-# Get the MAC addresses
-for (( i=0; i < $eth_cnt; i++ ))
-do
-   list_mac[$i]=`cat $sysdir/${list_eth[$i]}/address`
-   echo ${list_eth[$i]}, ${list_mac[$i]}
-done
-
-# Find NIC with matching MAC
-for (( i=0; i < $eth_cnt-1; i++ ))
-do
-   for (( j=i+1; j < $eth_cnt; j++ ))
-   do
-   if [ "${list_mac[$i]}" = "${list_mac[$j]}" ]
-   then
-   list_match[$i]=${list_eth[$j]}
-   break
-   fi
-   done
-done
-
-function create_eth_cfg_redhat {
-   local fn=$cfgdir/ifcfg-$1
-
-   rm -f $fn
-   echo DEVICE=$1 >>$fn
-   echo TYPE=Ethernet >>$fn
-   echo BOOTPROTO=none >>$fn
-   echo UUID=`uuidgen` >>$fn
-   echo ONBOOT=yes >>$fn
-   echo PEERDNS=yes >>$fn
-   echo IPV6INIT=yes >>$fn
-   echo MASTER=$2 >>$fn
-   echo SLAVE=yes >>$fn
-}
-
-function create_eth_cfg_pri_redhat {
-   create_eth_cfg_redhat $1 $2
-}
-
-function create_bond_cfg_redhat {
-   local fn=$cfgdir/ifcfg-$1
-
-   rm -f $fn
-   echo DEVICE=$1 >>$fn
-   echo TYPE=Bond >>$fn
-   echo BOOTPROTO=dhcp >>$fn
-   echo UUID=`uuidgen` >>$fn
-   echo ONBOOT=yes >>$fn
-   echo PEERDNS=yes >>$fn
-   echo IPV6INIT=yes >>$fn
-   echo BONDING_MASTER=yes >>$fn
-   echo BONDING_OPTS=\"mode=active-backup miimon=100 primary=$2\" >>$fn
-}
-
-function del_eth_cfg_ubuntu {
-   local mainfn=$cfgdir/interfaces
-   local fnlist=( $mainfn )
-
-   local dirlist=(`awk '/^[ \t]*source/{print $2}' $mainfn`)
-
-   local i
-   for i in "${dirlist[@]}"
-   do
-   fnlist+=(`ls $i 2>/dev/null`)
-   done
-
-   local tmpfl=$(mktemp)
-
-   local nic_start='^[ \t]*(auto|iface|mapping|allow-.*)[ \t]+'$1
-   local nic_end='^[ \t]*(auto|iface|mapping|allow-.*|source)'
-
-   local fn
-   for fn in "${fnlist[@]}"
-   do
-   awk "/$nic_end/{x=0} x{next} /$nic_start/{x=1;next} 1" \
-   $fn >$tmpfl
-
-   cp $tmpfl $fn
-   done
-
-   rm $tmpfl
-}
-
-function create_eth_cfg_ubuntu {
-   local fn=$cfgdir/interfaces
-
-   del_eth_cfg_ubuntu $1
-   echo $'\n'auto $1 >>$fn
-   echo iface $1 inet manual >>$fn
-   echo bond-master $2 >>$fn
-}
-
-function create_eth_cfg_pri_ubuntu {
-   local fn=$cfgdir/interfaces
-
-   del_eth_cfg_ubuntu $1
-   echo $'\n'allow-hotplug $1 >>$fn
-   echo iface $1 inet manual >>$fn
-   echo bond-master $2 >>$fn
-   echo bond-primary $1 >>$fn
-}
-
-function create_bond_cfg_ubuntu {
-   local fn=$cfgdir/interfaces
-
-   del_eth_cfg_ubuntu $1
-
-   echo $'\n'auto $1 >>$fn
-   echo iface $1 inet dhcp >>$fn
-   echo bond-mode active-backup >>$fn
-   echo bond-miimon 100 >>$fn
-   echo bond-slaves none >>$fn
-}
-
-function create_eth_cfg_suse {
-local fn=$cfgdir/ifcfg-$1
-
-rm -f $fn
-   echo BOOTPROTO=none >>$fn
-   echo STARTMODE=auto >>$fn
-}
-
-function create_eth_cfg_pri_suse {
-   local fn=$cfgdir/ifcfg-$1
-
-   rm -f $fn
-   echo BOOTPROTO=none >>$fn
-   echo

[PATCH net-next v2 02/10] netvsc: fix warnings reported by lockdep

2017-07-26 Thread Stephen Hemminger

This includes a bunch of fixups for issues reported by
lockdep.
   * ethtool routines can assume RTNL
   * send is done with RCU lock (and BH disable)
   * avoid refetching internal device struct (netvsc)
 instead pass it as a parameter.

Signed-off-by: Stephen Hemminger 
---
 drivers/net/hyperv/hyperv_net.h   |  3 +-
 drivers/net/hyperv/netvsc.c   |  2 +-
 drivers/net/hyperv/netvsc_drv.c   | 15 ---
 drivers/net/hyperv/rndis_filter.c | 84 +++
 4 files changed, 54 insertions(+), 50 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index 4e7ff348327e..fb62ea632914 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -217,7 +217,8 @@ int rndis_filter_receive(struct net_device *ndev,
 struct vmbus_channel *channel,
 void *data, u32 buflen);
 
-int rndis_filter_set_device_mac(struct net_device *ndev, char *mac);
+int rndis_filter_set_device_mac(struct netvsc_device *ndev,
+   const char *mac);
 
 void netvsc_switch_datapath(struct net_device *nv_dev, bool vf);
 
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 06f39a99da7c..94c00acac58a 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -832,7 +832,7 @@ int netvsc_send(struct net_device_context *ndev_ctx,
struct sk_buff *skb)
 {
struct netvsc_device *net_device
-   = rcu_dereference_rtnl(ndev_ctx->nvdev);
+   = rcu_dereference_bh(ndev_ctx->nvdev);
struct hv_device *device = ndev_ctx->device_ctx;
int ret = 0;
struct netvsc_channel *nvchan;
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index f1eaf675d2e9..a04f2efbbc25 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -923,6 +923,8 @@ static void netvsc_get_stats64(struct net_device *net,
 
 static int netvsc_set_mac_addr(struct net_device *ndev, void *p)
 {
+   struct net_device_context *ndc = netdev_priv(ndev);
+   struct netvsc_device *nvdev = rtnl_dereference(ndc->nvdev);
struct sockaddr *addr = p;
char save_adr[ETH_ALEN];
unsigned char save_aatype;
@@ -935,7 +937,10 @@ static int netvsc_set_mac_addr(struct net_device *ndev, 
void *p)
if (err != 0)
return err;
 
-   err = rndis_filter_set_device_mac(ndev, addr->sa_data);
+   if (!nvdev)
+   return -ENODEV;
+
+   err = rndis_filter_set_device_mac(nvdev, addr->sa_data);
if (err != 0) {
/* roll back to saved MAC */
memcpy(ndev->dev_addr, save_adr, ETH_ALEN);
@@ -981,7 +986,7 @@ static void netvsc_get_ethtool_stats(struct net_device *dev,
 struct ethtool_stats *stats, u64 *data)
 {
struct net_device_context *ndc = netdev_priv(dev);
-   struct netvsc_device *nvdev = rcu_dereference(ndc->nvdev);
+   struct netvsc_device *nvdev = rtnl_dereference(ndc->nvdev);
const void *nds = >eth_stats;
const struct netvsc_stats *qstats;
unsigned int start;
@@ -1019,7 +1024,7 @@ static void netvsc_get_ethtool_stats(struct net_device 
*dev,
 static void netvsc_get_strings(struct net_device *dev, u32 stringset, u8 *data)
 {
struct net_device_context *ndc = netdev_priv(dev);
-   struct netvsc_device *nvdev = rcu_dereference(ndc->nvdev);
+   struct netvsc_device *nvdev = rtnl_dereference(ndc->nvdev);
u8 *p = data;
int i;
 
@@ -1077,7 +1082,7 @@ netvsc_get_rxnfc(struct net_device *dev, struct 
ethtool_rxnfc *info,
 u32 *rules)
 {
struct net_device_context *ndc = netdev_priv(dev);
-   struct netvsc_device *nvdev = rcu_dereference(ndc->nvdev);
+   struct netvsc_device *nvdev = rtnl_dereference(ndc->nvdev);
 
if (!nvdev)
return -ENODEV;
@@ -1127,7 +1132,7 @@ static int netvsc_get_rxfh(struct net_device *dev, u32 
*indir, u8 *key,
   u8 *hfunc)
 {
struct net_device_context *ndc = netdev_priv(dev);
-   struct netvsc_device *ndev = rcu_dereference(ndc->nvdev);
+   struct netvsc_device *ndev = rtnl_dereference(ndc->nvdev);
struct rndis_device *rndis_dev;
int i;
 
diff --git a/drivers/net/hyperv/rndis_filter.c 
b/drivers/net/hyperv/rndis_filter.c
index eaa3f0d5682a..bf21ea92c743 100644
--- a/drivers/net/hyperv/rndis_filter.c
+++ b/drivers/net/hyperv/rndis_filter.c
@@ -85,14 +85,6 @@ static struct rndis_device *get_rndis_device(void)
return device;
 }
 
-static struct netvsc_device *
-net_device_to_netvsc_device(struct net_device *ndev)
-{
-   struct net_device_context *net_device_ctx = netdev_priv(ndev);
-
-   return rtnl_dereference(net_device_ctx->nvdev);
-}
-
 static struct rndis_request *get_rndis_request(struct rndis_device

[PATCH net-next v2 04/10] netvsc: remove unnecessary indirection of page_buffer

2017-07-26 Thread Stephen Hemminger

The internal API was passing struct hv_page_buffer **
when only simple struct hv_page_buffer * was necessary
for passing an array.

Signed-off-by: Stephen Hemminger 
---
 drivers/net/hyperv/hyperv_net.h   |  2 +-
 drivers/net/hyperv/netvsc.c   | 21 ++---
 drivers/net/hyperv/netvsc_drv.c   | 10 --
 drivers/net/hyperv/rndis_filter.c |  4 ++--
 4 files changed, 17 insertions(+), 20 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index fb62ea632914..9ca3ed692d73 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -190,7 +190,7 @@ void netvsc_device_remove(struct hv_device *device);
 int netvsc_send(struct net_device_context *ndc,
struct hv_netvsc_packet *packet,
struct rndis_message *rndis_msg,
-   struct hv_page_buffer **page_buffer,
+   struct hv_page_buffer *page_buffer,
struct sk_buff *skb);
 void netvsc_linkstatus_callback(struct hv_device *device_obj,
struct rndis_message *resp);
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index f0c15e782ce0..d3c0b19f6d34 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -697,7 +697,7 @@ static u32 netvsc_copy_to_send_buf(struct netvsc_device 
*net_device,
   u32 pend_size,
   struct hv_netvsc_packet *packet,
   struct rndis_message *rndis_msg,
-  struct hv_page_buffer **pb,
+  struct hv_page_buffer *pb,
   struct sk_buff *skb)
 {
char *start = net_device->send_buf;
@@ -718,9 +718,9 @@ static u32 netvsc_copy_to_send_buf(struct netvsc_device 
*net_device,
}
 
for (i = 0; i < page_count; i++) {
-   char *src = phys_to_virt((*pb)[i].pfn << PAGE_SHIFT);
-   u32 offset = (*pb)[i].offset;
-   u32 len = (*pb)[i].len;
+   char *src = phys_to_virt(pb[i].pfn << PAGE_SHIFT);
+   u32 offset = pb[i].offset;
+   u32 len = pb[i].len;
 
memcpy(dest, (src + offset), len);
msg_size += len;
@@ -739,7 +739,7 @@ static inline int netvsc_send_pkt(
struct hv_device *device,
struct hv_netvsc_packet *packet,
struct netvsc_device *net_device,
-   struct hv_page_buffer **pb,
+   struct hv_page_buffer *pb,
struct sk_buff *skb)
 {
struct nvsp_message nvmsg;
@@ -750,7 +750,6 @@ static inline int netvsc_send_pkt(
struct netdev_queue *txq = netdev_get_tx_queue(ndev, packet->q_idx);
u64 req_id;
int ret;
-   struct hv_page_buffer *pgbuf;
u32 ring_avail = hv_ringbuf_avail_percent(_channel->outbound);
 
nvmsg.hdr.msg_type = NVSP_MSG1_TYPE_SEND_RNDIS_PKT;
@@ -776,11 +775,11 @@ static inline int netvsc_send_pkt(
return -ENODEV;
 
if (packet->page_buf_cnt) {
-   pgbuf = packet->cp_partial ? (*pb) +
-   packet->rmsg_pgcnt : (*pb);
+   if (packet->cp_partial)
+   pb += packet->rmsg_pgcnt;
+
ret = vmbus_sendpacket_pagebuffer_ctl(out_channel,
- pgbuf,
- packet->page_buf_cnt,
+ pb, packet->page_buf_cnt,
  ,
  sizeof(struct 
nvsp_message),
  req_id,
@@ -830,7 +829,7 @@ static inline void move_pkt_msd(struct hv_netvsc_packet 
**msd_send,
 int netvsc_send(struct net_device_context *ndev_ctx,
struct hv_netvsc_packet *packet,
struct rndis_message *rndis_msg,
-   struct hv_page_buffer **pb,
+   struct hv_page_buffer *pb,
struct sk_buff *skb)
 {
struct netvsc_device *net_device
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index a04f2efbbc25..8ff4cbf582cc 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -282,9 +282,8 @@ static u32 fill_pg_buf(struct page *page, u32 offset, u32 
len,
 
 static u32 init_page_array(void *hdr, u32 len, struct sk_buff *skb,
   struct hv_netvsc_packet *packet,
-  struct hv_page_buffer **page_buf)
+  struct hv_page_buffer *pb)
 {
-   struct hv_page_buffer *pb = *page_buf;
u32 slots_used = 0;
char *data = skb->data;
int frags = skb_shinfo(skb)->nr_frags;
@@ -359,8 +358,7 @@ static int netvsc_start_xmit(struct sk_buff *skb, struct 
net_device *net)

[PATCH net-next v2 07/10] netvsc: allow smaller send/recv buffer size

2017-07-26 Thread Stephen Hemminger

The default value of send and receive buffer area for host DMA
is much larger than it needs to be. Experimentation shows that
4M receive and 1M send is sufficient.

Make the size a module parameter so that it can be adjusted
as needed for testing or special needs.

Signed-off-by: Stephen Hemminger 
---
 drivers/net/hyperv/hyperv_net.h |  2 ++
 drivers/net/hyperv/netvsc.c | 18 --
 drivers/net/hyperv/netvsc_drv.c | 31 ++-
 3 files changed, 44 insertions(+), 7 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index f2cef5aaed1f..4779422868cf 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -148,6 +148,8 @@ struct netvsc_device_info {
unsigned char mac_adr[ETH_ALEN];
int  ring_size;
u32  num_chn;
+   u32  recv_buf_size;
+   u32  send_buf_size;
 };
 
 enum rndis_device_state {
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index c5e6f7fc4f2b..dd64227c736c 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -482,14 +482,16 @@ static int negotiate_nvsp_ver(struct hv_device *device,
 }
 
 static int netvsc_connect_vsp(struct hv_device *device,
- struct netvsc_device *net_device)
+ struct netvsc_device *net_device,
+ const struct netvsc_device_info *device_info)
 {
const u32 ver_list[] = {
NVSP_PROTOCOL_VERSION_1, NVSP_PROTOCOL_VERSION_2,
NVSP_PROTOCOL_VERSION_4, NVSP_PROTOCOL_VERSION_5
};
struct nvsp_message *init_packet;
-   int ndis_version, i, ret;
+   u32 max_recv_buf_size, ndis_version;
+   int i, ret;
 
init_packet = _device->channel_init_pkt;
 
@@ -534,10 +536,14 @@ static int netvsc_connect_vsp(struct hv_device *device,
 
/* Post the big receive buffer to NetVSP */
if (net_device->nvsp_version <= NVSP_PROTOCOL_VERSION_2)
-   net_device->recv_buf_size = NETVSC_RECEIVE_BUFFER_SIZE_LEGACY;
+   max_recv_buf_size = NETVSC_RECEIVE_BUFFER_SIZE_LEGACY;
else
-   net_device->recv_buf_size = NETVSC_RECEIVE_BUFFER_SIZE;
-   net_device->send_buf_size = NETVSC_SEND_BUFFER_SIZE;
+   max_recv_buf_size = NETVSC_RECEIVE_BUFFER_SIZE;
+
+   net_device->recv_buf_size = min(max_recv_buf_size,
+   device_info->recv_buf_size);
+   net_device->send_buf_size = min_t(u32, NETVSC_SEND_BUFFER_SIZE,
+ device_info->send_buf_size);
 
ret = netvsc_init_buf(device, net_device);
 
@@ -1302,7 +1308,7 @@ struct netvsc_device *netvsc_device_add(struct hv_device 
*device,
rcu_assign_pointer(net_device_ctx->nvdev, net_device);
 
/* Connect with the NetVsp */
-   ret = netvsc_connect_vsp(device, net_device);
+   ret = netvsc_connect_vsp(device, net_device, device_info);
if (ret != 0) {
netdev_err(ndev,
"unable to connect to NetVSP - %d\n", ret);
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 8ff4cbf582cc..b662c0198807 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -44,13 +44,23 @@
 
 #include "hyperv_net.h"
 
-#define RING_SIZE_MIN 64
+#define RING_SIZE_MIN   64
+#define RECV_BUFFER_MIN 16
+#define SEND_BUFFER_MIN 4
 #define LINKCHANGE_INT (2 * HZ)
 
 static int ring_size = 128;
 module_param(ring_size, int, S_IRUGO);
 MODULE_PARM_DESC(ring_size, "Ring buffer size (# of pages)");
 
+static unsigned int recv_buffer_size = (4 * 1024 * 1024) / PAGE_SIZE;
+module_param(recv_buffer_size, uint, S_IRUGO);
+MODULE_PARM_DESC(recv_buffer_size, "Receive buffer size (# of pages)");
+
+static unsigned int send_buffer_size = (1024 * 1024) / PAGE_SIZE;
+module_param(send_buffer_size, uint, S_IRUGO);
+MODULE_PARM_DESC(send_buffer_size, "Send buffer size (# of pages)");
+
 static const u32 default_msg = NETIF_MSG_DRV | NETIF_MSG_PROBE |
NETIF_MSG_LINK | NETIF_MSG_IFUP |
NETIF_MSG_IFDOWN | NETIF_MSG_RX_ERR |
@@ -751,6 +761,8 @@ static int netvsc_set_channels(struct net_device *net,
memset(_info, 0, sizeof(device_info));
device_info.num_chn = count;
device_info.ring_size = ring_size;
+   device_info.send_buf_size = nvdev->send_buf_size;
+   device_info.recv_buf_size = nvdev->recv_buf_size;
 
nvdev = rndis_filter_device_add(dev, _info);
if (!IS_ERR(nvdev)) {
@@ -848,6 +860,8 @@ static int netvsc_change_mtu(struct net_device *ndev, int 
mtu)
memset(_info, 0, sizeof(device_info));
device_info.ring_size = ring_size;
device_info.num_chn = nvdev->num_chn;
+   device_info.send_buf_size = nvdev->send_buf_size;
+

[PATCH net-next v2 06/10] netvsc: signal host if receive ring is emptied

2017-07-26 Thread Stephen Hemminger

Latency improvement related to NAPI conversion.
If all packets are processed from receive ring then need
to signal host.

Signed-off-by: Stephen Hemminger 
---
 drivers/net/hyperv/netvsc.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 4c709b454d34..c5e6f7fc4f2b 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -1195,10 +1195,15 @@ int netvsc_poll(struct napi_struct *napi, int budget)
nvchan->desc = hv_pkt_iter_next(channel, nvchan->desc);
}
 
-   /* If send of  pending receive completions suceeded
-*   and did not exhaust NAPI budget
+   /* if ring is empty, signal host */
+   if (!nvchan->desc)
+   hv_pkt_iter_close(channel);
+
+   /* If send of pending receive completions suceeded
+*   and did not exhaust NAPI budget this time
 *   and not doing busy poll
-* then reschedule if more data has arrived from host
+* then re-enable host interrupts
+* and reschedule if ring is not empty.
 */
if (send_recv_completions(nvchan) == 0 &&
work_done < budget &&
-- 
2.11.0

[PATCH net-next v2 05/10] netvsc: optimize receive completions

2017-07-26 Thread Stephen Hemminger

Optimize how receive completion ring are managed.
   * Allocate only as many slots as needed for all buffers from host
   * Allocate before setting up sub channel for better error detection
   * Don't need to keep copy of initial receive section message
   * Precompute the watermark for when receive flushing is needed
   * Replace division with conditional test
   * Replace atomic per-device variable with per-channel check.
   * Handle corner case where receive completion send
 fails if ring buffer to host is full.

Signed-off-by: Stephen Hemminger 
---
 drivers/net/hyperv/hyperv_net.h   |  14 +-
 drivers/net/hyperv/netvsc.c   | 267 --
 drivers/net/hyperv/rndis_filter.c |  20 +--
 3 files changed, 126 insertions(+), 175 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index 9ca3ed692d73..f2cef5aaed1f 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -186,6 +186,7 @@ struct net_device_context;
 
 struct netvsc_device *netvsc_device_add(struct hv_device *device,
const struct netvsc_device_info *info);
+int netvsc_alloc_recv_comp_ring(struct netvsc_device *net_device, u32 q_idx);
 void netvsc_device_remove(struct hv_device *device);
 int netvsc_send(struct net_device_context *ndc,
struct hv_netvsc_packet *packet,
@@ -657,13 +658,10 @@ struct recv_comp_data {
u32 status;
 };
 
-/* Netvsc Receive Slots Max */
-#define NETVSC_RECVSLOT_MAX (NETVSC_RECEIVE_BUFFER_SIZE / ETH_DATA_LEN + 1)
-
 struct multi_recv_comp {
-   void *buf; /* queued receive completions */
-   u32 first; /* first data entry */
-   u32 next; /* next entry for writing */
+   struct recv_comp_data *slots;
+   u32 first;  /* first data entry */
+   u32 next;   /* next entry for writing */
 };
 
 struct netvsc_stats {
@@ -750,7 +748,7 @@ struct netvsc_device {
u32 recv_buf_size;
u32 recv_buf_gpadl_handle;
u32 recv_section_cnt;
-   struct nvsp_1_receive_buffer_section *recv_section;
+   u32 recv_completion_cnt;
 
/* Send buffer allocated by us */
void *send_buf;
@@ -778,8 +776,6 @@ struct netvsc_device {
u32 max_pkt; /* max number of pkt in one send, e.g. 8 */
u32 pkt_align; /* alignment bytes, e.g. 8 */
 
-   atomic_t num_outstanding_recvs;
-
atomic_t open_cnt;
 
struct netvsc_channel chan_table[VRSS_CHANNEL_MAX];
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index d3c0b19f6d34..4c709b454d34 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -72,9 +72,6 @@ static struct netvsc_device *alloc_net_device(void)
if (!net_device)
return NULL;
 
-   net_device->chan_table[0].mrc.buf
-   = vzalloc(NETVSC_RECVSLOT_MAX * sizeof(struct recv_comp_data));
-
init_waitqueue_head(_device->wait_drain);
net_device->destroy = false;
atomic_set(_device->open_cnt, 0);
@@ -92,7 +89,7 @@ static void free_netvsc_device(struct rcu_head *head)
int i;
 
for (i = 0; i < VRSS_CHANNEL_MAX; i++)
-   vfree(nvdev->chan_table[i].mrc.buf);
+   vfree(nvdev->chan_table[i].mrc.slots);
 
kfree(nvdev);
 }
@@ -171,12 +168,6 @@ static void netvsc_destroy_buf(struct hv_device *device)
net_device->recv_buf = NULL;
}
 
-   if (net_device->recv_section) {
-   net_device->recv_section_cnt = 0;
-   kfree(net_device->recv_section);
-   net_device->recv_section = NULL;
-   }
-
/* Deal with the send buffer we may have setup.
 * If we got a  send section size, it means we received a
 * NVSP_MSG1_TYPE_SEND_SEND_BUF_COMPLETE msg (ie sent
@@ -239,11 +230,26 @@ static void netvsc_destroy_buf(struct hv_device *device)
kfree(net_device->send_section_map);
 }
 
+int netvsc_alloc_recv_comp_ring(struct netvsc_device *net_device, u32 q_idx)
+{
+   struct netvsc_channel *nvchan = _device->chan_table[q_idx];
+   int node = cpu_to_node(nvchan->channel->target_cpu);
+   size_t size;
+
+   size = net_device->recv_completion_cnt * sizeof(struct recv_comp_data);
+   nvchan->mrc.slots = vzalloc_node(size, node);
+   if (!nvchan->mrc.slots)
+   nvchan->mrc.slots = vzalloc(size);
+
+   return nvchan->mrc.slots ? 0 : -ENOMEM;
+}
+
 static int netvsc_init_buf(struct hv_device *device,
   struct netvsc_device *net_device)
 {
int ret = 0;
struct nvsp_message *init_packet;
+   struct nvsp_1_message_send_receive_buffer_complete *resp;
struct net_device *ndev;
size_t map_words;
int node;
@@ -300,43 +306,41 @@ static int netvsc_init_buf(struct hv_device *device,
wait_for_completion(_device->channel_init_wait);
 
/* Check

[PATCH net-next v2 08/10] netvsc: transparent VF management

2017-07-26 Thread Stephen Hemminger

This patch implements transparent fail over from synthetic NIC to
SR-IOV virtual function NIC in Hyper-V environment. It is a better
alternative to using bonding as is done now. Instead, the receive and
transmit fail over is done internally inside the driver.

Using bonding driver has lots of issues because it depends on the
script being run early enough in the boot process and with sufficient
information to make the association. This patch moves all that
functionality into the kernel.

Signed-off-by: Stephen Hemminger 
---
 drivers/net/hyperv/hyperv_net.h |  12 ++
 drivers/net/hyperv/netvsc_drv.c | 418 +++-
 2 files changed, 341 insertions(+), 89 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index 4779422868cf..8ff4ff0d77dd 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -682,6 +682,15 @@ struct netvsc_ethtool_stats {
unsigned long tx_busy;
 };
 
+struct netvsc_vf_pcpu_stats {
+   u64 rx_packets;
+   u64 rx_bytes;
+   u64 tx_packets;
+   u64 tx_bytes;
+   struct u64_stats_sync   syncp;
+   u32 tx_dropped;
+};
+
 struct netvsc_reconfig {
struct list_head list;
u32 event;
@@ -715,6 +724,9 @@ struct net_device_context {
 
/* State to manage the associated VF interface. */
struct net_device __rcu *vf_netdev;
+   struct netvsc_vf_pcpu_stats __percpu *vf_stats;
+   struct work_struct vf_takeover;
+   struct work_struct vf_notify;
 
/* 1: allocated, serial number is valid. 0: not allocated */
u32 vf_alloc;
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index b662c0198807..73edc3db97b2 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -34,6 +34,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -81,6 +82,7 @@ static void netvsc_set_multicast_list(struct net_device *net)
 static int netvsc_open(struct net_device *net)
 {
struct net_device_context *ndev_ctx = netdev_priv(net);
+   struct net_device *vf_netdev = rtnl_dereference(ndev_ctx->vf_netdev);
struct netvsc_device *nvdev = rtnl_dereference(ndev_ctx->nvdev);
struct rndis_device *rdev;
int ret = 0;
@@ -97,15 +99,29 @@ static int netvsc_open(struct net_device *net)
netif_tx_wake_all_queues(net);
 
rdev = nvdev->extension;
-   if (!rdev->link_state && !ndev_ctx->datapath)
+
+   if (!rdev->link_state)
netif_carrier_on(net);
 
-   return ret;
+   if (vf_netdev) {
+   /* Setting synthetic device up transparently sets
+* slave as up. If open fails, then slave will be
+* still be offline (and not used).
+*/
+   ret = dev_open(vf_netdev);
+   if (ret)
+   netdev_warn(net,
+   "unable to open slave: %s: %d\n",
+   vf_netdev->name, ret);
+   }
+   return 0;
 }
 
 static int netvsc_close(struct net_device *net)
 {
struct net_device_context *net_device_ctx = netdev_priv(net);
+   struct net_device *vf_netdev
+   = rtnl_dereference(net_device_ctx->vf_netdev);
struct netvsc_device *nvdev = rtnl_dereference(net_device_ctx->nvdev);
int ret;
u32 aread, i, msec = 10, retry = 0, retry_max = 20;
@@ -151,6 +167,9 @@ static int netvsc_close(struct net_device *net)
ret = -ETIMEDOUT;
}
 
+   if (vf_netdev)
+   dev_close(vf_netdev);
+
return ret;
 }
 
@@ -234,13 +253,11 @@ static inline int netvsc_get_tx_queue(struct net_device 
*ndev,
  *
  * TODO support XPS - but get_xps_queue not exported
  */
-static u16 netvsc_select_queue(struct net_device *ndev, struct sk_buff *skb,
-   void *accel_priv, select_queue_fallback_t fallback)
+static u16 netvsc_pick_tx(struct net_device *ndev, struct sk_buff *skb)
 {
-   unsigned int num_tx_queues = ndev->real_num_tx_queues;
int q_idx = sk_tx_queue_get(skb->sk);
 
-   if (q_idx < 0 || skb->ooo_okay) {
+   if (q_idx < 0 || skb->ooo_okay || q_idx >= ndev->real_num_tx_queues) {
/* If forwarding a packet, we use the recorded queue when
 * available for better cache locality.
 */
@@ -250,12 +267,33 @@ static u16 netvsc_select_queue(struct net_device *ndev, 
struct sk_buff *skb,
q_idx = netvsc_get_tx_queue(ndev, skb, q_idx);
}
 
-   while (unlikely(q_idx >= num_tx_queues))
-   q_idx -= num_tx_queues;
-
return q_idx;
 }
 
+static u16 netvsc_select_queue(struct net_device *ndev, struct sk_buff *skb,
+  void *accel_priv,
+  select_queue_fallback_t fallback)
+{
+

[PATCH net-next v2 03/10] netvsc: don't print pointer value in error message

2017-07-26 Thread Stephen Hemminger

Using %p to print pointer to packet meta-data doesn't give any
good info, and exposes kernel memory offsets.

Signed-off-by: Stephen Hemminger 
---
 drivers/net/hyperv/netvsc.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 94c00acac58a..f0c15e782ce0 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -805,8 +805,10 @@ static inline int netvsc_send_pkt(
ret = -ENOSPC;
}
} else {
-   netdev_err(ndev, "Unable to send packet %p ret %d\n",
-  packet, ret);
+   netdev_err(ndev,
+  "Unable to send packet pages %u len %u, ret %d\n",
+  packet->page_buf_cnt, packet->total_data_buflen,
+  ret);
}
 
return ret;
-- 
2.11.0

[PATCH net-next v2 01/10] netvsc: fix return value for set_channels

2017-07-26 Thread Stephen Hemminger

The error and normal case got swapped.

Signed-off-by: Stephen Hemminger 
---
 drivers/net/hyperv/netvsc_drv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 262486ce8e2a..f1eaf675d2e9 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -758,8 +758,8 @@ static int netvsc_set_channels(struct net_device *net,
if (!IS_ERR(nvdev)) {
netif_set_real_num_tx_queues(net, nvdev->num_chn);
netif_set_real_num_rx_queues(net, nvdev->num_chn);
-   ret = PTR_ERR(nvdev);
} else {
+   ret = PTR_ERR(nvdev);
device_info.num_chn = orig;
rndis_filter_device_add(dev, _info);
}
-- 
2.11.0

Re: [net-next v4 0/5][pull request] 10GbE Intel Wired LAN Driver Updates 2017-07-25

2017-07-26 Thread David Miller

From: Jeff Kirsher 
Date: Tue, 25 Jul 2017 16:41:33 -0700

> This series contains updates to ixgbe only.
> 
> Tony provides all of the changes in the series, starting with adding a
> check to ensure that adding a MAC filter was successful, before setting the
> MACVLAN.  In order to receive notifications of link configurations of the
> external PHY and support the configuration of the internal iXFI link on
> X552 devices, Tony enables LASI interrupts.  Update the iXFI driver code
> flow, since the MAC register NW_MNG_IF_SEL fields have been redefined for
> X553 devices, so add MAC checks for iXFI flows.  Added additional checks
> for flow control autonegotiation, since it is not support for X553 fiber
>  and XFI devices.
> 
> v2: removed unnecessary parens noticed by David Miller in patch 6 of the
> series.
> v3: dropped patch 6 of the original series, while we work out a more
> generic solution for malicious driver detection (MDD) support.
> v4: updated patch 1 of the series with the comments from Joe Perches which
> were:
>   - switched logic to return on error
>   - return 0 on success
>   - declare retval as an integer
 ...
>   git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue 10GbE

Pulled, thanks Jeff!

Re: [PATCH] net: inet: diag: expose sockets cgroup classid

2017-07-26 Thread Levin, Alexander (Sasha Levin)

On Wed, Jul 26, 2017 at 03:01:32PM -0700, Cong Wang wrote:
>On Wed, Jul 26, 2017 at 10:22 AM, Levin, Alexander (Sasha Levin)
> wrote:
>> +   if (ext & (1 << (INET_DIAG_CLASS_ID - 1))) {
>> +   u32 classid = 0;
>> +
>> +#ifdef CONFIG_SOCK_CGROUP_DATA
>> +   classid = sock_cgroup_classid(>sk_cgrp_data);
>> +#endif
>
>
>If CONFIG_SOCK_CGROUP_DATA is not enabled, should we put 0
>or put nothing (that is, skipping this nla_put())?

My logic was that if CONFIG_SOCK_CGROUP_DATA is disabled, then all sockets have 
the same default classid ==> 0, rather than having to deal with whether that 
config is enabled or not.

>> +
>> +   nla_put(skb, INET_DIAG_CLASS_ID, sizeof(classid), );
>
>nla_put_u32()

Will fix, thanks!

-- 

Thanks,
Sasha

Re: [PATCH RFC 02/13] net: phy: split out PHY speed and duplex string generation

2017-07-26 Thread Florian Fainelli

On 07/25/2017 07:02 AM, Russell King wrote:
> Other code would like to make use of this, so make the speed and duplex
> string generation visible, and place it in a separate file.
> 
> Signed-off-by: Russell King 

Reviewed-by: Florian Fainelli 
-- 
Florian

Re: [PATCH RFC 01/13] net: phy: allow settings table to support more than 32 link modes

2017-07-26 Thread Florian Fainelli

On 07/25/2017 07:02 AM, Russell King wrote:
> Allow the phy settings table to support more than 32 link modes by
> switching to the ethtool link mode bit number representation, rather
> than storing the mask.  This will allow phylink and other ethtool
> code to share the settings table to look up settings.
> 
> Signed-off-by: Russell King 

Reviewed-by: Florian Fainelli 
-- 
Florian

Re: [PATCH RFC 07/13] net: phy: add I2C mdio bus

2017-07-26 Thread Florian Fainelli

On 07/25/2017 07:03 AM, Russell King wrote:
> Add an I2C MDIO bus bridge library, to allow phylib to access PHYs which
> are connected to an I2C bus instead of the more conventional MDIO bus.
> Such PHYs can be found in SFP adapters and SFF modules.
> 
> Since PHYs appear at I2C bus address 0x40..0x5f, and 0x50/0x51 are
> reserved for SFP EEPROMs/diagnostics, we must not allow the MDIO bus
> to access these I2C addresses.
> 
> Signed-off-by: Russell King 

Reviewed-by: Florian Fainelli 
-- 
Florian

Re: [PATCH V2 net-next 20/21] net-next/hinic: Add ethtool and stats

2017-07-26 Thread Andrew Lunn

On Wed, Jul 19, 2017 at 03:36:28PM +0300, Aviad Krawczyk wrote:
> Hi Joe,
> 
> I tried to be consistent with the comments before, that requested
> that we will use dev_err exclude some special cases for use netif.
> 
> We will replace the dev_err(>dev,.. to netdev_err in the
> next fix.

netdev_err() should be used when possible. You just have to be careful
in the probe() function, before netdev exists and you get "(NULL
net_device):" or before it is registered and you get "(unnamed
net_device)" instead of "eth42" etc.

  Andrew

Re: [PATCH RFC 12/13] phylink: add in-band autonegotiation support for 10GBase-KR mode.

2017-07-26 Thread Andrew Lunn

On Tue, Jul 25, 2017 at 03:03:34PM +0100, Russell King wrote:
> Add in-band autonegotation support for 10GBase-KR mode.
> 
> Signed-off-by: Russell King 

Reviewed-by: Andrew Lunn 

Andrew

[Patch net v2] team: use a larger struct for mac address

2017-07-26 Thread Cong Wang

IPv6 tunnels use sizeof(struct in6_addr) as dev->addr_len,
but in many places especially bonding, we use struct sockaddr
to copy and set mac addr, this could lead to stack out-of-bounds
access.

Fix it by using a larger address storage like bonding.

Reported-by: Andrey Konovalov 
Cc: Jiri Pirko 
Signed-off-by: Cong Wang 
---
 drivers/net/team/team.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
index 464570409796..ae53e899259f 100644
--- a/drivers/net/team/team.c
+++ b/drivers/net/team/team.c
@@ -60,11 +60,11 @@ static struct team_port *team_port_get_rtnl(const struct 
net_device *dev)
 static int __set_port_dev_addr(struct net_device *port_dev,
   const unsigned char *dev_addr)
 {
-   struct sockaddr addr;
+   struct sockaddr_storage addr;
 
-   memcpy(addr.sa_data, dev_addr, port_dev->addr_len);
-   addr.sa_family = port_dev->type;
-   return dev_set_mac_address(port_dev, );
+   memcpy(addr.__data, dev_addr, port_dev->addr_len);
+   addr.ss_family = port_dev->type;
+   return dev_set_mac_address(port_dev, (struct sockaddr *));
 }
 
 static int team_port_set_orig_dev_addr(struct team_port *port)
-- 
2.13.0

[Patch net v2] net: check dev->addr_len for dev_set_mac_address()

2017-07-26 Thread Cong Wang

Historically, dev_ifsioc() uses struct sockaddr as mac
address definition, this is why dev_set_mac_address()
accepts a struct sockaddr pointer as input but now we
have various types of mac addresse whose lengths
are up to MAX_ADDR_LEN, longer than struct sockaddr,
and saved in dev->addr_len.

It is too late to fix dev_ifsioc() due to API
compatibility, so just reject those larger than
sizeof(struct sockaddr), otherwise we would read
and use some random bytes from kernel stack.

Fortunately, only a few IPv6 tunnel devices have addr_len
larger than sizeof(struct sockaddr) and they don't support
ndo_set_mac_addr(). But with team driver, in lb mode, they
can still be enslaved to a team master and make its mac addr
length as the same.

Cc: Jiri Pirko 
Signed-off-by: Cong Wang 
---
 net/core/dev_ioctl.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/core/dev_ioctl.c b/net/core/dev_ioctl.c
index 06b147d7d9e2..709a4e6fb447 100644
--- a/net/core/dev_ioctl.c
+++ b/net/core/dev_ioctl.c
@@ -263,6 +263,8 @@ static int dev_ifsioc(struct net *net, struct ifreq *ifr, 
unsigned int cmd)
return dev_set_mtu(dev, ifr->ifr_mtu);
 
case SIOCSIFHWADDR:
+   if (dev->addr_len > sizeof(struct sockaddr))
+   return -EINVAL;
return dev_set_mac_address(dev, >ifr_hwaddr);
 
case SIOCSIFHWBROADCAST:
-- 
2.13.0

Re: [PATCH RFC 07/13] net: phy: add I2C mdio bus

2017-07-26 Thread Andrew Lunn

On Tue, Jul 25, 2017 at 03:03:08PM +0100, Russell King wrote:
> Add an I2C MDIO bus bridge library, to allow phylib to access PHYs which
> are connected to an I2C bus instead of the more conventional MDIO bus.
> Such PHYs can be found in SFP adapters and SFF modules.
> 
> Since PHYs appear at I2C bus address 0x40..0x5f, and 0x50/0x51 are
> reserved for SFP EEPROMs/diagnostics, we must not allow the MDIO bus
> to access these I2C addresses.
> 
> Signed-off-by: Russell King 

Reviewed-by: Andrew Lunn 

Andrew

Re: [PATCH RFC 03/13] net: phy: move phy_lookup_setting() and guts of phy_supported_speeds() to phy-core

2017-07-26 Thread Andrew Lunn

On Tue, Jul 25, 2017 at 03:02:47PM +0100, Russell King wrote:
> phy_lookup_setting() provides useful functionality in ethtool code
> outside phylib.  Move it to phy-core and allow it to be re-used (eg,
> in phylink) rather than duplicated elsewhere.  Note that this supports
> the larger linkmode space.
> 
> As we move the phy settings table, we also need to move the guts of
> phy_supported_speeds() as well.
> 
> Signed-off-by: Russell King 

Reviewed-by: Andrew Lunn 

Andrew

[PATCH v3 1/3] ptp: introduce ptp auxiliary worker

2017-07-26 Thread Grygorii Strashko

Many PTP drivers required to perform some asynchronous or periodic work,
like periodically handling PHC counter overflow or handle delayed timestamp
for RX/TX network packets. In most of the cases, such work is implemented
using workqueues. Unfortunately, Kernel workqueues might introduce
significant delay in work scheduling under high system load and on -RT,
which could cause misbehavior of PTP drivers due to internal counter
overflow, for example, and there is no way to tune its execution policy and
priority manually.

Hence, The kthread_worker can be used instead of workqueues, as it creates
separate named kthread for each worker and its its execution policy and
priority can be configured using chrt tool.

This problem was reported for two drivers TI CPSW CPTS and dp83640, so
instead of modifying each of these driver it was proposed to add PTP
auxiliary worker to the PHC subsystem [1].

The patch adds PTP auxiliary worker in PHC subsystem using kthread_worker
and kthread_delayed_work and introduces two new PHC subsystem APIs:

- long (*do_aux_work)(struct ptp_clock_info *ptp) callback in
ptp_clock_info structure, which driver should assign if it require to
perform asynchronous or periodic work. Driver should return the delay of
the PTP next auxiliary work scheduling time (>=0) or negative value in case
further scheduling is not required.

- int ptp_schedule_worker(struct ptp_clock *ptp, unsigned long delay) which
allows schedule PTP auxiliary work.

The name of kthread_worker thread corresponds PTP PHC device name "ptp%d".

[1] https://www.spinics.net/lists/netdev/msg445392.html
Signed-off-by: Grygorii Strashko 
---
 drivers/ptp/ptp_clock.c  | 41 
 drivers/ptp/ptp_private.h|  3 +++
 include/linux/ptp_clock_kernel.h | 20 
 3 files changed, 64 insertions(+)

diff --git a/drivers/ptp/ptp_clock.c b/drivers/ptp/ptp_clock.c
index b774357..899433c 100644
--- a/drivers/ptp/ptp_clock.c
+++ b/drivers/ptp/ptp_clock.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "ptp_private.h"
 
@@ -184,6 +185,19 @@ static void delete_ptp_clock(struct posix_clock *pc)
kfree(ptp);
 }
 
+static void ptp_aux_kworker(struct kthread_work *work)
+{
+   struct ptp_clock *ptp = container_of(work, struct ptp_clock,
+aux_work.work);
+   struct ptp_clock_info *info = ptp->info;
+   long delay;
+
+   delay = info->do_aux_work(info);
+
+   if (delay >= 0)
+   kthread_queue_delayed_work(ptp->kworker, >aux_work, delay);
+}
+
 /* public interface */
 
 struct ptp_clock *ptp_clock_register(struct ptp_clock_info *info,
@@ -217,6 +231,19 @@ struct ptp_clock *ptp_clock_register(struct ptp_clock_info 
*info,
mutex_init(>pincfg_mux);
init_waitqueue_head(>tsev_wq);
 
+   if (ptp->info->do_aux_work) {
+   char *worker_name = kasprintf(GFP_KERNEL, "ptp%d", ptp->index);
+
+   kthread_init_delayed_work(>aux_work, ptp_aux_kworker);
+   ptp->kworker = kthread_create_worker(0, worker_name ?
+worker_name : info->name);
+   if (IS_ERR(ptp->kworker)) {
+   err = PTR_ERR(ptp->kworker);
+   pr_err("failed to create ptp aux_worker %d\n", err);
+   goto kworker_err;
+   }
+   }
+
err = ptp_populate_pin_groups(ptp);
if (err)
goto no_pin_groups;
@@ -259,6 +286,9 @@ struct ptp_clock *ptp_clock_register(struct ptp_clock_info 
*info,
 no_device:
ptp_cleanup_pin_groups(ptp);
 no_pin_groups:
+   if (ptp->kworker)
+   kthread_destroy_worker(ptp->kworker);
+kworker_err:
mutex_destroy(>tsevq_mux);
mutex_destroy(>pincfg_mux);
ida_simple_remove(_clocks_map, index);
@@ -274,6 +304,11 @@ int ptp_clock_unregister(struct ptp_clock *ptp)
ptp->defunct = 1;
wake_up_interruptible(>tsev_wq);
 
+   if (ptp->kworker) {
+   kthread_cancel_delayed_work_sync(>aux_work);
+   kthread_destroy_worker(ptp->kworker);
+   }
+
/* Release the clock's resources. */
if (ptp->pps_source)
pps_unregister_source(ptp->pps_source);
@@ -339,6 +374,12 @@ int ptp_find_pin(struct ptp_clock *ptp,
 }
 EXPORT_SYMBOL(ptp_find_pin);
 
+int ptp_schedule_worker(struct ptp_clock *ptp, unsigned long delay)
+{
+   return kthread_mod_delayed_work(ptp->kworker, >aux_work, delay);
+}
+EXPORT_SYMBOL(ptp_schedule_worker);
+
 /* module operations */
 
 static void __exit ptp_exit(void)
diff --git a/drivers/ptp/ptp_private.h b/drivers/ptp/ptp_private.h
index d958889..b86f1bf 100644
--- a/drivers/ptp/ptp_private.h
+++ b/drivers/ptp/ptp_private.h
@@ -22,6 +22,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -56,6 +57,8 @@ struct ptp_clock {

[PATCH v3 0/3] net: ethernet: ti: cpts: fix tx timestamping timeout

2017-07-26 Thread Grygorii Strashko

Hi

With the low Ethernet connection speed cpdma notification about packet
processing can be received before CPTS TX timestamp event, which is set
when packet actually left CPSW while cpdma notification is sent when packet
pushed in CPSW fifo. As result, when connection is slow and CPU is fast
enough TX timestamping is not working properly.
Issue was discovered using timestamping tool on am57x boards with Ethernet link
speed forced to 100M and on am335x-evm with Ethernet link speed forced to 10M.

Patch3 - This series fixes it by introducing TX SKB queue to store PTP SKBs for
which Ethernet Transmit Event hasn't been received yet and then re-check this
queue with new Ethernet Transmit Events by scheduling CPTS overflow
work more often until TX SKB queue is not empty.

Patch 1,2 - As CPTS overflow work is time critical task it important to ensure
that its scheduling is not delayed. Unfortunately, There could be significant
delay in CPTS work schedule under high system load and on -RT which could cause
CPTS misbehavior due to internal counter overflow and there is no way to tune
CPTS overflow work execution policy and priority manually. The kthread_worker
can be used instead of workqueues, as it creates separate named kthread for
each worker and its its execution policy and priority can be configured
using chrt tool. Instead of modifying CPTS driver itself it was proposed to
it was proposed to add PTP auxiliary worker to the PHC subsystem [1], so
other drivers can benefit from this feature also.

[1] https://www.spinics.net/lists/netdev/msg445392.html

changes in v3:
- patch 1: added proper error handling in ptp_clock_register. 
  minor comments applied.

changes in v2:
- added PTP auxiliary worker to the PHC subsystem

Links
v2: https://www.spinics.net/lists/netdev/msg445859.html
v1: https://www.spinics.net/lists/netdev/msg445387.html

Grygorii Strashko (3):
  ptp: introduce ptp auxiliary worker
  net: ethernet: ti: cpts: convert to use ptp auxiliary worker
  net: ethernet: ti: cpts: fix tx timestamping timeout

 drivers/net/ethernet/ti/cpts.c   | 111 +--
 drivers/net/ethernet/ti/cpts.h   |   2 +-
 drivers/ptp/ptp_clock.c  |  41 +++
 drivers/ptp/ptp_private.h|   3 ++
 include/linux/ptp_clock_kernel.h |  20 +++
 5 files changed, 160 insertions(+), 17 deletions(-)

-- 
2.10.1

[PATCH v3 3/3] net: ethernet: ti: cpts: fix tx timestamping timeout

2017-07-26 Thread Grygorii Strashko

With the low speed Ethernet connection CPDMA notification about packet
processing can be received before CPTS TX timestamp event, which is set
when packet actually left CPSW while cpdma notification is sent when packet
pushed in CPSW fifo.  As result, when connection is slow and CPU is fast
enough TX timestamping is not working properly.

Fix it, by introducing TX SKB queue to store PTP SKBs for which Ethernet
Transmit Event hasn't been received yet and then re-check this queue
with new Ethernet Transmit Events by scheduling CPTS overflow
work more often (every 1 jiffies) until TX SKB queue is not empty.

Side effect of this change is:
 - User space tools require to take into account possible delay in TX
timestamp processing (for example ptp4l works with tx_timestamp_timeout=400
under net traffic and tx_timestamp_timeout=25 in idle).

Signed-off-by: Grygorii Strashko 
---
 drivers/net/ethernet/ti/cpts.c | 86 --
 drivers/net/ethernet/ti/cpts.h |  1 +
 2 files changed, 84 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpts.c b/drivers/net/ethernet/ti/cpts.c
index 3ed438e..c2121d2 100644
--- a/drivers/net/ethernet/ti/cpts.c
+++ b/drivers/net/ethernet/ti/cpts.c
@@ -31,9 +31,18 @@
 
 #include "cpts.h"
 
+#define CPTS_SKB_TX_WORK_TIMEOUT 1 /* jiffies */
+
+struct cpts_skb_cb_data {
+   unsigned long tmo;
+};
+
 #define cpts_read32(c, r)  readl_relaxed(>reg->r)
 #define cpts_write32(c, v, r)  writel_relaxed(v, >reg->r)
 
+static int cpts_match(struct sk_buff *skb, unsigned int ptp_class,
+ u16 ts_seqid, u8 ts_msgtype);
+
 static int event_expired(struct cpts_event *event)
 {
return time_after(jiffies, event->tmo);
@@ -77,6 +86,47 @@ static int cpts_purge_events(struct cpts *cpts)
return removed ? 0 : -1;
 }
 
+static bool cpts_match_tx_ts(struct cpts *cpts, struct cpts_event *event)
+{
+   struct sk_buff *skb, *tmp;
+   u16 seqid;
+   u8 mtype;
+   bool found = false;
+
+   mtype = (event->high >> MESSAGE_TYPE_SHIFT) & MESSAGE_TYPE_MASK;
+   seqid = (event->high >> SEQUENCE_ID_SHIFT) & SEQUENCE_ID_MASK;
+
+   /* no need to grab txq.lock as access is always done under cpts->lock */
+   skb_queue_walk_safe(>txq, skb, tmp) {
+   struct skb_shared_hwtstamps ssh;
+   unsigned int class = ptp_classify_raw(skb);
+   struct cpts_skb_cb_data *skb_cb =
+   (struct cpts_skb_cb_data *)skb->cb;
+
+   if (cpts_match(skb, class, seqid, mtype)) {
+   u64 ns = timecounter_cyc2time(>tc, event->low);
+
+   memset(, 0, sizeof(ssh));
+   ssh.hwtstamp = ns_to_ktime(ns);
+   skb_tstamp_tx(skb, );
+   found = true;
+   __skb_unlink(skb, >txq);
+   dev_consume_skb_any(skb);
+   dev_dbg(cpts->dev, "match tx timestamp mtype %u seqid 
%04x\n",
+   mtype, seqid);
+   } else if (time_after(jiffies, skb_cb->tmo)) {
+   /* timeout any expired skbs over 1s */
+   dev_dbg(cpts->dev,
+   "expiring tx timestamp mtype %u seqid %04x\n",
+   mtype, seqid);
+   __skb_unlink(skb, >txq);
+   dev_consume_skb_any(skb);
+   }
+   }
+
+   return found;
+}
+
 /*
  * Returns zero if matching event type was found.
  */
@@ -101,9 +151,15 @@ static int cpts_fifo_read(struct cpts *cpts, int match)
event->low = lo;
type = event_type(event);
switch (type) {
+   case CPTS_EV_TX:
+   if (cpts_match_tx_ts(cpts, event)) {
+   /* if the new event matches an existing skb,
+* then don't queue it
+*/
+   break;
+   }
case CPTS_EV_PUSH:
case CPTS_EV_RX:
-   case CPTS_EV_TX:
list_del_init(>list);
list_add_tail(>list, >events);
break;
@@ -229,8 +285,15 @@ static long cpts_overflow_check(struct ptp_clock_info *ptp)
struct cpts *cpts = container_of(ptp, struct cpts, info);
unsigned long delay = cpts->ov_check_period;
struct timespec64 ts;
+   unsigned long flags;
+
+   spin_lock_irqsave(>lock, flags);
+   ts = ns_to_timespec64(timecounter_read(>tc));
+
+   if (!skb_queue_empty(>txq))
+   delay = CPTS_SKB_TX_WORK_TIMEOUT;
+   spin_unlock_irqrestore(>lock, flags);
 
-   cpts_ptp_gettime(>info, );
pr_debug("cpts overflow check at %lld.%09lu\n", ts.tv_sec, ts.tv_nsec);
return

[PATCH v4 2/3] net: ethernet: ti: cpts: convert to use ptp auxiliary worker

2017-07-26 Thread Grygorii Strashko

There could be significant delay in CPTS work schedule under high system
load and on -RT which could cause CPTS misbehavior due to internal counter
overflow. Usage of own kthread_worker allows to avoid such kind of issues
and makes it possible to tune priority of CPTS kthread_worker thread on -RT.

Hence, the CPTS driver is converted to use PTP auxiliary worker as PHC
subsystem implements such functionality in a generic way now.

Signed-off-by: Grygorii Strashko 
---
 drivers/net/ethernet/ti/cpts.c | 27 +--
 drivers/net/ethernet/ti/cpts.h |  1 -
 2 files changed, 13 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpts.c b/drivers/net/ethernet/ti/cpts.c
index 32279d2..3ed438e 100644
--- a/drivers/net/ethernet/ti/cpts.c
+++ b/drivers/net/ethernet/ti/cpts.c
@@ -224,6 +224,17 @@ static int cpts_ptp_enable(struct ptp_clock_info *ptp,
return -EOPNOTSUPP;
 }
 
+static long cpts_overflow_check(struct ptp_clock_info *ptp)
+{
+   struct cpts *cpts = container_of(ptp, struct cpts, info);
+   unsigned long delay = cpts->ov_check_period;
+   struct timespec64 ts;
+
+   cpts_ptp_gettime(>info, );
+   pr_debug("cpts overflow check at %lld.%09lu\n", ts.tv_sec, ts.tv_nsec);
+   return (long)delay;
+}
+
 static struct ptp_clock_info cpts_info = {
.owner  = THIS_MODULE,
.name   = "CTPS timer",
@@ -236,18 +247,9 @@ static struct ptp_clock_info cpts_info = {
.gettime64  = cpts_ptp_gettime,
.settime64  = cpts_ptp_settime,
.enable = cpts_ptp_enable,
+   .do_aux_work= cpts_overflow_check,
 };
 
-static void cpts_overflow_check(struct work_struct *work)
-{
-   struct timespec64 ts;
-   struct cpts *cpts = container_of(work, struct cpts, overflow_work.work);
-
-   cpts_ptp_gettime(>info, );
-   pr_debug("cpts overflow check at %lld.%09lu\n", ts.tv_sec, ts.tv_nsec);
-   schedule_delayed_work(>overflow_work, cpts->ov_check_period);
-}
-
 static int cpts_match(struct sk_buff *skb, unsigned int ptp_class,
  u16 ts_seqid, u8 ts_msgtype)
 {
@@ -378,7 +380,7 @@ int cpts_register(struct cpts *cpts)
}
cpts->phc_index = ptp_clock_index(cpts->clock);
 
-   schedule_delayed_work(>overflow_work, cpts->ov_check_period);
+   ptp_schedule_worker(cpts->clock, cpts->ov_check_period);
return 0;
 
 err_ptp:
@@ -392,8 +394,6 @@ void cpts_unregister(struct cpts *cpts)
if (WARN_ON(!cpts->clock))
return;
 
-   cancel_delayed_work_sync(>overflow_work);
-
ptp_clock_unregister(cpts->clock);
cpts->clock = NULL;
 
@@ -476,7 +476,6 @@ struct cpts *cpts_create(struct device *dev, void __iomem 
*regs,
cpts->dev = dev;
cpts->reg = (struct cpsw_cpts __iomem *)regs;
spin_lock_init(>lock);
-   INIT_DELAYED_WORK(>overflow_work, cpts_overflow_check);
 
ret = cpts_of_parse(cpts, node);
if (ret)
diff --git a/drivers/net/ethernet/ti/cpts.h b/drivers/net/ethernet/ti/cpts.h
index 01ea82b..586edd9 100644
--- a/drivers/net/ethernet/ti/cpts.h
+++ b/drivers/net/ethernet/ti/cpts.h
@@ -119,7 +119,6 @@ struct cpts {
u32 cc_mult; /* for the nominal frequency */
struct cyclecounter cc;
struct timecounter tc;
-   struct delayed_work overflow_work;
int phc_index;
struct clk *refclk;
struct list_head events;
-- 
2.10.1

Re: [PATCH] net: inet: diag: expose sockets cgroup classid

2017-07-26 Thread Cong Wang

On Wed, Jul 26, 2017 at 10:22 AM, Levin, Alexander (Sasha Levin)
 wrote:
> +   if (ext & (1 << (INET_DIAG_CLASS_ID - 1))) {
> +   u32 classid = 0;
> +
> +#ifdef CONFIG_SOCK_CGROUP_DATA
> +   classid = sock_cgroup_classid(>sk_cgrp_data);
> +#endif


If CONFIG_SOCK_CGROUP_DATA is not enabled, should we put 0
or put nothing (that is, skipping this nla_put())?

> +
> +   nla_put(skb, INET_DIAG_CLASS_ID, sizeof(classid), );

nla_put_u32()

Thanks.

Re: [PATCH RFC 04/13] net: phy: add 1000Base-X to phy settings table

2017-07-26 Thread Andrew Lunn

On Tue, Jul 25, 2017 at 03:02:52PM +0100, Russell King wrote:
> Add the missing 1000Base-X entry to the phy settings table.  This was
> not included because the original code could not cope with more than
> 32 bits of link mode mask.
> 
> Signed-off-by: Russell King 

Hi Russell

Looks like there are another 15 to add to the table. But that can wait
for another time when somebody needs them.

Reviewed-by: Andrew Lunn 

Andrew

Re: [PATCH RFC 02/13] net: phy: split out PHY speed and duplex string generation

2017-07-26 Thread Andrew Lunn

On Tue, Jul 25, 2017 at 03:02:42PM +0100, Russell King wrote:
> Other code would like to make use of this, so make the speed and duplex
> string generation visible, and place it in a separate file.
> 
> Signed-off-by: Russell King 

Reviewed-by: Andrew Lunn 

Andrew

Re: [PATCH net-next v2 01/10] net: dsa: lan9303: Fixed MDIO interface

2017-07-26 Thread Andrew Lunn

> Good. Just one question about process. Could I have posted my work
> as a RFC? To get one round of initial feedback before chopping into
> small patch requests. As well as indicating where I am heading. Or is
> that just waste of human bandwidth?

Depends. Post 100 RFC patches, i won't look at them. Post 21 with a
cover note making it clear you are planning to submit them in blocks
of 7, i might.

But it is best to assume reviewers have small blocks of time. 21
patches take 3 times a long to review as 7. The block of time might
not be enough for 21, so the review gets differed. 7 are more likely
to fit in the available time, so it happens quickly.

   Andrew

Re: [PATCH] netpoll: Fix device name check in netpoll_setup()

2017-07-26 Thread Cong Wang

On Wed, Jul 26, 2017 at 11:44 AM, Doug Anderson  wrote:
> Hi,
>
> On Tue, Jul 25, 2017 at 11:36 AM, Matthias Kaehlcke  wrote:
>> Apparently netpoll_setup() assumes that netpoll.dev_name is a pointer
>> when checking if the device name is set:
>>
>> if (np->dev_name) {
>>   ...
>>
>> However the field is a character array, therefore the condition always
>> yields true. Check instead whether the first byte of the array has a
>> non-zero value.
>>
>> Signed-off-by: Matthias Kaehlcke 
>> ---
>>  net/core/netpoll.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/net/core/netpoll.c b/net/core/netpoll.c
>> index 8357f164c660..912731bed7b7 100644
>> --- a/net/core/netpoll.c
>> +++ b/net/core/netpoll.c
>> @@ -666,7 +666,7 @@ int netpoll_setup(struct netpoll *np)
>> int err;
>>
>> rtnl_lock();
>> -   if (np->dev_name) {
>> +   if (np->dev_name[0]) {
>> struct net *net = current->nsproxy->net_ns;
>> ndev = __dev_get_by_name(net, np->dev_name);
>> }
>
> It's really up to the maintainer of the code, but my first instinct
> here would be to instead remove the "if" test unless we really expect
> dev->dev_name to be blank in lots of cases.  It will slightly slow
> down the error case but should avoid an "if" test in the non-error
> case.  By definition it should be safe since currently the "if" test
> should always evaluate to true.
>

netconsole could set this dev_name to empty via configfs,
so this patch is correct.

Re: [PATCH net-next v2 01/10] net: dsa: lan9303: Fixed MDIO interface

2017-07-26 Thread Egil Hjelmeland

Den 26. juli 2017 22:07, skrev David Miller:

From: Andrew Lunn 
Date: Wed, 26 Jul 2017 19:52:24 +0200

So I really want to group the patches into only a few series in order
to not spend months on the process.

I strongly agree with Vivien here. Good patches get accepted in about
3 days. You should expect feedback within a day or two. That allows
you to have fast cycle times for getting patches in.

+1

Small simple patches will get everything in 10 times fast than if
you clump everything together into larger, harder to review ones.

Good. Just one question about process. Could I have posted my work
as a RFC? To get one round of initial feedback before chopping into
small patch requests. As well as indicating where I am heading. Or is
that just waste of human bandwidth?

Egil

Re: [PATCH net-next v2] bpf: install libbpf headers on 'make install'

2017-07-26 Thread David Miller

From: Jakub Kicinski 
Date: Tue, 25 Jul 2017 11:17:11 -0700

> Add a new target to install the bpf.h header to $(prefix)/include/bpf/ 
> directory.  This is necessary to build standalone applications using 
> libbpf, without the need to clone the kernel sources and point to them.
> 
> Signed-off-by: Jakub Kicinski 

Applied, thank you.

Re: [Patch net] bonding: commit link status change after propose

2017-07-26 Thread David Miller

From: Cong Wang 
Date: Tue, 25 Jul 2017 09:44:25 -0700

> Commit de77ecd4ef02 ("bonding: improve link-status update in mii-monitoring")
> moves link status commitment into bond_mii_monitor(), but it still relies
> on the return value of bond_miimon_inspect() as the hint. We need to return
> non-zero as long as we propose a link status change.
> 
> Fixes: de77ecd4ef02 ("bonding: improve link-status update in mii-monitoring")
> Reported-by: Benjamin Gilbert 
> Tested-by: Benjamin Gilbert 
> Cc: Mahesh Bandewar 
> Signed-off-by: Cong Wang 

Applied and queued up for -stable.

Re: [PATCH 0/8] net: mvpp2: add TX interrupts support

2017-07-26 Thread David Miller

From: Thomas Petazzoni 
Date: Tue, 25 Jul 2017 17:55:01 +0200

> Hello,
> 
> So far, the mvpp2 driver was using an hrtimer to handle TX
> completion. This patch series adds support for using TX interrupts
> (for each CPU) on PPv2.2, the variant of the IP used on Marvell Armada
> 7K/8K.
> 
> This series has been tested on Marvell Armada 7K (PPv2.2) and Armada
> 375 (PPv2.1).
> 
> Dave:
> 
>  - This series depends on the previous series sent by Antoine Ténart
>"net: mvpp2: MAC/GoP configuration and optional PHYs". Functionally
>speaking there is no real dependency, but we touch in a few areas
>the same piece of code, so I based my patch series on top of
>Antoine's.
> 
>  - Please do not apply the last patch of this series "arm64: dts:
>marvell: add TX interrupts for PPv2.2", it will be taken by the ARM
>mvebu maintainers.

Please don't do things this way.

Patiently wait for Antione's series to make it into my tree, then
submit your's.

Also, if we're continually going to elide the DTS file patches, just
don't bother adding them to the series.  That way you don't have to
give me special instructions, and I don't have the possibility of
making a mistake and applying it accidently.

Thanks.

Re: [RFC] switchdev: clarify ndo_get_phys_port_name() formats

2017-07-26 Thread Jakub Kicinski

On Wed, 26 Jul 2017 12:23:17 +0300, Or Gerlitz wrote:
> On Wed, Jul 26, 2017 at 11:13 AM, Jakub Kicinski  wrote:
> > On Wed, 26 Jul 2017 07:48:40 +0200, Jiri Pirko wrote:  
> 
> >> I think it would make sense if the driver would just fill-up a struct in
> >> the ndo call and core would generate the string.  
> 
> > I do like the idea of core generating the string.  I have a temptation
> > to try to involve devlink in this somehow, since port id and split info
> > is already available there.  Perhaps that would only overcomplicate
> > things.  
> 
> > The other question is: can we remove the option to generate an arbitrary
> > string completely?  I think Or and mlx5 folks may object since it would
> > mean mlx5 VF repr names would change.  
> 
> What we have today is the representor driver instance setting the VF index as
> the of phys port name and we're telling users to have this sort of udev rule:
> 
> SUBSYSTEM=="net", ACTION=="add",
> ATTR{phys_switch_id}=="", \ ATTR{phys_port_name}!="",
> NAME="$PF_NIC$attr{phys_port_name}"

Example names generated by this rule would be pfnic_0, pfnic_1 for vf
representors 0 and 1?

> that has affiliation to  the PF where this VF belongs. AFAIK this
> model/assumption is now under push to
> higher level (open stack), so lets see if/what we want to change here
> w.r.t to sriov offloading drivers.
> 
> I would opt for an approach where the value returned by the kernel is
> the minimal possible and
> user-space has flexibility to further orchestrate that with udev
> rules. I wasn't fully clear on Jakub's suggestion
> which parts must come from the kernel. Do we have any length
> limitation for the phys port name?

Looks like the limit today is IFNAMSIZ.  I'm in favor of leaving the
flexibility to the userspace, why I suggested adding the pf%d or
pf%dvf%d to the name is that I don't think we have any other way today
of telling whether representor is for a physical port, VF or PF.

If I understand mlx5 code, you're not reporting port ids for physical
ports so presence of the name already implies it's a VF but in case you
want to add port splitting support, for example, reporting the name on
physical ports will become more of a necessity.

If we adopt Jiri's suggestion of returning structured data it will be
very easy to give user space type and indexes separately, but we should
probably still return the string for backwards compatibility.

Re: [PATCH net-next v2 01/10] net: dsa: lan9303: Fixed MDIO interface

2017-07-26 Thread David Miller

From: Andrew Lunn 
Date: Wed, 26 Jul 2017 19:52:24 +0200

>> > So I really want to group the patches into only a few series in order
>> > to not spend months on the process.
> 
> I strongly agree with Vivien here. Good patches get accepted in about
> 3 days. You should expect feedback within a day or two. That allows
> you to have fast cycle times for getting patches in.

+1

Small simple patches will get everything in 10 times fast than if
you clump everything together into larger, harder to review ones.

Re: [PATCH 1/4] can: dev: Add support for limiting configured bitrate

2017-07-26 Thread Kurt Van Dijck

Hi,

I know my response is late ...

> Hi Oliver
> On 07/20/2017 02:43 AM, Oliver Hartkopp wrote:
> > Hi Franklin,
> > 
> > On 07/20/2017 01:36 AM, Franklin S Cooper Jr wrote:
> > 
> >> +#ifdef CONFIG_OF
> >> +void of_transceiver_is_fixed(struct net_device *dev)
> >> +{
> > 
> > (..)
> > 
> >> +}
> >> +EXPORT_SYMBOL(of_transceiver_is_fixed);
> >> +#endif
> > 
> > I'm not sure about the naming here.
> > 
> > As this is a CAN transceiver related option it should be named accordingly:

I contest the the name too:
1) the can transceiver isn't fixed at all, it limited to the higher
bitrates.

2) of_can_transceiver_is_fixed suggests to test if a transceiver is
fixed, it does not suggest to load some properties from the device tree.
of_can_load_transceiver looks way more clear to me.

That's my opinion.
The important things, like the contents of the functions, look good.

Kind regards,
Kurt Van Dijck

Re: [PATCH net] net: phy: Run state machine to completion

2017-07-26 Thread Florian Fainelli

On 07/26/2017 12:34 PM, Florian Fainelli wrote:
> On 07/26/2017 12:24 PM, Florian Fainelli wrote:
>> Marc reported that he was not getting the PHY library adjust_link()
>> callback function to run when calling phy_stop() + phy_disconnect()
>> which does not indeed happen because we don't make sure we flush the
>> PHYLIB delayed work and let it run to completion. We also need to have
>> a synchronous call to phy_state_machine() in order to have the state
>> machine actually act on PHY_HALTED, set the PHY device's link down, turn
>> the network device's carrier off and finally call the adjust_link()
>> function.
>>
>> Reported-by: Marc Gonzalez 
>> Fixes: a390d1f379cf ("phylib: convert state_queue work to delayed_work")
>> Signed-off-by: Florian Fainelli 
>> ---
>> David, this dates back from before the commit mentioned in Fixes but it would
>> be hard to backport to earlier kernels if we flagged the original design flaw
>> that used timers. Also, I am not clear on the timer API whether there was a 
>> way
>> to ensure timers would run to completion before they would be cancelled.
>>
>> Marc, please add your Signed-off-by tag since you contributed the second 
>> line.
>>
>> Thanks!
> 
> David, please hold off before applying this, I found a corner case with
> power management where we may be accessing clock gated registers and
> cause bus errors, will report back here.

This was a downstream only problem because I was missing these two commits:

49d52e8108a21749dc2114b924c907db43358984 ("net: phy: handle state
correctly in phy_stop_machine")
d5c3d84657db57bd23ecd58b97f1c99dd42a7b80 ("net: phy: Avoid polling PHY
with PHY_IGNORE_INTERRUPTS")

This was tested with bcmgenet (PHY_IGNORE_INTERRUPT and PHY_POLL) and
with bcm_sf2 (net/dsa/slave.c), would appreciate some review/testing
from others as well.

> 
>>
>>  drivers/net/phy/phy.c | 5 +
>>  1 file changed, 5 insertions(+)
>>
>> diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
>> index d0626bf5c540..30e7c43e0d87 100644
>> --- a/drivers/net/phy/phy.c
>> +++ b/drivers/net/phy/phy.c
>> @@ -743,12 +743,17 @@ void phy_trigger_machine(struct phy_device *phydev, 
>> bool sync)
>>   */
>>  void phy_stop_machine(struct phy_device *phydev)
>>  {
>> +/* Run the state machine to completion */
>> +flush_delayed_work(>state_queue);
>>  cancel_delayed_work_sync(>state_queue);
>>  
>>  mutex_lock(>lock);
>>  if (phydev->state > PHY_UP && phydev->state != PHY_HALTED)
>>  phydev->state = PHY_UP;
>>  mutex_unlock(>lock);
>> +
>> +/* Now we can run the state machine synchronously */
>> +phy_state_machine(>state_queue.work);
>>  }
>>  
>>  /**
>>
> 
> 


-- 
Florian

Re: [PATCH net-next 10/18] net: mvpp2: use the GoP interrupt for link status changes

2017-07-26 Thread Russell King - ARM Linux

On Wed, Jul 26, 2017 at 06:26:48PM +0200, Andrew Lunn wrote:
> And this is another bit of code you probably need to change in a while
> with phylink lands.

The way the MAC driver handles link up/down and configuration events
changes significantly when a MAC driver switches to phylink, since
a directly connected SFP cage needs to have the MAC reconfigured
between SGMII and 1000base-X modes.  If you add SFP+ into that, also
10Gbase-KR as well.

Note also that the "link up" condition for SFP (and probably SFF) is
more complex than just "is the module reporting that it's receiving
a signal" - especially with 1000base-X, there's negotiation to be
performed, so you also need to know (if the module is connected
directly to the MAC) whether the Serdes is in sync and has finished
negotiation (and itself says it has link with the remote end.)

With the Marvell 88x3310 PHY, the MAC driver already needs to switch
between 10Gbase-KR and SGMII modes, as the 88x3310 automatically
makes that switch on its MAC facing interface without software
intervention.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

Re: [PATCH net] ipv6: no need to return rt->dst.error if it is not null entry.

2017-07-26 Thread Roopa Prabhu

On Wed, Jul 26, 2017 at 12:00 PM, David Ahern  wrote:
> On 7/26/17 12:55 PM, Roopa Prabhu wrote:
>> On Wed, Jul 26, 2017 at 11:49 AM, David Ahern  wrote:
>>> On 7/26/17 12:27 PM, Roopa Prabhu wrote:
 agreed...so looks like the check in v3 should be


 +   if ( rt == net->ipv6.ip6_null_entry ||
 +(rt->dst.error &&
 + #ifdef CONFIG_IPV6_MULTIPLE_TABLES
 +  rt != net->ipv6.ip6_prohibit_entry &&
 +  rt != net->ipv6.ip6_blk_hole_entry &&
 +#endif
 + )) {
 err = rt->dst.error;
 ip6_rt_put(rt);
 goto errout;

>>>
>>> I don't think so. If I add a prohibit route and use the fibmatch
>>> attribute, I want to see the route from the FIB that was matched.
>>
>>
>> yes, exactly. wouldn't  'rt != net->ipv6.ip6_prohibit_entry' above let
>> it fall through to the route fill code ?
>>
>> ah...but i guess you are saying that they will have rt6_info's of
>> their own and will not match. got it. ack.
>>
>
> This:
>
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index 4d30c96a819d..24de81c804c2 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -3637,11 +3637,6 @@ static int inet6_rtm_getroute(struct sk_buff
> *in_skb, struct nlmsghdr *nlh,
> dst = ip6_route_lookup(net, , 0);
>
> rt = container_of(dst, struct rt6_info, dst);
> -   if (rt->dst.error) {
> -   err = rt->dst.error;
> -   ip6_rt_put(rt);
> -   goto errout;
> -   }
>
> if (rt == net->ipv6.ip6_null_entry) {
> err = rt->dst.error;
>
> Puts back the original behavior. In that case, only rt == null_entry
> drops to the error path which is correct. All other rt values will drop
> to rt6_fill_node and return rt data.

yes, i thought so too and hence acked v1. But, following congs
comment, realized that it may mask some real errors for fibmatch ?

I just tested a case of unreachable route with just the above patch
you posted, and I do get the error correctly.

so, I guess you are saying all real errors for fibmatch will have "rt
== net->ipv6.ip6_null_entry" and we should be ok.
sounds good to me.

Re: [PATCH net] net: phy: Run state machine to completion

2017-07-26 Thread Florian Fainelli

On 07/26/2017 12:24 PM, Florian Fainelli wrote:
> Marc reported that he was not getting the PHY library adjust_link()
> callback function to run when calling phy_stop() + phy_disconnect()
> which does not indeed happen because we don't make sure we flush the
> PHYLIB delayed work and let it run to completion. We also need to have
> a synchronous call to phy_state_machine() in order to have the state
> machine actually act on PHY_HALTED, set the PHY device's link down, turn
> the network device's carrier off and finally call the adjust_link()
> function.
> 
> Reported-by: Marc Gonzalez 
> Fixes: a390d1f379cf ("phylib: convert state_queue work to delayed_work")
> Signed-off-by: Florian Fainelli 
> ---
> David, this dates back from before the commit mentioned in Fixes but it would
> be hard to backport to earlier kernels if we flagged the original design flaw
> that used timers. Also, I am not clear on the timer API whether there was a 
> way
> to ensure timers would run to completion before they would be cancelled.
> 
> Marc, please add your Signed-off-by tag since you contributed the second line.
> 
> Thanks!

David, please hold off before applying this, I found a corner case with
power management where we may be accessing clock gated registers and
cause bus errors, will report back here.

> 
>  drivers/net/phy/phy.c | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
> index d0626bf5c540..30e7c43e0d87 100644
> --- a/drivers/net/phy/phy.c
> +++ b/drivers/net/phy/phy.c
> @@ -743,12 +743,17 @@ void phy_trigger_machine(struct phy_device *phydev, 
> bool sync)
>   */
>  void phy_stop_machine(struct phy_device *phydev)
>  {
> + /* Run the state machine to completion */
> + flush_delayed_work(>state_queue);
>   cancel_delayed_work_sync(>state_queue);
>  
>   mutex_lock(>lock);
>   if (phydev->state > PHY_UP && phydev->state != PHY_HALTED)
>   phydev->state = PHY_UP;
>   mutex_unlock(>lock);
> +
> + /* Now we can run the state machine synchronously */
> + phy_state_machine(>state_queue.work);
>  }
>  
>  /**
> 


-- 
Florian

Re: ARM GLX Khadas VIM Pro - Ethernet detected as only 10Mbps and stalled after some traffic

2017-07-26 Thread Jerome Brunet

On Tue, 2017-07-25 at 18:56 +0200, crow wrote:
> Hi,
> Today I did test on ArchLinuxArm the Kernel v4.13-rc2. On downloading
> the linux git source the network will eventually get stalled. Here are
> the information
> 
> Over SSH (network works).
> 
> [root@alarm ~]# uname -a
> Linux alarm 4.13.0-rc2-1-ARCH #1 SMP Mon Jul 24 20:02:50 MDT 2017
> aarch64 GNU/Linux
> [root@alarm ~]# mii-tool -vvv eth0
> Using SIOCGMIIPHY=0x8947
> eth0: negotiated 1000baseT-HD flow-control, link ok

[Replying again on the last thread :) ]
This 1000BaseT Half Duplex looks suspucious if the PHY is supposed to be a
10/100Mbps

>   registers for MII PHY 8:
> 1000 782d 0181 4400 01e1 c1e1 000f 2001
>        
> 0040 0002 40e8 5400 1c1c   
> fff0   000a 1407 004a  105a
>   product info: vendor 00:60:51, model 0 rev 0
>   basic mode:   autonegotiation enabled
>   basic status: autonegotiation complete, link ok
>   capabilities: 1000baseT-HD 1000baseT-FD 100baseTx-FD 100baseTx-HD
> 10baseT-FD 10baseT-HD
>   advertising:  1000baseT-HD 1000baseT-FD 100baseTx-FD 100baseTx-HD
> 10baseT-FD 10baseT-HD
>   link partner: 1000baseT-HD 1000baseT-FD 100baseTx-FD 100baseTx-HD
> 10baseT-FD 10baseT-HD
> [root@alarm ~]# ethtool -S eth0
> NIC statistics:
>  mmc_tx_octetcount_gb: 0
>  mmc_tx_framecount_gb: 0
>  mmc_tx_broadcastframe_g: 0
>  mmc_tx_multicastframe_g: 0
>  mmc_tx_64_octets_gb: 0
>  mmc_tx_65_to_127_octets_gb: 0
>  mmc_tx_128_to_255_octets_gb: 0
>  mmc_tx_256_to_511_octets_gb: 0
>  mmc_tx_512_to_1023_octets_gb: 0
>  mmc_tx_1024_to_max_octets_gb: 0
>  mmc_tx_unicast_gb: 0
>  mmc_tx_multicast_gb: 0
>  mmc_tx_broadcast_gb: 0
>  mmc_tx_underflow_error: 0
>  mmc_tx_singlecol_g: 0
>  mmc_tx_multicol_g: 0
>  mmc_tx_deferred: 0
>  mmc_tx_latecol: 0
>  mmc_tx_exesscol: 0
>  mmc_tx_carrier_error: 0
>  mmc_tx_octetcount_g: 0
>  mmc_tx_framecount_g: 0
>  mmc_tx_excessdef: 0
>  mmc_tx_pause_frame: 0
>  mmc_tx_vlan_frame_g: 0
>  mmc_rx_framecount_gb: 133
>  mmc_rx_octetcount_gb: 16646
>  mmc_rx_octetcount_g: 16646
>  mmc_rx_broadcastframe_g: 9
>  mmc_rx_multicastframe_g: 22
>  mmc_rx_crc_error: 0
>  mmc_rx_align_error: 0
>  mmc_rx_run_error: 0
>  mmc_rx_jabber_error: 0
>  mmc_rx_undersize_g: 0
>  mmc_rx_oversize_g: 0
>  mmc_rx_64_octets_gb: 45
>  mmc_rx_65_to_127_octets_gb: 64
>  mmc_rx_128_to_255_octets_gb: 13
>  mmc_rx_256_to_511_octets_gb: 7
>  mmc_rx_512_to_1023_octets_gb: 4
>  mmc_rx_1024_to_max_octets_gb: 0
>  mmc_rx_unicast_g: 102
>  mmc_rx_length_error: 0
>  mmc_rx_autofrangetype: 0
>  mmc_rx_pause_frames: 0
>  mmc_rx_fifo_overflow: 0
>  mmc_rx_vlan_frames_gb: 0
>  mmc_rx_watchdog_error: 0
>  mmc_rx_ipc_intr_mask: 1073692671
>  mmc_rx_ipc_intr: 0
>  mmc_rx_ipv4_gd: 117
>  mmc_rx_ipv4_hderr: 0
>  mmc_rx_ipv4_nopay: 0
>  mmc_rx_ipv4_frag: 0
>  mmc_rx_ipv4_udsbl: 0
>  mmc_rx_ipv4_gd_octets: 12585
>  mmc_rx_ipv4_hderr_octets: 0
>  mmc_rx_ipv4_nopay_octets: 0
>  mmc_rx_ipv4_frag_octets: 0
>  mmc_rx_ipv4_udsbl_octets: 0
>  mmc_rx_ipv6_gd_octets: 104
>  mmc_rx_ipv6_hderr_octets: 0
>  mmc_rx_ipv6_nopay_octets: 0
>  mmc_rx_ipv6_gd: 1
>  mmc_rx_ipv6_hderr: 0
>  mmc_rx_ipv6_nopay: 0
>  mmc_rx_udp_gd: 31
>  mmc_rx_udp_err: 0
>  mmc_rx_tcp_gd: 85
>  mmc_rx_tcp_err: 0
>  mmc_rx_icmp_gd: 2
>  mmc_rx_icmp_err: 0
>  mmc_rx_udp_gd_octets: 2963
>  mmc_rx_udp_err_octets: 0
>  mmc_rx_tcp_gd_octets: 7254
>  mmc_rx_tcp_err_octets: 0
>  mmc_rx_icmp_gd_octets: 92
>  mmc_rx_icmp_err_octets: 0
>  tx_underflow: 0
>  tx_carrier: 0
>  tx_losscarrier: 0
>  vlan_tag: 0
>  tx_deferred: 0
>  tx_vlan: 0
>  tx_jabber: 0
>  tx_frame_flushed: 0
>  tx_payload_error: 0
>  tx_ip_header_error: 0
>  rx_desc: 0
>  sa_filter_fail: 0
>  overflow_error: 0
>  ipc_csum_error: 0
>  rx_collision: 0
>  rx_crc_errors: 0
>  dribbling_bit: 0
>  rx_length: 0
>  rx_mii: 0
>  rx_multicast: 0
>  rx_gmac_overflow: 0
>  rx_watchdog: 0
>  da_rx_filter_fail: 0
>  sa_rx_filter_fail: 0
>  rx_missed_cntr: 0
>  rx_overflow_cntr: 0
>  rx_vlan: 0
>  tx_undeflow_irq: 0
>  tx_process_stopped_irq: 0
>  tx_jabber_irq: 0
>  rx_overflow_irq: 0
>  rx_buf_unav_irq: 0
>  rx_process_stopped_irq: 0
>  rx_watchdog_irq: 0
>  tx_early_irq: 0
>  fatal_bus_error_irq: 0
>  rx_early_irq: 0
>  threshold: 1
>  tx_pkt_n: 83
>  rx_pkt_n: 133
>  normal_irq_n: 130
>  rx_normal_irq_n: 129
>  napi_poll: 130
>  tx_normal_irq_n: 1
>  tx_clean: 192
>  tx_set_ic_bit: 1
>  irq_receive_pmt_irq_n: 0
>  mmc_tx_irq_n: 0
>  mmc_rx_irq_n: 0
>  mmc_rx_csum_offload_irq_n: 0

Re: ARM GLX Khadas VIM Pro - Ethernet detected as only 10Mbps and stalled after some traffic

2017-07-26 Thread Jerome Brunet

On Sun, 2017-06-11 at 08:31 +0200, crow wrote:
> Hi Andrew,
> 
> On Sat, Jun 10, 2017 at 5:27 PM, Andrew Lunn  wrote:
> > > Also what Martin Blumenstingl wrote is following which is also crucial
> > > for fixing the issue:
> > > Amlogic has given their ethernet PHY driver some updates [2], it now
> > > includes wake-on-lan, and they now have an internal_phy_read_status
> > > which uses reset_internal_phy if there's a link and some error counter
> > > exceeds some magic value.
> > 
> > Hi Crow
> > 
> > You could probably just drop the Amlogic driver into mainline and see
> > if it works better. If that solves your problem, we can look at
> > merging the changes.
> > 
> > Andrew
> 
> Thank your for the suggestion, and thanks Martin to explaining me over
> IRC what actually I should do.
> 
> I recompiled mainline kernel 4.12-rc4 with the Amlogic driver:
> replaced drivers/net/phy/meson-gxl.c with
> https://github.com/khadas/linux/blob/ubuntu-4.9/drivers/amlogic/ethernet/phy/a
> mlogic.c
> 
> But this did not solve the issue. As soon as i start git clone i lose
> network connection to device (no session timeout/disconnect this time,
> but I am unable to reconnect over SSH or to get OK ping replay back).
> 
> Here are the tests:
> Linux khadasvimpro 4.12.0-rc4-4-ARCH #1 SMP Sun Jun 11 03:39:21 CEST
> 2017 aarch64 GNU/Linux
> 
> # modinfo meson_gxl
> filename:
> /lib/modules/4.12.0-rc4-4-ARCH/kernel/drivers/net/phy/meson-gxl.ko.gz
> license:GPL
> author: Neil Armstrong 
> author: Baoqi wang
> description:Amlogic Meson GXL Internal PHY driver
> alias:  mdio:0001100101000100
> depends:
> intree: Y
> vermagic:   4.12.0-rc4-4-ARCH SMP mod_unload aarch64
> #
> # mii-tool -vvv eth0
> Using SIOCGMIIPHY=0x8947
> eth0: negotiated 1000baseT-HD flow-control, link ok

Hum, 1000Mbps Half duplex looks duplex looks suspicious
The PHY is supposed to be a 10/100, right ? or did I miss something ?

>   registers for MII PHY 8:
> 1000 782d 0181 4400 01e1 c1e1 000f 2001
>        
> 0040 0002 40e8 5400 1c1c   
> fff0   040a 1407 004a  105a
>   product info: vendor 00:60:51, model 0 rev 0
>   basic mode:   autonegotiation enabled
>   basic status: autonegotiation complete, link ok
>   capabilities: 1000baseT-HD 1000baseT-FD 100baseTx-FD 100baseTx-HD
> 10baseT-FD 10baseT-HD
>   advertising:  1000baseT-HD 1000baseT-FD 100baseTx-FD 100baseTx-HD
> 10baseT-FD 10baseT-HD
>   link partner: 1000baseT-HD 1000baseT-FD 100baseTx-FD 100baseTx-HD
> 10baseT-FD 10baseT-HD
> $
> 
> over SSH startet following but it stall already at 0%:
> 
> $ git clone git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-
> stable.git
> Cloning into 'linux-stable'...
> remote: Counting objects: 5948690, done.
> remote: Compressing objects: 100% (124799/124799), done.
> Receiving objects:   0% (11668/5948690), 2.27 MiB | 4.52 MiB/s
> 
> shows timeout while trying to sync with NTP server:
> 
> # journalctl -f
> systemd-timesyncd[299]: Timed out waiting for reply from
> 83.68.137.76:123 (2.at.pool.ntp.org).
> systemd-timesyncd[299]: Timed out waiting for reply from
> 86.59.113.114:123 (2.at.pool.ntp.org).
> 
> while still not working dump the register:
> # mii-tool -vvv eth0
> Using SIOCGMIIPHY=0x8947
> eth0: negotiated 1000baseT-HD flow-control, link ok
>   registers for MII PHY 8:
> 1000 782d 0181 4400 01e1 c1e1 000d 2001
>        
> 0040 0002 40e8 5400 1c1c   
> fff0   040a 1407   105a
>   product info: vendor 00:60:51, model 0 rev 0
>   basic mode:   autonegotiation enabled
>   basic status: autonegotiation complete, link ok
>   capabilities: 1000baseT-HD 1000baseT-FD 100baseTx-FD 100baseTx-HD
> 10baseT-FD 10baseT-HD
>   advertising:  1000baseT-HD 1000baseT-FD 100baseTx-FD 100baseTx-HD
> 10baseT-FD 10baseT-HD
>   link partner: 1000baseT-HD 1000baseT-FD 100baseTx-FD 100baseTx-HD
> 10baseT-FD 10baseT-HD
> #
> 
> # ifconfig eth0 down && ifconfig eth0 up
> # dmesg -T | tail -n 10
> [Sun Jun 11 07:40:34 2017] meson8b-dwmac c941.ethernet eth0:
> device MAC address 00:15:18:01:81:31
> [Sun Jun 11 07:40:34 2017] random: crng init done
> [Sun Jun 11 07:40:34 2017] Meson GXL Internal PHY 0.e40908ff:08:
> attached PHY driver [Meson GXL Internal PHY]
> (mii_bus:phy_addr=0.e40908ff:08, irq=-1)
> [Sun Jun 11 07:40:34 2017] meson8b-dwmac c941.ethernet eth0: PTP
> not supported by HW
> [Sun Jun 11 07:40:36 2017] brcmfmac: brcmf_c_preinit_dcmds: Firmware
> version = wl0: Mar  1 2015 07:29:38 version 7.45.18 (r538002) FWID
> 01-6a2c8ad4
> [Sun Jun 11 07:40:36 2017] meson8b-dwmac c941.ethernet eth0: Link
> is Up - 100Mbps/Full - flow control off
> [Sun Jun 11 07:41:23 2017] EXT4-fs (mmcblk1p1): mounted filesystem
> with ordered data mode. Opts: (null)
> [Sun Jun 11 07:54:28 2017] Meson GXL Internal PHY

[PATCH net] net: phy: Run state machine to completion

2017-07-26 Thread Florian Fainelli

Marc reported that he was not getting the PHY library adjust_link()
callback function to run when calling phy_stop() + phy_disconnect()
which does not indeed happen because we don't make sure we flush the
PHYLIB delayed work and let it run to completion. We also need to have
a synchronous call to phy_state_machine() in order to have the state
machine actually act on PHY_HALTED, set the PHY device's link down, turn
the network device's carrier off and finally call the adjust_link()
function.

Reported-by: Marc Gonzalez 
Fixes: a390d1f379cf ("phylib: convert state_queue work to delayed_work")
Signed-off-by: Florian Fainelli 
---
David, this dates back from before the commit mentioned in Fixes but it would
be hard to backport to earlier kernels if we flagged the original design flaw
that used timers. Also, I am not clear on the timer API whether there was a way
to ensure timers would run to completion before they would be cancelled.

Marc, please add your Signed-off-by tag since you contributed the second line.

Thanks!

 drivers/net/phy/phy.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index d0626bf5c540..30e7c43e0d87 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -743,12 +743,17 @@ void phy_trigger_machine(struct phy_device *phydev, bool 
sync)
  */
 void phy_stop_machine(struct phy_device *phydev)
 {
+   /* Run the state machine to completion */
+   flush_delayed_work(>state_queue);
cancel_delayed_work_sync(>state_queue);
 
mutex_lock(>lock);
if (phydev->state > PHY_UP && phydev->state != PHY_HALTED)
phydev->state = PHY_UP;
mutex_unlock(>lock);
+
+   /* Now we can run the state machine synchronously */
+   phy_state_machine(>state_queue.work);
 }
 
 /**
-- 
2.9.3

Re: [PATCH net-next] net: phy: Remove stale comments referencing timer

2017-07-26 Thread Andrew Lunn

On Wed, Jul 26, 2017 at 12:05:38PM -0700, Florian Fainelli wrote:
> Since commit a390d1f379cf ("phylib: convert state_queue work to
> delayed_work"), the PHYLIB state machine was converted to use delayed
> workqueues, yet some functions were still referencing the PHY library
> timer in their comments, fix that and remove the now unused
> linux/timer.h include.
> 
> Signed-off-by: Florian Fainelli 

Reviewed-by: Andrew Lunn 

Andrew

[PATCH v3 net-next 2/2] liquidio: cleanup: removed cryptic and misleading macro

2017-07-26 Thread Felix Manlunas

From: Rick Farrington 

Signed-off-by: Rick Farrington 
Signed-off-by: Felix Manlunas 
---
 drivers/net/ethernet/cavium/liquidio/octeon_console.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/cavium/liquidio/octeon_console.c 
b/drivers/net/ethernet/cavium/liquidio/octeon_console.c
index 501ad95..15ad1ab 100644
--- a/drivers/net/ethernet/cavium/liquidio/octeon_console.c
+++ b/drivers/net/ethernet/cavium/liquidio/octeon_console.c
@@ -42,8 +42,6 @@ module_param(console_bitmask, int, 0644);
 MODULE_PARM_DESC(console_bitmask,
 "Bitmask indicating which consoles have debug output 
redirected to syslog.");
 
-#define CAST_ULL(v) ((u64)(v))
-
 #define BOOTLOADER_PCI_READ_BUFFER_DATA_ADDR0x0006c008
 #define BOOTLOADER_PCI_READ_BUFFER_LEN_ADDR 0x0006c004
 #define BOOTLOADER_PCI_READ_BUFFER_OWNER_ADDR   0x0006c000
@@ -233,7 +231,7 @@ static int __cvmx_bootmem_check_version(struct 
octeon_device *oct,
(exact_match && major_version != exact_match)) {
dev_err(>pci_dev->dev, "bootmem ver mismatch %d.%d 
addr:0x%llx\n",
major_version, minor_version,
-   CAST_ULL(oct->bootmem_desc_addr));
+   (long long)oct->bootmem_desc_addr);
return -1;
} else {
return 0;
-- 
2.9.0

[PATCH v3 net-next 1/2] liquidio: standardization: use min_t instead of custom macro

2017-07-26 Thread Felix Manlunas

From: Rick Farrington 

Signed-off-by: Rick Farrington 
Signed-off-by: Derek Chickles 
Signed-off-by: Felix Manlunas 
---
 drivers/net/ethernet/cavium/liquidio/octeon_console.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/cavium/liquidio/octeon_console.c 
b/drivers/net/ethernet/cavium/liquidio/octeon_console.c
index e08f760..501ad95 100644
--- a/drivers/net/ethernet/cavium/liquidio/octeon_console.c
+++ b/drivers/net/ethernet/cavium/liquidio/octeon_console.c
@@ -42,7 +42,6 @@ module_param(console_bitmask, int, 0644);
 MODULE_PARM_DESC(console_bitmask,
 "Bitmask indicating which consoles have debug output 
redirected to syslog.");
 
-#define MIN(a, b) min((a), (b))
 #define CAST_ULL(v) ((u64)(v))
 
 #define BOOTLOADER_PCI_READ_BUFFER_DATA_ADDR0x0006c008
@@ -704,7 +703,7 @@ static int octeon_console_read(struct octeon_device *oct, 
u32 console_num,
if (bytes_to_read <= 0)
return bytes_to_read;
 
-   bytes_to_read = MIN(bytes_to_read, (s32)buf_size);
+   bytes_to_read = min_t(s32, bytes_to_read, buf_size);
 
/* Check to see if what we want to read is not contiguous, and limit
 * ourselves to the contiguous block
-- 
2.9.0

[PATCH v3 net-next 0/2] liquidio: standardization and cleanup

2017-07-26 Thread Felix Manlunas

From: Rick Farrington 

This patchset corrects some non-standard macro usage.

1. Replaced custom MIN macro with use of standard 'min_t'.
2. Removed cryptic and misleading macro 'CAST_ULL'.

change log:
V1 -> V2:
  1. Add driver cleanup of macro 'CAST_ULL'.
V2 -> V3:
  1. Remove extra parentheses from previous usage of macro 'CAST_ULL'.

Rick Farrington (2):
  liquidio: standardization: use min_t instead of custom macro
  liquidio: cleanup: removed cryptic and misleading macro

 drivers/net/ethernet/cavium/liquidio/octeon_console.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

-- 
2.9.0

Re: [PATCH v7 2/3] PCI: Enable PCIe Relaxed Ordering if supported

2017-07-26 Thread Casey Leedom

| From: Alexander Duyck 
| Sent: Wednesday, July 26, 2017 11:44 AM
| 
| On Jul 26, 2017 11:26 AM, "Casey Leedom"  wrote:
| |
| |     I think that the patch will need to be extended to modify
| |     drivers/pci.c/iov.c:sriov_enable() to explicitly turn off
| |     Relaxed Ordering Enable if the Root Complex is marked
|     for no RO TLPs.
| 
| I'm not sure that would be an issue. Wouldn't most VFs inherit the PF's 
settings?

Ah yes, you're right.  This is covered in section 3.5.4 of the Single Root I/O
Virtualization and Sharing Specification, Revision 1.0 (September 11, 2007),
governing the PCIe Capability Device Control register.  It states that the VF
version of that register shall follow the setting of the corresponding PF.

So we should enhance the cxgb4vf/sge.c:t4vf_sge_alloc_rxq() in the same
way we did for the cxgb4 driver, but that's not critical since the Relaxed
Ordering Enable supersedes the internal chip's desire to use the Relaxed
Ordering Attribute.

Ding, send me a note if you'd like me to work that up for you.

| Also I thought most of the VF configuration space is read only.

Yes, but not all of it.  And when a VF is exported to a Virtual Machine,
then the Hypervisor captures and interprets all accesses to the VF's
PCIe Configuration Space from the VM.

Thanks again for reminding me of the subtle aspect of the SR_IOV
specification that I forgot.

Casey

[PATCH net-next] net: phy: Remove stale comments referencing timer

2017-07-26 Thread Florian Fainelli

Since commit a390d1f379cf ("phylib: convert state_queue work to
delayed_work"), the PHYLIB state machine was converted to use delayed
workqueues, yet some functions were still referencing the PHY library
timer in their comments, fix that and remove the now unused
linux/timer.h include.

Signed-off-by: Florian Fainelli 
---
 drivers/net/phy/phy.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index d0626bf5c540..ac1dcf0289fa 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -30,7 +30,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -705,8 +704,8 @@ EXPORT_SYMBOL(phy_start_aneg);
  *
  * Description: The PHY infrastructure can run a state machine
  *   which tracks whether the PHY is starting up, negotiating,
- *   etc.  This function starts the timer which tracks the state
- *   of the PHY.  If you want to maintain your own state machine,
+ *   etc.  This function starts the delayed workqueue which tracks
+ *   the state of the PHY. If you want to maintain your own state machine,
  *   do not call this function.
  */
 void phy_start_machine(struct phy_device *phydev)
@@ -737,9 +736,9 @@ void phy_trigger_machine(struct phy_device *phydev, bool 
sync)
  * phy_stop_machine - stop the PHY state machine tracking
  * @phydev: target phy_device struct
  *
- * Description: Stops the state machine timer, sets the state to UP
- *   (unless it wasn't up yet). This function must be called BEFORE
- *   phy_detach.
+ * Description: Stops the state machine delayed workqueue, sets the
+ *   state to UP (unless it wasn't up yet). This function must be
+ *   called BEFORE phy_detach.
  */
 void phy_stop_machine(struct phy_device *phydev)
 {
-- 
2.9.3

Re: TCP fast retransmit issues

2017-07-26 Thread Neal Cardwell

On Wed, Jul 26, 2017 at 2:38 PM, Neal Cardwell  wrote:
> Yeah, it looks like I can reproduce this issue with (1) bad sacks
> causing repeated TLPs, and (2) TLPs timers being pushed out to later
> times due to incoming data. Scripts are attached.

I'm testing a fix of only scheduling a TLP if (flag & FLAG_DATA_ACKED)
is true...

neal

Re: [PATCH net] ipv6: no need to return rt->dst.error if it is not null entry.

2017-07-26 Thread David Ahern

On 7/26/17 12:55 PM, Roopa Prabhu wrote:
> On Wed, Jul 26, 2017 at 11:49 AM, David Ahern  wrote:
>> On 7/26/17 12:27 PM, Roopa Prabhu wrote:
>>> agreed...so looks like the check in v3 should be
>>>
>>>
>>> +   if ( rt == net->ipv6.ip6_null_entry ||
>>> +(rt->dst.error &&
>>> + #ifdef CONFIG_IPV6_MULTIPLE_TABLES
>>> +  rt != net->ipv6.ip6_prohibit_entry &&
>>> +  rt != net->ipv6.ip6_blk_hole_entry &&
>>> +#endif
>>> + )) {
>>> err = rt->dst.error;
>>> ip6_rt_put(rt);
>>> goto errout;
>>>
>>
>> I don't think so. If I add a prohibit route and use the fibmatch
>> attribute, I want to see the route from the FIB that was matched.
> 
> 
> yes, exactly. wouldn't  'rt != net->ipv6.ip6_prohibit_entry' above let
> it fall through to the route fill code ?
> 
> ah...but i guess you are saying that they will have rt6_info's of
> their own and will not match. got it. ack.
> 

This:

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 4d30c96a819d..24de81c804c2 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -3637,11 +3637,6 @@ static int inet6_rtm_getroute(struct sk_buff
*in_skb, struct nlmsghdr *nlh,
dst = ip6_route_lookup(net, , 0);

rt = container_of(dst, struct rt6_info, dst);
-   if (rt->dst.error) {
-   err = rt->dst.error;
-   ip6_rt_put(rt);
-   goto errout;
-   }

if (rt == net->ipv6.ip6_null_entry) {
err = rt->dst.error;

Puts back the original behavior. In that case, only rt == null_entry
drops to the error path which is correct. All other rt values will drop
to rt6_fill_node and return rt data.

Re: [PATCH net] ipv6: no need to return rt->dst.error if it is not null entry.

2017-07-26 Thread Roopa Prabhu

On Wed, Jul 26, 2017 at 11:49 AM, David Ahern  wrote:
> On 7/26/17 12:27 PM, Roopa Prabhu wrote:
>> agreed...so looks like the check in v3 should be
>>
>>
>> +   if ( rt == net->ipv6.ip6_null_entry ||
>> +(rt->dst.error &&
>> + #ifdef CONFIG_IPV6_MULTIPLE_TABLES
>> +  rt != net->ipv6.ip6_prohibit_entry &&
>> +  rt != net->ipv6.ip6_blk_hole_entry &&
>> +#endif
>> + )) {
>> err = rt->dst.error;
>> ip6_rt_put(rt);
>> goto errout;
>>
>
> I don't think so. If I add a prohibit route and use the fibmatch
> attribute, I want to see the route from the FIB that was matched.


yes, exactly. wouldn't  'rt != net->ipv6.ip6_prohibit_entry' above let
it fall through to the route fill code ?

ah...but i guess you are saying that they will have rt6_info's of
their own and will not match. got it. ack.

Re: [PATCH] netpoll: Fix device name check in netpoll_setup()

2017-07-26 Thread Doug Anderson

Hi,

On Tue, Jul 25, 2017 at 11:36 AM, Matthias Kaehlcke  wrote:
> Apparently netpoll_setup() assumes that netpoll.dev_name is a pointer
> when checking if the device name is set:
>
> if (np->dev_name) {
>   ...
>
> However the field is a character array, therefore the condition always
> yields true. Check instead whether the first byte of the array has a
> non-zero value.
>
> Signed-off-by: Matthias Kaehlcke 
> ---
>  net/core/netpoll.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/core/netpoll.c b/net/core/netpoll.c
> index 8357f164c660..912731bed7b7 100644
> --- a/net/core/netpoll.c
> +++ b/net/core/netpoll.c
> @@ -666,7 +666,7 @@ int netpoll_setup(struct netpoll *np)
> int err;
>
> rtnl_lock();
> -   if (np->dev_name) {
> +   if (np->dev_name[0]) {
> struct net *net = current->nsproxy->net_ns;
> ndev = __dev_get_by_name(net, np->dev_name);
> }

It's really up to the maintainer of the code, but my first instinct
here would be to instead remove the "if" test unless we really expect
dev->dev_name to be blank in lots of cases.  It will slightly slow
down the error case but should avoid an "if" test in the non-error
case.  By definition it should be safe since currently the "if" test
should always evaluate to true.

-Doug

Re: [PATCH net] ipv6: no need to return rt->dst.error if it is not null entry.

2017-07-26 Thread David Ahern

On 7/26/17 12:27 PM, Roopa Prabhu wrote:
> agreed...so looks like the check in v3 should be
> 
> 
> +   if ( rt == net->ipv6.ip6_null_entry ||
> +(rt->dst.error &&
> + #ifdef CONFIG_IPV6_MULTIPLE_TABLES
> +  rt != net->ipv6.ip6_prohibit_entry &&
> +  rt != net->ipv6.ip6_blk_hole_entry &&
> +#endif
> + )) {
> err = rt->dst.error;
> ip6_rt_put(rt);
> goto errout;
> 

I don't think so. If I add a prohibit route and use the fibmatch
attribute, I want to see the route from the FIB that was matched.

Re: [PATCHv3 net] ipv6: no need to return rt->dst.error if it is prohibit entry

2017-07-26 Thread David Ahern

On 7/26/17 11:09 AM, David Ahern wrote:
> I don't understand why you are focused on the built-in null and prohibit
> route entries. 

I see. You are using fib rules for the prohibit entry; I am using an
explicit route entry.

If I run 'ip ro get fibmatch' for the latter I want to see that route
entry since it is a route in the FIB:

# ip -6 ro get fibmatch vrf red 5000::1
prohibit 5000::/120 dev lo table red metric 1024 error -13 pref medium

So there are multiple cases to verify.

Re: TCP fast retransmit issues

2017-07-26 Thread Neal Cardwell

On Wed, Jul 26, 2017 at 1:06 PM, Neal Cardwell  wrote:
> On Wed, Jul 26, 2017 at 12:43 PM, Neal Cardwell  wrote:
>> (2) It looks like there is a bug in the sender code where it seems to
>> be repeatedly using a TLP timer firing 211ms after every ACK is
>> received to transmit another TLP probe (a new packet in this case).
>> Somehow these weird invalid SACKs seem to be triggering a code path
>> that makes us think we can send another TLP, when we probably should
>> be firing an RTO. That's my interpretation, anyway. I will try to
>> reproduce this with packetdrill.
>
> Hmm. It looks like this might be a general issue, where any time we
> get an ACK that doesn't ACK/SACK anything new (whether because it's
> incoming data in a bi-directional flow, or a middlebox breaking the
> SACKs), then we schedule a TLP timer further out in time. Probably we
> should only push the TLP timer out if something is cumulatively ACKed.
>
> But that's not a trivial thing to do, because by the time we are
> deciding whether to schedule another TLP, we have already canceled the
> previous TLP and reinstalled an RTO. Hmm.

Yeah, it looks like I can reproduce this issue with (1) bad sacks
causing repeated TLPs, and (2) TLPs timers being pushed out to later
times due to incoming data. Scripts are attached.

neal


tlp-bad-sacks.pkt
Description: Binary data


tlp-bidirectional.pkt
Description: Binary data

Re: [PATCH v2 2/4] can: fixed-transceiver: Add documentation for CAN fixed transceiver bindings

2017-07-26 Thread Franklin S Cooper Jr

On 07/26/2017 12:05 PM, Oliver Hartkopp wrote:
> On 07/26/2017 06:41 PM, Andrew Lunn wrote:
>> On Mon, Jul 24, 2017 at 06:05:19PM -0500, Franklin S Cooper Jr wrote:
> 
>>> +
>>> +Optional:
>>> + max-arbitration-speed: a positive non 0 value that determines the max
>>> +speed that CAN can run in non CAN-FD mode or during the
>>> +arbitration phase in CAN-FD mode.
>>
>> Hi Franklin
>>
>> Since this and the next property are optional, it is good to document
>> what happens when they are not in the DT blob. Also document what 0
>> means.

The driver ignores values less than 0 with the exception being
max-data-speed which supports a value of -1. Not sure what I'm
documenting when the binding specifically says to use a positive non
zero value. Its the same reason I don't document what happens if you
give it a negative value.

>>
>>> +
>>> + max-data-speed:a positive non 0 value that determines the max
>>> data rate
>>> +that can be used in CAN-FD mode. A value of -1 implies
>>> +CAN-FD is not supported by the transceiver.
>>
>> -1 is ugly. I think it would be better to have a missing
>> max-data-speed property indicate that CAN-FD is not supported.
> 

Although this leads to your later point I don't think this is the right
approach. Its an optional property. Not including the property should
not assume it isn't supported.

> Thanks Andrew! I had the same feeling about '-1' :-)

Ok I'll go back to having 0 = not supported. Which will handle the
documentation comment above.

> 
>> And
>> maybe put 'fd' into the property name.
> 
> Good point. In fact the common naming to set bitrates for CAN(FD)
> controllers are 'bitrate' and 'data bitrate'.
> 
> 'speed' is not really a good word for that.

I'm fine with switching to using bitrate instead of speed. Kurk was
originally the one that suggested to use the term arbitration and data
since thats how the spec refers to it. Which I do agree with. But your
right that in the drivers (struct can_priv) we just use bittiming and
data_bittiming (CAN-FD timings). I don't think adding "fd" into the
property name makes sense unless we are calling it something like
"max-canfd-bitrate" which I would agree is the easiest to understand.

So what is the preference if we end up sticking with two properties?
Option 1 or 2?

1)
max-bitrate
max-data-bitrate

2)
max-bitrate
max-canfd-bitrate

> 
> Finally, @Franklin:
> 
> A CAN transceiver is limited in bandwidth. But you only have one RX and
> one TX line between the CAN controller and the CAN transceiver. The
> transceiver does not know about CAN FD - it has just a physical(!) layer
> with a limited bandwidth. This is ONE limitation.
> 
> So I tend to specify only ONE 'max-bitrate' property for the
> fixed-transceiver binding.
> 
> The fact whether the CAN controller is CAN FD capable or not is provided
> by the netlink configuration interface for CAN controllers.

Part of the reasoning to have two properties is to indicate that you
don't support CAN FD while limiting the "arbitration" bit rate. With one
property you can not determine this and end up having to make some
assumptions that can quickly end up biting people.

> 
> Regards,
> Oliver
>

Re: [PATCH net] ipv6: no need to return rt->dst.error if it is not null entry.

2017-07-26 Thread Roopa Prabhu

On Wed, Jul 26, 2017 at 10:18 AM, David Ahern  wrote:
> On 7/25/17 1:32 AM, Hangbin Liu wrote:
>> On Mon, Jul 24, 2017 at 09:28:07PM -0600, David Ahern wrote:
>>> On 7/24/17 6:08 PM, Hangbin Liu wrote:
 That's why I think we should remove both rt->dst.error and ip6_null_entry
 check in inet6_rtm_getroute(). And even further, remove the ip6_null_entry
 check in rt6_dump_route().
>>>
>>> git blame net/ipv6/route.c
>>>
>>> find the commits and review.
>>
>> Hi David,
>>
>> Sorry, would you like to give me more hints?
>>
>> I saw your commit 1f17e2f2c8a8 ("net: ipv6: ignore null_entry on route
>> dumps"). But I think this issue has been fixed by
>>
>> 2f460933f58e ("ipv6: initialize route null entry in addrconf_init()") and
>> 242d3a49a2a1 ("ipv6: reorder ip6_route_dev_notifier after ipv6_dev_notf")
>>
>> I use a reproducer which add unreachable route in netns with lo down. And I
>> could not trigger Panic in the fixed kernel. That's why I think we could
>> remove ip6_null_entry check in rt6_dump_route().
>
> Understood. Cong's patch to fix the intialization order (lo device
> before route init) makes sure the idev is not null. That said, the
> null_entry route is internal ipv6 logic and is not a route entry to be
> returned to userspace.

agreed...so looks like the check in v3 should be


+   if ( rt == net->ipv6.ip6_null_entry ||
+(rt->dst.error &&
+ #ifdef CONFIG_IPV6_MULTIPLE_TABLES
+  rt != net->ipv6.ip6_prohibit_entry &&
+  rt != net->ipv6.ip6_blk_hole_entry &&
+#endif
+ )) {
err = rt->dst.error;
ip6_rt_put(rt);
goto errout;

Re: [PATCH v7 2/3] PCI: Enable PCIe Relaxed Ordering if supported

2017-07-26 Thread Casey Leedom

  By the way Ding, two issues:

 1. Did we ever get any acknowledgement from either Intel or AMD
on this patch?  I know that we can't ensure that, but it sure would
be nice since the PCI Quirks that we're putting in affect their
products.

 2. I just realized that there's still a small hole in the patch with
respect to PCIe SR-IOV Virtual Functions.  When the PCI Quirk
notices a problematic PCIe Device and marks it to note that
it's not "happy" receiving Transaction Layer Packets with the
Relaxed Ordering Attribute set, it's my understanding that your
current patch iterates down the PCIe Fabric and turns off the
PCIe Capability Device Control[Relaxed Ordering Enable].
But this scan may miss any SR-IOV VFs because they
probably won't be instantiated at that time.  And, at a later
time, when they are instantiated, they could well have their
Relaxed Ordering Enable set.

I think that the patch will need to be extended to modify
drivers/pci.c/iov.c:sriov_enable() to explicitly turn off
Relaxed Ordering Enable if the Root Complex is marked
for no RO TLPs.

Casey

Re: [PATCH 0/4] Configuring traffic classes via new hardware offload mechanism in tc/mqprio

2017-07-26 Thread Nambiar, Amritha




On 7/21/2017 2:42 AM, Richard Cochran wrote:

On Mon, May 22, 2017 at 12:31:12PM -0700, Jeff Kirsher wrote:

On Fri, 2017-05-19 at 17:58 -0700, Amritha Nambiar wrote:

The following series introduces a new harware offload mode in tc/mqprio
where the TCs, the queue configurations and bandwidth rate limits are
offloaded to the hardware.


...


This was meant to be sent out as an RFC, but apparently that did not get
conveyed when these were sent out Friday.


I am looking at tc/mqprio and dcb with the purpose of implementing

  - Forwarding and Queuing Enhancements for Time-Sensitive Streams (FQTSS)
802.1Q-2014 Clause 34

  - Scheduled Traffic (time based scheduling)
P802.1Qbv

using the HW capabilities of the i210.  This series looks like a
promising avenue for these features.

My question is, did series go anywhere?

I didn't see any follow ups on netdev, but maybe I missed something.


I have submitted another series as non-RFC for next-queue maintained by 
Jeff Kirsher. This is going to come through the next-queue tree. You can 
follow the new series "[PATCH 0/6] Configuring traffic classes via new 
hardware offload mechanism in tc/mqprio" at 
https://www.mail-archive.com/netdev@vger.kernel.org/msg177390.html




Thanks,
Richard

[PATCH net-next 3/3] nfp: only use direct firmware requests

2017-07-26 Thread Jakub Kicinski

request_firmware() will fallback to user space helper and may cause
long delays when driver is loaded if udev doesn't correctly handle
FW requests.  Since we never really made use of the user space
helper functionality switch to the simpler request_firmware_direct()
call.  The side effect of this change is that no warning will be
printed when the FW image does not exists.  To help users figure
out which FW file is missing print a info message when we request
each file.

Signed-off-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/nfp_main.c | 42 +--
 1 file changed, 26 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_main.c 
b/drivers/net/ethernet/netronome/nfp/nfp_main.c
index 13d056da0765..dd769eceb33d 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_main.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_main.c
@@ -174,6 +174,21 @@ static int nfp_pcie_sriov_configure(struct pci_dev *pdev, 
int num_vfs)
return nfp_pcie_sriov_enable(pdev, num_vfs);
 }
 
+static const struct firmware *
+nfp_net_fw_request(struct pci_dev *pdev, struct nfp_pf *pf, const char *name)
+{
+   const struct firmware *fw = NULL;
+   int err;
+
+   err = request_firmware_direct(, name, >dev);
+   nfp_info(pf->cpp, "  %s: %s\n",
+name, err ? "not found" : "found, loading...");
+   if (err)
+   return NULL;
+
+   return fw;
+}
+
 /**
  * nfp_net_fw_find() - Find the correct firmware image for netdev mode
  * @pdev:  PCI Device structure
@@ -184,29 +199,30 @@ static int nfp_pcie_sriov_configure(struct pci_dev *pdev, 
int num_vfs)
 static const struct firmware *
 nfp_net_fw_find(struct pci_dev *pdev, struct nfp_pf *pf)
 {
-   const struct firmware *fw = NULL;
struct nfp_eth_table_port *port;
+   const struct firmware *fw;
const char *fw_model;
char fw_name[256];
const u8 *serial;
-   int spc, err = 0;
u16 interface;
-   int i, j;
+   int spc, i, j;
+
+   nfp_info(pf->cpp, "Looking for firmware file in order of priority:\n");
 
/* First try to find a firmware image specific for this device */
interface = nfp_cpp_interface(pf->cpp);
nfp_cpp_serial(pf->cpp, );
sprintf(fw_name, "netronome/serial-%pMF-%02hhx-%02hhx.nffw",
serial, interface >> 8, interface & 0xff);
-   err = request_firmware_direct(, fw_name, >dev);
-   if (!err)
-   goto done;
+   fw = nfp_net_fw_request(pdev, pf, fw_name);
+   if (fw)
+   return fw;
 
/* Then try the PCI name */
sprintf(fw_name, "netronome/pci-%s.nffw", pci_name(pdev));
-   err = request_firmware_direct(, fw_name, >dev);
-   if (!err)
-   goto done;
+   fw = nfp_net_fw_request(pdev, pf, fw_name);
+   if (fw)
+   return fw;
 
/* Finally try the card type and media */
if (!pf->eth_tbl) {
@@ -241,13 +257,7 @@ nfp_net_fw_find(struct pci_dev *pdev, struct nfp_pf *pf)
if (spc <= 0)
return NULL;
 
-   err = request_firmware(, fw_name, >dev);
-   if (err)
-   return NULL;
-done:
-   dev_info(>dev, "Loading FW image: %s\n", fw_name);
-
-   return fw;
+   return nfp_net_fw_request(pdev, pf, fw_name);
 }
 
 /**
-- 
2.11.0

[PATCH net-next 1/3] nfp: remove the probe deferral when FW not present

2017-07-26 Thread Jakub Kicinski

We use a hack to defer probe when firmware was not pre-loaded
or found on disk.  This helps in case users forgot to include
firmware in initramfs, the driver will most likely get another
shot at probing after real root is mounted.

This is not for what EPROBE_DEFER is supposed to be used, and
when FW is completely missing every time new device is probed
NFP will reprobe spamming kernel logs.

Remove this hack, users will now have to make sure the right
firmware image is present in initramfs if nfp.ko is placed
there or built in.

Signed-off-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/nfp_net_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_main.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_main.c
index 5797dbf2b507..d5e2361f0e86 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_main.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_main.c
@@ -704,7 +704,7 @@ int nfp_net_pci_probe(struct nfp_pf *pf)
if (!pf->rtbl) {
nfp_err(pf->cpp, "No %s, giving up.\n",
pf->fw_loaded ? "symbol table" : "firmware found");
-   return -EPROBE_DEFER;
+   return -EINVAL;
}
 
mutex_lock(>lock);
-- 
2.11.0

[PATCH net-next 2/3] nfp: look for firmware image by device serial number and PCI name

2017-07-26 Thread Jakub Kicinski

We generally look up firmware by card type, but that doesn't allow
users who have more than one card of the same type in their system
to select firmware per adapter.

Unfortunately user space firmware helper seems fraught with
difficulties and to be on its way out.  In particular support for
handling firmware uevents have been dropped from systemd and most
distributions don't enable the FW fallback by default any more.

To allow users selecting firmware for a particular device look up
firmware names by serial and pci_name().  Use the direct lookup to
disable generating uevents when enabled in Kconfig and not print
any warnings to logs if adapter-specific files are missing.  Users
can place in /lib/firmware/netronome files named:

pci-${pci_name}.nffw
serial-${serial}.nffw

to target a specific card.  E.g.:

pci-:04:00.0.nffw
pci-:82:00.0.nffw
serial-00-aa-bb-11-22-33-10-ff.nffw

We use the full serial number including the interface id, as it
appears in lspci output (bytes separated by '-').

Signed-off-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/nfp_main.c | 20 +++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_main.c 
b/drivers/net/ethernet/netronome/nfp/nfp_main.c
index d67969d3e484..13d056da0765 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_main.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_main.c
@@ -188,9 +188,27 @@ nfp_net_fw_find(struct pci_dev *pdev, struct nfp_pf *pf)
struct nfp_eth_table_port *port;
const char *fw_model;
char fw_name[256];
+   const u8 *serial;
int spc, err = 0;
+   u16 interface;
int i, j;
 
+   /* First try to find a firmware image specific for this device */
+   interface = nfp_cpp_interface(pf->cpp);
+   nfp_cpp_serial(pf->cpp, );
+   sprintf(fw_name, "netronome/serial-%pMF-%02hhx-%02hhx.nffw",
+   serial, interface >> 8, interface & 0xff);
+   err = request_firmware_direct(, fw_name, >dev);
+   if (!err)
+   goto done;
+
+   /* Then try the PCI name */
+   sprintf(fw_name, "netronome/pci-%s.nffw", pci_name(pdev));
+   err = request_firmware_direct(, fw_name, >dev);
+   if (!err)
+   goto done;
+
+   /* Finally try the card type and media */
if (!pf->eth_tbl) {
dev_err(>dev, "Error: can't identify media config\n");
return NULL;
@@ -226,7 +244,7 @@ nfp_net_fw_find(struct pci_dev *pdev, struct nfp_pf *pf)
err = request_firmware(, fw_name, >dev);
if (err)
return NULL;
-
+done:
dev_info(>dev, "Loading FW image: %s\n", fw_name);
 
return fw;
-- 
2.11.0

[PATCH net-next 0/3] nfp: extend firmware request logic

2017-07-26 Thread Jakub Kicinski

Hi!

We have been pondering for some time how to support loading different
application firmwares onto NFP.  We want to support both users selecting
one of the firmware images provided by Netronome (which are optimized
for different use cases each) as well as firmware images created  by 
users themselves or other companies.

In the end we decided to go with the simplest solution - depending on
the right firmware image being placed in /lib/firmware.  This vastly
simplifies driver logic and also doesn't require any new API.

Different NICs on one system may want to run different applications
therefore we first try to load firmware specific to the device (by 
serial number of PCI slot) and if not present try the device model
based name we have been using so far.


Jakub Kicinski (3):
  nfp: remove the probe deferral when FW not present
  nfp: look for firmware image by device serial number and PCI name
  nfp: only use direct firmware requests

 drivers/net/ethernet/netronome/nfp/nfp_main.c | 48 ++-
 drivers/net/ethernet/netronome/nfp/nfp_net_main.c |  2 +-
 2 files changed, 39 insertions(+), 11 deletions(-)

-- 
2.11.0

Re: [PATCH RFC 00/13] phylink and sfp support

2017-07-26 Thread Russell King - ARM Linux

On Wed, Jul 26, 2017 at 06:44:11PM +0200, Andrew Lunn wrote:
> Do you have a git branch we can look at to see these changes. It helps
> understand the API when you can see both sides of it.

git://git.armlinux.org.uk/~rmk/linux-arm.git phy

or head to the cgit at:

git.armlinux.org.uk/cgit/linux-arm.git/log/?h=phy

Not only does that include the mvneta changes, but also the changes to
the 88x3310 driver (although incomplete, but functional.)

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

Re: [RFC] switchdev: clarify ndo_get_phys_port_name() formats

2017-07-26 Thread Andy Gospodarek

On Tue, Jul 25, 2017 at 07:34:47PM -0700, Jakub Kicinski wrote:
> On Tue, 25 Jul 2017 21:48:15 -0400, Andy Gospodarek wrote:
> > On Tue, Jul 25, 2017 at 03:26:47PM -0700, Jakub Kicinski wrote:
> > > On Tue, 25 Jul 2017 11:22:41 -0400, Andy Gospodarek wrote:  
> > > > On Mon, Jul 24, 2017 at 10:13:44PM -0700, Jakub Kicinski wrote:  
> > > > > We are still in position where we can suggest uniform naming
> > > > > convention for ndo_get_phys_port_name().  switchdev.txt file
> > > > > already contained a suggestion of how to name external ports.
> > > > > Since the use of switchdev for SR-IOV NIC's eswitches is growing,
> > > > > establish a format for ports of those devices as well.
> > > > > 
> > > > > Signed-off-by: Jakub Kicinski 
> > > > 
> > > > This is a nice addition and I suspect there could be even more done to
> > > > update this file to cover the VF rep usage.
> > > >   
> > > > > ---
> > > > >  Documentation/networking/switchdev.txt | 14 +++---
> > > > >  1 file changed, 11 insertions(+), 3 deletions(-)
> > > > > 
> > > > > diff --git a/Documentation/networking/switchdev.txt 
> > > > > b/Documentation/networking/switchdev.txt
> > > > > index 3e7b946dea27..7c4b6025fb4b 100644
> > > > > --- a/Documentation/networking/switchdev.txt
> > > > > +++ b/Documentation/networking/switchdev.txt
> > > > > @@ -119,9 +119,17 @@ into 4 10G ports, resulting in 4 port netdevs, 
> > > > > the device can give a unique
> > > > >  SUBSYSTEM=="net", ACTION=="add", 
> > > > > ATTR{phys_switch_id}=="", \
> > > > >   ATTR{phys_port_name}!="", NAME="swX$attr{phys_port_name}"
> > > > >  
> > > > > -Suggested naming convention is "swXpYsZ", where X is the switch name 
> > > > > or ID, Y
> > > > > -is the port name or ID, and Z is the sub-port name or ID.  For 
> > > > > example, sw1p1s0
> > > > > -would be sub-port 0 on port 1 on switch 1.
> > > > > +Suggested formats of the port name returned by 
> > > > > ndo_get_phys_port_name are:
> > > > > + - pA for external ports;
> > > > > + - pAsB   for split external ports;
> > > > > + - pfCfor PF ports (so called PF representors);
> > > > > + - pfCvfD for VF ports (so called VF representors).
> > > > 
> > > > I hate to clutter this up, but might be also need to add:
> > > > 
> > > >  - pfCsBfor split PF ports (so called PF representors);
> > > >  - pfCsBvfD for split VF ports (so called VF representors).
> > > > 
> > > > or are we comfortable that these additions to the name for split ports
> > > > are implied?  
> > > 
> > > Hm..  What is a split PF port?  Splits happen on the physical port - see
> > > my rant on the thread this is a reply to ;)  PFs are PCIe functions,
> > > on the opposite side of the eswitch from the wires.  
> > 
> > I'm with you that I think there is value in separate netdevs to
> > represent "PFs, VFs and external ports/MACs" -- particularly for the
> > use-case you to create rules to control PF<->VF traffic.
> > 
> > So while I'm not saying it is a _great_ idea to support such a thing as
> > port-splitting of PFs, I suggested this addition as I'm not willing to 
> > restrict
> > such a design/implementation if a vendor or customer desired.  It seemed
> > useful to provde some guidance on how to name them -- even if we do not
> > like them.  :-)
> 
> If I understand you correctly split PF would be a situation where
> device has multiple port instances on the PCIe PF side?  IOW switch sees
> multiple endpoints on the PF side?  Let me attempt an ASCII diagram :)
> 
>
> HOST A ||  HOST B  
>||  
> PF A   | V | V | V | V ||   PF B| V | V | V   
> 
>| F | F | F | F ... ||   | F | F | F ...   
> 
>  port A0 | port A1 | 0 | 1 | 2 | 3 || port B0 | port B1 | 0 | 1 | 2   
>  
>||  
>  PCI Express link  ||PCI Express link  
> \  \  \  |   |   |  ||  /   /   / 
>  \  \  \ |   |   |  || /   /   /  
>   \__\__\'   |   |  |'/___/___/
>  /---\
>  
>  |<<==   |
>  | ==>>  |
>  | SR-IOV e-switch   |
>  
>  |<<==   |
>  
>  | ==>>  |
>   
>  \---/ 
>   |

Re: [PATCH net-next v2 01/10] net: dsa: lan9303: Fixed MDIO interface

2017-07-26 Thread Andrew Lunn

> > So I really want to group the patches into only a few series in order
> > to not spend months on the process.

I strongly agree with Vivien here. Good patches get accepted in about
3 days. You should expect feedback within a day or two. That allows
you to have fast cycle times for getting patches in.

Andrew

Re: [PATCH net-next v2 08/10] net: dsa: lan9303: Added ALR/fdb/mdb handling

2017-07-26 Thread Andrew Lunn

Hi Egil

> +/* This function will wait a while until mask & reg == value */
> +/* Otherwise, return timeout */
> +static int lan9303_csr_reg_wait(struct lan9303 *chip, int regno,
> + int mask, char value)
> +{
> + int i;
> +
> + for (i = 0; i < 0x1000; i++) {
> + u32 reg;
> +
> + lan9303_read_switch_reg(chip, regno, );
> + if ((reg & mask) == value)
> + return 0;
> + }
> + return -ETIMEDOUT;

Busy looping is probably not a good idea. Can you add a usleep()?

> +}
> +
> +static int _lan9303_alr_make_entry_raw(struct lan9303 *chip, u32 dat0, u32 
> dat1)

What does the _ indicate. I could understand having it when you have
lan9303_alr_make_entry_raw() call _lan9303_alr_make_entry_raw() after
taking a lock, but i don't see anything like that here.

> +{
> + lan9303_write_switch_reg(
> + chip, LAN9303_SWE_ALR_WR_DAT_0, dat0);
> + lan9303_write_switch_reg(
> + chip, LAN9303_SWE_ALR_WR_DAT_1, dat1);
> + lan9303_write_switch_reg(
> + chip, LAN9303_SWE_ALR_CMD, ALR_CMD_MAKE_ENTRY);
> + lan9303_csr_reg_wait(
> + chip, LAN9303_SWE_ALR_CMD_STS, ALR_STS_MAKE_PEND, 0);
> + lan9303_write_switch_reg(chip, LAN9303_SWE_ALR_CMD, 0);
> + return 0;
> +}
> +
> +typedef void alr_loop_cb_t(
> + struct lan9303 *chip, u32 dat0, u32 dat1, int portmap, void *ctx);
> +
> +static void lan9303_alr_loop(struct lan9303 *chip, alr_loop_cb_t *cb, void 
> *ctx)
> +{
> + int i;
> +
> + lan9303_write_switch_reg(chip, LAN9303_SWE_ALR_CMD, ALR_CMD_GET_FIRST);
> + lan9303_write_switch_reg(chip, LAN9303_SWE_ALR_CMD, 0);
> +
> + for (i = 1; i < LAN9303_NUM_ALR_RECORDS; i++) {
> + u32 dat0, dat1;
> + int alrport, portmap;
> +
> + lan9303_read_switch_reg(chip, LAN9303_SWE_ALR_RD_DAT_0, );
> + lan9303_read_switch_reg(chip, LAN9303_SWE_ALR_RD_DAT_1, );
> + if (dat1 & ALR_DAT1_END_OF_TABL)
> + break;
> +
> + alrport = (dat1 & ALR_DAT1_PORT_MASK) >> ALR_DAT1_PORT_BITOFFS;
> + portmap = alrport_2_portmap[alrport];
> +
> + cb(chip, dat0, dat1, portmap, ctx);
> +
> + lan9303_write_switch_reg(
> + chip, LAN9303_SWE_ALR_CMD, ALR_CMD_GET_NEXT);
> + lan9303_write_switch_reg(chip, LAN9303_SWE_ALR_CMD, 0);
> + }
> +}
> +
> +/* ALR: lan9303_alr_loop callback functions */
> +
> +static void _alr_reg_to_mac(u32 dat0, u32 dat1, u8 mac[6])
> +{
> + mac[0] = (dat0 >>  0) & 0xff;
> + mac[1] = (dat0 >>  8) & 0xff;
> + mac[2] = (dat0 >> 16) & 0xff;
> + mac[3] = (dat0 >> 24) & 0xff;
> + mac[4] = (dat1 >>  0) & 0xff;
> + mac[5] = (dat1 >>  8) & 0xff;
> +}
> +
> +/* Clear learned (non-static) entry on given port */
> +static void alr_loop_cb_del_port_learned(
> + struct lan9303 *chip, u32 dat0, u32 dat1, int portmap, void *ctx)
> +{
> + int *port = ctx;
> +
> + if (((BIT(*port) & portmap) == 0) || (dat1 & ALR_DAT1_STATIC))
> + return;
> +
> + /* learned entries has only one port, we can just delete */
> + dat1 &= ~ALR_DAT1_VALID; /* delete entry */
> + _lan9303_alr_make_entry_raw(chip, dat0, dat1);
> +}
> +
> +struct port_fdb_dump_ctx {
> + int port;
> + struct switchdev_obj_port_fdb *fdb;
> + switchdev_obj_dump_cb_t   *cb;
> +};
> +
> +static void alr_loop_cb_fdb_port_dump(
> + struct lan9303 *chip, u32 dat0, u32 dat1, int portmap, void *ctx)
> +{
> + struct port_fdb_dump_ctx *dump_ctx = ctx;
> + struct switchdev_obj_port_fdb *fdb = dump_ctx->fdb;
> + u8 mac[ETH_ALEN];
> +
> + if ((BIT(dump_ctx->port) & portmap) == 0)
> + return;
> +
> + _alr_reg_to_mac(dat0, dat1, mac);
> + ether_addr_copy(fdb->addr, mac);
> + fdb->vid = 0;
> + fdb->ndm_state = (dat1 & ALR_DAT1_STATIC) ?
> + NUD_NOARP : NUD_REACHABLE;
> + dump_ctx->cb(>obj);
> +}
> +
> +/* ALR: Add/modify/delete ALR entries */
> +
> +/* Set a static ALR entry. Delete entry if port_map is zero */
> +static void _lan9303_alr_set_entry(struct lan9303 *chip, const u8 *mac,
> +u8 port_map, bool stp_override)
> +{
> + u32 dat0, dat1, alr_port;
> +
> + dat1 = ALR_DAT1_STATIC;
> + if (port_map)
> + dat1 |= ALR_DAT1_VALID; /* otherwise no ports: delete entry */
> + if (stp_override)
> + dat1 |= ALR_DAT1_AGE_OVERRID;
> +
> + alr_port = portmap_2_alrport[port_map & 7];
> + dat1 &= ~ALR_DAT1_PORT_MASK;
> + dat1 |= alr_port << ALR_DAT1_PORT_BITOFFS;
> +
> + dat0 = 0;
> + dat0 |= (mac[0] << 0);
> + dat0 |= (mac[1] << 8);
> + dat0 |= (mac[2] << 16);
> + dat0 |= (mac[3] << 24);
> +
> + dat1 |= (mac[4] << 0);
> + dat1 |= (mac[5] << 8);
> +
> + dev_dbg(chip->dev, "%s %pM %d %08x %08x\n",
> + __func__, mac, port_map, dat0,

Re: [PATCH net-next v2 07/10] net: dsa: lan9303: Added basic offloading of unicast traffic

2017-07-26 Thread Andrew Lunn

Hi Egil

> +/* forward special tagged packets from port 0 to port 1 *or* port 2 */
> +static int lan9303_setup_tagging(struct lan9303 *chip)
> +{
> + int ret;

Blank line please.


> + /* enable defining the destination port via special VLAN tagging
> +  * for port 0
> +  */
> + ret = lan9303_write_switch_reg(chip, LAN9303_SWE_INGRESS_PORT_TYPE,
> +0x03);

#define for 0x03.

> + if (ret)
> + return ret;
> +
> + /* tag incoming packets at port 1 and 2 on their way to port 0 to be
> +  * able to discover their source port
> +  */
> + return lan9303_write_switch_reg(
> + chip, LAN9303_BM_EGRSS_PORT_TYPE,
> + LAN9303_BM_EGRSS_PORT_TYPE_SPECIAL_TAG_PORT0);
> +}
> +
>  /* We want a special working switch:
>   * - do not forward packets between port 1 and 2
>   * - forward everything from port 1 to port 0
>   * - forward everything from port 2 to port 0
> - * - forward special tagged packets from port 0 to port 1 *or* port 2
>   */
>  static int lan9303_separate_ports(struct lan9303 *chip)
>  {
> @@ -534,22 +555,6 @@ static int lan9303_separate_ports(struct lan9303 *chip)
>   if (ret)
>   return ret;
>  
> - /* enable defining the destination port via special VLAN tagging
> -  * for port 0
> -  */
> - ret = lan9303_write_switch_reg(chip, LAN9303_SWE_INGRESS_PORT_TYPE,
> -0x03);
> - if (ret)
> - return ret;
> -
> - /* tag incoming packets at port 1 and 2 on their way to port 0 to be
> -  * able to discover their source port
> -  */
> - ret = lan9303_write_switch_reg(chip, LAN9303_BM_EGRSS_PORT_TYPE,
> - LAN9303_BM_EGRSS_PORT_TYPE_SPECIAL_TAG_PORT0);
> - if (ret)
> - return ret;
> -
>   /* prevent port 1 and 2 from forwarding packets by their own */
>   return lan9303_write_switch_reg(chip, LAN9303_SWE_PORT_STATE,
>   LAN9303_SWE_PORT_STATE_FORWARDING_PORT0 |
> @@ -557,6 +562,12 @@ static int lan9303_separate_ports(struct lan9303 *chip)
>   LAN9303_SWE_PORT_STATE_BLOCKING_PORT2);
>  }
>  
> +static void lan9303_bridge_ports(struct lan9303 *chip)
> +{
> + /* ports bridged: remove mirroring */
> + lan9303_write_switch_reg(chip, LAN9303_SWE_PORT_MIRROR, 0);
> +}
> +
>  static int lan9303_handle_reset(struct lan9303 *chip)
>  {
>   if (!chip->reset_gpio)
> @@ -707,6 +718,10 @@ static int lan9303_setup(struct dsa_switch *ds)
>   return -EINVAL;
>   }
>  
> + ret = lan9303_setup_tagging(chip);
> + if (ret)
> + dev_err(chip->dev, "failed to setup port tagging %d\n", ret);
> +
>   ret = lan9303_separate_ports(chip);
>   if (ret)
>   dev_err(chip->dev, "failed to separate ports %d\n", ret);
> @@ -898,17 +913,81 @@ static void lan9303_port_disable(struct dsa_switch *ds, 
> int port,
>   }
>  }
>  
> +static int lan9303_port_bridge_join(struct dsa_switch *ds, int port,
> + struct net_device *br)
> +{
> + struct lan9303 *chip = ds->priv;
> +
> + dev_dbg(chip->dev, "%s(port %d)\n", __func__, port);
> + if (ds->ports[1].bridge_dev ==  ds->ports[2].bridge_dev) {
> + lan9303_bridge_ports(chip);
> + chip->is_bridged = true;  /* unleash stp_state_set() */
> + }
> +
> + return 0;
> +}
> +
> +static void lan9303_port_bridge_leave(struct dsa_switch *ds, int port,
> +   struct net_device *br)
> +{
> + struct lan9303 *chip = ds->priv;
> +
> + dev_dbg(chip->dev, "%s(port %d)\n", __func__, port);
> + if (chip->is_bridged) {
> + lan9303_separate_ports(chip);
> + chip->is_bridged = false;
> + }
> +}
> +
> +static void lan9303_port_stp_state_set(struct dsa_switch *ds, int port,
> +u8 state)
> +{
> + int portmask, portstate;
> + struct lan9303 *chip = ds->priv;
> +
> + dev_dbg(chip->dev, "%s(port %d, state %d)\n",
> + __func__, port, state);
> + if (!chip->is_bridged)
> + return;

I think you are over-simplifying here. Say i have a layer 2 VPN and i
bridge port 1 and the VPN? The software bridge still wants to do STP
on port 1, in order to solve loops.

> +
> + switch (state) {
> + case BR_STATE_DISABLED:
> + portstate = LAN9303_SWE_PORT_STATE_DISABLED_PORT0;
> + break;
> + case BR_STATE_BLOCKING:
> + case BR_STATE_LISTENING:
> + portstate = LAN9303_SWE_PORT_STATE_BLOCKING_PORT0;
> + break;
> + case BR_STATE_LEARNING:
> + portstate = LAN9303_SWE_PORT_STATE_LEARNING_PORT0;
> + break;
> + case BR_STATE_FORWARDING:
> + portstate = LAN9303_SWE_PORT_STATE_FORWARDING_PORT0;
> + break;
> + default:
> +

Re: [PATCH] net: phy: Log only PHY state transitions

2017-07-26 Thread Florian Fainelli

On 07/25/2017 12:33 PM, David Miller wrote:
> From: Marc Gonzalez 
> Date: Tue, 25 Jul 2017 11:31:46 +0200
> 
>> In the current code, old and new PHY states are always logged.
>> From now on, log only PHY state transitions.
>>
>> Signed-off-by: Marc Gonzalez 
>> ---
>>  drivers/net/phy/phy.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
>> index d0626bf5c540..6bb764e716fc 100644
>> --- a/drivers/net/phy/phy.c
>> +++ b/drivers/net/phy/phy.c
>> @@ -1226,7 +1226,8 @@ void phy_state_machine(struct work_struct *work)
>>  if (err < 0)
>>  phy_error(phydev);
>>  
>> -phydev_dbg(phydev, "PHY state change %s -> %s\n",
>> +if (old_state != phydev->state)
>> +phydev_dbg(phydev, "PHY state change %s -> %s\n",
>> phy_state_to_str(old_state),
>> phy_state_to_str(phydev->state));
> 
> Something is not kosher with this indentation at all.

Indeed, please align the two phy_state_to_str() where the opening
parenthesis for phydev_dbg() starts, such this visually looks like:

if (old_state != phydev->state)
phydev_dbg(phydev, "PHY state change %s -> %s\n",
   phy_state_to_str(old_state),
   phy_state_to_str(phydev->state));

Thanks!
-- 
Florian

[PATCH] net: inet: diag: expose sockets cgroup classid

2017-07-26 Thread Levin, Alexander (Sasha Levin)

This is useful for directly looking up a task based on class id rather than
having to scan through all open file descriptors.

Signed-off-by: Sasha Levin 
---
 include/uapi/linux/inet_diag.h |  1 +
 net/ipv4/inet_diag.c   | 10 ++
 2 files changed, 11 insertions(+)

diff --git a/include/uapi/linux/inet_diag.h b/include/uapi/linux/inet_diag.h
index bbe201047df6..678496897a68 100644
--- a/include/uapi/linux/inet_diag.h
+++ b/include/uapi/linux/inet_diag.h
@@ -142,6 +142,7 @@ enum {
INET_DIAG_PAD,
INET_DIAG_MARK,
INET_DIAG_BBRINFO,
+   INET_DIAG_CLASS_ID,
__INET_DIAG_MAX,
 };
 
diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
index 3828b3a805cd..ffc6dad9780a 100644
--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -274,6 +274,16 @@ int inet_sk_diag_fill(struct sock *sk, struct 
inet_connection_sock *icsk,
goto errout;
}
 
+   if (ext & (1 << (INET_DIAG_CLASS_ID - 1))) {
+   u32 classid = 0;
+
+#ifdef CONFIG_SOCK_CGROUP_DATA
+   classid = sock_cgroup_classid(>sk_cgrp_data);
+#endif
+
+   nla_put(skb, INET_DIAG_CLASS_ID, sizeof(classid), );
+   }
+
 out:
nlmsg_end(skb, nlh);
return 0;
-- 
2.11.0

Re: [PATCH net] ipv6: no need to return rt->dst.error if it is not null entry.

2017-07-26 Thread David Ahern

On 7/25/17 1:32 AM, Hangbin Liu wrote:
> On Mon, Jul 24, 2017 at 09:28:07PM -0600, David Ahern wrote:
>> On 7/24/17 6:08 PM, Hangbin Liu wrote:
>>> That's why I think we should remove both rt->dst.error and ip6_null_entry
>>> check in inet6_rtm_getroute(). And even further, remove the ip6_null_entry
>>> check in rt6_dump_route().
>>
>> git blame net/ipv6/route.c
>>
>> find the commits and review.
> 
> Hi David,
> 
> Sorry, would you like to give me more hints?
> 
> I saw your commit 1f17e2f2c8a8 ("net: ipv6: ignore null_entry on route
> dumps"). But I think this issue has been fixed by
> 
> 2f460933f58e ("ipv6: initialize route null entry in addrconf_init()") and
> 242d3a49a2a1 ("ipv6: reorder ip6_route_dev_notifier after ipv6_dev_notf")
> 
> I use a reproducer which add unreachable route in netns with lo down. And I
> could not trigger Panic in the fixed kernel. That's why I think we could
> remove ip6_null_entry check in rt6_dump_route().

Understood. Cong's patch to fix the intialization order (lo device
before route init) makes sure the idev is not null. That said, the
null_entry route is internal ipv6 logic and is not a route entry to be
returned to userspace.

Re: [PATCH net-next v2 06/10] net: dsa: lan9303: added sysfs node swe_bcst_throt

2017-07-26 Thread Andrew Lunn

On Tue, Jul 25, 2017 at 06:15:49PM +0200, Egil Hjelmeland wrote:
> Allowing per-port access to Switch Engine Broadcast Throttling Register

Hi Egil

In general, we are against using sysfs. If there is a generic
mechanism, that applies for all sorts of network interfaces, it should
be used instead of sysfs.

Is this intended to reduce the effect of broadcast storms?

   Andrew

Re: [PATCH net-next v2 04/10] net: dsa: lan9303: Added adjust_link() method

2017-07-26 Thread Andrew Lunn

On Tue, Jul 25, 2017 at 06:15:47PM +0200, Egil Hjelmeland wrote:
> This makes the driver react to device tree "fixed-link" declaration
> on CPU port.
> 
> - turn off autonegotiation
> - force speed 10 or 100 mb/s
> - force duplex mode
> 
> Signed-off-by: Egil Hjelmeland 
> ---
>  drivers/net/dsa/lan9303-core.c | 33 +
>  1 file changed, 33 insertions(+)
> 
> diff --git a/drivers/net/dsa/lan9303-core.c b/drivers/net/dsa/lan9303-core.c
> index 0806a0684d55..be6d78f45a5f 100644
> --- a/drivers/net/dsa/lan9303-core.c
> +++ b/drivers/net/dsa/lan9303-core.c
> @@ -17,6 +17,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "lan9303.h"
>  
> @@ -746,6 +747,37 @@ static int lan9303_phy_write(struct dsa_switch *ds, int 
> phy, int regnum,
>   return chip->ops->phy_write(chip, phy, regnum, val);
>  }
>  
> +static void lan9303_adjust_link(struct dsa_switch *ds, int port,
> + struct phy_device *phydev)
> +{
> + struct lan9303 *chip = ds->priv;
> +
> + int ctl, res;
> +
> + ctl = lan9303_phy_read(ds, port, MII_BMCR);
> +
> + if (!phy_is_pseudo_fixed_link(phydev))
> + return;
> +

Hi Egil

Maybe do this check before reading MII_BMCR?

> + ctl &= ~BMCR_ANENABLE;

Should this also mask out BMCR_SPEED100 and DUPLEX_FULL? Otherwise how
do you select 10/Half if it is already configured for 100/Full?

> + if (phydev->speed == SPEED_100)
> + ctl |= BMCR_SPEED100;
> +
> + if (phydev->duplex == DUPLEX_FULL)
> + ctl |= BMCR_FULLDPLX;
> +
> + res =  lan9303_phy_write(ds, port, MII_BMCR, ctl);
> +
> + if (port == chip->phy_addr_sel_strap) {
> + /* Virtual Phy: Remove Turbo 200Mbit mode */
> + lan9303_read(chip->regmap, LAN9303_VIRT_SPECIAL_CTRL, );
> +
> + ctl &= ~(1 << 10); // TURBO BIT

BIT(10), or better still something like BIT(LAN9303_VIRT_SPECIAL_TURBO)

> + res =  regmap_write(chip->regmap,
> + LAN9303_VIRT_SPECIAL_CTRL, ctl);
> + }
> +}

  Andrew

Re: [PATCHv3 net] ipv6: no need to return rt->dst.error if it is prohibit entry

2017-07-26 Thread David Ahern

On 7/26/17 3:20 AM, Hangbin Liu wrote:
> After commit 18c3a61c4264 ("net: ipv6: RTM_GETROUTE: return matched fib
> result when requested"). When we get a prohibit ertry, we will return
> -EACCES directly.
> 
> Before:

Do you mean "Before commit 18c3a61c4264?"

> + ip netns exec client ip -6 route get 2003::1
> prohibit 2003::1 dev lo table unspec proto kernel src 2001::1 metric
> 4294967295 error -13
> 
> After:

And "After commit 18c3a61c4264?"

> + ip netns exec server ip -6 route get 2002::1
> RTNETLINK answers: Permission denied
> 
> Fix this by add prohibit and blk hole check.
> 
> At the same time, after commit
> 2f460933f58e ("ipv6: initialize route null entry in addrconf_init()") and
> 242d3a49a2a1 ("ipv6: reorder ip6_route_dev_notifier after ipv6_dev_notf")
> We will init rt6i_idev correctly. So we could dump ip6_null_entry
> (unreachable route entry) safely now.
> 
> Fixes: 18c3a61c4264 ("net: ipv6: RTM_GETROUTE: return matched fib...")
> Signed-off-by: Hangbin Liu 
> ---
>  net/ipv6/route.c | 13 ++---
>  1 file changed, 6 insertions(+), 7 deletions(-)
> 
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index 4d30c96..b05da74 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -3637,13 +3637,12 @@ static int inet6_rtm_getroute(struct sk_buff *in_skb, 
> struct nlmsghdr *nlh,
>   dst = ip6_route_lookup(net, , 0);
>  
>   rt = container_of(dst, struct rt6_info, dst);
> - if (rt->dst.error) {
> - err = rt->dst.error;
> - ip6_rt_put(rt);
> - goto errout;
> - }
> -
> - if (rt == net->ipv6.ip6_null_entry) {
> + if (rt->dst.error &&
> +#ifdef CONFIG_IPV6_MULTIPLE_TABLES
> + rt != net->ipv6.ip6_prohibit_entry &&
> + rt != net->ipv6.ip6_blk_hole_entry &&
> +#endif
> + rt != net->ipv6.ip6_null_entry) {
>   err = rt->dst.error;
>   ip6_rt_put(rt);
>   goto errout;
> 

This is what I see with your patch:

# ip -6 ro ls vrf red
2001:db8:1::/120 dev eth1 proto kernel metric 256  pref medium
prohibit 5000::/120 dev lo metric 1024  error -13 pref medium
fe80::/64 dev eth1 proto kernel metric 256  pref medium
ff00::/8 dev eth1 metric 256  pref medium
unreachable default dev lo metric 8192  error -113 pref medium

ie., I added a prohibit route for 5000:/120

and then running:
# ip -6 ro get vrf red 5000::1
RTNETLINK answers: Permission denied

Which is the behavior without your patch.

Now if I delete just the first bit:

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 4d30c96a819d..8fc52de40175 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -3637,12 +3637,6 @@ static int inet6_rtm_getroute(struct sk_buff
*in_skb, struct nlmsghdr *nlh,
dst = ip6_route_lookup(net, , 0);

rt = container_of(dst, struct rt6_info, dst);
-   if (rt->dst.error) {
-   err = rt->dst.error;
-   ip6_rt_put(rt);
-   goto errout;
-   }
-
if (rt == net->ipv6.ip6_null_entry) {
err = rt->dst.error;
ip6_rt_put(rt);

Then I get:

# ip -6 ro get vrf red 5000::1
prohibit 5000::1 from :: dev lo table red src 2001:db8::2 metric 1024
error -13 pref medium

which seems to be your objective.

I don't understand why you are focused on the built-in null and prohibit
route entries. When I add a default unreachable or prohibit route those
are different rt6_info entries. Take a look at ip6_route_info_create.

Re: TCP fast retransmit issues

2017-07-26 Thread Neal Cardwell

On Wed, Jul 26, 2017 at 12:43 PM, Neal Cardwell  wrote:
> (1) Because the connection negotiated SACK, the Linux TCP sender does
> not get to its tcp_add_reno_sack() code to count dupacks and enter
> fast recovery on the 3rd dupack. The sender keeps waiting for specific
> packets to be SACKed that would signal that something has probably
> been lost. We could probably mitigate this by having the sender turn
> off SACK once it sees SACKed ranges that are completely invalid (way
> out of window). Then it should use the old non-SACK "Recovery on 3rd
> dupack" path.
>
> (2) It looks like there is a bug in the sender code where it seems to
> be repeatedly using a TLP timer firing 211ms after every ACK is
> received to transmit another TLP probe (a new packet in this case).
> Somehow these weird invalid SACKs seem to be triggering a code path
> that makes us think we can send another TLP, when we probably should
> be firing an RTO. That's my interpretation, anyway. I will try to
> reproduce this with packetdrill.

Hmm. It looks like this might be a general issue, where any time we
get an ACK that doesn't ACK/SACK anything new (whether because it's
incoming data in a bi-directional flow, or a middlebox breaking the
SACKs), then we schedule a TLP timer further out in time. Probably we
should only push the TLP timer out if something is cumulatively ACKed.

But that's not a trivial thing to do, because by the time we are
deciding whether to schedule another TLP, we have already canceled the
previous TLP and reinstalled an RTO. Hmm.

neal

Re: [PATCH v2 2/4] can: fixed-transceiver: Add documentation for CAN fixed transceiver bindings

2017-07-26 Thread Oliver Hartkopp


On 07/26/2017 06:41 PM, Andrew Lunn wrote:

On Mon, Jul 24, 2017 at 06:05:19PM -0500, Franklin S Cooper Jr wrote:



+
+Optional:
+ max-arbitration-speed: a positive non 0 value that determines the max
+   speed that CAN can run in non CAN-FD mode or during the
+   arbitration phase in CAN-FD mode.


Hi Franklin

Since this and the next property are optional, it is good to document
what happens when they are not in the DT blob. Also document what 0
means.


+
+ max-data-speed:   a positive non 0 value that determines the max data rate
+   that can be used in CAN-FD mode. A value of -1 implies
+   CAN-FD is not supported by the transceiver.


-1 is ugly. I think it would be better to have a missing
max-data-speed property indicate that CAN-FD is not supported.


Thanks Andrew! I had the same feeling about '-1' :-)


And
maybe put 'fd' into the property name.


Good point. In fact the common naming to set bitrates for CAN(FD) 
controllers are 'bitrate' and 'data bitrate'.


'speed' is not really a good word for that.

Finally, @Franklin:

A CAN transceiver is limited in bandwidth. But you only have one RX and 
one TX line between the CAN controller and the CAN transceiver. The 
transceiver does not know about CAN FD - it has just a physical(!) layer 
with a limited bandwidth. This is ONE limitation.


So I tend to specify only ONE 'max-bitrate' property for the 
fixed-transceiver binding.


The fact whether the CAN controller is CAN FD capable or not is provided 
by the netlink configuration interface for CAN controllers.


Regards,
Oliver

[RFC] net: make net.core.{r,w}mem_{default,max} namespaced

2017-07-26 Thread Matteo Croce

The following sysctl are global and can't be read or set from a netns:

net.core.rmem_default
net.core.rmem_max
net.core.wmem_default
net.core.wmem_max

Make the following sysctl parameters available from within a network
namespace, allowing to set unique values per network namespace.

My concern is about the initial value of this sysctl in the newly
creates netns: I'm not sure if is better to copy them from the init
namespace or set them to the default values.

Setting them to the default value has the advantage that a new namespace
behaves like a freshly booted system, while copying them from the init
netns has the advantage of keeping the current behaviour as the values
from the init netns are used.

Signed-off-by: Matteo Croce 
---
 include/net/netns/core.h|  5 +++
 include/net/sock.h  |  6 
 include/net/tcp.h   |  3 +-
 net/core/net_namespace.c| 22 +
 net/core/sock.c | 31 +-
 net/core/sysctl_net_core.c  | 70 ++---
 net/ipv4/ip_output.c|  2 +-
 net/ipv4/syncookies.c   |  3 +-
 net/ipv4/tcp_minisocks.c|  3 +-
 net/ipv4/tcp_output.c   | 12 ---
 net/ipv6/syncookies.c   |  3 +-
 net/netfilter/ipvs/ip_vs_sync.c |  4 +--
 12 files changed, 89 insertions(+), 75 deletions(-)

diff --git a/include/net/netns/core.h b/include/net/netns/core.h
index 78eb1ff75475..9b613162467d 100644
--- a/include/net/netns/core.h
+++ b/include/net/netns/core.h
@@ -9,6 +9,11 @@ struct netns_core {
struct ctl_table_header *sysctl_hdr;
 
int sysctl_somaxconn;
+   u32 sysctl_wmem_max;
+   u32 sysctl_rmem_max;
+
+   u32 sysctl_wmem_default;
+   u32 sysctl_rmem_default;
 
struct prot_inuse __percpu *inuse;
 };
diff --git a/include/net/sock.h b/include/net/sock.h
index 7c0632c7e870..e62a279e420f 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -2363,13 +2363,7 @@ bool sk_net_capable(const struct sock *sk, int cap);
 
 void sk_get_meminfo(const struct sock *sk, u32 *meminfo);
 
-extern __u32 sysctl_wmem_max;
-extern __u32 sysctl_rmem_max;
-
 extern int sysctl_tstamp_allow_data;
 extern int sysctl_optmem_max;
 
-extern __u32 sysctl_wmem_default;
-extern __u32 sysctl_rmem_default;
-
 #endif /* _SOCK_H */
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 70483296157f..460f4373d42a 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1300,7 +1300,8 @@ static inline void tcp_slow_start_after_idle_check(struct 
sock *sk)
 /* Determine a window scaling and initial window to offer. */
 void tcp_select_initial_window(int __space, __u32 mss, __u32 *rcv_wnd,
   __u32 *window_clamp, int wscale_ok,
-  __u8 *rcv_wscale, __u32 init_rcv_wnd);
+  __u8 *rcv_wscale, __u32 init_rcv_wnd,
+  __u32 rmem_max);
 
 static inline int tcp_win_from_space(int space)
 {
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 8726d051f31d..2d72b2bd6eab 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -23,6 +23,16 @@
 #include 
 #include 
 
+/* Take into consideration the size of the struct sk_buff overhead in the
+ * determination of these values, since that is non-constant across
+ * platforms.  This makes socket queueing behavior and performance
+ * not depend upon such differences.
+ */
+#define _SK_MEM_PACKETS256
+#define _SK_MEM_OVERHEAD   SKB_TRUESIZE(256)
+#define SK_WMEM_MAX(_SK_MEM_OVERHEAD * _SK_MEM_PACKETS)
+#define SK_RMEM_MAX(_SK_MEM_OVERHEAD * _SK_MEM_PACKETS)
+
 /*
  * Our network namespace constructor/destructor lists
  */
@@ -318,6 +328,18 @@ static __net_init int setup_net(struct net *net, struct 
user_namespace *user_ns)
 static int __net_init net_defaults_init_net(struct net *net)
 {
net->core.sysctl_somaxconn = SOMAXCONN;
+   if (net_eq(net, _net)) {
+   init_net.core.sysctl_wmem_max = SK_WMEM_MAX;
+   init_net.core.sysctl_rmem_max = SK_RMEM_MAX;
+   init_net.core.sysctl_wmem_default = SK_WMEM_MAX;
+   init_net.core.sysctl_rmem_default = SK_RMEM_MAX;
+   } else {
+   net->core.sysctl_wmem_max = init_net.core.sysctl_wmem_max;
+   net->core.sysctl_rmem_max = init_net.core.sysctl_rmem_max;
+   net->core.sysctl_wmem_default = 
init_net.core.sysctl_wmem_default;
+   net->core.sysctl_rmem_default = 
init_net.core.sysctl_rmem_default;
+   }
+
return 0;
 }
 
diff --git a/net/core/sock.c b/net/core/sock.c
index ac2a404c73eb..8086a660d75f 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -307,24 +307,6 @@ static struct lock_class_key af_wlock_keys[AF_MAX];
 static struct lock_class_key af_elock_keys[AF_MAX];
 static struct lock_class_key

Re: [PATCH net-next v2 02/10] net: dsa: lan9303: Do not disable/enable switch fabric port 0 at startup

2017-07-26 Thread Andrew Lunn

On Tue, Jul 25, 2017 at 06:15:45PM +0200, Egil Hjelmeland wrote:
> For some mysterious reason enable switch fabric port 0 TX fails to
> work, when the TX has previous been disabled. Resolved by not
> disable/enable switch fabric port 0 at startup. Port 1 and 2 are
> still disabled in early init.
> 
> Signed-off-by: Egil Hjelmeland 
> ---
>  drivers/net/dsa/lan9303-core.c | 7 ---
>  1 file changed, 7 deletions(-)
> 
> diff --git a/drivers/net/dsa/lan9303-core.c b/drivers/net/dsa/lan9303-core.c
> index e622db586c3d..c2b53659f58f 100644
> --- a/drivers/net/dsa/lan9303-core.c
> +++ b/drivers/net/dsa/lan9303-core.c
> @@ -557,9 +557,6 @@ static int lan9303_disable_processing(struct lan9303 
> *chip)
>  {
>   int ret;
>  
> - ret = lan9303_disable_packet_processing(chip, LAN9303_PORT_0_OFFSET);
> - if (ret)
> - return ret;
>   ret = lan9303_disable_packet_processing(chip, LAN9303_PORT_1_OFFSET);
>   if (ret)
>   return ret;
> @@ -633,10 +630,6 @@ static int lan9303_setup(struct dsa_switch *ds)
>   if (ret)
>   dev_err(chip->dev, "failed to separate ports %d\n", ret);
>  
> - ret = lan9303_enable_packet_processing(chip, LAN9303_PORT_0_OFFSET);
> - if (ret)
> - dev_err(chip->dev, "failed to re-enable switching %d\n", ret);
> -

Does this mean you are relying on something else enabling port 0? The
bootloader?

I'm wondering if it is better to keep the enable, but remove the
disable?

Andrew

Re: [PATCH net-next v2 01/10] net: dsa: lan9303: Fixed MDIO interface

2017-07-26 Thread Andrew Lunn

On Tue, Jul 25, 2017 at 06:15:44PM +0200, Egil Hjelmeland wrote:
> Fixes after testing on actual HW:
> 
> - lan9303_mdio_write()/_read() must multiply register number
>   by 4 to get offset
> 
> - Indirect access (PMI) to phy register only work in I2C mode. In
>   MDIO mode phy registers must be accessed directly. Introduced
>   struct lan9303_phy_ops to handle the two modes. Renamed functions
>   to clarify.
> 
> - lan9303_detect_phy_setup() : Failed MDIO read return 0x.
>   Handle that.
> 
> Signed-off-by: Egil Hjelmeland 
> ---
>  drivers/net/dsa/lan9303-core.c | 42 
> +++---
>  drivers/net/dsa/lan9303.h  | 11 +++
>  drivers/net/dsa/lan9303_i2c.c  |  2 ++
>  drivers/net/dsa/lan9303_mdio.c | 34 ++
>  4 files changed, 74 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/net/dsa/lan9303-core.c b/drivers/net/dsa/lan9303-core.c
> index cd76e61f1fca..e622db586c3d 100644
> --- a/drivers/net/dsa/lan9303-core.c
> +++ b/drivers/net/dsa/lan9303-core.c
> @@ -20,6 +20,9 @@
>  
>  #include "lan9303.h"
>  
> +/* 13.2 System Control and Status Registers
> + * Multiply register number by 4 to get address offset.
> + */
>  #define LAN9303_CHIP_REV 0x14
>  # define LAN9303_CHIP_ID 0x9303
>  #define LAN9303_IRQ_CFG 0x15
> @@ -53,6 +56,9 @@
>  #define LAN9303_VIRT_PHY_BASE 0x70
>  #define LAN9303_VIRT_SPECIAL_CTRL 0x77
>  
> +/*13.4 Switch Fabric Control and Status Registers
> + * Accessed indirectly via SWITCH_CSR_CMD, SWITCH_CSR_DATA.
> + */
>  #define LAN9303_SW_DEV_ID 0x
>  #define LAN9303_SW_RESET 0x0001
>  #define LAN9303_SW_RESET_RESET BIT(0)
> @@ -242,7 +248,7 @@ static int lan9303_virt_phy_reg_write(struct lan9303 
> *chip, int regnum, u16 val)
>   return regmap_write(chip->regmap, LAN9303_VIRT_PHY_BASE + regnum, val);
>  }
>  
> -static int lan9303_port_phy_reg_wait_for_completion(struct lan9303 *chip)
> +static int lan9303_indirect_phy_wait_for_completion(struct lan9303 *chip)
>  {
>   int ret, i;
>   u32 reg;
> @@ -262,7 +268,7 @@ static int 
> lan9303_port_phy_reg_wait_for_completion(struct lan9303 *chip)
>   return -EIO;
>  }
>  
> -static int lan9303_port_phy_reg_read(struct lan9303 *chip, int addr, int 
> regnum)
> +static int lan9303_indirect_phy_read(struct lan9303 *chip, int addr, int 
> regnum)
>  {
>   int ret;
>   u32 val;
> @@ -272,7 +278,7 @@ static int lan9303_port_phy_reg_read(struct lan9303 
> *chip, int addr, int regnum)
>  
>   mutex_lock(>indirect_mutex);
>  
> - ret = lan9303_port_phy_reg_wait_for_completion(chip);
> + ret = lan9303_indirect_phy_wait_for_completion(chip);
>   if (ret)
>   goto on_error;
>  
> @@ -281,7 +287,7 @@ static int lan9303_port_phy_reg_read(struct lan9303 
> *chip, int addr, int regnum)
>   if (ret)
>   goto on_error;
>  
> - ret = lan9303_port_phy_reg_wait_for_completion(chip);
> + ret = lan9303_indirect_phy_wait_for_completion(chip);
>   if (ret)
>   goto on_error;
>  
> @@ -299,8 +305,8 @@ static int lan9303_port_phy_reg_read(struct lan9303 
> *chip, int addr, int regnum)
>   return ret;
>  }
>  
> -static int lan9303_phy_reg_write(struct lan9303 *chip, int addr, int regnum,
> -  unsigned int val)
> +static int lan9303_indirect_phy_write(struct lan9303 *chip, int addr,
> +   int regnum, u16 val)
>  {
>   int ret;
>   u32 reg;
> @@ -311,7 +317,7 @@ static int lan9303_phy_reg_write(struct lan9303 *chip, 
> int addr, int regnum,
>  
>   mutex_lock(>indirect_mutex);
>  
> - ret = lan9303_port_phy_reg_wait_for_completion(chip);
> + ret = lan9303_indirect_phy_wait_for_completion(chip);
>   if (ret)
>   goto on_error;
>  
> @@ -328,6 +334,11 @@ static int lan9303_phy_reg_write(struct lan9303 *chip, 
> int addr, int regnum,
>   return ret;
>  }
>  
> +const struct lan9303_phy_ops lan9303_indirect_phy_ops = {
> + .phy_read = lan9303_indirect_phy_read,
> + .phy_write = lan9303_indirect_phy_write,
> +};
> +
>  static int lan9303_switch_wait_for_completion(struct lan9303 *chip)
>  {
>   int ret, i;
> @@ -427,14 +438,15 @@ static int lan9303_detect_phy_setup(struct lan9303 
> *chip)
>* Special reg 18 of phy 3 reads as 0x, if 'phy_addr_sel_strap' is 0
>* and the IDs are 0-1-2, else it contains something different from
>* 0x, which means 'phy_addr_sel_strap' is 1 and the IDs are 1-2-3.
> +  * 0x is returned for failed MDIO access.

Hi Egil

0x is not really a failure. It just means there is nothing on the
bus at that address. The bus has a weak pull-up, so defaults to one.

>*/
> - reg = lan9303_port_phy_reg_read(chip, 3, MII_LAN911X_SPECIAL_MODES);
> + reg = chip->ops->phy_read(chip, 3, MII_LAN911X_SPECIAL_MODES);
>   if (reg < 0) {
>   dev_err(chip->dev, "Failed to detect phy

Re: TC-pedit man page examples error

2017-07-26 Thread Tyler Bautista

Hello Phil,

Thank you for taking my message. I have gotten a solution working so
feel free to ignore the rest of this email. By vanilla iproute2 do you
mean the latest package through the yum or apt repository? All I can
tell you is that in the example I gave the iproute2 package was the
latest one through the RHEL 7 yum repository. Otherwise I will need
clarification on what vanilla means. I attempted the above example
code on two other Ubuntu machines, one of which I can confirm is
Ubuntu 16.04 LTS. Unfortunately I am not at that machine right now so
I cannot give you the output. Both machines gave similar error
messages. The reason I emailed you via netdev was because the example
code I used was taken verbatim from the tc-pedit man page on my
machine. The man page also gave me email addresses to contact in case
an error was found with the man page, which is what I did. I did get
the code to work when using the latest version of iproute2, so I
assume the only practical solution is to manually install the latest
version of iproute2 or wait for the various distros of Linux to use
updated iproute2. I appreciate your help. Have a nice day.

Tyler Bautista

On Wed, Jul 26, 2017 at 1:43 AM, Phil Sutter  wrote:
> Hi Tyler,
>
> On Tue, Jul 25, 2017 at 08:33:40AM -0700, Tyler Bautista wrote:
>> To whom it may concern,
>> I recently attempted to use simple tc action pedit commands on the man
>> page and I ran into some errors. The following is some information
>> about my version of iproute and my machine:
>> 
>> the following is my iproute package information
>>
>> Loaded plugins: fastestmirror, langpacks
>> iproute-3.10.0-74.el7.x86_64
>
> This mailing list is for upstream issues only. Can you reproduce the
> problem with vanilla iproute2?
>
> You are using RHEL7's iproute package, which has a number of known
> issues in pedit action:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1322406
>
> These are planned to be resolved for RHEL7.5, not sure when it will land
> in CentOS.
>
> Cheers, Phil

1 2 >

1 - 100 of 199 matches

Mail list logo