date:20170724

[RFC] switchdev: clarify ndo_get_phys_port_name() formats

2017-07-24 Thread Jakub Kicinski

We are still in position where we can suggest uniform naming
convention for ndo_get_phys_port_name().  switchdev.txt file
already contained a suggestion of how to name external ports.
Since the use of switchdev for SR-IOV NIC's eswitches is growing,
establish a format for ports of those devices as well.

Signed-off-by: Jakub Kicinski 
---
 Documentation/networking/switchdev.txt | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/Documentation/networking/switchdev.txt 
b/Documentation/networking/switchdev.txt
index 3e7b946dea27..7c4b6025fb4b 100644
--- a/Documentation/networking/switchdev.txt
+++ b/Documentation/networking/switchdev.txt
@@ -119,9 +119,17 @@ into 4 10G ports, resulting in 4 port netdevs, the device 
can give a unique
 SUBSYSTEM=="net", ACTION=="add", ATTR{phys_switch_id}=="", \
ATTR{phys_port_name}!="", NAME="swX$attr{phys_port_name}"
 
-Suggested naming convention is "swXpYsZ", where X is the switch name or ID, Y
-is the port name or ID, and Z is the sub-port name or ID.  For example, sw1p1s0
-would be sub-port 0 on port 1 on switch 1.
+Suggested formats of the port name returned by ndo_get_phys_port_name are:
+ - pA for external ports;
+ - pAsB   for split external ports;
+ - pfCfor PF ports (so called PF representors);
+ - pfCvfD for VF ports (so called VF representors).
+Where A is the port name or ID; B is the sub-port name or ID; C is PCIe
+Physical Function name or ID and D is PCIe Virtual Function name or ID.
+
+Suggested naming convention for switches is "swX", where X is the switch name
+or ID, plus the port name.  For example, sw1p1s0 would be sub-port 0 on port 1
+on switch 1.
 
 Port Features
 ^
-- 
2.11.0

Re: [PATCH net-next 10/10] bnxt_en: add support for port_attr_get and and get_phys_port_name

2017-07-24 Thread Jakub Kicinski

On Mon, 24 Jul 2017 12:34:29 -0400, Michael Chan wrote:
> From: Sathya Perla 
> 
> This patch adds support for the switchdev_port_attr_get() and
> ndo_get_phys_port_name() methods for the PF and the VF-reps.
> Using this support a user application can deduce that the PF
> (when in the ESWITCH_SWDEV mode) and it's VF-reps form a switch.
> 
> Signed-off-by: Sathya Perla 
> Signed-off-by: Michael Chan 
> ---
>  drivers/net/ethernet/broadcom/bnxt/bnxt.c | 57 
> +++
>  drivers/net/ethernet/broadcom/bnxt/bnxt.h |  2 +
>  drivers/net/ethernet/broadcom/bnxt/bnxt_vfr.c | 31 ++-
>  3 files changed, 89 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c 
> b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> index f262fe6..82cbe18 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> @@ -7546,6 +7546,61 @@ static int bnxt_bridge_setlink(struct net_device *dev, 
> struct nlmsghdr *nlh,
>   return rc;
>  }
>  
> +static int bnxt_get_phys_port_name(struct net_device *dev, char *buf,
> +size_t len)
> +{
> + struct bnxt *bp = netdev_priv(dev);
> + int rc;
> +
> + /* The PF and it's VF-reps only support the switchdev framework */
> + if (!BNXT_PF(bp))
> + return -EOPNOTSUPP;
> +
> + /* The switch-id that the pf belongs to is exported by
> +  * the switchdev ndo. This name is just to distinguish from the
> +  * vf-rep ports.
> +  */
> + rc = snprintf(buf, len, "pf%d", bp->pf.port_id);

This is worrisome.  What do you mean by PF?  In my experience, since
for years PFs had one-to-one association with physical ports people use
the terms interchangeably.  This causes a lot of headaches in proper
eswitch modelling.

I'm not sure whether this is a HW limitation or engineers trying to
make things "easy for users" but most VF-representor patchsets we've
seen on netdev don't include PF representors.  NICs have 3 types of
ports - PFs, VFs and external ports/MACs.  For some reason there is a
tendency to keep pretending PF and physical port is the same entity,
which among other things makes it impossible to install VF->PF rules
(as in from VF to the host, not to the network).

Most high performance NIC vendors also have multi-PF NICs, even though
those are not deployed by wider public.  If we keep pretending PF is
the external port, those will get very awkward to model.  This is a bit
of a pet peeve of mine :)

If this bnxt_get_phys_port_name() relates to the external port, please
change your implementation to comply with
Documentation/networking/switchdev.txt, in particular:

> Suggested naming convention is "swXpYsZ", where X is the switch name or ID, Y
> is the port name or ID, and Z is the sub-port name or ID.  For example, 
> sw1p1s0
> would be sub-port 0 on port 1 on switch 1.

So for physical ports the convention is p, and in case of
breakout ps.

> +static int bnxt_vf_rep_get_phys_port_name(struct net_device *dev, char *buf,
> +   size_t len)
> +{
> + struct bnxt_vf_rep *vf_rep = netdev_priv(dev);
> + int rc;
> +
> + rc = snprintf(buf, len, "vfr%d", vf_rep->vf_idx);

This is even worse.  We already have two naming conventions in the
kernel, mlx5 uses "%d" for legacy reasons.  nfp uses pf%dvf%d for vfs,
which is better for two reasons: (a) it works with multi-PF devices;
(b) naming something representor from switchdev perspective is
pointless, you're not calling the external port netdev a physical port
representor, even though it has the exact same relation to the port as
with VFs (i.e. egressing traffic on the netdev will cause the switch
to send to the given port).

Re: [PATCH 1/2] net: ethernet: ti: cpts: convert to use kthread_worker

2017-07-24 Thread Richard Cochran

On Mon, Jul 24, 2017 at 07:34:38PM -0500, Grygorii Strashko wrote:
> Below if pure TBD/RFC version of patch which add kthread worker to PTP core.
> I'm sending it to get you opinion about implementation in general, before 
> continue with more changes. Pls, take a look if you have time?
> - are you ok with names (API, callbacks, ptp structs members)?

The API and naming looks good to me.
 
> I can prepare, update and resend proper patches tom if feedback is positive.

Please do.

> I also can convert dp83640 driver to use new feature, but I can't test it.

No need for that.  It would be enough to have cpts as the first user
and example.
 
> + if (ptp->info->do_aux_work) {
> + struct sched_param param = {
> + .sched_priority = MAX_RT_PRIO - 1 };
> +
> + kthread_init_delayed_work(&ptp->aux_work, ptp_aux_kworker);
> + ptp->kworker = kthread_create_worker(0, info->name);
> + if (IS_ERR(ptp->kworker)) {
> + pr_err("failed to create ptp aux_worker task %ld\n",
> +PTR_ERR(ptp->kworker));
> + return ERR_CAST(ptp->kworker);
> + }
> + err = sched_setscheduler_nocheck(ptp->kworker->task,
> +  SCHED_FIFO, ¶m);

I think we should not hard code the scheduler and priority here but
rather leave it to the sysadmin to configure these using chrt(1).
After all, a normal work item is has served just in many situations.

> + if (err)
> + pr_err("sched_setscheduler_nocheck err %d\n", err);
> + }
> +
>   err = ptp_populate_pin_groups(ptp);
>   if (err)
>   goto no_pin_groups;
> @@ -274,6 +305,9 @@ int ptp_clock_unregister(struct ptp_clock *ptp)
>   ptp->defunct = 1;
>   wake_up_interruptible(&ptp->tsev_wq);
>  
> + kthread_cancel_delayed_work_sync(&ptp->aux_work);
> + kthread_destroy_worker(ptp->kworker);

These can't be called unconditionally.

>   /* Release the clock's resources. */
>   if (ptp->pps_source)
>   pps_unregister_source(ptp->pps_source);

Thanks,
Richard

Re: [PATCH net-next 10/10] bnxt_en: add support for port_attr_get and and get_phys_port_name

2017-07-24 Thread David Miller

From: Jakub Kicinski 
Date: Mon, 24 Jul 2017 21:06:17 -0700

> On Tue, 25 Jul 2017 10:44:49 +0800, kbuild test robot wrote:
>>drivers/net/ethernet/broadcom/bnxt/bnxt.c: In function 'bnxt_init_one':
>> >> drivers/net/ethernet/broadcom/bnxt/bnxt.c:7897:5: error: 'struct 
>> >> net_device' has no member named 'switchdev_ops'; did you mean 
>> >> 'netdev_ops'?  
>>  dev->switchdev_ops = &bnxt_switchdev_ops;
>> ^~
> 
> Check out SWITCHDEV_SET_OPS() from commit d643a75ac2bc ("net: switchdev:
> add SET_SWITCHDEV_OPS helper").

This was easy enough so I just pushed out:


>From bc88055ab72c0eaa080926c888628b77d2055513 Mon Sep 17 00:00:00 2001
From: "David S. Miller" 
Date: Mon, 24 Jul 2017 21:20:16 -0700
Subject: [PATCH] bnxt_en: Use SWITCHDEV_SET_OPS().

Suggested by Jakub Kicinski.

Fixes: c124a62ff2dd ("bnxt_en: add support for port_attr_get and and 
get_phys_port_name")
Reported-by: kbuild test robot 
Signed-off-by: David S. Miller 
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c 
b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 82cbe1804821..badbc3550338 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -7894,7 +7894,7 @@ static int bnxt_init_one(struct pci_dev *pdev, const 
struct pci_device_id *ent)
dev->netdev_ops = &bnxt_netdev_ops;
dev->watchdog_timeo = BNXT_TX_TIMEOUT;
dev->ethtool_ops = &bnxt_ethtool_ops;
-   dev->switchdev_ops = &bnxt_switchdev_ops;
+   SWITCHDEV_SET_OPS(dev, &bnxt_switchdev_ops);
pci_set_drvdata(pdev, dev);
 
rc = bnxt_alloc_hwrm_resources(bp);
-- 
2.13.3

Re: A buggy behavior for Linux TCP Reno and HTCP

2017-07-24 Thread Stephen Hemminger

On Mon, 24 Jul 2017 16:41:12 -0700
Yuchung Cheng  wrote:

> On Mon, Jul 24, 2017 at 11:29 AM, Neal Cardwell  wrote:
> > On Mon, Jul 24, 2017 at 2:17 PM, Yuchung Cheng  wrote:  
> >> On Sun, Jul 23, 2017 at 7:37 PM, Neal Cardwell  
> >> wrote:  
> >>> On Sun, Jul 23, 2017 at 10:36 PM, Neal Cardwell  
> >>> wrote:  
> > ...  
>  What if we call the field tp->prior_cwnd? Then at least we'd have some
>  nice symmetry:
> 
>  - tp->snd_cwnd,  which is saved in tp->prior_cwnd  (and restored upon 
>  undo)
>  - tp->snd_ssthresh,  which is saved in tp-> prior_ssthresh  (and
>  restored upon undo)
> 
>  That sounds appealing to me. WDYT?  
> >>>
> >>> And, I should add, if we go with the tp->prior_cwnd approach, then we
> >>> can have a single "default"/CUBIC-style undo function, instead of 15
> >>> separate but identical implementations...  
> >> you mean all CC modules share one ca_ops->undo_cwnd function? sounds a
> >> nice consolidation work.  
> >
> > Yes, exactly.
> >
> > Right now we have 9 modules that have identical tcp_foo_cwnd_undo functions:
> >
> > tcp_bic.c:188:  return max(tp->snd_cwnd, ca->loss_cwnd);
> > tcp_cubic.c:378:return max(tcp_sk(sk)->snd_cwnd, ca->loss_cwnd);
> > tcp_dctcp.c:318:return max(tcp_sk(sk)->snd_cwnd, ca->loss_cwnd);
> > tcp_highspeed.c:165:return max(tcp_sk(sk)->snd_cwnd, ca->loss_cwnd);
> > tcp_illinois.c:309: return max(tcp_sk(sk)->snd_cwnd, ca->loss_cwnd);
> > tcp_nv.c:190:   return max(tcp_sk(sk)->snd_cwnd, ca->loss_cwnd);
> > tcp_scalable.c:50:  return max(tcp_sk(sk)->snd_cwnd, ca->loss_cwnd);
> > tcp_veno.c:210: return max(tcp_sk(sk)->snd_cwnd, veno->loss_cwnd);
> > tcp_yeah.c:232: return max(tcp_sk(sk)->snd_cwnd, yeah->loss_cwnd);
> >
> > And if we fix this bug in tcp_reno_undo_cwnd() by referring to
> > ca->loss_cwnd then we will add another 6 like this.
> >
> > So my proposal would be
> >
> > - tp->snd_cwnd,  which is saved in tp->prior_cwnd  (and restored upon undo)
> > - tp->snd_ssthresh,  which is saved in tp-> prior_ssthresh  (and
> >restored upon undo)
> >
> > Actually, now that I re-read the code, we already do have a
> > prior_cwnd, which is used for the PRR code, and already set upon
> > entering CA_Recovery. So if we set prior_cwnd for CA_Loss, perhaps we
> > can do something like:
> >
> > diff --git a/net/ipv4/tcp_cong.c b/net/ipv4/tcp_cong.c
> > index fde983f6376b..c2b174469645 100644
> > --- a/net/ipv4/tcp_cong.c
> > +++ b/net/ipv4/tcp_cong.c
> > @@ -456,7 +456,7 @@ u32 tcp_reno_undo_cwnd(struct sock *sk)
> >  {
> > const struct tcp_sock *tp = tcp_sk(sk);
> >
> > -   return max(tp->snd_cwnd, tp->snd_ssthresh << 1);
> > +   return max(tp->snd_cwnd, tp->prior_cwnd);
> >  }
> >  EXPORT_SYMBOL_GPL(tcp_reno_undo_cwnd);
> >
> > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> > index 2920e0cb09f8..ae790a84302d 100644
> > --- a/net/ipv4/tcp_input.c
> > +++ b/net/ipv4/tcp_input.c
> > @@ -1951,6 +1951,7 @@ void tcp_enter_loss(struct sock *sk)
> > !after(tp->high_seq, tp->snd_una) ||
> > (icsk->icsk_ca_state == TCP_CA_Loss && 
> > !icsk->icsk_retransmits)) {
> > tp->prior_ssthresh = tcp_current_ssthresh(sk);
> > +   tp->prior_cwnd = tp->snd_cwnd;
> > tp->snd_ssthresh = icsk->icsk_ca_ops->ssthresh(sk);
> > tcp_ca_event(sk, CA_EVENT_LOSS);
> > tcp_init_undo(tp);
> >
> > And then change all the CC modules but BBR to use the
> > tcp_reno_undo_cwnd() instead of their own custom undo code.
> >
> > WDYT?  
> Looks reasonable. But we might want to eventually refactor TCP undo
> code: the stats changes (prior_ssthresh, prior_cwnd, undo_marker,
> undo_retrans) are scattered in different helpers, making the code hard
> to audit.

I like having common code as much as possible,
having per CC undo means more variations and sources of errors.

[PATCH net 1/1] Revert "netvsc: optimize calculation of number of slots"

2017-07-24 Thread Stephen Hemminger

The logic for computing page buffer scatter does not take into
account the impact of compound pages. Therefore the optimization
to compute number of slots was incorrect and could cause stack
corruption a skb was sent with lots of fragments from huge pages.

This reverts commit 60b86665af0dfbeebda8aae43f0ba451cd2dcfe5.

Fixes: 60b86665af0d ("netvsc: optimize calculation of number of slots")
Signed-off-by: Stephen Hemminger 
---
 drivers/net/hyperv/netvsc_drv.c | 43 +++--
 1 file changed, 33 insertions(+), 10 deletions(-)

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 63c98bbbc596..0d78727f1a14 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -315,14 +315,34 @@ static u32 init_page_array(void *hdr, u32 len, struct 
sk_buff *skb,
return slots_used;
 }
 
-/* Estimate number of page buffers neede to transmit
- * Need at most 2 for RNDIS header plus skb body and fragments.
- */
-static unsigned int netvsc_get_slots(const struct sk_buff *skb)
+static int count_skb_frag_slots(struct sk_buff *skb)
+{
+   int i, frags = skb_shinfo(skb)->nr_frags;
+   int pages = 0;
+
+   for (i = 0; i < frags; i++) {
+   skb_frag_t *frag = skb_shinfo(skb)->frags + i;
+   unsigned long size = skb_frag_size(frag);
+   unsigned long offset = frag->page_offset;
+
+   /* Skip unused frames from start of page */
+   offset &= ~PAGE_MASK;
+   pages += PFN_UP(offset + size);
+   }
+   return pages;
+}
+
+static int netvsc_get_slots(struct sk_buff *skb)
 {
-   return PFN_UP(offset_in_page(skb->data) + skb_headlen(skb))
-   + skb_shinfo(skb)->nr_frags
-   + 2;
+   char *data = skb->data;
+   unsigned int offset = offset_in_page(data);
+   unsigned int len = skb_headlen(skb);
+   int slots;
+   int frag_slots;
+
+   slots = DIV_ROUND_UP(offset + len, PAGE_SIZE);
+   frag_slots = count_skb_frag_slots(skb);
+   return slots + frag_slots;
 }
 
 static u32 net_checksum_info(struct sk_buff *skb)
@@ -360,18 +380,21 @@ static int netvsc_start_xmit(struct sk_buff *skb, struct 
net_device *net)
struct hv_page_buffer page_buf[MAX_PAGE_BUFFER_COUNT];
struct hv_page_buffer *pb = page_buf;
 
-   /* We can only transmit MAX_PAGE_BUFFER_COUNT number
+   /* We will atmost need two pages to describe the rndis
+* header. We can only transmit MAX_PAGE_BUFFER_COUNT number
 * of pages in a single packet. If skb is scattered around
 * more pages we try linearizing it.
 */
-   num_data_pgs = netvsc_get_slots(skb);
+
+   num_data_pgs = netvsc_get_slots(skb) + 2;
+
if (unlikely(num_data_pgs > MAX_PAGE_BUFFER_COUNT)) {
++net_device_ctx->eth_stats.tx_scattered;
 
if (skb_linearize(skb))
goto no_memory;
 
-   num_data_pgs = netvsc_get_slots(skb);
+   num_data_pgs = netvsc_get_slots(skb) + 2;
if (num_data_pgs > MAX_PAGE_BUFFER_COUNT) {
++net_device_ctx->eth_stats.tx_too_big;
goto drop;
-- 
2.11.0

[PATCH net 0/1] fix bug in netvsc driver

2017-07-24 Thread Stephen Hemminger

Internal testing found a problem under some load tests and the problem
was bisected down to one commit.
This needs to go to current (4.13) and stable (4.12).

Stephen Hemminger (1):
  Revert "netvsc: optimize calculation of number of slots"

 drivers/net/hyperv/netvsc_drv.c | 43 +++--
 1 file changed, 33 insertions(+), 10 deletions(-)

-- 
2.11.0

Re: [PATCH net-next 10/10] bnxt_en: add support for port_attr_get and and get_phys_port_name

2017-07-24 Thread Jakub Kicinski

On Tue, 25 Jul 2017 10:44:49 +0800, kbuild test robot wrote:
>drivers/net/ethernet/broadcom/bnxt/bnxt.c: In function 'bnxt_init_one':
> >> drivers/net/ethernet/broadcom/bnxt/bnxt.c:7897:5: error: 'struct 
> >> net_device' has no member named 'switchdev_ops'; did you mean 
> >> 'netdev_ops'?  
>  dev->switchdev_ops = &bnxt_switchdev_ops;
> ^~

Check out SWITCHDEV_SET_OPS() from commit d643a75ac2bc ("net: switchdev:
add SET_SWITCHDEV_OPS helper").

Re: [PATCH] tun/tap: Add the missed return value check of register_netdevice_notifier

2017-07-24 Thread David Miller

From: Tonghao Zhang 
Date: Tue, 25 Jul 2017 10:43:26 +0800

> One question, this type bugfix will be applied to net-next ?

'net' is periodically merged into 'net-next', so yes.

Re: [PATCH net] ipv6: no need to return rt->dst.error if it is not null entry.

2017-07-24 Thread David Ahern

On 7/24/17 6:08 PM, Hangbin Liu wrote:
> That's why I think we should remove both rt->dst.error and ip6_null_entry
> check in inet6_rtm_getroute(). And even further, remove the ip6_null_entry
> check in rt6_dump_route().

git blame net/ipv6/route.c

find the commits and review.

Re: [PATCH net-next 10/10] bnxt_en: add support for port_attr_get and and get_phys_port_name

2017-07-24 Thread kbuild test robot

Hi Sathya,

[auto build test ERROR on net-next/master]

url:
https://github.com/0day-ci/linux/commits/Michael-Chan/bnxt_en-Update-firmware-interface-spec-to-1-8-0/20170725-094835
config: x86_64-randconfig-x016-201730 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

All errors (new ones prefixed by >>):

   drivers/net/ethernet/broadcom/bnxt/bnxt.c: In function 'bnxt_init_one':
>> drivers/net/ethernet/broadcom/bnxt/bnxt.c:7897:5: error: 'struct net_device' 
>> has no member named 'switchdev_ops'; did you mean 'netdev_ops'?
 dev->switchdev_ops = &bnxt_switchdev_ops;
^~
--
   drivers/net/ethernet/broadcom/bnxt/bnxt_vfr.c: In function 
'bnxt_vf_rep_netdev_init':
>> drivers/net/ethernet/broadcom/bnxt/bnxt_vfr.c:305:5: error: 'struct 
>> net_device' has no member named 'switchdev_ops'; did you mean 'netdev_ops'?
 dev->switchdev_ops = &bnxt_vf_rep_switchdev_ops;
^~
   In file included from include/linux/kernfs.h:13:0,
from include/linux/sysfs.h:15,
from include/linux/kobject.h:21,
from include/linux/pci.h:28,
from drivers/net/ethernet/broadcom/bnxt/bnxt_vfr.c:9:
   drivers/net/ethernet/broadcom/bnxt/bnxt_vfr.c: In function 
'bnxt_dl_eswitch_mode_set':
   drivers/net/ethernet/broadcom/bnxt/bnxt_vfr.c:410:16: error: 'struct bnxt' 
has no member named 'sriov_lock'
 mutex_lock(&bp->sriov_lock);
   ^
   include/linux/mutex.h:164:44: note: in definition of macro 'mutex_lock'
#define mutex_lock(lock) mutex_lock_nested(lock, 0)
   ^~~~
   drivers/net/ethernet/broadcom/bnxt/bnxt_vfr.c:439:18: error: 'struct bnxt' 
has no member named 'sriov_lock'
 mutex_unlock(&bp->sriov_lock);
 ^~

vim +7897 drivers/net/ethernet/broadcom/bnxt/bnxt.c

  7863  
  7864  static int bnxt_init_one(struct pci_dev *pdev, const struct 
pci_device_id *ent)
  7865  {
  7866  static int version_printed;
  7867  struct net_device *dev;
  7868  struct bnxt *bp;
  7869  int rc, max_irqs;
  7870  
  7871  if (pci_is_bridge(pdev))
  7872  return -ENODEV;
  7873  
  7874  if (version_printed++ == 0)
  7875  pr_info("%s", version);
  7876  
  7877  max_irqs = bnxt_get_max_irq(pdev);
  7878  dev = alloc_etherdev_mq(sizeof(*bp), max_irqs);
  7879  if (!dev)
  7880  return -ENOMEM;
  7881  
  7882  bp = netdev_priv(dev);
  7883  
  7884  if (bnxt_vf_pciid(ent->driver_data))
  7885  bp->flags |= BNXT_FLAG_VF;
  7886  
  7887  if (pdev->msix_cap)
  7888  bp->flags |= BNXT_FLAG_MSIX_CAP;
  7889  
  7890  rc = bnxt_init_board(pdev, dev);
  7891  if (rc < 0)
  7892  goto init_err_free;
  7893  
  7894  dev->netdev_ops = &bnxt_netdev_ops;
  7895  dev->watchdog_timeo = BNXT_TX_TIMEOUT;
  7896  dev->ethtool_ops = &bnxt_ethtool_ops;
> 7897  dev->switchdev_ops = &bnxt_switchdev_ops;
  7898  pci_set_drvdata(pdev, dev);
  7899  
  7900  rc = bnxt_alloc_hwrm_resources(bp);
  7901  if (rc)
  7902  goto init_err_pci_clean;
  7903  
  7904  mutex_init(&bp->hwrm_cmd_lock);
  7905  rc = bnxt_hwrm_ver_get(bp);
  7906  if (rc)
  7907  goto init_err_pci_clean;
  7908  
  7909  if (bp->flags & BNXT_FLAG_SHORT_CMD) {
  7910  rc = bnxt_alloc_hwrm_short_cmd_req(bp);
  7911  if (rc)
  7912  goto init_err_pci_clean;
  7913  }
  7914  
  7915  rc = bnxt_hwrm_func_reset(bp);
  7916  if (rc)
  7917  goto init_err_pci_clean;
  7918  
  7919  bnxt_hwrm_fw_set_time(bp);
  7920  
  7921  dev->hw_features = NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM | 
NETIF_F_SG |
  7922 NETIF_F_TSO | NETIF_F_TSO6 |
  7923 NETIF_F_GSO_UDP_TUNNEL | NETIF_F_GSO_GRE |
  7924 NETIF_F_GSO_IPXIP4 |
  7925 NETIF_F_GSO_UDP_TUNNEL_CSUM | 
NETIF_F_GSO_GRE_CSUM |
  7926 NETIF_F_GSO_PARTIAL | NETIF_F_RXHASH |
  7927 NETIF_F_RXCSUM | NETIF_F_GRO;
  7928  
  7929  if (!BNXT_CHIP_TYPE_NITRO_A0(bp))
  7930  dev->hw_features |= NETIF_F_LRO;
  7931  
  7932  dev->hw_enc_features =
  7933  NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM | 
NETIF_F_SG |
  7934  NETIF_F_TSO | NETIF_F_TSO6 |
  7935  NETIF_F_GSO_UDP_TUNNEL | NETIF_F_GSO_GRE |
  7936  NETIF_F_GSO_UDP_TUNNEL_CSUM | 
NETIF_F_GSO_GRE_CSUM |
  7937

Re: [PATCH] tun/tap: Add the missed return value check of register_netdevice_notifier

2017-07-24 Thread Tonghao Zhang

One question, this type bugfix will be applied to net-next ?
> On Jul 25, 2017, at 4:45 AM, David Miller  wrote:
> 
>> 
>> There is some codes of tun/tap module which did not check the return
>> value of register_netdevice_notifier. Add the check now.
>> 
>> Signed-off-by: Tonghao Zhang 
> 
> Applied.

Re: [PATCH net-next 08/10] bnxt_en: add support to enable VF-representors

2017-07-24 Thread kbuild test robot

Hi Sathya,

[auto build test ERROR on net-next/master]

url:
https://github.com/0day-ci/linux/commits/Michael-Chan/bnxt_en-Update-firmware-interface-spec-to-1-8-0/20170725-094835
config: x86_64-randconfig-x016-201730 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

All errors (new ones prefixed by >>):

   In file included from include/linux/kernfs.h:13:0,
from include/linux/sysfs.h:15,
from include/linux/kobject.h:21,
from include/linux/pci.h:28,
from drivers/net/ethernet/broadcom/bnxt/bnxt_vfr.c:9:
   drivers/net/ethernet/broadcom/bnxt/bnxt_vfr.c: In function 
'bnxt_dl_eswitch_mode_set':
>> drivers/net/ethernet/broadcom/bnxt/bnxt_vfr.c:165:16: error: 'struct bnxt' 
>> has no member named 'sriov_lock'
 mutex_lock(&bp->sriov_lock);
   ^
   include/linux/mutex.h:164:44: note: in definition of macro 'mutex_lock'
#define mutex_lock(lock) mutex_lock_nested(lock, 0)
   ^~~~
   drivers/net/ethernet/broadcom/bnxt/bnxt_vfr.c:194:18: error: 'struct bnxt' 
has no member named 'sriov_lock'
 mutex_unlock(&bp->sriov_lock);
 ^~

vim +165 drivers/net/ethernet/broadcom/bnxt/bnxt_vfr.c

   > 9  #include 
10  #include 
11  #include 
12  #include 
13  #include 
14  
15  #include "bnxt_hsi.h"
16  #include "bnxt.h"
17  #include "bnxt_vfr.h"
18  
19  #define CFA_HANDLE_INVALID  0x
20  
21  static void __bnxt_vf_reps_destroy(struct bnxt *bp)
22  {
23  u16 num_vfs = pci_num_vf(bp->pdev);
24  struct bnxt_vf_rep *vf_rep;
25  int i;
26  
27  for (i = 0; i < num_vfs; i++) {
28  vf_rep = bp->vf_reps[i];
29  if (vf_rep) {
30  if (vf_rep->dev) {
31  /* if register_netdev failed, then 
netdev_ops
32   * would have been set to NULL
33   */
34  if (vf_rep->dev->netdev_ops)
35  unregister_netdev(vf_rep->dev);
36  free_netdev(vf_rep->dev);
37  }
38  }
39  }
40  
41  kfree(bp->vf_reps);
42  bp->vf_reps = NULL;
43  }
44  
45  void bnxt_vf_reps_destroy(struct bnxt *bp)
46  {
47  bool closed = false;
48  
49  if (bp->eswitch_mode != DEVLINK_ESWITCH_MODE_SWITCHDEV)
50  return;
51  
52  if (!bp->vf_reps)
53  return;
54  
55  /* Ensure that parent PF's and VF-reps' RX/TX has been quiesced
56   * before proceeding with VF-rep cleanup.
57   */
58  rtnl_lock();
59  if (netif_running(bp->dev)) {
60  bnxt_close_nic(bp, false, false);
61  closed = true;
62  }
63  bp->eswitch_mode = DEVLINK_ESWITCH_MODE_LEGACY;
64  
65  if (closed)
66  bnxt_open_nic(bp, false, false);
67  rtnl_unlock();
68  
69  /* Need to call vf_reps_destroy() outside of rntl_lock
70   * as unregister_netdev takes rtnl_lock
71   */
72  __bnxt_vf_reps_destroy(bp);
73  }
74  
75  /* Use the OUI of the PF's perm addr and report the same mac addr
76   * for the same VF-rep each time
77   */
78  static void bnxt_vf_rep_eth_addr_gen(u8 *src_mac, u16 vf_idx, u8 *mac)
79  {
80  u32 addr;
81  
82  ether_addr_copy(mac, src_mac);
83  
84  addr = jhash(src_mac, ETH_ALEN, 0) + vf_idx;
85  mac[3] = (u8)(addr & 0xFF);
86  mac[4] = (u8)((addr >> 8) & 0xFF);
87  mac[5] = (u8)((addr >> 16) & 0xFF);
88  }
89  
90  static void bnxt_vf_rep_netdev_init(struct bnxt *bp, struct bnxt_vf_rep 
*vf_rep,
91  struct net_device *dev)
92  {
93  struct net_device *pf_dev = bp->dev;
94  
95  /* Just inherit all the featues of the parent PF as the VF-R
96   * uses the RX/TX rings of the parent PF
97   */
98  dev->hw_features = pf_dev->hw_features;
99  dev->gso_partial_features = pf_dev->gso_partial_features;
   100  dev->vlan_features = pf_dev->vlan_features;
   101  dev->hw_enc_features = pf_dev->hw_enc_features;
   102  dev->features |= pf_dev->features;
   103  bnxt_vf_rep_eth_addr_gen(bp->pf.mac_addr, vf_rep->vf_idx,
   104

Re: Bonding driver fails to enable second interface if updelay is non-zero

2017-07-24 Thread Benjamin Gilbert

On Mon, Jul 24, 2017 at 2:34 PM, Mahesh Bandewar (महेश बंडेवार)
 wrote:
> Having said that, the proposed solution (Cong's patch) should fix it, please
> give it a try so that this fix can be formalized.

The patch fixes the problem for me.

Thanks!
--Benjamin Gilbert

Re: SELinux/IP_PASSSEC regression in 4.13-rcX

2017-07-24 Thread Paul Moore

On Mon, Jul 24, 2017 at 3:00 PM, Paul Moore  wrote:
> On Mon, Jul 24, 2017 at 12:09 PM, Paolo Abeni  wrote:
>> Hi,
>>
>> On Mon, 2017-07-24 at 10:42 -0400, Paul Moore wrote:
>>> The change in behavior for userspace makes me a little nervous as
>>> there is no way of knowing how any random application may be coded.
>>> Even if we are confident that the majority of applications set
>>> IP_PASSSEC before calling bind(), we are likely still stuck with a few
>>> that will break, and that means a lot of hard to debug problem
>>> reports.
>>>
>>> I would feel much more comfortable if we could preserve the existing 
>>> behavior.
>>
>> I agree, we must preserve the original behavior.
>>
>> Re-thinking about the problem, checking skb->sp in the BH, and storing
>> the status in the scratch area should both fix the issue in a sane way
>> and preserve the optimization.
>>
>> Something like the code below. Could you please try in your
>> environment? (or point me to simple reproducer ;-)
>
> I'm happy to test this, but if you are curious, you can find the
> selinux-testsuite at the link below; the "inet_socket" tests are the
> ones relevant to this problem.
>
> * https://github.com/SELinuxProject/selinux-testsuite
>
> However, I believe there is a problem with this patch, see below.
>
>> ---
>> diff --git a/include/net/udp.h b/include/net/udp.h
>> index 972ce4b..8d2c406 100644
>> --- a/include/net/udp.h
>> +++ b/include/net/udp.h
>> @@ -305,33 +305,44 @@ struct sock *udp6_lib_lookup_skb(struct sk_buff *skb,
>>  /* UDP uses skb->dev_scratch to cache as much information as possible and 
>> avoid
>>   * possibly multiple cache miss on dequeue()
>>   */
>> -#if BITS_PER_LONG == 64
>> -
>> -/* truesize, len and the bit needed to compute skb_csum_unnecessary will be 
>> on
>> - * cold cache lines at recvmsg time.
>> - * skb->len can be stored on 16 bits since the udp header has been already
>> - * validated and pulled.
>> - */
>>  struct udp_dev_scratch {
>> -   u32 truesize;
>> +   /* skb->truesize and the stateless bit embeded in a single field;
>> +* do not use a bitfield since the compiler emits better/smaller code
>> +* this way
>> +*/
>> +   u32 _tsize_state;
>> +
>> +#if BITS_PER_LONG == 64
>> +   /* len and the bit needed to compute skb_csum_unnecessary
>> +* will be on cold cache lines at recvmsg time.
>> +* skb->len can be stored on 16 bits since the udp header has been
>> +* already validated and pulled.
>> +*/
>> u16 len;
>> bool is_linear;
>> bool csum_unnecessary;
>> +#endif
>>  };
>
> ...
>
>> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
>> index b057653..d243772 100644
>> --- a/net/ipv4/udp.c
>> +++ b/net/ipv4/udp.c
>> @@ -1163,34 +1163,32 @@ int udp_sendpage(struct sock *sk, struct page *page, 
>> int offset,
>> return ret;
>>  }
>>
>> -#if BITS_PER_LONG == 64
>> +#define UDP_SKB_IS_STATELESS 0x8000
>> +
>>  static void udp_set_dev_scratch(struct sk_buff *skb)
>>  {
>> -   struct udp_dev_scratch *scratch;
>> +   struct udp_dev_scratch *scratch = udp_skb_scratch(skb);
>>
>> BUILD_BUG_ON(sizeof(struct udp_dev_scratch) > sizeof(long));
>
> The BUILD_BUG_ON() assertion no longer appears to be correct with this patch.

Nevermind, I just took a closer look at this and realized I made a
mistake when applying your patch (had to apply manually for some
reason).  I'm building a test kernel now.

-- 
paul moore
www.paul-moore.com

Re: [PATCH 00/22] gcc-7 -Wformat-* warnings

2017-07-24 Thread Martin K. Petersen


Arnd,

> This series addresses all warnings that gcc-7 introduces for
> -Wformat-overflow= and turns off the -Wformat-truncation by default
> (they remain enabled with "make W=1").

Applied the SCSI patches. Thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering

Dear Talented

2017-07-24 Thread Kim Sharma

Dear Talented,

I am Talent Scout For Sony Pictures Animation, Present Sony Pictures Animation 
Movies a Film Corporation Located in the United State, is Soliciting for the
Right to use Your Photo/Face and Personality as One of the Semi -Major
Role/ Character in our Upcoming ANIMATED Stereoscope 3D Movie-The Story
of Peter Rabbit (Peter Rabbit 2018) The Movie is Currently Filming (In
Production) Please Note That There Will Be No Auditions, Traveling or
Any Special / Professional Acting Skills, Since the Production of This
Movie Will Be Done with our State of Art Computer -Generating Imagery
Equipment. We Are Prepared to Pay the Total Sum of $560,000.00 USD. For
More Information/Understanding, Please Write us on the E-Mail Below.
CONTACT EMAIL: bluesky.filmstu...@usa.com
All Reply to:  bluesky.filmstu...@usa.com
Note: Only the Response send to this mail will be Given a Prior
Consideration.

Talent Scout
Kim Sharma

[PATCH net-next] i40e: limit lan queue count in large cpu count machine

2017-07-24 Thread Shannon Nelson

When a machine has more cpus than queuepairs, e.g. 512 cores, the
counting gets a little funky and turns off Flow Director with the
message:
  not enough queues for Flow Director. Flow Director feature is disabled

This patch limits the number of lan queues initially allocated to
be sure we have some left for FD and other features.

Signed-off-by: Shannon Nelson 
---
 drivers/net/ethernet/intel/i40e/i40e_main.c |   11 +++
 1 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c 
b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 2db93d3..a694e99 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -11059,6 +11059,7 @@ static int i40e_setup_pf_switch(struct i40e_pf *pf, 
bool reinit)
 static void i40e_determine_queue_usage(struct i40e_pf *pf)
 {
int queues_left;
+   int q_max;
 
pf->num_lan_qps = 0;
 
@@ -11105,10 +11106,12 @@ static void i40e_determine_queue_usage(struct i40e_pf 
*pf)
I40E_FLAG_DCB_ENABLED);
dev_info(&pf->pdev->dev, "not enough queues for DCB. 
DCB is disabled.\n");
}
-   pf->num_lan_qps = max_t(int, pf->rss_size_max,
-   num_online_cpus());
-   pf->num_lan_qps = min_t(int, pf->num_lan_qps,
-   pf->hw.func_caps.num_tx_qp);
+
+   /* limit lan qps to the smaller of qps, cpus or msix */
+   q_max = max_t(int, pf->rss_size_max, num_online_cpus());
+   q_max = min_t(int, q_max, pf->hw.func_caps.num_tx_qp);
+   q_max = min_t(int, q_max, pf->hw.func_caps.num_msix_vectors);
+   pf->num_lan_qps = q_max;
 
queues_left -= pf->num_lan_qps;
}
-- 
1.7.1

[PATCH] ftgmac100: return error in ftgmac100_alloc_rx_buf

2017-07-24 Thread Joel Stanley

The error paths set err, but it's not returned.

I wondered if we should fix all of the callers to check the returned
value, but Ben explains why the code is this way:

> Most call sites ignore it on purpose. There's nothing we can do if
> we fail to get a buffer at interrupt time, so we point the buffer to
> the scratch page so the HW doesn't DMA into lalaland and lose the
> packet.
>
> The one call site that tests and can fail is the one used when brining
> the interface up. If we fail to allocate at that point, we fail the
> ifup. But as you noticed, I do have a bug not returning the error.

Acked-by: Benjamin Herrenschmidt 
Signed-off-by: Joel Stanley 
---
 drivers/net/ethernet/faraday/ftgmac100.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/faraday/ftgmac100.c 
b/drivers/net/ethernet/faraday/ftgmac100.c
index fce0aa4f78b7..33b5c8eb9961 100644
--- a/drivers/net/ethernet/faraday/ftgmac100.c
+++ b/drivers/net/ethernet/faraday/ftgmac100.c
@@ -392,7 +392,7 @@ static int ftgmac100_alloc_rx_buf(struct ftgmac100 *priv, 
unsigned int entry,
struct net_device *netdev = priv->netdev;
struct sk_buff *skb;
dma_addr_t map;
-   int err;
+   int err = 0;
 
skb = netdev_alloc_skb_ip_align(netdev, RX_BUF_SIZE);
if (unlikely(!skb)) {
@@ -428,7 +428,7 @@ static int ftgmac100_alloc_rx_buf(struct ftgmac100 *priv, 
unsigned int entry,
else
rxdes->rxdes0 = 0;
 
-   return 0;
+   return err;
 }
 
 static unsigned int ftgmac100_next_rx_pointer(struct ftgmac100 *priv,
-- 
2.13.3

Re: [PATCH net-next] ibmvnic: Check for transport event on driver resume

2017-07-24 Thread David Miller

From: John Allen 
Date: Mon, 24 Jul 2017 13:26:06 -0500

> On resume, the ibmvnic driver will fail to resume normal operation.
> The main crq gets closed on suspend by the vnic server and doesn't get
> reopened again as the interrupt for the transport event that would reset
> the main crq comes in after the driver has been suspended.
> 
> This patch resolves the issue by removing the calls to kick the receive
> interrupts handlers and instead directly invoking the main crq interrupt
> handler. This will ensure that we see the transport event necessary to
> properly resume the driver.
> 
> Signed-off-by: John Allen 

Applied.

Re: [PATCH net-next 0/6] netvsc: minor fixes

2017-07-24 Thread David Miller

From: Stephen Hemminger 
Date: Mon, 24 Jul 2017 10:57:24 -0700

> This fixes fallout from previous patch related to RTNL and RCU
> annotaiton. Also one patch sent to wrong list.

Series applied, thanks.

Re: [PATCH net] net: dsa: Initialize ds->cpu_port_mask earlier

2017-07-24 Thread David Miller

From: Florian Fainelli 
Date: Mon, 24 Jul 2017 10:49:23 -0700

> The mt7530 driver has its dsa_switch_ops::get_tag_protocol function
> check ds->cpu_port_mask to issue a warning in case the configured CPU
> port is not capable of supporting tags.
> 
> After commit 14be36c2c96c ("net: dsa: Initialize all CPU and enabled
> ports masks in dsa_ds_parse()") we slightly re-arranged the
> initialization such that this was no longer working. Just make sure that
> ds->cpu_port_mask is set prior to the first call to get_tag_protocol,
> thus restoring the expected contract. In case of error, the CPU port bit
> is cleared.
> 
> Fixes: 14be36c2c96c ("net: dsa: Initialize all CPU and enabled ports masks in 
> dsa_ds_parse()")
> Reported-by: Sean Wang 
> Signed-off-by: Florian Fainelli 

Applied, thanks Florian.

Re: [patch net-next] mlxsw: spectrum_router: Fix build when IPv6 isn't enabled

2017-07-24 Thread David Ahern

On 7/24/17 1:56 AM, Jiri Pirko wrote:
> @@ -949,9 +950,13 @@ mlxsw_sp_router_neighs_update_interval_init(struct 
> mlxsw_sp *mlxsw_sp)
>  {
>   unsigned long interval;
>  
> +#if IS_ENABLED(CONFIG_IPV6)
>   interval = min_t(unsigned long,
>NEIGH_VAR(&arp_tbl.parms, DELAY_PROBE_TIME),
>NEIGH_VAR(&nd_tbl.parms, DELAY_PROBE_TIME));
> +#else
> + interval = NEIGH_VAR(&arp_tbl.parms, DELAY_PROBE_TIME);
> +#endif
>   mlxsw_sp->router->neighs_update.interval = jiffies_to_msecs(interval);
>  }
>  

arp and ndisc should have accessors for those rather than exporting the
tables to drivers.

Re: [PATCH 1/2] net: ethernet: ti: cpts: convert to use kthread_worker

2017-07-24 Thread Grygorii Strashko

Hi Richard,

On 07/22/2017 12:29 AM, Richard Cochran wrote:
> On Fri, Jul 21, 2017 at 06:49:42PM -0500, Grygorii Strashko wrote:
>> There could be significant delay in CPTS work schedule under high system
>> load and on -RT which could cause CPTS misbehavior due to internal counter
>> overflow. Usage of own kthread_worker allows to avoid such kind of issues
>> and makes it possible to tune priority of CPTS kthread_worker thread on -RT
>> (thread name "cpts").
> 
> I have also seen excessive delays in the time stamp work from the
> dp83640 under certain loads.  Can we please implement this time stamp
> kthread_worker in the PHC subsystem?  That way, the facility can be
> used by other drivers without code duplication.

Below if pure TBD/RFC version of patch which add kthread worker to PTP core.
I'm sending it to get you opinion about implementation in general, before 
continue with more changes. Pls, take a look if you have time?
- are you ok with names (API, callbacks, ptp structs members)?

I can prepare, update and resend proper patches tom if feedback is positive.
I also can convert dp83640 driver to use new feature, but I can't test it.

>From 48212c4564ae38d633d95723b1f4616dee195b47 Mon Sep 17 00:00:00 2001
From: Grygorii Strashko 
Date: Mon, 24 Jul 2017 19:19:50 -0500
Subject: [PATCH] [rfc] ptp: add auxiliary worker

Add auxiliary kthread worker to PTP core so drivers can
re-use it and benefit from common implementation which is RT
friendly... TBD

Signed-off-by: Grygorii Strashko 
---
 drivers/net/ethernet/ti/cpts.c   | 53 +++-
 drivers/net/ethernet/ti/cpts.h   |  2 --
 drivers/ptp/ptp_clock.c  | 42 +++
 drivers/ptp/ptp_private.h|  3 +++
 include/linux/ptp_clock_kernel.h | 15 
 5 files changed, 80 insertions(+), 35 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpts.c b/drivers/net/ethernet/ti/cpts.c
index 746dd1d..e05a1b4 100644
--- a/drivers/net/ethernet/ti/cpts.c
+++ b/drivers/net/ethernet/ti/cpts.c
@@ -280,23 +280,9 @@ static int cpts_ptp_enable(struct ptp_clock_info *ptp,
return -EOPNOTSUPP;
 }
 
-static struct ptp_clock_info cpts_info = {
-   .owner  = THIS_MODULE,
-   .name   = "CTPS timer",
-   .max_adj= 100,
-   .n_ext_ts   = 0,
-   .n_pins = 0,
-   .pps= 0,
-   .adjfreq= cpts_ptp_adjfreq,
-   .adjtime= cpts_ptp_adjtime,
-   .gettime64  = cpts_ptp_gettime,
-   .settime64  = cpts_ptp_settime,
-   .enable = cpts_ptp_enable,
-};
-
-static void cpts_overflow_check(struct kthread_work *work)
+static long cpts_overflow_check(struct ptp_clock_info *ptp)
 {
-   struct cpts *cpts = container_of(work, struct cpts, overflow_work.work);
+   struct cpts *cpts = container_of(ptp, struct cpts, info);
unsigned long delay = cpts->ov_check_period;
struct timespec64 ts;
unsigned long flags;
@@ -309,9 +295,24 @@ static void cpts_overflow_check(struct kthread_work *work)
spin_unlock_irqrestore(&cpts->lock, flags);
 
pr_debug("cpts overflow check at %lld.%09lu\n", ts.tv_sec, ts.tv_nsec);
-   kthread_queue_delayed_work(cpts->kworker, &cpts->overflow_work, delay);
+   return (long)delay;
 }
 
+static struct ptp_clock_info cpts_info = {
+   .owner  = THIS_MODULE,
+   .name   = "CTPS timer",
+   .max_adj= 100,
+   .n_ext_ts   = 0,
+   .n_pins = 0,
+   .pps= 0,
+   .adjfreq= cpts_ptp_adjfreq,
+   .adjtime= cpts_ptp_adjtime,
+   .gettime64  = cpts_ptp_gettime,
+   .settime64  = cpts_ptp_settime,
+   .enable = cpts_ptp_enable,
+   .do_aux_work= cpts_overflow_check,
+};
+
 static int cpts_match(struct sk_buff *skb, unsigned int ptp_class,
  u16 ts_seqid, u8 ts_msgtype)
 {
@@ -392,8 +393,7 @@ static u64 cpts_find_ts(struct cpts *cpts, struct sk_buff 
*skb, int ev_type)
/* get the timestamp for timeouts */
skb_cb->tmo = jiffies + msecs_to_jiffies(100);
__skb_queue_tail(&cpts->txq, skb);
-   kthread_mod_delayed_work(cpts->kworker,
-&cpts->overflow_work, 0);
+   ptp_schedule_worker(cpts->clock, 0);
}
spin_unlock_irqrestore(&cpts->lock, flags);
 
@@ -457,8 +457,7 @@ int cpts_register(struct cpts *cpts)
}
cpts->phc_index = ptp_clock_index(cpts->clock);
 
-   kthread_queue_delayed_work(cpts->kworker, &cpts->overflow_work,
-  cpts->ov_check_period);
+   ptp_schedule_worker(cpts->clock, cpts->ov_check_period);
return 0;
 
 err_ptp:
@@ -472,8 +471,6 @@ void cpts_unregister(struct cpts *cpts)
if (WARN_ON(!cpts->clock))
return;
 
-   kthread_cancel_delayed_work_sync(&cpts->overflo

Re: [Patch net] packet: fix use-after-free in prb_retire_rx_blk_timer_expired()

2017-07-24 Thread David Miller

From: Cong Wang 
Date: Mon, 24 Jul 2017 10:07:32 -0700

> There are multiple reports showing we have a use-after-free in
> the timer prb_retire_rx_blk_timer_expired(), where we use struct
> tpacket_kbdq_core::pkbdq, a pg_vec, after it gets freed by
> free_pg_vec().
> 
> The interesting part is it is not freed via packet_release() but
> via packet_setsockopt(), which means we are not closing the socket.
> Looking into the big and fat function packet_set_ring(), this could
> happen if we satisfy the following conditions:
> 
> 1. closing == 0, not on packet_release() path
> 2. req->tp_block_nr == 0, we don't allocate a new pg_vec
> 3. rx_ring->pg_vec is already set as V3, which means we already called
>packet_set_ring() wtih req->tp_block_nr > 0 previously
> 4. req->tp_frame_nr == 0, pass sanity check
> 5. po->mapped == 0, never called mmap()
> 
> In this scenario we are clearing the old rx_ring->pg_vec, so we need
> to free this pg_vec, but we don't stop the timer on this path because
> of closing==0.
> 
> The timer has to be stopped as long as we need to free pg_vec, therefore
> the check on closing!=0 is wrong, we should check pg_vec!=NULL instead.
> 
> Thanks to liujian for testing different fixes.
> 
> Reported-by: alexander.le...@verizon.com
> Reported-by: Dave Jones 
> Reported-by: liujian (CE) 
> Tested-by: liujian (CE) 
> Cc: Ding Tianhong 
> Cc: Willem de Bruijn 
> Signed-off-by: Cong Wang 

Applied and queued up for -stable, thanks!

Re: [PATCH net-next 00/10] bnxt_en: Updates for net-next.

2017-07-24 Thread David Miller

From: Michael Chan 
Date: Mon, 24 Jul 2017 12:34:19 -0400

> This series includes updating the firmware interface, adding
> methods to get and set VEPA/VEB bridge modes, some minor DCBX and ETS
> refinements, and 3 patches from Sathya Perla to implement initial
> VF representors for SRIOV switching.

Series applied, thank you.

Re: Problem with PHY state machine when using interrupts

2017-07-24 Thread Florian Fainelli

On 07/24/2017 03:59 PM, Florian Fainelli wrote:
> On 07/24/2017 03:53 PM, Mason wrote:
>> On 25/07/2017 00:36, Florian Fainelli wrote:
>>> On 07/24/2017 02:20 PM, Mason wrote:
 On 24/07/2017 21:53, Florian Fainelli wrote:

> Well now that I see the possible interrupts generated, I indeed don't
> see how you can get a link down notification unless you somehow force
> the link down yourself, which would certainly happen in phy_suspend()
> when we set BMCR.pwrdwn, but that may be too late.
>
> You should still expect the adjust_link() function to be called though
> with PHY_HALTED being set and that takes care of doing phydev->link = 0
> and netif_carrier_off(). If that still does not work, then see whether
> removing the call to phy_stop() does help (it really should).

 The only functions setting phydev->state to PHY_HALTED
 are phy_error() and phy_stop() AFAICT.

 I am aware that when phy_state_machine() handles the
 PHY_HALTED state, it will set phydev->link = 0;
 and call netif_carrier_off() -- because that's where
 I copied that code from.

 My issue is that phy_state_machine() does not run when
 I run 'ip set link dev eth0 down' from the command line.
>>>
>>> Yes, that much is clear, which is why I suggested earlier you try the
>>> patch at the end now.
>>>

 If I'm reading the code right, phy_disconnect() actually
 stops the state machine.

 In interrupt mode, phy_state_machine() doesn't run
 because no interrupt is generated.

 In polling mode, phy_state_machine() doesn't run
 because phy_disconnect() stops the state machine.

 Introducing a sleep before phy_disconnect() gives
 the state machine a chance to run in polling mode,
 but it doesn't feel right, and doesn't fix the
 other mode, which I'm using.
>>>
>>> There are several problems it seems:
>>>
>>> - the PHY state machine cannot move solely based on the PHY generating
>>> interrupts during phy_stop() if none of the changing conditions for the
>>> HW have changed, come to think about it, I doubt any PHY would be
>>> capable of doing something like that
>>>
>>> - there is an expectation from your driver to have adjust_link() run
>>> sometime during ndo_stop() to do something, but why?
>>>
>>> What is special about nb8800 that interrupts should be generated during
>>> ndo_stop(), and why do you think this is going to solve your problem?
>>>

 Looking at bcm_enet_stop() it calls phy_stop() and
 phy_disconnect() just like the nb8800 driver...

 I'm stumped.
>>>
>>> My suggestion of not using phy_stop() was not a good one, but
>>> functionally there is little difference in doing phy_stop() +
>>> phy_disconnect() or just phy_disconnect() alone. What matters is that we
>>> restart the PHY properly when phy_connect() or phy_start() is called.
>>>
>>> What I understand now from your "bug report" is that you want
>>> adjust_link to run with phydev->link = 0 to do something during
>>> ndo_close() but you have not explained why, but I suspect such that upon
>>> ndo_open() things work again.
>>>
>>> What I suspect your bug is, is that the really was no change in link
>>> status, no interrupt was generated because there should not be one, yet
>>> some of what nb8800_stop() does is not properly balanced by
>>> nb8800_open(). How about the following patch:
>>>
>>> diff --git a/drivers/net/ethernet/aurora/nb8800.c
>>> b/drivers/net/ethernet/aurora/nb8800.c
>>> index 041cfb7952f8..b07dea3ab019 100644
>>> --- a/drivers/net/ethernet/aurora/nb8800.c
>>> +++ b/drivers/net/ethernet/aurora/nb8800.c
>>> @@ -972,6 +972,10 @@ static int nb8800_open(struct net_device *dev)
>>> nb8800_mac_rx(dev, true);
>>> nb8800_mac_tx(dev, true);
>>>
>>> +   priv->speed = -1;
>>> +   priv->link = -1;
>>> +   priv->duplex = -1;
>>> +
>>> phydev = of_phy_connect(dev, priv->phy_node,
>>> nb8800_link_reconfigure, 0,
>>> priv->phy_mode);
>>>
>>
>> I will test your two patches ASAP.
>>
>> For the record, I have two (maybe related) issues:
>>
>> 1) After a sequence of
>> ip set link dev eth0 up
>> ip set link dev eth0 down
>> ip set link dev eth0 up
>> RX engine is hosed, thus network connectivity is borked.
>> The work-around is resetting the MAC in ndo_open
>> => mac_init in my proposed patch.
>> This is by far the biggest issue.
>> Also, resetting the MAC in ndo_open will make it easy
>> to implement suspend/resume, as I can just ndo_stop
>> in suspend, and ndo_open in resume.
> 
> OK.
> 
>>
>> 2) The system does not print "Link down" when I run
>> 'ip set link dev eth0 down'
>> This is a symptom of nb8800_link_reconfigure()
>> not being called at all (or being called
>> with phydev->link == priv->link when I tried
>> skipping phy_stop)
> 
> Correct, and there is actually a timing hazard that I could also
> reproduce here in that

Re: [PATCH] tcp: remove redundant argument from tcp_rcv_established()

2017-07-24 Thread David Miller

From: Ilya Matveychikov 
Date: Mon, 24 Jul 2017 16:02:12 +0400

> The last (4th) argument of tcp_rcv_established() is redundant as it
> always equals to skb->len and the skb itself is always passed as 2th
> agrument. There is no reason to have it.
> 
> Signed-off-by: Ilya V. Matveychikov 

Applied to net-next, thanks.

Re: [PATCH 1/1] mlx4_en: remove unnecessary error check

2017-07-24 Thread David Miller

From: Zhu Yanjun 
Date: Mon, 24 Jul 2017 04:22:44 -0400

> The function mlx4_en_get_profile always return zero. So it is not
> necessary to check its return value.
> 
> CC: Joe Jin 
> CC: Junxiao Bi 
> Signed-off-by: Zhu Yanjun 

Applied, thanks.

Re: [PATCH 2/2] ftgmac100: Make the MDIO bus a child of the ethernet device

2017-07-24 Thread David Miller

From: Benjamin Herrenschmidt 
Date: Mon, 24 Jul 2017 16:59:07 +1000

> Populate mii_bus->parent with our own platform device before
> registering, which makes it easier to locate the MDIO bus
> in sysfs when trying to diagnose problems.
> 
> Signed-off-by: Benjamin Herrenschmidt 

Applied.

Re: [PATCH 1/2] ftgmac100: Increase reset timeout

2017-07-24 Thread David Miller

From: Benjamin Herrenschmidt 
Date: Mon, 24 Jul 2017 16:59:01 +1000

> We had reports of 50us not being sufficient to reset the MAC,
> thus hitting the "Hardware reset failed" error bringing the
> interface up on some AST2400 based machines.
> 
> This bumps the timeout to 200us.
> 
> Signed-off-by: Benjamin Herrenschmidt 

Applied.

[PATCH v2 net-next 2/2] liquidio: cleanup: removed cryptic and misleading macro

2017-07-24 Thread Felix Manlunas

From: Rick Farrington 

Signed-off-by: Rick Farrington 
Signed-off-by: Felix Manlunas 
---
 drivers/net/ethernet/cavium/liquidio/octeon_console.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/cavium/liquidio/octeon_console.c 
b/drivers/net/ethernet/cavium/liquidio/octeon_console.c
index 501ad95..0770183 100644
--- a/drivers/net/ethernet/cavium/liquidio/octeon_console.c
+++ b/drivers/net/ethernet/cavium/liquidio/octeon_console.c
@@ -42,8 +42,6 @@ module_param(console_bitmask, int, 0644);
 MODULE_PARM_DESC(console_bitmask,
 "Bitmask indicating which consoles have debug output 
redirected to syslog.");
 
-#define CAST_ULL(v) ((u64)(v))
-
 #define BOOTLOADER_PCI_READ_BUFFER_DATA_ADDR0x0006c008
 #define BOOTLOADER_PCI_READ_BUFFER_LEN_ADDR 0x0006c004
 #define BOOTLOADER_PCI_READ_BUFFER_OWNER_ADDR   0x0006c000
@@ -233,7 +231,7 @@ static int __cvmx_bootmem_check_version(struct 
octeon_device *oct,
(exact_match && major_version != exact_match)) {
dev_err(&oct->pci_dev->dev, "bootmem ver mismatch %d.%d 
addr:0x%llx\n",
major_version, minor_version,
-   CAST_ULL(oct->bootmem_desc_addr));
+   (long long)(oct->bootmem_desc_addr));
return -1;
} else {
return 0;
-- 
2.9.0

[PATCH v2 net-next 1/2] liquidio: standardization: use min_t instead of custom macro

2017-07-24 Thread Felix Manlunas

From: Rick Farrington 

Signed-off-by: Rick Farrington 
Signed-off-by: Derek Chickles 
Signed-off-by: Felix Manlunas 
---
 drivers/net/ethernet/cavium/liquidio/octeon_console.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/cavium/liquidio/octeon_console.c 
b/drivers/net/ethernet/cavium/liquidio/octeon_console.c
index e08f760..501ad95 100644
--- a/drivers/net/ethernet/cavium/liquidio/octeon_console.c
+++ b/drivers/net/ethernet/cavium/liquidio/octeon_console.c
@@ -42,7 +42,6 @@ module_param(console_bitmask, int, 0644);
 MODULE_PARM_DESC(console_bitmask,
 "Bitmask indicating which consoles have debug output 
redirected to syslog.");
 
-#define MIN(a, b) min((a), (b))
 #define CAST_ULL(v) ((u64)(v))
 
 #define BOOTLOADER_PCI_READ_BUFFER_DATA_ADDR0x0006c008
@@ -704,7 +703,7 @@ static int octeon_console_read(struct octeon_device *oct, 
u32 console_num,
if (bytes_to_read <= 0)
return bytes_to_read;
 
-   bytes_to_read = MIN(bytes_to_read, (s32)buf_size);
+   bytes_to_read = min_t(s32, bytes_to_read, buf_size);
 
/* Check to see if what we want to read is not contiguous, and limit
 * ourselves to the contiguous block
-- 
2.9.0

[PATCH v2 net-next 0/2] liquidio: standardization and cleanup

2017-07-24 Thread Felix Manlunas

From: Rick Farrington 

This patchset corrects some non-standard macro usage.

1. Replaced custom MIN macro with use of standard 'min_t'.
2. Removed cryptic and misleading macro 'CAST_ULL'.

change log:
V1 -> V2:
1. Added driver cleanup of macro 'CAST_ULL'.

Rick Farrington (2):
  liquidio: standardization: use min_t instead of custom macro
  liquidio: cleanup: removed cryptic and misleading macro

 drivers/net/ethernet/cavium/liquidio/octeon_console.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

-- 
2.9.0

Re: [patch net-next] mlxsw: spectrum_router: Fix build when IPv6 isn't enabled

2017-07-24 Thread David Miller

From: Jiri Pirko 
Date: Mon, 24 Jul 2017 09:56:00 +0200

> From: Ido Schimmel 
> 
> When IPv6 isn't enabled the following error is generated:
> 
> ERROR: "nd_tbl" [drivers/net/ethernet/mellanox/mlxsw/mlxsw_spectrum.ko]
> undefined!
> 
> Fix it by replacing 'arp_tbl' and 'nd_tbl' with 'tbl->family' wherever
> possible and reference 'nd_tbl' only when IPV6 is enabled.
> 
> Fixes: d5eb89cf68d6 ("mlxsw: spectrum_router: Reflect IPv6 neighbours to the 
> device")
> Signed-off-by: Ido Schimmel 
> Reported-by: kbuild test robot 
> Signed-off-by: Jiri Pirko 

Applied.

Re: [PATCH net] ipv6: no need to return rt->dst.error if it is not null entry.

2017-07-24 Thread Hangbin Liu

On Mon, Jul 24, 2017 at 12:57:43PM -0700, Cong Wang wrote:
> On Sun, Jul 23, 2017 at 8:09 PM, Hangbin Liu  wrote:
> > Do we still need this net->ipv6.ip6_null_entry check? How about remove all
> > the checks?
> 
> I believe you only need to check for rt->dst.error, no need to check against
> NULL or ip6_null_entry.
> 
> Take a look at other ip6_route_lookup() callers.

Yes, I saw it. That why I send v2 patch to check both rt->dst.error and
ip6_null_entry.

The question is the other two caller are rpfilter_lookup_reverse6() and
nft_fib6_eval(). From the code it looks these two caller only care about
device match.

 if (rt->rt6i_idev->dev == dev || (flags & XT_RPFILTER_LOOSE))
 ret = true;

And the device would be lo if it is ip6_null_entry. So they just discard it.
I'm not familiar with netfilter, Please correct me if I make any mistake.

But what we want in inet6_rtm_getroute() and rt6_dump_route() is to
get/dump the route info. So we should get the info even it's unreachable or
prohibit.

That's why I think we should remove both rt->dst.error and ip6_null_entry
check in inet6_rtm_getroute(). And even further, remove the ip6_null_entry
check in rt6_dump_route().

What do you think?

Thanks
Hangbin

Re: [PATCH net-next 00/18] net: mvpp2: MAC/GoP configuration and optional PHYs

2017-07-24 Thread Marcin Wojtas

Hi Antoine,

I stressed 10G interfaces with bidirectional traffic on A8040-DB, did
some up/down sequences and overall it seems stable (of course I needed
fix mentioned in patch 10/18). In a spare moment I'll give other
changes a scroll.

Best regards,
Marcin

2017-07-24 15:48 GMT+02:00 Antoine Tenart :
> Hi all,
>
> This series aim two things: making the PPv2 driver less depending on
> the firmware/bootloader initialization[1], and making the PPv2 driver
> work when no Ethernet PHY is connected between a port and the physical
> layer[2]. A few patches cleanup some small parts of the driver, and
> newly supported interfaces are described in the device trees[3].
>
> [1] The current implementation of the PPv2 driver relies on the
> firmware/bootloader initialization to configure some parts, as the
> Group of Ports (GoP) and the MACs (GMAC and/or XLG MAC --for 10G--).
> The drawback is the kernel must be configured to match exactly what
> the bootloader configures which is not convenient and is an issue
> when using boards having an Ethernet port and an SFP port wired to
> the same GoP port, as no dynamic configuration can be done.
>
> This series adds the GoP and GMAC/XLG MAC initializations so that
> the PPV2 does not have to rely on a previous initialization. One
> part is still missing from this series, and that would be the
> 'comphy' which provides shared serdes PHYs and which must be
> configured as well for a full kernel initialization to work. This
> comphy support will be part of a following up series. (This
> series was also tested with this 'comphy' support, as it's nearly
> ready).
>
> Patches 3-7, 11 and 14-15.
>
> [2] While the documentation states the phy property in a port node is
> optional, it is not in the current driver's implementation. This is
> needed when no PHY is connected between a GoP port and the physical
> layer (as for the two SFP ports on the 8040-db). One other feature
> is missing to be able to use such ports: the port link interrupt
> which allows not to rely on the phylib link event callback.
>
> This series makes the phy optional in the PPv2 driver, and then adds
> the support for the GoP port link interrupt to handle link status
> changes on such ports.
>
> Patches 8-10, 12-14.
>
> [3] With the port link interrupt and optional PHY support, the two SFP
> ports on the Marvell Armada 8040 DB can be described and used; as
> well as the SFP port on the 7040 DB..
>
> Cosmetic changes / fixes.
>
> Patches 1, 2 and 16-18.
>
> I intentionally grouped all these patches into one series, as we would
> end up with series depending on each others (and I already did not
> include all my patches in this one). If that's an issue for this series
> to be reviewed/merged, I can easily split it into two series, with a
> dependency of one on the other.
>
> @Dave: Patches 13 to 18 should go through the mvebu tree, thanks :)
>
> Thanks!
> Antoine
>
>
> Antoine Tenart (18):
>   net: mvpp2: unify register definitions coding style
>   net: mvpp2: fix the synchronization module bypass macro name
>   net: mvpp2: set the SMI PHY address when connecting to the PHY
>   net: mvpp2: move the mii configuration in the ndo_open path
>   net: mvpp2: initialize the GMAC when using a port
>   net: mvpp2: initialize the XLG MAC when using a port
>   net: mvpp2: initialize the GoP
>   net: mvpp2: make the phy optional
>   net: mvpp2: use named interrupts
>   net: mvpp2: use the GoP interrupt for link status changes
>   Documentation/bindings: net: marvell-pp2: add the system controller
>   Documentation/bindings: net: marvell-pp2: add the interrupt-names
>   arm64: dts: marvell: cp110: use named interrupts for the Ethernet
> ports
>   arm64: dts: marvell: cp110: add PPv2 port interrupts
>   arm64: dts: marvell: add a reference to the sysctrl syscon in the ppv2
> node
>   arm64: dts: marvell: mcbin: enable more networking ports
>   arm64: dts: marvell: 7040-db: enable the SFP port
>   arm64: dts: marvell: 8040-db: enable the SFP ports
>
>  .../devicetree/bindings/net/marvell-pp2.txt|   7 +
>  arch/arm64/boot/dts/marvell/armada-7040-db.dts |   5 +
>  arch/arm64/boot/dts/marvell/armada-8040-db.dts |  10 +
>  arch/arm64/boot/dts/marvell/armada-8040-mcbin.dts  |  30 ++
>  .../boot/dts/marvell/armada-cp110-master.dtsi  |  13 +-
>  .../arm64/boot/dts/marvell/armada-cp110-slave.dtsi |  13 +-
>  drivers/net/ethernet/marvell/mvpp2.c   | 587 
> ++---
>  7 files changed, 575 insertions(+), 90 deletions(-)
>
> --
> 2.13.3
>

Re: A buggy behavior for Linux TCP Reno and HTCP

2017-07-24 Thread Yuchung Cheng

On Mon, Jul 24, 2017 at 11:29 AM, Neal Cardwell  wrote:
> On Mon, Jul 24, 2017 at 2:17 PM, Yuchung Cheng  wrote:
>> On Sun, Jul 23, 2017 at 7:37 PM, Neal Cardwell  wrote:
>>> On Sun, Jul 23, 2017 at 10:36 PM, Neal Cardwell  
>>> wrote:
> ...
 What if we call the field tp->prior_cwnd? Then at least we'd have some
 nice symmetry:

 - tp->snd_cwnd,  which is saved in tp->prior_cwnd  (and restored upon undo)
 - tp->snd_ssthresh,  which is saved in tp-> prior_ssthresh  (and
 restored upon undo)

 That sounds appealing to me. WDYT?
>>>
>>> And, I should add, if we go with the tp->prior_cwnd approach, then we
>>> can have a single "default"/CUBIC-style undo function, instead of 15
>>> separate but identical implementations...
>> you mean all CC modules share one ca_ops->undo_cwnd function? sounds a
>> nice consolidation work.
>
> Yes, exactly.
>
> Right now we have 9 modules that have identical tcp_foo_cwnd_undo functions:
>
> tcp_bic.c:188:  return max(tp->snd_cwnd, ca->loss_cwnd);
> tcp_cubic.c:378:return max(tcp_sk(sk)->snd_cwnd, ca->loss_cwnd);
> tcp_dctcp.c:318:return max(tcp_sk(sk)->snd_cwnd, ca->loss_cwnd);
> tcp_highspeed.c:165:return max(tcp_sk(sk)->snd_cwnd, ca->loss_cwnd);
> tcp_illinois.c:309: return max(tcp_sk(sk)->snd_cwnd, ca->loss_cwnd);
> tcp_nv.c:190:   return max(tcp_sk(sk)->snd_cwnd, ca->loss_cwnd);
> tcp_scalable.c:50:  return max(tcp_sk(sk)->snd_cwnd, ca->loss_cwnd);
> tcp_veno.c:210: return max(tcp_sk(sk)->snd_cwnd, veno->loss_cwnd);
> tcp_yeah.c:232: return max(tcp_sk(sk)->snd_cwnd, yeah->loss_cwnd);
>
> And if we fix this bug in tcp_reno_undo_cwnd() by referring to
> ca->loss_cwnd then we will add another 6 like this.
>
> So my proposal would be
>
> - tp->snd_cwnd,  which is saved in tp->prior_cwnd  (and restored upon undo)
> - tp->snd_ssthresh,  which is saved in tp-> prior_ssthresh  (and
>restored upon undo)
>
> Actually, now that I re-read the code, we already do have a
> prior_cwnd, which is used for the PRR code, and already set upon
> entering CA_Recovery. So if we set prior_cwnd for CA_Loss, perhaps we
> can do something like:
>
> diff --git a/net/ipv4/tcp_cong.c b/net/ipv4/tcp_cong.c
> index fde983f6376b..c2b174469645 100644
> --- a/net/ipv4/tcp_cong.c
> +++ b/net/ipv4/tcp_cong.c
> @@ -456,7 +456,7 @@ u32 tcp_reno_undo_cwnd(struct sock *sk)
>  {
> const struct tcp_sock *tp = tcp_sk(sk);
>
> -   return max(tp->snd_cwnd, tp->snd_ssthresh << 1);
> +   return max(tp->snd_cwnd, tp->prior_cwnd);
>  }
>  EXPORT_SYMBOL_GPL(tcp_reno_undo_cwnd);
>
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 2920e0cb09f8..ae790a84302d 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -1951,6 +1951,7 @@ void tcp_enter_loss(struct sock *sk)
> !after(tp->high_seq, tp->snd_una) ||
> (icsk->icsk_ca_state == TCP_CA_Loss && !icsk->icsk_retransmits)) {
> tp->prior_ssthresh = tcp_current_ssthresh(sk);
> +   tp->prior_cwnd = tp->snd_cwnd;
> tp->snd_ssthresh = icsk->icsk_ca_ops->ssthresh(sk);
> tcp_ca_event(sk, CA_EVENT_LOSS);
> tcp_init_undo(tp);
>
> And then change all the CC modules but BBR to use the
> tcp_reno_undo_cwnd() instead of their own custom undo code.
>
> WDYT?
Looks reasonable. But we might want to eventually refactor TCP undo
code: the stats changes (prior_ssthresh, prior_cwnd, undo_marker,
undo_retrans) are scattered in different helpers, making the code hard
to audit.

>
> neal

Re: [PATCH] netvsc: fix ptr_ret.cocci warnings

2017-07-24 Thread David Miller

From: Stephen Hemminger 
Date: Mon, 24 Jul 2017 16:21:57 -0700

> On Mon, 24 Jul 2017 16:20:28 -0700 (PDT)
> David Miller  wrote:
> 
>> From: kbuild test robot 
>> Date: Sat, 22 Jul 2017 02:54:43 +0800
>> 
>> > drivers/net/hyperv/netvsc_drv.c:737:8-14: WARNING: PTR_ERR_OR_ZERO can be 
>> > used
>> > 
>> > 
>> >  Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR
>> > 
>> > Generated by: scripts/coccinelle/api/ptr_ret.cocci
>> > 
>> > Fixes: 9749fed5d43d ("netvsc: use ERR_PTR to avoid dereference issues")
>> > CC: stephen hemminger 
>> > Signed-off-by: Fengguang Wu   
>> 
>> Applied to net-next.
> 
> That function is gone in the later patches.

I guess I'll deal with that when I apply that series or do a merge
then :-)

Re: [PATCH 1/1] mlx4_en: remove unnecessary returned value

2017-07-24 Thread David Miller

From: Zhu Yanjun 
Date: Sun, 23 Jul 2017 23:02:53 -0400

> The function mlx4_en_arm_cq always returns zero. So change the
> return type of the function mlx4_en_arm_cq to void.
> 
> CC: Joe Jin 
> CC: Junxiao Bi 
> Signed-off-by: Zhu Yanjun 

Applied to net-next, thanks.

Re: [PATCH] of_mdio: kill useless variable in of_phy_register_fixed_link()

2017-07-24 Thread David Miller

From: Sergei Shtylyov 
Date: Sun, 23 Jul 2017 21:45:47 +0300

> of_phy_register_fixed_link() declares the 'err' variable to hold the result
> of of_property_read_string() but only uses it once after that, while that
> function can be called directly from the *if* statement...
> 
> Remove that variable and move/regroup 'link_gpio' and 'len' variables in
> order to sort the declarations in the reverse Xmas tree order -- to please
> DaveM. ;-)

Respect MY AUTHORITY! :-)

> Signed-off-by: Sergei Shtylyov 

Applied, thanks.

Re: [PATCH net-next] skbuff: re-add check for NULL skb->head in kfree_skb path

2017-07-24 Thread David Miller

From: Florian Westphal 
Date: Sun, 23 Jul 2017 19:54:47 +0200

> A null check is needed after all.  netlink skbs can have skb->head be
> backed by vmalloc.  The netlink destructor vfree()s head, then sets it to
> NULL.  We then panic in skb_release_data with a NULL dereference.
> 
> Re-add such a test.
> 
> Alternative would be to switch to kvfree to free skb->head memory
> and remove the special handling in netlink destructor.
> 
> Reported-by: kernel test robot 
> Fixes: 06dc75ab06943 ("net: Revert "net: add function to allocate sk_buff 
> head without data area")
> Signed-off-by: Florian Westphal 

Ok, back it goes.

Applied, thanks.

Re: [PATCH-next] liquidio: fix implicit irq include causing build failures

2017-07-24 Thread David Miller

From: Paul Gortmaker 
Date: Sun, 23 Jul 2017 10:44:52 -0400

> To fix
> 
> In file included from 
> drivers/net/ethernet/cavium/liquidio/octeon_mem_ops.c:24:0:
> drivers/net/ethernet/cavium/liquidio/octeon_device.h:216:2: error: expected 
> specifier-qualifier-list before ‘irqreturn_t’
>   irqreturn_t (*process_interrupt_regs)(void *);
>   ^
> 
> as seen on arm64 allmodconfig builds.
> 
> Cc: Derek Chickles 
> Cc: Satanand Burla 
> Cc: Felix Manlunas 
> Cc: Raghu Vatsavayi 
> Signed-off-by: Paul Gortmaker 

Applied to net-next, thanks.

Re: [PATCH net] openvswitch: fix potential out of bound access in parse_ct

2017-07-24 Thread David Miller

From: Liping Zhang 
Date: Sun, 23 Jul 2017 17:52:23 +0800

> From: Liping Zhang 
> 
> Before the 'type' is validated, we shouldn't use it to fetch the
> ovs_ct_attr_lens's minlen and maxlen, else, out of bound access
> may happen.
> 
> Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action")
> Signed-off-by: Liping Zhang 

Applied and queued up for -stable.

Re: [PATCH net-next] bpf: dev_map_alloc() shouldn't return NULL

2017-07-24 Thread David Miller

From: Dan Carpenter 
Date: Sat, 22 Jul 2017 10:40:04 +0300

> We forgot to set the error code on two error paths which means that we
> return ERR_PTR(0) which is NULL.  The caller, find_and_alloc_map(), is
> not expecting that and will have a NULL dereference.
> 
> Fixes: 546ac1ffb70d ("bpf: add devmap, a map for storing net device 
> references")
> Signed-off-by: Dan Carpenter 

Applied, thanks Dan.

Re: [PATCH] netvsc: fix ptr_ret.cocci warnings

2017-07-24 Thread Stephen Hemminger

On Mon, 24 Jul 2017 16:20:28 -0700 (PDT)
David Miller  wrote:

> From: kbuild test robot 
> Date: Sat, 22 Jul 2017 02:54:43 +0800
> 
> > drivers/net/hyperv/netvsc_drv.c:737:8-14: WARNING: PTR_ERR_OR_ZERO can be 
> > used
> > 
> > 
> >  Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR
> > 
> > Generated by: scripts/coccinelle/api/ptr_ret.cocci
> > 
> > Fixes: 9749fed5d43d ("netvsc: use ERR_PTR to avoid dereference issues")
> > CC: stephen hemminger 
> > Signed-off-by: Fengguang Wu   
> 
> Applied to net-next.

That function is gone in the later patches.

Re: [PATCH] net: stmmac: Adjust dump offset of DMA registers for ethtool

2017-07-24 Thread David Miller

From: thor.tha...@linux.intel.com
Date: Fri, 21 Jul 2017 16:35:09 -0500

> From: Thor Thayer 
> 
> The commit fbf68229ffe7 ("net: stmmac: unify registers dumps methods")
> 
> in the Linux kernel modified the register dump to store the DMA registers
> at the DMA register offset (0x1000) but ethtool (stmmac.c) looks for the
> DMA registers after the MAC registers which is offset 55.
> This patch copies the DMA registers from the higher offset to the offset
> where ethtool expects them.
> 
> Signed-off-by: Thor Thayer 

Applied.

Re: [PATCH] netvsc: fix ptr_ret.cocci warnings

2017-07-24 Thread David Miller

From: kbuild test robot 
Date: Sat, 22 Jul 2017 02:54:43 +0800

> drivers/net/hyperv/netvsc_drv.c:737:8-14: WARNING: PTR_ERR_OR_ZERO can be used
> 
> 
>  Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR
> 
> Generated by: scripts/coccinelle/api/ptr_ret.cocci
> 
> Fixes: 9749fed5d43d ("netvsc: use ERR_PTR to avoid dereference issues")
> CC: stephen hemminger 
> Signed-off-by: Fengguang Wu 

Applied to net-next.

Re: [PATCH net-next 0/2] rxrpc: Rearrange headers

2017-07-24 Thread David Miller

From: David Howells 
Date: Fri, 21 Jul 2017 19:29:42 +0100

> 
> Here's a pair of patches that rearrange some of the AF_RXRPC header files
> that are outside of the net/rxrpc/ directory:
> 
>  (1) The bits userspace need are moved to uapi/linux/rxrpc.h.  [Should this
>  be af_rxrpc.h instead, I wonder - but there doesn't seem to be
>  precedent for that in the other net UAPI headers.]
> 
>  (2) For the most part, the contents of rxrpc/packet.h are no longer used
>  outside of the AF_RXRPC module, so move them to net/rxrpc/protocol.h
>  with the exception of the standard abort codes which are exposed to
>  userspace when an abort occurs and the security index values which are
>  needed when constructing keys.
> 
> The patches can be found here also:
> 
>   
> http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=rxrpc-rewrite
> 
> Tagged thusly:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git
>   rxrpc-rewrite-20170721

Pulled, thanks David.

Re: [patch net-next] mlxsw: spectrum_router: Don't batch neighbour deletion

2017-07-24 Thread David Miller

From: Jiri Pirko 
Date: Fri, 21 Jul 2017 20:31:38 +0200

> From: Ido Schimmel 
> 
> Current firmware supported by the driver doesn't support batch deletion
> of IPv6 neighbours on a given router interface (RIF).
> 
> Until a new version that supports this functionality is made available,
> delete neighbours one by one.
> 
> Signed-off-by: Ido Schimmel 
> Signed-off-by: Jiri Pirko 

Applied.

Re: [PATCH V2 net] net: fix tcp reset packet flowlabel for ipv6

2017-07-24 Thread Cong Wang

On Mon, Jul 24, 2017 at 1:34 PM, David Miller  wrote:
> From: Shaohua Li 
> Date: Tue, 18 Jul 2017 12:03:37 -0700
>
>> + /* Since this is being sent on the wire obfuscate hash a bit
>> +  * to minimize possbility that any useful information to an
>> +  * attacker is leaked. Only lower 20 bits are relevant.
>> +  */
>> + rol32(hash, 16);
>
> This doesn't help anything at all.

I believe the above code is copy-n-pasted from ip6_make_flowlabel()
(with few adjustments). Don't know why we can't refactor that function
for reuse.

[PATCH v2 4/4] can: m_can: Add call to of_can_transceiver_fixed

2017-07-24 Thread Franklin S Cooper Jr

Add call to new generic functions that provides support via a binding
to limit the arbitration rate and/or data rate imposed by the physical
transceiver connected to the MCAN peripheral.

Signed-off-by: Franklin S Cooper Jr 
---
 drivers/net/can/m_can/m_can.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/can/m_can/m_can.c b/drivers/net/can/m_can/m_can.c
index f4947a7..bd75df1 100644
--- a/drivers/net/can/m_can/m_can.c
+++ b/drivers/net/can/m_can/m_can.c
@@ -1649,6 +1649,8 @@ static int m_can_plat_probe(struct platform_device *pdev)
 
devm_can_led_init(dev);
 
+   of_can_transceiver_fixed(dev);
+
dev_info(&pdev->dev, "%s device registered (irq=%d, version=%d)\n",
 KBUILD_MODNAME, dev->irq, priv->version);
 
-- 
2.10.0

[PATCH v2 3/4] can: m_can: Update documentation to mention new fixed transceiver binding

2017-07-24 Thread Franklin S Cooper Jr

Add information regarding fixed transceiver binding. This is especially
important for MCAN since the IP allows CAN FD mode to run significantly
faster than what most transceivers are capable of.

Signed-off-by: Franklin S Cooper Jr 
---
Version 2 changes:
Drop unit address

 Documentation/devicetree/bindings/net/can/m_can.txt | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/Documentation/devicetree/bindings/net/can/m_can.txt 
b/Documentation/devicetree/bindings/net/can/m_can.txt
index 9e33177..e4abd2c 100644
--- a/Documentation/devicetree/bindings/net/can/m_can.txt
+++ b/Documentation/devicetree/bindings/net/can/m_can.txt
@@ -43,6 +43,11 @@ Required properties:
  Please refer to 2.4.1 Message RAM Configuration in
  Bosch M_CAN user manual for details.
 
+Optional properties:
+- fixed-transceiver: Fixed-transceiver subnode describing maximum speed
+ that can be used for CAN and/or CAN-FD modes.  See
+ 
Documentation/devicetree/bindings/net/can/fixed-transceiver.txt
+ for details.
 Example:
 SoC dtsi:
 m_can1: can@020e8000 {
@@ -64,4 +69,9 @@ Board dts:
pinctrl-names = "default";
pinctrl-0 = <&pinctrl_m_can1>;
status = "enabled";
+
+   fixed-transceiver {
+   max-arbitration-speed = <100>;
+   max-data-speed = <500>;
+   };
 };
-- 
2.10.0

[PATCH v2 0/4] can: Add new binding to limit bit rate used

2017-07-24 Thread Franklin S Cooper Jr

Add a new generic binding that CAN drivers can use to specify the max
arbitration and data bit rate supported by a transceiver. This is
useful since in some instances the maximum speeds may be limited by
the transceiver used. However, transceivers may not provide a means
to determine this limitation at runtime. Therefore, create a new binding
that mimics "fixed-link" that allows a user to hardcode the max speeds
that can be used.

Also add support for this new binding in the MCAN driver.

Note this is an optional subnode so even if a driver adds support for
parsing fixed-transceiver the user does not have to define it in their
device tree.

Version 2 changes:
Rename function
Define proper variable default
Drop unit address
Move check to changelink function
Reword commit message
Reword documentation

Franklin S Cooper Jr (4):
  can: dev: Add support for limiting configured bitrate
  can: fixed-transceiver: Add documentation for CAN fixed transceiver
bindings
  can: m_can: Update documentation to mention new fixed transceiver
binding
  can: m_can: Add call to of_can_transceiver_fixed

 .../bindings/net/can/fixed-transceiver.txt | 42 +++
 .../devicetree/bindings/net/can/m_can.txt  | 10 
 drivers/net/can/dev.c  | 59 ++
 drivers/net/can/m_can/m_can.c  |  2 +
 include/linux/can/dev.h|  5 ++
 5 files changed, 118 insertions(+)
 create mode 100644 
Documentation/devicetree/bindings/net/can/fixed-transceiver.txt

-- 
2.10.0

[PATCH v2 1/4] can: dev: Add support for limiting configured bitrate

2017-07-24 Thread Franklin S Cooper Jr

Various CAN or CAN-FD IP may be able to run at a faster rate than
what the transceiver the CAN node is connected to. This can lead to
unexpected errors. However, CAN transceivers typically have fixed
limitations and provide no means to discover these limitations at
runtime. Therefore, add support for a fixed-transceiver node that
can be reused by other CAN peripheral drivers to determine for both
CAN and CAN-FD what the max bitrate that can be used. If the user
tries to configure CAN to pass these maximum bitrates it will throw
an error.

Signed-off-by: Franklin S Cooper Jr 
---
Version 2 changes:
Rename new function to of_can_transceiver_fixed
Use version of of_property_read that supports signed/negative values
Return error when user tries to use CAN-FD if the transceiver doesn't
support it (max-data-speed = -1).

 drivers/net/can/dev.c   | 59 +
 include/linux/can/dev.h |  5 +
 2 files changed, 64 insertions(+)

diff --git a/drivers/net/can/dev.c b/drivers/net/can/dev.c
index 365a8cc..c046631 100644
--- a/drivers/net/can/dev.c
+++ b/drivers/net/can/dev.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #define MOD_DESC "CAN device driver interface"
@@ -814,6 +815,41 @@ int open_candev(struct net_device *dev)
 }
 EXPORT_SYMBOL_GPL(open_candev);
 
+#ifdef CONFIG_OF
+void of_can_transceiver_fixed(struct net_device *dev)
+{
+   struct device_node *dn;
+   struct can_priv *priv = netdev_priv(dev);
+   int max_frequency;
+   struct device_node *np;
+
+   np = dev->dev.parent->of_node;
+
+   dn = of_get_child_by_name(np, "fixed-transceiver");
+   if (!dn)
+   return;
+
+   /* Value of 0 implies ignore max speed constraint */
+   max_frequency = 0;
+   of_property_read_s32(dn, "max-arbitration-speed", &max_frequency);
+
+   if (max_frequency >= 0)
+   priv->max_trans_arbitration_speed = max_frequency;
+   else
+   priv->max_trans_arbitration_speed = 0;
+
+   max_frequency = 0;
+
+   of_property_read_s32(dn, "max-data-speed", &max_frequency);
+
+   if (max_frequency >= -1)
+   priv->max_trans_data_speed = max_frequency;
+   else
+   priv->max_trans_data_speed = 0;
+}
+EXPORT_SYMBOL(of_can_transceiver_fixed);
+#endif
+
 /*
  * Common close function for cleanup before the device gets closed.
  *
@@ -913,6 +949,14 @@ static int can_changelink(struct net_device *dev, struct 
nlattr *tb[],
priv->bitrate_const_cnt);
if (err)
return err;
+
+   if (priv->max_trans_arbitration_speed > 0 &&
+   bt.bitrate > priv->max_trans_arbitration_speed) {
+   netdev_err(dev, "arbitration bitrate surpasses 
transceiver capabilities of %d bps\n",
+  priv->max_trans_arbitration_speed);
+   return -EINVAL;
+   }
+
memcpy(&priv->bittiming, &bt, sizeof(bt));
 
if (priv->do_set_bittiming) {
@@ -989,6 +1033,12 @@ static int can_changelink(struct net_device *dev, struct 
nlattr *tb[],
if (!priv->data_bittiming_const && !priv->do_set_data_bittiming)
return -EOPNOTSUPP;
 
+   if ((priv->ctrlmode & CAN_CTRLMODE_FD) &&
+   priv->max_trans_data_speed == -1) {
+   netdev_err(dev, "canfd mode is not supported by 
transceiver\n");
+   return -EINVAL;
+   }
+
memcpy(&dbt, nla_data(data[IFLA_CAN_DATA_BITTIMING]),
   sizeof(dbt));
err = can_get_bittiming(dev, &dbt,
@@ -997,6 +1047,15 @@ static int can_changelink(struct net_device *dev, struct 
nlattr *tb[],
priv->data_bitrate_const_cnt);
if (err)
return err;
+
+   if (priv->max_trans_data_speed  > 0 &&
+   (priv->ctrlmode & CAN_CTRLMODE_FD) &&
+   (dbt.bitrate > priv->max_trans_data_speed)) {
+   netdev_err(dev, "canfd data bitrate surpasses 
transceiver capabilities of %d bps\n",
+  priv->max_trans_data_speed);
+   return -EINVAL;
+   }
+
memcpy(&priv->data_bittiming, &dbt, sizeof(dbt));
 
if (priv->do_set_data_bittiming) {
diff --git a/include/linux/can/dev.h b/include/linux/can/dev.h
index 141b05a..926fc7e 100644
--- a/include/linux/can/dev.h
+++ b/include/linux/can/dev.h
@@ -47,6 +47,9 @@ struct can_priv {
unsigned int data_bitrate_const_cnt;
struct can_clock clock;
 
+   int max_trans_arbitration_speed;
+   int max_trans_data_speed;
+
enum can_state state;
 
/* CAN controller features - see include/uapi/linux/can/netlink.h */
@@ -165,6 +1

[PATCH v2 2/4] can: fixed-transceiver: Add documentation for CAN fixed transceiver bindings

2017-07-24 Thread Franklin S Cooper Jr

Add documentation to describe usage of the new fixed transceiver binding.
This new binding is applicable for any CAN device therefore it exists as
its own document.

Signed-off-by: Franklin S Cooper Jr 
---
Version 2:
Tweak commit message working
Tweak wording for properties
Drop unit addressing
Give example if CAN-FD isn't supported

 .../bindings/net/can/fixed-transceiver.txt | 42 ++
 1 file changed, 42 insertions(+)
 create mode 100644 
Documentation/devicetree/bindings/net/can/fixed-transceiver.txt

diff --git a/Documentation/devicetree/bindings/net/can/fixed-transceiver.txt 
b/Documentation/devicetree/bindings/net/can/fixed-transceiver.txt
new file mode 100644
index 000..dc7e3eb
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/can/fixed-transceiver.txt
@@ -0,0 +1,42 @@
+Fixed transceiver Device Tree binding
+--
+
+CAN transceiver typically limits the max speed in standard CAN and CAN FD
+modes. Typically these limitations are static and the transceivers themselves
+provide no way to detect this limitation at runtime. For this situation,
+the "fixed-transceiver" node can be used.
+
+Properties:
+
+Optional:
+ max-arbitration-speed: a positive non 0 value that determines the max
+   speed that CAN can run in non CAN-FD mode or during the
+   arbitration phase in CAN-FD mode.
+
+ max-data-speed:   a positive non 0 value that determines the max data rate
+   that can be used in CAN-FD mode. A value of -1 implies
+   CAN-FD is not supported by the transceiver.
+
+Examples:
+
+Based on Texas Instrument's TCAN1042HGV CAN Transceiver
+
+m_can0 {
+   
+   fixed-transceiver {
+   max-arbitration-speed = <100>;
+   max-data-speed = <500>;
+   };
+   ...
+};
+
+Based on a CAN Transceiver that doesn't support CAN-FD. Only needed if the
+CAN IP supports CAN-FD but is using a transceiver that doesn't.
+
+m_can0 {
+   
+   fixed-transceiver {
+   max-data-speed = <(-1)>;
+   };
+   ...
+};
-- 
2.10.0

Re: [PATCH V2 net-next 01/21] net-next/hinic: Initialize hw interface

2017-07-24 Thread Francois Romieu

Aviad Krawczyk  :
[...]
> hinic_remove - If insmod failed and someone calls rmmod, we will get a
> crash because the resource are already free. Therefore we test if the
> device exists, please tell me if you meant to something different

The module won't even proceed through the pci_driver remove method if
the probe method failed. See drivers/pci/bus.c::pci_bus_add_device and
track 'dev->is_added'. You don't need to believe me: try it.

I have no idea where your crash comes from but something does not
look quite right.

(use module_pci_driver() to save some boilerplate code btw)

[...]
> The priv data is in type void * because the
> caller can use any struct that it wants, like the priv data in Linux
> (netdev, irq, tasklet, work..) -

I disagree. A driver is a piece of glue between hardware and software.
It fills a kernel's contract. It is not supposed to introduce opaque
data (even if it's hard to resist).

> we can change it but if we will pass different struct
> in the future, we will have to change the prototype of the functions.

It's fine. If I do something wrong - and at some point I will - I'd
rather have it detected at compile time. Nobody wants to waste precious
hardware lab testing time because of excess void *.

> According to the other void *:
> The wq struct is used for cmdq, sq and rq. Therefore the wqe is in type
> void *. There are 4 operations get_wqe, write_wqe, read_wqe and put_wqe - 
> there
> is no option that one function will be fed with a wrong pointer because the 
> caller
> should use what it got in get_wqe function.
> 
> When something is used as multiple types, it can be used as void * or union.
> Usually, I would prefer union. But, in this case if we will use union, maybe
> there is a chance of using the wrong wqe type in the wrong work queue type.

union * will at least catch being fed a wrong type. void * won't notice.

Let's take a practical example: hinic_sq_get_sges.

void hinic_sq_get_sges(void *wqe, struct hinic_sge *sges, int nr_sges)
   ^
{
struct hinic_sq_wqe *sq_wqe = (struct hinic_sq_wqe *)wqe;


static void free_all_tx_skbs(struct hinic_txq *txq)
{
struct hinic_dev *nic_dev = netdev_priv(txq->netdev);
struct hinic_sq *sq = txq->sq;
struct hinic_sq_wqe *wqe;
[...]
hinic_sq_get_sges(wqe, txq->free_sges, nr_sges);


static int free_tx_poll(struct napi_struct *napi, int budget)
{
[...]
struct hinic_sq_wqe *wqe;
[...]
hinic_sq_get_sges(wqe, txq->free_sges, nr_sges);


Why is it:

void hinic_sq_get_sges(void *wqe, ...

instead of:

void hinic_sq_get_sges(struct hinic_sq_wqe *wqe, ...

Because of a future change ?

-- 
Ueimor

Re: [PATCH net-next 00/11] sctp: remove typedefs from structures part 3

2017-07-24 Thread David Miller

From: Xin Long 
Date: Sun, 23 Jul 2017 09:34:25 +0800

> As we know, typedef is suggested not to use in kernel, even checkpatch.pl
> also gives warnings about it. Now sctp is using it for many structures.
> 
> All this kind of typedef's using should be removed. This patchset is the
> part 3 to remove it for another 11 basic structures from linux/sctp.h.
> 
> Just as the part 1 and 2, No any code's logic would be changed in these
> patches, only cleaning up.

Series applied, thanks.

Re: Problem with PHY state machine when using interrupts

2017-07-24 Thread Florian Fainelli

On 07/24/2017 03:53 PM, Mason wrote:
> On 25/07/2017 00:36, Florian Fainelli wrote:
>> On 07/24/2017 02:20 PM, Mason wrote:
>>> On 24/07/2017 21:53, Florian Fainelli wrote:
>>>
 Well now that I see the possible interrupts generated, I indeed don't
 see how you can get a link down notification unless you somehow force
 the link down yourself, which would certainly happen in phy_suspend()
 when we set BMCR.pwrdwn, but that may be too late.

 You should still expect the adjust_link() function to be called though
 with PHY_HALTED being set and that takes care of doing phydev->link = 0
 and netif_carrier_off(). If that still does not work, then see whether
 removing the call to phy_stop() does help (it really should).
>>>
>>> The only functions setting phydev->state to PHY_HALTED
>>> are phy_error() and phy_stop() AFAICT.
>>>
>>> I am aware that when phy_state_machine() handles the
>>> PHY_HALTED state, it will set phydev->link = 0;
>>> and call netif_carrier_off() -- because that's where
>>> I copied that code from.
>>>
>>> My issue is that phy_state_machine() does not run when
>>> I run 'ip set link dev eth0 down' from the command line.
>>
>> Yes, that much is clear, which is why I suggested earlier you try the
>> patch at the end now.
>>
>>>
>>> If I'm reading the code right, phy_disconnect() actually
>>> stops the state machine.
>>>
>>> In interrupt mode, phy_state_machine() doesn't run
>>> because no interrupt is generated.
>>>
>>> In polling mode, phy_state_machine() doesn't run
>>> because phy_disconnect() stops the state machine.
>>>
>>> Introducing a sleep before phy_disconnect() gives
>>> the state machine a chance to run in polling mode,
>>> but it doesn't feel right, and doesn't fix the
>>> other mode, which I'm using.
>>
>> There are several problems it seems:
>>
>> - the PHY state machine cannot move solely based on the PHY generating
>> interrupts during phy_stop() if none of the changing conditions for the
>> HW have changed, come to think about it, I doubt any PHY would be
>> capable of doing something like that
>>
>> - there is an expectation from your driver to have adjust_link() run
>> sometime during ndo_stop() to do something, but why?
>>
>> What is special about nb8800 that interrupts should be generated during
>> ndo_stop(), and why do you think this is going to solve your problem?
>>
>>>
>>> Looking at bcm_enet_stop() it calls phy_stop() and
>>> phy_disconnect() just like the nb8800 driver...
>>>
>>> I'm stumped.
>>
>> My suggestion of not using phy_stop() was not a good one, but
>> functionally there is little difference in doing phy_stop() +
>> phy_disconnect() or just phy_disconnect() alone. What matters is that we
>> restart the PHY properly when phy_connect() or phy_start() is called.
>>
>> What I understand now from your "bug report" is that you want
>> adjust_link to run with phydev->link = 0 to do something during
>> ndo_close() but you have not explained why, but I suspect such that upon
>> ndo_open() things work again.
>>
>> What I suspect your bug is, is that the really was no change in link
>> status, no interrupt was generated because there should not be one, yet
>> some of what nb8800_stop() does is not properly balanced by
>> nb8800_open(). How about the following patch:
>>
>> diff --git a/drivers/net/ethernet/aurora/nb8800.c
>> b/drivers/net/ethernet/aurora/nb8800.c
>> index 041cfb7952f8..b07dea3ab019 100644
>> --- a/drivers/net/ethernet/aurora/nb8800.c
>> +++ b/drivers/net/ethernet/aurora/nb8800.c
>> @@ -972,6 +972,10 @@ static int nb8800_open(struct net_device *dev)
>> nb8800_mac_rx(dev, true);
>> nb8800_mac_tx(dev, true);
>>
>> +   priv->speed = -1;
>> +   priv->link = -1;
>> +   priv->duplex = -1;
>> +
>> phydev = of_phy_connect(dev, priv->phy_node,
>> nb8800_link_reconfigure, 0,
>> priv->phy_mode);
>>
> 
> I will test your two patches ASAP.
> 
> For the record, I have two (maybe related) issues:
> 
> 1) After a sequence of
> ip set link dev eth0 up
> ip set link dev eth0 down
> ip set link dev eth0 up
> RX engine is hosed, thus network connectivity is borked.
> The work-around is resetting the MAC in ndo_open
> => mac_init in my proposed patch.
> This is by far the biggest issue.
> Also, resetting the MAC in ndo_open will make it easy
> to implement suspend/resume, as I can just ndo_stop
> in suspend, and ndo_open in resume.

OK.

> 
> 2) The system does not print "Link down" when I run
> 'ip set link dev eth0 down'
> This is a symptom of nb8800_link_reconfigure()
> not being called at all (or being called
> with phydev->link == priv->link when I tried
> skipping phy_stop)

Correct, and there is actually a timing hazard that I could also
reproduce here in that phy_stop() + phy_disconnect(), will try to cook a
patch fixing that as well, since that seems highly undesirable.

> 
> Again, thanks for your patches and suggestions,
> which

Re: [PATCH net-next 10/18] net: mvpp2: use the GoP interrupt for link status changes

2017-07-24 Thread Marcin Wojtas

Hi Antoine,

This patch requires also:

diff --git a/drivers/net/ethernet/marvell/mvpp2.c
b/drivers/net/ethernet/marvell/mvpp2.c
index 4694d4f..369819f 100644
--- a/drivers/net/ethernet/marvell/mvpp2.c
+++ b/drivers/net/ethernet/marvell/mvpp2.c
@@ -6625,6 +6625,7 @@ static int mvpp2_stop(struct net_device *dev)
 {
struct mvpp2_port *port = netdev_priv(dev);
struct mvpp2_port_pcpu *port_pcpu;
+   struct mvpp2 *priv = port->priv;
int cpu;

mvpp2_stop_dev(port);
@@ -6633,6 +6634,10 @@ static int mvpp2_stop(struct net_device *dev)
/* Mask interrupts on all CPUs */
on_each_cpu(mvpp2_interrupts_mask, port, 1);

+   if (priv->hw_version == MVPP22 && !port->phy_node && port->link_irq) {
+   free_irq(port->link_irq, port);
+   }
+
free_irq(port->irq, port);
for_each_present_cpu(cpu) {
port_pcpu = per_cpu_ptr(port->pcpu, cpu);

Otherwise a sequence: ifconfig up/down/up results in faults.

Best regards,
Marcin

2017-07-24 15:48 GMT+02:00 Antoine Tenart :
> This patch adds the GoP link interrupt support for when a port isn't
> connected to a PHY. Because of this the phylib callback is never called
> and the link status management isn't done. This patch use the GoP link
> interrupt in such cases to still have a minimal link management. Without
> this patch ports not connected to a PHY cannot work.
>
> Signed-off-by: Antoine Tenart 
> ---
>  drivers/net/ethernet/marvell/mvpp2.c | 157 
> ++-
>  1 file changed, 154 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/ethernet/marvell/mvpp2.c 
> b/drivers/net/ethernet/marvell/mvpp2.c
> index 77eef2cc40a1..33a7eb834855 100644
> --- a/drivers/net/ethernet/marvell/mvpp2.c
> +++ b/drivers/net/ethernet/marvell/mvpp2.c
> @@ -339,16 +339,24 @@
>  #define MVPP2_GMAC_FLOW_CTRL_AUTONEG   BIT(11)
>  #define MVPP2_GMAC_CONFIG_FULL_DUPLEX  BIT(12)
>  #define MVPP2_GMAC_AN_DUPLEX_ENBIT(13)
> +#define MVPP2_GMAC_STATUS0 0x10
> +#define MVPP2_GMAC_STATUS0_LINK_UP BIT(0)
>  #define MVPP2_GMAC_PORT_FIFO_CFG_1_REG 0x1c
>  #define MVPP2_GMAC_TX_FIFO_MIN_TH_OFFS 6
>  #define MVPP2_GMAC_TX_FIFO_MIN_TH_ALL_MASK 0x1fc0
>  #define MVPP2_GMAC_TX_FIFO_MIN_TH_MASK(v)  (((v) << 6) & \
> MVPP2_GMAC_TX_FIFO_MIN_TH_ALL_MASK)
> +#define MVPP22_GMAC_INT_STAT   0x20
> +#define MVPP22_GMAC_INT_STAT_LINK  BIT(1)
> +#define MVPP22_GMAC_INT_MASK   0x24
> +#define MVPP22_GMAC_INT_MASK_LINK_STAT BIT(1)
>  #define MVPP22_GMAC_CTRL_4_REG 0x90
>  #define MVPP22_CTRL4_EXT_PIN_GMII_SEL  BIT(0)
>  #define MVPP22_CTRL4_DP_CLK_SELBIT(5)
>  #define MVPP22_CTRL4_SYNC_BYPASS_DIS   BIT(6)
>  #define MVPP22_CTRL4_QSGMII_BYPASS_ACTIVE  BIT(7)
> +#define MVPP22_GMAC_INT_SUM_MASK   0xa4
> +#define MVPP22_GMAC_INT_SUM_MASK_LINK_STAT BIT(1)
>
>  /* Per-port XGMAC registers. PPv2.2 only, only for GOP port 0,
>   * relative to port->base.
> @@ -358,12 +366,19 @@
>  #define MVPP22_XLG_CTRL0_MAC_RESET_DIS BIT(1)
>  #define MVPP22_XLG_CTRL0_RX_FLOW_CTRL_EN   BIT(7)
>  #define MVPP22_XLG_CTRL0_MIB_CNT_DIS   BIT(14)
> -
> +#define MVPP22_XLG_STATUS  0x10c
> +#define MVPP22_XLG_STATUS_LINK_UP  BIT(0)
> +#define MVPP22_XLG_INT_STAT0x114
> +#define MVPP22_XLG_INT_STAT_LINK   BIT(1)
> +#define MVPP22_XLG_INT_MASK0x118
> +#define MVPP22_XLG_INT_MASK_LINK   BIT(1)
>  #define MVPP22_XLG_CTRL3_REG   0x11c
>  #define MVPP22_XLG_CTRL3_MACMODESELECT_MASK(7 << 13)
>  #define MVPP22_XLG_CTRL3_MACMODESELECT_GMAC(0 << 13)
>  #define MVPP22_XLG_CTRL3_MACMODESELECT_10G (1 << 13)
> -
> +#define MVPP22_XLG_EXT_INT_MASK0x15c
> +#define MVPP22_XLG_EXT_INT_MASK_XLGBIT(1)
> +#define MVPP22_XLG_EXT_INT_MASK_GIGBIT(2)
>  #define MVPP22_XLG_CTRL4_REG   0x184
>  #define MVPP22_XLG_CTRL4_FWD_FCBIT(5)
>  #define MVPP22_XLG_CTRL4_FWD_PFC   BIT(6)
> @@ -814,6 +829,7 @@ struct mvpp2_port {
> int gop_id;
>
> int irq;
> +   int link_irq;
>
> struct mvpp2 *priv;
>
> @@ -4330,6 +4346,63 @@ static int mvpp22_gop_init(struct mvpp2_port *port)
> return -EINVAL;
>  }
>
> +static void mvpp22_gop_unmask_irq(struct mvpp2_port *port)
> +{
> +   u32 val;
> +
> +   if (port->phy_interface == PHY_INTERFACE_MODE_RGMII ||
> +   port->phy_interface == PHY_INTERFACE_MODE_SGMII) {
> +   /* Enable the GMAC link status irq for this port */
> +   val = readl(port->base + MVPP22_GMAC_INT_SUM_MASK);
> +   val |= MVPP22_GMAC_INT_SUM_MASK_LINK_STAT;
> +

Re: Problem with PHY state machine when using interrupts

2017-07-24 Thread Mason

On 25/07/2017 00:36, Florian Fainelli wrote:
> On 07/24/2017 02:20 PM, Mason wrote:
>> On 24/07/2017 21:53, Florian Fainelli wrote:
>>
>>> Well now that I see the possible interrupts generated, I indeed don't
>>> see how you can get a link down notification unless you somehow force
>>> the link down yourself, which would certainly happen in phy_suspend()
>>> when we set BMCR.pwrdwn, but that may be too late.
>>>
>>> You should still expect the adjust_link() function to be called though
>>> with PHY_HALTED being set and that takes care of doing phydev->link = 0
>>> and netif_carrier_off(). If that still does not work, then see whether
>>> removing the call to phy_stop() does help (it really should).
>>
>> The only functions setting phydev->state to PHY_HALTED
>> are phy_error() and phy_stop() AFAICT.
>>
>> I am aware that when phy_state_machine() handles the
>> PHY_HALTED state, it will set phydev->link = 0;
>> and call netif_carrier_off() -- because that's where
>> I copied that code from.
>>
>> My issue is that phy_state_machine() does not run when
>> I run 'ip set link dev eth0 down' from the command line.
> 
> Yes, that much is clear, which is why I suggested earlier you try the
> patch at the end now.
> 
>>
>> If I'm reading the code right, phy_disconnect() actually
>> stops the state machine.
>>
>> In interrupt mode, phy_state_machine() doesn't run
>> because no interrupt is generated.
>>
>> In polling mode, phy_state_machine() doesn't run
>> because phy_disconnect() stops the state machine.
>>
>> Introducing a sleep before phy_disconnect() gives
>> the state machine a chance to run in polling mode,
>> but it doesn't feel right, and doesn't fix the
>> other mode, which I'm using.
> 
> There are several problems it seems:
> 
> - the PHY state machine cannot move solely based on the PHY generating
> interrupts during phy_stop() if none of the changing conditions for the
> HW have changed, come to think about it, I doubt any PHY would be
> capable of doing something like that
> 
> - there is an expectation from your driver to have adjust_link() run
> sometime during ndo_stop() to do something, but why?
> 
> What is special about nb8800 that interrupts should be generated during
> ndo_stop(), and why do you think this is going to solve your problem?
> 
>>
>> Looking at bcm_enet_stop() it calls phy_stop() and
>> phy_disconnect() just like the nb8800 driver...
>>
>> I'm stumped.
> 
> My suggestion of not using phy_stop() was not a good one, but
> functionally there is little difference in doing phy_stop() +
> phy_disconnect() or just phy_disconnect() alone. What matters is that we
> restart the PHY properly when phy_connect() or phy_start() is called.
> 
> What I understand now from your "bug report" is that you want
> adjust_link to run with phydev->link = 0 to do something during
> ndo_close() but you have not explained why, but I suspect such that upon
> ndo_open() things work again.
> 
> What I suspect your bug is, is that the really was no change in link
> status, no interrupt was generated because there should not be one, yet
> some of what nb8800_stop() does is not properly balanced by
> nb8800_open(). How about the following patch:
> 
> diff --git a/drivers/net/ethernet/aurora/nb8800.c
> b/drivers/net/ethernet/aurora/nb8800.c
> index 041cfb7952f8..b07dea3ab019 100644
> --- a/drivers/net/ethernet/aurora/nb8800.c
> +++ b/drivers/net/ethernet/aurora/nb8800.c
> @@ -972,6 +972,10 @@ static int nb8800_open(struct net_device *dev)
> nb8800_mac_rx(dev, true);
> nb8800_mac_tx(dev, true);
> 
> +   priv->speed = -1;
> +   priv->link = -1;
> +   priv->duplex = -1;
> +
> phydev = of_phy_connect(dev, priv->phy_node,
> nb8800_link_reconfigure, 0,
> priv->phy_mode);
> 

I will test your two patches ASAP.

For the record, I have two (maybe related) issues:

1) After a sequence of
ip set link dev eth0 up
ip set link dev eth0 down
ip set link dev eth0 up
RX engine is hosed, thus network connectivity is borked.
The work-around is resetting the MAC in ndo_open
=> mac_init in my proposed patch.
This is by far the biggest issue.
Also, resetting the MAC in ndo_open will make it easy
to implement suspend/resume, as I can just ndo_stop
in suspend, and ndo_open in resume.

2) The system does not print "Link down" when I run
'ip set link dev eth0 down'
This is a symptom of nb8800_link_reconfigure()
not being called at all (or being called
with phydev->link == priv->link when I tried
skipping phy_stop)

Again, thanks for your patches and suggestions,
which I will test in the morning.

I will also try to understand why I didn't have
these issues on the other board...

Regards.

[PATCH v2 1/3] can: m_can: Make hclk optional

2017-07-24 Thread Franklin S Cooper Jr

Hclk is the MCAN's interface clock. However, for OMAP based devices such as
DRA7 SoC family the interface clock is handled by hwmod. Therefore, this
interface clock is managed by hwmod driver via pm_runtime_get and
pm_runtime_put calls. Therefore, this interface clock isn't defined in DT
and thus the driver shouldn't fail if this clock isn't found.

Signed-off-by: Franklin S Cooper Jr 
---
Version 2 changes:
Used NULL instead of 0 for unused hclk handle

 drivers/net/can/m_can/m_can.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/net/can/m_can/m_can.c b/drivers/net/can/m_can/m_can.c
index f4947a7..ea48e59 100644
--- a/drivers/net/can/m_can/m_can.c
+++ b/drivers/net/can/m_can/m_can.c
@@ -1568,8 +1568,13 @@ static int m_can_plat_probe(struct platform_device *pdev)
hclk = devm_clk_get(&pdev->dev, "hclk");
cclk = devm_clk_get(&pdev->dev, "cclk");
 
-   if (IS_ERR(hclk) || IS_ERR(cclk)) {
-   dev_err(&pdev->dev, "no clock found\n");
+   if (IS_ERR(hclk)) {
+   dev_warn(&pdev->dev, "hclk could not be found\n");
+   hclk = NULL;
+   }
+
+   if (IS_ERR(cclk)) {
+   dev_err(&pdev->dev, "cclk could not be found\n");
ret = -ENODEV;
goto failed_ret;
}
-- 
2.10.0

[PATCH v2 0/3] can: m_can: Add PM Runtime Support

2017-07-24 Thread Franklin S Cooper Jr

Add PM runtime support to the MCAN driver. To support devices that don't
use PM runtime leave the original clk calls in the driver. Perhaps in the
future when it makes sense we can remove these non pm runtime clk calls.

Version 2 changes:
Used NULL instead of 0 for unused hclk handle

Franklin S Cooper Jr (3):
  can: m_can: Make hclk optional
  can: m_can: Update documentation to indicate that hclk may be optional
  can: m_can: Add PM Runtime

 .../devicetree/bindings/net/can/m_can.txt  |  3 ++-
 drivers/net/can/m_can/m_can.c  | 22 --
 2 files changed, 22 insertions(+), 3 deletions(-)

-- 
2.10.0

[PATCH v2 3/3] can: m_can: Add PM Runtime

2017-07-24 Thread Franklin S Cooper Jr

Add support for PM Runtime which is the new way to handle managing clocks.
However, to avoid breaking SoCs not using PM_RUNTIME leave the old clk
management approach in place.

PM_RUNTIME is required by OMAP based devices to handle clock management.
Therefore, this allows future Texas Instruments SoCs that have the MCAN IP
to work with this driver.

Signed-off-by: Franklin S Cooper Jr 
---
 drivers/net/can/m_can/m_can.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/drivers/net/can/m_can/m_can.c b/drivers/net/can/m_can/m_can.c
index ea48e59..93e23f5 100644
--- a/drivers/net/can/m_can/m_can.c
+++ b/drivers/net/can/m_can/m_can.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -633,11 +634,15 @@ static int m_can_clk_start(struct m_can_priv *priv)
if (err)
clk_disable_unprepare(priv->hclk);
 
+   pm_runtime_get_sync(priv->device);
+
return err;
 }
 
 static void m_can_clk_stop(struct m_can_priv *priv)
 {
+   pm_runtime_put_sync(priv->device);
+
clk_disable_unprepare(priv->cclk);
clk_disable_unprepare(priv->hclk);
 }
@@ -1582,6 +1587,8 @@ static int m_can_plat_probe(struct platform_device *pdev)
/* Enable clocks. Necessary to read Core Release in order to determine
 * M_CAN version
 */
+   pm_runtime_enable(&pdev->dev);
+
ret = clk_prepare_enable(hclk);
if (ret)
goto disable_hclk_ret;
@@ -1626,6 +1633,8 @@ static int m_can_plat_probe(struct platform_device *pdev)
 */
tx_fifo_size = mram_config_vals[7];
 
+   pm_runtime_get_sync(&pdev->dev);
+
/* allocate the m_can device */
dev = alloc_m_can_dev(pdev, addr, tx_fifo_size);
if (!dev) {
@@ -1670,6 +1679,7 @@ static int m_can_plat_probe(struct platform_device *pdev)
 disable_hclk_ret:
clk_disable_unprepare(hclk);
 failed_ret:
+   pm_runtime_put_sync(&pdev->dev);
return ret;
 }
 
@@ -1726,6 +1736,9 @@ static int m_can_plat_remove(struct platform_device *pdev)
struct net_device *dev = platform_get_drvdata(pdev);
 
unregister_m_can_dev(dev);
+
+   pm_runtime_disable(&pdev->dev);
+
platform_set_drvdata(pdev, NULL);
 
free_m_can_dev(dev);
-- 
2.10.0

[PATCH v2 2/3] can: m_can: Update documentation to indicate that hclk may be optional

2017-07-24 Thread Franklin S Cooper Jr

Update the documentation to reflect that hclk is now an optional clock.

Signed-off-by: Franklin S Cooper Jr 
Acked-by: Rob Herring 
---
 Documentation/devicetree/bindings/net/can/m_can.txt | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/net/can/m_can.txt 
b/Documentation/devicetree/bindings/net/can/m_can.txt
index 9e33177..2a0fe5b 100644
--- a/Documentation/devicetree/bindings/net/can/m_can.txt
+++ b/Documentation/devicetree/bindings/net/can/m_can.txt
@@ -12,7 +12,8 @@ Required properties:
 - interrupt-names  : Should contain "int0" and "int1"
 - clocks   : Clocks used by controller, should be host clock
  and CAN clock.
-- clock-names  : Should contain "hclk" and "cclk"
+- clock-names  : Should contain "hclk" and "cclk". For some socs hclk
+ may be optional.
 - pinctrl-  : Pinctrl states as described in 
bindings/pinctrl/pinctrl-bindings.txt
 - pinctrl-names: Names corresponding to the numbered pinctrl states
 - bosch,mram-cfg   : Message RAM configuration data.
-- 
2.10.0

Re: Problem with PHY state machine when using interrupts

2017-07-24 Thread Florian Fainelli

On 07/24/2017 03:36 PM, Florian Fainelli wrote:
> On 07/24/2017 02:20 PM, Mason wrote:
>> On 24/07/2017 21:53, Florian Fainelli wrote:
>>
>>> Well now that I see the possible interrupts generated, I indeed don't
>>> see how you can get a link down notification unless you somehow force
>>> the link down yourself, which would certainly happen in phy_suspend()
>>> when we set BMCR.pwrdwn, but that may be too late.
>>>
>>> You should still expect the adjust_link() function to be called though
>>> with PHY_HALTED being set and that takes care of doing phydev->link = 0
>>> and netif_carrier_off(). If that still does not work, then see whether
>>> removing the call to phy_stop() does help (it really should).
>>
>> The only functions setting phydev->state to PHY_HALTED
>> are phy_error() and phy_stop() AFAICT.
>>
>> I am aware that when phy_state_machine() handles the
>> PHY_HALTED state, it will set phydev->link = 0;
>> and call netif_carrier_off() -- because that's where
>> I copied that code from.
>>
>> My issue is that phy_state_machine() does not run when
>> I run 'ip set link dev eth0 down' from the command line.
> 
> Yes, that much is clear, which is why I suggested earlier you try the
> patch at the end now.

This sentence was referring to this patch that I changed mid way through
this email:

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index d0626bf5c540..652e24b53f3f 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -968,6 +968,8 @@ void phy_stop(struct phy_device *phydev)
 * of rtnl_lock(), but PHY_HALTED shall guarantee phy_change()
 * will not reenable interrupts.
 */
+   if (phy_interrupt_is_valid(phydev))
+   phy_change(phydev);
 }
 EXPORT_SYMBOL(phy_stop);
-- 
Florian

Re: Problem with PHY state machine when using interrupts

2017-07-24 Thread Florian Fainelli

On 07/24/2017 02:20 PM, Mason wrote:
> On 24/07/2017 21:53, Florian Fainelli wrote:
> 
>> Well now that I see the possible interrupts generated, I indeed don't
>> see how you can get a link down notification unless you somehow force
>> the link down yourself, which would certainly happen in phy_suspend()
>> when we set BMCR.pwrdwn, but that may be too late.
>>
>> You should still expect the adjust_link() function to be called though
>> with PHY_HALTED being set and that takes care of doing phydev->link = 0
>> and netif_carrier_off(). If that still does not work, then see whether
>> removing the call to phy_stop() does help (it really should).
> 
> The only functions setting phydev->state to PHY_HALTED
> are phy_error() and phy_stop() AFAICT.
> 
> I am aware that when phy_state_machine() handles the
> PHY_HALTED state, it will set phydev->link = 0;
> and call netif_carrier_off() -- because that's where
> I copied that code from.
> 
> My issue is that phy_state_machine() does not run when
> I run 'ip set link dev eth0 down' from the command line.

Yes, that much is clear, which is why I suggested earlier you try the
patch at the end now.

> 
> If I'm reading the code right, phy_disconnect() actually
> stops the state machine.
> 
> In interrupt mode, phy_state_machine() doesn't run
> because no interrupt is generated.
> 
> In polling mode, phy_state_machine() doesn't run
> because phy_disconnect() stops the state machine.
> 
> Introducing a sleep before phy_disconnect() gives
> the state machine a chance to run in polling mode,
> but it doesn't feel right, and doesn't fix the
> other mode, which I'm using.

There are several problems it seems:

- the PHY state machine cannot move solely based on the PHY generating
interrupts during phy_stop() if none of the changing conditions for the
HW have changed, come to think about it, I doubt any PHY would be
capable of doing something like that

- there is an expectation from your driver to have adjust_link() run
sometime during ndo_stop() to do something, but why?

What is special about nb8800 that interrupts should be generated during
ndo_stop(), and why do you think this is going to solve your problem?

> 
> Looking at bcm_enet_stop() it calls phy_stop() and
> phy_disconnect() just like the nb8800 driver...
> 
> I'm stumped.

My suggestion of not using phy_stop() was not a good one, but
functionally there is little difference in doing phy_stop() +
phy_disconnect() or just phy_disconnect() alone. What matters is that we
restart the PHY properly when phy_connect() or phy_start() is called.

What I understand now from your "bug report" is that you want
adjust_link to run with phydev->link = 0 to do something during
ndo_close() but you have not explained why, but I suspect such that upon
ndo_open() things work again.

What I suspect your bug is, is that the really was no change in link
status, no interrupt was generated because there should not be one, yet
some of what nb8800_stop() does is not properly balanced by
nb8800_open(). How about the following patch:

diff --git a/drivers/net/ethernet/aurora/nb8800.c
b/drivers/net/ethernet/aurora/nb8800.c
index 041cfb7952f8..b07dea3ab019 100644
--- a/drivers/net/ethernet/aurora/nb8800.c
+++ b/drivers/net/ethernet/aurora/nb8800.c
@@ -972,6 +972,10 @@ static int nb8800_open(struct net_device *dev)
nb8800_mac_rx(dev, true);
nb8800_mac_tx(dev, true);

+   priv->speed = -1;
+   priv->link = -1;
+   priv->duplex = -1;
+
phydev = of_phy_connect(dev, priv->phy_node,
nb8800_link_reconfigure, 0,
priv->phy_mode);
-- 
Florian

Re: [PATCH 2/2] net: ethernet: nb8800: Fix RGMII TX clock delay setup

2017-07-24 Thread Mason

On 24/07/2017 23:49, Florian Fainelli wrote:
> On 07/24/2017 02:21 PM, Mason wrote:
>> On 20/07/2017 14:33, Mason wrote:
>>
>>> As [Florian] pointed out, the spec states that the
>>> "Data to Clock input Skew (at Receiver)"
>>> must be within [ 1.0, 2.6 ] ns.
>>>
>>> I understand that 2 ns is 1/4 of a 125 MHz period,
>>> but it's not clear to me why the above interval is
>>> centered at 1.8 instead of 2.0 ns.
>>>
>>> Also, the AR8035 PHY offers 4 possible TX clock delays:
>>> { 0.25, 1.3, 2.4, 3.4 } according to their doc.
>>> The two extremes are outside the interval, when would
>>> they be useful? In case the transmitter adds "bad" skew?
>>>
>>> Why doesn't the PHY support 1.8/2.0? Is it perhaps
>>> unable to, because of PLL limitations?
>>
>> I haven't yet found answers for these questions.
>>
>> - Why is the interval centered at 1.8 instead of 2.0 ns?
> 
> Presumably because this is almost the middle of the available range and
> it still provides a value that is within the specification...

I was talking about the RGMII spec.
If - theoretically - the best results are achieved
by having a 2 ns skew between clock and data,
it seems odd for the RGMII spec to define an
interval of [ 1.0, 2.6 ] ns for acceptable values.
I would have expected [ 1.2, 2.8 ] ns.

>> - What use are 0.25 ns and 3.4 ns skew?
> 
> Accounting for extreme PCB traces lengths possibly, or just exposing the
> raw values that the HW supports by increments of 0.25 ns, just because
> the HW supports it.

The AR8035 doesn't support increments of 0.25 ns,
it supports just 4 values: 0.25, 1.3, 2.4, 3.4
Two of which are outside the acceptable range
defined in the RGMII spec. Odd.
Giving it more thought, I don't think trace length
factors in, unless the data and clock lines have
very different length (signal propagation).

>> - Why doesn't the PHY support a "recommended" value like 1.8 ns?
>>
>> Does anyone have pointers to good resources?
> 
> The PHY datasheet and the RGMII specification really ought to be the
> starting points, there is not much more to it. Maybe go ask your support
> person at Qualcomm/Atheros about their PHY design?

Sadly, I rarely have access to support for the blocks
we use. I had to download the datasheet off the internet.
But I was only asking out of personal curiosity, since
this is outside my field. I don't think any customer
has complained about the default settings.

Regards.

Re: [PATCH iproute2 net-next] bridge: this patch adds json support for bridge mdb show

2017-07-24 Thread Nikhil Gajendrakumar

Hello Stephen, 

Thanks you.
I will send an incremental patch with updated usage message and man page.

-Nikhil

> On Jul 18, 2017, at 5:33 PM, Stephen Hemminger  
> wrote:
> 
> On Fri,  7 Jul 2017 15:24:16 -0700
> Roopa Prabhu  wrote:
> 
>> From: Nikhil Gajendrakumar 
>> 
>> This patch adds json output to bridge mdb show
>> 
>> Normal Output:
>> $ bridge -d -s mdb show
>> dev br0 port swp3 grp 239.0.0.1 temp  vid 128 172.26
>> dev br0 port swp3 grp 239.0.0.1 temp  vid 64 172.26
>> dev br0 port swp2 grp 239.0.0.2 temp  vid 1024 172.26
>> dev br0 port swp2 grp 239.0.0.2 temp  vid 256 172.26
>> dev br0 port swp2 grp 239.0.0.2 temp  vid 1 172.26
>> dev br0 port swp3 grp 239.0.0.1 temp  vid 1 172.26
>> router ports on br0: swp40.00 permanent
>> router ports on br0: swp50.00 permanent
>> 
>> Json Output:
>> $ bridge -d -s -j mdb show
>> {
>>"mdb": [{
>>"dev": "br0",
>>"port": "swp3",
>>"grp": "239.0.0.1",
>>"state": "temp",
>>"vid": 128,
>>"timer": " 166.74"
>>},{
>>"dev": "br0",
>>"port": "swp3",
>>"grp": "239.0.0.1",
>>"state": "temp",
>>"vid": 64,
>>"timer": " 166.74"
>>},{
>>"dev": "br0",
>>"port": "swp2",
>>"grp": "239.0.0.2",
>>"state": "temp",
>>"vid": 1024,
>>"timer": " 166.74"
>>},{
>>"dev": "br0",
>>"port": "swp2",
>>"grp": "239.0.0.2",
>>"state": "temp",
>>"vid": 256,
>>"timer": " 166.74"
>>},{
>>"dev": "br0",
>>"port": "swp2",
>>"grp": "239.0.0.2",
>>"state": "temp",
>>"vid": 1,
>>"timer": " 166.74"
>>},{
>>"dev": "br0",
>>"port": "swp3",
>>"grp": "239.0.0.1",
>>"state": "temp",
>>"vid": 1,
>>"timer": " 166.74"
>>}
>>],
>>"router": {
>>"br0": [{
>>"port": "swp4",
>>"timer": "   0.00",
>>"type": "permanent"
>>},{
>>"port": "swp5",
>>"timer": "   0.00",
>>"type": "permanent"
>>}
>>]
>>}
>> }
>> 
>> Signed-off-by: Nikhil Gajendrakumar 
>> Signed-off-by: Roopa Prabhu 
> 
> Applied, you should also update usage message and man page.

[PATCH] mwifiex: usb: fix spelling mistake: "aggreataon"-> "aggregation"

2017-07-24 Thread Colin King

From: Colin Ian King 

Trivial fix to spelling mistake in aggr_ctrl module parameter
message text.

Signed-off-by: Colin Ian King 
---
 drivers/net/wireless/marvell/mwifiex/main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/marvell/mwifiex/main.c 
b/drivers/net/wireless/marvell/mwifiex/main.c
index f2600b827e81..ebe2c9319948 100644
--- a/drivers/net/wireless/marvell/mwifiex/main.c
+++ b/drivers/net/wireless/marvell/mwifiex/main.c
@@ -46,7 +46,7 @@ MODULE_PARM_DESC(mfg_mode, "manufacturing mode enable:1, 
disable:0");
 
 bool aggr_ctrl;
 module_param(aggr_ctrl, bool, );
-MODULE_PARM_DESC(aggr_ctrl, "usb tx aggreataon enable:1, disable:0");
+MODULE_PARM_DESC(aggr_ctrl, "usb tx aggregation enable:1, disable:0");
 
 /*
  * This function registers the device and performs all the necessary
-- 
2.11.0

Re: commit 16ecba59 breaks 82574L under heavy load.

2017-07-24 Thread Philip Prindeville


> On Jul 20, 2017, at 5:44 PM, Benjamin Poirier  wrote:
> 
> [snip]
> Could you please test the following patch and let me know if it:
> 1) reduces the interrupt rate of the Other msi-x vector
> 2) avoids the link flaps
> or
> 3) logs some dmesg warnings of the form "Other interrupt with unhandled [...]"
> In this case, please paste icr values printed.
> 
> Thanks
> 
> diff --git a/drivers/net/ethernet/intel/e1000e/defines.h 
> b/drivers/net/ethernet/intel/e1000e/defines.h
> index 0641c0098738..afb7ebe20b24 100644
> --- a/drivers/net/ethernet/intel/e1000e/defines.h
> +++ b/drivers/net/ethernet/intel/e1000e/defines.h
> @@ -398,6 +398,7 @@
> #define E1000_ICR_LSC   0x0004 /* Link Status Change */
> #define E1000_ICR_RXSEQ 0x0008 /* Rx sequence error */
> #define E1000_ICR_RXDMT00x0010 /* Rx desc min. threshold (0) */
> +#define E1000_ICR_RXO   0x0040 /* Receiver Overrun */
> #define E1000_ICR_RXT0  0x0080 /* Rx timer intr (ring 0) */
> #define E1000_ICR_ECCER 0x0040 /* Uncorrectable ECC Error */
> /* If this bit asserted, the driver should claim the interrupt */
> diff --git a/drivers/net/ethernet/intel/e1000e/e1000.h 
> b/drivers/net/ethernet/intel/e1000e/e1000.h
> index c7c994eb410e..f7b46eba3efb 100644
> --- a/drivers/net/ethernet/intel/e1000e/e1000.h
> +++ b/drivers/net/ethernet/intel/e1000e/e1000.h
> @@ -351,6 +351,10 @@ struct e1000_adapter {
>   s32 ptp_delta;
> 
>   u16 eee_advert;
> +
> + unsigned int uh_count;
> + u32 uh_values[16];
> + unsigned int uh_values_nb;
> };
> 
> struct e1000_info {
> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c 
> b/drivers/net/ethernet/intel/e1000e/netdev.c
> index b3679728caac..46697338c0e1 100644
> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
> @@ -46,6 +46,8 @@
> 
> #include "e1000.h"
> 
> +DEFINE_RATELIMIT_STATE(other_uh_ratelimit_state, HZ, 1);
> +
> #define DRV_EXTRAVERSION "-k"
> 
> #define DRV_VERSION "3.2.6" DRV_EXTRAVERSION
> @@ -1904,12 +1906,60 @@ static irqreturn_t e1000_msix_other(int 
> __always_unused irq, void *data)
>   struct net_device *netdev = data;
>   struct e1000_adapter *adapter = netdev_priv(netdev);
>   struct e1000_hw *hw = &adapter->hw;
> + u32 icr;
> + bool enable = true;
> + bool handled = false;
> + unsigned int i;
> 
> - hw->mac.get_link_status = true;
> + icr = er32(ICR);
> + if (icr & E1000_ICR_RXO) {
> + ew32(ICR, E1000_ICR_RXO);
> + enable = false;
> + /* napi poll will re-enable Other, make sure it runs */
> + if (napi_schedule_prep(&adapter->napi)) {
> + adapter->total_rx_bytes = 0;
> + adapter->total_rx_packets = 0;
> + __napi_schedule(&adapter->napi);
> + }
> + handled = true;
> + }
> + if (icr & E1000_ICR_LSC) {
> + ew32(ICR, E1000_ICR_LSC);
> + hw->mac.get_link_status = true;
> + /* guard against interrupt when we're going down */
> + if (!test_bit(__E1000_DOWN, &adapter->state)) {
> + mod_timer(&adapter->watchdog_timer, jiffies + 1);
> + }
> + handled = true;
> + }
> 
> - /* guard against interrupt when we're going down */
> - if (!test_bit(__E1000_DOWN, &adapter->state)) {
> - mod_timer(&adapter->watchdog_timer, jiffies + 1);
> + if (!handled) {
> + adapter->uh_count++;
> + /* only print unseen icr values */
> + if (adapter->uh_values_nb < ARRAY_SIZE(adapter->uh_values)) {
> + for (i = 0; i < ARRAY_SIZE(adapter->uh_values); i++) {
> + if (adapter->uh_values[i] == icr) {
> + break;
> + }
> + }
> + if (i == ARRAY_SIZE(adapter->uh_values)) {
> + adapter->uh_values[adapter->uh_values_nb] =
> + icr;
> + adapter->uh_values_nb++;
> + netdev_warn(netdev,
> + "Other interrupt with unhandled icr 
> 0x%08x\n",
> + icr);
> + }
> + }
> + }
> + if (adapter->uh_count && __ratelimit(&other_uh_ratelimit_state)) {
> + netdev_warn(netdev,
> + "Other interrupt with unhandled cause, count %u\n",
> + adapter->uh_count);
> + adapter->uh_count = 0;
> + }
> +
> + if (enable && !test_bit(__E1000_DOWN, &adapter->state)) {
>   ew32(IMS, E1000_IMS_OTHER);
>   }
> 
> @@ -2681,7 +2731,8 @@ static int e1000e_poll(struct napi_struct *napi, int 
> weight)
>   napi_complete_done(napi

Re: [PATCH 2/2] net: ethernet: nb8800: Fix RGMII TX clock delay setup

2017-07-24 Thread Florian Fainelli

On 07/24/2017 02:21 PM, Mason wrote:
> On 20/07/2017 14:33, Mason wrote:
> 
>> As [Florian] pointed out, the spec states that the
>> "Data to Clock input Skew (at Receiver)"
>> must be within [ 1.0, 2.6 ] ns.
>>
>> I understand that 2 ns is 1/4 of a 125 MHz period,
>> but it's not clear to me why the above interval is
>> centered at 1.8 instead of 2.0 ns.
>>
>> Also, the AR8035 PHY offers 4 possible TX clock delays:
>> { 0.25, 1.3, 2.4, 3.4 } according to their doc.
>> The two extremes are outside the interval, when would
>> they be useful? In case the transmitter adds "bad" skew?
>>
>> Why doesn't the PHY support 1.8/2.0? Is it perhaps
>> unable to, because of PLL limitations?
> 
> I haven't yet found answers for these questions.
> 
> - Why is the interval centered at 1.8 instead of 2.0 ns?

Presumably because this is almost the middle of the available range and
it still provides a value that is within the specification...

> - What use are 0.25 ns and 3.4 ns skew?

Accounting for extreme PCB traces lengths possibly, or just exposing the
raw values that the HW supports by increments of 0.25 ns, just because
the HW supports it.

> - Why doesn't the PHY support a "recommended" value like 1.8 ns?
> 
> Does anyone have pointers to good resources?

The PHY datasheet and the RGMII specification really ought to be the
starting points, there is not much more to it. Maybe go ask your support
person at Qualcomm/Atheros about their PHY design?
-- 
Florian

Re: [PATCH net-next V2 0/5] Refine virtio-net XDP

2017-07-24 Thread Michael S. Tsirkin

On Mon, Jul 24, 2017 at 01:38:59PM -0700, David Miller wrote:
> From: Jason Wang 
> Date: Wed, 19 Jul 2017 16:54:44 +0800
> 
> > Hi:
> > 
> > This series brings two optimizations for virtio-net XDP:
> > 
> > - avoid reset during XDP set
> > - turn off offloads on demand
> > 
> > Changes from V1:
> > - Various tweaks on commit logs and comments
> > - Use virtnet_napi_enable() when enabling NAPI on XDP set
> > - Copy the small buffer packet only if xdp_headroom is smaller than
> >   required
> > 
> > Please review.
> 
> Series applied, thanks Jason.
> 
> Michael, I gave you 4+ days to properly review this respin of this
> series.  I felt that was more than sufficient time, even with an
> intervening weekend.  So if you don't like that I applied this without
> your review, please be more punctual in the future.
> 
> Thank you.

My bad - reviewed when I was offline and forgot to send it out.

By luck this is not a big deal:

- Patches 1-4 are good.
- Patch 5 has some problems IMO, but all it does
is try to enable a previously non-working configuration, so
it seems ok to fix it up with follow-up patches as appropriate.

-- 
MST

Re: [PATCH net-next V2 5/5] virtio-net: switch off offloads on demand if possible on XDP set

2017-07-24 Thread Michael S. Tsirkin

On Wed, Jul 19, 2017 at 04:54:49PM +0800, Jason Wang wrote:
> Current XDP implementation wants guest offloads feature to be disabled
> on device. This is inconvenient and means guest can't benefit from
> offloads if XDP is not used. This patch tries to address this
> limitation by disabling the offloads on demand through control guest
> offloads. Guest offloads will be disabled and enabled on demand on XDP
> set.
> 
> Signed-off-by: Jason Wang 

Also looks like the comment from patch 4 needs to be extended
to cover LRO as cause of temporary packet scatter.

> ---
>  drivers/net/virtio_net.c | 70 
> 
>  1 file changed, 65 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index b3fc01d..5fbd15e 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -57,6 +57,11 @@ DECLARE_EWMA(pkt_len, 0, 64)
>  
>  #define VIRTNET_DRIVER_VERSION "1.0.0"
>  
> +const unsigned long guest_offloads[] = { VIRTIO_NET_F_GUEST_TSO4,
> +  VIRTIO_NET_F_GUEST_TSO6,
> +  VIRTIO_NET_F_GUEST_ECN,
> +  VIRTIO_NET_F_GUEST_UFO };
> +
>  struct virtnet_stats {
>   struct u64_stats_sync tx_syncp;
>   struct u64_stats_sync rx_syncp;
> @@ -164,10 +169,13 @@ struct virtnet_info {
>   u8 ctrl_promisc;
>   u8 ctrl_allmulti;
>   u16 ctrl_vid;
> + u64 ctrl_offloads;
>  
>   /* Ethtool settings */
>   u8 duplex;
>   u32 speed;
> +
> + unsigned long guest_offloads;
>  };
>  
>  struct padded_vnet_hdr {
> @@ -1896,6 +1904,47 @@ static int virtnet_restore_up(struct virtio_device 
> *vdev)
>   return err;
>  }
>  
> +static int virtnet_set_guest_offloads(struct virtnet_info *vi, u64 offloads)
> +{
> + struct scatterlist sg;
> + vi->ctrl_offloads = cpu_to_virtio64(vi->vdev, offloads);
> +
> + sg_init_one(&sg, &vi->ctrl_offloads, sizeof(vi->ctrl_offloads));
> +
> + if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_GUEST_OFFLOADS,
> +   VIRTIO_NET_CTRL_GUEST_OFFLOADS_SET, &sg)) {
> + dev_warn(&vi->dev->dev, "Fail to set guest offload. \n");
> + return -EINVAL;
> + }
> +
> + return 0;
> +}
> +
> +static int virtnet_clear_guest_offloads(struct virtnet_info *vi)
> +{
> + u64 offloads = 0;
> +
> + if (!vi->guest_offloads)
> + return 0;
> +
> + if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_CSUM))
> + offloads = 1ULL << VIRTIO_NET_F_GUEST_CSUM;
> +
> + return virtnet_set_guest_offloads(vi, offloads);
> +}
> +
> +static int virtnet_restore_guest_offloads(struct virtnet_info *vi)
> +{
> + u64 offloads = vi->guest_offloads;
> +
> + if (!vi->guest_offloads)
> + return 0;
> + if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_CSUM))
> + offloads |= 1ULL << VIRTIO_NET_F_GUEST_CSUM;
> +
> + return virtnet_set_guest_offloads(vi, offloads);
> +}
> +
>  static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
>  struct netlink_ext_ack *extack)
>  {
> @@ -1905,10 +1954,11 @@ static int virtnet_xdp_set(struct net_device *dev, 
> struct bpf_prog *prog,
>   u16 xdp_qp = 0, curr_qp;
>   int i, err;
>  
> - if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO4) ||
> - virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO6) ||
> - virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_ECN) ||
> - virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_UFO)) {
> + if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS)
> + && (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO4) ||
> + virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO6) ||
> + virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_ECN) ||
> + virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_UFO))) {
>   NL_SET_ERR_MSG_MOD(extack, "Can't set XDP while host is 
> implementing LRO, disable LRO first");
>   return -EOPNOTSUPP;
>   }
> @@ -1955,6 +2005,12 @@ static int virtnet_xdp_set(struct net_device *dev, 
> struct bpf_prog *prog,
>   for (i = 0; i < vi->max_queue_pairs; i++) {
>   old_prog = rtnl_dereference(vi->rq[i].xdp_prog);
>   rcu_assign_pointer(vi->rq[i].xdp_prog, prog);
> + if (i == 0) {
> + if (!old_prog)
> + virtnet_clear_guest_offloads(vi);
> + if (!prog)
> + virtnet_restore_guest_offloads(vi);
> + }
>   if (old_prog)
>   bpf_prog_put(old_prog);
>   virtnet_napi_enable(vi->rq[i].vq, &vi->rq[i].napi);
> @@ -2588,6 +2644,10 @@ static int virtnet_probe(struct virtio_device *vdev)
>   netif_carrier_on(dev);
>   }
>  
> + for (i = 0

Re: [PATCH net-next 5/5] virtio-net: switch off offloads on demand if possible on XDP set

2017-07-24 Thread Michael S. Tsirkin

On Wed, Jul 19, 2017 at 10:39:53AM +0800, Jason Wang wrote:
> 
> 
> On 2017年07月19日 04:07, Michael S. Tsirkin wrote:
> > On Mon, Jul 17, 2017 at 08:44:01PM +0800, Jason Wang wrote:
> > > Current XDP implementation want guest offloads feature to be disabled
> > s/want/wants/
> > 
> > > on qemu cli.
> > on the device.
> > 
> > > This is inconvenient and means guest can't benefit from
> > > offloads if XDP is not used. This patch tries to address this
> > > limitation by disable
> > disabling
> > 
> > > the offloads on demand through control guest
> > > offloads. Guest offloads will be disabled and enabled on demand on XDP
> > > set.
> > > 
> > > Signed-off-by: Jason Wang 
> > In fact, since we no longer reset when XDP is set,
> > here device might have offloads enabled, buffers are
> > used but not consumed, then XDP is set.
> > 
> > This can result in
> > - packet scattered across multiple buffers
> >(handled correctly but need to update the comment)
> 
> Ok.
> 
> > - packet may have VIRTIO_NET_HDR_F_NEEDS_CSUM, in that case
> >the spec says "The checksum on the packet is incomplete".
> >(probably needs to be handled by calculating the checksum).
> 
> That's an option. Maybe it's tricky but I was thinking whether or not we can
> just keep the CHECKSUM_PARTIAL here.

XDP programs do not expect this currently. As it's a temporary
condition, let's just fix it up.

> > 
> > 
> > Ideas for follow-up patches:
> > 
> > - skip looking at packet data completely
> >won't work if you play with checksums dynamically
> >but can be done if disabled on device
> > - allow ethtools to tweak offloads from userspace as well
> 
> Right.
> 
> Thanks
> 
> > 
> > > ---
> > >   drivers/net/virtio_net.c | 70 
> > > 
> > >   1 file changed, 65 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > index e732bd6..d970c2d 100644
> > > --- a/drivers/net/virtio_net.c
> > > +++ b/drivers/net/virtio_net.c
> > > @@ -57,6 +57,11 @@ DECLARE_EWMA(pkt_len, 0, 64)
> > >   #define VIRTNET_DRIVER_VERSION "1.0.0"
> > > +const unsigned long guest_offloads[] = { VIRTIO_NET_F_GUEST_TSO4,
> > > +  VIRTIO_NET_F_GUEST_TSO6,
> > > +  VIRTIO_NET_F_GUEST_ECN,
> > > +  VIRTIO_NET_F_GUEST_UFO };
> > > +
> > >   struct virtnet_stats {
> > >   struct u64_stats_sync tx_syncp;
> > >   struct u64_stats_sync rx_syncp;
> > > @@ -164,10 +169,13 @@ struct virtnet_info {
> > >   u8 ctrl_promisc;
> > >   u8 ctrl_allmulti;
> > >   u16 ctrl_vid;
> > > + u64 ctrl_offloads;
> > >   /* Ethtool settings */
> > >   u8 duplex;
> > >   u32 speed;
> > > +
> > > + unsigned long guest_offloads;
> > >   };
> > >   struct padded_vnet_hdr {
> > > @@ -1889,6 +1897,47 @@ static int virtnet_restore_up(struct virtio_device 
> > > *vdev)
> > >   return err;
> > >   }
> > > +static int virtnet_set_guest_offloads(struct virtnet_info *vi, u64 
> > > offloads)
> > > +{
> > > + struct scatterlist sg;
> > > + vi->ctrl_offloads = cpu_to_virtio64(vi->vdev, offloads);
> > > +
> > > + sg_init_one(&sg, &vi->ctrl_offloads, sizeof(vi->ctrl_offloads));
> > > +
> > > + if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_GUEST_OFFLOADS,
> > > +   VIRTIO_NET_CTRL_GUEST_OFFLOADS_SET, &sg)) {
> > > + dev_warn(&vi->dev->dev, "Fail to set guest offload. \n");
> > > + return -EINVAL;
> > > + }
> > > +
> > > + return 0;
> > > +}
> > > +
> > > +static int virtnet_clear_guest_offloads(struct virtnet_info *vi)
> > > +{
> > > + u64 offloads = 0;
> > > +
> > > + if (!vi->guest_offloads)
> > > + return 0;
> > > +
> > > + if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_CSUM))
> > > + offloads = 1ULL << VIRTIO_NET_F_GUEST_CSUM;
> > > +
> > > + return virtnet_set_guest_offloads(vi, offloads);
> > > +}
> > > +
> > > +static int virtnet_restore_guest_offloads(struct virtnet_info *vi)
> > > +{
> > > + u64 offloads = vi->guest_offloads;
> > > +
> > > + if (!vi->guest_offloads)
> > > + return 0;
> > > + if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_CSUM))
> > > + offloads |= 1ULL << VIRTIO_NET_F_GUEST_CSUM;
> > > +
> > > + return virtnet_set_guest_offloads(vi, offloads);
> > > +}
> > > +
> > >   static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog 
> > > *prog,
> > >  struct netlink_ext_ack *extack)
> > >   {
> > > @@ -1898,10 +1947,11 @@ static int virtnet_xdp_set(struct net_device 
> > > *dev, struct bpf_prog *prog,
> > >   u16 xdp_qp = 0, curr_qp;
> > >   int i, err;
> > > - if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO4) ||
> > > - virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO6) ||
> > > - virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_ECN) ||
> > > - virtio_has_feature(vi

Re: [PATCH net-next V2 5/5] virtio-net: switch off offloads on demand if possible on XDP set

2017-07-24 Thread Michael S. Tsirkin

On Wed, Jul 19, 2017 at 04:54:49PM +0800, Jason Wang wrote:
> Current XDP implementation wants guest offloads feature to be disabled
> on device. This is inconvenient and means guest can't benefit from
> offloads if XDP is not used. This patch tries to address this
> limitation by disabling the offloads on demand through control guest
> offloads. Guest offloads will be disabled and enabled on demand on XDP
> set.
> 
> Signed-off-by: Jason Wang 

Doesn't look like my comments on v1 were addressed.
Copying here:

In fact, since we no longer reset when XDP is set,
here device might have offloads enabled, buffers are
used but not consumed, then XDP is set.

This can result in
- packet scattered across multiple buffers
  (handled correctly but need to update the comment)
- packet may have VIRTIO_NET_HDR_F_NEEDS_CSUM, in that case
  the spec says "The checksum on the packet is incomplete".
  (probably needs to be handled by calculating the checksum).

Jason, as David applied this patch already, can you comment
on the issues please?

> ---
>  drivers/net/virtio_net.c | 70 
> 
>  1 file changed, 65 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index b3fc01d..5fbd15e 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -57,6 +57,11 @@ DECLARE_EWMA(pkt_len, 0, 64)
>  
>  #define VIRTNET_DRIVER_VERSION "1.0.0"
>  
> +const unsigned long guest_offloads[] = { VIRTIO_NET_F_GUEST_TSO4,
> +  VIRTIO_NET_F_GUEST_TSO6,
> +  VIRTIO_NET_F_GUEST_ECN,
> +  VIRTIO_NET_F_GUEST_UFO };
> +
>  struct virtnet_stats {
>   struct u64_stats_sync tx_syncp;
>   struct u64_stats_sync rx_syncp;
> @@ -164,10 +169,13 @@ struct virtnet_info {
>   u8 ctrl_promisc;
>   u8 ctrl_allmulti;
>   u16 ctrl_vid;
> + u64 ctrl_offloads;
>  
>   /* Ethtool settings */
>   u8 duplex;
>   u32 speed;
> +
> + unsigned long guest_offloads;
>  };
>  
>  struct padded_vnet_hdr {
> @@ -1896,6 +1904,47 @@ static int virtnet_restore_up(struct virtio_device 
> *vdev)
>   return err;
>  }
>  
> +static int virtnet_set_guest_offloads(struct virtnet_info *vi, u64 offloads)
> +{
> + struct scatterlist sg;
> + vi->ctrl_offloads = cpu_to_virtio64(vi->vdev, offloads);
> +
> + sg_init_one(&sg, &vi->ctrl_offloads, sizeof(vi->ctrl_offloads));
> +
> + if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_GUEST_OFFLOADS,
> +   VIRTIO_NET_CTRL_GUEST_OFFLOADS_SET, &sg)) {
> + dev_warn(&vi->dev->dev, "Fail to set guest offload. \n");
> + return -EINVAL;
> + }
> +
> + return 0;
> +}
> +
> +static int virtnet_clear_guest_offloads(struct virtnet_info *vi)
> +{
> + u64 offloads = 0;
> +
> + if (!vi->guest_offloads)
> + return 0;
> +
> + if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_CSUM))
> + offloads = 1ULL << VIRTIO_NET_F_GUEST_CSUM;
> +
> + return virtnet_set_guest_offloads(vi, offloads);
> +}
> +
> +static int virtnet_restore_guest_offloads(struct virtnet_info *vi)
> +{
> + u64 offloads = vi->guest_offloads;
> +
> + if (!vi->guest_offloads)
> + return 0;
> + if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_CSUM))
> + offloads |= 1ULL << VIRTIO_NET_F_GUEST_CSUM;
> +
> + return virtnet_set_guest_offloads(vi, offloads);
> +}
> +
>  static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
>  struct netlink_ext_ack *extack)
>  {
> @@ -1905,10 +1954,11 @@ static int virtnet_xdp_set(struct net_device *dev, 
> struct bpf_prog *prog,
>   u16 xdp_qp = 0, curr_qp;
>   int i, err;
>  
> - if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO4) ||
> - virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO6) ||
> - virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_ECN) ||
> - virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_UFO)) {
> + if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS)
> + && (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO4) ||
> + virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO6) ||
> + virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_ECN) ||
> + virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_UFO))) {
>   NL_SET_ERR_MSG_MOD(extack, "Can't set XDP while host is 
> implementing LRO, disable LRO first");
>   return -EOPNOTSUPP;
>   }
> @@ -1955,6 +2005,12 @@ static int virtnet_xdp_set(struct net_device *dev, 
> struct bpf_prog *prog,
>   for (i = 0; i < vi->max_queue_pairs; i++) {
>   old_prog = rtnl_dereference(vi->rq[i].xdp_prog);
>   rcu_assign_pointer(vi->rq[i].xdp_prog, prog);
> + if (i == 0) {
> + if (!old_prog)
> +

[PATCH net-next] bpf: add helper capable of reading out instructions

2017-07-24 Thread Jakub Kicinski

To read translated and jited instructions from the kernel,
one has to set certain pointers of struct bpf_prog_info to
pre-allocated user buffers.  Unfortunately, the existing
bpf_obj_get_info_by_fd() helper zeros struct bpf_prog_info
before passing it to the kernel.

Keeping the zeroing seems like a good idea in general, since
kernel will check if the structure was zeroed.  Add a new
helper for those more advanced users who can be trusted to
take care of zeroing themselves.

Signed-off-by: Jakub Kicinski 
---
I'm happy to change the name of the new function.

 tools/lib/bpf/bpf.c | 10 --
 tools/lib/bpf/bpf.h |  2 ++
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 412a7c82995a..2703fa282b65 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -308,13 +308,12 @@ int bpf_map_get_fd_by_id(__u32 id)
return sys_bpf(BPF_MAP_GET_FD_BY_ID, &attr, sizeof(attr));
 }
 
-int bpf_obj_get_info_by_fd(int prog_fd, void *info, __u32 *info_len)
+int __bpf_obj_get_info_by_fd(int prog_fd, void *info, __u32 *info_len)
 {
union bpf_attr attr;
int err;
 
bzero(&attr, sizeof(attr));
-   bzero(info, *info_len);
attr.info.bpf_fd = prog_fd;
attr.info.info_len = *info_len;
attr.info.info = ptr_to_u64(info);
@@ -325,3 +324,10 @@ int bpf_obj_get_info_by_fd(int prog_fd, void *info, __u32 
*info_len)
 
return err;
 }
+
+int bpf_obj_get_info_by_fd(int prog_fd, void *info, __u32 *info_len)
+{
+   bzero(info, *info_len);
+
+   return __bpf_obj_get_info_by_fd(prog_fd, info, info_len);
+}
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 418c86e69bcb..e44b423ac07e 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -58,6 +58,8 @@ int bpf_prog_get_next_id(__u32 start_id, __u32 *next_id);
 int bpf_map_get_next_id(__u32 start_id, __u32 *next_id);
 int bpf_prog_get_fd_by_id(__u32 id);
 int bpf_map_get_fd_by_id(__u32 id);
+/* Note: bpf_obj_get_info_by_fd() will init info to zeroes */
 int bpf_obj_get_info_by_fd(int prog_fd, void *info, __u32 *info_len);
+int __bpf_obj_get_info_by_fd(int prog_fd, void *info, __u32 *info_len);
 
 #endif
-- 
2.11.0

[PATCH net-next] bpf: install libbpf headers on 'make install'

2017-07-24 Thread Jakub Kicinski

Install the bpf.h header to $(prefix)/include/bpf/ directory.
This is necessary to build standalone applications using libbpf,
without the need to clone the kernel sources and point to them.

Signed-off-by: Jakub Kicinski 
---
I'm not 100% sure if it's OK to export the header file and which
directory it should end up in (bpf/? libbpf/?).

 tools/lib/bpf/Makefile | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/tools/lib/bpf/Makefile b/tools/lib/bpf/Makefile
index 1f5300e56b44..22dad416e0bd 100644
--- a/tools/lib/bpf/Makefile
+++ b/tools/lib/bpf/Makefile
@@ -46,6 +46,7 @@ else
 endif
 
 prefix ?= /usr/local
+headerdir = $(prefix)/include/bpf/
 libdir = $(prefix)/$(libdir_relative)
 man_dir = $(prefix)/share/man
 man_dir_SQ = '$(subst ','\'',$(man_dir))'
@@ -90,11 +91,13 @@ endif
 export prefix libdir src obj
 
 # Shell quotes
+headerdir_SQ = $(subst ','\'',$(headerdir))
 libdir_SQ = $(subst ','\'',$(libdir))
 libdir_relative_SQ = $(subst ','\'',$(libdir_relative))
 plugin_dir_SQ = $(subst ','\'',$(plugin_dir))
 
 LIB_FILE = libbpf.a libbpf.so
+HEADER_FILE = bpf.h
 
 VERSION= $(BPF_VERSION)
 PATCHLEVEL = $(BPF_PATCHLEVEL)
@@ -189,7 +192,11 @@ install_lib: all_cmd
$(call QUIET_INSTALL, $(LIB_FILE)) \
$(call do_install,$(LIB_FILE),$(libdir_SQ))
 
-install: install_lib
+install_hdr: all_cmd
+   $(call QUIET_INSTALL, $(HEADER_FILE)) \
+   $(call do_install,$(HEADER_FILE),$(headerdir_SQ))
+
+install: install_lib install_hdr
 
 ### Cleaning rules
 
-- 
2.11.0

Re: [PATCH 2/2] net: ethernet: nb8800: Fix RGMII TX clock delay setup

2017-07-24 Thread Mason

On 20/07/2017 14:33, Mason wrote:

> As [Florian] pointed out, the spec states that the
> "Data to Clock input Skew (at Receiver)"
> must be within [ 1.0, 2.6 ] ns.
> 
> I understand that 2 ns is 1/4 of a 125 MHz period,
> but it's not clear to me why the above interval is
> centered at 1.8 instead of 2.0 ns.
> 
> Also, the AR8035 PHY offers 4 possible TX clock delays:
> { 0.25, 1.3, 2.4, 3.4 } according to their doc.
> The two extremes are outside the interval, when would
> they be useful? In case the transmitter adds "bad" skew?
> 
> Why doesn't the PHY support 1.8/2.0? Is it perhaps
> unable to, because of PLL limitations?

I haven't yet found answers for these questions.

- Why is the interval centered at 1.8 instead of 2.0 ns?
- What use are 0.25 ns and 3.4 ns skew?
- Why doesn't the PHY support a "recommended" value like 1.8 ns?

Does anyone have pointers to good resources?

Regards.

Re: Problem with PHY state machine when using interrupts

2017-07-24 Thread Mason

On 24/07/2017 21:53, Florian Fainelli wrote:

> Well now that I see the possible interrupts generated, I indeed don't
> see how you can get a link down notification unless you somehow force
> the link down yourself, which would certainly happen in phy_suspend()
> when we set BMCR.pwrdwn, but that may be too late.
> 
> You should still expect the adjust_link() function to be called though
> with PHY_HALTED being set and that takes care of doing phydev->link = 0
> and netif_carrier_off(). If that still does not work, then see whether
> removing the call to phy_stop() does help (it really should).

The only functions setting phydev->state to PHY_HALTED
are phy_error() and phy_stop() AFAICT.

I am aware that when phy_state_machine() handles the
PHY_HALTED state, it will set phydev->link = 0;
and call netif_carrier_off() -- because that's where
I copied that code from.

My issue is that phy_state_machine() does not run when
I run 'ip set link dev eth0 down' from the command line.

If I'm reading the code right, phy_disconnect() actually
stops the state machine.

In interrupt mode, phy_state_machine() doesn't run
because no interrupt is generated.

In polling mode, phy_state_machine() doesn't run
because phy_disconnect() stops the state machine.

Introducing a sleep before phy_disconnect() gives
the state machine a chance to run in polling mode,
but it doesn't feel right, and doesn't fix the
other mode, which I'm using.

Looking at bcm_enet_stop() it calls phy_stop() and
phy_disconnect() just like the nb8800 driver...

I'm stumped.

Regards.

Re: [PATCH net] net/socket: fix type in assignment and trim long line

2017-07-24 Thread David Miller

From: Paolo Abeni 
Date: Fri, 21 Jul 2017 18:49:45 +0200

> The commit ffb07550c76f ("copy_msghdr_from_user(): get rid of
> field-by-field copyin") introduce a new sparse warning:
> 
> net/socket.c:1919:27: warning: incorrect type in assignment (different 
> address spaces)
> net/socket.c:1919:27:expected void *msg_control
> net/socket.c:1919:27:got void [noderef] *[addressable] msg_control
> 
> and a line above 80 chars, let's fix them
> 
> Fixes: ffb07550c76f ("copy_msghdr_from_user(): get rid of field-by-field 
> copyin")
> Signed-off-by: Paolo Abeni 

Applied, thanks.

Re: pull-request: wireless-drivers 2017-07-21

2017-07-24 Thread David Miller

From: Kalle Valo 
Date: Fri, 21 Jul 2017 19:12:54 +0300

> important fixes for net which had accumulated while I was away. I only
> applied the brcmfmac and rtlwifi patches only eight hours ago and I
> haven't seen the kbuild report yet so they might have some build
> breakage in theory. But the patches are so small so the chances that
> they would break something are really small and so I send this to you
> already now to not delay them any longer.
> 
> Please let me know if there are any problems.

Welcome back, pulled, thanks.

Re: [patch net] mlxsw: spectrum_router: Don't offload routes next in list

2017-07-24 Thread David Miller

From: Jiri Pirko 
Date: Fri, 21 Jul 2017 18:04:28 +0200

> From: Ido Schimmel 
> 
> Each FIB node holds a linked list of routes sharing the same prefix and
> length. In the case of IPv4 it's ordered according to table ID, metric
> and TOS and only the first route in the list is actually programmed to
> the device.
> 
> In case a gatewayed route is added somewhere in the list, then after its
> nexthop group will be refreshed and become valid (due to the resolution
> of its gateway), it'll mistakenly overwrite the existing entry.
> 
> Example:
> 192.168.200.0/24 dev enp3s0np3 scope link metric 1000 offload
> 192.168.200.0/24 via 192.168.100.1 dev enp3s0np3 metric 1000 offload
> 
> Both routes are marked as offloaded despite the fact only the first one
> should actually be present in the device's table.
> 
> When refreshing the nexthop group, don't write the route to the device's
> table unless it's the first in its node.
> 
> Fixes: 9aecce1c7d97 ("mlxsw: spectrum_router: Correctly handle identical 
> routes")
> Signed-off-by: Ido Schimmel 
> Signed-off-by: Jiri Pirko 

Applied.

[PATCH net] ipv6: Don't increase IPSTATS_MIB_FRAGFAILS twice in ip6_fragment()

2017-07-24 Thread Stefano Brivio

RFC 2465 defines ipv6IfStatsOutFragFails as:

"The number of IPv6 datagrams that have been discarded
 because they needed to be fragmented at this output
 interface but could not be."

The existing implementation, instead, would increase the counter
twice in case we fail to allocate room for single fragments:
once for the fragment, once for the datagram.

This didn't look intentional though. In one of the two affected
affected failure paths, the double increase was simply a result
of a new 'goto fail' statement, introduced to avoid a skb leak.
The other path appears to be affected since at least 2.6.12-rc2.

Reported-by: Sabrina Dubroca 
Fixes: 1d325d217c7f ("ipv6: ip6_fragment: fix headroom tests and skb leak")
Signed-off-by: Stefano Brivio 
---
 net/ipv6/ip6_output.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 1422d6c08377..162efba0d0cd 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -673,8 +673,6 @@ int ip6_fragment(struct net *net, struct sock *sk, struct 
sk_buff *skb,
*prevhdr = NEXTHDR_FRAGMENT;
tmp_hdr = kmemdup(skb_network_header(skb), hlen, GFP_ATOMIC);
if (!tmp_hdr) {
-   IP6_INC_STATS(net, ip6_dst_idev(skb_dst(skb)),
- IPSTATS_MIB_FRAGFAILS);
err = -ENOMEM;
goto fail;
}
@@ -789,8 +787,6 @@ int ip6_fragment(struct net *net, struct sock *sk, struct 
sk_buff *skb,
frag = alloc_skb(len + hlen + sizeof(struct frag_hdr) +
 hroom + troom, GFP_ATOMIC);
if (!frag) {
-   IP6_INC_STATS(net, ip6_dst_idev(skb_dst(skb)),
- IPSTATS_MIB_FRAGFAILS);
err = -ENOMEM;
goto fail;
}
-- 
2.9.4

Re: [PATCH] lib: test_rhashtable: fix for large entry counts

2017-07-24 Thread David Miller

From: Phil Sutter 
Date: Fri, 21 Jul 2017 16:51:31 +0200

> During concurrent access testing, threadfunc() concatenated thread ID
> and object index to create a unique key like so:
> 
> | tdata->objs[i].value = (tdata->id << 16) | i;
> 
> This breaks if a user passes an entries parameter of 64k or higher,
> since 'i' might use more than 16 bits then. Effectively, this will lead
> to duplicate keys in the table.
> 
> Fix the problem by introducing a struct holding object and thread ID and
> using that as key instead of a single integer type field.
> 
> Fixes: f4a3e90ba5739 ("rhashtable-test: extend to test concurrency")
> Reported by: Manuel Messner 
> Signed-off-by: Phil Sutter 

Applied, thanks Phil.

Re: [RFC PATCH] IP: do not modify ingress packet IP option in ip_options_echo()

2017-07-24 Thread David Miller

From: Paolo Abeni 
Date: Fri, 21 Jul 2017 15:55:18 +0200

> diff --git a/net/ipv4/ip_options.c b/net/ipv4/ip_options.c
> index 93157f2..fdda973 100644
> --- a/net/ipv4/ip_options.c
> +++ b/net/ipv4/ip_options.c
> @@ -174,9 +174,6 @@ int __ip_options_echo(struct ip_options *dopt, struct 
> sk_buff *skb,
>   doffset -= 4;
>   }
>   if (doffset > 3) {
> - __be32 daddr = fib_compute_spec_dst(skb);
> -
> - memcpy(&start[doffset-1], &daddr, 4);
>   dopt->faddr = faddr;

This transformation is required, but in the destination not the source.

The red flag is that we are indexing 'start' with 'doffset' instead of
'soffset'.

Re: Kernel TLS in 4.13-rc1

2017-07-24 Thread Dave Watson

On 07/23/17 09:39 PM, David Oberhollenzer wrote:
> After fixing the benchmark/test tool that the patch description
> linked to (https://github.com/Mellanox/tls-af_ktls_tool) to make
> sure that the server and client actually *agree* on AES-128-GCM,
> I simply ran the client program with the --verify-sendpage option.
> 
> The handshake and setting up of the sockets appears to work but
> the program complains that the sent and received page contents
> do not match (sent is 0x12 repeated all over and received looks
> pretty random).

The --verify functions depend on the RX path as well, which has not
been merged.  Any programs / tests using OpenSSL + patches should work
fine.

If you want to use the tool, something like this should work, so that
the receive path uses gnutls:

./server --no-echo

./client --server-port 12345 --sendfile some_file --server-host localhost

Re: [BUG] /net/hsr: prune_timer

2017-07-24 Thread Cong Wang

On Tue, Jul 18, 2017 at 1:07 AM, Dmitry  wrote:
> Hello!
>
>
> void hsr_prune_nodes(unsigned long data) called once by timer.
>
> hsr_prune_nodes must be called periodically every PRUNE_PERIOD (60s).
>
>
>
>
> This code want be added to tail of hsr_prune_nodes function
>
> hsr->prune_timer.expires = jiffies + msecs_to_jiffies(PRUNE_PERIOD);
> add_timer(&hsr->prune_timer);
>
> or
>
> mod_timer(&hsr->prune_timer, jiffies + msecs_to_jiffies(PRUNE_PERIOD));

Please feel free to send a patch.

You can follow this doc:
https://www.kernel.org/doc/html/v4.11/process/submitting-patches.html


Thanks!

Re: [PATCH net 0/2] bpf: fix verifier min/max handling in BPF_SUB

2017-07-24 Thread David Miller

From: Edward Cree 
Date: Fri, 21 Jul 2017 14:35:17 +0100

> I managed to come up with a test for the swapped bounds in BPF_SUB, so here
>  it is along with a patch that fixes it, separated out from my 'rewrite
>  everything' series so it can go to -stable.

Series applied and queued up for -stable, thanks.

Re: [PATCH] PCI: Update ACS quirk for more Intel 10G NICs

2017-07-24 Thread Alex Williamson

On Mon, 24 Jul 2017 12:31:39 -0700
Roland Dreier  wrote:

> > Is there a misunderstanding of the code flow here?  We're never setting
> > EC.  In the first code block we're just masking out requested
> > capabilities where unimplemented capabilities is the same as
> > implemented + enabled.  We're not adding EC to the request, we're
> > just not removing it based on the implemented capabilities because we
> > don't interpret it the same way.  Thus if someone wants to test a
> > device for EC, it really needs to implement EC, we cannot assume it
> > based on lack of support for EC in the ACS capability.  As you point
> > out, nobody really cares about EC yet though.  
> 
> I guess I find the semantics confusing.  For every other bit,
> pci_acs_enabled() returns true if the bit is either enabled or not
> implemented.  For EC, it returns false if the bit is not implemented.

EC is a bit more complicated than the other bits, it has both an enable
bit and a control vector and I'd need to stare at the spec for a while
to understand it better and likely decide that it needs a separate
interface from the rest of the capabilities within ACS.  Therefore we
take the conservative approach of requiring the device to legitimately
support it if anyone asks for it.

> It's not clear to me what the use case for checking for PCI_ACS_EC
> enabled would be.  Seems like checking for EC in the capabilities
> register would be more useful.

Some sort of interface for manipulating the control vector would be
necessary to fully support it and maybe the interface today just
doesn't make much sense for it.  Thanks,

Alex

Re: Bonding driver fails to enable second interface if updelay is non-zero

2017-07-24 Thread Cong Wang

On Thu, Jul 20, 2017 at 7:07 PM, Benjamin Gilbert
 wrote:
> [resend]
>
> Hello,
>
> Starting with commit de77ecd4ef02ca783f7762e04e92b3d0964be66b, and
> through 4.12.2, the bonding driver in 802.3ad mode fails to enable the
> second interface on a bond device if updelay is non-zero.  dmesg says:
>
> [   35.825227] bond0: Setting xmit hash policy to layer3+4 (1)
> [   35.825259] bond0: Setting MII monitoring interval to 100
> [   35.825303] bond0: Setting down delay to 200
> [   35.825328] bond0: Setting up delay to 200
> [   35.827414] bond0: Adding slave eth0
> [   35.949205] bond0: Enslaving eth0 as a backup interface with a down link
> [   35.950812] bond0: Adding slave eth1
> [   36.073764] bond0: Enslaving eth1 as a backup interface with a down link
> [   36.076808] IPv6: ADDRCONF(NETDEV_UP): bond0: link is not ready
> [   39.327423] igb :01:00.0 eth0: igb: eth0 NIC Link is Up 1000
> Mbps Full Duplex, Flow Control: RX
> [   39.405580] bond0: link status up for interface eth0, enabling it in 0 ms
> [   39.405607] bond0: link status definitely up for interface eth0,
> 1000 Mbps full duplex
> [   39.405608] bond0: Warning: No 802.3ad response from the link
> partner for any adapters in the bond
> [   39.405613] bond0: first active interface up!
> [   39.406186] IPv6: ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready
> [   39.551391] igb :01:00.1 eth1: igb: eth1 NIC Link is Up 1000
> Mbps Full Duplex, Flow Control: RX
> [   39.613590] bond0: link status up for interface eth1, enabling it in 200 ms
> [   39.717575] bond0: link status up for interface eth1, enabling it in 200 ms
> [   39.821395] bond0: link status up for interface eth1, enabling it in 200 ms
> [   39.925584] bond0: link status up for interface eth1, enabling it in 200 ms
> [   40.029288] bond0: link status up for interface eth1, enabling it in 200 ms
> [   40.133388] bond0: link status up for interface eth1, enabling it in 200 ms
>
> ...and so on every 100 ms.  The bug doesn't trigger 100% reliably, but
> can be provoked by removing and re-adding interfaces to the bond via
> sysfs.
>
> While the problem is occurring, networking appears to be unreliable.
> Setting the updelay to 0 fixes it:
>
> [  345.472559] bond0: link status up for interface eth1, enabling it in 200 ms
> [  345.576558] bond0: link status up for interface eth1, enabling it in 200 ms
> [  345.607614] bond0: Setting up delay to 0
> [  345.680396] bond0: link status definitely up for interface eth1,
> 1000 Mbps full duplex
>
> I'd be happy to provide further details or to test patches.

A quick glance seems Mahesh missed the following piece:

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 181839d6fbea..9bee6c1c70cc 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -2050,6 +2050,7 @@ static int bond_miimon_inspect(struct bonding *bond)
continue;

bond_propose_link_state(slave, BOND_LINK_FAIL);
+   commit++;
slave->delay = bond->params.downdelay;
if (slave->delay) {
netdev_info(bond->dev, "link status
down for %sinterface %s, disabling it in %d ms\n",
@@ -2088,6 +2089,7 @@ static int bond_miimon_inspect(struct bonding *bond)
continue;

bond_propose_link_state(slave, BOND_LINK_BACK);
+   commit++;
slave->delay = bond->params.updelay;

if (slave->delay) {

Re: [PATCH net-next v2 0/6] Allow to switch off UDP-based tunnel offloads per netdevice

2017-07-24 Thread David Miller

From: Sabrina Dubroca 
Date: Fri, 21 Jul 2017 12:49:27 +0200

> This patchset adds a new netdevice feature to toggle RX offloads of
> UDP-based tunnel via ethtool. This is useful if the offload is causing
> issues, for example if the hardware is buggy.
> 
> The feature is added to all devices providing the ->ndo_udp_tunnel_add
> op, and enabled by default to preserve current behavior.
> 
> When the administrator disables this feature on a device, all
> currently offloaded ports are cleared from the device.  When the
> feature is turned on, the stack notifies the device about all current
> ports.
> 
> v2:
>  - rename feature bit to NETIF_F_RX_UDP_TUNNEL_PORT
>  - rename ethtool feature to rx-udp_tunnel-port-offload

Series applied, thanks Sabrina.

Re: [PATCH net-next v3 1/1] geneve: add rtnl changelink support

2017-07-24 Thread David Miller

From: Girish Moodalbail 
Date: Thu, 20 Jul 2017 22:44:20 -0700

> This patch adds changelink rtnl operation support for geneve devices
> and the code changes involve:
> 
>   - added geneve_quiesce() which quiesces the geneve device data path
> for both TX and RX. This lets us perform the changelink operation
> atomically w.r.t data path. Also added geneve_unquiesce() to
> reverse the operation of geneve_quiesce().
> 
>   - refactor geneve_newlink into geneve_nl2info to be used by both
> geneve_newlink and geneve_changelink
> 
>   - geneve_nl2info takes a changelink boolean argument to isolate
> changelink checks.
> 
>   - Allow changing only a few attributes (ttl, tos, and remote tunnel
> endpoint IP address (within the same address family)):
> - return -EOPNOTSUPP for attributes that cannot be changed for
>   now. Incremental patches can make the non-supported one
>   available in the future if needed.
> 
> Signed-off-by: Girish Moodalbail 

Applied.

Re: [PATCH] net: ethernet: mediatek: Explicitly include linux/interrupt.h

2017-07-24 Thread David Miller

From: Mark Brown 
Date: Thu, 20 Jul 2017 11:06:31 +0100

> The mediatek ethernet driver uses interrupts but does not explicitly
> include linux/interrupt.h, relying on implicit includes.  Fix this so we
> don't get build breaks as happened for ARM in next-20170720.
> 
> Signed-off-by: Mark Brown 

Applied, thanks.

Re: [PATCH] tun/tap: Add the missed return value check of register_netdevice_notifier

2017-07-24 Thread David Miller

From: Tonghao Zhang 
Date: Thu, 20 Jul 2017 02:41:34 -0700

> There is some codes of tun/tap module which did not check the return
> value of register_netdevice_notifier. Add the check now.
> 
> Signed-off-by: Tonghao Zhang 

Applied.

Re: [PATCH v2 net] net: ethernet: mediatek: avoid potential invalid memory access

2017-07-24 Thread David Miller

From: 
Date: Sat, 22 Jul 2017 20:45:55 +0800

> From: Sean Wang 
> 
> Potential dangerous invalid memory might be accessed if invalid mac value
> reflected from the forward port field in rxd4 caused by possible potential
> hardware defects. So added a simple sanity checker to avoid the kind of
> situation happening.
> 
> Signed-off-by: Sean Wang 
> Acked-by: John Crispin 

Applied.

Re: [PATCH] net: Convert to using %pOF instead of full_name

2017-07-24 Thread David Miller

From: Rob Herring 
Date: Wed, 19 Jul 2017 10:09:13 -0500

> On Tue, Jul 18, 2017 at 4:51 PM, David Miller  wrote:
>> From: Rob Herring 
>> Date: Tue, 18 Jul 2017 16:43:19 -0500
>>
>>> Now that we have a custom printf format specifier, convert users of
>>> full_name to use %pOF instead. This is preparation to remove storing
>>> of the full path string for each node.
>>>
>>> Signed-off-by: Rob Herring 
>>
>> Acked-by: David S. Miller 
> 
> The dependency went into 4.13. You can take this thru netdev.

Ok, applied to net-next.

Re: [PATCH net-next V2 0/5] Refine virtio-net XDP

2017-07-24 Thread David Miller

From: Jason Wang 
Date: Wed, 19 Jul 2017 16:54:44 +0800

> Hi:
> 
> This series brings two optimizations for virtio-net XDP:
> 
> - avoid reset during XDP set
> - turn off offloads on demand
> 
> Changes from V1:
> - Various tweaks on commit logs and comments
> - Use virtnet_napi_enable() when enabling NAPI on XDP set
> - Copy the small buffer packet only if xdp_headroom is smaller than
>   required
> 
> Please review.

Series applied, thanks Jason.

Michael, I gave you 4+ days to properly review this respin of this
series.  I felt that was more than sufficient time, even with an
intervening weekend.  So if you don't like that I applied this without
your review, please be more punctual in the future.

Thank you.

Re: [PATCH V2 net] net: fix tcp reset packet flowlabel for ipv6

2017-07-24 Thread David Miller

From: Shaohua Li 
Date: Tue, 18 Jul 2017 12:03:37 -0700

> + /* Since this is being sent on the wire obfuscate hash a bit
> +  * to minimize possbility that any useful information to an
> +  * attacker is leaked. Only lower 20 bits are relevant.
> +  */
> + rol32(hash, 16);

This doesn't help anything at all.

I don't like things that try to give a sense of security (however
little) but actually don't give any at all.

I'd rather, therefore, that you remove this altogether.

And note that maybe not using the same flow label prevents
attacks of that nature, ever consider that?

Re: [PATCH V4 net-next 7/8] net: hns3: Add Ethtool support to HNS3 driver

2017-07-24 Thread Rustad, Mark D

> On Jul 23, 2017, at 10:05 AM, Florian Fainelli  wrote:
>> +
>> +strncpy(drvinfo->version, HNAE_DRIVER_VERSION,
>> +sizeof(drvinfo->version));
>> +drvinfo->version[sizeof(drvinfo->version) - 1] = '\0';
> 
> strlcpy() would probably do that for you.

You need to be careful about strlcpy - it does not completely write the 
destination buffer as strncpy does, and so can result in a kernel memory leak 
if the destination is not known to already be cleared.

>> +
>> +strncpy(drvinfo->driver, HNAE_DRIVER_NAME, sizeof(drvinfo->driver));
>> +drvinfo->driver[sizeof(drvinfo->driver) - 1] = '\0';
> 
> Same here

Same here

>> +
>> +strncpy(drvinfo->bus_info, priv->dev->bus->name,
>> +sizeof(drvinfo->bus_info));> +  
>> drvinfo->bus_info[ETHTOOL_BUSINFO_LEN - 1] = '\0';
> 
> And here.

And here too. I haven't looked at this deeply enough to know whether a leak 
could be created by strlcpy here, but I wanted to raise it as something to be 
considered before switching to it. Blindly adopting strlcpy is hazardous as are 
tools that unconditionally recommend it.

--
Mark Rustad, Networking Division, Intel Corporation

signature.asc
Description: Message signed with OpenPGP

1 2 3 >

1 - 100 of 219 matches

Mail list logo