date:20170713

Re: [PATCH 2/2] net: ethernet: nixge: Add support for National Instruments XGE netdev

2017-07-13 Thread kbuild test robot

Hi Moritz,

[auto build test WARNING on net-next/master]
[also build test WARNING on v4.12 next-20170713]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Moritz-Fischer/dt-bindings-net-Add-bindings-for-National-Instruments-XGE-netdev/20170714-125718
config: ia64-allmodconfig (attached as .config)
compiler: ia64-linux-gcc (GCC) 6.2.0
reproduce:
wget 
https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=ia64 

All warnings (new ones prefixed by >>):

   In file included from include/linux/if_ether.h:23:0,
from include/linux/etherdevice.h:25,
from drivers/net/ethernet/ni/nixge.c:14:
   drivers/net/ethernet/ni/nixge.c: In function 'nixge_dma_bd_release':
>> drivers/net/ethernet/ni/nixge.c:241:17: warning: cast to pointer from 
>> integer of different size [-Wint-to-pointer-cast]
  dev_kfree_skb((struct sk_buff *)
^
   include/linux/skbuff.h:977:38: note: in definition of macro 'dev_kfree_skb'
#define dev_kfree_skb(a) consume_skb(a)
 ^
   drivers/net/ethernet/ni/nixge.c: In function 'nixge_dma_bd_init':
>> drivers/net/ethernet/ni/nixge.c:299:35: warning: cast from pointer to 
>> integer of different size [-Wpointer-to-int-cast]
  priv->rx_bd_v[i].sw_id_offset = (u32)skb;
  ^
   drivers/net/ethernet/ni/nixge.c: In function 'nixge_start_xmit_done':
   drivers/net/ethernet/ni/nixge.c:452:22: warning: cast to pointer from 
integer of different size [-Wint-to-pointer-cast]
   dev_kfree_skb_irq((struct sk_buff *)cur_p->app4);
 ^
   drivers/net/ethernet/ni/nixge.c: In function 'nixge_recv':
   drivers/net/ethernet/ni/nixge.c:546:9: warning: cast to pointer from integer 
of different size [-Wint-to-pointer-cast]
  skb = (struct sk_buff *)(cur_p->sw_id_offset);
^
   drivers/net/ethernet/ni/nixge.c:577:25: warning: cast from pointer to 
integer of different size [-Wpointer-to-int-cast]
  cur_p->sw_id_offset = (u32)new_skb;
^
   drivers/net/ethernet/ni/nixge.c: In function 'nixge_dma_err_handler':
   drivers/net/ethernet/ni/nixge.c:687:22: warning: cast to pointer from 
integer of different size [-Wint-to-pointer-cast]
   dev_kfree_skb_irq((struct sk_buff *)cur_p->app4);
 ^

vim +241 drivers/net/ethernet/ni/nixge.c

   228  
   229  #define nixge_ctrl_poll_timeout(priv, addr, val, cond, sleep_us, 
timeout_us) \
   230  readl_poll_timeout((priv)->ctrl_regs + (addr), (val), cond, \
   231 (sleep_us), (timeout_us))
   232  
   233  static void nixge_dma_bd_release(struct net_device *ndev)
   234  {
   235  int i;
   236  struct nixge_priv *priv = netdev_priv(ndev);
   237  
   238  for (i = 0; i < RX_BD_NUM; i++) {
   239  dma_unmap_single(ndev->dev.parent, 
priv->rx_bd_v[i].phys,
   240   priv->max_frm_size, DMA_FROM_DEVICE);
 > 241  dev_kfree_skb((struct sk_buff *)
   242(priv->rx_bd_v[i].sw_id_offset));
   243  }
   244  
   245  if (priv->rx_bd_v) {
   246  dma_free_coherent(ndev->dev.parent,
   247sizeof(*priv->rx_bd_v) * RX_BD_NUM,
   248priv->rx_bd_v,
   249priv->rx_bd_p);
   250  }
   251  if (priv->tx_bd_v) {
   252  dma_free_coherent(ndev->dev.parent,
   253sizeof(*priv->tx_bd_v) * TX_BD_NUM,
   254priv->tx_bd_v,
   255priv->tx_bd_p);
   256  }
   257  }
   258  
   259  static int nixge_dma_bd_init(struct net_device *ndev)
   260  {
   261  u32 cr;
   262  int i;
   263  struct sk_buff *skb;
   264  struct nixge_priv *priv = netdev_priv(ndev);
   265  
   266  /* Reset the indexes which are used for accessing the BDs */
   267  priv->tx_bd_ci = 0;
   268  priv->tx_bd_tail = 0;
   269  priv->rx_bd_ci = 0;
   270  
   271  /* Allocate the Tx and Rx buffer descriptors. */
   272  priv->tx_bd_v = dma_zalloc_coherent(ndev->dev.parent,
   273sizeof(*priv->tx_bd_v) * 
TX_BD_NUM,
   274>tx_bd_p, GFP_KERNEL);
   275  if (!priv->tx_bd_v)
   276  goto out;
   277  
   278

Re: [PATCH 2/2] net: ethernet: nixge: Add support for National Instruments XGE netdev

2017-07-13 Thread Moritz Fischer

Hi Andrew,

On Thu, Jul 13, 2017 at 6:34 PM, Andrew Lunn  wrote:
>> > > + /* not sure if this is the correct way of dealing with this ... */
>> > > + ndev->phydev->supported &= ~(SUPPORTED_Autoneg);
>> > > + ndev->phydev->advertising = ndev->phydev->supported;
>> > > + ndev->phydev->autoneg = AUTONEG_DISABLE;
>> >
>> > What are you trying to achieve?
>>
>> Basically can't do Autoneg, I'll need to take a closer look.
>
> Hi Moritz
>
> What i actually think you mean, is it can only do 1Gbps. So you could
> autoneg, but only advertise 1Gbps. Look at masking out
> PHY_10BT_FEATURES and PHY_100BT_FEATURES.

It does either 1Gbps or 10Gbps (over SFP+), depending which bitstream is loaded
into the  FPGA. In the current setup I could also just have two
different compatible
strings, since neither setup supports the other rate, but that might change.

It seems getting rid of that part (the default values) now works, too.

I'll need to take a closer look tomorrow (and I need to retest with 1Gbps)

>
> Take a look at:
>
> http://elixir.free-electrons.com/linux/latest/source/drivers/net/ethernet/renesas/ravb_main.c#L1045

Will do.

Thanks for feedback,

Moritz

[PATCHv2 1/1] mlx4_en: remove unnecessary returned value check

2017-07-13 Thread Zhu Yanjun

The function __mlx4_zone_remove_one_entry always returns zero. So
it is not necessary to check it.

Cc: Joe Jin 
Cc: Junxiao Bi 
Signed-off-by: Zhu Yanjun 
Reviewed-by: Yuval Shaia 
---
Change from v1 to v2:
 Initialization is moved to variable declaration.

 drivers/net/ethernet/mellanox/mlx4/alloc.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/alloc.c 
b/drivers/net/ethernet/mellanox/mlx4/alloc.c
index 249a458..710d6c6 100644
--- a/drivers/net/ethernet/mellanox/mlx4/alloc.c
+++ b/drivers/net/ethernet/mellanox/mlx4/alloc.c
@@ -283,7 +283,7 @@ int mlx4_zone_add_one(struct mlx4_zone_allocator 
*zone_alloc,
 }
 
 /* Should be called under a lock */
-static int __mlx4_zone_remove_one_entry(struct mlx4_zone_entry *entry)
+static void __mlx4_zone_remove_one_entry(struct mlx4_zone_entry *entry)
 {
struct mlx4_zone_allocator *zone_alloc = entry->allocator;
 
@@ -315,8 +315,6 @@ static int __mlx4_zone_remove_one_entry(struct 
mlx4_zone_entry *entry)
}
zone_alloc->mask = mask;
}
-
-   return 0;
 }
 
 void mlx4_zone_allocator_destroy(struct mlx4_zone_allocator *zone_alloc)
@@ -457,7 +455,7 @@ struct mlx4_bitmap *mlx4_zone_get_bitmap(struct 
mlx4_zone_allocator *zones, u32
 int mlx4_zone_remove_one(struct mlx4_zone_allocator *zones, u32 uid)
 {
struct mlx4_zone_entry *zone;
-   int res;
+   int res = 0;
 
spin_lock(>lock);
 
@@ -468,7 +466,7 @@ int mlx4_zone_remove_one(struct mlx4_zone_allocator *zones, 
u32 uid)
goto out;
}
 
-   res = __mlx4_zone_remove_one_entry(zone);
+   __mlx4_zone_remove_one_entry(zone);
 
 out:
spin_unlock(>lock);
-- 
2.7.4

[git pull] vfs.git network field-by-field copyin patches

2017-07-13 Thread Al Viro

[My apologies for late pull requests - I hoped to be back to normal
connectivity by yesterday evening.  Spent the night sitting in Logan
instead...]

This part of misc.compat queue was held back for review from networking
folks and since davem has jus ACKed those...

The following changes since commit 0d0606060baefdb13d3d80dba1b4c816b0676e16:

  mqueue: move compat syscalls to native ones (2017-07-04 13:13:49 -0400)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git misc.compat

for you to fetch changes up to f8f8a727eab1c5b78c3703a461565b042979cc79:

  get_compat_bpf_fprog(): don't copyin field-by-field (2017-07-04 13:14:34 
-0400)


Al Viro (3):
  copy_msghdr_from_user(): get rid of field-by-field copyin
  get_compat_msghdr(): get rid of field-by-field copyin
  get_compat_bpf_fprog(): don't copyin field-by-field

 net/compat.c | 49 +++--
 net/socket.c | 31 ++-
 2 files changed, 37 insertions(+), 43 deletions(-)

Re: [RFC] get_compat_msghdr(): get rid of field-by-field copyin

2017-07-13 Thread David Miller

From: Al Viro 
Date: Fri, 14 Jul 2017 02:37:50 +0100

> On Tue, Jul 11, 2017 at 08:25:14PM -0700, David Miller wrote:
> 
>> looks harmless, or if there is a bug in there I can't see it.
>> 
>> But whatever it is, that same problem could be hiding in some of these
>> other transformations as well.
>> 
>> I think the bug might be that we are corrupting the user's stack
>> somehow.  But the two user copies in that commit look perfectly fine
>> to my eyes.
>> 
>> There shouldn't be any padding in that compat_rlimit structure, so
>> it's not like we're copying extra bytes.  Well, we'd be exposing
>> kernel stack memory if that were the case.
> 
> There isn't any padding in compat_rlimit; unfortunately, it was
> mistakenly declared as struct rlimit instead.  Which, of course,
> has different member sizes - otherwise we wouldn't have needed
> a compat syscall there in the first place.
> 
> It was harder to spot since I combined move and a transformation
> into one commit.  Shouldn't have done so...  Had those been two
> separate commits, the bug would've stood out immediately.  Shouldn't
> be the case here...

Ok, I'll ack these patches then:

Acked-by: David S. Miller

Re: [PATCH][net-next] svcrdma: fix an incorrect check on -E2BIG and -EINVAL

2017-07-13 Thread J. Bruce Fields

On Thu, Jul 13, 2017 at 01:53:10PM -0400, Chuck Lever wrote:
> 
> > On Jul 13, 2017, at 1:51 PM, Colin King  wrote:
> > 
> > From: Colin Ian King 
> > 
> > The current check will always be true and will always jump to
> > err1, this looks dubious to me. I believe && should be used
> > instead of ||.
> > 
> > Detected by CoverityScan, CID#1450120 ("Logically Dead Code")
> > 
> > Fixes: 107c1d0a991a ("svcrdma: Avoid Send Queue overflow")
> > Signed-off-by: Colin Ian King 
> 
> Reviewed-by: Chuck Lever 
> 
> Dan reported this today, and I have a similar patch in my
> "pending for-rc" series. This one works too.

Applied and merged upstream, thanks.

--b.

Re: [RFC] get_compat_msghdr(): get rid of field-by-field copyin

2017-07-13 Thread Al Viro

On Tue, Jul 11, 2017 at 08:25:14PM -0700, David Miller wrote:

> looks harmless, or if there is a bug in there I can't see it.
> 
> But whatever it is, that same problem could be hiding in some of these
> other transformations as well.
> 
> I think the bug might be that we are corrupting the user's stack
> somehow.  But the two user copies in that commit look perfectly fine
> to my eyes.
> 
> There shouldn't be any padding in that compat_rlimit structure, so
> it's not like we're copying extra bytes.  Well, we'd be exposing
> kernel stack memory if that were the case.

There isn't any padding in compat_rlimit; unfortunately, it was
mistakenly declared as struct rlimit instead.  Which, of course,
has different member sizes - otherwise we wouldn't have needed
a compat syscall there in the first place.

It was harder to spot since I combined move and a transformation
into one commit.  Shouldn't have done so...  Had those been two
separate commits, the bug would've stood out immediately.  Shouldn't
be the case here...

> Color me stumped, but cautious about ACK'ing these networking
> compat changes.

Re: [PATCH 2/2] net: ethernet: nixge: Add support for National Instruments XGE netdev

2017-07-13 Thread Andrew Lunn

> > > + /* not sure if this is the correct way of dealing with this ... */
> > > + ndev->phydev->supported &= ~(SUPPORTED_Autoneg);
> > > + ndev->phydev->advertising = ndev->phydev->supported;
> > > + ndev->phydev->autoneg = AUTONEG_DISABLE;
> > 
> > What are you trying to achieve?
> 
> Basically can't do Autoneg, I'll need to take a closer look.

Hi Moritz

What i actually think you mean, is it can only do 1Gbps. So you could
autoneg, but only advertise 1Gbps. Look at masking out
PHY_10BT_FEATURES and PHY_100BT_FEATURES.

Take a look at:

http://elixir.free-electrons.com/linux/latest/source/drivers/net/ethernet/renesas/ravb_main.c#L1045

It might actually make sense to add a phy_set_min_speed(), a mirror to
phy_set_max_speed().

Andrew

Re: [PATCH v7 2/3] PCI: Enable PCIe Relaxed Ordering if supported

2017-07-13 Thread Ding Tianhong



On 2017/7/14 5:09, Sinan Kaya wrote:
> On 7/13/2017 10:21 AM, Ding Tianhong wrote:
>> static void pci_configure_relaxed_ordering(struct pci_dev *dev)
>> +{
>> +/* We should not alter the relaxed ordering bit for the VF */
>> +if (dev->is_virtfn)
>> +return;
>> +
>> +/* If the releaxed ordering enable bit is not set, do nothing. */
>> +if (!pcie_relaxed_ordering_supported(dev))
>> +return;
>> +
>> +if (pci_dev_should_disable_relaxed_ordering(dev)) {
>> +pcie_clear_relaxed_ordering(dev);
>> +dev_info(>dev, "Disable Relaxed Ordering\n");
>> +}
>> +}
> 
> I couldn't find anywhere where you actually enable the relaxed ordering
> like the subject suggests.
> 
There is no code to enable the PCIe Relaxed Ordering bit in the configuration 
space,
it is only be enable by default according to the PCIe Standard Specification, 
what we
do is to distinguish the RC problematic platform and clear the Relaxed Ordering 
bit
to tell the PCIe EP don't send any TLPs with Relaxed Ordering Attributes to the 
Root
Complex.

Thanks
Ding

Re: [PATCH net] net: bridge: fix dest lookup when vlan proto doesn't match

2017-07-13 Thread Toshiaki Makita

On 2017/07/13 22:09, Nikolay Aleksandrov wrote:
> With 802.1ad support the vlan_ingress code started checking for vlan
> protocol mismatch which causes the current tag to be inserted and the
> bridge vlan protocol & pvid to be set. The vlan tag insertion changes
> the skb mac_header and thus the lookup mac dest pointer which was loaded
> prior to calling br_allowed_ingress in br_handle_frame_finish is VLAN_HLEN
> bytes off now, pointing to the last two bytes of the destination mac and
> the first four of the source mac causing lookups to always fail and
> broadcasting all such packets to all ports. Same thing happens for locally
> originated packets when passing via br_dev_xmit. So load the dest pointer
> after the vlan checks and possible skb change.
> 
> Fixes: 8580e2117c06 ("bridge: Prepare for 802.1ad vlan filtering support")
> Reported-by: Anitha Narasimha Murthy 
> Signed-off-by: Nikolay Aleksandrov 

Oops. Thank you for fixing it.

Acked-by: Toshiaki Makita

Re: [PATCH net-next v1 2/3] gtp: Support creating flow-based gtp net_device

2017-07-13 Thread Jiannan Ouyang

Hi Harald,

> On 7/13/17, 12:35 AM, "Harald Welte"  wrote:
> 
> > +static int gtp_dev_open(struct net_device *dev)
> > +{
> > +   struct gtp_dev *gtp = netdev_priv(dev);
> > +   struct net *net = gtp->net;
> > +   struct socket *sock1u;
> > +   struct socket *sock0;
> > +   struct udp_tunnel_sock_cfg tunnel_cfg;
> > +   struct udp_port_cfg udp_conf;
> > +   int err;
> > +
> > +   memset(_conf, 0, sizeof(udp_conf));
> > +
> > +   udp_conf.family = AF_INET;
> > +   udp_conf.local_ip.s_addr = htonl(INADDR_ANY);
> > +   udp_conf.local_udp_port = htons(GTP1U_PORT);
> > +
> > +   err = udp_sock_create(gtp->net, _conf, );
> > +   if (err < 0)
> > +   return err;
> > +
> > +   udp_conf.local_udp_port = htons(GTP0_PORT);
> > +   err = udp_sock_create(gtp->net, _conf, );
> > +   if (err < 0)
> > +   return err;
> 
> you're unconditionally binding to both GTP0 and GTP1 UDP ports.  This is
> done selectively based on netlink attributes in the existing "normal"
> non-OVS kernel code, i.e. the control is left to the user.
> 
> Is this function is only called/used in the context of OVS?  If so,
> since you explicitly implement only GTPv1, why bind to GTPv0 port?
> 

I had doubts on how this flow-based GTPv1 code path should fit in, which is why
the GTPv0 and the GTPv1 code pieces are mixed in my changes. Should I
explicitly claim that the flow-based change is for GTPv1 only?

> > +   setup_udp_tunnel_sock(net, sock1u, _cfg);
> 
> even here, you're only setting up the v1 and not v0.
> 

same reason as above.

> > +   /* Assume largest header, ie. GTPv0. */
> > +   dev->needed_headroom= LL_MAX_HEADER +
> > +   sizeof(struct iphdr) +
> > +   sizeof(struct udphdr) +
> > +   sizeof(struct gtp0_header);
> 
> ... and here you're using headroom for a GTPv0 header, despite (I think)
> only supporting GTPv1 from this configuration?

Yes, only GTPv1 is supported.

> 
> > +   err = gtp_hashtable_new(gtp, GTP_PDP_HASHSIZE); // JO: when to free??
> 
> I think that question about when to free needs to be resolved before any
> merge.  Did you check that it persists even after the device is
> closed/removed?

I didn't investigate it. What do you mean by persist?

Thanks
-Jiannan

Re: [PATCH net-next v1 1/3] gtp: refactor to support flow-based gtp encap and decap

2017-07-13 Thread Jiannan Ouyang

Hi Harald,

> On 7/13/17, 12:26 AM, "Harald Welte"  wrote:
>·
> >  static inline void gtp_set_pktinfo_ipv4(struct gtp_pktinfo *pktinfo,
> >   struct sock *sk, struct iphdr *iph,
> > - struct pdp_ctx *pctx, struct rtable *rt,
> > - struct flowi4 *fl4,
> > + struct rtable *rt, struct flowi4 *fl4,
> >   struct net_device *dev)
> >  {
> >  [...]
> > + __be32 tun_id;
>·
> you are breaking GTPv0 functionality here.  GTPv0 has 64 bit tunnel
> identifiers, and this function is called both from GTPv1 and GTPv0
> context.
>·
> This makes me wonder how you did verify that your changes do not break
> the existing operation with both GTPv0 and GTPv1?
>·

Good catch. I only fully tested the GTPv1 path against oai-cn. Will fix
this and test the GTPv0 path as well.

I had doubts on how this flow-based GTPv1 code path should fit in, 
which is why the GTPv0 and the GTPv1 code pieces are mixed in my changes. 

Should I explicitly claim that the flow-based change is for GTPv1 only? 

> > + // flow-based GTP1U encap
> > + info = skb_tunnel_info(skb);
> > + if (gtp->collect_md && info && ntohs(info->key.tp_dst) == GTP1U_PORT) {
>·
> I think it's typically safe to assume that GTP is only operated on
> standard ports, but it is something you chould/should think about, i.e.
> whether you want that kind of restriction.  In the existing use case, we
> have the v0/v1 information stored in the per-pdp context structure.
>·

The reason I’m checking GTP1U_PORT here is to filter GTP1U traffic. 
It possible to pass a port number from ovs into the gtp module. I will 
investigate how to support programmable port. 

> > +   tun_id  = htonl(pctx->u.v1.o_tei);
>·
> here is where you're assuming GTPv1 in two ways from code that is called
> from both v0 and v1.
> * you're dereferencing a v1 specific element in the pctx union
> * you're storing the result in a 32bit variable
>·

Right, will fix this for GTPv0.

> >   gtp = netdev_priv(dev);
> > + gtp->net = src_net;
>·
> Isn't this a generic change that's independent of your work on OVS GTP?

It is meant to be OVS independent. What makes it not? Should I leave 
this field un-initialized?

Thanks
-Jiannan

Re: [PATCH 1/2] dt-bindings: net: Add bindings for National Instruments XGE netdev

2017-07-13 Thread Moritz Fischer

Hi Yuan,

On Thu, Jul 13, 2017 at 5:33 PM, YUAN Linyu
 wrote:
>
>
>> -Original Message-
>> From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org]
>> On Behalf Of Moritz Fischer
>> Sent: Friday, July 14, 2017 5:22 AM
>> To: netdev@vger.kernel.org
>> Cc: devicet...@vger.kernel.org; linux-ker...@vger.kernel.org;
>> da...@davemloft.net; mark.rutl...@arm.com; robh...@kernel.org; Moritz
>> Fischer
>> Subject: [PATCH 1/2] dt-bindings: net: Add bindings for National Instruments
>> XGE netdev
>>
>> This adds bindings for the NI XGE 1G/10G network device.
>>
>> Signed-off-by: Moritz Fischer 
>> ---
>>  Documentation/devicetree/bindings/net/nixge.c | 32
>> +++
>>  1 file changed, 32 insertions(+)
>>  create mode 100644 Documentation/devicetree/bindings/net/nixge.c
>
> It should be a text file, nixge.txt

You are absolutely right ... I need to have my head examined.
>
>>
>> diff --git a/Documentation/devicetree/bindings/net/nixge.c
>> b/Documentation/devicetree/bindings/net/nixge.c
>> new file mode 100644
>> index 000..9fff5a7

RE: [PATCH 1/2] dt-bindings: net: Add bindings for National Instruments XGE netdev

2017-07-13 Thread YUAN Linyu



> -Original Message-
> From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org]
> On Behalf Of Moritz Fischer
> Sent: Friday, July 14, 2017 5:22 AM
> To: netdev@vger.kernel.org
> Cc: devicet...@vger.kernel.org; linux-ker...@vger.kernel.org;
> da...@davemloft.net; mark.rutl...@arm.com; robh...@kernel.org; Moritz
> Fischer
> Subject: [PATCH 1/2] dt-bindings: net: Add bindings for National Instruments
> XGE netdev
> 
> This adds bindings for the NI XGE 1G/10G network device.
> 
> Signed-off-by: Moritz Fischer 
> ---
>  Documentation/devicetree/bindings/net/nixge.c | 32
> +++
>  1 file changed, 32 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/net/nixge.c

It should be a text file, nixge.txt

> 
> diff --git a/Documentation/devicetree/bindings/net/nixge.c
> b/Documentation/devicetree/bindings/net/nixge.c
> new file mode 100644
> index 000..9fff5a7

Re: [PATCH 2/2] net: ethernet: nixge: Add support for National Instruments XGE netdev

2017-07-13 Thread Moritz Fischer

Hi Andrew,

thanks for the quick response.

On Fri, Jul 14, 2017 at 12:36:36AM +0200, Andrew Lunn wrote:
> > +++ b/drivers/net/ethernet/ni/nixge.c
> > @@ -0,0 +1,1246 @@
> > +/*
> > + * Copyright (c) 2016-2017, National Instruments Corp.
> > + *
> > + * Network Driver for Ettus Research XGE MAC
> > + *
> > + * This is largely based on the Xilinx AXI Ethernet Driver,
> > + * and uses the same DMA engine in the FPGA
> 
> Hi Moritz 
> 
> Is the DMA code the same as in the AXI driver? Should it be pulled out
> into a library and shared?

Mostly, I'll see what I can do. At least the register definitions and
common structures can be pulled out into a common header file.

> 
> > +struct nixge_priv {
> > +   struct net_device *ndev;
> > +   struct device *dev;
> > +
> > +   /* Connection to PHY device */
> > +   struct phy_device *phy_dev;
> > +   phy_interface_t phy_interface;
> 
> > +   /* protecting link parameters */
> > +   spinlock_t  lock;
> > +   int link;
> > +   int speed;
> > +   int duplex;
> 
> All these seem to be pointless. They are set, but never used.

Will fix.
> 
> > +
> > +static inline void nixge_dma_write_reg(struct nixge_priv *priv, off_t 
> > offset,
> > +  u32 val)
> 
> Please leave it up to the compile to inline.

Will fix.
> 
> > +static void __nixge_device_reset(struct nixge_priv *priv, off_t offset)
> > +{
> > +   u32 timeout;
> > +   /* Reset Axi DMA. This would reset NIXGE Ethernet core as well.
> > +* The reset process of Axi DMA takes a while to complete as all
> > +* pending commands/transfers will be flushed or completed during
> > +* this reset process.
> > +*/
> > +   nixge_dma_write_reg(priv, offset, XAXIDMA_CR_RESET_MASK);
> > +   timeout = DELAY_OF_ONE_MILLISEC;
> > +   while (nixge_dma_read_reg(priv, offset) & XAXIDMA_CR_RESET_MASK) {
> > +   udelay(1);
> 
> There is a link between the 1 and the value of DELAY_OF_ONE_MILLISEC.
> It would be good to try to link these two together.

D'oh ... Seems like a good candidate for iopoll ...
> 
> > +   if (--timeout == 0) {
> > +   netdev_err(priv->ndev, "%s: DMA reset timeout!\n",
> > +  __func__);
> > +   break;
> > +   }
> > +   }
> > +}
> > +
> 
> > +static void nixge_handle_link_change(struct net_device *ndev)
> > +{
> > +   struct nixge_priv *priv = netdev_priv(ndev);
> > +   struct phy_device *phydev = ndev->phydev;
> > +   unsigned long flags;
> > +   int status_change = 0;
> > +
> > +   spin_lock_irqsave(>lock, flags);
> > +
> > +   if (phydev->link != priv->link) {
> > +   if (!phydev->link) {
> > +   priv->speed = 0;
> > +   priv->duplex = -1;
> > +   }
> > +   priv->link = phydev->link;
> > +
> > +   status_change = 1;
> > +   }
> > +
> > +   spin_unlock_irqrestore(>lock, flags);
> > +
> > +   if (status_change) {
> > +   if (phydev->link) {
> > +   netif_carrier_on(ndev);
> > +   netdev_info(ndev, "link up (%d/%s)\n",
> > +   phydev->speed,
> > +   phydev->duplex == DUPLEX_FULL ?
> > +   "Full" : "Half");
> > +   } else {
> > +   netif_carrier_off(ndev);
> > +   netdev_info(ndev, "link down\n");
> > +   }
> 
> phy_print_status() should be used.

Will do.
> 
> Also, the phylib will handle netif_carrier_off/on for you.

Good to know :)
> 
> > +static int nixge_open(struct net_device *ndev)
> > +{
> > +   struct nixge_priv *priv = netdev_priv(ndev);
> > +   int ret;
> > +
> > +   nixge_device_reset(ndev);
> > +
> > +   /* start netif carrier down */
> > +   netif_carrier_off(ndev);
> > +
> > +   if (!ndev->phydev)
> > +   netdev_err(ndev, "no phy, phy_start() failed\n");
> 
> Not really correct. You don't call phy_start(). And phy_start() cannot
> indicate a failure, it is a void function.

Will fix.
> 
> It would be a lot better to bail out if there is no phy. Probably
> during probe.

Yeah.
> 
> > +static s32 __nixge_set_mac_address(struct net_device *ndev, const void 
> > *addr)
> > +{
> > +   struct nixge_priv *priv = netdev_priv(ndev);
> > +
> > +   if (addr)
> > +   memcpy(ndev->dev_addr, addr, ETH_ALEN);
> > +   if (!is_valid_ether_addr(ndev->dev_addr))
> > +   eth_random_addr(ndev->dev_addr);
> 
> Messy. I would change this. Make addr mandatory. If it is invalid,
> return an error. That will make nixge_net_set_mac_address() do the
> right thing. When called from nixge_probe() should verify what it gets
> from the nvmem, and if it is invalid, pass a random MAC address.

Will fix.
> 
> > +
> > +   nixge_ctrl_write_reg(priv, NIXGE_REG_MAC_LSB,
> > +(ndev->dev_addr[2]) << 24 |
> > +(ndev->dev_addr[3] << 16) |
> > +(ndev->dev_addr[4] << 8) |
> >

Re: [PATCH v7 3/3] net/cxgb4: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag

2017-07-13 Thread Casey Leedom

[[ Sorry for the Double Send: I forgot to switch to Plain Text.  Have I 
mentioned how much I hate modern Web-based email agents? :-) -- Casey ]]

  Yeah, I think this works  for now.  We'll stumble over what to do when we 
want to mix upstream  TLPs without Relaxed Ordering Attributes directed at 
problematic Root  Complexes, and Peer-to-Peer TLPs with Relaxed Ordering 
Attributes ... or vice versa depending on which target PCIe Device has  issues 
with Relaxed Ordering.

  Thanks for all the work!

Casey

Re: [PATCH] datapath: Fix for force/commit action failures

2017-07-13 Thread Joe Stringer

On 13 July 2017 at 15:38, Greg Rose  wrote:
> On 07/13/2017 11:03 AM, Joe Stringer wrote:
>>
>> On 13 July 2017 at 11:01, Greg Rose  wrote:
>>>
>>> On 07/13/2017 10:46 AM, Joe Stringer wrote:


 On 13 July 2017 at 09:25, Greg Rose  wrote:
>
>
> When there is an established connection in direction A->B, it is
> possible to receive a packet on port B which then executes
> ct(commit,force) without first performing ct() - ie, a lookup.
> In this case, we would expect that this packet can delete the existing
> entry so that we can commit a connection with direction B->A. However,
> currently we only perform a check in skb_nfct_cached() for whether
> OVS_CS_F_TRACKED is set and OVS_CS_F_INVALID is not set, ie that a
> lookup previously occurred. In the above scenario, a lookup has not
> occurred but we should still be able to statelessly look up the
> existing entry and potentially delete the entry if it is in the
> opposite direction.
>
> This patch extends the check to also hint that if the action has the
> force flag set, then we will lookup the existing entry so that the
> force check at the end of skb_nfct_cached has the ability to delete
> the connection.
>
> CC: d...@openvswitch.org
> CC: Pravin Shalar 
> Signed-off-by: Joe Stringer 
> Signed-off-by: Greg Rose 
> ---



 A couple more administrative notes, on netdev the module name in the
 patch subject for openvswitch is "openvswitch" rather than datapath;
>>>
>>>
>>>
>>> Right you are.
>>>
 and patches rather than having just "PATCH" as the subject prefix
 should state the tree. In this case, it's a bugfix so it should be
 "PATCH net".
>>>
>>>
>>>
>>> I knew that... forgot the format patch option to add it.  Net-next
>>> is closed so that would be mandatory.
>>>
>>>   Furthermore, if you're able to figure out which commit


 introduced the issue (I believe it's introduced by the force commit
 patch), then you should place the "Fixes: " tag. I can give you some
 pointers off-list on how to do this (git blame and some basic
 formatting of the targeted patch should do the trick - this tag
 expects a 12-digit hash).

 For reference, I ended up looking it up during review, this is the
 line you'd add:
 Fixes: dd41d33f0b03 ("openvswitch: Add force commit.")
>>>
>>>
>>>
>>> Oh, thanks!
>>>
>>>

>net/openvswitch/conntrack.c | 12 
>1 file changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
> index 08679eb..9041cf8 100644
> --- a/net/openvswitch/conntrack.c
> +++ b/net/openvswitch/conntrack.c
> @@ -641,17 +641,21 @@ static bool skb_nfct_cached(struct net *net,
>   ct = nf_ct_get(skb, );
>   /* If no ct, check if we have evidence that an existing
> conntrack entry
>* might be found for this skb.  This happens when we lose a
> skb->_nfct
> -* due to an upcall.  If the connection was not confirmed, it
> is
> not
> -* cached and needs to be run through conntrack again.
> +* due to an upcall, or if the direction is being forced.  If
> the
> +* connection was not confirmed, it is not cached and needs to
> be
> run
> +* through conntrack again.
>*/
> -   if (!ct && key->ct_state & OVS_CS_F_TRACKED &&
> +   if ((!ct && (key->ct_state & OVS_CS_F_TRACKED) &&
>   !(key->ct_state & OVS_CS_F_INVALID) &&
> -   key->ct_zone == info->zone.id) {
> +key->ct_zone == info->zone.id) ||
> +(!key->ct_state && info->force)) {
>   ct = ovs_ct_find_existing(net, >zone,
> info->family, skb,
> !!(key->ct_state
>& OVS_CS_F_NAT_MASK));
>   if (ct)
>   nf_ct_get(skb, );
> +   else
> +   return false;
>   }
>   if (!ct)
>   return false;



 I was just wondering if this has the potential to prevent
 nf_conntrack_in() from being called at all in this case, which is also
 not quite right. In the original case of (!ct && (key->ct_state &
 OVS_CS_F_TRACKED) && !(key->ct_state & OVS_CS_F_TRACKED)), which I'll
 refer to as "ct_executed", we explicitly want to avoid running
 nf_conntrack_in() if we already ran it, because the connection tracker
 doesn't expect to see the same packet twice (there's also things like
 stats/accounting, and potentially L4 state machines that

Re: [PATCH net-next v1 0/3] Flow Based GTP Tunneling

2017-07-13 Thread Jiannan Ouyang

Hi Harald and Jeo,

Thank you for the code review. They are really helpful!

> On 7/13/17, 11:14 AM, "Joe Stringer"  wrote:
> 
> On 13 July 2017 at 00:12, Harald Welte  wrote:
>> I'm not familiar with the details here, but does this imply that you
>> are matching on the outer (transport layer) source IP address? If so,
>> please note this is in violation of the 3GPP specifications for GTP,
>> which require explicitly that the TEID is the *only* criteria for
>> matching an encapsulated packet to the tunnel.  Basically anyone can
>> send you an encapsulated packet from any random source IP, just as long
>> as the TEID matches a tunnel, it will be decapsulated.
>>
>> This is [presumably] in order to take care of mobility, as the
>> subscribers' phone moves around different MME/S-GW/SGSN, each having
>> different source IP addresses.
>
> I think that this will be hard to avoid; the several existing tunnel
> implementations that OVS plugs into all allow matching on the outer
> addresses/ports, I don't see a good way to restrict it without
> introducing completely new metadata_dst paths for GTP. I'd prefer not
> to introduce something like this if we can avoid it; several tunnels
> currently share all of the same metadata_dst code and that's proven
> sufficient for all cases so far. If someone wishes to implement the
> 3GPP standard correctly then they should not create matches like this.
> In quite a few cases, OVS tends to take the approach that we give the
> user the tools to do what they need to do, but if they wish to shoot
> themselves in the foot then that's up to them. We can of course work
>  towards ensuring the OVS userspace guides users in the right direction
> though.

The flow listed out here is an example for the nature of the match
that can be performed, but the actual match rule that is programmed
 by the control plane should wildcard the tun_src as noted by Harald. 

It is the control plane’s responsibility to enforce the 3GPP Specifications, 
e.g. creating flow rules to make TEID the only criteria for matching an 
encapsulated packet to the tunnel.  

I will provide a 3GPP compliant example in the next version, thank Harald
 for pointing it out.

-Jiannan

[PATCH] zd1211rw: fix spelling mistake 'hybernate' -> 'hibernate'

2017-07-13 Thread Colin King

From: Colin Ian King 

Trivial fix to spelling mistake in PDEBUG debug message.

Signed-off-by: Colin Ian King 
---
 drivers/net/wireless/zydas/zd1211rw/zd_rf_rf2959.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/zydas/zd1211rw/zd_rf_rf2959.c 
b/drivers/net/wireless/zydas/zd1211rw/zd_rf_rf2959.c
index a93f657a41c7..d4e512f50945 100644
--- a/drivers/net/wireless/zydas/zd1211rw/zd_rf_rf2959.c
+++ b/drivers/net/wireless/zydas/zd1211rw/zd_rf_rf2959.c
@@ -61,7 +61,7 @@ static void dump_regwrite(u32 rw)
 
switch (reg) {
case 0:
-   PDEBUG("reg0 CFG1 ref_sel %d hybernate %d rf_vco_reg_en %d"
+   PDEBUG("reg0 CFG1 ref_sel %d hibernate %d rf_vco_reg_en %d"
   " if_vco_reg_en %d if_vga_en %d",
   bits(rw, 14, 15), bit(rw, 3), bit(rw, 2), bit(rw, 1),
   bit(rw, 0));
-- 
2.11.0

Re: [PATCH] datapath: Fix for force/commit action failures

2017-07-13 Thread Greg Rose


On 07/13/2017 11:03 AM, Joe Stringer wrote:

On 13 July 2017 at 11:01, Greg Rose  wrote:

On 07/13/2017 10:46 AM, Joe Stringer wrote:


On 13 July 2017 at 09:25, Greg Rose  wrote:


When there is an established connection in direction A->B, it is
possible to receive a packet on port B which then executes
ct(commit,force) without first performing ct() - ie, a lookup.
In this case, we would expect that this packet can delete the existing
entry so that we can commit a connection with direction B->A. However,
currently we only perform a check in skb_nfct_cached() for whether
OVS_CS_F_TRACKED is set and OVS_CS_F_INVALID is not set, ie that a
lookup previously occurred. In the above scenario, a lookup has not
occurred but we should still be able to statelessly look up the
existing entry and potentially delete the entry if it is in the
opposite direction.

This patch extends the check to also hint that if the action has the
force flag set, then we will lookup the existing entry so that the
force check at the end of skb_nfct_cached has the ability to delete
the connection.

CC: d...@openvswitch.org
CC: Pravin Shalar 
Signed-off-by: Joe Stringer 
Signed-off-by: Greg Rose 
---



A couple more administrative notes, on netdev the module name in the
patch subject for openvswitch is "openvswitch" rather than datapath;



Right you are.


and patches rather than having just "PATCH" as the subject prefix
should state the tree. In this case, it's a bugfix so it should be
"PATCH net".



I knew that... forgot the format patch option to add it.  Net-next
is closed so that would be mandatory.

  Furthermore, if you're able to figure out which commit


introduced the issue (I believe it's introduced by the force commit
patch), then you should place the "Fixes: " tag. I can give you some
pointers off-list on how to do this (git blame and some basic
formatting of the targeted patch should do the trick - this tag
expects a 12-digit hash).

For reference, I ended up looking it up during review, this is the
line you'd add:
Fixes: dd41d33f0b03 ("openvswitch: Add force commit.")



Oh, thanks!





   net/openvswitch/conntrack.c | 12 
   1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 08679eb..9041cf8 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -641,17 +641,21 @@ static bool skb_nfct_cached(struct net *net,
  ct = nf_ct_get(skb, );
  /* If no ct, check if we have evidence that an existing
conntrack entry
   * might be found for this skb.  This happens when we lose a
skb->_nfct
-* due to an upcall.  If the connection was not confirmed, it is
not
-* cached and needs to be run through conntrack again.
+* due to an upcall, or if the direction is being forced.  If the
+* connection was not confirmed, it is not cached and needs to be
run
+* through conntrack again.
   */
-   if (!ct && key->ct_state & OVS_CS_F_TRACKED &&
+   if ((!ct && (key->ct_state & OVS_CS_F_TRACKED) &&
  !(key->ct_state & OVS_CS_F_INVALID) &&
-   key->ct_zone == info->zone.id) {
+key->ct_zone == info->zone.id) ||
+(!key->ct_state && info->force)) {
  ct = ovs_ct_find_existing(net, >zone,
info->family, skb,
!!(key->ct_state
   & OVS_CS_F_NAT_MASK));
  if (ct)
  nf_ct_get(skb, );
+   else
+   return false;
  }
  if (!ct)
  return false;



I was just wondering if this has the potential to prevent
nf_conntrack_in() from being called at all in this case, which is also
not quite right. In the original case of (!ct && (key->ct_state &
OVS_CS_F_TRACKED) && !(key->ct_state & OVS_CS_F_TRACKED)), which I'll
refer to as "ct_executed", we explicitly want to avoid running
nf_conntrack_in() if we already ran it, because the connection tracker
doesn't expect to see the same packet twice (there's also things like
stats/accounting, and potentially L4 state machines that could get
messed up by multiple calls). By the time the info->force and
direction check happens at the end of the function, "ct_executed" is
implied to be true. However, in this new case, ct_executed is actually
false - because there was no ct() before the ct(force,commit). In this
case, we only want to look up the existing entry to see if it should
be deleted; if it should not be deleted, then we still haven't yet
done the nf_conntrack_in() call so we should return false and the
caller, __ovs_ct_lookup() should make the call to nf_conntrack_in().

What I mean is something like the following incremental on your patch:

diff --git a/net/openvswitch/conntrack.c

Re: [PATCH 2/2] net: ethernet: nixge: Add support for National Instruments XGE netdev

2017-07-13 Thread Andrew Lunn

> +++ b/drivers/net/ethernet/ni/nixge.c
> @@ -0,0 +1,1246 @@
> +/*
> + * Copyright (c) 2016-2017, National Instruments Corp.
> + *
> + * Network Driver for Ettus Research XGE MAC
> + *
> + * This is largely based on the Xilinx AXI Ethernet Driver,
> + * and uses the same DMA engine in the FPGA

Hi Moritz 

Is the DMA code the same as in the AXI driver? Should it be pulled out
into a library and shared?

> +struct nixge_priv {
> + struct net_device *ndev;
> + struct device *dev;
> +
> + /* Connection to PHY device */
> + struct phy_device *phy_dev;
> + phy_interface_t phy_interface;

> + /* protecting link parameters */
> + spinlock_t  lock;
> + int link;
> + int speed;
> + int duplex;

All these seem to be pointless. They are set, but never used.

> +
> +static inline void nixge_dma_write_reg(struct nixge_priv *priv, off_t offset,
> +u32 val)

Please leave it up to the compile to inline.

> +static void __nixge_device_reset(struct nixge_priv *priv, off_t offset)
> +{
> + u32 timeout;
> + /* Reset Axi DMA. This would reset NIXGE Ethernet core as well.
> +  * The reset process of Axi DMA takes a while to complete as all
> +  * pending commands/transfers will be flushed or completed during
> +  * this reset process.
> +  */
> + nixge_dma_write_reg(priv, offset, XAXIDMA_CR_RESET_MASK);
> + timeout = DELAY_OF_ONE_MILLISEC;
> + while (nixge_dma_read_reg(priv, offset) & XAXIDMA_CR_RESET_MASK) {
> + udelay(1);

There is a link between the 1 and the value of DELAY_OF_ONE_MILLISEC.
It would be good to try to link these two together.

> + if (--timeout == 0) {
> + netdev_err(priv->ndev, "%s: DMA reset timeout!\n",
> +__func__);
> + break;
> + }
> + }
> +}
> +

> +static void nixge_handle_link_change(struct net_device *ndev)
> +{
> + struct nixge_priv *priv = netdev_priv(ndev);
> + struct phy_device *phydev = ndev->phydev;
> + unsigned long flags;
> + int status_change = 0;
> +
> + spin_lock_irqsave(>lock, flags);
> +
> + if (phydev->link != priv->link) {
> + if (!phydev->link) {
> + priv->speed = 0;
> + priv->duplex = -1;
> + }
> + priv->link = phydev->link;
> +
> + status_change = 1;
> + }
> +
> + spin_unlock_irqrestore(>lock, flags);
> +
> + if (status_change) {
> + if (phydev->link) {
> + netif_carrier_on(ndev);
> + netdev_info(ndev, "link up (%d/%s)\n",
> + phydev->speed,
> + phydev->duplex == DUPLEX_FULL ?
> + "Full" : "Half");
> + } else {
> + netif_carrier_off(ndev);
> + netdev_info(ndev, "link down\n");
> + }

phy_print_status() should be used.

Also, the phylib will handle netif_carrier_off/on for you.

> +static int nixge_open(struct net_device *ndev)
> +{
> + struct nixge_priv *priv = netdev_priv(ndev);
> + int ret;
> +
> + nixge_device_reset(ndev);
> +
> + /* start netif carrier down */
> + netif_carrier_off(ndev);
> +
> + if (!ndev->phydev)
> + netdev_err(ndev, "no phy, phy_start() failed\n");

Not really correct. You don't call phy_start(). And phy_start() cannot
indicate a failure, it is a void function.

It would be a lot better to bail out if there is no phy. Probably
during probe.

> +static s32 __nixge_set_mac_address(struct net_device *ndev, const void *addr)
> +{
> + struct nixge_priv *priv = netdev_priv(ndev);
> +
> + if (addr)
> + memcpy(ndev->dev_addr, addr, ETH_ALEN);
> + if (!is_valid_ether_addr(ndev->dev_addr))
> + eth_random_addr(ndev->dev_addr);

Messy. I would change this. Make addr mandatory. If it is invalid,
return an error. That will make nixge_net_set_mac_address() do the
right thing. When called from nixge_probe() should verify what it gets
from the nvmem, and if it is invalid, pass a random MAC address.

> +
> + nixge_ctrl_write_reg(priv, NIXGE_REG_MAC_LSB,
> +  (ndev->dev_addr[2]) << 24 |
> +  (ndev->dev_addr[3] << 16) |
> +  (ndev->dev_addr[4] << 8) |
> +  (ndev->dev_addr[5] << 0));
> +
> + nixge_ctrl_write_reg(priv, NIXGE_REG_MAC_MSB,
> +  (ndev->dev_addr[1] | (ndev->dev_addr[0] << 8)));
> +
> + return 0;
> +}
> +

> +static void nixge_ethtools_get_drvinfo(struct net_device *ndev,
> +struct ethtool_drvinfo *ed)
> +{
> + strlcpy(ed->driver, "nixge", sizeof(ed->driver));
> + strlcpy(ed->version, "1.00a", sizeof(ed->version));

Driver version is pretty pointless. What

Re: [PATCH] igb: fix unused igb_deliver_wake_packet() warning when CONFIG_PM=n

2017-07-13 Thread Jeff Kirsher

On Wed, 2017-07-12 at 18:23 -0300, Fabio Estevam wrote:
> On Wed, Jul 12, 2017 at 6:09 PM, Dave Hansen
>  wrote:
> > 
> > From: Dave Hansen 
> > 
> > I'm seeing warnings on kernel configurations where CONFIG_PM is
> > disabled.  It happens in 4.12, at least:
> > 
> > drivers/ethernet/intel/igb/igb_main.c:7988:13: warning:
> > 'igb_deliver_wake_packet' defined but not used [-Wunused-function]
> > 
> > This is because igb_deliver_wake_packet() is defined outside of
> > the #ifdef", but is used only a single time within the #ifdef in
> > igb_resume().  Fix it by moving igb_deliver_wake_packet() next to
> > igb_resume() inside the #ifdef.
> > 
> > The diff ends up looking a bit funky here.  It *looks* like
> > igb_suspend() is getting moved, but that's an artifact of how
> > 'diff' sees the changes.
> > 
> > Cc: Jeff Kirsher 
> > Cc: intel-wired-...@lists.osuosl.org
> > Cc: netdev@vger.kernel.org
> > Signed-off-by: Dave Hansen 
> 
> Seems to be fixed by the following commit in linux-next:
> 
> commit 000ba1f2ebf0d6f93b9ae6cfbe5417e66f1b8e8c
> Author: Arnd Bergmann 
> Date:   Thu Apr 27 21:09:52 2017 +0200
> 
> igb: mark PM functions as __maybe_unused
> 
> The new wake function is only used by the suspend/resume handlers
> that
> are defined in inside of an #ifdef, which can cause this harmless
> warning:
> 
> drivers/net/ethernet/intel/igb/igb_main.c:7988:13: warning:
> 'igb_deliver_wake_packet' defined but not used [-Wunused-function]
> 
> Removing the #ifdef, instead using a __maybe_unused annotation
> simplifies the code and avoids the warning.
> 
> Fixes: b90fa8763560 ("igb: Enable reading of wake up packet")
> Signed-off-by: Arnd Bergmann 
> Tested-by: Aaron Brown 
> Signed-off-by: Jeff Kirsher 

Yeah, this is already fixed in my tree.  Dropping this patch.

signature.asc
Description: This is a digitally signed message part

[PATCH 1/2] dt-bindings: net: Add bindings for National Instruments XGE netdev

2017-07-13 Thread Moritz Fischer

This adds bindings for the NI XGE 1G/10G network device.

Signed-off-by: Moritz Fischer 
---
 Documentation/devicetree/bindings/net/nixge.c | 32 +++
 1 file changed, 32 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/net/nixge.c

diff --git a/Documentation/devicetree/bindings/net/nixge.c 
b/Documentation/devicetree/bindings/net/nixge.c
new file mode 100644
index 000..9fff5a7
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/nixge.c
@@ -0,0 +1,32 @@
+* NI XGE Ethernet controller
+
+Required properties:
+- compatible: Should be "ni,xge-enet-2.00"
+- reg: Address and length of the register set for the device
+- interrupts: Should contain tx and rx interrupt
+- interrupt-names: Should be "rx-irq" and "tx-irq"
+- phy-mode: See ethernet.txt file in the same directory.
+- nvmem-cells: Phandle of nvmem cell containing the mac address
+- nvmem-cell-names: Should be "address"
+
+Examples (10G generic PHY):
+   nixge0: ethernet@4000 {
+   compatible = "ni,xge-enet-2.00";
+   reg = <0x4000 0x6000>;
+
+   nvmem-cells = <_addr>;
+   nvmem-cell-names = "address";
+
+   interrupts = <0 29 4>, <0 30 4>;
+   interrupt-names = "rx-irq", "tx-irq";
+   interrupt-parent = <>;
+
+   phy-mode = "xgmii";
+   phy-handle = <_phy1>;
+
+   ethernet_phy1: ethernet-phy@4 {
+   compatible = "ethernet-phy-ieee802.3-c45";
+   reg = <4>;
+   devices = <0xa>;
+   };
+   };
-- 
2.7.4

[PATCH 2/2] net: ethernet: nixge: Add support for National Instruments XGE netdev

2017-07-13 Thread Moritz Fischer

Add support for the National Instruments XGE 1/10G network device.

It uses the EEPROM on the board via NVMEM.

Signed-off-by: Moritz Fischer 
---
 drivers/net/ethernet/Kconfig |1 +
 drivers/net/ethernet/Makefile|1 +
 drivers/net/ethernet/ni/Kconfig  |   26 +
 drivers/net/ethernet/ni/Makefile |1 +
 drivers/net/ethernet/ni/nixge.c  | 1246 ++
 5 files changed, 1275 insertions(+)
 create mode 100644 drivers/net/ethernet/ni/Kconfig
 create mode 100644 drivers/net/ethernet/ni/Makefile
 create mode 100644 drivers/net/ethernet/ni/nixge.c

diff --git a/drivers/net/ethernet/Kconfig b/drivers/net/ethernet/Kconfig
index edae15ac..2021806 100644
--- a/drivers/net/ethernet/Kconfig
+++ b/drivers/net/ethernet/Kconfig
@@ -127,6 +127,7 @@ config FEALNX
 
 source "drivers/net/ethernet/natsemi/Kconfig"
 source "drivers/net/ethernet/netronome/Kconfig"
+source "drivers/net/ethernet/ni/Kconfig"
 source "drivers/net/ethernet/8390/Kconfig"
 
 config NET_NETX
diff --git a/drivers/net/ethernet/Makefile b/drivers/net/ethernet/Makefile
index bf7f450..68f49f7 100644
--- a/drivers/net/ethernet/Makefile
+++ b/drivers/net/ethernet/Makefile
@@ -58,6 +58,7 @@ obj-$(CONFIG_NET_VENDOR_MYRI) += myricom/
 obj-$(CONFIG_FEALNX) += fealnx.o
 obj-$(CONFIG_NET_VENDOR_NATSEMI) += natsemi/
 obj-$(CONFIG_NET_VENDOR_NETRONOME) += netronome/
+obj-$(CONFIG_NET_VENDOR_NI) += ni/
 obj-$(CONFIG_NET_NETX) += netx-eth.o
 obj-$(CONFIG_NET_VENDOR_NUVOTON) += nuvoton/
 obj-$(CONFIG_NET_VENDOR_NVIDIA) += nvidia/
diff --git a/drivers/net/ethernet/ni/Kconfig b/drivers/net/ethernet/ni/Kconfig
new file mode 100644
index 000..a74ffeb
--- /dev/null
+++ b/drivers/net/ethernet/ni/Kconfig
@@ -0,0 +1,26 @@
+#
+# National Instuments network device configuration
+#
+
+config NET_VENDOR_NI
+   bool "National Instruments Devices"
+   default y
+   ---help---
+ If you have a network (Ethernet) device belonging to this class, say 
Y.
+
+ Note that the answer to this question doesn't directly affect the
+ kernel: saying N will just cause the configurator to skip all
+ the questions about National Instrument devices.
+ If you say Y, you will be asked for your specific device in the
+ following questions.
+
+if NET_VENDOR_NI
+
+config NI_XGE_MANAGEMENT_ENET
+   tristate "National Instruments XGE management enet support"
+   select PHYLIB
+   ---help---
+ Simple LAN device for debug or management purposes. Can
+ support either 10G or 1G PHYs via SFP+ ports.
+
+endif
diff --git a/drivers/net/ethernet/ni/Makefile b/drivers/net/ethernet/ni/Makefile
new file mode 100644
index 000..99c6646
--- /dev/null
+++ b/drivers/net/ethernet/ni/Makefile
@@ -0,0 +1 @@
+obj-$(CONFIG_NI_XGE_MANAGEMENT_ENET) += nixge.o
diff --git a/drivers/net/ethernet/ni/nixge.c b/drivers/net/ethernet/ni/nixge.c
new file mode 100644
index 000..85b213c
--- /dev/null
+++ b/drivers/net/ethernet/ni/nixge.c
@@ -0,0 +1,1246 @@
+/*
+ * Copyright (c) 2016-2017, National Instruments Corp.
+ *
+ * Network Driver for Ettus Research XGE MAC
+ *
+ * This is largely based on the Xilinx AXI Ethernet Driver,
+ * and uses the same DMA engine in the FPGA
+ *
+ * SPDX-License-Identifier: GPL-2.0
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define TX_BD_NUM  64
+#define RX_BD_NUM  128
+
+#define DELAY_OF_ONE_MILLISEC  1000
+
+/* Axi DMA Register definitions */
+
+#define XAXIDMA_TX_CR_OFFSET   0x /* Channel control */
+#define XAXIDMA_TX_SR_OFFSET   0x0004 /* Status */
+#define XAXIDMA_TX_CDESC_OFFSET0x0008 /* Current descriptor 
pointer */
+#define XAXIDMA_TX_TDESC_OFFSET0x0010 /* Tail descriptor pointer */
+
+#define XAXIDMA_RX_CR_OFFSET   0x0030 /* Channel control */
+#define XAXIDMA_RX_SR_OFFSET   0x0034 /* Status */
+#define XAXIDMA_RX_CDESC_OFFSET0x0038 /* Current descriptor 
pointer */
+#define XAXIDMA_RX_TDESC_OFFSET0x0040 /* Tail descriptor pointer */
+
+#define XAXIDMA_CR_RUNSTOP_MASK0x0001 /* Start/stop DMA channel */
+#define XAXIDMA_CR_RESET_MASK  0x0004 /* Reset DMA engine */
+
+#define XAXIDMA_BD_NDESC_OFFSET0x00 /* Next descriptor pointer 
*/
+#define XAXIDMA_BD_BUFA_OFFSET 0x08 /* Buffer address */
+#define XAXIDMA_BD_CTRL_LEN_OFFSET 0x18 /* Control/buffer length */
+#define XAXIDMA_BD_STS_OFFSET  0x1C /* Status */
+#define XAXIDMA_BD_USR0_OFFSET 0x20 /* User IP specific word0 */
+#define XAXIDMA_BD_USR1_OFFSET 0x24 /* User IP specific word1 */
+#define XAXIDMA_BD_USR2_OFFSET 0x28 /* User IP specific word2 */
+#define XAXIDMA_BD_USR3_OFFSET 0x2C /* User IP specific word3 */
+#define XAXIDMA_BD_USR4_OFFSET

Re: [PATCH v7 2/3] PCI: Enable PCIe Relaxed Ordering if supported

2017-07-13 Thread Sinan Kaya

On 7/13/2017 10:21 AM, Ding Tianhong wrote:
> static void pci_configure_relaxed_ordering(struct pci_dev *dev)
> +{
> + /* We should not alter the relaxed ordering bit for the VF */
> + if (dev->is_virtfn)
> + return;
> +
> + /* If the releaxed ordering enable bit is not set, do nothing. */
> + if (!pcie_relaxed_ordering_supported(dev))
> + return;
> +
> + if (pci_dev_should_disable_relaxed_ordering(dev)) {
> + pcie_clear_relaxed_ordering(dev);
> + dev_info(>dev, "Disable Relaxed Ordering\n");
> + }
> +}

I couldn't find anywhere where you actually enable the relaxed ordering
like the subject suggests.

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.

Re: [PATCH net] mlxsw: spectrum_router: do not drop refcnt on fib rule

2017-07-13 Thread Ido Schimmel

On Thu, Jul 13, 2017 at 02:39:10PM -0600, David Ahern wrote:
> On 7/13/17 2:33 PM, Ido Schimmel wrote:
> > Remains where? It's not clear to me how you concluded mlxsw is at fault.
> > My setup is running net-next with the refcount patches and I didn't
> > observe this.
> 
> Create a VRF.

BTW, this didn't show up on my dev branch as I've patches that introduce
IPv6 support where I move the rules notifications to core, after the
refcount is set to 1 and just before the netlink notification is sent.
https://github.com/idosch/linux/commit/7b17a21b1d71fc9a1969080e5fdcb90f376b73b2

Re: [PATCH net] mlxsw: spectrum_router: do not drop refcnt on fib rule

2017-07-13 Thread Ido Schimmel

On Thu, Jul 13, 2017 at 02:39:10PM -0600, David Ahern wrote:
> On 7/13/17 2:33 PM, Ido Schimmel wrote:
> > Remains where? It's not clear to me how you concluded mlxsw is at fault.
> > My setup is running net-next with the refcount patches and I didn't
> > observe this.
> 
> Create a VRF.

Yea, I wasn't running VRFs with the refcount patches.

Reproduced this on my system now. Thanks for the fix.

[PATCH] [for 4.13] net: qcom/emac: fix double free of SGMII IRQ during shutdown

2017-07-13 Thread Timur Tabi

If the interface is not up, then don't try to close it during a
shutdown.  This avoids possible double free of the IRQ, which
can happen during a shutdown.

Fixes: 03eb3eb4d4d5 ("net: qcom/emac: add shutdown function")
Signed-off-by: Timur Tabi 
---
 drivers/net/ethernet/qualcomm/emac/emac.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/qualcomm/emac/emac.c 
b/drivers/net/ethernet/qualcomm/emac/emac.c
index 746d94e..60850bf 100644
--- a/drivers/net/ethernet/qualcomm/emac/emac.c
+++ b/drivers/net/ethernet/qualcomm/emac/emac.c
@@ -766,11 +766,13 @@ static void emac_shutdown(struct platform_device *pdev)
struct emac_adapter *adpt = netdev_priv(netdev);
struct emac_sgmii *sgmii = >phy;
 
-   /* Closing the SGMII turns off its interrupts */
-   sgmii->close(adpt);
+   if (netdev->flags & IFF_UP) {
+   /* Closing the SGMII turns off its interrupts */
+   sgmii->close(adpt);
 
-   /* Resetting the MAC turns off all DMA and its interrupts */
-   emac_mac_reset(adpt);
+   /* Resetting the MAC turns off all DMA and its interrupts */
+   emac_mac_reset(adpt);
+   }
 }
 
 static struct platform_driver emac_platform_driver = {
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc.  Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.

Re: [PATCH] sctp: don't dereference ptr before leaving _sctp_walk_{params,errors}()

2017-07-13 Thread David Miller

From: Alexander Potapenko 
Date: Thu, 13 Jul 2017 21:28:39 +0200

> On Thu, Jul 13, 2017 at 8:32 PM, David Miller  wrote:
>> struct sctp_paramhdr {
>> __be16 type;
>> __be16 length;
>> };
>>
>> typedef struct sctp_errhdr {
>> __be16 cause;
>> __be16 length;
>> __u8  variable[0];
>> } sctp_errhdr_t;
 ...
>> Something like:
>>
>> pos.v + offsetof(pos.v, length) + sizeof(pos.v->length) <= (void *) 
>> chunk + end
> 
> Do we need to bother about truncated structures? Shouldn't it be
> enough to check that there's at least sizeof(struct sctp_paramhdr)
> bytes left then?

With the zero length array at the end, it's arguable what the "size"
of such a thing is.

That's why I tried to be explicit with the length field.

Re: [PATCH net] net: set fib rule refcount after malloc

2017-07-13 Thread David Miller

From: David Ahern 
Date: Thu, 13 Jul 2017 13:36:40 -0700

> The configure callback of fib_rules_ops can change the refcnt of a
> fib rule. For instance, mlxsw takes a refcnt when adding the processing
> of the rule to a work queue. Thus the rule refcnt can not be reset to
> to 1 afterwards. Move the refcnt setting to after the allocation.
> 
> Fixes: 5361e209dd30 ("net: avoid one splat in fib_nl_delrule()")
> Signed-off-by: David Ahern 

Can someone keep a scoreboard of how many bugs and arbitrary
unnecessary changes get introduced because of this new refcount_t
business rather than get fixed?

I'd really like to see how that pans out over the long term especially
with all of the backporting pain this facility is going to cause.

Anyways, David I will apply this, thanks.

Re: [PATCH net] mlxsw: spectrum_router: do not drop refcnt on fib rule

2017-07-13 Thread David Ahern

On 7/13/17 2:33 PM, Ido Schimmel wrote:
> Remains where? It's not clear to me how you concluded mlxsw is at fault.
> My setup is running net-next with the refcount patches and I didn't
> observe this.

Create a VRF.

see latest patch. mlxsw releasing the refcnt on the rule was the victim;
eric's patch to fix a delete was setting the refcnt to 1 after mlxsw
bumped it.

[PATCH net] net: set fib rule refcount after malloc

2017-07-13 Thread David Ahern

The configure callback of fib_rules_ops can change the refcnt of a
fib rule. For instance, mlxsw takes a refcnt when adding the processing
of the rule to a work queue. Thus the rule refcnt can not be reset to
to 1 afterwards. Move the refcnt setting to after the allocation.

Fixes: 5361e209dd30 ("net: avoid one splat in fib_nl_delrule()")
Signed-off-by: David Ahern 
---
 net/core/fib_rules.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c
index a0093e1b0235..fdcb1bcd2afa 100644
--- a/net/core/fib_rules.c
+++ b/net/core/fib_rules.c
@@ -400,6 +400,7 @@ int fib_nl_newrule(struct sk_buff *skb, struct nlmsghdr 
*nlh,
err = -ENOMEM;
goto errout;
}
+   refcount_set(>refcnt, 1);
rule->fr_net = net;
 
rule->pref = tb[FRA_PRIORITY] ? nla_get_u32(tb[FRA_PRIORITY])
@@ -517,8 +518,6 @@ int fib_nl_newrule(struct sk_buff *skb, struct nlmsghdr 
*nlh,
last = r;
}
 
-   refcount_set(>refcnt, 1);
-
if (last)
list_add_rcu(>list, >list);
else
-- 
2.1.4

Re: [PATCH net] mlxsw: spectrum_router: do not drop refcnt on fib rule

2017-07-13 Thread Ido Schimmel

On Thu, Jul 13, 2017 at 01:46:15PM -0600, David Ahern wrote:
> On 7/13/17 1:11 PM, David Ahern wrote:
> > Since mlxsw is not doing a get on the rule to increase the ref count, it
> > should not be doing a put.
> 
> upon further review, mlxsw is doing a get on the rule
> 
> Problem remains, but this is not the right fix.

Remains where? It's not clear to me how you concluded mlxsw is at fault.
My setup is running net-next with the refcount patches and I didn't
observe this.

If current trace isn't enough to pinpoint the problem, can you try to
reproduce with a KASAN enabled kernel?

Re: [PATCH net] libceph: osdmap: Fix some NULL dereferences

2017-07-13 Thread Dan Carpenter

On Thu, Jul 13, 2017 at 06:13:22PM +0200, Ilya Dryomov wrote:
> 
> Hi Dan,
> 
> I applied osdmap_apply_incremental() hunk and fixed a similar bug in
> osdmap_decode() (it's a not NULL deref, that's why smatch didn't catch
> it).
> 

No...  :/  Smatch complained about it, but I somehow marked it as done
yesterday without sending a patch.  Thanks for catching that.

The Smatch check is just looking for places where we do ERR_PTR(0) or
ERR_PTR(valid_ptr).  That can be intentional, so there is some hand
reviewing involved.

regards,
dan carpenter

Re: [PATCH net] mlxsw: spectrum_router: do not drop refcnt on fib rule

2017-07-13 Thread David Ahern

On 7/13/17 1:11 PM, David Ahern wrote:
> Since mlxsw is not doing a get on the rule to increase the ref count, it
> should not be doing a put.

upon further review, mlxsw is doing a get on the rule

Problem remains, but this is not the right fix.

Re: [PATCH] sctp: don't dereference ptr before leaving _sctp_walk_{params,errors}()

2017-07-13 Thread Alexander Potapenko

On Thu, Jul 13, 2017 at 8:32 PM, David Miller  wrote:
> From: Alexander Potapenko 
> Date: Thu, 13 Jul 2017 20:10:34 +0200
>
>> diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
>> index a9519a06a23b..f13632ee33f0 100644
>> --- a/include/net/sctp/sctp.h
>> +++ b/include/net/sctp/sctp.h
>> @@ -469,6 +469,7 @@ _sctp_walk_params((pos), (chunk), 
>> ntohs((chunk)->chunk_hdr.length), member)
>>
>>  #define _sctp_walk_params(pos, chunk, end, member)\
>>  for (pos.v = chunk->member;\
>> + pos.v < (void *)chunk + end &&\
>>   pos.v <= (void *)chunk + end - ntohs(pos.p->length) &&\
>>   ntohs(pos.p->length) >= sizeof(struct sctp_paramhdr);\
>>   pos.v += SCTP_PAD4(ntohs(pos.p->length)))
>> @@ -479,6 +480,7 @@ _sctp_walk_errors((err), (chunk_hdr), 
>> ntohs((chunk_hdr)->length))
>>  #define _sctp_walk_errors(err, chunk_hdr, end)\
>>  for (err = (sctp_errhdr_t *)((void *)chunk_hdr + \
>>   sizeof(struct sctp_chunkhdr));\
>> + (void *)err <= (void *)chunk_hdr + end &&\
>>   (void *)err <= (void *)chunk_hdr + end - ntohs(err->length) &&\
>>   ntohs(err->length) >= sizeof(sctp_errhdr_t); \
>>   err = (sctp_errhdr_t *)((void *)err + SCTP_PAD4(ntohs(err->length
>
> Even with the "err < ..." fixed in the second hunk, I still think you need
> to tweak these checks some more.
>
> What is necessary is that the first two members of sctp_paramhdr or
> sctp_errhdr are in the range ptr to end.
>
> struct sctp_paramhdr {
> __be16 type;
> __be16 length;
> };
>
> typedef struct sctp_errhdr {
> __be16 cause;
> __be16 length;
> __u8  variable[0];
> } sctp_errhdr_t;
>
> so that we can legally dereference ->length.
>
> But that is not what your new check is doing.  Your new check is only
> making sure there is at least one byte there.
>
> I guess it's not easy to write a clean test that doesn't hardcode the
> offset of the length field and it's size.
>
> Something like:
>
> pos.v + offsetof(pos.v, length) + sizeof(pos.v->length) <= (void *) 
> chunk + end

Do we need to bother about truncated structures? Shouldn't it be
enough to check that there's at least sizeof(struct sctp_paramhdr)
bytes left then?
> and
>
> (void *)err + offsetof(err, length) + sizeof(err->length) <= (void *) 
> chunk_hdr + end
>
> And yeah, that isn't exactly concise nor pretty...



-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Matthew Scott Sucherman, Paul Terence Manicle
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

[PATCH net] mlxsw: spectrum_router: do not drop refcnt on fib rule

2017-07-13 Thread David Ahern

The recent conversion to refcount_t, 717d1e993ad8 ("net: convert
fib_rule.refcnt from atomic_t to refcount_t"), and subsequent fix
by Eric, 5361e209dd30 ("net: avoid one splat in fib_nl_delrule()"),
exposed a bug in mlxsw.

The driver is doing a put on fib rules after processing it from the
notifier. This triggers a BUG on:

[  104.444889] BUG: unable to handle kernel NULL pointer dereference at 
0010
[  104.452821] IP: fib_rules_lookup+0x39/0x170
[  104.457056] PGD 409395067
[  104.457057] P4D 409395067
[  104.459783] PUD 408c23067
[  104.462507] PMD 0
...
[  104.519750] CPU: 1 PID: 900 Comm: vrf Tainted: GW   4.12.0-rc7+ 
#51
[  104.527133] Hardware name: Mellanox Technologies Ltd. Mellanox 
switch/Mellanox switch, BIOS 4.6.5 05/21/2015
[  104.537084] task: 880401454380 task.stack: c97c
[  104.543029] RIP: 0010:fib_rules_lookup+0x39/0x170
[  104.547784] RSP: :88041dd039d8 EFLAGS: 00010207
[  104.553053] RAX: d8e1b910 RBX:  RCX: 0002
[  104.560264] RDX: fff5 RSI:  RDI: 880408d80f30
[  104.567461] RBP: 88041dd03a08 R08: 001d R09: 
[  104.574699] R10:  R11: 0006 R12: 88040b160cc0
[  104.581916] R13: 88041dd03a18 R14: 88040b160d40 R15: 88041dd03aa0
[  104.589130] FS:  7f44b0edf700() GS:88041dd0() 
knlGS:
[  104.597330] CS:  0010 DS:  ES:  CR0: 80050033
[  104.603151] CR2: 0010 CR3: 000408ca8000 CR4: 001406e0
[  104.610371] Call Trace:
[  104.612839]  
[  104.614872]  __fib_lookup+0x54/0x90
[  104.618406]  fib_validate_source+0x31d/0x570
[  104.622731]  ? fib_rules_lookup+0x131/0x170
[  104.626975]  ? __fib_lookup+0x54/0x90
[  104.630685]  ip_route_input_rcu+0xbcf/0xd30

Since mlxsw is not doing a get on the rule to increase the ref count, it
should not be doing a put.

Fixes: 5d7bfd141924a("ipv4: fib_rules: Dump FIB rules when registering FIB 
notifier")
Signed-off-by: David Ahern 
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c 
b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
index 383fef5a8e24..b0fb8e5e83c9 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
@@ -2844,7 +2844,6 @@ static void mlxsw_sp_router_fib_event_work(struct 
work_struct *work)
rule = fib_work->fr_info.rule;
if (!fib4_rule_default(rule) && !rule->l3mdev)
mlxsw_sp_router_fib4_abort(mlxsw_sp);
-   fib_rule_put(rule);
break;
case FIB_EVENT_NH_ADD: /* fall through */
case FIB_EVENT_NH_DEL:
-- 
2.1.4

[PATCH 3/3] Documentation: devicetree: net: optional idm regs for bgmac

2017-07-13 Thread Abhishek Shah

Specifying IDM register space in DT is not mendatory for SoCs
where firmware takes care of IDM operations. This patch updates
BGMAC driver's DT binding documentation indicating the same.

Signed-off-by: Abhishek Shah 
Reviewed-by: Ray Jui 
Reviewed-by: Oza Oza 
Reviewed-by: Scott Branden 
---
 Documentation/devicetree/bindings/net/brcm,amac.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/devicetree/bindings/net/brcm,amac.txt 
b/Documentation/devicetree/bindings/net/brcm,amac.txt
index 2fefa1a..ad16c1f 100644
--- a/Documentation/devicetree/bindings/net/brcm,amac.txt
+++ b/Documentation/devicetree/bindings/net/brcm,amac.txt
@@ -11,6 +11,7 @@ Required properties:
  - reg-names:  Names of the registers.
"amac_base":Address and length of the GMAC registers
"idm_base": Address and length of the GMAC IDM registers
+   (required for NSP and Northstar2)
"nicpm_base":   Address and length of the NIC Port Manager
registers (required for Northstar2)
  - interrupts: Interrupt number
-- 
2.7.4

[PATCH 2/3] net: ethernet: bgmac: Make IDM register space optional

2017-07-13 Thread Abhishek Shah

IDM operations are usually one time ops and should be done in
firmware itself. Driver is not supposed to touch IDM registers.

However, for some SoCs', driver is performing IDM read/writes.
So this patch masks IDM operations in case firmware is taking
care of IDM operations.

Signed-off-by: Abhishek Shah 
Reviewed-by: Oza Oza 
Reviewed-by: Ray Jui 
Reviewed-by: Scott Branden 
---
 drivers/net/ethernet/broadcom/bgmac-platform.c | 19 ---
 drivers/net/ethernet/broadcom/bgmac.c  | 70 +++---
 drivers/net/ethernet/broadcom/bgmac.h  |  1 +
 3 files changed, 55 insertions(+), 35 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bgmac-platform.c 
b/drivers/net/ethernet/broadcom/bgmac-platform.c
index 1ca75de..d937083d 100644
--- a/drivers/net/ethernet/broadcom/bgmac-platform.c
+++ b/drivers/net/ethernet/broadcom/bgmac-platform.c
@@ -55,6 +55,9 @@ static void platform_bgmac_idm_write(struct bgmac *bgmac, u16 
offset, u32 value)
 
 static bool platform_bgmac_clk_enabled(struct bgmac *bgmac)
 {
+   if (!bgmac->plat.idm_base)
+   return true;
+
if ((bgmac_idm_read(bgmac, BCMA_IOCTL) & BGMAC_CLK_EN) != BGMAC_CLK_EN)
return false;
if (bgmac_idm_read(bgmac, BCMA_RESET_CTL) & BCMA_RESET_CTL_RESET)
@@ -66,6 +69,9 @@ static void platform_bgmac_clk_enable(struct bgmac *bgmac, 
u32 flags)
 {
u32 val;
 
+   if (!bgmac->plat.idm_base)
+   return;
+
/* The Reset Control register only contains a single bit to show if the
 * controller is currently in reset.  Do a sanity check here, just in
 * case the bootloader happened to leave the device in reset.
@@ -180,6 +186,7 @@ static int bgmac_probe(struct platform_device *pdev)
bgmac->feature_flags |= BGMAC_FEAT_CMDCFG_SR_REV4;
bgmac->feature_flags |= BGMAC_FEAT_TX_MASK_SETUP;
bgmac->feature_flags |= BGMAC_FEAT_RX_MASK_SETUP;
+   bgmac->feature_flags |= BGMAC_FEAT_IDM_MASK;
 
bgmac->dev = >dev;
bgmac->dma_dev = >dev;
@@ -207,15 +214,13 @@ static int bgmac_probe(struct platform_device *pdev)
return PTR_ERR(bgmac->plat.base);
 
regs = platform_get_resource_byname(pdev, IORESOURCE_MEM, "idm_base");
-   if (!regs) {
-   dev_err(>dev, "Unable to obtain idm resource\n");
-   return -EINVAL;
+   if (regs) {
+   bgmac->plat.idm_base = devm_ioremap_resource(>dev, regs);
+   if (IS_ERR(bgmac->plat.idm_base))
+   return PTR_ERR(bgmac->plat.idm_base);
+   bgmac->feature_flags &= ~BGMAC_FEAT_IDM_MASK;
}
 
-   bgmac->plat.idm_base = devm_ioremap_resource(>dev, regs);
-   if (IS_ERR(bgmac->plat.idm_base))
-   return PTR_ERR(bgmac->plat.idm_base);
-
regs = platform_get_resource_byname(pdev, IORESOURCE_MEM, "nicpm_base");
if (regs) {
bgmac->plat.nicpm_base = devm_ioremap_resource(>dev,
diff --git a/drivers/net/ethernet/broadcom/bgmac.c 
b/drivers/net/ethernet/broadcom/bgmac.c
index ba4d2e1..48d672b 100644
--- a/drivers/net/ethernet/broadcom/bgmac.c
+++ b/drivers/net/ethernet/broadcom/bgmac.c
@@ -622,9 +622,11 @@ static int bgmac_dma_alloc(struct bgmac *bgmac)
BUILD_BUG_ON(BGMAC_MAX_TX_RINGS > ARRAY_SIZE(ring_base));
BUILD_BUG_ON(BGMAC_MAX_RX_RINGS > ARRAY_SIZE(ring_base));
 
-   if (!(bgmac_idm_read(bgmac, BCMA_IOST) & BCMA_IOST_DMA64)) {
-   dev_err(bgmac->dev, "Core does not report 64-bit DMA\n");
-   return -ENOTSUPP;
+   if (!(bgmac->feature_flags & BGMAC_FEAT_IDM_MASK)) {
+   if (!(bgmac_idm_read(bgmac, BCMA_IOST) & BCMA_IOST_DMA64)) {
+   dev_err(bgmac->dev, "Core does not report 64-bit 
DMA\n");
+   return -ENOTSUPP;
+   }
}
 
for (i = 0; i < BGMAC_MAX_TX_RINGS; i++) {
@@ -855,9 +857,11 @@ static void bgmac_mac_speed(struct bgmac *bgmac)
 static void bgmac_miiconfig(struct bgmac *bgmac)
 {
if (bgmac->feature_flags & BGMAC_FEAT_FORCE_SPEED_2500) {
-   bgmac_idm_write(bgmac, BCMA_IOCTL,
-   bgmac_idm_read(bgmac, BCMA_IOCTL) | 0x40 |
-   BGMAC_BCMA_IOCTL_SW_CLKEN);
+   if (!(bgmac->feature_flags & BGMAC_FEAT_IDM_MASK)) {
+   bgmac_idm_write(bgmac, BCMA_IOCTL,
+   bgmac_idm_read(bgmac, BCMA_IOCTL) |
+   0x40 | BGMAC_BCMA_IOCTL_SW_CLKEN);
+   }
bgmac->mac_speed = SPEED_2500;
bgmac->mac_duplex = DUPLEX_FULL;
bgmac_mac_speed(bgmac);
@@ -874,11 +878,36 @@ static void bgmac_miiconfig(struct bgmac *bgmac)
}
 }
 
+static void bgmac_chip_reset_idm_config(struct bgmac *bgmac)
+{
+

[PATCH 1/3] net: ethernet: bgmac: Remove unnecessary 'return' from platform_bgmac_idm_write

2017-07-13 Thread Abhishek Shah

Return type for idm register write callback should be void as 'writel'
API is used for write operation. However, there no need to have 'return'
in this function.

Signed-off-by: Abhishek Shah 
Reviewed-by: Oza Oza 
Reviewed-by: Ray Jui 
Reviewed-by: Scott Branden 
---
 drivers/net/ethernet/broadcom/bgmac-platform.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/bgmac-platform.c 
b/drivers/net/ethernet/broadcom/bgmac-platform.c
index 73aca97..1ca75de 100644
--- a/drivers/net/ethernet/broadcom/bgmac-platform.c
+++ b/drivers/net/ethernet/broadcom/bgmac-platform.c
@@ -50,7 +50,7 @@ static u32 platform_bgmac_idm_read(struct bgmac *bgmac, u16 
offset)
 
 static void platform_bgmac_idm_write(struct bgmac *bgmac, u16 offset, u32 
value)
 {
-   return writel(value, bgmac->plat.idm_base + offset);
+   writel(value, bgmac->plat.idm_base + offset);
 }
 
 static bool platform_bgmac_clk_enabled(struct bgmac *bgmac)
-- 
2.7.4

[PATCH 0/3] Extend BGMAC driver for Stingray SoC

2017-07-13 Thread Abhishek Shah

The patchset extends Broadcom BGMAC driver for Broadcom Stingray SoC.

This patchset is based on Linux-4.12 and tested on NS2 and Stingray.

Abhishek Shah (3):
  net: ethernet: bgmac: Remove unnecessary 'return' from
platform_bgmac_idm_write
  net: ethernet: bgmac: Make IDM register space optional in BGMAC driver
  Documentation: devicetree: net: optional idm regs for bgmac

 .../devicetree/bindings/net/brcm,amac.txt  |  1 +
 drivers/net/ethernet/broadcom/bgmac-platform.c | 21 ---
 drivers/net/ethernet/broadcom/bgmac.c  | 70 +-
 drivers/net/ethernet/broadcom/bgmac.h  |  1 +
 4 files changed, 57 insertions(+), 36 deletions(-)

-- 
2.7.4

Re: [PATCH] sctp: don't dereference ptr before leaving _sctp_walk_{params,errors}()

2017-07-13 Thread David Miller

From: Alexander Potapenko 
Date: Thu, 13 Jul 2017 20:10:34 +0200

> diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
> index a9519a06a23b..f13632ee33f0 100644
> --- a/include/net/sctp/sctp.h
> +++ b/include/net/sctp/sctp.h
> @@ -469,6 +469,7 @@ _sctp_walk_params((pos), (chunk), 
> ntohs((chunk)->chunk_hdr.length), member)
>  
>  #define _sctp_walk_params(pos, chunk, end, member)\
>  for (pos.v = chunk->member;\
> + pos.v < (void *)chunk + end &&\
>   pos.v <= (void *)chunk + end - ntohs(pos.p->length) &&\
>   ntohs(pos.p->length) >= sizeof(struct sctp_paramhdr);\
>   pos.v += SCTP_PAD4(ntohs(pos.p->length)))
> @@ -479,6 +480,7 @@ _sctp_walk_errors((err), (chunk_hdr), 
> ntohs((chunk_hdr)->length))
>  #define _sctp_walk_errors(err, chunk_hdr, end)\
>  for (err = (sctp_errhdr_t *)((void *)chunk_hdr + \
>   sizeof(struct sctp_chunkhdr));\
> + (void *)err <= (void *)chunk_hdr + end &&\
>   (void *)err <= (void *)chunk_hdr + end - ntohs(err->length) &&\
>   ntohs(err->length) >= sizeof(sctp_errhdr_t); \
>   err = (sctp_errhdr_t *)((void *)err + SCTP_PAD4(ntohs(err->length

Even with the "err < ..." fixed in the second hunk, I still think you need
to tweak these checks some more.

What is necessary is that the first two members of sctp_paramhdr or
sctp_errhdr are in the range ptr to end.

struct sctp_paramhdr {
__be16 type;
__be16 length;
};

typedef struct sctp_errhdr {
__be16 cause;
__be16 length;
__u8  variable[0];
} sctp_errhdr_t;

so that we can legally dereference ->length.

But that is not what your new check is doing.  Your new check is only
making sure there is at least one byte there.

I guess it's not easy to write a clean test that doesn't hardcode the
offset of the length field and it's size.

Something like:

pos.v + offsetof(pos.v, length) + sizeof(pos.v->length) <= (void *) 
chunk + end

and

(void *)err + offsetof(err, length) + sizeof(err->length) <= (void *) 
chunk_hdr + end

And yeah, that isn't exactly concise nor pretty...

RE: [PATCH] smsc95xx: use ethtool_op_get_ts_info()

2017-07-13 Thread Woojung.Huh

> -Original Message-
> From: Petr Kulhavy [mailto:br...@jikos.cz]
> Sent: Thursday, July 13, 2017 1:41 PM
> To: steve.glendinn...@shawell.net; UNGLinuxDriver
> Cc: netdev@vger.kernel.org; linux-...@vger.kernel.org; Petr Kulhavy
> Subject: [PATCH] smsc95xx: use ethtool_op_get_ts_info()
> 
> This change enables the use of SW timestamping on Raspberry PI.
> 
> smsc95xx uses the usbnet transmit function usbnet_start_xmit(), which
> implements software timestamping. However the
> SOF_TIMESTAMPING_TX_SOFTWARE
> capability was missing and only SOF_TIMESTAMPING_RX_SOFTWARE was
> announced.
> By using ethtool_op_get_ts_info() as get_ts_info() also the
> SOF_TIMESTAMPING_TX_SOFTWARE is announced.
> 
> Signed-off-by: Petr Kulhavy 

Reviewed-by: Woojung Huh 

Thanks

Re: [RFC PATCH 00/12] Implement XDP bpf_redirect vairants

2017-07-13 Thread David Miller

From: John Fastabend 
Date: Thu, 13 Jul 2017 10:00:15 -0700

> On 07/13/2017 09:16 AM, Jesper Dangaard Brouer wrote:
>> On Thu, 13 Jul 2017 13:14:30 +0200
>> Jesper Dangaard Brouer  wrote:
>> 
>>> I'm still getting crashes (but much harder to provoke), but I figured
>>> out why.  We sort of missed one case, where map_to_flush gets set, when
>>> the ndo_xdp_xmit() call starts to fail, and the ixgbe driver then
>>> forgets to call xdp_do_flush_map, if all packets in that NAPI cycle
>>> failed.  We could blame the driver, but yhe clean solution is making
>>> sure, that we don't set map_to_flush when the __bpf_tx_xdp() call
>>> fails. It should also handle the other case I fixed  I'll cleanup
>>> my PoC-fix patch, test it and provide it here.
>> 
>> I changed flow in the function to be:
> 
> 
> Great, I'll merge this, the other couple fixes, and the bitops optimization 
> and
> hopefully then we are set. I'll post a v2 and we can do some final checks.

I am so looking forward to merging this, great work everyone.

Re: [PATCH] sctp: don't dereference ptr before leaving _sctp_walk_{params,errors}()

2017-07-13 Thread Alexander Potapenko

On Thu, Jul 13, 2017 at 8:14 PM, Alexander Potapenko  wrote:
> On Thu, Jul 13, 2017 at 8:10 PM, Alexander Potapenko  
> wrote:
>> If the iterator (|pos.p| or |err|) has already reached the end of
>> chunk, we shouldn't access iterator->length.
>>
>> This bug has been detected by KMSAN. For the following pair of system
>> calls:
>>
>>   socket(PF_INET6, SOCK_STREAM, 0x84 /* IPPROTO_??? */) = 3
>>   sendto(3, "A", 1, MSG_OOB, {sa_family=AF_INET6, sin6_port=htons(0),
>>  inet_pton(AF_INET6, "::1", _addr), sin6_flowinfo=0,
>>  sin6_scope_id=0}, 28) = 1
>>
>> the tool has reported a use of uninitialized memory:
>>
>>   ==
>>   BUG: KMSAN: use of uninitialized memory in sctp_rcv+0x17b8/0x43b0
>>   CPU: 1 PID: 2940 Comm: probe Not tainted 4.11.0-rc5+ #2926
>>   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
>>   01/01/2011
>>   Call Trace:
>>
>>__dump_stack lib/dump_stack.c:16
>>dump_stack+0x172/0x1c0 lib/dump_stack.c:52
>>kmsan_report+0x12a/0x180 mm/kmsan/kmsan.c:927
>>__msan_warning_32+0x61/0xb0 mm/kmsan/kmsan_instr.c:469
>>__sctp_rcv_init_lookup net/sctp/input.c:1074
>>__sctp_rcv_lookup_harder net/sctp/input.c:1233
>>__sctp_rcv_lookup net/sctp/input.c:1255
>>sctp_rcv+0x17b8/0x43b0 net/sctp/input.c:170
>>sctp6_rcv+0x32/0x70 net/sctp/ipv6.c:984
>>ip6_input_finish+0x82f/0x1ee0 net/ipv6/ip6_input.c:279
>>NF_HOOK ./include/linux/netfilter.h:257
>>ip6_input+0x239/0x290 net/ipv6/ip6_input.c:322
>>dst_input ./include/net/dst.h:492
>>ip6_rcv_finish net/ipv6/ip6_input.c:69
>>NF_HOOK ./include/linux/netfilter.h:257
>>ipv6_rcv+0x1dbd/0x22e0 net/ipv6/ip6_input.c:203
>>__netif_receive_skb_core+0x2f6f/0x3a20 net/core/dev.c:4208
>>__netif_receive_skb net/core/dev.c:4246
>>process_backlog+0x667/0xba0 net/core/dev.c:4866
>>napi_poll net/core/dev.c:5268
>>net_rx_action+0xc95/0x1590 net/core/dev.c:5333
>>__do_softirq+0x485/0x942 kernel/softirq.c:284
>>do_softirq_own_stack+0x1c/0x30 arch/x86/entry/entry_64.S:902
>>
>>do_softirq kernel/softirq.c:328
>>__local_bh_enable_ip+0x25b/0x290 kernel/softirq.c:181
>>local_bh_enable+0x37/0x40 ./include/linux/bottom_half.h:31
>>rcu_read_unlock_bh ./include/linux/rcupdate.h:931
>>ip6_finish_output2+0x19b2/0x1cf0 net/ipv6/ip6_output.c:124
>>ip6_finish_output+0x764/0x970 net/ipv6/ip6_output.c:149
>>NF_HOOK_COND ./include/linux/netfilter.h:246
>>ip6_output+0x456/0x520 net/ipv6/ip6_output.c:163
>>dst_output ./include/net/dst.h:486
>>NF_HOOK ./include/linux/netfilter.h:257
>>ip6_xmit+0x1841/0x1c00 net/ipv6/ip6_output.c:261
>>sctp_v6_xmit+0x3b7/0x470 net/sctp/ipv6.c:225
>>sctp_packet_transmit+0x38cb/0x3a20 net/sctp/output.c:632
>>sctp_outq_flush+0xeb3/0x46e0 net/sctp/outqueue.c:885
>>sctp_outq_uncork+0xb2/0xd0 net/sctp/outqueue.c:750
>>sctp_side_effects net/sctp/sm_sideeffect.c:1773
>>sctp_do_sm+0x6962/0x6ec0 net/sctp/sm_sideeffect.c:1147
>>sctp_primitive_ASSOCIATE+0x12c/0x160 net/sctp/primitive.c:88
>>sctp_sendmsg+0x43e5/0x4f90 net/sctp/socket.c:1954
>>inet_sendmsg+0x498/0x670 net/ipv4/af_inet.c:762
>>sock_sendmsg_nosec net/socket.c:633
>>sock_sendmsg net/socket.c:643
>>SYSC_sendto+0x608/0x710 net/socket.c:1696
>>SyS_sendto+0x8a/0xb0 net/socket.c:1664
>>do_syscall_64+0xe6/0x130 arch/x86/entry/common.c:285
>>entry_SYSCALL64_slow_path+0x25/0x25 arch/x86/entry/entry_64.S:246
>>   RIP: 0033:0x401133
>>   RSP: 002b:7fff6d99cd38 EFLAGS: 0246 ORIG_RAX: 002c
>>   RAX: ffda RBX: 004002b0 RCX: 00401133
>>   RDX: 0001 RSI: 00494088 RDI: 0003
>>   RBP: 7fff6d99cd90 R08: 7fff6d99cd50 R09: 001c
>>   R10: 0001 R11: 0246 R12: 
>>   R13: 004063d0 R14: 00406460 R15: 
>>   origin:
>>save_stack_trace+0x37/0x40 arch/x86/kernel/stacktrace.c:59
>>kmsan_save_stack_with_flags mm/kmsan/kmsan.c:302
>>kmsan_internal_poison_shadow+0xb1/0x1a0 mm/kmsan/kmsan.c:198
>>kmsan_poison_shadow+0x6d/0xc0 mm/kmsan/kmsan.c:211
>>slab_alloc_node mm/slub.c:2743
>>__kmalloc_node_track_caller+0x200/0x360 mm/slub.c:4351
>>__kmalloc_reserve net/core/skbuff.c:138
>>__alloc_skb+0x26b/0x840 net/core/skbuff.c:231
>>alloc_skb ./include/linux/skbuff.h:933
>>sctp_packet_transmit+0x31e/0x3a20 net/sctp/output.c:570
>>sctp_outq_flush+0xeb3/0x46e0 net/sctp/outqueue.c:885
>>sctp_outq_uncork+0xb2/0xd0 net/sctp/outqueue.c:750
>>sctp_side_effects net/sctp/sm_sideeffect.c:1773
>>sctp_do_sm+0x6962/0x6ec0 net/sctp/sm_sideeffect.c:1147
>>sctp_primitive_ASSOCIATE+0x12c/0x160 net/sctp/primitive.c:88
>>sctp_sendmsg+0x43e5/0x4f90 net/sctp/socket.c:1954
>>inet_sendmsg+0x498/0x670

Re: [PATCH v7 3/3] net/cxgb4: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag

2017-07-13 Thread Alexander Duyck

On Thu, Jul 13, 2017 at 11:14 AM, Alexander Duyck
 wrote:
> On Thu, Jul 13, 2017 at 7:21 AM, Ding Tianhong  
> wrote:
>> From: Casey Leedom 
>>
>> cxgb4 Ethernet driver now queries PCIe configuration space to determine
>> if it can send TLPs to it with the Relaxed Ordering Attribute set.
>>
>> Remove the enable_pcie_relaxed_ordering() to avoid enable PCIe Capability
>> Device Control[Relaxed Ordering Enable] at probe routine, to make sure
>> the driver will not send the Relaxed Ordering TLPs to the Root Complex which
>> could not deal the Relaxed Ordering TLPs.
>>
>> Signed-off-by: Casey Leedom 
>> Signed-off-by: Ding Tianhong 
>
> Ding,
>
> You can probably just drop this patch. If I am understanding Casey
> correctly just the fact that the relaxed ordering enable bit is
> cleared in the configuration should be enough to do this for the
> device automatically.
>
> - Alex

Actually I take that back. I hadn't caught the most recent parts of
the thread. If this is good for Casey then this works for me.

- Alex

Re: [PATCH net-next v1 0/3] Flow Based GTP Tunneling

2017-07-13 Thread Joe Stringer

On 13 July 2017 at 00:12, Harald Welte  wrote:
> hi Jiannan,
>
> net-next si closed, as it has been pointed out already by Joe.
>
> On Wed, Jul 12, 2017 at 05:44:52PM -0700, Jiannan Ouyang wrote:
>> ovs-ofctl add-flow br0
>> "in_port=2,tun_src=192.168.60.141,tun_id=123, \
>> actions=set_field:02:00:00:00:00:00->eth_src, \
>> set_field:ff:ff:ff:ff:ff:ff->eth_dst,LOCAL"
>
> I'm not familiar with the details here, but does this imply that you
> are matching on the outer (transport layer) source IP address? If so,
> please note this is in violation of the 3GPP specifications for GTP,
> which require explicitly that the TEID is the *only* criteria for
> matching an encapsulated packet to the tunnel.  Basically anyone can
> send you an encapsulated packet from any random source IP, just as long
> as the TEID matches a tunnel, it will be decapsulated.
>
> This is [presumably] in order to take care of mobility, as the
> subscribers' phone moves around different MME/S-GW/SGSN, each having
> different source IP addresses.

I think that this will be hard to avoid; the several existing tunnel
implementations that OVS plugs into all allow matching on the outer
addresses/ports, I don't see a good way to restrict it without
introducing completely new metadata_dst paths for GTP. I'd prefer not
to introduce something like this if we can avoid it; several tunnels
currently share all of the same metadata_dst code and that's proven
sufficient for all cases so far. If someone wishes to implement the
3GPP standard correctly then they should not create matches like this.
In quite a few cases, OVS tends to take the approach that we give the
user the tools to do what they need to do, but if they wish to shoot
themselves in the foot then that's up to them. We can of course work
towards ensuring the OVS userspace guides users in the right direction
though.

Re: [PATCH v7 3/3] net/cxgb4: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag

2017-07-13 Thread Alexander Duyck

On Thu, Jul 13, 2017 at 7:21 AM, Ding Tianhong  wrote:
> From: Casey Leedom 
>
> cxgb4 Ethernet driver now queries PCIe configuration space to determine
> if it can send TLPs to it with the Relaxed Ordering Attribute set.
>
> Remove the enable_pcie_relaxed_ordering() to avoid enable PCIe Capability
> Device Control[Relaxed Ordering Enable] at probe routine, to make sure
> the driver will not send the Relaxed Ordering TLPs to the Root Complex which
> could not deal the Relaxed Ordering TLPs.
>
> Signed-off-by: Casey Leedom 
> Signed-off-by: Ding Tianhong 

Ding,

You can probably just drop this patch. If I am understanding Casey
correctly just the fact that the relaxed ordering enable bit is
cleared in the configuration should be enough to do this for the
device automatically.

- Alex

> ---
>  drivers/net/ethernet/chelsio/cxgb4/cxgb4.h  |  1 +
>  drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 23 +--
>  drivers/net/ethernet/chelsio/cxgb4/sge.c|  5 +++--
>  3 files changed, 21 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h 
> b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
> index ef4be78..09ea62e 100644
> --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
> +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
> @@ -529,6 +529,7 @@ enum { /* adapter flags */
> USING_SOFT_PARAMS  = (1 << 6),
> MASTER_PF  = (1 << 7),
> FW_OFLD_CONN   = (1 << 9),
> +   ROOT_NO_RELAXED_ORDERING = (1 << 10),
>  };
>
>  enum {
> diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c 
> b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
> index e403fa1..391e484 100644
> --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
> +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
> @@ -4654,11 +4654,6 @@ static void print_port_info(const struct net_device 
> *dev)
> dev->name, adap->params.vpd.id, adap->name, buf);
>  }
>
> -static void enable_pcie_relaxed_ordering(struct pci_dev *dev)
> -{
> -   pcie_capability_set_word(dev, PCI_EXP_DEVCTL, 
> PCI_EXP_DEVCTL_RELAX_EN);
> -}
> -
>  /*
>   * Free the following resources:
>   * - memory used for tables
> @@ -4908,7 +4903,6 @@ static int init_one(struct pci_dev *pdev, const struct 
> pci_device_id *ent)
> }
>
> pci_enable_pcie_error_reporting(pdev);
> -   enable_pcie_relaxed_ordering(pdev);
> pci_set_master(pdev);
> pci_save_state(pdev);
>
> @@ -4947,6 +4941,23 @@ static int init_one(struct pci_dev *pdev, const struct 
> pci_device_id *ent)
> adapter->msg_enable = DFLT_MSG_ENABLE;
> memset(adapter->chan_map, 0xff, sizeof(adapter->chan_map));
>
> +   /* If possible, we use PCIe Relaxed Ordering Attribute to deliver
> +* Ingress Packet Data to Free List Buffers in order to allow for
> +* chipset performance optimizations between the Root Complex and
> +* Memory Controllers.  (Messages to the associated Ingress Queue
> +* notifying new Packet Placement in the Free Lists Buffers will be
> +* send without the Relaxed Ordering Attribute thus guaranteeing that
> +* all preceding PCIe Transaction Layer Packets will be processed
> +* first.)  But some Root Complexes have various issues with Upstream
> +* Transaction Layer Packets with the Relaxed Ordering Attribute set.
> +* The PCIe devices which under the Root Complexes will be cleared the
> +* Relaxed Ordering bit in the configuration space, So we check our
> +* PCIe configuration space to see if it's flagged with advice against
> +* using Relaxed Ordering.
> +*/
> +   if (!pcie_relaxed_ordering_supported(pdev))
> +   adapter->flags |= ROOT_NO_RELAXED_ORDERING;
> +
> spin_lock_init(>stats_lock);
> spin_lock_init(>tid_release_lock);
> spin_lock_init(>win0_lock);
> diff --git a/drivers/net/ethernet/chelsio/cxgb4/sge.c 
> b/drivers/net/ethernet/chelsio/cxgb4/sge.c
> index ede1220..4ef68f6 100644
> --- a/drivers/net/ethernet/chelsio/cxgb4/sge.c
> +++ b/drivers/net/ethernet/chelsio/cxgb4/sge.c
> @@ -2719,6 +2719,7 @@ int t4_sge_alloc_rxq(struct adapter *adap, struct 
> sge_rspq *iq, bool fwevtq,
> struct fw_iq_cmd c;
> struct sge *s = >sge;
> struct port_info *pi = netdev_priv(dev);
> +   int relaxed = !(adap->flags & ROOT_NO_RELAXED_ORDERING);
>
> /* Size needs to be multiple of 16, including status entry. */
> iq->size = roundup(iq->size, 16);
> @@ -2772,8 +2773,8 @@ int t4_sge_alloc_rxq(struct adapter *adap, struct 
> sge_rspq *iq, bool fwevtq,
>
> flsz = fl->size / 8 + s->stat_len / sizeof(struct tx_desc);
> c.iqns_to_fl0congen |= htonl(FW_IQ_CMD_FL0PACKEN_F |
> -

Re: [PATCH] sctp: don't dereference ptr before leaving _sctp_walk_{params,errors}()

2017-07-13 Thread Alexander Potapenko

On Thu, Jul 13, 2017 at 8:10 PM, Alexander Potapenko  wrote:
> If the iterator (|pos.p| or |err|) has already reached the end of
> chunk, we shouldn't access iterator->length.
>
> This bug has been detected by KMSAN. For the following pair of system
> calls:
>
>   socket(PF_INET6, SOCK_STREAM, 0x84 /* IPPROTO_??? */) = 3
>   sendto(3, "A", 1, MSG_OOB, {sa_family=AF_INET6, sin6_port=htons(0),
>  inet_pton(AF_INET6, "::1", _addr), sin6_flowinfo=0,
>  sin6_scope_id=0}, 28) = 1
>
> the tool has reported a use of uninitialized memory:
>
>   ==
>   BUG: KMSAN: use of uninitialized memory in sctp_rcv+0x17b8/0x43b0
>   CPU: 1 PID: 2940 Comm: probe Not tainted 4.11.0-rc5+ #2926
>   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
>   01/01/2011
>   Call Trace:
>
>__dump_stack lib/dump_stack.c:16
>dump_stack+0x172/0x1c0 lib/dump_stack.c:52
>kmsan_report+0x12a/0x180 mm/kmsan/kmsan.c:927
>__msan_warning_32+0x61/0xb0 mm/kmsan/kmsan_instr.c:469
>__sctp_rcv_init_lookup net/sctp/input.c:1074
>__sctp_rcv_lookup_harder net/sctp/input.c:1233
>__sctp_rcv_lookup net/sctp/input.c:1255
>sctp_rcv+0x17b8/0x43b0 net/sctp/input.c:170
>sctp6_rcv+0x32/0x70 net/sctp/ipv6.c:984
>ip6_input_finish+0x82f/0x1ee0 net/ipv6/ip6_input.c:279
>NF_HOOK ./include/linux/netfilter.h:257
>ip6_input+0x239/0x290 net/ipv6/ip6_input.c:322
>dst_input ./include/net/dst.h:492
>ip6_rcv_finish net/ipv6/ip6_input.c:69
>NF_HOOK ./include/linux/netfilter.h:257
>ipv6_rcv+0x1dbd/0x22e0 net/ipv6/ip6_input.c:203
>__netif_receive_skb_core+0x2f6f/0x3a20 net/core/dev.c:4208
>__netif_receive_skb net/core/dev.c:4246
>process_backlog+0x667/0xba0 net/core/dev.c:4866
>napi_poll net/core/dev.c:5268
>net_rx_action+0xc95/0x1590 net/core/dev.c:5333
>__do_softirq+0x485/0x942 kernel/softirq.c:284
>do_softirq_own_stack+0x1c/0x30 arch/x86/entry/entry_64.S:902
>
>do_softirq kernel/softirq.c:328
>__local_bh_enable_ip+0x25b/0x290 kernel/softirq.c:181
>local_bh_enable+0x37/0x40 ./include/linux/bottom_half.h:31
>rcu_read_unlock_bh ./include/linux/rcupdate.h:931
>ip6_finish_output2+0x19b2/0x1cf0 net/ipv6/ip6_output.c:124
>ip6_finish_output+0x764/0x970 net/ipv6/ip6_output.c:149
>NF_HOOK_COND ./include/linux/netfilter.h:246
>ip6_output+0x456/0x520 net/ipv6/ip6_output.c:163
>dst_output ./include/net/dst.h:486
>NF_HOOK ./include/linux/netfilter.h:257
>ip6_xmit+0x1841/0x1c00 net/ipv6/ip6_output.c:261
>sctp_v6_xmit+0x3b7/0x470 net/sctp/ipv6.c:225
>sctp_packet_transmit+0x38cb/0x3a20 net/sctp/output.c:632
>sctp_outq_flush+0xeb3/0x46e0 net/sctp/outqueue.c:885
>sctp_outq_uncork+0xb2/0xd0 net/sctp/outqueue.c:750
>sctp_side_effects net/sctp/sm_sideeffect.c:1773
>sctp_do_sm+0x6962/0x6ec0 net/sctp/sm_sideeffect.c:1147
>sctp_primitive_ASSOCIATE+0x12c/0x160 net/sctp/primitive.c:88
>sctp_sendmsg+0x43e5/0x4f90 net/sctp/socket.c:1954
>inet_sendmsg+0x498/0x670 net/ipv4/af_inet.c:762
>sock_sendmsg_nosec net/socket.c:633
>sock_sendmsg net/socket.c:643
>SYSC_sendto+0x608/0x710 net/socket.c:1696
>SyS_sendto+0x8a/0xb0 net/socket.c:1664
>do_syscall_64+0xe6/0x130 arch/x86/entry/common.c:285
>entry_SYSCALL64_slow_path+0x25/0x25 arch/x86/entry/entry_64.S:246
>   RIP: 0033:0x401133
>   RSP: 002b:7fff6d99cd38 EFLAGS: 0246 ORIG_RAX: 002c
>   RAX: ffda RBX: 004002b0 RCX: 00401133
>   RDX: 0001 RSI: 00494088 RDI: 0003
>   RBP: 7fff6d99cd90 R08: 7fff6d99cd50 R09: 001c
>   R10: 0001 R11: 0246 R12: 
>   R13: 004063d0 R14: 00406460 R15: 
>   origin:
>save_stack_trace+0x37/0x40 arch/x86/kernel/stacktrace.c:59
>kmsan_save_stack_with_flags mm/kmsan/kmsan.c:302
>kmsan_internal_poison_shadow+0xb1/0x1a0 mm/kmsan/kmsan.c:198
>kmsan_poison_shadow+0x6d/0xc0 mm/kmsan/kmsan.c:211
>slab_alloc_node mm/slub.c:2743
>__kmalloc_node_track_caller+0x200/0x360 mm/slub.c:4351
>__kmalloc_reserve net/core/skbuff.c:138
>__alloc_skb+0x26b/0x840 net/core/skbuff.c:231
>alloc_skb ./include/linux/skbuff.h:933
>sctp_packet_transmit+0x31e/0x3a20 net/sctp/output.c:570
>sctp_outq_flush+0xeb3/0x46e0 net/sctp/outqueue.c:885
>sctp_outq_uncork+0xb2/0xd0 net/sctp/outqueue.c:750
>sctp_side_effects net/sctp/sm_sideeffect.c:1773
>sctp_do_sm+0x6962/0x6ec0 net/sctp/sm_sideeffect.c:1147
>sctp_primitive_ASSOCIATE+0x12c/0x160 net/sctp/primitive.c:88
>sctp_sendmsg+0x43e5/0x4f90 net/sctp/socket.c:1954
>inet_sendmsg+0x498/0x670 net/ipv4/af_inet.c:762
>sock_sendmsg_nosec net/socket.c:633
>sock_sendmsg net/socket.c:643
>SYSC_sendto+0x608/0x710 net/socket.c:1696
>SyS_sendto+0x8a/0xb0 net/socket.c:1664
>

[PATCH] sctp: don't dereference ptr before leaving _sctp_walk_{params,errors}()

2017-07-13 Thread Alexander Potapenko

If the iterator (|pos.p| or |err|) has already reached the end of
chunk, we shouldn't access iterator->length.

This bug has been detected by KMSAN. For the following pair of system
calls:

  socket(PF_INET6, SOCK_STREAM, 0x84 /* IPPROTO_??? */) = 3
  sendto(3, "A", 1, MSG_OOB, {sa_family=AF_INET6, sin6_port=htons(0),
 inet_pton(AF_INET6, "::1", _addr), sin6_flowinfo=0,
 sin6_scope_id=0}, 28) = 1

the tool has reported a use of uninitialized memory:

  ==
  BUG: KMSAN: use of uninitialized memory in sctp_rcv+0x17b8/0x43b0
  CPU: 1 PID: 2940 Comm: probe Not tainted 4.11.0-rc5+ #2926
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
  01/01/2011
  Call Trace:
   
   __dump_stack lib/dump_stack.c:16
   dump_stack+0x172/0x1c0 lib/dump_stack.c:52
   kmsan_report+0x12a/0x180 mm/kmsan/kmsan.c:927
   __msan_warning_32+0x61/0xb0 mm/kmsan/kmsan_instr.c:469
   __sctp_rcv_init_lookup net/sctp/input.c:1074
   __sctp_rcv_lookup_harder net/sctp/input.c:1233
   __sctp_rcv_lookup net/sctp/input.c:1255
   sctp_rcv+0x17b8/0x43b0 net/sctp/input.c:170
   sctp6_rcv+0x32/0x70 net/sctp/ipv6.c:984
   ip6_input_finish+0x82f/0x1ee0 net/ipv6/ip6_input.c:279
   NF_HOOK ./include/linux/netfilter.h:257
   ip6_input+0x239/0x290 net/ipv6/ip6_input.c:322
   dst_input ./include/net/dst.h:492
   ip6_rcv_finish net/ipv6/ip6_input.c:69
   NF_HOOK ./include/linux/netfilter.h:257
   ipv6_rcv+0x1dbd/0x22e0 net/ipv6/ip6_input.c:203
   __netif_receive_skb_core+0x2f6f/0x3a20 net/core/dev.c:4208
   __netif_receive_skb net/core/dev.c:4246
   process_backlog+0x667/0xba0 net/core/dev.c:4866
   napi_poll net/core/dev.c:5268
   net_rx_action+0xc95/0x1590 net/core/dev.c:5333
   __do_softirq+0x485/0x942 kernel/softirq.c:284
   do_softirq_own_stack+0x1c/0x30 arch/x86/entry/entry_64.S:902
   
   do_softirq kernel/softirq.c:328
   __local_bh_enable_ip+0x25b/0x290 kernel/softirq.c:181
   local_bh_enable+0x37/0x40 ./include/linux/bottom_half.h:31
   rcu_read_unlock_bh ./include/linux/rcupdate.h:931
   ip6_finish_output2+0x19b2/0x1cf0 net/ipv6/ip6_output.c:124
   ip6_finish_output+0x764/0x970 net/ipv6/ip6_output.c:149
   NF_HOOK_COND ./include/linux/netfilter.h:246
   ip6_output+0x456/0x520 net/ipv6/ip6_output.c:163
   dst_output ./include/net/dst.h:486
   NF_HOOK ./include/linux/netfilter.h:257
   ip6_xmit+0x1841/0x1c00 net/ipv6/ip6_output.c:261
   sctp_v6_xmit+0x3b7/0x470 net/sctp/ipv6.c:225
   sctp_packet_transmit+0x38cb/0x3a20 net/sctp/output.c:632
   sctp_outq_flush+0xeb3/0x46e0 net/sctp/outqueue.c:885
   sctp_outq_uncork+0xb2/0xd0 net/sctp/outqueue.c:750
   sctp_side_effects net/sctp/sm_sideeffect.c:1773
   sctp_do_sm+0x6962/0x6ec0 net/sctp/sm_sideeffect.c:1147
   sctp_primitive_ASSOCIATE+0x12c/0x160 net/sctp/primitive.c:88
   sctp_sendmsg+0x43e5/0x4f90 net/sctp/socket.c:1954
   inet_sendmsg+0x498/0x670 net/ipv4/af_inet.c:762
   sock_sendmsg_nosec net/socket.c:633
   sock_sendmsg net/socket.c:643
   SYSC_sendto+0x608/0x710 net/socket.c:1696
   SyS_sendto+0x8a/0xb0 net/socket.c:1664
   do_syscall_64+0xe6/0x130 arch/x86/entry/common.c:285
   entry_SYSCALL64_slow_path+0x25/0x25 arch/x86/entry/entry_64.S:246
  RIP: 0033:0x401133
  RSP: 002b:7fff6d99cd38 EFLAGS: 0246 ORIG_RAX: 002c
  RAX: ffda RBX: 004002b0 RCX: 00401133
  RDX: 0001 RSI: 00494088 RDI: 0003
  RBP: 7fff6d99cd90 R08: 7fff6d99cd50 R09: 001c
  R10: 0001 R11: 0246 R12: 
  R13: 004063d0 R14: 00406460 R15: 
  origin:
   save_stack_trace+0x37/0x40 arch/x86/kernel/stacktrace.c:59
   kmsan_save_stack_with_flags mm/kmsan/kmsan.c:302
   kmsan_internal_poison_shadow+0xb1/0x1a0 mm/kmsan/kmsan.c:198
   kmsan_poison_shadow+0x6d/0xc0 mm/kmsan/kmsan.c:211
   slab_alloc_node mm/slub.c:2743
   __kmalloc_node_track_caller+0x200/0x360 mm/slub.c:4351
   __kmalloc_reserve net/core/skbuff.c:138
   __alloc_skb+0x26b/0x840 net/core/skbuff.c:231
   alloc_skb ./include/linux/skbuff.h:933
   sctp_packet_transmit+0x31e/0x3a20 net/sctp/output.c:570
   sctp_outq_flush+0xeb3/0x46e0 net/sctp/outqueue.c:885
   sctp_outq_uncork+0xb2/0xd0 net/sctp/outqueue.c:750
   sctp_side_effects net/sctp/sm_sideeffect.c:1773
   sctp_do_sm+0x6962/0x6ec0 net/sctp/sm_sideeffect.c:1147
   sctp_primitive_ASSOCIATE+0x12c/0x160 net/sctp/primitive.c:88
   sctp_sendmsg+0x43e5/0x4f90 net/sctp/socket.c:1954
   inet_sendmsg+0x498/0x670 net/ipv4/af_inet.c:762
   sock_sendmsg_nosec net/socket.c:633
   sock_sendmsg net/socket.c:643
   SYSC_sendto+0x608/0x710 net/socket.c:1696
   SyS_sendto+0x8a/0xb0 net/socket.c:1664
   do_syscall_64+0xe6/0x130 arch/x86/entry/common.c:285
   return_from_SYSCALL_64+0x0/0x6a arch/x86/entry/entry_64.S:246
  ==

Signed-off-by: Alexander Potapenko 
---
 include/net/sctp/sctp.h | 2 ++

Re: [PATCH] datapath: Fix for force/commit action failures

2017-07-13 Thread Joe Stringer

On 13 July 2017 at 11:01, Greg Rose  wrote:
> On 07/13/2017 10:46 AM, Joe Stringer wrote:
>>
>> On 13 July 2017 at 09:25, Greg Rose  wrote:
>>>
>>> When there is an established connection in direction A->B, it is
>>> possible to receive a packet on port B which then executes
>>> ct(commit,force) without first performing ct() - ie, a lookup.
>>> In this case, we would expect that this packet can delete the existing
>>> entry so that we can commit a connection with direction B->A. However,
>>> currently we only perform a check in skb_nfct_cached() for whether
>>> OVS_CS_F_TRACKED is set and OVS_CS_F_INVALID is not set, ie that a
>>> lookup previously occurred. In the above scenario, a lookup has not
>>> occurred but we should still be able to statelessly look up the
>>> existing entry and potentially delete the entry if it is in the
>>> opposite direction.
>>>
>>> This patch extends the check to also hint that if the action has the
>>> force flag set, then we will lookup the existing entry so that the
>>> force check at the end of skb_nfct_cached has the ability to delete
>>> the connection.
>>>
>>> CC: d...@openvswitch.org
>>> CC: Pravin Shalar 
>>> Signed-off-by: Joe Stringer 
>>> Signed-off-by: Greg Rose 
>>> ---
>>
>>
>> A couple more administrative notes, on netdev the module name in the
>> patch subject for openvswitch is "openvswitch" rather than datapath;
>
>
> Right you are.
>
>> and patches rather than having just "PATCH" as the subject prefix
>> should state the tree. In this case, it's a bugfix so it should be
>> "PATCH net".
>
>
> I knew that... forgot the format patch option to add it.  Net-next
> is closed so that would be mandatory.
>
>  Furthermore, if you're able to figure out which commit
>>
>> introduced the issue (I believe it's introduced by the force commit
>> patch), then you should place the "Fixes: " tag. I can give you some
>> pointers off-list on how to do this (git blame and some basic
>> formatting of the targeted patch should do the trick - this tag
>> expects a 12-digit hash).
>>
>> For reference, I ended up looking it up during review, this is the
>> line you'd add:
>> Fixes: dd41d33f0b03 ("openvswitch: Add force commit.")
>
>
> Oh, thanks!
>
>
>>
>>>   net/openvswitch/conntrack.c | 12 
>>>   1 file changed, 8 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
>>> index 08679eb..9041cf8 100644
>>> --- a/net/openvswitch/conntrack.c
>>> +++ b/net/openvswitch/conntrack.c
>>> @@ -641,17 +641,21 @@ static bool skb_nfct_cached(struct net *net,
>>>  ct = nf_ct_get(skb, );
>>>  /* If no ct, check if we have evidence that an existing
>>> conntrack entry
>>>   * might be found for this skb.  This happens when we lose a
>>> skb->_nfct
>>> -* due to an upcall.  If the connection was not confirmed, it is
>>> not
>>> -* cached and needs to be run through conntrack again.
>>> +* due to an upcall, or if the direction is being forced.  If the
>>> +* connection was not confirmed, it is not cached and needs to be
>>> run
>>> +* through conntrack again.
>>>   */
>>> -   if (!ct && key->ct_state & OVS_CS_F_TRACKED &&
>>> +   if ((!ct && (key->ct_state & OVS_CS_F_TRACKED) &&
>>>  !(key->ct_state & OVS_CS_F_INVALID) &&
>>> -   key->ct_zone == info->zone.id) {
>>> +key->ct_zone == info->zone.id) ||
>>> +(!key->ct_state && info->force)) {
>>>  ct = ovs_ct_find_existing(net, >zone,
>>> info->family, skb,
>>>!!(key->ct_state
>>>   & OVS_CS_F_NAT_MASK));
>>>  if (ct)
>>>  nf_ct_get(skb, );
>>> +   else
>>> +   return false;
>>>  }
>>>  if (!ct)
>>>  return false;
>>
>>
>> I was just wondering if this has the potential to prevent
>> nf_conntrack_in() from being called at all in this case, which is also
>> not quite right. In the original case of (!ct && (key->ct_state &
>> OVS_CS_F_TRACKED) && !(key->ct_state & OVS_CS_F_TRACKED)), which I'll
>> refer to as "ct_executed", we explicitly want to avoid running
>> nf_conntrack_in() if we already ran it, because the connection tracker
>> doesn't expect to see the same packet twice (there's also things like
>> stats/accounting, and potentially L4 state machines that could get
>> messed up by multiple calls). By the time the info->force and
>> direction check happens at the end of the function, "ct_executed" is
>> implied to be true. However, in this new case, ct_executed is actually
>> false - because there was no ct() before the ct(force,commit). In this
>> case, we only want to look up the existing entry to see if it should
>> be deleted; if it should not be

Re: [PATCH net-next v1 2/3] gtp: Support creating flow-based gtp net_device

2017-07-13 Thread Joe Stringer

On 12 July 2017 at 17:44, Jiannan Ouyang  wrote:
> Add the gtp_create_flow_based_dev() interface to create flow-based gtp
> net_device, which sets gtp->collect_md. Under flow-based mode, UDP sockets are
> created and maintained in kernel.
>
> Signed-off-by: Jiannan Ouyang 
> ---



> +static int gtp_configure(struct net *net, struct net_device *dev,
> +const struct ip_tunnel_info *info)
> +{
> +   struct gtp_net *gn = net_generic(net, gtp_net_id);
> +   struct gtp_dev *gtp = netdev_priv(dev);
> +   int err;
> +
> +   gtp->net = net;
> +   gtp->dev = dev;
> +
> +   if (gtp_find_flow_based_dev(net))
> +   return -EBUSY;
> +
> +   dev->netdev_ops = _netdev_ops;
> +   dev->priv_destructor= free_netdev;
> +
> +   dev->hard_header_len = 0;
> +   dev->addr_len = 0;
> +
> +   /* Zero header length. */
> +   dev->type = ARPHRD_NONE;
> +   dev->flags = IFF_POINTOPOINT | IFF_NOARP | IFF_MULTICAST;
> +
> +   dev->priv_flags |= IFF_NO_QUEUE;
> +   dev->features   |= NETIF_F_LLTX;
> +   netif_keep_dst(dev);
> +
> +   /* Assume largest header, ie. GTPv0. */
> +   dev->needed_headroom= LL_MAX_HEADER +
> +   sizeof(struct iphdr) +
> +   sizeof(struct udphdr) +
> +   sizeof(struct gtp0_header);
> +
> +   dst_cache_reset(>info.dst_cache);
> +   gtp->info = *info;
> +   gtp->collect_md = true;
> +
> +   err = gtp_hashtable_new(gtp, GTP_PDP_HASHSIZE); // JO: when to free??
> +   if (err < 0) {
> +   pr_err("Error gtp_hashtable_new");
> +   return err;
> +   }

Firstly, if this succeeds then the subsequent register_netdevice()
call below could still fail, in which case the error handling there
free this hashtable. Don't forget that even if this whole function
succeeds, the caller may still fail to set up the device and in those
error cases everything needs to be freed/cleaned up as well.

Then in terms of the overall lifecycle, I guess it's a matter of 1) If
there is a hashtable allocated then it must be freed on device close;
2) If the device can be reconfigured, and there is already one of
these allocated then it would need to be freed upon reconfiguration.

Cheers,
Joe

Re: [PATCH] datapath: Fix for force/commit action failures

2017-07-13 Thread Greg Rose


On 07/13/2017 10:46 AM, Joe Stringer wrote:

On 13 July 2017 at 09:25, Greg Rose  wrote:

When there is an established connection in direction A->B, it is
possible to receive a packet on port B which then executes
ct(commit,force) without first performing ct() - ie, a lookup.
In this case, we would expect that this packet can delete the existing
entry so that we can commit a connection with direction B->A. However,
currently we only perform a check in skb_nfct_cached() for whether
OVS_CS_F_TRACKED is set and OVS_CS_F_INVALID is not set, ie that a
lookup previously occurred. In the above scenario, a lookup has not
occurred but we should still be able to statelessly look up the
existing entry and potentially delete the entry if it is in the
opposite direction.

This patch extends the check to also hint that if the action has the
force flag set, then we will lookup the existing entry so that the
force check at the end of skb_nfct_cached has the ability to delete
the connection.

CC: d...@openvswitch.org
CC: Pravin Shalar 
Signed-off-by: Joe Stringer 
Signed-off-by: Greg Rose 
---


A couple more administrative notes, on netdev the module name in the
patch subject for openvswitch is "openvswitch" rather than datapath;


Right you are.


and patches rather than having just "PATCH" as the subject prefix
should state the tree. In this case, it's a bugfix so it should be
"PATCH net".


I knew that... forgot the format patch option to add it.  Net-next
is closed so that would be mandatory.

 Furthermore, if you're able to figure out which commit

introduced the issue (I believe it's introduced by the force commit
patch), then you should place the "Fixes: " tag. I can give you some
pointers off-list on how to do this (git blame and some basic
formatting of the targeted patch should do the trick - this tag
expects a 12-digit hash).

For reference, I ended up looking it up during review, this is the
line you'd add:
Fixes: dd41d33f0b03 ("openvswitch: Add force commit.")


Oh, thanks!




  net/openvswitch/conntrack.c | 12 
  1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 08679eb..9041cf8 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -641,17 +641,21 @@ static bool skb_nfct_cached(struct net *net,
 ct = nf_ct_get(skb, );
 /* If no ct, check if we have evidence that an existing conntrack entry
  * might be found for this skb.  This happens when we lose a skb->_nfct
-* due to an upcall.  If the connection was not confirmed, it is not
-* cached and needs to be run through conntrack again.
+* due to an upcall, or if the direction is being forced.  If the
+* connection was not confirmed, it is not cached and needs to be run
+* through conntrack again.
  */
-   if (!ct && key->ct_state & OVS_CS_F_TRACKED &&
+   if ((!ct && (key->ct_state & OVS_CS_F_TRACKED) &&
 !(key->ct_state & OVS_CS_F_INVALID) &&
-   key->ct_zone == info->zone.id) {
+key->ct_zone == info->zone.id) ||
+(!key->ct_state && info->force)) {
 ct = ovs_ct_find_existing(net, >zone, info->family, skb,
   !!(key->ct_state
  & OVS_CS_F_NAT_MASK));
 if (ct)
 nf_ct_get(skb, );
+   else
+   return false;
 }
 if (!ct)
 return false;


I was just wondering if this has the potential to prevent
nf_conntrack_in() from being called at all in this case, which is also
not quite right. In the original case of (!ct && (key->ct_state &
OVS_CS_F_TRACKED) && !(key->ct_state & OVS_CS_F_TRACKED)), which I'll
refer to as "ct_executed", we explicitly want to avoid running
nf_conntrack_in() if we already ran it, because the connection tracker
doesn't expect to see the same packet twice (there's also things like
stats/accounting, and potentially L4 state machines that could get
messed up by multiple calls). By the time the info->force and
direction check happens at the end of the function, "ct_executed" is
implied to be true. However, in this new case, ct_executed is actually
false - because there was no ct() before the ct(force,commit). In this
case, we only want to look up the existing entry to see if it should
be deleted; if it should not be deleted, then we still haven't yet
done the nf_conntrack_in() call so we should return false and the
caller, __ovs_ct_lookup() should make the call to nf_conntrack_in().

What I mean is something like the following incremental on your patch:

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 9041cf8b822f..98783f114824 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -637,6

[RFC net 1/2] net: set skb hash for IP6 TCP reset packet

2017-07-13 Thread Shaohua Li

From: Shaohua Li 

Please see below tcpdump output:
21:00:48.109122 IP6 (flowlabel 0x43304, hlim 64, next-header TCP (6) payload 
length: 40) fec0::5054:ff:fe12:3456.55804 > fec0::5054:ff:fe12:3456.: Flags 
[S], cksum 0x0529 (incorrect -> 0xf56c), seq 3282214508, win 43690, options 
[mss 65476,sackOK,TS val 2500903437 ecr 0,nop,wscale 7], length 0
21:00:48.109381 IP6 (flowlabel 0xd827f, hlim 64, next-header TCP (6) payload 
length: 40) fec0::5054:ff:fe12:3456. > fec0::5054:ff:fe12:3456.55804: Flags 
[S.], cksum 0x0529 (incorrect -> 0x49ad), seq 1923801573, ack 3282214509, win 
43690, options [mss 65476,sackOK,TS val 2500903437 ecr 2500903437,nop,wscale 
7], length 0
21:00:48.109548 IP6 (flowlabel 0x43304, hlim 64, next-header TCP (6) payload 
length: 32) fec0::5054:ff:fe12:3456.55804 > fec0::5054:ff:fe12:3456.: Flags 
[.], cksum 0x0521 (incorrect -> 0x1bdf), seq 1, ack 1, win 342, options 
[nop,nop,TS val 2500903437 ecr 2500903437], length 0
21:00:48.109823 IP6 (flowlabel 0x43304, hlim 64, next-header TCP (6) payload 
length: 62) fec0::5054:ff:fe12:3456.55804 > fec0::5054:ff:fe12:3456.: Flags 
[P.], cksum 0x053f (incorrect -> 0xb8b1), seq 1:31, ack 1, win 342, options 
[nop,nop,TS val 2500903437 ecr 2500903437], length 30
21:00:48.109910 IP6 (flowlabel 0xd827f, hlim 64, next-header TCP (6) payload 
length: 32) fec0::5054:ff:fe12:3456. > fec0::5054:ff:fe12:3456.55804: Flags 
[.], cksum 0x0521 (incorrect -> 0x1bc1), seq 1, ack 31, win 342, options 
[nop,nop,TS val 2500903437 ecr 2500903437], length 0
21:00:48.110043 IP6 (flowlabel 0xd827f, hlim 64, next-header TCP (6) payload 
length: 56) fec0::5054:ff:fe12:3456. > fec0::5054:ff:fe12:3456.55804: Flags 
[P.], cksum 0x0539 (incorrect -> 0xb726), seq 1:25, ack 31, win 342, options 
[nop,nop,TS val 2500903438 ecr 2500903437], length 24
21:00:48.110173 IP6 (flowlabel 0x43304, hlim 64, next-header TCP (6) payload 
length: 32) fec0::5054:ff:fe12:3456.55804 > fec0::5054:ff:fe12:3456.: Flags 
[.], cksum 0x0521 (incorrect -> 0x1ba7), seq 31, ack 25, win 342, options 
[nop,nop,TS val 2500903438 ecr 2500903438], length 0
21:00:48.110211 IP6 (flowlabel 0xd827f, hlim 64, next-header TCP (6) payload 
length: 32) fec0::5054:ff:fe12:3456. > fec0::5054:ff:fe12:3456.55804: Flags 
[F.], cksum 0x0521 (incorrect -> 0x1ba7), seq 25, ack 31, win 342, options 
[nop,nop,TS val 2500903438 ecr 2500903437], length 0
21:00:48.151099 IP6 (flowlabel 0x43304, hlim 64, next-header TCP (6) payload 
length: 32) fec0::5054:ff:fe12:3456.55804 > fec0::5054:ff:fe12:3456.: Flags 
[.], cksum 0x0521 (incorrect -> 0x1ba6), seq 31, ack 26, win 342, options 
[nop,nop,TS val 2500903438 ecr 2500903438], length 0
21:00:49.110524 IP6 (flowlabel 0x43304, hlim 64, next-header TCP (6) payload 
length: 56) fec0::5054:ff:fe12:3456.55804 > fec0::5054:ff:fe12:3456.: Flags 
[P.], cksum 0x0539 (incorrect -> 0xb324), seq 31:55, ack 26, win 342, options 
[nop,nop,TS val 2500904438 ecr 2500903438], length 24
21:00:49.110637 IP6 (flowlabel 0xb34d5, hlim 64, next-header TCP (6) payload 
length: 20) fec0::5054:ff:fe12:3456. > fec0::5054:ff:fe12:3456.55804: Flags 
[R], cksum 0x0515 (incorrect -> 0x668c), seq 1923801599, win 0, length 0

The tcp reset packet has a different flowlabel, which causes our router
doesn't correctly close tcp connection.  The reason is the normal packet
gets the skb->hash from sk->sk_txhash, which is generated randomly.
ip6_make_flowlabel then uses the hash to create a flowlabel. The reset
packet doesn't get assigned a hash, so the flowlabel is calculated with
flowi6.

The solution is to save the hash value for timeout sock and use it for
reset packet.

Signed-off-by: Shaohua Li 
---
 include/net/inet_timewait_sock.h | 10 ++
 net/ipv4/inet_timewait_sock.c|  1 +
 net/ipv6/tcp_ipv6.c  |  7 +++
 3 files changed, 18 insertions(+)

diff --git a/include/net/inet_timewait_sock.h b/include/net/inet_timewait_sock.h
index 6a75d67..b568224 100644
--- a/include/net/inet_timewait_sock.h
+++ b/include/net/inet_timewait_sock.h
@@ -77,6 +77,7 @@ struct inet_timewait_sock {
tw_pad  : 2,/* 2 bits hole */
tw_tos  : 8;
kmemcheck_bitfield_end(flags);
+   __u32   tw_txhash;
struct timer_list   tw_timer;
struct inet_bind_bucket *tw_tb;
 };
@@ -87,6 +88,15 @@ static inline struct inet_timewait_sock *inet_twsk(const 
struct sock *sk)
return (struct inet_timewait_sock *)sk;
 }
 
+static inline void skb_set_hash_from_twsk(struct sk_buff *skb,
+   struct inet_timewait_sock *twsk)
+{
+   if (twsk->tw_txhash) {
+   skb->l4_hash = 1;
+   skb->hash = twsk->tw_txhash;
+   }
+}
+
 void inet_twsk_free(struct inet_timewait_sock *tw);
 void inet_twsk_put(struct inet_timewait_sock *tw);
 
diff --git a/net/ipv4/inet_timewait_sock.c b/net/ipv4/inet_timewait_sock.c

[RFC net 0/2] fix ipv6 tcp reset packet flowlabel issues

2017-07-13 Thread Shaohua Li

From: Shaohua Li 

Please see below tcpdump output:
21:00:48.109122 IP6 (flowlabel 0x43304, hlim 64, next-header TCP (6) payload 
length: 40) fec0::5054:ff:fe12:3456.55804 > fec0::5054:ff:fe12:3456.: Flags 
[S], cksum 0x0529 (incorrect -> 0xf56c), seq 3282214508, win 43690, options 
[mss 65476,sackOK,TS val 2500903437 ecr 0,nop,wscale 7], length 0
21:00:48.109381 IP6 (flowlabel 0xd827f, hlim 64, next-header TCP (6) payload 
length: 40) fec0::5054:ff:fe12:3456. > fec0::5054:ff:fe12:3456.55804: Flags 
[S.], cksum 0x0529 (incorrect -> 0x49ad), seq 1923801573, ack 3282214509, win 
43690, options [mss 65476,sackOK,TS val 2500903437 ecr 2500903437,nop,wscale 
7], length 0
21:00:48.109548 IP6 (flowlabel 0x43304, hlim 64, next-header TCP (6) payload 
length: 32) fec0::5054:ff:fe12:3456.55804 > fec0::5054:ff:fe12:3456.: Flags 
[.], cksum 0x0521 (incorrect -> 0x1bdf), seq 1, ack 1, win 342, options 
[nop,nop,TS val 2500903437 ecr 2500903437], length 0
21:00:48.109823 IP6 (flowlabel 0x43304, hlim 64, next-header TCP (6) payload 
length: 62) fec0::5054:ff:fe12:3456.55804 > fec0::5054:ff:fe12:3456.: Flags 
[P.], cksum 0x053f (incorrect -> 0xb8b1), seq 1:31, ack 1, win 342, options 
[nop,nop,TS val 2500903437 ecr 2500903437], length 30
21:00:48.109910 IP6 (flowlabel 0xd827f, hlim 64, next-header TCP (6) payload 
length: 32) fec0::5054:ff:fe12:3456. > fec0::5054:ff:fe12:3456.55804: Flags 
[.], cksum 0x0521 (incorrect -> 0x1bc1), seq 1, ack 31, win 342, options 
[nop,nop,TS val 2500903437 ecr 2500903437], length 0
21:00:48.110043 IP6 (flowlabel 0xd827f, hlim 64, next-header TCP (6) payload 
length: 56) fec0::5054:ff:fe12:3456. > fec0::5054:ff:fe12:3456.55804: Flags 
[P.], cksum 0x0539 (incorrect -> 0xb726), seq 1:25, ack 31, win 342, options 
[nop,nop,TS val 2500903438 ecr 2500903437], length 24
21:00:48.110173 IP6 (flowlabel 0x43304, hlim 64, next-header TCP (6) payload 
length: 32) fec0::5054:ff:fe12:3456.55804 > fec0::5054:ff:fe12:3456.: Flags 
[.], cksum 0x0521 (incorrect -> 0x1ba7), seq 31, ack 25, win 342, options 
[nop,nop,TS val 2500903438 ecr 2500903438], length 0
21:00:48.110211 IP6 (flowlabel 0xd827f, hlim 64, next-header TCP (6) payload 
length: 32) fec0::5054:ff:fe12:3456. > fec0::5054:ff:fe12:3456.55804: Flags 
[F.], cksum 0x0521 (incorrect -> 0x1ba7), seq 25, ack 31, win 342, options 
[nop,nop,TS val 2500903438 ecr 2500903437], length 0
21:00:48.151099 IP6 (flowlabel 0x43304, hlim 64, next-header TCP (6) payload 
length: 32) fec0::5054:ff:fe12:3456.55804 > fec0::5054:ff:fe12:3456.: Flags 
[.], cksum 0x0521 (incorrect -> 0x1ba6), seq 31, ack 26, win 342, options 
[nop,nop,TS val 2500903438 ecr 2500903438], length 0
21:00:49.110524 IP6 (flowlabel 0x43304, hlim 64, next-header TCP (6) payload 
length: 56) fec0::5054:ff:fe12:3456.55804 > fec0::5054:ff:fe12:3456.: Flags 
[P.], cksum 0x0539 (incorrect -> 0xb324), seq 31:55, ack 26, win 342, options 
[nop,nop,TS val 2500904438 ecr 2500903438], length 24
21:00:49.110637 IP6 (flowlabel 0xb34d5, hlim 64, next-header TCP (6) payload 
length: 20) fec0::5054:ff:fe12:3456. > fec0::5054:ff:fe12:3456.55804: Flags 
[R], cksum 0x0515 (incorrect -> 0x668c), seq 1923801599, win 0, length 0

The flowlabel of reset packet (0xb34d5) and that of normal packet (0xd827f) are
different. This causes our router doesn't correctly close tcp connection. The
patches try to fix the issue.

Thanks,
Shaohua

Shaohua Li (2):
  net: set skb hash for IP6 TCP reset packet
  net: tcp_v6_send_reset should set flowlabel

 include/net/inet_timewait_sock.h | 10 ++
 net/ipv4/inet_timewait_sock.c|  1 +
 net/ipv6/tcp_ipv6.c  | 25 -
 3 files changed, 35 insertions(+), 1 deletion(-)

-- 
2.9.3

[RFC net 2/2] net: tcp_v6_send_reset should set flowlabel

2017-07-13 Thread Shaohua Li

From: Shaohua Li 

Currently tcp_v6_send_reset ignores user defined flowlabel, so the reset
packet doesn't include the flowlabel info.

Signed-off-by: Shaohua Li 
---
 net/ipv6/tcp_ipv6.c | 18 +-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 2e656d2..eaf4b8e 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -898,6 +898,8 @@ static void tcp_v6_send_reset(const struct sock *sk, struct 
sk_buff *skb)
struct sock *sk1 = NULL;
 #endif
int oif;
+   u8 tclass = 0;
+   __be32 flowlabel = 0;
 
if (th->rst)
return;
@@ -946,7 +948,21 @@ static void tcp_v6_send_reset(const struct sock *sk, 
struct sk_buff *skb)
  (th->doff << 2);
 
oif = sk ? sk->sk_bound_dev_if : 0;
-   tcp_v6_send_response(sk, skb, seq, ack_seq, 0, 0, 0, oif, key, 1, 0, 0);
+   if (sk) {
+   if (sk_fullsock(sk)) {
+   struct ipv6_pinfo *np = inet6_sk(sk);
+
+   tclass = np->tclass;
+   flowlabel = np->flow_label & IPV6_FLOWLABEL_MASK;
+   } else {
+   struct inet_timewait_sock *tw = inet_twsk(sk);
+
+   tclass = tw->tw_tclass;
+   flowlabel = cpu_to_be32(tw->tw_flowlabel);
+   }
+   }
+   tcp_v6_send_response(sk, skb, seq, ack_seq, 0, 0, 0, oif, key, 1,
+   tclass, flowlabel);
 
 #ifdef CONFIG_TCP_MD5SIG
 out:
-- 
2.9.3

Re: [PATCH][net-next] svcrdma: fix an incorrect check on -E2BIG and -EINVAL

2017-07-13 Thread Chuck Lever


> On Jul 13, 2017, at 1:51 PM, Colin King  wrote:
> 
> From: Colin Ian King 
> 
> The current check will always be true and will always jump to
> err1, this looks dubious to me. I believe && should be used
> instead of ||.
> 
> Detected by CoverityScan, CID#1450120 ("Logically Dead Code")
> 
> Fixes: 107c1d0a991a ("svcrdma: Avoid Send Queue overflow")
> Signed-off-by: Colin Ian King 

Reviewed-by: Chuck Lever 

Dan reported this today, and I have a similar patch in my
"pending for-rc" series. This one works too.


> ---
> net/sunrpc/xprtrdma/svc_rdma_sendto.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c 
> b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
> index 19fd01e4b690..7c3a211e0e9a 100644
> --- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c
> +++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
> @@ -684,7 +684,7 @@ int svc_rdma_sendto(struct svc_rqst *rqstp)
>   return 0;
> 
>  err2:
> - if (ret != -E2BIG || ret != -EINVAL)
> + if (ret != -E2BIG && ret != -EINVAL)
>   goto err1;
> 
>   ret = svc_rdma_post_recv(rdma, GFP_KERNEL);
> -- 
> 2.11.0
> 

--
Chuck Lever

[PATCH][net-next] svcrdma: fix an incorrect check on -E2BIG and -EINVAL

2017-07-13 Thread Colin King

From: Colin Ian King 

The current check will always be true and will always jump to
err1, this looks dubious to me. I believe && should be used
instead of ||.

Detected by CoverityScan, CID#1450120 ("Logically Dead Code")

Fixes: 107c1d0a991a ("svcrdma: Avoid Send Queue overflow")
Signed-off-by: Colin Ian King 
---
 net/sunrpc/xprtrdma/svc_rdma_sendto.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c 
b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
index 19fd01e4b690..7c3a211e0e9a 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
@@ -684,7 +684,7 @@ int svc_rdma_sendto(struct svc_rqst *rqstp)
return 0;
 
  err2:
-   if (ret != -E2BIG || ret != -EINVAL)
+   if (ret != -E2BIG && ret != -EINVAL)
goto err1;
 
ret = svc_rdma_post_recv(rdma, GFP_KERNEL);
-- 
2.11.0

Re: [PATCH] datapath: Fix for force/commit action failures

2017-07-13 Thread Joe Stringer

On 13 July 2017 at 09:25, Greg Rose  wrote:
> When there is an established connection in direction A->B, it is
> possible to receive a packet on port B which then executes
> ct(commit,force) without first performing ct() - ie, a lookup.
> In this case, we would expect that this packet can delete the existing
> entry so that we can commit a connection with direction B->A. However,
> currently we only perform a check in skb_nfct_cached() for whether
> OVS_CS_F_TRACKED is set and OVS_CS_F_INVALID is not set, ie that a
> lookup previously occurred. In the above scenario, a lookup has not
> occurred but we should still be able to statelessly look up the
> existing entry and potentially delete the entry if it is in the
> opposite direction.
>
> This patch extends the check to also hint that if the action has the
> force flag set, then we will lookup the existing entry so that the
> force check at the end of skb_nfct_cached has the ability to delete
> the connection.
>
> CC: d...@openvswitch.org
> CC: Pravin Shalar 
> Signed-off-by: Joe Stringer 
> Signed-off-by: Greg Rose 
> ---

A couple more administrative notes, on netdev the module name in the
patch subject for openvswitch is "openvswitch" rather than datapath;
and patches rather than having just "PATCH" as the subject prefix
should state the tree. In this case, it's a bugfix so it should be
"PATCH net". Furthermore, if you're able to figure out which commit
introduced the issue (I believe it's introduced by the force commit
patch), then you should place the "Fixes: " tag. I can give you some
pointers off-list on how to do this (git blame and some basic
formatting of the targeted patch should do the trick - this tag
expects a 12-digit hash).

For reference, I ended up looking it up during review, this is the
line you'd add:
Fixes: dd41d33f0b03 ("openvswitch: Add force commit.")

>  net/openvswitch/conntrack.c | 12 
>  1 file changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
> index 08679eb..9041cf8 100644
> --- a/net/openvswitch/conntrack.c
> +++ b/net/openvswitch/conntrack.c
> @@ -641,17 +641,21 @@ static bool skb_nfct_cached(struct net *net,
> ct = nf_ct_get(skb, );
> /* If no ct, check if we have evidence that an existing conntrack 
> entry
>  * might be found for this skb.  This happens when we lose a 
> skb->_nfct
> -* due to an upcall.  If the connection was not confirmed, it is not
> -* cached and needs to be run through conntrack again.
> +* due to an upcall, or if the direction is being forced.  If the
> +* connection was not confirmed, it is not cached and needs to be run
> +* through conntrack again.
>  */
> -   if (!ct && key->ct_state & OVS_CS_F_TRACKED &&
> +   if ((!ct && (key->ct_state & OVS_CS_F_TRACKED) &&
> !(key->ct_state & OVS_CS_F_INVALID) &&
> -   key->ct_zone == info->zone.id) {
> +key->ct_zone == info->zone.id) ||
> +(!key->ct_state && info->force)) {
> ct = ovs_ct_find_existing(net, >zone, info->family, skb,
>   !!(key->ct_state
>  & OVS_CS_F_NAT_MASK));
> if (ct)
> nf_ct_get(skb, );
> +   else
> +   return false;
> }
> if (!ct)
> return false;

I was just wondering if this has the potential to prevent
nf_conntrack_in() from being called at all in this case, which is also
not quite right. In the original case of (!ct && (key->ct_state &
OVS_CS_F_TRACKED) && !(key->ct_state & OVS_CS_F_TRACKED)), which I'll
refer to as "ct_executed", we explicitly want to avoid running
nf_conntrack_in() if we already ran it, because the connection tracker
doesn't expect to see the same packet twice (there's also things like
stats/accounting, and potentially L4 state machines that could get
messed up by multiple calls). By the time the info->force and
direction check happens at the end of the function, "ct_executed" is
implied to be true. However, in this new case, ct_executed is actually
false - because there was no ct() before the ct(force,commit). In this
case, we only want to look up the existing entry to see if it should
be deleted; if it should not be deleted, then we still haven't yet
done the nf_conntrack_in() call so we should return false and the
caller, __ovs_ct_lookup() should make the call to nf_conntrack_in().

What I mean is something like the following incremental on your patch:

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 9041cf8b822f..98783f114824 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -637,6 +637,7 @@ static bool skb_nfct_cached(struct net *net,
{
   enum ip_conntrack_info ctinfo;

[PATCH] smsc95xx: use ethtool_op_get_ts_info()

2017-07-13 Thread Petr Kulhavy

This change enables the use of SW timestamping on Raspberry PI.

smsc95xx uses the usbnet transmit function usbnet_start_xmit(), which
implements software timestamping. However the SOF_TIMESTAMPING_TX_SOFTWARE
capability was missing and only SOF_TIMESTAMPING_RX_SOFTWARE was announced.
By using ethtool_op_get_ts_info() as get_ts_info() also the
SOF_TIMESTAMPING_TX_SOFTWARE is announced.

Signed-off-by: Petr Kulhavy 
---
 drivers/net/usb/smsc95xx.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/usb/smsc95xx.c b/drivers/net/usb/smsc95xx.c
index 2dfca96..340c134 100644
--- a/drivers/net/usb/smsc95xx.c
+++ b/drivers/net/usb/smsc95xx.c
@@ -898,6 +898,7 @@ static const struct ethtool_ops smsc95xx_ethtool_ops = {
.set_wol= smsc95xx_ethtool_set_wol,
.get_link_ksettings = smsc95xx_get_link_ksettings,
.set_link_ksettings = smsc95xx_set_link_ksettings,
+   .get_ts_info= ethtool_op_get_ts_info,
 };
 
 static int smsc95xx_ioctl(struct net_device *netdev, struct ifreq *rq, int cmd)
-- 
2.7.4

Re: [ovs-dev] [PATCH] datapath: Fix for force/commit action failures

2017-07-13 Thread Greg Rose


On 07/13/2017 10:08 AM, Darrell Ball wrote:



On 7/13/17, 9:25 AM, "ovs-dev-boun...@openvswitch.org on behalf of Greg Rose" 
 wrote:

 When there is an established connection in direction A->B, it is
 possible to receive a packet on port B which then executes
 ct(commit,force) without first performing ct() - ie, a lookup.
 In this case, we would expect that this packet can delete the existing
 entry so that we can commit a connection with direction B->A. However,
 currently we only perform a check in skb_nfct_cached() for whether
 OVS_CS_F_TRACKED is set and OVS_CS_F_INVALID is not set, ie that a
 lookup previously occurred. In the above scenario, a lookup has not
 occurred but we should still be able to statelessly look up the
 existing entry and potentially delete the entry if it is in the
 opposite direction.
 
 This patch extends the check to also hint that if the action has the

 force flag set, then we will lookup the existing entry so that the
 force check at the end of skb_nfct_cached has the ability to delete
 the connection.
 
 CC: d...@openvswitch.org

 CC: Pravin Shalar 
 Signed-off-by: Joe Stringer 
 Signed-off-by: Greg Rose 
 ---
  net/openvswitch/conntrack.c | 12 
  1 file changed, 8 insertions(+), 4 deletions(-)
 
 diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c

 index 08679eb..9041cf8 100644
 --- a/net/openvswitch/conntrack.c
 +++ b/net/openvswitch/conntrack.c
 @@ -641,17 +641,21 @@ static bool skb_nfct_cached(struct net *net,
ct = nf_ct_get(skb, );
/* If no ct, check if we have evidence that an existing conntrack entry
 * might be found for this skb.  This happens when we lose a skb->_nfct
 -   * due to an upcall.  If the connection was not confirmed, it is not
 -   * cached and needs to be run through conntrack again.
 +   * due to an upcall, or if the direction is being forced.  If the
 +   * connection was not confirmed, it is not cached and needs to be run
 +   * through conntrack again.
 */
 -  if (!ct && key->ct_state & OVS_CS_F_TRACKED &&
 +  if ((!ct && (key->ct_state & OVS_CS_F_TRACKED) &&
!(key->ct_state & OVS_CS_F_INVALID) &&
 -  key->ct_zone == info->zone.id) {
 +   key->ct_zone == info->zone.id) ||
 +   (!key->ct_state && info->force)) {
ct = ovs_ct_find_existing(net, >zone, info->family, skb,
  !!(key->ct_state
 & OVS_CS_F_NAT_MASK));
if (ct)
nf_ct_get(skb, );
 +  else
 +  return false;

the above else case looks redundant since it maps to the following check
if (!ct)
return false;
which services a superset of the code flow.


Sure, we can let it fall through... missed that.

After waiting for more review I'll send a V2 patch.

Thanks for the review Darrell

- Greg



}
if (!ct)
return false;
 --
 1.8.3.1
 
 ___

 dev mailing list
 d...@openvswitch.org
 
https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.openvswitch.org_mailman_listinfo_ovs-2Ddev=DwICAg=uilaK90D4TOVoH58JNXRgQ=BVhFA09CGX7JQ5Ih-uZnsw=5da6ykiSQJBoJXmpq6iVknmPAc5JDzlVngmp8j_xcXA=PsX-njQ_hFqy8P77KNEyX0i7u165p2Wrbg4uAYqfbGs=

[PATCH net 1/1] net sched actions: rename act_get_notify() to tcf_get_notify()

2017-07-13 Thread Roman Mashak

Make name consistent with other TC event notification routines, such as
tcf_add_notify() and tcf_del_notify()

Signed-off-by: Roman Mashak 
---
 net/sched/act_api.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index aed6cf2..f2e9ed3 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -835,7 +835,7 @@ static int tca_get_fill(struct sk_buff *skb, struct 
list_head *actions,
 }
 
 static int
-act_get_notify(struct net *net, u32 portid, struct nlmsghdr *n,
+tcf_get_notify(struct net *net, u32 portid, struct nlmsghdr *n,
   struct list_head *actions, int event)
 {
struct sk_buff *skb;
@@ -1018,7 +1018,7 @@ static int tca_action_flush(struct net *net, struct 
nlattr *nla,
}
 
if (event == RTM_GETACTION)
-   ret = act_get_notify(net, portid, n, , event);
+   ret = tcf_get_notify(net, portid, n, , event);
else { /* delete */
ret = tcf_del_notify(net, n, , portid);
if (ret)
-- 
1.9.1

Re: [ovs-dev] [PATCH] datapath: Fix for force/commit action failures

2017-07-13 Thread Darrell Ball



On 7/13/17, 9:25 AM, "ovs-dev-boun...@openvswitch.org on behalf of Greg Rose" 
 wrote:

When there is an established connection in direction A->B, it is
possible to receive a packet on port B which then executes
ct(commit,force) without first performing ct() - ie, a lookup.
In this case, we would expect that this packet can delete the existing
entry so that we can commit a connection with direction B->A. However,
currently we only perform a check in skb_nfct_cached() for whether
OVS_CS_F_TRACKED is set and OVS_CS_F_INVALID is not set, ie that a
lookup previously occurred. In the above scenario, a lookup has not
occurred but we should still be able to statelessly look up the
existing entry and potentially delete the entry if it is in the
opposite direction.

This patch extends the check to also hint that if the action has the
force flag set, then we will lookup the existing entry so that the
force check at the end of skb_nfct_cached has the ability to delete
the connection.

CC: d...@openvswitch.org
CC: Pravin Shalar 
Signed-off-by: Joe Stringer 
Signed-off-by: Greg Rose 
---
 net/openvswitch/conntrack.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 08679eb..9041cf8 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -641,17 +641,21 @@ static bool skb_nfct_cached(struct net *net,
ct = nf_ct_get(skb, );
/* If no ct, check if we have evidence that an existing conntrack entry
 * might be found for this skb.  This happens when we lose a skb->_nfct
-* due to an upcall.  If the connection was not confirmed, it is not
-* cached and needs to be run through conntrack again.
+* due to an upcall, or if the direction is being forced.  If the
+* connection was not confirmed, it is not cached and needs to be run
+* through conntrack again.
 */
-   if (!ct && key->ct_state & OVS_CS_F_TRACKED &&
+   if ((!ct && (key->ct_state & OVS_CS_F_TRACKED) &&
!(key->ct_state & OVS_CS_F_INVALID) &&
-   key->ct_zone == info->zone.id) {
+key->ct_zone == info->zone.id) ||
+(!key->ct_state && info->force)) {
ct = ovs_ct_find_existing(net, >zone, info->family, skb,
  !!(key->ct_state
 & OVS_CS_F_NAT_MASK));
if (ct)
nf_ct_get(skb, );
+   else
+   return false;

the above else case looks redundant since it maps to the following check
if (!ct)
return false;
which services a superset of the code flow.

}
if (!ct)
return false;
-- 
1.8.3.1

___
dev mailing list
d...@openvswitch.org

https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.openvswitch.org_mailman_listinfo_ovs-2Ddev=DwICAg=uilaK90D4TOVoH58JNXRgQ=BVhFA09CGX7JQ5Ih-uZnsw=5da6ykiSQJBoJXmpq6iVknmPAc5JDzlVngmp8j_xcXA=PsX-njQ_hFqy8P77KNEyX0i7u165p2Wrbg4uAYqfbGs=

Re: [RFC PATCH 00/12] Implement XDP bpf_redirect vairants

2017-07-13 Thread John Fastabend

On 07/13/2017 09:16 AM, Jesper Dangaard Brouer wrote:
> On Thu, 13 Jul 2017 13:14:30 +0200
> Jesper Dangaard Brouer  wrote:
> 
>> I'm still getting crashes (but much harder to provoke), but I figured
>> out why.  We sort of missed one case, where map_to_flush gets set, when
>> the ndo_xdp_xmit() call starts to fail, and the ixgbe driver then
>> forgets to call xdp_do_flush_map, if all packets in that NAPI cycle
>> failed.  We could blame the driver, but yhe clean solution is making
>> sure, that we don't set map_to_flush when the __bpf_tx_xdp() call
>> fails. It should also handle the other case I fixed  I'll cleanup
>> my PoC-fix patch, test it and provide it here.
> 
> I changed flow in the function to be:


Great, I'll merge this, the other couple fixes, and the bitops optimization and
hopefully then we are set. I'll post a v2 and we can do some final checks.

Thanks!
John

> 
> int xdp_do_redirect_map(struct net_device *dev, struct xdp_buff *xdp,
>   struct bpf_prog *xdp_prog)
> {
>   struct redirect_info *ri = this_cpu_ptr(_info);
>   struct bpf_map *map = ri->map;
>   u32 index = ri->ifindex;
>   struct net_device *fwd;
>   int err = -EINVAL;
> 
>   ri->ifindex = 0;
>   ri->map = NULL;
> 
>   fwd = __dev_map_lookup_elem(map, index);
>   if (!fwd)
>   goto out;
> 
>   if (ri->map_to_flush && (ri->map_to_flush != map))
>   xdp_do_flush_map();
> 
>   err = __bpf_tx_xdp(fwd, map, xdp, index);
>   if (likely(!err))
>   ri->map_to_flush = map;
> 
> out:
>   trace_xdp_redirect(dev, fwd, xdp_prog, XDP_REDIRECT);
>   return err;
> }
> 
> 
> The diff is:
> 
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 4ca895d6ed51..c50a7ec2cdab 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -2483,26 +2483,25 @@ int xdp_do_redirect_map(struct net_device *dev, 
> struct xdp_buff *xdp,
> struct bpf_map *map = ri->map;
> u32 index = ri->ifindex;
> struct net_device *fwd;
> +   int err = -EINVAL;
> +
> +   ri->ifindex = 0;
> +   ri->map = NULL;
>  
> fwd = __dev_map_lookup_elem(map, index);
> if (!fwd)
> goto out;
>  
> -   ri->ifindex = 0;
> -
> if (ri->map_to_flush && (ri->map_to_flush != map))
> xdp_do_flush_map();
>  
> -   ri->map_to_flush = map;
> -   ri->map = NULL;
> +   err = __bpf_tx_xdp(fwd, map, xdp, index);
> +   if (likely(!err))
> +   ri->map_to_flush = map;
>  
> -   trace_xdp_redirect(dev, fwd, xdp_prog, XDP_REDIRECT);
> -
> -   return __bpf_tx_xdp(fwd, map, xdp, index);
>  out:
> -   ri->ifindex = 0;
> -   ri->map = NULL;
> -   return -EINVAL;
> +   trace_xdp_redirect(dev, fwd, xdp_prog, XDP_REDIRECT);
> +   return err;
>  }
>  
>  int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp,
>

[PATCH net] net/packet: Fix Tx queue selection for AF_PACKET

2017-07-13 Thread Iván Briano

When PACKET_QDISC_BYPASS is not used, Tx queue selection will be done
before the packet is enqueued, taking into account any mappings set by
a queuing discipline such as mqprio without hardware offloading. This
selection may be affected by a previously saved queue_mapping, either on
the Rx path, or done before the packet reaches the device, as it's
currently the case for AF_PACKET.

In order for queue selection to work as expected when using traffic
control, there can't be another selection done before that point is
reached, so move the call to packet_pick_tx_queue to
packet_direct_xmit, leaving the default xmit path as it was before
PACKET_QDISC_BYPASS was introduced.

A forward declaration of packet_pick_tx_queue() is introduced to avoid
the need to reorder the functions within the file.

Fixes: d346a3fae3ff ("packet: introduce PACKET_QDISC_BYPASS socket option")
Signed-off-by: Iván Briano 
---
 net/packet/af_packet.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index e3beb28..008bb34 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -214,6 +214,7 @@ static void prb_clear_rxhash(struct tpacket_kbdq_core *,
 static void prb_fill_vlan_info(struct tpacket_kbdq_core *,
struct tpacket3_hdr *);
 static void packet_flush_mclist(struct sock *sk);
+static void packet_pick_tx_queue(struct net_device *dev, struct sk_buff *skb);
 
 struct packet_skb_cb {
union {
@@ -260,6 +261,7 @@ static int packet_direct_xmit(struct sk_buff *skb)
if (skb != orig_skb)
goto drop;
 
+   packet_pick_tx_queue(dev, skb);
txq = skb_get_tx_queue(dev, skb);
 
local_bh_disable();
@@ -2747,8 +2749,6 @@ static int tpacket_snd(struct packet_sock *po, struct 
msghdr *msg)
goto tpacket_error;
}
 
-   packet_pick_tx_queue(dev, skb);
-
skb->destructor = tpacket_destruct_skb;
__packet_set_status(po, ph, TP_STATUS_SENDING);
packet_inc_pending(>tx_ring);
@@ -2931,8 +2931,6 @@ static int packet_snd(struct socket *sock, struct msghdr 
*msg, size_t len)
skb->priority = sk->sk_priority;
skb->mark = sockc.mark;
 
-   packet_pick_tx_queue(dev, skb);
-
if (po->has_vnet_hdr) {
err = virtio_net_hdr_to_skb(skb, _hdr, vio_le());
if (err)
-- 
2.9.4

Re: [PATCH] net: broadcom: bnx2x: make a couple of const arrays static

2017-07-13 Thread David Miller

From: Colin King 
Date: Tue, 11 Jul 2017 11:52:23 +0100

> From: Colin Ian King 
> 
> Don't populate various tables on the stack but make them static const.
> Makes the object code smaller by nearly 200 bytes:
> 
> Before:
>text  data bss dec hex filename
>  113468 11200   0  124668   1e6fc bnx2x_ethtool.o
> 
> After:
>text  data bss dec hex filename
>  113129 11344   0  124473   1e639 bnx2x_ethtool.o
> 
> Signed-off-by: Colin Ian King 

Applied.

Re: [PATCH] dccp: make const array error_code static

2017-07-13 Thread David Miller

From: Colin King 
Date: Thu, 13 Jul 2017 12:22:24 +0100

> From: Colin Ian King 
> 
> Don't populate array error_code on the stack but make it static. Makes
> the object code smaller by almost 250 bytes:
> 
> Before:
>text  data bss dec hex filename
>   10366   983   0   113492c55 net/dccp/input.o
> 
> After:
>text  data bss dec hex filename
>   10161  1039   0   112002bc0 net/dccp/input.o
> 
> Signed-off-by: Colin Ian King 

Applied.

Re: [PATCH] net: stmmac: make const array route_possibilities static

2017-07-13 Thread David Miller

From: Colin King 
Date: Tue, 11 Jul 2017 12:18:48 +0100

> From: Colin Ian King 
> 
> Don't populate array route_possibilities on the stack but make it
> static const.  Makes the object code a little smaller by 85 bytes:
> 
> Before:
>text  data bss dec hex filename
>9901  2448   0   12349303d dwmac4_core.o
> 
> After:
>text  data bss dec hex filename
>9760  2504   0   122642fe8 dwmac4_core.o
> 
> Signed-off-by: Colin Ian King 

Applied.

Re: [PATCH] netlink: correctly document nla_put_u64_64bit()

2017-07-13 Thread David Miller

From: Rolf Eike Beer 
Date: Thu, 13 Jul 2017 16:50:24 +0200

> From 90bda8d1bc2a0c5d283e4e0a4b19812a4cce72bd Mon Sep 17 00:00:00 2001
> From: Rolf Eike Beer 
> Date: Thu, 13 Jul 2017 16:46:43 +0200
> Subject: [PATCH] netlink: correctly document nla_put_u64_64bit()
> 
> Signed-off-by: Rolf Eike Beer 

Applied.

[PATCH] datapath: Fix for force/commit action failures

2017-07-13 Thread Greg Rose

When there is an established connection in direction A->B, it is
possible to receive a packet on port B which then executes
ct(commit,force) without first performing ct() - ie, a lookup.
In this case, we would expect that this packet can delete the existing
entry so that we can commit a connection with direction B->A. However,
currently we only perform a check in skb_nfct_cached() for whether
OVS_CS_F_TRACKED is set and OVS_CS_F_INVALID is not set, ie that a
lookup previously occurred. In the above scenario, a lookup has not
occurred but we should still be able to statelessly look up the
existing entry and potentially delete the entry if it is in the
opposite direction.

This patch extends the check to also hint that if the action has the
force flag set, then we will lookup the existing entry so that the
force check at the end of skb_nfct_cached has the ability to delete
the connection.

CC: d...@openvswitch.org
CC: Pravin Shalar 
Signed-off-by: Joe Stringer 
Signed-off-by: Greg Rose 
---
 net/openvswitch/conntrack.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 08679eb..9041cf8 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -641,17 +641,21 @@ static bool skb_nfct_cached(struct net *net,
ct = nf_ct_get(skb, );
/* If no ct, check if we have evidence that an existing conntrack entry
 * might be found for this skb.  This happens when we lose a skb->_nfct
-* due to an upcall.  If the connection was not confirmed, it is not
-* cached and needs to be run through conntrack again.
+* due to an upcall, or if the direction is being forced.  If the
+* connection was not confirmed, it is not cached and needs to be run
+* through conntrack again.
 */
-   if (!ct && key->ct_state & OVS_CS_F_TRACKED &&
+   if ((!ct && (key->ct_state & OVS_CS_F_TRACKED) &&
!(key->ct_state & OVS_CS_F_INVALID) &&
-   key->ct_zone == info->zone.id) {
+key->ct_zone == info->zone.id) ||
+(!key->ct_state && info->force)) {
ct = ovs_ct_find_existing(net, >zone, info->family, skb,
  !!(key->ct_state
 & OVS_CS_F_NAT_MASK));
if (ct)
nf_ct_get(skb, );
+   else
+   return false;
}
if (!ct)
return false;
-- 
1.8.3.1

Re: [PATCH net] cxgb4: add new T5 pci device id's

2017-07-13 Thread David Miller

From: Ganesh Goudar 
Date: Thu, 13 Jul 2017 18:45:07 +0530

> Add 0x50a3 and 0x50a4 T5 device id's
> 
> Signed-off-by: Ganesh Goudar 

Applied.

Re: [PATCH net] xgene: Don't fail probe, if there is no clk resource for SGMII interfaces

2017-07-13 Thread David Miller

From: Thomas Bogendoerfer 
Date: Thu, 13 Jul 2017 10:57:40 +0200

> From: Thomas Bogendoerfer 
> 
> This change fixes following problem
> 
> [1.827940] xgene-enet: probe of 1f210030.ethernet failed with error -2
> 
> which leads to a missing ethernet interface (reproducable at least on
> Gigabyte MP30-AR0 and APM Mustang systems).
> 
> The check for a valid clk resource fails, because DT doesn't provide a
> clock for sgenet1. But the driver doesn't use this clk, if the ethernet
> port is connected via SGMII. Therefore this patch avoids probing for clk
> on SGMII interfaces.

Applied, thanks.

> Fixes: 9aea7779b764 drivers: net: xgene: Fix crash on DT systems

Please put the commit header text inside of parenthesis and double quotes,
like this:

Fixes: 9aea7779b764 ("drivers: net: xgene: Fix crash on DT systems")

I fixed it up for you this time.

Re: [PATCH] bpf: fix return in bpf_skb_adjust_net

2017-07-13 Thread David Miller

From: Kefeng Wang 
Date: Thu, 13 Jul 2017 14:27:58 +0800

> The bpf_skb_adjust_net() ignores the return value of bpf_skb_net_shrink/grow,
> and always return 0, fix it by return 'ret'.
> 
> Signed-off-by: Kefeng Wang 

Applied, thanks.

Re: [RFC PATCH 00/12] Implement XDP bpf_redirect vairants

2017-07-13 Thread Jesper Dangaard Brouer

On Thu, 13 Jul 2017 13:14:30 +0200
Jesper Dangaard Brouer  wrote:

> I'm still getting crashes (but much harder to provoke), but I figured
> out why.  We sort of missed one case, where map_to_flush gets set, when
> the ndo_xdp_xmit() call starts to fail, and the ixgbe driver then
> forgets to call xdp_do_flush_map, if all packets in that NAPI cycle
> failed.  We could blame the driver, but yhe clean solution is making
> sure, that we don't set map_to_flush when the __bpf_tx_xdp() call
> fails. It should also handle the other case I fixed  I'll cleanup
> my PoC-fix patch, test it and provide it here.

I changed flow in the function to be:

int xdp_do_redirect_map(struct net_device *dev, struct xdp_buff *xdp,
struct bpf_prog *xdp_prog)
{
struct redirect_info *ri = this_cpu_ptr(_info);
struct bpf_map *map = ri->map;
u32 index = ri->ifindex;
struct net_device *fwd;
int err = -EINVAL;

ri->ifindex = 0;
ri->map = NULL;

fwd = __dev_map_lookup_elem(map, index);
if (!fwd)
goto out;

if (ri->map_to_flush && (ri->map_to_flush != map))
xdp_do_flush_map();

err = __bpf_tx_xdp(fwd, map, xdp, index);
if (likely(!err))
ri->map_to_flush = map;

out:
trace_xdp_redirect(dev, fwd, xdp_prog, XDP_REDIRECT);
return err;
}


The diff is:

diff --git a/net/core/filter.c b/net/core/filter.c
index 4ca895d6ed51..c50a7ec2cdab 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2483,26 +2483,25 @@ int xdp_do_redirect_map(struct net_device *dev, struct 
xdp_buff *xdp,
struct bpf_map *map = ri->map;
u32 index = ri->ifindex;
struct net_device *fwd;
+   int err = -EINVAL;
+
+   ri->ifindex = 0;
+   ri->map = NULL;
 
fwd = __dev_map_lookup_elem(map, index);
if (!fwd)
goto out;
 
-   ri->ifindex = 0;
-
if (ri->map_to_flush && (ri->map_to_flush != map))
xdp_do_flush_map();
 
-   ri->map_to_flush = map;
-   ri->map = NULL;
+   err = __bpf_tx_xdp(fwd, map, xdp, index);
+   if (likely(!err))
+   ri->map_to_flush = map;
 
-   trace_xdp_redirect(dev, fwd, xdp_prog, XDP_REDIRECT);
-
-   return __bpf_tx_xdp(fwd, map, xdp, index);
 out:
-   ri->ifindex = 0;
-   ri->map = NULL;
-   return -EINVAL;
+   trace_xdp_redirect(dev, fwd, xdp_prog, XDP_REDIRECT);
+   return err;
 }
 
 int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp,

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

Re: [PATCH net] libceph: osdmap: Fix some NULL dereferences

2017-07-13 Thread Ilya Dryomov

On Thu, Jul 13, 2017 at 9:45 AM, Dan Carpenter  wrote:
> There are hidden gotos in the ceph_decode_* macros.  We need to set the
> "err" variable on these error paths otherwise we end up returning
> ERR_PTR(0) which is NULL.  It causes NULL dereferences in the callers.
>
> Fixes: 278b1d709c6a ("libceph: ceph_decode_skip_* helpers")
> Signed-off-by: Dan Carpenter 
>
> diff --git a/net/ceph/osdmap.c b/net/ceph/osdmap.c
> index 864789c5974e..c7521a847ef7 100644
> --- a/net/ceph/osdmap.c
> +++ b/net/ceph/osdmap.c
> @@ -510,6 +510,7 @@ static struct crush_map *crush_decode(void *pbyval, void 
> *end)
> }
> }
>
> +   err = -EINVAL;
> ceph_decode_skip_map(p, end, 32, string, bad); /* type_map */
> ceph_decode_skip_map(p, end, 32, string, bad); /* name_map */
> ceph_decode_skip_map(p, end, 32, string, bad); /* rule_name_map */
> @@ -1825,9 +1826,9 @@ struct ceph_osdmap *osdmap_apply_incremental(void **p, 
> void *end,
> if (struct_v >= 3) {
> /* new_erasure_code_profiles */
> ceph_decode_skip_map_of_map(p, end, string, string, string,
> -   bad);
> +   e_inval);
> /* old_erasure_code_profiles */
> -   ceph_decode_skip_set(p, end, string, bad);
> +   ceph_decode_skip_set(p, end, string, e_inval);
> }
>
> if (struct_v >= 4) {

Hi Dan,

I applied osdmap_apply_incremental() hunk and fixed a similar bug in
osdmap_decode() (it's a not NULL deref, that's why smatch didn't catch
it).

I posted a separate patch for crush_decode().  This is the second time
in two months you are fixing it, so I wanted something more future-proof.

See the end result at https://github.com/ceph/ceph-client/commits/testing.

Thanks,

Ilya

Re: [BUG]: NULL ptr dereference in unix_stream_sendmsg+0x1c1/0x380

2017-07-13 Thread Dmitry Safonov


Sorry, I've forgot to change the subject as the first time it fired at
xlog_cil_push(), but all reproductions fire on unix_stream_sendmsg().

On 07/13/2017 06:16 PM, Dmitry Safonov wrote:

Hello,

We run CRIU tests on linux-next tree and today we found this issue.

CRIU tests are the set of small programs to check checkpoint/restore
of different primitives (files, sockets, signals, pipes, etc).
https://github.com/xemul/criu/tree/master/test

Each test is executed three times: without namespaces, in a set of all
namespaces except userns, in a set of all namespaces. When a test
passed the preparation tests, it sends a signal to an executer, and
then the executer dumps and restores tests processes, and sends a
signal to the test back to check that everything are restored
correctly.

= Run zdtm/transition/unix_sock in ns 
==

Start test
./unix_sock --pidfile=unix_sock.pid --outfile=unix_sock.out 
--filename=unix_sock.test

Run criu dump
[   57.647284] writing to auto_msgmni has no effect
[   60.730380] criu (2023) used greatest stack depth: 11808 bytes left
Run criu restore
[   60.993529] BUG: unable to handle kernel NULL pointer dereference at 
  (null)

[   60.994221] IP: skb_queue_tail+0x2e/0x50
[   60.994589] PGD 71070067
[   60.994590] P4D 71070067
[   60.994854] PUD 71071067
[   60.995102] PMD 0
[   60.995352]
[   60.995694] Oops: 0002 [#1] SMP
[   60.996004] CPU: 0 PID: 2053 Comm: unix_sock Not tainted 
4.12.0-next-20170713 #6
[   60.996706] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
BIOS 1.10.2-20170228_101828-anatol 04/01/2014

[   60.997657] task: 880074748c80 task.stack: c9594000
[   60.998208] RIP: 0010:skb_queue_tail+0x2e/0x50
[   60.998614] RSP: 0018:c9597cf8 EFLAGS: 00010046
[   60.999132] RAX: 0246 RBX: 88006f3fa0c8 RCX: 

[   60.999797] RDX:  RSI: 0246 RDI: 
88006f3fa0dc
[   61.000455] RBP: c9597d10 R08: c9597e50 R09: 

[   61.001114] R10: 880072daea00 R11: 88007d002d80 R12: 
880072daea00
[   61.001772] R13: 88006f3fa0dc R14: 88006f3fa000 R15: 
0001
[   61.002451] FS:  () GS:88007fc0(0063) 
knlGS:f7f7b380

[   61.003198] CS:  0010 DS: 002b ES: 002b CR0: 80050033
[   61.003735] CR2:  CR3: 7106f000 CR4: 
06f0
[   61.004393] DR0:  DR1:  DR2: 

[   61.005050] DR3:  DR6: fffe0ff0 DR7: 
0400

[   61.005717] Call Trace:
[   61.005952]  unix_stream_sendmsg+0x1c1/0x380
[   61.006345]  sock_sendmsg+0x33/0x40
[   61.006667]  sock_write_iter+0x7d/0xc0
[   61.007032]  __vfs_write+0xcd/0x120
[   61.007353]  vfs_write+0xac/0x1a0
[   61.007677]  SyS_write+0x41/0xa0
[   61.007996]  do_fast_syscall_32+0x8b/0x15c
[   61.008371]  entry_SYSENTER_compat+0x4c/0x5b
[   61.008781] RIP: 0023:0xf7f7faf9
[   61.009082] RSP: 002b:fffd62f8 EFLAGS: 0246 ORIG_RAX: 
0004
[   61.009811] RAX: ffda RBX: 0004 RCX: 
fffd6738
[   61.010453] RDX: 03e8 RSI: fffd63b8 RDI: 
fffd6749
[   61.06] RBP: fffd6b38 R08:  R09: 

[   61.011795] R10:  R11:  R12: 

[   61.012378] R13:  R14:  R15: 

[   61.013027] Code: e5 41 55 4c 8d 6f 14 41 54 53 48 89 fb 4c 89 ef 49 
89 f4 e8 85 d3 21 00 48 8b 53 08 49 89 1c 24 4c 89 ef 48 89 c6 49 89 54 
24 08 <4c> 89 22 83 43 10 01 4c 89 63 08 e8 22 d4 21 00 5b 41 5c 41 5d

[   61.014778] RIP: skb_queue_tail+0x2e/0x50 RSP: c9597cf8
[   61.015333] CR2: 
[   61.015639] ---[ end trace efd0a4201d4b29fc ]---

The bug is easily (5/5) reproduced on next-20170713 with the following:

git clone https://github.com/xemul/criu.git
cd criu && git checkout criu-dev
COMPAT_TEST=y make -j5 zdtm
for i in `seq 1 2`; do ./test/zdtm.py run -t zdtm/transition/unix_sock 
-f ns ; done





--
 Dmitry

[BUG]: NULL ptr dereference in xlog_cil_push+0x274/0x430

2017-07-13 Thread Dmitry Safonov


Hello,

We run CRIU tests on linux-next tree and today we found this issue.

CRIU tests are the set of small programs to check checkpoint/restore
of different primitives (files, sockets, signals, pipes, etc).
https://github.com/xemul/criu/tree/master/test

Each test is executed three times: without namespaces, in a set of all
namespaces except userns, in a set of all namespaces. When a test
passed the preparation tests, it sends a signal to an executer, and
then the executer dumps and restores tests processes, and sends a
signal to the test back to check that everything are restored
correctly.

= Run zdtm/transition/unix_sock in ns 
==

Start test
./unix_sock --pidfile=unix_sock.pid --outfile=unix_sock.out 
--filename=unix_sock.test

Run criu dump
[   57.647284] writing to auto_msgmni has no effect
[   60.730380] criu (2023) used greatest stack depth: 11808 bytes left
Run criu restore
[   60.993529] BUG: unable to handle kernel NULL pointer dereference at 
 (null)

[   60.994221] IP: skb_queue_tail+0x2e/0x50
[   60.994589] PGD 71070067
[   60.994590] P4D 71070067
[   60.994854] PUD 71071067
[   60.995102] PMD 0
[   60.995352]
[   60.995694] Oops: 0002 [#1] SMP
[   60.996004] CPU: 0 PID: 2053 Comm: unix_sock Not tainted 
4.12.0-next-20170713 #6
[   60.996706] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
BIOS 1.10.2-20170228_101828-anatol 04/01/2014

[   60.997657] task: 880074748c80 task.stack: c9594000
[   60.998208] RIP: 0010:skb_queue_tail+0x2e/0x50
[   60.998614] RSP: 0018:c9597cf8 EFLAGS: 00010046
[   60.999132] RAX: 0246 RBX: 88006f3fa0c8 RCX: 

[   60.999797] RDX:  RSI: 0246 RDI: 
88006f3fa0dc
[   61.000455] RBP: c9597d10 R08: c9597e50 R09: 

[   61.001114] R10: 880072daea00 R11: 88007d002d80 R12: 
880072daea00
[   61.001772] R13: 88006f3fa0dc R14: 88006f3fa000 R15: 
0001
[   61.002451] FS:  () GS:88007fc0(0063) 
knlGS:f7f7b380

[   61.003198] CS:  0010 DS: 002b ES: 002b CR0: 80050033
[   61.003735] CR2:  CR3: 7106f000 CR4: 
06f0
[   61.004393] DR0:  DR1:  DR2: 

[   61.005050] DR3:  DR6: fffe0ff0 DR7: 
0400

[   61.005717] Call Trace:
[   61.005952]  unix_stream_sendmsg+0x1c1/0x380
[   61.006345]  sock_sendmsg+0x33/0x40
[   61.006667]  sock_write_iter+0x7d/0xc0
[   61.007032]  __vfs_write+0xcd/0x120
[   61.007353]  vfs_write+0xac/0x1a0
[   61.007677]  SyS_write+0x41/0xa0
[   61.007996]  do_fast_syscall_32+0x8b/0x15c
[   61.008371]  entry_SYSENTER_compat+0x4c/0x5b
[   61.008781] RIP: 0023:0xf7f7faf9
[   61.009082] RSP: 002b:fffd62f8 EFLAGS: 0246 ORIG_RAX: 
0004
[   61.009811] RAX: ffda RBX: 0004 RCX: 
fffd6738
[   61.010453] RDX: 03e8 RSI: fffd63b8 RDI: 
fffd6749
[   61.06] RBP: fffd6b38 R08:  R09: 

[   61.011795] R10:  R11:  R12: 

[   61.012378] R13:  R14:  R15: 

[   61.013027] Code: e5 41 55 4c 8d 6f 14 41 54 53 48 89 fb 4c 89 ef 49 
89 f4 e8 85 d3 21 00 48 8b 53 08 49 89 1c 24 4c 89 ef 48 89 c6 49 89 54 
24 08 <4c> 89 22 83 43 10 01 4c 89 63 08 e8 22 d4 21 00 5b 41 5c 41 5d

[   61.014778] RIP: skb_queue_tail+0x2e/0x50 RSP: c9597cf8
[   61.015333] CR2: 
[   61.015639] ---[ end trace efd0a4201d4b29fc ]---

The bug is easily (5/5) reproduced on next-20170713 with the following:

git clone https://github.com/xemul/criu.git
cd criu && git checkout criu-dev
COMPAT_TEST=y make -j5 zdtm
for i in `seq 1 2`; do ./test/zdtm.py run -t zdtm/transition/unix_sock 
-f ns ; done


--
 Dmitry

[PATCH] netlink: correctly document nla_put_u64_64bit()

2017-07-13 Thread Rolf Eike Beer

>From 90bda8d1bc2a0c5d283e4e0a4b19812a4cce72bd Mon Sep 17 00:00:00 2001
From: Rolf Eike Beer 
Date: Thu, 13 Jul 2017 16:46:43 +0200
Subject: [PATCH] netlink: correctly document nla_put_u64_64bit()

Signed-off-by: Rolf Eike Beer 
---
 include/net/netlink.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/net/netlink.h b/include/net/netlink.h
index 01709172b3d3..ef8e6c3a80a6 100644
--- a/include/net/netlink.h
+++ b/include/net/netlink.h
@@ -98,8 +98,8 @@
  *   nla_put_u8(skb, type, value)  add u8 attribute to skb
  *   nla_put_u16(skb, type, value) add u16 attribute to skb
  *   nla_put_u32(skb, type, value) add u32 attribute to skb
- *   nla_put_u64_64bits(skb, type,
- * value, padattr) add u64 attribute to skb
+ *   nla_put_u64_64bit(skb, type,
+ * value, padattr) add u64 attribute to skb
  *   nla_put_s8(skb, type, value)  add s8 attribute to skb
  *   nla_put_s16(skb, type, value) add s16 attribute to skb
  *   nla_put_s32(skb, type, value) add s32 attribute to skb
-- 
2.13.2


-- 
Rolf Eike Beer, emlix GmbH, http://www.emlix.com
Fon +49 551 30664-0, Fax +49 551 30664-11
Bertha-von-Suttner-Str. 9, 37085 Göttingen, Germany
Sitz der Gesellschaft: Göttingen, Amtsgericht Göttingen HR B 3160
Geschäftsführung: Heike Jordan, Dr. Uwe Kracke – Ust-IdNr.: DE 205 198 055

emlix – smart embedded open source

Re: sctp refcount bug.

2017-07-13 Thread Dave Jones

On Thu, Jul 13, 2017 at 11:38:34AM -0300, Marcelo Ricardo Leitner wrote:
 > On Thu, Jul 13, 2017 at 10:36:39AM -0400, Dave Jones wrote:
 > > Hit this on Linus' current tree.
 > > 
 > > 
 > > refcount_t: underflow; use-after-free.
 > 
 > Any tips on how to reproduce this?

Only seen it once so far. Will see if I can narrow it down if it
reproduces.  It took ~12 hours of fuzzing to find overnight.

Dave

Re: sctp refcount bug.

2017-07-13 Thread Marcelo Ricardo Leitner

On Thu, Jul 13, 2017 at 10:36:39AM -0400, Dave Jones wrote:
> Hit this on Linus' current tree.
> 
> 
> refcount_t: underflow; use-after-free.

Any tips on how to reproduce this?

  Marcelo

sctp refcount bug.

2017-07-13 Thread Dave Jones

Hit this on Linus' current tree.


refcount_t: underflow; use-after-free.
[ cut here ]
WARNING: CPU: 2 PID: 14455 at lib/refcount.c:186 refcount_sub_and_test+0x45/0x50
CPU: 2 PID: 14455 Comm: trinity-c46 Tainted: G  D 4.12.0-think+ #11 
task: 8804fc71b8c0 task.stack: c90002328000
RIP: 0010:refcount_sub_and_test+0x45/0x50
RSP: 0018:c9000232ba58 EFLAGS: 00010282
RAX: 0026 RBX: 88001db1d1c0 RCX: 
RDX:  RSI: 88050a3ccca8 RDI: 88050a3ccca8
RBP: c9000232ba58 R08:  R09: 0001
R10: c9000232ba88 R11:  R12: 88000d3f9b40
R13: 880456948008 R14: 880456948870 R15: c9000232bd10
FS:  7f79b1032700() GS:88050a20() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 008436726348 CR3: 00022cc87000 CR4: 001406e0
DR0: 7f731068f000 DR1: 7f2d83eb9000 DR2: 7f302340e000
DR3:  DR6: 0ff0 DR7: 0600
Call Trace:
 sctp_wfree+0x5d/0x190 [sctp]
 skb_release_head_state+0x64/0xc0
 skb_release_all+0x12/0x30
 consume_skb+0x50/0x170
 sctp_chunk_put+0x59/0x80 [sctp]
 sctp_chunk_free+0x26/0x30 [sctp]
 __sctp_outq_teardown+0x1d8/0x270 [sctp]
 sctp_outq_free+0xe/0x10 [sctp]
 sctp_association_free+0x92/0x220 [sctp]
 sctp_do_sm+0x12a6/0x1920 [sctp]
 ? __get_user_4+0x18/0x20
 ? no_context+0x3f/0x360
 ? lock_acquire+0xe7/0x1e0
 ? skb_dequeue+0x1d/0x70
 sctp_primitive_SHUTDOWN+0x33/0x40 [sctp]
 sctp_close+0x26e/0x2a0 [sctp]
 inet_release+0x3c/0x60
 sock_release+0x1f/0x80
 sock_close+0x12/0x20
 __fput+0xf8/0x200
 fput+0xe/0x10
 task_work_run+0x85/0xc0
 exit_to_usermode_loop+0xa8/0xb0
 do_syscall_64+0x151/0x190
 entry_SYSCALL64_slow_path+0x25/0x25
RIP: 0033:0x7f79b095b1e9
RSP: 002b:7ffc5eca3088 EFLAGS: 0246
 ORIG_RAX: 0120
RAX: fff2 RBX: 0120 RCX: 7f79b095b1e9
RDX: 006e RSI: 008436738120 RDI: 0130
RBP: 7ffc5eca3130 R08:  R09: 0ff0
R10: 00080800 R11: 0246 R12: 0002
R13: 7f79b0ee9058 R14: 7f79b1032698 R15: 7f79b0ee9000
Code: 75 e6 85 d2 0f 94 c0 c3 31 c0 c3 80 3d ce 95 bc 00 00 75 f4 55 48 c7 c7 
00 d9 ee 81 48 89 e5 c6 05 ba 95 bc 00 01 e8 fc 2c c0 ff <0f> ff 31 c0 5d c3 0f 
1f 44 00 00 55 48 89 fe bf 01 00 00 00 48 
---[ end trace 19b7bd878c0f56fd ]---
[ cut here ]
WARNING: CPU: 2 PID: 14455 at net/ipv4/af_inet.c:154 
inet_sock_destruct+0x1b8/0x1f0
CPU: 2 PID: 14455 Comm: trinity-c46 Tainted: G  D W   4.12.0-think+ #11 
task: 8804fc71b8c0 task.stack: c90002328000
RIP: 0010:inet_sock_destruct+0x1b8/0x1f0
RSP: 0018:c9000232bcf8 EFLAGS: 00010286
RAX:  RBX: 88000d3f9b40 RCX: 
RDX: fd00 RSI: 0300 RDI: 88000d3f9ca8
RBP: c9000232bd08 R08:  R09: 
R10:  R11:  R12: 88000d3f9ca8
R13: 88000d3f9b40 R14: 88000d3f9bc8 R15: 8801836e21d0
FS:  7f79b1032700() GS:88050a20() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 3b9ab732 CR3: 00022cc87000 CR4: 001406e0
DR0: 7f731068f000 DR1: 7f2d83eb9000 DR2: 7f302340e000
DR3:  DR6: 0ff0 DR7: 0600
Call Trace:
 sctp_destruct_sock+0x25/0x30 [sctp]
 __sk_destruct+0x28/0x230
 sk_destruct+0x20/0x30
 __sk_free+0x43/0xa0
 sk_free+0x25/0x30
 sctp_close+0x218/0x2a0 [sctp]
 inet_release+0x3c/0x60
 sock_release+0x1f/0x80
 sock_close+0x12/0x20
 __fput+0xf8/0x200
 fput+0xe/0x10
 task_work_run+0x85/0xc0
 exit_to_usermode_loop+0xa8/0xb0
 do_syscall_64+0x151/0x190
 entry_SYSCALL64_slow_path+0x25/0x25
RIP: 0033:0x7f79b095b1e9
RSP: 002b:7ffc5eca3088 EFLAGS: 0246
 ORIG_RAX: 0120
RAX: fff2 RBX: 0120 RCX: 7f79b095b1e9
RDX: 006e RSI: 008436738120 RDI: 0130
RBP: 7ffc5eca3130 R08:  R09: 0ff0
R10: 00080800 R11: 0246 R12: 0002
R13: 7f79b0ee9058 R14: 7f79b1032698 R15: 7f79b0ee9000
Code: df e8 bd 5f f4 ff e9 07 ff ff ff 0f ff 8b 83 8c 02 00 00 85 c0 0f 84 2d 
ff ff ff 0f ff 8b 93 88 02 00 00 85 d2 0f 84 2b ff ff ff <0f> ff 8b 83 40 02 00 
00 85 c0 0f 84 29 ff ff ff 0f ff e9 22 ff 
---[ end trace 19b7bd878c0f56fe ]---
[ cut here ]
WARNING: CPU: 2 PID: 14455 at net/ipv4/af_inet.c:155 
inet_sock_destruct+0x1c8/0x1f0
CPU: 2 PID: 14455 Comm: trinity-c46 Tainted: G  D W   4.12.0-think+ #11 
task: 8804fc71b8c0 task.stack: c90002328000
RIP: 0010:inet_sock_destruct+0x1c8/0x1f0
RSP: 0018:c9000232bcf8 EFLAGS: 00010206
RAX: 0300 RBX: 88000d3f9b40 RCX: 
RDX: fd00 RSI: 0300 RDI:

[PATCH v7 1/3] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING

2017-07-13 Thread Ding Tianhong

From: Casey Leedom 

The new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING indicates that the Relaxed
Ordering Attribute should not be used on Transaction Layer Packets destined
for the PCIe End Node so flagged.  Initially flagged this way are Intel
E5-26xx Root Complex Ports which suffer from a Flow Control Credit
Performance Problem and AMD A1100 ARM ("SEATTLE") Root Complex Ports which
don't obey PCIe 3.0 ordering rules which can lead to Data Corruption.

Signed-off-by: Casey Leedom 
Signed-off-by: Ding Tianhong 
---
 drivers/pci/quirks.c | 38 ++
 include/linux/pci.h  |  2 ++
 2 files changed, 40 insertions(+)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 6967c6b..1e1cdbe 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -4016,6 +4016,44 @@ static void quirk_tw686x_class(struct pci_dev *pdev)
  quirk_tw686x_class);
 
 /*
+ * Some devices have problems with Transaction Layer Packets with the Relaxed
+ * Ordering Attribute set.  Such devices should mark themselves and other
+ * Device Drivers should check before sending TLPs with RO set.
+ */
+static void quirk_relaxedordering_disable(struct pci_dev *dev)
+{
+   dev->dev_flags |= PCI_DEV_FLAGS_NO_RELAXED_ORDERING;
+}
+
+/*
+ * Intel E5-26xx Root Complex has a Flow Control Credit issue which can
+ * cause performance problems with Upstream Transaction Layer Packets with
+ * Relaxed Ordering set.
+ */
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f02, 
PCI_CLASS_NOT_DEFINED, 8,
+ quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f04, 
PCI_CLASS_NOT_DEFINED, 8,
+ quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f08, 
PCI_CLASS_NOT_DEFINED, 8,
+ quirk_relaxedordering_disable);
+
+/*
+ * The AMD ARM A1100 (AKA "SEATTLE") SoC has a bug in its PCIe Root Complex
+ * where Upstream Transaction Layer Packets with the Relaxed Ordering
+ * Attribute clear are allowed to bypass earlier TLPs with Relaxed Ordering
+ * set.  This is a violation of the PCIe 3.0 Transaction Ordering Rules
+ * outlined in Section 2.4.1 (PCI Express(r) Base Specification Revision 3.0
+ * November 10, 2010).  As a result, on this platform we can't use Relaxed
+ * Ordering for Upstream TLPs.
+ */
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_AMD, 0x1a00, 
PCI_CLASS_NOT_DEFINED, 8,
+ quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_AMD, 0x1a01, 
PCI_CLASS_NOT_DEFINED, 8,
+ quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_AMD, 0x1a02, 
PCI_CLASS_NOT_DEFINED, 8,
+ quirk_relaxedordering_disable);
+
+/*
  * Per PCIe r3.0, sec 2.2.9, "Completion headers must supply the same
  * values for the Attribute as were supplied in the header of the
  * corresponding Request, except as explicitly allowed when IDO is used."
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 4869e66..412ec1c 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -188,6 +188,8 @@ enum pci_dev_flags {
 * the direct_complete optimization.
 */
PCI_DEV_FLAGS_NEEDS_RESUME = (__force pci_dev_flags_t) (1 << 11),
+   /* Don't use Relaxed Ordering for TLPs directed at this device */
+   PCI_DEV_FLAGS_NO_RELAXED_ORDERING = (__force pci_dev_flags_t) (1 << 12),
 };
 
 enum pci_irq_reroute_variant {
-- 
1.8.3.1

[PATCH v7 2/3] PCI: Enable PCIe Relaxed Ordering if supported

2017-07-13 Thread Ding Tianhong

The PCIe Device Control Register use the bit 4 to indicate that
whether the device is permitted to enable relaxed ordering or not.
But relaxed ordering is not safe for some platform which could only
use strong write ordering, so devices are allowed (but not required)
to enable relaxed ordering bit by default.

If a PCIe device didn't enable the relaxed ordering attribute default,
we should not do anything in the PCIe configuration, otherwise we
should check if any of the devices above us do not support relaxed
ordering by the PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag, then base on
the result if we get a return that indicate that the relaxed ordering
is not supported we should update our device to disable relaxed ordering
in configuration space. If the device above us doesn't exist or isn't
the PCIe device, we shouldn't do anything and skip updating relaxed ordering
because we are probably running in a guest machine.

Signed-off-by: Ding Tianhong 
---
 drivers/pci/pci.c   | 29 +
 drivers/pci/probe.c | 37 +
 include/linux/pci.h |  2 ++
 3 files changed, 68 insertions(+)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index d88edf5..7a6b32f 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4854,6 +4854,35 @@ int pcie_set_mps(struct pci_dev *dev, int mps)
 EXPORT_SYMBOL(pcie_set_mps);
 
 /**
+ * pcie_clear_relaxed_ordering - clear PCI Express relaxed ordering bit
+ * @dev: PCI device to query
+ *
+ * If possible clear relaxed ordering
+ */
+int pcie_clear_relaxed_ordering(struct pci_dev *dev)
+{
+   return pcie_capability_clear_word(dev, PCI_EXP_DEVCTL,
+ PCI_EXP_DEVCTL_RELAX_EN);
+}
+EXPORT_SYMBOL(pcie_clear_relaxed_ordering);
+
+/**
+ * pcie_relaxed_ordering_supported - Probe for PCIe relexed ordering support
+ * @dev: PCI device to query
+ *
+ * Returns true if the device support relaxed ordering attribute.
+ */
+bool pcie_relaxed_ordering_supported(struct pci_dev *dev)
+{
+   u16 v;
+
+   pcie_capability_read_word(dev, PCI_EXP_DEVCTL, );
+
+   return !!(v & PCI_EXP_DEVCTL_RELAX_EN);
+}
+EXPORT_SYMBOL(pcie_relaxed_ordering_supported);
+
+/**
  * pcie_get_minimum_link - determine minimum link settings of a PCI device
  * @dev: PCI device to query
  * @speed: storage for minimum speed
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index c31310d..48df012 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1762,6 +1762,42 @@ static void pci_configure_extended_tags(struct pci_dev 
*dev)
 PCI_EXP_DEVCTL_EXT_TAG);
 }
 
+/**
+ * pci_dev_should_disable_relaxed_ordering - check if the PCI device
+ * should disable the relaxed ordering attribute.
+ * @dev: PCI device
+ *
+ * Return true if any of the PCI devices above us do not support
+ * relaxed ordering.
+ */
+static bool pci_dev_should_disable_relaxed_ordering(struct pci_dev *dev)
+{
+   while (dev) {
+   if (dev->dev_flags & PCI_DEV_FLAGS_NO_RELAXED_ORDERING)
+   return true;
+
+   dev = dev->bus->self;
+   }
+
+   return false;
+}
+
+static void pci_configure_relaxed_ordering(struct pci_dev *dev)
+{
+   /* We should not alter the relaxed ordering bit for the VF */
+   if (dev->is_virtfn)
+   return;
+
+   /* If the releaxed ordering enable bit is not set, do nothing. */
+   if (!pcie_relaxed_ordering_supported(dev))
+   return;
+
+   if (pci_dev_should_disable_relaxed_ordering(dev)) {
+   pcie_clear_relaxed_ordering(dev);
+   dev_info(>dev, "Disable Relaxed Ordering\n");
+   }
+}
+
 static void pci_configure_device(struct pci_dev *dev)
 {
struct hotplug_params hpp;
@@ -1769,6 +1805,7 @@ static void pci_configure_device(struct pci_dev *dev)
 
pci_configure_mps(dev);
pci_configure_extended_tags(dev);
+   pci_configure_relaxed_ordering(dev);
 
memset(, 0, sizeof(hpp));
ret = pci_get_hp_params(dev, );
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 412ec1c..3aa23a2 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1127,6 +1127,8 @@ int pci_add_ext_cap_save_buffer(struct pci_dev *dev,
 void pci_pme_wakeup_bus(struct pci_bus *bus);
 void pci_d3cold_enable(struct pci_dev *dev);
 void pci_d3cold_disable(struct pci_dev *dev);
+int pcie_clear_relaxed_ordering(struct pci_dev *dev);
+bool pcie_relaxed_ordering_supported(struct pci_dev *dev);
 
 /* PCI Virtual Channel */
 int pci_save_vc_state(struct pci_dev *dev);
-- 
1.8.3.1

[PATCH v7 0/3] Add new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag

2017-07-13 Thread Ding Tianhong

Some devices have problems with Transaction Layer Packets with the Relaxed
Ordering Attribute set.  This patch set adds a new PCIe Device Flag,
PCI_DEV_FLAGS_NO_RELAXED_ORDERING, a set of PCI Quirks to catch some known
devices with Relaxed Ordering issues, and a use of this new flag by the
cxgb4 driver to avoid using Relaxed Ordering with problematic Root Complex
Ports.

It's been years since I've submitted kernel.org patches, I appolgise for the
almost certain submission errors.

v2: Alexander point out that the v1 was only a part of the whole solution,
some platform which has some issues could use the new flag to indicate
that it is not safe to enable relaxed ordering attribute, then we need
to clear the relaxed ordering enable bits in the PCI configuration when
initializing the device. So add a new second patch to modify the PCI
initialization code to clear the relaxed ordering enable bit in the
event that the root complex doesn't want relaxed ordering enabled.

The third patch was base on the v1's second patch and only be changed
to query the relaxed ordering enable bit in the PCI configuration space
to allow the Chelsio NIC to send TLPs with the relaxed ordering attributes
set.

This version didn't plan to drop the defines for Intel Drivers to use the
new checking way to enable relaxed ordering because it is not the hardest
part of the moment, we could fix it in next patchset when this patches
reach the goal.  

v3: Redesigned the logic for pci_configure_relaxed_ordering when configuration,
If a PCIe device didn't enable the relaxed ordering attribute default,
we should not do anything in the PCIe configuration, otherwise we
should check if any of the devices above us do not support relaxed
ordering by the PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag, then base on
the result if we get a return that indicate that the relaxed ordering
is not supported we should update our device to disable relaxed ordering
in configuration space. If the device above us doesn't exist or isn't
the PCIe device, we shouldn't do anything and skip updating relaxed ordering
because we are probably running in a guest.

v4: Rename the functions pcie_get_relaxed_ordering and 
pcie_disable_relaxed_ordering
according John's suggestion, and modify the description, use the true/false
as the return value.

We shouldn't enable relaxed ordering attribute by the setting in the root
complex configuration space for PCIe device, so fix it for cxgb4.

Fix some format issues.

v5: Removed the unnecessary code for some function which only return the bool
value, and add the check for VF device.

Make this patch set base on 4.12-rc5.

v6: Fix the logic error in the need to enable the relaxed ordering attribute 
for cxgb4.

v7: The cxgb4 drivers will enable the PCIe Capability Device Control[Relaxed
Ordering Enable] in PCI Probe() routine, this will break our current
solution for some platform which has problematic when enable the relaxed
ordering attribute. According to the latest recommendations, remove the
enable_pcie_relaxed_ordering(), although it could not cover the Peer-to-Peer
scene, but we agree to leave this problem until we really trigger it.

Make this patch set base on 4.12 release version.

Casey Leedom (2):
  PCI: Add new PCIe Fabric End Node flag,
PCI_DEV_FLAGS_NO_RELAXED_ORDERING
  net/cxgb4: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag

Ding Tianhong (1):
  PCI: Enable PCIe Relaxed Ordering if supported

 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h  |  1 +
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 17 ++
 drivers/net/ethernet/chelsio/cxgb4/sge.c|  5 +--
 drivers/pci/pci.c   | 32 +++
 drivers/pci/probe.c | 41 +
 drivers/pci/quirks.c| 38 +++
 include/linux/pci.h |  4 +++
 7 files changed, 136 insertions(+), 2 deletions(-)

-- 
1.9.0

[PATCH v7 3/3] net/cxgb4: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag

2017-07-13 Thread Ding Tianhong

From: Casey Leedom 

cxgb4 Ethernet driver now queries PCIe configuration space to determine
if it can send TLPs to it with the Relaxed Ordering Attribute set.

Remove the enable_pcie_relaxed_ordering() to avoid enable PCIe Capability
Device Control[Relaxed Ordering Enable] at probe routine, to make sure
the driver will not send the Relaxed Ordering TLPs to the Root Complex which
could not deal the Relaxed Ordering TLPs.

Signed-off-by: Casey Leedom 
Signed-off-by: Ding Tianhong 
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h  |  1 +
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 23 +--
 drivers/net/ethernet/chelsio/cxgb4/sge.c|  5 +++--
 3 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h 
b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
index ef4be78..09ea62e 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
@@ -529,6 +529,7 @@ enum { /* adapter flags */
USING_SOFT_PARAMS  = (1 << 6),
MASTER_PF  = (1 << 7),
FW_OFLD_CONN   = (1 << 9),
+   ROOT_NO_RELAXED_ORDERING = (1 << 10),
 };
 
 enum {
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c 
b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index e403fa1..391e484 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -4654,11 +4654,6 @@ static void print_port_info(const struct net_device *dev)
dev->name, adap->params.vpd.id, adap->name, buf);
 }
 
-static void enable_pcie_relaxed_ordering(struct pci_dev *dev)
-{
-   pcie_capability_set_word(dev, PCI_EXP_DEVCTL, PCI_EXP_DEVCTL_RELAX_EN);
-}
-
 /*
  * Free the following resources:
  * - memory used for tables
@@ -4908,7 +4903,6 @@ static int init_one(struct pci_dev *pdev, const struct 
pci_device_id *ent)
}
 
pci_enable_pcie_error_reporting(pdev);
-   enable_pcie_relaxed_ordering(pdev);
pci_set_master(pdev);
pci_save_state(pdev);
 
@@ -4947,6 +4941,23 @@ static int init_one(struct pci_dev *pdev, const struct 
pci_device_id *ent)
adapter->msg_enable = DFLT_MSG_ENABLE;
memset(adapter->chan_map, 0xff, sizeof(adapter->chan_map));
 
+   /* If possible, we use PCIe Relaxed Ordering Attribute to deliver
+* Ingress Packet Data to Free List Buffers in order to allow for
+* chipset performance optimizations between the Root Complex and
+* Memory Controllers.  (Messages to the associated Ingress Queue
+* notifying new Packet Placement in the Free Lists Buffers will be
+* send without the Relaxed Ordering Attribute thus guaranteeing that
+* all preceding PCIe Transaction Layer Packets will be processed
+* first.)  But some Root Complexes have various issues with Upstream
+* Transaction Layer Packets with the Relaxed Ordering Attribute set.
+* The PCIe devices which under the Root Complexes will be cleared the
+* Relaxed Ordering bit in the configuration space, So we check our
+* PCIe configuration space to see if it's flagged with advice against
+* using Relaxed Ordering.
+*/
+   if (!pcie_relaxed_ordering_supported(pdev))
+   adapter->flags |= ROOT_NO_RELAXED_ORDERING;
+
spin_lock_init(>stats_lock);
spin_lock_init(>tid_release_lock);
spin_lock_init(>win0_lock);
diff --git a/drivers/net/ethernet/chelsio/cxgb4/sge.c 
b/drivers/net/ethernet/chelsio/cxgb4/sge.c
index ede1220..4ef68f6 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/sge.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/sge.c
@@ -2719,6 +2719,7 @@ int t4_sge_alloc_rxq(struct adapter *adap, struct 
sge_rspq *iq, bool fwevtq,
struct fw_iq_cmd c;
struct sge *s = >sge;
struct port_info *pi = netdev_priv(dev);
+   int relaxed = !(adap->flags & ROOT_NO_RELAXED_ORDERING);
 
/* Size needs to be multiple of 16, including status entry. */
iq->size = roundup(iq->size, 16);
@@ -2772,8 +2773,8 @@ int t4_sge_alloc_rxq(struct adapter *adap, struct 
sge_rspq *iq, bool fwevtq,
 
flsz = fl->size / 8 + s->stat_len / sizeof(struct tx_desc);
c.iqns_to_fl0congen |= htonl(FW_IQ_CMD_FL0PACKEN_F |
-FW_IQ_CMD_FL0FETCHRO_F |
-FW_IQ_CMD_FL0DATARO_F |
+FW_IQ_CMD_FL0FETCHRO_V(relaxed) |
+FW_IQ_CMD_FL0DATARO_V(relaxed) |
 FW_IQ_CMD_FL0PADEN_F);
if (cong >= 0)
c.iqns_to_fl0congen |=
-- 
1.8.3.1

Re: [PATCH net 20/20] net/hinic: Add ethtool and stats

2017-07-13 Thread Aviad Krawczyk (A)


Hi Andrew,

We will separate the patches when we will resubmit.

Aviad

On 7/12/2017 6:43 PM, Andrew Lunn wrote:
> On Wed, Jul 12, 2017 at 10:17:26PM +0800, Aviad Krawczyk wrote:
> 
> Hi Avaid
> 
>> +
>> +static void hinic_tx_timeout(struct net_device *netdev)
>> +{
>> +struct hinic_dev *nic_dev = netdev_priv(netdev);
>> +
>> +netif_err(nic_dev, drv, netdev, "Tx timeout\n");
>> +}
>> +
>> +#ifdef CONFIG_NET_POLL_CONTROLLER
>> +static void hinic_netpoll(struct net_device *netdev)
>> +{
>> +struct hinic_dev *nic_dev = netdev_priv(netdev);
>> +struct hinic_hwdev *hwdev = nic_dev->hwdev;
>> +int i, num_qps = hinic_hwdev_num_qps(hwdev);
>> +
>> +for (i = 0; i < num_qps; i++) {
>> +struct hinic_txq *txq = _dev->txqs[i];
>> +struct hinic_rxq *rxq = _dev->rxqs[i];
>> +
>> +napi_schedule(>napi);
>> +napi_schedule(>napi);
>> +}
>> +}
>> +#endif
> 
> This has nothing to do with ethtool support. Separate patch please.
> 
>  Andrew
> 
> .
>

[PATCH net] cxgb4: add new T5 pci device id's

2017-07-13 Thread Ganesh Goudar

Add 0x50a3 and 0x50a4 T5 device id's

Signed-off-by: Ganesh Goudar 
---
 drivers/net/ethernet/chelsio/cxgb4/t4_pci_id_tbl.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_pci_id_tbl.h 
b/drivers/net/ethernet/chelsio/cxgb4/t4_pci_id_tbl.h
index 99987d8..aa28299 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4_pci_id_tbl.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4_pci_id_tbl.h
@@ -174,6 +174,8 @@ CH_PCI_DEVICE_ID_TABLE_DEFINE_BEGIN
CH_PCI_ID_TABLE_FENTRY(0x50a0), /* Custom T540-CR */
CH_PCI_ID_TABLE_FENTRY(0x50a1), /* Custom T540-CR */
CH_PCI_ID_TABLE_FENTRY(0x50a2), /* Custom T540-KR4 */
+   CH_PCI_ID_TABLE_FENTRY(0x50a3), /* Custom T580-KR4 */
+   CH_PCI_ID_TABLE_FENTRY(0x50a4), /* Custom 2x T540-CR */
 
/* T6 adapters:
 */
-- 
2.1.0

Re: [PATCH net 01/20] net/hinic: Initialize hw interface

2017-07-13 Thread Aviad Krawczyk (A)


Hi Andrew,

Now is the merge window and we need to resubmit.
We will fix it when we will resubmit.

The version number was used for the ethtool information and module version and
will be removed.

devm_kzalloc - will be used for the allocation of the memory at initialization.

We used pr_ for messages that point on errors or information in module,
and dev_ for errors in device and netif_ for errors in network device 
operations.
netif_ will be used for network device ops and dev_ for the others.

Aviad

On 7/12/2017 6:29 PM, Andrew Lunn wrote:
>> +
>> +#define HINIC_DRV_NAME  "HiNIC"
>> +#define HINIC_DRV_VERSION   "1.0"
> 
> Hi Aviad
> 
> Please don't add a driver version. There was a discussion about this
> recently, how pointless it is.
> 
>> +/**
>> + * hinic_init_hwdev - Initialize the NIC HW
>> + * @hwdev: the NIC HW device that is returned from the initialization
>> + * @pdev: the NIC pci device
>> + *
>> + * Return 0 - Success, negative - Failure
>> + *
>> + * Initialize the NIC HW device and return a pointer to it in the first arg
>> + **/
>> +int hinic_init_hwdev(struct hinic_hwdev **hwdev, struct pci_dev *pdev)
>> +{
>> +struct hinic_pfhwdev *pfhwdev;
>> +struct hinic_hwif *hwif;
>> +int err;
>> +
>> +hwif = kzalloc(sizeof(*hwif), GFP_KERNEL);
> 
> Using the devm_ functions makes your cleanup code simpler when
> handling memory.
> 
>> +/**
>> + * nic_dev_init - Initialize the NIC device
>> + * @pdev: the NIC pci device
>> + *
>> + * Return 0 - Success, negative - Failure
>> + **/
>> +static int nic_dev_init(struct pci_dev *pdev)
>> +{
>> +struct hinic_dev *nic_dev;
>> +struct net_device *netdev;
>> +struct hinic_hwdev *hwdev;
>> +int err, num_qps;
>> +
>> +err = hinic_init_hwdev(, pdev);
>> +if (err) {
>> +dev_err(>dev, "Failed to initialize HW device\n");
>> +return err;
>> +}
>> +
>> +num_qps = hinic_hwdev_num_qps(hwdev);
>> +if (num_qps <= 0) {
>> +dev_err(>dev, "Invalid number of QPS\n");
>> +err = -EINVAL;
>> +goto num_qps_err;
>> +}
>> +
>> +netdev = alloc_etherdev_mq(sizeof(*nic_dev), num_qps);
>> +if (!netdev) {
>> +pr_err("Failed to allocate Ethernet device\n");
> 
> Above you used dev_err, here you used pr_err(). Please be consistent.
> 
>> +err = -ENOMEM;
>> +goto alloc_etherdev_err;
>> +}
>> +
>> +netdev->netdev_ops = _netdev_ops;
>> +
>> +nic_dev = (struct hinic_dev *)netdev_priv(netdev);
>> +nic_dev->hwdev = hwdev;
>> +nic_dev->netdev = netdev;
>> +nic_dev->msg_enable = MSG_ENABLE_DEFAULT;
>> +
>> +pci_set_drvdata(pdev, netdev);
>> +
>> +netif_carrier_off(netdev);
>> +
>> +err = register_netdev(netdev);
>> +if (err) {
>> +netif_err(nic_dev, probe, netdev, "Failed to register 
>> netdev\n");
> 
> probably not a good idea to use netif_err, if register_netdev just
> failed. dev_err() would be better.
> 
> 
> .
>

[PATCH net v2] cxgb4: ptp_clock_register() returns error pointers

2017-07-13 Thread Ganesh Goudar

Check ptp_clock_register() return not only for NULL but
also for error pointers, and also nullify adapter->ptp_clock
if ptp_clock_register() fails.

Fixes: 9c33e4208bce ("cxgb4: Add PTP Hardware Clock (PHC) support")
Reported-by: Dan Carpenter 
Cc: Richard Cochran 
Signed-off-by: Ganesh Goudar 
---
v2: nullifying adapter->ptp_clock if ptp_clock_register() fails.
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_ptp.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_ptp.c 
b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_ptp.c
index 50517cf..9f9d6ca 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_ptp.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_ptp.c
@@ -441,7 +441,8 @@ void cxgb4_ptp_init(struct adapter *adapter)
 
adapter->ptp_clock = ptp_clock_register(>ptp_clock_info,
>pdev->dev);
-   if (!adapter->ptp_clock) {
+   if (IS_ERR_OR_NULL(adapter->ptp_clock)) {
+   adapter->ptp_clock = NULL;
dev_err(adapter->pdev_dev,
"PTP %s Clock registration has failed\n", __func__);
return;
-- 
2.1.0

[PATCH net] net: bridge: fix dest lookup when vlan proto doesn't match

2017-07-13 Thread Nikolay Aleksandrov

With 802.1ad support the vlan_ingress code started checking for vlan
protocol mismatch which causes the current tag to be inserted and the
bridge vlan protocol & pvid to be set. The vlan tag insertion changes
the skb mac_header and thus the lookup mac dest pointer which was loaded
prior to calling br_allowed_ingress in br_handle_frame_finish is VLAN_HLEN
bytes off now, pointing to the last two bytes of the destination mac and
the first four of the source mac causing lookups to always fail and
broadcasting all such packets to all ports. Same thing happens for locally
originated packets when passing via br_dev_xmit. So load the dest pointer
after the vlan checks and possible skb change.

Fixes: 8580e2117c06 ("bridge: Prepare for 802.1ad vlan filtering support")
Reported-by: Anitha Narasimha Murthy 
Signed-off-by: Nikolay Aleksandrov 
---
 net/bridge/br_device.c | 3 ++-
 net/bridge/br_input.c  | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
index f0f3447e8aa4..861ae2a165f4 100644
--- a/net/bridge/br_device.c
+++ b/net/bridge/br_device.c
@@ -34,11 +34,11 @@ static struct lock_class_key bridge_netdev_addr_lock_key;
 netdev_tx_t br_dev_xmit(struct sk_buff *skb, struct net_device *dev)
 {
struct net_bridge *br = netdev_priv(dev);
-   const unsigned char *dest = skb->data;
struct net_bridge_fdb_entry *dst;
struct net_bridge_mdb_entry *mdst;
struct pcpu_sw_netstats *brstats = this_cpu_ptr(br->stats);
const struct nf_br_ops *nf_ops;
+   const unsigned char *dest;
u16 vid = 0;
 
rcu_read_lock();
@@ -61,6 +61,7 @@ netdev_tx_t br_dev_xmit(struct sk_buff *skb, struct 
net_device *dev)
if (!br_allowed_ingress(br, br_vlan_group_rcu(br), skb, ))
goto out;
 
+   dest = eth_hdr(skb)->h_dest;
if (is_broadcast_ether_addr(dest)) {
br_flood(br, skb, BR_PKT_BROADCAST, false, true);
} else if (is_multicast_ether_addr(dest)) {
diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
index 013f2290bfa5..7637f58c1226 100644
--- a/net/bridge/br_input.c
+++ b/net/bridge/br_input.c
@@ -131,11 +131,11 @@ static void br_do_proxy_arp(struct sk_buff *skb, struct 
net_bridge *br,
 int br_handle_frame_finish(struct net *net, struct sock *sk, struct sk_buff 
*skb)
 {
struct net_bridge_port *p = br_port_get_rcu(skb->dev);
-   const unsigned char *dest = eth_hdr(skb)->h_dest;
enum br_pkt_type pkt_type = BR_PKT_UNICAST;
struct net_bridge_fdb_entry *dst = NULL;
struct net_bridge_mdb_entry *mdst;
bool local_rcv, mcast_hit = false;
+   const unsigned char *dest;
struct net_bridge *br;
u16 vid = 0;
 
@@ -153,6 +153,7 @@ int br_handle_frame_finish(struct net *net, struct sock 
*sk, struct sk_buff *skb
br_fdb_update(br, p, eth_hdr(skb)->h_source, vid, false);
 
local_rcv = !!(br->dev->flags & IFF_PROMISC);
+   dest = eth_hdr(skb)->h_dest;
if (is_multicast_ether_addr(dest)) {
/* by definition the broadcast is also a multicast address */
if (is_broadcast_ether_addr(dest)) {
-- 
2.1.4

Re: net-next STATUS page

2017-07-13 Thread Jan Engelhardt

On Thursday 2017-07-13 13:32, Saeed Mahameed wrote:
>>>
 Therefore, in order to avoid any and all confusion I have created this
 web site:

  http://vger.kernel.org/~davem/net-next.html
>
> You will need an image processing algorithm to determine whether it is open or
> close, i have some good neural network algorithms if you are interested :)

That's not going to win any hacker prize.

A much more resource-saving approach is to just strcasestr(s, "closed") 
in the raw text. …at least while the page maintainer remains 
good-hearted.

Re: net-next STATUS page

2017-07-13 Thread Saeed Mahameed




On 7/13/2017 11:56 AM, Jiri Pirko wrote:

Tue, Jul 11, 2017 at 08:58:41PM CEST, step...@networkplumber.org wrote:

On Tue, 11 Jul 2017 07:24:16 -0700 (PDT)
David Miller  wrote:


It has gotten to the point that even casually walking around
Faro, Portugal last week, random German tourists would stop
me in the street and ask if net-next was open or not.

Therefore, in order to avoid any and all confusion I have created this
web site:


https://emea01.safelinks.protection.outlook.com/?url=http:%2F%2Fvger.kernel.org%2F~davem%2Fnet-next.html=02%7C01%7Csaeedm%40mellanox.com%7Caf800791df264ccd4eb008d4c9cd20ca%7Ca652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C636355330266033104=frov8YFnlybI1n3hacYLWJZzYbWL7mk4kWIS6XlG2Vg%3D=0

Please refer to it if you are ever in doubt.

Thanks!


Where is the app?


There would be good to have git-send-email hook to check this :)



You will need an image processing algorithm to determine whether it is 
open or close, i have some good neural network algorithms if you are 
interested :)

[PATCH] dccp: make const array error_code static

2017-07-13 Thread Colin King

From: Colin Ian King 

Don't populate array error_code on the stack but make it static. Makes
the object code smaller by almost 250 bytes:

Before:
   textdata bss dec hex filename
  10366 983   0   113492c55 net/dccp/input.o

After:
   textdata bss dec hex filename
  101611039   0   112002bc0 net/dccp/input.o

Signed-off-by: Colin Ian King 
---
 net/dccp/input.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/dccp/input.c b/net/dccp/input.c
index 4a05d7876850..fa6be9750bb4 100644
--- a/net/dccp/input.c
+++ b/net/dccp/input.c
@@ -126,7 +126,7 @@ static int dccp_rcv_closereq(struct sock *sk, struct 
sk_buff *skb)
 
 static u16 dccp_reset_code_convert(const u8 code)
 {
-   const u16 error_code[] = {
+   static const u16 error_code[] = {
[DCCP_RESET_CODE_CLOSED] = 0,   /* normal termination */
[DCCP_RESET_CODE_UNSPECIFIED]= 0,   /* nothing known */
[DCCP_RESET_CODE_ABORTED]= ECONNRESET,
-- 
2.11.0

Re: [RFC PATCH 00/12] Implement XDP bpf_redirect vairants

2017-07-13 Thread Jesper Dangaard Brouer

On Tue, 11 Jul 2017 11:26:54 -0700
John Fastabend  wrote:

> On 07/11/2017 07:23 AM, Jesper Dangaard Brouer wrote:
> > On Mon, 10 Jul 2017 17:59:17 -0700
> > John Fastabend  wrote:
> >   
> >> On 07/10/2017 11:30 AM, Jesper Dangaard Brouer wrote:  
> >>> On Sat, 8 Jul 2017 21:06:17 +0200
> >>> Jesper Dangaard Brouer  wrote:
> >>> 
>  On Sat, 08 Jul 2017 10:46:18 +0100 (WEST)
>  David Miller  wrote:
> 
> > From: John Fastabend 
> > Date: Fri, 07 Jul 2017 10:48:36 -0700
> >   
> >> On 07/07/2017 10:34 AM, John Fastabend wrote:
> >>> This series adds two new XDP helper routines bpf_redirect() and
> >>> bpf_redirect_map(). The first variant bpf_redirect() is meant
> >>> to be used the same way it is currently being used by the cls_bpf
> >>> classifier. An xdp packet will be redirected immediately when this
> >>> is called.
> >>
> >> Also other than the typo in the title there ;) I'm going to CC
> >> the driver maintainers working on XDP (makes for a long CC list but)
> >> because we would want to try and get support in as many as possible in
> >> the next merge window.
> >>
> >> For this rev I just implemented on ixgbe because I wrote the
> >> original XDP support there. I'll volunteer to do virtio as well.   
> >>  
> >
> > I went over this series a few times and it looks great to me.
> > You didn't even give me some coding style issues to pick on :-)  
> 
>  We (Daniel, Andy and I) have been reviewing and improving on this
>  patchset the last couple of weeks ;-).  We had some stability issues,
>  which is why it wasn't published earlier. My plan is to test this
>  latest patchset again, Monday and Tuesday. I'll try to assess stability
>  and provide some performance numbers.
> >>>
> >>>
> >>> Damn, I though it was stable, I have been running a lot of performance
> >>> tests, and then this just happened :-(
> >>
> >> Thanks, I'll take a look through the code and see if I can come up with
> >> why this might happen. I haven't hit it on my tests yet though.  
> > 
> > I've figured out why this happens, and I have a fix, see patch below
> > with some comments with questions.
> >   
> 
> Awesome!
> 
> > The problem is that we can leak map_to_flush in an error path, the fix:
> > 
> > diff --git a/net/core/filter.c b/net/core/filter.c
> > index 2ccd6ff09493..7f1f48668dcf 100644
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -2497,11 +2497,14 @@ int xdp_do_redirect_map(struct net_device *dev, 
> > struct xdp_buff *xdp,
> > ri->map = NULL;
> >  
> > trace_xdp_redirect(dev, fwd, xdp_prog, XDP_REDIRECT);
> > -
> > +   // Q: Should we also trace "goto out" (failed lookup)?
> > +   //like bpf_warn_invalid_xdp_redirect();  
> 
> Maybe another trace event? trace_xdp_redirect_failed()
> 
> > return __bpf_tx_xdp(fwd, map, xdp, index);
> >  out:
> > ri->ifindex = 0;
> > -   ri->map = NULL;
> > +   // XXX: here we could leak ri->map_to_flush, which could be
> > +   //  picked up later by xdp_do_flush_map()
> > +   xdp_do_flush_map(); /* Clears ri->map_to_flush + ri->map */  
> 
> +1 
> 
> ah map lookup failed and we need to do the flush nice catch.

I'm still getting crashes (but much harder to provoke), but I figured
out why.  We sort of missed one case, where map_to_flush gets set, when
the ndo_xdp_xmit() call starts to fail, and the ixgbe driver then
forgets to call xdp_do_flush_map, if all packets in that NAPI cycle
failed.  We could blame the driver, but yhe clean solution is making
sure, that we don't set map_to_flush when the __bpf_tx_xdp() call
fails. It should also handle the other case I fixed  I'll cleanup
my PoC-fix patch, test it and provide it here.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

[PATCH net] net: hns: add acpi function of xge led control

2017-07-13 Thread liujian56

From: LiuJian 

The current code only support DT method to control xge led.
This patch is the implementation of acpi method to control xge led.

Signed-off-by: LiuJian 
Reviewed-by: John Garry 
Reviewed-by: Yunsheng Lin 
Reviewed-by: Daode Huang 
---
 drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c  |  3 +-
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_misc.c | 58 +-
 2 files changed, 58 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c 
b/drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c
index ff864a1..a37166e 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c
@@ -776,8 +776,9 @@ void hns_ae_update_led_status(struct hnae_handle *handle)
 
assert(handle);
mac_cb = hns_get_mac_cb(handle);
-   if (!mac_cb->cpld_ctrl)
+   if (mac_cb->media_type != HNAE_MEDIA_TYPE_FIBER)
return;
+
hns_set_led_opt(mac_cb);
 }
 
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_misc.c 
b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_misc.c
index 7a8addd..408b63f 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_misc.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_misc.c
@@ -53,6 +53,34 @@ static u32 dsaf_read_sub(struct dsaf_device *dsaf_dev, u32 
reg)
return ret;
 }
 
+static void hns_dsaf_acpi_ledctrl_by_port(struct hns_mac_cb *mac_cb, u8 
op_type,
+  u32 link, u32 port, u32 act)
+{
+   union acpi_object *obj;
+   union acpi_object obj_args[3], argv4;
+
+   obj_args[0].integer.type = ACPI_TYPE_INTEGER;
+   obj_args[0].integer.value = link;
+   obj_args[1].integer.type = ACPI_TYPE_INTEGER;
+   obj_args[1].integer.value = port;
+   obj_args[2].integer.type = ACPI_TYPE_INTEGER;
+   obj_args[2].integer.value = act;
+
+   argv4.type = ACPI_TYPE_PACKAGE;
+   argv4.package.count = 3;
+   argv4.package.elements = obj_args;
+
+   obj = acpi_evaluate_dsm(ACPI_HANDLE(mac_cb->dev),
+   _dsaf_acpi_dsm_guid, 0, op_type, );
+   if (!obj) {
+   dev_warn(mac_cb->dev, "ledctrl fail, link:%d port:%d act:%d!\n",
+link, port, act);
+   return;
+   }
+
+   ACPI_FREE(obj);
+}
+
 static void hns_cpld_set_led(struct hns_mac_cb *mac_cb, int link_status,
 u16 speed, int data)
 {
@@ -93,6 +121,18 @@ static void hns_cpld_set_led(struct hns_mac_cb *mac_cb, int 
link_status,
}
 }
 
+static void hns_cpld_set_led_acpi(struct hns_mac_cb *mac_cb, int link_status,
+u16 speed, int data)
+{
+   if (!mac_cb) {
+   pr_err("cpld_led_set mac_cb is null!\n");
+   return;
+   }
+
+   hns_dsaf_acpi_ledctrl_by_port(mac_cb, HNS_OP_LED_SET_FUNC,
+   link_status, mac_cb->mac_id, data);
+}
+
 static void cpld_led_reset(struct hns_mac_cb *mac_cb)
 {
if (!mac_cb || !mac_cb->cpld_ctrl)
@@ -103,6 +143,20 @@ static void cpld_led_reset(struct hns_mac_cb *mac_cb)
mac_cb->cpld_led_value = CPLD_LED_DEFAULT_VALUE;
 }
 
+static void cpld_led_reset_acpi(struct hns_mac_cb *mac_cb)
+{
+   if (!mac_cb) {
+   pr_err("cpld_led_reset mac_cb is null!\n");
+   return;
+   }
+
+   if (mac_cb->media_type != HNAE_MEDIA_TYPE_FIBER)
+return;
+
+   hns_dsaf_acpi_ledctrl_by_port(mac_cb, HNS_OP_LED_SET_FUNC,
+   0, mac_cb->mac_id, 0);
+}
+
 static int cpld_set_led_id(struct hns_mac_cb *mac_cb,
   enum hnae_led_state status)
 {
@@ -604,8 +658,8 @@ struct dsaf_misc_op *hns_misc_op_get(struct dsaf_device 
*dsaf_dev)
 
misc_op->cfg_serdes_loopback = hns_mac_config_sds_loopback;
} else if (is_acpi_node(dsaf_dev->dev->fwnode)) {
-   misc_op->cpld_set_led = hns_cpld_set_led;
-   misc_op->cpld_reset_led = cpld_led_reset;
+   misc_op->cpld_set_led = hns_cpld_set_led_acpi;
+   misc_op->cpld_reset_led = cpld_led_reset_acpi;
misc_op->cpld_set_led_id = cpld_set_led_id;
 
misc_op->dsaf_reset = hns_dsaf_rst_acpi;
-- 
2.5.0

[PATCH net] xgene: Don't fail probe, if there is no clk resource for SGMII interfaces

2017-07-13 Thread Thomas Bogendoerfer

From: Thomas Bogendoerfer 

This change fixes following problem

[1.827940] xgene-enet: probe of 1f210030.ethernet failed with error -2

which leads to a missing ethernet interface (reproducable at least on
Gigabyte MP30-AR0 and APM Mustang systems).

The check for a valid clk resource fails, because DT doesn't provide a
clock for sgenet1. But the driver doesn't use this clk, if the ethernet
port is connected via SGMII. Therefore this patch avoids probing for clk
on SGMII interfaces.

Fixes: 9aea7779b764 drivers: net: xgene: Fix crash on DT systems
Signed-off-by: Thomas Bogendoerfer 
---
 drivers/net/ethernet/apm/xgene/xgene_enet_main.c | 22 --
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c 
b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
index d3906f6b01bd..86058a9f3417 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
@@ -1785,16 +1785,18 @@ static int xgene_enet_get_resources(struct 
xgene_enet_pdata *pdata)
 
xgene_enet_gpiod_get(pdata);
 
-   pdata->clk = devm_clk_get(>dev, NULL);
-   if (IS_ERR(pdata->clk)) {
-   /* Abort if the clock is defined but couldn't be retrived.
-* Always abort if the clock is missing on DT system as
-* the driver can't cope with this case.
-*/
-   if (PTR_ERR(pdata->clk) != -ENOENT || dev->of_node)
-   return PTR_ERR(pdata->clk);
-   /* Firmware may have set up the clock already. */
-   dev_info(dev, "clocks have been setup already\n");
+   if (pdata->phy_mode != PHY_INTERFACE_MODE_SGMII) {
+   pdata->clk = devm_clk_get(>dev, NULL);
+   if (IS_ERR(pdata->clk)) {
+   /* Abort if the clock is defined but couldn't be
+* retrived. Always abort if the clock is missing on
+* DT system as the driver can't cope with this case.
+*/
+   if (PTR_ERR(pdata->clk) != -ENOENT || dev->of_node)
+   return PTR_ERR(pdata->clk);
+   /* Firmware may have set up the clock already. */
+   dev_info(dev, "clocks have been setup already\n");
+   }
}
 
if (pdata->phy_mode != PHY_INTERFACE_MODE_XGMII)
-- 
2.12.3

Re: [PATCH] bpf: fix return in bpf_skb_adjust_net

2017-07-13 Thread Daniel Borkmann


On 07/13/2017 08:27 AM, Kefeng Wang wrote:

The bpf_skb_adjust_net() ignores the return value of bpf_skb_net_shrink/grow,
and always return 0, fix it by return 'ret'.

Signed-off-by: Kefeng Wang 


Sigh, yep of course. Thanks!

Acked-by: Daniel Borkmann

Re: net-next STATUS page

2017-07-13 Thread Jiri Pirko

Tue, Jul 11, 2017 at 08:58:41PM CEST, step...@networkplumber.org wrote:
>On Tue, 11 Jul 2017 07:24:16 -0700 (PDT)
>David Miller  wrote:
>
>> It has gotten to the point that even casually walking around
>> Faro, Portugal last week, random German tourists would stop
>> me in the street and ask if net-next was open or not.
>> 
>> Therefore, in order to avoid any and all confusion I have created this
>> web site:
>> 
>>  http://vger.kernel.org/~davem/net-next.html
>> 
>> Please refer to it if you are ever in doubt.
>> 
>> Thanks!
>
>Where is the app?

There would be good to have git-send-email hook to check this :)

[PATCH net] libceph: osdmap: Fix some NULL dereferences

2017-07-13 Thread Dan Carpenter

There are hidden gotos in the ceph_decode_* macros.  We need to set the
"err" variable on these error paths otherwise we end up returning
ERR_PTR(0) which is NULL.  It causes NULL dereferences in the callers.

Fixes: 278b1d709c6a ("libceph: ceph_decode_skip_* helpers")
Signed-off-by: Dan Carpenter 

diff --git a/net/ceph/osdmap.c b/net/ceph/osdmap.c
index 864789c5974e..c7521a847ef7 100644
--- a/net/ceph/osdmap.c
+++ b/net/ceph/osdmap.c
@@ -510,6 +510,7 @@ static struct crush_map *crush_decode(void *pbyval, void 
*end)
}
}
 
+   err = -EINVAL;
ceph_decode_skip_map(p, end, 32, string, bad); /* type_map */
ceph_decode_skip_map(p, end, 32, string, bad); /* name_map */
ceph_decode_skip_map(p, end, 32, string, bad); /* rule_name_map */
@@ -1825,9 +1826,9 @@ struct ceph_osdmap *osdmap_apply_incremental(void **p, 
void *end,
if (struct_v >= 3) {
/* new_erasure_code_profiles */
ceph_decode_skip_map_of_map(p, end, string, string, string,
-   bad);
+   e_inval);
/* old_erasure_code_profiles */
-   ceph_decode_skip_set(p, end, string, bad);
+   ceph_decode_skip_set(p, end, string, e_inval);
}
 
if (struct_v >= 4) {

Re: [PATCH net-next v1 2/3] gtp: Support creating flow-based gtp net_device

2017-07-13 Thread Harald Welte

Hi Jiannan,

please keep in mind I have zero clue about OVS, so I cannot really
comment on what is customary in that context and what not.  However,
some general review from the GTP point of view below:

On Wed, Jul 12, 2017 at 05:44:54PM -0700, Jiannan Ouyang wrote:
> Add the gtp_create_flow_based_dev() interface to create flow-based gtp
> net_device, which sets gtp->collect_md. Under flow-based mode, UDP sockets are
> created and maintained in kernel.
> 
> Signed-off-by: Jiannan Ouyang 
> ---
>  drivers/net/gtp.c | 213 
> +-
>  include/net/gtp.h |   8 ++
>  2 files changed, 217 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
> index 5a7b504..09712c9 100644
> --- a/drivers/net/gtp.c
> +++ b/drivers/net/gtp.c
> @@ -642,9 +642,94 @@ static netdev_tx_t gtp_dev_xmit(struct sk_buff *skb, 
> struct net_device *dev)
>   return NETDEV_TX_OK;
>  }
>  
> +static int gtp_hashtable_new(struct gtp_dev *gtp, int hsize);
> +static void gtp_hashtable_free(struct gtp_dev *gtp);
> +static int gtp_encap_enable(struct gtp_dev *gtp, struct nlattr *data[]);
> +
> +static int gtp_change_mtu(struct net_device *dev, int new_mtu, bool strict)
> +{
> + int max_mtu = IP_MAX_MTU - dev->hard_header_len - sizeof(struct iphdr)
> + - sizeof(struct udphdr) - sizeof(struct gtp1_header);
> +
> + if (new_mtu < ETH_MIN_MTU)
> + return -EINVAL;
> +
> + if (new_mtu > max_mtu) {
> + if (strict)
> + return -EINVAL;
> +
> + new_mtu = max_mtu;
> + }
> +
> + dev->mtu = new_mtu;
> + return 0;
> +}
> +
> +static int gtp_dev_open(struct net_device *dev)
> +{
> + struct gtp_dev *gtp = netdev_priv(dev);
> + struct net *net = gtp->net;
> + struct socket *sock1u;
> + struct socket *sock0;
> + struct udp_tunnel_sock_cfg tunnel_cfg;
> + struct udp_port_cfg udp_conf;
> + int err;
> +
> + memset(_conf, 0, sizeof(udp_conf));
> +
> + udp_conf.family = AF_INET;
> + udp_conf.local_ip.s_addr = htonl(INADDR_ANY);
> + udp_conf.local_udp_port = htons(GTP1U_PORT);
> +
> + err = udp_sock_create(gtp->net, _conf, );
> + if (err < 0)
> + return err;
> +
> + udp_conf.local_udp_port = htons(GTP0_PORT);
> + err = udp_sock_create(gtp->net, _conf, );
> + if (err < 0)
> + return err;

you're unconditionally binding to both GTP0 and GTP1 UDP ports.  This is
done selectively based on netlink attributes in the existing "normal"
non-OVS kernel code, i.e. the control is left to the user.

Is this function is only called/used in the context of OVS?  If so,
since you explicitly implement only GTPv1, why bind to GTPv0 port?

> + setup_udp_tunnel_sock(net, sock1u, _cfg);

even here, you're only setting up the v1 and not v0.

> + /* Assume largest header, ie. GTPv0. */
> + dev->needed_headroom= LL_MAX_HEADER +
> + sizeof(struct iphdr) +
> + sizeof(struct udphdr) +
> + sizeof(struct gtp0_header);

... and here you're using headroom for a GTPv0 header, despite (I think)
only supporting GTPv1 from this configuration?

> + err = gtp_hashtable_new(gtp, GTP_PDP_HASHSIZE); // JO: when to free??

I think that question about when to free needs to be resolved before any
merge.  Did you check that it persists even after the device is
closed/removed?

-- 
- Harald Welte    http://laforge.gnumonks.org/

"Privacy in residential applications is a desirable marketing option."
  (ETSI EN 300 175-7 Ch. A6)

1 2 >

1 - 100 of 103 matches

Mail list logo