date:20160904

[PATCH net] bnxt_en: Fix TX push operation on ARM64.

2016-09-04 Thread Michael Chan

There is a code path where we are calling __iowrite64_copy() on
an address that is not 64-bit aligned.  This causes an exception on
some architectures such as arm64.  Fix that code path by using
__iowrite32_copy().

Reported-by: JD Zheng 
Signed-off-by: Michael Chan 
---

Please consider this for stable as well.  Thanks.

 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 2cf7910..228c964 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -353,8 +353,8 @@ static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev)
push_len = (length + sizeof(*tx_push) + 7) / 8;
if (push_len > 16) {
__iowrite64_copy(txr->tx_doorbell, tx_push_buf, 16);
-   __iowrite64_copy(txr->tx_doorbell + 4, tx_push_buf + 1,
-push_len - 16);
+   __iowrite32_copy(txr->tx_doorbell + 4, tx_push_buf + 1,
+(push_len - 16) << 1);
} else {
__iowrite64_copy(txr->tx_doorbell, tx_push_buf,
 push_len);
-- 
1.8.3.1
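
For readers following the count arithmetic in this patch: the count argument of __iowrite64_copy() is in 64-bit units and that of __iowrite32_copy() is in 32-bit units, which is why the remaining length is shifted left by one. Below is a minimal user-space sketch of that conversion. It is not driver code: copy32(), qwords_to_dwords() and push_tail() are hypothetical stand-ins, and plain memory replaces the MMIO doorbell.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for __iowrite32_copy(): copies `count` 32-bit
 * units and only requires a 4-byte-aligned destination. */
void copy32(void *to, const void *from, size_t count)
{
	assert(((uintptr_t)to & 3) == 0);
	memcpy(to, from, count * 4);
}

/* Number of 32-bit units needed to carry `qwords` 64-bit units. */
size_t qwords_to_dwords(size_t qwords)
{
	return qwords << 1;
}

/* Copy `qwords` 64-bit units to a destination that may sit 4 bytes past
 * an 8-byte boundary, as in the second half of the TX push. */
void push_tail(uint32_t *dst, const uint64_t *src, size_t qwords)
{
	copy32(dst, src, qwords_to_dwords(qwords));
}
```

Passing the unshifted 64-bit count to the 32-bit helper would copy only half of the remaining data, so the shift and the change of helper have to go together.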

Re: [PATCH] RDS: Simplify code

2016-09-04 Thread Leon Romanovsky

On Mon, Sep 05, 2016 at 06:38:21AM +0200, Christophe JAILLET wrote:
> Le 04/09/2016 à 20:23, Leon Romanovsky a écrit :
> >On Sun, Sep 04, 2016 at 05:57:20PM +0200, Christophe JAILLET wrote:
> >>Le 04/09/2016 à 14:20, Leon Romanovsky a écrit :
> >>>On Sat, Sep 03, 2016 at 07:33:29AM +0200, Christophe JAILLET wrote:
> Calling 'list_splice' followed by 'INIT_LIST_HEAD' is equivalent to
> 'list_splice_init'.
> >>>It is not 100% accurate
> >>>
> >>>list_splice(y, z)
> >>>INIT_LIST_HEAD(y)
> >>>
> >>>==>
> >>>
> >>>if (!list_empty(y))
> >>>  __list_splice(y, z, z->next);
> >>>INIT_LIST_HEAD(y)
> >>>
> >>>and not
> >>>
> >>>if (!list_empty(y)) {
> >>>  __list_splice(y, z, z->next);
> >>>  INIT_LIST_HEAD(y)
> >>>}
> >>>
> >>>as list_splice_init will do.
> >>>
> >>You are right but if you dig further you will see that calling
> >>INIT_LIST_HEAD on an empty list is a no-op (AFAIK).
> >>And if this list was not already correctly initialized, then you would have
> >>some other troubles.
> >Thank you for the suggestion,
> >It looks like the code after that can be skipped in case of loop_conns
> >list is empty, the tmp_list will be empty too.
> >
> >174 list_for_each_entry_safe(lc, _lc, &tmp_list, loop_node) {
> >175 WARN_ON(lc->conn->c_passive);
> >176 rds_conn_destroy(lc->conn);
> >177 }
> Yes, but this would require some more code and testing. This function doesn't
> seem to be in a hot path, so I'm not sure that the added complexity would be
> worth it.
> It would require a new 'list_empty()' test and some code rearrangement.
>
> I suppose that testing for emptiness at the beginning or going through a
> list_for_each_entry_safe on an empty list (which will exit immediately and do
> nothing) is more or less the same in terms of speed. So keep the code simple
> and readable.

I would expect one list_empty check at the beginning and return
immediately, but anyway it doesn't matter.
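
The equivalence discussed in this thread can be checked outside the kernel. Below is a minimal user-space sketch with a hand-written list modeled on the kernel's list helpers; that modeling is an assumption, and splice_nonempty() and list_count() are names invented here. It lets one compare list_splice() followed by INIT_LIST_HEAD() against list_splice_init(), including for an empty source list.

```c
#include <assert.h>
#include <stdbool.h>

struct list_head {
	struct list_head *next, *prev;
};

void INIT_LIST_HEAD(struct list_head *h) { h->next = h; h->prev = h; }
bool list_empty(const struct list_head *h) { return h->next == h; }

void list_add_tail(struct list_head *n, struct list_head *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Link the entries of a non-empty `list` between `prev` and `next`. */
static void splice_nonempty(const struct list_head *list,
			    struct list_head *prev, struct list_head *next)
{
	struct list_head *first = list->next;
	struct list_head *last = list->prev;

	first->prev = prev;
	prev->next = first;
	last->next = next;
	next->prev = last;
}

/* Leaves `list` itself untouched, so it still points at moved entries. */
void list_splice(const struct list_head *list, struct list_head *head)
{
	assert(list != head);
	if (!list_empty(list))
		splice_nonempty(list, head, head->next);
}

/* Same, but reinitialises `list` only when something was moved. */
void list_splice_init(struct list_head *list, struct list_head *head)
{
	assert(list != head);
	if (!list_empty(list)) {
		splice_nonempty(list, head, head->next);
		INIT_LIST_HEAD(list);
	}
}

/* Count entries; used only to compare the two variants. */
int list_count(const struct list_head *head)
{
	int n = 0;

	for (const struct list_head *p = head->next; p != head; p = p->next)
		n++;
	return n;
}
```

In this model, reinitialising an already empty, correctly initialised head writes back the values it already holds, which matches the "no-op" remark quoted above.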

>
> CJ

[PATCH] tcp: cwnd does not increase in TCP YeAH

2016-09-04 Thread Artem Germanov


Commit 76174004a0f19785a328f40388e87e982bbf69b9
(tcp: do not slow start when cwnd equals ssthresh)
introduced a regression in TCP YeAH. On a virtual ethernet link with
100ms delay and 1% loss, kernel 4.2 shows a bandwidth of ~500KB/s for a
single TCP connection, while kernel 4.3 and above (including 4.8-rc4)
show ~100KB/s.
This is caused by cwnd stalling when cwnd equals ssthresh. This patch
fixes it by increasing cwnd properly in this case.


Signed-off-by: Artem Germanov 
---
--- net/ipv4/tcp_yeah.c.orig 2016-09-04 09:53:01.0 -0700
+++ net/ipv4/tcp_yeah.c 2016-09-04 09:53:40.0 -0700
@@ -76,7 +76,7 @@ static void tcp_yeah_cong_avoid(struct s
if (!tcp_is_cwnd_limited(sk))
return;

-   if (tp->snd_cwnd <= tp->snd_ssthresh)
+   if (tcp_in_slow_start(tp))
tcp_slow_start(tp, acked);

else if (!yeah->doing_reno_now) {
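
The stall described in this patch can be reproduced with a small model. The sketch below is user-space C, not kernel code. It assumes that slow start caps cwnd at ssthresh (my reading of the referenced commit) and replaces YeAH's real congestion-avoidance growth with a placeholder increment of one. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct tcp_model {
	uint32_t snd_cwnd;
	uint32_t snd_ssthresh;
};

static uint32_t min_u32(uint32_t a, uint32_t b) { return a < b ? a : b; }

/* Assumed slow-start behaviour: grow cwnd by `acked`, but never past
 * ssthresh. Returns the ACKs that were not consumed. */
uint32_t model_slow_start(struct tcp_model *tp, uint32_t acked)
{
	uint32_t cwnd;

	assert(tp->snd_cwnd <= tp->snd_ssthresh);
	cwnd = min_u32(tp->snd_cwnd + acked, tp->snd_ssthresh);
	acked -= cwnd - tp->snd_cwnd;
	tp->snd_cwnd = cwnd;
	return acked;
}

int model_in_slow_start(const struct tcp_model *tp)
{
	return tp->snd_cwnd < tp->snd_ssthresh;
}

/* Old test: cwnd == ssthresh still takes the slow-start branch, which
 * can no longer raise cwnd, so the window stalls. */
void model_cong_avoid_old(struct tcp_model *tp, uint32_t acked)
{
	if (tp->snd_cwnd <= tp->snd_ssthresh)
		model_slow_start(tp, acked);
	else
		tp->snd_cwnd++;	/* placeholder for congestion avoidance */
}

/* New test: cwnd == ssthresh falls through to congestion avoidance. */
void model_cong_avoid_new(struct tcp_model *tp, uint32_t acked)
{
	if (model_in_slow_start(tp))
		model_slow_start(tp, acked);
	else
		tp->snd_cwnd++;	/* placeholder for congestion avoidance */
}
```

Starting both variants at cwnd == ssthresh, the old test leaves cwnd unchanged on every ACK while the new one lets it grow.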

[PATCHv2 iproute2 net-next] nstat: add sctp snmp support

2016-09-04 Thread Hangbin Liu

The SCTP module is not loaded by default, but this should be OK: we do not
load the table if fdopen() fails, and opening the proc file won't load the
SCTP kernel module.

Signed-off-by: Hangbin Liu 
---
 misc/nstat.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/misc/nstat.c b/misc/nstat.c
index 6143719..1cb6c7e 100644
--- a/misc/nstat.c
+++ b/misc/nstat.c
@@ -76,6 +76,11 @@ static int net_snmp6_open(void)
return generic_proc_open("PROC_NET_SNMP6", "net/snmp6");
 }
 
+static int net_sctp_snmp_open(void)
+{
+   return generic_proc_open("PROC_NET_SCTP_SNMP", "net/sctp/snmp");
+}
+
 struct nstat_ent {
struct nstat_ent *next;
char *id;
@@ -247,6 +252,16 @@ static void load_ugly_table(FILE *fp)
}
 }
 
+static void load_sctp_snmp(void)
+{
+   FILE *fp = fdopen(net_sctp_snmp_open(), "r");
+
+   if (fp) {
+   load_good_table(fp);
+   fclose(fp);
+   }
+}
+
 static void load_snmp(void)
 {
FILE *fp = fdopen(net_snmp_open(), "r");
@@ -391,6 +406,7 @@ static void update_db(int interval)
load_netstat();
load_snmp6();
load_snmp();
+   load_sctp_snmp();
 
h = kern_db;
kern_db = n;
@@ -450,6 +466,7 @@ static void server_loop(int fd)
load_netstat();
load_snmp6();
load_snmp();
+   load_sctp_snmp();
 
for (;;) {
int status;
@@ -706,6 +723,7 @@ int main(int argc, char *argv[])
load_netstat();
load_snmp6();
load_snmp();
+   load_sctp_snmp();
if (info_source[0] == 0)
strcpy(info_source, "kernel");
}
-- 
2.5.5

Re: [PATCH iproute2 net-next] nstat: add sctp snmp support

2016-09-04 Thread Hangbin Liu

2016-09-02 18:09 GMT+08:00 Phil Sutter :
> Did you forget to add the load call to update_db(), or am I missing
> something?

Oops, I forgot to add it there. I will send a v2 patch for this issue.

Thanks
Hangbin

> Apart from that, looks nice and clean.
>
> Cheers, Phil

linux-next: manual merge of the net-next tree with the net tree

2016-09-04 Thread Stephen Rothwell

Hi all,

Today's linux-next merge of the net-next tree got a conflict in:

  drivers/net/ethernet/mediatek/mtk_eth_soc.c

between commits:

  d3bd1ce4db8e ("net: ethernet: mediatek: remove redundant free_irq for devm_request_irq allocated irq")
  7c6b0d76fa02 ("net: ethernet: mediatek: fix logic unbalance between probe and remove")

from the net tree and commits:

  45d339309f49 ("net: mediatek: remove unnecessary platform_set_drvdata()")
  bacfd110e059 ("net: ethernet: mediatek: modify to use the PDMA instead of the QDMA for Ethernet RX")

from the net-next tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc drivers/net/ethernet/mediatek/mtk_eth_soc.c
index d9199151a83e,2dadfa961898..
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@@ -334,9 -338,12 +334,10 @@@ static void mtk_mdio_cleanup(struct mtk
return;
  
mdiobus_unregister(eth->mii_bus);
 -  of_node_put(eth->mii_bus->dev.of_node);
 -  mdiobus_free(eth->mii_bus);
  }
  
- static inline void mtk_irq_disable(struct mtk_eth *eth, u32 mask)
+ static inline void mtk_irq_disable(struct mtk_eth *eth,
+  unsigned reg, u32 mask)
  {
unsigned long flags;
u32 val;
@@@ -1501,7 -1513,11 +1508,8 @@@ static void mtk_uninit(struct net_devic
struct mtk_eth *eth = mac->hw;
  
phy_disconnect(mac->phy_dev);
-   mtk_irq_disable(eth, ~0);
 -  mtk_mdio_cleanup(eth);
+   mtk_irq_disable(eth, MTK_QDMA_INT_MASK, ~0);
+   mtk_irq_disable(eth, MTK_PDMA_INT_MASK, ~0);
 -  free_irq(eth->irq[1], dev);
 -  free_irq(eth->irq[2], dev);
  }
  
  static int mtk_do_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
@@@ -1913,8 -1920,6 +1921,7 @@@ static int mtk_remove(struct platform_d
netif_napi_del(&eth->tx_napi);
netif_napi_del(&eth->rx_napi);
mtk_cleanup(eth);
 +  mtk_mdio_cleanup(eth);
-   platform_set_drvdata(pdev, NULL);
  
return 0;
  }

Re: [PATCH v4 4/5] net: stmmac: add a glue driver for the Amlogic Meson 8b / GXBB DWMAC

2016-09-04 Thread kbuild test robot

Hi Martin,

[auto build test ERROR on next-20160825]
[also build test ERROR on v4.8-rc5]
[cannot apply to robh/for-next net-next/master net/master v4.8-rc4 v4.8-rc3 
v4.8-rc2]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]
[Suggest to use git(>=2.9.0) format-patch --base= (or --base=auto for 
convenience) to record what (public, well-known) commit your patch series was 
built on]
[Check https://git-scm.com/docs/git-format-patch for more information]

url:
https://github.com/0day-ci/linux/commits/Martin-Blumenstingl/meson-Meson8b-and-GXBB-DWMAC-glue-driver/20160905-023130
config: tile-allyesconfig (attached as .config)
compiler: tilegx-linux-gcc (GCC) 4.6.2
reproduce:
wget 
https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross
 -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=tile 

All errors (new ones prefixed by >>):

   drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c:63:18: error: field 
'm250_mux' has incomplete type
   drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c:67:21: error: field 
'm250_div' has incomplete type
   drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c:70:21: error: field 
'm25_div' has incomplete type
   drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c: In function 
'meson8b_init_clk':
   drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c:88:23: error: storage 
size of 'init' isn't known
>> drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c:94:30: error: array type 
>> has incomplete element type
   drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c:95:3: error: field name 
not in record or union initializer
   drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c:95:3: error: (near 
initialization for 'clk_25m_div_table')
   drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c:95:3: error: field name 
not in record or union initializer
   drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c:95:3: error: (near 
initialization for 'clk_25m_div_table')
   drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c:96:3: error: field name 
not in record or union initializer
   drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c:96:3: error: (near 
initialization for 'clk_25m_div_table')
   drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c:96:3: error: field name 
not in record or union initializer
   drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c:96:3: error: (near 
initialization for 'clk_25m_div_table')
   drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c:114:4: error: implicit 
declaration of function '__clk_get_name'
   drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c:113:23: warning: 
assignment makes pointer from integer without a cast [enabled by default]
   drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c:120:14: error: 
'clk_mux_ops' undeclared (first use in this function)
   drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c:120:14: note: each 
undeclared identifier is reported only once for each function it appears in
   drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c:132:2: error: implicit 
declaration of function 'devm_clk_register'
   drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c:139:14: error: 
'clk_divider_ops' undeclared (first use in this function)
   drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c:140:15: error: 
'CLK_SET_RATE_PARENT' undeclared (first use in this function)
   drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c:141:21: warning: 
assignment makes pointer from integer without a cast [enabled by default]
   drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c:149:26: error: 
'CLK_DIVIDER_ONE_BASED' undeclared (first use in this function)
   drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c:149:50: error: 
'CLK_DIVIDER_ALLOW_ZERO' undeclared (first use in this function)
   drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c:159:15: error: 
'CLK_IS_BASIC' undeclared (first use in this function)
   drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c:160:21: warning: 
assignment makes pointer from integer without a cast [enabled by default]
   drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c:94:30: warning: unused 
variable 'clk_25m_div_table'
   drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c:88:23: warning: unused 
variable 'init'
   cc1: some warnings being treated as errors

vim +94 drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c

57  struct platform_device  *pdev;
58  
59  void __iomem*regs;
60  
61  phy_interface_t phy_mode;
62  
  > 63  struct clk_mux  m250_mux;
64  struct clk  *m250_mux_clk;
65  struct clk  *m250_mux_parent[MUX_CLK_NUM_PARENTS];
66  
67  struct clk_divider  m250_div;
68  struct clk  *m250_div_clk;
69  
70  struct

Re: [PATCH v4 4/5] net: stmmac: add a glue driver for the Amlogic Meson 8b / GXBB DWMAC

2016-09-04 Thread kbuild test robot

Hi Martin,

[auto build test ERROR on next-20160825]
[also build test ERROR on v4.8-rc5]
[cannot apply to robh/for-next net-next/master net/master v4.8-rc4 v4.8-rc3 
v4.8-rc2]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]
[Suggest to use git(>=2.9.0) format-patch --base= (or --base=auto for 
convenience) to record what (public, well-known) commit your patch series was 
built on]
[Check https://git-scm.com/docs/git-format-patch for more information]

url:
https://github.com/0day-ci/linux/commits/Martin-Blumenstingl/meson-Meson8b-and-GXBB-DWMAC-glue-driver/20160905-023130
config: sparc64-allyesconfig (attached as .config)
compiler: sparc64-linux-gnu-gcc (Debian 5.4.0-6) 5.4.0 20160609
reproduce:
wget 
https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross
 -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=sparc64 

All error/warnings (new ones prefixed by >>):

>> drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c:63:18: error: field 
>> 'm250_mux' has incomplete type
 struct clk_mux  m250_mux;
 ^
>> drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c:67:21: error: field 
>> 'm250_div' has incomplete type
 struct clk_divider m250_div;
^
>> drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c:70:21: error: field 
>> 'm25_div' has incomplete type
 struct clk_divider m25_div;
^
   drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c: In function 
'meson8b_init_clk':
>> drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c:88:23: error: storage 
>> size of 'init' isn't known
 struct clk_init_data init;
  ^
>> drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c:94:30: error: array type 
>> has incomplete element type 'struct clk_div_table'
 static struct clk_div_table clk_25m_div_table[] = {
 ^
>> drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c:95:5: error: field name 
>> not in record or union initializer
  { .val = 0, .div = 5 },
^
   drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c:95:5: note: (near 
initialization for 'clk_25m_div_table')
   drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c:95:15: error: field name 
not in record or union initializer
  { .val = 0, .div = 5 },
  ^
   drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c:95:15: note: (near 
initialization for 'clk_25m_div_table')
   drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c:96:5: error: field name 
not in record or union initializer
  { .val = 1, .div = 10 },
^
   drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c:96:5: note: (near 
initialization for 'clk_25m_div_table')
   drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c:96:15: error: field name 
not in record or union initializer
  { .val = 1, .div = 10 },
  ^
   drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c:96:15: note: (near 
initialization for 'clk_25m_div_table')
>> drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c:114:4: error: implicit 
>> declaration of function '__clk_get_name' 
>> [-Werror=implicit-function-declaration]
   __clk_get_name(dwmac->m250_mux_parent[i]);
   ^
>> drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c:113:23: warning: 
>> assignment makes pointer from integer without a cast [-Wint-conversion]
  mux_parent_names[i] =
  ^
>> drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c:120:14: error: 
>> 'clk_mux_ops' undeclared (first use in this function)
 init.ops = &clk_mux_ops;
 ^
   drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c:120:14: note: each 
undeclared identifier is reported only once for each function it appears in
>> drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c:132:24: error: implicit 
>> declaration of function 'devm_clk_register' 
>> [-Werror=implicit-function-declaration]
 dwmac->m250_mux_clk = devm_clk_register(dev, &dwmac->m250_mux.hw);
   ^
>> drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c:139:14: error: 
>> 'clk_divider_ops' undeclared (first use in this function)
 init.ops = &clk_divider_ops;
 ^
>> drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c:140:15: error: 
>> 'CLK_SET_RATE_PARENT' undeclared (first use in this function)
 init.flags = CLK_SET_RATE_PARENT;
  ^
   drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c:141:21: warning: 
assignment makes pointer from integer without a cast [-Wint-conversion]
 clk_div_parents[0] = __clk_get_name(dwmac->m250_mux_clk);
^
>> drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c:149:26: error: 
>> 'CLK_DIVIDER_ONE_BASED' undeclared (first use in this function)
 dwmac->m250_div.flags = CLK_DIVIDER_ONE_BASED | CLK_DIVIDER_ALLOW_ZERO;

[PATCH v2] net: Don't delete routes in different VRFs

2016-09-04 Thread Mark Tomlinson

When deleting an IP address from an interface, there is a clean-up of
routes which refer to this local address. However, there was no check to
see that the VRF matched. This meant that deletion wasn't confined to
the VRF it should have been.

To solve this, a new field has been added to fib_info to hold a table
id. When removing fib entries corresponding to a local ip address, this
table id is also used in the comparison.

The table id is populated when the fib_info is created. This was already
done in some places, but not in ip_rt_ioctl(). This has now been fixed.

Fixes: 021dd3b8a142 ("net: Add routes to the table associated with the device")
Acked-by: David Ahern 
Tested-by: David Ahern 
Signed-off-by: Mark Tomlinson 
---
 include/net/ip_fib.h | 3 ++-
 net/ipv4/fib_frontend.c  | 3 ++-
 net/ipv4/fib_semantics.c | 8 ++--
 3 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 4079fc1..7d4a72e 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -111,6 +111,7 @@ struct fib_info {
unsigned char   fib_scope;
unsigned char   fib_type;
__be32  fib_prefsrc;
+   u32 fib_tb_id;
u32 fib_priority;
u32 *fib_metrics;
 #define fib_mtu fib_metrics[RTAX_MTU-1]
@@ -319,7 +320,7 @@ void fib_flush_external(struct net *net);
 /* Exported by fib_semantics.c */
 int ip_fib_check_default(__be32 gw, struct net_device *dev);
 int fib_sync_down_dev(struct net_device *dev, unsigned long event, bool force);
-int fib_sync_down_addr(struct net *net, __be32 local);
+int fib_sync_down_addr(struct net_device *dev, __be32 local);
 int fib_sync_up(struct net_device *dev, unsigned int nh_flags);
 
 extern u32 fib_multipath_secret __read_mostly;
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index ef2ebeb..1b25daf 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -509,6 +509,7 @@ static int rtentry_to_fib_config(struct net *net, int cmd, 
struct rtentry *rt,
if (!dev)
return -ENODEV;
cfg->fc_oif = dev->ifindex;
+   cfg->fc_table = l3mdev_fib_table(dev);
if (colon) {
struct in_ifaddr *ifa;
struct in_device *in_dev = __in_dev_get_rtnl(dev);
@@ -1027,7 +1028,7 @@ no_promotions:
 * First of all, we scan fib_info list searching
 * for stray nexthop entries, then ignite fib_flush.
 */
-   if (fib_sync_down_addr(dev_net(dev), ifa->ifa_local))
+   if (fib_sync_down_addr(dev, ifa->ifa_local))
fib_flush(dev_net(dev));
}
}
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index 539fa26..e9f5622 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -1057,6 +1057,7 @@ struct fib_info *fib_create_info(struct fib_config *cfg)
fi->fib_priority = cfg->fc_priority;
fi->fib_prefsrc = cfg->fc_prefsrc;
fi->fib_type = cfg->fc_type;
+   fi->fib_tb_id = cfg->fc_table;
 
fi->fib_nhs = nhs;
change_nexthops(fi) {
@@ -1337,18 +1338,21 @@ nla_put_failure:
  *   referring to it.
  * - device went down -> we must shutdown all nexthops going via it.
  */
-int fib_sync_down_addr(struct net *net, __be32 local)
+int fib_sync_down_addr(struct net_device *dev, __be32 local)
 {
int ret = 0;
unsigned int hash = fib_laddr_hashfn(local);
struct hlist_head *head = &fib_info_laddrhash[hash];
+   struct net *net = dev_net(dev);
+   int tb_id = l3mdev_fib_table(dev);
struct fib_info *fi;
 
if (!fib_info_laddrhash || local == 0)
return 0;
 
hlist_for_each_entry(fi, head, fib_lhash) {
-   if (!net_eq(fi->fib_net, net))
+   if (!net_eq(fi->fib_net, net) ||
+   fi->fib_tb_id != tb_id)
continue;
if (fi->fib_prefsrc == local) {
fi->fib_flags |= RTNH_F_DEAD;
-- 
2.9.3
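
The effect of the extra table-id comparison can be illustrated with a small model. The sketch below is user-space C with made-up types; fib_info_model, MODEL_F_DEAD and sync_down_addr() are hypothetical names, and a flat array stands in for the kernel's hash list. Only entries in the given table that use the removed address as preferred source are marked dead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_F_DEAD 0x1u

struct fib_info_model {
	uint32_t prefsrc;	/* preferred source address */
	uint32_t tb_id;		/* routing table (VRF) the entry belongs to */
	unsigned int flags;
};

/* Mark entries dead whose prefsrc equals `local`, but only within table
 * `tb_id`. Returns the number of entries marked. */
int sync_down_addr(struct fib_info_model *fi, size_t n,
		   uint32_t tb_id, uint32_t local)
{
	int ret = 0;

	assert(fi != NULL || n == 0);

	if (local == 0)
		return 0;

	for (size_t i = 0; i < n; i++) {
		if (fi[i].tb_id != tb_id)
			continue;	/* same address, different VRF */
		if (fi[i].prefsrc == local) {
			fi[i].flags |= MODEL_F_DEAD;
			ret++;
		}
	}
	return ret;
}
```

Without the table check, an entry with the same address in another VRF would also be marked, which is the behaviour the patch removes.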

[net-next PATCH V2] qed: Remove OOM messages

2016-09-04 Thread Joe Perches

These messages are unnecessary as OOM allocation failures already do
a dump_stack() giving more or less the same information.

$ size drivers/net/ethernet/qlogic/qed/built-in.o* (defconfig x86-64)
   text    data     bss     dec     hex filename
 127817   27969   32800  188586   2e0aa drivers/net/ethernet/qlogic/qed/built-in.o.new
 132474   27969   32800  193243   2f2db drivers/net/ethernet/qlogic/qed/built-in.o.old

Miscellanea:

o Change allocs to the generally preferred forms where possible.

Signed-off-by: Joe Perches 
---

V2: Respun against net-next, updated object sizes

 drivers/net/ethernet/qlogic/qed/qed_cxt.c  | 20 +++--
 drivers/net/ethernet/qlogic/qed/qed_dcbx.c | 13 ++
 drivers/net/ethernet/qlogic/qed/qed_dev.c  | 60 +++---
 drivers/net/ethernet/qlogic/qed/qed_hw.c   | 12 ++
 drivers/net/ethernet/qlogic/qed/qed_init_ops.c |  4 +-
 drivers/net/ethernet/qlogic/qed/qed_int.c  | 23 +++---
 drivers/net/ethernet/qlogic/qed/qed_main.c |  4 +-
 drivers/net/ethernet/qlogic/qed/qed_mcp.c  |  1 -
 drivers/net/ethernet/qlogic/qed/qed_spq.c  | 28 +++-
 drivers/net/ethernet/qlogic/qed/qed_sriov.c|  9 ++--
 drivers/net/ethernet/qlogic/qed/qed_vf.c   | 14 ++
 11 files changed, 47 insertions(+), 141 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qed/qed_cxt.c 
b/drivers/net/ethernet/qlogic/qed/qed_cxt.c
index 5476927..dd579b2 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_cxt.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_cxt.c
@@ -792,10 +792,9 @@ static int qed_cxt_src_t2_alloc(struct qed_hwfn *p_hwfn)
p_mngr->t2_num_pages = DIV_ROUND_UP(total_size, psz);
 
/* allocate t2 */
-   p_mngr->t2 = kzalloc(p_mngr->t2_num_pages * sizeof(struct qed_dma_mem),
+   p_mngr->t2 = kcalloc(p_mngr->t2_num_pages, sizeof(struct qed_dma_mem),
 GFP_KERNEL);
if (!p_mngr->t2) {
-   DP_NOTICE(p_hwfn, "Failed to allocate t2 table\n");
rc = -ENOMEM;
goto t2_fail;
}
@@ -957,7 +956,6 @@ static int qed_ilt_shadow_alloc(struct qed_hwfn *p_hwfn)
p_mngr->ilt_shadow = kcalloc(size, sizeof(struct qed_dma_mem),
 GFP_KERNEL);
if (!p_mngr->ilt_shadow) {
-   DP_NOTICE(p_hwfn, "Failed to allocate ilt shadow table\n");
rc = -ENOMEM;
goto ilt_shadow_fail;
}
@@ -1050,10 +1048,8 @@ int qed_cxt_mngr_alloc(struct qed_hwfn *p_hwfn)
u32 i;
 
p_mngr = kzalloc(sizeof(*p_mngr), GFP_KERNEL);
-   if (!p_mngr) {
-   DP_NOTICE(p_hwfn, "Failed to allocate `struct qed_cxt_mngr'\n");
+   if (!p_mngr)
return -ENOMEM;
-   }
 
/* Initialize ILT client registers */
clients = p_mngr->clients;
@@ -1105,24 +1101,18 @@ int qed_cxt_tables_alloc(struct qed_hwfn *p_hwfn)
 
/* Allocate the ILT shadow table */
rc = qed_ilt_shadow_alloc(p_hwfn);
-   if (rc) {
-   DP_NOTICE(p_hwfn, "Failed to allocate ilt memory\n");
+   if (rc)
goto tables_alloc_fail;
-   }
 
/* Allocate the T2  table */
rc = qed_cxt_src_t2_alloc(p_hwfn);
-   if (rc) {
-   DP_NOTICE(p_hwfn, "Failed to allocate T2 memory\n");
+   if (rc)
goto tables_alloc_fail;
-   }
 
/* Allocate and initialize the acquired cids bitmaps */
rc = qed_cid_map_alloc(p_hwfn);
-   if (rc) {
-   DP_NOTICE(p_hwfn, "Failed to allocate cid maps\n");
+   if (rc)
goto tables_alloc_fail;
-   }
 
return 0;
 
diff --git a/drivers/net/ethernet/qlogic/qed/qed_dcbx.c 
b/drivers/net/ethernet/qlogic/qed/qed_dcbx.c
index b900dfb..be7b3dc 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_dcbx.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_dcbx.c
@@ -874,11 +874,8 @@ int qed_dcbx_info_alloc(struct qed_hwfn *p_hwfn)
int rc = 0;
 
p_hwfn->p_dcbx_info = kzalloc(sizeof(*p_hwfn->p_dcbx_info), GFP_KERNEL);
-   if (!p_hwfn->p_dcbx_info) {
-   DP_NOTICE(p_hwfn,
- "Failed to allocate 'struct qed_dcbx_info'\n");
+   if (!p_hwfn->p_dcbx_info)
rc = -ENOMEM;
-   }
 
return rc;
 }
@@ -1176,10 +1173,8 @@ int qed_dcbx_get_config_params(struct qed_hwfn *p_hwfn,
}
 
dcbx_info = kmalloc(sizeof(*dcbx_info), GFP_KERNEL);
-   if (!dcbx_info) {
-   DP_ERR(p_hwfn, "Failed to allocate struct qed_dcbx_info\n");
+   if (!dcbx_info)
return -ENOMEM;
-   }
 
rc = qed_dcbx_query_params(p_hwfn, dcbx_info, QED_DCBX_OPERATIONAL_MIB);
if (rc) {
@@ -1213,10 +1208,8 @@ static struct qed_dcbx_get *qed_dcbnl_get_dcbx(struct 
qed_hwfn *hwfn,
struct qed_dcbx_get *dcbx_info;
 
dcbx_info = kmalloc(sizeof(*dcbx_info), GFP_KERNEL);
-   if (!dcbx_info)
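
One of the "generally preferred forms" used in this patch is kcalloc(n, size, ...) in place of kzalloc(n * size, ...): the two-argument form can reject a multiplication that overflows. A rough user-space analogue follows. zalloc_array() and table_alloc() are hypothetical names, and malloc() stands in for the kernel allocator. The caller reports failure only through its return value, as the patch does.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Zeroed array of `n` elements of `size` bytes, or NULL if n * size
 * would overflow or the allocation fails. */
void *zalloc_array(size_t n, size_t size)
{
	size_t bytes;
	void *p;

	if (size != 0 && n > SIZE_MAX / size)
		return NULL;	/* multiplication would overflow */

	bytes = n * size;
	p = malloc(bytes ? bytes : 1);
	if (p)
		memset(p, 0, bytes);
	return p;
}

/* Allocation wrapper without a log message: the error code is the only
 * report, since the allocator is assumed to log failures itself. */
int table_alloc(void **table, size_t n, size_t size)
{
	assert(table != NULL);

	*table = zalloc_array(n, size);
	if (!*table)
		return -ENOMEM;
	return 0;
}
```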

Re: [PATCH] qed: Remove OOM messages

2016-09-04 Thread Joe Perches

On Sun, 2016-09-04 at 13:28 -0700, David Miller wrote:
> From: Joe Perches 
> Date: Fri,  2 Sep 2016 10:48:47 -0700
> 
> > These messages are unnecessary as OOM allocation failures already do
> > a dump_stack() giving more or less the same information.
> > 
> > $ size drivers/net/ethernet/qlogic/qed/built-in.o* (defconfig x86-64)
> >    text    data     bss     dec     hex filename
> >  126849   27968   32800  187617   2dce1 drivers/net/ethernet/qlogic/qed/built-in.o.new
> >  131506   27968   32800  192274   2ef12 drivers/net/ethernet/qlogic/qed/built-in.o.old
> > 
> > Miscellanea:
> > 
> > o Change allocs to the generally preferred forms where possible.
> > 
> > Signed-off-by: Joe Perches 
> 
> Joe can you respin this against net-next?
> 
> It applies against 'net' but this isn't really a critical bug fix and
> it'll also create conflicts against net-next.

No worries, I'll spin it back after the qed -net changes
are pushed back into net-next.

Oh, they are now.  Give me a minute.

[PATCH net-next 6/9] fs/afs/vlocation: Remove deprecated create_singlethread_workqueue

2016-09-04 Thread David Howells

From: Bhaktipriya Shridhar 

The workqueue "afs_vlocation_update_worker" queues a single work item
&afs_vlocation_update and hence it doesn't require execution ordering.
Hence, alloc_workqueue has been used to replace the deprecated
create_singlethread_workqueue instance.

Since the workqueue is being used on a memory reclaim path, WQ_MEM_RECLAIM
flag has been set to ensure forward progress under memory pressure.

Since there is a fixed number of work items, an explicit concurrency
limit is unnecessary here.

Signed-off-by: Bhaktipriya Shridhar 
Signed-off-by: David Howells 
---

 fs/afs/vlocation.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/afs/vlocation.c b/fs/afs/vlocation.c
index 52976785a32c..45a86396fd2d 100644
--- a/fs/afs/vlocation.c
+++ b/fs/afs/vlocation.c
@@ -594,8 +594,8 @@ static void afs_vlocation_reaper(struct work_struct *work)
  */
 int __init afs_vlocation_update_init(void)
 {
-   afs_vlocation_update_worker =
-   create_singlethread_workqueue("kafs_vlupdated");
+   afs_vlocation_update_worker = alloc_workqueue("kafs_vlupdated",
+ WQ_MEM_RECLAIM, 0);
return afs_vlocation_update_worker ? 0 : -ENOMEM;
 }

[PATCH net-next 4/9] rxrpc: Randomise epoch and starting client conn ID values

2016-09-04 Thread David Howells

Create a random epoch value rather than a time-based one on startup and set
the top bit to indicate that this is the case.

Also create a random starting client connection ID value.  This will be
incremented from here as new client connections are created.

Signed-off-by: David Howells 
---

 include/rxrpc/packet.h |1 +
 net/rxrpc/af_rxrpc.c   |9 -
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/include/rxrpc/packet.h b/include/rxrpc/packet.h
index b2017440b765..3c6128e1fdbe 100644
--- a/include/rxrpc/packet.h
+++ b/include/rxrpc/packet.h
@@ -24,6 +24,7 @@ typedef __be32rxrpc_serial_net_t; /* on-the-wire Rx 
message serial number */
  */
 struct rxrpc_wire_header {
__be32  epoch;  /* client boot timestamp */
+#define RXRPC_RANDOM_EPOCH 0x80000000  /* Random if set, date-based if not */
 
__be32  cid;/* connection and channel ID */
 #define RXRPC_MAXCALLS 4   /* max active calls per 
conn */
diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c
index 32d544995dda..b66a9e6f8d04 100644
--- a/net/rxrpc/af_rxrpc.c
+++ b/net/rxrpc/af_rxrpc.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -700,7 +701,13 @@ static int __init af_rxrpc_init(void)
 
BUILD_BUG_ON(sizeof(struct rxrpc_skb_priv) > FIELD_SIZEOF(struct 
sk_buff, cb));
 
-   rxrpc_epoch = get_seconds();
+   get_random_bytes(&rxrpc_epoch, sizeof(rxrpc_epoch));
+   rxrpc_epoch |= RXRPC_RANDOM_EPOCH;
+   get_random_bytes(&rxrpc_client_conn_ids.cur,
+sizeof(rxrpc_client_conn_ids.cur));
+   rxrpc_client_conn_ids.cur &= 0x3fffffff;
+   if (rxrpc_client_conn_ids.cur == 0)
+   rxrpc_client_conn_ids.cur = 1;
 
ret = -ENOMEM;
rxrpc_call_jar = kmem_cache_create(
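
The bit manipulation in this patch can be tried in user space. The sketch below takes the raw random value as a parameter so the result is reproducible. It assumes a 32-bit epoch whose top bit is 0x80000000 and a 30-bit connection ID mask of 0x3fffffff; the macro and function names are hypothetical, not the kernel's.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_RANDOM_EPOCH	0x80000000u	/* assumed top-bit flag */
#define MODEL_CONN_ID_MASK	0x3fffffffu	/* assumed 30-bit ID space */

/* Turn raw random bits into an epoch flagged as random, not time-based. */
uint32_t make_epoch(uint32_t raw)
{
	return raw | MODEL_RANDOM_EPOCH;
}

/* Turn raw random bits into a starting connection ID: keep the low
 * 30 bits and avoid zero. */
uint32_t make_start_conn_id(uint32_t raw)
{
	uint32_t id = raw & MODEL_CONN_ID_MASK;

	if (id == 0)
		id = 1;

	assert(id != 0 && id <= MODEL_CONN_ID_MASK);
	return id;
}
```

Forcing the flag bit means a random epoch can never be mistaken for a time-based one, at the cost of one bit of randomness.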

[PATCH net-next 7/9] fs/afs/rxrpc: Remove deprecated create_singlethread_workqueue

2016-09-04 Thread David Howells

From: Bhaktipriya Shridhar 

The workqueue "afs_async_calls" queues work item
>async_work per afs_call. Since there could be multiple calls and since
these calls can be run concurrently, alloc_workqueue has been used to replace
the deprecated create_singlethread_workqueue instance.

The WQ_MEM_RECLAIM flag has been set to ensure forward progress under
memory pressure because the workqueue is being used on a memory reclaim
path.

Since there is a fixed number of work items, an explicit concurrency
limit is unnecessary here.

Signed-off-by: Bhaktipriya Shridhar 
Signed-off-by: David Howells 
---

 fs/afs/rxrpc.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/afs/rxrpc.c b/fs/afs/rxrpc.c
index 244896baf241..37608be52abd 100644
--- a/fs/afs/rxrpc.c
+++ b/fs/afs/rxrpc.c
@@ -76,7 +76,7 @@ int afs_open_socket(void)
_enter("");
 
ret = -ENOMEM;
-   afs_async_calls = create_singlethread_workqueue("kafsd");
+   afs_async_calls = alloc_workqueue("kafsd", WQ_MEM_RECLAIM, 0);
if (!afs_async_calls)
goto error_0;

[PATCH net-next 8/9] fs/afs/callback: Remove deprecated create_singlethread_workqueue

2016-09-04 Thread David Howells

From: Bhaktipriya Shridhar 

The workqueue "afs_callback_update_worker" queues multiple work items
viz  >cb_broken_work, >cb_break_work which require strict
execution ordering. Hence, an ordered dedicated workqueue has been used.

Since the workqueue is being used on a memory reclaim path, WQ_MEM_RECLAIM
has been set to ensure forward progress under memory pressure.

Signed-off-by: Bhaktipriya Shridhar 
Signed-off-by: David Howells 
---

 fs/afs/callback.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/afs/callback.c b/fs/afs/callback.c
index 7ef637d7f3a5..1e9d2f84e5b5 100644
--- a/fs/afs/callback.c
+++ b/fs/afs/callback.c
@@ -461,8 +461,8 @@ static void afs_callback_updater(struct work_struct *work)
  */
 int __init afs_callback_update_init(void)
 {
-   afs_callback_update_worker =
-   create_singlethread_workqueue("kafs_callbackd");
+   afs_callback_update_worker = alloc_ordered_workqueue("kafs_callbackd",
+WQ_MEM_RECLAIM);
return afs_callback_update_worker ? 0 : -ENOMEM;
 }

[PATCH net-next 5/9] rxrpc: Don't change the epoch

2016-09-04 Thread David Howells

It seems the local epoch should only be changed on boot, so remove the code
that changes it for client connections.

Signed-off-by: David Howells 
---

 net/rxrpc/conn_client.c |   32 
 1 file changed, 8 insertions(+), 24 deletions(-)

diff --git a/net/rxrpc/conn_client.c b/net/rxrpc/conn_client.c
index e19804dd6c8d..82de1aeaef21 100644
--- a/net/rxrpc/conn_client.c
+++ b/net/rxrpc/conn_client.c
@@ -108,12 +108,12 @@ static DECLARE_DELAYED_WORK(rxrpc_client_conn_reap,
 /*
  * Get a connection ID and epoch for a client connection from the global pool.
  * The connection struct pointer is then recorded in the idr radix tree.  The
- * epoch is changed if this wraps.
+ * epoch doesn't change until the client is rebooted (or, at least, unless the
+ * module is unloaded).
  */
 static int rxrpc_get_client_connection_id(struct rxrpc_connection *conn,
  gfp_t gfp)
 {
-   u32 epoch;
int id;
 
_enter("");
@@ -121,34 +121,18 @@ static int rxrpc_get_client_connection_id(struct rxrpc_connection *conn,
idr_preload(gfp);
spin_lock(&rxrpc_conn_id_lock);
 
-   epoch = rxrpc_epoch;
-
-   /* We could use idr_alloc_cyclic() here, but we really need to know
-* when the thing wraps so that we can advance the epoch.
-*/
-   if (rxrpc_client_conn_ids.cur == 0)
-   rxrpc_client_conn_ids.cur = 1;
-   id = idr_alloc(&rxrpc_client_conn_ids, conn,
-  rxrpc_client_conn_ids.cur, 0x4000, GFP_NOWAIT);
-   if (id < 0) {
-   if (id != -ENOSPC)
-   goto error;
-   id = idr_alloc(&rxrpc_client_conn_ids, conn,
-  1, 0x4000, GFP_NOWAIT);
-   if (id < 0)
-   goto error;
-   epoch++;
-   rxrpc_epoch = epoch;
-   }
-   rxrpc_client_conn_ids.cur = id + 1;
+   id = idr_alloc_cyclic(&rxrpc_client_conn_ids, conn,
+ 1, 0x4000, GFP_NOWAIT);
+   if (id < 0)
+   goto error;
 
spin_unlock(&rxrpc_conn_id_lock);
idr_preload_end();
 
-   conn->proto.epoch = epoch;
+   conn->proto.epoch = rxrpc_epoch;
conn->proto.cid = id << RXRPC_CIDSHIFT;
set_bit(RXRPC_CONN_HAS_IDR, &conn->flags);
-   _leave(" [CID %x:%x]", epoch, conn->proto.cid);
+   _leave(" [CID %x]", conn->proto.cid);
return 0;
 
 error:
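
As a rough illustration of the allocation scheme this patch moves to, here is a minimal userspace sketch. It is not the kernel code: struct id_pool and its helpers are hypothetical stand-ins for the idr and idr_alloc_cyclic(), and the random_start argument stands in for get_random_bytes(). It only shows the behaviour described above: IDs are handed out cyclically from [1, 0x4000), the search resumes after the last ID issued, and the epoch is left alone when the ID space wraps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ID_MIN 1u        /* ID 0 is never handed out */
#define ID_END 0x4000u   /* exclusive upper bound, as in the patch */

/* Hypothetical stand-in for the kernel idr: one "in use" flag per ID. */
struct id_pool {
	bool used[ID_END];
	uint32_t cur;    /* where the next search starts */
	uint32_t epoch;  /* fixed at init time, never advanced on wrap */
};

/* Seed the pool; random_start plays the role of get_random_bytes(). */
static void id_pool_init(struct id_pool *pool, uint32_t epoch,
			 uint32_t random_start)
{
	assert(pool != NULL);
	for (size_t i = 0; i < ID_END; i++)
		pool->used[i] = false;
	pool->epoch = epoch;
	pool->cur = random_start & 0x3fff;
	if (pool->cur == 0)
		pool->cur = ID_MIN;
}

/* Return a free ID, searching cyclically from pool->cur, or -1 if full. */
static int id_pool_alloc_cyclic(struct id_pool *pool)
{
	uint32_t span = ID_END - ID_MIN;

	assert(pool != NULL);
	for (uint32_t n = 0; n < span; n++) {
		uint32_t id = ID_MIN + (pool->cur - ID_MIN + n) % span;

		if (!pool->used[id]) {
			pool->used[id] = true;
			/* Wrap the cursor; the epoch is deliberately untouched. */
			pool->cur = (id + 1 == ID_END) ? ID_MIN : id + 1;
			return (int)id;
		}
	}
	return -1;
}

/* Release an ID so that a later cyclic pass can hand it out again. */
static void id_pool_free(struct id_pool *pool, int id)
{
	assert(pool != NULL && id >= (int)ID_MIN && id < (int)ID_END);
	pool->used[id] = false;
}
```

Because the cursor keeps moving forward, a freed ID is not reused until the search has gone round the whole range, which is the property that makes wrapping tolerable without bumping the epoch.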

[PATCH net-next 9/9] fs/afs/flock: Remove deprecated create_singlethread_workqueue

2016-09-04 Thread David Howells

From: Bhaktipriya Shridhar 

The workqueue "afs_lock_manager" queues work item &vnode->lock_work,
per vnode. Since there can be multiple vnodes and since their work items
can be executed concurrently, alloc_workqueue has been used to replace
the deprecated create_singlethread_workqueue instance.

The WQ_MEM_RECLAIM flag has been set to ensure forward progress under
memory pressure because the workqueue is being used on a memory reclaim
path.

Since there is a fixed number of work items, an explicit concurrency
limit is unnecessary here.

Signed-off-by: Bhaktipriya Shridhar 
Signed-off-by: David Howells 
---

 fs/afs/flock.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/afs/flock.c b/fs/afs/flock.c
index d91a9c9cfbd0..3191dff2c156 100644
--- a/fs/afs/flock.c
+++ b/fs/afs/flock.c
@@ -36,8 +36,8 @@ static int afs_init_lock_manager(void)
if (!afs_lock_manager) {
mutex_lock(&afs_lock_manager_mutex);
if (!afs_lock_manager) {
-   afs_lock_manager =
-   create_singlethread_workqueue("kafs_lockd");
+   afs_lock_manager = alloc_workqueue("kafs_lockd",
+  WQ_MEM_RECLAIM, 0);
if (!afs_lock_manager)
ret = -ENOMEM;
}

[PATCH net-next 1/3] rxrpc: Split sendmsg from packet transmission code

2016-09-04 Thread David Howells

Split the sendmsg code from the packet transmission code (mostly to be
found in output.c).

Signed-off-by: David Howells 
---

 net/rxrpc/Makefile  |1 
 net/rxrpc/ar-internal.h |9 -
 net/rxrpc/misc.c|5 
 net/rxrpc/output.c  |  630 --
 net/rxrpc/sendmsg.c |  645 +++
 5 files changed, 657 insertions(+), 633 deletions(-)
 create mode 100644 net/rxrpc/sendmsg.c

diff --git a/net/rxrpc/Makefile b/net/rxrpc/Makefile
index 10f3f48a16a8..8fc6ea347182 100644
--- a/net/rxrpc/Makefile
+++ b/net/rxrpc/Makefile
@@ -22,6 +22,7 @@ af-rxrpc-y := \
peer_object.o \
recvmsg.o \
security.o \
+   sendmsg.o \
skbuff.o \
utils.o
 
diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index 4e86d248dc5e..464dfda2a995 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -814,6 +814,7 @@ extern unsigned int rxrpc_idle_ack_delay;
 extern unsigned int rxrpc_rx_window_size;
 extern unsigned int rxrpc_rx_mtu;
 extern unsigned int rxrpc_rx_jumbo_max;
+extern unsigned int rxrpc_resend_timeout;
 
 extern const char *const rxrpc_pkts[];
 extern const s8 rxrpc_ack_priority[];
@@ -823,10 +824,7 @@ extern const char *rxrpc_acks(u8 reason);
 /*
  * output.c
  */
-extern unsigned int rxrpc_resend_timeout;
-
 int rxrpc_send_data_packet(struct rxrpc_connection *, struct sk_buff *);
-int rxrpc_do_sendmsg(struct rxrpc_sock *, struct msghdr *, size_t);
 
 /*
  * peer_event.c
@@ -888,6 +886,11 @@ int __init rxrpc_init_security(void);
 void rxrpc_exit_security(void);
 int rxrpc_init_client_conn_security(struct rxrpc_connection *);
 int rxrpc_init_server_conn_security(struct rxrpc_connection *);
+ 
+/*
+ * sendmsg.c
+ */
+int rxrpc_do_sendmsg(struct rxrpc_sock *, struct msghdr *, size_t);
 
 /*
  * skbuff.c
diff --git a/net/rxrpc/misc.c b/net/rxrpc/misc.c
index bdc5e42fe600..39e7cc37c392 100644
--- a/net/rxrpc/misc.c
+++ b/net/rxrpc/misc.c
@@ -64,6 +64,11 @@ unsigned int rxrpc_rx_mtu = 5692;
  */
 unsigned int rxrpc_rx_jumbo_max = 4;
 
+/*
+ * Time till packet resend (in jiffies).
+ */
+unsigned int rxrpc_resend_timeout = 4 * HZ;
+
 const char *const rxrpc_pkts[] = {
"?00",
"DATA", "ACK", "BUSY", "ABORT", "ACKALL", "CHALL", "RESP", "DEBUG",
diff --git a/net/rxrpc/output.c b/net/rxrpc/output.c
index 817ae801e769..5b5508f6fc2a 100644
--- a/net/rxrpc/output.c
+++ b/net/rxrpc/output.c
@@ -14,299 +14,12 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
 #include "ar-internal.h"
 
 /*
- * Time till packet resend (in jiffies).
- */
-unsigned int rxrpc_resend_timeout = 4 * HZ;
-
-static int rxrpc_send_data(struct rxrpc_sock *rx,
-  struct rxrpc_call *call,
-  struct msghdr *msg, size_t len);
-
-/*
- * extract control messages from the sendmsg() control buffer
- */
-static int rxrpc_sendmsg_cmsg(struct msghdr *msg,
- unsigned long *user_call_ID,
- enum rxrpc_command *command,
- u32 *abort_code,
- bool *_exclusive)
-{
-   struct cmsghdr *cmsg;
-   bool got_user_ID = false;
-   int len;
-
-   *command = RXRPC_CMD_SEND_DATA;
-
-   if (msg->msg_controllen == 0)
-   return -EINVAL;
-
-   for_each_cmsghdr(cmsg, msg) {
-   if (!CMSG_OK(msg, cmsg))
-   return -EINVAL;
-
-   len = cmsg->cmsg_len - CMSG_ALIGN(sizeof(struct cmsghdr));
-   _debug("CMSG %d, %d, %d",
-  cmsg->cmsg_level, cmsg->cmsg_type, len);
-
-   if (cmsg->cmsg_level != SOL_RXRPC)
-   continue;
-
-   switch (cmsg->cmsg_type) {
-   case RXRPC_USER_CALL_ID:
-   if (msg->msg_flags & MSG_CMSG_COMPAT) {
-   if (len != sizeof(u32))
-   return -EINVAL;
-   *user_call_ID = *(u32 *) CMSG_DATA(cmsg);
-   } else {
-   if (len != sizeof(unsigned long))
-   return -EINVAL;
-   *user_call_ID = *(unsigned long *)
-   CMSG_DATA(cmsg);
-   }
-   _debug("User Call ID %lx", *user_call_ID);
-   got_user_ID = true;
-   break;
-
-   case RXRPC_ABORT:
-   if (*command != RXRPC_CMD_SEND_DATA)
-   return -EINVAL;
-   *command = RXRPC_CMD_SEND_ABORT;
-   if (len != sizeof(*abort_code))
-   return -EINVAL;
-   *abort_code = *(unsigned int *) CMSG_DATA(cmsg);
-

[PATCH net-next 3/3] rxrpc: Move enum rxrpc_command to sendmsg.c

2016-09-04 Thread David Howells

Move enum rxrpc_command to sendmsg.c as it's now only used in that file.

Signed-off-by: David Howells 
---

 net/rxrpc/ar-internal.h |7 ---
 net/rxrpc/sendmsg.c |7 +++
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index 464dfda2a995..bb342f5fe7e4 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -131,13 +131,6 @@ struct rxrpc_skb_priv {
 
 #define rxrpc_skb(__skb) ((struct rxrpc_skb_priv *) &(__skb)->cb)
 
-enum rxrpc_command {
-   RXRPC_CMD_SEND_DATA,    /* send data message */
-   RXRPC_CMD_SEND_ABORT,   /* request abort generation */
-   RXRPC_CMD_ACCEPT,       /* [server] accept incoming call */
-   RXRPC_CMD_REJECT_BUSY,  /* [server] reject a call as busy */
-};
-
 /*
  * RxRPC security module interface
  */
diff --git a/net/rxrpc/sendmsg.c b/net/rxrpc/sendmsg.c
index 17a9ebbc2346..7376794a0308 100644
--- a/net/rxrpc/sendmsg.c
+++ b/net/rxrpc/sendmsg.c
@@ -20,6 +20,13 @@
 #include 
 #include "ar-internal.h"
 
+enum rxrpc_command {
+   RXRPC_CMD_SEND_DATA,    /* send data message */
+   RXRPC_CMD_SEND_ABORT,   /* request abort generation */
+   RXRPC_CMD_ACCEPT,       /* [server] accept incoming call */
+   RXRPC_CMD_REJECT_BUSY,  /* [server] reject a call as busy */
+};
+
 /*
  * wait for space to appear in the transmit/ACK window
  * - caller holds the socket locked

[PATCH net-next 3/9] rxrpc: The client call state must be changed before attachment to conn

2016-09-04 Thread David Howells

We must set the client call state to RXRPC_CALL_CLIENT_SEND_REQUEST before
attaching the call to the connection struct, not after, as it's liable to
receive errors and conn aborts as soon as the assignment is made - and
these will cause its state to be changed outside of the initiating thread's
control.

Signed-off-by: David Howells 
---

 net/rxrpc/call_object.c |2 --
 net/rxrpc/conn_client.c |4 
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/rxrpc/call_object.c b/net/rxrpc/call_object.c
index 57e00fc9cff2..65691742199b 100644
--- a/net/rxrpc/call_object.c
+++ b/net/rxrpc/call_object.c
@@ -197,8 +197,6 @@ static int rxrpc_begin_client_call(struct rxrpc_call *call,
if (ret < 0)
return ret;
 
-   call->state = RXRPC_CALL_CLIENT_SEND_REQUEST;
-
spin_lock(&call->conn->params.peer->lock);
hlist_add_head(&call->error_link, &call->conn->params.peer->error_targets);
spin_unlock(&call->conn->params.peer->lock);
diff --git a/net/rxrpc/conn_client.c b/net/rxrpc/conn_client.c
index 4b213bc0f554..e19804dd6c8d 100644
--- a/net/rxrpc/conn_client.c
+++ b/net/rxrpc/conn_client.c
@@ -537,6 +537,10 @@ static void rxrpc_activate_one_channel(struct rxrpc_connection *conn,
 struct rxrpc_call, chan_wait_link);
u32 call_id = chan->call_counter + 1;
 
+   write_lock_bh(&call->state_lock);
+   call->state = RXRPC_CALL_CLIENT_SEND_REQUEST;
+   write_unlock_bh(&call->state_lock);
+
rxrpc_see_call(call);
list_del_init(&call->chan_wait_link);
conn->active_chans |= 1 << channel;
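
To illustrate the ordering argument in the commit message, here is a small userspace sketch. It deliberately swaps the kernel's state_lock for C11 atomics, and struct call, struct channel, activate_channel() and abort_channel() are hypothetical names invented for the example. The point it tries to show is only this: once the call pointer is visible through the channel, an error path may change the call state at any time, so the initial state has to be written before the pointer is published, or it could overwrite the error state.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

enum call_state {
	CALL_UNINITIALISED,
	CALL_CLIENT_SEND_REQUEST,
	CALL_ABORTED,
};

/* Hypothetical, heavily reduced versions of the call and channel structs. */
struct call {
	_Atomic int state;
};

struct channel {
	_Atomic(struct call *) call;   /* NULL until a call is attached */
};

/* Set the state first, then publish the call on the channel. */
static void activate_channel(struct channel *chan, struct call *call)
{
	assert(chan != NULL && call != NULL);
	atomic_store_explicit(&call->state, CALL_CLIENT_SEND_REQUEST,
			      memory_order_relaxed);
	/* Release store: anyone who sees the pointer also sees the state. */
	atomic_store_explicit(&chan->call, call, memory_order_release);
}

/*
 * What an asynchronous error or conn-abort handler might do.  It can only
 * reach calls that have already been published on the channel.
 * Returns 1 if a call was aborted, 0 if the channel was empty.
 */
static int abort_channel(struct channel *chan)
{
	struct call *call;

	assert(chan != NULL);
	call = atomic_load_explicit(&chan->call, memory_order_acquire);
	if (call == NULL)
		return 0;
	atomic_store_explicit(&call->state, CALL_ABORTED,
			      memory_order_relaxed);
	return 1;
}
```

With the two stores in activate_channel() the other way round, an abort arriving between them would be silently replaced by CALL_CLIENT_SEND_REQUEST, which is the kind of state change outside the initiating thread's control that the patch guards against.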

[PATCH net-next 2/3] rxrpc: Rearrange net/rxrpc/sendmsg.c

2016-09-04 Thread David Howells

Rearrange net/rxrpc/sendmsg.c to be in a more logical order.  This makes it
easier to follow and eliminates forward declarations.

Signed-off-by: David Howells 
---

 net/rxrpc/sendmsg.c | 1006 +--
 1 file changed, 501 insertions(+), 505 deletions(-)

diff --git a/net/rxrpc/sendmsg.c b/net/rxrpc/sendmsg.c
index ff3e28ddc6d8..17a9ebbc2346 100644
--- a/net/rxrpc/sendmsg.c
+++ b/net/rxrpc/sendmsg.c
@@ -20,626 +20,622 @@
 #include 
 #include "ar-internal.h"
 
-static int rxrpc_send_data(struct rxrpc_sock *rx,
-  struct rxrpc_call *call,
-  struct msghdr *msg, size_t len);
-
 /*
- * extract control messages from the sendmsg() control buffer
+ * wait for space to appear in the transmit/ACK window
+ * - caller holds the socket locked
  */
-static int rxrpc_sendmsg_cmsg(struct msghdr *msg,
- unsigned long *user_call_ID,
- enum rxrpc_command *command,
- u32 *abort_code,
- bool *_exclusive)
+static int rxrpc_wait_for_tx_window(struct rxrpc_sock *rx,
+   struct rxrpc_call *call,
+   long *timeo)
 {
-   struct cmsghdr *cmsg;
-   bool got_user_ID = false;
-   int len;
-
-   *command = RXRPC_CMD_SEND_DATA;
-
-   if (msg->msg_controllen == 0)
-   return -EINVAL;
-
-   for_each_cmsghdr(cmsg, msg) {
-   if (!CMSG_OK(msg, cmsg))
-   return -EINVAL;
-
-   len = cmsg->cmsg_len - CMSG_ALIGN(sizeof(struct cmsghdr));
-   _debug("CMSG %d, %d, %d",
-  cmsg->cmsg_level, cmsg->cmsg_type, len);
-
-   if (cmsg->cmsg_level != SOL_RXRPC)
-   continue;
+   DECLARE_WAITQUEUE(myself, current);
+   int ret;
 
-   switch (cmsg->cmsg_type) {
-   case RXRPC_USER_CALL_ID:
-   if (msg->msg_flags & MSG_CMSG_COMPAT) {
-   if (len != sizeof(u32))
-   return -EINVAL;
-   *user_call_ID = *(u32 *) CMSG_DATA(cmsg);
-   } else {
-   if (len != sizeof(unsigned long))
-   return -EINVAL;
-   *user_call_ID = *(unsigned long *)
-   CMSG_DATA(cmsg);
-   }
-   _debug("User Call ID %lx", *user_call_ID);
-   got_user_ID = true;
-   break;
+   _enter(",{%d},%ld",
+  CIRC_SPACE(call->acks_head, ACCESS_ONCE(call->acks_tail),
+ call->acks_winsz),
+  *timeo);
 
-   case RXRPC_ABORT:
-   if (*command != RXRPC_CMD_SEND_DATA)
-   return -EINVAL;
-   *command = RXRPC_CMD_SEND_ABORT;
-   if (len != sizeof(*abort_code))
-   return -EINVAL;
-   *abort_code = *(unsigned int *) CMSG_DATA(cmsg);
-   _debug("Abort %x", *abort_code);
-   if (*abort_code == 0)
-   return -EINVAL;
-   break;
+   add_wait_queue(&call->waitq, &myself);
 
-   case RXRPC_ACCEPT:
-   if (*command != RXRPC_CMD_SEND_DATA)
-   return -EINVAL;
-   *command = RXRPC_CMD_ACCEPT;
-   if (len != 0)
-   return -EINVAL;
+   for (;;) {
+   set_current_state(TASK_INTERRUPTIBLE);
+   ret = 0;
+   if (CIRC_SPACE(call->acks_head, ACCESS_ONCE(call->acks_tail),
+  call->acks_winsz) > 0)
break;
-
-   case RXRPC_EXCLUSIVE_CALL:
-   *_exclusive = true;
-   if (len != 0)
-   return -EINVAL;
+   if (signal_pending(current)) {
+   ret = sock_intr_errno(*timeo);
break;
-   default:
-   return -EINVAL;
}
+
+   release_sock(&rx->sk);
+   *timeo = schedule_timeout(*timeo);
+   lock_sock(&rx->sk);
}
 
-   if (!got_user_ID)
-   return -EINVAL;
-   _leave(" = 0");
-   return 0;
+   remove_wait_queue(&call->waitq, &myself);
+   set_current_state(TASK_RUNNING);
+   _leave(" = %d", ret);
+   return ret;
 }
 
 /*
- * abort a call, sending an ABORT packet to the peer
+ * attempt to schedule an instant Tx resend
  */
-static void rxrpc_send_abort(struct rxrpc_call *call, u32 abort_code)
+static inline void

[PATCH net-next 0/3] rxrpc: Split output code from sendmsg code

2016-09-04 Thread David Howells


Here's a set of small patches that split the packet transmission code from
the sendmsg code and simply rearrange the new file to make it more
logically laid out ready for being rewritten.  An enum is also moved out of
the header file to there as it's only used there.  This needs to be applied
on top of the just-posted fixes patch set.

The patches can be found here also (non-terminally on the branch):


http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=rxrpc-rewrite

Tagged thusly:

git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git
rxrpc-rewrite-20160904-2

David
---
David Howells (3):
  rxrpc: Split sendmsg from packet transmission code
  rxrpc: Rearrange net/rxrpc/sendmsg.c
  rxrpc: Move enum rxrpc_command to sendmsg.c


 net/rxrpc/Makefile  |1 
 net/rxrpc/ar-internal.h |   16 -
 net/rxrpc/misc.c|5 
 net/rxrpc/output.c  |  630 --
 net/rxrpc/sendmsg.c |  648 +++
 5 files changed, 660 insertions(+), 640 deletions(-)
 create mode 100644 net/rxrpc/sendmsg.c

[PATCH net-next 1/9] rxrpc: fix undefined behavior in rxrpc_mark_call_released

2016-09-04 Thread David Howells

From: Arnd Bergmann 

gcc -Wmaybe-uninitialized correctly points out a newly introduced bug
through which we can end up calling rxrpc_queue_call() for a dead
connection:

net/rxrpc/call_object.c: In function 'rxrpc_mark_call_released':
net/rxrpc/call_object.c:600:5: error: 'sched' may be used uninitialized in this function [-Werror=maybe-uninitialized]

This sets the 'sched' variable to zero to restore the previous
behavior.

Signed-off-by: Arnd Bergmann 
Fixes: f5c17aaeb2ae ("rxrpc: Calls should only have one terminal state")
Signed-off-by: David Howells 
---

 net/rxrpc/call_object.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/rxrpc/call_object.c b/net/rxrpc/call_object.c
index 516d8ea82f02..57e00fc9cff2 100644
--- a/net/rxrpc/call_object.c
+++ b/net/rxrpc/call_object.c
@@ -586,7 +586,7 @@ static void rxrpc_dead_call_expired(unsigned long _call)
  */
 static void rxrpc_mark_call_released(struct rxrpc_call *call)
 {
-   bool sched;
+   bool sched = false;
 
rxrpc_see_call(call);
write_lock(&call->state_lock);

[PATCH net-next 2/9] rxrpc: Fix uninitialised variable warning

2016-09-04 Thread David Howells

Fix the following uninitialised variable warning:

../net/rxrpc/call_event.c: In function 'rxrpc_process_call':
../net/rxrpc/call_event.c:879:58: warning: 'error' may be used uninitialized in this function [-Wmaybe-uninitialized]
_debug("post net error %d", error);
  ^

Signed-off-by: David Howells 
---

 net/rxrpc/call_event.c |5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c
index de72de662044..4754c7fb6242 100644
--- a/net/rxrpc/call_event.c
+++ b/net/rxrpc/call_event.c
@@ -868,7 +868,6 @@ skip_msg_init:
/* deal with events of a final nature */
if (test_bit(RXRPC_CALL_EV_RCVD_ERROR, &call->events)) {
enum rxrpc_skb_mark mark;
-   int error;
 
clear_bit(RXRPC_CALL_EV_CONN_ABORT, &call->events);
clear_bit(RXRPC_CALL_EV_REJECT_BUSY, &call->events);
@@ -876,10 +875,10 @@ skip_msg_init:
 
if (call->completion == RXRPC_CALL_NETWORK_ERROR) {
mark = RXRPC_SKB_MARK_NET_ERROR;
-   _debug("post net error %d", error);
+   _debug("post net error %d", call->error);
} else {
mark = RXRPC_SKB_MARK_LOCAL_ERROR;
-   _debug("post net local error %d", error);
+   _debug("post net local error %d", call->error);
}
 
if (rxrpc_post_message(call, mark, call->error, true) < 0)

[PATCH net-next 0/9] rxrpc: Small fixes

2016-09-04 Thread David Howells


Here's a set of small fix patches:

 (1) Fix some uninitialised variables.

 (2) Set the client call state before making it live by attaching it to the
 conn struct.

 (3) Randomise the epoch and starting client conn ID values, and don't
 change the epoch when the client conn ID rolls round.

 (4) Replace deprecated create_singlethread_workqueue() calls.

The patches can be found here also (non-terminally on the branch):


http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=rxrpc-rewrite

Tagged thusly:

git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git
rxrpc-rewrite-20160904-1

David
---
Arnd Bergmann (1):
  rxrpc: fix undefined behavior in rxrpc_mark_call_released

Bhaktipriya Shridhar (4):
  fs/afs/vlocation: Remove deprecated create_singlethread_workqueue
  fs/afs/rxrpc: Remove deprecated create_singlethread_workqueue
  fs/afs/callback: Remove deprecated create_singlethread_workqueue
  fs/afs/flock: Remove deprecated create_singlethread_workqueue

David Howells (4):
  rxrpc: Fix uninitialised variable warning
  rxrpc: The client call state must be changed before attachment to conn
  rxrpc: Randomise epoch and starting client conn ID values
  rxrpc: Don't change the epoch


 fs/afs/callback.c   |4 ++--
 fs/afs/flock.c  |4 ++--
 fs/afs/rxrpc.c  |2 +-
 fs/afs/vlocation.c  |4 ++--
 include/rxrpc/packet.h  |1 +
 net/rxrpc/af_rxrpc.c|9 -
 net/rxrpc/call_event.c  |5 ++---
 net/rxrpc/call_object.c |4 +---
 net/rxrpc/conn_client.c |   36 
 9 files changed, 31 insertions(+), 38 deletions(-)

Re: [PATCH 2/2] af_unix: split 'u->readlock' into two: 'iolock' and 'bindlock'

2016-09-04 Thread David Miller

From: Linus Torvalds 
Date: Fri, 2 Sep 2016 11:13:09 -0700 (PDT)

> 
> From: Linus Torvalds 
> Date: Thu, 1 Sep 2016 14:43:53 -0700
> Subject: [PATCH 2/2] af_unix: split 'u->readlock' into two: 'iolock' and 'bindlock'
> 
> Right now we use the 'readlock' both for protecting some of the af_unix
> IO path and for making the bind be single-threaded.
> 
> The two are independent, but using the same lock makes for a nasty
> deadlock due to ordering with regards to filesystem locking.  The bind
> locking would want to nest outside the VFS pathname locking, but the IO
> locking wants to nest inside some of those same locks.
> 
> We tried to fix this earlier with commit c845acb324aa ("af_unix: Fix
> splice-bind deadlock") which moved the readlock inside the vfs locks,
> but that caused problems with overlayfs that will then call back into
> filesystem routines that take the lock in the wrong order anyway.
> 
> Splitting the locks means that we can go back to having the bind lock be
> the outermost lock, and we don't have any deadlocks with lock ordering.
> 
> Acked-by: Rainer Weikusat 
> Acked-by: Al Viro 
> Signed-off-by: Linus Torvalds 

Applied.

Re: [PATCH 1/2] Revert "af_unix: Fix splice-bind deadlock"

2016-09-04 Thread David Miller

From: Linus Torvalds 
Date: Fri, 2 Sep 2016 11:09:23 -0700 (PDT)

> 
> From: Linus Torvalds 
> Date: Thu, 1 Sep 2016 14:56:49 -0700
> Subject: [PATCH 1/2] Revert "af_unix: Fix splice-bind deadlock"
> 
> This reverts commit c845acb324aa85a39650a14e7696982ceea75dc1.
> 
> It turns out that it just replaces one deadlock with another one: we can
> still get the wrong lock ordering with the readlock due to overlayfs
> calling back into the filesystem layer and still taking the vfs locks
> after the readlock.
> 
> The proper solution ends up being to just split the readlock into two
> pieces: the bind lock (taken *outside* the vfs locks) and the IO lock
> (taken *inside* the filesystem locks).  The two locks are independent
> anyway.
> 
> Signed-off-by: Linus Torvalds 

Applied.

Re: [PATCH] qed: Remove OOM messages

2016-09-04 Thread David Miller

From: Joe Perches 
Date: Fri,  2 Sep 2016 10:48:47 -0700

> These messages are unnecessary as OOM allocation failures already do
> a dump_stack() giving more or less the same information.
> 
> $ size drivers/net/ethernet/qlogic/qed/built-in.o* (defconfig x86-64)
>    text    data     bss     dec     hex filename
>  126849   27968   32800  187617   2dce1 drivers/net/ethernet/qlogic/qed/built-in.o.new
>  131506   27968   32800  192274   2ef12 drivers/net/ethernet/qlogic/qed/built-in.o.old
> 
> Miscellanea:
> 
> o Change allocs to the generally preferred forms where possible.
> 
> Signed-off-by: Joe Perches 

Joe can you respin this against net-next?

It applies against 'net' but this isn't really a critical bug fix and
it'll also create conflicts against net-next.

Thanks.

Re: [Patch net-next 2/2] netns: avoid disabling irq for netns id

2016-09-04 Thread Nicolas Dichtel

On 02/09/2016 at 19:24, Cong Wang wrote:
> On Fri, Sep 2, 2016 at 9:39 AM, Cong Wang wrote:
>> On Fri, Sep 2, 2016 at 1:12 AM, Nicolas Dichtel wrote:
>>> On 02/09/2016 at 06:53, Cong Wang wrote:
>>>> We never read or change netns id in hardirq context,
>>>> the only place we read netns id in softirq context
>>>> is in vxlan_xmit(). So, it should be enough to just
>>>> disable BH.
>>>
>>> Are you sure? Did you audit all parts of the code?
>>
>> I did audit all the callers, and I didn't find any of them in IRQ context.
>>
>>> peernet2id() is called from netlink core system (do_one_broadcast()). Are you
>>> sure that no driver calls this function from a hard irq context?
>>
>> I audited all callers of netlink_broadcast(), and I don't see any of
>> them in hardirq context.
> 
> Note, you can rule out most of them by checking GFP_KERNEL,
> which indicates a process context. ;) For GFP_ATOMIC cases,
> I don't see any of them in hardirq context either, but I am definitely
> not familiar with drivers like infiniband.
> 
Yes, I was thinking of that one. But I'm also not familiar with it ;-)

Re: [PATCH] net: ti: cpmac: Fix compiler warning due to type confusion

2016-09-04 Thread David Miller

From: Paul Burton 
Date: Fri, 2 Sep 2016 15:22:48 +0100

> cpmac_start_xmit() used the max() macro on skb->len (an unsigned int)
> and ETH_ZLEN (a signed int literal). This led to the following compiler
> warning:
> 
>   In file included from include/linux/list.h:8:0,
>from include/linux/module.h:9,
>from drivers/net/ethernet/ti/cpmac.c:19:
>   drivers/net/ethernet/ti/cpmac.c: In function 'cpmac_start_xmit':
>   include/linux/kernel.h:748:17: warning: comparison of distinct pointer
>   types lacks a cast
> (void) (&_max1 == &_max2);  \
>^
>   drivers/net/ethernet/ti/cpmac.c:560:8: note: in expansion of macro 'max'
> len = max(skb->len, ETH_ZLEN);
>   ^
> 
> On top of this, it assigned the result of the max() macro to a signed
> integer whilst all further uses of it result in it being cast to varying
> widths of unsigned integer.
> 
> Fix this up by using max_t to ensure the comparison is performed as
> unsigned integers, and for consistency change the type of the len
> variable to unsigned int.
> 
> Signed-off-by: Paul Burton 

Applied.
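
For readers who have not run into this warning before, the following userspace sketch shows the idea behind the fix. It is an assumption-laden simplification, not the kernel's implementation: MAX_T() is a plain-C approximation of max_t() (it evaluates its arguments more than once, which the kernel macro avoids), padded_len() is a hypothetical helper, and ETH_ZLEN is defined locally. The kernel's max() rejects operands of different types via the pointer comparison visible in the warning; max_t() sidesteps that by casting both operands to one named type first.

```c
#include <assert.h>

#define ETH_ZLEN 60   /* minimum frame length in octets, defined locally here */

/*
 * Simplified stand-in for max_t(): cast both operands to 'type' before
 * comparing, so a signed literal and an unsigned length compare as the
 * same type.  Unlike the kernel macro this evaluates a and b twice.
 */
#define MAX_T(type, a, b) \
	((type)(a) > (type)(b) ? (type)(a) : (type)(b))

/* Hypothetical helper: length to transmit, padded up to ETH_ZLEN. */
static unsigned int padded_len(unsigned int skb_len)
{
	unsigned int len = MAX_T(unsigned int, skb_len, ETH_ZLEN);

	assert(len >= ETH_ZLEN);
	return len;
}
```

Keeping len as unsigned int, as the patch does, also means the later uses of the value no longer need to convert it back from a signed type.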

Re: [PATCH] net: lantiq_etop: Remove unused 'i' variable

2016-09-04 Thread David Miller

From: Paul Burton 
Date: Fri, 2 Sep 2016 15:26:54 +0100

> Commit e7f4dc3536a4 ("mdio: Move allocation of interrupts into core")
> removed the only use of the 'i' variable from ltq_etop_mdio_init() but
> left the variable declaration behind, leading to the following compiler
> warning:
> 
>   drivers/net/ethernet/lantiq_etop.c: In function 'ltq_etop_mdio_init':
>   drivers/net/ethernet/lantiq_etop.c:414:6: warning: unused variable 'i' [-Wunused-variable]
> int i;
> ^
> 
> Fix this by removing the declaration of the 'i' variable.
> 
> Signed-off-by: Paul Burton 

This appears to already be fixed in net-next

Re: [Patch v5 0/2] net: ethernet: xilinx: mac addr and mips

2016-09-04 Thread David Miller

From: Zubair Lutfullah Kakakhel 
Date: Fri, 2 Sep 2016 12:39:24 +0100

> A couple of simple patches to generate a random MAC address
> if none is found, and to enable the driver for MIPS.
> 
> Based on v4.8-rc4.
> 
> These were part of a larger series but that series is growing
> wildly. Splitting and submitting the net subsystem patches separately.
> Hence the v5.

This doesn't apply cleanly to any of my trees.

Re: [PATCH net-next] cxgb4: Add support for ndo_get_vf_config

2016-09-04 Thread David Miller

From: Hariprasad Shenai 
Date: Fri,  2 Sep 2016 19:13:53 +0530

> Add support for ndo_get_vf_config, and also fill in the default MAC
> address that will be provided to the VF by firmware in case the user
> doesn't provide one, so that the user can also get the default MAC
> address through ndo_get_vf_config.
> 
> Signed-off-by: Hariprasad Shenai 

Applied.

Re: [PATCH net 0/2] vxlan: fix error reporting

2016-09-04 Thread David Miller

From: Jiri Benc 
Date: Fri,  2 Sep 2016 13:37:10 +0200

> This patchset improves checking for invalid configuration in VXLAN and fixes
> problems with duplicated and inappropriate error messages.

Series applied, thanks.

Re: [PATCH net] bonding: Fix bonding crash

2016-09-04 Thread David Miller

From: Mahesh Bandewar 
Date: Thu,  1 Sep 2016 22:18:34 -0700

> From: Mahesh Bandewar 
> 
> The following few steps will crash the kernel:
> 
>   (a) Create bonding master
>   > modprobe bonding miimon=50
>   (b) Create macvlan bridge on eth2
>   > ip link add link eth2 dev mvl0 address aa:0:0:0:0:01 \
>  type macvlan
>   (c) Now try adding eth2 into the bond
>   > echo +eth2 > /sys/class/net/bond0/bonding/slaves
>   
> 
> Bonding does lots of things before checking if the device enslaved is
> busy or not.
> 
> In this case when the notifier call-chain sends notifications, the
> bond_netdev_event() assumes that the rx_handler /rx_handler_data is
> registered while the bond_enslave() hasn't progressed far enough to
> register rx_handler for the new slave.
> 
> This patch adds a rx_handler check that can be performed right at the
> beginning of the enslave code to avoid getting into this situation.
> 
> Signed-off-by: Mahesh Bandewar 

Applied and queued up for -stable, thanks.

Re: [Patch net-next 0/2] net: some minor optimization for netns id

2016-09-04 Thread David Miller


Series applied.

Re: [PATCH net-next] openvswitch: Free tmpl with tmpl_free.

2016-09-04 Thread David Miller

From: Joe Stringer 
Date: Thu,  1 Sep 2016 18:01:47 -0700

> When an error occurs during conntrack template creation as part of
> actions validation, we need to free the template. Previously we've been
> using nf_ct_put() to do this, but nf_ct_tmpl_free() is more appropriate.
> 
> Signed-off-by: Joe Stringer 

Applied, thanks.

[PATCH v4 4/5] net: stmmac: add a glue driver for the Amlogic Meson 8b / GXBB DWMAC

2016-09-04 Thread Martin Blumenstingl

The Ethernet controller available in Meson8b and GXBB SoCs is a Synopsys
DesignWare MAC IP core which is already supported by the stmmac driver.

In addition to the standard stmmac driver some Meson8b / GXBB specific
registers have to be configured for the PHY clocks. These SoC specific
registers are called PRG_ETHERNET_ADDR0 and PRG_ETHERNET_ADDR1 in the
datasheet.
These registers are not backwards compatible with those on Meson 6b,
which is why a new glue driver is introduced. The existing meson6-dwmac
driver still worked on many boards because the bootloader programs the
PRG_ETHERNET registers correctly. Additionally, the meson6-dwmac driver
only sets bit 1 of PRG_ETHERNET_ADDR0, which (according to the datasheet)
is only used during reset.

Currently all configuration values can be determined automatically,
based on the configured phy-mode (which is mandatory for the stmmac
driver). If required the tx-delay and the mux clock (so it supports
the MPLL2 clock as well) can be made configurable in the future.

Signed-off-by: Martin Blumenstingl 
Tested-by: Kevin Hilman 
Acked-by: David S. Miller 
---
 drivers/net/ethernet/stmicro/stmmac/Makefile   |   2 +-
 .../net/ethernet/stmicro/stmmac/dwmac-meson8b.c| 324 +
 2 files changed, 325 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c

diff --git a/drivers/net/ethernet/stmicro/stmmac/Makefile b/drivers/net/ethernet/stmicro/stmmac/Makefile
index 44b630c..f77edb9 100644
--- a/drivers/net/ethernet/stmicro/stmmac/Makefile
+++ b/drivers/net/ethernet/stmicro/stmmac/Makefile
@@ -9,7 +9,7 @@ stmmac-objs:= stmmac_main.o stmmac_ethtool.o stmmac_mdio.o ring_mode.o  \
 obj-$(CONFIG_STMMAC_PLATFORM)  += stmmac-platform.o
 obj-$(CONFIG_DWMAC_IPQ806X)+= dwmac-ipq806x.o
 obj-$(CONFIG_DWMAC_LPC18XX)+= dwmac-lpc18xx.o
-obj-$(CONFIG_DWMAC_MESON)  += dwmac-meson.o
+obj-$(CONFIG_DWMAC_MESON)  += dwmac-meson.o dwmac-meson8b.o
 obj-$(CONFIG_DWMAC_ROCKCHIP)   += dwmac-rk.o
 obj-$(CONFIG_DWMAC_SOCFPGA)+= dwmac-altr-socfpga.o
 obj-$(CONFIG_DWMAC_STI)+= dwmac-sti.o
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c
new file mode 100644
index 000..a31ec24
--- /dev/null
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c
@@ -0,0 +1,324 @@
+/*
+ * Amlogic Meson S805/S905 DWMAC glue layer
+ *
+ * Copyright (C) 2016 Martin Blumenstingl 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see .
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "stmmac_platform.h"
+
+#define PRG_ETH0                       0x0
+
+#define PRG_ETH0_RGMII_MODE            BIT(0)
+
+/* mux to choose between fclk_div2 (bit unset) and mpll2 (bit set) */
+#define PRG_ETH0_CLK_M250_SEL_SHIFT    4
+#define PRG_ETH0_CLK_M250_SEL_MASK     GENMASK(4, 4)
+
+#define PRG_ETH0_TXDLY_SHIFT           5
+#define PRG_ETH0_TXDLY_MASK            GENMASK(6, 5)
+#define PRG_ETH0_TXDLY_OFF             (0x0 << PRG_ETH0_TXDLY_SHIFT)
+#define PRG_ETH0_TXDLY_QUARTER         (0x1 << PRG_ETH0_TXDLY_SHIFT)
+#define PRG_ETH0_TXDLY_HALF            (0x2 << PRG_ETH0_TXDLY_SHIFT)
+#define PRG_ETH0_TXDLY_THREE_QUARTERS  (0x3 << PRG_ETH0_TXDLY_SHIFT)
+
+/* divider for the result of m250_sel */
+#define PRG_ETH0_CLK_M250_DIV_SHIFT    7
+#define PRG_ETH0_CLK_M250_DIV_WIDTH    3
+
+/* divides the result of m25_sel by either 5 (bit unset) or 10 (bit set) */
+#define PRG_ETH0_CLK_M25_DIV_SHIFT     10
+#define PRG_ETH0_CLK_M25_DIV_WIDTH     1
+
+#define PRG_ETH0_INVERTED_RMII_CLK     BIT(11)
+#define PRG_ETH0_TX_AND_PHY_REF_CLK    BIT(12)
+
+#define MUX_CLK_NUM_PARENTS            2
+
+struct meson8b_dwmac {
+   struct platform_device  *pdev;
+
+   void __iomem*regs;
+
+   phy_interface_t phy_mode;
+
+   struct clk_mux  m250_mux;
+   struct clk  *m250_mux_clk;
+   struct clk  *m250_mux_parent[MUX_CLK_NUM_PARENTS];
+
+   struct clk_divider  m250_div;
+   struct clk  *m250_div_clk;
+
+   struct clk_divider  m25_div;
+   struct clk  *m25_div_clk;
+};
+
+static void meson8b_dwmac_mask_bits(struct meson8b_dwmac *dwmac, u32 reg,
+   u32 mask, u32 value)
+{
+   u32 data;
+
+   data = readl(dwmac->regs + reg);
+   data &= ~mask;
+   data |= (value & mask);
+
+   writel(data, dwmac->regs + reg);
+}
+
+static int meson8b_init_clk(struct meson8b_dwmac

[PATCH v4 1/5] net: dt-bindings: Document the new Meson8b and GXBB DWMAC bindings

2016-09-04 Thread Martin Blumenstingl

This patch adds the documentation for the DWMAC ethernet controller
found in Amlogic Meson 8b (S805) and GXBB (S905) SoCs.
The main difference from the Meson6 glue is that different registers
(with a different layout) are used.

Signed-off-by: Martin Blumenstingl 
Acked-by: Rob Herring 
Acked-by: David S. Miller 
---
 .../devicetree/bindings/net/meson-dwmac.txt| 45 ++
 1 file changed, 37 insertions(+), 8 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/meson-dwmac.txt 
b/Documentation/devicetree/bindings/net/meson-dwmac.txt
index ec633d7..89e62dd 100644
--- a/Documentation/devicetree/bindings/net/meson-dwmac.txt
+++ b/Documentation/devicetree/bindings/net/meson-dwmac.txt
@@ -1,18 +1,32 @@
 * Amlogic Meson DWMAC Ethernet controller
 
 The device inherits all the properties of the dwmac/stmmac devices
-described in the file net/stmmac.txt with the following changes.
+described in the file stmmac.txt in the current directory with the
+following changes.
 
-Required properties:
+Required properties on all platforms:
 
-- compatible: should be "amlogic,meson6-dwmac" along with "snps,dwmac"
- and any applicable more detailed version number
- described in net/stmmac.txt
+- compatible:  Depending on the platform this should be one of:
+   - "amlogic,meson6-dwmac"
+   - "amlogic,meson8b-dwmac"
+   - "amlogic,meson-gxbb-dwmac"
+   Additionally "snps,dwmac" and any applicable more
+   detailed version number described in net/stmmac.txt
+   should be used.
 
-- reg: should contain a register range for the dwmac controller and
-   another one for the Amlogic specific configuration
+- reg: The first register range should be the one of the DWMAC
+   controller. The second range is for the Amlogic specific
+   configuration (for example the PRG_ETHERNET register range
+   on Meson8b and newer)
 
-Example:
+Required properties on Meson8b and newer:
+- clock-names: Should contain the following:
+   - "stmmaceth" - see stmmac.txt
+   - "clkin0" - first parent clock of the internal mux
+   - "clkin1" - second parent clock of the internal mux
+
+
+Example for Meson6:
 
ethmac: ethernet@c941 {
compatible = "amlogic,meson6-dwmac", "snps,dwmac";
@@ -23,3 +37,18 @@ Example:
clocks = <>;
clock-names = "stmmaceth";
}
+
+Example for GXBB:
+   ethmac: ethernet@c941 {
+   compatible = "amlogic,meson-gxbb-dwmac", "snps,dwmac";
+   reg = <0x0 0xc941 0x0 0x1>,
+   <0x0 0xc8834540 0x0 0x8>;
+   interrupts = <0 8 1>;
+   interrupt-names = "macirq";
+   clocks = < CLKID_ETH>,
+   < CLKID_FCLK_DIV2>,
+   < CLKID_MPLL2>;
+   clock-names = "stmmaceth", "clkin0", "clkin1";
+   phy-mode = "rgmii";
+   status = "disabled";
+   };
-- 
2.9.3

Re: [PATCH] RDS: Simplify code

2016-09-04 Thread Leon Romanovsky

On Sun, Sep 04, 2016 at 05:57:20PM +0200, Christophe JAILLET wrote:
> On 04/09/2016 at 14:20, Leon Romanovsky wrote:
> >On Sat, Sep 03, 2016 at 07:33:29AM +0200, Christophe JAILLET wrote:
> >>Calling 'list_splice' followed by 'INIT_LIST_HEAD' is equivalent to
> >>'list_splice_init'.
> >It is not 100% accurate
> >
> >list_splice(y, z)
> >INIT_LIST_HEAD(y)
> >
> >==>
> >
> >if (!list_empty(y))
> >  __list_splice(y, z, z->next);
> >INIT_LIST_HEAD(y)
> >
> >and not
> >
> >if (!list_empty(y)) {
> >  __list_splice(y, z, z->next);
> >  INIT_LIST_HEAD(y)
> >}
> >
> >as list_splice_init will do.
> >
> You are right but if you dig further you will see that calling
> INIT_LIST_HEAD on an empty list is a no-op (AFAIK).
> And if this list was not already correctly initialized, then you would have
> some other troubles.

Thank you for the suggestion,
It looks like the code after that can be skipped when the loop_conns
list is empty, because the tmp_list will be empty too.

        list_for_each_entry_safe(lc, _lc, &tmp_list, loop_node) {
                WARN_ON(lc->conn->c_passive);
                rds_conn_destroy(lc->conn);
        }

>
> CJ
>



[PATCH v4 2/5] clk: gxbb: expose MPLL2 clock for use by DT

2016-09-04 Thread Martin Blumenstingl

This exposes the MPLL2 clock as this is one of the input clocks of the
ethernet controller's internal mux.

Signed-off-by: Martin Blumenstingl 
---
 drivers/clk/meson/gxbb.h  | 2 +-
 include/dt-bindings/clock/gxbb-clkc.h | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/clk/meson/gxbb.h b/drivers/clk/meson/gxbb.h
index 217df51..3606e875 100644
--- a/drivers/clk/meson/gxbb.h
+++ b/drivers/clk/meson/gxbb.h
@@ -183,7 +183,7 @@
 /* CLKID_CLK81 */
 #define CLKID_MPLL0  13
 #define CLKID_MPLL1  14
-#define CLKID_MPLL2  15
+/* CLKID_MPLL2 */
 #define CLKID_DDR16
 #define CLKID_DOS17
 #define CLKID_ISA18
diff --git a/include/dt-bindings/clock/gxbb-clkc.h 
b/include/dt-bindings/clock/gxbb-clkc.h
index 7d41864..244ea6e 100644
--- a/include/dt-bindings/clock/gxbb-clkc.h
+++ b/include/dt-bindings/clock/gxbb-clkc.h
@@ -8,6 +8,7 @@
 #define CLKID_CPUCLK   1
 #define CLKID_FCLK_DIV24
 #define CLKID_CLK8112
+#define CLKID_MPLL215
 #define CLKID_ETH  36
 #define CLKID_SD_EMMC_A94
 #define CLKID_SD_EMMC_B95
-- 
2.9.3

[PATCH v4 5/5] ARM64: dts: meson-gxbb: use the new GXBB DWMAC glue driver

2016-09-04 Thread Martin Blumenstingl

The Amlogic reference driver uses the "mc_val" devicetree property to
configure the PRG_ETHERNET_ADDR0 register. Unfortunately it uses magic
values for this configuration.
According to the datasheet the PRG_ETHERNET_ADDR0 register is at address
0xc8834108. However, the reference driver uses 0xc8834540 instead.
According to my tests, the value from the reference driver is correct.

No changes are required to the board dts files because the only
required configuration option is the phy-mode, which had to be
configured correctly before as well.

Signed-off-by: Martin Blumenstingl 
---
 arch/arm64/boot/dts/amlogic/meson-gxbb.dtsi | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/boot/dts/amlogic/meson-gxbb.dtsi 
b/arch/arm64/boot/dts/amlogic/meson-gxbb.dtsi
index 2b47415..2e8a3d9 100644
--- a/arch/arm64/boot/dts/amlogic/meson-gxbb.dtsi
+++ b/arch/arm64/boot/dts/amlogic/meson-gxbb.dtsi
@@ -497,13 +497,15 @@
};
 
ethmac: ethernet@c941 {
-   compatible = "amlogic,meson6-dwmac", "snps,dwmac";
+   compatible = "amlogic,meson-gxbb-dwmac", "snps,dwmac";
reg = <0x0 0xc941 0x0 0x1
   0x0 0xc8834540 0x0 0x4>;
interrupts = <0 8 1>;
interrupt-names = "macirq";
-   clocks = < CLKID_ETH>;
-   clock-names = "stmmaceth";
+   clocks = < CLKID_ETH>,
+< CLKID_FCLK_DIV2>,
+< CLKID_MPLL2>;
+   clock-names = "stmmaceth", "clkin0", "clkin1";
phy-mode = "rgmii";
status = "disabled";
};
-- 
2.9.3

[PATCH v4 3/5] stmmac: introduce get_stmmac_bsp_priv() helper

2016-09-04 Thread Martin Blumenstingl

From: Joachim Eastwood 

Create a helper to retrieve dwmac private data from a dev
pointer. This is useful in PM callbacks and driver remove.

Signed-off-by: Joachim Eastwood 
Tested-by: Martin Blumenstingl 
Acked-by: David S. Miller 
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_platform.h | 8 
 1 file changed, 8 insertions(+)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.h 
b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.h
index ffeb8d9..64e147f 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.h
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.h
@@ -30,4 +30,12 @@ int stmmac_get_platform_resources(struct platform_device 
*pdev,
 int stmmac_pltfr_remove(struct platform_device *pdev);
 extern const struct dev_pm_ops stmmac_pltfr_pm_ops;
 
+static inline void *get_stmmac_bsp_priv(struct device *dev)
+{
+   struct net_device *ndev = dev_get_drvdata(dev);
+   struct stmmac_priv *priv = netdev_priv(ndev);
+
+   return priv->plat->bsp_priv;
+}
+
 #endif /* __STMMAC_PLATFORM_H__ */
-- 
2.9.3

[PATCH v4 0/5] meson: Meson8b and GXBB DWMAC glue driver

2016-09-04 Thread Martin Blumenstingl

This adds a DWMAC glue driver for the PRG_ETHERNET registers found in
Meson8b and GXBB SoCs. Compared to the "old" meson6b-dwmac glue driver,
the register layout is completely different.
Thus I introduced a separate driver.


Changes since v3:
- remove (unnecessary) usage of CLK_IS_BASIC flag
- use WARN_ON(IS_ERR(...)) instead of WARN_ON(PTR_ERR_OR_ZERO(...))
- let devm_ioremap_resource() check the result returned by
  platform_get_resource()
- added David Miller's ACKs to the patches which would be relevant for
  the net-next tree (patches 1, 3 and 4) as per
  http://lists.infradead.org/pipermail/linux-amlogic/2016-September/000995.html
- fixed typo in copyright year
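
For readers unfamiliar with the WARN_ON(IS_ERR(...)) versus
WARN_ON(PTR_ERR_OR_ZERO(...)) item above, here is a minimal user-space
sketch of error-encoded pointers. It assumes the usual convention that
the top 4095 addresses encode negative errno values; the *_sketch names
are hypothetical stand-ins, not the kernel's definitions.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed upper bound on errno values that can be encoded in a pointer. */
#define MAX_ERRNO_SKETCH 4095

/* Encode a negative errno value as a pointer. */
static inline void *err_ptr_sketch(long error)
{
	return (void *)(intptr_t)error;
}

/* Decode the errno value carried by an error pointer. */
static inline long ptr_err_sketch(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* True when the pointer lies in the range reserved for error codes. */
static inline int is_err_sketch(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO_SKETCH;
}

/* Negative errno for an error pointer, 0 for a valid pointer. */
static inline long ptr_err_or_zero_sketch(const void *ptr)
{
	return is_err_sketch(ptr) ? ptr_err_sketch(ptr) : 0;
}
```

Used as a truth value, both tests agree: is_err_sketch() is nonzero
exactly when ptr_err_or_zero_sketch() is nonzero, so the IS_ERR form
only states the intent more directly.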

Joachim Eastwood (1):
  stmmac: introduce get_stmmac_bsp_priv() helper

Martin Blumenstingl (4):
  net: dt-bindings: Document the new Meson8b and GXBB DWMAC bindings
  clk: gxbb: expose MPLL2 clock for use by DT
  net: stmmac: add a glue driver for the Amlogic Meson 8b / GXBB DWMAC
  ARM64: dts: meson-gxbb: use the new GXBB DWMAC glue driver

 .../devicetree/bindings/net/meson-dwmac.txt|  45 ++-
 arch/arm64/boot/dts/amlogic/meson-gxbb.dtsi|   8 +-
 drivers/clk/meson/gxbb.h   |   2 +-
 drivers/net/ethernet/stmicro/stmmac/Makefile   |   2 +-
 .../net/ethernet/stmicro/stmmac/dwmac-meson8b.c| 324 +
 .../net/ethernet/stmicro/stmmac/stmmac_platform.h  |   8 +
 include/dt-bindings/clock/gxbb-clkc.h  |   1 +
 7 files changed, 377 insertions(+), 13 deletions(-)
 create mode 100644 drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c

-- 
2.9.3

Re: [PATCH v3 4/5] net: stmmac: add a glue driver for the Amlogic Meson 8b / GXBB DWMAC

2016-09-04 Thread Martin Blumenstingl

On Tue, Aug 30, 2016 at 9:19 PM, Stephen Boyd  wrote:
>> + return PTR_ERR(dwmac->m250_mux_clk);
>> +
>> + /* create the m250_div */
>> + snprintf(clk_name, sizeof(clk_name), "%s#m250_div", dev_name(dev));
>> + init.name = devm_kstrdup(dev, clk_name, GFP_KERNEL);
>> + init.ops = &clk_divider_ops;
>> + init.flags = CLK_IS_BASIC | CLK_SET_RATE_PARENT;
>> + clk_div_parents[0] = __clk_get_name(dwmac->m250_mux_clk);
>> + init.parent_names = clk_div_parents;
>> + init.num_parents = ARRAY_SIZE(clk_div_parents);
>> +
>> + dwmac->m250_div.reg = dwmac->regs + PRG_ETH0;
>> + dwmac->m250_div.shift = PRG_ETH0_CLK_M250_DIV_SHIFT;
>> + dwmac->m250_div.width = PRG_ETH0_CLK_M250_DIV_WIDTH;
>> + dwmac->m250_div.hw.init = &init;
>> + dwmac->m250_div.flags = CLK_DIVIDER_ONE_BASED | CLK_DIVIDER_ALLOW_ZERO;
>> +
>> + dwmac->m250_div_clk = devm_clk_register(dev, &dwmac->m250_div.hw);
>
> We've been trying to move away from devm_clk_register() to
> devm_clk_hw_register() so that clk providers aren't also clk
> consumers. Obviously in this case this driver is a provider and a
> consumer, so this isn't as important. Kevin did something similar
> in the mmc driver, so I'll reiterate what I said on that patch.
> Perhaps we should make __clk_create_clk() into a real clk
> provider API so that we can use devm_clk_hw_register() here and
> then generate a clk for this device. That would allow us to have
> proper consumer tracking without relying on the clk that is
> returned from clk_register() (the intent is to make that clk
> instance internal to the framework).
please correct me if I'm wrong but I read this as "this code is OK for
now, but it should be changed once the clk framework has API for
that".
If you still want me to change the code then please send a NACK
(preferably on the updated series which I am preparing right now).

>> + if (WARN_ON(PTR_ERR_OR_ZERO(dwmac->m250_div_clk)))
>> + return PTR_ERR(dwmac->m250_div_clk);
>> +
>> + /* create the m25_div */
>> + snprintf(clk_name, sizeof(clk_name), "%s#m25_div", dev_name(dev));
>> + init.name = devm_kstrdup(dev, clk_name, GFP_KERNEL);
>> + init.ops = &clk_divider_ops;
>> + init.flags = CLK_IS_BASIC | CLK_SET_RATE_PARENT;
>> + clk_div_parents[0] = __clk_get_name(dwmac->m250_div_clk);
>> + init.parent_names = clk_div_parents;
>> + init.num_parents = ARRAY_SIZE(clk_div_parents);
>> +
>> + dwmac->m25_div.reg = dwmac->regs + PRG_ETH0;
>> + dwmac->m25_div.shift = PRG_ETH0_CLK_M25_DIV_SHIFT;
>> + dwmac->m25_div.width = PRG_ETH0_CLK_M25_DIV_WIDTH;
>> + dwmac->m25_div.table = clk_25m_div_table;
>> + dwmac->m25_div.hw.init = &init;
>> + dwmac->m25_div.flags = CLK_DIVIDER_ALLOW_ZERO;
>> +
>> + dwmac->m25_div_clk = devm_clk_register(dev, &dwmac->m25_div.hw);
>> + if (WARN_ON(PTR_ERR_OR_ZERO(dwmac->m25_div_clk)))
>> + return PTR_ERR(dwmac->m25_div_clk);
>> +
>> + return 0;
>
> This could be return WARN_ON(PTR_ERR_OR_ZERO(...))
This would work as well but I prefer the way it is right now (as one
could easily extend the code without having to touch any existing code
apart from the last return).
However, as it's always the case with personal preference: if
coding-style requires me to change it then I'll do so, just let me
know.

I have addressed all other issues you found (thanks for that!) in v4
(which I am about to send in the next few minutes).


Thanks,
Martin

RE: [PATCH net-next V5 4/4] net/sched: Introduce act_tunnel_key

2016-09-04 Thread Rosen, Rami

Hi, Hadar,

>For example, the following flower filter will forward all ICMP packets
>destined to 11.11.11.2 through the shared vxlan device 'vxlan0'. Before
>redirecting, a metadata for the vxlan tunnel is created using the
>tunnel_key action and its arguments:

Shouldn't it be "tc filter add dev ..."?

>$ filter add dev net0 protocol ip parent : \
>flower \
>  ip_proto 1 \
>  dst_ip 11.11.11.2 \
>action tunnel_key set \
>  src_ip 11.11.0.1 \
>  dst_ip 11.11.0.2 \
>  id 11 \
>action mirred egress redirect dev vxlan0

Regards,
Rami Rosen
Intel Corporation

[PATCH v2] net: smsc: remove build warning of duplicate definition

2016-09-04 Thread Sudip Mukherjee

The m32r build was giving the following warnings:

In file included from drivers/net/ethernet/smsc/smc91x.c:92:0:
drivers/net/ethernet/smsc/smc91x.h:448:0: warning: "SMC_inb" redefined
 #define SMC_inb(ioaddr, reg)  ({ BUG(); 0; })
 
drivers/net/ethernet/smsc/smc91x.h:106:0:
note: this is the location of the previous definition
 #define SMC_inb(a, r)  inb(((u32)a) + (r))
 
drivers/net/ethernet/smsc/smc91x.h:449:0: warning: "SMC_outb" redefined
 #define SMC_outb(x, ioaddr, reg) BUG()
 
drivers/net/ethernet/smsc/smc91x.h:108:0:
note: this is the location of the previous definition
 #define SMC_outb(v, a, r) outb(v, ((u32)a) + (r))

Signed-off-by: Sudip Mukherjee 
---

v2: +#ifdef of v1 is removed.

 drivers/net/ethernet/smsc/smc91x.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/smsc/smc91x.h 
b/drivers/net/ethernet/smsc/smc91x.h
index 1a55c79..06d7dca 100644
--- a/drivers/net/ethernet/smsc/smc91x.h
+++ b/drivers/net/ethernet/smsc/smc91x.h
@@ -445,7 +445,9 @@ smc_pxa_dma_insw(void __iomem *ioaddr, struct smc_local 
*lp, int reg, int dma,
 #endif
 
 #if ! SMC_CAN_USE_8BIT
+#undef SMC_inb
 #define SMC_inb(ioaddr, reg)   ({ BUG(); 0; })
+#undef SMC_outb
 #define SMC_outb(x, ioaddr, reg)   BUG()
 #define SMC_insb(a, r, p, l)   BUG()
 #define SMC_outsb(a, r, p, l)  BUG()
-- 
1.9.1
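
The effect of the two added #undef lines can be reproduced outside the
kernel with a minimal sketch. DEMO_inb and DEMO_CAN_USE_8BIT are
hypothetical names chosen for illustration; they are not the driver's
macros.

```c
/* Platform-specific definition, standing in for an earlier part of the header. */
#define DEMO_inb(a, r)	((a) + (r))

/* Pretend this platform cannot do 8-bit accesses. */
#define DEMO_CAN_USE_8BIT 0

#if !DEMO_CAN_USE_8BIT
/*
 * Drop the earlier definition first. Redefining a macro with a
 * different body without #undef is what triggers the "redefined"
 * warning quoted in the commit message.
 */
#undef DEMO_inb
#define DEMO_inb(a, r)	((void)(a), (void)(r), -1)
#endif

/* Expands whichever definition of DEMO_inb is active at this point. */
static inline int demo_read(int base, int reg)
{
	return DEMO_inb(base, reg);
}
```

With DEMO_CAN_USE_8BIT set to 0, demo_read() uses the fallback and
returns -1; removing the #undef line brings the warning back.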

[PATCH v3] net: macb: initialize checksum when using checksum offloading

2016-09-04 Thread Helmut Buchsbaum

I'm still struggling to get this fix right..

Changes since v2:
 - do not blindly modify SKB contents according to Dave's legitimate
   objection

Changes since v1:
 - dropped disabling HW checksum offload for Zynq
 - initialize checksum similar to net/ethernet/freescale/fec_main.c

-- >8 --
MACB/GEM needs the checksum field initialized to 0 to get correct
results on transmit in all cases, e.g. on Zynq, UDP packets with
payload <= 2 otherwise contain a wrong checksum.

Signed-off-by: Helmut Buchsbaum 
---
 drivers/net/ethernet/cadence/macb.c | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/drivers/net/ethernet/cadence/macb.c 
b/drivers/net/ethernet/cadence/macb.c
index 89c0cfa..d954a97 100644
--- a/drivers/net/ethernet/cadence/macb.c
+++ b/drivers/net/ethernet/cadence/macb.c
@@ -1323,6 +1323,24 @@ dma_error:
return 0;
 }
 
+static inline int macb_clear_csum(struct sk_buff *skb)
+{
+   /* no change for packets without checksum offloading */
+   if (skb->ip_summed != CHECKSUM_PARTIAL)
+   return 0;
+
+   /* make sure we can modify the header */
+   if (unlikely(skb_cow_head(skb, 0)))
+   return -1;
+
+   /* initialize checksum field
+* This is required - at least for Zynq, which otherwise calculates
+* wrong UDP header checksums for UDP packets with UDP data len <=2
+*/
+   *(__sum16 *)(skb_checksum_start(skb) + skb->csum_offset) = 0;
+   return 0;
+}
+
 static int macb_start_xmit(struct sk_buff *skb, struct net_device *dev)
 {
u16 queue_index = skb_get_queue_mapping(skb);
@@ -1362,6 +1380,11 @@ static int macb_start_xmit(struct sk_buff *skb, struct 
net_device *dev)
return NETDEV_TX_BUSY;
}
 
+   if (macb_clear_csum(skb)) {
+   dev_kfree_skb_any(skb);
+   return NETDEV_TX_OK;
+   }
+
/* Map socket buffer for DMA transfer */
if (!macb_tx_map(bp, queue, skb)) {
dev_kfree_skb_any(skb);
-- 
2.1.4
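
The idea behind macb_clear_csum() can be sketched in user space as
follows. struct pkt_sketch is a hypothetical, heavily simplified
stand-in for the sk_buff fields the patch touches, and the bounds
check is an assumption of this sketch (the patch itself relies on
skb_cow_head() for its failure path).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the sk_buff fields used by the patch. */
struct pkt_sketch {
	uint8_t  data[64];
	int      csum_partial;	/* nonzero: hardware should fill in the checksum */
	uint16_t csum_start;	/* offset of the transport header in data[] */
	uint16_t csum_offset;	/* offset of the checksum field in that header */
};

/*
 * Zero the 16-bit checksum field before the packet goes to hardware.
 * Returns 0 on success, -1 if the field would lie outside the buffer.
 */
static inline int pkt_clear_csum(struct pkt_sketch *p)
{
	const uint16_t zero = 0;
	size_t pos;

	/* no change for packets without checksum offloading */
	if (!p->csum_partial)
		return 0;

	pos = (size_t)p->csum_start + p->csum_offset;
	if (pos + sizeof(zero) > sizeof(p->data))
		return -1;

	/* memcpy avoids an unaligned 16-bit store */
	memcpy(&p->data[pos], &zero, sizeof(zero));
	return 0;
}
```

Only the two checksum bytes are written; everything else in the packet
stays as the stack prepared it.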

Re: [PATCH] RDS: Simplify code

2016-09-04 Thread Christophe JAILLET


On 04/09/2016 at 14:20, Leon Romanovsky wrote:
> On Sat, Sep 03, 2016 at 07:33:29AM +0200, Christophe JAILLET wrote:
>> Calling 'list_splice' followed by 'INIT_LIST_HEAD' is equivalent to
>> 'list_splice_init'.
> It is not 100% accurate
>
> list_splice(y, z)
> INIT_LIST_HEAD(y)
>
> ==>
>
> if (!list_empty(y))
>   __list_splice(y, z, z->next);
> INIT_LIST_HEAD(y)
>
> and not
>
> if (!list_empty(y)) {
>   __list_splice(y, z, z->next);
>   INIT_LIST_HEAD(y)
> }
>
> as list_splice_init will do.

You are right but if you dig further you will see that calling 
INIT_LIST_HEAD on an empty list is a no-op (AFAIK).
And if this list was not already correctly initialized, then you would 
have some other troubles.
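
To make the comparison concrete, here is a minimal user-space sketch of
the two variants. The *_sketch names are hypothetical re-creations of
the kernel list helpers, written only for illustration.

```c
/* Circular doubly linked list head, mirroring the kernel's layout. */
struct list_sketch {
	struct list_sketch *next, *prev;
};

static inline void init_list_sketch(struct list_sketch *h)
{
	h->next = h;
	h->prev = h;
}

static inline int list_empty_sketch(const struct list_sketch *h)
{
	return h->next == h;
}

static inline void list_add_tail_sketch(struct list_sketch *n,
					struct list_sketch *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* Move all entries of 'list' right after 'head'; 'list' is left stale. */
static inline void list_splice_sketch(struct list_sketch *list,
				      struct list_sketch *head)
{
	struct list_sketch *first, *last, *at;

	if (list_empty_sketch(list))
		return;
	first = list->next;
	last = list->prev;
	at = head->next;
	first->prev = head;
	head->next = first;
	last->next = at;
	at->prev = last;
}

/* Variant A: splice, then always reinitialise the source list. */
static inline void splice_then_init(struct list_sketch *list,
				    struct list_sketch *head)
{
	list_splice_sketch(list, head);
	init_list_sketch(list);
}

/* Variant B: reinitialise only if something was moved. */
static inline void splice_init_sketch(struct list_sketch *list,
				      struct list_sketch *head)
{
	if (!list_empty_sketch(list)) {
		list_splice_sketch(list, head);
		init_list_sketch(list);
	}
}
```

For a correctly initialised source list both variants end in the same
state: in the empty case variant A merely rewrites next and prev with
the values they already hold.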


CJ

Re: [PATCH] vxlan: Update tx_errors statistics if vxlan_build_skb return err.

2016-09-04 Thread Jiri Benc

On Sun,  4 Sep 2016 18:52:51 +0800, Haishuang Yan wrote:
> If vxlan_build_skb returns err < 0, tx_errors should also be increased.
> 
> Signed-off-by: Haishuang Yan 
> ---
>  drivers/net/vxlan.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
> index f605a36..2c72dcd 100644
> --- a/drivers/net/vxlan.c
> +++ b/drivers/net/vxlan.c
> @@ -2103,6 +2103,7 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct 
> net_device *dev,
> vni, md, flags, udp_sum);
>   if (err < 0) {
>   dst_release(ndst);
> + dev->stats.tx_errors++;
>   return;
>   }
>   udp_tunnel6_xmit_skb(ndst, sk, skb, dev,

Acked-by: Jiri Benc 

The error path in vxlan_xmit_one deserves complete rework, though.

 Jiri

Re: [PATCH net-next V5 4/4] net/sched: Introduce act_tunnel_key

2016-09-04 Thread kbuild test robot

Hi Amir,

[auto build test WARNING on net-next/master]

url:
https://github.com/0day-ci/linux/commits/Hadar-Hen-Zion/net-sched-ip-tunnel-metadata-set-release-classify-by-using-TC/20160904-185825
reproduce:
# apt-get install sparse
make ARCH=x86_64 allmodconfig
make C=1 CF=-D__CHECK_ENDIAN__


sparse warnings: (new ones prefixed by >>)

   include/linux/compiler.h:230:8: sparse: attribute 'no_sanitize_address': 
unknown attribute
>> net/sched/act_tunnel_key.c:38:18: sparse: incompatible types in comparison 
>> expression (different address spaces)
   net/sched/act_tunnel_key.c:169:22: sparse: incompatible types in comparison 
expression (different address spaces)
   net/sched/act_tunnel_key.c:197:18: sparse: incompatible types in comparison 
expression (different address spaces)
   net/sched/act_tunnel_key.c:248:18: sparse: incompatible types in comparison 
expression (different address spaces)

vim +38 net/sched/act_tunnel_key.c

22  #include 
23  
24  #define TUNNEL_KEY_TAB_MASK 15
25  
26  static int tunnel_key_net_id;
27  static struct tc_action_ops act_tunnel_key_ops;
28  
29  static int tunnel_key_act(struct sk_buff *skb, const struct tc_action 
*a,
30struct tcf_result *res)
31  {
32  struct tcf_tunnel_key *t = to_tunnel_key(a);
33  struct tcf_tunnel_key_params *params;
34  int action;
35  
36  rcu_read_lock();
37  
  > 38  params = rcu_dereference(t->params);
39  
40  tcf_lastuse_update(>tcf_tm);
41  bstats_cpu_update(this_cpu_ptr(t->common.cpu_bstats), skb);
42  action = params->action;
43  
44  switch (params->tcft_action) {
45  case TCA_TUNNEL_KEY_ACT_RELEASE:
46  skb_dst_drop(skb);

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation

Re: [PATCH 1/1] vxlan: make the struct nlattr variable const

2016-09-04 Thread Jiri Benc

On Sat,  3 Sep 2016 22:07:10 +0800, zyjzyj2...@gmail.com wrote:
> From: Zhu Yanjun 
> 
> The struct nlattr variable should not be changed.

True. But the only caller is vxlan_fdb_parse which doesn't have its tb
parameter const, so what's the point?

Now, the tb parameter of vxlan_fdb_parse could be changed to const.
This again doesn't make much sense without modifying its callers, which
are vxlan_fdb_add and vxlan_fdb_delete. And those are net_device_ops
hooks. Have you considered looking into those (ndo_fdb_*) and turning
the struct nlattr ** to const (that is, if it's possible)?

 Jiri

Re: [PATCH net-next V5 2/4] net/dst: Utility functions to build dst_metadata without supplying an skb

2016-09-04 Thread Hadar Hen Zion

On Sun, Sep 4, 2016 at 2:14 PM, Sergei Shtylyov
 wrote:
> Hello.
>
>
> On 9/4/2016 1:55 PM, Hadar Hen Zion wrote:
>
>> From: Amir Vadai 
>>
>> Extract __ip_tun_set_dst() and __ipv6_tun_set_dst() out of
>> ip_tun_rx_dst() and ipv6_tun_rx_dst(), to be used without supplying an
>> skb.
>>
>> Signed-off-by: Amir Vadai 
>> Signed-off-by: Hadar Hen Zion 
>> Acked-by: Jiri Pirko 
>> Reviewed-by: Shmulik Ladkani 
>> ---
>>  include/net/dst_metadata.h | 45
>> -
>>  1 file changed, 32 insertions(+), 13 deletions(-)
>>
>> diff --git a/include/net/dst_metadata.h b/include/net/dst_metadata.h
>> index 5db9f59..49e8847 100644
>> --- a/include/net/dst_metadata.h
>> +++ b/include/net/dst_metadata.h
>> @@ -112,12 +112,10 @@ static inline struct ip_tunnel_info
>> *skb_tunnel_info_unclone(struct sk_buff *skb
>> return >u.tun_info;
>>  }
>>
>> -static inline struct metadata_dst *ip_tun_rx_dst(struct sk_buff *skb,
>> -__be16 flags,
>> -__be64 tunnel_id,
>> -int md_size)
>> +static inline struct metadata_dst *
>> +__ip_tun_set_dst(__be32 saddr, __be32 daddr, __u8 tos, __u8 ttl,
>> +__be16 flags, __be64 tunnel_id, int md_size)
>
>
>The continuation lines should start under the 1st '__be32' on the broken
> up line. See how it was before your patch.

ack

>
>>  {
>> -   const struct iphdr *iph = ip_hdr(skb);
>> struct metadata_dst *tun_dst;
>>
>> tun_dst = tun_rx_dst(md_size);
>> @@ -125,17 +123,27 @@ static inline struct metadata_dst
>> *ip_tun_rx_dst(struct sk_buff *skb,
>
> [...]
>>
>> -static inline struct metadata_dst *ipv6_tun_rx_dst(struct sk_buff *skb,
>> +static inline struct metadata_dst *ip_tun_rx_dst(struct sk_buff *skb,
>>  __be16 flags,
>>  __be64 tunnel_id,
>>  int md_size)
>>  {
>> -   const struct ipv6hdr *ip6h = ipv6_hdr(skb);
>> +   const struct iphdr *iph = ip_hdr(skb);
>> +
>> +   return __ip_tun_set_dst(iph->saddr, iph->daddr, iph->tos,
>> iph->ttl,
>> +   flags, tunnel_id, md_size);
>> +}
>> +
>> +static inline struct metadata_dst *
>> +__ipv6_tun_set_dst(const struct in6_addr *saddr, const struct in6_addr
>> *daddr,
>> +  __u8 tos, __u8 ttl, __be32 label, __be16 flags,
>> +  __be64 tunnel_id, int md_size)
>
>
>The continuation lines should start under the 1st *const* on the broken
> up line.

ack

>
>> +{
>> struct metadata_dst *tun_dst;
>> struct ip_tunnel_info *info;
>>
>> @@ -150,14 +158,25 @@ static inline struct metadata_dst
>> *ipv6_tun_rx_dst(struct sk_buff *skb,
>
> [...]
>>
>> +static inline struct metadata_dst *
>> +ipv6_tun_rx_dst(struct sk_buff *skb, __be16 flags, __be64 tunnel_id,
>> +   int md_size)
>> +{
>> +   const struct ipv6hdr *ip6h = ipv6_hdr(skb);
>> +
>> +   return __ipv6_tun_set_dst(&ip6h->saddr, &ip6h->daddr,
>> +   ipv6_get_dsfield(ip6h), ip6h->hop_limit,
>> +   ip6_flowlabel(ip6h), flags, tunnel_id,
>> +   md_size);
>
>
>The continuation lines should start exactly under the 1st & on the broken
> up line.
>That's DaveM's preference, I don't remember if checkpatch.pl reports that
> for the networking code...

checkpatch doesn't report :(
I'll fix it in the next version.

>
> [...]
>
> MBR, Sergei
>

Re: [PATCH] RDS: Simplify code

2016-09-04 Thread Leon Romanovsky

On Sat, Sep 03, 2016 at 07:33:29AM +0200, Christophe JAILLET wrote:
> Calling 'list_splice' followed by 'INIT_LIST_HEAD' is equivalent to
> 'list_splice_init'.

It is not 100% accurate

list_splice(y, z)
INIT_LIST_HEAD(y)

==>

if (!list_empty(y))
 __list_splice(y, z, z->next);
INIT_LIST_HEAD(y)

and not

if (!list_empty(y)) {
 __list_splice(y, z, z->next);
 INIT_LIST_HEAD(y)
}

as list_splice_init will do.


>
> This has been spotted with the following coccinelle script:
> /
> @@
> expression y,z;
> @@
>
> -   list_splice(y,z);
> -   INIT_LIST_HEAD(y);
> +   list_splice_init(y,z);
>
> Signed-off-by: Christophe JAILLET 
> ---
>  net/rds/loop.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/net/rds/loop.c b/net/rds/loop.c
> index f2bf78de5688..c3e6da4fdf97 100644
> --- a/net/rds/loop.c
> +++ b/net/rds/loop.c
> @@ -167,8 +167,7 @@ void rds_loop_exit(void)
>
>   /* avoid calling conn_destroy with irqs off */
>   spin_lock_irq(&loop_conns_lock);
> - list_splice(&loop_conns, &tmp_list);
> - INIT_LIST_HEAD(&loop_conns);
> + list_splice_init(&loop_conns, &tmp_list);
>   spin_unlock_irq(&loop_conns_lock);
>
>   list_for_each_entry_safe(lc, _lc, &tmp_list, loop_node) {
> --
> 2.7.4
>



Re: [PATCH net-next V5 4/4] net/sched: Introduce act_tunnel_key

2016-09-04 Thread Shmulik Ladkani

On Sun,  4 Sep 2016 13:55:55 +0300, had...@mellanox.com wrote:
> From: Amir Vadai 
> 
> This action could be used before redirecting packets to a shared tunnel
> device, or when redirecting packets arriving from such a device.
> 
> The action will release the metadata created by the tunnel device
> (decap), or set the metadata with the specified values for encap
> operation.
> 
> For example, the following flower filter will forward all ICMP packets
> destined to 11.11.11.2 through the shared vxlan device 'vxlan0'. Before
> redirecting, a metadata for the vxlan tunnel is created using the
> tunnel_key action and its arguments:
> 
> $ filter add dev net0 protocol ip parent : \
> flower \
>   ip_proto 1 \
>   dst_ip 11.11.11.2 \
> action tunnel_key set \
>   src_ip 11.11.0.1 \
>   dst_ip 11.11.0.2 \
>   id 11 \
> action mirred egress redirect dev vxlan0
> 
> Signed-off-by: Amir Vadai 
> Signed-off-by: Hadar Hen Zion 

Reviewed-by: Shmulik Ladkani 

Thanks!

Re: [PATCH net-next V5 2/4] net/dst: Utility functions to build dst_metadata without supplying an skb

2016-09-04 Thread Sergei Shtylyov


Hello.

On 9/4/2016 1:55 PM, Hadar Hen Zion wrote:


From: Amir Vadai 

Extract __ip_tun_set_dst() and __ipv6_tun_set_dst() out of
ip_tun_rx_dst() and ipv6_tun_rx_dst(), to be used without supplying an
skb.

Signed-off-by: Amir Vadai 
Signed-off-by: Hadar Hen Zion 
Acked-by: Jiri Pirko 
Reviewed-by: Shmulik Ladkani 
---
 include/net/dst_metadata.h | 45 -
 1 file changed, 32 insertions(+), 13 deletions(-)

diff --git a/include/net/dst_metadata.h b/include/net/dst_metadata.h
index 5db9f59..49e8847 100644
--- a/include/net/dst_metadata.h
+++ b/include/net/dst_metadata.h
@@ -112,12 +112,10 @@ static inline struct ip_tunnel_info 
*skb_tunnel_info_unclone(struct sk_buff *skb
return >u.tun_info;
 }

-static inline struct metadata_dst *ip_tun_rx_dst(struct sk_buff *skb,
-__be16 flags,
-__be64 tunnel_id,
-int md_size)
+static inline struct metadata_dst *
+__ip_tun_set_dst(__be32 saddr, __be32 daddr, __u8 tos, __u8 ttl,
+__be16 flags, __be64 tunnel_id, int md_size)


   The continuation lines should start under the 1st '__be32' on the broken 
up line. See how it was before your patch.



 {
-   const struct iphdr *iph = ip_hdr(skb);
struct metadata_dst *tun_dst;

tun_dst = tun_rx_dst(md_size);
@@ -125,17 +123,27 @@ static inline struct metadata_dst *ip_tun_rx_dst(struct 
sk_buff *skb,

[...]

-static inline struct metadata_dst *ipv6_tun_rx_dst(struct sk_buff *skb,
+static inline struct metadata_dst *ip_tun_rx_dst(struct sk_buff *skb,
 __be16 flags,
 __be64 tunnel_id,
 int md_size)
 {
-   const struct ipv6hdr *ip6h = ipv6_hdr(skb);
+   const struct iphdr *iph = ip_hdr(skb);
+
+   return __ip_tun_set_dst(iph->saddr, iph->daddr, iph->tos, iph->ttl,
+   flags, tunnel_id, md_size);
+}
+
+static inline struct metadata_dst *
+__ipv6_tun_set_dst(const struct in6_addr *saddr, const struct in6_addr *daddr,
+  __u8 tos, __u8 ttl, __be32 label, __be16 flags,
+  __be64 tunnel_id, int md_size)


   The continuation lines should start under the 1st *const* on the broken up 
line.



+{
struct metadata_dst *tun_dst;
struct ip_tunnel_info *info;

@@ -150,14 +158,25 @@ static inline struct metadata_dst *ipv6_tun_rx_dst(struct 
sk_buff *skb,

[...]

+static inline struct metadata_dst *
+ipv6_tun_rx_dst(struct sk_buff *skb, __be16 flags, __be64 tunnel_id,
+   int md_size)
+{
+   const struct ipv6hdr *ip6h = ipv6_hdr(skb);
+
+   return __ipv6_tun_set_dst(&ip6h->saddr, &ip6h->daddr,
+   ipv6_get_dsfield(ip6h), ip6h->hop_limit,
+   ip6_flowlabel(ip6h), flags, tunnel_id,
+   md_size);


   The continuation lines should start exactly under the 1st & on the broken 
up line.
   That's DaveM's preference, I don't remember if checkpatch.pl reports that 
for the networking code...


[...]

MBR, Sergei

[PATCH] vti: use right inner_mode for inbound inter address family policy checks

2016-09-04 Thread Thomas Zeitlhofer

In case of inter address family tunneling (IPv6 over vti4 or IPv4 over
vti6), the inbound policy checks in vti_rcv_cb and vti6_rcv_cb are using
the wrong address family. As a result, all inbound inter address family
traffic is dropped.

Use the xfrm_ip2inner_mode helper (as done in xfrm_prepare_input and
xfrm_input) to select the inner_mode that contains the right address family
for the inbound policy checks.

Signed-off-by: Thomas Zeitlhofer 
---

Notes:
The patch was developed by looking at the code, but without knowledge of
the XFRM code in the kernel. It has been successfully tested, but it is
more a guess that might be helpful for the maintainers to find a proper
solution.

 net/ipv4/ip_vti.c  | 12 +++-
 net/ipv6/ip6_vti.c | 12 +++-
 2 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/ip_vti.c b/net/ipv4/ip_vti.c
index a917903..44d5449 100644
--- a/net/ipv4/ip_vti.c
+++ b/net/ipv4/ip_vti.c
@@ -88,6 +88,7 @@ static int vti_rcv_cb(struct sk_buff *skb, int err)
struct net_device *dev;
struct pcpu_sw_netstats *tstats;
struct xfrm_state *x;
+   struct xfrm_mode *inner_mode;
struct ip_tunnel *tunnel = XFRM_TUNNEL_SKB_CB(skb)->tunnel.ip4;
u32 orig_mark = skb->mark;
int ret;
@@ -105,7 +106,16 @@ static int vti_rcv_cb(struct sk_buff *skb, int err)
}
 
x = xfrm_input_state(skb);
-   family = x->inner_mode->afinfo->family;
+
+   inner_mode = x->inner_mode;
+
+   if (x->sel.family == AF_UNSPEC) {
+   inner_mode = xfrm_ip2inner_mode(x, 
XFRM_MODE_SKB_CB(skb)->protocol);
+   if (inner_mode == NULL)
+   return -EPERM;
+   }
+
+   family = inner_mode->afinfo->family;
 
skb->mark = be32_to_cpu(tunnel->parms.i_key);
ret = xfrm_policy_check(NULL, XFRM_POLICY_IN, skb, family);
diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c
index d90a11f..3149757 100644
--- a/net/ipv6/ip6_vti.c
+++ b/net/ipv6/ip6_vti.c
@@ -340,6 +340,7 @@ static int vti6_rcv_cb(struct sk_buff *skb, int err)
struct net_device *dev;
struct pcpu_sw_netstats *tstats;
struct xfrm_state *x;
+   struct xfrm_mode *inner_mode;
struct ip6_tnl *t = XFRM_TUNNEL_SKB_CB(skb)->tunnel.ip6;
u32 orig_mark = skb->mark;
int ret;
@@ -357,7 +358,16 @@ static int vti6_rcv_cb(struct sk_buff *skb, int err)
}
 
x = xfrm_input_state(skb);
-   family = x->inner_mode->afinfo->family;
+
+   inner_mode = x->inner_mode;
+
+   if (x->sel.family == AF_UNSPEC) {
+   inner_mode = xfrm_ip2inner_mode(x, 
XFRM_MODE_SKB_CB(skb)->protocol);
+   if (inner_mode == NULL)
+   return -EPERM;
+   }
+
+   family = inner_mode->afinfo->family;
 
skb->mark = be32_to_cpu(t->parms.i_key);
ret = xfrm_policy_check(NULL, XFRM_POLICY_IN, skb, family);
-- 
2.1.4
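
Editor's sketch (not part of the patch): the family selection described in
the commit message can be modelled in user space roughly as below. This is a
simplified toy under stated assumptions: struct toy_state and
toy_policy_family() are hypothetical names, and the fallback rule only
approximates what xfrm_ip2inner_mode() is understood to do in the kernel.

```c
#include <errno.h>
#include <netinet/in.h>   /* IPPROTO_IPIP, IPPROTO_IPV6 */
#include <sys/socket.h>   /* AF_UNSPEC, AF_INET, AF_INET6 */

/* Hypothetical, heavily reduced stand-in for struct xfrm_state. */
struct toy_state {
	int sel_family;        /* selector family; AF_UNSPEC means "any" */
	int props_family;      /* outer (tunnel) address family */
	int inner_family;      /* family of the default inner mode */
	int inner_family_iaf;  /* inter-address-family mode, AF_UNSPEC if none */
};

/* Pick the address family to use for the inbound policy check. */
int toy_policy_family(const struct toy_state *x, int ipproto)
{
	/* A specific selector family: the default inner mode is right. */
	if (x->sel_family != AF_UNSPEC)
		return x->inner_family;

	/* Inner protocol matches the outer family: same-family tunnel. */
	if ((ipproto == IPPROTO_IPIP && x->props_family == AF_INET) ||
	    (ipproto == IPPROTO_IPV6 && x->props_family == AF_INET6))
		return x->inner_family;

	/* Inter address family traffic needs the alternate mode. */
	if (x->inner_family_iaf == AF_UNSPEC)
		return -EPERM;
	return x->inner_family_iaf;
}
```

With a vti4 state (props_family AF_INET, sel_family AF_UNSPEC), an inner
IPv6 packet yields AF_INET6 here, whereas always using the default inner
mode would yield AF_INET, which is the mismatch the patch describes.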

Re: [ovs-dev] [PATCH net-next v21 3/4] openvswitch: 802.1AD Flow handling, actions, vlan parsing, netlink attributes

2016-09-04 Thread Jiri Benc

On Sat, 3 Sep 2016 13:30:12 -0400, Eric Garver wrote:
> Would a BUG_ON(!(encap && in_encap)) be better?

Please don't crash the kernel for something that could very well
continue without problems. Use WARN_ON at most.

And if you go that way, WARN_ON_ONCE or rate limiting seems to be even
more appropriate, because if this triggers, it's quite possible it will
trigger repeatedly and the resulting log flood would practically make
the machine useless anyway.

Thanks,

 Jiri
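
Editor's sketch (not from the thread): a user-space approximation of the
"warn once and keep going" behaviour suggested above. The macro
warn_on_once(), the counter warnings_printed and the function
handle_vlan_encap() are hypothetical names for illustration; the macro
relies on the GNU statement-expression extension (GCC/Clang) and is not the
kernel's WARN_ON_ONCE implementation.

```c
#include <stdbool.h>
#include <stdio.h>

/* Counts how many warnings were actually printed (for illustration). */
int warnings_printed;

/*
 * Evaluates cond on every call and returns its truth value, but prints
 * a message only the first time the condition fires at this call site.
 */
#define warn_on_once(cond) ({                                      \
	static bool warned_;                                       \
	bool ret_ = !!(cond);                                      \
	if (ret_ && !warned_) {                                    \
		warned_ = true;                                    \
		warnings_printed++;                                \
		fprintf(stderr, "WARNING at %s:%d: %s\n",          \
			__FILE__, __LINE__, #cond);                \
	}                                                          \
	ret_;                                                      \
})

/* Returns 0 on success, -1 on an inconsistent state; never aborts. */
int handle_vlan_encap(bool encap, bool in_encap)
{
	if (warn_on_once(!(encap && in_encap)))
		return -1;
	return 0;
}
```

However often the inconsistent state is hit, the log receives a single
line and the caller still gets an error return to act on.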

[PATCH] vxlan: Update tx_errors statistics if vxlan_build_skb return err.

2016-09-04 Thread Haishuang Yan

If vxlan_build_skb returns err < 0, tx_errors should also be incremented.

Signed-off-by: Haishuang Yan 
---
 drivers/net/vxlan.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index f605a36..2c72dcd 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -2103,6 +2103,7 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct 
net_device *dev,
  vni, md, flags, udp_sum);
if (err < 0) {
dst_release(ndst);
+   dev->stats.tx_errors++;
return;
}
udp_tunnel6_xmit_skb(ndst, sk, skb, dev,
-- 
1.8.3.1

[PATCH net-next V5 3/4] net/sched: cls_flower: Classify packet in ip tunnels

2016-09-04 Thread Hadar Hen Zion

From: Amir Vadai 

Introduce classifying by metadata extracted by the tunnel device.
Outer header fields - source/dest ip and tunnel id, are extracted from
the metadata when classifying.

For example, the following adds a filter on the ingress Qdisc of a shared
vxlan device named 'vxlan0'. It matches packets with outer src ip
11.11.0.2, dst ip 11.11.0.1 and tunnel id 11, and forwards them to the
tap device 'vnet0' (after the metadata is released):

$ tc filter add dev vxlan0 protocol ip parent : \
flower \
  enc_src_ip 11.11.0.2 \
  enc_dst_ip 11.11.0.1 \
  enc_key_id 11 \
  dst_ip 11.11.11.1 \
action tunnel_key release \
action mirred egress redirect dev vnet0

The tunnel_key action will be introduced in the next patch in this
series.

Signed-off-by: Amir Vadai 
Signed-off-by: Hadar Hen Zion 
Acked-by: Jiri Pirko 
---
 include/uapi/linux/pkt_cls.h |  11 +
 net/sched/cls_flower.c   | 100 ++-
 2 files changed, 110 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index 51b5b24..f9c287c 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -431,6 +431,17 @@ enum {
TCA_FLOWER_KEY_VLAN_ID,
TCA_FLOWER_KEY_VLAN_PRIO,
TCA_FLOWER_KEY_VLAN_ETH_TYPE,
+
+   TCA_FLOWER_KEY_ENC_KEY_ID,  /* be32 */
+   TCA_FLOWER_KEY_ENC_IPV4_SRC,/* be32 */
+   TCA_FLOWER_KEY_ENC_IPV4_SRC_MASK,/* be32 */
+   TCA_FLOWER_KEY_ENC_IPV4_DST,/* be32 */
+   TCA_FLOWER_KEY_ENC_IPV4_DST_MASK,/* be32 */
+   TCA_FLOWER_KEY_ENC_IPV6_SRC,/* struct in6_addr */
+   TCA_FLOWER_KEY_ENC_IPV6_SRC_MASK,/* struct in6_addr */
+   TCA_FLOWER_KEY_ENC_IPV6_DST,/* struct in6_addr */
+   TCA_FLOWER_KEY_ENC_IPV6_DST_MASK,/* struct in6_addr */
+
__TCA_FLOWER_MAX,
 };
 
diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index cf9ad5b..b084b2a 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -23,9 +23,13 @@
 #include 
 #include 
 
+#include 
+#include 
+
 struct fl_flow_key {
int indev_ifindex;
struct flow_dissector_key_control control;
+   struct flow_dissector_key_control enc_control;
struct flow_dissector_key_basic basic;
struct flow_dissector_key_eth_addrs eth;
struct flow_dissector_key_vlan vlan;
@@ -35,6 +39,11 @@ struct fl_flow_key {
struct flow_dissector_key_ipv6_addrs ipv6;
};
struct flow_dissector_key_ports tp;
+   struct flow_dissector_key_keyid enc_key_id;
+   union {
+   struct flow_dissector_key_ipv4_addrs enc_ipv4;
+   struct flow_dissector_key_ipv6_addrs enc_ipv6;
+   };
 } __aligned(BITS_PER_LONG / 8); /* Ensure that we can do comparisons as longs. 
*/
 
 struct fl_flow_mask_range {
@@ -124,11 +133,31 @@ static int fl_classify(struct sk_buff *skb, const struct 
tcf_proto *tp,
struct cls_fl_filter *f;
struct fl_flow_key skb_key;
struct fl_flow_key skb_mkey;
+   struct ip_tunnel_info *info;
 
if (!atomic_read(&head->ht.nelems))
return -1;
 
fl_clear_masked_range(&skb_key, &head->mask);
+
+   info = skb_tunnel_info(skb);
+   if (info) {
+   struct ip_tunnel_key *key = &info->key;
+
+   switch (ip_tunnel_info_af(info)) {
+   case AF_INET:
+   skb_key.enc_ipv4.src = key->u.ipv4.src;
+   skb_key.enc_ipv4.dst = key->u.ipv4.dst;
+   break;
+   case AF_INET6:
+   skb_key.enc_ipv6.src = key->u.ipv6.src;
+   skb_key.enc_ipv6.dst = key->u.ipv6.dst;
+   break;
+   }
+
+   skb_key.enc_key_id.keyid = tunnel_id_to_key32(key->tun_id);
+   }
+
skb_key.indev_ifindex = skb->skb_iif;
/* skb_flow_dissect() does not set n_proto in case an unknown protocol,
 * so do it rather here.
@@ -297,7 +326,15 @@ static const struct nla_policy fl_policy[TCA_FLOWER_MAX + 
1] = {
[TCA_FLOWER_KEY_VLAN_ID]= { .type = NLA_U16 },
[TCA_FLOWER_KEY_VLAN_PRIO]  = { .type = NLA_U8 },
[TCA_FLOWER_KEY_VLAN_ETH_TYPE]  = { .type = NLA_U16 },
-
+   [TCA_FLOWER_KEY_ENC_KEY_ID] = { .type = NLA_U32 },
+   [TCA_FLOWER_KEY_ENC_IPV4_SRC]   = { .type = NLA_U32 },
+   [TCA_FLOWER_KEY_ENC_IPV4_SRC_MASK] = { .type = NLA_U32 },
+   [TCA_FLOWER_KEY_ENC_IPV4_DST]   = { .type = NLA_U32 },
+   [TCA_FLOWER_KEY_ENC_IPV4_DST_MASK] = { .type = NLA_U32 },
+   [TCA_FLOWER_KEY_ENC_IPV6_SRC]   = { .len = sizeof(struct in6_addr) },
+   [TCA_FLOWER_KEY_ENC_IPV6_SRC_MASK] = { .len = sizeof(struct in6_addr) },
+   [TCA_FLOWER_KEY_ENC_IPV6_DST]   = { .len = sizeof(struct in6_addr) },
+   [TCA_FLOWER_KEY_ENC_IPV6_DST_MASK] = { .len =

[PATCH net-next V5 0/4] net/sched: ip tunnel metadata set/release/classify by using TC

2016-09-04 Thread Hadar Hen Zion

Hi,

This patchset introduces ip tunnel manipulation support using the TC subsystem.

In the decap flow, it enables the user to redirect packets from a shared tunnel
device and classify by outer and inner headers. The outer headers are extracted
from the metadata and used by the flower filter. A new action act_tunnel_key,
releases the metadata.

In the encap flow, act_tunnel_key creates a metadata object to be used by the
shared tunnel device. The actual redirection to the tunnel device is done using
act_mirred.

For example:
$ tc qdisc add dev vnet0 ingress
$ tc filter add dev vnet0 protocol ip parent : \
flower \
 ip_proto 1 \
action tunnel_key set \
 src_ip 11.11.0.1 \
 dst_ip 11.11.0.2 \
 id 11 \
action mirred egress redirect dev vxlan0

$ tc qdisc add dev vxlan0 ingress
$ tc filter add dev vxlan0 protocol ip parent : \
flower \
 enc_src_ip 11.11.0.2 \
 enc_dst_ip 11.11.0.1 \
 enc_key_id 11 \
action tunnel_key release \
action mirred egress redirect dev vnet0

Amir & Hadar

Changes from V4:
- Fix tunnel_key_init function error flow.
- Add 'action' variable to struct tcf_tunnel_key_params and use it instead of
  tcf_action variable which is not protected by rcu lock.

Changes from V3:
- Use percpu stats
- No spinlock on datapath - protecting parameters with rcu
- Fix buggy handling of set/release dst
- Use nla_get_in_addr and nla_put_in_addr
- Fix change logs
- Pass in6_addr by pointer
- Rename utility functions to start with double underscore

Changes from V2:
- Use union in struct fl_flow_key for enc_ipv6 and enc_ipv4.
- Rename functions _ip_tun_rx_dst and _ipv6_tun_rx_dst to _ip_tun_set_dst and
  _ipv6_tun_set_dst accordingly.
- Remove local parameter 'encapdecap' from tunnel_key_init function.
- Don't copy in6_addr values in tunnel_key_dump_addresses function, use 
pointers.

Changes from V1:
- More cleanups to key32_to_tunnel_id() and tunnel_id_to_key32()
- IPv6 Support added
- Set TUNNEL_KEY flag to make GRE work
- Handle zero tunnel id properly in act_tunnel_key
- Don't leave junk in decap action
- Fix bug in act_tunnel_key initialization where (exists & ocr) is true
- Remove BUG() from code
- Rename action to tunnel_key
- Improve grep-ability of code
- Reuse code from ip_tun_rx_dst() and ipv6_tun_rx_dst()

Changes from RFC:
- Add a new action instead of making mirred too complex
- No need to specify UDP port in action - it is already in the tunnel device
  configuration
- Added a decap operation to drop tunnel metadata

Amir Vadai (4):
  net/ip_tunnels: Introduce tunnel_id_to_key32() and
key32_to_tunnel_id()
  net/dst: Utility functions to build dst_metadata without supplying an
skb
  net/sched: cls_flower: Classify packet in ip tunnels
  net/sched: Introduce act_tunnel_key

 drivers/net/vxlan.c   |   4 +-
 include/net/dst_metadata.h|  45 ++--
 include/net/ip_tunnels.h  |  19 ++
 include/net/tc_act/tc_tunnel_key.h|  31 +++
 include/net/vxlan.h   |  18 --
 include/uapi/linux/pkt_cls.h  |  11 +
 include/uapi/linux/tc_act/tc_tunnel_key.h |  42 
 net/ipv4/ip_gre.c |  23 +-
 net/sched/Kconfig |  11 +
 net/sched/Makefile|   1 +
 net/sched/act_tunnel_key.c| 348 ++
 net/sched/cls_flower.c| 100 -
 12 files changed, 598 insertions(+), 55 deletions(-)
 create mode 100644 include/net/tc_act/tc_tunnel_key.h
 create mode 100644 include/uapi/linux/tc_act/tc_tunnel_key.h
 create mode 100644 net/sched/act_tunnel_key.c

-- 
1.8.3.1

[PATCH net-next V5 4/4] net/sched: Introduce act_tunnel_key

2016-09-04 Thread Hadar Hen Zion

From: Amir Vadai 

This action could be used before redirecting packets to a shared tunnel
device, or when redirecting packets arriving from such a device.

The action will release the metadata created by the tunnel device
(decap), or set the metadata with the specified values for encap
operation.

For example, the following flower filter will forward all ICMP packets
destined to 11.11.11.2 through the shared vxlan device 'vxlan0'. Before
redirecting, metadata for the vxlan tunnel is created using the
tunnel_key action and its arguments:

$ tc filter add dev net0 protocol ip parent : \
flower \
  ip_proto 1 \
  dst_ip 11.11.11.2 \
action tunnel_key set \
  src_ip 11.11.0.1 \
  dst_ip 11.11.0.2 \
  id 11 \
action mirred egress redirect dev vxlan0

Signed-off-by: Amir Vadai 
Signed-off-by: Hadar Hen Zion 
---
 include/net/tc_act/tc_tunnel_key.h|  31 +++
 include/uapi/linux/tc_act/tc_tunnel_key.h |  42 
 net/sched/Kconfig |  11 +
 net/sched/Makefile|   1 +
 net/sched/act_tunnel_key.c| 348 ++
 5 files changed, 433 insertions(+)
 create mode 100644 include/net/tc_act/tc_tunnel_key.h
 create mode 100644 include/uapi/linux/tc_act/tc_tunnel_key.h
 create mode 100644 net/sched/act_tunnel_key.c

diff --git a/include/net/tc_act/tc_tunnel_key.h 
b/include/net/tc_act/tc_tunnel_key.h
new file mode 100644
index 000..7c34652
--- /dev/null
+++ b/include/net/tc_act/tc_tunnel_key.h
@@ -0,0 +1,31 @@
+/*
+ * Copyright (c) 2016, Amir Vadai 
+ * Copyright (c) 2016, Mellanox Technologies. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#ifndef __NET_TC_TUNNEL_KEY_H
+#define __NET_TC_TUNNEL_KEY_H
+
+#include 
+
+struct tcf_tunnel_key_params {
+   struct rcu_head rcu;
+   int tcft_action;
+   int action;
+   struct metadata_dst *tcft_enc_metadata;
+};
+
+struct tcf_tunnel_key {
+   struct tc_action  common;
+   struct tcf_tunnel_key_params *params;
+};
+
+#define to_tunnel_key(a) ((struct tcf_tunnel_key *)a)
+
+#endif /* __NET_TC_TUNNEL_KEY_H */
+
diff --git a/include/uapi/linux/tc_act/tc_tunnel_key.h 
b/include/uapi/linux/tc_act/tc_tunnel_key.h
new file mode 100644
index 000..f9ddf53
--- /dev/null
+++ b/include/uapi/linux/tc_act/tc_tunnel_key.h
@@ -0,0 +1,42 @@
+/*
+ * Copyright (c) 2016, Amir Vadai 
+ * Copyright (c) 2016, Mellanox Technologies. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#ifndef __LINUX_TC_TUNNEL_KEY_H
+#define __LINUX_TC_TUNNEL_KEY_H
+
+#include 
+
+#define TCA_ACT_TUNNEL_KEY 17
+
+#define TCA_TUNNEL_KEY_ACT_SET 1
+#define TCA_TUNNEL_KEY_ACT_RELEASE  2
+
+struct tc_tunnel_key {
+   tc_gen;
+   int t_action;
+};
+
+enum {
+   TCA_TUNNEL_KEY_UNSPEC,
+   TCA_TUNNEL_KEY_TM,
+   TCA_TUNNEL_KEY_PARMS,
+   TCA_TUNNEL_KEY_ENC_IPV4_SRC,/* be32 */
+   TCA_TUNNEL_KEY_ENC_IPV4_DST,/* be32 */
+   TCA_TUNNEL_KEY_ENC_IPV6_SRC,/* struct in6_addr */
+   TCA_TUNNEL_KEY_ENC_IPV6_DST,/* struct in6_addr */
+   TCA_TUNNEL_KEY_ENC_KEY_ID,  /* be64 */
+   TCA_TUNNEL_KEY_PAD,
+   __TCA_TUNNEL_KEY_MAX,
+};
+
+#define TCA_TUNNEL_KEY_MAX (__TCA_TUNNEL_KEY_MAX - 1)
+
+#endif
+
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index ccf931b..72e3426 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -761,6 +761,17 @@ config NET_ACT_IFE
  To compile this code as a module, choose M here: the
  module will be called act_ife.
 
+config NET_ACT_TUNNEL_KEY
+tristate "IP tunnel metadata manipulation"
+depends on NET_CLS_ACT
+---help---
+ Say Y here to set/release ip tunnel metadata.
+
+ If unsure, say N.
+
+ To compile this code as a module, choose M here: the
+ module will be called act_tunnel_key.
+
 config NET_IFE_SKBMARK
 tristate "Support to encoding decoding skb mark on IFE action"
 depends on NET_ACT_IFE
diff --git a/net/sched/Makefile b/net/sched/Makefile
index ae088a5..b9d046b 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -22,6 +22,7 @@ obj-$(CONFIG_NET_ACT_CONNMARK)+= act_connmark.o
 obj-$(CONFIG_NET_ACT_IFE)  += act_ife.o
 obj-$(CONFIG_NET_IFE_SKBMARK)  += act_meta_mark.o
 obj-$(CONFIG_NET_IFE_SKBPRIO)  += act_meta_skbprio.o

[PATCH net-next V5 2/4] net/dst: Utility functions to build dst_metadata without supplying an skb

2016-09-04 Thread Hadar Hen Zion

From: Amir Vadai 

Extract __ip_tun_set_dst() and __ipv6_tun_set_dst() out of
ip_tun_rx_dst() and ipv6_tun_rx_dst(), to be used without supplying an
skb.

Signed-off-by: Amir Vadai 
Signed-off-by: Hadar Hen Zion 
Acked-by: Jiri Pirko 
Reviewed-by: Shmulik Ladkani 
---
 include/net/dst_metadata.h | 45 -
 1 file changed, 32 insertions(+), 13 deletions(-)

diff --git a/include/net/dst_metadata.h b/include/net/dst_metadata.h
index 5db9f59..49e8847 100644
--- a/include/net/dst_metadata.h
+++ b/include/net/dst_metadata.h
@@ -112,12 +112,10 @@ static inline struct ip_tunnel_info 
*skb_tunnel_info_unclone(struct sk_buff *skb
return &dst->u.tun_info;
 }
 
-static inline struct metadata_dst *ip_tun_rx_dst(struct sk_buff *skb,
-__be16 flags,
-__be64 tunnel_id,
-int md_size)
+static inline struct metadata_dst *
+__ip_tun_set_dst(__be32 saddr, __be32 daddr, __u8 tos, __u8 ttl,
+__be16 flags, __be64 tunnel_id, int md_size)
 {
-   const struct iphdr *iph = ip_hdr(skb);
struct metadata_dst *tun_dst;
 
tun_dst = tun_rx_dst(md_size);
@@ -125,17 +123,27 @@ static inline struct metadata_dst *ip_tun_rx_dst(struct 
sk_buff *skb,
return NULL;
 
ip_tunnel_key_init(&tun_dst->u.tun_info.key,
-  iph->saddr, iph->daddr, iph->tos, iph->ttl,
+  saddr, daddr, tos, ttl,
   0, 0, 0, tunnel_id, flags);
return tun_dst;
 }
 
-static inline struct metadata_dst *ipv6_tun_rx_dst(struct sk_buff *skb,
+static inline struct metadata_dst *ip_tun_rx_dst(struct sk_buff *skb,
 __be16 flags,
 __be64 tunnel_id,
 int md_size)
 {
-   const struct ipv6hdr *ip6h = ipv6_hdr(skb);
+   const struct iphdr *iph = ip_hdr(skb);
+
+   return __ip_tun_set_dst(iph->saddr, iph->daddr, iph->tos, iph->ttl,
+   flags, tunnel_id, md_size);
+}
+
+static inline struct metadata_dst *
+__ipv6_tun_set_dst(const struct in6_addr *saddr, const struct in6_addr *daddr,
+  __u8 tos, __u8 ttl, __be32 label, __be16 flags,
+  __be64 tunnel_id, int md_size)
+{
struct metadata_dst *tun_dst;
struct ip_tunnel_info *info;
 
@@ -150,14 +158,25 @@ static inline struct metadata_dst *ipv6_tun_rx_dst(struct 
sk_buff *skb,
info->key.tp_src = 0;
info->key.tp_dst = 0;
 
-   info->key.u.ipv6.src = ip6h->saddr;
-   info->key.u.ipv6.dst = ip6h->daddr;
+   info->key.u.ipv6.src = *saddr;
+   info->key.u.ipv6.dst = *daddr;
 
-   info->key.tos = ipv6_get_dsfield(ip6h);
-   info->key.ttl = ip6h->hop_limit;
-   info->key.label = ip6_flowlabel(ip6h);
+   info->key.tos = tos;
+   info->key.ttl = ttl;
+   info->key.label = label;
 
return tun_dst;
 }
 
+static inline struct metadata_dst *
+ipv6_tun_rx_dst(struct sk_buff *skb, __be16 flags, __be64 tunnel_id,
+   int md_size)
+{
+   const struct ipv6hdr *ip6h = ipv6_hdr(skb);
+
+   return __ipv6_tun_set_dst(&ip6h->saddr, &ip6h->daddr,
+   ipv6_get_dsfield(ip6h), ip6h->hop_limit,
+   ip6_flowlabel(ip6h), flags, tunnel_id,
+   md_size);
+}
 #endif /* __NET_DST_METADATA_H */
-- 
1.8.3.1

[PATCH net-next V5 1/4] net/ip_tunnels: Introduce tunnel_id_to_key32() and key32_to_tunnel_id()

2016-09-04 Thread Hadar Hen Zion

From: Amir Vadai 

Add utility functions to convert a 32-bit key into a 64-bit tunnel ID and
vice versa.
These functions will be used instead of duplicated code in GRE and VXLAN,
and in tc act_tunnel_key which will be introduced in a following patch in
this patchset.

Signed-off-by: Amir Vadai 
Signed-off-by: Hadar Hen Zion 
Reviewed-by: Shmulik Ladkani 
Acked-by: Jiri Benc 
Acked-by: Jiri Pirko 
---
 drivers/net/vxlan.c  |  4 ++--
 include/net/ip_tunnels.h | 19 +++
 include/net/vxlan.h  | 18 --
 net/ipv4/ip_gre.c| 23 ++-
 4 files changed, 23 insertions(+), 41 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index f605a36..dc1a412 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1291,7 +1291,7 @@ static int vxlan_rcv(struct sock *sk, struct sk_buff *skb)
struct metadata_dst *tun_dst;
 
tun_dst = udp_tun_rx_dst(skb, vxlan_get_sk_family(vs), 
TUNNEL_KEY,
-vxlan_vni_to_tun_id(vni), sizeof(*md));
+key32_to_tunnel_id(vni), sizeof(*md));
 
if (!tun_dst)
goto drop;
@@ -1945,7 +1945,7 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct 
net_device *dev,
goto drop;
}
dst_port = info->key.tp_dst ? : vxlan->cfg.dst_port;
-   vni = vxlan_tun_id_to_vni(info->key.tun_id);
+   vni = tunnel_id_to_key32(info->key.tun_id);
remote_ip.sa.sa_family = ip_tunnel_info_af(info);
if (remote_ip.sa.sa_family == AF_INET) {
remote_ip.sin.sin_addr.s_addr = info->key.u.ipv4.dst;
diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index a5e7035..e598c63 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -222,6 +222,25 @@ static inline unsigned short ip_tunnel_info_af(const 
struct ip_tunnel_info
return tun_info->mode & IP_TUNNEL_INFO_IPV6 ? AF_INET6 : AF_INET;
 }
 
+static inline __be64 key32_to_tunnel_id(__be32 key)
+{
+#ifdef __BIG_ENDIAN
+   return (__force __be64)key;
+#else
+   return (__force __be64)((__force u64)key << 32);
+#endif
+}
+
+/* Returns the least-significant 32 bits of a __be64. */
+static inline __be32 tunnel_id_to_key32(__be64 tun_id)
+{
+#ifdef __BIG_ENDIAN
+   return (__force __be32)tun_id;
+#else
+   return (__force __be32)((__force u64)tun_id >> 32);
+#endif
+}
+
 #ifdef CONFIG_INET
 
 int ip_tunnel_init(struct net_device *dev);
diff --git a/include/net/vxlan.h b/include/net/vxlan.h
index b96d036..0255613 100644
--- a/include/net/vxlan.h
+++ b/include/net/vxlan.h
@@ -350,24 +350,6 @@ static inline __be32 vxlan_vni_field(__be32 vni)
 #endif
 }
 
-static inline __be32 vxlan_tun_id_to_vni(__be64 tun_id)
-{
-#if defined(__BIG_ENDIAN)
-   return (__force __be32)tun_id;
-#else
-   return (__force __be32)((__force u64)tun_id >> 32);
-#endif
-}
-
-static inline __be64 vxlan_vni_to_tun_id(__be32 vni)
-{
-#if defined(__BIG_ENDIAN)
-   return (__force __be64)vni;
-#else
-   return (__force __be64)((u64)(__force u32)vni << 32);
-#endif
-}
-
 static inline size_t vxlan_rco_start(__be32 vni_field)
 {
return be32_to_cpu(vni_field & VXLAN_RCO_MASK) << VXLAN_RCO_SHIFT;
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 113cc43..576f705 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -246,25 +246,6 @@ static void gre_err(struct sk_buff *skb, u32 info)
ipgre_err(skb, info, &tpi);
 }
 
-static __be64 key_to_tunnel_id(__be32 key)
-{
-#ifdef __BIG_ENDIAN
-   return (__force __be64)((__force u32)key);
-#else
-   return (__force __be64)((__force u64)key << 32);
-#endif
-}
-
-/* Returns the least-significant 32 bits of a __be64. */
-static __be32 tunnel_id_to_key(__be64 x)
-{
-#ifdef __BIG_ENDIAN
-   return (__force __be32)x;
-#else
-   return (__force __be32)((__force u64)x >> 32);
-#endif
-}
-
 static int __ipgre_rcv(struct sk_buff *skb, const struct tnl_ptk_info *tpi,
   struct ip_tunnel_net *itn, int hdr_len, bool raw_proto)
 {
@@ -290,7 +271,7 @@ static int __ipgre_rcv(struct sk_buff *skb, const struct 
tnl_ptk_info *tpi,
__be64 tun_id;
 
flags = tpi->flags & (TUNNEL_CSUM | TUNNEL_KEY);
-   tun_id = key_to_tunnel_id(tpi->key);
+   tun_id = key32_to_tunnel_id(tpi->key);
tun_dst = ip_tun_rx_dst(skb, flags, tun_id, 0);
if (!tun_dst)
return PACKET_REJECT;
@@ -446,7 +427,7 @@ static void gre_fb_xmit(struct sk_buff *skb, struct 
net_device *dev,
 
flags = tun_info->key.tun_flags & (TUNNEL_CSUM | TUNNEL_KEY);
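
Editor's sketch (not part of the patch): the two helpers place a 32-bit
network-order key in the least-significant 32 bits of a 64-bit network-order
tunnel ID, and extract it again. The user-space version below shows the same
layout without the __BIG_ENDIAN #ifdef by copying bytes directly. The names
key32_to_tunnel_id_sketch() and tunnel_id_to_key32_sketch() are hypothetical,
and plain uint32_t/uint64_t stand in for __be32/__be64.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>   /* htonl(), handy for building test keys */

/* key_be is in network byte order; so is the returned tunnel ID. */
uint64_t key32_to_tunnel_id_sketch(uint32_t key_be)
{
	unsigned char buf[8] = { 0 };
	uint64_t id_be;

	/* In big-endian layout the low 32 bits are the last four bytes. */
	memcpy(buf + 4, &key_be, sizeof(key_be));
	memcpy(&id_be, buf, sizeof(id_be));
	return id_be;
}

/* Returns the least-significant 32 bits of a network-order 64-bit ID. */
uint32_t tunnel_id_to_key32_sketch(uint64_t id_be)
{
	unsigned char buf[8];
	uint32_t key_be;

	memcpy(buf, &id_be, sizeof(id_be));
	memcpy(&key_be, buf + 4, sizeof(key_be));
	return key_be;
}
```

For a key built as htonl(11), the resulting tunnel ID has the byte 11 in its
last position in memory, and converting back returns the original key on
either endianness.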

Re: [PATCH] vhost: Add polling mode

2016-09-04 Thread Razya Ladelsky

"Michael S. Tsirkin"  wrote on 10/08/2014 10:45:59 PM:

> From: "Michael S. Tsirkin" 
> To: Razya Ladelsky/Haifa/IBM@IBMIL, 
> Cc: k...@vger.kernel.org, Alex Glikson/Haifa/IBM@IBMIL, Eran 
> Raichstein/Haifa/IBM@IBMIL, Yossi Kuperman1/Haifa/IBM@IBMIL, Joel 
> Nider/Haifa/IBM@IBMIL, abel.gor...@gmail.com, linux-
> ker...@vger.kernel.org, netdev@vger.kernel.org, 
> virtualizat...@lists.linux-foundation.org
> Date: 10/08/2014 10:45 PM
> Subject: Re: [PATCH] vhost: Add polling mode
> 
> On Sun, Aug 10, 2014 at 11:30:35AM +0300, Razya Ladelsky wrote:
> > From: Razya Ladelsky 
> > Date: Thu, 31 Jul 2014 09:47:20 +0300
> > Subject: [PATCH] vhost: Add polling mode
> > 
> > When vhost is waiting for buffers from the guest driver (e.g., more packets to
> > send in vhost-net's transmit queue), it normally goes to sleep and waits for the
> > guest to "kick" it. This kick involves a PIO in the guest, and therefore an exit
> > (and possibly userspace involvement in translating this PIO exit into a file
> > descriptor event), all of which hurts performance.
> > 
> > If the system is under-utilized (has cpu time to spare), vhost can continuously
> > poll the virtqueues for new buffers, and avoid asking the guest to kick us.
> > This patch adds an optional polling mode to vhost, that can be enabled via a
> > kernel module parameter, "poll_start_rate".
> > 
> > When polling is active for a virtqueue, the guest is asked to disable
> > notification (kicks), and the worker thread continuously checks for new buffers.
> > When it does discover new buffers, it simulates a "kick" by invoking the
> > underlying backend driver (such as vhost-net), which thinks it got a real kick
> > from the guest, and acts accordingly. If the underlying driver asks not to be
> > kicked, we disable polling on this virtqueue.
> > 
> > We start polling on a virtqueue when we notice it has work to do. Polling on
> > this virtqueue is later disabled after 3 seconds of polling turning up no new
> > work, as in this case we are better off returning to the exit-based notification
> > mechanism. The default timeout of 3 seconds can be changed with the
> > "poll_stop_idle" kernel module parameter.
> > 
> > This polling approach makes lot of sense for new HW with posted-interrupts for
> > which we have exitless host-to-guest notifications. But even with support for
> > posted interrupts, guest-to-host communication still causes exits. Polling adds
> > the missing part.
> > 
> > When systems are overloaded, there won't be enough cpu time for the various
> > vhost threads to poll their guests' devices. For these scenarios, we plan to add
> > support for vhost threads that can be shared by multiple devices, even of
> > multiple vms.
> > Our ultimate goal is to implement the I/O acceleration features described in:
> > KVM Forum 2013: Efficient and Scalable Virtio (by Abel Gordon)
> > https://www.youtube.com/watch?v=9EyweibHfEs
> > and
> > https://www.mail-archive.com/kvm@vger.kernel.org/msg98179.html
> > 
> > I ran some experiments with TCP stream netperf and filebench (having 2 threads
> > performing random reads) benchmarks on an IBM System x3650 M4.
> > I have two machines, A and B. A hosts the vms, B runs the netserver.
> > The vms (on A) run netperf, its destination server is running on B.
> > All runs loaded the guests in a way that they were (cpu) saturated. For example,
> > I ran netperf with 64B messages, which is heavily loading the vm (which is why
> > its throughput is low).
> > The idea was to get it 100% loaded, so we can see that the polling is getting it
> > to produce higher throughput.
> 
> And, did your tests actually produce 100% load on both host CPUs?
> 

The vm indeed utilized 100% cpu, whether polling was enabled or not.
The vhost thread utilized less than 100% (of the other cpu) when polling
was disabled.
Enabling polling increased its utilization to 100% (in which case both
cpus were 100% utilized).
 

> > The system had two cores per guest, as to allow for both the vcpu and the vhost
> > thread to run concurrently for maximum throughput (but I didn't pin the threads
> > to specific cores).
> > My experiments were fair in a sense that for both cases, with or without
> > polling, I run both threads, vcpu and vhost, on 2 cores (set their affinity that
> > way). The only difference was whether polling was enabled/disabled.
> > 
> > Results:
> > 
> > Netperf, 1 vm:
> > The polling patch improved throughput by ~33% (1516 MB/sec -> 2046 MB/sec).
> > Number of exits/sec decreased 6x.
> > The same improvement was shown when I tested with 3 vms running netperf
> > (4086 MB/sec -> 5545 MB/sec).
> > 
> > filebench, 1 vm:
> > ops/sec improved by 13% with the polling patch. Number of exits was reduced by
> > 31%.
> > The same experiment with 3 vms running filebench showed similar numbers.
> > 
> > Signed-off-by: Razya
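
Editor's sketch (not from the patch): the start/stop rule quoted above
(start polling when work is seen, stop after a period of polling that turns
up nothing) can be modelled as a small state machine. Everything here is an
assumption for illustration: struct vq_poll and vq_poll_tick() are
hypothetical names, time is an abstract counter in seconds, and the
"poll_start_rate" threshold is not modelled.

```c
#include <stdbool.h>

/* Hypothetical per-virtqueue polling state. */
struct vq_poll {
	bool polling;             /* currently busy-polling this queue? */
	unsigned long last_work;  /* time at which work was last found */
	unsigned long stop_idle;  /* idle timeout, cf. "poll_stop_idle" */
};

/*
 * One polling step at time 'now'. Returns true when the backend should
 * be invoked as if the guest had kicked it.
 */
bool vq_poll_tick(struct vq_poll *vq, unsigned long now, bool has_work)
{
	if (has_work) {
		/* Work seen: (re)start polling and reset the idle clock. */
		vq->polling = true;
		vq->last_work = now;
		return true;
	}

	/* Idle for too long: fall back to guest notifications. */
	if (vq->polling && now - vq->last_work >= vq->stop_idle)
		vq->polling = false;
	return false;
}
```

With stop_idle set to 3, a queue that last had work at t=1 is still polled
at t=3 and returns to exit-based notification at t=4.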

RE: [PATCH] qed: Remove OOM messages

2016-09-04 Thread Yuval Mintz


> These messages are unnecessary as OOM allocation failures already do a
> dump_stack() giving more or less the same information.
> 
> $ size drivers/net/ethernet/qlogic/qed/built-in.o* (defconfig x86-64)
>    text    data     bss     dec     hex filename
>  126849   27968   32800  187617   2dce1 drivers/net/ethernet/qlogic/qed/built-in.o.new
>  131506   27968   32800  192274   2ef12 drivers/net/ethernet/qlogic/qed/built-in.o.old
> 
> Miscellanea:
> 
> o Change allocs to the generally preferred forms where possible.
> 
> Signed-off-by: Joe Perches 

Looking good. Thanks Joe.

Acked-by: Yuval Mintz
