Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )

2015-06-19 Thread Trond Myklebust
On Fri, 2015-06-19 at 15:52 -0400, Jeff Layton wrote:
> On Fri, 19 Jun 2015 13:39:08 -0400
> Trond Myklebust trond.mykleb...@primarydata.com wrote:
> 
> > On Fri, Jun 19, 2015 at 1:17 PM, Steven Rostedt rost...@goodmis.org wrote:
> > > On Fri, 19 Jun 2015 12:25:53 -0400
> > > Steven Rostedt rost...@goodmis.org wrote:
> > > 
> > > 
> > > > I don't see that 55201 anywhere. But then again, I didn't look for it
> > > > before the port disappeared. I could reboot and look for it again. I
> > > > should have saved the full netstat -tapn as well :-/
> > > 
> > > Of course I didn't find it anywhere, that's the port on my wife's box
> > > that port 947 was connected to.
> > > 
> > > Now I even went over to my wife's box and ran
> > > 
> > >  # rpcinfo -p localhost
> > >    program vers proto   port  service
> > >     100000    4   tcp    111  portmapper
> > >     100000    3   tcp    111  portmapper
> > >     100000    2   tcp    111  portmapper
> > >     100000    4   udp    111  portmapper
> > >     100000    3   udp    111  portmapper
> > >     100000    2   udp    111  portmapper
> > >     100024    1   udp  34243  status
> > >     100024    1   tcp  34498  status
> > > 
> > > which doesn't show anything.
> > > 
> > > but something is listening to that port...
> > > 
> > >  # netstat -ntap |grep 55201
> > >  tcp        0      0 0.0.0.0:55201           0.0.0.0:*               LISTEN
  
  
> > Hang on. This is on the client box while there is an active NFSv4
> > mount? Then that's probably the NFSv4 callback channel listening for
> > delegation callbacks.
> > 
> > Can you please try:
> > 
> > echo "options nfs callback_tcpport=4048" > /etc/modprobe.d/nfs-local.conf
> > 
> > and then either reboot the client or unload and then reload the nfs
> > modules before reattempting the mount. If this is indeed the callback
> > channel, then that will move your phantom listener to port 4048...
  
 
> Right, it was a little unclear to me before, but it now seems clear
> that the callback socket that the server is opening to the client is
> the one squatting on the port.
> 
> ...and that sort of makes sense, doesn't it? That rpc_clnt will stick
> around for the life of the client's lease, and the rpc_clnt binds to a
> particular port so that it can reconnect using the same one.
> 
> Given that Stephen has done the legwork and figured out that reverting
> those commits fixes the issue, then I suspect that the real culprit is
> caf4ccd4e88cf2.
> 
> The client is likely closing down the other end of the callback
> socket when it goes idle. Before that commit, we probably did an
> xs_close on it, but now we're doing a xs_tcp_shutdown and that leaves
> the port bound.
 

Agreed. I've been looking into whether or not there is a simple fix.
Reverting those patches is not an option, because the whole point was
to ensure that the socket is in the TCP_CLOSED state before we release
the socket.

Steven, how about something like the following patch?

8<--------------------------------------------------------------
From 9a0bcfdbdbc793eae1ed6d901a6396b6c66f9513 Mon Sep 17 00:00:00 2001
From: Trond Myklebust trond.mykleb...@primarydata.com
Date: Fri, 19 Jun 2015 16:17:57 -0400
Subject: [PATCH] SUNRPC: Ensure we release the TCP socket once it has been
 closed

This fixes a regression introduced by commit caf4ccd4e88cf2 ("SUNRPC:
Make xs_tcp_close() do a socket shutdown rather than a sock_release").
Prior to that commit, the autoclose feature would ensure that an
idle connection would result in the socket being both disconnected and
released, whereas now it only gets disconnected.

While the current behaviour is harmless, it does leave the port bound
until either RPC traffic resumes or the RPC client is shut down.

Reported-by: Steven Rostedt rost...@goodmis.org
Signed-off-by: Trond Myklebust trond.mykleb...@primarydata.com
---
 net/sunrpc/xprt.c     | 2 +-
 net/sunrpc/xprtsock.c | 8 ++++++--
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index 3ca31f20b97c..ab5dd621ae0c 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -611,8 +611,8 @@ static void xprt_autoclose(struct work_struct *work)
 	struct rpc_xprt *xprt =
 		container_of(work, struct rpc_xprt, task_cleanup);
 
-	xprt->ops->close(xprt);
 	clear_bit(XPRT_CLOSE_WAIT, &xprt->state);
+	xprt->ops->close(xprt);
 	xprt_release_write(xprt, NULL);
 }
 
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index fda8ec8c74c0..75dcdadf0269 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -634,10 +634,13 @@ static void xs_tcp_shutdown(struct rpc_xprt *xprt)
 	struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt);
 	struct socket *sock = transport->sock;
 
-	if (sock != NULL) {
+	if (sock == NULL)
+		return;
+	if (xprt_connected(xprt)) {
 		kernel_sock_shutdown(sock, SHUT_RDWR);
 		trace_rpc_socket_shutdown(xprt, sock);
-	}
+	} else
+		xs_reset_transport(transport);
 }
 

[net-next] vmxnet3: Changes for vmxnet3 adapter version 2 (fwd)

2015-06-19 Thread Shreyas Bhatewara

Make the driver understand adapter version 2.

Cc: Rachel Lunnon rachel_lun...@stormagic.com
Signed-off-by: Guolin Yang gy...@vmware.com
Signed-off-by: Shreyas N Bhatewara sbhatew...@vmware.com
--

diff --git a/drivers/net/vmxnet3/vmxnet3_defs.h b/drivers/net/vmxnet3/vmxnet3_defs.h
index 3718d02..221a530 100644
--- a/drivers/net/vmxnet3/vmxnet3_defs.h
+++ b/drivers/net/vmxnet3/vmxnet3_defs.h
@@ -1,7 +1,7 @@
 /*
  * Linux driver for VMware's vmxnet3 ethernet NIC.
  *
- * Copyright (C) 2008-2009, VMware, Inc. All Rights Reserved.
+ * Copyright (C) 2008-2015, VMware, Inc. All Rights Reserved.
  *
  * This program is free software; you can redistribute it and/or modify it
  * under the terms of the GNU General Public License as published by the
@@ -277,6 +277,40 @@ struct Vmxnet3_RxCompDesc {
 #endif  /* __BIG_ENDIAN_BITFIELD */
 };
 
+struct Vmxnet3_RxCompDescExt {
+	__le32		dword1;
+	u8		segCnt;       /* Number of aggregated packets */
+	u8		dupAckCnt;    /* Number of duplicate Acks */
+	__le16		tsDelta;      /* TCP timestamp difference */
+	__le32		dword2;
+#ifdef __BIG_ENDIAN_BITFIELD
+	u32		gen:1;        /* generation bit */
+	u32		type:7;       /* completion type */
+	u32		fcs:1;        /* Frame CRC correct */
+	u32		frg:1;        /* IP Fragment */
+	u32		v4:1;         /* IPv4 */
+	u32		v6:1;         /* IPv6 */
+	u32		ipc:1;        /* IP Checksum Correct */
+	u32		tcp:1;        /* TCP packet */
+	u32		udp:1;        /* UDP packet */
+	u32		tuc:1;        /* TCP/UDP Checksum Correct */
+	u32		mss:16;
+#else
+	u32		mss:16;
+	u32		tuc:1;        /* TCP/UDP Checksum Correct */
+	u32		udp:1;        /* UDP packet */
+	u32		tcp:1;        /* TCP packet */
+	u32		ipc:1;        /* IP Checksum Correct */
+	u32		v6:1;         /* IPv6 */
+	u32		v4:1;         /* IPv4 */
+	u32		frg:1;        /* IP Fragment */
+	u32		fcs:1;        /* Frame CRC correct */
+	u32		type:7;       /* completion type */
+	u32		gen:1;        /* generation bit */
+#endif  /* __BIG_ENDIAN_BITFIELD */
+};
+
+
 /* fields in RxCompDesc we access via Vmxnet3_GenericDesc.dword[3] */
 #define VMXNET3_RCD_TUC_SHIFT  16
 #define VMXNET3_RCD_IPC_SHIFT  19
@@ -310,6 +344,7 @@ union Vmxnet3_GenericDesc {
 	struct Vmxnet3_RxDesc		rxd;
 	struct Vmxnet3_TxCompDesc	tcd;
 	struct Vmxnet3_RxCompDesc	rcd;
+	struct Vmxnet3_RxCompDescExt	rcdExt;
 };
 
 #define VMXNET3_INIT_GEN       1
@@ -361,6 +396,7 @@ enum {
 /* completion descriptor types */
 #define VMXNET3_CDTYPE_TXCOMP      0    /* Tx Completion Descriptor */
 #define VMXNET3_CDTYPE_RXCOMP      3    /* Rx Completion Descriptor */
+#define VMXNET3_CDTYPE_RXCOMP_LRO  4    /* Rx Completion Descriptor for LRO */
 
 enum {
 	VMXNET3_GOS_BITS_UNK    = 0,   /* unknown */
diff --git a/drivers/net/vmxnet3/vmxnet3_drv.c b/drivers/net/vmxnet3/vmxnet3_drv.c
index ab53975..da11bb5 100644
--- a/drivers/net/vmxnet3/vmxnet3_drv.c
+++ b/drivers/net/vmxnet3/vmxnet3_drv.c
@@ -1163,6 +1163,52 @@ vmxnet3_rx_error(struct vmxnet3_rx_queue *rq, struct Vmxnet3_RxCompDesc *rcd,
 }
 
 
+static u32
+vmxnet3_get_hdr_len(struct vmxnet3_adapter *adapter, struct sk_buff *skb,
+		    union Vmxnet3_GenericDesc *gdesc)
+{
+	u32 hlen, maplen;
+	union {
+		void *ptr;
+		struct ethhdr *eth;
+		struct iphdr *ipv4;
+		struct ipv6hdr *ipv6;
+		struct tcphdr *tcp;
+	} hdr;
+	BUG_ON(gdesc->rcd.tcp == 0);
+
+	maplen = skb_headlen(skb);
+	if (unlikely(sizeof(struct iphdr) + sizeof(struct tcphdr) > maplen))
+		return 0;
+
+	hdr.eth = eth_hdr(skb);
+	if (gdesc->rcd.v4) {
+		BUG_ON(hdr.eth->h_proto != htons(ETH_P_IP));
+		hdr.ptr += sizeof(struct ethhdr);
+		BUG_ON(hdr.ipv4->protocol != IPPROTO_TCP);
+		hlen = hdr.ipv4->ihl << 2;
+		hdr.ptr += hdr.ipv4->ihl << 2;
+	} else if (gdesc->rcd.v6) {
+		BUG_ON(hdr.eth->h_proto != htons(ETH_P_IPV6));
+		hdr.ptr += sizeof(struct ethhdr);
+		/* Use an estimated value, since we also need to handle
+		 * TSO case.
+		 */
+		if (hdr.ipv6->nexthdr != IPPROTO_TCP)
+			return sizeof(struct ipv6hdr) + sizeof(struct tcphdr);
+		hlen = sizeof(struct ipv6hdr);
+		hdr.ptr += sizeof(struct ipv6hdr);
+	} else {
+		/* Non-IP pkt, dont estimate header length */
+		return 0;
+	}
+
+  
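
[Editorial sketch] The "ihl << 2" step in vmxnet3_get_hdr_len() converts the
IPv4 header length field, which counts 32-bit words, into bytes. The same
arithmetic can be tried in userspace on a raw frame. ipv4_tcp_hdr_len() and the
SKETCH_* constants below are hypothetical names, not part of the vmxnet3
driver. Because the quoted patch is cut off before its return statement,
returning the combined IPv4 plus TCP header length is an assumption about what
a caller would want, and untagged Ethernet framing is assumed as well.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ETH_HLEN      14u  /* untagged Ethernet header (assumption) */
#define SKETCH_MIN_IP_HLEN   20u  /* IPv4 header without options */
#define SKETCH_MIN_TCP_HLEN  20u  /* TCP header without options */

/*
 * Hypothetical helper: return the IPv4 + TCP header length in bytes for
 * an Ethernet frame of which only the first maplen bytes are readable,
 * or 0 if the frame is too short or is not IPv4/TCP.
 */
static uint32_t ipv4_tcp_hdr_len(const uint8_t *frame, size_t maplen)
{
	const uint8_t *ip = frame + SKETCH_ETH_HLEN;
	uint32_t ip_hlen, tcp_hlen;

	if (maplen < SKETCH_ETH_HLEN + SKETCH_MIN_IP_HLEN + SKETCH_MIN_TCP_HLEN)
		return 0;
	/* EtherType 0x0800 is IPv4; the version nibble must be 4 */
	if (frame[12] != 0x08 || frame[13] != 0x00 || (ip[0] >> 4) != 4)
		return 0;
	if (ip[9] != 6)		/* protocol field: 6 is TCP */
		return 0;

	ip_hlen = (uint32_t)(ip[0] & 0x0f) << 2;	/* ihl counts 32-bit words */
	if (ip_hlen < SKETCH_MIN_IP_HLEN ||
	    maplen < SKETCH_ETH_HLEN + ip_hlen + SKETCH_MIN_TCP_HLEN)
		return 0;

	/* TCP data offset (upper nibble of byte 12) also counts words */
	tcp_hlen = (uint32_t)(ip[ip_hlen + 12] >> 4) << 2;
	if (tcp_hlen < SKETCH_MIN_TCP_HLEN)
		return 0;

	return ip_hlen + tcp_hlen;
}
```

As in the driver code, the length checks come before any dereference, so a
frame whose mapped part is too short yields 0 rather than an out-of-bounds
read.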

Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )

2015-06-19 Thread Steven Rostedt
On Fri, 19 Jun 2015 13:39:08 -0400
Trond Myklebust trond.mykleb...@primarydata.com wrote:


> Hang on. This is on the client box while there is an active NFSv4
> mount? Then that's probably the NFSv4 callback channel listening for
> delegation callbacks.
> 
> Can you please try:
> 
> echo "options nfs callback_tcpport=4048" > /etc/modprobe.d/nfs-local.conf
> 
> and then either reboot the client or unload and then reload the nfs
> modules before reattempting the mount. If this is indeed the callback
> channel, then that will move your phantom listener to port 4048...

I unmounted the directories, removed the nfs modules, added this
file, loaded the modules back, and remounted the directories.

# netstat -ntap |grep 4048
tcp        0      0 0.0.0.0:4048            0.0.0.0:*               LISTEN      -
tcp        0      0 192.168.23.22:4048      192.168.23.9:1010       ESTABLISHED -
tcp6       0      0 :::4048                 :::*                    LISTEN      -

-- Steve

--
To unsubscribe from this list: send the line unsubscribe netdev in


Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )

2015-06-19 Thread Trond Myklebust
On Fri, 2015-06-19 at 18:14 -0400, Steven Rostedt wrote:
> On Fri, 19 Jun 2015 16:30:18 -0400
> Trond Myklebust trond.mykleb...@primarydata.com wrote:
> 
> > Steven, how about something like the following patch?
> > 
> 
> OK, the box I'm running this on is using v4.0.5, can you make a patch
> based on that, as whatever you make needs to go to stable as well.

Is it causing any other damage than the rkhunter warning you reported?

> distcc[31554] ERROR: compile /home/rostedt/work/git/nobackup/linux-build.git/net/sunrpc/xprtsock.c on fedora/8 failed
> distcc[31554] (dcc_build_somewhere) Warning: remote compilation of '/home/rostedt/work/git/nobackup/linux-build.git/net/sunrpc/xprtsock.c' failed, retrying locally
> distcc[31554] Warning: failed to distribute /home/rostedt/work/git/nobackup/linux-build.git/net/sunrpc/xprtsock.c to fedora/8, running locally instead
> /home/rostedt/work/git/nobackup/linux-build.git/net/sunrpc/xprtsock.c: In function 'xs_tcp_shutdown':
> /home/rostedt/work/git/nobackup/linux-build.git/net/sunrpc/xprtsock.c:643:3: error: implicit declaration of function 'xs_reset_transport' [-Werror=implicit-function-declaration]
> /home/rostedt/work/git/nobackup/linux-build.git/net/sunrpc/xprtsock.c: At top level:
> /home/rostedt/work/git/nobackup/linux-build.git/net/sunrpc/xprtsock.c:825:13: warning: conflicting types for 'xs_reset_transport' [enabled by default]
> /home/rostedt/work/git/nobackup/linux-build.git/net/sunrpc/xprtsock.c:825:13: error: static declaration of 'xs_reset_transport' follows non-static declaration
> /home/rostedt/work/git/nobackup/linux-build.git/net/sunrpc/xprtsock.c:643:3: note: previous implicit declaration of 'xs_reset_transport' was here
> cc1: some warnings being treated as errors
> distcc[31554] ERROR: compile /home/rostedt/work/git/nobackup/linux-build.git/net/sunrpc/xprtsock.c on localhost failed
> /home/rostedt/work/git/nobackup/linux-build.git/scripts/Makefile.build:258: recipe for target 'net/sunrpc/xprtsock.o' failed
> make[3]: *** [net/sunrpc/xprtsock.o] Error 1

Sorry. I sent that one off too quickly. Try the following.

8<--------------------------------------------------------------
From 4876cc779ff525b9c2376d8076edf47815e71f2c Mon Sep 17 00:00:00 2001
From: Trond Myklebust trond.mykleb...@primarydata.com
Date: Fri, 19 Jun 2015 16:17:57 -0400
Subject: [PATCH v2] SUNRPC: Ensure we release the TCP socket once it has been
 closed

This fixes a regression introduced by commit caf4ccd4e88cf2 ("SUNRPC:
Make xs_tcp_close() do a socket shutdown rather than a sock_release").
Prior to that commit, the autoclose feature would ensure that an
idle connection would result in the socket being both disconnected and
released, whereas now it only gets disconnected.

While the current behaviour is harmless, it does leave the port bound
until either RPC traffic resumes or the RPC client is shut down.

Reported-by: Steven Rostedt rost...@goodmis.org
Signed-off-by: Trond Myklebust trond.mykleb...@primarydata.com
---
 net/sunrpc/xprt.c     |  2 +-
 net/sunrpc/xprtsock.c | 40 ++++++++++++++++++++++------------------
 2 files changed, 23 insertions(+), 19 deletions(-)

diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index 3ca31f20b97c..ab5dd621ae0c 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -611,8 +611,8 @@ static void xprt_autoclose(struct work_struct *work)
 	struct rpc_xprt *xprt =
 		container_of(work, struct rpc_xprt, task_cleanup);
 
-	xprt->ops->close(xprt);
 	clear_bit(XPRT_CLOSE_WAIT, &xprt->state);
+	xprt->ops->close(xprt);
 	xprt_release_write(xprt, NULL);
 }
 
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index fda8ec8c74c0..ee0715dfc3c7 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -623,24 +623,6 @@ process_status:
 }
 
 /**
- * xs_tcp_shutdown - gracefully shut down a TCP socket
- * @xprt: transport
- *
- * Initiates a graceful shutdown of the TCP socket by calling the
- * equivalent of shutdown(SHUT_RDWR);
- */
-static void xs_tcp_shutdown(struct rpc_xprt *xprt)
-{
-	struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt);
-	struct socket *sock = transport->sock;
-
-	if (sock != NULL) {
-		kernel_sock_shutdown(sock, SHUT_RDWR);
-		trace_rpc_socket_shutdown(xprt, sock);
-	}
-}
-
-/**
  * xs_tcp_send_request - write an RPC request to a TCP socket
  * @task: address of RPC task that manages the state of an RPC request
  *
@@ -786,6 +768,7 @@ static void xs_sock_mark_closed(struct rpc_xprt *xprt)
 	xs_sock_reset_connection_flags(xprt);
 	/* Mark transport as closed and wake up all pending tasks */
 	xprt_disconnect_done(xprt);
+	xprt_force_disconnect(xprt);
 }
 
 /**
@@ -2103,6 +2086,27 @@ out:
 	xprt_wake_pending_tasks(xprt, status);
 }
 
+/**
+ * xs_tcp_shutdown - gracefully shut down a TCP socket
+ * @xprt: transport
+ *
+ * Initiates a graceful shutdown of the TCP socket by 

[PATCH net] netfilter: nf_queue: Don't recompute the hook_list head

2015-06-19 Thread Eric W. Biederman

If someone sends packets from one of the netdevice ingress hooks to
a userspace queue, and then userspace later accepts the packet,
the netfilter code can enter an infinite loop as the list head will
never be found.

Pass in the saved list_head to avoid this.

Signed-off-by: Eric W. Biederman ebied...@xmission.com
---
 net/netfilter/nf_queue.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/nf_queue.c b/net/netfilter/nf_queue.c
index cd60d397fe05..8a8b2abc35ff 100644
--- a/net/netfilter/nf_queue.c
+++ b/net/netfilter/nf_queue.c
@@ -213,7 +213,7 @@ void nf_reinject(struct nf_queue_entry *entry, unsigned int verdict)
 
 	if (verdict == NF_ACCEPT) {
 	next_hook:
-		verdict = nf_iterate(&nf_hooks[entry->state.pf][entry->state.hook],
+		verdict = nf_iterate(entry->state.hook_list,
 				     skb, &entry->state, &elem);
 	}
 
-- 
2.2.1
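
[Editorial sketch] The commit message says the loop never terminates because
the list head is never found. A simplified model of that failure mode is a
circular list walk that stops only when it reaches the head it was given. This
is not the netfilter code: struct node, walk_until_head() and the step limit
are hypothetical, and the limit exists only so that the bad case can be
observed instead of hanging.

```c
#include <stddef.h>

/* Minimal circular singly linked list; the head is a sentinel node. */
struct node {
	struct node *next;
};

/*
 * Walk from the entry after 'start' until 'head' is reached.
 * Returns the number of entries visited, or -1 if 'head' was not
 * found within 'limit' steps (the stand-in for "loops forever").
 */
static int walk_until_head(const struct node *start,
			   const struct node *head, int limit)
{
	const struct node *pos;
	int visited = 0;

	for (pos = start->next; pos != head; pos = pos->next) {
		if (++visited > limit)
			return -1;
	}
	return visited;
}
```

Passing the head of the list that 'start' actually lives on terminates after
the remaining entries. Passing the head of a different list never matches, so
without the limit the walk would circle indefinitely, which is why the patch
passes the saved hook_list rather than recomputing a head.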



[net-next] vmxnet3: Register shutdown handler for device (fwd)

2015-06-19 Thread Shreyas Bhatewara

Implement a handler for PCI shutdown so that the driver has an
opportunity to make sure that the device is quiesced before the PCI
switches to legacy IRQs. This avoids the possibility of a
screaming interrupt.

Acked-by: Shrikrishna Khare skh...@vmware.com
Signed-off-by: Shreyas N Bhatewara sbhatew...@vmware.com
---

diff --git a/drivers/net/vmxnet3/vmxnet3_drv.c b/drivers/net/vmxnet3/vmxnet3_drv.c
index 61c0840..bb35210 100644
--- a/drivers/net/vmxnet3/vmxnet3_drv.c
+++ b/drivers/net/vmxnet3/vmxnet3_drv.c
@@ -3184,6 +3184,32 @@ vmxnet3_remove_device(struct pci_dev *pdev)
 	free_netdev(netdev);
 }
 
+static void vmxnet3_shutdown_device(struct pci_dev *pdev)
+{
+	struct net_device *netdev = pci_get_drvdata(pdev);
+	struct vmxnet3_adapter *adapter = netdev_priv(netdev);
+	unsigned long flags;
+
+	/* Reset_work may be in the middle of resetting the device, wait for its
+	 * completion.
+	 */
+	while (test_and_set_bit(VMXNET3_STATE_BIT_RESETTING, &adapter->state))
+		msleep(1);
+
+	if (test_and_set_bit(VMXNET3_STATE_BIT_QUIESCED,
+			     &adapter->state)) {
+		clear_bit(VMXNET3_STATE_BIT_RESETTING, &adapter->state);
+		return;
+	}
+	spin_lock_irqsave(&adapter->cmd_lock, flags);
+	VMXNET3_WRITE_BAR1_REG(adapter, VMXNET3_REG_CMD,
+			       VMXNET3_CMD_QUIESCE_DEV);
+	spin_unlock_irqrestore(&adapter->cmd_lock, flags);
+	vmxnet3_disable_all_intrs(adapter);
+
+	clear_bit(VMXNET3_STATE_BIT_RESETTING, &adapter->state);
+}
+
 
 #ifdef CONFIG_PM
 
@@ -3360,6 +3386,7 @@ static struct pci_driver vmxnet3_driver = {
 	.id_table	= vmxnet3_pciid_table,
 	.probe		= vmxnet3_probe_device,
 	.remove		= vmxnet3_remove_device,
+	.shutdown	= vmxnet3_shutdown_device,
 #ifdef CONFIG_PM
 	.driver.pm	= &vmxnet3_pm_ops,
 #endif
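
[Editorial sketch] The handler relies on two test-and-set guards: one that
waits out a concurrent reset, and one that makes the quiesce command run at
most once. A rough userspace analogue using C11 atomic_flag is given below.
struct dev_state, dev_state_init() and shutdown_once() are hypothetical names,
the counter stands in for the register write, and the busy-wait replaces the
driver's msleep(1).

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the adapter state bits used by the handler. */
struct dev_state {
	atomic_flag resetting;	/* held while a reset or shutdown runs */
	atomic_flag quiesced;	/* set once the device has been quiesced */
	int quiesce_cmds;	/* how many times the quiesce command ran */
};

static void dev_state_init(struct dev_state *s)
{
	atomic_flag_clear(&s->resetting);
	atomic_flag_clear(&s->quiesced);
	s->quiesce_cmds = 0;
}

/* Returns true if this call quiesced the device, false if already done. */
static bool shutdown_once(struct dev_state *s)
{
	/* Wait until no other path holds the "resetting" flag. */
	while (atomic_flag_test_and_set(&s->resetting))
		;	/* the driver sleeps here instead of spinning */

	/* A previous caller already quiesced the device: just back out. */
	if (atomic_flag_test_and_set(&s->quiesced)) {
		atomic_flag_clear(&s->resetting);
		return false;
	}

	s->quiesce_cmds++;	/* stand-in for the QUIESCE_DEV command */
	atomic_flag_clear(&s->resetting);
	return true;
}
```

Calling shutdown_once() twice on the same state issues the command only the
first time, which mirrors the early return in the patch.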


Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )

2015-06-19 Thread Steven Rostedt
On Fri, 19 Jun 2015 16:30:18 -0400
Trond Myklebust trond.mykleb...@primarydata.com wrote:

> Steven, how about something like the following patch?

Building it now. Will let you know in a bit.


 
> 8<--------------------------------------------------------------
> From 9a0bcfdbdbc793eae1ed6d901a6396b6c66f9513 Mon Sep 17 00:00:00 2001
> From: Trond Myklebust trond.mykleb...@primarydata.com
> Date: Fri, 19 Jun 2015 16:17:57 -0400
> Subject: [PATCH] SUNRPC: Ensure we release the TCP socket once it has been
>  closed
> 
> This fixes a regression introduced by commit caf4ccd4e88cf2 ("SUNRPC:
> Make xs_tcp_close() do a socket shutdown rather than a sock_release").
> Prior to that commit, the autoclose feature would ensure that an
> idle connection would result in the socket being both disconnected and
> released, whereas now it only gets disconnected.
> 
> While the current behaviour is harmless, it does leave the port bound
> until either RPC traffic resumes or the RPC client is shut down.

Hmm, is this true? The port is bound, but the socket has been freed.
That is, sk->sk_socket points to garbage, as my portlist.c module
verified.

It doesn't seem that anything can attach to that port again that I
know of. Is there a way to verify that something can attach to it again?

-- Steve


 
> Reported-by: Steven Rostedt rost...@goodmis.org
> Signed-off-by: Trond Myklebust trond.mykleb...@primarydata.com
> ---
>  net/sunrpc/xprt.c     | 2 +-
>  net/sunrpc/xprtsock.c | 8 ++++++--
>  2 files changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
> index 3ca31f20b97c..ab5dd621ae0c 100644
> --- a/net/sunrpc/xprt.c
> +++ b/net/sunrpc/xprt.c
> @@ -611,8 +611,8 @@ static void xprt_autoclose(struct work_struct *work)
>  	struct rpc_xprt *xprt =
>  		container_of(work, struct rpc_xprt, task_cleanup);
> 
> -	xprt->ops->close(xprt);
>  	clear_bit(XPRT_CLOSE_WAIT, &xprt->state);
> +	xprt->ops->close(xprt);
>  	xprt_release_write(xprt, NULL);
>  }
> 
> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> index fda8ec8c74c0..75dcdadf0269 100644
> --- a/net/sunrpc/xprtsock.c
> +++ b/net/sunrpc/xprtsock.c
> @@ -634,10 +634,13 @@ static void xs_tcp_shutdown(struct rpc_xprt *xprt)
>  	struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt);
>  	struct socket *sock = transport->sock;
> 
> -	if (sock != NULL) {
> +	if (sock == NULL)
> +		return;
> +	if (xprt_connected(xprt)) {
>  		kernel_sock_shutdown(sock, SHUT_RDWR);
>  		trace_rpc_socket_shutdown(xprt, sock);
> -	}
> +	} else
> +		xs_reset_transport(transport);
>  }
> 
>  /**
> @@ -786,6 +789,7 @@ static void xs_sock_mark_closed(struct rpc_xprt *xprt)
>  	xs_sock_reset_connection_flags(xprt);
>  	/* Mark transport as closed and wake up all pending tasks */
>  	xprt_disconnect_done(xprt);
> +	xprt_force_disconnect(xprt);
>  }
> 
>  /**
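
[Editorial sketch] On the question of how to check whether anything can attach
to a port again: from userspace, one rough probe is simply to try binding a
fresh socket to it. The helpers below (bind_any(), local_port() and
port_is_free() are hypothetical names) use only the standard BSD socket calls.
This is a userspace analogue only. It shows that shutdown() leaves a bound
socket's port taken while close() releases it, but it does not reproduce the
kernel-side SUNRPC transport state discussed in this thread, and it says
nothing about whether the RPC client itself can reuse the port.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Bind a new TCP socket to 0.0.0.0:port (port 0 lets the kernel choose).
 * Returns the socket descriptor, or -1 with errno set.
 */
static int bind_any(unsigned short port)
{
	struct sockaddr_in addr;
	int fd = socket(AF_INET, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;
	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_ANY);
	addr.sin_port = htons(port);
	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		int saved = errno;

		close(fd);
		errno = saved;
		return -1;
	}
	return fd;
}

/* Return the local port a bound socket holds, or 0 on error. */
static unsigned short local_port(int fd)
{
	struct sockaddr_in addr;
	socklen_t len = sizeof(addr);

	if (getsockname(fd, (struct sockaddr *)&addr, &len) < 0)
		return 0;
	return ntohs(addr.sin_port);
}

/* Return 1 if a fresh socket can bind the port right now, else 0. */
static int port_is_free(unsigned short port)
{
	int fd = bind_any(port);

	if (fd < 0)
		return 0;
	close(fd);
	return 1;
}
```

A typical check is to hold a socket from bind_any(0), confirm that
port_is_free() reports 0 for its port both before and after
shutdown(fd, SHUT_RDWR), and then confirm that it reports 1 once the holder is
closed. No SO_REUSEADDR is set, so a second bind to a held port is expected to
fail with EADDRINUSE.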



Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )

2015-06-19 Thread Steven Rostedt
On Fri, 19 Jun 2015 16:30:18 -0400
Trond Myklebust trond.mykleb...@primarydata.com wrote:

> Steven, how about something like the following patch?
> 

OK, the box I'm running this on is using v4.0.5, can you make a patch
based on that, as whatever you make needs to go to stable as well.

distcc[31554] ERROR: compile /home/rostedt/work/git/nobackup/linux-build.git/net/sunrpc/xprtsock.c on fedora/8 failed
distcc[31554] (dcc_build_somewhere) Warning: remote compilation of '/home/rostedt/work/git/nobackup/linux-build.git/net/sunrpc/xprtsock.c' failed, retrying locally
distcc[31554] Warning: failed to distribute /home/rostedt/work/git/nobackup/linux-build.git/net/sunrpc/xprtsock.c to fedora/8, running locally instead
/home/rostedt/work/git/nobackup/linux-build.git/net/sunrpc/xprtsock.c: In function 'xs_tcp_shutdown':
/home/rostedt/work/git/nobackup/linux-build.git/net/sunrpc/xprtsock.c:643:3: error: implicit declaration of function 'xs_reset_transport' [-Werror=implicit-function-declaration]
/home/rostedt/work/git/nobackup/linux-build.git/net/sunrpc/xprtsock.c: At top level:
/home/rostedt/work/git/nobackup/linux-build.git/net/sunrpc/xprtsock.c:825:13: warning: conflicting types for 'xs_reset_transport' [enabled by default]
/home/rostedt/work/git/nobackup/linux-build.git/net/sunrpc/xprtsock.c:825:13: error: static declaration of 'xs_reset_transport' follows non-static declaration
/home/rostedt/work/git/nobackup/linux-build.git/net/sunrpc/xprtsock.c:643:3: note: previous implicit declaration of 'xs_reset_transport' was here
cc1: some warnings being treated as errors
distcc[31554] ERROR: compile /home/rostedt/work/git/nobackup/linux-build.git/net/sunrpc/xprtsock.c on localhost failed
/home/rostedt/work/git/nobackup/linux-build.git/scripts/Makefile.build:258: recipe for target 'net/sunrpc/xprtsock.o' failed
make[3]: *** [net/sunrpc/xprtsock.o] Error 1

-- Steve



[net-next] vmxnet3: Fix memory leaks in rx path (fwd)

2015-06-19 Thread Shreyas Bhatewara

If rcd length was zero, the page used for frag was not being released. It
was being replaced with a newly allocated page. This change takes care
of that memory leak.

Signed-off-by: Guolin Yang gy...@vmware.com
Signed-off-by: Shreyas N Bhatewara sbhatew...@vmware.com
---

diff --git a/drivers/net/vmxnet3/vmxnet3_drv.c b/drivers/net/vmxnet3/vmxnet3_drv.c
index bb35210..ab53975 100644
--- a/drivers/net/vmxnet3/vmxnet3_drv.c
+++ b/drivers/net/vmxnet3/vmxnet3_drv.c
@@ -861,6 +861,9 @@ vmxnet3_parse_and_copy_hdr(struct sk_buff *skb, struct vmxnet3_tx_queue *tq,
 					     , skb_headlen(skb));
 		}
 
+		if (skb->len <= VMXNET3_HDR_COPY_SIZE)
+			ctx->copy_size = skb->len;
+
 		/* make sure headers are accessible directly */
 		if (unlikely(!pskb_may_pull(skb, ctx->copy_size)))
 			goto err;
@@ -1273,36 +1276,36 @@ vmxnet3_rq_rx_complete(struct vmxnet3_rx_queue *rq,
 			if (skip_page_frags)
 				goto rcd_done;
 
-			new_page = alloc_page(GFP_ATOMIC);
-			if (unlikely(new_page == NULL)) {
+			if (rcd->len) {
+				new_page = alloc_page(GFP_ATOMIC);
 				/* Replacement page frag could not be allocated.
 				 * Reuse this page. Drop the pkt and free the
 				 * skb which contained this page as a frag. Skip
 				 * processing all the following non-sop frags.
 				 */
-				rq->stats.rx_buf_alloc_failure++;
-				dev_kfree_skb(ctx->skb);
-				ctx->skb = NULL;
-				skip_page_frags = true;
-				goto rcd_done;
-			}
+				if (unlikely(!new_page)) {
+					rq->stats.rx_buf_alloc_failure++;
+					dev_kfree_skb(ctx->skb);
+					ctx->skb = NULL;
+					skip_page_frags = true;
+					goto rcd_done;
+				}
 
-			if (rcd->len) {
 				dma_unmap_page(&adapter->pdev->dev,
 					       rbi->dma_addr, rbi->len,
 					       PCI_DMA_FROMDEVICE);
 
 				vmxnet3_append_frag(ctx->skb, rcd, rbi);
-			}
 
-			/* Immediate refill */
-			rbi->page = new_page;
-			rbi->dma_addr = dma_map_page(&adapter->pdev->dev,
-						     rbi->page,
-						     0, PAGE_SIZE,
-						     PCI_DMA_FROMDEVICE);
-			rxd->addr = cpu_to_le64(rbi->dma_addr);
-			rxd->len = rbi->len;
+				/* Immediate refill */
+				rbi->page = new_page;
+				rbi->dma_addr = dma_map_page(&adapter->pdev->dev
+							, rbi->page,
+							0, PAGE_SIZE,
+							PCI_DMA_FROMDEVICE);
+				rxd->addr = cpu_to_le64(rbi->dma_addr);
+				rxd->len = rbi->len;
+			}
 		}
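
[Editorial sketch] The leak described above comes from replacing a ring slot's
page even when the completed descriptor carried no data, so the old page was
never handed to anyone. A small userspace model of the corrected ownership
rule follows. struct rx_slot, page_alloc(), page_free() and rx_complete() are
hypothetical names, malloc() stands in for alloc_page(), and live_pages is
only a counter for checking that every allocation is eventually freed.

```c
#include <stdlib.h>

/* Hypothetical ring slot that owns exactly one receive page. */
struct rx_slot {
	void *page;
};

static int live_pages;	/* pages allocated and not yet freed */

static void *page_alloc(void)
{
	void *p = malloc(64);

	if (p)
		live_pages++;
	return p;
}

static void page_free(void *p)
{
	if (p) {
		free(p);
		live_pages--;
	}
}

/*
 * Handle one completed fragment of 'len' bytes.
 * Returns the filled page, now owned by the caller, or NULL when the
 * slot keeps its page (zero-length completion or allocation failure).
 */
static void *rx_complete(struct rx_slot *slot, unsigned int len)
{
	void *filled, *fresh;

	if (len == 0)
		return NULL;	/* nothing to hand over, so nothing to replace */

	fresh = page_alloc();
	if (!fresh)
		return NULL;	/* reuse the old page and drop the packet */

	filled = slot->page;	/* ownership moves to the packet */
	slot->page = fresh;	/* immediate refill */
	return filled;
}
```

With this ordering a zero-length completion leaves the page count unchanged,
whereas replacing the page unconditionally would strand the old one.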
 
 


[PATCH 0/3] Small fixes for Renesas R-Car CAN driver

2015-06-19 Thread Sergei Shtylyov
Hello.

   Here's the set of 3 patches against Marc Kleine-Budde's 'linux-can.git'
repo; they are small fixes for the Renesas R-Car CAN driver.

rcar_can: fix IRQ check
rcar_can: print signed IRQ #
rcar_can: fix typo in error message

WBR, Sergei



Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )

2015-06-19 Thread Steven Rostedt
On Fri, 19 Jun 2015 19:25:59 -0400
Trond Myklebust trond.mykleb...@primarydata.com wrote:

> On Fri, 2015-06-19 at 18:14 -0400, Steven Rostedt wrote:
> > On Fri, 19 Jun 2015 16:30:18 -0400
> > Trond Myklebust trond.mykleb...@primarydata.com wrote:
> > 
> > > Steven, how about something like the following patch?
> > > 
> > 
> > OK, the box I'm running this on is using v4.0.5, can you make a patch
> > based on that, as whatever you make needs to go to stable as well.
> 
> Is it causing any other damage than the rkhunter warning you reported?

Well, not that I know of. Are you sure that this port will be
reconnected, and is not just a leak? Not sure if you could waste more
ports this way with connections to other machines. I only have my
wife's box connect to this server. This server is actually a client to
my other boxes.

Although the rkhunter warning is the only thing that triggers, I still
would think this is a stable fix, especially if the port is leaked and
not taken again.

 
> Sorry. I sent that one off too quickly. Try the following.

This built, will be testing it shortly.

-- Steve


[PATCH 0/2] Error message clean-ups for Renesas R-Car CAN driver

2015-06-19 Thread Sergei Shtylyov
Hello.

   Here's the set of 2 patches against Marc Kleine-Budde's 'linux-can.git'
repo plus 3 fix patches just posted; they are small error message cleanups for
the Renesas R-Car CAN driver.

[1/2] rcar_can: print request_irq() error code
[2/2] rcar_can: unify error messages

WBR, Sergei



[PATCH 2/2] rcar_can: unify error messages

2015-06-19 Thread Sergei Shtylyov
All the error messages in the driver but the ones from devm_clk_get() failures
use a similar format. Make those two messages consistent with the others.

Signed-off-by: Sergei Shtylyov sergei.shtyl...@cogentembedded.com

---
 drivers/net/can/rcar_can.c |5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Index: linux-can/drivers/net/can/rcar_can.c
===================================================================
--- linux-can.orig/drivers/net/can/rcar_can.c
+++ linux-can/drivers/net/can/rcar_can.c
@@ -785,7 +785,8 @@ static int rcar_can_probe(struct platfor
 	priv->clk = devm_clk_get(&pdev->dev, "clkp1");
 	if (IS_ERR(priv->clk)) {
 		err = PTR_ERR(priv->clk);
-		dev_err(&pdev->dev, "cannot get peripheral clock: %d\n", err);
+		dev_err(&pdev->dev, "cannot get peripheral clock, error %d\n",
+			err);
 		goto fail_clk;
 	}
 
@@ -797,7 +798,7 @@ static int rcar_can_probe(struct platfor
 	priv->can_clk = devm_clk_get(&pdev->dev, clock_names[clock_select]);
 	if (IS_ERR(priv->can_clk)) {
 		err = PTR_ERR(priv->can_clk);
-		dev_err(&pdev->dev, "cannot get CAN clock: %d\n", err);
+		dev_err(&pdev->dev, "cannot get CAN clock, error %d\n", err);
 		goto fail_clk;
 	}



Re: [PATCH v2] bpf: BPF based latency tracing

2015-06-19 Thread Alexei Starovoitov

On 6/19/15 7:00 AM, Daniel Wagner wrote:

> BPF offers another way to generate latency histograms. We attach
> kprobes at trace_preempt_off and trace_preempt_on and calculate the
> time between the off and on transitions.

...

> Signed-off-by: Daniel Wagner daniel.wag...@bmw-carit.de

...

> With the rebase on net-next no additional patches are needed and this
> thing here runs fine.

...

>  samples/bpf/Makefile       |   4 ++
>  samples/bpf/lathist_kern.c |  99 +++
>  samples/bpf/lathist_user.c | 103 +
>  3 files changed, 206 insertions(+)
>  create mode 100644 samples/bpf/lathist_kern.c
>  create mode 100644 samples/bpf/lathist_user.c


Thanks. That's a useful example.
Acked-by: Alexei Starovoitov a...@plumgrid.com

Dave,
this patch is for net-next and I hope it's not too late
for this merge window.
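
[Editorial sketch] The approach quoted above records a timestamp when
preemption is turned off, takes the difference when it is turned back on, and
counts that difference in a histogram. A plain userspace C model of this
bookkeeping follows. It is not the BPF sample itself: struct lat_hist,
on_preempt_off(), on_preempt_on() and the power-of-two bucketing in
log2_bucket() are assumptions chosen for illustration.

```c
#include <stdint.h>

#define MAX_BUCKETS 32

/* Hypothetical per-CPU state: pending start time plus the histogram. */
struct lat_hist {
	uint64_t start_ns;	/* 0 means no "off" event is pending */
	uint64_t buckets[MAX_BUCKETS];
};

/* Map a delta to floor(log2(delta)), clamped to the last bucket. */
static unsigned int log2_bucket(uint64_t delta)
{
	unsigned int b = 0;

	while (delta > 1 && b < MAX_BUCKETS - 1) {
		delta >>= 1;
		b++;
	}
	return b;
}

/* Called at the "off" transition: remember when the section began. */
static void on_preempt_off(struct lat_hist *h, uint64_t now_ns)
{
	h->start_ns = now_ns;
}

/* Called at the "on" transition: account the elapsed time. */
static void on_preempt_on(struct lat_hist *h, uint64_t now_ns)
{
	if (h->start_ns == 0)
		return;		/* no matching "off" was seen */
	h->buckets[log2_bucket(now_ns - h->start_ns)]++;
	h->start_ns = 0;
}
```

Logarithmic buckets keep the table small while still separating microsecond
from millisecond latencies; a linear layout would work equally well for a
narrow range.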



Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )

2015-06-19 Thread Steven Rostedt
On Fri, 19 Jun 2015 19:25:59 -0400
Trond Myklebust trond.mykleb...@primarydata.com wrote:


 8--
 From 4876cc779ff525b9c2376d8076edf47815e71f2c Mon Sep 17 00:00:00 2001
 From: Trond Myklebust trond.mykleb...@primarydata.com
 Date: Fri, 19 Jun 2015 16:17:57 -0400
 Subject: [PATCH v2] SUNRPC: Ensure we release the TCP socket once it has been
  closed
 
 This fixes a regression introduced by commit caf4ccd4e88cf2 ("SUNRPC:
 Make xs_tcp_close() do a socket shutdown rather than a sock_release").
 Prior to that commit, the autoclose feature would ensure that an
 idle connection would result in the socket being both disconnected and
 released, whereas now it only gets disconnected.
 
 While the current behaviour is harmless, it does leave the port bound
 until either RPC traffic resumes or the RPC client is shut down.

Is there a way to test RPC traffic resuming? I'd like to try that before
declaring this bug harmless.

 
 Reported-by: Steven Rostedt rost...@goodmis.org

The problem appears to go away with this patch.

Tested-by: Steven Rostedt rost...@goodmis.org

Thanks a lot!

-- Steve

 Signed-off-by: Trond Myklebust trond.mykleb...@primarydata.com
 ---
  net/sunrpc/xprt.c |  2 +-
  net/sunrpc/xprtsock.c | 40 ++--
  2 files changed, 23 insertions(+), 19 deletions(-)
 
 diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
 index 3ca31f20b97c..ab5dd621ae0c 100644
 --- a/net/sunrpc/xprt.c
 +++ b/net/sunrpc/xprt.c
 @@ -611,8 +611,8 @@ static void xprt_autoclose(struct work_struct *work)
   struct rpc_xprt *xprt =
   container_of(work, struct rpc_xprt, task_cleanup);
  
 - xprt->ops->close(xprt);
   clear_bit(XPRT_CLOSE_WAIT, &xprt->state);
 + xprt->ops->close(xprt);
   xprt_release_write(xprt, NULL);
  }
  
 diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
 index fda8ec8c74c0..ee0715dfc3c7 100644
 --- a/net/sunrpc/xprtsock.c
 +++ b/net/sunrpc/xprtsock.c
 @@ -623,24 +623,6 @@ process_status:
  }
  
  /**
 - * xs_tcp_shutdown - gracefully shut down a TCP socket
 - * @xprt: transport
 - *
 - * Initiates a graceful shutdown of the TCP socket by calling the
 - * equivalent of shutdown(SHUT_RDWR);
 - */
 -static void xs_tcp_shutdown(struct rpc_xprt *xprt)
 -{
 - struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt);
 - struct socket *sock = transport->sock;
 -
 - if (sock != NULL) {
 - kernel_sock_shutdown(sock, SHUT_RDWR);
 - trace_rpc_socket_shutdown(xprt, sock);
 - }
 -}
 -
 -/**
   * xs_tcp_send_request - write an RPC request to a TCP socket
   * @task: address of RPC task that manages the state of an RPC request
   *
 @@ -786,6 +768,7 @@ static void xs_sock_mark_closed(struct rpc_xprt *xprt)
   xs_sock_reset_connection_flags(xprt);
   /* Mark transport as closed and wake up all pending tasks */
   xprt_disconnect_done(xprt);
 + xprt_force_disconnect(xprt);
  }
  
  /**
 @@ -2103,6 +2086,27 @@ out:
   xprt_wake_pending_tasks(xprt, status);
  }
  
 +/**
 + * xs_tcp_shutdown - gracefully shut down a TCP socket
 + * @xprt: transport
 + *
 + * Initiates a graceful shutdown of the TCP socket by calling the
 + * equivalent of shutdown(SHUT_RDWR);
 + */
 +static void xs_tcp_shutdown(struct rpc_xprt *xprt)
 +{
 + struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt);
 + struct socket *sock = transport->sock;
 +
 + if (sock == NULL)
 + return;
 + if (xprt_connected(xprt)) {
 + kernel_sock_shutdown(sock, SHUT_RDWR);
 + trace_rpc_socket_shutdown(xprt, sock);
 + } else
 + xs_reset_transport(transport);
 +}
 +
  static int xs_tcp_finish_connecting(struct rpc_xprt *xprt, struct socket *sock)
  {
   struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt);



[PATCH 1/3] rcar_can: fix IRQ check

2015-06-19 Thread Sergei Shtylyov
rcar_can_probe() regards 0 as a wrong IRQ #, even though platform_get_irq(),
which it calls, returns a negative error code on failure. This leads to the
following being printed to the console when attempting to open the device:

error requesting interrupt fffa

because rcar_can_open() calls request_irq() with a negative IRQ #, and that
function naturally fails with -EINVAL.

Check for the negative error codes instead and propagate them upstream instead
of just returning -ENODEV.
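
The difference between the two checks can be sketched in plain userspace C. This is only an illustration, not the driver code: fake_get_irq() is a hypothetical stand-in for platform_get_irq(), and the choice of -ENXIO as its failure value is an assumption made for the example.

```c
#include <errno.h>

/* Hypothetical stand-in for platform_get_irq(): a positive IRQ number on
 * success, a negative errno value on failure (-ENXIO is assumed here). */
static int fake_get_irq(int have_irq)
{
	return have_irq ? 42 : -ENXIO;
}

/* Old check: only 0 is treated as an error, so a negative code slips
 * through and is stored as if it were a valid IRQ number. */
static int probe_old(int have_irq, int *irq_out)
{
	int irq = fake_get_irq(have_irq);

	if (!irq)
		return -ENODEV;
	*irq_out = irq;
	return 0;
}

/* Fixed check: negative values are rejected and propagated unchanged. */
static int probe_new(int have_irq, int *irq_out)
{
	int irq = fake_get_irq(have_irq);

	if (irq < 0)
		return irq;
	*irq_out = irq;
	return 0;
}
```

With the old check a failed lookup still "succeeds" and leaves a negative number in the IRQ field, which is what later reaches request_irq().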

Fixes: fd1159318e55 ("can: add Renesas R-Car CAN driver")
Signed-off-by: Sergei Shtylyov sergei.shtyl...@cogentembedded.com

---
 drivers/net/can/rcar_can.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Index: linux-can/drivers/net/can/rcar_can.c
===
--- linux-can.orig/drivers/net/can/rcar_can.c
+++ linux-can/drivers/net/can/rcar_can.c
@@ -758,8 +758,9 @@ static int rcar_can_probe(struct platfor
 	}
 
 	irq = platform_get_irq(pdev, 0);
-	if (!irq) {
+	if (irq < 0) {
 		dev_err(&pdev->dev, "No IRQ resource\n");
+		err = irq;
 		goto fail;
 	}
 



[PATCH 1/2] rcar_can: print request_irq() error code

2015-06-19 Thread Sergei Shtylyov
Also print the error code when the request_irq() call fails in rcar_can_open(),
rewording  the error message...

Signed-off-by: Sergei Shtylyov sergei.shtyl...@cogentembedded.com

---
 drivers/net/can/rcar_can.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Index: linux-can/drivers/net/can/rcar_can.c
===
--- linux-can.orig/drivers/net/can/rcar_can.c
+++ linux-can/drivers/net/can/rcar_can.c
@@ -527,7 +527,8 @@ static int rcar_can_open(struct net_devi
 	napi_enable(&priv->napi);
 	err = request_irq(ndev->irq, rcar_can_interrupt, 0, ndev->name, ndev);
 	if (err) {
-		netdev_err(ndev, "error requesting interrupt %d\n", ndev->irq);
+		netdev_err(ndev, "request_irq(%d) failed, error %d\n",
+			   ndev->irq, err);
 		goto out_close;
 	}
 	can_led_event(ndev, CAN_LED_EVENT_OPEN);



Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )

2015-06-19 Thread Steven Rostedt
On Fri, 19 Jun 2015 20:37:45 -0400
Steven Rostedt rost...@goodmis.org wrote:


  Is it causing any other damage than the rkhunter warning you reported?
 
 Well, not that I know of. Are you sure that this port will be
 reconnected, and is not just a leak? Not sure if you could waste more
 ports this way with connections to other machines. I only have my
 wife's box connect to this server. This server is actually a client to
 my other boxes.
 
 Although the rkhunter warning is the only thing that triggers, I still
 would think this is a stable fix, especially if the port is leaked and
 not taken again.

I did some experiments. If I unmount the directories from my wife's
machine and remount them, the port that was hidden is fully closed.
Maybe it's not that big of a deal after all.

-- Steve


Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )

2015-06-19 Thread Trond Myklebust
On Fri, Jun 19, 2015 at 9:27 PM, Steven Rostedt rost...@goodmis.org wrote:
 On Fri, 19 Jun 2015 19:25:59 -0400
 Trond Myklebust trond.mykleb...@primarydata.com wrote:


 8--
 From 4876cc779ff525b9c2376d8076edf47815e71f2c Mon Sep 17 00:00:00 2001
 From: Trond Myklebust trond.mykleb...@primarydata.com
 Date: Fri, 19 Jun 2015 16:17:57 -0400
 Subject: [PATCH v2] SUNRPC: Ensure we release the TCP socket once it has been
  closed

 This fixes a regression introduced by commit caf4ccd4e88cf2 ("SUNRPC:
 Make xs_tcp_close() do a socket shutdown rather than a sock_release").
 Prior to that commit, the autoclose feature would ensure that an
 idle connection would result in the socket being both disconnected and
 released, whereas now it only gets disconnected.

 While the current behaviour is harmless, it does leave the port bound
 until either RPC traffic resumes or the RPC client is shut down.

 Is there a way to test RPC traffic resuming? I'd like to try that before
 declaring this bug harmless.

You should be seeing the same issue if you mount an NFSv3 partition.
After about 5 minutes of inactivity, the client will close down the
connection to the server, and rkhunter should again see the phantom
socket. If you then try to access the partition, the RPC layer should
immediately release the socket and establish a new connection on the
same port.

Cheers
  Trond


[PATCH 2/3] rcar_can: print signed IRQ #

2015-06-19 Thread Sergei Shtylyov
Printing IRQ # using %x and %u unsigned formats isn't quite correct as
'ndev->irq' is of type *int*, so the %d format needs to be used instead.

While fixing this, beautify the dev_info() message in rcar_can_probe() a bit.
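
As a small userspace illustration of why the format matters (a sketch only; format_irq() is a hypothetical helper, and a 32-bit int is assumed), the same negative value reads very differently under the two conversions:

```c
#include <stdio.h>

/* Format a possibly negative IRQ number with both conversions into
 * caller-supplied buffers of size len. */
static void format_irq(int irq, char *hex, char *dec, size_t len)
{
	/* %x expects an unsigned int; the cast makes the reinterpretation
	 * of a negative value explicit. */
	snprintf(hex, len, "%x", (unsigned int)irq);
	/* %d matches the int type and keeps the sign visible. */
	snprintf(dec, len, "%d", irq);
}
```

For irq = -6 and a 32-bit int, the %x form yields "fffffffa" while the %d form yields "-6", which is much easier to recognise as an error code.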

Fixes: fd1159318e55 ("can: add Renesas R-Car CAN driver")
Signed-off-by: Sergei Shtylyov sergei.shtyl...@cogentembedded.com

---
 drivers/net/can/rcar_can.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux-can/drivers/net/can/rcar_can.c
===
--- linux-can.orig/drivers/net/can/rcar_can.c
+++ linux-can/drivers/net/can/rcar_can.c
@@ -526,7 +526,7 @@ static int rcar_can_open(struct net_devi
 	napi_enable(&priv->napi);
 	err = request_irq(ndev->irq, rcar_can_interrupt, 0, ndev->name, ndev);
 	if (err) {
-		netdev_err(ndev, "error requesting interrupt %x\n", ndev->irq);
+		netdev_err(ndev, "error requesting interrupt %d\n", ndev->irq);
 		goto out_close;
 	}
 	can_led_event(ndev, CAN_LED_EVENT_OPEN);
@@ -824,7 +824,7 @@ static int rcar_can_probe(struct platfor
 
 	devm_can_led_init(ndev);
 
-	dev_info(&pdev->dev, "device registered (reg_base=%p, irq=%u)\n",
+	dev_info(&pdev->dev, "device registered (regs @ %p, IRQ%d)\n",
 		 priv->regs, ndev->irq);
 
 	return 0;



[PATCH 3/3] rcar_can: fix typo in error message

2015-06-19 Thread Sergei Shtylyov
Fix typo in the first error message printed by rcar_can_open().

Based on the original patch by Vladimir Barinov.

Fixes: 862e2b6af941 ("can: rcar_can: support all input clocks")
Reported-by: Vladimir Barinov vladimir.bari...@cogentembedded.com
Signed-off-by: Sergei Shtylyov sergei.shtyl...@cogentembedded.com

---
 drivers/net/can/rcar_can.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Index: linux-can/drivers/net/can/rcar_can.c
===
--- linux-can.orig/drivers/net/can/rcar_can.c
+++ linux-can/drivers/net/can/rcar_can.c
@@ -508,7 +508,8 @@ static int rcar_can_open(struct net_devi
 
 	err = clk_prepare_enable(priv->clk);
 	if (err) {
-		netdev_err(ndev, "failed to enable periperal clock, error %d\n",
+		netdev_err(ndev,
+			   "failed to enable peripheral clock, error %d\n",
 			   err);
 		goto out;
 	}



Re: [PATCH RFC net] neigh: do not modify unlinked entries

2015-06-19 Thread Eric Dumazet
On Tue, 2015-06-16 at 22:56 +0300, Julian Anastasov wrote:
 The lockless lookups can return an entry that is unlinked.
 Sometimes they get a reference before the last neigh_cleanup_and_release,
 sometimes they do not need a reference. Later, any
 modification attempts may result in the following problems:
 
 1. The entry is not destroyed immediately because neigh_update
 can start the timer for a dead entry, e.g. on change to the NUD_REACHABLE
 state. As a result, the entry lives for some time but is invisible
 and out of control.
 
 2. __neigh_event_send can run in parallel with neigh_destroy
 while refcnt=0, but if the timer is started and expires, refcnt can
 reach 0 for a second time, leading to a second neigh_destroy and
 a possible crash.
 
 Thanks to Eric Dumazet and Ying Xue for their work and analysis
 on the __neigh_event_send change.
 
 Fixes: 767e97e1e0db ("neigh: RCU conversion of struct neighbour")
 Fixes: a263b3093641 ("ipv4: Make neigh lookups directly in output packet path.")
 Fixes: 6fd6ce2056de ("ipv6: Do not depend on rt->n in ip6_finish_output2().")
 Cc: Eric Dumazet eric.duma...@gmail.com
 Cc: Ying Xue ying@windriver.com
 Signed-off-by: Julian Anastasov j...@ssi.bg
 ---

Seems good to me Julian !

Acked-by: Eric Dumazet eduma...@google.com


--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next v2] bridge: multicast: start querier timer when running user-space stp

2015-06-19 Thread Herbert Xu
On Fri, Jun 19, 2015 at 01:45:50AM -0700, Nikolay Aleksandrov wrote:
 When STP is running in user-space and querier is configured, the
 querier timer is not started when a port goes to a non-blocking state.
 This patch unifies the user- and kernel-space stp multicast port enable
 path and enables it in all states different from blocking. Note that when a
 port goes in BR_STATE_DISABLED it's not enabled because that is handled
 in the beginning of the port list loop.
 
 Signed-off-by: Nikolay Aleksandrov niko...@cumulusnetworks.com

Acked-by: Herbert Xu herb...@gondor.apana.org.au

On a related note, we never disable the multicast querying when
the port goes into blocking mode and we probably should.  So could
you take a look at that and create a patch for it?

Thanks,
-- 
Email: Herbert Xu herb...@gondor.apana.org.au
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH net-next] inet_diag: Remove _bh suffix in inet_diag_dump_reqs().

2015-06-19 Thread Hiroaki Shimoda
On Thu, 18 Jun 2015 23:42:08 -0700
Eric Dumazet eric.duma...@gmail.com wrote:

 Sure, although this will soon be removed completely when SYN_RECV
 sockets will be stored in regular ehash table.

OK. Thank you for letting me know.


[PATCH v1] bpf: BPF based latency tracing

2015-06-19 Thread Daniel Wagner
BPF offers another way to generate latency histograms. We attach
kprobes at trace_preempt_off and trace_preempt_on and calculate the
time between the off and on transitions.

The first array is used to store the start time stamp. The key is the
CPU id. The second array stores the log2(time diff). We need to use
static allocation here (array and not hash tables). The kprobes
hooking into trace_preempt_on|off should not call any dynamic
memory allocation or free path. We need to avoid getting called
recursively. Besides that, it reduces jitter in the measurement.
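
The log2 bucketing mentioned above can be sketched in userspace C as follows. This is an illustrative assumption about how a time difference maps to a histogram slot, not the code from the patch; log2_slot(), record() and MAX_SLOTS are hypothetical names.

```c
#include <stdint.h>

/* Number of power-of-two buckets; an illustrative choice. */
#define MAX_SLOTS 32

/* Return floor(log2(delta)), clamped to the last slot. Slot i covers the
 * range [2^i, 2^(i+1) - 1]; a delta of 0 is counted in slot 0. */
static unsigned int log2_slot(uint64_t delta)
{
	unsigned int slot = 0;

	while (delta > 1 && slot < MAX_SLOTS - 1) {
		delta >>= 1;
		slot++;
	}
	return slot;
}

/* Count one sample in a fixed-size histogram of MAX_SLOTS entries. */
static void record(uint64_t *hist, uint64_t delta)
{
	hist[log2_slot(delta)]++;
}
```

With this mapping a sample of 3000 lands in slot 11, which corresponds to the 2048 to 4095 row of the output below.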

CPU 0
      latency        : count
       1 -> 1        : 0
       2 -> 3        : 0
       4 -> 7        : 0
       8 -> 15       : 0
      16 -> 31       : 0
      32 -> 63       : 0
      64 -> 127      : 0
     128 -> 255      : 0
     256 -> 511      : 0
     512 -> 1023     : 0
    1024 -> 2047     : 0
    2048 -> 4095     : 166723
    4096 -> 8191     : 19870
    8192 -> 16383    : 6324
   16384 -> 32767    : 1098
   32768 -> 65535    : 190
   65536 -> 131071   : 179
  131072 -> 262143   : 18
  262144 -> 524287   : 4
  524288 -> 1048575  : 1363
CPU 1
      latency        : count
       1 -> 1        : 0
       2 -> 3        : 0
       4 -> 7        : 0
       8 -> 15       : 0
      16 -> 31       : 0
      32 -> 63       : 0
      64 -> 127      : 0
     128 -> 255      : 0
     256 -> 511      : 0
     512 -> 1023     : 0
    1024 -> 2047     : 0
    2048 -> 4095     : 114042
    4096 -> 8191     : 9587
    8192 -> 16383    : 4140
   16384 -> 32767    : 673
   32768 -> 65535    : 179
   65536 -> 131071   : 29
  131072 -> 262143   : 4
  262144 -> 524287   : 1
  524288 -> 1048575  : 364
CPU 2
      latency        : count
       1 -> 1        : 0
       2 -> 3        : 0
       4 -> 7        : 0
       8 -> 15       : 0
      16 -> 31       : 0
      32 -> 63       : 0
      64 -> 127      : 0
     128 -> 255      : 0
     256 -> 511      : 0
     512 -> 1023     : 0
    1024 -> 2047     : 0
    2048 -> 4095     : 40147
    4096 -> 8191     : 2300
    8192 -> 16383    : 828
   16384 -> 32767    : 178
   32768 -> 65535    : 59
   65536 -> 131071   : 2
  131072 -> 262143   : 0

Re: [PATCH net-next RFC v2 2/3] ipv4: add support for light weight tunnel encap attributes

2015-06-19 Thread roopa

On 6/18/15, 11:59 PM, Julian Anastasov wrote:

Hello,

On Thu, 18 Jun 2015, Roopa Prabhu wrote:


@@ -366,6 +371,7 @@ static inline size_t fib_nlmsg_size(struct fib_info *fi)
payload += nla_total_size((RTAX_MAX * nla_total_size(4)));
  
  	if (fi->fib_nhs) {

+   size_t nh_encapsize = 0;

Var not in #ifdef. Any warnings with CONFIG_LWTUNNEL=n?


/* Also handles the special case fib_nhs == 1 */
  
  		/* each nexthop is packed in an attribute */

@@ -374,8 +380,23 @@ static inline size_t fib_nlmsg_size(struct fib_info *fi)
/* may contain flow and gateway attribute */
nhsize += 2 * nla_total_size(4);
  
+#ifdef CONFIG_LWTUNNEL

+   /* grab encap info */
+   for_nexthops(fi) {
+   if (nh->nh_lwtstate) {
+   /* RTA_ENCAP_TYPE */
+   nh_encapsize += lwtunnel_get_encap_size(
+   nh->nh_lwtstate);

New labels not in #ifdef:

Will check and fix all warnings with CONFIG_LWTUNNEL off



+
+err_inval:
+   ret = -EINVAL;
+
+errout:
+   return ret;
  }

Some other places may need changes:

- nh_comp: there is logic that decides if same fib_info
is reused from many fib nodes. There should be check
if NH matches by nh_lwtstate.


yes, i will add that.


- xfrm4_fill_dst: not sure about this but some fields
are copied.

I have not picked up xfrm4_fill_dst specifically, but this infra is 
supposed to be similar to that.

I will look.

Thanks.


[PATCH v2] bpf: BPF based latency tracing

2015-06-19 Thread Daniel Wagner
BPF offers another way to generate latency histograms. We attach
kprobes at trace_preempt_off and trace_preempt_on and calculate the
time between the off and on transitions.

The first array is used to store the start time stamp. The key is the
CPU id. The second array stores the log2(time diff). We need to use
static allocation here (array and not hash tables). The kprobes
hooking into trace_preempt_on|off should not call any dynamic
memory allocation or free path. We need to avoid getting called
recursively. Besides that, it reduces jitter in the measurement.
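
The two statically allocated arrays described above can be modelled in userspace C roughly like this. It is a sketch under stated assumptions, not the BPF program from the patch: the array sizes, the hook names on_preempt_off()/on_preempt_on() and the use of 0 as "no start recorded" are all hypothetical.

```c
#include <stdint.h>

#define SKETCH_NR_CPUS  4   /* hypothetical CPU count */
#define SKETCH_NR_SLOTS 32  /* hypothetical number of log2 buckets */

/* First array: start time stamp, keyed by CPU id. */
static uint64_t start_ts[SKETCH_NR_CPUS];
/* Second array: per-CPU counters indexed by log2(time diff). */
static uint64_t lat_hist[SKETCH_NR_CPUS][SKETCH_NR_SLOTS];

/* Called where preemption is switched off: remember the time stamp. */
static void on_preempt_off(unsigned int cpu, uint64_t now)
{
	if (cpu < SKETCH_NR_CPUS)
		start_ts[cpu] = now;
}

/* Called where preemption is switched back on: bucket the difference.
 * Everything is preallocated, so no allocation happens in this path. */
static void on_preempt_on(unsigned int cpu, uint64_t now)
{
	uint64_t delta;
	unsigned int slot = 0;

	if (cpu >= SKETCH_NR_CPUS || start_ts[cpu] == 0)
		return;		/* no matching "off" event seen */

	delta = now - start_ts[cpu];
	start_ts[cpu] = 0;

	while (delta > 1 && slot < SKETCH_NR_SLOTS - 1) {
		delta >>= 1;
		slot++;
	}
	lat_hist[cpu][slot]++;
}
```

Because both arrays have a fixed size, the hot path only reads and writes existing memory, which matches the stated goal of avoiding allocation and recursion inside the probes.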

CPU 0
      latency        : count
       1 -> 1        : 0
       2 -> 3        : 0
       4 -> 7        : 0
       8 -> 15       : 0
      16 -> 31       : 0
      32 -> 63       : 0
      64 -> 127      : 0
     128 -> 255      : 0
     256 -> 511      : 0
     512 -> 1023     : 0
    1024 -> 2047     : 0
    2048 -> 4095     : 166723
    4096 -> 8191     : 19870
    8192 -> 16383    : 6324
   16384 -> 32767    : 1098
   32768 -> 65535    : 190
   65536 -> 131071   : 179
  131072 -> 262143   : 18
  262144 -> 524287   : 4
  524288 -> 1048575  : 1363
CPU 1
      latency        : count
       1 -> 1        : 0
       2 -> 3        : 0
       4 -> 7        : 0
       8 -> 15       : 0
      16 -> 31       : 0
      32 -> 63       : 0
      64 -> 127      : 0
     128 -> 255      : 0
     256 -> 511      : 0
     512 -> 1023     : 0
    1024 -> 2047     : 0
    2048 -> 4095     : 114042
    4096 -> 8191     : 9587
    8192 -> 16383    : 4140
   16384 -> 32767    : 673
   32768 -> 65535    : 179
   65536 -> 131071   : 29
  131072 -> 262143   : 4
  262144 -> 524287   : 1
  524288 -> 1048575  : 364
CPU 2
      latency        : count
       1 -> 1        : 0
       2 -> 3        : 0
       4 -> 7        : 0
       8 -> 15       : 0
      16 -> 31       : 0
      32 -> 63       : 0
      64 -> 127      : 0
     128 -> 255      : 0
     256 -> 511      : 0
     512 -> 1023     : 0
    1024 -> 2047     : 0
    2048 -> 4095     : 40147
    4096 -> 8191     : 2300
    8192 -> 16383    : 828
   16384 -> 32767    : 178
   32768 -> 65535    : 59
   65536 -> 131071   : 2
  131072 -> 262143   : 0

Re: [PATCH] net: inet_diag: export IPV6_V6ONLY sockopt

2015-06-19 Thread Eric Dumazet
On Fri, 2015-06-19 at 14:15 +0200, Phil Sutter wrote:
 For AF_INET6 sockets, the value of struct ipv6_pinfo.ipv6only is
 exported to userspace. It indicates whether an unbound socket listens on
 IPv4 as well as IPv6.

What is an 'unbound socket' ??? This makes no sense to me here.

  Since the socket is natively IPv6, it is not
 listed by e.g. 'netstat -l -4'.

netstat does not use this interface. iproute2/ss does.

 
 Signed-off-by: Phil Sutter p...@nwl.cc
 ---
 This patch is accompanied by an appropriate one for iproute2 to enable
 the additional information in 'ss -e'.
 ---
  include/uapi/linux/inet_diag.h | 3 ++-
  net/ipv4/inet_diag.c   | 4 
  2 files changed, 6 insertions(+), 1 deletion(-)
 
 diff --git a/include/uapi/linux/inet_diag.h b/include/uapi/linux/inet_diag.h
 index c7093c7..9ca4834 100644
 --- a/include/uapi/linux/inet_diag.h
 +++ b/include/uapi/linux/inet_diag.h
 @@ -111,9 +111,10 @@ enum {
   INET_DIAG_SKMEMINFO,
   INET_DIAG_SHUTDOWN,
   INET_DIAG_DCTCPINFO,
 + INET_DIAG_SKV6ONLY,
  };
  
 -#define INET_DIAG_MAX INET_DIAG_DCTCPINFO
 +#define INET_DIAG_MAX INET_DIAG_SKV6ONLY
  
  /* INET_DIAG_MEM */
  
 diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
 index 4d32262..4bf6d03 100644
 --- a/net/ipv4/inet_diag.c
 +++ b/net/ipv4/inet_diag.c
 @@ -151,6 +151,10 @@ int inet_sk_diag_fill(struct sock *sk, struct inet_connection_sock *icsk,
   if (nla_put_u8(skb, INET_DIAG_TCLASS,
  inet6_sk(sk)->tclass) < 0)
   goto errout;
 +
 + if (nla_put_u8(skb, INET_DIAG_SKV6ONLY,
 + inet6_sk(sk)->ipv6only) < 0)
 + goto errout;
   }
  #endif
  

1) This certainly should not compile on current linux trees.
   Always submit such patches on net-next.

2) It is not clear why we would add this attribute if it is 0.
This looks like a waste of data.

So I would rather use :

diff --git a/include/uapi/linux/inet_diag.h b/include/uapi/linux/inet_diag.h
index b629fc53b1090e73047b263a9231e34ebf64b2af..46d72e45f8701526abb06f4a8187262dbc635784 100644
--- a/include/uapi/linux/inet_diag.h
+++ b/include/uapi/linux/inet_diag.h
@@ -112,6 +112,7 @@ enum {
 	INET_DIAG_SHUTDOWN,
 	INET_DIAG_DCTCPINFO,
 	INET_DIAG_PROTOCOL,  /* response attribute only */
+	INET_DIAG_SKV6ONLY,
 };
 
 #define INET_DIAG_MAX INET_DIAG_PROTOCOL
diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
index 21985d8d41e709908021769be36380f7a5dfac23..381a26e932691075a73ae63569fd3a4366ce277f 100644
--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -151,6 +151,9 @@ int inet_sk_diag_fill(struct sock *sk, struct inet_connection_sock *icsk,
 		if (nla_put_u8(skb, INET_DIAG_TCLASS,
 			       inet6_sk(sk)->tclass) < 0)
 			goto errout;
+		if (ipv6_only_sock(sk) &&
+		    nla_put_u8(skb, INET_DIAG_SKV6ONLY, 1))
+			goto errout;
 	}
 #endif
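
The idea of emitting the attribute only when the flag is set can be illustrated with a toy type-length-value buffer in userspace C. This is a sketch, not netlink code: struct attr_buf, put_u8(), fill_v6only() and ATTR_V6ONLY are hypothetical names standing in for the skb and nla_put_u8().

```c
#include <stddef.h>
#include <stdint.h>

/* Toy attribute buffer; a stand-in for an skb carrying attributes. */
struct attr_buf {
	uint8_t data[64];
	size_t len;
};

#define ATTR_V6ONLY 1	/* hypothetical attribute type for this sketch */

/* Append a one-byte attribute as (type, length, value).
 * Returns 0 on success, -1 if the buffer is full. */
static int put_u8(struct attr_buf *b, uint8_t type, uint8_t value)
{
	if (b->len + 3 > sizeof(b->data))
		return -1;
	b->data[b->len++] = type;
	b->data[b->len++] = 1;
	b->data[b->len++] = value;
	return 0;
}

/* Emit the attribute only when the flag is set; a reader treats a
 * missing attribute as 0, so nothing is spent on the common case. */
static int fill_v6only(struct attr_buf *b, int v6only)
{
	if (v6only && put_u8(b, ATTR_V6ONLY, 1))
		return -1;
	return 0;
}
```

The consumer then only has to test for the presence of the attribute rather than parse its value.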
 




Re: [PATCH net-next v2] bridge: multicast: start querier timer when running user-space stp

2015-06-19 Thread Nikolay Aleksandrov

 On Jun 19, 2015, at 4:47 PM, Herbert Xu herb...@gondor.apana.org.au wrote:
 
 On Fri, Jun 19, 2015 at 01:45:50AM -0700, Nikolay Aleksandrov wrote:
 When STP is running in user-space and querier is configured, the
 querier timer is not started when a port goes to a non-blocking state.
 This patch unifies the user- and kernel-space stp multicast port enable
 path and enables it in all states different from blocking. Note that when a
 port goes in BR_STATE_DISABLED it's not enabled because that is handled
 in the beginning of the port list loop.
 
 Signed-off-by: Nikolay Aleksandrov niko...@cumulusnetworks.com
 
 Acked-by: Herbert Xu herb...@gondor.apana.org.au
 
 On a related note, we never disable the multicast querying when
 the port goes into blocking mode and we probably should.  So could
 you take a look at that and create a patch for it?
 
 Thanks,
 -- 
 Email: Herbert Xu herb...@gondor.apana.org.au
 Home Page: http://gondor.apana.org.au/~herbert/
 PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
 --
 To unsubscribe from this list: send the line unsubscribe netdev in

Good catch, I’ll look into it.

Thanks,
 Nik


[PATCH v2 net-next] macvtap: Increase limit of macvtap queues

2015-06-19 Thread Pankaj Gupta
Macvtap should be compatible with tuntap for the
maximum number of queues.

Commit baf71c5c1f80d82e92924050a60b5baaf97e3094 ("tuntap:
Increase the number of queues in tun.") removes
the limitations and increases the number of queues in tuntap.
Now it is safe to increase the number of queues in macvtap as well.

This patch also modifies the 'macvtap_del_queues' function
to avoid extra memory allocation on the stack.

Changes from v1->v2:
Michael S. Tsirkin, Jason Wang:
  Better way to use linked list to
  avoid use of extra memory on the stack.
Sergei Shtylyov: Specify dependent commit's summary.

Signed-off-by: Pankaj Gupta pagu...@redhat.com
---
 drivers/net/macvtap.c  | 10 ++
 include/linux/if_macvlan.h |  2 +-
 2 files changed, 3 insertions(+), 9 deletions(-)

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 483afb1..6a64197f 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -263,27 +263,21 @@ out:
 static void macvtap_del_queues(struct net_device *dev)
 {
 	struct macvlan_dev *vlan = netdev_priv(dev);
-	struct macvtap_queue *q, *tmp, *qlist[MAX_MACVTAP_QUEUES];
-	int i, j = 0;
+	struct macvtap_queue *q, *tmp;
 
 	ASSERT_RTNL();
 	list_for_each_entry_safe(q, tmp, &vlan->queue_list, next) {
 		list_del_init(&q->next);
-		qlist[j++] = q;
 		RCU_INIT_POINTER(q->vlan, NULL);
 		if (q->enabled)
 			vlan->numvtaps--;
 		vlan->numqueues--;
+		sock_put(&q->sk);
 	}
-	for (i = 0; i < vlan->numvtaps; i++)
-		RCU_INIT_POINTER(vlan->taps[i], NULL);
 	BUG_ON(vlan->numvtaps);
 	BUG_ON(vlan->numqueues);
 	/* guarantee that any future macvtap_set_queue will fail */
 	vlan->numvtaps = MAX_MACVTAP_QUEUES;
-
-	for (--j; j >= 0; j--)
-		sock_put(&qlist[j]->sk);
 }
 
 static rx_handler_result_t macvtap_handle_frame(struct sk_buff **pskb)
diff --git a/include/linux/if_macvlan.h b/include/linux/if_macvlan.h
index 6f6929e..a4ccc31 100644
--- a/include/linux/if_macvlan.h
+++ b/include/linux/if_macvlan.h
@@ -29,7 +29,7 @@ struct macvtap_queue;
  * Maximum times a macvtap device can be opened. This can be used to
  * configure the number of receive queue, e.g. for multiqueue virtio.
  */
-#define MAX_MACVTAP_QUEUES 16
+#define MAX_MACVTAP_QUEUES 256
 
 #define MACVLAN_MC_FILTER_BITS 8
 #define MACVLAN_MC_FILTER_SZ   (1 << MACVLAN_MC_FILTER_BITS)
-- 
1.9.3
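
The reason the temporary qlist[] array is no longer needed can be sketched with a plain singly linked list in userspace C. This is an illustration only: struct queue, its refs field and del_queues() are hypothetical, and the kernel code uses list_for_each_entry_safe() on a doubly linked list instead.

```c
#include <stddef.h>

struct queue {
	struct queue *next;
	int refs;	/* stand-in for the socket reference count */
};

/* Unlink every queue and drop its reference in the same pass. The
 * successor is saved before the node is modified, which is the idea
 * behind a "safe" list walk, so no side array of pointers is needed.
 * Returns the number of queues released. */
static int del_queues(struct queue **head)
{
	struct queue *q = *head, *tmp;
	int released = 0;

	while (q) {
		tmp = q->next;	/* remember the successor first */
		q->next = NULL;	/* unlink the node */
		q->refs--;	/* analogous to sock_put() in the patch */
		released++;
		q = tmp;
	}
	*head = NULL;
	return released;
}
```

Since stack usage no longer grows with the queue limit, raising the limit does not enlarge the function's stack frame.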



[PATCH] xen-netback: fix a BUG() during initialization

2015-06-19 Thread Imre Palik
From: Palik, Imre im...@amazon.de

Commit edafc132baac ("xen-netback: making the bandwidth limiter runtime
settable") introduced the capability to change the bandwidth rate limit at
runtime. But it also introduced a possible crashing bug.

If netback receives two XenbusStateConnected without getting the
hotplug-status watch firing in between, then it will try to register the
watches for the rate limiter again.  But this triggers a BUG() in the watch
registration code.

The fix modifies connect() to remove the possibly existing packet-rate
watches before trying to install those watches.  This behaviour is in line
with how connect() deals with the hotplug-status watch.
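
The "unregister first, then register" pattern described above can be sketched in userspace C. This is a simplified model, not the xenbus API: struct watch, register_watch(), unregister_watch() and reconnect() are hypothetical names, and a non-NULL node pointer is assumed to mean "registered".

```c
#include <errno.h>
#include <stddef.h>

struct watch {
	const char *node;	/* non-NULL while the watch is registered */
};

/* Refuse to register a watch twice instead of hitting a BUG(). */
static int register_watch(struct watch *w, const char *node)
{
	if (w->node)
		return -EADDRINUSE;
	w->node = node;
	return 0;
}

/* Safe to call even when nothing is registered. */
static void unregister_watch(struct watch *w)
{
	w->node = NULL;
}

/* connect()-style path: tear down any existing watch, then register. */
static int reconnect(struct watch *w, const char *node)
{
	unregister_watch(w);
	return register_watch(w, node);
}
```

Calling reconnect() twice in a row succeeds both times, whereas two bare register_watch() calls would fail on the second one.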

Signed-off-by: Imre Palik im...@amazon.de
Cc: Matt Wilson m...@amazon.com
---
 drivers/net/xen-netback/xenbus.c |4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
index 968787a..ec383b0 100644
--- a/drivers/net/xen-netback/xenbus.c
+++ b/drivers/net/xen-netback/xenbus.c
@@ -681,6 +681,9 @@ static int xen_register_watchers(struct xenbus_device *dev, struct xenvif *vif)
 	char *node;
 	unsigned maxlen = strlen(dev->nodename) + sizeof("/rate");
 
+	if (vif->credit_watch.node)
+		return -EADDRINUSE;
+
 	node = kmalloc(maxlen, GFP_KERNEL);
 	if (!node)
 		return -ENOMEM;
@@ -770,6 +773,7 @@ static void connect(struct backend_info *be)
 	}
 
 	xen_net_read_rate(dev, &credit_bytes, &credit_usec);
+	xen_unregister_watchers(be->vif);
 	xen_register_watchers(dev, be->vif);
 	read_xenbus_vif_flags(be);
 
-- 
1.7.9.5



[PATCH net v2] bridge: multicast: restore router configuration on port link down/up

2015-06-19 Thread Nikolay Aleksandrov
From: Satish Ashok sas...@cumulusnetworks.com

When a port goes through a link down/up the multicast router configuration
is not restored.

Signed-off-by: Satish Ashok sas...@cumulusnetworks.com
Signed-off-by: Nikolay Aleksandrov niko...@cumulusnetworks.com
Fixes: 0909e11758bd ("bridge: Add multicast_router sysfs entries")
Acked-by: Herbert Xu herb...@gondor.apana.org.au
---
v2: Added the acked-by and sent as a separate patch. I plan to repurpose
the second patch for net-next, they weren't dependent anyway.

 net/bridge/br_multicast.c |4 
 1 file changed, 4 insertions(+)

diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index ff667e18b2d6..761fc733bf6d 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -37,6 +37,8 @@
 
 static void br_multicast_start_querier(struct net_bridge *br,
 				       struct bridge_mcast_own_query *query);
+static void br_multicast_add_router(struct net_bridge *br,
+				    struct net_bridge_port *port);
 unsigned int br_mdb_rehash_seq;
 
 static inline int br_ip_equal(const struct br_ip *a, const struct br_ip *b)
@@ -936,6 +938,8 @@ void br_multicast_enable_port(struct net_bridge_port *port)
 #if IS_ENABLED(CONFIG_IPV6)
 	br_multicast_enable(&port->ip6_own_query);
 #endif
+	if (port->multicast_router == 2 && hlist_unhashed(&port->rlist))
+		br_multicast_add_router(br, port);
 
 out:
 	spin_unlock(&br->multicast_lock);
-- 
1.7.10.4



[PATCH net-next v2] bridge: multicast: start querier timer when running user-space stp

2015-06-19 Thread Nikolay Aleksandrov
When STP is running in user-space and querier is configured, the
querier timer is not started when a port goes to a non-blocking state.
This patch unifies the user- and kernel-space stp multicast port enable
path and enables it in all states different from blocking. Note that when a
port goes in BR_STATE_DISABLED it's not enabled because that is handled
in the beginning of the port list loop.
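
The resulting decision logic can be sketched in userspace C. This is a simplified model, not the bridge code: the enum values, struct port and select_states() are hypothetical, and the "continue" for disabled ports is an assumption based on the note above about the start of the port list loop.

```c
/* Hypothetical port states mirroring the BR_STATE_* names. */
enum port_state {
	ST_DISABLED,
	ST_LISTENING,
	ST_LEARNING,
	ST_FORWARDING,
	ST_BLOCKING,
};

struct port {
	enum port_state state;
	int mcast_enabled;
};

/* Walk all ports: skip disabled ones, enable multicast on every port
 * that is not blocking, and count the forwarding ports. */
static int select_states(struct port *ports, int n)
{
	int i, liveports = 0;

	for (i = 0; i < n; i++) {
		struct port *p = &ports[i];

		if (p->state == ST_DISABLED)
			continue;	/* handled at the top of the loop */

		if (p->state != ST_BLOCKING)
			p->mcast_enabled = 1;
		if (p->state == ST_FORWARDING)
			liveports++;
	}
	return liveports;
}
```

Because the check lives in the common selection loop, it applies regardless of whether the state was set by the kernel or by a user-space STP daemon.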

Signed-off-by: Nikolay Aleksandrov niko...@cumulusnetworks.com
---
v2: Repurposed for net-next and implemented Herbert's suggestion for
unifying both user- and kernel-space multicast enable port paths.

 net/bridge/br_stp.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/bridge/br_stp.c b/net/bridge/br_stp.c
index 45f1ff113af9..e7ab74b405a1 100644
--- a/net/bridge/br_stp.c
+++ b/net/bridge/br_stp.c
@@ -428,7 +428,6 @@ static void br_make_forwarding(struct net_bridge_port *p)
 	else
 		br_set_state(p, BR_STATE_LEARNING);
 
-	br_multicast_enable_port(p);
 	br_log_state(p);
 	br_ifinfo_notify(RTM_NEWLINK, p);
 
@@ -462,6 +461,8 @@ void br_port_state_selection(struct net_bridge *br)
 			}
 		}
 
+		if (p->state != BR_STATE_BLOCKING)
+			br_multicast_enable_port(p);
 		if (p->state == BR_STATE_FORWARDING)
 			++liveports;
 	}
-- 
2.4.3



[PATCH 1/1] ssb: remove unncessary out label

2015-06-19 Thread Maninder Singh
This patch removes the unnecessary label 'out' and
restructures the code to return the result of device_create_file() directly.

Signed-off-by: Maninder Singh maninder...@samsung.com
Reviewed-by: Rohit Thapliyal r.thapli...@samsung.com
---
 drivers/ssb/pci.c |8 +---
 1 files changed, 1 insertions(+), 7 deletions(-)

diff --git a/drivers/ssb/pci.c b/drivers/ssb/pci.c
index 0f28c08..d6ca4d3 100644
--- a/drivers/ssb/pci.c
+++ b/drivers/ssb/pci.c
@@ -1173,17 +1173,11 @@ void ssb_pci_exit(struct ssb_bus *bus)
 int ssb_pci_init(struct ssb_bus *bus)
 {
 	struct pci_dev *pdev;
-	int err;
 
 	if (bus->bustype != SSB_BUSTYPE_PCI)
 		return 0;
 
 	pdev = bus->host_pci;
 	mutex_init(&bus->sprom_mutex);
-	err = device_create_file(&pdev->dev, &dev_attr_ssb_sprom);
-	if (err)
-		goto out;
-
-out:
-	return err;
+	return device_create_file(&pdev->dev, &dev_attr_ssb_sprom);
 }
-- 
1.7.1



[PATCH] net: inet_diag: export IPV6_V6ONLY sockopt

2015-06-19 Thread Phil Sutter
For AF_INET6 sockets, the value of struct ipv6_pinfo.ipv6only is
exported to userspace. It indicates whether an unbound socket listens on
IPv4 as well as IPv6. Since the socket is natively IPv6, it is not
listed by e.g. 'netstat -l -4'.

Signed-off-by: Phil Sutter p...@nwl.cc
---
This patch is accompanied by an appropriate one for iproute2 to enable
the additional information in 'ss -e'.
---
 include/uapi/linux/inet_diag.h | 3 ++-
 net/ipv4/inet_diag.c   | 4 
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/inet_diag.h b/include/uapi/linux/inet_diag.h
index c7093c7..9ca4834 100644
--- a/include/uapi/linux/inet_diag.h
+++ b/include/uapi/linux/inet_diag.h
@@ -111,9 +111,10 @@ enum {
INET_DIAG_SKMEMINFO,
INET_DIAG_SHUTDOWN,
INET_DIAG_DCTCPINFO,
+   INET_DIAG_SKV6ONLY,
 };
 
-#define INET_DIAG_MAX INET_DIAG_DCTCPINFO
+#define INET_DIAG_MAX INET_DIAG_SKV6ONLY
 
 /* INET_DIAG_MEM */
 
diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
index 4d32262..4bf6d03 100644
--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -151,6 +151,10 @@ int inet_sk_diag_fill(struct sock *sk, struct inet_connection_sock *icsk,
 		if (nla_put_u8(skb, INET_DIAG_TCLASS,
 			       inet6_sk(sk)->tclass) < 0)
 			goto errout;
+
+		if (nla_put_u8(skb, INET_DIAG_SKV6ONLY,
+			       inet6_sk(sk)->ipv6only) < 0)
+			goto errout;
 	}
 #endif
 
-- 
2.1.2
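
For readers less familiar with the option being exported, here is a minimal userspace sketch of how IPV6_V6ONLY is set and read back on a socket. It is not part of the patch; the helper name v6only_roundtrip is made up for illustration, and it assumes a Linux host with the usual socket headers.

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the patch): create an IPv6 TCP socket,
 * set IPV6_V6ONLY to `on`, and read the option back.
 * Returns the value read back, or -1 on any failure (for example when
 * the host has no IPv6 support).
 */
static int v6only_roundtrip(int on)
{
	int val = -1;
	socklen_t len = sizeof(val);
	int fd = socket(AF_INET6, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;

	if (setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &on, sizeof(on)) < 0 ||
	    getsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &val, &len) < 0)
		val = -1;

	close(fd);
	return val;
}
```

With the option cleared, a wildcard-bound AF_INET6 listener also accepts IPv4 connections, which is the case the commit message says is not visible in 'netstat -l -4'.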



Re: [PATCH v2 1/3] net: mvneta: introduce compatible string marvell,armada-xp-neta

2015-06-19 Thread Simon Guinot
On Wed, Jun 17, 2015 at 05:01:12PM +, Jason Cooper wrote:
 Hi Gregory,
 
 On Wed, Jun 17, 2015 at 05:15:28PM +0200, Gregory CLEMENT wrote:
  On 17/06/2015 17:12, Gregory CLEMENT wrote:
   On 17/06/2015 15:19, Simon Guinot wrote:
   The mvneta driver supports the Ethernet IP found in the Armada 370, XP,
   380 and 385 SoCs. Since at least one more hardware feature is available
   for the Armada XP SoCs, a way to identify them is needed.
  
   This patch introduces a new compatible string marvell,armada-xp-neta.
   
   Let's be future proof by going further. I would like to have a
   compatible string for each SoC even if we currently don't use them.
 
 I disagree with this.  We can't predict what inconsistencies we'll discover in
 the future.  We should only assign new compatible strings based on known IP
 variations when we discover them.  This seems fraught with demons since we
 can't predict the scope of affected IP blocks (some steppings of one SoC,
 three SoCs plus two steppings of a fourth, etc.)
 
 imho, the 'future-proofing' lies in being specific as to the naming of the
 compatible strings against known hardware variations at the time.

So, should I add more compatible strings or not ?

Simon




Re: [PATCH net-next] inet_diag: Remove _bh suffix in inet_diag_dump_reqs().

2015-06-19 Thread Eric Dumazet
On Thu, 2015-06-18 at 20:40 +0900, Hiroaki Shimoda wrote:
 inet_diag_dump_reqs() is called from inet_diag_dump_icsk() with BH
 disabled. So no need to disable BH in inet_diag_dump_reqs().
 
 Signed-off-by: Hiroaki Shimoda shimoda.hiro...@gmail.com
 ---
  net/ipv4/inet_diag.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)
 
 diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
 index 21985d8d41e7..4ca789ba63cb 100644
 --- a/net/ipv4/inet_diag.c
 +++ b/net/ipv4/inet_diag.c
 @@ -746,7 +746,7 @@ static int inet_diag_dump_reqs(struct sk_buff *skb, struct sock *sk,
  
   entry.family = sk->sk_family;
  
 - spin_lock_bh(&icsk->icsk_accept_queue.syn_wait_lock);
 + spin_lock(&icsk->icsk_accept_queue.syn_wait_lock);
  
   lopt = icsk->icsk_accept_queue.listen_opt;
   if (!lopt || !listen_sock_qlen(lopt))
 @@ -794,7 +794,7 @@ static int inet_diag_dump_reqs(struct sk_buff *skb, struct sock *sk,
   }
  
  out:
 - spin_unlock_bh(&icsk->icsk_accept_queue.syn_wait_lock);
 + spin_unlock(&icsk->icsk_accept_queue.syn_wait_lock);
  
   return err;
  }

Sure, although this will soon be removed completely, once SYN_RECV
sockets are stored in the regular ehash table.

Thanks




Re: [PATCH net-next RFC v2 2/3] ipv4: add support for light weight tunnel encap attributes

2015-06-19 Thread Julian Anastasov

Hello,

On Thu, 18 Jun 2015, Roopa Prabhu wrote:

 @@ -366,6 +371,7 @@ static inline size_t fib_nlmsg_size(struct fib_info *fi)
   payload += nla_total_size((RTAX_MAX * nla_total_size(4)));
  
   if (fi->fib_nhs) {
 + size_t nh_encapsize = 0;

Var not in #ifdef. Any warnings with CONFIG_LWTUNNEL=n?

   /* Also handles the special case fib_nhs == 1 */
  
   /* each nexthop is packed in an attribute */
 @@ -374,8 +380,23 @@ static inline size_t fib_nlmsg_size(struct fib_info *fi)
   /* may contain flow and gateway attribute */
   nhsize += 2 * nla_total_size(4);
  
 +#ifdef CONFIG_LWTUNNEL
 + /* grab encap info */
 + for_nexthops(fi) {
 + if (nh->nh_lwtstate) {
 + /* RTA_ENCAP_TYPE */
 + nh_encapsize += lwtunnel_get_encap_size(
 + nh->nh_lwtstate);

New labels not in #ifdef:

 +
 +err_inval:
 + ret = -EINVAL;
 +
 +errout:
 + return ret;
  }

Some other places may need changes:

- nh_comp: there is logic that decides if same fib_info
is reused from many fib nodes. There should be check
if NH matches by nh_lwtstate.

- xfrm4_fill_dst: not sure about this but some fields
are copied.

Regards

--
Julian Anastasov j...@ssi.bg


Re: [PATCH v2] fm10k: Report MAC address on driver load

2015-06-19 Thread Jeff Kirsher
On Thu, 2015-06-18 at 19:41 -0700, Alexander Duyck wrote:
 This change adds the MAC address to the list of values recorded on driver
 load.  The MAC address represents the serial number of the unit and allows
 us to track the value should a card be replaced in a system.
 
 The log message should now be similar in output to that of ixgbe.
 
 Signed-off-by: Alexander Duyck alexander.h.du...@redhat.com
 ---
 
 v2: Moved printing of MAC onto separate line similar to ixgbe.
 
 (Hopefully this works for you Jeff.  I took a look at the patch and just
  moved the bit I needed down.  I figured since this block hasn't changed I
  should be able to get away with just doing this instead of pulling and
  rebasing off of your tree.)
 
  drivers/net/ethernet/intel/fm10k/fm10k_pci.c |3 +++
  1 file changed, 3 insertions(+)

Works for me!  I have added your updated patch to my queue.




Re: [PATCH net 0/2] bridge: multicast behaviour fixes

2015-06-19 Thread Nikolay Aleksandrov

 On Jun 17, 2015, at 2:28 PM, Nikolay Aleksandrov 
 niko...@cumulusnetworks.com wrote:
 
 Hi,
 Patch 01 fixes a problem when a router is configured and a port goes
 through a link down/up, the router configuration was not restored.
 Patch 02 starts the multicast querier when using user-space STP and a
 port goes to forwarding state.
 These are behaviour fixes and if you think they are more appropriate for
 net-next, then feel free to apply them there, I've run them with both
 net and net-next. Also I've provided fixes tags, but since these are
 behaviour changes they're the initial implementations of these functions.
 
 Best regards,
 Nikolay Aleksandrov
 
 Satish Ashok (2):
  bridge: multicast: restore router configuration on port link down/up
  bridge: multicast: start querier timer when running user-space stp
 
 net/bridge/br_multicast.c | 4 
 net/bridge/br_stp.c   | 3 +++
 2 files changed, 7 insertions(+)
 
 -- 
 2.4.3
 

Dave please drop this series, I’ll post the patches separately because I have
to implement Herbert’s suggestion and would like to repurpose patch 02 for
net-next.

Thanks,
 Nik


Re: [PATCH 3/3] net: dsa: Allow configuration of CPU DSA port speeds/duplex

2015-06-19 Thread Andrew Lunn
 Yes I do have debug too, but via sysfs (with eventually write access) for:
 GLOBAL1, GLOBAL2, cpu port registers, SerDes registers, PVIDs, and VTU.
 Not really standard though.

We should really get an implementation into mainline. There is no
point us all implementing our own.

You say your code is not really standard. Do you think it would get
rejected if it was submitted? The rules for debugfs are much more
relaxed, so what i have should be acceptable.

 Andrew


Re: [PATCH net-next RFC v2 2/3] ipv4: add support for light weight tunnel encap attributes

2015-06-19 Thread roopa

On 6/19/15, 8:19 AM, Robert Shearman wrote:

On 19/06/15 05:49, Roopa Prabhu wrote:

From: Roopa Prabhu ro...@cumulusnetworks.com

Introduces two netlink attributes RTA_ENCAP_TYPE and
RTA_ENCAP to support attaching encap information to ipv4 routes.


Surely RTA_ENCAP_TYPE should be part of RTA_ENCAP, since the type 
doesn't make sense without the data and vice versa?

I went back and forth on this, and started with what you are saying
above. But then I wanted the RTA_ENCAP netlink policy to be declared by
the individual lwtunnel drivers. To determine which RTA_ENCAP netlink
policy to pick, you need to know the RTA_ENCAP policy type (or lwtunnel
type), which is encoded in RTA_ENCAP_TYPE. I also did not want to
introduce another level of nesting in RTA_ENCAP (because for nexthops we
are already 2 levels deep when parsing RTA_ENCAP).

Hence, the fib code first looks for RTA_ENCAP, and if RTA_ENCAP is
specified, RTA_ENCAP_TYPE is a required attribute. My iproute2 patches
handle this and make sure there is an RTA_ENCAP_TYPE specified with
RTA_ENCAP.
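
The rule described above can be sketched as a small standalone check. This is only an illustration, not the fib code: check_encap_attrs is a hypothetical helper, and treating a lone RTA_ENCAP_TYPE as harmless is an assumption, since the thread only says the type is required when RTA_ENCAP is present.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper, not from the series: returns 0 when the attribute
 * combination is acceptable, -EINVAL when RTA_ENCAP was supplied without
 * the RTA_ENCAP_TYPE needed to pick the driver's netlink policy.
 * Assumption: RTA_ENCAP_TYPE on its own is simply ignored.
 */
static int check_encap_attrs(bool has_encap, bool has_encap_type)
{
	if (has_encap && !has_encap_type)
		return -EINVAL;

	return 0;
}
```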

thanks,
Roopa




[PATCH] net: dsa: mv88e6xxx: split phy page accessors

2015-06-19 Thread Vivien Didelot
Split mv88e6xxx_phy_page_read and mv88e6xxx_phy_page_write into two
functions each, one to acquire the smi_mutex and one to call the actual
read/write functions.

This will be useful to access registers such as Fiber/SERDES Control,
from setup code with SMI lock held.

Also rename their error labels to clear, since it is not only an
error path.

Signed-off-by: Vivien Didelot vivien.dide...@savoirfairelinux.com
---
 drivers/net/dsa/mv88e6xxx.c | 43 ---
 1 file changed, 32 insertions(+), 11 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
index bfe70ce..9caec51 100644
--- a/drivers/net/dsa/mv88e6xxx.c
+++ b/drivers/net/dsa/mv88e6xxx.c
@@ -2068,37 +2068,58 @@ int mv88e6xxx_switch_reset(struct dsa_switch *ds, bool ppu_active)
 	return 0;
 }
 
-int mv88e6xxx_phy_page_read(struct dsa_switch *ds, int port, int page, int reg)
+/* Must be called with SMI lock held */
+static int _mv88e6xxx_phy_page_read(struct dsa_switch *ds, int port, int page,
+				    int reg)
 {
-	struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
 	int ret;
 
-	mutex_lock(&ps->smi_mutex);
 	ret = _mv88e6xxx_phy_write_indirect(ds, port, 0x16, page);
 	if (ret < 0)
-		goto error;
+		goto clear;
 	ret = _mv88e6xxx_phy_read_indirect(ds, port, reg);
-error:
+clear:
 	_mv88e6xxx_phy_write_indirect(ds, port, 0x16, 0x0);
-	mutex_unlock(&ps->smi_mutex);
 	return ret;
 }
 
-int mv88e6xxx_phy_page_write(struct dsa_switch *ds, int port, int page,
-			     int reg, int val)
+int mv88e6xxx_phy_page_read(struct dsa_switch *ds, int port, int page, int reg)
 {
 	struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
 	int ret;
 
 	mutex_lock(&ps->smi_mutex);
+	ret = _mv88e6xxx_phy_page_read(ds, port, page, reg);
+	mutex_unlock(&ps->smi_mutex);
+
+	return ret;
+}
+
+/* Must be called with SMI lock held */
+static int _mv88e6xxx_phy_page_write(struct dsa_switch *ds, int port, int page,
+				     int reg, int val)
+{
+	int ret;
+
 	ret = _mv88e6xxx_phy_write_indirect(ds, port, 0x16, page);
 	if (ret < 0)
-		goto error;
-
+		goto clear;
 	ret = _mv88e6xxx_phy_write_indirect(ds, port, reg, val);
-error:
+clear:
 	_mv88e6xxx_phy_write_indirect(ds, port, 0x16, 0x0);
+	return ret;
+}
+
+int mv88e6xxx_phy_page_write(struct dsa_switch *ds, int port, int page,
+			     int reg, int val)
+{
+	struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
+	int ret;
+
+	mutex_lock(&ps->smi_mutex);
+	ret = _mv88e6xxx_phy_page_write(ds, port, page, reg, val);
 	mutex_unlock(&ps->smi_mutex);
+
 	return ret;
 }
 
-- 
2.4.3
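
To illustrate why the split is useful, here is a toy userspace sketch of the same locked/unlocked accessor pattern. Everything in it (struct toy_switch, the toy_* functions, the boolean standing in for the SMI mutex) is made up for the example and assumes a single-threaded demo; it is not the driver code.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy device: a flag stands in for the SMI mutex (single-threaded demo). */
struct toy_switch {
	bool smi_locked;
	int page;        /* currently selected PHY page */
	int regs[2][32]; /* regs[page][reg] */
};

static void toy_lock(struct toy_switch *sw)
{
	assert(!sw->smi_locked); /* a second lock would deadlock a real mutex */
	sw->smi_locked = true;
}

static void toy_unlock(struct toy_switch *sw)
{
	assert(sw->smi_locked);
	sw->smi_locked = false;
}

/* Must be called with the lock held: select page, read, restore page 0. */
static int toy_page_read_locked(struct toy_switch *sw, int page, int reg)
{
	int val;

	assert(sw->smi_locked);
	sw->page = page;
	val = sw->regs[sw->page][reg];
	sw->page = 0;
	return val;
}

/* Public accessor: takes the lock around the locked helper. */
static int toy_page_read(struct toy_switch *sw, int page, int reg)
{
	int val;

	toy_lock(sw);
	val = toy_page_read_locked(sw, page, reg);
	toy_unlock(sw);
	return val;
}

/* Setup-style caller that already holds the lock for a longer sequence. */
static int toy_setup(struct toy_switch *sw)
{
	int a, b;

	toy_lock(sw);
	a = toy_page_read_locked(sw, 1, 0);
	b = toy_page_read_locked(sw, 1, 1);
	toy_unlock(sw);
	return a + b;
}
```

If toy_setup() called toy_page_read() instead of the locked helper, it would try to take the lock twice, which is the situation the split avoids for setup code that already holds the SMI lock.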



Re: [PATCH net-next RFC v2 1/3] lwt: infrastructure to support light weight tunnels

2015-06-19 Thread roopa

On 6/19/15, 7:43 AM, Robert Shearman wrote:

diff --git a/include/linux/lwtunnel.h b/include/linux/lwtunnel.h
new file mode 100644 

snip

+/* lw tunnel state flags */
+#define LWTUNNEL_STATE_OUTPUT_REDIRECT 0x1
+
+#define lwtunnel_output_redirect(lwtstate) (lwtstate && \
+	(lwtstate->flags & LWTUNNEL_STATE_OUTPUT_REDIRECT))


This could be made an inline function for type-safety.

ack



+
+struct lwtunnel_state {
+	__u16		type;
+	__u16		flags;
+	atomic_t	refcnt;
+	struct lwtunnel_hdr tunnel;
+};
+
+struct lwtunnel_net {
+struct hlist_head tunnels[LWTUNNEL_HASH_SIZE];
+};


This type doesn't appear to be used in this patch series. Do you 
intend to use it in a future version?

ack, will get rid of it




+
+static inline struct lwtunnel_state *lwtunnel_skb_lwstate(struct sk_buff *skb)
+{
+	struct rtable *rt = (struct rtable *)skb_dst(skb);
+
+	return rt->rt_lwtstate;
+}


It doesn't look like this patch will build on its own because 
rt_lwtstate isn't added to struct rtable until patch 2.

Looks like I messed up the patch creation. I will fix that in the next
series.


More importantly, is it safe to assume that skb_dst will always return 
an IPv4 dst? How will this look when IPv6 support is added?


Today lwtunnel_skb_lwstate is called from lwtunnel_output, which is only
called from ipv4 code. My ipv6 variant was supposed to have a 6 suffix,
something like lwtunnel_output6. Or, to be more explicit, I will probably
have variants of the output and skb handling functions, like
lwtunnel_output_ipv4 and lwtunnel_output_ipv6.

+
+ret = -EOPNOTSUPP;
+nest = nla_nest_start(skb, RTA_ENCAP);


Again, it doesn't look like this will build since RTA_ENCAP isn't 
added until patch 2.



ack, sorry about the patch ordering. Will fix it.

Thanks for the review.


Re: [PATCH 1/1] ssb: remove unncessary out label

2015-06-19 Thread Michael Büsch
On Fri, 19 Jun 2015 14:23:45 +0530
Maninder Singh maninder...@samsung.com wrote:

 This patch removes unnecessary label out and
 some restructring for using device_create_file directly.
 
 Signed-off-by: Maninder Singh maninder...@samsung.com
 Reviewed-by: Rohit Thapliyal r.thapli...@samsung.com
 ---
  drivers/ssb/pci.c |8 +---
  1 files changed, 1 insertions(+), 7 deletions(-)
 
 diff --git a/drivers/ssb/pci.c b/drivers/ssb/pci.c
 index 0f28c08..d6ca4d3 100644
 --- a/drivers/ssb/pci.c
 +++ b/drivers/ssb/pci.c
 @@ -1173,17 +1173,11 @@ void ssb_pci_exit(struct ssb_bus *bus)
  int ssb_pci_init(struct ssb_bus *bus)
  {
   struct pci_dev *pdev;
 - int err;
  
   if (bus->bustype != SSB_BUSTYPE_PCI)
   return 0;
  
   pdev = bus->host_pci;
   mutex_init(&bus->sprom_mutex);
 - err = device_create_file(&pdev->dev, &dev_attr_ssb_sprom);
 - if (err)
 - goto out;
 -
 -out:
 - return err;
 + return device_create_file(&pdev->dev, &dev_attr_ssb_sprom);
  }


I don't really think this change adds any value, but if you insist on
it you get my acked-by.


-- 
Michael




Re: [PATCH net-next RFC v2 2/3] ipv4: add support for light weight tunnel encap attributes

2015-06-19 Thread Robert Shearman

On 19/06/15 05:49, Roopa Prabhu wrote:

From: Roopa Prabhu ro...@cumulusnetworks.com

Introduces two netlink attributes RTA_ENCAP_TYPE and
RTA_ENCAP to support attaching encap information to ipv4 routes.


Surely RTA_ENCAP_TYPE should be part of RTA_ENCAP, since the type 
doesn't make sense without the data and vice versa?




RTA_ENCAP is a nested attribute as suggested by Thomas
(and also as Robert had it in his series). RTA_ENCAP
netlink policy is declared by the light weight tunnel
drivers that support this encap type.

fib code calls the following for each nexthop:
- new route handler:
    lwt build state (that parses RTA_ENCAP and returns
    lwt state that lives in every fib_nh)
- del dump handler:
    lwt release handler to release lwt state data
- route dump handler:
    lwt dump encap to fill RTA_ENCAP data
- during input route lookup:
    sets dst->output to lwtunnel_output, which
    in turn calls the corresponding lwt tunnel
    output function, which applies the required
    encap and xmits the packet

Signed-off-by: Roopa Prabhu ro...@cumulusnetworks.com


Thanks,
Rob


Re: [PATCH] xen-netback: fix a BUG() during initialization

2015-06-19 Thread Wei Liu
On Fri, Jun 19, 2015 at 02:21:51PM +0200, Imre Palik wrote:
 From: Palik, Imre im...@amazon.de
 
 Commit edafc132baac (xen-netback: making the bandwidth limiter runtime 
 settable)
 introduced the capability to change the bandwidth rate limit at runtime.
 But it also introduced a possible crashing bug.
 
 If netback receives two XenbusStateConnected without getting the
 hotplug-status watch firing in between, then it will try to register the
 watches for the rate limiter again.  But this triggers a BUG() in the watch
 registration code.
 
 The fix modifies connect() to remove the possibly existing packet-rate
 watches before trying to install those watches.  This behaviour is in line
 with how connect() deals with the hotplug-status watch.
 
 Signed-off-by: Imre Palik im...@amazon.de
 Cc: Matt Wilson m...@amazon.com

Acked-by: Wei Liu wei.l...@citrix.com

 ---
  drivers/net/xen-netback/xenbus.c |4 
  1 file changed, 4 insertions(+)
 
 diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
 index 968787a..ec383b0 100644
 --- a/drivers/net/xen-netback/xenbus.c
 +++ b/drivers/net/xen-netback/xenbus.c
 @@ -681,6 +681,9 @@ static int xen_register_watchers(struct xenbus_device *dev, struct xenvif *vif)
   char *node;
   unsigned maxlen = strlen(dev->nodename) + sizeof("/rate");
  
 + if (vif->credit_watch.node)
 + return -EADDRINUSE;
 +
   node = kmalloc(maxlen, GFP_KERNEL);
   if (!node)
   return -ENOMEM;
 @@ -770,6 +773,7 @@ static void connect(struct backend_info *be)
   }
  
   xen_net_read_rate(dev, &credit_bytes, &credit_usec);
 + xen_unregister_watchers(be->vif);
   xen_register_watchers(dev, be->vif);
   read_xenbus_vif_flags(be);
  
 -- 
 1.7.9.5
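
The two halves of the fix (refuse a double registration, and drop any stale watch before registering) can be sketched in isolation as below. This is a toy userspace model, not the netback code: struct toy_watch and the toy_* functions are invented for the example, with the node pointer standing in for vif->credit_watch.node.

```c
#include <errno.h>
#include <stddef.h>

/* Toy watch slot; node is NULL when nothing is registered. */
struct toy_watch {
	const char *node;
};

/* Refuse to register twice instead of hitting a BUG()-style assertion. */
static int toy_register_watch(struct toy_watch *w, const char *node)
{
	if (w->node)
		return -EADDRINUSE;

	w->node = node;
	return 0;
}

/* Safe to call when nothing is registered. */
static void toy_unregister_watch(struct toy_watch *w)
{
	w->node = NULL;
}

/* connect()-style path: drop any stale watch, then register afresh. */
static int toy_connect(struct toy_watch *w, const char *node)
{
	toy_unregister_watch(w);
	return toy_register_watch(w, node);
}
```

Calling toy_connect() twice in a row succeeds both times, which mirrors the case of two XenbusStateConnected events arriving without the hotplug-status watch firing in between.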


Re: [PATCH net-next 00/15] Simplify netfilter and network namespaces

2015-06-19 Thread Eric W. Biederman
Julian Anastasov j...@ssi.bg writes:

   Hello,

 On Thu, 18 Jun 2015, Eric W. Biederman wrote:

 My incremental patch for ipvs on top of everything else I have pushed
 out looks like this:
 
 From: Eric W. Biederman ebied...@xmission.com
 Date: Fri, 12 Jun 2015 18:34:12 -0500
 Subject: [PATCH] ipvs: Pass struct net down to where it is needed and used
 
 Pass struct net down to where it is used and stop guessing
 which network namespace should be used.

   At first look patch is ok. But I'm not sure
 for the changes in ip_vs_xmit.c. Can you explain in
 2-3 lines, when can we see different netns? Is it when
 packet is forwarded to output device and it is part from
 another netns?

   I'm asking because these __ip_vs_get_out_rt*
 calls in ip_vs_xmit.c can reroute packet to another
 device...

The driver was ensure_mtu_is_adequate, where in one half of the function
we have skb_net and in the other half we have
dev_net(skb_dst(skb)->dev).  That is goofy.

That gets replaced by ip_vs_conn_net(cp).

In practice I don't see that there are differences in network
namespaces today.

Moving forward I hope to be able to route packets out to network devices
in different network namespaces. It is a massive optimization that
doesn't need much code to support.  Once that optimization is in place
deriving the network namespace from the output device will not work.

I believe using ip_vs_conn_net(cp) is a simple rule that is easy to
understand and easy to implement correctly on the output path.

   Also, skb_sknet is another candidate for removal.
 But I can take care about it after your changes are
 pushed...

*nod*

Eric


Re: [PATCH net-next RFC v2 0/3] light weight tunnel infrastructure and driver

2015-06-19 Thread Robert Shearman

On 19/06/15 05:49, Roopa Prabhu wrote:

From: Roopa Prabhu ro...@cumulusnetworks.com

This series implements infrastructure for light weight tunnels to support
mpls label edge routers (ie mpls ip tunnels). As previously discussed
having netdevices will not scale. Hence this series introduces new RTA_ENCAP*
attributes to attach encap information with routes (following suggestion
from Eric Biederman).


Looks promising, thanks for posting this series Roopa!



The first patch introduces an infrastructure to support light weight tunnels
that dont have netdevices. The infrastructure allows tunnel drivers
to register handlers to parse and build tunnel encap data which can be attached
to each route nexthop.

The second patch adds support in ipv4 fib to carry such light weight tunnel
encap data.


I presume this isn't ready to be merged until IPv6 is done, right?



The third patch implements mpls ip tunnels using this light weight tunnel
infrastructure.

Could not think of a better name, so, it is 'lwt' for 'light weight tunnels'
for now.


I can't think of a better name either.

Thanks,
Rob


Re: [PATCH net-next RFC v2 0/3] light weight tunnel infrastructure and driver

2015-06-19 Thread roopa

On 6/19/15, 7:38 AM, Robert Shearman wrote:
This series implements infrastructure for light weight tunnels to support
mpls label edge routers (ie mpls ip tunnels). As previously discussed
having netdevices will not scale. Hence this series introduces new RTA_ENCAP*
attributes to attach encap information with routes (following suggestion
from Eric Biederman).


Looks promising, thanks for posting this series Roopa!



The first patch introduces an infrastructure to support light weight tunnels
that dont have netdevices. The infrastructure allows tunnel drivers
to register handlers to parse and build tunnel encap data which can be attached
to each route nexthop.

The second patch adds support in ipv4 fib to carry such light weight tunnel
encap data.


I presume this isn't ready to be merged until IPv6 is done, right?

yes, I will be adding ipv6 support soon. I will post the next non-RFC 
series with the ipv6 changes


thanks,
Roopa



[PATCH net] netfilter: nftables: Do not run chains in the wrong network namespace

2015-06-19 Thread Eric W. Biederman

Currently nf_tables chains added in one network namespace are being
run in all network namespaces.  The issues are myriad with the simplest
being an unprivileged user can cause any network packets to be dropped.

Address this by simply not running nf_tables chains in the wrong
network namespace.

Cc: sta...@vger.kernel.org
Signed-off-by: Eric W. Biederman ebied...@xmission.com
---
 net/netfilter/nf_tables_core.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/net/netfilter/nf_tables_core.c b/net/netfilter/nf_tables_core.c
index f153b07073af..f77bad46ac68 100644
--- a/net/netfilter/nf_tables_core.c
+++ b/net/netfilter/nf_tables_core.c
@@ -114,7 +114,8 @@ unsigned int
 nft_do_chain(struct nft_pktinfo *pkt, const struct nf_hook_ops *ops)
 {
 	const struct nft_chain *chain = ops->priv, *basechain = chain;
-	const struct net *net = read_pnet(&nft_base_chain(basechain)->pnet);
+	const struct net *chain_net = read_pnet(&nft_base_chain(basechain)->pnet);
+	const struct net *net = dev_net(pkt->in ? pkt->in : pkt->out);
 	const struct nft_rule *rule;
 	const struct nft_expr *expr, *last;
 	struct nft_regs regs;
@@ -124,6 +125,10 @@ nft_do_chain(struct nft_pktinfo *pkt, const struct nf_hook_ops *ops)
 	int rulenum;
 	unsigned int gencursor = nft_genmask_cur(net);
 
+	/* Ignore chains that are not for the current network namespace */
+	if (!net_eq(net, chain_net))
+		return NF_ACCEPT;
+
 do_chain:
 	rulenum = 0;
 	rule = list_entry(&chain->rules, struct nft_rule, list);
-- 
2.2.1
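
As a rough illustration of the guard, here is a toy userspace model of "skip chains that belong to another namespace". The toy_* types and the single stored verdict are simplifications invented for the example; the real function walks rules and expressions.

```c
/* Toy verdicts; the numeric values mirror NF_DROP (0) and NF_ACCEPT (1). */
enum { TOY_NF_DROP = 0, TOY_NF_ACCEPT = 1 };

/* Stand-in for struct net; namespaces are compared by identity. */
struct toy_net {
	int id;
};

/* Stand-in for a base chain: owning namespace plus one fixed verdict. */
struct toy_chain {
	const struct toy_net *net;
	int verdict;
};

/* Run a chain for a packet seen in pkt_net. */
static int toy_do_chain(const struct toy_chain *chain,
			const struct toy_net *pkt_net)
{
	/* Ignore chains that are not for the packet's namespace. */
	if (chain->net != pkt_net)
		return TOY_NF_ACCEPT;

	return chain->verdict;
}
```

A drop-everything chain registered in one namespace then only affects packets in that namespace; packets elsewhere pass through with an accept verdict.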



Re: [PATCH net-next RFC v2 1/3] lwt: infrastructure to support light weight tunnels

2015-06-19 Thread Robert Shearman

On 19/06/15 05:49, Roopa Prabhu wrote:

From: Roopa Prabhu ro...@cumulusnetworks.com

provides ops to parse, build and output encaped
packets for drivers that want to attach tunnel encap
information to routes.

Signed-off-by: Roopa Prabhu ro...@cumulusnetworks.com
---
  include/linux/lwtunnel.h  |6 ++
  include/net/lwtunnel.h|   84 +
  include/uapi/linux/lwtunnel.h |   11 +++
  net/Kconfig   |5 ++
  net/core/Makefile |1 +
  net/core/lwtunnel.c   |  162 +
  6 files changed, 269 insertions(+)
  create mode 100644 include/linux/lwtunnel.h
  create mode 100644 include/net/lwtunnel.h
  create mode 100644 include/uapi/linux/lwtunnel.h
  create mode 100644 net/core/lwtunnel.c

diff --git a/include/linux/lwtunnel.h b/include/linux/lwtunnel.h
new file mode 100644
index 000..97f32f8
--- /dev/null
+++ b/include/linux/lwtunnel.h
@@ -0,0 +1,6 @@
+#ifndef _LINUX_LWTUNNEL_H_
+#define _LINUX_LWTUNNEL_H_
+
+#include <uapi/linux/lwtunnel.h>
+
+#endif /* _LINUX_LWTUNNEL_H_ */
diff --git a/include/net/lwtunnel.h b/include/net/lwtunnel.h
new file mode 100644
index 000..649da3c
--- /dev/null
+++ b/include/net/lwtunnel.h
@@ -0,0 +1,84 @@
+#ifndef __NET_LWTUNNEL_H
+#define __NET_LWTUNNEL_H 1
+
+#include <linux/lwtunnel.h>
+#include <linux/netdevice.h>
+#include <linux/skbuff.h>
+#include <linux/types.h>
+#include <net/dsfield.h>
+#include <net/ip.h>
+#include <net/rtnetlink.h>
+
+#define LWTUNNEL_HASH_BITS   7
+#define LWTUNNEL_HASH_SIZE   (1 << LWTUNNEL_HASH_BITS)
+
+struct lwtunnel_hdr {
+	int len;
+	__u8 data[0];
+};
+
+/* lw tunnel state flags */
+#define LWTUNNEL_STATE_OUTPUT_REDIRECT 0x1
+
+#define lwtunnel_output_redirect(lwtstate) (lwtstate && \
+	(lwtstate->flags & LWTUNNEL_STATE_OUTPUT_REDIRECT))


This could be made an inline function for type-safety.
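
A minimal sketch of what that might look like, written as a standalone userspace snippet. The reduced struct below is an assumption for illustration only (the patch's struct also carries a refcount and the tunnel header), and stdint types replace the kernel's __u16.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Flag value copied from the quoted patch. */
#define LWTUNNEL_STATE_OUTPUT_REDIRECT 0x1

/* Reduced stand-in for the patch's struct lwtunnel_state (assumption:
 * only the fields needed for the flag test are kept). */
struct lwtunnel_state {
	uint16_t type;
	uint16_t flags;
};

/* Inline version of the macro: the argument is type-checked by the
 * compiler and evaluated exactly once. */
static inline bool lwtunnel_output_redirect(const struct lwtunnel_state *lwtstate)
{
	return lwtstate && (lwtstate->flags & LWTUNNEL_STATE_OUTPUT_REDIRECT);
}
```

Unlike the macro, passing anything other than a struct lwtunnel_state pointer is diagnosed at compile time, and an argument with side effects is not expanded twice.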


+
+struct lwtunnel_state {
+	__u16		type;
+	__u16		flags;
+	atomic_t	refcnt;
+	struct lwtunnel_hdr tunnel;
+};
+
+struct lwtunnel_net {
+   struct hlist_head tunnels[LWTUNNEL_HASH_SIZE];
+};


This type doesn't appear to be used in this patch series. Do you intend 
to use it in a future version?



+
+struct lwtunnel_encap_ops {
+   int (*build_state)(struct net_device *dev, struct nlattr *encap,
+  struct lwtunnel_state **ts);
+   int (*output)(struct sock *sk, struct sk_buff *skb);
+   int (*fill_encap)(struct sk_buff *skb,
+ struct lwtunnel_state *lwtstate);
+   int (*get_encap_size)(struct lwtunnel_state *lwtstate);
+};
+
+#define MAX_LWTUNNEL_ENCAP_OPS 8
+extern const struct lwtunnel_encap_ops __rcu *
+   lwtun_encaps[MAX_LWTUNNEL_ENCAP_OPS];
+
+static inline void lwtunnel_state_get(struct lwtunnel_state *lws)
+{
+	atomic_inc(&lws->refcnt);
+}
+
+static inline void lwtunnel_state_put(struct lwtunnel_state *lws)
+{
+	if (!lws)
+		return;
+
+	if (atomic_dec_and_test(&lws->refcnt))
+		kfree(lws);
+}
+
+static inline struct lwtunnel_state *lwtunnel_skb_lwstate(struct sk_buff *skb)
+{
+	struct rtable *rt = (struct rtable *)skb_dst(skb);
+
+	return rt->rt_lwtstate;
+}


It doesn't look like this patch will build on its own because 
rt_lwtstate isn't added to struct rtable until patch 2.


More importantly, is it safe to assume that skb_dst will always return 
an IPv4 dst? How will this look when IPv6 support is added?



+
+int lwtunnel_encap_add_ops(const struct lwtunnel_encap_ops *op,
+  unsigned int num);
+int lwtunnel_encap_del_ops(const struct lwtunnel_encap_ops *op,
+  unsigned int num);
+int lwtunnel_build_state(struct net_device *dev, u16 encap_type,
+struct nlattr *encap,
+struct lwtunnel_state **lws);
+int lwtunnel_fill_encap(struct sk_buff *skb,
+   struct lwtunnel_state *lwtstate);
+int lwtunnel_get_encap_size(struct lwtunnel_state *lwtstate);
+struct lwtunnel_state *lwtunnel_state_alloc(int hdr_len);
+int lwtunnel_output(struct sock *sk, struct sk_buff *skb);
+
+#endif /* __NET_LWTUNNEL_H */

...

diff --git a/net/core/lwtunnel.c b/net/core/lwtunnel.c
new file mode 100644
index 000..29c7802
--- /dev/null
+++ b/net/core/lwtunnel.c
@@ -0,0 +1,162 @@
+/*
+ * lwtunnel	Infrastructure for light weight tunnels like mpls
+ *
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ */
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/capability.h>
+#include <linux/module.h>
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/slab.h>

Re: [PATCH 3/3] net: dsa: Allow configuration of CPU DSA port speeds/duplex

2015-06-19 Thread Vivien Didelot
Hi Andrew,

On Jun 17, 2015, at 9:11 PM, Andrew Lunn and...@lunn.ch wrote:
 On Wed, Jun 17, 2015 at 02:09:52PM -0400, Vivien Didelot wrote:
 Hi Andrew, All,
 
 On 12/06/15 10:18, Andrew Lunn wrote:
  By default, DSA and CPU ports are configured to the maximum speed the
  switch supports. However there can be use cases where the peer device
  port is slower. Allow a fixed-link property to be used with the DSA
  and CPU port in the device tree, and use this information to configure
  the port.
 
 Would it be a good idea for DSA to expose the cpu port to userspace as 
 well?
 That way, it'd be possible to use ethtool to set the port speed and duplex
 mode, or dump registers (this would have saved me quite some time in dev).
 
 I have code which expose these via debugfs. So far, i have all
 registers, stats, ATU, and the scratch registers. For the patches to
 apply cleanly, they depend on these patches, so i've not posted them
 yet.

Yes I do have debug too, but via sysfs (with eventually write access) for:
GLOBAL1, GLOBAL2, cpu port registers, SerDes registers, PVIDs, and VTU.
Not really standard though.

 I'm not strongly against having a CPU port, but i don't particularly
 like having the CPU port as an interface. And when you get to cascaded
 switches, the DSA ports are also interesting, so should we also have a
 netdev for them? But they are equally useless for transferring frames
 from the host as the CPU port. This is why i went for debugfs.
 
 Also, in my RFC for 802.1Q support [1], I assume the CPU port to be a tagged
 member of each VLAN. But someone may want to add a VLAN with swp3 and swp4
 only, and another VLAN with swp0, swp1 and the CPU port. Am I correct?
 
 The DSA concept is that switch ports are separate interfaces. So
 adding a VLAN to two ports does not automatically bridge those ports
 together. You need to add them to a bridge as well before VLAN tagged
 frames are bridged between ports.

My point was to expose all configurable ports with the same standard interface
(netdev, like any other virtual switch port). But indeed, their uselessness for
data transfer can be a good reason not to do it.

Thanks,
-v


Re: [PATCH net-next RFC v2 2/3] ipv4: add support for light weight tunnel encap attributes

2015-06-19 Thread Robert Shearman

On 19/06/15 15:19, roopa wrote:

On 6/18/15, 11:59 PM, Julian Anastasov wrote:

Hello,

On Thu, 18 Jun 2015, Roopa Prabhu wrote:


@@ -366,6 +371,7 @@ static inline size_t fib_nlmsg_size(struct
fib_info *fi)
  payload += nla_total_size((RTAX_MAX * nla_total_size(4)));
  if (fi->fib_nhs) {
+size_t nh_encapsize = 0;

Var not in #ifdef. Any warnings with CONFIG_LWTUNNEL=n?


  /* Also handles the special case fib_nhs == 1 */
  /* each nexthop is packed in an attribute */
@@ -374,8 +380,23 @@ static inline size_t fib_nlmsg_size(struct
fib_info *fi)
  /* may contain flow and gateway attribute */
  nhsize += 2 * nla_total_size(4);
+#ifdef CONFIG_LWTUNNEL
+/* grab encap info */
+for_nexthops(fi) {
+if (nh->nh_lwtstate) {
+/* RTA_ENCAP_TYPE */
+nh_encapsize += lwtunnel_get_encap_size(
+nh->nh_lwtstate);

New labels not in #ifdef:

Will check and fix all warnings with CONFIG_LWTUNNEL off
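
For illustration only (not part of the patch): a minimal userspace sketch of the pattern being asked for, where the variable is declared inside the same #ifdef as its only user so that a build with the option disabled does not warn about it. FEATURE_ENCAP is a hypothetical stand-in for CONFIG_LWTUNNEL, and struct nexthop and msg_size() are made-up names.

```c
#include <stddef.h>

/* Hypothetical stand-in for CONFIG_LWTUNNEL; define it to enable the encap path. */
/* #define FEATURE_ENCAP 1 */

/* Made-up, simplified nexthop: only the encap length matters here. */
struct nexthop {
	size_t encap_len;
};

/*
 * Returns an illustrative message size for n nexthops. The accumulator is
 * declared inside the same #ifdef as its only user, so a build without
 * FEATURE_ENCAP does not trigger -Wunused-variable.
 */
size_t msg_size(const struct nexthop *nh, size_t n)
{
	size_t size = n * 8; /* fixed per-nexthop cost (illustrative value) */
#ifdef FEATURE_ENCAP
	size_t encap = 0;
	size_t i;

	for (i = 0; i < n; i++)
		encap += nh[i].encap_len;
	size += encap;
#else
	(void)nh; /* parameter intentionally unused in this configuration */
#endif
	return size;
}
```

The same reasoning applies to labels such as err_inval: a label that is only jumped to from code inside the #ifdef is reported as unused when the option is off.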



+
+err_inval:
+ret = -EINVAL;
+
+errout:
+return ret;
  }

Some other places may need changes:

- nh_comp: there is logic that decides if same fib_info
is reused from many fib nodes. There should be check
if NH matches by nh_lwtstate.


yes, i will add that.


One other place - fib_nh_match. This is used when deleting a route to 
verify that any supplied rtnetlink properties match the route in the fib.
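
A rough sketch of what "match by encap state as well" could look like, for both the nh_comp and fib_nh_match cases. This is a userspace illustration under assumptions: struct encap_state and struct nexthop are hypothetical simplifications, not the kernel's struct lwtunnel_state or struct fib_nh.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified types; not the kernel's structures. */
struct encap_state {
	unsigned short type;
	size_t len;              /* number of valid bytes in data, at most 16 */
	unsigned char data[16];
};

struct nexthop {
	int oif;
	unsigned int gateway;
	const struct encap_state *encap; /* NULL when no tunnel encap is attached */
};

/* Two encap states match when both are absent or both carry the same payload. */
static bool encap_equal(const struct encap_state *a, const struct encap_state *b)
{
	if (!a && !b)
		return true;
	if (!a || !b)
		return false;
	return a->type == b->type && a->len == b->len &&
	       memcmp(a->data, b->data, a->len) == 0;
}

/* Nexthops that differ only in their encap must not be treated as identical. */
bool nexthop_equal(const struct nexthop *a, const struct nexthop *b)
{
	return a->oif == b->oif && a->gateway == b->gateway &&
	       encap_equal(a->encap, b->encap);
}
```

Without the encap comparison, two routes via the same device and gateway but with different labels would share one fib_info, and a delete request naming one encap could remove the other.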


Thanks,
Rob


Re: [PATCH net-next RFC v2 2/3] ipv4: add support for light weight tunnel encap attributes

2015-06-19 Thread roopa

On 6/19/15, 7:55 AM, Robert Shearman wrote:

On 19/06/15 15:19, roopa wrote:

On 6/18/15, 11:59 PM, Julian Anastasov wrote:

  Some other places may need changes:

- nh_comp: there is logic that decides if same fib_info
is reused from many fib nodes. There should be check
if NH matches by nh_lwtstate.


yes, i will add that.


One other place - fib_nh_match. This is used when deleting a route to 
verify that any supplied rtnetlink properties match the route in the fib.

ack, thanks!.



Re: [PATCH next v3] bonding: Display LACP info only to CAP_NET_ADMIN capable user

2015-06-19 Thread Mahesh Bandewar
On Thu, Jun 18, 2015 at 8:00 PM, Andy Gospodarek
go...@cumulusnetworks.com wrote:

 On Thu, Jun 18, 2015 at 11:30:54AM -0700, Mahesh Bandewar wrote:
  Actor and Partner details can be accessed via proc-fs, sys-fs
  entries or netlink interface. These interfaces are world readable
  at this moment. The earlier patch-series made the LACP communication
  secure to avoid nuisance attack from within the same L2 domain but
  it did not prevent someone unprivileged looking at that information
  on host and perform the same act.
 
  This patch essentially avoids spitting those entries if the user
  in question does not have enough privileges.
 
  Signed-off-by: Mahesh Bandewar mahe...@google.com
  ---
   drivers/net/bonding/bond_netlink.c |  23 +
   drivers/net/bonding/bond_procfs.c  | 101 
  +++--
   drivers/net/bonding/bond_sysfs.c   |  12 ++---
   3 files changed, 71 insertions(+), 65 deletions(-)
 
 [...]
  diff --git a/drivers/net/bonding/bond_procfs.c 
  b/drivers/net/bonding/bond_procfs.c
  index e7f3047a26df..f514fe5e80a5 100644
  --- a/drivers/net/bonding/bond_procfs.c
  +++ b/drivers/net/bonding/bond_procfs.c
 [...]
  @@ -199,33 +202,35 @@ static void bond_info_show_slave(struct seq_file *seq,
seq_printf(seq, "Partner Churned Count: %d\n",
   port->churn_partner_count);
 
  - seq_puts(seq, "details actor lacp pdu:\n");
  - seq_printf(seq, "system priority: %d\n",
  -port->actor_system_priority);
  - seq_printf(seq, "system mac address: %pM\n",
  -port->actor_system);
  - seq_printf(seq, "port key: %d\n",
  -port->actor_oper_port_key);
  - seq_printf(seq, "port priority: %d\n",
  -port->actor_port_priority);
  - seq_printf(seq, "port number: %d\n",
  -port->actor_port_number);
  - seq_printf(seq, "port state: %d\n",
  -port->actor_oper_port_state);
  -
  - seq_puts(seq, "details partner lacp pdu:\n");
  - seq_printf(seq, "system priority: %d\n",
  -port->partner_oper.system_priority);
  - seq_printf(seq, "system mac address: %pM\n",
  -port->partner_oper.system);
  - seq_printf(seq, "oper key: %d\n",
  -port->partner_oper.key);
  - seq_printf(seq, "port priority: %d\n",
  -port->partner_oper.port_priority);
  - seq_printf(seq, "port number: %d\n",
  -port->partner_oper.port_number);
  - seq_printf(seq, "port state: %d\n",
  -port->partner_oper.port_state);
  + if (capable(CAP_NET_ADMIN)) {
  + seq_puts(seq, "details actor lacp pdu:\n");
  + seq_printf(seq, "system priority: %d\n",
  +port->actor_system_priority);
  + seq_printf(seq, "system mac address: %pM\n",
  +port->actor_system);
  + seq_printf(seq, "port key: %d\n",
  +port->actor_oper_port_key);
  + seq_printf(seq, "port priority: %d\n",
  +port->actor_port_priority);
  + seq_printf(seq, "port number: %d\n",
  +port->actor_port_number);
  + seq_printf(seq, "port state: %d\n",
  +port->actor_oper_port_state);
  +
  + seq_puts(seq, "details partner lacp pdu:\n");
  + seq_printf(seq, "system priority: %d\n",
  +port->partner_oper.system_priority);
  + seq_printf(seq, "system mac address: %pM\n",
  +port->partner_oper.system);
  + seq_printf(seq, "oper key: %d\n",
  +port->partner_oper.key);
  + seq_printf(seq, "port priority: %d\n",
  +port->partner_oper.port_priority);
  + seq_printf(seq, "port number: %d\n",
  +port->partner_oper.port_number);
  + seq_printf(seq, "port state: %d\n",
  +

[PATCH 09/12] netfilter: use forward declaration instead of including linux/proc_fs.h

2015-06-19 Thread Pablo Neira Ayuso
We don't need to pull the full definitions in that file, a simple forward
declaration is enough.
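
As a small illustration of why the forward declaration suffices (a sketch with hypothetical names, loosely modelled on struct netns_nf): a header that only stores a pointer to a type never needs that type's full definition.

```c
#include <stddef.h>

/*
 * Forward declaration: the compiler only needs to know the type exists,
 * because the struct below stores a pointer to it, never the object itself.
 */
struct dir_entry;

/* Hypothetical per-namespace state, loosely modelled on struct netns_nf. */
struct ns_state {
	struct dir_entry *proc_dir;
	int users;
};

/*
 * Works without the full definition: pointers to incomplete types can be
 * stored, copied and compared, just not dereferenced.
 */
int ns_has_proc_dir(const struct ns_state *ns)
{
	return ns->proc_dir != NULL;
}
```

Only the .c files that actually call into the proc API then need the full header, which is the trade-off the second half of this patch deals with.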

Moreover, include linux/proc_fs.h from nf_synproxy_core, otherwise this hits a
compilation error due to missing declarations, i.e.

net/netfilter/nf_synproxy_core.c: In function ‘synproxy_proc_init’:
net/netfilter/nf_synproxy_core.c:326:2: error: implicit declaration of function ‘proc_create’ [-Werror=implicit-function-declaration]
  if (!proc_create("synproxy", S_IRUGO, net->proc_net_stat,
  ^

Signed-off-by: Pablo Neira Ayuso pa...@netfilter.org
Signed-off-by: Eric W. Biederman ebied...@xmission.com
---
 include/net/netns/netfilter.h|2 +-
 net/netfilter/nf_synproxy_core.c |1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/net/netns/netfilter.h b/include/net/netns/netfilter.h
index 8874002..cf25b5e 100644
--- a/include/net/netns/netfilter.h
+++ b/include/net/netns/netfilter.h
@@ -1,9 +1,9 @@
 #ifndef __NETNS_NETFILTER_H
 #define __NETNS_NETFILTER_H
 
-#include <linux/proc_fs.h>
 #include <linux/netfilter.h>
 
+struct proc_dir_entry;
 struct nf_logger;
 
 struct netns_nf {
diff --git a/net/netfilter/nf_synproxy_core.c b/net/netfilter/nf_synproxy_core.c
index 52e20c9..789feea 100644
--- a/net/netfilter/nf_synproxy_core.c
+++ b/net/netfilter/nf_synproxy_core.c
@@ -11,6 +11,7 @@
 #include <asm/unaligned.h>
 #include <net/tcp.h>
 #include <net/netns/generic.h>
+#include <linux/proc_fs.h>
 
 #include <linux/netfilter_ipv4/ip_tables.h>
 #include <linux/netfilter/x_tables.h>
-- 
1.7.10.4



[PATCH 00/12] Netfilter updates for net-next

2015-06-19 Thread Pablo Neira Ayuso
Hi David,

The following patchset contains a final Netfilter pull request for net-next
4.2. This mostly addresses some fallout from the previous pull request, small
netns updates and a couple of new features for nfnetlink_log and the socket
match that didn't get in time for the previous pull request. More specifically
they are:

1) Add security context information to nfnetlink_queue, from Roman Kubiak.

2) Add support to restore the sk_mark into skb->mark through xt_socket,
   from Harout Hedeshian.

3) Force alignment of 16 bytes of per cpu xt_counters, from Eric Dumazet.

4) Rename br_netfilter.c to br_netfilter_hooks.c to prepare split of IPv6 code
   into a separate file.

5) Move the IPv6 code in br_netfilter into a separate file.

6) Remove unused RCV_SKB_FAIL() in nfnetlink_queue and nfnetlink_log, from Eric
   Biederman.

7) Two-liner to simplify netns logic in em_ipset_match().

8) Add missing includes to net/net_namespace.h to avoid compilation problems
   that result from not including linux/netfilter.h in netns headers.

9) Use a forward declaration instead of including linux/proc_fs.h from
   netns/netfilter.h

10) Add a new linux/netfilter_defs.h to replace the linux/netfilter.h inclusion
in netns headers.

11) Remove spurious netfilter.h file included in the net tree, also from Eric
Biederman.

12) Fix x_tables compilation warnings on 32 bits platforms that resulted from
recent changes in x_tables counters, from Florian Westphal.

You can pull these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next.git

Thanks!



The following changes since commit 89d256bb69f2596c3a31ac51466eac9e1791c388:

  bpf: disallow bpf tc programs access current->pid,uid (2015-06-15 20:51:20 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next.git master

for you to fetch changes up to dcb8f5c8139ef945cdfd55900fae265c4dbefc02:

  netfilter: xtables: fix warnings on 32bit platforms (2015-06-18 21:14:33 +0200)


Eric Dumazet (1):
  netfilter: x_tables: align per cpu xt_counter

Eric W Biederman (1):
  netfilter: Remove spurios included of netfilter.h

Eric W. Biederman (2):
  netfilter: Kill unused copies of RCV_SKB_FAIL
  net: sched: Simplify em_ipset_match

Florian Westphal (1):
  netfilter: xtables: fix warnings on 32bit platforms

Harout Hedeshian (1):
  netfilter: xt_socket: add XT_SOCKET_RESTORESKMARK flag

Pablo Neira Ayuso (5):
  netfilter: bridge: rename br_netfilter.c to br_netfilter_hooks.c
  netfilter: bridge: split ipv6 code into separated file
  net: include missing headers in net/net_namespace.h
  netfilter: use forward declaration instead of including linux/proc_fs.h
  netfilter: don't pull include/linux/netfilter.h from netns headers

Roman Kubiak (1):
  netfilter: nfnetlink_queue: add security context information

 drivers/net/hamradio/bpqether.c|1 -
 drivers/net/ppp/pptp.c |2 -
 drivers/net/wan/lapbether.c|1 -
 include/linux/netfilter.h  |6 +-
 include/linux/netfilter/x_tables.h |   14 +-
 include/linux/netfilter_defs.h |9 +
 include/net/net_namespace.h|2 +
 include/net/netfilter/br_netfilter.h   |   60 +
 include/net/netns/netfilter.h  |4 +-
 include/net/netns/x_tables.h   |2 +-
 include/uapi/linux/netfilter.h |3 +-
 include/uapi/linux/netfilter/nfnetlink_queue.h |4 +-
 include/uapi/linux/netfilter/xt_socket.h   |8 +
 net/ax25/af_ax25.c |1 -
 net/ax25/ax25_in.c |1 -
 net/ax25/ax25_ip.c |1 -
 net/ax25/ax25_out.c|1 -
 net/ax25/ax25_uid.c|1 -
 net/bridge/Makefile|2 +
 .../{br_netfilter.c => br_netfilter_hooks.c}   |  248 +---
 net/bridge/br_netfilter_ipv6.c |  245 +++
 net/ipv6/output_core.c |1 +
 net/netfilter/nf_synproxy_core.c   |1 +
 net/netfilter/nfnetlink_log.c  |2 -
 net/netfilter/nfnetlink_queue_core.c   |   37 ++-
 net/netfilter/xt_socket.c  |   59 -
 net/netrom/nr_route.c  |1 -
 net/rose/rose_link.c   |1 -
 net/rose/rose_route.c  |1 -
 net/sched/em_ipset.c   |4 +-
 security/selinux/xfrm.c|3 -
 31 files 

[PATCH 02/12] netfilter: xt_socket: add XT_SOCKET_RESTORESKMARK flag

2015-06-19 Thread Pablo Neira Ayuso
From: Harout Hedeshian haro...@codeaurora.org

xt_socket is useful for matching sockets with IP_TRANSPARENT and
taking some action on the matching packets. However, it lacks the
ability to match only a small subset of transparent sockets.

Suppose there are 2 applications, each with its own set of transparent
sockets. The first application wants all matching packets dropped,
while the second application wants them forwarded somewhere else.

Add the ability to restore the skb->mark from the sk_mark. The mark
is only restored if a matching socket is found and the transparent /
nowildcard conditions are satisfied.

Now the 2 hypothetical applications can differentiate their sockets
based on a mark value set with SO_MARK.

iptables -t mangle -I PREROUTING -m socket --transparent \
   --restore-skmark -j action
iptables -t mangle -A action -m mark --mark 10 -j action2
iptables -t mangle -A action -m mark --mark 11 -j action3

Signed-off-by: Harout Hedeshian haro...@codeaurora.org
Signed-off-by: Pablo Neira Ayuso pa...@netfilter.org
---
 include/uapi/linux/netfilter/xt_socket.h |8 
 net/netfilter/xt_socket.c|   59 +++---
 2 files changed, 61 insertions(+), 6 deletions(-)

diff --git a/include/uapi/linux/netfilter/xt_socket.h 
b/include/uapi/linux/netfilter/xt_socket.h
index 6315e2a..87644f8 100644
--- a/include/uapi/linux/netfilter/xt_socket.h
+++ b/include/uapi/linux/netfilter/xt_socket.h
@@ -6,6 +6,7 @@
 enum {
XT_SOCKET_TRANSPARENT = 1 << 0,
XT_SOCKET_NOWILDCARD = 1 << 1,
+   XT_SOCKET_RESTORESKMARK = 1 << 2,
 };
 
 struct xt_socket_mtinfo1 {
@@ -18,4 +19,11 @@ struct xt_socket_mtinfo2 {
 };
 #define XT_SOCKET_FLAGS_V2 (XT_SOCKET_TRANSPARENT | XT_SOCKET_NOWILDCARD)
 
+struct xt_socket_mtinfo3 {
+   __u8 flags;
+};
+#define XT_SOCKET_FLAGS_V3 (XT_SOCKET_TRANSPARENT \
+  | XT_SOCKET_NOWILDCARD \
+  | XT_SOCKET_RESTORESKMARK)
+
 #endif /* _XT_SOCKET_H */
diff --git a/net/netfilter/xt_socket.c b/net/netfilter/xt_socket.c
index e092cb0..43e26c8 100644
--- a/net/netfilter/xt_socket.c
+++ b/net/netfilter/xt_socket.c
@@ -205,6 +205,7 @@ static bool
 socket_match(const struct sk_buff *skb, struct xt_action_param *par,
 const struct xt_socket_mtinfo1 *info)
 {
+   struct sk_buff *pskb = (struct sk_buff *)skb;
struct sock *sk = skb->sk;
 
if (!sk)
@@ -226,6 +227,10 @@ socket_match(const struct sk_buff *skb, struct xt_action_param *par,
if (info->flags & XT_SOCKET_TRANSPARENT)
transparent = xt_socket_sk_is_transparent(sk);
 
+   if (info->flags & XT_SOCKET_RESTORESKMARK && !wildcard &&
+   transparent)
+   pskb->mark = sk->sk_mark;
+
if (sk != skb->sk)
sock_gen_put(sk);
 
@@ -247,7 +252,7 @@ socket_mt4_v0(const struct sk_buff *skb, struct xt_action_param *par)
 }
 
 static bool
-socket_mt4_v1_v2(const struct sk_buff *skb, struct xt_action_param *par)
+socket_mt4_v1_v2_v3(const struct sk_buff *skb, struct xt_action_param *par)
 {
return socket_match(skb, par, par->matchinfo);
 }
@@ -371,9 +376,10 @@ static struct sock *xt_socket_lookup_slow_v6(const struct sk_buff *skb,
 }
 
 static bool
-socket_mt6_v1_v2(const struct sk_buff *skb, struct xt_action_param *par)
+socket_mt6_v1_v2_v3(const struct sk_buff *skb, struct xt_action_param *par)
 {
const struct xt_socket_mtinfo1 *info = (struct xt_socket_mtinfo1 *) par->matchinfo;
+   struct sk_buff *pskb = (struct sk_buff *)skb;
struct sock *sk = skb->sk;
 
if (!sk)
@@ -395,6 +401,10 @@ socket_mt6_v1_v2(const struct sk_buff *skb, struct xt_action_param *par)
if (info->flags & XT_SOCKET_TRANSPARENT)
transparent = xt_socket_sk_is_transparent(sk);
 
+   if (info->flags & XT_SOCKET_RESTORESKMARK && !wildcard &&
+   transparent)
+   pskb->mark = sk->sk_mark;
+
if (sk != skb->sk)
sock_gen_put(sk);
 
@@ -428,6 +438,19 @@ static int socket_mt_v2_check(const struct xt_mtchk_param *par)
return 0;
 }
 
+static int socket_mt_v3_check(const struct xt_mtchk_param *par)
+{
+   const struct xt_socket_mtinfo3 *info =
+   (struct xt_socket_mtinfo3 *)par->matchinfo;
+
+   if (info->flags & ~XT_SOCKET_FLAGS_V3) {
+   pr_info("unknown flags 0x%x\n",
+   info->flags & ~XT_SOCKET_FLAGS_V3);
+   return -EINVAL;
+   }
+   return 0;
+}
+
 static struct xt_match socket_mt_reg[] __read_mostly = {
{
.name   = "socket",
@@ -442,7 +465,7 @@ static struct xt_match socket_mt_reg[] __read_mostly = {
.name   = "socket",
.revision   = 1,
.family = NFPROTO_IPV4,
-   

[PATCH 06/12] netfilter: Kill unused copies of RCV_SKB_FAIL

2015-06-19 Thread Pablo Neira Ayuso
From: Eric W. Biederman ebied...@xmission.com

This appears to have been a dead macro in both nfnetlink_log.c and
nfnetlink_queue_core.c since these pieces of code were added in 2005.

Signed-off-by: Eric W. Biederman ebied...@xmission.com
Signed-off-by: Pablo Neira Ayuso pa...@netfilter.org
---
 net/netfilter/nfnetlink_log.c|2 --
 net/netfilter/nfnetlink_queue_core.c |2 --
 2 files changed, 4 deletions(-)

diff --git a/net/netfilter/nfnetlink_log.c b/net/netfilter/nfnetlink_log.c
index 4ef1fae..4670821 100644
--- a/net/netfilter/nfnetlink_log.c
+++ b/net/netfilter/nfnetlink_log.c
@@ -598,8 +598,6 @@ nla_put_failure:
return -1;
 }
 
-#define RCV_SKB_FAIL(err) do { netlink_ack(skb, nlh, (err)); return; } while (0)
-
 static struct nf_loginfo default_loginfo = {
.type = NF_LOG_TYPE_ULOG,
.u = {
diff --git a/net/netfilter/nfnetlink_queue_core.c 
b/net/netfilter/nfnetlink_queue_core.c
index 6eccf0f..e26a46e 100644
--- a/net/netfilter/nfnetlink_queue_core.c
+++ b/net/netfilter/nfnetlink_queue_core.c
@@ -834,8 +834,6 @@ nfqnl_dev_drop(struct net *net, int ifindex)
rcu_read_unlock();
 }
 
-#define RCV_SKB_FAIL(err) do { netlink_ack(skb, nlh, (err)); return; } while (0)
-
 static int
 nfqnl_rcv_dev_event(struct notifier_block *this,
unsigned long event, void *ptr)
-- 
1.7.10.4



[PATCH 12/12] netfilter: xtables: fix warnings on 32bit platforms

2015-06-19 Thread Pablo Neira Ayuso
From: Florian Westphal f...@strlen.de

On 32bit archs gcc complains due to cast from void* to u64.
Add intermediate casts to long to silence these warnings.

include/linux/netfilter/x_tables.h:376:10: warning: cast from pointer to 
integer of different size [-Wpointer-to-int-cast]
include/linux/netfilter/x_tables.h:384:15: warning: cast to pointer from 
integer of different size [-Wint-to-pointer-cast]
include/linux/netfilter/x_tables.h:391:23: warning: cast to pointer from 
integer of different size [-Wint-to-pointer-cast]
include/linux/netfilter/x_tables.h:400:22: warning: cast to pointer from 
integer of different size [-Wint-to-pointer-cast]
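
A standalone sketch of the casting pattern used in the fix, for illustration only. ptr_to_u64() and u64_to_ptr() are hypothetical helper names, and the sketch assumes a Linux target where unsigned long has pointer width.

```c
#include <stdint.h>

typedef uint64_t u64;

/*
 * Store a pointer in a 64-bit field. Going through unsigned long, which has
 * pointer width on Linux targets, avoids -Wpointer-to-int-cast on 32-bit
 * builds where sizeof(void *) != sizeof(u64).
 */
u64 ptr_to_u64(void *p)
{
	return (u64)(unsigned long)p;
}

/*
 * Recover the pointer, again narrowing to unsigned long first so that the
 * integer-to-pointer cast is between same-sized types.
 */
void *u64_to_ptr(u64 v)
{
	return (void *)(unsigned long)v;
}
```

The intermediate cast changes nothing at runtime; it only makes the width change explicit so the compiler no longer flags it.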

Fixes: 71ae0dff02d756e (netfilter: xtables: use percpu rule counters)
Reported-by: kbuild test robot fengguang...@intel.com
Signed-off-by: Florian Westphal f...@strlen.de
Signed-off-by: Pablo Neira Ayuso pa...@netfilter.org
---
 include/linux/netfilter/x_tables.h |8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/linux/netfilter/x_tables.h 
b/include/linux/netfilter/x_tables.h
index 1c97a22..286098a 100644
--- a/include/linux/netfilter/x_tables.h
+++ b/include/linux/netfilter/x_tables.h
@@ -372,7 +372,7 @@ static inline u64 xt_percpu_counter_alloc(void)
if (res == NULL)
return (u64) -ENOMEM;
 
-   return (__force u64) res;
+   return (u64) (__force unsigned long) res;
}
 
return 0;
@@ -380,14 +380,14 @@ static inline u64 xt_percpu_counter_alloc(void)
 static inline void xt_percpu_counter_free(u64 pcnt)
 {
if (nr_cpu_ids > 1)
-   free_percpu((void __percpu *) pcnt);
+   free_percpu((void __percpu *) (unsigned long) pcnt);
 }
 
 static inline struct xt_counters *
 xt_get_this_cpu_counter(struct xt_counters *cnt)
 {
if (nr_cpu_ids > 1)
-   return this_cpu_ptr((void __percpu *) cnt->pcnt);
+   return this_cpu_ptr((void __percpu *) (unsigned long) cnt->pcnt);
 
return cnt;
 }
@@ -396,7 +396,7 @@ static inline struct xt_counters *
 xt_get_per_cpu_counter(struct xt_counters *cnt, unsigned int cpu)
 {
if (nr_cpu_ids > 1)
-   return per_cpu_ptr((void __percpu *) cnt->pcnt, cpu);
+   return per_cpu_ptr((void __percpu *) (unsigned long) cnt->pcnt, cpu);
 
return cnt;
 }
-- 
1.7.10.4



[PATCH 11/12] netfilter: Remove spurios included of netfilter.h

2015-06-19 Thread Pablo Neira Ayuso
From: Eric W Biederman ebied...@xmission.com

While testing my netfilter changes I noticed several files were
recompiling unnecessarily because they unnecessarily included
netfilter.h.

Signed-off-by: Eric W. Biederman ebied...@xmission.com
Signed-off-by: Pablo Neira Ayuso pa...@netfilter.org
---
 drivers/net/hamradio/bpqether.c |1 -
 drivers/net/ppp/pptp.c  |2 --
 drivers/net/wan/lapbether.c |1 -
 net/ax25/af_ax25.c  |1 -
 net/ax25/ax25_in.c  |1 -
 net/ax25/ax25_ip.c  |1 -
 net/ax25/ax25_out.c |1 -
 net/ax25/ax25_uid.c |1 -
 net/netrom/nr_route.c   |1 -
 net/rose/rose_link.c|1 -
 net/rose/rose_route.c   |1 -
 security/selinux/xfrm.c |3 ---
 12 files changed, 15 deletions(-)

diff --git a/drivers/net/hamradio/bpqether.c b/drivers/net/hamradio/bpqether.c
index 63ff08a..7856b6c 100644
--- a/drivers/net/hamradio/bpqether.c
+++ b/drivers/net/hamradio/bpqether.c
@@ -76,7 +76,6 @@
 #include <linux/proc_fs.h>
 #include <linux/seq_file.h>
 #include <linux/stat.h>
-#include <linux/netfilter.h>
 #include <linux/module.h>
 #include <linux/init.h>
 #include <linux/rtnetlink.h>
diff --git a/drivers/net/ppp/pptp.c b/drivers/net/ppp/pptp.c
index 14839bc..686f37d 100644
--- a/drivers/net/ppp/pptp.c
+++ b/drivers/net/ppp/pptp.c
@@ -28,8 +28,6 @@
 #include <linux/file.h>
 #include <linux/in.h>
 #include <linux/ip.h>
-#include <linux/netfilter.h>
-#include <linux/netfilter_ipv4.h>
 #include <linux/rcupdate.h>
 #include <linux/spinlock.h>
 
diff --git a/drivers/net/wan/lapbether.c b/drivers/net/wan/lapbether.c
index 2f5eda8..6676607 100644
--- a/drivers/net/wan/lapbether.c
+++ b/drivers/net/wan/lapbether.c
@@ -40,7 +40,6 @@
 #include <linux/interrupt.h>
 #include <linux/notifier.h>
 #include <linux/stat.h>
-#include <linux/netfilter.h>
 #include <linux/module.h>
 #include <linux/lapb.h>
 #include <linux/init.h>
diff --git a/net/ax25/af_ax25.c b/net/ax25/af_ax25.c
index 4273533..9c891d0 100644
--- a/net/ax25/af_ax25.c
+++ b/net/ax25/af_ax25.c
@@ -40,7 +40,6 @@
 #include <linux/notifier.h>
 #include <linux/proc_fs.h>
 #include <linux/stat.h>
-#include <linux/netfilter.h>
 #include <linux/sysctl.h>
 #include <linux/init.h>
 #include <linux/spinlock.h>
diff --git a/net/ax25/ax25_in.c b/net/ax25/ax25_in.c
index 7ed8ab7..29a3687 100644
--- a/net/ax25/ax25_in.c
+++ b/net/ax25/ax25_in.c
@@ -23,7 +23,6 @@
 #include <linux/inet.h>
 #include <linux/netdevice.h>
 #include <linux/skbuff.h>
-#include <linux/netfilter.h>
 #include <net/sock.h>
 #include <net/tcp_states.h>
 #include <asm/uaccess.h>
diff --git a/net/ax25/ax25_ip.c b/net/ax25/ax25_ip.c
index 7c646bb..b563a3f 100644
--- a/net/ax25/ax25_ip.c
+++ b/net/ax25/ax25_ip.c
@@ -31,7 +31,6 @@
 #include <linux/notifier.h>
 #include <linux/proc_fs.h>
 #include <linux/stat.h>
-#include <linux/netfilter.h>
 #include <linux/sysctl.h>
 #include <net/ip.h>
 #include <net/arp.h>
diff --git a/net/ax25/ax25_out.c b/net/ax25/ax25_out.c
index be2acab..8ddd41b 100644
--- a/net/ax25/ax25_out.c
+++ b/net/ax25/ax25_out.c
@@ -24,7 +24,6 @@
 #include linux/inet.h
 #include linux/netdevice.h
 #include linux/skbuff.h
-#include linux/netfilter.h
 #include net/sock.h
 #include asm/uaccess.h
 #include linux/fcntl.h
diff --git a/net/ax25/ax25_uid.c b/net/ax25/ax25_uid.c
index 71c4bad..4ad2fb7 100644
--- a/net/ax25/ax25_uid.c
+++ b/net/ax25/ax25_uid.c
@@ -34,7 +34,6 @@
 #include linux/proc_fs.h
 #include linux/seq_file.h
 #include linux/stat.h
-#include linux/netfilter.h
 #include linux/sysctl.h
 #include linux/export.h
 #include net/ip.h
diff --git a/net/netrom/nr_route.c b/net/netrom/nr_route.c
index 96b64d2..d72a4f1 100644
--- a/net/netrom/nr_route.c
+++ b/net/netrom/nr_route.c
@@ -31,7 +31,6 @@
 #include <linux/mm.h>
 #include <linux/interrupt.h>
 #include <linux/notifier.h>
-#include <linux/netfilter.h>
 #include <linux/init.h>
 #include <linux/spinlock.h>
 #include <net/netrom.h>
diff --git a/net/rose/rose_link.c b/net/rose/rose_link.c
index e873d7d..c76638c 100644
--- a/net/rose/rose_link.c
+++ b/net/rose/rose_link.c
@@ -25,7 +25,6 @@
 #include <linux/fcntl.h>
 #include <linux/mm.h>
 #include <linux/interrupt.h>
-#include <linux/netfilter.h>
 #include <net/rose.h>
 
 static void rose_ftimer_expiry(unsigned long);
diff --git a/net/rose/rose_route.c b/net/rose/rose_route.c
index 40148932..0fc76d8 100644
--- a/net/rose/rose_route.c
+++ b/net/rose/rose_route.c
@@ -31,7 +31,6 @@
 #include <linux/mm.h>
 #include <linux/interrupt.h>
 #include <linux/notifier.h>
-#include <linux/netfilter.h>
 #include <linux/init.h>
 #include <net/rose.h>
 #include <linux/seq_file.h>
diff --git a/security/selinux/xfrm.c b/security/selinux/xfrm.c
index 98b0426..56e354f 100644
--- a/security/selinux/xfrm.c
+++ b/security/selinux/xfrm.c
@@ -35,9 +35,6 @@
 #include <linux/init.h>
 #include <linux/security.h>
 #include <linux/types.h>
-#include <linux/netfilter.h>
-#include <linux/netfilter_ipv4.h>
-#include <linux/netfilter_ipv6.h>
 #include <linux/slab.h>
 #include <linux/ip.h>
 #include <linux/tcp.h>
-- 
1.7.10.4


[PATCH 10/12] netfilter: don't pull include/linux/netfilter.h from netns headers

2015-06-19 Thread Pablo Neira Ayuso
This pulls the full hook netfilter definitions from all those that include
net_namespace.h.

Instead let's just include the bare minimum required in the new
linux/netfilter_defs.h file, and use it from the netfilter netns header files.

I also needed to include in.h and in6.h from linux/netfilter.h otherwise we hit
this compilation error:

In file included from include/linux/netfilter_defs.h:4:0,
 from include/net/netns/netfilter.h:4,
 from include/net/net_namespace.h:22,
 from include/linux/netdevice.h:43,
 from net/netfilter/nfnetlink_queue_core.c:23:
include/uapi/linux/netfilter.h:76:17: error: field ‘in’ has incomplete type 
struct in_addr in;

Also explicitly include linux/netfilter.h in several spots.

Signed-off-by: Pablo Neira Ayuso pa...@netfilter.org
Signed-off-by: Eric W. Biederman ebied...@xmission.com
---
 include/linux/netfilter.h  |6 ++
 include/linux/netfilter_defs.h |9 +
 include/net/netns/netfilter.h  |2 +-
 include/net/netns/x_tables.h   |2 +-
 include/uapi/linux/netfilter.h |3 ++-
 net/ipv6/output_core.c |1 +
 6 files changed, 16 insertions(+), 7 deletions(-)
 create mode 100644 include/linux/netfilter_defs.h

diff --git a/include/linux/netfilter.h b/include/linux/netfilter.h
index f5ff5d1..00050df 100644
--- a/include/linux/netfilter.h
+++ b/include/linux/netfilter.h
@@ -10,7 +10,8 @@
 #include <linux/wait.h>
 #include <linux/list.h>
 #include <linux/static_key.h>
-#include <uapi/linux/netfilter.h>
+#include <linux/netfilter_defs.h>
+
 #ifdef CONFIG_NETFILTER
 static inline int NF_DROP_GETERR(int verdict)
 {
@@ -38,9 +39,6 @@ static inline void nf_inet_addr_mask(const union nf_inet_addr 
*a1,
 
 int netfilter_init(void);
 
-/* Largest hook number + 1 */
-#define NF_MAX_HOOKS 8
-
 struct sk_buff;
 
 struct nf_hook_ops;
diff --git a/include/linux/netfilter_defs.h b/include/linux/netfilter_defs.h
new file mode 100644
index 000..d3a7f85
--- /dev/null
+++ b/include/linux/netfilter_defs.h
@@ -0,0 +1,9 @@
+#ifndef __LINUX_NETFILTER_CORE_H_
+#define __LINUX_NETFILTER_CORE_H_
+
+#include <uapi/linux/netfilter.h>
+
+/* Largest hook number + 1, see uapi/linux/netfilter_decnet.h */
+#define NF_MAX_HOOKS 8
+
+#endif
diff --git a/include/net/netns/netfilter.h b/include/net/netns/netfilter.h
index cf25b5e..532e4ba 100644
--- a/include/net/netns/netfilter.h
+++ b/include/net/netns/netfilter.h
@@ -1,7 +1,7 @@
 #ifndef __NETNS_NETFILTER_H
 #define __NETNS_NETFILTER_H
 
-#include <linux/netfilter.h>
+#include <linux/netfilter_defs.h>
 
 struct proc_dir_entry;
 struct nf_logger;
diff --git a/include/net/netns/x_tables.h b/include/net/netns/x_tables.h
index 4d6597a..c8a7681 100644
--- a/include/net/netns/x_tables.h
+++ b/include/net/netns/x_tables.h
@@ -2,7 +2,7 @@
 #define __NETNS_X_TABLES_H
 
 #include <linux/list.h>
-#include <linux/netfilter.h>
+#include <linux/netfilter_defs.h>
 
 struct ebt_table;
 
diff --git a/include/uapi/linux/netfilter.h b/include/uapi/linux/netfilter.h
index 177027c..d93f949 100644
--- a/include/uapi/linux/netfilter.h
+++ b/include/uapi/linux/netfilter.h
@@ -4,7 +4,8 @@
 #include <linux/types.h>
 #include <linux/compiler.h>
 #include <linux/sysctl.h>
-
+#include <linux/in.h>
+#include <linux/in6.h>
 
 /* Responses from hook functions. */
 #define NF_DROP 0
diff --git a/net/ipv6/output_core.c b/net/ipv6/output_core.c
index 21678ac..928a0fb 100644
--- a/net/ipv6/output_core.c
+++ b/net/ipv6/output_core.c
@@ -8,6 +8,7 @@
 #include <net/ip6_fib.h>
 #include <net/addrconf.h>
 #include <net/secure_seq.h>
+#include <linux/netfilter.h>
 
 static u32 __ipv6_select_ident(struct net *net, u32 hashrnd,
   const struct in6_addr *dst,
-- 
1.7.10.4



Re: [PATCH next v3] bonding: Display LACP info only to CAP_NET_ADMIN capable user

2015-06-19 Thread Andy Gospodarek
On Fri, Jun 19, 2015 at 10:02:39AM -0700, Mahesh Bandewar wrote:
 On Thu, Jun 18, 2015 at 8:00 PM, Andy Gospodarek
 go...@cumulusnetworks.com wrote:
[...]
  With this patch, actor_oper_port_state and partner_oper.port_state are
  not displayed in /proc, but that information is available via netlink
  from bond_fill_slave_info().
 
  I suspect you do not deem these two values as critical to the security
  of the system, but wanted to confirm before ACKing.
 
 Yes, one can very easily figure out that LACP is used in the system
 with parameters like bond-mode, lacp-rate, or the port-state. I feel
 these do not need to be hidden from unprivileged users to ensure
 security. Principally hiding enough to ensure security would be good
 rather than hiding everything. However if there is a scenario where
 exposing these values is a threat (in the same sense) then it's not
 a lot of extra work to achieve that and I'm open to making those changes.

Sounds fine to me.  I just wanted to be sure the difference between the
information displayed in various modes was intentional (or at least not
unintentional) and did not conflict with your plans.

Signed-off-by: Andy Gospodarek go...@cumulusnetworks.com



Re: [PATCH net-next RFC v2 3/3] mpls: support for ip tunnels

2015-06-19 Thread Robert Shearman

On 19/06/15 05:49, Roopa Prabhu wrote:

From: Roopa Prabhu ro...@cumulusnetworks.com

Support ip mpls tunnels using the new lwt infrastructure.

Signed-off-by: Roopa Prabhu ro...@cumulusnetworks.com

...

+int mpls_output(struct sock *sk, struct sk_buff *skb)
+{
+   struct mpls_iptunnel_encap *tun_encap_info;
+   struct mpls_shim_hdr *hdr;
+   struct mpls_entry_decoded dec;
+   struct net_device *out_dev;
+   unsigned int hh_len;
+   unsigned int new_header_size;
+   unsigned int mtu;
+   struct lwtunnel_state *lwtstate;
+   struct rtable *rt = skb_rtable(skb);
+   int err;
+   bool bos;
+   int i;
+
+   if (skb->pkt_type != PACKET_HOST)
+   goto drop;
+
+   if ((skb = skb_share_check(skb, GFP_ATOMIC)) == NULL)
+   goto drop;
+
+   if (!rt)
+   goto drop;
+
+   /* Find the output device */
+   out_dev = rcu_dereference(skb_dst(skb)->dev);


Since the entire label stack and the output device is encoded in the 
route, this means that you won't get prefix-independent convergence with 
this implementation for an IGP route change. I.e. if you've got 10 
million VPN routes via an IGP route for the BGP nexthop, and the IGP 
route for the BGP nexthop changes (e.g. because a link has gone down 
somewhere in the network) then you'll have to update all 10 million IP 
routes to change the output device, gateway and IGP label.


That's going to represent a scaling obstacle for one of the primary MPLS 
use cases.
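
To make the scaling concern concrete, here is a rough sketch of the indirection that prefix-independent convergence relies on. All names are hypothetical and none of this is taken from the patch or from the kernel: routes hold a pointer to a shared IGP nexthop object instead of a private copy of the output device and IGP label.

```c
#include <stddef.h>

/* Hypothetical, simplified model; none of these types are kernel structures. */
struct igp_nexthop {
	int out_ifindex;
	unsigned int igp_label;
};

struct vpn_route {
	unsigned int vpn_label;          /* per-route inner label */
	const struct igp_nexthop *via;   /* shared object, not copied per route */
};

/* Forwarding info is resolved at lookup time through the shared object. */
unsigned int route_outer_label(const struct vpn_route *r)
{
	return r->via->igp_label;
}

/* An IGP change touches one object, regardless of how many routes use it. */
void igp_reroute(struct igp_nexthop *nh, int ifindex, unsigned int label)
{
	nh->out_ifindex = ifindex;
	nh->igp_label = label;
}
```

With the label stack and device baked into each route, the equivalent of igp_reroute() becomes one update per VPN route, which is the 10 million route case described above.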



+   if (!mpls_output_possible(out_dev))
+   goto drop;
+
+   if (skb_warn_if_lro(skb))
+   goto drop;
+   skb_forward_csum(skb);
+
+   lwtstate = rt->rt_lwtstate;
+   if (!lwtstate)
+   goto drop;
+
+   tun_encap_info = mpls_lwt_hdr(lwtstate);
+
+   /* Verify the destination can hold the packet */
+   new_header_size = mpls_encap_size(tun_encap_info);
+   mtu = mpls_dev_mtu(out_dev);
+   if (mpls_pkt_too_big(skb, mtu - new_header_size))
+   goto drop;
+
+   hh_len = LL_RESERVED_SPACE(out_dev);
+   if (!out_dev->header_ops)
+   hh_len = 0;
+
+   /* Ensure there is enough space for the headers in the skb */
+   if (skb_cow(skb, hh_len + new_header_size))
+   goto drop;
+
+   skb->dev = out_dev;
+   skb->protocol = htons(ETH_P_MPLS_UC);
+
+   skb_push(skb, new_header_size);
+   skb_reset_network_header(skb);
+
+   /* Push the new labels */
+   hdr = mpls_hdr(skb);
+   bos = true;
+   for (i = tun_encap_info->labels - 1; i >= 0; i--) {
+   hdr[i] = mpls_entry_encode(tun_encap_info->label[i],
+  dec.ttl, 0, bos);


dec is never initialised in this function, so this will encode a garbage 
ttl into the packet.


This should instead be deriving the ttl from the IP packet, as Eric did 
in his original patch.
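
A userspace sketch of the suggested behaviour, under assumptions: lse_encode() and push_labels() are hypothetical names (not the kernel's mpls_entry_encode()), the entry layout follows RFC 3032, and the caller passes in the TTL read from the IP header. Whether that TTL should be decremented first is a policy question this sketch does not address.

```c
#include <stdbool.h>
#include <stdint.h>
#include <arpa/inet.h>

/*
 * Build one label stack entry, label(20) | TC(3) | S(1) | TTL(8) as laid out
 * in RFC 3032, and return it in network byte order.
 */
uint32_t lse_encode(uint32_t label, uint8_t ttl, uint8_t tc, bool bos)
{
	return htonl((label & 0xFFFFF) << 12 | (uint32_t)(tc & 0x7) << 9 |
		     (uint32_t)(bos ? 1 : 0) << 8 | ttl);
}

/*
 * Fill a label stack from the innermost entry outwards. The TTL comes from
 * the encapsulated IP header (passed in as ip_ttl) instead of from an
 * uninitialised local; only the last entry carries the bottom-of-stack bit.
 */
void push_labels(uint32_t *hdr, const uint32_t *labels, int n, uint8_t ip_ttl)
{
	bool bos = true;
	int i;

	for (i = n - 1; i >= 0; i--) {
		hdr[i] = lse_encode(labels[i], ip_ttl, 0, bos);
		bos = false;
	}
}
```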


Thanks,
Rob


+       bos = false;
+   }
+
+   err = neigh_xmit(NEIGH_ARP_TABLE, out_dev, &rt->rt_gateway,
+                    skb);
+   if (err)
+       net_dbg_ratelimited("%s: packet transmission failed: %d\n",
+                           __func__, err);
+
+   return 0;
+
+drop:
+   kfree_skb(skb);
+   return -EINVAL;
+}

--
To unsubscribe from this list: send the line unsubscribe netdev in


Re: [PATCH net] netfilter: nftables: Do not run chains in the wrong network namespace

2015-06-19 Thread Pablo Neira Ayuso
On Fri, Jun 19, 2015 at 10:41:21AM -0500, Eric W. Biederman wrote:
 
 Currently nf_tables chains added in one network namespace are being
 run in all network namespaces.  The issues are myriad with the simplest
 being an unprivileged user can cause any network packets to be dropped.
 
 Address this by simply not running nf_tables chains in the wrong
 network namespace.
 
 Cc: sta...@vger.kernel.org
 Signed-off-by: Eric W. Biederman ebied...@xmission.com

Acked-by: Pablo Neira Ayuso pa...@netfilter.org

@David: Patrick sent a similar patch to address this, if you can get
this into the net tree, I'll make sure this propagates to -stable.
Thanks.


Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )

2015-06-19 Thread Steven Rostedt
On Fri, 19 Jun 2015 12:25:53 -0400
Steven Rostedt rost...@goodmis.org wrote:


 I don't see that 55201 anywhere. But then again, I didn't look for it
 before the port disappeared. I could reboot and look for it again. I
 should have saved the full netstat -tapn as well :-/

Of course I didn't find it anywhere, that's the port on my wife's box
that port 947 was connected to.

Now I even went over to my wife's box and ran

 # rpcinfo -p localhost
   program vers proto   port  service
    100000    4   tcp    111  portmapper
    100000    3   tcp    111  portmapper
    100000    2   tcp    111  portmapper
    100000    4   udp    111  portmapper
    100000    3   udp    111  portmapper
    100000    2   udp    111  portmapper
    100024    1   udp  34243  status
    100024    1   tcp  34498  status

which doesn't show anything.

but something is listening to that port...

 # netstat -ntap |grep 55201
tcp        0      0 0.0.0.0:55201           0.0.0.0:*               LISTEN

I rebooted again, but this time I ran this on my wife's box:

 # trace-cmd record -e nfs -e nfs4 -e net -e skb -e sock -e udp -e workqueue -e sunrpc

I started it when my server started booting the kernel, and kept it
running till the port vanished.

The full trace can be downloaded from
http://rostedt.homelinux.com/private/wife-trace.txt

Here's some interesting output from that trace:

ksoftirq-13  1..s. 12272627.681760: netif_receive_skb:dev=lo 
skbaddr=0x88020944c600 len=88
ksoftirq-13  1..s. 12272627.681776: net_dev_queue:dev=eth0 
skbaddr=0x880234e5b100 len=42
ksoftirq-13  1..s. 12272627.681777: net_dev_start_xmit:   dev=eth0 
queue_mapping=0 skbaddr=0x880234e5b100 vlan_tagged=0 vlan_proto=0x 
vlan_tci=0x protocol=0x0806 ip_
summed=0 len=42 data_len=0 network_offset=14 transport_offset_valid=0 
transport_offset=65533 tx_flags=0 gso_size=0 gso_segs=0 gso_type=0
ksoftirq-13  1..s. 12272627.681779: net_dev_xmit: dev=eth0 
skbaddr=0x880234e5b100 len=42 rc=0
ksoftirq-13  1..s. 12272627.681780: kfree_skb:
skbaddr=0x88023444cf00 protocol=2048 location=0x81422a72
ksoftirq-13  1..s. 12272627.681783: rpc_socket_error: error=-113 
socket:[11886206] dstaddr=192.168.23.9/2049 state=2 () sk_state=2 ()
ksoftirq-13  1..s. 12272627.681785: rpc_task_wakeup:  task:18128@0 
flags=5281 state=0006 status=-113 timeout=45000 queue=xprt_pending
ksoftirq-13  1d.s. 12272627.681786: workqueue_queue_work: work 
struct=0x8800b5a94588 function=rpc_async_schedule 
workqueue=0x880234666800 req_cpu=512 cpu=1
ksoftirq-13  1d.s. 12272627.681787: workqueue_activate_work: work struct 
0x8800b5a94588
ksoftirq-13  1..s. 12272627.681791: rpc_socket_state_change: 
socket:[11886206] dstaddr=192.168.23.9/2049 state=2 () sk_state=7 ()
ksoftirq-13  1..s. 12272627.681792: kfree_skb:
skbaddr=0x88020944c600 protocol=2048 location=0x81482c05
kworker/-20111   1 12272627.681796: workqueue_execute_start: work struct 
0x8800b5a94588: function rpc_async_schedule
kworker/-20111   1 12272627.681797: rpc_task_run_action:  task:18128@0 
flags=5281 state=0005 status=-113 action=call_connect_status
kworker/-20111   1 12272627.681798: rpc_task_run_action:  task:18128@0 
flags=5281 state=0005 status=-113 action=call_connect_status
kworker/-20111   1 12272627.681798: rpc_connect_status:   task:18128@0, 
status -113
kworker/-20111   1..s. 12272627.681799: rpc_task_sleep:   task:18128@0 
flags=5281 state=0005 status=0 timeout=750 queue=delayq
kworker/-20111   1 12272627.681800: workqueue_execute_end: work struct 
0x8800b5a94588

  idle-0   1..s. 12272630.688741: rpc_task_wakeup:  task:18128@0 
flags=5281 state=0006 status=-110 timeout=750 queue=delayq
  idle-0   1dNs. 12272630.688749: workqueue_queue_work: work 
struct=0x8800b5a94588 function=rpc_async_schedule 
workqueue=0x880234666800 req_cpu=512 cpu=1
  idle-0   1dNs. 12272630.688749: workqueue_activate_work: work struct 
0x8800b5a94588
kworker/-20111   1 12272630.688758: workqueue_execute_start: work struct 
0x8800b5a94588: function rpc_async_schedule
kworker/-20111   1 12272630.688759: rpc_task_run_action:  task:18128@0 
flags=5281 state=0005 status=-110 action=call_timeout
kworker/-20111   1 12272630.688760: rpc_task_run_action:  task:18128@0 
flags=5281 state=0005 status=0 action=call_timeout
kworker/-20111   1 12272630.688760: rpc_task_run_action:  task:18128@0 
flags=5281 state=0005 status=0 action=call_bind
kworker/-20111   1 12272630.688761: rpc_task_run_action:  task:18128@0 
flags=5281 state=0005 status=0 action=call_connect
kworker/-20111   1..s. 12272630.688762: rpc_task_sleep:   task:18128@0 
flags=5281 state=0005 status=0 timeout=45000 queue=xprt_pending
kworker/-20111   1 12272630.688765: workqueue_execute_end: work struct 
0x8800b5a94588
  idle-0   3d.s. 12272630.696742: 

[PATCH 08/12] net: include missing headers in net/net_namespace.h

2015-06-19 Thread Pablo Neira Ayuso
Include linux/idr.h and linux/skbuff.h since they are required by objects that
are declared in the net structure.

 struct net {
...
struct idr  netns_ids;
...
struct sk_buff_head wext_nlevents;
...

Signed-off-by: Pablo Neira Ayuso pa...@netfilter.org
Signed-off-by: Eric W. Biederman ebied...@xmission.com
---
 include/net/net_namespace.h |2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 72eb237..e951453 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -28,6 +28,8 @@
 #include <net/netns/xfrm.h>
 #include <net/netns/mpls.h>
 #include <linux/ns_common.h>
+#include <linux/idr.h>
+#include <linux/skbuff.h>
 
 struct user_namespace;
 struct proc_dir_entry;
-- 
1.7.10.4



[PATCH 05/12] netfilter: bridge: split ipv6 code into separated file

2015-06-19 Thread Pablo Neira Ayuso
Resolve compilation breakage when CONFIG_IPV6 is not set by moving the IPv6
code into a separate br_netfilter_ipv6.c file.

Fixes: efb6de9b4ba0 (netfilter: bridge: forward IPv6 fragmented packets)
Reported-by: kbuild test robot fengguang...@intel.com
Signed-off-by: Pablo Neira Ayuso pa...@netfilter.org
---
 include/net/netfilter/br_netfilter.h |   60 
 net/bridge/Makefile  |1 +
 net/bridge/br_netfilter_hooks.c  |  248 ++
 net/bridge/br_netfilter_ipv6.c   |  245 +
 4 files changed, 315 insertions(+), 239 deletions(-)
 create mode 100644 net/bridge/br_netfilter_ipv6.c

diff --git a/include/net/netfilter/br_netfilter.h 
b/include/net/netfilter/br_netfilter.h
index 2aa6048..bab824b 100644
--- a/include/net/netfilter/br_netfilter.h
+++ b/include/net/netfilter/br_netfilter.h
@@ -1,6 +1,66 @@
 #ifndef _BR_NETFILTER_H_
 #define _BR_NETFILTER_H_
 
+#include "../../../net/bridge/br_private.h"
+
+static inline struct nf_bridge_info *nf_bridge_alloc(struct sk_buff *skb)
+{
+   skb->nf_bridge = kzalloc(sizeof(struct nf_bridge_info), GFP_ATOMIC);
+
+   if (likely(skb->nf_bridge))
+       atomic_set(&(skb->nf_bridge->use), 1);
+
+   return skb->nf_bridge;
+}
+
+void nf_bridge_update_protocol(struct sk_buff *skb);
+
+static inline struct nf_bridge_info *
+nf_bridge_info_get(const struct sk_buff *skb)
+{
+   return skb->nf_bridge;
+}
+
+unsigned int nf_bridge_encap_header_len(const struct sk_buff *skb);
+
+static inline void nf_bridge_push_encap_header(struct sk_buff *skb)
+{
+   unsigned int len = nf_bridge_encap_header_len(skb);
+
+   skb_push(skb, len);
+   skb->network_header -= len;
+}
+
+int br_nf_pre_routing_finish_bridge(struct sock *sk, struct sk_buff *skb);
+
+static inline struct rtable *bridge_parent_rtable(const struct net_device *dev)
+{
+   struct net_bridge_port *port;
+
+   port = br_port_get_rcu(dev);
+   return port ? &port->br->fake_rtable : NULL;
+}
+
+struct net_device *setup_pre_routing(struct sk_buff *skb);
 void br_netfilter_enable(void);
 
+#if IS_ENABLED(CONFIG_IPV6)
+int br_validate_ipv6(struct sk_buff *skb);
+unsigned int br_nf_pre_routing_ipv6(const struct nf_hook_ops *ops,
+   struct sk_buff *skb,
+   const struct nf_hook_state *state);
+#else
+static inline int br_validate_ipv6(struct sk_buff *skb)
+{
+   return -1;
+}
+
+static inline unsigned int
+br_nf_pre_routing_ipv6(const struct nf_hook_ops *ops, struct sk_buff *skb,
+  const struct nf_hook_state *state)
+{
+   return NF_DROP;
+}
+#endif
+
 #endif /* _BR_NETFILTER_H_ */
diff --git a/net/bridge/Makefile b/net/bridge/Makefile
index c52577a..a1cda5d 100644
--- a/net/bridge/Makefile
+++ b/net/bridge/Makefile
@@ -13,6 +13,7 @@ bridge-$(CONFIG_SYSFS) += br_sysfs_if.o br_sysfs_br.o
 bridge-$(subst m,y,$(CONFIG_BRIDGE_NETFILTER)) += br_nf_core.o
 
 br_netfilter-y := br_netfilter_hooks.o
+br_netfilter-$(subst m,y,$(CONFIG_IPV6)) += br_netfilter_ipv6.o
 obj-$(CONFIG_BRIDGE_NETFILTER) += br_netfilter.o
 
 bridge-$(CONFIG_BRIDGE_IGMP_SNOOPING) += br_multicast.o br_mdb.o
diff --git a/net/bridge/br_netfilter_hooks.c b/net/bridge/br_netfilter_hooks.c
index e4e5f2f..d89f4fa 100644
--- a/net/bridge/br_netfilter_hooks.c
+++ b/net/bridge/br_netfilter_hooks.c
@@ -123,11 +123,6 @@ struct brnf_frag_data {
 static DEFINE_PER_CPU(struct brnf_frag_data, brnf_frag_data_storage);
 #endif
 
-static struct nf_bridge_info *nf_bridge_info_get(const struct sk_buff *skb)
-{
-   return skb->nf_bridge;
-}
-
 static void nf_bridge_info_free(struct sk_buff *skb)
 {
    if (skb->nf_bridge) {
@@ -136,14 +131,6 @@ static void nf_bridge_info_free(struct sk_buff *skb)
    }
 }
 
-static inline struct rtable *bridge_parent_rtable(const struct net_device *dev)
-{
-   struct net_bridge_port *port;
-
-   port = br_port_get_rcu(dev);
-   return port ? &port->br->fake_rtable : NULL;
-}
-
 static inline struct net_device *bridge_parent(const struct net_device *dev)
 {
    struct net_bridge_port *port;
@@ -152,15 +139,6 @@ static inline struct net_device *bridge_parent(const struct net_device *dev)
    return port ? port->br->dev : NULL;
 }
 
-static inline struct nf_bridge_info *nf_bridge_alloc(struct sk_buff *skb)
-{
-   skb->nf_bridge = kzalloc(sizeof(struct nf_bridge_info), GFP_ATOMIC);
-   if (likely(skb->nf_bridge))
-       atomic_set(&(skb->nf_bridge->use), 1);
-
-   return skb->nf_bridge;
-}
-
 static inline struct nf_bridge_info *nf_bridge_unshare(struct sk_buff *skb)
 {
    struct nf_bridge_info *nf_bridge = skb->nf_bridge;
@@ -178,7 +156,7 @@ static inline struct nf_bridge_info *nf_bridge_unshare(struct sk_buff *skb)
    return nf_bridge;
 }
 
-static unsigned int nf_bridge_encap_header_len(const struct sk_buff *skb)
+unsigned int nf_bridge_encap_header_len(const struct sk_buff *skb)
 {
switch 

[PATCH 07/12] net: sched: Simplify em_ipset_match

2015-06-19 Thread Pablo Neira Ayuso
From: Eric W. Biederman ebied...@xmission.com

em->net is always set and always available, use it in preference
to dev_net(skb->dev).

Signed-off-by: Eric W. Biederman ebied...@xmission.com
Signed-off-by: Pablo Neira Ayuso pa...@netfilter.org
---
 net/sched/em_ipset.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/sched/em_ipset.c b/net/sched/em_ipset.c
index a3d79c8..df0328b 100644
--- a/net/sched/em_ipset.c
+++ b/net/sched/em_ipset.c
@@ -92,8 +92,8 @@ static int em_ipset_match(struct sk_buff *skb, struct tcf_ematch *em,
 
    rcu_read_lock();
 
-   if (dev && skb->skb_iif)
-       indev = dev_get_by_index_rcu(dev_net(dev), skb->skb_iif);
+   if (skb->skb_iif)
+       indev = dev_get_by_index_rcu(em->net, skb->skb_iif);
 
    acpar.in  = indev ? indev : dev;
    acpar.out = dev;
-- 
1.7.10.4



[PATCH 04/12] netfilter: bridge: rename br_netfilter.c to br_netfilter_hooks.c

2015-06-19 Thread Pablo Neira Ayuso
To prepare separation of the IPv6 code into different file.

Signed-off-by: Pablo Neira Ayuso pa...@netfilter.org
---
 net/bridge/Makefile |1 +
 net/bridge/{br_netfilter.c = br_netfilter_hooks.c} |0
 2 files changed, 1 insertion(+)
 rename net/bridge/{br_netfilter.c = br_netfilter_hooks.c} (100%)

diff --git a/net/bridge/Makefile b/net/bridge/Makefile
index fd7ee03..c52577a 100644
--- a/net/bridge/Makefile
+++ b/net/bridge/Makefile
@@ -12,6 +12,7 @@ bridge-$(CONFIG_SYSFS) += br_sysfs_if.o br_sysfs_br.o
 
 bridge-$(subst m,y,$(CONFIG_BRIDGE_NETFILTER)) += br_nf_core.o
 
+br_netfilter-y := br_netfilter_hooks.o
 obj-$(CONFIG_BRIDGE_NETFILTER) += br_netfilter.o
 
 bridge-$(CONFIG_BRIDGE_IGMP_SNOOPING) += br_multicast.o br_mdb.o
diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter_hooks.c
similarity index 100%
rename from net/bridge/br_netfilter.c
rename to net/bridge/br_netfilter_hooks.c
-- 
1.7.10.4



[PATCH 03/12] netfilter: x_tables: align per cpu xt_counter

2015-06-19 Thread Pablo Neira Ayuso
From: Eric Dumazet eduma...@google.com

Let's force a 16 bytes alignment on xt_counter percpu allocations,
so that bytes and packets sit in same cache line.

xt_counter being exported to user space, we cannot add __align(16) on
the structure itself.
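
A small userspace sketch of the idea follows. It is only an analogy: aligned_alloc() stands in for the percpu allocator, struct counters mirrors the two u64 fields of xt_counters, and the 64-byte cache line size is an assumption. A 16-byte object aligned to 16 bytes can never straddle a 64-byte line, which is what the check verifies.

```c
#include <stdint.h>
#include <stdlib.h>

/* Same two-field layout as the UAPI struct xt_counters. */
struct counters {
	uint64_t pcnt;	/* packet counter */
	uint64_t bcnt;	/* byte counter */
};

#define CACHE_LINE 64u	/* assumed line size for the check below */

/* Allocate one zeroed counter pair aligned to its own size (16 bytes). */
static struct counters *counters_alloc(void)
{
	struct counters *c = aligned_alloc(sizeof(*c), sizeof(*c));

	if (c) {
		c->pcnt = 0;
		c->bcnt = 0;
	}
	return c;
}

/* Non-zero when both fields fall in the same cache line. */
static int counters_share_line(const struct counters *c)
{
	return (uintptr_t)&c->pcnt / CACHE_LINE ==
	       (uintptr_t)&c->bcnt / CACHE_LINE;
}
```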

Signed-off-by: Eric Dumazet eduma...@google.com
Cc: Florian Westphal f...@strlen.de
Acked-by: Florian Westphal f...@strlen.de
Signed-off-by: Pablo Neira Ayuso pa...@netfilter.org
---
 include/linux/netfilter/x_tables.h |6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/linux/netfilter/x_tables.h 
b/include/linux/netfilter/x_tables.h
index 95693c4..1c97a22 100644
--- a/include/linux/netfilter/x_tables.h
+++ b/include/linux/netfilter/x_tables.h
@@ -356,7 +356,8 @@ static inline unsigned long ifname_compare_aligned(const char *_a,
  * so nothing needs to be done there.
  *
  * xt_percpu_counter_alloc returns the address of the percpu
- * counter, or 0 on !SMP.
+ * counter, or 0 on !SMP. We force an alignment of 16 bytes
+ * so that bytes/packets share a common cache line.
  *
  * Hence caller must use IS_ERR_VALUE to check for error, this
  * allows us to return 0 for single core systems without forcing
@@ -365,7 +366,8 @@ static inline unsigned long ifname_compare_aligned(const char *_a,
 static inline u64 xt_percpu_counter_alloc(void)
 {
    if (nr_cpu_ids > 1) {
-       void __percpu *res = alloc_percpu(struct xt_counters);
+       void __percpu *res = __alloc_percpu(sizeof(struct xt_counters),
+                                           sizeof(struct xt_counters));
 
        if (res == NULL)
            return (u64) -ENOMEM;
-- 
1.7.10.4



Re: [PATCH net-next RFC v2 1/3] lwt: infrastructure to support light weight tunnels

2015-06-19 Thread Robert Shearman

On 19/06/15 16:14, roopa wrote:

On 6/19/15, 7:43 AM, Robert Shearman wrote:


+
+static inline struct lwtunnel_state *lwtunnel_skb_lwstate(struct sk_buff *skb)
+{
+    struct rtable *rt = (struct rtable *)skb_dst(skb);
+
+    return rt->rt_lwtstate;
+}


It doesn't look like this patch will build on its own because
rt_lwtstate isn't added to struct rtable until patch 2.

looks like i messed up the patch creation. I will fix that with the next
series.


More importantly, is it safe to assume that skb_dst will always return
an IPv4 dst? How will this look when IPv6 support is added?


Today lwtunnel_skb_lwstate is called from lwtunnel_output which is only
called from ipv4 code.
And my ipv6 variant code was supposed to have a 6 suffix. something like
lwtunnel_output6.
Or to be more explicit i will probably have variants of the output and
skb handling functions like,
lwtunnel_output_ipv4 and lwtunnel_output_ipv6.


Do you intend for these functions to be used by netdevices to support 
the vxlan use case?


If so, then how will the netdevice know which one of the two to call? 
Will there have to be a netdevice for ipv4 and a netdevice for ipv6?


If not, could you outline how you intend for it to be implemented?

Thanks,
Rob


Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )

2015-06-19 Thread Trond Myklebust
On Fri, Jun 19, 2015 at 1:17 PM, Steven Rostedt rost...@goodmis.org wrote:
 On Fri, 19 Jun 2015 12:25:53 -0400
 Steven Rostedt rost...@goodmis.org wrote:


 I don't see that 55201 anywhere. But then again, I didn't look for it
 before the port disappeared. I could reboot and look for it again. I
 should have saved the full netstat -tapn as well :-/

 Of course I didn't find it anywhere, that's the port on my wife's box
 that port 947 was connected to.

 Now I even went over to my wife's box and ran

  # rpcinfo -p localhost
    program vers proto   port  service
     100000    4   tcp    111  portmapper
     100000    3   tcp    111  portmapper
     100000    2   tcp    111  portmapper
     100000    4   udp    111  portmapper
     100000    3   udp    111  portmapper
     100000    2   udp    111  portmapper
     100024    1   udp  34243  status
     100024    1   tcp  34498  status

 which doesn't show anything.

 but something is listening to that port...

  # netstat -ntap |grep 55201
 tcp        0      0 0.0.0.0:55201           0.0.0.0:*               LISTEN


Hang on. This is on the client box while there is an active NFSv4
mount? Then that's probably the NFSv4 callback channel listening for
delegation callbacks.

Can you please try:

echo "options nfs callback_tcpport=4048" > /etc/modprobe.d/nfs-local.conf

and then either reboot the client or unload and then reload the nfs
modules before reattempting the mount. If this is indeed the callback
channel, then that will move your phantom listener to port 4048...

Cheers
   Trond


[PATCH 01/12] netfilter: nfnetlink_queue: add security context information

2015-06-19 Thread Pablo Neira Ayuso
From: Roman Kubiak r.kub...@samsung.com

This patch adds an additional attribute when sending
packet information via netlink in netfilter_queue module.
It will send additional security context data, so that
userspace applications can verify this context against
their own security databases.

Signed-off-by: Roman Kubiak r.kub...@samsung.com
Acked-by: Florian Westphal f...@strlen.de
Signed-off-by: Pablo Neira Ayuso pa...@netfilter.org
---
 include/uapi/linux/netfilter/nfnetlink_queue.h |4 ++-
 net/netfilter/nfnetlink_queue_core.c   |   35 +++-
 2 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/netfilter/nfnetlink_queue.h 
b/include/uapi/linux/netfilter/nfnetlink_queue.h
index 8dd819e..b67a853 100644
--- a/include/uapi/linux/netfilter/nfnetlink_queue.h
+++ b/include/uapi/linux/netfilter/nfnetlink_queue.h
@@ -49,6 +49,7 @@ enum nfqnl_attr_type {
NFQA_EXP,   /* nf_conntrack_netlink.h */
NFQA_UID,   /* __u32 sk uid */
NFQA_GID,   /* __u32 sk gid */
+   NFQA_SECCTX,/* security context string */
 
__NFQA_MAX
 };
@@ -102,7 +103,8 @@ enum nfqnl_attr_config {
 #define NFQA_CFG_F_CONNTRACK           (1 << 1)
 #define NFQA_CFG_F_GSO                 (1 << 2)
 #define NFQA_CFG_F_UID_GID             (1 << 3)
-#define NFQA_CFG_F_MAX                 (1 << 4)
+#define NFQA_CFG_F_SECCTX              (1 << 4)
+#define NFQA_CFG_F_MAX                 (1 << 5)
 
 /* flags for NFQA_SKB_INFO */
 /* packet appears to have wrong checksums, but they are ok */
diff --git a/net/netfilter/nfnetlink_queue_core.c 
b/net/netfilter/nfnetlink_queue_core.c
index 22a5ac7..6eccf0f 100644
--- a/net/netfilter/nfnetlink_queue_core.c
+++ b/net/netfilter/nfnetlink_queue_core.c
@@ -278,6 +278,23 @@ nla_put_failure:
return -1;
 }
 
+static u32 nfqnl_get_sk_secctx(struct sk_buff *skb, char **secdata)
+{
+   u32 seclen = 0;
+#if IS_ENABLED(CONFIG_NETWORK_SECMARK)
+   if (!skb || !sk_fullsock(skb->sk))
+       return 0;
+
+   read_lock_bh(&skb->sk->sk_callback_lock);
+
+   if (skb->secmark)
+       security_secid_to_secctx(skb->secmark, secdata, &seclen);
+
+   read_unlock_bh(&skb->sk->sk_callback_lock);
+#endif
+   return seclen;
+}
+
 static struct sk_buff *
 nfqnl_build_packet_message(struct net *net, struct nfqnl_instance *queue,
   struct nf_queue_entry *entry,
@@ -297,6 +314,8 @@ nfqnl_build_packet_message(struct net *net, struct nfqnl_instance *queue,
    struct nf_conn *ct = NULL;
    enum ip_conntrack_info uninitialized_var(ctinfo);
    bool csum_verify;
+   char *secdata = NULL;
+   u32 seclen = 0;
 
    size = nlmsg_total_size(sizeof(struct nfgenmsg))
        + nla_total_size(sizeof(struct nfqnl_msg_packet_hdr))
@@ -352,6 +371,12 @@ nfqnl_build_packet_message(struct net *net, struct nfqnl_instance *queue,
            + nla_total_size(sizeof(u_int32_t)));   /* gid */
    }
 
+   if ((queue->flags & NFQA_CFG_F_SECCTX) && entskb->sk) {
+       seclen = nfqnl_get_sk_secctx(entskb, &secdata);
+       if (seclen)
+           size += nla_total_size(seclen);
+   }
+
    skb = nfnetlink_alloc_skb(net, size, queue->peer_portid,
                              GFP_ATOMIC);
    if (!skb) {
@@ -479,6 +504,9 @@ nfqnl_build_packet_message(struct net *net, struct nfqnl_instance *queue,
        nfqnl_put_sk_uidgid(skb, entskb->sk) < 0)
        goto nla_put_failure;
 
+   if (seclen && nla_put(skb, NFQA_SECCTX, seclen, secdata))
+       goto nla_put_failure;
+
    if (ct && nfqnl_ct_put(skb, ct, ctinfo) < 0)
        goto nla_put_failure;
 
@@ -1142,7 +1170,12 @@ nfqnl_recv_config(struct sock *ctnl, struct sk_buff *skb,
        ret = -EOPNOTSUPP;
        goto err_out_unlock;
    }
-
+#if !IS_ENABLED(CONFIG_NETWORK_SECMARK)
+   if (flags & mask & NFQA_CFG_F_SECCTX) {
+       ret = -EOPNOTSUPP;
+       goto err_out_unlock;
+   }
+#endif
    spin_lock_bh(&queue->lock);
    queue->flags &= ~mask;
    queue->flags |= flags & mask;
-- 
1.7.10.4



Re: [PATCH net-next RFC v2 2/3] ipv4: add support for light weight tunnel encap attributes

2015-06-19 Thread Robert Shearman

On 19/06/15 16:28, roopa wrote:

On 6/19/15, 8:19 AM, Robert Shearman wrote:

On 19/06/15 05:49, Roopa Prabhu wrote:

From: Roopa Prabhu ro...@cumulusnetworks.com

Introduces two netlink attributes RTA_ENCAP_TYPE and
RTA_ENCAP to support attaching encap information to ipv4 routes.


Surely RTA_ENCAP_TYPE should be part of RTA_ENCAP, since the type
doesn't make sense without the data and vice versa?

I went back and forth on this. And started with what you are saying
above. But then I wanted RTA_ENCAP netlink policy to be declared by
individual lwtunnel drivers.
And to determine which RTA_ENCAP netlink policy to pick, you need to
know the RTA_ENCAP policy type (or lwtunnel type)
which is encoded in RTA_ENCAP_TYPE. And I did not want to introduce
another level of nest in RTA_ENCAP (because for nexthops we are already
2 levels deep when parsing RTA_ENCAP).


No need for that - use the example of how RTA_MULTIPATH is used for 
ipv4/ipv6:


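+----------------------+
| RTA_MULTIPATH        |
+----------------------+
| +------------------+ |
| | struct rtnexthop | |
| +------------------+ |
| | RTA_GATEWAY, etc.| |
| +------------------+ |
+----------------------+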

You could do something similar for RTA_ENCAP, with the type stored in the 
data ahead of the nested attributes. E.g.:


+----------------------+
| RTA_ENCAP            |
+----------------------+
| +------------------+ |
| | struct rtencap   | |
| +------------------+ |
| | MPLS_IPTUNNEL_DST| |
| +------------------+ |
+----------------------+

struct rtencap {
	__u16 rte_type;
};
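
To make the layout concrete, here is a rough userspace sketch of how a receiver could read the type before walking the nested attributes. It is only an illustration of the proposal: struct rtencap does not exist in the kernel today, and encap_parse() and ENCAP_ALIGN() are names invented for this mail. The fixed header is padded to the usual 4-byte netlink alignment, and the nested attributes start at the returned offset.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed header at the start of the RTA_ENCAP payload. */
struct rtencap {
	uint16_t rte_type;
};

/* Netlink attributes are padded to 4-byte boundaries. */
#define ENCAP_ALIGN(n) (((n) + 3u) & ~(size_t)3u)

/*
 * Read the encap type from the fixed header and report the offset at
 * which the nested attributes begin. Returns 0 on success, or -1 if the
 * payload is too short to hold the padded header.
 */
static int encap_parse(const void *payload, size_t len,
		       uint16_t *type, size_t *nested_off)
{
	struct rtencap hdr;
	size_t off = ENCAP_ALIGN(sizeof(hdr));

	if (len < off)
		return -1;

	memcpy(&hdr, payload, sizeof(hdr));
	*type = hdr.rte_type;
	*nested_off = off;
	return 0;
}
```

The type is then known before any nested attribute is parsed, so the right per-encap policy can be selected without a separate top-level attribute.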



Hence, fib code first looks for RTA_ENCAP and if RTA_ENCAP is specified,
RTA_ENCAP_TYPE is a required attribute. My iproute2 patches handles this
and makes sure
there is an  RTA_ENCAP_TYPE specified with RTA_ENCAP.


No doubt, but surely it's better to present an unambiguous API to 
userspace if possible?


Thanks,
Rob


Re: [PATCH 02/22] fjes: Hardware initialization routine

2015-06-19 Thread Yasuaki Ishimatsu
Hi Izumi-san,

On Thu, 18 Jun 2015 09:49:27 +0900
Taku Izumi izumi.t...@jp.fujitsu.com wrote:

 This patch adds hardware initialization routine to be
 invoked at driver's .probe routine.
 
 Signed-off-by: Taku Izumi izumi.t...@jp.fujitsu.com
 ---
  drivers/platform/x86/fjes/Makefile|   2 +-
  drivers/platform/x86/fjes/fjes_hw.c   | 305 
 ++
  drivers/platform/x86/fjes/fjes_hw.h   | 254 
  drivers/platform/x86/fjes/fjes_regs.h | 110 
  4 files changed, 670 insertions(+), 1 deletion(-)
  create mode 100644 drivers/platform/x86/fjes/fjes_hw.c
  create mode 100644 drivers/platform/x86/fjes/fjes_hw.h
  create mode 100644 drivers/platform/x86/fjes/fjes_regs.h
 
 diff --git a/drivers/platform/x86/fjes/Makefile 
 b/drivers/platform/x86/fjes/Makefile
 index 98e59cb..a67f65d8 100644
 --- a/drivers/platform/x86/fjes/Makefile
 +++ b/drivers/platform/x86/fjes/Makefile
 @@ -27,5 +27,5 @@
  
  obj-$(CONFIG_FUJITSU_ES) += fjes.o
  
 -fjes-objs := fjes_main.o
 +fjes-objs := fjes_main.o fjes_hw.o
  
 diff --git a/drivers/platform/x86/fjes/fjes_hw.c 
 b/drivers/platform/x86/fjes/fjes_hw.c
 new file mode 100644
 index 000..1731827
 --- /dev/null
 +++ b/drivers/platform/x86/fjes/fjes_hw.c
 @@ -0,0 +1,305 @@
 +/*
 + *  FUJITSU Extended Socket Network Device driver
 + *  Copyright (c) 2015 FUJITSU LIMITED
 + *
 + * This program is free software; you can redistribute it and/or modify it
 + * under the terms and conditions of the GNU General Public License,
 + * version 2, as published by the Free Software Foundation.
 + *
 + * This program is distributed in the hope it will be useful, but WITHOUT
 + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
 + * more details.
 + *
 + * You should have received a copy of the GNU General Public License along 
 with
 + * this program; if not, see http://www.gnu.org/licenses/.
 + *
 + * The full GNU General Public License is included in this distribution in
 + * the file called COPYING.
 + *
 + */
 +
 +#include "fjes_hw.h"
 +#include "fjes.h"
 +
 +/* supported MTU list */
 +u32 fjes_support_mtu[] = {
 + FJES_MTU_DEFINE(8 * 1024),
 + FJES_MTU_DEFINE(16 * 1024),
 + FJES_MTU_DEFINE(32 * 1024),
 + FJES_MTU_DEFINE(64 * 1024),
 + 0
 +};
 +
 +u32 fjes_hw_rd32(struct fjes_hw *hw, u32 reg)
 +{
 + u8 *base = hw->base;
 + u32 value = 0;
 +
 + value = readl(&base[reg]);
 +
 + return value;
 +}
 +
 +static u8 *fjes_hw_iomap(struct fjes_hw *hw)
 +{
 + u8 *base;
 +
 + if (!request_mem_region(hw->hw_res.start, hw->hw_res.size,
 +                         fjes_driver_name)) {
 +     pr_err("request_mem_region failed");
 +     return NULL;
 + }
 +
 + base = (u8 *)ioremap_nocache(hw->hw_res.start, hw->hw_res.size);
 +
 + return base;
 +}
 +
 +
 +int fjes_hw_reset(struct fjes_hw *hw)
 +{
 +
 + int timeout;
 + union REG_DCTL dctl;
 +
 + dctl.Reg = 0;
 + dctl.Bits.reset = 1;
 + wr32(XSCT_DCTL, dctl.Reg);
 +
 +
 + timeout = FJES_DEVICE_RESET_TIMEOUT * 1000;
 + dctl.Reg = rd32(XSCT_DCTL);

 + while ((dctl.Bits.reset == 1) && (timeout > 0)) {
 +     msleep(1000);
 +     dctl.Reg = rd32(XSCT_DCTL);
 +     timeout -= 1000;
 + }
 +

 + return timeout >= 0 ? 0 : -EIO;

The while loop finishes when timeout reaches 0 at the latest, so timeout
is never negative here and the function always returns 0.

It should be return dctl.Bits.reset != 1 ? 0 : -EIO.
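
A minimal userspace sketch of the loop I have in mind is below. The names wait_reset_done() and fake_pending() are made up for this mail, and the callback merely stands in for reading the reset bit from XSCT_DCTL; a driver would sleep where the comment says so. The return value is derived from the reset bit itself, so a timeout is reported as -EIO and a reset that completes on the very last poll still counts as success.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical callback standing in for reading the reset bit. */
typedef bool (*reset_pending_fn)(void *ctx);

/* Poll until the reset bit clears or the time budget is used up. */
static int wait_reset_done(reset_pending_fn pending, void *ctx,
			   int timeout_ms, int step_ms)
{
	bool busy = pending(ctx);

	while (busy && timeout_ms > 0) {
		/* a driver would sleep for step_ms here, e.g. msleep() */
		busy = pending(ctx);
		timeout_ms -= step_ms;
	}

	return busy ? -EIO : 0;
}

/* Test double: reports "pending" until *ctx polls have been consumed. */
static bool fake_pending(void *ctx)
{
	int *remaining = ctx;

	if (*remaining <= 0)
		return false;
	(*remaining)--;
	return true;
}
```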

 +
 +}
 +

 +static int fjes_hw_get_max_epid(struct fjes_hw *hw)
 +{
 + union REG_MAX_EP info;
 +
 + info.Reg = rd32(XSCT_MAX_EP);
 +
 + return info.Bits.maxep;
 +}

This is very difficult to read. Please add a comment.
When does info.Bits.maxep get its value? The function just
calls rd32(XSCT_MAX_EP).

 +
 +static int fjes_hw_get_my_epid(struct fjes_hw *hw)
 +{
 + union REG_OWNER_EPID info;
 +
 + info.Reg = rd32(XSCT_OWNER_EPID);
 +
 + return info.Bits.epid;
 +}

Ditto.

 +
 +static int fjes_hw_alloc_shared_status_region(struct fjes_hw *hw)
 +{
 + size_t size;
 +
 + size = sizeof(struct fjes_device_shared_info) +
 +     (sizeof(u8) * hw->max_epid);
 + hw->hw_info.share = kzalloc(size, GFP_KERNEL);
 + if (!hw->hw_info.share)
 +     return -ENOMEM;
 +
 + hw->hw_info.share->epnum = hw->max_epid;
 +
 + return 0;
 +}
 +
 +static int fjes_hw_alloc_epbuf(struct epbuf_handler *epbh)
 +{
 + void *mem;
 +

 + mem = vmalloc(EP_BUFFER_SIZE);
 + if (!mem)
 + return -ENOMEM;
 + memset(mem, 0, EP_BUFFER_SIZE);

How about using vzalloc()?

 +
 + epbh->buffer = mem;
 + epbh->size = EP_BUFFER_SIZE;
 +
 + epbh->info = (union ep_buffer_info *)mem;
 + epbh->ring = (u8 *) (mem + sizeof(union ep_buffer_info));
 +
 + return 0;
 +}
 +
 +void fjes_hw_setup_epbuf(struct epbuf_handler *epbh, u8 *mac_addr, u32 mtu)
 +{
 +
 + union ep_buffer_info *info = epbh-info;
 +   

Re: [PATCH 1/1 net-next] sunrpc: use sg_init_one() in krb5_rc4_setup_enc/seq_key()

2015-06-19 Thread J. Bruce Fields
Thanks, applying.--b.

On Tue, Jun 16, 2015 at 09:57:52PM +0200, Fabian Frederick wrote:
 Don't opencode sg_init_one()
 
 Signed-off-by: Fabian Frederick f...@skynet.be
 ---
  net/sunrpc/auth_gss/gss_krb5_crypto.c | 8 ++--
  1 file changed, 2 insertions(+), 6 deletions(-)
 
 diff --git a/net/sunrpc/auth_gss/gss_krb5_crypto.c 
 b/net/sunrpc/auth_gss/gss_krb5_crypto.c
 index b5408e8..fee3c15 100644
 --- a/net/sunrpc/auth_gss/gss_krb5_crypto.c
 +++ b/net/sunrpc/auth_gss/gss_krb5_crypto.c
 @@ -881,9 +881,7 @@ krb5_rc4_setup_seq_key(struct krb5_ctx *kctx, struct crypto_blkcipher *cipher,
    if (err)
        goto out_err;
  
 -  sg_init_table(sg, 1);
 -  sg_set_buf(sg, &zeroconstant, 4);
 -
 +  sg_init_one(sg, &zeroconstant, 4);
    err = crypto_hash_digest(&desc, sg, 4, Kseq);
    if (err)
        goto out_err;
 @@ -951,9 +949,7 @@ krb5_rc4_setup_enc_key(struct krb5_ctx *kctx, struct crypto_blkcipher *cipher,
    if (err)
        goto out_err;
  
 -  sg_init_table(sg, 1);
 -  sg_set_buf(sg, &zeroconstant, 4);
 -
 +  sg_init_one(sg, &zeroconstant, 4);
    err = crypto_hash_digest(&desc, sg, 4, Kcrypt);
    if (err)
        goto out_err;
 -- 
 2.4.2


Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )

2015-06-19 Thread Jeff Layton
On Fri, 19 Jun 2015 13:39:08 -0400
Trond Myklebust trond.mykleb...@primarydata.com wrote:

 On Fri, Jun 19, 2015 at 1:17 PM, Steven Rostedt rost...@goodmis.org wrote:
  On Fri, 19 Jun 2015 12:25:53 -0400
  Steven Rostedt rost...@goodmis.org wrote:
 
 
  I don't see that 55201 anywhere. But then again, I didn't look for it
  before the port disappeared. I could reboot and look for it again. I
  should have saved the full netstat -tapn as well :-/
 
  Of course I didn't find it anywhere, that's the port on my wife's box
  that port 947 was connected to.
 
  Now I even went over to my wife's box and ran
 
   # rpcinfo -p localhost
     program vers proto   port  service
      100000    4   tcp    111  portmapper
      100000    3   tcp    111  portmapper
      100000    2   tcp    111  portmapper
      100000    4   udp    111  portmapper
      100000    3   udp    111  portmapper
      100000    2   udp    111  portmapper
      100024    1   udp  34243  status
      100024    1   tcp  34498  status
 
  which doesn't show anything.
 
  but something is listening to that port...
 
   # netstat -ntap |grep 55201
  tcp        0      0 0.0.0.0:55201           0.0.0.0:*               LISTEN
 
 
 Hang on. This is on the client box while there is an active NFSv4
 mount? Then that's probably the NFSv4 callback channel listening for
 delegation callbacks.
 
 Can you please try:
 
 echo "options nfs callback_tcpport=4048" > /etc/modprobe.d/nfs-local.conf
 
 and then either reboot the client or unload and then reload the nfs
 modules before reattempting the mount. If this is indeed the callback
 channel, then that will move your phantom listener to port 4048...
 

Right, it was a little unclear to me before, but it now seems clear
that the callback socket that the server is opening to the client is
the one squatting on the port.

...and that sort of makes sense, doesn't it? That rpc_clnt will stick
around for the life of the client's lease, and the rpc_clnt binds to a
particular port so that it can reconnect using the same one.

Given that Steven has done the legwork and figured out that reverting
those commits fixes the issue, then I suspect that the real culprit is
caf4ccd4e88cf2.

The client is likely closing down the other end of the callback
socket when it goes idle. Before that commit, we probably did an
xs_close on it, but now we're doing an xs_tcp_shutdown and that leaves
the port bound.
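
For illustration only (this is not from the thread and is not the sunrpc
code): a user-space sketch of the general socket behaviour being described,
namely that shutting a socket down does not give up its bound port, while
closing it does. The helper names bind_loopback, bound_port and port_is_free
are made up for this example. The socket here is never connected, so
shutdown() itself fails with ENOTCONN; the point is only that nothing short
of close() releases the port.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Bind a TCP socket to 127.0.0.1 on the given port (0 lets the kernel
 * pick one). Returns the fd, or -1 with errno set. Hypothetical helper. */
static int bind_loopback(unsigned short port)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        int saved = errno;

        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}

/* Return the local port the socket is bound to, or 0 on error. */
static unsigned short bound_port(int fd)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);

    if (getsockname(fd, (struct sockaddr *)&addr, &len) < 0)
        return 0;
    return ntohs(addr.sin_port);
}

/* Return 1 if a fresh socket can bind the port, 0 if it is still taken. */
static int port_is_free(unsigned short port)
{
    int fd = bind_loopback(port);

    if (fd < 0)
        return 0;
    close(fd);
    return 1;
}
```

Calling bind_loopback(0), then shutdown(fd, SHUT_RDWR), and then checking
bound_port(fd) and port_is_free() before and after close(fd) shows the port
staying taken until the close.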

I'm travelling this weekend and am not set up to reproduce it to
confirm, but that does seem to be a plausible scenario.
-- 
Jeff Layton jlay...@poochiereds.net
--
To unsubscribe from this list: send the line unsubscribe netdev in


Re: [PATCH net] netfilter: nf_queue: Drop queue entries on nf_unregister_hook

2015-06-19 Thread Eric W. Biederman

Add code to nf_unregister_hook to flush the nf_queue when a hook is
unregistered.  This guarantees that the pointer that the nf_queue code
retains into the nf_hook list will remain valid while a packet is
queued.
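
For illustration only: a small user-space model of the idea, not the patch
itself (the diff below is truncated in this archive). Each queued entry keeps
a pointer to the hook that queued it, so removing a hook has to drop every
entry that still points at it before that pointer can dangle. The types and
functions here (struct hook, struct queue_entry, queue_packet,
queue_hook_drop, queue_len) are hypothetical stand-ins, not kernel APIs.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a registered hook. */
struct hook {
    int id;
};

/* Hypothetical stand-in for a queued packet. */
struct queue_entry {
    struct queue_entry *next;
    const struct hook *elem;    /* hook that queued this packet */
    int packet_id;
};

struct queue {
    struct queue_entry *head;
};

/* Queue a packet on behalf of a hook. Returns 0, or -1 if out of memory. */
static int queue_packet(struct queue *q, const struct hook *h, int packet_id)
{
    struct queue_entry *e = malloc(sizeof(*e));

    if (!e)
        return -1;
    e->elem = h;
    e->packet_id = packet_id;
    e->next = q->head;
    q->head = e;
    return 0;
}

/* Drop every queued entry that still points at the hook being removed.
 * Returns the number of entries dropped. */
static int queue_hook_drop(struct queue *q, const struct hook *h)
{
    struct queue_entry **pp = &q->head;
    int dropped = 0;

    while (*pp) {
        struct queue_entry *e = *pp;

        if (e->elem == h) {
            *pp = e->next;      /* unlink before freeing */
            free(e);
            dropped++;
        } else {
            pp = &e->next;
        }
    }
    return dropped;
}

/* Count the entries left in the queue. */
static int queue_len(const struct queue *q)
{
    const struct queue_entry *e;
    int n = 0;

    for (e = q->head; e; e = e->next)
        n++;
    return n;
}
```

In this model, an unregister routine would call queue_hook_drop() for the
hook before freeing it, so no surviving entry refers to freed memory.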

I tested what would happen if we do not flush queued packets and was
trivially able to obtain the oops below.  All that was required was
to stop the nf_queue listening process, to delete all of the nf_tables,
and to awaken the nf_queue listening process.

 BUG: unable to handle kernel paging request at 00010001
 IP: [00010001] 0x10001
 PGD b9c35067 PUD 0
 Oops: 0010 [#1] SMP
 Modules linked in:
 CPU: 0 PID: 519 Comm: lt-nfqnl_test Not tainted
 task: 8800b9c8c050 ti: 8800ba9d8000 task.ti: 8800ba9d8000
 RIP: 0010:[00010001]  [00010001] 0x10001
 RSP: 0018:8800ba9dba40  EFLAGS: 00010a16
 RAX: 8800bab48a00 RBX: 8800ba9dba90 RCX: 8800ba9dba90
 RDX: 8800b9c10128 RSI: 8800ba940900 RDI: 8800bab48a00
 RBP: 8800b9c10128 R08: 82976660 R09: 8800ba9dbb28
 R10: dead00100100 R11: dead00200200 R12: 8800ba940900
 R13: 8313fd50 R14: 8800b9c95200 R15: 
 FS:  7fb91fc34700() GS:8800bfa0() knlGS:
 CS:  0010 DS:  ES:  CR0: 80050033
 CR2: 00010001 CR3: babfb000 CR4: 07f0
 Stack:
  8206ab0f 82982240 8800bab48a00 8800b9c100a8
  8800b9c10100 0001 8800ba940900 8800b9c10128
  8206bd65 8800bfb0d5e0 8800bab48a00 00014dc0
 Call Trace:
  [8206ab0f] ? nf_iterate+0x4f/0xa0
  [8206bd65] ? nf_reinject+0x125/0x190
  [8206dee5] ? nfqnl_recv_verdict+0x255/0x360
  [81386290] ? nla_parse+0x80/0xf0
  [8206c42c] ? nfnetlink_rcv_msg+0x13c/0x240
  [811b2fec] ? __memcg_kmem_get_cache+0x4c/0x150
  [8206c2f0] ? nfnl_lock+0x20/0x20
  [82068159] ? netlink_rcv_skb+0xa9/0xc0
  [820677bf] ? netlink_unicast+0x12f/0x1c0
  [82067ade] ? netlink_sendmsg+0x28e/0x650
  [81fdd814] ? sock_sendmsg+0x44/0x50
  [81fde07b] ? ___sys_sendmsg+0x2ab/0x2c0
  [810e8f73] ? __wake_up+0x43/0x70
  [8141a134] ? tty_write+0x1c4/0x2a0
  [81fde9f4] ? __sys_sendmsg+0x44/0x80
  [823ff8d7] ? system_call_fastpath+0x12/0x6a
 Code:  Bad RIP value.
 RIP  [00010001] 0x10001
  RSP 8800ba9dba40
 CR2: 00010001
 ---[ end trace 08eb65d42362793f ]---

Cc: sta...@vger.kernel.org
Signed-off-by: Eric W. Biederman ebied...@xmission.com
---

Apologies for the duplicate send but I forgot to include the appropriate
mailing lists.

 include/net/netfilter/nf_queue.h     |  2 ++
 net/netfilter/core.c                 |  1 +
 net/netfilter/nf_internals.h         |  1 +
 net/netfilter/nf_queue.c             | 17 +++++++++++++++++
 net/netfilter/nfnetlink_queue_core.c | 24 +++++++++++++++++++++++-
 5 files changed, 44 insertions(+), 1 deletion(-)

diff --git a/include/net/netfilter/nf_queue.h b/include/net/netfilter/nf_queue.h
index d81d584157e1..e8635854a55b 100644
--- a/include/net/netfilter/nf_queue.h
+++ b/include/net/netfilter/nf_queue.h
@@ -24,6 +24,8 @@ struct nf_queue_entry {
 struct nf_queue_handler {
 	int			(*outfn)(struct nf_queue_entry *entry,
 					 unsigned int queuenum);
+	void			(*nf_hook_drop)(struct net *net,
+						struct nf_hook_ops *ops);
 };
 
 void nf_register_queue_handler(const struct nf_queue_handler *qh);
diff --git a/net/netfilter/core.c b/net/netfilter/core.c
index 653e32eac08c..a0e54974e2c9 100644
--- a/net/netfilter/core.c
+++ b/net/netfilter/core.c
@@ -118,6 +118,7 @@ void nf_unregister_hook(struct nf_hook_ops *reg)
 	static_key_slow_dec(&nf_hooks_needed[reg->pf][reg->hooknum]);
 #endif
 	synchronize_net();
+	nf_queue_nf_hook_drop(reg);
 }
 EXPORT_SYMBOL(nf_unregister_hook);
 
diff --git a/net/netfilter/nf_internals.h b/net/netfilter/nf_internals.h
index ea7f36784b3d..399210693c2a 100644
--- a/net/netfilter/nf_internals.h
+++ b/net/netfilter/nf_internals.h
@@ -19,6 +19,7 @@ unsigned int nf_iterate(struct list_head *head, struct sk_buff *skb,
 /* nf_queue.c */
 int nf_queue(struct sk_buff *skb, struct nf_hook_ops *elem,
 struct nf_hook_state *state, unsigned int queuenum);
+void nf_queue_nf_hook_drop(struct nf_hook_ops *ops);
 int __init netfilter_queue_init(void);
 
 /* nf_log.c */
diff --git a/net/netfilter/nf_queue.c b/net/netfilter/nf_queue.c
index 2e88032cd5ad..cd60d397fe05 100644
--- a/net/netfilter/nf_queue.c
+++ b/net/netfilter/nf_queue.c
@@ -105,6 +105,23 @@ bool nf_queue_entry_get_refs(struct nf_queue_entry *entry)
 }
 EXPORT_SYMBOL_GPL(nf_queue_entry_get_refs);
 
+void nf_queue_nf_hook_drop(struct nf_hook_ops *ops)
+{
+   const struct nf_queue_handler *qh;
+   struct net 


Re: [PATCH net-next RFC v2 1/3] lwt: infrastructure to support light weight tunnels

2015-06-19 Thread roopa

On 6/19/15, 10:25 AM, Robert Shearman wrote:

On 19/06/15 16:14, roopa wrote:

Today lwtunnel_skb_lwstate is called from lwtunnel_output which is only
called from ipv4 code.
And my ipv6 variant code was supposed to have a 6 suffix, something like
lwtunnel_output6.
Or, to be more explicit, I will probably have variants of the output and
skb handling functions, such as
lwtunnel_output_ipv4 and lwtunnel_output_ipv6.


Do you intend for these functions to be used by netdevices to support 
the vxlan use case?


If so, then how will the netdevice know which one of the two to call? 
Will there have to be a netdevice for ipv4 and a netdevice for ipv6?


If not, could you outline how you intend for it to be implemented?


In the netdevice case, this output function is not called at all. It
should just follow the existing netdevice the route is pointing to.



Re: [PATCH net-next RFC v2 1/3] lwt: infrastructure to support light weight tunnels

2015-06-19 Thread Robert Shearman

On 19/06/15 19:34, roopa wrote:

On 6/19/15, 10:25 AM, Robert Shearman wrote:

On 19/06/15 16:14, roopa wrote:

Today lwtunnel_skb_lwstate is called from lwtunnel_output which is only
called from ipv4 code.
And my ipv6 variant code was supposed to have a 6 suffix, something like
lwtunnel_output6.
Or, to be more explicit, I will probably have variants of the output and
skb handling functions, such as
lwtunnel_output_ipv4 and lwtunnel_output_ipv6.


Do you intend for these functions to be used by netdevices to support
the vxlan use case?

If so, then how will the netdevice know which one of the two to call?
Will there have to be a netdevice for ipv4 and a netdevice for ipv6?

If not, could you outline how you intend for it to be implemented?


In the netdevice case, this output function is not called at all. It
should just follow the existing netdevice the route is pointing to.


Sorry for not being clear, but I meant that there would have to be 
lwtunnel_skb_lwstate functions for ipv4 and ipv6 to match the output 
functions. So in the vxlan use case where it's using a netdevice, how 
would it determine which one to call?


Thanks,
Rob


Re: [PATCH net-next RFC v2 2/3] ipv4: add support for light weight tunnel encap attributes

2015-06-19 Thread roopa

On 6/19/15, 10:17 AM, Robert Shearman wrote:


No need for that - use the example of how RTA_MULTIPATH is used for 
ipv4/ipv6:


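+----------------------+
| RTA_MULTIPATH        |
+----------------------+
| +------------------+ |
| | struct rtnexthop | |
| +------------------+ |
| | RTA_GATEWAY, etc.| |
| +------------------+ |
+----------------------+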

You could do similar for RTA_ENCAP where the type is stored in the 
data prior to the nested attributes starting. E.g.:


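+----------------------+
| RTA_ENCAP            |
+----------------------+
| +------------------+ |
| | struct rtencap   | |
| +------------------+ |
| | MPLS_IPTUNNEL_DST| |
| +------------------+ |
+----------------------+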

struct rtencap {
	__u16 rte_type;
};

I did think about that, but today rtnexthop looks like it was
written as a struct initially and then extended with attributes only
because the struct could not be extended (I may be wrong). Half the
fields are in a struct and the others are attributes, which gets confusing.

I was trying to avoid that.
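
For reference, a user-space sketch of the layout under discussion: an outer
attribute whose payload starts with a fixed struct rtencap and continues with
a nested attribute. It only illustrates the byte layout and is not code from
the patch series. struct tlv is a simplified stand-in modelled on struct
rtattr, the DEMO_* numbers are invented for the example, and build_encap and
parse_encap are hypothetical helpers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified attribute header; len covers the header plus the payload. */
struct tlv {
    uint16_t len;
    uint16_t type;
};

/* Invented attribute and type numbers, for illustration only. */
enum { DEMO_RTA_ENCAP = 100, DEMO_ENCAP_MPLS = 1, DEMO_MPLS_IPTUNNEL_DST = 1 };

/* Fixed header from the proposal: the type precedes the nested attributes. */
struct rtencap {
    uint16_t rte_type;
};

/* Round up to a 4-byte boundary, as netlink attributes are. */
#define TLV_ALIGN(n) (((size_t)(n) + 3) & ~(size_t)3)

/* Write outer header, struct rtencap, then one nested attribute carrying
 * a 32-bit value. Returns bytes written, or 0 if buf is too small. */
static size_t build_encap(uint8_t *buf, size_t cap, uint16_t encap_type,
                          uint32_t label)
{
    struct tlv outer, inner;
    struct rtencap hdr;
    size_t off = 0;
    size_t total = TLV_ALIGN(sizeof(outer)) + TLV_ALIGN(sizeof(hdr)) +
                   TLV_ALIGN(sizeof(inner) + sizeof(label));

    if (cap < total)
        return 0;
    memset(buf, 0, total);      /* zero the padding bytes too */

    outer.len = (uint16_t)total;
    outer.type = DEMO_RTA_ENCAP;
    memcpy(buf + off, &outer, sizeof(outer));
    off += TLV_ALIGN(sizeof(outer));

    hdr.rte_type = encap_type;
    memcpy(buf + off, &hdr, sizeof(hdr));
    off += TLV_ALIGN(sizeof(hdr));

    inner.len = (uint16_t)(sizeof(inner) + sizeof(label));
    inner.type = DEMO_MPLS_IPTUNNEL_DST;
    memcpy(buf + off, &inner, sizeof(inner));
    memcpy(buf + off + sizeof(inner), &label, sizeof(label));
    off += TLV_ALIGN(inner.len);
    return off;
}

/* Read back the encap type and the type of the first nested attribute.
 * Returns 0 on success, -1 if the buffer does not look right. */
static int parse_encap(const uint8_t *buf, size_t len, uint16_t *encap_type,
                       uint16_t *nested_type)
{
    struct tlv outer, inner;
    struct rtencap hdr;
    size_t off = TLV_ALIGN(sizeof(outer));

    if (len < off + TLV_ALIGN(sizeof(hdr)) + sizeof(inner))
        return -1;
    memcpy(&outer, buf, sizeof(outer));
    if (outer.type != DEMO_RTA_ENCAP || outer.len > len)
        return -1;
    memcpy(&hdr, buf + off, sizeof(hdr));
    off += TLV_ALIGN(sizeof(hdr));
    memcpy(&inner, buf + off, sizeof(inner));
    *encap_type = hdr.rte_type;
    *nested_type = inner.type;
    return 0;
}
```

The parser has to know that a struct sits in front of the nested attributes
and skip it by hand, which is the mixed struct-plus-attributes shape the
reply above objects to.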