Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )
On Fri, 2015-06-19 at 15:52 -0400, Jeff Layton wrote:
> On Fri, 19 Jun 2015 13:39:08 -0400
> Trond Myklebust <trond.mykleb...@primarydata.com> wrote:
>
> > On Fri, Jun 19, 2015 at 1:17 PM, Steven Rostedt <rost...@goodmis.org> wrote:
> > > On Fri, 19 Jun 2015 12:25:53 -0400
> > > Steven Rostedt <rost...@goodmis.org> wrote:
> > >
> > > > I don't see that 55201 anywhere. But then again, I didn't look for it
> > > > before the port disappeared. I could reboot and look for it again. I
> > > > should have saved the full netstat -tapn as well :-/
> > >
> > > Of course I didn't find it anywhere, that's the port on my wife's box
> > > that port 947 was connected to.
> > >
> > > Now I even went over to my wife's box and ran
> > >
> > >  # rpcinfo -p localhost
> > >    program vers proto   port  service
> > >     100000    4   tcp    111  portmapper
> > >     100000    3   tcp    111  portmapper
> > >     100000    2   tcp    111  portmapper
> > >     100000    4   udp    111  portmapper
> > >     100000    3   udp    111  portmapper
> > >     100000    2   udp    111  portmapper
> > >     100024    1   udp  34243  status
> > >     100024    1   tcp  34498  status
> > >
> > > which doesn't show anything.
> > >
> > > but something is listening to that port...
> > >
> > >  # netstat -ntap |grep 55201
> > > tcp        0      0 0.0.0.0:55201           0.0.0.0:*               LISTEN
> >
> > Hang on. This is on the client box while there is an active NFSv4
> > mount? Then that's probably the NFSv4 callback channel listening for
> > delegation callbacks.
> >
> > Can you please try:
> >
> > echo options nfs callback_tcpport=4048 > /etc/modprobe.d/nfs-local.conf
> >
> > and then either reboot the client or unload and then reload the nfs
> > modules before reattempting the mount. If this is indeed the callback
> > channel, then that will move your phantom listener to port 4048...
>
> Right, it was a little unclear to me before, but it now seems clear
> that the callback socket that the server is opening to the client is
> the one squatting on the port.
>
> ...and that sort of makes sense, doesn't it? That rpc_clnt will stick
> around for the life of the client's lease, and the rpc_clnt binds to a
> particular port so that it can reconnect using the same one.
>
> Given that Steven has done the legwork and figured out that reverting
> those commits fixes the issue, then I suspect that the real culprit is
> caf4ccd4e88cf2.
>
> The client is likely closing down the other end of the callback
> socket when it goes idle. Before that commit, we probably did an
> xs_close on it, but now we're doing a xs_tcp_shutdown and that leaves
> the port bound.

Agreed. I've been looking into whether or not there is a simple fix.
Reverting those patches is not an option, because the whole point was
to ensure that the socket is in the TCP_CLOSED state before we release
the socket.

Steven, how about something like the following patch?

8<-----------------------------------------------------------------
From 9a0bcfdbdbc793eae1ed6d901a6396b6c66f9513 Mon Sep 17 00:00:00 2001
From: Trond Myklebust <trond.mykleb...@primarydata.com>
Date: Fri, 19 Jun 2015 16:17:57 -0400
Subject: [PATCH] SUNRPC: Ensure we release the TCP socket once it has been
 closed

This fixes a regression introduced by commit caf4ccd4e88cf2 ("SUNRPC:
Make xs_tcp_close() do a socket shutdown rather than a sock_release").
Prior to that commit, the autoclose feature would ensure that an
idle connection would result in the socket being both disconnected and
released, whereas now it only gets disconnected.

While the current behaviour is harmless, it does leave the port bound
until either RPC traffic resumes or the RPC client is shut down.

Reported-by: Steven Rostedt <rost...@goodmis.org>
Signed-off-by: Trond Myklebust <trond.mykleb...@primarydata.com>
---
 net/sunrpc/xprt.c     | 2 +-
 net/sunrpc/xprtsock.c | 8 ++++++--
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index 3ca31f20b97c..ab5dd621ae0c 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -611,8 +611,8 @@ static void xprt_autoclose(struct work_struct *work)
 	struct rpc_xprt *xprt =
 		container_of(work, struct rpc_xprt, task_cleanup);
 
-	xprt->ops->close(xprt);
 	clear_bit(XPRT_CLOSE_WAIT, &xprt->state);
+	xprt->ops->close(xprt);
 	xprt_release_write(xprt, NULL);
 }
 
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index fda8ec8c74c0..75dcdadf0269 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -634,10 +634,13 @@ static void xs_tcp_shutdown(struct rpc_xprt *xprt)
 	struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt);
 	struct socket *sock = transport->sock;
 
-	if (sock != NULL) {
+	if (sock == NULL)
+		return;
+	if (xprt_connected(xprt)) {
 		kernel_sock_shutdown(sock, SHUT_RDWR);
 		trace_rpc_socket_shutdown(xprt, sock);
-	}
+	} else
+		xs_reset_transport(transport);
 }
[net-next] vmxnet3: Changes for vmxnet3 adapter version 2 (fwd)
Make the driver understand adapter version 2.

Cc: Rachel Lunnon <rachel_lun...@stormagic.com>
Signed-off-by: Guolin Yang <gy...@vmware.com>
Signed-off-by: Shreyas N Bhatewara <sbhatew...@vmware.com>
--

diff --git a/drivers/net/vmxnet3/vmxnet3_defs.h b/drivers/net/vmxnet3/vmxnet3_defs.h
index 3718d02..221a530 100644
--- a/drivers/net/vmxnet3/vmxnet3_defs.h
+++ b/drivers/net/vmxnet3/vmxnet3_defs.h
@@ -1,7 +1,7 @@
 /*
  * Linux driver for VMware's vmxnet3 ethernet NIC.
  *
- * Copyright (C) 2008-2009, VMware, Inc. All Rights Reserved.
+ * Copyright (C) 2008-2015, VMware, Inc. All Rights Reserved.
  *
  * This program is free software; you can redistribute it and/or modify it
  * under the terms of the GNU General Public License as published by the
@@ -277,6 +277,40 @@ struct Vmxnet3_RxCompDesc {
 #endif  /* __BIG_ENDIAN_BITFIELD */
 };
 
+struct Vmxnet3_RxCompDescExt {
+	__le32		dword1;
+	u8		segCnt;       /* Number of aggregated packets */
+	u8		dupAckCnt;    /* Number of duplicate Acks */
+	__le16		tsDelta;      /* TCP timestamp difference */
+	__le32		dword2;
+#ifdef __BIG_ENDIAN_BITFIELD
+	u32		gen:1;        /* generation bit */
+	u32		type:7;       /* completion type */
+	u32		fcs:1;        /* Frame CRC correct */
+	u32		frg:1;        /* IP Fragment */
+	u32		v4:1;         /* IPv4 */
+	u32		v6:1;         /* IPv6 */
+	u32		ipc:1;        /* IP Checksum Correct */
+	u32		tcp:1;        /* TCP packet */
+	u32		udp:1;        /* UDP packet */
+	u32		tuc:1;        /* TCP/UDP Checksum Correct */
+	u32		mss:16;
+#else
+	u32		mss:16;
+	u32		tuc:1;        /* TCP/UDP Checksum Correct */
+	u32		udp:1;        /* UDP packet */
+	u32		tcp:1;        /* TCP packet */
+	u32		ipc:1;        /* IP Checksum Correct */
+	u32		v6:1;         /* IPv6 */
+	u32		v4:1;         /* IPv4 */
+	u32		frg:1;        /* IP Fragment */
+	u32		fcs:1;        /* Frame CRC correct */
+	u32		type:7;       /* completion type */
+	u32		gen:1;        /* generation bit */
+#endif  /* __BIG_ENDIAN_BITFIELD */
+};
+
+
 /* fields in RxCompDesc we access via Vmxnet3_GenericDesc.dword[3] */
 #define VMXNET3_RCD_TUC_SHIFT	16
 #define VMXNET3_RCD_IPC_SHIFT	19
@@ -310,6 +344,7 @@ union Vmxnet3_GenericDesc {
 	struct Vmxnet3_RxDesc		rxd;
 	struct Vmxnet3_TxCompDesc	tcd;
 	struct Vmxnet3_RxCompDesc	rcd;
+	struct Vmxnet3_RxCompDescExt	rcdExt;
 };
 
 #define VMXNET3_INIT_GEN       1
@@ -361,6 +396,7 @@ enum {
 /* completion descriptor types */
 #define VMXNET3_CDTYPE_TXCOMP      0    /* Tx Completion Descriptor */
 #define VMXNET3_CDTYPE_RXCOMP      3    /* Rx Completion Descriptor */
+#define VMXNET3_CDTYPE_RXCOMP_LRO  4    /* Rx Completion Descriptor for LRO */
 
 enum {
 	VMXNET3_GOS_BITS_UNK    = 0,   /* unknown */
diff --git a/drivers/net/vmxnet3/vmxnet3_drv.c b/drivers/net/vmxnet3/vmxnet3_drv.c
index ab53975..da11bb5 100644
--- a/drivers/net/vmxnet3/vmxnet3_drv.c
+++ b/drivers/net/vmxnet3/vmxnet3_drv.c
@@ -1163,6 +1163,52 @@ vmxnet3_rx_error(struct vmxnet3_rx_queue *rq, struct Vmxnet3_RxCompDesc *rcd,
 }
 
 
+static u32
+vmxnet3_get_hdr_len(struct vmxnet3_adapter *adapter, struct sk_buff *skb,
+		    union Vmxnet3_GenericDesc *gdesc)
+{
+	u32 hlen, maplen;
+	union {
+		void *ptr;
+		struct ethhdr *eth;
+		struct iphdr *ipv4;
+		struct ipv6hdr *ipv6;
+		struct tcphdr *tcp;
+	} hdr;
+	BUG_ON(gdesc->rcd.tcp == 0);
+
+	maplen = skb_headlen(skb);
+	if (unlikely(sizeof(struct iphdr) + sizeof(struct tcphdr) > maplen))
+		return 0;
+
+	hdr.eth = eth_hdr(skb);
+	if (gdesc->rcd.v4) {
+		BUG_ON(hdr.eth->h_proto != htons(ETH_P_IP));
+		hdr.ptr += sizeof(struct ethhdr);
+		BUG_ON(hdr.ipv4->protocol != IPPROTO_TCP);
+		hlen = hdr.ipv4->ihl << 2;
+		hdr.ptr += hdr.ipv4->ihl << 2;
+	} else if (gdesc->rcd.v6) {
+		BUG_ON(hdr.eth->h_proto != htons(ETH_P_IPV6));
+		hdr.ptr += sizeof(struct ethhdr);
+		/* Use an estimated value, since we also need to handle
+		 * TSO case.
+		 */
+		if (hdr.ipv6->nexthdr != IPPROTO_TCP)
+			return sizeof(struct ipv6hdr) + sizeof(struct tcphdr);
+		hlen = sizeof(struct ipv6hdr);
+		hdr.ptr += sizeof(struct ipv6hdr);
+	} else {
+		/* Non-IP pkt, dont estimate header length */
+		return 0;
+	}
+
+
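For readers following the IPv4 branch of vmxnet3_get_hdr_len(): the IHL field of the IPv4 header counts 32-bit words, so the driver shifts it left by two to get a length in bytes. A minimal userspace sketch of that arithmetic follows; ipv4_header_len() is a hypothetical helper for illustration and is not part of the driver.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not driver code: the first byte of an IPv4
 * header holds the version in the high nibble and the IHL in the low
 * nibble. IHL is expressed in 32-bit words, so bytes = IHL << 2.
 */
static size_t ipv4_header_len(uint8_t version_ihl)
{
	return (size_t)(version_ihl & 0x0f) << 2;
}
```

A header without options has a first byte of 0x45 (version 4, IHL 5), which gives 20 bytes; the maximum IHL of 15 gives 60 bytes.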
Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )
On Fri, 19 Jun 2015 13:39:08 -0400
Trond Myklebust <trond.mykleb...@primarydata.com> wrote:

> Hang on. This is on the client box while there is an active NFSv4
> mount? Then that's probably the NFSv4 callback channel listening for
> delegation callbacks.
>
> Can you please try:
>
> echo options nfs callback_tcpport=4048 > /etc/modprobe.d/nfs-local.conf
>
> and then either reboot the client or unload and then reload the nfs
> modules before reattempting the mount. If this is indeed the callback
> channel, then that will move your phantom listener to port 4048...

I unmounted the directories, removed the nfs modules, added this file,
loaded the modules back and remounted the directories.

 # netstat -ntap |grep 4048
tcp        0      0 0.0.0.0:4048            0.0.0.0:*               LISTEN      -
tcp        0      0 192.168.23.22:4048      192.168.23.9:1010       ESTABLISHED -
tcp6       0      0 :::4048                 :::*                    LISTEN      -

-- Steve
--
To unsubscribe from this list: send the line unsubscribe netdev in
Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )
On Fri, 2015-06-19 at 18:14 -0400, Steven Rostedt wrote:
> On Fri, 19 Jun 2015 16:30:18 -0400
> Trond Myklebust <trond.mykleb...@primarydata.com> wrote:
>
> > Steven, how about something like the following patch?
>
> OK, the box I'm running this on is using v4.0.5, can you make a patch
> based on that, as whatever you make needs to go to stable as well.

Is it causing any other damage than the rkhunter warning you reported?

> distcc[31554] ERROR: compile /home/rostedt/work/git/nobackup/linux-build.git/net/sunrpc/xprtsock.c on fedora/8 failed
> distcc[31554] (dcc_build_somewhere) Warning: remote compilation of '/home/rostedt/work/git/nobackup/linux-build.git/net/sunrpc/xprtsock.c' failed, retrying locally
> distcc[31554] Warning: failed to distribute /home/rostedt/work/git/nobackup/linux-build.git/net/sunrpc/xprtsock.c to fedora/8, running locally instead
> /home/rostedt/work/git/nobackup/linux-build.git/net/sunrpc/xprtsock.c: In function 'xs_tcp_shutdown':
> /home/rostedt/work/git/nobackup/linux-build.git/net/sunrpc/xprtsock.c:643:3: error: implicit declaration of function 'xs_reset_transport' [-Werror=implicit-function-declaration]
> /home/rostedt/work/git/nobackup/linux-build.git/net/sunrpc/xprtsock.c: At top level:
> /home/rostedt/work/git/nobackup/linux-build.git/net/sunrpc/xprtsock.c:825:13: warning: conflicting types for 'xs_reset_transport' [enabled by default]
> /home/rostedt/work/git/nobackup/linux-build.git/net/sunrpc/xprtsock.c:825:13: error: static declaration of 'xs_reset_transport' follows non-static declaration
> /home/rostedt/work/git/nobackup/linux-build.git/net/sunrpc/xprtsock.c:643:3: note: previous implicit declaration of 'xs_reset_transport' was here
> cc1: some warnings being treated as errors
> distcc[31554] ERROR: compile /home/rostedt/work/git/nobackup/linux-build.git/net/sunrpc/xprtsock.c on localhost failed
> /home/rostedt/work/git/nobackup/linux-build.git/scripts/Makefile.build:258: recipe for target 'net/sunrpc/xprtsock.o' failed
> make[3]: *** [net/sunrpc/xprtsock.o] Error 1

Sorry. I sent that one off too quickly. Try the following.

8<--------------------------------------------------------------
From 4876cc779ff525b9c2376d8076edf47815e71f2c Mon Sep 17 00:00:00 2001
From: Trond Myklebust <trond.mykleb...@primarydata.com>
Date: Fri, 19 Jun 2015 16:17:57 -0400
Subject: [PATCH v2] SUNRPC: Ensure we release the TCP socket once it has been
 closed

This fixes a regression introduced by commit caf4ccd4e88cf2 ("SUNRPC:
Make xs_tcp_close() do a socket shutdown rather than a sock_release").
Prior to that commit, the autoclose feature would ensure that an
idle connection would result in the socket being both disconnected and
released, whereas now it only gets disconnected.

While the current behaviour is harmless, it does leave the port bound
until either RPC traffic resumes or the RPC client is shut down.

Reported-by: Steven Rostedt <rost...@goodmis.org>
Signed-off-by: Trond Myklebust <trond.mykleb...@primarydata.com>
---
 net/sunrpc/xprt.c     |  2 +-
 net/sunrpc/xprtsock.c | 40 ++++++++++++++++++++++------------------
 2 files changed, 23 insertions(+), 19 deletions(-)

diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index 3ca31f20b97c..ab5dd621ae0c 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -611,8 +611,8 @@ static void xprt_autoclose(struct work_struct *work)
 	struct rpc_xprt *xprt =
 		container_of(work, struct rpc_xprt, task_cleanup);
 
-	xprt->ops->close(xprt);
 	clear_bit(XPRT_CLOSE_WAIT, &xprt->state);
+	xprt->ops->close(xprt);
 	xprt_release_write(xprt, NULL);
 }
 
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index fda8ec8c74c0..ee0715dfc3c7 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -623,24 +623,6 @@ process_status:
 }
 
 /**
- * xs_tcp_shutdown - gracefully shut down a TCP socket
- * @xprt: transport
- *
- * Initiates a graceful shutdown of the TCP socket by calling the
- * equivalent of shutdown(SHUT_RDWR);
- */
-static void xs_tcp_shutdown(struct rpc_xprt *xprt)
-{
-	struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt);
-	struct socket *sock = transport->sock;
-
-	if (sock != NULL) {
-		kernel_sock_shutdown(sock, SHUT_RDWR);
-		trace_rpc_socket_shutdown(xprt, sock);
-	}
-}
-
-/**
  * xs_tcp_send_request - write an RPC request to a TCP socket
  * @task: address of RPC task that manages the state of an RPC request
  *
@@ -786,6 +768,7 @@ static void xs_sock_mark_closed(struct rpc_xprt *xprt)
 	xs_sock_reset_connection_flags(xprt);
 	/* Mark transport as closed and wake up all pending tasks */
 	xprt_disconnect_done(xprt);
+	xprt_force_disconnect(xprt);
 }
 
 /**
@@ -2103,6 +2086,27 @@ out:
 	xprt_wake_pending_tasks(xprt, status);
 }
 
+/**
+ * xs_tcp_shutdown - gracefully shut down a TCP socket
+ * @xprt: transport
+ *
+ * Initiates a graceful shutdown of the TCP socket by
[PATCH net] netfilter: nf_queue: Don't recompute the hook_list head
If someone sends packets from one of the netdevice ingress hooks to
a userspace queue, and then userspace later accepts the packet, the
netfilter code can enter an infinite loop as the list head will never
be found.

Pass in the saved list_head to avoid this.

Signed-off-by: "Eric W. Biederman" <ebied...@xmission.com>
---
 net/netfilter/nf_queue.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/nf_queue.c b/net/netfilter/nf_queue.c
index cd60d397fe05..8a8b2abc35ff 100644
--- a/net/netfilter/nf_queue.c
+++ b/net/netfilter/nf_queue.c
@@ -213,7 +213,7 @@ void nf_reinject(struct nf_queue_entry *entry, unsigned int verdict)
 
 	if (verdict == NF_ACCEPT) {
 	next_hook:
-		verdict = nf_iterate(&nf_hooks[entry->state.pf][entry->state.hook],
+		verdict = nf_iterate(entry->state.hook_list,
 				     skb, &entry->state, &elem);
 	}
 
-- 
2.2.1
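To illustrate why a recomputed list head can make the walk spin forever, here is a small userspace sketch of a circular list traversal that stops only when it reaches the head it was given. Everything in it (struct list_node, steps_to_head(), demo_walk(), the step limit) is a hypothetical stand-in and not netfilter code; the step limit exists only so the sketch terminates where the kernel loop would not.

```c
#include <stddef.h>

/* Hypothetical singly linked circular list, standing in for a hook list. */
struct list_node {
	struct list_node *next;
};

/*
 * Walk from 'start' until 'head' is reached, as a list iterator would.
 * Returns the number of steps taken, or -1 if 'head' was not found
 * within 'max_steps' (the stand-in for an infinite loop).
 */
static int steps_to_head(const struct list_node *start,
			 const struct list_node *head, int max_steps)
{
	const struct list_node *pos = start;
	int steps = 0;

	while (pos != head) {
		if (steps++ >= max_steps)
			return -1;
		pos = pos->next;
	}
	return steps;
}

/*
 * Build head -> a -> b -> head, then resume the walk at 'a' using either
 * the saved head of that list or the head of an unrelated list.
 */
static int demo_walk(int use_saved_head)
{
	struct list_node head, a, b, other;

	head.next = &a;
	a.next = &b;
	b.next = &head;
	other.next = &other;	/* a different, empty list */

	return steps_to_head(&a, use_saved_head ? &head : &other, 1000);
}
```

With the saved head the walk ends after two steps; with the head of the unrelated list the terminating comparison never succeeds, which is the situation the patch avoids by passing entry->state.hook_list.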
[net-next] vmxnet3: Register shutdown handler for device (fwd)
Implement a handler for PCI shutdown so that the driver has an opportunity to
make sure that the device is quiesced before PCI switches to legacy IRQs.
This avoids the possibility of a screaming interrupt.

Acked-by: Shrikrishna Khare <skh...@vmware.com>
Signed-off-by: Shreyas N Bhatewara <sbhatew...@vmware.com>
---

diff --git a/drivers/net/vmxnet3/vmxnet3_drv.c b/drivers/net/vmxnet3/vmxnet3_drv.c
index 61c0840..bb35210 100644
--- a/drivers/net/vmxnet3/vmxnet3_drv.c
+++ b/drivers/net/vmxnet3/vmxnet3_drv.c
@@ -3184,6 +3184,32 @@ vmxnet3_remove_device(struct pci_dev *pdev)
 	free_netdev(netdev);
 }
 
+static void vmxnet3_shutdown_device(struct pci_dev *pdev)
+{
+	struct net_device *netdev = pci_get_drvdata(pdev);
+	struct vmxnet3_adapter *adapter = netdev_priv(netdev);
+	unsigned long flags;
+
+	/* Reset_work may be in the middle of resetting the device, wait for its
+	 * completion.
+	 */
+	while (test_and_set_bit(VMXNET3_STATE_BIT_RESETTING, &adapter->state))
+		msleep(1);
+
+	if (test_and_set_bit(VMXNET3_STATE_BIT_QUIESCED,
+			     &adapter->state)) {
+		clear_bit(VMXNET3_STATE_BIT_RESETTING, &adapter->state);
+		return;
+	}
+	spin_lock_irqsave(&adapter->cmd_lock, flags);
+	VMXNET3_WRITE_BAR1_REG(adapter, VMXNET3_REG_CMD,
+			       VMXNET3_CMD_QUIESCE_DEV);
+	spin_unlock_irqrestore(&adapter->cmd_lock, flags);
+	vmxnet3_disable_all_intrs(adapter);
+
+	clear_bit(VMXNET3_STATE_BIT_RESETTING, &adapter->state);
+}
+
 
 #ifdef CONFIG_PM
 
@@ -3360,6 +3386,7 @@ static struct pci_driver vmxnet3_driver = {
 	.id_table	= vmxnet3_pciid_table,
 	.probe		= vmxnet3_probe_device,
 	.remove		= vmxnet3_remove_device,
+	.shutdown	= vmxnet3_shutdown_device,
 #ifdef CONFIG_PM
 	.driver.pm	= &vmxnet3_pm_ops,
 #endif
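The shutdown handler relies on two test-and-set bits: one serializes against a concurrent reset, the other makes sure the quiesce command is issued only once. A rough userspace sketch of that pattern with C11 atomics follows. All names here (STATE_RESETTING, STATE_QUIESCED, shutdown_once(), demo_shutdown_twice()) are hypothetical stand-ins, and the busy-wait replaces the msleep(1) the driver uses.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define STATE_RESETTING	(1u << 0)	/* hypothetical stand-in bits */
#define STATE_QUIESCED	(1u << 1)

/* Atomically set 'bit' and report whether it was already set. */
static bool test_and_set(atomic_uint *state, unsigned int bit)
{
	return (atomic_fetch_or(state, bit) & bit) != 0;
}

static void clear_state_bit(atomic_uint *state, unsigned int bit)
{
	atomic_fetch_and(state, ~bit);
}

/* Returns true if this call is the one that would quiesce the device. */
static bool shutdown_once(atomic_uint *state)
{
	/* Wait for any reset in progress (the driver sleeps here). */
	while (test_and_set(state, STATE_RESETTING))
		;

	if (test_and_set(state, STATE_QUIESCED)) {
		/* Someone already quiesced the device, nothing to do. */
		clear_state_bit(state, STATE_RESETTING);
		return false;
	}

	/* The driver writes the quiesce command and masks interrupts here. */
	clear_state_bit(state, STATE_RESETTING);
	return true;
}

/* Call the handler twice; only the first call should do the work. */
static int demo_shutdown_twice(void)
{
	atomic_uint state;

	atomic_init(&state, 0);
	return (shutdown_once(&state) ? 10 : 0) +
	       (shutdown_once(&state) ? 1 : 0);
}
```

The second call sees the quiesced bit already set and returns early, which mirrors the early return in vmxnet3_shutdown_device().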
Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )
On Fri, 19 Jun 2015 16:30:18 -0400
Trond Myklebust <trond.mykleb...@primarydata.com> wrote:

> Steven, how about something like the following patch?

Building it now. Will let you know in a bit.

> 8<-----------------------------------------------------------------
> From 9a0bcfdbdbc793eae1ed6d901a6396b6c66f9513 Mon Sep 17 00:00:00 2001
> From: Trond Myklebust <trond.mykleb...@primarydata.com>
> Date: Fri, 19 Jun 2015 16:17:57 -0400
> Subject: [PATCH] SUNRPC: Ensure we release the TCP socket once it has been
>  closed
>
> This fixes a regression introduced by commit caf4ccd4e88cf2 ("SUNRPC:
> Make xs_tcp_close() do a socket shutdown rather than a sock_release").
> Prior to that commit, the autoclose feature would ensure that an
> idle connection would result in the socket being both disconnected and
> released, whereas now it only gets disconnected.
>
> While the current behaviour is harmless, it does leave the port bound
> until either RPC traffic resumes or the RPC client is shut down.

Hmm, is this true? The port is bound, but the socket has been freed.
That is, sk->sk_socket points to garbage, as my portlist.c module
verified.

It doesn't seem that anything can attach to that port again that I know
of. Is there a way to verify that something can attach to it again?

-- Steve

> Reported-by: Steven Rostedt <rost...@goodmis.org>
> Signed-off-by: Trond Myklebust <trond.mykleb...@primarydata.com>
> ---
>  net/sunrpc/xprt.c     | 2 +-
>  net/sunrpc/xprtsock.c | 8 ++++++--
>  2 files changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
> index 3ca31f20b97c..ab5dd621ae0c 100644
> --- a/net/sunrpc/xprt.c
> +++ b/net/sunrpc/xprt.c
> @@ -611,8 +611,8 @@ static void xprt_autoclose(struct work_struct *work)
>  	struct rpc_xprt *xprt =
>  		container_of(work, struct rpc_xprt, task_cleanup);
>  
> -	xprt->ops->close(xprt);
>  	clear_bit(XPRT_CLOSE_WAIT, &xprt->state);
> +	xprt->ops->close(xprt);
>  	xprt_release_write(xprt, NULL);
>  }
>  
> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> index fda8ec8c74c0..75dcdadf0269 100644
> --- a/net/sunrpc/xprtsock.c
> +++ b/net/sunrpc/xprtsock.c
> @@ -634,10 +634,13 @@ static void xs_tcp_shutdown(struct rpc_xprt *xprt)
>  	struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt);
>  	struct socket *sock = transport->sock;
>  
> -	if (sock != NULL) {
> +	if (sock == NULL)
> +		return;
> +	if (xprt_connected(xprt)) {
>  		kernel_sock_shutdown(sock, SHUT_RDWR);
>  		trace_rpc_socket_shutdown(xprt, sock);
> -	}
> +	} else
> +		xs_reset_transport(transport);
>  }
>  
>  /**
> @@ -786,6 +789,7 @@ static void xs_sock_mark_closed(struct rpc_xprt *xprt)
>  	xs_sock_reset_connection_flags(xprt);
>  	/* Mark transport as closed and wake up all pending tasks */
>  	xprt_disconnect_done(xprt);
> +	xprt_force_disconnect(xprt);
>  }
>  
>  /**
Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )
On Fri, 19 Jun 2015 16:30:18 -0400
Trond Myklebust <trond.mykleb...@primarydata.com> wrote:

> Steven, how about something like the following patch?

OK, the box I'm running this on is using v4.0.5, can you make a patch
based on that, as whatever you make needs to go to stable as well.

distcc[31554] ERROR: compile /home/rostedt/work/git/nobackup/linux-build.git/net/sunrpc/xprtsock.c on fedora/8 failed
distcc[31554] (dcc_build_somewhere) Warning: remote compilation of '/home/rostedt/work/git/nobackup/linux-build.git/net/sunrpc/xprtsock.c' failed, retrying locally
distcc[31554] Warning: failed to distribute /home/rostedt/work/git/nobackup/linux-build.git/net/sunrpc/xprtsock.c to fedora/8, running locally instead
/home/rostedt/work/git/nobackup/linux-build.git/net/sunrpc/xprtsock.c: In function 'xs_tcp_shutdown':
/home/rostedt/work/git/nobackup/linux-build.git/net/sunrpc/xprtsock.c:643:3: error: implicit declaration of function 'xs_reset_transport' [-Werror=implicit-function-declaration]
/home/rostedt/work/git/nobackup/linux-build.git/net/sunrpc/xprtsock.c: At top level:
/home/rostedt/work/git/nobackup/linux-build.git/net/sunrpc/xprtsock.c:825:13: warning: conflicting types for 'xs_reset_transport' [enabled by default]
/home/rostedt/work/git/nobackup/linux-build.git/net/sunrpc/xprtsock.c:825:13: error: static declaration of 'xs_reset_transport' follows non-static declaration
/home/rostedt/work/git/nobackup/linux-build.git/net/sunrpc/xprtsock.c:643:3: note: previous implicit declaration of 'xs_reset_transport' was here
cc1: some warnings being treated as errors
distcc[31554] ERROR: compile /home/rostedt/work/git/nobackup/linux-build.git/net/sunrpc/xprtsock.c on localhost failed
/home/rostedt/work/git/nobackup/linux-build.git/scripts/Makefile.build:258: recipe for target 'net/sunrpc/xprtsock.o' failed
make[3]: *** [net/sunrpc/xprtsock.o] Error 1

-- Steve

> 8<-----------------------------------------------------------------
> From 9a0bcfdbdbc793eae1ed6d901a6396b6c66f9513 Mon Sep 17 00:00:00 2001
> From: Trond Myklebust <trond.mykleb...@primarydata.com>
> Date: Fri, 19 Jun 2015 16:17:57 -0400
> Subject: [PATCH] SUNRPC: Ensure we release the TCP socket once it has been
>  closed
>
> This fixes a regression introduced by commit caf4ccd4e88cf2 ("SUNRPC:
> Make xs_tcp_close() do a socket shutdown rather than a sock_release").
> Prior to that commit, the autoclose feature would ensure that an
> idle connection would result in the socket being both disconnected and
> released, whereas now it only gets disconnected.
>
> While the current behaviour is harmless, it does leave the port bound
> until either RPC traffic resumes or the RPC client is shut down.
>
> Reported-by: Steven Rostedt <rost...@goodmis.org>
> Signed-off-by: Trond Myklebust <trond.mykleb...@primarydata.com>
> ---
>  net/sunrpc/xprt.c     | 2 +-
>  net/sunrpc/xprtsock.c | 8 ++++++--
>  2 files changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
> index 3ca31f20b97c..ab5dd621ae0c 100644
> --- a/net/sunrpc/xprt.c
> +++ b/net/sunrpc/xprt.c
> @@ -611,8 +611,8 @@ static void xprt_autoclose(struct work_struct *work)
>  	struct rpc_xprt *xprt =
>  		container_of(work, struct rpc_xprt, task_cleanup);
>  
> -	xprt->ops->close(xprt);
>  	clear_bit(XPRT_CLOSE_WAIT, &xprt->state);
> +	xprt->ops->close(xprt);
>  	xprt_release_write(xprt, NULL);
>  }
>  
> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> index fda8ec8c74c0..75dcdadf0269 100644
> --- a/net/sunrpc/xprtsock.c
> +++ b/net/sunrpc/xprtsock.c
> @@ -634,10 +634,13 @@ static void xs_tcp_shutdown(struct rpc_xprt *xprt)
>  	struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt);
>  	struct socket *sock = transport->sock;
>  
> -	if (sock != NULL) {
> +	if (sock == NULL)
> +		return;
> +	if (xprt_connected(xprt)) {
>  		kernel_sock_shutdown(sock, SHUT_RDWR);
>  		trace_rpc_socket_shutdown(xprt, sock);
> -	}
> +	} else
> +		xs_reset_transport(transport);
>  }
>  
>  /**
> @@ -786,6 +789,7 @@ static void xs_sock_mark_closed(struct rpc_xprt *xprt)
>  	xs_sock_reset_connection_flags(xprt);
>  	/* Mark transport as closed and wake up all pending tasks */
>  	xprt_disconnect_done(xprt);
> +	xprt_force_disconnect(xprt);
>  }
>  
>  /**
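For anyone puzzled by the compiler output in this message: the first patch calls xs_reset_transport() at line 643, before its static definition at line 825, so the compiler assumes an implicit non-static declaration and then rejects the real one. The sketch below reproduces the ordering problem in miniature and shows one way around it, a forward declaration. The names reset_transport(), tcp_shutdown() and demo_port_after_shutdown() are hypothetical, and the integer "port" is only a stand-in for the bound socket.

```c
/*
 * Forward declaration. Without this line, the call inside tcp_shutdown()
 * would be treated as an implicit declaration of a non-static function,
 * and the static definition further down would then conflict with it.
 */
static void reset_transport(int *bound_port);

/* Stand-in for the shutdown path: release the port if not connected. */
static void tcp_shutdown(int connected, int *bound_port)
{
	if (!connected)
		reset_transport(bound_port);
}

/* Defined after its first use, as in the failing build. */
static void reset_transport(int *bound_port)
{
	*bound_port = 0;	/* stand-in for releasing the socket */
}

/* Returns the port still bound after a shutdown request. */
static int demo_port_after_shutdown(int connected)
{
	int port = 947;

	tcp_shutdown(connected, &port);
	return port;
}
```

The other option is to move the caller below the callee so that no forward declaration is needed.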
[net-next] vmxnet3: Fix memory leaks in rx path (fwd)
If rcd length was zero, the page used for frag was not being released. It was
being replaced with a newly allocated page. This change takes care of that
memory leak.

Signed-off-by: Guolin Yang <gy...@vmware.com>
Signed-off-by: Shreyas N Bhatewara <sbhatew...@vmware.com>
---

diff --git a/drivers/net/vmxnet3/vmxnet3_drv.c b/drivers/net/vmxnet3/vmxnet3_drv.c
index bb35210..ab53975 100644
--- a/drivers/net/vmxnet3/vmxnet3_drv.c
+++ b/drivers/net/vmxnet3/vmxnet3_drv.c
@@ -861,6 +861,9 @@ vmxnet3_parse_and_copy_hdr(struct sk_buff *skb, struct vmxnet3_tx_queue *tq,
 					     , skb_headlen(skb));
 	}
 
+	if (skb->len <= VMXNET3_HDR_COPY_SIZE)
+		ctx->copy_size = skb->len;
+
 	/* make sure headers are accessible directly */
 	if (unlikely(!pskb_may_pull(skb, ctx->copy_size)))
 		goto err;
@@ -1273,36 +1276,36 @@ vmxnet3_rq_rx_complete(struct vmxnet3_rx_queue *rq,
 			if (skip_page_frags)
 				goto rcd_done;
 
-			new_page = alloc_page(GFP_ATOMIC);
-			if (unlikely(new_page == NULL)) {
+			if (rcd->len) {
+				new_page = alloc_page(GFP_ATOMIC);
 				/* Replacement page frag could not be allocated.
 				 * Reuse this page. Drop the pkt and free the
 				 * skb which contained this page as a frag. Skip
 				 * processing all the following non-sop frags.
 				 */
-				rq->stats.rx_buf_alloc_failure++;
-				dev_kfree_skb(ctx->skb);
-				ctx->skb = NULL;
-				skip_page_frags = true;
-				goto rcd_done;
-			}
+				if (unlikely(!new_page)) {
+					rq->stats.rx_buf_alloc_failure++;
+					dev_kfree_skb(ctx->skb);
+					ctx->skb = NULL;
+					skip_page_frags = true;
+					goto rcd_done;
+				}
 
-			if (rcd->len) {
 				dma_unmap_page(&adapter->pdev->dev,
 					       rbi->dma_addr, rbi->len,
 					       PCI_DMA_FROMDEVICE);
 
 				vmxnet3_append_frag(ctx->skb, rcd, rbi);
-			}
 
-			/* Immediate refill */
-			rbi->page = new_page;
-			rbi->dma_addr = dma_map_page(&adapter->pdev->dev,
-						     rbi->page,
-						     0, PAGE_SIZE,
-						     PCI_DMA_FROMDEVICE);
-			rxd->addr = cpu_to_le64(rbi->dma_addr);
-			rxd->len = rbi->len;
+				/* Immediate refill */
+				rbi->page = new_page;
+				rbi->dma_addr = dma_map_page(&adapter->pdev->dev
+							, rbi->page,
+							0, PAGE_SIZE,
+							PCI_DMA_FROMDEVICE);
+				rxd->addr = cpu_to_le64(rbi->dma_addr);
+				rxd->len = rbi->len;
+			}
 		}
 
 
[PATCH 0/3] Small fixes for Renesas R-Car CAN driver
Hello.

   Here's the set of 3 patches against Marc Kleine-Budde's 'linux-can.git' repo;
they are small fixes for the Renesas R-Car CAN driver.

[1/3] rcar_can: fix IRQ check
[2/3] rcar_can: print signed IRQ #
[3/3] rcar_can: fix typo in error message

WBR, Sergei
Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )
On Fri, 19 Jun 2015 19:25:59 -0400
Trond Myklebust <trond.mykleb...@primarydata.com> wrote:

> On Fri, 2015-06-19 at 18:14 -0400, Steven Rostedt wrote:
> > On Fri, 19 Jun 2015 16:30:18 -0400
> > Trond Myklebust <trond.mykleb...@primarydata.com> wrote:
> >
> > > Steven, how about something like the following patch?
> >
> > OK, the box I'm running this on is using v4.0.5, can you make a patch
> > based on that, as whatever you make needs to go to stable as well.
>
> Is it causing any other damage than the rkhunter warning you reported?

Well, not that I know of. Are you sure that this port will be
reconnected, and is not just a leak? I'm not sure whether you could
waste more ports this way with connections to other machines. I only
have my wife's box connect to this server. This server is actually a
client to my other boxes.

Although the rkhunter warning is the only thing that triggers, I still
think this is a fix for stable, especially if the port is leaked and
not taken again.

> Sorry. I sent that one off too quickly. Try the following.

This built, will be testing it shortly.

-- Steve
[PATCH 0/2] Error message clean-ups for Renesas R-Car CAN driver
Hello.

   Here's the set of 2 patches against Marc Kleine-Budde's 'linux-can.git' repo
plus 3 fix patches just posted; they are small error message cleanups for the
Renesas R-Car CAN driver.

[1/2] rcar_can: print request_irq() error code
[2/2] rcar_can: unify error messages

WBR, Sergei
[PATCH 2/2] rcar_can: unify error messages
All the error messages in the driver except the ones from devm_clk_get()
failures use a similar format. Make those two messages consistent with the
others.

Signed-off-by: Sergei Shtylyov <sergei.shtyl...@cogentembedded.com>

---
 drivers/net/can/rcar_can.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Index: linux-can/drivers/net/can/rcar_can.c
===================================================================
--- linux-can.orig/drivers/net/can/rcar_can.c
+++ linux-can/drivers/net/can/rcar_can.c
@@ -785,7 +785,8 @@ static int rcar_can_probe(struct platfor
 	priv->clk = devm_clk_get(&pdev->dev, "clkp1");
 	if (IS_ERR(priv->clk)) {
 		err = PTR_ERR(priv->clk);
-		dev_err(&pdev->dev, "cannot get peripheral clock: %d\n", err);
+		dev_err(&pdev->dev, "cannot get peripheral clock, error %d\n",
+			err);
 		goto fail_clk;
 	}
 
@@ -797,7 +798,7 @@ static int rcar_can_probe(struct platfor
 	priv->can_clk = devm_clk_get(&pdev->dev, clock_names[clock_select]);
 	if (IS_ERR(priv->can_clk)) {
 		err = PTR_ERR(priv->can_clk);
-		dev_err(&pdev->dev, "cannot get CAN clock: %d\n", err);
+		dev_err(&pdev->dev, "cannot get CAN clock, error %d\n", err);
 		goto fail_clk;
 	}
 
Re: [PATCH v2] bpf: BPF based latency tracing
On 6/19/15 7:00 AM, Daniel Wagner wrote:
> BPF offers another way to generate latency histograms. We attach
> kprobes at trace_preempt_off and trace_preempt_on and calculate the
> time it takes from seeing the off/on transition.
...
> Signed-off-by: Daniel Wagner <daniel.wag...@bmw-carit.de>
...
> With the rebase on net-next no additional patches are needed and this
> thing here runs fine.
...
>  samples/bpf/Makefile       |   4 ++
>  samples/bpf/lathist_kern.c |  99 +++
>  samples/bpf/lathist_user.c | 103 +
>  3 files changed, 206 insertions(+)
>  create mode 100644 samples/bpf/lathist_kern.c
>  create mode 100644 samples/bpf/lathist_user.c

Thanks. That's a useful example.
Acked-by: Alexei Starovoitov <a...@plumgrid.com>

Dave, this patch is for net-next and I hope it's not too late for this
merge window.
Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )
On Fri, 19 Jun 2015 19:25:59 -0400
Trond Myklebust <trond.mykleb...@primarydata.com> wrote:

> 8<--------------------------------------------------------------
> From 4876cc779ff525b9c2376d8076edf47815e71f2c Mon Sep 17 00:00:00 2001
> From: Trond Myklebust <trond.mykleb...@primarydata.com>
> Date: Fri, 19 Jun 2015 16:17:57 -0400
> Subject: [PATCH v2] SUNRPC: Ensure we release the TCP socket once it has been
>  closed
>
> This fixes a regression introduced by commit caf4ccd4e88cf2 ("SUNRPC:
> Make xs_tcp_close() do a socket shutdown rather than a sock_release").
> Prior to that commit, the autoclose feature would ensure that an
> idle connection would result in the socket being both disconnected and
> released, whereas now it only gets disconnected.
>
> While the current behaviour is harmless, it does leave the port bound
> until either RPC traffic resumes or the RPC client is shut down.

Is there a way to test RPC traffic resuming? I'd like to try that before
declaring this bug harmless.

> Reported-by: Steven Rostedt <rost...@goodmis.org>

The problem appears to go away with this patch.

Tested-by: Steven Rostedt <rost...@goodmis.org>

Thanks a lot!

-- Steve

> Signed-off-by: Trond Myklebust <trond.mykleb...@primarydata.com>
> ---
>  net/sunrpc/xprt.c     |  2 +-
>  net/sunrpc/xprtsock.c | 40 ++++++++++++++++++++++------------------
>  2 files changed, 23 insertions(+), 19 deletions(-)
>
> diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
> index 3ca31f20b97c..ab5dd621ae0c 100644
> --- a/net/sunrpc/xprt.c
> +++ b/net/sunrpc/xprt.c
> @@ -611,8 +611,8 @@ static void xprt_autoclose(struct work_struct *work)
>  	struct rpc_xprt *xprt =
>  		container_of(work, struct rpc_xprt, task_cleanup);
>  
> -	xprt->ops->close(xprt);
>  	clear_bit(XPRT_CLOSE_WAIT, &xprt->state);
> +	xprt->ops->close(xprt);
>  	xprt_release_write(xprt, NULL);
>  }
>  
> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> index fda8ec8c74c0..ee0715dfc3c7 100644
> --- a/net/sunrpc/xprtsock.c
> +++ b/net/sunrpc/xprtsock.c
> @@ -623,24 +623,6 @@ process_status:
>  }
>  
>  /**
> - * xs_tcp_shutdown - gracefully shut down a TCP socket
> - * @xprt: transport
> - *
> - * Initiates a graceful shutdown of the TCP socket by calling the
> - * equivalent of shutdown(SHUT_RDWR);
> - */
> -static void xs_tcp_shutdown(struct rpc_xprt *xprt)
> -{
> -	struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt);
> -	struct socket *sock = transport->sock;
> -
> -	if (sock != NULL) {
> -		kernel_sock_shutdown(sock, SHUT_RDWR);
> -		trace_rpc_socket_shutdown(xprt, sock);
> -	}
> -}
> -
> -/**
>   * xs_tcp_send_request - write an RPC request to a TCP socket
>   * @task: address of RPC task that manages the state of an RPC request
>   *
> @@ -786,6 +768,7 @@ static void xs_sock_mark_closed(struct rpc_xprt *xprt)
>  	xs_sock_reset_connection_flags(xprt);
>  	/* Mark transport as closed and wake up all pending tasks */
>  	xprt_disconnect_done(xprt);
> +	xprt_force_disconnect(xprt);
>  }
>  
>  /**
> @@ -2103,6 +2086,27 @@ out:
>  	xprt_wake_pending_tasks(xprt, status);
>  }
>  
> +/**
> + * xs_tcp_shutdown - gracefully shut down a TCP socket
> + * @xprt: transport
> + *
> + * Initiates a graceful shutdown of the TCP socket by calling the
> + * equivalent of shutdown(SHUT_RDWR);
> + */
> +static void xs_tcp_shutdown(struct rpc_xprt *xprt)
> +{
> +	struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt);
> +	struct socket *sock = transport->sock;
> +
> +	if (sock == NULL)
> +		return;
> +	if (xprt_connected(xprt)) {
> +		kernel_sock_shutdown(sock, SHUT_RDWR);
> +		trace_rpc_socket_shutdown(xprt, sock);
> +	} else
> +		xs_reset_transport(transport);
> +}
> +
>  static int xs_tcp_finish_connecting(struct rpc_xprt *xprt, struct socket *sock)
>  {
>  	struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt);
[PATCH 1/3] rcar_can: fix IRQ check
rcar_can_probe() regards 0 as a wrong IRQ #, even though platform_get_irq(), which it calls, returns a negative error code on failure. This leads to the following being printed to the console when attempting to open the device:

error requesting interrupt fffa

because rcar_can_open() calls request_irq() with a negative IRQ #, and that function naturally fails with -EINVAL. Check for negative error codes instead and propagate them upstream rather than just returning -ENODEV.

Fixes: fd1159318e55 ("can: add Renesas R-Car CAN driver")
Signed-off-by: Sergei Shtylyov sergei.shtyl...@cogentembedded.com
---
 drivers/net/can/rcar_can.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Index: linux-can/drivers/net/can/rcar_can.c
===
--- linux-can.orig/drivers/net/can/rcar_can.c
+++ linux-can/drivers/net/can/rcar_can.c
@@ -758,8 +758,9 @@ static int rcar_can_probe(struct platfor
 	}

 	irq = platform_get_irq(pdev, 0);
-	if (!irq) {
+	if (irq < 0) {
 		dev_err(&pdev->dev, "No IRQ resource\n");
+		err = irq;
 		goto fail;
 	}

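The error-handling pattern this patch adopts can be sketched in plain C. This is only an illustration, not driver code: fake_get_irq() and probe_irq() are hypothetical stand-ins (they are not kernel APIs) for platform_get_irq() and the relevant part of rcar_can_probe(), under the assumption that the lookup returns a negative errno value on failure.

```c
#include <errno.h>

/* Hypothetical stand-in for platform_get_irq(): returns an IRQ number on
 * success or a negative errno value (here -ENXIO) on failure. */
int fake_get_irq(int have_irq)
{
	return have_irq ? 42 : -ENXIO;
}

/* Sketch of the corrected check: test for a negative value and hand the
 * error code to the caller instead of replacing it with -ENODEV. */
int probe_irq(int have_irq, int *irq_out)
{
	int irq = fake_get_irq(have_irq);

	if (irq < 0)
		return irq;

	*irq_out = irq;
	return 0;
}
```

A test for zero would let the negative value through, which is how a negative IRQ # could reach request_irq() in the first place.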
[PATCH 1/2] rcar_can: print request_irq() error code
Also print the error code when the request_irq() call fails in rcar_can_open(), rewording the error message...

Signed-off-by: Sergei Shtylyov sergei.shtyl...@cogentembedded.com
---
 drivers/net/can/rcar_can.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Index: linux-can/drivers/net/can/rcar_can.c
===
--- linux-can.orig/drivers/net/can/rcar_can.c
+++ linux-can/drivers/net/can/rcar_can.c
@@ -527,7 +527,8 @@ static int rcar_can_open(struct net_devi
 	napi_enable(&priv->napi);
 	err = request_irq(ndev->irq, rcar_can_interrupt, 0, ndev->name, ndev);
 	if (err) {
-		netdev_err(ndev, "error requesting interrupt %d\n", ndev->irq);
+		netdev_err(ndev, "request_irq(%d) failed, error %d\n",
+			   ndev->irq, err);
 		goto out_close;
 	}
 	can_led_event(ndev, CAN_LED_EVENT_OPEN);
Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )
On Fri, 19 Jun 2015 20:37:45 -0400 Steven Rostedt rost...@goodmis.org wrote: Is it causing any other damage than the rkhunter warning you reported? Well, not that I know of. Are you sure that this port will be reconnected, and is not just a leak. Not sure if you could waste more ports this way with connections to other machines. I only have my wife's box connect to this server. This server is actually a client to my other boxes. Although the rkhunter warning is the only thing that triggers, I still would think this is a stable fix, especially if the port is leaked and not taken again. I did some experiments. If I unmount the directories from my wife's machine and remount them, the port that was hidden is fully closed. Maybe it's not that big of a deal after all. -- Steve -- To unsubscribe from this list: send the line unsubscribe netdev in
Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )
On Fri, Jun 19, 2015 at 9:27 PM, Steven Rostedt rost...@goodmis.org wrote: On Fri, 19 Jun 2015 19:25:59 -0400 Trond Myklebust trond.mykleb...@primarydata.com wrote: 8-- From 4876cc779ff525b9c2376d8076edf47815e71f2c Mon Sep 17 00:00:00 2001 From: Trond Myklebust trond.mykleb...@primarydata.com Date: Fri, 19 Jun 2015 16:17:57 -0400 Subject: [PATCH v2] SUNRPC: Ensure we release the TCP socket once it has been closed This fixes a regression introduced by commit caf4ccd4e88cf2 (SUNRPC: Make xs_tcp_close() do a socket shutdown rather than a sock_release). Prior to that commit, the autoclose feature would ensure that an idle connection would result in the socket being both disconnected and released, whereas now only gets disconnected. While the current behaviour is harmless, it does leave the port bound until either RPC traffic resumes or the RPC client is shut down. Is there a way to test RPC traffic resuming? I'd like to try that before declaring this bug harmless. You should be seeing the same issue if you mount an NFSv3 partition. After about 5 minutes of inactivity, the client will close down the connection to the server, and rkhunter should again see the phantom socket. If you then try to access the partition, the RPC layer should immediately release the socket and establish a new connection on the same port. Cheers Trond -- To unsubscribe from this list: send the line unsubscribe netdev in
[PATCH 2/3] rcar_can: print signed IRQ #
Printing IRQ # using %x and %u unsigned formats isn't quite correct as 'ndev->irq' is of type *int*, so the %d format needs to be used instead. While fixing this, beautify the dev_info() message in rcar_can_probe() a bit.

Fixes: fd1159318e55 ("can: add Renesas R-Car CAN driver")
Signed-off-by: Sergei Shtylyov sergei.shtyl...@cogentembedded.com
---
 drivers/net/can/rcar_can.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux-can/drivers/net/can/rcar_can.c
===
--- linux-can.orig/drivers/net/can/rcar_can.c
+++ linux-can/drivers/net/can/rcar_can.c
@@ -526,7 +526,7 @@ static int rcar_can_open(struct net_devi
 	napi_enable(&priv->napi);
 	err = request_irq(ndev->irq, rcar_can_interrupt, 0, ndev->name, ndev);
 	if (err) {
-		netdev_err(ndev, "error requesting interrupt %x\n", ndev->irq);
+		netdev_err(ndev, "error requesting interrupt %d\n", ndev->irq);
 		goto out_close;
 	}
 	can_led_event(ndev, CAN_LED_EVENT_OPEN);
@@ -824,7 +824,7 @@ static int rcar_can_probe(struct platfor

 	devm_can_led_init(ndev);

-	dev_info(&pdev->dev, "device registered (reg_base=%p, irq=%u)\n",
+	dev_info(&pdev->dev, "device registered (regs @ %p, IRQ%d)\n",
 		 priv->regs, ndev->irq);

 	return 0;
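To make the difference between the formats concrete, here is a small userspace sketch using snprintf() rather than the kernel's netdev_err(). The helper names format_irq_signed() and format_irq_hex() are made up for this illustration, and a 32-bit int is assumed.

```c
#include <stdio.h>

/* Format a (possibly negative) IRQ number as signed decimal, the way the
 * patched message does with %d. */
int format_irq_signed(char *buf, size_t len, int irq)
{
	return snprintf(buf, len, "%d", irq);
}

/* Format the same value as unsigned hex, roughly what the old %x message
 * showed; the cast makes the reinterpretation explicit. */
int format_irq_hex(char *buf, size_t len, int irq)
{
	return snprintf(buf, len, "%x", (unsigned int)irq);
}
```

With irq set to -6, the signed variant yields "-6", which reads as an error code at a glance, while the hex variant yields "fffffffa" on a platform with a 32-bit int.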
[PATCH 3/3] rcar_can: fix typo in error message
Fix typo in the first error message printed by rcar_can_open(). Based on the original patch by Vladimir Barinov.

Fixes: 862e2b6af941 ("can: rcar_can: support all input clocks")
Reported-by: Vladimir Barinov vladimir.bari...@cogentembedded.com
Signed-off-by: Sergei Shtylyov sergei.shtyl...@cogentembedded.com
---
 drivers/net/can/rcar_can.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Index: linux-can/drivers/net/can/rcar_can.c
===
--- linux-can.orig/drivers/net/can/rcar_can.c
+++ linux-can/drivers/net/can/rcar_can.c
@@ -508,7 +508,8 @@ static int rcar_can_open(struct net_devi

 	err = clk_prepare_enable(priv->clk);
 	if (err) {
-		netdev_err(ndev, "failed to enable periperal clock, error %d\n",
+		netdev_err(ndev,
+			   "failed to enable peripheral clock, error %d\n",
 			   err);
 		goto out;
 	}
Re: [PATCH RFC net] neigh: do not modify unlinked entries
On Tue, 2015-06-16 at 22:56 +0300, Julian Anastasov wrote: The lockless lookups can return entry that is unlinked. Sometimes they get reference before last neigh_cleanup_and_release, sometimes they do not need reference. Later, any modification attempts may result in the following problems: 1. entry is not destroyed immediately because neigh_update can start the timer for dead entry, eg. on change to NUD_REACHABLE state. As result, entry lives for some time but is invisible and out of control. 2. __neigh_event_send can run in parallel with neigh_destroy while refcnt=0 but if timer is started and expired refcnt can reach 0 for second time leading to second neigh_destroy and possible crash. Thanks to Eric Dumazet and Ying Xue for their work and analyze on the __neigh_event_send change. Fixes: 767e97e1e0db (neigh: RCU conversion of struct neighbour) Fixes: a263b3093641 (ipv4: Make neigh lookups directly in output packet path.) Fixes: 6fd6ce2056de (ipv6: Do not depend on rt-n in ip6_finish_output2().) Cc: Eric Dumazet eric.duma...@gmail.com Cc: Ying Xue ying@windriver.com Signed-off-by: Julian Anastasov j...@ssi.bg --- Seems good to me Julian ! Acked-by: Eric Dumazet eduma...@google.com -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next v2] bridge: multicast: start querier timer when running user-space stp
On Fri, Jun 19, 2015 at 01:45:50AM -0700, Nikolay Aleksandrov wrote: When STP is running in user-space and querier is configured, the querier timer is not started when a port goes to a non-blocking state. This patch unifies the user- and kernel-space stp multicast port enable path and enables it in all states different from blocking. Note that when a port goes in BR_STATE_DISABLED it's not enabled because that is handled in the beginning of the port list loop. Signed-off-by: Nikolay Aleksandrov niko...@cumulusnetworks.com Acked-by: Herbert Xu herb...@gondor.apana.org.au On a related note, we never disable the multicast querying when the port goes into blocking mode and we probably should. So could you take a look at that and create a patch for it? Thanks, -- Email: Herbert Xu herb...@gondor.apana.org.au Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- To unsubscribe from this list: send the line unsubscribe netdev in
Re: [PATCH net-next] inet_diag: Remove _bh suffix in inet_diag_dump_reqs().
On Thu, 18 Jun 2015 23:42:08 -0700 Eric Dumazet eric.duma...@gmail.com wrote:
> Sure, although this will soon be removed completely when SYN_RECV
> sockets will be stored in regular ehash table.

OK. Thank you for letting me know.
[PATCH v1] bpf: BPF based latency tracing
BPF offers another way to generate latency histograms. We attach kprobes at trace_preempt_off and trace_preempt_on and calculate the time it takes from seeing the off transition to seeing the on transition.

The first array is used to store the start time stamp. The key is the CPU id. The second array stores the log2(time diff). We need to use static allocation here (array and not hash tables). The kprobes hooking into trace_preempt_on|off should not call into any dynamic memory allocation or free path. We need to avoid getting called recursively. Besides that, it reduces jitter in the measurement.

CPU 0
      latency        : count     distribution
      1 -> 1         : 0        ||
      2 -> 3         : 0        ||
      4 -> 7         : 0        ||
      8 -> 15        : 0        ||
     16 -> 31        : 0        ||
     32 -> 63        : 0        ||
     64 -> 127       : 0        ||
    128 -> 255       : 0        ||
    256 -> 511       : 0        ||
    512 -> 1023      : 0        ||
   1024 -> 2047      : 0        ||
   2048 -> 4095      : 166723   |*** |
   4096 -> 8191      : 19870    |*** |
   8192 -> 16383     : 6324     ||
  16384 -> 32767     : 1098     ||
  32768 -> 65535     : 190      ||
  65536 -> 131071    : 179      ||
 131072 -> 262143    : 18       ||
 262144 -> 524287    : 4        ||
 524288 -> 1048575   : 1363     ||

CPU 1
      latency        : count     distribution
      1 -> 1         : 0        ||
      2 -> 3         : 0        ||
      4 -> 7         : 0        ||
      8 -> 15        : 0        ||
     16 -> 31        : 0        ||
     32 -> 63        : 0        ||
     64 -> 127       : 0        ||
    128 -> 255       : 0        ||
    256 -> 511       : 0        ||
    512 -> 1023      : 0        ||
   1024 -> 2047      : 0        ||
   2048 -> 4095      : 114042   |*** |
   4096 -> 8191      : 9587     |** |
   8192 -> 16383     : 4140     ||
  16384 -> 32767     : 673      ||
  32768 -> 65535     : 179      ||
  65536 -> 131071    : 29       ||
 131072 -> 262143    : 4        ||
 262144 -> 524287    : 1        ||
 524288 -> 1048575   : 364      ||

CPU 2
      latency        : count     distribution
      1 -> 1         : 0        ||
      2 -> 3         : 0        ||
      4 -> 7         : 0        ||
      8 -> 15        : 0        ||
     16 -> 31        : 0        ||
     32 -> 63        : 0        ||
     64 -> 127       : 0        ||
    128 -> 255       : 0        ||
    256 -> 511       : 0        ||
    512 -> 1023      : 0        ||
   1024 -> 2047      : 0        ||
   2048 -> 4095      : 40147    |*** |
   4096 -> 8191      : 2300     |* |
   8192 -> 16383     : 828      ||
  16384 -> 32767     : 178      ||
  32768 -> 65535     : 59       ||
  65536 -> 131071    : 2        ||
 131072 -> 262143    : 0        |
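The log2 bucketing behind a histogram like this one can be sketched in ordinary userspace C. This is not the BPF program from the patch (which is not shown here); log2_slot(), hist_add(), slot_low(), slot_high() and MAX_SLOTS are names invented for the illustration, and the fixed-size array merely echoes the static-allocation requirement described in the message.

```c
#include <stdint.h>

#define MAX_SLOTS 64	/* hypothetical number of histogram slots */

/* Integer floor(log2(v)) for v > 0; returns 0 for v == 0 as well. */
unsigned int log2_slot(uint64_t v)
{
	unsigned int slot = 0;

	while (v >>= 1)
		slot++;
	return slot;
}

/* Account one measured time difference in a fixed-size histogram. */
void hist_add(uint64_t hist[MAX_SLOTS], uint64_t delta)
{
	hist[log2_slot(delta)]++;
}

/* Smallest value that falls into a slot. */
uint64_t slot_low(unsigned int slot)
{
	return (uint64_t)1 << slot;
}

/* Largest value that falls into a slot (slot must be below MAX_SLOTS). */
uint64_t slot_high(unsigned int slot)
{
	return (slot_low(slot) << 1) - 1;
}
```

Slot 11, for example, covers 2048 to 4095, which is the row where most samples land in the output quoted in this message.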
Re: [PATCH net-next RFC v2 2/3] ipv4: add support for light weight tunnel encap attributes
On 6/18/15, 11:59 PM, Julian Anastasov wrote: Hello, On Thu, 18 Jun 2015, Roopa Prabhu wrote: @@ -366,6 +371,7 @@ static inline size_t fib_nlmsg_size(struct fib_info *fi) payload += nla_total_size((RTAX_MAX * nla_total_size(4))); if (fi-fib_nhs) { + size_t nh_encapsize = 0; Var not in #ifdef. Any warnings with CONFIG_LWTUNNEL=n? /* Also handles the special case fib_nhs == 1 */ /* each nexthop is packed in an attribute */ @@ -374,8 +380,23 @@ static inline size_t fib_nlmsg_size(struct fib_info *fi) /* may contain flow and gateway attribute */ nhsize += 2 * nla_total_size(4); +#ifdef CONFIG_LWTUNNEL + /* grab encap info */ + for_nexthops(fi) { + if (nh-nh_lwtstate) { + /* RTA_ENCAP_TYPE */ + nh_encapsize += lwtunnel_get_encap_size( + nh-nh_lwtstate); New labels not in #ifdef: Will check and fix all warnings with CONFIG_LWTUNNEL off + +err_inval: + ret = -EINVAL; + +errout: + return ret; } Some other places may need changes: - nh_comp: there is logic that decides if same fib_info is reused from many fib nodes. There should be check if NH matches by nh_lwtstate. yes, i will add that. - xfrm4_fill_dst: not sure about this but some fields are copied. I have not picked up xfrm4_fill_dst specifically, but this infra is supposed to be similar to that. I will look. Thanks. -- To unsubscribe from this list: send the line unsubscribe netdev in
[PATCH v2] bpf: BPF based latency tracing
BPF offers another way to generate latency histograms. We attach kprobes at trace_preempt_off and trace_preempt_on and calculate the time it takes from seeing the off transition to seeing the on transition.

The first array is used to store the start time stamp. The key is the CPU id. The second array stores the log2(time diff). We need to use static allocation here (array and not hash tables). The kprobes hooking into trace_preempt_on|off should not call into any dynamic memory allocation or free path. We need to avoid getting called recursively. Besides that, it reduces jitter in the measurement.

CPU 0
      latency        : count     distribution
      1 -> 1         : 0        ||
      2 -> 3         : 0        ||
      4 -> 7         : 0        ||
      8 -> 15        : 0        ||
     16 -> 31        : 0        ||
     32 -> 63        : 0        ||
     64 -> 127       : 0        ||
    128 -> 255       : 0        ||
    256 -> 511       : 0        ||
    512 -> 1023      : 0        ||
   1024 -> 2047      : 0        ||
   2048 -> 4095      : 166723   |*** |
   4096 -> 8191      : 19870    |*** |
   8192 -> 16383     : 6324     ||
  16384 -> 32767     : 1098     ||
  32768 -> 65535     : 190      ||
  65536 -> 131071    : 179      ||
 131072 -> 262143    : 18       ||
 262144 -> 524287    : 4        ||
 524288 -> 1048575   : 1363     ||

CPU 1
      latency        : count     distribution
      1 -> 1         : 0        ||
      2 -> 3         : 0        ||
      4 -> 7         : 0        ||
      8 -> 15        : 0        ||
     16 -> 31        : 0        ||
     32 -> 63        : 0        ||
     64 -> 127       : 0        ||
    128 -> 255       : 0        ||
    256 -> 511       : 0        ||
    512 -> 1023      : 0        ||
   1024 -> 2047      : 0        ||
   2048 -> 4095      : 114042   |*** |
   4096 -> 8191      : 9587     |** |
   8192 -> 16383     : 4140     ||
  16384 -> 32767     : 673      ||
  32768 -> 65535     : 179      ||
  65536 -> 131071    : 29       ||
 131072 -> 262143    : 4        ||
 262144 -> 524287    : 1        ||
 524288 -> 1048575   : 364      ||

CPU 2
      latency        : count     distribution
      1 -> 1         : 0        ||
      2 -> 3         : 0        ||
      4 -> 7         : 0        ||
      8 -> 15        : 0        ||
     16 -> 31        : 0        ||
     32 -> 63        : 0        ||
     64 -> 127       : 0        ||
    128 -> 255       : 0        ||
    256 -> 511       : 0        ||
    512 -> 1023      : 0        ||
   1024 -> 2047      : 0        ||
   2048 -> 4095      : 40147    |*** |
   4096 -> 8191      : 2300     |* |
   8192 -> 16383     : 828      ||
  16384 -> 32767     : 178      ||
  32768 -> 65535     : 59       ||
  65536 -> 131071    : 2        ||
 131072 -> 262143    : 0        |
Re: [PATCH] net: inet_diag: export IPV6_V6ONLY sockopt
On Fri, 2015-06-19 at 14:15 +0200, Phil Sutter wrote:
> For AF_INET6 sockets, the value of struct ipv6_pinfo.ipv6only is
> exported to userspace. It indicates whether an unbound socket listens
> on IPv4 as well as IPv6.

What is an 'unbound socket' ??? This makes no sense to me here.

> Since the socket is natively IPv6, it is not listed by e.g.
> 'netstat -l -4'.

netstat does not use this interface. iproute2/ss does.

> Signed-off-by: Phil Sutter p...@nwl.cc
> ---
> This patch is accompanied by an appropriate one for iproute2 to enable
> the additional information in 'ss -e'.
> ---
>  include/uapi/linux/inet_diag.h | 3 ++-
>  net/ipv4/inet_diag.c           | 4 ++++
>  2 files changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/include/uapi/linux/inet_diag.h b/include/uapi/linux/inet_diag.h
> index c7093c7..9ca4834 100644
> --- a/include/uapi/linux/inet_diag.h
> +++ b/include/uapi/linux/inet_diag.h
> @@ -111,9 +111,10 @@ enum {
>  	INET_DIAG_SKMEMINFO,
>  	INET_DIAG_SHUTDOWN,
>  	INET_DIAG_DCTCPINFO,
> +	INET_DIAG_SKV6ONLY,
>  };
>
> -#define INET_DIAG_MAX INET_DIAG_DCTCPINFO
> +#define INET_DIAG_MAX INET_DIAG_SKV6ONLY
>
>  /* INET_DIAG_MEM */
>
> diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
> index 4d32262..4bf6d03 100644
> --- a/net/ipv4/inet_diag.c
> +++ b/net/ipv4/inet_diag.c
> @@ -151,6 +151,10 @@ int inet_sk_diag_fill(struct sock *sk, struct inet_connection_sock *icsk,
>  		if (nla_put_u8(skb, INET_DIAG_TCLASS,
>  			       inet6_sk(sk)->tclass) < 0)
>  			goto errout;
> +
> +		if (nla_put_u8(skb, INET_DIAG_SKV6ONLY,
> +			       inet6_sk(sk)->ipv6only) < 0)
> +			goto errout;
>  	}
>  #endif

1) This certainly should not compile on current linux trees. Always submit such patches on net-next.

2) It is not clear why we would add this attribute if it is 0. This looks a waste of data.

So I would rather use :

diff --git a/include/uapi/linux/inet_diag.h b/include/uapi/linux/inet_diag.h
index b629fc53b1090e73047b263a9231e34ebf64b2af..46d72e45f8701526abb06f4a8187262dbc635784 100644
--- a/include/uapi/linux/inet_diag.h
+++ b/include/uapi/linux/inet_diag.h
@@ -112,6 +112,7 @@ enum {
 	INET_DIAG_SHUTDOWN,
 	INET_DIAG_DCTCPINFO,
 	INET_DIAG_PROTOCOL,  /* response attribute only */
+	INET_DIAG_SKV6ONLY,
 };

 #define INET_DIAG_MAX INET_DIAG_PROTOCOL
diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
index 21985d8d41e709908021769be36380f7a5dfac23..381a26e932691075a73ae63569fd3a4366ce277f 100644
--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -151,6 +151,9 @@ int inet_sk_diag_fill(struct sock *sk, struct inet_connection_sock *icsk,
 		if (nla_put_u8(skb, INET_DIAG_TCLASS,
 			       inet6_sk(sk)->tclass) < 0)
 			goto errout;
+		if (ipv6_only_sock(sk) &&
+		    nla_put_u8(skb, INET_DIAG_SKV6ONLY, 1))
+			goto errout;
 	}
 #endif
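The "only emit the attribute when it is set" idea from point 2 can be illustrated with a toy message buffer. This is a sketch under loud assumptions: struct toy_msg, toy_put_u8(), toy_fill() and TOY_ATTR_V6ONLY are invented here and do not reflect the real netlink attribute layout or the nla_put_u8() implementation, and it assumes a receiver treats a missing attribute as 0.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy message buffer; this is NOT the netlink attribute
 * layout, just a type/value byte pair per attribute. */
struct toy_msg {
	uint8_t data[16];
	size_t len;
};

#define TOY_ATTR_V6ONLY 1	/* made-up attribute type for the sketch */

/* Append one u8 attribute; returns 0 on success, -1 if it does not fit. */
int toy_put_u8(struct toy_msg *m, uint8_t type, uint8_t value)
{
	if (m->len + 2 > sizeof(m->data))
		return -1;
	m->data[m->len++] = type;
	m->data[m->len++] = value;
	return 0;
}

/* Emit the flag only when it is set, so the common case (flag clear)
 * adds nothing to the message. */
int toy_fill(struct toy_msg *m, int v6only)
{
	if (v6only && toy_put_u8(m, TOY_ATTR_V6ONLY, 1))
		return -1;
	return 0;
}
```

The short-circuit mirrors the shape of the suggested hunk: the put call runs only if the flag test passes, and a failed put is the only error path.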
Re: [PATCH net-next v2] bridge: multicast: start querier timer when running user-space stp
On Jun 19, 2015, at 4:47 PM, Herbert Xu herb...@gondor.apana.org.au wrote: On Fri, Jun 19, 2015 at 01:45:50AM -0700, Nikolay Aleksandrov wrote: When STP is running in user-space and querier is configured, the querier timer is not started when a port goes to a non-blocking state. This patch unifies the user- and kernel-space stp multicast port enable path and enables it in all states different from blocking. Note that when a port goes in BR_STATE_DISABLED it's not enabled because that is handled in the beginning of the port list loop. Signed-off-by: Nikolay Aleksandrov niko...@cumulusnetworks.com Acked-by: Herbert Xu herb...@gondor.apana.org.au On a related note, we never disable the multicast querying when the port goes into blocking mode and we probably should. So could you take a look at that and create a patch for it? Thanks, -- Email: Herbert Xu herb...@gondor.apana.org.au Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- To unsubscribe from this list: send the line unsubscribe netdev in Good catch, I’ll look into it. Thanks, Nik-- To unsubscribe from this list: send the line unsubscribe netdev in
[PATCH v2 net-next] macvtap: Increase limit of macvtap queues
Macvtap should be compatible with tuntap for maximum number of queues. Commit baf71c5c1f80d82e92924050a60b5baaf97e3094 ("tuntap: Increase the number of queues in tun.") removes the limitations and increases the number of queues in tuntap. Now it's safe to increase the number of queues in Macvtap as well.

This patch also modifies the 'macvtap_del_queues' function to avoid extra memory allocation on the stack.

Changes from v1->v2:
Michael S. Tsirkin, Jason Wang: Better way to use linked list to avoid use of extra memory in stack.
Sergei Shtylyov: Specify dependent commit's summary.

Signed-off-by: Pankaj Gupta pagu...@redhat.com
---
 drivers/net/macvtap.c      | 10 ++--------
 include/linux/if_macvlan.h |  2 +-
 2 files changed, 3 insertions(+), 9 deletions(-)

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 483afb1..6a64197f 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -263,27 +263,21 @@ out:
 static void macvtap_del_queues(struct net_device *dev)
 {
 	struct macvlan_dev *vlan = netdev_priv(dev);
-	struct macvtap_queue *q, *tmp, *qlist[MAX_MACVTAP_QUEUES];
-	int i, j = 0;
+	struct macvtap_queue *q, *tmp;

 	ASSERT_RTNL();
 	list_for_each_entry_safe(q, tmp, &vlan->queue_list, next) {
 		list_del_init(&q->next);
-		qlist[j++] = q;
 		RCU_INIT_POINTER(q->vlan, NULL);
 		if (q->enabled)
 			vlan->numvtaps--;
 		vlan->numqueues--;
+		sock_put(&q->sk);
 	}
-	for (i = 0; i < vlan->numvtaps; i++)
-		RCU_INIT_POINTER(vlan->taps[i], NULL);
 	BUG_ON(vlan->numvtaps);
 	BUG_ON(vlan->numqueues);
 	/* guarantee that any future macvtap_set_queue will fail */
 	vlan->numvtaps = MAX_MACVTAP_QUEUES;
-
-	for (--j; j >= 0; j--)
-		sock_put(&qlist[j]->sk);
 }

 static rx_handler_result_t macvtap_handle_frame(struct sk_buff **pskb)
diff --git a/include/linux/if_macvlan.h b/include/linux/if_macvlan.h
index 6f6929e..a4ccc31 100644
--- a/include/linux/if_macvlan.h
+++ b/include/linux/if_macvlan.h
@@ -29,7 +29,7 @@ struct macvtap_queue;
  * Maximum times a macvtap device can be opened. This can be used to
  * configure the number of receive queue, e.g. for multiqueue virtio.
  */
-#define MAX_MACVTAP_QUEUES	16
+#define MAX_MACVTAP_QUEUES	256

 #define MACVLAN_MC_FILTER_BITS	8
 #define MACVLAN_MC_FILTER_SZ	(1 << MACVLAN_MC_FILTER_BITS)
--
1.9.3
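The restructuring in macvtap_del_queues() amounts to releasing each queue during the list walk instead of collecting pointers in a stack array first. A minimal userspace sketch of that pattern follows; struct toy_queue and toy_del_queues() are hypothetical, a plain singly linked list stands in for the kernel's list_head, and a bare counter stands in for the socket reference dropped by sock_put().

```c
#include <stddef.h>

/* Hypothetical, simplified queue: a reference count and a list link. */
struct toy_queue {
	int refcnt;
	struct toy_queue *next;
};

/* Detach every queue from the list and drop its reference in the same
 * pass. The successor is saved before the node is unlinked, so no
 * temporary array of pointers is needed. Returns the number of queues
 * removed. */
int toy_del_queues(struct toy_queue **head)
{
	struct toy_queue *q = *head, *tmp;
	int removed = 0;

	while (q) {
		tmp = q->next;	/* remember the successor first */
		q->next = NULL;	/* unlink */
		q->refcnt--;	/* analogous to sock_put() in the patch */
		removed++;
		q = tmp;
	}
	*head = NULL;
	return removed;
}
```

Because nothing scales with the queue limit anymore, raising the limit from 16 to 256 does not add a 256-entry pointer array to the function's stack frame.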
[PATCH] xen-netback: fix a BUG() during initialization
From: Palik, Imre im...@amazon.de

Commit edafc132baac ("xen-netback: making the bandwidth limiter runtime settable") introduced the capability to change the bandwidth rate limit at runtime. But it also introduced a possible crashing bug.

If netback receives two XenbusStateConnected without getting the hotplug-status watch firing in between, then it will try to register the watches for the rate limiter again. But this triggers a BUG() in the watch registration code.

The fix modifies connect() to remove the possibly existing packet-rate watches before trying to install those watches. This behaviour is in line with how connect() deals with the hotplug-status watch.

Signed-off-by: Imre Palik im...@amazon.de
Cc: Matt Wilson m...@amazon.com
---
 drivers/net/xen-netback/xenbus.c |    4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
index 968787a..ec383b0 100644
--- a/drivers/net/xen-netback/xenbus.c
+++ b/drivers/net/xen-netback/xenbus.c
@@ -681,6 +681,9 @@ static int xen_register_watchers(struct xenbus_device *dev, struct xenvif *vif)
 	char *node;
 	unsigned maxlen = strlen(dev->nodename) + sizeof("/rate");

+	if (vif->credit_watch.node)
+		return -EADDRINUSE;
+
 	node = kmalloc(maxlen, GFP_KERNEL);
 	if (!node)
 		return -ENOMEM;
@@ -770,6 +773,7 @@ static void connect(struct backend_info *be)
 	}

 	xen_net_read_rate(dev, &credit_bytes, &credit_usec);
+	xen_unregister_watchers(be->vif);
 	xen_register_watchers(dev, be->vif);
 	read_xenbus_vif_flags(be);

--
1.7.9.5
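The two halves of the fix (a guard against double registration, plus an unregister before every register in the connect path) can be modelled in a few lines of plain C. Everything here is hypothetical: struct toy_vif and the toy_* functions are stand-ins invented for this sketch, not the xenbus watch API.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the per-vif watch state. */
struct toy_vif {
	const char *watch_node;	/* non-NULL while a watch is registered */
};

/* Refuse to register twice, mirroring the -EADDRINUSE guard in the patch. */
int toy_register_watch(struct toy_vif *vif, const char *node)
{
	if (vif->watch_node)
		return -EADDRINUSE;
	vif->watch_node = node;
	return 0;
}

/* Safe to call whether or not a watch is currently registered. */
void toy_unregister_watch(struct toy_vif *vif)
{
	vif->watch_node = NULL;
}

/* Connect path: drop any existing watch first, then register again. */
int toy_connect(struct toy_vif *vif, const char *node)
{
	toy_unregister_watch(vif);
	return toy_register_watch(vif, node);
}
```

With this shape, calling the connect path twice in a row succeeds both times, while a stray second registration is reported as an error instead of tripping an assertion.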
[PATCH net v2] bridge: multicast: restore router configuration on port link down/up
From: Satish Ashok sas...@cumulusnetworks.com

When a port goes through a link down/up the multicast router configuration is not restored.

Signed-off-by: Satish Ashok sas...@cumulusnetworks.com
Signed-off-by: Nikolay Aleksandrov niko...@cumulusnetworks.com
Fixes: 0909e11758bd ("bridge: Add multicast_router sysfs entries")
Acked-by: Herbert Xu herb...@gondor.apana.org.au
---
v2: Added the acked-by and sent as a separate patch. I plan to repurpose the second patch for net-next, they weren't dependent anyway.

 net/bridge/br_multicast.c |    4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index ff667e18b2d6..761fc733bf6d 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -37,6 +37,8 @@

 static void br_multicast_start_querier(struct net_bridge *br,
 				       struct bridge_mcast_own_query *query);
+static void br_multicast_add_router(struct net_bridge *br,
+				    struct net_bridge_port *port);
 unsigned int br_mdb_rehash_seq;

 static inline int br_ip_equal(const struct br_ip *a, const struct br_ip *b)
@@ -936,6 +938,8 @@ void br_multicast_enable_port(struct net_bridge_port *port)
 #if IS_ENABLED(CONFIG_IPV6)
 	br_multicast_enable(&port->ip6_own_query);
 #endif
+	if (port->multicast_router == 2 && hlist_unhashed(&port->rlist))
+		br_multicast_add_router(br, port);

 out:
 	spin_unlock(&br->multicast_lock);
--
1.7.10.4
[PATCH net-next v2] bridge: multicast: start querier timer when running user-space stp
When STP is running in user-space and querier is configured, the querier timer is not started when a port goes to a non-blocking state. This patch unifies the user- and kernel-space stp multicast port enable path and enables it in all states different from blocking. Note that when a port goes in BR_STATE_DISABLED it's not enabled because that is handled in the beginning of the port list loop.

Signed-off-by: Nikolay Aleksandrov niko...@cumulusnetworks.com
---
v2: Repurposed for net-next and implemented Herbert's suggestion for unifying both user- and kernel-space multicast enable port paths.

 net/bridge/br_stp.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/bridge/br_stp.c b/net/bridge/br_stp.c
index 45f1ff113af9..e7ab74b405a1 100644
--- a/net/bridge/br_stp.c
+++ b/net/bridge/br_stp.c
@@ -428,7 +428,6 @@ static void br_make_forwarding(struct net_bridge_port *p)
 	else
 		br_set_state(p, BR_STATE_LEARNING);

-	br_multicast_enable_port(p);
 	br_log_state(p);
 	br_ifinfo_notify(RTM_NEWLINK, p);

@@ -462,6 +461,8 @@ void br_port_state_selection(struct net_bridge *br)
 			}
 		}

+		if (p->state != BR_STATE_BLOCKING)
+			br_multicast_enable_port(p);
 		if (p->state == BR_STATE_FORWARDING)
 			++liveports;
 	}
--
2.4.3
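The resulting decision rule is small enough to state as a predicate. The sketch below is a userspace illustration only: enum toy_state and toy_should_enable_mcast() are invented names, the enum values are arbitrary rather than the kernel's BR_STATE_* values, and the disabled case is folded into the function even though the patch handles it earlier in the loop.

```c
#include <stdbool.h>

/* Hypothetical port states for illustration; the names echo the kernel's
 * BR_STATE_* constants but the values here are arbitrary. */
enum toy_state {
	TOY_DISABLED,
	TOY_LISTENING,
	TOY_LEARNING,
	TOY_FORWARDING,
	TOY_BLOCKING
};

/* Decide whether multicast should be enabled for a port during state
 * selection. Disabled ports are filtered out first, so they never
 * reach the blocking check. */
bool toy_should_enable_mcast(enum toy_state state)
{
	if (state == TOY_DISABLED)
		return false;
	return state != TOY_BLOCKING;
}
```

Because the check depends only on the port state, it applies the same way whether the state was set by the kernel's STP code or by a user-space daemon.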
[PATCH 1/1] ssb: remove unnecessary out label
This patch removes the unnecessary label 'out' and restructures the code to return the result of device_create_file() directly.

Signed-off-by: Maninder Singh maninder...@samsung.com
Reviewed-by: Rohit Thapliyal r.thapli...@samsung.com
---
 drivers/ssb/pci.c |    8 +-------
 1 files changed, 1 insertions(+), 7 deletions(-)

diff --git a/drivers/ssb/pci.c b/drivers/ssb/pci.c
index 0f28c08..d6ca4d3 100644
--- a/drivers/ssb/pci.c
+++ b/drivers/ssb/pci.c
@@ -1173,17 +1173,11 @@ void ssb_pci_exit(struct ssb_bus *bus)
 int ssb_pci_init(struct ssb_bus *bus)
 {
 	struct pci_dev *pdev;
-	int err;

 	if (bus->bustype != SSB_BUSTYPE_PCI)
 		return 0;

 	pdev = bus->host_pci;
 	mutex_init(&bus->sprom_mutex);
-	err = device_create_file(&pdev->dev, &dev_attr_ssb_sprom);
-	if (err)
-		goto out;
-
-out:
-	return err;
+	return device_create_file(&pdev->dev, &dev_attr_ssb_sprom);
 }
--
1.7.1
[PATCH] net: inet_diag: export IPV6_V6ONLY sockopt
For AF_INET6 sockets, the value of struct ipv6_pinfo.ipv6only is exported to userspace. It indicates whether an unbound socket listens on IPv4 as well as IPv6. Since the socket is natively IPv6, it is not listed by e.g. 'netstat -l -4'.

Signed-off-by: Phil Sutter p...@nwl.cc
---
This patch is accompanied by an appropriate one for iproute2 to enable the additional information in 'ss -e'.
---
 include/uapi/linux/inet_diag.h | 3 ++-
 net/ipv4/inet_diag.c           | 4 ++++
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/inet_diag.h b/include/uapi/linux/inet_diag.h
index c7093c7..9ca4834 100644
--- a/include/uapi/linux/inet_diag.h
+++ b/include/uapi/linux/inet_diag.h
@@ -111,9 +111,10 @@ enum {
 	INET_DIAG_SKMEMINFO,
 	INET_DIAG_SHUTDOWN,
 	INET_DIAG_DCTCPINFO,
+	INET_DIAG_SKV6ONLY,
 };

-#define INET_DIAG_MAX INET_DIAG_DCTCPINFO
+#define INET_DIAG_MAX INET_DIAG_SKV6ONLY

 /* INET_DIAG_MEM */

diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
index 4d32262..4bf6d03 100644
--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -151,6 +151,10 @@ int inet_sk_diag_fill(struct sock *sk, struct inet_connection_sock *icsk,
 		if (nla_put_u8(skb, INET_DIAG_TCLASS,
 			       inet6_sk(sk)->tclass) < 0)
 			goto errout;
+
+		if (nla_put_u8(skb, INET_DIAG_SKV6ONLY,
+			       inet6_sk(sk)->ipv6only) < 0)
+			goto errout;
 	}
 #endif

--
2.1.2
Re: [PATCH v2 1/3] net: mvneta: introduce compatible string marvell, armada-xp-neta
On Wed, Jun 17, 2015 at 05:01:12PM +, Jason Cooper wrote: Hi Gregory, On Wed, Jun 17, 2015 at 05:15:28PM +0200, Gregory CLEMENT wrote: On 17/06/2015 17:12, Gregory CLEMENT wrote: On 17/06/2015 15:19, Simon Guinot wrote: The mvneta driver supports the Ethernet IP found in the Armada 370, XP, 380 and 385 SoCs. Since at least one more hardware feature is available for the Armada XP SoCs then a way to identify them is needed. This patch introduces a new compatible string marvell,armada-xp-neta. Let's be future proof by going further. I would like to have an compatible string for each SoC even if we currently we don't use them. I disagree with this. We can't predict what incosistencies we'll discover in the future. We should only assign new compatible strings based on known IP variations when we discover them. This seems fraught with demons since we can't predict the scope of affected IP blocks (some steppings of one SoC, three SoCs plus two steppings of a fourth, etc) imho, the 'future-proofing' lies in being specific as to the naming of the compatible strings against known hardware variations at the time. So, should I add more compatible strings or not ? Simon signature.asc Description: Digital signature
Re: [PATCH net-next] inet_diag: Remove _bh suffix in inet_diag_dump_reqs().
On Thu, 2015-06-18 at 20:40 +0900, Hiroaki Shimoda wrote:
> inet_diag_dump_reqs() is called from inet_diag_dump_icsk() with BH
> disabled. So no need to disable BH in inet_diag_dump_reqs().
>
> Signed-off-by: Hiroaki Shimoda shimoda.hiro...@gmail.com
> ---
>  net/ipv4/inet_diag.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
> index 21985d8d41e7..4ca789ba63cb 100644
> --- a/net/ipv4/inet_diag.c
> +++ b/net/ipv4/inet_diag.c
> @@ -746,7 +746,7 @@ static int inet_diag_dump_reqs(struct sk_buff *skb, struct sock *sk,
>
>  	entry.family = sk->sk_family;
>
> -	spin_lock_bh(&icsk->icsk_accept_queue.syn_wait_lock);
> +	spin_lock(&icsk->icsk_accept_queue.syn_wait_lock);
>
>  	lopt = icsk->icsk_accept_queue.listen_opt;
>  	if (!lopt || !listen_sock_qlen(lopt))
> @@ -794,7 +794,7 @@ static int inet_diag_dump_reqs(struct sk_buff *skb, struct sock *sk,
>  	}
>
>  out:
> -	spin_unlock_bh(&icsk->icsk_accept_queue.syn_wait_lock);
> +	spin_unlock(&icsk->icsk_accept_queue.syn_wait_lock);
>
>  	return err;
>  }

Sure, although this will soon be removed completely when SYN_RECV sockets will be stored in regular ehash table.

Thanks
Re: [PATCH net-next RFC v2 2/3] ipv4: add support for light weight tunnel encap attributes
Hello,

On Thu, 18 Jun 2015, Roopa Prabhu wrote:
> @@ -366,6 +371,7 @@ static inline size_t fib_nlmsg_size(struct fib_info *fi)
>  	payload += nla_total_size((RTAX_MAX * nla_total_size(4)));
>
>  	if (fi->fib_nhs) {
> +		size_t nh_encapsize = 0;

Var not in #ifdef. Any warnings with CONFIG_LWTUNNEL=n?

>  		/* Also handles the special case fib_nhs == 1 */
>
>  		/* each nexthop is packed in an attribute */
> @@ -374,8 +380,23 @@ static inline size_t fib_nlmsg_size(struct fib_info *fi)
>  		/* may contain flow and gateway attribute */
>  		nhsize += 2 * nla_total_size(4);
>
> +#ifdef CONFIG_LWTUNNEL
> +		/* grab encap info */
> +		for_nexthops(fi) {
> +			if (nh->nh_lwtstate) {
> +				/* RTA_ENCAP_TYPE */
> +				nh_encapsize += lwtunnel_get_encap_size(
> +						nh->nh_lwtstate);

New labels not in #ifdef:

> +
> +err_inval:
> +	ret = -EINVAL;
> +
> +errout:
> +	return ret;
>  }

Some other places may need changes:

- nh_comp: there is logic that decides if same fib_info is reused from many fib nodes. There should be check if NH matches by nh_lwtstate.
- xfrm4_fill_dst: not sure about this but some fields are copied.

Regards

--
Julian Anastasov j...@ssi.bg
Re: [PATCH v2] fm10k: Report MAC address on driver load
On Thu, 2015-06-18 at 19:41 -0700, Alexander Duyck wrote:
> This change adds the MAC address to the list of values recorded on
> driver load. The MAC address represents the serial number of the unit
> and allows us to track the value should a card be replaced in a system.
>
> The log message should now be similar in output to that of ixgbe.
>
> Signed-off-by: Alexander Duyck alexander.h.du...@redhat.com
> ---
>
> v2: Moved printing of MAC onto separate line similar to ixgbe.
>
> (Hopefully this works for you Jeff. I took a look at the patch and
> just moved the bit I needed down. I figured since this block hasn't
> changed I should be able to get away with just doing this instead of
> pulling and rebasing off of your tree.)
>
>  drivers/net/ethernet/intel/fm10k/fm10k_pci.c |    3 +++
>  1 file changed, 3 insertions(+)

Works for me! I have added your updated patch to my queue.
Re: [PATCH net 0/2] bridge: multicast behaviour fixes
On Jun 17, 2015, at 2:28 PM, Nikolay Aleksandrov niko...@cumulusnetworks.com wrote: Hi, Patch 01 fixes a problem when a router is configured and a port goes through a link down/up, the router configuration was not restored. Patch 02 starts the multicast querier when using user-space STP and a port goes to forwarding state. These are behaviour fixes and if you think they are more appropriate for net-next, then feel free to apply them there, I've run them with both net and net-next. Also I've provided fixes tags, but since these are behaviour changes they're the initial implementations of these functions. Best regards, Nikolay Aleksandrov Satish Ashok (2): bridge: multicast: restore router configuration on port link down/up bridge: multicast: start querier timer when running user-space stp net/bridge/br_multicast.c | 4 net/bridge/br_stp.c | 3 +++ 2 files changed, 7 insertions(+) -- 2.4.3 Dave please drop this series, I’ll post the patches separately because I have to implement Herbert’s suggestion and would like to repurpose patch 02 for net-next. Thanks, Nik-- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] net: dsa: Allow configuration of CPU DSA port speeds/duplex
> Yes I do have debug too, but via sysfs (with eventually write access)
> for: GLOBAL1, GLOBAL2, cpu port registers, SerDes registers, PVIDs, and
> VTU. Not really standard though.

We should really get an implementation into mainline. There is no
point us all implementing our own. You say your code is not really
standard. Do you think it would get rejected if it was submitted? The
rules for debugfs are much more relaxed, so what i have should be
acceptable.

Andrew
Re: [PATCH net-next RFC v2 2/3] ipv4: add support for light weight tunnel encap attributes
On 6/19/15, 8:19 AM, Robert Shearman wrote:
> On 19/06/15 05:49, Roopa Prabhu wrote:
>> From: Roopa Prabhu ro...@cumulusnetworks.com
>>
>> Introduces two netlink attributes RTA_ENCAP_TYPE and RTA_ENCAP
>> to support attaching encap information to ipv4 routes.
>
> Surely RTA_ENCAP_TYPE should be part of RTA_ENCAP, since the type
> doesn't make sense without the data and vice versa?

I went back and forth on this, and started with what you are saying
above. But then I wanted the RTA_ENCAP netlink policy to be declared by
individual lwtunnel drivers. To determine which RTA_ENCAP netlink
policy to pick, you need to know the RTA_ENCAP policy type (or lwtunnel
type), which is encoded in RTA_ENCAP_TYPE. And I did not want to
introduce another level of nesting in RTA_ENCAP (because for nexthops
we are already 2 levels deep when parsing RTA_ENCAP).

Hence, fib code first looks for RTA_ENCAP, and if RTA_ENCAP is
specified, RTA_ENCAP_TYPE is a required attribute. My iproute2 patches
handle this and make sure there is an RTA_ENCAP_TYPE specified with
RTA_ENCAP.

thanks,
Roopa
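The lookup order described in this reply (check for RTA_ENCAP first, then require RTA_ENCAP_TYPE, then let the type pick the driver-specific parser) could be sketched in plain userspace C roughly as follows. All `fake_*` names, the table size, the type value 1 and the specific error codes are assumptions made for illustration; none of them are taken from the patch series.

```c
#include <errno.h>
#include <stddef.h>

#define FAKE_ENCAP_MAX 8	/* hypothetical number of encap types */

/* Hypothetical per-driver parser for the nested encap payload. */
typedef int (*fake_parse_fn)(const void *encap_data);

static int fake_parse_mpls(const void *encap_data)
{
	return encap_data ? 0 : -EINVAL;
}

/* Each tunnel driver would register its own parser under its type. */
static fake_parse_fn fake_parsers[FAKE_ENCAP_MAX] = {
	[1] = fake_parse_mpls,	/* type 1 is an arbitrary choice here */
};

/*
 * encap:      stands in for the RTA_ENCAP attribute (NULL if absent)
 * encap_type: stands in for RTA_ENCAP_TYPE (NULL if absent)
 */
static int fake_build_state(const void *encap,
			    const unsigned short *encap_type)
{
	if (!encap)
		return 0;		/* no encap requested */
	if (!encap_type)
		return -EINVAL;		/* encap data without a type */
	if (*encap_type >= FAKE_ENCAP_MAX || !fake_parsers[*encap_type])
		return -EOPNOTSUPP;	/* no driver for this type */
	return fake_parsers[*encap_type](encap);
}
```

Keeping the type outside the nested attribute means the generic code never has to look inside the payload before it knows which driver's policy applies.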
[PATCH] net: dsa: mv88e6xxx: split phy page accessors
Split mv88e6xxx_phy_page_read and mv88e6xxx_phy_page_write into two
functions each, one to acquire the smi_mutex and one to call the actual
read/write functions.

This will be useful to access registers such as Fiber/SERDES Control,
from setup code with SMI lock held.

Also rename their error labels to clear, since it is not only an error
path.

Signed-off-by: Vivien Didelot vivien.dide...@savoirfairelinux.com
---
 drivers/net/dsa/mv88e6xxx.c | 43 ---
 1 file changed, 32 insertions(+), 11 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
index bfe70ce..9caec51 100644
--- a/drivers/net/dsa/mv88e6xxx.c
+++ b/drivers/net/dsa/mv88e6xxx.c
@@ -2068,37 +2068,58 @@ int mv88e6xxx_switch_reset(struct dsa_switch *ds, bool ppu_active)
 	return 0;
 }
 
-int mv88e6xxx_phy_page_read(struct dsa_switch *ds, int port, int page, int reg)
+/* Must be called with SMI lock held */
+static int _mv88e6xxx_phy_page_read(struct dsa_switch *ds, int port, int page,
+				    int reg)
 {
-	struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
 	int ret;
 
-	mutex_lock(&ps->smi_mutex);
 	ret = _mv88e6xxx_phy_write_indirect(ds, port, 0x16, page);
 	if (ret < 0)
-		goto error;
+		goto clear;
 	ret = _mv88e6xxx_phy_read_indirect(ds, port, reg);
-error:
+clear:
 	_mv88e6xxx_phy_write_indirect(ds, port, 0x16, 0x0);
-	mutex_unlock(&ps->smi_mutex);
 	return ret;
 }
 
-int mv88e6xxx_phy_page_write(struct dsa_switch *ds, int port, int page,
-			     int reg, int val)
+int mv88e6xxx_phy_page_read(struct dsa_switch *ds, int port, int page, int reg)
 {
 	struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
 	int ret;
 
 	mutex_lock(&ps->smi_mutex);
+	ret = _mv88e6xxx_phy_page_read(ds, port, page, reg);
+	mutex_unlock(&ps->smi_mutex);
+
+	return ret;
+}
+
+/* Must be called with SMI lock held */
+static int _mv88e6xxx_phy_page_write(struct dsa_switch *ds, int port, int page,
+				     int reg, int val)
+{
+	int ret;
+
 	ret = _mv88e6xxx_phy_write_indirect(ds, port, 0x16, page);
 	if (ret < 0)
-		goto error;
-
+		goto clear;
 	ret = _mv88e6xxx_phy_write_indirect(ds, port, reg, val);
-error:
+clear:
 	_mv88e6xxx_phy_write_indirect(ds, port, 0x16, 0x0);
+	return ret;
+}
+
+int mv88e6xxx_phy_page_write(struct dsa_switch *ds, int port, int page,
+			     int reg, int val)
+{
+	struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
+	int ret;
+
+	mutex_lock(&ps->smi_mutex);
+	ret = _mv88e6xxx_phy_page_write(ds, port, page, reg, val);
 	mutex_unlock(&ps->smi_mutex);
+
 	return ret;
 }
-- 
2.4.3
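For readers outside the driver, the split this patch makes (an underscore-prefixed helper that expects the lock to be held, plus a public wrapper that takes it) can be illustrated with a small userspace sketch. Everything here is hypothetical: `struct fake_switch`, the flag standing in for `smi_mutex`, and the `-EPERM` check are illustration only and are not part of the driver.

```c
#include <errno.h>

/* Hypothetical stand-in for the switch state; not the driver's struct. */
struct fake_switch {
	int smi_locked;		/* 1 while the fake "SMI lock" is held */
	int page;		/* currently selected PHY page */
	int regs[4][32];	/* regs[page][reg], sizes are illustrative */
};

static void fake_smi_lock(struct fake_switch *sw)   { sw->smi_locked = 1; }
static void fake_smi_unlock(struct fake_switch *sw) { sw->smi_locked = 0; }

/* Must be called with the fake SMI lock held. */
static int _fake_phy_page_read(struct fake_switch *sw, int page, int reg)
{
	int ret;

	if (!sw->smi_locked)
		return -EPERM;	/* caller broke the locking contract */

	sw->page = page;		/* select the requested page */
	ret = sw->regs[sw->page][reg];
	sw->page = 0;			/* always restore page 0 */
	return ret;
}

/* Public accessor: takes the lock, then calls the unlocked helper. */
static int fake_phy_page_read(struct fake_switch *sw, int page, int reg)
{
	int ret;

	fake_smi_lock(sw);
	ret = _fake_phy_page_read(sw, page, reg);
	fake_smi_unlock(sw);

	return ret;
}
```

Setup code that already holds the lock would call `_fake_phy_page_read()` directly, which is the use case the commit message mentions for the Fiber/SERDES Control register.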
Re: [PATCH net-next RFC v2 1/3] lwt: infrastructure to support light weight tunnels
On 6/19/15, 7:43 AM, Robert Shearman wrote: diff --git a/include/linux/lwtunnel.h b/include/linux/lwtunnel.h new file mode 100644 snip +/* lw tunnel state flags */ +#define LWTUNNEL_STATE_OUTPUT_REDIRECT 0x1 + +#define lwtunnel_output_redirect(lwtstate) (lwtstate \ +(lwtstate-flags LWTUNNEL_STATE_OUTPUT_REDIRECT)) This could be made an inline function for type-safety. ack + +struct lwtunnel_state { +__u16type; +__u16flags; +atomic_trefcnt; +struct lwtunnel_hdr tunnel; +}; + +struct lwtunnel_net { +struct hlist_head tunnels[LWTUNNEL_HASH_SIZE]; +}; This type doesn't appear to be used in this patch series. Do you intend to use it in a future version? ack, will get rid of it + +static inline struct lwtunnel_state *lwtunnel_skb_lwstate(struct sk_buff *skb) +{ +struct rtable *rt = (struct rtable *)skb_dst(skb); + +return rt-rt_lwtstate; +} It doesn't look like this patch will build on its own because rt_lwtstate isn't added to struct rtable until patch 2. looks like i messed up the patch creation. I will fix that with the next series. More importantly, is it safe to assume that skb_dst will always return an IPv4 dst? How will this look when IPv6 support is added? Today lwtunnel_skb_lwstate is called from lwtunnel_output which is only called from ipv4 code. And my ipv6 variant code was supposed to have a 6 suffix. something like lwtunnel_output6. Or to be more explicit i will probably have variants of the output and skb handling functions like, lwtunnel_output_ipv4 and lwtunnel_output_ipv6. + +ret = -EOPNOTSUPP; +nest = nla_nest_start(skb, RTA_ENCAP); Again, it doesn't look like this will build since RTA_ENCAP isn't added until patch 2. ack, sorry abt the patch ordering. will fix it. Thanks for the review. -- To unsubscribe from this list: send the line unsubscribe netdev in
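The review suggestion to turn `lwtunnel_output_redirect()` into an inline function might look roughly like this. It is a sketch only: `struct lwtunnel_state_sketch` is a reduced stand-in for the struct in the patch, and the `_sketch` function name is invented here so it is not mistaken for what was eventually merged.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LWTUNNEL_STATE_OUTPUT_REDIRECT 0x1

/* Reduced stand-in for struct lwtunnel_state (only the fields used). */
struct lwtunnel_state_sketch {
	uint16_t type;
	uint16_t flags;
};

/*
 * Same test as the macro, but the compiler now checks that the
 * argument really is a pointer to the state struct.
 */
static inline bool
lwtunnel_output_redirect_sketch(const struct lwtunnel_state_sketch *lwtstate)
{
	return lwtstate &&
	       (lwtstate->flags & LWTUNNEL_STATE_OUTPUT_REDIRECT);
}
```

With the macro, passing the wrong pointer type would only fail if the struct happened to lack a `flags` member; the inline version rejects it at the call site.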
Re: [PATCH 1/1] ssb: remove unncessary out label
On Fri, 19 Jun 2015 14:23:45 +0530 Maninder Singh maninder...@samsung.com wrote:
> This patch removes unnecessary label out and some restructuring for
> using device_create_file directly.
>
> Signed-off-by: Maninder Singh maninder...@samsung.com
> Reviewed-by: Rohit Thapliyal r.thapli...@samsung.com
> ---
>  drivers/ssb/pci.c |    8 +---
>  1 files changed, 1 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/ssb/pci.c b/drivers/ssb/pci.c
> index 0f28c08..d6ca4d3 100644
> --- a/drivers/ssb/pci.c
> +++ b/drivers/ssb/pci.c
> @@ -1173,17 +1173,11 @@ void ssb_pci_exit(struct ssb_bus *bus)
>  int ssb_pci_init(struct ssb_bus *bus)
>  {
>  	struct pci_dev *pdev;
> -	int err;
>
>  	if (bus->bustype != SSB_BUSTYPE_PCI)
>  		return 0;
>
>  	pdev = bus->host_pci;
>  	mutex_init(&bus->sprom_mutex);
> -	err = device_create_file(&pdev->dev, &dev_attr_ssb_sprom);
> -	if (err)
> -		goto out;
> -
> -out:
> -	return err;
> +	return device_create_file(&pdev->dev, &dev_attr_ssb_sprom);
>  }

I don't really think this change adds any value, but if you insist on
it you get my acked-by.

-- 
Michael
Re: [PATCH net-next RFC v2 2/3] ipv4: add support for light weight tunnel encap attributes
On 19/06/15 05:49, Roopa Prabhu wrote:
> From: Roopa Prabhu ro...@cumulusnetworks.com
>
> Introduces two netlink attributes RTA_ENCAP_TYPE and RTA_ENCAP
> to support attaching encap information to ipv4 routes.

Surely RTA_ENCAP_TYPE should be part of RTA_ENCAP, since the type
doesn't make sense without the data and vice versa?

> RTA_ENCAP is a nested attribute as suggested by Thomas (and also as
> Robert had it in his series). RTA_ENCAP netlink policy is declared by
> the light weight tunnel drivers that support this encap type.
>
> fib code calls the following for each nexthop:
> - new route handler: lwt build state (that parses RTA_ENCAP and
>   returns lwt state that lives in every fib_nh)
> - del dump handler: lwt release handler to release lwt state data
> - route dump handler: lwt dump encap to fill RTA_ENCAP data
> - during input route lookup sets dst->output to lwtunnel_output,
>   which in turn calls the corresponding lwt tunnel output function
>   which applies the required encap and xmits the packet
>
> Signed-off-by: Roopa Prabhu ro...@cumulusnetworks.com

Thanks,
Rob
Re: [PATCH] xen-netback: fix a BUG() during initialization
On Fri, Jun 19, 2015 at 02:21:51PM +0200, Imre Palik wrote:
> From: Palik, Imre im...@amazon.de
>
> Commit edafc132baac ("xen-netback: making the bandwidth limiter runtime
> settable") introduced the capability to change the bandwidth rate limit
> at runtime. But it also introduced a possible crashing bug.
>
> If netback receives two XenbusStateConnected without getting the
> hotplug-status watch firing in between, then it will try to register
> the watches for the rate limiter again. But this triggers a BUG() in
> the watch registration code.
>
> The fix modifies connect() to remove the possibly existing packet-rate
> watches before trying to install those watches. This behaviour is in
> line with how connect() deals with the hotplug-status watch.
>
> Signed-off-by: Imre Palik im...@amazon.de
> Cc: Matt Wilson m...@amazon.com

Acked-by: Wei Liu wei.l...@citrix.com

> ---
>  drivers/net/xen-netback/xenbus.c |    4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
> index 968787a..ec383b0 100644
> --- a/drivers/net/xen-netback/xenbus.c
> +++ b/drivers/net/xen-netback/xenbus.c
> @@ -681,6 +681,9 @@ static int xen_register_watchers(struct xenbus_device *dev, struct xenvif *vif)
>  	char *node;
>  	unsigned maxlen = strlen(dev->nodename) + sizeof("/rate");
>
> +	if (vif->credit_watch.node)
> +		return -EADDRINUSE;
> +
>  	node = kmalloc(maxlen, GFP_KERNEL);
>  	if (!node)
>  		return -ENOMEM;
> @@ -770,6 +773,7 @@ static void connect(struct backend_info *be)
>  	}
>
>  	xen_net_read_rate(dev, &credit_bytes, &credit_usec);
> +	xen_unregister_watchers(be->vif);
>  	xen_register_watchers(dev, be->vif);
>  	read_xenbus_vif_flags(be);
>
> -- 
> 1.7.9.5
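The two halves of this fix (refuse a second registration, and have connect() drop any existing watch before registering) can be modelled in userspace as a rough sketch. The `fake_*` names and the use of `malloc()` in place of `kmalloc()` are assumptions for illustration; only the shape of the logic follows the patch.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the rate-limit watch. */
struct fake_watch {
	char *node;	/* NULL while no watch is registered */
};

static int fake_register_watch(struct fake_watch *w, const char *nodename)
{
	/* sizeof("/rate") counts the trailing NUL, so this fits exactly. */
	size_t maxlen = strlen(nodename) + sizeof("/rate");

	if (w->node)
		return -EADDRINUSE;	/* already registered: refuse */

	w->node = malloc(maxlen);
	if (!w->node)
		return -ENOMEM;
	snprintf(w->node, maxlen, "%s/rate", nodename);
	return 0;
}

static void fake_unregister_watch(struct fake_watch *w)
{
	free(w->node);		/* free(NULL) is a no-op */
	w->node = NULL;
}

/* Safe to call repeatedly: always starts from an unregistered state. */
static int fake_connect(struct fake_watch *w, const char *nodename)
{
	fake_unregister_watch(w);
	return fake_register_watch(w, nodename);
}
```

Calling `fake_connect()` twice in a row succeeds both times, whereas two bare `fake_register_watch()` calls would fail on the second with `-EADDRINUSE` instead of crashing.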
Re: [PATCH net-next 00/15] Simplify netfilter and network namespaces
Julian Anastasov j...@ssi.bg writes: Hello, On Thu, 18 Jun 2015, Eric W. Biederman wrote: My incremental patch for ipvs on top of everything else I have pushed out looks like this: From: Eric W. Biederman ebied...@xmission.com Date: Fri, 12 Jun 2015 18:34:12 -0500 Subject: [PATCH] ipvs: Pass struct net down to where it is needed and used Pass struct net down to where it is used and stop guessing which network namespace should be used. At first look patch is ok. But I'm not sure for the changes in ip_vs_xmit.c. Can you explain in 2-3 lines, when can we see different netns? Is it when packet is forwarded to output device and it is part from another netns? I'm asking because these __ip_vs_get_out_rt* calls in ip_vs_xmit.c can reroute packet to another device... The driver was ensure_mtu_is_adequate where in one half of the function we have skb_net another half we have dev_net(dst_skb(skb)-dev).That is goofy. That gets replaced by ip_vs_conn_net(cp). In practice today I don't see that there are differences in network namespaces today. Moving forward I hope to be able to route packets out to network devices in different network namespaces. It is a massive optimization that doesn't need much code to support. Once that optimization is in place deriving the network namespace from the output device will not work. I believe using ip_vs_conn_net(cp) is a simple rule that is easy to understand and easy to implement correctly on the output path. Also, skb_sknet is another candidate for removal. But I can take care about it after your changes are pushed... *nod* Eric -- To unsubscribe from this list: send the line unsubscribe netdev in
Re: [PATCH net-next RFC v2 0/3] light weight tunnel infrastructure and driver
On 19/06/15 05:49, Roopa Prabhu wrote:
> From: Roopa Prabhu ro...@cumulusnetworks.com
>
> This series implements infrastructure for light weight tunnels to
> support mpls label edge routers (ie mpls ip tunnels). As previously
> discussed having netdevices will not scale. Hence this series
> introduces new RTA_ENCAP* attributes to attach encap information with
> routes (following suggestion from Eric Biederman).

Looks promising, thanks for posting this series Roopa!

> The first patch introduces an infrastructure to support light weight
> tunnels that don't have netdevices. The infrastructure allows tunnel
> drivers to register handlers to parse and build tunnel encap data
> which can be attached to each route nexthop.
>
> The second patch adds support in ipv4 fib to carry such light weight
> tunnel encap data.

I presume this isn't ready to be merged until IPv6 is done, right?

> The third patch implements mpls ip tunnels using this light weight
> tunnel infrastructure.
>
> Could not think of a better name, so, it is 'lwt' for 'light weight
> tunnels' for now.

I can't think of a better name either.

Thanks,
Rob
Re: [PATCH net-next RFC v2 0/3] light weight tunnel infrastructure and driver
On 6/19/15, 7:38 AM, Robert Shearman wrote:
>> This series implements infrastructure for light weight tunnels to
>> support mpls label edge routers (ie mpls ip tunnels). As previously
>> discussed having netdevices will not scale. Hence this series
>> introduces new RTA_ENCAP* attributes to attach encap information with
>> routes (following suggestion from Eric Biederman).
>
> Looks promising, thanks for posting this series Roopa!
>
>> The first patch introduces an infrastructure to support light weight
>> tunnels that don't have netdevices. The infrastructure allows tunnel
>> drivers to register handlers to parse and build tunnel encap data
>> which can be attached to each route nexthop.
>>
>> The second patch adds support in ipv4 fib to carry such light weight
>> tunnel encap data.
>
> I presume this isn't ready to be merged until IPv6 is done, right?

yes, I will be adding ipv6 support soon. I will post the next non-RFC
series with the ipv6 changes.

thanks,
Roopa
[PATCH net] netfilter: nftables: Do not run chains in the wrong network namespace
Currently nf_tables chains added in one network namespace are being
run in all network namespaces. The issues are myriad with the simplest
being an unprivileged user can cause any network packets to be
dropped.

Address this by simply not running nf_tables chains in the wrong
network namespace.

Cc: sta...@vger.kernel.org
Signed-off-by: Eric W. Biederman ebied...@xmission.com
---
 net/netfilter/nf_tables_core.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/net/netfilter/nf_tables_core.c b/net/netfilter/nf_tables_core.c
index f153b07073af..f77bad46ac68 100644
--- a/net/netfilter/nf_tables_core.c
+++ b/net/netfilter/nf_tables_core.c
@@ -114,7 +114,8 @@ unsigned int
 nft_do_chain(struct nft_pktinfo *pkt, const struct nf_hook_ops *ops)
 {
 	const struct nft_chain *chain = ops->priv, *basechain = chain;
-	const struct net *net = read_pnet(&nft_base_chain(basechain)->pnet);
+	const struct net *chain_net = read_pnet(&nft_base_chain(basechain)->pnet);
+	const struct net *net = dev_net(pkt->in ? pkt->in : pkt->out);
 	const struct nft_rule *rule;
 	const struct nft_expr *expr, *last;
 	struct nft_regs regs;
@@ -124,6 +125,10 @@ nft_do_chain(struct nft_pktinfo *pkt, const struct nf_hook_ops *ops)
 	int rulenum;
 	unsigned int gencursor = nft_genmask_cur(net);
 
+	/* Ignore chains that are not for the current network namespace */
+	if (!net_eq(net, chain_net))
+		return NF_ACCEPT;
+
 do_chain:
 	rulenum = 0;
 	rule = list_entry(&chain->rules, struct nft_rule, list);
-- 
2.2.1
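The guard this patch adds can be modelled outside the kernel as a rough sketch: take the namespace from the packet's device, compare it with the namespace the chain was created in, and let the packet through untouched when they differ. The `fake_*` types and verdict values below are invented for illustration and do not match the kernel's definitions.

```c
#include <stddef.h>

enum { FAKE_DROP = 0, FAKE_ACCEPT = 1 };

/* Hypothetical stand-ins for struct net, net_device and nft_pktinfo. */
struct fake_net  { int id; };
struct fake_dev  { const struct fake_net *net; };
struct fake_pkt  { const struct fake_dev *in, *out; };

/* A chain remembers the namespace it was added in. */
struct fake_chain {
	const struct fake_net *net;
	int verdict;	/* what the chain's rules would decide */
};

static int fake_do_chain(const struct fake_pkt *pkt,
			 const struct fake_chain *chain)
{
	/* Namespace of the packet: input device if set, else output. */
	const struct fake_net *net = (pkt->in ? pkt->in : pkt->out)->net;

	/* Ignore chains that belong to a different namespace. */
	if (net != chain->net)
		return FAKE_ACCEPT;

	return chain->verdict;
}
```

A chain that drops everything in one namespace then has no effect on a packet whose device lives in another, which is the isolation the commit message is after.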
Re: [PATCH net-next RFC v2 1/3] lwt: infrastructure to support light weight tunnels
On 19/06/15 05:49, Roopa Prabhu wrote: From: Roopa Prabhu ro...@cumulusnetworks.com provides ops to parse, build and output encaped packets for drivers that want to attach tunnel encap information to routes. Signed-off-by: Roopa Prabhu ro...@cumulusnetworks.com --- include/linux/lwtunnel.h |6 ++ include/net/lwtunnel.h| 84 + include/uapi/linux/lwtunnel.h | 11 +++ net/Kconfig |5 ++ net/core/Makefile |1 + net/core/lwtunnel.c | 162 + 6 files changed, 269 insertions(+) create mode 100644 include/linux/lwtunnel.h create mode 100644 include/net/lwtunnel.h create mode 100644 include/uapi/linux/lwtunnel.h create mode 100644 net/core/lwtunnel.c diff --git a/include/linux/lwtunnel.h b/include/linux/lwtunnel.h new file mode 100644 index 000..97f32f8 --- /dev/null +++ b/include/linux/lwtunnel.h @@ -0,0 +1,6 @@ +#ifndef _LINUX_LWTUNNEL_H_ +#define _LINUX_LWTUNNEL_H_ + +#include uapi/linux/lwtunnel.h + +#endif /* _LINUX_LWTUNNEL_H_ */ diff --git a/include/net/lwtunnel.h b/include/net/lwtunnel.h new file mode 100644 index 000..649da3c --- /dev/null +++ b/include/net/lwtunnel.h @@ -0,0 +1,84 @@ +#ifndef __NET_LWTUNNEL_H +#define __NET_LWTUNNEL_H 1 + +#include linux/lwtunnel.h +#include linux/netdevice.h +#include linux/skbuff.h +#include linux/types.h +#include net/dsfield.h +#include net/ip.h +#include net/rtnetlink.h + +#define LWTUNNEL_HASH_BITS 7 +#define LWTUNNEL_HASH_SIZE (1 LWTUNNEL_HASH_BITS) + +struct lwtunnel_hdr { + int len; + __u8data[0]; +}; + +/* lw tunnel state flags */ +#define LWTUNNEL_STATE_OUTPUT_REDIRECT 0x1 + +#define lwtunnel_output_redirect(lwtstate) (lwtstate \ + (lwtstate-flags LWTUNNEL_STATE_OUTPUT_REDIRECT)) This could be made an inline function for type-safety. + +struct lwtunnel_state { + __u16 type; + __u16 flags; + atomic_trefcnt; + struct lwtunnel_hdr tunnel; +}; + +struct lwtunnel_net { + struct hlist_head tunnels[LWTUNNEL_HASH_SIZE]; +}; This type doesn't appear to be used in this patch series. Do you intend to use it in a future version? 
+ +struct lwtunnel_encap_ops { + int (*build_state)(struct net_device *dev, struct nlattr *encap, + struct lwtunnel_state **ts); + int (*output)(struct sock *sk, struct sk_buff *skb); + int (*fill_encap)(struct sk_buff *skb, + struct lwtunnel_state *lwtstate); + int (*get_encap_size)(struct lwtunnel_state *lwtstate); +}; + +#define MAX_LWTUNNEL_ENCAP_OPS 8 +extern const struct lwtunnel_encap_ops __rcu * + lwtun_encaps[MAX_LWTUNNEL_ENCAP_OPS]; + +static inline void lwtunnel_state_get(struct lwtunnel_state *lws) +{ + atomic_inc(lws-refcnt); +} + +static inline void lwtunnel_state_put(struct lwtunnel_state *lws) +{ + if (!lws) + return; + + if (atomic_dec_and_test(lws-refcnt)) + kfree(lws); +} + +static inline struct lwtunnel_state *lwtunnel_skb_lwstate(struct sk_buff *skb) +{ + struct rtable *rt = (struct rtable *)skb_dst(skb); + + return rt-rt_lwtstate; +} It doesn't look like this patch will build on its own because rt_lwtstate isn't added to struct rtable until patch 2. More importantly, is it safe to assume that skb_dst will always return an IPv4 dst? How will this look when IPv6 support is added? + +int lwtunnel_encap_add_ops(const struct lwtunnel_encap_ops *op, + unsigned int num); +int lwtunnel_encap_del_ops(const struct lwtunnel_encap_ops *op, + unsigned int num); +int lwtunnel_build_state(struct net_device *dev, u16 encap_type, +struct nlattr *encap, +struct lwtunnel_state **lws); +int lwtunnel_fill_encap(struct sk_buff *skb, + struct lwtunnel_state *lwtstate); +int lwtunnel_get_encap_size(struct lwtunnel_state *lwtstate); +struct lwtunnel_state *lwtunnel_state_alloc(int hdr_len); +int lwtunnel_output(struct sock *sk, struct sk_buff *skb); + +#endif /* __NET_LWTUNNEL_H */ ... 
diff --git a/net/core/lwtunnel.c b/net/core/lwtunnel.c new file mode 100644 index 000..29c7802 --- /dev/null +++ b/net/core/lwtunnel.c @@ -0,0 +1,162 @@ +/* + * lwtunnelInfrastructure for light weight tunnels like mpls + * + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + * + */ +#define pr_fmt(fmt) KBUILD_MODNAME : fmt + +#include linux/capability.h +#include linux/module.h +#include linux/types.h +#include linux/kernel.h +#include linux/slab.h
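The `lwtunnel_state_get()` / `lwtunnel_state_put()` helpers quoted in this patch follow the usual reference-counting pattern: the last put frees the object, and put tolerates NULL. A userspace approximation using C11 atomics is sketched below. The `fake_*` names, the initial count of 1 set by the allocator, and the `fake_frees` counter (present only so the behaviour can be observed) are assumptions, not details taken from the patch.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct lwtunnel_state. */
struct fake_state {
	atomic_int refcnt;
};

static int fake_frees;	/* counts frees, for demonstration only */

/* Assumption: a new state starts with one reference held. */
static struct fake_state *fake_state_alloc(void)
{
	struct fake_state *s = calloc(1, sizeof(*s));

	if (s)
		atomic_init(&s->refcnt, 1);
	return s;
}

static void fake_state_get(struct fake_state *s)
{
	atomic_fetch_add(&s->refcnt, 1);
}

static void fake_state_put(struct fake_state *s)
{
	if (!s)
		return;

	/* atomic_fetch_sub returns the old value; 1 means we were last. */
	if (atomic_fetch_sub(&s->refcnt, 1) == 1) {
		free(s);
		fake_frees++;
	}
}
```

Each holder of the pointer (for example every nexthop that shares a state) would pair one get with one put, so the object lives exactly as long as its last user.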
Re: [PATCH 3/3] net: dsa: Allow configuration of CPU DSA port speeds/duplex
Hi Andrew, On Jun 17, 2015, at 9:11 PM, Andrew Lunn and...@lunn.ch wrote: On Wed, Jun 17, 2015 at 02:09:52PM -0400, Vivien Didelot wrote: Hi Andrew, All, On 12/06/15 10:18, Andrew Lunn wrote: By default, DSA and CPU ports are configured to the maximum speed the switch supports. However there can be use cases where the peer device port is slower. Allow a fixed-link property to be used with the DSA and CPU port in the device tree, and use this information to configure the port. Would it be a good idea for DSA to expose the cpu port to userspace as well? That way, it'd be possible to use ethtool to set the port speed and duplex mode, or dump registers (this would have saved me quite some time in dev). I have code which expose these via debugfs. So far, i have all registers, stats, ATU, and the scratch registers. For the patches to apply cleanly, they depend on these patches, so i've not posted them yet. Yes I do have debug too, but via sysfs (with eventually write access) for: GLOBAL1, GLOBAL2, cpu port registers, SerDes registers, PVIDs, and VTU. Not really standard though. I'm not strongly against having a CPU port, but i don't particularly like having the CPU port as an interface. And when you get to cascaded switches, the DSA ports are also interesting, so should we also have a netdev for them? But they are equally useless for transferring frames from the host as the CPU port. This is why i went for debugfs. Also, in my RFC for 802.1Q support [1], I assume the CPU port to be a tagged member of each VLAN. But someone may want to add a VLAN with swp3 and swp4 only, and another VLAN with swp0, swp1 and the CPU port. Am I correct? The DSA concept is that switch ports are separate interfaces. So adding a VLAN to two ports does to automatically bridge those ports together. You need to add them to a bridge as well before VLAN tagged frames are bridged between ports. 
My point was to expose all configurable ports with the same standard interface (netdev, like any other virtual switch port). But indeed, their uselessness for data transfer can be a good reason not to do it. Thanks, -v -- To unsubscribe from this list: send the line unsubscribe netdev in
Re: [PATCH net-next RFC v2 2/3] ipv4: add support for light weight tunnel encap attributes
On 19/06/15 15:19, roopa wrote: On 6/18/15, 11:59 PM, Julian Anastasov wrote: Hello, On Thu, 18 Jun 2015, Roopa Prabhu wrote: @@ -366,6 +371,7 @@ static inline size_t fib_nlmsg_size(struct fib_info *fi) payload += nla_total_size((RTAX_MAX * nla_total_size(4))); if (fi-fib_nhs) { +size_t nh_encapsize = 0; Var not in #ifdef. Any warnings with CONFIG_LWTUNNEL=n? /* Also handles the special case fib_nhs == 1 */ /* each nexthop is packed in an attribute */ @@ -374,8 +380,23 @@ static inline size_t fib_nlmsg_size(struct fib_info *fi) /* may contain flow and gateway attribute */ nhsize += 2 * nla_total_size(4); +#ifdef CONFIG_LWTUNNEL +/* grab encap info */ +for_nexthops(fi) { +if (nh-nh_lwtstate) { +/* RTA_ENCAP_TYPE */ +nh_encapsize += lwtunnel_get_encap_size( +nh-nh_lwtstate); New labels not in #ifdef: Will check and fix all warnings with CONFIG_LWTUNNEL off + +err_inval: +ret = -EINVAL; + +errout: +return ret; } Some other places may need changes: - nh_comp: there is logic that decides if same fib_info is reused from many fib nodes. There should be check if NH matches by nh_lwtstate. yes, i will add that. One other place - fib_nh_match. This is used when deleting a route to verify that any supplied rtnetlink properties match the route in the fib. Thanks, Rob -- To unsubscribe from this list: send the line unsubscribe netdev in
Re: [PATCH net-next RFC v2 2/3] ipv4: add support for light weight tunnel encap attributes
On 6/19/15, 7:55 AM, Robert Shearman wrote:
> On 19/06/15 15:19, roopa wrote:
>> On 6/18/15, 11:59 PM, Julian Anastasov wrote:
>>> Some other places may need changes:
>>>
>>> - nh_comp: there is logic that decides if same fib_info is reused
>>>   from many fib nodes. There should be check if NH matches by
>>>   nh_lwtstate.
>>
>> yes, i will add that.
>
> One other place - fib_nh_match. This is used when deleting a route to
> verify that any supplied rtnetlink properties match the route in the
> fib.

ack, thanks!.
Re: [PATCH next v3] bonding: Display LACP info only to CAP_NET_ADMIN capable user
On Thu, Jun 18, 2015 at 8:00 PM, Andy Gospodarek go...@cumulusnetworks.com wrote: On Thu, Jun 18, 2015 at 11:30:54AM -0700, Mahesh Bandewar wrote: Actor and Partner details can be accessed via proc-fs, sys-fs entries or netlink interface. These interfaces are world readable at this moment. The earlier patch-series made the LACP communication secure to avoid nuisance attack from within the same L2 domain but it did not prevent someone unprivileged looking at that information on host and perform the same act. This patch essentially avoids spitting those entries if the user in question does not have enough privileges. Signed-off-by: Mahesh Bandewar mahe...@google.com --- drivers/net/bonding/bond_netlink.c | 23 + drivers/net/bonding/bond_procfs.c | 101 +++-- drivers/net/bonding/bond_sysfs.c | 12 ++--- 3 files changed, 71 insertions(+), 65 deletions(-) [...] diff --git a/drivers/net/bonding/bond_procfs.c b/drivers/net/bonding/bond_procfs.c index e7f3047a26df..f514fe5e80a5 100644 --- a/drivers/net/bonding/bond_procfs.c +++ b/drivers/net/bonding/bond_procfs.c [...] 
@@ -199,33 +202,35 @@ static void bond_info_show_slave(struct seq_file *seq, seq_printf(seq, Partner Churned Count: %d\n, port-churn_partner_count); - seq_puts(seq, details actor lacp pdu:\n); - seq_printf(seq, system priority: %d\n, -port-actor_system_priority); - seq_printf(seq, system mac address: %pM\n, -port-actor_system); - seq_printf(seq, port key: %d\n, -port-actor_oper_port_key); - seq_printf(seq, port priority: %d\n, -port-actor_port_priority); - seq_printf(seq, port number: %d\n, -port-actor_port_number); - seq_printf(seq, port state: %d\n, -port-actor_oper_port_state); - - seq_puts(seq, details partner lacp pdu:\n); - seq_printf(seq, system priority: %d\n, -port-partner_oper.system_priority); - seq_printf(seq, system mac address: %pM\n, -port-partner_oper.system); - seq_printf(seq, oper key: %d\n, -port-partner_oper.key); - seq_printf(seq, port priority: %d\n, -port-partner_oper.port_priority); - seq_printf(seq, port number: %d\n, -port-partner_oper.port_number); - seq_printf(seq, port state: %d\n, -port-partner_oper.port_state); + if (capable(CAP_NET_ADMIN)) { + seq_puts(seq, details actor lacp pdu:\n); + seq_printf(seq, system priority: %d\n, +port-actor_system_priority); + seq_printf(seq, system mac address: %pM\n, +port-actor_system); + seq_printf(seq, port key: %d\n, +port-actor_oper_port_key); + seq_printf(seq, port priority: %d\n, +port-actor_port_priority); + seq_printf(seq, port number: %d\n, +port-actor_port_number); + seq_printf(seq, port state: %d\n, +port-actor_oper_port_state); + + seq_puts(seq, details partner lacp pdu:\n); + seq_printf(seq, system priority: %d\n, + port-partner_oper.system_priority); + seq_printf(seq, system mac address: %pM\n, +port-partner_oper.system); + seq_printf(seq, oper key: %d\n, +port-partner_oper.key); + seq_printf(seq, port priority: %d\n, +port-partner_oper.port_priority); + seq_printf(seq, port number: %d\n, +port-partner_oper.port_number); + seq_printf(seq, port state: %d\n, +
[PATCH 09/12] netfilter: use forward declaration instead of including linux/proc_fs.h
We don't need to pull the full definitions in that file, a simple
forward declaration is enough. Moreover, include linux/proc_fs.h from
nf_synproxy_core, otherwise this hits a compilation error due to
missing declarations, ie.

net/netfilter/nf_synproxy_core.c: In function ‘synproxy_proc_init’:
net/netfilter/nf_synproxy_core.c:326:2: error: implicit declaration of function ‘proc_create’ [-Werror=implicit-function-declaration]
  if (!proc_create("synproxy", S_IRUGO, net->proc_net_stat,
  ^

Signed-off-by: Pablo Neira Ayuso pa...@netfilter.org
Signed-off-by: Eric W. Biederman ebied...@xmission.com
---
 include/net/netns/netfilter.h    |    2 +-
 net/netfilter/nf_synproxy_core.c |    1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/net/netns/netfilter.h b/include/net/netns/netfilter.h
index 8874002..cf25b5e 100644
--- a/include/net/netns/netfilter.h
+++ b/include/net/netns/netfilter.h
@@ -1,9 +1,9 @@
 #ifndef __NETNS_NETFILTER_H
 #define __NETNS_NETFILTER_H
 
-#include <linux/proc_fs.h>
 #include <linux/netfilter.h>
 
+struct proc_dir_entry;
 struct nf_logger;
 
 struct netns_nf {
diff --git a/net/netfilter/nf_synproxy_core.c b/net/netfilter/nf_synproxy_core.c
index 52e20c9..789feea 100644
--- a/net/netfilter/nf_synproxy_core.c
+++ b/net/netfilter/nf_synproxy_core.c
@@ -11,6 +11,7 @@
 #include <asm/unaligned.h>
 #include <net/tcp.h>
 #include <net/netns/generic.h>
+#include <linux/proc_fs.h>
 
 #include <linux/netfilter_ipv4/ip_tables.h>
 #include <linux/netfilter/x_tables.h>
-- 
1.7.10.4
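Why a forward declaration is enough for the header can be shown with a small standalone sketch: a struct that only stores a pointer to another type needs just the type's name, while code that dereferences the pointer needs the full definition. The `_sketch` names below are hypothetical and only mirror the roles of `struct proc_dir_entry` and `struct netns_nf`.

```c
#include <stddef.h>

/* "Header" part: the name alone is enough to declare a pointer. */
struct proc_dir_entry_sketch;

struct netns_sketch {
	struct proc_dir_entry_sketch *proc_dir;
};

/*
 * "Source file" part: only code that looks inside the object needs the
 * full definition, which is why the patch adds the include to
 * nf_synproxy_core.c instead of keeping it in the header.
 */
struct proc_dir_entry_sketch {
	const char *name;
};

static const char *netns_sketch_proc_name(const struct netns_sketch *ns)
{
	return ns->proc_dir ? ns->proc_dir->name : NULL;
}
```

The payoff is in build dependencies: every file that includes the header no longer pulls in the full proc filesystem definitions just to hold a pointer.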
[PATCH 00/12] Netfilter updates for net-next
Hi David, The following patchset contains a final Netfilter pull request for net-next 4.2. This mostly addresses some fallout from the previous pull request, small netns updates and a couple of new features for nfnetlink_log and the socket match that didn't get in time for the previous pull request. More specifically they are: 1) Add security context information to nfnetlink_queue, from Roman Kubiak. 2) Add support to restore the sk_mark into skb-mark through xt_socket, from Harout Hedeshian. 3) Force alignment of 16 bytes of per cpu xt_counters, from Eric Dumazet. 4) Rename br_netfilter.c to br_netfilter_hooks.c to prepare split of IPv6 code into a separated file. 5) Move the IPv6 code in br_netfilter into a separated file. 6) Remove unused RCV_SKB_FAIL() in nfnetlink_queue and nfetlink_log, from Eric Biederman. 7) Two liner to simplify netns logic in em_ipset_match(). 8) Add missing includes to net/net_namespace.h to avoid compilation problems that result from not including linux/netfilter.h in netns headers. 9) Use a forward declaration instead of including linux/proc_fs.h from netns/netfilter.h 10) Add a new linux/netfilter_defs.h to replace the linux/netfilter.h inclusion in netns headers. 11) Remove spurious netfilter.h file included in the net tree, also from Eric Biederman. 12) Fix x_tables compilation warnings on 32 bits platforms that resulted from recent changes in x_tables counters, from Florian Westphal. You can pull these changes from: git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next.git Thanks! 
The following changes since commit 89d256bb69f2596c3a31ac51466eac9e1791c388:

  bpf: disallow bpf tc programs access current->pid,uid (2015-06-15 20:51:20 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next.git master

for you to fetch changes up to dcb8f5c8139ef945cdfd55900fae265c4dbefc02:

  netfilter: xtables: fix warnings on 32bit platforms (2015-06-18 21:14:33 +0200)

Eric Dumazet (1):
      netfilter: x_tables: align per cpu xt_counter

Eric W Biederman (1):
      netfilter: Remove spurios included of netfilter.h

Eric W. Biederman (2):
      netfilter: Kill unused copies of RCV_SKB_FAIL
      net: sched: Simplify em_ipset_match

Florian Westphal (1):
      netfilter: xtables: fix warnings on 32bit platforms

Harout Hedeshian (1):
      netfilter: xt_socket: add XT_SOCKET_RESTORESKMARK flag

Pablo Neira Ayuso (5):
      netfilter: bridge: rename br_netfilter.c to br_netfilter_hooks.c
      netfilter: bridge: split ipv6 code into separated file
      net: include missing headers in net/net_namespace.h
      netfilter: use forward declaration instead of including linux/proc_fs.h
      netfilter: don't pull include/linux/netfilter.h from netns headers

Roman Kubiak (1):
      netfilter: nfnetlink_queue: add security context information

 drivers/net/hamradio/bpqether.c                 |    1 -
 drivers/net/ppp/pptp.c                          |    2 -
 drivers/net/wan/lapbether.c                     |    1 -
 include/linux/netfilter.h                       |    6 +-
 include/linux/netfilter/x_tables.h              |   14 +-
 include/linux/netfilter_defs.h                  |    9 +
 include/net/net_namespace.h                     |    2 +
 include/net/netfilter/br_netfilter.h            |   60 +
 include/net/netns/netfilter.h                   |    4 +-
 include/net/netns/x_tables.h                    |    2 +-
 include/uapi/linux/netfilter.h                  |    3 +-
 include/uapi/linux/netfilter/nfnetlink_queue.h  |    4 +-
 include/uapi/linux/netfilter/xt_socket.h        |    8 +
 net/ax25/af_ax25.c                              |    1 -
 net/ax25/ax25_in.c                              |    1 -
 net/ax25/ax25_ip.c                              |    1 -
 net/ax25/ax25_out.c                             |    1 -
 net/ax25/ax25_uid.c                             |    1 -
 net/bridge/Makefile                             |    2 +
 .../{br_netfilter.c => br_netfilter_hooks.c}    |  248 +---
 net/bridge/br_netfilter_ipv6.c                  |  245 +++
 net/ipv6/output_core.c                          |    1 +
 net/netfilter/nf_synproxy_core.c                |    1 +
 net/netfilter/nfnetlink_log.c                   |    2 -
 net/netfilter/nfnetlink_queue_core.c            |   37 ++-
 net/netfilter/xt_socket.c                       |   59 -
 net/netrom/nr_route.c                           |    1 -
 net/rose/rose_link.c                            |    1 -
 net/rose/rose_route.c                           |    1 -
 net/sched/em_ipset.c                            |    4 +-
 security/selinux/xfrm.c                         |    3 -
 31 files
[PATCH 02/12] netfilter: xt_socket: add XT_SOCKET_RESTORESKMARK flag
From: Harout Hedeshian haro...@codeaurora.org

xt_socket is useful for matching sockets with IP_TRANSPARENT and
taking some action on the matching packets. However, it lacks the
ability to match only a small subset of transparent sockets.

Suppose there are 2 applications, each with its own set of transparent
sockets. The first application wants all matching packets dropped,
while the second application wants them forwarded somewhere else.

Add the ability to restore the skb->mark from the sk_mark. The mark
is only restored if a matching socket is found and the transparent /
nowildcard conditions are satisfied.

Now the 2 hypothetical applications can differentiate their sockets
based on a mark value set with SO_MARK.

iptables -t mangle -I PREROUTING -m socket --transparent \
                                 --restore-skmark -j action
iptables -t mangle -A action -m mark --mark 10 -j action2
iptables -t mangle -A action -m mark --mark 11 -j action3

Signed-off-by: Harout Hedeshian haro...@codeaurora.org
Signed-off-by: Pablo Neira Ayuso pa...@netfilter.org
---
 include/uapi/linux/netfilter/xt_socket.h |    8
 net/netfilter/xt_socket.c                |   59 +++---
 2 files changed, 61 insertions(+), 6 deletions(-)

diff --git a/include/uapi/linux/netfilter/xt_socket.h b/include/uapi/linux/netfilter/xt_socket.h
index 6315e2a..87644f8 100644
--- a/include/uapi/linux/netfilter/xt_socket.h
+++ b/include/uapi/linux/netfilter/xt_socket.h
@@ -6,6 +6,7 @@
 enum {
 	XT_SOCKET_TRANSPARENT = 1 << 0,
 	XT_SOCKET_NOWILDCARD = 1 << 1,
+	XT_SOCKET_RESTORESKMARK = 1 << 2,
 };
 
 struct xt_socket_mtinfo1 {
@@ -18,4 +19,11 @@ struct xt_socket_mtinfo2 {
 };
 #define XT_SOCKET_FLAGS_V2 (XT_SOCKET_TRANSPARENT | XT_SOCKET_NOWILDCARD)
 
+struct xt_socket_mtinfo3 {
+	__u8 flags;
+};
+#define XT_SOCKET_FLAGS_V3 (XT_SOCKET_TRANSPARENT \
+			   | XT_SOCKET_NOWILDCARD \
+			   | XT_SOCKET_RESTORESKMARK)
+
 #endif /* _XT_SOCKET_H */
diff --git a/net/netfilter/xt_socket.c b/net/netfilter/xt_socket.c
index e092cb0..43e26c8 100644
--- a/net/netfilter/xt_socket.c
+++ b/net/netfilter/xt_socket.c
@@ -205,6 +205,7 @@ static bool
 socket_match(const struct sk_buff *skb, struct xt_action_param *par,
 	     const struct xt_socket_mtinfo1 *info)
 {
+	struct sk_buff *pskb = (struct sk_buff *)skb;
 	struct sock *sk = skb->sk;
 
 	if (!sk)
@@ -226,6 +227,10 @@ socket_match(const struct sk_buff *skb, struct xt_action_param *par,
 		if (info->flags & XT_SOCKET_TRANSPARENT)
 			transparent = xt_socket_sk_is_transparent(sk);
 
+		if (info->flags & XT_SOCKET_RESTORESKMARK && !wildcard &&
+		    transparent)
+			pskb->mark = sk->sk_mark;
+
 		if (sk != skb->sk)
 			sock_gen_put(sk);
 
@@ -247,7 +252,7 @@ socket_mt4_v0(const struct sk_buff *skb, struct xt_action_param *par)
 }
 
 static bool
-socket_mt4_v1_v2(const struct sk_buff *skb, struct xt_action_param *par)
+socket_mt4_v1_v2_v3(const struct sk_buff *skb, struct xt_action_param *par)
 {
 	return socket_match(skb, par, par->matchinfo);
 }
@@ -371,9 +376,10 @@ static struct sock *xt_socket_lookup_slow_v6(const struct sk_buff *skb,
 }
 
 static bool
-socket_mt6_v1_v2(const struct sk_buff *skb, struct xt_action_param *par)
+socket_mt6_v1_v2_v3(const struct sk_buff *skb, struct xt_action_param *par)
 {
 	const struct xt_socket_mtinfo1 *info = (struct xt_socket_mtinfo1 *) par->matchinfo;
+	struct sk_buff *pskb = (struct sk_buff *)skb;
 	struct sock *sk = skb->sk;
 
 	if (!sk)
@@ -395,6 +401,10 @@ socket_mt6_v1_v2(const struct sk_buff *skb, struct xt_action_param *par)
 		if (info->flags & XT_SOCKET_TRANSPARENT)
 			transparent = xt_socket_sk_is_transparent(sk);
 
+		if (info->flags & XT_SOCKET_RESTORESKMARK && !wildcard &&
+		    transparent)
+			pskb->mark = sk->sk_mark;
+
 		if (sk != skb->sk)
 			sock_gen_put(sk);
 
@@ -428,6 +438,19 @@ static int socket_mt_v2_check(const struct xt_mtchk_param *par)
 	return 0;
 }
 
+static int socket_mt_v3_check(const struct xt_mtchk_param *par)
+{
+	const struct xt_socket_mtinfo3 *info =
+				    (struct xt_socket_mtinfo3 *)par->matchinfo;
+
+	if (info->flags & ~XT_SOCKET_FLAGS_V3) {
+		pr_info("unknown flags 0x%x\n",
+			info->flags & ~XT_SOCKET_FLAGS_V3);
+		return -EINVAL;
+	}
+	return 0;
+}
+
 static struct xt_match socket_mt_reg[] __read_mostly = {
 	{
 		.name = "socket",
@@ -442,7 +465,7 @@ static struct xt_match socket_mt_reg[] __read_mostly = {
 		.name = "socket",
 		.revision = 1,
 		.family = NFPROTO_IPV4,
-
[PATCH 06/12] netfilter: Kill unused copies of RCV_SKB_FAIL
From: Eric W. Biederman ebied...@xmission.com

This appears to have been a dead macro in both nfnetlink_log.c and
nfnetlink_queue_core.c since these pieces of code were added in 2005.

Signed-off-by: Eric W. Biederman ebied...@xmission.com
Signed-off-by: Pablo Neira Ayuso pa...@netfilter.org
---
 net/netfilter/nfnetlink_log.c        |    2 --
 net/netfilter/nfnetlink_queue_core.c |    2 --
 2 files changed, 4 deletions(-)

diff --git a/net/netfilter/nfnetlink_log.c b/net/netfilter/nfnetlink_log.c
index 4ef1fae..4670821 100644
--- a/net/netfilter/nfnetlink_log.c
+++ b/net/netfilter/nfnetlink_log.c
@@ -598,8 +598,6 @@ nla_put_failure:
 	return -1;
 }
 
-#define RCV_SKB_FAIL(err) do { netlink_ack(skb, nlh, (err)); return; } while (0)
-
 static struct nf_loginfo default_loginfo = {
 	.type = NF_LOG_TYPE_ULOG,
 	.u = {
diff --git a/net/netfilter/nfnetlink_queue_core.c b/net/netfilter/nfnetlink_queue_core.c
index 6eccf0f..e26a46e 100644
--- a/net/netfilter/nfnetlink_queue_core.c
+++ b/net/netfilter/nfnetlink_queue_core.c
@@ -834,8 +834,6 @@ nfqnl_dev_drop(struct net *net, int ifindex)
 	rcu_read_unlock();
 }
 
-#define RCV_SKB_FAIL(err) do { netlink_ack(skb, nlh, (err)); return; } while (0)
-
 static int
 nfqnl_rcv_dev_event(struct notifier_block *this,
 		    unsigned long event, void *ptr)
--
1.7.10.4
[PATCH 12/12] netfilter: xtables: fix warnings on 32bit platforms
From: Florian Westphal f...@strlen.de

On 32bit archs gcc complains due to cast from void* to u64.
Add intermediate casts to long to silence these warnings.

include/linux/netfilter/x_tables.h:376:10: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
include/linux/netfilter/x_tables.h:384:15: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
include/linux/netfilter/x_tables.h:391:23: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
include/linux/netfilter/x_tables.h:400:22: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]

Fixes: 71ae0dff02d756e ("netfilter: xtables: use percpu rule counters")
Reported-by: kbuild test robot fengguang...@intel.com
Signed-off-by: Florian Westphal f...@strlen.de
Signed-off-by: Pablo Neira Ayuso pa...@netfilter.org
---
 include/linux/netfilter/x_tables.h |    8
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/linux/netfilter/x_tables.h b/include/linux/netfilter/x_tables.h
index 1c97a22..286098a 100644
--- a/include/linux/netfilter/x_tables.h
+++ b/include/linux/netfilter/x_tables.h
@@ -372,7 +372,7 @@ static inline u64 xt_percpu_counter_alloc(void)
 		if (res == NULL)
 			return (u64) -ENOMEM;
 
-		return (__force u64) res;
+		return (u64) (__force unsigned long) res;
 	}
 
 	return 0;
@@ -380,14 +380,14 @@ static inline u64 xt_percpu_counter_alloc(void)
 static inline void xt_percpu_counter_free(u64 pcnt)
 {
 	if (nr_cpu_ids > 1)
-		free_percpu((void __percpu *) pcnt);
+		free_percpu((void __percpu *) (unsigned long) pcnt);
 }
 
 static inline struct xt_counters *
 xt_get_this_cpu_counter(struct xt_counters *cnt)
 {
 	if (nr_cpu_ids > 1)
-		return this_cpu_ptr((void __percpu *) cnt->pcnt);
+		return this_cpu_ptr((void __percpu *) (unsigned long) cnt->pcnt);
 
 	return cnt;
 }
@@ -396,7 +396,7 @@ static inline struct xt_counters *
 xt_get_per_cpu_counter(struct xt_counters *cnt, unsigned int cpu)
 {
 	if (nr_cpu_ids > 1)
-		return per_cpu_ptr((void __percpu *) cnt->pcnt, cpu);
+		return per_cpu_ptr((void __percpu *) (unsigned long) cnt->pcnt, cpu);
 
 	return cnt;
 }
--
1.7.10.4
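The intermediate cast can be tried outside the kernel. Below is a minimal userspace sketch of the same round trip, assuming (as the kernel does) that unsigned long has the width of a pointer; the demo_* helper names are made up for this illustration, and portable userspace code would normally use uintptr_t instead.

```c
#include <assert.h>
#include <stdint.h>

/* Store a pointer in a 64-bit integer. Going through unsigned long
 * first makes the pointer-to-integer step same-sized, so gcc has no
 * "different size" cast to warn about on 32-bit targets; the widening
 * to 64 bits is then an ordinary integer conversion. */
static uint64_t demo_ptr_to_u64(void *p)
{
	/* Assumption inherited from the kernel: long is pointer-sized. */
	assert(sizeof(unsigned long) == sizeof(void *));
	return (uint64_t)(unsigned long)p;
}

/* Reverse direction: narrow to unsigned long, then convert to pointer. */
static void *demo_u64_to_ptr(uint64_t v)
{
	return (void *)(unsigned long)v;
}
```

A pointer stored this way compares equal to the original after the round trip, which is all the per-cpu counter helpers need from the u64 field.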
[PATCH 11/12] netfilter: Remove spurios included of netfilter.h
From: Eric W Biederman ebied...@xmission.com

While testing my netfilter changes I noticed several files that were
recompiling unnecessarily because they unnecessarily included netfilter.h.

Signed-off-by: Eric W. Biederman ebied...@xmission.com
Signed-off-by: Pablo Neira Ayuso pa...@netfilter.org
---
 drivers/net/hamradio/bpqether.c |    1 -
 drivers/net/ppp/pptp.c          |    2 --
 drivers/net/wan/lapbether.c     |    1 -
 net/ax25/af_ax25.c              |    1 -
 net/ax25/ax25_in.c              |    1 -
 net/ax25/ax25_ip.c              |    1 -
 net/ax25/ax25_out.c             |    1 -
 net/ax25/ax25_uid.c             |    1 -
 net/netrom/nr_route.c           |    1 -
 net/rose/rose_link.c            |    1 -
 net/rose/rose_route.c           |    1 -
 security/selinux/xfrm.c         |    3 ---
 12 files changed, 15 deletions(-)

diff --git a/drivers/net/hamradio/bpqether.c b/drivers/net/hamradio/bpqether.c
index 63ff08a..7856b6c 100644
--- a/drivers/net/hamradio/bpqether.c
+++ b/drivers/net/hamradio/bpqether.c
@@ -76,7 +76,6 @@
 #include <linux/proc_fs.h>
 #include <linux/seq_file.h>
 #include <linux/stat.h>
-#include <linux/netfilter.h>
 #include <linux/module.h>
 #include <linux/init.h>
 #include <linux/rtnetlink.h>
diff --git a/drivers/net/ppp/pptp.c b/drivers/net/ppp/pptp.c
index 14839bc..686f37d 100644
--- a/drivers/net/ppp/pptp.c
+++ b/drivers/net/ppp/pptp.c
@@ -28,8 +28,6 @@
 #include <linux/file.h>
 #include <linux/in.h>
 #include <linux/ip.h>
-#include <linux/netfilter.h>
-#include <linux/netfilter_ipv4.h>
 #include <linux/rcupdate.h>
 #include <linux/spinlock.h>
 
diff --git a/drivers/net/wan/lapbether.c b/drivers/net/wan/lapbether.c
index 2f5eda8..6676607 100644
--- a/drivers/net/wan/lapbether.c
+++ b/drivers/net/wan/lapbether.c
@@ -40,7 +40,6 @@
 #include <linux/interrupt.h>
 #include <linux/notifier.h>
 #include <linux/stat.h>
-#include <linux/netfilter.h>
 #include <linux/module.h>
 #include <linux/lapb.h>
 #include <linux/init.h>
diff --git a/net/ax25/af_ax25.c b/net/ax25/af_ax25.c
index 4273533..9c891d0 100644
--- a/net/ax25/af_ax25.c
+++ b/net/ax25/af_ax25.c
@@ -40,7 +40,6 @@
 #include <linux/notifier.h>
 #include <linux/proc_fs.h>
 #include <linux/stat.h>
-#include <linux/netfilter.h>
 #include <linux/sysctl.h>
 #include <linux/init.h>
 #include <linux/spinlock.h>
diff --git a/net/ax25/ax25_in.c b/net/ax25/ax25_in.c
index 7ed8ab7..29a3687 100644
--- a/net/ax25/ax25_in.c
+++ b/net/ax25/ax25_in.c
@@ -23,7 +23,6 @@
 #include <linux/inet.h>
 #include <linux/netdevice.h>
 #include <linux/skbuff.h>
-#include <linux/netfilter.h>
 #include <net/sock.h>
 #include <net/tcp_states.h>
 #include <asm/uaccess.h>
diff --git a/net/ax25/ax25_ip.c b/net/ax25/ax25_ip.c
index 7c646bb..b563a3f 100644
--- a/net/ax25/ax25_ip.c
+++ b/net/ax25/ax25_ip.c
@@ -31,7 +31,6 @@
 #include <linux/notifier.h>
 #include <linux/proc_fs.h>
 #include <linux/stat.h>
-#include <linux/netfilter.h>
 #include <linux/sysctl.h>
 #include <net/ip.h>
 #include <net/arp.h>
diff --git a/net/ax25/ax25_out.c b/net/ax25/ax25_out.c
index be2acab..8ddd41b 100644
--- a/net/ax25/ax25_out.c
+++ b/net/ax25/ax25_out.c
@@ -24,7 +24,6 @@
 #include <linux/inet.h>
 #include <linux/netdevice.h>
 #include <linux/skbuff.h>
-#include <linux/netfilter.h>
 #include <net/sock.h>
 #include <asm/uaccess.h>
 #include <linux/fcntl.h>
diff --git a/net/ax25/ax25_uid.c b/net/ax25/ax25_uid.c
index 71c4bad..4ad2fb7 100644
--- a/net/ax25/ax25_uid.c
+++ b/net/ax25/ax25_uid.c
@@ -34,7 +34,6 @@
 #include <linux/proc_fs.h>
 #include <linux/seq_file.h>
 #include <linux/stat.h>
-#include <linux/netfilter.h>
 #include <linux/sysctl.h>
 #include <linux/export.h>
 #include <net/ip.h>
diff --git a/net/netrom/nr_route.c b/net/netrom/nr_route.c
index 96b64d2..d72a4f1 100644
--- a/net/netrom/nr_route.c
+++ b/net/netrom/nr_route.c
@@ -31,7 +31,6 @@
 #include <linux/mm.h>
 #include <linux/interrupt.h>
 #include <linux/notifier.h>
-#include <linux/netfilter.h>
 #include <linux/init.h>
 #include <linux/spinlock.h>
 #include <net/netrom.h>
diff --git a/net/rose/rose_link.c b/net/rose/rose_link.c
index e873d7d..c76638c 100644
--- a/net/rose/rose_link.c
+++ b/net/rose/rose_link.c
@@ -25,7 +25,6 @@
 #include <linux/fcntl.h>
 #include <linux/mm.h>
 #include <linux/interrupt.h>
-#include <linux/netfilter.h>
 #include <net/rose.h>
 
 static void rose_ftimer_expiry(unsigned long);
diff --git a/net/rose/rose_route.c b/net/rose/rose_route.c
index 40148932..0fc76d8 100644
--- a/net/rose/rose_route.c
+++ b/net/rose/rose_route.c
@@ -31,7 +31,6 @@
 #include <linux/mm.h>
 #include <linux/interrupt.h>
 #include <linux/notifier.h>
-#include <linux/netfilter.h>
 #include <linux/init.h>
 #include <net/rose.h>
 #include <linux/seq_file.h>
diff --git a/security/selinux/xfrm.c b/security/selinux/xfrm.c
index 98b0426..56e354f 100644
--- a/security/selinux/xfrm.c
+++ b/security/selinux/xfrm.c
@@ -35,9 +35,6 @@
 #include <linux/init.h>
 #include <linux/security.h>
 #include <linux/types.h>
-#include <linux/netfilter.h>
-#include <linux/netfilter_ipv4.h>
-#include <linux/netfilter_ipv6.h>
 #include <linux/slab.h>
 #include <linux/ip.h>
 #include <linux/tcp.h>
--
1.7.10.4
[PATCH 10/12] netfilter: don't pull include/linux/netfilter.h from netns headers
This pulls the full hook netfilter definitions from all those that include
net_namespace.h.

Instead let's just include the bare minimum required in the new
linux/netfilter_defs.h file, and use it from the netfilter netns header files.

I also needed to include in.h and in6.h from linux/netfilter.h otherwise we hit
this compilation error:

In file included from include/linux/netfilter_defs.h:4:0,
                 from include/net/netns/netfilter.h:4,
                 from include/net/net_namespace.h:22,
                 from include/linux/netdevice.h:43,
                 from net/netfilter/nfnetlink_queue_core.c:23:
include/uapi/linux/netfilter.h:76:17: error: field ‘in’ has incomplete type
 struct in_addr in;

And also explicitly include linux/netfilter.h in several spots.

Signed-off-by: Pablo Neira Ayuso pa...@netfilter.org
Signed-off-by: Eric W. Biederman ebied...@xmission.com
---
 include/linux/netfilter.h      |    6 ++
 include/linux/netfilter_defs.h |    9 +
 include/net/netns/netfilter.h  |    2 +-
 include/net/netns/x_tables.h   |    2 +-
 include/uapi/linux/netfilter.h |    3 ++-
 net/ipv6/output_core.c         |    1 +
 6 files changed, 16 insertions(+), 7 deletions(-)
 create mode 100644 include/linux/netfilter_defs.h

diff --git a/include/linux/netfilter.h b/include/linux/netfilter.h
index f5ff5d1..00050df 100644
--- a/include/linux/netfilter.h
+++ b/include/linux/netfilter.h
@@ -10,7 +10,8 @@
 #include <linux/wait.h>
 #include <linux/list.h>
 #include <linux/static_key.h>
-#include <uapi/linux/netfilter.h>
+#include <linux/netfilter_defs.h>
+
 #ifdef CONFIG_NETFILTER
 static inline int NF_DROP_GETERR(int verdict)
 {
@@ -38,9 +39,6 @@ static inline void nf_inet_addr_mask(const union nf_inet_addr *a1,
 
 int netfilter_init(void);
 
-/* Largest hook number + 1 */
-#define NF_MAX_HOOKS 8
-
 struct sk_buff;
 
 struct nf_hook_ops;
diff --git a/include/linux/netfilter_defs.h b/include/linux/netfilter_defs.h
new file mode 100644
index 0000000..d3a7f85
--- /dev/null
+++ b/include/linux/netfilter_defs.h
@@ -0,0 +1,9 @@
+#ifndef __LINUX_NETFILTER_CORE_H_
+#define __LINUX_NETFILTER_CORE_H_
+
+#include <uapi/linux/netfilter.h>
+
+/* Largest hook number + 1, see uapi/linux/netfilter_decnet.h */
+#define NF_MAX_HOOKS 8
+
+#endif
diff --git a/include/net/netns/netfilter.h b/include/net/netns/netfilter.h
index cf25b5e..532e4ba 100644
--- a/include/net/netns/netfilter.h
+++ b/include/net/netns/netfilter.h
@@ -1,7 +1,7 @@
 #ifndef __NETNS_NETFILTER_H
 #define __NETNS_NETFILTER_H
 
-#include <linux/netfilter.h>
+#include <linux/netfilter_defs.h>
 
 struct proc_dir_entry;
 struct nf_logger;
diff --git a/include/net/netns/x_tables.h b/include/net/netns/x_tables.h
index 4d6597a..c8a7681 100644
--- a/include/net/netns/x_tables.h
+++ b/include/net/netns/x_tables.h
@@ -2,7 +2,7 @@
 #define __NETNS_X_TABLES_H
 
 #include <linux/list.h>
-#include <linux/netfilter.h>
+#include <linux/netfilter_defs.h>
 
 struct ebt_table;
 
diff --git a/include/uapi/linux/netfilter.h b/include/uapi/linux/netfilter.h
index 177027c..d93f949 100644
--- a/include/uapi/linux/netfilter.h
+++ b/include/uapi/linux/netfilter.h
@@ -4,7 +4,8 @@
 #include <linux/types.h>
 #include <linux/compiler.h>
 #include <linux/sysctl.h>
-
+#include <linux/in.h>
+#include <linux/in6.h>
 
 /* Responses from hook functions. */
 #define NF_DROP 0
diff --git a/net/ipv6/output_core.c b/net/ipv6/output_core.c
index 21678ac..928a0fb 100644
--- a/net/ipv6/output_core.c
+++ b/net/ipv6/output_core.c
@@ -8,6 +8,7 @@
 #include <net/ip6_fib.h>
 #include <net/addrconf.h>
 #include <net/secure_seq.h>
+#include <linux/netfilter.h>
 
 static u32 __ipv6_select_ident(struct net *net, u32 hashrnd,
 			       const struct in6_addr *dst,
--
1.7.10.4
Re: [PATCH next v3] bonding: Display LACP info only to CAP_NET_ADMIN capable user
On Fri, Jun 19, 2015 at 10:02:39AM -0700, Mahesh Bandewar wrote:
On Thu, Jun 18, 2015 at 8:00 PM, Andy Gospodarek go...@cumulusnetworks.com wrote:

[...]

With this patch, actor_oper_port_state and partner_oper.port_state are
not displayed in /proc, but that information is available via netlink
from bond_fill_slave_info(). I suspect you do not deem these two values
as critical to the security of the system, but wanted to confirm before
ACKing.

Yes, one can very easily figure out that LACP is used in the system
with parameters like bond-mode, lacp-rate, or the port-state. I feel
these do not need to be hidden from unprivileged users to ensure
security. In principle, hiding just enough to ensure security is better
than hiding everything. However, if there is a scenario where exposing
these values is a threat (in the same sense), then it's not a lot of
extra work to achieve that and I'm open to making those changes.

Sounds fine to me. I just wanted to be sure the difference between the
information displayed in various modes was intentional (or at least not
unintentional) and did not conflict with your plans.

Signed-off-by: Andy Gospodarek go...@cumulusnetworks.com
Re: [PATCH net-next RFC v2 3/3] mpls: support for ip tunnels
On 19/06/15 05:49, Roopa Prabhu wrote:

From: Roopa Prabhu ro...@cumulusnetworks.com

Support ip mpls tunnels using the new lwt infrastructure.

Signed-off-by: Roopa Prabhu ro...@cumulusnetworks.com

...

+int mpls_output(struct sock *sk, struct sk_buff *skb)
+{
+	struct mpls_iptunnel_encap *tun_encap_info;
+	struct mpls_shim_hdr *hdr;
+	struct mpls_entry_decoded dec;
+	struct net_device *out_dev;
+	unsigned int hh_len;
+	unsigned int new_header_size;
+	unsigned int mtu;
+	struct lwtunnel_state *lwtstate;
+	struct rtable *rt = skb_rtable(skb);
+	int err;
+	bool bos;
+	int i;
+
+	if (skb->pkt_type != PACKET_HOST)
+		goto drop;
+
+	if ((skb = skb_share_check(skb, GFP_ATOMIC)) == NULL)
+		goto drop;
+
+	if (!rt)
+		goto drop;
+
+	/* Find the output device */
+	out_dev = rcu_dereference(skb_dst(skb)->dev);

Since the entire label stack and the output device is encoded in the
route, this means that you won't get prefix-independent convergence with
this implementation for an IGP route change. I.e. if you've got 10
million VPN routes via an IGP route for the BGP nexthop, and the IGP
route for the BGP nexthop changes (e.g. because a link has gone down
somewhere in the network) then you'll have to update all 10 million IP
routes to change the output device, gateway and IGP label.

That's going to represent a scaling obstacle for one of the primary MPLS
use cases.

+	if (!mpls_output_possible(out_dev))
+		goto drop;
+
+	if (skb_warn_if_lro(skb))
+		goto drop;
+
+	skb_forward_csum(skb);
+
+	lwtstate = rt->rt_lwtstate;
+	if (!lwtstate)
+		goto drop;
+
+	tun_encap_info = mpls_lwt_hdr(lwtstate);
+
+	/* Verify the destination can hold the packet */
+	new_header_size = mpls_encap_size(tun_encap_info);
+	mtu = mpls_dev_mtu(out_dev);
+	if (mpls_pkt_too_big(skb, mtu - new_header_size))
+		goto drop;
+
+	hh_len = LL_RESERVED_SPACE(out_dev);
+	if (!out_dev->header_ops)
+		hh_len = 0;
+
+	/* Ensure there is enough space for the headers in the skb */
+	if (skb_cow(skb, hh_len + new_header_size))
+		goto drop;
+
+	skb->dev = out_dev;
+	skb->protocol = htons(ETH_P_MPLS_UC);
+
+	skb_push(skb, new_header_size);
+	skb_reset_network_header(skb);
+
+	/* Push the new labels */
+	hdr = mpls_hdr(skb);
+	bos = true;
+	for (i = tun_encap_info->labels - 1; i >= 0; i--) {
+		hdr[i] = mpls_entry_encode(tun_encap_info->label[i],
+					   dec.ttl, 0, bos);

dec is never initialised in this function, so this will encode a garbage
ttl into the packet. This should instead be deriving the ttl from the IP
packet, as Eric did in his original patch.

Thanks,
Rob

+		bos = false;
+	}
+
+	err = neigh_xmit(NEIGH_ARP_TABLE, out_dev, &rt->rt_gateway,
+			 skb);
+	if (err)
+		net_dbg_ratelimited("%s: packet transmission failed: %d\n",
+				    __func__, err);
+
+	return 0;
+
+drop:
+	kfree_skb(skb);
+	return -EINVAL;
+}
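To make the TTL remark concrete, here is a small userspace sketch of the label push loop with the TTL passed in explicitly (for instance, taken from the IP header) rather than read from an uninitialised struct. The bit layout follows RFC 3032; the demo_* names are hypothetical, and the entries are returned in host byte order, whereas the kernel helper stores them big-endian.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Field positions in a 32-bit MPLS label stack entry (RFC 3032):
 * label (20 bits) | traffic class (3) | bottom-of-stack (1) | TTL (8). */
#define DEMO_LS_LABEL_SHIFT	12
#define DEMO_LS_TC_SHIFT	9
#define DEMO_LS_S_SHIFT		8

/* Encode one stack entry; host byte order, for illustration only. */
static uint32_t demo_mpls_encode(uint32_t label, uint8_t ttl, uint8_t tc,
				 bool bos)
{
	return ((label & 0xFFFFFu) << DEMO_LS_LABEL_SHIFT) |
	       ((uint32_t)(tc & 0x7u) << DEMO_LS_TC_SHIFT) |
	       (bos ? (1u << DEMO_LS_S_SHIFT) : 0u) |
	       ttl;
}

/* Fill hdr[0..n-1] from labels[0..n-1], walking from the last entry
 * back to the first so that only the innermost entry gets the
 * bottom-of-stack bit. The caller supplies a known TTL. */
static void demo_push_labels(uint32_t *hdr, const uint32_t *labels,
			     size_t n, uint8_t ttl)
{
	bool bos = true;

	assert(hdr != NULL && labels != NULL);
	for (size_t i = n; i-- > 0; ) {
		hdr[i] = demo_mpls_encode(labels[i], ttl, 0, bos);
		bos = false;
	}
}
```

For a two-label stack {100, 200} with TTL 64, the outer entry carries label 100 without the S bit and the inner entry carries label 200 with it; both get the same, well-defined TTL.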
Re: [PATCH net] netfilter: nftables: Do not run chains in the wrong network namespace
On Fri, Jun 19, 2015 at 10:41:21AM -0500, Eric W. Biederman wrote:

Currently nf_tables chains added in one network namespace are being run
in all network namespaces. The issues are myriad, with the simplest
being that an unprivileged user can cause any network packets to be
dropped.

Address this by simply not running nf_tables chains in the wrong
network namespace.

Cc: sta...@vger.kernel.org
Signed-off-by: Eric W. Biederman ebied...@xmission.com

Acked-by: Pablo Neira Ayuso pa...@netfilter.org

@David: Patrick sent a similar patch to address this, if you can get
this into the net tree, I'll make sure this propagates to -stable.

Thanks.
Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )
On Fri, 19 Jun 2015 12:25:53 -0400 Steven Rostedt rost...@goodmis.org wrote: I don't see that 55201 anywhere. But then again, I didn't look for it before the port disappeared. I could reboot and look for it again. I should have saved the full netstat -tapn as well :-/ Of course I didn't find it anywhere, that's the port on my wife's box that port 947 was connected to. Now I even went over to my wife's box and ran # rpcinfo -p localhost program vers proto port service 104 tcp111 portmapper 103 tcp111 portmapper 102 tcp111 portmapper 104 udp111 portmapper 103 udp111 portmapper 102 udp111 portmapper 1000241 udp 34243 status 1000241 tcp 34498 status which doesn't show anything. but something is listening to that port... # netstat -ntap |grep 55201 tcp0 0 0.0.0.0:55201 0.0.0.0:* LISTEN I rebooted again, but this time I ran this on my wife's box: # trace-cmd record -e nfs -e nfs4 -e net -e skb -e sock -e udp -e workqueue -e sunrpc I started it when my server started booting the kernel, and kept it running till the port vanished. The full trace can be downloaded from http://rostedt.homelinux.com/private/wife-trace.txt Here's some interesting output from that trace: ksoftirq-13 1..s. 12272627.681760: netif_receive_skb:dev=lo skbaddr=0x88020944c600 len=88 ksoftirq-13 1..s. 12272627.681776: net_dev_queue:dev=eth0 skbaddr=0x880234e5b100 len=42 ksoftirq-13 1..s. 12272627.681777: net_dev_start_xmit: dev=eth0 queue_mapping=0 skbaddr=0x880234e5b100 vlan_tagged=0 vlan_proto=0x vlan_tci=0x protocol=0x0806 ip_ summed=0 len=42 data_len=0 network_offset=14 transport_offset_valid=0 transport_offset=65533 tx_flags=0 gso_size=0 gso_segs=0 gso_type=0 ksoftirq-13 1..s. 12272627.681779: net_dev_xmit: dev=eth0 skbaddr=0x880234e5b100 len=42 rc=0 ksoftirq-13 1..s. 12272627.681780: kfree_skb: skbaddr=0x88023444cf00 protocol=2048 location=0x81422a72 ksoftirq-13 1..s. 
12272627.681783: rpc_socket_error: error=-113 socket:[11886206] dstaddr=192.168.23.9/2049 state=2 () sk_state=2 () ksoftirq-13 1..s. 12272627.681785: rpc_task_wakeup: task:18128@0 flags=5281 state=0006 status=-113 timeout=45000 queue=xprt_pending ksoftirq-13 1d.s. 12272627.681786: workqueue_queue_work: work struct=0x8800b5a94588 function=rpc_async_schedule workqueue=0x880234666800 req_cpu=512 cpu=1 ksoftirq-13 1d.s. 12272627.681787: workqueue_activate_work: work struct 0x8800b5a94588 ksoftirq-13 1..s. 12272627.681791: rpc_socket_state_change: socket:[11886206] dstaddr=192.168.23.9/2049 state=2 () sk_state=7 () ksoftirq-13 1..s. 12272627.681792: kfree_skb: skbaddr=0x88020944c600 protocol=2048 location=0x81482c05 kworker/-20111 1 12272627.681796: workqueue_execute_start: work struct 0x8800b5a94588: function rpc_async_schedule kworker/-20111 1 12272627.681797: rpc_task_run_action: task:18128@0 flags=5281 state=0005 status=-113 action=call_connect_status kworker/-20111 1 12272627.681798: rpc_task_run_action: task:18128@0 flags=5281 state=0005 status=-113 action=call_connect_status kworker/-20111 1 12272627.681798: rpc_connect_status: task:18128@0, status -113 kworker/-20111 1..s. 12272627.681799: rpc_task_sleep: task:18128@0 flags=5281 state=0005 status=0 timeout=750 queue=delayq kworker/-20111 1 12272627.681800: workqueue_execute_end: work struct 0x8800b5a94588 idle-0 1..s. 12272630.688741: rpc_task_wakeup: task:18128@0 flags=5281 state=0006 status=-110 timeout=750 queue=delayq idle-0 1dNs. 12272630.688749: workqueue_queue_work: work struct=0x8800b5a94588 function=rpc_async_schedule workqueue=0x880234666800 req_cpu=512 cpu=1 idle-0 1dNs. 
12272630.688749: workqueue_activate_work: work struct 0x8800b5a94588 kworker/-20111 1 12272630.688758: workqueue_execute_start: work struct 0x8800b5a94588: function rpc_async_schedule kworker/-20111 1 12272630.688759: rpc_task_run_action: task:18128@0 flags=5281 state=0005 status=-110 action=call_timeout kworker/-20111 1 12272630.688760: rpc_task_run_action: task:18128@0 flags=5281 state=0005 status=0 action=call_timeout kworker/-20111 1 12272630.688760: rpc_task_run_action: task:18128@0 flags=5281 state=0005 status=0 action=call_bind kworker/-20111 1 12272630.688761: rpc_task_run_action: task:18128@0 flags=5281 state=0005 status=0 action=call_connect kworker/-20111 1..s. 12272630.688762: rpc_task_sleep: task:18128@0 flags=5281 state=0005 status=0 timeout=45000 queue=xprt_pending kworker/-20111 1 12272630.688765: workqueue_execute_end: work struct 0x8800b5a94588 idle-0 3d.s. 12272630.696742:
[PATCH 08/12] net: include missing headers in net/net_namespace.h
Include linux/idr.h and linux/skbuff.h since they are required by objects that
are declared in the net structure.

 struct net {
	...
	struct idr netns_ids;
	...
	struct sk_buff_head wext_nlevents;
	...

Signed-off-by: Pablo Neira Ayuso pa...@netfilter.org
Signed-off-by: Eric W. Biederman ebied...@xmission.com
---
 include/net/net_namespace.h |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 72eb237..e951453 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -28,6 +28,8 @@
 #include <net/netns/xfrm.h>
 #include <net/netns/mpls.h>
 #include <linux/ns_common.h>
+#include <linux/idr.h>
+#include <linux/skbuff.h>
 
 struct user_namespace;
 struct proc_dir_entry;
--
1.7.10.4
[PATCH 05/12] netfilter: bridge: split ipv6 code into separated file
Resolve compilation breakage when CONFIG_IPV6 is not set by moving the IPv6 code into a separated br_netfilter_ipv6.c file. Fixes: efb6de9b4ba0 (netfilter: bridge: forward IPv6 fragmented packets) Reported-by: kbuild test robot fengguang...@intel.com Signed-off-by: Pablo Neira Ayuso pa...@netfilter.org --- include/net/netfilter/br_netfilter.h | 60 net/bridge/Makefile |1 + net/bridge/br_netfilter_hooks.c | 248 ++ net/bridge/br_netfilter_ipv6.c | 245 + 4 files changed, 315 insertions(+), 239 deletions(-) create mode 100644 net/bridge/br_netfilter_ipv6.c diff --git a/include/net/netfilter/br_netfilter.h b/include/net/netfilter/br_netfilter.h index 2aa6048..bab824b 100644 --- a/include/net/netfilter/br_netfilter.h +++ b/include/net/netfilter/br_netfilter.h @@ -1,6 +1,66 @@ #ifndef _BR_NETFILTER_H_ #define _BR_NETFILTER_H_ +#include ../../../net/bridge/br_private.h + +static inline struct nf_bridge_info *nf_bridge_alloc(struct sk_buff *skb) +{ + skb-nf_bridge = kzalloc(sizeof(struct nf_bridge_info), GFP_ATOMIC); + + if (likely(skb-nf_bridge)) + atomic_set((skb-nf_bridge-use), 1); + + return skb-nf_bridge; +} + +void nf_bridge_update_protocol(struct sk_buff *skb); + +static inline struct nf_bridge_info * +nf_bridge_info_get(const struct sk_buff *skb) +{ + return skb-nf_bridge; +} + +unsigned int nf_bridge_encap_header_len(const struct sk_buff *skb); + +static inline void nf_bridge_push_encap_header(struct sk_buff *skb) +{ + unsigned int len = nf_bridge_encap_header_len(skb); + + skb_push(skb, len); + skb-network_header -= len; +} + +int br_nf_pre_routing_finish_bridge(struct sock *sk, struct sk_buff *skb); + +static inline struct rtable *bridge_parent_rtable(const struct net_device *dev) +{ + struct net_bridge_port *port; + + port = br_port_get_rcu(dev); + return port ? 
port-br-fake_rtable : NULL; +} + +struct net_device *setup_pre_routing(struct sk_buff *skb); void br_netfilter_enable(void); +#if IS_ENABLED(CONFIG_IPV6) +int br_validate_ipv6(struct sk_buff *skb); +unsigned int br_nf_pre_routing_ipv6(const struct nf_hook_ops *ops, + struct sk_buff *skb, + const struct nf_hook_state *state); +#else +static inline int br_validate_ipv6(struct sk_buff *skb) +{ + return -1; +} + +static inline unsigned int +br_nf_pre_routing_ipv6(const struct nf_hook_ops *ops, struct sk_buff *skb, + const struct nf_hook_state *state) +{ + return NF_DROP; +} +#endif + #endif /* _BR_NETFILTER_H_ */ diff --git a/net/bridge/Makefile b/net/bridge/Makefile index c52577a..a1cda5d 100644 --- a/net/bridge/Makefile +++ b/net/bridge/Makefile @@ -13,6 +13,7 @@ bridge-$(CONFIG_SYSFS) += br_sysfs_if.o br_sysfs_br.o bridge-$(subst m,y,$(CONFIG_BRIDGE_NETFILTER)) += br_nf_core.o br_netfilter-y := br_netfilter_hooks.o +br_netfilter-$(subst m,y,$(CONFIG_IPV6)) += br_netfilter_ipv6.o obj-$(CONFIG_BRIDGE_NETFILTER) += br_netfilter.o bridge-$(CONFIG_BRIDGE_IGMP_SNOOPING) += br_multicast.o br_mdb.o diff --git a/net/bridge/br_netfilter_hooks.c b/net/bridge/br_netfilter_hooks.c index e4e5f2f..d89f4fa 100644 --- a/net/bridge/br_netfilter_hooks.c +++ b/net/bridge/br_netfilter_hooks.c @@ -123,11 +123,6 @@ struct brnf_frag_data { static DEFINE_PER_CPU(struct brnf_frag_data, brnf_frag_data_storage); #endif -static struct nf_bridge_info *nf_bridge_info_get(const struct sk_buff *skb) -{ - return skb-nf_bridge; -} - static void nf_bridge_info_free(struct sk_buff *skb) { if (skb-nf_bridge) { @@ -136,14 +131,6 @@ static void nf_bridge_info_free(struct sk_buff *skb) } } -static inline struct rtable *bridge_parent_rtable(const struct net_device *dev) -{ - struct net_bridge_port *port; - - port = br_port_get_rcu(dev); - return port ? 
port-br-fake_rtable : NULL; -} - static inline struct net_device *bridge_parent(const struct net_device *dev) { struct net_bridge_port *port; @@ -152,15 +139,6 @@ static inline struct net_device *bridge_parent(const struct net_device *dev) return port ? port-br-dev : NULL; } -static inline struct nf_bridge_info *nf_bridge_alloc(struct sk_buff *skb) -{ - skb-nf_bridge = kzalloc(sizeof(struct nf_bridge_info), GFP_ATOMIC); - if (likely(skb-nf_bridge)) - atomic_set((skb-nf_bridge-use), 1); - - return skb-nf_bridge; -} - static inline struct nf_bridge_info *nf_bridge_unshare(struct sk_buff *skb) { struct nf_bridge_info *nf_bridge = skb-nf_bridge; @@ -178,7 +156,7 @@ static inline struct nf_bridge_info *nf_bridge_unshare(struct sk_buff *skb) return nf_bridge; } -static unsigned int nf_bridge_encap_header_len(const struct sk_buff *skb) +unsigned int nf_bridge_encap_header_len(const struct sk_buff *skb) { switch
[PATCH 07/12] net: sched: Simplify em_ipset_match
From: Eric W. Biederman ebied...@xmission.com

em->net is always set and always available, use it in preference to
dev_net(skb->dev).

Signed-off-by: Eric W. Biederman ebied...@xmission.com
Signed-off-by: Pablo Neira Ayuso pa...@netfilter.org
---
 net/sched/em_ipset.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/sched/em_ipset.c b/net/sched/em_ipset.c
index a3d79c8..df0328b 100644
--- a/net/sched/em_ipset.c
+++ b/net/sched/em_ipset.c
@@ -92,8 +92,8 @@ static int em_ipset_match(struct sk_buff *skb, struct tcf_ematch *em,
 
 	rcu_read_lock();
 
-	if (dev && skb->skb_iif)
-		indev = dev_get_by_index_rcu(dev_net(dev), skb->skb_iif);
+	if (skb->skb_iif)
+		indev = dev_get_by_index_rcu(em->net, skb->skb_iif);
 
 	acpar.in = indev ? indev : dev;
 	acpar.out = dev;
-- 
1.7.10.4

--
To unsubscribe from this list: send the line unsubscribe netdev in
[PATCH 04/12] netfilter: bridge: rename br_netfilter.c to br_netfilter_hooks.c
To prepare separation of the IPv6 code into different file.

Signed-off-by: Pablo Neira Ayuso pa...@netfilter.org
---
 net/bridge/Makefile                                 |    1 +
 net/bridge/{br_netfilter.c => br_netfilter_hooks.c} |    0
 2 files changed, 1 insertion(+)
 rename net/bridge/{br_netfilter.c => br_netfilter_hooks.c} (100%)

diff --git a/net/bridge/Makefile b/net/bridge/Makefile
index fd7ee03..c52577a 100644
--- a/net/bridge/Makefile
+++ b/net/bridge/Makefile
@@ -12,6 +12,7 @@ bridge-$(CONFIG_SYSFS) += br_sysfs_if.o br_sysfs_br.o
 
 bridge-$(subst m,y,$(CONFIG_BRIDGE_NETFILTER)) += br_nf_core.o
 
+br_netfilter-y := br_netfilter_hooks.o
 obj-$(CONFIG_BRIDGE_NETFILTER) += br_netfilter.o
 
 bridge-$(CONFIG_BRIDGE_IGMP_SNOOPING) += br_multicast.o br_mdb.o
diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter_hooks.c
similarity index 100%
rename from net/bridge/br_netfilter.c
rename to net/bridge/br_netfilter_hooks.c
-- 
1.7.10.4
[PATCH 03/12] netfilter: x_tables: align per cpu xt_counter
From: Eric Dumazet eduma...@google.com

Let's force a 16 bytes alignment on xt_counter percpu allocations,
so that bytes and packets sit in same cache line.

xt_counter being exported to user space, we cannot add __align(16) on
the structure itself.

Signed-off-by: Eric Dumazet eduma...@google.com
Cc: Florian Westphal f...@strlen.de
Acked-by: Florian Westphal f...@strlen.de
Signed-off-by: Pablo Neira Ayuso pa...@netfilter.org
---
 include/linux/netfilter/x_tables.h |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/linux/netfilter/x_tables.h b/include/linux/netfilter/x_tables.h
index 95693c4..1c97a22 100644
--- a/include/linux/netfilter/x_tables.h
+++ b/include/linux/netfilter/x_tables.h
@@ -356,7 +356,8 @@ static inline unsigned long ifname_compare_aligned(const char *_a,
  * so nothing needs to be done there.
  *
  * xt_percpu_counter_alloc returns the address of the percpu
- * counter, or 0 on !SMP.
+ * counter, or 0 on !SMP. We force an alignment of 16 bytes
+ * so that bytes/packets share a common cache line.
  *
  * Hence caller must use IS_ERR_VALUE to check for error, this
  * allows us to return 0 for single core systems without forcing
@@ -365,7 +366,8 @@ static inline unsigned long ifname_compare_aligned(const char *_a,
 static inline u64 xt_percpu_counter_alloc(void)
 {
 	if (nr_cpu_ids > 1) {
-		void __percpu *res = alloc_percpu(struct xt_counters);
+		void __percpu *res = __alloc_percpu(sizeof(struct xt_counters),
+						    sizeof(struct xt_counters));
 
 		if (res == NULL)
 			return (u64) -ENOMEM;
-- 
1.7.10.4
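The alignment argument in this patch can be checked in userspace. What follows is only a sketch under stated assumptions: `struct counter_pair` is a hypothetical stand-in for `struct xt_counters` (two 64-bit fields), the cache line is assumed to be 64 bytes, and C11 `aligned_alloc()` plays the role that `__alloc_percpu()` plays in the kernel. It is not the kernel allocator.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct xt_counters: two 64-bit fields, 16 bytes. */
struct counter_pair {
	uint64_t pcnt;	/* packet count */
	uint64_t bcnt;	/* byte count */
};

#define ASSUMED_CACHE_LINE 64u	/* assumption: 64-byte cache lines */

/* Returns 1 if a counter_pair starting at addr stays within one cache line. */
static int same_cache_line(uintptr_t addr)
{
	uintptr_t first = addr / ASSUMED_CACHE_LINE;
	uintptr_t last = (addr + sizeof(struct counter_pair) - 1) / ASSUMED_CACHE_LINE;

	return first == last;
}

/* Userspace analogue of passing the object size as the alignment argument. */
static struct counter_pair *alloc_counter_pair(void)
{
	struct counter_pair *c = aligned_alloc(sizeof(*c), sizeof(*c));

	if (c) {
		c->pcnt = 0;
		c->bcnt = 0;
	}
	return c;
}
```

With only 8-byte alignment the pair may start at offset 56 of a line and spill into the next one; with 16-byte alignment a 16-byte object cannot straddle a 64-byte boundary.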
Re: [PATCH net-next RFC v2 1/3] lwt: infrastructure to support light weight tunnels
On 19/06/15 16:14, roopa wrote:
> On 6/19/15, 7:43 AM, Robert Shearman wrote:
>>> +
>>> +static inline struct lwtunnel_state *lwtunnel_skb_lwstate(struct sk_buff *skb)
>>> +{
>>> +	struct rtable *rt = (struct rtable *)skb_dst(skb);
>>> +
>>> +	return rt->rt_lwtstate;
>>> +}
>>
>> It doesn't look like this patch will build on its own because
>> rt_lwtstate isn't added to struct rtable until patch 2.
>
> looks like i messed up the patch creation. I will fix that with the
> next series.
>
>> More importantly, is it safe to assume that skb_dst will always return
>> an IPv4 dst? How will this look when IPv6 support is added?
>
> Today lwtunnel_skb_lwstate is called from lwtunnel_output which is only
> called from ipv4 code. And my ipv6 variant code was supposed to have a
> 6 suffix. something like lwtunnel_output6. Or to be more explicit i
> will probably have variants of the output and skb handling functions
> like, lwtunnel_output_ipv4 and lwtunnel_output_ipv6.

Do you intend for these functions to be used by netdevices to support
the vxlan use case? If so, then how will the netdevice know which one of
the two to call? Will there have to be a netdevice for ipv4 and a
netdevice for ipv6? If not, could you outline how you intend for it to
be implemented?

Thanks,
Rob
Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )
On Fri, Jun 19, 2015 at 1:17 PM, Steven Rostedt rost...@goodmis.org wrote:
> On Fri, 19 Jun 2015 12:25:53 -0400
> Steven Rostedt rost...@goodmis.org wrote:
>
>> I don't see that 55201 anywhere. But then again, I didn't look for it
>> before the port disappeared. I could reboot and look for it again. I
>> should have saved the full netstat -tapn as well :-/
>
> Of course I didn't find it anywhere, that's the port on my wife's box
> that port 947 was connected to.
>
> Now I even went over to my wife's box and ran
>
>  # rpcinfo -p localhost
>     program vers proto   port  service
>      100000    4   tcp    111  portmapper
>      100000    3   tcp    111  portmapper
>      100000    2   tcp    111  portmapper
>      100000    4   udp    111  portmapper
>      100000    3   udp    111  portmapper
>      100000    2   udp    111  portmapper
>      100024    1   udp  34243  status
>      100024    1   tcp  34498  status
>
> which doesn't show anything.
>
> but something is listening to that port...
>
>  # netstat -ntap |grep 55201
>  tcp        0      0 0.0.0.0:55201      0.0.0.0:*      LISTEN

Hang on. This is on the client box while there is an active NFSv4
mount? Then that's probably the NFSv4 callback channel listening for
delegation callbacks.

Can you please try:

echo "options nfs callback_tcpport=4048" > /etc/modprobe.d/nfs-local.conf

and then either reboot the client or unload and then reload the nfs
modules before reattempting the mount. If this is indeed the callback
channel, then that will move your phantom listener to port 4048...

Cheers
  Trond
[PATCH 01/12] netfilter: nfnetlink_queue: add security context information
From: Roman Kubiak r.kub...@samsung.com This patch adds an additional attribute when sending packet information via netlink in netfilter_queue module. It will send additional security context data, so that userspace applications can verify this context against their own security databases. Signed-off-by: Roman Kubiak r.kub...@samsung.com Acked-by: Florian Westphal f...@strlen.de Signed-off-by: Pablo Neira Ayuso pa...@netfilter.org --- include/uapi/linux/netfilter/nfnetlink_queue.h |4 ++- net/netfilter/nfnetlink_queue_core.c | 35 +++- 2 files changed, 37 insertions(+), 2 deletions(-) diff --git a/include/uapi/linux/netfilter/nfnetlink_queue.h b/include/uapi/linux/netfilter/nfnetlink_queue.h index 8dd819e..b67a853 100644 --- a/include/uapi/linux/netfilter/nfnetlink_queue.h +++ b/include/uapi/linux/netfilter/nfnetlink_queue.h @@ -49,6 +49,7 @@ enum nfqnl_attr_type { NFQA_EXP, /* nf_conntrack_netlink.h */ NFQA_UID, /* __u32 sk uid */ NFQA_GID, /* __u32 sk gid */ + NFQA_SECCTX,/* security context string */ __NFQA_MAX }; @@ -102,7 +103,8 @@ enum nfqnl_attr_config { #define NFQA_CFG_F_CONNTRACK (1 1) #define NFQA_CFG_F_GSO (1 2) #define NFQA_CFG_F_UID_GID (1 3) -#define NFQA_CFG_F_MAX (1 4) +#define NFQA_CFG_F_SECCTX (1 4) +#define NFQA_CFG_F_MAX (1 5) /* flags for NFQA_SKB_INFO */ /* packet appears to have wrong checksums, but they are ok */ diff --git a/net/netfilter/nfnetlink_queue_core.c b/net/netfilter/nfnetlink_queue_core.c index 22a5ac7..6eccf0f 100644 --- a/net/netfilter/nfnetlink_queue_core.c +++ b/net/netfilter/nfnetlink_queue_core.c @@ -278,6 +278,23 @@ nla_put_failure: return -1; } +static u32 nfqnl_get_sk_secctx(struct sk_buff *skb, char **secdata) +{ + u32 seclen = 0; +#if IS_ENABLED(CONFIG_NETWORK_SECMARK) + if (!skb || !sk_fullsock(skb-sk)) + return 0; + + read_lock_bh(skb-sk-sk_callback_lock); + + if (skb-secmark) + security_secid_to_secctx(skb-secmark, secdata, seclen); + + read_unlock_bh(skb-sk-sk_callback_lock); +#endif + return seclen; +} + static 
struct sk_buff * nfqnl_build_packet_message(struct net *net, struct nfqnl_instance *queue, struct nf_queue_entry *entry, @@ -297,6 +314,8 @@ nfqnl_build_packet_message(struct net *net, struct nfqnl_instance *queue, struct nf_conn *ct = NULL; enum ip_conntrack_info uninitialized_var(ctinfo); bool csum_verify; + char *secdata = NULL; + u32 seclen = 0; size =nlmsg_total_size(sizeof(struct nfgenmsg)) + nla_total_size(sizeof(struct nfqnl_msg_packet_hdr)) @@ -352,6 +371,12 @@ nfqnl_build_packet_message(struct net *net, struct nfqnl_instance *queue, + nla_total_size(sizeof(u_int32_t))); /* gid */ } + if ((queue-flags NFQA_CFG_F_SECCTX) entskb-sk) { + seclen = nfqnl_get_sk_secctx(entskb, secdata); + if (seclen) + size += nla_total_size(seclen); + } + skb = nfnetlink_alloc_skb(net, size, queue-peer_portid, GFP_ATOMIC); if (!skb) { @@ -479,6 +504,9 @@ nfqnl_build_packet_message(struct net *net, struct nfqnl_instance *queue, nfqnl_put_sk_uidgid(skb, entskb-sk) 0) goto nla_put_failure; + if (seclen nla_put(skb, NFQA_SECCTX, seclen, secdata)) + goto nla_put_failure; + if (ct nfqnl_ct_put(skb, ct, ctinfo) 0) goto nla_put_failure; @@ -1142,7 +1170,12 @@ nfqnl_recv_config(struct sock *ctnl, struct sk_buff *skb, ret = -EOPNOTSUPP; goto err_out_unlock; } - +#if !IS_ENABLED(CONFIG_NETWORK_SECMARK) + if (flags mask NFQA_CFG_F_SECCTX) { + ret = -EOPNOTSUPP; + goto err_out_unlock; + } +#endif spin_lock_bh(queue-lock); queue-flags = ~mask; queue-flags |= flags mask; -- 1.7.10.4 -- To unsubscribe from this list: send the line unsubscribe netdev in
Re: [PATCH net-next RFC v2 2/3] ipv4: add support for light weight tunnel encap attributes
On 19/06/15 16:28, roopa wrote:
> On 6/19/15, 8:19 AM, Robert Shearman wrote:
>> On 19/06/15 05:49, Roopa Prabhu wrote:
>>> From: Roopa Prabhu ro...@cumulusnetworks.com
>>>
>>> Introduces two netlink attributes RTA_ENCAP_TYPE and RTA_ENCAP
>>> to support attaching encap information to ipv4 routes.
>>
>> Surely RTA_ENCAP_TYPE should be part of RTA_ENCAP, since the type
>> doesn't make sense without the data and vice versa?
>
> I went back and forth on this. And started with what you are saying
> above. But then I wanted RTA_ENCAP netlink policy to be declared by
> individual lwtunnel drivers. And to determine which RTA_ENCAP netlink
> policy to pick, you need to know the RTA_ENCAP policy type (or lwtunnel
> type) which is encoded in RTA_ENCAP_TYPE. And I did not want to
> introduce another level of nest in RTA_ENCAP (because for nexthops we
> are already 2 levels deep when parsing RTA_ENCAP).

No need for that - use the example of how RTA_MULTIPATH is used for
ipv4/ipv6:

+----------------------------+
| RTA_MULTIPATH              |
+----------------------------+
| +------------------------+ |
| | struct rtnexthop       | |
| +------------------------+ |
| | RTA_GATEWAY, etc.      | |
| +------------------------+ |
+----------------------------+

You could do similar for RTA_ENCAP where the type is stored in the data
prior to the nested attributes starting. E.g.:

+----------------------------+
| RTA_ENCAP                  |
+----------------------------+
| +------------------------+ |
| | struct rtencap         | |
| +------------------------+ |
| | MPLS_IPTUNNEL_DST      | |
| +------------------------+ |
+----------------------------+

struct rtencap {
	__u16 rte_type;
};

> Hence, fib code first looks for RTA_ENCAP and if RTA_ENCAP is
> specified, RTA_ENCAP_TYPE is a required attribute. My iproute2 patches
> handles this and makes sure there is an RTA_ENCAP_TYPE specified with
> RTA_ENCAP.

No doubt, but surely it's better to present an unambiguous API to
userspace if possible?

Thanks,
Rob
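To make the proposed layout concrete, here is a rough userspace sketch of how a parser might split such an RTA_ENCAP payload into the leading type and the nested attributes that follow. Only `struct rtencap` and its `rte_type` field come from the message above. `struct attr_hdr`, `ATTR_ALIGN`, `encap_split()` and `attr_find()` are hypothetical names modelled loosely on `struct rtattr`; none of this is code from the series.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Layout proposed in the thread; only rte_type comes from the message. */
struct rtencap {
	uint16_t rte_type;
};

/* Hypothetical minimal attribute header, modelled on struct rtattr. */
struct attr_hdr {
	uint16_t len;	/* header plus payload, in bytes */
	uint16_t type;
};

/* Round up to 4 bytes, the alignment netlink attributes use. */
#define ATTR_ALIGN(n) (((size_t)(n) + 3) & ~(size_t)3)

/*
 * Split the payload into the leading type and the nested attribute area.
 * Returns 0 on success, -1 if the buffer is too short for the header.
 */
static int encap_split(const uint8_t *buf, size_t len, uint16_t *type,
		       const uint8_t **nested, size_t *nested_len)
{
	struct rtencap hdr;
	size_t off = ATTR_ALIGN(sizeof(hdr));

	if (len < off)
		return -1;
	memcpy(&hdr, buf, sizeof(hdr));
	*type = hdr.rte_type;
	*nested = buf + off;
	*nested_len = len - off;
	return 0;
}

/* Find the first nested attribute of type want; NULL if absent or malformed. */
static const uint8_t *attr_find(const uint8_t *p, size_t len, uint16_t want,
				size_t *payload_len)
{
	while (len >= sizeof(struct attr_hdr)) {
		struct attr_hdr a;
		size_t step;

		memcpy(&a, p, sizeof(a));
		if (a.len < sizeof(a) || a.len > len)
			return NULL;
		if (a.type == want) {
			*payload_len = a.len - sizeof(a);
			return p + sizeof(a);
		}
		step = ATTR_ALIGN(a.len);
		if (step > len)
			break;
		p += step;
		len -= step;
	}
	return NULL;
}
```

The point of the layout is that the type is available before any nested attribute is parsed, so a per-type policy could be chosen without a second top-level attribute.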
Re: [PATCH 02/22] fjes: Hardware initialization routine
Hi Izumi-san, On Thu, 18 Jun 2015 09:49:27 +0900 Taku Izumi izumi.t...@jp.fujitsu.com wrote: This patch adds hardware initialization routine to be invoked at driver's .probe routine. Signed-off-by: Taku Izumi izumi.t...@jp.fujitsu.com --- drivers/platform/x86/fjes/Makefile| 2 +- drivers/platform/x86/fjes/fjes_hw.c | 305 ++ drivers/platform/x86/fjes/fjes_hw.h | 254 drivers/platform/x86/fjes/fjes_regs.h | 110 4 files changed, 670 insertions(+), 1 deletion(-) create mode 100644 drivers/platform/x86/fjes/fjes_hw.c create mode 100644 drivers/platform/x86/fjes/fjes_hw.h create mode 100644 drivers/platform/x86/fjes/fjes_regs.h diff --git a/drivers/platform/x86/fjes/Makefile b/drivers/platform/x86/fjes/Makefile index 98e59cb..a67f65d8 100644 --- a/drivers/platform/x86/fjes/Makefile +++ b/drivers/platform/x86/fjes/Makefile @@ -27,5 +27,5 @@ obj-$(CONFIG_FUJITSU_ES) += fjes.o -fjes-objs := fjes_main.o +fjes-objs := fjes_main.o fjes_hw.o diff --git a/drivers/platform/x86/fjes/fjes_hw.c b/drivers/platform/x86/fjes/fjes_hw.c new file mode 100644 index 000..1731827 --- /dev/null +++ b/drivers/platform/x86/fjes/fjes_hw.c @@ -0,0 +1,305 @@ +/* + * FUJITSU Extended Socket Network Device driver + * Copyright (c) 2015 FUJITSU LIMITED + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program; if not, see http://www.gnu.org/licenses/. + * + * The full GNU General Public License is included in this distribution in + * the file called COPYING. 
> + *
> + */
> +
> +#include "fjes_hw.h"
> +#include "fjes.h"
> +
> +/* supported MTU list */
> +u32 fjes_support_mtu[] = {
> +	FJES_MTU_DEFINE(8 * 1024),
> +	FJES_MTU_DEFINE(16 * 1024),
> +	FJES_MTU_DEFINE(32 * 1024),
> +	FJES_MTU_DEFINE(64 * 1024),
> +	0
> +};
> +
> +u32 fjes_hw_rd32(struct fjes_hw *hw, u32 reg)
> +{
> +	u8 *base = hw->base;
> +	u32 value = 0;
> +
> +	value = readl(&base[reg]);
> +
> +	return value;
> +}
> +
> +static u8 *fjes_hw_iomap(struct fjes_hw *hw)
> +{
> +	u8 *base;
> +
> +	if (!request_mem_region(hw->hw_res.start, hw->hw_res.size,
> +		fjes_driver_name)) {
> +		pr_err("request_mem_region failed");
> +		return NULL;
> +	}
> +
> +	base = (u8 *)ioremap_nocache(hw->hw_res.start, hw->hw_res.size);
> +
> +	return base;
> +}
> +
> +
> +int fjes_hw_reset(struct fjes_hw *hw)
> +{
> +
> +	int timeout;
> +	union REG_DCTL dctl;
> +
> +	dctl.Reg = 0;
> +	dctl.Bits.reset = 1;
> +	wr32(XSCT_DCTL, dctl.Reg);
> +
> +
> +	timeout = FJES_DEVICE_RESET_TIMEOUT * 1000;
> +	dctl.Reg = rd32(XSCT_DCTL);
> +	while ((dctl.Bits.reset == 1) && (timeout > 0)) {
> +		msleep(1000);
> +		dctl.Reg = rd32(XSCT_DCTL);
> +		timeout -= 1000;
> +	}
> +
> +	return timeout >= 0 ? 0 : -EIO;

The while loop finishes when timeout becomes 0, so the function always
returns 0. It should be "return dctl.Bits.reset != 1 ? 0 : -EIO".

> +
> +}
> +
> +static int fjes_hw_get_max_epid(struct fjes_hw *hw)
> +{
> +	union REG_MAX_EP info;
> +
> +	info.Reg = rd32(XSCT_MAX_EP);
> +
> +	return info.Bits.maxep;
> +}

This is very difficult to read. Please add a comment. When does
info.Bits.maxep get its value? The function just uses rd32(XSCT_MAX_EP).

> +
> +static int fjes_hw_get_my_epid(struct fjes_hw *hw)
> +{
> +	union REG_OWNER_EPID info;
> +
> +	info.Reg = rd32(XSCT_OWNER_EPID);
> +
> +	return info.Bits.epid;
> +}

Ditto.
+ +static int fjes_hw_alloc_shared_status_region(struct fjes_hw *hw) +{ + size_t size; + + size = sizeof(struct fjes_device_shared_info) + + (sizeof(u8) * hw-max_epid); + hw-hw_info.share = kzalloc(size, GFP_KERNEL); + if (!hw-hw_info.share) + return -ENOMEM; + + hw-hw_info.share-epnum = hw-max_epid; + + return 0; +} + +static int fjes_hw_alloc_epbuf(struct epbuf_handler *epbh) +{ + void *mem; + + mem = vmalloc(EP_BUFFER_SIZE); + if (!mem) + return -ENOMEM; + memset(mem, 0, EP_BUFFER_SIZE); How about use vzalloc(). + + epbh-buffer = mem; + epbh-size = EP_BUFFER_SIZE; + + epbh-info = (union ep_buffer_info *)mem; + epbh-ring = (u8 *) (mem + sizeof(union ep_buffer_info)); + + return 0; +} + +void fjes_hw_setup_epbuf(struct epbuf_handler *epbh, u8 *mac_addr, u32 mtu) +{ + + union ep_buffer_info *info = epbh-info; +
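The review comment on fjes_hw_reset() earlier in this message (derive the return value from the reset bit, not from the remaining timeout) can be sketched outside the driver. This is a minimal illustration, not the fjes code: `reset_pending_fn`, `wait_for_reset()` and `fake_pending()` are hypothetical names, and the sleep between polls is left out.

```c
#include <errno.h>

/* Hypothetical register reader: returns nonzero while the reset bit is set. */
typedef int (*reset_pending_fn)(void *ctx);

/*
 * Poll until the reset bit clears or the budget of polls is used up.
 * The result is derived from the final bit state, not from the remaining
 * budget, so a timeout is reported as -EIO.
 */
static int wait_for_reset(reset_pending_fn pending, void *ctx, int max_polls)
{
	int busy = pending(ctx);

	while (busy && max_polls > 0) {
		/* a driver would sleep here between reads */
		busy = pending(ctx);
		max_polls--;
	}

	return busy ? -EIO : 0;
}

/* Fake device for illustration: reports "busy" for the first *ctx reads. */
static int fake_pending(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return 1;
	}
	return 0;
}
```

A device that clears the bit after two reads yields 0; one that never clears it within the budget yields -EIO, which is the case the loop in the patch could not report.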
Re: [PATCH 1/1 net-next] sunrpc: use sg_init_one() in krb5_rc4_setup_enc/seq_key()
Thanks, applying.--b. On Tue, Jun 16, 2015 at 09:57:52PM +0200, Fabian Frederick wrote: Don't opencode sg_init_one() Signed-off-by: Fabian Frederick f...@skynet.be --- net/sunrpc/auth_gss/gss_krb5_crypto.c | 8 ++-- 1 file changed, 2 insertions(+), 6 deletions(-) diff --git a/net/sunrpc/auth_gss/gss_krb5_crypto.c b/net/sunrpc/auth_gss/gss_krb5_crypto.c index b5408e8..fee3c15 100644 --- a/net/sunrpc/auth_gss/gss_krb5_crypto.c +++ b/net/sunrpc/auth_gss/gss_krb5_crypto.c @@ -881,9 +881,7 @@ krb5_rc4_setup_seq_key(struct krb5_ctx *kctx, struct crypto_blkcipher *cipher, if (err) goto out_err; - sg_init_table(sg, 1); - sg_set_buf(sg, zeroconstant, 4); - + sg_init_one(sg, zeroconstant, 4); err = crypto_hash_digest(desc, sg, 4, Kseq); if (err) goto out_err; @@ -951,9 +949,7 @@ krb5_rc4_setup_enc_key(struct krb5_ctx *kctx, struct crypto_blkcipher *cipher, if (err) goto out_err; - sg_init_table(sg, 1); - sg_set_buf(sg, zeroconstant, 4); - + sg_init_one(sg, zeroconstant, 4); err = crypto_hash_digest(desc, sg, 4, Kcrypt); if (err) goto out_err; -- 2.4.2 -- To unsubscribe from this list: send the line unsubscribe netdev in
Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )
On Fri, 19 Jun 2015 13:39:08 -0400 Trond Myklebust trond.mykleb...@primarydata.com wrote: On Fri, Jun 19, 2015 at 1:17 PM, Steven Rostedt rost...@goodmis.org wrote: On Fri, 19 Jun 2015 12:25:53 -0400 Steven Rostedt rost...@goodmis.org wrote: I don't see that 55201 anywhere. But then again, I didn't look for it before the port disappeared. I could reboot and look for it again. I should have saved the full netstat -tapn as well :-/ Of course I didn't find it anywhere, that's the port on my wife's box that port 947 was connected to. Now I even went over to my wife's box and ran # rpcinfo -p localhost program vers proto port service 104 tcp111 portmapper 103 tcp111 portmapper 102 tcp111 portmapper 104 udp111 portmapper 103 udp111 portmapper 102 udp111 portmapper 1000241 udp 34243 status 1000241 tcp 34498 status which doesn't show anything. but something is listening to that port... # netstat -ntap |grep 55201 tcp0 0 0.0.0.0:55201 0.0.0.0:* LISTEN Hang on. This is on the client box while there is an active NFSv4 mount? Then that's probably the NFSv4 callback channel listening for delegation callbacks. Can you please try: echo options nfs callback_tcpport=4048 /etc/modprobe.d/nfs-local.conf and then either reboot the client or unload and then reload the nfs modules before reattempting the mount. If this is indeed the callback channel, then that will move your phantom listener to port 4048... Right, it was a little unclear to me before, but it now seems clear that the callback socket that the server is opening to the client is the one squatting on the port. ...and that sort of makes sense, doesn't it? That rpc_clnt will stick around for the life of the client's lease, and the rpc_clnt binds to a particular port so that it can reconnect using the same one. Given that Stephen has done the legwork and figured out that reverting those commits fixes the issue, then I suspect that the real culprit is caf4ccd4e88cf2. 
The client is likely closing down the other end of the callback socket when it goes idle. Before that commit, we probably did an xs_close on it, but now we're doing an xs_tcp_shutdown and that leaves the port bound. I'm travelling this weekend and am not set up to reproduce it to confirm, but that does seem to be a plausible scenario.

-- 
Jeff Layton jlay...@poochiereds.net
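The general socket behaviour behind this hypothesis (shutting a socket down does not give up its local port, closing it does) can be seen with plain userspace sockets. This is only an analogy under stated assumptions: it uses the BSD socket API on Linux, not the sunrpc xprt code, and `bind_any()`, `local_port()` and `port_is_free()` are hypothetical helper names.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Bind a TCP socket to 0.0.0.0:port (port 0 lets the kernel pick). Returns fd or -1. */
static int bind_any(unsigned short port)
{
	struct sockaddr_in sa;
	int fd = socket(AF_INET, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;
	memset(&sa, 0, sizeof(sa));
	sa.sin_family = AF_INET;
	sa.sin_addr.s_addr = htonl(INADDR_ANY);
	sa.sin_port = htons(port);
	if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Return the local port a bound socket holds, or 0 on error. */
static unsigned short local_port(int fd)
{
	struct sockaddr_in sa;
	socklen_t len = sizeof(sa);

	if (getsockname(fd, (struct sockaddr *)&sa, &len) < 0)
		return 0;
	return ntohs(sa.sin_port);
}

/* Return 1 if a fresh socket can bind the port right now, 0 otherwise. */
static int port_is_free(unsigned short port)
{
	int fd = bind_any(port);

	if (fd < 0)
		return 0;
	close(fd);
	return 1;
}
```

Calling shutdown(fd, SHUT_RDWR) on a socket from bind_any() and then port_is_free() on its port is expected to report the port as still taken; only after close(fd) does the port become available again.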
Re: [PATCH net] netfilter: nf_qeueue: Drop queue entries on nf_unregister_hook
Add code to nf_unregister_hook to flush the nf_queue when a hook is unregistered. This guarantees that the pointer that the nf_queue code retains into the nf_hook list will remain valid while a packet is queued. I tested what would happen if we do not flush queued packets and was trivially able to obtain the oops below. All that was required was to stop the nf_queue listening process, to delete all of the nf_tables, and to awaken the nf_queue listening process. BUG: unable to handle kernel paging request at 00010001 IP: [00010001] 0x10001 PGD b9c35067 PUD 0 Oops: 0010 [#1] SMP Modules linked in: CPU: 0 PID: 519 Comm: lt-nfqnl_test Not tainted task: 8800b9c8c050 ti: 8800ba9d8000 task.ti: 8800ba9d8000 RIP: 0010:[00010001] [00010001] 0x10001 RSP: 0018:8800ba9dba40 EFLAGS: 00010a16 RAX: 8800bab48a00 RBX: 8800ba9dba90 RCX: 8800ba9dba90 RDX: 8800b9c10128 RSI: 8800ba940900 RDI: 8800bab48a00 RBP: 8800b9c10128 R08: 82976660 R09: 8800ba9dbb28 R10: dead00100100 R11: dead00200200 R12: 8800ba940900 R13: 8313fd50 R14: 8800b9c95200 R15: FS: 7fb91fc34700() GS:8800bfa0() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 00010001 CR3: babfb000 CR4: 07f0 Stack: 8206ab0f 82982240 8800bab48a00 8800b9c100a8 8800b9c10100 0001 8800ba940900 8800b9c10128 8206bd65 8800bfb0d5e0 8800bab48a00 00014dc0 Call Trace: [8206ab0f] ? nf_iterate+0x4f/0xa0 [8206bd65] ? nf_reinject+0x125/0x190 [8206dee5] ? nfqnl_recv_verdict+0x255/0x360 [81386290] ? nla_parse+0x80/0xf0 [8206c42c] ? nfnetlink_rcv_msg+0x13c/0x240 [811b2fec] ? __memcg_kmem_get_cache+0x4c/0x150 [8206c2f0] ? nfnl_lock+0x20/0x20 [82068159] ? netlink_rcv_skb+0xa9/0xc0 [820677bf] ? netlink_unicast+0x12f/0x1c0 [82067ade] ? netlink_sendmsg+0x28e/0x650 [81fdd814] ? sock_sendmsg+0x44/0x50 [81fde07b] ? ___sys_sendmsg+0x2ab/0x2c0 [810e8f73] ? __wake_up+0x43/0x70 [8141a134] ? tty_write+0x1c4/0x2a0 [81fde9f4] ? __sys_sendmsg+0x44/0x80 [823ff8d7] ? system_call_fastpath+0x12/0x6a Code: Bad RIP value. 
RIP [00010001] 0x10001 RSP 8800ba9dba40 CR2: 00010001 ---[ end trace 08eb65d42362793f ]--- Cc: sta...@vger.kernel.org Signed-off-by: Eric W. Biederman ebied...@xmission.com --- Apologies for the duplicate send but I forgot to include the appropriate mailing lists. include/net/netfilter/nf_queue.h | 2 ++ net/netfilter/core.c | 1 + net/netfilter/nf_internals.h | 1 + net/netfilter/nf_queue.c | 17 + net/netfilter/nfnetlink_queue_core.c | 24 +++- 5 files changed, 44 insertions(+), 1 deletion(-) diff --git a/include/net/netfilter/nf_queue.h b/include/net/netfilter/nf_queue.h index d81d584157e1..e8635854a55b 100644 --- a/include/net/netfilter/nf_queue.h +++ b/include/net/netfilter/nf_queue.h @@ -24,6 +24,8 @@ struct nf_queue_entry { struct nf_queue_handler { int (*outfn)(struct nf_queue_entry *entry, unsigned int queuenum); + void(*nf_hook_drop)(struct net *net, + struct nf_hook_ops *ops); }; void nf_register_queue_handler(const struct nf_queue_handler *qh); diff --git a/net/netfilter/core.c b/net/netfilter/core.c index 653e32eac08c..a0e54974e2c9 100644 --- a/net/netfilter/core.c +++ b/net/netfilter/core.c @@ -118,6 +118,7 @@ void nf_unregister_hook(struct nf_hook_ops *reg) static_key_slow_dec(nf_hooks_needed[reg-pf][reg-hooknum]); #endif synchronize_net(); + nf_queue_nf_hook_drop(reg); } EXPORT_SYMBOL(nf_unregister_hook); diff --git a/net/netfilter/nf_internals.h b/net/netfilter/nf_internals.h index ea7f36784b3d..399210693c2a 100644 --- a/net/netfilter/nf_internals.h +++ b/net/netfilter/nf_internals.h @@ -19,6 +19,7 @@ unsigned int nf_iterate(struct list_head *head, struct sk_buff *skb, /* nf_queue.c */ int nf_queue(struct sk_buff *skb, struct nf_hook_ops *elem, struct nf_hook_state *state, unsigned int queuenum); +void nf_queue_nf_hook_drop(struct nf_hook_ops *ops); int __init netfilter_queue_init(void); /* nf_log.c */ diff --git a/net/netfilter/nf_queue.c b/net/netfilter/nf_queue.c index 2e88032cd5ad..cd60d397fe05 100644 --- a/net/netfilter/nf_queue.c +++ 
b/net/netfilter/nf_queue.c @@ -105,6 +105,23 @@ bool nf_queue_entry_get_refs(struct nf_queue_entry *entry) } EXPORT_SYMBOL_GPL(nf_queue_entry_get_refs); +void nf_queue_nf_hook_drop(struct nf_hook_ops *ops) +{ + const struct nf_queue_handler *qh; + struct net
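The idea in the patch description above (drop every queued entry that still points at a hook before the hook goes away, so no stale pointer is followed on reinject) can be sketched with an ordinary linked list. `struct hook`, `struct entry` and the `queue_*` helpers below are hypothetical stand-ins, not the nf_queue data structures, and locking is ignored.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for struct nf_hook_ops and struct nf_queue_entry. */
struct hook {
	int id;
};

struct entry {
	struct entry *next;
	const struct hook *elem;	/* hook this queued packet would resume from */
};

/* Queue one entry that refers to hook h. Returns 0, or -1 on allocation failure. */
static int queue_push(struct entry **head, const struct hook *h)
{
	struct entry *e = malloc(sizeof(*e));

	if (!e)
		return -1;
	e->elem = h;
	e->next = *head;
	*head = e;
	return 0;
}

/* Drop every queued entry that still points at h; returns how many were dropped. */
static int queue_drop_hook(struct entry **head, const struct hook *h)
{
	int dropped = 0;

	while (*head) {
		struct entry *e = *head;

		if (e->elem == h) {
			*head = e->next;	/* unlink before freeing */
			free(e);
			dropped++;
		} else {
			head = &e->next;
		}
	}
	return dropped;
}

/* Count the entries left on the queue. */
static int queue_len(const struct entry *head)
{
	int n = 0;

	for (; head; head = head->next)
		n++;
	return n;
}
```

After queue_drop_hook() returns, nothing left on the list refers to the hook, so it can be freed without leaving a dangling pointer behind.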
Re: [PATCH net-next RFC v2 1/3] lwt: infrastructure to support light weight tunnels
On 6/19/15, 10:25 AM, Robert Shearman wrote:
> On 19/06/15 16:14, roopa wrote:
>> Today lwtunnel_skb_lwstate is called from lwtunnel_output which is only
>> called from ipv4 code. And my ipv6 variant code was supposed to have a
>> 6 suffix. something like lwtunnel_output6. Or to be more explicit i
>> will probably have variants of the output and skb handling functions
>> like, lwtunnel_output_ipv4 and lwtunnel_output_ipv6.
>
> Do you intend for these functions to be used by netdevices to support
> the vxlan use case? If so, then how will the netdevice know which one of
> the two to call? Will there have to be a netdevice for ipv4 and a
> netdevice for ipv6? If not, could you outline how you intend for it to
> be implemented?

In the netdevice case, this output function is not called at all. It
should just follow the existing netdevice the route is pointing to.
Re: [PATCH net-next RFC v2 1/3] lwt: infrastructure to support light weight tunnels
On 19/06/15 19:34, roopa wrote:
> On 6/19/15, 10:25 AM, Robert Shearman wrote:
>> On 19/06/15 16:14, roopa wrote:
>>> Today lwtunnel_skb_lwstate is called from lwtunnel_output which is only
>>> called from ipv4 code. And my ipv6 variant code was supposed to have a
>>> 6 suffix. something like lwtunnel_output6. Or to be more explicit i
>>> will probably have variants of the output and skb handling functions
>>> like, lwtunnel_output_ipv4 and lwtunnel_output_ipv6.
>>
>> Do you intend for these functions to be used by netdevices to support
>> the vxlan use case? If so, then how will the netdevice know which one of
>> the two to call? Will there have to be a netdevice for ipv4 and a
>> netdevice for ipv6? If not, could you outline how you intend for it to
>> be implemented?
>
> In the netdevice case, this output function is not called at all. It
> should just follow the existing netdevice the route is pointing to.

Sorry for not being clear, but I meant that there would have to be
lwtunnel_skb_lwstate functions for ipv4 and ipv6 to match the output
functions. So in the vxlan use case where it's using a netdevice, how
would it determine which one to call?

Thanks,
Rob
Re: [PATCH net-next RFC v2 2/3] ipv4: add support for light weight tunnel encap attributes
On 6/19/15, 10:17 AM, Robert Shearman wrote:
> No need for that - use the example of how RTA_MULTIPATH is used for
> ipv4/ipv6:
>
> +----------------------------+
> | RTA_MULTIPATH              |
> +----------------------------+
> | +------------------------+ |
> | | struct rtnexthop       | |
> | +------------------------+ |
> | | RTA_GATEWAY, etc.      | |
> | +------------------------+ |
> +----------------------------+
>
> You could do similar for RTA_ENCAP where the type is stored in the data
> prior to the nested attributes starting. E.g.:
>
> +----------------------------+
> | RTA_ENCAP                  |
> +----------------------------+
> | +------------------------+ |
> | | struct rtencap         | |
> | +------------------------+ |
> | | MPLS_IPTUNNEL_DST      | |
> | +------------------------+ |
> +----------------------------+
>
> struct rtencap {
> 	__u16 rte_type;
> };

I did think about that... but today rtnexthop looks as if it was
written as a struct initially and then extended with attributes only
because the struct could not be extended (I may be wrong). Half the
fields are in a struct and the others are attributes. It gets
confusing, and I was trying to avoid that.