Re: [RFC PATCH net] sctp: ASCONF-ACK with Unresolvable Address should be sent

2015-08-10 Thread lucien xin
On Mon, Jul 27, 2015 at 9:44 PM, Marcelo Ricardo Leitner
marcelo.leit...@gmail.com wrote:
 On Sat, Jul 25, 2015 at 01:08:08PM +0800, Xin Long wrote:
 RFC 5061:
 This is an opaque integer assigned by the sender to identify each
 request parameter.  The receiver of the ASCONF Chunk will copy this
 32-bit value into the ASCONF Response Correlation ID field of the
 ASCONF-ACK response parameter.  The sender of the ASCONF can use this
 same value in the ASCONF-ACK to find which request the response is
 for.  Note that the receiver MUST NOT change this 32-bit value.

 Address Parameter: TLV

 This field contains an IPv4 or IPv6 address parameter, as described
 in Section 3.3.2.1 of [RFC4960].

 An ASCONF-ACK chunk with an Error Cause Indication Parameter (Unresolvable
 Address) should be sent if the address in a Delete IP Address request is
 not part of the association.

   Endpoint A                      Endpoint B
   (ESTABLISHED)                   (ESTABLISHED)

   ASCONF        ----------------->
   (Delete IP Address)
                 <-----------------  ASCONF-ACK
                                     (Unresolvable Address)
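 A minimal user-space sketch of the rule above, assuming a simplified
 association that holds a small array of IPv4 peer addresses. The types and
 function names are hypothetical (they are not the kernel's sctp_association
 helpers), and the sketch ignores the other Delete IP Address error cases
 RFC 5061 defines. It also echoes the request's correlation ID back
 unchanged, as the RFC text quoted above requires. Cause code 0x0005 is
 Unresolvable Address (RFC 4960, Section 3.3.10.5).

```c
#include <assert.h>
#include <stdint.h>

/* 0x0005 is "Unresolvable Address"; 0 means "no error" in this sketch. */
#define CAUSE_NO_ERROR          0x0000
#define CAUSE_UNRESOLVABLE_ADDR 0x0005

#define MAX_PEERS 8

/* Hypothetical, simplified association holding IPv4 peer addresses. */
struct assoc {
	uint32_t peers[MAX_PEERS];
	int npeers;
};

/* Hypothetical per-request result carried back in the ASCONF-ACK. */
struct asconf_result {
	uint32_t correlation_id; /* copied unchanged from the request */
	uint16_t cause;          /* CAUSE_NO_ERROR or an error cause */
};

/* Return the index of addr in the peer list, or -1 if it is absent. */
static int assoc_lookup_peer(const struct assoc *a, uint32_t addr)
{
	for (int i = 0; i < a->npeers; i++)
		if (a->peers[i] == addr)
			return i;
	return -1;
}

/* Handle one Delete IP Address parameter. An address that is not part
 * of the association is reported as Unresolvable Address instead of
 * being treated as a successful (no-op) deletion. */
static struct asconf_result process_del_ip(struct assoc *a,
					   uint32_t correlation_id,
					   uint32_t addr)
{
	struct asconf_result res = { correlation_id, CAUSE_NO_ERROR };
	int i = assoc_lookup_peer(a, addr);

	if (i < 0) {
		res.cause = CAUSE_UNRESOLVABLE_ADDR;
		return res;
	}

	/* Remove the peer by moving the last entry into its slot. */
	a->peers[i] = a->peers[a->npeers - 1];
	a->npeers--;
	return res;
}
```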

 Signed-off-by: Xin Long lucien@gmail.com
 ---
  net/sctp/sm_make_chunk.c | 15 +--
  1 file changed, 13 insertions(+), 2 deletions(-)

 diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
 index 06320c8..6e399f6 100644
 --- a/net/sctp/sm_make_chunk.c
 +++ b/net/sctp/sm_make_chunk.c
 @@ -3090,8 +3090,19 @@ static __be16 sctp_process_asconf_param(struct sctp_association *asoc,
   sctp_assoc_set_primary(asoc, asconf->transport);
   sctp_assoc_del_nonprimary_peers(asoc,
   asconf->transport);
 - } else
 - sctp_assoc_del_peer(asoc, addr);
 + return SCTP_ERROR_NO_ERROR;
 + }
 +
 + /* If the address is not part of the association, an
 +  * ASCONF-ACK with an Error Cause Indication Parameter
 +  * carrying the Unresolvable Address cause should be
 +  * sent.
 +  */
 + peer = sctp_assoc_lookup_paddr(asoc, addr);
 + if (!peer)
 + return SCTP_ERROR_DNS_FAILED;
 +
 + sctp_assoc_rm_peer(asoc, peer);
   break;
   case SCTP_PARAM_SET_PRIMARY:
   /* ADDIP Section 4.2.4
 --
 2.1.0


 Looks good to me.

   Marcelo


Any update on this one? Has it been accepted?
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH net-next] bpf: fix the bug 'struct bpf_array' has no member named 'prog' in s390 architecture

2015-08-10 Thread Kaixu Xia
The kbuild test robot reported a build error on s390:
'struct bpf_array' has no member named 'prog'. It is caused by commit
2a36f0b92eb638dd023870574eb471b1c56be9ad [656/692] ("bpf: Make the
bpf_prog_array_map more generic"), which replaced the 'prog' member of
struct bpf_array with 'ptrs'. This patch updates the s390 JIT to match.
---
 arch/s390/net/bpf_jit_comp.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c
index 9f4bbc0..eeda051 100644
--- a/arch/s390/net/bpf_jit_comp.c
+++ b/arch/s390/net/bpf_jit_comp.c
@@ -1032,7 +1032,7 @@ static noinline int bpf_jit_insn(struct bpf_jit *jit, struct bpf_prog *fp, int i
  MAX_TAIL_CALL_CNT, 0, 0x2);
 
/*
-* prog = array->prog[index];
+* prog = array->ptrs[index];
 * if (prog == NULL)
 * goto out;
 */
@@ -1041,7 +1041,7 @@ static noinline int bpf_jit_insn(struct bpf_jit *jit, struct bpf_prog *fp, int i
EMIT6_DISP_LH(0xeb00, 0x000d, REG_1, BPF_REG_3, REG_0, 3);
/* lg %r1,prog(%b2,%r1) */
EMIT6_DISP_LH(0xe300, 0x0004, REG_1, BPF_REG_2,
- REG_1, offsetof(struct bpf_array, prog));
+ REG_1, offsetof(struct bpf_array, ptrs));
/* clgij %r1,0,0x8,label0 */
EMIT6_PCREL_IMM_LABEL(0xec00, 0x007d, REG_1, 0, 0, 0x8);
 
-- 
1.8.3.4
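The JIT sequence patched above implements the lookup described in its comment. A plain C sketch of that logic, assuming a hypothetical, cut-down stand-in for struct bpf_array and an assumed tail-call limit (the real limit and struct layout come from the kernel headers, and the JIT's exact ordering of checks may differ):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed limit for this sketch only. */
#define SKETCH_MAX_TAIL_CALL_CNT 32

struct prog {
	int id;
};

/* Hypothetical, reduced stand-in for struct bpf_array: after the
 * rename, the slot array is called 'ptrs' and holds void pointers. */
struct bpf_array_sketch {
	uint32_t max_entries;
	void *ptrs[4];
};

/* Return the program to tail-call into, or NULL to fall through,
 * i.e. the "goto out" path in the JIT comment. */
static struct prog *tail_call_target(const struct bpf_array_sketch *array,
				     uint32_t index, uint32_t *tail_call_cnt)
{
	if (index >= array->max_entries)
		return NULL;
	if (*tail_call_cnt >= SKETCH_MAX_TAIL_CALL_CNT)
		return NULL;
	(*tail_call_cnt)++;

	/* prog = array->ptrs[index]; if (prog == NULL) goto out; */
	return (struct prog *)array->ptrs[index];
}
```

The JIT cannot look the member up by name at run time; it bakes offsetof(struct bpf_array, ptrs) into the lg instruction's displacement, so once 'prog' was renamed the old offsetof() expression simply failed to compile, which is the error the kbuild robot reported.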



Re: [PATCH v2] net/fddi: remove HWM_REVERSE() macro

2015-08-10 Thread yalin wang

 On Aug 11, 2015, at 12:24, David Miller da...@davemloft.net wrote:
 
 From: yalin wang yalin.wang2...@gmail.com
 Date: Tue, 11 Aug 2015 09:57:21 +0800
 
 HWM_REVERSE() macro is unused, remove it.
 
 Signed-off-by: yalin wang yalin.wang2...@gmail.com
 
 Your email client has corrupted this patch.
 
 Please read Documentation/email-clients.txt, send a test patch to yourself,
 and only resubmit this change once you are able to successfully apply the
 patch you receive in that test email.
 
 Thanks.
OK, thanks.


[PATCH v2 Resend] net/fddi: remove HWM_REVERSE() macro

2015-08-10 Thread yalin wang
HWM_REVERSE() macro is unused, remove it.

Signed-off-by: yalin wang yalin.wang2...@gmail.com
---
drivers/net/fddi/skfp/h/hwmtm.h | 10 --
1 file changed, 10 deletions(-)

diff --git a/drivers/net/fddi/skfp/h/hwmtm.h b/drivers/net/fddi/skfp/h/hwmtm.h
index 5924d42..4ca2341 100644
--- a/drivers/net/fddi/skfp/h/hwmtm.h
+++ b/drivers/net/fddi/skfp/h/hwmtm.h
@@ -74,15 +74,6 @@
#define NULL   0
#endif

-#ifdef LITTLE_ENDIAN
-#define HWM_REVERSE(x) (x)
-#else
-#defineHWM_REVERSE(x)  x)24L)0xff00L)   +   
\
-(((x) 8L)0x00ffL)   +   \
-(((x) 8L)0xff00L)   +   \
-(((x)24L)0x00ffL))
-#endif
-
#define C_INDIC (1L<<25)
#define A_INDIC (1L<<26)
#define RD_FS_LOCAL 0x80
-- 
1.9.1
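For reference, on builds without LITTLE_ENDIAN the removed macro computed a plain 32-bit byte swap (on LITTLE_ENDIAN builds it was the identity). A portable sketch of that computation, using a hypothetical helper name; kernel code would use an existing helper such as swab32() instead:

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the byte order of a 32-bit value, as HWM_REVERSE() did on
 * non-LITTLE_ENDIAN builds. Hypothetical helper, for illustration. */
static uint32_t swap32(uint32_t x)
{
	return ((x << 24) & 0xff000000u) |
	       ((x <<  8) & 0x00ff0000u) |
	       ((x >>  8) & 0x0000ff00u) |
	       ((x >> 24) & 0x000000ffu);
}
```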




Re: [PATCH net-next] bpf: fix the bug 'struct bpf_array' has no member named 'prog' in s390 architecture

2015-08-10 Thread David Miller
From: Kaixu Xia xiaka...@huawei.com
Date: Tue, 11 Aug 2015 05:00:24 +

 'Kbuild test robot' sent me an email about a build error
 'struct bpf_array' has no member named 'prog' in s390
 architecture. This error is caused by commit: 2a36f0b92eb
 638dd023870574eb471b1c56be9ad [656/692] bpf: Make the bpf
 _prog_array_map more generic. In this patch, the member 'prog'
 of struct bpf_array has been replaced by 'ptrs'. So this
 patch fix it.

Please resubmit with proper Fixes: and Signed-off-by: tags.


Re: [PATCH v2 Resend] net/fddi: remove HWM_REVERSE() macro

2015-08-10 Thread David Miller
From: yalin wang yalin.wang2...@gmail.com
Date: Tue, 11 Aug 2015 13:11:22 +0800

 HWM_REVERSE() macro is unused, remove it.
 
 Signed-off-by: yalin wang yalin.wang2...@gmail.com

You did not do as I asked you to, this patch is still corrupted
and there is no way you successfully applied what is in this patch.

 -#defineHWM_REVERSE(x)  x)24L)0xff00L)   + 
   \
 -(((x) 8L)0x00ffL)   +   \
 -(((x) 8L)0xff00L)   +   \
 -(((x)24L)0x00ffL))

This indentation here is spaces, whereas in the source files they
are TABS.

Your email client did this.

If you fail to properly verify that your outgoing patches are not
corrupted before submitting them here, I will stop reviewing and
considering your changes.

Thank you.


Re: VxLAN support question

2015-08-10 Thread Alexei Starovoitov

On 8/10/15 4:47 PM, Andrew Qu wrote:


Pretty much what I want is for the kernel to have about 1K interfaces
(something like Tunnel100.1-tunnel100.1000) created and attached to
1K bridge domains, each associated with a given VNI (the VNI-to-bridge-domain
mapping will be assigned using other CLIs).


Creating 1K vxlan devices is doable, but you probably want to take
a look at the recently added metadata mode of vxlan.
Also, it sounds like for each VNI you'd need a different multicast group?
What fabric is going to support that?
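With one vxlan device per VNI, the VNI-to-bridge-domain association lives in device configuration; with a single device in metadata mode, the VNI travels with each packet and whatever consumes that metadata has to map it to a bridge domain. A hypothetical sketch of such a lookup, a small open-addressing table keyed by the 24-bit VNI (the names and sizes are assumptions, not an existing kernel or iproute2 interface):

```c
#include <assert.h>
#include <stdint.h>

#define VNI_MAX   ((1u << 24) - 1) /* the VXLAN VNI is a 24-bit field */
#define NO_DOMAIN 0xffffu
#define MAP_SLOTS 2048u            /* power of two, comfortably > 1K */

/* Hypothetical open-addressing map from VNI to bridge-domain id. */
struct vni_map {
	uint32_t vni[MAP_SLOTS];
	uint16_t domain[MAP_SLOTS];
	uint8_t used[MAP_SLOTS];
};

static uint32_t vni_slot(uint32_t vni)
{
	return (vni * 2654435761u) & (MAP_SLOTS - 1); /* multiplicative hash */
}

/* Insert or update; returns 0 on success, -1 if invalid or full. */
static int vni_map_set(struct vni_map *m, uint32_t vni, uint16_t domain)
{
	if (vni > VNI_MAX)
		return -1;
	for (uint32_t n = 0, i = vni_slot(vni); n < MAP_SLOTS;
	     n++, i = (i + 1) & (MAP_SLOTS - 1)) {
		if (!m->used[i] || m->vni[i] == vni) {
			m->used[i] = 1;
			m->vni[i] = vni;
			m->domain[i] = domain;
			return 0;
		}
	}
	return -1;
}

/* Linear probing stops at the first empty slot (no deletions here). */
static uint16_t vni_map_get(const struct vni_map *m, uint32_t vni)
{
	for (uint32_t n = 0, i = vni_slot(vni); n < MAP_SLOTS;
	     n++, i = (i + 1) & (MAP_SLOTS - 1)) {
		if (!m->used[i])
			return NO_DOMAIN;
		if (m->vni[i] == vni)
			return m->domain[i];
	}
	return NO_DOMAIN;
}
```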


* Email Confidentiality Notice 


please avoid such banners.



Re: [PATCH v2 Resend] net/fddi: remove HWM_REVERSE() macro

2015-08-10 Thread yalin wang

 On Aug 11, 2015, at 13:37, David Miller da...@davemloft.net wrote:
 
 From: yalin wang yalin.wang2...@gmail.com
 Date: Tue, 11 Aug 2015 13:11:22 +0800
 
 HWM_REVERSE() macro is unused, remove it.
 
 Signed-off-by: yalin wang yalin.wang2...@gmail.com
 
 You did not do as I asked you to, this patch is still corrupted
 and there is no way you successfully applied what is in this patch.
 
 -#defineHWM_REVERSE(x)  x)24L)0xff00L)   +
\
 -(((x) 8L)0x00ffL)   +   \
 -(((x) 8L)0xff00L)   +   \
 -(((x)24L)0x00ffL))
 
 This indentation here is spaces, whereas in the source files they
 are TABS.
 
 Your email client did this.
 
 If you fail to properly verify that your outgoing patches are not
 corrupted before submitting them here, I will stop reviewing and
 considering your changes.
 
 Thank you.
Ouch, I am sorry; I am sending from a Windows PC.
Let me check that.
Sorry for that.


Re: [PATCH net] netconsole: Check for carrier before calling netpoll_send_udp()

2015-08-10 Thread Jonathan Maxwell
 What if the carrier check passes, and then the chip reset starts on
 another cpu?  You'll have the same problem.

Okay, let me see if I can come up with a better way to mitigate this.

On Tue, Aug 11, 2015 at 2:22 PM, David Miller da...@davemloft.net wrote:
 From: Jon Maxwell jmaxwel...@gmail.com
 Date: Tue, 11 Aug 2015 11:32:26 +1000

 We have seen a few crashes recently where a NIC is getting
 reset for some reason and then the driver or another module calls
 printk() which invokes netconsole. Netconsole then calls the
 adapter specific poll routine via netpoll which crashes because
 the adapter is resetting and its structures are being reinitialized.

 This isn't a fix.

 What if the carrier check passes, and then the chip reset starts on
 another cpu?  You'll have the same problem.

 I'm not applying this, sorry.
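David's objection is a classic check-then-act race: the carrier test and the netpoll call are two separate steps, so a reset on another CPU can start between them. A minimal sketch of what closing that window requires, assuming a hypothetical per-adapter lock taken by both the reset path and the transmit path. This illustrates the principle only; it is not netconsole's or netpoll's actual locking, and it uses a C11 spinlock so it runs in user space:

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical adapter state shared by the reset and transmit paths. */
struct adapter {
	atomic_flag lock;
	bool resetting;
	int sent;
};

static void adapter_lock(struct adapter *ad)
{
	while (atomic_flag_test_and_set_explicit(&ad->lock,
						 memory_order_acquire))
		; /* spin */
}

static void adapter_unlock(struct adapter *ad)
{
	atomic_flag_clear_explicit(&ad->lock, memory_order_release);
}

/* Reset path: the flag is set while holding the lock, so no sender is
 * inside the transmit critical section when the reset begins. */
static void adapter_reset_begin(struct adapter *ad)
{
	adapter_lock(ad);
	ad->resetting = true;
	adapter_unlock(ad);
}

static void adapter_reset_end(struct adapter *ad)
{
	adapter_lock(ad);
	ad->resetting = false;
	adapter_unlock(ad);
}

/* Transmit path: the state test and the use happen under the same
 * lock, so the answer cannot go stale between them. */
static int adapter_send(struct adapter *ad)
{
	int ret = 0;

	adapter_lock(ad);
	if (ad->resetting)
		ret = -EBUSY;
	else
		ad->sent++; /* stands in for the driver's poll/xmit routine */
	adapter_unlock(ad);
	return ret;
}
```

A real driver's reset path can sleep, so kernel code would need a different primitive or handshake; the sketch only shows why the state test has to be atomic with the use rather than a separate check made beforehand.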


[PATCH net-next] vxlan: fix fdb_dump index calculation

2015-08-10 Thread Atzm Watanabe
When too many remotes are bound to an FDB entry, the dump index may not
be increased. This problem can occur, for instance, in large-scale
environments based on the unicast default destination.

Signed-off-by: Atzm Watanabe a...@iij.ad.jp
---
 drivers/net/vxlan.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index b6731fa..06c0731 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -931,10 +931,10 @@ static int vxlan_fdb_dump(struct sk_buff *skb, struct netlink_callback *cb,
hlist_for_each_entry_rcu(f, &vxlan->fdb_head[h], hlist) {
struct vxlan_rdst *rd;
 
-   if (idx < cb->args[0])
-   goto skip;
-
list_for_each_entry_rcu(rd, &f->remotes, list) {
+   if (idx < cb->args[0])
+   goto skip;
+
err = vxlan_fdb_info(skb, vxlan, f,
 NETLINK_CB(cb->skb).portid,
 cb->nlh->nlmsg_seq,
@@ -942,9 +942,9 @@ static int vxlan_fdb_dump(struct sk_buff *skb, struct netlink_callback *cb,
 NLM_F_MULTI, rd);
if (err < 0)
goto out;
-   }
 skip:
-   ++idx;
+   ++idx;
+   }
}
}
 out:
-- 
2.4.6
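The index counts dump records so that a multi-part netlink dump can resume where the previous message filled up (cb->args[0] holds the resume point). A self-contained sketch of the patched counting, with a hypothetical fixed-capacity output buffer standing in for the skb and an FDB reduced to per-entry remote counts:

```c
#include <assert.h>

/* Hypothetical FDB entry: only the number of remotes matters here. */
struct fdb_entry {
	int nremotes;
};

/* One dump pass. Records whose index is below 'start' were delivered
 * by an earlier pass and are skipped; at most 'room' new records fit in
 * the output buffer. As in the patched loop, every remote advances idx,
 * so the returned resume index is exact even in the middle of an entry. */
static int fdb_dump(const struct fdb_entry *fdb, int n, int start,
		    int room, int *emitted)
{
	int idx = 0;

	*emitted = 0;
	for (int e = 0; e < n; e++) {
		for (int r = 0; r < fdb[e].nremotes; r++) {
			if (idx < start)
				goto skip;
			if (*emitted == room)
				return idx; /* buffer full: resume here next time */
			(*emitted)++;
skip:
			++idx;
		}
	}
	return idx;
}
```

With entries of 1, 5 and 2 remotes and room for 3 records, three passes deliver all 8 records. In this model, the old per-entry counting would return 1 on every pass and re-send the first three remotes of the second entry forever.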



Re: [PATCHv1 bluetooth-next] cc2520: set the default fifo pin value from platform data

2015-08-10 Thread Varka Bhadram

Yup... :-) Your name in the From address (LIYONG) is different from the
one in your SOB (Yong Li).
It should be the same; please fix your email client.

On 08/10/2015 12:59 PM, LIYONG wrote:


When device tree support is disabled, fifo_pin is left uninitialized.
This patch sets the fifo_pin value from the platform data.

Signed-off-by: Yong Li sdliy...@gmail.com
---
  drivers/net/ieee802154/cc2520.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/drivers/net/ieee802154/cc2520.c b/drivers/net/ieee802154/cc2520.c
index 613dae5..c5b54a1 100644
--- a/drivers/net/ieee802154/cc2520.c
+++ b/drivers/net/ieee802154/cc2520.c
@@ -833,6 +833,7 @@ static int cc2520_get_platform_data(struct spi_device *spi,
 if (!spi_pdata)
 return -ENOENT;
 *pdata = *spi_pdata;
+   priv->fifo_pin = pdata->fifo;
 return 0;
 }

--
2.1.0




This patch does not apply.

Please use 'git format-patch' to generate the patch and send it with
'git send-email'.

In your case:
git commit -a -s -m "cc2520: set the default fifo pin value from platform data"
git format-patch --subject-prefix="PATCH v2 bluetooth-next" -1
git send-email 0001-


--
Varka Bhadram.



[PATCH] net/fddi:change HWM_REVERSE() macro

2015-08-10 Thread yalin wang
Change the HWM_REVERSE() macro to use the generic le32_to_cpu().

Signed-off-by: yalin wang yalin.wang2...@gmail.com
---
 drivers/net/fddi/skfp/h/hwmtm.h | 11 ++-
 1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/drivers/net/fddi/skfp/h/hwmtm.h b/drivers/net/fddi/skfp/h/hwmtm.h
index 5924d42..72701ef 100644
--- a/drivers/net/fddi/skfp/h/hwmtm.h
+++ b/drivers/net/fddi/skfp/h/hwmtm.h
@@ -14,7 +14,7 @@
 
 #ifndef _HWM_
 #define _HWM_
-
+#include <linux/byteorder/generic.h>
 #include "mbuf.h"
 
 /*
@@ -74,14 +74,7 @@
 #define NULL   0
 #endif
 
-#ifdef LITTLE_ENDIAN
-#define HWM_REVERSE(x) (x)
-#else
-#defineHWM_REVERSE(x)  x)24L)0xff00L)   +   
\
-(((x) 8L)0x00ffL)   +   \
-(((x) 8L)0xff00L)   +   \
-(((x)24L)0x00ffL))
-#endif
+#define HWM_REVERSE(x) le32_to_cpu(x)
 
 #define C_INDIC (1L<<25)
 #define A_INDIC (1L<<26)
-- 
1.9.1





Re: [PATCH net-next v3 0/8] net: dsa: mv88e6xxx: support switchdev FDB objects

2015-08-10 Thread Vivien Didelot
Hi Andrew,

On 15-08-10 16:11:38, Andrew Lunn wrote:
 On Mon, Aug 10, 2015 at 09:09:45AM -0400, Vivien Didelot wrote:
  This patchset refactors the FDB management in the mv88e6xxx code and adds the
  glue in DSA to use the switchdev FDB objects.
 
 Hi Vivien
 
 Thanks for reworking these patches. Now they are much smaller, they
 are much easier to review.
 
 Reviewed-by: Andrew Lunn and...@lunn.ch

Thanks for your time and suggestions on this, indeed with the reworked
order, the diffs got smaller and more natural.

Regards,
-v


[PATCH net] inet: fix races with reqsk timers

2015-08-10 Thread Eric Dumazet
From: Eric Dumazet eduma...@google.com

reqsk_queue_destroy() and reqsk_queue_unlink() should use
del_timer_sync() instead of del_timer() before calling reqsk_put(),
otherwise we could free a req still used by another cpu.

But before doing so, reqsk_queue_destroy() must release the syn_wait_lock
spinlock or risk a deadlock, as reqsk_timer_handler() might need to take
this same spinlock from reqsk_queue_unlink() (called from
inet_csk_reqsk_queue_drop()).

Fixes: fa76ce7328b2 ("inet: get rid of central tcp/dccp listener timer")
Signed-off-by: Eric Dumazet eduma...@google.com
---
 net/core/request_sock.c |8 +++-
 net/ipv4/inet_connection_sock.c |2 +-
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/net/core/request_sock.c b/net/core/request_sock.c
index 87b22c0..b42f0e2 100644
--- a/net/core/request_sock.c
+++ b/net/core/request_sock.c
@@ -103,10 +103,16 @@ void reqsk_queue_destroy(struct request_sock_queue *queue)
spin_lock_bh(&queue->syn_wait_lock);
while ((req = lopt->syn_table[i]) != NULL) {
lopt->syn_table[i] = req->dl_next;
+   /* Because of following del_timer_sync(),
+* we must release the spinlock here
+* or risk a dead lock.
+*/
+   spin_unlock_bh(&queue->syn_wait_lock);
atomic_inc(&lopt->qlen_dec);
-   if (del_timer(&req->rsk_timer))
+   if (del_timer_sync(&req->rsk_timer))
reqsk_put(req);
reqsk_put(req);
+   spin_lock_bh(&queue->syn_wait_lock);
}
spin_unlock_bh(&queue->syn_wait_lock);
}
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 60021d0..05e3145 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -593,7 +593,7 @@ static bool reqsk_queue_unlink(struct request_sock_queue *queue,
}
 
spin_unlock(&queue->syn_wait_lock);
-   if (del_timer(&req->rsk_timer))
+   if (del_timer_sync(&req->rsk_timer))
reqsk_put(req);
return found;
 }
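The deadlock the patch avoids is a lock-ordering problem: del_timer_sync() waits for a running handler to finish, and that handler may itself need syn_wait_lock, so waiting while holding the lock can never complete. A single-threaded sketch of the patched loop, assuming hypothetical types in which "waiting for a running handler" is modelled by running the rest of that handler synchronously and re-taking a held lock trips an assertion:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum timer_state { T_IDLE, T_PENDING, T_RUNNING };

/* Hypothetical request: one reference held by the queue and, while
 * the timer is armed or running, one held by the timer. */
struct req {
	enum timer_state timer;
	int refcnt;
	struct req *next;
};

struct queue {
	bool locked;       /* models syn_wait_lock (not recursive) */
	struct req *head;
};

static void q_lock(struct queue *q)
{
	/* Taking a lock we already hold would spin forever: a deadlock. */
	assert(!q->locked);
	q->locked = true;
}

static void q_unlock(struct queue *q)
{
	q->locked = false;
}

static void req_put(struct req *r)
{
	r->refcnt--;
}

/* The rest of a timer handler that is already running elsewhere; like
 * reqsk_timer_handler() it may need the queue lock. */
static void handler_finish(struct queue *q, struct req *r)
{
	q_lock(q);
	q_unlock(q);
	r->timer = T_IDLE;
	req_put(r); /* the handler drops the timer's reference */
}

/* Models del_timer_sync(): returns true if it deactivated a pending
 * timer; if the handler is running, waits for it (here: runs it to
 * completion) and returns false. */
static bool cancel_timer_sync(struct queue *q, struct req *r)
{
	if (r->timer == T_PENDING) {
		r->timer = T_IDLE;
		return true;
	}
	if (r->timer == T_RUNNING)
		handler_finish(q, r);
	return false;
}

/* Shape of the patched reqsk_queue_destroy() loop. */
static void queue_destroy(struct queue *q)
{
	struct req *r;

	q_lock(q);
	while ((r = q->head) != NULL) {
		q->head = r->next;
		q_unlock(q);        /* never wait for the handler with the lock held */
		if (cancel_timer_sync(q, r))
			req_put(r); /* timer never ran: drop its reference */
		req_put(r);         /* drop the queue's reference */
		q_lock(q);
	}
	q_unlock(q);
}
```

Moving the q_unlock() call after cancel_timer_sync() makes the assertion in q_lock() fire for the running request, which is the single-threaded analogue of the deadlock described in the commit message.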




Re: [PATCH] net/fddi:change HWM_REVERSE() macro

2015-08-10 Thread Joe Perches
On Tue, 2015-08-11 at 00:14 +0800, yalin wang wrote:
 HWM_REVERSE

is unused, and it would be better to remove it.



Re: [PATCH 08/10] ss: symmetrical subhandler output extension example

2015-08-10 Thread Eric Dumazet
On Mon, 2015-08-10 at 15:19 +0300, Sergei Shtylyov wrote:

 {} not needed. I guess you haven't run your patches thru 
 scripts/checkpatch.pl?
 

Yes, although this is missing from iproute2 sources ;)




[PATCH net-next 9/9] net: Introduce VRF device driver

2015-08-10 Thread David Ahern
This driver borrows heavily from IPvlan and teaming drivers.

Routing domains (VRF-lite) are created by instantiating a VRF master
device with an associated table and enslaving all routed interfaces that
participate in the domain. As part of the enslavement, all connected
routes for the enslaved devices are moved to the table associated with
the VRF device. Outgoing sockets must bind to the VRF device to function.

Standard FIB rules bind the VRF device to tables, and regular FIB rule
processing is followed. Traffic routed through the box is forwarded by
using the VRF device as the IIF and following the IIF rule to the table
that is paired with the VRF.

Example:

   Create vrf 1:
 ip link add vrf1 type vrf table 5
 ip rule add iif vrf1 table 5
 ip rule add oif vrf1 table 5
 ip route add table 5 prohibit default
 ip link set vrf1 up

   Add interface to vrf 1:
 ip link set eth1 master vrf1
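In the example, forwarded traffic is matched by the iif rule (the VRF device stands in as the input interface) and traffic from sockets bound to vrf1 by the oif rule, both pointing at table 5; the prohibit default route means a lookup that finds no more specific route in table 5 ends in an error instead of reaching later rules. A toy model of first-match rule selection, assuming hypothetical types; it models only which table is chosen, not the kernel's fall-through to the next rule when a table has no matching route:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TABLE_MAIN 254 /* id of Linux's main routing table */

/* Hypothetical, reduced rule: NULL selectors match anything. */
struct rule {
	const char *iif;
	const char *oif;
	int table;
};

/* Return the table of the first rule (in priority order) whose
 * selectors all match, or -1 if no rule matches. */
static int select_table(const struct rule *rules, size_t n,
			const char *iif, const char *oif)
{
	for (size_t i = 0; i < n; i++) {
		const struct rule *r = &rules[i];

		if (r->iif && (!iif || strcmp(r->iif, iif) != 0))
			continue;
		if (r->oif && (!oif || strcmp(r->oif, oif) != 0))
			continue;
		return r->table;
	}
	return -1;
}
```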

Signed-off-by: Shrijeet Mukherjee s...@cumulusnetworks.com
Signed-off-by: David Ahern d...@cumulusnetworks.com
---
 drivers/net/Kconfig  |   7 +
 drivers/net/Makefile |   1 +
 drivers/net/vrf.c| 685 +++
 3 files changed, 693 insertions(+)
 create mode 100644 drivers/net/vrf.c

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index c18f9e62a9fa..e58468b02987 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -297,6 +297,13 @@ config NLMON
  diagnostics, etc. This is mostly intended for developers or support
  to debug netlink issues. If unsure, say N.
 
+config NET_VRF
+   tristate "Virtual Routing and Forwarding (Lite)"
+   depends on IP_MULTIPLE_TABLES && IPV6_MULTIPLE_TABLES
+   ---help---
+ This option enables support for mapping interfaces into VRFs. The
+ support enables VRF devices.
+
 endif # NET_CORE
 
 config SUNGEM_PHY
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index c12cb22478a7..ca16dd689b36 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -25,6 +25,7 @@ obj-$(CONFIG_VIRTIO_NET) += virtio_net.o
 obj-$(CONFIG_VXLAN) += vxlan.o
 obj-$(CONFIG_GENEVE) += geneve.o
 obj-$(CONFIG_NLMON) += nlmon.o
+obj-$(CONFIG_NET_VRF) += vrf.o
 
 #
 # Networking Drivers
diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
new file mode 100644
index ..95097cb79354
--- /dev/null
+++ b/drivers/net/vrf.c
@@ -0,0 +1,685 @@
+/*
+ * vrf.c: device driver to encapsulate a VRF space
+ *
+ * Copyright (c) 2015 Cumulus Networks. All rights reserved.
+ * Copyright (c) 2015 Shrijeet Mukherjee s...@cumulusnetworks.com
+ * Copyright (c) 2015 David Ahern d...@cumulusnetworks.com
+ *
+ * Based on dummy, team and ipvlan drivers
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/netdevice.h>
+#include <linux/etherdevice.h>
+#include <linux/ip.h>
+#include <linux/init.h>
+#include <linux/moduleparam.h>
+#include <linux/netfilter.h>
+#include <linux/rtnetlink.h>
+#include <net/rtnetlink.h>
+#include <linux/u64_stats_sync.h>
+#include <linux/hashtable.h>
+
+#include <linux/inetdevice.h>
+#include <net/ip.h>
+#include <net/ip_fib.h>
+#include <net/ip6_route.h>
+#include <net/rtnetlink.h>
+#include <net/route.h>
+#include <net/addrconf.h>
+#include <net/vrf.h>
+
+#define DRV_NAME    "vrf"
+#define DRV_VERSION "1.0"
+
+#define vrf_is_slave(dev)   ((dev)->flags & IFF_SLAVE)
+
+#define vrf_master_get_rcu(dev) \
+   ((struct net_device *)rcu_dereference(dev->rx_handler_data))
+
+struct pcpu_dstats {
+   u64 tx_pkts;
+   u64 tx_bytes;
+   u64 tx_drps;
+   u64 rx_pkts;
+   u64 rx_bytes;
+   struct u64_stats_sync   syncp;
+};
+
+static struct dst_entry *vrf_ip_check(struct dst_entry *dst, u32 cookie)
+{
+   return dst;
+}
+
+static int vrf_ip_local_out(struct sk_buff *skb)
+{
+   return ip_local_out(skb);
+}
+
+static unsigned int vrf_v4_mtu(const struct dst_entry *dst)
+{
+   /* TO-DO: return max ethernet size? */
+   return dst->dev->mtu;
+}
+
+static void vrf_dst_destroy(struct dst_entry *dst)
+{
+   /* our dst lives forever - or until the device is closed */
+}
+
+static unsigned int vrf_default_advmss(const struct dst_entry *dst)
+{
+   return 65535 - 40;
+}
+
+static struct dst_ops vrf_dst_ops = {
+   .family = AF_INET,
+   .local_out  = vrf_ip_local_out,
+   .check  = vrf_ip_check,
+   .mtu= vrf_v4_mtu,
+   .destroy= vrf_dst_destroy,
+   .default_advmss = vrf_default_advmss,
+};
+
+static bool is_ip_rx_frame(struct sk_buff *skb)
+{
+   switch (skb->protocol) {
+   case htons(ETH_P_IP):
+   case htons(ETH_P_IPV6):
+   

[PATCH 3/5] netfilter: conntrack: Use flags in nf_ct_tmpl_alloc()

2015-08-10 Thread Pablo Neira Ayuso
From: Joe Stringer joestrin...@nicira.com

The flags were ignored for this function when it was introduced. Also
fix the style problem in kzalloc.

Fixes: 0838aa7fc ("netfilter: fix netns dependencies with conntrack templates")
Signed-off-by: Joe Stringer joestrin...@nicira.com
Signed-off-by: Pablo Neira Ayuso pa...@netfilter.org
---
 net/netfilter/nf_conntrack_core.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index f168099..3c20d02 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -292,7 +292,7 @@ struct nf_conn *nf_ct_tmpl_alloc(struct net *net, u16 zone, gfp_t flags)
 {
struct nf_conn *tmpl;
 
-   tmpl = kzalloc(sizeof(struct nf_conn), GFP_KERNEL);
+   tmpl = kzalloc(sizeof(*tmpl), flags);
if (tmpl == NULL)
return NULL;
 
@@ -303,7 +303,7 @@ struct nf_conn *nf_ct_tmpl_alloc(struct net *net, u16 zone, gfp_t flags)
if (zone) {
struct nf_conntrack_zone *nf_ct_zone;
 
-   nf_ct_zone = nf_ct_ext_add(tmpl, NF_CT_EXT_ZONE, GFP_ATOMIC);
+   nf_ct_zone = nf_ct_ext_add(tmpl, NF_CT_EXT_ZONE, flags);
if (!nf_ct_zone)
goto out_free;
nf_ct_zone->id = zone;
-- 
1.7.10.4



[PATCH 4/5] netfilter: ip6t_SYNPROXY: fix NULL pointer dereference

2015-08-10 Thread Pablo Neira Ayuso
From: Phil Sutter p...@nwl.cc

This happens when network namespaces are enabled: synproxy_send_tcp()
derives the namespace from nfct, which some callers pass as NULL.

Suggested-by: Patrick McHardy ka...@trash.net
Signed-off-by: Phil Sutter p...@nwl.cc
Acked-by: Patrick McHardy ka...@trash.net
Signed-off-by: Pablo Neira Ayuso pa...@netfilter.org
---
 net/ipv6/netfilter/ip6t_SYNPROXY.c |   18 ++
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/net/ipv6/netfilter/ip6t_SYNPROXY.c b/net/ipv6/netfilter/ip6t_SYNPROXY.c
index 6edb7b1..bcebc24 100644
--- a/net/ipv6/netfilter/ip6t_SYNPROXY.c
+++ b/net/ipv6/netfilter/ip6t_SYNPROXY.c
@@ -37,12 +37,13 @@ synproxy_build_ip(struct sk_buff *skb, const struct in6_addr *saddr,
 }
 
 static void
-synproxy_send_tcp(const struct sk_buff *skb, struct sk_buff *nskb,
+synproxy_send_tcp(const struct synproxy_net *snet,
+ const struct sk_buff *skb, struct sk_buff *nskb,
  struct nf_conntrack *nfct, enum ip_conntrack_info ctinfo,
  struct ipv6hdr *niph, struct tcphdr *nth,
  unsigned int tcp_hdr_size)
 {
-   struct net *net = nf_ct_net((struct nf_conn *)nfct);
+   struct net *net = nf_ct_net(snet->tmpl);
struct dst_entry *dst;
struct flowi6 fl6;
 
@@ -83,7 +84,8 @@ free_nskb:
 }
 
 static void
-synproxy_send_client_synack(const struct sk_buff *skb, const struct tcphdr *th,
+synproxy_send_client_synack(const struct synproxy_net *snet,
+   const struct sk_buff *skb, const struct tcphdr *th,
const struct synproxy_options *opts)
 {
struct sk_buff *nskb;
@@ -119,7 +121,7 @@ synproxy_send_client_synack(const struct sk_buff *skb, const struct tcphdr *th,
 
synproxy_build_options(nth, opts);
 
-   synproxy_send_tcp(skb, nskb, skb->nfct, IP_CT_ESTABLISHED_REPLY,
+   synproxy_send_tcp(snet, skb, nskb, skb->nfct, IP_CT_ESTABLISHED_REPLY,
  niph, nth, tcp_hdr_size);
 }
 
@@ -163,7 +165,7 @@ synproxy_send_server_syn(const struct synproxy_net *snet,
 
synproxy_build_options(nth, opts);
 
-   synproxy_send_tcp(skb, nskb, &snet->tmpl->ct_general, IP_CT_NEW,
+   synproxy_send_tcp(snet, skb, nskb, &snet->tmpl->ct_general, IP_CT_NEW,
  niph, nth, tcp_hdr_size);
 }
 
@@ -203,7 +205,7 @@ synproxy_send_server_ack(const struct synproxy_net *snet,
 
synproxy_build_options(nth, opts);
 
-   synproxy_send_tcp(skb, nskb, NULL, 0, niph, nth, tcp_hdr_size);
+   synproxy_send_tcp(snet, skb, nskb, NULL, 0, niph, nth, tcp_hdr_size);
 }
 
 static void
@@ -241,7 +243,7 @@ synproxy_send_client_ack(const struct synproxy_net *snet,
 
synproxy_build_options(nth, opts);
 
-   synproxy_send_tcp(skb, nskb, NULL, 0, niph, nth, tcp_hdr_size);
+   synproxy_send_tcp(snet, skb, nskb, NULL, 0, niph, nth, tcp_hdr_size);
 }
 
 static bool
@@ -301,7 +303,7 @@ synproxy_tg6(struct sk_buff *skb, const struct xt_action_param *par)
  XT_SYNPROXY_OPT_SACK_PERM |
  XT_SYNPROXY_OPT_ECN);
 
-   synproxy_send_client_synack(skb, th, opts);
+   synproxy_send_client_synack(snet, skb, th, opts);
return NF_DROP;
 
} else if (th->ack && !(th->fin || th->rst || th->syn)) {
-- 
1.7.10.4
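The fix stops deriving the namespace from the nfct argument, which synproxy_send_server_ack() and synproxy_send_client_ack() pass as NULL, and takes it from the per-namespace SYNPROXY state instead. A reduced sketch of that pattern, with hypothetical stand-in types (not the real struct net, nf_conn or synproxy_net):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct net, nf_conn and synproxy_net. */
struct net {
	int id;
};

struct conn {
	struct net *net;
};

struct synproxy_state {
	struct conn *tmpl; /* per-namespace template, always present */
};

/* Before the fix the namespace was read through 'ct', so a NULL 'ct'
 * was dereferenced. After it, the namespace comes from the template,
 * and 'ct' is only an optional extra argument. */
static struct net *synproxy_tx_net(const struct synproxy_state *snet,
				   const struct conn *ct)
{
	(void)ct; /* may legitimately be NULL */
	return snet->tmpl->net;
}
```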



[PATCH 1/5] netfilter: nf_conntrack: silence warning on falling back to vmalloc()

2015-08-10 Thread Pablo Neira Ayuso
Since 88eab472ec21 ("netfilter: conntrack: adjust nf_conntrack_buckets
default value"), the hashtable can easily hit this warning. We got reports
from users who are getting this message in a rather spammy fashion, so
better silence it.

Signed-off-by: Pablo Neira Ayuso pa...@netfilter.org
Acked-by: Florian Westphal f...@strlen.de
---
 net/netfilter/nf_conntrack_core.c |4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 651039a..f168099 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -1544,10 +1544,8 @@ void *nf_ct_alloc_hashtable(unsigned int *sizep, int nulls)
sz = nr_slots * sizeof(struct hlist_nulls_head);
hash = (void *)__get_free_pages(GFP_KERNEL | __GFP_NOWARN | __GFP_ZERO,
get_order(sz));
-   if (!hash) {
-   printk(KERN_WARNING "nf_conntrack: falling back to vmalloc.\n");
+   if (!hash)
hash = vzalloc(sz);
-   }
 
if (hash && nulls)
for (i = 0; i < nr_slots; i++)
-- 
1.7.10.4



Re: [PATCH net-next 1/2] net: track link status of ipv6 nexthops

2015-08-10 Thread David Miller
From: Andy Gospodarek go...@cumulusnetworks.com
Date: Thu,  6 Aug 2015 11:42:33 -0400

 Add support to track current link status of ipv6 nexthops to match
 recent changes that added support for ipv4 nexthops.  There was not a
 field already available that could track these and no space available in
 the existing rt6i_flags field, so this patch adds rt6i_nhflags to struct
 rt6_info.
 
 Signed-off-by: Andy Gospodarek go...@cumulusnetworks.com
 Signed-off-by: Dinesh Dutt dd...@cumulusnetworks.com

This doesn't really make any sense to me.

You can evaluate the state of the link at the time you look at the
route at all of the places where it matters as far as I can tell.

It's very expensive to walk the entire routing table every time a link
goes up or down, so it's much better to take an evaluate-as-needed
approach to implementing this.
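The evaluate-as-needed approach keeps no per-route link flag at all and consults the device's state when a route is actually used, so a link flap costs a single store instead of a walk over every route. A sketch with hypothetical structures (not the kernel's rt6_info or net_device):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins: link state lives only on the device. */
struct dev {
	bool carrier;
};

struct route {
	int id;
	struct dev *dev; /* no per-route copy of the link state */
};

/* A link change is a single store; no routing-table walk is needed. */
static void dev_set_carrier(struct dev *d, bool up)
{
	d->carrier = up;
}

/* Evaluate link state at use time: pick the first nexthop whose
 * device currently has carrier, or NULL if none does. */
static const struct route *select_nexthop(const struct route *rts, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (rts[i].dev->carrier)
			return &rts[i];
	return NULL;
}
```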


[PATCH 0/5] Netfilter fixes for net

2015-08-10 Thread Pablo Neira Ayuso
Hi David,

The following patchset contains five Netfilter fixes for your net tree,
they are:

1) Silence a warning on falling back to vmalloc(). Since 88eab472ec21, we can
   easily hit this warning message, which confuses users. So let's get rid
   of it.

2) Recently, when porting the template object allocation on top of kmalloc to
   fix the netns dependencies between x_tables and conntrack, the error
   checks were left unchanged. Remove IS_ERR() and check for NULL instead.
   Patch from Dan Carpenter.

3) Don't ignore gfp_flags in the new nf_ct_tmpl_alloc() function, from
   Joe Stringer.

4) Fix a crash due to NULL pointer dereference in ip6t_SYNPROXY, patch from
   Phil Sutter.

5) The sequence number of the SYN+ACK that is sent from SYNPROXY to clients is
   not adjusted through our NAT infrastructure; as a result, the client may
   ignore this TCP packet and the TCP flow hangs until the client probes us.
   Also from Phil Sutter.

You can pull these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git

Thanks!



The following changes since commit 15f1bb1f1e067be7088ed43ef23d59629bd24348:

  qlcnic: Fix corruption while copying (2015-07-29 23:57:26 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git master

for you to fetch changes up to 3c16241c445303a90529565e7437e1f240acfef2:

  netfilter: SYNPROXY: fix sending window update to client (2015-08-10 13:55:07 +0200)


Dan Carpenter (1):
  netfilter: nf_conntrack: checking for IS_ERR() instead of NULL

Joe Stringer (1):
  netfilter: conntrack: Use flags in nf_ct_tmpl_alloc()

Pablo Neira Ayuso (1):
  netfilter: nf_conntrack: silence warning on falling back to vmalloc()

Phil Sutter (2):
  netfilter: ip6t_SYNPROXY: fix NULL pointer dereference
  netfilter: SYNPROXY: fix sending window update to client

 net/ipv4/netfilter/ipt_SYNPROXY.c  |  3 ++-
 net/ipv6/netfilter/ip6t_SYNPROXY.c | 19 +++
 net/netfilter/nf_conntrack_core.c  |  8 +++-
 net/netfilter/nf_synproxy_core.c   |  4 +---
 net/netfilter/xt_CT.c              |  5 +++--
 5 files changed, 20 insertions(+), 19 deletions(-)


[PATCH 2/5] netfilter: nf_conntrack: checking for IS_ERR() instead of NULL

2015-08-10 Thread Pablo Neira Ayuso
From: Dan Carpenter dan.carpen...@oracle.com

We recently changed this from nf_conntrack_alloc() to nf_ct_tmpl_alloc(),
so the error handling needs to be changed to check for NULL instead of
IS_ERR().

Fixes: 0838aa7fcfcd ('netfilter: fix netns dependencies with conntrack templates')
Signed-off-by: Dan Carpenter dan.carpen...@oracle.com
Signed-off-by: Pablo Neira Ayuso pa...@netfilter.org
---
 net/netfilter/nf_synproxy_core.c | 4 +---
 net/netfilter/xt_CT.c            | 5 +++--
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/net/netfilter/nf_synproxy_core.c b/net/netfilter/nf_synproxy_core.c
index 71f1e9f..d7f1685 100644
--- a/net/netfilter/nf_synproxy_core.c
+++ b/net/netfilter/nf_synproxy_core.c
@@ -353,10 +353,8 @@ static int __net_init synproxy_net_init(struct net *net)
 	int err = -ENOMEM;
 
 	ct = nf_ct_tmpl_alloc(net, 0, GFP_KERNEL);
-	if (IS_ERR(ct)) {
-		err = PTR_ERR(ct);
+	if (!ct)
 		goto err1;
-	}
 
 	if (!nfct_seqadj_ext_add(ct))
 		goto err2;
diff --git a/net/netfilter/xt_CT.c b/net/netfilter/xt_CT.c
index c663003..43ddeee 100644
--- a/net/netfilter/xt_CT.c
+++ b/net/netfilter/xt_CT.c
@@ -202,9 +202,10 @@ static int xt_ct_tg_check(const struct xt_tgchk_param *par,
 		goto err1;
 
 	ct = nf_ct_tmpl_alloc(par->net, info->zone, GFP_KERNEL);
-	ret = PTR_ERR(ct);
-	if (IS_ERR(ct))
+	if (!ct) {
+		ret = -ENOMEM;
 		goto err2;
+	}
 
 	ret = 0;
 	if ((info->ct_events || info->exp_events) &&
-- 
1.7.10.4
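Why the old IS_ERR() checks could never fire can be shown with a small user-space model of the kernel's error-pointer convention (a simplification: ERR_PTR(), PTR_ERR() and IS_ERR() are re-implemented here after the include/linux/err.h idea, and tmpl_alloc_fails() is a hypothetical stand-in for a failing nf_ct_tmpl_alloc()). Only the top MAX_ERRNO addresses encode errors, so a NULL return is not an error pointer and slips through.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* User-space model of the error-pointer convention: the top MAX_ERRNO
 * addresses encode negative errno values. NULL is NOT one of them. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical allocator that, like nf_ct_tmpl_alloc(), signals
 * failure by returning NULL. */
void *tmpl_alloc_fails(void) { return NULL; }

/* Old pattern: IS_ERR(NULL) is false, so failure is reported as success. */
int check_old(void)
{
	void *ct = tmpl_alloc_fails();

	if (IS_ERR(ct))
		return (int)PTR_ERR(ct);
	return 0;	/* wrongly continues with ct == NULL */
}

/* Fixed pattern, as in the xt_CT.c hunk: test for NULL, map to -ENOMEM. */
int check_new(void)
{
	void *ct = tmpl_alloc_fails();

	if (!ct)
		return -ENOMEM;
	return 0;
}
```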



[PATCH 5/5] netfilter: SYNPROXY: fix sending window update to client

2015-08-10 Thread Pablo Neira Ayuso
From: Phil Sutter p...@nwl.cc

Upon receipt of SYNACK from the server, ipt_SYNPROXY first sends back an ACK to
finish the server handshake, then calls nf_ct_seqadj_init() to initiate
sequence number adjustment of forwarded packets to the client, and finally sends
a window update to the client to unblock its TX queue.

Since synproxy_send_client_ack() does not set synproxy_send_tcp()'s nfct
parameter, no sequence number adjustment happens and the client receives the
window update with an incorrect sequence number. Depending on the client's TCP
implementation, this leads to a significant delay (until a window probe is
sent).

Signed-off-by: Phil Sutter p...@nwl.cc
Signed-off-by: Pablo Neira Ayuso pa...@netfilter.org
---
 net/ipv4/netfilter/ipt_SYNPROXY.c  | 3 ++-
 net/ipv6/netfilter/ip6t_SYNPROXY.c | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/netfilter/ipt_SYNPROXY.c b/net/ipv4/netfilter/ipt_SYNPROXY.c
index fe8cc18..95ea633e 100644
--- a/net/ipv4/netfilter/ipt_SYNPROXY.c
+++ b/net/ipv4/netfilter/ipt_SYNPROXY.c
@@ -226,7 +226,8 @@ synproxy_send_client_ack(const struct synproxy_net *snet,
 
 	synproxy_build_options(nth, opts);
 
-	synproxy_send_tcp(skb, nskb, NULL, 0, niph, nth, tcp_hdr_size);
+	synproxy_send_tcp(skb, nskb, skb->nfct, IP_CT_ESTABLISHED_REPLY,
+			  niph, nth, tcp_hdr_size);
 }
 
 static bool
diff --git a/net/ipv6/netfilter/ip6t_SYNPROXY.c b/net/ipv6/netfilter/ip6t_SYNPROXY.c
index bcebc24..ebbb754 100644
--- a/net/ipv6/netfilter/ip6t_SYNPROXY.c
+++ b/net/ipv6/netfilter/ip6t_SYNPROXY.c
@@ -243,7 +243,8 @@ synproxy_send_client_ack(const struct synproxy_net *snet,
 
 	synproxy_build_options(nth, opts);
 
-	synproxy_send_tcp(snet, skb, nskb, NULL, 0, niph, nth, tcp_hdr_size);
+	synproxy_send_tcp(snet, skb, nskb, skb->nfct, IP_CT_ESTABLISHED_REPLY,
+			  niph, nth, tcp_hdr_size);
 
 static bool
-- 
1.7.10.4
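A rough model of what the sequence number adjustment does, and why the window update goes wrong without the conntrack attached. This is a sketch under simplifying assumptions (a single fixed offset per connection; struct seq_offset, to_client_seq() and window_update_seq() are hypothetical names, not the kernel's seqadj code): SYNPROXY answered the client's SYN with its own ISN, the server later picked a different ISN, so sequence numbers in the server's space must be shifted by the difference before they reach the client.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-connection state; the kernel keeps the equivalent
 * in the conntrack seqadj extension. */
struct seq_offset {
	uint32_t proxy_isn;	/* ISN SYNPROXY used in its SYN/ACK to the client */
	uint32_t server_isn;	/* ISN the real server used in its SYN/ACK */
};

/* Translate a server-side sequence number into the client's view.
 * uint32_t arithmetic wraps modulo 2^32, like TCP sequence space. */
uint32_t to_client_seq(const struct seq_offset *o, uint32_t server_seq)
{
	return server_seq - o->server_isn + o->proxy_isn;
}

/* Sequence number carried by the window update sent to the client.
 * has_ct models whether synproxy_send_tcp() got the conntrack
 * (skb->nfct in the patch) and could therefore apply the adjustment. */
uint32_t window_update_seq(const struct seq_offset *o, uint32_t server_seq,
			   int has_ct)
{
	return has_ct ? to_client_seq(o, server_seq) : server_seq;
}
```

With proxy_isn = 1000 and server_isn = 50000, the client expects 1001 next; before the fix the update carries 50001, which the client does not recognize as in sequence.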



Re: [PATCH net] netlink: make sure -EBUSY won't escape from netlink_insert

2015-08-10 Thread David Miller
From: Daniel Borkmann dan...@iogearbox.net
Date: Fri,  7 Aug 2015 00:26:41 +0200

 Linus reports the following deadlock on rtnl_mutex; triggered only
 once so far (extract):
 ...
 So far, the recursive call into rtnetlink_rcv() looks like the plausible
 suspect. One way this could trigger is that the sender's
 NETLINK_CB(skb).portid was wrongly 0 (which is the rtnetlink socket), so
 the rtnl_getlink() request's answer would be sent to the kernel instead
 of to the actual user process, thus grabbing rtnl_mutex() twice.
 
 One theory would be that netlink_autobind(), triggered via netlink_sendmsg(),
 internally overwrites the -EBUSY error with 0, while that -EBUSY wrongly
 originates from __netlink_insert() instead. That would reset the
 socket's portid to 0, which is then filled into NETLINK_CB(skb).portid
 later on. As commit d470e3b483dc ("[NETLINK]: Fix two socket hashing bugs.")
 also puts it, -EBUSY should not be propagated from netlink_insert().
 
 This looks very unlikely to reproduce. We would need to trigger the
 rhashtable_insert_rehash() handler in a situation where rehashing is
 currently in progress (one /rare/ way would be to hit the ht->elasticity
 limits while the table is not filled enough to expand, but that would
 require a specifically crafted bind() sequence with knowledge about
 destination slots, which seems unlikely). It probably makes sense to guard
 __netlink_insert() in any case and remap that error. It was suggested
 that EOVERFLOW might be better than an already overloaded ENOMEM.
 
 Reference: http://thread.gmane.org/gmane.linux.network/372676
 Reported-by: Linus Torvalds torva...@linux-foundation.org
 Signed-off-by: Daniel Borkmann dan...@iogearbox.net

Applied and queued up for -stable, thanks.
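The failure mode described in the quoted patch can be sketched in user space. This is a simplified model, not the kernel code: table_insert(), insert() and autobind() are hypothetical stand-ins for __netlink_insert(), netlink_insert() and netlink_autobind(), and the table is made to fail with -EBUSY unconditionally, as the quoted text says the rhashtable backend may during a rehash. Without the remap, autobind() reports success while the portid stays 0; with it, the caller sees -EOVERFLOW.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical socket model; portid 0 is the kernel's own portid. */
struct sock_model {
	unsigned int portid;
};

/* Stand-in for __netlink_insert(): always fails with -EBUSY here. */
static int table_insert(struct sock_model *sk, unsigned int portid)
{
	(void)sk;
	(void)portid;
	return -EBUSY;
}

/* Stand-in for netlink_insert(); remap_busy selects the patched behaviour. */
int insert(struct sock_model *sk, unsigned int portid, int remap_busy)
{
	int err = table_insert(sk, portid);

	if (err == -EBUSY && remap_busy)
		err = -EOVERFLOW;	/* must not look like "already bound" */
	if (!err)
		sk->portid = portid;
	return err;
}

/* Stand-in for netlink_autobind(): -EBUSY is taken to mean that another
 * thread already bound the socket, i.e. it is treated as success. */
int autobind(struct sock_model *sk, unsigned int portid, int remap_busy)
{
	int err = insert(sk, portid, remap_busy);

	if (err == -EBUSY)
		err = 0;
	return err;
}
```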


Re: kernel warning in tcp_fragment

2015-08-10 Thread Jovi Zhangwei
Ping?

We saw a lot of these warnings in our production systems. We would greatly
appreciate it if someone could give us a fix for these warnings. :)

On Fri, Jul 31, 2015 at 11:04 AM, Jovi Zhangwei j...@cloudflare.com wrote:
 Hi Eric,

 Would you like to share your thoughts on this bug? Many thanks.


 On Mon, Jul 27, 2015 at 4:19 PM, Martin KaFai Lau ka...@fb.com wrote:
 On Wed, Jul 22, 2015 at 11:55:35AM -0700, Jovi Zhangwei wrote:
 Sorry for disturbing you. Our production systems (3.14 and 3.18 stable
 kernels) have many tcp_fragment warnings; the trace is the same as the
 one below, which you discussed before.

 https://urldefense.proofpoint.com/v1/url?u=http://comments.gmane.org/gmane.linux.network/365658k=ZVNjlDMF0FElm4dQtryO4A%3D%3D%0Ar=%2Faj1ZOQObwbmtLwlDw3XzQ%3D%3D%0Am=fQUME5h%2FYY3oZjXbnLC3z6TaEEcTBSCAji4PkNqFjq8%3D%0As=1527f3221a6f31cba9544e5ddaa20986aafe8be8c898b42c7e9ce5e68d3803d8

 But I didn't find the final solution in that mail thread. Do you have
 any new ideas or patches for this warning?

 I think the following points to the last discussion. We are currently using a
 similar patch:
 http://comments.gmane.org/gmane.linux.network/366549

 Eric, any update on your findings? Or have you already pushed a fix?

 Thanks,
 --Martin


Re: [PATCH 08/10] ss: symmetrical subhandler output extension example

2015-08-10 Thread Sergei Shtylyov

On 08/10/2015 05:53 PM, Eric Dumazet wrote:


 {} not needed. I guess you haven't run your patches thru
scripts/checkpatch.pl?



Yes, although this is missing from iproute2 sources ;)


   Oh, sorry, somehow I thought it's a kernel patch. :-)

MBR, Sergei



Re: IPv6 and private net with masquerading not working correctly

2015-08-10 Thread Cong Wang
(Cc'ing netdev and netfilter-devel)

On Fri, Aug 7, 2015 at 6:00 AM, Gerhard Wiesinger li...@wiesinger.com wrote:
 On 06.08.2015 20:43, Gerhard Wiesinger wrote:

 Hello,

 I'm having the following problem with IPv6 and a private internal LAN
 which is masqueraded to the public internet (I don't want to have
 public IPs in the LAN because of some static IPs and tracking). Rules are
 generated by shorewall.

 The problem is that the source address of ICMPv6 packets is not translated
 by the kernel on the reply when the MTU has to be discovered because the
 packets are too big for the limited MTU on the path (this also happens with
 tcp6, which therefore does not work correctly).

 # From an internal host on net fd00:1234:5678::/64
 ping6 -s 2000 2a02:1234:5678:7::2

 /etc/shorewall6/masq
 EXT_IF   fc00::/7

 ip6tables rule:
 MASQUERADE  all  *  *   fc00::/7 ::/0

 # Internal interface
 IP6 fd00:1234:5678::9 > 2a02:1234:5678:7::2: frag (0|1432) ICMP6, echo request, seq 1, length 1432
 IP6 fd00:1234:5678::9 > 2a02:1234:5678:7::2: frag (1432|576)
 IP6 2a02:1234:5678:9abc::115 > fd00:1234:5678::9: ICMP6, packet too big, mtu 1440, length 1240

 # External interface
 IP6 2001:1234:5678:9abc::1 > 2a02:1234:5678:7::2: frag (0|1432) ICMP6, echo request, seq 1, length 1432
 IP6 2001:1234:5678:9abc::1 > 2a02:1234:5678:7::2: frag (1432|576)
 IP6 2a02:1234:5678:9abc::115 > 2001:1234:5678:9abc::1: ICMP6, packet too big, mtu 1440, length 1240

 Looks to me like a major kernel bug.
 Kernel version is: 4.1.3-201.fc22.x86_64 from Fedora 22

 Any ideas?


 Any comments?

 Ciao,
 Gerhard

 --
 http://www.wiesinger.com/


 --
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/


[PATCH net-next 00/10] VRF-lite - v5

2015-08-10 Thread David Ahern
In the context of internet-scale routing, a requirement that always comes
up is the need to partition the available routing tables into disjoint
routing planes. A specific use case is the multi-tenancy problem, where
each tenant has its own unique routing tables and at the very least
needs different default gateways.

This patch set adds the ability to create virtual router domains (aka VRFs,
VRF-lite to be specific) in the Linux packet forwarding stack. The main
observation is that, through the use of rules and socket binding to interfaces,
all the facilities that we need are already present in the infrastructure. What
is missing is a handle that identifies a routing domain and can be used to
gather the applicable rules/tables and uniquify neighbor selection. The scheme
used needs to preserve the notions of ECMP and general routing principles.

This driver is a cross between the functionality that the IPVLAN and Team
drivers provide: a device is created, and packets into/out of the routing
domain are shuttled through this device. The device is then used as a handle
to identify the applicable rules. The VRF device is thus the Layer 3
equivalent of a VLAN device.

The most important point to note is that this is only a Layer 3 concept,
so L2 tools (e.g., LLDP) do not need to be run in each VRF; processes can
run in unaware mode or select a VRF to talk through. Also, the
behavioral model is a generalized application of the familiar VRF-Lite
model, with some performance paths that need optimization (specifically
the output route selector that Roopa, Robert, Thomas and EricB are
currently discussing on the MPLS thread).

High Level points
=================
1. Simple overlay driver (minimal changes to current stack)
   * uses the existing fib tables and fib rules infrastructure
2. Modelled closely after the ipvlan driver
3. Uses current API and infrastructure.
   * Applications can use SO_BINDTODEVICE or cmsg device identifiers
 to pick VRF (ping, traceroute just work)
   * Standard IP Rules work, and since they are aggregated against the
 device, scale is manageable
4. Completely orthogonal to Namespaces and only provides separation in
   the routing plane (and ARP)

                                        N2
       N1 (all configs here)            +---------------+
    +------------------------------+    |               |
    |swp1 :10.0.1.1                +----+swp1 :10.0.1.2 |
    |                              |    |               |
    |swp2 :10.0.2.1                +----+swp2 :10.0.2.2 |
    |                              |    +---------------+
    | VRF 1                        |
    | table 5                      |
    |                              |
    +------------------------------+
    |                              |
    | VRF 2                        |    N3
    | table 6                      |    +---------------+
    |                              |    |               |
    |swp3 :10.0.2.1                +----+swp1 :10.0.2.2 |
    |                              |    |               |
    |swp4 :10.0.3.1                +----+swp2 :10.0.3.2 |
    +------------------------------+    +---------------+


Given the topology above, the setup needed to get the basic VRF
functions working would be:

Create the VRF devices and associate them with a table
ip link add vrf1 type vrf table 5
ip link add vrf2 type vrf table 6

Install the lookup rules that map table to VRF domain
ip rule add pref 200 oif vrf1 lookup 5
ip rule add pref 200 iif vrf1 lookup 5
ip rule add pref 200 oif vrf2 lookup 6
ip rule add pref 200 iif vrf2 lookup 6

ip link set vrf1 up
ip link set vrf2 up

Enslave the routing member interfaces
ip link set swp1 master vrf1
ip link set swp2 master vrf1
ip link set swp3 master vrf2
ip link set swp4 master vrf2

Connected and local routes are automatically moved from main and local
tables to the VRF table.

ping using VRF 1 is simply
ping -I vrf1 10.0.1.2


Design Highlights
=================
If a device is enslaved to a VRF device (i.e., associated with a VRF),
then:
1. Rx path
   The master device index is used as the iif for all lookups.

2. Tx path
   Similarly, for Tx the VRF device oif is used in the flow to direct
   lookups to the table associated with the VRF via its rule. From there
   the FLOWI_FLAG_VRFSRC flag is used to indicate that the oif should
   not be used for FIB table lookups.

3. Connected and local routes
   On link up for a device, connected and local routes are added to the
   table associated with the VRF device, rather than the local and main
   tables.

4. Socket lookups
   Socket lookups use the VRF device for comparison with sk_bound_dev_if.
   If a socket is not bound to a device, a socket match can happen based
   on destination address, port and protocol, in which case a VRF-global
   or agnostic process handles the connection (i.e., this allows one listener
   socket to handle connections across VRFs). The child socket becomes
   bound to the 

[PATCH net-next 2/9] net: Use VRF device index for lookups on RX

2015-08-10 Thread David Ahern
On ingress, use the index of the VRF master device for route lookups if the
real device is enslaved. Rules are expected to be installed for the VRF device
to direct lookups to a specific table.

Signed-off-by: Shrijeet Mukherjee s...@cumulusnetworks.com
Signed-off-by: David Ahern d...@cumulusnetworks.com
---
 net/ipv4/fib_frontend.c | 8 +++++++-
 net/ipv4/route.c        | 3 ++-
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 6b98de0d7949..d8ced1d89f1b 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -45,6 +45,7 @@
 #include <net/ip_fib.h>
 #include <net/rtnetlink.h>
 #include <net/xfrm.h>
+#include <net/vrf.h>
 
 #ifndef CONFIG_IP_MULTIPLE_TABLES
 
@@ -309,7 +310,9 @@ static int __fib_validate_source(struct sk_buff *skb, __be32 src, __be32 dst,
 	bool dev_match;
 
 	fl4.flowi4_oif = 0;
-	fl4.flowi4_iif = oif ? : LOOPBACK_IFINDEX;
+	fl4.flowi4_iif = vrf_master_ifindex_rcu(dev);
+	if (!fl4.flowi4_iif)
+		fl4.flowi4_iif = oif ? : LOOPBACK_IFINDEX;
 	fl4.daddr = src;
 	fl4.saddr = dst;
 	fl4.flowi4_tos = tos;
@@ -339,6 +342,9 @@ static int __fib_validate_source(struct sk_buff *skb, __be32 src, __be32 dst,
 			if (nh->nh_dev == dev) {
 				dev_match = true;
 				break;
+			} else if (vrf_master_ifindex_rcu(nh->nh_dev) == dev->ifindex) {
+				dev_match = true;
+				break;
 			}
 		}
 #else
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 18fd7c9095c7..c26ff1f7067d 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -112,6 +112,7 @@
 #endif
 #include <net/secure_seq.h>
 #include <net/ip_tunnels.h>
+#include <net/vrf.h>
 
 #define RT_FL_TOS(oldflp4) \
 	((oldflp4)->flowi4_tos & (IPTOS_RT_MASK | RTO_ONLINK))
@@ -1726,7 +1727,7 @@ static int ip_route_input_slow(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 	 *	Now we are ready to route packet.
 	 */
 	fl4.flowi4_oif = 0;
-	fl4.flowi4_iif = dev->ifindex;
+	fl4.flowi4_iif = vrf_master_ifindex_rcu(dev) ? : dev->ifindex;
 	fl4.flowi4_mark = skb->mark;
 	fl4.flowi4_tos = tos;
 	fl4.flowi4_scope = RT_SCOPE_UNIVERSE;
-- 
2.3.2 (Apple Git-55)
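On the RX side, the change boils down to which ifindex goes into flowi4_iif before the FIB rules run. Below is a deliberately simplified user-space model: the structs, the ifindex values and the first-match rule list are hypothetical, and only the "master ifindex, else own ifindex" fallback mirrors the patch (written with a standard ternary instead of GNU `?:`). An enslaved port is looked up as its VRF master, so a rule like `ip rule add pref 200 iif vrf1 lookup 5` selects table 5, while an unenslaved device falls through to the main table (254).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device model: master_ifindex == 0 means "not enslaved". */
struct netdev_model {
	int ifindex;
	int master_ifindex;	/* ifindex of the VRF device, or 0 */
};

/* Hypothetical "iif X lookup T" rule. */
struct rule_model {
	int iif;
	int table;
};

#define MAIN_TABLE 254	/* value of RT_TABLE_MAIN */

/* Mirrors "vrf_master_ifindex_rcu(dev) ? : dev->ifindex". */
int lookup_iif(const struct netdev_model *dev)
{
	return dev->master_ifindex ? dev->master_ifindex : dev->ifindex;
}

/* First matching rule wins; otherwise fall back to the main table. */
int select_table(const struct rule_model *rules, size_t n, int iif)
{
	for (size_t i = 0; i < n; i++)
		if (rules[i].iif == iif)
			return rules[i].table;
	return MAIN_TABLE;
}
```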



[PATCH 06/10] ss: replaced old output mechanisms with fmt handlers interfaces

2015-08-10 Thread Matthias Tafelmeier
Now that the fmt (json, hr) handlers are in place, all output can go through
these newly devised code parts.

Signed-off-by: Matthias Tafelmeier matthias.tafelme...@gmx.net
Suggested-by: Hagen Paul Pfeifer ha...@jauu.net
---
 misc/ss.c | 330 +-
 1 file changed, 152 insertions(+), 178 deletions(-)

diff --git a/misc/ss.c b/misc/ss.c
index 8fb6e7d..993a87b 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -105,6 +105,7 @@ int show_sock_ctx = 0;
 int user_ent_hash_build_init = 0;
 int follow_events = 0;
 int json_output = 0;
+int json_first_elem = 1;
 
 int netid_width;
 int state_width;
@@ -113,6 +114,8 @@ int addr_width;
 int serv_width;
 int screen_width;
 
+enum out_fmt_type fmt_type = FMT_HR;
+
 static const char *TCP_PROTO = "tcp";
 static const char *UDP_PROTO = "udp";
 static const char *RAW_PROTO = "raw";
@@ -346,6 +349,16 @@ static FILE *ephemeral_ports_open(void)
 #define USER_ENT_HASH_SIZE 256
 struct user_ent *user_ent_hash[USER_ENT_HASH_SIZE];
 
+static void json_print_opening(void)
+{
+	if (json_output && json_first_elem) {
+		json_first_elem = 0;
+		printf("{\n");
+	} else if (json_output) {
+		printf(",\n{\n");
+	}
+}
+
 static int user_ent_hashfn(unsigned int ino)
 {
 	int val = (ino >> 24) ^ (ino >> 16) ^ (ino >> 8) ^ ino;
@@ -791,7 +804,8 @@ do_numeric:
 	return buf;
 }
 
-static void inet_addr_print(const inet_prefix *a, int port, unsigned int ifindex)
+static void inet_addr_print(const inet_prefix *a, int port,
+			    unsigned int ifindex, char *peer_kind)
 {
 	char buf[1024];
 	const char *ap = buf;
@@ -819,8 +833,8 @@ static void inet_addr_print(const inet_prefix *a, int port, unsigned int ifindex
 		est_len -= strlen(ifname) + 1;  /* +1 for percent char */
 	}
 
-	sock_addr_print_width(est_len, ap, ":", serv_width, resolve_service(port),
-			ifname);
+	sock_addr_fmt(ap, est_len, ":", serv_width, resolve_service(port),
+			ifname, peer_kind);
 }
 
 static int inet2_addr_match(const inet_prefix *a, const inet_prefix *p,
@@ -1352,21 +1366,27 @@ static void inet_stats_print(struct sockstat *s, int protocol)
 {
 	char *buf = NULL;
 
-	sock_state_print(s, proto_name(protocol));
+	sock_state_fmt(s, sstate_name, proto_name(protocol),
+			netid_width, state_width);
 
-	inet_addr_print(&s->local, s->lport, s->iface);
-	inet_addr_print(&s->remote, s->rport, 0);
+	if (json_output)
+		printf("\t,\"peers\": {\n");
+
+	inet_addr_print(&s->local, s->lport, s->iface, "local");
+	inet_addr_print(&s->remote, s->rport, 0, "remote");
+	if (json_output)
+		printf("}");
 
 	if (show_proc_ctx || show_sock_ctx) {
 		if (find_entry(s->ino, &buf,
-				(show_proc_ctx & show_sock_ctx) ?
-				PROC_SOCK_CTX : PROC_CTX) > 0) {
-			printf(" users:(%s)", buf);
+			       (show_proc_ctx & show_sock_ctx) ?
+			       PROC_SOCK_CTX : PROC_CTX) > 0) {
+			sock_users_fmt(buf);
 			free(buf);
 		}
 	} else if (show_users) {
 		if (find_entry(s->ino, &buf, USERS) > 0) {
-			printf(" users:(%s)", buf);
+			sock_users_fmt(buf);
 			free(buf);
 		}
 	}
@@ -1470,16 +1490,16 @@ static int tcp_show_line(char *line, const struct filter *f, int family)
 	inet_stats_print(&s.ss, IPPROTO_TCP);
 
 	if (show_options)
-		tcp_timer_print(&s);
+		tcp_timer_fmt(&s);
 
 	if (show_details) {
-		sock_details_print(&s.ss);
+		sock_details_fmt(&s.ss, GENERIC_DETAIL, 0, 0);
 		if (opt[0])
-			printf(" opt:\"%s\"", opt);
+			opt_fmt(opt);
 	}
 
 	if (show_tcpinfo)
-		tcp_stats_print(&s);
+		tcp_stats_fmt(&s);
 
 	printf("\n");
 	return 0;
@@ -1523,31 +1543,14 @@ static void print_skmeminfo(struct rtattr *tb[], int attrtype)
 			const struct inet_diag_meminfo *minfo =
 				RTA_DATA(tb[INET_DIAG_MEMINFO]);
 
-			printf(" mem:(r%u,w%u,f%u,t%u)",
-					minfo->idiag_rmem,
-					minfo->idiag_wmem,
-					minfo->idiag_fmem,
-					minfo->idiag_tmem);
+			mem_fmt(minfo);
 		}
 		return;
 	}
 
 	skmeminfo = RTA_DATA(tb[attrtype]);
 
-	printf(" skmem:(r%u,rb%u,t%u,tb%u,f%u,w%u,o%u",
-	       skmeminfo[SK_MEMINFO_RMEM_ALLOC],
-	       skmeminfo[SK_MEMINFO_RCVBUF],
-  

[PATCH 09/10] ss: symmetrical formatter extension example

2015-08-10 Thread Matthias Tafelmeier
This commit briefly shows where to place changes when one wants to
extend an ss output formatter with a new handler (format print
procedure). The extension is done symmetrically: every existing
formatter is extended with a semantically equivalent handler (hr and
json formatter).

Signed-off-by: Matthias Tafelmeier matthias.tafelme...@gmx.net
Suggested-by: Hagen Paul Pfeifer ha...@jauu.net
---
 misc/ss_hr_fmt.c   | 61 ++
 misc/ss_json_fmt.c | 65 ++
 misc/ss_out_fmt.c  | 10 +
 misc/ss_out_fmt.h  | 10 +
 4 files changed, 146 insertions(+)

diff --git a/misc/ss_hr_fmt.c b/misc/ss_hr_fmt.c
index 40b6b7c..ca73dda 100644
--- a/misc/ss_hr_fmt.c
+++ b/misc/ss_hr_fmt.c
@@ -242,6 +242,66 @@ static void packet_show_ring_hr_fmt(struct packet_diag_ring *ring)
 	printf(",features:0x%x", ring->pdr_features);
 }
 }
 
+static void packet_details_hr_fmt(struct packet_diag_info *pinfo,
+		struct packet_diag_ring *ring_rx,
+		struct packet_diag_ring *ring_tx,
+		uint32_t fanout,
+		bool has_fanout)
+{
+	if (pinfo) {
+		printf("\n\tver:%d", pinfo->pdi_version);
+		printf(" cpy_thresh:%d", pinfo->pdi_copy_thresh);
+		printf(" flags( ");
+		if (pinfo->pdi_flags & PDI_RUNNING)
+			printf("running");
+		if (pinfo->pdi_flags & PDI_AUXDATA)
+			printf(" auxdata");
+		if (pinfo->pdi_flags & PDI_ORIGDEV)
+			printf(" origdev");
+		if (pinfo->pdi_flags & PDI_VNETHDR)
+			printf(" vnethdr");
+		if (pinfo->pdi_flags & PDI_LOSS)
+			printf(" loss");
+		if (!pinfo->pdi_flags)
+			printf("0");
+		printf(" )");
+	}
+	if (ring_rx) {
+		printf("\n\tring_rx(");
+		packet_show_ring_fmt(ring_rx);
+		printf(")");
+	}
+	if (ring_tx) {
+		printf("\n\tring_tx(");
+		packet_show_ring_fmt(ring_tx);
+		printf(")");
+	}
+	if (has_fanout) {
+		uint16_t type = (fanout >> 16) & 0xffff;
+
+		printf("\n\tfanout(");
+		printf("id:%d,", fanout & 0xffff);
+		printf("type:");
+
+		if (type == 0)
+			printf("hash");
+		else if (type == 1)
+			printf("lb");
+		else if (type == 2)
+			printf("cpu");
+		else if (type == 3)
+			printf("roll");
+		else if (type == 4)
+			printf("random");
+		else if (type == 5)
+			printf("qm");
+		else
+			printf("0x%x", type);
+
+		printf(")");
+	}
+}
+
 const struct fmt_op_hub hr_output_op = {
 	.tcp_stats_fmt = tcp_stats_hr_fmt,
 	.tcp_timer_fmt = tcp_timer_hr_fmt,
@@ -257,4 +317,5 @@ const struct fmt_op_hub hr_output_op = {
 	.opt_fmt = opt_hr_fmt,
 	.proc_fmt = proc_hr_fmt,
 	.packet_show_ring_fmt = packet_show_ring_hr_fmt,
+	.packet_details_fmt = packet_details_hr_fmt
 };
diff --git a/misc/ss_json_fmt.c b/misc/ss_json_fmt.c
index d7dfce9..3d10220 100644
--- a/misc/ss_json_fmt.c
+++ b/misc/ss_json_fmt.c
@@ -355,6 +355,70 @@ static void packet_show_ring_json_fmt(struct packet_diag_ring *ring)
 	printf("\"features_0x\" : \"%x\"\n", ring->pdr_features);
 }
 
+static void packet_details_json_fmt(struct packet_diag_info *pinfo,
+		struct packet_diag_ring *ring_rx,
+		struct packet_diag_ring *ring_tx,
+		uint32_t fanout,
+		bool has_fanout)
+{
+	printf(",\n");
+	if (pinfo) {
+		printf("\t\"ver\": \"%d\",\n", pinfo->pdi_version);
+		printf("\t\"cpy_thresh\": \"%d\",\n", pinfo->pdi_copy_thresh);
+		printf("\t\"flags\": \"");
+		if (pinfo->pdi_flags & PDI_RUNNING)
+			printf("running");
+		if (pinfo->pdi_flags & PDI_AUXDATA)
+			printf("_auxdata");
+		if (pinfo->pdi_flags & PDI_ORIGDEV)
+			printf("_origdev");
+		if (pinfo->pdi_flags & PDI_VNETHDR)
+			printf("_vnethdr");
+		if (pinfo->pdi_flags & PDI_LOSS)
+			printf("_loss");
+		if (!pinfo->pdi_flags)
+			printf("0");
+		printf("\"");
+		res_json_fmt_branch(ring_rx || ring_tx || has_fanout, ' ');
+	}
+	if (ring_rx) {
+		printf("\t\"ring_rx\": {");
+		packet_show_ring_fmt(ring_rx);
+		printf("}");
+		res_json_fmt_branch(ring_tx || has_fanout, ' ');
+	}
+	if (ring_tx) {
+		printf("\t\"ring_tx\": {");
+   

V2 iproute2: full ss json support and general output simplification

2015-08-10 Thread Matthias Tafelmeier
TLDR:

- add full JSON support for ss
- Patchset provides a general and easy to use abstraction to extend ss later
- Patchset size is large to minimize effort in daily use (users should not have
to deal with formatting (json, human readable) later on)
- Patches 8/10 and 9/10 illustrate how to extend ss for new data to support
human readable and json output.
- Example usages: 1. ss -jt to print out all tcp related information formatted
  in json
  2. ss --json -a to print out all info (also summary)

STATS:

Matthias Tafelmeier (10):
  ss: rooted out ss type declarations for output formatters
  ss: created formatters for json and hr
  ss: removed obsolet fmt functions
  ss: prepare timer for output handler usage
  ss: framed skeleton for json output in ss
  ss: replaced old output mechanisms with fmt handlers interfaces
  ss: renaming and export of current_filter
  ss: symmetrical subhandler output extension example
  ss: symmetrical formatter extension example
  ss: fixed free on local array for valid json output

 misc/Makefile      |    2 +-
 misc/ss.c          | 1006 +++-
 misc/ss_hr_fmt.c   |  321 +
 misc/ss_hr_fmt.h   |    9 +
 misc/ss_json_fmt.c |  438 +++
 misc/ss_json_fmt.h |   24 ++
 misc/ss_out_fmt.c  |  137 +++
 misc/ss_out_fmt.h  |   92 +
 misc/ss_types.h    |  186 ++
 9 files changed, 1564 insertions(+), 651 deletions(-)
 create mode 100644 misc/ss_hr_fmt.c
 create mode 100644 misc/ss_hr_fmt.h
 create mode 100644 misc/ss_json_fmt.c
 create mode 100644 misc/ss_json_fmt.h
 create mode 100644 misc/ss_out_fmt.c
 create mode 100644 misc/ss_out_fmt.h
 create mode 100644 misc/ss_types.h

-- 

Abstract: 

This patch set originates from the need to upgrade ss with the ability
to output in json format. In order not to clutter up ss too much, the author
of the patch set chose a simple distributor-to-handler approach. That
is, the distributor provides the mechanical interface which passes the output
requests coming from ss to the appropriate handler. This simplifies the
interaction with ss and provides maximum future extensibility. Also, ss
loses weight, since the output code implemented in ss itself migrates to
the appropriate handler. Additionally, because types are shared among the
handlers, the distributor and ss, the author concluded that a separate
container module for types had to be formed. In future, all type declarations
and extensions go there.

In sum, the patchset is this large because there is no viable way to emit
syntactically correct human readable and json output in a simpler manner.
The requirement for convenient extensibility of output and data is
another justification for the patchset size.
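One reason correct json needs extra machinery is separator state: json_print_opening() in patch 06/10 uses the global json_first_elem so that only objects after the first one are preceded by a comma. A minimal sketch of the same idea, assuming a buffer-backed writer so the result can be checked (struct json_writer and the jw_* helpers are hypothetical, and the newlines the patch prints are omitted):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical buffer-backed writer; first_elem plays the role of
 * json_first_elem in the patch. */
struct json_writer {
	char buf[256];
	size_t len;
	int first_elem;	/* 1 until the first object has been opened */
};

void jw_puts(struct json_writer *w, const char *s)
{
	size_t room = sizeof(w->buf) - w->len;
	int n = snprintf(w->buf + w->len, room, "%s", s);

	if (n < 0)
		return;
	/* On truncation, snprintf() reports the untruncated length. */
	w->len += ((size_t)n < room) ? (size_t)n : room - 1;
}

void jw_init(struct json_writer *w)
{
	w->buf[0] = '\0';
	w->len = 0;
	w->first_elem = 1;
}

/* Same branch structure as json_print_opening(). */
void jw_open_elem(struct json_writer *w)
{
	if (w->first_elem) {
		w->first_elem = 0;
		jw_puts(w, "{");
	} else {
		jw_puts(w, ",{");
	}
}

void jw_close_elem(struct json_writer *w)
{
	jw_puts(w, "}");
}
```

Opening and closing three objects yields `{},{},{}`: no leading or trailing comma, which a stateless printf-per-socket approach cannot guarantee.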

Concept sketch:

                                 +------------+
                          +----->| formatter1 |
                          |      +------------+
   +----+   +-------------+
   | ss |-->| distributor |
   +----+   +-------------+
                          |      +------------+
                          +----->| formatter2 |
                                 +------------+
At the moment, the distributor is the ss_out_fmt module, and two handlers are
in place: ss_json_fmt and ss_hr_fmt (human readable). You can use
those modules as the main reference for your own extensions.
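The distributor-to-handler approach amounts to one table of function pointers per output format plus a single dispatch point. Here is a minimal sketch: fmt_op_hub, opt_fmt, opt_hr_fmt, FMT_HR and the hr text ` opt:"%s"` come from the patches, while FMT_JSON, the buffer-based signatures, the explicit fmt_type argument and the json key name are assumptions made to keep the example self-contained and testable.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum out_fmt_type { FMT_HR, FMT_JSON };	/* FMT_JSON: assumed name */

/* One operation shown; the real hub has many (tcp_stats_fmt, ...). */
struct fmt_op_hub {
	int (*opt_fmt)(char *buf, size_t n, const char *opt);
};

/* Human-readable handler: same text ss printed before the patch set. */
static int opt_hr_fmt(char *buf, size_t n, const char *opt)
{
	return snprintf(buf, n, " opt:\"%s\"", opt);
}

/* JSON handler: the "opt" key is an assumption. */
static int opt_json_fmt(char *buf, size_t n, const char *opt)
{
	return snprintf(buf, n, "\"opt\": \"%s\"", opt);
}

static const struct fmt_op_hub hr_output_op = { .opt_fmt = opt_hr_fmt };
static const struct fmt_op_hub json_output_op = { .opt_fmt = opt_json_fmt };

/* Distributor: ss calls this and never deals with the format itself. */
int opt_fmt(enum out_fmt_type fmt_type, char *buf, size_t n, const char *opt)
{
	const struct fmt_op_hub *hub =
		(fmt_type == FMT_JSON) ? &json_output_op : &hr_output_op;

	return hub->opt_fmt(buf, n, opt);
}
```

Adding a new output operation then means one new member in the hub plus one handler per formatter, which is the "symmetrical" extension that patches 08/10 and 09/10 walk through.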

Future Extension:
In the following, I will expand on the extensibility of the formatter model.
The explanations advance from the minimal to the most sweeping extension in
mind.

Sub Format Handler 

[PATCH 08/10] ss: symmetrical subhandler output extension example

2015-08-10 Thread Matthias Tafelmeier
This small patch shows the locations which have to be changed for a
symmetrical output extension. Symmetrical means, in this context, that
all existing semantically related handlers in the various formatters
(hr and json so far) are extended.

Signed-off-by: Matthias Tafelmeier matthias.tafelme...@gmx.net
Suggested-by: Hagen Paul Pfeifer ha...@jauu.net
---
 misc/ss_hr_fmt.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/misc/ss_hr_fmt.c b/misc/ss_hr_fmt.c
index 6955ea5..40b6b7c 100644
--- a/misc/ss_hr_fmt.c
+++ b/misc/ss_hr_fmt.c
@@ -82,6 +82,8 @@ static void tcp_stats_hr_fmt(struct tcpstat *s)
 		printf(" reordering:%d", s->reordering);
 	if (s->rcv_rtt)
 		printf(" rcv_rtt:%g", s->rcv_rtt);
+	if (s->rcv_space)
+		printf(" rcv_space:%d", s->rcv_space);
 
 	CHECK_FMT_ADAPT(s->rcv_space, s,
 		hr_handler_must_be_adapted_accordingly_when_json_fmt_is_extended);
-- 
1.9.1



[PATCH 04/10] ss: prepare timer for output handler usage

2015-08-10 Thread Matthias Tafelmeier
Minor preparation patch.

Renamed and exported the timer name table so that it does not have to be
passed as a function parameter.

Signed-off-by: Matthias Tafelmeier matthias.tafelme...@gmx.net
Suggested-by: Hagen Paul Pfeifer ha...@jauu.net
---
 misc/ss.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/misc/ss.c b/misc/ss.c
index e241b2f..1b3ef90 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -647,7 +647,7 @@ static const char *sstate_namel[] = {
 	[SS_CLOSING] = "closing",
 };
 
-static const char *tmr_name[] = {
+const char *ss_timer_name[] = {
 	"off",
 	"on",
 	"keepalive",
-- 
1.9.1



[PATCH 07/10] ss: renaming and export of current_filter

2015-08-10 Thread Matthias Tafelmeier
Exported current_filter as ss_current_filter, because the fmt handlers
need that piece of information to resolve json output issues.

Signed-off-by: Matthias Tafelmeier matthias.tafelme...@gmx.net
Suggested-by: Hagen Paul Pfeifer ha...@jauu.net
---
 misc/ss.c | 218 +++---
 1 file changed, 109 insertions(+), 109 deletions(-)

diff --git a/misc/ss.c b/misc/ss.c
index 993a87b..5eba08d 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -199,7 +199,7 @@ static const struct filter default_afs[AF_MAX] = {
 };
 
 static int do_default = 1;
-static struct filter current_filter;
+struct filter ss_current_filter;
 
 static void filter_db_set(struct filter *f, int db)
 {
@@ -1189,7 +1189,7 @@ void *parse_hostcond(char *addr, bool is_port)
 	struct aafilter a = { .port = -1 };
 	struct aafilter *res;
 	int fam = preferred_family;
-	struct filter *f = &current_filter;
+	struct filter *f = &ss_current_filter;
 
 	if (fam == AF_UNIX || strncmp(addr, "unix:", 5) == 0) {
 		char *p;
@@ -1288,9 +1288,9 @@ void *parse_hostcond(char *addr, bool is_port)
 		if (get_integer(&a.port, port, 0)) {
 			struct servent *se1 = NULL;
 			struct servent *se2 = NULL;
-			if (current_filter.dbs&(1<<UDP_DB))
+			if (ss_current_filter.dbs & (1 << UDP_DB))
 				se1 = getservbyname(port, UDP_PROTO);
-			if (current_filter.dbs&(1<<TCP_DB))
+			if (ss_current_filter.dbs & (1 << TCP_DB))
 				se2 = getservbyname(port, TCP_PROTO);
 			if (se1 && se2 && se1->s_port != se2->s_port) {
 				fprintf(stderr, "Error: ambiguous port \"%s\".\n", port);
@@ -1304,9 +1304,9 @@ void *parse_hostcond(char *addr, bool is_port)
struct scache *s;
for (s = rlist; s; s = s->next) {
if ((s->proto == UDP_PROTO &&
-    (current_filter.dbs&(1<<UDP_DB))) ||
+    (ss_current_filter.dbs&(1<<UDP_DB))) ||
(s->proto == TCP_PROTO &&
-    (current_filter.dbs&(1<<TCP_DB)))) {
+    (ss_current_filter.dbs&(1<<TCP_DB)))) {
if (s->name &&
strcmp(s->name, port) == 0) {
if (a.port > 0 &&
 a.port != s->port) {
fprintf(stderr, "Error: ambiguous port \"%s\".\n", port);
@@ -3221,19 +3221,19 @@ int main(int argc, char *argv[])
follow_events = 1;
break;
case 'd':
-   filter_db_set(&current_filter, DCCP_DB);
+   filter_db_set(&ss_current_filter, DCCP_DB);
break;
case 't':
-   filter_db_set(&current_filter, TCP_DB);
+   filter_db_set(&ss_current_filter, TCP_DB);
break;
case 'u':
-   filter_db_set(&current_filter, UDP_DB);
+   filter_db_set(&ss_current_filter, UDP_DB);
break;
case 'w':
-   filter_db_set(&current_filter, RAW_DB);
+   filter_db_set(&ss_current_filter, RAW_DB);
break;
case 'x':
-   filter_af_set(&current_filter, AF_UNIX);
+   filter_af_set(&ss_current_filter, AF_UNIX);
break;
case 'a':
state_filter = SS_ALL;
@@ -3242,25 +3242,25 @@ int main(int argc, char *argv[])
state_filter = (1 << SS_LISTEN) | (1 << SS_CLOSE);
break;
case '4':
-   filter_af_set(&current_filter, AF_INET);
+   filter_af_set(&ss_current_filter, AF_INET);
break;
case '6':
-   filter_af_set(&current_filter, AF_INET6);
+   filter_af_set(&ss_current_filter, AF_INET6);
break;
case '0':
-   filter_af_set(&current_filter, AF_PACKET);
+   filter_af_set(&ss_current_filter, AF_PACKET);
break;
case 'f':
if (strcmp(optarg, "inet") == 0)
-   filter_af_set(&current_filter, AF_INET);
+  

[PATCH 03/10] ss: removed obsolete fmt functions

2015-08-10 Thread Matthias Tafelmeier
These functions are obsolete, since the new fmt handler mechanism
subsumes their tasks. Keeping them would introduce redundancy that
contradicts the new mechanism.

Signed-off-by: Matthias Tafelmeier matthias.tafelme...@gmx.net
Suggested-by: Hagen Paul Pfeifer ha...@jauu.net
---
 misc/ss.c | 190 --
 1 file changed, 190 deletions(-)

diff --git a/misc/ss.c b/misc/ss.c
index 3d31b81..e241b2f 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -647,43 +647,6 @@ static const char *sstate_namel[] = {
[SS_CLOSING] = closing,
 };
 
-static void sock_state_print(struct sockstat *s, const char *sock_name)
-{
-   if (netid_width)
-   printf("%-*s ", netid_width, sock_name);
-   if (state_width)
-   printf("%-*s ", state_width, sstate_name[s->state]);
-
-   printf("%-6d %-6d ", s->rq, s->wq);
-}
-
-static void sock_details_print(struct sockstat *s)
-{
-   if (s->uid)
-   printf(" uid:%u", s->uid);
-
-   printf(" ino:%u", s->ino);
-   printf(" sk:%llx", s->sk);
-}
-
-static void sock_addr_print_width(int addr_len, const char *addr, char *delim,
-   int port_len, const char *port, const char *ifname)
-{
-   if (ifname) {
-   printf("%*s%%%s%s%-*s ", addr_len, addr, ifname, delim,
-   port_len, port);
-   }
-   else {
-   printf("%*s%s%-*s ", addr_len, addr, delim, port_len, port);
-   }
-}
-
-static void sock_addr_print(const char *addr, char *delim, const char *port,
-   const char *ifname)
-{
-   sock_addr_print_width(addr_width, addr, delim, serv_width, port, ifname);
-}
-
 static const char *tmr_name[] = {
off,
on,
@@ -693,33 +656,6 @@ static const char *tmr_name[] = {
unknown
 };
 
-static const char *print_ms_timer(int timeout)
-{
-   static char buf[64];
-   int secs, msecs, minutes;
-   if (timeout < 0)
-   timeout = 0;
-   secs = timeout/1000;
-   minutes = secs/60;
-   secs = secs%60;
-   msecs = timeout%1000;
-   buf[0] = 0;
-   if (minutes) {
-   msecs = 0;
-   snprintf(buf, sizeof(buf)-16, "%dmin", minutes);
-   if (minutes > 9)
-   secs = 0;
-   }
-   if (secs) {
-   if (secs > 9)
-   msecs = 0;
-   sprintf(buf+strlen(buf), "%d%s", secs, msecs ? "." : "sec");
-   }
-   if (msecs)
-   sprintf(buf+strlen(buf), "%03dms", msecs);
-   return buf;
-}
-
 struct scache *rlist;
 
 static void init_service_resolver(void)
@@ -1482,122 +1418,6 @@ static int proc_inet_split_line(char *line, char **loc, char **rem, char **data)
return 0;
 }
 
-static char *sprint_bw(char *buf, double bw)
-{
-   if (bw  100.)
-   sprintf(buf,%.1fM, bw / 100.);
-   else if (bw  1000.)
-   sprintf(buf,%.1fK, bw / 1000.);
-   else
-   sprintf(buf, %g, bw);
-
-   return buf;
-}
-
-static void tcp_stats_print(struct tcpstat *s)
-{
-   char b1[64];
-
-   if (s->has_ts_opt)
-   printf(" ts");
-   if (s->has_sack_opt)
-   printf(" sack");
-   if (s->has_ecn_opt)
-   printf(" ecn");
-   if (s->has_ecnseen_opt)
-   printf(" ecnseen");
-   if (s->has_fastopen_opt)
-   printf(" fastopen");
-   if (s->cong_alg[0])
-   printf(" %s", s->cong_alg);
-   if (s->has_wscale_opt)
-   printf(" wscale:%d,%d", s->snd_wscale, s->rcv_wscale);
-   if (s->rto)
-   printf(" rto:%g", s->rto);
-   if (s->backoff)
-   printf(" backoff:%u", s->backoff);
-   if (s->rtt)
-   printf(" rtt:%g/%g", s->rtt, s->rttvar);
-   if (s->ato)
-   printf(" ato:%g", s->ato);
-
-   if (s->qack)
-   printf(" qack:%d", s->qack);
-   if (s->qack & 1)
-   printf(" bidir");
-
-   if (s->mss)
-   printf(" mss:%d", s->mss);
-   if (s->cwnd)
-   printf(" cwnd:%d", s->cwnd);
-   if (s->ssthresh)
-   printf(" ssthresh:%d", s->ssthresh);
-
-   if (s->bytes_acked)
-   printf(" bytes_acked:%llu", s->bytes_acked);
-   if (s->bytes_received)
-   printf(" bytes_received:%llu", s->bytes_received);
-   if (s->segs_out)
-   printf(" segs_out:%u", s->segs_out);
-   if (s->segs_in)
-   printf(" segs_in:%u", s->segs_in);
-
-   if (s->dctcp && s->dctcp->enabled) {
-   struct dctcpstat *dctcp = s->dctcp;
-
-   printf(" dctcp:(ce_state:%u,alpha:%u,ab_ecn:%u,ab_tot:%u)",
-   dctcp->ce_state, dctcp->alpha, dctcp->ab_ecn,
-   dctcp->ab_tot);
-   } else if (s->dctcp) {
-   printf(" dctcp:fallback_mode");
-   }
-
-   if (s->send_bps)
-   printf(" send %sbps", sprint_bw(b1, s->send_bps));
-   if (s->lastsnd)
-   printf(" lastsnd:%u", 

[PATCH 02/10] ss: created formatters for json and hr

2015-08-10 Thread Matthias Tafelmeier
This patch creates a central formatter module that acts as a switch:
from there, the handler module for the selected output format is called.
So far, handlers for human-readable and JSON output exist.

This prepares ss for further output formats in the future, and with this
mechanism in place such extensions should be straightforward.

To add a completely new output format, create a new handler module
modeled on its relatives (e.g. ss_json_fmt.c). Its functions then need to
be registered with the central output distributor, which is done by
adding the new handler module's fmt_op_hub to the fmt_op_hub array.
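The registration scheme above amounts to a table of function pointers indexed by the selected format. The sketch below assumes that shape; `FMT_HR`, the single `tcp_rtt_fmt` callback and the buffer-based signatures are illustrative assumptions, not the series' actual ss_out_fmt.h (the real handlers print directly with printf()):

```c
#include <stdio.h>
#include <string.h>

/* FMT_JSON appears in the series; FMT_HR is an assumed name. */
enum out_fmt_type { FMT_HR, FMT_JSON, FMT_MAX };

/* Assumed shape of a handler's operation table; a real one would carry
 * one callback per output subpart. */
struct fmt_op_hub {
	int (*tcp_rtt_fmt)(char *buf, size_t len, double rtt, double rttvar);
};

static int tcp_rtt_hr(char *buf, size_t len, double rtt, double rttvar)
{
	return snprintf(buf, len, " rtt:%g/%g", rtt, rttvar);
}

static int tcp_rtt_json(char *buf, size_t len, double rtt, double rttvar)
{
	return snprintf(buf, len, "\"rtt\": %g, \"rttvar\": %g", rtt, rttvar);
}

static const struct fmt_op_hub hr_op_hub = { .tcp_rtt_fmt = tcp_rtt_hr };
static const struct fmt_op_hub json_op_hub = { .tcp_rtt_fmt = tcp_rtt_json };

/* Central registry: a new output format adds one entry here. */
static const struct fmt_op_hub *const fmt_op_hub[FMT_MAX] = {
	[FMT_HR]   = &hr_op_hub,
	[FMT_JSON] = &json_op_hub,
};

/* Selected once at startup, e.g. by the -j option. */
static enum out_fmt_type fmt_type = FMT_HR;

/* The "switch": callers never name a concrete format. */
static int tcp_rtt_fmt(char *buf, size_t len, double rtt, double rttvar)
{
	if (fmt_type >= FMT_MAX || !fmt_op_hub[fmt_type])
		return -1;
	return fmt_op_hub[fmt_type]->tcp_rtt_fmt(buf, len, rtt, rttvar);
}
```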

Extending only the tcp_stats output should boil down to extending the
corresponding handler functions with the new predicate and its value. The
context of each output subpart matters: with JSON, for instance, you have
to ensure that the commas are placed correctly.

Further, as an interim solution, every tcp_stats extension is checked
with a static assertion to make sure it reaches all fmt handlers. This is
only interim: a central structure would be far better for
maintainability, and this method does not guarantee in a foolproof way
that every output format is extended correctly.
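One possible form of the "central structure" mentioned above is an X-macro: a single list of fields drives the struct definition and every handler, and a C11 `_Static_assert` fails the build if the struct drifts from the list. This is a sketch of that alternative technique, not the series' CHECK_FMT_ADAPT macro (whose definition is not shown here); `TCP_BOOL_OPTS` and `struct tcp_opts` are hypothetical:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical central list: one line per option (field, output label). */
#define TCP_BOOL_OPTS(X)        \
	X(has_ts_opt, "ts")     \
	X(has_sack_opt, "sack") \
	X(has_ecn_opt, "ecn")

/* The struct is generated from the list ... */
struct tcp_opts {
#define X(field, label) int field;
	TCP_BOOL_OPTS(X)
#undef X
};

/* ... and so is the option count. */
enum {
#define X(field, label) TCP_OPT_IDX_##field,
	TCP_BOOL_OPTS(X)
#undef X
	TCP_BOOL_OPT_COUNT
};

/* Breaks the build if a field is added to struct tcp_opts by hand
 * instead of through TCP_BOOL_OPTS. */
_Static_assert(sizeof(struct tcp_opts) == TCP_BOOL_OPT_COUNT * sizeof(int),
	       "struct tcp_opts is out of sync with TCP_BOOL_OPTS");

/* Both handlers walk the same list, so neither can forget an option.
 * len must be greater than zero. */
static void tcp_opts_hr(const struct tcp_opts *s, char *buf, size_t len)
{
	size_t off = 0;

	buf[0] = '\0';
#define X(field, label)                                      \
	if (s->field && off < len)                           \
		off += snprintf(buf + off, len - off, " %s", label);
	TCP_BOOL_OPTS(X)
#undef X
}

static void tcp_opts_json(const struct tcp_opts *s, char *buf, size_t len)
{
	size_t off = snprintf(buf, len, "{");
	int first = 1;

#define X(field, label)                                              \
	if (s->field && off < len) {                                 \
		off += snprintf(buf + off, len - off, "%s\"%s\": true", \
				first ? "" : ", ", label);           \
		first = 0;                                           \
	}
	TCP_BOOL_OPTS(X)
#undef X
	if (off < len)
		snprintf(buf + off, len - off, "}");
}
```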

Example of a tcp_stats output extension:

ss_json_fmt.c:

To add a new foo_param to the tcp_stats output (pseudocode):
[...]
if (s->has_ts_opt) {
printf(",\n%s\"ts\": \"true\"", indent1);
}
if (s->has_sack_opt) {
printf(",\n%s\"sack\": \"true\"", indent1);
}
if (s->has_ecn_opt) {
printf(",\n%s\"ecn\": \"true\"", indent1);
}
[...]

/* macro to statically ensure that no new tcp_stats info is forgotten
 * in any of the fmt handlers */
CHECK_FMT_ADAPT(s->new_foo_pred, s, error_msg_adapation_issue);

Signed-off-by: Matthias Tafelmeier matthias.tafelme...@gmx.net
Suggested-by: Hagen Paul Pfeifer ha...@jauu.net
---
 misc/Makefile  |   2 +-
 misc/ss_hr_fmt.c   | 258 
 misc/ss_hr_fmt.h   |   9 ++
 misc/ss_json_fmt.c | 373 +
 misc/ss_json_fmt.h |  24 
 misc/ss_out_fmt.c  | 127 ++
 misc/ss_out_fmt.h  |  82 
 7 files changed, 874 insertions(+), 1 deletion(-)
 create mode 100644 misc/ss_hr_fmt.c
 create mode 100644 misc/ss_hr_fmt.h
 create mode 100644 misc/ss_json_fmt.c
 create mode 100644 misc/ss_json_fmt.h
 create mode 100644 misc/ss_out_fmt.c
 create mode 100644 misc/ss_out_fmt.h

diff --git a/misc/Makefile b/misc/Makefile
index b7ecba9..fb67ead 100644
--- a/misc/Makefile
+++ b/misc/Makefile
@@ -1,4 +1,4 @@
-SSOBJ=ss.o ssfilter.o
+SSOBJ=ss.o ssfilter.o ss_hr_fmt.o ss_json_fmt.o ss_out_fmt.o
 LNSTATOBJ=lnstat.o lnstat_util.o
 
 TARGETS=ss nstat ifstat rtacct arpd lnstat
diff --git a/misc/ss_hr_fmt.c b/misc/ss_hr_fmt.c
new file mode 100644
index 000..6955ea5
--- /dev/null
+++ b/misc/ss_hr_fmt.c
@@ -0,0 +1,258 @@
+#include <linux/sock_diag.h>
+#include <linux/rtnetlink.h>
+#include "ss_out_fmt.h"
+#include "ss_types.h"
+#include "ss_hr_fmt.h"
+
+static void tcp_stats_hr_fmt(struct tcpstat *s)
+{
+   char b1[64];
+
+   if (s->has_ts_opt)
+   printf(" ts");
+   if (s->has_sack_opt)
+   printf(" sack");
+   if (s->has_ecn_opt)
+   printf(" ecn");
+   if (s->has_ecnseen_opt)
+   printf(" ecnseen");
+   if (s->has_fastopen_opt)
+   printf(" fastopen");
+   if (s->cong_alg)
+   printf(" %s", s->cong_alg);
+   if (s->has_wscale_opt)
+   printf(" wscale:%d,%d", s->snd_wscale, s->rcv_wscale);
+   if (s->rto)
+   printf(" rto:%g", s->rto);
+   if (s->backoff)
+   printf(" backoff:%u", s->backoff);
+   if (s->rtt)
+   printf(" rtt:%g/%g", s->rtt, s->rttvar);
+   if (s->ato)
+   printf(" ato:%g", s->ato);
+
+   if (s->qack)
+   printf(" qack:%d", s->qack);
+   if (s->qack & 1)
+   printf(" bidir");
+
+   if (s->mss)
+   printf(" mss:%d", s->mss);
+   if (s->cwnd)
+   printf(" cwnd:%d", s->cwnd);
+   if (s->ssthresh)
+   printf(" ssthresh:%d", s->ssthresh);
+
+   if (s->dctcp && s->dctcp->enabled) {
+   struct dctcpstat *dctcp = s->dctcp;
+
+   printf(" dctcp:(ce_state:%u,alpha:%u,ab_ecn:%u,ab_tot:%u)",
+   dctcp->ce_state, dctcp->alpha, dctcp->ab_ecn,
+   dctcp->ab_tot);
+   } else if (s->dctcp) {
+   printf(" dctcp:fallback_mode");
+   }
+
+   if (s->send_bps)
+   printf(" send %sbps", sprint_bw(b1, s->send_bps));
+   if (s->lastsnd)
+   printf(" lastsnd:%u", s->lastsnd);
+   if (s->lastrcv)
+   printf(" lastrcv:%u", s->lastrcv);
+   if (s->lastack)
+   printf(" lastack:%u", s->lastack);
+
+   if 

[PATCH 05/10] ss: framed skeleton for json output in ss

2015-08-10 Thread Matthias Tafelmeier
This patch adds the --json flag to ss. It also ensures that the stats
components (e.g. TCP, UDP, NETLINK) are properly bracketed.

Moreover, this patch prevents the human-readable headers from being
printed. A first-element flag ensures that the first element of every
JSON output container is treated specially, while all the others are
treated the same: only the first element omits the comma in front of
itself, and every following element prints one. This yields the comma
placement demanded by the JSON spec, as illustrated below:

PSEUDOCODE:

{          <- no comma before the first element
{first}
,
{sec}
,
{third}
...
}
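The first-element rule can be captured in a small emitter whose flag is set when a container is opened and cleared after the first element. A minimal sketch writing into a caller-supplied buffer; the `json_list` type and its functions are hypothetical, not ss's code:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical emitter state for one JSON container. */
struct json_list {
	char *buf;
	size_t len;
	size_t off;
	int first;   /* set until the first element has been written */
};

static void json_append(struct json_list *l, const char *s)
{
	/* Stops appending once the buffer is full (snprintf truncates). */
	if (l->off < l->len)
		l->off += snprintf(l->buf + l->off, l->len - l->off, "%s", s);
}

static void json_list_open(struct json_list *l, char *buf, size_t len)
{
	l->buf = buf;
	l->len = len;
	l->off = 0;
	l->first = 1;
	json_append(l, "[");
}

/* Every element except the first prints a comma ahead of itself. */
static void json_list_add(struct json_list *l, const char *elem)
{
	if (!l->first)
		json_append(l, ",");
	l->first = 0;
	json_append(l, elem);
}

static void json_list_close(struct json_list *l)
{
	json_append(l, "]");
}
```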

Signed-off-by: Matthias Tafelmeier matthias.tafelme...@gmx.net
Suggested-by: Hagen Paul Pfeifer ha...@jauu.net
---
 misc/ss.c | 198 --
 1 file changed, 155 insertions(+), 43 deletions(-)

diff --git a/misc/ss.c b/misc/ss.c
index 1b3ef90..8fb6e7d 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -34,6 +34,9 @@
 #include "libnetlink.h"
 #include "namespace.h"
 #include "SNAPSHOT.h"
+#include "ss_out_fmt.h"
+#include "ss_json_fmt.h"
+#include "ss_types.h"
 
 #include <linux/tcp.h>
 #include <linux/sock_diag.h>
@@ -101,6 +104,7 @@ int show_sock_ctx = 0;
 /* If show_users & show_proc_ctx only do user_ent_hash_build() once */
 int user_ent_hash_build_init = 0;
 int follow_events = 0;
+int json_output = 0;
 
 int netid_width;
 int state_width;
@@ -714,7 +718,6 @@ static int is_ephemeral(int port)
return (port >= ip_local_port_min && port<= ip_local_port_max);
 }
 
-
 static const char *__resolve_service(int port)
 {
struct scache *c;
@@ -3064,6 +3067,9 @@ static int print_summary(void)
 
printf("\n");
 
+   if (json_output && has_successor)
+   printf(",\n");
+
return 0;
 }
 
@@ -3090,6 +3096,7 @@ static void _usage(FILE *dest)
"   -z, --contexts      display process and socket SELinux security contexts\n"
"   -N, --net           switch to the specified network namespace name\n"
 "\n"
+"   -j, --json          format output in JSON\n"
"   -4, --ipv4          display only IP version 4 sockets\n"
"   -6, --ipv6          display only IP version 6 sockets\n"
"   -0, --packet        display PACKET sockets\n"
@@ -3189,6 +3196,7 @@ static const struct option long_opts[] = {
{ "help", 0, 0, 'h' },
{ "context", 0, 0, 'Z' },
{ "contexts", 0, 0, 'z' },
+   { "json", 0, 0, 'j' },
{ "net", 1, 0, 'N' },
{ 0 }
 
@@ -3204,7 +3212,7 @@ int main(int argc, char *argv[])
int ch;
int state_filter = 0;
 
-   while ((ch = getopt_long(argc, argv, "dhaletuwxnro460spbEf:miA:D:F:vVzZN:",
+   while ((ch = getopt_long(argc, argv, "dhaletuwxnro460spbEf:miA:D:F:vVzZN:j",
 long_opts, NULL)) != EOF) {
switch(ch) {
case 'n':
@@ -3383,6 +3391,10 @@ int main(int argc, char *argv[])
if (netns_switch(optarg))
exit(1);
break;
+   case 'j':
+   fmt_type = FMT_JSON;
+   json_output = 1;
+   break;
case 'h':
case '?':
help();
@@ -3464,11 +3476,33 @@ int main(int argc, char *argv[])
exit(-1);
}
}
+   printf("\"TCP\": [\n");
inet_show_netlink(&current_filter, dump_fp, IPPROTO_TCP);
+   res_json_fmt_branch(current_filter.dbs & (1<<NETLINK_DB) ||
+   current_filter.dbs & PACKET_DBM ||
+   current_filter.dbs & UNIX_DBM ||
+   current_filter.dbs & (1<<RAW_DB) ||
+   current_filter.dbs & (1<<UDP_DB) ||
+   current_filter.dbs & (1<<TCP_DB) ||
+   current_filter.dbs & (1<<DCCP_DB), ']');
fflush(dump_fp);
exit(0);
}
 
+   if (do_summary) {
+   print_summary(current_filter.dbs & PACKET_DBM ||
+   current_filter.dbs & UNIX_DBM ||
+   current_filter.dbs & (1<<RAW_DB) ||
+   current_filter.dbs & (1<<UDP_DB) ||
+   current_filter.dbs & (1<<TCP_DB) ||
+   current_filter.dbs & (1<<DCCP_DB));
+   if (do_default && argc == 0) {
+   if (json_output)
+   printf("}\n");
+   exit(0);
+   }
+   }
+
if (ssfilter_parse(&current_filter.f, argc, argv, filter_fp))
usage();
 
@@ -3490,62 +3524,140 @@ int main(int argc, char *argv[])
}
}
 
-   addrp_width = screen_width;
-   addrp_width -= netid_width+1;
-   addrp_width -= state_width+1;
-   addrp_width -= 14;
+   if 

[PATCH 01/10] ss: rooted out ss type declarations for output formatters

2015-08-10 Thread Matthias Tafelmeier
The planned output formatters and ss share type declarations such as
slabstat and tcpstat, so these declarations are now centralized in
ss_types.h. Future shared declarations should be placed there as well,
which should also help reduce the size of ss.c.
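The moved declarations are plain C types, so the new header only needs an include guard to be safely included from ss.c and from each formatter. A sketch using the DB enum exactly as removed from ss.c, followed by the bitmask macros that stay in ss.c; the guard name `SS_TYPES_H` is an assumption:

```c
/* ss_types.h (sketch): include guard name assumed. */
#ifndef SS_TYPES_H
#define SS_TYPES_H

enum {
	TCP_DB,
	DCCP_DB,
	UDP_DB,
	RAW_DB,
	UNIX_DG_DB,
	UNIX_ST_DB,
	UNIX_SQ_DB,
	PACKET_DG_DB,
	PACKET_R_DB,
	NETLINK_DB,
	MAX_DB
};

#endif /* SS_TYPES_H */

/* In ss.c: masks built from the shared enum. */
#define PACKET_DBM ((1<<PACKET_DG_DB)|(1<<PACKET_R_DB))
#define UNIX_DBM ((1<<UNIX_DG_DB)|(1<<UNIX_ST_DB)|(1<<UNIX_SQ_DB))
#define ALL_DB ((1<<MAX_DB)-1)
#define INET_DBM ((1<<TCP_DB)|(1<<UDP_DB)|(1<<DCCP_DB)|(1<<RAW_DB))
```

With MAX_DB at 10, ALL_DB is 0x3ff, and the INET, UNIX and PACKET masks do not overlap.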

Signed-off-by: Matthias Tafelmeier matthias.tafelme...@gmx.net
Suggested-by: Hagen Paul Pfeifer ha...@jauu.net
---
 misc/ss.c   | 186 +---
 misc/ss_types.h | 186 
 2 files changed, 187 insertions(+), 185 deletions(-)
 create mode 100644 misc/ss_types.h

diff --git a/misc/ss.c b/misc/ss.c
index f4c828c..3d31b81 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -27,6 +27,7 @@
 #include <getopt.h>
 #include <stdbool.h>
 
+#include "ss_types.h"
 #include "utils.h"
 #include "rt_names.h"
 #include "ll_map.h"
@@ -113,55 +114,17 @@ static const char *UDP_PROTO = "udp";
 static const char *RAW_PROTO = "raw";
 static const char *dg_proto = NULL;
 
-enum
-{
-   TCP_DB,
-   DCCP_DB,
-   UDP_DB,
-   RAW_DB,
-   UNIX_DG_DB,
-   UNIX_ST_DB,
-   UNIX_SQ_DB,
-   PACKET_DG_DB,
-   PACKET_R_DB,
-   NETLINK_DB,
-   MAX_DB
-};
 
 #define PACKET_DBM ((1<<PACKET_DG_DB)|(1<<PACKET_R_DB))
 #define UNIX_DBM ((1<<UNIX_DG_DB)|(1<<UNIX_ST_DB)|(1<<UNIX_SQ_DB))
 #define ALL_DB ((1<<MAX_DB)-1)
 #define INET_DBM ((1<<TCP_DB)|(1<<UDP_DB)|(1<<DCCP_DB)|(1<<RAW_DB))
 
-enum {
-   SS_UNKNOWN,
-   SS_ESTABLISHED,
-   SS_SYN_SENT,
-   SS_SYN_RECV,
-   SS_FIN_WAIT1,
-   SS_FIN_WAIT2,
-   SS_TIME_WAIT,
-   SS_CLOSE,
-   SS_CLOSE_WAIT,
-   SS_LAST_ACK,
-   SS_LISTEN,
-   SS_CLOSING,
-   SS_MAX
-};
-
 #define SS_ALL ((1 << SS_MAX) - 1)
 #define SS_CONN (SS_ALL & ~((1<<SS_LISTEN)|(1<<SS_CLOSE)|(1<<SS_TIME_WAIT)|(1<<SS_SYN_RECV)))
 
 #include "ssfilter.h"
 
-struct filter
-{
-   int dbs;
-   int states;
-   int families;
-   struct ssfilter *f;
-};
-
 static const struct filter default_dbs[MAX_DB] = {
[TCP_DB] = {
.states   = SS_CONN,
@@ -376,16 +339,6 @@ static FILE *ephemeral_ports_open(void)
return generic_proc_open("PROC_IP_LOCAL_PORT_RANGE", "sys/net/ipv4/ip_local_port_range");
 }
 
-struct user_ent {
-   struct user_ent *next;
-   unsigned int ino;
-   int pid;
-   int fd;
-   char *process;
-   char *process_ctx;
-   char *socket_ctx;
-};
-
 #define USER_ENT_HASH_SIZE 256
 struct user_ent *user_ent_hash[USER_ENT_HASH_SIZE];
 
@@ -538,12 +491,6 @@ static void user_ent_hash_build(void)
closedir(dir);
 }
 
-enum entry_types {
-   USERS,
-   PROC_CTX,
-   PROC_SOCK_CTX
-};
-
 #define ENTRY_BUF_SIZE 512
 static int find_entry(unsigned ino, char **buf, int type)
 {
@@ -616,17 +563,6 @@ next:
return cnt;
 }
 
-/* Get stats from slab */
-
-struct slabstat
-{
-   int socks;
-   int tcp_ports;
-   int tcp_tws;
-   int tcp_syns;
-   int skbs;
-};
-
 static struct slabstat slabstat;
 
 static const char *slabstat_ids[] =
@@ -711,75 +647,6 @@ static const char *sstate_namel[] = {
[SS_CLOSING] = closing,
 };
 
-struct sockstat
-{
-   struct sockstat *next;
-   unsigned int type;
-   uint16_t prot;
-   inet_prefix local;
-   inet_prefix remote;
-   int lport;
-   int rport;
-   int state;
-   int rq, wq;
-   unsigned ino;
-   unsigned uid;
-   int refcnt;
-   unsigned int iface;
-   unsigned long long sk;
-   char *name;
-   char *peer_name;
-};
-
-struct dctcpstat
-{
-   unsigned int ce_state;
-   unsigned int alpha;
-   unsigned int ab_ecn;
-   unsigned int ab_tot;
-   bool enabled;
-};
-
-struct tcpstat
-{
-   struct sockstat ss;
-   int timer;
-   int timeout;
-   int probes;
-   char cong_alg[16];
-   double rto, ato, rtt, rttvar;
-   int qack, cwnd, ssthresh, backoff;
-   double send_bps;
-   int snd_wscale;
-   int rcv_wscale;
-   int mss;
-   unsigned int lastsnd;
-   unsigned int lastrcv;
-   unsigned int lastack;
-   double pacing_rate;
-   double pacing_rate_max;
-   unsigned long long bytes_acked;
-   unsigned long long bytes_received;
-   unsigned int segs_out;
-   unsigned int segs_in;
-   unsigned int unacked;
-   unsigned int retrans;
-   unsigned int retrans_total;
-   unsigned int lost;
-   unsigned int sacked;
-   

[PATCH 10/10] ss: fixed free on local array for valid json output

2015-08-10 Thread Matthias Tafelmeier
Minor fix to enable JSON output: remove the free() of the automatic char
array name, whose storage is released anyway when the function's stack
frame is torn down. Likewise, remove the free() calls after
tcp_stats_fmt() on members of the automatic tcpstat struct instance.
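Only storage obtained from malloc()/calloc() may be handed to free(); an automatic array, or an array member inside an automatic struct, is released with the stack frame. Whether a pointer member such as dctcp still needs free() depends on how it was allocated, which this diff does not show. The hypothetical, trimmed struct below contrasts the two cases:

```c
#include <stdlib.h>
#include <string.h>

/* Trimmed, hypothetical stand-ins for ss's types. */
struct dctcpstat {
	unsigned int ce_state;
	unsigned int alpha;
};

struct tcpstat {
	char cong_alg[16];        /* array member: storage is inside the struct */
	struct dctcpstat *dctcp;  /* pointer member: may point to heap memory */
};

static int fill_stats(struct tcpstat *s, const char *alg, int with_dctcp)
{
	memset(s, 0, sizeof(*s));
	/* memset() above guarantees NUL termination after strncpy(). */
	strncpy(s->cong_alg, alg, sizeof(s->cong_alg) - 1);
	if (with_dctcp) {
		s->dctcp = calloc(1, sizeof(*s->dctcp));
		if (!s->dctcp)
			return -1;
	}
	return 0;
}

/* cong_alg is never passed to free(): it lives wherever the struct lives
 * (here, the caller's stack). Only the heap-allocated member is freed. */
static void release_stats(struct tcpstat *s)
{
	free(s->dctcp);   /* free(NULL) is a no-op */
	s->dctcp = NULL;
}
```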

Signed-off-by: Matthias Tafelmeier matthias.tafelme...@gmx.net
Suggested-by: Hagen Paul Pfeifer ha...@jauu.net
---
 misc/ss.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/misc/ss.c b/misc/ss.c
index 5eba08d..722253a 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -1664,10 +1664,6 @@ static void tcp_show_info(const struct nlmsghdr *nlh, struct inet_diag_msg *r,
s.segs_out = info->tcpi_segs_out;
s.segs_in = info->tcpi_segs_in;
tcp_stats_fmt(&s);
-   if (s.dctcp)
-   free(s.dctcp);
-   if (s.cong_alg)
-   free(s.cong_alg);
}
 }
 
@@ -2366,8 +2362,6 @@ if (show_mem) {
if (json_output)
printf("}\n");
 
-   if (name)
-   free(name);
return 0;
 }
 
-- 
1.9.1



Re: [PATCH 08/10] ss: symmetrical subhandler output extension example

2015-08-10 Thread Matthias Tafelmeier


 {} not needed. I guess you haven't run your patches thru 
 scripts/checkpatch.pl?
 
 
 Yes, although this is missing from iproute2 sources ;)
 
 

Thank you for reviewing so far.

I see that some parts of the patch for which I am responsible slipped
through without passing checkpatch.pl. I will send a V2 patch addressing
these as soon as possible.

Nevertheless, some parts of the patch are not mine, so please bear with
me: I only copied them over from the original version. That said, I am
quite prepared to correct them as well, to make up for the break in
history.






Re: [RFC PATCH net-next] tcp: reduce cpu usage under tcp memory pressure when SO_SNDBUF is set

2015-08-10 Thread Jason Baron
On 08/10/2015 10:47 AM, Eric Dumazet wrote:
 On Fri, 2015-08-07 at 18:31 +, Jason Baron wrote:
 From: Jason Baron jba...@akamai.com

 When SO_SNDBUF is set and we are under tcp memory pressure, the effective
 write buffer space can be much lower than what was set using SO_SNDBUF. For
 example, we may have set the buffer to 100kb, but we may only be able to
 write 10kb. In this scenario poll()/select()/epoll() are going to
 continuously return POLLOUT, followed by -EAGAIN from write() in a very
 tight loop.

 Introduce sk->sk_effective_sndbuf, such that we can track the 'effective'
 size of the sndbuf when we have a short write due to memory pressure. By
 using sk->sk_effective_sndbuf instead of sk->sk_sndbuf when we are under
 memory pressure, we can delay the POLLOUT until 1/3 of the buffer clears,
 as we normally do. There is no issue here when SO_SNDBUF is not set, since
 the tcp layer will auto tune the sk->sndbuf.

 In my testing, this brought a single thread's cpu usage down from 100% to
 1% while maintaining the same level of throughput when under memory
 pressure.

 
 I am not sure we need to grow socket for something that looks like a
 flag ?



So I added a new field because I needed to store the new 'effective'
sndbuf somewhere and then restore the original value that was set via
SO_SNDBUF. So it's really because of SO_SNDBUF. We could perhaps use the
fact that we are in memory pressure to signal wakeups differently, but
I'm not sure exactly how.


 Also you add a race in sk_stream_wspace() as sk_effective_sndbuf value
 can change under us.
 
 +   if (sk->sk_effective_sndbuf)
 +   return sk->sk_effective_sndbuf - sk->sk_wmem_queued;
 +
 
 
 
 

thanks. better?

--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -798,8 +798,10 @@ static inline int sk_stream_min_wspace(const struct sock *sk)

 static inline int sk_stream_wspace(const struct sock *sk)
 {
-   if (sk->sk_effective_sndbuf)
-   return sk->sk_effective_sndbuf - sk->sk_wmem_queued;
+   int effective_sndbuf = sk->sk_effective_sndbuf;
+
+   if (effective_sndbuf)
+   return effective_sndbuf - sk->sk_wmem_queued;

return sk->sk_sndbuf - sk->sk_wmem_queued;
 }
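The "1/3 of the buffer" wakeup threshold follows from how the stream write-space checks compare free space with queued data. A user-space model of the proposed sk_stream_wspace(), assuming the writeable test is `wspace >= wmem_queued / 2` as in sk_stream_min_wspace() (sk_stream_memory_free() is ignored); `struct sock_model` and both functions are hypothetical:

```c
#include <stdbool.h>

/* Hypothetical model of the fields involved; not struct sock. */
struct sock_model {
	int sk_sndbuf;            /* value set via SO_SNDBUF */
	int sk_effective_sndbuf;  /* 0 unless shrunk under memory pressure */
	int sk_wmem_queued;       /* bytes queued for transmit */
};

/* Mirrors the proposed sk_stream_wspace(): the effective size is read
 * once into a local so the test and the subtraction see the same value.
 * (In kernel code, READ_ONCE() is the usual way to force a single load.) */
static int model_wspace(const struct sock_model *sk)
{
	int effective_sndbuf = sk->sk_effective_sndbuf;

	if (effective_sndbuf)
		return effective_sndbuf - sk->sk_wmem_queued;
	return sk->sk_sndbuf - sk->sk_wmem_queued;
}

/* Assumed writeable test: free space at least half of what is queued,
 * i.e. about 1/3 of the (effective) buffer has drained. */
static bool model_writeable(const struct sock_model *sk)
{
	return model_wspace(sk) >= (sk->sk_wmem_queued >> 1);
}
```

With SO_SNDBUF at 100000 but only 10000 bytes accepted, the unmodified check reports 90000 bytes free and keeps signalling POLLOUT; once the effective size is recorded, the socket becomes writeable again only after the queue drains to roughly two thirds of 10000 bytes.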


Thanks,

-Jason


[PATCH] mkiss: Fix error handling in mkiss_open()

2015-08-10 Thread Fabio Estevam
If register_netdev() fails we are not propagating the error and
we return success because ax_open() succeeded previously.

Fix this by checking the return values of ax_open() and
register_netdev() and propagating the error in case of failure.
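The fix follows the usual kernel idiom: store each step's return value in err and unwind through goto labels in reverse order of setup, so the caller sees the first failure instead of 0. A self-contained sketch; the stub functions and flags are hypothetical stand-ins for ax_open(), register_netdev() and their cleanup, and the label order is assumed to match mkiss_open():

```c
#include <errno.h>

/* Hypothetical failure switches and resource flags for the stubs. */
static int fail_open, fail_register;
static int buffers_allocated, netdev_allocated;

static int stub_open(void)            /* stands in for ax_open() */
{
	if (fail_open)
		return -ENOMEM;
	buffers_allocated = 1;
	return 0;
}

static int stub_register(void)        /* stands in for register_netdev() */
{
	return fail_register ? -EEXIST : 0;
}

static int open_like_mkiss(void)
{
	int err;

	netdev_allocated = 1;

	err = stub_open();
	if (err)
		goto out_free_netdev;

	err = stub_register();
	if (err)
		goto out_free_buffers;

	return 0;

out_free_buffers:
	buffers_allocated = 0;
out_free_netdev:
	netdev_allocated = 0;
	return err;   /* propagate the failure instead of returning 0 */
}
```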

Reported-by: RUC_Soft_Sec zy900...@163.com
Signed-off-by: Fabio Estevam fabio.este...@freescale.com
---
 drivers/net/hamradio/mkiss.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/net/hamradio/mkiss.c b/drivers/net/hamradio/mkiss.c
index 2ffbf13..216bfd3 100644
--- a/drivers/net/hamradio/mkiss.c
+++ b/drivers/net/hamradio/mkiss.c
@@ -728,11 +728,12 @@ static int mkiss_open(struct tty_struct *tty)
dev->type = ARPHRD_AX25;
 
/* Perform the low-level AX25 initialization. */
-   if ((err = ax_open(ax->dev))) {
+   err = ax_open(ax->dev);
+   if (err)
goto out_free_netdev;
-   }
 
-   if (register_netdev(dev))
+   err = register_netdev(dev);
+   if (err)
goto out_free_buffers;
 
/* after register_netdev() - because else printk smashes the kernel */
-- 
1.9.1



Re: [PATCH v2] openvswitch: Fix L4 checksum handling when dealing with IP fragments

2015-08-10 Thread David Miller
From: Glenn Griffin ggriffin.ker...@gmail.com
Date: Mon, 10 Aug 2015 10:43:16 -0700

 On Mon, Aug 03, 2015 at 02:03:28PM -0700, David Miller wrote:
 From: Glenn Griffin ggriffin.ker...@gmail.com
 Date: Mon, 3 Aug 2015 09:56:54 -0700
 
  openvswitch modifies the L4 checksum of a packet when modifying
  the ip address. When an IP packet is fragmented only the first
  fragment contains an L4 header and checksum. Prior to this change
  openvswitch would modify all fragments, modifying application data
  in non-first fragments, causing checksum failures in the
  reassembled packet.
  
  Signed-off-by: Glenn Griffin ggriffin.ker...@gmail.com
  ---
  Changes in v2:
- Compare frag_off in network byte order rather than host byte order
 
 Applied and queued up for -stable.
 
 I noticed this change didn't seem to make it into 4.2-rc6. I'm not too
 familiar with the release schedule so wasn't sure if that was expected
 or an oversight. Will this remain queued up until the 4.3 merge window
 opens?

It's in my 'net' tree and will be pushed to Linus's tree at a time that I
deem appropriate.  Usually I try to push to Linus once every week or so,
in order for changes to soak and get tested in my tree before they get
pushed to his.


[PATCH net-next 8/9] net: Use passed in table for nexthop lookups

2015-08-10 Thread David Ahern
If a user passes in a table for new routes, use that table for nexthop
lookups. Specifically, this solves the case where a connected route
exists only in another table, not in the main table, and a subsequent
route is added with a next hop that uses the connected route, i.e.:

$ ip route ls
default via 10.0.2.2 dev eth0
10.0.2.0/24 dev eth0  proto kernel  scope link  src 10.0.2.15
169.254.0.0/16 dev eth0  scope link  metric 1003
192.168.56.0/24 dev eth1  proto kernel  scope link  src 192.168.56.51

$ ip route ls table 10
1.1.1.0/24 dev eth2  scope link

Without this patch, adding a nexthop route fails:

$ ip route add table 10 2.2.2.0/24 via 1.1.1.10
RTNETLINK answers: Network is unreachable

With this patch the route is added successfully.
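The table selection added to fib_check_nh() can be modelled in user space: use the table named by the new route when it exists, otherwise fall back to the default lookup. A sketch with hypothetical types; the fallback consults only the main table (id 254, RT_TABLE_MAIN), a simplification of fib_lookup(), which goes through the policy rules, and only connected prefixes are matched:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model, not the kernel's fib_table API. */
struct route {
	uint32_t prefix;  /* host byte order */
	int plen;
};

struct table {
	int id;
	const struct route *routes;
	size_t n;
};

static uint32_t ip4(unsigned a, unsigned b, unsigned c, unsigned d)
{
	return (uint32_t)a << 24 | (uint32_t)b << 16 | (uint32_t)c << 8 | (uint32_t)d;
}

/* Succeeds if the gateway falls inside any connected prefix. */
static int table_lookup(const struct table *t, uint32_t daddr)
{
	for (size_t i = 0; i < t->n; i++) {
		int plen = t->routes[i].plen;
		uint32_t mask = plen ? UINT32_MAX << (32 - plen) : 0;

		if ((daddr & mask) == (t->routes[i].prefix & mask))
			return 0;
	}
	return -ENETUNREACH;
}

static const struct table *get_table(const struct table *tbls, size_t n, int id)
{
	for (size_t i = 0; i < n; i++)
		if (tbls[i].id == id)
			return &tbls[i];
	return NULL;
}

/* Mirrors the patch: prefer the route's own table if it exists;
 * otherwise fall back to the default (here: main) lookup. */
static int check_nh(const struct table *tbls, size_t n, int cfg_table,
		    int main_id, uint32_t gw)
{
	const struct table *tbl = NULL;

	if (cfg_table)
		tbl = get_table(tbls, n, cfg_table);
	if (!tbl)
		tbl = get_table(tbls, n, main_id);
	return tbl ? table_lookup(tbl, gw) : -ENETUNREACH;
}
```

With the routes from the example above, gateway 1.1.1.10 fails with -ENETUNREACH against the main table but resolves when table 10 is requested.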

Signed-off-by: David Ahern d...@cumulusnetworks.com
---
 net/ipv4/fib_semantics.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index 85e9a8abf15c..b7f1d20a9615 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -691,6 +691,7 @@ static int fib_check_nh(struct fib_config *cfg, struct fib_info *fi,
}
rcu_read_lock();
{
+   struct fib_table *tbl = NULL;
struct flowi4 fl4 = {
.daddr = nh->nh_gw,
.flowi4_scope = cfg->fc_scope + 1,
@@ -701,8 +702,16 @@ static int fib_check_nh(struct fib_config *cfg, struct fib_info *fi,
/* It is not necessary, but requires a bit of thinking */
if (fl4.flowi4_scope < RT_SCOPE_LINK)
fl4.flowi4_scope = RT_SCOPE_LINK;
-   err = fib_lookup(net, &fl4, &res,
-FIB_LOOKUP_IGNORE_LINKSTATE);
+
+   if (cfg->fc_table)
+   tbl = fib_get_table(net, cfg->fc_table);
+
+   if (tbl)
+   err = fib_table_lookup(tbl, &fl4, &res,
+  FIB_LOOKUP_IGNORE_LINKSTATE);
+   else
+   err = fib_lookup(net, &fl4, &res,
+FIB_LOOKUP_IGNORE_LINKSTATE);
if (err) {
rcu_read_unlock();
return err;
-- 
2.3.2 (Apple Git-55)



[PATCH] iproute2: Add support for VRF device

2015-08-10 Thread David Ahern
Allow user to create a vrf device and specify its table binding.
Based on the iplink_vlan implementation.
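vrf_parse_opt() below converts the table id with atoi(), which maps non-numeric input such as "abc" to 0, and because table is a __u32 the `table < 0` test can never be true (negative input is still rejected, but only because it wraps to a value above 32767). A stricter parser could look like the sketch below; `parse_vrf_table()` is hypothetical, and iproute2's own get_u32() helper from utils.h would likely serve the same purpose:

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: accept only a plain decimal table id in the
 * 0-32767 range advertised by the patch's error message. */
static int parse_vrf_table(const char *arg, unsigned int *table)
{
	char *end;
	unsigned long val;

	/* Require a leading digit: rejects "", "-1", "+1" and leading blanks. */
	if (!arg || *arg < '0' || *arg > '9')
		return -1;

	errno = 0;
	val = strtoul(arg, &end, 10);
	if (errno != 0 || *end != '\0' || val > 32767)
		return -1;

	*table = (unsigned int)val;
	return 0;
}
```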

Signed-off-by: Shrijeet Mukherjee s...@cumulusnetworks.com
Signed-off-by: David Ahern d...@cumulusnetworks.com
---
 include/linux/if_link.h |  8 +
 ip/Makefile |  2 +-
 ip/iplink.c |  2 +-
 ip/iplink_vrf.c | 85 +
 4 files changed, 95 insertions(+), 2 deletions(-)
 create mode 100644 ip/iplink_vrf.c

diff --git a/include/linux/if_link.h b/include/linux/if_link.h
index b905cf7f4948..74dedf4320b8 100644
--- a/include/linux/if_link.h
+++ b/include/linux/if_link.h
@@ -338,6 +338,14 @@ enum macvlan_macaddr_mode {
 
 #define MACVLAN_FLAG_NOPROMISC 1
 
+/* VRF section */
+enum {
+   IFLA_VRF_UNSPEC,
+   IFLA_VRF_TABLE,
+   __IFLA_VRF_MAX
+};
+
+#define IFLA_VRF_MAX (__IFLA_VRF_MAX - 1)
 /* IPVLAN section */
 enum {
IFLA_IPVLAN_UNSPEC,
diff --git a/ip/Makefile b/ip/Makefile
index 77653ecc5785..d8b38ac2e44b 100644
--- a/ip/Makefile
+++ b/ip/Makefile
@@ -7,7 +7,7 @@ IPOBJ=ip.o ipaddress.o ipaddrlabel.o iproute.o iprule.o ipnetns.o \
 iplink_vxlan.o tcp_metrics.o iplink_ipoib.o ipnetconf.o link_ip6tnl.o \
 link_iptnl.o link_gre6.o iplink_bond.o iplink_bond_slave.o iplink_hsr.o \
 iplink_bridge.o iplink_bridge_slave.o ipfou.o iplink_ipvlan.o \
-iplink_geneve.o
+iplink_geneve.o iplink_vrf.o
 
 RTMONOBJ=rtmon.o
 
diff --git a/ip/iplink.c b/ip/iplink.c
index 369d50eab94e..14bf7211a447 100644
--- a/ip/iplink.c
+++ b/ip/iplink.c
@@ -94,7 +94,7 @@ void iplink_usage(void)
fprintf(stderr, "TYPE := { vlan | veth | vcan | dummy | ifb | macvlan | macvtap |\n");
fprintf(stderr, "          bridge | bond | ipoib | ip6tnl | ipip | sit | vxlan |\n");
fprintf(stderr, "          gre | gretap | ip6gre | ip6gretap | vti | nlmon |\n");
-   fprintf(stderr, "          bond_slave | ipvlan | geneve }\n");
+   fprintf(stderr, "          bond_slave | ipvlan | geneve | vrf }\n");
}
exit(-1);
 }
diff --git a/ip/iplink_vrf.c b/ip/iplink_vrf.c
new file mode 100644
index ..0d7e21c7c152
--- /dev/null
+++ b/ip/iplink_vrf.c
@@ -0,0 +1,85 @@
+/* iplink_vrf.c     VRF device support
+ *
+ *  This program is free software; you can redistribute it and/or
+ *  modify it under the terms of the GNU General Public License
+ *  as published by the Free Software Foundation; either version
+ *  2 of the License, or (at your option) any later version.
+ *
+ * Authors: Shrijeet Mukherjee s...@cumulusnetworks.com
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/socket.h>
+#include <linux/if_link.h>
+
+#include "rt_names.h"
+#include "utils.h"
+#include "ip_common.h"
+
+static void vrf_explain(FILE *f)
+{
+   fprintf(f, "Usage: ... vrf table TABLEID \n");
+}
+
+static void explain(void)
+{
+   vrf_explain(stderr);
+}
+
+static int table_arg(void)
+{
+   fprintf(stderr,"Error: argument of \"table\" must be 0-32767 and currently unused\n");
+   return -1;
+}
+
+static int vrf_parse_opt(struct link_util *lu, int argc, char **argv,
+   struct nlmsghdr *n)
+{
+   while (argc > 0) {
+   if (matches(*argv, "table") == 0) {
+   __u32 table = 0;
+   NEXT_ARG();
+
+   table = atoi(*argv);
+   if (table < 0 || table > 32767)
+   return table_arg();
+   addattr32(n, 1024, IFLA_VRF_TABLE, table);
+   } else if (matches(*argv, "help") == 0) {
+   explain();
+   return -1;
+   } else {
+   fprintf(stderr, "vrf: unknown option \"%s\"?\n",
+   *argv);
+   explain();
+   return -1;
+   }
+   argc--, argv++;
+   }
+
+   return 0;
+}
+
+static void vrf_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
+{
+   if (!tb)
+   return;
+
+   if (tb[IFLA_VRF_TABLE])
+   fprintf(f, "table %u ", rta_getattr_u32(tb[IFLA_VRF_TABLE]));
+}
+
+static void vrf_print_help(struct link_util *lu, int argc, char **argv,
+ FILE *f)
+{
+   vrf_explain(f);
+}
+
+struct link_util vrf_link_util = {
+   .id         = "vrf",
+   .maxattr    = IFLA_VRF_MAX,
+   .parse_opt  = vrf_parse_opt,
+   .print_opt  = vrf_print_opt,
+   .print_help = vrf_print_help,
+};
-- 
2.3.2 (Apple Git-55)

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

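In vrf_parse_opt() from the iplink_vrf.c patch above, `table` is a __u32, so the `table < 0` half of the range check can never be true (a negative atoi() result wraps to a large value and is caught by the upper bound instead), and atoi() silently accepts input such as "12abc". A minimal sketch of stricter parsing with libc's strtoul(), assuming the 0-32767 range stated in the error message; parse_vrf_table() and VRF_TABLE_MAX are hypothetical names, not iproute2 API, and the "currently unused" part of the check is not modelled:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Upper bound taken from the patch's error message. */
#define VRF_TABLE_MAX 32767ul

/* Hypothetical helper: parse a VRF table id from @arg.
 * Returns 0 and stores the value in *out on success, -1 on error. */
static int parse_vrf_table(const char *arg, uint32_t *out)
{
	char *end;
	unsigned long val;

	assert(arg != NULL && out != NULL);

	errno = 0;
	val = strtoul(arg, &end, 10);
	/* end == arg: no digits at all (this also rejects "");
	 * *end != '\0': trailing garbage such as "12abc".
	 * A negative number such as "-1" is negated by strtoul() into a
	 * huge unsigned value, which the upper-bound check rejects. */
	if (errno != 0 || end == arg || *end != '\0' || val > VRF_TABLE_MAX)
		return -1;

	*out = (uint32_t)val;
	return 0;
}
```

iproute2's utils.c already provides a get_u32() helper that could likely serve the same purpose inside the tool itself.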

[PATCH net-next 6/9] net: Fix up inet_addr_type checks

2015-08-10 Thread David Ahern
Currently inet_addr_type and inet_dev_addr_type expect local addresses
to be in the local table. With the VRF device, local routes for devices
associated with a VRF will be in the table associated with the VRF.
Provide an alternate inet_addr lookup to use a specific table rather
than defaulting to the local table.

inet_addr_type_dev_table keeps the same semantics as inet_addr_type, but
if the passed-in device is enslaved to a VRF, then the table for that VRF
is used for the lookup.

Signed-off-by: David Ahern <d...@cumulusnetworks.com>
---
 include/net/route.h  |  3 +++
 net/ipv4/af_inet.c   | 13 -
 net/ipv4/arp.c   | 15 +--
 net/ipv4/fib_frontend.c  | 28 +---
 net/ipv4/fib_semantics.c |  6 --
 net/ipv4/icmp.c  |  5 +++--
 6 files changed, 56 insertions(+), 14 deletions(-)

diff --git a/include/net/route.h b/include/net/route.h
index 6ba681f0b98d..6dda2c1bf8c6 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -192,6 +192,9 @@ unsigned int inet_addr_type(struct net *net, __be32 addr);
 unsigned int inet_addr_type_table(struct net *net, __be32 addr, int tb_id);
 unsigned int inet_dev_addr_type(struct net *net, const struct net_device *dev,
__be32 addr);
+unsigned int inet_addr_type_dev_table(struct net *net,
+ const struct net_device *dev,
+ __be32 addr);
 void ip_rt_multicast_event(struct in_device *);
 int ip_rt_ioctl(struct net *, unsigned int cmd, void __user *arg);
 void ip_rt_get_source(u8 *src, struct sk_buff *skb, struct rtable *rt);
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index cc4e498a0ccf..96fba4f63454 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -119,6 +119,7 @@
 #ifdef CONFIG_IP_MROUTE
#include <linux/mroute.h>
#endif
+#include <net/vrf.h>
 
 
 /* The inetsw table contains everything that inet_create needs to
@@ -427,6 +428,7 @@ int inet_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
struct net *net = sock_net(sk);
unsigned short snum;
int chk_addr_ret;
+   int tb_id = 0;
int err;
 
/* If the socket has its own bind function then use it. (RAW) */
@@ -448,7 +450,16 @@ int inet_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
goto out;
}
 
-   chk_addr_ret = inet_addr_type(net, addr->sin_addr.s_addr);
+   if (sk->sk_bound_dev_if) {
+   struct net_device *dev;
+
+   rcu_read_lock();
+   dev = dev_get_by_index_rcu(net, sk->sk_bound_dev_if);
+   if (dev)
+   tb_id = vrf_dev_table_rcu(dev);
+   rcu_read_unlock();
+   }
+   chk_addr_ret = inet_addr_type_table(net, addr->sin_addr.s_addr, tb_id);
 
/* Not specified by any standard per-se, however it breaks too
 * many applications when removed.  It is unfortunate since
diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index 34a308573f4b..30409b75e925 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -233,7 +233,7 @@ static int arp_constructor(struct neighbour *neigh)
return -EINVAL;
}
 
-   neigh->type = inet_addr_type(dev_net(dev), addr);
+   neigh->type = inet_addr_type_dev_table(dev_net(dev), dev, addr);
 
parms = in_dev-arp_parms;
__neigh_parms_put(neigh-parms);
@@ -343,7 +343,7 @@ static void arp_solicit(struct neighbour *neigh, struct sk_buff *skb)
switch (IN_DEV_ARP_ANNOUNCE(in_dev)) {
default:
case 0: /* By default announce any local IP */
-   if (skb && inet_addr_type(dev_net(dev),
+   if (skb && inet_addr_type_dev_table(dev_net(dev), dev,
  ip_hdr(skb)->saddr) == RTN_LOCAL)
saddr = ip_hdr(skb)->saddr;
break;
@@ -351,7 +351,8 @@ static void arp_solicit(struct neighbour *neigh, struct sk_buff *skb)
if (!skb)
break;
saddr = ip_hdr(skb)->saddr;
-   if (inet_addr_type(dev_net(dev), saddr) == RTN_LOCAL) {
+   if (inet_addr_type_dev_table(dev_net(dev), dev,
+saddr) == RTN_LOCAL) {
/* saddr should be known to target */
if (inet_addr_onlink(in_dev, target, saddr))
break;
@@ -751,7 +752,7 @@ static int arp_process(struct sock *sk, struct sk_buff *skb)
/* Special case: IPv4 duplicate address detection packet (RFC2131) */
if (sip == 0) {
if (arp->ar_op == htons(ARPOP_REQUEST) &&
-   inet_addr_type(net, tip) == RTN_LOCAL &&
+   inet_addr_type_dev_table(net, dev, tip) == RTN_LOCAL &&
!arp_ignore(in_dev, sip, tip))
arp_send(ARPOP_REPLY, ETH_P_ARP, sip, dev, tip, 

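The table choice that inet_addr_type_dev_table is described as making (the VRF's table when the device is enslaved, otherwise the local table) can be sketched in isolation. This is a userspace toy, not kernel code: struct toy_dev and struct toy_vrf_info are hypothetical stand-ins for struct net_device and struct net_vrf_dev, and RCU is ignored:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RT_TABLE_LOCAL is 255 in <linux/rtnetlink.h>. */
#define TOY_RT_TABLE_LOCAL 255u

/* Hypothetical stand-ins for struct net_device and struct net_vrf_dev. */
struct toy_vrf_info { int master_ifindex; uint32_t tb_id; };
struct toy_dev { int ifindex; const struct toy_vrf_info *vrf; };

/* Like vrf_dev_table(): 0 when the device is absent or not enslaved. */
static uint32_t toy_vrf_dev_table(const struct toy_dev *dev)
{
	return (dev && dev->vrf) ? dev->vrf->tb_id : 0;
}

/* Table an address-type check for @dev should consult: the VRF's table
 * when @dev is enslaved, otherwise the local table. This is the
 * "vrf_dev_table(dev) ? : RT_TABLE_LOCAL" idiom written without GNU ?:. */
static uint32_t toy_addr_type_table_for(const struct toy_dev *dev)
{
	uint32_t tb_id = toy_vrf_dev_table(dev);

	return tb_id ? tb_id : TOY_RT_TABLE_LOCAL;
}
```

Callers such as arp_constructor() then only pass the device along; the table decision stays in one helper.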
[PATCH net-next 7/9] net: Add routes to the table associated with the device

2015-08-10 Thread David Ahern
When a device associated with a VRF is brought up or down, routes
should be added to/removed from the table associated with the VRF.
fib_magic defaults to using the main or local tables. Have it use
the table associated with the device if there is one.

A part of this is directing prefsrc validations to the correct
table as well.

Signed-off-by: David Ahern <d...@cumulusnetworks.com>
---
 net/ipv4/fib_frontend.c  |  8 
 net/ipv4/fib_semantics.c | 25 +++--
 2 files changed, 23 insertions(+), 10 deletions(-)

diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index d84ae0e30369..0a50a08ab844 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -803,6 +803,7 @@ static int inet_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
 static void fib_magic(int cmd, int type, __be32 dst, int dst_len, struct in_ifaddr *ifa)
 {
struct net *net = dev_net(ifa->ifa_dev->dev);
+   int tb_id = vrf_dev_table_rtnl(ifa->ifa_dev->dev);
struct fib_table *tb;
struct fib_config cfg = {
.fc_protocol = RTPROT_KERNEL,
@@ -817,11 +818,10 @@ static void fib_magic(int cmd, int type, __be32 dst, int dst_len, struct in_ifad
},
};
 
-   if (type == RTN_UNICAST)
-   tb = fib_new_table(net, RT_TABLE_MAIN);
-   else
-   tb = fib_new_table(net, RT_TABLE_LOCAL);
+   if (!tb_id)
+   tb_id = (type == RTN_UNICAST) ? RT_TABLE_MAIN : RT_TABLE_LOCAL;
 
+   tb = fib_new_table(net, tb_id);
if (!tb)
return;
 
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index 410ddb67221e..85e9a8abf15c 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -838,6 +838,23 @@ __be32 fib_info_update_nh_saddr(struct net *net, struct fib_nh *nh)
return nh->nh_saddr;
 }
 
+static bool fib_valid_prefsrc(struct fib_config *cfg, __be32 fib_prefsrc)
+{
+   if (cfg->fc_type != RTN_LOCAL || !cfg->fc_dst ||
+   fib_prefsrc != cfg->fc_dst) {
+   int tb_id = cfg->fc_table;
+
+   if (tb_id == RT_TABLE_MAIN)
+   tb_id = RT_TABLE_LOCAL;
+
+   if (inet_addr_type_table(cfg->fc_nlinfo.nl_net,
+fib_prefsrc, tb_id) != RTN_LOCAL) {
+   return false;
+   }
+   }
+   return true;
+}
+
 struct fib_info *fib_create_info(struct fib_config *cfg)
 {
int err;
@@ -1033,12 +1050,8 @@ struct fib_info *fib_create_info(struct fib_config *cfg)
fi->fib_flags |= RTNH_F_LINKDOWN;
}
 
-   if (fi->fib_prefsrc) {
-   if (cfg->fc_type != RTN_LOCAL || !cfg->fc_dst ||
-   fi->fib_prefsrc != cfg->fc_dst)
-   if (inet_addr_type(net, fi->fib_prefsrc) != RTN_LOCAL)
-   goto err_inval;
-   }
+   if (fi->fib_prefsrc && !fib_valid_prefsrc(cfg, fi->fib_prefsrc))
+   goto err_inval;
 
change_nexthops(fi) {
fib_info_update_nh_saddr(net, nexthop_nh);
-- 
2.3.2 (Apple Git-55)


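The two table decisions in this patch, where fib_magic() installs a route and which table fib_valid_prefsrc() consults, can be sketched as pure functions. This is a toy under stated assumptions: the constants copy the <linux/rtnetlink.h> values, a VRF table id of 0 means "not in a VRF" as with vrf_dev_table_rtnl(), and the function names are hypothetical:

```c
#include <assert.h>

/* Values mirror <linux/rtnetlink.h>. */
enum { TOY_RTN_UNICAST = 1, TOY_RTN_LOCAL = 2 };
enum { TOY_RT_TABLE_MAIN = 254, TOY_RT_TABLE_LOCAL = 255 };

/* fib_magic(): prefer the device's VRF table (vrf_tb_id != 0); otherwise
 * unicast (prefix) routes go to main and everything else to local. */
static int toy_fib_magic_table(int vrf_tb_id, int type)
{
	if (vrf_tb_id)
		return vrf_tb_id;
	return type == TOY_RTN_UNICAST ? TOY_RT_TABLE_MAIN : TOY_RT_TABLE_LOCAL;
}

/* fib_valid_prefsrc(): a prefsrc on a route in the main table is checked
 * against the local table, where local addresses live; a route in any
 * other table (e.g. a VRF table) is checked against that same table. */
static int toy_prefsrc_check_table(int fc_table)
{
	return fc_table == TOY_RT_TABLE_MAIN ? TOY_RT_TABLE_LOCAL : fc_table;
}
```

The MAIN-to-LOCAL mapping keeps the old behaviour for non-VRF routes, while a VRF table holds both its prefix and local routes, so it is checked against itself.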

[PATCH net-next 5/9] net: Add inet_addr lookup by table

2015-08-10 Thread David Ahern
Currently inet_addr_type and inet_dev_addr_type expect local addresses
to be in the local table. With the VRF device, local routes for devices
associated with a VRF will be in the table associated with the VRF.
Provide an alternate inet_addr lookup to use a specific table rather
than defaulting to the local table.

Signed-off-by: Shrijeet Mukherjee <s...@cumulusnetworks.com>
Signed-off-by: David Ahern <d...@cumulusnetworks.com>
---
 include/net/route.h |  1 +
 net/ipv4/fib_frontend.c | 22 +++---
 2 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/include/net/route.h b/include/net/route.h
index 94189d4bd899..6ba681f0b98d 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -189,6 +189,7 @@ void ipv4_sk_redirect(struct sk_buff *skb, struct sock *sk);
 void ip_rt_send_redirect(struct sk_buff *skb);
 
 unsigned int inet_addr_type(struct net *net, __be32 addr);
+unsigned int inet_addr_type_table(struct net *net, __be32 addr, int tb_id);
 unsigned int inet_dev_addr_type(struct net *net, const struct net_device *dev,
__be32 addr);
 void ip_rt_multicast_event(struct in_device *);
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index d8ced1d89f1b..b11321a8e58d 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -212,12 +212,12 @@ void fib_flush_external(struct net *net)
  */
 static inline unsigned int __inet_dev_addr_type(struct net *net,
const struct net_device *dev,
-   __be32 addr)
+   __be32 addr, int tb_id)
 {
struct flowi4   fl4 = { .daddr = addr };
struct fib_result   res;
unsigned int ret = RTN_BROADCAST;
-   struct fib_table *local_table;
+   struct fib_table *table;
 
if (ipv4_is_zeronet(addr) || ipv4_is_lbcast(addr))
return RTN_BROADCAST;
@@ -226,10 +226,10 @@ static inline unsigned int __inet_dev_addr_type(struct net *net,
 
rcu_read_lock();
 
-   local_table = fib_get_table(net, RT_TABLE_LOCAL);
-   if (local_table) {
+   table = fib_get_table(net, tb_id);
+   if (table) {
ret = RTN_UNICAST;
-   if (!fib_table_lookup(local_table, &fl4, &res, FIB_LOOKUP_NOREF)) {
+   if (!fib_table_lookup(table, &fl4, &res, FIB_LOOKUP_NOREF)) {
if (!dev || dev == res.fi->fib_dev)
ret = res.type;
}
@@ -239,16 +239,24 @@ static inline unsigned int __inet_dev_addr_type(struct net *net,
return ret;
 }
 
+unsigned int inet_addr_type_table(struct net *net, __be32 addr, int tb_id)
+{
+   return __inet_dev_addr_type(net, NULL, addr, tb_id);
+}
+EXPORT_SYMBOL(inet_addr_type_table);
+
 unsigned int inet_addr_type(struct net *net, __be32 addr)
 {
-   return __inet_dev_addr_type(net, NULL, addr);
+   return __inet_dev_addr_type(net, NULL, addr, RT_TABLE_LOCAL);
 }
 EXPORT_SYMBOL(inet_addr_type);
 
 unsigned int inet_dev_addr_type(struct net *net, const struct net_device *dev,
__be32 addr)
 {
-   return __inet_dev_addr_type(net, dev, addr);
+   int rt_table = vrf_dev_table(dev) ? : RT_TABLE_LOCAL;
+
+   return __inet_dev_addr_type(net, dev, addr, rt_table);
 }
 EXPORT_SYMBOL(inet_dev_addr_type);
 
-- 
2.3.2 (Apple Git-55)


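The control flow of __inet_dev_addr_type() after this patch can be mirrored in a small userspace model: broadcast-style addresses short-circuit, a missing table yields RTN_BROADCAST, an existing table defaults to RTN_UNICAST, and a match returns the route's type only if the device matches. Everything here is a hypothetical toy: addresses are in host byte order, tables hold exact /32 matches instead of doing a longest-prefix lookup, and checks not visible in the hunk are omitted:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Route type values mirror <linux/rtnetlink.h>. */
enum { TOY_RTN_UNICAST = 1, TOY_RTN_LOCAL = 2, TOY_RTN_BROADCAST = 3 };

/* Hypothetical toy FIB: host routes only, exact-match lookup. */
struct toy_route { uint32_t addr; int type; int ifindex; };
struct toy_table { int tb_id; const struct toy_route *routes; size_t n; };

static const struct toy_table *toy_get_table(const struct toy_table *tables,
					     size_t ntables, int tb_id)
{
	for (size_t i = 0; i < ntables; i++)
		if (tables[i].tb_id == tb_id)
			return &tables[i];
	return NULL;
}

/* dev_ifindex == 0 plays the role of dev == NULL ("any device"). */
static int toy_dev_addr_type(const struct toy_table *tables, size_t ntables,
			     int dev_ifindex, uint32_t addr, int tb_id)
{
	const struct toy_table *table;
	int ret = TOY_RTN_BROADCAST;

	assert(tables != NULL || ntables == 0);

	/* 0.0.0.0/8 and 255.255.255.255, like ipv4_is_zeronet()/_lbcast() */
	if ((addr >> 24) == 0 || addr == 0xffffffffu)
		return TOY_RTN_BROADCAST;

	table = toy_get_table(tables, ntables, tb_id);
	if (table) {
		ret = TOY_RTN_UNICAST;	/* table exists, but no match yet */
		for (size_t i = 0; i < table->n; i++) {
			const struct toy_route *r = &table->routes[i];

			if (r->addr != addr)
				continue;
			if (!dev_ifindex || dev_ifindex == r->ifindex)
				ret = r->type;
			break;
		}
	}
	return ret;	/* still BROADCAST if the table does not exist */
}
```

Passing a VRF table id finds the address as local, while the same address looked up in the local table reports plain unicast, which is the mismatch the patch addresses.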

Re: [PATCH net] bna: fix interrupts storm caused by erroneous packets

2015-08-10 Thread David Miller
From: Ivan Vecera <ivec...@redhat.com>
Date: Thu,  6 Aug 2015 22:48:23 +0200

 Commit e29aa33 "bna: Enable Multi Buffer RX" moved the packet counter
 increment from the beginning of the NAPI processing loop to after the
 check for erroneous packets, so those packets are never accounted for.
 This counter is used to inform the firmware about the number of
 processed completions (packets). As these packets are never acked, the
 firmware fires IRQs for them again and again.
 
 Fixes: e29aa33 "bna: Enable Multi Buffer RX"
 Signed-off-by: Ivan Vecera <ivec...@redhat.com>

Applied, thanks.

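The ordering problem described in the bna fix can be illustrated with a toy poll loop. This is not the bna driver code; struct toy_cmpl and toy_poll() are hypothetical. The point is that every consumed completion is counted before the error check, so the count reported back to the firmware also covers the dropped entries:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical completion entry: only the error bit matters here. */
struct toy_cmpl { bool error; };

/* Process up to @budget completions from @ring. Returns the number of
 * completions to ack to firmware; *delivered receives the number of
 * good packets passed up the stack. */
static size_t toy_poll(const struct toy_cmpl *ring, size_t n, size_t budget,
		       size_t *delivered)
{
	size_t processed = 0, good = 0;

	assert(delivered != NULL);
	while (processed < n && processed < budget) {
		const struct toy_cmpl *c = &ring[processed];

		processed++;		/* count every consumed entry first... */
		if (c->error)
			continue;	/* ...then drop erroneous packets */
		good++;
	}
	*delivered = good;
	return processed;
}
```

If `processed++` were moved below the `continue`, the loop would never advance past an erroneous entry in this toy, and the ack count would fall short of what was consumed, which is the kind of mismatch that makes firmware keep re-raising interrupts.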

Re: [PATCH iproute2] tipc: fix bearer get/set help synopsis

2015-08-10 Thread Stephen Hemminger
On Fri, 7 Aug 2015 09:55:09 +0200
<richard.a...@ericsson.com> wrote:

 From: Richard Alpe <richard.a...@ericsson.com>
 
 One option is required for bearer set and bearer get.

Applied, thanks



Re: [PATCH v2] openvswitch: Fix L4 checksum handling when dealing with IP fragments

2015-08-10 Thread Glenn Griffin
On Mon, Aug 03, 2015 at 02:03:28PM -0700, David Miller wrote:
 From: Glenn Griffin <ggriffin.ker...@gmail.com>
 Date: Mon, 3 Aug 2015 09:56:54 -0700
 
  openvswitch modifies the L4 checksum of a packet when modifying
  the IP address. When an IP packet is fragmented, only the first
  fragment contains an L4 header and checksum. Prior to this change,
  openvswitch would modify all fragments, modifying application data
  in non-first fragments and causing checksum failures in the
  reassembled packet.
  
  Signed-off-by: Glenn Griffin <ggriffin.ker...@gmail.com>
  ---
  Changes in v2:
- Compare frag_off in network byte order rather than host byte order
 
 Applied and queued up for -stable.

I noticed this change didn't seem to make it into 4.2-rc6. I'm not too
familiar with the release schedule so wasn't sure if that was expected
or an oversight. Will this remain queued up until the 4.3 merge window
opens?

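A sketch of the kind of check the openvswitch fix needs (not the patch itself): only a packet whose fragment offset is zero can carry the L4 header, so only that packet's L4 checksum should be touched. Converting the mask with htons() lets frag_off be tested in network byte order, as the v2 note describes. It assumes the standard IPv4 layout where the low 13 bits of frag_off hold the offset; TOY_IP_OFFSET and toy_has_l4_header() are hypothetical names (the kernel's own mask is IP_OFFSET, 0x1FFF):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <arpa/inet.h>

/* Fragment offset mask: low 13 bits of the IPv4 frag_off field. */
#define TOY_IP_OFFSET 0x1FFF

/* @frag_off_be is the raw header field, in network byte order.
 * Returns true for unfragmented packets and first fragments, i.e. the
 * only packets that can contain the L4 header and checksum. The MF and
 * DF flag bits are outside the mask and do not affect the result. */
static bool toy_has_l4_header(uint16_t frag_off_be)
{
	return (frag_off_be & htons(TOY_IP_OFFSET)) == 0;
}
```

Non-first fragments (non-zero offset) are left alone, so their payload bytes are never mistaken for a TCP or UDP checksum field.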

[PATCH net-next 1/9] net: Introduce VRF related flags and helpers

2015-08-10 Thread David Ahern
Add a VRF_MASTER flag for interfaces and helper functions for determining
if a device is a VRF_MASTER.

Add a link attribute for passing the VRF_TABLE id.

Add vrf_ptr to struct net_device.

Add various macros for determining if a device is a VRF device, the index
of the master VRF device, and the table associated with the VRF device.

Signed-off-by: Shrijeet Mukherjee <s...@cumulusnetworks.com>
Signed-off-by: David Ahern <d...@cumulusnetworks.com>
---
 include/linux/netdevice.h|  20 +++
 include/net/vrf.h| 139 +++
 include/uapi/linux/if_link.h |   9 +++
 3 files changed, 168 insertions(+)
 create mode 100644 include/net/vrf.h

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 607b5f41f46f..f7a6ef2fae3a 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1289,6 +1289,7 @@ enum netdev_priv_flags {
IFF_XMIT_DST_RELEASE_PERM   = 1<<22,
IFF_IPVLAN_MASTER   = 1<<23,
IFF_IPVLAN_SLAVE    = 1<<24,
+   IFF_VRF_MASTER  = 1<<25,
 };
 
 #define IFF_802_1Q_VLAN  IFF_802_1Q_VLAN
@@ -1316,6 +1317,7 @@ enum netdev_priv_flags {
 #define IFF_XMIT_DST_RELEASE_PERM  IFF_XMIT_DST_RELEASE_PERM
 #define IFF_IPVLAN_MASTER  IFF_IPVLAN_MASTER
 #define IFF_IPVLAN_SLAVE   IFF_IPVLAN_SLAVE
+#define IFF_VRF_MASTER IFF_VRF_MASTER
 
 /**
  * struct net_device - The DEVICE structure.
@@ -1432,6 +1434,7 @@ enum netdev_priv_flags {
  * @dn_ptr:DECnet specific data
  * @ip6_ptr:   IPv6 specific data
  * @ax25_ptr:  AX.25 specific data
+ * @vrf_ptr:   VRF specific data
  * @ieee80211_ptr: IEEE 802.11 specific data, assign before registering
  *
  * @last_rx:   Time of last Rx
@@ -1650,6 +1653,7 @@ struct net_device {
struct dn_dev __rcu *dn_ptr;
struct inet6_dev __rcu  *ip6_ptr;
void*ax25_ptr;
+   struct net_vrf_dev __rcu *vrf_ptr;
struct wireless_dev *ieee80211_ptr;
struct wpan_dev *ieee802154_ptr;
 #if IS_ENABLED(CONFIG_MPLS_ROUTING)
@@ -3808,6 +3812,22 @@ static inline bool netif_supports_nofcs(struct net_device *dev)
return dev->priv_flags & IFF_SUPP_NOFCS;
 }
 
+static inline bool netif_is_vrf(const struct net_device *dev)
+{
+   return dev->priv_flags & IFF_VRF_MASTER;
+}
+
+static inline bool netif_index_is_vrf(struct net *net, int ifindex)
+{
+   struct net_device *dev = dev_get_by_index_rcu(net, ifindex);
+   bool rc = false;
+
+   if (dev)
+   rc = netif_is_vrf(dev);
+
+   return rc;
+}
+
 /* This device needs to keep skb dst for qdisc enqueue or ndo_start_xmit() */
 static inline void netif_keep_dst(struct net_device *dev)
 {
diff --git a/include/net/vrf.h b/include/net/vrf.h
new file mode 100644
index ..25c709fdb98f
--- /dev/null
+++ b/include/net/vrf.h
@@ -0,0 +1,139 @@
+/*
+ * include/net/net_vrf.h - adds vrf dev structure definitions
+ * Copyright (c) 2015 Cumulus Networks
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#ifndef __LINUX_NET_VRF_H
+#define __LINUX_NET_VRF_H
+
+struct net_vrf_dev {
+   struct rcu_head rcu;
+   int ifindex; /* ifindex of master dev */
+   u32 tb_id;   /* table id for VRF */
+};
+
+struct slave {
+   struct list_head    list;
+   struct net_device   *dev;
+};
+
+struct slave_queue {
+   struct list_head    all_slaves;
+   int num_slaves;
+};
+
+struct net_vrf {
+   struct slave_queue  queue;
+   struct rtable   *rth;
+   u32 tb_id;
+};
+
+
+#if IS_ENABLED(CONFIG_NET_VRF)
+/* called with rcu_read_lock() */
+static inline int vrf_master_ifindex_rcu(const struct net_device *dev)
+{
+   struct net_vrf_dev *vrf_ptr;
+   int ifindex = 0;
+
+   if (!dev)
+   return 0;
+
+   if (netif_is_vrf(dev))
+   ifindex = dev->ifindex;
+   else {
+   vrf_ptr = rcu_dereference(dev->vrf_ptr);
+   if (vrf_ptr)
+   ifindex = vrf_ptr->ifindex;
+   }
+
+   return ifindex;
+}
+
+/* called with rcu_read_lock */
+static inline int vrf_dev_table_rcu(const struct net_device *dev)
+{
+   int tb_id = 0;
+
+   if (dev) {
+   struct net_vrf_dev *vrf_ptr;
+
+   vrf_ptr = rcu_dereference(dev->vrf_ptr);
+   if (vrf_ptr)
+   tb_id = vrf_ptr->tb_id;
+   }
+   return tb_id;
+}
+
+static inline int vrf_dev_table(const struct net_device *dev)
+{
+   int tb_id = 0;
+
+   rcu_read_lock();
+   tb_id = 

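How the new flag bit and vrf_master_ifindex_rcu() fit together can be shown with a userspace toy: a VRF master reports its own ifindex, an enslaved device reports its master's through vrf_ptr, and any other device reports 0. The struct and function names are hypothetical stand-ins, RCU is omitted, and only the 1<<25 bit value is taken from the patch:

```c
#include <assert.h>
#include <stddef.h>

/* Bit values as in the patch's enum netdev_priv_flags hunk. */
#define TOY_IFF_IPVLAN_SLAVE (1u << 24)
#define TOY_IFF_VRF_MASTER   (1u << 25)

/* Hypothetical stand-ins for struct net_vrf_dev and struct net_device. */
struct toy_vrf_dev { int ifindex; unsigned int tb_id; };
struct toy_net_device {
	int ifindex;
	unsigned int priv_flags;
	const struct toy_vrf_dev *vrf_ptr;	/* set on enslaved devices */
};

static int toy_netif_is_vrf(const struct toy_net_device *dev)
{
	return (dev->priv_flags & TOY_IFF_VRF_MASTER) != 0;
}

/* Mirrors vrf_master_ifindex_rcu() without the RCU dereference. */
static int toy_vrf_master_ifindex(const struct toy_net_device *dev)
{
	if (!dev)
		return 0;
	if (toy_netif_is_vrf(dev))
		return dev->ifindex;
	return dev->vrf_ptr ? dev->vrf_ptr->ifindex : 0;
}
```

Because the helper returns 0 for devices outside any VRF, callers can fall back with the `? :` idiom, as the icmp.c hunks in patch 3/9 do.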
[PATCH net-next 4/9] udp: Handle VRF device in sendmsg

2015-08-10 Thread David Ahern
For unconnected UDP sockets using a VRF device, look up the source
address based on the VRF table. This allows the UDP header to be properly
set up before the packet shows up at the VRF device via the dst.

Signed-off-by: Shrijeet Mukherjee <s...@cumulusnetworks.com>
Signed-off-by: David Ahern <d...@cumulusnetworks.com>
---
 net/ipv4/udp.c | 22 +-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 83aa604f9273..7af5052e3b1f 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1013,11 +1013,31 @@ int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 
if (!rt) {
struct net *net = sock_net(sk);
+   __u8 flow_flags = inet_sk_flowi_flags(sk);
 
fl4 = &fl4_stack;
+
+   /* unconnected socket. If output device is enslaved to a VRF
+* device lookup source address from VRF table. This mimics
+* behavior of ip_route_connect{_init}.
+*/
+   if (netif_index_is_vrf(net, ipc.oif)) {
+   flowi4_init_output(fl4, ipc.oif, sk->sk_mark, tos,
+  RT_SCOPE_UNIVERSE, sk->sk_protocol,
+  (flow_flags | FLOWI_FLAG_VRFSRC),
+  faddr, saddr, dport,
+  inet->inet_sport);
+
+   rt = ip_route_output_flow(net, fl4, sk);
+   if (!IS_ERR(rt)) {
+   saddr = fl4->saddr;
+   ip_rt_put(rt);
+   }
+   }
+
flowi4_init_output(fl4, ipc.oif, sk->sk_mark, tos,
   RT_SCOPE_UNIVERSE, sk->sk_protocol,
-  inet_sk_flowi_flags(sk),
+  flow_flags,
   faddr, saddr, dport, inet->inet_sport);
 
security_sk_classify_flow(sk, flowi4_to_flowi(fl4));
-- 
2.3.2 (Apple Git-55)



[PATCH net-next 3/9] net: Use VRF device index for lookups on TX

2015-08-10 Thread David Ahern
As with ingress, use the index of the VRF master device for route lookups
on egress. However, the oif should only be used to direct the lookups to
a specific table. Routes in the table are not based on the VRF device but
rather on the interfaces that are part of the VRF, so do not consider the
oif for lookups within the table. FLOWI_FLAG_VRFSRC is used to control
this latter part.

Signed-off-by: Shrijeet Mukherjee <s...@cumulusnetworks.com>
Signed-off-by: David Ahern <d...@cumulusnetworks.com>
---
 include/net/flow.h  | 1 +
 include/net/route.h | 3 +++
 net/ipv4/fib_trie.c | 7 +--
 net/ipv4/icmp.c | 4 
 net/ipv4/route.c| 5 +
 5 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/include/net/flow.h b/include/net/flow.h
index 3098ae33a178..f305588fc162 100644
--- a/include/net/flow.h
+++ b/include/net/flow.h
@@ -33,6 +33,7 @@ struct flowi_common {
__u8    flowic_flags;
 #define FLOWI_FLAG_ANYSRC  0x01
 #define FLOWI_FLAG_KNOWN_NH 0x02
+#define FLOWI_FLAG_VRFSRC  0x04
__u32   flowic_secid;
struct flowi_tunnel flowic_tun_key;
 };
diff --git a/include/net/route.h b/include/net/route.h
index 2d45f419477f..94189d4bd899 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -251,6 +251,9 @@ static inline void ip_route_connect_init(struct flowi4 *fl4, __be32 dst, __be32
if (inet_sk(sk)->transparent)
flow_flags |= FLOWI_FLAG_ANYSRC;
 
+   if (netif_index_is_vrf(sock_net(sk), oif))
+   flow_flags |= FLOWI_FLAG_VRFSRC;
+
flowi4_init_output(fl4, oif, sk->sk_mark, tos, RT_SCOPE_UNIVERSE,
   protocol, flow_flags, dst, src, dport, sport);
 }
diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 37c4bb89a708..1243c79cb5b0 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -1423,8 +1423,11 @@ int fib_table_lookup(struct fib_table *tb, const struct flowi4 *flp,
nh->nh_flags & RTNH_F_LINKDOWN &&
!(fib_flags & FIB_LOOKUP_IGNORE_LINKSTATE))
continue;
-   if (flp->flowi4_oif && flp->flowi4_oif != nh->nh_oif)
-   continue;
+   if (!(flp->flowi4_flags & FLOWI_FLAG_VRFSRC)) {
+   if (flp->flowi4_oif &&
+   flp->flowi4_oif != nh->nh_oif)
+   continue;
+   }
 
if (!(fib_flags & FIB_LOOKUP_NOREF))
atomic_inc(&fi->fib_clntref);
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index c0556f1e4bf0..1164fc4ce3bc 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -96,6 +96,7 @@
 #include <net/xfrm.h>
 #include <net/inet_common.h>
 #include <net/ip_fib.h>
+#include <net/vrf.h>
 
 /*
  * Build xmit assembly blocks
@@ -425,6 +426,7 @@ static void icmp_reply(struct icmp_bxm *icmp_param, struct sk_buff *skb)
fl4.flowi4_mark = mark;
fl4.flowi4_tos = RT_TOS(ip_hdr(skb)->tos);
fl4.flowi4_proto = IPPROTO_ICMP;
+   fl4.flowi4_oif = vrf_master_ifindex_rcu(skb->dev) ? : skb->dev->ifindex;
security_skb_classify_flow(skb, flowi4_to_flowi(&fl4));
rt = ip_route_output_key(net, &fl4);
if (IS_ERR(rt))
@@ -458,6 +460,8 @@ static struct rtable *icmp_route_lookup(struct net *net,
fl4->flowi4_proto = IPPROTO_ICMP;
fl4->fl4_icmp_type = type;
fl4->fl4_icmp_code = code;
+   fl4->flowi4_oif = vrf_master_ifindex_rcu(skb_in->dev) ? : skb_in->dev->ifindex;
+
security_skb_classify_flow(skb_in, flowi4_to_flowi(fl4));
rt = __ip_route_output_key(net, fl4);
if (IS_ERR(rt))
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index c26ff1f7067d..2c89d294b669 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2131,6 +2131,11 @@ struct rtable *__ip_route_output_key(struct net *net, struct flowi4 *fl4)
fl4->saddr = inet_select_addr(dev_out, 0,
  RT_SCOPE_HOST);
}
+   if (netif_is_vrf(dev_out) &&
+   !(fl4->flowi4_flags & FLOWI_FLAG_VRFSRC)) {
+   rth = vrf_dev_get_rth(dev_out);
+   goto out;
+   }
}
 
if (!fl4->daddr) {
-- 
2.3.2 (Apple Git-55)


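The fib_trie.c change reduces to a small predicate on each nexthop: without FLOWI_FLAG_VRFSRC the usual oif filter applies, and with it the oif (the VRF master) has already done its job of selecting the table, so it is not compared against the nexthop device, which is a VRF slave. A hypothetical userspace restatement of that predicate:

```c
#include <assert.h>
#include <stdint.h>

/* Flag value from the include/net/flow.h hunk above. */
#define TOY_FLOWI_FLAG_VRFSRC 0x04

/* Returns nonzero if a nexthop on @nh_oif may be used for a flow with
 * output interface @flow_oif and flags @flow_flags. */
static int toy_nh_oif_ok(int flow_oif, uint8_t flow_flags, int nh_oif)
{
	if (flow_flags & TOY_FLOWI_FLAG_VRFSRC)
		return 1;	/* oif only chose the table */
	return !flow_oif || flow_oif == nh_oif;
}
```

With the flag set, a flow whose oif is the VRF master (say ifindex 10) can still match a nexthop on a slave such as ifindex 3, which the plain oif comparison would reject.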

Re: [PATCH 0/3] Fixes for the network driver of Marvell Armada 375 SoC

2015-08-10 Thread David Miller
From: Marcin Wojtas <m...@semihalf.com>
Date: Thu,  6 Aug 2015 19:00:27 +0200

 This is a set of three patches that fix long-standing problems in the
 initial support for the Armada 375 network controller.
 
 Due to an inappropriate approach to per-CPU processing of sent packets
 on the TX path, the driver suffered numerous problems, such as RCU
 stalls. These have been fixed; details can be found in the commit
 logs. The patches were intensively tested on top of v4.2-rc5.
 
 I'm looking forward to any comments or remarks.

Series applied, thanks.


Re: [PATCH iproute2 -next] m_bpf: add frontend support for late binding

2015-08-10 Thread Stephen Hemminger
On Fri,  7 Aug 2015 11:36:50 +0200
Daniel Borkmann <dan...@iogearbox.net> wrote:

 Frontend support for kernel commit a5c90b29e5cc ("act_bpf: properly
 support late binding of bpf action to a classifier").
 
 Signed-off-by: Daniel Borkmann <dan...@iogearbox.net>

Applied to net-next


Re: [PATCH iproute2 net-next] iplink: bonding: add support for IFLA_BOND_TLB_DYNAMIC_LB

2015-08-10 Thread Stephen Hemminger
On Mon,  3 Aug 2015 12:19:55 +0200
Nikolay Aleksandrov <ra...@blackwall.org> wrote:

 From: Nikolay Aleksandrov <niko...@cumulusnetworks.com>
 
 Add support for setting and showing the value of tlb_dynamic_lb
 (IFLA_BOND_TLB_DYNAMIC_LB).
 Example:
 $ ip -d link show dev bond0 type bond
 7: bond0: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN
 mode DEFAULT group default
 link/ether ce:2f:e1:6e:d7:e0 brd ff:ff:ff:ff:ff:ff promiscuity 0
 bond mode balance-tlb miimon 100 updelay 0 downdelay 0 use_carrier 1
 arp_interval 0 arp_validate none arp_all_targets any primary_reselect
 always fail_over_mac none xmit_hash_policy layer2 resend_igmp 1
 num_grat_arp 1 all_slaves_active 0 min_links 0 lp_interval 1
 packets_per_slave 1 lacp_rate slow ad_select stable tlb_dynamic_lb 1
 addrgenmode eui64
 
 $ ip -d l set dev bond0 type bond tlb_dynamic_lb 0
 $ ip -d link show dev bond0 type bond
 7: bond0: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN
 mode DEFAULT group default
 link/ether ce:2f:e1:6e:d7:e0 brd ff:ff:ff:ff:ff:ff promiscuity 0
 bond mode balance-tlb miimon 100 updelay 0 downdelay 0 use_carrier 1
 arp_interval 0 arp_validate none arp_all_targets any primary_reselect
 always fail_over_mac none xmit_hash_policy layer2 resend_igmp 1
 num_grat_arp 1 all_slaves_active 0 min_links 0 lp_interval 1
 packets_per_slave 1 lacp_rate slow ad_select stable tlb_dynamic_lb 0
 addrgenmode eui64
 
 Signed-off-by: Nikolay Aleksandrov <niko...@cumulusnetworks.com>

Applied to net-next



Re: [iproute PATCH] misc/ss: don't imply -a when -A was specified

2015-08-10 Thread Stephen Hemminger
On Fri,  7 Aug 2015 15:31:27 +0200
Phil Sutter <p...@nwl.cc> wrote:

 Signed-off-by: Phil Sutter <p...@nwl.cc>

Ok, applied


Re: kernel warning in tcp_fragment

2015-08-10 Thread Neal Cardwell
On Mon, Aug 10, 2015 at 2:10 PM, Jovi Zhangwei <j...@cloudflare.com> wrote:

 Ping?

 We saw a lot of these warnings in our production system. We would
 greatly appreciate it if someone could give us a fix for these warnings. :)

What is your net.ipv4.tcp_mtu_probing setting? If 1, have you tried
setting it to 0? Previous reports (
https://patchwork.ozlabs.org/patch/480882/ ) have shown that this gets
rid of at least one source of the warning. So that would provide a
useful data point.

Separately, you could also try the attached patch. This is against
3.14.39. It tries to attack a different possible source of this
warning. Please let us know if that patch helps.

Thanks!

neal


0001-RFC-for-tests-on-v3.14.39-tcp-resegment-skbs-that-we.patch


Re: [PATCH net-next 1/2] net: track link status of ipv6 nexthops

2015-08-10 Thread Andy Gospodarek
On Mon, Aug 10, 2015 at 10:54:00AM -0700, David Miller wrote:
 From: Andy Gospodarek <go...@cumulusnetworks.com>
 Date: Thu,  6 Aug 2015 11:42:33 -0400
 
  Add support to track current link status of ipv6 nexthops to match
  recent changes that added support for ipv4 nexthops.  There was not a
  field already available that could track these and no space available in
  the existing rt6i_flags field, so this patch adds rt6i_nhflags to struct
  rt6_info.
  
  Signed-off-by: Andy Gospodarek <go...@cumulusnetworks.com>
  Signed-off-by: Dinesh Dutt <dd...@cumulusnetworks.com>
 
 This doesn't really make any sense to me.
 
 You can evaluate the state of the link at the time you look at the
 route at all of the places where it matters as far as I can tell.
 
 It's so expensive to walk the entire routing table every time a link
 goes up and down, so it's much better to take an evaluate-as-needed
 approach to implementing this.

I went with storing this info in a flags field for two reasons:

- This approach of marking nexthops on link status changes and checking
  for that mark during forwarding was what Alex et al. suggested for the
  ipv4 code, and I wanted to keep the overall design similar.

- New flags will likely be needed when switchdev support is added for
  ipv6 routes, so going ahead and mirroring the RTNH_F* flags in the
  ipv6 code seemed reasonable.

I would actually be fine with what you proposed (it is closer to the
first implementation), so if my justification above does not change your
mind, let me know and I'll post a v2 that does not add rt6i_nhflags and
simply checks netif_carrier_ok() rather than RTNH_F_LINKDOWN.

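The evaluate-as-needed alternative David suggests, and which Andy offers to implement with netif_carrier_ok(), can be sketched as a route selection that consults the device state at lookup time instead of reading a per-route link-down flag maintained by a table walk. This is a userspace toy: struct toy_dev and struct toy_rt6 are hypothetical stand-ins for net_device and rt6_info, and `carrier` plays the role of the carrier state:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_dev { bool carrier; };		/* stands in for net_device */
struct toy_rt6 { const struct toy_dev *dev; };	/* stands in for rt6_info */

/* Link state is read from the device when the route is used, so a link
 * flap only changes the device and no route has to be revisited. */
static bool toy_rt6_nh_usable(const struct toy_rt6 *rt)
{
	assert(rt != NULL);
	return rt->dev && rt->dev->carrier;
}

/* Pick the first usable route among multipath candidates; NULL if none. */
static const struct toy_rt6 *toy_select(const struct toy_rt6 *rts, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (toy_rt6_nh_usable(&rts[i]))
			return &rts[i];
	return NULL;
}
```

The trade-off under discussion is visible here: the check costs a device lookup per route evaluation, but nothing walks the routing table when a link changes state.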

Re: [RFC PATCH net-next] tcp: reduce cpu usage under tcp memory pressure when SO_SNDBUF is set

2015-08-10 Thread Eric Dumazet
On Fri, 2015-08-07 at 18:31 +, Jason Baron wrote:
 From: Jason Baron <jba...@akamai.com>
 
 When SO_SNDBUF is set and we are under tcp memory pressure, the effective
 write buffer space can be much lower than what was set using SO_SNDBUF.
 For example, we may have set the buffer to 100kb, but we may only be able
 to write 10kb. In this scenario poll()/select()/epoll() are going to
 continuously return POLLOUT, followed by -EAGAIN from write() in a very
 tight loop.
 
 Introduce sk->sk_effective_sndbuf, such that we can track the 'effective'
 size of the sndbuf when we have a short write due to memory pressure. By
 using sk->sk_effective_sndbuf instead of sk->sk_sndbuf when we are under
 memory pressure, we can delay the POLLOUT until 1/3 of the buffer clears,
 as we normally do. There is no issue here when SO_SNDBUF is not set,
 since the tcp layer will auto-tune sk->sk_sndbuf.
 
 In my testing, this brought a single thread's cpu usage down from 100% to
 1% while maintaining the same level of throughput when under memory
 pressure.
 

I am not sure we need to grow the socket for something that looks like a
flag?

Also you add a race in sk_stream_wspace() as sk_effective_sndbuf value
can change under us.

+   if (sk->sk_effective_sndbuf)
+   return sk->sk_effective_sndbuf - sk->sk_wmem_queued;
+





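Both the "1/3 of the buffer clears" rule and Eric's race concern can be illustrated in one userspace sketch. It assumes the kernel's writeability test, in which free space must be at least half of what is queued (sk_stream_wspace() >= sk_stream_min_wspace(), the latter being sk_wmem_queued >> 1), so POLLOUT returns once about a third of the buffer is free. The proposed effective_sndbuf is read exactly once, the role READ_ONCE() would play in the kernel, so the zero test and the subtraction cannot see different values. struct toy_sock and its fields are hypothetical, and the memory-accounting half of the real check is omitted:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct toy_sock {
	int sndbuf;			/* value set via SO_SNDBUF */
	_Atomic int effective_sndbuf;	/* proposed field; 0 = unused */
	int wmem_queued;		/* bytes currently queued */
};

/* Snapshot effective_sndbuf once; another CPU may change it. */
static int toy_stream_wspace(struct toy_sock *sk)
{
	int eff = atomic_load_explicit(&sk->effective_sndbuf,
				       memory_order_relaxed);

	return (eff ? eff : sk->sndbuf) - sk->wmem_queued;
}

/* Writeable when free space >= half of the queued bytes, i.e. when
 * roughly one third of the (effective) buffer is free. */
static bool toy_stream_writeable(struct toy_sock *sk)
{
	return toy_stream_wspace(sk) >= (sk->wmem_queued >> 1);
}
```

With a 15000-byte effective buffer, for example, the socket reports writeable at 10000 bytes queued but not at 11000.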

Re: [PATCH net-next v3 7/8] net: switchdev: support static FDB addresses

2015-08-10 Thread Scott Feldman
On Mon, Aug 10, 2015 at 6:09 AM, Vivien Didelot
<vivien.dide...@savoirfairelinux.com> wrote:
 This patch adds an ndm_state member to the switchdev_obj_fdb structure,
 in order to support static FDB addresses.

 Set Rocker ndm_state to NUD_REACHABLE.

 Signed-off-by: Vivien Didelot <vivien.dide...@savoirfairelinux.com>

Acked-by: Scott Feldman <sfel...@gmail.com>


[PATCH 2/3] gianfar: correct list membership accounting

2015-08-10 Thread Jakub Kicinski
From: Jakub Kicinski kubak...@wp.pl

At the cost of one line, let's make sure .count is correct
when calling gfar_process_filer_changes().

Signed-off-by: Jakub Kicinski kubak...@wp.pl
---
 drivers/net/ethernet/freescale/gianfar_ethtool.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/freescale/gianfar_ethtool.c b/drivers/net/ethernet/freescale/gianfar_ethtool.c
index e543d3b01838..b955ed83ca98 100644
--- a/drivers/net/ethernet/freescale/gianfar_ethtool.c
+++ b/drivers/net/ethernet/freescale/gianfar_ethtool.c
@@ -1723,13 +1723,14 @@ static int gfar_add_cls(struct gfar_private *priv,
}
 
 process:
+   priv->rx_list.count++;
ret = gfar_process_filer_changes(priv);
if (ret)
goto clean_list;
-   priv->rx_list.count++;
return ret;
 
 clean_list:
+   priv->rx_list.count--;
list_del(&temp->list);
 clean_mem:
kfree(temp);
-- 
2.1.0
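The patch increments the count before gfar_process_filer_changes() runs and undoes the increment on the error path. This is a common add-then-rollback pattern. The minimal sketch below assumes a hypothetical rule_list type and a process callback that stand in for priv->rx_list and the driver's filer processing.

```c
/* Hypothetical stand-in for priv->rx_list. */
struct rule_list {
	int count;
};

/* Hypothetical processing step (stand-in for gfar_process_filer_changes()):
 * it expects list->count to already include the rule being added and
 * returns 0 or a negative error code. */
typedef int (*process_fn)(struct rule_list *list);

static int add_rule(struct rule_list *list, process_fn process)
{
	int ret;

	list->count++;		/* account for the new entry before processing */
	ret = process(list);
	if (ret)
		list->count--;	/* error path: undo so count matches the list */
	return ret;
}

/* Example callbacks, for illustration only. */
static int process_ok(struct rule_list *list)
{
	return list->count > 0 ? 0 : -1;
}

static int process_fail(struct rule_list *list)
{
	(void)list;
	return -22;	/* the value of -EINVAL on Linux */
}
```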



Re: [PATCH net] ipv6: don't reject link-local nexthop on other interface

2015-08-10 Thread David Miller
From: Florian Westphal f...@strlen.de
Date: Fri,  7 Aug 2015 10:54:28 +0200

 48ed7b26faa7 (ipv6: reject locally assigned nexthop addresses) is too
 strict; it rejects the following corner case:
 
 ip -6 route add default via fe80::1:2:3 dev eth1
 
 [ where fe80::1:2:3 is assigned to a local interface, but not eth1 ]
 
 Fix this by restricting the search to the given device if the nexthop is link-local.
 
 Joint work with Hannes Frederic Sowa.
 
 Fixes: 48ed7b26faa7 (ipv6: reject locally assigned nexthop addresses)
 Signed-off-by: Hannes Frederic Sowa han...@stressinduktion.org
 Signed-off-by: Florian Westphal f...@strlen.de

Applied, thank you.
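The fix treats a link-local nexthop as local only if it is assigned on the device the route uses. That is the right scope, because a link-local address only has meaning on its own link. The minimal user-space sketch below shows that decision. The local_addr table and nexthop_is_local() are hypothetical stand-ins for the kernel's address lookup, not the actual ipv6 route code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <netinet/in.h>

/* Hypothetical record of an address assigned to a local interface. */
struct local_addr {
	struct in6_addr addr;
	int ifindex;
};

/* True for fe80::/10. */
static bool is_link_local(const struct in6_addr *a)
{
	return a->s6_addr[0] == 0xfe && (a->s6_addr[1] & 0xc0) == 0x80;
}

/* Return true if the nexthop should be rejected as locally assigned.
 * For a link-local nexthop only addresses on the outgoing device (oif)
 * count; for any other nexthop every interface is searched. */
static bool nexthop_is_local(const struct in6_addr *nh, int oif,
			     const struct local_addr *tab, size_t n)
{
	bool ll = is_link_local(nh);

	for (size_t i = 0; i < n; i++) {
		if (ll && tab[i].ifindex != oif)
			continue;
		if (memcmp(&tab[i].addr, nh, sizeof(*nh)) == 0)
			return true;
	}
	return false;
}
```

In the corner case from the commit message, fe80::1:2:3 lives on some other interface, so a route via it on eth1 is no longer rejected.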


Re: [PATCH net-next] net: fec: fix the race between xmit and bdp reclaiming path

2015-08-10 Thread David Miller
From: Kevin Hao haoke...@gmail.com
Date: Fri,  7 Aug 2015 13:52:37 +0800

 When we transmit a fragmented skb, we may run into a race like the
 following scenario (assume txq->cur_tx is next to txq->dirty_tx):

   cpu 0                                      cpu 1
   fec_enet_txq_submit_skb
     reserve a bdp for the first fragment
     fec_enet_txq_submit_frag_skb
       update the bdp for the other fragment
       update txq->cur_tx
                                              fec_enet_tx_queue
                                                bdp = fec_enet_get_nextdesc(txq->dirty_tx, fep, queue_id);
                                                This bdp is the bdp reserved for the
                                                first segment. Since its BD_ENET_TX_READY
                                                bit is not set and txq->cur_tx already
                                                points to a bdp beyond this one, we
                                                think it is a completed bdp and try to
                                                reclaim it.
     update the bdp for the first segment
     update txq->cur_tx
 
 So we shouldn't update txq->cur_tx until all the updates to the
 bdps used for fragments have been performed. Also add the corresponding
 memory barrier to guarantee that the updates to the bdps, dirty_tx and
 cur_tx are performed in the proper order.
 
 Signed-off-by: Kevin Hao haoke...@gmail.com

Applied, thanks.
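The core of this fix is a publish/consume ordering. The transmit path must finish writing every descriptor of the skb before it advances cur_tx, and the reclaim path must not touch descriptors past the cur_tx it observed. The minimal user-space sketch below shows that ordering with C11 release/acquire atomics. The ring layout and names are hypothetical, the driver itself uses kernel memory barriers, and here dirty_tx means "next slot to reclaim", which differs from fec. The sketch also assumes the caller has already checked that the ring has room for n entries.

```c
#include <stdatomic.h>

#define RING_SIZE 8

/* Hypothetical descriptor ring, loosely modelled on the fec TX queue. */
struct tx_ring {
	unsigned int desc[RING_SIZE];	/* stand-in for the buffer descriptors */
	_Atomic unsigned int cur_tx;	/* next free slot, advanced by xmit */
	unsigned int dirty_tx;		/* next slot to reclaim, owned by reclaim */
};

/* Producer: fill every fragment's descriptor first, then publish them all
 * with one release store, so no descriptor write can become visible after
 * the cur_tx update. */
static void submit(struct tx_ring *r, const unsigned int *frags, unsigned int n)
{
	unsigned int cur = atomic_load_explicit(&r->cur_tx, memory_order_relaxed);

	for (unsigned int i = 0; i < n; i++)
		r->desc[(cur + i) % RING_SIZE] = frags[i];
	atomic_store_explicit(&r->cur_tx, (cur + n) % RING_SIZE,
			      memory_order_release);
}

/* Consumer: the acquire load pairs with the release store above, so every
 * slot before the observed cur_tx is fully written and safe to reclaim. */
static unsigned int reclaim(struct tx_ring *r)
{
	unsigned int cur = atomic_load_explicit(&r->cur_tx, memory_order_acquire);
	unsigned int done = 0;

	while (r->dirty_tx != cur) {
		r->desc[r->dirty_tx] = 0;	/* "free" the descriptor */
		r->dirty_tx = (r->dirty_tx + 1) % RING_SIZE;
		done++;
	}
	return done;
}
```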


[FWD] PROBLEM: there exists a wrong return value of function mkiss_open()

2015-08-10 Thread Linus Torvalds
I don't know how many people care about hamradio, but the report that
mkiss_open() returns success even when register_netdev() fails seems
entirely true. The email was just not sent to the right people.

   Linus

On Sun, Aug 9, 2015 at 5:08 PM, RUC_Soft_Sec zy900...@163.com wrote:
 Summary:
 There is a wrong return value in function mkiss_open(). It is a
 theoretical problem; we used static analysis to detect this bug.
 Bug Description:

 In function mkiss_open() at drivers/net/hamradio/mkiss.c:726, the call to
 register_netdev() at line 765 may return a negative error code, in which
 case mkiss_open() jumps to its error path and returns the value of the
 variable err. When everything works, mkiss_open() returns 0. However, when
 the call to register_netdev() at line 765 fails, the value of err is still
 0, so mkiss_open() returns 0 to its callers even though it failed because
 of register_netdev(). This leads to a wrong return value from mkiss_open().
 The related code in mkiss_open() is as follows.
 mkiss_open @@ drivers/net/hamradio/mkiss.c:726
  726  static int mkiss_open(struct tty_struct *tty)
  727  {
 ...
  761          if ((err = ax_open(ax->dev))) {
  762                  goto out_free_netdev;
  763          }
  764
  765          if (register_netdev(dev))
  766                  goto out_free_buffers;
 ...
  800  out_free_buffers:
  801          kfree(ax->rbuff);
  802          kfree(ax->xbuff);
  803
  804  out_free_netdev:
  805          free_netdev(dev);
  806
  807  out:
  808          return err;
  809  }

 Generally, when the call to register_netdev() fails, the caller should
 return a value different from the one it returns when register_netdev()
 succeeds, as in the following code from another file.
 com90io_found @@ drivers/net/arcnet/com90io.c:234
  234  static int __init com90io_found(struct net_device *dev)
  235  {
 ...
  268          err = register_netdev(dev);
  269          if (err) {
  270                  outb((inb(_CONFIG) & ~IOMAPflag), _CONFIG);
  271                  free_irq(dev->irq, dev);
  272                  release_region(dev->base_addr, ARCNET_TOTAL_SIZE);
  273                  return err;
  274          }
  275
  276          BUGMSG(D_NORMAL, "COM90IO: station %02Xh found at %03lXh, IRQ %d.\n",
  277                 dev->dev_addr[0], dev->base_addr, dev->irq);
  278
  279          return 0;
  280  }

 Kernel version:
 3.19.1
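The usual remedy for this class of bug is to store the register_netdev() return value in err before jumping to the cleanup label. The minimal sketch below contrasts the two shapes. The fake_ax_open() and fake_register_netdev() stubs are hypothetical stand-ins whose flags force each step to fail, and the sketch is not the actual mkiss patch.

```c
#include <errno.h>

/* Hypothetical stubs standing in for ax_open() and register_netdev(). */
static int fake_ax_open(int fail)
{
	return fail ? -ENOMEM : 0;
}

static int fake_register_netdev(int fail)
{
	return fail ? -EEXIST : 0;
}

/* Buggy shape from the report: the register result is tested inline,
 * so err is still 0 when the error path returns it. */
static int open_device_buggy(int fail_open, int fail_register)
{
	int err;

	if ((err = fake_ax_open(fail_open)))
		goto out;
	if (fake_register_netdev(fail_register))
		goto out;
	return 0;
out:
	return err;
}

/* Fixed shape: keep the error code so the caller sees the failure. */
static int open_device(int fail_open, int fail_register)
{
	int err;

	err = fake_ax_open(fail_open);
	if (err)
		goto out;

	err = fake_register_netdev(fail_register);
	if (err)
		goto out;	/* real code would also undo the ax_open() work */

	return 0;
out:
	return err;
}
```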






[PATCH 3/3] gianfar: remove faulty filer optimizer

2015-08-10 Thread Jakub Kicinski
From: Jakub Kicinski kubak...@wp.pl

Current filer rule optimization is broken in several ways:
 (1) It destroys rule ordering.
 (2) It performs reads/writes beyond end of allocated tables.
 (3) It breaks badly for rules with more than 2 specifiers
 (e.g. matching ip, port, tos).
 (4) We observed that the masking rules it generates do not
 play well with clustering on P2020.  Only the first rule
 of the cluster would ever fire.  Given that the optimizer
 relies heavily on masking, this is very hard to fix.

The fact that nobody noticed (1), (3) or (4) makes me think
that this feature is not very widely used and we should just
remove it.
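Point (1) matters if the filer table behaves like an ordered first-match list, which is a simplification of the eTSEC filer assumed here for illustration. The minimal sketch below uses a hypothetical value/mask rule format and shows that merely reordering rules, as a sort-based optimizer does, changes which queue a frame lands in.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified filer rule: match when (field & mask) == value. */
struct rule {
	uint32_t value;
	uint32_t mask;
	int queue;
};

/* Return the queue of the first matching rule, or -1 if none match. */
static int classify(const struct rule *rules, size_t n, uint32_t field)
{
	for (size_t i = 0; i < n; i++)
		if ((field & rules[i].mask) == rules[i].value)
			return rules[i].queue;
	return -1;
}
```

Both rules below match a field of 0x50, so swapping them moves that traffic from queue 1 to queue 2.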

Reported-by: Aleksander Dutkowski adutkow...@gmail.com
Signed-off-by: Jakub Kicinski kubak...@wp.pl
---
 drivers/net/ethernet/freescale/gianfar_ethtool.c | 337 ---
 1 file changed, 337 deletions(-)

diff --git a/drivers/net/ethernet/freescale/gianfar_ethtool.c b/drivers/net/ethernet/freescale/gianfar_ethtool.c
index b955ed83ca98..6bdc89179b72 100644
--- a/drivers/net/ethernet/freescale/gianfar_ethtool.c
+++ b/drivers/net/ethernet/freescale/gianfar_ethtool.c
@@ -902,27 +902,6 @@ static int gfar_check_filer_hardware(struct gfar_private *priv)
return 0;
 }
 
-static int gfar_comp_asc(const void *a, const void *b)
-{
-   return memcmp(a, b, 4);
-}
-
-static int gfar_comp_desc(const void *a, const void *b)
-{
-   return -memcmp(a, b, 4);
-}
-
-static void gfar_swap(void *a, void *b, int size)
-{
-   u32 *_a = a;
-   u32 *_b = b;
-
-   swap(_a[0], _b[0]);
-   swap(_a[1], _b[1]);
-   swap(_a[2], _b[2]);
-   swap(_a[3], _b[3]);
-}
-
 /* Write a mask to filer cache */
 static void gfar_set_mask(u32 mask, struct filer_table *tab)
 {
@@ -1272,310 +1251,6 @@ static int gfar_convert_to_filer(struct ethtool_rx_flow_spec *rule,
return 0;
 }
 
-/* Copy size filer entries */
-static void gfar_copy_filer_entries(struct gfar_filer_entry dst[0],
-   struct gfar_filer_entry src[0], s32 size)
-{
-   while (size > 0) {
-   size--;
-   dst[size].ctrl = src[size].ctrl;
-   dst[size].prop = src[size].prop;
-   }
-}
-
-/* Delete the contents of the filer-table between start and end
- * and collapse them
- */
-static int gfar_trim_filer_entries(u32 begin, u32 end, struct filer_table *tab)
-{
-   int length;
-
-   if (end > MAX_FILER_CACHE_IDX || end < begin)
-   return -EINVAL;
-
-   end++;
-   length = end - begin;
-
-   /* Copy */
-   while (end < tab->index) {
-   tab->fe[begin].ctrl = tab->fe[end].ctrl;
-   tab->fe[begin++].prop = tab->fe[end++].prop;
-
-   }
-   /* Fill up with don't cares */
-   while (begin < tab->index) {
-   tab->fe[begin].ctrl = 0x60;
-   tab->fe[begin].prop = 0x;
-   begin++;
-   }
-
-   tab->index -= length;
-   return 0;
-}
-
-/* Make space on the wanted location */
-static int gfar_expand_filer_entries(u32 begin, u32 length,
-struct filer_table *tab)
-{
-   if (length == 0 || length + tab->index > MAX_FILER_CACHE_IDX ||
-   begin > MAX_FILER_CACHE_IDX)
-   return -EINVAL;
-
-   gfar_copy_filer_entries(&(tab->fe[begin + length]), &(tab->fe[begin]),
-   tab->index - length + 1);
-
-   tab->index += length;
-   return 0;
-}
-
-static int gfar_get_next_cluster_start(int start, struct filer_table *tab)
-{
-   for (; (start < tab->index) && (start < MAX_FILER_CACHE_IDX - 1);
-start++) {
-   if ((tab->fe[start].ctrl & (RQFCR_AND | RQFCR_CLE)) ==
-   (RQFCR_AND | RQFCR_CLE))
-   return start;
-   }
-   return -1;
-}
-
-static int gfar_get_next_cluster_end(int start, struct filer_table *tab)
-{
-   for (; (start < tab->index) && (start < MAX_FILER_CACHE_IDX - 1);
-start++) {
-   if ((tab->fe[start].ctrl & (RQFCR_AND | RQFCR_CLE)) ==
-   (RQFCR_CLE))
-   return start;
-   }
-   return -1;
-}
-
-/* Uses hardwares clustering option to reduce
- * the number of filer table entries
- */
-static void gfar_cluster_filer(struct filer_table *tab)
-{
-   s32 i = -1, j, iend, jend;
-
-   while ((i = gfar_get_next_cluster_start(++i, tab)) != -1) {
-   j = i;
-   while ((j = gfar_get_next_cluster_start(++j, tab)) != -1) {
-   /* The cluster entries self and the previous one
-* (a mask) must be identical!
-*/
-   if (tab->fe[i].ctrl != tab->fe[j].ctrl)
-   break;
-   if (tab->fe[i].prop != tab->fe[j].prop)
-   break;
-   if (tab->fe[i - 1].ctrl != tab->fe[j - 1].ctrl)
-   break;

[PATCH 1/3] gianfar: correct filer table writing

2015-08-10 Thread Jakub Kicinski
From: Jakub Kicinski kubak...@wp.pl

MAX_FILER_IDX is the last usable index.  Using less-than
already guarantees that one entry will be left for the
catch-all rule, so there is no need to subtract 1 here.

Signed-off-by: Jakub Kicinski kubak...@wp.pl
---
 drivers/net/ethernet/freescale/gianfar_ethtool.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/freescale/gianfar_ethtool.c b/drivers/net/ethernet/freescale/gianfar_ethtool.c
index 555e461b0cfe..e543d3b01838 100644
--- a/drivers/net/ethernet/freescale/gianfar_ethtool.c
+++ b/drivers/net/ethernet/freescale/gianfar_ethtool.c
@@ -1585,11 +1585,10 @@ static int gfar_write_filer_table(struct gfar_private *priv,
return -EBUSY;
 
/* Fill regular entries */
-   for (; i < MAX_FILER_IDX - 1 && (tab->fe[i].ctrl | tab->fe[i].prop);
-i++)
+   for (; i < MAX_FILER_IDX && (tab->fe[i].ctrl | tab->fe[i].prop); i++)
gfar_write_filer(priv, i, tab->fe[i].ctrl, tab->fe[i].prop);
/* Fill the rest with fall-troughs */
-   for (; i < MAX_FILER_IDX - 1; i++)
+   for (; i < MAX_FILER_IDX; i++)
gfar_write_filer(priv, i, 0x60, 0x);
/* Last entry must be default accept
 * because that's what people expect
-- 
2.1.0
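The boundary argument can be checked with a small simulation. It assumes, as the surrounding hunk suggests, that the catch-all entry is written at whatever index i holds after both loops. MAX_FILER_IDX and the SLOT_* markers are hypothetical values chosen for the example, and nrules stands in for the (ctrl | prop) test.

```c
#define MAX_FILER_IDX 7	/* hypothetical; the real value is hardware-specific */

enum { SLOT_UNUSED, SLOT_REGULAR, SLOT_FALLTHRU, SLOT_CATCH_ALL };

/* Fill a table the way gfar_write_filer_table() does: regular rules, then
 * fall-throughs up to 'bound', then the catch-all at the final value of i.
 * bound is MAX_FILER_IDX after the patch and MAX_FILER_IDX - 1 before it. */
static void fill_table(int table[MAX_FILER_IDX + 1], int nrules, int bound)
{
	int i = 0;

	for (int k = 0; k <= MAX_FILER_IDX; k++)
		table[k] = SLOT_UNUSED;
	for (; i < bound && i < nrules; i++)
		table[i] = SLOT_REGULAR;
	for (; i < bound; i++)
		table[i] = SLOT_FALLTHRU;
	table[i] = SLOT_CATCH_ALL;
}
```

With the old bound, the last slot (index MAX_FILER_IDX) is never written. With the new bound, every slot is used and the catch-all lands on the last usable index.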



[PATCH 0/3] gianfar: filer changes

2015-08-10 Thread Jakub Kicinski
From: Jakub Kicinski kubak...@wp.pl

Hi,

I've been working with the gianfar filer code recently and got
some code to offer.  Well, maybe not that much code to offer
actually: two small fixes and removal of the current optimizer.
I'm not sure what your feelings on patch 3 will be.  It would
be great to have a working optimizer if someone wants to take
this task up but currently we have a semi-broken one and I vote
for killing the beast entirely before it has a chance to bite
more people...

Jakub Kicinski (3):
  gianfar: correct filer table writing
  gianfar: correct list membership accounting
  gianfar: remove faulty filer optimizer

 drivers/net/ethernet/freescale/gianfar_ethtool.c | 345 +--
 1 file changed, 4 insertions(+), 341 deletions(-)

-- 
2.1.0



Re: [PATCHv2 net-next 5/9] openvswitch: Add conntrack action

2015-08-10 Thread Joe Stringer
On 6 August 2015 at 14:36, Pravin Shelar pshe...@nicira.com wrote:
 +static void ovs_fragment(struct vport *vport, struct sk_buff *skb,
 +unsigned int mru, __be16 ethertype)
 +{
 +   if (skb_network_offset(skb) > MAX_L2_LEN) {
 +   OVS_NLERR(1, "L2 header too long to fragment");
 +   return;
 +   }
 +
 +   if (ethertype == htons(ETH_P_IP)) {
 +   struct dst_entry ovs_dst;
 +
 +   prepare_frag(vport, skb);
 +   dst_init(&ovs_dst, &ovs_dst_ops, NULL, 1,
 +DST_OBSOLETE_NONE, DST_NOCOUNT);
 +   ovs_dst.dev = vport->dev;
 +
 +   skb_dst_set_noref(skb, &ovs_dst);
 +   IPCB(skb)->frag_max_size = mru;
 +
 +   ip_do_fragment(skb->sk, skb, ovs_vport_output);
 +   } else if (ethertype == htons(ETH_P_IPV6)) {
 +   const struct nf_ipv6_ops *v6ops = nf_get_ipv6_ops();
 +   struct rt6_info ovs_rt;
 +
 +   if (!v6ops) {
 +   kfree_skb(skb);
 +   return;
 +   }
 +
 +   prepare_frag(vport, skb);
 +   memset(&ovs_rt, 0, sizeof(ovs_rt));
 +   dst_init(&ovs_rt.dst, &ovs_dst_ops, NULL, 1,
 +DST_OBSOLETE_NONE, DST_NOCOUNT);
 +   ovs_rt.dst.dev = vport->dev;
 +
 +   skb_dst_set_noref(skb, &ovs_rt.dst);
 +   IP6CB(skb)->frag_max_size = mru;
 +
 +   v6ops->fragment(skb->sk, skb, ovs_vport_output);
 +   } else {
 +   WARN_ONCE(1, "Failed fragment ->%s: eth=%04x, MRU=%d, MTU=%d.",
 + ovs_vport_name(vport), htons(ethertype), mru,
 + vport->dev->mtu);
 +   kfree_skb(skb);
 +   }
 +}
 +
 We also need something similar if this packet is going to userspace, so
 that we can send the original packets to userspace. Otherwise we would
 send the defragmented packet to userspace.

 OK, in that case we'll need to get an MTU from somewhere. I'll look at
 using the MRU as the MTU for this path, since corner cases where the
 MRU is greater than the netlink payload size seem pretty unlikely
 (and the netlink sending code should already handle such cases). The
 other concern I have is exactly how this should be presented to
 userspace. Currently the conntrack action is treated as an implicit
 reassembly, which will implicitly refragment on output. In between, it
 remains defragmented. If we fragment on miss, then lookup will use the
 key representing the defragmented packet (i.e. no OVS_FRAG_TYPE_* bits
 set), so we should send the same up to userspace. I assume that
 userspace would then re-parse the packets and see that they are
 fragments for representing up to the higher layers like OpenFlow, but
 for flow installation it would reuse the key passed up from the
 kernel. Is that the model you have in mind?

 Right, reassembly is transparent to the conntrack action, so it should be
 transparent to userspace too. But this means we will need to fragment in
 the upcall and defrag the skb again when the packet re-enters the kernel
 module from the packet execute code path, if the action needs to look at
 the entire packet. So let's just keep it based on the MRU parameter; we
 can enhance it later if we need to.

OK, we'll retain the upcall MRU and keep the packet reassembled for the
moment, so the implicit "conntrack will reassemble" behaviour from this
version is retained. I'll fix up the other issues and send v3, thanks.
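This note gives context on what capping the fragment size at the MRU implies for IPv4. Each fragment, header included, must fit within the limit. Every fragment except the last must also carry a multiple of 8 payload bytes, because the fragment offset field counts 8-byte units. The minimal sketch below shows that arithmetic. It assumes a 20-byte header without options and is not how ip_do_fragment() is implemented.

```c
#include <stddef.h>

#define IPV4_HDR_LEN 20	/* assumption: no IP options */

/* Number of fragments needed to carry payload_len bytes of IPv4 payload
 * when no fragment (header included) may exceed max_size bytes.
 * Returns 0 if max_size leaves no room for 8 bytes of payload. */
static size_t ipv4_frag_count(size_t payload_len, size_t max_size)
{
	size_t per_frag;

	if (max_size < IPV4_HDR_LEN + 8)
		return 0;
	/* Non-final fragments must carry a multiple of 8 payload bytes. */
	per_frag = (max_size - IPV4_HDR_LEN) & ~(size_t)7;
	if (payload_len == 0)
		return 1;
	return (payload_len + per_frag - 1) / per_frag;
}
```

For example, an MRU of 1000 allows 976 payload bytes per fragment (980 rounded down to a multiple of 8), so 2000 bytes need three fragments.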


Re: [PATCH net-next v2] bridge: netlink: add support for vlan_filtering attribute

2015-08-10 Thread David Miller
From: Nikolay Aleksandrov ra...@blackwall.org
Date: Fri,  7 Aug 2015 19:40:45 +0300

 From: Nikolay Aleksandrov niko...@cumulusnetworks.com
 
 This patch adds the ability to toggle the vlan filtering support via
 netlink. Since we're already running with rtnl in .changelink() we don't
 need to take any additional locks.
 
 Signed-off-by: Nikolay Aleksandrov niko...@cumulusnetworks.com

Applied, thanks.


Re: [PATCH net-next] net: add explicit logging and stat for neighbour table overflow

2015-08-10 Thread David Miller
From: r...@tardy.usa.hp.com (Rick Jones)
Date: Fri,  7 Aug 2015 11:10:37 -0700 (PDT)

 From: Rick Jones rick.jon...@hp.com
 
 Add an explicit neighbour table overflow message (ratelimited) and
 statistic to make diagnosing neighbour table overflows tractable in
 the wild.
 
 Diagnosing a neighbour table overflow can be quite difficult in the wild
 because no explicit message is logged to dmesg.  Callers of the neighbour
 code seem to use net_dbg_ratelimited when the neighbour call fails, which
 means the base message is not emitted and the "callbacks suppressed"
 messages from the ratelimiting can end up juxtaposed with unrelated messages.
 Further, a forced garbage collection will increment a stat on each call
 whether it was successful in freeing up a table entry or not, so that
 statistic is only a hint.  So, add a net_info_ratelimited message and an
 explicit statistic to the neighbour code.
 
 Signed-off-by: Rick Jones rick.jon...@hp.com

Looks fine, applied, thanks Rick.
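The patch pairs a rate-limited message with an explicit counter, so the counter stays exact even when messages are suppressed. The minimal user-space sketch below shows that idea. It assumes a hypothetical interval/burst limiter loosely modelled on the kernel's ratelimit state, and the caller passes in the current time so the behaviour is deterministic. The message text is illustrative only.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical limiter: at most 'burst' messages per 'interval' ticks. */
struct ratelimit {
	long interval;
	int burst;
	long begin;
	int printed;
	int missed;
};

struct neigh_stats {
	unsigned long table_fulls;	/* exact count, never rate limited */
};

static bool ratelimit_ok(struct ratelimit *rs, long now)
{
	if (now - rs->begin >= rs->interval) {	/* start a new window */
		rs->begin = now;
		rs->printed = 0;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	rs->missed++;
	return false;
}

/* Called when a new entry cannot be allocated because the table is full:
 * always bump the statistic, but only sometimes log. */
static void report_table_overflow(struct neigh_stats *st,
				  struct ratelimit *rs, long now)
{
	st->table_fulls++;
	if (ratelimit_ok(rs, now))
		fprintf(stderr, "neighbour table overflow\n");
}
```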


Re: [RFC PATCH 1/4] vhost: Introduce a universal thread to serve all users

2015-08-10 Thread Bandan Das
Michael S. Tsirkin m...@redhat.com writes:

 On Mon, Jul 13, 2015 at 12:07:32AM -0400, Bandan Das wrote:
 vhost threads are per-device, but in most cases a single thread
 is enough. This change creates a single thread that is used to
 serve all guests.
 
 However, this complicates cgroups associations. The current policy
 is to attach the per-device thread to all cgroups of the parent process
 that the device is associated with. This is no longer possible if we
 have a single thread. So, we end up moving the thread around to the
 cgroups of whichever device needs servicing. This is a very
 inefficient protocol but seems to be the only way to integrate
 cgroups support.
 
 Signed-off-by: Razya Ladelsky ra...@il.ibm.com
 Signed-off-by: Bandan Das b...@redhat.com

 BTW, how does this interact with virtio net MQ?
 It would seem that MQ gains from more parallelism and
 CPU locality.

Hm.. Good point. As of this version, this design will always have
one worker thread servicing a guest. Now suppose we have 10 virtio
queues for a guest; surely we could benefit from spawning off another
worker, just like we do in the case of a new guest/device with
the devs_per_worker parameter.
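The commit message's cost argument is that one shared thread must be moved into the cgroups of whichever device it is about to service. That cost can be illustrated by counting those moves. The minimal sketch below assumes a hypothetical trace of work items tagged with a device id, and it models only the counting, not the kernel's cgroup attach calls.

```c
#include <stddef.h>

/* Count how many times one shared worker would have to attach itself to a
 * different device's cgroups while draining a queue of work items.
 * work_dev[i] is the (hypothetical, non-negative) id of the device that
 * queued item i. */
static unsigned int cgroup_switches(const int *work_dev, size_t n)
{
	unsigned int switches = 0;
	int attached = -1;	/* not attached to any device yet */

	for (size_t i = 0; i < n; i++) {
		if (work_dev[i] != attached) {
			/* the real worker would move into this device's cgroups here */
			switches++;
			attached = work_dev[i];
		}
	}
	return switches;
}
```

Interleaved work from two guests forces a move on almost every item. In contrast, a per-device thread is attached to its owner's cgroups once, when it is created.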

 ---
  drivers/vhost/scsi.c  |  15 +++--
  drivers/vhost/vhost.c | 150 
 --
  drivers/vhost/vhost.h |  19 +--
  3 files changed, 97 insertions(+), 87 deletions(-)
 
 diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
 index ea32b38..6c42936 100644
 --- a/drivers/vhost/scsi.c
 +++ b/drivers/vhost/scsi.c
 @@ -535,7 +535,7 @@ static void vhost_scsi_complete_cmd(struct vhost_scsi_cmd *cmd)
  
  llist_add(&cmd->tvc_completion_list, &vs->vs_completion_list);
  
 -vhost_work_queue(&vs->dev, &vs->vs_completion_work);
 +vhost_work_queue(vs->dev.worker, &vs->vs_completion_work);
  }
  
  static int vhost_scsi_queue_data_in(struct se_cmd *se_cmd)
 @@ -1282,7 +1282,7 @@ vhost_scsi_send_evt(struct vhost_scsi *vs,
  }
  
  llist_add(&evt->list, &vs->vs_event_list);
 -vhost_work_queue(&vs->dev, &vs->vs_event_work);
 +vhost_work_queue(vs->dev.worker, &vs->vs_event_work);
  }
  
  static void vhost_scsi_evt_handle_kick(struct vhost_work *work)
 @@ -1335,8 +1335,8 @@ static void vhost_scsi_flush(struct vhost_scsi *vs)
  /* Flush both the vhost poll and vhost work */
  for (i = 0; i < VHOST_SCSI_MAX_VQ; i++)
  vhost_scsi_flush_vq(vs, i);
 -vhost_work_flush(&vs->dev, &vs->vs_completion_work);
 -vhost_work_flush(&vs->dev, &vs->vs_event_work);
 +vhost_work_flush(vs->dev.worker, &vs->vs_completion_work);
 +vhost_work_flush(vs->dev.worker, &vs->vs_event_work);
  
  /* Wait for all reqs issued before the flush to be finished */
  for (i = 0; i < VHOST_SCSI_MAX_VQ; i++)
 @@ -1584,8 +1584,11 @@ static int vhost_scsi_open(struct inode *inode, struct file *f)
  if (!vqs)
  goto err_vqs;
  
 -vhost_work_init(&vs->vs_completion_work, vhost_scsi_complete_cmd_work);
 -vhost_work_init(&vs->vs_event_work, vhost_scsi_evt_work);
 +vhost_work_init(&vs->dev, &vs->vs_completion_work,
 +vhost_scsi_complete_cmd_work);
 +
 +vhost_work_init(&vs->dev, &vs->vs_event_work,
 +vhost_scsi_evt_work);
  
  vs->vs_events_nr = 0;
  vs->vs_events_missed = false;
 diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
 index 2ee2826..951c96b 100644
 --- a/drivers/vhost/vhost.c
 +++ b/drivers/vhost/vhost.c
 @@ -11,6 +11,8 @@
   * Generic code for virtio server in host kernel.
   */
  
 +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 +
  #include <linux/eventfd.h>
  #include <linux/vhost.h>
  #include <linux/uio.h>
 @@ -28,6 +30,9 @@
  
  #include "vhost.h"
  
 +/* Just one worker thread to service all devices */
 +static struct vhost_worker *worker;
 +
  enum {
  VHOST_MEMORY_MAX_NREGIONS = 64,
  VHOST_MEMORY_F_LOG = 0x1,
 @@ -58,13 +63,15 @@ static int vhost_poll_wakeup(wait_queue_t *wait, unsigned mode, int sync,
  return 0;
  }
  
 -void vhost_work_init(struct vhost_work *work, vhost_work_fn_t fn)
 +void vhost_work_init(struct vhost_dev *dev,
 + struct vhost_work *work, vhost_work_fn_t fn)
  {
  INIT_LIST_HEAD(&work->node);
  work->fn = fn;
  init_waitqueue_head(&work->done);
  work->flushing = 0;
  work->queue_seq = work->done_seq = 0;
 +work->dev = dev;
  }
  EXPORT_SYMBOL_GPL(vhost_work_init);
  
 @@ -78,7 +85,7 @@ void vhost_poll_init(struct vhost_poll *poll, vhost_work_fn_t fn,
  poll->dev = dev;
  poll->wqh = NULL;
  
 -vhost_work_init(&poll->work, fn);
 +vhost_work_init(dev, &poll->work, fn);
  }
  EXPORT_SYMBOL_GPL(vhost_poll_init);
  
 @@ -116,30 +123,30 @@ void vhost_poll_stop(struct vhost_poll *poll)
  }
  EXPORT_SYMBOL_GPL(vhost_poll_stop);
  
 -static bool vhost_work_seq_done(struct vhost_dev *dev, struct vhost_work *work,
 -unsigned seq)
 +static bool vhost_work_seq_done(struct vhost_worker *worker,
 +struct 

Re: [RFC PATCH 1/4] vhost: Introduce a universal thread to serve all users

2015-08-10 Thread Bandan Das
Bandan Das b...@redhat.com writes:

 Michael S. Tsirkin m...@redhat.com writes:

 On Mon, Jul 13, 2015 at 12:07:32AM -0400, Bandan Das wrote:
 vhost threads are per-device, but in most cases a single thread
 is enough. This change creates a single thread that is used to
 serve all guests.
 
 However, this complicates cgroups associations. The current policy
 is to attach the per-device thread to all cgroups of the parent process
 that the device is associated with. This is no longer possible if we
 have a single thread. So, we end up moving the thread around to the
 cgroups of whichever device needs servicing. This is a very
 inefficient protocol but seems to be the only way to integrate
 cgroups support.
 
 Signed-off-by: Razya Ladelsky ra...@il.ibm.com
 Signed-off-by: Bandan Das b...@redhat.com

 BTW, how does this interact with virtio net MQ?
 It would seem that MQ gains from more parallelism and
 CPU locality.

 Hm.. Good point. As of this version, this design will always have
 one worker thread servicing a guest. Now suppose we have 10 virtio
 queues for a guest; surely we could benefit from spawning off another
 worker, just like we do in the case of a new guest/device with
 the devs_per_worker parameter.

So, I did a quick smoke test with virtio-net and the Elvis patches.
It seems virtio-net MQ already spawns a new worker thread for every
queue, so the above setup already works! :) I will run some tests and
post back the results.

 ---
  drivers/vhost/scsi.c  |  15 +++--
  drivers/vhost/vhost.c | 150 
 --
  drivers/vhost/vhost.h |  19 +--
  3 files changed, 97 insertions(+), 87 deletions(-)
 
 diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
 index ea32b38..6c42936 100644
 --- a/drivers/vhost/scsi.c
 +++ b/drivers/vhost/scsi.c
 @@ -535,7 +535,7 @@ static void vhost_scsi_complete_cmd(struct vhost_scsi_cmd *cmd)
  
 llist_add(&cmd->tvc_completion_list, &vs->vs_completion_list);
  
 -   vhost_work_queue(&vs->dev, &vs->vs_completion_work);
 +   vhost_work_queue(vs->dev.worker, &vs->vs_completion_work);
  }
  
  static int vhost_scsi_queue_data_in(struct se_cmd *se_cmd)
 @@ -1282,7 +1282,7 @@ vhost_scsi_send_evt(struct vhost_scsi *vs,
 }
  
 llist_add(&evt->list, &vs->vs_event_list);
 -   vhost_work_queue(&vs->dev, &vs->vs_event_work);
 +   vhost_work_queue(vs->dev.worker, &vs->vs_event_work);
  }
  
  static void vhost_scsi_evt_handle_kick(struct vhost_work *work)
 @@ -1335,8 +1335,8 @@ static void vhost_scsi_flush(struct vhost_scsi *vs)
 /* Flush both the vhost poll and vhost work */
 for (i = 0; i < VHOST_SCSI_MAX_VQ; i++)
 vhost_scsi_flush_vq(vs, i);
 -   vhost_work_flush(&vs->dev, &vs->vs_completion_work);
 -   vhost_work_flush(&vs->dev, &vs->vs_event_work);
 +   vhost_work_flush(vs->dev.worker, &vs->vs_completion_work);
 +   vhost_work_flush(vs->dev.worker, &vs->vs_event_work);
  
 /* Wait for all reqs issued before the flush to be finished */
 for (i = 0; i < VHOST_SCSI_MAX_VQ; i++)
 @@ -1584,8 +1584,11 @@ static int vhost_scsi_open(struct inode *inode, struct file *f)
 if (!vqs)
 goto err_vqs;
  
 -   vhost_work_init(&vs->vs_completion_work, vhost_scsi_complete_cmd_work);
 -   vhost_work_init(&vs->vs_event_work, vhost_scsi_evt_work);
 +   vhost_work_init(&vs->dev, &vs->vs_completion_work,
 +   vhost_scsi_complete_cmd_work);
 +
 +   vhost_work_init(&vs->dev, &vs->vs_event_work,
 +   vhost_scsi_evt_work);
  
 vs->vs_events_nr = 0;
 vs->vs_events_missed = false;
 diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
 index 2ee2826..951c96b 100644
 --- a/drivers/vhost/vhost.c
 +++ b/drivers/vhost/vhost.c
 @@ -11,6 +11,8 @@
   * Generic code for virtio server in host kernel.
   */
  
 +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 +
  #include <linux/eventfd.h>
  #include <linux/vhost.h>
  #include <linux/uio.h>
 @@ -28,6 +30,9 @@
  
  #include "vhost.h"
  
 +/* Just one worker thread to service all devices */
 +static struct vhost_worker *worker;
 +
  enum {
 VHOST_MEMORY_MAX_NREGIONS = 64,
 VHOST_MEMORY_F_LOG = 0x1,
 @@ -58,13 +63,15 @@ static int vhost_poll_wakeup(wait_queue_t *wait, unsigned mode, int sync,
 return 0;
  }
  
 -void vhost_work_init(struct vhost_work *work, vhost_work_fn_t fn)
 +void vhost_work_init(struct vhost_dev *dev,
 +struct vhost_work *work, vhost_work_fn_t fn)
  {
 INIT_LIST_HEAD(&work->node);
 work->fn = fn;
 init_waitqueue_head(&work->done);
 work->flushing = 0;
 work->queue_seq = work->done_seq = 0;
 +   work->dev = dev;
  }
  EXPORT_SYMBOL_GPL(vhost_work_init);
  
 @@ -78,7 +85,7 @@ void vhost_poll_init(struct vhost_poll *poll, vhost_work_fn_t fn,
 poll->dev = dev;
 poll->wqh = NULL;
  
 -   vhost_work_init(&poll->work, fn);
 +   vhost_work_init(dev, &poll->work, fn);
  }
  EXPORT_SYMBOL_GPL(vhost_poll_init);
  
 @@ -116,30 +123,30 @@ void vhost_poll_stop(struct vhost_poll *poll)
  }
  EXPORT_SYMBOL_GPL(vhost_poll_stop);

Re: [PATCH] eventfd: implementation of EFD_MASK flag

2015-08-10 Thread Martin Sustrik

On 2015-08-10 10:57, Damian Hobson-Garcia wrote:

Hi Martin,

Thanks for your comments.

On 2015-08-10 3:39 PM, Martin Sustrik wrote:

On 2015-08-10 08:23, Damian Hobson-Garcia wrote:

Replying to my own post, but I had the following comments/questions.
Martin, if you have any response to my comments, I would be very happy
to hear them.

On 2015-08-10 2:51 PM, Damian Hobson-Garcia wrote:

From: Martin Sustrik sust...@250bpm.com


[snip]


write(2):

User is allowed to write only buffers containing the following
structure:

struct efd_mask {
  __u32 events;
  __u64 data;
};

The value of 'events' should be any combination of event flags as
defined by the poll(2) function (POLLIN, POLLOUT, POLLERR, POLLHUP, etc.).
Specified events will be signaled when polling (select, poll, epoll) on
the eventfd is done later on.
'data' is opaque data that is not interpreted by the eventfd object.

I'm not fully clear on the purpose that the 'data' member serves.  Does
this opaque handle need to be tied together with this event
synchronization construct?


It's a convenience thing. Imagine you are implementing your own file
descriptor type in user space. You create an EFD_MASK socket and a
structure that will hold any state that you need for the socket (tx/rx
buffers and such).

Now you have two things to pass around. If you want to pass the fd to a
function, it must have two parameters (fd and pointer to the structure).


To fix it you can put the fd into the structure. That way there's only
one thing to pass around (the structure).

The problem with that approach is when you have generic code that deals
with file descriptors. For example, consider a simple poller which accepts
a list of (fd, callback) pairs and invokes the callback when one of the
fds signals POLLIN. You can't send a pointer to a structure to such a
function. All you can send is the fd, but then, when the callback is
invoked, the fd is all you have. You have no idea where your state is.

The 'data' member allows you to put the pointer to the state into the
socket itself. Thus, if you have an fd, you can always find out where the
associated data is by reading the mask structure from the fd.



Ok, I see what you're saying. I guess that keeping track of the mapping
between the fd and the struct in user space could be non-trivial if
there are a large number of active fds that are polling very frequently.

Wouldn't it be sufficient to just use epoll() in this case though?  It
already seems to support this kind of thing.


My use case was like this:

int s = mysocket();
...
// myrecv() can get the pointer to the structure
// without user having to pass it as an argument
myrecv(s, buf, sizeof(buf));

However, the same behaviour can be accomplished by simply keeping
a static array of pointers in user space.

So let's cut this part out of the patch.
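Martin's alternative is to keep a static array of pointers in user space, indexed by fd. A sketch of that approach follows. The function names and the MAX_FDS bound are hypothetical. A real implementation would size the table from the process's fd limit or grow it on demand.

```c
#include <stddef.h>

#define MAX_FDS 1024	/* hypothetical bound */

/* Per-fd user-space state, indexed by file descriptor number. */
static void *fd_state[MAX_FDS];

static int fd_state_set(int fd, void *state)
{
	if (fd < 0 || fd >= MAX_FDS)
		return -1;
	fd_state[fd] = state;
	return 0;
}

static void *fd_state_get(int fd)
{
	if (fd < 0 || fd >= MAX_FDS)
		return NULL;
	return fd_state[fd];
}
```

A hypothetical myrecv(s, buf, len) would then begin by looking up its state with fd_state_get(s). This removes the need to store the pointer inside the eventfd itself.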





[snip]

@@ -55,6 +69,9 @@ __u64 eventfd_signal(struct eventfd_ctx *ctx, __u64 n)

 {
+/* This function should never be used with eventfd in the mask mode. */
+BUG_ON(ctx->flags & EFD_MASK);
+

...

@@ -158,6 +180,9 @@ int eventfd_ctx_remove_wait_queue(struct eventfd_ctx *ctx, wait_queue_t *wait,
 {
+/* This function should never be used with eventfd in the mask mode. */
+BUG_ON(ctx->flags & EFD_MASK);
+

...
@@ -188,6 +213,9 @@ ssize_t eventfd_ctx_read(struct eventfd_ctx *ctx,
int no_wait, __u64 *cnt)
+/* This function should never be used with eventfd in the mask mode. */
+BUG_ON(ctx->flags & EFD_MASK);
+


If eventfd_ctx_fileget() returns EINVAL when EFD_MASK is set, I don't
think that there will be a way to call these functions in the mask mode,
so it should be possible to get rid of the BUG_ON checks.


Sure. Feel free to do so.



[snip]

@@ -230,6 +258,19 @@ static ssize_t eventfd_read(struct file *file, char __user *buf, size_t count,
 ssize_t res;
 __u64 cnt;

+if (ctx->flags & EFD_MASK) {
+struct efd_mask mask;
+
+if (count < sizeof(mask))
+return -EINVAL;
+spin_lock_irq(&ctx->wqh.lock);
+mask = ctx->mask;
+spin_unlock_irq(&ctx->wqh.lock);
+if (copy_to_user(buf, &mask, sizeof(mask)))
+return -EFAULT;
+return sizeof(mask);
+}
+


For the other eventfd modes, reading the value will update the internal
state of the eventfd (either clearing or decrementing the counter).
Should something similar be done here? I'm thinking of a case where a
process is polling on this fd in a loop. Clearing the efd_mask data on
read should provide an easy way for the polling process to know if it is
seeing new poll events.


No. In this case reading the value has no effect on the state of the fd.

How it should work is rather:

// fd is in POLLIN state
poll(fd);
// function exits with POLLIN but fd remains in POLLIN state
my_recv(fd, buf, size);
// my_recv function has found out that there's no more data to recv and
// switched off the POLLIN flag
poll(fd); // we block here waiting for more data to arrive from the network
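A user-space model of the behaviour described above: read() copies the mask out and leaves it untouched, and readiness changes only when the owner updates the mask. struct mask_model, struct efd_model and the model_* functions are hypothetical; the layout is only loosely based on the patch's struct efd_mask:

```c
#include <errno.h>
#include <poll.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical layout, loosely modelled on the patch's struct efd_mask. */
struct mask_model {
    uint32_t events;  /* POLLIN, POLLOUT, ... as set by the owner */
    uint64_t data;    /* opaque user value */
};

struct efd_model {
    struct mask_model mask;
};

/* read(): copy the mask out without touching the fd's state. */
static ssize_t model_read(const struct efd_model *efd, void *buf, size_t count)
{
    if (count < sizeof(efd->mask))
        return -EINVAL;
    memcpy(buf, &efd->mask, sizeof(efd->mask));
    return (ssize_t)sizeof(efd->mask);
}

/* Only the owner (e.g. my_recv() finding no more data) changes the mask. */
static void model_set_events(struct efd_model *efd, uint32_t events)
{
    efd->mask.events = events;
}

/* poll(): readiness simply reflects the current mask (level-triggered). */
static int model_ready(const struct efd_model *efd, short wanted)
{
    return (efd->mask.events & (uint32_t)wanted) != 0;
}
```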





Re: [PATCH net-next] net: dsa: mv88e6352: Use mnemonics for EEPROM registers and bits

2015-08-10 Thread David Miller
From: Andrew Lunn and...@lunn.ch
Date: Sat,  8 Aug 2015 17:04:50 +0200

 Add register definitions #defines for accessing the EEPROM.
 
 Signed-off-by: Andrew Lunn and...@lunn.ch

Applied, thanks.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[patch] hamradio/kiss: missing error code in mkiss_open()

2015-08-10 Thread Dan Carpenter
If register_netdev() fails we return success but we should return an
error code instead.

Reported-by: RUC_Soft_Sec zy900...@163.com
Signed-off-by: Dan Carpenter dan.carpen...@oracle.com

diff --git a/drivers/net/hamradio/mkiss.c b/drivers/net/hamradio/mkiss.c
index 2ffbf13..dcb6bb7 100644
--- a/drivers/net/hamradio/mkiss.c
+++ b/drivers/net/hamradio/mkiss.c
@@ -732,7 +732,8 @@ static int mkiss_open(struct tty_struct *tty)
goto out_free_netdev;
}
 
-   if (register_netdev(dev))
+   err = register_netdev(dev);
+   if (err)
goto out_free_buffers;
 
/* after register_netdev() - because else printk smashes the kernel */
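The fixed pattern, sketched in user space: the status of the later step is stored in err before jumping to the unwind label, so the failure reaches the caller instead of the value left over from an earlier successful step. model_open(), model_register() and fail_register are hypothetical, and -ENOMEM is an arbitrary example errno:

```c
#include <errno.h>
#include <stdlib.h>

/* Test knob: make the hypothetical registration step fail. */
static int fail_register;

/* Hypothetical stand-in for register_netdev(): 0 or a negative errno. */
static int model_register(void)
{
    return fail_register ? -ENOMEM : 0;
}

static int model_open(void **out)
{
    int err;
    void *buffers = malloc(64);

    if (!buffers)
        return -ENOMEM;

    err = model_register();
    if (err)                 /* keep the code instead of discarding it */
        goto out_free_buffers;

    *out = buffers;
    return 0;

out_free_buffers:
    free(buffers);
    return err;              /* the caller now sees the failure */
}
```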


Re: kernel warning in tcp_fragment

2015-08-10 Thread Jovi Zhangwei
Hi Neal,

Many thanks for your reply; we will arrange testing against that patch.

On Mon, Aug 10, 2015 at 11:35 AM, Neal Cardwell ncardw...@google.com wrote:
 On Mon, Aug 10, 2015 at 2:10 PM, Jovi Zhangwei j...@cloudflare.com wrote:

 Ping?

 We see a lot of these warnings in our production systems. We would
 greatly appreciate it if someone could give us a fix for them. :)

 What is your net.ipv4.tcp_mtu_probing setting? If 1, have you tried
 setting it to 0? Previous reports (
 https://patchwork.ozlabs.org/patch/480882/ ) have shown that this gets
 rid of at least one source of the warning. So that would provide a
 useful data point.

 Separately, you could also try the attached patch. This is against
 3.14.39. It tries to attack a different possible source of this
 warning. Please let us know if that patch helps.

 Thanks!

 neal


Re: [RFC PATCH net-next] tcp: reduce cpu usage under tcp memory pressure when SO_SNDBUF is set

2015-08-10 Thread Eric Dumazet
On Mon, 2015-08-10 at 13:29 -0400, Jason Baron wrote:
 thanks. better?
 
 --- a/include/net/sock.h
 +++ b/include/net/sock.h
 @@ -798,8 +798,10 @@ static inline int sk_stream_min_wspace(const struct sock *sk)
 
  static inline int sk_stream_wspace(const struct sock *sk)
  {
 -   if (sk->sk_effective_sndbuf)
 -   return sk->sk_effective_sndbuf - sk->sk_wmem_queued;
 +   int effective_sndbuf = sk->sk_effective_sndbuf;
 +
 +   if (effective_sndbuf)
 +   return effective_sndbuf - sk->sk_wmem_queued;
 
  return sk->sk_sndbuf - sk->sk_wmem_queued;
   }
 
 

You need to use this instead:

int effective_sndbuf = READ_ONCE(sk->sk_effective_sndbuf);
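The usual reason for READ_ONCE() here: with a plain access the compiler may load sk->sk_effective_sndbuf more than once, so a concurrent writer could clear it between the check and the subtraction. A rough user-space analogue, using a C11 relaxed atomic load as a stand-in for READ_ONCE() (the two are not identical); struct sock_model and stream_wspace() are hypothetical:

```c
#include <stdatomic.h>

/* Hypothetical model of the socket fields involved. */
struct sock_model {
    _Atomic int effective_sndbuf;  /* may be updated concurrently */
    int sndbuf;
    int wmem_queued;
};

static int stream_wspace(struct sock_model *sk)
{
    /* Load the shared field exactly once; the test and the subtraction
     * below then operate on the same snapshot. */
    int effective_sndbuf = atomic_load_explicit(&sk->effective_sndbuf,
                                                memory_order_relaxed);

    if (effective_sndbuf)
        return effective_sndbuf - sk->wmem_queued;
    return sk->sndbuf - sk->wmem_queued;
}
```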



Re: [PATCH net-next] dsa: Support multiple MDIO busses

2015-08-10 Thread David Miller
From: Andrew Lunn and...@lunn.ch
Date: Sat,  8 Aug 2015 17:09:14 +0200

 When using a cluster of switches, some topologies will have an MDIO
 bus per switch, not one for the whole cluster. Allow this to be
 represented in the device tree, by adding an optional mii-bus property
 at the switch level. The old platform_device method of instantiation
 supports this already, so only the device tree binding needs extending
 with an additional optional phandle.
 
 Signed-off-by: Andrew Lunn and...@lunn.ch

Also applied, thanks Andrew.


Re: [patch 2/2] cxgb4: cleanup some indenting

2015-08-10 Thread David Miller
From: Dan Carpenter dan.carpen...@oracle.com
Date: Sat, 8 Aug 2015 22:15:59 +0300

 Add or remove some tabs so that statements line up correctly.
 
 Signed-off-by: Dan Carpenter dan.carpen...@oracle.com

Applied to 'net-next', thanks Dan.


Re: [patch 1/2 -mainline] cxgb4: missing curly braces in t4_setup_debugfs()

2015-08-10 Thread David Miller
From: Dan Carpenter dan.carpen...@oracle.com
Date: Sat, 8 Aug 2015 22:15:25 +0300

 There were missing curly braces so it means we call add_debugfs_mem()
 unintentionally.
 
 Fixes: 3ccc6cf74d8c ('cxgb4: Adds support for T6 adapter')
 Signed-off-by: Dan Carpenter dan.carpen...@oracle.com

Applied to 'net'.
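The class of bug fixed here, sketched with hypothetical functions (this is not the cxgb4 code): without braces only the first statement after `if` is conditional, whatever the indentation suggests. GCC's -Wmisleading-indentation can flag this pattern:

```c
/* Counters standing in for the debugfs entries that get created. */
static int base_entries, extra_entries;

static void add_base_entry(void)  { base_entries++; }
static void add_extra_entry(void) { extra_entries++; }

/* Buggy: without braces only the first call is conditional. */
static void setup_buggy(int is_new_chip)
{
    if (is_new_chip)
        add_base_entry();
        add_extra_entry();   /* always runs, despite the indentation */
}

/* Fixed: braces make both calls conditional. */
static void setup_fixed(int is_new_chip)
{
    if (is_new_chip) {
        add_base_entry();
        add_extra_entry();
    }
}
```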
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] isdn:remove reverse_bits(), use revbit8()

2015-08-10 Thread David Miller
From: yalin wang yalin.wang2...@gmail.com
Date: Mon, 10 Aug 2015 17:15:57 +0800

 This changes the isdn driver to remove the reverse_bits() function
 and use the generic revbit8() function instead.
 
 Signed-off-by: yalin wang yalin.wang2...@gmail.com
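For reference, a user-space sketch of what a generic 8-bit bit reversal computes. The kernel helper lives in <linux/bitrev.h>, where it is spelled bitrev8(), and may be implemented with a lookup table; the shift-and-mask version below is only an illustration, and bitrev8_sketch() is a hypothetical name:

```c
#include <stdint.h>

/* Reverse the bit order of one byte (bit 0 <-> bit 7, bit 1 <-> bit 6, ...)
 * by swapping progressively smaller groups of bits. */
static uint8_t bitrev8_sketch(uint8_t b)
{
    b = (uint8_t)(((b & 0xF0) >> 4) | ((b & 0x0F) << 4));  /* nibbles */
    b = (uint8_t)(((b & 0xCC) >> 2) | ((b & 0x33) << 2));  /* bit pairs */
    b = (uint8_t)(((b & 0xAA) >> 1) | ((b & 0x55) << 1));  /* single bits */
    return b;
}
```

For example, 0x01 maps to 0x80 and 0x2D (00101101) maps to 0xB4 (10110100).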

Applied, however please format your Subject lines better in the
future.

There should be a space after the subsystem specifier and the ':'
character.  So "isdn: ".

Then you should capitalize the description in the Subject line
because it is very much like an English sentence.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


ipv6_mc_check_mld - kernel BUG at net/core/skbuff.c:1128

2015-08-10 Thread Brenden Blanco
Hi folks,

Here is a crash that I am able to easily reproduce. The setup is:

2 VMs, running in libvirt (qemu-kvm)
CPU mode is host-passthrough, virtio drivers used wherever available
Disable ipv6 (just to limit the amount of multicast noise)
Set up a multicast vxlan tunnel between the two VMs
Attach the vxlan device to a linux bridge
Attach a veth pair to the linux bridge
Enable ipv6 on a single veth
At this point, either one of the VMs may crash with the attached trace

Here is the test script. Not all lines are necessary; some are a byproduct of
removing various functions from the trace to rule them out as suspects.

---
rmmod ebtable_nat
rmmod ebtables
sysctl net.ipv6.conf.all.disable_ipv6=1
ip l add vxlan0 type vxlan id 1 group 239.1.1.1 dev eth1
ip l add br0 type bridge
ip l set vxlan0 master br0
ip l set br0 up
ip l set vxlan0 up
ip l add v1a type veth peer name v1b
ip l set v1b master br0
ip l set v1b up
ip l set v1a up
sysctl net.ipv6.conf.v1a.disable_ipv6=0


Doing some code reading with Alexei, we found a suspect commit, which
introduces an skb_get and skb_may_pull of the same skb, which leads to the BUG
when skb->len == len.

9afd85c9e4552 net: Export IGMP/MLD message validation code

static struct sk_buff *skb_checksum_maybe_trim(struct sk_buff *skb,
  unsigned int transport_len)
...
   if (skb->len < len) {
   kfree_skb(skb);
   return NULL;
   } else if (skb->len == len) {
   return skb;
   }
...
static int __ipv6_mc_check_mld(struct sk_buff *skb,
  struct sk_buff **skb_trimmed)
...
   skb_get(skb);
   skb_chk = skb_checksum_trimmed(skb, transport_len,
  ipv6_mc_validate_checksum);



Would someone more familiar with the code be able to suggest a viable solution
or patch to try?

Cheers,
Brenden


Apologies for some of the mangled text:

[  100.879047] [ cut here ]
[  100.879105] kernel BUG at net/core/skbuff.c:1128!
[  100.879144] invalid opcode:  [#1]
[  100.879250] Modules linked in: veth bridge stp llc vxlan
ip6_udp_tunnel udp_tunnel ip6table_filter ip6_tables iptable_filter
ip_tables x_tables netconsole configfs btrfs ib_iser rdma_cm iw_cm
ib_cm ib_sa ib_mad ib_core ib_addr openvswitch iscsi_tcp libiscsi_tcp
libiscsi xor scsi_transport_iscsi libcrc32c raid6_pq dm_crypt iosf_mbi
kvm_intel kvm ppdev dm_multipath crct10dif_pclmul scsi_dh crc32_pclmul
ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper
ablk_helper cryptd psmouse input_leds serio_raw floppy 8250_fintek
i2c_piix4 parport_pc pata_acpi mac_hid lp parport virtio_scsi
[last unloaded: ebtables]

[  100.881340] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.2.0-rc4+ #3
[  100.881375] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 1.8.2-20150617_082717-anatol 04/01/2014
[  100.881416] task: 88013abca940 ti: 88013abdc000 task.ti:
88013abdc000
[  100.881457] RIP: 0010:[8168d3d7] [8168d3d7]
pskb_expand_head+0x227/0x260
[  100.881532] RSP: 0018:88013fd03ab8  EFLAGS: 00010202
[  100.881567] RAX: 0002 RBX: 8800bb601500 RCX: 0020
[  100.881604] RDX: 0148 RSI:  RDI: 8800bb601500
[  100.881642] RBP: 88013fd03af8 R08:  R09: 001c
[  100.881677] R10:  R11: 0001 R12: 
[  100.881714] R13: 8800bb601500 R14: 8800bb358840 R15: 
[  100.881749] FS:  () GS:88013fd0()
knlGS:
[  100.881790] CS:  0010 DS:  ES:  CR0: 80050033
[  100.881828] CR2: 7f91049f2162 CR3: 000136a94000 CR4: 001406e0
[  100.881864] DR0:  DR1:  DR2: 
[  100.881902] DR3:  DR6: fffe0ff0 DR7: 0400
[  100.881936] Stack:
[  100.881977]  88013fd03b38 816cd766 88013fd03b67
8800bb601500
[  100.882149]  88013fd03be0 0008 8800bb358840

[  100.882316]  88013fd03b48 8168e68f 88013fd03b48
00088168f4d0
[  100.882486] Call Trace:
[  100.882524]  IRQ

Re: [PATCH net-next v5 0/4] GRE: Use flow based tunneling for OVS GRE vport.

2015-08-10 Thread David Miller
From: Pravin B Shelar pshe...@nicira.com
Date: Fri,  7 Aug 2015 23:49:50 -0700

 Following patches make use of new Using GRE tunnel meta data
 collection feature. This allows us to directly use netdev
 based GRE tunnel implementation. While doing so I have
 removed GRE demux API which were targeted for OVS. Most
 of GRE protocol code is now consolidated in ip_gre module.
 
 v5-v4:
 Fixed Kconfig dependency for vport-gre module.
 
 v3-v4:
 Added interface to ip-gre device to enable meta data collection.
 While doing this I split second patch into two patches.
 
 v2-v3:
 Add API to create GRE flow based device.

Series applied, thanks.


Re: [PATCH net 0/2] bnx2x: small fixes

2015-08-10 Thread David Miller
From: Yuval Mintz yuval.mi...@qlogic.com
Date: Mon, 10 Aug 2015 12:49:34 +0300

 This adds 2 small fixes, one to error flows during memory release
 and the other to flash writes via ethtool API.

Series applied, thanks.


[PATCH net] inet: fix possible request socket leak

2015-08-10 Thread Eric Dumazet
From: Eric Dumazet eduma...@google.com

In commit b357a364c57c9 (inet: fix possible panic in
reqsk_queue_unlink()), I missed the fact that tcp_check_req()
can return the listener socket in one case, and that we must
release the request socket refcount or we leak it.

Tested:

 Following packetdrill test template shows the issue

0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0

+0 < S 0:0(0) win 2920 <mss 1460,sackOK,nop,nop>
+0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK>
+.002 < . 1:1(0) ack 21 win 2920
+0 > R 21:21(0)

Fixes: b357a364c57c9 (inet: fix possible panic in reqsk_queue_unlink())
Signed-off-by: Eric Dumazet eduma...@google.com
---
 net/ipv4/tcp_ipv4.c |2 +-
 net/ipv6/tcp_ipv6.c |2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index d7d4c2b79cf2..0ea2e1c5d395 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1348,7 +1348,7 @@ static struct sock *tcp_v4_hnd_req(struct sock *sk, struct sk_buff *skb)
req = inet_csk_search_req(sk, th->source, iph->saddr, iph->daddr);
if (req) {
nsk = tcp_check_req(sk, skb, req, false);
-   if (!nsk)
+   if (!nsk || nsk == sk)
reqsk_put(req);
return nsk;
}
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 6748c4277aff..7a6cea5e4274 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -943,7 +943,7 @@ static struct sock *tcp_v6_hnd_req(struct sock *sk, struct sk_buff *skb)
   &ipv6_hdr(skb)->daddr, tcp_v6_iif(skb));
if (req) {
nsk = tcp_check_req(sk, skb, req, false);
-   if (!nsk)
+   if (!nsk || nsk == sk)
reqsk_put(req);
return nsk;
}
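A user-space model of the refcount rule this fix enforces: the request lookup hands the caller one reference, and it must be dropped both when tcp_check_req() returns NULL and when it returns the listener itself. Everything below is hypothetical (struct req_model, check_req(), hnd_req()); in particular, the model simply leaves the reference alone in the child-socket case, on the assumption that it is released further along that path:

```c
#include <stddef.h>

/* Hypothetical refcounted request object. */
struct req_model {
    int refcnt;
};

static void req_put(struct req_model *req)
{
    req->refcnt--;
}

/* Hypothetical stand-in for tcp_check_req(): depending on the packet it
 * yields NULL, the listener itself, or a new child socket. */
enum outcome { DROP, LISTENER, CHILD };

static void *check_req(void *listener, void *child, enum outcome o)
{
    switch (o) {
    case DROP:     return NULL;
    case LISTENER: return listener;
    default:       return child;
    }
}

/* The lookup handed us one reference on req. Unless a child socket
 * took the request over, that reference must be dropped here. */
static void *hnd_req(void *listener, void *child, struct req_model *req,
                     enum outcome o)
{
    void *nsk = check_req(listener, child, o);

    if (!nsk || nsk == listener)
        req_put(req);
    return nsk;
}
```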




[PATCH] rt2x00: adjust EEPROM_SIZE for rt2500usb

2015-08-10 Thread Adrien Schildknecht
rt2500usb_validate_eeprom() reads data up to 0x6e (EEPROM_CALIBRATE_OFFSET)
but only 0x6a bytes have been allocated and read from the EEPROM.

This leads to out-of-bounds accesses and invalid values for
EEPROM_BBPTUNE_R17 and EEPROM_CALIBRATE_OFFSET.

Change the EEPROM_SIZE to 0x6e in order to retrieve all the fields.

Tested with a rt2570 device.

Signed-off-by: Adrien Schildknecht adrien+...@schischi.me
---
 drivers/net/wireless/rt2x00/rt2500usb.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/rt2x00/rt2500usb.h b/drivers/net/wireless/rt2x00/rt2500usb.h
index afba073..78cc035 100644
--- a/drivers/net/wireless/rt2x00/rt2500usb.h
+++ b/drivers/net/wireless/rt2x00/rt2500usb.h
@@ -54,7 +54,7 @@
 #define CSR_REG_BASE   0x0400
 #define CSR_REG_SIZE   0x0100
 #define EEPROM_BASE0x
-#define EEPROM_SIZE0x006a
+#define EEPROM_SIZE0x006e
 #define BBP_BASE   0x
 #define BBP_SIZE   0x0060
 #define RF_BASE0x0004
-- 
2.5.0



[PATCH] xfrm: Add oif to dst lookups

2015-08-10 Thread David Ahern
Rules can be installed that direct route lookups to specific tables based
on oif. Plumb the oif through the xfrm lookups so it gets set in the flow
struct and passed to the resolver routines.

Signed-off-by: David Ahern d...@cumulusnetworks.com
---
 include/net/xfrm.h  |  7 +--
 net/ipv4/xfrm4_policy.c | 11 ++-
 net/ipv6/xfrm6_policy.c |  7 ---
 net/xfrm/xfrm_policy.c  | 24 ++--
 4 files changed, 29 insertions(+), 20 deletions(-)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index f0ee97eec24d..312e3fee9ccf 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -285,10 +285,13 @@ struct xfrm_policy_afinfo {
unsigned short  family;
struct dst_ops  *dst_ops;
void(*garbage_collect)(struct net *net);
-   struct dst_entry*(*dst_lookup)(struct net *net, int tos,
+   struct dst_entry*(*dst_lookup)(struct net *net,
+  int tos, int oif,
   const xfrm_address_t *saddr,
   const xfrm_address_t *daddr);
-   int (*get_saddr)(struct net *net, xfrm_address_t *saddr, xfrm_address_t *daddr);
+   int (*get_saddr)(struct net *net, int oif,
+xfrm_address_t *saddr,
+xfrm_address_t *daddr);
void(*decode_session)(struct sk_buff *skb,
  struct flowi *fl,
  int reverse);
diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c
index bff69746e05f..55b3c0f4dde5 100644
--- a/net/ipv4/xfrm4_policy.c
+++ b/net/ipv4/xfrm4_policy.c
@@ -19,7 +19,7 @@
 static struct xfrm_policy_afinfo xfrm4_policy_afinfo;
 
 static struct dst_entry *__xfrm4_dst_lookup(struct net *net, struct flowi4 *fl4,
-   int tos,
+   int tos, int oif,
const xfrm_address_t *saddr,
const xfrm_address_t *daddr)
 {
@@ -28,6 +28,7 @@ static struct dst_entry *__xfrm4_dst_lookup(struct net *net, struct flowi4 *fl4,
memset(fl4, 0, sizeof(*fl4));
fl4->daddr = daddr->a4;
fl4->flowi4_tos = tos;
+   fl4->flowi4_oif = oif;
if (saddr)
fl4->saddr = saddr->a4;
 
@@ -38,22 +39,22 @@ static struct dst_entry *__xfrm4_dst_lookup(struct net *net, struct flowi4 *fl4,
return ERR_CAST(rt);
 }
 
-static struct dst_entry *xfrm4_dst_lookup(struct net *net, int tos,
+static struct dst_entry *xfrm4_dst_lookup(struct net *net, int tos, int oif,
  const xfrm_address_t *saddr,
  const xfrm_address_t *daddr)
 {
struct flowi4 fl4;
 
-   return __xfrm4_dst_lookup(net, &fl4, tos, saddr, daddr);
+   return __xfrm4_dst_lookup(net, &fl4, tos, oif, saddr, daddr);
 }
 
-static int xfrm4_get_saddr(struct net *net,
+static int xfrm4_get_saddr(struct net *net, int oif,
   xfrm_address_t *saddr, xfrm_address_t *daddr)
 {
struct dst_entry *dst;
struct flowi4 fl4;
 
-   dst = __xfrm4_dst_lookup(net, &fl4, 0, NULL, daddr);
+   dst = __xfrm4_dst_lookup(net, &fl4, 0, oif, NULL, daddr);
if (IS_ERR(dst))
return -EHOSTUNREACH;
 
diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c
index ed0583c1b9fc..a74013d3eceb 100644
--- a/net/ipv6/xfrm6_policy.c
+++ b/net/ipv6/xfrm6_policy.c
@@ -26,7 +26,7 @@
 
 static struct xfrm_policy_afinfo xfrm6_policy_afinfo;
 
-static struct dst_entry *xfrm6_dst_lookup(struct net *net, int tos,
+static struct dst_entry *xfrm6_dst_lookup(struct net *net, int tos, int oif,
  const xfrm_address_t *saddr,
  const xfrm_address_t *daddr)
 {
@@ -35,6 +35,7 @@ static struct dst_entry *xfrm6_dst_lookup(struct net *net, int tos,
int err;
 
memset(&fl6, 0, sizeof(fl6));
+   fl6.flowi6_oif = oif;
memcpy(&fl6.daddr, daddr, sizeof(fl6.daddr));
if (saddr)
memcpy(&fl6.saddr, saddr, sizeof(fl6.saddr));
@@ -50,13 +51,13 @@ static struct dst_entry *xfrm6_dst_lookup(struct net *net, int tos,
return dst;
 }
 
-static int xfrm6_get_saddr(struct net *net,
+static int xfrm6_get_saddr(struct net *net, int oif,
   xfrm_address_t *saddr, xfrm_address_t *daddr)
 {
struct dst_entry *dst;
struct net_device *dev;
 
-   dst = xfrm6_dst_lookup(net, 0, NULL, daddr);
+   dst = xfrm6_dst_lookup(net, 0, oif, NULL, daddr);
if (IS_ERR(dst))
return -EHOSTUNREACH;
 
diff --git a/net/xfrm/xfrm_policy.c 

VxLAN support question

2015-08-10 Thread Andrew Qu
Hi VxLAN experts,

In user space, we are developing a CLI as the  following:

Interface tunnel 100
   Mode vxlan 
   Remote ip ipv4 19.1.1.1
   Local ip ipv4 20.1.1.1
   Vni 1-1000  

With kernel 3.12.37, we can't support the above configuration in the kernel
(please correct me if I am wrong).

Since VxLAN support has been actively worked on, I am hoping that the most
recent kernel supports the functionality above.

Essentially, what I want is for the kernel to have about 1K interfaces
(something like Tunnel100.1-tunnel100.1000) created and attached to 1K
bridge domains, where the mapping of each VNI to a bridge domain will be
assigned using other CLI commands.

Thanks,

Andrew





RE: [v4, 0/9] Freescale DPAA FMan

2015-08-10 Thread Liberman Igal
Hello David,

Thank you for your feedback.

I understand your concerns regarding the FMan driver, we've come a long way 
from where we started but still there are issues.
The community support is critical for getting the code to the desired quality 
level and I appreciate the support I receive from you and from the other 
previous reviewers.

In order to reduce the code scattering I plan to put together all the code for 
a certain IP block in one file.
For example, the FMan port in its current state in /drivers/net/freescale/fman/:
flib (directory)
   fsl_fman_port.h
inc (directory)
   fm_port_ext.h (API for other drivers/modules)
port (directory)
   fman_port.c (flib)
   fm_port.c
   fm_port.h
   Makefile
fm_port_drv.c (file)

New proposed structure in /drivers/net/freescale/fman/:
fman_port_drv.c (includes simplified code from fm_port.c, fman_port.c and fm_port_drv.c)
fman_port_drv.h (exported structures and API, minimal)

Of course, I'll do the same for the other modules (MAC, FMan itself).

After this structure change we get:
- Subdirectories completely removed
- Layering reduced; each module becomes much flatter, with one source and one header file
- Fewer files (sources and headers)
- Namespace pollution drastically reduced
- General complexity of the driver reduced.

I would appreciate your comments about the steps described above.

Regards,
Igal

 -Original Message-
 From: David Miller [mailto:da...@davemloft.net]
 Sent: Saturday, August 08, 2015 1:31 AM
 To: Liberman Igal-B31950 igal.liber...@freescale.com
 Cc: netdev@vger.kernel.org; linuxppc-...@lists.ozlabs.org; linux-
 ker...@vger.kernel.org; Wood Scott-B07421 scottw...@freescale.com;
 Bucur Madalin-Cristian-B32716 madalin.bu...@freescale.com;
 pebo...@tiscali.nl; joakim.tjernl...@transmode.se; p...@mindchasers.com;
 step...@networkplumber.org
 Subject: Re: [v4, 0/9] Freescale DPAA FMan
 
 From: igal.liber...@freescale.com
 Date: Wed, 5 Aug 2015 12:25:16 +0300
 
  The Freescale Data Path Acceleration Architecture (DPAA) is a set of
  hardware components on specific QorIQ multicore processors.
  This architecture provides the infrastructure to support simplified
  sharing of networking interfaces and accelerators by multiple CPU
  cores and the accelerators.
 
 I think the directory and code structure of this new driver is quite excessive.
 
 Because you've split things up _so_ much, you have to have all of these
 directories, and even worse and much more important to me you have to
 export so many functions from one source file to another.
 
 I think this is way too much.
 
 For example, in one file you have a bunch of initialization routines.
 init_a(), init_b(), init_c(), and you export them all.  Then they are always
 called in sequence:
 
   init_a();
   init_b();
   init_c();
 
 This is completely pointless.  You just needed to export one function which
 calls all three functions.
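A minimal sketch of the suggested shape, using the hypothetical init_a()/init_b()/init_c() from the example above: the steps become file-local (static) and only one wrapper is visible to other files. example_port_init() is a hypothetical name:

```c
/* Bit flags recording which (hypothetical) init steps ran. */
static int steps_done;

/* Internal steps stay static: they are invisible outside this file. */
static void init_a(void) { steps_done |= 1; }
static void init_b(void) { steps_done |= 2; }
static void init_c(void) { steps_done |= 4; }

/* The single entry point other files would see (in a kernel module this
 * would be the one exported symbol). */
int example_port_init(void)
{
    init_a();
    init_b();
    init_c();
    return steps_done;
}
```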
 
 The namespace pollution of this driver is out of control.
 
 You really need to completely rework the architecture and layout of this
 driver before I will even begin to review it again.
 
 And the lack of review interest by other developers should be an indication
 to you how undesirable this code submission is to read.
 
 Thanks.


Re: [PATCH net-next 0/6] qlcnic: enhancements

2015-08-10 Thread David Miller
From: Shahed Shaikh shahed.sha...@qlogic.com
Date: Fri, 7 Aug 2015 07:17:01 -0400

 This series adds few enhancements.
 
   o Patch from Harish reorders the sequence of header files inclusion,
 keeping kernel's header files on top.
 
   o Firmware introduced a new feature which allows the driver to increase
 the size of the firmware dump of the iSCSI function, which is collected
 by the NIC driver.
 
   o Print the address of the buffer holding a firmware dump.
 
   o Use vzalloc() instead of kzalloc() for allocating a large chunk of memory,
 which avoids a potential memory allocation failure.
 
   o Add new device ID 0x8C30, which is an 83xx series based VF function.
 
 Please apply this series to net-next.

Series applied, thanks.


Re: [PATCH v2 bluetooth-next] cc2520: set the default fifo pin value from platform data

2015-08-10 Thread Varka Bhadram

On 08/11/2015 08:13 AM, sdliy...@gmail.com wrote:

From: Yong Li sdliy...@gmail.com

When device tree support is disabled, fifo_pin is left uninitialized.
This patch sets the fifo_pin value from the platform data.

Signed-off-by: Yong Li sdliy...@gmail.com


Acked-by: Varka Bhadram varkabhad...@gmail.com


---
  drivers/net/ieee802154/cc2520.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/drivers/net/ieee802154/cc2520.c b/drivers/net/ieee802154/cc2520.c
index 613dae5..c5b54a1 100644
--- a/drivers/net/ieee802154/cc2520.c
+++ b/drivers/net/ieee802154/cc2520.c
@@ -833,6 +833,7 @@ static int cc2520_get_platform_data(struct spi_device *spi,
if (!spi_pdata)
return -ENOENT;
*pdata = *spi_pdata;
+   priv->fifo_pin = pdata->fifo;
return 0;
}
  


--
Varka Bhadram.



Re: [PATCH] net: Unbreak resetting default values for tcp_wmem/udp_wmem_min

2015-08-10 Thread Calvin Owens
On Sunday 08/09 at 22:41 -0700, David Miller wrote:
 From: Calvin Owens calvinow...@fb.com
 Date: Wed, 5 Aug 2015 13:26:54 -0700
 
  Commit 8133534c760d4083 (net: limit tcp/udp rmem/wmem to
  SOCK_{RCV,SND}BUF_MIN) modified four sysctls to enforce that the values
  written to them are not less than SOCK_MIN_{RCV,SND}BUF.
  
  This change is fine for tcp_rmem and udp_rmem_min, since SOCK_MIN_RCVBUF
  is equal to TCP_SKB_MIN_TRUESIZE. But it breaks tcp_wmem and
  udp_wmem_min for previously valid values because SOCK_MIN_SNDBUF is
  (2 * TCP_SKB_MIN_TRUESIZE), which ends up being greater than 4KB.
  
  Thus, 4096 is no longer accepted as a valid value, despite still being
  the default for udp_wmem_min, and for 'min' in tcp_wmem. A huge number
  of sysctl configurations at FB use 4096 as 'min', so this change breaks
  all of them.
  
  This patch changes the sysctls to simply enforce that the value written
  is greater than or equal to the default value of SK_MEM_QUANTUM.
  
  Fixes: 8133534c760d4083 (net: limit tcp/udp rmem/wmem to SOCK_MIN...)
  Signed-off-by: Calvin Owens calvinow...@fb.com
 
 I think increasing the default makes more sense.
 
 If we don't allow applications to set 4K, the kernel shouldn't start
 with that value either.

I'm really questioning the limitation itself: why enforce a minimum of
SOCK_MIN_SNDBUF here? Why not SK_MEM_QUANTUM?

Commit 8133534c760d4083 referred to b1cb59cf2efe7971, which chose to
use the SOCK_MIN constants as the lower limits to avoid nasty bugs. But
AFAICS, a limit of SOCK_MIN_SNDBUF isn't necessary to do that: the
BUG_ON cited in the commit message for b1cb59cf2efe7971 seems to have
happened because unix_stream_sendmsg() expects a minimum of a full page
(ie SK_MEM_QUANTUM) and the math broke, not because it had less than
SOCK_MIN_SNDBUF allocated.

Nothing seems to assume that it has at least SOCK_MIN_SNDBUF to play
with, so my argument is that enforcing a minimum of SK_MEM_QUANTUM
avoids the sort of bugs commit 8133534c760d4083 was trying to avoid, and
it does so without breaking anybody's sysctl configurations. What do you
think?
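The arithmetic behind "4096 is no longer accepted", as a compile-time sketch. It assumes the definitions in include/net/sock.h of this era (SK_MEM_QUANTUM is PAGE_SIZE; TCP_SKB_MIN_TRUESIZE is 2048 plus SKB_DATA_ALIGN(sizeof(struct sk_buff)); SOCK_MIN_SNDBUF is twice that), plus an assumed 232-byte struct sk_buff and 64-byte cache lines. Since the sk_buff term is always positive, SOCK_MIN_SNDBUF exceeds 4096 whatever the exact size; the MODEL_* and ASSUMED_* names are hypothetical:

```c
/* Assumptions for illustration only: 4 KiB pages, 64-byte cache lines and
 * a 232-byte struct sk_buff (the real size varies by config and version). */
#define ASSUMED_PAGE_SIZE   4096
#define ASSUMED_CACHE_LINE  64
#define ASSUMED_SKB_SIZE    232

/* Round x up to a multiple of a, as SKB_DATA_ALIGN() does with the
 * cache-line size. */
#define ALIGN_UP(x, a)      ((((x) + (a) - 1) / (a)) * (a))

/* Modelled after the definitions discussed in this thread. */
#define MODEL_SK_MEM_QUANTUM       ASSUMED_PAGE_SIZE
#define MODEL_TCP_SKB_MIN_TRUESIZE (2048 + ALIGN_UP(ASSUMED_SKB_SIZE, ASSUMED_CACHE_LINE))
#define MODEL_SOCK_MIN_RCVBUF      MODEL_TCP_SKB_MIN_TRUESIZE
#define MODEL_SOCK_MIN_SNDBUF      (2 * MODEL_TCP_SKB_MIN_TRUESIZE)

/* A sysctl write passes if it is at least the enforced minimum. */
static int accepts(int val, int min)
{
    return val >= min;
}
```

With these assumptions TCP_SKB_MIN_TRUESIZE is 2304 and SOCK_MIN_SNDBUF is 4608, so 4096 fails the current check but passes a SK_MEM_QUANTUM floor.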

Thanks very much,
Calvin


Re: [PATCH] net: Unbreak resetting default values for tcp_wmem/udp_wmem_min

2015-08-10 Thread David Miller
From: Calvin Owens calvinow...@fb.com
Date: Mon, 10 Aug 2015 20:34:06 -0700

 I'm really questioning the limitation itself: why enforce a minimum of
 SOCK_MIN_SNDBUF here? Why not SK_MEM_QUANTUM?
 
 Commit 8133534c760d4083 referred to b1cb59cf2efe7971, which chose to
 use the SOCK_MIN constants as the lower limits to avoid nasty bugs. But
 AFAICS, a limit of SOCK_MIN_SNDBUF isn't necessary to do that: the
 BUG_ON cited in the commit message for b1cb59cf2efe7971 seems to have
 happened because unix_stream_sendmsg() expects a minimum of a full page
 (ie SK_MEM_QUANTUM) and the math broke, not because it had less than
 SOCK_MIN_SNDBUF allocated.
 
 Nothing seems to assume that it has at least SOCK_MIN_SNDBUF to play
 with, so my argument is that enforcing a minimum of SK_MEM_QUANTUM
 avoids the sort of bugs commit 8133534c760d4083 was trying to avoid, and
 it does so without breaking anybody's sysctl configurations. What do you
 think?

The author of said commit argues that too small values lead to really
bad performance, but I guess he should have adjusted the default if he
cared about it so much.

Ok, can you respin your patch with some added details in the commit
message like what you said above?

Thanks.


Re: [PATCH] mkiss: Fix error handling in mkiss_open()

2015-08-10 Thread David Miller
From: Fabio Estevam fabio.este...@freescale.com
Date: Mon, 10 Aug 2015 14:22:43 -0300

 If register_netdev() fails we are not propagating the error and
 we return success because ax_open() succeeded previously.
 
 Fix this by checking the return value of ax_open() and 
 register_netdev() and propagate the error in case of failure.
 
 Reported-by: RUC_Soft_Sec zy900...@163.com
 Signed-off-by: Fabio Estevam fabio.este...@freescale.com

Applied, thanks.


Re: [PATCH 0/5] Netfilter fixes for net

2015-08-10 Thread David Miller
From: Pablo Neira Ayuso pa...@netfilter.org
Date: Mon, 10 Aug 2015 19:58:34 +0200

 The following patchset contains five Netfilter fixes for your net tree,
 they are:
 
 1) Silence a warning on falling back to vmalloc(). Since 88eab472ec21, we can
    easily hit this warning message, which confuses users. So let's get rid
    of it.
 
 2) When the template object allocation was recently ported on top of kmalloc
    to fix the netns dependencies between x_tables and conntrack, the error
    checks were left unchanged. Remove IS_ERR() and check for NULL instead.
    Patch from Dan Carpenter.
 
 3) Don't ignore gfp_flags in the new nf_ct_tmpl_alloc() function, from
    Joe Stringer.
 
 4) Fix a crash due to a NULL pointer dereference in ip6t_SYNPROXY, patch from
    Phil Sutter.
 
 5) The sequence number of the SYN+ACK that is sent from SYNPROXY to clients is
    not adjusted through our NAT infrastructure. As a result, the client may
    ignore this TCP packet and the TCP flow hangs until the client probes us.
    Also from Phil Sutter.
 
 You can pull these changes from:
 
   git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git

Pulled, thanks Pablo.
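Fix 2 turns on the difference between two kernel failure conventions. Functions that return ERR_PTR(-errno) are checked with IS_ERR(), but kmalloc() returns NULL on failure. The sketch below re-creates the error-pointer encoding in user space, in which the top MAX_ERRNO (4095) addresses are reserved for negative error codes. The `my_`-prefixed names are local re-implementations for illustration, not the kernel's own headers. `tmpl_alloc_kmalloc_style()` and the two check helpers are hypothetical. The sketch shows that IS_ERR(NULL) is false, so an IS_ERR() check after a kmalloc-style allocation never sees the failure.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local re-implementation of the error-pointer encoding: the last
 * MAX_ERRNO addresses are reserved for -errno values. */
#define MY_MAX_ERRNO 4095

static inline void *my_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static inline bool my_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MY_MAX_ERRNO;
}

/* Hypothetical allocator with kmalloc()-style semantics:
 * returns NULL (not an error pointer) when allocation fails. */
static void *tmpl_alloc_kmalloc_style(int fail)
{
	static char storage[16];
	return fail ? NULL : storage;
}

/* Wrong check: misses the failure because IS_ERR(NULL) is false. */
static bool alloc_failed_is_err(int fail)
{
	return my_is_err(tmpl_alloc_kmalloc_style(fail));
}

/* Correct check for a NULL-returning allocator. */
static bool alloc_failed_null(int fail)
{
	return tmpl_alloc_kmalloc_style(fail) == NULL;
}
```

With a forced failure, `alloc_failed_is_err(1)` reports no error while `alloc_failed_null(1)` catches it. That is why the patch replaces IS_ERR() with a NULL check.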


Re: [PATCH net-next] mellanox: mlxsw: Use '%zx' to print size_t format

2015-08-10 Thread David Miller
From: Fabio Estevam fabio.este...@freescale.com
Date: Mon, 10 Aug 2015 09:54:28 -0300

 Use '%zx' to print size_t format in order to fix the following build warning:
 
 drivers/net/ethernet/mellanox/mlxsw/item.h:65:3: warning: format '%lx' 
 expects argument of type 'long unsigned int', but argument 6 has type 
 'size_t' [-Wformat=]
 
 Signed-off-by: Fabio Estevam fabio.este...@freescale.com

Applied, thanks.
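The C99 `z` length modifier tells printf-family functions that the argument is a size_t. As a result, `%zx` prints it in hex on 32-bit and 64-bit builds alike. `%lx` expects unsigned long and triggers -Wformat wherever size_t is a different type, as in the warning above. Below is a minimal user-space sketch; `format_size_hex()` is a hypothetical helper, not part of mlxsw.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format a size_t as lowercase hex into buf.
 * %zx matches size_t on every ABI, unlike %lx (unsigned long).
 * Returns the number of characters snprintf() would have written. */
static int format_size_hex(char *buf, size_t buflen, size_t value)
{
	assert(buf != NULL || buflen == 0);
	return snprintf(buf, buflen, "%zx", value);
}
```

For example, `format_size_hex(buf, sizeof(buf), 255)` stores "ff" and returns 2.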

