date:20150724

Re: [RFC PATCH v4 net-next 1/4] tcp: replace cnt & rtt with struct in pkts_acked()

2015-07-24 Thread Eric Dumazet

On Fri, 2015-07-24 at 19:47 -0700, Lawrence Brakmo wrote:
> Replace 2 arguments (cnt and rtt) in the congestion control modules'
> pkts_acked() function with a struct. This will allow adding more
> information without having to modify existing congestion control
> modules (tcp_nv in particular needs bytes in flight when packet
> was sent).
> 
>  
> +struct ack_sample {
> + u32 pkts_acked;
> + s32 rtt_us;
> +};
> +
>  struct tcp_congestion_ops {
>   struct list_head    list;
>   u32 key;
> @@ -857,7 +862,7 @@ struct tcp_congestion_ops {
>   /* new value of cwnd after loss (optional) */
>   u32  (*undo_cwnd)(struct sock *sk);
>   /* hook for packet ack accounting (optional) */
> - void (*pkts_acked)(struct sock *sk, u32 num_acked, s32 rtt_us);
> + void (*pkts_acked)(struct sock *sk, struct ack_sample *sample);

This probably should be a const struct ack_sample *sample ?
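To illustrate the suggestion, here is a minimal userspace sketch, not kernel code. struct ack_sample mirrors the patch. struct toy_ca_state, toy_pkts_acked() and the u32/s32 typedefs are hypothetical stand-ins, and the real hook also takes a struct sock *sk. With a const pointer the compiler rejects any write to the shared sample from a congestion module.

```c
#include <stdint.h>

typedef uint32_t u32;   /* stand-ins for the kernel's fixed-width types */
typedef int32_t  s32;

/* Mirrors the struct introduced by the patch. */
struct ack_sample {
	u32 pkts_acked;
	s32 rtt_us;
};

/* Hypothetical per-connection state of a toy congestion module. */
struct toy_ca_state {
	u32 total_acked;
	s32 min_rtt_us;
};

/* Hook taking a const sample: the module may read it but not modify it. */
static void toy_pkts_acked(struct toy_ca_state *ca,
			   const struct ack_sample *sample)
{
	ca->total_acked += sample->pkts_acked;
	/* rtt_us <= 0 means "no valid RTT sample" in this sketch */
	if (sample->rtt_us > 0 &&
	    (ca->min_rtt_us == 0 || sample->rtt_us < ca->min_rtt_us))
		ca->min_rtt_us = sample->rtt_us;
	/* sample->rtt_us = 0;  would not compile: read-only object */
}
```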



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Patch net] sch_choke: drop all packets in queue during reset

2015-07-24 Thread David Miller

From: Cong Wang 
Date: Tue, 21 Jul 2015 16:52:43 -0700

> Signed-off-by: Cong Wang 

Applied.

Re: [Patch net] sch_plug: purge buffered packets during reset

2015-07-24 Thread David Miller

From: Cong Wang 
Date: Tue, 21 Jul 2015 16:31:53 -0700

> Otherwise the skbuff related structures are not correctly
> refcount'ed.
> 
> Cc: Jamal Hadi Salim 
> Signed-off-by: Cong Wang 

Applied.

Re: [PATCH net-next v2] bridge: Fix setting a flag in br_fill_ifvlaninfo_range().

2015-07-24 Thread David Miller

From: Rami Rosen 
Date: Wed, 22 Jul 2015 07:57:02 +0300

> This patch fixes setting of vinfo.flags in the br_fill_ifvlaninfo_range()
> method. The assignment vinfo.flags &= ~BRIDGE_VLAN_INFO_RANGE_BEGIN has no
> effect and is unneeded, as the vinfo.flags value is overridden by the
> immediately following vinfo.flags = flags | BRIDGE_VLAN_INFO_RANGE_END
> assignment.
> 
> Signed-off-by: Rami Rosen 

Applied.
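A small userspace sketch of the dead store described in the changelog. The flag values are assumed to match include/uapi/linux/if_bridge.h, and end_flags_before()/end_flags_after() are hypothetical helpers. The out-parameter first stands in for the RANGE_BEGIN entry that the kernel emits with nla_put() before building the RANGE_END entry.

```c
#include <stdint.h>

/* Flag values assumed to match include/uapi/linux/if_bridge.h. */
#define BRIDGE_VLAN_INFO_UNTAGGED     (1 << 2)
#define BRIDGE_VLAN_INFO_RANGE_BEGIN  (1 << 3)
#define BRIDGE_VLAN_INFO_RANGE_END    (1 << 4)

/* Before the fix: the &= result is never read, because the next
 * assignment overwrites vflags without looking at it. */
static uint16_t end_flags_before(uint16_t flags, uint16_t *first)
{
	uint16_t vflags = flags | BRIDGE_VLAN_INFO_RANGE_BEGIN;

	*first = vflags;                             /* emitted as BEGIN */
	vflags &= ~BRIDGE_VLAN_INFO_RANGE_BEGIN;     /* dead store */
	vflags = flags | BRIDGE_VLAN_INFO_RANGE_END; /* overwrites it */
	return vflags;
}

/* After the fix: only the assignment that matters remains. */
static uint16_t end_flags_after(uint16_t flags)
{
	return flags | BRIDGE_VLAN_INFO_RANGE_END;
}
```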

Re: [PATCH net 0/2] ipv4: fib_select_default changes

2015-07-24 Thread David Miller

From: Julian Anastasov 
Date: Wed, 22 Jul 2015 10:43:21 +0300

> This patchset contains 2 changes for the alternative routes,
> one to add tb_id/fa_slen check needed after the recent
> fib_trie optimizations for fib aliases and the second
> change attempts to support alternative routes with TOS
> requirement.
> 
>   Sorry that I don't have access to the original
> report from Hagen Paul Pfeifer. I hope he will see this
> change.
> 
>   The second change adds fa_default field to the
> fib aliases (which can be many) and if the feature to
> filter the alternative routes by TOS is not worth it,
> this second patch can be scrapped.

Great work, series applied, thanks Julian!

Re: [PATCH net-next] be2net: support ndo_get_phys_port_id()

2015-07-24 Thread David Miller

From: Sriharsha Basavapatna 
Date: Wed, 22 Jul 2015 11:15:12 +0530

> From: Sriharsha Basavapatna 
> 
> Add be_get_phys_port_id() function to report physical port id. The port id
> should be unique across different be2net devices in the system. We use the
> chip serial number along with the physical port number for this.
> 
> Signed-off-by: Sriharsha Basavapatna 

Applied, thanks.
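A hypothetical sketch of how a port ID can be made unique by combining the physical port number with the chip serial number. The layout (port + 1 in byte 0, then the serial bytes), SERIAL_NUM_LEN and build_phys_port_id() are assumptions for illustration, not be2net's actual code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SERIAL_NUM_LEN 8   /* hypothetical serial-number length in bytes */

/* Byte 0 holds port + 1 (never zero), followed by the chip serial number.
 * Two ports on one chip differ in byte 0; two chips differ in the serial.
 * Returns the ID length, or 0 if the buffer is too small. */
static size_t build_phys_port_id(uint8_t *id, size_t id_cap, uint8_t port,
				 const uint8_t serial[SERIAL_NUM_LEN])
{
	if (id_cap < 1 + SERIAL_NUM_LEN)
		return 0;
	id[0] = (uint8_t)(port + 1);
	memcpy(&id[1], serial, SERIAL_NUM_LEN);
	return 1 + SERIAL_NUM_LEN;
}
```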

Re: [PATCH V2 net-next 0/3] ARM BPF JIT features

2015-07-24 Thread David Miller

From: Nicolas Schichan 
Date: Tue, 21 Jul 2015 14:16:37 +0200

> This serie adds support for more instructions to the ARM BPF JIT

"series"

> namely skb netdevice type retrieval, skb payload offset retrieval, and
> skb packet type retrieval.
> 
> This allows 35 tests to use the JIT instead of 29 before.
> 
> This serie depends on the "BPF JIT fixes for ARM" serie sent earlier.

"series"

But even with that series applied these patches do not apply properly
at all.

davem@greenl8ke:~/src/GIT/net-next$ git am --signoff bundle-8569-arm-bpf-next.mbox
Applying: ARM: net: add support for BPF_ANC | SKF_AD_PKTTYPE in ARM JIT.
error: patch failed: arch/arm/net/bpf_jit_32.c:864
error: arch/arm/net/bpf_jit_32.c: patch does not apply
Patch failed at 0001 ARM: net: add support for BPF_ANC | SKF_AD_PKTTYPE in ARM JIT.
When you have resolved this problem run "git am --resolved".
If you would prefer to skip this patch, instead run "git am --skip".
To restore the original branch and stop patching run "git am --abort".

Please respin against net-next, thanks.

[RFC PATCH net] sctp: ASCONF-ACK with Unresolvable Address should be sent

2015-07-24 Thread Xin Long

RFC 5061:
This is an opaque integer assigned by the sender to identify each
request parameter.  The receiver of the ASCONF Chunk will copy this
32-bit value into the ASCONF Response Correlation ID field of the
ASCONF-ACK response parameter.  The sender of the ASCONF can use this
same value in the ASCONF-ACK to find which request the response is
for.  Note that the receiver MUST NOT change this 32-bit value.

Address Parameter: TLV

This field contains an IPv4 or IPv6 address parameter, as described
in Section 3.3.2.1 of [RFC4960].

An ASCONF-ACK with an Error Cause Indication Parameter (Unresolvable Address)
should be sent if the address in a Delete IP Address parameter is not part
of the association.

  Endpoint A                          Endpoint B
  (ESTABLISHED)                       (ESTABLISHED)

  ASCONF        ----------------->
  (Delete IP Address)
                <-----------------    ASCONF-ACK
                                      (Unresolvable Address)

Signed-off-by: Xin Long 
---
 net/sctp/sm_make_chunk.c | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
index 06320c8..6e399f6 100644
--- a/net/sctp/sm_make_chunk.c
+++ b/net/sctp/sm_make_chunk.c
@@ -3090,8 +3090,19 @@ static __be16 sctp_process_asconf_param(struct sctp_association *asoc,
sctp_assoc_set_primary(asoc, asconf->transport);
sctp_assoc_del_nonprimary_peers(asoc,
asconf->transport);
-   } else
-   sctp_assoc_del_peer(asoc, &addr);
+   return SCTP_ERROR_NO_ERROR;
+   }
+
+   /* If the address is not part of the association, the
+* ASCONF-ACK with Error Cause Indication Parameter
+* which includes the Unresolvable Address cause should
+* be sent.
+*/
+   peer = sctp_assoc_lookup_paddr(asoc, &addr);
+   if (!peer)
+   return SCTP_ERROR_DNS_FAILED;
+
+   sctp_assoc_rm_peer(asoc, peer);
break;
case SCTP_PARAM_SET_PRIMARY:
/* ADDIP Section 4.2.4
-- 
2.1.0
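A userspace sketch of the Delete IP Address handling after this patch, using the early-return style and the single lookup discussed in review. The association is a hypothetical array of IPv4 peers and the helper names are made up. Cause code 5 (Unresolvable Address, RFC 4960 Section 3.3.10) is what the kernel calls SCTP_ERROR_DNS_FAILED, although the kernel returns it in network byte order.

```c
#include <stddef.h>
#include <stdint.h>

#define CAUSE_NO_ERROR              0
#define CAUSE_UNRESOLVABLE_ADDRESS  5   /* SCTP_ERROR_DNS_FAILED */
#define MAX_PEERS 8

/* Hypothetical, simplified association: peer addresses as IPv4 u32s. */
struct toy_assoc {
	uint32_t peers[MAX_PEERS];
	size_t npeers;
};

/* Index of addr in the peer list, or -1 (a NULL lookup in the kernel). */
static int toy_lookup_paddr(const struct toy_assoc *asoc, uint32_t addr)
{
	for (size_t i = 0; i < asoc->npeers; i++)
		if (asoc->peers[i] == addr)
			return (int)i;
	return -1;
}

/* Unknown address: report Unresolvable Address. Known address: remove
 * the peer found by the one lookup, without searching again. */
static uint16_t toy_process_del_ip(struct toy_assoc *asoc, uint32_t addr)
{
	int idx = toy_lookup_paddr(asoc, addr);

	if (idx < 0)
		return CAUSE_UNRESOLVABLE_ADDRESS;

	asoc->peers[idx] = asoc->peers[asoc->npeers - 1];
	asoc->npeers--;
	return CAUSE_NO_ERROR;
}
```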


Re: [RFC PATCH net] sctp: ASCONF-ACK with Unresolvable Address should be sent

2015-07-24 Thread lucien xin

On Sat, Jul 25, 2015 at 3:11 AM, Marcelo Ricardo Leitner
 wrote:
> On Fri, Jul 24, 2015 at 02:56:29PM +0800, Xin Long wrote:
>> RFC 5061:
>> This is an opaque integer assigned by the sender to identify each
>> request parameter.  The receiver of the ASCONF Chunk will copy this
>> 32-bit value into the ASCONF Response Correlation ID field of the
>> ASCONF-ACK response parameter.  The sender of the ASCONF can use this
>> same value in the ASCONF-ACK to find which request the response is
>> for.  Note that the receiver MUST NOT change this 32-bit value.
>>
>> Address Parameter: TLV
>>
>> This field contains an IPv4 or IPv6 address parameter, as described
>> in Section 3.3.2.1 of [RFC4960].
>>
>> An ASCONF-ACK with an Error Cause Indication Parameter (Unresolvable Address)
>> should be sent if the address in a Delete IP Address parameter is not part
>> of the association.
>>
>>   Endpoint A                          Endpoint B
>>   (ESTABLISHED)                       (ESTABLISHED)
>>
>>   ASCONF        ----------------->
>>   (Delete IP Address)
>>                 <-----------------    ASCONF-ACK
>>                                       (Unresolvable Address)
>>
>> Signed-off-by: Xin Long 
>> ---
>>  net/sctp/sm_make_chunk.c | 12 +++-
>>  1 file changed, 11 insertions(+), 1 deletion(-)
>>
>> diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
>> index 06320c8..88d82ef 100644
>> --- a/net/sctp/sm_make_chunk.c
>> +++ b/net/sctp/sm_make_chunk.c
>> @@ -3090,8 +3090,18 @@ static __be16 sctp_process_asconf_param(struct sctp_association *asoc,
>
> Please let's avoid increasing the indentation level when possible
>
>>   sctp_assoc_set_primary(asoc, asconf->transport);
>>   sctp_assoc_del_nonprimary_peers(asoc,
>>   asconf->transport);
> add a return here
>
>> - } else
>> + } else {
> and remove this else {}
> and we're good.
>
> sctp code is often too indented, trying to reduce that bit here and
> there.
>
>> + /* If the address is not part of the association, the
>> +  * ASCONF-ACK with Error Cause Indication Parameter
>> +  * which includes the Unresolvable Address cause should
>> +  * be sent.
>> +  */
>> + peer = sctp_assoc_lookup_paddr(asoc, &addr);
>> + if (!peer)
>> + return SCTP_ERROR_DNS_FAILED;
>> +
>>   sctp_assoc_del_peer(asoc, &addr);
>
> Here we can replace this call to sctp_assoc_rm_peer() , because if we
> already have peer, we don't have to search for it again.
>
> Thanks,
> Marcelo
>
>> + }
>>   break;
>>   case SCTP_PARAM_SET_PRIMARY:
>>   /* ADDIP Section 4.2.4
>> --
>> 2.1.0
>>
>
>

okay, I will repost it

[RFC PATCH v4 net-next 0/4] tcp: add NV congestion control

2015-07-24 Thread Lawrence Brakmo

This patchset adds support for NV congestion control.

The first patch replaces two arguments with a struct in pkts_acked()
The second patch is a refactor of tcp_skb_cb
The third patch adds in_flight to tcp_skb_cb's tx section
The fourth patch adds NV congestion control support.

[RFC PATCH v4 net-next 1/4] tcp: replace cnt & rtt with struct in pkts_acked()
[RFC PATCH v4 net-next 2/4] tcp: refactor struct tcp_skb_cb
[RFC PATCH v4 net-next 3/4] tcp: add in_flight to tcp_skb_cb
[RFC PATCH v4 net-next 4/4] tcp: add NV congestion control

Signed-off-by: Lawrence Brakmo 

include/net/tcp.h       |  20 ++-
net/ipv4/Kconfig        |  16 ++
net/ipv4/Makefile       |   1 +
net/ipv4/tcp_bic.c      |   6 +-
net/ipv4/tcp_cdg.c      |  14 +-
net/ipv4/tcp_cubic.c    |   6 +-
net/ipv4/tcp_htcp.c     |  10 +-
net/ipv4/tcp_illinois.c |  20 +--
net/ipv4/tcp_input.c    |  10 +-
net/ipv4/tcp_lp.c       |   6 +-
net/ipv4/tcp_nv.c       | 479
net/ipv4/tcp_output.c   |   4 +-
net/ipv4/tcp_vegas.c    |   6 +-
net/ipv4/tcp_vegas.h    |   2 +-
net/ipv4/tcp_veno.c     |   6 +-
net/ipv4/tcp_westwood.c |   6 +-
net/ipv4/tcp_yeah.c     |   6 +-
17 files changed, 567 insertions(+), 51 deletions(-)


[RFC PATCH v4 net-next 4/4] tcp: add NV congestion control

2015-07-24 Thread Lawrence Brakmo

This is a request for comments.

TCP-NV (New Vegas) is a major update to TCP-Vegas.
An earlier version of NV was presented at LPC 2010.
It is a delay-based congestion avoidance algorithm for the
data center. This version has been tested within a
10G rack where the HW RTTs are 20-50us.

A description of TCP-NV, including implementation
details as well as experimental results, can be found at:
http://www.brakmo.org/networking/tcp-nv/TCPNV.html

The current version includes many module parameters to support
experimentation.
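To make the description concrete, a hedged userspace sketch of a Vegas-style queue estimate using the default values of nv_pad, nv_dec_factor, nv_loss_dec_factor and nv_min_cwnd from the patch. The formula cwnd * min_rtt / rtt, the floor of 2 on loss and the helper names are assumptions of this sketch; the actual tcp_nv.c logic (rate sampling, min_rtt resets, slow-start handling) is more involved.

```c
#include <stdint.h>

/* Defaults taken from the module parameters in this patch. */
static const uint32_t nv_pad = 10;              /* tolerated queued packets */
static const uint32_t nv_dec_factor = 5;        /* actual value is factor/8 */
static const uint32_t nv_loss_dec_factor = 820; /* 820/1024, about 0.8 */
static const uint32_t nv_min_cwnd = 10;         /* floor without losses */

/* Assumed estimate: cwnd * min_rtt / rtt is the window that would sustain
 * the current throughput with empty queues; the rest is queued. */
static uint32_t nv_sketch_update(uint32_t cwnd, uint32_t rtt_us,
				 uint32_t min_rtt_us)
{
	uint32_t floor_cwnd = cwnd < nv_min_cwnd ? cwnd : nv_min_cwnd;
	uint32_t predicted, excess;

	if (rtt_us == 0 || rtt_us <= min_rtt_us)
		return cwnd;                  /* no queueing delay measured */

	predicted = (uint32_t)((uint64_t)cwnd * min_rtt_us / rtt_us);
	excess = cwnd - predicted;            /* predicted <= cwnd here */
	if (excess <= nv_pad)
		return cwnd;                  /* below the congestion level */

	/* Decrease proportionally to the excess, scaled by 5/8. */
	cwnd -= excess * nv_dec_factor / 8;
	return cwnd < floor_cwnd ? floor_cwnd : cwnd;
}

/* On loss: cwnd * 820 / 1024, i.e. about a 20% reduction. */
static uint32_t nv_sketch_on_loss(uint32_t cwnd)
{
	uint32_t next = (uint32_t)(((uint64_t)cwnd * nv_loss_dec_factor) >> 10);

	return next < 2 ? 2 : next;
}
```

For example, with cwnd = 100, min_rtt = 100us and rtt = 200us, the predicted window is 50, the excess of 50 exceeds nv_pad, and cwnd drops by 50 * 5 / 8 = 31 to 69.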

Signed-off-by: Lawrence Brakmo 
---
 net/ipv4/Kconfig  |  16 ++
 net/ipv4/Makefile |   1 +
 net/ipv4/tcp_nv.c | 479 ++
 3 files changed, 496 insertions(+)
 create mode 100644 net/ipv4/tcp_nv.c

diff --git a/net/ipv4/Kconfig b/net/ipv4/Kconfig
index 6fb3c90..f11f2f8 100644
--- a/net/ipv4/Kconfig
+++ b/net/ipv4/Kconfig
@@ -539,6 +539,22 @@ config TCP_CONG_VEGAS
window. TCP Vegas should provide less packet loss, but it is
not as aggressive as TCP Reno.
 
+config TCP_CONG_NV
+   tristate "TCP NV"
+   default n
+   ---help---
+   TCP NV is a follow-up to TCP Vegas. It has been modified to deal with
+   10G networks and with measurement noise introduced by LRO, GRO and
+   interrupt coalescence. In addition, it decreases its cwnd
+   multiplicatively instead of linearly.
+
+   Note that, in general, congestion avoidance (cwnd decreased when the
+   number of queued packets grows) cannot coexist with congestion control
+   (cwnd decreased only when there is packet loss) due to fairness issues.
+   One scenario where they can coexist safely is when the CA flows' RTTs
+   are much smaller than the CC flows' RTTs.
+
+   For further details see http://www.brakmo.org/networking/tcp-nv/
+
 config TCP_CONG_SCALABLE
tristate "Scalable TCP"
default n
diff --git a/net/ipv4/Makefile b/net/ipv4/Makefile
index efc43f3..06f335f 100644
--- a/net/ipv4/Makefile
+++ b/net/ipv4/Makefile
@@ -50,6 +50,7 @@ obj-$(CONFIG_TCP_CONG_HSTCP) += tcp_highspeed.o
 obj-$(CONFIG_TCP_CONG_HYBLA) += tcp_hybla.o
 obj-$(CONFIG_TCP_CONG_HTCP) += tcp_htcp.o
 obj-$(CONFIG_TCP_CONG_VEGAS) += tcp_vegas.o
+obj-$(CONFIG_TCP_CONG_NV) += tcp_nv.o
 obj-$(CONFIG_TCP_CONG_VENO) += tcp_veno.o
 obj-$(CONFIG_TCP_CONG_SCALABLE) += tcp_scalable.o
 obj-$(CONFIG_TCP_CONG_LP) += tcp_lp.o
diff --git a/net/ipv4/tcp_nv.c b/net/ipv4/tcp_nv.c
new file mode 100644
index 000..c4379b8
--- /dev/null
+++ b/net/ipv4/tcp_nv.c
@@ -0,0 +1,479 @@
+/*
+ * TCP NV: TCP with Congestion Avoidance
+ *
+ * TCP-NV is a successor of TCP-Vegas that has been developed to
+ * deal with the issues that occur in modern networks. 
+ * Like TCP-Vegas, TCP-NV supports true congestion avoidance,
+ * the ability to detect congestion before packet losses occur.
+ * When congestion (queue buildup) starts to occur, TCP-NV
+ * predicts what the cwnd size should be for the current
+ * throughput and it reduces the cwnd proportionally to
+ * the difference between the current cwnd and the predicted cwnd.
+ * TCP-NV behaves like Reno when no congestion is detected, or when
+ * recovering from packet losses.
+ *
+ * TODO:
+ * 1) Add option to not decrease cwnd on losses below certain level
+ * 2) Add mechanism to deal with reverse congestion.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/* TCP NV parameters */
+static int nv_enable __read_mostly = 1;
+static int nv_pad __read_mostly = 10;
+static int nv_pad_buffer __read_mostly = 2;
+static int nv_reset_period __read_mostly = 5;
+static int nv_min_cwnd = 10;
+static int nv_dec_eval_min_calls = 100;
+static int nv_ssthresh_eval_min_calls = 30;
+static int nv_rtt_min_cnt = 2;
+static int nv_cong_decrease_mult = 30*128/100;
+static int nv_ssthresh_factor = 8;
+static int nv_rtt_factor = 128;
+static int nv_rtt_cnt_dec_delta = 20; /* dec cwnd by this many RTTs */
+static int nv_dec_factor = 5;  /* actual value is factor/8 */
+static int nv_loss_dec_factor = 820; /* on loss reduce cwnd by 20% */
+static int nv_cwnd_growth_factor = 2; /* larger => cwnd grows slower */
+
+module_param(nv_pad, int, 0644);
+MODULE_PARM_DESC(nv_pad, "extra packets above congestion level");
+module_param(nv_pad_buffer, int, 0644);
+MODULE_PARM_DESC(nv_pad_buffer, "no growth buffer zone");
+module_param(nv_reset_period, int, 0644);
+MODULE_PARM_DESC(nv_reset_period, "nv_min_rtt reset period (secs)");
+module_param(nv_min_cwnd, int, 0644);
+MODULE_PARM_DESC(nv_min_cwnd, "NV will not decrease cwnd below this value"
+" without losses");
+module_param(nv_dec_eval_min_calls, int, 0644);
+MODULE_PARM_DESC(nv_dec_eval_min_calls, "Wait for this many data points "
+"before declaring congestion (< 256)");
+module_param(nv_ssthresh_eval_min_calls, int, 0644);
+MODULE_PARM_DESC(nv_ssthresh_eval_min_calls, "Wait for this many data points "
+"before declaring congestion during initial slow-start");
+module_para

[RFC PATCH v4 net-next 2/4] tcp: refactor struct tcp_skb_cb

2015-07-24 Thread Lawrence Brakmo

Refactor tcp_skb_cb to create two overlapping areas to store
state for incoming or outgoing skbs, based on comments by
Neal Cardwell on the tcp_nv patch:

   AFAICT this patch would not require an increase in the size of
   sk_buff cb[] if it were to take advantage of the fact that the
   tcp_skb_cb header.h4 and header.h6 fields are only used in the packet
   reception code path, and this in_flight field is only used on the
   transmit side.
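Neal's point can be checked with a simplified layout. The 20-byte struct fake_skb_parm is a hypothetical stand-in for inet_skb_parm/inet6_skb_parm, and these structs are much smaller than the real tcp_skb_cb; the sketch only shows that putting tx and header in one union adds no bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 20-byte stand-in for struct inet_skb_parm. */
struct fake_skb_parm { uint8_t opaque[20]; };

/* Layout before the patch: only the receive-side header. */
struct cb_before {
	uint32_t ack_seq;
	union {
		struct fake_skb_parm h4;
	} header;
};

/* Layout after the patch: tx and header share the same bytes. */
struct cb_after {
	uint32_t ack_seq;
	union {
		struct {
			uint32_t in_flight;   /* only meaningful on transmit */
		} tx;
		union {
			struct fake_skb_parm h4;
		} header;                     /* only meaningful on receive */
	};
};

/* Overlapping the two areas adds no space, so cb[] need not grow. */
_Static_assert(sizeof(struct cb_after) == sizeof(struct cb_before),
	       "tx area must not enlarge the control block");
```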

Signed-off-by: Lawrence Brakmo 
---
 include/net/tcp.h | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 1e6c5b04..7c510ed 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -755,11 +755,16 @@ struct tcp_skb_cb {
/* 1 byte hole */
__u32   ack_seq;    /* Sequence number ACK'd */
union {
-   struct inet_skb_parm    h4;
+   struct {
+   /* There is space for up to 20 bytes */
+   } tx;   /* only used for outgoing skbs */
+   union {
+   struct inet_skb_parm    h4;
 #if IS_ENABLED(CONFIG_IPV6)
-   struct inet6_skb_parm   h6;
+   struct inet6_skb_parm   h6;
 #endif
-   } header;   /* For incoming frames  */
+   } header;   /* For incoming skbs */
+   };
 };
 
 #define TCP_SKB_CB(__skb)  ((struct tcp_skb_cb *)&((__skb)->cb[0]))
-- 
1.8.1


[RFC PATCH v4 net-next 3/4] tcp: add in_flight to tcp_skb_cb

2015-07-24 Thread Lawrence Brakmo

Add in_flight (bytes in flight when packet was sent) field
to tx component of tcp_skb_cb and make it available to
congestion modules' pkts_acked() function through the
ack_sample function argument.

Signed-off-by: Lawrence Brakmo 
---
 include/net/tcp.h | 2 ++
 net/ipv4/tcp_input.c  | 5 -
 net/ipv4/tcp_output.c | 4 +++-
 3 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 7c510ed..f850404 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -757,6 +757,7 @@ struct tcp_skb_cb {
union {
struct {
/* There is space for up to 20 bytes */
+   __u32 in_flight;    /* Bytes in flight when packet sent */
} tx;   /* only used for outgoing skbs */
union {
struct inet_skb_parm    h4;
@@ -842,6 +843,7 @@ union tcp_cc_info;
 struct ack_sample {
u32 pkts_acked;
s32 rtt_us;
+   u32 in_flight;
 };
 
 struct tcp_congestion_ops {
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 423d3af..3ab4178 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3068,6 +3068,7 @@ static int tcp_clean_rtx_queue(struct sock *sk, int prior_fackets,
long ca_rtt_us = -1L;
struct sk_buff *skb;
u32 pkts_acked = 0;
+   u32 last_in_flight = 0;
bool rtt_update;
int flag = 0;
 
@@ -3107,6 +3108,7 @@ static int tcp_clean_rtx_queue(struct sock *sk, int prior_fackets,
if (!first_ackt.v64)
first_ackt = last_ackt;
 
+   last_in_flight = TCP_SKB_CB(skb)->tx.in_flight;
reord = min(pkts_acked, reord);
if (!after(scb->end_seq, tp->high_seq))
flag |= FLAG_ORIG_SACK_ACKED;
@@ -3196,7 +3198,8 @@ static int tcp_clean_rtx_queue(struct sock *sk, int prior_fackets,
}
 
if (icsk->icsk_ca_ops->pkts_acked) {
-   struct ack_sample sample = {pkts_acked, ca_rtt_us};
+   struct ack_sample sample = {pkts_acked, ca_rtt_us,
+   last_in_flight};
 
icsk->icsk_ca_ops->pkts_acked(sk, &sample);
}
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 7105784..e9deab5 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -920,9 +920,12 @@ static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it,
int err;
 
BUG_ON(!skb || !tcp_skb_pcount(skb));
+   tp = tcp_sk(sk);
 
if (clone_it) {
skb_mstamp_get(&skb->skb_mstamp);
+   TCP_SKB_CB(skb)->tx.in_flight = TCP_SKB_CB(skb)->end_seq
+   - tp->snd_una;
 
if (unlikely(skb_cloned(skb)))
skb = pskb_copy(skb, gfp_mask);
@@ -933,7 +936,6 @@ static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it,
}
 
inet = inet_sk(sk);
-   tp = tcp_sk(sk);
tcb = TCP_SKB_CB(skb);
memset(&opts, 0, sizeof(opts));
 
-- 
1.8.1
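A userspace sketch of the two halves of this patch: tcp_transmit_skb() records end_seq - snd_una, and tcp_clean_rtx_queue() passes the value from the last acked skb in ack_sample. in_flight_at_send() and make_sample() are hypothetical helpers, and counting one packet per skb is a simplification (the kernel accumulates tcp_skb_pcount()).

```c
#include <stdint.h>

/* Same field order as the extended struct ack_sample in this patch. */
struct ack_sample {
	uint32_t pkts_acked;
	int32_t rtt_us;
	uint32_t in_flight;
};

/* Bytes in flight when the skb is sent. TCP sequence numbers wrap at
 * 2^32; unsigned subtraction still yields the right distance while it
 * stays below 2^31. */
static uint32_t in_flight_at_send(uint32_t end_seq, uint32_t snd_una)
{
	return end_seq - snd_una;
}

/* The sample carries the in_flight value of the last skb acked. */
static struct ack_sample make_sample(const uint32_t *skb_in_flight,
				     uint32_t nskbs, int32_t ca_rtt_us)
{
	struct ack_sample s = { nskbs, ca_rtt_us, 0 };

	if (nskbs > 0)
		s.in_flight = skb_in_flight[nskbs - 1];
	return s;
}
```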


[RFC PATCH v4 net-next 1/4] tcp: replace cnt & rtt with struct in pkts_acked()

2015-07-24 Thread Lawrence Brakmo

Replace 2 arguments (cnt and rtt) in the congestion control modules'
pkts_acked() function with a struct. This will allow adding more
information without having to modify existing congestion control
modules (tcp_nv in particular needs bytes in flight when packet
was sent).

As proposed by Neal Cardwell in his comments on the tcp_nv patch.
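The tcp_bic.c hunk of this patch folds two statements into one. A userspace sketch with hypothetical helper names, assuming ACK_RATIO_SHIFT is 4 (consistent with the "(15*ratio + sample) / 16" comment), shows that both forms compute the same fixed-point moving average, including when the u32 subtraction wraps.

```c
#include <stdint.h>

#define ACK_RATIO_SHIFT 4   /* ratio stored scaled by 16 */

/* New form: delayed_ack += pkts_acked - (delayed_ack >> 4). With
 * D = 16 * ratio this gives D' ~= 15/16 * D + pkts_acked, i.e.
 * ratio' ~= (15 * ratio + pkts_acked) / 16. If the u32 subtraction
 * wraps, the addition wraps back (unsigned arithmetic is modular). */
static uint32_t delayed_ack_update(uint32_t delayed_ack, uint32_t pkts_acked)
{
	delayed_ack += pkts_acked - (delayed_ack >> ACK_RATIO_SHIFT);
	return delayed_ack;
}

/* Old two-step form removed by the patch, for comparison. */
static uint32_t delayed_ack_update_old(uint32_t delayed_ack, uint32_t cnt)
{
	cnt -= delayed_ack >> ACK_RATIO_SHIFT;
	delayed_ack += cnt;
	return delayed_ack;
}
```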

Signed-off-by: Lawrence Brakmo 
---
 include/net/tcp.h       |  7 ++-
 net/ipv4/tcp_bic.c      |  6 +++---
 net/ipv4/tcp_cdg.c      | 14 +++---
 net/ipv4/tcp_cubic.c    |  6 +++---
 net/ipv4/tcp_htcp.c     | 10 +-
 net/ipv4/tcp_illinois.c | 20 ++--
 net/ipv4/tcp_input.c    |  7 +--
 net/ipv4/tcp_lp.c       |  6 +++---
 net/ipv4/tcp_vegas.c    |  6 +++---
 net/ipv4/tcp_vegas.h    |  2 +-
 net/ipv4/tcp_veno.c     |  6 +++---
 net/ipv4/tcp_westwood.c |  6 +++---
 net/ipv4/tcp_yeah.c     |  6 +++---
 13 files changed, 55 insertions(+), 47 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 364426a..1e6c5b04 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -834,6 +834,11 @@ enum tcp_ca_ack_event_flags {
 
 union tcp_cc_info;
 
+struct ack_sample {
+   u32 pkts_acked;
+   s32 rtt_us;
+};
+
 struct tcp_congestion_ops {
struct list_head    list;
u32 key;
@@ -857,7 +862,7 @@ struct tcp_congestion_ops {
/* new value of cwnd after loss (optional) */
u32  (*undo_cwnd)(struct sock *sk);
/* hook for packet ack accounting (optional) */
-   void (*pkts_acked)(struct sock *sk, u32 num_acked, s32 rtt_us);
+   void (*pkts_acked)(struct sock *sk, struct ack_sample *sample);
/* get info for inet_diag (optional) */
size_t (*get_info)(struct sock *sk, u32 ext, int *attr,
   union tcp_cc_info *info);
diff --git a/net/ipv4/tcp_bic.c b/net/ipv4/tcp_bic.c
index fd1405d..f237691 100644
--- a/net/ipv4/tcp_bic.c
+++ b/net/ipv4/tcp_bic.c
@@ -197,15 +197,15 @@ static void bictcp_state(struct sock *sk, u8 new_state)
 /* Track delayed acknowledgment ratio using sliding window
  * ratio = (15*ratio + sample) / 16
  */
-static void bictcp_acked(struct sock *sk, u32 cnt, s32 rtt)
+static void bictcp_acked(struct sock *sk, struct ack_sample *sample)
 {
const struct inet_connection_sock *icsk = inet_csk(sk);
 
if (icsk->icsk_ca_state == TCP_CA_Open) {
struct bictcp *ca = inet_csk_ca(sk);
 
-   cnt -= ca->delayed_ack >> ACK_RATIO_SHIFT;
-   ca->delayed_ack += cnt;
+   ca->delayed_ack += sample->pkts_acked - 
+   (ca->delayed_ack >> ACK_RATIO_SHIFT);
}
 }
 
diff --git a/net/ipv4/tcp_cdg.c b/net/ipv4/tcp_cdg.c
index 167b6a3..9fbdfa5 100644
--- a/net/ipv4/tcp_cdg.c
+++ b/net/ipv4/tcp_cdg.c
@@ -294,12 +294,12 @@ static void tcp_cdg_cong_avoid(struct sock *sk, u32 ack, u32 acked)
ca->shadow_wnd = max(ca->shadow_wnd, ca->shadow_wnd + incr);
 }
 
-static void tcp_cdg_acked(struct sock *sk, u32 num_acked, s32 rtt_us)
+static void tcp_cdg_acked(struct sock *sk, struct ack_sample *sample)
 {
struct cdg *ca = inet_csk_ca(sk);
struct tcp_sock *tp = tcp_sk(sk);
 
-   if (rtt_us <= 0)
+   if (sample->rtt_us <= 0)
return;
 
/* A heuristic for filtering delayed ACKs, adapted from:
@@ -307,20 +307,20 @@ static void tcp_cdg_acked(struct sock *sk, u32 num_acked, s32 rtt_us)
 * delay and rate based TCP mechanisms." TR 100219A. CAIA, 2010.
 */
if (tp->sacked_out == 0) {
-   if (num_acked == 1 && ca->delack) {
+   if (sample->pkts_acked == 1 && ca->delack) {
/* A delayed ACK is only used for the minimum if it is
 * provenly lower than an existing non-zero minimum.
 */
-   ca->rtt.min = min(ca->rtt.min, rtt_us);
+   ca->rtt.min = min(ca->rtt.min, sample->rtt_us);
ca->delack--;
return;
-   } else if (num_acked > 1 && ca->delack < 5) {
+   } else if (sample->pkts_acked > 1 && ca->delack < 5) {
ca->delack++;
}
}
 
-   ca->rtt.min = min_not_zero(ca->rtt.min, rtt_us);
-   ca->rtt.max = max(ca->rtt.max, rtt_us);
+   ca->rtt.min = min_not_zero(ca->rtt.min, sample->rtt_us);
+   ca->rtt.max = max(ca->rtt.max, sample->rtt_us);
 }
 
 static u32 tcp_cdg_ssthresh(struct sock *sk)
diff --git a/net/ipv4/tcp_cubic.c b/net/ipv4/tcp_cubic.c
index 28011fb..9817a8f 100644
--- a/net/ipv4/tcp_cubic.c
+++ b/net/ipv4/tcp_cubic.c
@@ -416,21 +416,21 @@ static void hystart_update(struct sock *sk, u32 delay)
 /* Track delayed acknowledgment ratio using sliding window
  * ratio = (15*ratio + sample) / 16
  */
-static void bictcp_acked(struct sock *sk, u32 cnt, s32 rtt_us)
+static void bictcp_acked(struct sock *sk, struct ack_sample *sample)
 {
con

[PATCH net-next 1/1] e1000: remove dead e1000_init_eeprom_params calls.

2015-07-24 Thread Francois Romieu

The device probe method e1000_probe calls e1000_init_eeprom_params
itself so there's no reason to call it again from e1000_do_write_eeprom
or e1000_do_read_eeprom.

The sentence above assumes that e1000_init_eeprom_params is effective
but it's mostly dependent on "hw->mac_type": safe as e1000_probe bails
out early if it can't set mac_type (see e1000_init_hw_struct, then
e1000_set_mac_type).

Btw, if effective, the removed paths would have been deadlock-prone when
e1000_eeprom_spi was set:
-> e1000_write_eeprom (takes e1000_eeprom_lock)
   -> e1000_do_write_eeprom
  -> e1000_init_eeprom_params
 -> e1000_read_eeprom (takes e1000_eeprom_lock)

(same narrative with e1000_read_eeprom -> e1000_do_read_eeprom etc.)

As a final note, the candidate deadlock above can't happen in e1000_probe
due to the way eeprom->word_size is set / tested.
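The lock recursion can be shown in userspace with a non-recursive pthread mutex standing in for e1000_eeprom_lock (its exact kernel type is an assumption here; build with -pthread on older glibc). nested_read_would_block() is a hypothetical helper: pthread_mutex_trylock() reports EBUSY exactly where the nested kernel path would block forever.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Stand-in for e1000_eeprom_lock: a default, non-recursive mutex. */
static pthread_mutex_t eeprom_lock = PTHREAD_MUTEX_INITIALIZER;

/* Outer lock models e1000_write_eeprom(); the inner attempt models the
 * e1000_read_eeprom() call reached through e1000_init_eeprom_params().
 * trylock is used so the sketch reports the problem instead of hanging. */
static bool nested_read_would_block(void)
{
	bool blocked;

	pthread_mutex_lock(&eeprom_lock);
	blocked = pthread_mutex_trylock(&eeprom_lock) == EBUSY;
	if (!blocked)
		pthread_mutex_unlock(&eeprom_lock);
	pthread_mutex_unlock(&eeprom_lock);
	return blocked;
}
```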

Signed-off-by: Francois Romieu 
---

Untested. I found it while looking at Joern's patch.

 drivers/net/ethernet/intel/e1000/e1000_hw.c | 8 
 1 file changed, 8 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000/e1000_hw.c b/drivers/net/ethernet/intel/e1000/e1000_hw.c
index 45c8c864..b1af0d6 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_hw.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_hw.c
@@ -3900,10 +3900,6 @@ static s32 e1000_do_read_eeprom(struct e1000_hw *hw, u16 offset, u16 words,
return E1000_SUCCESS;
}
 
-   /* If eeprom is not yet detected, do so now */
-   if (eeprom->word_size == 0)
-   e1000_init_eeprom_params(hw);
-
/* A check for invalid values:  offset too large, too many words, and
 * not enough words.
 */
@@ -4074,10 +4070,6 @@ static s32 e1000_do_write_eeprom(struct e1000_hw *hw, u16 offset, u16 words,
return E1000_SUCCESS;
}
 
-   /* If eeprom is not yet detected, do so now */
-   if (eeprom->word_size == 0)
-   e1000_init_eeprom_params(hw);
-
/* A check for invalid values:  offset too large, too many words, and
 * not enough words.
 */
-- 
2.4.3


Re: [RFC Patch net-next] inet: introduce a sysctl ip_local_ports_strict_use

2015-07-24 Thread Cong Wang

On Wed, Jul 22, 2015 at 10:39 PM, Stephen Hemminger
 wrote:
> On Wed, 22 Jul 2015 17:07:37 -0700
> Cong Wang  wrote:
>
>> For a real example, named randomly selects some port to bind() for
>> security reasons. (It doesn't use bind(0) to let the kernel select the
>> port because that is not random enough; the kernel usually just picks the
>> next available one.) When running named on a Mesos-controlled host, named
>> would silently fail when it binds a port assigned to a Mesos container.
>
> I think named is trying to work around security issues that were fixed
> 5 years ago in Linux. The kernel does not just pick the next available
> in current code.
>

Good to know that. I will rephrase the changelog.

Thanks.
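For anyone who wants to see what bind(0) hands out on a given kernel, a minimal sketch that binds a UDP socket to 127.0.0.1 port 0 and reads the chosen port back with getsockname(). kernel_chosen_port() is a hypothetical helper name, and the sketch makes no claim about how random the selection is; calling it repeatedly and comparing results is left to the reader.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns the ephemeral port picked by the kernel (host byte order),
 * or 0 on failure. */
static unsigned short kernel_chosen_port(void)
{
	struct sockaddr_in addr;
	socklen_t len = sizeof(addr);
	unsigned short port = 0;
	int fd = socket(AF_INET, SOCK_DGRAM, 0);

	if (fd < 0)
		return 0;
	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = 0;                    /* let the kernel choose */
	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0 &&
	    getsockname(fd, (struct sockaddr *)&addr, &len) == 0)
		port = ntohs(addr.sin_port);
	close(fd);
	return port;
}
```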

Re: [PATCH]mlx4-core: fix possible use after free in cq_completion

2015-07-24 Thread Or Gerlitz

On Fri, Jul 24, 2015 at 11:18 AM, Jinpu Wang
 wrote:
> I hit a bug in OFED and reported it at the link below:
> http://marc.info/?l=linux-rdma&m=143634872328553&w=2
> I checked the latest mainline, Linux 4.2-rc3, and it has a similar bug.
> Here is the patch against Linux 4.2-rc3, compile-tested only.

Did you see the bug hit, and the fix in action, on upstream?! If
not, it would be very helpful if you did so. Anyway, I'll ask Jack to
look at that next week.

Or.

Re: [RFC PATCH net] sctp: ASCONF-ACK with Unresolvable Address should be sent

2015-07-24 Thread Marcelo Ricardo Leitner

On Fri, Jul 24, 2015 at 02:56:29PM +0800, Xin Long wrote:
> RFC 5061:
> This is an opaque integer assigned by the sender to identify each
> request parameter.  The receiver of the ASCONF Chunk will copy this
> 32-bit value into the ASCONF Response Correlation ID field of the
> ASCONF-ACK response parameter.  The sender of the ASCONF can use this
> same value in the ASCONF-ACK to find which request the response is
> for.  Note that the receiver MUST NOT change this 32-bit value.
> 
> Address Parameter: TLV
> 
> This field contains an IPv4 or IPv6 address parameter, as described
> in Section 3.3.2.1 of [RFC4960].
> 
> An ASCONF-ACK with an Error Cause Indication Parameter (Unresolvable Address)
> should be sent if the address in a Delete IP Address parameter is not part
> of the association.
> 
>   Endpoint A                          Endpoint B
>   (ESTABLISHED)                       (ESTABLISHED)
> 
>   ASCONF        ----------------->
>   (Delete IP Address)
>                 <-----------------    ASCONF-ACK
>                                       (Unresolvable Address)
> 
> Signed-off-by: Xin Long 
> ---
>  net/sctp/sm_make_chunk.c | 12 +++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
> index 06320c8..88d82ef 100644
> --- a/net/sctp/sm_make_chunk.c
> +++ b/net/sctp/sm_make_chunk.c
> @@ -3090,8 +3090,18 @@ static __be16 sctp_process_asconf_param(struct sctp_association *asoc,

Please let's avoid increasing the indentation level when possible

>   sctp_assoc_set_primary(asoc, asconf->transport);
>   sctp_assoc_del_nonprimary_peers(asoc,
>   asconf->transport);
add a return here

> - } else
> + } else {
and remove this else {}
and we're good.

sctp code is often too indented, trying to reduce that bit here and
there.

> + /* If the address is not part of the association, the
> +  * ASCONF-ACK with Error Cause Indication Parameter
> +  * which includes the Unresolvable Address cause should
> +  * be sent.
> +  */
> + peer = sctp_assoc_lookup_paddr(asoc, &addr);
> + if (!peer)
> + return SCTP_ERROR_DNS_FAILED;
> +
>   sctp_assoc_del_peer(asoc, &addr);

Here we can replace this call to sctp_assoc_rm_peer() , because if we
already have peer, we don't have to search for it again.

Thanks,
Marcelo

> + }
>   break;
>   case SCTP_PARAM_SET_PRIMARY:
>   /* ADDIP Section 4.2.4
> -- 
> 2.1.0
> 
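
For illustration, here is a host-side C sketch of the control flow suggested in the review: return early from the first branch instead of using else, and remove the peer through the pointer the lookup already returned rather than searching again by address. All types, helpers and result codes below are hypothetical simplifications, not the SCTP kernel API. The condition of the first branch is not visible in the quoted hunk, so it is modelled as addr == 0 (a wildcard address) purely for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PEERS 4

/* Hypothetical, heavily simplified stand-ins for the SCTP association
 * and its peer transports; addresses are plain ints here. */
struct peer {
	int addr;
	int in_use;
};

struct assoc {
	struct peer peers[MAX_PEERS];
	struct peer *primary;
};

/* Hypothetical result codes; only their distinctness matters. */
enum { ERR_NONE = 0, ERR_UNRESOLVABLE_ADDR = 1 };

/* Find an in-use peer by address, or return NULL. */
static struct peer *lookup_paddr(struct assoc *a, int addr)
{
	for (size_t i = 0; i < MAX_PEERS; i++)
		if (a->peers[i].in_use && a->peers[i].addr == addr)
			return &a->peers[i];
	return NULL;
}

/* Remove a peer we already hold a pointer to: no second lookup. */
static void rm_peer(struct peer *p)
{
	assert(p != NULL && p->in_use);
	p->in_use = 0;
}

/* Make @keep the primary and drop every other peer. */
static void keep_only(struct assoc *a, struct peer *keep)
{
	a->primary = keep;
	for (size_t i = 0; i < MAX_PEERS; i++)
		if (&a->peers[i] != keep)
			a->peers[i].in_use = 0;
}

/* Delete-IP handling written flat, as suggested in the review. */
static int process_del_ip(struct assoc *a, int addr, struct peer *src)
{
	struct peer *peer;

	if (addr == 0) {		/* modelled wildcard branch */
		keep_only(a, src);
		return ERR_NONE;	/* early return: no else needed */
	}

	peer = lookup_paddr(a, addr);
	if (!peer)
		return ERR_UNRESOLVABLE_ADDR;

	rm_peer(peer);			/* reuse the lookup result */
	return ERR_NONE;
}
```

The flat shape keeps the common delete path at one indentation level, and the lookup result serves both the "unresolvable address" check and the removal.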



[PATCH] net: netcp: Fixes SGMII reset on network interface shutdown

2015-07-24 Thread WingMan Kwok

This patch asserts SGMII RTRESET, i.e. resets the SGMII Tx/Rx
logic, during network interface shutdown to keep the hardware from
wedging when the interface is shut down under high incoming traffic
rates. RTRESET is cleared (bringing the logic out of reset) when the
interface is brought back up.


Signed-off-by: WingMan Kwok 
---
This patch depends on the patch set 

Subject: [net-next PATCH v1 0/6] net: netcp: Bug fixes of CPSW statistics collection

submitted earlier.

 drivers/net/ethernet/ti/netcp.h   |1 +
 drivers/net/ethernet/ti/netcp_ethss.c |   18 ++
 drivers/net/ethernet/ti/netcp_sgmii.c |   30 --
 3 files changed, 47 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/ti/netcp.h b/drivers/net/ethernet/ti/netcp.h
index bbacf5c..a8a7306 100644
--- a/drivers/net/ethernet/ti/netcp.h
+++ b/drivers/net/ethernet/ti/netcp.h
@@ -223,6 +223,7 @@ void *netcp_device_find_module(struct netcp_device *netcp_device,
 
 /* SGMII functions */
 int netcp_sgmii_reset(void __iomem *sgmii_ofs, int port);
+bool netcp_sgmii_rtreset(void __iomem *sgmii_ofs, int port, bool set);
 int netcp_sgmii_get_port_link(void __iomem *sgmii_ofs, int port);
 int netcp_sgmii_config(void __iomem *sgmii_ofs, int port, u32 interface);
 
diff --git a/drivers/net/ethernet/ti/netcp_ethss.c b/drivers/net/ethernet/ti/netcp_ethss.c
index 7782120..571cf7a 100644
--- a/drivers/net/ethernet/ti/netcp_ethss.c
+++ b/drivers/net/ethernet/ti/netcp_ethss.c
@@ -2101,11 +2101,28 @@ static void gbe_port_config(struct gbe_priv *gbe_dev, struct gbe_slave *slave,
writel(slave->mac_control, GBE_REG_ADDR(slave, emac_regs, mac_control));
 }
 
+static void gbe_sgmii_rtreset(struct gbe_priv *priv,
+ struct gbe_slave *slave, bool set)
+{
+   void __iomem *sgmii_port_regs;
+
+   if (SLAVE_LINK_IS_XGMII(slave))
+   return;
+
+   if ((priv->ss_version == GBE_SS_VERSION_14) && (slave->slave_num >= 2))
+   sgmii_port_regs = priv->sgmii_port34_regs;
+   else
+   sgmii_port_regs = priv->sgmii_port_regs;
+
+   netcp_sgmii_rtreset(sgmii_port_regs, slave->slave_num, set);
+}
+
 static void gbe_slave_stop(struct gbe_intf *intf)
 {
struct gbe_priv *gbe_dev = intf->gbe_dev;
struct gbe_slave *slave = intf->slave;
 
+   gbe_sgmii_rtreset(gbe_dev, slave, true);
gbe_port_reset(slave);
/* Disable forwarding */
cpsw_ale_control_set(gbe_dev->ale, slave->port_num,
@@ -2147,6 +2164,7 @@ static int gbe_slave_open(struct gbe_intf *gbe_intf)
 
gbe_sgmii_config(priv, slave);
gbe_port_reset(slave);
+   gbe_sgmii_rtreset(priv, slave, false);
gbe_port_config(priv, slave, priv->rx_packet_max);
gbe_set_slave_mac(slave, gbe_intf);
/* enable forwarding */
diff --git a/drivers/net/ethernet/ti/netcp_sgmii.c b/drivers/net/ethernet/ti/netcp_sgmii.c
index dbeb142..5d8419f 100644
--- a/drivers/net/ethernet/ti/netcp_sgmii.c
+++ b/drivers/net/ethernet/ti/netcp_sgmii.c
@@ -18,6 +18,9 @@
 
 #include "netcp.h"
 
+#define SGMII_SRESET_RESET BIT(0)
+#define SGMII_SRESET_RTRESET   BIT(1)
+
 #define SGMII_REG_STATUS_LOCK  BIT(4)
 #define SGMII_REG_STATUS_LINK   BIT(0)
 #define SGMII_REG_STATUS_AUTONEG   BIT(2)
@@ -51,12 +54,35 @@ static void sgmii_write_reg_bit(void __iomem *base, int reg, u32 val)
 int netcp_sgmii_reset(void __iomem *sgmii_ofs, int port)
 {
/* Soft reset */
-   sgmii_write_reg_bit(sgmii_ofs, SGMII_SRESET_REG(port), 0x1);
-   while (sgmii_read_reg(sgmii_ofs, SGMII_SRESET_REG(port)) != 0x0)
+   sgmii_write_reg_bit(sgmii_ofs, SGMII_SRESET_REG(port),
+   SGMII_SRESET_RESET);
+
+   while ((sgmii_read_reg(sgmii_ofs, SGMII_SRESET_REG(port)) &
+   SGMII_SRESET_RESET) != 0x0)
;
+
return 0;
 }
 
+/* port is 0 based */
+bool netcp_sgmii_rtreset(void __iomem *sgmii_ofs, int port, bool set)
+{
+   u32 reg;
+   bool oldval;
+
+   /* Initiate a soft reset */
+   reg = sgmii_read_reg(sgmii_ofs, SGMII_SRESET_REG(port));
+   oldval = (reg & SGMII_SRESET_RTRESET) != 0x0;
+   if (set)
+   reg |= SGMII_SRESET_RTRESET;
+   else
+   reg &= ~SGMII_SRESET_RTRESET;
+   sgmii_write_reg(sgmii_ofs, SGMII_SRESET_REG(port), reg);
+   wmb();
+
+   return oldval;
+}
+
 int netcp_sgmii_get_port_link(void __iomem *sgmii_ofs, int port)
 {
u32 status = 0, link = 0;
-- 
1.7.9.5
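
A host-side sketch of the two register operations this patch relies on, with the SRESET register modelled as a plain variable instead of MMIO. The helper names are hypothetical; the bit positions come from the patch's SGMII_SRESET_* defines. It also suggests why the soft-reset poll is now masked: once RTRESET is held set across shutdown, a poll that waits for the whole register to read 0 would presumably never finish.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit layout taken from the patch's SGMII_SRESET_* defines. */
#define SRESET_RESET   (1u << 0)
#define SRESET_RTRESET (1u << 1)

/* Set or clear RTRESET in a simulated SRESET register (a plain variable
 * standing in for MMIO) and return the previous RTRESET state, mirroring
 * the read-modify-write in netcp_sgmii_rtreset(). */
static bool rtreset_update(uint32_t *reg, bool set)
{
	uint32_t val = *reg;
	bool old = (val & SRESET_RTRESET) != 0;

	if (set)
		val |= SRESET_RTRESET;
	else
		val &= ~SRESET_RTRESET;
	*reg = val;
	return old;
}

/* Old poll condition: busy until the whole register reads back as 0. */
static bool soft_reset_busy_old(uint32_t reg)
{
	return reg != 0x0;
}

/* New poll condition: busy only while the RESET bit is still set. */
static bool soft_reset_busy_new(uint32_t reg)
{
	return (reg & SRESET_RESET) != 0x0;
}
```

With RTRESET set and RESET already self-cleared, the old condition still reports "busy" while the masked one does not.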


[PATCH v1 6/6] net/macb: convert to kernel doc

2015-07-24 Thread Andy Shevchenko

This patch converts the struct description to the kernel-doc format. There is no
functional change.

Signed-off-by: Andy Shevchenko 
---
 include/linux/platform_data/macb.h | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/include/linux/platform_data/macb.h b/include/linux/platform_data/macb.h
index 044a124..21b15f6 100644
--- a/include/linux/platform_data/macb.h
+++ b/include/linux/platform_data/macb.h
@@ -8,11 +8,19 @@
 #ifndef __MACB_PDATA_H__
 #define __MACB_PDATA_H__
 
+/**
+ * struct macb_platform_data - platform data for MACB Ethernet
+ * @phy_mask:  phy mask passed when register the MDIO bus
+ * within the driver
+ * @phy_irq_pin:   PHY IRQ
+ * @is_rmii:   using RMII interface?
+ * @rev_eth_addr:  reverse Ethernet address byte order
+ */
 struct macb_platform_data {
u32 phy_mask;
-   int phy_irq_pin;/* PHY IRQ */
-   u8  is_rmii;/* using RMII interface? */
-   u8  rev_eth_addr;   /* reverse Ethernet address byte order */
+   int phy_irq_pin;
+   u8  is_rmii;
+   u8  rev_eth_addr;
 };
 
 #endif /* __MACB_PDATA_H__ */
-- 
2.4.6


[PATCH v1 4/6] net/macb: suppress compiler warnings

2015-07-24 Thread Andy Shevchenko

This patch fixes the following warnings:
drivers/net/ethernet/cadence/macb.c: In function ‘macb_handle_link_change’:
drivers/net/ethernet/cadence/macb.c:266: warning: comparison between signed and unsigned
drivers/net/ethernet/cadence/macb.c:267: warning: comparison between signed and unsigned
drivers/net/ethernet/cadence/macb.c:291: warning: comparison between signed and unsigned
drivers/net/ethernet/cadence/macb.c: In function ‘gem_update_stats’:
drivers/net/ethernet/cadence/macb.c:1908: warning: comparison between signed and unsigned
drivers/net/ethernet/cadence/macb.c: In function ‘gem_get_ethtool_strings’:
drivers/net/ethernet/cadence/macb.c:1988: warning: comparison between signed and unsigned

Signed-off-by: Andy Shevchenko 
---
 drivers/net/ethernet/cadence/macb.c | 5 ++---
 drivers/net/ethernet/cadence/macb.h | 6 +++---
 2 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
index 367fc9d..13d7e96 100644
--- a/drivers/net/ethernet/cadence/macb.c
+++ b/drivers/net/ethernet/cadence/macb.c
@@ -303,7 +303,6 @@ static void macb_handle_link_change(struct net_device *dev)
struct macb *bp = netdev_priv(dev);
struct phy_device *phydev = bp->phy_dev;
unsigned long flags;
-
int status_change = 0;
 
spin_lock_irqsave(&bp->lock, flags);
@@ -1936,7 +1935,7 @@ static int macb_change_mtu(struct net_device *dev, int new_mtu)
 
 static void gem_update_stats(struct macb *bp)
 {
-   int i;
+   unsigned int i;
u32 *p = &bp->hw_stats.gem.tx_octets_31_0;
 
for (i = 0; i < GEM_STATS_LEN; ++i, ++p) {
@@ -2015,7 +2014,7 @@ static int gem_get_sset_count(struct net_device *dev, int sset)
 
 static void gem_get_ethtool_strings(struct net_device *dev, u32 sset, u8 *p)
 {
-   int i;
+   unsigned int i;
 
switch (sset) {
case ETH_SS_STATS:
diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h
index f245340..2aa102e 100644
--- a/drivers/net/ethernet/cadence/macb.h
+++ b/drivers/net/ethernet/cadence/macb.h
@@ -816,9 +816,9 @@ struct macb {
 
struct mii_bus  *mii_bus;
struct phy_device   *phy_dev;
-   unsigned int    link;
-   unsigned int    speed;
-   unsigned int    duplex;
+   int link;
+   int speed;
+   int duplex;
 
u32 caps;
unsigned int    dma_burst_length;
-- 
2.4.6
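
A small C sketch of what -Wsign-compare is about, assuming GEM_STATS_LEN expands to an ARRAY_SIZE()-style size_t expression (the macro below is a simplified stand-in for the kernel's, without its array type check). In a mixed comparison the int operand is converted to unsigned, so a negative value compares as a very large one; an unsigned loop index keeps both sides of the comparison unsigned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified ARRAY_SIZE(): like the kernel macro it yields a size_t,
 * but it omits the kernel's "must be an array" check. */
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

static const unsigned int stats[] = { 1, 2, 3, 4 };

/* An unsigned index keeps "i < ARRAY_SIZE(stats)" unsigned on both
 * sides, so -Wsign-compare has nothing to report. */
static unsigned int sum_stats(void)
{
	unsigned int total = 0;

	for (unsigned int i = 0; i < ARRAY_SIZE(stats); i++)
		total += stats[i];
	return total;
}

/* What an int-vs-unsigned comparison actually evaluates: the usual
 * arithmetic conversions turn the int into an unsigned value first
 * (written out explicitly here), so -1 behaves like UINT_MAX. */
static bool int_less_than_unsigned(int v, unsigned int u)
{
	return (unsigned int)v < u;
}
```

The same reasoning presumably motivates turning link, speed and duplex into int, so that they compare cleanly with the PHY's int fields in macb_handle_link_change().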


[PATCH v1 0/6] net/macb: fix for AVR32 and clean up

2015-07-24 Thread Andy Shevchenko

It seems no one has tested the driver on AVR32 platforms such as
ATNGW100 recently. This series makes it work again.

Andy Shevchenko (6):
  net/macb: improve big endian CPU support
  net/macb: check if macb_config present
  net/macb: use dev_*() when netdev is not yet registered
  net/macb: suppress compiler warnings
  net/macb: replace macb_count_tx_descriptors() by DIV_ROUND_UP()
  net/macb: convert to kernel doc

 drivers/net/ethernet/cadence/macb.c | 125 
 drivers/net/ethernet/cadence/macb.h |  34 --
 include/linux/platform_data/macb.h  |  14 +++-
 3 files changed, 108 insertions(+), 65 deletions(-)

-- 
2.4.6


[PATCH v1 5/6] net/macb: replace macb_count_tx_descriptors() by DIV_ROUND_UP()

2015-07-24 Thread Andy Shevchenko

macb_count_tx_descriptors() duplicates the generic DIV_ROUND_UP() macro.
This patch replaces it with DIV_ROUND_UP().

There is no functional change.

Signed-off-by: Andy Shevchenko 
---
 drivers/net/ethernet/cadence/macb.c | 10 ++
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
index 13d7e96..5818c04 100644
--- a/drivers/net/ethernet/cadence/macb.c
+++ b/drivers/net/ethernet/cadence/macb.c
@@ -1157,12 +1157,6 @@ static void macb_poll_controller(struct net_device *dev)
 }
 #endif
 
-static inline unsigned int macb_count_tx_descriptors(struct macb *bp,
-unsigned int len)
-{
-   return (len + bp->max_tx_length - 1) / bp->max_tx_length;
-}
-
 static unsigned int macb_tx_map(struct macb *bp,
struct macb_queue *queue,
struct sk_buff *skb)
@@ -1313,11 +1307,11 @@ static int macb_start_xmit(struct sk_buff *skb, struct net_device *dev)
 * socket buffer: skb fragments of jumbo frames may need to be
 * splitted into many buffer descriptors.
 */
-   count = macb_count_tx_descriptors(bp, skb_headlen(skb));
+   count = DIV_ROUND_UP(skb_headlen(skb), bp->max_tx_length);
nr_frags = skb_shinfo(skb)->nr_frags;
for (f = 0; f < nr_frags; f++) {
frag_size = skb_frag_size(&skb_shinfo(skb)->frags[f]);
-   count += macb_count_tx_descriptors(bp, frag_size);
+   count += DIV_ROUND_UP(frag_size, bp->max_tx_length);
}
 
spin_lock_irqsave(&bp->lock, flags);
-- 
2.4.6
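
A quick check that DIV_ROUND_UP() gives the same result as the removed helper's (len + max_tx_length - 1) / max_tx_length, plus a hypothetical count_tx_descs() that follows the counting loop in macb_start_xmit(). The macro is repeated locally so the sketch compiles outside the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Same arithmetic as the removed macb_count_tx_descriptors(). */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical helper: descriptors needed for a linear head plus
 * nr_frags fragments when one descriptor holds at most max_len bytes. */
static unsigned int count_tx_descs(unsigned int headlen,
				   const unsigned int *frag_sizes,
				   size_t nr_frags, unsigned int max_len)
{
	unsigned int count;

	assert(max_len > 0);
	count = DIV_ROUND_UP(headlen, max_len);
	for (size_t f = 0; f < nr_frags; f++)
		count += DIV_ROUND_UP(frag_sizes[f], max_len);
	return count;
}
```

For example, with a 4096-byte limit a 4097-byte fragment needs two descriptors, while a 4096-byte one fits in a single descriptor.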


[PATCH v1 2/6] net/macb: check if macb_config present

2015-07-24 Thread Andy Shevchenko

Commit 98b5a0f4a228 introduced jumbo frame support, but it also assumes
that macb_config is present, which is not always true.

A configuration without macb_config fails to boot:

 Unable to handle kernel NULL pointer dereference at virtual address 0010
 ptbr = 9035 pgd = 
 Oops: Kernel access of bad area, sig: 11 [#1]
 FRAME_POINTER chip: 0x01f:0x1e82 rev 2
 Modules linked in:
 CPU: 0 PID: 1 Comm: swapper Not tainted 4.2.0-rc3-next-20150723+ #13
 task: 91c26000 ti: 91c28000 task.ti: 91c28000
 PC is at macb_probe+0x140/0x61c

Fixes: 98b5a0f4a228 (net: macb: Add support for jumbo frames)
Signed-off-by: Andy Shevchenko 
---
 drivers/net/ethernet/cadence/macb.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
index 6980115..7986778 100644
--- a/drivers/net/ethernet/cadence/macb.c
+++ b/drivers/net/ethernet/cadence/macb.c
@@ -2885,9 +2885,8 @@ static int macb_probe(struct platform_device *pdev)
bp->pclk = pclk;
bp->hclk = hclk;
bp->tx_clk = tx_clk;
-   if (macb_config->jumbo_max_len) {
+   if (macb_config)
bp->jumbo_max_len = macb_config->jumbo_max_len;
-   }
 
spin_lock_init(&bp->lock);
 
-- 
2.4.6
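
A minimal sketch of the guard, using hypothetical reduced structs. It assumes, as for the netdev_priv() area of a device allocated with alloc_etherdev(), that the private data starts zeroed, so skipping the copy when no config is present simply leaves jumbo_max_len at 0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reductions of struct macb_config and struct macb. */
struct cfg {
	unsigned int jumbo_max_len;
};

struct priv {
	unsigned int jumbo_max_len;
};

/* Copy the optional config into zero-initialised private data. The
 * pointer itself is tested, not a field behind it, so a missing
 * (NULL) config no longer causes a dereference. */
static void apply_cfg(struct priv *bp, const struct cfg *cfg)
{
	if (cfg)
		bp->jumbo_max_len = cfg->jumbo_max_len;
}
```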


[PATCH v1 3/6] net/macb: use dev_*() when netdev is not yet registered

2015-07-24 Thread Andy Shevchenko

To avoid messages like

macb macb.0 (unnamed net_device) (uninitialized): Cadence caps 0x
macb macb.0 (unnamed net_device) (uninitialized): invalid hw address, using random

let's use the dev_*() macros instead of netdev_*() while the net device
is not yet registered.

Signed-off-by: Andy Shevchenko 
---
 drivers/net/ethernet/cadence/macb.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
index 7986778..367fc9d 100644
--- a/drivers/net/ethernet/cadence/macb.c
+++ b/drivers/net/ethernet/cadence/macb.c
@@ -211,7 +211,7 @@ static void macb_get_hwaddr(struct macb *bp)
}
}
 
-   netdev_info(bp->dev, "invalid hw address, using random\n");
+   dev_info(&bp->pdev->dev, "invalid hw address, using random\n");
eth_hw_addr_random(bp->dev);
 }
 
@@ -2240,7 +2240,7 @@ static void macb_configure_caps(struct macb *bp, const struct macb_config *dt_co
bp->caps |= MACB_CAPS_FIFO_MODE;
}
 
-   netdev_dbg(bp->dev, "Cadence caps 0x%08x\n", bp->caps);
+   dev_dbg(&bp->pdev->dev, "Cadence caps 0x%08x\n", bp->caps);
 }
 
 static void macb_probe_queues(void __iomem *mem,
-- 
2.4.6


[PATCH v1 1/6] net/macb: improve big endian CPU support

2015-07-24 Thread Andy Shevchenko

Commit a50dad355a53 (net: macb: Add big endian CPU support) converted the I/O
accessors to readl_relaxed() and writel_relaxed() and consequently broke the
MACB driver on AVR32 platforms such as ATNGW100.

This patch improves I/O access by detecting the endianness first and then
using the corresponding accessors.

Fixes: a50dad355a53 (net: macb: Add big endian CPU support)
Signed-off-by: Andy Shevchenko 
---
 drivers/net/ethernet/cadence/macb.c | 103 ++--
 drivers/net/ethernet/cadence/macb.h |  28 --
 2 files changed, 87 insertions(+), 44 deletions(-)

diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
index a4e3f86..6980115 100644
--- a/drivers/net/ethernet/cadence/macb.c
+++ b/drivers/net/ethernet/cadence/macb.c
@@ -104,6 +104,57 @@ static void *macb_rx_buffer(struct macb *bp, unsigned int index)
return bp->rx_buffers + bp->rx_buffer_size * macb_rx_ring_wrap(index);
 }
 
+/* I/O accessors */
+static u32 hw_readl_native(struct macb *bp, int offset)
+{
+   return __raw_readl(bp->regs + offset);
+}
+
+static void hw_writel_native(struct macb *bp, int offset, u32 value)
+{
+   __raw_writel(value, bp->regs + offset);
+}
+
+static u32 hw_readl(struct macb *bp, int offset)
+{
+   return readl_relaxed(bp->regs + offset);
+}
+
+static void hw_writel(struct macb *bp, int offset, u32 value)
+{
+   writel_relaxed(value, bp->regs + offset);
+}
+
+/*
+ * Find the CPU endianness by using the loopback bit of NCR register. When the
+ * CPU is in big endian we need to program swaped mode for management
+ * descriptor access.
+ */
+static bool hw_is_native_io(void __iomem *addr)
+{
+   u32 value = MACB_BIT(LLB);
+
+   __raw_writel(value, addr + MACB_NCR);
+   value = __raw_readl(addr + MACB_NCR);
+
+   /* Write 0 back to disable everything */
+   __raw_writel(0, addr + MACB_NCR);
+
+   return value == MACB_BIT(LLB);
+}
+
+static bool hw_is_gem(void __iomem *addr, bool native_io)
+{
+   u32 id;
+
+   if (native_io)
+   id = __raw_readl(addr + MACB_MID);
+   else
+   id = readl_relaxed(addr + MACB_MID);
+
+   return MACB_BFEXT(IDNUM, id) >= 0x2;
+}
+
 static void macb_set_hwaddr(struct macb *bp)
 {
u32 bottom;
@@ -449,14 +500,14 @@ err_out:
 
 static void macb_update_stats(struct macb *bp)
 {
-   u32 __iomem *reg = bp->regs + MACB_PFR;
u32 *p = &bp->hw_stats.macb.rx_pause_frames;
u32 *end = &bp->hw_stats.macb.tx_pause_frames + 1;
+   int offset = MACB_PFR;
 
WARN_ON((unsigned long)(end - p - 1) != (MACB_TPF - MACB_PFR) / 4);
 
-   for(; p < end; p++, reg++)
-   *p += readl_relaxed(reg);
+   for(; p < end; p++, offset += 4)
+   *p += bp->readl(bp, offset);
 }
 
 static int macb_halt_tx(struct macb *bp)
@@ -1603,7 +1654,6 @@ static u32 macb_dbw(struct macb *bp)
 static void macb_configure_dma(struct macb *bp)
 {
u32 dmacfg;
-   u32 tmp, ncr;
 
if (macb_is_gem(bp)) {
dmacfg = gem_readl(bp, DMACFG) & ~GEM_BF(RXBS, -1L);
@@ -1613,22 +1663,11 @@ static void macb_configure_dma(struct macb *bp)
dmacfg |= GEM_BIT(TXPBMS) | GEM_BF(RXBMS, -1L);
dmacfg &= ~GEM_BIT(ENDIA_PKT);
 
-   /* Find the CPU endianness by using the loopback bit of net_ctrl
-* register. save it first. When the CPU is in big endian we
-* need to program swaped mode for management descriptor access.
-*/
-   ncr = macb_readl(bp, NCR);
-   __raw_writel(MACB_BIT(LLB), bp->regs + MACB_NCR);
-   tmp =  __raw_readl(bp->regs + MACB_NCR);
-
-   if (tmp == MACB_BIT(LLB))
+   if (bp->native_io)
dmacfg &= ~GEM_BIT(ENDIA_DESC);
else
dmacfg |= GEM_BIT(ENDIA_DESC); /* CPU in big endian */
 
-   /* Restore net_ctrl */
-   macb_writel(bp, NCR, ncr);
-
if (bp->dev->features & NETIF_F_HW_CSUM)
dmacfg |= GEM_BIT(TXCOEN);
else
@@ -1902,14 +1941,14 @@ static void gem_update_stats(struct macb *bp)
 
for (i = 0; i < GEM_STATS_LEN; ++i, ++p) {
u32 offset = gem_statistics[i].offset;
-   u64 val = readl_relaxed(bp->regs + offset);
+   u64 val = bp->readl(bp, offset);
 
bp->ethtool_stats[i] += val;
*p += val;
 
if (offset == GEM_OCTTXL || offset == GEM_OCTRXL) {
/* Add GEM_OCTTXH, GEM_OCTRXH */
-   val = readl_relaxed(bp->regs + offset + 4);
+   val = bp->readl(bp, offset + 4);
bp->ethtool_stats[i] += ((u64)val) << 32;
*(++p) += val;
}
@@ -2190,7 +2229,7 @@ static void macb_configure_caps(struct macb *bp, const struct macb_config

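A host-side sketch of the accessor-selection idea in the big-endian patch above: decide once whether native (__raw_*) or little-endian accesses are needed, store the choice in a function pointer, and route every register read through it, as macb_update_stats() and gem_update_stats() now do via bp->readl(). The register window is a byte array standing in for MMIO, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical device model: a little-endian register window held as
 * raw bytes, plus the read accessor chosen for it. Not real MMIO. */
struct dev {
	uint8_t regs[16];
	uint32_t (*readl)(const struct dev *d, int offset);
};

/* "Native" access: reinterpret the bytes in CPU byte order, which is
 * what __raw_readl() does on real hardware. */
static uint32_t readl_native(const struct dev *d, int offset)
{
	uint32_t v;

	memcpy(&v, d->regs + offset, sizeof(v));
	return v;
}

/* Little-endian access: assemble the value byte by byte, so the result
 * is the same on little- and big-endian CPUs. */
static uint32_t readl_le(const struct dev *d, int offset)
{
	const uint8_t *p = d->regs + offset;

	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Pick the accessor once, as the patch does at probe time; later reads
 * go through d->readl(d, offset) without re-checking endianness. */
static void dev_init(struct dev *d, int native_io)
{
	d->readl = native_io ? readl_native : readl_le;
}
```

Paying one indirect call per access avoids repeating the endianness test, which the old macb_configure_dma() did by poking NCR each time.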
Re: Several races in "usbnet" module (kernel 4.1.x)

2015-07-24 Thread Eugene Shatokhin


21.07.2015 15:04, Oliver Neukum wrote:

On Mon, 2015-07-20 at 21:13 +0300, Eugene Shatokhin wrote:

Hi,

I have recently found several data races in "usbnet" module, checked on
vanilla kernel 4.1.0 on x86_64. The races do actually happen, I have
confirmed it by adding delays and using hardware breakpoints to detect
the conflicting memory accesses (with RaceHound tool,
https://github.com/winnukem/racehound).

I have not yet analyzed how harmful these races are (if they are harmful
at all), but I think it is better to report them anyway.

Everything was checked using YOTA 4G LTE Modem that works via "usbnet"
and "cdc_ether" kernel modules.
--

[Race #1]

Race on skb_queue ('next' pointer) between usbnet_stop() and rx_complete().

Reproduced that by unplugging the device while the system was
downloading a large file from the Net.

Here is part of the call stack with the code where the changes to the
queue happen:

#0 __skb_unlink (skbuff.h:1517) 
prev->next = next;
#1 defer_bh (usbnet.c:430)
spin_lock_irqsave(&list->lock, flags);
old_state = entry->state;
entry->state = state;
__skb_unlink(skb, list);
spin_unlock(&list->lock);
spin_lock(&dev->done.lock);
__skb_queue_tail(&dev->done, skb);
if (dev->done.qlen == 1)
tasklet_schedule(&dev->bh);
spin_unlock_irqrestore(&dev->done.lock, flags);
#2 rx_complete (usbnet.c:640)
state = defer_bh(dev, skb, &dev->rxq, state);

At the same time, the following code repeatedly checks if the queue is
empty and reads the same values concurrently with the above changes:

#0  usbnet_terminate_urbs (usbnet.c:765)
/* maybe wait for deletions to finish. */
while (!skb_queue_empty(&dev->rxq)
&& !skb_queue_empty(&dev->txq)
&& !skb_queue_empty(&dev->done)) {
schedule_timeout(msecs_to_jiffies(UNLINK_TIMEOUT_MS));
set_current_state(TASK_UNINTERRUPTIBLE);
netif_dbg(dev, ifdown, dev->net,
  "waited for %d urb completions\n", temp);
}
#1  usbnet_stop (usbnet.c:806)
if (!(info->flags & FLAG_AVOID_UNLINK_URBS))
usbnet_terminate_urbs(dev);

For example, it is possible that the skb is removed from dev->rxq by
__skb_unlink() before the check "!skb_queue_empty(&dev->rxq)" in
usbnet_terminate_urbs() is made. In this case, it is also possible that
the skb is added to the dev->done queue after "!skb_queue_empty(&dev->done)"
is checked. So usbnet_terminate_urbs() may stop waiting and return while
the dev->done queue still has an item.


Hi,

Your analysis is correct, and it looks like, in addition to your proposed
fix, the locking needs to be simplified and a common lock taken.
Suggestions?


Just an idea, I haven't tested it.

How about moving the operations with dev->done under &list->lock in 
defer_bh, while keeping dev->done.lock too and changing 
usbnet_terminate_urbs() as described below?


Like this:
@@ -428,12 +428,12 @@ static enum skb_state defer_bh(struct usbnet *dev, struct sk_buff *skb,

old_state = entry->state;
entry->state = state;
__skb_unlink(skb, list);
-   spin_unlock(&list->lock);
spin_lock(&dev->done.lock);
__skb_queue_tail(&dev->done, skb);
if (dev->done.qlen == 1)
tasklet_schedule(&dev->bh);
-   spin_unlock_irqrestore(&dev->done.lock, flags);
+   spin_unlock(&dev->done.lock);
+   spin_unlock_irqrestore(&list->lock, flags);
return old_state;
 }
---

usbnet_terminate_urbs() can then be changed as follows:

@@ -749,6 +749,20 @@ EXPORT_SYMBOL_GPL(usbnet_unlink_rx_urbs);


/*-*/

+static void wait_skb_queue_empty(struct sk_buff_head *q)
+{
+   unsigned long flags;
+
+   spin_lock_irqsave(&q->lock, flags);
+   while (!skb_queue_empty(q)) {
+   spin_unlock_irqrestore(&q->lock, flags);
+   schedule_timeout(msecs_to_jiffies(UNLINK_TIMEOUT_MS));
+   set_current_state(TASK_UNINTERRUPTIBLE);
+   spin_lock_irqsave(&q->lock, flags);
+   }
+   spin_unlock_irqrestore(&q->lock, flags);
+}
+
 // precondition: never called in_interrupt
 static void usbnet_terminate_urbs(struct usbnet *dev)
 {
@@ -762,14 +776,11 @@ static void usbnet_terminate_urbs(struct usbnet *dev)
unlink_urbs(dev, &dev->rxq);

/* maybe wait for deletions to finish. */
-   while (!skb_queue_empty(&dev->rxq)
-   && !skb_queue_empty(&dev->txq)
-   && !skb_queue_empty(&dev->done)) {
-   schedule_timeout(msecs_to_jiffies(UNLINK_TIMEOUT_MS));
-   set_current_state(TASK_UNINTERRUPTIBLE);
-   netif_dbg(dev, ifdown, dev->net,
- "waited for %d urb completions\n", temp);

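A deterministic replay of the interleaving described in the usbnet race report above, as a simplified model rather than a concurrency test: a single skb moves from rxq to done, and the waiter checks rxq and then done at one chosen point in between. If defer_bh() unlinks and enqueues in two separate critical sections, the waiter can see both queues empty; if, as proposed, both happen under list->lock (which the proposed wait_skb_queue_empty() also takes while checking rxq), that intermediate state cannot be observed. All names below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Where the single skb of this model currently is. */
enum where { IN_RXQ, IN_TRANSIT, IN_DONE };

static bool rxq_empty(enum where w)  { return w != IN_RXQ; }
static bool done_empty(enum where w) { return w != IN_DONE; }

/*
 * Replay one interleaving and report whether the waiter would wrongly
 * conclude that both queues are empty while the skb still exists.
 * split_locks: defer_bh() drops list->lock after __skb_unlink() and
 * only then queues the skb on dev->done, so the waiter can run in
 * between. Otherwise unlink and enqueue form one critical section,
 * so the waiter only ever sees the state before or after it.
 */
static bool waiter_misses_skb(bool split_locks)
{
	/* Completion side: first critical section of defer_bh(). */
	enum where w = split_locks ? IN_TRANSIT : IN_DONE;

	/* Waiter side runs now: check rxq, then dev->done. */
	bool rx_seen_empty = rxq_empty(w);
	bool done_seen_empty = done_empty(w);

	/* With split locks, the skb reaches dev->done only after this. */
	return rx_seen_empty && done_seen_empty;
}
```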
Re: [PATCH 2/3] brcmfmac: dhd_sdio.c: use existing atomic_or primitive

2015-07-24 Thread Vineet Gupta

On Friday 24 July 2015 08:02 PM, Kalle Valo wrote:
> Vineet Gupta  writes:
> 
>> > There's already a generic implementation so use that instead.
>> > ---
>> > I'm not sure if the driver usage of atomic_or?() is correct in terms of
>> > storage size of @val for 64 bit arches.
>> >
>> > Assuming LP64 programming model for linux on say x86_64: atomic_or()
>> > callers in this driver use long (sana 64 bit) storage and pass it to
>> > atomic_orr/atomic_or which downcasts it to 32 bits. Is that OK ?
>> > ---
>> > Cc: Brett Rudley 
>> > Cc: Arend van Spriel 
>> > Cc: "Franky (Zhenhui) Lin" 
>> > Cc: Hante Meuleman 
>> > Cc: Kalle Valo 
>> > Cc: Pieter-Paul Giesberts 
>> > Cc: Daniel Kim 
>> > Cc: linux-wirel...@vger.kernel.org
>> > Cc: brcm80211-dev-l...@broadcom.com
>> > Cc: Peter Zijlstra 
>> > Cc: Ingo Molnar 
>> > Cc: netdev@vger.kernel.org
>> > Cc: linux-a...@vger.kernel.org
>> > Cc: linux-ker...@vger.kernel.org
>> > Signed-off-by: Vineet Gupta 
>> >
>> > Signed-off-by: Vineet Gupta 
> What's the plan with this patch? Should I take it to my
> wireless-drivers-next tree or will someone else take it?


Per the last discussion on this topic, Arend wanted to discuss this with Hante.
I'm not taking it anyway, so feel free to pick it up if you want!

-Vineet

Re: [PATCH 2/3] brcmfmac: dhd_sdio.c: use existing atomic_or primitive

2015-07-24 Thread Kalle Valo

Vineet Gupta  writes:

> There's already a generic implementation so use that instead.
> ---
> I'm not sure if the driver usage of atomic_or?() is correct in terms of
> storage size of @val for 64 bit arches.
>
> Assuming LP64 programming model for linux on say x86_64: atomic_or()
> callers in this driver use long (sana 64 bit) storage and pass it to
> atomic_orr/atomic_or which downcasts it to 32 bits. Is that OK ?
> ---
> Cc: Brett Rudley 
> Cc: Arend van Spriel 
> Cc: "Franky (Zhenhui) Lin" 
> Cc: Hante Meuleman 
> Cc: Kalle Valo 
> Cc: Pieter-Paul Giesberts 
> Cc: Daniel Kim 
> Cc: linux-wirel...@vger.kernel.org
> Cc: brcm80211-dev-l...@broadcom.com
> Cc: Peter Zijlstra 
> Cc: Ingo Molnar 
> Cc: netdev@vger.kernel.org
> Cc: linux-a...@vger.kernel.org
> Cc: linux-ker...@vger.kernel.org
> Signed-off-by: Vineet Gupta 
>
> Signed-off-by: Vineet Gupta 

What's the plan with this patch? Should I take it to my
wireless-drivers-next tree or will someone else take it?

-- 
Kalle Valo
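
A user-space model of the LP64 concern raised in the patch notes: the flags live in a 64-bit variable (uint64_t stands in for long on x86_64), but the kernel's atomic_t wraps a 32-bit int, so ORing a long into it keeps only the low 32 bits. It assumes a 32-bit unsigned int, and it says nothing about whether the driver ever sets bits above 31; that remains the open question in this thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Stand-in for an atomic_t: a 32-bit atomic integer. */
static atomic_uint flags32;

/* Model of passing a long (uint64_t here, as on LP64 x86_64) to a
 * 32-bit atomic OR: converting to unsigned int keeps only bits 0..31. */
static void or_long_into_atomic(uint64_t mask)
{
	atomic_fetch_or(&flags32, (unsigned int)mask);
}
```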

Re: [PATCH net] tcp: fix recv with flags MSG_WAITALL | MSG_PEEK

2015-07-24 Thread Eric Dumazet

On Fri, 2015-07-24 at 18:19 +0200, Sabrina Dubroca wrote:
> Currently, tcp_recvmsg enters a busy loop in sk_wait_data if called
> with flags = MSG_WAITALL | MSG_PEEK.
> 
> sk_wait_data waits for sk_receive_queue to become non-empty, but in this
> case the receive queue is not empty; it just does not contain any skb
> that we can use.
> 
> Add a "last skb seen on receive queue" argument to sk_wait_data, so
> that it sleeps until the receive queue has new skbs.
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=99461
> Link: https://sourceware.org/bugzilla/show_bug.cgi?id=18493
> Link: https://bugzilla.redhat.com/show_bug.cgi?id=1205258
> Reported-by: Enrico Scholz 
> Reported-by: Dan Searle 
> Signed-off-by: Sabrina Dubroca 
> ---

Very nice !

Acked-by: Eric Dumazet 
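
A small model of the fix as described in the commit message, with a hypothetical singly linked list standing in for sk_receive_queue: waiting for "queue not empty" is already satisfied when MSG_PEEK leaves data on the queue, so the wait never sleeps, whereas comparing the queue tail against the last skb seen only wakes once something new arrives.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for an skb and sk_receive_queue. */
struct buf {
	struct buf *next;
};

struct queue {
	struct buf *head, *tail;
};

static void enqueue(struct queue *q, struct buf *b)
{
	b->next = NULL;
	if (q->tail)
		q->tail->next = b;
	else
		q->head = b;
	q->tail = b;
}

/* Old wake-up condition: the queue is non-empty. When peeked data stays
 * on the queue this is always true, hence the busy loop. */
static bool wake_old(const struct queue *q)
{
	return q->head != NULL;
}

/* New condition, modelled on the commit message: wake only when the
 * tail differs from the last buffer the reader has already seen. */
static bool wake_new(const struct queue *q, const struct buf *last_seen)
{
	return q->tail != last_seen;
}
```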



[PATCH net-next v2 1/2] ipv6: Re-arrange code in rt6_probe()

2015-07-24 Thread Martin KaFai Lau

This is preparatory work for the next patch, which avoids taking the
write_lock in rt6_probe()'s fast path.

1. Reduce the number of if (neigh) checks from 4 to 1.
2. Bring the write_(un)lock() closer to the operations that the
   lock is protecting.

Hopefully, the above makes rt6_probe() more readable.

Signed-off-by: Martin KaFai Lau 
Cc: Hannes Frederic Sowa 
Cc: Julian Anastasov 
Cc: YOSHIFUJI Hideaki 
---
 net/ipv6/route.c | 44 
 1 file changed, 20 insertions(+), 24 deletions(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 7f2214f..6d503db 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -545,6 +545,7 @@ static void rt6_probe_deferred(struct work_struct *w)
 
 static void rt6_probe(struct rt6_info *rt)
 {
+   struct __rt6_probe_work *work;
struct neighbour *neigh;
/*
 * Okay, this does not seem to be appropriate
@@ -559,34 +560,29 @@ static void rt6_probe(struct rt6_info *rt)
rcu_read_lock_bh();
neigh = __ipv6_neigh_lookup_noref(rt->dst.dev, &rt->rt6i_gateway);
if (neigh) {
+   work = NULL;
write_lock(&neigh->lock);
-   if (neigh->nud_state & NUD_VALID)
-   goto out;
-   }
-
-   if (!neigh ||
-   time_after(jiffies, neigh->updated + rt->rt6i_idev->cnf.rtr_probe_interval)) {
-   struct __rt6_probe_work *work;
-
-   work = kmalloc(sizeof(*work), GFP_ATOMIC);
-
-   if (neigh && work)
-   __neigh_set_probe_once(neigh);
-
-   if (neigh)
-   write_unlock(&neigh->lock);
-
-   if (work) {
-   INIT_WORK(&work->work, rt6_probe_deferred);
-   work->target = rt->rt6i_gateway;
-   dev_hold(rt->dst.dev);
-   work->dev = rt->dst.dev;
-   schedule_work(&work->work);
+   if (!(neigh->nud_state & NUD_VALID) &&
+   time_after(jiffies,
+  neigh->updated +
+  rt->rt6i_idev->cnf.rtr_probe_interval)) {
+   work = kmalloc(sizeof(*work), GFP_ATOMIC);
+   if (work)
+   __neigh_set_probe_once(neigh);
}
-   } else {
-out:
write_unlock(&neigh->lock);
+   } else {
+   work = kmalloc(sizeof(*work), GFP_ATOMIC);
+   }
+
+   if (work) {
+   INIT_WORK(&work->work, rt6_probe_deferred);
+   work->target = rt->rt6i_gateway;
+   dev_hold(rt->dst.dev);
+   work->dev = rt->dst.dev;
+   schedule_work(&work->work);
}
+
rcu_read_unlock_bh();
 }
 #else
-- 
1.8.1


[PATCH net-next v2 0/2] ipv6: Avoid rt6_probe() taking writer lock in the fast path

2015-07-24 Thread Martin KaFai Lau

v1 -> v2:
1. Separate the code re-arrangement into another patch
2. Fix style


[PATCH net-next v2 2/2] ipv6: Avoid rt6_probe() taking writer lock in the fast path

2015-07-24 Thread Martin KaFai Lau

The patch checks neigh->nud_state before acquiring the writer lock.
Note that rt6_probe() is only used when CONFIG_IPV6_ROUTER_PREF is enabled.

Test setup: 40 udpflood processes and a /64 gateway route.
The gateway is NUD_PERMANENT. Each run lasts 30s.
Total number of completed sendto() calls at the end:

Before: 55M
After: 95M

Signed-off-by: Martin KaFai Lau 
Cc: Hannes Frederic Sowa 
CC: Julian Anastasov 
CC: YOSHIFUJI Hideaki 
---
 net/ipv6/route.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 6d503db..76dcff8 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -560,6 +560,9 @@ static void rt6_probe(struct rt6_info *rt)
rcu_read_lock_bh();
neigh = __ipv6_neigh_lookup_noref(rt->dst.dev, &rt->rt6i_gateway);
if (neigh) {
+   if (neigh->nud_state & NUD_VALID)
+   goto out;
+
work = NULL;
write_lock(&neigh->lock);
if (!(neigh->nud_state & NUD_VALID) &&
@@ -583,6 +586,7 @@ static void rt6_probe(struct rt6_info *rt)
schedule_work(&work->work);
}
 
+out:
rcu_read_unlock_bh();
 }
 #else
-- 
1.8.1
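
A single-threaded sketch of the resulting control flow, with a hypothetical reduced neighbour struct and a counter in place of the real rwlock: an unlocked NUD_VALID check skips the writer lock on the fast path (a NUD_PERMANENT gateway, as in the benchmark, is part of NUD_VALID), and the state is checked again under the lock because it may change in between. The probe-interval test and the deferred work are omitted.

```c
#include <assert.h>
#include <stdbool.h>

#define NUD_VALID_MODEL 0x1u	/* hypothetical stand-in for NUD_VALID */

/* Hypothetical reduced neighbour: a state word plus a counter that
 * records how often the (not modelled) writer lock would be taken. */
struct neigh {
	unsigned int nud_state;
	unsigned int write_locks;
};

/* Returns true if a probe would be scheduled. */
static bool probe(struct neigh *n)
{
	bool schedule = false;

	if (n->nud_state & NUD_VALID_MODEL)
		return false;		/* fast path: no writer lock */

	n->write_locks++;		/* write_lock(&neigh->lock) */
	if (!(n->nud_state & NUD_VALID_MODEL))
		schedule = true;	/* re-checked under the lock */
	/* write_unlock(&neigh->lock) */
	return schedule;
}
```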


Re: [PATCH net-next v2] route: allow to route in a peer netns via lwt framework

2015-07-24 Thread Alexei Starovoitov

On Fri, Jul 24, 2015 at 05:39:57PM +0200, Eric Dumazet wrote:
> 
> On Fri, 2015-07-24 at 16:16 +0200, Nicolas Dichtel wrote:
> > This patch takes advantage of the newly added lwtunnel framework to
> > allow the user to set routes that point to a peer netns.
> > 
> > Packets are injected to the peer netns via the loopback device. It works
> > only when the output device is 'lo'.
> > 
> > Example:
> > ip route add 40.1.1.1/32 encap netns nsid 5 via dev lo
> > 
> 
> Is this feature so badly wanted to add complexity on lo device ?
...
> >  static netdev_tx_t loopback_xmit(struct sk_buff *skb,
> >  struct net_device *dev)
...
> > +   if (nsid != NETNSA_NSID_NOT_ASSIGNED) {
> > +   peernet = get_net_ns_by_id(dev_net(dev), nsid);
> > +   if (!peernet) {
> > +   kfree_skb(skb);
> > +   goto end;
> > +   }
> > +
> > +   /* it's OK to use per_cpu_ptr() because BHs are off */
> > +   lb_stats = this_cpu_ptr(peernet->loopback_dev->lstats);
> > +   ret = dev_forward_skb(peernet->loopback_dev, skb);

I have the same concern as Eric.
Using loopback for this looks wrong.
A netns is supposed to look like a host, but I cannot imagine a host
without NICs seeing packets on its loopback from another world.
Then how is the opposite direction supposed to work?
Will the netns set up a route to send packets to the host's loopback?!
The idea of using routing to forward packets to namespaces is great,
but I think we need something other than loopback.


[PATCH net] tcp: fix recv with flags MSG_WAITALL | MSG_PEEK

2015-07-24 Thread Sabrina Dubroca

Currently, tcp_recvmsg enters a busy loop in sk_wait_data if called
with flags = MSG_WAITALL | MSG_PEEK.

sk_wait_data waits for sk_receive_queue to become non-empty, but in this
case the receive queue is already non-empty; it just does not contain any
skb that we can use.

Add a "last skb seen on receive queue" argument to sk_wait_data, so
that it sleeps until the receive queue has new skbs.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=99461
Link: https://sourceware.org/bugzilla/show_bug.cgi?id=18493
Link: https://bugzilla.redhat.com/show_bug.cgi?id=1205258
Reported-by: Enrico Scholz 
Reported-by: Dan Searle 
Signed-off-by: Sabrina Dubroca 
---
 include/net/sock.h |  2 +-
 net/core/sock.c|  5 +++--
 net/dccp/proto.c   |  2 +-
 net/ipv4/tcp.c | 11 +++
 net/llc/af_llc.c   |  4 ++--
 5 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 05a8c1aea251..f21f0708ec59 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -902,7 +902,7 @@ void sk_stream_kill_queues(struct sock *sk);
 void sk_set_memalloc(struct sock *sk);
 void sk_clear_memalloc(struct sock *sk);
 
-int sk_wait_data(struct sock *sk, long *timeo);
+int sk_wait_data(struct sock *sk, long *timeo, const struct sk_buff *skb);
 
 struct request_sock_ops;
 struct timewait_sock_ops;
diff --git a/net/core/sock.c b/net/core/sock.c
index 08f16db46070..8a14f1285fc4 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1967,20 +1967,21 @@ static void __release_sock(struct sock *sk)
  * sk_wait_data - wait for data to arrive at sk_receive_queue
  * @sk:sock to wait on
  * @timeo: for how long
+ * @skb:   last skb seen on sk_receive_queue
  *
  * Now socket state including sk->sk_err is changed only under lock,
  * hence we may omit checks after joining wait queue.
  * We check receive queue before schedule() only as optimization;
  * it is very likely that release_sock() added new data.
  */
-int sk_wait_data(struct sock *sk, long *timeo)
+int sk_wait_data(struct sock *sk, long *timeo, const struct sk_buff *skb)
 {
int rc;
DEFINE_WAIT(wait);
 
prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
set_bit(SOCK_ASYNC_WAITDATA, &sk->sk_socket->flags);
-   rc = sk_wait_event(sk, timeo, !skb_queue_empty(&sk->sk_receive_queue));
+   rc = sk_wait_event(sk, timeo, skb_peek_tail(&sk->sk_receive_queue) != skb);
clear_bit(SOCK_ASYNC_WAITDATA, &sk->sk_socket->flags);
finish_wait(sk_sleep(sk), &wait);
return rc;
diff --git a/net/dccp/proto.c b/net/dccp/proto.c
index 52a94016526d..b5cf13a28009 100644
--- a/net/dccp/proto.c
+++ b/net/dccp/proto.c
@@ -886,7 +886,7 @@ verify_sock_status:
break;
}
 
-   sk_wait_data(sk, &timeo);
+   sk_wait_data(sk, &timeo, NULL);
continue;
found_ok_skb:
if (len > skb->len)
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 7f4056785acc..45534a5ab430 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -780,7 +780,7 @@ ssize_t tcp_splice_read(struct socket *sock, loff_t *ppos,
ret = -EAGAIN;
break;
}
-   sk_wait_data(sk, &timeo);
+   sk_wait_data(sk, &timeo, NULL);
if (signal_pending(current)) {
ret = sock_intr_errno(timeo);
break;
@@ -1575,7 +1575,7 @@ int tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int nonblock,
int target; /* Read at least this many bytes */
long timeo;
struct task_struct *user_recv = NULL;
-   struct sk_buff *skb;
+   struct sk_buff *skb, *last;
u32 urg_hole = 0;
 
if (unlikely(flags & MSG_ERRQUEUE))
@@ -1635,7 +1635,9 @@ int tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int nonblock,
 
/* Next get a buffer. */
 
+   last = skb_peek_tail(&sk->sk_receive_queue);
skb_queue_walk(&sk->sk_receive_queue, skb) {
+   last = skb;
/* Now that we have two receive queues this
 * shouldn't happen.
 */
@@ -1754,8 +1756,9 @@ int tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int nonblock,
/* Do not sleep, just process backlog. */
release_sock(sk);
lock_sock(sk);
-   } else
-   sk_wait_data(sk, &timeo);
+   } else {
+   sk_wait_data(sk, &timeo, last);
+   }
 
if (user_recv) {
int chunk;
diff --git a/net/llc/af_llc.c b/net/llc/af_llc.c
index 8fd9febaa5ba..8dab4e569571 100644
--- a/net/llc/af_llc.c
+++ b/net/llc/af_llc.c
@@ -613,7 +613,7 @@ static int llc_wait_data(stru

[PATCH] ip/ip6tunnel: fix missing return value check

2015-07-24 Thread Zhang Shengju

Make sure that the return value of each socket() call is properly checked
and do not continue processing if the call failed.

Signed-off-by: Zhang Shengju 
---
 ip/tunnel.c | 24 
 1 file changed, 24 insertions(+)

diff --git a/ip/tunnel.c b/ip/tunnel.c
index 33c78e3..d69fe84 100644
--- a/ip/tunnel.c
+++ b/ip/tunnel.c
@@ -73,7 +73,13 @@ int tnl_get_ioctl(const char *basedev, void *p)
 
strncpy(ifr.ifr_name, basedev, IFNAMSIZ);
ifr.ifr_ifru.ifru_data = (void*)p;
+
fd = socket(preferred_family, SOCK_DGRAM, 0);
+   if (fd < 0) {
+   fprintf(stderr, "create socket failed: %s\n", strerror(errno));
+   return -1;
+   }
+
err = ioctl(fd, SIOCGETTUNNEL, &ifr);
if (err)
fprintf(stderr, "get tunnel \"%s\" failed: %s\n", basedev,
@@ -94,7 +100,13 @@ int tnl_add_ioctl(int cmd, const char *basedev, const char *name, void *p)
else
strncpy(ifr.ifr_name, basedev, IFNAMSIZ);
ifr.ifr_ifru.ifru_data = p;
+
fd = socket(preferred_family, SOCK_DGRAM, 0);
+   if (fd < 0) {
+   fprintf(stderr, "create socket failed: %s\n", strerror(errno));
+   return -1;
+   }
+
err = ioctl(fd, cmd, &ifr);
if (err)
fprintf(stderr, "add tunnel \"%s\" failed: %s\n", ifr.ifr_name,
@@ -115,7 +127,13 @@ int tnl_del_ioctl(const char *basedev, const char *name, void *p)
strncpy(ifr.ifr_name, basedev, IFNAMSIZ);
 
ifr.ifr_ifru.ifru_data = p;
+
fd = socket(preferred_family, SOCK_DGRAM, 0);
+   if (fd < 0) {
+   fprintf(stderr, "create socket failed: %s\n", strerror(errno));
+   return -1;
+   }
+
err = ioctl(fd, SIOCDELTUNNEL, &ifr);
if (err)
fprintf(stderr, "delete tunnel \"%s\" failed: %s\n",
@@ -133,7 +151,13 @@ static int tnl_gen_ioctl(int cmd, const char *name,
 
strncpy(ifr.ifr_name, name, IFNAMSIZ);
ifr.ifr_ifru.ifru_data = p;
+
fd = socket(preferred_family, SOCK_DGRAM, 0);
+   if (fd < 0) {
+   fprintf(stderr, "create socket failed: %s\n", strerror(errno));
+   return -1;
+   }
+
err = ioctl(fd, cmd, &ifr);
if (err && errno != skiperr)
fprintf(stderr, "%s: ioctl %x failed: %s\n", name,
-- 
1.8.3.1




RE: [PATCH 03/10] dpaa_eth: add configurable bpool thresholds

2015-07-24 Thread Madalin-Cristian Bucur

> -----Original Message-----
> From: Joe Perches [mailto:j...@perches.com]
> On Wed, 2015-07-22 at 19:16 +0300, Madalin Bucur wrote:
> > Allow the user to tweak the refill threshold and the total number
> > of buffers in the buffer pool. The provided values are for one CPU.
> 
> Any value in making these module parameters instead?

I expect one would (hardly ever) change these to improve some corner
cases and then keep using the new values. Module parameters may help during
the tuning process, but afterwards the extra bloat in the bootcmd would
probably be a nuisance.

> > +config FSL_DPAA_ETH_MAX_BUF_COUNT
> > +   int "Maximum number of buffers in private bpool"
> > +   range 64 2048
> > +   default "128"
> > +   ---help---
> > + The maximum number of buffers to be by default allocated in the DPAA-Ethernet private port's
> > + buffer pool. One needn't normally modify this, as it has probably been tuned for performance
> > + already. This cannot be lower than DPAA_ETH_REFILL_THRESHOLD.
> > +
> > +config FSL_DPAA_ETH_REFILL_THRESHOLD
> > +   int "Private bpool refill threshold"
> > +   range 32 FSL_DPAA_ETH_MAX_BUF_COUNT
> > +   default "80"
> > +   ---help---
> > + The DPAA-Ethernet driver will start replenishing buffer pools whose count
> > + falls below this threshold. This must be related to DPAA_ETH_MAX_BUF_COUNT. One needn't normally
> > + modify this value unless one has very specific performance reasons.
> > +
> >  config FSL_DPAA_CS_THRESHOLD_1G
> > hex "Egress congestion threshold on 1G ports"
> > range 0x1000 0x1000
> 
> 

Fw: [Bug 99461] recvfrom SYSCALL infinite loop/deadlock chewing 100% CPU [was __libc_recv (fd=fd@entry=300, buf=buf@entry=0x7f6042880600, n=n@entry=5, flags=-1, flags@entry=258) at ../sysdeps/unix/sys

2015-07-24 Thread Stephen Hemminger



Begin forwarded message:

Date: Fri, 24 Jul 2015 11:22:17 +
From: "bugzilla-dae...@bugzilla.kernel.org" 

To: "shemmin...@linux-foundation.org" 
Subject: [Bug 99461] recvfrom SYSCALL infinite loop/deadlock chewing 100% CPU 
[was __libc_recv (fd=fd@entry=300, buf=buf@entry=0x7f6042880600, n=n@entry=5, 
flags=-1, flags@entry=258) at ../sysdeps/unix/sysv/linux/x86_64/recv.c:33]


https://bugzilla.kernel.org/show_bug.cgi?id=99461

--- Comment #4 from Dan Searle  ---
Is there anyone working on a fix for this bug? Is there any way a fix can be
expedited?

-- 
You are receiving this mail because:
You are the assignee for the bug.

RE: [PATCH 02/10] dpaa_eth: add support for DPAA Ethernet

2015-07-24 Thread Madalin-Cristian Bucur

> -----Original Message-----
> From: Joe Perches [mailto:j...@perches.com]
> On Wed, 2015-07-22 at 19:16 +0300, Madalin Bucur wrote:
> > This introduces the Freescale Data Path Acceleration Architecture
> > (DPAA) Ethernet driver (dpaa_eth) that builds upon the DPAA QMan,
> > BMan, PAMU and FMan drivers to deliver Ethernet connectivity on
> > the Freescale DPAA QorIQ platforms.
> 
> trivia:
> 
> > +static void __hot _dpa_tx_conf(struct net_device   *net_dev,
> > +  const struct dpa_priv_s  *priv,
> > +  struct dpa_percpu_priv_s *percpu_priv,
> > +  const struct qm_fd   *fd,
> > +  u32  fqid)
> > +{
> []
> > +static struct dpa_bp * __cold
> > +dpa_priv_bp_probe(struct device *dev)
> 
> Do the __hot and __cold markings really matter?
> Some of them may be questionable.

Some may be, yes. I need to go through all of them.

> > +static int __init dpa_load(void)
> > +{
> []
> > +   err = platform_driver_register(&dpa_driver);
> > +   if (unlikely(err < 0)) {
> > +   pr_err(KBUILD_MODNAME
> > +   ": %s:%hu:%s(): platform_driver_register() = %d\n",
> > +   KBUILD_BASENAME ".c", __LINE__, __func__, err);
> > +   }
> > +
> > +   pr_debug(KBUILD_MODNAME ": %s:%s() ->\n",
> > +KBUILD_BASENAME ".c", __func__);
> 
> Perhaps these should use pr_fmt

Agree.

> > +static void __exit dpa_unload(void)
> > +{
> > +   pr_debug(KBUILD_MODNAME ": -> %s:%s()\n",
> > +KBUILD_BASENAME ".c", __func__);
> 
> dynamic debug has __func__ available and perhaps
> the function tracer might be used instead.
> 
> > diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h
> b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h
> []
> > +#define __hot
> 
> curious.
> 
> Maybe it'd be good to add a real __hot to compiler.h

They're mostly there to make readers aware that the code is performance-critical;
any change could hurt performance.

> > +struct dpa_buffer_layout_s {
> > +   u16 priv_data_size;
> > +   bool parse_results;
> > +   bool time_stamp;
> > +   bool hash_results;
> > +   u16 data_align;
> > +};
> 
> > +struct dpa_fq {
> > +   struct qman_fq   fq_base;
> > +   struct list_head list;
> > +   struct net_device   *net_dev;
> 
> some inconsistent indentation here and there

Yes, I've tried to align the style, but given the many editors the code has
had over time, there are still areas out of sync.

> > +struct dpa_bp {
> > +   struct bman_pool*pool;
> > +   u8  bpid;
> > +   struct device   *dev;
> > +   union {
> > +   /* The buffer pools used for the private ports are initialized
> > +* with target_count buffers for each CPU; at runtime the
> > +* number of buffers per CPU is constantly brought back to this
> > +* level
> > +*/
> > +   int target_count;
> > +   /* The configured value for the number of buffers in the pool,
> > +* used for shared port buffer pools
> > +*/
> > +   int config_count;
> > +   };
> 
> Anonymous unions are relatively rare

We liked the direct access to members...
In this particular case the use is a bit excessive, we can do without it.

> > +   struct {
> > +   /**
> 
> Maybe the /** style should be avoided

Will fix.

> > +* All egress queues to a given net device belong to one
> > +* (and the same) congestion group.
> > +*/
> > +   struct qman_cgr cgr;
> > +   } cgr_data;
> 
> []
> 
> > +int dpa_stop(struct net_device *net_dev)
> > +{
> []
> > +   err = mac_dev->stop(mac_dev);
> > +   if (unlikely(err < 0))
> > +   netif_err(priv, ifdown, net_dev, "mac_dev->stop() = %d\n",
> > + err);
> 
> Some of the likely/unlikely uses may not
> be useful/necessary.

In this particular case it's gratuitous, I'll go through all of them.

> > +
> > +   for_each_port_device(i, mac_dev->port_dev) {
> > +   error = fm_port_disable(
> > +   fm_port_drv_handle(mac_dev->port_dev[i]));
> > +   err = error ? error : err;
> 
>   if (error)
>   err = error;
> 
> is more obvious to me.

Yes, it's more readable.

Thank you,
Madalin


[PATCH net-next] hv_netvsc: Add structs and handlers for VF messages

2015-07-24 Thread Haiyang Zhang

This patch adds data structures and handlers for messages related
to SRIOV Virtual Function.

Signed-off-by: Haiyang Zhang 
Reviewed-by: K. Y. Srinivasan 
---
 drivers/net/hyperv/hyperv_net.h |   29 ++
 drivers/net/hyperv/netvsc.c |   43 +-
 2 files changed, 62 insertions(+), 10 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index 26cd14c..f225d1f 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -541,6 +541,29 @@ union nvsp_2_message_uber {
struct nvsp_2_free_rxbuf free_rxbuf;
 } __packed;
 
+struct nvsp_4_send_vf_association {
+   /* 1: allocated, serial number is valid. 0: not allocated */
+   u32 allocated;
+
+   /* Serial number of the VF to team with */
+   u32 serial;
+} __packed;
+
+enum nvsp_vm_datapath {
+   NVSP_DATAPATH_SYNTHETIC = 0,
+   NVSP_DATAPATH_VF,
+   NVSP_DATAPATH_MAX
+};
+
+struct nvsp_4_sw_datapath {
+   u32 active_datapath; /* active data path in VM */
+} __packed;
+
+union nvsp_4_message_uber {
+   struct nvsp_4_send_vf_association vf_assoc;
+   struct nvsp_4_sw_datapath active_dp;
+} __packed;
+
 enum nvsp_subchannel_operation {
NVSP_SUBCHANNEL_NONE = 0,
NVSP_SUBCHANNEL_ALLOCATE,
@@ -578,6 +601,7 @@ union nvsp_all_messages {
union nvsp_message_init_uber init_msg;
union nvsp_1_message_uber v1_msg;
union nvsp_2_message_uber v2_msg;
+   union nvsp_4_message_uber v4_msg;
union nvsp_5_message_uber v5_msg;
 } __packed;
 
@@ -689,6 +713,11 @@ struct netvsc_device {
 
/* The net device context */
struct net_device_context *nd_ctx;
+
+   /* 1: allocated, serial number is valid. 0: not allocated */
+   u32 vf_alloc;
+   /* Serial number of the VF to team with */
+   u32 vf_serial;
 };
 
 /* NdisInitialize message */
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 23126a7..51e4c0f 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -453,13 +453,16 @@ static int negotiate_nvsp_ver(struct hv_device *device,
if (nvsp_ver == NVSP_PROTOCOL_VERSION_1)
return 0;
 
-   /* NVSPv2 only: Send NDIS config */
+   /* NVSPv2 or later: Send NDIS config */
memset(init_packet, 0, sizeof(struct nvsp_message));
init_packet->hdr.msg_type = NVSP_MSG2_TYPE_SEND_NDIS_CONFIG;
init_packet->msg.v2_msg.send_ndis_config.mtu = net_device->ndev->mtu +
   ETH_HLEN;
init_packet->msg.v2_msg.send_ndis_config.capability.ieee8021q = 1;
 
+   if (nvsp_ver >= NVSP_PROTOCOL_VERSION_5)
+   init_packet->msg.v2_msg.send_ndis_config.capability.sriov = 1;
+
ret = vmbus_sendpacket(device->channel, init_packet,
sizeof(struct nvsp_message),
(unsigned long)init_packet,
@@ -1064,11 +1067,10 @@ static void netvsc_receive(struct netvsc_device *net_device,
 
 
 static void netvsc_send_table(struct hv_device *hdev,
- struct vmpacket_descriptor *vmpkt)
+ struct nvsp_message *nvmsg)
 {
struct netvsc_device *nvscdev;
struct net_device *ndev;
-   struct nvsp_message *nvmsg;
int i;
u32 count, *tab;
 
@@ -1077,12 +1079,6 @@ static void netvsc_send_table(struct hv_device *hdev,
return;
ndev = nvscdev->ndev;
 
-   nvmsg = (struct nvsp_message *)((unsigned long)vmpkt +
-   (vmpkt->offset8 << 3));
-
-   if (nvmsg->hdr.msg_type != NVSP_MSG5_TYPE_SEND_INDIRECTION_TABLE)
-   return;
-
count = nvmsg->msg.v5_msg.send_table.count;
if (count != VRSS_SEND_TAB_SIZE) {
netdev_err(ndev, "Received wrong send-table size:%u\n", count);
@@ -1096,6 +1092,28 @@ static void netvsc_send_table(struct hv_device *hdev,
nvscdev->send_table[i] = tab[i];
 }
 
+static void netvsc_send_vf(struct netvsc_device *nvdev,
+  struct nvsp_message *nvmsg)
+{
+   nvdev->vf_alloc = nvmsg->msg.v4_msg.vf_assoc.allocated;
+   nvdev->vf_serial = nvmsg->msg.v4_msg.vf_assoc.serial;
+}
+
+static inline void netvsc_receive_inband(struct hv_device *hdev,
+struct netvsc_device *nvdev,
+struct nvsp_message *nvmsg)
+{
+   switch (nvmsg->hdr.msg_type) {
+   case NVSP_MSG5_TYPE_SEND_INDIRECTION_TABLE:
+   netvsc_send_table(hdev, nvmsg);
+   break;
+
+   case NVSP_MSG4_TYPE_SEND_VF_ASSOCIATION:
+   netvsc_send_vf(nvdev, nvmsg);
+   break;
+   }
+}
+
 void netvsc_channel_cb(void *context)
 {
int ret;
@@ -1108,6 +1126,7 @@ void netvsc_channel_cb(void *context)
unsigned char

Re: [PATCH net-next v2] route: allow to route in a peer netns via lwt framework

2015-07-24 Thread Eric Dumazet


On Fri, 2015-07-24 at 16:16 +0200, Nicolas Dichtel wrote:
> This patch takes advantage of the newly added lwtunnel framework to
> allow the user to set routes that point to a peer netns.
> 
> Packets are injected to the peer netns via the loopback device. It works
> only when the output device is 'lo'.
> 
> Example:
> ip route add 40.1.1.1/32 encap netns nsid 5 via dev lo
> 

Is this feature wanted badly enough to justify adding complexity to the lo device?

> Signed-off-by: Nicolas Dichtel 
> ---
> 
> v2: rework loopback handling part (update stats and call skb_dst_force())
> fix ipv6 processing
> check lwtunnel type before converting data to a nsid
> 
>  drivers/net/loopback.c| 33 +--
>  include/net/lwtunnel.h| 27 ++
>  include/uapi/linux/lwtunnel.h |  1 +
>  net/core/net_namespace.c  | 52 +++
>  net/ipv6/route.c  |  9 ++--
>  5 files changed, 113 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
> index c76283c2f84a..4358256ff94e 100644
> --- a/drivers/net/loopback.c
> +++ b/drivers/net/loopback.c
> @@ -57,6 +57,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  struct pcpu_lstats {
>   u64 packets;
> @@ -71,29 +72,47 @@ struct pcpu_lstats {
>  static netdev_tx_t loopback_xmit(struct sk_buff *skb,
>struct net_device *dev)
>  {
> + int nsid = skb_lwt_netns_info(skb);
>   struct pcpu_lstats *lb_stats;
> - int len;
> -
> - skb_orphan(skb);
> + struct net *peernet = NULL;
> + int len, ret;
>  
>   /* Before queueing this packet to netif_rx(),
>* make sure dst is refcounted.
>*/
>   skb_dst_force(skb);
>  
> - skb->protocol = eth_type_trans(skb, dev);
> + if (nsid != NETNSA_NSID_NOT_ASSIGNED) {
> + peernet = get_net_ns_by_id(dev_net(dev), nsid);
> + if (!peernet) {
> + kfree_skb(skb);
> + goto end;
> + }
> +
> + /* it's OK to use per_cpu_ptr() because BHs are off */
> + lb_stats = this_cpu_ptr(peernet->loopback_dev->lstats);
> + ret = dev_forward_skb(peernet->loopback_dev, skb);
> + } else {
> + skb_orphan(skb);
>  
> - /* it's OK to use per_cpu_ptr() because BHs are off */
> - lb_stats = this_cpu_ptr(dev->lstats);
> + skb->protocol = eth_type_trans(skb, dev);
> +
> + /* it's OK to use per_cpu_ptr() because BHs are off */
> + lb_stats = this_cpu_ptr(dev->lstats);
> + ret = netif_rx(skb);
> + }
>  
>   len = skb->len;

At this point you can no longer access skb: netif_rx() or dev_forward_skb()
has already taken ownership of it.

> - if (likely(netif_rx(skb) == NET_RX_SUCCESS)) {
> + if (likely(ret == NET_RX_SUCCESS)) {
>   u64_stats_update_begin(&lb_stats->syncp);
>   lb_stats->bytes += len;
>   lb_stats->packets++;
>   u64_stats_update_end(&lb_stats->syncp);
>   }
>  
> +end:
> + if (peernet)
> + put_net(peernet);
>   return NETDEV_TX_OK;
>  }
>  




Re: [PATCH net-next] route: allow to route in a peer netns via lwt framework

2015-07-24 Thread David Ahern


On 7/24/15 8:32 AM, Nicolas Dichtel wrote:

On 24/07/2015 16:28, David Ahern wrote:

On 7/23/15 8:22 AM, Nicolas Dichtel wrote:

  static netdev_tx_t loopback_xmit(struct sk_buff *skb,
   struct net_device *dev)
  {
+int nsid = skb_lwt_netns_info(skb);
  struct pcpu_lstats *lb_stats;
  int len;

+if (nsid >= 0) {
+struct net *peernet = get_net_ns_by_id(dev_net(dev), nsid);
+
+if (!peernet) {


If nsid is > 0 then the peer namespace should exist right? So for this
failure path why not increment tx_error stat?

I was not sure about that, because before my patch we incremented
statistics only in case of NET_RX_SUCCESS.


In this case you are knowingly dropping packets. Would be nice to have a 
counter showing that.


Re: Several races in "usbnet" module (kernel 4.1.x)

2015-07-24 Thread Eugene Shatokhin


On 23.07.2015 12:15, Oliver Neukum wrote:

On Wed, 2015-07-22 at 21:33 +0300, Eugene Shatokhin wrote:

The following part is not necessary, I think. usbnet_bh() does not
touch EVENT_NO_RUNTIME_PM bit explicitly and these bit operations are
atomic w.r.t. each other.


+ mpn |= !test_and_clear_bit(EVENT_NO_RUNTIME_PM, &dev->flags);
+ /* in case the bh reset a flag */


Yes, they are atomic w.r.t. each other. And that limitation worries me.

I am considering architectures which do atomic operations with
spinlocks. And this code mixes another operation into it. Can
this happen?

CPU A                           CPU B

take lock
read old value
                                set value to 0
clear bit
write back changed value
release lock


From what I see now in Documentation/atomic_ops.txt, stores to properly
aligned memory locations are in fact atomic.


So, I think, the situation you described above cannot happen for 
dev->flags, which is good. No need to address that in the patch. The 
race might be harmless after all.


If I understand the code correctly now, dev->flags is set to 0 in
usbnet_stop() so that the worker function (usbnet_deferred_kevent) would
do nothing, should it start later. If so, how about adding memory
barriers so that all CPUs see dev->flags == 0 before anything else?


The patch could look like this then:


diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
index 3c86b10..d87b9c7 100644
--- a/drivers/net/usb/usbnet.c
+++ b/drivers/net/usb/usbnet.c
@@ -778,7 +778,7 @@ int usbnet_stop (struct net_device *net)
 {
struct usbnet   *dev = netdev_priv(net);
struct driver_info  *info = dev->driver_info;
-   int retval, pm;
+   int retval, pm, mpn;

clear_bit(EVENT_DEV_OPEN, &dev->flags);
netif_stop_queue (net);
@@ -813,14 +813,17 @@ int usbnet_stop (struct net_device *net)
 * can't flush_scheduled_work() until we drop rtnl (later),
 * else workers could deadlock; so make workers a NOP.
 */
+   mpn = !test_and_clear_bit(EVENT_NO_RUNTIME_PM, &dev->flags);
dev->flags = 0;
+   smp_mb(); /* make sure the workers see that dev->flags == 0 */
+
del_timer_sync (&dev->delay);
tasklet_kill (&dev->bh);
+
if (!pm)
usb_autopm_put_interface(dev->intf);

-   if (info->manage_power &&
-   !test_and_clear_bit(EVENT_NO_RUNTIME_PM, &dev->flags))
+   if (info->manage_power && mpn)
info->manage_power(dev, 0);
else
usb_autopm_put_interface(dev->intf);
@@ -1078,6 +1081,9 @@ usbnet_deferred_kevent (struct work_struct *work)
container_of(work, struct usbnet, kevent);
int status;

+   /* See the changes in dev->flags from other CPUs. */
+   smp_mb();
+
/* usb_clear_halt() needs a thread context */
if (test_bit (EVENT_TX_HALT, &dev->flags)) {
unlink_urbs (dev, &dev->txq);


What do you think?

Regards,
Eugene


Re: [PATCH net-next] route: allow to route in a peer netns via lwt framework

2015-07-24 Thread Nicolas Dichtel


On 24/07/2015 16:28, David Ahern wrote:

On 7/23/15 8:22 AM, Nicolas Dichtel wrote:

  static netdev_tx_t loopback_xmit(struct sk_buff *skb,
   struct net_device *dev)
  {
+int nsid = skb_lwt_netns_info(skb);
  struct pcpu_lstats *lb_stats;
  int len;

+if (nsid >= 0) {
+struct net *peernet = get_net_ns_by_id(dev_net(dev), nsid);
+
+if (!peernet) {


If nsid is > 0 then the peer namespace should exist right? So for this failure
path why not increment tx_error stat?

I was not sure about that, because before my patch we incremented statistics only
in case of NET_RX_SUCCESS.

Re: [PATCH net-next] route: allow to route in a peer netns via lwt framework

2015-07-24 Thread David Ahern


On 7/23/15 8:22 AM, Nicolas Dichtel wrote:

  static netdev_tx_t loopback_xmit(struct sk_buff *skb,
 struct net_device *dev)
  {
+   int nsid = skb_lwt_netns_info(skb);
struct pcpu_lstats *lb_stats;
int len;

+   if (nsid >= 0) {
+   struct net *peernet = get_net_ns_by_id(dev_net(dev), nsid);
+
+   if (!peernet) {


If nsid is > 0 then the peer namespace should exist right? So for this 
failure path why not increment tx_error stat?




+   kfree_skb(skb);
+   goto end;
+   }
+
+   dev_forward_skb(peernet->loopback_dev, skb);
+   put_net(peernet);
+   goto end;
+   }
+
skb_orphan(skb);

/* Before queueing this packet to netif_rx(),



[PATCH net-next v2] route: allow to route in a peer netns via lwt framework

2015-07-24 Thread Nicolas Dichtel

This patch takes advantage of the newly added lwtunnel framework to
allow the user to set routes that point to a peer netns.

Packets are injected to the peer netns via the loopback device. It works
only when the output device is 'lo'.

Example:
ip route add 40.1.1.1/32 encap netns nsid 5 via dev lo

Signed-off-by: Nicolas Dichtel 
---

v2: rework loopback handling part (update stats and call skb_dst_force())
fix ipv6 processing
check lwtunnel type before converting data to a nsid

 drivers/net/loopback.c| 33 +--
 include/net/lwtunnel.h| 27 ++
 include/uapi/linux/lwtunnel.h |  1 +
 net/core/net_namespace.c  | 52 +++
 net/ipv6/route.c  |  9 ++--
 5 files changed, 113 insertions(+), 9 deletions(-)

diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
index c76283c2f84a..4358256ff94e 100644
--- a/drivers/net/loopback.c
+++ b/drivers/net/loopback.c
@@ -57,6 +57,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct pcpu_lstats {
u64 packets;
@@ -71,29 +72,47 @@ struct pcpu_lstats {
 static netdev_tx_t loopback_xmit(struct sk_buff *skb,
 struct net_device *dev)
 {
+   int nsid = skb_lwt_netns_info(skb);
struct pcpu_lstats *lb_stats;
-   int len;
-
-   skb_orphan(skb);
+   struct net *peernet = NULL;
+   int len, ret;
 
/* Before queueing this packet to netif_rx(),
 * make sure dst is refcounted.
 */
skb_dst_force(skb);
 
-   skb->protocol = eth_type_trans(skb, dev);
+   if (nsid != NETNSA_NSID_NOT_ASSIGNED) {
+   peernet = get_net_ns_by_id(dev_net(dev), nsid);
+   if (!peernet) {
+   kfree_skb(skb);
+   goto end;
+   }
+
+   /* it's OK to use per_cpu_ptr() because BHs are off */
+   lb_stats = this_cpu_ptr(peernet->loopback_dev->lstats);
+   ret = dev_forward_skb(peernet->loopback_dev, skb);
+   } else {
+   skb_orphan(skb);
 
-   /* it's OK to use per_cpu_ptr() because BHs are off */
-   lb_stats = this_cpu_ptr(dev->lstats);
+   skb->protocol = eth_type_trans(skb, dev);
+
+   /* it's OK to use per_cpu_ptr() because BHs are off */
+   lb_stats = this_cpu_ptr(dev->lstats);
+   ret = netif_rx(skb);
+   }
 
len = skb->len;
-   if (likely(netif_rx(skb) == NET_RX_SUCCESS)) {
+   if (likely(ret == NET_RX_SUCCESS)) {
u64_stats_update_begin(&lb_stats->syncp);
lb_stats->bytes += len;
lb_stats->packets++;
u64_stats_update_end(&lb_stats->syncp);
}
 
+end:
+   if (peernet)
+   put_net(peernet);
return NETDEV_TX_OK;
 }
 
diff --git a/include/net/lwtunnel.h b/include/net/lwtunnel.h
index b02039081b04..78376da1afa2 100644
--- a/include/net/lwtunnel.h
+++ b/include/net/lwtunnel.h
@@ -5,7 +5,9 @@
 #include 
 #include 
 #include 
+#include 
 #include 
+#include 
 
 #define LWTUNNEL_HASH_BITS   7
 #define LWTUNNEL_HASH_SIZE   (1 << LWTUNNEL_HASH_BITS)
@@ -147,4 +149,29 @@ static inline int lwtunnel_output6(struct sock *sk, struct sk_buff *skb)
 
 #endif
 
+static inline u32 *lwt_netns_info(struct lwtunnel_state *lwtstate)
+{
+   return (u32 *)lwtstate->data;
+}
+
+static inline int skb_lwt_netns_info(struct sk_buff *skb)
+{
+   if (skb->protocol == htons(ETH_P_IP)) {
+   struct rtable *rt = (struct rtable *)skb_dst(skb);
+
+   if (rt &&
+   rt->rt_lwtstate &&
+   rt->rt_lwtstate->type & LWTUNNEL_ENCAP_NETNS)
+   return *lwt_netns_info(rt->rt_lwtstate);
+   } else if (skb->protocol == htons(ETH_P_IPV6)) {
+   struct rt6_info *rt6 = (struct rt6_info *)skb_dst(skb);
+
+   if (rt6 &&
+   rt6->rt6i_lwtstate &&
+   rt6->rt6i_lwtstate->type & LWTUNNEL_ENCAP_NETNS)
+   return *lwt_netns_info(rt6->rt6i_lwtstate);
+   }
+
+   return NETNSA_NSID_NOT_ASSIGNED;
+}
 #endif /* __NET_LWTUNNEL_H */
diff --git a/include/uapi/linux/lwtunnel.h b/include/uapi/linux/lwtunnel.h
index 31377bbea3f8..6715e7a1b335 100644
--- a/include/uapi/linux/lwtunnel.h
+++ b/include/uapi/linux/lwtunnel.h
@@ -7,6 +7,7 @@ enum lwtunnel_encap_types {
LWTUNNEL_ENCAP_NONE,
LWTUNNEL_ENCAP_MPLS,
LWTUNNEL_ENCAP_IP,
+   LWTUNNEL_ENCAP_NETNS,
__LWTUNNEL_ENCAP_MAX,
 };
 
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 2c2eb1b629b1..c1267aac373d 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * Our network namespace constructor/destructor lists
@@ -725,6 +726,56 @@ out:
rtnl_set_sk_err(net, RTNL

Re: [PATCH net-next] route: allow to route in a peer netns via lwt framework

2015-07-24 Thread Nicolas Dichtel


On 24/07/2015 15:50, roopa wrote:

On 7/24/15, 5:24 AM, Nicolas Dichtel wrote:

Sure, but my goal was to not create a new .h file just for these two helpers.
It's related to lwtunnel, thus I was thinking they can go here.

ok..., since your lwt namespace functions went into net_namespace.c, I was 
thinking
these should really go into net_namespace.h. Does that work for you ?

Not so easy, it's a problem of chicken and egg. If I add this to
net/net_namespace.h, I need to include net/lwtunnel.h but this file already
includes net/net_namespace.h (included directly or indirectly by most of the
network headers).


Regards,
Nicolas

Re: [PATCH net-next 2/2] lwtunnel: change prototype of lwtunnel_state_get()

2015-07-24 Thread roopa


On 7/24/15, 3:28 AM, Nicolas Dichtel wrote:

> Returning the state from this function saves some lines and simplifies the
> code a bit. It also makes it possible to handle a NULL entry.
>
> To avoid overly long lines, I've also renamed lwtunnel_state_get() and
> lwtunnel_state_put() to lwtstate_get() and lwtstate_put().
>
> CC: Thomas Graf
> CC: Roopa Prabhu
> Signed-off-by: Nicolas Dichtel


Acked-by: Roopa Prabhu 

thanks.

[PATCH net-next] bonding: convert num_grat_arp to the new bonding option API

2015-07-24 Thread Nikolay Aleksandrov

From: Nikolay Aleksandrov 

num_grat_arp wasn't converted to the new bonding option API, so do this
now and remove its specific sysfs store function in order to use the
standard one. num_grat_arp is the same as num_unsol_na, so add it as an
alias with the same option settings. An important difference is the option
name, which is matched in bonding_sysfs_store_option().

Signed-off-by: Nikolay Aleksandrov 
---
 drivers/net/bonding/bond_options.c |  7 +++
 drivers/net/bonding/bond_sysfs.c   | 20 +++-
 include/net/bond_options.h |  1 +
 3 files changed, 11 insertions(+), 17 deletions(-)

diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
index e9c624d54dd4..6dda57e2e724 100644
--- a/drivers/net/bonding/bond_options.c
+++ b/drivers/net/bonding/bond_options.c
@@ -420,6 +420,13 @@ static const struct bond_option bond_opts[BOND_OPT_LAST] = {
.flags = BOND_OPTFLAG_IFDOWN,
.values = bond_ad_user_port_key_tbl,
.set = bond_option_ad_user_port_key_set,
+   },
+   [BOND_OPT_NUM_PEER_NOTIF_ALIAS] = {
+   .id = BOND_OPT_NUM_PEER_NOTIF_ALIAS,
+   .name = "num_grat_arp",
+   .desc = "Number of peer notifications to send on failover event",
+   .values = bond_num_peer_notif_tbl,
+   .set = bond_option_num_peer_notif_set
}
 };
 
diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index 31835a4dab57..f4ae72086215 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -380,7 +380,7 @@ static ssize_t bonding_show_ad_select(struct device *d,
 static DEVICE_ATTR(ad_select, S_IRUGO | S_IWUSR,
   bonding_show_ad_select, bonding_sysfs_store_option);
 
-/* Show and set the number of peer notifications to send after a failover event. */
+/* Show the number of peer notifications to send after a failover event. */
 static ssize_t bonding_show_num_peer_notif(struct device *d,
   struct device_attribute *attr,
   char *buf)
@@ -388,24 +388,10 @@ static ssize_t bonding_show_num_peer_notif(struct device *d,
struct bonding *bond = to_bond(d);
return sprintf(buf, "%d\n", bond->params.num_peer_notif);
 }
-
-static ssize_t bonding_store_num_peer_notif(struct device *d,
-   struct device_attribute *attr,
-   const char *buf, size_t count)
-{
-   struct bonding *bond = to_bond(d);
-   int ret;
-
-   ret = bond_opt_tryset_rtnl(bond, BOND_OPT_NUM_PEER_NOTIF, (char *)buf);
-   if (!ret)
-   ret = count;
-
-   return ret;
-}
 static DEVICE_ATTR(num_grat_arp, S_IRUGO | S_IWUSR,
-  bonding_show_num_peer_notif, bonding_store_num_peer_notif);
+  bonding_show_num_peer_notif, bonding_sysfs_store_option);
 static DEVICE_ATTR(num_unsol_na, S_IRUGO | S_IWUSR,
-  bonding_show_num_peer_notif, bonding_store_num_peer_notif);
+  bonding_show_num_peer_notif, bonding_sysfs_store_option);
 
 /* Show the MII monitor interval. */
 static ssize_t bonding_show_miimon(struct device *d,
diff --git a/include/net/bond_options.h b/include/net/bond_options.h
index c28aca25320e..1797235cd590 100644
--- a/include/net/bond_options.h
+++ b/include/net/bond_options.h
@@ -66,6 +66,7 @@ enum {
BOND_OPT_AD_ACTOR_SYS_PRIO,
BOND_OPT_AD_ACTOR_SYSTEM,
BOND_OPT_AD_USER_PORT_KEY,
+   BOND_OPT_NUM_PEER_NOTIF_ALIAS,
BOND_OPT_LAST
 };
 
-- 
2.4.3

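The alias works because, as the commit message says, the generic sysfs store handler resolves the option from the attribute name. A hypothetical userspace model of that lookup follows; opt, opts[], opt_get_by_name() and store_option() are illustrative names rather than the bonding API, and the 0..255 range check is an assumption of this sketch. Two table entries with different names share one setter, so writing either attribute updates the same value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of a name-keyed option table. */
struct opt {
	int id;
	const char *name;
	int (*set)(int *target, int value);
};

static int set_num_peer_notif(int *target, int value)
{
	if (value < 0 || value > 255)
		return -1;	/* range assumed for this sketch */
	*target = value;
	return 0;
}

enum { OPT_NUM_PEER_NOTIF, OPT_NUM_PEER_NOTIF_ALIAS, OPT_LAST };

/* Both entries share one setter; only the name differs. */
static const struct opt opts[OPT_LAST] = {
	[OPT_NUM_PEER_NOTIF] = { OPT_NUM_PEER_NOTIF, "num_unsol_na", set_num_peer_notif },
	[OPT_NUM_PEER_NOTIF_ALIAS] = { OPT_NUM_PEER_NOTIF_ALIAS, "num_grat_arp", set_num_peer_notif },
};

static const struct opt *opt_get_by_name(const char *name)
{
	for (size_t i = 0; i < OPT_LAST; i++)
		if (opts[i].name && strcmp(opts[i].name, name) == 0)
			return &opts[i];
	return NULL;
}

/* Generic store: resolve the option from the attribute name, then set it. */
static int store_option(const char *attr_name, const char *buf, int *target)
{
	const struct opt *o = opt_get_by_name(attr_name);

	if (!o)
		return -1;
	return o->set(target, atoi(buf));
}
```

Without the alias entry, a write to the num_grat_arp attribute would find no option by that name, which is why the patch adds a second table entry instead of reusing the first.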

Re: [PATCH net-next] route: allow to route in a peer netns via lwt framework

2015-07-24 Thread roopa


On 7/24/15, 5:24 AM, Nicolas Dichtel wrote:
> Sure, but my goal was to not create a new .h file just for these two helpers.
> It's related to lwtunnel, thus I was thinking they can go here.

ok..., since your lwt namespace functions went into net_namespace.c, I was thinking
these should really go into net_namespace.h. Does that work for you ?
If that does not, then yes, they could live here.

Thanks,
Roopa


Re: [PATCH net-next 1/2] ipv6: copy lwtstate in ip6_rt_copy_init()

2015-07-24 Thread roopa


On 7/24/15, 3:28 AM, Nicolas Dichtel wrote:

> We need to copy this field (ip6_rt_cache_alloc() and ip6_rt_pcpu_alloc()
> use ip6_rt_copy_init() to build a dst).
>
> CC: Thomas Graf
> CC: Roopa Prabhu
> Fixes: 19e42e451506 ("ipv6: support for fib route lwtunnel encap attributes")
> Signed-off-by: Nicolas Dichtel


Acked-by: Roopa Prabhu 

Thanks!

Re: [PATCH net-next] ipv6: use lwtunnel_output6() only if flag redirect is set

2015-07-24 Thread roopa


On 7/24/15, 1:59 AM, Nicolas Dichtel wrote:

> This function makes sense only when LWTUNNEL_STATE_OUTPUT_REDIRECT is set.
> The check is already done in IPv4.
>
> CC: Thomas Graf
> CC: Roopa Prabhu
> Fixes: 74a0f2fe8ed5 ("ipv6: rt6_info output redirect to tunnel output")
> Signed-off-by: Nicolas Dichtel


Acked-by: Roopa Prabhu 

thanks

Re: [PATCH net-next] route: allow to route in a peer netns via lwt framework

2015-07-24 Thread Nicolas Dichtel


On 23/07/2015 17:50, roopa wrote:
> On 7/23/15, 8:25 AM, Nicolas Dichtel wrote:
>> On 23/07/2015 17:01, roopa wrote:
>>> On 7/23/15, 7:22 AM, Nicolas Dichtel wrote:
>>> [snip]
>>>>
>>>> +static inline u32 *lwt_netns_info(struct lwtunnel_state *lwtstate)
>>>> +{
>>>> +return (u32 *)lwtstate->data;
>>>> +}
>>>> +
>>>> +static inline int skb_lwt_netns_info(struct sk_buff *skb)
>>>> +{
>>>> +if (skb->protocol == htons(ETH_P_IP)) {
>>>> +struct rtable *rt = (struct rtable *)skb_dst(skb);
>>>> +
>>>> +if (rt && rt->rt_lwtstate)
>>>> +return *lwt_netns_info(rt->rt_lwtstate);
>>>> +} else if (skb->protocol == htons(ETH_P_IPV6)) {
>>>> +struct rt6_info *rt6 = (struct rt6_info *)skb_dst(skb);
>>>> +
>>>> +if (rt6 && rt6->rt6i_lwtstate)
>>>> +return *lwt_netns_info(rt6->rt6i_lwtstate);
>>>> +}
>>>> +
>>>> +return NETNSA_NSID_NOT_ASSIGNED;
>>>> +}
>>>>   #endif /* __NET_LWTUNNEL_H */
>>>
>>> Since these APIs don't have to be netns specific,
>>> can they just be named lwtunnel_get_state_data and skb_lwtunnel_state?
>>
>> They are specific to netns because lwtstate->data is interpreted as a u32 *.
>> But I agree that a test is missing against lwtstate->type to ensure that data
>> will be a nsid.
>
> ok..., the APIs in lwtunnel.h today are not specific to an encap type;
> they are generic, so skb_lwtunnel_state(), which returns struct lwtunnel_state,
> could go here. The encap-specific ones can go in the respective callers.
> Recently Thomas added a similar skb_tunnel_info() for IP tunnels. I'd like
> to have a generic version of your skb_lwt_netns_info in lwtunnel.h; I could
> use it in my MPLS output func too.

Sure, but my goal was to not create a new .h file just for these two helpers.
It's related to lwtunnel, thus I was thinking they can go here.


Regards,
Nicolas
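
The point about checking lwtstate->type before reading lwtstate->data as an nsid can be modelled in plain C. This is a userspace sketch, not kernel code: lwt_state, encap_type, lwt_state_nsid() and type_matches_bitwise() are hypothetical names, and the enum numbering is assumed to mirror the sequential uapi values in the patch (NONE, MPLS, IP, NETNS). It uses an equality test because sequential enum values are not bit flags, and memcpy() instead of a pointer cast to sidestep alignment concerns in userspace.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins; values are sequential, not bit flags. */
enum encap_type {
	ENCAP_NONE,	/* 0 */
	ENCAP_MPLS,	/* 1 */
	ENCAP_IP,	/* 2 */
	ENCAP_NETNS,	/* 3 */
};

#define NSID_NOT_ASSIGNED (-1)

struct lwt_state {
	uint16_t type;
	uint8_t data[8];	/* encap-specific payload */
};

/* Only interpret data as an nsid when the state really is a netns encap. */
static int lwt_state_nsid(const struct lwt_state *s)
{
	uint32_t nsid;

	if (!s || s->type != ENCAP_NETNS)
		return NSID_NOT_ASSIGNED;
	memcpy(&nsid, s->data, sizeof(nsid));
	return (int)nsid;
}

/* A bitwise test is not equivalent: 1 & 3 and 2 & 3 are both non-zero. */
static int type_matches_bitwise(uint16_t type)
{
	return (type & ENCAP_NETNS) != 0;
}
```

With this numbering, a bitwise test against ENCAP_NETNS would also accept MPLS and IP states and then misread their payload as an nsid.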

Re: [patch net-next 3/4] mlxsw: Add interface to access registers and process events

2015-07-24 Thread Andy Gospodarek

On Fri, Jul 24, 2015 at 07:24:53AM +0200, Jiri Pirko wrote:
> Thu, Jul 23, 2015 at 11:12:20PM CEST, go...@cumulusnetworks.com wrote:
> >On Thu, Jul 23, 2015 at 05:43:35PM +0200, Jiri Pirko wrote:
> >> From: Ido Schimmel 
> >> 
> >> Add the ability to construct mailbox-style register access messages
> >> called EMADs with provisions to construct and parse the registers payload.
> >> Implement EMAD transaction layer which is responsible for the reliable
> >> transmission of EMADs.
> >> Also, add an infrastructure used by the switch driver to register for
> >> particular events generated by the device.
> >> 
> >> Signed-off-by: Ido Schimmel 
> >> Signed-off-by: Jiri Pirko 
> >> Signed-off-by: Elad Raz 
> >> ---
> >>  drivers/net/ethernet/mellanox/mlxsw/core.c |  736 
> >>  drivers/net/ethernet/mellanox/mlxsw/core.h |   21 +
> >>  drivers/net/ethernet/mellanox/mlxsw/emad.h |  127 +++
> >>  drivers/net/ethernet/mellanox/mlxsw/port.h |   19 +
> >>  drivers/net/ethernet/mellanox/mlxsw/reg.h  | 1289
> >>  5 files changed, 2192 insertions(+)
> >>  create mode 100644 drivers/net/ethernet/mellanox/mlxsw/emad.h
> >>  create mode 100644 drivers/net/ethernet/mellanox/mlxsw/reg.h
> >> 
> >> diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c b/drivers/net/ethernet/mellanox/mlxsw/core.c
> >> index 211ec9b..bd0f692 100644
> >> --- a/drivers/net/ethernet/mellanox/mlxsw/core.c
> >> +++ b/drivers/net/ethernet/mellanox/mlxsw/core.c
> >[...]
> >> +  struct list_head event_listener_list;
> >> +  struct {
> >> +  struct sk_buff *resp_skb;
> >> +  u64 tid;
> >> +  wait_queue_head_t wait;
> >> +  bool trans_active;
> >> +  struct mutex lock; /* One EMAD transaction at a time. */
> >> +  bool use_emad;
> >> +  } emad;
> >>struct mlxsw_core_pcpu_stats __percpu *pcpu_stats;
> >>struct dentry *dbg_dir;
> >>struct {
> >[...]
> >>}
> >>  
> >>INIT_LIST_HEAD(&mlxsw_core->rx_listener_list);
> >> +  INIT_LIST_HEAD(&mlxsw_core->event_listener_list);
> >>mlxsw_core->driver = mlxsw_driver;
> >>mlxsw_core->bus = mlxsw_bus;
> >>mlxsw_core->bus_priv = bus_priv;
> >[...]
> >> +  /* No reason to save item if we did not manage to register an RX
> >> +   * listener for it.
> >> +   */
> >> +  list_add_rcu(&el_item->list, &mlxsw_core->event_listener_list);
> >> +
> >
> >I see where 'event_listener_list' is defined and where entries are
> >added/removed, but where is the code that would receive these events and
> >presumably search this list so all handlers registered (currently just
> >PUDE) can handle events?
> 
> That is handled by calling mlxsw_core_rx_listener_register(),
> which adds each event handler as an item to &mlxsw_core->rx_listener_list.
> These rx_listeners are called from mlxsw_core_skb_receive().
> So an event is a special case of an rx_listener here. The event list is used
> just to hold struct mlxsw_event_listener_item instances.
> 

Thanks for the explanation!  I missed how that was called the first time
I went through this.
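
Jiri's explanation, that an event is a special case of an rx listener and the event list only keeps bookkeeping items, can be sketched in userspace C. All names here (core, rx_listener_register(), event_listener_register(), skb_receive()) are hypothetical, fixed-size arrays stand in for the RCU lists in the driver, and the trap id 42 is arbitrary.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LISTENERS 8

struct rx_listener {
	int trap_id;
	void (*func)(int trap_id, void *priv);
	void *priv;
};

struct core {
	struct rx_listener rx[MAX_LISTENERS];
	size_t n_rx;
	/* Bookkeeping only: which event listeners were registered. */
	int event_ids[MAX_LISTENERS];
	size_t n_events;
};

static int rx_listener_register(struct core *c, int trap_id,
				void (*func)(int, void *), void *priv)
{
	if (c->n_rx == MAX_LISTENERS)
		return -1;
	c->rx[c->n_rx++] = (struct rx_listener){ trap_id, func, priv };
	return 0;
}

/* An event listener is just an rx listener plus a bookkeeping entry. */
static int event_listener_register(struct core *c, int trap_id,
				   void (*func)(int, void *), void *priv)
{
	/* No reason to record the item if the rx listener failed. */
	if (rx_listener_register(c, trap_id, func, priv))
		return -1;
	c->event_ids[c->n_events++] = trap_id;
	return 0;
}

/* Receive path: only the rx list is walked; events need no separate path. */
static void skb_receive(struct core *c, int trap_id)
{
	for (size_t i = 0; i < c->n_rx; i++)
		if (c->rx[i].trap_id == trap_id)
			c->rx[i].func(trap_id, c->rx[i].priv);
}

static void count_event(int trap_id, void *priv)
{
	(void)trap_id;
	++*(int *)priv;
}
```

This is why there is no code that walks the event list on receive: dispatch happens entirely through the rx listeners.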


[PATCH] net: phy: fix auto negotiation checking for teranetics

2015-07-24 Thread shh.xie

From: Shaohui Xie 

When the fiber port is in use, the PHY cannot report its auto-negotiation
state, so the driver should always report auto-negotiation as done in that case.

Signed-off-by: Shaohui Xie 
---
 drivers/net/phy/teranetics.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/net/phy/teranetics.c b/drivers/net/phy/teranetics.c
index 7dcb5aa..91e1bec 100644
--- a/drivers/net/phy/teranetics.c
+++ b/drivers/net/phy/teranetics.c
@@ -51,8 +51,15 @@ static int teranetics_aneg_done(struct phy_device *phydev)
 {
int reg;
 
-   reg = phy_read_mmd(phydev, MDIO_MMD_AN, MDIO_STAT1);
-   return (reg < 0) ? reg : (reg & BMSR_ANEGCOMPLETE);
+   /* auto negotiation state can only be checked when using copper
+* port, if using fiber port, just lie it's done.
+*/
+   if (!phy_read_mmd(phydev, MDIO_MMD_VEND1, 93)) {
+   reg = phy_read_mmd(phydev, MDIO_MMD_AN, MDIO_STAT1);
+   return (reg < 0) ? reg : (reg & BMSR_ANEGCOMPLETE);
+   }
+
+   return 1;
 }
 
 static int teranetics_config_aneg(struct phy_device *phydev)
-- 
2.1.0.27.g96db324

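The decision in teranetics_aneg_done() after this patch can be modelled with a fake register reader. This is a sketch under assumptions: read_fn, fake_phy and fake_read() are hypothetical, and the constants are taken to match MDIO_MMD_AN (7), MDIO_MMD_VEND1 (30), MDIO_STAT1 (1) and BMSR_ANEGCOMPLETE (0x0020) from the kernel headers. As written in the patch, any non-zero result from the vendor register read, including a negative error code, takes the "fiber" branch; the sketch mirrors that rather than changing it.

```c
#include <assert.h>

/* Hypothetical register reader: returns a value or a negative errno. */
typedef int (*read_fn)(int devad, int reg, void *ctx);

/* Values assumed to match the kernel's MDIO/MII constants. */
#define DEVAD_AN	7	/* MDIO_MMD_AN */
#define DEVAD_VEND1	30	/* MDIO_MMD_VEND1 */
#define REG_STAT1	1	/* MDIO_STAT1 */
#define REG_MEDIA	93	/* vendor register read by the patch */
#define ANEG_COMPLETE	0x0020	/* BMSR_ANEGCOMPLETE */

/* Mirrors the patched teranetics_aneg_done() decision. */
static int aneg_done(read_fn rd, void *ctx)
{
	int reg;

	/* Zero from the vendor register is taken to mean copper. */
	if (!rd(DEVAD_VEND1, REG_MEDIA, ctx)) {
		reg = rd(DEVAD_AN, REG_STAT1, ctx);
		return (reg < 0) ? reg : (reg & ANEG_COMPLETE);
	}
	/* Anything else, including a negative error, is treated as fiber. */
	return 1;
}

/* Fake PHY for exercising the logic. */
struct fake_phy {
	int media;	/* returned for the vendor register */
	int stat1;	/* returned for AN STAT1 */
};

static int fake_read(int devad, int reg, void *ctx)
{
	const struct fake_phy *p = ctx;

	if (devad == DEVAD_VEND1 && reg == REG_MEDIA)
		return p->media;
	if (devad == DEVAD_AN && reg == REG_STAT1)
		return p->stat1;
	return -5;	/* EIO-style error for any other register */
}
```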

Re: [PATCH] net/ipv6: add sysctl option accept_ra_hop_limit

2015-07-24 Thread Hangbin Liu

2015-07-24 12:48 GMT+08:00 YOSHIFUJI Hideaki:
> Hi,
>
> Hangbin Liu wrote:
>> Commit 6fd99094de2b ("ipv6: Don't reduce hop limit for an interface")
>> stopped accepting the hop limit from an RA if it is lower than the current
>> hop limit, for security reasons. But this behavior kind of breaks the RFC
>> definition.
>>
>> RFC 4861, 6.3.4.  Processing Received Router Advertisements
>>    If the received Cur Hop Limit value is non-zero, the host SHOULD set
>>    its CurHopLimit variable to the received value.
>>
>> So add a sysctl option accept_ra_hop_limit to let the user choose whether
>> to accept the hop limit info in RAs.
>>
>> Signed-off-by: Hangbin Liu 
>> Acked-by: Hannes Frederic Sowa 
>> ---
>>  Documentation/networking/ip-sysctl.txt | 11 +++
>>  include/linux/ipv6.h   |  1 +
>>  include/uapi/linux/ipv6.h  |  1 +
>>  net/ipv6/addrconf.c| 10 ++
>>  net/ipv6/ndisc.c   | 17 +++--
>>  5 files changed, 34 insertions(+), 6 deletions(-)
>>
> :
>> diff --git a/include/uapi/linux/ipv6.h b/include/uapi/linux/ipv6.h
>> index 5efa54a..9f40ac9 100644
>> --- a/include/uapi/linux/ipv6.h
>> +++ b/include/uapi/linux/ipv6.h
>> @@ -153,6 +153,7 @@ enum {
>>   DEVCONF_FORCE_MLD_VERSION,
>>   DEVCONF_ACCEPT_RA_DEFRTR,
>>   DEVCONF_ACCEPT_RA_PINFO,
>> + DEVCONF_ACCEPT_RA_HOP_LIMIT,
>>   DEVCONF_ACCEPT_RA_RTR_PREF,
>>   DEVCONF_RTR_PROBE_INTERVAL,
>>   DEVCONF_ACCEPT_RA_RT_INFO_MAX_PLEN,
>
> No, you cannot add new one in the middle of these since
> values are exported to userspace.
>
Hi Yoshifuji-san,

Thanks for the reminder. Should I also move the value in struct ipv6_devconf
to the end, or just leave it after accept_ra_pinfo?

Thanks
Hangbin
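
Why a new DEVCONF_* entry cannot go in the middle of the enum can be shown with two trimmed, hypothetical copies of it. These are illustrative stand-ins with made-up prefixes, not the real uapi header; the point is only that values after an inserted entry shift by one, so a userspace program built against the old header would read the wrong slot, while appending before the sentinel (DEVCONF_MAX in the real header) leaves existing values untouched.

```c
#include <assert.h>

/* Trimmed, hypothetical copy of an exported enum before the change. */
enum devconf_old {
	OLD_ACCEPT_RA_DEFRTR,
	OLD_ACCEPT_RA_PINFO,
	OLD_ACCEPT_RA_RTR_PREF,
	OLD_RTR_PROBE_INTERVAL,
	OLD_MAX
};

/* New entry inserted in the middle: every later value shifts by one. */
enum devconf_middle {
	MID_ACCEPT_RA_DEFRTR,
	MID_ACCEPT_RA_PINFO,
	MID_ACCEPT_RA_HOP_LIMIT,
	MID_ACCEPT_RA_RTR_PREF,
	MID_RTR_PROBE_INTERVAL,
	MID_MAX
};

/* New entry appended just before the sentinel: old values are unchanged. */
enum devconf_appended {
	APP_ACCEPT_RA_DEFRTR,
	APP_ACCEPT_RA_PINFO,
	APP_ACCEPT_RA_RTR_PREF,
	APP_RTR_PROBE_INTERVAL,
	APP_ACCEPT_RA_HOP_LIMIT,
	APP_MAX
};
```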

Re: [PATCH net-next] ipv6: use lwtunnel_output6() only if flag redirect is set

2015-07-24 Thread Thomas Graf

On 07/24/15 at 10:59am, Nicolas Dichtel wrote:
> This function makes sense only when LWTUNNEL_STATE_OUTPUT_REDIRECT is set.
> The check is already done in IPv4.
> 
> CC: Thomas Graf 
> CC: Roopa Prabhu 
> Fixes: 74a0f2fe8ed5 ("ipv6: rt6_info output redirect to tunnel output")
> Signed-off-by: Nicolas Dichtel 

Acked-by: Thomas Graf 

Re: [PATCH net-next 2/2] lwtunnel: change prototype of lwtunnel_state_get()

2015-07-24 Thread Thomas Graf

On 07/24/15 at 12:28pm, Nicolas Dichtel wrote:
> Returning the state from this function saves some lines and simplifies the
> code a bit. It also makes it possible to handle a NULL entry.
> 
> To avoid overly long lines, I've also renamed lwtunnel_state_get() and
> lwtunnel_state_put() to lwtstate_get() and lwtstate_put().
> 
> CC: Thomas Graf 
> CC: Roopa Prabhu 
> Signed-off-by: Nicolas Dichtel 

Acked-by: Thomas Graf 

Re: [PATCH net-next 1/2] ipv6: copy lwtstate in ip6_rt_copy_init()

2015-07-24 Thread Thomas Graf

On 07/24/15 at 12:28pm, Nicolas Dichtel wrote:
> We need to copy this field (ip6_rt_cache_alloc() and ip6_rt_pcpu_alloc()
> use ip6_rt_copy_init() to build a dst).
> 
> CC: Thomas Graf 
> CC: Roopa Prabhu 
> Fixes: 19e42e451506 ("ipv6: support for fib route lwtunnel encap attributes")
> Signed-off-by: Nicolas Dichtel 

Acked-by: Thomas Graf 

[PATCH net-next 2/2] lwtunnel: change prototype of lwtunnel_state_get()

2015-07-24 Thread Nicolas Dichtel

Returning the state from this function saves some lines and simplifies the
code a bit. It also makes it possible to handle a NULL entry.

To avoid overly long lines, I've also renamed lwtunnel_state_get() and
lwtunnel_state_put() to lwtstate_get() and lwtstate_put().

CC: Thomas Graf 
CC: Roopa Prabhu 
Signed-off-by: Nicolas Dichtel 
---
 include/net/lwtunnel.h   | 16 +++-
 net/ipv4/fib_semantics.c |  9 -
 net/ipv4/route.c |  9 ++---
 net/ipv6/ip6_fib.c   |  2 +-
 net/ipv6/route.c |  8 ++--
 5 files changed, 20 insertions(+), 24 deletions(-)

diff --git a/include/net/lwtunnel.h b/include/net/lwtunnel.h
index bd72e82b45a1..78376da1afa2 100644
--- a/include/net/lwtunnel.h
+++ b/include/net/lwtunnel.h
@@ -37,12 +37,16 @@ extern const struct lwtunnel_encap_ops __rcu *
lwtun_encaps[LWTUNNEL_ENCAP_MAX+1];
 
 #ifdef CONFIG_LWTUNNEL
-static inline void lwtunnel_state_get(struct lwtunnel_state *lws)
+static inline struct lwtunnel_state *
+lwtstate_get(struct lwtunnel_state *lws)
 {
-   atomic_inc(&lws->refcnt);
+   if (lws)
+   atomic_inc(&lws->refcnt);
+
+   return lws;
 }
 
-static inline void lwtunnel_state_put(struct lwtunnel_state *lws)
+static inline void lwtstate_put(struct lwtunnel_state *lws)
 {
if (!lws)
return;
@@ -76,11 +80,13 @@ int lwtunnel_output6(struct sock *sk, struct sk_buff *skb);
 
 #else
 
-static inline void lwtunnel_state_get(struct lwtunnel_state *lws)
+static inline struct lwtunnel_state *
+lwtstate_get(struct lwtunnel_state *lws)
 {
+   return lws;
 }
 
-static inline void lwtunnel_state_put(struct lwtunnel_state *lws)
+static inline void lwtstate_put(struct lwtunnel_state *lws)
 {
 }
 
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index 6754c64b2fe0..7226df887531 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -209,7 +209,7 @@ static void free_fib_info_rcu(struct rcu_head *head)
change_nexthops(fi) {
if (nexthop_nh->nh_dev)
dev_put(nexthop_nh->nh_dev);
-   lwtunnel_state_put(nexthop_nh->nh_lwtstate);
+   lwtstate_put(nexthop_nh->nh_lwtstate);
free_nh_exceptions(nexthop_nh);
rt_fibinfo_free_cpus(nexthop_nh->nh_pcpu_rth_output);
rt_fibinfo_free(&nexthop_nh->nh_rth_input);
@@ -512,8 +512,8 @@ static int fib_get_nhs(struct fib_info *fi, struct rtnexthop *rtnh,
   nla, &lwtstate);
if (ret)
goto errout;
-   lwtunnel_state_get(lwtstate);
-   nexthop_nh->nh_lwtstate = lwtstate;
+   nexthop_nh->nh_lwtstate =
+   lwtstate_get(lwtstate);
}
}
 
@@ -969,8 +969,7 @@ struct fib_info *fib_create_info(struct fib_config *cfg)
if (err)
goto failure;
 
-   lwtunnel_state_get(lwtstate);
-   nh->nh_lwtstate = lwtstate;
+   nh->nh_lwtstate = lwtstate_get(lwtstate);
}
nh->nh_oif = cfg->fc_oif;
nh->nh_gw = cfg->fc_gw;
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 519ec232818d..11096396ef4a 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1358,7 +1358,7 @@ static void ipv4_dst_destroy(struct dst_entry *dst)
list_del(&rt->rt_uncached);
spin_unlock_bh(&ul->lock);
}
-   lwtunnel_state_put(rt->rt_lwtstate);
+   lwtstate_put(rt->rt_lwtstate);
 }
 
 void rt_flush_dev(struct net_device *dev)
@@ -1407,12 +1407,7 @@ static void rt_set_nexthop(struct rtable *rt, __be32 daddr,
 #ifdef CONFIG_IP_ROUTE_CLASSID
rt->dst.tclassid = nh->nh_tclassid;
 #endif
-   if (nh->nh_lwtstate) {
-   lwtunnel_state_get(nh->nh_lwtstate);
-   rt->rt_lwtstate = nh->nh_lwtstate;
-   } else {
-   rt->rt_lwtstate = NULL;
-   }
+   rt->rt_lwtstate = lwtstate_get(nh->nh_lwtstate);
if (unlikely(fnhe))
cached = rt_bind_exception(rt, fnhe, daddr);
else if (!(rt->dst.flags & DST_NOCACHE))
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index d715f2e0c4e7..5693b5eb8482 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -178,7 +178,7 @@ static void rt6_free_pcpu(struct rt6_info *non_pcpu_rt)
 static void rt6_release(struct rt6_info *rt)
 {
if (atomic_dec_and_test(&rt->rt6i_ref)) {
-   lwtunnel_state_put(rt->rt6i_lwtstate);
+   lwtstate_put(rt->rt6i_lwtstate);
rt6_free_pcpu(rt);
dst_free(&rt->dst);
}
diff --git a/net/i

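The new lwtstate_get() returns its argument and tolerates NULL, which is what lets rt_set_nexthop() collapse its if/else into a single assignment. A userspace model with C11 atomics follows; struct state, state_alloc(), state_get() and state_put() are hypothetical names, and malloc()/free() stand in for the kernel allocator.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for a refcounted lwtunnel state. */
struct state {
	atomic_int refcnt;
};

static struct state *state_alloc(void)
{
	struct state *s = malloc(sizeof(*s));

	if (s)
		atomic_init(&s->refcnt, 1);
	return s;
}

/* NULL-tolerant get that returns its argument, so a caller can write
 * dst->state = state_get(src->state); in one line. */
static struct state *state_get(struct state *s)
{
	if (s)
		atomic_fetch_add(&s->refcnt, 1);
	return s;
}

/* NULL-tolerant put: frees the state when the last reference is dropped. */
static void state_put(struct state *s)
{
	if (!s)
		return;
	if (atomic_fetch_sub(&s->refcnt, 1) == 1)
		free(s);
}
```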
[PATCH net-next 1/2] ipv6: copy lwtstate in ip6_rt_copy_init()

2015-07-24 Thread Nicolas Dichtel

We need to copy this field (ip6_rt_cache_alloc() and ip6_rt_pcpu_alloc()
use ip6_rt_copy_init() to build a dst).

CC: Thomas Graf 
CC: Roopa Prabhu 
Fixes: 19e42e451506 ("ipv6: support for fib route lwtunnel encap attributes")
Signed-off-by: Nicolas Dichtel 
---
 net/ipv6/route.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 67b2367126f3..ac01ab0886a5 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -2164,6 +2164,10 @@ static void ip6_rt_copy_init(struct rt6_info *rt, struct rt6_info *ort)
 #endif
rt->rt6i_prefsrc = ort->rt6i_prefsrc;
rt->rt6i_table = ort->rt6i_table;
+   if (ort->rt6i_lwtstate) {
+   lwtunnel_state_get(ort->rt6i_lwtstate);
+   rt->rt6i_lwtstate = ort->rt6i_lwtstate;
+   }
 }
 
 #ifdef CONFIG_IPV6_ROUTE_INFO
-- 
2.4.2

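The ip6_rt_copy_init() fix boils down to one rule: when a clone copies a refcounted pointer, it must also take a reference. This hypothetical userspace model (route, encap, route_copy_init() and route_release() are illustrative names) uses a plain int refcount and assumes single-threaded use. Unlike the patch, which only touches the field when the original has a state, the sketch assigns it unconditionally, so a clone of a route without a state ends up with NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model: a route holds a refcounted encap state. */
struct encap {
	int refcnt;
};

struct route {
	int table;
	struct encap *encap;
};

static void encap_put(struct encap *e)
{
	if (e && --e->refcnt == 0)
		free(e);
}

/* Clone helper: a copied shared pointer takes its own reference, so the
 * clone and the original can be released independently. */
static void route_copy_init(struct route *rt, const struct route *ort)
{
	rt->table = ort->table;
	rt->encap = ort->encap;
	if (rt->encap)
		rt->encap->refcnt++;
}

static void route_release(struct route *rt)
{
	encap_put(rt->encap);
	rt->encap = NULL;
}
```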

Re: [PATCH]mlx4-core: fix possible use after free in cq_completion

2015-07-24 Thread Jinpu Wang

On Fri, Jul 24, 2015 at 10:18 AM, Jinpu Wang wrote:
> Hi all,
>
> I hit a bug in OFED and reported it at the link below:
>
> http://marc.info/?l=linux-rdma&m=143634872328553&w=2
> I checked the latest mainline, Linux 4.2-rc3, and it has a similar bug.
> Here is a patch against Linux 4.2-rc3, compile-tested only.
>
> I've added a copy as an attachment in case the mail client breaks the patch format.
>
> From a9fbc1ff0768acdb260e57e3324798fc0082d194 Mon Sep 17 00:00:00 2001
> From: Jack Wang 
> Date: Thu, 23 Jul 2015 18:58:08 +0200
> Subject: [PATCH] mlx4_core: fix possible use-after-free in cq_completion
>
> It's possible that new cq_completion events arrive during mlx4_cq_free,
> and cq_completion has neither spin_lock nor refcount protection, which
> leads to a use-after-free. So add spin_lock and refcount protection
> in cq_completion.
>
> Signed-off-by: Jack Wang 
> ---
>  drivers/net/ethernet/mellanox/mlx4/cq.c | 11 +--
>  1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/cq.c b/drivers/net/ethernet/mellanox/mlx4/cq.c
> index 3348e64..8d7f405 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/cq.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/cq.c
> @@ -99,10 +99,15 @@ static void mlx4_add_cq_to_tasklet(struct mlx4_cq *cq)
>
>  void mlx4_cq_completion(struct mlx4_dev *dev, u32 cqn)
>  {
> +struct mlx4_cq_table *cq_table = &mlx4_priv(dev)->cq_table;
>  struct mlx4_cq *cq;
>
> -cq = radix_tree_lookup(&mlx4_priv(dev)->cq_table.tree,
> -   cqn & (dev->caps.num_cqs - 1));
> +spin_lock(&cq_table->lock);
> +cq = radix_tree_lookup(&cq_table->tree, cqn & (dev->caps.num_cqs - 1));
> +if (cq)
> +atomic_inc(&cq->refcount);
> +
> +spin_unlock(&cq_table->lock);
>  if (!cq) {
>  mlx4_dbg(dev, "Completion event for bogus CQ %08x\n", cqn);
>  return;
> @@ -111,6 +116,8 @@ void mlx4_cq_completion(struct mlx4_dev *dev, u32 cqn)
>  ++cq->arm_sn;
>
>  cq->comp(cq);
> +if (atomic_dec_and_test(&cq->refcount))
> +complete(&cq->free);
>  }
>
>  void mlx4_cq_event(struct mlx4_dev *dev, u32 cqn, int event_type)
> --
> 1.9.1
>

I found almost the same patch as mine, posted 3 years ago :)

http://linux-rdma.vger.kernel.narkive.com/NSyWFRkW/patch-rfc-for-next-net-mlx4-core-fix-racy-flow-in-the-driver-cq-completion-handler

Could you consider applying the patch? It fixes a real panic.

Thanks

Jack

--
Best Regards,

Jack Wang

Linux Kernel Developer Storage
ProfitBricks GmbH  The IaaS-Company.

ProfitBricks GmbH
Greifswalder Str. 207
D - 10405 Berlin
Tel: +49 30 5770083-42
Fax: +49 30 5770085-98
Email: jinpu.w...@profitbricks.com
URL: http://www.profitbricks.de

Registered office: Berlin.
Register court: Amtsgericht Charlottenburg, HRB 125506 B.
Managing directors: Andreas Gauger, Achim Weiss.

[PATCH net-next] ipv6: use lwtunnel_output6() only if flag redirect is set

2015-07-24 Thread Nicolas Dichtel

This function makes sense only when LWTUNNEL_STATE_OUTPUT_REDIRECT is set.
The check is already done in IPv4.

CC: Thomas Graf 
CC: Roopa Prabhu 
Fixes: 74a0f2fe8ed5 ("ipv6: rt6_info output redirect to tunnel output")
Signed-off-by: Nicolas Dichtel 
---
 net/ipv6/route.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index f216cb998628..67b2367126f3 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1780,7 +1780,8 @@ int ip6_route_add(struct fib6_config *cfg)
goto out;
lwtunnel_state_get(lwtstate);
rt->rt6i_lwtstate = lwtstate;
-   rt->dst.output = lwtunnel_output6;
+   if (lwtunnel_output_redirect(rt->rt6i_lwtstate))
+   rt->dst.output = lwtunnel_output6;
}
 
ipv6_addr_prefix(&rt->rt6i_dst.addr, &cfg->fc_dst, cfg->fc_dst_len);
-- 
2.4.2

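The fix makes the IPv6 path install lwtunnel_output6 only when the state asks for an output redirect, matching the IPv4 behaviour. A minimal model of that choice follows; encap_state, STATE_OUTPUT_REDIRECT, output_redirect(), pick_output() and the two output functions are hypothetical stand-ins for the route output hook and the flag the patch checks through lwtunnel_output_redirect().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bit: the encap wants to handle output itself. */
#define STATE_OUTPUT_REDIRECT 0x1

struct encap_state {
	unsigned int flags;
};

typedef int (*output_fn)(void);

static int default_output(void) { return 0; }	/* normal route output */
static int tunnel_output(void) { return 1; }	/* encap output */

static int output_redirect(const struct encap_state *s)
{
	return s && (s->flags & STATE_OUTPUT_REDIRECT);
}

/* Divert to the tunnel output only when the state asks for it; otherwise
 * keep the default hook the route already had. */
static output_fn pick_output(const struct encap_state *s)
{
	output_fn out = default_output;

	if (output_redirect(s))
		out = tunnel_output;
	return out;
}
```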

Re: [PATCH V3 3/7] Drivers: hv: vmbus: add APIs to send/recv hvsock packet and get the r/w-ability

2015-07-24 Thread Dan Carpenter

On Fri, Jul 24, 2015 at 11:57:01AM +0530, Sudip Mukherjee wrote:
> This is also OK: the function is supposed to return ret OR-ed with the
> relevant flags based on the scan position. It is considered an error if 0
> is returned (without any flag).

Yeah.  You're right.  I looked through my list again this morning and
they all seem fine...

regards,
dan carpenter


[PATCH]mlx4-core: fix possible use after free in cq_completion

2015-07-24 Thread Jinpu Wang

Hi all,

I hit a bug in OFED and reported it at the link below:

http://marc.info/?l=linux-rdma&m=143634872328553&w=2
I checked the latest mainline, Linux 4.2-rc3, and it has a similar bug.
Here is a patch against Linux 4.2-rc3, compile-tested only.

I've added a copy as an attachment in case the mail client breaks the patch format.

From a9fbc1ff0768acdb260e57e3324798fc0082d194 Mon Sep 17 00:00:00 2001
From: Jack Wang 
Date: Thu, 23 Jul 2015 18:58:08 +0200
Subject: [PATCH] mlx4_core: fix possible use-after-free in cq_completion

It's possible that new cq_completion events arrive during mlx4_cq_free,
and cq_completion has neither spin_lock nor refcount protection, which
leads to a use-after-free. So add spin_lock and refcount protection
in cq_completion.

Signed-off-by: Jack Wang 
---
 drivers/net/ethernet/mellanox/mlx4/cq.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/cq.c b/drivers/net/ethernet/mellanox/mlx4/cq.c
index 3348e64..8d7f405 100644
--- a/drivers/net/ethernet/mellanox/mlx4/cq.c
+++ b/drivers/net/ethernet/mellanox/mlx4/cq.c
@@ -99,10 +99,15 @@ static void mlx4_add_cq_to_tasklet(struct mlx4_cq *cq)
 
 void mlx4_cq_completion(struct mlx4_dev *dev, u32 cqn)
 {
+	struct mlx4_cq_table *cq_table = &mlx4_priv(dev)->cq_table;
 	struct mlx4_cq *cq;
 
-	cq = radix_tree_lookup(&mlx4_priv(dev)->cq_table.tree,
-			   cqn & (dev->caps.num_cqs - 1));
+	spin_lock(&cq_table->lock);
+	cq = radix_tree_lookup(&cq_table->tree, cqn & (dev->caps.num_cqs - 1));
+	if (cq)
+		atomic_inc(&cq->refcount);
+
+	spin_unlock(&cq_table->lock);
 	if (!cq) {
 		mlx4_dbg(dev, "Completion event for bogus CQ %08x\n", cqn);
 		return;
@@ -111,6 +116,8 @@ void mlx4_cq_completion(struct mlx4_dev *dev, u32 cqn)
 	++cq->arm_sn;
 
 	cq->comp(cq);
+	if (atomic_dec_and_test(&cq->refcount))
+		complete(&cq->free);
 }
 
 void mlx4_cq_event(struct mlx4_dev *dev, u32 cqn, int event_type)
-- 
1.9.1

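The pattern this patch adds is: look up the CQ and take a reference while holding the table lock, run the completion handler without the lock, then drop the reference and signal completion on the last put. A single-file model with C11 atomics follows. It is a sketch under assumptions: an atomic_flag spinlock stands in for spin_lock(), an array for the radix tree, and a bool for complete(&cq->free); cq_remove() is a hypothetical free path, and a real one would wait for that completion before releasing memory.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_CQS 4	/* power of two, so cqn & (NUM_CQS - 1) is an index */

struct cq {
	atomic_int refcount;	/* 1 for the table, +1 per running handler */
	int comp_calls;
	bool freed;		/* stands in for complete(&cq->free) */
};

struct cq_table {
	atomic_flag lock;		/* stands in for the table spinlock */
	struct cq *slots[NUM_CQS];	/* stands in for the radix tree */
};

static void table_lock(struct cq_table *t)
{
	while (atomic_flag_test_and_set_explicit(&t->lock, memory_order_acquire))
		;	/* spin */
}

static void table_unlock(struct cq_table *t)
{
	atomic_flag_clear_explicit(&t->lock, memory_order_release);
}

static void cq_put(struct cq *cq)
{
	/* Last reference gone: signal whoever waits to free the CQ. */
	if (atomic_fetch_sub(&cq->refcount, 1) == 1)
		cq->freed = true;
}

/* Completion path: pin the CQ under the lock, run the handler unlocked. */
static bool cq_completion(struct cq_table *t, unsigned int cqn)
{
	struct cq *cq;

	table_lock(t);
	cq = t->slots[cqn & (NUM_CQS - 1)];
	if (cq)
		atomic_fetch_add(&cq->refcount, 1);
	table_unlock(t);

	if (!cq)
		return false;	/* bogus CQ number */
	cq->comp_calls++;	/* the cq->comp(cq) handler */
	cq_put(cq);
	return true;
}

/* Free path: unpublish under the lock, then drop the table's reference. */
static void cq_remove(struct cq_table *t, unsigned int cqn)
{
	struct cq *cq;

	table_lock(t);
	cq = t->slots[cqn & (NUM_CQS - 1)];
	t->slots[cqn & (NUM_CQS - 1)] = NULL;
	table_unlock(t);
	if (cq)
		cq_put(cq);
}
```

Taking the reference inside the lock is the key step: once the free path has unpublished the CQ, no new handler can pin it, and any handler already running keeps it alive until its own put.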

Re: [PATCH 1/1] ath10k: fixing wrong initialization of struct channel

2015-07-24 Thread Kalle Valo

Maninder Singh  writes:

> chandef is initialized to NULL, and on the very next line
> we use it to get the channel, which is not correct.
>
> The channel should be initialized after obtaining chandef.
>
> Signed-off-by: Maninder Singh 

Thanks, applied.

-- 
Kalle Valo
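
The ath10k bug is an ordering problem: a pointer initialized to NULL was dereferenced before it was assigned. A hypothetical userspace illustration follows; chan_def, channel, get_channel() and lookup_from_ctx() are stand-ins, not the cfg80211 or ath10k types. The corrected order obtains chandef first and only then derives the channel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a channel definition and its channel. */
struct channel {
	int center_freq;
};

struct chan_def {
	struct channel *chan;
};

/* Buggy order, shown for contrast only:
 *	struct chan_def *chandef = NULL;
 *	struct channel *channel = chandef->chan;	// NULL dereference
 */

/* Fixed order: obtain chandef first, then derive the channel from it. */
static struct channel *get_channel(struct chan_def *(*lookup)(void *), void *ctx)
{
	struct chan_def *chandef = NULL;
	struct channel *channel = NULL;

	chandef = lookup(ctx);
	if (!chandef)
		return NULL;
	channel = chandef->chan;
	return channel;
}

/* In this model the context is the chan_def itself. */
static struct chan_def *lookup_from_ctx(void *ctx)
{
	return ctx;
}
```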

Re: [patch net-next 3/4] mlxsw: Add interface to access registers and process events

2015-07-24 Thread Elad Raz


> On Jul 24, 2015, at 08:14, Scott Feldman  wrote:
> 
>> On Thu, Jul 23, 2015 at 8:43 AM, Jiri Pirko  wrote:
>> From: Ido Schimmel 
>> 
>> Add the ability to construct mailbox-style register access messages
>> called EMADs with provisions to construct and parse the registers payload.
>> Implement EMAD transaction layer which is responsible for the reliable
>> transmission of EMADs.
>> Also, add an infrastructure used by the switch driver to register for
>> particular events generated by the device.
> 
> What are these EMADs used for?  Are they for intra-switch or inter-switch
> communications?

EMAD stands for Ethernet Management Datagram.
It is a command encoding wrapped as a packet. It is used for host interface
communication as well as for multi-silicon support.