Re: [RFC PATCH v4 net-next 1/4] tcp: replace cnt & rtt with struct in pkts_acked()
On Fri, 2015-07-24 at 19:47 -0700, Lawrence Brakmo wrote:
> Replace 2 arguments (cnt and rtt) in the congestion control modules'
> pkts_acked() function with a struct. This will allow adding more
> information without having to modify existing congestion control
> modules (tcp_nv in particular needs bytes in flight when packet
> was sent).
>
>
> +struct ack_sample {
> +	u32 pkts_acked;
> +	s32 rtt_us;
> +};
> +
>  struct tcp_congestion_ops {
>  	struct list_head	list;
>  	u32 key;
> @@ -857,7 +862,7 @@ struct tcp_congestion_ops {
>  	/* new value of cwnd after loss (optional) */
>  	u32 (*undo_cwnd)(struct sock *sk);
>  	/* hook for packet ack accounting (optional) */
> -	void (*pkts_acked)(struct sock *sk, u32 num_acked, s32 rtt_us);
> +	void (*pkts_acked)(struct sock *sk, struct ack_sample *sample);

This probably should be a const struct ack_sample *sample ?

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
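A minimal sketch of the const-qualified hook suggested above. It uses hypothetical stand-in types (struct toy_ca, pkts_acked_fn, deliver_ack()) instead of the real struct sock and struct tcp_congestion_ops: the module can read pkts_acked and rtt_us, while any write through the const pointer would be rejected at compile time.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t u32;
typedef int32_t s32;

/* Mirrors the struct proposed in the patch. */
struct ack_sample {
	u32 pkts_acked;
	s32 rtt_us;
};

/* Hypothetical stand-in for per-socket congestion state. */
struct toy_ca {
	u32 acked_total;
	s32 last_rtt_us;
};

/* Hook type with the const-qualified sample, as suggested in review. */
typedef void (*pkts_acked_fn)(struct toy_ca *ca,
			      const struct ack_sample *sample);

static void toy_pkts_acked(struct toy_ca *ca, const struct ack_sample *sample)
{
	/* Reading is fine; "sample->rtt_us = 0;" here would not compile. */
	ca->acked_total += sample->pkts_acked;
	if (sample->rtt_us > 0)
		ca->last_rtt_us = sample->rtt_us;
}

/* Hypothetical caller: builds the sample on the stack and passes it in. */
static void deliver_ack(struct toy_ca *ca, pkts_acked_fn hook,
			u32 acked, s32 rtt)
{
	struct ack_sample sample = { .pkts_acked = acked, .rtt_us = rtt };

	if (hook)
		hook(ca, &sample);
}
```

Because the caller owns the sample, const documents that congestion modules only consume it, which also lets the core reuse one sample for several consumers safely.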
Re: [Patch net] sch_choke: drop all packets in queue during reset
From: Cong Wang
Date: Tue, 21 Jul 2015 16:52:43 -0700

> Signed-off-by: Cong Wang

Applied.
Re: [Patch net] sch_plug: purge buffered packets during reset
From: Cong Wang
Date: Tue, 21 Jul 2015 16:31:53 -0700

> Otherwise the skbuff related structures are not correctly
> refcount'ed.
>
> Cc: Jamal Hadi Salim
> Signed-off-by: Cong Wang

Applied.
Re: [PATCH net-next v2] bridge: Fix setting a flag in br_fill_ifvlaninfo_range().
From: Rami Rosen
Date: Wed, 22 Jul 2015 07:57:02 +0300

> This patch fixes setting of vinfo.flags in the br_fill_ifvlaninfo_range()
> method. The assignment of vinfo.flags &= ~BRIDGE_VLAN_INFO_RANGE_BEGIN has
> no effect and is unneeded, as the vinfo.flags value is overridden by the
> immediately following vinfo.flags = flags | BRIDGE_VLAN_INFO_RANGE_END
> assignment.
>
> Signed-off-by: Rami Rosen

Applied.
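To illustrate why the removed line was a dead store, here is a standalone sketch. struct toy_vlan_info and both helpers are hypothetical, and the two flag values are assumed to mirror include/uapi/linux/if_bridge.h. Both versions return the same flags for any input, because the final assignment overwrites whatever the &= produced.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match the uapi values in include/uapi/linux/if_bridge.h. */
#define BRIDGE_VLAN_INFO_RANGE_BEGIN (1 << 3)
#define BRIDGE_VLAN_INFO_RANGE_END   (1 << 4)

/* Hypothetical stand-in for struct bridge_vlan_info. */
struct toy_vlan_info {
	uint16_t flags;
	uint16_t vid;
};

/* Before the fix: the &= result is never read, so the store is dead. */
static uint16_t end_flags_before(uint16_t flags)
{
	struct toy_vlan_info vinfo = { .flags = flags };

	vinfo.flags &= ~BRIDGE_VLAN_INFO_RANGE_BEGIN;
	vinfo.flags = flags | BRIDGE_VLAN_INFO_RANGE_END;
	return vinfo.flags;
}

/* After the fix: only the assignment that matters remains. */
static uint16_t end_flags_after(uint16_t flags)
{
	struct toy_vlan_info vinfo = { .flags = 0 };

	vinfo.flags = flags | BRIDGE_VLAN_INFO_RANGE_END;
	return vinfo.flags;
}
```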
Re: [PATCH net 0/2] ipv4: fib_select_default changes
From: Julian Anastasov
Date: Wed, 22 Jul 2015 10:43:21 +0300

> This patchset contains 2 changes for the alternative routes,
> one to add tb_id/fa_slen check needed after the recent
> fib_trie optimizations for fib aliases and the second
> change attempts to support alternative routes with TOS
> requirement.
>
> Sorry that I don't have access to the original
> report from Hagen Paul Pfeifer. I hope he will see this
> change.
>
> The second change adds fa_default field to the
> fib aliases (which can be many) and if the feature to
> filter the alternative routes by TOS is not worth it,
> this second patch can be scrapped.

Great work, series applied, thanks Julian!
Re: [PATCH net-next] be2net: support ndo_get_phys_port_id()
From: Sriharsha Basavapatna
Date: Wed, 22 Jul 2015 11:15:12 +0530

> From: Sriharsha Basavapatna
>
> Add be_get_phys_port_id() function to report physical port id. The port id
> should be unique across different be2net devices in the system. We use the
> chip serial number along with the physical port number for this.
>
> Signed-off-by: Sriharsha Basavapatna

Applied, thanks.
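One way such an ID could be assembled, as a sketch only. The byte layout (port number first, then the serial number), the 8-byte serial length and all toy_* names are assumptions, loosely modeled on the id/id_len pair that ndo_get_phys_port_id() fills in. Prefixing the chip-wide serial with the port number is what keeps IDs distinct both across ports of one chip and across chips.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_SERIAL_LEN  8                   /* hypothetical serial length */
#define TOY_PORT_ID_LEN (1 + TOY_SERIAL_LEN)

/* Hypothetical ID container with an id/id_len pair. */
struct toy_port_id {
	uint8_t id[TOY_PORT_ID_LEN];
	uint8_t id_len;
};

/* Hypothetical layout: byte 0 = physical port number, then serial bytes. */
static void toy_get_phys_port_id(const uint8_t serial[TOY_SERIAL_LEN],
				 uint8_t port_num, struct toy_port_id *out)
{
	out->id[0] = port_num;
	memcpy(&out->id[1], serial, TOY_SERIAL_LEN);
	out->id_len = TOY_PORT_ID_LEN;
}

static int toy_port_id_equal(const struct toy_port_id *a,
			     const struct toy_port_id *b)
{
	return a->id_len == b->id_len && memcmp(a->id, b->id, a->id_len) == 0;
}
```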
Re: [PATCH V2 net-next 0/3] ARM BPF JIT features
From: Nicolas Schichan
Date: Tue, 21 Jul 2015 14:16:37 +0200

> This serie adds support for more instructions to the ARM BPF JIT

"series"

> namely skb netdevice type retrieval, skb payload offset retrieval, and
> skb packet type retrieval.
>
> This allows 35 tests to use the JIT instead of 29 before.
>
> This serie depends on the "BPF JIT fixes for ARM" serie sent earlier.

"series"

But even with that series applied these patches do not apply properly
at all.

davem@greenl8ke:~/src/GIT/net-next$ git am --signoff bundle-8569-arm-bpf-next.mbox
Applying: ARM: net: add support for BPF_ANC | SKF_AD_PKTTYPE in ARM JIT.
error: patch failed: arch/arm/net/bpf_jit_32.c:864
error: arch/arm/net/bpf_jit_32.c: patch does not apply
Patch failed at 0001 ARM: net: add support for BPF_ANC | SKF_AD_PKTTYPE in ARM JIT.
When you have resolved this problem run "git am --resolved".
If you would prefer to skip this patch, instead run "git am --skip".
To restore the original branch and stop patching run "git am --abort".

Please respin against net-next, thanks.
[RFC PATCH net] sctp: ASCONF-ACK with Unresolvable Address should be sent
RFC 5061:

   This is an opaque integer assigned by the sender to identify each
   request parameter. The receiver of the ASCONF Chunk will copy this
   32-bit value into the ASCONF Response Correlation ID field of the
   ASCONF-ACK response parameter. The sender of the ASCONF can use this
   same value in the ASCONF-ACK to find which request the response is
   for. Note that the receiver MUST NOT change this 32-bit value.

   Address Parameter: TLV

   This field contains an IPv4 or IPv6 address parameter, as described
   in Section 3.3.2.1 of [RFC4960].

ASCONF chunk with Error Cause Indication Parameter (Unresolvable Address)
should be sent if the Delete IP Address is not part of the association.

  Endpoint A                    Endpoint B
  (ESTABLISHED)                 (ESTABLISHED)

  ASCONF ->
  (Delete IP Address)
                                <- ASCONF-ACK
                                (Unresolvable Address)

Signed-off-by: Xin Long
---
 net/sctp/sm_make_chunk.c | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
index 06320c8..6e399f6 100644
--- a/net/sctp/sm_make_chunk.c
+++ b/net/sctp/sm_make_chunk.c
@@ -3090,8 +3090,19 @@ static __be16 sctp_process_asconf_param(struct sctp_association *asoc,
 			sctp_assoc_set_primary(asoc, asconf->transport);
 			sctp_assoc_del_nonprimary_peers(asoc,
 							asconf->transport);
-		} else
-			sctp_assoc_del_peer(asoc, &addr);
+			return SCTP_ERROR_NO_ERROR;
+		}
+
+		/* If the address is not part of the association, the
+		 * ASCONF-ACK with Error Cause Indication Parameter
+		 * which including cause of Unresolvable Address should
+		 * be sent.
+		 */
+		peer = sctp_assoc_lookup_paddr(asoc, &addr);
+		if (!peer)
+			return SCTP_ERROR_DNS_FAILED;
+
+		sctp_assoc_rm_peer(asoc, peer);
 		break;
 	case SCTP_PARAM_SET_PRIMARY:
 		/* ADDIP Section 4.2.4
--
2.1.0
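A userspace sketch of the Delete IP handling as reposted, in the early-return style requested in review: look the peer up once, answer with Unresolvable Address (cause code 5 in RFC 4960, Section 3.3.10) when it is absent, and otherwise remove the entry already found. The fixed-size address array and all toy_* names are hypothetical stand-ins for sctp_assoc_lookup_paddr() and sctp_assoc_rm_peer(). The sketch omits the first branch of the real code (the one that calls sctp_assoc_set_primary() and sctp_assoc_del_nonprimary_peers()).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Error cause codes from RFC 4960, Section 3.3.10. */
enum toy_err {
	TOY_ERROR_NO_ERROR = 0,
	TOY_ERROR_UNRESOLVABLE_ADDR = 5,
};

#define TOY_MAX_PEERS 4

/* Hypothetical association: a fixed array of IPv4 peer addresses. */
struct toy_assoc {
	uint32_t peers[TOY_MAX_PEERS];
	int npeers;
};

/* Returns the index of addr, or -1 if it is not part of the association. */
static int toy_lookup_paddr(const struct toy_assoc *asoc, uint32_t addr)
{
	for (int i = 0; i < asoc->npeers; i++)
		if (asoc->peers[i] == addr)
			return i;
	return -1;
}

/* Removes the entry at idx without searching again. */
static void toy_rm_peer(struct toy_assoc *asoc, int idx)
{
	memmove(&asoc->peers[idx], &asoc->peers[idx + 1],
		(size_t)(asoc->npeers - idx - 1) * sizeof(asoc->peers[0]));
	asoc->npeers--;
}

static enum toy_err toy_process_del_ip(struct toy_assoc *asoc, uint32_t addr)
{
	int idx = toy_lookup_paddr(asoc, addr);

	if (idx < 0)
		return TOY_ERROR_UNRESOLVABLE_ADDR;

	toy_rm_peer(asoc, idx);
	return TOY_ERROR_NO_ERROR;
}
```

Returning early keeps the common path at one indentation level, which is the point raised in the review of the first version.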
Re: [RFC PATCH net] sctp: ASCONF-ACK with Unresolvable Address should be sent
On Sat, Jul 25, 2015 at 3:11 AM, Marcelo Ricardo Leitner wrote: > On Fri, Jul 24, 2015 at 02:56:29PM +0800, Xin Long wrote: >> RFC 5061: >> This is an opaque integer assigned by the sender to identify each >> request parameter. The receiver of the ASCONF Chunk will copy this >> 32-bit value into the ASCONF Response Correlation ID field of the >> ASCONF-ACK response parameter. The sender of the ASCONF can use this >> same value in the ASCONF-ACK to find which request the response is >> for. Note that the receiver MUST NOT change this 32-bit value. >> >> Address Parameter: TLV >> >> This field contains an IPv4 or IPv6 address parameter, as described >> in Section 3.3.2.1 of [RFC4960]. >> >> ASCONF chunk with Error Cause Indication Parameter (Unresolvable Address) >> should be sent if the Delete IP Address is not part of the association. >> >> Endpoint A Endpoint B >> (ESTABLISHED)(ESTABLISHED) >> >> ASCONF-> >> (Delete IP Address) >> <- ASCONF-ACK >> (Unresolvable Address) >> >> Signed-off-by: Xin Long >> --- >> net/sctp/sm_make_chunk.c | 12 +++- >> 1 file changed, 11 insertions(+), 1 deletion(-) >> >> diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c >> index 06320c8..88d82ef 100644 >> --- a/net/sctp/sm_make_chunk.c >> +++ b/net/sctp/sm_make_chunk.c >> @@ -3090,8 +3090,18 @@ static __be16 sctp_process_asconf_param(struct >> sctp_association *asoc, > > Please let's avoid increasing the indentation level when possible > >> sctp_assoc_set_primary(asoc, asconf->transport); >> sctp_assoc_del_nonprimary_peers(asoc, >> asconf->transport); > add a return here > >> - } else >> + } else { > and remove this else {} > and we're good. > > sctp code is often too indented, trying to reduce that bit here and > there. > >> + /* If the address is not part of the association, the >> + * ASCONF-ACK with Error Cause Indication Parameter >> + * which including cause of Unresolvable Address should >> + * be sent. 
>> + */ >> + peer = sctp_assoc_lookup_paddr(asoc, &addr); >> + if (!peer) >> + return SCTP_ERROR_DNS_FAILED; >> + >> sctp_assoc_del_peer(asoc, &addr); > > Here we can replace this call to sctp_assoc_rm_peer() , because if we > already have peer, we don't have to search for it again. > > Thanks, > Marcelo > >> + } >> break; >> case SCTP_PARAM_SET_PRIMARY: >> /* ADDIP Section 4.2.4 >> -- >> 2.1.0 >> > > okay, I will repost it
[RFC PATCH v4 net-next 0/4] tcp: add NV congestion control
This patchset adds support for NV congestion control.

The first patch replaces two arguments with a struct in pkts_acked()

The second patch is a refactor of tcp_skb_cb

The third patch adds in_flight to tcp_skb_cb's tx section

The fourth patch adds NV congestion control support.

[RFC PATCH v4 net-next 1/4] tcp: replace cnt & rtt with struct in pkts_acked()
[RFC PATCH v4 net-next 2/4] tcp: refactor struct tcp_skb_cb
[RFC PATCH v4 net-next 3/4] tcp: add in_flight to tcp_skb_cb
[RFC PATCH v4 net-next 4/4] tcp: add NV congestion control

Signed-off-by: Lawrence Brakmo

 include/net/tcp.h       |  20 ++-
 net/ipv4/Kconfig        |  16 ++
 net/ipv4/Makefile       |   1 +
 net/ipv4/tcp_bic.c      |   6 +-
 net/ipv4/tcp_cdg.c      |  14 +-
 net/ipv4/tcp_cubic.c    |   6 +-
 net/ipv4/tcp_htcp.c     |  10 +-
 net/ipv4/tcp_illinois.c |  20 +--
 net/ipv4/tcp_input.c    |  10 +-
 net/ipv4/tcp_lp.c       |   6 +-
 net/ipv4/tcp_nv.c       | 479
 net/ipv4/tcp_output.c   |   4 +-
 net/ipv4/tcp_vegas.c    |   6 +-
 net/ipv4/tcp_vegas.h    |   2 +-
 net/ipv4/tcp_veno.c     |   6 +-
 net/ipv4/tcp_westwood.c |   6 +-
 net/ipv4/tcp_yeah.c     |   6 +-
 17 files changed, 567 insertions(+), 51 deletions(-)
[RFC PATCH v4 net-next 4/4] tcp: add NV congestion control
This is a request for comments. TCP-NV (New Vegas) is a major update to TCP-Vegas. An earlier version of NV was presented at 2010's LPC. It is a delay-based congestion avoidance algorithm for the data center. This version has been tested within a 10G rack where the HW RTTs are 20-50us. A description of TCP-NV, including implementation details as well as experimental results, can be found at: http://www.brakmo.org/networking/tcp-nv/TCPNV.html The current version includes many module parameters to support experimentation with the parameters. Signed-off-by: Lawrence Brakmo --- net/ipv4/Kconfig | 16 ++ net/ipv4/Makefile | 1 + net/ipv4/tcp_nv.c | 479 ++ 3 files changed, 496 insertions(+) create mode 100644 net/ipv4/tcp_nv.c diff --git a/net/ipv4/Kconfig b/net/ipv4/Kconfig index 6fb3c90..f11f2f8 100644 --- a/net/ipv4/Kconfig +++ b/net/ipv4/Kconfig @@ -539,6 +539,22 @@ config TCP_CONG_VEGAS window. TCP Vegas should provide less packet loss, but it is not as aggressive as TCP Reno. +config TCP_CONG_NV + tristate "TCP NV" + default n + ---help--- + TCP NV is a follow up to TCP Vegas. It has been modified to deal with + 10G networks, measurement noise introduced by LRO, GRO and interrupt + coalescence. In addition, it will decrease its cwnd multiplicatively + instead of linearly. + + Note that in general congestion avoidance (cwnd decreased when # packets + queued grows) cannot coexist with congestion control (cwnd decreased only + when there is packet loss) due to fairness issues. One scenario when they + can coexist safely is when the CA flows have RTTs << CC flows RTTs.
+ + For further details see http://www.brakmo.org/networking/tcp-nv/ + config TCP_CONG_SCALABLE tristate "Scalable TCP" default n diff --git a/net/ipv4/Makefile b/net/ipv4/Makefile index efc43f3..06f335f 100644 --- a/net/ipv4/Makefile +++ b/net/ipv4/Makefile @@ -50,6 +50,7 @@ obj-$(CONFIG_TCP_CONG_HSTCP) += tcp_highspeed.o obj-$(CONFIG_TCP_CONG_HYBLA) += tcp_hybla.o obj-$(CONFIG_TCP_CONG_HTCP) += tcp_htcp.o obj-$(CONFIG_TCP_CONG_VEGAS) += tcp_vegas.o +obj-$(CONFIG_TCP_CONG_NV) += tcp_nv.o obj-$(CONFIG_TCP_CONG_VENO) += tcp_veno.o obj-$(CONFIG_TCP_CONG_SCALABLE) += tcp_scalable.o obj-$(CONFIG_TCP_CONG_LP) += tcp_lp.o diff --git a/net/ipv4/tcp_nv.c b/net/ipv4/tcp_nv.c new file mode 100644 index 000..c4379b8 --- /dev/null +++ b/net/ipv4/tcp_nv.c @@ -0,0 +1,479 @@ +/* + * TCP NV: TCP with Congestion Avoidance + * + * TCP-NV is a successor of TCP-Vegas that has been developed to + * deal with the issues that occur in modern networks. + * Like TCP-Vegas, TCP-NV supports true congestion avoidance, + * the ability to detect congestion before packet losses occur. + * When congestion (queue buildup) starts to occur, TCP-NV + * predicts what the cwnd size should be for the current + * throughput and it reduces the cwnd proportionally to + * the difference between the current cwnd and the predicted cwnd. + * TCP-NV behaves like Reno when no congestion is detected, or when + * recovering from packet losses. + * + * TODO: + * 1) Add option to not decrease cwnd on losses below certain level + * 2) Add mechanism to deal with reverse congestion. 
+ */ + +#include +#include +#include +#include +#include + +/* TCP NV parameters */ +static int nv_enable __read_mostly = 1; +static int nv_pad __read_mostly = 10; +static int nv_pad_buffer __read_mostly = 2; +static int nv_reset_period __read_mostly = 5; +static int nv_min_cwnd = 10; +static int nv_dec_eval_min_calls = 100; +static int nv_ssthresh_eval_min_calls = 30; +static int nv_rtt_min_cnt = 2; +static int nv_cong_decrease_mult = 30*128/100; +static int nv_ssthresh_factor = 8; +static int nv_rtt_factor = 128; +static int nv_rtt_cnt_dec_delta = 20; /* dec cwnd by this many RTTs */ +static int nv_dec_factor = 5; /* actual value is factor/8 */ +static int nv_loss_dec_factor = 820; /* on loss reduce cwnd by 20% */ +static int nv_cwnd_growth_factor = 2; /* larger => cwnd grows slower */ + +module_param(nv_pad, int, 0644); +MODULE_PARM_DESC(nv_pad, "extra packets above congestion level"); +module_param(nv_pad_buffer, int, 0644); +MODULE_PARM_DESC(nv_pad_buffer, "no growth buffer zone"); +module_param(nv_reset_period, int, 0644); +MODULE_PARM_DESC(nv_reset_period, "nv_min_rtt reset period (secs)"); +module_param(nv_min_cwnd, int, 0644); +MODULE_PARM_DESC(nv_min_cwnd, "NV will not decrease cwnd below this value" +" without losses"); +module_param(nv_dec_eval_min_calls, int, 0644); +MODULE_PARM_DESC(nv_dec_eval_min_calls, "Wait for this many data points " +"before declaring congestion (< 256)"); +module_param(nv_ssthresh_eval_min_calls, int, 0644); +MODULE_PARM_DESC(nv_ssthresh_eval_min_calls, "Wait for this many data points " +"before declaring congestion during initial slow-start"); +module_para
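The cwnd reduction described in the changelog ("reduces the cwnd proportionally to the difference between the current cwnd and the predicted cwnd") can be sketched as below. This is a simplified, hypothetical model, not the logic in tcp_nv.c: the predicted cwnd is taken to be rate times min RTT (the packets the path holds without queueing), and only nv_pad, nv_dec_factor (applied as factor/8) and nv_min_cwnd are borrowed from the module parameter defaults listed above.

```c
#include <assert.h>
#include <stdint.h>

/* Defaults copied from the module parameters in the patch. */
static const uint32_t toy_nv_pad = 10;        /* extra packets above congestion level */
static const uint32_t toy_nv_dec_factor = 5;  /* actual value is factor/8 */
static const uint32_t toy_nv_min_cwnd = 10;   /* floor without losses */

/*
 * Hypothetical, simplified NV-style decrease:
 *   predicted = rate * min_rtt   (packets in flight with no queueing)
 * If cwnd exceeds predicted by more than nv_pad, queues are building,
 * so cut cwnd by dec_factor/8 of the excess, never below nv_min_cwnd.
 */
static uint32_t toy_nv_decrease(uint32_t cwnd, uint64_t rate_pkts_per_sec,
				uint32_t min_rtt_us)
{
	uint64_t predicted = rate_pkts_per_sec * min_rtt_us / 1000000;
	uint64_t excess, dec, next;

	if (cwnd <= predicted + toy_nv_pad)
		return cwnd;                 /* no congestion detected */

	excess = cwnd - predicted;
	dec = excess * toy_nv_dec_factor / 8;
	next = cwnd - dec;
	return next < toy_nv_min_cwnd ? toy_nv_min_cwnd : (uint32_t)next;
}
```

For example, at 1,000,000 packets/s and a 50 us min RTT the predicted cwnd is 50; a cwnd of 100 is 50 packets over, so it drops by 50 * 5 / 8 = 31 to 69.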
[RFC PATCH v4 net-next 2/4] tcp: refactor struct tcp_skb_cb
Refactor tcp_skb_cb to create two overlapping areas to store
state for incoming or outgoing skbs based on comments by
Neal Cardwell to tcp_nv patch:

   AFAICT this patch would not require an increase in the size of
   sk_buff cb[] if it were to take advantage of the fact that the
   tcp_skb_cb header.h4 and header.h6 fields are only used in the
   packet reception code path, and this in_flight field is only used
   on the transmit side.

Signed-off-by: Lawrence Brakmo
---
 include/net/tcp.h | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 1e6c5b04..7c510ed 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -755,11 +755,16 @@ struct tcp_skb_cb {
 	/* 1 byte hole */
 	__u32		ack_seq;	/* Sequence number ACK'd	*/
 	union {
-		struct inet_skb_parm	h4;
+		struct {
+			/* There is space for up to 20 bytes */
+		} tx;   /* only used for outgoing skbs */
+		union {
+			struct inet_skb_parm	h4;
 #if IS_ENABLED(CONFIG_IPV6)
-		struct inet6_skb_parm	h6;
+			struct inet6_skb_parm	h6;
 #endif
-	} header;	/* For incoming frames		*/
+		} header;	/* For incoming skbs */
+	};
 };
 
 #define TCP_SKB_CB(__skb)	((struct tcp_skb_cb *)&((__skb)->cb[0]))
--
1.8.1
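A sketch of why the overlay costs nothing, using a hypothetical 20-byte stand-in for struct inet_skb_parm (real sizes depend on the kernel configuration). The tx struct shares storage with header, so the control block does not grow as long as tx stays within the space the comment reserves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 20-byte stand-in for struct inet_skb_parm. */
struct toy_skb_parm {
	uint8_t opaque[20];
};

/* Before: only the receive-side header lives in the union. */
struct toy_cb_before {
	uint32_t seq, end_seq;
	union {
		struct toy_skb_parm h4;
	} header;
};

/* After: a tx struct overlays the same bytes as header. */
struct toy_cb_after {
	uint32_t seq, end_seq;
	union {
		struct {
			uint32_t in_flight;   /* only meaningful on outgoing skbs */
		} tx;
		union {
			struct toy_skb_parm h4;
		} header;                     /* only meaningful on incoming skbs */
	};
};

_Static_assert(sizeof(struct toy_cb_after) == sizeof(struct toy_cb_before),
	       "tx overlay must not grow the control block");
```

The trade-off is that tx and header must never be live at the same time on one skb, which is exactly the rx/tx split the quoted review relies on.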
[RFC PATCH v4 net-next 3/4] tcp: add in_flight to tcp_skb_cb
Add in_flight (bytes in flight when packet was sent) field to tx component of tcp_skb_cb and make it available to congestion modules' pkts_acked() function through the ack_sample function argument. Signed-off-by: Lawrence Brakmo --- include/net/tcp.h | 2 ++ net/ipv4/tcp_input.c | 5 - net/ipv4/tcp_output.c | 4 +++- 3 files changed, 9 insertions(+), 2 deletions(-) diff --git a/include/net/tcp.h b/include/net/tcp.h index 7c510ed..f850404 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -757,6 +757,7 @@ struct tcp_skb_cb { union { struct { /* There is space for up to 20 bytes */ + __u32 in_flight;/* Bytes in flight when packet sent */ } tx; /* only used for outgoing skbs */ union { struct inet_skb_parmh4; @@ -842,6 +843,7 @@ union tcp_cc_info; struct ack_sample { u32 pkts_acked; s32 rtt_us; + u32 in_flight; }; struct tcp_congestion_ops { diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 423d3af..3ab4178 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -3068,6 +3068,7 @@ static int tcp_clean_rtx_queue(struct sock *sk, int prior_fackets, long ca_rtt_us = -1L; struct sk_buff *skb; u32 pkts_acked = 0; + u32 last_in_flight = 0; bool rtt_update; int flag = 0; @@ -3107,6 +3108,7 @@ static int tcp_clean_rtx_queue(struct sock *sk, int prior_fackets, if (!first_ackt.v64) first_ackt = last_ackt; + last_in_flight = TCP_SKB_CB(skb)->tx.in_flight; reord = min(pkts_acked, reord); if (!after(scb->end_seq, tp->high_seq)) flag |= FLAG_ORIG_SACK_ACKED; @@ -3196,7 +3198,8 @@ static int tcp_clean_rtx_queue(struct sock *sk, int prior_fackets, } if (icsk->icsk_ca_ops->pkts_acked) { - struct ack_sample sample = {pkts_acked, ca_rtt_us}; + struct ack_sample sample = {pkts_acked, ca_rtt_us, + last_in_flight}; icsk->icsk_ca_ops->pkts_acked(sk, &sample); } diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 7105784..e9deab5 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -920,9 +920,12 @@ static int tcp_transmit_skb(struct sock 
*sk, struct sk_buff *skb, int clone_it, int err; BUG_ON(!skb || !tcp_skb_pcount(skb)); + tp = tcp_sk(sk); if (clone_it) { skb_mstamp_get(&skb->skb_mstamp); + TCP_SKB_CB(skb)->tx.in_flight = TCP_SKB_CB(skb)->end_seq + - tp->snd_una; if (unlikely(skb_cloned(skb))) skb = pskb_copy(skb, gfp_mask); @@ -933,7 +936,6 @@ static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it, } inet = inet_sk(sk); - tp = tcp_sk(sk); tcb = TCP_SKB_CB(skb); memset(&opts, 0, sizeof(opts)); -- 1.8.1
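The in_flight value set in tcp_transmit_skb() is a plain u32 subtraction of sequence numbers, and tcp_clean_rtx_queue() reports the value of the last fully acked skb. A small sketch of both steps, where toy_in_flight(), struct toy_skb and toy_last_in_flight() are hypothetical helpers: the modulo-2^32 arithmetic stays correct even when the sequence space wraps between snd_una and end_seq.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes from snd_una up to the end of this segment, modulo 2^32. */
static uint32_t toy_in_flight(uint32_t end_seq, uint32_t snd_una)
{
	return end_seq - snd_una;
}

/* Hypothetical retransmit-queue entry. */
struct toy_skb {
	uint32_t end_seq;
	uint32_t in_flight;   /* recorded at transmit time */
};

/* Walks the queue like the ACK loop: stop at the first skb whose
 * end_seq is after ack, and report the in_flight of the last one acked. */
static uint32_t toy_last_in_flight(const struct toy_skb *q, int n, uint32_t ack)
{
	uint32_t last = 0;

	for (int i = 0; i < n; i++) {
		if ((int32_t)(q[i].end_seq - ack) > 0)
			break;                /* not fully acked yet */
		last = q[i].in_flight;
	}
	return last;
}
```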
[RFC PATCH v4 net-next 1/4] tcp: replace cnt & rtt with struct in pkts_acked()
Replace 2 arguments (cnt and rtt) in the congestion control modules' pkts_acked() function with a struct. This will allow adding more information without having to modify existing congestion control modules (tcp_nv in particular needs bytes in flight when packet was sent). As proposed by Neal Cardwell in his comments to the tcp_nv patch. Signed-off-by: Lawrence Brakmo --- include/net/tcp.h | 7 ++- net/ipv4/tcp_bic.c | 6 +++--- net/ipv4/tcp_cdg.c | 14 +++--- net/ipv4/tcp_cubic.c| 6 +++--- net/ipv4/tcp_htcp.c | 10 +- net/ipv4/tcp_illinois.c | 20 ++-- net/ipv4/tcp_input.c| 7 +-- net/ipv4/tcp_lp.c | 6 +++--- net/ipv4/tcp_vegas.c| 6 +++--- net/ipv4/tcp_vegas.h| 2 +- net/ipv4/tcp_veno.c | 6 +++--- net/ipv4/tcp_westwood.c | 6 +++--- net/ipv4/tcp_yeah.c | 6 +++--- 13 files changed, 55 insertions(+), 47 deletions(-) diff --git a/include/net/tcp.h b/include/net/tcp.h index 364426a..1e6c5b04 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -834,6 +834,11 @@ enum tcp_ca_ack_event_flags { union tcp_cc_info; +struct ack_sample { + u32 pkts_acked; + s32 rtt_us; +}; + struct tcp_congestion_ops { struct list_headlist; u32 key; @@ -857,7 +862,7 @@ struct tcp_congestion_ops { /* new value of cwnd after loss (optional) */ u32 (*undo_cwnd)(struct sock *sk); /* hook for packet ack accounting (optional) */ - void (*pkts_acked)(struct sock *sk, u32 num_acked, s32 rtt_us); + void (*pkts_acked)(struct sock *sk, struct ack_sample *sample); /* get info for inet_diag (optional) */ size_t (*get_info)(struct sock *sk, u32 ext, int *attr, union tcp_cc_info *info); diff --git a/net/ipv4/tcp_bic.c b/net/ipv4/tcp_bic.c index fd1405d..f237691 100644 --- a/net/ipv4/tcp_bic.c +++ b/net/ipv4/tcp_bic.c @@ -197,15 +197,15 @@ static void bictcp_state(struct sock *sk, u8 new_state) /* Track delayed acknowledgment ratio using sliding window * ratio = (15*ratio + sample) / 16 */ -static void bictcp_acked(struct sock *sk, u32 cnt, s32 rtt) +static void bictcp_acked(struct sock *sk, struct ack_sample 
*sample) { const struct inet_connection_sock *icsk = inet_csk(sk); if (icsk->icsk_ca_state == TCP_CA_Open) { struct bictcp *ca = inet_csk_ca(sk); - cnt -= ca->delayed_ack >> ACK_RATIO_SHIFT; - ca->delayed_ack += cnt; + ca->delayed_ack += sample->pkts_acked - + (ca->delayed_ack >> ACK_RATIO_SHIFT); } } diff --git a/net/ipv4/tcp_cdg.c b/net/ipv4/tcp_cdg.c index 167b6a3..9fbdfa5 100644 --- a/net/ipv4/tcp_cdg.c +++ b/net/ipv4/tcp_cdg.c @@ -294,12 +294,12 @@ static void tcp_cdg_cong_avoid(struct sock *sk, u32 ack, u32 acked) ca->shadow_wnd = max(ca->shadow_wnd, ca->shadow_wnd + incr); } -static void tcp_cdg_acked(struct sock *sk, u32 num_acked, s32 rtt_us) +static void tcp_cdg_acked(struct sock *sk, struct ack_sample *sample) { struct cdg *ca = inet_csk_ca(sk); struct tcp_sock *tp = tcp_sk(sk); - if (rtt_us <= 0) + if (sample->rtt_us <= 0) return; /* A heuristic for filtering delayed ACKs, adapted from: @@ -307,20 +307,20 @@ static void tcp_cdg_acked(struct sock *sk, u32 num_acked, s32 rtt_us) * delay and rate based TCP mechanisms." TR 100219A. CAIA, 2010. */ if (tp->sacked_out == 0) { - if (num_acked == 1 && ca->delack) { + if (sample->pkts_acked == 1 && ca->delack) { /* A delayed ACK is only used for the minimum if it is * provenly lower than an existing non-zero minimum. 
*/ - ca->rtt.min = min(ca->rtt.min, rtt_us); + ca->rtt.min = min(ca->rtt.min, sample->rtt_us); ca->delack--; return; - } else if (num_acked > 1 && ca->delack < 5) { + } else if (sample->pkts_acked > 1 && ca->delack < 5) { ca->delack++; } } - ca->rtt.min = min_not_zero(ca->rtt.min, rtt_us); - ca->rtt.max = max(ca->rtt.max, rtt_us); + ca->rtt.min = min_not_zero(ca->rtt.min, sample->rtt_us); + ca->rtt.max = max(ca->rtt.max, sample->rtt_us); } static u32 tcp_cdg_ssthresh(struct sock *sk) diff --git a/net/ipv4/tcp_cubic.c b/net/ipv4/tcp_cubic.c index 28011fb..9817a8f 100644 --- a/net/ipv4/tcp_cubic.c +++ b/net/ipv4/tcp_cubic.c @@ -416,21 +416,21 @@ static void hystart_update(struct sock *sk, u32 delay) /* Track delayed acknowledgment ratio using sliding window * ratio = (15*ratio + sample) / 16 */ -static void bictcp_acked(struct sock *sk, u32 cnt, s32 rtt_us) +static void bictcp_acked(struct sock *sk, struct ack_sample *sample) { con
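The bictcp_acked() rewrite is a fixed-point moving average: delayed_ack holds the ACK ratio scaled by 2^ACK_RATIO_SHIFT, so adding pkts_acked - (delayed_ack >> ACK_RATIO_SHIFT) implements ratio = (15*ratio + sample) / 16. A standalone sketch comparing the old two-statement form with the new one; ACK_RATIO_SHIFT = 4 is assumed from tcp_bic.c, and the toy_* helpers are hypothetical. The new form never writes to the sample, which also fits the const suggestion made in review.

```c
#include <assert.h>
#include <stdint.h>

#define ACK_RATIO_SHIFT 4   /* assumed value, as in tcp_bic.c */

/* Old form: modifies the by-value cnt argument first. */
static uint32_t toy_acked_old(uint32_t delayed_ack, uint32_t cnt)
{
	cnt -= delayed_ack >> ACK_RATIO_SHIFT;
	delayed_ack += cnt;
	return delayed_ack;
}

/* New form: single expression, pkts_acked is only read. */
static uint32_t toy_acked_new(uint32_t delayed_ack, uint32_t pkts_acked)
{
	delayed_ack += pkts_acked - (delayed_ack >> ACK_RATIO_SHIFT);
	return delayed_ack;
}
```

Both forms rely on unsigned wraparound in the same way, so they agree for every input; with a steady 2 packets per ACK the scaled ratio settles at 32, i.e. a ratio of 2.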
[PATCH net-next 1/1] e1000: remove dead e1000_init_eeprom_params calls.
The device probe method e1000_probe calls e1000_init_eeprom_params
itself, so there's no reason to call it again from e1000_do_write_eeprom
or e1000_do_read_eeprom.

The sentence above assumes that e1000_init_eeprom_params is effective,
but it's mostly dependent on "hw->mac_type": safe, as e1000_probe bails
out early if it can't set mac_type (see e1000_init_hw_struct, then
e1000_set_mac_type).

Btw, if effective, the removed paths would have been deadlock-prone
when e1000_eeprom_spi was set:

-> e1000_write_eeprom (takes e1000_eeprom_lock)
   -> e1000_do_write_eeprom
      -> e1000_init_eeprom_params
         -> e1000_read_eeprom (takes e1000_eeprom_lock)

(same narrative with e1000_read_eeprom -> e1000_do_read_eeprom etc.)

As a final note, the candidate deadlock above can't happen in e1000_probe
due to the way eeprom->word_size is set / tested.

Signed-off-by: Francois Romieu
---
Untested. I have found it while looking at Joern's patch.

 drivers/net/ethernet/intel/e1000/e1000_hw.c | 8
 1 file changed, 8 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000/e1000_hw.c b/drivers/net/ethernet/intel/e1000/e1000_hw.c
index 45c8c864..b1af0d6 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_hw.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_hw.c
@@ -3900,10 +3900,6 @@ static s32 e1000_do_read_eeprom(struct e1000_hw *hw, u16 offset, u16 words,
 		return E1000_SUCCESS;
 	}
 
-	/* If eeprom is not yet detected, do so now */
-	if (eeprom->word_size == 0)
-		e1000_init_eeprom_params(hw);
-
 	/* A check for invalid values: offset too large, too many words, and
 	 * not enough words.
 	 */
@@ -4074,10 +4070,6 @@ static s32 e1000_do_write_eeprom(struct e1000_hw *hw, u16 offset, u16 words,
 		return E1000_SUCCESS;
 	}
 
-	/* If eeprom is not yet detected, do so now */
-	if (eeprom->word_size == 0)
-		e1000_init_eeprom_params(hw);
-
 	/* A check for invalid values: offset too large, too many words, and
 	 * not enough words.
 	 */
--
2.4.3
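A toy model of the lock re-entry in the call chain above, assuming a non-recursive lock: toy_trylock() reports failure where the real lock would block forever. All names are hypothetical, and word_size = 64 is an arbitrary stand-in for a detected EEPROM size.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy non-recursive lock: a second acquisition by the same call chain
 * fails here, where the real lock would never be granted. */
struct toy_lock {
	bool held;
};

static bool toy_trylock(struct toy_lock *l)
{
	if (l->held)
		return false;
	l->held = true;
	return true;
}

static void toy_unlock(struct toy_lock *l)
{
	l->held = false;
}

static struct toy_lock eeprom_lock;  /* stands in for e1000_eeprom_lock */
static int word_size;                /* 0 means "EEPROM not yet detected" */

/* Models e1000_read_eeprom(): takes the lock around the read. */
static bool toy_read_eeprom(void)
{
	if (!toy_trylock(&eeprom_lock))
		return false;            /* real code: self-deadlock here */
	toy_unlock(&eeprom_lock);
	return true;
}

/* Models e1000_init_eeprom_params() on an SPI part, which reads the EEPROM. */
static bool toy_init_eeprom_params(void)
{
	bool ok = toy_read_eeprom();

	word_size = 64;                  /* arbitrary detected size */
	return ok;
}

/* Models e1000_write_eeprom() -> e1000_do_write_eeprom() before the patch. */
static bool toy_write_eeprom_old(void)
{
	bool ok = true;

	if (!toy_trylock(&eeprom_lock))
		return false;
	if (word_size == 0)              /* the lazy detection the patch removes */
		ok = toy_init_eeprom_params();
	toy_unlock(&eeprom_lock);
	return ok;
}
```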
Re: [RFC Patch net-next] inet: introduce a sysctl ip_local_ports_strict_use
On Wed, Jul 22, 2015 at 10:39 PM, Stephen Hemminger wrote:
> On Wed, 22 Jul 2015 17:07:37 -0700
> Cong Wang wrote:
>
>> For a real example, named randomly selects some port to bind() for
>> security concern. (It doesn't use bind(0) to let kernel to select port
>> because it is not random enough, kernel usually just picks the next
>> available.) When running named on a Mesos controlled host, named would
>> silently fail when it binds a port assigned to a Mesos container.
>
> I think named is trying to workaround security issues that were fixed
> 5 years ago in Linux. The kernel does not just pick the next available
> in current code.
>

Good to know that. I will rephrase the changelog.

Thanks.
Re: [PATCH]mlx4-core: fix possible use after free in cq_completion
On Fri, Jul 24, 2015 at 11:18 AM, Jinpu Wang wrote:
> I hit bug in OFED, I report to link below:
> http://marc.info/?l=linux-rdma&m=143634872328553&w=2
> I checked latest mainline Linux 4.2-rc3, it has similar bug.
> Here is the patch against Linux 4.2-rc3, compile test only.

Did you see the bug hitting and the fix in action over upstream?! if
not, it would be very helpful if you do so.

Anyway, I'll ask Jack to look on that next week.

Or.
Re: [RFC PATCH net] sctp: ASCONF-ACK with Unresolvable Address should be sent
On Fri, Jul 24, 2015 at 02:56:29PM +0800, Xin Long wrote: > RFC 5061: > This is an opaque integer assigned by the sender to identify each > request parameter. The receiver of the ASCONF Chunk will copy this > 32-bit value into the ASCONF Response Correlation ID field of the > ASCONF-ACK response parameter. The sender of the ASCONF can use this > same value in the ASCONF-ACK to find which request the response is > for. Note that the receiver MUST NOT change this 32-bit value. > > Address Parameter: TLV > > This field contains an IPv4 or IPv6 address parameter, as described > in Section 3.3.2.1 of [RFC4960]. > > ASCONF chunk with Error Cause Indication Parameter (Unresolvable Address) > should be sent if the Delete IP Address is not part of the association. > > Endpoint A Endpoint B > (ESTABLISHED)(ESTABLISHED) > > ASCONF-> > (Delete IP Address) > <- ASCONF-ACK > (Unresolvable Address) > > Signed-off-by: Xin Long > --- > net/sctp/sm_make_chunk.c | 12 +++- > 1 file changed, 11 insertions(+), 1 deletion(-) > > diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c > index 06320c8..88d82ef 100644 > --- a/net/sctp/sm_make_chunk.c > +++ b/net/sctp/sm_make_chunk.c > @@ -3090,8 +3090,18 @@ static __be16 sctp_process_asconf_param(struct > sctp_association *asoc, Please let's avoid increasing the indentation level when possible > sctp_assoc_set_primary(asoc, asconf->transport); > sctp_assoc_del_nonprimary_peers(asoc, > asconf->transport); add a return here > - } else > + } else { and remove this else {} and we're good. sctp code is often too indented, trying to reduce that bit here and there. > + /* If the address is not part of the association, the > + * ASCONF-ACK with Error Cause Indication Parameter > + * which including cause of Unresolvable Address should > + * be sent. 
> + */ > + peer = sctp_assoc_lookup_paddr(asoc, &addr); > + if (!peer) > + return SCTP_ERROR_DNS_FAILED; > + > sctp_assoc_del_peer(asoc, &addr); Here we can replace this call to sctp_assoc_rm_peer() , because if we already have peer, we don't have to search for it again. Thanks, Marcelo > + } > break; > case SCTP_PARAM_SET_PRIMARY: > /* ADDIP Section 4.2.4 > -- > 2.1.0 >
[PATCH] net: netcp: Fixes SGMII reset on network interface shutdown
This patch asserts SGMII RTRESET, i.e. resetting the SGMII Tx/Rx logic, during network interface shutdown to avoid having the hardware wedge when shutting down with high incoming traffic rates. This is cleared (brought out of RTRESET) when the interface is brought back up. Signed-off-by: WingMan Kwok --- This patch depends on the patch set Subject: [net-next PATCH v1 0/6] net: netcp: Bug fixes of CPSW statistics collection submitted earlier. drivers/net/ethernet/ti/netcp.h |1 + drivers/net/ethernet/ti/netcp_ethss.c | 18 ++ drivers/net/ethernet/ti/netcp_sgmii.c | 30 -- 3 files changed, 47 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/ti/netcp.h b/drivers/net/ethernet/ti/netcp.h index bbacf5c..a8a7306 100644 --- a/drivers/net/ethernet/ti/netcp.h +++ b/drivers/net/ethernet/ti/netcp.h @@ -223,6 +223,7 @@ void *netcp_device_find_module(struct netcp_device *netcp_device, /* SGMII functions */ int netcp_sgmii_reset(void __iomem *sgmii_ofs, int port); +bool netcp_sgmii_rtreset(void __iomem *sgmii_ofs, int port, bool set); int netcp_sgmii_get_port_link(void __iomem *sgmii_ofs, int port); int netcp_sgmii_config(void __iomem *sgmii_ofs, int port, u32 interface); diff --git a/drivers/net/ethernet/ti/netcp_ethss.c b/drivers/net/ethernet/ti/netcp_ethss.c index 7782120..571cf7a 100644 --- a/drivers/net/ethernet/ti/netcp_ethss.c +++ b/drivers/net/ethernet/ti/netcp_ethss.c @@ -2101,11 +2101,28 @@ static void gbe_port_config(struct gbe_priv *gbe_dev, struct gbe_slave *slave, writel(slave->mac_control, GBE_REG_ADDR(slave, emac_regs, mac_control)); } +static void gbe_sgmii_rtreset(struct gbe_priv *priv, + struct gbe_slave *slave, bool set) +{ + void __iomem *sgmii_port_regs; + + if (SLAVE_LINK_IS_XGMII(slave)) + return; + + if ((priv->ss_version == GBE_SS_VERSION_14) && (slave->slave_num >= 2)) + sgmii_port_regs = priv->sgmii_port34_regs; + else + sgmii_port_regs = priv->sgmii_port_regs; + + netcp_sgmii_rtreset(sgmii_port_regs, slave->slave_num, set); +} + static 
void gbe_slave_stop(struct gbe_intf *intf) { struct gbe_priv *gbe_dev = intf->gbe_dev; struct gbe_slave *slave = intf->slave; + gbe_sgmii_rtreset(gbe_dev, slave, true); gbe_port_reset(slave); /* Disable forwarding */ cpsw_ale_control_set(gbe_dev->ale, slave->port_num, @@ -2147,6 +2164,7 @@ static int gbe_slave_open(struct gbe_intf *gbe_intf) gbe_sgmii_config(priv, slave); gbe_port_reset(slave); + gbe_sgmii_rtreset(priv, slave, false); gbe_port_config(priv, slave, priv->rx_packet_max); gbe_set_slave_mac(slave, gbe_intf); /* enable forwarding */ diff --git a/drivers/net/ethernet/ti/netcp_sgmii.c b/drivers/net/ethernet/ti/netcp_sgmii.c index dbeb142..5d8419f 100644 --- a/drivers/net/ethernet/ti/netcp_sgmii.c +++ b/drivers/net/ethernet/ti/netcp_sgmii.c @@ -18,6 +18,9 @@ #include "netcp.h" +#define SGMII_SRESET_RESET BIT(0) +#define SGMII_SRESET_RTRESET BIT(1) + #define SGMII_REG_STATUS_LOCK BIT(4) #define SGMII_REG_STATUS_LINK BIT(0) #define SGMII_REG_STATUS_AUTONEG BIT(2) @@ -51,12 +54,35 @@ static void sgmii_write_reg_bit(void __iomem *base, int reg, u32 val) int netcp_sgmii_reset(void __iomem *sgmii_ofs, int port) { /* Soft reset */ - sgmii_write_reg_bit(sgmii_ofs, SGMII_SRESET_REG(port), 0x1); - while (sgmii_read_reg(sgmii_ofs, SGMII_SRESET_REG(port)) != 0x0) + sgmii_write_reg_bit(sgmii_ofs, SGMII_SRESET_REG(port), + SGMII_SRESET_RESET); + + while ((sgmii_read_reg(sgmii_ofs, SGMII_SRESET_REG(port)) & + SGMII_SRESET_RESET) != 0x0) ; + return 0; } +/* port is 0 based */ +bool netcp_sgmii_rtreset(void __iomem *sgmii_ofs, int port, bool set) +{ + u32 reg; + bool oldval; + + /* Initiate a soft reset */ + reg = sgmii_read_reg(sgmii_ofs, SGMII_SRESET_REG(port)); + oldval = (reg & SGMII_SRESET_RTRESET) != 0x0; + if (set) + reg |= SGMII_SRESET_RTRESET; + else + reg &= ~SGMII_SRESET_RTRESET; + sgmii_write_reg(sgmii_ofs, SGMII_SRESET_REG(port), reg); + wmb(); + + return oldval; +} + int netcp_sgmii_get_port_link(void __iomem *sgmii_ofs, int port) { u32 status = 0, link = 0; --
1.7.9.5
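The register handling in this patch is a read-modify-write of one bit that also reports the bit's previous state, plus a soft-reset poll that now masks only the RESET bit. A minimal userspace sketch follows, assuming a plain uint32_t stands in for the MMIO register (so there is no sgmii_read_reg()/sgmii_write_reg() and no wmb()); the function names are hypothetical, and only the bit positions come from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit layout from the patch: bit 0 = soft reset, bit 1 = RTRESET. */
#define SGMII_SRESET_RESET   (1u << 0)
#define SGMII_SRESET_RTRESET (1u << 1)

/* Hypothetical model of netcp_sgmii_rtreset(): 'reg' is ordinary memory
 * standing in for the SGMII_SRESET register. Returns the old RTRESET state. */
static bool sketch_rtreset(uint32_t *reg, bool set)
{
	uint32_t val = *reg;                               /* read */
	bool oldval = (val & SGMII_SRESET_RTRESET) != 0;   /* remember old bit */

	if (set)
		val |= SGMII_SRESET_RTRESET;               /* hold Tx/Rx logic in reset */
	else
		val &= ~SGMII_SRESET_RTRESET;              /* release it */
	*reg = val;                                        /* write back */
	return oldval;
}

/* Soft-reset completion test that looks only at the RESET bit. While
 * RTRESET is held the register as a whole is non-zero, so a plain
 * "!= 0x0" comparison could never succeed in that state. */
static bool sketch_soft_reset_done(uint32_t reg)
{
	return (reg & SGMII_SRESET_RESET) == 0;
}
```

With RTRESET asserted the register reads back non-zero, yet the masked poll still reports the soft reset as complete; that appears to be why the patch changes the loop condition in netcp_sgmii_reset() as well.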
[PATCH v1 6/6] net/macb: convert to kernel doc
This patch converts the struct description to the kernel-doc format. There is no functional change. Signed-off-by: Andy Shevchenko --- include/linux/platform_data/macb.h | 14 +++--- 1 file changed, 11 insertions(+), 3 deletions(-) diff --git a/include/linux/platform_data/macb.h b/include/linux/platform_data/macb.h index 044a124..21b15f6 100644 --- a/include/linux/platform_data/macb.h +++ b/include/linux/platform_data/macb.h @@ -8,11 +8,19 @@ #ifndef __MACB_PDATA_H__ #define __MACB_PDATA_H__ +/** + * struct macb_platform_data - platform data for MACB Ethernet + * @phy_mask: phy mask passed when register the MDIO bus + * within the driver + * @phy_irq_pin: PHY IRQ + * @is_rmii: using RMII interface? + * @rev_eth_addr: reverse Ethernet address byte order + */ struct macb_platform_data { u32 phy_mask; - int phy_irq_pin; /* PHY IRQ */ - u8 is_rmii; /* using RMII interface? */ - u8 rev_eth_addr; /* reverse Ethernet address byte order */ + int phy_irq_pin; + u8 is_rmii; + u8 rev_eth_addr; }; #endif /* __MACB_PDATA_H__ */ -- 2.4.6
[PATCH v1 4/6] net/macb: suppress compiler warnings
This patch fixes the following warnings: drivers/net/ethernet/cadence/macb.c: In function ‘macb_handle_link_change’: drivers/net/ethernet/cadence/macb.c:266: warning: comparison between signed and unsigned drivers/net/ethernet/cadence/macb.c:267: warning: comparison between signed and unsigned drivers/net/ethernet/cadence/macb.c:291: warning: comparison between signed and unsigned drivers/net/ethernet/cadence/macb.c: In function ‘gem_update_stats’: drivers/net/ethernet/cadence/macb.c:1908: warning: comparison between signed and unsigned drivers/net/ethernet/cadence/macb.c: In function ‘gem_get_ethtool_strings’: drivers/net/ethernet/cadence/macb.c:1988: warning: comparison between signed and unsigned Signed-off-by: Andy Shevchenko --- drivers/net/ethernet/cadence/macb.c | 5 ++--- drivers/net/ethernet/cadence/macb.h | 6 +++--- 2 files changed, 5 insertions(+), 6 deletions(-) diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c index 367fc9d..13d7e96 100644 --- a/drivers/net/ethernet/cadence/macb.c +++ b/drivers/net/ethernet/cadence/macb.c @@ -303,7 +303,6 @@ static void macb_handle_link_change(struct net_device *dev) struct macb *bp = netdev_priv(dev); struct phy_device *phydev = bp->phy_dev; unsigned long flags; - int status_change = 0; spin_lock_irqsave(&bp->lock, flags); @@ -1936,7 +1935,7 @@ static int macb_change_mtu(struct net_device *dev, int new_mtu) static void gem_update_stats(struct macb *bp) { - int i; + unsigned int i; u32 *p = &bp->hw_stats.gem.tx_octets_31_0; for (i = 0; i < GEM_STATS_LEN; ++i, ++p) { @@ -2015,7 +2014,7 @@ static int gem_get_sset_count(struct net_device *dev, int sset) static void gem_get_ethtool_strings(struct net_device *dev, u32 sset, u8 *p) { - int i; + unsigned int i; switch (sset) { case ETH_SS_STATS: diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h index f245340..2aa102e 100644 --- a/drivers/net/ethernet/cadence/macb.h +++ 
b/drivers/net/ethernet/cadence/macb.h @@ -816,9 +816,9 @@ struct macb { struct mii_bus *mii_bus; struct phy_device *phy_dev; - unsigned int link; - unsigned int speed; - unsigned int duplex; + int link; + int speed; + int duplex; u32 caps; unsigned int dma_burst_length; -- 2.4.6
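The "comparison between signed and unsigned" warnings matter because, in a comparison such as i < n with an int i and an unsigned int n, C converts i to unsigned int first, so a negative value turns into a very large one. The patch avoids the mixed comparisons by making the loop counters unsigned and by turning link/speed/duplex into int, presumably to match the PHY fields they are compared with. The two helpers below are hypothetical and only illustrate the conversion rule.

```c
#include <assert.h>
#include <stdbool.h>

/* What 'i < n' really computes when i is int and n is unsigned int:
 * i is converted to unsigned int, so -1 becomes UINT_MAX. */
static bool naive_less(int i, unsigned int n)
{
	return (unsigned int)i < n;
}

/* A sign-aware version: any negative i is smaller than any unsigned n. */
static bool safe_less(int i, unsigned int n)
{
	return i < 0 || (unsigned int)i < n;
}
```

For non-negative values both helpers agree; they differ only when the signed operand is negative, which is exactly the case the compiler warns about.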
[PATCH v1 0/6] net/macb: fix for AVR32 and clean up
It seems no one has recently tested the driver on AVR32 platforms such as ATNGW100. This series makes it work again.
Andy Shevchenko (6):
  net/macb: improve big endian CPU support
  net/macb: check if macb_config present
  net/macb: use dev_*() when netdev is not yet registered
  net/macb: suppress compiler warnings
  net/macb: replace macb_count_tx_descriptors() by DIV_ROUND_UP()
  net/macb: convert to kernel doc
 drivers/net/ethernet/cadence/macb.c | 125
 drivers/net/ethernet/cadence/macb.h |  34 --
 include/linux/platform_data/macb.h  |  14 +++-
 3 files changed, 108 insertions(+), 65 deletions(-)
-- 2.4.6
[PATCH v1 5/6] net/macb: replace macb_count_tx_descriptors() by DIV_ROUND_UP()
macb_count_tx_descriptors() duplicates the generic DIV_ROUND_UP() macro, so this patch replaces it. There is no functional change. Signed-off-by: Andy Shevchenko --- drivers/net/ethernet/cadence/macb.c | 10 ++ 1 file changed, 2 insertions(+), 8 deletions(-) diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c index 13d7e96..5818c04 100644 --- a/drivers/net/ethernet/cadence/macb.c +++ b/drivers/net/ethernet/cadence/macb.c @@ -1157,12 +1157,6 @@ static void macb_poll_controller(struct net_device *dev) } #endif -static inline unsigned int macb_count_tx_descriptors(struct macb *bp, -unsigned int len) -{ - return (len + bp->max_tx_length - 1) / bp->max_tx_length; -} - static unsigned int macb_tx_map(struct macb *bp, struct macb_queue *queue, struct sk_buff *skb) @@ -1313,11 +1307,11 @@ static int macb_start_xmit(struct sk_buff *skb, struct net_device *dev) * socket buffer: skb fragments of jumbo frames may need to be * splitted into many buffer descriptors. */ - count = macb_count_tx_descriptors(bp, skb_headlen(skb)); + count = DIV_ROUND_UP(skb_headlen(skb), bp->max_tx_length); nr_frags = skb_shinfo(skb)->nr_frags; for (f = 0; f < nr_frags; f++) { frag_size = skb_frag_size(&skb_shinfo(skb)->frags[f]); - count += macb_count_tx_descriptors(bp, frag_size); + count += DIV_ROUND_UP(frag_size, bp->max_tx_length); } spin_lock_irqsave(&bp->lock, flags); -- 2.4.6
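The removed helper and DIV_ROUND_UP() compute the same ceiling division, (n + d - 1) / d. The sketch below redefines the macro with that arithmetic and models the descriptor count in macb_start_xmit() as a hypothetical userspace function: one descriptor per max_tx_length bytes of the linear part, plus the same for each fragment.

```c
#include <assert.h>
#include <stddef.h>

/* Ceiling division, the same arithmetic as the removed
 * macb_count_tx_descriptors(): (len + max - 1) / max. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical model of the counting loop in macb_start_xmit(). */
static unsigned int count_tx_descriptors(unsigned int headlen,
					 const unsigned int *frag_sizes,
					 size_t nr_frags,
					 unsigned int max_tx_length)
{
	unsigned int count;
	size_t f;

	assert(max_tx_length > 0);          /* division by zero guard */
	count = DIV_ROUND_UP(headlen, max_tx_length);
	for (f = 0; f < nr_frags; f++)
		count += DIV_ROUND_UP(frag_sizes[f], max_tx_length);
	return count;
}
```

For example, with max_tx_length = 2048, a 100-byte head plus fragments of 4096 and 1 bytes needs 1 + 2 + 1 = 4 descriptors.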
[PATCH v1 2/6] net/macb: check if macb_config present
Commit 98b5a0f4a228 introduced jumbo frame support, but it also assumes that macb_config is present, which is not always true. A configuration without macb_config fails to boot: Unable to handle kernel NULL pointer dereference at virtual address 0010 ptbr = 9035 pgd = Oops: Kernel access of bad area, sig: 11 [#1] FRAME_POINTER chip: 0x01f:0x1e82 rev 2 Modules linked in: CPU: 0 PID: 1 Comm: swapper Not tainted 4.2.0-rc3-next-20150723+ #13 task: 91c26000 ti: 91c28000 task.ti: 91c28000 PC is at macb_probe+0x140/0x61c Fixes: 98b5a0f4a228 (net: macb: Add support for jumbo frames) Signed-off-by: Andy Shevchenko --- drivers/net/ethernet/cadence/macb.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c index 6980115..7986778 100644 --- a/drivers/net/ethernet/cadence/macb.c +++ b/drivers/net/ethernet/cadence/macb.c @@ -2885,9 +2885,8 @@ static int macb_probe(struct platform_device *pdev) bp->pclk = pclk; bp->hclk = hclk; bp->tx_clk = tx_clk; - if (macb_config->jumbo_max_len) { + if (macb_config) bp->jumbo_max_len = macb_config->jumbo_max_len; - } spin_lock_init(&bp->lock); -- 2.4.6
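The old code tested macb_config->jumbo_max_len, which dereferences the pointer before anyone has checked that it exists; the small faulting address in the oops looks consistent with reading a field at a small offset from NULL. The pattern of the fix, sketched with hypothetical trimmed-down structs rather than the real struct macb_config and struct macb:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct macb_config and struct macb. */
struct cfg_sketch  { unsigned int jumbo_max_len; };
struct priv_sketch { unsigned int jumbo_max_len; };

/* The config pointer is optional, so test the pointer itself before
 * reading any field through it; without a config the default is kept. */
static void apply_optional_config(struct priv_sketch *bp,
				  const struct cfg_sketch *cfg)
{
	if (cfg)
		bp->jumbo_max_len = cfg->jumbo_max_len;
}
```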
[PATCH v1 3/6] net/macb: use dev_*() when netdev is not yet registered
Use the dev_*() macros while the netdev is not yet registered, to avoid messages like "macb macb.0 (unnamed net_device) (uninitialized): Cadence caps 0x" and "macb macb.0 (unnamed net_device) (uninitialized): invalid hw address, using random". Signed-off-by: Andy Shevchenko --- drivers/net/ethernet/cadence/macb.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c index 7986778..367fc9d 100644 --- a/drivers/net/ethernet/cadence/macb.c +++ b/drivers/net/ethernet/cadence/macb.c @@ -211,7 +211,7 @@ static void macb_get_hwaddr(struct macb *bp) } } - netdev_info(bp->dev, "invalid hw address, using random\n"); + dev_info(&bp->pdev->dev, "invalid hw address, using random\n"); eth_hw_addr_random(bp->dev); } @@ -2240,7 +2240,7 @@ static void macb_configure_caps(struct macb *bp, const struct macb_config *dt_co bp->caps |= MACB_CAPS_FIFO_MODE; } - netdev_dbg(bp->dev, "Cadence caps 0x%08x\n", bp->caps); + dev_dbg(&bp->pdev->dev, "Cadence caps 0x%08x\n", bp->caps); } static void macb_probe_queues(void __iomem *mem, -- 2.4.6
[PATCH v1 1/6] net/macb: improve big endian CPU support
Commit a50dad355a53 (net: macb: Add big endian CPU support) converted the I/O accessors to readl_relaxed() and writel_relaxed() and consequently broke the MACB driver on AVR32 platforms such as ATNGW100. This patch improves I/O access by checking the endianness first and using the corresponding accessors. Fixes: a50dad355a53 (net: macb: Add big endian CPU support) Signed-off-by: Andy Shevchenko --- drivers/net/ethernet/cadence/macb.c | 103 ++-- drivers/net/ethernet/cadence/macb.h | 28 -- 2 files changed, 87 insertions(+), 44 deletions(-) diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c index a4e3f86..6980115 100644 --- a/drivers/net/ethernet/cadence/macb.c +++ b/drivers/net/ethernet/cadence/macb.c @@ -104,6 +104,57 @@ static void *macb_rx_buffer(struct macb *bp, unsigned int index) return bp->rx_buffers + bp->rx_buffer_size * macb_rx_ring_wrap(index); } +/* I/O accessors */ +static u32 hw_readl_native(struct macb *bp, int offset) +{ + return __raw_readl(bp->regs + offset); +} + +static void hw_writel_native(struct macb *bp, int offset, u32 value) +{ + __raw_writel(value, bp->regs + offset); +} + +static u32 hw_readl(struct macb *bp, int offset) +{ + return readl_relaxed(bp->regs + offset); +} + +static void hw_writel(struct macb *bp, int offset, u32 value) +{ + writel_relaxed(value, bp->regs + offset); +} + +/* + * Find the CPU endianness by using the loopback bit of NCR register. When the + * CPU is in big endian we need to program swaped mode for management + * descriptor access.
+ */ +static bool hw_is_native_io(void __iomem *addr) +{ + u32 value = MACB_BIT(LLB); + + __raw_writel(value, addr + MACB_NCR); + value = __raw_readl(addr + MACB_NCR); + + /* Write 0 back to disable everything */ + __raw_writel(0, addr + MACB_NCR); + + return value == MACB_BIT(LLB); +} + +static bool hw_is_gem(void __iomem *addr, bool native_io) +{ + u32 id; + + if (native_io) + id = __raw_readl(addr + MACB_MID); + else + id = readl_relaxed(addr + MACB_MID); + + return MACB_BFEXT(IDNUM, id) >= 0x2; +} + static void macb_set_hwaddr(struct macb *bp) { u32 bottom; @@ -449,14 +500,14 @@ err_out: static void macb_update_stats(struct macb *bp) { - u32 __iomem *reg = bp->regs + MACB_PFR; u32 *p = &bp->hw_stats.macb.rx_pause_frames; u32 *end = &bp->hw_stats.macb.tx_pause_frames + 1; + int offset = MACB_PFR; WARN_ON((unsigned long)(end - p - 1) != (MACB_TPF - MACB_PFR) / 4); - for(; p < end; p++, reg++) - *p += readl_relaxed(reg); + for(; p < end; p++, offset += 4) + *p += bp->readl(bp, offset); } static int macb_halt_tx(struct macb *bp) @@ -1603,7 +1654,6 @@ static u32 macb_dbw(struct macb *bp) static void macb_configure_dma(struct macb *bp) { u32 dmacfg; - u32 tmp, ncr; if (macb_is_gem(bp)) { dmacfg = gem_readl(bp, DMACFG) & ~GEM_BF(RXBS, -1L); @@ -1613,22 +1663,11 @@ static void macb_configure_dma(struct macb *bp) dmacfg |= GEM_BIT(TXPBMS) | GEM_BF(RXBMS, -1L); dmacfg &= ~GEM_BIT(ENDIA_PKT); - /* Find the CPU endianness by using the loopback bit of net_ctrl -* register. save it first. When the CPU is in big endian we -* need to program swaped mode for management descriptor access. 
-*/ - ncr = macb_readl(bp, NCR); - __raw_writel(MACB_BIT(LLB), bp->regs + MACB_NCR); - tmp = __raw_readl(bp->regs + MACB_NCR); - - if (tmp == MACB_BIT(LLB)) + if (bp->native_io) dmacfg &= ~GEM_BIT(ENDIA_DESC); else dmacfg |= GEM_BIT(ENDIA_DESC); /* CPU in big endian */ - /* Restore net_ctrl */ - macb_writel(bp, NCR, ncr); - if (bp->dev->features & NETIF_F_HW_CSUM) dmacfg |= GEM_BIT(TXCOEN); else @@ -1902,14 +1941,14 @@ static void gem_update_stats(struct macb *bp) for (i = 0; i < GEM_STATS_LEN; ++i, ++p) { u32 offset = gem_statistics[i].offset; - u64 val = readl_relaxed(bp->regs + offset); + u64 val = bp->readl(bp, offset); bp->ethtool_stats[i] += val; *p += val; if (offset == GEM_OCTTXL || offset == GEM_OCTRXL) { /* Add GEM_OCTTXH, GEM_OCTRXH */ - val = readl_relaxed(bp->regs + offset + 4); + val = bp->readl(bp, offset + 4); bp->ethtool_stats[i] += ((u64)val) << 32; *(++p) += val; } @@ -2190,7 +2229,7 @@ static void macb_configure_caps(struct macb *bp, const struct macb_config
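The patch picks the register accessor once at probe time and then calls it through bp->readl, with bp->native_io recording the result of hw_is_native_io(). A userspace sketch of that dispatch, assuming a 4-byte array laid out little-endian stands in for a device register; the struct and function names are hypothetical, and the loopback-bit detection itself is not modelled (the flag is passed in).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical device: one 32-bit register stored little-endian, plus
 * an accessor pointer chosen once, like bp->readl in the patch. */
struct dev_sketch {
	uint8_t regs[4];
	uint32_t (*readl)(const struct dev_sketch *d);
};

/* Like __raw_readl(): take the bytes in CPU order, no byte swapping. */
static uint32_t readl_native(const struct dev_sketch *d)
{
	uint32_t v;

	memcpy(&v, d->regs, sizeof(v));
	return v;
}

/* Like readl_relaxed() for a little-endian device: always assemble LE. */
static uint32_t readl_le(const struct dev_sketch *d)
{
	return (uint32_t)d->regs[0] | (uint32_t)d->regs[1] << 8 |
	       (uint32_t)d->regs[2] << 16 | (uint32_t)d->regs[3] << 24;
}

/* Probe-time choice; native_io would come from a check such as the
 * patch's hw_is_native_io(), which this sketch does not reproduce. */
static void select_accessor(struct dev_sketch *d, int native_io)
{
	d->readl = native_io ? readl_native : readl_le;
}
```

Every later register access then goes through the pointer, which is how the patch converts the readl_relaxed() calls in macb_update_stats() and gem_update_stats() into bp->readl(bp, offset).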
Re: Several races in "usbnet" module (kernel 4.1.x)
On 21.07.2015 15:04, Oliver Neukum wrote: On Mon, 2015-07-20 at 21:13 +0300, Eugene Shatokhin wrote: Hi, I have recently found several data races in the "usbnet" module, checked on vanilla kernel 4.1.0 on x86_64. The races do actually happen; I have confirmed it by adding delays and using hardware breakpoints to detect the conflicting memory accesses (with the RaceHound tool, https://github.com/winnukem/racehound). I have not yet analyzed how harmful these races are (if they are), but I think it is better to report them anyway. Everything was checked using a YOTA 4G LTE Modem that works via the "usbnet" and "cdc_ether" kernel modules. -- [Race #1] Race on skb_queue ('next' pointer) between usbnet_stop() and rx_complete(). I reproduced it by unplugging the device while the system was downloading a large file from the Net. Here is part of the call stack with the code where the changes to the queue happen: #0 __skb_unlink (skbuff.h:1517) prev->next = next; #1 defer_bh (usbnet.c:430) spin_lock_irqsave(&list->lock, flags); old_state = entry->state; entry->state = state; __skb_unlink(skb, list); spin_unlock(&list->lock); spin_lock(&dev->done.lock); __skb_queue_tail(&dev->done, skb); if (dev->done.qlen == 1) tasklet_schedule(&dev->bh); spin_unlock_irqrestore(&dev->done.lock, flags); #2 rx_complete (usbnet.c:640) state = defer_bh(dev, skb, &dev->rxq, state); At the same time, the following code repeatedly checks if the queue is empty and reads the same values concurrently with the above changes: #0 usbnet_terminate_urbs (usbnet.c:765) /* maybe wait for deletions to finish.
*/ while (!skb_queue_empty(&dev->rxq) && !skb_queue_empty(&dev->txq) && !skb_queue_empty(&dev->done)) { schedule_timeout(msecs_to_jiffies(UNLINK_TIMEOUT_MS)); set_current_state(TASK_UNINTERRUPTIBLE); netif_dbg(dev, ifdown, dev->net, "waited for %d urb completions\n", temp); } #1 usbnet_stop (usbnet.c:806) if (!(info->flags & FLAG_AVOID_UNLINK_URBS)) usbnet_terminate_urbs(dev); For example, it is possible that the skb is removed from dev->rxq by __skb_unlink() before the check "!skb_queue_empty(&dev->rxq)" in usbnet_terminate_urbs() is made. It is also possible in this case that the skb is added to dev->done queue after "!skb_queue_empty(&dev->done)" is checked. So usbnet_terminate_urbs() may stop waiting and return while dev->done queue still has an item. Hi, your analysis is correct and it looks like in addition to your proposed fix locking needs to be simplified and a common lock to be taken. Suggestions? Just an idea, I haven't tested it. How about moving the operations with dev->done under &list->lock in defer_bh, while keeping dev->done.lock too and changing usbnet_terminate_urbs() as described below? 
Like this: @@ -428,12 +428,12 @@ static enum skb_state defer_bh(struct usbnet *dev, struct sk_buff *skb, old_state = entry->state; entry->state = state; __skb_unlink(skb, list); - spin_unlock(&list->lock); spin_lock(&dev->done.lock); __skb_queue_tail(&dev->done, skb); if (dev->done.qlen == 1) tasklet_schedule(&dev->bh); - spin_unlock_irqrestore(&dev->done.lock, flags); + spin_unlock(&dev->done.lock); + spin_unlock_irqrestore(&list->lock, flags); return old_state; } --- usbnet_terminate_urbs() can then be changed as follows: @@ -749,6 +749,20 @@ EXPORT_SYMBOL_GPL(usbnet_unlink_rx_urbs); /*-*/ +static void wait_skb_queue_empty(struct sk_buff_head *q) +{ + unsigned long flags; + + spin_lock_irqsave(&q->lock, flags); + while (!skb_queue_empty(q)) { + spin_unlock_irqrestore(&q->lock, flags); + schedule_timeout(msecs_to_jiffies(UNLINK_TIMEOUT_MS)); + set_current_state(TASK_UNINTERRUPTIBLE); + spin_lock_irqsave(&q->lock, flags); + } + spin_unlock_irqrestore(&q->lock, flags); +} + // precondition: never called in_interrupt static void usbnet_terminate_urbs(struct usbnet *dev) { @@ -762,14 +776,11 @@ static void usbnet_terminate_urbs(struct usbnet *dev) unlink_urbs(dev, &dev->rxq); /* maybe wait for deletions to finish. */ - while (!skb_queue_empty(&dev->rxq) - && !skb_queue_empty(&dev->txq) - && !skb_queue_empty(&dev->done)) { - schedule_timeout(msecs_to_jiffies(UNLINK_TIMEOUT_MS)); - set_current_state(TASK_UNINTERRUPTIBLE); - netif_dbg(dev, ifdown, dev->net, - "waited for %d urb completions\n", temp);
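Separately from the locking question, note what the quoted loop condition says: it keeps waiting only while rxq, txq and done are all non-empty, so it stops as soon as any one of them drains. Waiting on each queue in turn with the proposed wait_skb_queue_empty() (assuming it is called for all three queues; the replacement lines are cut off in this archive) amounts to waiting while any queue is non-empty. The hypothetical predicates below compare the two on queue lengths and deliberately ignore locking.

```c
#include <assert.h>
#include <stdbool.h>

/* Condition as quoted from usbnet_terminate_urbs(): keep waiting only
 * while ALL three queues are non-empty. */
static bool keep_waiting_quoted(unsigned int rxq, unsigned int txq,
				unsigned int done)
{
	return rxq != 0 && txq != 0 && done != 0;
}

/* Effect of waiting for each queue to drain in turn: keep waiting
 * while ANY queue is non-empty. */
static bool keep_waiting_any(unsigned int rxq, unsigned int txq,
			     unsigned int done)
{
	return rxq != 0 || txq != 0 || done != 0;
}
```

With an empty rxq and txq but one skb left on done, the quoted condition already stops waiting, which matches the symptom described in the report.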
Re: [PATCH 2/3] brcmfmac: dhd_sdio.c: use existing atomic_or primitive
On Friday 24 July 2015 08:02 PM, Kalle Valo wrote: > Vineet Gupta writes: > >> > There's already a generic implementation so use that instead. >> > --- >> > I'm not sure if the driver usage of atomic_or?() is correct in terms of >> > storage size of @val for 64 bit arches. >> > >> > Assuming LP64 programming model for linux on say x86_64: atomic_or() >> > callers in this driver use long (sana 64 bit) storage and pass it to >> > atomic_orr/atomic_or which downcasts it to 32 bits. Is that OK ? >> > --- >> > Cc: Brett Rudley >> > Cc: Arend van Spriel >> > Cc: "Franky (Zhenhui) Lin" >> > Cc: Hante Meuleman >> > Cc: Kalle Valo >> > Cc: Pieter-Paul Giesberts >> > Cc: Daniel Kim >> > Cc: linux-wirel...@vger.kernel.org >> > Cc: brcm80211-dev-l...@broadcom.com >> > Cc: Peter Zijlstra >> > Cc: Ingo Molnar >> > Cc: netdev@vger.kernel.org >> > Cc: linux-a...@vger.kernel.org >> > Cc: linux-ker...@vger.kernel.org >> > Signed-off-by: Vineet Gupta >> > >> > Signed-off-by: Vineet Gupta > What's the plan with this patch? Should I take it to my > wireless-drivers-next tree or will someone else take it? Per the last discussion on this topic, Arend wanted to discuss this with Hante. I'm not taking it anyway, so feel free to pick it up if you want! -Vineet
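Vineet's concern is that a 64-bit long holding the flags is narrowed to 32 bits when it is ORed into a 32-bit atomic, so any bits above bit 31 are silently dropped. The sketch shows the effect using C11 <stdatomic.h> rather than the kernel's atomic_or() on an atomic_t, and assumes a 32-bit unsigned int as on the usual LP64 targets; the helper name is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* OR a 64-bit flag word into a 32-bit atomic. The conversion to
 * unsigned int keeps only the low 32 bits (value modulo 2^32). */
static unsigned int or_into_32bit(atomic_uint *target, uint64_t bits)
{
	atomic_fetch_or(target, (unsigned int)bits);   /* high bits lost here */
	return atomic_load(target);
}
```

Whether the truncation is harmless for this driver depends on which bit numbers its callers actually use, which is the question left open in the thread.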
Re: [PATCH 2/3] brcmfmac: dhd_sdio.c: use existing atomic_or primitive
Vineet Gupta writes: > There's already a generic implementation so use that instead. > --- > I'm not sure if the driver usage of atomic_or?() is correct in terms of > storage size of @val for 64 bit arches. > > Assuming LP64 programming model for linux on say x86_64: atomic_or() > callers in this driver use long (sana 64 bit) storage and pass it to > atomic_orr/atomic_or which downcasts it to 32 bits. Is that OK ? > --- > Cc: Brett Rudley > Cc: Arend van Spriel > Cc: "Franky (Zhenhui) Lin" > Cc: Hante Meuleman > Cc: Kalle Valo > Cc: Pieter-Paul Giesberts > Cc: Daniel Kim > Cc: linux-wirel...@vger.kernel.org > Cc: brcm80211-dev-l...@broadcom.com > Cc: Peter Zijlstra > Cc: Ingo Molnar > Cc: netdev@vger.kernel.org > Cc: linux-a...@vger.kernel.org > Cc: linux-ker...@vger.kernel.org > Signed-off-by: Vineet Gupta > > Signed-off-by: Vineet Gupta What's the plan with this patch? Should I take it to my wireless-drivers-next tree or will someone else take it? -- Kalle Valo
Re: [PATCH net] tcp: fix recv with flags MSG_WAITALL | MSG_PEEK
On Fri, 2015-07-24 at 18:19 +0200, Sabrina Dubroca wrote: > Currently, tcp_recvmsg enters a busy loop in sk_wait_data if called > with flags = MSG_WAITALL | MSG_PEEK. > > sk_wait_data waits for sk_receive_queue not empty, but in this case, > the receive queue is not empty, but does not contain any skb that we > can use. > > Add a "last skb seen on receive queue" argument to sk_wait_data, so > that it sleeps until the receive queue has new skbs. > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=99461 > Link: https://sourceware.org/bugzilla/show_bug.cgi?id=18493 > Link: https://bugzilla.redhat.com/show_bug.cgi?id=1205258 > Reported-by: Enrico Scholz > Reported-by: Dan Searle > Signed-off-by: Sabrina Dubroca > --- Very nice ! Acked-by: Eric Dumazet
[PATCH net-next v2 1/2] ipv6: Re-arrange code in rt6_probe()
This is prep work for the next patch, which removes write_lock from rt6_probe(). 1. Reduce the number of if (neigh) checks from 4 to 1. 2. Bring write_(un)lock() closer to the operations that the lock protects. Hopefully, these changes make rt6_probe() more readable. Signed-off-by: Martin KaFai Lau Cc: Hannes Frederic Sowa Cc: Julian Anastasov Cc: YOSHIFUJI Hideaki --- net/ipv6/route.c | 44 1 file changed, 20 insertions(+), 24 deletions(-) diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 7f2214f..6d503db 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -545,6 +545,7 @@ static void rt6_probe_deferred(struct work_struct *w) static void rt6_probe(struct rt6_info *rt) { + struct __rt6_probe_work *work; struct neighbour *neigh; /* * Okay, this does not seem to be appropriate @@ -559,34 +560,29 @@ static void rt6_probe(struct rt6_info *rt) rcu_read_lock_bh(); neigh = __ipv6_neigh_lookup_noref(rt->dst.dev, &rt->rt6i_gateway); if (neigh) { + work = NULL; write_lock(&neigh->lock); - if (neigh->nud_state & NUD_VALID) - goto out; - } - - if (!neigh || - time_after(jiffies, neigh->updated + rt->rt6i_idev->cnf.rtr_probe_interval)) { - struct __rt6_probe_work *work; - - work = kmalloc(sizeof(*work), GFP_ATOMIC); - - if (neigh && work) - __neigh_set_probe_once(neigh); - - if (neigh) - write_unlock(&neigh->lock); - - if (work) { - INIT_WORK(&work->work, rt6_probe_deferred); - work->target = rt->rt6i_gateway; - dev_hold(rt->dst.dev); - work->dev = rt->dst.dev; - schedule_work(&work->work); + if (!(neigh->nud_state & NUD_VALID) && + time_after(jiffies, + neigh->updated + + rt->rt6i_idev->cnf.rtr_probe_interval)) { + work = kmalloc(sizeof(*work), GFP_ATOMIC); + if (work) + __neigh_set_probe_once(neigh); } - } else { -out: write_unlock(&neigh->lock); + } else { + work = kmalloc(sizeof(*work), GFP_ATOMIC); + } + + if (work) { + INIT_WORK(&work->work, rt6_probe_deferred); + work->target = rt->rt6i_gateway; + dev_hold(rt->dst.dev); + work->dev = rt->dst.dev; +
schedule_work(&work->work); } + rcu_read_unlock_bh(); } #else -- 1.8.1
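After the rearrangement, the decision in rt6_probe() reads: with no neighbour entry, always try to schedule a probe; otherwise probe only if the entry is not NUD_VALID and rtr_probe_interval has elapsed since neigh->updated. The sketch condenses that into a hypothetical predicate (ignoring kmalloc() failure and the lock) and reproduces the wrap-safe comparison used by the kernel's time_after(a, b) macro, minus its typecheck.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Wrap-safe "a is after b" on an unsigned tick counter, as in the
 * kernel's time_after(). The unsigned-to-long conversion relies on
 * two's-complement behaviour, as the kernel does. */
static bool time_after_sketch(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Hypothetical condensation of the reworked rt6_probe() decision. */
static bool should_probe(bool have_neigh, bool nud_valid,
			 unsigned long now, unsigned long updated,
			 unsigned long interval)
{
	if (!have_neigh)
		return true;                    /* no entry: always probe */
	return !nud_valid && time_after_sketch(now, updated + interval);
}
```

The subtraction form keeps working when jiffies wraps: a tick count of 5 still counts as "after" ULONG_MAX - 5.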
[PATCH net-next v2 0/2] ipv6: Avoid rt6_probe() taking writer lock in the fast path
v1 -> v2:
1. Separate the code re-arrangement into another patch
2. Fix style
[PATCH net-next v2 2/2] ipv6: Avoid rt6_probe() taking writer lock in the fast path
This patch checks neigh->nud_state before acquiring the writer lock. Note that rt6_probe() is only used when CONFIG_IPV6_ROUTER_PREF is enabled. Test setup: 40 udpflood processes and a /64 gateway route; the gateway is in NUD_PERMANENT state. Each run lasts 30s. Total number of completed sendto() calls at the end: Before: 55M After: 95M Signed-off-by: Martin KaFai Lau Cc: Hannes Frederic Sowa CC: Julian Anastasov CC: YOSHIFUJI Hideaki --- net/ipv6/route.c | 4 1 file changed, 4 insertions(+) diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 6d503db..76dcff8 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -560,6 +560,9 @@ static void rt6_probe(struct rt6_info *rt) rcu_read_lock_bh(); neigh = __ipv6_neigh_lookup_noref(rt->dst.dev, &rt->rt6i_gateway); if (neigh) { + if (neigh->nud_state & NUD_VALID) + goto out; + work = NULL; write_lock(&neigh->lock); if (!(neigh->nud_state & NUD_VALID) && @@ -583,6 +586,7 @@ static void rt6_probe(struct rt6_info *rt) schedule_work(&work->work); } +out: rcu_read_unlock_bh(); } #else -- 1.8.1
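The change is a check-then-recheck pattern: an unlocked read of nud_state skips the writer lock in the common NUD_VALID case, and the existing test under write_lock() still decides in the rare slow path. A toy C11 model follows, assuming an atomic_flag spinlock in place of neigh->lock, a made-up NUD_VALID_SKETCH value, and a counter that exists only to show when the slow path runs; none of these are kernel definitions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NUD_VALID_SKETCH 0x1u        /* hypothetical flag value */

struct neigh_sketch {
	atomic_uint nud_state;
	atomic_flag lock;            /* toy spinlock standing in for neigh->lock */
	unsigned int lock_taken;     /* slow-path counter, for illustration */
};

static void neigh_sketch_init(struct neigh_sketch *n, unsigned int state)
{
	atomic_init(&n->nud_state, state);
	atomic_flag_clear(&n->lock);
	n->lock_taken = 0;
}

/* Returns true if a probe would be scheduled. */
static bool probe_sketch(struct neigh_sketch *n)
{
	bool probe;

	/* Fast path: unlocked read; a valid entry needs no probe and no lock. */
	if (atomic_load_explicit(&n->nud_state, memory_order_relaxed) &
	    NUD_VALID_SKETCH)
		return false;

	while (atomic_flag_test_and_set_explicit(&n->lock, memory_order_acquire))
		;                    /* spin until the lock is free */
	n->lock_taken++;
	/* Re-check under the lock: the state may have changed meanwhile. */
	probe = !(atomic_load_explicit(&n->nud_state, memory_order_relaxed) &
		  NUD_VALID_SKETCH);
	atomic_flag_clear_explicit(&n->lock, memory_order_release);
	return probe;
}
```

The recheck is what keeps the unlocked read safe: a stale "not valid" answer only costs a trip into the slow path, where the locked test gives the authoritative result.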
Re: [PATCH net-next v2] route: allow to route in a peer netns via lwt framework
On Fri, Jul 24, 2015 at 05:39:57PM +0200, Eric Dumazet wrote: > > On Fri, 2015-07-24 at 16:16 +0200, Nicolas Dichtel wrote: > > This patch takes advantage of the newly added lwtunnel framework to > > allow the user to set routes that point to a peer netns. > > > > Packets are injected to the peer netns via the loopback device. It works > > only when the output device is 'lo'. > > > > Example: > > ip route add 40.1.1.1/32 encap netns nsid 5 via dev lo > > > > Is this feature so badly wanted to add complexity on lo device ? ... > > static netdev_tx_t loopback_xmit(struct sk_buff *skb, > > struct net_device *dev) ... > > + if (nsid != NETNSA_NSID_NOT_ASSIGNED) { > > + peernet = get_net_ns_by_id(dev_net(dev), nsid); > > + if (!peernet) { > > + kfree_skb(skb); > > + goto end; > > + } > > + > > + /* it's OK to use per_cpu_ptr() because BHs are off */ > > + lb_stats = this_cpu_ptr(peernet->loopback_dev->lstats); > > + ret = dev_forward_skb(peernet->loopback_dev, skb); I have the same concern as Eric. Using loopback for this looks wrong. A netns is supposed to look like a host, but I cannot imagine a host without NICs seeing packets on loopback from another world. Then how is the opposite direction supposed to work? Will the netns set up a route to send packets to the loopback of the host?! The idea of using routing to forward packets to namespaces is great, but I think we need something other than loopback.
[PATCH net] tcp: fix recv with flags MSG_WAITALL | MSG_PEEK
Currently, tcp_recvmsg enters a busy loop in sk_wait_data if called with flags = MSG_WAITALL | MSG_PEEK. sk_wait_data waits for sk_receive_queue to become non-empty, but in this case the receive queue is not empty yet does not contain any skb that we can use. Add a "last skb seen on receive queue" argument to sk_wait_data, so that it sleeps until the receive queue has new skbs. Link: https://bugzilla.kernel.org/show_bug.cgi?id=99461 Link: https://sourceware.org/bugzilla/show_bug.cgi?id=18493 Link: https://bugzilla.redhat.com/show_bug.cgi?id=1205258 Reported-by: Enrico Scholz Reported-by: Dan Searle Signed-off-by: Sabrina Dubroca --- include/net/sock.h | 2 +- net/core/sock.c    | 5 +++-- net/dccp/proto.c | 2 +- net/ipv4/tcp.c | 11 +++ net/llc/af_llc.c | 4 ++-- 5 files changed, 14 insertions(+), 10 deletions(-) diff --git a/include/net/sock.h b/include/net/sock.h index 05a8c1aea251..f21f0708ec59 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -902,7 +902,7 @@ void sk_stream_kill_queues(struct sock *sk); void sk_set_memalloc(struct sock *sk); void sk_clear_memalloc(struct sock *sk); -int sk_wait_data(struct sock *sk, long *timeo); +int sk_wait_data(struct sock *sk, long *timeo, const struct sk_buff *skb); struct request_sock_ops; struct timewait_sock_ops; diff --git a/net/core/sock.c b/net/core/sock.c index 08f16db46070..8a14f1285fc4 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -1967,20 +1967,21 @@ static void __release_sock(struct sock *sk) * sk_wait_data - wait for data to arrive at sk_receive_queue * @sk: sock to wait on * @timeo: for how long + * @skb: last skb seen on sk_receive_queue * * Now socket state including sk->sk_err is changed only under lock, * hence we may omit checks after joining wait queue. * We check receive queue before schedule() only as optimization; * it is very likely that release_sock() added new data.
*/ -int sk_wait_data(struct sock *sk, long *timeo) +int sk_wait_data(struct sock *sk, long *timeo, const struct sk_buff *skb) { int rc; DEFINE_WAIT(wait); prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE); set_bit(SOCK_ASYNC_WAITDATA, &sk->sk_socket->flags); - rc = sk_wait_event(sk, timeo, !skb_queue_empty(&sk->sk_receive_queue)); + rc = sk_wait_event(sk, timeo, skb_peek_tail(&sk->sk_receive_queue) != skb); clear_bit(SOCK_ASYNC_WAITDATA, &sk->sk_socket->flags); finish_wait(sk_sleep(sk), &wait); return rc; diff --git a/net/dccp/proto.c b/net/dccp/proto.c index 52a94016526d..b5cf13a28009 100644 --- a/net/dccp/proto.c +++ b/net/dccp/proto.c @@ -886,7 +886,7 @@ verify_sock_status: break; } - sk_wait_data(sk, &timeo); + sk_wait_data(sk, &timeo, NULL); continue; found_ok_skb: if (len > skb->len) diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 7f4056785acc..45534a5ab430 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -780,7 +780,7 @@ ssize_t tcp_splice_read(struct socket *sock, loff_t *ppos, ret = -EAGAIN; break; } - sk_wait_data(sk, &timeo); + sk_wait_data(sk, &timeo, NULL); if (signal_pending(current)) { ret = sock_intr_errno(timeo); break; @@ -1575,7 +1575,7 @@ int tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int nonblock, int target; /* Read at least this many bytes */ long timeo; struct task_struct *user_recv = NULL; - struct sk_buff *skb; + struct sk_buff *skb, *last; u32 urg_hole = 0; if (unlikely(flags & MSG_ERRQUEUE)) @@ -1635,7 +1635,9 @@ int tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int nonblock, /* Next get a buffer. */ + last = skb_peek_tail(&sk->sk_receive_queue); skb_queue_walk(&sk->sk_receive_queue, skb) { + last = skb; /* Now that we have two receive queues this * shouldn't happen. */ @@ -1754,8 +1756,9 @@ int tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int nonblock, /* Do not sleep, just process backlog. 
*/ release_sock(sk); lock_sock(sk); - } else - sk_wait_data(sk, &timeo); + } else { + sk_wait_data(sk, &timeo, last); + } if (user_recv) { int chunk; diff --git a/net/llc/af_llc.c b/net/llc/af_llc.c index 8fd9febaa5ba..8dab4e569571 100644 --- a/net/llc/af_llc.c +++ b/net/llc/af_llc.c @@ -613,7 +613,7 @@ static int llc_wait_data(stru
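The busy loop can be modelled in userspace with a toy queue. This is a hypothetical sketch (toy_queue, wake_old(), wake_new() and spurious_wakeups() are invented names, not kernel APIs) that only mirrors the two wake-up predicates in the diff: the old one, !skb_queue_empty(), is already true when the queue holds data that a MSG_PEEK reader has walked past, so the reader never sleeps; the new one, skb_peek_tail() != last, is true only when something was appended after the last skb the reader saw.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct sk_buff and sk_receive_queue. */
struct toy_skb { int len; };

struct toy_queue {
	struct toy_skb *items[16];
	size_t count;
};

/* Mirrors skb_peek_tail(): last element, or NULL when the queue is empty. */
static const struct toy_skb *toy_peek_tail(const struct toy_queue *q)
{
	return q->count ? q->items[q->count - 1] : NULL;
}

/* Old predicate: wake as soon as the queue is non-empty. */
static bool wake_old(const struct toy_queue *q)
{
	return q->count != 0;
}

/* New predicate: wake only if the tail differs from the last skb seen. */
static bool wake_new(const struct toy_queue *q, const struct toy_skb *last)
{
	return toy_peek_tail(q) != last;
}

/*
 * Count how often a MSG_PEEK | MSG_WAITALL reader is woken without
 * progress (bounded by max_iters) when the queue holds too few bytes
 * and nothing new arrives.
 */
static int spurious_wakeups(const struct toy_queue *q, bool use_new, int max_iters)
{
	int wakeups = 0;

	for (int i = 0; i < max_iters; i++) {
		/* The peek walk visits every queued skb; remember the tail. */
		const struct toy_skb *last = toy_peek_tail(q);
		bool woke = use_new ? wake_new(q, last) : wake_old(q);

		if (!woke)
			break;	/* the reader would sleep here */
		wakeups++;
	}
	return wakeups;
}
```

With the old predicate the loop runs until max_iters (the busy loop); with the new one the reader sleeps immediately and is only woken once a new skb becomes the tail.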
[PATCH] ip/ip6tunnel: fix missing return value check
Make sure that the return value of each socket() call is properly checked, and do not continue processing if the call fails.

Signed-off-by: Zhang Shengju
---
 ip/tunnel.c | 24 
 1 file changed, 24 insertions(+)

diff --git a/ip/tunnel.c b/ip/tunnel.c
index 33c78e3..d69fe84 100644
--- a/ip/tunnel.c
+++ b/ip/tunnel.c
@@ -73,7 +73,13 @@ int tnl_get_ioctl(const char *basedev, void *p)
 	strncpy(ifr.ifr_name, basedev, IFNAMSIZ);
 	ifr.ifr_ifru.ifru_data = (void*)p;
+
 	fd = socket(preferred_family, SOCK_DGRAM, 0);
+	if (fd < 0) {
+		fprintf(stderr, "create socket failed: %s\n", strerror(errno));
+		return -1;
+	}
+
 	err = ioctl(fd, SIOCGETTUNNEL, &ifr);
 	if (err)
 		fprintf(stderr, "get tunnel \"%s\" failed: %s\n", basedev,
@@ -94,7 +100,13 @@ int tnl_add_ioctl(int cmd, const char *basedev, const char *name, void *p)
 	else
 		strncpy(ifr.ifr_name, basedev, IFNAMSIZ);
 	ifr.ifr_ifru.ifru_data = p;
+
 	fd = socket(preferred_family, SOCK_DGRAM, 0);
+	if (fd < 0) {
+		fprintf(stderr, "create socket failed: %s\n", strerror(errno));
+		return -1;
+	}
+
 	err = ioctl(fd, cmd, &ifr);
 	if (err)
 		fprintf(stderr, "add tunnel \"%s\" failed: %s\n", ifr.ifr_name,
@@ -115,7 +127,13 @@ int tnl_del_ioctl(const char *basedev, const char *name, void *p)
 	strncpy(ifr.ifr_name, basedev, IFNAMSIZ);
 	ifr.ifr_ifru.ifru_data = p;
+
 	fd = socket(preferred_family, SOCK_DGRAM, 0);
+	if (fd < 0) {
+		fprintf(stderr, "create socket failed: %s\n", strerror(errno));
+		return -1;
+	}
+
 	err = ioctl(fd, SIOCDELTUNNEL, &ifr);
 	if (err)
 		fprintf(stderr, "delete tunnel \"%s\" failed: %s\n",
@@ -133,7 +151,13 @@ static int tnl_gen_ioctl(int cmd, const char *name,
 	strncpy(ifr.ifr_name, name, IFNAMSIZ);
 	ifr.ifr_ifru.ifru_data = p;
+
 	fd = socket(preferred_family, SOCK_DGRAM, 0);
+	if (fd < 0) {
+		fprintf(stderr, "create socket failed: %s\n", strerror(errno));
+		return -1;
+	}
+
 	err = ioctl(fd, cmd, &ifr);
 	if (err && errno != skiperr)
 		fprintf(stderr, "%s: ioctl %x failed: %s\n", name,
-- 
1.8.3.1
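The same five-line check is repeated in all four helpers. A minimal userspace sketch of one way it could be factored out, assuming a hypothetical helper named open_ctl_socket() (it is not part of iproute2) and that each caller still closes the descriptor after its ioctl():

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: open the datagram socket used for tunnel ioctls,
 * report a failure with the same message as the patch, and return -1 so
 * that the caller can bail out before calling ioctl() on a bad fd.
 */
static int open_ctl_socket(int family)
{
	int fd = socket(family, SOCK_DGRAM, 0);

	if (fd < 0) {
		fprintf(stderr, "create socket failed: %s\n", strerror(errno));
		return -1;
	}
	return fd;
}
```

Each tnl_*_ioctl() would then reduce to `fd = open_ctl_socket(preferred_family); if (fd < 0) return -1;` followed by the ioctl() and close(fd).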
RE: [PATCH 03/10] dpaa_eth: add configurable bpool thresholds
> -----Original Message-----
> From: Joe Perches [mailto:j...@perches.com]
> On Wed, 2015-07-22 at 19:16 +0300, Madalin Bucur wrote:
> > Allow the user to tweak the refill threshold and the total number
> > of buffers in the buffer pool. The provided values are for one CPU.
>
> Any value in making these module parameters instead?

I expect one would change these (hardly ever) to improve some corner cases and then keep using the new values. Module parameters may help during tuning, but afterwards the extra bootcmd arguments would probably be a nuisance.

> > +config FSL_DPAA_ETH_MAX_BUF_COUNT
> > +	int "Maximum number of buffers in private bpool"
> > +	range 64 2048
> > +	default "128"
> > +	---help---
> > +	  The maximum number of buffers to be by default allocated in the DPAA-Ethernet private port's
> > +	  buffer pool. One needn't normally modify this, as it has probably been tuned for performance
> > +	  already. This cannot be lower than DPAA_ETH_REFILL_THRESHOLD.
> > +
> > +config FSL_DPAA_ETH_REFILL_THRESHOLD
> > +	int "Private bpool refill threshold"
> > +	range 32 FSL_DPAA_ETH_MAX_BUF_COUNT
> > +	default "80"
> > +	---help---
> > +	  The DPAA-Ethernet driver will start replenishing buffer pools whose count
> > +	  falls below this threshold. This must be related to DPAA_ETH_MAX_BUF_COUNT. One needn't normally
> > +	  modify this value unless one has very specific performance reasons.
> > +
> >  config FSL_DPAA_CS_THRESHOLD_1G
> >  	hex "Egress congestion threshold on 1G ports"
> >  	range 0x1000 0x1000
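The relationship between the two Kconfig values can be illustrated with a small model of the refill decision. This is a hypothetical sketch: bpool_refill_count() is an invented name, and it assumes, following the help text and the "brought back to this level" comment on target_count, that a per-CPU pool is topped back up to the maximum once its count drops below the threshold.

```c
#include <assert.h>

/* Defaults taken from the Kconfig entries above. */
#define MAX_BUF_COUNT    128
#define REFILL_THRESHOLD 80

/*
 * Hypothetical helper: number of buffers to add to a per-CPU pool that
 * currently holds 'count' buffers. Nothing is added while the count is
 * at or above the threshold; below it, the pool is refilled to 'max'.
 */
static int bpool_refill_count(int count, int threshold, int max)
{
	assert(threshold <= max);	/* the Kconfig range enforces this */
	if (count >= threshold)
		return 0;
	return max - count;
}
```

Keeping the threshold well below the maximum means refills happen in batches rather than on every consumed buffer.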
Fw: [Bug 99461] recvfrom SYSCALL infinite loop/deadlock chewing 100% CPU [was __libc_recv (fd=fd@entry=300, buf=buf@entry=0x7f6042880600, n=n@entry=5, flags=-1, flags@entry=258) at ../sysdeps/unix/sys
Begin forwarded message:

Date: Fri, 24 Jul 2015 11:22:17 +
From: "bugzilla-dae...@bugzilla.kernel.org"
To: "shemmin...@linux-foundation.org"
Subject: [Bug 99461] recvfrom SYSCALL infinite loop/deadlock chewing 100% CPU [was __libc_recv (fd=fd@entry=300, buf=buf@entry=0x7f6042880600, n=n@entry=5, flags=-1, flags@entry=258) at ../sysdeps/unix/sysv/linux/x86_64/recv.c:33]

https://bugzilla.kernel.org/show_bug.cgi?id=99461

--- Comment #4 from Dan Searle ---
Is there anyone working on a fix for this bug? Is there any way a fix can be expedited?

-- 
You are receiving this mail because:
You are the assignee for the bug.
RE: [PATCH 02/10] dpaa_eth: add support for DPAA Ethernet
> -----Original Message-----
> From: Joe Perches [mailto:j...@perches.com]
> On Wed, 2015-07-22 at 19:16 +0300, Madalin Bucur wrote:
> > This introduces the Freescale Data Path Acceleration Architecture
> > (DPAA) Ethernet driver (dpaa_eth) that builds upon the DPAA QMan,
> > BMan, PAMU and FMan drivers to deliver Ethernet connectivity on
> > the Freescale DPAA QorIQ platforms.
>
> trivia:
>
> > +static void __hot _dpa_tx_conf(struct net_device *net_dev,
> > +			  const struct dpa_priv_s *priv,
> > +			  struct dpa_percpu_priv_s *percpu_priv,
> > +			  const struct qm_fd *fd,
> > +			  u32 fqid)
> > +{
> []
> > +static struct dpa_bp * __cold
> > +dpa_priv_bp_probe(struct device *dev)
>
> Do the __hot and __cold markings really matter?
> Some of them may be questionable.

Some may be, yes. I need to go through all of them.

> > +static int __init dpa_load(void)
> > +{
> []
> > +	err = platform_driver_register(&dpa_driver);
> > +	if (unlikely(err < 0)) {
> > +		pr_err(KBUILD_MODNAME
> > +			": %s:%hu:%s(): platform_driver_register() = %d\n",
> > +			KBUILD_BASENAME ".c", __LINE__, __func__, err);
> > +	}
> > +
> > +	pr_debug(KBUILD_MODNAME ": %s:%s() ->\n",
> > +		 KBUILD_BASENAME ".c", __func__);
>
> Perhaps these should use pr_fmt

Agree.

> > +static void __exit dpa_unload(void)
> > +{
> > +	pr_debug(KBUILD_MODNAME ": -> %s:%s()\n",
> > +		 KBUILD_BASENAME ".c", __func__);
>
> dynamic debug has __func__ available and perhaps
> the function tracer might be used instead.
>
> > diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h
> []
> > +#define __hot
>
> curious.
>
> Maybe it'd be good to add a real __hot to compiler.h

They're mostly there to make readers aware that the code is performance-critical; any change there could hurt performance.

> > +struct dpa_buffer_layout_s {
> > +	u16	priv_data_size;
> > +	bool	parse_results;
> > +	bool	time_stamp;
> > +	bool	hash_results;
> > +	u16	data_align;
> > +};
>
> > +struct dpa_fq {
> > +	struct qman_fq		 fq_base;
> > +	struct list_head	 list;
> > +	struct net_device	*net_dev;
>
> some inconsistent indentation here and there

Yes, I've tried to make the style consistent, but given the many editors the code has had over time, some areas are still out of sync.

> > +struct dpa_bp {
> > +	struct bman_pool	*pool;
> > +	u8			bpid;
> > +	struct device		*dev;
> > +	union {
> > +		/* The buffer pools used for the private ports are initialized
> > +		 * with target_count buffers for each CPU; at runtime the
> > +		 * number of buffers per CPU is constantly brought back to this
> > +		 * level
> > +		 */
> > +		int target_count;
> > +		/* The configured value for the number of buffers in the pool,
> > +		 * used for shared port buffer pools
> > +		 */
> > +		int config_count;
> > +	};
>
> Anonymous unions are relatively rare

We liked the direct access to the members, but in this particular case the use is a bit excessive; we can do without it.

> > +	struct {
> > +		/**
>
> Maybe the /** style should be avoided

Will fix.

> > +		 * All egress queues to a given net device belong to one
> > +		 * (and the same) congestion group.
> > +		 */
> > +		struct qman_cgr cgr;
> > +	} cgr_data;
> []
>
> > +int dpa_stop(struct net_device *net_dev)
> > +{
> []
> > +	err = mac_dev->stop(mac_dev);
> > +	if (unlikely(err < 0))
> > +		netif_err(priv, ifdown, net_dev, "mac_dev->stop() = %d\n",
> > +			  err);
>
> Some of the likely/unlikely uses may not
> be useful/necessary.

In this particular case it's gratuitous; I'll go through all of them.

> > +
> > +	for_each_port_device(i, mac_dev->port_dev) {
> > +		error = fm_port_disable(
> > +			fm_port_drv_handle(mac_dev->port_dev[i]));
> > +		err = error ? error : err;
>
> 	if (error)
> 		err = error;
>
> is more obvious to me.

Yes, it's more readable.

Thank you,
Madalin
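Joe's suggested form keeps every port being disabled while making it obvious that a later success never overwrites an earlier failure. A small userspace sketch of the pattern, where disable_port() is a hypothetical stand-in for fm_port_disable() that simply returns a preset error code per port:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for fm_port_disable(): returns 0 or a -errno. */
static int disable_port(const int *port_err, size_t i)
{
	return port_err[i];
}

/*
 * Disable every port, even after a failure, and return the last
 * non-zero error (0 if all ports were disabled successfully).
 */
static int disable_all_ports(const int *port_err, size_t nports, size_t *attempted)
{
	int err = 0;

	*attempted = 0;
	for (size_t i = 0; i < nports; i++) {
		int error = disable_port(port_err, i);

		(*attempted)++;
		if (error)	/* same effect as: err = error ? error : err; */
			err = error;
	}
	return err;
}
```

Both forms behave identically; the explicit `if` simply states the intent ("remember a failure") without the self-assignment in the ternary.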
[PATCH net-next] hv_netvsc: Add structs and handlers for VF messages
This patch adds data structures and handlers for messages related to SR-IOV Virtual Functions.

Signed-off-by: Haiyang Zhang
Reviewed-by: K. Y. Srinivasan
---
 drivers/net/hyperv/hyperv_net.h | 29 ++
 drivers/net/hyperv/netvsc.c     | 43 +-
 2 files changed, 62 insertions(+), 10 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index 26cd14c..f225d1f 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -541,6 +541,29 @@ union nvsp_2_message_uber {
 	struct nvsp_2_free_rxbuf free_rxbuf;
 } __packed;
 
+struct nvsp_4_send_vf_association {
+	/* 1: allocated, serial number is valid. 0: not allocated */
+	u32 allocated;
+
+	/* Serial number of the VF to team with */
+	u32 serial;
+} __packed;
+
+enum nvsp_vm_datapath {
+	NVSP_DATAPATH_SYNTHETIC = 0,
+	NVSP_DATAPATH_VF,
+	NVSP_DATAPATH_MAX
+};
+
+struct nvsp_4_sw_datapath {
+	u32 active_datapath; /* active data path in VM */
+} __packed;
+
+union nvsp_4_message_uber {
+	struct nvsp_4_send_vf_association vf_assoc;
+	struct nvsp_4_sw_datapath active_dp;
+} __packed;
+
 enum nvsp_subchannel_operation {
 	NVSP_SUBCHANNEL_NONE = 0,
 	NVSP_SUBCHANNEL_ALLOCATE,
@@ -578,6 +601,7 @@ union nvsp_all_messages {
 	union nvsp_message_init_uber init_msg;
 	union nvsp_1_message_uber v1_msg;
 	union nvsp_2_message_uber v2_msg;
+	union nvsp_4_message_uber v4_msg;
 	union nvsp_5_message_uber v5_msg;
 } __packed;
 
@@ -689,6 +713,11 @@ struct netvsc_device {
 
 	/* The net device context */
 	struct net_device_context *nd_ctx;
+
+	/* 1: allocated, serial number is valid. 0: not allocated */
+	u32 vf_alloc;
+	/* Serial number of the VF to team with */
+	u32 vf_serial;
 };
 
 /* NdisInitialize message */
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 23126a7..51e4c0f 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -453,13 +453,16 @@ static int negotiate_nvsp_ver(struct hv_device *device,
 	if (nvsp_ver == NVSP_PROTOCOL_VERSION_1)
 		return 0;
 
-	/* NVSPv2 only: Send NDIS config */
+	/* NVSPv2 or later: Send NDIS config */
 	memset(init_packet, 0, sizeof(struct nvsp_message));
 	init_packet->hdr.msg_type = NVSP_MSG2_TYPE_SEND_NDIS_CONFIG;
 	init_packet->msg.v2_msg.send_ndis_config.mtu = net_device->ndev->mtu +
 						       ETH_HLEN;
 	init_packet->msg.v2_msg.send_ndis_config.capability.ieee8021q = 1;
 
+	if (nvsp_ver >= NVSP_PROTOCOL_VERSION_5)
+		init_packet->msg.v2_msg.send_ndis_config.capability.sriov = 1;
+
 	ret = vmbus_sendpacket(device->channel, init_packet,
 				sizeof(struct nvsp_message),
 				(unsigned long)init_packet,
@@ -1064,11 +1067,10 @@ static void netvsc_receive(struct netvsc_device *net_device,
 
 static void netvsc_send_table(struct hv_device *hdev,
-			      struct vmpacket_descriptor *vmpkt)
+			      struct nvsp_message *nvmsg)
 {
 	struct netvsc_device *nvscdev;
 	struct net_device *ndev;
-	struct nvsp_message *nvmsg;
 	int i;
 	u32 count, *tab;
 
@@ -1077,12 +1079,6 @@ static void netvsc_send_table(struct hv_device *hdev,
 		return;
 	ndev = nvscdev->ndev;
 
-	nvmsg = (struct nvsp_message *)((unsigned long)vmpkt +
-					(vmpkt->offset8 << 3));
-
-	if (nvmsg->hdr.msg_type != NVSP_MSG5_TYPE_SEND_INDIRECTION_TABLE)
-		return;
-
 	count = nvmsg->msg.v5_msg.send_table.count;
 	if (count != VRSS_SEND_TAB_SIZE) {
 		netdev_err(ndev, "Received wrong send-table size:%u\n", count);
@@ -1096,6 +1092,28 @@ static void netvsc_send_table(struct hv_device *hdev,
 		nvscdev->send_table[i] = tab[i];
 }
 
+static void netvsc_send_vf(struct netvsc_device *nvdev,
+			   struct nvsp_message *nvmsg)
+{
+	nvdev->vf_alloc = nvmsg->msg.v4_msg.vf_assoc.allocated;
+	nvdev->vf_serial = nvmsg->msg.v4_msg.vf_assoc.serial;
+}
+
+static inline void netvsc_receive_inband(struct hv_device *hdev,
+					 struct netvsc_device *nvdev,
+					 struct nvsp_message *nvmsg)
+{
+	switch (nvmsg->hdr.msg_type) {
+	case NVSP_MSG5_TYPE_SEND_INDIRECTION_TABLE:
+		netvsc_send_table(hdev, nvmsg);
+		break;
+
+	case NVSP_MSG4_TYPE_SEND_VF_ASSOCIATION:
+		netvsc_send_vf(nvdev, nvmsg);
+		break;
+	}
+}
+
 void netvsc_channel_cb(void *context)
 {
 	int ret;
@@ -1108,6 +1126,7 @@ void netvsc_channel_cb(void *context)
 	unsigned char
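The new netvsc_receive_inband() moves the message-type check out of netvsc_send_table() and into a single switch over the union-tagged message. A self-contained userspace model of that dispatch; all type names and message-type values below are hypothetical simplifications, not the real NVSP constants or structures:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical message-type values; not the real NVSP numbering. */
enum toy_msg_type {
	TOY_MSG4_SEND_VF_ASSOCIATION = 1,
	TOY_MSG5_SEND_INDIRECTION_TABLE = 2,
};

struct toy_vf_assoc { uint32_t allocated; uint32_t serial; };

/* Header type selects which union member is valid. */
struct toy_message {
	uint32_t msg_type;
	union {
		struct toy_vf_assoc vf_assoc;
		uint32_t table_count;
	} msg;
};

struct toy_device {
	uint32_t vf_alloc;
	uint32_t vf_serial;
	int tables_seen;
};

static void toy_send_vf(struct toy_device *dev, const struct toy_message *m)
{
	dev->vf_alloc = m->msg.vf_assoc.allocated;
	dev->vf_serial = m->msg.vf_assoc.serial;
}

static void toy_send_table(struct toy_device *dev, const struct toy_message *m)
{
	(void)m;	/* the real handler copies the indirection table */
	dev->tables_seen++;
}

/* One switch picks the handler; unknown types are silently ignored. */
static void toy_receive_inband(struct toy_device *dev, const struct toy_message *m)
{
	switch (m->msg_type) {
	case TOY_MSG5_SEND_INDIRECTION_TABLE:
		toy_send_table(dev, m);
		break;
	case TOY_MSG4_SEND_VF_ASSOCIATION:
		toy_send_vf(dev, m);
		break;
	}
}
```

Because the handlers no longer re-check the type themselves, each one can assume the union member it reads is the valid one.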
Re: [PATCH net-next v2] route: allow to route in a peer netns via lwt framework
On Fri, 2015-07-24 at 16:16 +0200, Nicolas Dichtel wrote:
> This patch takes advantage of the newly added lwtunnel framework to
> allow the user to set routes that point to a peer netns.
>
> Packets are injected to the peer netns via the loopback device. It works
> only when the output device is 'lo'.
>
> Example:
> ip route add 40.1.1.1/32 encap netns nsid 5 via dev lo
>

Is this feature wanted badly enough to justify adding complexity to the lo device?

> Signed-off-by: Nicolas Dichtel
> ---
>
> v2: rework loopback handling part (update stats and call skb_dst_force())
>     fix ipv6 processing
>     check lwtunnel type before converting data to a nsid
>
>  drivers/net/loopback.c        | 33 +--
>  include/net/lwtunnel.h        | 27 ++
>  include/uapi/linux/lwtunnel.h |  1 +
>  net/core/net_namespace.c      | 52 +++
>  net/ipv6/route.c              |  9 ++--
>  5 files changed, 113 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
> index c76283c2f84a..4358256ff94e 100644
> --- a/drivers/net/loopback.c
> +++ b/drivers/net/loopback.c
> @@ -57,6 +57,7 @@
>  #include
>  #include
>  #include
> +#include
>
>  struct pcpu_lstats {
>  	u64			packets;
> @@ -71,29 +72,47 @@ struct pcpu_lstats {
>  static netdev_tx_t loopback_xmit(struct sk_buff *skb,
>  				 struct net_device *dev)
>  {
> +	int nsid = skb_lwt_netns_info(skb);
>  	struct pcpu_lstats *lb_stats;
> -	int len;
> -
> -	skb_orphan(skb);
> +	struct net *peernet = NULL;
> +	int len, ret;
>
>  	/* Before queueing this packet to netif_rx(),
>  	 * make sure dst is refcounted.
>  	 */
>  	skb_dst_force(skb);
>
> -	skb->protocol = eth_type_trans(skb, dev);
> +	if (nsid != NETNSA_NSID_NOT_ASSIGNED) {
> +		peernet = get_net_ns_by_id(dev_net(dev), nsid);
> +		if (!peernet) {
> +			kfree_skb(skb);
> +			goto end;
> +		}
> +
> +		/* it's OK to use per_cpu_ptr() because BHs are off */
> +		lb_stats = this_cpu_ptr(peernet->loopback_dev->lstats);
> +		ret = dev_forward_skb(peernet->loopback_dev, skb);
> +	} else {
> +		skb_orphan(skb);
>
> -	/* it's OK to use per_cpu_ptr() because BHs are off */
> -	lb_stats = this_cpu_ptr(dev->lstats);
> +		skb->protocol = eth_type_trans(skb, dev);
> +
> +		/* it's OK to use per_cpu_ptr() because BHs are off */
> +		lb_stats = this_cpu_ptr(dev->lstats);
> +		ret = netif_rx(skb);
> +	}
>
>  	len = skb->len;

At this point you can no longer access skb.

> -	if (likely(netif_rx(skb) == NET_RX_SUCCESS)) {
> +	if (likely(ret == NET_RX_SUCCESS)) {
>  		u64_stats_update_begin(&lb_stats->syncp);
>  		lb_stats->bytes += len;
>  		lb_stats->packets++;
>  		u64_stats_update_end(&lb_stats->syncp);
>  	}
>
> +end:
> +	if (peernet)
> +		put_net(peernet);
>  	return NETDEV_TX_OK;
>  }
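The point is one of ownership: netif_rx() and dev_forward_skb() take over the skb, so its length has to be read before the hand-off, not after. A userspace model of the safe ordering, using a hypothetical consume_buf() that frees its argument the way the receive path may free an skb (toy_buf, toy_stats and toy_xmit() are also invented names):

```c
#include <assert.h>
#include <stdlib.h>

struct toy_buf { size_t len; };

struct toy_stats { unsigned long packets, bytes; };

/* Hypothetical consumer: takes ownership of the buffer and frees it. */
static int consume_buf(struct toy_buf *b)
{
	free(b);
	return 0;	/* stands in for NET_RX_SUCCESS */
}

/* Safe ordering: snapshot len while the buffer is still ours. */
static void toy_xmit(struct toy_buf *b, struct toy_stats *st)
{
	size_t len = b->len;		/* read before the hand-off */
	int ret = consume_buf(b);	/* b must not be touched after this */

	if (ret == 0) {
		st->bytes += len;
		st->packets++;
	}
}
```

In loopback_xmit() terms this means moving `len = skb->len;` above the dev_forward_skb()/netif_rx() calls.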
Re: [PATCH net-next] route: allow to route in a peer netns via lwt framework
On 7/24/15 8:32 AM, Nicolas Dichtel wrote:
> On 24/07/2015 16:28, David Ahern wrote:
>> On 7/23/15 8:22 AM, Nicolas Dichtel wrote:
>>>  static netdev_tx_t loopback_xmit(struct sk_buff *skb,
>>>  				 struct net_device *dev)
>>>  {
>>> +	int nsid = skb_lwt_netns_info(skb);
>>>  	struct pcpu_lstats *lb_stats;
>>>  	int len;
>>>
>>> +	if (nsid >= 0) {
>>> +		struct net *peernet = get_net_ns_by_id(dev_net(dev), nsid);
>>> +
>>> +		if (!peernet) {
>>
>> If nsid is > 0 then the peer namespace should exist, right? So for this
>> failure path why not increment the tx_error stat?
>
> I was not sure about that, because before my patch the statistics were
> incremented only on NET_RX_SUCCESS.

In this case you are knowingly dropping packets. It would be nice to have a counter showing that.
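The suggestion amounts to making a deliberate drop visible in the device statistics. A userspace sketch of that accounting, with hypothetical names (toy_lookup_peer(), toy_dev_stats, tx_dropped); the loopback driver's real per-CPU stats and the exact counter used are not modelled here:

```c
#include <assert.h>
#include <stdbool.h>

struct toy_dev_stats {
	unsigned long tx_packets;
	unsigned long tx_dropped;
};

/* Hypothetical lookup: in this model only nsid 5 has a peer netns. */
static bool toy_lookup_peer(int nsid)
{
	return nsid == 5;
}

/* Forward to the peer if it exists; otherwise count the drop. */
static void toy_xmit_to_peer(int nsid, struct toy_dev_stats *st)
{
	if (!toy_lookup_peer(nsid)) {
		st->tx_dropped++;	/* the packet is freed on purpose */
		return;
	}
	st->tx_packets++;
}
```

Without the counter, a route pointing at a stale nsid silently blackholes traffic; with it, the loss shows up in the interface statistics.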
Re: Several races in "usbnet" module (kernel 4.1.x)
23.07.2015 12:15, Oliver Neukum wrote:
> On Wed, 2015-07-22 at 21:33 +0300, Eugene Shatokhin wrote:
>> The following part is not necessary, I think. usbnet_bh() does not
>> touch EVENT_NO_RUNTIME_PM bit explicitly and these bit operations are
>> atomic w.r.t. each other.
>>
>>> +	mpn |= !test_and_clear_bit(EVENT_NO_RUNTIME_PM, &dev->flags);
>>> +	/* in case the bh reset a flag */
>
> Yes, they are atomic w.r.t. each other. And that limitation worries me.
> I am considering architectures which do atomic operations with
> spinlocks. And this code mixes another operation into it.
> Can this happen?
>
> CPU A				CPU B
>
> take lock
> read old value
> 				set value to 0
> clear bit
> write back changed value
> release lock

From what I see now in Documentation/atomic_ops.txt, stores to properly aligned memory locations are in fact atomic. So, I think, the situation you described above cannot happen for dev->flags, which is good. No need to address that in the patch. The race might be harmless after all.

If I understand the code correctly now, dev->flags is set to 0 in usbnet_stop() so that the worker function (usbnet_deferred_kevent) does nothing should it start later. If so, how about adding memory barriers so that all CPUs see dev->flags == 0 before anything else?

The patch could then look like this:

diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
index 3c86b10..d87b9c7 100644
--- a/drivers/net/usb/usbnet.c
+++ b/drivers/net/usb/usbnet.c
@@ -778,7 +778,7 @@ int usbnet_stop (struct net_device *net)
 {
 	struct usbnet		*dev = netdev_priv(net);
 	struct driver_info	*info = dev->driver_info;
-	int			retval, pm;
+	int			retval, pm, mpn;
 
 	clear_bit(EVENT_DEV_OPEN, &dev->flags);
 	netif_stop_queue (net);
@@ -813,14 +813,17 @@ int usbnet_stop (struct net_device *net)
 	 * can't flush_scheduled_work() until we drop rtnl (later),
 	 * else workers could deadlock; so make workers a NOP.
 	 */
+	mpn = !test_and_clear_bit(EVENT_NO_RUNTIME_PM, &dev->flags);
 	dev->flags = 0;
+	smp_mb(); /* make sure the workers see that dev->flags == 0 */
+
 	del_timer_sync (&dev->delay);
 	tasklet_kill (&dev->bh);
+
 	if (!pm)
 		usb_autopm_put_interface(dev->intf);
 
-	if (info->manage_power &&
-	    !test_and_clear_bit(EVENT_NO_RUNTIME_PM, &dev->flags))
+	if (info->manage_power && mpn)
 		info->manage_power(dev, 0);
 	else
 		usb_autopm_put_interface(dev->intf);
@@ -1078,6 +1081,9 @@ usbnet_deferred_kevent (struct work_struct *work)
 		container_of(work, struct usbnet, kevent);
 	int			status;
 
+	/* See the changes in dev->flags from other CPUs. */
+	smp_mb();
+
 	/* usb_clear_halt() needs a thread context */
 	if (test_bit (EVENT_TX_HALT, &dev->flags)) {
 		unlink_urbs (dev, &dev->txq);

What do you think?

Regards,
Eugene
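The key ordering in the proposed usbnet_stop() change is that EVENT_NO_RUNTIME_PM is read before dev->flags is zeroed; in the original code the test_and_clear_bit() runs after `dev->flags = 0` and would normally find the bit already clear. A userspace C11 sketch of that ordering, where atomic_fetch_and() plays the role of test_and_clear_bit() and atomic_thread_fence() roughly the role of smp_mb(); the bit number and the toy_* names are hypothetical:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define TOY_EVENT_NO_RUNTIME_PM 5	/* hypothetical bit number */

/* Like test_and_clear_bit(): clear bit nr and return its old value. */
static bool toy_test_and_clear_bit(int nr, atomic_ulong *flags)
{
	unsigned long mask = 1UL << nr;

	return (atomic_fetch_and(flags, ~mask) & mask) != 0;
}

/* Returns the "mpn" value, computed in the same order as the patch. */
static bool toy_stop(atomic_ulong *flags)
{
	/* Snapshot the bit before the whole flags word is wiped. */
	bool mpn = !toy_test_and_clear_bit(TOY_EVENT_NO_RUNTIME_PM, flags);

	atomic_store_explicit(flags, 0, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* roughly smp_mb() */
	return mpn;
}
```

If the two steps were swapped, toy_stop() would return true even when the bit had been set, which is the behaviour the patch avoids.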
Re: [PATCH net-next] route: allow to route in a peer netns via lwt framework
On 24/07/2015 16:28, David Ahern wrote:
> On 7/23/15 8:22 AM, Nicolas Dichtel wrote:
>>  static netdev_tx_t loopback_xmit(struct sk_buff *skb,
>>  				 struct net_device *dev)
>>  {
>> +	int nsid = skb_lwt_netns_info(skb);
>>  	struct pcpu_lstats *lb_stats;
>>  	int len;
>>
>> +	if (nsid >= 0) {
>> +		struct net *peernet = get_net_ns_by_id(dev_net(dev), nsid);
>> +
>> +		if (!peernet) {
>
> If nsid is > 0 then the peer namespace should exist, right? So for this
> failure path why not increment the tx_error stat?

I was not sure about that, because before my patch the statistics were incremented only on NET_RX_SUCCESS.
Re: [PATCH net-next] route: allow to route in a peer netns via lwt framework
On 7/23/15 8:22 AM, Nicolas Dichtel wrote:
>  static netdev_tx_t loopback_xmit(struct sk_buff *skb,
>  				 struct net_device *dev)
>  {
> +	int nsid = skb_lwt_netns_info(skb);
>  	struct pcpu_lstats *lb_stats;
>  	int len;
>
> +	if (nsid >= 0) {
> +		struct net *peernet = get_net_ns_by_id(dev_net(dev), nsid);
> +
> +		if (!peernet) {

If nsid is > 0 then the peer namespace should exist, right? So for this failure path why not increment the tx_error stat?

> +			kfree_skb(skb);
> +			goto end;
> +		}
> +
> +		dev_forward_skb(peernet->loopback_dev, skb);
> +		put_net(peernet);
> +		goto end;
> +	}
> +
>  	skb_orphan(skb);
>
>  	/* Before queueing this packet to netif_rx(),
[PATCH net-next v2] route: allow to route in a peer netns via lwt framework
This patch takes advantage of the newly added lwtunnel framework to allow the user to set routes that point to a peer netns.

Packets are injected into the peer netns via the loopback device. It works only when the output device is 'lo'.

Example:
ip route add 40.1.1.1/32 encap netns nsid 5 via dev lo

Signed-off-by: Nicolas Dichtel
---

v2: rework loopback handling part (update stats and call skb_dst_force())
    fix ipv6 processing
    check lwtunnel type before converting data to a nsid

 drivers/net/loopback.c        | 33 +--
 include/net/lwtunnel.h        | 27 ++
 include/uapi/linux/lwtunnel.h |  1 +
 net/core/net_namespace.c      | 52 +++
 net/ipv6/route.c              |  9 ++--
 5 files changed, 113 insertions(+), 9 deletions(-)

diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
index c76283c2f84a..4358256ff94e 100644
--- a/drivers/net/loopback.c
+++ b/drivers/net/loopback.c
@@ -57,6 +57,7 @@
 #include
 #include
 #include
+#include
 
 struct pcpu_lstats {
 	u64			packets;
@@ -71,29 +72,47 @@ struct pcpu_lstats {
 static netdev_tx_t loopback_xmit(struct sk_buff *skb,
 				 struct net_device *dev)
 {
+	int nsid = skb_lwt_netns_info(skb);
 	struct pcpu_lstats *lb_stats;
-	int len;
-
-	skb_orphan(skb);
+	struct net *peernet = NULL;
+	int len, ret;
 
 	/* Before queueing this packet to netif_rx(),
 	 * make sure dst is refcounted.
 	 */
 	skb_dst_force(skb);
 
-	skb->protocol = eth_type_trans(skb, dev);
+	if (nsid != NETNSA_NSID_NOT_ASSIGNED) {
+		peernet = get_net_ns_by_id(dev_net(dev), nsid);
+		if (!peernet) {
+			kfree_skb(skb);
+			goto end;
+		}
+
+		/* it's OK to use per_cpu_ptr() because BHs are off */
+		lb_stats = this_cpu_ptr(peernet->loopback_dev->lstats);
+		ret = dev_forward_skb(peernet->loopback_dev, skb);
+	} else {
+		skb_orphan(skb);
 
-	/* it's OK to use per_cpu_ptr() because BHs are off */
-	lb_stats = this_cpu_ptr(dev->lstats);
+		skb->protocol = eth_type_trans(skb, dev);
+
+		/* it's OK to use per_cpu_ptr() because BHs are off */
+		lb_stats = this_cpu_ptr(dev->lstats);
+		ret = netif_rx(skb);
+	}
 
 	len = skb->len;
-	if (likely(netif_rx(skb) == NET_RX_SUCCESS)) {
+	if (likely(ret == NET_RX_SUCCESS)) {
 		u64_stats_update_begin(&lb_stats->syncp);
 		lb_stats->bytes += len;
 		lb_stats->packets++;
 		u64_stats_update_end(&lb_stats->syncp);
 	}
 
+end:
+	if (peernet)
+		put_net(peernet);
 	return NETDEV_TX_OK;
 }
 
diff --git a/include/net/lwtunnel.h b/include/net/lwtunnel.h
index b02039081b04..78376da1afa2 100644
--- a/include/net/lwtunnel.h
+++ b/include/net/lwtunnel.h
@@ -5,7 +5,9 @@
 #include
 #include
 #include
+#include
 #include
+#include
 
 #define LWTUNNEL_HASH_BITS   7
 #define LWTUNNEL_HASH_SIZE   (1 << LWTUNNEL_HASH_BITS)
@@ -147,4 +149,29 @@ static inline int lwtunnel_output6(struct sock *sk, struct sk_buff *skb)
 
 #endif
 
+static inline u32 *lwt_netns_info(struct lwtunnel_state *lwtstate)
+{
+	return (u32 *)lwtstate->data;
+}
+
+static inline int skb_lwt_netns_info(struct sk_buff *skb)
+{
+	if (skb->protocol == htons(ETH_P_IP)) {
+		struct rtable *rt = (struct rtable *)skb_dst(skb);
+
+		if (rt &&
+		    rt->rt_lwtstate &&
+		    rt->rt_lwtstate->type & LWTUNNEL_ENCAP_NETNS)
+			return *lwt_netns_info(rt->rt_lwtstate);
+	} else if (skb->protocol == htons(ETH_P_IPV6)) {
+		struct rt6_info *rt6 = (struct rt6_info *)skb_dst(skb);
+
+		if (rt6 &&
+		    rt6->rt6i_lwtstate &&
+		    rt6->rt6i_lwtstate->type & LWTUNNEL_ENCAP_NETNS)
+			return *lwt_netns_info(rt6->rt6i_lwtstate);
+	}
+
+	return NETNSA_NSID_NOT_ASSIGNED;
+}
 #endif /* __NET_LWTUNNEL_H */
diff --git a/include/uapi/linux/lwtunnel.h b/include/uapi/linux/lwtunnel.h
index 31377bbea3f8..6715e7a1b335 100644
--- a/include/uapi/linux/lwtunnel.h
+++ b/include/uapi/linux/lwtunnel.h
@@ -7,6 +7,7 @@ enum lwtunnel_encap_types {
 	LWTUNNEL_ENCAP_NONE,
 	LWTUNNEL_ENCAP_MPLS,
 	LWTUNNEL_ENCAP_IP,
+	LWTUNNEL_ENCAP_NETNS,
 	__LWTUNNEL_ENCAP_MAX,
 };
 
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 2c2eb1b629b1..c1267aac373d 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -20,6 +20,7 @@
 #include
 #include
 #include
+#include
 
 /*
  *	Our network namespace constructor/destructor lists
@@ -725,6 +726,56 @@ out:
 	rtnl_set_sk_err(net, RTNL
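The values in enum lwtunnel_encap_types are consecutive integers (NONE = 0, MPLS = 1, IP = 2, NETNS = 3 after this patch), not bit flags, so a test of the form `type & LWTUNNEL_ENCAP_NETNS` is also non-zero for MPLS (1 & 3) and IP (2 & 3) states. The userspace model below, with hypothetical names (toy_lwtstate, toy_state_nsid()), contrasts that bitwise test with an equality check performed before the state data is interpreted as a u32 nsid:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same ordering as the uapi enum after this patch. */
enum toy_encap_types {
	TOY_ENCAP_NONE,		/* 0 */
	TOY_ENCAP_MPLS,		/* 1 */
	TOY_ENCAP_IP,		/* 2 */
	TOY_ENCAP_NETNS,	/* 3 */
};

/* Stand-in for NETNSA_NSID_NOT_ASSIGNED. */
#define TOY_NSID_NOT_ASSIGNED (-1)

struct toy_lwtstate {
	uint16_t type;
	uint8_t data[8];	/* encap-specific payload */
};

/* Bitwise test as written in v2: also true for MPLS and IP states. */
static int toy_is_netns_bitwise(const struct toy_lwtstate *s)
{
	return (s->type & TOY_ENCAP_NETNS) != 0;
}

/* Encap-specific helper: read data as an nsid only for NETNS states. */
static int toy_state_nsid(const struct toy_lwtstate *s)
{
	uint32_t nsid;

	if (!s || s->type != TOY_ENCAP_NETNS)
		return TOY_NSID_NOT_ASSIGNED;
	memcpy(&nsid, s->data, sizeof(nsid));
	return (int)nsid;
}
```

Keeping the type comparison inside the encap-specific helper also lets a generic accessor return the raw lwtunnel_state without knowing how each encap type lays out its data.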
Re: [PATCH net-next] route: allow to route in a peer netns via lwt framework
On 24/07/2015 15:50, roopa wrote:
> On 7/24/15, 5:24 AM, Nicolas Dichtel wrote:
>> Sure, but my goal was to not create a new .h file just for these two
>> helpers. It's related to lwtunnel, thus I was thinking they can go here.
>
> ok..., since your lwt namespace functions went into net_namespace.c, I was
> thinking these should really go into net_namespace.h. Does that work for you?

It's not so easy; it's a chicken-and-egg problem. If I add this to net/net_namespace.h, I need to include net/lwtunnel.h, but that file already includes net/net_namespace.h (which is included directly or indirectly by most of the network headers).

Regards,
Nicolas
Re: [PATCH net-next 2/2] lwtunnel: change prototype of lwtunnel_state_get()
On 7/24/15, 3:28 AM, Nicolas Dichtel wrote:
> It saves some lines and simplifies the code a bit when the state is
> returned by this function. It's also useful to handle a NULL entry.
>
> To avoid overly long lines, I've also renamed lwtunnel_state_get() and
> lwtunnel_state_put() to lwtstate_get() and lwtstate_put().
>
> CC: Thomas Graf
> CC: Roopa Prabhu
> Signed-off-by: Nicolas Dichtel

Acked-by: Roopa Prabhu

thanks.
[PATCH net-next] bonding: convert num_grat_arp to the new bonding option API
From: Nikolay Aleksandrov

num_grat_arp wasn't converted to the new bonding option API, so do this now and remove the specific sysfs store function in order to use the standard one. num_grat_arp is the same as num_unsol_na, so add it as an alias with the same option settings. An important difference is the option name, which is matched in bond_sysfs_store_option().

Signed-off-by: Nikolay Aleksandrov
---
 drivers/net/bonding/bond_options.c |  7 +++
 drivers/net/bonding/bond_sysfs.c   | 20 +++-
 include/net/bond_options.h         |  1 +
 3 files changed, 11 insertions(+), 17 deletions(-)

diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
index e9c624d54dd4..6dda57e2e724 100644
--- a/drivers/net/bonding/bond_options.c
+++ b/drivers/net/bonding/bond_options.c
@@ -420,6 +420,13 @@ static const struct bond_option bond_opts[BOND_OPT_LAST] = {
 		.flags = BOND_OPTFLAG_IFDOWN,
 		.values = bond_ad_user_port_key_tbl,
 		.set = bond_option_ad_user_port_key_set,
+	},
+	[BOND_OPT_NUM_PEER_NOTIF_ALIAS] = {
+		.id = BOND_OPT_NUM_PEER_NOTIF_ALIAS,
+		.name = "num_grat_arp",
+		.desc = "Number of peer notifications to send on failover event",
+		.values = bond_num_peer_notif_tbl,
+		.set = bond_option_num_peer_notif_set
 	}
 };
 
diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index 31835a4dab57..f4ae72086215 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -380,7 +380,7 @@ static ssize_t bonding_show_ad_select(struct device *d,
 static DEVICE_ATTR(ad_select, S_IRUGO | S_IWUSR,
 		   bonding_show_ad_select, bonding_sysfs_store_option);
 
-/* Show and set the number of peer notifications to send after a failover event. */
+/* Show the number of peer notifications to send after a failover event. */
 static ssize_t bonding_show_num_peer_notif(struct device *d,
 					   struct device_attribute *attr,
 					   char *buf)
@@ -388,24 +388,10 @@ static ssize_t bonding_show_num_peer_notif(struct device *d,
 	struct bonding *bond = to_bond(d);
 	return sprintf(buf, "%d\n", bond->params.num_peer_notif);
 }
-
-static ssize_t bonding_store_num_peer_notif(struct device *d,
-					    struct device_attribute *attr,
-					    const char *buf, size_t count)
-{
-	struct bonding *bond = to_bond(d);
-	int ret;
-
-	ret = bond_opt_tryset_rtnl(bond, BOND_OPT_NUM_PEER_NOTIF, (char *)buf);
-	if (!ret)
-		ret = count;
-
-	return ret;
-}
 static DEVICE_ATTR(num_grat_arp, S_IRUGO | S_IWUSR,
-		   bonding_show_num_peer_notif, bonding_store_num_peer_notif);
+		   bonding_show_num_peer_notif, bonding_sysfs_store_option);
 static DEVICE_ATTR(num_unsol_na, S_IRUGO | S_IWUSR,
-		   bonding_show_num_peer_notif, bonding_store_num_peer_notif);
+		   bonding_show_num_peer_notif, bonding_sysfs_store_option);
 
 /* Show the MII monitor interval. */
 static ssize_t bonding_show_miimon(struct device *d,
diff --git a/include/net/bond_options.h b/include/net/bond_options.h
index c28aca25320e..1797235cd590 100644
--- a/include/net/bond_options.h
+++ b/include/net/bond_options.h
@@ -66,6 +66,7 @@ enum {
 	BOND_OPT_AD_ACTOR_SYS_PRIO,
 	BOND_OPT_AD_ACTOR_SYSTEM,
 	BOND_OPT_AD_USER_PORT_KEY,
+	BOND_OPT_NUM_PEER_NOTIF_ALIAS,
 	BOND_OPT_LAST
 };
 
-- 
2.4.3
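The alias works because the generic store path looks the option up by the sysfs attribute name, so two table entries can share the same setter and differ only in their name. A userspace model of that lookup; the names (toy_opts, toy_store_option()) and the 0..255 range are hypothetical simplifications, not the bonding driver's actual structures or limits:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_params { int num_peer_notif; };

struct toy_option {
	const char *name;
	int (*set)(struct toy_params *p, int value);
};

/* Shared setter, playing the role of bond_option_num_peer_notif_set(). */
static int toy_num_peer_notif_set(struct toy_params *p, int value)
{
	if (value < 0 || value > 255)	/* hypothetical valid range */
		return -1;
	p->num_peer_notif = value;
	return 0;
}

/* Two names, one setter: the alias entry differs only in its name. */
static const struct toy_option toy_opts[] = {
	{ "num_unsol_na", toy_num_peer_notif_set },
	{ "num_grat_arp", toy_num_peer_notif_set },
};

/* Generic store path: find the option by attribute name, then set it. */
static int toy_store_option(struct toy_params *p, const char *attr, int value)
{
	for (size_t i = 0; i < sizeof(toy_opts) / sizeof(toy_opts[0]); i++)
		if (strcmp(toy_opts[i].name, attr) == 0)
			return toy_opts[i].set(p, value);
	return -1;	/* unknown option */
}
```

Writing through either attribute updates the same parameter, which is why the dedicated bonding_store_num_peer_notif() becomes unnecessary.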
Re: [PATCH net-next] route: allow to route in a peer netns via lwt framework
On 7/24/15, 5:24 AM, Nicolas Dichtel wrote:
> Sure, but my goal was to not create a new .h file just for these two
> helpers. It's related to lwtunnel, thus I was thinking they can go here.

ok..., since your lwt namespace functions went into net_namespace.c, I was thinking these should really go into net_namespace.h. Does that work for you? If it does not, then yes, they could live here.

Thanks,
Roopa
Re: [PATCH net-next 1/2] ipv6: copy lwtstate in ip6_rt_copy_init()
On 7/24/15, 3:28 AM, Nicolas Dichtel wrote:
> We need to copy this field (ip6_rt_cache_alloc() and ip6_rt_pcpu_alloc()
> use ip6_rt_copy_init() to build a dst).
>
> CC: Thomas Graf
> CC: Roopa Prabhu
> Fixes: 19e42e451506 ("ipv6: support for fib route lwtunnel encap attributes")
> Signed-off-by: Nicolas Dichtel

Acked-by: Roopa Prabhu

Thanks!
Re: [PATCH net-next] ipv6: use lwtunnel_output6() only if flag redirect is set
On 7/24/15, 1:59 AM, Nicolas Dichtel wrote:
> This function makes sense only when LWTUNNEL_STATE_OUTPUT_REDIRECT is set.
> The check is already done in IPv4.
>
> CC: Thomas Graf
> CC: Roopa Prabhu
> Fixes: 74a0f2fe8ed5 ("ipv6: rt6_info output redirect to tunnel output")
> Signed-off-by: Nicolas Dichtel

Acked-by: Roopa Prabhu

thanks
Re: [PATCH net-next] route: allow to route in a peer netns via lwt framework
On 23/07/2015 17:50, roopa wrote:
> On 7/23/15, 8:25 AM, Nicolas Dichtel wrote:
>> On 23/07/2015 17:01, roopa wrote:
>>> On 7/23/15, 7:22 AM, Nicolas Dichtel wrote:
>>>> [snip]
>>>> +static inline u32 *lwt_netns_info(struct lwtunnel_state *lwtstate)
>>>> +{
>>>> +	return (u32 *)lwtstate->data;
>>>> +}
>>>> +
>>>> +static inline int skb_lwt_netns_info(struct sk_buff *skb)
>>>> +{
>>>> +	if (skb->protocol == htons(ETH_P_IP)) {
>>>> +		struct rtable *rt = (struct rtable *)skb_dst(skb);
>>>> +
>>>> +		if (rt && rt->rt_lwtstate)
>>>> +			return *lwt_netns_info(rt->rt_lwtstate);
>>>> +	} else if (skb->protocol == htons(ETH_P_IPV6)) {
>>>> +		struct rt6_info *rt6 = (struct rt6_info *)skb_dst(skb);
>>>> +
>>>> +		if (rt6 && rt6->rt6i_lwtstate)
>>>> +			return *lwt_netns_info(rt6->rt6i_lwtstate);
>>>> +	}
>>>> +
>>>> +	return NETNSA_NSID_NOT_ASSIGNED;
>>>> +}
>>>>  #endif /* __NET_LWTUNNEL_H */
>>> since these apis' don't have to be netns specific,
>>> Can they just be named lwtunnel_get_state_data and skb_lwtunnel_state ?
>> They are specific to netns because lwtstate->data is interpreted as an u32 *.
>> But I agree that a test is missing against lwtstate->type to ensure that data
>> will be a nsid.
> o ok..., the api's in lwtunnel.h today are not specific to an encap type.
> they are generic, so skb_lwtunnel_state() which returns struct lwtunnel_state
> could go here. the encap specific ones can go in the respective callers.
> Recently thomas added a similar skb_tunnel_info() for ip tunnels.
>
> I did like to have a generic version of your skb_lwt_netns_info in
> lwtunnel.h. I could use it in my mpls output func too.
Sure, but my goal was to not create a new .h file just for these two helpers.
It's related to lwtunnel, thus I was thinking they can go here.

Regards,
Nicolas
Re: [patch net-next 3/4] mlxsw: Add interface to access registers and process events
On Fri, Jul 24, 2015 at 07:24:53AM +0200, Jiri Pirko wrote:
> Thu, Jul 23, 2015 at 11:12:20PM CEST, go...@cumulusnetworks.com wrote:
> >On Thu, Jul 23, 2015 at 05:43:35PM +0200, Jiri Pirko wrote:
> >> From: Ido Schimmel
> >>
> >> Add the ability to construct mailbox-style register access messages
> >> called EMADs with provisions to construct and parse the registers payload.
> >> Implement EMAD transaction layer which is responsible for the reliable
> >> transmission of EMADs.
> >> Also, add an infrastructure used by the switch driver to register for
> >> particular events generated by the device.
> >>
> >> Signed-off-by: Ido Schimmel
> >> Signed-off-by: Jiri Pirko
> >> Signed-off-by: Elad Raz
> >> ---
> >>  drivers/net/ethernet/mellanox/mlxsw/core.c |  736
> >>  drivers/net/ethernet/mellanox/mlxsw/core.h |   21 +
> >>  drivers/net/ethernet/mellanox/mlxsw/emad.h |  127 +++
> >>  drivers/net/ethernet/mellanox/mlxsw/port.h |   19 +
> >>  drivers/net/ethernet/mellanox/mlxsw/reg.h  | 1289
> >>
> >>  5 files changed, 2192 insertions(+)
> >>  create mode 100644 drivers/net/ethernet/mellanox/mlxsw/emad.h
> >>  create mode 100644 drivers/net/ethernet/mellanox/mlxsw/reg.h
> >>
> >> diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c b/drivers/net/ethernet/mellanox/mlxsw/core.c
> >> index 211ec9b..bd0f692 100644
> >> --- a/drivers/net/ethernet/mellanox/mlxsw/core.c
> >> +++ b/drivers/net/ethernet/mellanox/mlxsw/core.c
> >[...]
> >> +	struct list_head event_listener_list;
> >> +	struct {
> >> +		struct sk_buff *resp_skb;
> >> +		u64 tid;
> >> +		wait_queue_head_t wait;
> >> +		bool trans_active;
> >> +		struct mutex lock; /* One EMAD transaction at a time. */
> >> +		bool use_emad;
> >> +	} emad;
> >>  	struct mlxsw_core_pcpu_stats __percpu *pcpu_stats;
> >>  	struct dentry *dbg_dir;
> >>  	struct {
> >[...]
> >>  	}
> >>
> >>  	INIT_LIST_HEAD(&mlxsw_core->rx_listener_list);
> >> +	INIT_LIST_HEAD(&mlxsw_core->event_listener_list);
> >>  	mlxsw_core->driver = mlxsw_driver;
> >>  	mlxsw_core->bus = mlxsw_bus;
> >>  	mlxsw_core->bus_priv = bus_priv;
> >[...]
> >> +	/* No reason to save item if we did not manage to register an RX
> >> +	 * listener for it.
> >> +	 */
> >> +	list_add_rcu(&el_item->list, &mlxsw_core->event_listener_list);
> >> +
> >
> >I see where 'event_listener_list' is defined and where entries are
> >added/removed, but where is the code that would receive these events and
> >presumably search this list so all handlers registered (currently just
> >PUDE) can handle events?
>
> That is handled by calling mlxsw_core_rx_listener_register.
> that will add each event handler as a item to &mlxsw_core->rx_listener_list
> These rx_listeners are called from mlxsw_core_skb_receive.
> So event is here a special case of rx_listener. The event list is used
> just to contain struct mlxsw_event_listener_item instances.
>

Thanks for the explanation! I missed how that was called the first time I
went through this.
[PATCH] net: phy: fix auto negotiation checking for teranetics
From: Shaohui Xie

When the fiber port is used, the PHY cannot report its auto-negotiation state, so the driver should always report auto-negotiation as done on the fiber port.

Signed-off-by: Shaohui Xie
---
 drivers/net/phy/teranetics.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/net/phy/teranetics.c b/drivers/net/phy/teranetics.c
index 7dcb5aa..91e1bec 100644
--- a/drivers/net/phy/teranetics.c
+++ b/drivers/net/phy/teranetics.c
@@ -51,8 +51,15 @@ static int teranetics_aneg_done(struct phy_device *phydev)
 {
 	int reg;
 
-	reg = phy_read_mmd(phydev, MDIO_MMD_AN, MDIO_STAT1);
-	return (reg < 0) ? reg : (reg & BMSR_ANEGCOMPLETE);
+	/* auto negotiation state can only be checked when using copper
+	 * port, if using fiber port, just lie it's done.
+	 */
+	if (!phy_read_mmd(phydev, MDIO_MMD_VEND1, 93)) {
+		reg = phy_read_mmd(phydev, MDIO_MMD_AN, MDIO_STAT1);
+		return (reg < 0) ? reg : (reg & BMSR_ANEGCOMPLETE);
+	}
+
+	return 1;
 }
 
 static int teranetics_config_aneg(struct phy_device *phydev)
-- 
2.1.0.27.g96db324
Re: [PATCH] net/ipv6: add sysctl option accept_ra_hop_limit
2015-07-24 12:48 GMT+08:00 YOSHIFUJI Hideaki :
> Hi,
>
> Hangbin Liu wrote:
>> Commit 6fd99094de2b ("ipv6: Don't reduce hop limit for an interface")
>> disabled accepting the hop limit from an RA if it is lower than the current
>> hop limit, for security reasons. But this behavior kind of breaks the RFC
>> definition.
>>
>> RFC 4861, 6.3.4. Processing Received Router Advertisements
>>    If the received Cur Hop Limit value is non-zero, the host SHOULD set
>>    its CurHopLimit variable to the received value.
>>
>> So add sysctl option accept_ra_hop_limit to let the user choose whether to
>> accept the hop limit info in RAs.
>>
>> Signed-off-by: Hangbin Liu
>> Acked-by: Hannes Frederic Sowa
>> ---
>>  Documentation/networking/ip-sysctl.txt | 11 +++
>>  include/linux/ipv6.h                   |  1 +
>>  include/uapi/linux/ipv6.h              |  1 +
>>  net/ipv6/addrconf.c                    | 10 ++
>>  net/ipv6/ndisc.c                       | 17 +++--
>>  5 files changed, 34 insertions(+), 6 deletions(-)
>>
> :
>> diff --git a/include/uapi/linux/ipv6.h b/include/uapi/linux/ipv6.h
>> index 5efa54a..9f40ac9 100644
>> --- a/include/uapi/linux/ipv6.h
>> +++ b/include/uapi/linux/ipv6.h
>> @@ -153,6 +153,7 @@ enum {
>>  	DEVCONF_FORCE_MLD_VERSION,
>>  	DEVCONF_ACCEPT_RA_DEFRTR,
>>  	DEVCONF_ACCEPT_RA_PINFO,
>> +	DEVCONF_ACCEPT_RA_HOP_LIMIT,
>>  	DEVCONF_ACCEPT_RA_RTR_PREF,
>>  	DEVCONF_RTR_PROBE_INTERVAL,
>>  	DEVCONF_ACCEPT_RA_RT_INFO_MAX_PLEN,
>
> No, you cannot add new one in the middle of these since
> values are exported to userspace.
>

Hi Yoshifuji-san,

Thanks for the reminder. Should I also move the value in struct ipv6_devconf to the end, or just leave it after accept_ra_pinfo?

Thanks
Hangbin
Re: [PATCH net-next] ipv6: use lwtunnel_output6() only if flag redirect is set
On 07/24/15 at 10:59am, Nicolas Dichtel wrote:
> This function makes sense only when LWTUNNEL_STATE_OUTPUT_REDIRECT is set.
> The check is already done in IPv4.
>
> CC: Thomas Graf
> CC: Roopa Prabhu
> Fixes: 74a0f2fe8ed5 ("ipv6: rt6_info output redirect to tunnel output")
> Signed-off-by: Nicolas Dichtel

Acked-by: Thomas Graf
Re: [PATCH net-next 2/2] lwtunnel: change prototype of lwtunnel_state_get()
On 07/24/15 at 12:28pm, Nicolas Dichtel wrote:
> It saves some lines and simplifies the code a bit when the state is returned
> by this function. It's also useful to handle a NULL entry.
>
> To avoid overly long lines, I've also renamed lwtunnel_state_get() and
> lwtunnel_state_put() to lwtstate_get() and lwtstate_put().
>
> CC: Thomas Graf
> CC: Roopa Prabhu
> Signed-off-by: Nicolas Dichtel

Acked-by: Thomas Graf
Re: [PATCH net-next 1/2] ipv6: copy lwtstate in ip6_rt_copy_init()
On 07/24/15 at 12:28pm, Nicolas Dichtel wrote:
> We need to copy this field (ip6_rt_cache_alloc() and ip6_rt_pcpu_alloc()
> use ip6_rt_copy_init() to build a dst).
>
> CC: Thomas Graf
> CC: Roopa Prabhu
> Fixes: 19e42e451506 ("ipv6: support for fib route lwtunnel encap attributes")
> Signed-off-by: Nicolas Dichtel

Acked-by: Thomas Graf
[PATCH net-next 2/2] lwtunnel: change prototype of lwtunnel_state_get()
It saves some lines and simplifies the code a bit when the state is returned by this function. It's also useful to handle a NULL entry.

To avoid overly long lines, I've also renamed lwtunnel_state_get() and lwtunnel_state_put() to lwtstate_get() and lwtstate_put().

CC: Thomas Graf
CC: Roopa Prabhu
Signed-off-by: Nicolas Dichtel
---
 include/net/lwtunnel.h   | 16 +++-
 net/ipv4/fib_semantics.c |  9 -
 net/ipv4/route.c         |  9 ++---
 net/ipv6/ip6_fib.c       |  2 +-
 net/ipv6/route.c         |  8 ++--
 5 files changed, 20 insertions(+), 24 deletions(-)

diff --git a/include/net/lwtunnel.h b/include/net/lwtunnel.h
index bd72e82b45a1..78376da1afa2 100644
--- a/include/net/lwtunnel.h
+++ b/include/net/lwtunnel.h
@@ -37,12 +37,16 @@ extern const struct lwtunnel_encap_ops __rcu *
 		lwtun_encaps[LWTUNNEL_ENCAP_MAX+1];
 
 #ifdef CONFIG_LWTUNNEL
-static inline void lwtunnel_state_get(struct lwtunnel_state *lws)
+static inline struct lwtunnel_state *
+lwtstate_get(struct lwtunnel_state *lws)
 {
-	atomic_inc(&lws->refcnt);
+	if (lws)
+		atomic_inc(&lws->refcnt);
+
+	return lws;
 }
 
-static inline void lwtunnel_state_put(struct lwtunnel_state *lws)
+static inline void lwtstate_put(struct lwtunnel_state *lws)
 {
 	if (!lws)
 		return;
@@ -76,11 +80,13 @@ int lwtunnel_output6(struct sock *sk, struct sk_buff *skb);
 
 #else
 
-static inline void lwtunnel_state_get(struct lwtunnel_state *lws)
+static inline struct lwtunnel_state *
+lwtstate_get(struct lwtunnel_state *lws)
 {
+	return lws;
 }
 
-static inline void lwtunnel_state_put(struct lwtunnel_state *lws)
+static inline void lwtstate_put(struct lwtunnel_state *lws)
 {
 }
 
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index 6754c64b2fe0..7226df887531 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -209,7 +209,7 @@ static void free_fib_info_rcu(struct rcu_head *head)
 	change_nexthops(fi) {
 		if (nexthop_nh->nh_dev)
 			dev_put(nexthop_nh->nh_dev);
-		lwtunnel_state_put(nexthop_nh->nh_lwtstate);
+		lwtstate_put(nexthop_nh->nh_lwtstate);
 		free_nh_exceptions(nexthop_nh);
 		rt_fibinfo_free_cpus(nexthop_nh->nh_pcpu_rth_output);
 		rt_fibinfo_free(&nexthop_nh->nh_rth_input);
@@ -512,8 +512,8 @@ static int fib_get_nhs(struct fib_info *fi, struct rtnexthop *rtnh,
 						   nla, &lwtstate);
 				if (ret)
 					goto errout;
-				lwtunnel_state_get(lwtstate);
-				nexthop_nh->nh_lwtstate = lwtstate;
+				nexthop_nh->nh_lwtstate =
+					lwtstate_get(lwtstate);
 			}
 		}
 
@@ -969,8 +969,7 @@ struct fib_info *fib_create_info(struct fib_config *cfg)
 			if (err)
 				goto failure;
 
-			lwtunnel_state_get(lwtstate);
-			nh->nh_lwtstate = lwtstate;
+			nh->nh_lwtstate = lwtstate_get(lwtstate);
 		}
 		nh->nh_oif = cfg->fc_oif;
 		nh->nh_gw = cfg->fc_gw;
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 519ec232818d..11096396ef4a 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1358,7 +1358,7 @@ static void ipv4_dst_destroy(struct dst_entry *dst)
 		list_del(&rt->rt_uncached);
 		spin_unlock_bh(&ul->lock);
 	}
-	lwtunnel_state_put(rt->rt_lwtstate);
+	lwtstate_put(rt->rt_lwtstate);
 }
 
 void rt_flush_dev(struct net_device *dev)
@@ -1407,12 +1407,7 @@ static void rt_set_nexthop(struct rtable *rt, __be32 daddr,
 #ifdef CONFIG_IP_ROUTE_CLASSID
 		rt->dst.tclassid = nh->nh_tclassid;
 #endif
-		if (nh->nh_lwtstate) {
-			lwtunnel_state_get(nh->nh_lwtstate);
-			rt->rt_lwtstate = nh->nh_lwtstate;
-		} else {
-			rt->rt_lwtstate = NULL;
-		}
+		rt->rt_lwtstate = lwtstate_get(nh->nh_lwtstate);
 		if (unlikely(fnhe))
 			cached = rt_bind_exception(rt, fnhe, daddr);
 		else if (!(rt->dst.flags & DST_NOCACHE))
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index d715f2e0c4e7..5693b5eb8482 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -178,7 +178,7 @@ static void rt6_free_pcpu(struct rt6_info *non_pcpu_rt)
 static void rt6_release(struct rt6_info *rt)
 {
 	if (atomic_dec_and_test(&rt->rt6i_ref)) {
-		lwtunnel_state_put(rt->rt6i_lwtstate);
+		lwtstate_put(rt->rt6i_lwtstate);
 		rt6_free_pcpu(rt);
 		dst_free(&rt->dst);
 	}
diff --git a/net/i
[PATCH net-next 1/2] ipv6: copy lwtstate in ip6_rt_copy_init()
We need to copy this field (ip6_rt_cache_alloc() and ip6_rt_pcpu_alloc() use ip6_rt_copy_init() to build a dst).

CC: Thomas Graf
CC: Roopa Prabhu
Fixes: 19e42e451506 ("ipv6: support for fib route lwtunnel encap attributes")
Signed-off-by: Nicolas Dichtel
---
 net/ipv6/route.c | 4
 1 file changed, 4 insertions(+)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 67b2367126f3..ac01ab0886a5 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -2164,6 +2164,10 @@ static void ip6_rt_copy_init(struct rt6_info *rt, struct rt6_info *ort)
 #endif
 	rt->rt6i_prefsrc = ort->rt6i_prefsrc;
 	rt->rt6i_table = ort->rt6i_table;
+	if (ort->rt6i_lwtstate) {
+		lwtunnel_state_get(ort->rt6i_lwtstate);
+		rt->rt6i_lwtstate = ort->rt6i_lwtstate;
+	}
 }
 
 #ifdef CONFIG_IPV6_ROUTE_INFO
-- 
2.4.2
Re: [PATCH]mlx4-core: fix possible use after free in cq_completion
On Fri, Jul 24, 2015 at 10:18 AM, Jinpu Wang wrote:
> Hi all,
>
> I hit a bug in OFED and reported it here:
>
> http://marc.info/?l=linux-rdma&m=143634872328553&w=2
> I checked the latest mainline, Linux 4.2-rc3, and it has a similar bug.
> Here is the patch against Linux 4.2-rc3 (compile-tested only).
>
> I've added a copy as an attachment in case the mail client breaks the patch format.
>
> From a9fbc1ff0768acdb260e57e3324798fc0082d194 Mon Sep 17 00:00:00 2001
> From: Jack Wang
> Date: Thu, 23 Jul 2015 18:58:08 +0200
> Subject: [PATCH] mlx4_core: fix possible use-after-free in cq_completion
>
> New cq_completion events can still arrive while mlx4_cq_free is running,
> and cq_completion has neither spin_lock nor refcount protection, which can
> lead to a use-after-free. So add spin_lock and refcount protection in
> cq_completion.
>
> Signed-off-by: Jack Wang
> ---
>  drivers/net/ethernet/mellanox/mlx4/cq.c | 11 +--
>  1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/cq.c b/drivers/net/ethernet/mellanox/mlx4/cq.c
> index 3348e64..8d7f405 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/cq.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/cq.c
> @@ -99,10 +99,15 @@ static void mlx4_add_cq_to_tasklet(struct mlx4_cq *cq)
>
>  void mlx4_cq_completion(struct mlx4_dev *dev, u32 cqn)
>  {
> +	struct mlx4_cq_table *cq_table = &mlx4_priv(dev)->cq_table;
>  	struct mlx4_cq *cq;
>
> -	cq = radix_tree_lookup(&mlx4_priv(dev)->cq_table.tree,
> -			       cqn & (dev->caps.num_cqs - 1));
> +	spin_lock(&cq_table->lock);
> +	cq = radix_tree_lookup(&cq_table->tree, cqn & (dev->caps.num_cqs - 1));
> +	if (cq)
> +		atomic_inc(&cq->refcount);
> +
> +	spin_unlock(&cq_table->lock);
>  	if (!cq) {
>  		mlx4_dbg(dev, "Completion event for bogus CQ %08x\n", cqn);
>  		return;
> @@ -111,6 +116,8 @@ void mlx4_cq_completion(struct mlx4_dev *dev, u32 cqn)
>  	++cq->arm_sn;
>
>  	cq->comp(cq);
> +	if (atomic_dec_and_test(&cq->refcount))
> +		complete(&cq->free);
>  }
>
>  void mlx4_cq_event(struct mlx4_dev *dev, u32 cqn, int event_type)
> --
> 1.9.1
>

I found an almost identical patch to mine, posted 3 years ago :)

http://linux-rdma.vger.kernel.narkive.com/NSyWFRkW/patch-rfc-for-next-net-mlx4-core-fix-racy-flow-in-the-driver-cq-completion-handler

Could you consider applying the patch? It fixes a real PANIC.

Thanks
Jack

-- 
Best Regards,
Jack Wang

Linux Kernel Developer Storage
ProfitBricks GmbH The IaaS-Company.

ProfitBricks GmbH
Greifswalder Str. 207
D - 10405 Berlin
Tel: +49 30 5770083-42
Fax: +49 30 5770085-98
Email: jinpu.w...@profitbricks.com
URL: http://www.profitbricks.de

Registered office: Berlin.
Register court: Amtsgericht Charlottenburg, HRB 125506 B.
Managing directors: Andreas Gauger, Achim Weiss.
[PATCH net-next] ipv6: use lwtunnel_output6() only if flag redirect is set
This function makes sense only when LWTUNNEL_STATE_OUTPUT_REDIRECT is set.
The check is already done in IPv4.

CC: Thomas Graf
CC: Roopa Prabhu
Fixes: 74a0f2fe8ed5 ("ipv6: rt6_info output redirect to tunnel output")
Signed-off-by: Nicolas Dichtel
---
 net/ipv6/route.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index f216cb998628..67b2367126f3 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1780,7 +1780,8 @@ int ip6_route_add(struct fib6_config *cfg)
 			goto out;
 		lwtunnel_state_get(lwtstate);
 		rt->rt6i_lwtstate = lwtstate;
-		rt->dst.output = lwtunnel_output6;
+		if (lwtunnel_output_redirect(rt->rt6i_lwtstate))
+			rt->dst.output = lwtunnel_output6;
 	}
 
 	ipv6_addr_prefix(&rt->rt6i_dst.addr, &cfg->fc_dst, cfg->fc_dst_len);
-- 
2.4.2
Re: [PATCH V3 3/7] Drivers: hv: vmbus: add APIs to send/recv hvsock packet and get the r/w-ability
On Fri, Jul 24, 2015 at 11:57:01AM +0530, Sudip Mukherjee wrote:
> This is also ok, the function is supposed to return ret or-ed with the
> relevant flags based on the scan position. It is considered error if 0
> is returned (without any flag).

Yeah. You're right. I looked through my list again this morning and they all
seem fine...

regards,
dan carpenter
[PATCH]mlx4-core: fix possible use after free in cq_completion
Hi all,

I hit a bug in OFED and reported it here:

http://marc.info/?l=linux-rdma&m=143634872328553&w=2

I checked the latest mainline, Linux 4.2-rc3, and it has a similar bug. Here is the patch against Linux 4.2-rc3 (compile-tested only).

I've added a copy as an attachment in case the mail client breaks the patch format.

From a9fbc1ff0768acdb260e57e3324798fc0082d194 Mon Sep 17 00:00:00 2001
From: Jack Wang
Date: Thu, 23 Jul 2015 18:58:08 +0200
Subject: [PATCH] mlx4_core: fix possible use-after-free in cq_completion

New cq_completion events can still arrive while mlx4_cq_free is running, and cq_completion has neither spin_lock nor refcount protection, which can lead to a use-after-free. So add spin_lock and refcount protection in cq_completion.

Signed-off-by: Jack Wang
---
 drivers/net/ethernet/mellanox/mlx4/cq.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/cq.c b/drivers/net/ethernet/mellanox/mlx4/cq.c
index 3348e64..8d7f405 100644
--- a/drivers/net/ethernet/mellanox/mlx4/cq.c
+++ b/drivers/net/ethernet/mellanox/mlx4/cq.c
@@ -99,10 +99,15 @@ static void mlx4_add_cq_to_tasklet(struct mlx4_cq *cq)
 
 void mlx4_cq_completion(struct mlx4_dev *dev, u32 cqn)
 {
+	struct mlx4_cq_table *cq_table = &mlx4_priv(dev)->cq_table;
 	struct mlx4_cq *cq;
 
-	cq = radix_tree_lookup(&mlx4_priv(dev)->cq_table.tree,
-			       cqn & (dev->caps.num_cqs - 1));
+	spin_lock(&cq_table->lock);
+	cq = radix_tree_lookup(&cq_table->tree, cqn & (dev->caps.num_cqs - 1));
+	if (cq)
+		atomic_inc(&cq->refcount);
+
+	spin_unlock(&cq_table->lock);
 	if (!cq) {
 		mlx4_dbg(dev, "Completion event for bogus CQ %08x\n", cqn);
 		return;
@@ -111,6 +116,8 @@ void mlx4_cq_completion(struct mlx4_dev *dev, u32 cqn)
 	++cq->arm_sn;
 
 	cq->comp(cq);
+	if (atomic_dec_and_test(&cq->refcount))
+		complete(&cq->free);
 }
 
 void mlx4_cq_event(struct mlx4_dev *dev, u32 cqn, int event_type)
-- 
1.9.1
Re: [PATCH 1/1] ath10k: fixing wrong initialization of struct channel
Maninder Singh writes:

> chandef is initialized with NULL and on the very next line,
> we are using it to get channel, which is not correct.
>
> channel should be initialized after obtaining chandef.
>
> Signed-off-by: Maninder Singh

Thanks, applied.

-- 
Kalle Valo
Re: [patch net-next 3/4] mlxsw: Add interface to access registers and process events
> On Jul 24, 2015, at 08:14, Scott Feldman wrote:
>
>> On Thu, Jul 23, 2015 at 8:43 AM, Jiri Pirko wrote:
>> From: Ido Schimmel
>>
>> Add the ability to construct mailbox-style register access messages
>> called EMADs with provisions to construct and parse the registers payload.
>> Implement EMAD transaction layer which is responsible for the reliable
>> transmission of EMADs.
>> Also, add an infrastructure used by the switch driver to register for
>> particular events generated by the device.
>
> What is this EMADs used for? Is this for intra-switch or inter-switch
> communications?

EMAD stands for Ethernet management datagram. It is a command encoding wrapped in a packet, used for host interface communication as well as for multi-silicon support.