[PATCH 1/1] net: macb: ensure ordering write to re-enable RX smoothly
When a hardware issue happened as described by inline comments, the register
write pattern looks like the following:

+ wmb();

A memory barrier may be needed between these two write operations, so add wmb
to ensure a flip from 0 to 1 for NCR.

Signed-off-by: Zumeng Chen
---
 drivers/net/ethernet/cadence/macb.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
index 533653b..2f9c5b2 100644
--- a/drivers/net/ethernet/cadence/macb.c
+++ b/drivers/net/ethernet/cadence/macb.c
@@ -1156,6 +1156,7 @@ static irqreturn_t macb_interrupt(int irq, void *dev_id)
 		if (status & MACB_BIT(RXUBR)) {
 			ctrl = macb_readl(bp, NCR);
 			macb_writel(bp, NCR, ctrl & ~MACB_BIT(RE));
+			wmb();
 			macb_writel(bp, NCR, ctrl | MACB_BIT(RE));
 
 			if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
-- 
2.4.11
Re: [PATCH net] net/sched: act_pedit: limit negative offset
On Mon, Nov 28, 2016 at 12:49:36AM -0500, David Miller wrote:
> From: Cong Wang
> Date: Sun, 27 Nov 2016 21:39:33 -0800
>
> > On Sun, Nov 27, 2016 at 7:58 AM, Amir Vadai wrote:
> >> Should not allow setting a negative offset that goes below the skb head.
> > ...
> >> diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c
> >> index b54d56d4959b..e79e8a88f2d2 100644
> >> --- a/net/sched/act_pedit.c
> >> +++ b/net/sched/act_pedit.c
> >> @@ -154,8 +154,11 @@ static int tcf_pedit(struct sk_buff *skb, const struct tc_action *a,
> >>  			}
> >>
> >>  			ptr = skb_header_pointer(skb, off + offset, 4, &_data);
> >> -			if (!ptr)
> >> +			if ((unsigned char *)ptr < skb->head) {
> >
> >
> > ptr returned could be &_data, which is on stack, so why this comparison
> > makes sense for this case?
>
> Indeed, this will definitely do the wrong thing when the on-stack area
> passed back to ptr.

yes - my bad.
will correct it and send v1
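For readers following the objection above: the returned pointer may refer to
the caller's on-stack scratch buffer, so comparing it with skb->head says
nothing about the offset. One alternative, sketched below under stated
assumptions, is to validate the offset arithmetically before any pointer is
formed. The struct fake_skb and both helpers are hypothetical stand-ins made
up for illustration; they are not kernel APIs and not necessarily what a later
revision of the patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the few sk_buff fields that matter here. */
struct fake_skb {
	unsigned char *head;	/* start of the allocated buffer */
	unsigned char *data;	/* start of packet data, data >= head */
	size_t len;		/* bytes of packet data starting at data */
};

/* Bytes available in front of data (the role skb_headroom() plays). */
static size_t fake_headroom(const struct fake_skb *skb)
{
	assert(skb->data >= skb->head);
	return (size_t)(skb->data - skb->head);
}

/*
 * True if a 'size'-byte access at data + offset stays inside
 * [head, data + len). Only integers are compared, never pointers
 * into possibly unrelated objects.
 */
static bool fake_offset_valid(const struct fake_skb *skb, long offset,
			      size_t size)
{
	long long start = (long long)fake_headroom(skb) + offset;
	long long limit = (long long)fake_headroom(skb) + (long long)skb->len;

	return start >= 0 && start + (long long)size <= limit;
}
```

With 16 bytes of headroom and 32 bytes of data, a 4-byte access at offset -16
is accepted and one at offset -17 is rejected, whatever buffer a later copy
helper would hand back.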
Re: [PATCH net-next v3 0/4] Documentation: net: phy: Improve documentation
On Sun, 2016-11-27 at 18:45 -0800, Florian Fainelli wrote:
> Hi all,
>
> This patch series addresses discussions and feedback that was
> recently received
> on the mailing-list in the area of: flow control/pause frames,
> interpretation of
> phy_interface_t and finally add some links to useful standards
> documents.
>
> Changes in v3:
>
> - add Timur's feedback into patch 3
>
> Changes in v2:
>
> - clarify a few things in the RGMII section, add a paragraph about
> common issues
>   with RGMII delay mismatches
>

Thanks a lot Florian.
This is really helping, especially the part about RGMII delays.

Reviewed-by: Jerome Brunet

> Florian Fainelli (4):
>   Documentation: net: phy: remove description of function pointers
>   Documentation: net: phy: Add a paragraph about pause frames/flow
> control
>   Documentation: net: phy: Add blurb about RGMII
>   Documentation: net: phy: Add links to several standards documents
>
>  Documentation/networking/phy.txt | 140 +--
>  1 file changed, 105 insertions(+), 35 deletions(-)
>
[PATCH] net: arc_emac: add dependencies on associated arches and compile test
Add dependencies on the architectures that support these devices and add
compile test to ensure ongoing code build coverage.

Signed-off-by: Peter Robinson
---
 drivers/net/ethernet/arc/Kconfig | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/arc/Kconfig b/drivers/net/ethernet/arc/Kconfig
index 6890451..e743ddf 100644
--- a/drivers/net/ethernet/arc/Kconfig
+++ b/drivers/net/ethernet/arc/Kconfig
@@ -17,13 +17,14 @@ if NET_VENDOR_ARC
 
 config ARC_EMAC_CORE
 	tristate
+	depends on ARC || ARCH_ROCKCHIP || COMPILE_TEST
 	select MII
 	select PHYLIB
 
 config ARC_EMAC
 	tristate "ARC EMAC support"
 	select ARC_EMAC_CORE
-	depends on OF_IRQ && OF_NET && HAS_DMA
+	depends on OF_IRQ && OF_NET && HAS_DMA && (ARC || COMPILE_TEST)
 	---help---
 	  On some legacy ARC (Synopsys) FPGA boards such as ARCAngel4/ML50x
 	  non-standard on-chip ethernet device ARC EMAC 10/100 is used.
@@ -32,7 +33,7 @@ config ARC_EMAC
 config EMAC_ROCKCHIP
 	tristate "Rockchip EMAC support"
 	select ARC_EMAC_CORE
-	depends on OF_IRQ && OF_NET && REGULATOR && HAS_DMA
+	depends on OF_IRQ && OF_NET && REGULATOR && HAS_DMA && (ARCH_ROCKCHIP || COMPILE_TEST)
 	---help---
 	  Support for Rockchip RK3036/RK3066/RK3188 EMAC ethernet controllers.
 	  This selects Rockchip SoC glue layer support for the
-- 
2.9.3
[PATCH net-next v2 6/6] tcp: SOF_TIMESTAMPING_OPT_STATS option for SO_TIMESTAMPING
From: Francis Yan

This patch exports the sender chronograph stats via the socket
SO_TIMESTAMPING channel. Currently we can instrument how long a
particular application unit of data was queued in TCP by tracking
SOF_TIMESTAMPING_TX_SOFTWARE and SOF_TIMESTAMPING_TX_SCHED. Having
these sender chronograph stats exported simultaneously along with
these timestamps allows further breaking down of the various sender
limitations. For example, a video server can tell if a particular
chunk of video on a connection takes a long time to deliver because
TCP was experiencing a small receive window. It is not possible to
tell this before this patch without packet traces.

To prepare these stats, the user needs to set the
SOF_TIMESTAMPING_OPT_STATS and SOF_TIMESTAMPING_OPT_TSONLY flags
while requesting other SOF_TIMESTAMPING TX timestamps. When the
timestamps are available in the error queue, the stats are returned
in a separate control message of type SCM_TIMESTAMPING_OPT_STATS,
in a list of TLVs (struct nlattr) of types: TCP_NLA_BUSY_TIME,
TCP_NLA_RWND_LIMITED, TCP_NLA_SNDBUF_LIMITED. Unit is microsecond.

Signed-off-by: Francis Yan
Signed-off-by: Yuchung Cheng
Signed-off-by: Soheil Hassas Yeganeh
---
ChangeLog since v1:
 - fix build break if CONFIG_INET is not defined

 Documentation/networking/timestamping.txt | 10 ++
 arch/alpha/include/uapi/asm/socket.h      |  2 ++
 arch/frv/include/uapi/asm/socket.h        |  2 ++
 arch/ia64/include/uapi/asm/socket.h       |  2 ++
 arch/m32r/include/uapi/asm/socket.h       |  2 ++
 arch/mips/include/uapi/asm/socket.h       |  2 ++
 arch/mn10300/include/uapi/asm/socket.h    |  2 ++
 arch/parisc/include/uapi/asm/socket.h     |  2 ++
 arch/powerpc/include/uapi/asm/socket.h    |  2 ++
 arch/s390/include/uapi/asm/socket.h       |  2 ++
 arch/sparc/include/uapi/asm/socket.h      |  2 ++
 arch/xtensa/include/uapi/asm/socket.h     |  2 ++
 include/linux/tcp.h                       |  2 ++
 include/uapi/asm-generic/socket.h         |  2 ++
 include/uapi/linux/net_tstamp.h           |  3 ++-
 include/uapi/linux/tcp.h                  |  8
 net/core/skbuff.c                         | 14 +++---
 net/core/sock.c                           |  7 +++
 net/ipv4/tcp.c                            | 20
 net/socket.c                              |  7 ++-
 20 files changed, 90 insertions(+), 5 deletions(-)

diff --git a/Documentation/networking/timestamping.txt b/Documentation/networking/timestamping.txt
index 671cccf..96f5069 100644
--- a/Documentation/networking/timestamping.txt
+++ b/Documentation/networking/timestamping.txt
@@ -182,6 +182,16 @@ SOF_TIMESTAMPING_OPT_TSONLY:
   the timestamp even if sysctl net.core.tstamp_allow_data is 0.
   This option disables SOF_TIMESTAMPING_OPT_CMSG.
 
+SOF_TIMESTAMPING_OPT_STATS:
+
+  Optional stats that are obtained along with the transmit timestamps.
+  It must be used together with SOF_TIMESTAMPING_OPT_TSONLY. When the
+  transmit timestamp is available, the stats are available in a
+  separate control message of type SCM_TIMESTAMPING_OPT_STATS, as a
+  list of TLVs (struct nlattr) of types. These stats allow the
+  application to associate various transport layer stats with
+  the transmit timestamps, such as how long a certain block of
+  data was limited by peer's receiver window.
 
 New applications are encouraged to pass SOF_TIMESTAMPING_OPT_ID to
 disambiguate timestamps and SOF_TIMESTAMPING_OPT_TSONLY to operate
diff --git a/arch/alpha/include/uapi/asm/socket.h b/arch/alpha/include/uapi/asm/socket.h
index 9e46d6e..afc901b 100644
--- a/arch/alpha/include/uapi/asm/socket.h
+++ b/arch/alpha/include/uapi/asm/socket.h
@@ -97,4 +97,6 @@
 
 #define SO_CNX_ADVICE		53
 
+#define SCM_TIMESTAMPING_OPT_STATS	54
+
 #endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/frv/include/uapi/asm/socket.h b/arch/frv/include/uapi/asm/socket.h
index afbc98f0..81e0353 100644
--- a/arch/frv/include/uapi/asm/socket.h
+++ b/arch/frv/include/uapi/asm/socket.h
@@ -90,5 +90,7 @@
 
 #define SO_CNX_ADVICE		53
 
+#define SCM_TIMESTAMPING_OPT_STATS	54
+
 #endif /* _ASM_SOCKET_H */
 
diff --git a/arch/ia64/include/uapi/asm/socket.h b/arch/ia64/include/uapi/asm/socket.h
index 0018fad..57feb0c 100644
--- a/arch/ia64/include/uapi/asm/socket.h
+++ b/arch/ia64/include/uapi/asm/socket.h
@@ -99,4 +99,6 @@
 
 #define SO_CNX_ADVICE		53
 
+#define SCM_TIMESTAMPING_OPT_STATS	54
+
 #endif /* _ASM_IA64_SOCKET_H */
diff --git a/arch/m32r/include/uapi/asm/socket.h b/arch/m32r/include/uapi/asm/socket.h
index 5fe42fc..5853f8e9 100644
--- a/arch/m32r/include/uapi/asm/socket.h
+++ b/arch/m32r/include/uapi/asm/socket.h
@@ -90,4 +90,6 @@
 
 #define SO_CNX_ADVICE		53
 
+#define SCM_TIMESTAMPING_OPT_STATS	54
+
 #endif /* _ASM_M32R_SOCKET_H */
diff --git a/arch/mips/include/uapi/asm/socket.h b/arch/mips/include/uapi/asm/socket.h
index
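The commit message says the stats arrive as a list of TLVs (struct nlattr).
As a rough illustration of what walking such a list involves, here is a
minimal userspace sketch. It assumes 4-byte attribute alignment and 64-bit
payloads; struct demo_nlattr, the DEMO_* macros and the numeric attribute
types used with it are local, hypothetical stand-ins rather than the kernel's
definitions, so check the real uapi headers before relying on any value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of a netlink-style attribute header; nla_len covers header + payload. */
struct demo_nlattr {
	uint16_t nla_len;
	uint16_t nla_type;
};

#define DEMO_NLA_ALIGN(len)	(((size_t)(len) + 3) & ~(size_t)3)
#define DEMO_NLA_HDRLEN		DEMO_NLA_ALIGN(sizeof(struct demo_nlattr))

/* Append one u64 attribute at buf + off; returns the offset of the next attribute. */
static size_t demo_put_u64(unsigned char *buf, size_t off, uint16_t type,
			   uint64_t val)
{
	struct demo_nlattr hdr;

	assert(buf != NULL);
	hdr.nla_len = (uint16_t)(DEMO_NLA_HDRLEN + sizeof(val));
	hdr.nla_type = type;
	memcpy(buf + off, &hdr, sizeof(hdr));
	memcpy(buf + off + DEMO_NLA_HDRLEN, &val, sizeof(val));
	return off + DEMO_NLA_ALIGN(hdr.nla_len);
}

/*
 * Walk the list and copy out the first u64 attribute of the wanted type.
 * Returns 1 if found, 0 if absent or if the list is malformed.
 * Attributes of other types are skipped, not treated as errors.
 */
static int demo_find_u64(const unsigned char *buf, size_t len,
			 uint16_t wanted, uint64_t *out)
{
	size_t off = 0;

	assert(buf != NULL && out != NULL);
	while (off <= len && len - off >= DEMO_NLA_HDRLEN) {
		struct demo_nlattr hdr;

		memcpy(&hdr, buf + off, sizeof(hdr));
		if (hdr.nla_len < DEMO_NLA_HDRLEN || hdr.nla_len > len - off)
			return 0;	/* truncated or corrupt attribute */
		if (hdr.nla_type == wanted &&
		    hdr.nla_len - DEMO_NLA_HDRLEN == sizeof(*out)) {
			memcpy(out, buf + off + DEMO_NLA_HDRLEN, sizeof(*out));
			return 1;
		}
		off += DEMO_NLA_ALIGN(hdr.nla_len);
	}
	return 0;
}
```

In a real receiver the buffer would be the payload of the control message read
from the error queue; here demo_put_u64() only exists to build test input.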
[PATCH net-next v2 4/6] tcp: instrument how long TCP is limited by insufficient send buffer
From: Francis Yan

This patch measures the amount of time when TCP runs out of new data
to send to the network due to insufficient send buffer, while TCP
is still busy delivering (i.e. write queue is not empty). The goal
is to indicate either the send buffer autotuning or user SO_SNDBUF
setting has resulted in network under-utilization.

The measurement starts conservatively by checking various conditions
to minimize false claims (i.e. under-estimation is more likely).
The measurement stops when the SOCK_NOSPACE flag is cleared. But it
does not account for the time elapsed till the next application write.
Also the measurement only starts if the sender is still busy sending
data, s.t. the limit accounted is part of the total busy time.

Signed-off-by: Francis Yan
Signed-off-by: Yuchung Cheng
Signed-off-by: Soheil Hassas Yeganeh
---
 net/ipv4/tcp.c        | 10 --
 net/ipv4/tcp_input.c  |  5 -
 net/ipv4/tcp_output.c | 12
 3 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 913f9bb..259ffb5 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -996,8 +996,11 @@ static ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
 	goto out;
 
 out_err:
 	/* make sure we wake any epoll edge trigger waiter */
-	if (unlikely(skb_queue_len(>sk_write_queue) == 0 && err == -EAGAIN))
+	if (unlikely(skb_queue_len(>sk_write_queue) == 0 &&
+		     err == -EAGAIN)) {
 		sk->sk_write_space(sk);
+		tcp_chrono_stop(sk, TCP_CHRONO_SNDBUF_LIMITED);
+	}
 	return sk_stream_error(sk, flags, err);
 }
@@ -1331,8 +1334,11 @@ int tcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
 out_err:
 	err = sk_stream_error(sk, flags, err);
 	/* make sure we wake any epoll edge trigger waiter */
-	if (unlikely(skb_queue_len(>sk_write_queue) == 0 && err == -EAGAIN))
+	if (unlikely(skb_queue_len(>sk_write_queue) == 0 &&
+		     err == -EAGAIN)) {
 		sk->sk_write_space(sk);
+		tcp_chrono_stop(sk, TCP_CHRONO_SNDBUF_LIMITED);
+	}
 	release_sock(sk);
 	return err;
 }
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index a5d1727..56fe736 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -5059,8 +5059,11 @@ static void tcp_check_space(struct sock *sk)
 		/* pairs with tcp_poll() */
 		smp_mb__after_atomic();
 		if (sk->sk_socket &&
-		    test_bit(SOCK_NOSPACE, >sk_socket->flags))
+		    test_bit(SOCK_NOSPACE, >sk_socket->flags)) {
 			tcp_new_space(sk);
+			if (!test_bit(SOCK_NOSPACE, >sk_socket->flags))
+				tcp_chrono_stop(sk, TCP_CHRONO_SNDBUF_LIMITED);
+		}
 	}
 }
 
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index b7c..d3545d0 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1514,6 +1514,18 @@ static void tcp_cwnd_validate(struct sock *sk, bool is_cwnd_limited)
 		if (sysctl_tcp_slow_start_after_idle &&
 		    (s32)(tcp_time_stamp - tp->snd_cwnd_stamp) >= inet_csk(sk)->icsk_rto)
 			tcp_cwnd_application_limited(sk);
+
+		/* The following conditions together indicate the starvation
+		 * is caused by insufficient sender buffer:
+		 * 1) just sent some data (see tcp_write_xmit)
+		 * 2) not cwnd limited (this else condition)
+		 * 3) no more data to send (null tcp_send_head )
+		 * 4) application is hitting buffer limit (SOCK_NOSPACE)
+		 */
+		if (!tcp_send_head(sk) && sk->sk_socket &&
+		    test_bit(SOCK_NOSPACE, >sk_socket->flags) &&
+		    (1 << sk->sk_state) & (TCPF_ESTABLISHED | TCPF_CLOSE_WAIT))
+			tcp_chrono_start(sk, TCP_CHRONO_SNDBUF_LIMITED);
 	}
 }
 
-- 
2.8.0.rc3.226.g39d4020
[PATCH net-next v2 5/6] tcp: export sender limits chronographs to TCP_INFO
From: Francis Yan

This patch exports all the sender chronograph measurements collected
in the previous patches to TCP_INFO interface. Note that busy time
exported includes all the other sending limits (rwnd-limited,
sndbuf-limited). Internally the time unit is jiffy but externally
the measurements are in microseconds for future extensions.

Signed-off-by: Francis Yan
Signed-off-by: Yuchung Cheng
Signed-off-by: Soheil Hassas Yeganeh
---
 include/uapi/linux/tcp.h |  4
 net/ipv4/tcp.c           | 20
 2 files changed, 24 insertions(+)

diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
index 73ac0db..2863b66 100644
--- a/include/uapi/linux/tcp.h
+++ b/include/uapi/linux/tcp.h
@@ -214,6 +214,10 @@ struct tcp_info {
 	__u32	tcpi_data_segs_out;	/* RFC4898 tcpEStatsDataSegsOut */
 
 	__u64	tcpi_delivery_rate;
+
+	__u64	tcpi_busy_time;      /* Time (usec) busy sending data */
+	__u64	tcpi_rwnd_limited;   /* Time (usec) limited by receive window */
+	__u64	tcpi_sndbuf_limited; /* Time (usec) limited by send buffer */
 };
 
 /* for TCP_MD5SIG socket option */
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 259ffb5..cdde20f 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2708,6 +2708,25 @@ int compat_tcp_setsockopt(struct sock *sk, int level, int optname,
 EXPORT_SYMBOL(compat_tcp_setsockopt);
 #endif
 
+static void tcp_get_info_chrono_stats(const struct tcp_sock *tp,
+				      struct tcp_info *info)
+{
+	u64 stats[__TCP_CHRONO_MAX], total = 0;
+	enum tcp_chrono i;
+
+	for (i = TCP_CHRONO_BUSY; i < __TCP_CHRONO_MAX; ++i) {
+		stats[i] = tp->chrono_stat[i - 1];
+		if (i == tp->chrono_type)
+			stats[i] += tcp_time_stamp - tp->chrono_start;
+		stats[i] *= USEC_PER_SEC / HZ;
+		total += stats[i];
+	}
+
+	info->tcpi_busy_time = total;
+	info->tcpi_rwnd_limited = stats[TCP_CHRONO_RWND_LIMITED];
+	info->tcpi_sndbuf_limited = stats[TCP_CHRONO_SNDBUF_LIMITED];
+}
+
 /* Return information about state of tcp endpoint in API format. */
 void tcp_get_info(struct sock *sk, struct tcp_info *info)
 {
@@ -2800,6 +2819,7 @@ void tcp_get_info(struct sock *sk, struct tcp_info *info)
 	info->tcpi_bytes_acked = tp->bytes_acked;
 	info->tcpi_bytes_received = tp->bytes_received;
 	info->tcpi_notsent_bytes = max_t(int, 0, tp->write_seq - tp->snd_nxt);
+	tcp_get_info_chrono_stats(tp, info);
 
 	unlock_sock_fast(sk, slow);
 
-- 
2.8.0.rc3.226.g39d4020
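The export step in this patch can be tried outside the kernel with a small
model. The sketch below mirrors the loop in tcp_get_info_chrono_stats() using
hypothetical demo_* types, with the clock and the tick rate passed in as
parameters instead of tcp_time_stamp and HZ. It is an illustration of the
arithmetic only, and it assumes the tick rate divides one million evenly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of enum tcp_chrono. */
enum demo_chrono {
	DEMO_UNSPEC,
	DEMO_BUSY,
	DEMO_RWND_LIMITED,
	DEMO_SNDBUF_LIMITED,
	DEMO_CHRONO_MAX
};

/* Hypothetical mirror of the three chrono fields added to tcp_sock. */
struct demo_tp {
	uint32_t chrono_start;		/* tick at which the current chrono began */
	uint32_t chrono_stat[3];	/* accumulated ticks per chrono type */
	enum demo_chrono chrono_type;	/* chrono running right now */
};

struct demo_info {
	uint64_t busy_time;		/* usec, sum of all three chronos */
	uint64_t rwnd_limited;		/* usec */
	uint64_t sndbuf_limited;	/* usec */
};

/* Convert tick counts to microseconds and fold in the still-running chrono. */
static void demo_export(const struct demo_tp *tp, uint32_t now, uint32_t hz,
			struct demo_info *info)
{
	uint64_t stats[DEMO_CHRONO_MAX] = { 0 }, total = 0;
	int i;

	assert(hz > 0 && 1000000u % hz == 0);
	for (i = DEMO_BUSY; i < DEMO_CHRONO_MAX; i++) {
		stats[i] = tp->chrono_stat[i - 1];
		if ((enum demo_chrono)i == tp->chrono_type)
			stats[i] += (uint32_t)(now - tp->chrono_start);
		stats[i] *= 1000000u / hz;
		total += stats[i];
	}

	info->busy_time = total;
	info->rwnd_limited = stats[DEMO_RWND_LIMITED];
	info->sndbuf_limited = stats[DEMO_SNDBUF_LIMITED];
}
```

The point worth noticing is that the busy figure is the sum over all three
counters, which is why the commit message says the exported busy time includes
the rwnd-limited and sndbuf-limited periods.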
[PATCH net-next v2 3/6] tcp: instrument how long TCP is limited by receive window
From: Francis Yan

This patch measures the total time when the TCP stops sending because
the receiver's advertised window is not large enough. Note that
once the limit is lifted we are likely in the busy status if we
have data pending.

Signed-off-by: Francis Yan
Signed-off-by: Yuchung Cheng
Signed-off-by: Soheil Hassas Yeganeh
---
 net/ipv4/tcp_output.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index e8ea584..b7c 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2144,7 +2144,7 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
 	unsigned int tso_segs, sent_pkts;
 	int cwnd_quota;
 	int result;
-	bool is_cwnd_limited = false;
+	bool is_cwnd_limited = false, is_rwnd_limited = false;
 	u32 max_segs;
 
 	sent_pkts = 0;
@@ -2181,8 +2181,10 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
 				break;
 		}
 
-		if (unlikely(!tcp_snd_wnd_test(tp, skb, mss_now)))
+		if (unlikely(!tcp_snd_wnd_test(tp, skb, mss_now))) {
+			is_rwnd_limited = true;
 			break;
+		}
 
 		if (tso_segs == 1) {
 			if (unlikely(!tcp_nagle_test(tp, skb, mss_now,
@@ -2227,6 +2229,11 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
 			break;
 	}
 
+	if (is_rwnd_limited)
+		tcp_chrono_start(sk, TCP_CHRONO_RWND_LIMITED);
+	else
+		tcp_chrono_stop(sk, TCP_CHRONO_RWND_LIMITED);
+
 	if (likely(sent_pkts)) {
 		if (tcp_in_cwnd_reduction(sk))
 			tp->prr_out += sent_pkts;
-- 
2.8.0.rc3.226.g39d4020
[PATCH net-next v2 1/6] tcp: instrument tcp sender limits chronographs
From: Francis Yan

This patch implements the skeleton of the TCP chronograph
instrumentation on sender side limits:

	1) idle (unspec)
	2) busy sending data other than 3-4 below
	3) rwnd-limited
	4) sndbuf-limited

The limits are enumerated 'tcp_chrono'. Since a connection in
theory can idle forever, we do not track the actual length of this
uninteresting idle period. For the rest we track how long the sender
spends in each limit. At any point during the life time of a
connection, the sender must be in one of the four states.

If there are multiple conditions worthy of tracking in a chronograph
then the highest priority enum takes precedence over
the other conditions. So that if something "more interesting"
starts happening, stop the previous chrono and start a new one.

The time unit is jiffy(u32) in order to save space in tcp_sock.
This implies the application must sample the stats at least once
every 49 days with a 1ms jiffy.

Signed-off-by: Francis Yan
Signed-off-by: Yuchung Cheng
Signed-off-by: Soheil Hassas Yeganeh
---
 include/linux/tcp.h   |  7 +--
 include/net/tcp.h     | 14 ++
 net/ipv4/tcp_output.c | 30 ++
 3 files changed, 49 insertions(+), 2 deletions(-)

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 32a7c7e..d5d3bd8 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -211,8 +211,11 @@ struct tcp_sock {
 		u8 reord;    /* reordering detected */
 	} rack;
 	u16	advmss;		/* Advertised MSS */
-	u8	rate_app_limited:1,  /* rate_{delivered,interval_us} limited? */
-		unused:7;
+	u32	chrono_start;	/* Start time in jiffies of a TCP chrono */
+	u32	chrono_stat[3];	/* Time in jiffies for chrono_stat stats */
+	u8	chrono_type:2,	/* current chronograph type */
+		rate_app_limited:1,  /* rate_{delivered,interval_us} limited? */
+		unused:5;
 	u8	nonagle     : 4,/* Disable Nagle algorithm? */
 		thin_lto    : 1,/* Use linear timeouts for thin streams */
 		thin_dupack : 1,/* Fast retransmit on first dupack */
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 7de8073..e5ff408 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1516,6 +1516,20 @@ struct tcp_fastopen_context {
 	struct rcu_head rcu;
 };
 
+/* Latencies incurred by various limits for a sender. They are
+ * chronograph-like stats that are mutually exclusive.
+ */
+enum tcp_chrono {
+	TCP_CHRONO_UNSPEC,
+	TCP_CHRONO_BUSY, /* Actively sending data (non-empty write queue) */
+	TCP_CHRONO_RWND_LIMITED, /* Stalled by insufficient receive window */
+	TCP_CHRONO_SNDBUF_LIMITED, /* Stalled by insufficient send buffer */
+	__TCP_CHRONO_MAX,
+};
+
+void tcp_chrono_start(struct sock *sk, const enum tcp_chrono type);
+void tcp_chrono_stop(struct sock *sk, const enum tcp_chrono type);
+
 /* write queue abstraction */
 static inline void tcp_write_queue_purge(struct sock *sk)
 {
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 19105b4..34f7517 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2081,6 +2081,36 @@ static bool tcp_small_queue_check(struct sock *sk, const struct sk_buff *skb,
 	return false;
 }
 
+static void tcp_chrono_set(struct tcp_sock *tp, const enum tcp_chrono new)
+{
+	const u32 now = tcp_time_stamp;
+
+	if (tp->chrono_type > TCP_CHRONO_UNSPEC)
+		tp->chrono_stat[tp->chrono_type - 1] += now - tp->chrono_start;
+	tp->chrono_start = now;
+	tp->chrono_type = new;
+}
+
+void tcp_chrono_start(struct sock *sk, const enum tcp_chrono type)
+{
+	struct tcp_sock *tp = tcp_sk(sk);
+
+	/* If there are multiple conditions worthy of tracking in a
+	 * chronograph then the highest priority enum takes precedence over
+	 * the other conditions. So that if something "more interesting"
+	 * starts happening, stop the previous chrono and start a new one.
+	 */
+	if (type > tp->chrono_type)
+		tcp_chrono_set(tp, type);
+}
+
+void tcp_chrono_stop(struct sock *sk, const enum tcp_chrono type)
+{
+	struct tcp_sock *tp = tcp_sk(sk);
+
+	tcp_chrono_set(tp, TCP_CHRONO_UNSPEC);
+}
+
 /* This routine writes packets to the network.  It advances the
  * send_head.  This happens as incoming acks open up the remote
  * window for us.
-- 
2.8.0.rc3.226.g39d4020
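The 49-day figure in the commit message follows from a u32 counter that
advances 1000 times per second. A small worked calculation, using a
hypothetical helper name and treating the tick rate as a plain parameter
rather than the kernel's HZ, shows where the number comes from and how it
changes at other tick rates.

```c
#include <assert.h>
#include <stdint.h>

/* Whole days before a 32-bit tick counter wraps at the given ticks per second. */
static uint64_t demo_wrap_days(uint32_t hz)
{
	assert(hz > 0);
	/* 2^32 ticks / hz = seconds until wrap; 86400 seconds per day. */
	return ((uint64_t)UINT32_MAX + 1) / hz / 86400u;
}
```

At 1000 ticks per second this gives 49 whole days, at 250 it gives 198, and
at 100 it gives 497, so slower tick rates leave more room between samples.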
[PATCH net-next v2 2/6] tcp: instrument how long TCP is busy sending
From: Francis Yan

This patch measures TCP busy time, which is defined as the period
of time when sender has data (or FIN) to send. The time starts when
data is buffered and stops when the write queue is flushed by ACKs
or error events.

Note the busy time does not include SYN time, unless data is
included in SYN (i.e. Fast Open). It does include FIN time even
if the FIN carries no payload. Excluding pure FIN is possible but
would incur one additional test in the fast path, which may not
be worth it.

Signed-off-by: Francis Yan
Signed-off-by: Yuchung Cheng
Signed-off-by: Soheil Hassas Yeganeh
---
 include/net/tcp.h     |  6 +-
 net/ipv4/tcp_input.c  |  3 +++
 net/ipv4/tcp_output.c | 19 ---
 3 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index e5ff408..3e097e3 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1535,6 +1535,7 @@ static inline void tcp_write_queue_purge(struct sock *sk)
 {
 	struct sk_buff *skb;
 
+	tcp_chrono_stop(sk, TCP_CHRONO_BUSY);
 	while ((skb = __skb_dequeue(>sk_write_queue)) != NULL)
 		sk_wmem_free_skb(sk, skb);
 	sk_mem_reclaim(sk);
@@ -1593,8 +1594,10 @@ static inline void tcp_advance_send_head(struct sock *sk, const struct sk_buff *
 
 static inline void tcp_check_send_head(struct sock *sk, struct sk_buff *skb_unlinked)
 {
-	if (sk->sk_send_head == skb_unlinked)
+	if (sk->sk_send_head == skb_unlinked) {
 		sk->sk_send_head = NULL;
+		tcp_chrono_stop(sk, TCP_CHRONO_BUSY);
+	}
 	if (tcp_sk(sk)->highest_sack == skb_unlinked)
 		tcp_sk(sk)->highest_sack = NULL;
 }
@@ -1616,6 +1619,7 @@ static inline void tcp_add_write_queue_tail(struct sock *sk, struct sk_buff *skb
 	/* Queue it, remembering where we must start sending. */
 	if (sk->sk_send_head == NULL) {
 		sk->sk_send_head = skb;
+		tcp_chrono_start(sk, TCP_CHRONO_BUSY);
 
 		if (tcp_sk(sk)->highest_sack == NULL)
 			tcp_sk(sk)->highest_sack = skb;
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 22e6a20..a5d1727 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3178,6 +3178,9 @@ static int tcp_clean_rtx_queue(struct sock *sk, int prior_fackets,
 			tp->lost_skb_hint = NULL;
 	}
 
+	if (!skb)
+		tcp_chrono_stop(sk, TCP_CHRONO_BUSY);
+
 	if (likely(between(tp->snd_up, prior_snd_una, tp->snd_una)))
 		tp->snd_up = tp->snd_una;
 
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 34f7517..e8ea584 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2096,8 +2096,8 @@ void tcp_chrono_start(struct sock *sk, const enum tcp_chrono type)
 	struct tcp_sock *tp = tcp_sk(sk);
 
 	/* If there are multiple conditions worthy of tracking in a
-	 * chronograph then the highest priority enum takes precedence over
-	 * the other conditions. So that if something "more interesting"
+	 * chronograph then the highest priority enum takes precedence
+	 * over the other conditions. So that if something "more interesting"
 	 * starts happening, stop the previous chrono and start a new one.
 	 */
 	if (type > tp->chrono_type)
@@ -2108,7 +2108,18 @@ void tcp_chrono_stop(struct sock *sk, const enum tcp_chrono type)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 
-	tcp_chrono_set(tp, TCP_CHRONO_UNSPEC);
+
+	/* There are multiple conditions worthy of tracking in a
+	 * chronograph, so that the highest priority enum takes
+	 * precedence over the other conditions (see tcp_chrono_start).
+	 * If a condition stops, we only stop chrono tracking if
+	 * it's the "most interesting" or current chrono we are
+	 * tracking and starts busy chrono if we have pending data.
+	 */
+	if (tcp_write_queue_empty(sk))
+		tcp_chrono_set(tp, TCP_CHRONO_UNSPEC);
+	else if (type == tp->chrono_type)
+		tcp_chrono_set(tp, TCP_CHRONO_BUSY);
 }
 
 /* This routine writes packets to the network.  It advances the
@@ -3328,6 +3339,8 @@ static int tcp_send_syn_data(struct sock *sk, struct sk_buff *syn)
 	fo->copied = space;
 
 	tcp_connect_queue_skb(sk, syn_data);
+	if (syn_data->len)
+		tcp_chrono_start(sk, TCP_CHRONO_BUSY);
 
 	err = tcp_transmit_skb(sk, syn_data, 1, sk->sk_allocation);
 
-- 
2.8.0.rc3.226.g39d4020
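To see how the start and stop rules from this patch interact, the chrono
logic can be modelled in userspace. The sketch below follows the shape of
tcp_chrono_set(), tcp_chrono_start() and tcp_chrono_stop() as posted, but the
demo_* names, the explicit clock field and the write_queue_empty flag are
hypothetical substitutes for tcp_time_stamp and tcp_write_queue_empty(). It is
a model for experimenting with event orderings, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of enum tcp_chrono; a higher value means higher priority. */
enum demo_chrono {
	DEMO_UNSPEC,
	DEMO_BUSY,
	DEMO_RWND_LIMITED,
	DEMO_SNDBUF_LIMITED,
	DEMO_CHRONO_MAX
};

struct demo_sock {
	uint32_t now;			/* simulated tick clock, set by the caller */
	uint32_t chrono_start;		/* tick at which the current chrono began */
	uint32_t chrono_stat[3];	/* accumulated ticks per chrono type */
	enum demo_chrono chrono_type;	/* chrono running right now */
	bool write_queue_empty;		/* stand-in for tcp_write_queue_empty() */
};

/* Close the running chrono, charge its elapsed ticks, then switch to 'next'. */
static void demo_chrono_set(struct demo_sock *sk, enum demo_chrono next)
{
	if (sk->chrono_type > DEMO_UNSPEC)
		sk->chrono_stat[sk->chrono_type - 1] +=
			(uint32_t)(sk->now - sk->chrono_start);
	sk->chrono_start = sk->now;
	sk->chrono_type = next;
}

/* Only a higher-priority condition may preempt the running chrono. */
static void demo_chrono_start(struct demo_sock *sk, enum demo_chrono type)
{
	assert(type > DEMO_UNSPEC && type < DEMO_CHRONO_MAX);
	if (type > sk->chrono_type)
		demo_chrono_set(sk, type);
}

/*
 * With an empty write queue everything stops. Otherwise only the chrono
 * that is actually running can be stopped, and the sender falls back to busy.
 */
static void demo_chrono_stop(struct demo_sock *sk, enum demo_chrono type)
{
	if (sk->write_queue_empty)
		demo_chrono_set(sk, DEMO_UNSPEC);
	else if (type == sk->chrono_type)
		demo_chrono_set(sk, DEMO_BUSY);
}
```

A sequence such as busy at tick 10, rwnd-limited at tick 30, rwnd-limited
lifted at tick 50 and queue drained at tick 60 charges 30 ticks to busy and 20
to rwnd-limited, matching the "fall back to busy while data is pending" rule
in the comment above.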
[PATCH net-next v2 0/6] tcp: sender chronographs instrumentation
This patch set provides instrumentation on TCP sender limitations.
While developing the BBR congestion control, we noticed that the TCP
sending process is often limited by factors unrelated to congestion
control: insufficient sender buffer and/or insufficient receive
window/buffer to saturate the network bandwidth. Unfortunately these
limits are not visible to the users and often the poor performance
is attributed to the congestion control of choice.

This patch set aims to help users get a high-level understanding of
where the sending process is limited, similar to the TCP_INFO design.
It is not meant to replace detailed kernel tracing and instrumentation
facilities.

In addition this patch set provides a new option to the timestamping
work to instrument these limits on an application data unit. For
example, one can use SO_TIMESTAMPING and this patch set to measure
how long a particular HTTP response is limited by a small receive
window.

Patch set was initially written by Francis Yan then polished
by Yuchung Cheng, with lots of help from Eric Dumazet and Soheil
Hassas Yeganeh.

Francis Yan (6):
  tcp: instrument tcp sender limits chronographs
  tcp: instrument how long TCP is busy sending
  tcp: instrument how long TCP is limited by receive window
  tcp: instrument how long TCP is limited by insufficient send buffer
  tcp: export sender limits chronographs to TCP_INFO
  tcp: SOF_TIMESTAMPING_OPT_STATS option for SO_TIMESTAMPING

 Documentation/networking/timestamping.txt | 10 +
 arch/alpha/include/uapi/asm/socket.h      |  2 +
 arch/frv/include/uapi/asm/socket.h        |  2 +
 arch/ia64/include/uapi/asm/socket.h       |  2 +
 arch/m32r/include/uapi/asm/socket.h       |  2 +
 arch/mips/include/uapi/asm/socket.h       |  2 +
 arch/mn10300/include/uapi/asm/socket.h    |  2 +
 arch/parisc/include/uapi/asm/socket.h     |  2 +
 arch/powerpc/include/uapi/asm/socket.h    |  2 +
 arch/s390/include/uapi/asm/socket.h       |  2 +
 arch/sparc/include/uapi/asm/socket.h      |  2 +
 arch/xtensa/include/uapi/asm/socket.h     |  2 +
 include/linux/tcp.h                       |  9 -
 include/net/tcp.h                         | 20 +-
 include/uapi/asm-generic/socket.h         |  2 +
 include/uapi/linux/net_tstamp.h           |  3 +-
 include/uapi/linux/tcp.h                  | 12 ++
 net/core/skbuff.c                         | 14 +--
 net/core/sock.c                           |  7
 net/ipv4/tcp.c                            | 50 ++-
 net/ipv4/tcp_input.c                      |  8 +++-
 net/ipv4/tcp_output.c                     | 66 ++-
 net/socket.c                              |  7 +++-
 23 files changed, 217 insertions(+), 13 deletions(-)

-- 
2.8.0.rc3.226.g39d4020
Re: [PATCH] net: fec: turn on device when extracting statistics
28.11.2016 04:29, David Miller wrote:
> From: Nikita Yushchenko
> Date: Fri, 25 Nov 2016 13:02:00 +0300
>
>> +	int i, ret;
>> +
>> +	ret = pm_runtime_get_sync(>pdev->dev);
>> +	if (IS_ERR_VALUE(ret)) {
>> +		memset(data, 0, sizeof(*data) * ARRAY_SIZE(fec_stats));
>> +		return;
>> +	}
>
> This really isn't the way to do this.
>
> When the device is suspended and the clocks are going to be stopped,
> you must fetch the statistic values into a software copy and provide
> those if the device is suspended when statistics are requested.

Ok, can do that, although can't see what's wrong with waking device here.

The situation of requesting stats on down device isn't something widely
used, thus keeping handling of that as local as possible looks better
for me.
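The software-copy approach suggested in the quoted review can be pictured with
a small model. The sketch below is only an assumption about how such caching
might be laid out: struct demo_dev and its helpers are hypothetical, the
hw_counters array stands in for device registers that become unreadable once
clocks stop, and nothing here is taken from the fec driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_NSTATS 4

/* Hypothetical device model: counters live in hardware while powered. */
struct demo_dev {
	bool powered;
	uint64_t hw_counters[DEMO_NSTATS];	/* stand-in for MMIO counters */
	uint64_t saved[DEMO_NSTATS];		/* software copy taken at suspend */
};

/* Snapshot the counters before the clocks go away. */
static void demo_suspend(struct demo_dev *d)
{
	assert(d != NULL);
	memcpy(d->saved, d->hw_counters, sizeof(d->saved));
	d->powered = false;
}

/* Serve live values when powered, the snapshot otherwise; never wakes the device. */
static void demo_get_stats(const struct demo_dev *d, uint64_t *data)
{
	const uint64_t *src;

	assert(d != NULL && data != NULL);
	src = d->powered ? d->hw_counters : d->saved;
	memcpy(data, src, sizeof(d->saved));
}
```

The trade-off under discussion is visible here: the statistics path stays
cheap and side-effect free, at the cost of an extra copy in the suspend path.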
[PATCH] vxlan: fix a potential issue when create a new vxlan fdb entry.
vxlan_fdb_append may return an error, so add the proper check;
otherwise it will cause a memory leak.

Signed-off-by: Haishuang Yan
---
 drivers/net/vxlan.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 21e92be..3b7b237 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -611,6 +611,7 @@ static int vxlan_fdb_create(struct vxlan_dev *vxlan,
 	struct vxlan_rdst *rd = NULL;
 	struct vxlan_fdb *f;
 	int notify = 0;
+	int rc = 0;
 
 	f = __vxlan_find_mac(vxlan, mac);
 	if (f) {
@@ -641,8 +642,7 @@ static int vxlan_fdb_create(struct vxlan_dev *vxlan,
 		if ((flags & NLM_F_APPEND) &&
 		    (is_multicast_ether_addr(f->eth_addr) ||
 		     is_zero_ether_addr(f->eth_addr))) {
-			int rc = vxlan_fdb_append(f, ip, port, vni, ifindex,
-						  );
+			rc = vxlan_fdb_append(f, ip, port, vni, ifindex, );
 
 			if (rc < 0)
 				return rc;
@@ -673,7 +673,11 @@ static int vxlan_fdb_create(struct vxlan_dev *vxlan,
 		INIT_LIST_HEAD(>remotes);
 		memcpy(f->eth_addr, mac, ETH_ALEN);
 
-		vxlan_fdb_append(f, ip, port, vni, ifindex, );
+		rc = vxlan_fdb_append(f, ip, port, vni, ifindex, );
+		if (rc < 0) {
+			kfree(f);
+			return rc;
+		}
 
 		++vxlan->addrcnt;
 		hlist_add_head_rcu(>hlist,
-- 
1.8.3.1
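The leak this patch closes is the usual "allocate, then a later step fails"
pattern. A minimal userspace sketch of the same shape follows; demo_fdb,
demo_append() and demo_fdb_create() are hypothetical names, and malloc-family
calls stand in for the kernel allocator.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a forwarding entry with a list of remotes. */
struct demo_fdb {
	int remotes;
};

/* Stand-in for the append step; fails on request to exercise the error path. */
static int demo_append(struct demo_fdb *f, int fail)
{
	if (fail)
		return -ENOMEM;
	f->remotes++;
	return 0;
}

/*
 * Create an entry and attach its first remote. If the append step fails,
 * the freshly allocated entry is released before the error is returned,
 * and *out is left untouched.
 */
static int demo_fdb_create(struct demo_fdb **out, int fail_append)
{
	struct demo_fdb *f;
	int rc;

	assert(out != NULL);
	f = calloc(1, sizeof(*f));
	if (!f)
		return -ENOMEM;

	rc = demo_append(f, fail_append);
	if (rc < 0) {
		free(f);	/* without this the entry would leak */
		return rc;
	}

	*out = f;
	return 0;
}
```

Publishing the entry only after every fallible step has succeeded keeps the
cleanup to a single free on the error path.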
Re: [PATCH net] net, sched: respect rcu grace period on cls destruction
On Sat, Nov 26, 2016 at 4:18 PM, Daniel Borkmann wrote:
> Roi reported a crash in flower where tp->root was NULL in ->classify()
> callbacks. Reason is that in ->destroy() tp->root is set to NULL via
> RCU_INIT_POINTER(). It's problematic for some of the classifiers, because
> this doesn't respect RCU grace period for them, and as a result, still
> outstanding readers from tc_classify() will try to blindly dereference
> a NULL tp->root.
>
> The tp->root object is strictly private to the classifier implementation
> and holds internal data the core such as tc_ctl_tfilter() doesn't know
> about. Within some classifiers, such as cls_bpf, cls_basic, etc, tp->root
> is only checked for NULL in ->get() callback, but nowhere else. This is
> misleading and seemed to be copied from old classifier code that was not
> cleaned up properly. For example, d3fa76ee6b4a ("[NET_SCHED]: cls_basic:
> fix NULL pointer dereference") moved tp->root initialization into ->init()
> routine, where before it was part of ->change(), so ->get() had to deal
> with tp->root being NULL back then, so that was indeed a valid case, after
> d3fa76ee6b4a, not really anymore. We used to set tp->root to NULL long
> ago in ->destroy(), see 47a1a1d4be29 ("pkt_sched: remove unnecessary xchg()
> in packet classifiers"); but the NULLifying was reintroduced with the
> RCUification, but it's not correct for every classifier implementation.
>
> In the cases that are fixed here with one exception of cls_cgroup, tp->root
> object is allocated and initialized inside ->init() callback, which is always
> performed at a point in time after we allocate a new tp, which means tp and
> thus tp->root was not globally visible in the tp chain yet (see
> tc_ctl_tfilter()).
> Also, on destruction tp->root is strictly kfree_rcu()'ed in ->destroy()
> handler, same for the tp which is kfree_rcu()'ed right when we return
> from ->destroy() in tcf_destroy(). This means, the head object's lifetime
> for such classifiers is always tied to the tp lifetime. The RCU callback
> invocation for the two kfree_rcu() could be out of order, but that's fine
> since both are independent.
>
> Dropping the RCU_INIT_POINTER(tp->root, NULL) for these classifiers here
> means that 1) we don't need a useless NULL check in fast-path and, 2) that
> outstanding readers of that tp in tc_classify() can still execute under
> respect with RCU grace period as it is actually expected.
>
> Things that haven't been touched here: cls_fw and cls_route. They each
> handle tp->root being NULL in ->classify() path for historic reasons, so
> their ->destroy() implementation can stay as is. If someone actually
> cares, they could get cleaned up at some point to avoid the test in fast
> path. cls_u32 doesn't set tp->root to NULL. For cls_rsvp, I just added a
> !head should anyone actually be using/testing it, so it at least aligns with
> cls_fw and cls_route. For cls_flower we additionally need to defer rhashtable
> destruction (to a sleepable context) after RCU grace period as concurrent
> readers might still access it. (Note that in this case we need to hold module
> reference to keep work callback address intact, since we only wait on module
> unload for all call_rcu()s to finish.)
>
> This fixes one race to bring RCU grace period guarantees back. Next step
> as worked on by Cong however is to fix 1e052be69d04 ("net_sched: destroy
> proto tp when all filters are gone") to get the order of unlinking the tp
> in tc_ctl_tfilter() for the RTM_DELTFILTER case right by moving
> RCU_INIT_POINTER() before tcf_destroy() and let the notification for
> removal be done through the prior ->delete() callback. Both are independant
> issues. Once we have that right, we can then clean tp->root up for a number
> of classifiers by not making them RCU pointers, which requires a new callback
> (->uninit) that is triggered from tp's RCU callback, where we just kfree()
> tp->root from there.

Looks good to my eyes,

Acked-by: Cong Wang

The ugly part is the work struct, I am not an RCU expert so don't know if we
have any API to execute an RCU callback in process context. Paul?

Thanks.
Re: Crash due to mutex genl_lock called from RCU context
On Sun, Nov 27, 2016 at 8:23 AM, Eric Dumazet wrote:
> On Sat, 2016-11-26 at 22:28 -0800, Cong Wang wrote:
>> On Sat, Nov 26, 2016 at 6:26 PM, Eric Dumazet wrote:
>> >
>> > Are you telling me inet_release() is called when we close() the first
>> > file descriptor ?
>> >
>> > fd1 = socket()
>> > fd2 = dup(fd1);
>> > close(fd2) -> release() ???
>>
>> Sorry, I didn't express myself clearly, I meant your change,
>> if exclude the SOCK_RCU_FREE part, basically reverts this commit:
>>
>> commit 3f660d66dfbc13ea4b61d3865851b348444c24b4
>> Author: Herbert Xu
>> Date: Thu May 3 03:17:14 2007 -0700
>>
>> [NETLINK]: Kill CB only when socket is unused
>>
>> IOW, ->release() is called when the last sock fd ref is gone, but
>> ->destructor()
>> is called with the last sock ref is gone. They are very different.
>
> Hmm...
>
>
>> I am confused, what Subash reported is a kernel warning which can
>> surely be fixed by removing genl lock (if it is correct, I need to double
>> check), so why for net-next?
>
> Because Subash pointed to a buggy commit.
>
> We want to fix all issues bring by this commit, not only the immediate
> problem about mutex.
>
> I have no idea if we can safely remove the mutex from genl_lock_done() :

I meant removing it only for the destructor case, we definitely can't
remove it for the dump case.

>
> The genl_lock() is not only protecting the socket itself, it might
> protect global data as well, or protect some kind of lock ordering among
> multiple mutexes.
>
> Have you checked all genl users, down to linux-4.0 , point where commit
> 21e4902aea80ef35a was added ?
>

I just took a deeper look, some user calls rhashtable_destroy() in
->done(), so even removing that genl lock is not enough, perhaps we should
just move it to a work struct like what Daniel does for the tcf_proto,
but that is ugly... I don't know if RCU provides any API to execute the
callback in process context.
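The fd1/fd2 question quoted above can be checked from userspace. The following is a minimal sketch, assuming an AF_UNIX socketpair() as a stand-in for the netlink or inet socket under discussion; the helper name is hypothetical. It only shows the descriptor-level behaviour (closing a dup()ed descriptor does not tear down the socket) and says nothing about when the kernel runs ->release() versus ->destructor() internally.

```c
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if the socket is still usable through
 * the first descriptor after a dup()ed descriptor has been closed,
 * 0 if it is not, and -1 if the setup itself failed.
 */
int socket_survives_close_of_dup(void)
{
	int sv[2];
	char c = 0;
	int fd2, ok;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;

	fd2 = dup(sv[0]);	/* second descriptor for the same open socket */
	if (fd2 < 0) {
		close(sv[0]);
		close(sv[1]);
		return -1;
	}
	close(fd2);		/* drops one descriptor, not the last one */

	/* Data must still flow through the remaining descriptor. */
	ok = write(sv[0], "x", 1) == 1 &&
	     read(sv[1], &c, 1) == 1 && c == 'x';

	close(sv[0]);		/* last descriptor for this end goes away here */
	close(sv[1]);
	return ok;
}
```

On a POSIX system the helper is expected to return 1, which matches the reading that close(fd2) alone does not release the socket.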
[patch net] net: dsa: fix unbalanced dsa_switch_tree reference counting
_dsa_register_switch() gets a dsa_switch_tree object either via
dsa_get_dst() or via dsa_add_dst(). Former path does not increase kref
in returned object (resulting into caller not owning a reference),
while latter path does create a new object (resulting into caller owning
a reference).

The rest of _dsa_register_switch() assumes that it owns a reference, and
calls dsa_put_dst().

This causes a memory breakage if first switch in the tree initialized
successfully, but second failed to initialize. In particular, freed
dsa_switch_tree object is left referenced by switch that was initialized,
and later access to sysfs attributes of that switch cause OOPS.

To fix, need to add kref_get() call to dsa_get_dst().

Signed-off-by: Nikita Yushchenko
Fixes: 83c0afaec7b7 ("net: dsa: Add new binding implementation")
Reviewed-by: Andrew Lunn
---
 net/dsa/dsa2.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
index f8a7d9aab437..5fff951a0a49 100644
--- a/net/dsa/dsa2.c
+++ b/net/dsa/dsa2.c
@@ -28,8 +28,10 @@ static struct dsa_switch_tree *dsa_get_dst(u32 tree)
 	struct dsa_switch_tree *dst;
 
 	list_for_each_entry(dst, &dsa_switch_trees, list)
-		if (dst->tree == tree)
+		if (dst->tree == tree) {
+			kref_get(&dst->refcount);
 			return dst;
+		}
 	return NULL;
 }
 
-- 
2.1.4
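The imbalance described in this commit message can be reproduced with a toy userspace model. This is only a sketch under stated assumptions: struct tree, tree_put() and the two lookup helpers are hypothetical stand-ins for struct dsa_switch_tree, dsa_put_dst() and dsa_get_dst(), and a plain int replaces struct kref. The "freed" flag is set instead of calling free() so the outcome can be inspected.

```c
/* Toy stand-in for struct dsa_switch_tree with an embedded refcount. */
struct tree {
	unsigned int id;
	int refcount;
	int freed;	/* set when the last reference is dropped */
};

/* Models dsa_put_dst(): drop one reference, "free" on the last one. */
static void tree_put(struct tree *t)
{
	if (--t->refcount == 0)
		t->freed = 1;
}

/* Lookup as before the patch: hands out the object without a reference. */
static struct tree *tree_get_unbalanced(struct tree *t, unsigned int id)
{
	return t->id == id ? t : 0;
}

/* Lookup as after the patch: the caller now owns a reference. */
static struct tree *tree_get_balanced(struct tree *t, unsigned int id)
{
	if (t->id != id)
		return 0;
	t->refcount++;
	return t;
}

/*
 * The first switch registered successfully and holds the only reference.
 * A second switch looks the tree up, fails to initialize, and drops
 * "its" reference on the error path. Returns 1 if that frees the tree
 * while the first switch still points at it.
 */
int second_switch_failure_frees_tree(int balanced)
{
	struct tree t = { 0, 1, 0 };	/* refcount 1: held by switch 0 */
	struct tree *dst;

	dst = balanced ? tree_get_balanced(&t, 0)
		       : tree_get_unbalanced(&t, 0);
	tree_put(dst);			/* error path of the second switch */
	return t.freed;
}
```

With the unbalanced lookup the put on the error path drops the first switch's reference, which is the dangling-pointer situation the patch describes; with the balanced lookup the count returns to 1 and the tree stays alive.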
RE: [net,v2] neigh: fix the loop index error in neigh dump
> -----Original Message-----
> From: David Ahern [mailto:d...@cumulusnetworks.com]
> Sent: Monday, November 28, 2016 1:07 PM
> To: 张胜举; netdev@vger.kernel.org
> Subject: Re: [net,v2] neigh: fix the loop index error in neigh dump
>
> On 11/27/16 9:50 PM, 张胜举 wrote:
> > No, when dump request must be processed by multiple 'recv/recvmsg'
> > system calls, idx stores which dev/neigh the previous call have
> > processed, so that next call will scan from the right place.
>
> I have tested multiple calls and I do not see redundant information or missing
> information.
>
> >
> > So no matter whether the dev/neigh is filtered, the idx should be
> > increased anyway.
>
> No, it does not. Again, idx is the index in the list of devices/ of interest. It is
> NOT a device index nor is it the absolute index in the list. It is a relative index.
> The filter is the same across recvmsg calls so the idx count is absolutely fine.
>
> Produce a test case that fails.

David, I know your point. And I agree with you that this will not produce
redundant or missing link information.

But this will cause the filtered-out devices to be scanned multiple times.
For example, assume that a netlink message can only store two devices'
info, and eth2-eth5 are filtered out.
For the first loop, idx will point to eth2, but the code has already
scanned up to eth6.

eth0->eth1->eth2(out)->eth3(out)->eth4(out)->eth5(out)->eth6->eth7
            ^

The next loop, the code will start to scan from eth2 to eth8, but
eth2-eth5 were already scanned by the previous loop. After this loop, idx
will point to eth4.

eth0->eth1->eth2(out)->eth3(out)->eth4(out)->eth5(out)->eth6->eth7->eth8
                                  ^

So this will cause the same device to be scanned multiple times.

Almost all other dump functions treat idx as the absolute index in the
list, and will not have the above problem.

We don't treat this as a bugfix, but I think we'd better be in line with
the other dump functions.
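The two index conventions argued over in this thread can be compared with a small userspace model. This is a sketch, not the kernel's neigh dump code: the function names, the dump_stats struct and the "cap entries per message" limit are all assumptions made for illustration. Both variants walk the list from its head on every call, as a resumable dump loop typically does, and differ only in whether idx counts every entry or only the entries that pass the filter.

```c
#include <stddef.h>

struct dump_stats {
	int emitted;		/* entries written into this message */
	int filter_checks;	/* how often the filter was evaluated */
	int next_idx;		/* resume position stored for the next call */
};

/* Relative index: idx only counts entries that pass the filter. */
struct dump_stats dump_relative(const int *filtered, size_t n,
				int s_idx, int cap)
{
	struct dump_stats st = { 0, 0, 0 };
	int idx = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		st.filter_checks++;
		if (filtered[i])
			continue;	/* filtered out: idx is not advanced */
		if (idx >= s_idx) {
			if (st.emitted == cap)
				break;	/* message full, resume at idx */
			st.emitted++;
		}
		idx++;
	}
	st.next_idx = idx;
	return st;
}

/* Absolute index: idx counts every entry, filtered or not. */
struct dump_stats dump_absolute(const int *filtered, size_t n,
				int s_idx, int cap)
{
	struct dump_stats st = { 0, 0, 0 };
	int idx = 0;
	size_t i;

	for (i = 0; i < n; i++, idx++) {
		if (idx < s_idx)
			continue;	/* already handled by an earlier call */
		st.filter_checks++;
		if (filtered[i])
			continue;
		if (st.emitted == cap)
			break;		/* message full, resume at idx */
		st.emitted++;
	}
	st.next_idx = idx;
	return st;
}
```

For the eth0..eth8 example with eth2-eth5 filtered and two entries per message, both variants emit the same entries on each call, which matches the observation that no information is duplicated or lost. In this model the second call evaluates the filter nine times with the relative index and three times with the absolute one, which is the repeated scanning the reply describes.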
Re: [PATCH] geneve: fix ip_hdr_len reserved for geneve6 tunnel.
On Sun, Nov 27, 2016 at 9:26 PM, Haishuang Yan wrote:
> It should reserve sizeof(ipv6hdr) for geneve in ipv6 tunnel.
>
> Fixes: c3ef5aa5e5 ('geneve: Merge ipv4 and ipv6 geneve_build_skb()')
>
> Signed-off-by: Haishuang Yan

Thanks for the fix.

Acked-by: Pravin B Shelar
Re: [PATCH net] net/sched: act_pedit: limit negative offset
From: Cong Wang
Date: Sun, 27 Nov 2016 21:39:33 -0800

> On Sun, Nov 27, 2016 at 7:58 AM, Amir Vadai wrote:
>> Should not allow setting a negative offset that goes below the skb head.
> ...
>> diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c
>> index b54d56d4959b..e79e8a88f2d2 100644
>> --- a/net/sched/act_pedit.c
>> +++ b/net/sched/act_pedit.c
>> @@ -154,8 +154,11 @@ static int tcf_pedit(struct sk_buff *skb, const struct tc_action *a,
>> 			}
>>
>> 			ptr = skb_header_pointer(skb, off + offset, 4, &_data);
>> -			if (!ptr)
>> +			if ((unsigned char *)ptr < skb->head) {
>
>
> ptr returned could be &_data, which is on stack, so why this comparison
> makes sense for this case?

Indeed, this will definitely do the wrong thing when the on-stack area
passed back to ptr.
Re: [PATCH net] net/sched: act_pedit: limit negative offset
On Sun, Nov 27, 2016 at 7:58 AM, Amir Vadai wrote:
> Should not allow setting a negative offset that goes below the skb head.
...
> diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c
> index b54d56d4959b..e79e8a88f2d2 100644
> --- a/net/sched/act_pedit.c
> +++ b/net/sched/act_pedit.c
> @@ -154,8 +154,11 @@ static int tcf_pedit(struct sk_buff *skb, const struct tc_action *a,
> 			}
>
> 			ptr = skb_header_pointer(skb, off + offset, 4, &_data);
> -			if (!ptr)
> +			if ((unsigned char *)ptr < skb->head) {

ptr returned could be &_data, which is on stack, so why this comparison
makes sense for this case?

> +				pr_info("tc filter pedit offset out of bounds\n");
> 				goto bad;
> +			}
> +
> 			/* just do it, baby */
> 			*ptr = ((*ptr & tkey->mask) ^ tkey->val);
> 			if (ptr == &_data)
> --
> 2.10.2
>
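The objection above rests on the fact that the helper may hand back the caller's own buffer instead of a pointer into the packet. A minimal userspace sketch of that behaviour follows. It is a simplified model, not the kernel's skb_header_pointer(): struct toy_skb and toy_header_pointer() are hypothetical, and unlike the kernel helper this model simply rejects negative offsets.

```c
#include <string.h>

/* Toy packet: 'head' is the linear buffer, 'frag' stands in for paged data. */
struct toy_skb {
	unsigned char head[8];
	unsigned char frag[8];
};

/*
 * Simplified model: return a pointer into the linear area when the whole
 * range lies inside it; otherwise copy the bytes into the caller's buffer
 * and return that buffer. Returns a null pointer if the range is invalid.
 */
static void *toy_header_pointer(struct toy_skb *skb, int offset, int len,
				void *buffer)
{
	int hlen = (int)sizeof(skb->head);
	int total = hlen + (int)sizeof(skb->frag);
	unsigned char *out = buffer;
	int i;

	if (offset < 0 || len < 0 || offset + len > total)
		return 0;
	if (offset + len <= hlen)
		return skb->head + offset;

	for (i = 0; i < len; i++) {
		int pos = offset + i;

		out[i] = pos < hlen ? skb->head[pos] : skb->frag[pos - hlen];
	}
	return buffer;
}

/* Returns 1 when the result is the caller's on-stack copy, 0 otherwise. */
int result_is_stack_copy(int offset)
{
	struct toy_skb skb;
	unsigned int _data;
	void *ptr;

	memset(&skb, 0, sizeof(skb));
	ptr = toy_header_pointer(&skb, offset, 4, &_data);
	return ptr == (void *)&_data;
}
```

In this model a 4-byte read at offset 0 returns a pointer into the linear buffer, while a read at offset 6 straddles the boundary and returns the on-stack copy. In the second case the pointer has no meaningful ordering relative to the packet's head, so a check of the form "ptr < skb->head" says nothing about whether the offset was in bounds.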
[PATCH] geneve: fix ip_hdr_len reserved for geneve6 tunnel.
It should reserve sizeof(ipv6hdr) for geneve in ipv6 tunnel.

Fixes: c3ef5aa5e5 ('geneve: Merge ipv4 and ipv6 geneve_build_skb()')

Signed-off-by: Haishuang Yan
---
 drivers/net/geneve.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index 7b80e28..45301cb 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -852,7 +852,7 @@ static int geneve6_xmit_skb(struct sk_buff *skb, struct net_device *dev,
 					   ip_hdr(skb), skb);
 		ttl = key->ttl ? : ip6_dst_hoplimit(dst);
 	}
-	err = geneve_build_skb(dst, skb, info, xnet, sizeof(struct iphdr));
+	err = geneve_build_skb(dst, skb, info, xnet, sizeof(struct ipv6hdr));
 	if (unlikely(err))
 		return err;
 
-- 
1.8.3.1
Re: [net,v2] neigh: fix the loop index error in neigh dump
On 11/27/16 9:50 PM, 张胜举 wrote:
> No, when dump request must be processed by multiple 'recv/recvmsg' system
> calls,
> idx stores which dev/neigh the previous call have processed, so that next
> call will scan
> from the right place.

I have tested multiple calls and I do not see redundant information or
missing information.

>
> So no matter whether the dev/neigh is filtered, the idx should be increased
> anyway.

No, it does not. Again, idx is the index in the list of devices/ of
interest. It is NOT a device index nor is it the absolute index in the
list. It is a relative index. The filter is the same across recvmsg calls
so the idx count is absolutely fine.

Produce a test case that fails.
Re: [PATCH net-next 2/9] liquidio CN23XX: VF registration
From: Raghu Vatsavayi
Date: Sun, 27 Nov 2016 20:51:35 -0800

> +static int
> +liquidio_vf_probe(struct pci_dev *pdev,
> +		  const struct pci_device_id *ent __attribute__((unused)))
> +{
> +	struct octeon_device *oct_dev = NULL;
 ...
> +	/* set linux specific device pointer */
> +	oct_dev->pci_dev = (void *)pdev;
> +

This is a terrible cast on several levels.

The type is already correct, oct_dev->pci_dev and pdev are both
"struct pci_dev *"

Furthermore, even if oct_dev->pci_dev was "void *", void pointer casts
are _never_ necessary on assignment from any other pointer type.
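Both points of this review can be seen in a few lines of standard C. The sketch below uses hypothetical stand-in types (struct pci_dev_model and the two holder structs are not kernel definitions); it compiles without any cast because C converts implicitly between object pointers and void pointers on assignment.

```c
/* Hypothetical stand-in for struct pci_dev. */
struct pci_dev_model {
	int devfn;
};

/* One holder with a typed member, one with a void pointer member. */
struct holder_typed {
	struct pci_dev_model *pci_dev;
};

struct holder_void {
	void *pci_dev;
};

/* Returns 1 if all three cast-free assignments preserve the pointer. */
int assign_without_casts(void)
{
	struct pci_dev_model dev = { 42 };
	struct pci_dev_model *pdev = &dev;
	struct holder_typed a;
	struct holder_void b;
	struct pci_dev_model *back;

	a.pci_dev = pdev;	/* same type on both sides: nothing to convert */
	b.pci_dev = pdev;	/* object pointer to void *: implicit in C */
	back = b.pci_dev;	/* void * back to object pointer: also implicit */

	return a.pci_dev == pdev && back == pdev && back->devfn == 42;
}
```

Note that this holds for C; C++ requires an explicit cast for the conversion from a void pointer back to an object pointer.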
Re: [PATCH v3 net-next 1/2] net: ethernet: slicoss: add slicoss gigabit ethernet driver
On 11/26/2016 04:20 AM, Lino Sanfilippo wrote:
> Add driver for Alacritech gigabit ethernet cards with SLIC (session-layer
> interface control) technology. The driver provides basic support without
> SLIC for the following devices:
>
> - Mojave cards (single port PCI Gigabit) both copper and fiber
> - Oasis cards (single and dual port PCI-x Gigabit) copper and fiber
> - Kalahari cards (dual and quad port PCI-e Gigabit) copper and fiber

This looks great, a few nits below:

> +#define SLIC_MAX_TX_COMPLETIONS 100

You usually don't want to limit the number of TX completion, if the
entire TX ring needs to be cleaned, you would want to allow that.

[snip]

> +	while (slic_get_free_rx_descs(rxq) > SLIC_MAX_REQ_RX_DESCS) {
> +		skb = alloc_skb(maplen + ALIGN_MASK, gfp);
> +		if (!skb)
> +			break;
> +
> +		paddr = dma_map_single(&sdev->pdev->dev, skb->data, maplen,
> +				       DMA_FROM_DEVICE);
> +		if (dma_mapping_error(&sdev->pdev->dev, paddr)) {
> +			netdev_err(dev, "mapping rx packet failed\n");
> +			/* drop skb */
> +			dev_kfree_skb_any(skb);
> +			break;
> +		}
> +		/* ensure head buffer descriptors are 256 byte aligned */
> +		offset = 0;
> +		misalign = paddr & ALIGN_MASK;
> +		if (misalign) {
> +			offset = SLIC_RX_BUFF_ALIGN - misalign;
> +			skb_reserve(skb, offset);
> +		}
> +		/* the HW expects dma chunks for descriptor + frame data */
> +		desc = (struct slic_rx_desc *)skb->data;
> +		memset(desc, 0, sizeof(*desc));

Do you really need to zero-out the prepending RX descriptor? Are not you
missing a write barrier here?

[snip]

> +
> +		dma_sync_single_for_cpu(&sdev->pdev->dev,
> +					dma_unmap_addr(buff, map_addr),
> +					buff->addr_offset + sizeof(*desc),
> +					DMA_FROM_DEVICE);
> +
> +		status = le32_to_cpu(desc->status);
> +		if (!(status & SLIC_IRHDDR_SVALID))
> +			break;
> +
> +		buff->skb = NULL;
> +
> +		dma_unmap_single(&sdev->pdev->dev,
> +				 dma_unmap_addr(buff, map_addr),
> +				 dma_unmap_len(buff, map_len),
> +				 DMA_FROM_DEVICE);

This is potentially inefficient, you already did a cache invalidation
for the RX descriptor here, you could be more efficient with just
invalidating the packet length, minus the descriptor length.

> +
> +		/* skip rx descriptor that is placed before the frame data */
> +		skb_reserve(skb, SLIC_RX_BUFF_HDR_SIZE);
> +
> +		if (unlikely(status & SLIC_IRHDDR_ERR)) {
> +			slic_handle_frame_error(sdev, skb);
> +			dev_kfree_skb_any(skb);
> +		} else {
> +			struct ethhdr *eh = (struct ethhdr *)skb->data;
> +
> +			if (is_multicast_ether_addr(eh->h_dest))
> +				SLIC_INC_STATS_COUNTER(&sdev->stats, rx_mcasts);
> +
> +			len = le32_to_cpu(desc->length) & SLIC_IRHDDR_FLEN_MSK;
> +			skb_put(skb, len);
> +			skb->protocol = eth_type_trans(skb, dev);
> +			skb->ip_summed = CHECKSUM_UNNECESSARY;
> +			skb->dev = dev;

eth_type_trans() already assigns skb->dev = dev;

> +static int slic_poll(struct napi_struct *napi, int todo)
> +{
> +	struct slic_device *sdev = container_of(napi, struct slic_device, napi);
> +	struct slic_shmem *sm = &sdev->shmem;
> +	struct slic_shmem_data *sm_data = sm->shmem_data;
> +	u32 isr = le32_to_cpu(sm_data->isr);
> +	unsigned int done = 0;
> +
> +	slic_handle_irq(sdev, isr, todo, &done);
> +
> +	if (done < todo) {
> +		napi_complete(napi);

napi_complete_done() since you know how many packets you completed.

> +		/* reenable irqs */
> +		sm_data->isr = 0;
> +		/* make sure sm_data->isr is cleard before irqs are reenabled */
> +		wmb();
> +		slic_write(sdev, SLIC_REG_ISR, 0);
> +		slic_flush_write(sdev);
> +	}
> +
> +	return done;
> +}
> +
> +static irqreturn_t slic_irq(int irq, void *dev_id)
> +{
> +	struct slic_device *sdev = dev_id;
> +	struct slic_shmem *sm = &sdev->shmem;
> +	struct slic_shmem_data *sm_data = sm->shmem_data;
> +
> +	slic_write(sdev, SLIC_REG_ICR, SLIC_ICR_INT_MASK);
> +	slic_flush_write(sdev);
> +	/* make sure sm_data->isr is read after ICR_INT_MASK is set */
> +	wmb();
> +
> +	if (!sm_data->isr) {
> +		dma_rmb();
> +		/* spurious interrupt */
> +		slic_write(sdev, SLIC_REG_ISR, 0);
> +
Re: [PATCH net v2 0/5] net: fix phydev reference leaks
From: Timur Tabi
Date: Sun, 27 Nov 2016 20:11:17 -0600

> David Miller wrote:
>> Series applied, thanks.
>
> I was really hoping you'd give me the chance to test the patches
> before applying them.

Sorry, if anything is broken I will happily revert if it isn't fixed
promptly.
[PATCH net-next 4/9] liquidio CN23XX: VF queue setup
Adds support for configuring VF input/output queues. Signed-off-by: Raghu VatsavayiSigned-off-by: Derek Chickles Signed-off-by: Satanand Burla Signed-off-by: Felix Manlunas --- .../ethernet/cavium/liquidio/cn23xx_vf_device.c| 144 + .../ethernet/cavium/liquidio/cn23xx_vf_device.h| 2 + drivers/net/ethernet/cavium/liquidio/lio_main.c| 6 +- drivers/net/ethernet/cavium/liquidio/lio_vf_main.c | 5 + .../net/ethernet/cavium/liquidio/octeon_device.c | 43 +- .../net/ethernet/cavium/liquidio/octeon_device.h | 7 +- .../net/ethernet/cavium/liquidio/request_manager.c | 4 +- 7 files changed, 207 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c index d683bda..60fd138 100644 --- a/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c +++ b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c @@ -25,13 +25,134 @@ #include "cn23xx_vf_device.h" #include "octeon_main.h" +static int cn23xx_vf_reset_io_queues(struct octeon_device *oct, u32 num_queues) +{ + u32 loop = BUSY_READING_REG_VF_LOOP_COUNT; + int ret_val = 0; + u32 q_no; + u64 d64; + + for (q_no = 0; q_no < num_queues; q_no++) { + /* set RST bit to 1. 
This bit applies to both IQ and OQ */ + d64 = octeon_read_csr64(oct, + CN23XX_VF_SLI_IQ_PKT_CONTROL64(q_no)); + d64 |= CN23XX_PKT_INPUT_CTL_RST; + octeon_write_csr64(oct, CN23XX_VF_SLI_IQ_PKT_CONTROL64(q_no), + d64); + } + + /* wait until the RST bit is clear or the RST and QUIET bits are set */ + for (q_no = 0; q_no < num_queues; q_no++) { + u64 reg_val = octeon_read_csr64(oct, + CN23XX_VF_SLI_IQ_PKT_CONTROL64(q_no)); + while ((READ_ONCE(reg_val) & CN23XX_PKT_INPUT_CTL_RST) && + !(READ_ONCE(reg_val) & CN23XX_PKT_INPUT_CTL_QUIET) && + loop) { + WRITE_ONCE(reg_val, octeon_read_csr64( + oct, CN23XX_VF_SLI_IQ_PKT_CONTROL64(q_no))); + loop--; + } + if (!loop) { + dev_err(>pci_dev->dev, + "clearing the reset reg failed or setting the quiet reg failed for qno: %u\n", + q_no); + return -1; + } + WRITE_ONCE(reg_val, READ_ONCE(reg_val) & + ~CN23XX_PKT_INPUT_CTL_RST); + octeon_write_csr64(oct, CN23XX_VF_SLI_IQ_PKT_CONTROL64(q_no), + READ_ONCE(reg_val)); + + WRITE_ONCE(reg_val, octeon_read_csr64( + oct, CN23XX_VF_SLI_IQ_PKT_CONTROL64(q_no))); + if (READ_ONCE(reg_val) & CN23XX_PKT_INPUT_CTL_RST) { + dev_err(>pci_dev->dev, + "clearing the reset failed for qno: %u\n", + q_no); + ret_val = -1; + } + } + + return ret_val; +} + +static int cn23xx_enable_vf_io_queues(struct octeon_device *oct) +{ + u32 q_no; + + for (q_no = 0; q_no < oct->num_iqs; q_no++) { + u64 reg_val; + + /* set the corresponding IQ IS_64B bit */ + if (oct->io_qmask.iq64B & BIT_ULL(q_no)) { + reg_val = octeon_read_csr64( + oct, CN23XX_VF_SLI_IQ_PKT_CONTROL64(q_no)); + reg_val |= CN23XX_PKT_INPUT_CTL_IS_64B; + octeon_write_csr64( + oct, CN23XX_VF_SLI_IQ_PKT_CONTROL64(q_no), reg_val); + } + + /* set the corresponding IQ ENB bit */ + if (oct->io_qmask.iq & BIT_ULL(q_no)) { + reg_val = octeon_read_csr64( + oct, CN23XX_VF_SLI_IQ_PKT_CONTROL64(q_no)); + reg_val |= CN23XX_PKT_INPUT_CTL_RING_ENB; + octeon_write_csr64( + oct, CN23XX_VF_SLI_IQ_PKT_CONTROL64(q_no), reg_val); + } + } + for (q_no = 0; q_no < oct->num_oqs; 
q_no++) { + u32 reg_val; + + /* set the corresponding OQ ENB bit */ + if (oct->io_qmask.oq & BIT_ULL(q_no)) { + reg_val = octeon_read_csr( + oct, CN23XX_VF_SLI_OQ_PKT_CONTROL(q_no)); + reg_val |= CN23XX_PKT_OUTPUT_CTL_RING_ENB; + octeon_write_csr( + oct, CN23XX_VF_SLI_OQ_PKT_CONTROL(q_no),
[PATCH net-next 8/9] liquidio CN23XX: VF interrupt
Adds support for VF interrupt processing. Signed-off-by: Raghu VatsavayiSigned-off-by: Derek Chickles Signed-off-by: Satanand Burla Signed-off-by: Felix Manlunas --- .../ethernet/cavium/liquidio/cn23xx_vf_device.c| 265 + .../ethernet/cavium/liquidio/cn23xx_vf_device.h| 6 + drivers/net/ethernet/cavium/liquidio/lio_core.c| 7 - drivers/net/ethernet/cavium/liquidio/lio_vf_main.c | 162 + .../net/ethernet/cavium/liquidio/octeon_device.c | 3 + .../net/ethernet/cavium/liquidio/octeon_device.h | 2 + 6 files changed, 438 insertions(+), 7 deletions(-) diff --git a/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c index e514797..9ded8fc 100644 --- a/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c +++ b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c @@ -27,6 +27,26 @@ #include "octeon_main.h" #include "octeon_mailbox.h" +u32 cn23xx_vf_get_oq_ticks(struct octeon_device *oct, u32 time_intr_in_us) +{ + /* This gives the SLI clock per microsec */ + u32 oqticks_per_us = (u32)oct->pfvf_hsword.coproc_tics_per_us; + + /* This gives the clock cycles per millisecond */ + oqticks_per_us *= 1000; + + /* This gives the oq ticks (1024 core clock cycles) per millisecond */ + oqticks_per_us /= 1024; + + /* time_intr is in microseconds. The next 2 steps gives the oq ticks +* corressponding to time_intr. 
+*/ + oqticks_per_us *= time_intr_in_us; + oqticks_per_us /= 1000; + + return oqticks_per_us; +} + static int cn23xx_vf_reset_io_queues(struct octeon_device *oct, u32 num_queues) { u32 loop = BUSY_READING_REG_VF_LOOP_COUNT; @@ -212,6 +232,11 @@ static void cn23xx_setup_vf_iq_regs(struct octeon_device *oct, u32 iq_no) */ pkt_in_done = readq(iq->inst_cnt_reg); + if (oct->msix_on) { + /* Set CINT_ENB to enable IQ interrupt */ + writeq((pkt_in_done | CN23XX_INTR_CINT_ENB), + iq->inst_cnt_reg); + } iq->reset_instr_cnt = 0; } @@ -342,6 +367,240 @@ static void cn23xx_disable_vf_io_queues(struct octeon_device *oct) cn23xx_vf_reset_io_queues(oct, num_queues); } +void cn23xx_vf_ask_pf_to_do_flr(struct octeon_device *oct) +{ + struct octeon_mbox_cmd mbox_cmd; + + mbox_cmd.msg.u64 = 0; + mbox_cmd.msg.s.type = OCTEON_MBOX_REQUEST; + mbox_cmd.msg.s.resp_needed = 0; + mbox_cmd.msg.s.cmd = OCTEON_VF_FLR_REQUEST; + mbox_cmd.msg.s.len = 1; + mbox_cmd.q_no = 0; + mbox_cmd.recv_len = 0; + mbox_cmd.recv_status = 0; + mbox_cmd.fn = NULL; + mbox_cmd.fn_arg = 0; + + octeon_mbox_write(oct, _cmd); +} + +static void octeon_pfvf_hs_callback(struct octeon_device *oct, + struct octeon_mbox_cmd *cmd, + void *arg) +{ + u32 major = 0; + + memcpy((uint8_t *)>pfvf_hsword, cmd->msg.s.params, + CN23XX_MAILBOX_MSGPARAM_SIZE); + if (cmd->recv_len > 1) { + major = ((struct lio_version *)(cmd->data))->major; + major = major << 16; + } + + atomic_set((atomic_t *)arg, major | 1); +} + +int cn23xx_octeon_pfvf_handshake(struct octeon_device *oct) +{ + struct octeon_mbox_cmd mbox_cmd; + u32 q_no, count = 0; + atomic_t status; + u32 pfmajor; + u32 vfmajor; + u32 ret; + + /* Sending VF_ACTIVE indication to the PF driver */ + dev_dbg(>pci_dev->dev, "requesting info from pf\n"); + + mbox_cmd.msg.u64 = 0; + mbox_cmd.msg.s.type = OCTEON_MBOX_REQUEST; + mbox_cmd.msg.s.resp_needed = 1; + mbox_cmd.msg.s.cmd = OCTEON_VF_ACTIVE; + mbox_cmd.msg.s.len = 2; + mbox_cmd.data[0] = 0; + ((struct lio_version 
*)_cmd.data[0])->major = + LIQUIDIO_BASE_MAJOR_VERSION; + ((struct lio_version *)_cmd.data[0])->minor = + LIQUIDIO_BASE_MINOR_VERSION; + ((struct lio_version *)_cmd.data[0])->micro = + LIQUIDIO_BASE_MICRO_VERSION; + mbox_cmd.q_no = 0; + mbox_cmd.recv_len = 0; + mbox_cmd.recv_status = 0; + mbox_cmd.fn = (octeon_mbox_callback_t)octeon_pfvf_hs_callback; + mbox_cmd.fn_arg = (void *) + + /* Interrupts are not enabled at this point. +* Enable them with default oq ticks +*/ + oct->fn_list.enable_interrupt(oct, OCTEON_ALL_INTR); + + octeon_mbox_write(oct, _cmd); + + atomic_set(, 0); + + do { + schedule_timeout_uninterruptible(1); + } while ((!atomic_read()) && (count++ < 10)); + +
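The integer arithmetic in cn23xx_vf_get_oq_ticks() from the VF interrupt patch above can be followed with a userspace copy. This is only a sketch: the function name oq_ticks() is hypothetical, the clock rate is passed in directly instead of being read from oct->pfvf_hsword, and the example rate of 1200 ticks per microsecond is an assumption chosen for the arithmetic, not a documented hardware value.

```c
#include <stdint.h>

/*
 * Userspace copy of the conversion steps: coprocessor clock ticks per
 * microsecond and an interval in microseconds go in, OQ ticks (units of
 * 1024 clock cycles) come out. Every division truncates.
 */
uint32_t oq_ticks(uint32_t coproc_tics_per_us, uint32_t time_intr_in_us)
{
	uint32_t t = coproc_tics_per_us;

	t *= 1000;		/* clock cycles per millisecond */
	t /= 1024;		/* OQ ticks per millisecond, rounded down */
	t *= time_intr_in_us;	/* scale by the requested interval */
	t /= 1000;		/* the interval was given in microseconds */

	return t;
}
```

Worked through for an assumed 1200 ticks per microsecond and a 100 microsecond interval: 1200 * 1000 = 1200000, divided by 1024 gives 1171 after truncation, times 100 gives 117100, divided by 1000 gives 117 OQ ticks. Because all intermediate values are 32-bit, the multiplication by the interval would wrap for intervals of a few seconds at this rate, so the sketch is only meaningful for short intervals.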
[PATCH net-next 5/9] liquidio CN23XX: VF register access
This patch adds support for VF device register access. Signed-off-by: Raghu VatsavayiSigned-off-by: Derek Chickles Signed-off-by: Satanand Burla Signed-off-by: Felix Manlunas --- .../ethernet/cavium/liquidio/cn23xx_vf_device.c| 189 + .../ethernet/cavium/liquidio/cn23xx_vf_device.h| 2 + drivers/net/ethernet/cavium/liquidio/lio_vf_main.c | 5 + 3 files changed, 196 insertions(+) diff --git a/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c index 60fd138..ad4e442 100644 --- a/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c +++ b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c @@ -76,6 +76,161 @@ static int cn23xx_vf_reset_io_queues(struct octeon_device *oct, u32 num_queues) return ret_val; } +static int cn23xx_vf_setup_global_input_regs(struct octeon_device *oct) +{ + struct octeon_cn23xx_vf *cn23xx = (struct octeon_cn23xx_vf *)oct->chip; + struct octeon_instr_queue *iq; + u64 q_no, intr_threshold; + u64 d64; + + if (cn23xx_vf_reset_io_queues(oct, oct->sriov_info.rings_per_vf)) + return -1; + + for (q_no = 0; q_no < (oct->sriov_info.rings_per_vf); q_no++) { + void __iomem *inst_cnt_reg; + + octeon_write_csr64(oct, CN23XX_VF_SLI_IQ_DOORBELL(q_no), + 0x); + iq = oct->instr_queue[q_no]; + + if (iq) + inst_cnt_reg = iq->inst_cnt_reg; + else + inst_cnt_reg = (u8 *)oct->mmio[0].hw_addr + + CN23XX_VF_SLI_IQ_INSTR_COUNT64(q_no); + + d64 = octeon_read_csr64(oct, + CN23XX_VF_SLI_IQ_INSTR_COUNT64(q_no)); + + d64 &= 0xEFFFL; + + octeon_write_csr64(oct, CN23XX_VF_SLI_IQ_INSTR_COUNT64(q_no), + d64); + + /* Select ES, RO, NS, RDSIZE,DPTR Fomat#0 for +* the Input Queues +*/ + octeon_write_csr64(oct, CN23XX_VF_SLI_IQ_PKT_CONTROL64(q_no), + CN23XX_PKT_INPUT_CTL_MASK); + + /* set the wmark level to trigger PI_INT */ + intr_threshold = CFG_GET_IQ_INTR_PKT(cn23xx->conf) & +CN23XX_PKT_IN_DONE_WMARK_MASK; + + writeq((readq(inst_cnt_reg) & + ~(CN23XX_PKT_IN_DONE_WMARK_MASK << + CN23XX_PKT_IN_DONE_WMARK_BIT_POS)) 
| + (intr_threshold << CN23XX_PKT_IN_DONE_WMARK_BIT_POS), + inst_cnt_reg); + } + return 0; +} + +static void cn23xx_vf_setup_global_output_regs(struct octeon_device *oct) +{ + u32 reg_val; + u32 q_no; + + for (q_no = 0; q_no < (oct->sriov_info.rings_per_vf); q_no++) { + octeon_write_csr(oct, CN23XX_VF_SLI_OQ_PKTS_CREDIT(q_no), +0x); + + reg_val = + octeon_read_csr(oct, CN23XX_VF_SLI_OQ_PKTS_SENT(q_no)); + + reg_val &= 0xEFFFL; + + reg_val = + octeon_read_csr(oct, CN23XX_VF_SLI_OQ_PKT_CONTROL(q_no)); + + /* set IPTR & DPTR */ + reg_val |= + (CN23XX_PKT_OUTPUT_CTL_IPTR | CN23XX_PKT_OUTPUT_CTL_DPTR); + + /* reset BMODE */ + reg_val &= ~(CN23XX_PKT_OUTPUT_CTL_BMODE); + + /* No Relaxed Ordering, No Snoop, 64-bit Byte swap +* for Output Queue ScatterList reset ROR_P, NSR_P +*/ + reg_val &= ~(CN23XX_PKT_OUTPUT_CTL_ROR_P); + reg_val &= ~(CN23XX_PKT_OUTPUT_CTL_NSR_P); + +#ifdef __LITTLE_ENDIAN_BITFIELD + reg_val &= ~(CN23XX_PKT_OUTPUT_CTL_ES_P); +#else + reg_val |= (CN23XX_PKT_OUTPUT_CTL_ES_P); +#endif + /* No Relaxed Ordering, No Snoop, 64-bit Byte swap +* for Output Queue Data reset ROR, NSR +*/ + reg_val &= ~(CN23XX_PKT_OUTPUT_CTL_ROR); + reg_val &= ~(CN23XX_PKT_OUTPUT_CTL_NSR); + /* set the ES bit */ + reg_val |= (CN23XX_PKT_OUTPUT_CTL_ES); + + /* write all the selected settings */ + octeon_write_csr(oct, CN23XX_VF_SLI_OQ_PKT_CONTROL(q_no), +reg_val); + } +} + +static int cn23xx_setup_vf_device_regs(struct octeon_device *oct) +{ + if (cn23xx_vf_setup_global_input_regs(oct)) + return -1; + + cn23xx_vf_setup_global_output_regs(oct); + + return 0; +} + +static void cn23xx_setup_vf_iq_regs(struct octeon_device
[PATCH net-next 6/9] liquidio CN23XX: init VF softcommand queues
Adds support for initializing softcommand, dispatch and instructions queues for VF. Signed-off-by: Raghu VatsavayiSigned-off-by: Derek Chickles Signed-off-by: Satanand Burla Signed-off-by: Felix Manlunas --- drivers/net/ethernet/cavium/liquidio/lio_vf_main.c | 74 +- .../net/ethernet/cavium/liquidio/octeon_device.c | 5 ++ .../net/ethernet/cavium/liquidio/request_manager.c | 7 ++ 3 files changed, 84 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c index 43a5373..d02f1dd 100644 --- a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c +++ b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c @@ -140,18 +140,51 @@ static void octeon_pci_flr(struct octeon_device *oct) */ static void octeon_destroy_resources(struct octeon_device *oct) { + int i; + switch (atomic_read(>status)) { + case OCT_DEV_IN_RESET: + case OCT_DEV_DROQ_INIT_DONE: + mdelay(100); + for (i = 0; i < MAX_OCTEON_OUTPUT_QUEUES(oct); i++) { + if (!(oct->io_qmask.oq & BIT_ULL(i))) + continue; + octeon_delete_droq(oct, i); + } + + /* fallthrough */ + case OCT_DEV_RESP_LIST_INIT_DONE: + octeon_delete_response_list(oct); + + /* fallthrough */ + case OCT_DEV_INSTR_QUEUE_INIT_DONE: + for (i = 0; i < MAX_OCTEON_INSTR_QUEUES(oct); i++) { + if (!(oct->io_qmask.iq & BIT_ULL(i))) + continue; + octeon_delete_instr_queue(oct, i); + } + + /* fallthrough */ + case OCT_DEV_SC_BUFF_POOL_INIT_DONE: + octeon_free_sc_buffer_pool(oct); + + /* fallthrough */ + case OCT_DEV_DISPATCH_INIT_DONE: + octeon_delete_dispatch_list(oct); + cancel_delayed_work_sync(>nic_poll_work.work); + + /* fallthrough */ case OCT_DEV_PCI_MAP_DONE: octeon_unmap_pci_barx(oct, 0); octeon_unmap_pci_barx(oct, 1); - /* fallthrough */ + /* fallthrough */ case OCT_DEV_PCI_ENABLE_DONE: pci_clear_master(oct->pci_dev); /* Disable the device, releasing the PCI INT */ pci_disable_device(oct->pci_dev); - /* fallthrough */ + /* fallthrough */ case OCT_DEV_BEGIN_STATE: /* 
Nothing to be done here either */ break; @@ -236,6 +269,14 @@ static int octeon_device_init(struct octeon_device *oct) atomic_set(>status, OCT_DEV_PCI_MAP_DONE); + /* Initialize the dispatch mechanism used to push packets arriving on +* Octeon Output queues. +*/ + if (octeon_init_dispatch_list(oct)) + return 1; + + atomic_set(>status, OCT_DEV_DISPATCH_INIT_DONE); + if (octeon_set_io_queues_off(oct)) { dev_err(>pci_dev->dev, "setting io queues off failed\n"); return 1; @@ -246,6 +287,35 @@ static int octeon_device_init(struct octeon_device *oct) return 1; } + /* Initialize soft command buffer pool */ + if (octeon_setup_sc_buffer_pool(oct)) { + dev_err(>pci_dev->dev, "sc buffer pool allocation failed\n"); + return 1; + } + atomic_set(>status, OCT_DEV_SC_BUFF_POOL_INIT_DONE); + + /* Setup the data structures that manage this Octeon's Input queues. */ + if (octeon_setup_instr_queues(oct)) { + dev_err(>pci_dev->dev, "instruction queue initialization failed\n"); + return 1; + } + atomic_set(>status, OCT_DEV_INSTR_QUEUE_INIT_DONE); + + /* Initialize lists to manage the requests of different types that +* arrive from user & kernel applications for this octeon device. +*/ + if (octeon_setup_response_list(oct)) { + dev_err(>pci_dev->dev, "Response list allocation failed\n"); + return 1; + } + atomic_set(>status, OCT_DEV_RESP_LIST_INIT_DONE); + + if (octeon_setup_output_queues(oct)) { + dev_err(>pci_dev->dev, "Output queue initialization failed\n"); + return 1; + } + atomic_set(>status, OCT_DEV_DROQ_INIT_DONE); + return 0; } diff --git a/drivers/net/ethernet/cavium/liquidio/octeon_device.c b/drivers/net/ethernet/cavium/liquidio/octeon_device.c index f2cfafd..8af08d4 100644 --- a/drivers/net/ethernet/cavium/liquidio/octeon_device.c +++ b/drivers/net/ethernet/cavium/liquidio/octeon_device.c @@ -797,6 +797,8 @@ int octeon_setup_instr_queues(struct octeon_device *oct)
[PATCH net-next 7/9] liquidio CN23XX: VF mailbox
Adds support for VF mailbox setup. Signed-off-by: Raghu VatsavayiSigned-off-by: Derek Chickles Signed-off-by: Satanand Burla Signed-off-by: Felix Manlunas --- .../ethernet/cavium/liquidio/cn23xx_vf_device.c| 59 ++ drivers/net/ethernet/cavium/liquidio/lio_vf_main.c | 10 2 files changed, 69 insertions(+) diff --git a/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c index ad4e442..e514797 100644 --- a/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c +++ b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c @@ -17,6 +17,7 @@ ***/ #include #include +#include #include "liquidio_common.h" #include "octeon_droq.h" #include "octeon_iq.h" @@ -24,6 +25,7 @@ #include "octeon_device.h" #include "cn23xx_vf_device.h" #include "octeon_main.h" +#include "octeon_mailbox.h" static int cn23xx_vf_reset_io_queues(struct octeon_device *oct, u32 num_queues) { @@ -231,6 +233,61 @@ static void cn23xx_setup_vf_oq_regs(struct octeon_device *oct, u32 oq_no) (u8 *)oct->mmio[0].hw_addr + CN23XX_VF_SLI_OQ_PKTS_CREDIT(oq_no); } +static void cn23xx_vf_mbox_thread(struct work_struct *work) +{ + struct cavium_wk *wk = (struct cavium_wk *)work; + struct octeon_mbox *mbox = (struct octeon_mbox *)wk->ctxptr; + + octeon_mbox_process_message(mbox); +} + +static int cn23xx_free_vf_mbox(struct octeon_device *oct) +{ + cancel_delayed_work_sync(>mbox[0]->mbox_poll_wk.work); + vfree(oct->mbox[0]); + return 0; +} + +static int cn23xx_setup_vf_mbox(struct octeon_device *oct) +{ + struct octeon_mbox *mbox = NULL; + + mbox = vmalloc(sizeof(*mbox)); + if (!mbox) + return 1; + + memset(mbox, 0, sizeof(struct octeon_mbox)); + + spin_lock_init(>lock); + + mbox->oct_dev = oct; + + mbox->q_no = 0; + + mbox->state = OCTEON_MBOX_STATE_IDLE; + + /* VF mbox interrupt reg */ + mbox->mbox_int_reg = + (u8 *)oct->mmio[0].hw_addr + CN23XX_VF_SLI_PKT_MBOX_INT(0); + /* VF reads from SIG0 reg */ + mbox->mbox_read_reg = + (u8 *)oct->mmio[0].hw_addr + 
CN23XX_SLI_PKT_PF_VF_MBOX_SIG(0, 0); + /* VF writes into SIG1 reg */ + mbox->mbox_write_reg = + (u8 *)oct->mmio[0].hw_addr + CN23XX_SLI_PKT_PF_VF_MBOX_SIG(0, 1); + + INIT_DELAYED_WORK(>mbox_poll_wk.work, + cn23xx_vf_mbox_thread); + + mbox->mbox_poll_wk.ctxptr = (void *)mbox; + + oct->mbox[0] = mbox; + + writeq(OCTEON_PFVFSIG, mbox->mbox_read_reg); + + return 0; +} + static int cn23xx_enable_vf_io_queues(struct octeon_device *oct) { u32 q_no; @@ -338,6 +395,8 @@ int cn23xx_setup_octeon_vf_device(struct octeon_device *oct) oct->fn_list.setup_iq_regs = cn23xx_setup_vf_iq_regs; oct->fn_list.setup_oq_regs = cn23xx_setup_vf_oq_regs; + oct->fn_list.setup_mbox = cn23xx_setup_vf_mbox; + oct->fn_list.free_mbox = cn23xx_free_vf_mbox; oct->fn_list.setup_device_regs = cn23xx_setup_vf_device_regs; oct->fn_list.enable_io_queues = cn23xx_enable_vf_io_queues; diff --git a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c index d02f1dd..e8eaece 100644 --- a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c +++ b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c @@ -143,6 +143,10 @@ static void octeon_destroy_resources(struct octeon_device *oct) int i; switch (atomic_read(>status)) { + case OCT_DEV_MBOX_SETUP_DONE: + oct->fn_list.free_mbox(oct); + + /* fallthrough */ case OCT_DEV_IN_RESET: case OCT_DEV_DROQ_INIT_DONE: mdelay(100); @@ -316,6 +320,12 @@ static int octeon_device_init(struct octeon_device *oct) } atomic_set(>status, OCT_DEV_DROQ_INIT_DONE); + if (oct->fn_list.setup_mbox(oct)) { + dev_err(>pci_dev->dev, "Mailbox setup failed\n"); + return 1; + } + atomic_set(>status, OCT_DEV_MBOX_SETUP_DONE); + return 0; } -- 1.8.3.1
[PATCH net-next 9/9] liquidio CN23XX: VF init and destroy
Adds support for VF initialization and destroy resources. Signed-off-by: Raghu VatsavayiSigned-off-by: Derek Chickles Signed-off-by: Satanand Burla Signed-off-by: Felix Manlunas --- .../ethernet/cavium/liquidio/cn23xx_vf_device.h| 2 + drivers/net/ethernet/cavium/liquidio/lio_vf_main.c | 106 + 2 files changed, 108 insertions(+) diff --git a/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.h b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.h index 8590bdb..6715df3 100644 --- a/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.h +++ b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.h @@ -36,6 +36,8 @@ struct octeon_cn23xx_vf { #define CN23XX_MAILBOX_MSGPARAM_SIZE 6 +#define MAX_VF_IP_OP_PENDING_PKT_COUNT 100 + void cn23xx_vf_ask_pf_to_do_flr(struct octeon_device *oct); int cn23xx_octeon_pfvf_handshake(struct octeon_device *oct); diff --git a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c index 337285b..2493bf5 100644 --- a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c +++ b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c @@ -41,6 +41,60 @@ struct octeon_device_priv { static void liquidio_vf_remove(struct pci_dev *pdev); static int octeon_device_init(struct octeon_device *oct); +static int lio_wait_for_oq_pkts(struct octeon_device *oct) +{ + struct octeon_device_priv *oct_priv = + (struct octeon_device_priv *)oct->priv; + int retry = MAX_VF_IP_OP_PENDING_PKT_COUNT; + int pkt_cnt = 0, pending_pkts; + int i; + + do { + pending_pkts = 0; + + for (i = 0; i < MAX_OCTEON_OUTPUT_QUEUES(oct); i++) { + if (!(oct->io_qmask.oq & BIT_ULL(i))) + continue; + pkt_cnt += octeon_droq_check_hw_for_pkts(oct->droq[i]); + } + if (pkt_cnt > 0) { + pending_pkts += pkt_cnt; + tasklet_schedule(_priv->droq_tasklet); + } + pkt_cnt = 0; + schedule_timeout_uninterruptible(1); + + } while (retry-- && pending_pkts); + + return pkt_cnt; +} + +/** + * \brief wait for all pending requests to complete + * @param oct 
Pointer to Octeon device + * + * Called during shutdown sequence + */ +static int wait_for_pending_requests(struct octeon_device *oct) +{ + int i, pcount = 0; + + for (i = 0; i < MAX_VF_IP_OP_PENDING_PKT_COUNT; i++) { + pcount = atomic_read( + >response_list[OCTEON_ORDERED_SC_LIST] +.pending_req_count); + if (pcount) + schedule_timeout_uninterruptible(HZ / 10); + else + break; + } + + if (pcount) + return 1; + + return 0; +} + static const struct pci_device_id liquidio_vf_pci_tbl[] = { { PCI_VENDOR_ID_CAVIUM, OCTEON_CN23XX_VF_VID, @@ -257,6 +311,35 @@ static void octeon_destroy_resources(struct octeon_device *oct) int i; switch (atomic_read(>status)) { + case OCT_DEV_RUNNING: + case OCT_DEV_CORE_OK: + /* No more instructions will be forwarded. */ + atomic_set(>status, OCT_DEV_IN_RESET); + + dev_dbg(>pci_dev->dev, "Device state is now %s\n", + lio_get_state_string(>status)); + + schedule_timeout_uninterruptible(HZ / 10); + + /* fallthrough */ + case OCT_DEV_HOST_OK: + /* fallthrough */ + case OCT_DEV_IO_QUEUES_DONE: + if (wait_for_pending_requests(oct)) + dev_err(>pci_dev->dev, "There were pending requests\n"); + + if (lio_wait_for_instr_fetch(oct)) + dev_err(>pci_dev->dev, "IQ had pending instructions\n"); + + /* Disable the input and output queues now. No more packets will +* arrive from Octeon, but we should wait for all packet +* processing to finish. +*/ + oct->fn_list.disable_io_queues(oct); + + if (lio_wait_for_oq_pkts(oct)) + dev_err(>pci_dev->dev, "OQ had pending packets\n"); + case OCT_DEV_INTR_SET_DONE: /* Disable interrupts */ oct->fn_list.disable_interrupt(oct, OCTEON_ALL_INTR); @@ -395,6 +478,7 @@ static int octeon_pci_os_setup(struct octeon_device *oct) static int octeon_device_init(struct octeon_device *oct) { u32 rev_id; + int j; atomic_set(>status, OCT_DEV_BEGIN_STATE); @@ -488,6 +572,28 @@ static int octeon_device_init(struct octeon_device *oct)
[PATCH net-next 0/9] liquidio VF operations
Hi Dave,

The following patches add support for VF device-specific operations such as mailbox, queues and register access. Please apply the patches in the following order, as they depend on each other.

Thanks

Raghu Vatsavayi (9):
  liquidio CN23XX: VF register definitions
  liquidio CN23XX: VF registration
  liquidio CN23XX: VF config setup
  liquidio CN23XX: VF queue setup
  liquidio CN23XX: VF register access
  liquidio CN23XX: init VF softcommand queues
  liquidio CN23XX: VF mailbox
  liquidio CN23XX: VF interrupt
  liquidio CN23XX: VF init and destroy

 drivers/net/ethernet/cavium/Kconfig                | 12 +
 drivers/net/ethernet/cavium/liquidio/Makefile      | 22 +
 .../ethernet/cavium/liquidio/cn23xx_vf_device.c    | 701 +
 .../ethernet/cavium/liquidio/cn23xx_vf_device.h    | 48 ++
 .../net/ethernet/cavium/liquidio/cn23xx_vf_regs.h  | 274
 drivers/net/ethernet/cavium/liquidio/lio_core.c    | 7 -
 drivers/net/ethernet/cavium/liquidio/lio_main.c    | 6 +-
 drivers/net/ethernet/cavium/liquidio/lio_vf_main.c | 614 ++
 .../net/ethernet/cavium/liquidio/octeon_device.c   | 58 +-
 .../net/ethernet/cavium/liquidio/octeon_device.h   | 9 +-
 .../net/ethernet/cavium/liquidio/request_manager.c | 11 +-
 11 files changed, 1751 insertions(+), 11 deletions(-)
 create mode 100644 drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c
 create mode 100644 drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.h
 create mode 100644 drivers/net/ethernet/cavium/liquidio/cn23xx_vf_regs.h
 create mode 100644 drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
--
1.8.3.1
[PATCH net-next 3/9] liquidio CN23XX: VF config setup
Adds support for setting up VF configuration. Signed-off-by: Raghu VatsavayiSigned-off-by: Derek Chickles Signed-off-by: Satanand Burla Signed-off-by: Felix Manlunas --- drivers/net/ethernet/cavium/liquidio/Makefile | 1 + .../ethernet/cavium/liquidio/cn23xx_vf_device.c| 44 +++ .../ethernet/cavium/liquidio/cn23xx_vf_device.h| 2 + drivers/net/ethernet/cavium/liquidio/lio_vf_main.c | 136 + .../net/ethernet/cavium/liquidio/octeon_device.c | 3 + 5 files changed, 186 insertions(+) create mode 100644 drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c diff --git a/drivers/net/ethernet/cavium/liquidio/Makefile b/drivers/net/ethernet/cavium/liquidio/Makefile index 69d23fc..cca903a 100644 --- a/drivers/net/ethernet/cavium/liquidio/Makefile +++ b/drivers/net/ethernet/cavium/liquidio/Makefile @@ -31,6 +31,7 @@ liquidio_vf-$(CONFIG_LIQUIDIO_VF) += lio_ethtool.o \ cn66xx_device.o\ cn68xx_device.o\ cn23xx_pf_device.o \ + cn23xx_vf_device.o \ octeon_mailbox.o \ octeon_mem_ops.o \ octeon_droq.o \ diff --git a/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c new file mode 100644 index 000..d683bda --- /dev/null +++ b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c @@ -0,0 +1,44 @@ +/** + * Author: Cavium, Inc. + * + * Contact: supp...@cavium.com + * Please include "LiquidIO" in the subject. + * + * Copyright (c) 2003-2016 Cavium, Inc. + * + * This file is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, Version 2, as + * published by the Free Software Foundation. + * + * This file is distributed in the hope that it will be useful, but + * AS-IS and WITHOUT ANY WARRANTY; without even the implied warranty + * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE, TITLE, or + * NONINFRINGEMENT. See the GNU General Public License for more details. 
+ ***/ +#include +#include +#include "liquidio_common.h" +#include "octeon_droq.h" +#include "octeon_iq.h" +#include "response_manager.h" +#include "octeon_device.h" +#include "cn23xx_vf_device.h" +#include "octeon_main.h" + +int cn23xx_setup_octeon_vf_device(struct octeon_device *oct) +{ + struct octeon_cn23xx_vf *cn23xx = (struct octeon_cn23xx_vf *)oct->chip; + + if (octeon_map_pci_barx(oct, 0, 0)) + return 1; + + cn23xx->conf = oct_get_config_info(oct, LIO_23XX); + if (!cn23xx->conf) { + dev_err(>pci_dev->dev, "%s No Config found for CN23XX\n", + __func__); + octeon_unmap_pci_barx(oct, 0); + return 1; + } + + return 0; +} diff --git a/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.h b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.h index 015b6d4..9e4fb50 100644 --- a/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.h +++ b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.h @@ -31,4 +31,6 @@ struct octeon_cn23xx_vf { struct octeon_config *conf; }; + +int cn23xx_setup_octeon_vf_device(struct octeon_device *oct); #endif diff --git a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c index d1b1a24..721ee66 100644 --- a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c +++ b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c @@ -22,6 +22,8 @@ #include "octeon_iq.h" #include "response_manager.h" #include "octeon_device.h" +#include "octeon_main.h" +#include "cn23xx_vf_device.h" MODULE_AUTHOR("Cavium Networks, "); MODULE_DESCRIPTION("Cavium LiquidIO Intelligent Server Adapter Virtual Function Driver"); @@ -37,6 +39,7 @@ struct octeon_device_priv { static int liquidio_vf_probe(struct pci_dev *pdev, const struct pci_device_id *ent); static void liquidio_vf_remove(struct pci_dev *pdev); +static int octeon_device_init(struct octeon_device *oct); static const struct pci_device_id liquidio_vf_pci_tbl[] = { { @@ -84,10 +87,78 @@ struct octeon_device_priv { /* set linux specific device pointer */ 
oct_dev->pci_dev = (void *)pdev; + if (octeon_device_init(oct_dev)) { + liquidio_vf_remove(pdev); + return -ENOMEM; + } + + dev_dbg(_dev->pci_dev->dev, "Device is ready\n"); + return 0; } /** + * \brief PCI FLR for each Octeon device. + * @param oct octeon device + */ +static void octeon_pci_flr(struct octeon_device
RE: [net,v2] neigh: fix the loop index error in neigh dump
> -Original Message- > From: David Ahern [mailto:d...@cumulusnetworks.com] > Sent: Monday, November 28, 2016 11:10 AM > To: 张胜举; > netdev@vger.kernel.org > Subject: Re: [net,v2] neigh: fix the loop index error in neigh dump > > On 11/27/16 7:56 PM, David Ahern wrote: > > On 11/27/16 7:53 PM, 张胜举 wrote: > >> > >> > >>> -Original Message- > >>> From: David Ahern [mailto:d...@cumulusnetworks.com] > >>> Sent: Monday, November 28, 2016 10:39 AM > >>> To: 张胜举 ; > >>> netdev@vger.kernel.org > >>> Subject: Re: [net,v2] neigh: fix the loop index error in neigh dump > >>> > >>> On 11/27/16 7:34 PM, 张胜举 wrote: > > -Original Message- > > From: David Ahern [mailto:d...@cumulusnetworks.com] > > Sent: Monday, November 28, 2016 10:10 AM > > To: Zhang Shengju ; > > netdev@vger.kernel.org > > Subject: Re: [net,v2] neigh: fix the loop index error in neigh > > dump > > > > On 11/27/16 6:32 PM, Zhang Shengju wrote: > >> Loop index in neigh dump function is not updated correctly under > >> some circumstances, this patch will fix it. > > > > What's an example? > > If dev is filtered out, the original code goes to next loop without > updating loop index 'idx'. > >>> > >>> And you have a use case with missing or redundant data? Or is your > >>> comment based on a review of code only? > >> It's on my code review. No use case currently, this is uncommon to > happen. > >> > >> > >>> > > You are completely rewriting the dump loops. > > I put 'idx++' into for loop, so I replace 'goto' with 'continue'. > The other change is style related. > >>> > >>> A "fixes" should not include 'style related' changes. > >> Okay, I will send another version without style changes. > >> > > > > Personally, I think you need to produce a use case that fails before sending > another patch. I have not seen a problem with this code. > > > > And looking back at 3f0ae05d6f I should not have acked it (reviewed it too > quickly while on PTO). 
Your change is a no-op because of what idx represents > - the position in the hash list for devices relevant for the dump request. > Same goes for the neigh dump so this patch is not needed. >

No. When a dump request has to be processed by multiple 'recv/recvmsg' system calls, idx stores which dev/neigh the previous call has processed, so that the next call scans from the right place. So idx should be incremented whether or not the dev/neigh is filtered. It's hard to produce a use case, because we mostly have only one entry in the hash list. Even with multiple entries, the function also has to exit right at the place where a dev/neigh is filtered out. All other dump functions for RT netlink keep this logic; you can refer to inet_dump_ifaddr() if you wish.
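To make the bookkeeping in the reply above concrete, here is a small user-space sketch of a resumable dump over a linked list. It is not the kernel code: `struct node`, `dump_chunk()` and the `room` parameter are hypothetical, and whether the real neigh dump can actually misbehave is exactly what is disputed in this thread. The sketch only shows the variant being argued for, where the position counter advances for every entry walked, filtered or not.

```c
#include <assert.h>
#include <stddef.h>

struct node {
	int id;
	int filtered;		/* nonzero: entry is not relevant to this dump */
	struct node *next;
};

/*
 * Emit up to `room` ids into out[], skipping the first *resume entries
 * that earlier calls already walked past. On return, *resume holds the
 * position the next call should continue from.
 */
size_t dump_chunk(const struct node *head, size_t *resume,
		  int *out, size_t room)
{
	size_t idx = 0, emitted = 0;
	const struct node *n;

	assert(resume && out);
	for (n = head; n; n = n->next, idx++) {
		if (idx < *resume)
			continue;	/* handled by an earlier call */
		if (n->filtered)
			continue;	/* skipped, but idx still advances */
		if (emitted == room)
			break;		/* buffer full: resume at this entry */
		out[emitted++] = n->id;
	}
	*resume = idx;
	return emitted;
}
```

Because `idx++` sits in the loop header, `continue` cannot skip it, which is the same effect as moving `idx++` into the `for` statement in the patch under discussion.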
[PATCH net-next 1/9] liquidio CN23XX: VF register definitions
Adds support for CN23xx VF registers. Signed-off-by: Raghu VatsavayiSigned-off-by: Derek Chickles Signed-off-by: Satanand Burla Signed-off-by: Felix Manlunas --- .../net/ethernet/cavium/liquidio/cn23xx_vf_regs.h | 274 + 1 file changed, 274 insertions(+) create mode 100644 drivers/net/ethernet/cavium/liquidio/cn23xx_vf_regs.h diff --git a/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_regs.h b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_regs.h new file mode 100644 index 000..d33dd8f --- /dev/null +++ b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_regs.h @@ -0,0 +1,274 @@ +/** + * Author: Cavium, Inc. + * + * Contact: supp...@cavium.com + * Please include "LiquidIO" in the subject. + * + * Copyright (c) 2003-2016 Cavium, Inc. + * + * This file is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, Version 2, as + * published by the Free Software Foundation. + * + * This file is distributed in the hope that it will be useful, but + * AS-IS and WITHOUT ANY WARRANTY; without even the implied warranty + * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE, TITLE, or + * NONINFRINGEMENT. See the GNU General Public License for more details. + ***/ +/*! \file cn23xx_vf_regs.h + * \brief Host Driver: Register Address and Register Mask values for + * Octeon CN23XX vf functions. + */ + +#ifndef __CN23XX_VF_REGS_H__ +#define __CN23XX_VF_REGS_H__ + +#define CN23XX_CONFIG_XPANSION_BAR 0x38 + +#define CN23XX_CONFIG_PCIE_CAP 0x70 +#define CN23XX_CONFIG_PCIE_DEVCAP 0x74 +#define CN23XX_CONFIG_PCIE_DEVCTL 0x78 +#define CN23XX_CONFIG_PCIE_LINKCAP 0x7C +#define CN23XX_CONFIG_PCIE_LINKCTL 0x80 +#define CN23XX_CONFIG_PCIE_SLOTCAP 0x84 +#define CN23XX_CONFIG_PCIE_SLOTCTL 0x88 + +#define CN23XX_CONFIG_PCIE_FLTMSK 0x720 + +/* The input jabber is used to determine the TSO max size. 
+ * Due to H/W limitation, this need to be reduced to 6 + * in order to to H/W TSO and avoid the WQE malfarmation + * PKO_BUG_24989_WQE_LEN + */ +#defineCN23XX_DEFAULT_INPUT_JABBER 0xEA60 /*6*/ + +/* ## BAR0 Registers */ + +/* Each Input Queue register is at a 16-byte Offset in BAR0 */ +#defineCN23XX_VF_IQ_OFFSET 0x2 + +/*## REQUEST QUEUE #*/ + +/* 64 registers for Input Queue Instr Count - SLI_PKT_IN_DONE0_CNTS */ +#defineCN23XX_VF_SLI_IQ_INSTR_COUNT_START64 0x10040 + +/* 64 registers for Input Queues Start Addr - SLI_PKT0_INSTR_BADDR */ +#defineCN23XX_VF_SLI_IQ_BASE_ADDR_START64 0x10010 + +/* 64 registers for Input Doorbell - SLI_PKT0_INSTR_BAOFF_DBELL */ +#defineCN23XX_VF_SLI_IQ_DOORBELL_START 0x10020 + +/* 64 registers for Input Queue size - SLI_PKT0_INSTR_FIFO_RSIZE */ +#defineCN23XX_VF_SLI_IQ_SIZE_START 0x10030 + +/* 64 registers (64-bit) - ES, RO, NS, Arbitration for Input Queue Data & + * gather list fetches. SLI_PKT(0..63)_INPUT_CONTROL. + */ +#defineCN23XX_VF_SLI_IQ_PKT_CONTROL_START64 0x1 + +/*--- Request Queue Macros -*/ +#define CN23XX_VF_SLI_IQ_PKT_CONTROL64(iq) \ + (CN23XX_VF_SLI_IQ_PKT_CONTROL_START64 + ((iq) * CN23XX_VF_IQ_OFFSET)) + +#define CN23XX_VF_SLI_IQ_BASE_ADDR64(iq) \ + (CN23XX_VF_SLI_IQ_BASE_ADDR_START64 + ((iq) * CN23XX_VF_IQ_OFFSET)) + +#define CN23XX_VF_SLI_IQ_SIZE(iq) \ + (CN23XX_VF_SLI_IQ_SIZE_START + ((iq) * CN23XX_VF_IQ_OFFSET)) + +#define CN23XX_VF_SLI_IQ_DOORBELL(iq) \ + (CN23XX_VF_SLI_IQ_DOORBELL_START + ((iq) * CN23XX_VF_IQ_OFFSET)) + +#define CN23XX_VF_SLI_IQ_INSTR_COUNT64(iq) \ + (CN23XX_VF_SLI_IQ_INSTR_COUNT_START64 + ((iq) * CN23XX_VF_IQ_OFFSET)) + +/*-- Masks */ +#defineCN23XX_PKT_INPUT_CTL_VF_NUM BIT_ULL(32) +#defineCN23XX_PKT_INPUT_CTL_MAC_NUM BIT(29) +/* Number of instructions to be read in one MAC read request. 
+ * setting to Max value(4) + */ +#defineCN23XX_PKT_INPUT_CTL_RDSIZE (3 << 25) +#defineCN23XX_PKT_INPUT_CTL_IS_64B BIT(24) +#defineCN23XX_PKT_INPUT_CTL_RST BIT(23) +#defineCN23XX_PKT_INPUT_CTL_QUIET BIT(28) +#defineCN23XX_PKT_INPUT_CTL_RING_ENBBIT(22) +#defineCN23XX_PKT_INPUT_CTL_DATA_NS BIT(8) +#defineCN23XX_PKT_INPUT_CTL_DATA_ES_64B_SWAPBIT(6) +#define
[PATCH net-next 2/9] liquidio CN23XX: VF registration
Adds support for cn23xx VF probe and registration. Signed-off-by: Raghu VatsavayiSigned-off-by: Derek Chickles Signed-off-by: Satanand Burla Signed-off-by: Felix Manlunas --- drivers/net/ethernet/cavium/Kconfig| 12 +++ drivers/net/ethernet/cavium/liquidio/Makefile | 21 .../ethernet/cavium/liquidio/cn23xx_vf_device.h| 34 ++ drivers/net/ethernet/cavium/liquidio/lio_vf_main.c | 120 + .../net/ethernet/cavium/liquidio/octeon_device.c | 4 + 5 files changed, 191 insertions(+) create mode 100644 drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.h create mode 100644 drivers/net/ethernet/cavium/liquidio/lio_vf_main.c diff --git a/drivers/net/ethernet/cavium/Kconfig b/drivers/net/ethernet/cavium/Kconfig index 92f411c..c0679c2 100644 --- a/drivers/net/ethernet/cavium/Kconfig +++ b/drivers/net/ethernet/cavium/Kconfig @@ -74,4 +74,16 @@ config OCTEON_MGMT_ETHERNET port on Cavium Networks' Octeon CN57XX, CN56XX, CN55XX, CN54XX, CN52XX, and CN6XXX chips. +config LIQUIDIO_VF + tristate "Cavium LiquidIO VF support" + depends on 64BIT && PCI_MSI + select PTP_1588_CLOCK + ---help--- + This driver supports Cavium LiquidIO Intelligent Server Adapter + based on CN23XX chips. + + To compile this driver as a module, choose M here: The module + will be called liquidio_vf. 
MSI-X interrupt support is required + for this driver to work correctly + endif # NET_VENDOR_CAVIUM diff --git a/drivers/net/ethernet/cavium/liquidio/Makefile b/drivers/net/ethernet/cavium/liquidio/Makefile index 14958de..69d23fc 100644 --- a/drivers/net/ethernet/cavium/liquidio/Makefile +++ b/drivers/net/ethernet/cavium/liquidio/Makefile @@ -17,3 +17,24 @@ liquidio-$(CONFIG_LIQUIDIO) += lio_ethtool.o \ octeon_nic.o liquidio-objs := lio_main.o octeon_console.o $(liquidio-y) + +obj-$(CONFIG_LIQUIDIO_VF) += liquidio_vf.o + +ifeq ($(CONFIG_LIQUIDIO)$(CONFIG_LIQUIDIO_VF), yy) + liquidio_vf-objs := lio_vf_main.o +else +liquidio_vf-$(CONFIG_LIQUIDIO_VF) += lio_ethtool.o \ + lio_core.o \ + request_manager.o \ + response_manager.o \ + octeon_device.o\ + cn66xx_device.o\ + cn68xx_device.o\ + cn23xx_pf_device.o \ + octeon_mailbox.o \ + octeon_mem_ops.o \ + octeon_droq.o \ + octeon_nic.o + +liquidio_vf-objs := lio_vf_main.o $(liquidio_vf-y) +endif diff --git a/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.h b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.h new file mode 100644 index 000..015b6d4 --- /dev/null +++ b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.h @@ -0,0 +1,34 @@ +/** + * Author: Cavium, Inc. + * + * Contact: supp...@cavium.com + * Please include "LiquidIO" in the subject. + * + * Copyright (c) 2003-2016 Cavium, Inc. + * + * This file is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, Version 2, as + * published by the Free Software Foundation. + * + * This file is distributed in the hope that it will be useful, but + * AS-IS and WITHOUT ANY WARRANTY; without even the implied warranty + * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE, TITLE, or + * NONINFRINGEMENT. See the GNU General Public License for more details. + ***/ +/*! \file cn23xx_device.h + * \brief Host Driver: Routines that perform CN23XX specific operations. 
+ */ + +#ifndef __CN23XX_VF_DEVICE_H__ +#define __CN23XX_VF_DEVICE_H__ + +#include "cn23xx_vf_regs.h" + +/* Register address and configuration for a CN23XX devices. + * If device specific changes need to be made then add a struct to include + * device specific fields as shown in the commented section + */ +struct octeon_cn23xx_vf { + struct octeon_config *conf; +}; +#endif diff --git a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c new file mode 100644 index 000..d1b1a24 --- /dev/null +++ b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c @@ -0,0 +1,120 @@ +/** + * Author: Cavium, Inc. + * + * Contact: supp...@cavium.com + * Please include "LiquidIO" in the subject. + * + * Copyright (c) 2003-2016 Cavium, Inc. + * + * This file is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, Version
[PATCH net-next] bpf: samples: Fix compile of test_lru_dist.c
Build of samples/bpf on debian/jessie fails with:

  HOSTCC  /home/dsa/kernel-3.git/samples/bpf/test_lru_dist.o
/home/dsa/kernel-3.git/samples/bpf/test_lru_dist.c: In function ‘main’:
/home/dsa/kernel-3.git/samples/bpf/test_lru_dist.c:490:9: error: variable ‘r’ has initializer but incomplete type
  struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
  ^
/home/dsa/kernel-3.git/samples/bpf/test_lru_dist.c:490:21: error: ‘RLIM_INFINITY’ undeclared (first use in this function)
  struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
  ^
/home/dsa/kernel-3.git/samples/bpf/test_lru_dist.c:490:21: note: each undeclared identifier is reported only once for each function it appears in
/home/dsa/kernel-3.git/samples/bpf/test_lru_dist.c:490:9: warning: excess elements in struct initializer
  struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
  ^
/home/dsa/kernel-3.git/samples/bpf/test_lru_dist.c:490:9: warning: (near initialization for ‘r’)
/home/dsa/kernel-3.git/samples/bpf/test_lru_dist.c:490:9: warning: excess elements in struct initializer
/home/dsa/kernel-3.git/samples/bpf/test_lru_dist.c:490:9: warning: (near initialization for ‘r’)
/home/dsa/kernel-3.git/samples/bpf/test_lru_dist.c:490:16: error: storage size of ‘r’ isn’t known
  struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};

Add sys/resource.h to the include list.

Fixes: 5db58faf989f ("bpf: Add tests for the LRU bpf_htab")
Signed-off-by: David Ahern
Cc: Martin KaFai Lau
---
 samples/bpf/test_lru_dist.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/samples/bpf/test_lru_dist.c b/samples/bpf/test_lru_dist.c
index 2859977b7f37..bc4a2142eb91 100644
--- a/samples/bpf/test_lru_dist.c
+++ b/samples/bpf/test_lru_dist.c
@@ -16,6 +16,7 @@
 #include
 #include
 #include
+#include <sys/resource.h>
 #include
 #include
 #include
--
2.1.4
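As a quick illustration of why the one-line fix is enough: `struct rlimit` and `RLIM_INFINITY` are both declared by `<sys/resource.h>`, so the initializer from line 490 compiles once that header is included. The helper name `make_unlimited()` below is made up for this sketch and does not appear in test_lru_dist.c.

```c
#include <assert.h>
#include <sys/resource.h>	/* struct rlimit, RLIM_INFINITY */

/* Build the same initializer that failed to compile in the log above. */
struct rlimit make_unlimited(void)
{
	struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};

	assert(r.rlim_cur == r.rlim_max);	/* both fields got the same value */
	return r;
}
```

Without the include, the compiler sees `struct rlimit` only as an incomplete type, which matches the "incomplete type" and "storage size of ‘r’ isn’t known" errors quoted in the commit message.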
Re: [RFC net-next 2/3] net: dsa: Propagate VLAN add/del to CPU port(s)
On 11/22/2016 08:50 AM, Vivien Didelot wrote: > Hi Florian, > > Open question: will we need to do the same for FDB and MDB objects?

(Overlooked that question earlier this week.) I do expect that this could be helpful for FDB and MDB objects as well, yes.

> > Florian Fainelli writes: > >> Now that the bridge layer can call into switchdev to signal programming >> requests targeting the bridge master device itself, allow the switch >> drivers to implement separate programming of downstream and >> upstream/management ports. >> >> Signed-off-by: Vivien Didelot >> Signed-off-by: Florian Fainelli >> --- >> net/dsa/slave.c | 45 + >> 1 file changed, 33 insertions(+), 12 deletions(-) >> >> diff --git a/net/dsa/slave.c b/net/dsa/slave.c >> index d0c7bce88743..18288261b964 100644 >> --- a/net/dsa/slave.c >> +++ b/net/dsa/slave.c >> @@ -223,35 +223,30 @@ static int dsa_slave_set_mac_address(struct net_device >> *dev, void *a) >> return 0; >> } >> >> -static int dsa_slave_port_vlan_add(struct net_device *dev, >> +static int dsa_slave_port_vlan_add(struct dsa_switch *ds, int port, >> const struct switchdev_obj_port_vlan *vlan, >> struct switchdev_trans *trans) >> { >> -struct dsa_slave_priv *p = netdev_priv(dev); >> -struct dsa_switch *ds = p->parent; >> > > Extra newline ^.
> >> if (switchdev_trans_ph_prepare(trans)) { >> if (!ds->ops->port_vlan_prepare || !ds->ops->port_vlan_add) >> return -EOPNOTSUPP; >> >> -return ds->ops->port_vlan_prepare(ds, p->port, vlan, trans); >> +return ds->ops->port_vlan_prepare(ds, port, vlan, trans); >> } >> >> -ds->ops->port_vlan_add(ds, p->port, vlan, trans); >> +ds->ops->port_vlan_add(ds, port, vlan, trans); >> >> return 0; >> } >> >> -static int dsa_slave_port_vlan_del(struct net_device *dev, >> +static int dsa_slave_port_vlan_del(struct dsa_switch *ds, int port, >> const struct switchdev_obj_port_vlan *vlan) >> { >> -struct dsa_slave_priv *p = netdev_priv(dev); >> -struct dsa_switch *ds = p->parent; >> - >> if (!ds->ops->port_vlan_del) >> return -EOPNOTSUPP; >> >> -return ds->ops->port_vlan_del(ds, p->port, vlan); >> +return ds->ops->port_vlan_del(ds, port, vlan); >> } >> >> static int dsa_slave_port_vlan_dump(struct net_device *dev, >> @@ -465,8 +460,21 @@ static int dsa_slave_port_obj_add(struct net_device >> *dev, >>const struct switchdev_obj *obj, >>struct switchdev_trans *trans) >> { >> +struct dsa_slave_priv *p = netdev_priv(dev); >> +struct dsa_switch *ds = p->parent; >> +int port = p->port; >> int err; >> >> +/* Here we may be called with an orig_dev which is different from dev, >> + * on purpose, to receive request coming from e.g the bridge master >> + * device. Although there are no network device associated with CPU/DSA >> + * ports, we may still have programming operation for these ports. >> + */ >> +if (obj->orig_dev == p->bridge_dev) { >> +ds = ds->dst->ds[0]; >> +port = ds->dst->cpu_port; >> +} >> + >> /* For the prepare phase, ensure the full set of changes is feasable in >> * one go in order to signal a failure properly. If an operation is not >> * supported, return -EOPNOTSUPP. 
>> @@ -483,7 +491,7 @@ static int dsa_slave_port_obj_add(struct net_device *dev, >> trans); >> break; >> case SWITCHDEV_OBJ_ID_PORT_VLAN: >> -err = dsa_slave_port_vlan_add(dev, >> +err = dsa_slave_port_vlan_add(ds, port, >>SWITCHDEV_OBJ_PORT_VLAN(obj), >>trans); > > Note that dsa_slave_port_vlan_add() will be called N times, N being the > number of bridge ports. This is not an issue for the moment though. > Programming it only once requires caching, so leave it for an eventual > future patch. > > When issuing the following command (lan0 being a member of br0): > > # bridge vlan add vid 42 dev lan0 > > the CPU port is also programmed as tagged in VLAN 42. Is that expected? The first time the VLAN id is programmed to either lan0 or br0, and it did not exist prior to that call, it also gets populated into the bridge VLAN database, which is why both the lan0 interface and the CPU port get programmed. -- Florian
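For anyone following along without the DSA code at hand, the quoted patch combines two ideas: a prepare/commit split (prepare only checks feasibility and may return -EOPNOTSUPP, commit applies the change) and redirecting the request to the CPU port when it originates from the bridge master. Below is a stand-alone sketch of both under simplified assumptions. `struct sw_ops`, `target_port()`, `port_vlan_add()` and the `demo_*` callbacks are hypothetical names, not the switchdev or DSA API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum phase { PHASE_PREPARE, PHASE_COMMIT };

/* Hypothetical per-switch callbacks; either may be missing. */
struct sw_ops {
	int (*vlan_prepare)(int port, int vid);
	void (*vlan_add)(int port, int vid);
};

/* Demo driver state, so the effect of a commit can be observed. */
int demo_last_port = -1, demo_last_vid = -1;

int demo_prepare(int port, int vid)
{
	(void)port;
	return (vid >= 1 && vid <= 4094) ? 0 : -EINVAL;
}

void demo_add(int port, int vid)
{
	demo_last_port = port;
	demo_last_vid = vid;
}

/* Requests coming from the bridge master are steered to the CPU port. */
int target_port(int from_bridge_master, int user_port, int cpu_port)
{
	return from_bridge_master ? cpu_port : user_port;
}

/* Prepare validates without side effects; commit applies the change. */
int port_vlan_add(const struct sw_ops *ops, int port, int vid, enum phase ph)
{
	assert(ops);
	if (ph == PHASE_PREPARE) {
		if (!ops->vlan_prepare || !ops->vlan_add)
			return -EOPNOTSUPP;
		return ops->vlan_prepare(port, vid);
	}
	ops->vlan_add(port, vid);
	return 0;
}
```

The point of the split is that a caller can run the prepare phase for the full set of changes first and only commit if every one of them succeeded, so a failure is reported before any hardware state is touched.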
Re: [net-next PATCH v2 3/5] virtio_net: Add XDP support
On Sun, Nov 27, 2016 at 07:56:09PM -0800, John Fastabend wrote: > On 16-11-27 07:36 PM, Michael S. Tsirkin wrote: > > On Fri, Nov 25, 2016 at 01:24:03PM -0800, John Fastabend wrote: > >> On 16-11-22 06:58 AM, Michael S. Tsirkin wrote: > >>> On Tue, Nov 22, 2016 at 12:27:03AM -0800, John Fastabend wrote: > On 16-11-21 03:20 PM, Michael S. Tsirkin wrote: > > On Sat, Nov 19, 2016 at 06:50:33PM -0800, John Fastabend wrote: > >> From: Shrijeet Mukherjee> >> > >> This adds XDP support to virtio_net. Some requirements must be > >> met for XDP to be enabled depending on the mode. First it will > >> only be supported with LRO disabled so that data is not pushed > >> across multiple buffers. The MTU must be less than a page size > >> to avoid having to handle XDP across multiple pages. > >> > >> If mergeable receive is enabled this first series only supports > >> the case where header and data are in the same buf which we can > >> check when a packet is received by looking at num_buf. If the > >> num_buf is greater than 1 and a XDP program is loaded the packet > >> is dropped and a warning is thrown. When any_header_sg is set this > >> does not happen and both header and data is put in a single buffer > >> as expected so we check this when XDP programs are loaded. Note I > >> have only tested this with Linux vhost backend. > >> > >> If big packets mode is enabled and MTU/LRO conditions above are > >> met then XDP is allowed. > >> > >> A follow on patch can be generated to solve the mergeable receive > >> case with num_bufs equal to 2. Buffers greater than two may not > >> be handled has easily. > > > > > > I would very much prefer support for other layouts without drops > > before merging this. > > header by itself can certainly be handled by skipping it. > > People wanted to use that e.g. for zero copy. > > OK fair enough I'll do this now rather than push it out. 
> > >> > >> Hi Michael, > >> > >> The header skip logic however complicates the xmit handling a fair > >> amount. Specifically when we release the buffers after xmit then > >> both the hdr and data portions need to be released which requires > >> some tracking. > > > > I thought you disable all checksum offloads so why not discard the > > header immediately? > > Well in the "normal" case where the header is part of the same buffer > we keep it to use the same space for the header on the TX path. > > If we discard it in the header split case we have to push the header > somewhere else. In the skb case the cb[] region is used it looks like. > In our case I guess free space at the end of the page could be used. You don't have to put start of page in a buffer, you can put an offset there. Will result in some waste in the common case, but it's just several bytes so likely not a big deal. > My thinking is if we handle the general case of more than one buffer > being used with a copy we can handle the case above using the same > logic and no need to handle it as a special case. It seems to be an odd > case that doesn't really exist anyways. At least not in qemu/Linux. I > have not tested anything else. OK > > > >> Is the header split logic actually in use somewhere today? It looks > >> like its not being used in Linux case. And zero copy RX is currently as > >> best I can tell not supported anywhere so I would prefer not to > >> complicate the XDP path at the moment with a possible future feature. > > > > Well it's part of the documented interface so we never > > know who implemented it. Normally if we want to make > > restrictions we would do the reverse and add a feature. > > > > We can do this easily, but I'd like to first look into > > just handling all possible inputs as the spec asks us to. > > I'm a bit too busy with other stuff next week but will > > look into this a week after that if you don't beat me to it. 
> > > > Well I've almost got it working now with some logic to copy everything > into a single page if we hit this case so should be OK but slow. I'll > finish testing this and send it out hopefully in the next few days. > > > > > Anything else can be handled by copying the packet. > >> > >> Any idea how to test this? At the moment I have some code to linearize > >> the data in all cases with more than a single buffer. But wasn't clear > >> to me which features I could negotiate with vhost/qemu to get more than > >> a single buffer in the receive path. > >> > >> Thanks, > >> John > > > > ATM you need to hack qemu. Here's a hack to make header completely > > separate. > > > > Perfect! hacking qemu for testing is no problem this helps a lot thanks > and saves me time trying to figure out how to get qemu to do this. Pls note I didn't try this at all, so might not work, but should give you the idea. > > > > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c > >
Re: [net-next PATCH v2 3/5] virtio_net: Add XDP support
On 16-11-27 07:36 PM, Michael S. Tsirkin wrote: > On Fri, Nov 25, 2016 at 01:24:03PM -0800, John Fastabend wrote: >> On 16-11-22 06:58 AM, Michael S. Tsirkin wrote: >>> On Tue, Nov 22, 2016 at 12:27:03AM -0800, John Fastabend wrote: On 16-11-21 03:20 PM, Michael S. Tsirkin wrote: > On Sat, Nov 19, 2016 at 06:50:33PM -0800, John Fastabend wrote: >> From: Shrijeet Mukherjee>> >> This adds XDP support to virtio_net. Some requirements must be >> met for XDP to be enabled depending on the mode. First it will >> only be supported with LRO disabled so that data is not pushed >> across multiple buffers. The MTU must be less than a page size >> to avoid having to handle XDP across multiple pages. >> >> If mergeable receive is enabled this first series only supports >> the case where header and data are in the same buf which we can >> check when a packet is received by looking at num_buf. If the >> num_buf is greater than 1 and a XDP program is loaded the packet >> is dropped and a warning is thrown. When any_header_sg is set this >> does not happen and both header and data is put in a single buffer >> as expected so we check this when XDP programs are loaded. Note I >> have only tested this with Linux vhost backend. >> >> If big packets mode is enabled and MTU/LRO conditions above are >> met then XDP is allowed. >> >> A follow on patch can be generated to solve the mergeable receive >> case with num_bufs equal to 2. Buffers greater than two may not >> be handled has easily. > > > I would very much prefer support for other layouts without drops > before merging this. > header by itself can certainly be handled by skipping it. > People wanted to use that e.g. for zero copy. OK fair enough I'll do this now rather than push it out. >> >> Hi Michael, >> >> The header skip logic however complicates the xmit handling a fair >> amount. Specifically when we release the buffers after xmit then >> both the hdr and data portions need to be released which requires >> some tracking. 
> > I thought you disable all checksum offloads so why not discard the > header immediately? Well in the "normal" case where the header is part of the same buffer we keep it to use the same space for the header on the TX path. If we discard it in the header split case we have to push the header somewhere else. In the skb case the cb[] region is used it looks like. In our case I guess free space at the end of the page could be used. My thinking is if we handle the general case of more than one buffer being used with a copy we can handle the case above using the same logic and no need to handle it as a special case. It seems to be an odd case that doesn't really exist anyways. At least not in qemu/Linux. I have not tested anything else. > >> Is the header split logic actually in use somewhere today? It looks >> like its not being used in Linux case. And zero copy RX is currently as >> best I can tell not supported anywhere so I would prefer not to >> complicate the XDP path at the moment with a possible future feature. > > Well it's part of the documented interface so we never > know who implemented it. Normally if we want to make > restrictions we would do the reverse and add a feature. > > We can do this easily, but I'd like to first look into > just handling all possible inputs as the spec asks us to. > I'm a bit too busy with other stuff next week but will > look into this a week after that if you don't beat me to it. > Well I've almost got it working now with some logic to copy everything into a single page if we hit this case so should be OK but slow. I'll finish testing this and send it out hopefully in the next few days. > > Anything else can be handled by copying the packet. >> >> Any idea how to test this? At the moment I have some code to linearize >> the data in all cases with more than a single buffer. But wasn't clear >> to me which features I could negotiate with vhost/qemu to get more than >> a single buffer in the receive path. 
>> >> Thanks, >> John > > ATM you need to hack qemu. Here's a hack to make header completely > separate. > Perfect! hacking qemu for testing is no problem this helps a lot thanks and saves me time trying to figure out how to get qemu to do this. > > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c > index b68c69d..4866144 100644 > --- a/hw/net/virtio-net.c > +++ b/hw/net/virtio-net.c > @@ -1164,6 +1164,7 @@ static ssize_t virtio_net_receive(NetClientState *nc, > const uint8_t *buf, size_t > offset = n->host_hdr_len; > total += n->guest_hdr_len; > guest_offset = n->guest_hdr_len; > +continue; > } else { > guest_offset = 0; > } > > > > here's one that should cap the 1st s/g to 100 bytes: > > > diff --git a/hw/net/virtio-net.c
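Editorial sketch of the "copy everything into a single page" fallback John mentions above. It is not the virtio_net code: `linearize_bufs`, `SKETCH_PAGE_SIZE` and the use of userspace `struct iovec` are assumptions made only to show the idea that any multi-buffer receive can be reduced to the single-buffer case by copying, at the cost of speed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/* Assumed page size for this sketch; a driver would use PAGE_SIZE. */
#define SKETCH_PAGE_SIZE ((size_t)4096)

/*
 * Copy num_buf receive buffers back to back into one page-sized buffer.
 * Returns the linearized length, or -1 if the data does not fit in a
 * page (the caller would then drop the packet).
 */
static long linearize_bufs(const struct iovec *bufs, size_t num_buf,
                           unsigned char *page)
{
    size_t off = 0;

    assert(bufs != NULL && page != NULL);

    for (size_t i = 0; i < num_buf; i++) {
        /* off never exceeds SKETCH_PAGE_SIZE, so this cannot underflow. */
        if (bufs[i].iov_len > SKETCH_PAGE_SIZE - off)
            return -1;
        memcpy(page + off, bufs[i].iov_base, bufs[i].iov_len);
        off += bufs[i].iov_len;
    }
    return (long)off;
}
```

With this shape the header-only first buffer and the capped first s/g entry produced by the qemu hacks would both take the same slow path, so neither needs special handling.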
Re: [net-next PATCH v2 3/5] virtio_net: Add XDP support
On Fri, Nov 25, 2016 at 01:24:03PM -0800, John Fastabend wrote: > On 16-11-22 06:58 AM, Michael S. Tsirkin wrote: > > On Tue, Nov 22, 2016 at 12:27:03AM -0800, John Fastabend wrote: > >> On 16-11-21 03:20 PM, Michael S. Tsirkin wrote: > >>> On Sat, Nov 19, 2016 at 06:50:33PM -0800, John Fastabend wrote: > From: Shrijeet Mukherjee> > This adds XDP support to virtio_net. Some requirements must be > met for XDP to be enabled depending on the mode. First it will > only be supported with LRO disabled so that data is not pushed > across multiple buffers. The MTU must be less than a page size > to avoid having to handle XDP across multiple pages. > > If mergeable receive is enabled this first series only supports > the case where header and data are in the same buf which we can > check when a packet is received by looking at num_buf. If the > num_buf is greater than 1 and a XDP program is loaded the packet > is dropped and a warning is thrown. When any_header_sg is set this > does not happen and both header and data is put in a single buffer > as expected so we check this when XDP programs are loaded. Note I > have only tested this with Linux vhost backend. > > If big packets mode is enabled and MTU/LRO conditions above are > met then XDP is allowed. > > A follow on patch can be generated to solve the mergeable receive > case with num_bufs equal to 2. Buffers greater than two may not > be handled has easily. > >>> > >>> > >>> I would very much prefer support for other layouts without drops > >>> before merging this. > >>> header by itself can certainly be handled by skipping it. > >>> People wanted to use that e.g. for zero copy. > >> > >> OK fair enough I'll do this now rather than push it out. > >> > > Hi Michael, > > The header skip logic however complicates the xmit handling a fair > amount. Specifically when we release the buffers after xmit then > both the hdr and data portions need to be released which requires > some tracking. 
I thought you disable all checksum offloads so why not discard the header immediately? > Is the header split logic actually in use somewhere today? It looks > like its not being used in Linux case. And zero copy RX is currently as > best I can tell not supported anywhere so I would prefer not to > complicate the XDP path at the moment with a possible future feature. Well it's part of the documented interface so we never know who implemented it. Normally if we want to make restrictions we would do the reverse and add a feature. We can do this easily, but I'd like to first look into just handling all possible inputs as the spec asks us to. I'm a bit too busy with other stuff next week but will look into this a week after that if you don't beat me to it. > >>> > >>> Anything else can be handled by copying the packet. > > Any idea how to test this? At the moment I have some code to linearize > the data in all cases with more than a single buffer. But wasn't clear > to me which features I could negotiate with vhost/qemu to get more than > a single buffer in the receive path. > > Thanks, > John ATM you need to hack qemu. Here's a hack to make header completely separate. diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index b68c69d..4866144 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -1164,6 +1164,7 @@ static ssize_t virtio_net_receive(NetClientState *nc, const uint8_t *buf, size_t offset = n->host_hdr_len; total += n->guest_hdr_len; guest_offset = n->guest_hdr_len; +continue; } else { guest_offset = 0; } here's one that should cap the 1st s/g to 100 bytes: diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index b68c69d..7943004 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -1164,6 +1164,7 @@ static ssize_t virtio_net_receive(NetClientState *nc, const uint8_t *buf, size_t offset = n->host_hdr_len; total += n->guest_hdr_len; guest_offset = n->guest_hdr_len; +sg.iov_len = MIN(sg.iov_len, 100); } else { guest_offset = 0; }
Re: [net,v2] neigh: fix the loop index error in neigh dump
On 11/27/16 7:56 PM, David Ahern wrote: > On 11/27/16 7:53 PM, 张胜举 wrote: >> >> >>> -Original Message- >>> From: David Ahern [mailto:d...@cumulusnetworks.com] >>> Sent: Monday, November 28, 2016 10:39 AM >>> To: 张胜举; >>> netdev@vger.kernel.org >>> Subject: Re: [net,v2] neigh: fix the loop index error in neigh dump >>> >>> On 11/27/16 7:34 PM, 张胜举 wrote: > -Original Message- > From: David Ahern [mailto:d...@cumulusnetworks.com] > Sent: Monday, November 28, 2016 10:10 AM > To: Zhang Shengju ; > netdev@vger.kernel.org > Subject: Re: [net,v2] neigh: fix the loop index error in neigh dump > > On 11/27/16 6:32 PM, Zhang Shengju wrote: >> Loop index in neigh dump function is not updated correctly under >> some circumstances, this patch will fix it. > > What's an example? If dev is filtered out, the original code goes to next loop without updating loop index 'idx'. >>> >>> And you have a use case with missing or redundant data? Or is your >>> comment based on a review of code only? >> It's on my code review. No use case currently, this is uncommon to happen. >> >> >>> > You are completely rewriting the dump loops. I put 'idx++' into for loop, so I replace 'goto' with 'continue'. The other change is style related. >>> >>> A "fixes" should not include 'style related' changes. >> Okay, I will send another version without style changes. >> > > Personally, I think you need to produce a use case that fails before sending > another patch. I have not seen a problem with this code. > And looking back at 3f0ae05d6f I should not have acked it (reviewed it too quickly while on PTO). Your change is a no-op because of what idx represents - the position in the hash list for devices relevant for the dump request. Same goes for the neigh dump so this patch is not needed.
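Editorial sketch of a resumable dump loop in which `idx` counts only the entries relevant to the request, which is how David describes `idx` above. This is not the kernel's neigh dump code; `sketch_entry`, `sketch_dump` and the `budget` parameter are hypothetical names used for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table entry: owning device id and a payload value. */
struct sketch_entry {
    int dev;
    int val;
};

/*
 * Dump entries that belong to filter_dev, resuming at position *s_idx
 * among the relevant entries and writing at most budget values to out.
 * Returns the number of values written and updates *s_idx so that the
 * next call continues where this one stopped.
 */
static size_t sketch_dump(const struct sketch_entry *tbl, size_t n,
                          int filter_dev, size_t *s_idx,
                          int *out, size_t budget)
{
    size_t idx = 0, written = 0;

    assert(tbl != NULL && s_idx != NULL && out != NULL);

    for (size_t i = 0; i < n; i++) {
        if (tbl[i].dev != filter_dev)
            continue;           /* filtered out: idx is not advanced */
        if (idx >= *s_idx) {
            if (written == budget)
                break;          /* output full: resume here next time */
            out[written++] = tbl[i].val;
        }
        idx++;
    }
    *s_idx = idx;
    return written;
}
```

Because filtered entries never advance `idx`, skipping them with `continue` does not lose or repeat anything across calls, as long as the same rule is applied on every pass.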
Re: [Patch net-next] net_sched: move the empty tp check from ->destroy() to ->delete()
On 16-11-23 05:58 PM, Cong Wang wrote: > Roi reported we could have a race condition where in ->classify() path > we dereference tp->root and meanwhile a parallel ->destroy() makes it > a NULL. > > This is possible because ->destroy() could be called when deleting > a filter to check if we are the last one in tp, this tp is still > linked and visible at that time. > > The root cause of this problem is the semantic of ->destroy(), it > does two things (for non-force case): > > 1) check if tp is empty > 2) if tp is empty we could really destroy it > > and its caller, if cares, needs to check its return value to see if > it is really destroyed. Therefore we can't unlink tp unless we know > it is empty. > > As suggested by Daniel, we could actually move the test logic to ->delete() > so that we can safely unlink tp after ->delete() tells us the last one is > just deleted and before ->destroy(). > > What's more, even we unlink it before ->destroy(), it could still have > readers since we don't wait for a grace period here, we should not modify > tp->root in ->destroy() either. > > Fixes: 1e052be69d04 ("net_sched: destroy proto tp when all filters are gone") > Reported-by: Roi Dayan> Cc: Daniel Borkmann > Cc: John Fastabend > Signed-off-by: Cong Wang > --- Hi Cong, Thanks a lot for doing this. Can you rebase it on top of Daniel's patch though, [PATCH net] net, sched: respect rcu grace period on cls destruction And then push the NULL pointer work for the cls_fw and cls_route classifiers into another patch. Then I believe the last thing to make this correct is to convert the call_rcu() paths to call_rcu_bh(). .John
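Editorial sketch of the ordering Cong's commit message argues for: `->delete()` reports whether the last filter just went away, the caller unlinks the tp, and only then `->destroy()` runs. The struct, the `last` out-parameter and all `sketch_*` names are assumptions for illustration, not the signatures used by the patch, and RCU deferral is only mentioned in a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a classifier instance (tp). */
struct sketch_tp {
    int nfilters;    /* filters still attached */
    bool linked;     /* still reachable by readers on the chain */
    bool destroyed;  /* destroy step has run */
};

/* delete step: remove one filter and report whether it was the last. */
static void sketch_delete(struct sketch_tp *tp, bool *last)
{
    assert(tp != NULL && last != NULL && tp->nfilters > 0);
    tp->nfilters--;
    *last = (tp->nfilters == 0);
}

/* destroy step: must only ever see a tp that is already unlinked. */
static void sketch_destroy(struct sketch_tp *tp)
{
    assert(tp != NULL && !tp->linked);
    tp->destroyed = true;
}

/* Caller side: unlink strictly between delete and destroy. */
static void sketch_del_filter(struct sketch_tp *tp)
{
    bool last = false;

    sketch_delete(tp, &last);
    if (last) {
        tp->linked = false;   /* no new reader can find tp from here on */
        sketch_destroy(tp);   /* real code would also wait out existing readers */
    }
}
```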
Re: [net,v2] neigh: fix the loop index error in neigh dump
On 11/27/16 7:53 PM, 张胜举 wrote: > > >> -Original Message- >> From: David Ahern [mailto:d...@cumulusnetworks.com] >> Sent: Monday, November 28, 2016 10:39 AM >> To: 张胜举; >> netdev@vger.kernel.org >> Subject: Re: [net,v2] neigh: fix the loop index error in neigh dump >> >> On 11/27/16 7:34 PM, 张胜举 wrote: -Original Message- From: David Ahern [mailto:d...@cumulusnetworks.com] Sent: Monday, November 28, 2016 10:10 AM To: Zhang Shengju ; netdev@vger.kernel.org Subject: Re: [net,v2] neigh: fix the loop index error in neigh dump On 11/27/16 6:32 PM, Zhang Shengju wrote: > Loop index in neigh dump function is not updated correctly under > some circumstances, this patch will fix it. What's an example? >>> >>> If dev is filtered out, the original code goes to next loop without >>> updating loop index 'idx'. >> >> And you have a use case with missing or redundant data? Or is your >> comment based on a review of code only? > It's on my code review. No use case currently, this is uncommon to happen. > > >> You are completely rewriting the dump loops. >>> >>> I put 'idx++' into for loop, so I replace 'goto' with 'continue'. >>> The other change is style related. >> >> A "fixes" should not include 'style related' changes. > Okay, I will send another version without style changes. > Personally, I think you need to produce a use case that fails before sending another patch. I have not seen a problem with this code.
Re: [PATCH net] net, sched: respect rcu grace period on cls destruction
On 16-11-26 04:18 PM, Daniel Borkmann wrote: > Roi reported a crash in flower where tp->root was NULL in ->classify() > callbacks. Reason is that in ->destroy() tp->root is set to NULL via > RCU_INIT_POINTER(). It's problematic for some of the classifiers, because > this doesn't respect RCU grace period for them, and as a result, still > outstanding readers from tc_classify() will try to blindly dereference > a NULL tp->root. > > The tp->root object is strictly private to the classifier implementation > and holds internal data the core such as tc_ctl_tfilter() doesn't know > about. Within some classifiers, such as cls_bpf, cls_basic, etc, tp->root > is only checked for NULL in ->get() callback, but nowhere else. This is > misleading and seemed to be copied from old classifier code that was not > cleaned up properly. For example, d3fa76ee6b4a ("[NET_SCHED]: cls_basic: > fix NULL pointer dereference") moved tp->root initialization into ->init() > routine, where before it was part of ->change(), so ->get() had to deal > with tp->root being NULL back then, so that was indeed a valid case, after > d3fa76ee6b4a, not really anymore. We used to set tp->root to NULL long > ago in ->destroy(), see 47a1a1d4be29 ("pkt_sched: remove unnecessary xchg() > in packet classifiers"); but the NULLifying was reintroduced with the > RCUification, but it's not correct for every classifier implementation. > > In the cases that are fixed here with one exception of cls_cgroup, tp->root > object is allocated and initialized inside ->init() callback, which is always > performed at a point in time after we allocate a new tp, which means tp and > thus tp->root was not globally visible in the tp chain yet (see > tc_ctl_tfilter()). > Also, on destruction tp->root is strictly kfree_rcu()'ed in ->destroy() > handler, same for the tp which is kfree_rcu()'ed right when we return > from ->destroy() in tcf_destroy(). 
This means, the head object's lifetime > for such classifiers is always tied to the tp lifetime. The RCU callback > invocation for the two kfree_rcu() could be out of order, but that's fine > since both are independent. > > Dropping the RCU_INIT_POINTER(tp->root, NULL) for these classifiers here > means that 1) we don't need a useless NULL check in fast-path and, 2) that > outstanding readers of that tp in tc_classify() can still execute under > respect with RCU grace period as it is actually expected. > > Things that haven't been touched here: cls_fw and cls_route. They each > handle tp->root being NULL in ->classify() path for historic reasons, so > their ->destroy() implementation can stay as is. If someone actually > cares, they could get cleaned up at some point to avoid the test in fast > path. cls_u32 doesn't set tp->root to NULL. For cls_rsvp, I just added a > !head should anyone actually be using/testing it, so it at least aligns with > cls_fw and cls_route. For cls_flower we additionally need to defer rhashtable > destruction (to a sleepable context) after RCU grace period as concurrent > readers might still access it. (Note that in this case we need to hold module > reference to keep work callback address intact, since we only wait on module > unload for all call_rcu()s to finish.) > > This fixes one race to bring RCU grace period guarantees back. Next step > as worked on by Cong however is to fix 1e052be69d04 ("net_sched: destroy > proto tp when all filters are gone") to get the order of unlinking the tp > in tc_ctl_tfilter() for the RTM_DELTFILTER case right by moving > RCU_INIT_POINTER() before tcf_destroy() and let the notification for > removal be done through the prior ->delete() callback. Both are independant > issues. 
Once we have that right, we can then clean tp->root up for a number > of classifiers by not making them RCU pointers, which requires a new callback > (->uninit) that is triggered from tp's RCU callback, where we just kfree() > tp->root from there. Thanks looks good to me and appreciate the detailed commit message. Acked-by: John Fastabend> > Fixes: 1f947bf151e9 ("net: sched: rcu'ify cls_bpf") > Fixes: 9888faefe132 ("net: sched: cls_basic use RCU") > Fixes: 70da9f0bf999 ("net: sched: cls_flow use RCU") > Fixes: 77b9900ef53a ("tc: introduce Flower classifier") > Fixes: bf3994d2ed31 ("net/sched: introduce Match-all classifier") > Fixes: 952313bd6258 ("net: sched: cls_cgroup use RCU") > Reported-by: Roi Dayan > Signed-off-by: Daniel Borkmann > Cc: Cong Wang > Cc: John Fastabend > Cc: Roi Dayan > Cc: Jiri Pirko > ---
RE: [net,v2] neigh: fix the loop index error in neigh dump
> -----Original Message-----
> From: David Ahern [mailto:d...@cumulusnetworks.com]
> Sent: Monday, November 28, 2016 10:39 AM
> To: 张胜举;
> netdev@vger.kernel.org
> Subject: Re: [net,v2] neigh: fix the loop index error in neigh dump
>
> On 11/27/16 7:34 PM, 张胜举 wrote:
> >> -----Original Message-----
> >> From: David Ahern [mailto:d...@cumulusnetworks.com]
> >> Sent: Monday, November 28, 2016 10:10 AM
> >> To: Zhang Shengju ;
> >> netdev@vger.kernel.org
> >> Subject: Re: [net,v2] neigh: fix the loop index error in neigh dump
> >>
> >> On 11/27/16 6:32 PM, Zhang Shengju wrote:
> >>> Loop index in neigh dump function is not updated correctly under
> >>> some circumstances, this patch will fix it.
> >>
> >> What's an example?
> >
> > If dev is filtered out, the original code goes to next loop without
> > updating loop index 'idx'.
>
> And you have a use case with missing or redundant data? Or is your
> comment based on a review of code only?

It's on my code review. No use case currently, this is uncommon to happen.

> >
> >> You are completely rewriting the dump loops.
> >
> > I put 'idx++' into for loop, so I replace 'goto' with 'continue'.
> > The other change is style related.
>
> A "fixes" should not include 'style related' changes.

Okay, I will send another version without style changes.
[PATCH] net: handle no dst on skb in icmp6_send
Andrey reported the following while fuzzing the kernel with syzkaller: kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: [#1] SMP KASAN Modules linked in: CPU: 0 PID: 3859 Comm: a.out Not tainted 4.9.0-rc6+ #429 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 task: 8800666d4200 task.stack: 880067348000 RIP: 0010:[] [] icmp6_send+0x5fc/0x1e30 net/ipv6/icmp.c:451 RSP: 0018:88006734f2c0 EFLAGS: 00010206 RAX: 8800666d4200 RBX: RCX: RDX: RSI: dc00 RDI: 0018 RBP: 88006734f630 R08: 880064138418 R09: 0003 R10: dc00 R11: 0005 R12: R13: 84e7e200 R14: 880064138484 R15: 8800641383c0 FS: 7fb3887a07c0() GS:88006cc0() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 2000 CR3: 6b04 CR4: 06f0 Stack: 8800666d4200 8800666d49f8 8800666d4200 84c02460 8800666d4a1a 11000ccdaa2f 88006734f498 0046 88006734f440 832f4269 880064ba7456 Call Trace: [] icmpv6_param_prob+0x2c/0x40 net/ipv6/icmp.c:557 [< inline >] ip6_tlvopt_unknown net/ipv6/exthdrs.c:88 [] ip6_parse_tlv+0x555/0x670 net/ipv6/exthdrs.c:157 [] ipv6_parse_hopopts+0x199/0x460 net/ipv6/exthdrs.c:663 [] ipv6_rcv+0xfa3/0x1dc0 net/ipv6/ip6_input.c:191 ... icmp6_send / icmpv6_send is invoked for both rx and tx paths. In both cases the dst->dev should be preferred for determining the L3 domain if the dst has been set on the skb. Fallback to the skb->dev if it has not. This covers the case reported here where icmp6_send is invoked on Rx before the route lookup. 
Fixes: 5d41ce29e ("net: icmp6_send should use dst dev to determine L3 domain")
Reported-by: Andrey Konovalov
Signed-off-by: David Ahern
---
 net/ipv6/icmp.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c
index 7370ad2e693a..2772004ba5a1 100644
--- a/net/ipv6/icmp.c
+++ b/net/ipv6/icmp.c
@@ -447,8 +447,10 @@ static void icmp6_send(struct sk_buff *skb, u8 type, u8 code, __u32 info,
 	if (__ipv6_addr_needs_scope_id(addr_type))
 		iif = skb->dev->ifindex;
-	else
-		iif = l3mdev_master_ifindex(skb_dst(skb)->dev);
+	else {
+		dst = skb_dst(skb);
+		iif = l3mdev_master_ifindex(dst ? dst->dev : skb->dev);
+	}
 
 	/*
 	 * Must not send error if the source does not uniquely
-- 
2.1.4
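Editorial sketch of the fallback in the patch above, reduced to plain C so it can be compiled outside the kernel. The structs are minimal stand-ins rather than the real `sk_buff`, `dst_entry` or `net_device`, and `sketch_master_ifindex()` only imitates the role of `l3mdev_master_ifindex()`; all `sketch_*` names are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-ins for the kernel structures; not the real layouts. */
struct sketch_net_device {
    int ifindex;
    int master_ifindex;   /* 0 when the device has no L3 master */
};

struct sketch_dst_entry {
    struct sketch_net_device *dev;
};

struct sketch_sk_buff {
    struct sketch_net_device *dev;
    struct sketch_dst_entry *dst;   /* NULL on Rx before the route lookup */
};

/* Stand-in for l3mdev_master_ifindex(). */
static int sketch_master_ifindex(const struct sketch_net_device *dev)
{
    return dev ? dev->master_ifindex : 0;
}

/* Prefer dst->dev when a dst is attached, otherwise fall back to skb->dev. */
static int sketch_pick_iif(const struct sketch_sk_buff *skb)
{
    const struct sketch_dst_entry *dst;

    assert(skb != NULL);
    dst = skb->dst;
    return sketch_master_ifindex(dst ? dst->dev : skb->dev);
}
```

The unpatched expression corresponds to dereferencing `skb->dst` unconditionally, which is the NULL dereference the syzkaller report hit on the receive path.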
Re: [Patch net-next] net_sched: move the empty tp check from ->destroy() to ->delete()
On 16-11-27 06:26 PM, John Fastabend wrote: > On 16-11-26 10:29 PM, Roi Dayan wrote: >> >> >> On 27/11/2016 06:47, Roi Dayan wrote: >>> >>> >>> On 27/11/2016 02:33, Daniel Borkmann wrote: On 11/26/2016 12:09 PM, Daniel Borkmann wrote: > On 11/26/2016 07:46 AM, Cong Wang wrote: >> On Thu, Nov 24, 2016 at 7:20 AM, Daniel Borkmann >>wrote: [...] >>> Ok, strange, qdisc_destroy() calls into ops->destroy(), where ingress >>> drops its entire chain via tcf_destroy_chain(), so that will be NULL >>> eventually. The tps are freed by call_rcu() as well as qdisc itself >>> later on via qdisc_rcu_free(), where it frees per-cpu bstats as well. >>> Outstanding readers should either bail out due to if (!cl) or can >>> still >>> process the chain until read section ends, but during that time, >>> cl->q >>> resp. bstats should be good. Do you happen to know what's at address >>> 880a68b04028? I was wondering wrt call_rcu() vs call_rcu_bh(), >>> but >>> at least on ingress (netif_receive_skb_internal()) we hold >>> rcu_read_lock() >>> here. The KASAN report is reliably happening at this location, right? >> >> I am confused as well, I don't see how it could be related to my >> patch yet. >> I will take a deep look in the weekend. >>> >>> >>> >>> Hi Cong, >>> >>> When reported the new trace I didn't mean it's related to your patch, >>> I just wanted to point it out it exposed something. I should have been >>> clear about it. >>> >>> > > Ok, I'm currently on the run. Got too late yesterday night, but I'll > write what I found in the evening today, not related to ingress though. Just pushed out my analysis to netdev under "[PATCH net] net, sched: respect rcu grace period on cls destruction". My conclusion is that both issues are actually separate, and that one is small enough where we could route it via net actually. 
Perhaps this at the same time shrinks your "[PATCH net-next] net_sched: move the empty tp check from ->destroy() to ->delete()" to a reasonable size that it's suitable to net as well. Your ->delete()/->destroy() one is definitely needed, too. The tp->root one is independant of ->delete()/ ->destroy() as they are different races and tp->root could also happen when you just destroy the whole tp directly. I think that seems like a good path forward to me. Thanks, Daniel >>> >>> >>> >>> Hi Daniel, >>> >>> As for the tainted kernel. I was in old (week or two) net-next tree >>> and only cherry-picked from latest net-next related patches to >>> Mellanox HCA, cls_api, cls_flower, devlink. so those are the tainted >>> modules. >>> I have the issue reproducing in that tree so wanted it to check it >>> with Cong's patch instead of latest net-next. >>> I'll try running reproducing the issue with your new patch and later >>> try latest net-next as well. >>> >>> Thanks, >>> Roi >>> >> >> Hi, >> >> I tested "[PATCH net] net, sched: respect rcu grace period on cls >> destruction" and could not reproduce my original issue. > > Hi Roi, > > Just so I'm 100% clear. No issue with just the above "respect rcu grace > period on cls destruction" per above statement. > >> I rebased "[Patch net-next] net_sched: move the empty tp check from >> ->destroy() to ->delete()" over to test it in the same tree and got into >> a new trace in fl_delete. > > In this case did you test with "net_sched: move the empty tp check from > ->destroy() to ->delete()" _only_ or did this include both patches when > you see the error below. > > From my inspection we really need both patches to get correct behavior. > > Thanks! > John Ah dang nevermind I just read both patches in detail and applying them both at the same time is nonsense. Let me reply with comments directly to the patches. Thanks. sorry for the noise. 
> >> >> [35659.012123] BUG: KASAN: wild-memory-access on address 1803ca31 >> [35659.020042] Write of size 1 by task ovs-vswitchd/20135 >> [35659.025878] CPU: 19 PID: 20135 Comm: ovs-vswitchd Tainted: >> G O4.9.0-rc3+ #18 >> [35659.035948] Hardware name: HP ProLiant DL380p Gen8, BIOS P70 07/01/2015 >> [35659.043730] Call Trace: >> [35659.046619] [] dump_stack+0x63/0x81 >> [35659.052456] [] kasan_report_error+0x408/0x4e0 >> [35659.059402] [] kasan_report+0x58/0x60 >> [35659.065428] [] ? call_rcu_sched+0x1d/0x20 >> [35659.072119] [] ? fl_destroy_filter+0x21/0x30 >> [cls_flower] >> [35659.080217] [] ? fl_delete+0x1df/0x2e0 [cls_flower] >> [35659.087580] [] __asan_store1+0x4a/0x50 >> [35659.093697] [] fl_delete+0x1df/0x2e0 [cls_flower] >> [35659.100870] [] tc_ctl_tfilter+0x10da/0x1b90 >> >> >> 0x1d02 is in fl_delete (net/sched/cls_flower.c:805). >> 800 struct cls_fl_filter *f = (struct cls_fl_filter *) arg; >> 801 >> 802 rhashtable_remove_fast(>ht,
[PATCH net-next v3 2/4] Documentation: net: phy: Add a paragraph about pause frames/flow control
Describe that the Ethernet MAC controller is ultimately responsible for
dealing with proper pause frames/flow control advertisement and
enabling, and that it is therefore allowed to have it change
phydev->supported/advertising with SUPPORTED_Pause and
SUPPORTED_AsymPause.

Reviewed-by: Martin Blumenstingl
Signed-off-by: Florian Fainelli
---
 Documentation/networking/phy.txt | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/Documentation/networking/phy.txt b/Documentation/networking/phy.txt
index 4b25c0f24201..9a42a9414cea 100644
--- a/Documentation/networking/phy.txt
+++ b/Documentation/networking/phy.txt
@@ -127,8 +127,9 @@ Letting the PHY Abstraction Layer do Everything
  values pruned from them which don't make sense for your controller (a 10/100
  controller may be connected to a gigabit capable PHY, so you would need to
  mask off SUPPORTED_1000baseT*). See include/linux/ethtool.h for definitions
- for these bitfields. Note that you should not SET any bits, or the PHY may
- get put into an unsupported state.
+ for these bitfields. Note that you should not SET any bits, except the
+ SUPPORTED_Pause and SUPPORTED_AsymPause bits (see below), or the PHY may get
+ put into an unsupported state.
 
  Lastly, once the controller is ready to handle network traffic, you call
  phy_start(phydev). This tells the PAL that you are ready, and configures the
@@ -139,6 +140,19 @@ Letting the PHY Abstraction Layer do Everything
  When you want to disconnect from the network (even if just briefly), you call
  phy_stop(phydev).
 
+Pause frames / flow control
+
+ The PHY does not participate directly in flow control/pause frames except by
+ making sure that the SUPPORTED_Pause and SUPPORTED_AsymPause bits are set in
+ MII_ADVERTISE to indicate towards the link partner that the Ethernet MAC
+ controller supports such a thing. Since flow control/pause frames generation
+ involves the Ethernet MAC driver, it is recommended that this driver takes care
+ of properly indicating advertisement and support for such features by setting
+ the SUPPORTED_Pause and SUPPORTED_AsymPause bits accordingly. This can be done
+ either before or after phy_connect() and/or as a result of implementing the
+ ethtool::set_pauseparam feature.
+
+
 Keeping Close Tabs on the PAL
 
  It is possible that the PAL's built-in state machine needs a little help to
-- 
2.9.3
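Editorial sketch of what the documentation text in this patch asks of a MAC driver: prune link modes the MAC cannot handle, and set only the pause bits itself. The `SK_*` bit values, `struct sketch_phydev` and `sketch_mac_fixup_features()` are local to this sketch and are assumptions; a real driver works on `phydev->supported` and `phydev->advertising` with the definitions from include/linux/ethtool.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit values local to this sketch, standing in for the SUPPORTED_* masks. */
#define SK_100baseT_Full   (1u << 3)
#define SK_1000baseT_Half  (1u << 4)
#define SK_1000baseT_Full  (1u << 5)
#define SK_Pause           (1u << 13)
#define SK_AsymPause       (1u << 14)

/* Hypothetical stand-in for the two phydev feature masks. */
struct sketch_phydev {
    uint32_t supported;
    uint32_t advertising;
};

static void sketch_mac_fixup_features(struct sketch_phydev *phy,
                                      int mac_is_gigabit, int mac_has_pause)
{
    assert(phy != NULL);

    /* Pruning: a 10/100 MAC masks off the gigabit modes of the PHY. */
    if (!mac_is_gigabit)
        phy->supported &= ~(SK_1000baseT_Half | SK_1000baseT_Full);

    /* The pause bits are the only ones the MAC driver sets on its own. */
    if (mac_has_pause)
        phy->supported |= SK_Pause | SK_AsymPause;
    else
        phy->supported &= ~(SK_Pause | SK_AsymPause);

    /* Assumption of this sketch: advertise everything that is supported. */
    phy->advertising = phy->supported;
}
```

A driver implementing ethtool's set_pauseparam could rerun the pause half of this logic whenever the user changes the pause settings.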
[PATCH net-next v3 3/4] Documentation: net: phy: Add blurb about RGMII
RGMII is a recurring source of pain for people with Gigabit Ethernet
hardware since it may require PHY driver and MAC driver level
configuration hints. Document what the expectations from PHYLIB are and
what options exist.

Reviewed-by: Martin Blumenstingl
Signed-off-by: Florian Fainelli
---
 Documentation/networking/phy.txt | 77
 1 file changed, 77 insertions(+)

diff --git a/Documentation/networking/phy.txt b/Documentation/networking/phy.txt
index 9a42a9414cea..c7ba84b5d912 100644
--- a/Documentation/networking/phy.txt
+++ b/Documentation/networking/phy.txt
@@ -65,6 +65,83 @@ The MDIO bus
  drivers/net/ethernet/freescale/fsl_pq_mdio.c and an associated DTS file
  for one of the users. (e.g. "git grep fsl,.*-mdio arch/powerpc/boot/dts/")
 
+(RG)MII/electrical interface considerations
+
+ The Reduced Gigabit Medium Independent Interface (RGMII) is a 12-pin
+ electrical signal interface using a synchronous 125MHz clock signal and several
+ data lines. Due to this design decision, a 1.5ns to 2ns delay must be added
+ between the clock line (RXC or TXC) and the data lines to let the PHY (clock
+ sink) have enough setup and hold times to sample the data lines correctly. The
+ PHY library offers different types of PHY_INTERFACE_MODE_RGMII* values to let
+ the PHY driver and optionally the MAC driver, implement the required delay. The
+ values of phy_interface_t must be understood from the perspective of the PHY
+ device itself, leading to the following:
+
+ * PHY_INTERFACE_MODE_RGMII: the PHY is not responsible for inserting any
+   internal delay by itself, it assumes that either the Ethernet MAC (if capable)
+   or the PCB traces insert the correct 1.5-2ns delay
+
+ * PHY_INTERFACE_MODE_RGMII_TXID: the PHY should insert an internal delay
+   for the transmit data lines (TXD[3:0]) processed by the PHY device
+
+ * PHY_INTERFACE_MODE_RGMII_RXID: the PHY should insert an internal delay
+   for the receive data lines (RXD[3:0]) processed by the PHY device
+
+ * PHY_INTERFACE_MODE_RGMII_ID: the PHY should insert internal delays for
+   both transmit AND receive data lines from/to the PHY device
+
+ Whenever possible, use the PHY side RGMII delay for these reasons:
+
+ * PHY devices may offer sub-nanosecond granularity in how they allow a
+   receiver/transmitter side delay (e.g: 0.5, 1.0, 1.5ns) to be specified. Such
+   precision may be required to account for differences in PCB trace lengths
+
+ * PHY devices are typically qualified for a large range of applications
+   (industrial, medical, automotive...), and they provide a constant and
+   reliable delay across temperature/pressure/voltage ranges
+
+ * PHY device drivers in PHYLIB being reusable by nature, being able to
+   configure correctly a specified delay enables more designs with similar delay
+   requirements to operate correctly
+
+ For cases where the PHY is not capable of providing this delay, but the
+ Ethernet MAC driver is capable of doing so, the correct phy_interface_t value
+ should be PHY_INTERFACE_MODE_RGMII, and the Ethernet MAC driver should be
+ configured correctly in order to provide the required transmit and/or receive
+ side delay from the perspective of the PHY device. Conversely, if the Ethernet
+ MAC driver looks at the phy_interface_t value, for any other mode but
+ PHY_INTERFACE_MODE_RGMII, it should make sure that the MAC-level delays are
+ disabled.
+
+ In case neither the Ethernet MAC, nor the PHY are capable of providing the
+ required delays, as defined per the RGMII standard, several options may be
+ available:
+
+ * Some SoCs may offer a pin pad/mux/controller capable of configuring a given
+   set of pins' strength, delays, and voltage; and it may be a suitable
+   option to insert the expected 2ns RGMII delay.
+
+ * Modifying the PCB design to include a fixed delay (e.g: using a specifically
+   designed serpentine), which may not require software configuration at all.
+
+Common problems with RGMII delay mismatch
+
+ When there is an RGMII delay mismatch between the Ethernet MAC and the PHY, this
+ will most likely result in the clock and data line signals to be unstable when
+ the PHY or MAC take a snapshot of these signals to translate them into logical
+ 1 or 0 states and reconstruct the data being transmitted/received. Typical
+ symptoms include:
+
+ * Transmission/reception partially works, and there is frequent or occasional
+   packet loss observed
+
+ * Ethernet MAC may report some or all packets ingressing with a FCS/CRC error,
+   or just discard them all
+
+ * Switching to lower speeds such as 10/100Mbits/sec makes the problem go away
+   (since there is enough setup/hold time in that case)
+
+
 Connecting to a PHY
 
  Sometime during startup, the network driver needs to establish a connection
-- 
2.9.3
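Editorial sketch of the delay rules described in this patch, read from the PHY's perspective. The enum mirrors the four PHY_INTERFACE_MODE_RGMII* values but is a local stand-in, and `sketch_plan_delays()` and its `mac_can_delay` flag are hypothetical names; the sketch only encodes who is expected to insert the delay in each mode.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins for the PHY_INTERFACE_MODE_RGMII* values. */
enum sketch_rgmii_mode {
    SK_RGMII,
    SK_RGMII_ID,
    SK_RGMII_RXID,
    SK_RGMII_TXID,
};

struct sketch_delay_plan {
    bool phy_rx;      /* PHY inserts the delay on RXD[3:0] */
    bool phy_tx;      /* PHY inserts the delay on TXD[3:0] */
    bool mac_delays;  /* MAC-level delays are enabled */
};

static struct sketch_delay_plan
sketch_plan_delays(enum sketch_rgmii_mode mode, bool mac_can_delay)
{
    struct sketch_delay_plan p = { false, false, false };

    switch (mode) {
    case SK_RGMII:
        /* PHY adds nothing; the MAC does it if capable, else the PCB must. */
        p.mac_delays = mac_can_delay;
        break;
    case SK_RGMII_ID:
        p.phy_rx = true;
        p.phy_tx = true;
        break;
    case SK_RGMII_RXID:
        p.phy_rx = true;
        break;
    case SK_RGMII_TXID:
        p.phy_tx = true;
        break;
    }

    /* For any mode other than plain RGMII the MAC-level delays stay off. */
    assert(!(p.mac_delays && (p.phy_rx || p.phy_tx)));
    return p;
}
```

A mismatch between what this table says and what the hardware actually does is the kind of situation that produces the packet loss and FCS/CRC symptoms listed in the patch.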
[PATCH net-next v3 4/4] Documentation: net: phy: Add links to several standards documents
Add links to the IEEE 802.3-2008 document, and the RGMII v1.3 and v2.0
revisions of the standard.

Reviewed-by: Martin Blumenstingl
Signed-off-by: Florian Fainelli
---
 Documentation/networking/phy.txt | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/Documentation/networking/phy.txt b/Documentation/networking/phy.txt
index c7ba84b5d912..e017d933d530 100644
--- a/Documentation/networking/phy.txt
+++ b/Documentation/networking/phy.txt
@@ -407,3 +407,13 @@ Board Fixups
 The stubs set one of the two matching criteria, and set the other one to
 match anything.

+Standards
+
+ IEEE Standard 802.3: CSMA/CD Access Method and Physical Layer Specifications, Section Two:
+ http://standards.ieee.org/getieee802/download/802.3-2008_section2.pdf
+
+ RGMII v1.3:
+ http://web.archive.org/web/20160303212629/http://www.hp.com/rnd/pdfs/RGMIIv1_3.pdf
+
+ RGMII v2.0:
+ http://web.archive.org/web/20160303171328/http://www.hp.com/rnd/pdfs/RGMIIv2_0_final_hp.pdf
-- 
2.9.3
[PATCH net-next v3 1/4] Documentation: net: phy: remove description of function pointers
Remove the function pointers documentation which duplicates information found in
include/linux/phy.h. Maintaining documentation in two different locations
just does not work, but the code is less likely to be outdated.

Reviewed-by: Martin Blumenstingl
Signed-off-by: Florian Fainelli
---
 Documentation/networking/phy.txt | 35 ++-
 1 file changed, 2 insertions(+), 33 deletions(-)

diff --git a/Documentation/networking/phy.txt b/Documentation/networking/phy.txt
index 7ab9404a8412..4b25c0f24201 100644
--- a/Documentation/networking/phy.txt
+++ b/Documentation/networking/phy.txt
@@ -251,39 +251,8 @@ Writing a PHY driver
 PHY_BASIC_FEATURES, but you can look in include/mii.h for other
 features.

- Each driver consists of a number of function pointers:
-
-   soft_reset: perform a PHY software reset
-   config_init: configures PHY into a sane state after a reset.
-     For instance, a Davicom PHY requires descrambling disabled.
-   probe: Allocate phy->priv, optionally refuse to bind.
-     PHY may not have been reset or had fixups run yet.
-   suspend/resume: power management
-   config_aneg: Changes the speed/duplex/negotiation settings
-   aneg_done: Determines the auto-negotiation result
-   read_status: Reads the current speed/duplex/negotiation settings
-   ack_interrupt: Clear a pending interrupt
-   did_interrupt: Checks if the PHY generated an interrupt
-   config_intr: Enable or disable interrupts
-   remove: Does any driver take-down
-   ts_info: Queries about the HW timestamping status
-   match_phy_device: used for Clause 45 capable PHYs to match devices
-     in package and ensure they are compatible
-   hwtstamp: Set the PHY HW timestamping configuration
-   rxtstamp: Requests a receive timestamp at the PHY level for a 'skb'
-   txtsamp: Requests a transmit timestamp at the PHY level for a 'skb'
-   set_wol: Enable Wake-on-LAN at the PHY level
-   get_wol: Get the Wake-on-LAN status at the PHY level
-   link_change_notify: called to inform the core is about to change the
-     link state, can be used to work around bogus PHY between state changes
-   read_mmd_indirect: Read PHY MMD indirect register
-   write_mmd_indirect: Write PHY MMD indirect register
-   module_info: Get the size and type of an EEPROM contained in an plug-in
-     module
-   module_eeprom: Get EEPROM information of a plug-in module
-   get_sset_count: Get number of strings sets that get_strings will count
-   get_strings: Get strings from requested objects (statistics)
-   get_stats: Get the extended statistics from the PHY device
+ Each driver consists of a number of function pointers, documented
+ in include/linux/phy.h under the phy_driver structure.

 Of these, only config_aneg and read_status are required to be
 assigned by the driver code. The rest are optional. Also, it is
-- 
2.9.3
[PATCH net-next v3 0/4] Documentation: net: phy: Improve documentation
Hi all,

This patch series addresses discussions and feedback that was recently received
on the mailing-list in the area of: flow control/pause frames, interpretation of
phy_interface_t and finally adds some links to useful standards documents.

Changes in v3:

- add Timur's feedback into patch 3

Changes in v2:

- clarify a few things in the RGMII section, add a paragraph about common issues
  with RGMII delay mismatches

Florian Fainelli (4):
  Documentation: net: phy: remove description of function pointers
  Documentation: net: phy: Add a paragraph about pause frames/flow control
  Documentation: net: phy: Add blurb about RGMII
  Documentation: net: phy: Add links to several standards documents

 Documentation/networking/phy.txt | 140 +--
 1 file changed, 105 insertions(+), 35 deletions(-)

-- 
2.9.3
Re: [net,v2] neigh: fix the loop index error in neigh dump
On 11/27/16 7:34 PM, 张胜举 wrote:
>> -Original Message-
>> From: David Ahern [mailto:d...@cumulusnetworks.com]
>> Sent: Monday, November 28, 2016 10:10 AM
>> To: Zhang Shengju; netdev@vger.kernel.org
>> Subject: Re: [net,v2] neigh: fix the loop index error in neigh dump
>>
>> On 11/27/16 6:32 PM, Zhang Shengju wrote:
>>> Loop index in neigh dump function is not updated correctly under some
>>> circumstances, this patch will fix it.
>>
>> What's an example?
>
> If dev is filtered out, the original code goes to next loop without updating
> loop index 'idx'.

And you have a use case with missing or redundant data? Or is your comment
based on a review of code only?

>> You are completely rewriting the dump loops.
>
> I put 'idx++' into for loop, so I replace 'goto' with 'continue'. The
> other change is style related.

A "fixes" should not include 'style related' changes.
RE: [net,v2] neigh: fix the loop index error in neigh dump
> -Original Message- > From: David Ahern [mailto:d...@cumulusnetworks.com] > Sent: Monday, November 28, 2016 10:10 AM > To: Zhang Shengju; > netdev@vger.kernel.org > Subject: Re: [net,v2] neigh: fix the loop index error in neigh dump > > On 11/27/16 6:32 PM, Zhang Shengju wrote: > > Loop index in neigh dump function is not updated correctly under some > > circumstances, this patch will fix it. > > What's an example? If dev is filtered out, the original code goes to next loop without updating loop index 'idx'. > > > > > Fixes: 16660f0bd9 ("net: Add support for filtering neigh dump by > > device index") > > Fixes: 21fdd092ac ("net: Add support for filtering neigh dump by > > master device") > > > > Signed-off-by: Zhang Shengju > > --- > > net/core/neighbour.c | 39 ++- > > 1 file changed, 18 insertions(+), 21 deletions(-) > > > > diff --git a/net/core/neighbour.c b/net/core/neighbour.c index > > 2ae929f..ce32e9c 100644 > > --- a/net/core/neighbour.c > > +++ b/net/core/neighbour.c > > @@ -2256,6 +2256,16 @@ static bool neigh_ifindex_filtered(struct > net_device *dev, int filter_idx) > > return false; > > } > > > > +static bool neigh_dump_filtered(struct net_device *dev, int filter_idx, > > + int filter_master_idx) > > +{ > > + if (neigh_ifindex_filtered(dev, filter_idx) || > > + neigh_master_filtered(dev, filter_master_idx)) > > + return true; > > + > > + return false; > > +} > > + > > static int neigh_dump_table(struct neigh_table *tbl, struct sk_buff *skb, > > struct netlink_callback *cb) > > { > > @@ -2285,20 +2295,15 @@ static int neigh_dump_table(struct neigh_table > *tbl, struct sk_buff *skb, > > rcu_read_lock_bh(); > > nht = rcu_dereference_bh(tbl->nht); > > > > - for (h = s_h; h < (1 << nht->hash_shift); h++) { > > - if (h > s_h) > > - s_idx = 0; > > + for (h = s_h; h < (1 << nht->hash_shift); h++, s_idx = 0) { > > for (n = rcu_dereference_bh(nht->hash_buckets[h]), idx = 0; > > n != NULL; > > -n = rcu_dereference_bh(n->next)) { > > - if 
(!net_eq(dev_net(n->dev), net)) > > - continue; > > - if (neigh_ifindex_filtered(n->dev, filter_idx)) > > +n = rcu_dereference_bh(n->next), idx++) { > > + if (idx < s_idx || !net_eq(dev_net(n->dev), net)) > > continue; > > - if (neigh_master_filtered(n->dev, filter_master_idx)) > > + if (neigh_dump_filtered(n->dev, filter_idx, > > + filter_master_idx)) > > continue; > > - if (idx < s_idx) > > - goto next; > > if (neigh_fill_info(skb, n, NETLINK_CB(cb- > >skb).portid, > > cb->nlh->nlmsg_seq, > > RTM_NEWNEIGH, > > @@ -2306,8 +2311,6 @@ static int neigh_dump_table(struct neigh_table > *tbl, struct sk_buff *skb, > > rc = -1; > > goto out; > > } > > -next: > > - idx++; > > } > > } > > rc = skb->len; > > @@ -2328,14 +2331,10 @@ static int pneigh_dump_table(struct > > neigh_table *tbl, struct sk_buff *skb, > > > > read_lock_bh(>lock); > > > > - for (h = s_h; h <= PNEIGH_HASHMASK; h++) { > > - if (h > s_h) > > - s_idx = 0; > > - for (n = tbl->phash_buckets[h], idx = 0; n; n = n->next) { > > - if (pneigh_net(n) != net) > > + for (h = s_h; h <= PNEIGH_HASHMASK; h++, s_idx = 0) { > > + for (n = tbl->phash_buckets[h], idx = 0; n; n = n->next, idx++) > { > > + if (idx < s_idx || pneigh_net(n) != net) > > continue; > > - if (idx < s_idx) > > - goto next; > > if (pneigh_fill_info(skb, n, NETLINK_CB(cb- > >skb).portid, > > cb->nlh->nlmsg_seq, > > RTM_NEWNEIGH, > > @@ -2344,8 +2343,6 @@ static int pneigh_dump_table(struct neigh_table > *tbl, struct sk_buff *skb, > > rc = -1; > > goto out; > > } > > - next: > > - idx++; > > } > > } > > > > This fix is way to be complicated to be fixing anything related to 16660f0bd9 > or 21fdd092ac. Both of those commits added a continue: > > if (neigh_ifindex_filtered(n->dev, filter_idx)) > continue; > if (neigh_master_filtered(n->dev,
Re: [Patch net-next] net_sched: move the empty tp check from ->destroy() to ->delete()
On 16-11-26 10:29 PM, Roi Dayan wrote: > > > On 27/11/2016 06:47, Roi Dayan wrote: >> >> >> On 27/11/2016 02:33, Daniel Borkmann wrote: >>> On 11/26/2016 12:09 PM, Daniel Borkmann wrote: On 11/26/2016 07:46 AM, Cong Wang wrote: > On Thu, Nov 24, 2016 at 7:20 AM, Daniel Borkmann >wrote: >>> [...] >> Ok, strange, qdisc_destroy() calls into ops->destroy(), where ingress >> drops its entire chain via tcf_destroy_chain(), so that will be NULL >> eventually. The tps are freed by call_rcu() as well as qdisc itself >> later on via qdisc_rcu_free(), where it frees per-cpu bstats as well. >> Outstanding readers should either bail out due to if (!cl) or can >> still >> process the chain until read section ends, but during that time, >> cl->q >> resp. bstats should be good. Do you happen to know what's at address >> 880a68b04028? I was wondering wrt call_rcu() vs call_rcu_bh(), >> but >> at least on ingress (netif_receive_skb_internal()) we hold >> rcu_read_lock() >> here. The KASAN report is reliably happening at this location, right? > > I am confused as well, I don't see how it could be related to my > patch yet. > I will take a deep look in the weekend. >> >> >> >> Hi Cong, >> >> When reported the new trace I didn't mean it's related to your patch, >> I just wanted to point it out it exposed something. I should have been >> clear about it. >> >> Ok, I'm currently on the run. Got too late yesterday night, but I'll write what I found in the evening today, not related to ingress though. >>> >>> Just pushed out my analysis to netdev under "[PATCH net] net, sched: >>> respect >>> rcu grace period on cls destruction". My conclusion is that both >>> issues are >>> actually separate, and that one is small enough where we could route >>> it via >>> net actually. Perhaps this at the same time shrinks your "[PATCH >>> net-next] >>> net_sched: move the empty tp check from ->destroy() to ->delete()" to a >>> reasonable size that it's suitable to net as well. 
Your >>> ->delete()/->destroy() >>> one is definitely needed, too. The tp->root one is independant of >>> ->delete()/ >>> ->destroy() as they are different races and tp->root could also >>> happen when >>> you just destroy the whole tp directly. I think that seems like a >>> good path >>> forward to me. >>> >>> Thanks, >>> Daniel >> >> >> >> Hi Daniel, >> >> As for the tainted kernel. I was in old (week or two) net-next tree >> and only cherry-picked from latest net-next related patches to >> Mellanox HCA, cls_api, cls_flower, devlink. so those are the tainted >> modules. >> I have the issue reproducing in that tree so wanted it to check it >> with Cong's patch instead of latest net-next. >> I'll try running reproducing the issue with your new patch and later >> try latest net-next as well. >> >> Thanks, >> Roi >> > > Hi, > > I tested "[PATCH net] net, sched: respect rcu grace period on cls > destruction" and could not reproduce my original issue. Hi Roi, Just so I'm 100% clear. No issue with just the above "respect rcu grace period on cls destruction" per above statement. > I rebased "[Patch net-next] net_sched: move the empty tp check from > ->destroy() to ->delete()" over to test it in the same tree and got into > a new trace in fl_delete. In this case did you test with "net_sched: move the empty tp check from ->destroy() to ->delete()" _only_ or did this include both patches when you see the error below. >From my inspection we really need both patches to get correct behavior. Thanks! John > > [35659.012123] BUG: KASAN: wild-memory-access on address 1803ca31 > [35659.020042] Write of size 1 by task ovs-vswitchd/20135 > [35659.025878] CPU: 19 PID: 20135 Comm: ovs-vswitchd Tainted: > G O4.9.0-rc3+ #18 > [35659.035948] Hardware name: HP ProLiant DL380p Gen8, BIOS P70 07/01/2015 > [35659.043730] Call Trace: > [35659.046619] [] dump_stack+0x63/0x81 > [35659.052456] [] kasan_report_error+0x408/0x4e0 > [35659.059402] [] kasan_report+0x58/0x60 > [35659.065428] [] ? 
call_rcu_sched+0x1d/0x20
> [35659.072119] [] ? fl_destroy_filter+0x21/0x30 [cls_flower]
> [35659.080217] [] ? fl_delete+0x1df/0x2e0 [cls_flower]
> [35659.087580] [] __asan_store1+0x4a/0x50
> [35659.093697] [] fl_delete+0x1df/0x2e0 [cls_flower]
> [35659.100870] [] tc_ctl_tfilter+0x10da/0x1b90
>
>
> 0x1d02 is in fl_delete (net/sched/cls_flower.c:805).
> 800     struct cls_fl_filter *f = (struct cls_fl_filter *) arg;
> 801
> 802     rhashtable_remove_fast(&head->ht, &f->ht_node,
> 803                            head->ht_params);
> 804     __fl_delete(tp, f);
> 805     *last = list_empty(&head->filters);
> 806     return 0;
> 807 }
>
>
> Thanks,
> Roi
Re: [PATCH net v2 0/5] net: fix phydev reference leaks
David Miller wrote: Series applied, thanks. I was really hoping you'd give me the chance to test the patches before applying them. -- Sent by an employee of the Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, hosted by The Linux Foundation.
Re: [net,v2] neigh: fix the loop index error in neigh dump
On 11/27/16 6:32 PM, Zhang Shengju wrote:
> Loop index in neigh dump function is not updated correctly under some
> circumstances, this patch will fix it.

What's an example?

>
> Fixes: 16660f0bd9 ("net: Add support for filtering neigh dump by device index")
> Fixes: 21fdd092ac ("net: Add support for filtering neigh dump by master device")
>
> Signed-off-by: Zhang Shengju
> ---
>  net/core/neighbour.c | 39 ++-
>  1 file changed, 18 insertions(+), 21 deletions(-)
>
> diff --git a/net/core/neighbour.c b/net/core/neighbour.c
> index 2ae929f..ce32e9c 100644
> --- a/net/core/neighbour.c
> +++ b/net/core/neighbour.c
> @@ -2256,6 +2256,16 @@ static bool neigh_ifindex_filtered(struct net_device *dev, int filter_idx)
>  	return false;
>  }
>
> +static bool neigh_dump_filtered(struct net_device *dev, int filter_idx,
> +				int filter_master_idx)
> +{
> +	if (neigh_ifindex_filtered(dev, filter_idx) ||
> +	    neigh_master_filtered(dev, filter_master_idx))
> +		return true;
> +
> +	return false;
> +}
> +
>  static int neigh_dump_table(struct neigh_table *tbl, struct sk_buff *skb,
>  			    struct netlink_callback *cb)
>  {
> @@ -2285,20 +2295,15 @@ static int neigh_dump_table(struct neigh_table *tbl, struct sk_buff *skb,
>  	rcu_read_lock_bh();
>  	nht = rcu_dereference_bh(tbl->nht);
>
> -	for (h = s_h; h < (1 << nht->hash_shift); h++) {
> -		if (h > s_h)
> -			s_idx = 0;
> +	for (h = s_h; h < (1 << nht->hash_shift); h++, s_idx = 0) {
>  		for (n = rcu_dereference_bh(nht->hash_buckets[h]), idx = 0;
>  		     n != NULL;
> -		     n = rcu_dereference_bh(n->next)) {
> -			if (!net_eq(dev_net(n->dev), net))
> -				continue;
> -			if (neigh_ifindex_filtered(n->dev, filter_idx))
> +		     n = rcu_dereference_bh(n->next), idx++) {
> +			if (idx < s_idx || !net_eq(dev_net(n->dev), net))
>  				continue;
> -			if (neigh_master_filtered(n->dev, filter_master_idx))
> +			if (neigh_dump_filtered(n->dev, filter_idx,
> +						filter_master_idx))
>  				continue;
> -			if (idx < s_idx)
> -				goto next;
>  			if (neigh_fill_info(skb, n, NETLINK_CB(cb->skb).portid,
>  					    cb->nlh->nlmsg_seq,
>  					    RTM_NEWNEIGH,
> @@ -2306,8 +2311,6 @@ static int neigh_dump_table(struct neigh_table *tbl, struct sk_buff *skb,
>  				rc = -1;
>  				goto out;
>  			}
> -next:
> -			idx++;
>  		}
>  	}
>  	rc = skb->len;
> @@ -2328,14 +2331,10 @@ static int pneigh_dump_table(struct neigh_table *tbl, struct sk_buff *skb,
>
>  	read_lock_bh(&tbl->lock);
>
> -	for (h = s_h; h <= PNEIGH_HASHMASK; h++) {
> -		if (h > s_h)
> -			s_idx = 0;
> -		for (n = tbl->phash_buckets[h], idx = 0; n; n = n->next) {
> -			if (pneigh_net(n) != net)
> +	for (h = s_h; h <= PNEIGH_HASHMASK; h++, s_idx = 0) {
> +		for (n = tbl->phash_buckets[h], idx = 0; n; n = n->next, idx++) {
> +			if (idx < s_idx || pneigh_net(n) != net)
>  				continue;
> -			if (idx < s_idx)
> -				goto next;
>  			if (pneigh_fill_info(skb, n, NETLINK_CB(cb->skb).portid,
>  					    cb->nlh->nlmsg_seq,
>  					    RTM_NEWNEIGH,
> @@ -2344,8 +2343,6 @@ static int pneigh_dump_table(struct neigh_table *tbl, struct sk_buff *skb,
>  				rc = -1;
>  				goto out;
>  			}
> -		next:
> -			idx++;
>  		}
>  	}
>

This fix is way too complicated to be fixing anything related to 16660f0bd9
or 21fdd092ac. Both of those commits added a continue:

	if (neigh_ifindex_filtered(n->dev, filter_idx))
		continue;
	if (neigh_master_filtered(n->dev, filter_master_idx))
		continue;

At best the continue is replaced by 'goto next;' and I am not convinced that
is right.

You are completely rewriting the dump loops.
Re: [PATCH net-next 0/6] BPF cleanups and misc updates
From: Daniel Borkmann
Date: Sat, 26 Nov 2016 01:28:03 +0100

> This patch set adds couple of cleanups in first few patches,
> exposes owner_prog_type for array maps as well as mlocked mem
> for maps in fdinfo, allows for mount permissions in fs and
> fixes various outstanding issues in selftests and samples.

Series applied, thanks Daniel.
Re: [PATCH net 1/1] tipc: fix link statistics counter errors
From: Jon Maloy
Date: Fri, 25 Nov 2016 10:35:02 -0500

> In commit e4bf4f76962b ("tipc: simplify packet sequence number
> handling") we changed the internal representation of the packet
> sequence number counters from u32 to u16, reflecting what is really
> sent over the wire.
>
> Since then some link statistics counters have been displaying incorrect
> values, partially because the counters meant to be used as sequence
> number snapshots are now used as direct counters, stored as u32, and
> partially because some counter updates are just missing in the code.
>
> In this commit we correct this in two ways. First, we base the
> displayed packet sent/received values on direct counters instead
> of as previously a calculated difference between current sequence
> number and a snapshot. Second, we add the missing updates of the
> counters.
>
> This change is compatible with the current netlink API, and requires
> no changes to the user space tools.
>
> Signed-off-by: Jon Maloy

Applied.
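To make the difference between the two approaches in the quoted commit message concrete, here is a small stand-alone sketch. It is not TIPC code: struct link_stats, link_send() and sent_since_snapshot() are made-up names, and the sketch only shows one way a "current sequence number minus snapshot" value can diverge from a direct packet counter once the sequence number is 16 bits wide.

```c
#include <stdint.h>

/* Hypothetical per-link state, for illustration only. */
struct link_stats {
	uint16_t snd_nxt;   /* 16-bit sequence number, as sent on the wire */
	uint32_t sent_pkts; /* direct counter, bumped once per packet */
};

/* Send npkts packets: advance the sequence number and the direct counter. */
static void link_send(struct link_stats *l, uint32_t npkts)
{
	while (npkts--) {
		l->snd_nxt++;   /* wraps modulo 65536 */
		l->sent_pkts++;
	}
}

/* "Packets sent" derived from the sequence number and an older snapshot. */
static uint32_t sent_since_snapshot(const struct link_stats *l,
				    uint16_t snapshot)
{
	return (uint16_t)(l->snd_nxt - snapshot);
}
```

After more than 65535 packets the derived value has wrapped while the direct counter keeps counting, which is one reason a displayed statistic is better based on a counter that is incremented directly.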
Re: [PATCH v2 0/7] stmmac: dwmac-meson8b: configurable RGMII TX delay
From: Martin Blumenstingl
Date: Fri, 25 Nov 2016 14:01:49 +0100

> Currently the dwmac-meson8b stmmac glue driver uses a hardcoded 1/4
> cycle TX clock delay. This seems to work fine for many boards (for
> example Odroid-C2 or Amlogic's reference boards) but there are some
> others where TX traffic is simply broken.
> There are probably multiple reasons why it's working on some boards
> while it's broken on others:
> - some of Amlogic's reference boards are using a Micrel PHY
> - hardware circuit design
> - maybe more...

The ARM arch file changes do not apply cleanly to net-next, you probably
want to merge them via the ARM tree instead of mine, and respin this
series to be without the .dts file changes.
[net,v2] neigh: fix the loop index error in neigh dump
Loop index in neigh dump function is not updated correctly under some
circumstances, this patch will fix it.

Fixes: 16660f0bd9 ("net: Add support for filtering neigh dump by device index")
Fixes: 21fdd092ac ("net: Add support for filtering neigh dump by master device")

Signed-off-by: Zhang Shengju
---
 net/core/neighbour.c | 39 ++-
 1 file changed, 18 insertions(+), 21 deletions(-)

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 2ae929f..ce32e9c 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -2256,6 +2256,16 @@ static bool neigh_ifindex_filtered(struct net_device *dev, int filter_idx)
 	return false;
 }

+static bool neigh_dump_filtered(struct net_device *dev, int filter_idx,
+				int filter_master_idx)
+{
+	if (neigh_ifindex_filtered(dev, filter_idx) ||
+	    neigh_master_filtered(dev, filter_master_idx))
+		return true;
+
+	return false;
+}
+
 static int neigh_dump_table(struct neigh_table *tbl, struct sk_buff *skb,
 			    struct netlink_callback *cb)
 {
@@ -2285,20 +2295,15 @@ static int neigh_dump_table(struct neigh_table *tbl, struct sk_buff *skb,
 	rcu_read_lock_bh();
 	nht = rcu_dereference_bh(tbl->nht);

-	for (h = s_h; h < (1 << nht->hash_shift); h++) {
-		if (h > s_h)
-			s_idx = 0;
+	for (h = s_h; h < (1 << nht->hash_shift); h++, s_idx = 0) {
 		for (n = rcu_dereference_bh(nht->hash_buckets[h]), idx = 0;
 		     n != NULL;
-		     n = rcu_dereference_bh(n->next)) {
-			if (!net_eq(dev_net(n->dev), net))
-				continue;
-			if (neigh_ifindex_filtered(n->dev, filter_idx))
+		     n = rcu_dereference_bh(n->next), idx++) {
+			if (idx < s_idx || !net_eq(dev_net(n->dev), net))
 				continue;
-			if (neigh_master_filtered(n->dev, filter_master_idx))
+			if (neigh_dump_filtered(n->dev, filter_idx,
+						filter_master_idx))
 				continue;
-			if (idx < s_idx)
-				goto next;
 			if (neigh_fill_info(skb, n, NETLINK_CB(cb->skb).portid,
 					    cb->nlh->nlmsg_seq,
 					    RTM_NEWNEIGH,
@@ -2306,8 +2311,6 @@ static int neigh_dump_table(struct neigh_table *tbl, struct sk_buff *skb,
 				rc = -1;
 				goto out;
 			}
-next:
-			idx++;
 		}
 	}
 	rc = skb->len;
@@ -2328,14 +2331,10 @@ static int pneigh_dump_table(struct neigh_table *tbl, struct sk_buff *skb,

 	read_lock_bh(&tbl->lock);

-	for (h = s_h; h <= PNEIGH_HASHMASK; h++) {
-		if (h > s_h)
-			s_idx = 0;
-		for (n = tbl->phash_buckets[h], idx = 0; n; n = n->next) {
-			if (pneigh_net(n) != net)
+	for (h = s_h; h <= PNEIGH_HASHMASK; h++, s_idx = 0) {
+		for (n = tbl->phash_buckets[h], idx = 0; n; n = n->next, idx++) {
+			if (idx < s_idx || pneigh_net(n) != net)
 				continue;
-			if (idx < s_idx)
-				goto next;
 			if (pneigh_fill_info(skb, n, NETLINK_CB(cb->skb).portid,
 					    cb->nlh->nlmsg_seq,
 					    RTM_NEWNEIGH,
@@ -2344,8 +2343,6 @@ static int pneigh_dump_table(struct neigh_table *tbl, struct sk_buff *skb,
 				rc = -1;
 				goto out;
 			}
-		next:
-			idx++;
 		}
 	}

-- 
1.8.3.1
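For readers who want the loop shape of the patch above in isolation, here is a small user-space sketch of a resumable, filtered list dump with the index advanced in the for-loop header, so that every continue still counts the entry it skips. It is not the kernel code: struct entry, dump_bucket() and its parameters are hypothetical, and the sketch takes no position on whether the original kernel loops actually miscount, which is the open question in the review.

```c
#include <stddef.h>

/* Hypothetical list entry, standing in for a neighbour entry. */
struct entry {
	int id;
	int ifindex;
	struct entry *next;
};

/*
 * Copy the ids of matching entries into out[], starting at list position
 * s_idx, until out[] is full. *resume_idx receives the position to pass
 * as s_idx on the next call. A filter_idx of 0 matches every entry.
 */
static int dump_bucket(const struct entry *head, int s_idx, int filter_idx,
		       int *out, int max, int *resume_idx)
{
	const struct entry *n;
	int idx, count = 0;

	/* idx is advanced in the loop header, so 'continue' never skips it */
	for (n = head, idx = 0; n != NULL; n = n->next, idx++) {
		if (idx < s_idx)
			continue;	/* already dumped in an earlier pass */
		if (filter_idx && n->ifindex != filter_idx)
			continue;	/* filtered out, but still counted */
		if (count == max)
			break;		/* out of room, resume at idx */
		out[count++] = n->id;
	}
	*resume_idx = idx;
	return count;
}
```

Because idx is a plain position in the list, the resume point means the same thing in every pass regardless of which filter is in effect.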
Re: [patch v2 net-next] sfc: remove unneeded variable
From: Dan Carpenter
Date: Fri, 25 Nov 2016 13:43:04 +0300

> We don't use ->heap_buf after commit 46d1efd852cc ("sfc: remove Software
> TSO") so let's remove the last traces.
>
> Signed-off-by: Dan Carpenter

Applied, thanks Dan.
Re: [PATCH] net: fec: turn on device when extracting statistics
From: Nikita Yushchenko
Date: Fri, 25 Nov 2016 13:02:00 +0300

> +	int i, ret;
> +
> +	ret = pm_runtime_get_sync(&fep->pdev->dev);
> +	if (IS_ERR_VALUE(ret)) {
> +		memset(data, 0, sizeof(*data) * ARRAY_SIZE(fec_stats));
> +		return;
> +	}

This really isn't the way to do this.

When the device is suspended and the clocks are going to be stopped, you
must fetch the statistic values into a software copy and provide those if
the device is suspended when statistics are requested.
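The approach described in this reply can be sketched in plain C as follows. This is only an illustration under stated assumptions, not fec driver code: struct dev_state, dev_suspend(), dev_get_stats() and NUM_STATS are hypothetical, and the hw_regs array stands in for hardware counters that cannot be read once the clocks are stopped.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NUM_STATS 4 /* hypothetical number of counters */

/* Hypothetical device state, for illustration only. */
struct dev_state {
	bool suspended;
	uint64_t hw_regs[NUM_STATS]; /* stands in for hardware counters */
	uint64_t cached[NUM_STATS];  /* software copy taken at suspend */
};

/* Snapshot the counters while the hardware is still readable. */
static void dev_suspend(struct dev_state *d)
{
	memcpy(d->cached, d->hw_regs, sizeof(d->cached));
	d->suspended = true;
}

/* Report statistics without touching the hardware while suspended. */
static void dev_get_stats(const struct dev_state *d, uint64_t *data)
{
	const uint64_t *src = d->suspended ? d->cached : d->hw_regs;

	memcpy(data, src, sizeof(d->cached));
}
```

With this shape the statistics path never has to power the device up, and user space sees the last known values rather than zeros.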
Re: pull-request: wireless-drivers-next 2016-11-25
From: Kalle Valo
Date: Fri, 25 Nov 2016 11:39:49 +0200

> here's a pull request for 4.10. ath9k has now been converted to use
> mac80211 intermediate software queues to fix bufferbloat problems. rsi
> has become active again and lately mwifiex has been getting a _lot_ of
> love.
>
> I'm not expecting to see any problems with this pull request. When you
> pull git will do lots of automerging but at least I didn't see any
> conflicts. Please let me know if you have any problems.

Pulled, thanks Kalle.
Re: [PATCH 1/1] net: macb: fix the RX queue reset in macb_rx()
From: Cyrille Pitchen
Date: Fri, 25 Nov 2016 09:49:32 +0100

> On macb only (not gem), when a RX queue corruption was detected from
> macb_rx(), the RX queue was reset: during this process the RX ring
> buffer descriptor was initialized by macb_init_rx_ring() but we forgot
> to also set bp->rx_tail to 0.
>
> Indeed, when processing the received frames, bp->rx_tail provides the
> macb driver with the index in the RX ring buffer of the next buffer to
> process. So when the whole ring buffer is reset we must also reset
> bp->rx_tail so the driver is synchronized again with the hardware.
>
> Since macb_init_rx_ring() is called from many locations, currently from
> macb_rx() and macb_init_rings(), we'd rather add the "bp->rx_tail = 0;"
> line inside macb_init_rx_ring() than add the very same line after each
> call of this function.
>
> Without this fix, the rx queue is not reset properly to recover from
> queue corruption and connection drop may occur.
>
> Signed-off-by: Cyrille Pitchen
> Fixes: 9ba723b081a2 ("net: macb: remove BUG_ON() and reset the queue to
> handle RX errors")

This doesn't apply cleanly to the 'net' tree, where RX_RING_SIZE is used
instead of bp->rx_ring_size. It seems you generated this against net-next,
however you didn't say that either in your Subject line or in the commit
message.

As a bug fix this should be targeted at 'net'.
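The design choice argued for in the quoted commit message (reset the software tail index inside the ring init helper rather than after each call) can be sketched like this. It is a simplified illustration, not macb code: struct rx_ring, RING_SIZE, DESC_USED and rx_ring_init() are hypothetical names.

```c
#include <stdint.h>

#define RING_SIZE 8     /* hypothetical ring length */
#define DESC_USED 0x1u  /* hypothetical "buffer filled" flag */

/* Hypothetical RX ring: descriptors plus the driver's read position. */
struct rx_ring {
	uint32_t desc[RING_SIZE];
	unsigned int tail; /* index of the next descriptor to process */
};

/*
 * Reinitialise the ring. Resetting tail here, rather than at each call
 * site, keeps the driver's position in step with the restarted hardware
 * no matter which path triggered the reset.
 */
static void rx_ring_init(struct rx_ring *r)
{
	unsigned int i;

	for (i = 0; i < RING_SIZE; i++)
		r->desc[i] = 0;	/* hand every descriptor back to hardware */
	r->tail = 0;
}
```

If tail were left at its old value, the driver would look for the next frame at a slot the freshly reset hardware has not filled yet.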
Re: pull request (net): ipsec 2016-11-25
From: Steffen Klassert
Date: Fri, 25 Nov 2016 07:57:57 +0100

> 1) Fix a refcount leak in vti6.
>    From Nicolas Dichtel.
>
> 2) Fix a wrong if statement in xfrm_sk_policy_lookup.
>    From Florian Westphal.
>
> 3) The flowcache watermarks are per cpu. Take this into
>    account when comparing to the threshold where we
>    refusing new allocations. From Miroslav Urbanek.
>
> Please pull or let me know if there are problems.

Pulled, thanks Steffen!
Re: [PATCH net 1/1] driver: macvtap: Unregister netdev rx_handler if macvtap_newlink fails
From: f...@ikuai8.com
Date: Fri, 25 Nov 2016 10:05:06 +0800

> From: Gao Feng
>
> The macvtap_newlink registers the netdev rx_handler firstly, but it
> does not unregister the handler if macvlan_common_newlink failed.
>
> Signed-off-by: Gao Feng

Applied.
Re: [PATCH net v2 0/5] net: fix phydev reference leaks
From: Johan Hovold
Date: Thu, 24 Nov 2016 19:21:26 +0100

> This series fixes a number of phydev reference leaks (and one of_node
> leak) due to failure to put the reference taken by of_phy_find_device().
>
> Note that I did not try to fix drivers/net/phy/xilinx_gmii2rgmii.c which
> still leaks a reference.
>
> Against net but should apply just as fine to net-next.
 ...
> v2:
>  - use put_device() instead of phy_dev_free() to put the references
>    taken in net/dsa (patch 1/4).
>  - add four new patches fixing similar leaks

Series applied, thanks.
Re: [PATCH] irda: fix overly long udelay()
From: Arnd Bergmann
Date: Thu, 24 Nov 2016 17:26:22 +0100

> irda_get_mtt() returns a hardcoded '1' in some cases,
> and with gcc-7, we get a build error because this triggers a
> compile-time check in udelay():
>
> drivers/net/irda/w83977af_ir.o: In function `w83977af_hard_xmit':
> w83977af_ir.c:(.text.w83977af_hard_xmit+0x14c): undefined reference to `__bad_udelay'
>
> Older compilers did not run into this because they either did not
> completely inline the irda_get_mtt() or did not consider the
> 1 value a constant expression.
>
> The code has been wrong since the start of git history.
>
> Signed-off-by: Arnd Bergmann

Applied, thanks Arnd.
Re: [PATCH net 1/1] driver: ipvlan: Fix one possible memleak in ipvlan_link_new
From: f...@ikuai8.com
Date: Thu, 24 Nov 2016 23:39:59 +0800

> From: Gao Feng
>
> When ipvlan_link_new fails and creates one ipvlan port, it does not
> destroy the ipvlan port created. It causes mem leak and the physical
> device contains invalid ipvlan data.
>
> Signed-off-by: Gao Feng

Applied, thanks.
Re: [PATCH 2/2] net: dsa: mv88e6xxx: Add 88E6176 device tree support
> Try to see it from my perspective: I see that some vf610 device I don't
> have (found via `git grep marvell,mv88e6` or so) uses
> "marvell,mv88e6085". I then assume it has that device on board. How
> would I know it doesn't? Same for the other boards you mention.
>
> Unfortunately some of your replies are slightly cryptic. Had you simply
> replied 'please just use "marvell,mv88e6085" instead', it would've been
> much more clear what you want. (Same for extending the subject instead
> of just pointing to some FAQ.)

By reading the FAQ you have learnt more than me saying put the correct
tree in the subject line. By asking you to explain why you need a
compatible string, i'm trying to make you think, look at the code and
understand it. In the future, you might think and understand the code
before posting a patch, and then we all save time.

> So are you okay with patch 1/2 documenting the compatible? Then we could
> drop 2/2 and use "marvell,mv88e6176", "marvell,mv88e6085" instead of
> just the latter. Or would you rather drop both and keep the actual chip
> a comment?

A comment only please.

Thanks
	Andrew
Re: [PATCH net-next v2 3/4] Documentation: net: phy: Add blurb about RGMII
On 27/11/2016 at 14:24, Timur Tabi wrote:
>> + * PHY device drivers in PHYLIB being reusable by nature, being able to
>> + configure correctly a specified delay enables more designs with similar delay
>> + requirements to be operate correctly
>
> Ok, this one I don't know how to fix. I'm not really sure what you're
> trying to say.

What I am trying to say is that once a PHY driver properly configures a
delay that you have specified, there is no reason why this is not
applicable to other platforms using this same PHY driver.

>> +
>> +Common problems with RGMII delay mismatch
>> +
>> + When there is a RGMII delay mismatch between the Ethernet MAC and the PHY, this
>> + will most likely result in the clock and data line sampling to capture unstable
>
> I'm not sure what "sampling to capture unstable" is supposed to mean.

When the PHY device takes a "snapshot" of the state of the data lines,
after a clock edge, if the delay is improperly configured, these data lines
are going to still be floating, or show some kind of capacitance/inductance
effect, so the logical level which is going to be read may be incorrect.
-- 
Florian
Re: [PATCH 2/2] net: dsa: mv88e6xxx: Add 88E6176 device tree support
Andrew,

On 27.11.2016 at 23:08, Andrew Lunn wrote:
>>> This driver already supports nearly 30 different Marvell switch
>>> models. Please document why the marvell,mv88e6176 is special and why
>>> it needs its own compatible string when the others don't.
>>
>> I don't understand.
>
> Think about what i said. Why does the 6176 need its own compatible
> string, when the two 6352s and the 6165 on the zii-devel-b don't have
> one? And the DIR 665 has a 6171, which does not have a compatible
> string of its own. The clearfog actually has a 6176, and it seems to
> work fine without a compatible string.
>
>> You as driver author should know that the .data pointer is vital to your
>> driver
>
> Exactly, so if i ask why is it needed, maybe you should stop and think
> for a while.
>
>> you even recently accepted another model that conflicted with
>> my patch.
>
> And think about that also, and you will find the 6390 family, whose
> first device is 6190, is not compatible with the 6085, and so needs a
> different compatible string.

Try to see it from my perspective: I see that some vf610 device I don't
have (found via `git grep marvell,mv88e6` or so) uses
"marvell,mv88e6085". I then assume it has that device on board. How
would I know it doesn't? Same for the other boards you mention.

Unfortunately some of your replies are slightly cryptic. Had you simply
replied 'please just use "marvell,mv88e6085" instead', it would've been
much more clear what you want. (Same for extending the subject instead
of just pointing to some FAQ.)

So are you okay with patch 1/2 documenting the compatible? Then we could
drop 2/2 and use "marvell,mv88e6176", "marvell,mv88e6085" instead of
just the latter. Or would you rather drop both and keep the actual chip
a comment?

Regards,
Andreas

-- 
SUSE Linux GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Re: [PATCH net-next] net: dsa: mv88e6xxx: Fix mv88e6xxx_g1_irq_free() interrupt count
On Sun, Nov 27, 2016 at 11:26:28PM +0100, Andreas Färber wrote:
> mv88e6xxx_g1_irq_setup() sets up chip->g1_irq.nirqs interrupt mappings,
> so free the same amount. This will be 8 or 9 in practice, less than 16.
>
> Fixes: dc30c35be720 ("net: dsa: mv88e6xxx: Implement interrupt support.")
> Cc: Andrew Lunn
> Signed-off-by: Andreas Färber

Reviewed-by: Andrew Lunn

Thanks
	Andrew
[PATCH net-next] net: dsa: mv88e6xxx: Fix mv88e6xxx_g1_irq_free() interrupt count
mv88e6xxx_g1_irq_setup() sets up chip->g1_irq.nirqs interrupt mappings,
so free the same amount. This will be 8 or 9 in practice, less than 16.

Fixes: dc30c35be720 ("net: dsa: mv88e6xxx: Implement interrupt support.")
Cc: Andrew Lunn
Signed-off-by: Andreas Färber
---
 drivers/net/dsa/mv88e6xxx/chip.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index b14b3d5099c8..77f13ada2612 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -421,7 +421,7 @@ static void mv88e6xxx_g1_irq_free(struct mv88e6xxx_chip *chip)
 
 	free_irq(chip->irq, chip);
 
-	for (irq = 0; irq < 16; irq++) {
+	for (irq = 0; irq < chip->g1_irq.nirqs; irq++) {
 		virq = irq_find_mapping(chip->g1_irq.domain, irq);
 		irq_dispose_mapping(virq);
 	}
--
2.6.6
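The symmetry this patch restores can be sketched in plain userspace C.
This is only a toy model under stated assumptions: struct chip_sketch,
irq_setup_sketch() and irq_free_sketch() are hypothetical names, and the
boolean array merely stands in for the interrupt domain's mappings.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_MAX_HWIRQS 16

/* Hypothetical model of a chip that owns a small interrupt domain. */
struct chip_sketch {
	unsigned int nirqs;			/* mappings created at setup */
	bool mapped[SKETCH_MAX_HWIRQS];		/* which hwirqs have a mapping */
	unsigned int disposed_unmapped;		/* disposals that hit no mapping */
};

/* Create exactly nirqs mappings and remember how many were created. */
static void irq_setup_sketch(struct chip_sketch *chip, unsigned int nirqs)
{
	unsigned int irq;

	assert(nirqs <= SKETCH_MAX_HWIRQS);
	chip->nirqs = nirqs;
	for (irq = 0; irq < nirqs; irq++)
		chip->mapped[irq] = true;
}

/* Tear down exactly as many mappings as setup created, not a fixed 16. */
static void irq_free_sketch(struct chip_sketch *chip)
{
	unsigned int irq;

	for (irq = 0; irq < chip->nirqs; irq++) {
		if (!chip->mapped[irq])
			chip->disposed_unmapped++;
		chip->mapped[irq] = false;
	}
}
```

Bounding the free loop by the stored count means it never touches a
hwirq that setup did not map, whatever the per-chip count happens to be.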
Re: [PATCH net-next v2 3/4] Documentation: net: phy: Add blurb about RGMII
Just some grammatical corrections. You might want to run a spellchecker
on all the patches.

Florian Fainelli wrote:

+ The Reduced Gigabit Medium Independent Interface (RGMII) is a 12 pins

"is a 12-pin"

+ electrical signal interface using a synchronous 125Mhz clock signal and several
+ data lines. Due to this design decision, a 1.5ns to 2ns delay must be added
+ between the clock line (RXC or TXC) and the data lines to let the PHY (clock
+ sink) have enough setup and hold times to sample the data lines correctly. The
+ PHY library offers different types of PHY_INTERFACE_MODE_RGMII* values to let
+ the PHY driver and optionaly the MAC driver implement the required delay. The

"driver, and optionally the MAC driver, implement"

+ values of phy_interface_t must be understood from the perspective of the PHY
+ device itself, leading to the following:
+
+ * PHY_INTERFACE_MODE_RGMII: the PHY is not responsible for inserting any
+   internal delay by itself, it assumes that either the Ethernet MAC (if capable
+   or the PCB traces) insert the correct 1.5-2ns delay
+
+ * PHY_INTERFACE_MODE_RGMII_TXID: the PHY should be inserting an internal delay

"should insert"

+   for the transmit data lines (TXD[3:0]) processed by the PHY device
+
+ * PHY_INTERFACE_MODE_RGMII_RXID: the PHY should be inserting an internal delay

"should insert"

+   for the receive data lines (RXD[3:0]) processed by the PHY device
+
+ * PHY_INTERFACE_MODE_RGMII_ID: the PHY should be inserting internal delays for

"should insert"

+   both transmit AND receive data lines from/to the PHY device
+
+ Whenever it is possible, it is preferrable to utilize the PHY side RGMII delay
+ for several reasons:

"Whenever possible, use the PHY side RGMII delay for these reasons:"

+ * PHY devices may offer sub-nanosecond granularity in how they allow a
+   receiver/transmitter side delay (e.g: 0.5, 1.0, 1.5ns) to be specified. Such
+   precision may be required to account for differences in PCB trace lengths
+
+ * PHY devices are typically qualified for a large range of applications
+   (industrial, medical, automotive...), and they provide a constant and
+   reliable delay across temperature/pressure/voltage ranges
+
+ * PHY device drivers in PHYLIB being reusable by nature, being able to
+   configure correctly a specified delay enables more designs with similar delay
+   requirements to be operate correctly

Ok, this one I don't know how to fix. I'm not really sure what you're
trying to say.

+
+ For cases where the PHY is not capable of providing this delay, but the
+ Ethernet MAC driver is capable of doing it, the correct phy_interface_t value

"doing so,"

+ should be PHY_INTERFACE_MODE_RGMII, and the Ethernet MAC driver should be
+ configured correctly in order to provide the required transmit and/or receive
+ side delay from the perspective of the PHY device. Conversely, if the Ethernet
+ MAC driver looks at the phy_interface_t value, for any other mode but
+ PHY_INTERFACE_MODE_RGMII, it should make sure that the MAC-level delays are
+ disabled.
+
+ In case neither the Ethernet MAC, nor the PHY are capable of providing the
+ required delays, as defined per the RGMII standard, several options may be
+ available:
+
+ * Some SoCs may offer a pin pad/mux/controller capable of configuring a given
+   set of pins' drive strength, delays and voltage, and it may be a suitable

"strength, delays, and voltage; and"

+   option to insert the expected 2ns RGMII delay
+
+ * Modifying the PCB design to include a fixed delay (e.g: using a specifically
+   designed serpentine), which may not require software configuration at all

period after "all".

+
+Common problems with RGMII delay mismatch
+
+ When there is a RGMII delay mismatch between the Ethernet MAC and the PHY, this
+ will most likely result in the clock and data line sampling to capture unstable

I'm not sure what "sampling to capture unstable" is supposed to mean.

+ signals, typical symptoms include:
+
+ * Transmission/reception partially works, and there is frequent or occasional
+   packet loss observed
+
+ * Ethernet MAC may report some, or all packets ingressing with a FCS/CRC error,

No comma after "some".

+   or just discard them all
+
+ * Switching to lower speeds such as 10/100Mbits/sec makes the problem go away
+   (since there is enough setup/hold time in that case)

--
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the
Code Aurora Forum, hosted by The Linux Foundation.
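The interpretation of the four RGMII modes quoted in this review can be
sketched as a small decision table. The enum and function names below
are hypothetical stand-ins (they are not the kernel's phy_interface_t
API); only the rules themselves are taken from the quoted text: the
values describe what the PHY inserts, and a MAC may add its own delay
only in plain RGMII mode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the PHY_INTERFACE_MODE_RGMII* values. */
enum rgmii_mode_sketch {
	SKETCH_RGMII,		/* PHY inserts no delay */
	SKETCH_RGMII_ID,	/* PHY delays both TX and RX data lines */
	SKETCH_RGMII_RXID,	/* PHY delays RX data lines only */
	SKETCH_RGMII_TXID,	/* PHY delays TX data lines only */
};

/* Report which internal delays the PHY is expected to insert. */
static void phy_delays_sketch(enum rgmii_mode_sketch mode, bool *tx, bool *rx)
{
	assert(tx != NULL && rx != NULL);
	*tx = (mode == SKETCH_RGMII_ID || mode == SKETCH_RGMII_TXID);
	*rx = (mode == SKETCH_RGMII_ID || mode == SKETCH_RGMII_RXID);
}

/*
 * A MAC may insert the delay itself only in plain RGMII mode; for any
 * other mode the MAC-level delays should be disabled.
 */
static bool mac_delay_allowed_sketch(enum rgmii_mode_sketch mode,
				     bool mac_capable)
{
	return mode == SKETCH_RGMII && mac_capable;
}
```

Keeping both helpers keyed on the same mode value makes it hard to end
up with the delay applied twice, or not at all, which is the mismatch
the "Common problems" paragraph describes.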
Re: [PATCH 2/2] net: dsa: mv88e6xxx: Add 88E6176 device tree support
> > This driver already supports nearly 30 different Marvell switch
> > models. Please document why the marvell,mv88e6176 is special and why
> > it needs its own compatible string when the others don't.
>
> I don't understand.

Think about what i said. Why does the 6176 need its own compatible
string, when the two 6352s and the 6165 on the zii-devel-b don't have
one? And the DIR 665 has a 6171, which does not have a compatible
string of its own. The clearfog actually has a 6176, and it seems to
work fine without a compatible string.

> You as driver author should know that the .data pointer is vital to your
> driver

Exactly, so if i ask why is it needed, maybe you should stop and think
for a while.

> you even recently accepted another model that conflicted with
> my patch.

And think about that also, and you will find the 6390 family, whose
first device is 6190, is not compatible with the 6085, and so needs a
different compatible string.

	Andrew
Re: [PATCH 2/2] net: dsa: mv88e6xxx: Add 88E6176 device tree support
On 27.11.2016 at 22:27, Andrew Lunn wrote:
> On Sun, Nov 27, 2016 at 09:57:59PM +0100, Andreas Färber wrote:
>> This model is found on the Turris Omnia.
>
> This driver already supports nearly 30 different Marvell switch
> models. Please document why the marvell,mv88e6176 is special and why
> it needs its own compatible string when the others don't.

I don't understand. The commit message above already points out for
which device this is (and you also know from the LAKML thread).

You as driver author should know that the .data pointer is vital to your
driver - you even recently accepted another model that conflicted with
my patch.

So are you arguing for a ", which uses a Device Tree for booting"
half-sentence here? The others not having an entry simply means no one
needed them yet.

And any Turris Omnia side changes need to go through the mvebu tree.

Regards,
Andreas

--
SUSE Linux GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Re: [PATCH] mv88e6xxx: Fix mv88e6xxx_g1_irq_free() interrupt count
On Sun, Nov 27, 2016 at 10:32:41PM +0100, Andreas Färber wrote:
> Hi Andrew,
>
> On 27.11.2016 at 22:22, Andrew Lunn wrote:
> > On Sun, Nov 27, 2016 at 09:43:44PM +0100, Andreas Färber wrote:
> >> mv88e6xxx_g1_irq_setup() sets up chip->g1_irq.nirqs interrupt mappings,
> >> so free the same amount. This will be 8 or 9 in practice, less than 16.
> >
> > Hi Andreas
> >
> > The patch is correct, but please read
> > Documentation/networking/netdev-FAQ.txt
> > and then resubmit the patch.
>
> Do you mean --subject-prefix="PATCH net-next"

Yep.

Thanks
	Andrew
Re: [PATCH v2] MAINTAINERS: Add device tree bindings to mv88e6xx section
On Sun, Nov 27, 2016 at 10:07:30PM +0100, Andreas Färber wrote:
> Also include the netdev list for convenience, as done elsewhere.

Please indicate which maintainer you expect to accept this. And if
that is David Miller, please fix the Subject: line.

> Cc: Andrew Lunn
> Cc: Vivien Didelot
> Signed-off-by: Andreas Färber

Reviewed-by: Andrew Lunn

	Andrew
Re: [PATCH] mv88e6xxx: Fix mv88e6xxx_g1_irq_free() interrupt count
Hi Andrew,

On 27.11.2016 at 22:22, Andrew Lunn wrote:
> On Sun, Nov 27, 2016 at 09:43:44PM +0100, Andreas Färber wrote:
>> mv88e6xxx_g1_irq_setup() sets up chip->g1_irq.nirqs interrupt mappings,
>> so free the same amount. This will be 8 or 9 in practice, less than 16.
>
> Hi Andreas
>
> The patch is correct, but please read
> Documentation/networking/netdev-FAQ.txt
> and then resubmit the patch.

Do you mean --subject-prefix="PATCH net-next" or something else?

Thanks,
Andreas

--
SUSE Linux GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Re: [PATCH 2/2] net: dsa: mv88e6xxx: Add 88E6176 device tree support
On Sun, Nov 27, 2016 at 09:57:59PM +0100, Andreas Färber wrote:
> This model is found on the Turris Omnia.

This driver already supports nearly 30 different Marvell switch
models. Please document why the marvell,mv88e6176 is special and why
it needs its own compatible string when the others don't.

	Andrew
Re: [PATCH] mv88e6xxx: Fix mv88e6xxx_g1_irq_free() interrupt count
On Sun, Nov 27, 2016 at 09:43:44PM +0100, Andreas Färber wrote:
> mv88e6xxx_g1_irq_setup() sets up chip->g1_irq.nirqs interrupt mappings,
> so free the same amount. This will be 8 or 9 in practice, less than 16.

Hi Andreas

The patch is correct, but please read
Documentation/networking/netdev-FAQ.txt
and then resubmit the patch.

	Andrew

>
> Fixes: dc30c35be720 ("net: dsa: mv88e6xxx: Implement interrupt support.")
> Cc: Andrew Lunn
> Signed-off-by: Andreas Färber
> ---
>  drivers/net/dsa/mv88e6xxx/chip.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
> index 98302358ceb9..95b9efb33ec7 100644
> --- a/drivers/net/dsa/mv88e6xxx/chip.c
> +++ b/drivers/net/dsa/mv88e6xxx/chip.c
> @@ -421,7 +421,7 @@ static void mv88e6xxx_g1_irq_free(struct mv88e6xxx_chip *chip)
>
> 	free_irq(chip->irq, chip);
>
> -	for (irq = 0; irq < 16; irq++) {
> +	for (irq = 0; irq < chip->g1_irq.nirqs; irq++) {
> 		virq = irq_find_mapping(chip->g1_irq.domain, irq);
> 		irq_dispose_mapping(virq);
> 	}
> --
> 2.6.6
>
Re: [PATCH net-next 09/11] qede: Better utilize the qede_[rt]x_queue
> > I'd say this is a false positive, given that MTU can't be so large.

> False positive or not you must fix the warning and resubmit this
> series with that fixed.

Sure. I'll re-spin later this week [hopefully it'll get some
additional review comments by then].
[PATCH v2] MAINTAINERS: Add device tree bindings to mv88e6xx section
Also include the netdev list for convenience, as done elsewhere.

Cc: Andrew Lunn
Cc: Vivien Didelot
Signed-off-by: Andreas Färber
---
 MAINTAINERS | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index f73e19277a70..677d73cfedc7 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7668,8 +7668,10 @@ S:	Maintained
 MARVELL 88E6XXX ETHERNET SWITCH FABRIC DRIVER
 M:	Andrew Lunn
 M:	Vivien Didelot
+L:	netdev@vger.kernel.org
 S:	Maintained
 F:	drivers/net/dsa/mv88e6xxx/
+F:	Documentation/devicetree/bindings/net/dsa/marvell.txt
 
 MARVELL ARMADA DRM SUPPORT
 M:	Russell King
--
2.6.6
[PATCH] MAINTAINERS: Add device tree bindings to mv88e6xx section
Also include the netdev list for convenience, as done elsewhere.

Cc: Andrew Lunn
Cc: Vivien Didelot
Signed-off-by: Andreas Färber
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index f73e19277a70..46ccf6eadcc9 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7670,6 +7670,7 @@ M:	Andrew Lunn
 M:	Vivien Didelot
 S:	Maintained
 F:	drivers/net/dsa/mv88e6xxx/
+F:	Documentation/devicetree/bindings/net/dsa/marvell.txt
 
 MARVELL ARMADA DRM SUPPORT
 M:	Russell King
--
2.6.6
[PATCH 2/2] net: dsa: mv88e6xxx: Add 88E6176 device tree support
This model is found on the Turris Omnia.

Signed-off-by: Andreas Färber
---
 drivers/net/dsa/mv88e6xxx/chip.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 77f13ada2612..95b9efb33ec7 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -4280,6 +4280,10 @@ static const struct of_device_id mv88e6xxx_of_match[] = {
 		.data = &mv88e6xxx_table[MV88E6085],
 	},
 	{
+		.compatible = "marvell,mv88e6176",
+		.data = &mv88e6xxx_table[MV88E6176],
+	},
+	{
 		.compatible = "marvell,mv88e6190",
 		.data = &mv88e6xxx_table[MV88E6190],
 	},
--
2.6.6
[PATCH 1/2] Documentation: net: dsa: marvell: Add 88E6176
Signed-off-by: Andreas Färber
---
 Documentation/devicetree/bindings/net/dsa/marvell.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/devicetree/bindings/net/dsa/marvell.txt b/Documentation/devicetree/bindings/net/dsa/marvell.txt
index b3dd6b40e0de..000bc3b16edd 100644
--- a/Documentation/devicetree/bindings/net/dsa/marvell.txt
+++ b/Documentation/devicetree/bindings/net/dsa/marvell.txt
@@ -15,6 +15,7 @@ Additional required and optional properties can be found in dsa.txt.
 
 Required properties:
 - compatible	: Should be one of "marvell,mv88e6085" or
+		  "marvell,mv88e6176" or
 		  "marvell,mv88e6190"
 - reg		: Address on the MII bus for the switch.
 
--
2.6.6
Re: [PATCH] netdevice: fix sparse warning for HARD_TX_LOCK
From: "Michael S. Tsirkin"
Date: Thu, 24 Nov 2016 07:04:08 +0200

> sparse warns about context imbalance in any code
> that uses HARD_TX_LOCK/UNLOCK - this is because it's
> unable to determine that flags don't change so
> lock and unlock are paired.
>
> Seems easy enough to fix by adding __acquire/__release
> calls.
>
> With this patch af_packet.c is now sparse-clean,
>
> Signed-off-by: Michael S. Tsirkin

Applied to net-next, thanks.
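The technique described in the quoted commit message can be sketched as
follows: when a lock is only taken conditionally, the branch that skips
the real lock still tells the checker that the context was entered, so
lock and unlock look paired on both paths. This is a userspace toy under
stated assumptions, not the applied patch: every SKETCH_ and _sketch
name is hypothetical, the "lock" is just a counter, and the sketch has
not been run through sparse itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Annotations are only meaningful to sparse; otherwise they vanish. */
#ifdef __CHECKER__
# define SKETCH_ACQUIRES(x)	__attribute__((context(x, 0, 1)))
# define SKETCH_RELEASES(x)	__attribute__((context(x, 1, 0)))
# define SKETCH_ACQUIRE(x)	__context__(x, 1)
# define SKETCH_RELEASE(x)	__context__(x, -1)
#else
# define SKETCH_ACQUIRES(x)
# define SKETCH_RELEASES(x)
# define SKETCH_ACQUIRE(x)	(void)0
# define SKETCH_RELEASE(x)	(void)0
#endif

/* Toy stand-in for a TX queue lock: depth 0 means unlocked. */
struct txq_sketch {
	int depth;
};

static void txq_lock_sketch(struct txq_sketch *q) SKETCH_ACQUIRES(q)
{
	q->depth++;
	SKETCH_ACQUIRE(q);
}

static void txq_unlock_sketch(struct txq_sketch *q) SKETCH_RELEASES(q)
{
	assert(q->depth > 0);
	q->depth--;
	SKETCH_RELEASE(q);
}

/* Lockless devices skip the real lock but keep the annotation. */
#define HARD_TX_LOCK_SKETCH(lockless, q) do {	\
	if (!(lockless))			\
		txq_lock_sketch(q);		\
	else					\
		SKETCH_ACQUIRE(q);		\
} while (0)

#define HARD_TX_UNLOCK_SKETCH(lockless, q) do {	\
	if (!(lockless))			\
		txq_unlock_sketch(q);		\
	else					\
		SKETCH_RELEASE(q);		\
} while (0)

/* Returns the lock depth observed inside the critical section. */
static int xmit_sketch(struct txq_sketch *q, bool lockless)
{
	int depth_inside;

	HARD_TX_LOCK_SKETCH(lockless, q);
	depth_inside = q->depth;
	HARD_TX_UNLOCK_SKETCH(lockless, q);
	return depth_inside;
}
```

In a normal build the annotation-only branch compiles to nothing, so the
runtime behaviour of the lockless path is unchanged.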
[PATCH] mv88e6xxx: Fix mv88e6xxx_g1_irq_free() interrupt count
mv88e6xxx_g1_irq_setup() sets up chip->g1_irq.nirqs interrupt mappings,
so free the same amount. This will be 8 or 9 in practice, less than 16.

Fixes: dc30c35be720 ("net: dsa: mv88e6xxx: Implement interrupt support.")
Cc: Andrew Lunn
Signed-off-by: Andreas Färber
---
 drivers/net/dsa/mv88e6xxx/chip.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 98302358ceb9..95b9efb33ec7 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -421,7 +421,7 @@ static void mv88e6xxx_g1_irq_free(struct mv88e6xxx_chip *chip)
 
 	free_irq(chip->irq, chip);
 
-	for (irq = 0; irq < 16; irq++) {
+	for (irq = 0; irq < chip->g1_irq.nirqs; irq++) {
 		virq = irq_find_mapping(chip->g1_irq.domain, irq);
 		irq_dispose_mapping(virq);
 	}
--
2.6.6
Re: [PATCH net-next v2 0/4] Documentation: net: phy: Improve documentation
On Sun, Nov 27, 2016 at 7:44 PM, Florian Fainelli wrote:
> Hi all,
>
> This patch series addresses discussions and feedback that was recently received
> on the mailing-list in the area of: flow control/pause frames, interpretation of
> phy_interface_t and finally add some links to useful standards documents.
>
> Changes in v2:
>
> - clarify a few things in the RGMII section, add a paragraph about common issues
>   with RGMII delay mismatches

Reviewed-by: Martin Blumenstingl

Thanks a lot Florian, this will definitely help others in the future!

> Florian Fainelli (4):
>   Documentation: net: phy: remove description of function pointers
>   Documentation: net: phy: Add a paragraph about pause frames/flow control
>   Documentation: net: phy: Add blurb about RGMII
>   Documentation: net: phy: Add links to several standards documents
>
>  Documentation/networking/phy.txt | 139 +--
>  1 file changed, 104 insertions(+), 35 deletions(-)
>
> --
> 2.9.3
>
Re: [PATCH net-next 1/1] ptp: gianfar: Use high resolution frequency method.
From: Ulrik De Bie
Date: Wed, 23 Nov 2016 21:11:04 +0100

> This patch depends on commit d8d263541913 ("ptp: Introduce a high
> resolution frequency adjustment method.")
>
> The gianfar devices offer a frequency resolution of about 0.46 ppb
> (depends on actual value of tmr_add, for the calculation assumed
> 0x80000000). This patch lets users of the device benefit from the increased
> frequency resolution when tuning the clock. Thanks to the rounding the
> maximum error between the requested frequency and the applied frequency
> will then be about 0.23 ppb.
>
> Tested on a v3.3.8 kernel on a real gianfar device. Verified compilation
> on net-next (currently at v4.9-rc5).
>
> Signed-off-by: Ulrik De Bie

Applied.
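The quoted figures can be checked with a little arithmetic. One step of
tmr_add changes the frequency by a relative 1/tmr_add, so taking
tmr_add = 0x80000000 (2^31) gives 10^9 / 2^31, about 0.4657 ppb, and
rounding to the nearest step bounds the error at half of that, about
0.23 ppb. The sketch below is an assumption-laden illustration, not the
driver's code: both function names are hypothetical, and the conversion
only assumes that scaled_ppm is ppm with a 16-bit binary fraction.

```c
#include <assert.h>
#include <stdint.h>

/* Relative size of one tmr_add step, in parts per billion. */
static double tmr_add_resolution_ppb_sketch(uint32_t tmr_add)
{
	assert(tmr_add != 0);
	return 1e9 / (double)tmr_add;
}

/*
 * Convert a positive scaled_ppm request into a tmr_add delta, rounding
 * to the nearest integer so the error stays within half a step.
 * Intended for realistic adjustment sizes; extreme inputs near the
 * 64-bit limit are not handled.
 */
static uint32_t tmr_add_delta_sketch(uint32_t tmr_add, uint32_t scaled_ppm)
{
	uint64_t num = (uint64_t)tmr_add * scaled_ppm;
	uint64_t den = 1000000ULL * 65536ULL;	/* ppm scale times 2^16 */

	return (uint32_t)((num + den / 2) / den);
}
```

For example, a request of exactly 1 ppm (scaled_ppm = 65536) against
tmr_add = 0x80000000 corresponds to 2147.48 steps and rounds to 2147.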
Re: [PATCH v3] cpsw: ethtool: add support for getting/setting EEE registers
From: yegorsli...@googlemail.com
Date: Thu, 24 Nov 2016 10:17:01 +0100

> From: Yegor Yefremov
>
> Add the ability to query and set Energy Efficient Ethernet parameters
> via ethtool for applicable devices.
>
> This patch doesn't activate full EEE support in cpsw driver, but it
> enables reading and writing EEE advertising settings. This way one
> can disable advertising EEE for certain speeds.
>
> Signed-off-by: Yegor Yefremov
> Acked-by: Rami Rosen
> ---
> Changes:
> v3: explain what features will be available with this patch (Florian Fainelli)
> v2: make routines static (Rami Rosen)

Does not apply cleanly to net-next, please respin.
Re: [PATCH net-next] mlx4: do not use priv->stats_lock in mlx4_en_auto_moderation()
From: Eric Dumazet
Date: Wed, 23 Nov 2016 09:46:52 -0800

> From: Eric Dumazet
>
> Per RX ring packets/bytes counters are not protected by global
> priv->stats_lock.
>
> Better not confuse the reader, and use READ_ONCE() to show we read
> these counters without surrounding synchronization.
>
> Interrupt moderation is best effort, and we do not really care of
> ultra precise counters.
>
> Signed-off-by: Eric Dumazet

Applied.
Re: [PATCH v2] ipv6:ipv6_pinfo dereferenced after NULL check
From: Manjeet Pawar
Date: Thu, 24 Nov 2016 16:11:57 +0530

> From: Rohit Thapliyal
>
> np checked for NULL and then dereferenced. It should be modified
> for NULL case.
>
> Signed-off-by: Rohit Thapliyal
> Signed-off-by: Manjeet Pawar
> Signed-off-by: Hannes Frederic Sowa
> Reviewed-by: Akhilesh Kumar

I do not think inet6_sk(sk) can ever be NULL in this function.

All callers fall into two categories:

1) Calls where arguments already dereference np in some way to pass
   arguments to ip6_xmit():

net/dccp/ipv6.c:	err = ip6_xmit(sk, skb, &fl6, opt, np->tclass);
net/ipv6/inet6_connection_sock.c:	res = ip6_xmit(sk, skb, &fl6, rcu_dereference(np->opt),
net/ipv6/tcp_ipv6.c:	err = ip6_xmit(sk, skb, fl6, opt, np->tclass);
net/sctp/ipv6.c:	res = ip6_xmit(sk, skb, fl6, rcu_dereference(np->opt), np->tclass);

2) Calls where the socket is a "control" socket which is initialized
   at protocol registration time and therefore definitely has a proper
   inet6_sk() pointer set up.

net/dccp/ipv6.c:	ip6_xmit(ctl_sk, skb, &fl6, NULL, 0);
net/ipv6/tcp_ipv6.c:	ip6_xmit(ctl_sk, buff, &fl6, NULL, tclass);

Therefore, I think we should simply remove the NULL test entirely.
Re: [PATCH net-next 09/11] qede: Better utilize the qede_[rt]x_queue
From: "Mintz, Yuval"
Date: Sun, 27 Nov 2016 16:15:42 +0000

> I'd say this is a false positive, given that MTU can't be so large.

False positive or not you must fix the warning and resubmit this
series with that fixed.
[PATCH] rtlwifi: Add updates for RTL8723BE and RTL8821AE
The new versions will only work with new versions of the drivers. For
that reason, they are given new names and the old versions are retained.

Signed-off-by: Larry Finger
---
 WHENCE                     |   4 ++++
 rtlwifi/rtl8723befw_36.bin | Bin 0 -> 31762 bytes
 rtlwifi/rtl8821aefw_29.bin | Bin 0 -> 28348 bytes
 3 files changed, 4 insertions(+)
 create mode 100644 rtlwifi/rtl8723befw_36.bin
 create mode 100644 rtlwifi/rtl8821aefw_29.bin

diff --git a/WHENCE b/WHENCE
index 90d6e4d..c31fe15 100644
--- a/WHENCE
+++ b/WHENCE
@@ -2329,6 +2329,8 @@ Driver: rtl8723be - Realtek 802.11n WLAN driver for RTL8723BE
 
 Info: From Vendor's realtek/rtlwifi_linux_mac80211_0019.0320.2014V628 driver
 File: rtlwifi/rtl8723befw.bin
+Info: Update to version 36 - Sent by Realtek
+File: rtlwifi/rtl8723befw_36.bin
 
 Licence: Redistributable. See LICENCE.rtlwifi_firmware.txt for details.
 
@@ -2370,6 +2372,8 @@ Driver: rtl8821ae - Realtek 802.11n WLAN driver for RTL8821AE
 Info: From Vendor's realtek/rtlwifi_linux_mac80211_0019.0320.2014V628 driver
 File: rtlwifi/rtl8821aefw.bin
 File: rtlwifi/rtl8821aefw_wowlan.bin
+Info: Update to version 29 - Sent by Realtek
+File: rtlwifi/rtl8821aefw_29.bin
 
 Licence: Redistributable. See LICENCE.rtlwifi_firmware.txt for details.
 
diff --git a/rtlwifi/rtl8723befw_36.bin b/rtlwifi/rtl8723befw_36.bin
new file mode 100644
index ..1bb9b9c8cea95689a0d27f9d7c264ad63728dde1
GIT binary patch
literal 31762
[binary firmware payload omitted]