[PATCH 1/1] net: macb: ensure ordering write to re-enable RX smoothly

2016-11-27 Thread Zumeng Chen
When a hardware issue happened as described by inline comments, the register
write pattern looks like the following:

  
  + wmb();
  

A memory barrier may be needed between these two write operations, so add wmb()
to ensure a flip from 0 to 1 for the RE bit in NCR.
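
The ordering concern can be illustrated with a small userspace model. This is
only a sketch under assumptions: struct fake_mac, fake_writel() and
rx_restart() are hypothetical names, the RE bit position is assumed, and the
C11 fence merely stands in for the kernel's wmb(); it does not order real MMIO
accesses.

```c
#include <stdatomic.h>
#include <stdint.h>

#define NCR_RE (1u << 2) /* receive-enable bit; position assumed for illustration */

/* Hypothetical stand-in for a memory-mapped NCR register. */
struct fake_mac {
    volatile uint32_t ncr;
    uint32_t history[4];   /* values written, in program order */
    unsigned int nwrites;
};

/* Record each register write so the sequence can be inspected. */
static void fake_writel(struct fake_mac *mac, uint32_t val)
{
    mac->ncr = val;
    if (mac->nwrites < 4)
        mac->history[mac->nwrites++] = val;
}

/* Clear RE, then set it again, with a fence between the two stores. */
static void rx_restart(struct fake_mac *mac)
{
    uint32_t ctrl = mac->ncr;

    fake_writel(mac, ctrl & ~NCR_RE);
    /* Userspace stand-in for wmb(); an assumption, not the kernel primitive. */
    atomic_thread_fence(memory_order_seq_cst);
    fake_writel(mac, ctrl | NCR_RE);
}
```

The point of the model is only that the device must observe two distinct
writes, RE cleared and then RE set, in that order.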

Signed-off-by: Zumeng Chen 
---
 drivers/net/ethernet/cadence/macb.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
index 533653b..2f9c5b2 100644
--- a/drivers/net/ethernet/cadence/macb.c
+++ b/drivers/net/ethernet/cadence/macb.c
@@ -1156,6 +1156,7 @@ static irqreturn_t macb_interrupt(int irq, void *dev_id)
if (status & MACB_BIT(RXUBR)) {
ctrl = macb_readl(bp, NCR);
macb_writel(bp, NCR, ctrl & ~MACB_BIT(RE));
+   wmb();
macb_writel(bp, NCR, ctrl | MACB_BIT(RE));
 
if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
-- 
2.4.11



Re: [PATCH net] net/sched: act_pedit: limit negative offset

2016-11-27 Thread Amir Vadai
On Mon, Nov 28, 2016 at 12:49:36AM -0500, David Miller wrote:
> From: Cong Wang 
> Date: Sun, 27 Nov 2016 21:39:33 -0800
> 
> > On Sun, Nov 27, 2016 at 7:58 AM, Amir Vadai  wrote:
> >> Should not allow setting a negative offset that goes below the skb head.
> > ...
> >> diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c
> >> index b54d56d4959b..e79e8a88f2d2 100644
> >> --- a/net/sched/act_pedit.c
> >> +++ b/net/sched/act_pedit.c
> >> @@ -154,8 +154,11 @@ static int tcf_pedit(struct sk_buff *skb, const 
> >> struct tc_action *a,
> >> }
> >>
> >> ptr = skb_header_pointer(skb, off + offset, 4, 
> >> &_data);
> >> -   if (!ptr)
> >> +   if ((unsigned char *)ptr < skb->head) {
> > 
> > 
> > ptr returned could be &_data, which is on stack, so why this comparison
> > makes sense for this case?
> 
> Indeed, this will definitely do the wrong thing when the on-stack area
> passed back to ptr.
yes - my bad. will correct it and send v1
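
One way to avoid depending on where the returned pointer lives is to validate
the offset itself before asking for the pointer. The sketch below is a
userspace model under assumptions: struct pkt and pkt_offset_valid() are
hypothetical names, and this is not necessarily what the corrected patch will
do.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified packet: 'head' is the start of the allocation,
 * 'data' points at the current header inside it (data >= head). */
struct pkt {
    unsigned char *head;
    unsigned char *data;
    size_t len;          /* bytes available starting at data */
};

/* Reject offsets that fall before 'head' or beyond the packet length. */
static bool pkt_offset_valid(const struct pkt *p, long offset)
{
    size_t headroom = (size_t)(p->data - p->head);

    if (offset >= 0)
        return (size_t)offset <= p->len;
    /* -(offset + 1) cannot overflow, unlike -offset for LONG_MIN. */
    return (size_t)(-(offset + 1)) < headroom;
}
```

Because the check uses only the offset and the packet geometry, it gives the
same answer whether the data is later read in place or copied to an on-stack
buffer. The upper bound here is deliberately loose, so a NULL check on the
returned pointer is still needed.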


Re: [PATCH net-next v3 0/4] Documentation: net: phy: Improve documentation

2016-11-27 Thread Jerome Brunet
On Sun, 2016-11-27 at 18:45 -0800, Florian Fainelli wrote:
> Hi all,
> 
> This patch series addresses discussions and feedback that was
> recently received
> on the mailing-list in the area of: flow control/pause frames,
> interpretation of
> phy_interface_t and finally add some links to useful standards
> documents.
> 
> Changes in v3:
> 
> - add Timur's feedback into patch 3
> 
> Changes in v2:
> 
> - clarify a few things in the RGMII section, add a paragraph about
> common issues
>   with RGMII delay mismatches
> 

Thanks a lot Florian. This is really helping, especially the part about
RGMII delays.

Reviewed-by: Jerome Brunet 

> Florian Fainelli (4):
>   Documentation: net: phy: remove description of function pointers
>   Documentation: net: phy: Add a paragraph about pause frames/flow
> control
>   Documentation: net: phy: Add blurb about RGMII
>   Documentation: net: phy: Add links to several standards documents
> 
>  Documentation/networking/phy.txt | 140
> +--
>  1 file changed, 105 insertions(+), 35 deletions(-)
> 


[PATCH] net: arc_emac: add dependencies on associated arches and compile test

2016-11-27 Thread Peter Robinson
Add dependencies on the architectures that support these devices and
add compile test to ensure ongoing code build coverage.

Signed-off-by: Peter Robinson 
---
 drivers/net/ethernet/arc/Kconfig | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/arc/Kconfig b/drivers/net/ethernet/arc/Kconfig
index 6890451..e743ddf 100644
--- a/drivers/net/ethernet/arc/Kconfig
+++ b/drivers/net/ethernet/arc/Kconfig
@@ -17,13 +17,14 @@ if NET_VENDOR_ARC
 
 config ARC_EMAC_CORE
tristate
+   depends on ARC || ARCH_ROCKCHIP || COMPILE_TEST
select MII
select PHYLIB
 
 config ARC_EMAC
tristate "ARC EMAC support"
select ARC_EMAC_CORE
-   depends on OF_IRQ && OF_NET && HAS_DMA
+   depends on OF_IRQ && OF_NET && HAS_DMA && (ARC || COMPILE_TEST)
---help---
  On some legacy ARC (Synopsys) FPGA boards such as ARCAngel4/ML50x
  non-standard on-chip ethernet device ARC EMAC 10/100 is used.
@@ -32,7 +33,7 @@ config ARC_EMAC
 config EMAC_ROCKCHIP
tristate "Rockchip EMAC support"
select ARC_EMAC_CORE
-   depends on OF_IRQ && OF_NET && REGULATOR && HAS_DMA
+   depends on OF_IRQ && OF_NET && REGULATOR && HAS_DMA && (ARCH_ROCKCHIP || COMPILE_TEST)
---help---
  Support for Rockchip RK3036/RK3066/RK3188 EMAC ethernet controllers.
  This selects Rockchip SoC glue layer support for the
-- 
2.9.3



[PATCH net-next v2 6/6] tcp: SOF_TIMESTAMPING_OPT_STATS option for SO_TIMESTAMPING

2016-11-27 Thread Yuchung Cheng
From: Francis Yan 

This patch exports the sender chronograph stats via the socket
SO_TIMESTAMPING channel. Currently we can instrument how long a
particular application unit of data was queued in TCP by tracking
SOF_TIMESTAMPING_TX_SOFTWARE and SOF_TIMESTAMPING_TX_SCHED. Having
these sender chronograph stats exported simultaneously along with
these timestamps allows further breaking down the various sender
limitations.  For example, a video server can tell if a particular
chunk of video on a connection takes a long time to deliver because
TCP was experiencing a small receive window. It is not possible to
tell before this patch without packet traces.

To prepare these stats, the user needs to set
SOF_TIMESTAMPING_OPT_STATS and SOF_TIMESTAMPING_OPT_TSONLY flags
while requesting other SOF_TIMESTAMPING TX timestamps. When the
timestamps are available in the error queue, the stats are returned
in a separate control message of type SCM_TIMESTAMPING_OPT_STATS,
in a list of TLVs (struct nlattr) of types: TCP_NLA_BUSY_TIME,
TCP_NLA_RWND_LIMITED, TCP_NLA_SNDBUF_LIMITED. Unit is microsecond.
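
For illustration, walking such a TLV list in userspace could look like the
sketch below. It is written under assumptions: struct tlv_hdr mirrors the
layout of struct nlattr (16-bit length including the header, 16-bit type,
4-byte alignment), tlv_get_u64() is a hypothetical helper, and each stat is
assumed to carry a 64-bit payload. The numeric attribute types are left to the
caller because they are not given in this description.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same layout as struct nlattr: length covers header plus payload. */
struct tlv_hdr {
    uint16_t len;
    uint16_t type;
};

/* Attributes are padded to a 4-byte boundary. */
static size_t tlv_align(size_t n)
{
    return (n + 3) & ~(size_t)3;
}

/* Find attribute 'type' in buf and copy its 64-bit payload to *out.
 * Returns 0 on success, -1 if the attribute is missing or malformed. */
static int tlv_get_u64(const unsigned char *buf, size_t buflen,
                       uint16_t type, uint64_t *out)
{
    size_t pos = 0;

    while (buflen - pos >= sizeof(struct tlv_hdr)) {
        struct tlv_hdr h;
        size_t step;

        memcpy(&h, buf + pos, sizeof(h));
        if (h.len < sizeof(h) || h.len > buflen - pos)
            return -1;              /* truncated or corrupt attribute */
        if (h.type == type) {
            if (h.len - sizeof(h) != sizeof(*out))
                return -1;          /* payload is not 64 bits wide */
            memcpy(out, buf + pos + sizeof(h), sizeof(*out));
            return 0;
        }
        step = tlv_align(h.len);
        if (step > buflen - pos)
            break;                  /* last attribute, no trailing padding */
        pos += step;
    }
    return -1;
}
```

Unknown attribute types are skipped rather than rejected, so a parser written
this way keeps working if more stats are added to the list later.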

Signed-off-by: Francis Yan 
Signed-off-by: Yuchung Cheng 
Signed-off-by: Soheil Hassas Yeganeh 
---
ChangeLog since v1:
 - fix build break if CONFIG_INET is not defined

 Documentation/networking/timestamping.txt | 10 ++
 arch/alpha/include/uapi/asm/socket.h  |  2 ++
 arch/frv/include/uapi/asm/socket.h|  2 ++
 arch/ia64/include/uapi/asm/socket.h   |  2 ++
 arch/m32r/include/uapi/asm/socket.h   |  2 ++
 arch/mips/include/uapi/asm/socket.h   |  2 ++
 arch/mn10300/include/uapi/asm/socket.h|  2 ++
 arch/parisc/include/uapi/asm/socket.h |  2 ++
 arch/powerpc/include/uapi/asm/socket.h|  2 ++
 arch/s390/include/uapi/asm/socket.h   |  2 ++
 arch/sparc/include/uapi/asm/socket.h  |  2 ++
 arch/xtensa/include/uapi/asm/socket.h |  2 ++
 include/linux/tcp.h   |  2 ++
 include/uapi/asm-generic/socket.h |  2 ++
 include/uapi/linux/net_tstamp.h   |  3 ++-
 include/uapi/linux/tcp.h  |  8 
 net/core/skbuff.c | 14 +++---
 net/core/sock.c   |  7 +++
 net/ipv4/tcp.c| 20 
 net/socket.c  |  7 ++-
 20 files changed, 90 insertions(+), 5 deletions(-)

diff --git a/Documentation/networking/timestamping.txt 
b/Documentation/networking/timestamping.txt
index 671cccf..96f5069 100644
--- a/Documentation/networking/timestamping.txt
+++ b/Documentation/networking/timestamping.txt
@@ -182,6 +182,16 @@ SOF_TIMESTAMPING_OPT_TSONLY:
   the timestamp even if sysctl net.core.tstamp_allow_data is 0.
   This option disables SOF_TIMESTAMPING_OPT_CMSG.
 
+SOF_TIMESTAMPING_OPT_STATS:
+
+  Optional stats that are obtained along with the transmit timestamps.
+  It must be used together with SOF_TIMESTAMPING_OPT_TSONLY. When the
+  transmit timestamp is available, the stats are available in a
+  separate control message of type SCM_TIMESTAMPING_OPT_STATS, as a
+  list of TLVs (struct nlattr) of types. These stats allow the
+  application to associate various transport layer stats with
+  the transmit timestamps, such as how long a certain block of
+  data was limited by peer's receiver window.
 
 New applications are encouraged to pass SOF_TIMESTAMPING_OPT_ID to
 disambiguate timestamps and SOF_TIMESTAMPING_OPT_TSONLY to operate
diff --git a/arch/alpha/include/uapi/asm/socket.h 
b/arch/alpha/include/uapi/asm/socket.h
index 9e46d6e..afc901b 100644
--- a/arch/alpha/include/uapi/asm/socket.h
+++ b/arch/alpha/include/uapi/asm/socket.h
@@ -97,4 +97,6 @@
 
 #define SO_CNX_ADVICE  53
 
+#define SCM_TIMESTAMPING_OPT_STATS 54
+
 #endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/frv/include/uapi/asm/socket.h 
b/arch/frv/include/uapi/asm/socket.h
index afbc98f0..81e0353 100644
--- a/arch/frv/include/uapi/asm/socket.h
+++ b/arch/frv/include/uapi/asm/socket.h
@@ -90,5 +90,7 @@
 
 #define SO_CNX_ADVICE  53
 
+#define SCM_TIMESTAMPING_OPT_STATS 54
+
 #endif /* _ASM_SOCKET_H */
 
diff --git a/arch/ia64/include/uapi/asm/socket.h 
b/arch/ia64/include/uapi/asm/socket.h
index 0018fad..57feb0c 100644
--- a/arch/ia64/include/uapi/asm/socket.h
+++ b/arch/ia64/include/uapi/asm/socket.h
@@ -99,4 +99,6 @@
 
 #define SO_CNX_ADVICE  53
 
+#define SCM_TIMESTAMPING_OPT_STATS 54
+
 #endif /* _ASM_IA64_SOCKET_H */
diff --git a/arch/m32r/include/uapi/asm/socket.h 
b/arch/m32r/include/uapi/asm/socket.h
index 5fe42fc..5853f8e9 100644
--- a/arch/m32r/include/uapi/asm/socket.h
+++ b/arch/m32r/include/uapi/asm/socket.h
@@ -90,4 +90,6 @@
 
 #define SO_CNX_ADVICE  53
 
+#define SCM_TIMESTAMPING_OPT_STATS 54
+
 #endif /* _ASM_M32R_SOCKET_H */
diff --git a/arch/mips/include/uapi/asm/socket.h 
b/arch/mips/include/uapi/asm/socket.h
index 

[PATCH net-next v2 4/6] tcp: instrument how long TCP is limited by insufficient send buffer

2016-11-27 Thread Yuchung Cheng
From: Francis Yan 

This patch measures the amount of time when TCP runs out of new data
to send to the network due to insufficient send buffer, while TCP
is still busy delivering (i.e. write queue is not empty). The goal
is to indicate whether the send buffer autotuning or the user's SO_SNDBUF
setting has resulted in network under-utilization.

The measurement starts conservatively by checking various conditions
to minimize false claims (i.e. under-estimation is more likely).
The measurement stops when the SOCK_NOSPACE flag is cleared. But it
does not account for the time elapsed until the next application write.
Also the measurement only starts if the sender is still busy sending
data, s.t. the limit accounted is part of the total busy time.

Signed-off-by: Francis Yan 
Signed-off-by: Yuchung Cheng 
Signed-off-by: Soheil Hassas Yeganeh 
---
 net/ipv4/tcp.c| 10 --
 net/ipv4/tcp_input.c  |  5 -
 net/ipv4/tcp_output.c | 12 
 3 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 913f9bb..259ffb5 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -996,8 +996,11 @@ static ssize_t do_tcp_sendpages(struct sock *sk, struct 
page *page, int offset,
goto out;
 out_err:
/* make sure we wake any epoll edge trigger waiter */
-   if (unlikely(skb_queue_len(&sk->sk_write_queue) == 0 && err == -EAGAIN))
+   if (unlikely(skb_queue_len(&sk->sk_write_queue) == 0 &&
+err == -EAGAIN)) {
sk->sk_write_space(sk);
+   tcp_chrono_stop(sk, TCP_CHRONO_SNDBUF_LIMITED);
+   }
return sk_stream_error(sk, flags, err);
 }
 
@@ -1331,8 +1334,11 @@ int tcp_sendmsg(struct sock *sk, struct msghdr *msg, 
size_t size)
 out_err:
err = sk_stream_error(sk, flags, err);
/* make sure we wake any epoll edge trigger waiter */
-   if (unlikely(skb_queue_len(&sk->sk_write_queue) == 0 && err == -EAGAIN))
+   if (unlikely(skb_queue_len(&sk->sk_write_queue) == 0 &&
+err == -EAGAIN)) {
sk->sk_write_space(sk);
+   tcp_chrono_stop(sk, TCP_CHRONO_SNDBUF_LIMITED);
+   }
release_sock(sk);
return err;
 }
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index a5d1727..56fe736 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -5059,8 +5059,11 @@ static void tcp_check_space(struct sock *sk)
/* pairs with tcp_poll() */
smp_mb__after_atomic();
if (sk->sk_socket &&
-   test_bit(SOCK_NOSPACE, &sk->sk_socket->flags))
+   test_bit(SOCK_NOSPACE, &sk->sk_socket->flags)) {
tcp_new_space(sk);
+   if (!test_bit(SOCK_NOSPACE, &sk->sk_socket->flags))
+   tcp_chrono_stop(sk, TCP_CHRONO_SNDBUF_LIMITED);
+   }
}
 }
 
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index b7c..d3545d0 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1514,6 +1514,18 @@ static void tcp_cwnd_validate(struct sock *sk, bool 
is_cwnd_limited)
if (sysctl_tcp_slow_start_after_idle &&
(s32)(tcp_time_stamp - tp->snd_cwnd_stamp) >= 
inet_csk(sk)->icsk_rto)
tcp_cwnd_application_limited(sk);
+
+   /* The following conditions together indicate the starvation
+* is caused by insufficient sender buffer:
+* 1) just sent some data (see tcp_write_xmit)
+* 2) not cwnd limited (this else condition)
+* 3) no more data to send (null tcp_send_head )
+* 4) application is hitting buffer limit (SOCK_NOSPACE)
+*/
+   if (!tcp_send_head(sk) && sk->sk_socket &&
+   test_bit(SOCK_NOSPACE, &sk->sk_socket->flags) &&
+   (1 << sk->sk_state) & (TCPF_ESTABLISHED | TCPF_CLOSE_WAIT))
+   tcp_chrono_start(sk, TCP_CHRONO_SNDBUF_LIMITED);
}
 }
 
-- 
2.8.0.rc3.226.g39d4020



[PATCH net-next v2 5/6] tcp: export sender limits chronographs to TCP_INFO

2016-11-27 Thread Yuchung Cheng
From: Francis Yan 

This patch exports all the sender chronograph measurements collected
in the previous patches to TCP_INFO interface. Note that busy time
exported includes all the other sending limits (rwnd-limited,
sndbuf-limited). Internally the time unit is jiffy but externally
the measurements are in microseconds for future extensions.
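
The export step can be modelled in userspace roughly as follows. This is a
sketch, not the kernel code: the enum, struct chrono_state, struct
chrono_report and chrono_export() are hypothetical names, the tick rate is a
parameter instead of HZ, and the conversion uses integer division, so it
assumes the rate divides one million evenly.

```c
#include <stdint.h>

enum chrono {
    CHRONO_UNSPEC,
    CHRONO_BUSY,
    CHRONO_RWND_LIMITED,
    CHRONO_SNDBUF_LIMITED,
    CHRONO_MAX,
};

struct chrono_state {
    uint32_t stat[3];   /* accumulated jiffies: busy, rwnd, sndbuf */
    uint32_t start;     /* jiffies when the current chrono began */
    enum chrono type;   /* chrono currently running */
};

struct chrono_report {
    uint64_t busy_us;
    uint64_t rwnd_limited_us;
    uint64_t sndbuf_limited_us;
};

/* Convert per-chrono jiffies to microseconds; busy time is the sum of all. */
static struct chrono_report chrono_export(const struct chrono_state *s,
                                          uint32_t now, uint32_t hz)
{
    uint64_t us[CHRONO_MAX] = { 0 }, total = 0;

    for (int i = CHRONO_BUSY; i < CHRONO_MAX; i++) {
        us[i] = s->stat[i - 1];
        if ((enum chrono)i == s->type)
            us[i] += (uint32_t)(now - s->start); /* still-running interval */
        us[i] *= 1000000u / hz;
        total += us[i];
    }
    return (struct chrono_report){ total, us[CHRONO_RWND_LIMITED],
                                   us[CHRONO_SNDBUF_LIMITED] };
}
```

Note how the reported busy time is the total over all three chronographs,
which matches the statement that it includes the other sending limits.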

Signed-off-by: Francis Yan 
Signed-off-by: Yuchung Cheng 
Signed-off-by: Soheil Hassas Yeganeh 
---
 include/uapi/linux/tcp.h |  4 
 net/ipv4/tcp.c   | 20 
 2 files changed, 24 insertions(+)

diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
index 73ac0db..2863b66 100644
--- a/include/uapi/linux/tcp.h
+++ b/include/uapi/linux/tcp.h
@@ -214,6 +214,10 @@ struct tcp_info {
__u32   tcpi_data_segs_out; /* RFC4898 tcpEStatsDataSegsOut */
 
__u64   tcpi_delivery_rate;
+
+   __u64   tcpi_busy_time;  /* Time (usec) busy sending data */
+   __u64   tcpi_rwnd_limited;   /* Time (usec) limited by receive window */
+   __u64   tcpi_sndbuf_limited; /* Time (usec) limited by send buffer */
 };
 
 /* for TCP_MD5SIG socket option */
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 259ffb5..cdde20f 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2708,6 +2708,25 @@ int compat_tcp_setsockopt(struct sock *sk, int level, 
int optname,
 EXPORT_SYMBOL(compat_tcp_setsockopt);
 #endif
 
+static void tcp_get_info_chrono_stats(const struct tcp_sock *tp,
+ struct tcp_info *info)
+{
+   u64 stats[__TCP_CHRONO_MAX], total = 0;
+   enum tcp_chrono i;
+
+   for (i = TCP_CHRONO_BUSY; i < __TCP_CHRONO_MAX; ++i) {
+   stats[i] = tp->chrono_stat[i - 1];
+   if (i == tp->chrono_type)
+   stats[i] += tcp_time_stamp - tp->chrono_start;
+   stats[i] *= USEC_PER_SEC / HZ;
+   total += stats[i];
+   }
+
+   info->tcpi_busy_time = total;
+   info->tcpi_rwnd_limited = stats[TCP_CHRONO_RWND_LIMITED];
+   info->tcpi_sndbuf_limited = stats[TCP_CHRONO_SNDBUF_LIMITED];
+}
+
 /* Return information about state of tcp endpoint in API format. */
 void tcp_get_info(struct sock *sk, struct tcp_info *info)
 {
@@ -2800,6 +2819,7 @@ void tcp_get_info(struct sock *sk, struct tcp_info *info)
info->tcpi_bytes_acked = tp->bytes_acked;
info->tcpi_bytes_received = tp->bytes_received;
info->tcpi_notsent_bytes = max_t(int, 0, tp->write_seq - tp->snd_nxt);
+   tcp_get_info_chrono_stats(tp, info);
 
unlock_sock_fast(sk, slow);
 
-- 
2.8.0.rc3.226.g39d4020



[PATCH net-next v2 3/6] tcp: instrument how long TCP is limited by receive window

2016-11-27 Thread Yuchung Cheng
From: Francis Yan 

This patch measures the total time when the TCP stops sending because
the receiver's advertised window is not large enough. Note that
once the limit is lifted we are likely in the busy status if we
have data pending.

Signed-off-by: Francis Yan 
Signed-off-by: Yuchung Cheng 
Signed-off-by: Soheil Hassas Yeganeh 
---
 net/ipv4/tcp_output.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index e8ea584..b7c 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2144,7 +2144,7 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int 
mss_now, int nonagle,
unsigned int tso_segs, sent_pkts;
int cwnd_quota;
int result;
-   bool is_cwnd_limited = false;
+   bool is_cwnd_limited = false, is_rwnd_limited = false;
u32 max_segs;
 
sent_pkts = 0;
@@ -2181,8 +2181,10 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int 
mss_now, int nonagle,
break;
}
 
-   if (unlikely(!tcp_snd_wnd_test(tp, skb, mss_now)))
+   if (unlikely(!tcp_snd_wnd_test(tp, skb, mss_now))) {
+   is_rwnd_limited = true;
break;
+   }
 
if (tso_segs == 1) {
if (unlikely(!tcp_nagle_test(tp, skb, mss_now,
@@ -2227,6 +2229,11 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int 
mss_now, int nonagle,
break;
}
 
+   if (is_rwnd_limited)
+   tcp_chrono_start(sk, TCP_CHRONO_RWND_LIMITED);
+   else
+   tcp_chrono_stop(sk, TCP_CHRONO_RWND_LIMITED);
+
if (likely(sent_pkts)) {
if (tcp_in_cwnd_reduction(sk))
tp->prr_out += sent_pkts;
-- 
2.8.0.rc3.226.g39d4020



[PATCH net-next v2 1/6] tcp: instrument tcp sender limits chronographs

2016-11-27 Thread Yuchung Cheng
From: Francis Yan 

This patch implements the skeleton of the TCP chronograph
instrumentation on sender side limits:

1) idle (unspec)
2) busy sending data other than 3-4 below
3) rwnd-limited
4) sndbuf-limited

The limits are enumerated in 'tcp_chrono'. Since a connection in
theory can idle forever, we do not track the actual length of this
uninteresting idle period. For the rest we track how long the sender
spends in each limit. At any point during the life time of a
connection, the sender must be in one of the four states.

If there are multiple conditions worthy of tracking in a chronograph
then the highest priority enum takes precedence over
the other conditions. So that if something "more interesting"
starts happening, stop the previous chrono and start a new one.
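
The precedence rule can be modelled in userspace as below. This is a sketch
under assumptions: the names are hypothetical, time is passed in explicitly
instead of being read from a jiffies counter, and stopping unconditionally
returns to the idle state, as in this first patch of the series.

```c
#include <stdint.h>

enum chrono {
    CHRONO_UNSPEC,          /* idle, not timed */
    CHRONO_BUSY,
    CHRONO_RWND_LIMITED,
    CHRONO_SNDBUF_LIMITED,
};

struct chrono_state {
    uint32_t stat[3];       /* accumulated ticks per non-idle chrono */
    uint32_t start;         /* tick at which the current chrono began */
    enum chrono type;       /* chrono currently running */
};

/* Close the running interval, if any, and switch to 'next'. */
static void chrono_set(struct chrono_state *s, enum chrono next, uint32_t now)
{
    if (s->type > CHRONO_UNSPEC)
        s->stat[s->type - 1] += now - s->start;
    s->start = now;
    s->type = next;
}

/* Only a higher-priority condition may displace the running chrono. */
static void chrono_start(struct chrono_state *s, enum chrono type, uint32_t now)
{
    if (type > s->type)
        chrono_set(s, type, now);
}

/* Stopping returns to the idle state. */
static void chrono_stop(struct chrono_state *s, uint32_t now)
{
    chrono_set(s, CHRONO_UNSPEC, now);
}
```

Because the enum values double as priorities, a request to start a
lower-priority chrono while a higher one is running is simply ignored.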

The time unit is the jiffy (u32) in order to save space in tcp_sock.
This implies that an application must sample the stats at least once
every 49 days with a 1 ms jiffy.
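
The 49-day figure follows from the counter width: a u32 wraps after 2^32
jiffies, which at 1000 jiffies per second is about 4.29 million seconds. A
small helper makes the arithmetic explicit; jiffy_wrap_days() is a
hypothetical name used only for this illustration.

```c
#include <stdint.h>

/* Days until a 32-bit tick counter wraps at 'hz' ticks per second. */
static double jiffy_wrap_days(unsigned int hz)
{
    const double ticks = (double)UINT32_MAX + 1.0;  /* 2^32 */

    return ticks / hz / 86400.0;                    /* 86400 s per day */
}
```

With a lower tick rate the wrap period grows proportionally, so 1000 ticks per
second is the tightest common case.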

Signed-off-by: Francis Yan 
Signed-off-by: Yuchung Cheng 
Signed-off-by: Soheil Hassas Yeganeh 
---
 include/linux/tcp.h   |  7 +--
 include/net/tcp.h | 14 ++
 net/ipv4/tcp_output.c | 30 ++
 3 files changed, 49 insertions(+), 2 deletions(-)

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 32a7c7e..d5d3bd8 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -211,8 +211,11 @@ struct tcp_sock {
u8 reord;/* reordering detected */
} rack;
u16 advmss; /* Advertised MSS   */
-   u8  rate_app_limited:1,  /* rate_{delivered,interval_us} limited? */
-   unused:7;
+   u32 chrono_start;   /* Start time in jiffies of a TCP chrono */
+   u32 chrono_stat[3]; /* Time in jiffies for chrono_stat stats */
+   u8  chrono_type:2,  /* current chronograph type */
+   rate_app_limited:1,  /* rate_{delivered,interval_us} limited? */
+   unused:5;
u8  nonagle : 4,/* Disable Nagle algorithm? */
thin_lto: 1,/* Use linear timeouts for thin streams */
thin_dupack : 1,/* Fast retransmit on first dupack  */
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 7de8073..e5ff408 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1516,6 +1516,20 @@ struct tcp_fastopen_context {
struct rcu_head rcu;
 };
 
+/* Latencies incurred by various limits for a sender. They are
+ * chronograph-like stats that are mutually exclusive.
+ */
+enum tcp_chrono {
+   TCP_CHRONO_UNSPEC,
+   TCP_CHRONO_BUSY, /* Actively sending data (non-empty write queue) */
+   TCP_CHRONO_RWND_LIMITED, /* Stalled by insufficient receive window */
+   TCP_CHRONO_SNDBUF_LIMITED, /* Stalled by insufficient send buffer */
+   __TCP_CHRONO_MAX,
+};
+
+void tcp_chrono_start(struct sock *sk, const enum tcp_chrono type);
+void tcp_chrono_stop(struct sock *sk, const enum tcp_chrono type);
+
 /* write queue abstraction */
 static inline void tcp_write_queue_purge(struct sock *sk)
 {
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 19105b4..34f7517 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2081,6 +2081,36 @@ static bool tcp_small_queue_check(struct sock *sk, const 
struct sk_buff *skb,
return false;
 }
 
+static void tcp_chrono_set(struct tcp_sock *tp, const enum tcp_chrono new)
+{
+   const u32 now = tcp_time_stamp;
+
+   if (tp->chrono_type > TCP_CHRONO_UNSPEC)
+   tp->chrono_stat[tp->chrono_type - 1] += now - tp->chrono_start;
+   tp->chrono_start = now;
+   tp->chrono_type = new;
+}
+
+void tcp_chrono_start(struct sock *sk, const enum tcp_chrono type)
+{
+   struct tcp_sock *tp = tcp_sk(sk);
+
+   /* If there are multiple conditions worthy of tracking in a
+* chronograph then the highest priority enum takes precedence over
+* the other conditions. So that if something "more interesting"
+* starts happening, stop the previous chrono and start a new one.
+*/
+   if (type > tp->chrono_type)
+   tcp_chrono_set(tp, type);
+}
+
+void tcp_chrono_stop(struct sock *sk, const enum tcp_chrono type)
+{
+   struct tcp_sock *tp = tcp_sk(sk);
+
+   tcp_chrono_set(tp, TCP_CHRONO_UNSPEC);
+}
+
 /* This routine writes packets to the network.  It advances the
  * send_head.  This happens as incoming acks open up the remote
  * window for us.
-- 
2.8.0.rc3.226.g39d4020



[PATCH net-next v2 2/6] tcp: instrument how long TCP is busy sending

2016-11-27 Thread Yuchung Cheng
From: Francis Yan 

This patch measures TCP busy time, which is defined as the period
of time when sender has data (or FIN) to send. The time starts when
data is buffered and stops when the write queue is flushed by ACKs
or error events.

Note the busy time does not include SYN time, unless data is
included in SYN (i.e. Fast Open). It does include FIN time even
if the FIN carries no payload. Excluding pure FIN is possible but
would incur one additional test in the fast path, which may not
be worth it.
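
The decision taken when a condition stops can be written as a pure function.
This is a sketch with hypothetical names that leaves out the time accounting
and only shows which state follows, based on the logic this patch adds to
tcp_chrono_stop().

```c
#include <stdbool.h>

enum chrono {
    CHRONO_UNSPEC,
    CHRONO_BUSY,
    CHRONO_RWND_LIMITED,
    CHRONO_SNDBUF_LIMITED,
};

/* State that follows when condition 'stopping' ends while 'current' runs. */
static enum chrono chrono_after_stop(enum chrono current, enum chrono stopping,
                                     bool write_queue_empty)
{
    if (write_queue_empty)
        return CHRONO_UNSPEC;   /* nothing left to send: back to idle */
    if (stopping == current)
        return CHRONO_BUSY;     /* limit lifted but data is still pending */
    return current;             /* a different chrono is running: keep it */
}
```

The third branch is what keeps a higher-priority chrono running when a
lower-priority condition ends underneath it.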

Signed-off-by: Francis Yan 
Signed-off-by: Yuchung Cheng 
Signed-off-by: Soheil Hassas Yeganeh 
---
 include/net/tcp.h |  6 +-
 net/ipv4/tcp_input.c  |  3 +++
 net/ipv4/tcp_output.c | 19 ---
 3 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index e5ff408..3e097e3 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1535,6 +1535,7 @@ static inline void tcp_write_queue_purge(struct sock *sk)
 {
struct sk_buff *skb;
 
+   tcp_chrono_stop(sk, TCP_CHRONO_BUSY);
while ((skb = __skb_dequeue(&sk->sk_write_queue)) != NULL)
sk_wmem_free_skb(sk, skb);
sk_mem_reclaim(sk);
@@ -1593,8 +1594,10 @@ static inline void tcp_advance_send_head(struct sock 
*sk, const struct sk_buff *
 
 static inline void tcp_check_send_head(struct sock *sk, struct sk_buff 
*skb_unlinked)
 {
-   if (sk->sk_send_head == skb_unlinked)
+   if (sk->sk_send_head == skb_unlinked) {
sk->sk_send_head = NULL;
+   tcp_chrono_stop(sk, TCP_CHRONO_BUSY);
+   }
if (tcp_sk(sk)->highest_sack == skb_unlinked)
tcp_sk(sk)->highest_sack = NULL;
 }
@@ -1616,6 +1619,7 @@ static inline void tcp_add_write_queue_tail(struct sock 
*sk, struct sk_buff *skb
/* Queue it, remembering where we must start sending. */
if (sk->sk_send_head == NULL) {
sk->sk_send_head = skb;
+   tcp_chrono_start(sk, TCP_CHRONO_BUSY);
 
if (tcp_sk(sk)->highest_sack == NULL)
tcp_sk(sk)->highest_sack = skb;
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 22e6a20..a5d1727 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3178,6 +3178,9 @@ static int tcp_clean_rtx_queue(struct sock *sk, int 
prior_fackets,
tp->lost_skb_hint = NULL;
}
 
+   if (!skb)
+   tcp_chrono_stop(sk, TCP_CHRONO_BUSY);
+
if (likely(between(tp->snd_up, prior_snd_una, tp->snd_una)))
tp->snd_up = tp->snd_una;
 
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 34f7517..e8ea584 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2096,8 +2096,8 @@ void tcp_chrono_start(struct sock *sk, const enum 
tcp_chrono type)
struct tcp_sock *tp = tcp_sk(sk);
 
/* If there are multiple conditions worthy of tracking in a
-* chronograph then the highest priority enum takes precedence over
-* the other conditions. So that if something "more interesting"
+* chronograph then the highest priority enum takes precedence
+* over the other conditions. So that if something "more interesting"
 * starts happening, stop the previous chrono and start a new one.
 */
if (type > tp->chrono_type)
@@ -2108,7 +2108,18 @@ void tcp_chrono_stop(struct sock *sk, const enum 
tcp_chrono type)
 {
struct tcp_sock *tp = tcp_sk(sk);
 
-   tcp_chrono_set(tp, TCP_CHRONO_UNSPEC);
+
+   /* There are multiple conditions worthy of tracking in a
+* chronograph, so that the highest priority enum takes
+* precedence over the other conditions (see tcp_chrono_start).
+* If a condition stops, we only stop chrono tracking if
+* it's the "most interesting" or current chrono we are
+* tracking and starts busy chrono if we have pending data.
+*/
+   if (tcp_write_queue_empty(sk))
+   tcp_chrono_set(tp, TCP_CHRONO_UNSPEC);
+   else if (type == tp->chrono_type)
+   tcp_chrono_set(tp, TCP_CHRONO_BUSY);
 }
 
 /* This routine writes packets to the network.  It advances the
@@ -3328,6 +3339,8 @@ static int tcp_send_syn_data(struct sock *sk, struct 
sk_buff *syn)
fo->copied = space;
 
tcp_connect_queue_skb(sk, syn_data);
+   if (syn_data->len)
+   tcp_chrono_start(sk, TCP_CHRONO_BUSY);
 
err = tcp_transmit_skb(sk, syn_data, 1, sk->sk_allocation);
 
-- 
2.8.0.rc3.226.g39d4020



[PATCH net-next v2 0/6] tcp: sender chronographs instrumentation

2016-11-27 Thread Yuchung Cheng
This patch set provides instrumentation on TCP sender limitations.
While developing the BBR congestion control, we noticed that TCP
sending process is often limited by factors unrelated to congestion
control: insufficient sender buffer and/or insufficient receive
window/buffer to saturate the network bandwidth. Unfortunately these
limits are not visible to the users and often the poor performance
is attributed to the congestion control of choice.

This patch set aims to help users get a high-level understanding of
where the sending process is limited, similar to the TCP_INFO design.
It is not meant to replace detailed kernel tracing and instrumentation
facilities.

In addition, this patch set provides a new option to the timestamping
work to instrument these limits per application data unit. For example,
one can use SO_TIMESTAMPING and this patch set to measure how
long a particular HTTP response is limited by a small receive window.

Patch set was initially written by Francis Yan then polished
by Yuchung Cheng, with lots of help from Eric Dumazet and Soheil
Hassas Yeganeh.

Francis Yan (6):
  tcp: instrument tcp sender limits chronographs
  tcp: instrument how long TCP is busy sending
  tcp: instrument how long TCP is limited by receive window
  tcp: instrument how long TCP is limited by insufficient send buffer
  tcp: export sender limits chronographs to TCP_INFO
  tcp: SOF_TIMESTAMPING_OPT_STATS option for SO_TIMESTAMPING

 Documentation/networking/timestamping.txt | 10 +
 arch/alpha/include/uapi/asm/socket.h  |  2 +
 arch/frv/include/uapi/asm/socket.h|  2 +
 arch/ia64/include/uapi/asm/socket.h   |  2 +
 arch/m32r/include/uapi/asm/socket.h   |  2 +
 arch/mips/include/uapi/asm/socket.h   |  2 +
 arch/mn10300/include/uapi/asm/socket.h|  2 +
 arch/parisc/include/uapi/asm/socket.h |  2 +
 arch/powerpc/include/uapi/asm/socket.h|  2 +
 arch/s390/include/uapi/asm/socket.h   |  2 +
 arch/sparc/include/uapi/asm/socket.h  |  2 +
 arch/xtensa/include/uapi/asm/socket.h |  2 +
 include/linux/tcp.h   |  9 -
 include/net/tcp.h | 20 +-
 include/uapi/asm-generic/socket.h |  2 +
 include/uapi/linux/net_tstamp.h   |  3 +-
 include/uapi/linux/tcp.h  | 12 ++
 net/core/skbuff.c | 14 +--
 net/core/sock.c   |  7 
 net/ipv4/tcp.c| 50 ++-
 net/ipv4/tcp_input.c  |  8 +++-
 net/ipv4/tcp_output.c | 66 ++-
 net/socket.c  |  7 +++-
 23 files changed, 217 insertions(+), 13 deletions(-)

-- 
2.8.0.rc3.226.g39d4020



Re: [PATCH] net: fec: turn on device when extracting statistics

2016-11-27 Thread Nikita Yushchenko


28.11.2016 04:29, David Miller wrote:
> From: Nikita Yushchenko 
> Date: Fri, 25 Nov 2016 13:02:00 +0300
> 
>> +int i, ret;
>> +
>> +ret = pm_runtime_get_sync(>pdev->dev);
>> +if (IS_ERR_VALUE(ret)) {
>> +memset(data, 0, sizeof(*data) * ARRAY_SIZE(fec_stats));
>> +return;
>> +}
> 
> This really isn't the way to do this.
> 
> When the device is suspended and the clocks are going to be stopped,
> you must fetch the statistic values into a software copy and provide
> those if the device is suspended when statistics are requested.

Ok, can do that, although can't see what's wrong with waking device
here. The situation of requesting stats on down device isn't something
widely used, thus keeping handling of that as local as possible looks
better for me.
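
For reference, the software-copy approach suggested above could be modelled
like this. It is a userspace sketch only: struct dev_model and the dev_*()
helpers are hypothetical names and are not taken from the fec driver.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NSTATS 4

struct dev_model {
    bool suspended;
    uint64_t hw[NSTATS];      /* stands in for counters readable only while clocked */
    uint64_t cached[NSTATS];  /* software copy taken at suspend time */
};

/* Snapshot the counters before the clocks stop. */
static void dev_suspend(struct dev_model *d)
{
    memcpy(d->cached, d->hw, sizeof(d->cached));
    d->suspended = true;
}

static void dev_resume(struct dev_model *d)
{
    d->suspended = false;
}

/* Serve the cached copy while suspended, live counters otherwise. */
static void dev_get_stats(const struct dev_model *d, uint64_t *out)
{
    memcpy(out, d->suspended ? d->cached : d->hw, sizeof(d->cached));
}
```

The statistics path then never has to wake the device, at the cost of one
extra copy on each suspend.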


[PATCH] vxlan: fix a potential issue when create a new vxlan fdb entry.

2016-11-27 Thread Haishuang Yan
vxlan_fdb_append may return an error, so add the proper check;
otherwise it will cause a memory leak.
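
The pattern being fixed is "allocate, then call something that can fail". A
minimal userspace model, with hypothetical names (struct entry,
entry_append(), entry_create(), live_entries) and a flag to force the failure,
might look like this:

```c
#include <errno.h>
#include <stdlib.h>

struct entry {
    int nremotes;
};

static int live_entries;    /* allocations not yet released by entry_create() */

/* Stand-in for an append step that may fail. */
static int entry_append(struct entry *e, int should_fail)
{
    if (should_fail)
        return -ENOBUFS;
    e->nremotes++;
    return 0;
}

/* Create an entry; on append failure, release it instead of leaking it. */
static int entry_create(struct entry **out, int fail_append)
{
    struct entry *e = calloc(1, sizeof(*e));
    int rc;

    if (!e)
        return -ENOMEM;
    live_entries++;

    rc = entry_append(e, fail_append);
    if (rc < 0) {
        free(e);
        live_entries--;
        return rc;
    }

    *out = e;
    return 0;
}
```

Without the error branch, the failing path would return with the entry still
allocated and no pointer left to free it.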

Signed-off-by: Haishuang Yan 
---
 drivers/net/vxlan.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 21e92be..3b7b237 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -611,6 +611,7 @@ static int vxlan_fdb_create(struct vxlan_dev *vxlan,
struct vxlan_rdst *rd = NULL;
struct vxlan_fdb *f;
int notify = 0;
+   int rc = 0;
 
f = __vxlan_find_mac(vxlan, mac);
if (f) {
@@ -641,8 +642,7 @@ static int vxlan_fdb_create(struct vxlan_dev *vxlan,
if ((flags & NLM_F_APPEND) &&
(is_multicast_ether_addr(f->eth_addr) ||
 is_zero_ether_addr(f->eth_addr))) {
-   int rc = vxlan_fdb_append(f, ip, port, vni, ifindex,
- &rd);
+   rc = vxlan_fdb_append(f, ip, port, vni, ifindex, &rd);
 
if (rc < 0)
return rc;
@@ -673,7 +673,11 @@ static int vxlan_fdb_create(struct vxlan_dev *vxlan,
INIT_LIST_HEAD(&f->remotes);
memcpy(f->eth_addr, mac, ETH_ALEN);
 
-   vxlan_fdb_append(f, ip, port, vni, ifindex, &rd);
+   rc = vxlan_fdb_append(f, ip, port, vni, ifindex, &rd);
+   if (rc < 0) {
+   kfree(f);
+   return rc;
+   }
 
++vxlan->addrcnt;
hlist_add_head_rcu(&f->hlist,
-- 
1.8.3.1





Re: [PATCH net] net, sched: respect rcu grace period on cls destruction

2016-11-27 Thread Cong Wang
On Sat, Nov 26, 2016 at 4:18 PM, Daniel Borkmann  wrote:
> Roi reported a crash in flower where tp->root was NULL in ->classify()
> callbacks. Reason is that in ->destroy() tp->root is set to NULL via
> RCU_INIT_POINTER(). It's problematic for some of the classifiers, because
> this doesn't respect RCU grace period for them, and as a result, still
> outstanding readers from tc_classify() will try to blindly dereference
> a NULL tp->root.
>
> The tp->root object is strictly private to the classifier implementation
> and holds internal data the core such as tc_ctl_tfilter() doesn't know
> about. Within some classifiers, such as cls_bpf, cls_basic, etc, tp->root
> is only checked for NULL in ->get() callback, but nowhere else. This is
> misleading and seemed to be copied from old classifier code that was not
> cleaned up properly. For example, d3fa76ee6b4a ("[NET_SCHED]: cls_basic:
> fix NULL pointer dereference") moved tp->root initialization into ->init()
> routine, where before it was part of ->change(), so ->get() had to deal
> with tp->root being NULL back then, so that was indeed a valid case, after
> d3fa76ee6b4a, not really anymore. We used to set tp->root to NULL long
> ago in ->destroy(), see 47a1a1d4be29 ("pkt_sched: remove unnecessary xchg()
> in packet classifiers"); but the NULLifying was reintroduced with the
> RCUification, but it's not correct for every classifier implementation.
>
> In the cases that are fixed here with one exception of cls_cgroup, tp->root
> object is allocated and initialized inside ->init() callback, which is always
> performed at a point in time after we allocate a new tp, which means tp and
> thus tp->root was not globally visible in the tp chain yet (see
> tc_ctl_tfilter()).
> Also, on destruction tp->root is strictly kfree_rcu()'ed in ->destroy()
> handler, same for the tp which is kfree_rcu()'ed right when we return
> from ->destroy() in tcf_destroy(). This means, the head object's lifetime
> for such classifiers is always tied to the tp lifetime. The RCU callback
> invocation for the two kfree_rcu() could be out of order, but that's fine
> since both are independent.
>
> Dropping the RCU_INIT_POINTER(tp->root, NULL) for these classifiers here
> means that 1) we don't need a useless NULL check in fast-path and, 2) that
> outstanding readers of that tp in tc_classify() can still execute under
> respect with RCU grace period as it is actually expected.
>
> Things that haven't been touched here: cls_fw and cls_route. They each
> handle tp->root being NULL in ->classify() path for historic reasons, so
> their ->destroy() implementation can stay as is. If someone actually
> cares, they could get cleaned up at some point to avoid the test in fast
> path. cls_u32 doesn't set tp->root to NULL. For cls_rsvp, I just added a
> !head should anyone actually be using/testing it, so it at least aligns with
> cls_fw and cls_route. For cls_flower we additionally need to defer rhashtable
> destruction (to a sleepable context) after RCU grace period as concurrent
> readers might still access it. (Note that in this case we need to hold module
> reference to keep work callback address intact, since we only wait on module
> unload for all call_rcu()s to finish.)
>
> This fixes one race to bring RCU grace period guarantees back. Next step
> as worked on by Cong however is to fix 1e052be69d04 ("net_sched: destroy
> proto tp when all filters are gone") to get the order of unlinking the tp
> in tc_ctl_tfilter() for the RTM_DELTFILTER case right by moving
> RCU_INIT_POINTER() before tcf_destroy() and let the notification for
> removal be done through the prior ->delete() callback. Both are independent
> issues. Once we have that right, we can then clean tp->root up for a number
> of classifiers by not making them RCU pointers, which requires a new callback
> (->uninit) that is triggered from tp's RCU callback, where we just kfree()
> tp->root from there.

Looks good to my eyes,

Acked-by: Cong Wang 

The ugly part is the work struct. I am not an RCU expert, so I don't know
whether we have any API to execute an RCU callback in process context. Paul?

Thanks.



Re: Crash due to mutex genl_lock called from RCU context

2016-11-27 Thread Cong Wang
On Sun, Nov 27, 2016 at 8:23 AM, Eric Dumazet  wrote:
> On Sat, 2016-11-26 at 22:28 -0800, Cong Wang wrote:
>> On Sat, Nov 26, 2016 at 6:26 PM, Eric Dumazet  wrote:
>> >
>> > Are you telling me inet_release() is called when we close() the first
>> > file descriptor ?
>> >
>> > fd1 = socket()
>> > fd2 = dup(fd1);
>> > close(fd2) -> release() ???
>>
>> Sorry, I didn't express myself clearly, I meant your change,
>> if exclude the SOCK_RCU_FREE part, basically reverts this commit:
>>
>> commit 3f660d66dfbc13ea4b61d3865851b348444c24b4
>> Author: Herbert Xu 
>> Date:   Thu May 3 03:17:14 2007 -0700
>>
>> [NETLINK]: Kill CB only when socket is unused
>>
>> IOW, ->release() is called when the last sock fd ref is gone, but
>> ->destructor() is called when the last sock ref is gone. They are very
>> different.
>
> Hmm...
>
>
>> I am confused, what Subash reported is a kernel warning which can
>> surely be fixed by removing genl lock (if it is correct, I need to double
>> check), so why for net-next?
>
> Because Subash pointed to a buggy commit.
>
> We want to fix all issues brought by this commit, not only the immediate
> problem about mutex.
>
> I have no idea if we can safely remove the mutex from genl_lock_done() :

I meant removing it only for the destructor case; we definitely can't remove
it for the dump case.

>
> The genl_lock() is not only protecting the socket itself, it might
> protect global data as well, or protect some kind of lock ordering among
> multiple mutexes.
>
> Have you checked all genl users, down to linux-4.0 , point where commit
> 21e4902aea80ef35a was added ?
>

I just took a deeper look: some users call rhashtable_destroy() in ->done(),
so even removing that genl lock is not enough. Perhaps we should just
move it to a work struct like what Daniel does for the tcf_proto, but that is
ugly... I don't know if RCU provides any API to execute the callback in process
context.
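
The fd1/fd2 case quoted above can be checked from user space. The following
is only a minimal sketch, assuming a POSIX system; dup_keeps_socket_alive()
is a hypothetical helper name. It shows that closing one of two duplicated
descriptors does not tear down the socket; it says nothing about when the
kernel runs ->destructor().

```c
#include <sys/socket.h>
#include <unistd.h>

/* Returns 1 if data still flows through the original descriptor after its
 * duplicate has been closed, 0 if it does not, and -1 on setup failure.
 */
int dup_keeps_socket_alive(void)
{
	int sv[2];
	int fd2;
	int ok;
	char c = 0;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;

	fd2 = dup(sv[0]);
	if (fd2 < 0) {
		close(sv[0]);
		close(sv[1]);
		return -1;
	}

	/* Drops one descriptor only; sv[0] still refers to the same socket. */
	close(fd2);

	ok = write(sv[0], "x", 1) == 1 &&
	     read(sv[1], &c, 1) == 1 && c == 'x';

	close(sv[0]);
	close(sv[1]);
	return ok ? 1 : 0;
}
```

If close(fd2) released the socket, the write() on sv[0] would fail and the
helper would return 0.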


[patch net] net: dsa: fix unbalanced dsa_switch_tree reference counting

2016-11-27 Thread Nikita Yushchenko
_dsa_register_switch() gets a dsa_switch_tree object either via
dsa_get_dst() or via dsa_add_dst(). The former path does not increase the
kref in the returned object (so the caller does not own a reference),
while the latter path creates a new object (so the caller does own
a reference).

The rest of _dsa_register_switch() assumes that it owns a reference, and
calls dsa_put_dst().

This causes a use-after-free if the first switch in the tree initialized
successfully, but the second failed to initialize. In particular, the freed
dsa_switch_tree object is left referenced by the switch that was initialized,
and later access to the sysfs attributes of that switch causes an OOPS.

To fix this, add a kref_get() call to dsa_get_dst().

Signed-off-by: Nikita Yushchenko 
Fixes: 83c0afaec7b7 ("net: dsa: Add new binding implementation")
Reviewed-by: Andrew Lunn 
---
 net/dsa/dsa2.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
index f8a7d9aab437..5fff951a0a49 100644
--- a/net/dsa/dsa2.c
+++ b/net/dsa/dsa2.c
@@ -28,8 +28,10 @@ static struct dsa_switch_tree *dsa_get_dst(u32 tree)
 	struct dsa_switch_tree *dst;
 
 	list_for_each_entry(dst, &dsa_switch_trees, list)
-		if (dst->tree == tree)
+		if (dst->tree == tree) {
+			kref_get(&dst->refcount);
 			return dst;
+		}
 	return NULL;
 }
 
-- 
2.1.4
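
The ownership rule behind this fix can be illustrated outside the kernel.
This is a minimal user-space sketch, not the DSA code: struct tree,
tree_lookup() and tree_put() are hypothetical names, a plain int stands in
for struct kref, and a flag is set instead of freeing so the result can be
observed.

```c
#include <stddef.h>

struct tree {
	unsigned int id;
	int refcount;	/* stand-in for struct kref */
	int freed;	/* set instead of calling free(), for observability */
};

/* Lookup that hands the caller its own reference, as the patch does. */
struct tree *tree_lookup(struct tree *trees, size_t n, unsigned int id)
{
	for (size_t i = 0; i < n; i++) {
		if (trees[i].id == id && !trees[i].freed) {
			trees[i].refcount++;
			return &trees[i];
		}
	}
	return NULL;
}

/* Drops one reference; the object is released when the last one goes. */
void tree_put(struct tree *t)
{
	if (--t->refcount == 0)
		t->freed = 1;
}
```

With the increment in tree_lookup(), a caller that fails and calls
tree_put() only drops its own reference, so the reference held on behalf of
the first switch keeps the object alive.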



RE: [net,v2] neigh: fix the loop index error in neigh dump

2016-11-27 Thread 张胜举
> -Original Message-
> From: David Ahern [mailto:d...@cumulusnetworks.com]
> Sent: Monday, November 28, 2016 1:07 PM
> To: 张胜举 ;
> netdev@vger.kernel.org
> Subject: Re: [net,v2] neigh: fix the loop index error in neigh dump
> 
> On 11/27/16 9:50 PM, 张胜举 wrote:
> > No, when dump request must be processed by multiple 'recv/recvmsg'
> > system calls, idx stores which dev/neigh the previous call have
> > processed, so that next call will scan from the right place.
> 
> I have tested multiple calls and I do not see redundant information or
> missing information.
>
> >
> > So no matter whether the dev/neigh is filtered, the idx should be
> > increased anyway.
>
> No, it does not. Again, idx is the index in the list of devices/ of
> interest. It is NOT a device index nor is it the absolute index in the
> list. It is a relative index. The filter is the same across recvmsg calls
> so the idx count is absolutely fine.
> 
> Produce a test case that fails.
David, I understand your point, and I agree that this will not produce
redundant or missing link information.

But it does cause the filtered-out devices to be scanned multiple times.

For example, assume that a netlink message can only store info for two
devices, and that eth2-eth5 are filtered out.

After the first loop, idx will point to eth2, but the code has already
scanned up to eth6.
eth0->eth1->eth2(out)->eth3(out)->eth4(out)->eth5(out)->eth6->eth7
            ^
In the next loop, the code will scan from eth2 to eth8, although eth2-eth5
were already scanned by the previous loop. After this loop, idx will point
to eth4.
eth0->eth1->eth2(out)->eth3(out)->eth4(out)->eth5(out)->eth6->eth7->eth8
                                  ^
So the same devices are scanned multiple times.

Almost all other dump functions treat idx as the absolute index in the list
and do not have the above problem.

We don't treat this as a bugfix, but I think we had better stay in line with
the other dump functions.
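
The example above can be simulated in user space. This is only a sketch
under stated assumptions, not the neigh dump code: it assumes nine devices
(eth0..eth8), a message capacity of two entries, and that every step walks
the list from its head. dump_step(), dump_all() and struct dump_state are
hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

#define NDEV	9	/* eth0..eth8, as in the example above */
#define MSG_CAP	2	/* assumed: one message holds two entries */

/* eth2..eth5 are filtered out */
static const bool filtered_out[NDEV] = {
	false, false, true, true, true, true, false, false, false
};

struct dump_state {
	size_t s_idx;		/* resume point saved between calls */
	size_t filtered_visits;	/* times a filtered-out device was examined */
	bool done;
};

/* One recvmsg-sized step. With absolute == false, idx counts only devices
 * that pass the filter; with absolute == true, idx is the list position.
 * Writes emitted device numbers to out[] and returns how many were written.
 */
static size_t dump_step(struct dump_state *st, bool absolute, int *out)
{
	size_t idx = 0, emitted = 0;

	for (size_t i = 0; i < NDEV; i++) {
		/* absolute: skip dumped positions before checking the filter */
		if (absolute && idx < st->s_idx) {
			idx++;
			continue;
		}
		if (filtered_out[i]) {
			st->filtered_visits++;
			if (absolute)
				idx++;
			continue;
		}
		/* relative: skip devices of interest already dumped */
		if (!absolute && idx < st->s_idx) {
			idx++;
			continue;
		}
		if (emitted == MSG_CAP) {
			st->s_idx = idx;	/* message full, resume here */
			return emitted;
		}
		out[emitted++] = (int)i;
		idx++;
	}
	st->s_idx = idx;
	st->done = true;
	return emitted;
}

/* Runs steps until the list is exhausted. all_out[] needs room for NDEV
 * entries. Returns the number of entries written.
 */
size_t dump_all(bool absolute, int *all_out, size_t *filtered_visits)
{
	struct dump_state st = { 0, 0, false };
	size_t total = 0;

	while (!st.done)
		total += dump_step(&st, absolute, all_out + total);

	*filtered_visits = st.filtered_visits;
	return total;
}
```

Under these assumptions both variants report eth0, eth1, eth6, eth7 and eth8
exactly once, which matches David's observation. The relative variant
examines the filtered-out devices 12 times in total, the absolute one 4
times.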





Re: [PATCH] geneve: fix ip_hdr_len reserved for geneve6 tunnel.

2016-11-27 Thread Pravin Shelar
On Sun, Nov 27, 2016 at 9:26 PM, Haishuang Yan
 wrote:
> It should reserve sizeof(struct ipv6hdr) for geneve in an IPv6 tunnel.
>
> Fixes: c3ef5aa5e5 ('geneve: Merge ipv4 and ipv6 geneve_build_skb()')
>
> Signed-off-by: Haishuang Yan 

Thanks for the fix.

Acked-by: Pravin B Shelar 


Re: [PATCH net] net/sched: act_pedit: limit negative offset

2016-11-27 Thread David Miller
From: Cong Wang 
Date: Sun, 27 Nov 2016 21:39:33 -0800

> On Sun, Nov 27, 2016 at 7:58 AM, Amir Vadai  wrote:
>> Should not allow setting a negative offset that goes below the skb head.
> ...
>> diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c
>> index b54d56d4959b..e79e8a88f2d2 100644
>> --- a/net/sched/act_pedit.c
>> +++ b/net/sched/act_pedit.c
>> @@ -154,8 +154,11 @@ static int tcf_pedit(struct sk_buff *skb, const struct tc_action *a,
>> }
>>
>> ptr = skb_header_pointer(skb, off + offset, 4, &_data);
>> -   if (!ptr)
>> +   if ((unsigned char *)ptr < skb->head) {
> 
> 
> The ptr returned could be &_data, which is on the stack, so why does this
> comparison make sense in that case?

Indeed, this will definitely do the wrong thing when the on-stack area is
passed back in ptr.


Re: [PATCH net] net/sched: act_pedit: limit negative offset

2016-11-27 Thread Cong Wang
On Sun, Nov 27, 2016 at 7:58 AM, Amir Vadai  wrote:
> Should not allow setting a negative offset that goes below the skb head.
...
> diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c
> index b54d56d4959b..e79e8a88f2d2 100644
> --- a/net/sched/act_pedit.c
> +++ b/net/sched/act_pedit.c
> @@ -154,8 +154,11 @@ static int tcf_pedit(struct sk_buff *skb, const struct tc_action *a,
> }
>
> ptr = skb_header_pointer(skb, off + offset, 4, &_data);
> -   if (!ptr)
> +   if ((unsigned char *)ptr < skb->head) {


The ptr returned could be &_data, which is on the stack, so why does this
comparison make sense in that case?


> +   pr_info("tc filter pedit offset out of bounds\n");
> goto bad;
> +   }
> +
> /* just do it, baby */
> *ptr = ((*ptr & tkey->mask) ^ tkey->val);
> if (ptr == &_data)
> --
> 2.10.2
>


[PATCH] geneve: fix ip_hdr_len reserved for geneve6 tunnel.

2016-11-27 Thread Haishuang Yan
It should reserve sizeof(struct ipv6hdr) for geneve in an IPv6 tunnel.

Fixes: c3ef5aa5e5 ('geneve: Merge ipv4 and ipv6 geneve_build_skb()')

Signed-off-by: Haishuang Yan 
---
 drivers/net/geneve.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index 7b80e28..45301cb 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -852,7 +852,7 @@ static int geneve6_xmit_skb(struct sk_buff *skb, struct net_device *dev,
   ip_hdr(skb), skb);
ttl = key->ttl ? : ip6_dst_hoplimit(dst);
}
-   err = geneve_build_skb(dst, skb, info, xnet, sizeof(struct iphdr));
+   err = geneve_build_skb(dst, skb, info, xnet, sizeof(struct ipv6hdr));
if (unlikely(err))
return err;
 
-- 
1.8.3.1
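
For reference, the size difference between the two outer headers can be
checked from user space. This is a minimal sketch, assuming a Linux system
with the libc <netinet/ip.h> and <netinet/ip6.h> headers; struct ip6_hdr is
used here as the user-space counterpart of the kernel's struct ipv6hdr, and
geneve6_headroom_shortfall() is a hypothetical helper, not part of the
driver.

```c
#include <netinet/ip.h>		/* struct iphdr */
#include <netinet/ip6.h>	/* struct ip6_hdr */
#include <stddef.h>

/* Bytes left unaccounted for when the IPv4 header size is passed for an
 * IPv6 outer header.
 */
size_t geneve6_headroom_shortfall(void)
{
	return sizeof(struct ip6_hdr) - sizeof(struct iphdr);
}
```

The fixed IPv6 header is 40 bytes and the option-less IPv4 header is 20
bytes, so the helper returns 20.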





Re: [net,v2] neigh: fix the loop index error in neigh dump

2016-11-27 Thread David Ahern
On 11/27/16 9:50 PM, 张胜举 wrote:
> No, when dump request must be processed by multiple 'recv/recvmsg' system
> calls, idx stores which dev/neigh the previous call have processed, so that
> next call will scan from the right place.

I have tested multiple calls and I do not see redundant information or missing 
information. 

> 
> So no matter whether the dev/neigh is filtered, the idx should be increased
> anyway.

No, it does not. Again, idx is the index in the list of devices/ of interest. 
It is NOT a device index nor is it the absolute index in the list. It is a 
relative index. The filter is the same across recvmsg calls so the idx count is 
absolutely fine.

Produce a test case that fails.



Re: [PATCH net-next 2/9] liquidio CN23XX: VF registration

2016-11-27 Thread David Miller
From: Raghu Vatsavayi 
Date: Sun, 27 Nov 2016 20:51:35 -0800

> +static int
> +liquidio_vf_probe(struct pci_dev *pdev,
> +   const struct pci_device_id *ent __attribute__((unused)))
> +{
> + struct octeon_device *oct_dev = NULL;
 ...
> + /* set linux specific device pointer */
> + oct_dev->pci_dev = (void *)pdev;
> +

This is a terrible cast on several levels.  The type is already
correct: oct_dev->pci_dev and pdev are both "struct pci_dev *".

Furthermore, even if oct_dev->pci_dev was "void *", void pointer
casts are _never_ necessary on assignment from any other pointer
type.
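
Both points can be shown with a few lines of standalone C. This is a
minimal sketch with hypothetical stand-in types (pci_dev_stub, oct_typed,
oct_opaque), not the liquidio structures:

```c
#include <stddef.h>

struct pci_dev_stub { int devfn; };	/* stand-in for struct pci_dev */

struct oct_typed  { struct pci_dev_stub *pci_dev; };	/* correctly typed field */
struct oct_opaque { void *pci_dev; };			/* field declared void * */

/* Returns 1 if every pointer survives the assignments unchanged. */
int store_without_cast(struct pci_dev_stub *pdev)
{
	struct oct_typed typed;
	struct oct_opaque opaque;
	struct pci_dev_stub *back;

	typed.pci_dev = pdev;	/* same type on both sides */
	opaque.pci_dev = pdev;	/* object pointer converts to void * implicitly */
	back = opaque.pci_dev;	/* and void * converts back implicitly */

	return typed.pci_dev == pdev && back == pdev;
}
```

None of the three assignments needs a cast in C, and a compiler accepts
them without a diagnostic.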


Re: [PATCH v3 net-next 1/2] net: ethernet: slicoss: add slicoss gigabit ethernet driver

2016-11-27 Thread Florian Fainelli
On 11/26/2016 04:20 AM, Lino Sanfilippo wrote:
> Add driver for Alacritech gigabit ethernet cards with SLIC (session-layer
> interface control) technology. The driver provides basic support without
> SLIC for the following devices:
> 
> - Mojave cards (single port PCI Gigabit) both copper and fiber
> - Oasis cards (single and dual port PCI-x Gigabit) copper and fiber
> - Kalahari cards (dual and quad port PCI-e Gigabit) copper and fiber

This looks great, a few nits below:


> +#define SLIC_MAX_TX_COMPLETIONS  100

You usually don't want to limit the number of TX completions; if the
entire TX ring needs to be cleaned, you would want to allow that.

[snip]

> + while (slic_get_free_rx_descs(rxq) > SLIC_MAX_REQ_RX_DESCS) {
> + skb = alloc_skb(maplen + ALIGN_MASK, gfp);
> + if (!skb)
> + break;
> +
> + paddr = dma_map_single(&sdev->pdev->dev, skb->data, maplen,
> +DMA_FROM_DEVICE);
> + if (dma_mapping_error(&sdev->pdev->dev, paddr)) {
> + netdev_err(dev, "mapping rx packet failed\n");
> + /* drop skb */
> + dev_kfree_skb_any(skb);
> + break;
> + }
> + /* ensure head buffer descriptors are 256 byte aligned */
> + offset = 0;
> + misalign = paddr & ALIGN_MASK;
> + if (misalign) {
> + offset = SLIC_RX_BUFF_ALIGN - misalign;
> + skb_reserve(skb, offset);
> + }
> + /* the HW expects dma chunks for descriptor + frame data */
> + desc = (struct slic_rx_desc *)skb->data;
> + memset(desc, 0, sizeof(*desc));

Do you really need to zero out the prepended RX descriptor? Aren't you
missing a write barrier here?

[snip]

> +
> + dma_sync_single_for_cpu(&sdev->pdev->dev,
> + dma_unmap_addr(buff, map_addr),
> + buff->addr_offset + sizeof(*desc),
> + DMA_FROM_DEVICE);
> +
> + status = le32_to_cpu(desc->status);
> + if (!(status & SLIC_IRHDDR_SVALID))
> + break;
> +
> + buff->skb = NULL;
> +
> + dma_unmap_single(&sdev->pdev->dev,
> +  dma_unmap_addr(buff, map_addr),
> +  dma_unmap_len(buff, map_len),
> +  DMA_FROM_DEVICE);

This is potentially inefficient, you already did a cache invalidation
for the RX descriptor here, you could be more efficient with just
invalidating the packet length, minus the descriptor length.

> +
> + /* skip rx descriptor that is placed before the frame data */
> + skb_reserve(skb, SLIC_RX_BUFF_HDR_SIZE);
> +
> + if (unlikely(status & SLIC_IRHDDR_ERR)) {
> + slic_handle_frame_error(sdev, skb);
> + dev_kfree_skb_any(skb);
> + } else {
> + struct ethhdr *eh = (struct ethhdr *)skb->data;
> +
> + if (is_multicast_ether_addr(eh->h_dest))
> + SLIC_INC_STATS_COUNTER(&sdev->stats, rx_mcasts);
> +
> + len = le32_to_cpu(desc->length) & SLIC_IRHDDR_FLEN_MSK;
> + skb_put(skb, len);
> + skb->protocol = eth_type_trans(skb, dev);
> + skb->ip_summed = CHECKSUM_UNNECESSARY;
> + skb->dev = dev;

eth_type_trans() already assigns skb->dev = dev;

> +static int slic_poll(struct napi_struct *napi, int todo)
> +{
> + struct slic_device *sdev = container_of(napi, struct slic_device, napi);
> + struct slic_shmem *sm = &sdev->shmem;
> + struct slic_shmem_data *sm_data = sm->shmem_data;
> + u32 isr = le32_to_cpu(sm_data->isr);
> + unsigned int done = 0;
> +
> + slic_handle_irq(sdev, isr, todo, &done);
> +
> + if (done < todo) {
> + napi_complete(napi);

napi_complete_done() since you know how many packets you completed.

> + /* reenable irqs */
> + sm_data->isr = 0;
> + /* make sure sm_data->isr is cleared before irqs are reenabled */
> + wmb();
> + slic_write(sdev, SLIC_REG_ISR, 0);
> + slic_flush_write(sdev);
> + }
> +
> + return done;
> +}
> +
> +static irqreturn_t slic_irq(int irq, void *dev_id)
> +{
> + struct slic_device *sdev = dev_id;
> + struct slic_shmem *sm = &sdev->shmem;
> + struct slic_shmem_data *sm_data = sm->shmem_data;
> +
> + slic_write(sdev, SLIC_REG_ICR, SLIC_ICR_INT_MASK);
> + slic_flush_write(sdev);
> + /* make sure sm_data->isr is read after ICR_INT_MASK is set */
> + wmb();
> +
> + if (!sm_data->isr) {
> + dma_rmb();
> + /* spurious interrupt */
> + slic_write(sdev, SLIC_REG_ISR, 0);
> + 

Re: [PATCH net v2 0/5] net: fix phydev reference leaks

2016-11-27 Thread David Miller
From: Timur Tabi 
Date: Sun, 27 Nov 2016 20:11:17 -0600

> David Miller wrote:
>> Series applied, thanks.
> 
> I was really hoping you'd give me the chance to test the patches
> before applying them.

Sorry, if anything is broken I will happily revert if it isn't
fixed promptly.


[PATCH net-next 4/9] liquidio CN23XX: VF queue setup

2016-11-27 Thread Raghu Vatsavayi
Adds support for configuring VF input/output queues.

Signed-off-by: Raghu Vatsavayi 
Signed-off-by: Derek Chickles 
Signed-off-by: Satanand Burla 
Signed-off-by: Felix Manlunas 
---
 .../ethernet/cavium/liquidio/cn23xx_vf_device.c| 144 +
 .../ethernet/cavium/liquidio/cn23xx_vf_device.h|   2 +
 drivers/net/ethernet/cavium/liquidio/lio_main.c|   6 +-
 drivers/net/ethernet/cavium/liquidio/lio_vf_main.c |   5 +
 .../net/ethernet/cavium/liquidio/octeon_device.c   |  43 +-
 .../net/ethernet/cavium/liquidio/octeon_device.h   |   7 +-
 .../net/ethernet/cavium/liquidio/request_manager.c |   4 +-
 7 files changed, 207 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c 
b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c
index d683bda..60fd138 100644
--- a/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c
+++ b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c
@@ -25,13 +25,134 @@
 #include "cn23xx_vf_device.h"
 #include "octeon_main.h"
 
+static int cn23xx_vf_reset_io_queues(struct octeon_device *oct, u32 num_queues)
+{
+   u32 loop = BUSY_READING_REG_VF_LOOP_COUNT;
+   int ret_val = 0;
+   u32 q_no;
+   u64 d64;
+
+   for (q_no = 0; q_no < num_queues; q_no++) {
+   /* set RST bit to 1. This bit applies to both IQ and OQ */
+   d64 = octeon_read_csr64(oct,
+   CN23XX_VF_SLI_IQ_PKT_CONTROL64(q_no));
+   d64 |= CN23XX_PKT_INPUT_CTL_RST;
+   octeon_write_csr64(oct, CN23XX_VF_SLI_IQ_PKT_CONTROL64(q_no),
+  d64);
+   }
+
+   /* wait until the RST bit is clear or the RST and QUIET bits are set */
+   for (q_no = 0; q_no < num_queues; q_no++) {
+   u64 reg_val = octeon_read_csr64(oct,
+   CN23XX_VF_SLI_IQ_PKT_CONTROL64(q_no));
+   while ((READ_ONCE(reg_val) & CN23XX_PKT_INPUT_CTL_RST) &&
+  !(READ_ONCE(reg_val) & CN23XX_PKT_INPUT_CTL_QUIET) &&
+  loop) {
+   WRITE_ONCE(reg_val, octeon_read_csr64(
+   oct, CN23XX_VF_SLI_IQ_PKT_CONTROL64(q_no)));
+   loop--;
+   }
+   if (!loop) {
+   dev_err(&oct->pci_dev->dev,
+   "clearing the reset reg failed or setting the quiet reg failed for qno: %u\n",
+   q_no);
+   return -1;
+   }
+   WRITE_ONCE(reg_val, READ_ONCE(reg_val) &
+  ~CN23XX_PKT_INPUT_CTL_RST);
+   octeon_write_csr64(oct, CN23XX_VF_SLI_IQ_PKT_CONTROL64(q_no),
+  READ_ONCE(reg_val));
+
+   WRITE_ONCE(reg_val, octeon_read_csr64(
+   oct, CN23XX_VF_SLI_IQ_PKT_CONTROL64(q_no)));
+   if (READ_ONCE(reg_val) & CN23XX_PKT_INPUT_CTL_RST) {
+   dev_err(&oct->pci_dev->dev,
+   "clearing the reset failed for qno: %u\n",
+   q_no);
+   ret_val = -1;
+   }
+   }
+
+   return ret_val;
+}
+
+static int cn23xx_enable_vf_io_queues(struct octeon_device *oct)
+{
+   u32 q_no;
+
+   for (q_no = 0; q_no < oct->num_iqs; q_no++) {
+   u64 reg_val;
+
+   /* set the corresponding IQ IS_64B bit */
+   if (oct->io_qmask.iq64B & BIT_ULL(q_no)) {
+   reg_val = octeon_read_csr64(
+   oct, CN23XX_VF_SLI_IQ_PKT_CONTROL64(q_no));
+   reg_val |= CN23XX_PKT_INPUT_CTL_IS_64B;
+   octeon_write_csr64(
+   oct, CN23XX_VF_SLI_IQ_PKT_CONTROL64(q_no), reg_val);
+   }
+
+   /* set the corresponding IQ ENB bit */
+   if (oct->io_qmask.iq & BIT_ULL(q_no)) {
+   reg_val = octeon_read_csr64(
+   oct, CN23XX_VF_SLI_IQ_PKT_CONTROL64(q_no));
+   reg_val |= CN23XX_PKT_INPUT_CTL_RING_ENB;
+   octeon_write_csr64(
+   oct, CN23XX_VF_SLI_IQ_PKT_CONTROL64(q_no), reg_val);
+   }
+   }
+   for (q_no = 0; q_no < oct->num_oqs; q_no++) {
+   u32 reg_val;
+
+   /* set the corresponding OQ ENB bit */
+   if (oct->io_qmask.oq & BIT_ULL(q_no)) {
+   reg_val = octeon_read_csr(
+   oct, CN23XX_VF_SLI_OQ_PKT_CONTROL(q_no));
+   reg_val |= CN23XX_PKT_OUTPUT_CTL_RING_ENB;
+   octeon_write_csr(
+   oct, CN23XX_VF_SLI_OQ_PKT_CONTROL(q_no), 

[PATCH net-next 8/9] liquidio CN23XX: VF interrupt

2016-11-27 Thread Raghu Vatsavayi
Adds support for VF interrupt processing.

Signed-off-by: Raghu Vatsavayi 
Signed-off-by: Derek Chickles 
Signed-off-by: Satanand Burla 
Signed-off-by: Felix Manlunas 
---
 .../ethernet/cavium/liquidio/cn23xx_vf_device.c| 265 +
 .../ethernet/cavium/liquidio/cn23xx_vf_device.h|   6 +
 drivers/net/ethernet/cavium/liquidio/lio_core.c|   7 -
 drivers/net/ethernet/cavium/liquidio/lio_vf_main.c | 162 +
 .../net/ethernet/cavium/liquidio/octeon_device.c   |   3 +
 .../net/ethernet/cavium/liquidio/octeon_device.h   |   2 +
 6 files changed, 438 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c 
b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c
index e514797..9ded8fc 100644
--- a/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c
+++ b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c
@@ -27,6 +27,26 @@
 #include "octeon_main.h"
 #include "octeon_mailbox.h"
 
+u32 cn23xx_vf_get_oq_ticks(struct octeon_device *oct, u32 time_intr_in_us)
+{
+   /* This gives the SLI clock per microsec */
+   u32 oqticks_per_us = (u32)oct->pfvf_hsword.coproc_tics_per_us;
+
+   /* This gives the clock cycles per millisecond */
+   oqticks_per_us *= 1000;
+
+   /* This gives the oq ticks (1024 core clock cycles) per millisecond */
+   oqticks_per_us /= 1024;
+
+   /* time_intr is in microseconds. The next 2 steps give the oq ticks
+* corresponding to time_intr.
+*/
+   oqticks_per_us *= time_intr_in_us;
+   oqticks_per_us /= 1000;
+
+   return oqticks_per_us;
+}
+
 static int cn23xx_vf_reset_io_queues(struct octeon_device *oct, u32 num_queues)
 {
u32 loop = BUSY_READING_REG_VF_LOOP_COUNT;
@@ -212,6 +232,11 @@ static void cn23xx_setup_vf_iq_regs(struct octeon_device *oct, u32 iq_no)
 */
pkt_in_done = readq(iq->inst_cnt_reg);
 
+   if (oct->msix_on) {
+   /* Set CINT_ENB to enable IQ interrupt */
+   writeq((pkt_in_done | CN23XX_INTR_CINT_ENB),
+  iq->inst_cnt_reg);
+   }
iq->reset_instr_cnt = 0;
 }
 
@@ -342,6 +367,240 @@ static void cn23xx_disable_vf_io_queues(struct octeon_device *oct)
cn23xx_vf_reset_io_queues(oct, num_queues);
 }
 
+void cn23xx_vf_ask_pf_to_do_flr(struct octeon_device *oct)
+{
+   struct octeon_mbox_cmd mbox_cmd;
+
+   mbox_cmd.msg.u64 = 0;
+   mbox_cmd.msg.s.type = OCTEON_MBOX_REQUEST;
+   mbox_cmd.msg.s.resp_needed = 0;
+   mbox_cmd.msg.s.cmd = OCTEON_VF_FLR_REQUEST;
+   mbox_cmd.msg.s.len = 1;
+   mbox_cmd.q_no = 0;
+   mbox_cmd.recv_len = 0;
+   mbox_cmd.recv_status = 0;
+   mbox_cmd.fn = NULL;
+   mbox_cmd.fn_arg = 0;
+
+   octeon_mbox_write(oct, &mbox_cmd);
+}
+
+static void octeon_pfvf_hs_callback(struct octeon_device *oct,
+   struct octeon_mbox_cmd *cmd,
+   void *arg)
+{
+   u32 major = 0;
+
+   memcpy((uint8_t *)&oct->pfvf_hsword, cmd->msg.s.params,
+  CN23XX_MAILBOX_MSGPARAM_SIZE);
+   if (cmd->recv_len > 1)  {
+   major = ((struct lio_version *)(cmd->data))->major;
+   major = major << 16;
+   }
+
+   atomic_set((atomic_t *)arg, major | 1);
+}
+
+int cn23xx_octeon_pfvf_handshake(struct octeon_device *oct)
+{
+   struct octeon_mbox_cmd mbox_cmd;
+   u32 q_no, count = 0;
+   atomic_t status;
+   u32 pfmajor;
+   u32 vfmajor;
+   u32 ret;
+
+   /* Sending VF_ACTIVE indication to the PF driver */
+   dev_dbg(&oct->pci_dev->dev, "requesting info from pf\n");
+
+   mbox_cmd.msg.u64 = 0;
+   mbox_cmd.msg.s.type = OCTEON_MBOX_REQUEST;
+   mbox_cmd.msg.s.resp_needed = 1;
+   mbox_cmd.msg.s.cmd = OCTEON_VF_ACTIVE;
+   mbox_cmd.msg.s.len = 2;
+   mbox_cmd.data[0] = 0;
+   ((struct lio_version *)&mbox_cmd.data[0])->major =
+   LIQUIDIO_BASE_MAJOR_VERSION;
+   ((struct lio_version *)&mbox_cmd.data[0])->minor =
+   LIQUIDIO_BASE_MINOR_VERSION;
+   ((struct lio_version *)&mbox_cmd.data[0])->micro =
+   LIQUIDIO_BASE_MICRO_VERSION;
+   mbox_cmd.q_no = 0;
+   mbox_cmd.recv_len = 0;
+   mbox_cmd.recv_status = 0;
+   mbox_cmd.fn = (octeon_mbox_callback_t)octeon_pfvf_hs_callback;
+   mbox_cmd.fn_arg = (void *)&status;
+
+   /* Interrupts are not enabled at this point.
+* Enable them with default oq ticks
+*/
+   oct->fn_list.enable_interrupt(oct, OCTEON_ALL_INTR);
+
+   octeon_mbox_write(oct, &mbox_cmd);
+
+   atomic_set(&status, 0);
+
+   do {
+   schedule_timeout_uninterruptible(1);
+   } while ((!atomic_read(&status)) && (count++ < 10));
+
+

[PATCH net-next 5/9] liquidio CN23XX: VF register access

2016-11-27 Thread Raghu Vatsavayi
This patch adds support for VF device register access.

Signed-off-by: Raghu Vatsavayi 
Signed-off-by: Derek Chickles 
Signed-off-by: Satanand Burla 
Signed-off-by: Felix Manlunas 
---
 .../ethernet/cavium/liquidio/cn23xx_vf_device.c| 189 +
 .../ethernet/cavium/liquidio/cn23xx_vf_device.h|   2 +
 drivers/net/ethernet/cavium/liquidio/lio_vf_main.c |   5 +
 3 files changed, 196 insertions(+)

diff --git a/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c 
b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c
index 60fd138..ad4e442 100644
--- a/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c
+++ b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c
@@ -76,6 +76,161 @@ static int cn23xx_vf_reset_io_queues(struct octeon_device *oct, u32 num_queues)
return ret_val;
 }
 
+static int cn23xx_vf_setup_global_input_regs(struct octeon_device *oct)
+{
+   struct octeon_cn23xx_vf *cn23xx = (struct octeon_cn23xx_vf *)oct->chip;
+   struct octeon_instr_queue *iq;
+   u64 q_no, intr_threshold;
+   u64 d64;
+
+   if (cn23xx_vf_reset_io_queues(oct, oct->sriov_info.rings_per_vf))
+   return -1;
+
+   for (q_no = 0; q_no < (oct->sriov_info.rings_per_vf); q_no++) {
+   void __iomem *inst_cnt_reg;
+
+   octeon_write_csr64(oct, CN23XX_VF_SLI_IQ_DOORBELL(q_no),
+  0x);
+   iq = oct->instr_queue[q_no];
+
+   if (iq)
+   inst_cnt_reg = iq->inst_cnt_reg;
+   else
+   inst_cnt_reg = (u8 *)oct->mmio[0].hw_addr +
+  CN23XX_VF_SLI_IQ_INSTR_COUNT64(q_no);
+
+   d64 = octeon_read_csr64(oct,
+   CN23XX_VF_SLI_IQ_INSTR_COUNT64(q_no));
+
+   d64 &= 0xEFFFL;
+
+   octeon_write_csr64(oct, CN23XX_VF_SLI_IQ_INSTR_COUNT64(q_no),
+  d64);
+
+   /* Select ES, RO, NS, RDSIZE,DPTR Fomat#0 for
+* the Input Queues
+*/
+   octeon_write_csr64(oct, CN23XX_VF_SLI_IQ_PKT_CONTROL64(q_no),
+  CN23XX_PKT_INPUT_CTL_MASK);
+
+   /* set the wmark level to trigger PI_INT */
+   intr_threshold = CFG_GET_IQ_INTR_PKT(cn23xx->conf) &
+CN23XX_PKT_IN_DONE_WMARK_MASK;
+
+   writeq((readq(inst_cnt_reg) &
+   ~(CN23XX_PKT_IN_DONE_WMARK_MASK <<
+ CN23XX_PKT_IN_DONE_WMARK_BIT_POS)) |
+  (intr_threshold << CN23XX_PKT_IN_DONE_WMARK_BIT_POS),
+  inst_cnt_reg);
+   }
+   return 0;
+}
+
+static void cn23xx_vf_setup_global_output_regs(struct octeon_device *oct)
+{
+   u32 reg_val;
+   u32 q_no;
+
+   for (q_no = 0; q_no < (oct->sriov_info.rings_per_vf); q_no++) {
+   octeon_write_csr(oct, CN23XX_VF_SLI_OQ_PKTS_CREDIT(q_no),
+0x);
+
+   reg_val =
+   octeon_read_csr(oct, CN23XX_VF_SLI_OQ_PKTS_SENT(q_no));
+
+   reg_val &= 0xEFFFL;
+
+   reg_val =
+   octeon_read_csr(oct, CN23XX_VF_SLI_OQ_PKT_CONTROL(q_no));
+
+   /* set IPTR & DPTR */
+   reg_val |=
+   (CN23XX_PKT_OUTPUT_CTL_IPTR | CN23XX_PKT_OUTPUT_CTL_DPTR);
+
+   /* reset BMODE */
+   reg_val &= ~(CN23XX_PKT_OUTPUT_CTL_BMODE);
+
+   /* No Relaxed Ordering, No Snoop, 64-bit Byte swap
+* for Output Queue ScatterList reset ROR_P, NSR_P
+*/
+   reg_val &= ~(CN23XX_PKT_OUTPUT_CTL_ROR_P);
+   reg_val &= ~(CN23XX_PKT_OUTPUT_CTL_NSR_P);
+
+#ifdef __LITTLE_ENDIAN_BITFIELD
+   reg_val &= ~(CN23XX_PKT_OUTPUT_CTL_ES_P);
+#else
+   reg_val |= (CN23XX_PKT_OUTPUT_CTL_ES_P);
+#endif
+   /* No Relaxed Ordering, No Snoop, 64-bit Byte swap
+* for Output Queue Data reset ROR, NSR
+*/
+   reg_val &= ~(CN23XX_PKT_OUTPUT_CTL_ROR);
+   reg_val &= ~(CN23XX_PKT_OUTPUT_CTL_NSR);
+   /* set the ES bit */
+   reg_val |= (CN23XX_PKT_OUTPUT_CTL_ES);
+
+   /* write all the selected settings */
+   octeon_write_csr(oct, CN23XX_VF_SLI_OQ_PKT_CONTROL(q_no),
+reg_val);
+   }
+}
+
+static int cn23xx_setup_vf_device_regs(struct octeon_device *oct)
+{
+   if (cn23xx_vf_setup_global_input_regs(oct))
+   return -1;
+
+   cn23xx_vf_setup_global_output_regs(oct);
+
+   return 0;
+}
+
+static void cn23xx_setup_vf_iq_regs(struct octeon_device 

[PATCH net-next 6/9] liquidio CN23XX: init VF softcommand queues

2016-11-27 Thread Raghu Vatsavayi
Adds support for initializing softcommand, dispatch and
instructions queues for VF.

Signed-off-by: Raghu Vatsavayi 
Signed-off-by: Derek Chickles 
Signed-off-by: Satanand Burla 
Signed-off-by: Felix Manlunas 
---
 drivers/net/ethernet/cavium/liquidio/lio_vf_main.c | 74 +-
 .../net/ethernet/cavium/liquidio/octeon_device.c   |  5 ++
 .../net/ethernet/cavium/liquidio/request_manager.c |  7 ++
 3 files changed, 84 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c 
b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
index 43a5373..d02f1dd 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
@@ -140,18 +140,51 @@ static void octeon_pci_flr(struct octeon_device *oct)
  */
 static void octeon_destroy_resources(struct octeon_device *oct)
 {
+   int i;
+
switch (atomic_read(&oct->status)) {
+   case OCT_DEV_IN_RESET:
+   case OCT_DEV_DROQ_INIT_DONE:
+   mdelay(100);
+   for (i = 0; i < MAX_OCTEON_OUTPUT_QUEUES(oct); i++) {
+   if (!(oct->io_qmask.oq & BIT_ULL(i)))
+   continue;
+   octeon_delete_droq(oct, i);
+   }
+
+   /* fallthrough */
+   case OCT_DEV_RESP_LIST_INIT_DONE:
+   octeon_delete_response_list(oct);
+
+   /* fallthrough */
+   case OCT_DEV_INSTR_QUEUE_INIT_DONE:
+   for (i = 0; i < MAX_OCTEON_INSTR_QUEUES(oct); i++) {
+   if (!(oct->io_qmask.iq & BIT_ULL(i)))
+   continue;
+   octeon_delete_instr_queue(oct, i);
+   }
+
+   /* fallthrough */
+   case OCT_DEV_SC_BUFF_POOL_INIT_DONE:
+   octeon_free_sc_buffer_pool(oct);
+
+   /* fallthrough */
+   case OCT_DEV_DISPATCH_INIT_DONE:
+   octeon_delete_dispatch_list(oct);
+   cancel_delayed_work_sync(&oct->nic_poll_work.work);
+
+   /* fallthrough */
case OCT_DEV_PCI_MAP_DONE:
octeon_unmap_pci_barx(oct, 0);
octeon_unmap_pci_barx(oct, 1);
 
-   /* fallthrough */
+   /* fallthrough */
case OCT_DEV_PCI_ENABLE_DONE:
pci_clear_master(oct->pci_dev);
/* Disable the device, releasing the PCI INT */
pci_disable_device(oct->pci_dev);
 
-   /* fallthrough */
+   /* fallthrough */
case OCT_DEV_BEGIN_STATE:
/* Nothing to be done here either */
break;
@@ -236,6 +269,14 @@ static int octeon_device_init(struct octeon_device *oct)
 
atomic_set(&oct->status, OCT_DEV_PCI_MAP_DONE);
 
+   /* Initialize the dispatch mechanism used to push packets arriving on
+* Octeon Output queues.
+*/
+   if (octeon_init_dispatch_list(oct))
+   return 1;
+
+   atomic_set(&oct->status, OCT_DEV_DISPATCH_INIT_DONE);
+
if (octeon_set_io_queues_off(oct)) {
dev_err(&oct->pci_dev->dev, "setting io queues off failed\n");
return 1;
@@ -246,6 +287,35 @@ static int octeon_device_init(struct octeon_device *oct)
return 1;
}
 
+   /* Initialize soft command buffer pool */
+   if (octeon_setup_sc_buffer_pool(oct)) {
+   dev_err(&oct->pci_dev->dev, "sc buffer pool allocation failed\n");
+   return 1;
+   }
+   atomic_set(&oct->status, OCT_DEV_SC_BUFF_POOL_INIT_DONE);
+
+   /* Setup the data structures that manage this Octeon's Input queues. */
+   if (octeon_setup_instr_queues(oct)) {
+   dev_err(&oct->pci_dev->dev, "instruction queue initialization failed\n");
+   return 1;
+   }
+   atomic_set(&oct->status, OCT_DEV_INSTR_QUEUE_INIT_DONE);
+
+   /* Initialize lists to manage the requests of different types that
+* arrive from user & kernel applications for this octeon device.
+*/
+   if (octeon_setup_response_list(oct)) {
+   dev_err(&oct->pci_dev->dev, "Response list allocation failed\n");
+   return 1;
+   }
+   atomic_set(&oct->status, OCT_DEV_RESP_LIST_INIT_DONE);
+
+   if (octeon_setup_output_queues(oct)) {
+   dev_err(&oct->pci_dev->dev, "Output queue initialization failed\n");
+   return 1;
+   }
+   atomic_set(&oct->status, OCT_DEV_DROQ_INIT_DONE);
+
return 0;
 }
 
diff --git a/drivers/net/ethernet/cavium/liquidio/octeon_device.c 
b/drivers/net/ethernet/cavium/liquidio/octeon_device.c
index f2cfafd..8af08d4 100644
--- a/drivers/net/ethernet/cavium/liquidio/octeon_device.c
+++ b/drivers/net/ethernet/cavium/liquidio/octeon_device.c
@@ -797,6 +797,8 @@ int octeon_setup_instr_queues(struct octeon_device *oct)

[PATCH net-next 7/9] liquidio CN23XX: VF mailbox

2016-11-27 Thread Raghu Vatsavayi
Adds support for VF mailbox setup.

Signed-off-by: Raghu Vatsavayi 
Signed-off-by: Derek Chickles 
Signed-off-by: Satanand Burla 
Signed-off-by: Felix Manlunas 
---
 .../ethernet/cavium/liquidio/cn23xx_vf_device.c| 59 ++
 drivers/net/ethernet/cavium/liquidio/lio_vf_main.c | 10 
 2 files changed, 69 insertions(+)

diff --git a/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c 
b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c
index ad4e442..e514797 100644
--- a/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c
+++ b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c
@@ -17,6 +17,7 @@
  ***/
 #include 
 #include 
+#include 
 #include "liquidio_common.h"
 #include "octeon_droq.h"
 #include "octeon_iq.h"
@@ -24,6 +25,7 @@
 #include "octeon_device.h"
 #include "cn23xx_vf_device.h"
 #include "octeon_main.h"
+#include "octeon_mailbox.h"
 
 static int cn23xx_vf_reset_io_queues(struct octeon_device *oct, u32 num_queues)
 {
@@ -231,6 +233,61 @@ static void cn23xx_setup_vf_oq_regs(struct octeon_device *oct, u32 oq_no)
(u8 *)oct->mmio[0].hw_addr + CN23XX_VF_SLI_OQ_PKTS_CREDIT(oq_no);
 }
 
+static void cn23xx_vf_mbox_thread(struct work_struct *work)
+{
+   struct cavium_wk *wk = (struct cavium_wk *)work;
+   struct octeon_mbox *mbox = (struct octeon_mbox *)wk->ctxptr;
+
+   octeon_mbox_process_message(mbox);
+}
+
+static int cn23xx_free_vf_mbox(struct octeon_device *oct)
+{
+   cancel_delayed_work_sync(&oct->mbox[0]->mbox_poll_wk.work);
+   vfree(oct->mbox[0]);
+   return 0;
+}
+
+static int cn23xx_setup_vf_mbox(struct octeon_device *oct)
+{
+   struct octeon_mbox *mbox = NULL;
+
+   mbox = vmalloc(sizeof(*mbox));
+   if (!mbox)
+   return 1;
+
+   memset(mbox, 0, sizeof(struct octeon_mbox));
+
+   spin_lock_init(&mbox->lock);
+
+   mbox->oct_dev = oct;
+
+   mbox->q_no = 0;
+
+   mbox->state = OCTEON_MBOX_STATE_IDLE;
+
+   /* VF mbox interrupt reg */
+   mbox->mbox_int_reg =
+   (u8 *)oct->mmio[0].hw_addr + CN23XX_VF_SLI_PKT_MBOX_INT(0);
+   /* VF reads from SIG0 reg */
+   mbox->mbox_read_reg =
+   (u8 *)oct->mmio[0].hw_addr + CN23XX_SLI_PKT_PF_VF_MBOX_SIG(0, 0);
+   /* VF writes into SIG1 reg */
+   mbox->mbox_write_reg =
+   (u8 *)oct->mmio[0].hw_addr + CN23XX_SLI_PKT_PF_VF_MBOX_SIG(0, 1);
+
+   INIT_DELAYED_WORK(&mbox->mbox_poll_wk.work,
+ cn23xx_vf_mbox_thread);
+
+   mbox->mbox_poll_wk.ctxptr = (void *)mbox;
+
+   oct->mbox[0] = mbox;
+
+   writeq(OCTEON_PFVFSIG, mbox->mbox_read_reg);
+
+   return 0;
+}
+
 static int cn23xx_enable_vf_io_queues(struct octeon_device *oct)
 {
u32 q_no;
@@ -338,6 +395,8 @@ int cn23xx_setup_octeon_vf_device(struct octeon_device *oct)
 
oct->fn_list.setup_iq_regs = cn23xx_setup_vf_iq_regs;
oct->fn_list.setup_oq_regs = cn23xx_setup_vf_oq_regs;
+   oct->fn_list.setup_mbox = cn23xx_setup_vf_mbox;
+   oct->fn_list.free_mbox = cn23xx_free_vf_mbox;
oct->fn_list.setup_device_regs = cn23xx_setup_vf_device_regs;
 
oct->fn_list.enable_io_queues = cn23xx_enable_vf_io_queues;
diff --git a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c 
b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
index d02f1dd..e8eaece 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
@@ -143,6 +143,10 @@ static void octeon_destroy_resources(struct octeon_device *oct)
int i;
 
switch (atomic_read(&oct->status)) {
+   case OCT_DEV_MBOX_SETUP_DONE:
+   oct->fn_list.free_mbox(oct);
+
+   /* fallthrough */
case OCT_DEV_IN_RESET:
case OCT_DEV_DROQ_INIT_DONE:
mdelay(100);
@@ -316,6 +320,12 @@ static int octeon_device_init(struct octeon_device *oct)
}
atomic_set(&oct->status, OCT_DEV_DROQ_INIT_DONE);
 
+   if (oct->fn_list.setup_mbox(oct)) {
+   dev_err(&oct->pci_dev->dev, "Mailbox setup failed\n");
+   return 1;
+   }
+   atomic_set(&oct->status, OCT_DEV_MBOX_SETUP_DONE);
+
return 0;
 }
 
-- 
1.8.3.1
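
A note on the pattern the hunks above extend: octeon_device_init() records
each completed step in the device status, and octeon_destroy_resources()
switches on that status and falls through, so only the steps that actually
completed are undone, in reverse order. Below is a minimal, self-contained
sketch of that idea. The ex_* names, the three example states and the fail_at
knob are hypothetical and are not part of the liquidio driver.

```c
#include <assert.h>

/* Hypothetical states, listed in the order init reaches them. */
enum ex_state { EX_BEGIN, EX_MAPPED, EX_QUEUES, EX_MBOX };

/* Hypothetical device: each flag is 1 while that resource is held. */
struct ex_dev {
	enum ex_state status;
	int mapped, queues, mbox;
	int fail_at;	/* state whose setup should fail, or -1 for none */
};

/* Record each completed step in ->status; return 1 on the first failure. */
static int ex_init(struct ex_dev *d)
{
	assert(d);
	d->status = EX_BEGIN;

	if (d->fail_at == EX_MAPPED)
		return 1;
	d->mapped = 1;
	d->status = EX_MAPPED;

	if (d->fail_at == EX_QUEUES)
		return 1;
	d->queues = 1;
	d->status = EX_QUEUES;

	if (d->fail_at == EX_MBOX)
		return 1;
	d->mbox = 1;
	d->status = EX_MBOX;

	return 0;
}

/* Enter the switch at the last completed step and undo it and all earlier ones. */
static void ex_destroy(struct ex_dev *d)
{
	assert(d);
	switch (d->status) {
	case EX_MBOX:
		d->mbox = 0;
		/* fallthrough */
	case EX_QUEUES:
		d->queues = 0;
		/* fallthrough */
	case EX_MAPPED:
		d->mapped = 0;
		/* fallthrough */
	case EX_BEGIN:
		break;
	}
	d->status = EX_BEGIN;
}
```

With this layout a new init step only needs one new state, one status update
after the step succeeds, and one new case at the top of the switch, which is
the shape of the OCT_DEV_MBOX_SETUP_DONE change above.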



[PATCH net-next 9/9] liquidio CN23XX: VF init and destroy

2016-11-27 Thread Raghu Vatsavayi
Adds support for VF initialization and destroy resources.

Signed-off-by: Raghu Vatsavayi 
Signed-off-by: Derek Chickles 
Signed-off-by: Satanand Burla 
Signed-off-by: Felix Manlunas 
---
 .../ethernet/cavium/liquidio/cn23xx_vf_device.h|   2 +
 drivers/net/ethernet/cavium/liquidio/lio_vf_main.c | 106 +
 2 files changed, 108 insertions(+)

diff --git a/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.h 
b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.h
index 8590bdb..6715df3 100644
--- a/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.h
+++ b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.h
@@ -36,6 +36,8 @@ struct octeon_cn23xx_vf {
 
 #define CN23XX_MAILBOX_MSGPARAM_SIZE   6
 
+#define MAX_VF_IP_OP_PENDING_PKT_COUNT 100
+
 void cn23xx_vf_ask_pf_to_do_flr(struct octeon_device *oct);
 
 int cn23xx_octeon_pfvf_handshake(struct octeon_device *oct);
diff --git a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c 
b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
index 337285b..2493bf5 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
@@ -41,6 +41,60 @@ struct octeon_device_priv {
 static void liquidio_vf_remove(struct pci_dev *pdev);
 static int octeon_device_init(struct octeon_device *oct);
 
+static int lio_wait_for_oq_pkts(struct octeon_device *oct)
+{
+   struct octeon_device_priv *oct_priv =
+   (struct octeon_device_priv *)oct->priv;
+   int retry = MAX_VF_IP_OP_PENDING_PKT_COUNT;
+   int pkt_cnt = 0, pending_pkts;
+   int i;
+
+   do {
+   pending_pkts = 0;
+
+   for (i = 0; i < MAX_OCTEON_OUTPUT_QUEUES(oct); i++) {
+   if (!(oct->io_qmask.oq & BIT_ULL(i)))
+   continue;
+   pkt_cnt += octeon_droq_check_hw_for_pkts(oct->droq[i]);
+   }
+   if (pkt_cnt > 0) {
+   pending_pkts += pkt_cnt;
+   tasklet_schedule(&oct_priv->droq_tasklet);
+   }
+   pkt_cnt = 0;
+   schedule_timeout_uninterruptible(1);
+
+   } while (retry-- && pending_pkts);
+
+   return pkt_cnt;
+}
+
+/**
+ * \brief wait for all pending requests to complete
+ * @param oct Pointer to Octeon device
+ *
+ * Called during shutdown sequence
+ */
+static int wait_for_pending_requests(struct octeon_device *oct)
+{
+   int i, pcount = 0;
+
+   for (i = 0; i < MAX_VF_IP_OP_PENDING_PKT_COUNT; i++) {
+   pcount = atomic_read(
+   &oct->response_list[OCTEON_ORDERED_SC_LIST]
+.pending_req_count);
+   if (pcount)
+   schedule_timeout_uninterruptible(HZ / 10);
+   else
+   break;
+   }
+
+   if (pcount)
+   return 1;
+
+   return 0;
+}
+
 static const struct pci_device_id liquidio_vf_pci_tbl[] = {
{
PCI_VENDOR_ID_CAVIUM, OCTEON_CN23XX_VF_VID,
@@ -257,6 +311,35 @@ static void octeon_destroy_resources(struct octeon_device *oct)
int i;
 
switch (atomic_read(&oct->status)) {
+   case OCT_DEV_RUNNING:
+   case OCT_DEV_CORE_OK:
+   /* No more instructions will be forwarded. */
+   atomic_set(&oct->status, OCT_DEV_IN_RESET);
+
+   dev_dbg(&oct->pci_dev->dev, "Device state is now %s\n",
+   lio_get_state_string(&oct->status));
+
+   schedule_timeout_uninterruptible(HZ / 10);
+
+   /* fallthrough */
+   case OCT_DEV_HOST_OK:
+   /* fallthrough */
+   case OCT_DEV_IO_QUEUES_DONE:
+   if (wait_for_pending_requests(oct))
+   dev_err(&oct->pci_dev->dev, "There were pending requests\n");
+
+   if (lio_wait_for_instr_fetch(oct))
+   dev_err(&oct->pci_dev->dev, "IQ had pending instructions\n");
+
+   /* Disable the input and output queues now. No more packets will
+* arrive from Octeon, but we should wait for all packet
+* processing to finish.
+*/
+   oct->fn_list.disable_io_queues(oct);
+
+   if (lio_wait_for_oq_pkts(oct))
+   dev_err(&oct->pci_dev->dev, "OQ had pending packets\n");
+
case OCT_DEV_INTR_SET_DONE:
/* Disable interrupts  */
oct->fn_list.disable_interrupt(oct, OCTEON_ALL_INTR);
@@ -395,6 +478,7 @@ static int octeon_pci_os_setup(struct octeon_device *oct)
 static int octeon_device_init(struct octeon_device *oct)
 {
u32 rev_id;
+   int j;
 
atomic_set(&oct->status, OCT_DEV_BEGIN_STATE);
 
@@ -488,6 +572,28 @@ static int octeon_device_init(struct octeon_device *oct)
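
For reference, lio_wait_for_oq_pkts() and wait_for_pending_requests() in this
patch are both bounded polling loops: check a pending count, wait, and give up
after a fixed number of retries. In lio_wait_for_oq_pkts() as posted, pkt_cnt
is reset to 0 at the end of every iteration, so the value returned after the
loop is always 0. Returning the last observed pending count, as in the sketch
below, is one possible alternative. This is only an illustration: the ex_*
names and the callback type are hypothetical, and the sleep that a driver
would do between polls is left out.

```c
#include <assert.h>

/* Hypothetical callback: returns the number of items still pending. */
typedef int (*ex_pending_fn)(void *ctx);

/*
 * Poll until nothing is pending or the retry budget is used up.
 * Returns the last observed pending count, so 0 means "drained".
 */
static int ex_wait_until_drained(ex_pending_fn poll, void *ctx, int max_retries)
{
	int pending;

	assert(poll);
	do {
		pending = poll(ctx);
		/* a real driver would sleep here before polling again */
	} while (pending > 0 && max_retries-- > 0);

	return pending;
}

/* Example callback: ctx points at a counter that drains by one per poll. */
static int ex_countdown_poll(void *ctx)
{
	int *left = ctx;

	if (*left > 0)
		(*left)--;
	return *left;
}
```

A caller can then report leftover work only when the return value is nonzero,
which matches how the dev_err() calls in octeon_destroy_resources() use the
result.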
 

[PATCH net-next 0/9] liquidio VF operations

2016-11-27 Thread Raghu Vatsavayi
Hi Dave,

The following patches add support for VF device-specific operations
such as mailbox, queues and register access. Please apply the
patches in the following order, as they depend on each other.

Thanks


Raghu Vatsavayi (9):
  liquidio CN23XX: VF register definitions
  liquidio CN23XX: VF registration
  liquidio CN23XX: VF config setup
  liquidio CN23XX: VF queue setup
  liquidio CN23XX: VF register access
  liquidio CN23XX: init VF softcommand queues
  liquidio CN23XX: VF mailbox
  liquidio CN23XX: VF interrupt
  liquidio CN23XX: VF init and destroy

 drivers/net/ethernet/cavium/Kconfig|  12 +
 drivers/net/ethernet/cavium/liquidio/Makefile  |  22 +
 .../ethernet/cavium/liquidio/cn23xx_vf_device.c| 701 +
 .../ethernet/cavium/liquidio/cn23xx_vf_device.h|  48 ++
 .../net/ethernet/cavium/liquidio/cn23xx_vf_regs.h  | 274 
 drivers/net/ethernet/cavium/liquidio/lio_core.c|   7 -
 drivers/net/ethernet/cavium/liquidio/lio_main.c|   6 +-
 drivers/net/ethernet/cavium/liquidio/lio_vf_main.c | 614 ++
 .../net/ethernet/cavium/liquidio/octeon_device.c   |  58 +-
 .../net/ethernet/cavium/liquidio/octeon_device.h   |   9 +-
 .../net/ethernet/cavium/liquidio/request_manager.c |  11 +-
 11 files changed, 1751 insertions(+), 11 deletions(-)
 create mode 100644 drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c
 create mode 100644 drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.h
 create mode 100644 drivers/net/ethernet/cavium/liquidio/cn23xx_vf_regs.h
 create mode 100644 drivers/net/ethernet/cavium/liquidio/lio_vf_main.c

-- 
1.8.3.1



[PATCH net-next 3/9] liquidio CN23XX: VF config setup

2016-11-27 Thread Raghu Vatsavayi
Adds support for setting up VF configuration.

Signed-off-by: Raghu Vatsavayi 
Signed-off-by: Derek Chickles 
Signed-off-by: Satanand Burla 
Signed-off-by: Felix Manlunas 
---
 drivers/net/ethernet/cavium/liquidio/Makefile  |   1 +
 .../ethernet/cavium/liquidio/cn23xx_vf_device.c|  44 +++
 .../ethernet/cavium/liquidio/cn23xx_vf_device.h|   2 +
 drivers/net/ethernet/cavium/liquidio/lio_vf_main.c | 136 +
 .../net/ethernet/cavium/liquidio/octeon_device.c   |   3 +
 5 files changed, 186 insertions(+)
 create mode 100644 drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c

diff --git a/drivers/net/ethernet/cavium/liquidio/Makefile 
b/drivers/net/ethernet/cavium/liquidio/Makefile
index 69d23fc..cca903a 100644
--- a/drivers/net/ethernet/cavium/liquidio/Makefile
+++ b/drivers/net/ethernet/cavium/liquidio/Makefile
@@ -31,6 +31,7 @@ liquidio_vf-$(CONFIG_LIQUIDIO_VF) += lio_ethtool.o \
cn66xx_device.o\
cn68xx_device.o\
cn23xx_pf_device.o \
+   cn23xx_vf_device.o \
octeon_mailbox.o   \
octeon_mem_ops.o   \
octeon_droq.o  \
diff --git a/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c 
b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c
new file mode 100644
index 000..d683bda
--- /dev/null
+++ b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c
@@ -0,0 +1,44 @@
+/**
+ * Author: Cavium, Inc.
+ *
+ * Contact: supp...@cavium.com
+ *  Please include "LiquidIO" in the subject.
+ *
+ * Copyright (c) 2003-2016 Cavium, Inc.
+ *
+ * This file is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, Version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This file is distributed in the hope that it will be useful, but
+ * AS-IS and WITHOUT ANY WARRANTY; without even the implied warranty
+ * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE, TITLE, or
+ * NONINFRINGEMENT.  See the GNU General Public License for more details.
+ ***/
+#include 
+#include 
+#include "liquidio_common.h"
+#include "octeon_droq.h"
+#include "octeon_iq.h"
+#include "response_manager.h"
+#include "octeon_device.h"
+#include "cn23xx_vf_device.h"
+#include "octeon_main.h"
+
+int cn23xx_setup_octeon_vf_device(struct octeon_device *oct)
+{
+   struct octeon_cn23xx_vf *cn23xx = (struct octeon_cn23xx_vf *)oct->chip;
+
+   if (octeon_map_pci_barx(oct, 0, 0))
+   return 1;
+
+   cn23xx->conf  = oct_get_config_info(oct, LIO_23XX);
+   if (!cn23xx->conf) {
+   dev_err(&oct->pci_dev->dev, "%s No Config found for CN23XX\n",
+   __func__);
+   octeon_unmap_pci_barx(oct, 0);
+   return 1;
+   }
+
+   return 0;
+}
diff --git a/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.h 
b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.h
index 015b6d4..9e4fb50 100644
--- a/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.h
+++ b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.h
@@ -31,4 +31,6 @@
 struct octeon_cn23xx_vf {
struct octeon_config *conf;
 };
+
+int cn23xx_setup_octeon_vf_device(struct octeon_device *oct);
 #endif
diff --git a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c 
b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
index d1b1a24..721ee66 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
@@ -22,6 +22,8 @@
 #include "octeon_iq.h"
 #include "response_manager.h"
 #include "octeon_device.h"
+#include "octeon_main.h"
+#include "cn23xx_vf_device.h"
 
 MODULE_AUTHOR("Cavium Networks, ");
 MODULE_DESCRIPTION("Cavium LiquidIO Intelligent Server Adapter Virtual Function Driver");
@@ -37,6 +39,7 @@ struct octeon_device_priv {
 static int
 liquidio_vf_probe(struct pci_dev *pdev, const struct pci_device_id *ent);
 static void liquidio_vf_remove(struct pci_dev *pdev);
+static int octeon_device_init(struct octeon_device *oct);
 
 static const struct pci_device_id liquidio_vf_pci_tbl[] = {
{
@@ -84,10 +87,78 @@ struct octeon_device_priv {
/* set linux specific device pointer */
oct_dev->pci_dev = (void *)pdev;
 
+   if (octeon_device_init(oct_dev)) {
+   liquidio_vf_remove(pdev);
+   return -ENOMEM;
+   }
+
+   dev_dbg(&oct_dev->pci_dev->dev, "Device is ready\n");
+
return 0;
 }
 
 /**
+ * \brief PCI FLR for each Octeon device.
+ * @param oct octeon device
+ */
+static void octeon_pci_flr(struct octeon_device 

RE: [net,v2] neigh: fix the loop index error in neigh dump

2016-11-27 Thread 张胜举


> -Original Message-
> From: David Ahern [mailto:d...@cumulusnetworks.com]
> Sent: Monday, November 28, 2016 11:10 AM
> To: 张胜举 ;
> netdev@vger.kernel.org
> Subject: Re: [net,v2] neigh: fix the loop index error in neigh dump
> 
> On 11/27/16 7:56 PM, David Ahern wrote:
> > On 11/27/16 7:53 PM, 张胜举 wrote:
> >>
> >>
> >>> -Original Message-
> >>> From: David Ahern [mailto:d...@cumulusnetworks.com]
> >>> Sent: Monday, November 28, 2016 10:39 AM
> >>> To: 张胜举 ;
> >>> netdev@vger.kernel.org
> >>> Subject: Re: [net,v2] neigh: fix the loop index error in neigh dump
> >>>
> >>> On 11/27/16 7:34 PM, 张胜举 wrote:
> > -Original Message-
> > From: David Ahern [mailto:d...@cumulusnetworks.com]
> > Sent: Monday, November 28, 2016 10:10 AM
> > To: Zhang Shengju ;
> > netdev@vger.kernel.org
> > Subject: Re: [net,v2] neigh: fix the loop index error in neigh
> > dump
> >
> > On 11/27/16 6:32 PM, Zhang Shengju wrote:
> >> Loop index in neigh dump function is not updated correctly under
> >> some circumstances, this patch will fix it.
> >
> > What's an example?
> 
>  If dev is filtered out, the original code goes to next loop without
>  updating loop index 'idx'.
> >>>
> >>> And you have a use case with missing or redundant data? Or is your
> >>> comment based on a review of code only?
> >> It's on my code review. No use case currently,  this is uncommon to
> >> happen.
> >>
> >>
> >>>
> > You are completely rewriting the dump loops.
> 
>  I put 'idx++' into for loop,  so I replace 'goto' with 'continue'.
>  The other change is style related.
> >>>
> >>> A "fixes" should not include 'style related' changes.
> >> Okay, I will send another version without style changes.
> >>
> >
> > Personally, I think you need to produce a use case that fails before sending
> > another patch. I have not seen a problem with this code.
> >
> 
> And looking back at 3f0ae05d6f I should not have acked it (reviewed it too
> quickly while on PTO). Your change is a no-op because of what idx represents
> - the position in the hash list for devices relevant for the dump request.
> Same goes for the neigh dump so this patch is not needed.
> 
No. When a dump request has to be processed by multiple 'recv/recvmsg' system
calls, idx records which dev/neigh the previous call processed, so that the
next call resumes scanning from the right place.

So idx should be incremented regardless of whether the dev/neigh is filtered
out.

It's hard to produce a use case, because we mostly have only one entry in
each hash list. Even with multiple entries, the function would also have to
exit exactly at the point where a dev/neigh is filtered out.

All other dump functions for RT netlink keep this logic; you can refer to
inet_dump_ifaddr() if you wish.
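
To make the argument concrete, here is a minimal userspace sketch of a
resumable dump in which idx advances for every entry, filtered or not. It is
not the kernel code: struct ex_entry, ex_dump() and the flat array standing in
for a hash chain are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a neighbour/device entry. */
struct ex_entry {
	int ifindex;	/* device the entry belongs to */
};

/*
 * Copy up to `room` entries matching `filter_ifindex` (0 = no filter) into
 * `out`, starting at position *resume. On return *resume holds the position
 * the next call must start from.
 */
static size_t ex_dump(const struct ex_entry *tbl, size_t n,
		      int filter_ifindex, size_t *resume,
		      int *out, size_t room)
{
	size_t idx, copied = 0;

	assert(tbl && resume && out);

	for (idx = *resume; idx < n; idx++) {
		if (filter_ifindex && tbl[idx].ifindex != filter_ifindex)
			continue;	/* filtered, but idx still advances */
		if (copied == room)
			break;		/* buffer full: resume at idx */
		out[copied++] = tbl[idx].ifindex;
	}
	*resume = idx;
	return copied;
}
```

Because the saved position counts every entry walked, a second call picks up
exactly where the first one stopped, even when filtered entries sit between
the matching ones.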





[PATCH net-next 1/9] liquidio CN23XX: VF register definitions

2016-11-27 Thread Raghu Vatsavayi
Adds support for CN23xx VF registers.

Signed-off-by: Raghu Vatsavayi 
Signed-off-by: Derek Chickles 
Signed-off-by: Satanand Burla 
Signed-off-by: Felix Manlunas 
---
 .../net/ethernet/cavium/liquidio/cn23xx_vf_regs.h  | 274 +
 1 file changed, 274 insertions(+)
 create mode 100644 drivers/net/ethernet/cavium/liquidio/cn23xx_vf_regs.h

diff --git a/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_regs.h 
b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_regs.h
new file mode 100644
index 000..d33dd8f
--- /dev/null
+++ b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_regs.h
@@ -0,0 +1,274 @@
+/**
+ * Author: Cavium, Inc.
+ *
+ * Contact: supp...@cavium.com
+ *  Please include "LiquidIO" in the subject.
+ *
+ * Copyright (c) 2003-2016 Cavium, Inc.
+ *
+ * This file is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, Version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This file is distributed in the hope that it will be useful, but
+ * AS-IS and WITHOUT ANY WARRANTY; without even the implied warranty
+ * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE, TITLE, or
+ * NONINFRINGEMENT.  See the GNU General Public License for more details.
+ ***/
+/*! \file cn23xx_vf_regs.h
+ * \brief Host Driver: Register Address and Register Mask values for
+ * Octeon CN23XX vf functions.
+ */
+
+#ifndef __CN23XX_VF_REGS_H__
+#define __CN23XX_VF_REGS_H__
+
+#define CN23XX_CONFIG_XPANSION_BAR 0x38
+
+#define CN23XX_CONFIG_PCIE_CAP 0x70
+#define CN23XX_CONFIG_PCIE_DEVCAP  0x74
+#define CN23XX_CONFIG_PCIE_DEVCTL  0x78
+#define CN23XX_CONFIG_PCIE_LINKCAP 0x7C
+#define CN23XX_CONFIG_PCIE_LINKCTL 0x80
+#define CN23XX_CONFIG_PCIE_SLOTCAP 0x84
+#define CN23XX_CONFIG_PCIE_SLOTCTL 0x88
+
+#define CN23XX_CONFIG_PCIE_FLTMSK  0x720
+
+/* The input jabber is used to determine the TSO max size.
+ * Due to H/W limitation, this needs to be reduced to 6
+ * in order to do H/W TSO and avoid the WQE malformation
+ * PKO_BUG_24989_WQE_LEN
+ */
+#define CN23XX_DEFAULT_INPUT_JABBER 0xEA60 /*6*/
+
+/* ##  BAR0 Registers  */
+
+/* Each Input Queue register is at a 16-byte Offset in BAR0 */
+#define CN23XX_VF_IQ_OFFSET 0x2
+
+/*## REQUEST QUEUE #*/
+
+/* 64 registers for Input Queue Instr Count - SLI_PKT_IN_DONE0_CNTS */
+#define CN23XX_VF_SLI_IQ_INSTR_COUNT_START64 0x10040
+
+/* 64 registers for Input Queues Start Addr - SLI_PKT0_INSTR_BADDR */
+#define CN23XX_VF_SLI_IQ_BASE_ADDR_START64   0x10010
+
+/* 64 registers for Input Doorbell - SLI_PKT0_INSTR_BAOFF_DBELL */
+#define CN23XX_VF_SLI_IQ_DOORBELL_START  0x10020
+
+/* 64 registers for Input Queue size - SLI_PKT0_INSTR_FIFO_RSIZE */
+#define CN23XX_VF_SLI_IQ_SIZE_START  0x10030
+
+/* 64 registers (64-bit) - ES, RO, NS, Arbitration for Input Queue Data &
+ * gather list fetches. SLI_PKT(0..63)_INPUT_CONTROL.
+ */
+#define CN23XX_VF_SLI_IQ_PKT_CONTROL_START64 0x1
+
+/*--- Request Queue Macros -*/
+#define CN23XX_VF_SLI_IQ_PKT_CONTROL64(iq) \
+   (CN23XX_VF_SLI_IQ_PKT_CONTROL_START64 + ((iq) * CN23XX_VF_IQ_OFFSET))
+
+#define CN23XX_VF_SLI_IQ_BASE_ADDR64(iq)   \
+   (CN23XX_VF_SLI_IQ_BASE_ADDR_START64 + ((iq) * CN23XX_VF_IQ_OFFSET))
+
+#define CN23XX_VF_SLI_IQ_SIZE(iq)  \
+   (CN23XX_VF_SLI_IQ_SIZE_START + ((iq) * CN23XX_VF_IQ_OFFSET))
+
+#define CN23XX_VF_SLI_IQ_DOORBELL(iq)  \
+   (CN23XX_VF_SLI_IQ_DOORBELL_START + ((iq) * CN23XX_VF_IQ_OFFSET))
+
+#define CN23XX_VF_SLI_IQ_INSTR_COUNT64(iq) \
+   (CN23XX_VF_SLI_IQ_INSTR_COUNT_START64 + ((iq) * CN23XX_VF_IQ_OFFSET))
+
+/*-- Masks */
+#define CN23XX_PKT_INPUT_CTL_VF_NUM  BIT_ULL(32)
+#define CN23XX_PKT_INPUT_CTL_MAC_NUM BIT(29)
+/* Number of instructions to be read in one MAC read request.
+ * setting to Max value(4)
+ */
+#define CN23XX_PKT_INPUT_CTL_RDSIZE  (3 << 25)
+#define CN23XX_PKT_INPUT_CTL_IS_64B  BIT(24)
+#define CN23XX_PKT_INPUT_CTL_RST BIT(23)
+#define CN23XX_PKT_INPUT_CTL_QUIET   BIT(28)
+#define CN23XX_PKT_INPUT_CTL_RING_ENB BIT(22)
+#define CN23XX_PKT_INPUT_CTL_DATA_NS BIT(8)
+#define CN23XX_PKT_INPUT_CTL_DATA_ES_64B_SWAP BIT(6)
+#define
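
The request queue macros in this header all follow the same scheme: a
per-queue register lives at a start offset plus the queue number times a fixed
stride, and the driver adds that offset to the mapped BAR0 base. A small
sketch of the arithmetic follows. The EX_* names and both constants are
made-up example values, not the CN23XX ones.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical example values; the real offsets live in cn23xx_vf_regs.h. */
#define EX_IQ_STRIDE		0x20000u
#define EX_IQ_DOORBELL_START	0x10020u

/* Offset of the doorbell register for input queue `iq`. */
#define EX_IQ_DOORBELL(iq) \
	(EX_IQ_DOORBELL_START + ((iq) * EX_IQ_STRIDE))

/* Turn a register offset into an address inside a mapped BAR. */
static inline uint8_t *ex_reg_addr(uint8_t *bar0, uint32_t offset)
{
	assert(bar0);
	return bar0 + offset;
}
```

Parenthesising (iq) in the macro keeps an argument such as q + 1 from being
split by operator precedence, which is why the macros above are written that
way too.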

[PATCH net-next 2/9] liquidio CN23XX: VF registration

2016-11-27 Thread Raghu Vatsavayi
Adds support for cn23xx VF probe and registration.

Signed-off-by: Raghu Vatsavayi 
Signed-off-by: Derek Chickles 
Signed-off-by: Satanand Burla 
Signed-off-by: Felix Manlunas 
---
 drivers/net/ethernet/cavium/Kconfig|  12 +++
 drivers/net/ethernet/cavium/liquidio/Makefile  |  21 
 .../ethernet/cavium/liquidio/cn23xx_vf_device.h|  34 ++
 drivers/net/ethernet/cavium/liquidio/lio_vf_main.c | 120 +
 .../net/ethernet/cavium/liquidio/octeon_device.c   |   4 +
 5 files changed, 191 insertions(+)
 create mode 100644 drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.h
 create mode 100644 drivers/net/ethernet/cavium/liquidio/lio_vf_main.c

diff --git a/drivers/net/ethernet/cavium/Kconfig 
b/drivers/net/ethernet/cavium/Kconfig
index 92f411c..c0679c2 100644
--- a/drivers/net/ethernet/cavium/Kconfig
+++ b/drivers/net/ethernet/cavium/Kconfig
@@ -74,4 +74,16 @@ config OCTEON_MGMT_ETHERNET
  port on Cavium Networks' Octeon CN57XX, CN56XX, CN55XX,
  CN54XX, CN52XX, and CN6XXX chips.
 
+config LIQUIDIO_VF
+   tristate "Cavium LiquidIO VF support"
+   depends on 64BIT && PCI_MSI
+   select PTP_1588_CLOCK
+   ---help---
+ This driver supports Cavium LiquidIO Intelligent Server Adapter
+ based on CN23XX chips.
+
+ To compile this driver as a module, choose M here: The module
+ will be called liquidio_vf. MSI-X interrupt support is required
+ for this driver to work correctly.
+
 endif # NET_VENDOR_CAVIUM
diff --git a/drivers/net/ethernet/cavium/liquidio/Makefile 
b/drivers/net/ethernet/cavium/liquidio/Makefile
index 14958de..69d23fc 100644
--- a/drivers/net/ethernet/cavium/liquidio/Makefile
+++ b/drivers/net/ethernet/cavium/liquidio/Makefile
@@ -17,3 +17,24 @@ liquidio-$(CONFIG_LIQUIDIO) += lio_ethtool.o \
octeon_nic.o
 
 liquidio-objs := lio_main.o octeon_console.o $(liquidio-y)
+
+obj-$(CONFIG_LIQUIDIO_VF) += liquidio_vf.o
+
+ifeq ($(CONFIG_LIQUIDIO)$(CONFIG_LIQUIDIO_VF), yy)
+   liquidio_vf-objs := lio_vf_main.o
+else
+liquidio_vf-$(CONFIG_LIQUIDIO_VF) += lio_ethtool.o \
+   lio_core.o \
+   request_manager.o  \
+   response_manager.o \
+   octeon_device.o\
+   cn66xx_device.o\
+   cn68xx_device.o\
+   cn23xx_pf_device.o \
+   octeon_mailbox.o   \
+   octeon_mem_ops.o   \
+   octeon_droq.o  \
+   octeon_nic.o
+
+liquidio_vf-objs := lio_vf_main.o $(liquidio_vf-y)
+endif
diff --git a/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.h 
b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.h
new file mode 100644
index 000..015b6d4
--- /dev/null
+++ b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.h
@@ -0,0 +1,34 @@
+/**
+ * Author: Cavium, Inc.
+ *
+ * Contact: supp...@cavium.com
+ *  Please include "LiquidIO" in the subject.
+ *
+ * Copyright (c) 2003-2016 Cavium, Inc.
+ *
+ * This file is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, Version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This file is distributed in the hope that it will be useful, but
+ * AS-IS and WITHOUT ANY WARRANTY; without even the implied warranty
+ * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE, TITLE, or
+ * NONINFRINGEMENT.  See the GNU General Public License for more details.
+ ***/
+/*! \file  cn23xx_vf_device.h
+ * \brief Host Driver: Routines that perform CN23XX specific operations.
+ */
+
+#ifndef __CN23XX_VF_DEVICE_H__
+#define __CN23XX_VF_DEVICE_H__
+
+#include "cn23xx_vf_regs.h"
+
+/* Register address and configuration for CN23XX devices.
+ * If device specific changes need to be made then add a struct to include
+ * device specific fields as shown in the commented section
+ */
+struct octeon_cn23xx_vf {
+   struct octeon_config *conf;
+};
+#endif
diff --git a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c 
b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
new file mode 100644
index 000..d1b1a24
--- /dev/null
+++ b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
@@ -0,0 +1,120 @@
+/**
+ * Author: Cavium, Inc.
+ *
+ * Contact: supp...@cavium.com
+ *  Please include "LiquidIO" in the subject.
+ *
+ * Copyright (c) 2003-2016 Cavium, Inc.
+ *
+ * This file is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, Version 

[PATCH net-next] bpf: samples: Fix compile of test_lru_dist.c

2016-11-27 Thread David Ahern
Build of samples/bpf on debian/jessie fails with:

  HOSTCC  /home/dsa/kernel-3.git/samples/bpf/test_lru_dist.o
/home/dsa/kernel-3.git/samples/bpf/test_lru_dist.c: In function ‘main’:
/home/dsa/kernel-3.git/samples/bpf/test_lru_dist.c:490:9: error: variable ‘r’ 
has initializer but incomplete type
  struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
 ^
/home/dsa/kernel-3.git/samples/bpf/test_lru_dist.c:490:21: error: 
‘RLIM_INFINITY’ undeclared (first use in this function)
  struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
 ^
/home/dsa/kernel-3.git/samples/bpf/test_lru_dist.c:490:21: note: each 
undeclared identifier is reported only once for each function it appears in
/home/dsa/kernel-3.git/samples/bpf/test_lru_dist.c:490:9: warning: excess 
elements in struct initializer
  struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
 ^
/home/dsa/kernel-3.git/samples/bpf/test_lru_dist.c:490:9: warning: (near 
initialization for ‘r’)
/home/dsa/kernel-3.git/samples/bpf/test_lru_dist.c:490:9: warning: excess 
elements in struct initializer
/home/dsa/kernel-3.git/samples/bpf/test_lru_dist.c:490:9: warning: (near 
initialization for ‘r’)
/home/dsa/kernel-3.git/samples/bpf/test_lru_dist.c:490:16: error: storage size 
of ‘r’ isn’t known
  struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};

Add sys/resource.h to the include list

Fixes: 5db58faf989f ("bpf: Add tests for the LRU bpf_htab")
Signed-off-by: David Ahern 
Cc: Martin KaFai Lau 
---
 samples/bpf/test_lru_dist.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/samples/bpf/test_lru_dist.c b/samples/bpf/test_lru_dist.c
index 2859977b7f37..bc4a2142eb91 100644
--- a/samples/bpf/test_lru_dist.c
+++ b/samples/bpf/test_lru_dist.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include <sys/resource.h>
 #include 
 #include 
 #include 
-- 
2.1.4
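
For anyone hitting the same error elsewhere: struct rlimit and RLIM_INFINITY
are declared by sys/resource.h, so the initializer from the build log only
compiles once that header is included directly. A minimal standalone sketch
follows. The ex_* helpers are hypothetical, and RLIMIT_NOFILE is used purely
as an example resource, not because the sample uses it.

```c
#include <assert.h>
#include <sys/resource.h>	/* struct rlimit, RLIM_INFINITY, getrlimit() */

/* Same initializer as in the build log; needs the complete struct type. */
static struct rlimit ex_unlimited(void)
{
	struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};

	return r;
}

/* Read the current limits for one resource; returns 0 on success. */
static int ex_read_nofile_limit(struct rlimit *out)
{
	assert(out);
	return getrlimit(RLIMIT_NOFILE, out);
}
```

Relying on another header to pull in sys/resource.h indirectly is what makes
this kind of build failure depend on the distribution.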



Re: [RFC net-next 2/3] net: dsa: Propagate VLAN add/del to CPU port(s)

2016-11-27 Thread Florian Fainelli


On 11/22/2016 08:50 AM, Vivien Didelot wrote:
> Hi Florian,
> 
> Open question: will we need to do the same for FDB and MDB objects?

(I overlooked that question earlier this week.) Yes, I do expect that this
could be helpful for FDB and MDB objects as well.

> 
> Florian Fainelli  writes:
> 
>> Now that the bridge layer can call into switchdev to signal programming
>> requests targeting the bridge master device itself, allow the switch
>> drivers to implement separate programming of downstream and
>> upstream/management ports.
>>
>> Signed-off-by: Vivien Didelot 
>> Signed-off-by: Florian Fainelli 
>> ---
>>  net/dsa/slave.c | 45 +
>>  1 file changed, 33 insertions(+), 12 deletions(-)
>>
>> diff --git a/net/dsa/slave.c b/net/dsa/slave.c
>> index d0c7bce88743..18288261b964 100644
>> --- a/net/dsa/slave.c
>> +++ b/net/dsa/slave.c
>> @@ -223,35 +223,30 @@ static int dsa_slave_set_mac_address(struct net_device 
>> *dev, void *a)
>>  return 0;
>>  }
>>  
>> -static int dsa_slave_port_vlan_add(struct net_device *dev,
>> +static int dsa_slave_port_vlan_add(struct dsa_switch *ds, int port,
>> const struct switchdev_obj_port_vlan *vlan,
>> struct switchdev_trans *trans)
>>  {
>> -struct dsa_slave_priv *p = netdev_priv(dev);
>> -struct dsa_switch *ds = p->parent;
>>  
> 
> Extra newline ^.
> 
>>  if (switchdev_trans_ph_prepare(trans)) {
>>  if (!ds->ops->port_vlan_prepare || !ds->ops->port_vlan_add)
>>  return -EOPNOTSUPP;
>>  
>> -return ds->ops->port_vlan_prepare(ds, p->port, vlan, trans);
>> +return ds->ops->port_vlan_prepare(ds, port, vlan, trans);
>>  }
>>  
>> -ds->ops->port_vlan_add(ds, p->port, vlan, trans);
>> +ds->ops->port_vlan_add(ds, port, vlan, trans);
>>  
>>  return 0;
>>  }
>>  
>> -static int dsa_slave_port_vlan_del(struct net_device *dev,
>> +static int dsa_slave_port_vlan_del(struct dsa_switch *ds, int port,
>> const struct switchdev_obj_port_vlan *vlan)
>>  {
>> -struct dsa_slave_priv *p = netdev_priv(dev);
>> -struct dsa_switch *ds = p->parent;
>> -
>>  if (!ds->ops->port_vlan_del)
>>  return -EOPNOTSUPP;
>>  
>> -return ds->ops->port_vlan_del(ds, p->port, vlan);
>> +return ds->ops->port_vlan_del(ds, port, vlan);
>>  }
>>  
>>  static int dsa_slave_port_vlan_dump(struct net_device *dev,
>> @@ -465,8 +460,21 @@ static int dsa_slave_port_obj_add(struct net_device 
>> *dev,
>>const struct switchdev_obj *obj,
>>struct switchdev_trans *trans)
>>  {
>> +struct dsa_slave_priv *p = netdev_priv(dev);
>> +struct dsa_switch *ds = p->parent;
>> +int port = p->port;
>>  int err;
>>  
>> +/* Here we may be called with an orig_dev which is different from dev,
>> + * on purpose, to receive request coming from e.g the bridge master
>> + * device. Although there are no network device associated with CPU/DSA
>> + * ports, we may still have programming operation for these ports.
>> + */
>> +if (obj->orig_dev == p->bridge_dev) {
>> +ds = ds->dst->ds[0];
>> +port = ds->dst->cpu_port;
>> +}
>> +
>>  /* For the prepare phase, ensure the full set of changes is feasable in
>>   * one go in order to signal a failure properly. If an operation is not
>>   * supported, return -EOPNOTSUPP.
>> @@ -483,7 +491,7 @@ static int dsa_slave_port_obj_add(struct net_device *dev,
>>   trans);
>>  break;
>>  case SWITCHDEV_OBJ_ID_PORT_VLAN:
>> -err = dsa_slave_port_vlan_add(dev,
>> +err = dsa_slave_port_vlan_add(ds, port,
>>SWITCHDEV_OBJ_PORT_VLAN(obj),
>>trans);
> 
> Note that dsa_slave_port_vlan_add() will be called N times, N being the
> number of bridge ports. This is not an issue for the moment though.
> Programming it only once requires caching, so leave it for an eventual
> future patch.
> 
> When issuing the following command (lan0 being a member of br0):
> 
> # bridge vlan add vid 42 dev lan0
> 
> the CPU port is also programmed as tagged in VLAN 42. Is that expected?

The first time a VLAN ID is programmed on either lan0 or br0, if it
did not exist prior to that call, it also gets added to the bridge
VLAN database, which is why both the lan0 interface and the CPU port get
programmed.
-- 
Florian


Re: [net-next PATCH v2 3/5] virtio_net: Add XDP support

2016-11-27 Thread Michael S. Tsirkin
On Sun, Nov 27, 2016 at 07:56:09PM -0800, John Fastabend wrote:
> On 16-11-27 07:36 PM, Michael S. Tsirkin wrote:
> > On Fri, Nov 25, 2016 at 01:24:03PM -0800, John Fastabend wrote:
> >> On 16-11-22 06:58 AM, Michael S. Tsirkin wrote:
> >>> On Tue, Nov 22, 2016 at 12:27:03AM -0800, John Fastabend wrote:
> >>>> On 16-11-21 03:20 PM, Michael S. Tsirkin wrote:
> >>>>> On Sat, Nov 19, 2016 at 06:50:33PM -0800, John Fastabend wrote:
> >>>>>> From: Shrijeet Mukherjee
> >>>>>>
> >>>>>> This adds XDP support to virtio_net. Some requirements must be
> >>>>>> met for XDP to be enabled depending on the mode. First it will
> >>>>>> only be supported with LRO disabled so that data is not pushed
> >>>>>> across multiple buffers. The MTU must be less than a page size
> >>>>>> to avoid having to handle XDP across multiple pages.
> >>>>>>
> >>>>>> If mergeable receive is enabled this first series only supports
> >>>>>> the case where header and data are in the same buf which we can
> >>>>>> check when a packet is received by looking at num_buf. If the
> >>>>>> num_buf is greater than 1 and a XDP program is loaded the packet
> >>>>>> is dropped and a warning is thrown. When any_header_sg is set this
> >>>>>> does not happen and both header and data is put in a single buffer
> >>>>>> as expected so we check this when XDP programs are loaded. Note I
> >>>>>> have only tested this with Linux vhost backend.
> >>>>>>
> >>>>>> If big packets mode is enabled and MTU/LRO conditions above are
> >>>>>> met then XDP is allowed.
> >>>>>>
> >>>>>> A follow on patch can be generated to solve the mergeable receive
> >>>>>> case with num_bufs equal to 2. Buffers greater than two may not
> >>>>>> be handled as easily.
> >>>>>
> >>>>>
> >>>>> I would very much prefer support for other layouts without drops
> >>>>> before merging this.
> >>>>> header by itself can certainly be handled by skipping it.
> >>>>> People wanted to use that e.g. for zero copy.
> >>>>
> >>>> OK fair enough I'll do this now rather than push it out.
> >>>>
> >>
> >> Hi Michael,
> >>
> >> The header skip logic however complicates the xmit handling a fair
> >> amount. Specifically when we release the buffers after xmit then
> >> both the hdr and data portions need to be released which requires
> >> some tracking.
> > 
> > I thought you disable all checksum offloads so why not discard the
> > header immediately?
> 
> Well in the "normal" case where the header is part of the same buffer
> we keep it to use the same space for the header on the TX path.
> 
> If we discard it in the header split case we have to push the header
> somewhere else. In the skb case the cb[] region is used it looks like.
> In our case I guess free space at the end of the page could be used.

You don't have to put the start of the page in a buffer; you
can put an offset there. This will result in some waste in the
common case, but it's just several bytes, so likely not a big deal.

> My thinking is if we handle the general case of more than one buffer
> being used with a copy we can handle the case above using the same
> logic and no need to handle it as a special case. It seems to be an odd
> case that doesn't really exist anyways. At least not in qemu/Linux. I
> have not tested anything else.

OK

> > 
> >> Is the header split logic actually in use somewhere today? It looks
> >> like its not being used in Linux case. And zero copy RX is currently as
> >> best I can tell not supported anywhere so I would prefer not to
> >> complicate the XDP path at the moment with a possible future feature.
> > 
> > Well it's part of the documented interface so we never
> > know who implemented it. Normally if we want to make
> > restrictions we would do the reverse and add a feature.
> > 
> > We can do this easily, but I'd like to first look into
> > just handling all possible inputs as the spec asks us to.
> > I'm a bit too busy with other stuff next week but will
> > look into this a week after that if you don't beat me to it.
> > 
> 
> Well I've almost got it working now with some logic to copy everything
> into a single page if we hit this case so should be OK but slow. I'll
> finish testing this and send it out hopefully in the next few days.
> 
> >>>>>
> >>>>> Anything else can be handled by copying the packet.
> >>
> >> Any idea how to test this? At the moment I have some code to linearize
> >> the data in all cases with more than a single buffer. But wasn't clear
> >> to me which features I could negotiate with vhost/qemu to get more than
> >> a single buffer in the receive path.
> >>
> >> Thanks,
> >> John
> > 
> > ATM you need to hack qemu. Here's a hack to make header completely
> > separate.
> > 
> 
> Perfect! hacking qemu for testing is no problem this helps a lot thanks
> and saves me time trying to figure out how to get qemu to do this.

Please note I didn't try this at all, so it might not work, but it should
give you the idea.

> > 
> > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> > 

Re: [net-next PATCH v2 3/5] virtio_net: Add XDP support

2016-11-27 Thread John Fastabend
On 16-11-27 07:36 PM, Michael S. Tsirkin wrote:
> On Fri, Nov 25, 2016 at 01:24:03PM -0800, John Fastabend wrote:
>> On 16-11-22 06:58 AM, Michael S. Tsirkin wrote:
>>> On Tue, Nov 22, 2016 at 12:27:03AM -0800, John Fastabend wrote:
>>>> On 16-11-21 03:20 PM, Michael S. Tsirkin wrote:
>>>>> On Sat, Nov 19, 2016 at 06:50:33PM -0800, John Fastabend wrote:
>>>>>> From: Shrijeet Mukherjee
>>>>>>
>>>>>> This adds XDP support to virtio_net. Some requirements must be
>>>>>> met for XDP to be enabled depending on the mode. First it will
>>>>>> only be supported with LRO disabled so that data is not pushed
>>>>>> across multiple buffers. The MTU must be less than a page size
>>>>>> to avoid having to handle XDP across multiple pages.
>>>>>>
>>>>>> If mergeable receive is enabled this first series only supports
>>>>>> the case where header and data are in the same buf which we can
>>>>>> check when a packet is received by looking at num_buf. If the
>>>>>> num_buf is greater than 1 and a XDP program is loaded the packet
>>>>>> is dropped and a warning is thrown. When any_header_sg is set this
>>>>>> does not happen and both header and data is put in a single buffer
>>>>>> as expected so we check this when XDP programs are loaded. Note I
>>>>>> have only tested this with Linux vhost backend.
>>>>>>
>>>>>> If big packets mode is enabled and MTU/LRO conditions above are
>>>>>> met then XDP is allowed.
>>>>>>
>>>>>> A follow on patch can be generated to solve the mergeable receive
>>>>>> case with num_bufs equal to 2. Buffers greater than two may not
>>>>>> be handled as easily.
>>>>>
>>>>>
>>>>> I would very much prefer support for other layouts without drops
>>>>> before merging this.
>>>>> header by itself can certainly be handled by skipping it.
>>>>> People wanted to use that e.g. for zero copy.
>>>>
>>>> OK fair enough I'll do this now rather than push it out.
>>>>
>>
>> Hi Michael,
>>
>> The header skip logic however complicates the xmit handling a fair
>> amount. Specifically when we release the buffers after xmit then
>> both the hdr and data portions need to be released which requires
>> some tracking.
> 
> I thought you disable all checksum offloads so why not discard the
> header immediately?

Well in the "normal" case where the header is part of the same buffer
we keep it to use the same space for the header on the TX path.

If we discard it in the header split case we have to push the header
somewhere else. In the skb case the cb[] region is used it looks like.
In our case I guess free space at the end of the page could be used.

My thinking is if we handle the general case of more than one buffer
being used with a copy we can handle the case above using the same
logic and no need to handle it as a special case. It seems to be an odd
case that doesn't really exist anyways. At least not in qemu/Linux. I
have not tested anything else.

> 
>> Is the header split logic actually in use somewhere today? It looks
>> like its not being used in Linux case. And zero copy RX is currently as
>> best I can tell not supported anywhere so I would prefer not to
>> complicate the XDP path at the moment with a possible future feature.
> 
> Well it's part of the documented interface so we never
> know who implemented it. Normally if we want to make
> restrictions we would do the reverse and add a feature.
> 
> We can do this easily, but I'd like to first look into
> just handling all possible inputs as the spec asks us to.
> I'm a bit too busy with other stuff next week but will
> look into this a week after that if you don't beat me to it.
> 

Well I've almost got it working now with some logic to copy everything
into a single page if we hit this case so should be OK but slow. I'll
finish testing this and send it out hopefully in the next few days.
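
A minimal sketch of the copy fallback described above, assuming a received
packet arrives as an array of (pointer, length) fragments. The names
struct frag, linearize_frags and SKETCH_PAGE_SIZE are hypothetical and are not
taken from virtio_net; the sketch only illustrates copying every fragment into
one page-sized buffer and rejecting packets that would not fit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed page size for the sketch; a real driver would use PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096

/* Hypothetical fragment descriptor: one receive buffer of a packet. */
struct frag {
	const void *data;
	size_t len;
};

/*
 * Copy all fragments back to back into 'page'.
 * Returns the number of bytes copied, or -1 if the packet does not fit
 * into a single page (the caller would then drop it).
 */
static long linearize_frags(const struct frag *frags, size_t nfrags,
			    unsigned char *page)
{
	size_t off = 0;

	assert(page != NULL);

	for (size_t i = 0; i < nfrags; i++) {
		/* off never exceeds SKETCH_PAGE_SIZE, so this cannot wrap. */
		if (frags[i].len > SKETCH_PAGE_SIZE - off)
			return -1;
		memcpy(page + off, frags[i].data, frags[i].len);
		off += frags[i].len;
	}
	return (long)off;
}
```

A caller would pass the header and data fragments in order and treat a return
value of -1 as "too large to linearize". The extra copy is why this path is
expected to be correct but slow.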

>>>>>
>>>>> Anything else can be handled by copying the packet.
>>
>> Any idea how to test this? At the moment I have some code to linearize
>> the data in all cases with more than a single buffer. But wasn't clear
>> to me which features I could negotiate with vhost/qemu to get more than
>> a single buffer in the receive path.
>>
>> Thanks,
>> John
> 
> ATM you need to hack qemu. Here's a hack to make header completely
> separate.
> 

Perfect! Hacking qemu for testing is no problem. This helps a lot, thanks,
and saves me time trying to figure out how to get qemu to do this.

> 
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index b68c69d..4866144 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -1164,6 +1164,7 @@ static ssize_t virtio_net_receive(NetClientState *nc, 
> const uint8_t *buf, size_t
>  offset = n->host_hdr_len;
>  total += n->guest_hdr_len;
>  guest_offset = n->guest_hdr_len;
> +continue;
>  } else {
>  guest_offset = 0;
>  }
> 
> 
> 
> here's one that should cap the 1st s/g to 100 bytes:
> 
> 
> diff --git a/hw/net/virtio-net.c 

Re: [net-next PATCH v2 3/5] virtio_net: Add XDP support

2016-11-27 Thread Michael S. Tsirkin
On Fri, Nov 25, 2016 at 01:24:03PM -0800, John Fastabend wrote:
> On 16-11-22 06:58 AM, Michael S. Tsirkin wrote:
> > On Tue, Nov 22, 2016 at 12:27:03AM -0800, John Fastabend wrote:
> >> On 16-11-21 03:20 PM, Michael S. Tsirkin wrote:
> >>> On Sat, Nov 19, 2016 at 06:50:33PM -0800, John Fastabend wrote:
> >>>> From: Shrijeet Mukherjee
> >>>>
> >>>> This adds XDP support to virtio_net. Some requirements must be
> >>>> met for XDP to be enabled depending on the mode. First it will
> >>>> only be supported with LRO disabled so that data is not pushed
> >>>> across multiple buffers. The MTU must be less than a page size
> >>>> to avoid having to handle XDP across multiple pages.
> >>>>
> >>>> If mergeable receive is enabled this first series only supports
> >>>> the case where header and data are in the same buf which we can
> >>>> check when a packet is received by looking at num_buf. If the
> >>>> num_buf is greater than 1 and a XDP program is loaded the packet
> >>>> is dropped and a warning is thrown. When any_header_sg is set this
> >>>> does not happen and both header and data is put in a single buffer
> >>>> as expected so we check this when XDP programs are loaded. Note I
> >>>> have only tested this with Linux vhost backend.
> >>>>
> >>>> If big packets mode is enabled and MTU/LRO conditions above are
> >>>> met then XDP is allowed.
> >>>>
> >>>> A follow on patch can be generated to solve the mergeable receive
> >>>> case with num_bufs equal to 2. Buffers greater than two may not
> >>>> be handled as easily.
> >>>
> >>>
> >>> I would very much prefer support for other layouts without drops
> >>> before merging this.
> >>> header by itself can certainly be handled by skipping it.
> >>> People wanted to use that e.g. for zero copy.
> >>
> >> OK fair enough I'll do this now rather than push it out.
> >>
> 
> Hi Michael,
> 
> The header skip logic however complicates the xmit handling a fair
> amount. Specifically when we release the buffers after xmit then
> both the hdr and data portions need to be released which requires
> some tracking.

I thought you disable all checksum offloads so why not discard the
header immediately?

> Is the header split logic actually in use somewhere today? It looks
> like its not being used in Linux case. And zero copy RX is currently as
> best I can tell not supported anywhere so I would prefer not to
> complicate the XDP path at the moment with a possible future feature.

Well it's part of the documented interface so we never
know who implemented it. Normally if we want to make
restrictions we would do the reverse and add a feature.

We can do this easily, but I'd like to first look into
just handling all possible inputs as the spec asks us to.
I'm a bit too busy with other stuff next week but will
look into this a week after that if you don't beat me to it.

> >>>
> >>> Anything else can be handled by copying the packet.
> 
> Any idea how to test this? At the moment I have some code to linearize
> the data in all cases with more than a single buffer. But wasn't clear
> to me which features I could negotiate with vhost/qemu to get more than
> a single buffer in the receive path.
> 
> Thanks,
> John

ATM you need to hack qemu. Here's a hack to make header completely
separate.


diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index b68c69d..4866144 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -1164,6 +1164,7 @@ static ssize_t virtio_net_receive(NetClientState *nc, const uint8_t *buf, size_t
 offset = n->host_hdr_len;
 total += n->guest_hdr_len;
 guest_offset = n->guest_hdr_len;
+continue;
 } else {
 guest_offset = 0;
 }



here's one that should cap the 1st s/g to 100 bytes:


diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index b68c69d..7943004 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -1164,6 +1164,7 @@ static ssize_t virtio_net_receive(NetClientState *nc, const uint8_t *buf, size_t
 offset = n->host_hdr_len;
 total += n->guest_hdr_len;
 guest_offset = n->guest_hdr_len;
+sg.iov_len = MIN(sg.iov_len, 100);
 } else {
 guest_offset = 0;
 }


Re: [net,v2] neigh: fix the loop index error in neigh dump

2016-11-27 Thread David Ahern
On 11/27/16 7:56 PM, David Ahern wrote:
> On 11/27/16 7:53 PM, 张胜举 wrote:
>>
>>
>>> -----Original Message-----
>>> From: David Ahern [mailto:d...@cumulusnetworks.com]
>>> Sent: Monday, November 28, 2016 10:39 AM
>>> To: 张胜举 ;
>>> netdev@vger.kernel.org
>>> Subject: Re: [net,v2] neigh: fix the loop index error in neigh dump
>>>
>>> On 11/27/16 7:34 PM, 张胜举 wrote:
>>>>> -----Original Message-----
>>>>> From: David Ahern [mailto:d...@cumulusnetworks.com]
>>>>> Sent: Monday, November 28, 2016 10:10 AM
>>>>> To: Zhang Shengju ;
>>>>> netdev@vger.kernel.org
>>>>> Subject: Re: [net,v2] neigh: fix the loop index error in neigh dump
>>>>>
>>>>> On 11/27/16 6:32 PM, Zhang Shengju wrote:
>>>>>> Loop index in neigh dump function is not updated correctly under
>>>>>> some circumstances, this patch will fix it.
>>>>>
>>>>> What's an example?
>>>>
>>>> If dev is filtered out, the original code goes to next loop without
>>>> updating loop index 'idx'.
>>>
>>> And you have a use case with missing or redundant data? Or is your
>>> comment based on a review of code only?
>> It's on my code review. No use case currently,  this is uncommon to happen.
>>
>>
>>>
>>>>> You are completely rewriting the dump loops.
>>>>
>>>> I put 'idx++' into for loop,  so I replace 'goto' with 'continue'.
>>>> The other change is style related.
>>>
>>> A "fixes" should not include 'style related' changes.
>> Okay, I will send another version without style changes.
>>
> 
> Personally, I think you need to produce a use case that fails before sending 
> another patch. I have not seen a problem with this code.
> 

And looking back at 3f0ae05d6f I should not have acked it (reviewed it too 
quickly while on PTO). Your change is a no-op because of what idx represents - 
the position in the hash list for devices relevant for the dump request. Same 
goes for the neigh dump so this patch is not needed.
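
To illustrate the point about what idx represents, here is a minimal,
self-contained sketch of a resumable dump loop. It is not the kernel code:
struct sketch_dev, dump_devs and the cursor/budget parameters are hypothetical
stand-ins. The assumption shown is that idx counts only entries that match the
filter, so skipping a filtered-out entry without advancing idx still lets a
later call resume at the right place, provided the same filter is applied.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device record; 'master' is the value the dump filters on. */
struct sketch_dev {
	int ifindex;
	int master;
};

/*
 * Emit up to 'budget' matching ifindexes into out[], skipping the first
 * *cursor matching entries (already dumped by an earlier call).
 * *cursor is updated to the number of matching entries dumped so far.
 */
static size_t dump_devs(const struct sketch_dev *devs, size_t n,
			int filter_master, size_t *cursor,
			int *out, size_t budget)
{
	size_t s_idx = *cursor;	/* where the previous call stopped */
	size_t idx = 0;		/* position among *matching* entries only */
	size_t emitted = 0;

	assert(devs != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (filter_master && devs[i].master != filter_master)
			continue;	/* filtered out: idx is not advanced */
		if (idx < s_idx) {
			idx++;		/* dumped by an earlier call */
			continue;
		}
		if (emitted == budget)
			break;		/* out of room, resume here next time */
		out[emitted++] = devs[i].ifindex;
		idx++;
	}
	*cursor = idx;
	return emitted;
}
```

Calling dump_devs() repeatedly with the same filter and the saved cursor
visits each matching entry exactly once, whether or not filtered-out entries
sit between them.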




Re: [Patch net-next] net_sched: move the empty tp check from ->destroy() to ->delete()

2016-11-27 Thread John Fastabend
On 16-11-23 05:58 PM, Cong Wang wrote:
> Roi reported we could have a race condition where in ->classify() path
> we dereference tp->root and meanwhile a parallel ->destroy() makes it
> a NULL.
> 
> This is possible because ->destroy() could be called when deleting
> a filter to check if we are the last one in tp, this tp is still
> linked and visible at that time.
> 
> The root cause of this problem is the semantic of ->destroy(), it
> does two things (for non-force case):
> 
> 1) check if tp is empty
> 2) if tp is empty we could really destroy it
> 
> and its caller, if cares, needs to check its return value to see if
> it is really destroyed. Therefore we can't unlink tp unless we know
> it is empty.
> 
> As suggested by Daniel, we could actually move the test logic to ->delete()
> so that we can safely unlink tp after ->delete() tells us the last one is
> just deleted and before ->destroy().
> 
> What's more, even we unlink it before ->destroy(), it could still have
> readers since we don't wait for a grace period here, we should not modify
> tp->root in ->destroy() either.
> 
> Fixes: 1e052be69d04 ("net_sched: destroy proto tp when all filters are gone")
> Reported-by: Roi Dayan 
> Cc: Daniel Borkmann 
> Cc: John Fastabend 
> Signed-off-by: Cong Wang 
> ---

Hi Cong,

Thanks a lot for doing this. Can you rebase it on top of Daniel's patch
though,

 [PATCH net] net, sched: respect rcu grace period on cls destruction

And then push the NULL pointer work for the cls_fw and cls_route
classifiers into another patch.

Then I believe the last thing to make this correct is to convert the
call_rcu() paths to call_rcu_bh().

.John
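
As a rough illustration of the ordering discussed in the quoted commit message
(let ->delete() report that the last filter is gone, unlink the tp, and only
then destroy it), here is a single-threaded sketch. Everything in it is
hypothetical: struct sketch_tp, the 'last' out-parameter and the helper names
are assumptions made for illustration and are not the interfaces from the
patch, and the RCU grace period is only marked by a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical classifier instance. */
struct sketch_tp {
	int nfilters;	/* filters still attached */
	bool linked;	/* still visible to readers in the chain */
	bool destroyed;
};

/* Delete one filter and report whether it was the last one. */
static int sketch_delete(struct sketch_tp *tp, bool *last)
{
	assert(tp != NULL && last != NULL);

	if (tp->nfilters == 0)
		return -1;	/* nothing to delete */
	tp->nfilters--;
	*last = (tp->nfilters == 0);
	return 0;
}

/* Caller side: unlink before destroy, and only when the tp is empty. */
static int sketch_remove_filter(struct sketch_tp *tp)
{
	bool last = false;
	int err = sketch_delete(tp, &last);

	if (err)
		return err;
	if (last) {
		tp->linked = false;	/* readers can no longer find tp */
		/* real code would wait for an RCU grace period here */
		tp->destroyed = true;
	}
	return 0;
}
```

The point of the split is that the emptiness test happens while the tp is
still linked, and destruction happens only after it has been unlinked.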



Re: [net,v2] neigh: fix the loop index error in neigh dump

2016-11-27 Thread David Ahern
On 11/27/16 7:53 PM, 张胜举 wrote:
> 
> 
>> -----Original Message-----
>> From: David Ahern [mailto:d...@cumulusnetworks.com]
>> Sent: Monday, November 28, 2016 10:39 AM
>> To: 张胜举 ;
>> netdev@vger.kernel.org
>> Subject: Re: [net,v2] neigh: fix the loop index error in neigh dump
>>
>> On 11/27/16 7:34 PM, 张胜举 wrote:
>>>> -----Original Message-----
>>>> From: David Ahern [mailto:d...@cumulusnetworks.com]
>>>> Sent: Monday, November 28, 2016 10:10 AM
>>>> To: Zhang Shengju ;
>>>> netdev@vger.kernel.org
>>>> Subject: Re: [net,v2] neigh: fix the loop index error in neigh dump
>>>>
>>>> On 11/27/16 6:32 PM, Zhang Shengju wrote:
>>>>> Loop index in neigh dump function is not updated correctly under
>>>>> some circumstances, this patch will fix it.
>>>>
>>>> What's an example?
>>>
>>> If dev is filtered out, the original code goes to next loop without
>>> updating loop index 'idx'.
>>
>> And you have a use case with missing or redundant data? Or is your
>> comment based on a review of code only?
> It's on my code review. No use case currently,  this is uncommon to happen.
> 
> 
>>
>>>> You are completely rewriting the dump loops.
>>>
>>> I put 'idx++' into for loop,  so I replace 'goto' with 'continue'.
>>> The other change is style related.
>>
>> A "fixes" should not include 'style related' changes.
> Okay, I will send another version without style changes.
> 

Personally, I think you need to produce a use case that fails before sending 
another patch. I have not seen a problem with this code.



Re: [PATCH net] net, sched: respect rcu grace period on cls destruction

2016-11-27 Thread John Fastabend
On 16-11-26 04:18 PM, Daniel Borkmann wrote:
> Roi reported a crash in flower where tp->root was NULL in ->classify()
> callbacks. Reason is that in ->destroy() tp->root is set to NULL via
> RCU_INIT_POINTER(). It's problematic for some of the classifiers, because
> this doesn't respect RCU grace period for them, and as a result, still
> outstanding readers from tc_classify() will try to blindly dereference
> a NULL tp->root.
> 
> The tp->root object is strictly private to the classifier implementation
> and holds internal data the core such as tc_ctl_tfilter() doesn't know
> about. Within some classifiers, such as cls_bpf, cls_basic, etc, tp->root
> is only checked for NULL in ->get() callback, but nowhere else. This is
> misleading and seemed to be copied from old classifier code that was not
> cleaned up properly. For example, d3fa76ee6b4a ("[NET_SCHED]: cls_basic:
> fix NULL pointer dereference") moved tp->root initialization into ->init()
> routine, where before it was part of ->change(), so ->get() had to deal
> with tp->root being NULL back then, so that was indeed a valid case, after
> d3fa76ee6b4a, not really anymore. We used to set tp->root to NULL long
> ago in ->destroy(), see 47a1a1d4be29 ("pkt_sched: remove unnecessary xchg()
> in packet classifiers"); but the NULLifying was reintroduced with the
> RCUification, but it's not correct for every classifier implementation.
> 
> In the cases that are fixed here with one exception of cls_cgroup, tp->root
> object is allocated and initialized inside ->init() callback, which is always
> performed at a point in time after we allocate a new tp, which means tp and
> thus tp->root was not globally visible in the tp chain yet (see 
> tc_ctl_tfilter()).
> Also, on destruction tp->root is strictly kfree_rcu()'ed in ->destroy()
> handler, same for the tp which is kfree_rcu()'ed right when we return
> from ->destroy() in tcf_destroy(). This means, the head object's lifetime
> for such classifiers is always tied to the tp lifetime. The RCU callback
> invocation for the two kfree_rcu() could be out of order, but that's fine
> since both are independent.
> 
> Dropping the RCU_INIT_POINTER(tp->root, NULL) for these classifiers here
> means that 1) we don't need a useless NULL check in fast-path and, 2) that
> outstanding readers of that tp in tc_classify() can still execute under
> respect with RCU grace period as it is actually expected.
> 
> Things that haven't been touched here: cls_fw and cls_route. They each
> handle tp->root being NULL in ->classify() path for historic reasons, so
> their ->destroy() implementation can stay as is. If someone actually
> cares, they could get cleaned up at some point to avoid the test in fast
> path. cls_u32 doesn't set tp->root to NULL. For cls_rsvp, I just added a
> !head should anyone actually be using/testing it, so it at least aligns with
> cls_fw and cls_route. For cls_flower we additionally need to defer rhashtable
> destruction (to a sleepable context) after RCU grace period as concurrent
> readers might still access it. (Note that in this case we need to hold module
> reference to keep work callback address intact, since we only wait on module
> unload for all call_rcu()s to finish.)
> 
> This fixes one race to bring RCU grace period guarantees back. Next step
> as worked on by Cong however is to fix 1e052be69d04 ("net_sched: destroy
> proto tp when all filters are gone") to get the order of unlinking the tp
> in tc_ctl_tfilter() for the RTM_DELTFILTER case right by moving
> RCU_INIT_POINTER() before tcf_destroy() and let the notification for
> removal be done through the prior ->delete() callback. Both are independent
> issues. Once we have that right, we can then clean tp->root up for a number
> of classifiers by not making them RCU pointers, which requires a new callback
> (->uninit) that is triggered from tp's RCU callback, where we just kfree()
> tp->root from there.

Thanks looks good to me and appreciate the detailed commit message.

Acked-by: John Fastabend 

> 
> Fixes: 1f947bf151e9 ("net: sched: rcu'ify cls_bpf")
> Fixes: 9888faefe132 ("net: sched: cls_basic use RCU")
> Fixes: 70da9f0bf999 ("net: sched: cls_flow use RCU")
> Fixes: 77b9900ef53a ("tc: introduce Flower classifier")
> Fixes: bf3994d2ed31 ("net/sched: introduce Match-all classifier")
> Fixes: 952313bd6258 ("net: sched: cls_cgroup use RCU")
> Reported-by: Roi Dayan 
> Signed-off-by: Daniel Borkmann 
> Cc: Cong Wang 
> Cc: John Fastabend 
> Cc: Roi Dayan 
> Cc: Jiri Pirko 
> ---



RE: [net,v2] neigh: fix the loop index error in neigh dump

2016-11-27 Thread 张胜举


> -----Original Message-----
> From: David Ahern [mailto:d...@cumulusnetworks.com]
> Sent: Monday, November 28, 2016 10:39 AM
> To: 张胜举 ;
> netdev@vger.kernel.org
> Subject: Re: [net,v2] neigh: fix the loop index error in neigh dump
> 
> On 11/27/16 7:34 PM, 张胜举 wrote:
> >> -----Original Message-----
> >> From: David Ahern [mailto:d...@cumulusnetworks.com]
> >> Sent: Monday, November 28, 2016 10:10 AM
> >> To: Zhang Shengju ;
> >> netdev@vger.kernel.org
> >> Subject: Re: [net,v2] neigh: fix the loop index error in neigh dump
> >>
> >> On 11/27/16 6:32 PM, Zhang Shengju wrote:
> >>> Loop index in neigh dump function is not updated correctly under
> >>> some circumstances, this patch will fix it.
> >>
> >> What's an example?
> >
> > If dev is filtered out, the original code goes to next loop without
> > updating loop index 'idx'.
> 
> And you have a use case with missing or redundant data? Or is your
> comment based on a review of code only?
It's based on code review only. There is no use case currently; this rarely happens.


> 
> >> You are completely rewriting the dump loops.
> >
> > I put 'idx++' into for loop,  so I replace 'goto' with 'continue'.
> > The other change is style related.
> 
> A "fixes" should not include 'style related' changes.
Okay, I will send another version without style changes.





[PATCH] net: handle no dst on skb in icmp6_send

2016-11-27 Thread David Ahern
Andrey reported the following while fuzzing the kernel with syzkaller:

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault:  [#1] SMP KASAN
Modules linked in:
CPU: 0 PID: 3859 Comm: a.out Not tainted 4.9.0-rc6+ #429
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: 8800666d4200 task.stack: 880067348000
RIP: 0010:[]  []
icmp6_send+0x5fc/0x1e30 net/ipv6/icmp.c:451
RSP: 0018:88006734f2c0  EFLAGS: 00010206
RAX: 8800666d4200 RBX:  RCX: 
RDX:  RSI: dc00 RDI: 0018
RBP: 88006734f630 R08: 880064138418 R09: 0003
R10: dc00 R11: 0005 R12: 
R13: 84e7e200 R14: 880064138484 R15: 8800641383c0
FS:  7fb3887a07c0() GS:88006cc0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 2000 CR3: 6b04 CR4: 06f0
Stack:
 8800666d4200 8800666d49f8 8800666d4200 84c02460
 8800666d4a1a 11000ccdaa2f 88006734f498 0046
 88006734f440 832f4269 880064ba7456 
Call Trace:
 [] icmpv6_param_prob+0x2c/0x40 net/ipv6/icmp.c:557
 [< inline >] ip6_tlvopt_unknown net/ipv6/exthdrs.c:88
 [] ip6_parse_tlv+0x555/0x670 net/ipv6/exthdrs.c:157
 [] ipv6_parse_hopopts+0x199/0x460 net/ipv6/exthdrs.c:663
 [] ipv6_rcv+0xfa3/0x1dc0 net/ipv6/ip6_input.c:191
 ...

icmp6_send / icmpv6_send is invoked for both rx and tx paths. In both
cases the dst->dev should be preferred for determining the L3 domain
if the dst has been set on the skb. Fallback to the skb->dev if it has
not. This covers the case reported here where icmp6_send is invoked on
Rx before the route lookup.

Fixes: 5d41ce29e ("net: icmp6_send should use dst dev to determine L3 domain")
Reported-by: Andrey Konovalov 
Signed-off-by: David Ahern 
---
 net/ipv6/icmp.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c
index 7370ad2e693a..2772004ba5a1 100644
--- a/net/ipv6/icmp.c
+++ b/net/ipv6/icmp.c
@@ -447,8 +447,10 @@ static void icmp6_send(struct sk_buff *skb, u8 type, u8 code, __u32 info,
 
if (__ipv6_addr_needs_scope_id(addr_type))
iif = skb->dev->ifindex;
-   else
-   iif = l3mdev_master_ifindex(skb_dst(skb)->dev);
+   else {
+   dst = skb_dst(skb);
+   iif = l3mdev_master_ifindex(dst ? dst->dev : skb->dev);
+   }
 
/*
 *  Must not send error if the source does not uniquely
-- 
2.1.4
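
For readers who want to try the fallback in isolation, here is a small
userspace sketch of the selection logic from the patch above. The struct and
function names (mock_netdev, mock_dst, mock_skb, mock_master_ifindex,
pick_iif) are hypothetical mocks, not kernel types, and mock_master_ifindex()
is only a stand-in for l3mdev_master_ifindex(): it returns a stored value
rather than doing a real L3 master lookup.

```c
#include <assert.h>
#include <stddef.h>

/* Mock types; only the fields needed for the sketch. */
struct mock_netdev {
	int ifindex;
	int master_ifindex;	/* assumed result of the L3 master lookup */
};

struct mock_dst {
	struct mock_netdev *dev;
};

struct mock_skb {
	struct mock_netdev *dev;	/* device the packet arrived on */
	struct mock_dst *dst;		/* NULL before the route lookup */
};

/* Stand-in for l3mdev_master_ifindex(); just returns the stored value. */
static int mock_master_ifindex(const struct mock_netdev *dev)
{
	return dev->master_ifindex;
}

/* Prefer the dst device when a dst is set, else fall back to skb->dev. */
static int pick_iif(const struct mock_skb *skb)
{
	const struct mock_dst *dst;

	assert(skb != NULL && skb->dev != NULL);

	dst = skb->dst;
	return mock_master_ifindex(dst ? dst->dev : skb->dev);
}
```

With no dst attached (the Rx case before the route lookup), pick_iif() uses
skb->dev instead of dereferencing a NULL pointer; once a dst is set, the dst
device takes precedence.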



Re: [Patch net-next] net_sched: move the empty tp check from ->destroy() to ->delete()

2016-11-27 Thread John Fastabend
On 16-11-27 06:26 PM, John Fastabend wrote:
> On 16-11-26 10:29 PM, Roi Dayan wrote:
>>
>>
>> On 27/11/2016 06:47, Roi Dayan wrote:
>>>
>>>
>>> On 27/11/2016 02:33, Daniel Borkmann wrote:
>>>> On 11/26/2016 12:09 PM, Daniel Borkmann wrote:
>>>>> On 11/26/2016 07:46 AM, Cong Wang wrote:
>>>>>> On Thu, Nov 24, 2016 at 7:20 AM, Daniel Borkmann
>>>>>>  wrote:
>>>> [...]
>>>>>>> Ok, strange, qdisc_destroy() calls into ops->destroy(), where ingress
>>>>>>> drops its entire chain via tcf_destroy_chain(), so that will be NULL
>>>>>>> eventually. The tps are freed by call_rcu() as well as qdisc itself
>>>>>>> later on via qdisc_rcu_free(), where it frees per-cpu bstats as well.
>>>>>>> Outstanding readers should either bail out due to if (!cl) or can
>>>>>>> still
>>>>>>> process the chain until read section ends, but during that time,
>>>>>>> cl->q
>>>>>>> resp. bstats should be good. Do you happen to know what's at address
>>>>>>> 880a68b04028? I was wondering wrt call_rcu() vs call_rcu_bh(),
>>>>>>> but
>>>>>>> at least on ingress (netif_receive_skb_internal()) we hold
>>>>>>> rcu_read_lock()
>>>>>>> here. The KASAN report is reliably happening at this location, right?
>>>>>>
>>>>>> I am confused as well, I don't see how it could be related to my
>>>>>> patch yet.
>>>>>> I will take a deep look in the weekend.
>>>
>>>
>>>
>>> Hi Cong,
>>>
>>> When reported the new trace I didn't mean it's related to your patch,
>>> I just wanted to point it out it exposed something. I should have been
>>> clear about it.
>>>
>>>
>>>>>
>>>>> Ok, I'm currently on the run. Got too late yesterday night, but I'll
>>>>> write what I found in the evening today, not related to ingress though.
>>>>
>>>> Just pushed out my analysis to netdev under "[PATCH net] net, sched:
>>>> respect
>>>> rcu grace period on cls destruction". My conclusion is that both
>>>> issues are
>>>> actually separate, and that one is small enough where we could route
>>>> it via
>>>> net actually. Perhaps this at the same time shrinks your "[PATCH
>>>> net-next]
>>>> net_sched: move the empty tp check from ->destroy() to ->delete()" to a
>>>> reasonable size that it's suitable to net as well. Your
>>>> ->delete()/->destroy()
>>>> one is definitely needed, too. The tp->root one is independent of
>>>> ->delete()/
>>>> ->destroy() as they are different races and tp->root could also
>>>> happen when
>>>> you just destroy the whole tp directly. I think that seems like a
>>>> good path
>>>> forward to me.
>>>>
>>>> Thanks,
>>>> Daniel
>>>
>>>
>>>
>>> Hi Daniel,
>>>
>>> As for the tainted kernel. I was in old (week or two) net-next tree
>>> and only cherry-picked from latest net-next related patches to
>>> Mellanox HCA, cls_api, cls_flower, devlink. so those are the tainted
>>> modules.
>>> I have the issue reproducing in that tree so wanted it to check it
>>> with Cong's patch instead of latest net-next.
>>> I'll try running reproducing the issue with your new patch and later
>>> try latest net-next as well.
>>>
>>> Thanks,
>>> Roi
>>>
>>
>> Hi,
>>
>> I tested "[PATCH net] net, sched: respect rcu grace period on cls
>> destruction" and could not reproduce my original issue.
> 
> Hi Roi,
> 
> Just so I'm 100% clear. No issue with just the above "respect rcu grace
> period on cls destruction" per above statement.
> 
>> I rebased "[Patch net-next] net_sched: move the empty tp check from
>> ->destroy() to ->delete()" over to test it in the same tree and got into
>> a new trace in fl_delete.
> 
> In this case did you test with "net_sched: move the empty tp check from
> ->destroy() to ->delete()" _only_ or did this include both patches when
> you see the error below.
> 
> From my inspection we really need both patches to get correct behavior.
> 
> Thanks!
> John

Ah dang nevermind I just read both patches in detail and applying them
both at the same time is nonsense. Let me reply with comments directly
to the patches.

Thanks, and sorry for the noise.

> 
>>
>> [35659.012123] BUG: KASAN: wild-memory-access on address 1803ca31
>> [35659.020042] Write of size 1 by task ovs-vswitchd/20135
>> [35659.025878] CPU: 19 PID: 20135 Comm: ovs-vswitchd Tainted: G           O    4.9.0-rc3+ #18
>> [35659.035948] Hardware name: HP ProLiant DL380p Gen8, BIOS P70 07/01/2015
>> [35659.043730] Call Trace:
>> [35659.046619]  [] dump_stack+0x63/0x81
>> [35659.052456]  [] kasan_report_error+0x408/0x4e0
>> [35659.059402]  [] kasan_report+0x58/0x60
>> [35659.065428]  [] ? call_rcu_sched+0x1d/0x20
>> [35659.072119]  [] ? fl_destroy_filter+0x21/0x30 [cls_flower]
>> [35659.080217]  [] ? fl_delete+0x1df/0x2e0 [cls_flower]
>> [35659.087580]  [] __asan_store1+0x4a/0x50
>> [35659.093697]  [] fl_delete+0x1df/0x2e0 [cls_flower]
>> [35659.100870]  [] tc_ctl_tfilter+0x10da/0x1b90
>>
>>
>> 0x1d02 is in fl_delete (net/sched/cls_flower.c:805).
>> 800 struct cls_fl_filter *f = (struct cls_fl_filter *) arg;
>> 801
>> 802 rhashtable_remove_fast(&head->ht, 

[PATCH net-next v3 2/4] Documentation: net: phy: Add a paragraph about pause frames/flow control

2016-11-27 Thread Florian Fainelli
Describe that the Ethernet MAC controller is ultimately responsible for
dealing with proper pause frames/flow control advertisement and
enabling, and that it is therefore allowed to have it change
phydev->supported/advertising with SUPPORTED_Pause and
SUPPORTED_AsymPause.

Reviewed-by: Martin Blumenstingl 
Signed-off-by: Florian Fainelli 
---
 Documentation/networking/phy.txt | 18 --
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/Documentation/networking/phy.txt b/Documentation/networking/phy.txt
index 4b25c0f24201..9a42a9414cea 100644
--- a/Documentation/networking/phy.txt
+++ b/Documentation/networking/phy.txt
@@ -127,8 +127,9 @@ Letting the PHY Abstraction Layer do Everything
  values pruned from them which don't make sense for your controller (a 10/100
  controller may be connected to a gigabit capable PHY, so you would need to
  mask off SUPPORTED_1000baseT*).  See include/linux/ethtool.h for definitions
- for these bitfields. Note that you should not SET any bits, or the PHY may
- get put into an unsupported state.
+ for these bitfields. Note that you should not SET any bits, except the
+ SUPPORTED_Pause and SUPPORTED_AsymPause bits (see below), or the PHY may get
+ put into an unsupported state.
 
  Lastly, once the controller is ready to handle network traffic, you call
  phy_start(phydev).  This tells the PAL that you are ready, and configures the
@@ -139,6 +140,19 @@ Letting the PHY Abstraction Layer do Everything
  When you want to disconnect from the network (even if just briefly), you call
  phy_stop(phydev).
 
+Pause frames / flow control
+
+ The PHY does not participate directly in flow control/pause frames except by
+ making sure that the SUPPORTED_Pause and SUPPORTED_AsymPause bits are set in
+ MII_ADVERTISE to indicate towards the link partner that the Ethernet MAC
+ controller supports such a thing. Since flow control/pause frames generation
+ involves the Ethernet MAC driver, it is recommended that this driver takes care
+ of properly indicating advertisement and support for such features by setting
+ the SUPPORTED_Pause and SUPPORTED_AsymPause bits accordingly. This can be done
+ either before or after phy_connect() and/or as a result of implementing the
+ ethtool::set_pauseparam feature.
+
+
 Keeping Close Tabs on the PAL
 
  It is possible that the PAL's built-in state machine needs a little help to
-- 
2.9.3
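
The paragraph added by this patch can be illustrated with a small user-space sketch of what a MAC driver might do to the PHY feature masks. Everything prefixed with sketch_/SKETCH_ is hypothetical and only stands in for phydev->supported and phydev->advertising; the bit positions are assumed to match the legacy masks in include/uapi/linux/ethtool.h, which spells the asymmetric bit SUPPORTED_Asym_Pause.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed to mirror the legacy SUPPORTED_* bit masks from
 * include/uapi/linux/ethtool.h; redefined under a SKETCH_ prefix
 * only so that this example builds outside the kernel.
 */
#define SKETCH_SUPPORTED_1000baseT_Half (1U << 4)
#define SKETCH_SUPPORTED_1000baseT_Full (1U << 5)
#define SKETCH_SUPPORTED_Pause          (1U << 13)
#define SKETCH_SUPPORTED_Asym_Pause     (1U << 14)

/* Hypothetical stand-in for the two struct phy_device fields discussed. */
struct sketch_phydev {
	uint32_t supported;
	uint32_t advertising;
};

/*
 * Prune what the MAC cannot do, then set only the two pause bits,
 * which is the one case where setting bits is described as allowed.
 */
static void sketch_mac_fixup_features(struct sketch_phydev *phydev,
				      bool mac_is_gigabit,
				      bool mac_has_flow_control)
{
	if (!mac_is_gigabit)
		phydev->supported &= ~(SKETCH_SUPPORTED_1000baseT_Half |
				       SKETCH_SUPPORTED_1000baseT_Full);

	if (mac_has_flow_control)
		phydev->supported |= SKETCH_SUPPORTED_Pause |
				     SKETCH_SUPPORTED_Asym_Pause;

	/* Advertise exactly what ended up being supported. */
	phydev->advertising = phydev->supported;
}
```

A real driver would make this adjustment around phy_connect() or from its ethtool set_pauseparam handler, as the text says; the sketch only shows the mask manipulation.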



[PATCH net-next v3 3/4] Documentation: net: phy: Add blurb about RGMII

2016-11-27 Thread Florian Fainelli
RGMII is a recurring source of pain for people with Gigabit Ethernet
hardware since it may require PHY driver and MAC driver level
configuration hints. Document what are the expectations from PHYLIB and
what options exist.

Reviewed-by: Martin Blumenstingl 
Signed-off-by: Florian Fainelli 
---
 Documentation/networking/phy.txt | 77 
 1 file changed, 77 insertions(+)

diff --git a/Documentation/networking/phy.txt b/Documentation/networking/phy.txt
index 9a42a9414cea..c7ba84b5d912 100644
--- a/Documentation/networking/phy.txt
+++ b/Documentation/networking/phy.txt
@@ -65,6 +65,83 @@ The MDIO bus
  drivers/net/ethernet/freescale/fsl_pq_mdio.c and an associated DTS file
  for one of the users. (e.g. "git grep fsl,.*-mdio arch/powerpc/boot/dts/")
 
+(RG)MII/electrical interface considerations
+
+ The Reduced Gigabit Medium Independent Interface (RGMII) is a 12-pin
+ electrical signal interface using a synchronous 125MHz clock signal and several
+ data lines. Due to this design decision, a 1.5ns to 2ns delay must be added
+ between the clock line (RXC or TXC) and the data lines to let the PHY (clock
+ sink) have enough setup and hold times to sample the data lines correctly. The
+ PHY library offers different types of PHY_INTERFACE_MODE_RGMII* values to let
+ the PHY driver and optionally the MAC driver, implement the required delay. The
+ values of phy_interface_t must be understood from the perspective of the PHY
+ device itself, leading to the following:
+
+ * PHY_INTERFACE_MODE_RGMII: the PHY is not responsible for inserting any
+   internal delay by itself, it assumes that either the Ethernet MAC (if capable)
+   or the PCB traces insert the correct 1.5-2ns delay
+
+ * PHY_INTERFACE_MODE_RGMII_TXID: the PHY should insert an internal delay
+   for the transmit data lines (TXD[3:0]) processed by the PHY device
+
+ * PHY_INTERFACE_MODE_RGMII_RXID: the PHY should insert an internal delay
+   for the receive data lines (RXD[3:0]) processed by the PHY device
+
+ * PHY_INTERFACE_MODE_RGMII_ID: the PHY should insert internal delays for
+   both transmit AND receive data lines from/to the PHY device
+
+ Whenever possible, use the PHY side RGMII delay for these reasons:
+
+ * PHY devices may offer sub-nanosecond granularity in how they allow a
+   receiver/transmitter side delay (e.g: 0.5, 1.0, 1.5ns) to be specified. Such
+   precision may be required to account for differences in PCB trace lengths
+
+ * PHY devices are typically qualified for a large range of applications
+   (industrial, medical, automotive...), and they provide a constant and
+   reliable delay across temperature/pressure/voltage ranges
+
+ * PHY device drivers in PHYLIB being reusable by nature, being able to
+   configure correctly a specified delay enables more designs with similar delay
+   requirements to operate correctly
+
+ For cases where the PHY is not capable of providing this delay, but the
+ Ethernet MAC driver is capable of doing so, the correct phy_interface_t value
+ should be PHY_INTERFACE_MODE_RGMII, and the Ethernet MAC driver should be
+ configured correctly in order to provide the required transmit and/or receive
+ side delay from the perspective of the PHY device. Conversely, if the Ethernet
+ MAC driver looks at the phy_interface_t value, for any other mode but
+ PHY_INTERFACE_MODE_RGMII, it should make sure that the MAC-level delays are
+ disabled.
+
+ In case neither the Ethernet MAC, nor the PHY are capable of providing the
+ required delays, as defined per the RGMII standard, several options may be
+ available:
+
+ * Some SoCs may offer a pin pad/mux/controller capable of configuring a given
+   set of pins' strength, delays, and voltage; and it may be a suitable
+   option to insert the expected 2ns RGMII delay.
+
+ * Modifying the PCB design to include a fixed delay (e.g: using a specifically
+   designed serpentine), which may not require software configuration at all.
+
+Common problems with RGMII delay mismatch
+
+ When there is a RGMII delay mismatch between the Ethernet MAC and the PHY, this
+ will most likely result in the clock and data line signals being unstable when
+ the PHY or MAC take a snapshot of these signals to translate them into logical
+ 1 or 0 states and reconstruct the data being transmitted/received. Typical
+ symptoms include:
+
+ * Transmission/reception partially works, and there is frequent or occasional
+   packet loss observed
+
+ * Ethernet MAC may report some or all packets ingressing with a FCS/CRC error,
+   or just discard them all
+
+ * Switching to lower speeds such as 10/100Mbits/sec makes the problem go away
+   (since there is enough setup/hold time in that case)
+
+
 Connecting to a PHY
 
  Sometime during startup, the network driver needs to establish a connection
-- 
2.9.3
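
The division of labour described in this patch can be sketched as a small decision table. The enum below is a hypothetical user-space mirror of the four PHY_INTERFACE_MODE_RGMII* values; the helper names and the mac_capable flag are assumptions made for illustration and are not PHYLIB API.

```c
#include <stdbool.h>

/* Hypothetical mirror of the four RGMII phy_interface_t values. */
enum sketch_rgmii_mode {
	SKETCH_RGMII,		/* PHY inserts no delay */
	SKETCH_RGMII_ID,	/* PHY delays both TX and RX */
	SKETCH_RGMII_RXID,	/* PHY delays RX only */
	SKETCH_RGMII_TXID,	/* PHY delays TX only */
};

struct sketch_delays {
	bool tx;
	bool rx;
};

/* Delays the PHY is expected to insert, from the PHY's perspective. */
static struct sketch_delays sketch_phy_delays(enum sketch_rgmii_mode mode)
{
	struct sketch_delays d = { false, false };

	d.tx = (mode == SKETCH_RGMII_ID || mode == SKETCH_RGMII_TXID);
	d.rx = (mode == SKETCH_RGMII_ID || mode == SKETCH_RGMII_RXID);
	return d;
}

/*
 * MAC-level delays: enabled only for plain RGMII and only when the MAC
 * can provide them; for every other mode they are kept disabled.
 */
static struct sketch_delays sketch_mac_delays(enum sketch_rgmii_mode mode,
					      bool mac_capable)
{
	struct sketch_delays d = { false, false };

	if (mode == SKETCH_RGMII && mac_capable) {
		d.tx = true;
		d.rx = true;
	}
	return d;
}
```

If neither helper reports a delay for a direction, the delay has to come from the pin controller or the PCB traces, which is the last case the text covers.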



[PATCH net-next v3 4/4] Documentation: net: phy: Add links to several standards documents

2016-11-27 Thread Florian Fainelli
Add links to the IEEE 802.3-2008 document, and the RGMII v1.3 and v2.0
revisions of the standard.

Reviewed-by: Martin Blumenstingl 
Signed-off-by: Florian Fainelli 
---
 Documentation/networking/phy.txt | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/Documentation/networking/phy.txt b/Documentation/networking/phy.txt
index c7ba84b5d912..e017d933d530 100644
--- a/Documentation/networking/phy.txt
+++ b/Documentation/networking/phy.txt
@@ -407,3 +407,13 @@ Board Fixups
  The stubs set one of the two matching criteria, and set the other one to
  match anything.
 
+Standards
+
+ IEEE Standard 802.3: CSMA/CD Access Method and Physical Layer Specifications, Section Two:
+ http://standards.ieee.org/getieee802/download/802.3-2008_section2.pdf
+
+ RGMII v1.3:
+ http://web.archive.org/web/20160303212629/http://www.hp.com/rnd/pdfs/RGMIIv1_3.pdf
+
+ RGMII v2.0:
+ http://web.archive.org/web/20160303171328/http://www.hp.com/rnd/pdfs/RGMIIv2_0_final_hp.pdf
-- 
2.9.3



[PATCH net-next v3 1/4] Documentation: net: phy: remove description of function pointers

2016-11-27 Thread Florian Fainelli
Remove the function pointers documentation which duplicates information
found in include/linux/phy.h. Maintaining documentation in two
different locations just does not work, and the code is less likely to
be outdated.

Reviewed-by: Martin Blumenstingl 
Signed-off-by: Florian Fainelli 
---
 Documentation/networking/phy.txt | 35 ++-
 1 file changed, 2 insertions(+), 33 deletions(-)

diff --git a/Documentation/networking/phy.txt b/Documentation/networking/phy.txt
index 7ab9404a8412..4b25c0f24201 100644
--- a/Documentation/networking/phy.txt
+++ b/Documentation/networking/phy.txt
@@ -251,39 +251,8 @@ Writing a PHY driver
  PHY_BASIC_FEATURES, but you can look in include/mii.h for other
  features.
 
- Each driver consists of a number of function pointers:
-
-   soft_reset: perform a PHY software reset
-   config_init: configures PHY into a sane state after a reset.
- For instance, a Davicom PHY requires descrambling disabled.
-   probe: Allocate phy->priv, optionally refuse to bind.
-   PHY may not have been reset or had fixups run yet.
-   suspend/resume: power management
-   config_aneg: Changes the speed/duplex/negotiation settings
-   aneg_done: Determines the auto-negotiation result
-   read_status: Reads the current speed/duplex/negotiation settings
-   ack_interrupt: Clear a pending interrupt
-   did_interrupt: Checks if the PHY generated an interrupt
-   config_intr: Enable or disable interrupts
-   remove: Does any driver take-down
-   ts_info: Queries about the HW timestamping status
-   match_phy_device: used for Clause 45 capable PHYs to match devices
-   in package and ensure they are compatible
-   hwtstamp: Set the PHY HW timestamping configuration
-   rxtstamp: Requests a receive timestamp at the PHY level for a 'skb'
-   txtsamp: Requests a transmit timestamp at the PHY level for a 'skb'
-   set_wol: Enable Wake-on-LAN at the PHY level
-   get_wol: Get the Wake-on-LAN status at the PHY level
-   link_change_notify: called to inform the core is about to change the
-   link state, can be used to work around bogus PHY between state changes
-   read_mmd_indirect: Read PHY MMD indirect register
-   write_mmd_indirect: Write PHY MMD indirect register
-   module_info: Get the size and type of an EEPROM contained in an plug-in
-   module
-   module_eeprom: Get EEPROM information of a plug-in module
-   get_sset_count: Get number of strings sets that get_strings will count
-   get_strings: Get strings from requested objects (statistics)
-   get_stats: Get the extended statistics from the PHY device
+ Each driver consists of a number of function pointers, documented
+ in include/linux/phy.h under the phy_driver structure.
 
  Of these, only config_aneg and read_status are required to be
  assigned by the driver code.  The rest are optional.  Also, it is
-- 
2.9.3
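
As an illustration of the "only config_aneg and read_status are required" rule kept by this patch, here is a reduced, hypothetical mirror of a driver structure. The real definition lives in include/linux/phy.h under struct phy_driver; the sketch_ types, the callbacks and the validity check below are assumptions made for the example, not the kernel's code.

```c
#include <stddef.h>

struct sketch_phydev;	/* opaque stand-in for struct phy_device */

/* Hypothetical subset of the callbacks a PHY driver may provide. */
struct sketch_phy_driver {
	const char *name;
	int (*config_init)(struct sketch_phydev *dev);	/* optional */
	int (*config_aneg)(struct sketch_phydev *dev);	/* required */
	int (*read_status)(struct sketch_phydev *dev);	/* required */
};

static int sketch_config_aneg(struct sketch_phydev *dev)
{
	(void)dev;
	return 0;
}

static int sketch_read_status(struct sketch_phydev *dev)
{
	(void)dev;
	return 0;
}

/* A minimal driver: only the two required callbacks are assigned. */
static const struct sketch_phy_driver sketch_minimal_driver = {
	.name		= "sketch-minimal",
	.config_aneg	= sketch_config_aneg,
	.read_status	= sketch_read_status,
};

/* Returns 1 when both required callbacks are present, 0 otherwise. */
static int sketch_driver_is_valid(const struct sketch_phy_driver *drv)
{
	return drv != NULL && drv->config_aneg != NULL &&
	       drv->read_status != NULL;
}
```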



[PATCH net-next v3 0/4] Documentation: net: phy: Improve documentation

2016-11-27 Thread Florian Fainelli
Hi all,

This patch series addresses discussions and feedback recently received on the
mailing list in the areas of flow control/pause frames and the interpretation
of phy_interface_t, and finally adds some links to useful standards documents.

Changes in v3:

- add Timur's feedback into patch 3

Changes in v2:

- clarify a few things in the RGMII section, add a paragraph about common issues
  with RGMII delay mismatches

Florian Fainelli (4):
  Documentation: net: phy: remove description of function pointers
  Documentation: net: phy: Add a paragraph about pause frames/flow
control
  Documentation: net: phy: Add blurb about RGMII
  Documentation: net: phy: Add links to several standards documents

 Documentation/networking/phy.txt | 140 +--
 1 file changed, 105 insertions(+), 35 deletions(-)

-- 
2.9.3



Re: [net,v2] neigh: fix the loop index error in neigh dump

2016-11-27 Thread David Ahern
On 11/27/16 7:34 PM, 张胜举 wrote:
>> -Original Message-
>> From: David Ahern [mailto:d...@cumulusnetworks.com]
>> Sent: Monday, November 28, 2016 10:10 AM
>> To: Zhang Shengju ;
>> netdev@vger.kernel.org
>> Subject: Re: [net,v2] neigh: fix the loop index error in neigh dump
>>
>> On 11/27/16 6:32 PM, Zhang Shengju wrote:
>>> Loop index in neigh dump function is not updated correctly under some
>>> circumstances, this patch will fix it.
>>
>> What's an example?
> 
> If dev is filtered out, the original code goes to next loop without updating
> loop index 'idx'.

And you have a use case with missing or redundant data? Or is your comment 
based on a review of code only?


>> You are completely rewriting the dump loops.
> 
> I put 'idx++' into for loop,  so I replace 'goto' with 'continue'.  The
> other change is style related. 

A "Fixes" patch should not include style-related changes.


RE: [net,v2] neigh: fix the loop index error in neigh dump

2016-11-27 Thread 张胜举
> -Original Message-
> From: David Ahern [mailto:d...@cumulusnetworks.com]
> Sent: Monday, November 28, 2016 10:10 AM
> To: Zhang Shengju ;
> netdev@vger.kernel.org
> Subject: Re: [net,v2] neigh: fix the loop index error in neigh dump
> 
> On 11/27/16 6:32 PM, Zhang Shengju wrote:
> > Loop index in neigh dump function is not updated correctly under some
> > circumstances, this patch will fix it.
> 
> What's an example?

If dev is filtered out, the original code goes to next loop without updating
loop index 'idx'.

> 
> >
> > Fixes: 16660f0bd9 ("net: Add support for filtering neigh dump by
> > device index")
> > Fixes: 21fdd092ac ("net: Add support for filtering neigh dump by
> > master device")
> >
> > Signed-off-by: Zhang Shengju 
> > ---
> >  net/core/neighbour.c | 39 ++-
> >  1 file changed, 18 insertions(+), 21 deletions(-)
> >
> > diff --git a/net/core/neighbour.c b/net/core/neighbour.c
> > index 2ae929f..ce32e9c 100644
> > --- a/net/core/neighbour.c
> > +++ b/net/core/neighbour.c
> > @@ -2256,6 +2256,16 @@ static bool neigh_ifindex_filtered(struct net_device *dev, int filter_idx)
> > return false;
> >  }
> >
> > +static bool neigh_dump_filtered(struct net_device *dev, int filter_idx,
> > +   int filter_master_idx)
> > +{
> > +   if (neigh_ifindex_filtered(dev, filter_idx) ||
> > +   neigh_master_filtered(dev, filter_master_idx))
> > +   return true;
> > +
> > +   return false;
> > +}
> > +
> >  static int neigh_dump_table(struct neigh_table *tbl, struct sk_buff *skb,
> > struct netlink_callback *cb)
> >  {
> > @@ -2285,20 +2295,15 @@ static int neigh_dump_table(struct neigh_table *tbl, struct sk_buff *skb,
> > rcu_read_lock_bh();
> > nht = rcu_dereference_bh(tbl->nht);
> >
> > -   for (h = s_h; h < (1 << nht->hash_shift); h++) {
> > -   if (h > s_h)
> > -   s_idx = 0;
> > +   for (h = s_h; h < (1 << nht->hash_shift); h++, s_idx = 0) {
> > for (n = rcu_dereference_bh(nht->hash_buckets[h]), idx = 0;
> >  n != NULL;
> > -n = rcu_dereference_bh(n->next)) {
> > -   if (!net_eq(dev_net(n->dev), net))
> > -   continue;
> > -   if (neigh_ifindex_filtered(n->dev, filter_idx))
> > +n = rcu_dereference_bh(n->next), idx++) {
> > +   if (idx < s_idx || !net_eq(dev_net(n->dev), net))
> > continue;
> > -   if (neigh_master_filtered(n->dev, filter_master_idx))
> > +   if (neigh_dump_filtered(n->dev, filter_idx,
> > +   filter_master_idx))
> > continue;
> > -   if (idx < s_idx)
> > -   goto next;
> > if (neigh_fill_info(skb, n, NETLINK_CB(cb->skb).portid,
> > cb->nlh->nlmsg_seq,
> > RTM_NEWNEIGH,
> > @@ -2306,8 +2311,6 @@ static int neigh_dump_table(struct neigh_table *tbl, struct sk_buff *skb,
> > rc = -1;
> > goto out;
> > }
> > -next:
> > -   idx++;
> > }
> > }
> > rc = skb->len;
> > @@ -2328,14 +2331,10 @@ static int pneigh_dump_table(struct neigh_table *tbl, struct sk_buff *skb,
> >
> > read_lock_bh(&tbl->lock);
> >
> > -   for (h = s_h; h <= PNEIGH_HASHMASK; h++) {
> > -   if (h > s_h)
> > -   s_idx = 0;
> > -   for (n = tbl->phash_buckets[h], idx = 0; n; n = n->next) {
> > -   if (pneigh_net(n) != net)
> > +   for (h = s_h; h <= PNEIGH_HASHMASK; h++, s_idx = 0) {
> > +   for (n = tbl->phash_buckets[h], idx = 0; n; n = n->next, idx++) {
> > +   if (idx < s_idx || pneigh_net(n) != net)
> > continue;
> > -   if (idx < s_idx)
> > -   goto next;
> > if (pneigh_fill_info(skb, n, NETLINK_CB(cb->skb).portid,
> > cb->nlh->nlmsg_seq,
> > RTM_NEWNEIGH,
> > @@ -2344,8 +2343,6 @@ static int pneigh_dump_table(struct neigh_table *tbl, struct sk_buff *skb,
> > rc = -1;
> > goto out;
> > }
> > -   next:
> > -   idx++;
> > }
> > }
> >
> 
> This fix is way too complicated to be fixing anything related to
> 16660f0bd9 or 21fdd092ac. Both of those commits added a continue:
> 
> if (neigh_ifindex_filtered(n->dev, filter_idx))
> continue;
> if (neigh_master_filtered(n->dev,

Re: [Patch net-next] net_sched: move the empty tp check from ->destroy() to ->delete()

2016-11-27 Thread John Fastabend
On 16-11-26 10:29 PM, Roi Dayan wrote:
> 
> 
> On 27/11/2016 06:47, Roi Dayan wrote:
>>
>>
>> On 27/11/2016 02:33, Daniel Borkmann wrote:
>>> On 11/26/2016 12:09 PM, Daniel Borkmann wrote:
 On 11/26/2016 07:46 AM, Cong Wang wrote:
> On Thu, Nov 24, 2016 at 7:20 AM, Daniel Borkmann
>  wrote:
>>> [...]
>> Ok, strange, qdisc_destroy() calls into ops->destroy(), where ingress
>> drops its entire chain via tcf_destroy_chain(), so that will be NULL
>> eventually. The tps are freed by call_rcu() as well as qdisc itself
>> later on via qdisc_rcu_free(), where it frees per-cpu bstats as well.
>> Outstanding readers should either bail out due to if (!cl) or can still
>> process the chain until read section ends, but during that time, cl->q
>> resp. bstats should be good. Do you happen to know what's at address
>> 880a68b04028? I was wondering wrt call_rcu() vs call_rcu_bh(), but
>> at least on ingress (netif_receive_skb_internal()) we hold
>> rcu_read_lock() here. The KASAN report is reliably happening at this
>> location, right?
>
> I am confused as well, I don't see how it could be related to my
> patch yet.
> I will take a deep look in the weekend.
>>
>>
>>
>> Hi Cong,
>>
>> When I reported the new trace I didn't mean it's related to your patch,
>> I just wanted to point out that it exposed something. I should have been
>> clear about it.
>>
>>

 Ok, I'm currently on the run. Got too late yesterday night, but I'll
 write what I found in the evening today, not related to ingress though.
>>>
>>> Just pushed out my analysis to netdev under "[PATCH net] net, sched:
>>> respect rcu grace period on cls destruction". My conclusion is that
>>> both issues are actually separate, and that one is small enough where
>>> we could route it via net actually. Perhaps this at the same time
>>> shrinks your "[PATCH net-next] net_sched: move the empty tp check from
>>> ->destroy() to ->delete()" to a reasonable size that it's suitable to
>>> net as well. Your ->delete()/->destroy() one is definitely needed, too.
>>> The tp->root one is independent of ->delete()/->destroy() as they are
>>> different races and tp->root could also happen when you just destroy
>>> the whole tp directly. I think that seems like a good path forward to
>>> me.
>>>
>>> Thanks,
>>> Daniel
>>
>>
>>
>> Hi Daniel,
>>
>> As for the tainted kernel: I was on an old (a week or two) net-next tree
>> and only cherry-picked the patches from latest net-next related to
>> Mellanox HCA, cls_api, cls_flower and devlink, so those are the tainted
>> modules.
>> I can reproduce the issue in that tree, so I wanted to check it
>> with Cong's patch instead of latest net-next.
>> I'll try reproducing the issue with your new patch and later
>> try latest net-next as well.
>>
>> Thanks,
>> Roi
>>
> 
> Hi,
> 
> I tested "[PATCH net] net, sched: respect rcu grace period on cls
> destruction" and could not reproduce my original issue.

Hi Roi,

Just so I'm 100% clear. No issue with just the above "respect rcu grace
period on cls destruction" per above statement.

> I rebased "[Patch net-next] net_sched: move the empty tp check from
> ->destroy() to ->delete()" over to test it in the same tree and got into
> a new trace in fl_delete.

In this case did you test with "net_sched: move the empty tp check from
->destroy() to ->delete()" _only_ or did this include both patches when
you see the error below.

From my inspection we really need both patches to get correct behavior.

Thanks!
John

> 
> [35659.012123] BUG: KASAN: wild-memory-access on address 1803ca31
> [35659.020042] Write of size 1 by task ovs-vswitchd/20135
> [35659.025878] CPU: 19 PID: 20135 Comm: ovs-vswitchd Tainted: G           O    4.9.0-rc3+ #18
> [35659.035948] Hardware name: HP ProLiant DL380p Gen8, BIOS P70 07/01/2015
> [35659.043730] Call Trace:
> [35659.046619]  [] dump_stack+0x63/0x81
> [35659.052456]  [] kasan_report_error+0x408/0x4e0
> [35659.059402]  [] kasan_report+0x58/0x60
> [35659.065428]  [] ? call_rcu_sched+0x1d/0x20
> [35659.072119]  [] ? fl_destroy_filter+0x21/0x30 [cls_flower]
> [35659.080217]  [] ? fl_delete+0x1df/0x2e0 [cls_flower]
> [35659.087580]  [] __asan_store1+0x4a/0x50
> [35659.093697]  [] fl_delete+0x1df/0x2e0 [cls_flower]
> [35659.100870]  [] tc_ctl_tfilter+0x10da/0x1b90
> 
> 
> 0x1d02 is in fl_delete (net/sched/cls_flower.c:805).
> 800 struct cls_fl_filter *f = (struct cls_fl_filter *) arg;
> 801
> 802 rhashtable_remove_fast(&head->ht, &f->ht_node,
> 803 head->ht_params);
> 804 __fl_delete(tp, f);
> 805 *last = list_empty(&head->filters);
> 807 }
> 
> 
> Thanks,
> Roi



Re: [PATCH net v2 0/5] net: fix phydev reference leaks

2016-11-27 Thread Timur Tabi

David Miller wrote:

Series applied, thanks.


I was really hoping you'd give me the chance to test the patches before 
applying them.


--
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the
Code Aurora Forum, hosted by The Linux Foundation.


Re: [net,v2] neigh: fix the loop index error in neigh dump

2016-11-27 Thread David Ahern
On 11/27/16 6:32 PM, Zhang Shengju wrote:
> Loop index in neigh dump function is not updated correctly under some
> circumstances, this patch will fix it.

What's an example?

> 
> Fixes: 16660f0bd9 ("net: Add support for filtering neigh dump by device index")
> Fixes: 21fdd092ac ("net: Add support for filtering neigh dump by master device")
> 
> Signed-off-by: Zhang Shengju 
> ---
>  net/core/neighbour.c | 39 ++-
>  1 file changed, 18 insertions(+), 21 deletions(-)
> 
> diff --git a/net/core/neighbour.c b/net/core/neighbour.c
> index 2ae929f..ce32e9c 100644
> --- a/net/core/neighbour.c
> +++ b/net/core/neighbour.c
> @@ -2256,6 +2256,16 @@ static bool neigh_ifindex_filtered(struct net_device *dev, int filter_idx)
>   return false;
>  }
>  
> +static bool neigh_dump_filtered(struct net_device *dev, int filter_idx,
> + int filter_master_idx)
> +{
> + if (neigh_ifindex_filtered(dev, filter_idx) ||
> + neigh_master_filtered(dev, filter_master_idx))
> + return true;
> +
> + return false;
> +}
> +
>  static int neigh_dump_table(struct neigh_table *tbl, struct sk_buff *skb,
>   struct netlink_callback *cb)
>  {
> @@ -2285,20 +2295,15 @@ static int neigh_dump_table(struct neigh_table *tbl, struct sk_buff *skb,
>   rcu_read_lock_bh();
>   nht = rcu_dereference_bh(tbl->nht);
>  
> - for (h = s_h; h < (1 << nht->hash_shift); h++) {
> - if (h > s_h)
> - s_idx = 0;
> + for (h = s_h; h < (1 << nht->hash_shift); h++, s_idx = 0) {
>   for (n = rcu_dereference_bh(nht->hash_buckets[h]), idx = 0;
>n != NULL;
> -  n = rcu_dereference_bh(n->next)) {
> - if (!net_eq(dev_net(n->dev), net))
> - continue;
> - if (neigh_ifindex_filtered(n->dev, filter_idx))
> +  n = rcu_dereference_bh(n->next), idx++) {
> + if (idx < s_idx || !net_eq(dev_net(n->dev), net))
>   continue;
> - if (neigh_master_filtered(n->dev, filter_master_idx))
> + if (neigh_dump_filtered(n->dev, filter_idx,
> + filter_master_idx))
>   continue;
> - if (idx < s_idx)
> - goto next;
>   if (neigh_fill_info(skb, n, NETLINK_CB(cb->skb).portid,
>   cb->nlh->nlmsg_seq,
>   RTM_NEWNEIGH,
> @@ -2306,8 +2311,6 @@ static int neigh_dump_table(struct neigh_table *tbl, struct sk_buff *skb,
>   rc = -1;
>   goto out;
>   }
> -next:
> - idx++;
>   }
>   }
>   rc = skb->len;
> @@ -2328,14 +2331,10 @@ static int pneigh_dump_table(struct neigh_table *tbl, struct sk_buff *skb,
>  
>   read_lock_bh(&tbl->lock);
>  
> - for (h = s_h; h <= PNEIGH_HASHMASK; h++) {
> - if (h > s_h)
> - s_idx = 0;
> - for (n = tbl->phash_buckets[h], idx = 0; n; n = n->next) {
> - if (pneigh_net(n) != net)
> + for (h = s_h; h <= PNEIGH_HASHMASK; h++, s_idx = 0) {
> + for (n = tbl->phash_buckets[h], idx = 0; n; n = n->next, idx++) {
> + if (idx < s_idx || pneigh_net(n) != net)
>   continue;
> - if (idx < s_idx)
> - goto next;
>   if (pneigh_fill_info(skb, n, NETLINK_CB(cb->skb).portid,
>   cb->nlh->nlmsg_seq,
>   RTM_NEWNEIGH,
> @@ -2344,8 +2343,6 @@ static int pneigh_dump_table(struct neigh_table *tbl, struct sk_buff *skb,
>   rc = -1;
>   goto out;
>   }
> - next:
> - idx++;
>   }
>   }
>  

This fix is way too complicated to be fixing anything related to 16660f0bd9
or 21fdd092ac. Both of those commits added a continue:

if (neigh_ifindex_filtered(n->dev, filter_idx))
continue;
if (neigh_master_filtered(n->dev, filter_master_idx))
continue;

At best the continue is replaced by 'goto next;' and I am not convinced that is 
right.

You are completely rewriting the dump loops.
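
For readers following the argument, the indexing scheme the patch moves to can be sketched outside the kernel: idx is advanced for every entry in the loop header, so a filtered entry still consumes an index and a resumed dump restarts at the entry that did not fit. The table layout, the "filtered" flag and the capacity limit are hypothetical stand-ins for the hash bucket walk, the device filters and a full skb; this is not the neighbour.c code and it takes no side on whether the original loop was actually wrong.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical entry: "filtered" models a device filter rejecting it. */
struct sketch_entry {
	int id;
	bool filtered;
};

/*
 * Copy up to cap unfiltered ids into out, starting at *s_idx.
 * On return *s_idx holds the position to resume from, like the saved
 * callback argument in a netlink dump. Returns the number of ids written.
 */
static size_t sketch_dump_chunk(const struct sketch_entry *tbl, size_t n,
				size_t *s_idx, int *out, size_t cap)
{
	size_t idx, written = 0;

	for (idx = 0; idx < n; idx++) {	/* idx advances for every entry */
		if (idx < *s_idx || tbl[idx].filtered)
			continue;
		if (written == cap)	/* "skb full": retry this entry later */
			break;
		out[written++] = tbl[idx].id;
	}

	*s_idx = idx;
	return written;
}
```

Calling sketch_dump_chunk() repeatedly with the same s_idx variable visits each unfiltered entry exactly once, provided the table does not change between calls.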



Re: [PATCH net-next 0/6] BPF cleanups and misc updates

2016-11-27 Thread David Miller
From: Daniel Borkmann 
Date: Sat, 26 Nov 2016 01:28:03 +0100

> This patch set adds a couple of cleanups in first few patches,
> exposes owner_prog_type for array maps as well as mlocked mem
> for maps in fdinfo, allows for mount permissions in fs and
> fixes various outstanding issues in selftests and samples.

Series applied, thanks Daniel.


Re: [PATCH net 1/1] tipc: fix link statistics counter errors

2016-11-27 Thread David Miller
From: Jon Maloy 
Date: Fri, 25 Nov 2016 10:35:02 -0500

> In commit e4bf4f76962b ("tipc: simplify packet sequence number
> handling") we changed the internal representation of the packet
> sequence number counters from u32 to u16, reflecting what is really
> sent over the wire.
> 
> Since then some link statistics counters have been displaying incorrect
> values, partially because the counters meant to be used as sequence
> number snapshots are now used as direct counters, stored as u32, and
> partially because some counter updates are just missing in the code.
> 
> In this commit we correct this in two ways. First, we base the
> displayed packet sent/received values on direct counters instead
> of as previously a calculated difference between current sequence
> number and a snapshot. Second, we add the missing updates of the
> counters.
> 
> This change is compatible with the current netlink API, and requires
> no changes to the user space tools.
> 
> Signed-off-by: Jon Maloy 

Applied.
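
The quoted commit message can be made concrete with a small sketch of why a difference between 16-bit sequence numbers stops being a usable packet count once more than 65535 packets have been sent, while a direct 32-bit counter stays correct. The struct and function names are hypothetical and do not come from the TIPC code.

```c
#include <stdint.h>

/* Hypothetical link state: a u16 sequence number as sent on the wire,
 * a snapshot of it taken when statistics were last reset, and a direct
 * u32 packet counter. */
struct sketch_link {
	uint16_t snd_nxt;
	uint16_t snd_nxt_snapshot;
	uint32_t sent_pkts;
};

static void sketch_link_send(struct sketch_link *l)
{
	l->snd_nxt++;		/* wraps modulo 65536 */
	l->sent_pkts++;		/* direct counter, updated on every send */
}

/* Old style: derive the count from sequence number minus snapshot. */
static uint32_t sketch_sent_by_snapshot(const struct sketch_link *l)
{
	return (uint16_t)(l->snd_nxt - l->snd_nxt_snapshot);
}
```

After 70000 sends the snapshot difference reports 70000 - 65536 = 4464, while the direct counter reports 70000.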


Re: [PATCH v2 0/7] stmmac: dwmac-meson8b: configurable RGMII TX delay

2016-11-27 Thread David Miller
From: Martin Blumenstingl 
Date: Fri, 25 Nov 2016 14:01:49 +0100

> Currently the dwmac-meson8b stmmac glue driver uses a hardcoded 1/4
> cycle TX clock delay. This seems to work fine for many boards (for
> example Odroid-C2 or Amlogic's reference boards) but there are some
> others where TX traffic is simply broken.
> There are probably multiple reasons why it's working on some boards
> while it's broken on others:
> - some of Amlogic's reference boards are using a Micrel PHY
> - hardware circuit design
> - maybe more...

The ARM arch file changes do not apply cleanly to net-next, you probably
want to merge them via the ARM tree instead of mine, and respin this series
to be without the .dts file changes.
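
As a worked number for the "1/4 cycle" TX delay quoted above: assuming the fraction is taken relative to the 125MHz RGMII clock used at gigabit speed (the thread does not state the reference clock, so this is an assumption), a quarter cycle comes out at 2ns, which is the upper end of the 1.5-2ns RGMII delay range. The helper names are hypothetical.

```c
#include <stdint.h>

/* Clock period in picoseconds for a frequency given in Hz. */
static uint32_t sketch_period_ps(uint32_t freq_hz)
{
	return (uint32_t)(1000000000000ULL / freq_hz);
}

/* Delay in picoseconds for a number of quarter cycles of that clock. */
static uint32_t sketch_quarter_cycles_ps(uint32_t freq_hz, uint32_t quarters)
{
	return sketch_period_ps(freq_hz) * quarters / 4;
}
```

With freq_hz = 125000000 the period is 8000ps, so one quarter cycle is 2000ps.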


[net,v2] neigh: fix the loop index error in neigh dump

2016-11-27 Thread Zhang Shengju
Loop index in neigh dump function is not updated correctly under some
circumstances, this patch will fix it.

Fixes: 16660f0bd9 ("net: Add support for filtering neigh dump by device index")
Fixes: 21fdd092ac ("net: Add support for filtering neigh dump by master device")

Signed-off-by: Zhang Shengju 
---
 net/core/neighbour.c | 39 ++-
 1 file changed, 18 insertions(+), 21 deletions(-)

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 2ae929f..ce32e9c 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -2256,6 +2256,16 @@ static bool neigh_ifindex_filtered(struct net_device *dev, int filter_idx)
return false;
 }
 
+static bool neigh_dump_filtered(struct net_device *dev, int filter_idx,
+   int filter_master_idx)
+{
+   if (neigh_ifindex_filtered(dev, filter_idx) ||
+   neigh_master_filtered(dev, filter_master_idx))
+   return true;
+
+   return false;
+}
+
 static int neigh_dump_table(struct neigh_table *tbl, struct sk_buff *skb,
struct netlink_callback *cb)
 {
@@ -2285,20 +2295,15 @@ static int neigh_dump_table(struct neigh_table *tbl, struct sk_buff *skb,
rcu_read_lock_bh();
nht = rcu_dereference_bh(tbl->nht);
 
-   for (h = s_h; h < (1 << nht->hash_shift); h++) {
-   if (h > s_h)
-   s_idx = 0;
+   for (h = s_h; h < (1 << nht->hash_shift); h++, s_idx = 0) {
for (n = rcu_dereference_bh(nht->hash_buckets[h]), idx = 0;
 n != NULL;
-n = rcu_dereference_bh(n->next)) {
-   if (!net_eq(dev_net(n->dev), net))
-   continue;
-   if (neigh_ifindex_filtered(n->dev, filter_idx))
+n = rcu_dereference_bh(n->next), idx++) {
+   if (idx < s_idx || !net_eq(dev_net(n->dev), net))
continue;
-   if (neigh_master_filtered(n->dev, filter_master_idx))
+   if (neigh_dump_filtered(n->dev, filter_idx,
+   filter_master_idx))
continue;
-   if (idx < s_idx)
-   goto next;
if (neigh_fill_info(skb, n, NETLINK_CB(cb->skb).portid,
cb->nlh->nlmsg_seq,
RTM_NEWNEIGH,
@@ -2306,8 +2311,6 @@ static int neigh_dump_table(struct neigh_table *tbl, struct sk_buff *skb,
rc = -1;
goto out;
}
-next:
-   idx++;
}
}
rc = skb->len;
@@ -2328,14 +2331,10 @@ static int pneigh_dump_table(struct neigh_table *tbl, struct sk_buff *skb,
 
read_lock_bh(&tbl->lock);
 
-   for (h = s_h; h <= PNEIGH_HASHMASK; h++) {
-   if (h > s_h)
-   s_idx = 0;
-   for (n = tbl->phash_buckets[h], idx = 0; n; n = n->next) {
-   if (pneigh_net(n) != net)
+   for (h = s_h; h <= PNEIGH_HASHMASK; h++, s_idx = 0) {
+   for (n = tbl->phash_buckets[h], idx = 0; n; n = n->next, idx++) {
+   if (idx < s_idx || pneigh_net(n) != net)
continue;
-   if (idx < s_idx)
-   goto next;
if (pneigh_fill_info(skb, n, NETLINK_CB(cb->skb).portid,
cb->nlh->nlmsg_seq,
RTM_NEWNEIGH,
@@ -2344,8 +2343,6 @@ static int pneigh_dump_table(struct neigh_table *tbl, 
struct sk_buff *skb,
rc = -1;
goto out;
}
-   next:
-   idx++;
}
}
 
-- 
1.8.3.1
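
The loop restructuring in the patch above can be tried outside the kernel. What follows is a minimal user-space sketch, not the neighbour code itself: struct entry, struct dump_pos, dump_table() and NBUCKETS are hypothetical stand-ins for the hash table, the cb->args[] resume state and the dump routine. It shows the two points the patch relies on: idx++ sits in the inner loop's increment so that "continue" can never skip it, and s_idx is cleared in the outer loop's increment so that it only applies to the first bucket visited.

```c
#include <stddef.h>

#define NBUCKETS 4

struct entry {
	int net_id;		/* stands in for the entry's network namespace */
	int value;
	struct entry *next;
};

/* Resume position, playing the role of cb->args[] in a netlink dump. */
struct dump_pos {
	int h;			/* bucket to resume in */
	int idx;		/* position within that bucket */
};

/*
 * Copy up to 'room' matching values into 'buf', starting at *pos.
 * Returns the number of values written and updates *pos so that the
 * next call continues where this one stopped.
 */
static int dump_table(struct entry *buckets[NBUCKETS], int net_id,
		      struct dump_pos *pos, int *buf, int room)
{
	int s_h = pos->h, s_idx = pos->idx;
	int h, idx = 0, written = 0;
	struct entry *n;

	/* s_idx is reset in the outer increment, after the first bucket. */
	for (h = s_h; h < NBUCKETS; h++, s_idx = 0) {
		/* idx++ is in the increment, so 'continue' still counts. */
		for (n = buckets[h], idx = 0; n != NULL; n = n->next, idx++) {
			if (idx < s_idx || n->net_id != net_id)
				continue;
			if (written == room)
				goto done;	/* buffer full: keep h and idx */
			buf[written++] = n->value;
		}
	}
done:
	pos->h = h;
	pos->idx = idx;
	return written;
}
```

Calling dump_table() repeatedly with the same struct dump_pos walks the whole table in pieces; an entry that did not fit into one call is the first one emitted by the next.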





Re: [patch v2 net-next] sfc: remove unneeded variable

2016-11-27 Thread David Miller
From: Dan Carpenter 
Date: Fri, 25 Nov 2016 13:43:04 +0300

> We don't use ->heap_buf after commit 46d1efd852cc ("sfc: remove Software
> TSO") so let's remove the last traces.
> 
> Signed-off-by: Dan Carpenter 

Applied, thanks Dan.


Re: [PATCH] net: fec: turn on device when extracting statistics

2016-11-27 Thread David Miller
From: Nikita Yushchenko 
Date: Fri, 25 Nov 2016 13:02:00 +0300

> + int i, ret;
> +
> + ret = pm_runtime_get_sync(>pdev->dev);
> + if (IS_ERR_VALUE(ret)) {
> + memset(data, 0, sizeof(*data) * ARRAY_SIZE(fec_stats));
> + return;
> + }

This really isn't the way to do this.

When the device is suspended and the clocks are going to be stopped,
you must fetch the statistic values into a software copy and provide
those if the device is suspended when statistics are requested.
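
The approach described above can be sketched in plain C, outside the kernel. Everything here is hypothetical and is not the fec driver's code: struct fake_dev, dev_suspend(), dev_resume(), dev_get_stats(), and the hw_regs array that stands in for the hardware counters. The idea is only that the counters are copied into a software cache before the clocks stop, and that the cache is what gets reported while the device is suspended.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NUM_STATS 3

struct fake_dev {
	bool suspended;			/* true once the clocks are stopped */
	uint64_t hw_regs[NUM_STATS];	/* stands in for hardware counters */
	uint64_t cached[NUM_STATS];	/* software copy of the counters */
};

/* Read the hardware counters into the software copy. */
static void dev_snapshot_stats(struct fake_dev *dev)
{
	memcpy(dev->cached, dev->hw_regs, sizeof(dev->cached));
}

/* Take the snapshot while the hardware is still readable. */
static void dev_suspend(struct fake_dev *dev)
{
	dev_snapshot_stats(dev);
	dev->suspended = true;
}

static void dev_resume(struct fake_dev *dev)
{
	dev->suspended = false;
}

/* Report live counters when running, the cached copy when suspended. */
static void dev_get_stats(struct fake_dev *dev, uint64_t *data)
{
	if (!dev->suspended)
		dev_snapshot_stats(dev);
	memcpy(data, dev->cached, sizeof(dev->cached));
}
```

With this shape the statistics path never has to wake the device, and it never has to report zeros on failure.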


Re: pull-request: wireless-drivers-next 2016-11-25

2016-11-27 Thread David Miller
From: Kalle Valo 
Date: Fri, 25 Nov 2016 11:39:49 +0200

> here's a pull request for 4.10. ath9k has now been converted to use
> mac80211 intermediate software queues to fix bufferbloat problems. rsi
> has become active again and lately mwifiex has been getting a _lot_ of
> love.
> 
> I'm not expecting to see any problems with this pull request. When you
> pull git will do lots of automerging but at least I didn't see any
> conflicts. Please let me know if you have any problems.

Pulled, thanks Kalle.



Re: [PATCH 1/1] net: macb: fix the RX queue reset in macb_rx()

2016-11-27 Thread David Miller
From: Cyrille Pitchen 
Date: Fri, 25 Nov 2016 09:49:32 +0100

> On macb only (not gem), when a RX queue corruption was detected from
> macb_rx(), the RX queue was reset: during this process the RX ring
> buffer descriptor was initialized by macb_init_rx_ring() but we forgot
> to also set bp->rx_tail to 0.
> 
> Indeed, when processing the received frames, bp->rx_tail provides the
> macb driver with the index in the RX ring buffer of the next buffer to
> process. So when the whole ring buffer is reset we must also reset
> bp->rx_tail so the driver is synchronized again with the hardware.
> 
> Since macb_init_rx_ring() is called from many locations, currently from
> macb_rx() and macb_init_rings(), we'd rather add the "bp->rx_tail = 0;"
> line inside macb_init_rx_ring() than add the very same line after each
> call of this function.
> 
> Without this fix, the rx queue is not reset properly to recover from
> queue corruption and connection drop may occur.
> 
> Signed-off-by: Cyrille Pitchen 
> Fixes: 9ba723b081a2 ("net: macb: remove BUG_ON() and reset the queue to 
> handle RX errors")

This doesn't apply cleanly to the 'net' tree, where
RX_RING_SIZE is used instead of bp->rx_ring_size. It seems
you generated this against net-next, but you didn't say
so in either your Subject line or the commit message.

As a bug fix this should be targeted at 'net'.
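
The bug described in the quoted commit message can be reproduced in a small user-space model. This is only a sketch under stated assumptions: struct rx_ring, ring_init(), ring_poll(), RING_SIZE and DESC_USED are hypothetical names, not the macb driver's. The point is that the function which reinitializes the descriptors also resets the software tail index, so that every caller gets a consistent ring.

```c
#include <stdint.h>

#define RING_SIZE 8
#define DESC_USED 0x1u		/* set by the "hardware" when a buffer is filled */

struct rx_ring {
	uint32_t desc[RING_SIZE];
	unsigned int tail;	/* index of the next descriptor to process */
};

/* Reset the descriptors and the software index together. */
static void ring_init(struct rx_ring *r)
{
	unsigned int i;

	for (i = 0; i < RING_SIZE; i++)
		r->desc[i] = 0;
	r->tail = 0;		/* the line the fix adds, in this model */
}

/* Consume filled descriptors starting at tail; return how many. */
static int ring_poll(struct rx_ring *r)
{
	int received = 0;

	while (received < RING_SIZE && (r->desc[r->tail] & DESC_USED)) {
		r->desc[r->tail] = 0;
		r->tail = (r->tail + 1) % RING_SIZE;
		received++;
	}
	return received;
}
```

If ring_init() left tail untouched, a reset after three received frames would leave the driver polling descriptor 3 while the hardware had started filling descriptor 0 again, and ring_poll() would see nothing.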


Re: pull request (net): ipsec 2016-11-25

2016-11-27 Thread David Miller
From: Steffen Klassert 
Date: Fri, 25 Nov 2016 07:57:57 +0100

> 1) Fix a refcount leak in vti6.
>    From Nicolas Dichtel.
> 
> 2) Fix a wrong if statement in xfrm_sk_policy_lookup.
>    From Florian Westphal.
> 
> 3) The flowcache watermarks are per cpu. Take this into
>    account when comparing to the threshold where we
>    refuse new allocations. From Miroslav Urbanek.
> 
> Please pull or let me know if there are problems.

Pulled, thanks Steffen!


Re: [PATCH net 1/1] driver: macvtap: Unregister netdev rx_handler if macvtap_newlink fails

2016-11-27 Thread David Miller
From: f...@ikuai8.com
Date: Fri, 25 Nov 2016 10:05:06 +0800

> From: Gao Feng 
> 
> The macvtap_newlink registers the netdev rx_handler firstly, but it
> does not unregister the handler if macvlan_common_newlink failed.
> 
> Signed-off-by: Gao Feng 

Applied.
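
The kind of leak fixed by the patch quoted above comes down to error unwinding: whatever was registered before the failing step has to be unregistered on the error path. A minimal sketch follows, with hypothetical names throughout (struct fake_netdev, register_handler(), unregister_handler(), common_newlink(), newlink()); it models the pattern only and is not the macvtap code.

```c
struct fake_netdev {
	int handler_registered;
};

static int register_handler(struct fake_netdev *dev)
{
	if (dev->handler_registered)
		return -1;	/* already has a handler */
	dev->handler_registered = 1;
	return 0;
}

static void unregister_handler(struct fake_netdev *dev)
{
	dev->handler_registered = 0;
}

/* Stand-in for the later setup step that may fail. */
static int common_newlink(int should_fail)
{
	return should_fail ? -1 : 0;
}

static int newlink(struct fake_netdev *dev, int should_fail)
{
	int err;

	err = register_handler(dev);
	if (err)
		return err;

	err = common_newlink(should_fail);
	if (err) {
		/* Undo the first step so no stale handler is left behind. */
		unregister_handler(dev);
		return err;
	}
	return 0;
}
```

Without the unregister call, a failed newlink() would leave the handler registered, and the next attempt on the same device would fail in register_handler().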


Re: [PATCH net v2 0/5] net: fix phydev reference leaks

2016-11-27 Thread David Miller
From: Johan Hovold 
Date: Thu, 24 Nov 2016 19:21:26 +0100

> This series fixes a number of phydev reference leaks (and one of_node
> leak) due to failure to put the reference taken by of_phy_find_device().
> 
> Note that I did not try to fix drivers/net/phy/xilinx_gmii2rgmii.c which
> still leaks a reference.
> 
> Against net but should apply just as fine to net-next.
 ...
> v2: 
>  - use put_device() instead of phy_dev_free() to put the references
>    taken in net/dsa (patch 1/4).
>  - add four new patches fixing similar leaks

Series applied, thanks.


Re: [PATCH] irda: fix overly long udelay()

2016-11-27 Thread David Miller
From: Arnd Bergmann 
Date: Thu, 24 Nov 2016 17:26:22 +0100

> irda_get_mtt() returns a hardcoded '1' in some cases,
> and with gcc-7, we get a build error because this triggers a
> compile-time check in udelay():
> 
> drivers/net/irda/w83977af_ir.o: In function `w83977af_hard_xmit':
> w83977af_ir.c:(.text.w83977af_hard_xmit+0x14c): undefined reference to 
> `__bad_udelay'
> 
> Older compilers did not run into this because they either did not
> completely inline the irda_get_mtt() or did not consider the
> 1 value a constant expression.
> 
> The code has been wrong since the start of git history.
> 
> Signed-off-by: Arnd Bergmann 

Applied, thanks Arnd.


Re: [PATCH net 1/1] driver: ipvlan: Fix one possible memleak in ipvlan_link_new

2016-11-27 Thread David Miller
From: f...@ikuai8.com
Date: Thu, 24 Nov 2016 23:39:59 +0800

> From: Gao Feng 
> 
> When ipvlan_link_new fails and creates one ipvlan port, it does not
> destroy the ipvlan port created. It causes mem leak and the physical
> device contains invalid ipvlan data.
> 
> Signed-off-by: Gao Feng 

Applied, thanks.


Re: [PATCH 2/2] net: dsa: mv88e6xxx: Add 88E6176 device tree support

2016-11-27 Thread Andrew Lunn
> Try to see it from my perspective: I see that some vf610 device I don't
> have (found via `git grep marvell,mv88e6` or so) uses
> "marvell,mv88e6085". I then assume it has that device on board. How
> would I know it doesn't? Same for the other boards you mention.
> 
> Unfortunately some of your replies are slightly cryptic. Had you simply
> replied 'please just use "marvell,mv88e6085" instead', it would've been
> much more clear what you want. (Same for extending the subject instead
> of just pointing to some FAQ.)

By reading the FAQ you have learnt more than you would have from me
just saying 'put the correct tree in the subject line'. By asking you
to explain why you need a compatible string, I'm trying to make you
think, look at the code and understand it. In the future, you might
think about and understand the code before posting a patch, and then
we all save time.

> So are you okay with patch 1/2 documenting the compatible? Then we could
> drop 2/2 and use "marvell,mv88e6176", "marvell,mv88e6085" instead of
> just the latter. Or would you rather drop both and keep the actual chip
> a comment?

A comment only please.

Thanks
Andrew


Re: [PATCH net-next v2 3/4] Documentation: net: phy: Add blurb about RGMII

2016-11-27 Thread Florian Fainelli
Le 27/11/2016 à 14:24, Timur Tabi a écrit :
>> + * PHY device drivers in PHYLIB being reusable by nature, being able to
>> +   configure correctly a specified delay enables more designs with
>> similar delay
>> +   requirements to be operate correctly
> 
> Ok, this one I don't know how to fix.  I'm not really sure what you're
> trying to say.

What I am trying to say is that once a PHY driver properly configures a
delay that you have specified, there is no reason why this is not
applicable to other platforms using this same PHY driver.

>> +
>> +Common problems with RGMII delay mismatch
>> +
>> + When there is a RGMII delay mismatch between the Ethernet MAC and
>> the PHY, this
>> + will most likely result in the clock and data line sampling to
>> capture unstable
> 
> I'm not sure what "sampling to capture unstable" is supposed to mean.

When the PHY device takes a "snapshot" of the state of the data lines,
after a clock edge, if the delay is improperly configured, these data
lines are going to still be floating, or show some kind of
capacitance/inductance effect, so the logical level which is going to be
read may be incorrect.
-- 
Florian


Re: [PATCH 2/2] net: dsa: mv88e6xxx: Add 88E6176 device tree support

2016-11-27 Thread Andreas Färber
Andrew,

Am 27.11.2016 um 23:08 schrieb Andrew Lunn:
>>> This driver already supports nearly 30 different Marvell switch
>>> models. Please document why the marvell,mv88e6176 is special and why
>>> it needs its own compatible string when the others don't.
>>
>> I don't understand.
> 
> Think about what i said. Why does the 6176 need its own compatible
> string, when the two 6352s and the 6165 on the zii-devel-b don't have
> one? And the DIR 665 has a 6171, which does not have a compatible
> string of its own. The clearfog actually has a 6176, and it seems to
> work fine without a compatible string.
> 
>> You as driver author should know that the .data pointer is vital to your
>> driver
> 
> Exactly, so if i ask why is it needed, maybe you should stop and think
> for a while.
> 
>> you even recently accepted another model that conflicted with
>> my patch.
> 
> And think about that also, and you will find the 6390 family, whose
> first device is 6190, is not compatible with the 6085, and so needs a
> different compatible string.

Try to see it from my perspective: I see that some vf610 device I don't
have (found via `git grep marvell,mv88e6` or so) uses
"marvell,mv88e6085". I then assume it has that device on board. How
would I know it doesn't? Same for the other boards you mention.

Unfortunately some of your replies are slightly cryptic. Had you simply
replied 'please just use "marvell,mv88e6085" instead', it would've been
much more clear what you want. (Same for extending the subject instead
of just pointing to some FAQ.)

So are you okay with patch 1/2 documenting the compatible? Then we could
drop 2/2 and use "marvell,mv88e6176", "marvell,mv88e6085" instead of
just the latter. Or would you rather drop both and keep the actual chip
a comment?

Regards,
Andreas

-- 
SUSE Linux GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH net-next] net: dsa: mv88e6xxx: Fix mv88e6xxx_g1_irq_free() interrupt count

2016-11-27 Thread Andrew Lunn
On Sun, Nov 27, 2016 at 11:26:28PM +0100, Andreas Färber wrote:
> mv88e6xxx_g1_irq_setup() sets up chip->g1_irq.nirqs interrupt mappings,
> so free the same amount. This will be 8 or 9 in practice, less than 16.
> 
> Fixes: dc30c35be720 ("net: dsa: mv88e6xxx: Implement interrupt support.")
> Cc: Andrew Lunn 
> Signed-off-by: Andreas Färber 

Reviewed-by: Andrew Lunn 

Thanks
Andrew


[PATCH net-next] net: dsa: mv88e6xxx: Fix mv88e6xxx_g1_irq_free() interrupt count

2016-11-27 Thread Andreas Färber
mv88e6xxx_g1_irq_setup() sets up chip->g1_irq.nirqs interrupt mappings,
so free the same amount. This will be 8 or 9 in practice, less than 16.

Fixes: dc30c35be720 ("net: dsa: mv88e6xxx: Implement interrupt support.")
Cc: Andrew Lunn 
Signed-off-by: Andreas Färber 
---
 drivers/net/dsa/mv88e6xxx/chip.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index b14b3d5099c8..77f13ada2612 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -421,7 +421,7 @@ static void mv88e6xxx_g1_irq_free(struct mv88e6xxx_chip 
*chip)
 
free_irq(chip->irq, chip);
 
-   for (irq = 0; irq < 16; irq++) {
+   for (irq = 0; irq < chip->g1_irq.nirqs; irq++) {
virq = irq_find_mapping(chip->g1_irq.domain, irq);
irq_dispose_mapping(virq);
}
-- 
2.6.6



Re: [PATCH net-next v2 3/4] Documentation: net: phy: Add blurb about RGMII

2016-11-27 Thread Timur Tabi
Just some grammatical corrections.  You might want to run a spellchecker 
on all the patches.


Florian Fainelli wrote:

+ The Reduced Gigabit Medium Independent Interface (RGMII) is a 12 pins


"is a 12-pin"


+ electrical signal interface using a synchronous 125Mhz clock signal and 
several
+ data lines. Due to this design decision, a 1.5ns to 2ns delay must be added
+ between the clock line (RXC or TXC) and the data lines to let the PHY (clock
+ sink) have enough setup and hold times to sample the data lines correctly. The
+ PHY library offers different types of PHY_INTERFACE_MODE_RGMII* values to let
+ the PHY driver and optionaly the MAC driver implement the required delay. The


"driver, and optionally the MAC driver, implement"


+ values of phy_interface_t must be understood from the perspective of the PHY
+ device itself, leading to the following:
+
+ * PHY_INTERFACE_MODE_RGMII: the PHY is not responsible for inserting any
+   internal delay by itself, it assumes that either the Ethernet MAC (if 
capable
+   or the PCB traces) insert the correct 1.5-2ns delay
+
+ * PHY_INTERFACE_MODE_RGMII_TXID: the PHY should be inserting an internal delay


"should insert"



+   for the transmit data lines (TXD[3:0]) processed by the PHY device
+
+ * PHY_INTERFACE_MODE_RGMII_RXID: the PHY should be inserting an internal delay


"should insert"



+   for the receive data lines (RXD[3:0]) processed by the PHY device
+
+ * PHY_INTERFACE_MODE_RGMII_ID: the PHY should be inserting internal delays for


"should insert"


+   both transmit AND receive data lines from/to the PHY device
+
+ Whenever it is possible, it is preferrable to utilize the PHY side RGMII delay
+ for several reasons:


"Whenever possible, use the PHY side RGMII delay for these reasons:"


+ * PHY devices may offer sub-nanosecond granularity in how they allow a
+   receiver/transmitter side delay (e.g: 0.5, 1.0, 1.5ns) to be specified. Such
+   precision may be required to account for differences in PCB trace lengths
+
+ * PHY devices are typically qualified for a large range of applications
+   (industrial, medical, automotive...), and they provide a constant and
+   reliable delay across temperature/pressure/voltage ranges
+
+ * PHY device drivers in PHYLIB being reusable by nature, being able to
+   configure correctly a specified delay enables more designs with similar 
delay
+   requirements to be operate correctly


Ok, this one I don't know how to fix.  I'm not really sure what you're 
trying to say.



+
+ For cases where the PHY is not capable of providing this delay, but the
+ Ethernet MAC driver is capable of doing it, the correct phy_interface_t value


"doing so,"


+ should be PHY_INTERFACE_MODE_RGMII, and the Ethernet MAC driver should be
+ configured correctly in order to provide the required transmit and/or receive
+ side delay from the perspective of the PHY device. Conversely, if the Ethernet
+ MAC driver looks at the phy_interface_t value, for any other mode but
+ PHY_INTERFACE_MODE_RGMII, it should make sure that the MAC-level delays are
+ disabled.
+
+ In case neither the Ethernet MAC, nor the PHY are capable of providing the
+ required delays, as defined per the RGMII standard, several options may be
+ available:
+
+ * Some SoCs may offer a pin pad/mux/controller capable of configuring a given
+   set of pins' drive strength, delays and voltage, and it may be a suitable


"strength, delays, and voltage; and"


+   option to insert the expected 2ns RGMII delay
+
+ * Modifying the PCB design to include a fixed delay (e.g: using a specifically
+   designed serpentine), which may not require software configuration at all


period after "all".


+
+Common problems with RGMII delay mismatch
+
+ When there is a RGMII delay mismatch between the Ethernet MAC and the PHY, 
this
+ will most likely result in the clock and data line sampling to capture 
unstable


I'm not sure what "sampling to capture unstable" is supposed to mean.


+ signals, typical symptoms include:
+
+ * Transmission/reception partially works, and there is frequent or occasional
+   packet loss observed
+
+ * Ethernet MAC may report some, or all packets ingressing with a FCS/CRC 
error,


No comma after "some".


+   or just discard them all
+
+ * Switching to lower speeds such as 10/100Mbits/sec makes the problem go away
+   (since there is enough setup/hold time in that case)


--
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the
Code Aurora Forum, hosted by The Linux Foundation.
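
The four modes discussed in the quoted documentation can be summarized as a small decision table. The sketch below is illustrative only: enum rgmii_mode and its MODE_* values are local stand-ins for the PHY_INTERFACE_MODE_RGMII* values named in the text, and struct rgmii_delays, phy_delays_for() and mac_may_insert_delay() are hypothetical helpers, not PHY library functions.

```c
#include <stdbool.h>

/* Local stand-ins for PHY_INTERFACE_MODE_RGMII, _ID, _RXID and _TXID. */
enum rgmii_mode {
	MODE_RGMII,
	MODE_RGMII_ID,
	MODE_RGMII_RXID,
	MODE_RGMII_TXID,
};

struct rgmii_delays {
	bool phy_tx;	/* PHY inserts the delay on the transmit data lines */
	bool phy_rx;	/* PHY inserts the delay on the receive data lines */
};

/* Which internal delays the PHY is expected to insert for a mode. */
static struct rgmii_delays phy_delays_for(enum rgmii_mode mode)
{
	struct rgmii_delays d = { false, false };

	switch (mode) {
	case MODE_RGMII:
		break;			/* MAC or PCB traces provide the delay */
	case MODE_RGMII_TXID:
		d.phy_tx = true;
		break;
	case MODE_RGMII_RXID:
		d.phy_rx = true;
		break;
	case MODE_RGMII_ID:
		d.phy_tx = true;
		d.phy_rx = true;
		break;
	}
	return d;
}

/* Per the quoted text, MAC-level delays stay off in every other mode. */
static bool mac_may_insert_delay(enum rgmii_mode mode)
{
	return mode == MODE_RGMII;
}
```

Reading the table from the PHY's point of view, as the documentation asks, keeps the MAC and the PHY from both adding a delay to the same lines.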


Re: [PATCH 2/2] net: dsa: mv88e6xxx: Add 88E6176 device tree support

2016-11-27 Thread Andrew Lunn
> > This driver already supports nearly 30 different Marvell switch
> > models. Please document why the marvell,mv88e6176 is special and why
> > it needs its own compatible string when the others don't.
> 
> I don't understand.

Think about what i said. Why does the 6176 need its own compatible
string, when the two 6352s and the 6165 on the zii-devel-b don't have
one? And the DIR 665 has a 6171, which does not have a compatible
string of its own. The clearfog actually has a 6176, and it seems to
work fine without a compatible string.

> You as driver author should know that the .data pointer is vital to your
> driver

Exactly, so if i ask why is it needed, maybe you should stop and think
for a while.

> you even recently accepted another model that conflicted with
> my patch.

And think about that also, and you will find the 6390 family, whose
first device is 6190, is not compatible with the 6085, and so needs a
different compatible string.

  Andrew


Re: [PATCH 2/2] net: dsa: mv88e6xxx: Add 88E6176 device tree support

2016-11-27 Thread Andreas Färber
Am 27.11.2016 um 22:27 schrieb Andrew Lunn:
> On Sun, Nov 27, 2016 at 09:57:59PM +0100, Andreas Färber wrote:
>> This model is found on the Turris Omnia.
> 
> This driver already supports nearly 30 different Marvell switch
> models. Please document why the marvell,mv88e6176 is special and why
> it needs its own compatible string when the others don't.

I don't understand.

The commit message above already points out for which device this is
(and you also know from the LAKML thread).

You as driver author should know that the .data pointer is vital to your
driver - you even recently accepted another model that conflicted with
my patch. So are you arguing for a ", which uses a Device Tree for
booting" half-sentence here?

The others not having an entry simply means no one needed them yet.

And any Turris Omnia side changes need to go through the mvebu tree.

Regards,
Andreas

-- 
SUSE Linux GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH] mv88e6xxx: Fix mv88e6xxx_g1_irq_free() interrupt count

2016-11-27 Thread Andrew Lunn
On Sun, Nov 27, 2016 at 10:32:41PM +0100, Andreas Färber wrote:
> Hi Andrew,
> 
> Am 27.11.2016 um 22:22 schrieb Andrew Lunn:
> > On Sun, Nov 27, 2016 at 09:43:44PM +0100, Andreas Färber wrote:
> >> mv88e6xxx_g1_irq_setup() sets up chip->g1_irq.nirqs interrupt mappings,
> >> so free the same amount. This will be 8 or 9 in practice, less than 16.
> > 
> > Hi Andreas
> > 
> > The patch is correct, but please read
> > Documentation/networking/netdev-FAQ.txt
> > and then resubmit the patch.
> 
> Do you mean --subject-prefix="PATCH net-next"

Yep.

Thanks
Andrew


Re: [PATCH v2] MAINTAINERS: Add device tree bindings to mv88e6xx section

2016-11-27 Thread Andrew Lunn
On Sun, Nov 27, 2016 at 10:07:30PM +0100, Andreas Färber wrote:
> Also include the netdev list for convenience, as done elsewhere.

Please indicate which maintainer you expect to accept this. And if that
is David Miller, please fix the Subject: line.
 
> Cc: Andrew Lunn 
> Cc: Vivien Didelot 
> Signed-off-by: Andreas Färber 

Reviewed-by: Andrew Lunn 

Andrew


Re: [PATCH] mv88e6xxx: Fix mv88e6xxx_g1_irq_free() interrupt count

2016-11-27 Thread Andreas Färber
Hi Andrew,

Am 27.11.2016 um 22:22 schrieb Andrew Lunn:
> On Sun, Nov 27, 2016 at 09:43:44PM +0100, Andreas Färber wrote:
>> mv88e6xxx_g1_irq_setup() sets up chip->g1_irq.nirqs interrupt mappings,
>> so free the same amount. This will be 8 or 9 in practice, less than 16.
> 
> Hi Andreas
> 
> The patch is correct, but please read
> Documentation/networking/netdev-FAQ.txt
> and then resubmit the patch.

Do you mean --subject-prefix="PATCH net-next" or something else?

Thanks,
Andreas

-- 
SUSE Linux GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH 2/2] net: dsa: mv88e6xxx: Add 88E6176 device tree support

2016-11-27 Thread Andrew Lunn
On Sun, Nov 27, 2016 at 09:57:59PM +0100, Andreas Färber wrote:
> This model is found on the Turris Omnia.

This driver already supports nearly 30 different Marvell switch
models. Please document why the marvell,mv88e6176 is special and why
it needs its own compatible string when the others don't.

  Andrew


Re: [PATCH] mv88e6xxx: Fix mv88e6xxx_g1_irq_free() interrupt count

2016-11-27 Thread Andrew Lunn
On Sun, Nov 27, 2016 at 09:43:44PM +0100, Andreas Färber wrote:
> mv88e6xxx_g1_irq_setup() sets up chip->g1_irq.nirqs interrupt mappings,
> so free the same amount. This will be 8 or 9 in practice, less than 16.

Hi Andreas

The patch is correct, but please read
Documentation/networking/netdev-FAQ.txt
and then resubmit the patch.

Andrew

> 
> Fixes: dc30c35be720 ("net: dsa: mv88e6xxx: Implement interrupt support.")
> Cc: Andrew Lunn 
> Signed-off-by: Andreas Färber 
> ---
>  drivers/net/dsa/mv88e6xxx/chip.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/dsa/mv88e6xxx/chip.c 
> b/drivers/net/dsa/mv88e6xxx/chip.c
> index 98302358ceb9..95b9efb33ec7 100644
> --- a/drivers/net/dsa/mv88e6xxx/chip.c
> +++ b/drivers/net/dsa/mv88e6xxx/chip.c
> @@ -421,7 +421,7 @@ static void mv88e6xxx_g1_irq_free(struct mv88e6xxx_chip 
> *chip)
>  
>   free_irq(chip->irq, chip);
>  
> - for (irq = 0; irq < 16; irq++) {
> + for (irq = 0; irq < chip->g1_irq.nirqs; irq++) {
>   virq = irq_find_mapping(chip->g1_irq.domain, irq);
>   irq_dispose_mapping(virq);
>   }
> -- 
> 2.6.6
> 


Re: [PATCH net-next 09/11] qede: Better utilize the qede_[rt]x_queue

2016-11-27 Thread Mintz, Yuval
> > I'd say this is a false positive, given that MTU can't be so large.

> False positive or not you must fix the warning and resubmit this
> series with that fixed.

Sure. I'll re-spin later this week [hopefully it'll get some additional
review comments by then]. 

[PATCH v2] MAINTAINERS: Add device tree bindings to mv88e6xx section

2016-11-27 Thread Andreas Färber
Also include the netdev list for convenience, as done elsewhere.

Cc: Andrew Lunn 
Cc: Vivien Didelot 
Signed-off-by: Andreas Färber 
---
 MAINTAINERS | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index f73e19277a70..677d73cfedc7 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7668,8 +7668,10 @@ S:   Maintained
 MARVELL 88E6XXX ETHERNET SWITCH FABRIC DRIVER
 M: Andrew Lunn 
 M: Vivien Didelot 
+L: netdev@vger.kernel.org
 S: Maintained
 F: drivers/net/dsa/mv88e6xxx/
+F: Documentation/devicetree/bindings/net/dsa/marvell.txt
 
 MARVELL ARMADA DRM SUPPORT
 M: Russell King 
-- 
2.6.6



[PATCH] MAINTAINERS: Add device tree bindings to mv88e6xx section

2016-11-27 Thread Andreas Färber
Also include the netdev list for convenience, as done elsewhere.

Cc: Andrew Lunn 
Cc: Vivien Didelot 
Signed-off-by: Andreas Färber 
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index f73e19277a70..46ccf6eadcc9 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7670,6 +7670,7 @@ M:Andrew Lunn 
 M: Vivien Didelot 
 S: Maintained
 F: drivers/net/dsa/mv88e6xxx/
+F: Documentation/devicetree/bindings/net/dsa/marvell.txt
 
 MARVELL ARMADA DRM SUPPORT
 M: Russell King 
-- 
2.6.6



[PATCH 2/2] net: dsa: mv88e6xxx: Add 88E6176 device tree support

2016-11-27 Thread Andreas Färber
This model is found on the Turris Omnia.

Signed-off-by: Andreas Färber 
---
 drivers/net/dsa/mv88e6xxx/chip.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 77f13ada2612..95b9efb33ec7 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -4280,6 +4280,10 @@ static const struct of_device_id mv88e6xxx_of_match[] = {
.data = _table[MV88E6085],
},
{
+   .compatible = "marvell,mv88e6176",
+   .data = _table[MV88E6176],
+   },
+   {
.compatible = "marvell,mv88e6190",
.data = _table[MV88E6190],
},
-- 
2.6.6



[PATCH 1/2] Documentation: net: dsa: marvell: Add 88E6176

2016-11-27 Thread Andreas Färber
Signed-off-by: Andreas Färber 
---
 Documentation/devicetree/bindings/net/dsa/marvell.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/devicetree/bindings/net/dsa/marvell.txt 
b/Documentation/devicetree/bindings/net/dsa/marvell.txt
index b3dd6b40e0de..000bc3b16edd 100644
--- a/Documentation/devicetree/bindings/net/dsa/marvell.txt
+++ b/Documentation/devicetree/bindings/net/dsa/marvell.txt
@@ -15,6 +15,7 @@ Additional required and optional properties can be found in 
dsa.txt.
 
 Required properties:
 - compatible  : Should be one of "marvell,mv88e6085" or
+"marvell,mv88e6176" or
 "marvell,mv88e6190"
 - reg  : Address on the MII bus for the switch.
 
-- 
2.6.6



Re: [PATCH] netdevice: fix sparse warning for HARD_TX_LOCK

2016-11-27 Thread David Miller
From: "Michael S. Tsirkin" 
Date: Thu, 24 Nov 2016 07:04:08 +0200

> sparse warns about context imbalance in any code
> that uses HARD_TX_LOCK/UNLOCK - this is because it's
> unable to determine that flags don't change so
> lock and unlock are paired.
> 
> Seems easy enough to fix by adding __acquire/__release
> calls.
> 
> With this patch af_packet.c is now sparse-clean,
> 
> Signed-off-by: Michael S. Tsirkin 

Applied to net-next, thanks.


[PATCH] mv88e6xxx: Fix mv88e6xxx_g1_irq_free() interrupt count

2016-11-27 Thread Andreas Färber
mv88e6xxx_g1_irq_setup() sets up chip->g1_irq.nirqs interrupt mappings,
so free the same amount. This will be 8 or 9 in practice, less than 16.

Fixes: dc30c35be720 ("net: dsa: mv88e6xxx: Implement interrupt support.")
Cc: Andrew Lunn 
Signed-off-by: Andreas Färber 
---
 drivers/net/dsa/mv88e6xxx/chip.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 98302358ceb9..95b9efb33ec7 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -421,7 +421,7 @@ static void mv88e6xxx_g1_irq_free(struct mv88e6xxx_chip 
*chip)
 
free_irq(chip->irq, chip);
 
-   for (irq = 0; irq < 16; irq++) {
+   for (irq = 0; irq < chip->g1_irq.nirqs; irq++) {
virq = irq_find_mapping(chip->g1_irq.domain, irq);
irq_dispose_mapping(virq);
}
-- 
2.6.6



Re: [PATCH net-next v2 0/4] Documentation: net: phy: Improve documentation

2016-11-27 Thread Martin Blumenstingl
On Sun, Nov 27, 2016 at 7:44 PM, Florian Fainelli  wrote:
> Hi all,
>
> This patch series addresses discussions and feedback that was recently 
> received
> on the mailing-list in the area of: flow control/pause frames, interpretation 
> of
> phy_interface_t and finally add some links to useful standards documents.
>
> Changes in v2:
>
> - clarify a few things in the RGMII section, add a paragraph about common 
> issues
>   with RGMII delay mismatches
Reviewed-by: Martin Blumenstingl 

Thanks a lot Florian, this will definitely help others in the future!

> Florian Fainelli (4):
>   Documentation: net: phy: remove description of function pointers
>   Documentation: net: phy: Add a paragraph about pause frames/flow
> control
>   Documentation: net: phy: Add blurb about RGMII
>   Documentation: net: phy: Add links to several standards documents
>
>  Documentation/networking/phy.txt | 139 
> +--
>  1 file changed, 104 insertions(+), 35 deletions(-)
>
> --
> 2.9.3
>


Re: [PATCH net-next 1/1] ptp: gianfar: Use high resolution frequency method.

2016-11-27 Thread David Miller
From: Ulrik De Bie 
Date: Wed, 23 Nov 2016 21:11:04 +0100

> This patch depends on commit d8d263541913 ("ptp: Introduce a high
> resolution frequency adjustment method.")
> 
> The gianfar devices offer a frequency resolution of about 0.46 ppb
> (depends on actual value of tmr_add, for the calculation assumed
> 0x8000). This patch lets users of the device benefit from the increased
> frequency resolution when tuning the clock. Thanks to the rounding the
> maximum error between the requested frequency and the applied frequency
> will then be about 0.23 ppb.
> 
> Tested on a v3.3.8 kernel on a real gianfar device. Verified compilation
> on net-next (currently at v4.9-rc5).
> 
> Signed-off-by: Ulrik De Bie 

Applied.
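
The resolution figures in the quoted commit message can be checked with a little arithmetic. The clock advances by an addend on every cycle, so changing the addend by one changes the frequency by a fraction of 1/addend. The sketch below is not the gianfar driver's code: addend_resolution_ppb() and addend_delta() are hypothetical helpers, and the addend value of 2^31 used in the usage note is an assumption chosen only because it reproduces the quoted 0.46 ppb. The second helper shows one way to turn a scaled_ppm request (parts per million with a 16-bit binary fraction) into an addend change with round-to-nearest.

```c
#include <stdint.h>

/* Relative frequency step, in ppb, of changing the addend by one. */
static double addend_resolution_ppb(uint32_t tmr_add)
{
	return 1e9 / (double)tmr_add;
}

/*
 * Addend change for a frequency offset of abs_scaled_ppm / 65536 ppm,
 * that is tmr_add * abs_scaled_ppm / (65536 * 1000000), rounded to the
 * nearest integer. The caller applies the sign.
 */
static uint32_t addend_delta(uint32_t tmr_add, uint32_t abs_scaled_ppm)
{
	const uint64_t div = 65536ULL * 1000000ULL;
	uint64_t adj = (uint64_t)tmr_add * abs_scaled_ppm;
	uint64_t q = adj / div;
	uint64_t r = adj % div;

	/* Round to nearest without risking overflow of adj + div / 2. */
	return (uint32_t)(q + (r >= div / 2 ? 1 : 0));
}
```

With an addend of 2^31, addend_resolution_ppb() gives a step a little above 0.46 ppb, and rounding to the nearest step bounds the error at half of that, a little above 0.23 ppb, which agrees with the figures in the commit message.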


Re: [PATCH v3] cpsw: ethtool: add support for getting/setting EEE registers

2016-11-27 Thread David Miller
From: yegorsli...@googlemail.com
Date: Thu, 24 Nov 2016 10:17:01 +0100

> From: Yegor Yefremov 
> 
> Add the ability to query and set Energy Efficient Ethernet parameters
> via ethtool for applicable devices.
> 
> This patch doesn't activate full EEE support in cpsw driver, but it
> enables reading and writing EEE advertising settings. This way one
> can disable advertising EEE for certain speeds.
> 
> Signed-off-by: Yegor Yefremov 
> Acked-by: Rami Rosen 
> ---
> Changes:
>   v3: explain what features will be available with this patch (Florian 
> Fainelli)
>   v2: make routines static (Rami Rosen)

Does not apply cleanly to net-next, please respin.


Re: [PATCH net-next] mlx4: do not use priv->stats_lock in mlx4_en_auto_moderation()

2016-11-27 Thread David Miller
From: Eric Dumazet 
Date: Wed, 23 Nov 2016 09:46:52 -0800

> From: Eric Dumazet 
> 
> Per RX ring packets/bytes counters are not protected by global
> priv->stats_lock.
> 
> Better not confuse the reader, and use READ_ONCE() to show we read
> these counters without surrounding synchronization.
> 
> Interrupt moderation is best effort, and we do not really care about
> ultra precise counters.
> 
> Signed-off-by: Eric Dumazet 

Applied.


Re: [PATCH v2] ipv6:ipv6_pinfo dereferenced after NULL check

2016-11-27 Thread David Miller
From: Manjeet Pawar 
Date: Thu, 24 Nov 2016 16:11:57 +0530

> From: Rohit Thapliyal 
> 
> np checked for NULL and then dereferenced. It should be modified
> for NULL case.
> 
> Signed-off-by: Rohit Thapliyal 
> Signed-off-by: Manjeet Pawar 
> Signed-off-by: Hannes Frederic Sowa 
> Reviewed-by: Akhilesh Kumar 

I do not think inet6_sk(sk) can ever be NULL in this function.

All callers fall into two categories:

1) Calls where arguments already dereference np in some way to
   pass arguments to ip6_xmit():

net/dccp/ipv6.c:err = ip6_xmit(sk, skb, , opt, np->tclass);
net/ipv6/inet6_connection_sock.c:   res = ip6_xmit(sk, skb, , 
rcu_dereference(np->opt),
net/ipv6/tcp_ipv6.c:err = ip6_xmit(sk, skb, fl6, opt, np->tclass);
net/sctp/ipv6.c:res = ip6_xmit(sk, skb, fl6, rcu_dereference(np->opt), 
np->tclass);

2) Calls where the socket is a "control" socket which is initialized
   at protocol registration time and therefore definitely has
   a proper inet6_sk() pointer set up.

net/dccp/ipv6.c:ip6_xmit(ctl_sk, skb, , NULL, 0);
net/ipv6/tcp_ipv6.c:ip6_xmit(ctl_sk, buff, , NULL, tclass);

Therefore, I think we should simply remove the NULL test entirely.


Re: [PATCH net-next 09/11] qede: Better utilize the qede_[rt]x_queue

2016-11-27 Thread David Miller
From: "Mintz, Yuval" 
Date: Sun, 27 Nov 2016 16:15:42 +

> I'd say this is a false positive, given that MTU can't be so large.

False positive or not you must fix the warning and resubmit this
series with that fixed.


[PATCH] rtlwifi: Add updates for RTL8723BE and RTL8821AE

2016-11-27 Thread Larry Finger
The new versions will only work with new versions of the drivers. For
that reason, they are given new names and the old versions are retained.

Signed-off-by: Larry Finger 
---
 WHENCE |   4 
 rtlwifi/rtl8723befw_36.bin | Bin 0 -> 31762 bytes
 rtlwifi/rtl8821aefw_29.bin | Bin 0 -> 28348 bytes
 3 files changed, 4 insertions(+)
 create mode 100644 rtlwifi/rtl8723befw_36.bin
 create mode 100644 rtlwifi/rtl8821aefw_29.bin

diff --git a/WHENCE b/WHENCE
index 90d6e4d..c31fe15 100644
--- a/WHENCE
+++ b/WHENCE
@@ -2329,6 +2329,8 @@ Driver: rtl8723be - Realtek 802.11n WLAN driver for RTL8723BE
 
 Info: From Vendor's realtek/rtlwifi_linux_mac80211_0019.0320.2014V628 driver
 File: rtlwifi/rtl8723befw.bin
+Info: Update to version 36 - Sent by Realtek
+File: rtlwifi/rtl8723befw_36.bin
 
 Licence: Redistributable. See LICENCE.rtlwifi_firmware.txt for details.
 
@@ -2370,6 +2372,8 @@ Driver: rtl8821ae - Realtek 802.11n WLAN driver for RTL8821AE
 Info: From Vendor's realtek/rtlwifi_linux_mac80211_0019.0320.2014V628 driver
 File: rtlwifi/rtl8821aefw.bin
 File: rtlwifi/rtl8821aefw_wowlan.bin
+Info: Update to version 29 - Sent by Realtek
+File: rtlwifi/rtl8821aefw_29.bin
 
 Licence: Redistributable. See LICENCE.rtlwifi_firmware.txt for details.
 
diff --git a/rtlwifi/rtl8723befw_36.bin b/rtlwifi/rtl8723befw_36.bin
new file mode 100644
index 
..1bb9b9c8cea95689a0d27f9d7c264ad63728dde1
GIT binary patch
literal 31762
[base85-encoded binary firmware data truncated]
